Get started with Python - a Google Course Subject

(You can download the files from GitHub and run them in Google Colab by clicking the "Open in Colab" icon, or run them on your local machine.)

Module 1 - Helpful resources and tips

Quiz Q1 - Q3

Quiz Q4 - Q6

Google's Lecturer

Course descriptions - Overview

https://www.coursera.org/learn/get-started-with-python/supplement/UD00Z/course-2-overview

The Google Advanced Data Analytics Certificate has seven courses. Get Started with Python is the second course.

Seven icons show courses sequentially from left to right with course 2 highlighted.

Foundations of Data Science — Learn how data professionals operate in the workplace and how different roles in the field of data science contribute to an organization’s vision of the future. Then, explore data science roles, communication skills, and data ethics.
Get Started with Python — (current course) Discover how the programming language Python can power your data analysis. Learn core Python concepts, such as data types, functions, conditional statements, loops, and data structures.
Go Beyond the Numbers: Translate Data into Insights — Learn the fundamentals of data cleaning and visualizations and how to reveal the important stories that live within data.
The Power of Statistics — Explore descriptive and inferential statistics, basic probability and probability distributions, sampling, confidence intervals, and hypothesis testing.
Regression Analysis: Simplify Complex Data Relationships — Learn to model variable relationships, focusing on linear and logistic regression.
The Nuts and Bolts of Machine Learning — Learn unsupervised machine learning techniques and how to apply them to organizational data.
Google Advanced Data Analytics Capstone — Complete a hands-on project designed to demonstrate the skills and competencies you acquire in the program.

In the course, one will also learn about NumPy, pandas, statsmodels, matplotlib, seaborn, scikit-learn, and more. These are code libraries that are used every day by data professionals on the job. You'll explore these in detail later.

Jupyter Notebook

Module #2: How to use Jupyter Notebooks : https://www.coursera.org/learn/get-started-with-python/supplement/2poER/how-to-use-jupyter-notebooks

Why Jupyter Notebook?

Notebooks are particularly useful for working with data. Here are some ways that Jupyter notebooks excel:

Modular/interactive computing: You can write and execute code in small, manageable chunks, which are called cells. You can run a cell without necessarily having to run the whole notebook. This is especially helpful for data exploration and experimentation. Cells are also helpful with debugging, because they provide a user-friendly way to make a mistake, notice that you made the mistake, and iterate back to correct your mistake, without having to re-execute a whole script.
Integration of code and documentation: Notebooks allow you to combine code, textual explanations, and visualizations like charts, graphs, and tables—all in a single document.
Support for multiple languages: The Advanced Data Analytics program will use Python, but Jupyter notebooks support many other languages, making them powerful and versatile.
Data exploration and analysis: The notebook simplifies working with data by offering tools to load, clean, analyze, and examine it in an elegant interface.
Cloud-based services: Many cloud computing platforms host Jupyter notebooks, which makes it easy to run and share notebooks without setting up a local environment. This is very useful for collaboration.
Libraries and extensions: There is a rich ecosystem of extensions and plugins that enhance functionality for whatever type of project you’re working on.

Resources for more information about Jupyter Notebooks:

Jupyter Notebooks interface training
Jupyter software homepage
Jupyter documentation
Jupyter Notebooks cloud (online)
Jupyter community forum
Jupyter notebooks community forum
Python community forum
StackOverflow questions (crowdsource forum to help solve problems)
Jupyter Notebooks installation

Module #1 - Object-Oriented Programming

OOP (video at 4:42): the attributes of the planets object include its shape and columns (see the pictures below)

Attributes allow you to access

The four fundamental concepts in object-oriented programming include objects, classes, attributes, and methods.
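
As a minimal sketch of how these four concepts fit together, the example below defines a class, creates an object from it, and uses dot notation to reach an attribute and a method. The Planet class, its attributes, and its describe() method are hypothetical names invented for illustration; they are not taken from the course.

```python
class Planet:
    """A hypothetical class used only to illustrate OOP vocabulary."""

    def __init__(self, name, moons):
        # Attributes: values associated with each object
        self.name = name
        self.moons = moons

    def describe(self):
        # Method: a function that belongs to the class
        return f"{self.name} has {self.moons} moon(s)"


earth = Planet("Earth", 1)      # object: an instance of the class
print(type(earth).__name__)     # the class the object belongs to
print(earth.name)               # attribute access with dot notation
print(earth.describe())         # method call with dot notation
```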

Week #1 Quiz 1 (Q4 - Q6)

Try to run in Github or Jupyter Notebook at: https://github.com/patrickyip0/Get-Start-Python-with-Google

Module 1 - Variables and data types

https://www.coursera.org/learn/get-started-with-python/lecture/k3ex2/variables-and-data-types

Variables can store values of any data type. A data type is an attribute that describes a piece of data based on its values, its programming language, or the operations it can perform. In Python, this includes strings, integers, floats, lists, dictionaries, and more.

Module 1 - Create Precise Variable Names

Create Precise Variable Names: https://www.coursera.org/learn/get-started-with-python/lecture/fB03O/create-precise-variable-names

Naming conventions are consistent guidelines that describe the content, creation date, and version of a file in its name.
Naming restrictions are rules built into the syntax of the language itself that must be followed.
An important rule when naming variables: avoid keywords (words that are reserved for a specific purpose and that can only be used for that purpose), e.g. "for", "in", "if", and "else" (which appear in special colors).

e.g.
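
One way to check a candidate name against these rules is sketched below, using the standard-library keyword module and the str.isidentifier() method. The names total_sales and 2nd_quarter are made-up examples, not names from the course.

```python
import keyword

# keyword.iskeyword() reports whether a word is reserved by Python
for word in ["for", "in", "if", "else", "total_sales"]:
    print(word, keyword.iskeyword(word))

# A descriptive, non-reserved name is safe to assign to
total_sales = 1250.75

# str.isidentifier() checks the naming restrictions on characters
print("total_sales".isidentifier())   # a valid name
print("2nd_quarter".isidentifier())   # names cannot start with a digit
```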

Module 1 - Data Types and Conversions

Data types and conversions: https://www.coursera.org/learn/get-started-with-python/lecture/z9zda/data-types-and-conversions

You can use the type function to have the computer tell you the data type. e.g. type(3)
Run e.g. in JupyterLab or Google Colab: https://colab.research.google.com/
Run in my Google Colab at: https://colab.research.google.com/drive/1xkygHo655LA_qJlfrEhlz1XSAnPVpojA?usp=sharing
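
A short sketch of type() together with explicit and implicit conversion (both defined in the glossary below). The variable names are illustrative only.

```python
# type() reports the data type of a value
print(type(3))        # <class 'int'>
print(type(3.0))      # <class 'float'>
print(type("3"))      # <class 'str'>

# Explicit conversion (typecasting): you ask for the new type
count = int("3")
label = str(3)

# Implicit conversion: Python converts the int to a float on its own
total = 3 + 0.5
print(type(total))    # <class 'float'>
```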

Week 1 - Quiz - Test your knowledge: Using Python syntax

Terms and definitions from Course 2, Module 1

Argument: Information given to a function in its parentheses
Assignment: The process of storing a value in a variable
Attribute: A value associated with an object or class which is referenced by name using dot notation
Cells: The modular code input and output fields into which Jupyter Notebooks are partitioned
Class: An object’s data type that bundles data and functionality together
Computer programming: The process of giving instructions to a computer to perform an action or set of actions
Data type: An attribute that describes a piece of data based on its values, its programming language, or the operations it can perform
Dot notation: How to access the methods and attributes that belong to an instance of a class
Dynamic typing: Variables that can point to objects of any data type
Explicit conversion: The process of converting a data type of an object to a required data type
Expression: A combination of numbers, symbols, or other variables that produce a result when evaluated
Float: A data type that represents numbers that contain decimals
Immutable data type: A data type in which the values can never be altered or updated
Implicit conversion: The process Python uses to automatically convert one data type to another without user involvement
Integer: A data type used to represent whole numbers without fractions
Jupyter Notebook: An open-source web application for creating and sharing documents containing live code, mathematical formulas, visualizations, and text
Keyword: A special word in a programming language that is reserved for a specific purpose and that can only be used for that purpose
Markdown: A markup language that lets the user write formatted text in a coding environment or plain-text editor
Method: A function that belongs to a class and typically performs an action or operation
Naming conventions: Consistent guidelines that describe the content, creation date, and version of a file in its name
Naming restrictions: Rules built into the syntax of a programming language
Object: An instance of a class; a fundamental building block of Python
Object-oriented programming: A programming system that is based around objects which can contain both data and code that manipulates that data
Programming languages: The words and symbols used to write instructions for computers to follow
String: A sequence of characters and punctuation that contains textual information
Syntax: The structure of code words, symbols, placement, and punctuation
Typecasting: Converting data from one type to another (see explicit conversion)
Variable: A named container which stores values in a reserved location in the computer’s memory

--- ---

Module #2

Module 2 - Define functions and returning values

To write clean code:
A function is a body of reusable code for performing specific processes or tasks.
Built-in functions include print() and str().
Github hyperlink: https://github.com/patrickyip0/Get-Start-Python-with-Google/blob/

Modularity (~ Reusability): Modularity is the ability to write code in separate components that work together and that can be reused for other programs.
Refactoring: Refactoring is the process of restructuring code while maintaining its original functionality. This is a part of creating self-documenting code.
Self-documenting code is code written in a way that is readable and makes its purpose clear.
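
The ideas above can be sketched with two small functions. The function names and the temperature example are assumptions chosen for illustration; the point is that one function returns a value, the other reuses it instead of repeating the formula, and the names and docstrings make the purpose clear.

```python
def to_celsius(fahrenheit):
    """Convert a temperature from Fahrenheit to Celsius."""
    return (fahrenheit - 32) * 5 / 9


def describe_temperature(fahrenheit):
    """Reuse to_celsius() rather than repeating the formula."""
    celsius = to_celsius(fahrenheit)
    return f"{fahrenheit}F is {celsius:.1f}C"


print(describe_temperature(212))   # 212F is 100.0C
```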

Module 2 - Use comments to scaffold your code

Use comments - https://www.coursera.org/learn/get-started-with-python/lecture/EKWJa/use-comments-to-scaffold-your-code

Github #3 hyperlink: https://github.com/patrickyip0/Get-Start-Python-with-Google/blob/ (#3. Use comments to scaffold your code)

Module 2 - Make comparisons using operators

Link: Make comparisons using operators

Comparators
Logical operators: operators that connect multiple statements together and perform more complex comparisons, e.g. and, or, not.
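
A brief sketch of comparators and logical operators working together. The variables age and has_ticket are hypothetical.

```python
age = 30
has_ticket = True

# Comparators evaluate to a Boolean
print(age > 18)      # True
print(age == 18)     # False
print(age != 18)     # True

# Logical operators combine Boolean expressions
can_enter = age >= 18 and has_ticket
is_minor_or_senior = age < 18 or age >= 65
print(can_enter, is_minor_or_senior, not has_ticket)
```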

Run the lab file in Github (click the icon "Open in Colab")

Module 2 - Use if, elif, else statements to make decisions

Link: Use if, elif, else statements to make decisions
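
A minimal sketch of an if / elif / else chain. The grade() function and its cut-off scores are arbitrary assumptions; Python checks the branches from top to bottom and runs only the first one whose condition is true.

```python
def grade(score):
    """Return a letter grade; the cut-offs are arbitrary."""
    if score >= 90:
        return "A"
    elif score >= 80:
        return "B"
    else:
        return "C or below"


print(grade(95), grade(85), grade(40))
```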


Lab Running in Github: ipynb Jupyter Notebook (Run in Colab)

Lab Exemplar in Github: Jupyter Notebook (click "Run in Colab")

Wrap Up:

Module 2 - week 2 challenges

Module 3 - week 3

Learnt: Variables, Data types, Functions, Operators, To write clean code, Conditional statements
To learn: Loops, Strings

Module 3 - While Loops

Hyperlink: Intro While Loops

Try to run in my Github or Colab.

The word "break" is a keyword that lets you escape a loop without triggering any ELSE statement that follows it in the loop.
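
The behavior of break can be sketched as follows. The helper find_first_multiple() is a hypothetical function written for this note: when break fires, the else block attached to the while loop is skipped; when the loop simply runs out, the else block executes.

```python
def find_first_multiple(of, start, limit):
    """Return the first multiple of `of` in [start, limit), or None."""
    n = start
    found = None
    while n < limit:
        if n % of == 0:
            found = n
            break          # leaves the loop; the else block is skipped
        n += 1
    else:
        # Runs only when the loop ended without hitting break
        print("no multiple found")
    return found


print(find_first_multiple(7, 10, 20))   # 14
print(find_first_multiple(7, 15, 20))   # prints the else message, then None
```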

Run Lab activity - While Loops (Github Codespace, or Colab)

Module 3 - For Loops

Hyperlink: Intro For Loops

Try to run in my Github or Colab.

For-Loops vs While-Loops

Use for-loops when there's a sequence of elements that you want to iterate over. E.g., to loop over the records in a dataset, a for loop is usually the better choice.
Use while-loops when you want to repeat an action until a boolean condition changes, without having to write the same code repeatedly.
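
A side-by-side sketch of the two cases, with made-up numbers: the for loop walks a known sequence, while the while loop repeats until its condition becomes false.

```python
# for loop: iterate over a known sequence of records
records = [12, 7, 30]
total = 0
for value in records:
    total += value

# while loop: repeat an action until a Boolean condition changes
balance = 100
months = 0
while balance > 0:
    balance -= 30
    months += 1

print(total, months)   # 49 4
```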

Module 3 - Exemplar Lab

Download or Run in the Github space.

Module 3 - Work With Strings

Hyperlink: Work with Strings

Try to run Jupyter Notebook in my Github

Module 3 - String Slicing

Hyperlink: String slicing

Try to run in my Github

Indexing is Python's way of letting us refer to individual items within a sequence by their relative position, allowing us to select, filter, edit and manipulate data.
Indexing can be used on strings, lists, tuples and most other sequence data types.
Indexing lets us slice strings to create smaller strings, or substrings.

Python running in Jupyter Notebook (Github.dev codespace)

If you do not know how long the string is, use the index [-1] to get its last character:
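
A small sketch of indexing and slicing on a string. The word "pineapple" is just an example value.

```python
word = "pineapple"

print(word[0])      # first character: p
print(word[-1])     # last character, no length needed: e
print(word[:4])     # from the start up to (not including) index 4: pine
print(word[4:])     # from index 4 to the end: apple
print(word[-5:])    # the last five characters: apple
```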

Module 3 - Format Strings

Hyperlink: Format Strings

Try to run in my Github

The format() method formats and inserts specific substrings into designated places within a larger string.
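
A short sketch of str.format() with positional and named placeholders. The template text and values are invented for illustration.

```python
# Positional placeholders are filled in order; :.1f rounds to one decimal
template = "{} scored {:.1f} points in {} games"
message = template.format("Ada", 27.456, 3)
print(message)   # Ada scored 27.5 points in 3 games

# Named placeholders are filled by keyword
named = "{name} is {age} years old".format(name="Ada", age=36)
print(named)     # Ada is 36 years old
```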

Quiz:

Module 3 Challenge

Module 4 - Lists & Tuples - Welcome

Hyperlink: Module 4 - Lists & Tuples - Welcome

Try to run in my Github

Data Structures are collections of data values or objects that contain different data types.

Two of the most important libraries and packages for data professionals:

1) Numerical Python (NumPy), which is known for its high-performance computational power.
Data professionals use NumPy to rapidly process large quantities of data, which makes it useful for analyzing large and complex datasets.
2) Python Data Analysis Library (Pandas), which is a key tool for advanced data analytics.
Pandas makes analyzing data in the form of a table with rows and columns easier and more efficient, because it has tools specifically designed for the job.

Module 4 - Intro to Lists

Hyperlink: Module 4 - Intro to Lists

Try to run in my Github

A list is a data structure that helps store and manipulate an ordered collection of items, e.g. a list of email addresses associated with a user account.
It allows indexing and slicing.
A sequence is a positionally ordered collection of items. Lists are sequences of elements of any data type.
Note that different data structures are either mutable or immutable -- Mutability refers to the ability to change the internal state of a data structure.
Lists are mutable (their elements can be modified, added, or removed) while strings are immutable.
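
The difference between a mutable list and an immutable string can be sketched like this. The email addresses are placeholder values.

```python
emails = ["a@example.com", "b@example.com", "c@example.com"]

print(emails[0])       # indexing
print(emails[1:])      # slicing

emails[0] = "z@example.com"   # lists are mutable: this changes the list
print(emails)

name = "python"
try:
    name[0] = "P"      # strings are immutable: this raises TypeError
except TypeError as error:
    print("cannot modify a string in place:", error)
    name = "P" + name[1:]     # build a new string instead
print(name)
```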

Module 4 - Modify the contents of a List

Hyperlink: Module 4 - Modify the Contents of Lists

Try to run in my Github

xxx.append(element) : the append() method adds an element to the end of a list.
xxx.insert(index#, element) : the insert() method takes an index as the first parameter and an element as the second parameter, then inserts the element into the list at that index.
xxx.remove(element) : the remove() method removes the first occurrence of an element from a list. It takes the element itself, not an index.
xxx.pop(index#) : the pop() method extracts an element from a list by removing it at a given index number and returning it; with no index, it removes the last element.
Reference Guide Lists
Forgot to upload the dataset file "train.csv" to GitHub, which caused errors.
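
The four list methods listed in this section can be sketched in sequence. The fruits list is a made-up example; note that remove() takes a value while pop() takes an index.

```python
fruits = ["apple", "banana"]

fruits.append("cherry")        # add to the end
fruits.insert(1, "avocado")    # insert at index 1
fruits.remove("banana")        # remove the first occurrence, by value
last = fruits.pop()            # remove and return the last element
first = fruits.pop(0)          # remove and return the element at index 0

print(fruits, last, first)     # ['avocado'] cherry apple
```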

In terminal, run command: git pull origin main # or 'master' if using older repo

Python REPL for interactive debugging

Module 4 - Intro Tuples

Hyperlink: Module 4 - Intro Tuples

Try to run in my Github

Module 4: More with Loops, Lists and Tuples

Module 4 - zip(), enumerate(), and list comprehension

Hyperlink: Module 4 - zip(), enumerate() and list()

Try to run in my Github
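
A minimal sketch of the three tools named in this section's title. The city and country lists are invented for illustration.

```python
cities = ["Paris", "Lima", "Tokyo"]
countries = ["France", "Peru", "Japan"]

# zip() pairs items by position and yields tuples
pairs = list(zip(cities, countries))

# enumerate() yields (index, item) pairs; start=1 begins counting at 1
numbered = [f"{i}: {city}" for i, city in enumerate(cities, start=1)]

# A list comprehension builds a new list, optionally with a condition
short_names = [city for city in cities if len(city) <= 4]

print(pairs)
print(numbered)
print(short_names)
```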

enumerate( )
Quiz:

Module 4 - Introduction to Dictionaries

Hyperlink: Module 4 - Intro Dictionaries

Try to run in my Github

Dictionary is a data structure that consists of a collection of key-value pairs.

The dict( ) function can be used to create a dictionary.
Keys must be immutable: integers, floats, tuples and strings can all be used as keys.
Mutable data types cannot be used as keys: lists, sets, and other dictionaries.
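
A short sketch of creating dictionaries and of the key rules above. The zoo, prices, and grid dictionaries are hypothetical examples.

```python
# Two ways to create a dictionary: braces or the dict() function
zoo = {"pen_1": ["zebra", "giraffe"], "pen_2": ["lion"]}
prices = dict(apple=0.5, pear=0.75)

zoo["pen_3"] = ["penguin"]     # add a new key-value pair
print(zoo["pen_1"])            # look up a value by its key
print("pen_2" in zoo)          # membership test checks the keys

# A tuple is immutable, so it can be a key
grid = {(0, 0): "origin"}

# A list is mutable, so using it as a key raises TypeError
try:
    bad = {[0, 0]: "origin"}
except TypeError as error:
    print("lists cannot be keys:", error)
```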

Module 4 - Dictionary Methods

Hyperlink: Module 4 - Dictionary Methods

Try to run in my Github

Module 4 - Introduction to Sets

Hyperlink: Module 4 - Introduction to Sets

Try to run in my Github

A set is a data structure in Python that contains only unordered, non-interchangeable elements.
Sets are instantiated with the set( ) function or non-empty braces.
set( ) is a function that takes an iterable as an argument and returns a new set object.
Each element in sets must be unique.
Reference for "iterable": iterable, iterator and generator in Python
To define an empty set, you have to use the set( ) function.
Because a set is unordered, it cannot be indexed or sliced. The elements inside a set must be immutable.
intersection( ) finds the elements that two sets have in common.
union( ) finds all the elements from both sets.
difference( ) finds the elements present in one set, but not the other.
symmetric_difference( ) finds the elements that are present in exactly one of the two sets.
Ref.: Reference guide: Sets
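
The set operations above can be sketched with two small sets. The values are arbitrary; sorted() is used only to make the printed order predictable, since sets themselves are unordered.

```python
a = {1, 2, 3, 4}          # braces with elements create a set
b = set([3, 4, 5])        # set() builds a set from an iterable
empty = set()             # {} alone would create an empty dictionary

print(sorted(a.intersection(b)))           # [3, 4]
print(sorted(a.union(b)))                  # [1, 2, 3, 4, 5]
print(sorted(a.difference(b)))             # [1, 2]
print(sorted(a.symmetric_difference(b)))   # [1, 2, 5]

# Duplicates are dropped because each element must be unique
letters = set("banana")
print(sorted(letters))                     # ['a', 'b', 'n']
```
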
Quiz:

Module 4 - The Power of Packages (Numpy)

Hyperlink: Module 4 - The Power of Packages (Numpy)

Try to run in my Github

A library, or package refers to a reusable collection of code. It also contains related modules and documentation.
Python libraries: matplotlib, seaborn, NumPy and pandas.
Matplotlib is a comprehensive library for creating static, animated, and interactive visualizations in Python.
Seaborn is a data visualization library that's based on matplotlib. It provides a simpler interface for working with common plots and graphs.
NumPy (Numerical Python) is an essential library that contains multidimensional array and matrix data structures and functions to manipulate them. This library is used for scientific computation.
Pandas (Python Data Analysis) is a powerful library built on top of NumPy that's used to manipulate and analyze tabular data.
Scikit-learn, a library, and statsmodels, a package, consist of functions. Data professionals can use them to test the performance of statistical models.
Modules are accessed from within a package or a library. They are Python files that contain collections of functions and global variables.
Commonly used modules for data professional work include math and random.
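
A brief sketch of the two standard-library modules just mentioned. The seed value and the color list are arbitrary choices; the seed only makes the random results repeatable between runs.

```python
import math
import random

print(math.sqrt(16))        # 4.0
print(math.floor(2.7))      # 2
print(math.pi)

random.seed(42)                                   # fixed seed for repeatable results
roll = random.randint(1, 6)                       # integer from 1 to 6 inclusive
pick = random.choice(["red", "green", "blue"])    # one item from a sequence
print(roll, pick)
```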

Module 4 - Introduction to Numpy

Hyperlink: Module 4 - Introduction to Numpy

Try to run in my Github

NumPy's power comes from vectorization. Vectorization enables operations to be performed on multiple components of a data object at the same time (especially useful when manipulating very large quantities of data):

More efficient and faster
Vectors also take less memory space.

The import statement is used to load an external library, package, module, or function into your computing environment.
Aliasing lets you assign an alternate name, or alias, by which you can refer to something (e.g. abbreviating numpy to np with "import numpy as np").
Ref. Commonly used built-in modules
Use the ".shape" attribute to confirm the shape of an array.
Use the ".ndim" attribute to confirm the number of dimensions the array has.
Ref.: Reference Guide - Arrays + NumPy Refs + NumPy Tutorials + NumPy User Guide + NumPy Docs
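
A minimal sketch of importing NumPy under its usual alias, a vectorized operation, and the two attributes above. The array values are made up, and the example assumes NumPy is installed in your environment.

```python
import numpy as np

prices = np.array([10.0, 20.0, 30.0])

# Vectorized: the multiplication is applied to every element, with no loop
with_tax = prices * 1.1

table = np.array([[1, 2, 3],
                  [4, 5, 6]])

print(with_tax)
print(table.shape)   # (2, 3): two rows, three columns
print(table.ndim)    # 2: a two-dimensional array
```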

Module 4 - Introduction to Pandas

Hyperlink: Module 4 - Introduction to Pandas

Try to run in my Github

Pandas' key functionality is the manipulation and analysis of tabular data - i.e. data in the form of a table (with rows and columns), e.g. a spreadsheet.
Pandas has two core object classes: DataFrame and Series.
A DataFrame is a two-dimensional, labeled data structure with rows and columns, e.g. a spreadsheet or a SQL table.
The first example below is a dataframe created from a dictionary, where each key of the dictionary represents a column name, and the values for that key are in a list.
The second example is created from a NumPy array, resembling a list of lists, where each sub-list represents a row of the table. [execution #94]
A Series is a one-dimensional labeled array.
NaN stands for "not a number" and is how pandas represents null (missing) values.
Dot notation can be used to select a column, but only if the column name does not contain any whitespace: e.g. df3.Age. Bracket notation works for any column name: e.g. df3['Age']
Bracket notation is generally preferable, because it makes the code easier to read, and it can select several columns at once: e.g. df3[['Name', 'Age']]
To select rows or columns by index, you'll need to use iloc[ ]: e.g. df3.iloc[0] # row 0

e.g. row 0 of data frame: df3.iloc[[0]] # the whole row 0 (also shown in [105])
e.g. entire rows below: df3.iloc[0:3] # the whole row 0 to 2 (also in [106])
e.g. select subsets: df3.iloc[0:3, [3, 4]] # the rows 0 to 2 at columns 3, 4 (in [108])
e.g. get a dataframe view of all rows at column #3: df3.iloc[:, [3]] # all rows at column #3 (in [107])
e.g. use iloc to access value in row 0, column 3: df3.iloc[0, 3]
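
The creation and selection patterns above can be sketched on a small stand-in table. The people dataframe, its columns, and its values are invented for illustration and are not the df3 used in the notebook; the example assumes pandas and NumPy are installed.

```python
import numpy as np
import pandas as pd

# From a dictionary: each key becomes a column name
people = pd.DataFrame({
    "Name": ["Ana", "Ben", "Cai", "Dee"],
    "Age": [34, 28, 45, 19],
    "City": ["Lima", "Oslo", "Pune", "Rome"],
    "Height": [1.62, 1.80, 1.75, 1.68],
    "Score": [88, 92, 79, 95],
})

# From a NumPy array: each inner list becomes a row
grid = pd.DataFrame(np.array([[1, 2], [3, 4]]), columns=["a", "b"])

ages = people["Age"]                 # one column, returned as a Series
subset = people[["Name", "Age"]]     # list of columns, returned as a DataFrame
first_row = people.iloc[0]           # row 0 as a Series
first_row_df = people.iloc[[0]]      # row 0 as a one-row DataFrame
block = people.iloc[0:3, [3, 4]]     # rows 0 to 2 at columns 3 and 4
cell = people.iloc[0, 3]             # single value at row 0, column 3

print(block)
print(cell)
```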

Boolean masking is a filtering technique that overlays a Boolean grid onto a dataframe in order to select only the values in the dataframe that align with the True values of the grid.
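
A small sketch of Boolean masking, assuming pandas is installed. The planets table here is a made-up example: the comparison produces a Series of True/False values (the mask), and indexing with that mask keeps only the rows marked True.

```python
import pandas as pd

planets = pd.DataFrame({
    "planet": ["Mercury", "Venus", "Earth", "Mars"],
    "moons": [0, 0, 1, 2],
})

mask = planets["moons"] > 0      # Boolean Series, one value per row
with_moons = planets[mask]       # keeps only rows where the mask is True

# Conditions can be combined with & (and) or | (or); each needs parentheses
earth_or_mars = planets[(planets["moons"] == 1) | (planets["planet"] == "Mars")]

print(mask.tolist())
print(with_moons)
```
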
Grouping and Aggregation (M4)
Groupby is a pandas DataFrame method that groups rows of the dataframe together based on their values at one or more columns, which allows further analysis of the groups: df4.groupby(['type']).sum( ) # sum, mean, min, max, median, count( )
dataframe df:
agg( ), short for aggregate, is a pandas groupby method that allows you to apply multiple calculations to groups of data.
Link: More on grouping and aggregation : .groupby( ), .min (), .size( )
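
Grouping and aggregation can be sketched as follows, assuming pandas is installed. The clothes table and its column names are hypothetical stand-ins, not the df4 from the notebook.

```python
import pandas as pd

clothes = pd.DataFrame({
    "type": ["pants", "shirt", "shirt", "pants"],
    "price_usd": [20.0, 35.0, 50.0, 40.0],
    "mass_g": [125, 440, 680, 200],
})

# One calculation per group
totals = clothes.groupby("type").sum()

# Several calculations on one column with agg()
summary = clothes.groupby("type")["price_usd"].agg(["mean", "max"])

print(totals)
print(summary)
```
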
Coursera: M4 - Merging and joining data

Pandas functions: concat( ) and merge( )
The function "concat( )" combines data either by adding it horizontally as new columns for existing rows, or vertically as new rows for existing columns.
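
A minimal sketch of concat( ) in both directions, plus a merge( ) on a shared key column, assuming pandas is installed. The q1, q2, and regions tables are made-up data; how="left" is one of several join types and is chosen here only as an example.

```python
import pandas as pd

q1 = pd.DataFrame({"id": [1, 2], "sales": [100, 150]})
q2 = pd.DataFrame({"id": [3, 4], "sales": [90, 120]})
regions = pd.DataFrame({"id": [1, 2, 3], "region": ["north", "south", "east"]})

# Vertically: new rows for the existing columns
stacked = pd.concat([q1, q2], axis=0, ignore_index=True)

# Horizontally: new columns for the existing rows
side_by_side = pd.concat([q1, q2], axis=1)

# merge() joins two tables on a shared key; unmatched keys get NaN
joined = stacked.merge(regions, on="id", how="left")

print(stacked)
print(joined)
```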

----- =====

Popular IDEs (Integrated Development Environments) for Python Development