Intro to Data Science in Python
Introduction to Data Science in Python
- University of Michigan
Module 1 - Python Types and Sequences
https://www.coursera.org/learn/python-data-analysis/home/info
Module 1 - Python More on Strings
Module 1 - Python Demo: Read and write CSV files
Module 1 - Python Dates and Times
Module 1 - Python Objects and Map()
- This line creates a list named people, which contains strings representing the names of faculty members, including their titles (e.g., "Dr.") and full names.
def split_title_and_name(person):
- This line defines a function named split_title_and_name that takes a single parameter, person. This parameter will represent each string from the people list.
title = person.split()[0]
- The split() method is called on the person string. With no arguments, this method splits the string into a list of words on whitespace.
- The first element of this list, which represents the title (e.g., "Dr."), is assigned to the variable title.
lastname = person.split()[-1]
- Similar to the previous line, the split() method is used again.
- Instead of accessing the first element, [-1] is used to get the last element of the split list, which represents the last name of the person (e.g., "Brooks").
return '{} {}'.format(title, lastname)
- This line constructs a new string using the format() method.
- It combines the title and lastname into a single string in the format "Title Lastname" (e.g., "Dr. Brooks") and returns this string from the function.
list(map(split_title_and_name, people))
- The map() function applies the split_title_and_name function to each element of the people list.
- The result of map() is a lazy iterator (a map object), which is then converted to a list using the list() constructor.
- This creates a new list containing the formatted strings generated by the split_title_and_name function.
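The walkthrough above describes each line but the notes do not show the people list itself. Here is a minimal runnable sketch that puts the pieces together; the three names are illustrative sample data, not the course's actual list.

```python
# Illustrative sample data (assumption): the original people list is not shown in these notes
people = ["Dr. Christopher Brooks", "Dr. Jane Doe", "Dr. John Q. Public"]

def split_title_and_name(person):
    title = person.split()[0]      # first word, e.g. "Dr."
    lastname = person.split()[-1]  # last word, e.g. "Brooks"
    return "{} {}".format(title, lastname)

lazy = map(split_title_and_name, people)  # nothing is computed yet
names = list(lazy)                        # list() drains the iterator
print(names)  # ['Dr. Brooks', 'Dr. Doe', 'Dr. Public']
```

Because a map object is an iterator, it can be consumed only once; calling list(lazy) a second time would return an empty list.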
Module 1 - Advanced Python Lambda and List Comprehensions
Question 1: Convert this function into a lambda:
Output:
Q2a:
Q3:
ans1.: by Poe
by sol.:
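Question 1 asks to convert a function into a lambda, but the function itself is not reproduced in these notes. The sketch below assumes it is split_title_and_name from the map() walkthrough, and shows the lambda used both with map() and with a list comprehension. The people list is illustrative sample data.

```python
# Illustrative sample data (assumption)
people = ["Dr. Christopher Brooks", "Dr. Jane Doe", "Dr. John Q. Public"]

def split_title_and_name(person):
    return person.split()[0] + " " + person.split()[-1]

# The same logic as a single-expression lambda
title_and_name = lambda person: person.split()[0] + " " + person.split()[-1]

via_map = list(map(title_and_name, people))
via_comprehension = [title_and_name(p) for p in people]
via_function = [split_title_and_name(p) for p in people]
print(via_comprehension)  # ['Dr. Brooks', 'Dr. Doe', 'Dr. Public']
```

A lambda is limited to one expression, which is why both split() calls are inlined rather than stored in title and lastname variables.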
Module 1 - Numerical Python Library
Module 1 - Manipulating Text with Regular Expressions
- Regex
- In this lecture we're going to talk about pattern matching in strings using regular expressions. Regular expressions, or regexes, are written in a condensed formatting language. In general, you can think of a regular expression as a pattern which you give to a regex processor along with some source data. The processor then parses that source data using that pattern and returns chunks of text back to a data scientist or programmer for further manipulation. There are really three main reasons you would want to do this: to check whether a pattern exists within some source data, to get all instances of a complex pattern from some source data, or to clean your source data using a pattern, generally through string splitting. Regexes are not trivial, but they are a foundational technique for data cleaning in data science applications, and a solid understanding of regexes will help you quickly and efficiently manipulate text data for further data science applications.
- Now, you could teach a whole course on regular expressions alone, especially if you wanted to demystify how the regex parsing engine works and explore efficient mechanisms for parsing text. In this lecture I want to give you a basic understanding of how regexes work: enough knowledge that, with a little directed sleuthing, you'll be able to make sense of the regex patterns you see others use, and you can build up your practical knowledge of how to use regexes to improve your data cleaning. By the end of this lecture, you will understand the basics of regular expressions, how to define patterns for matching, how to apply these patterns to strings, and how to use the results of those patterns in data processing.
- Finally, note that in order to best learn regexes you need to write regexes. I encourage you to stop the video at any time and try out any new patterns or syntax you learn.
- Lab link: https://www.coursera.org/learn/python-data-analysis/ungradedLab/lwFS6/module-1-jupyter-notebooks/lab?path=%2Flab%2Ftree%2Fresources%2Fcourse1%2Fweek-1%2FRegex_ed.ipynb
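The three uses named above (checking whether a pattern exists, finding all instances, and cleaning data by splitting) map onto re.search, re.findall, and re.split in Python's standard re module. A minimal sketch; the sample sentence is made up for illustration.

```python
import re

# Illustrative sample text (assumption)
text = "Amy works diligently. Amy gets good grades. Our student Amy is successful."

# 1. Does the pattern exist? re.search returns a Match object or None
found = re.search("Amy", text) is not None

# 2. Get every instance of the pattern
all_amys = re.findall("Amy", text)

# 3. Clean the data by splitting on the pattern, then trimming empty/whitespace pieces
pieces = re.split("Amy", text)  # first piece is '' because the text starts with "Amy"
cleaned = [p.strip() for p in pieces if p.strip()]

print(found, all_amys)
print(cleaned)
```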
(Pattern inside)grades="ACAAAABCBCBAA"
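A sketch of common character-class and quantifier patterns applied to the grades string above. The specific patterns are illustrative choices; note that ^ means "not" inside square brackets but "start of string" outside them.

```python
import re

grades = "ACAAAABCBCBAA"

b_grades = re.findall("B", grades)          # every literal B
a_or_b = re.findall("[AB]", grades)         # any single A or B, in order
a_then_bc = re.findall("[A][B-C]", grades)  # an A followed by B or C
alternation = re.findall("AB|AC", grades)   # same matches via the | operator
not_a = re.findall("[^A]", grades)          # ^ inside brackets negates the set
starts_not_a = re.findall("^[^A]", grades)  # ^ outside brackets anchors to the start
a_runs = re.findall("A{2,10}", grades)      # runs of 2 to 10 consecutive As

print(b_grades, a_then_bc, not_a, a_runs)
```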
Q5:
Grok:
Week #1 - Quiz #1:
Quiz1 Link: https://www.coursera.org/learn/python-data-analysis/assignment-submission/1srgc/quiz-1
===== ===== =====
Module #2
Module 2 - Querying a Series
- Lab - Query a series - QueryingSeries_ed.ipynb: https://www.coursera.org/learn/python-data-analysis/ungradedLab/hEbnB/module-2-jupyter-notebooks/lab?path=%2Flab%2Ftree%2Fresources%2Fcourse1%2Fweek-2%2FQueryingSeries_ed.ipynb
- Poe explains:
Module 2 - DataFrame Data Structure
-----
Module 2 - DataFrame Indexing and Loading
In this lecture, you've learned how to import a CSV file into a pandas DataFrame object, and how to do some basic data cleaning to the column names. The CSV file import mechanisms in pandas have lots of different options, and you really need to learn these in order to be proficient at data manipulation. Once you have set up the format and shape of a DataFrame, you have a solid start to further actions such as conducting data analysis and modeling.
Now, there are other data sources you can load directly into dataframes as well, including HTML web pages, databases, and other file formats. But the CSV is by far the most common data format you'll run into, and an important one to know how to manipulate in pandas.
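A minimal sketch of loading CSV data into a DataFrame and cleaning its column names. The CSV text is hypothetical and is read from an in-memory buffer (io.StringIO) so the example runs without a file on disk; with a real file you would pass its path to pd.read_csv instead.

```python
import io

import pandas as pd

# Hypothetical CSV content; note the stray leading space in " GRE Score"
csv_text = """Serial No., GRE Score,TOEFL Score
1,337,118
2,324,107
"""

# index_col=0 makes the first column (Serial No.) the row index
df = pd.read_csv(io.StringIO(csv_text), index_col=0)

# Basic column-name cleaning: strip surrounding whitespace, then lower-case
df = df.rename(columns=str.strip)
df.columns = [c.lower() for c in df.columns]
print(df)
```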
-----
Module 2 - Querying a DataFrame
----- ----- -----
Quiz #2: Q1:
- Q2:
Not capital letter on the 1st row of the following table:
Q4: For the given DataFrame df, we want to keep only the records with a TOEFL score greater than 105. Which of the following will not work?
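The quiz's own DataFrame is not reproduced in these notes, so this sketch uses hypothetical data to show three approaches that do keep only rows with a TOEFL score above 105: boolean indexing, where() followed by dropna(), and query().

```python
import pandas as pd

# Hypothetical data (assumption)
df = pd.DataFrame({"TOEFL Score": [118, 107, 104, 110, 103]}, index=[1, 2, 3, 4, 5])

mask = df["TOEFL Score"] > 105  # boolean Series, one value per row

kept_by_mask = df[mask]  # boolean indexing keeps rows where mask is True
# where() sets non-matching rows to NaN (turning ints into floats); dropna() removes them
kept_by_where = df.where(mask).dropna()
# backticks quote a column name that contains a space
kept_by_query = df.query("`TOEFL Score` > 105")

print(kept_by_mask)
```

Without the dropna() call, where() keeps every row and only blanks out the non-matching values.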
-----
Module 3 - More Data Processing with Pandas
- Merging DataFrames
- Q1: Consider the two DataFrames shown below, both of which have Name as the index. Which of the following expressions can be used to get the data of all students (from student_df) including their roles as staff, where nan denotes no role?
(Hints: test in: MergingDataFrame_ed
/lab/tree/resources/course1/week-3/MergingDataFrame_ed.ipynb)
- Q2: Consider a DataFrame named df with columns named P2010, P2011, P2012, P2013, P2014 and P2015 containing float values. We want to use the apply method to get a new DataFrame named result_df with a new column AVG. The AVG column should average the float values across P2010 to P2015. The apply method should also remove the 6 original columns (P2010 to P2015). For that, what should the values of x and y be in the given code?
(Hints: PandasIdioms_ed /lab/tree/resources/course1/week-3/PandasIdioms_ed.ipynb)
- Q3: Consider the Dataframe df below, instantiated with a list of grades, ordered from best grade to worst. Which of the following options can be used to substitute X in the code given below, if we want to get all the grades between 'A' and 'B' where 'A' is better than 'B'?
(Hints: Scales /lab/tree/resources/course1/week-3/Scales.ipynb)
- Q4: Consider the DataFrame df shown in the image below. Which of the following can return the head of the pivot table as shown in the image below df? (Hints: PivotTable_ed /lab/tree/resources/course1/week-3/PivotTable_ed.ipynb)
- Q5: Assuming that the date '11/29/2019' in MM/DD/YYYY format is the 4th day of the week, what will be the result of the following? (hints: use DateFunctionality_ed
- Q6: Consider a DataFrame df. We want to create groups based on the column group_key in the DataFrame and fill the nan values with group means using: (hints: GroupBy_ed /lab/tree/resources/course1/week-3/GroupBy_ed.ipynb )
- Q7: Consider the DataFrames above, both of which have a standard integer based index. Which of the following can be used to get the data of all students (from student_df) and merge it with their staff roles where nan denotes no role? (hints: using MergingDataFrame_ed
- Q8: Consider a DataFrame df with columns name, reviews_per_month, and review_scores_value. This DataFrame also consists of several missing values. Which of the following can be used to:
- Q9: What will be the result of the following code?:
- Q10: Which of the following is not a valid expression to create a Pandas GroupBy object from the DataFrame shown below?
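The code and images these quiz questions refer to are not reproduced in these notes, so the following is not a set of answers. It is a set of small sketches of the pandas idioms the questions exercise (left merges on an index or a column, row-wise apply, ordered categoricals, pivot tables, date offsets, and group-wise NaN filling), using hypothetical data throughout.

```python
import numpy as np
import pandas as pd

# --- Left merges: keep every student, NaN where there is no staff role ---
staff = pd.DataFrame({"Name": ["Kelly", "Sally", "James"],
                      "Role": ["Director of HR", "Course liaison", "Grader"]})
students = pd.DataFrame({"Name": ["James", "Mike", "Sally"],
                         "School": ["Business", "Law", "Engineering"]})
# Both frames indexed by Name (as in Q1)
by_index = pd.merge(students.set_index("Name"), staff.set_index("Name"),
                    how="left", left_index=True, right_index=True)
# Default integer index, join on the Name column (as in Q7)
by_column = pd.merge(students, staff, how="left", on="Name")

# --- apply(): row-wise average that also drops the source columns ---
YEARS = ["P2010", "P2011", "P2012", "P2013", "P2014", "P2015"]
df = pd.DataFrame({c: [float(i + 1), float(2 * (i + 1))] for i, c in enumerate(YEARS)})

def add_avg(row):
    out = row.drop(YEARS)           # new Series without the year columns
    out["AVG"] = row[YEARS].mean()  # average across P2010..P2015
    return out

result_df = df.apply(add_avg, axis="columns")

# --- Ordered categorical: comparisons follow the category order ---
order = ["D", "D+", "C-", "C", "C+", "B-", "B", "B+", "A-", "A", "A+"]  # worst -> best
grade_type = pd.CategoricalDtype(categories=order, ordered=True)
grades_df = pd.DataFrame({"Grades": ["A+", "A", "A-", "B+", "B", "B-", "C+"]})
g = grades_df["Grades"].astype(grade_type)
strictly_between = grades_df[(g > "B") & (g < "A")]  # use >= / <= to include the endpoints

# --- pivot_table: one row per Year, one column per Type, mean Score ---
scores = pd.DataFrame({"Year": [2019, 2019, 2020, 2020, 2020],
                       "Type": ["A", "B", "A", "B", "B"],
                       "Score": [1.0, 2.0, 3.0, 4.0, 6.0]})
table = scores.pivot_table(values="Score", index="Year", columns="Type", aggfunc="mean")

# --- Timestamps and offsets ---
ts = pd.Timestamp("11/29/2019")        # parsed as MM/DD/YYYY
weekday = ts.weekday()                 # Monday is 0, so a Friday is 4
month_end = ts + pd.offsets.MonthEnd() # rolls forward to the end of the month

# --- groupby + transform: fill NaN with each group's mean ---
vals = pd.DataFrame({"group_key": ["a", "a", "a", "b", "b"],
                     "value": [1.0, 3.0, np.nan, 10.0, np.nan]})
vals["value"] = vals.groupby("group_key")["value"].transform(lambda s: s.fillna(s.mean()))

print(by_index, result_df, table, sep="\n\n")
```

transform() returns a result aligned to the original rows, which is why it can be assigned straight back to a column; agg() would instead return one row per group.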
Strings - Basics / Slicing
msg='welcome to Python 101: Strings'
msg1=msg[18]+' '+msg[:8]+msg[25:29]+msg[7:11]+msg[13]+msg[12]+msg[2]+msg[1]+msg[-5]
print(msg1.title())
print(msg1[::-1].title())
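The slicing exercise above is easier to follow when each piece is evaluated on its own. This sketch rebuilds msg1 from the same indices and annotates what each one yields; [::-1] reverses a string and title() capitalizes the first letter of each word.

```python
msg = 'welcome to Python 101: Strings'

pieces = [
    msg[18],     # '1'
    ' ',
    msg[:8],     # 'welcome ' (indices 0-7)
    msg[25:29],  # 'ring' (end index 29 is excluded)
    msg[7:11],   # ' to '
    msg[13],     # 't'
    msg[12],     # 'y'
    msg[2],      # 'l'
    msg[1],      # 'e'
    msg[-5],     # 'r' (negative indices count from the end)
]
msg1 = ''.join(pieces)

print(msg1.title())        # 1 Welcome Ring To Tyler
print(msg1[::-1].title())  # Relyt Ot Gnir Emoclew 1
```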
Strings-2
User Inputs
User Input - Exercise
ans.:
Lists - Basics 1
Lists 2:
Lists Exercise:
https://www.coursera.org/learn/learn-python-1/ungradedWidget/bauQ8/lists-exercise
Split & Join:
Tuples:
Using Sets
Sets - Exercises
-----
Scrimba: Free Python Learning Resources: https://www.v1.scrimba.com/learn/python
How I Would Learn Python FAST (if I could start over)
-----
3) Youtube Video: 5 AI Projects You Can Build This Weekend (with Python)