Solving real world data science tasks with Python Beautiful Soup! (movie dataset creation)

Data is everywhere! Enhance your career and acquire new skills by taking a course on DataCamp! Click here to take the first chapter of any course for FREE: https://datacamp.pxf.io/c/3588040/101... (you’ll be supporting my channel too!)

In this video we scrape Wikipedia pages to create a dataset on Disney movies.

The video is structured around tasks for you to try on your own. For the best learning experience, pause the video at each task, attempt it yourself, and then resume when you want to see how I would solve it.

We cover a wide range of Python & data science topics in this video. They include:
- Web scraping with BeautifulSoup
- Cleaning data
- Testing code with pytest
- Pattern matching with regular expressions (re library)
- Working with dates (datetime library)
- Saving & loading data with the pickle library
- Accessing data from an API using the Requests library

Link to code & datasets: https://github.com/KeithGalli/disney-...
Previous tutorial on Beautiful Soup: Comprehensive Python Beautiful Soup W...

If you enjoyed this video, make sure to like & subscribe :)

This video was sponsored by DataCamp

---------------------
Video timeline!
0:00 - Video overview
1:58 - Check out DataCamp! (sponsored)
3:12 - Setup

Task #1: Scrape the infobox from the Toy Story 3 wiki page (save in a Python dictionary) (4:24)
Link: https://en.wikipedia.org/wiki/Toy_Sto...

Task #2: Scrape infobox for all movies in List of Disney Films (save as list of dictionaries) (28:52)
Link: https://en.wikipedia.org/wiki/List_of...
30:30 - Robots.txt (Are you allowed to scrape a site?)
32:52 - Task #2: Scrape infobox for all movies in List of Disney Films (save as list of dictionaries)
57:27 - Save & Load dataset checkpoint (JSON file)
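
A minimal sketch of the robots.txt check mentioned at 30:30, using the standard library's urllib.robotparser. The robots.txt content, the example.com URLs, and the can_scrape helper are made up for illustration; they are not the rules of any real site and not necessarily how the video does it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def can_scrape(robots_txt, url, user_agent="*"):
    """Return True if the given robots.txt rules allow user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(can_scrape(SAMPLE_ROBOTS_TXT, "https://example.com/wiki/Some_Page"))
print(can_scrape(SAMPLE_ROBOTS_TXT, "https://example.com/private/page"))
```

For a live site, the same parser can download the file itself with set_url() followed by read() before calling can_fetch().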

Task #3: Clean our data! (1:02:04)
1:09:28 - Task #3.1: Strip out all references ([1], [2], etc.) from HTML
1:16:39 - Task #3.2: Split up the long strings
1:25:02 - Task #3.3: Examine errors we are getting
1:30:27 - Task #3.4: Convert “Running time” field to an integer
1:44:57 - Task #3.5: Convert “Budget” & “Box office” fields to floats
2:33:53 - Task #3.6: Convert dates into datetime objects
2:47:36 - Saving our data again (using Pickle)
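
The cleaning steps in Task #3 could look roughly like the sketch below. It works on already-extracted strings rather than on the HTML, and every function name, regular expression, date format, and sample value is an assumption chosen for illustration, not the code from the video. Real infobox values are messier (ranges, multiple currencies, several dates) and need more cases than this.

```python
import pickle
import re
from datetime import datetime

# Scale words that commonly follow a dollar amount.
MULTIPLIERS = {"thousand": 1e3, "million": 1e6, "billion": 1e9}


def strip_references(text):
    """Remove bracketed citation markers such as [1] or [23]."""
    return re.sub(r"\[\d+\]", "", text).strip()


def minutes_to_int(running_time):
    """Return the first number in a string like '103 minutes', or None."""
    match = re.search(r"\d+", running_time)
    return int(match.group()) if match else None


def money_to_float(text):
    """Convert '$200 million' or '$200,000' to a float in dollars, or None."""
    pattern = r"\$\s*(\d[\d,]*(?:\.\d+)?)\s*(thousand|million|billion)?"
    match = re.search(pattern, text, flags=re.IGNORECASE)
    if not match:
        return None
    value = float(match.group(1).replace(",", ""))
    word = match.group(2)
    return value * MULTIPLIERS[word.lower()] if word else value


def parse_date(text, formats=("%B %d, %Y", "%d %B %Y")):
    """Try each format in turn; return a datetime, or None if none match."""
    for fmt in formats:
        try:
            return datetime.strptime(text.strip(), fmt)
        except ValueError:
            continue
    return None


# Made-up sample strings in the style of a scraped infobox.
raw = {
    "Running time": "103 minutes[1]",
    "Budget": "$200 million[2]",
    "Release date": "June 18, 2010",
}

clean = {
    "Running time": minutes_to_int(strip_references(raw["Running time"])),
    "Budget": money_to_float(strip_references(raw["Budget"])),
    "Release date": parse_date(raw["Release date"]),
}

# datetime objects are not JSON-serializable by default; pickle keeps them intact.
restored = pickle.loads(pickle.dumps(clean))
print(restored)
```

Returning None for values that fail to parse keeps the loop over all movies running, so the failures can be inspected afterwards instead of stopping at the first bad record.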

Task #4: Attach IMDb, Metascore, and Rotten Tomatoes scores to dataset (working with APIs) (2:53:18)

Task #5: Save final dataset as a JSON file and as a CSV file (3:13:48)
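
One possible way to do the final export in Task #5 with only the standard library is sketched below. The field names, sample records, and helper names are hypothetical, and writing datetime values as ISO 8601 strings is an assumption made here so that json can handle them.

```python
import csv
import json
import tempfile
from datetime import datetime
from pathlib import Path


def serialize_dates(value):
    """Fallback for json.dump: write datetime objects as ISO 8601 strings."""
    if isinstance(value, datetime):
        return value.isoformat()
    raise TypeError(f"Cannot serialize {type(value).__name__}")


def save_json(movies, path):
    with open(path, "w", encoding="utf-8") as f:
        json.dump(movies, f, ensure_ascii=False, indent=2, default=serialize_dates)


def save_csv(movies, path):
    # Records can have different keys, so use the union of keys in first-seen order.
    fieldnames = list(dict.fromkeys(key for movie in movies for key in movie))
    with open(path, "w", encoding="utf-8", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(movies)  # missing keys are written as empty cells


# Made-up records, for illustration only.
movies = [
    {"title": "Example Movie A", "running_time": 103, "release_date": datetime(2010, 6, 18)},
    {"title": "Example Movie B", "budget": 2.0e8},
]

with tempfile.TemporaryDirectory() as folder:
    json_path = Path(folder) / "movies.json"
    csv_path = Path(folder) / "movies.csv"
    save_json(movies, json_path)
    save_csv(movies, csv_path)

    # Read both files back to confirm they round-trip.
    loaded = json.loads(json_path.read_text(encoding="utf-8"))
    with open(csv_path, encoding="utf-8", newline="") as f:
        rows = list(csv.DictReader(f))

print(loaded)
print(rows)
```

Note that CSV stores everything as text, so numbers and dates come back as strings when the file is read again.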

---------------------
Extra resources!
Setup Jupyter notebook: https://jupyter.readthedocs.io/en/lat...
Google Colab (cloud-based notebook): https://colab.research.google.com/
Learn regular expressions: Python Tutorial: re Module - How to W...

Practice your Python Pandas data science skills with problems on StrataScratch!
https://stratascratch.com/?via=keith

Join the Python Army to get access to perks!
YouTube - /@keithgalli
Patreon - /keithgalli

---------------------
Follow me on social media!
Instagram | /keithgalli
Twitter | /keithgalli

If you are curious to learn how I make my tutorials, check out this video: How to Make a High Quality Tutorial V...

*I use affiliate links on the products that I recommend. I may earn a purchase commission or a referral bonus from the usage of these links.
