Description
Scrapy Masterclass: Web Scraping Python and Data Pipelines is a course published on Udemy. Work on 7 real-world web scraping projects using Scrapy, Splash, and Selenium. Create data pipelines locally and on AWS.
Everyone tells you what to do with the data you already have. But how do you get this data in the first place? Most discussions about data engineering and data science today focus on how to analyze and process datasets to extract useful insights. However, they all assume that these datasets are already available and were somehow collected for you. They spend little time showing you how to get your hands on the data in the first place. This course fills that gap. Scrapy Masterclass prepares you to extract interesting data from websites and build powerful web scraping pipelines. True, there are plenty of datasets out there that you can use for free or for a fee. But what happens if these datasets are outdated? What if they don’t meet your specific needs? It’s best to know how to build your own dataset from scratch, regardless of how unstructured your data source is.
Scrapy is a Python web scraping framework. Thousands of businesses and professionals use it to collect data and create datasets, which they then sell or use in their own projects. You can become one of these professionals, or even start your own business based on data collection. Data scientists and data engineers are among the highest paid in the industry today, but they can’t do anything without enough data to work with. In this course, I’ll show you how to capture, organize, and store unstructured data from websites built with HTML, CSS, and JavaScript. By mastering this skill, you can begin your data engineering or data science career with an additional skill under your belt: web scraping. You will also learn the next steps after obtaining your data. ETL (Extract, Transform, and Load) starts with Scrapy (Extract), but this course also covers the other two stages (Transform and Load). Using Scrapy Pipelines, we’ll see how to store our data in SQL and NoSQL databases, Elasticsearch clusters, event brokers like Kafka, object storage like S3, and message queues like AWS SQS. Even if you don’t know anything about web scraping or data collection, and even if this all seems new to you, you’ve come to the right place.
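As a rough illustration of the Transform and Load stages, here is a minimal sketch of a Scrapy item pipeline that cleans each item and stores it in a SQL database. It is not taken from the course: the class name `SQLitePipeline`, the `products` table, the `name` and `price` fields, and the choice of SQLite are all assumptions made for the example. A Scrapy item pipeline is an ordinary Python class that implements `process_item(item, spider)`, with optional `open_spider` and `close_spider` hooks.

```python
import sqlite3


class SQLitePipeline:
    """Hypothetical item pipeline: transform each item, then load it into SQLite."""

    def __init__(self, db_path="items.db"):
        self.db_path = db_path  # assumed file name, not from the course
        self.conn = None

    def open_spider(self, spider):
        # Scrapy calls this once when the spider starts.
        self.conn = sqlite3.connect(self.db_path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS products (name TEXT, price REAL)"
        )

    def process_item(self, item, spider):
        # Transform: trim whitespace and turn a string such as "$1,299.00" into a float.
        name = item["name"].strip()
        price = float(item["price"].replace("$", "").replace(",", ""))
        # Load: insert the cleaned row with a parameterized query.
        self.conn.execute(
            "INSERT INTO products (name, price) VALUES (?, ?)", (name, price)
        )
        self.conn.commit()
        return item  # hand the item on to the next pipeline, if any

    def close_spider(self, spider):
        # Scrapy calls this once when the spider finishes.
        self.conn.close()
```

In a real project, a pipeline like this is switched on through the `ITEM_PIPELINES` setting, for example `ITEM_PIPELINES = {"myproject.pipelines.SQLitePipeline": 300}`, where the module path `myproject.pipelines` is a placeholder.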
What you will learn in the Scrapy Masterclass: Web Scraping Python and Data Pipelines course:
- Extract data from the toughest websites using Scrapy
- Create ETL pipelines and store data in CSV, JSON, MySQL, MongoDB, and S3.
- Avoid getting banned and get past bot protection techniques.
- Use Splash to scrape JavaScript-enabled websites.
- Use the power of Selenium browser automation to scrape any website.
- Deploy your Scrapy bots in on-premises and AWS environments.
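One common way to reduce the risk of getting banned is to crawl slowly and predictably. The fragment below is a sketch of a Scrapy `settings.py` using built-in throttling settings. The values and the user agent string are illustrative assumptions, not recommendations from the course.

```python
# settings.py (fragment): illustrative values, not taken from the course
ROBOTSTXT_OBEY = True                # respect the site's robots.txt rules
DOWNLOAD_DELAY = 2                   # base wait, in seconds, between requests to the same site
RANDOMIZE_DOWNLOAD_DELAY = True      # vary the wait between 0.5x and 1.5x of DOWNLOAD_DELAY
CONCURRENT_REQUESTS_PER_DOMAIN = 2   # limit parallel requests per domain
AUTOTHROTTLE_ENABLED = True          # adapt the delay to the latency the server shows
AUTOTHROTTLE_START_DELAY = 1         # initial delay, in seconds
AUTOTHROTTLE_MAX_DELAY = 30          # upper bound on the delay under high latency
RETRY_TIMES = 3                      # retry failed requests a few times before giving up
USER_AGENT = "example-bot (+https://example.com/contact)"  # placeholder identity
```

Lower concurrency and longer delays make a crawl slower but lighter on the target server, so tune the values per site.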
Who should attend:
- Anyone wanting to automate data collection from websites (web scraping) using Scrapy.
- Anyone wanting to start a business around data collection and web scraping.
- Data engineers, data scientists, ML engineers who want to master web scraping for their data collection needs.
- Developers, DevOps engineers or IT professionals who want to move into data engineering.
- Python programmers who want to learn more about Scrapy or web scraping in general.
Course Specifications
- Publisher: Udemy
- Instructor: Ahmed Elfakharany
- Language: French
- Training level: Introductory to Advanced
- Number of lessons: 40
- Training duration: 5 hours and 44 minutes
Course topics as of 2022/12
Course prerequisites
- A little Python experience
- All projects run on Python 3.10, so it must be installed
- Knowledge of Linux is recommended but not strictly required
- Familiarity with the HTTP protocol and HTML
Installation guide
After extracting, watch with your favorite player.
English subtitles
Quality: 720p
File size
2.85 GB