Python Scrapy Tutorial - 23 - Bypass Restrictions using User-Agent


In the last video we scraped the book section of Amazon, and we used something known as a user-agent to bypass its restrictions. So what exactly is this user-agent, and how is it able to bypass the restrictions placed by Amazon?

Whenever a browser like Chrome or Mozilla Firefox visits a website, that website asks for the identity of the browser. That identity is known as a user-agent. If we keep presenting the same identity to a website like Amazon, it places restrictions and sometimes bans the computer from visiting Amazon altogether.
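To make that concrete, here is a minimal sketch (not from the video) showing that the user-agent is just an HTTP header carried by every request. The spider name and the httpbin URL are placeholders chosen for illustration:

```python
import scrapy


class WhoAmISpider(scrapy.Spider):
    name = "whoami"
    # httpbin echoes back the headers it received, so we can see our identity.
    start_urls = ["https://httpbin.org/headers"]

    def parse(self, response):
        # Unless overridden, Scrapy identifies itself with something like
        # "Scrapy/2.x (+https://scrapy.org)" — easy for a site to spot and block.
        self.logger.info("Headers the site saw: %s", response.text)
```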

So there are two ways to trick Amazon. The first is to use a user-agent that Amazon allows. For example, Amazon has to allow Google to crawl its website if it wants its products to show up in Google Search. So we can simply replace our user-agent with that of Google's crawler, which is known as Googlebot, and trick Amazon into thinking that Google is crawling the website and not us. This is exactly what we did in the last video: we found Googlebot's user-agent string with a quick Google search and then replaced our own user-agent with it (a sketch of that follows below).
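A minimal sketch of that approach, assuming a standard Scrapy project: override the USER_AGENT setting in settings.py. The exact Googlebot string below is an assumption and may differ from the one used in the video:

```python
# settings.py — replace Scrapy's default identity with Googlebot's user-agent
# string so the site treats the crawler as Google. The string below is an
# assumption; check the current Googlebot user-agent before relying on it.
USER_AGENT = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
```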

The other way is to keep rotating our user-agents. If Amazon identifies our computer by its user-agent, then we can cycle through a list of fake user-agents and trick Amazon into thinking that many different browsers are visiting the website instead of just one. That is what we will be learning in this video (one possible sketch of the idea follows below).
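As a preview, here is one possible sketch of rotating user-agents with a custom downloader middleware. The class name, the user-agent strings, and the "myproject" module path are all placeholders, and the video itself may take a different route (for example a ready-made rotation library):

```python
# middlewares.py — a sketch of user-agent rotation via a downloader middleware.
import random


class RotateUserAgentMiddleware:
    # A small pool of fake browser identities; a real project would use a
    # longer, up-to-date list.
    USER_AGENTS = [
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0 Safari/537.36",
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/14.1 Safari/605.1.15",
        "Mozilla/5.0 (X11; Linux x86_64; rv:89.0) Gecko/20100101 Firefox/89.0",
    ]

    def process_request(self, request, spider):
        # Give every outgoing request a different browser identity.
        request.headers["User-Agent"] = random.choice(self.USER_AGENTS)
        return None


# settings.py — enable the middleware ("myproject.middlewares" is a placeholder
# for your own project's module path).
DOWNLOADER_MIDDLEWARES = {
    "myproject.middlewares.RotateUserAgentMiddleware": 400,
}
```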

Next video - Bypass restrictions using Proxies
   • Python Scrapy Tutorial - 24 - Bypass ...  

Full playlist -    • Python Web Scraping & Crawling using ...  

Subscribe -    / @buildwithpython  
Website - www.buildwithpython.com
Instagram -   / buildwithpython  

#python
