Scrape Google Search Results With Python (2023)
The Python programming language was created by Guido van Rossum and first released in 1991, with its main emphasis on code readability and clear, concise syntax.
This tutorial will teach us to scrape Google Search Results using Python. Python has gained vast popularity in the web scraping community due to advantages like readability, scalability, etc. This makes it a great alternative to other programming languages and a perfect choice for web scraping tasks.
This blog post will not only focus on scraping Google but also explain why Python is a strong choice for extracting data from Google and what the benefits of collecting information from Google are.
We are going to use HTTPX and BS4 for scraping and parsing the raw HTML data.
By the end of this article, you will have a basic understanding of scraping Google Search Results with Python. You can also leverage this knowledge for future web scraping projects with other programming languages.
Why Scrape Google Search Results?
There are several benefits of scraping Google Search Results:
SEO — Scrape Google Search Results to track your website performance on Google to monitor the rankings for a particular set of keywords.
Ad Monitoring — Monitor your competitors’ ads for a variety of keywords and gain insights into their marketing tactics to capture the audience in the market.
Lead Generation — It can also be used to extract contact information about your potential clients whom you can later target for marketing and communication purposes.
Why Python for Scraping Google?
Python is a robust and powerful language that has given great importance to its code readability and clarity. This enables beginners to learn and implement scraping scripts quickly and easily. It also has a large and active community of developers who can help you in case of any problem in your code.
Another advantage of using Python is that it offers a wide range of frameworks and libraries specifically designed for scraping data from the web, including Scrapy, BeautifulSoup, Playwright, and Selenium.
Overall, Python offers numerous advantages like high performance, scalability, and various other scraping resources. This makes it an excellent choice not only for extracting data from Google but also for other web scraping tasks.
Scraping Google Search Results Using Python
In this blog post, we will create a basic Python script to scrape the first ten Google Search Results, including their title, description, and link.
Set-Up
If you have not installed Python on your device yet, you can install it directly from the official Python website.
Installing Libraries
Now, let’s install the necessary libraries for this project in our folder.
Beautiful Soup —A third-party library to parse the extracted HTML from the websites.
HTTPX— A fully featured HTTP client for Python to extract data from websites.
If you don’t want to read their documentation, install these two libraries by running the below commands.
pip install httpx
pip install beautifulsoup4
Process
So, we have completed the setup of our Python project for scraping Google. Let us first import the libraries we will use further in this tutorial.
import httpx
import asyncio
from bs4 import BeautifulSoup
Then, we will define an asynchronous function that will scrape the organic data from this webpage.
async def get_organic_data():
    headers = {
        "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4703.0 Safari/537.36"
    }
    async with httpx.AsyncClient() as client:
        response = await client.get("https://www.google.com/search?q=python+tutorial&gl=us&hl=en", headers=headers)
After defining the function, we set a User-Agent header in the headers dictionary so that our scraping bot looks like a regular browser.
User-Agent is a request header that identifies the software making the request, for example the browser and operating system.
Refer to this guide if you want to learn more about headers: Web Scraping With Python
Then, we used an asynchronous context manager to create an HTTP client. Finally, we used this client to make an HTTP GET request to our target URL with the specified headers, using the await keyword to wait for the response from the server.
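The target URL in the request carries three query parameters: q (the search terms), gl (the country code), and hl (the interface language). As a minimal sketch, assuming you want to search for other keywords without editing the string by hand, the URL can be assembled with the standard library. The function name build_search_url and its defaults are illustrative choices, not part of the original script.

```python
from urllib.parse import urlencode

GOOGLE_SEARCH_URL = "https://www.google.com/search"


def build_search_url(query: str, country: str = "us", language: str = "en") -> str:
    """Return a Google search URL for the given query.

    q = search terms, gl = country code, hl = interface language,
    matching the parameters of the URL used in this tutorial.
    """
    params = {"q": query, "gl": country, "hl": language}
    # urlencode escapes special characters and turns spaces into "+"
    return f"{GOOGLE_SEARCH_URL}?{urlencode(params)}"


print(build_search_url("python tutorial"))
# prints https://www.google.com/search?q=python+tutorial&gl=us&hl=en
```

The returned string can be passed to client.get() in place of the hard-coded URL.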
Now, within the context manager, we will create a BeautifulSoup object to parse and navigate through the HTML.
    async with httpx.AsyncClient() as client:
        response = await client.get("https://www.google.com/search?q=python+tutorial&gl=us&hl=en", headers=headers)
        soup = BeautifulSoup(response.content, "html.parser")
After creating the Beautiful Soup object, we will locate the tags for the required elements from the HTML.
If you inspect the webpage, you will see that every organic result is inside a div container with class g.
So, we will loop over every div having class g to get the required information from the HTML.
        organic_results = []
        for el in soup.select(".g"):
Then, we will locate the tags for the title, description, and link.
If you inspect the HTML further, you will find that the title is in an h3 tag, the link matches the selector .yuRUbf > a, and the description matches the selector .VwiC3b.
        organic_results = []
        i = 0
        for el in soup.select(".g"):
            organic_results.append({
                "title": el.select_one("h3").text,
                "link": el.select_one(".yuRUbf > a")["href"],
                "description": el.select_one(".VwiC3b").text,
                "rank": i + 1
            })
            i += 1
        print(organic_results)

asyncio.run(get_organic_data())
Run this code in your terminal. You will be able to get the required data from Google.
[
  {
    "title": "The Python Tutorial \u2014 Python 3.11.3 documentation",
    "link": "https://docs.python.org/3/tutorial/",
    "description": "This tutorial introduces the reader informally to the basic concepts and features of the Python language and system. It helps to have a Python interpreter\u00a0...",
    "rank": 1
  },
  {
    "title": "Python Tutorial",
    "link": "https://www.w3schools.com/python/",
    "description": "Learn by examples! This tutorial supplements all explanations with clarifying examples. See All Python Examples. Python Quiz. Test your Python skills with a\u00a0...",
    "rank": 2
  },
  .....
Congratulations! You have successfully built a Python script to scrape Google Search Results.
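Note that calling print() on a Python list shows the Python representation (single quotes, unescaped characters), while the sample output above is formatted as JSON. A minimal sketch, assuming you want the JSON shape shown above, is to serialize the list with the standard json module. The sample record below is copied from the output above and shortened for illustration.

```python
import json

# Sample data shaped like the scraper's output (copied from the result above)
organic_results = [
    {
        "title": "The Python Tutorial \u2014 Python 3.11.3 documentation",
        "link": "https://docs.python.org/3/tutorial/",
        "rank": 1,
    }
]

# indent=2 pretty-prints the list; the default ensure_ascii=True writes
# non-ASCII characters, such as the dash in the title, as \uXXXX escapes
formatted = json.dumps(organic_results, indent=2)
print(formatted)
```

In the scraper itself, this would mean replacing print(organic_results) with print(json.dumps(organic_results, indent=2)).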
But this method still can’t be used to scrape data from Google at a large scale, as this can result in a permanent block of your IP by Google. Instead, you can try this Google Scraper API to scrape data from Google without getting blocked.
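If Google serves a block or consent page, or changes its class names, the selectors used earlier match nothing, and calling .text on the None returned by select_one raises an AttributeError. The following is a sketch of a more defensive parser under stated assumptions: the function name parse_organic_results is illustrative, and SAMPLE_HTML is a hand-written fragment that only imitates the class names from this tutorial, not Google's real markup.

```python
from bs4 import BeautifulSoup

# Hand-written fragment imitating the class names used in this tutorial;
# it is NOT Google's real markup, which changes over time.
SAMPLE_HTML = """
<div class="g">
  <div class="yuRUbf"><a href="https://docs.python.org/3/tutorial/"><h3>The Python Tutorial</h3></a></div>
  <div class="VwiC3b">This tutorial introduces the reader informally...</div>
</div>
<div class="g">
  <div class="yuRUbf"><a href="https://example.com/"><h3>No description here</h3></a></div>
</div>
"""


def parse_organic_results(html: str) -> list[dict]:
    """Extract title, link and description, skipping incomplete result blocks."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    for el in soup.select(".g"):
        title = el.select_one("h3")
        link = el.select_one(".yuRUbf > a")
        description = el.select_one(".VwiC3b")
        # select_one returns None when nothing matches, so check before use
        if title is None or link is None or description is None:
            continue
        results.append({
            "title": title.get_text(strip=True),
            "link": link.get("href"),
            "description": description.get_text(strip=True),
            "rank": len(results) + 1,
        })
    return results


print(parse_organic_results(SAMPLE_HTML))
```

Here the second block is skipped because it has no description. An empty list from a real response is a hint that the page was not a normal results page.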
Using Google Search API to Scrape Search Results
Serpdog provides an easy and streamlined solution to scrape Google Search Results with its robust SERP APIs, and it also solves the problem of dealing with proxies and CAPTCHAs for a smooth scraping journey. It also returns plenty of extra data beyond the organic results at affordable pricing.
You will also get 1000 free API credits upon signing up.
After registering on our website, you will get an API Key. Copy your API Key into the code below, and you will be able to scrape Google Search Results with Python quickly.
import requests

payload = {'api_key': 'APIKEY', 'q': 'python tutorial', 'gl': 'us'}
# requests URL-encodes the parameters, so the space in the query becomes "+" automatically
resp = requests.get('https://api.serpdog.io/search', params=payload)
print(resp.text)
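Hard-coding the API key makes it easy to leak when the script is shared. As a minimal sketch, assuming you prefer to keep the key outside the source file, the payload can be built from an environment variable. The variable name SERPDOG_API_KEY and the helper names build_payload and search are hypothetical choices for this example; the endpoint and parameters are the ones from the snippet above.

```python
import os

import requests

SERPDOG_SEARCH_URL = "https://api.serpdog.io/search"  # endpoint from the snippet above


def build_payload(query: str, country: str = "us") -> dict:
    """Build the query parameters, reading the API key from the environment."""
    # SERPDOG_API_KEY is a hypothetical variable name chosen for this sketch
    api_key = os.environ.get("SERPDOG_API_KEY")
    if not api_key:
        raise RuntimeError("Set the SERPDOG_API_KEY environment variable first")
    return {"api_key": api_key, "q": query, "gl": country}


def search(query: str) -> str:
    """Call the API and return the raw response body."""
    resp = requests.get(SERPDOG_SEARCH_URL, params=build_payload(query), timeout=30)
    resp.raise_for_status()  # raise on 4xx/5xx responses instead of returning an error body
    return resp.text
```

With the environment variable set, print(search("python tutorial")) prints the same raw response text as the snippet above.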
Conclusion:
In this tutorial, we learned to scrape Google Search Results using Python. Feel free to message me anything you need clarification on. Follow me on Twitter. Thanks for reading!