Using WebScrapingAPI with Python

In this guide, we will go over the best ways to use WebScrapingAPI as part of a larger Python script. Before beginning, make sure you have retrieved your unique API key, which you will find in the account dashboard after creating an account. The API key is required for every request sent to WebScrapingAPI. A basic request that scrapes the URL "https://httpbin.org/get" looks like this:

import requests

url = "https://api.webscrapingapi.com/v1"

params = {
  "api_key":"XXXXXX",
  "url":"https://httpbin.org/get"
}

response = requests.request("GET", url, params=params)

print(response.text)
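
Since every call goes through the same endpoint, it can be convenient to wrap the request in a small helper that also checks the HTTP status code before you use the body. The sketch below is one way to do that; the scrape() helper and its name are ours, not part of the API.

import requests

API_ENDPOINT = "https://api.webscrapingapi.com/v1"
API_KEY = "XXXXXX"  # replace with your own API key

def scrape(target_url, **extra_params):
    # Merge the mandatory parameters with any optional ones
    # (render_js, proxy_type, country, session, ...).
    params = {"api_key": API_KEY, "url": target_url, **extra_params}
    response = requests.get(API_ENDPOINT, params=params)
    # Fail fast on errors (bad key, exhausted quota, etc.) instead of
    # treating an error body as scraped content.
    response.raise_for_status()
    return response.text

print(scrape("https://httpbin.org/get"))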

Next, we will start adding request parameters to the API call in order to customize the request and subsequent results.

Rendering JavaScript

Some web pages render essential page elements using JavaScript, which means that some content is not present (and therefore not scrapable) on the initial page load. With the render_js parameter enabled, WebScrapingAPI accesses the target web page with a headless browser and allows JavaScript page elements to render before delivering the final scraping result. To enable JavaScript rendering, simply append the render_js HTTP GET parameter to your API request URL and set it to 1. By default, this parameter is set to 0 (off).

import requests

url = "https://api.webscrapingapi.com/v1"

params = {
  "api_key":"XXXXXX",
  "url":"https://httpbin.org/get",
  "render_js":"1"
}

response = requests.request("GET", url, params=params)

print(response.text)

Proxies

WebScrapingAPI draws on a pool of 100+ million IP addresses worldwide, making your requests extremely difficult to block. We maintain two separate pools: one of private data center IPs, and one of residential and mobile IPs.

Across both data center and residential proxies, WebScrapingAPI supports more than 195 global geolocations from which your scraping request can be sent.

Specify the proxy type for your request with the proxy_type parameter: proxy_type=datacenter for data center proxies, or proxy_type=residential for residential proxies.

Datacenter proxies can be selected by passing the parameter proxy_type=datacenter:

import requests

url = "https://api.webscrapingapi.com/v1"

params = {
  "api_key":"XXXXXX",
  "url":"https://httpbin.org/get",
  "proxy_type":"datacenter"
}

response = requests.request("GET", url, params=params)

print(response.text)

Residential proxies can be selected by passing the parameter proxy_type=residential:

import requests

url = "https://api.webscrapingapi.com/v1"

params = {
  "api_key":"XXXXXX",
  "url":"https://httpbin.org/get",
  "proxy_type":"residential"
}

response = requests.request("GET", url, params=params)

print(response.text)

Geolocation

Using the API’s country HTTP GET parameter, you can choose a specific country for your scraping request by its 2-letter country code. The example below specifies us (United States) as the proxy location, using the default datacenter proxies.

Data Center Proxies Supported Countries

For datacenter proxies, the API currently supports a total of 12 global geolocations: United States (us), Canada (ca), United Kingdom (uk), Germany (de), France (fr), Spain (es), Brazil (br), Mexico (mx), India (in), Japan (jp), China (cn), and Australia (au).

Residential Proxies Supported Countries

For premium (residential) proxies, the API currently supports a total of 40 global geolocations. A full list of supported countries and their 2-letter country codes is available in the WebScrapingAPI documentation.

Access to 195 countries is available to Enterprise customers upon request.

Geolocation can be specified with the country parameter:

import requests

url = "https://api.webscrapingapi.com/v1"

params = {
  "api_key":"XXXXXX",
  "url":"https://httpbin.org/get",
  "proxy_type":"datacenter",
  "country":"us"
}

response = requests.request("GET", url, params=params)

print(response.text)

Sessions

To reuse the same proxy across multiple requests, simply use the session parameter (e.g. session=123). The value can be any integer; every request that carries the same integer goes through the same proxy, and sending a new integer creates a new session. Sessions expire 15 minutes after their last use.

import requests

url = "https://api.webscrapingapi.com/v1"

params = {
  "api_key":"XXXXXX",
  "url":"https://httpbin.org/get",
  "proxy_type":"datacenter",
  "country":"us",
  "session":"100"
}

response = requests.request("GET", url, params=params)

print(response.text)
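
A quick way to check that a session really pins the proxy (assuming it behaves as described above) is to scrape https://httpbin.org/ip twice with the same session value; httpbin echoes back the IP address it sees, so both responses should report the same origin.

import requests

url = "https://api.webscrapingapi.com/v1"

params = {
  "api_key":"XXXXXX",
  "url":"https://httpbin.org/ip",
  "session":"100"
}

# httpbin.org/ip returns {"origin": "<caller IP>"}; with a shared
# session value, both calls should go out through the same proxy.
first = requests.request("GET", url, params=params).json()
second = requests.request("GET", url, params=params).json()

print(first["origin"], second["origin"])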

Custom Headers

If you would like to add custom or additional headers (user agents, cookies, etc.), simply send them with your API request and they will be forwarded to the target website.

Only use this feature to get customized results; do not use it to avoid blocks. WebScrapingAPI has a powerful internal engine that takes care of that for you.

Below, you will find an example request used to scrape the URL https://httpbin.org/headers, which will mirror the headers sent.

import http.client

conn = http.client.HTTPSConnection("api.webscrapingapi.com")
api_key = "XXXXXX"
url = "http%3A%2F%2Fhttpbin.org%2Fheaders"
full_url = f"/v1?api_key={api_key}&url={url}"

my_headers = {
    "My-header": "test",
    "Accept": "application/json",
    "User-Agent": "potato",
    "Cookie": "name1=value1; name2=value2"
}

conn.request("GET", full_url, headers=my_headers)

res = conn.getresponse()
data = res.read()

print(data.decode("utf-8"))

Custom Cookies

If you would like to send custom cookies to the target website, simply add them as parameters to the request or as the “Cookie” header.

Below, you will find an example request used to scrape the URL https://httpbin.org/cookies, which will mirror the cookies sent.

import http.client

conn = http.client.HTTPSConnection("api.webscrapingapi.com")
api_key = "XXXXXX"
url = "http%3A%2F%2Fhttpbin.org%2Fcookies"
full_url = f"/v1?api_key={api_key}&url={url}"

my_headers = {
    "Accept": "application/json",
    "Cookie": "name1=value1; name2=value2"
}

conn.request("GET", full_url, headers=my_headers)

res = conn.getresponse()
data = res.read()

print(data.decode("utf-8"))

POST Requests

(BETA) It is now possible to scrape forms or API endpoints directly. You can do this by sending a POST request to WebScrapingAPI with your api_key and url parameters.

Below, you will find an example request used to scrape the URL https://httpbin.org/post.

import requests

url = "https://api.webscrapingapi.com/v1"

params = {
  "api_key":"XXXXXX",
  "url":"https://httpbin.org/post"
}

headers = {
  "Content-Type": "application/x-www-form-urlencoded; charset=utf-8"
}

data = {
  "foo": "bar"
}

response = requests.request("POST", url, params=params, headers=headers, data=data)

print(response.text)

This request will return the following results:

{
  "args": {},
  "data": "",
  "files": {},
  "form": {
    "foo": "bar"
  },
  "headers": {
    "date": "Sat, 20 Mar 2021 20:17:27 GMT",
    "content-type": "application/json",
    "content-length": "566",
    "connection": "close",
    "server": "gunicorn/19.9.0",
    "access-control-allow-origin": "*",
    "access-control-allow-credentials": "true"
  },
  "json": null,
  "origin": "23.92.126.215",
  "url": "https://httpbin.org/post"
}
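
The example above sends a URL-encoded form body. If the endpoint you are scraping expects JSON instead, the same pattern should work with a JSON payload; the sketch below assumes the API forwards the request body and Content-Type header unchanged, as it does for form data.

import requests

url = "https://api.webscrapingapi.com/v1"

params = {
  "api_key":"XXXXXX",
  "url":"https://httpbin.org/post"
}

# json= serializes the payload and sets the Content-Type header
# to application/json automatically.
response = requests.request("POST", url, params=params, json={"foo": "bar"})

print(response.text)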

Forcing Timeouts

When using a browser to make requests, some URLs tend to load for a long time because some esoteric part of the DOM is still loading. For that reason, WebScrapingAPI returns all the HTML it could gather up to the moment the timeout was triggered. The example below demonstrates how to force a timeout after 3 seconds by setting timeout=3000 (the value is in milliseconds). The maximum value that can be set for this parameter is 14000.

import requests

url = "https://api.webscrapingapi.com/v1"

params = {
  "api_key":"XXXXXX",
  "url":"https://httpbin.org/get",
  "render_js":"1",
  "timeout":"200"
}

response = requests.request("GET", url, params=params)

print(response.text)

Binary Files

To scrape any type of file up to the 2 MB limit, just set the url parameter to the URL of that specific file. The response will include an object with the key base64_string, whose value is the file converted to a base64 string.

The following API request is the simplest invocation you can make, specifying only the url parameter alongside your API key:

import requests

url = "https://api.webscrapingapi.com/v1"

params = {
  "api_key":"XXXXXX",
  "url":"https://i.imgur.com/bHkmaqm.jpeg"
}

response = requests.request("GET", url, params=params)

print(response.text)

The result will be:

{
    "base64_string": "data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAAAQABAAD[...]oFcmgqvH0f+hxwQMj/9k="
}
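
To turn that response back into a file on disk, strip the data-URL prefix and base64-decode the remainder. The sketch below assumes the base64_string value always starts with a data:<mime>;base64, prefix, as in the result shown above; the downloaded.jpeg filename is our choice.

import base64

import requests

url = "https://api.webscrapingapi.com/v1"

params = {
  "api_key":"XXXXXX",
  "url":"https://i.imgur.com/bHkmaqm.jpeg"
}

response = requests.request("GET", url, params=params)

# The body is JSON with a single base64_string key holding a data URL,
# e.g. "data:image/jpeg;base64,<payload>".
data_url = response.json()["base64_string"]
payload = data_url.split(",", 1)[1]

with open("downloaded.jpeg", "wb") as f:
    f.write(base64.b64decode(payload))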

For a complete list of request parameters, please read the WebScrapingAPI documentation.
