Python urllib.error.HTTPError: HTTP Error 403: Forbidden

The urllib.error.HTTPError: HTTP Error 403: Forbidden error occurs when you try to scrape a webpage using the urllib.request module and a server-side security layer such as mod_security blocks the request. There are a few common reasons for this error. Let’s look at each of them in detail.

How to fix urllib.error.HTTPError: HTTP Error 403: Forbidden?

Websites are usually protected by an application gateway, WAF (web application firewall) rules, and similar layers that check whether requests come from real users or from an automated bot. mod_security or the WAF rule blocks requests that look like they come from a spider or bot. These checks are a common line of defense against abusive automated traffic, including DDoS attacks on the server.

Now, back to the error. When you make a request with urllib.request and do not set any headers, urllib sends a default User-Agent such as Python-urllib/3.9 (the suffix is the Python version), which mod_security detects easily.

mod_security is usually configured so that any request without a valid browser User-Agent header is blocked. The server then returns 403 Forbidden, which urllib raises as urllib.error.HTTPError: HTTP Error 403: Forbidden.
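To see the default User-Agent that your own Python installation would send, you can inspect a freshly built opener. This is a minimal sketch; the variable names are only illustrative, and the exact version suffix depends on your interpreter.

```python
from urllib.request import build_opener

# build_opener() returns an OpenerDirector. Its addheaders list holds the
# headers urllib adds to every request that does not set them itself.
default_headers = dict(build_opener().addheaders)
default_user_agent = default_headers["User-agent"]

# Prints something like Python-urllib/3.9, depending on your Python version.
print(default_user_agent)
```

A value like this is easy for a WAF rule to match, which is why replacing it is usually the first thing to try.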

Example of the 403 Forbidden error

from urllib.request import Request, urlopen

req = Request('http://www.cmegroup.com/')
webpage = urlopen(req).read()

Output

  File "C:\Users\user\AppData\Local\Programs\Python\Python39\lib\urllib\request.py", line 494, in _call_chain
    result = func(*args)
urllib.error.HTTPError: HTTP Error 403: Forbidden
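Instead of letting the traceback end the script, you can catch the exception and look at what the server returned. The sketch below is one possible way to do that; describe_http_error and try_fetch are hypothetical helper names, and the Server header may be absent on many sites.

```python
from urllib.error import HTTPError
from urllib.request import Request, urlopen


def describe_http_error(err):
    """Summarise an HTTPError as 'status reason (server: ...)'."""
    server = err.headers.get("Server", "unknown") if err.headers else "unknown"
    return f"{err.code} {err.reason} (server: {server})"


def try_fetch(url):
    """Return the page body, or None if the server rejects the request."""
    try:
        return urlopen(Request(url)).read()
    except HTTPError as err:
        # err.code is the numeric status (403 here), err.reason the phrase.
        print("Request rejected:", describe_http_error(err))
        return None
```

Calling try_fetch('http://www.cmegroup.com/') needs network access. If the site still blocks the default User-Agent, it prints the status line and returns None instead of raising.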

The easiest way to resolve the error is to pass a browser-like User-Agent as a request header, as shown below.

from urllib.request import Request, urlopen

req = Request('https://www.yahoo.com', headers={'User-Agent': 'Mozilla/5.0'})
webpage = urlopen(req).read()
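If a script makes many requests, setting the header on every Request can get repetitive. One alternative, sketched here, is to install an opener whose default headers already carry the User-Agent. The 'Mozilla/5.0' value is the same placeholder used above, and some sites may expect a fuller browser string.

```python
from urllib.request import build_opener, install_opener

opener = build_opener()

# Replace the default Python-urllib/x.y User-Agent for every request
# made through this opener.
opener.addheaders = [("User-Agent", "Mozilla/5.0")]

# Make plain urlopen() calls use this opener from now on.
install_opener(opener)
```

After this, a plain urlopen('https://www.yahoo.com') call sends the configured User-Agent without building a Request by hand.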

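You can also set a timeout in case the website does not respond. A timeout does not fix a 403 by itself, but it stops the script from hanging: Python raises a timeout exception (a urllib.error.URLError wrapping a socket timeout, or the socket timeout itself) if the website doesn’t respond within the given period.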

from urllib.request import Request, urlopen

req = Request('http://www.cmegroup.com/', headers={'User-Agent': 'Mozilla/5.0'})
webpage = urlopen(req, timeout=10).read()
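The timeout and the 403 can be handled in one place. The following is a sketch rather than a definitive implementation: fetch is a hypothetical helper, and its open_url parameter is an assumption added only so the function can be exercised without network access.

```python
import socket
from urllib.error import HTTPError, URLError
from urllib.request import Request, urlopen


def fetch(url, timeout=10, open_url=urlopen):
    """Return the response body, or None if the request fails."""
    req = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    try:
        with open_url(req, timeout=timeout) as response:
            return response.read()
    except HTTPError as err:
        # The server answered, but with an error status such as 403.
        print(f"HTTP error {err.code}: {err.reason}")
    except URLError as err:
        # No usable response: DNS failure, refused connection, connect timeout.
        print(f"Could not reach the server: {err.reason}")
    except socket.timeout:
        # The connection opened, but the server did not send data in time.
        print(f"No response within {timeout} seconds")
    return None
```

HTTPError is caught before URLError because it is a subclass of URLError. With the order reversed, the 403 branch would never run.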

In some cases, such as polling a real-time bitcoin or stock price, you may send a request every second. The server can block you if too many requests come from the same IP address, and it may respond with a 403 error.

If you get this error because of too many requests, add a delay between requests to resolve it.
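A delay between requests can be as simple as a time.sleep call in the polling loop. The sketch below assumes a hypothetical poll helper; fetch_once stands for whatever function performs a single request, and the 5-second default is an arbitrary starting point, not a value any particular site documents.

```python
import time


def poll(fetch_once, attempts=3, delay_seconds=5.0, sleep=time.sleep):
    """Call fetch_once() `attempts` times, pausing between calls."""
    results = []
    for i in range(attempts):
        results.append(fetch_once())
        # Pause before the next request, but not after the last one.
        if i < attempts - 1:
            sleep(delay_seconds)
    return results
```

If the server still returns 403 at a given rate, increasing delay_seconds is the first thing to try.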
