The urllib.error.HTTPError: HTTP Error 403: Forbidden error occurs when you try to scrape a webpage using the urllib.request module and a server-side security layer such as mod_security blocks the request. There are several reasons why you might get this error. Let’s take a look at each of them in detail.
How to fix urllib.error.HTTPError: HTTP Error 403: Forbidden?
Websites are usually protected by an application gateway, WAF (web application firewall) rules, or similar, which check whether requests come from actual users or from an automated bot. mod_security or the WAF rule blocks the requests it identifies as spider/bot traffic. Such security features are commonly used to help protect the server against abuse such as DDoS attacks.
Now, coming back to the error: when you make a request to a site using urllib.request without setting any headers, urllib sends a default User-Agent of the form Python-urllib/3.9 (the suffix is the Python version), which mod_security easily detects.
mod_security is usually configured so that any request without a valid browser User-Agent header is blocked. The server then responds with a 403 status, which urllib raises as urllib.error.HTTPError: HTTP Error 403: Forbidden.
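To see what urllib sends by default, you can inspect the headers of a freshly built opener. This is a small sketch that makes no network request; the variable names are illustrative only.

```python
import urllib.request

# build_opener() creates an opener with urllib's default headers attached.
opener = urllib.request.build_opener()

# addheaders lists the headers added to every request unless you override them.
default_headers = dict(opener.addheaders)

# Prints something like {'User-agent': 'Python-urllib/3.9'}, depending on
# the Python version you are running.
print(default_headers)
```

A value starting with Python-urllib/ is the kind of non-browser User-Agent that a security rule can match on.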
Example of 403 forbidden error
from urllib.request import Request, urlopen
req = Request('http://www.cmegroup.com/')
webpage = urlopen(req).read()
Output
File "C:\Users\user\AppData\Local\Programs\Python\Python39\lib\urllib\request.py", line 494, in _call_chain
result = func(*args)
urllib.error.HTTPError: HTTP Error 403: Forbidden
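If you would rather inspect the status than let the traceback end your script, you can catch the exception. The sketch below assumes a hypothetical helper named fetch_or_report; its opener parameter is not part of urllib and exists only so the function can be tried without network access.

```python
from urllib.error import HTTPError
from urllib.request import Request, urlopen


def fetch_or_report(url, opener=urlopen):
    """Return the page body, or None after printing the HTTP error status."""
    try:
        return opener(Request(url)).read()
    except HTTPError as err:
        # err.code is the numeric status (403 here); err.reason is its text.
        print(f"Request to {url} failed: HTTP {err.code} {err.reason}")
        return None
```

Calling fetch_or_report('http://www.cmegroup.com/') against a site that blocks the default User-Agent would be expected to print the 403 status and return None instead of raising.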
The easy way to resolve the error is by passing a valid user-agent as a header parameter, as shown below.
from urllib.request import Request, urlopen
req = Request('https://www.yahoo.com', headers={'User-Agent': 'Mozilla/5.0'})
webpage = urlopen(req).read()
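You can also set a timeout in case the website does not respond at all. A timeout does not by itself resolve a 403, but it stops the script from waiting indefinitely. If the website doesn’t respond within the given number of seconds, Python raises an exception: typically a urllib.error.URLError wrapping a socket timeout, or a socket timeout raised directly if the response stalls while it is being read.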
from urllib.request import Request, urlopen
req = Request('http://www.cmegroup.com/', headers={'User-Agent': 'Mozilla/5.0'})
webpage = urlopen(req, timeout=10).read()
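A request with a timeout can fail in more than one way, so it may help to handle each case separately. This is a minimal sketch, not a definitive implementation: fetch_with_timeout is a hypothetical helper, and its opener parameter is an assumption added only so the function can be exercised without a network connection.

```python
import socket
from urllib.error import HTTPError, URLError
from urllib.request import Request, urlopen


def fetch_with_timeout(url, timeout=10, opener=urlopen):
    """Return the page body, or None if the request fails or times out."""
    req = Request(url, headers={'User-Agent': 'Mozilla/5.0'})
    try:
        return opener(req, timeout=timeout).read()
    except HTTPError as err:
        # The server answered, but with an error status such as 403.
        # HTTPError is a subclass of URLError, so it must be caught first.
        print(f"HTTP error {err.code}: {err.reason}")
    except URLError as err:
        # No usable answer: DNS failure, refused connection, connect timeout.
        print(f"Could not reach the server: {err.reason}")
    except socket.timeout:
        # The connection was made, but reading the response stalled.
        print(f"Timed out after {timeout} seconds while reading the response")
    return None
```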
In some cases, such as fetching a real-time bitcoin or stock market value, you may send a request every second. The server can block you if too many requests come from the same IP address, and it may respond with a 403 error.
If you get this error because of too many requests, consider adding a delay between requests to resolve it.
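One way to space requests out is to sleep between them. The sketch below assumes a hypothetical helper, fetch_all_with_delay, that takes whatever fetch function you already use; the one-second default is only an illustration, since the delay a given site tolerates is not something this article can tell you.

```python
import time


def fetch_all_with_delay(urls, fetch, delay_seconds=1.0):
    """Call fetch(url) for each URL, pausing between consecutive requests."""
    results = []
    for index, url in enumerate(urls):
        if index > 0:
            # Wait before every request except the first one.
            time.sleep(delay_seconds)
        results.append(fetch(url))
    return results
```

For example, you could pass a small function that wraps urlopen with a browser User-Agent as fetch, and raise delay_seconds if the server keeps returning 403.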