Web scraping – How to solve Python requests library returning different HTML content for the same URL?


I am using Python's requests library to scrape a sports betting website, with the goal of building an API that returns betting odds. For the same URL, requests sometimes returns different HTML. I noticed this by running the script repeatedly in my terminal: roughly four runs out of five succeed, and the fifth fails because an HTML element I look up with BeautifulSoup is no longer there.

I know the HTML differs because I added a few lines to the script that write the entire response body to a .txt file. After one successful run, I changed the file name slightly so the next run would write a new file, and kept rerunning until the scrape failed with the error described above. I then compared the two files with the filecmp library, and they were not identical.
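filecmp only says that the files differ, not where. A minimal sketch of how the two saved responses could be diffed line by line with the standard-library difflib module is below. The function name diff_responses and the file names in the usage note are placeholders I made up for illustration, not names from my actual script.

```python
import difflib
from pathlib import Path


def diff_responses(good_path, bad_path, context=2):
    """Return a unified diff of two saved response bodies as a list of lines.

    good_path and bad_path are hypothetical names for the file written on a
    successful run and the file written on a failing run.
    """
    good = Path(good_path).read_text(encoding="utf-8").splitlines()
    bad = Path(bad_path).read_text(encoding="utf-8").splitlines()
    return list(
        difflib.unified_diff(
            good,
            bad,
            fromfile=str(good_path),
            tofile=str(bad_path),
            n=context,    # number of unchanged lines shown around each change
            lineterm="",  # lines were split without newlines, so add none back
        )
    )
```

Calling print("\n".join(diff_responses("response_ok.txt", "response_failed.txt"))) would then show which lines exist in only one of the two responses, which should make it clearer whether the failing response is a different page variant, an error or challenge page, or the same page with the element missing.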

Could this be a JavaScript-related issue? The error occurs because I use BeautifulSoup to search for an HTML element with a specific class name, and that element is present in some responses and absent in others. When it is absent, BeautifulSoup returns None, and I get an AttributeError as soon as I try to use the result.
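As a stopgap, the None result can be checked explicitly and the request retried instead of letting the AttributeError surface. This is only a sketch under assumptions: the class name "odds-row" and the function names are hypothetical, and retrying would only help if the server sometimes sends a different variant of the page. It does not address the cause of the variation.

```python
import time


def fetch_html(url, timeout=10):
    """Download the page body; raises on HTTP error statuses."""
    import requests  # third-party; imported here so the retry helper works without it

    response = requests.get(url, timeout=timeout)
    response.raise_for_status()
    return response.text


def extract_odds(html, class_name="odds-row"):
    """Return the text of every matching element, or None if none are found.

    "odds-row" is a placeholder class name, not the real site's markup.
    """
    from bs4 import BeautifulSoup  # third-party

    soup = BeautifulSoup(html, "html.parser")
    elements = soup.find_all(class_=class_name)
    if not elements:
        return None
    return [element.get_text(strip=True) for element in elements]


def scrape_with_retry(fetch, extract, attempts=3, delay=1.0):
    """Call fetch() then extract() until extract() returns something other than None."""
    for attempt in range(1, attempts + 1):
        result = extract(fetch())
        if result is not None:
            return result
        if attempt < attempts:
            time.sleep(delay)  # brief pause before asking the server again
    raise RuntimeError(f"expected element missing after {attempts} attempts")
```

With a real URL in a variable named url, the call would look like scrape_with_retry(lambda: fetch_html(url), extract_odds). Passing fetch and extract in as callables keeps the retry loop separate from the network and parsing code, so each piece can be swapped or tested on its own.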

Has anyone else encountered this before? Any ideas on how to fix it?


