Sunday, May 5, 2019

Changelog 2019-05-04 - Two Fixes

Target RegEx fixed

The Target scraper was missing products because of some site changes to target.com. Edits to the RegEx fixed the issue and, to some degree, future-proofed it: the updated pattern should still match if Target ever reverts its website to the way it was before.
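To illustrate what matching both the old and the new markup can look like, here is a minimal sketch. The HTML snippets, the attribute names, and `extract_titles` are all hypothetical; they are not the real target.com markup or the scraper's actual pattern.

```python
import re

# Hypothetical markup: neither layout is copied from target.com.
OLD_LAYOUT = '<a class="product-title" href="/p/1">Blue Mug</a>'
NEW_LAYOUT = '<a data-test="product-title" href="/p/1">Blue Mug</a>'

# The alternation accepts either attribute name, so the same pattern
# keeps working if the site switches back to the earlier layout.
PRODUCT_TITLE = re.compile(
    r'<a\s+(?:class|data-test)="product-title"[^>]*>([^<]+)</a>'
)


def extract_titles(html):
    """Return the text of every product-title link found in the HTML."""
    return PRODUCT_TITLE.findall(html)
```

The trade-off is that a more permissive pattern can also pick up elements it should not, so it is worth checking the results against a saved copy of the page.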

Target scraper now actually scrapes all items

A few issues were previously affecting the Target scraper. Because target.com is a dynamically constructed website, products at the bottom of the page are not loaded until you scroll to or past them. The scraper now executes a JavaScript command to scroll to the bottom of the page before it grabs the HTML, which ensures that all items are loaded before scraping.
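The scrolling step can be sketched roughly like this, assuming a Selenium-style driver object that exposes `execute_script`. The function name `scroll_to_bottom`, the pause length, and the repeat-until-the-height-stops-growing loop are assumptions for illustration; the actual fix may be a single scroll command.

```python
import time


def scroll_to_bottom(driver, pause_seconds=1.0, max_rounds=20):
    """Scroll until the page height stops growing; return the rounds used.

    `driver` is assumed to be a Selenium-style WebDriver (anything with an
    `execute_script` method works).
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    for round_number in range(1, max_rounds + 1):
        # Jump to the current bottom so lazily loaded items start loading.
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause_seconds)  # give the page time to add more items
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            return round_number  # nothing new was loaded, so we are done
        last_height = new_height
    return max_rounds
```

Call it right after loading the page and before reading the HTML, for example `scroll_to_bottom(driver)` followed by `html = driver.page_source`.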

Another issue was with scraping multiple pages on target.com. Because not every item being searched for fits on one page, the scraper needs to look at three different web pages. Each page has a unique URL, so scraping each one individually should have worked. However, it turns out target.com sets a cookie in the browser when you visit the first page, and that cookie is what allows you to visit the subsequent pages. If you navigate directly to any page other than the first, it redirects you to the first page. This meant I was scraping the same page multiple times rather than each page once. A change to make sure the cookies are maintained between requests allows the scraper to visit all relevant pages without a problem.
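One way to keep the cookies around is to reuse a single browser session for every page instead of starting a fresh one per URL. The sketch below assumes a Selenium-style driver with `get` and `page_source`; `scrape_pages` is a hypothetical helper, and the post does not say exactly how the real scraper preserves its cookies.

```python
def scrape_pages(driver, page_urls):
    """Visit each URL in order with one shared session and return the HTML.

    `page_urls` must start with the first results page, because that visit
    is what sets the cookie the later pages depend on.
    """
    pages = []
    for url in page_urls:
        # Same driver for every request, so cookies set on page 1 persist.
        driver.get(url)
        pages.append(driver.page_source)
    return pages
```

If the scraper uses plain HTTP requests rather than a browser, the same idea applies: send every request through one cookie-aware session object.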
