Thursday, April 25, 2019

Changelog 2019-04-24 - Three Fixes


Amazon posting is less sensitive to URL changes

Amazon URLs carry a lot of extra information, which makes them poor unique identifiers. However, each URL contains an ASIN, which is unique. Uniqueness is now determined by ASIN rather than URL, and URLs are now reconstructed from the ASIN. This should reduce the cases where the bot thinks it saw a new item when it really just saw a previously seen item with a new URL.
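As a rough illustration of the idea (not the bot's actual code), here is a minimal Python sketch. It assumes the ASIN is a 10-character alphanumeric ID that follows `/dp/` or `/gp/product/` in the URL, and that `https://www.amazon.com/dp/<ASIN>` is an acceptable rebuilt URL. The names `extract_asin`, `canonical_url`, `is_new_item` and `seen_asins` are made up for this example.

```python
import re
from typing import Optional

# Assumption: the ASIN is 10 uppercase letters/digits following /dp/ or /gp/product/.
ASIN_PATTERN = re.compile(r"/(?:dp|gp/product)/([A-Z0-9]{10})(?:[/?#]|$)")

# Hypothetical in-memory store of ASINs the bot has already posted.
seen_asins = set()


def extract_asin(url: str) -> Optional[str]:
    """Return the ASIN found in an Amazon product URL, or None if there is none."""
    match = ASIN_PATTERN.search(url)
    return match.group(1) if match else None


def canonical_url(asin: str) -> str:
    """Rebuild a short product URL from the ASIN alone (assumed URL shape)."""
    return f"https://www.amazon.com/dp/{asin}"


def is_new_item(url: str) -> bool:
    """Treat an item as new only if its ASIN has not been seen before."""
    asin = extract_asin(url)
    if asin is None or asin in seen_asins:
        return False
    seen_asins.add(asin)
    return True
```

With something like this, two different-looking URLs for the same product resolve to the same ASIN, so only the first one counts as a new item.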

Target should stop alerting on non-Funko products

An improvement to the regex for scraping Target’s site should decrease the number of non-Funko products being found. However, this regex decreases performance and needs to be improved.
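The actual regex is not shown here, so the following is only a sketch of one way such a filter could look in Python. It assumes product titles have already been pulled out of the page, and it checks each title for the whole word "funko". The pattern is compiled once at import time, so it is not rebuilt on every call. `FUNKO_PATTERN`, `is_funko_product` and `filter_funko` are hypothetical names.

```python
import re

# Assumption: a title counts as a Funko product if it contains the whole word "funko".
# Compiled once up front rather than on every call.
FUNKO_PATTERN = re.compile(r"\bfunko\b", re.IGNORECASE)


def is_funko_product(title: str) -> bool:
    """Return True if the product title mentions Funko as a whole word."""
    return FUNKO_PATTERN.search(title) is not None


def filter_funko(titles):
    """Keep only the titles that look like Funko products."""
    return [title for title in titles if is_funko_product(title)]
```

A short, fixed pattern applied to individual titles has less room to backtrack than one large expression run over the full page HTML, which may help with the performance concern, though that would need to be measured.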

Target and Amazon will now be more reliably scraped

The GET request now includes more relevant headers, which should more reliably convince the server that we are a legitimate request so that it returns the full HTML.
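For illustration only, a browser-like request could be sketched in Python with the standard library as below. The specific header names and values are assumptions, not the ones the bot actually sends, and `BROWSER_HEADERS`, `build_request` and `fetch_html` are made-up names.

```python
import urllib.request

# Assumption: these are typical browser-style headers, not the bot's real set.
BROWSER_HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
        "AppleWebKit/537.36 (KHTML, like Gecko) "
        "Chrome/73.0.3683.103 Safari/537.36"
    ),
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.9",
}


def build_request(url: str) -> urllib.request.Request:
    """Create a GET request that carries the browser-like headers."""
    return urllib.request.Request(url, headers=BROWSER_HEADERS, method="GET")


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Fetch a page and return its body as text (performs a real network call)."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

Keeping the headers in one dictionary makes it easy to reuse the same set for both the Target and Amazon scrapers.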
