Wednesday, July 31, 2019

Changelog 2019-07-31 - Two Fixes

Fixed Amazon Regex

Amazon changed their website, so the regex needed to be updated. The new regex is much more efficient than the old one, so Amazon scraping should run somewhat faster overall.

Added Dryscrape Request Headers

Dryscrape requests never used headers before. Headers help make a request look like it came from a regular browser rather than a script. Amazon started serving the scraper a captcha when I made too many requests at once (while testing the fix above), so I added headers to the requests so that they go through properly.
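
The changelog doesn't list which headers were added. As a minimal sketch, assuming a typical browser-like set (the values in `BROWSER_HEADERS` are illustrative, not the scraper's real ones), here is how headers ride along on a request. It uses Python's standard `urllib.request` instead of dryscrape so it runs without a headless browser, and it only builds the request without sending it.

```python
import urllib.request

# Illustrative browser-like headers (assumption: not the scraper's actual values).
BROWSER_HEADERS = {
    "User-Agent": ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                   "AppleWebKit/537.36 (KHTML, like Gecko) "
                   "Chrome/75.0.3770.142 Safari/537.36"),
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.9",
}


def build_request(url):
    """Build (but do not send) a GET request that carries browser-like headers."""
    return urllib.request.Request(url, headers=BROWSER_HEADERS)
```

Note that `urllib` normalizes header names with `str.capitalize()`, so the user agent is read back with `req.get_header("User-agent")`.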

Monday, July 29, 2019

Changelog 2019-07-29 - Five Fixes

Added support for Galactic Toys

Galactic Toys is now monitored for both preorders and exclusives. Exclusives will not be posted to their own Twitter feed yet, but they will be posted to r/funkopop and to their own Discord channel. New releases will be posted to the usual channels and added to preorder posts on Reddit, if applicable.

Fixed some URL parsing for the Blog Scraper

Images on Funko blog posts should be detected correctly again. An error in how the regex results were parsed caused every image URL to come out as just the base URL. This is fixed, so compilation images should be posted to Reddit again.
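
The post doesn't show the parsing code. A minimal sketch of the correct behavior, assuming images appear as `<img src="...">` tags and that relative paths must be joined to a base URL (`BASE_URL` and `IMG_SRC` are hypothetical): the captured path from each match has to be joined onto the base, otherwise every result collapses to the base URL alone.

```python
import re
from urllib.parse import urljoin

BASE_URL = "https://blog.example.com/posts/"  # hypothetical base URL

# Hypothetical pattern: capture the src attribute of each <img> tag.
IMG_SRC = re.compile(r'<img[^>]+src="([^"]+)"')


def image_urls(html, base=BASE_URL):
    # findall returns the captured group for each match; joining each capture
    # onto the base (rather than returning the base itself) gives full URLs.
    # Absolute URLs pass through urljoin unchanged.
    return [urljoin(base, src) for src in IMG_SRC.findall(html)]
```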

Performance improvements to the main scraping code

If the main scraping function cannot connect to the database, it now quits immediately rather than running until it crashes. This allows faster retries and saves CPU time on the server.
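
A minimal fail-fast sketch, assuming the scraper opens its database connection once at startup. SQLite stands in for whatever database the real server uses, and `connect_or_exit` / `scrape_all` are hypothetical names. The point is that a connection error ends the run right away with a nonzero exit code, so a scheduler can retry sooner instead of waiting for a later crash.

```python
import sqlite3
import sys


def connect_or_exit(connect, *args, **kwargs):
    """Try to open the database once; exit immediately if that fails."""
    try:
        return connect(*args, **kwargs)
    except sqlite3.Error as exc:
        print(f"Database unavailable, exiting: {exc}", file=sys.stderr)
        sys.exit(1)


def scrape_all(connect=sqlite3.connect, db_path=":memory:"):
    conn = connect_or_exit(connect, db_path)
    # ... scraping work would go here, using conn ...
    return conn
```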

FYE in-stock detection should be more reliable

Sometimes, FYE would alert on out-of-stock (OOS) items. An extra check helps ensure the item is actually in stock. It won't work 100% of the time, but it should reduce the number of false positives.
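
The changelog doesn't say what the extra check is. One way to cut false positives, sketched here under the assumption that the page has an add-to-cart button and may show out-of-stock wording (both patterns are hypothetical, not FYE's real markup), is to alert only when two independent signals agree.

```python
import re

# Hypothetical page markers; FYE's actual markup is not described in the changelog.
ADD_TO_CART = re.compile(r'<button[^>]*id="add-to-cart"[^>]*>', re.IGNORECASE)
OOS_TEXT = re.compile(r"out of stock|sold out", re.IGNORECASE)


def looks_in_stock(html):
    """True only if an add-to-cart button is present AND no OOS wording appears."""
    has_button = ADD_TO_CART.search(html) is not None
    says_oos = OOS_TEXT.search(html) is not None
    return has_button and not says_oos
```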

ModTools - One post per X days works again

A bad import in this script was causing it to crash, so it failed to stop people from posting more often than allowed. With the import fixed, the script runs normally again and removes posts from people who post too often.
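
The rule itself is simple. As a minimal sketch, assuming the script remembers when each user's last allowed post was made (`should_remove` and the `last_post_by_user` dict are hypothetical), a new post is removed if it arrives less than X days after that post.

```python
from datetime import datetime, timedelta


def should_remove(user, posted_at, last_post_by_user, days):
    """Return True if this post comes less than `days` days after the user's
    previous allowed post; otherwise record it as the latest allowed post."""
    previous = last_post_by_user.get(user)
    if previous is not None and posted_at - previous < timedelta(days=days):
        return True
    last_post_by_user[user] = posted_at
    return False


history = {}
first = datetime(2019, 7, 29, 12, 0)
print(should_remove("some_user", first, history, days=7))                      # False
print(should_remove("some_user", first + timedelta(days=2), history, days=7))  # True
```

Removed posts are not recorded, so a user who posts too early does not push their own window further out.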

Friday, July 19, 2019

Changelog 2019-07-19 - One Fix

Target Will Now Alert on Items More Reliably

Alright, let's do this one last time. I know I've said that I've fixed Target before, but this time it's really going to stick. Previously, we were scraping their web page. Now, we're using their API. Below is an explanation of why this is better, followed by a short list of limitations.

Previously, we scraped Target's website by capturing its HTML. Because Target uses a modern JavaScript framework, we could not just send a regular GET request. Instead, we had to spoof a browser to get the state of the page. However, this was unreliable because the site often would not load the entire page: it uses time- and bandwidth-saving techniques and only loads what it thinks it needs. It's tricky to get the browser spoofer to convince the page to load everything, so in-stock items could go unseen for hours at a time. Once they were finally seen, they were alerted on as back in stock even though they had never gone out of stock. Now, we use their API to talk directly to their backend and get the information every time.

However, their API returns messy information for tee shirts. Anything with multiple sizes is handled differently from most items. For some tee shirts I can tell whether they are in stock, but others have such confusing information attached that I currently cannot distinguish in stock from out of stock. Until I figure this out, some of the Pop+Tee combos will always show as out of stock and won't be alerted on. That is a very small subset of items, so it is an acceptable loss for now.
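
The trade-off above can be expressed as three-state logic: a variant is clearly in stock, clearly out of stock, or unknown, and unknown is treated like out of stock so it never triggers an alert. This is a sketch only; the `variants` and `availability_status` field names are hypothetical, since the changelog does not describe Target's response format.

```python
from enum import Enum


class Stock(Enum):
    IN_STOCK = "in_stock"
    OUT_OF_STOCK = "out_of_stock"
    UNKNOWN = "unknown"


def variant_status(variant):
    # Hypothetical field name and values; not Target's documented API.
    status = variant.get("availability_status")
    if status == "IN_STOCK":
        return Stock.IN_STOCK
    if status == "OUT_OF_STOCK":
        return Stock.OUT_OF_STOCK
    return Stock.UNKNOWN


def item_in_stock(item):
    """Multi-size items carry a list of variants; single items are their own variant.
    Alert only if at least one variant is clearly in stock, so ambiguous data
    (Stock.UNKNOWN) behaves like out of stock and is never alerted on."""
    variants = item.get("variants") or [item]
    return any(variant_status(v) is Stock.IN_STOCK for v in variants)
```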

Wednesday, July 17, 2019

Changelog 2019-07-17 - One Fix

Hot Topic and Box Lunch should no longer alert on false positives

An issue with the regex that checked individual item pages on Hot Topic and Box Lunch's websites caused some false positives to be sent and some items to be missed. The regex was unreliable because the text it matched appeared on some in-stock pages but not on others. Matching on text that is specific to out-of-stock items helps ensure that we alert on items properly.

Tuesday, July 16, 2019

Changelog 2019-07-16 - Four Fixes

Scraper now ignores OOS items

Previously, we assumed that the site would not display items that are OOS. However, because it does, we were not alerting on restocks. Changing the regex to account for OOS messages allows us to alert on restocks properly.
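
Why this matters for restocks: if every listed item is assumed to be in stock, an item that goes from OOS back to in stock looks unchanged and is never alerted on. A minimal sketch, assuming each scrape produces a hypothetical snapshot mapping item id to an in-stock flag, compares the previous snapshot with the current one and alerts on the transition.

```python
def restock_alerts(previous, current):
    """Compare two snapshots mapping item id -> in_stock (bool).
    Alert on items that are in stock now but were out of stock, or not
    listed at all, in the previous snapshot."""
    return sorted(
        item for item, in_stock in current.items()
        if in_stock and not previous.get(item, False)
    )
```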

Blog scraper now accounts for png and jpg

The Funko blog used to post only png images. Now we have seen blog posts that include both jpg and png, so the regex needed to change to accommodate both file types.
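
The actual regex isn't shown in the post. A minimal sketch of a pattern that accepts both file types, assuming absolute image URLs appear in the page source: the extension group allows `png`, `jpg`, or `jpeg` in any case, and the lookahead stops a host name like `img.png-cdn.com` from being mistaken for a file extension.

```python
import re

# Absolute image URLs ending in .png, .jpg or .jpeg (case-insensitive).
# [^\s"'<>]+ keeps the match inside one URL; (?![\w.-]) requires the
# extension to end the path segment (a query string may follow).
IMAGE_URL = re.compile(r'https?://[^\s"\'<>]+\.(?:png|jpe?g)(?![\w.-])', re.IGNORECASE)


def find_images(html):
    return IMAGE_URL.findall(html)
```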

Gamestop now alarms on new-in-stock items

As with Hot Topic, any new item first seen on Gamestop's website, exclusive or otherwise, will now be sent to @FunkoRelease on Twitter in addition to being automatically added to preorder threads, if applicable. This works by scraping a few pages that give good coverage of items newly added to Gamestop's site.
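
Detecting "first seen" items comes down to remembering every item id already announced. A minimal sketch, assuming each scraped listing yields dicts with a hypothetical `id` key and that the seen set is persisted between runs:

```python
def new_items(seen_ids, scraped):
    """Return items whose ids have never been seen, in page order, and add
    them to seen_ids so each item is announced only once (even if it appears
    on more than one of the scraped pages)."""
    fresh = []
    for item in scraped:
        if item["id"] not in seen_ids:
            seen_ids.add(item["id"])
            fresh.append(item)
    return fresh
```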

Hot Topic/Box Lunch alerts should now show stock numbers for some items

Hot Topic used to give more detailed stock numbers, but now it only shows how many are left when that number is fewer than 10. The regex was changed to pick up that number when it is shown.
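
A minimal sketch of picking up the optional count, assuming the page shows wording like "Only 7 left" (the exact phrasing on Hot Topic and Box Lunch is an assumption). When the message is absent, the function returns `None` and the alert is sent without a number.

```python
import re

# Hypothetical low-stock wording; the sites' actual message may differ.
LOW_STOCK = re.compile(r"only\s+(\d+)\s+left", re.IGNORECASE)


def stock_count(html):
    """Return the number of units left if the page shows it, else None."""
    match = LOW_STOCK.search(html)
    return int(match.group(1)) if match else None
```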

Wednesday, July 10, 2019

Changelog 2019-07-10 - Three Fixes

Monitoring now includes Fugitive Toys

Fugitive Toys is now supported with @FunkoFugitive on Twitter as well as on the Funko Pop subreddit. This is in anticipation of the LE 500 Andy Dwyer Funko POP!s that will be releasing soon. All exclusives will be posted to that Twitter account and to Reddit. All commons will be posted to @FunkoRelease on Twitter.

Consider supporting this work via Patreon if you want to see more of these changes, as adding new stores usually increases the cost of maintaining the servers.

Funko Blog now has longer timeout

Many news posts are scraped from the Funko blog and posted to Reddit/Twitter. However, a timeout that was too low was preventing the site from being scraped correctly. Increasing the timeout from 30 seconds to 45 seconds fixed this, and the blog is being scraped correctly again.

Added more support for image scraping on the Funko Blog

Images are stored in a CDN that Funko maintains. Scraping for these URLs allows us to build compilation images of all products showcased in a blog post. Funko recently started using a new CDN, so support has been added to scrape these images.

Tuesday, July 2, 2019

Changelog 2019-07-02 - Two Fixes

Funko In-Stock Alerts now also post to Discord

As a Patreon perk, users can sign up to join a private Discord server. This server includes a channel for all exclusives going in stock or restocking, as well as newly released Pops, like @FunkoRelease on Twitter. This gives users another option for how to interact with these alerts.
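
The post doesn't say how alerts reach the Discord channel. One common approach, sketched here as an assumption rather than the project's actual method, is a channel webhook: Discord's "Execute Webhook" endpoint accepts a JSON body whose `content` field is the message text. The function below only builds the request; `build_discord_alert` and the message format are hypothetical, and the webhook URL would come from the channel's settings.

```python
import json
import urllib.request


def build_discord_alert(webhook_url, item_name, product_url):
    """Build (but do not send) a POST to a Discord webhook with a text message."""
    payload = {"content": f"In stock: {item_name}\n{product_url}"}
    return urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "User-Agent": "funko-alerts-sketch/0.1",  # hypothetical client name
        },
        method="POST",
    )
```

Sending it would be a call to `urllib.request.urlopen(req)` with a real webhook URL of the form `https://discord.com/api/webhooks/<id>/<token>`.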

Patreon links added to some Reddit stickied comments

Some of the stickied comments on automatically created Reddit posts now include a link to Patreon for users to sign up and support this project.
