I've set up a few custom Google parsers using the Net::HTTP parser, and they work great, but I can never get anywhere close to the speed of the built-in Google Parser.
I know one reason is that I'm using different user agents, which return larger pages, and that contributes to the slowdown. Still, are there any tips or best practices for optimizing a custom Google parser built on Net::HTTP?
I use the same settings for threads, query delay, proxy ban time, timeout, etc., so I know those aren't a factor. The query strings are also the same as far as I can tell; I'm basically just using newer user agents or mobile user agents.
The Google parser has an option to enable sessions, but Net::HTTP doesn't. Could that be a reason for the slower scraping?
I know this is a pretty general question, but any tips or ideas for optimizing speed would be welcome.
As always, thanks!