Customize Google Images Scraper

scrapefun

A-Parser Enterprise License
I need to extract some additional information from Google Images results and am not sure how to go about it.

On the Google Images results page, each image has a URL like this:

href="http://www.google.com/imgres?imgurl...BQ&tbm=isch&ved=0CDQQMygCMAI&biw=1366&bih=631"

I need to extract the values for these parameters:

imgurl=
imgrefurl=
tbnid=

And finally, is there a way to extract the filetype of the image into a variable as well (jpg, png, etc)? Something like $filetype?

So for the final result I would like stored on each line:
$query;$loop.count;$imgurl;$imgrefurl;$tbnid.$filetype\n
 
I know Forbidden is super busy so I would be open to hiring someone to get this solution. If anyone is interested just send me a PM.
 
Very interesting solution:
  • Use SE::Google::Images with the Raw data results option to run the queries and get the raw HTML
  • Use a complex regex to extract all the data
  • Use the power of the Result format to generate the final output
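For anyone who wants to check the extraction logic outside A-Parser, here is a minimal Python sketch of the same idea: find the imgres links in the raw HTML, pull out imgurl, imgrefurl and tbnid, and derive the file type from the image URL. It is not the preset itself. The sample HTML, the function names, and the assumption that tbnid ends with a trailing colon are all made up for illustration.

```python
import re
from html import unescape
from pathlib import PurePosixPath
from urllib.parse import parse_qs, urlsplit

# Matches href="...imgres?..." attributes in raw result HTML.
IMGRES_HREF = re.compile(r'href="([^"]*?/imgres\?[^"]*)"')


def extract_images(raw_html):
    """Return one dict per imgres link found in raw_html."""
    images = []
    for match in IMGRES_HREF.finditer(raw_html):
        href = unescape(match.group(1))          # turn &amp; back into &
        params = parse_qs(urlsplit(href).query)  # values come back percent-decoded
        imgurl = params.get("imgurl", [""])[0]
        # File type = extension of the image URL path (empty if there is none).
        filetype = PurePosixPath(urlsplit(imgurl).path).suffix.lstrip(".").lower()
        images.append({
            "imgurl": imgurl,
            "imgrefurl": params.get("imgrefurl", [""])[0],
            # Assumption: tbnid may carry a trailing colon, which is dropped here.
            "tbnid": params.get("tbnid", [""])[0].rstrip(":"),
            "filetype": filetype,
        })
    return images


def format_lines(query, images):
    """Build lines shaped like $query;$loop.count;$imgurl;$imgrefurl;$tbnid.$filetype."""
    return [
        f"{query};{count};{img['imgurl']};{img['imgrefurl']};"
        f"{img['tbnid']}.{img['filetype']}"
        for count, img in enumerate(images, start=1)
    ]


# Hypothetical sample, not a real Google response.
SAMPLE_HTML = (
    '<a href="http://www.google.com/imgres?imgurl=http://example.com/cat.png'
    '&amp;imgrefurl=http://example.com/page.html&amp;tbnid=AbC123:&amp;tbm=isch">x</a>'
)

if __name__ == "__main__":
    for line in format_lines("cats", extract_images(SAMPLE_HTML)):
        print(line)
```

Parsing the query string with urllib.parse instead of one large regex keeps the sketch tolerant of parameter order, at the cost of being a different technique from the regex used in the preset.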


cv6c7.png


Code:
eyJwcmVzZXQiOiJ0b3BpYy0xNjA5OiBjdXN0b20gZ29vZ2xlIGltYWdlcyBwYXJz
ZXIiLCJ2YWx1ZSI6eyJwcmVzZXQiOiJ0b3BpYy0xNjA5OiBjdXN0b20gZ29vZ2xl
IGltYWdlcyBwYXJzZXIiLCJwYXJzZXJzIjpbWyJTRTo6R29vZ2xlOjpJbWFnZXMi
LCJkZWZhdWx0Iix7InR5cGUiOiJvdmVycmlkZSIsImlkIjoicmF3ZGF0YSIsInZh
bHVlIjp0cnVlfSx7InR5cGUiOiJjdXN0b21SZXN1bHQiLCJyZXN1bHQiOlsicGFn
ZXMiLCJkYXRhIl0sInJlZ2V4IjoiaW1ndXJsPShbXiZdKj8oPzpcXC4oanBlP2d8
cG5nfGdpZikpPykmYW1wO2ltZ3JlZnVybD0oW14mXSspJi4qP3RibmlkPShbXjpd
Kyk6IiwicmVnZXhUeXBlIjoiaWciLCJyZXN1bHRUeXBlIjoiYXJyYXkiLCJhcnJh
eU5hbWUiOiJpbWdzIiwicmVzdWx0cyI6WyJsaW5rIiwidHlwZSIsInJlZiIsInRi
bmlkIl19LHsidHlwZSI6Im92ZXJyaWRlIiwiaWQiOiJmb3JtYXRyZXN1bHQiLCJ2
YWx1ZSI6IlslIEZPUkVBQ0ggaW1ncyAtJV1cbiRxdWVyeTskbG9vcC5jb3VudDsk
bGluazskcmVmOyR7dGJuaWR9LlslIHR5cGUgPT0gJ25vbmUnID8gJ2RlZmF1bHQu
anBnJyA6IHR5cGUgJV0gXG5bJSBFTkQgJV0ifV1dLCJyZXN1bHRzRm9ybWF0Ijoi
JHAxLnByZXNldCIsInJlc3VsdHNTYXZlVG8iOiJmaWxlIiwicmVzdWx0c0ZpbGVO
YW1lIjoiJGRhdGVmaWxlLmZvcm1hdCgpLnR4dCIsImFkZGl0aW9uYWxGb3JtYXRz
IjpbXSwicmVzdWx0c1VuaXF1ZSI6Im5vIiwicXVlcnlGb3JtYXQiOlsiJHF1ZXJ5
Il0sInVuaXF1ZVF1ZXJpZXMiOmZhbHNlLCJzYXZlRmFpbGVkUXVlcmllcyI6ZmFs
c2UsIml0ZXJhdG9yT3B0aW9ucyI6eyJvbkFsbExldmVscyI6ZmFsc2UsInF1ZXJ5
QnVpbGRlcnNBZnRlckl0ZXJhdG9yIjpmYWxzZX0sInJlc3VsdHNPcHRpb25zIjp7
Im92ZXJ3cml0ZSI6ZmFsc2V9LCJkb0xvZyI6Im5vIiwia2VlcFVuaXF1ZSI6Ik5v
IiwibW9yZU9wdGlvbnMiOmZhbHNlLCJyZXN1bHRzUHJlcGVuZCI6IiIsInJlc3Vs
dHNBcHBlbmQiOiIiLCJxdWVyeUJ1aWxkZXJzIjpbXSwicmVzdWx0c0J1aWxkZXJz
IjpbXSwiY29uZmlnT3ZlcnJpZGVzIjpbXX19
 
Thanks! This works great.

Is it possible to use the result of one parser to form the queries for another parser? I saw in the help files that it was not possible when the page was posted but wondered if it was possible yet?

Basically, I want to use the Net::HTTP parser to download the actual image from Google Images. I got it working as a standalone task, but I would like to be able to use the "$link" result value from the Google Images parser as the query for the Net::HTTP parser.

Thanks again for your help!
 
Is it possible to use the result of one parser to form the queries for another parser? I saw in the help files that it was not possible when the page was posted but wondered if it was possible yet?

Still not possible.
 
Ok.

What I am doing is creating an additional result file when scraping Google Images that contains just the image URLs, and then I use those as the $query for the Net::HTTP parser in a separate task. With this method, though, I can't match up the image to the original keyword query.

I want to name the images with the query from the Google Images task. How do I match up each image to the correct query from the Google Images parser task?
 
As the queries file, select the result file obtained in the previous task.
TduCJ.png

The result is an img folder with pictures named by keyword and number.
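The same two-step idea can be sketched in plain Python, in case it helps to see how the keyword stays attached to each image. This is only an illustration and not how A-Parser does it internally. It assumes the first task saved lines shaped like query;count;imgurl;imgrefurl;tbnid.filetype, that none of the fields contain a semicolon, and that jpg is an acceptable fallback extension. The names plan_downloads and download_all are hypothetical.

```python
from pathlib import Path
from urllib.request import urlopen


def plan_downloads(result_lines, out_dir="img"):
    """Map each result line to (image URL, target path named by keyword and number)."""
    plan = []
    for line in result_lines:
        fields = line.strip().split(";")
        if len(fields) != 5:
            continue  # skip blank or malformed lines (assumption: no ';' inside fields)
        query, count, imgurl, _imgrefurl, name = fields
        _, dot, ext = name.rpartition(".")
        if not dot or not ext:
            ext = "jpg"  # assumed fallback when no file type was captured
        # Keep the file name safe: replace anything that is not a letter or digit.
        safe_query = "".join(c if c.isalnum() else "_" for c in query)
        plan.append((imgurl, Path(out_dir) / f"{safe_query}_{count}.{ext}"))
    return plan


def download_all(plan):
    """Fetch every planned image and write it to its target path."""
    for url, path in plan:
        path.parent.mkdir(parents=True, exist_ok=True)
        with urlopen(url, timeout=30) as response:
            path.write_bytes(response.read())
```

Because the keyword and the counter travel in the same line as the image URL, the second step never has to guess which query an image came from. Calling download_all(plan_downloads(open("results.txt"))) would fetch the files, where results.txt stands for whatever file the first task produced.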
 