Is This Possible? Download a File and Parse Data at the Same Time

scrapefun

A-Parser Enterprise License
Before I even attempt this, is it possible to download the full source of a page and save it to a file AND also parse that page for certain data and save that to a file in JSON format?

(I already save the source to a file; I just need to add the part that parses it to JSON.)

So it would be:

1: Visit the page, grab the source and save it to a file
2: Extract data from that same page and save it as JSON

If it is possible, I will have a lot more questions. lol :)
 
Yes, it is possible. To output data in JSON, use the .json method.
Here is an example that parses a site (Wikipedia) and saves both the source code and the results in JSON format:

Code:
eyJwcmVzZXQiOiJkZWZhdWx0IiwidmFsdWUiOnsicHJlc2V0IjoiZGVmYXVsdCIs
InBhcnNlcnMiOltbIk5ldDo6SFRUUCIsImRlZmF1bHQiLHsidHlwZSI6Im92ZXJy
aWRlIiwiaWQiOiJmb3JtYXRyZXN1bHQiLCJ2YWx1ZSI6IiR0aXRsZS5qc29uXFxu
JHRvcDEwLmpzb24ifSx7InR5cGUiOiJjdXN0b21SZXN1bHQiLCJyZXN1bHQiOiJk
YXRhIiwicmVnZXgiOiI8dGl0bGU+KC4rPyk8L3RpdGxlPiIsInJlZ2V4VHlwZSI6
IiIsInJlc3VsdFR5cGUiOiJmbGF0IiwiYXJyYXlOYW1lIjoiIiwicmVzdWx0cyI6
WyJ0aXRsZSJdfSx7InR5cGUiOiJjdXN0b21SZXN1bHQiLCJyZXN1bHQiOiJkYXRh
IiwicmVnZXgiOiI8IS0tICguKz9ocikgLS0+IiwicmVnZXhUeXBlIjoiZyIsInJl
c3VsdFR5cGUiOiJhcnJheSIsImFycmF5TmFtZSI6InRvcDEwIiwicmVzdWx0cyI6
WyJsYW5nIl19XV0sInJlc3VsdHNGb3JtYXQiOiIkcDEucHJlc2V0IiwicmVzdWx0
c1NhdmVUbyI6ImZpbGUiLCJyZXN1bHRzRmlsZU5hbWUiOiJ3aWtpL2pzb24udHh0
IiwiYWRkaXRpb25hbEZvcm1hdHMiOltbIndpa2kvc291cmNlLnR4dCIsIiRwMS5k
YXRhIl1dLCJyZXN1bHRzVW5pcXVlIjoibm8iLCJxdWVyeUZvcm1hdCI6WyIkcXVl
cnkiXSwidW5pcXVlUXVlcmllcyI6ZmFsc2UsInNhdmVGYWlsZWRRdWVyaWVzIjpm
YWxzZSwiaXRlcmF0b3JPcHRpb25zIjp7Im9uQWxsTGV2ZWxzIjpmYWxzZSwicXVl
cnlCdWlsZGVyc0FmdGVySXRlcmF0b3IiOmZhbHNlfSwicmVzdWx0c09wdGlvbnMi
Onsib3ZlcndyaXRlIjpmYWxzZX0sImRvTG9nIjoiZGIiLCJrZWVwVW5pcXVlIjoi
Tm8iLCJtb3JlT3B0aW9ucyI6ZmFsc2UsInJlc3VsdHNQcmVwZW5kIjoiIiwicmVz
dWx0c0FwcGVuZCI6IiIsInF1ZXJ5QnVpbGRlcnMiOltdLCJyZXN1bHRzQnVpbGRl
cnMiOltdLCJjb25maWdPdmVycmlkZXMiOltdfX0=
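Decoded, the preset above writes the parsed fields ($title.json, $top10.json) to wiki/json.txt and adds a second results file, wiki/source.txt, filled with $p1.data, the downloaded page source. That second file is what turns "save the source" and "parse it" into a single pass. Outside A-Parser, the same fetch-once, write-twice idea could be sketched in Python as follows; the function names, output file names and the title-only extraction are illustrative assumptions, not part of A-Parser.

```python
import json
import re
from pathlib import Path
from urllib.request import Request, urlopen

# Same <title> pattern the Wikipedia preset uses; re.S is added here (an assumption)
# so a title split across lines still matches.
TITLE_RE = re.compile(r"<title>(.+?)</title>", re.S)


def fetch(url: str, user_agent: str = "Mozilla/5.0") -> str:
    """Download a page once and return its text (hypothetical helper)."""
    req = Request(url, headers={"User-Agent": user_agent})
    with urlopen(req, timeout=30) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def save_source_and_json(html: str, out_dir: Path, name: str) -> dict:
    """Write the raw source and the parsed fields from the same downloaded copy."""
    out_dir.mkdir(parents=True, exist_ok=True)
    # 1: save the page source as downloaded
    (out_dir / f"{name}.html").write_text(html, encoding="utf-8")
    # 2: parse that same text and save the extracted data as JSON
    match = TITLE_RE.search(html)
    data = {"title": match.group(1).strip() if match else None}
    (out_dir / f"{name}.json").write_text(json.dumps(data, ensure_ascii=False), encoding="utf-8")
    return data
```

Typical use would be save_source_and_json(fetch("https://en.wikipedia.org/wiki/Keyword"), Path("wiki"), "keyword"). Keeping the download and the parsing in separate functions means the page is requested only once, whatever you extract from it.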
 
As usual, I am stuck on all the regex I need to write.

Here is a sample query:
https://www.google.co.uk/search?q=keywrd+planner&pws=0&uule=w+CAIQICINVW5pdGVkIFN0YXRlcw&num=20

I am using this as the user agent:
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:39.0) Gecko/20100101 Firefox/39.0

I need to get the title and link from each SERP result, but I also need to extract the data highlighted in the images below:


[Screenshots: misspell.png, related.png]


Any help would be greatly appreciated; none of the regexes I have tried work. Thanks
 

Regex for spell:
Code:
<a class="spell".+?>(.+?)<\/a>
Regex for spell_orig:
Code:
<a class="spell_orig".+?>(.+?)<\/a>
HTML tags can be stripped with the Results builder.
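To check the two patterns outside A-Parser, here is a small Python sketch that applies them to a hand-written snippet and then strips leftover tags, roughly what the Results builder's remove-HTML step does. The snippet is a hypothetical approximation of Google's 2015 markup, not a copy of a real page; Python needs no backslash before "/", so </a> is written plainly.

```python
import re

# Hypothetical snippet approximating the 2015 spelling-correction markup.
SAMPLE_HTML = (
    '<p>Showing results for <a class="spell" href="/search?q=keyword"><b><i>keyword</i></b></a></p>'
    '<p>Search instead for <a class="spell_orig" href="/search?q=keywrd&amp;nfpr=1">keywrd</a></p>'
)

# The two patterns from the answer above.
SPELL_RE = re.compile(r'<a class="spell".+?>(.+?)</a>')
SPELL_ORIG_RE = re.compile(r'<a class="spell_orig".+?>(.+?)</a>')
TAG_RE = re.compile(r"<[^>]+>")


def strip_html(fragment: str) -> str:
    """Crude tag stripper, standing in for the Results builder's remove-HTML step."""
    return TAG_RE.sub("", fragment).strip()


def extract_spelling(html: str) -> dict:
    # search() returns the first match, like a regex without the "g" flag.
    spell = SPELL_RE.search(html)
    orig = SPELL_ORIG_RE.search(html)
    return {
        "spell": strip_html(spell.group(1)) if spell else None,
        "spell_orig": strip_html(orig.group(1)) if orig else None,
    }


print(extract_spelling(SAMPLE_HTML))
```

Note that class="spell" in the first pattern ends with a closing quote, so it cannot accidentally match the spell_orig link.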
 
Thanks!

I have figured out most of the regex thanks to your examples, except for extracting the related keywords shown in the screenshot in my previous post. I have this:

[Screenshot: current regex for the related keywords]



This works for grabbing the first suggestion from each of the two columns, but it does not grab all of them.


Next, I'm not sure how to format the JSON file properly. I want to create one JSON file per keyword/query, with a layout something like this:

[Screenshot: desired JSON layout]


Finally, I want to save the JSON file and a raw file containing the source code in separate directories, start a new directory every 5000 queries, and save any failed queries to a separate file. I have code for this from another custom parser, which I reused below, but I'm not sure it carries over to this new one.

Here is everything I have so far:






Code:
eyJwcmVzZXQiOiJHb29nbGUgUmF3ICYgUGFyc2UgVG8gSlNPTiIsInZhbHVlIjp7
InByZXNldCI6Ikdvb2dsZSBSYXcgJiBQYXJzZSBUbyBKU09OIiwicGFyc2VycyI6
W1siTmV0OjpIVFRQIiwiZGVmYXVsdCIseyJ0eXBlIjoib3ZlcnJpZGUiLCJpZCI6
ImZvcm1hdHJlc3VsdCIsInZhbHVlIjoiJHF1ZXJ5Lmpzb25cXG4kbG9vcC5jb3Vu
dC5qc29uXFxuJHNlcnAuanNvblxcbiR0b3AxMC5qc29uXFxuJHNwZWxsLmpzb25c
XG4kc3BlbGxfb3JpZ2luYWwuanNvblxcbiRyZWxhdGVkLmpzb25cXG4ifSx7InR5
cGUiOiJjdXN0b21SZXN1bHQiLCJyZXN1bHQiOiJkYXRhIiwicmVnZXgiOiI8aDMg
Y2xhc3M9XCJyXCI+PGEgaHJlZj0uKz8+KC4rPyk8XFwvYT4iLCJyZWdleFR5cGUi
OiJnIiwicmVzdWx0VHlwZSI6ImFycmF5IiwiYXJyYXlOYW1lIjoic2VycCIsInJl
c3VsdHMiOlsidGl0bGUiXX0seyJ0eXBlIjoiY3VzdG9tUmVzdWx0IiwicmVzdWx0
IjoiZGF0YSIsInJlZ2V4IjoiPGgzIGNsYXNzPVwiclwiPjxhIGhyZWY9XCIoLis/
KVwiIiwicmVnZXhUeXBlIjoiZyIsInJlc3VsdFR5cGUiOiJhcnJheSIsImFycmF5
TmFtZSI6InRvcDEwIiwicmVzdWx0cyI6WyJsaW5rIl19LHsidHlwZSI6Im92ZXJy
aWRlIiwiaWQiOiJ1c2VyLWFnZW50IiwidmFsdWUiOiJNb3ppbGxhLzUuMCAoV2lu
ZG93cyBOVCA2LjE7IFdPVzY0OyBydjozOS4wKSBHZWNrby8yMDEwMDEwMSBGaXJl
Zm94LzM5LjAifSx7InR5cGUiOiJvdmVycmlkZSIsImlkIjoiZ29vZENvZGUiLCJ2
YWx1ZSI6MjAwfSx7InR5cGUiOiJvdmVycmlkZSIsImlkIjoicXVlcnlmb3JtYXQi
LCJ2YWx1ZSI6Imh0dHBzOi8vd3d3Lmdvb2dsZS5jby51ay9zZWFyY2g/cT0kcXVl
cnkmcHdzPTAmdXVsZT13K0NBSVFJQ0lOVlc1cGRHVmtJRk4wWVhSbGN3Jm51bT0y
MCJ9LHsidHlwZSI6ImN1c3RvbVJlc3VsdCIsInJlc3VsdCI6ImRhdGEiLCJyZWdl
eCI6IjxhIGNsYXNzPVwic3BlbGxcIi4rPz4oLis/KTxcXC9hPiIsInJlZ2V4VHlw
ZSI6InMiLCJyZXN1bHRUeXBlIjoiZmxhdCIsImFycmF5TmFtZSI6IiIsInJlc3Vs
dHMiOlsic3BlbGwiXX0seyJ0eXBlIjoiY3VzdG9tUmVzdWx0IiwicmVzdWx0Ijoi
ZGF0YSIsInJlZ2V4IjoiPGEgY2xhc3M9XCJzcGVsbFwiLis/PiguKz8pPFxcL2E+
IiwicmVnZXhUeXBlIjoicyIsInJlc3VsdFR5cGUiOiJmbGF0IiwiYXJyYXlOYW1l
IjoiIiwicmVzdWx0cyI6WyJzcGVsbF9vcmlnaW5hbCJdfSx7InR5cGUiOiJjdXN0
b21SZXN1bHQiLCJyZXN1bHQiOiJkYXRhIiwicmVnZXgiOiI8ZGl2IGNsYXNzPVwi
YnJzX2NvbFwiPjxwIGNsYXNzPVwiX2U0YlwiPjxhIGhyZWY9Lis/PiguKz8pPFxc
L2E+PFxcL3A+IiwicmVnZXhUeXBlIjoiZyIsInJlc3VsdFR5cGUiOiJhcnJheSIs
ImFycmF5TmFtZSI6InJlbGF0ZWQiLCJyZXN1bHRzIjpbInN1Z2dlc3Rpb25zIl19
XV0sInJlc3VsdHNGb3JtYXQiOiIkcDEucHJlc2V0IiwicmVzdWx0c1NhdmVUbyI6
ImZpbGUiLCJyZXN1bHRzRmlsZU5hbWUiOiJzZXJwX2pzb24vWyUgSUYgcDEuaW5m
by5zdWNjZXNzID09IDEgJV1bJSBVU0UgTWF0aDsgXCJ1c19cIl8gTWF0aC5pbnQo
cXVlcnkubnVtIC8gNTAwMCkgX1wiL1wiXyBxdWVyeSBfXCIuanNvblwiICVdWyUg
RU5EICVdIiwiYWRkaXRpb25hbEZvcm1hdHMiOltbInNlcnBfcmF3L1slIElGIHAx
LmluZm8uc3VjY2VzcyA9PSAxICVdWyUgVVNFIE1hdGg7IFwidXNfXCJfIE1hdGgu
aW50KHF1ZXJ5Lm51bSAvIDUwMDApIF9cIi9cIl8gcXVlcnkgX1wiLmh0bWxcIiAl
XVslIEVORCAlXSIsIiRwMS5kYXRhIl0sWyJzZXJwX2ZhaWwvZmFpbGVkLnR4dCIs
IlslIElGIHAxLmluZm8uc3VjY2VzcyA9PSAwICVdJHF1ZXJ5XFxuWyUgRU5EICVd
Il1dLCJyZXN1bHRzVW5pcXVlIjoibm8iLCJxdWVyeUZvcm1hdCI6WyIkcXVlcnki
XSwidW5pcXVlUXVlcmllcyI6ZmFsc2UsInNhdmVGYWlsZWRRdWVyaWVzIjpmYWxz
ZSwiaXRlcmF0b3JPcHRpb25zIjp7Im9uQWxsTGV2ZWxzIjpmYWxzZSwicXVlcnlC
dWlsZGVyc0FmdGVySXRlcmF0b3IiOmZhbHNlfSwicmVzdWx0c09wdGlvbnMiOnsi
b3ZlcndyaXRlIjpmYWxzZX0sImRvTG9nIjoibm8iLCJrZWVwVW5pcXVlIjoiTm8i
LCJtb3JlT3B0aW9ucyI6ZmFsc2UsInJlc3VsdHNQcmVwZW5kIjoiIiwicmVzdWx0
c0FwcGVuZCI6IiIsInF1ZXJ5QnVpbGRlcnMiOltdLCJyZXN1bHRzQnVpbGRlcnMi
Olt7InNvdXJjZSI6WzAsInNwZWxsIl0sInR5cGUiOiJyZW1vdmVIdG1sIiwidG8i
OiJzcGVsbCJ9LHsic291cmNlIjpbMCwic3BlbGxfb3JpZ2luYWwiXSwidHlwZSI6
InJlbW92ZUh0bWwiLCJ0byI6Im9yaWdpbmFsIn1dLCJjb25maWdPdmVycmlkZXMi
OltdfX0=


Thanks! I'm always amazed by what this software can do, but even more amazed by the support!
 
This works for grabbing the first suggestion from each of the two columns but does not grab all of them.
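The <div class="brs_col"> part of this regular expression is superfluous: it only lets the pattern match the <p> that comes right after each column opens, which is why you get just one suggestion per column. Use this instead: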
Code:
<p class="_e4b"><a href=.+?>(.+?)<\/a><\/p>
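The difference is easy to see by running both patterns over the same input. The Python sketch below uses a hypothetical two-column snippet (modeled on, not copied from, the 2015 "Searches related to" markup); re.findall plays the role of the "g" (global) regex type.

```python
import re

# Hypothetical markup: two columns (div.brs_col), each with several <p class="_e4b"> links.
HTML = (
    '<div class="brs_col">'
    '<p class="_e4b"><a href="/search?q=a">keyword <b>research</b></a></p>'
    '<p class="_e4b"><a href="/search?q=b">keyword <b>planner</b></a></p>'
    '</div>'
    '<div class="brs_col">'
    '<p class="_e4b"><a href="/search?q=c">keyword <b>spy</b></a></p>'
    '<p class="_e4b"><a href="/search?q=d">keyword <b>tracker</b></a></p>'
    '</div>'
)

WITH_DIV = re.compile(r'<div class="brs_col"><p class="_e4b"><a href=.+?>(.+?)</a></p>')
WITHOUT_DIV = re.compile(r'<p class="_e4b"><a href=.+?>(.+?)</a></p>')
TAG = re.compile(r"<[^>]+>")


def suggestions(pattern: re.Pattern, html: str) -> list:
    # With one capture group, findall returns the captured strings;
    # leftover <b> tags are stripped afterwards.
    return [TAG.sub("", m) for m in pattern.findall(html)]


print(suggestions(WITH_DIV, HTML))     # only the first link of each column
print(suggestions(WITHOUT_DIV, HTML))  # every link in both columns
```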
Next I'm not sure how to properly format the json file. I want to create a json file for each keyword/query that has a layout something like this:
Create a variable that holds all the data, and then output that variable as JSON:
Code:
[% result.spell = p1.spell;
result.spell_original = p1.spellorig;
result.suggestions = p1.related;
result.serp = p1.serp;
result.json() %]
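For readers who want the same idea outside Template Toolkit: the template gathers the separately parsed values into one result object, renaming spellorig to spell_original and related to suggestions, and serializes it once, so each query produces a single JSON document. A Python equivalent might look like this, with a hand-filled dict standing in for A-Parser's p1 (an assumption for illustration only).

```python
import json

# Hypothetical stand-in for A-Parser's p1: one query's parsed results.
p1 = {
    "spell": "keyword",
    "spellorig": "keywrd",
    "related": [{"suggestions": "keyword research"}, {"suggestions": "keyword planner"}],
    "serp": [
        {"link": "https://adwords.google.co.uk/KeywordPlanner", "rank": "1",
         "title": "Google AdWords: Keyword Planner - AdWords - Google"},
    ],
}


def build_result(parsed: dict) -> str:
    # Gather everything into one object using the key names from the template,
    # then serialize once so each query yields a single JSON document.
    result = {
        "spell": parsed.get("spell"),
        "spell_original": parsed.get("spellorig"),
        "suggestions": parsed.get("related", []),
        "serp": parsed.get("serp", []),
    }
    return json.dumps(result, indent=3, ensure_ascii=False)


print(build_result(p1))
```

Serializing the combined object once, instead of calling .json on each variable separately, is what keeps the file a single valid JSON document rather than several fragments on consecutive lines.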
Finally, I want to save the json file and a raw file containing the source code in different directories and then create a new directory every 5000 queries and save any failed queries to a separate file. I have code for this in another custom parser that I used below but not sure it translates to this new one.
This part of your preset is already done correctly.
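Decoded from the base64 preset, the file-name templates put successful queries under serp_json/us_N/<query>.json and serp_raw/us_N/<query>.html, where N is Math.int(query.num / 5000), and append failed queries to serp_fail/failed.txt. The integer division is what starts a new directory every 5000 queries. A Python sketch of the same path logic follows; the helper names are hypothetical, and whether query.num counts from 0 or 1 (which shifts the boundary by one) is not stated in the thread.

```python
from pathlib import Path

BUCKET_SIZE = 5000  # start a new directory every 5000 queries


def bucket_dir(query_num: int) -> str:
    # Python counterpart of "us_" _ Math.int(query.num / 5000) in the preset;
    # for non-negative numbers // truncates the same way.
    return f"us_{query_num // BUCKET_SIZE}"


def output_paths(query: str, query_num: int, success: bool) -> dict:
    """Where one query's files would go (hypothetical helper mirroring the preset)."""
    if not success:
        # Failed queries are appended, one per line, to a single shared file.
        return {"failed": Path("serp_fail") / "failed.txt"}
    sub = bucket_dir(query_num)
    return {
        "json": Path("serp_json") / sub / f"{query}.json",
        "raw": Path("serp_raw") / sub / f"{query}.html",
    }
```

For example, query number 12000 lands in us_2. A real implementation would also need to make query strings safe as file names (for example, a query containing "/"); how A-Parser handles that is not covered in this thread.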

Here is the resulting preset:

Code:
eyJwcmVzZXQiOiJodHRwOi8vYS1wYXJzZXIuY29tL3RocmVhZHMvMTc5Mi8iLCJ2
YWx1ZSI6eyJwcmVzZXQiOiJodHRwOi8vYS1wYXJzZXIuY29tL3RocmVhZHMvMTc5
Mi8iLCJwYXJzZXJzIjpbWyJOZXQ6OkhUVFAiLCJkZWZhdWx0Iix7InR5cGUiOiJv
dmVycmlkZSIsImlkIjoiZm9ybWF0cmVzdWx0IiwidmFsdWUiOiJbJSByZXN1bHQu
c3BlbGwgPSBwMS5zcGVsbDtcbnJlc3VsdC5zcGVsbF9vcmlnaW5hbCA9IHAxLnNw
ZWxsb3JpZztcbnJlc3VsdC5zdWdnZXN0aW9ucyA9IHAxLnJlbGF0ZWQ7XG5yZXN1
bHQuc2VycCA9IHAxLnNlcnA7XG5yZXN1bHQuanNvbigpICVdIn0seyJ0eXBlIjoi
Y3VzdG9tUmVzdWx0IiwicmVzdWx0IjoiZGF0YSIsInJlZ2V4IjoiPGgzIGNsYXNz
PVwiclwiPjxhIGhyZWY9XCIoLis/KVwiIG9ubW91c2Vkb3duLis/LCcoXFxkKykn
LC4rP1wiPiguKz8pPFxcL2E+IiwicmVnZXhUeXBlIjoiZyIsInJlc3VsdFR5cGUi
OiJhcnJheSIsImFycmF5TmFtZSI6InNlcnAiLCJyZXN1bHRzIjpbImxpbmsiLCJy
YW5rIiwidGl0bGUiXX0seyJ0eXBlIjoib3ZlcnJpZGUiLCJpZCI6InVzZXItYWdl
bnQiLCJ2YWx1ZSI6Ik1vemlsbGEvNS4wIChXaW5kb3dzIE5UIDYuMTsgV09XNjQ7
IHJ2OjM5LjApIEdlY2tvLzIwMTAwMTAxIEZpcmVmb3gvMzkuMCJ9LHsidHlwZSI6
Im92ZXJyaWRlIiwiaWQiOiJnb29kQ29kZSIsInZhbHVlIjoyMDB9LHsidHlwZSI6
Im92ZXJyaWRlIiwiaWQiOiJxdWVyeWZvcm1hdCIsInZhbHVlIjoiaHR0cHM6Ly93
d3cuZ29vZ2xlLmNvLnVrL3NlYXJjaD9xPSRxdWVyeSZwd3M9MCZ1dWxlPXcrQ0FJ
UUlDSU5WVzVwZEdWa0lGTjBZWFJsY3cmbnVtPTIwIn0seyJ0eXBlIjoiY3VzdG9t
UmVzdWx0IiwicmVzdWx0IjoiZGF0YSIsInJlZ2V4IjoiPGEgY2xhc3M9XCJzcGVs
bFwiLis/PiguKz8pPFxcL2E+IiwicmVnZXhUeXBlIjoicyIsInJlc3VsdFR5cGUi
OiJmbGF0IiwiYXJyYXlOYW1lIjoiIiwicmVzdWx0cyI6WyJzcGVsbCJdfSx7InR5
cGUiOiJjdXN0b21SZXN1bHQiLCJyZXN1bHQiOiJkYXRhIiwicmVnZXgiOiI8cCBj
bGFzcz1cIl9lNGJcIj48YSBocmVmPS4rPz4oLis/KTxcXC9hPjxcXC9wPiIsInJl
Z2V4VHlwZSI6ImciLCJyZXN1bHRUeXBlIjoiYXJyYXkiLCJhcnJheU5hbWUiOiJy
ZWxhdGVkIiwicmVzdWx0cyI6WyJzdWdnZXN0aW9ucyJdfSx7InR5cGUiOiJjdXN0
b21SZXN1bHQiLCJyZXN1bHQiOiJkYXRhIiwicmVnZXgiOiI8YSBjbGFzcz1cInNw
ZWxsX29yaWdcIi4rPz4oLis/KTxcXC9hPi4rIiwicmVnZXhUeXBlIjoicyIsInJl
c3VsdFR5cGUiOiJmbGF0IiwiYXJyYXlOYW1lIjoiIiwicmVzdWx0cyI6WyJzcGVs
bG9yaWciXX1dXSwicmVzdWx0c0Zvcm1hdCI6IiRwMS5wcmVzZXQiLCJyZXN1bHRz
U2F2ZVRvIjoiZmlsZSIsInJlc3VsdHNGaWxlTmFtZSI6InNlcnBfanNvbi9bJSBJ
RiBwMS5pbmZvLnN1Y2Nlc3MgPT0gMSAlXVslIFVTRSBNYXRoOyBcInVzX1wiXyBN
YXRoLmludChxdWVyeS5udW0gLyA1MDAwKSBfXCIvXCJfIHF1ZXJ5IF9cIi5qc29u
XCIgJV1bJSBFTkQgJV0iLCJhZGRpdGlvbmFsRm9ybWF0cyI6W1sic2VycF9yYXcv
WyUgSUYgcDEuaW5mby5zdWNjZXNzID09IDEgJV1bJSBVU0UgTWF0aDsgXCJ1c19c
Il8gTWF0aC5pbnQocXVlcnkubnVtIC8gNTAwMCkgX1wiL1wiXyBxdWVyeSBfXCIu
aHRtbFwiICVdWyUgRU5EICVdIiwiJHAxLmRhdGEiXSxbInNlcnBfZmFpbC9mYWls
ZWQudHh0IiwiWyUgSUYgcDEuaW5mby5zdWNjZXNzID09IDAgJV0kcXVlcnlcXG5b
JSBFTkQgJV0iXV0sInJlc3VsdHNVbmlxdWUiOiJubyIsInF1ZXJ5Rm9ybWF0Ijpb
IiRxdWVyeSJdLCJ1bmlxdWVRdWVyaWVzIjpmYWxzZSwic2F2ZUZhaWxlZFF1ZXJp
ZXMiOmZhbHNlLCJpdGVyYXRvck9wdGlvbnMiOnsib25BbGxMZXZlbHMiOmZhbHNl
LCJxdWVyeUJ1aWxkZXJzQWZ0ZXJJdGVyYXRvciI6ZmFsc2V9LCJyZXN1bHRzT3B0
aW9ucyI6eyJvdmVyd3JpdGUiOmZhbHNlfSwiZG9Mb2ciOiJubyIsImtlZXBVbmlx
dWUiOiJObyIsIm1vcmVPcHRpb25zIjpmYWxzZSwicmVzdWx0c1ByZXBlbmQiOiIi
LCJyZXN1bHRzQXBwZW5kIjoiIiwicXVlcnlCdWlsZGVycyI6W10sInJlc3VsdHNC
dWlsZGVycyI6W3sic291cmNlIjpbMCwic3BlbGwiXSwidHlwZSI6InJlbW92ZUh0
bWwiLCJ0byI6InNwZWxsIn0seyJzb3VyY2UiOlswLCJzcGVsbG9yaWciXSwidHlw
ZSI6InJlbW92ZUh0bWwiLCJ0byI6InNwZWxsb3JpZyJ9LHsic291cmNlIjpbMCxb
InJlbGF0ZWQiLCJzdWdnZXN0aW9ucyJdXSwidHlwZSI6InJlbW92ZUh0bWwiLCJh
cnJheSI6InJlbGF0ZWQiLCJ0byI6InN1Z2dlc3Rpb25zIn1dLCJjb25maWdPdmVy
cmlkZXMiOltdfX0=
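Decoded, the SERP pattern in this preset is <h3 class="r"><a href="(.+?)" onmousedown.+?,'(\d+)',.+?">(.+?)<\/a> with the "g" flag, and its three capture groups are stored as link, rank and title in the serp array. The rank appears to come from the numeric argument of Google's onmousedown tracking call. Below is a Python sketch of the same extraction; the sample HTML, including the exact rwt(...) argument layout, is a hypothetical stand-in for Google's 2015 markup.

```python
import re

# Hypothetical two-result snippet; the rwt(...) argument layout is an assumption.
SAMPLE = """\
<h3 class="r"><a href="https://adwords.google.co.uk/KeywordPlanner" onmousedown="return rwt(this,'','','','1','AFQjCNa','','0ahUKEa','','',event)">Google AdWords: Keyword Planner - AdWords - Google</a></h3>
<h3 class="r"><a href="http://keywordtool.io/" onmousedown="return rwt(this,'','','','2','AFQjCNb','','0ahUKEb','','',event)">Keyword Tool: FREE Alternative to Google Keyword Planner</a></h3>
"""

# The preset's pattern; Python needs no backslash before "/".
SERP_RE = re.compile(r"""<h3 class="r"><a href="(.+?)" onmousedown.+?,'(\d+)',.+?">(.+?)</a>""")


def parse_serp(html: str) -> list:
    # findall returns one (link, rank, title) tuple per match, like the "g" flag;
    # each tuple becomes one record, matching the "serp" array in the result.
    return [
        {"link": link, "rank": rank, "title": title}
        for link, rank, title in SERP_RE.findall(html)
    ]


for row in parse_serp(SAMPLE):
    print(row["rank"], row["title"], row["link"])
```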

Result:
{
   "serp" : [
      {
         "rank" : "1",
         "title" : "Google AdWords: Keyword Planner - AdWords - Google",
         "link" : "https://adwords.google.co.uk/KeywordPlanner"
      },
      {
         "title" : "Keyword Tool: FREE Alternative to Google Keyword Planner",
         "rank" : "2",
         "link" : "http://keywordtool.io/"
      },
      {
         "link" : "https://en.wikipedia.org/wiki/Keyword",
         "rank" : "3",
         "title" : "Keyword - Wikipedia, the free encyclopedia"
      },
      {
         "link" : "https://moz.com/beginners-guide-to-seo/keyword-research",
         "title" : "How To Do Keyword Research - The Beginners Guide ... - Moz",
         "rank" : "4"
      },
      {
         "title" : "Keyword Research Tools from Wordtracker",
         "rank" : "5",
         "link" : "http://www.wordtracker.com/"
      },
      {
         "title" : "WordStream&#39;s Free Keyword Tool | Wordstream",
         "rank" : "6",
         "link" : "http://www.wordstream.com/keywords"
      },
      {
         "title" : "What is Keyword? Webopedia",
         "rank" : "7",
         "link" : "http://www.webopedia.com/TERM/K/keyword.html"
      },
      {
         "title" : "Keyword - Word games at Royalgames.com!",
         "rank" : "8",
         "link" : "http://www.royalgames.com/games/word-games/keyword/?language=en_US"
      },
      {
         "rank" : "9",
         "title" : "Keyword Discovery - Advanced keyword research tool and ...",
         "link" : "http://www.keyworddiscovery.com/"
      },
      {
         "rank" : "10",
         "title" : "PPC Keyword Concatenation Tool / Paid Search Tools | Found",
         "link" : "https://www.found.co.uk/ppc-keyword-tool/"
      },
      {
         "title" : "Keyword | Define Keyword at Dictionary.com",
         "rank" : "11",
         "link" : "http://dictionary.reference.com/browse/keyword"
      },
      {
         "link" : "https://support.google.com/adwords/answer/1704371?hl=en",
         "title" : "How keywords work - AdWords Help",
         "rank" : "12"
      },
      {
         "rank" : "13",
         "title" : "Keywords: Definition - AdWords Help",
         "link" : "https://support.google.com/adwords/answer/6323?hl=en"
      },
      {
         "rank" : "14",
         "title" : "Using keyword insertion - AdWords Help",
         "link" : "https://support.google.com/adwords/answer/2454041?hl=en"
      },
      {
         "rank" : "15",
         "title" : "Using Keyword Planner to get keyword ideas and traffic ...",
         "link" : "https://support.google.com/adwords/answer/2999770?hl=en"
      },
      {
         "rank" : "16",
         "title" : "Keyword User Guide - HubSpot Academy",
         "link" : "http://knowledge.hubspot.com/keyword-user-guide-v2"
      },
      {
         "link" : "http://tools.seobook.com/keyword-tools/seobook/",
         "title" : "Seo Book Keyword Suggestion Tool - SEO Tools",
         "rank" : "17"
      },
      {
         "rank" : "18",
         "title" : "Keyword Eye | Visual Keyword Research &amp; Competitor Tools",
         "link" : "http://www.keywordeye.com/"
      },
      {
         "link" : "https://yoast.com/focus-keyword/",
         "title" : "The perfect focus keyword for your post or page • Yoast",
         "rank" : "19"
      },
      {
         "link" : "https://yoast.com/keyword-research-tools/",
         "rank" : "20",
         "title" : "Keyword research tools: which ones to use? • Yoast"
      }
   ],
   "spell_original" : "keywrd",
   "suggestions" : [
      {
         "suggestions" : "keyword research"
      },
      {
         "suggestions" : "keyword planner"
      },
      {
         "suggestions" : "keyword generator"
      },
      {
         "suggestions" : "keyword tool"
      },
      {
         "suggestions" : "keyword research tool free"
      },
      {
         "suggestions" : "keyword spy"
      },
      {
         "suggestions" : "keyword discovery"
      },
      {
         "suggestions" : "keyword tracker"
      }
   ],
   "spell" : "keyword"
}
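A few details of this output may matter if another program reads these files: rank is stored as a string, key order inside each object varies, and some titles still contain HTML entities such as &#39; and &amp;. A small Python post-processing step could decode the entities, sort by numeric rank and flatten the suggestions list. This is a sketch; the clean function is hypothetical and the input is two records copied from the result above.

```python
import html
import json

# Two records copied from the result above, plus two suggestions; the rest omitted.
RAW = """{
   "serp" : [
      {"title" : "WordStream&#39;s Free Keyword Tool | Wordstream", "rank" : "6",
       "link" : "http://www.wordstream.com/keywords"},
      {"rank" : "18", "title" : "Keyword Eye | Visual Keyword Research &amp; Competitor Tools",
       "link" : "http://www.keywordeye.com/"}
   ],
   "suggestions" : [{"suggestions" : "keyword research"}, {"suggestions" : "keyword planner"}]
}"""


def clean(doc: dict) -> dict:
    # Decode entities such as &#39; and &amp; in titles, then order by numeric rank.
    serp = sorted(
        ({**row, "title": html.unescape(row["title"])} for row in doc["serp"]),
        key=lambda row: int(row["rank"]),
    )
    # Turn [{"suggestions": "..."}, ...] into a plain list of strings.
    suggestions = [item["suggestions"] for item in doc["suggestions"]]
    return {**doc, "serp": serp, "suggestions": suggestions}


cleaned = clean(json.loads(RAW))
print(cleaned["serp"][0]["title"])
```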
 
Thanks! Works great. I never would have gotten the JSON and variable parts right.

Great support as always!
 