Scrape internal links and give report?

r3dn4x

A-Parser Enterprise License
I'm sure it is possible, but I am having trouble figuring out the best way to scrape all internal links on a domain, and output a report like the one given in Google Webmaster Tools.

Example:

/page-1/, 293 links
/page-2/, 192 links

Then, for each page, I'd like a breakdown of all links and the anchors they use.
 
Use the HTML::LinkExtractor parser with the Parse to level option, which follows links on the site down to the specified depth, and set the result format to the layout you want. It is also recommended to enable unique queries so the same page is not parsed twice.
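To make the shape of the report concrete, here is a minimal Python sketch of the same idea for a single page: collect every link with its anchor text, keep only those on the same host, and print a count followed by one "link anchor" line per link. This is not how A-Parser works internally. The names `AnchorCollector`, `internal_links` and `format_report` are made up for illustration, and the sketch does no crawling or fetching; it assumes you already have the page's HTML.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class AnchorCollector(HTMLParser):
    """Collect (href, anchor text) pairs from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []    # finished (href, anchor) pairs
        self._href = None  # href of the <a> currently open, if any
        self._text = []    # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse runs of whitespace inside the anchor text.
            anchor = " ".join("".join(self._text).split())
            self.links.append((self._href, anchor))
            self._href = None


def internal_links(page_url, html):
    """Return (absolute URL, anchor) pairs that stay on page_url's host."""
    collector = AnchorCollector()
    collector.feed(html)
    host = urlparse(page_url).netloc
    result = []
    for href, anchor in collector.links:
        absolute = urljoin(page_url, href)
        if urlparse(absolute).netloc == host:
            result.append((absolute, anchor))
    return result


def format_report(page_url, html):
    """Format one page as 'page: count' plus one 'link anchor' line each."""
    links = internal_links(page_url, html)
    lines = [f"{page_url}: {len(links)}"]
    lines += [f"{link} {anchor}" for link, anchor in links]
    return "\n".join(lines)
```

Calling `format_report("http://example.com/", html)` for each crawled page and concatenating the results would give a per-page breakdown similar to the one asked for above.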

Code:
eyJwcmVzZXQiOiJkZWZhdWx0IiwidmFsdWUiOnsicHJlc2V0IjoiZGVmYXVsdCIs
InBhcnNlcnMiOltbIkhUTUw6OkxpbmtFeHRyYWN0b3IiLCJkZWZhdWx0Iix7InR5
cGUiOiJvcHRpb25zIiwiaWQiOiJwYXJzZUxldmVsIiwidmFsdWUiOjN9XV0sInJl
c3VsdHNGb3JtYXQiOiIkcXVlcnk6ICRwMS5pbnRjb3VudFxcbiRwMS5pbnRsaW5r
cy5mb3JtYXQoJyRsaW5rICRjbGVhbmFuY2hvclxcbicpIiwicmVzdWx0c1NhdmVU
byI6ImZpbGUiLCJyZXN1bHRzRmlsZU5hbWUiOiIkZGF0ZWZpbGUuZm9ybWF0KCku
dHh0IiwiYWRkaXRpb25hbEZvcm1hdHMiOltdLCJyZXN1bHRzVW5pcXVlIjoibm8i
LCJxdWVyeUZvcm1hdCI6WyIkcXVlcnkiXSwidW5pcXVlUXVlcmllcyI6dHJ1ZSwi
c2F2ZUZhaWxlZFF1ZXJpZXMiOmZhbHNlLCJpdGVyYXRvck9wdGlvbnMiOnsib25B
bGxMZXZlbHMiOmZhbHNlLCJxdWVyeUJ1aWxkZXJzQWZ0ZXJJdGVyYXRvciI6ZmFs
c2UsInF1ZXJ5QnVpbGRlcnNPbkFsbExldmVscyI6ZmFsc2V9LCJyZXN1bHRzT3B0
aW9ucyI6eyJvdmVyd3JpdGUiOmZhbHNlfSwiZG9Mb2ciOiJubyIsImtlZXBVbmlx
dWUiOiJObyIsIm1vcmVPcHRpb25zIjpmYWxzZSwicmVzdWx0c1ByZXBlbmQiOiIi
LCJyZXN1bHRzQXBwZW5kIjoiIiwicXVlcnlCdWlsZGVycyI6W10sInJlc3VsdHNC
dWlsZGVycyI6W10sImNvbmZpZ092ZXJyaWRlcyI6W10sInJ1blRhc2tPbkNvbXBs
ZXRlIjpudWxsLCJ1c2VSZXN1bHRzRmlsZUFzUXVlcmllc0ZpbGUiOmZhbHNlLCJy
dW5UYXNrT25Db21wbGV0ZUNvbmZpZyI6ImRlZmF1bHQiLCJ0b29sc0pTIjoiIn19
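The code above is a base64-encoded JSON task preset, so it can be inspected before it is imported. A minimal Python sketch for doing that follows; the function name `decode_preset` is illustrative and not part of A-Parser.

```python
import base64
import json


def decode_preset(code):
    """Decode a base64 task preset (line breaks allowed) into a dict."""
    # Remove the line breaks and spaces added when the code was wrapped.
    compact = "".join(code.split())
    return json.loads(base64.b64decode(compact).decode("utf-8"))
```

Passing the block above to `decode_preset` should show, under `value`, the HTML::LinkExtractor parser with `parseLevel` set to 3, `uniqueQueries` set to true, and this result format: `$query: $p1.intcount\n$p1.intlinks.format('$link $cleananchor\n')`. In other words, each page is written as its URL and internal link count, followed by one line per internal link with its anchor text.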

Example of results:
 