Improvements
- As part of moving the main built-in scrapers to the new Node.js platform, these scrapers have been completely rewritten and updated:
- Main improvements from migrating these scrapers to Node.js:
- performance increased by approximately 1.5 times
- HTTP engine unified with the JavaScript scrapers, including a unified CloudFlare bypass
- Added new scrapers:
- In HTML::EmailExtractor, added the Skip non-HTML blocks option, which disables collecting emails inside script, style, and similar tags
- In SE::Google::Translate, added new variables:
- $translit_orig - original text in transliteration
- $translit_translated - translated text in transliteration
- $variants.$i.text - a list of translation options for the original text
- In SE::Bing, updated the list of regions and languages
- In Social::Instagram::Profile and Social::Instagram::Post, added the ability to collect the number of video views
- In SE::Yandex::Translate, added the ability to disable the use of sessions
- In Net::HTTP, added the ability to specify the user-agent for Chrome
- In Rank::MOZ, fixed the error that occurred when the scraper was called from the JS method this.parser.request()
- In Rank::CMS, added support for the new apps.json and the ability to use Net::HTTP
- In Net::Whois, updated support for all zones
- Added the Exclude from "All" option for proxycheckers, and changed the selection logic:
- "All" - uses all proxies selected for tasks
- specific proxychecker - uses it even if it is not selected in the task
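One possible reading of the two rules above is sketched below. The data model (a `name`, a `proxies` list, and an `excludeFromAll` flag per proxychecker) and the `resolveProxies` helper are hypothetical and exist only for illustration; they are not A-Parser's actual implementation.

```javascript
// Hypothetical model: each proxychecker has a name, a list of proxies, and an
// excludeFromAll flag that mirrors the Exclude from "All" option.
function resolveProxies(selection, checkers) {
  if (selection === 'All') {
    // "All": take proxies from every checker that is not excluded from "All".
    return checkers
      .filter((checker) => !checker.excludeFromAll)
      .flatMap((checker) => checker.proxies);
  }
  // A specific proxychecker is used as requested, regardless of the flag.
  const checker = checkers.find((c) => c.name === selection);
  return checker ? checker.proxies.slice() : [];
}
```

Under this assumption, a proxychecker marked as excluded never contributes to "All" but can still be used when it is named explicitly.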
- Added support for outdated SSL versions
- JS scrapers: added the tlsOpts option for this.request(), which lets you pass settings for https connections
- JS scrapers: updated Node.js from 14.2.0 to 14.15.0
- JS scrapers: the puppeteer module is included in the A-Parser build and does not require a separate installation
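A minimal sketch of how the tlsOpts option for this.request() might be passed from a JS scraper. It assumes that tlsOpts accepts the same fields as the Node.js `tls.connect()` options (these notes do not confirm this) and that options go in the fourth argument of this.request(); `fetchLegacyHttps` is a hypothetical helper, and `parser` stands for the scraper instance (`this`).

```javascript
// Assumption: tlsOpts takes the same fields as Node.js tls.connect() options.
// minVersion: 'TLSv1' lowers the minimum accepted protocol version, which is
// one way to reach servers that only support outdated TLS.
const legacyTlsOpts = { minVersion: 'TLSv1' };

// Hypothetical helper: `parser` is the scraper instance that exposes request().
async function fetchLegacyHttps(parser, url) {
  // Assumed argument order: method, URL, query parameters, options.
  return parser.request('GET', url, {}, { tlsOpts: legacyTlsOpts });
}
```

Inside a scraper this would be called as `await fetchLegacyHttps(this, set.query)`; lowering the minimum TLS version weakens connection security, so it is best limited to hosts that need it.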
- Many fixes in SE::Google and SE::Yandex due to changes in the SERP
- In SE::Yandex, removed automatic captcha recognition because the captcha type changed
- Fixed SE::Google::Translate
- In HTML::EmailExtractor, fixed a bug where large HTML blocks were skipped
- Fixed a bug in Social::Instagram::Profile that prevented more than one page from being scraped
- Fixed authorization in SE::Google::KeywordPlanner
- In SE::Google::TrustCheck, fixed detection of horizontal link blocks
- In SE::Baidu, fixed scraping of related keywords
- In Shop::Amazon, fixed collection of sellers and a bug related to the number of pages
- Fixed Rank::Linkpad and removed its $links_cost variable, since the source no longer provides this indicator
- In Rank::Social::Signal, removed the $googleplus_like variable because it is obsolete
- In Rank::CMS, fixed script-based detection for the new apps.json
- Also adapted to changes in the SERP:
SE::Yandex::Translate,
SE::MailRu,
Rank::MajesticSEO,
SE::Yandex::Direct,
SE::Google::ByImage,
Rank::Ahrefs,
Shop::eBay,
SE::Yandex::Register,
SE::Seznam,
Shop::Yandex::Market,
SE::Dogpile,
SE::Dogpile::Images,
SE::Startpage,
SE::Baidu,
Shop::AliExpress,
SE::Youtube,
Rank::Social::Signal,
SE::Yandex::SQI,
SecurityTrails::Domain
- In SE::Yandex, fixed the Extra query string option
- Fixed the regex in HTML::EmailExtractor to correct errors in some cases
- Fixed the behavior of SE::Google::KeywordPlanner when a query returns no results
- Maps::Yandex fixed and migrated to puppeteer
- Fixed a bug in the priorities for choosing a proxychecker
- JS scrapers: fixed follow_meta_refresh
- API: fixed the rawResults parameter
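The idea behind the Skip non-HTML blocks option of HTML::EmailExtractor can be illustrated with a rough sketch: remove script and style blocks first, then search the remaining markup for addresses. Both regular expressions and the `extractEmails` helper are simplified assumptions for illustration, not the scraper's actual patterns.

```javascript
// Simplified patterns for illustration; not the scraper's actual regexes.
const NON_HTML_BLOCKS = /<(script|style)\b[^>]*>[\s\S]*?<\/\1\s*>/gi;
const EMAIL = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;

// Hypothetical helper: returns unique addresses found in the given HTML.
function extractEmails(html, skipNonHtmlBlocks = true) {
  // With the option on, script/style blocks are replaced by a space so that
  // addresses appearing only inside them are never seen by the email regex.
  const text = skipNonHtmlBlocks ? html.replace(NON_HTML_BLOCKS, ' ') : html;
  return [...new Set(text.match(EMAIL) || [])];
}
```

With the flag disabled, the same function also picks up addresses embedded in inline scripts, which is the behavior the option is meant to suppress.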
