1.2.1076 - 3 new scrapers. Completing the transition to Node.js. Integration of puppeteer into the build

Support Artur · Dec 21, 2020

Improvements

As part of moving the main built-in scrapers to the new Node.js platform, the scrapers have been completely rewritten and updated:
Major improvements from migrating the scrapers to Node.js:
- performance increased by about 1.5 times
- the HTTP engine is unified with JavaScript scrapers, including a unified CloudFlare bypass
Added new scrapers:
In HTML::EmailExtractor added the Skip non-HTML blocks option, which disables collecting emails inside script, style tags, etc.
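As a rough illustration of what skipping non-HTML blocks means, here is a minimal sketch. It is not the HTML::EmailExtractor implementation: the function name extractEmails, the skipNonHtmlBlocks flag and the simplified email regex are all assumptions made for this example.

```javascript
// Hypothetical helper, not A-Parser code: collects emails from an HTML string.
// With skipNonHtmlBlocks enabled, <script> and <style> blocks are removed first,
// so addresses that appear only inside them are not collected.
function extractEmails(html, { skipNonHtmlBlocks = false } = {}) {
  let text = html;
  if (skipNonHtmlBlocks) {
    // Remove each <script>...</script> and <style>...</style> block entirely.
    text = text.replace(/<(script|style)\b[^>]*>[\s\S]*?<\/\1\s*>/gi, ' ');
  }
  // Deliberately simplified email pattern, for illustration only.
  const emailRe = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;
  // De-duplicate while keeping the order of first appearance.
  return [...new Set(text.match(emailRe) || [])];
}
```

With the flag off, addresses embedded in scripts and styles are returned together with the visible ones; with it on, only addresses from the remaining markup are returned.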
In SE::Google::Translate added new variables:
- $translit_orig - original text in transliteration
- $translit_translated - translated text in transliteration
- $variants.$i.text - a list of translation options for the original text
In SE::Bing updated the list of regions and languages
In Social::Instagram::Profile and Social::Instagram::Post added the ability to collect the number of video views
In SE::Yandex::Translate added the ability to disable the use of sessions
In Net::HTTP added the ability to specify user-agent for Chrome
In Rank::MOZ fixed an error that occurred when calling the scraper from the JS method this.parser.request()
In Rank::CMS added support for the new apps.json and the ability to use Net::HTTP
In Net::Whois updated support for all zones
Added the Exclude from "All" option for proxycheckers, and also changed the selection logic:
- "All" - uses all proxies selected for tasks
- a specific proxychecker - is used even if it is not selected in the task
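The selection rules above can be sketched roughly as follows. This is one reading of the changelog, not A-Parser's actual code: the function resolveCheckers and the fields selectedForTasks and excludeFromAll are hypothetical, and the sketch assumes that a proxychecker marked with Exclude from "All" is skipped only when "All" is requested.

```javascript
// Hypothetical model of proxychecker selection, for illustration only.
// checkers: [{ name, selectedForTasks, excludeFromAll }]
function resolveCheckers(requested, checkers) {
  if (requested === 'All') {
    // "All": every proxychecker selected for tasks, minus those excluded from "All".
    return checkers
      .filter((c) => c.selectedForTasks && !c.excludeFromAll)
      .map((c) => c.name);
  }
  // A specific proxychecker is used even if it is not selected for tasks.
  return checkers.filter((c) => c.name === requested).map((c) => c.name);
}
```

Under these assumptions, a proxychecker excluded from "All" can still be requested explicitly by name.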
Added support for outdated SSL versions
JS scrapers: added the tlsOpts option for this.request(), which lets you pass settings for https connections
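A minimal sketch of how such settings might be prepared. The helper legacyTlsOpts is hypothetical, and it assumes that tlsOpts accepts the same keys as the Node.js tls options (such as minVersion); the release notes do not confirm the exact set of supported keys. The call shape of this.request() in the trailing comment is likewise an assumption.

```javascript
// Hypothetical helper: builds a tlsOpts object for servers that only support old TLS.
// The minVersion key follows the Node.js tls option names; whether A-Parser
// accepts exactly these keys is an assumption.
function legacyTlsOpts(minVersion = 'TLSv1') {
  // Values that Node.js accepts for minVersion.
  const allowed = ['TLSv1', 'TLSv1.1', 'TLSv1.2', 'TLSv1.3'];
  if (!allowed.includes(minVersion)) {
    throw new RangeError(`Unsupported TLS version: ${minVersion}`);
  }
  return { minVersion };
}

// Assumed usage inside a JS scraper's parse() method (URL is a placeholder):
//   const resp = await this.request('GET', 'https://legacy.example/', {}, {
//     tlsOpts: legacyTlsOpts(),
//   });
```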
JS scrapers: updated Node.js from 14.2.0 to 14.15.0
JS scrapers: the puppeteer module is included in the A-Parser build and does not require a separate installation

Corrections due to changes in the SERP

Many different fixes in SE::Google and SE::Yandex due to changes in the SERP
In SE::Yandex removed captcha auto-recognition because the captcha type has changed
Fixed SE::Google::Translate
In HTML::EmailExtractor fixed a bug where large HTML blocks were skipped
Fixed a bug in Social::Instagram::Profile that prevented scraping more than one page
Fixed authorization in SE::Google::KeywordPlanner
In SE::Google::TrustCheck fixed detection of horizontal link blocks
In SE::Baidu fixed related keywords scraping
In Shop::Amazon fixed the collection of sellers and a bug related to the number of pages
Fixed Rank::Linkpad and removed its $links_cost variable, since the source no longer provides this metric
In Rank::Social::Signal removed the $googleplus_like variable as obsolete
In Rank::CMS fixed detection based on scripts for new apps.json
Also adapted to changes in the SERP: SE::Yandex::Translate, SE::MailRu, Rank::MajesticSEO, SE::Yandex::Direct, SE::Google::ByImage, Rank::Ahrefs, Shop::eBay, SE::Yandex::Register, SE::Seznam, Shop::Yandex::Market, SE::Dogpile, SE::Dogpile::Images, SE::Startpage, SE::Baidu, Shop::AliExpress, SE::Youtube, Rank::Social::Signal, SE::Yandex::SQI, SecurityTrails::Domain

Bug fixes

In SE::Yandex fixed the Extra query string option
Fixed the regex in HTML::EmailExtractor, which caused errors in some cases
Fixed the behavior of SE::Google::KeywordPlanner when a query returns no results
Maps::Yandex fixed and migrated to puppeteer
Fixed a bug in proxychecker selection priorities
JS scrapers: fixed follow_meta_refresh
API: fixed the rawResults parameter
