1.2.912 - NodeJS update, performance improvements, adaptation to reCAPTCHA changes

Support Artur

We have completed the transition to NodeJS as the main engine for scrapers and present the new stable version 1.2.912 with support for NodeJS 14.2.0. This update combines many improvements, including higher performance, lower memory consumption, a completely new network stack, and support for native NodeJS modules, which lets you use the full range of packages from the npmjs catalog in A-Parser.

This update also changes how the Google scraper works with ReCaptcha2. Our team was among the first to find a way to bypass the new version of the reCAPTCHA, and we tested it together with the RuCaptcha service, to whom we give special thanks. At the moment, correct captcha bypass has been tested with RuCaptcha, Anti-Captcha, XEvil and CapMonster.
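The ReCaptcha2 change in the Google scraper concerns the data-s parameter (see the Improvements list). As a rough illustration of what handling that parameter involves, the sketch below pulls data-sitekey and data-s out of a captcha page so both values can be handed to a solving service. This is not A-Parser's code: the function name extractRecaptchaParams and the markup shape in the usage note are assumptions made for the example.

```javascript
// Minimal sketch, assuming the captcha page carries the values as
// double-quoted HTML attributes. extractRecaptchaParams is a hypothetical helper.
function extractRecaptchaParams(html) {
    // Find attribute `name` preceded by whitespace and return its value, or null.
    const attr = (name) => {
        const match = html.match(new RegExp('\\s' + name + '="([^"]*)"'));
        return match ? match[1] : null;
    };

    return {
        sitekey: attr('data-sitekey'), // public site key of the widget
        dataS: attr('data-s'),         // extra token that may be absent on other sites
    };
}
```

Called on markup such as `<div class="g-recaptcha" data-sitekey="KEY" data-s="TOKEN"></div>`, the helper returns both values. When the page has no data-s attribute, dataS is null and only the site key would be sent to the solver.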

In addition, many optimizations were made in the A-Parser core, and performance was significantly increased when running a large number of tasks or using large proxy lists. The Rank::CMS scraper has been completely rewritten and stabilized, with added support for the new apps.json format and for user-defined rules.

Improvements
  • NodeJS updated to v14.2.0, v8 to 8.1
  • Added support for the data-s parameter in reCAPTCHAs for SE::Google, also added the ReCaptcha2 pass proxy option
  • Increased thread limit to 10,000 for Windows OS
  • Significantly improved performance with a large number of active proxies and/or jobs: the stack for working with proxies was completely rewritten and handling of large lists was optimized
  • Added new scraper Rank::KeysSo
  • Completely rewritten in JS: SE::Yahoo::Suggest, Rank::Alexa::API and Rank::Archive
  • Improved performance when using regular expressions, as well as improved compatibility
  • In SE::Google::KeywordPlanner added automatic token retrieval
  • In SE::Bing added the ability to scrape links to cached pages, as well as the ability to scrape mobile results
  • In the scraper Util::ReCaptcha2, specifying the Provider url is now optional when the CapMonster or XEvil provider is selected
  • In SE::Google::Trends added the ability to specify an arbitrary date range
  • In Rank::CMS added a choice of regular expression engine and support for a custom file with signatures
  • In SE::Yandex::ByImage added the option Don't scrape if no other sizes, which disables the collection of results when the desired image is not available in other sizes
  • [NodeJS] Fixed this.cookies.getAll()
  • [NodeJS] Added protection against endless loops and long-running regular expressions
  • [JS scrapers] Added follow_meta_refresh option for this.request
  • [JS scrapers] Added bypass_cloudflare option for this.request
  • [JS scrapers] Underscore replaced by Lodash
  • [JS scrapers] Added a mark in the log when calling other scrapers
  • [JS scrapers] The previous proxy is now reused after a request to another scraper
  • [JS scrapers] Added destroy() method
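The two new this.request options and the destroy() method from the list above might be used roughly as follows. This is a minimal sketch rather than A-Parser's documented API: FakeScraperBase is a hypothetical stand-in so the snippet runs on plain NodeJS, and the argument order of this.request, the option values of 1, and the cleanup done in destroy() are all assumptions.

```javascript
// Hypothetical stand-in for the A-Parser runtime (assumption, not the real base class).
// It only records what was requested and returns a canned response.
class FakeScraperBase {
    constructor() {
        this.calls = [];
    }

    // Assumed signature: request(method, url, queryParams, opts)
    async request(method, url, queryParams, opts) {
        this.calls.push({ method, url, queryParams, opts });
        return { success: true, data: '<html>ok</html>' };
    }
}

class ExampleScraper extends FakeScraperBase {
    async parse(set, results) {
        const response = await this.request('GET', set.query, {}, {
            follow_meta_refresh: 1, // assumed value: follow <meta http-equiv="refresh"> redirects
            bypass_cloudflare: 1,   // assumed value: attempt the CloudFlare bypass
        });
        results.success = response.success ? 1 : 0;
        return results;
    }

    // Assumed purpose: release whatever the scraper still holds when it is shut down.
    destroy() {
        this.calls.length = 0;
    }
}
```

In a real scraper the base class and this.request come from A-Parser itself, so only the options object and the destroy() method body would carry over from this sketch.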
Corrections due to changes in search results
Corrections
  • Fixed a bug that caused the selected proxy checker to be ignored
  • Fixed the Decode HTML entities and Extract domain functions in Result Constructor
  • Fixed problem with encoding detection
  • Fixed an error when using $tools.query
  • Fixed a bug in Rank::MajesticSEO where all attempts were used up when there were no results
  • Fixed http2 handling
  • Fixed a crash that occurred when the scraper could not write to alive.txt
  • Fixed captcha handling in SE::Yandex::Register and Check::RosKomNadzor
  • Fixed the difference between requests sent via Net::HTTP and JS
  • Fixed a bug in SE::Yahoo
  • Fixed bugs in Rank::CMS when choosing an application without a category
  • [NodeJS] Fixed calculation of scraper code execution time
  • [JS scrapers] Fixed the content-length header not being sent in POST requests with an empty body
  • [JS scrapers] Fixed the CloudFlare bypass
  • [JS scrapers] Fixed session handling
  • [JS scrapers] Fixed handling of overrides for this.parser.request
  • [JS scrapers] Fixed an encoding detection error
 