We have completed the transition to NodeJS as the main engine for scrapers and are presenting a new stable version, 1.2.912, with support for NodeJS 14.2.0. This update brings many improvements, including higher performance, lower memory consumption, a completely new network stack, and support for native NodeJS modules, which lets you use the full power of the npm registry in A-Parser.
This update also changes how ReCaptcha2 is handled in the Google scraper. Our team was one of the first to find a way to bypass the new version of reCAPTCHA, and we tested it together with the RuCaptcha service, which deserves special thanks. Correct captcha solving has so far been tested with RuCaptcha, Anti-Captcha, XEvil and CapMonster.
In addition, many optimizations were made in the A-Parser core, significantly increasing performance when using a large number of tasks or large proxy lists. The Rank::CMS scraper has been completely rewritten and stabilized, and support for the new apps.json format and for user rules has been added.
Improvements
- NodeJS updated to v14.0.0, V8 to 8.1
- Added support for the data-s parameter in reCAPTCHA for SE::Google, and added the ReCaptcha2 pass proxy option
- Increased thread limit to 10,000 for Windows OS
- Significantly improved performance with a large number of active proxies and/or jobs: the proxy stack has been completely rewritten and work with large lists has been optimized
- Added a new scraper, Rank::KeysSo
- Completely rewritten in JS: SE::Yahoo::Suggest, Rank::Alexa::API and Rank::Archive
- Improved performance when using regular expressions, as well as improved compatibility
- In SE::Google::KeywordPlanner, added automatic token retrieval
- In SE::Bing, added the ability to scrape links to cached pages, as well as the ability to scrape mobile results
- In the Util::ReCaptcha2 scraper, specifying the Provider url is now optional when the provider is CapMonster or XEvil
- In SE::Google::Trends, added the ability to specify an arbitrary date range
- In Rank::CMS, added a choice of regex engine and support for your own features file
- In SE::Yandex::ByImage, added the Don't scrape if no other sizes option, which disables result collection when the requested image is not available in other sizes
- [NodeJS] Fixed this.cookies.getAll()
- [NodeJS] Added protection against endless loops and long-running regular expressions
- [JS scrapers] Added follow_meta_refresh option for this.request
- [JS scrapers] Added bypass_cloudflare option for this.request
- [JS scrapers] Underscore replaced by Lodash
- [JS scrapers] Added a log entry when other scrapers are called
- [JS scrapers] The previous proxy is now reused after a request to another scraper
- [JS scrapers] Added destroy() method
- Many fixes in SE::Google
- Fixed SE::Youtube, including scraping by tags
- Fixed link collection in Shop::eBay
- Fixed phone number scraping in Maps::Google
- Fixed captcha handling in SE::Yandex::ByImage
- In Rank::Social::Signal, the $facebook_comment variable was removed as no longer relevant
- Fixed SE::Startpage, Rank::Linkpad, Social::Instagram::post and SE::Yandex::Translate
- Fixed a bug that caused the selected proxy checker to be ignored
- Fixed the Decode HTML entities and Extract domain functions in the Result Constructor
- Fixed problem with encoding detection
- Fixed an error when using $tools.query
- Fixed a bug in Rank::MajesticSEO where all attempts were used up when there were no results
- Fixed HTTP/2 handling
- Fixed a scraper crash caused by being unable to write to alive.txt
- Fixed captcha capture in SE::Yandex::Register and Check::RosKomNadzor
- Fixed differences between requests sent via Net::HTTP and via JS
- Fixed a bug in SE::Yahoo
- Fixed bugs in Rank::CMS when selecting an application without a category
- [NodeJS] Fixed calculation of scraper code execution time
- [JS scrapers] Fixed: the Content-Length header was not sent for POST requests with an empty body
- [JS scrapers] Fixed the CloudFlare bypass
- [JS scrapers] Fixed session handling
- [JS scrapers] Fixed overrides for this.parser.request
- [JS scrapers] Fixed encoding detection
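
Some pages redirect not with an HTTP 3xx status but with a tag such as <meta http-equiv="refresh" content="0; url=/next">, which is presumably what the follow_meta_refresh option for this.request follows. The TypeScript sketch below shows roughly what detecting such a redirect involves. metaRefreshTarget is a hypothetical helper written for illustration only, not part of the A-Parser API, and it uses simple regular expressions rather than a full HTML parser, so it is an approximation rather than a definitive implementation.

```typescript
// Hypothetical helper for illustration only; NOT part of the A-Parser API.
// Returns the absolute URL a <meta http-equiv="refresh"> tag points to,
// or null if the page contains no such redirect.
function metaRefreshTarget(html: string, baseUrl: string): string | null {
  // Scan every <meta ...> tag (regex-based, so this is a rough approximation)
  const metaTags = html.match(/<meta\b[^>]*>/gi) ?? [];
  for (const tag of metaTags) {
    // Skip meta tags that are not refresh directives (e.g. charset)
    if (!/http-equiv\s*=\s*["']?refresh["']?/i.test(tag)) continue;
    // content="N; url=..." where N is the delay in seconds
    const content = /\bcontent\s*=\s*(["'])(.*?)\1/i.exec(tag);
    if (!content) continue;
    // The target may itself be quoted: url='...' or unquoted: url=...
    const target = /url\s*=\s*["']?([^"'\s]+)/i.exec(content[2]);
    if (!target) continue;
    // Resolve relative targets such as "/next" against the page URL
    return new URL(target[1], baseUrl).toString();
  }
  return null;
}
```

For example, for a page at https://example.com/page containing <meta http-equiv="refresh" content="0; url=/next">, the helper returns https://example.com/next; a page without a refresh tag yields null.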

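The empty-body Content-Length fix matches standard HTTP practice: RFC 9110 notes that a user agent normally sends Content-Length in a POST request even when the value is 0. A minimal TypeScript sketch of building such headers, assuming a Node.js runtime (for Buffer); postHeaders is a hypothetical name used for illustration, not an A-Parser function.

```typescript
// Hypothetical helper for illustration only; NOT an A-Parser function.
// Builds POST headers so that Content-Length is always present,
// including "0" for an empty body.
function postHeaders(
  body: string,
  extra: Record<string, string> = {},
): Record<string, string> {
  return {
    // Caller-supplied headers first; Content-Length is set last so it
    // overrides an identically named entry in `extra`.
    ...extra,
    // Use the byte length, not the character count: multi-byte UTF-8
    // characters (e.g. "é") take more than one byte on the wire.
    "Content-Length": String(Buffer.byteLength(body, "utf8")),
  };
}
```

With this approach, postHeaders("") still carries "Content-Length": "0", and a body of "é" is reported as 2 bytes rather than 1 character.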