API methods
ping
Checks that the server and the API are up and running.
Request example
{
"password": "pass",
"action": "ping"
}
Response example
{
"success": 1,
"data": "pong"
}
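Every method in this section is called with a JSON object that carries password, action and, for most methods, data. A minimal Python sketch of such a call is given here. The endpoint URL (http://127.0.0.1:9091/API), the use of HTTP POST, the helper names build_request and call_api, and the treatment of a missing or zero success as a failure are all assumptions; this section does not state them.

```python
import json
import urllib.request

# Assumption: the API endpoint. This section does not say where requests go.
API_URL = "http://127.0.0.1:9091/API"


def build_request(password, action, data=None):
    """Build the request object: password, action and an optional data block."""
    request = {"password": password, "action": action}
    if data is not None:
        request["data"] = data
    return request


def call_api(request, url=API_URL, timeout=60):
    """POST the request as JSON (assumed transport) and return the "data" field."""
    body = json.dumps(request).encode("utf-8")
    http_request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(http_request, timeout=timeout) as response:
        reply = json.loads(response.read().decode("utf-8"))
    if not reply.get("success"):
        # Assumption: anything other than a truthy "success" is an error.
        raise RuntimeError(f"API call failed: {reply!r}")
    return reply.get("data")


# Usage (needs a running server):
# call_api(build_request("pass", "ping"))
```

With the response shown above, the usage line would yield the string pong.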
oneRequest
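A single scraping request; any scraper and preset can be used. The response contains a string built according to the result format defined in the preset, together with the scraper's full run log.
Request example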
{
"password": "pass",
"action": "oneRequest",
"data": {
"query": "test",
"parser": "SE::Google",
"configPreset": "default",
"preset": "Pages Count use Proxy"
}
}
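Parameters:
- parser - the scraper that executes the request
- preset - the preset to use
- configPreset - the thread config to use
- query - the query itself
- rawResults - optional; if set, the response contains a results array with every result the scraper supports instead of the resultString string
- needData - optional; whether to pass data and pages in the response; used to save memory, off by default
- doLog - optional; whether to return the run log together with the result, on by default
- options - an array of additional options applied to the scraper, for example override, which overrides values from the preset
Response example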
{
"success": 1,
"data": {
"resultString": "test: 6070000000\n",
"logs": [
[
0,
1614933603,
"Parser SE::Google::0 parse query test"
],
[
0,
1614933603,
"Parse page 1"
],
...
[
0,
1614933604,
"Thread complete work"
]
]
}
}
Variables in the response:
- resultString - the result string
- logs - an array with the request execution log
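The response can be unpacked as in the sketch below, which turns each log entry into a readable line. In the sample, an entry looks like [number, timestamp, text]; reading the second element as a Unix timestamp is an assumption based on its value, and the meaning of the leading number is not described here, so the sketch drops it. The names format_log_entry and summarize are illustrative.

```python
from datetime import datetime, timezone


def format_log_entry(entry):
    """Render one log entry as "<UTC time> <remaining fields>".

    Assumption: the second element is a Unix timestamp. The leading number
    is not documented in this section and is ignored.
    """
    _, timestamp, *fields = entry
    moment = datetime.fromtimestamp(timestamp, tz=timezone.utc)
    return f"{moment:%Y-%m-%d %H:%M:%S} " + " ".join(str(f) for f in fields)


def summarize(data):
    """Return the result string without its trailing newline, plus log lines."""
    lines = [format_log_entry(entry) for entry in data.get("logs", [])]
    return data["resultString"].rstrip("\n"), lines


# Sample taken from the response above, with the elided entries left out.
sample = {
    "resultString": "test: 6070000000\n",
    "logs": [
        [0, 1614933603, "Parser SE::Google::0 parse query test"],
        [0, 1614933604, "Thread complete work"],
    ],
}
result, log_lines = summarize(sample)
```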
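Overriding scraper parameters
Any parameter of the scraper in use can be overridden in options; these values take precedence over the ones specified in the settings preset.
Request example: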
{
"password": "pass",
"action": "oneRequest",
"data": {
"parser": "SE::Google",
"preset": "default",
"query": "test",
"rawResults": 1,
"doLog": 0,
"options": [
{
"value": 1,
"type": "override",
"id": "pagecount"
},
{
"value": 10,
"type": "override",
"id": "linksperpage"
},
{
"type": "override",
"id": "useproxy",
"value": false
}
]
}
}
Response:
{
"success": 1,
"data": {
"logs": [],
"results": [
{
"ads": [],
"success": 1,
"info": {
"success": 1,
"retries": 0,
"stats": {
"success": 1,
"retries": 0,
"proxiesUsed": 0,
"requests": 0,
"queries": 1
}
},
"query": {
"first": "test",
"threadId": "0",
"lvl": 0,
"query": "test",
"queryUid": "0",
"orig": "test",
"prefered_proxy": null
},
"totalcount": "6130000000",
"serp": [
"https://www.test.de/",
"Stiftung Warentest",
"Sichern Sie sich mit der <em>test</em>.de-Flatrate freien Zugriff auf alle <em>Tests</em> und Produktfinder. Mehr erfahren: <em>test</em>.de-Flatrates. <em>Test</em> CO2 ...",
0,
...
],
"related": [
"test <b>speed</b>",
...
],
"detected_geo": "Germany",
"rich": [
"Featured snippet"
],
"misspell": 0,
"resultsCount": 11
}
]
}
}
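Writing the options array by hand is repetitive, so a small helper can generate it from a plain mapping. The sketch below rebuilds the request shown above; override_options and one_request are illustrative names, not part of the API.

```python
def override_options(values):
    """Turn {"parameter id": value} into an options array of type "override"."""
    return [
        {"type": "override", "id": option_id, "value": value}
        for option_id, value in values.items()
    ]


def one_request(password, parser, preset, query, overrides=None, **extra):
    """Build a oneRequest body; extra keywords (rawResults, doLog) go into data."""
    data = {"parser": parser, "preset": preset, "query": query, **extra}
    if overrides:
        data["options"] = override_options(overrides)
    return {"password": password, "action": "oneRequest", "data": data}


request = one_request(
    "pass", "SE::Google", "default", "test",
    overrides={"pagecount": 1, "linksperpage": 10, "useproxy": False},
    rawResults=1, doLog=0,
)
```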
bulkRequest
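A bulk scraping request; any scraper and preset can be used, and the number of threads used for scraping can be specified. The response contains a string built according to the result format defined in the preset, together with the full scraper run log for each thread.
Request example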
{
"password": "pass",
"action": "bulkRequest",
"data": {
"parser": "SE::Google",
"preset": "Pages Count no Proxy",
"configPreset": "default",
"threads": 3,
"rawResults": 1,
"queries": [
"test1",
"test2",
"test3",
"test4",
"test5"
]
}
}
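Possible variables in data:
- threads - the number of scraping threads
- queries - an array of queries
All other variables are the same as in oneRequest, except query.
Response example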
{
"success": 1,
"data": {
"logs": {
"0": {
"1": [
[
4,
1614935687,
"SE::Google::0",
"test1"
],
...
[
0,
1614935688,
"Thread complete work"
]
]
},
...logs of the other threads...
},
"results": [
{
"ads": [],
"success": 1,
"info": {
"success": 1,
"retries": 0,
"stats": {
"success": 1,
"retries": 0,
"proxiesUsed": 0,
"requests": 0,
"queries": 1
}
},
"query": {
"first": "test1",
"threadId": "0",
"lvl": 0,
"query": "test1",
"queryUid": "0",
"orig": "test1",
"prefered_proxy": null
},
"totalcount": "35800000",
"serp": [
"https://www.test1solutions.com/",
"FoamFlex200 | Oil Spill Absorbent | Oil Disaster solution",
"<em>TEST1</em> IS SOLUTION. In addition to the different models ...",
0,
...
],
"related": [
"<b>test2</b>",
...
],
"detected_geo": "Germany",
"rich": [
"Featured snippet",
"People also ask"
],
"misspell": 0,
"resultsCount": 12
},
...results for the other queries...
]
}
}
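Since this section does not say whether results arrive in the same order as queries, it is safer to key each result by the query it carries. The sketch below builds a bulkRequest body and maps every original query to its totalcount. The helper names are illustrative, and converting totalcount with int() assumes it is always a numeric string, as in the sample.

```python
def bulk_request(password, parser, preset, queries, threads, config_preset="default"):
    """Build a bulkRequest body that asks for raw results."""
    return {
        "password": password,
        "action": "bulkRequest",
        "data": {
            "parser": parser,
            "preset": preset,
            "configPreset": config_preset,
            "threads": threads,
            "rawResults": 1,
            "queries": list(queries),
        },
    }


def total_counts(data):
    """Map each original query to the totalcount reported for it."""
    return {
        # Assumption: totalcount is a numeric string, as in the sample response.
        result["query"]["orig"]: int(result["totalcount"])
        for result in data["results"]
    }


request = bulk_request(
    "pass", "SE::Google", "Pages Count no Proxy", ["test1", "test2"], threads=3
)
# Trimmed copy of the first result from the response above.
counts = total_counts(
    {"results": [{"query": {"orig": "test1"}, "totalcount": "35800000"}]}
)
```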
addTask
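Adds a task to the queue. All parameters match the ones set in the Task Editor interface.
To get a task's full configuration quickly, create the task in the Task Editor and get the JSON for the API request from there.
Request example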
{
"password": "pass",
"action": "addTask",
"data": {
"preset": "default",
"configPreset": "100 Threads",
"parsers": [
[
"SE::Bing",
"default",
{
"type": "override",
"id": "pagecount",
"value": 1
},
{
"type": "options",
"id": "parseAll",
"value": true
}
]
],
"resultsFormat": "$p1.serp.format('$link;$anchor\\n')",
"resultsSaveTo": "file",
"resultsFileName": "$datefile.format().csv",
"additionalFormats": [],
"resultsUnique": "string",
"queriesFrom": "text",
"queryFormat": [
"$query"
],
"uniqueQueries": true,
"saveFailedQueries": false,
"iteratorOptions": {
"onAllLevels": false,
"queryBuildersAfterIterator": false,
"queryBuildersOnAllLevels": false
},
"resultsOptions": {
"overwrite": false,
"writeBOM": true
},
"doLog": "db",
"limitLogsCount": "0",
"keepUnique": "No",
"moreOptions": true,
"resultsPrepend": "Link;Anchor\n",
"resultsAppend": "",
"queryBuilders": [],
"resultsBuilders": [],
"configOverrides": [],
"runTaskOnComplete": null,
"useResultsFileAsQueriesFile": false,
"runTaskOnCompleteConfig": "default",
"toolsJS": "",
"prio": 5,
"removeOnComplete": false,
"callURLOnComplete": "",
"queries": [
"test",
"bla-bla"
]
}
}
Response example
The response returns the id of the created task.
{
"success": 1,
"data": "697403"
}
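Task that takes its queries from a file
To make a task take its queries from a file, set "queriesFrom": "file" and give the paths to the query files in "queriesFile": ["queries1.txt", "queries2.txt"]. Everything else is the same as in the example above.
Request example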
{
"password": "pass",
"action": "addTask",
"data": {
"preset": "default",
"configPreset": "100 Threads",
"parsers": [
[
"SE::Google",
"default"
]
],
"resultsFormat": "$p1.preset",
"resultsSaveTo": "file",
"resultsFileName": "$datefile.format().csv",
"additionalFormats": [],
"resultsUnique": "no",
"queriesFrom": "file",
"queryFormat": [
"$query"
],
"uniqueQueries": false,
"saveFailedQueries": false,
"iteratorOptions": {
"onAllLevels": false,
"queryBuildersAfterIterator": false,
"queryBuildersOnAllLevels": false
},
"resultsOptions": {
"overwrite": false,
"writeBOM": false
},
"doLog": "no",
"limitLogsCount": "0",
"keepUnique": "No",
"moreOptions": false,
"resultsPrepend": "",
"resultsAppend": "",
"queryBuilders": [],
"resultsBuilders": [],
"configOverrides": [],
"runTaskOnComplete": null,
"useResultsFileAsQueriesFile": false,
"runTaskOnCompleteConfig": "default",
"toolsJS": "",
"prio": 5,
"removeOnComplete": false,
"callURLOnComplete": "",
"queriesFile": [
"queries/Text Document.txt"
]
}
}
Response example
{
"success": 1,
"data": "5432"
}
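Running a previously saved task
A task preset created beforehand through the interface can be run as well. In that case only the queries have to be specified. Any task parameter can also be overridden; overridden values take precedence over the ones in the preset.
Request example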
{
"password": "pass",
"action": "addTask",
"data": {
"queriesFrom": "text",
"queries": [
"google.com",
"yandex.ru"
],
"configPreset": "default",
"preset": "Analyze Domains"
}
}
Response example
{
"success": 1,
"data": "5436"
}
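A saved task preset can be launched with a very small body, and any extra parameter overrides the stored value. The sketch below shows one way to build it; preset_task is an illustrative name, and the file name domains.csv is a hypothetical override value.

```python
def preset_task(password, preset, queries, config_preset="default", **overrides):
    """Build an addTask body that runs a saved task preset on the given queries.

    Extra keyword arguments are added as task parameters; per the text above
    they take precedence over the values stored in the preset.
    """
    data = {
        "queriesFrom": "text",
        "queries": list(queries),
        "configPreset": config_preset,
        "preset": preset,
    }
    data.update(overrides)
    return {"password": password, "action": "addTask", "data": data}


request = preset_task(
    "pass",
    "Analyze Domains",
    ["google.com", "yandex.ru"],
    resultsFileName="domains.csv",  # hypothetical override of a task parameter
)
```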
removeOnRestart flag
This flag indicates that the task will be deleted when A-Parser restarts.
Request example
{
"password": "pass",
"action": "addTask",
"data": {
"queriesFrom": "text",
"queries": [
"google.com",
"yandex.ru"
],
"configPreset": "default",
"preset": "Analyze Domains",
"removeOnRestart": 1
}
}
Response example
{
"success": 1,
"data": "5437"
}
removeOnComplete flag
This flag indicates that the task will be deleted once it completes.
Request example
{
"password": "pass",
"action": "addTask",
"data": {
"queriesFrom": "text",
"queries": [
"google.com",
"yandex.ru"
],
"configPreset": "default",
"preset": "Analyze Domains",
"removeOnComplete": 1
}
}
Response example
{
"success": 1,
"data": "5438"
}
info
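Returns general information about the state of A-Parser and a list of all available scrapers.
The latest A-Parser version numbers can be fetched directly from: https://a-parser.com/members/versions
Example: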
{"lastBetaVersion":"1.2.1484","lastAlphaVersion":"1.2.1484","lastVersion":"1.2.1432"}
Request example
{
"password": "pass",
"action": "info"
}
Response example
{
"success": 1,
"data": {
"tasksInQueue": 0,
"pid": "13968",
"activeProxyCheckerThreads": 0,
"workingTasks": 0,
"activeThreads": 0,
"version": "1.2.1151",
"availableParsers": [
"API::Server::Redis",
"Check::BackLink",
"Check::RosKomNadzor",
"DeepL::Translator",
"GooglePlay::Apps",
"HTML::EmailExtractor",
...
"Util::AntiGate",
"Util::ReCaptcha2",
"Util::YandexRecognize"
]
}
}
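The version field returned by info can be compared with the versions listing to decide whether an update is worthwhile. The sketch below compares dotted versions numerically rather than as strings. The helper names are illustrative, and it assumes every version part is an integer, as in the samples.

```python
import json


def version_tuple(version):
    """Convert "1.2.1484" into (1, 2, 1484) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def update_available(installed, versions_json, channel="lastVersion"):
    """Return True if the chosen channel offers a newer version than installed."""
    versions = json.loads(versions_json)
    return version_tuple(versions[channel]) > version_tuple(installed)


# The versions listing from the example above.
versions_json = (
    '{"lastBetaVersion":"1.2.1484","lastAlphaVersion":"1.2.1484",'
    '"lastVersion":"1.2.1432"}'
)
needs_update = update_available("1.2.1151", versions_json)
```

A plain string comparison would misorder versions such as 1.2.999 and 1.2.1432, which is why the parts are converted to integers first.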
getParserPreset
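Gets the settings of the specified scraper and preset.
Use this method to obtain the full list of parameters that can be used in other API requests.
Request example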
{
"password": "pass",
"action": "getParserPreset",
"data": {
"parser": "SE::Google",
"preset": "default"
}
}
Response example
{
"success": 1,
"data": {
"queryformat": "$query",
"parsenotfound": 1,
"reCaptchaRetries": 3,
"pagecount": 5,
"gl": "",
"proxyChecker": "*",
"hl": "en",
"domain": "www.google.com",
"timeout": 60,
"Util_ReCaptcha2_preset": "default",
"useproxy": 1,
"nfpr": 0,
"extraquery": "",
"serptime": "all",
"location": "",
"usesessions": 1,
"filter": 1,
"linksperpage": 100,
"dontTakeSession": 0,
"addHeaders": "",
"serp": "",
"proxyretries": 10,
"device": "desktop",
"requestdelay": 0,
"debug_nonexists_domains": 0,
"proxybannedcleanup": 600,
"formatresult": "$serp.format('$link\\n')",
"reCaptchaPassProxy": 0,
"lr": ""
}
}
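One possible use of this response is to send overrides only for parameters that actually differ from the preset. The sketch below assumes that the keys returned by getParserPreset are the same ids accepted by override options in oneRequest (the samples agree for pagecount, linksperpage and useproxy, but this section does not state it as a rule). overrides_for is an illustrative name.

```python
def overrides_for(preset, desired):
    """Build override options only for parameters whose desired value differs."""
    unknown = sorted(set(desired) - set(preset))
    if unknown:
        # Guard against typos: every id must exist in the preset.
        raise KeyError(f"not in preset: {', '.join(unknown)}")
    return [
        {"type": "override", "id": name, "value": value}
        for name, value in desired.items()
        if preset[name] != value
    ]


# A few values copied from the response above.
preset = {"pagecount": 5, "linksperpage": 100, "useproxy": 1, "hl": "en"}
options = overrides_for(preset, {"pagecount": 1, "linksperpage": 100, "hl": "de"})
```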
getProxies
Requests the list of live proxies. Returns the live proxies from all proxy checkers.
Request example
{
"password": "pass",
"action": "getProxies"
}
Response example
{
"success": 1,
"data": {
"127.0.0.1:23486": [
"socks"
],
"127.0.0.1:23140": [
"socks"
],
"127.0.0.1:21971": [
"http"
]
}
}
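Each proxy's IP:port is used as the name (key) of its array. The first element of the array is the proxy type, which takes one of 3 values: http, socks, socks4. If username/password authentication is specified, the second and third elements are the username and the password.
The proxy list can also be requested from specific proxy checkers only. To do so, pass an additional checkers array.
Example: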
{
"password": "pass",
"action": "getProxies",
"data": {
"checkers": [
"Elite proxies",
"free proxies"
]
}
}
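The keyed-array layout described above can be flattened into records that are easier to work with. In the sketch below, the Proxy type and parse_proxies are illustrative names, and the authenticated entry used in the usage lines is made-up sample data.

```python
from typing import NamedTuple, Optional


class Proxy(NamedTuple):
    host: str
    port: int
    kind: str  # "http", "socks" or "socks4"
    username: Optional[str] = None
    password: Optional[str] = None


def parse_proxies(data):
    """Convert the getProxies "data" object into a list of Proxy records."""
    proxies = []
    for address, fields in data.items():
        host, _, port = address.rpartition(":")
        kind, *credentials = fields
        # Pad with None so proxies without authentication also unpack cleanly.
        username, password = (credentials + [None, None])[:2]
        proxies.append(Proxy(host, int(port), kind, username, password))
    return proxies


proxies = parse_proxies({
    "127.0.0.1:23486": ["socks"],
    "10.0.0.1:8080": ["http", "user", "secret"],  # made-up entry with credentials
})
```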
getTaskState
Gets the state of a task by its id.
Request example
{
"password": "pass",
"action": "getTaskState",
"data": {
"taskUid": "181"
}
}
Response example
{
"success": 1,
"data": {
"status": "completed",
"stats": "<b>Overall stats</b><br>Runtime: 0:00:19<br>HTTP requests: 464<br><br><b>1. HTML::LinkExtractor</b><br>Queries done: 254<br>Successful queries: 252<br>Proxies used: 0 (per query)<br>Retries used: 1.07 (per query)<br>HTTP requests: 1.82 (per query)",
"state": {
"totalFail": 2,
"totalWaitProxyThreads": 0,
"minimized": 0,
"queriesDoneCount": 254,
"avgSpeed": 802,
"activeThreads": 0,
"startTime": 1507281122,
"changeTime": 1507281141,
"queriesCount": 1,
"logExists": 0,
"runTime": 19,
"uniqueResultsCount": 656,
"requests": "464",
"addTime": 1507281120,
"additionalCount": 253,
"queriesDoneCountAtStart": 0,
"lastQuery": "https://www.nytimes.com/ref/membercenter/help/infoservdirectory.html",
"curSpeed": 846,
"started": 1,
"resultsCount": 31079
}
}
}
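The response returns the task status (status) and its statistics (state).
Information about several tasks can be requested at once by passing an array of ids.
Request example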
{
"password": "pass",
"action": "getTaskState",
"data": {
"taskUid": [
"22",
"23",
"31"
]
}
}
In that case the response is an array with the data for each task.
Response example
{
"success": 1,
"data": [
{
"status": "completed",
"stats": "<b>Overall stats</b><br>Runtime: 0:00:01<br>HTTP requests: 0<br><br><b>1. SE::Bing</b><br>Queries done: 0<br>Successful queries: 0<br>Proxies used: 0 (per query)<br>Retries used: 0 (per query)<br>HTTP requests: 0 (per query)",
"state": {
"totalFail": 0,
"totalWaitProxyThreads": 0,
"minimized": 0,
"queriesDoneCount": 0,
"avgSpeed": 0,
"activeThreads": 0,
"startTime": 1507023540,
"changeTime": 1507023541,
"queriesCount": 1,
"logExists": 1,
"runTime": 1,
"uniqueResultsCount": 0,
"requests": 0,
"addTime": 1507023443,
"additionalCount": 0,
"queriesDoneCountAtStart": 0,
"lastQuery": "none",
"curSpeed": 0,
"started": 1,
"resultsCount": 0
}
},
...data for the other tasks...
]
}
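A common pattern is to add a task and then poll getTaskState until it finishes. The sketch below waits for the status completed, the only status value that appears in this section; other status values are not listed here, so the loop relies on a timeout rather than trying to recognise them. The function name and the fetch_state callback are illustrative, and the callback is expected to perform the actual API call.

```python
import time


def wait_until_completed(fetch_state, task_uid, interval=5.0, timeout=3600.0,
                         sleep=time.sleep, clock=time.monotonic):
    """Poll getTaskState until the status is "completed" and return its data.

    fetch_state(task_uid) must return the "data" object of a getTaskState
    response for a single task; how it talks to the API is up to the caller.
    """
    deadline = clock() + timeout
    while True:
        data = fetch_state(task_uid)
        if data["status"] == "completed":
            return data
        if clock() >= deadline:
            raise TimeoutError(f"task {task_uid} still {data['status']!r}")
        sleep(interval)
```

Passing sleep and clock as parameters keeps the loop easy to test without real delays.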
getTaskConf
Gets the configuration of a task by its id.
Request example
{
"password": "pass",
"action": "getTaskConf",
"data": {
"taskUid": "181"
}
}
Response example
The response returns the task settings, including the final file name.
{
"success": 1,
"data": {
"parsers": [
[
"SE::Bing",
"default",
{
"value": 1,
"type": "override",
"id": "pagecount"
}
]
],
"resultsFileName": "Mar-05_13-12-23.txt",
"runTaskOnComplete": null,
"limitLogsCount": "0",
"resultsPrepend": "",
"origResultsFileName": "$datefile.format().txt",
"queriesFrom": "text",
"runTaskOnCompleteConfig": "default",
"doLog": "db",
"useResultsFileAsQueriesFile": 0,
"additionalFormats": [],
"resultsSaveTo": "file",
"callURLOnComplete": "",
"resultsFormat": "$p1.related.format('$key\\n')",
"queryBuilders": [],
"preset": "default",
"resultsAppend": "",
"uniqueQueries": 0,
"keepUnique": 0,
"prio": "5",
"saveFailedQueries": 0,
"configPreset": "100 Threads",
"queries": [
"test"
],
"toolsJS": "",
"moreOptions": 0,
"resultsBuilders": [],
"resultsUnique": "string",
"iteratorOptions": {
"onAllLevels": 0,
"queryBuildersAfterIterator": 0,
"queryBuildersOnAllLevels": 0
},
"removeOnComplete": 0,
"queryFormat": [
"$query"
],
"configOverrides": [],
"resultsOptions": {
"overwrite": 0,
"writeBOM": 0
}
}
}
getTaskResultsFile
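Gets a download link for the results by task id. The link can be used to download the file only once and requires no authentication (it uses a one-time token).
Works only for static file names and for $datefile.format(). To turn a dynamic results file name into a static one, use the template engine flag isStaticTemplate()
Request example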
{
"password": "pass",
"action": "getTaskResultsFile",
"data": {
"taskUid": "181"
}
}
Response example
{
"success": 1,
"data": "http://127.0.0.1:9091/downloadResults?fileName=Mar-05_13-12-23.txt&token=wbvwlkes"
}
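Because the link works only once, it makes sense to stream the file straight to disk on the first request. The sketch below reads the file name from the fileName query parameter seen in the sample link and saves the download under that name. Both function names are illustrative, and relying on the fileName parameter being present is an assumption drawn from the single example above.

```python
import os
import shutil
import urllib.request
from urllib.parse import parse_qs, urlsplit


def results_file_name(download_url):
    """Read the fileName query parameter out of the one-time download link."""
    query = parse_qs(urlsplit(download_url).query)
    # basename() keeps a crafted value from pointing outside the target folder.
    return os.path.basename(query["fileName"][0])


def download_results(download_url, target_path=None):
    """Stream the file to disk; the link works once, so do not retry blindly."""
    target_path = target_path or results_file_name(download_url)
    with urllib.request.urlopen(download_url) as response, \
            open(target_path, "wb") as target:
        shutil.copyfileobj(response, target)
    return target_path


name = results_file_name(
    "http://127.0.0.1:9091/downloadResults?fileName=Mar-05_13-12-23.txt&token=wbvwlkes"
)
```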
getTasksList
Gets the list of active tasks. If the additional parameter completed: 1 is passed, the list of completed tasks is returned instead.
Request example
{
"password": "pass",
"action": "getTasksList",
"data": {
"completed": "1"
}
}
Response example
{
"success": 1,
"data": [
"2291",
"2324",
"2331",
"2384",
"2398",
"2434",
"2445",
"3482",
...
]
}
getParserInfo
Lists all available results that the specified scraper can return.
Request example
{
"password" : "pass",
"action" : "getParserInfo",
"data" : {
"parser" : "SE::Google"
}
}
Response example
{
"success": 1,
"data": {
"results": {
"arrays": {
"ads": [
"Ads list",
[
[
"link",
"Link"
],
...
]
],
"related": [
"Related keywords",
[
[
"key",
"Key"
]
]
],
"rich": [
"Rich snippets list",
[
[
"name",
"Name"
]
]
],
"serp": [
"Main serp list",
[
[
"link",
"Link"
],
...
]
],
"pages": [
"Raw data array",
[
[
"data",
"Raw data"
]
]
]
},
"flat": [
[
"query",
"Formatted query"
],
...
]
}
}
}
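The nested layout of this response can be reduced to the names that matter when writing a result format: the flat variables and, for each array, its fields. The sketch below assumes the structure seen in the sample, where each array maps to [description, [[field, title], ...]] and flat is a list of [name, title] pairs. result_variables is an illustrative name.

```python
def result_variables(data):
    """Return (flat variable names, {array name: [field names]})."""
    results = data["results"]
    flat = [name for name, _title in results["flat"]]
    arrays = {
        array_name: [field for field, _title in fields]
        for array_name, (_description, fields) in results["arrays"].items()
    }
    return flat, arrays


# Entries copied from the response above; elided items are left out.
flat, arrays = result_variables({
    "results": {
        "arrays": {
            "related": ["Related keywords", [["key", "Key"]]],
            "rich": ["Rich snippets list", [["name", "Name"]]],
        },
        "flat": [["query", "Formatted query"]],
    }
})
```

For instance, the related array with its key field is what the format $p1.related.format('$key\\n') in the getTaskConf sample refers to.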
getAccountsCount
Gets the number of active Yandex accounts.
Request example
{
"password": "pass",
"action": "getAccountsCount"
}
Response example
{
"success": 1,
"data": {
"SE::Yandex": 18
}
}
deleteTaskResultsFile
Deletes the results file of a task by its id.
Request example
{
"password": "pass",
"action": "deleteTaskResultsFile",
"data": {
"taskUid": "181"
}
}
Response example
{
"success": 1
}
changeTaskStatus
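Changes the status of a task by its id. A task can be switched to one of the following 4 statuses:
- starting - start the task
- pausing - pause the task
- stopping - stop the task
- deleting - delete the task
Request example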
{
"password": "pass",
"action": "changeTaskStatus",
"data": {
"taskUid": "181",
"toStatus": "deleting"
}
}
Response example
{
"success": 1
}
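Since only four status values are accepted, a client can check the value before sending the request. A minimal sketch, in which change_task_status and TASK_STATUSES are illustrative names:

```python
# The four target statuses listed above.
TASK_STATUSES = ("starting", "pausing", "stopping", "deleting")


def change_task_status(password, task_uid, to_status):
    """Build a changeTaskStatus body, rejecting statuses that are not listed."""
    if to_status not in TASK_STATUSES:
        raise ValueError(
            f"toStatus must be one of {TASK_STATUSES}, got {to_status!r}"
        )
    return {
        "password": password,
        "action": "changeTaskStatus",
        "data": {"taskUid": str(task_uid), "toStatus": to_status},
    }


request = change_task_status("pass", 181, "deleting")
```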
changeProxyCheckerState
Changes the state of a proxy checker (1 - on / 0 - off).
Request example
{
"password": "pass",
"action": "changeProxyCheckerState",
"data": {
"checker": "proxychecker name",
"state": 1
}
}
Response example
{
"success": 1
}
moveTask
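Moves a task within the queue by its id. Possible directions:
- start - move to the beginning of the queue
- end - move to the end of the queue
- up - move up one position
- down - move down one position
Request example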
{
"password": "pass",
"action": "moveTask",
"data": {
"taskUid": "181",
"direction": "start"
}
}
Response example
{
"success": 1
}
update
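Updates the A-Parser executable to the latest available version. After the command is sent, A-Parser restarts automatically. The API returns a success response once the executable has been downloaded and updated, which can take 1-3 minutes.
Request example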
{
"password": "pass",
"action": "update"
}
Response example
{
"success": 1
}