Requests to other scrapers

await this.parser.request(parser, preset, overrideParams, query)

Fetches results from another scraper (a built-in one or another JS scraper). The arguments are:

  • parser - name of the scraper (SE::Google, JS::Custom::Example)
  • preset - settings preset of the called scraper
  • overrideParams - hash of settings that override those of the called scraper
  • query - the query
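
The way the four arguments line up can be sketched with a small typed wrapper. This is a minimal sketch, not the library's typings: the ParserApi and RequestResponse interfaces, the requestOrThrow helper, and the in-memory fakeApi are all hypothetical names introduced here so the snippet runs outside A-Parser. Inside a real scraper, this.parser plays the role of api.

```typescript
// Hypothetical minimal shape of the object exposed as this.parser;
// the real typings come from 'a-parser-types'.
interface RequestResponse {
    success: number;
    [resultName: string]: unknown;
}

interface ParserApi {
    request(
        parser: string,                          // scraper name, e.g. 'SE::Google'
        preset: string,                          // preset of the called scraper
        overrideParams: Record<string, unknown>, // settings to override
        query: string | string[]                 // the query (an array in bulk mode)
    ): Promise<RequestResponse>;
}

// Calls another scraper and throws when the call did not succeed.
async function requestOrThrow(
    api: ParserApi,
    parser: string,
    query: string,
    overrideParams: Record<string, unknown> = {},
    preset = 'default'
): Promise<RequestResponse> {
    const response = await api.request(parser, preset, overrideParams, query);
    if (!response.success) {
        throw new Error(`Request to ${parser} failed for query "${query}"`);
    }
    return response;
}

// In-memory stand-in (an assumption for illustration) that echoes its arguments.
const fakeApi: ParserApi = {
    async request(parser, preset, overrideParams, query) {
        return { success: 1, echoed: { parser, preset, overrideParams, query } };
    },
};
```

In a scraper's parse method the equivalent call would be requestOrThrow(this.parser, 'SE::Google', set.query, { pagecount: 1 }), assuming this.parser matches the hypothetical interface above.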

The preset named in the preset argument is partially ignored: the useproxy, proxyChecker, and proxybannedcleanup settings are taken from the scraper that calls await this.parser.request.

overrideParams can redefine any parameter of the called scraper. The following flags are also available:

overrideParams.resultArraysWithObjects

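resultArraysWithObjects: 0 - determines the form in which the called scraper's result arrays are returned:

  • if enabled (1), arrays of objects are returned
    [{link: 'link1', anchor: 'anchor1'}, {link: 'link2', anchor: 'anchor2'}, ...]
  • if disabled (0), standard flat arrays of values are returned
    ['link1', 'anchor1', 'link2', 'anchor2', ...]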
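
The relationship between the two shapes can be illustrated with a small conversion helper. This is a sketch under the assumption that the flat form interleaves the fields in a fixed order, as in the lines above; toObjects is a hypothetical helper, not part of A-Parser.

```typescript
// Groups a flat result array into objects, taking fields.length values per object.
function toObjects(flat: string[], fields: string[]): Record<string, string>[] {
    const rows: Record<string, string>[] = [];
    if (fields.length === 0) return rows; // nothing to group by
    // Stop before a trailing, incomplete group of values.
    for (let i = 0; i + fields.length <= flat.length; i += fields.length) {
        const row: Record<string, string> = {};
        fields.forEach((field, offset) => {
            row[field] = flat[i + offset];
        });
        rows.push(row);
    }
    return rows;
}

const flatValues = ['link1', 'anchor1', 'link2', 'anchor2'];
const rows = toObjects(flatValues, ['link', 'anchor']);
// rows: [{ link: 'link1', anchor: 'anchor1' }, { link: 'link2', anchor: 'anchor2' }]
```

Setting resultArraysWithObjects: 1 makes this kind of regrouping unnecessary, since each element already carries its field names.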

Example:

import { BaseParser } from 'a-parser-types';

class JS_DocExample extends BaseParser {
    static defaultConf: typeof BaseParser.defaultConf = {
        results_format: "$links.format('$link\n')",
        results: {
            arrays: {
                links: ['Links', [
                    ['link', 'link']
                ]]
            }
        }
    }

    async parse(set, results) {
        let response = await this.parser.request('SE::Google', 'default', {
            resultArraysWithObjects: 1,
            pagecount: 1
        }, set.query)
        results.success = response.success;

        if(response.success) {
            response.serp.forEach(element => {
                results.links.push(element.link);
            });
        }

        return results;
    }
}

Sample output:

https://www.speedtest.net/
https://www.investopedia.com/terms/t/t-test.asp
https://www.cdc.gov/coronavirus/2019-ncov/testing/diagnostic-testing.html
https://fast.com/
https://www.thinkwithgoogle.com/feature/testmysite/
https://projectstream.google.com/speedtest
https://www.nhs.uk/conditions/coronavirus-covid-19/testing/
https://www.fda.gov/consumers/consumer-updates/coronavirus-disease-2019-testing-basics
https://zoom.us/test
https://www.gov.uk/get-coronavirus-test
https://en.wikipedia.org/wiki/Test_(assessment)
...

overrideParams.needData

needData: 1 - determines whether data/pages[] is passed (1) or not (0) in the response. Turning it off can be used as an optimization.

overrideParams.needResults

needResults: [ ... ] - list of the results that should be returned.

Example:

let response = await this.parser.request('SE::Bing', 'default', {
    needResults: [
        'totalcount'
    ]
}, set.query)

Result:

{"success":1,"info":{"success":1,"retries":0},"totalcount":"2130000000"}
Tip

This works the same way for API::Server::Redis.

overrideParams.skipProxySettingsInheritance

skipProxySettingsInheritance: 0 - lets you disable inheritance of the useproxy parameter by the called scraper.
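
The flags described above can be collected into one typed object. This is a sketch, assuming the value shown after each flag name is its default and that 1 for skipProxySettingsInheritance switches the inheritance off. RequestFlags, DEFAULT_FLAGS, and buildOverrideParams are hypothetical names, not part of a-parser-types.

```typescript
// Flags described in this section; the defaults are assumed from the text.
interface RequestFlags {
    resultArraysWithObjects: 0 | 1;      // 1: arrays of objects, 0: flat arrays
    needData: 0 | 1;                     // 1: include data/pages[] in the response
    skipProxySettingsInheritance: 0 | 1; // 1: do not inherit useproxy from the caller
    needResults?: string[];              // names of the results to return
}

const DEFAULT_FLAGS: RequestFlags = {
    resultArraysWithObjects: 0,
    needData: 1,
    skipProxySettingsInheritance: 0,
};

// Builds an overrideParams object from explicit flags plus any
// settings of the called scraper (for example pagecount).
function buildOverrideParams(
    flags: Partial<RequestFlags> = {},
    scraperSettings: Record<string, unknown> = {}
): Record<string, unknown> {
    return { ...scraperSettings, ...DEFAULT_FLAGS, ...flags };
}

// Ask only for totalcount and skip the page data.
const exampleOverrides = buildOverrideParams(
    { needData: 0, needResults: ['totalcount'] },
    { pagecount: 1 }
);
```

The resulting object can be passed as the third argument of this.parser.request. Spelling out the defaults is optional; it only makes the effective flag values visible at the call site.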

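Requests to scrapers that support bulk mode

Scrapers that run in bulk mode can be called as well. To do so, pass an array of queries of the form ['key1', 'key2', ...] as query.

The results of the called scraper are returned in the bulkResults array. An example of this array: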

{
    "bulkResults": [
        {
            "success": 1,
            "someArrayResult": [...],
            "someFlatResult": "...",
            "query": "key1",
            "data": "..."
        },
        {
            "success": 1,
            "someArrayResult": [...],
            "someFlatResult": "...",
            "query": "key2",
            "data": "..."
        },
        {
            "success": 1,
            "someArrayResult": [...],
            "someFlatResult": "...",
            "query": "key3",
            "data": "..."
        }
    ],
    "success": 1,
    "info": {
        "success": 1,
        "retries": 1
    }
}
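
Because every element of bulkResults carries its own query and success fields, the results can be looked up by query instead of by position. The following is a minimal sketch based on the response shape shown above. BulkItem, BulkResponse, and indexByQuery are hypothetical names, and the sample values (including the views field) are made up for illustration.

```typescript
// Shape of a bulk response as illustrated above; fields other than
// success and query depend on the called scraper.
interface BulkItem {
    success: number;
    query: string;
    [resultName: string]: unknown;
}

interface BulkResponse {
    success: number;
    bulkResults: BulkItem[];
}

// Indexes the successful bulk results by their query string.
function indexByQuery(response: BulkResponse): Record<string, BulkItem> {
    const byQuery: Record<string, BulkItem> = {};
    if (!response.success) return byQuery; // the whole request failed
    for (const item of response.bulkResults) {
        if (item.success) byQuery[item.query] = item;
    }
    return byQuery;
}

// Illustrative data only, not real scraper output.
const sampleResponse: BulkResponse = {
    success: 1,
    bulkResults: [
        { success: 1, query: 'key1', views: 120 },
        { success: 0, query: 'key2' },
    ],
};
const indexed = indexByQuery(sampleResponse);
// indexed contains an entry for 'key1' only, because 'key2' did not succeed.
```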

Example of calling another scraper in bulk mode

import { BaseParser } from 'a-parser-types';

export class JS_Example_BulkQueries extends BaseParser {
    static defaultConf: typeof BaseParser.defaultConf = {
        version: '1.0.0',
        results: {
            flat: [
                ['views', 'Views count per month']
            ]
        },
        results_format: "$query: $views\\n",
        SE_Yandex_Direct_Frequency_preset: 'default',
        bulkQueries: 10 // number of queries in one batch
    };

    static editableConf: typeof BaseParser.editableConf = [
        ['SE_Yandex_Direct_Frequency_preset', ['combobox', 'SE::Yandex::Direct::Frequency preset']]
    ];

    async parse(set, results) {
        const { success, bulkResults } = await this.parser.request(
            'SE::Yandex::Direct::Frequency',
            this.conf.SE_Yandex_Direct_Frequency_preset,
            { useAccounts: 1 },
            set.bulkQueries.map((el) => el.query) // convert the set.bulkQueries array of objects into [query1, query2, ... query10]
        );
        if(success) {
            // copy the results into results.bulkResults
            for(let query_number = 0; query_number < set.bulkQueries.length; query_number++) {
                results.bulkResults[query_number].views = bulkResults[query_number].views;
                results.bulkResults[query_number].success = bulkResults[query_number].success;
            }
        }

        return results;
    }
}