
Cloudflare::Radar - Cloudflare Radar scraper


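Cloudflare Radar scraper overview

The Cloudflare Radar scraper quickly determines a website's category from its domain name.

Thanks to the built-in Template Toolkit template engine, you can save results in whatever format and structure you need: the engine lets you apply additional logic to the results and output data in various formats, including JSON, SQL, and CSV.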

Collected data

Data is collected from the radar.cloudflare.com service:

  • Website categories

Use cases

  • Determining which website category a domain belongs to

Queries

Queries must be supplied as a list of domains, for example:

a-parser.com  
yandex.ru
google.com
vk.com
facebook.com
youtube.com

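Result output examples

A-Parser supports flexible result formatting thanks to the built-in Template Toolkit template engine, which lets it output results in any form, including structured formats such as CSV or JSON.

Default output

Result format: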

$query: $categories.format('$name, ')\n

Example result, listing the category names for each domain (descriptions are not included in this format):

a-parser.com: Business, Business & Economy, 
yandex.ru: News & Media, Entertainment,
vk.com: Social Networks, Society & Lifestyle,
youtube.com: Video Streaming, Entertainment,
facebook.com: Social Networks, Society & Lifestyle,
google.com: Search Engines, Technology,
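If the default output needs to be processed further, it can be read back into a simple mapping. The following is a minimal Python sketch, not part of A-Parser; the function name `parse_default_output` is hypothetical, and it assumes that domains contain no colon and that category names contain no commas, which holds for the sample above.

```python
def parse_default_output(text):
    """Map each queried domain to its list of category names."""
    result = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # The domain is everything before the first colon.
        domain, _, rest = line.partition(":")
        # Each name is followed by ", ", so the trailing empty piece is dropped.
        categories = [c.strip() for c in rest.split(",") if c.strip()]
        result[domain.strip()] = categories
    return result


sample = """a-parser.com: Business, Business & Economy, 
yandex.ru: News & Media, Entertainment,
"""
parsed = parse_default_output(sample)
print(parsed["a-parser.com"])
```

For data whose category names may contain commas, the CSV or JSON formats are a safer choice than splitting this text output.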

Output to a CSV table

Result format:

[% FOREACH categories;
tools.CSVline(name, desc);
END %]

Example result:

Business,"Sites related to business."
"Business & Economy","Sites that are related to business, economy, finance, education, science and technology."
"Social Networks","Sites that facilitate interaction and networking between people."
"Society & Lifestyle","Sites related to lifestyle that are not included in other categories like fashion, food & drink etc."
"Social Networks","Sites that facilitate interaction and networking between people."
"Society & Lifestyle","Sites related to lifestyle that are not included in other categories like fashion, food & drink etc."
"Search Engines","Sites that allow users to search for content using keywords."
Technology,"Sites related to technology that are not included in the science category."
"News & Media","Sites related to news and media."
Entertainment,"Sites related to entertainment that are not includeded in other categories like Comic books, Audio streaming, Video streaming etc."
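Because descriptions contain commas inside quoted fields, the CSV output is best read with a real CSV parser rather than by splitting on commas. A minimal Python sketch using the standard `csv` module follows; the function name `read_categories` is hypothetical, and the sample is taken from the output above.

```python
import csv
import io


def read_categories(csv_text):
    """Return (name, description) pairs from the two-column CSV output."""
    reader = csv.reader(io.StringIO(csv_text))
    # Skip blank or malformed rows that have fewer than two columns.
    return [(row[0], row[1]) for row in reader if len(row) >= 2]


sample = (
    'Business,"Sites related to business."\n'
    '"Business & Economy","Sites that are related to business, economy, '
    'finance, education, science and technology."\n'
    'Business,"Sites related to business."\n'
)
rows = read_categories(sample)
# The same category appears once per matching domain, so deduplicate by name.
unique = dict(rows)
print(len(rows), len(unique))
```

Note that this result format does not include the queried domain, so the rows cannot be traced back to a specific site.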

Dumping results to JSON

General result format:

[% IF notFirst;
",\n";
ELSE;
notFirst = 1;
END;

obj = {};
obj.query = query;
obj.categories = [];

FOREACH item IN p1.categories;
obj.categories.push({
name = item.name
desc = item.desc
});
END;

obj.json %]

Prepend text:

[

Append text:

]
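The three pieces (prepended text, per-query template, appended text) together form one valid JSON array: the template emits a separator before every record except the first. The Python sketch below only imitates that logic for illustration; it is not how A-Parser is implemented, and `assemble_json_array` is a hypothetical name.

```python
import json


def assemble_json_array(records, prepend="[", append="]"):
    """Join records the way the template does: a ',\\n' before all but the first."""
    parts = [prepend]
    not_first = False
    for record in records:
        if not_first:
            parts.append(",\n")
        else:
            not_first = True
        parts.append(json.dumps(record))
    parts.append(append)
    return "".join(parts)


records = [
    {"query": "yandex.ru",
     "categories": [{"name": "News & Media",
                     "desc": "Sites related to news and media."}]},
    {"query": "google.com", "categories": []},
]
output = assemble_json_array(records)
decoded = json.loads(output)  # parses cleanly because no trailing comma is emitted
print(len(decoded))
```

Emitting the separator before each record rather than after it avoids a trailing comma, which would make the array invalid JSON.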

Example result:

[{"query":"yandex.ru","categories":[{"desc":"Sites related to news and media.","name":"News & Media"},{"desc":"Sites related to entertainment that are not includeded in other categories like Comic books, Audio streaming, Video streaming etc.","name":"Entertainment"}]},{"query":"google.com","categories":[{"desc":"Sites that allow users to search for content using keywords.","name":"Search Engines"},{"desc":"Sites related to technology that are not included in the science category.","name":"Technology"}]},{"query":"a-parser.com","categories":[{"desc":"Sites related to business.","name":"Business"},{"desc":"Sites that are related to business, economy, finance, education, science and technology.","name":"Business & Economy"}]}]

Tip

To use the "Prepend text" and "Append text" options in the task editor, enable "More options".
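A JSON dump produced this way can be loaded with any standard JSON parser. As a minimal Python sketch (the function name `categories_by_domain` is hypothetical, and the sample is one entry from the example result):

```python
import json


def categories_by_domain(json_text):
    """Map each query (domain) to the list of its category names."""
    data = json.loads(json_text)
    return {
        entry["query"]: [c["name"] for c in entry["categories"]]
        for entry in data
    }


sample = (
    '[{"query":"google.com","categories":['
    '{"desc":"Sites that allow users to search for content using keywords.",'
    '"name":"Search Engines"},'
    '{"desc":"Sites related to technology that are not included in the '
    'science category.","name":"Technology"}]}]'
)
mapping = categories_by_domain(sample)
print(mapping["google.com"])
```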

Available settings

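Parameter name | Default value | Description
Bypass CloudFlare with Chrome Max Pages | 10 | Maximum number of pages when bypassing CF via Chrome
Bypass CloudFlare with Chrome Headless | | If enabled, the browser window is not shown when bypassing CF via Chrome
Use session | | Saves good sessions, which allows faster scraping with fewer errors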