FreeAI::GoogleAI - Google AI Mode scraper

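Overview of the Google AI Mode scraper

The Google AI Mode scraper is a powerful tool for collecting structured information from one of the leading AI systems: AI Mode runs on Google's Gemini. By integrating with Google AI Mode, you get not just a list of links but up-to-date, concise, and relevant answers drawn from a wide range of sources, including academic articles, blogs, forums, and news portals.

The scraper supports natural-language queries, including clarifications, contextual questions, and nested structures. Thanks to multithreaded operation, processing speed can reach 800–1500 queries per minute. Depending on the configuration and the preset used, you can collect tens of thousands of unique text fragments and hundreds of links within minutes.

Thanks to the powerful Template Toolkit template engine, scraping results can be saved in whatever format you need. The engine can structure data as JSON, CSV, SQL, and other formats, and can filter, sort, and aggregate data on the fly.

The FreeAI::GoogleAI scraper is well suited to tasks such as competitive intelligence, collecting facts and citations, building knowledge bases, news monitoring, and topic analysis, thanks to the high quality and contextual relevance of its output.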

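Info

Google currently provides access to AI Mode only within the United States, so running the scraper requires US proxies. For the same reason, results are available in English only.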

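Collected data

  • Answer text (in HTML format)
  • Links, anchors, and snippets of the data sources

Capabilities

  • Like the SE::Google scraper, it can bypass the JavaScript check and handle ReCaptcha
  • Supports sessions for more stable and faster operation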

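Use cases

  • Collecting structured answers to topical queries for building knowledge bases, content plans, reference systems, and FAQs
  • Extracting source links with anchors and snippets, which is well suited to building lists of authoritative resources, citations, and backlink collections
  • Monitoring mentions of brands, products, or people, together with their context and sources
  • Searching for and analyzing expert opinions, trends, and insights from authoritative sources
  • Quickly checking how current and complete the information on key topics is
  • Automating competitor analysis: which resources are cited, which topics are covered, and how often
  • Supporting research and analytics projects that need accurate information aggregated from different sources
  • Any other task that needs short, accurate answers quickly, backed by real sources and logical context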

Queries

Queries should contain search phrases exactly as you would type them into the Google AI Mode search box, for example:

How to learn fast?
How to improve memory and concentration?
What is a parser?

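Results

Info

The results in this and the following examples are abridged for clarity.

By default, the answer to each query is output, for example: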

A parser is a program that <b>analyzes input data, typically text or code, and converts it into a structured format that a computer can understand and process</b>.<br />
<b>Here&#39;s a breakdown of what a parser does:</b>
<ul>
<li><b>Takes Input:</b> Parsers receive input, which can be source code, data in a specific format (like XML or JSON), interactive commands, etc.</li>
<li><b>Breaks Down Input:</b> The parser breaks the input into its constituent parts, such as keywords, operators, identifiers, and symbols, which are called tokens.</li>
...
To improve memory and concentration, you can implement a combination of lifestyle changes and cognitive exercises.<br />
<b>Lifestyle Changes:</b>
<ul>
<li><b>Prioritize Sleep:</b> Adequate and quality sleep is crucial for memory consolidation and cognitive function. Aim for 7-9 hours of sleep each night and maintain a consistent sleep schedule.</li>
...
To learn fast and retain information effectively, focus on strategies that promote active engagement, optimize brain function, and utilize science-backed learning techniques.<br />
<b>1. Enhance Cognitive Function and Focus:</b>
<ul>
<li><b>Prioritize Sleep:</b> Quality sleep is crucial for memory consolidation and improved concentration. Aim for 7-9 hours of sleep per night for optimal results.</li>
...
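
Because the answer text arrives as HTML, downstream tools that expect plain text may need a conversion step. The following is a minimal sketch of one way to do that in Python with the standard library only; it is not part of A-Parser, and the names AnswerTextExtractor and answer_to_text are hypothetical. It assumes the answer uses simple tags such as <b>, <br />, <ul>, and <li>, as in the abridged example above.

```python
from html.parser import HTMLParser


class AnswerTextExtractor(HTMLParser):
    """Collects visible text, turning <br> and <li> into line breaks."""

    def __init__(self):
        # convert_charrefs=True decodes entities such as &#39; automatically.
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "br":
            self.parts.append("\n")
        elif tag == "li":
            self.parts.append("\n- ")

    def handle_data(self, data):
        self.parts.append(data)


def answer_to_text(html_answer: str) -> str:
    """Hypothetical helper: convert one HTML answer to plain text."""
    parser = AnswerTextExtractor()
    parser.feed(html_answer)
    parser.close()
    # Collapse whitespace inside each line and drop empty lines.
    lines = (" ".join(line.split()) for line in "".join(parser.parts).splitlines())
    return "\n".join(line for line in lines if line)


# Shortened sample based on the example answer above.
sample = (
    "A parser is a program that <b>analyzes input data</b>.<br />\n"
    "<b>Here&#39;s a breakdown of what a parser does:</b>\n"
    "<ul>\n<li><b>Takes Input:</b> Parsers receive input.</li>\n</ul>"
)
print(answer_to_text(sample))
```

This sketch discards all formatting except line breaks and list items; if bold markers or nested lists matter for your use, the tag handling would need to be extended.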

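Result output options

Thanks to the built-in Template Toolkit template engine, A-Parser supports flexible result formatting, so results can be output in any form, including structured formats such as CSV or JSON.

Exporting a list of links

Result format: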

$links.format('$link\n')

Example result:

https://www.techtarget.com/searchapparchitecture/definition/parser
https://www.linkedin.com/advice/3/what-role-parser-programming-language-theory-bxxbe#:~:text=A%20parser%20in%20a%20compiler,the%20compilation%20process%20will%20stop.
https://www.huntress.com/cybersecurity-101/topics/what-is-a-parser
https://botpenguin.com/glossary/syntax-analysis
https://www.dremio.com/wiki/parsing/#:~:text=Parsing%2C%20also%20known%20as%20syntax,%2C%20processed%2C%20or%20analyzed%20effectively.
...
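
Some of the exported links carry a text fragment (the part starting with #:~:text=), so the same page can appear under several different URLs. The sketch below shows one possible post-processing step in Python, outside A-Parser: it strips fragments and removes duplicates while keeping the original order. The function name normalize_links is hypothetical.

```python
from urllib.parse import urldefrag


def normalize_links(lines):
    """Strip URL fragments and drop duplicates, keeping first-seen order."""
    seen = set()
    result = []
    for line in lines:
        url = line.strip()
        if not url:
            continue  # skip blank lines
        base, _fragment = urldefrag(url)
        if base not in seen:
            seen.add(base)
            result.append(base)
    return result


# Sample input: two URLs from the example above plus a fragment-free repeat.
exported = [
    "https://www.techtarget.com/searchapparchitecture/definition/parser",
    "https://www.dremio.com/wiki/parsing/#:~:text=Parsing%2C%20also%20known%20as%20syntax",
    "https://www.dremio.com/wiki/parsing/",
]
for link in normalize_links(exported):
    print(link)
```

Whether to drop the fragment depends on the task: it points at the exact passage that was cited, so keep it if that position matters.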

Output of links, anchors, and snippets with their positions in CSV format

Result format:

[% FOREACH item IN links; tools.CSVline(loop.count, item.link, item.anchor, item.snippet); END %]

Example result:

...
8,https://www.howtolearn.com/2021/01/9-science-backed-ways-to-learn-anything-faster/,"9 Science Backed Ways to Learn Anything Faster - HowToLearn.com","Jan 4, 2021 — Table of Contents * Break Your Learning into Manageable Portions. * Review What You&#39;re Learning Frequently. * Use Mindmaps to Summarize Information. * Take Note..."
9,https://www.linkedin.com/pulse/how-slow-learners-easily-speed-up-learning-sathya-molagoda,"How Slow Learners Easily Speed Up Learning - LinkedIn","Dec 24, 2022 — How Slow Learners Easily Speed Up Learning * Introduction. If you are a slow learner, don&#39;t worry. You can easily speed up your learning by using some tips and ..."
10,https://www.linkedin.com/pulse/5-science-backed-learning-techniques-save-time-supercharge-k-c-barr-ljqee,"5 Science-Backed Learning Techniques That Save Time and ...","Jan 16, 2025 — Offsetting forgetting is like patching those holes to retain as much water as possible. * 5 Actionable Steps: Review new material within 24 hours of learning it..."
1,https://www.huntress.com/cybersecurity-101/topics/what-is-a-parser,"What Is a Parser? A Beginner-Friendly Guide to ... - Huntress","May 27, 2025 — What&#39;s a Parser (And Why Should You Care)? ... Ever wondered what happens behind the scenes when computers process text, code, or data? That&#39;s where parsers com..."
2,https://www.huntress.com/cybersecurity-101/topics/what-is-a-parser,"What Is a Parser? A Beginner-Friendly Guide to ... - Huntress","May 27, 2025 — What&#39;s a Parser (And Why Should You Care)? ... Ever wondered what happens behind the scenes when computers process text, code, or data? That&#39;s where parsers com..."
3,"https://www.techtarget.com/searchapparchitecture/definition/parser#:~:text=In%20computer%20technology%2C%20a%20parser,necessary%20input%20has%20been%20provided.","What is a Parser? Definition, Types and Examples - TechTarget","Jul 7, 2022 — What is a parser? In computer technology, a parser is a program that&#39;s usually part of a compiler. It receives input in the form of sequential source program in..."
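
Tip

In the General result format, the Template Toolkit template engine is used to output the $links array in a FOREACH loop.

In the results file name, just change the file extension to csv.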
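
The CSV produced by the format above can be loaded with any CSV reader. As a minimal sketch, assuming the four-column order used in that format (position, link, anchor, snippet), the following Python code parses the rows and decodes HTML entities such as &#39; that appear in anchors and snippets. The function name read_sources and the dictionary keys are hypothetical.

```python
import csv
import html
import io


def read_sources(csv_text):
    """Parse rows of position, link, anchor, snippet into dictionaries."""
    rows = []
    for record in csv.reader(io.StringIO(csv_text)):
        if len(record) != 4:
            continue  # skip blank or malformed lines
        position, link, anchor, snippet = record
        rows.append({
            "position": int(position),
            "link": link,
            "anchor": html.unescape(anchor),
            "snippet": html.unescape(snippet),
        })
    return rows


# One row from the example above, with the snippet shortened for this sketch.
sample = (
    '1,https://www.huntress.com/cybersecurity-101/topics/what-is-a-parser,'
    '"What Is a Parser? A Beginner-Friendly Guide to ... - Huntress",'
    '"What&#39;s a Parser (And Why Should You Care)? ..."\n'
)
for row in read_sources(sample):
    print(row["position"], row["link"], row["snippet"])
```

In the example result the position counter restarts for each query, so a position alone does not identify a row; combine it with the link or the query if you need a unique key.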

Available settings

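| Parameter name | Default value | Description |
| --- | --- | --- |
| Results language | Auto (Based on IP) | Selects the results language (the lr= parameter) |
| Search from country | Auto (Based on IP) | Selects the country to search from (geo-dependent search, the gl= parameter) |
| Location (city) | | Search by city or region. A city can be specified as novosibirsk, russia; the full list of locations is available in Geotargets (or its copy), and the value from the Canonical Name column must be used. The correct Google domain must also be set |
| Util::ReCaptcha2 preset | default | Determines whether Util::ReCaptcha2 is used to bypass ReCaptcha |
| Util::AntiGate preset | default | Determines whether Util::AntiGate is used to bypass image captchas |
| ReCaptcha2 retries | 3 | Number of attempts to submit a ReCaptcha answer without changing the proxy |
| ReCaptcha2 pass proxy | | Allows passing the proxy (used for Google requests) and the cookies (received in Google responses) to the ReCaptcha solving service |
| Use sessions | | Saves good sessions, which makes scraping faster and reduces the number of errors |
| Don't take session | | Option not to use saved good sessions |
| Additional headers | | Allows specifying any custom request headers |
| Redirect browser max pages | 10 | Number of browser pages used to bypass the JavaScript-enabled check protection |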