FreeAI::GoogleAI - Google AI mode scraper

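Overview of the Google AI mode scraper
The Google AI mode scraper is a powerful tool for collecting structured information from one of the leading AI systems, since AI mode runs on Google's Gemini. By integrating with Google AI mode, you get not just a list of links but up-to-date, concise, and relevant answers drawn from a large number of sources, including academic articles, blogs, forums, and news portals.
The scraper supports natural-language queries, including clarifications, contextual questions, and nested structures. Thanks to multithreaded operation, processing speed can reach 800–1500 queries per minute. Depending on the configuration and the preset used, you can obtain tens of thousands of unique text fragments and hundreds of links within a few minutes.
Thanks to the powerful Template Toolkit engine, scraping results can be saved in any format you need: the engine can structure data as JSON, CSV, SQL, and other formats, and can apply filtering, sorting, and aggregation on the fly.
The FreeAI::GoogleAI scraper is well suited to competitive intelligence, fact and citation gathering, knowledge base creation, news monitoring, and topic analysis, thanks to the high quality and contextual relevance of its output.
Google currently offers AI mode only within the United States, so running the scraper requires US proxies. For the same reason, results are available only in English.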
Collected data
- Answer text (in HTML format)
- Links, anchors, and snippets of the data sources
Capabilities
- Like the SE::Google scraper, supports bypassing the JS check and handling ReCaptcha
- Supports sessions for more stable and faster operation
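Use cases
- Collecting structured answers to topical queries for building knowledge bases, content plans, reference systems, and FAQs
- Extracting source links with anchors and snippets: well suited to building lists of authoritative resources, gathering citations, and collecting backlinks
- Monitoring mentions of brands, products, or people, together with their context and sources
- Searching for and analyzing expert opinions, trends, and insights from authoritative sources
- Quickly checking how current and complete the information on key topics is
- Automating competitor analysis: which resources are cited, which topics are covered, and how often
- Supporting research and analytics projects that need accurate information aggregated from different sources
- Any other task that requires short, accurate answers with real source attribution and logical context, obtained quickly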
Queries
Specify the search phrase as the query, exactly as you would type it into the Google AI mode search box, for example:
How to learn fast?
How to improve memory and concentration?
What is a parser?
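Results
The results in this and the following examples are abridged for readability.
By default, the answer to each query is output, for example: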
A parser is a program that <b>analyzes input data, typically text or code, and converts it into a structured format that a computer can understand and process</b>.<br />
<b>Here's a breakdown of what a parser does:</b>
<ul>
<li><b>Takes Input:</b> Parsers receive input, which can be source code, data in a specific format (like XML or JSON), interactive commands, etc.</li>
<li><b>Breaks Down Input:</b> The parser breaks the input into its constituent parts, such as keywords, operators, identifiers, and symbols, which are called tokens.</li>
...
To improve memory and concentration, you can implement a combination of lifestyle changes and cognitive exercises.<br />
<b>Lifestyle Changes:</b>
<ul>
<li><b>Prioritize Sleep:</b> Adequate and quality sleep is crucial for memory consolidation and cognitive function. Aim for 7-9 hours of sleep each night and maintain a consistent sleep schedule.</li>
...
To learn fast and retain information effectively, focus on strategies that promote active engagement, optimize brain function, and utilize science-backed learning techniques.<br />
<b>1. Enhance Cognitive Function and Focus:</b>
<ul>
<li><b>Prioritize Sleep:</b> Quality sleep is crucial for memory consolidation and improved concentration. Aim for 7-9 hours of sleep per night for optimal results.</li>
...
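Because the answer text arrives as HTML, downstream tools that expect plain text need a conversion step. The following is a minimal post-processing sketch in Python that runs outside A-Parser; the class and function names (`AnswerTextExtractor`, `answer_to_text`) are hypothetical, and the set of tags treated as line breaks is an assumption based only on the tags visible in the examples above.

```python
from html.parser import HTMLParser


class AnswerTextExtractor(HTMLParser):
    """Collects visible text, starting a new line at each block-level tag."""

    # Assumption: these are the only tags that should start a new line.
    BLOCK_TAGS = {"br", "li", "ul", "ol", "p"}

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in self.BLOCK_TAGS:
            self.parts.append("\n")
        if tag == "li":
            self.parts.append("- ")  # mark list items with a dash

    def handle_data(self, data):
        self.parts.append(data)


def answer_to_text(html_answer: str) -> str:
    """Convert one HTML answer into plain text without empty lines."""
    parser = AnswerTextExtractor()
    parser.feed(html_answer)
    parser.close()
    lines = [line.strip() for line in "".join(parser.parts).splitlines()]
    return "\n".join(line for line in lines if line)
```

The standard-library parser tolerates unclosed tags, so truncated answers are still converted rather than rejected.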
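Result output variants
Thanks to the built-in Template Toolkit engine, A-Parser supports flexible result formatting, so results can be output in any form, including structured formats such as CSV or JSON.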
Exporting a list of links
Result format:
$links.format('$link\n')
Example result:
https://www.techtarget.com/searchapparchitecture/definition/parser
https://www.linkedin.com/advice/3/what-role-parser-programming-language-theory-bxxbe#:~:text=A%20parser%20in%20a%20compiler,the%20compilation%20process%20will%20stop.
https://www.huntress.com/cybersecurity-101/topics/what-is-a-parser
https://botpenguin.com/glossary/syntax-analysis
https://www.dremio.com/wiki/parsing/#:~:text=Parsing%2C%20also%20known%20as%20syntax,%2C%20processed%2C%20or%20analyzed%20effectively.
...
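Some of the exported links carry a `#:~:text=...` text fragment that points at a highlighted passage, so the same page can appear under several different URLs. If a clean, de-duplicated list is needed, one option is a small post-processing step outside A-Parser. The sketch below is an assumption about how that might be done; the function names are hypothetical and it is not part of the scraper itself.

```python
def strip_text_fragment(url: str) -> str:
    """Remove a ':~:' text directive from the URL fragment, if present."""
    base, hash_sign, fragment = url.partition("#")
    head, directive, _ = fragment.partition(":~:")
    if not hash_sign or not directive:
        return url  # no fragment, or an ordinary fragment: leave unchanged
    # Keep an ordinary fragment that precedes the directive, e.g. '#intro'.
    return f"{base}#{head}" if head else base


def unique_links(lines):
    """Return cleaned links in first-seen order, skipping blanks and repeats."""
    seen = set()
    result = []
    for line in lines:
        url = strip_text_fragment(line.strip())
        if url and url not in seen:
            seen.add(url)
            result.append(url)
    return result
```

With a result file saved by the link-list format, `unique_links(open(path, encoding="utf-8"))` would yield one entry per page; `path` here stands for whatever file name was configured.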
Outputting links, anchors, and snippets with their positions in CSV format
Result format:
[% FOREACH item IN links; tools.CSVline(loop.count, item.link, item.anchor, item.snippet); END %]
Example result:
...
8,https://www.howtolearn.com/2021/01/9-science-backed-ways-to-learn-anything-faster/,"9 Science Backed Ways to Learn Anything Faster - HowToLearn.com","Jan 4, 2021 — Table of Contents * Break Your Learning into Manageable Portions. * Review What You're Learning Frequently. * Use Mindmaps to Summarize Information. * Take Note..."
9,https://www.linkedin.com/pulse/how-slow-learners-easily-speed-up-learning-sathya-molagoda,"How Slow Learners Easily Speed Up Learning - LinkedIn","Dec 24, 2022 — How Slow Learners Easily Speed Up Learning * Introduction. If you are a slow learner, don't worry. You can easily speed up your learning by using some tips and ..."
10,https://www.linkedin.com/pulse/5-science-backed-learning-techniques-save-time-supercharge-k-c-barr-ljqee,"5 Science-Backed Learning Techniques That Save Time and ...","Jan 16, 2025 — Offsetting forgetting is like patching those holes to retain as much water as possible. * 5 Actionable Steps: Review new material within 24 hours of learning it..."
1,https://www.huntress.com/cybersecurity-101/topics/what-is-a-parser,"What Is a Parser? A Beginner-Friendly Guide to ... - Huntress","May 27, 2025 — What's a Parser (And Why Should You Care)? ... Ever wondered what happens behind the scenes when computers process text, code, or data? That's where parsers com..."
2,https://www.huntress.com/cybersecurity-101/topics/what-is-a-parser,"What Is a Parser? A Beginner-Friendly Guide to ... - Huntress","May 27, 2025 — What's a Parser (And Why Should You Care)? ... Ever wondered what happens behind the scenes when computers process text, code, or data? That's where parsers com..."
3,"https://www.techtarget.com/searchapparchitecture/definition/parser#:~:text=In%20computer%20technology%2C%20a%20parser,necessary%20input%20has%20been%20provided.","What is a Parser? Definition, Types and Examples - TechTarget","Jul 7, 2022 — What is a parser? In computer technology, a parser is a program that's usually part of a compiler. It receives input in the form of sequential source program in..."
In the general result format, the Template Toolkit engine outputs the $links array in a FOREACH loop.
In the result file name, simply change the file extension to csv.
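The CSV produced by this format has no header row, and the position counter restarts for each query (the example goes 8, 9, 10 and then 1, 2, 3). A minimal Python sketch for reading such a file is shown here. The names `SourceRow`, `read_rows`, and `split_by_query` are hypothetical, and the grouping step assumes that the rows of one query are written consecutively, which the source does not state explicitly.

```python
import csv
import io
from typing import Iterable, NamedTuple


class SourceRow(NamedTuple):
    position: int
    link: str
    anchor: str
    snippet: str


def read_rows(lines: Iterable[str]):
    """Parse CSV lines in the column order position, link, anchor, snippet."""
    for record in csv.reader(lines):
        if len(record) != 4:
            continue  # skip blank or malformed lines
        position, link, anchor, snippet = record
        yield SourceRow(int(position), link, anchor, snippet)


def split_by_query(rows):
    """Start a new group whenever the position counter stops increasing.

    Assumption: rows belonging to one query are never interleaved with
    rows from another query.
    """
    groups = []
    previous = None
    for row in rows:
        if previous is None or row.position <= previous:
            groups.append([])
        groups[-1].append(row)
        previous = row.position
    return groups
```

When reading from disk, open the file with `newline=""` as the `csv` module documentation recommends, for example `read_rows(open(path, newline="", encoding="utf-8"))`.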
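Available settings
| Parameter name | Default value | Description |
|---|---|---|
| Results language | Auto (Based on IP) | Selects the results language (parameter lr=) |
| Search from country | Auto (Based on IP) | Selects the country to search from (geo-dependent search, parameter gl=) |
| Location (city) | | Search by city or region. A city can be specified in the form novosibirsk, russia; the full list of locations is available in Geotargets (or its copy), using the values from the Canonical Name column. The correct Google domain must also be set |
| Util::ReCaptcha2 preset | default | Determines whether Util::ReCaptcha2 is used to bypass ReCaptcha |
| Util::AntiGate preset | default | Determines whether Util::AntiGate is used to bypass image captchas |
| ReCaptcha2 retries | 3 | Number of attempts to submit a ReCaptcha answer without changing the proxy |
| ReCaptcha2 pass proxy | ☐ | Allows passing the proxy (used for Google requests) and the cookies (obtained from Google's responses) to the ReCaptcha recognition service |
| Use sessions | ☑ | Saves good sessions, which speeds up scraping and reduces the number of errors |
| Don't take session | ☐ | Lets you opt out of using saved good sessions |
| Additional headers | | Allows specifying arbitrary custom request headers |
| Redirect browser max pages | 10 | Number of browser pages used to bypass the JavaScript-enabled check |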

