SE::Pinterest - Pinterest search results scraper

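Pinterest scraper overview
A scraper for Pinterest search results. With SE::Pinterest you can collect databases of image links, or the images themselves for later use. You can also scrape databases of domains or texts to use later for text generation. Any query you would type into the Pinterest search bar can be used as is.
Thanks to A-Parser's multithreaded architecture, processing speed can exceed 4,000 queries per minute, which yields on average about 100,000 results per minute.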

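A-Parser lets you save Pinterest scraper settings for later reuse (presets), schedule scraping runs, and much more. To get as many results as possible, you can use automatic query expansion, substitute subqueries from files, and iterate over alphanumeric combinations and lists.
Results can be saved in whatever format and structure you need thanks to the powerful built-in Template Toolkit engine, which lets you apply extra logic to results and output data in various formats, including JSON, SQL and CSV.
Use case examples
Downloading images by links
A-Parser supports task chains: when the first task finishes, the second one starts and uses the links collected by the first task as its queries.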
Download example
eJyNVktT2zAQ/iuMJofQhsQcevGFCZRM6VCSQjiFdEbEa1cgS0aSQxiT/96V7PhV
J3Dzrval3W8/OSOG6mc9U6DBaOIvMpK4b+KTGRMGUDBHVzGN4Oi7fBVc0gAUGZCE
Kg3KeizI3aXvl8Z4FkBIU27IcjkgqMJPPZEqpjZoLzkdFhnKwzu6hrnEw5BxqNQT
lG5oDNYroAbs6TB0gfrHQ7OxEWgQMMOkoDzPYAuqst4L9pJaf20UExHao6gY6ImS
MaoNuCBW+barcEF6TiYYJnX+v3Mf4oeUaxgQjeVOKBYTtE8YtoAaqaaJrQn1GZFi
zPk1rIFXZi7+eco4tlKPQ3S6Khy7Tab/xdiWV6ynWoN6VVhDGcVJ59NflVcgr2WE
Nw8e8d6cxcygrC9kKuxwPFQ+AyRl324kamKpoExjVAplcoRNAiJAw2pq46RSNW7R
mExTuZIiZNEU61csgJ1lKuaIzam4kHHCwV6LOCDqJhJTDbcVYMa6GIoVynLboS5c
QtuHAqoDYqTk+uddXniiGOLxmy03xrbWayhau6Kc399eN6qr8OWwpc2DCKU0j2hK
rIeBSCKw7JI55PnkckOtt8Zj2CRUBBDkNW+X20HWWLIbML7/Yz6f1RYMTRREsMFI
2EIDwpyYtwT84ZczZjs16icien9K4Cx6j1h47KaE9nM0Qh9mF0gp+lZsme1CfrJK
tZFx3tZytKj/C9RNbacqAoWc1tbZ7pC9nrvCmnKHJCEFVPFxlY0bnnS4shsKQtcz
5ZuZ2xVZhDTwklJOtnViqUjCXVmP8vUdijQ+6WXFNxpsh5Z68rAd0BIp53s2ogvs
nTzyGZgdRKtXLaiQH7Jbmzb3QPLTuDvd7uFrJF/asc0Z0TJVK5smZ0y7RHb4tt1k
OSixOeov/oyWX48fHob9M78JyV4HJguQ5O62qPJJ6tr/Dt5p8bbXxWrlinfQuVdf
5DY3NOjQ28No7QfIjfOj58E7/DS0jxvPgrftIivvwOPWTbqHXhWv/qLYhG4GO+QU
ZFX+SmSt/wI/Q4+KxVBE+UnPcje7ADq3wYTakcLp9h+3SP7O
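To illustrate what the second task in such a chain receives, here is a minimal Python sketch; it is not A-Parser's own mechanism. It assumes the first task writes one image link per line (as a format like `$serp.format('$image\n')` does), drops blank lines and duplicates while keeping order, and derives a local file name for each image. The function names `links_to_queries` and `local_name` and the file-naming scheme are hypothetical.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def links_to_queries(output_text: str) -> list[str]:
    """Turn one-link-per-line output into an ordered, de-duplicated query list."""
    seen: set[str] = set()
    queries: list[str] = []
    for line in output_text.splitlines():
        link = line.strip()
        # Skip empty lines and links already queued for the next task.
        if link and link not in seen:
            seen.add(link)
            queries.append(link)
    return queries


def local_name(image_url: str) -> str:
    """Hypothetical naming scheme: reuse the last path segment of the URL."""
    return PurePosixPath(urlparse(image_url).path).name


output = """https://i.pinimg.com/originals/dd/ba/c6/ddbac6d4813a82c55a658bf3afe8b90e.jpg
https://i.pinimg.com/originals/2e/73/f4/2e73f44e74756c79950772fca4fbea7d.jpg

https://i.pinimg.com/originals/dd/ba/c6/ddbac6d4813a82c55a658bf3afe8b90e.jpg
"""

for query in links_to_queries(output):
    print(query, "->", local_name(query))
```

De-duplicating before the second task starts avoids downloading the same image twice when several pins share one picture.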
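Collected data
- Pin title and description
- Image link
- Domain and source link
- Pin ID
- Names and usernames of the author and the pinner
Features
- Scrapes up to 100 pages of search results
- Lets you set the number of pages to scrape
Use cases
- Collecting images to fill your own blog
- Collecting text data for text generation
- Any other scenario that uses Pinterest data
Queries
Queries are search phrases, for example: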
Waterfall
Speak in english
Cats and dogs
cars
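Query substitutions
You can use built-in macros to multiply queries. For example, to collect a very large database of forums, specify several base queries in different languages: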
forum
форум
foro
论坛
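In the query format, add an iteration over characters from a to zzzz. This rotates the search results as much as possible and yields many new unique results: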
$query {az:a:zzzz}
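This macro turns each original search query into 475,254 queries, for a total of 4 × 475,254 = 1,901,016 search queries. That is an impressive number, but no problem at all for A-Parser: at 2,000 queries per minute, such a task takes only about 16 hours.
Result output examples
A-Parser supports flexible result formatting through its built-in Template Toolkit engine, so results can be output in any form, including structured formats such as CSV or JSON.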
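The numbers above can be checked with a short Python sketch. `{az:a:zzzz}` covers every string of 1 to 4 lowercase Latin letters, that is 26 + 26² + 26³ + 26⁴ values. The enumeration order used by `az_range` (a..z, then aa..zz, and so on) is an assumption for illustration; only the count matters here.

```python
import string
from itertools import product


def az_range(max_len: int):
    """Yield a..z, aa..zz, ... up to max_len letters (order assumed, not A-Parser's)."""
    for length in range(1, max_len + 1):
        for letters in product(string.ascii_lowercase, repeat=length):
            yield "".join(letters)


num_base_queries = 4  # four base queries in different languages
per_query = sum(26 ** k for k in range(1, 5))  # strings of length 1 to 4
total = num_base_queries * per_query
hours = total / 2000 / 60  # at 2,000 queries per minute

print(per_query, total, round(hours, 1))  # 475254 1901016 15.8

# First few expanded queries for one base query in the "$query {az:a:zzzz}" format.
for suffix, _ in zip(az_range(4), range(3)):
    print(f"forum {suffix}")
```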
Default output
Result format:
$serp.format('$image\n')
Result example:
https://i.pinimg.com/originals/dd/ba/c6/ddbac6d4813a82c55a658bf3afe8b90e.jpg
https://i.pinimg.com/originals/2e/73/f4/2e73f44e74756c79950772fca4fbea7d.jpg
https://i.pinimg.com/originals/49/4f/72/494f7289e47ac7988611dc7a191e0f89.jpg
https://i.pinimg.com/originals/43/bb/28/43bb28b987d916f543e128c9cdb3464f.jpg
https://i.pinimg.com/originals/0c/d8/6a/0cd86ab71aae1e16fef47fef803998ee.jpg
https://i.pinimg.com/originals/5c/76/d5/5c76d519525457d9807a37c07d140f22.jpg
https://i.pinimg.com/originals/45/64/66/456466acb284572575ce693a35bb75a3.jpg
https://i.pinimg.com/originals/ad/b9/34/adb934d8393b64177c7a0a4195648e4e.jpg
https://i.pinimg.com/originals/b6/ef/70/b6ef705f7f6635bf4e895ee5d114714b.jpg
https://i.pinimg.com/originals/7a/e5/9c/7ae59c58c87f28ffb4568f7b817852eb.jpg
https://i.pinimg.com/originals/57/0d/9a/570d9a47a72e75c9c5419da8ca7b9776.jpg
...
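The format method applies its pattern to every result in `serp` and concatenates the pieces. As a rough Python analogue, assuming each result is a dict with an `image` key (the `format_serp` helper and its `{image}` pattern syntax are illustrative, not A-Parser's API):

```python
def format_serp(serp: list[dict], pattern: str = "{image}\n") -> str:
    """Rough analogue of $serp.format('$image\\n'): apply pattern to each item and join."""
    return "".join(pattern.format(**item) for item in serp)


serp = [
    {
        "image": "https://i.pinimg.com/originals/dd/ba/c6/ddbac6d4813a82c55a658bf3afe8b90e.jpg",
        "title": "Great Ways to Start a Sentence",
    },
    {
        "image": "https://i.pinimg.com/originals/2e/73/f4/2e73f44e74756c79950772fca4fbea7d.jpg",
        "title": "23 Fotos, die dich zur Weißglut bringen, wenn du gerade Deutsch lernst",
    },
]
print(format_serp(serp), end="")
```

Fields not named in the pattern (here `title`) are simply ignored, which is why the default output contains only image links.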
Output to a CSV table
Result format:
[% FOREACH item IN serp;
    tools.CSVline(query, item.link, item.pinner_username, item.title, item.id);
END %]
Result example:
"Speak in english ",https://www.teflcourse.net/english-grammar-corner/great-ways-to-start-a-sentence/?r=Pinterest,,"Great Ways to Start a Sentence",289637819794664055
"Speak in english ",https://www.buzzfeed.com/maximilianzender/deutschlernen?sub=0_10833256#10833256,,"23 Fotos, die dich zur Weißglut bringen, wenn du gerade Deutsch lernst",84653667980063381
"Speak in english ",https://www.teflcourse.net/english-grammar-corner/great-ways-to-start-a-sentence/?r=Pinterest,,"Great Ways to Start a Sentence",307089268347214342
"Speak in english ",https://www.sprachheld.de/category/englisch-lernen/,,"Englisch lernen: W-Fragen!",13792342597188564
"Speak in english ",http://knowyourmeme.com/photos/1176258-pen-pineapple-apple-pen,,"Avengers - I have a pen spoof | ""Pen Pineapple Apple Pen""",609604499561518518
"Speak in english ",https://lehrermarktplatz.de/material/222137/picture-dictionary-word-banks-all-about-english-vocabulary-wachsendes-paket,,"Picture Dictionary: All about English Vocabulary + Word Banks! Englisch Wörterbuch/Vokabeln UK & USA",19210735900566228
"Speak in english ",https://lehrermarktplatz.de/material/180602/conversation-bubbles-english-for-beginners-bearbeitbare-sprechblasen?utm_medium=social&utm_source=pinterest&utm_campaign=tailwind_tribes&utm_content=tribes&utm_term=1144892530_56228916_845126,,"Beginner Conversation Bubbles | Tafelmaterial Englisch Unterricht Grundschule & Klasse 5",314477986488878805
"Speak in english ",https://mobile.twitter.com/AndereDeutsch/status/648719176048222208,,Twitter,1829656081959551
"Speak in english ",https://die-lernlotsen.com/englische-redewendungen/,,"Wow! Englische Redewendungen, die du immer anwenden kannst",82612974406076695
...
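If you post-process the same fields with your own script, Python's standard `csv` module produces comparable rows. This is a sketch, not a reimplementation of `tools.CSVline`: the writer below uses minimal quoting (only fields with commas, quotes or line breaks are quoted, and embedded quotes are doubled), whereas the sample above also quotes plain text fields. The `csv_lines` helper is hypothetical.

```python
import csv
import io


def csv_lines(query: str, serp: list[dict]) -> str:
    """Write one CSV row per result: query, link, pinner_username, title, id."""
    buf = io.StringIO()
    # Default QUOTE_MINIMAL: quote only fields that need it and double embedded quotes.
    writer = csv.writer(buf, lineterminator="\n")
    for item in serp:
        writer.writerow([query, item["link"], item["pinner_username"], item["title"], item["id"]])
    return buf.getvalue()


serp = [{
    "link": "http://knowyourmeme.com/photos/1176258-pen-pineapple-apple-pen",
    "pinner_username": "",
    "title": 'Avengers - I have a pen spoof | "Pen Pineapple Apple Pen"',
    "id": "609604499561518518",
}]
print(csv_lines("Speak in english", serp), end="")
```

The doubled quotes in the title match how the sample row above escapes `"Pen Pineapple Apple Pen"`.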
Saving in SQL format
Result format:
[% FOREACH serp;
    "INSERT INTO serp VALUES('" _ query _ "', '";
    link _ "', '";
    image _ "', '";
    pinner_username _ "')\n";
END %]
Result example:
INSERT INTO serp VALUES('Speak in english ', 'https://www.teflcourse.net/english-grammar-corner/great-ways-to-start-a-sentence/?r=Pinterest', 'https://i.pinimg.com/originals/dd/ba/c6/ddbac6d4813a82c55a658bf3afe8b90e.jpg', '')
INSERT INTO serp VALUES('Speak in english ', 'https://www.buzzfeed.com/maximilianzender/deutschlernen?sub=0_10833256#10833256', 'https://i.pinimg.com/originals/2e/73/f4/2e73f44e74756c79950772fca4fbea7d.jpg', '')
INSERT INTO serp VALUES('Speak in english ', 'https://www.teflcourse.net/english-grammar-corner/great-ways-to-start-a-sentence/?r=Pinterest', 'https://i.pinimg.com/originals/49/4f/72/494f7289e47ac7988611dc7a191e0f89.jpg', 'TEFLcourse')
INSERT INTO serp VALUES('Speak in english ', 'https://www.sprachheld.de/category/englisch-lernen/', 'https://i.pinimg.com/originals/43/bb/28/43bb28b987d916f543e128c9cdb3464f.jpg', 'sprachheld')
INSERT INTO serp VALUES('Speak in english ', 'http://knowyourmeme.com/photos/1176258-pen-pineapple-apple-pen', 'https://i.pinimg.com/originals/0c/d8/6a/0cd86ab71aae1e16fef47fef803998ee.jpg', '')
INSERT INTO serp VALUES('Speak in english ', 'https://lehrermarktplatz.de/material/222137/picture-dictionary-word-banks-all-about-english-vocabulary-wachsendes-paket', 'https://i.pinimg.com/originals/5c/76/d5/5c76d519525457d9807a37c07d140f22.jpg', 'unterrichtmitspass')
INSERT INTO serp VALUES('Speak in english ', 'https://lehrermarktplatz.de/material/180602/conversation-bubbles-english-for-beginners-bearbeitbare-sprechblasen?utm_medium=social&utm_source=pinterest&utm_campaign=tailwind_tribes&utm_content=tribes&utm_term=1144892530_56228916_845126', 'https://i.pinimg.com/originals/45/64/66/456466acb284572575ce693a35bb75a3.jpg', '')
INSERT INTO serp VALUES('Speak in english ', 'https://mobile.twitter.com/AndereDeutsch/status/648719176048222208', 'https://i.pinimg.com/originals/ad/b9/34/adb934d8393b64177c7a0a4195648e4e.jpg', '')
INSERT INTO serp VALUES('Speak in english ', 'https://die-lernlotsen.com/englische-redewendungen/', 'https://i.pinimg.com/originals/b6/ef/70/b6ef705f7f6635bf4e895ee5d114714b.jpg', 'lernlotsen')
...
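The template above splices values straight into the SQL text, so a value containing a single quote (for example a query like `don't give up`) would break the statement. If you load results with your own script instead, parameter binding avoids this. A minimal sketch using Python's built-in `sqlite3` with an in-memory database; the four-column `serp` table mirrors the template, and the second row is hypothetical sample data.

```python
import sqlite3

rows = [
    ("Speak in english", "https://www.sprachheld.de/category/englisch-lernen/",
     "https://i.pinimg.com/originals/43/bb/28/43bb28b987d916f543e128c9cdb3464f.jpg", "sprachheld"),
    # Hypothetical row whose query contains a single quote.
    ("don't give up", "https://example.com/pin", "https://example.com/image.jpg", ""),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE serp (query TEXT, link TEXT, image TEXT, pinner_username TEXT)")
# "?" placeholders let sqlite3 pass each value safely, embedded quotes included.
conn.executemany("INSERT INTO serp VALUES (?, ?, ?, ?)", rows)
conn.commit()

for (query,) in conn.execute("SELECT query FROM serp ORDER BY rowid"):
    print(query)
```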
Dumping results to JSON
General result format:
[% IF notFirst;
    ",\n";
ELSE;
    notFirst = 1;
END;
obj = {};
obj.query = query;
obj.images = [];
FOREACH item IN p1.serp;
    obj.images.push({
        title = item.title
        author = item.pinner_username
        link = item.link
        image = item.image
    });
END;
obj.json %]
Prepend text:
[
Append text:
]
Result example:
[{
    "query": "Speak in english",
    "images": [
        {
            "link": "https://www.teflcourse.net/english-grammar-corner/great-ways-to-start-a-sentence/?r=Pinterest",
            "author": "",
            "title": "Great Ways to Start a Sentence",
            "image": "https://i.pinimg.com/originals/dd/ba/c6/ddbac6d4813a82c55a658bf3afe8b90e.jpg"
        },
        {
            "link": "https://www.buzzfeed.com/maximilianzender/deutschlernen?sub=0_10833256#10833256",
            "author": "",
            "title": "23 Fotos, die dich zur Weißglut bringen, wenn du gerade Deutsch lernst",
            "image": "https://i.pinimg.com/originals/2e/73/f4/2e73f44e74756c79950772fca4fbea7d.jpg"
        },
        {
            "link": "https://www.teflcourse.net/english-grammar-corner/great-ways-to-start-a-sentence/?r=Pinterest",
            "author": "TEFLcourse",
            "title": "Great Ways to Start a Sentence",
            "image": "https://i.pinimg.com/originals/49/4f/72/494f7289e47ac7988611dc7a191e0f89.jpg"
        },
        ...
    ]
}]
To make the "Prepend text" and "Append text" options available in the Task Editor, enable "More options".
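The general result format runs once per query: the `notFirst` flag writes a separator before every object except the first, and the Prepend/Append text wraps the whole stream in `[` and `]`, so the file ends up as one valid JSON array. A minimal Python sketch of the same assembly, assuming results arrive as (query, list of result dicts) pairs; the `dump_results` helper is hypothetical, and the output keys follow the sample output above.

```python
import json


def dump_results(results) -> str:
    """Join one JSON object per query with ",\\n" and wrap them in [ ]."""
    parts = ["["]  # plays the role of "Prepend text"
    not_first = False
    for query, serp in results:
        # Same trick as notFirst in the template: separator before every object but the first.
        if not_first:
            parts.append(",\n")
        else:
            not_first = True
        obj = {
            "query": query,
            "images": [
                {
                    "link": item["link"],
                    "author": item["pinner_username"],
                    "title": item["title"],
                    "image": item["image"],
                }
                for item in serp
            ],
        }
        parts.append(json.dumps(obj, ensure_ascii=False))
    parts.append("]")  # plays the role of "Append text"
    return "".join(parts)


results = [
    ("Speak in english", [{
        "link": "https://www.teflcourse.net/english-grammar-corner/great-ways-to-start-a-sentence/?r=Pinterest",
        "pinner_username": "TEFLcourse",
        "title": "Great Ways to Start a Sentence",
        "image": "https://i.pinimg.com/originals/49/4f/72/494f7289e47ac7988611dc7a191e0f89.jpg",
    }]),
    ("cars", []),
]
print(dump_results(results))
```

Writing the separator before each object, rather than after, means no trailing comma is left before the closing bracket, which JSON parsers would reject.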
Available settings
| Parameter | Default value | Description |
|---|---|---|
| Pages count | 5 | Number of search result pages to scrape |