
SE::Bing::Images - Bing Images scraper

Bing Images

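Scraper overview

A scraper for Bing image search results. With the SE::Bing::Images scraper you can collect a database of image links, or the images themselves, for later use. Queries can be entered exactly as you would type them into the Bing search box.

A-Parser lets you save the Bing scraper's settings for later reuse (presets), schedule scraping runs, and more. To get as many results as possible, you can use automatic query multiplication, substitution of subqueries from files, and iteration over alphanumeric combinations and lists.

Results can be saved in whatever form and structure you need thanks to the built-in Template Toolkit template engine, which lets you apply additional logic to the results and output data in various formats, including JSON, SQL, and CSV.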

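Scraper use cases

A-Parser supports task chains: when the first task finishes, the second one starts, and the links collected by the first task are used as queries for the second.

Download example

How to import the example into A-Parser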

eJyNVUtT2zAQ/iuMhkNoQ2IOvfjCBNpM6VBCIZxCOqPGa1cgS0aSA4zJf++ubPwI
JvQmrfa9334qmOP23l4asOAsCxcFy/yZhexEqGTvLOUJ7H3Vj0pqHoFhQ5ZxY8GQ
8oJdfwtD0gtDr2jxOYKY59Kx5XLI0BUe7VSblJPL/exoVPmvH6/5GuYaH2MhoRFP
8XbBUyCriDug11HsHQ0ORu6JPPAoEk5oxWUZgXJqot4o8ZCTvXUGU0R9vBoBdmp0
imIH3gkJn18zXLB9f2foJvf2v0obFsZcWhgyi+lOOSYTbb8IB4Y7bWYZ5YTygmk1
kfIc1iAbNe//JBcSu2knMRqdVYb9KrM3PjZ1ie1QazCPBnOovfjbyexnYxXpc51g
5dEfrFuKVDi821OdKxpOgMJ7gKzu24VGSaoN1GGcyaEOjqDJQEWo2ExtkjWiThWd
yXSFK61ikcwwfyMieNXM1RyROVOnOs0kUFmshFgXjLmFqwYwE1sNhS51utuuTn1A
6kMF1SFzWkv747pMPDMC8fiF0k2xre0cqtauuJQ3V+ed7Bp8kWedwEoLxUjXQaIR
UljXZlh01ucCXBh+n88vW3uDKgYSeEIv2BkHyh265wzC0adjQQ0YDzKVvNxlcJy8
JCI+8M1H/TkqoY2gvTCGP1fLQ8WVL6vcOp2W3aonhvK/wP0wXkWVo1jy1pbSatC+
+BLWXHqAKK2g8Y8b6vxMtIcLLR4o245ULlypV0VR2sFDziXbtPmi2X1fsh2XWzlS
eXq4X1RnVNiMiFFKtz2IUbmU7wC9D8O99PA/6NkJwqDZO6U/JK1tNnwHaW1Q9dMs
cibvWcKCWZ2bFbkpiY6wT8OldrLlsMbeeLD4PV5+Pri9HQ2Owy7k9nswV4GgNN8s
h80/0re2PXSxRbdBHxnVm9nDwkF7/7ZXusNiwTtEtP1v+HF9xOrBbkbffu6webDp
45hgx5/Uz5W7PoOg/RFQQD8D7PmRH1JJRvX/X7z90cMCjRqiwive7+xlaUkYr3Qw
pvV7f7T5B3ZQ48A=

Collected data

  • Image link
  • Image width and height
  • Image format
  • Link to the page containing the image
  • Anchor text
  • Image preview (thumbnail)

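Capabilities

  • Setting the number of pages
  • Choosing the search language and country
  • Choosing the safe search option

Use cases

  • Collecting images to fill blogs and websites with content
  • Collecting a database of avatars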

Queries

Specify search phrases as queries, for example:

dogecoin
Football
Waterfall
Speak in english
Cats and dogs
cars

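Query substitution macros

You can use built-in macros to multiply queries. For example, suppose we want to build a very large database of forums, so we specify several base queries in different languages: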

forum
论坛
foro
论坛

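In the query format, specify a character iteration from a to zzzz. This approach rotates the search results as much as possible and yields many new unique results: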

$query {az:a:zzzz}

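This macro creates 475254 additional queries for each original search query (26 + 26^2 + 26^3 + 26^4 combinations of one to four letters), for a total of 4 x 475254 = 1901016 search queries. The number is large, but it is not a problem for A-Parser: at a rate of 2000 queries per minute, such a task takes about 16 hours.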
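The arithmetic behind these figures can be checked with a short Python sketch. The helper names `az_count` and `az_values` are hypothetical, and the enumeration order shown (a, b, ..., z, aa, ab, ...) is an assumption about how the macro walks the range; the counts do not depend on that order.

```python
from itertools import product
from string import ascii_lowercase


def az_count(min_len: int, max_len: int, alphabet_size: int = 26) -> int:
    """Number of strings of length min_len..max_len over the alphabet."""
    return sum(alphabet_size ** n for n in range(min_len, max_len + 1))


def az_values(min_len: int, max_len: int):
    """Yield a, b, ..., z, aa, ab, ... up to max_len letters (assumed order)."""
    for n in range(min_len, max_len + 1):
        for letters in product(ascii_lowercase, repeat=n):
            yield "".join(letters)


per_query = az_count(1, 4)   # 26 + 676 + 17576 + 456976
total = 4 * per_query        # four base queries
hours = total / 2000 / 60    # at 2000 queries per minute

print(per_query, total, round(hours, 1))  # 475254 1901016 15.8
```

Counting first is a cheap way to estimate how long a task will run before launching it with a wide iteration range.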

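Output examples

Thanks to the built-in Template Toolkit template engine, A-Parser supports flexible result formatting, so results can be output in arbitrary form as well as in structured formats such as CSV or JSON.

Default output

Result format: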

$serp.format('$link\n')

Result example:

https://image.cnbcfm.com/api/v1/image/104930399-6ED2-REQ-010517-Dogecoin.jpg?v=1529477162
https://i0.wp.com/promoterdigital.com/wp-content/uploads/2017/12/Como-Comprar-Dogecoin.png?ssl=1
https://www.shdon.com/blog/images/2014/03/dogecoin-meme.png
https://www.thecoinrepublic.com/wp-content/uploads/2019/05/dogecoin.jpg
https://static.india.com/wp-content/uploads/2021/01/dogecoin.jpg
https://i0.wp.com/criptotendencia.com/wp-content/uploads/2020/07/Dogecoin-Vuelve-el-efecto-TikTok.jpg?fit=1200%2C800&ssl=1
https://www.cryptonewsz.com/wp-content/uploads/2019/01/Dogecoin-DOGE-Price-Analysis-Jan.11.jpg
https://www.mebere.com/sites/default/files/inline-images/Invest in Dogecoin.jpg
https://www.thecoinrepublic.com/wp-content/uploads/2020/04/Dogecoin-Doge-logo2.png
https://assets.coingecko.com/coins/images/7670/large/DogeCoin-Private-logo.png?1549240189
https://thelibertarianrepublic.com/wp-content/uploads/2019/05/Screen-Shot-2019-05-17-at-2.01.29-PM.jpg
https://i.teknolojioku.com/storage/files/images/2021/01/29/dogecoin-1-b1lB_cover.jpg

Output to a CSV table

Result format:

[% FOREACH item IN serp;
tools.CSVline(query, item.link, item.width, item.height, item.type, item.pagelink, item.thumb, item.anchor);
END %]

Result example:

dogecoin,https://interanalyst.us/wp-content/uploads/2018/01/dogecoin-logo.png,900,900,png,https://interanalyst.us/markets/cryptocurrency-trading-signals/dogecoin-doge/,https://tse1.mm.bing.net/th?id=OIP.MG983TODjELbLg7pfna_8AHaHa&pid=15.1,"Dogecoin | DOGE | Investment Charts with Algorithmic Trading Signals ..."
dogecoin,http://vkool.com/wp-content/uploads/2018/03/Dogecoin-1.jpg,1200,800,jpeg,https://vkool.com/dogecoin-doge-token-review/,https://tse1.mm.bing.net/th?id=OIP.Az4VhLecunYQo3ztj_waTgHaE8&pid=15.1,"Dogecoin (DOGE) Token Review (UPDATED 2018): What is It? | VKOOL"
dogecoin,https://www.cryptonewsz.com/wp-content/uploads/2019/02/Dogecoin.jpg,2048,1304,jpeg,https://www.cryptonewsz.com/price-analysis-of-dogecoin-doge/9113/,https://tse4.mm.bing.net/th?id=OIP.5B3vzuHUWNMGotGqJGWRCQHaEt&pid=15.1,"Price Analysis of Dogecoin (DOGE) - CryptoNewsZ"
dogecoin,https://digitalmoneytimes.com/wp-content/uploads/2018/09/dogecoin-petition.jpg,1000,750,jpeg,https://digitalmoneytimes.com/dogecoin-price-1-million-doge-nearly-equals-the-value-of-1-bitcoin/,https://tse3.mm.bing.net/th?id=OIP.-1QNz1XY_Ox2_MHmb2XqJAHaFj&pid=15.1,"Dogecoin Price: 1 Million Doge Nearly Equals the Value of 1 Bitcoin ..."
dogecoin,https://www.cryptonewsz.com/wp-content/uploads/2019/01/Dogecoin-DOGE-Price-Analysis-Jan.-22.jpg,1920,1200,jpeg,https://www.cryptonewsz.com/dogecoin-doge-price-analysis-jan-22/6798/,https://tse1.mm.bing.net/th?id=OIP.-CoV662aYK1YAz8SEo3KDwHaEo&pid=15.1,"Dogecoin (DOGE): Price Analysis, Jan. 22 - CryptoNewsZ"
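A file in this shape can be read back with Python's standard `csv` module. The sketch below is a minimal example, assuming the column order of the `CSVline` call above and standard CSV quoting of the anchor field. The sample row uses placeholder URLs, and `read_rows` is a hypothetical helper name.

```python
import csv
import io

# Column order taken from the CSVline call in the result format above.
COLUMNS = ["query", "link", "width", "height", "type", "pagelink", "thumb", "anchor"]

# One abbreviated row in the same shape as the result example (placeholder URLs).
SAMPLE = (
    'dogecoin,https://example.com/a.png,900,900,png,'
    'https://example.com/page/,https://example.com/thumb,"Dogecoin | DOGE, charts"\n'
)


def read_rows(text: str) -> list[dict]:
    """Parse CSV text into dicts and convert the dimensions to integers."""
    rows = []
    for record in csv.reader(io.StringIO(text)):
        row = dict(zip(COLUMNS, record))
        row["width"] = int(row["width"])
        row["height"] = int(row["height"])
        rows.append(row)
    return rows


rows = read_rows(SAMPLE)
print(rows[0]["link"], rows[0]["width"], rows[0]["anchor"])
```

Using `csv.reader` rather than splitting on commas keeps anchors that contain commas in one field.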

Saving in SQL format

Result format:

[% FOREACH serp;
"INSERT INTO serp VALUES('" _ query _ "', '"; link _ "', '"; anchor _ "', '"; pagelink _ "', '"; thumb _ "')\n";
END %]

Result example:

INSERT INTO serp VALUES('dogecoin', 'https://www.cryptonewsz.com/wp-content/uploads/2019/02/Dogecoin.jpg', 'Price Analysis of Dogecoin (DOGE) - CryptoNewsZ', 'https://www.cryptonewsz.com/price-analysis-of-dogecoin-doge/9113/', 'https://tse4.mm.bing.net/th?id=OIP.5B3vzuHUWNMGotGqJGWRCQHaEt&pid=15.1')
INSERT INTO serp VALUES('dogecoin', 'https://interanalyst.us/wp-content/uploads/2018/01/dogecoin-logo.png', 'Dogecoin | DOGE | Investment Charts with Algorithmic ...', 'https://interanalyst.us/markets/cryptocurrency-trading-signals/dogecoin-doge/', 'https://tse1.mm.bing.net/th?id=OIP.MG983TODjELbLg7pfna_8AHaHa&pid=15.1')
INSERT INTO serp VALUES('dogecoin', 'https://www.cryptonewsz.com/wp-content/uploads/2019/03/stack-of-dogecoins.jpg', 'Dogecoin (DOGE) Price Analysis : The Future of Dogecoin ...', 'https://www.cryptonewsz.com/dogecoin-doge-price-analysis-the-future-of-dogecoin-looks-very-bright/11091/', 'https://tse4.mm.bing.net/th?id=OIP.GLgXn-8nxIgTZNkUHpiQ3AHaGR&pid=15.1')
INSERT INTO serp VALUES('dogecoin', 'https://cdn.vox-cdn.com/thumbor/G_w4Nyo9IJx5q5xa5E92vJCVyUQ=/21x0:539x345/1200x800/filters:focal(21x0:539x345)/cdn.vox-cdn.com/assets/3727699/Dogecoin_logo.png', 'Bitcoin is so 2013: Dogecoin is the new cryptocurrency on ...', 'http://www.theverge.com/2013/12/16/5216862/bitcoin-is-so-2013-dogecoin-is-new-crypto-currency-on-the-block', 'https://tse1.mm.bing.net/th?id=OIP.ILO1wN5PD5jq3AOdG13itQHaE8&pid=15.1')
INSERT INTO serp VALUES('dogecoin', 'https://www.fxwrite.com/wp-content/uploads/2019/07/dogecoin.jpg', 'Dogecoin Price Showing Record 10.79% Uptrend Momentum', 'https://www.fxwrite.com/dogecoin-price-analysis-dogecoin-doge-is-showing-an-uptrend-momentum/', 'https://tse1.mm.bing.net/th?id=OIP.7ipqVaPp3hBRzmVPMmXYEQHaEK&pid=15.1')
INSERT INTO serp VALUES('dogecoin', 'https://www.cryptonewsz.com/wp-content/uploads/2019/01/Dogecoin-DOGE-Price-Analysis-Jan.-22.jpg', 'Dogecoin (DOGE): Price Analysis, Jan. 22 - CryptoNewsZ', 'https://www.cryptonewsz.com/dogecoin-doge-price-analysis-jan-22/6798/', 'https://tse1.mm.bing.net/th?id=OIP.-CoV662aYK1YAz8SEo3KDwHaEo&pid=15.1')
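The format shown builds each statement by plain string concatenation, so an anchor that contains a single quote would yield a statement the database rejects. One possible alternative is to export the fields (for example as CSV) and load them with parameterized statements. The sketch below assumes SQLite and a hypothetical five-column `serp` table whose column names simply mirror the field order above; the row values are placeholders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: column names are assumptions matching the field order
# used in the result format (query, link, anchor, pagelink, thumb).
conn.execute(
    "CREATE TABLE serp (query TEXT, link TEXT, anchor TEXT, pagelink TEXT, thumb TEXT)"
)

row = (
    "dogecoin",
    "https://example.com/dogecoin.jpg",  # placeholder URL
    "Dogecoin's price today",            # apostrophe would break naive quoting
    "https://example.com/page/",
    "https://example.com/thumb",
)
# Placeholders let the driver handle quoting instead of string concatenation.
conn.execute("INSERT INTO serp VALUES (?, ?, ?, ?, ?)", row)
conn.commit()

stored = conn.execute("SELECT anchor FROM serp").fetchone()[0]
print(stored)
```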

Dumping results to JSON

General result format:

[% IF notFirst;
",\n";
ELSE;
notFirst = 1;
END;

obj = {};
obj.query = query;
obj.images = [];

FOREACH item IN p1.serp;
obj.images.push({
link = item.link
width = item.width
height = item.height
type = item.type
pagelink = item.pagelink
thumb = item.thumb
anchor = item.anchor
});
END;

obj.json %]

Prepend text:

[

Append text:

]
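The `notFirst` flag in the format above writes a comma before every object except the first, and the prepend and append text wrap the whole file in brackets, so the output forms a single JSON array. The Python sketch below mirrors that logic for illustration only. `dump_results` is a hypothetical helper, not part of A-Parser.

```python
import json


def dump_results(results: list[dict]) -> str:
    """Join per-query JSON objects the way the format above does."""
    parts = ["["]                # Prepend text
    not_first = False
    for obj in results:
        if not_first:
            parts.append(",\n")  # separator before every object but the first
        else:
            not_first = True
        parts.append(json.dumps(obj))
    parts.append("]")            # Append text
    return "".join(parts)


out = dump_results([
    {"query": "cats", "images": []},
    {"query": "cars", "images": []},
])
print(out)
```

Writing the separator before each object rather than after it means no trailing comma is left when the last query finishes.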

Result example:

[{
"images": [
{
"link": "http://mymodernmet.com/wp/wp-content/uploads/2017/03/gabrielius-khiterer-stray-cats-8.jpg",
"width": "750",
"anchor": "Dignified Stray Cat Photos Celebrate Their Unique Beauty",
"pagelink": "https://mymodernmet.com/gabrielius-khiterer-stray-cat-photos/",
"type": "jpeg",
"thumb": "https://tse3.mm.bing.net/th?id=OIP.ZjfS8JQc9sahsK0-w8dRFAHaKJ&pid=15.1",
"height": "1028"
},
{
"link": "https://cms.qz.com/wp-content/uploads/2016/09/catscolonizedbyboat.jpg?quality=75&strip=all&w=1400",
"width": "1400",
"anchor": "DNA shows that cats colonized most of the planet by boat ...",
"pagelink": "https://qz.com/793979/dna-evidence-shows-cats-colonized-earth-by-boat/",
"type": "jpeg",
"thumb": "https://tse1.mm.bing.net/th?id=OIP.V6n6qGkcNx4JquiU6AvbBAHaEK&pid=15.1",
"height": "788"
},
{
"link": "https://hddesktopwallpapers.in/wp-content/uploads/2015/09/cute-baby-cats-wallpaper.jpg",
"width": "2560",
"anchor": "cute baby cats wallpaper - HD Desktop Wallpapers | 4k HD",
"pagelink": "https://hddesktopwallpapers.in/cute-baby-cats-wallpaper/",
"type": "jpeg",
"thumb": "https://tse4.mm.bing.net/th?id=OIP.QdEkrZjd1c_VN_aUtleoFgHaEo&pid=15.1",
"height": "1600"
}
],
"query": "cats"
}]
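A dump in this shape can be consumed with any JSON library. The Python sketch below filters images by width. It assumes the field names from the format above, and the `DUMP` sample (with placeholder URLs) and the `large_images` helper are hypothetical. Note that width and height appear as strings in the example, so they are converted before comparison.

```python
import json

# Abbreviated dump in the same shape as the result example (placeholder URLs).
DUMP = """
[{
"images": [
{"link": "https://example.com/cat.jpg", "width": "750", "height": "1028",
 "type": "jpeg", "pagelink": "https://example.com/cats/",
 "thumb": "https://example.com/thumb", "anchor": "Stray cat photos"}
],
"query": "cats"
}]
"""


def large_images(dump: str, min_width: int) -> list[str]:
    """Return links of images at least min_width pixels wide."""
    links = []
    for result in json.loads(dump):
        for image in result["images"]:
            # Dimensions are strings in the example, so convert before comparing.
            if int(image["width"]) >= min_width:
                links.append(image["link"])
    return links


print(large_images(DUMP, 700))
```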

Tip

To make the "Prepend text" and "Append text" options available in the Task Editor, enable "More options".

Available settings

Parameter            Default value   Description
Pages count          10              Number of pages to scrape
Safe search          Moderate        Safe search
Region               Based on IP     Region
Interface language   Any             Interface language