
SE::Yandex::Images - Yandex Images scraper


Scraper overview

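A scraper for Yandex image search results. With the SE::Yandex::Images scraper, you can collect a database of image links or obtain images ready for further use. Queries can be entered in the same format you would type into the Yandex search bar.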

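A-Parser lets you save the Yandex scraper's settings for future use (presets), schedule scraping runs, and much more. To get as many results as possible, you can use automatic query multiplication, substitution of subqueries from files, and iteration over alphanumeric combinations and lists.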

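Thanks to the powerful built-in Template Toolkit engine, results can be saved in any form and structure you need. The engine lets you apply additional logic to results and output data in various formats, including JSON, SQL, and CSV.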

Scraper use case example

A-Parser supports task chains: once the first task finishes, the second one starts, using the links collected by the first task as its queries.

Download example

How to import the example into A-Parser

eJyNVktT2zAQ/iuMJofQmsQcevGFCbSZ0qGEQjh0QjqjxmtXIEtGkkMYk//elWT8
SE3ozVrte7/95JIYqh/0lQINRpNoUZLcfZOI/KQihs3BeUZTOPgsnwSXNAZFApJT
pUFZ9QW5+RJFXjOKnKpGhRgSWnBDlsuAoDv81FOpMmrdDvLjURWjvryha5hLvEwY
h0Y8xdMlzcBaxdSAvR0lztHwcGQ21gONY2aYFJT7CDarJuqtYI+FtddGMZGiPh4V
Az1VMkOxAefECp9fM1yQgTsTdFM4+x/ehkQJ5RoCojHdKcVk4t0bZkBRI9Ustzmh
vCRSTDi/gDXwRs35Py0Yx37qSYJG55Vhv8rsHx/busR2qDWoJ4U51F7c6XT2vbGK
5YVMsfL4N9bNWcYMnvWZLIQdTojCB4C87tulREkmFdRhjCqgDo7AyUHEqNhMbZI3
ok4Vncl0hSspEpbOMH/FYnjVLMQc0TkTZzLLOdiyiIdYF46FhusGMBNdDcUe6nR3
XZ25gLYPFVQDYqTk+tuNTzxXDPH4yaabYVvbOVStXVHOb68vOtk1+LKeZQoryQSx
ugZSiZDCurZB2VmgSzBR9HU+v2rtDaooSGGDXrAzBoQ5Ms85RKMPJ8w2YDzMRfpy
n8NJ+pKy5NA1H/XnqIQ2zO6FUvS5Wh5bnL9ZFdrIzHernhjK/wB1w3gVVY4STltb
alfD7osrYU25A4iQAhr/uKHGzUQ6uNjFA6HbkfzCeb0qipAGHgvKybbNF83uu5L1
2G/lSBTZ0aCsvlFhO7KM4t32IEYUnL8B9D4M99LD/6BnLwjDZu+EfJe0dtnwDaS1
QdVPs8iZtGcJS6JloVbWjSc6i307XNtOsgxq7I2Hi1/j5cfDu7vR8CTqQm7Qg7kK
BN58uwyat6RvbXvoYoduwz4yqjezh4XD9v7trnSHxcI3iGj33XDjeo/Vw/2Mvnvd
YfNw28cx4Z43qZ8r9z0GYfshsAHdDLDnx25Inozqf4Cy702PSjRrqAqPeL7XV97W
orzSwajabf7x9i9vweZ7

Collected data

  • Image links
  • Image page links
  • Snippets
  • Anchors
  • Image width and height
  • Thumbnail links

Features

  • Supports all filters (size, orientation, type, color, file type, ...)
  • SafeSearch can be configured
  • AntiGate can be used for captcha solving

Use cases

  • Scraping images to fill your own blogs, video sites, doorway sites, ...
  • Collecting avatar databases

Queries

Specify a search phrase as the query, for example:

Cats
Football
Waterfall
Speak in english
cars

Query macros

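You can use built-in macros to multiply queries. For example, to collect a very large database of forums, specify several base queries in different languages: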

forum
форум
foro
论坛

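In the query format, specify iteration over characters from a to zzzz. This rotates the search results as much as possible and yields many new unique results: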

$query {az:a:zzzz}

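This macro creates 475254 additional queries for each base query, for a total of 4 x 475254 = 1901016 search queries. That is an impressive number, but not a problem for A-Parser: at 2000 queries per minute, such a task takes only about 16 hours.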
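The arithmetic above can be checked with a short Python sketch. It assumes that {az:a:zzzz} enumerates lowercase Latin letters only, shortest strings first (a, b, ..., z, aa, ..., zzzz), that each expanded query is the base query followed by a space and the generated suffix, and it uses the 2000 queries per minute rate quoted above. `az_range` is a hypothetical helper for illustration, not part of A-Parser.

```python
from itertools import islice, product
from string import ascii_lowercase


def az_range(start="a", end="zzzz"):
    """Yield a, b, ..., z, aa, ab, ..., zzzz (shortest strings first).

    Hypothetical stand-in for A-Parser's {az:a:zzzz} macro; assumes it
    iterates lowercase Latin letters only, from start to end inclusive.
    """
    started = False
    for length in range(1, len(end) + 1):
        for letters in product(ascii_lowercase, repeat=length):
            word = "".join(letters)
            if word == start:
                started = True
            if started:
                yield word
            if word == end:
                return


BASE_QUERY_COUNT = 4          # four base queries in different languages
QUERIES_PER_MINUTE = 2000     # throughput figure quoted in the text

per_query = sum(1 for _ in az_range())        # 26 + 26**2 + 26**3 + 26**4
total = BASE_QUERY_COUNT * per_query
hours = total / QUERIES_PER_MINUTE / 60

print([f"forum {s}" for s in islice(az_range(), 3)])  # ['forum a', 'forum b', 'forum c']
print(per_query)              # 475254
print(total)                  # 1901016
print(round(hours, 1))        # 15.8
```

The 15.8 hours result matches the rounded "about 16 hours" figure; real throughput depends on proxies, threads, and captchas.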

Result output examples

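A-Parser supports flexible result formatting through the built-in Template Toolkit engine, which lets it output results in arbitrary form as well as in structured formats such as CSV or JSON.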

Default output

Result format:

$serp.format('$link\n')

Example result:

https://shkolazhizni.ru/img/content/i155/155642_or.jpg
https://afishanovgorod.ru/wp-content/uploads/2020/01/kinolenta-1-scaled.jpg
https://avatars.mds.yandex.net/get-zen_doc/1616946/pub_5cdfb3f33b662600b29d78c4_5cdfb78b6a5bb500b3d59589/scale_1200
https://avatars.mds.yandex.net/get-zen_doc/3445317/pub_5f24b39c1054270f0a24d074_5f24bd9254d4fa66c07b6925/scale_1200
https://www.touchdynamic.com/wp-content/uploads/2014/04/bigstock-Young-people-sitting-in-multip-46563898.jpg
https://avatars.mds.yandex.net/get-zen_doc/245342/pub_5c89f1b42d1df300b38fec91_5c8a1a5250389e00b37bf9d7/scale_1200
https://www.quirkybyte.com/wp-content/uploads/2015/08/01.jpg
https://avatars.mds.yandex.net/get-zen_doc/40456/pub_5cbc81be1c8e87021bb87d80_5cbc8bfec6be9900b2455a2b/scale_1200
https://avatars.mds.yandex.net/get-zen_doc/235144/pub_5d23d51f31878200adb4e9a4_5d23d65a998ed600aee64e42/scale_1200
https://www.kinobusiness.com/upload/iblock/c5b/DF_04550_R2_novyy-razmer.jpg
https://ovideo.ru/images/gallery/0017/9810/0042.jpg

Output to a CSV table

Result format:

[% FOREACH item IN serp;
tools.CSVline(query, item.link, item.width, item.height, item.page, item.thumb);
END %]

Example result:

cats,https://pixy.org/src/18/184373.jpg,4751,3984,https://pixy.org/184373/,"Mongrel <b>cat</b> image.",http://im0-tub-ru.yandex.net/i?id=c480a842fb1a5ba2253ee91d07fd90aa&n=13
cats,https://cdn.hipwallpaper.com/i/92/81/b7KSXF.jpg,1024,768,https://hipwallpaper.com/wallpapers-of-cats,"<b>Cats</b> images Kitten Wallpaper (1024x768) wallpaper photos (12251034) .",http://im0-tub-ru.yandex.net/i?id=8b1a9cef01d0a6dedb97b03af8f0ddb1&n=13
cats,http://s1.1zoom.net/big3/434/Cats_Kittens_White_Grass_461681.jpg,5110,3400,http://www.1zoom.net/Animals/wallpaper/461681/z6860/5110x3400,"Images Kittens <b>Cats</b> White Grass Animals 5110x3400 kitty <b>cat</b> <b>cat</b> animal . ",http://im0-tub-ru.yandex.net/i?id=73ddd1e5b0d07eefeffdfde8262c8bf6&n=13
cats,https://get.wallhere.com/photo/cat-grass-whiskers-rest-wild-cat-light-fauna-mammal-collar-vertebrate-cat-like-mammal-small-to-medium-sized-cats-tabby-cat-domestic-short-haired-cat-bengal-european-shorthair-pixie-bob-720415.jpg,2560,1600,https://wallhere.com/en/wallpaper/720415,"Download Wallpaper <b>cat</b>, grass, whiskers, rest, wild <b>cat</b>, light, fauna, mamm...",http://im0-tub-ru.yandex.net/i?id=55bb113936977a5b61509b0daac844c7&n=13
cats,https://i.artfile.ru/2000x1333_437668_[www.ArtFile.ru].jpg,2000,1333,https://www.artfile.ru/i.php?i=437668,"Animals, cats ",http://im0-tub-ru.yandex.net/i?id=51f12adb1d2b751a33de66a5292cf6a6&n=13
cats,http://s1.1zoom.net/big7/481/Cats_Glance_352969.jpg,2560,1706,http://www.1zoom.net/Animals/wallpaper/352969/z4306/%26original=3,"Images <b>Cats</b> Glance animal <b>cat</b> Staring Animals.",http://im0-tub-ru.yandex.net/i?id=651291c9c226b0b476cc3f6da05c089b&n=13
cats,https://cdn.wallpapersafari.com/87/66/macZVd.jpg,1600,1200,https://wallpapersafari.com/kawaii-cat-wallpaper/,"Find more Cute Kittens images Damn cute <b>cats</b> wallpaper photos. ",http://im0-tub-ru.yandex.net/i?id=7301df7db9522b89ed6fe88e4d2e17db&n=13
cats,https://wallup.net/wp-content/uploads/2018/10/07/705-cuddle-cat.jpg,2560,1600,https://wallup.net/?p=709264,"cuddle, <b>Cat</b> Wallpapers HD / Desktop and Mobile Backgrounds.",http://im0-tub-ru.yandex.net/i?id=400a0ce0f68b5c2a7674c5c6ba75b5b5&n=13
cats,https://c.pxhere.com/photos/c2/c4/Animals_Wallpaper_Cat_Closeup_Desktop_Wallpaper_Feline_Kitten_Pet-1612853.jpg!d,1200,797,https://pxhere.com/en/photo/1612853,"<b>cat</b>, whiskers, mammal, small to medium sized <b>cats</b>, felidae, nose, face",http://im0-tub-ru.yandex.net/i?id=0060cbf3f602c21ad2f026892e01001a&n=13
cats,https://i.ebayimg.com/00/s/MTE2MFgxNTYy/z/evQAAOSwImRYXCtO/$_57.JPG?set_id=8800005007,1562,1160,https://sendle.ru/88861-kart-kartochek-s-zametkami/163098755590-4pack-cat-cats-kitten-kittens-spring-stationery-greeting-notecards-envelopes-2.html,"4Pack <b>Cat</b> <b>Cats</b> Kitten Kittens Spring Stationery Greeting Notecards...",http://im0-tub-ru.yandex.net/i?id=117d40eb07ff3953141763b3c736805d&n=13
cats,https://www.wallpaperup.com/uploads/wallpapers/2013/12/19/199466/aa19b19c04da9b48b68b1365c103588d.jpg,1920,1200,https://www.wallpaperup.com/199466/cats_animals_pets.html,"<b>cats</b> animals pets wallpaper.",http://im0-tub-ru.yandex.net/i?id=fdebf609d67809cd910596dd3d6026d9&n=13
cats,https://www.wallpaperup.com/uploads/wallpapers/2017/10/17/1115160/3c78081c04f34444da4dc54f1bcd3a65.jpg,2048,1365,https://www.wallpaperup.com/1115160/kitten_cat_baby_cute.html,"kitten <b>cat</b> baby cute wallpaper.",http://im0-tub-ru.yandex.net/i?id=d9e9dae7b1e4009c7fcfab7bb7548baf&n=13
cats,https://pbs.twimg.com/media/DTa7lXRX4AEm_rB.jpg:large,1600,1200,https://twitter.com/cats_and_you/status/952154716137549825,"<b>Cats</b>_and_you on Twitter.",http://im0-tub-ru.yandex.net/i?id=4d605439d3e7bf5c340b0b357a5f084a&n=13
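A CSV file produced this way can be post-processed with Python's standard csv module. The sketch below assumes the six columns emitted by the template above (query, link, width, height, page, thumb) and keeps only images of at least a given resolution; `load_images` and the 1920x1080 threshold are illustrative choices, and the sample rows reuse values from the example output. If you pass more fields to tools.CSVline, such as a snippet, extend COLUMNS to match, because rows with an unexpected number of fields are skipped.

```python
import csv
import io

# Column order assumed to match the CSVline call in the template above.
COLUMNS = ["query", "link", "width", "height", "page", "thumb"]


def load_images(fileobj, min_width=0, min_height=0):
    """Parse CSV rows and keep images of at least min_width x min_height."""
    images = []
    for row in csv.reader(fileobj):
        if len(row) != len(COLUMNS):
            continue  # skip rows that do not match the assumed layout
        record = dict(zip(COLUMNS, row))
        record["width"] = int(record["width"])
        record["height"] = int(record["height"])
        if record["width"] >= min_width and record["height"] >= min_height:
            images.append(record)
    return images


sample = io.StringIO(
    "cats,https://pixy.org/src/18/184373.jpg,4751,3984,https://pixy.org/184373/,"
    "http://im0-tub-ru.yandex.net/i?id=c480a842fb1a5ba2253ee91d07fd90aa&n=13\n"
    "cats,https://cdn.hipwallpaper.com/i/92/81/b7KSXF.jpg,1024,768,"
    "https://hipwallpaper.com/wallpapers-of-cats,"
    "http://im0-tub-ru.yandex.net/i?id=8b1a9cef01d0a6dedb97b03af8f0ddb1&n=13\n"
)

large = load_images(sample, min_width=1920, min_height=1080)
print([img["link"] for img in large])  # ['https://pixy.org/src/18/184373.jpg']
```

When reading a real file, open it with `newline=""` as the csv module documentation recommends.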

Saving in SQL format

Result format:

[% FOREACH serp;
"INSERT INTO serp VALUES('" _ query _ "', '" _ link _ "', '" _ pagelink _ "', '" _ thumb _ "')\n";
END %]

Example result:

INSERT INTO serp VALUES('cats', 'https://pixy.org/src/18/184373.jpg', 'https://pixy.org/184373/', 'http://im0-tub-ru.yandex.net/i?id=c480a842fb1a5ba2253ee91d07fd90aa&n=13')
INSERT INTO serp VALUES('cats', 'http://s1.1zoom.net/big3/434/Cats_Kittens_White_Grass_461681.jpg', 'http://www.1zoom.net/Animals/wallpaper/461681/z6860/5110x3400', 'http://im0-tub-ru.yandex.net/i?id=73ddd1e5b0d07eefeffdfde8262c8bf6&n=13')
INSERT INTO serp VALUES('cats', 'https://cdn.hipwallpaper.com/i/92/81/b7KSXF.jpg', 'https://hipwallpaper.com/wallpapers-of-cats', 'http://im0-tub-ru.yandex.net/i?id=8b1a9cef01d0a6dedb97b03af8f0ddb1&n=13')
INSERT INTO serp VALUES('cats', 'https://i.artfile.ru/2000x1333_437668_[www.ArtFile.ru].jpg', 'https://www.artfile.ru/i.php?i=437668', 'http://im0-tub-ru.yandex.net/i?id=51f12adb1d2b751a33de66a5292cf6a6&n=13')
INSERT INTO serp VALUES('cats', 'https://get.wallhere.com/photo/cat-grass-whiskers-rest-wild-cat-light-fauna-mammal-collar-vertebrate-cat-like-mammal-small-to-medium-sized-cats-tabby-cat-domestic-short-haired-cat-bengal-european-shorthair-pixie-bob-720415.jpg', 'https://wallhere.com/en/wallpaper/720415', 'http://im0-tub-ru.yandex.net/i?id=55bb113936977a5b61509b0daac844c7&n=13')
INSERT INTO serp VALUES('cats', 'https://cdn.wallpapersafari.com/87/66/macZVd.jpg', 'https://wallpapersafari.com/kawaii-cat-wallpaper/', 'http://im0-tub-ru.yandex.net/i?id=7301df7db9522b89ed6fe88e4d2e17db&n=13')
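To try the generated statements, you can run them against an in-memory SQLite database from Python. The table schema is an assumption (the template only fixes the column order: query, link, pagelink, thumb), and each line is executed as a separate statement because the template does not end them with semicolons. Note that the template concatenates values without escaping, so a value containing a single quote would produce an invalid statement.

```python
import sqlite3

# Hypothetical schema: the template emits four values per row
# (query, link, pagelink, thumb), so the table needs four columns.
SCHEMA = "CREATE TABLE serp (query TEXT, link TEXT, pagelink TEXT, thumb TEXT)"


def load_inserts(lines, conn):
    """Create the table, then execute one generated INSERT per non-empty line."""
    conn.execute(SCHEMA)
    for line in lines:
        line = line.strip()
        if line:
            conn.execute(line)  # one statement per call; no trailing ';' needed
    conn.commit()


statements = [
    "INSERT INTO serp VALUES('cats', 'https://pixy.org/src/18/184373.jpg', "
    "'https://pixy.org/184373/', "
    "'http://im0-tub-ru.yandex.net/i?id=c480a842fb1a5ba2253ee91d07fd90aa&n=13')",
    "INSERT INTO serp VALUES('cats', 'https://cdn.hipwallpaper.com/i/92/81/b7KSXF.jpg', "
    "'https://hipwallpaper.com/wallpapers-of-cats', "
    "'http://im0-tub-ru.yandex.net/i?id=8b1a9cef01d0a6dedb97b03af8f0ddb1&n=13')",
]

conn = sqlite3.connect(":memory:")
load_inserts(statements, conn)
print(conn.execute("SELECT COUNT(*) FROM serp").fetchone()[0])  # 2
```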

Dumping results to JSON

General result format:

[% IF notFirst;
",\n";
ELSE;
notFirst = 1;
END;

obj = {};
obj.query = query;
obj.images = [];

FOREACH item IN p1.serp;
obj.images.push({
width = item.width
height = item.height
link = item.link
pagelink = item.pagelink
thumb = item.thumb
});
END;

obj.json %]

Prepend text:

[

Append text:

]

Example result:

[{
"images": [
{
"link": "https://pixy.org/src/18/184373.jpg",
"width": 4751,
"page": "https://pixy.org/184373/",
"thumb": "http://im0-tub-ru.yandex.net/i?id=c480a842fb1a5ba2253ee91d07fd90aa&n=13",
"height": 3984
},
{
"link": "http://s1.1zoom.net/big3/434/Cats_Kittens_White_Grass_461681.jpg",
"width": 5110,
"page": "http://www.1zoom.net/Animals/wallpaper/461681/z6860/5110x3400",
"thumb": "http://im0-tub-ru.yandex.net/i?id=73ddd1e5b0d07eefeffdfde8262c8bf6&n=13",
"height": 3400
},
{
"link": "https://cdn.hipwallpaper.com/i/92/81/b7KSXF.jpg",
"width": 1024,
"page": "https://hipwallpaper.com/wallpapers-of-cats",
"thumb": "http://im0-tub-ru.yandex.net/i?id=8b1a9cef01d0a6dedb97b03af8f0ddb1&n=13",
"height": 768
},

],
"query": "cats"
}]
Tip

To make the "Prepend text" and "Append text" options available in the task editor, enable "More options".
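With "[" as the prepend text and "]" as the append text, the output file is a single JSON array that Python's json module can read. The sketch below is an illustration, not part of A-Parser: `iter_images` flattens the dump into (query, link, page, width, height) tuples, and the sample string reuses values from the example result. Because the template writes the key pagelink while the example output shows page, the sketch accepts either.

```python
import json


def iter_images(text):
    """Yield (query, link, page, width, height) for every image in the dump."""
    for entry in json.loads(text):
        for image in entry.get("images", []):
            # The template writes "pagelink"; the sample output shows "page".
            page = image.get("pagelink", image.get("page"))
            yield entry["query"], image["link"], page, image["width"], image["height"]


dump = """[{
"images": [
{"link": "https://pixy.org/src/18/184373.jpg", "width": 4751,
 "page": "https://pixy.org/184373/", "height": 3984},
{"link": "https://cdn.hipwallpaper.com/i/92/81/b7KSXF.jpg", "width": 1024,
 "page": "https://hipwallpaper.com/wallpapers-of-cats", "height": 768}
],
"query": "cats"
}]"""

for query, link, page, width, height in iter_images(dump):
    print(query, f"{width}x{height}", link)
# cats 4751x3984 https://pixy.org/src/18/184373.jpg
# cats 1024x768 https://cdn.hipwallpaper.com/i/92/81/b7KSXF.jpg
```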

Available settings

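| Parameter | Default value | Description |
|---|---|---|
| AntiGate preset | default | Select a Util::AntiGate preset; see the Util::AntiGate documentation for configuration details |
| AntiGate preset for old captcha | default | Same as AntiGate preset, but used only for regular (old, single-image) captchas. If no preset is selected here, such captchas use the preset selected in AntiGate preset |
| Experimental img captcha max count | 5 | Maximum number of repeated captcha images per attempt |
| Preffered captcha type | Click | Preferred captcha type: Click or Puzzle |
| Pages count | 10 | Number of pages to scrape |
| Image size | Any | Image size |
| Orientation | Any | Orientation |
| Image type | Any | Image type |
| Image color | Any | Image color |
| File type | Any | Image file type |
| Commercial | | Show only paid images (products) |
| Recent | Any | Show all images or only recent ones |
| Security level | Moderate | SafeSearch level |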