
GooglePlay::Apps - Google Play app scraper

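Scraper overview

GooglePlay::Apps scrapes app data from Google Play. It automatically collects the name, rating, author, description, app link, and price of each app.

You can use automatic query expansion, subquery substitution from files, and enumeration of alphanumeric combinations and lists to get as many results as possible. With result filtering, you can clean the results right away by removing unwanted entries (using negative words).

A-Parser lets you save the GooglePlay::Apps scraper settings for later use (presets), set up a scraping schedule, and more.

Thanks to the built-in Template Toolkit template engine, results can be saved in any form and structure you need. The engine lets you apply additional logic to the results and output data in various formats, including JSON, SQL, and CSV.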

Scraper use cases

Collected data

Data is collected from the play.google.com service.

  • Name
  • Rating
  • Description
  • App link
  • Price
  • Number of downloads
  • App category

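Features

  • Select the country/region to search in
  • Select the language of the results page
  • Search by price (free or paid)
  • Search by rating (all or 4+)

Use cases

  • Scrape app data from Google Play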

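Queries

As a query, specify the phrase to search for in Google Play, for example:

heart rate monitor

Query substitution

You can use the built-in macros to substitute subqueries from a file automatically. For example, to add a list of extra words to every query, first specify several main queries: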

fantasy
tower defense
rpg

In the query format, specify a macro that substitutes the additional words from the file keywords.txt. This multiplies the variety of queries:

{subs:keywords} $query 

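For each original search query, the macro creates as many additional queries as there are entries in the file. After the macro runs, the total is [number of original queries] x [number of queries in the keywords file] = [total number of queries].

For example, if the file keywords.txt contains: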

free
online

the substitution macro turns the 3 main queries into 6:

free fantasy
online fantasy
free tower defense
online tower defense
free rpg
online rpg
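
The multiplication described above is a Cartesian product of the main queries and the lines of the keywords file. The following Python sketch only illustrates that arithmetic outside A-Parser; the function name `expand_queries` and its `template` parameter are hypothetical and are not part of A-Parser.

```python
from itertools import product


def expand_queries(queries, keywords, template="{sub} {query}"):
    """Pair every main query with every keyword (Cartesian product)."""
    return [
        template.format(sub=sub, query=query)
        for query, sub in product(queries, keywords)
    ]


queries = ["fantasy", "tower defense", "rpg"]
keywords = ["free", "online"]  # stand-in for the lines of keywords.txt

expanded = expand_queries(queries, keywords)
for line in expanded:
    print(line)
```

With 3 main queries and 2 keywords the list has 3 x 2 = 6 entries, in the same order as the listing above.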

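Result output examples

Thanks to the built-in Template Toolkit template engine, A-Parser supports flexible result formatting, so results can be output in free form or in a structured form such as CSV or JSON.

Default output

Result format: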

$serp.format('$name ($rating, $author): $desc ($link, $price)\n')

Example result:

Heart Rate Monitor (4.5, Meet Your Need Production): Most Precise Heart Rate Monitor Ever + Unlimited Measurements and Records. (https://play.google.com/store/apps/details?id=com.supersimpleapps.heart_rate_monitor_newui, free)  
Instant Heart Rate : Heart Rate & Pulse Monitor (4.3, Azumio Inc.): Best way to track heart rate, health, fitness, workout, cardio, stress, training (https://play.google.com/store/apps/details?id=si.modula.android.instantheartrate, free)
Pulse - Heart Rate Monitor (4.3, Ahmerov Vadim): Get your heart rate in seconds! The Best way to measure your heart rate. (https://play.google.com/store/apps/details?id=com.icebit.pulse, free)
Heart Rate Monitor (4.5, REPS): The simplest yet the most accurate app to monitor your heart rate! (https://play.google.com/store/apps/details?id=com.repsi.heartrate, free)
Cardiograph - Heart Rate Meter (3.8, MacroPinch): How fast is your heart beating? Measure your pulse. (https://play.google.com/store/apps/details?id=com.macropinch.hydra.android, free)
iCare Health Monitor (BP & HR) (4.6, iCare Fit Studio): Measure blood pressure,heart rate,vision,hearing,SpO2,breath rate by the phone (https://play.google.com/store/apps/details?id=comm.cchong.BloodAssistant, free)

Output to a CSV table

Result format:

[% FOREACH i IN serp; 
tools.CSVline(i.name, i.rating, i.author, i.price);
END %]

File name:

$datefile.format().csv

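Prepend text:

Name,Rating,Author,Price

Tip

The result format uses the Template Toolkit template engine to output the elements of the $serp array in a FOREACH loop.

In the result file name, simply change the file extension to csv.

To make the “Prepend text” option available in the task editor, activate “More options”. In “Prepend text”, write the comma-separated column names and leave the second line empty.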
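
To show what a file with a header line followed by one row per app looks like, here is a minimal Python sketch that uses the standard csv module. It is an analogy only: the exact quoting rules of tools.CSVline are not described here, and the function name `to_csv` and the English header are assumptions made for the illustration.

```python
import csv
import io


def to_csv(apps, header=("Name", "Rating", "Author", "Price")):
    """Write a header line and one CSV row per app, returning the text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)  # plays the role of the prepended column names
    for app in apps:
        writer.writerow([app["name"], app["rating"], app["author"], app["price"]])
    return buffer.getvalue()


apps = [
    {"name": "FAST Speed Test", "rating": "4.3", "author": "Netflix, Inc.", "price": "free"},
    {"name": "Speedtest by Ookla", "rating": "4.3", "author": "Ookla", "price": "free"},
]

csv_text = to_csv(apps)
print(csv_text)
```

Python's csv module wraps the author field "Netflix, Inc." in double quotes because it contains a comma; this is why a CSV-aware helper is preferable to joining fields with commas by hand.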

Saving in SQL format

Result format:

[% FOREACH serp;
"INSERT INTO serp VALUES('" _ name _ "', '" _ rating _ "', '" _ author _ "')\n";
END %]

Example result:

INSERT INTO serp VALUES('Internet Speed Test-FiberTest for Android Smart TV', '3.9', 'RedMango Analytics PVT LTD')
INSERT INTO serp VALUES('Phone Check and Test', '4.6', 'inPocket Software')
INSERT INTO serp VALUES('Speedtest by Ookla', '4.3', 'Ookla')
INSERT INTO serp VALUES('Expert Speed Test WiFi Analyzer - analiti', '4.6', 'analiti Experts Group')
INSERT INTO serp VALUES('Brain Test: Tricky Puzzles', '4.6', 'Unico Studio')
INSERT INTO serp VALUES('Love Test Calculator - Compatibility Tester Prank', '3.8', 'DH3 Games')
INSERT INTO serp VALUES('FAST Speed Test', '4.3', 'Netflix, Inc.')
INSERT INTO serp VALUES('DMV WRITTEN TEST: Free DMV Tests for All 50 States', '4.7', 'DMV Written Test')
INSERT INTO serp VALUES('Brain Test 2: Tricky Stories', '4.6', 'Unico Studio')
INSERT INTO serp VALUES('Internet speed test Meter- SpeedTest Master', '4.9', 'Test speed internet & Net meter')
INSERT INTO serp VALUES('Test Your Android - Hardware Testing & Utilities', '4.2', 'Hibernate')
INSERT INTO serp VALUES('Stupid Test - How Smart Are You?', '3.6', 'DH3 Games')
INSERT INTO serp VALUES('love test', '4.1', 'MobilPlug')
INSERT INTO serp VALUES('Personality Trait Test', '4.2', 'Aaditya Prakash')
INSERT INTO serp VALUES('Aptitude test. Personality test games', '4.2', 'iq test')
INSERT INTO serp VALUES('What animal are you? Test', '3.5', 'Hemisoft')
INSERT INTO serp VALUES('Lie Detector Face Test Simulator Prank', '3.9', 'Droidheads')
INSERT INTO serp VALUES('BFF Friendship Test', '4.1', 'DH3 Games')
INSERT INTO serp VALUES('IQ and Aptitude Test Practice', '4.0', 'LangiS')
INSERT INTO serp VALUES('FCC Speed Test', '3.3', 'FCCAPPs')
INSERT INTO serp VALUES('40+ Psychological Tests', '4.4', 'MuraDev')
INSERT INTO serp VALUES('G1 Test Genie: Drivers Test Practice Ontario 2021', '4.5', 'Elegant E-Learning')
INSERT INTO serp VALUES('Brain test - psychological and iq test', '4.4', 'iq test')
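
The template above concatenates values directly into the statement, so an app name containing a single quote would likely produce a broken INSERT. The Python sketch below shows one common way to handle that when post-processing results: doubling embedded single quotes, as standard SQL expects. The helper names `sql_quote` and `insert_statement` are hypothetical, and the second row is invented purely to show the escaping.

```python
def sql_quote(value):
    """Wrap a value in single quotes, doubling any embedded single quote."""
    return "'" + str(value).replace("'", "''") + "'"


def insert_statement(table, values):
    """Build one INSERT statement in the same shape as the template output."""
    quoted = ", ".join(sql_quote(v) for v in values)
    return "INSERT INTO {} VALUES({})".format(table, quoted)


# Row taken from the example result above.
plain = insert_statement("serp", ["Speedtest by Ookla", "4.3", "Ookla"])

# Hypothetical row with an apostrophe in the name.
escaped = insert_statement("serp", ["Kid's Quiz", "4.0", "Example Dev"])

print(plain)
print(escaped)
```

When the rows are loaded by a program rather than written to a .sql file, parameterized queries from the database driver are usually a safer choice than manual quoting.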

Dumping results to JSON

General result format:

[% IF notFirst;
",\n";
ELSE;
notFirst = 1;
END;

obj = {};
obj.serp = [];

FOREACH item IN p1.serp;
obj.serp.push({
name = item.name
author = item.author
rating = item.rating
price = item.price
});
END;

obj.json %]

Prepend text:

[

Append text:

]

Example result:

[{"serp": [{"name":"Internet Speed Test-FiberTest for Android Smart TV","price":"free","author":"RedMango Analytics PVT LTD","rating":"3.9"},{"name":"Phone Check and Test","price":"free","author":"inPocket Software","rating":"4.6"},{"name":"Speedtest by Ookla","price":"free","author":"Ookla","rating":"4.3"},{"name":"Expert Speed Test WiFi Analyzer - analiti","price":"free","author":"analiti Experts Group","rating":"4.6"},{"name":"Brain Test: Tricky Puzzles","price":"free","author":"Unico Studio","rating":"4.6"},{"name":"Love Test Calculator - Compatibility Tester Prank","price":"free","author":"DH3 Games","rating":"3.8"},{"name":"FAST Speed Test","price":"free","author":"Netflix, Inc.","rating":"4.3"},{"name":"DMV WRITTEN TEST: Free DMV Tests for All 50 States","price":"free","author":"DMV Written Test","rating":"4.7"},{"name":"Internet speed test Meter- SpeedTest Master","price":"free","author":"Test speed internet & Net meter","rating":"4.9"},{"name":"Brain Test 2: Tricky Stories","price":"free","author":"Unico Studio","rating":"4.6"},{"name":"Stupid Test - How Smart Are You?","price":"free","author":"DH3 Games","rating":"3.6"}]}]

Tip

To make the “Prepend text” and “Append text” options available in the task editor, activate “More options”.
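
The notFirst flag in the result format puts a comma before every object except the first, so that the prepended "[" and appended "]" close a valid JSON array. The Python sketch below reproduces that pattern outside A-Parser under the assumption that each query yields one object; the function name `dump_chunks` is hypothetical.

```python
import json


def dump_chunks(chunks):
    """Join per-query objects into one JSON array using a not-first flag."""
    parts = ["["]  # prepend text
    not_first = False
    for serp in chunks:
        if not_first:
            parts.append(",\n")  # separator before every object but the first
        else:
            not_first = True
        parts.append(json.dumps({"serp": serp}))
    parts.append("]")  # append text
    return "".join(parts)


chunk_a = [{"name": "Speedtest by Ookla", "author": "Ookla", "rating": "4.3", "price": "free"}]
chunk_b = [{"name": "FAST Speed Test", "author": "Netflix, Inc.", "rating": "4.3", "price": "free"}]

text = dump_chunks([chunk_a, chunk_b])
parsed = json.loads(text)  # raises ValueError if the text is not valid JSON
print(len(parsed))
```

Emitting the separator before each item rather than after it avoids a trailing comma, which JSON does not allow.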

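Possible settings

Parameter   Default value   Description
Country     United States   Select the country/region to search in.
Language    English         Select the language of the results page.
Prices      All             Filter results by price (free or paid).
Ratings     All             Filter results by rating.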