Reddit::Posts - Reddit posts scraper

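Reddit::Posts scraper overview
Reddit::Posts is a Reddit posts scraper. It collects lists of posts, along with the details of each post, from the service of the same name.
To get as many results as possible, you can use automatic query expansion, substitution of subqueries from files, enumeration of alphanumeric combinations, and lists.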
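As a rough illustration of how enumerating alphanumeric combinations multiplies the number of queries, the sketch below appends every a-z/0-9 combination of a given length to a base keyword. This is a generic Python sketch, not A-Parser's query builder; the helper name `expand_alphanumeric` and the space-separated suffix scheme are assumptions.

```python
import string
from itertools import product


def expand_alphanumeric(base: str, length: int = 1) -> list[str]:
    """Hypothetical helper: append each a-z/0-9 combination of `length` chars to `base`."""
    alphabet = string.ascii_lowercase + string.digits  # 26 letters + 10 digits
    return [f"{base} {''.join(combo)}" for combo in product(alphabet, repeat=length)]


queries = expand_alphanumeric("parser")
print(len(queries))  # 36
print(queries[:3])   # ['parser a', 'parser b', 'parser c']
```

With `length=2` the same keyword yields 36 × 36 = 1296 queries, which is why this kind of expansion is usually combined with a limited page count.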
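A-Parser lets you save Reddit::Posts scraper settings for future use (presets), schedule scraping runs, and more.
Thanks to the powerful built-in Template Toolkit template engine, results can be saved in any form and structure you need. This lets you apply additional logic to the results and output data in various formats, including JSON, SQL, and CSV.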
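Collected data
Array of posts:
- Post link
- Title and flair
- Score, number of comments, and number of awards
- Creation date
- Community where the post was published
- Author and author flair
- Post content: markdown text, links to media, and links to external resources
- Whether the post is an advertisement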
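Features
- Set the number of pages to scrape
- Set the sort order of results
- Select the time range of results
- Scrape within a specific community
Use cases
- Any task that requires Reddit post data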
Queries
Several query types are supported:
Topic links
Examples:
https://www.reddit.com/t/bitcoin/
https://www.reddit.com/t/kim_kardashian/
By default, the result is a list of post links, for example:
https://www.reddit.com/r/Bitcoin/comments/14nbyy2/i_took_out_a_35000_loan_to_buy_bitcoin_1_year/
https://www.reddit.com/r/CryptoCurrency/comments/14guprs/bitcoin_is_up_75_since_jim_cramer_told_investors/
https://www.reddit.com/r/Bitcoin/comments/14opp2t/this_guy_was_paid_32_bitcoin_to_hold_up_this_sign/
https://www.reddit.com/r/CryptoCurrency/comments/14ivx43/nearly_69_of_all_bitcoin_supply_did_not_move_in/
https://www.reddit.com/r/CryptoCurrency/comments/149vy0o/bitcoin_dips_below_25k_for_the_first_time_in_3/
...
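A topic link can be told apart from other query types by its URL path. A minimal Python sketch, assuming every topic link has the shape `https://www.reddit.com/t/<slug>/` as in the examples above; `topic_slug` is a hypothetical helper for illustration, not part of A-Parser.

```python
from typing import Optional
from urllib.parse import urlparse


def topic_slug(query: str) -> Optional[str]:
    """Return <slug> for a reddit.com/t/<slug>/ link, or None for any other query."""
    parsed = urlparse(query.strip())
    if parsed.netloc not in ("www.reddit.com", "reddit.com"):
        return None  # plain keywords and non-Reddit URLs have no matching host
    parts = [p for p in parsed.path.split("/") if p]
    if len(parts) >= 2 and parts[0] == "t":
        return parts[1]
    return None


print(topic_slug("https://www.reddit.com/t/kim_kardashian/"))  # kim_kardashian
print(topic_slug("https://www.reddit.com/r/nba/"))             # None
```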
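Community links
Time range and sort order parameters in the link are also taken into account; in that case, the corresponding options in the settings are ignored. Examples: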
https://www.reddit.com/r/nba/
https://www.reddit.com/r/OrlandoMagic/top/?t=month
By default, the result is a list of post links, for example:
https://www.reddit.com/r/OrlandoMagic/comments/14a5br2/
https://www.reddit.com/r/OrlandoMagic/comments/14nqfk1/keep_mo_or_no_mo/
https://www.reddit.com/r/nba/comments/14nfzki/202324_nba_free_agent_tracker/
https://www.reddit.com/user/Grammarly/comments/14ghtld/verbessere_deine_schreibfertigkeit_auf_englisch/
https://www.reddit.com/r/nba/comments/14r4l4s/vernon_dillon_brooks_took_991_shots_last_year_he/
https://www.reddit.com/r/nba/comments/14ql1es/highlight_matt_devlin_inexplicably_yells_punjabi/
https://www.reddit.com/user/TelekomShop/comments/yqkina/der_highspeedhotspot_zum_mitnehmen_die_speedbox/
https://www.reddit.com/r/nba/comments/14qysvi/michael_jordan_with_the_spin_hanging_onehanded/
https://www.reddit.com/r/nba/comments/14qxrep/dwyane_wade_leads_the_redeem_team_with_27_points/
...
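The precedence rule for community links (values in the link win over the Sort and Time settings) can be sketched as follows. This is an illustrative Python sketch, not A-Parser's implementation: it assumes the sort order is the path segment after the community name and the time range is the `t` query parameter, as in `https://www.reddit.com/r/OrlandoMagic/top/?t=month`. `DEFAULT_SETTINGS` and `parse_community_link` are hypothetical names, and link values are kept exactly as they appear in the URL.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse

# Hypothetical stand-ins for the "Sort" and "Time" options, using the table defaults.
DEFAULT_SETTINGS = {"sort": "Relevance", "time": "All time"}


def parse_community_link(url: str, settings: Optional[dict] = None) -> Optional[dict]:
    """Parse /r/<community>/[<sort>/][?t=<time>]; values in the link override settings."""
    settings = DEFAULT_SETTINGS if settings is None else settings
    parsed = urlparse(url.strip())
    parts = [p for p in parsed.path.split("/") if p]
    if parsed.netloc not in ("www.reddit.com", "reddit.com") or len(parts) < 2 or parts[0] != "r":
        return None
    result = {"community": parts[1], **settings}  # start from the configured options
    if len(parts) >= 3:
        result["sort"] = parts[2]                 # e.g. "top" from /r/OrlandoMagic/top/
    t = parse_qs(parsed.query).get("t")
    if t:
        result["time"] = t[0]                     # e.g. "month" from ?t=month
    return result


print(parse_community_link("https://www.reddit.com/r/OrlandoMagic/top/?t=month"))
# {'community': 'OrlandoMagic', 'sort': 'top', 'time': 'month'}
print(parse_community_link("https://www.reddit.com/r/nba/"))
# {'community': 'nba', 'sort': 'Relevance', 'time': 'All time'}
```

A link without a sort segment or `t` parameter, such as `https://www.reddit.com/r/nba/`, simply keeps the values from the settings.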
Keywords
Examples:
wordpress features
parser
By default, the result is a list of post links, for example:
https://www.reddit.com/r/ShitpostXIV/comments/14511em/i_am_a_proud_grey_parser/
https://www.reddit.com/r/opengl/comments/147sbjk/4_hours_of_my_obj_parser_so_far/
https://www.reddit.com/r/Compilers/comments/14pi9xh/demystifying_pratt_parsers/
https://www.reddit.com/r/ZETTAHOST/comments/11qdg99/how_to_change_the_wordpress_featured_image_size/
https://www.reddit.com/r/Wordpress/comments/14p1k2p/what_features_is_wordpress_missing_i_want_to_help/
https://www.reddit.com/r/Wordpress/comments/13q8g5x/is_it_possible_and_advisable_to_build_a_website/
...
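To see how a keyword query relates to the Sort and Time options, it can be mapped onto Reddit's public search page, which takes `q`, `sort`, and `t` URL parameters (`relevance` and `all` correspond to the table defaults). This is an assumption for illustration only: the scraper's actual requests may differ, and `search_url` is a hypothetical helper.

```python
from urllib.parse import urlencode


def search_url(keyword: str, sort: str = "relevance", time: str = "all") -> str:
    """Hypothetical mapping of a keyword query onto Reddit's public search URL."""
    params = urlencode({"q": keyword, "sort": sort, "t": time})  # spaces become "+"
    return f"https://www.reddit.com/search/?{params}"


print(search_url("wordpress features"))
# https://www.reddit.com/search/?q=wordpress+features&sort=relevance&t=all
```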
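Keywords and community links
The scraper supports searching for keywords within a specific community. To do this, write the keyword in the query, followed by a space and the community link. Examples: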
jesus https://www.reddit.com/r/atheism/
stage 3 https://www.reddit.com/r/Audi/
By default, the result is a list of post links, for example:
https://www.reddit.com/r/Audi/comments/vi6cs5/thoughts_on_used_stage_3_2017_a3/
https://www.reddit.com/r/Audi/comments/lfvjuo/just_picked_up_this_beauty_stage_3_b5_s4/
https://www.reddit.com/r/Audi/comments/ssr8ui/anyone_else_track_their_audis_ttrs_stage_3_big/
https://www.reddit.com/r/atheism/comments/14lq0y6/heaven_and_hell_are_not_what_jesus_preached/
https://www.reddit.com/r/atheism/comments/13gxzj6/so_jesus_freaks_can_shove_their_religion_onto/
https://www.reddit.com/r/atheism/comments/13b8kl6/chris_pratt_compares_his_struggles_to_jesus/
https://www.reddit.com/r/atheism/comments/137k88b/artwork_of_jesus_surrounded_by_hot_leather/
...
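A query of this form can be split at the last space: if the final token is a community link, everything before it is the keyword. A minimal Python sketch under that assumption; `split_query` is a hypothetical helper, and queries whose last token is not a `/r/<community>` link are treated as plain keywords.

```python
from typing import Optional, Tuple
from urllib.parse import urlparse


def split_query(query: str) -> Tuple[str, Optional[str]]:
    """Split "<keyword> <community link>" into (keyword, community name or None)."""
    query = query.strip()
    head, sep, tail = query.rpartition(" ")
    if sep:
        parsed = urlparse(tail)
        parts = [p for p in parsed.path.split("/") if p]
        if parsed.netloc in ("www.reddit.com", "reddit.com") and len(parts) >= 2 and parts[0] == "r":
            return head.strip(), parts[1]
    return query, None  # no trailing community link: the whole query is a keyword


print(split_query("stage 3 https://www.reddit.com/r/Audi/"))  # ('stage 3', 'Audi')
print(split_query("wordpress features"))                      # ('wordpress features', None)
```

Splitting from the right keeps multi-word keywords such as `stage 3` intact.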
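Result output options
A-Parser supports flexible result formatting through its built-in Template Toolkit template engine, so results can be output in any form, including structured formats such as CSV or JSON.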
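In A-Parser itself this formatting is done with Template Toolkit templates. Purely to illustrate what structured JSON and CSV output of the collected fields looks like, here is a swapped-in Python post-processing sketch; the field names `link` and `community` are hypothetical, and the sample record reuses a post link and community from the examples above.

```python
import csv
import io
import json

# Hypothetical field names for two of the collected values (post link, community).
FIELDS = ["link", "community"]


def to_json(posts: list) -> str:
    """Serialize post records as a JSON array."""
    return json.dumps(posts, ensure_ascii=False)


def to_csv(posts: list) -> str:
    """Serialize post records as CSV with a header row; unknown keys are ignored."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    writer.writerows(posts)
    return buf.getvalue()


posts = [
    {"link": "https://www.reddit.com/r/nba/comments/14nfzki/202324_nba_free_agent_tracker/",
     "community": "nba"},
]
print(to_csv(posts))
print(to_json(posts))
```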
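Available settings
| Parameter | Default | Description |
|---|---|---|
| Pages count | 5 | Number of search result pages to scrape |
| Sort | Relevance | Sort order of results |
| Time | All time | Time range of results |
| Use HTTP/2 transport | ☑ | Whether to use HTTP/2 instead of HTTP/1.1 |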