Overview

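The new JavaScript API v2 aims to simplify development by deprecating generators in favor of async/await. It also lifts several limitations of the previous version, adds TypeScript support, and improves performance. We recommend using this JavaScript API for all new parsers.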
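As a rough illustration of what dropping generators means in practice, the sketch below compares the two styles on a mocked request. fakeRequest is a hypothetical stand-in for this.request(), the generator version only approximates the older style, and run() is a minimal co-style driver written for this example; none of these names come from A-Parser.

```typescript
// Hypothetical response shape and request stub; in a real parser this.request() does the HTTP work.
type FakeResponse = { success: boolean; data: string };

function fakeRequest(url: string): Promise<FakeResponse> {
  return Promise.resolve({ success: true, data: `<title>${url}</title>` });
}

// Generator style (approximation of the older API): yield a promise, receive its resolved value.
function* parseWithGenerator(url: string): Generator<Promise<FakeResponse>, string, FakeResponse> {
  const response = yield fakeRequest(url);
  return response.success ? response.data : '';
}

// Minimal driver: resumes the generator with each resolved value, or throws rejections into it.
function run<T, R>(gen: Generator<Promise<T>, R, T>): Promise<R> {
  return new Promise<R>((resolve, reject) => {
    const step = (advance: () => IteratorResult<Promise<T>, R>): void => {
      let result: IteratorResult<Promise<T>, R>;
      try {
        result = advance();
      } catch (err) {
        reject(err);
        return;
      }
      if (result.done) {
        resolve(result.value);
        return;
      }
      result.value.then(
        (value) => step(() => gen.next(value)),
        (error) => step(() => gen.throw(error)),
      );
    };
    step(() => gen.next());
  });
}

// async/await style: the language handles suspension, so no driver is needed.
async function parseWithAsync(url: string): Promise<string> {
  const response = await fakeRequest(url);
  return response.success ? response.data : '';
}
```

Both functions resolve to the same string; the async version lets the runtime do what run() does by hand, which is the kind of boilerplate the switch to async/await removes.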

To use JavaScript API v2, your parser class only needs to extend the base class BaseParser. The following example shows how such a parser class is structured:


files/parsers/v2-example/v2-example.ts
import { BaseParser } from 'a-parser-types';

export class JS_v2_example extends BaseParser {
    static defaultConf: typeof BaseParser.defaultConf = {
        version: '0.0.1',
        results: {
            flat: [
                ['title', 'Title'],
                ['h1', 'H1 Header']
            ],
            arrays: {
                h2: ['H2 Headers List', [
                    ['header', 'Header'],
                ]],
            }
        },
        max_size: 2 * 1024 * 1024,
        parsecodes: {
            200: 1,
        },
        results_format: "Title: $title\nH1: $h1\nH2 headers:\n$h2.format('$header\\n')\n",
        limitH2Tags: 3,
    };

    static editableConf: typeof BaseParser.editableConf = [
        ['limitH2Tags', ['textfield', 'Limit H2 tags']],
    ];

    async parse(set, results) {
        // Fetch the page given by the query (response headers are not needed here)
        const { success, data } = await this.request('GET', set.query);

        if (success && typeof data === 'string') {
            let matches;
            // Page title
            if (matches = data.match(/<title[^>]*>(.*?)<\/title>/))
                results.title = matches[1];

            // First H1 header
            if (matches = data.match(/<h1[^>]*>(.*?)<\/h1>/))
                results.h1 = matches[1];

            // Fill the h2 array only if it is present in results
            if (results.h2) {
                let count = 0;
                const re = /<h2[^>]*>(.*?)<\/h2>/g;
                while (matches = re.exec(data)) {
                    results.h2.push(matches[1]);
                    // Loose equality in case the textfield value arrives as a string
                    if (++count == this.conf.limitH2Tags)
                        break;
                }
            }
        }

        return results;
    }
}

files/parsers/v2-example-js/v2-example-js.js
const { BaseParser } = require("a-parser-types");

class JS_v2_example_js extends BaseParser {
    static defaultConf = {
        version: '0.0.1',
        results: {
            flat: [
                ['title', 'Title'],
                ['h1', 'H1 Header']
            ],
            arrays: {
                h2: ['H2 Headers List', [
                    ['header', 'Header'],
                ]],
            }
        },
        max_size: 2 * 1024 * 1024,
        parsecodes: {
            200: 1,
        },
        results_format: "Title: $title\nH1: $h1\nH2 headers:\n$h2.format('$header\\n')\n",
        limitH2Tags: 3,
    };

    static editableConf = [
        ['limitH2Tags', ['textfield', 'Limit H2 tags']],
    ];

    async parse(set, results) {
        // Fetch the page given by the query (response headers are not needed here)
        const { success, data } = await this.request('GET', set.query);

        if (success && typeof data === 'string') {
            let matches;
            // Page title
            if (matches = data.match(/<title[^>]*>(.*?)<\/title>/))
                results.title = matches[1];

            // First H1 header
            if (matches = data.match(/<h1[^>]*>(.*?)<\/h1>/))
                results.h1 = matches[1];

            // Fill the h2 array only if it is present in results
            if (results.h2) {
                let count = 0;
                const re = /<h2[^>]*>(.*?)<\/h2>/g;
                while (matches = re.exec(data)) {
                    results.h2.push(matches[1]);
                    // Loose equality in case the textfield value arrives as a string
                    if (++count == this.conf.limitH2Tags)
                        break;
                }
            }
        }

        return results;
    }
}

module.exports = { JS_v2_example_js };
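Because parse() depends on the A-Parser runtime (this.request(), this.conf and the results object), its extraction logic is awkward to try out on its own. The sketch below lifts the same regular expressions into a plain TypeScript function that can be run against a literal HTML string. extractHeadings, formatResults and PageHeadings are hypothetical names for this example, and formatResults only approximates the text the results_format template describes; it is not how A-Parser renders templates.

```typescript
// Hypothetical result shape mirroring the example's results config (title, h1, h2 array).
interface PageHeadings {
  title?: string;
  h1?: string;
  h2: string[];
}

// Same regex logic as parse(), but as a pure function over an HTML string.
function extractHeadings(html: string, limitH2Tags: number | string): PageHeadings {
  const result: PageHeadings = { h2: [] };

  const title = html.match(/<title[^>]*>(.*?)<\/title>/);
  if (title) result.title = title[1];

  const h1 = html.match(/<h1[^>]*>(.*?)<\/h1>/);
  if (h1) result.h1 = h1[1];

  // Normalise the limit; as in parse(), 0 or a non-numeric value means "no limit".
  const limit = Number(limitH2Tags);
  // The /g flag makes exec() advance through successive matches via lastIndex.
  const re = /<h2[^>]*>(.*?)<\/h2>/g;
  let match: RegExpExecArray | null;
  while ((match = re.exec(html)) !== null) {
    result.h2.push(match[1]);
    if (result.h2.length === limit) break;
  }
  return result;
}

// Rough approximation of the text results_format describes.
function formatResults(r: PageHeadings): string {
  const h2Lines = r.h2.map((header) => `${header}\n`).join('');
  return `Title: ${r.title ?? ''}\nH1: ${r.h1 ?? ''}\nH2 headers:\n${h2Lines}`;
}

// Usage: with a limit of '3', the fourth <h2> is ignored.
const demoPage = '<title>Demo</title><h1>Main</h1><h2>A</h2><h2>B</h2><h2>C</h2><h2>D</h2>';
console.log(formatResults(extractHeadings(demoPage, '3')));
```

Note that `.*?` does not match across line breaks, so a header split over several lines will not be captured; the parser example above shares this limitation.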

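Useful links

🔗 Example of saving a file to disk

Shows how to save a file directly to disk

🔗 Working with sessions example

Using sessions in a JavaScript parser

🔗 Saving data in a session example

Shows how to store arbitrary data in a session

🔗 Using results.addElement()

An example of filling a data array with results.addElement(), showing how it differs from a plain .push()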
