The purpose of this article is to show the general approach to parsing RSS. As an example, we will use the RSS feed of our forum: http://en.a-parser.com/forum/english-news/index.rss. This preset may be used for other sites, but because feed standards differ, some modifications may be needed.
The parsing process is quite simple: it consists mainly of finding information with regular expressions and writing it to the result file.
- Uses Net::HTTP; a proxy does not have to be used.
- First, all <item>...</item> blocks are parsed.
- Then the necessary information is parsed from the resulting array. In this example: title, date, URL and content.
- The extracted data is cleaned of superfluous elements (in this example, HTML tags and entities, as well as other leftover strings).
- The result is output in the desired format using the capabilities of Template Toolkit.
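The steps listed above can be sketched outside A-Parser as well. The following is a minimal Python illustration, not the preset itself: the function names (`parse_rss`, `clean`, `format_entries`), the exact regular expressions and the sample feed fragment are assumptions made for this example, and a plain f-string stands in for the Template Toolkit output step.

```python
import html
import re

# Made-up feed fragment for illustration; a real feed would be downloaded first.
SAMPLE = """<rss><channel><title>Feed</title>
<item>
<title>Output results in table</title>
<pubDate>Tue, 22 Sep 2015 12:25:44 +0000</pubDate>
<link>http://en.a-parser.com/threads/1854/</link>
<content:encoded><![CDATA[<p>Often received data is processed by other scripts &amp; programs.</p>]]></content:encoded>
</item>
</channel></rss>"""

# Step 1: collect every <item>...</item> block (re.S lets "." span newlines).
ITEM_RE = re.compile(r"<item>(.+?)</item>", re.S)
# Assumed pattern covering both <content:encoded> and <description>.
CONTENT_RE = r"<(?:content:encoded|description)[^>]*>(.+?)</(?:content:encoded|description)>"


def first(pattern, text):
    """Return the first captured group of pattern in text, or an empty string."""
    match = re.search(pattern, text, re.S)
    return match.group(1).strip() if match else ""


def clean(text):
    """Step 3: drop CDATA markers, decode entities, strip tags, collapse whitespace."""
    text = text.replace("<![CDATA[", "").replace("]]>", "")
    text = html.unescape(text)
    text = re.sub(r"<[^>]+>", "", text)
    return re.sub(r"\s+", " ", text).strip()


def parse_rss(xml):
    """Step 2: pull title, date, URL and content out of each item."""
    entries = []
    for item in ITEM_RE.findall(xml):
        entries.append({
            "title": clean(first(r"<title>(.+?)</title>", item)),
            "date": first(r"<pubDate>(.+?)</pubDate>", item),
            "link": first(r"<link>(.+?)</link>", item),
            "content": clean(first(CONTENT_RE, item)),
        })
    return entries


def format_entries(entries):
    """Step 4: render each entry as text, with a row of asterisks as separator."""
    return "".join(
        f"{e['title']} - {e['date']}\n{e['link']}\n{e['content']}\n**********\n"
        for e in entries
    )


if __name__ == "__main__":
    print(format_entries(parse_rss(SAMPLE)), end="")
```

Regular expressions are enough for a feed with a regular structure like this one; for feeds with namespaces, attributes on `<item>`, or nested CDATA, a real XML parser would likely be more robust.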
http://en.a-parser.com/forum/english-news/index.rss
Output results in table - Tue, 22 Sep 2015 12:25:44 +0000
http://en.a-parser.com/threads/1854/
Everyone knows that A-Parser develop for parsing the information from the Internet in various amounts. Often received data is processed by other scripts or programs. But sometimes there is a need to output collected data in a beautiful form for further visual analysis. And about below.
Someone for such purposes (and small quantities) using a simple text file, someone outputs to CSV, and then processes the data in Excel, and we also output them as a table on a Web page. We will make a...
Output results in table
**********
Make sitemap with A-Parser - Fri, 18 Sep 2015 15:38:38 +0000
http://en.a-parser.com/threads/1849/
Sitemap (Wiki) - this is XML-file with the information for the search engines (such as Yandex, Google, Bing) about web pages, which are to be indexing. It helps search engines index the site more intelligently. Some SEO-experts consider the lack of such map a gross error. On the Internet there are many services and tools for creating these maps, as well as for their validation. We try to create sitemap using A-parser.
So, let's see what...
Make sitemap with A-Parser
**********
Loading of links via js - Fri, 18 Sep 2015 15:00:04 +0000
http://en.a-parser.com/threads/1848/
Today, on the example of site http://www.chegg.com/ we learn how to parse the information that is loaded by JS-script.
1) In this link, as described above, the content really is loaded by JS script. But in the end it too somewhere is receiving data, and therefore make somewhere request. We just need to find this query, and then use it. For this we use Developer Tools (Ctrl + Shift + I,...
Loading of links via js
...
eyJwcmVzZXQiOiJSU1MiLCJ2YWx1ZSI6eyJwcmVzZXQiOiJSU1MiLCJwYXJzZXJz IjpbWyJOZXQ6OkhUVFAiLCJkZWZhdWx0Iix7InR5cGUiOiJjdXN0b21SZXN1bHQi LCJyZXN1bHQiOiJkYXRhIiwicmVnZXgiOiI8aXRlbT4oLis/KTxcXC9pdGVtPiIs InJlZ2V4VHlwZSI6InNnIiwicmVzdWx0VHlwZSI6ImFycmF5IiwiYXJyYXlOYW1l IjoiaXRlbXMiLCJyZXN1bHRzIjpbIml0ZW0iXX0seyJ0eXBlIjoib3ZlcnJpZGUi LCJpZCI6InVzZXByb3h5IiwidmFsdWUiOmZhbHNlfSx7InR5cGUiOiJjdXN0b21S ZXN1bHQiLCJyZXN1bHQiOlsiaXRlbXMiLCJpdGVtIl0sInJlZ2V4IjoiPHRpdGxl PiguKz8pPFxcL3RpdGxlPiIsInJlZ2V4VHlwZSI6InMiLCJyZXN1bHRUeXBlIjoi YXJyYXkiLCJhcnJheU5hbWUiOiJ0aXRsZXMiLCJyZXN1bHRzIjpbInRpdGxlIl19 LHsidHlwZSI6ImN1c3RvbVJlc3VsdCIsInJlc3VsdCI6WyJpdGVtcyIsIml0ZW0i XSwicmVnZXgiOiI8cHViRGF0ZT4oLis/KTxcXC9wdWJEYXRlPiIsInJlZ2V4VHlw ZSI6InMiLCJyZXN1bHRUeXBlIjoiYXJyYXkiLCJhcnJheU5hbWUiOiJkYXRlcyIs InJlc3VsdHMiOlsiZGF0ZSJdfSx7InR5cGUiOiJjdXN0b21SZXN1bHQiLCJyZXN1 bHQiOlsiaXRlbXMiLCJpdGVtIl0sInJlZ2V4IjoiPGxpbms+KC4rPyk8XFwvbGlu az4iLCJyZWdleFR5cGUiOiJzIiwicmVzdWx0VHlwZSI6ImFycmF5IiwiYXJyYXlO YW1lIjoibGlua3MiLCJyZXN1bHRzIjpbImxpbmsiXX0seyJ0eXBlIjoiY3VzdG9t UmVzdWx0IiwicmVzdWx0IjpbIml0ZW1zIiwiaXRlbSJdLCJyZWdleCI6Iig/Ojxj b250ZW50fDxkZXNjcmlwdGlvbikuKz8oLis/KSg/OjxcXC9jb250ZW50fDxcXC9k ZXNjcmlwdGlvbikiLCJyZWdleFR5cGUiOiJzIiwicmVzdWx0VHlwZSI6ImFycmF5 IiwiYXJyYXlOYW1lIjoiZGVzY3MiLCJyZXN1bHRzIjpbImRlc2MiXX0seyJ0eXBl Ijoib3ZlcnJpZGUiLCJpZCI6ImRldGVjdGNoYXJzZXQiLCJ2YWx1ZSI6dHJ1ZX0s eyJ0eXBlIjoib3ZlcnJpZGUiLCJpZCI6ImZvcm1hdHJlc3VsdCIsInZhbHVlIjoi JHF1ZXJ5XFxuXFxuXG5bJSBpID0gMDtcbldISUxFIGkgPCBpdGVtcy5zaXplO1xu dGl0bGVzLiRpLnRpdGxlIF9cIiAtIFwiIF8gZGF0ZXMuJGkuZGF0ZSBfXCJcXG5c IjtcbmxpbmtzLiRpLmxpbmsgX1wiXFxuXCI7XG5kZXNjcy4kaS5kZXNjIF9cIlxc bioqKioqKioqKipcXG5cIjtcbmkgPSBpICsgMTtcbkVORCAlXVxuIn1dXSwicmVz dWx0c0Zvcm1hdCI6IiRwMS5wcmVzZXQiLCJyZXN1bHRzU2F2ZVRvIjoiZmlsZSIs InJlc3VsdHNGaWxlTmFtZSI6IiRkYXRlZmlsZS5mb3JtYXQoKS50eHQiLCJhZGRp dGlvbmFsRm9ybWF0cyI6W10sInJlc3VsdHNVbmlxdWUiOiJubyIsInF1ZXJ5Rm9y bWF0IjpbIiRxdWVyeSJdLCJ1bmlxdWVRdWVyaWVzIjpmYWxzZSwic2F2ZUZhaWxl 
ZFF1ZXJpZXMiOmZhbHNlLCJpdGVyYXRvck9wdGlvbnMiOnsib25BbGxMZXZlbHMi OmZhbHNlLCJxdWVyeUJ1aWxkZXJzQWZ0ZXJJdGVyYXRvciI6ZmFsc2V9LCJyZXN1 bHRzT3B0aW9ucyI6eyJvdmVyd3JpdGUiOmZhbHNlfSwiZG9Mb2ciOiJubyIsImtl ZXBVbmlxdWUiOiJObyIsIm1vcmVPcHRpb25zIjpmYWxzZSwicmVzdWx0c1ByZXBl bmQiOiIiLCJyZXN1bHRzQXBwZW5kIjoiIiwicXVlcnlCdWlsZGVycyI6W10sInJl c3VsdHNCdWlsZGVycyI6W3sic291cmNlIjpbMCxbImRlc2NzIiwiZGVzYyJdXSwi dHlwZSI6ImRlY29kZUh0bWwiLCJhcnJheSI6ImRlc2NzIiwidG8iOiJkZXNjIn0s eyJzb3VyY2UiOlswLFsiZGVzY3MiLCJkZXNjIl1dLCJ0eXBlIjoic3RyaW5nUmVw bGFjZSIsImFycmF5IjoiZGVzY3MiLCJzZWFyY2giOiJlbmNvZGVkPjwhW0NEQVRB WyIsInJlcGxhY2UiOiIiLCJ0byI6ImRlc2MifSx7InNvdXJjZSI6WzAsWyJkZXNj cyIsImRlc2MiXV0sInR5cGUiOiJzdHJpbmdSZXBsYWNlIiwiYXJyYXkiOiJkZXNj cyIsInNlYXJjaCI6Il1dPiIsInJlcGxhY2UiOiIiLCJ0byI6ImRlc2MifSx7InNv dXJjZSI6WzAsWyJkZXNjcyIsImRlc2MiXV0sInR5cGUiOiJyZW1vdmVIdG1sIiwi YXJyYXkiOiJkZXNjcyIsInRvIjoiZGVzYyJ9LHsic291cmNlIjpbMCxbImRlc2Nz IiwiZGVzYyJdXSwidHlwZSI6InN0cmluZ1JlcGxhY2UiLCJhcnJheSI6ImRlc2Nz Iiwic2VhcmNoIjoiPCFbQ0RBVEFbIiwicmVwbGFjZSI6IiIsInRvIjoiZGVzYyJ9 XSwiY29uZmlnT3ZlcnJpZGVzIjpbXX19