Maintaining scraped folder structure?

Webmin

It doesn't look like it is currently possible to keep the same folder structure when outputting scraped information.

For example, if the input query text file looks something like this:

cars
cars/ford
cars/ford/focus/
cars/ford/focus/red
cars/porsche
cars/porsche/carrera
cars/porsche/carrera/black
cars/porsche/carrera/black/new
cars/porsche/carrera/black/used
cars/nissan
cars/nissan/xtrail
dogs/
dogs/bulldog/black
dogs/labrador/golden
dogs/labrador/golden/large
dogs/labrador/brown/large
...

Is it possible to maintain the same structure in the output files?

Also, how hard would it be to output all of the scraped information straight into an SQL file so that a database could be created from it (again keeping the same structure)?

Thank you.
 
In this case, use $query in the file name format, for example:
Code:
$query/$datefile.format().txt
This will create the folder structure:
[Image: ziYn3.png, screenshot of the resulting folder structure]
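To preview which paths such a file name format would produce before running a task, the mapping can be sketched outside A-Parser. This is only an illustration in Python, not A-Parser code: `output_path` is a hypothetical helper, and `results.txt` is a placeholder for whatever `$datefile.format()` expands to.

```python
from pathlib import PurePosixPath


def output_path(query: str, file_name: str = "results.txt") -> str:
    """Join a slash-separated query with a file name.

    file_name is a placeholder (assumption) standing in for the
    date-based name that $datefile.format() would generate.
    """
    # PurePosixPath drops a trailing slash, so "dogs/" and "dogs"
    # end up in the same folder.
    return str(PurePosixPath(query) / file_name)


if __name__ == "__main__":
    for q in ["cars", "cars/ford/focus/", "cars/porsche/carrera/black/new", "dogs/"]:
        print(output_path(q))
```

Each query becomes a folder path, so nested queries such as cars/ford/focus/red end up in nested folders.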


Also, how hard would it be to output all of the scraped information straight into an SQL file so that a database could be created from it (again keeping the same structure)?
You can use a result format like:
Code:
INSERT INTO blah VALUES('$query', '$p1.pr', ...)\n
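One thing to watch with this approach is that a scraped value containing a single quote would break the generated statement. As a rough sketch of the idea (plain Python used as a post-processing step, not an A-Parser feature), values can be escaped by doubling single quotes, which is the standard SQL convention. The helper names here are hypothetical, and some databases (MySQL in its default mode, for example) also treat backslashes specially, so check this against your target database.

```python
def sql_literal(value: str) -> str:
    """Quote a string as an SQL literal by doubling single quotes."""
    return "'" + value.replace("'", "''") + "'"


def insert_statement(table: str, values: list[str]) -> str:
    """Build one INSERT line; table and values are illustrative only."""
    return f"INSERT INTO {table} VALUES({', '.join(sql_literal(v) for v in values)});"


if __name__ == "__main__":
    # "blah" is the placeholder table name from the result format above.
    print(insert_statement("blah", ["cars/ford", "5"]))
    print(insert_statement("blah", ["dogs/bulldog", "it's black"]))
```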
 
Thanks for the reply.

I should have made it clear that my input file isn't in a folder structure and that regex is used to extract the information needed from the input file. So my input text file is actually something like:

Bedroom (#1)
Bedroom (#1) Bedding (#2)
Bedroom (#1) Bedding (#2) Bed Pillows (#20445)
Bedroom (#1) Bedding (#2) Bed Skirts (#20450)
Bedroom (#1) Bedding (#2) Bed-in-a-Bag (#20469)
Bedroom (#1) Bedding (#2) Blankets & Throws (#175750)
Bedroom (#1) Bedding (#2) Canopies & Netting (#48090)
Bedroom (#1) Bedding (#2) Comforters & Sets (#45462)
Bedroom (#1) Bedding (#2) Decorative Bed Pillows (#115630)
Bedroom (#1) Bedding (#2) Duvet Covers & Sets (#37644)
Bedroom (#1) Bedding (#2) Mattress Pads & Feather Beds (#175751)
Bedroom (#1) Bedding (#2) Other Bedding (#25815)
Bedroom (#1) Bedding (#2) Pillow Shams (#43397)
Bedroom (#1) Bedding (#2) Quilts, Bedspreads & Coverlets (#175749)

I know it's currently not achievable when using regex on the input file, so this is more of a request: could this functionality be added, or is there a workaround?

Thanks.
 
Then use this file name format:
Code:
[% query.replace('\s*\(.+?\)\s*', '/') %]$datefile.format().txt
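To check what this regular expression does to the sample queries before running a task, the same substitution can be tried in Python. This is only a sketch that assumes the template's `replace` substitutes every match, as `re.sub` does; `query_to_path` is a hypothetical name.

```python
import re

# Same pattern as in the file name format: optional whitespace,
# a parenthesised ID such as "(#20445)", optional whitespace.
ID_PATTERN = re.compile(r"\s*\(.+?\)\s*")


def query_to_path(query: str) -> str:
    """Replace each parenthesised ID (and surrounding spaces) with '/'."""
    return ID_PATTERN.sub("/", query)


if __name__ == "__main__":
    samples = [
        "Bedroom (#1)",
        "Bedroom (#1) Bedding (#2)",
        "Bedroom (#1) Bedding (#2) Bed Pillows (#20445)",
    ]
    for s in samples:
        print(query_to_path(s))
```

Every category name becomes a folder and each result ends with a slash, so the date-based file name that follows lands inside the deepest folder.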
 
Thank you, I will try and let you know how I get on.

I sent you a PM with the actual file I will be using as the input query text file, in case it differs slightly from the example I typed up above. Could you let me know if any changes to the regular expression above are needed, please?

Thank you.
 