Thursday, September 22, 2016

Scrapy

http://stackoverflow.com/questions/14073442/scrapy-storing-the-data

You can view a list of available commands by typing scrapy -h from within your project directory; scrapy crawl -h shows help for the crawl command specifically.
scrapy crawl spidername -o items.json -t json
  • -o specifies the output filename for dumped items (items.json)
  • -t specifies the format for dumping items (json)
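The file produced by -o can be consumed with any JSON library, since Scrapy dumps the items as a single JSON array of objects. A minimal sketch (the "title" field is hypothetical; use whatever fields your spider actually yields):

```python
import json

# Example contents of items.json as dumped by:
#   scrapy crawl spidername -o items.json -t json
# (one JSON array of item objects; "title" is a hypothetical field)
dumped = '[{"title": "First item"}, {"title": "Second item"}]'

items = json.loads(dumped)
titles = [item["title"] for item in items]
print(titles)
```

In practice you would json.load() the items.json file itself rather than a string.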
scrapy crawl spidername --set FEED_URI=output.csv --set FEED_FORMAT=csv
  • --set is used to set/override a setting
  • FEED_URI sets the storage backend for the dumped items. Here it is set to "output.csv", which uses the local filesystem, i.e. a simple output file (for the current example, output.csv).
  • FEED_FORMAT sets the serialization format for the output feed (for the current example, csv).
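Instead of passing --set on every run, the same two settings can be placed in the project's settings.py, where they apply to every crawl. A sketch:

```python
# settings.py -- project-wide equivalents of the --set overrides above.
# A plain filename for FEED_URI means the local filesystem; other
# URI schemes select other storage backends.
FEED_URI = "output.csv"
FEED_FORMAT = "csv"
```

Command-line --set values override settings.py, so the file can hold your defaults while individual runs still redirect output as needed.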
References (Scrapy documentation):
  1. Available tool commands (for the command line)
  2. Feed exports

2 comments:

  1. http://stackoverflow.com/questions/33307073/removing-html-tags-without-text-extract

  2. https://blog.scrapinghub.com/2016/07/20/scrapy-tips-from-the-pros-july-2016/
    https://stackoverflow.com/questions/37644/examining-berkeley-db-files-from-the-cli
