Scrapy - Avoid Duplicate URL Crawling


I coded a simple crawler. In the settings.py file, referring to the Scrapy documentation, I used:

DUPEFILTER_CLASS = 'scrapy.dupefilter.RFPDupeFilter'

If I stop the crawler and then restart it, it scrapes the duplicate URLs again. Am I doing something wrong?

I believe you are looking for "persistence support", to pause and resume crawls.

To enable it you can do:

scrapy crawl somespider -s JOBDIR=crawls/somespider-1
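
If you prefer not to pass it on the command line, JOBDIR is a regular Scrapy setting, so you can also put it in settings.py. A minimal sketch (the directory name is just an example):

# settings.py
# Persist the scheduler queue and the seen-request fingerprints between runs,
# so a restarted crawl can skip URLs it has already visited.
JOBDIR = 'crawls/somespider-1'

With JOBDIR set, the default RFPDupeFilter stores its request fingerprints in a requests.seen file inside that directory, which is what allows a resumed crawl to recognize and skip already-crawled URLs. Note that each spider run that should share state must use the same job directory, and two spiders should not run concurrently against the same one.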

You can read more in the Scrapy documentation on pausing and resuming crawls.

