Solved with answer: How to change the format of a "date string" in a Scrapy spider?
I am scraping several websites using Scrapy. One problem is that the "post_date" item has different formats on different websites, for example "06/01/2015" vs. "1 June 2015". I would like to know how to convert the date string "06/01/2015" to "1 June 2015", so that the date strings have the same format in MySQL.
Hypothetically, the date on the website is provided as:
<div class="date">06/01/2015</div>
The following is the parse function in the Scrapy spider:
def parse(self, response):
    hxs = HtmlXPathSelector(response)
    sites = hxs.select('//*')
    for site in sites:
        il = ExampleItemLoader(response=response, selector=site)
        il.add_xpath('post_date', 'div[@class="date"]/text()')
                                  ^^^^^^^^^^^^^^^^^^^^^^^^^^
        yield il.load_item()
The above code collects the date string "06/01/2015". On the other hand, when I tried to convert the date string to "01 June 2015" with the following code, it didn't work.
il.add_xpath('post_date', 'datetime.datetime.strptime(div[@class="date"]/text(), "%m/%d/%Y").strftime("%d %B %Y")')
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
I got an error message as follows:
exceptions.ValueError: Invalid XPath:
Or should I use "replace_value" to convert the format after the XPath has been applied? Something like the following hypothetical code:
il.add_xpath('post_date', 'div[@class="date"]/text()')
il.replace_value('post_date', 'datetime.datetime.strptime("old post_date value", "%m/%d/%Y").strftime("%d %B %Y")')
                                                          ^^^^^^^^^^^^^^^^^^^^^
Can this be done in a Scrapy spider? Thanks!
After reading the Scrapy documentation and testing, I came up with a solution. This is how I solved it:
il.add_xpath('post_date', 'div[@class="date"]/text()')
il.replace_value('post_date', datetime.datetime.strptime(il.get_collected_values('post_date')[0], "%m/%d/%Y").strftime("%d %B %Y"))
The output I got is:
01 June 2015
Explanation: il.get_collected_values('post_date') returns the values collected by il.add_xpath(), which is a list, here "[u'06/01/2015']". Taking il.get_collected_values('post_date')[0] gets the string "06/01/2015" out of the list, and that string is what strptime() parses.
il.replace_value('post_date', ...) then replaces the collected 'post_date' value with the newly formatted string.
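Since the question mentions that different sites use different date formats, the conversion step could also be pulled out into a small helper that tries several input formats in turn. This is only a minimal sketch using the standard library: the name normalize_date and the INPUT_FORMATS list are made up for illustration, the formats are just the two from the question, and month names are assumed to be in English (the default locale).

```python
import datetime

# Hypothetical list of input formats seen across the scraped sites;
# only the two formats mentioned in the question are included here.
INPUT_FORMATS = ("%m/%d/%Y", "%d %B %Y")
OUTPUT_FORMAT = "%d %B %Y"


def normalize_date(raw, input_formats=INPUT_FORMATS, output_format=OUTPUT_FORMAT):
    """Re-format raw using output_format; return raw unchanged if no format matches."""
    text = raw.strip()
    for fmt in input_formats:
        try:
            parsed = datetime.datetime.strptime(text, fmt)
        except ValueError:
            # This format does not match; try the next one.
            continue
        return parsed.strftime(output_format)
    return raw


print(normalize_date("06/01/2015"))   # 01 June 2015
print(normalize_date("1 june 2015"))  # 01 June 2015
```

In the spider, the result of normalize_date(il.get_collected_values('post_date')[0]) could then be passed to il.replace_value('post_date', ...) in place of the inline strptime/strftime call. Note that a string such as "06/01/2015" is ambiguous between month/day and day/month order, so the order of the formats in the list matters.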