How to change the format of a "date string" in a Scrapy spider?


I am scraping several websites using Scrapy. One problem is that the "post_date" item has different formats on different websites, for example "06/01/2015" vs. "1 June 2015". I would like to know how to convert the date string "06/01/2015" to "1 June 2015", so that the date strings all have the same format in MySQL.

Hypothetically, the date on the website is provided as:

<div class="date">06/01/2015</div> 

The following is the parse function in the Scrapy spider:

def parse(self, response):
    hxs = HtmlXPathSelector(response)
    sites = hxs.select('//*')
    for site in sites:
        il = ExampleItemLoader(response=response, selector=site)
        il.add_xpath('post_date', 'div[@class="date"]/text()')
        yield il.load_item()

The above code collects the date string "06/01/2015". On the other hand, when I tried to convert the date string to "01 June 2015" with the following code, it didn't work:

il.add_xpath('post_date', 'datetime.datetime.strptime(div[@class="date"]/text(), "%m/%d/%Y").strftime("%d %B %Y")')

I got the following error message:

exceptions.ValueError: Invalid XPath:

Or should I use "replace_value" to convert the format after the XPath has been applied? Something like the following hypothetical code:

il.add_xpath('post_date', 'div[@class="date"]/text()')
il.replace_value('post_date', 'datetime.datetime.strptime("old post_date value", "%m/%d/%Y").strftime("%d %B %Y")')

Can this be done in a Scrapy spider? Thanks!

After reading the Scrapy documentation and some testing, I came up with a solution. Here is how I solved it:

il.add_xpath('post_date', 'div[@class="date"]/text()')
il.replace_value('post_date', datetime.datetime.strptime(il.get_collected_values('post_date')[0], "%m/%d/%Y").strftime("%d %B %Y"))

The output I got is:

01 June 2015

Explanation: il.get_collected_values('post_date') returns the values collected by il.add_xpath() as a list, here [u'06/01/2015']. So I used il.get_collected_values('post_date')[0] to get the string "06/01/2015" out of the list.
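The conversion step can be tried on its own, outside Scrapy. The following is a minimal sketch in which the list `collected` is only a stand-in for what il.get_collected_values('post_date') returns; the variable names are illustrative and not part of the spider. It assumes the default (English) locale for the month name.

```python
import datetime

# Stand-in for il.get_collected_values('post_date'); the loader returns a list.
collected = ['06/01/2015']

# Take the first (and only) collected string out of the list.
raw = collected[0]

# Parse month/day/four-digit-year, then render as day, full month name, year.
parsed = datetime.datetime.strptime(raw, "%m/%d/%Y")
formatted = parsed.strftime("%d %B %Y")

print(formatted)
```

Note that the format codes are case sensitive: %Y is a four-digit year and %B the full month name, whereas %y and %b would mean a two-digit year and an abbreviated month name.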

il.replace_value('post_date', ...) then replaces the collected 'post_date' value with the new, reformatted value.
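Since the original problem was that different websites use different formats, one possible extension is a small helper that tries each known format in turn. This is only a sketch: the name `normalize_date` and the `KNOWN_FORMATS` tuple are hypothetical, and the list of formats would have to match the sites actually being scraped.

```python
import datetime

# Hypothetical list of input formats seen across the scraped sites.
KNOWN_FORMATS = ("%m/%d/%Y", "%d %B %Y")


def normalize_date(text, target="%d %B %Y"):
    """Return text reformatted as `target`, or unchanged if no format matches."""
    text = text.strip()
    for fmt in KNOWN_FORMATS:
        try:
            parsed = datetime.datetime.strptime(text, fmt)
        except ValueError:
            # This format did not match; try the next one.
            continue
        return parsed.strftime(target)
    # Unknown format: leave the value as scraped rather than guessing.
    return text


print(normalize_date("06/01/2015"))
print(normalize_date("1 June 2015"))
```

With such a helper, the replace_value call above could pass normalize_date(il.get_collected_values('post_date')[0]) instead of the inline strptime/strftime chain. It could probably also be attached to the item loader as an input processor, although that was not tested here.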

