python - IMDb HTML Extraction - With Beautiful Soup -


with beautiful soup4, i'm trying text doesn't seem tagged. (i may wrong, i'm not capable html)

i need extract several values imdb code of page; budget value , latest worldwide gross value particular film. length of code varies between films if there method using beautiful soup4 extract these values regardless of line number, hugely helpful. code:

<div id="tn15content"> <h5>budget</h5> $165,000,000 (estimated)<br/> <br/> 

from source code of page: imdb box office page interstellar

i need '$165,000,000' extracted can store etc.

the gross code more confusing:

<h5>gross</h5> $188,020,017 (usa) (<a href="/date/03-19/">19 march</a> <a href="/year/2015/">2015</a>)<br/>$187,991,439 (usa) (<a href="/date/03-15/">15 march</a> <a href="/year/2015/">2015</a>)<br/>$187,930,551 (usa) (<a href="/date/03-14/">14 march</a> <a href="/year/2015/">2015</a>)<br/>$187,918,949 (usa) (<a href="/date/03-11/">11 march</a> <a href="/year/2015/">2015</a>)<br/>$187,888,097 (usa) (<a href="/date/03-08/">8 march</a> <a href="/year/2015/">2015</a>)<br/> 

all need recent (the worldwide figures further through huge chunk of code decided leave out due spacing on here.

i know there similar problem on here solved, couldn't solution work nor comment ask user providing answer particular solution due being new site. going try , imdbpy work, wasn't sure how install winpython.

use regular expression

\$([0-9,]+) \(usa\)  \$([0-9,]+) \(worldwide\) 

http://pythex.org/


Comments

Popular posts from this blog

qt - Using float or double for own QML classes -

Create Outlook appointment via C# .Net -

ios - Swift Array Resetting Itself -