Python - IMDb HTML Extraction - With Beautiful Soup
With Beautiful Soup 4, I'm trying to extract text that doesn't seem to be tagged. (I may be wrong; I'm not very capable with HTML.)
I need to extract several values from the IMDb source code of a page: the budget value and the latest worldwide gross value for a particular film. The length of the code varies between films, so if there is a method using Beautiful Soup 4 to extract these values regardless of line number, that would be hugely helpful. The code:
<div id="tn15content"> <h5>budget</h5> $165,000,000 (estimated)<br/> <br/>
This is from the source code of the page: the IMDb box office page for Interstellar.
I need the '$165,000,000' extracted so that I can store it, etc.
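One possible approach, as a minimal sketch rather than a definitive answer: the budget figure is a bare text node sitting right after the `<h5>` heading, so it can be reached by finding the heading and reading its next sibling, independent of line numbers. The helper name `text_after_heading` is hypothetical, the sample markup is the snippet above with a closing `</div>` added, and fetching the real page is left out.

```python
import re
from bs4 import BeautifulSoup

# Sample markup copied from the snippet above (closing </div> added for tidiness).
html = '<div id="tn15content"> <h5>budget</h5> $165,000,000 (estimated)<br/> <br/></div>'

soup = BeautifulSoup(html, "html.parser")


def text_after_heading(soup, label):
    """Hypothetical helper: return the text node directly after the <h5> matching label."""
    heading = soup.find("h5", string=re.compile(label, re.IGNORECASE))
    if heading is None:
        return None
    sibling = heading.next_sibling
    # The untagged text is a NavigableString, which is a subclass of str.
    return sibling.strip() if isinstance(sibling, str) else None


budget_text = text_after_heading(soup, "budget")
# Keep only the dollar amount and drop the trailing "(estimated)".
match = re.search(r"\$[\d,]+", budget_text or "")
budget = match.group(0) if match else None
print(budget)
```

Matching the heading with a case-insensitive pattern is a guess at how the live page capitalises it; the lookup returns `None` instead of raising if the heading is missing.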
The gross code is more confusing:
<h5>gross</h5> $188,020,017 (usa) (<a href="/date/03-19/">19 march</a> <a href="/year/2015/">2015</a>)<br/>$187,991,439 (usa) (<a href="/date/03-15/">15 march</a> <a href="/year/2015/">2015</a>)<br/>$187,930,551 (usa) (<a href="/date/03-14/">14 march</a> <a href="/year/2015/">2015</a>)<br/>$187,918,949 (usa) (<a href="/date/03-11/">11 march</a> <a href="/year/2015/">2015</a>)<br/>$187,888,097 (usa) (<a href="/date/03-08/">8 march</a> <a href="/year/2015/">2015</a>)<br/>
All I need is the most recent one (the worldwide figures are further through a huge chunk of code that I decided to leave out due to spacing on here).
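For the gross list, a sketch along the same lines: walk the siblings that follow the `<h5>` heading, stop at the next heading, and return the first amount tagged with the wanted region. Since the entries shown are newest first, the first match would be the most recent. The function name `latest_gross` is hypothetical, and the sample holds only the first two entries from the snippet above. The worldwide entries are not shown here, so passing `"worldwide"` assumes they are labelled the same way as `(usa)`.

```python
import re
from bs4 import BeautifulSoup, Tag

# First two gross entries copied from the snippet above (list shortened for space).
html = (
    '<h5>gross</h5> $188,020,017 (usa) (<a href="/date/03-19/">19 march</a> '
    '<a href="/year/2015/">2015</a>)<br/>$187,991,439 (usa) '
    '(<a href="/date/03-15/">15 march</a> <a href="/year/2015/">2015</a>)<br/>'
)

soup = BeautifulSoup(html, "html.parser")


def latest_gross(soup, region):
    """Hypothetical helper: first (most recent) gross amount for the given region."""
    heading = soup.find("h5", string=re.compile("gross", re.IGNORECASE))
    if heading is None:
        return None
    # Assumption: each entry looks like "$1,234 (region)".
    pattern = re.compile(r"(\$[\d,]+)\s*\(" + re.escape(region) + r"\)", re.IGNORECASE)
    for node in heading.next_siblings:
        if isinstance(node, Tag) and node.name == "h5":
            break  # reached the next section, so stop looking
        if isinstance(node, str):
            match = pattern.search(node)
            if match:
                return match.group(1)
    return None


print(latest_gross(soup, "usa"))
```

With the sample above, asking for `"worldwide"` returns `None` because no such entry is included in it.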
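I know there is a similar problem on here that was solved, but I couldn't get the solution to work, nor could I comment to ask the user who provided the answer about that particular solution, due to being new to the site. I was going to try to get IMDbPY to work, but I wasn't sure how to install it on WinPython.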