How to select only this text node using BeautifulSoup and Python?

May 15, 2014

I have this HTML structure (the number and position of the "br" tags are not fixed):

    <div class="foo">
        <h3>title</h3>
        <br>some text want retrieve. <br><br> text too.
        <br> , 1 too.
        <div class="subfoo">some other text don't want.</div>
    </div>

In my Python script, I have written:

    import bs4

    # res is the response object whose .text holds the page (e.g. from requests)
    examplesoup = bs4.BeautifulSoup(res.text, "html.parser")
    elems = examplesoup.select('.foo')
    print(elems[0].get_text())

As expected, I get the whole text:

title text want retrieve. other text don't want.

How can I get only the strings in the div that have no tag around them, i.e. "some text want retrieve. text too. , 1 too."? Thanks for any help.
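A minimal sketch of one way to get exactly that output, assuming a recent beautifulsoup4 release and the built-in "html.parser" (the helper name direct_text is hypothetical). find_all(string=True, recursive=False) returns only the text nodes that are direct children of the div, so the text inside <h3> and <div class="subfoo"> is never reached, however many <br> tags sit in between.

```python
from bs4 import BeautifulSoup

html = """
<div class="foo">
    <h3>title</h3>
    <br>some text want retrieve. <br><br> text too.
    <br> , 1 too.
    <div class="subfoo">some other text don't want.</div>
</div>
"""


def direct_text(tag):
    """Hypothetical helper: stripped text nodes that are direct children of `tag`.

    recursive=False keeps the search at the first level, so strings nested
    inside child elements such as <h3> or <div class="subfoo"> are skipped.
    """
    strings = tag.find_all(string=True, recursive=False)
    # Drop the whitespace-only nodes that come from the indentation.
    return [s.strip() for s in strings if s.strip()]


soup = BeautifulSoup(html, "html.parser")
foo = soup.select_one(".foo")
print(" ".join(direct_text(foo)))
```

If you would rather keep using get_text(), another option is to call decompose() on the <h3> and on the subfoo div first and then use foo.get_text(" ", strip=True). That modifies the parsed tree, whereas the sketch above only reads it.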

You can use .next_sibling to get the next element in the tree.

For example:

    >>> from bs4 import BeautifulSoup
    >>> soup = BeautifulSoup(html)
    >>> print(soup.prettify())
    <html>
     <body>
      <div class="foo">
       <h3>
        title
       </h3>
       text want retrieve.
       <div class="subfoo">
        other text don't want.
       </div>
      </div>
     </body>
    </html>
    >>> print(soup.find('div', {'class': 'foo'}).h3.next_sibling.strip())
    text want retrieve.
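With the indented HTML from the question, the single sibling right after </h3> is only whitespace, so the one-liner above returns an empty string there. A sketch that extends the same idea, assuming a recent beautifulsoup4 and "html.parser" (text_after is a hypothetical helper name), walks every following sibling with .next_siblings and keeps only the bare text nodes:

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString

html = """
<div class="foo">
    <h3>title</h3>
    <br>some text want retrieve. <br><br> text too.
    <br> , 1 too.
    <div class="subfoo">some other text don't want.</div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
h3 = soup.find("div", {"class": "foo"}).h3

# The first sibling after </h3> is the newline and indentation,
# so stripping it leaves an empty string.
print(repr(h3.next_sibling.strip()))


def text_after(tag):
    """Hypothetical helper: stripped text nodes among the siblings after `tag`."""
    parts = []
    for sib in tag.next_siblings:
        # Elements such as <br> and <div class="subfoo"> are skipped; only
        # text nodes at the same level as `tag` are kept.
        if isinstance(sib, NavigableString) and sib.strip():
            parts.append(sib.strip())
    return parts


print(" ".join(text_after(h3)))
```

HTML comments are also NavigableString subclasses, so if the real page contains any, you may want to exclude bs4.element.Comment instances in the loop.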
