How to select only this text node using BeautifulSoup and Python?


I have this HTML structure:

    <div class="foo">
        <h3>title</h3>
        <br>some text I want to retrieve. <br><br> this text too.
        <br> (the number and position of the "br" tags are undetermined), and this one too.
        <div class="subfoo">some other text I don't want.</div>
    </div>

In my Python script, I have written:

    import bs4

    # res is assumed to be an HTTP response fetched earlier (for example with requests)
    examplesoup = bs4.BeautifulSoup(res.text, "html.parser")
    elems = examplesoup.select('.foo')
    print(elems[0].get_text())

As expected, I get the whole text:

    title some text I want to retrieve. some other text I don't want.

How can I get only the strings in the div that have no tag around them, i.e. "some text I want to retrieve. this text too. and this one too."? Thanks for your help.

You can use .next_sibling to get the next element in the tree.

Example:

    >>> from bs4 import BeautifulSoup
    >>> soup = BeautifulSoup(html)
    >>> print(soup.prettify())
    <html>
     <body>
      <div class="foo">
       <h3>
        title
       </h3>
       some text I want to retrieve.
       <div class="subfoo">
        some other text I don't want.
       </div>
      </div>
     </body>
    </html>

    >>> print(soup.find('div', {'class': 'foo'}).h3.next_sibling.strip())
    some text I want to retrieve.
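Note that the question's markup has an unknown number of <br> tags between the pieces of text, so a single .next_sibling after the h3 may land on whitespace or on a <br> instead of the text. One possible alternative, offered here only as a sketch, is to ask the div for its direct child text nodes and skip everything wrapped in a tag. The sample HTML below is a simplified version of the question's snippet, and the variable names are made up for illustration. It assumes a reasonably recent bs4, where find_all accepts string= (older releases call the same argument text=).

```python
from bs4 import BeautifulSoup

# Simplified sample adapted from the question (assumed, not the asker's real page)
html = """
<div class="foo">
    <h3>title</h3>
    <br>some text I want to retrieve. <br><br> this text too.
    <br> and this one too.
    <div class="subfoo">some other text I don't want.</div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
div = soup.select_one(".foo")

# string=True matches text nodes only; recursive=False limits the search to
# direct children of the div, so text inside <h3> and .subfoo is skipped
direct_text = div.find_all(string=True, recursive=False)

# Drop the whitespace-only nodes left between tags and trim the rest
pieces = [s.strip() for s in direct_text if s.strip()]
result = " ".join(pieces)
print(result)
```

Because the search never descends into child tags, the number and position of the <br> tags does not matter here; they are simply skipped along with the h3 and the inner div.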
