How to select only this text node using BeautifulSoup and Python?
I have this HTML structure:
<div class="foo">
  <h3>title</h3>
  <br>some text want retrieve.
  <br><br> text too.
  <br>
  (the number and position of the "br" tags are undetermined)
  , 1 too.
  <div class="subfoo">some other text don't want.</div>
</div>
In my Python script, I have written:
import bs4

# res is the HTTP response fetched earlier in the script
examplesoup = bs4.BeautifulSoup(res.text, "html.parser")
elems = examplesoup.select('.foo')
print(elems[0].get_text())
As expected, I get the whole text:
title text want retrieve. other text don't want.
How can I get only the string in the div that has no tag around it, i.e. "some text want retrieve. text too. , 1 too."? Any help is appreciated.
You can use .next_sibling to get the next element at the same level of the tree. For the h3 tag, that is the text node that follows it.
Example:
>>> from bs4 import BeautifulSoup
>>> soup = BeautifulSoup(html)
>>> print(soup.prettify())
<html>
 <body>
  <div class="foo">
   <h3>
    title
   </h3>
   text want retrieve.
   <div class="subfoo">
    other text don't want.
   </div>
  </div>
 </body>
</html>
>>> print(soup.find('div', {'class': 'foo'}).h3.next_sibling.strip())
text want retrieve.
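The example above only picks up the single text node right after the h3. In the question's markup, an undetermined number of br tags splits the text into several nodes, so one possible approach is to walk the direct children of the div and keep only the bare strings. This is just a sketch, not the answerer's code: the helper name direct_text is made up for illustration, the sample HTML is modelled on the question with the parenthetical note left out, and it assumes the "html.parser" backend.

```python
from bs4 import BeautifulSoup, NavigableString

# Sample markup modelled on the question (the parenthetical note is omitted).
html = """
<div class="foo">
  <h3>title</h3>
  <br>some text want retrieve.
  <br><br> text too.
  <br> , 1 too.
  <div class="subfoo">some other text don't want.</div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
foo = soup.select_one(".foo")


def direct_text(tag):
    """Hypothetical helper: join the text nodes that are direct children of tag."""
    pieces = []
    for child in tag.children:
        # Bare text shows up as NavigableString; nested tags (h3, br, div) are skipped.
        if isinstance(child, NavigableString):
            text = child.strip()
            if text:  # ignore whitespace-only nodes between tags
                pieces.append(text)
    return " ".join(pieces)


print(direct_text(foo))
```

With the sample markup above this should print the string asked for in the question, "some text want retrieve. text too. , 1 too.", and leave out both the h3 title and the text inside the subfoo div, since those sit inside child tags rather than directly in the div.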