PowerShell: capture website data using regex and Invoke-WebRequest
I'm trying to capture the currently playing song on a radio station as it is displayed on the station's website. I'm at the beginning of writing the script, and so far I have the following code:
$webpage = (Invoke-WebRequest http://www.2dayfm.com.au).Content
$regex = [regex]"(.*nowplayinginfo.*span)"
$regex.Match($webpage).Value.Split(">")[4].Replace("</span","")
This captures data from the website listed in the code, but there are two issues.
The first issue is that when the code runs, it comes back with loading... You can see why if you look at the result of this:
(invoke-webrequest http://www.2dayfm.com.au).content | clip
and paste it into Notepad. If you search for "playing:", there is this line:
<p><span class="listenheading">playing:</span> <span id="nowplayinginfo">loading...</span></p>
When Invoke-WebRequest runs in the code, it captures the website at that point in time. You can see the same thing in a real browser: navigate to http://www.2dayfm.com.au/ and, right at the top where the playing song is shown, it says loading... for a short time before the song loads.
The other thing: I'm hoping to remove the second line of code and clean up the regex on the first line, so I don't need to use so many Split and Replace methods.
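A single capture group can take the place of the Split and Replace calls. The sketch below runs against the sample line quoted above rather than the live page; the variable names are illustrative, and the pattern assumes the span's text contains no nested tags:

```powershell
# Sample line copied from the downloaded page content quoted in the question
$html = '<p><span class="listenheading">playing:</span> <span id="nowplayinginfo">loading...</span></p>'

# Group 1 captures the text after the nowplayinginfo span's opening tag.
# [^<]* stops at the first '<', so the match cannot run past the closing </span.
$pattern = 'id\s*=\s*"nowplayinginfo"\s*>([^<]*)</span'

if ($html -match $pattern) {
    # -match fills the automatic $Matches hashtable; key 1 is the first group
    $nowPlaying = $Matches[1]
} else {
    $nowPlaying = $null
}

$nowPlaying
```

Against the live page you would test $webpage instead of $html. This only tidies up the extraction: the downloaded HTML still contains loading..., so the first issue remains.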
Another way I tried to get this to work was copying the XPath from Chrome's Inspect Element and using:
(invoke-webrequest -uri 'http://www.2dayfm.com.au').content | select-xml -xpath '//*[@id="nowplayinginfo"]'
but that doesn't seem to work either, because it doesn't accept the XPath. It seems that what Chrome thinks the XPath is differs from what PowerShell expects an XPath to be.
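The expression //*[@id="nowplayinginfo"] is itself ordinary XPath 1.0. A more likely cause, offered here as an assumption and not confirmed in this thread, is that Select-Xml parses its input as XML, and a full HTML page is usually not well-formed XML. The sketch below runs the same XPath on the quoted line, which does happen to be well-formed:

```powershell
# Well-formed fragment copied from the page; a whole HTML page generally is not valid XML
$fragment = '<p><span class="listenheading">playing:</span> <span id="nowplayinginfo">loading...</span></p>'

# The XPath copied from Chrome's Inspect Element, unchanged
$result = Select-Xml -Content $fragment -XPath '//*[@id="nowplayinginfo"]'

# Each result wraps an XmlNode; InnerText is the element's text content
$nowPlaying = $result.Node.InnerText
$nowPlaying
```

Even on a well-formed page this would still only find loading..., for the reason given in the answer.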
Using a scraper isn't going to work, because it only sees the initial HTML content that is downloaded. The page uses JavaScript/AJAX to render the song/artist info by manipulating the DOM after the initial download. However, you can use the InternetExplorer.Application COM object like this:
$ie = New-Object -ComObject InternetExplorer.Application
$ie.Navigate('http://www.2dayfm.com.au/')
while ($ie.ReadyState -ne 4) { Start-Sleep -Seconds 1 } # need timeout here
$null = $ie.Document.body.innerHTML -match '\s+id\s*=\s*"nowplayinginfo"\s*>(.*)</span'
$ie.Quit()
$Matches[1]
It outputs:
little mix, black magic
The $null = part discards the True output that the -match operator generates (assuming the regex matches).
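The wait loop in the answer leaves the timeout as a TODO. One way to fill it in is a small polling helper with a deadline. Wait-Until is a hypothetical name chosen for this sketch, not a built-in cmdlet, and the demonstration runs offline with a counter in place of the IE object:

```powershell
# Hypothetical helper: polls a condition until it returns true or the deadline passes.
function Wait-Until {
    param(
        [Parameter(Mandatory)] [scriptblock] $Condition,
        [int] $TimeoutSeconds = 30,
        [int] $PollMilliseconds = 500
    )
    $deadline = (Get-Date).AddSeconds($TimeoutSeconds)
    while ((Get-Date) -lt $deadline) {
        if (& $Condition) { return $true }
        Start-Sleep -Milliseconds $PollMilliseconds
    }
    # One final check after the deadline before giving up
    return [bool](& $Condition)
}

# Offline demonstration: the condition becomes true on the third poll.
$script:polls = 0
$ready = Wait-Until -Condition { $script:polls++; $script:polls -ge 3 } -TimeoutSeconds 5 -PollMilliseconds 100
$ready
```

In the answer's code, the while loop could then become if (-not (Wait-Until -Condition { $ie.ReadyState -eq 4 } -TimeoutSeconds 30)) { $ie.Quit(); throw 'Page did not finish loading' }. ReadyState 4 means the document itself has finished loading. The AJAX-rendered song text could still arrive a little later, so polling until the span no longer reads loading... may be more robust; that is untested against this site.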