PowerShell: capture website data using regex and Invoke-WebRequest

May 15, 2014

I'm trying to capture the song currently playing on a radio station, as it is displayed on the station's website. I'm at the beginning of writing the script, and so far I have the following code:

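# Download the raw HTML of the page
$webpage = (Invoke-WebRequest http://www.2dayfm.com.au).Content
# Grab the line containing the "nowplayinginfo" span
$regex = [regex]"(.*nowplayinginfo.*span)"
# Split on ">" and strip the closing tag to leave just the song text
$regex.Match($webpage).Value.Split(">")[4].Replace("</span","")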

This captures the website listed in the code, but there are two issues.

The first issue is that when the code runs, the result comes back as "loading...". The reason for this becomes clear if you look at the result of this:

(Invoke-WebRequest http://www.2dayfm.com.au).Content | clip

and paste it into Notepad. If you search for "playing:", you find this line:

<p><span class="listenheading">playing:</span> <span id="nowplayinginfo">loading...</span></p>

When Invoke-WebRequest runs in the code, it captures the website at that point in time. You can see the same thing in real life: if you navigate to http://www.2dayfm.com.au/ in a browser, the spot at the top that shows the playing song says "loading..." for a short time before the song appears.

The other thing I'm hoping to do is get rid of the Split/Replace part of the code by cleaning up the regex, so that I don't need to use so many Split and Replace methods.
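One way to drop the Split/Replace chain is to put a capture group around just the text you want and read it from the match's Groups collection. A minimal sketch, assuming the markup has the shape of the line quoted above (the sample string is copied from it, not fetched live); the `(?i)` inline option and the `\s*` allowances are additions of mine to tolerate case and spacing differences:

```powershell
# Sample line from the page source quoted above (assumed shape of the live markup).
$webpage = '<p><span class="listenheading">playing:</span> <span id="nowplayinginfo">loading...</span></p>'

# (?i) makes the match case-insensitive; (.*?) is lazy, so it stops at the first </span>.
$regex = [regex]'(?i)id\s*=\s*"nowplayinginfo"\s*>(.*?)</span>'

# Against live data this would be: $regex.Match((Invoke-WebRequest http://www.2dayfm.com.au).Content)
$match = $regex.Match($webpage)
if ($match.Success) { $match.Groups[1].Value }   # loading...
```

This only tidies the extraction. Against the downloaded HTML it still returns "loading...", because that is what the initial page contains.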

The other way I tried to make this work was by copying the XPath from Chrome's Inspect Element and using:

(Invoke-WebRequest -Uri 'http://www.2dayfm.com.au').Content | Select-Xml -XPath '//*[@id="nowplayinginfo"]'

But that doesn't seem to work either. It doesn't accept the XPath; it seems that what Chrome thinks the XPath is differs from what PowerShell expects it to be.
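A plausible explanation (an assumption, not confirmed in this thread) is that the XPath itself is fine and the input is the problem: Select-Xml parses its input as XML, and a full HTML page with unclosed tags, HTML entities or a doctype is generally not well-formed XML. The same XPath does work once the markup parses. A minimal sketch using the fragment quoted above as a stand-in for the page:

```powershell
# A well-formed fragment (copied from the page source quoted above) parses as XML.
[xml]$fragment = '<p><span class="listenheading">playing:</span> <span id="nowplayinginfo">loading...</span></p>'

# The XPath copied from Chrome is evaluated without complaint.
$result = Select-Xml -Xml $fragment -XPath '//*[@id="nowplayinginfo"]'
$result.Node.InnerText   # loading...
```

Even where the page does parse, the span in the downloaded HTML holds only "loading...", so XPath runs into the same timing problem as the regex.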

Using a scraper isn't going to work, because you only get the initial HTML content that is downloaded. The page uses JavaScript/AJAX to render the song/artist info by manipulating the DOM after the initial download. However, you can use the InternetExplorer.Application COM object for this:

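$ie = New-Object -ComObject InternetExplorer.Application
$ie.Navigate('http://www.2dayfm.com.au/')
# Wait for READYSTATE_COMPLETE (4), giving up after 30 seconds (an arbitrary limit)
$deadline = (Get-Date).AddSeconds(30)
while ($ie.ReadyState -ne 4 -and (Get-Date) -lt $deadline) { Start-Sleep -Seconds 1 }
# (.*?) is lazy, so the capture stops at the first </span after the id
$null = $ie.Document.body.innerHTML -match '\s+id\s*=\s*"nowplayinginfo"\s*>(.*?)</span'
$ie.Quit()
$matches[1]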

This outputs:

little mix, black magic

The $null = bit gets rid of the True output that the -match operator generates (assuming the regex matches).
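The effect of $null = can be checked offline. A minimal sketch against a sample string shaped like the rendered markup (the song text is taken from the output above; the string itself is an assumption, not captured from the page). Because $matches keeps its previous contents when a match fails, the second half branches on the Boolean instead of discarding it:

```powershell
# Sample markup shaped like the rendered page (assumed, not fetched live).
$html = '<p><span class="listenheading">playing:</span> <span id="nowplayinginfo">little mix, black magic</span></p>'
$pattern = '\s+id\s*=\s*"nowplayinginfo"\s*>(.*?)</span'

# Assigning to $null discards the Boolean from -match; $matches is still populated.
$null = $html -match $pattern
$nowPlaying = $matches[1]
$nowPlaying   # little mix, black magic

# A failed match leaves the old $matches in place, so test the result
# when the pattern might not match, rather than reading a stale value.
$result = if ('<span>no id here</span>' -match $pattern) { $matches[1] } else { 'no match' }
$result       # no match
```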
