Since the demise of Dotnet.org.za, I haven’t had a blog, so I decided to start again. Let’s start with something simple that I had to do today.
It sounds like it should be simple to get an electronic list of postal codes for South Africa to import into your app, but it’s surprisingly difficult. I decided to write a PowerShell script to retrieve the list from SA Web. They have a page for each letter of the alphabet that contains the suburbs starting with that letter. The script downloads each of the 26 pages and parses it for the list of suburbs and their codes.
I decided to use the InternetExplorer.Application COM object that allows me to access the HTML DOM.
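# Build one URL per letter, a to z (character codes 97 to 122).
$urls = 97..122 | % { "http://www.saweb.co.za/pcodes/" + [char]$_ + ".html" }

# Drive Internet Explorer through COM so the parsed HTML DOM is available.
$ie = New-Object -ComObject InternetExplorer.Application
$ie.Visible = $true

# Collect the text of every <P> element on every page.
$codes = @()
$urls | % {
    $ie.Navigate($_)
    # Wait for the load to finish (ReadyState 4 = complete). Checking Busy as well
    # avoids reading a stale ReadyState of 4 left over from the previous page.
    while ($ie.Busy -or $ie.ReadyState -ne 4) { Start-Sleep -Milliseconds 100 }
    $doc = $ie.Document
    $doc.getElementsByTagName("P") | % { $codes += $_.innerText }
}
$ie.Quit()

# Line shape the pattern expects: "<suburb> - <code> - <anything>".
$regex = [regex]"^(?<suburb>[\w\s]+)\s-\s(?<code>\d+)\s-.*$"
$results = @()
$codes | % {
    # Matches() only returns successful matches, so no Success check is needed.
    $regex.Matches($_) | % {
        # [pscustomobject] needs PowerShell 3.0 or later.
        $results += [pscustomobject]@{
            Suburb = $_.Groups['suburb'].Value
            # Pad to four digits so leading zeros are kept.
            Code   = $_.Groups['code'].Value.PadLeft(4, '0')
        }
    }
}

# -NoTypeInformation keeps the "#TYPE" header line out of the CSV.
$results | Export-Csv "postalcodes.csv" -NoTypeInformation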
The results are exported as a CSV.
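To check the parsing step without opening a browser, the regular expression from the script can be run against a few strings by hand. This is only a sketch: the sample lines are made up to fit the shape the pattern expects ("suburb - code - rest"), and the real page text on SA Web may look different.

```powershell
# Same pattern as in the script above.
$regex = [regex]"^(?<suburb>[\w\s]+)\s-\s(?<code>\d+)\s-.*$"

# Hypothetical input lines, invented for illustration only.
$samples = @(
    "Sampletown - 27 - Example Region",
    "Another Place - 7441 - Example Region",
    "This line has no code"
)

$parsed = foreach ($line in $samples) {
    $m = $regex.Match($line)
    if ($m.Success) {
        [pscustomobject]@{
            Suburb = $m.Groups['suburb'].Value
            # Pad short codes to four digits, as the script does.
            Code   = $m.Groups['code'].Value.PadLeft(4, '0')
        }
    }
}

$parsed | Format-Table -AutoSize
```

The first two lines match, and the code 27 comes out as 0027. The third line does not fit the pattern, so it is skipped.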