Thursday, May 16, 2013

Using Python and the Google Maps API to Geocode Addresses

Anyone who has used Python scripting in the past knows how powerful it can be. The xml.dom.minidom module, built into the Python standard library, lets us pull out and parse elements from an XML document. When making a call to the Google Maps API directions service, results can be returned in either JSON or XML. Personally, I prefer XML because I find it simpler to manipulate and understand.
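To make the parsing step concrete, here is a minimal sketch of pulling values out of an XML string with xml.dom.minidom. It is written for Python 3. The XML snippet is a hand-written, simplified stand-in for a directions response (not real API output), and the helper name first_text is hypothetical.

```python
from xml.dom.minidom import parseString

# Hand-written, simplified stand-in for a directions response (not real API output)
SAMPLE_XML = """<DirectionsResponse>
  <status>OK</status>
  <route>
    <leg>
      <duration><value>540</value><text>9 mins</text></duration>
      <distance><value>6200</value><text>6.2 km</text></distance>
    </leg>
  </route>
</DirectionsResponse>"""


def first_text(node, tag):
    """Return the text inside the first <tag> element found under node (hypothetical helper)."""
    element = node.getElementsByTagName(tag)[0]
    return element.firstChild.data


dom = parseString(SAMPLE_XML)
status = first_text(dom, "status")

# Take the last <duration> element, then read its <value> child as an integer
duration = dom.getElementsByTagName("duration")[-1]
seconds = int(first_text(duration, "value"))

print(status, seconds)
```

Reading firstChild.data returns the element's text directly, which avoids calling toxml() and then stripping the opening and closing tags by hand.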

When using the Google Directions API, the returned XML file contains gobs of data, including the distance in meters from origin to destination, the travel time in seconds from origin to destination, the geographic coordinate pairs of the origin and destination, and details about each leg of the directions. The only catch is that Google limits the rate at which users can request data: there is a limit on calls per second as well as a daily cap on total requests per 24-hour period.
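One way to stay under both kinds of limit is to pause between requests and count how many have been sent. The sketch below (Python 3) is only an illustration: the Throttle class is hypothetical, the interval and cap values are placeholders rather than Google's actual limits, and the cap is a simple per-run count that does not reset after 24 hours.

```python
import time


class Throttle:
    """Space out calls and stop at a cap (hypothetical helper; limits are placeholders)."""

    def __init__(self, min_interval, daily_cap, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval  # minimum seconds between two requests
        self.daily_cap = daily_cap        # maximum number of requests for this run
        self.clock = clock
        self.sleep = sleep
        self.calls = 0
        self.last_call = None

    def wait(self):
        """Block until the next request is allowed; raise once the cap is used up."""
        if self.calls >= self.daily_cap:
            raise RuntimeError("daily request cap reached")
        if self.last_call is not None:
            remaining = self.min_interval - (self.clock() - self.last_call)
            if remaining > 0:
                self.sleep(remaining)
        self.last_call = self.clock()
        self.calls += 1


# Made-up addresses; a real script would send its API request after each wait()
throttle = Throttle(min_interval=0.2, daily_cap=3)
for address in ["101_Main_St", "202_Oak_Ave"]:
    throttle.wait()
    print("would request:", address)
```

Calling throttle.wait() at the top of the address loop keeps the pacing logic in one place, separate from the request and parsing code.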

The example Python script below may give you a few ideas on how to use Google Maps to geocode small batches of street addresses with Python. A similar script, rewritten as a function, could also be used in the ArcGIS field calculator.
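As a rough idea of what the function form might look like, here is a minimal Python 3 sketch of just the request-building step. The function name directions_url and the sample destination address are made up for illustration; the endpoint and the sensor parameter are the ones used in the script in this post. Using urlencode percent-encodes spaces, commas and other special characters, so the addresses do not need to be cleaned by hand first.

```python
from urllib.parse import urlencode

# Endpoint as used in the script in this post
BASE_URL = "http://maps.googleapis.com/maps/api/directions/xml"


def directions_url(origin, destination):
    """Build a directions request URL, encoding both addresses (hypothetical helper)."""
    query = urlencode({"origin": origin, "destination": destination, "sensor": "false"})
    return BASE_URL + "?" + query


# Made-up destination address, used only to show the encoding
print(directions_url("Shippensburg Shopping Center", "12 N. Earl St, 17257"))
```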

A simple, practical example web application that generates a heat map using the HeatMapAPI and displays it on a Google Map is available here.


An example Python script:

##Importing Modules
import os, urllib, urllib2, time, csv 
from xml.dom.minidom import parseString


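# Starting point for every directions request (underscores stand in for spaces in the URL)
origin = "Shippensburg_Shopping_Center"
# Input CSV with one customer record per row
f = csv.reader(open('BATCH5.csv'))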

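# Transpose the rows into one tuple per column; this assumes every row has exactly these six fields
CustomerNumber, HomeAddress1, POSCashEquivalentQ, SalesDate, MediaTypeName, MediaTypeID = zip(*f)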

addresses = HomeAddress1
for i, rawaddress in enumerate(addresses):
    # Replace spaces with underscores, drop periods and hyphens, and append the 17257 ZIP code
    address = rawaddress.strip().replace(' ','_').replace('.','').replace('-','') + "_17257"
    pos_address = rawaddress.strip()
    print("Processing: " + address)
    #
    # Request data from the Google API
    #
    url = "http://maps.googleapis.com/maps/api/directions/xml?origin=" + origin + "&destination=" + address + "&sensor=false"
    # Save a copy of the response to test.xml for inspection
    # (note that this line and the next each send their own request to the API)
    xmlFile = urllib.urlretrieve(url,'test.xml')
    xmlFileOpen = urllib2.urlopen(url).read()
    xmlFileDom = parseString(xmlFileOpen)
    #
    # Parse the XML file to find and grab the API status (for example OK, NOT_FOUND or ZERO_RESULTS)
    # Notice the compound getElementsByTagName call, which grabs the entire 'status' snippet, then strips the tags
    #
    xmlstatus = xmlFileDom.getElementsByTagName("DirectionsResponse")[0].getElementsByTagName("status")[0].toxml()
    apiStatus = xmlstatus.replace('<status>','').replace('</status>','')
    print(apiStatus)

    try:
        #
        # Parse the XML file to find and grab the total duration value (seconds)
        # Notice the compound getElementsByTagName call, which grabs the entire 'duration' snippet, then parses out the duration value
        #
        xmlDuration = xmlFileDom.getElementsByTagName("duration")[-1].getElementsByTagName("value")[-1].toxml()
        xmlSeconds = xmlDuration.replace('<value>','').replace('</value>','')

        #
        # Parse the XML file to find and grab the total distance (meters)
        # Notice the compound getElementsByTagName call, which grabs the entire 'distance' snippet, then parses out the distance value
        #
        xmlDistance = xmlFileDom.getElementsByTagName("distance")[-1].getElementsByTagName("value")[-1].toxml()
        xmlMeters = xmlDistance.replace('<value>','').replace('</value>','')

        #
        # Parse the XML file to find and grab the geographic coordinates (I have no idea if you want these, but maybe you do.)
        # Notice the compound getElementsByTagName calls...
        #
        xmlLng = xmlFileDom.getElementsByTagName("end_location")[-1].getElementsByTagName("lng")[-1].toxml()
        xmlLat = xmlFileDom.getElementsByTagName("end_location")[-1].getElementsByTagName("lat")[-1].toxml()
        # Coordinates are stored in longitude,latitude order
        geoCoords = xmlLng.replace('<lng>','').replace('</lng>','') + "," + xmlLat.replace('<lat>','').replace('</lat>','')

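    except IndexError:
        # No route was returned for this address, so skip it and move on to the next one
        print("No route found for: " + address)
        continue
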
    outdata = open('batch4_googleout.csv','a')
    outdata.write(pos_address + "," + address + "," + geoCoords + "," + xmlSeconds + "," + xmlMeters + "\n")
    outdata.close()

    #
    # Delete vars once they've outlived their usefulness
    #
    del(url,xmlFile,xmlFileOpen,xmlFileDom)
    del(xmlDuration,xmlSeconds)
    del(xmlDistance,xmlMeters)
    del(xmlLng,xmlLat,geoCoords,outdata)
    # The end of the "for i" loop is indicated by the first line that is not indented (that is, not in the loop).
    # If there are more items available in the 'addresses' list, then go back to the top of the loop and do the next one.
    # If there are no more items, then end the loop and move on to the commands below.
del(origin,addresses,address)
print ("Done")
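One thing to watch in the output step: the script joins fields with commas by hand, so an address that itself contains a comma would spill into extra columns. A possible alternative, sketched here in Python 3 (the script above is Python 2), is to let csv.writer handle the quoting. The write_rows helper and the sample row are made up for illustration, and an in-memory buffer stands in for the real output file.

```python
import csv
import io


def write_rows(handle, rows):
    """Write result rows with csv.writer so commas inside a field get quoted (hypothetical helper)."""
    writer = csv.writer(handle, lineterminator="\n")
    for row in rows:
        writer.writerow(row)


# Made-up sample row in the same column order as the script's output:
# original address, cleaned address, coordinates, seconds, meters
rows = [("12 Main St, Apt 4", "12_Main_St,_Apt_4_17257", "-77.52,40.05", "540", "6200")]

buffer = io.StringIO()  # stands in for open('batch4_googleout.csv', 'a', newline='')
write_rows(buffer, rows)
print(buffer.getvalue())
```

With this approach the coordinate pair is written as a single quoted field rather than being split into two columns, so anything that reads the file back needs to expect that.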


 
