Python Soup is Tasty

April 30, 2009, 5:50 pm

This is why I love Python. I wanted a list of countries for my Timeline Project, so I went to Wikipedia and found a pretty decent list of countries. With Python and Beautiful Soup, writing a tool to scrape the data took less time than copying and pasting the list and cleaning it up by hand would have.

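```python
import urllib2
import httplib
import codecs
import sys
from BeautifulSoup import BeautifulSoup  # Beautiful Soup 3 (Python 2)

opener = urllib2.build_opener()
try:
    url = "http://en.wikipedia.org/wiki/List_of_countries"
    # Send a custom User-Agent instead of urllib2's default one.
    # No data argument, so this is a plain GET rather than a POST.
    req = urllib2.Request(url, headers={"User-Agent": "Souper"})
    response = opener.open(req)
    data = response.read()
except urllib2.URLError, err:
    print "HTTP error:", err.reason
    sys.exit(1)
except httplib.HTTPException, err:
    print "HTTP error:", err
    sys.exit(1)

# Wrap stdout so any Unicode printed below is written as UTF-8.
streamWriter = codecs.lookup('utf-8')[-1]
sys.stdout = streamWriter(sys.stdout)

soup = BeautifulSoup(data)

# Each country has a flag icon in <span class="flagicon">; the link
# right after that span holds the country name.
countries = []
image_spans = soup.findAll('span', {"class": "flagicon"})
for span in image_spans:
    href = span.findNextSibling('a')
    if href:
        # Keep the link text, dropping any non-ASCII characters.
        countries.append(unicode(href.contents[0]).encode('ascii', 'ignore'))

# Emit the names as a PHP array literal, one entry per line.
print "$countries = array ("
for i in range(0, len(countries)):
    print '"' + countries[i] + ('",' if i < len(countries) - 1 else '");')
```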
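The same idea can be sketched in Python 3 using only the standard library. This is a minimal sketch, not the script above: it swaps Beautiful Soup for `html.parser.HTMLParser`, assumes the same markup pattern (a `<span class="flagicon">` followed by the country link), approximates `findNextSibling('a')` by taking the first link that starts after the flag span closes, and runs on a made-up inline `SAMPLE` instead of fetching the live page. The names `FlagLinkParser`, `to_ascii`, `php_escape`, `php_array` and `SAMPLE` are hypothetical. `to_ascii` is a variation on the `encode('ascii', 'ignore')` call above: it decomposes accented letters first (NFKD), so "Côte" becomes "Cote" rather than "Cte".

```python
import unicodedata
from html.parser import HTMLParser


class FlagLinkParser(HTMLParser):
    """Collect the text of the first <a> that follows each <span class="flagicon">.

    Hypothetical helper: approximates Beautiful Soup's findNextSibling('a')
    by taking the next link that starts after the flag span has closed.
    """

    def __init__(self):
        super().__init__()
        self.names = []
        self._flag_depth = 0      # >0 while inside a flagicon span
        self._want_link = False   # a flag span just closed; next <a> is the name
        self._in_link = False     # currently inside the country-name link
        self._text = []

    def handle_starttag(self, tag, attrs):
        if self._flag_depth:
            # Ignore everything inside the flag span (images, file links),
            # but count nested spans so we know when it really closes.
            if tag == "span":
                self._flag_depth += 1
            return
        classes = (dict(attrs).get("class") or "").split()
        if tag == "span" and "flagicon" in classes:
            self._flag_depth = 1
        elif tag == "a" and self._want_link:
            self._want_link = False
            self._in_link = True
            self._text = []

    def handle_endtag(self, tag):
        if self._flag_depth:
            if tag == "span":
                self._flag_depth -= 1
                if self._flag_depth == 0:
                    self._want_link = True
            return
        if tag == "a" and self._in_link:
            self._in_link = False
            self.names.append("".join(self._text).strip())

    def handle_data(self, data):
        if self._in_link:
            self._text.append(data)


def to_ascii(name):
    """Strip accents (NFKD), then drop whatever is still non-ASCII."""
    return unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode("ascii")


def php_escape(name):
    """Escape characters that are special inside a double-quoted PHP string."""
    return name.replace("\\", "\\\\").replace('"', '\\"').replace("$", "\\$")


def php_array(names, var="countries"):
    """Format names as a PHP array literal, one entry per line."""
    body = ",\n".join('"' + php_escape(n) + '"' for n in names)
    return "$%s = array (\n%s);" % (var, body)


# Made-up sample markup standing in for the Wikipedia page.
SAMPLE = """
<table>
<tr><td><span class="flagicon"><img src="af.png" alt="">&nbsp;</span><a href="/wiki/Afghanistan">Afghanistan</a></td></tr>
<tr><td><span class="flagicon"><a href="/wiki/File:Flag.svg"><img src="ci.png"></a></span> <a href="/wiki/Ivory_Coast">Côte d'Ivoire</a></td></tr>
</table>
"""

if __name__ == "__main__":
    parser = FlagLinkParser()
    parser.feed(SAMPLE)
    parser.close()
    print(php_array([to_ascii(n) for n in parser.names]))
```

Run directly, it prints the same shape as the original script: `$countries = array (`, then `"Afghanistan",`, then `"Cote d'Ivoire");`. Because `HTMLParser` is event-driven rather than a tree, "the link after the flag span" has to be tracked with a little state instead of a sibling lookup.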

Tags: Development