Final Project Review and Discussion

Alec Ansusinha

Motivation

In December, I took a cross-country journey and travelled over 2,900 miles. The memories were invaluable, but the gasoline carried a hefty price tag. Gas is expensive! It would have been fantastic to know exactly where to find the cheapest gas in the city; prices sometimes vary by as much as $0.50 per gallon, and I might have saved $30 had there been a tool for finding cheaper gas. Thus, my project attempts to create a web application that shows exactly where one can find cheap gas in their city.


Objective

My objective for the project was to design and implement a web application that collects the lowest gas prices in Columbus, OH and displays them on a map. The prices are updated several times a day to reflect the fifteen lowest-priced gas stations in the city.

Methodology

The first step I took toward accomplishing my objective was finding websites that already collect gas prices at regular intervals. I quickly learned of GasBuddy.com, which has affiliate websites for some of the largest cities in the United States. Columbus's affiliate site is columbusgasprices.com, which is the site I chose for collecting my data.

I learned that GasBuddy.com already maps stations and their gas prices, but its map is hard to find, clunky, and dated looking, and nowhere near as simple as what I set out to build, so there was still plenty for me to work with.

My next step was to figure out a way to grab the information from the columbusgasprices.com website, parse it into a usable format for my web application, and update it regularly.

I wrote a Python script, which can be downloaded from my other server. In short, the script takes the raw hypertext from columbusgasprices.com and sorts through it to find the fifteen lowest-priced gas stations for standard grade around Columbus. Let me break it down further (a condensed sketch of the whole pipeline follows this list).

  1. The script imports several important modules, including requests, which retrieves the raw hypertext, and Beautiful Soup, which parses the hypertext into a navigable structure for easy scraping.
  2. The data is retrieved and parsed by Beautiful Soup. The information we need to present to a user of the web app is the price, the name of the gas station, and the address; we also need the latitude and longitude in order to display the gas stations on a map.
  3. If you require a deeper explanation of the data parsing, please contact me at a.ansusinha@gmail.com. It took a long time to learn how to write a proper web-scraping script. Essentially, I identified the data structure used by the website and had Beautiful Soup find exactly what I needed. Then I created arrays for each category of information.
  4. After grabbing the address strings for the lowest-priced gas stations, I had to geocode them to find their approximate GPS coordinates. This entailed reformatting the strings into a valid query structure; then I used the Google Maps Geocoding API.
  5. After putting the latitudes and longitudes into their own arrays, I created a two-dimensional array in which each row was a tuple containing all the information for one gas station. The rows stayed sorted by price because each of the original arrays was built in the same order.
  6. The two-dimensional array was then converted and exported to a comma-separated values (CSV) text file in the same directory as the Python file.
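
For readers who do not want to download the full script, here is a condensed sketch of the pipeline described above. The CSS selectors and the API-key placeholder are simplified stand-ins for the real markup and credentials (the site's actual markup is messier; see the complications below):

    import csv
    import requests
    from bs4 import BeautifulSoup

    GEOCODE_URL = "https://maps.googleapis.com/maps/api/geocode/json"
    API_KEY = "YOUR_GOOGLE_MAPS_KEY"  # placeholder

    def scrape_lowest_prices(url, limit=15):
        # Fetch the raw hypertext and parse it into a navigable tree.
        soup = BeautifulSoup(requests.get(url).text, "html.parser")
        stations = []
        # "div.station" etc. are hypothetical selectors; the real site
        # encodes prices as images (see complication 2 below).
        for row in soup.select("div.station"):
            price = float(row.select_one(".price").get_text(strip=True))
            name = row.select_one(".name").get_text(strip=True)
            address = row.select_one(".address").get_text(strip=True)
            stations.append((price, name, address))
        # Price is the first tuple element, so sorting orders by price.
        return sorted(stations)[:limit]

    def geocode(address):
        # Google Maps Geocoding API: street address in, lat/lng out.
        resp = requests.get(GEOCODE_URL, params={
            "address": address + ", Columbus, OH",
            "key": API_KEY,
        }).json()
        location = resp["results"][0]["geometry"]["location"]
        return location["lat"], location["lng"]

    def export_csv(stations, path="gasprices.csv"):
        # One row per station: price, name, address, lat, lng.
        with open(path, "w", newline="") as f:
            writer = csv.writer(f)
            for price, name, address in stations:
                lat, lng = geocode(address)
                writer.writerow([price, name, address, lat, lng])

    if __name__ == "__main__":
        export_csv(scrape_lowest_prices("http://www.columbusgasprices.com"))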
Here are some of the complications I encountered while writing the script:
  1. The datatypes used by the Beautiful Soup module were very foreign to me, and it took many iterations to figure out what I could do with the parsed data. Objects that intuitively seemed like strings or numerals turned out to be special datatypes (such as Tag and NavigableString) or collections of them, which ordinary Python string operations cannot handle directly.
  2. The formatting of the columbusgasprices.com website is horrendous. Instead of text for their prices, they use images, and the only way I could parse those images into usable data was by scraping the class attributes of div tags. If I were designing the site myself, I might store the data differently so that I could dynamically generate tables holding the information.
  3. Further, the addresses provided for each gas station were not uniform. Sometimes a cross street or a nearby main road would be listed. This threw off the Google Maps Geocoding API and often meant markers would be placed in the middle of nowhere or hundreds of miles away. I remedied this problem by truncating all cross streets and any substrings beginning with "near" (a sketch of this cleanup follows this list).
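
To make complications 2 and 3 concrete, here is a simplified sketch. The digit-class naming scheme and the cross-street separator are illustrative guesses rather than the site's exact conventions:

    # Complication 2: each price digit is an image, so the digit must be
    # inferred from its div's class attribute. The "p2"/"pd" naming here
    # is invented for illustration; the real class scheme differs.
    def price_from_divs(price_cell):
        classes = [div["class"][0] for div in price_cell.find_all("div")]
        chars = ["." if c == "pd" else c[-1] for c in classes]
        return float("".join(chars))  # e.g. ["p2","pd","p9","p9"] -> 2.99

    # Complication 3: strip "near ..." suffixes and cross streets so the
    # Geocoding API sees a plain street address ("&" separator assumed).
    def clean_address(raw):
        addr = raw.split(" near ")[0]
        addr = addr.split(" & ")[0]
        return addr.strip()

    print(clean_address("1234 N High St & E 5th Ave near OSU"))
    # -> "1234 N High St"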

I created a Linux cron job via crontab, which is essentially a task scheduler, so that every three hours the Python script runs and the gas-prices CSV file is updated to reflect the most current postings on columbusgasprices.com. This could be adjusted to any frequency down to the minute; I arbitrarily decided that eight updates a day would be sufficient.
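
The crontab entry looks roughly like this (edit with crontab -e; the script path here is a placeholder):

    # minute 0 of every third hour, i.e. eight runs per day
    0 */3 * * * /usr/bin/python /home/ubuntu/gasprices/scrape.py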

My HTML file loads the Leaflet, jQuery, and jQuery-csv libraries. One of my biggest complications throughout the whole project was that I could not run my Python script on the gis.osu.edu web GIS server, so I had to create a new server (I used Amazon hosting). I then had to figure out a method for cross-origin (cross-server) AJAX, so that I could grab the data from the CSV on the other server and use it to generate markers for each gas station. Luckily, I found a workaround, though it compromises slightly on security. The jQuery-csv library let me easily convert the CSV file into a two-dimensional JavaScript array, and I designed the script so that, on a successful GET request for the CSV file, the fifteen markers for the fifteen lowest gas prices are dynamically generated.
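
One common workaround of this kind, and a plausible source of the security trade-off mentioned above, is serving the CSV with a permissive Access-Control-Allow-Origin header. The sketch below shows the idea using Python's standard library; the port and file layout are placeholders:

    from http.server import HTTPServer, SimpleHTTPRequestHandler

    class CORSHandler(SimpleHTTPRequestHandler):
        # Add a permissive CORS header to every response so a page served
        # from gis.osu.edu may fetch the CSV from this host. The wildcard
        # is the slight security compromise: any site can read the file.
        def end_headers(self):
            self.send_header("Access-Control-Allow-Origin", "*")
            super().end_headers()

    if __name__ == "__main__":
        # Run from the directory containing gasprices.csv; port arbitrary.
        HTTPServer(("0.0.0.0", 8000), CORSHandler).serve_forever()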

For the basemap I used a simple, elegant black-and-white map that makes the streets and the gas-station markers easy to see.

Conclusion

Overall, I think the project was grueling, but a resounding success. I now feel comfortable scripting in Python, and I learned a lot about queries, APIs, and web protocols while building a practical tool that can be easily upgraded. The objectives I devised in my proposal were met exactly.