Problem statement
I’ve picked up sailing this summer as a hobby. It’s fun, but it’s wind-dependent and I am not allowed to take a club boat out if the wind speed is higher than 10 knots. I usually check the weather at home and head to the club if the forecast is < 10 knots. Once I’m at the club, I check the current wind speed just before taking the boat out to make sure the situation hasn’t changed.
It’s been bugging me for a long time that the windfinder website makes me download > 1.5MB of data when all I want is the current wind direction and speed.
I use the Alcatel Go Flip phone, and I don’t have a lot of data allowance in my mobile service. Downloading 1.5MB every time I want to know the wind speed and direction is obnoxious. It’s rude to force my device to run JS code to fiddle with the DOM instead of doing a simpler parse-and-display operation. It’s much better to fiddle with the data on my workstation at home and have my phone get only the information I want.
Investigations and options
I’ve been thinking of using libcurl and C to get the webpage and parse the HTML and then serve only the data I’m interested in, so that I don’t have to deal with downloading the 1MB of JS that is triggered by the original 0.5MB HTML page. Today, I finally combed through the original HTML and found where that particular data was being stored. It’s not even in the HTML, it’s in script-tags in the HTML! I question the wisdom of this, and would love to find out why they chose to do it this way.
I have no clue as to why they don’t build the HTML on the server-side and pass that over the wire.
Another thing that would have made sense to me is to have the HTML fire up JS that calls an API endpoint that returns the data in a nice data structure and then places that data into the right parts of the DOM tree. This makes sense because then the webpage is a thin presentation layer over the API endpoint (that could conceivably be used by other frontends such as a native mobile application). I don’t see why they’ve done it the way they have.
Chosen solution
I finally solved this silly problem by piping curl through sed and getting all the information I care about in 29 bytes. The code is below. With a bit more parsing, I can get the 29 bytes down even further. I don’t need to, but I can. This is what I like. I like getting the information I want with the least amount of ceremony.
/tmp/foo| $ curl -s "https://www.windfinder.com/forecast/lake_ontario_cherry_beach" | sed -ne '/wd:/p' -e '/ws: /p'
wd: 61,
ws: 2.0,
/tmp/foo| $
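For the "bit more parsing", here is a rough sketch of how the two lines could be squeezed down to just the numbers. The function name compact_wind is made up for illustration, and it assumes the page still contains lines shaped like the wd: and ws: output above (optional leading whitespace, a number, then a trailing comma).

```shell
#!/bin/sh
# Hypothetical helper: reduce "wd: 61," / "ws: 2.0," lines on stdin to "61 2.0".
# Assumes each value sits on its own line, as in the sed output shown above.
compact_wind() {
    # Keep only the numeric part of the wd: and ws: lines...
    sed -n -e 's/^[[:space:]]*wd:[[:space:]]*\([0-9.]*\).*/\1/p' \
           -e 's/^[[:space:]]*ws:[[:space:]]*\([0-9.]*\).*/\1/p' |
        # ...then join them onto a single space-separated line.
        paste -s -d ' ' -
}
```

It would be used as the last stage of the same pipeline: curl -s "https://www.windfinder.com/forecast/lake_ontario_cherry_beach" | compact_wind. The first number is the direction in degrees and the second is the speed, so the order has to be remembered since the labels are gone.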
Now I need to do this in a cron-job every 15 minutes and write the output to a file (yay caching, so I’m not being a jerk to the windfinder website). After that, any webserver (e.g. nginx) can serve that file upon an HTTP request.
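A minimal sketch of what that cron-driven cache could look like. The script path, the output path in OUT, and the update_cache function are all assumptions for illustration; the sed filter is the same one as above. It writes to a temporary file first and only replaces the cached file if something was actually extracted, so a failed fetch leaves the previous reading in place.

```shell
#!/bin/sh
# Sketch of a cache updater meant to be run from cron.
# OUT is a hypothetical path; point it at a directory the web server serves.
URL="https://www.windfinder.com/forecast/lake_ontario_cherry_beach"
OUT="${OUT:-/var/www/html/wind.txt}"

# Read a page on stdin and replace the file named by $1, but only if the
# wd:/ws: lines were found. Returns non-zero and keeps the old file otherwise.
update_cache() {
    tmp="$1.tmp.$$"
    sed -ne '/wd:/p' -e '/ws: /p' > "$tmp"
    if [ -s "$tmp" ]; then
        mv "$tmp" "$1"      # rename is atomic on the same filesystem
    else
        rm -f "$tmp"
        return 1
    fi
}

# Only hit the network when called as "wind-cache.sh fetch".
# Example crontab entry (every 15 minutes), assuming that script location:
#   */15 * * * * /usr/local/bin/wind-cache.sh fetch
if [ "${1:-}" = "fetch" ]; then
    curl -fsS "$URL" | update_cache "$OUT"
fi
```

With that in place the phone only ever requests the small text file, and the windfinder site sees one request every 15 minutes no matter how often I check.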
Easy, peasy, beautiful.