The best part of this analysis was the tone: he was polite, non-judgemental, and understanding of the fact that it's tough to launch something new. Yes, the developers could have done it better, but there's no need to pile on; they're probably kicking themselves enough as it is.
I am developing a similar application with a map and live data generated dynamically by my server. The difference with my app is that the data will be global, not for one city.
The Citi Bike Map appears to load all of the data available regardless of map extent, which can be practical with a small dataset.
How do you deal with caching when the extent of the user's map is constantly changing? My server currently returns data relevant to the map extent only. If I were to get Cron to regularly crunch out one big global cache to send to the browser, I would kill my site.
I am assuming the best option would be to figure out some sort of tiling scheme.
If anyone out there has any ideas about that, I am all ears!
You should look into vector map tiles! Basically, you create a little JSON snippet for every image that the map server returns, and then overlay it in the client. There's support for JSON tiles in Polymaps (<http://polymaps.org/>).
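To make the tiling idea a bit more concrete, here is a rough sketch of the common XYZ ("slippy map") tile indexing on a Web Mercator grid, which is the scheme Google Maps style tiles follow. The function names are made up for illustration, and it assumes the extent does not cross the antimeridian. The point is that a constantly changing map extent collapses into a small set of fixed `(zoom, x, y)` keys, and each key can be generated and cached on its own.

```python
import math


def lonlat_to_tile(lon, lat, zoom):
    """Return the (x, y) index of the XYZ tile that contains a point."""
    n = 2 ** zoom  # number of tiles along each axis at this zoom level
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so points on the edge of the world still map to a valid tile.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def tile_bounds(x, y, zoom):
    """Return (west, south, east, north) in degrees for one tile."""
    n = 2 ** zoom

    def lat_of(row):
        return math.degrees(math.atan(math.sinh(math.pi * (1 - 2 * row / n))))

    west = x / n * 360.0 - 180.0
    east = (x + 1) / n * 360.0 - 180.0
    return west, lat_of(y + 1), east, lat_of(y)


def tiles_for_extent(west, south, east, north, zoom):
    """List the (zoom, x, y) keys covering a map extent.

    Assumes west <= east, i.e. the extent does not wrap the antimeridian.
    """
    x0, y0 = lonlat_to_tile(west, north, zoom)  # top-left tile
    x1, y1 = lonlat_to_tile(east, south, zoom)  # bottom-right tile
    return [(zoom, x, y) for x in range(x0, x1 + 1) for y in range(y0, y1 + 1)]
```

The client would request one JSON snippet per key from `tiles_for_extent`, and the server would use `tile_bounds` as the bounding box for its existing extent query, so the same tile always yields the same cacheable response.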
Cool! I've not heard of Polymaps before, and will look into it. For the time being, I am using Google Maps.
Do you have any recommendations on how to create the tiles?
The most important thing is that when a user clicks a marker on my map (whether it is a real vector marker or a raster passing itself off as a marker), the marker's associated attribute values need to be available to feed into a popup.
I wouldn't worry about the caching, as your app is unlikely to get the same news coverage as the Citi Bike program. I think the author misdiagnosed the Citi Bike problem. My guess is they just had too few Apache processes. If there was just one database table, the database's query cache should have been able to handle this load. His suggestions of cron-based caching are likely redundant and overkill. Even if that wasn't enough, PHP's APC would be a better solution.
You need a way to do geospatial queries and likely clustering of the results. PostGIS will work well, and there are k-means clustering plugins, or you can just do grid-based clustering.
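As a rough illustration of grid-based clustering, here is a minimal sketch in plain Python rather than PostGIS. The function name, the `(lon, lat)` tuple input, and the output dictionary shape are all assumptions for the example. Each point is snapped to a square cell in degrees, and every non-empty cell becomes one cluster marker placed at the mean position of its members.

```python
import math
from collections import defaultdict


def grid_cluster(points, cell_deg):
    """Group (lon, lat) points into square cells of cell_deg degrees.

    Returns one dict per non-empty cell with the mean position and a count.
    """
    cells = defaultdict(list)
    for lon, lat in points:
        # floor() keeps negative coordinates in the correct cell.
        key = (math.floor(lon / cell_deg), math.floor(lat / cell_deg))
        cells[key].append((lon, lat))

    clusters = []
    for members in cells.values():
        count = len(members)
        clusters.append({
            "lon": sum(p[0] for p in members) / count,
            "lat": sum(p[1] for p in members) / count,
            "count": count,
        })
    return clusters
```

Choosing a larger `cell_deg` at low zoom levels and a smaller one as the user zooms in keeps the number of markers sent to the browser roughly constant.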
I think a much simpler and more generally applicable piece of advice is to do load testing. It's not that he couldn't think of a faster way to do it. He just didn't realize he needed to, because he didn't load test.
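A load test does not have to be elaborate. The sketch below is only an assumption of what a first pass might look like: it fires a number of requests from a thread pool and reports latencies. `load_test` and `request_fn` are illustrative names, and dedicated load-testing tools will do this far better.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def load_test(request_fn, total_requests, concurrency):
    """Call request_fn total_requests times using `concurrency` threads.

    Returns the request count plus median and worst-case latency in seconds.
    """
    def timed(_):
        start = time.perf_counter()
        request_fn()
        return time.perf_counter() - start

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = sorted(pool.map(timed, range(total_requests)))

    return {
        "requests": len(latencies),
        "median_s": latencies[len(latencies) // 2],
        "max_s": latencies[-1],
    }


# Example usage against a hypothetical local endpoint (URL is an assumption):
#   import urllib.request
#   url = "http://localhost:8000/stations.json"
#   print(load_test(lambda: urllib.request.urlopen(url).read(), 500, 50))
```

Raising `concurrency` until the median latency climbs sharply gives a rough idea of where the server starts to fall over.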
The browser is not the only place that can cache HTTP responses - proxy caches from the client's ISP and/or local WAN or corporate intranets should also obey caching rules in the HTTP response headers.
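For illustration, here is a minimal sketch of what those caching rules look like from the server's side, written as a framework-independent Python function. The function name and the way the ETag is derived are assumptions for the example. `Cache-Control: public, max-age=N` lets browsers and shared proxy caches reuse the response for N seconds, and the `ETag` lets a client revalidate with `If-None-Match` and get an empty `304 Not Modified` when nothing has changed.

```python
import hashlib
from typing import Dict, Optional, Tuple


def cacheable_response(
    body: bytes, max_age: int, if_none_match: Optional[str] = None
) -> Tuple[int, Dict[str, str], bytes]:
    """Build (status, headers, body) for a response that caches may store."""
    # A content hash is one simple way to derive a validator for the body.
    etag = '"' + hashlib.sha256(body).hexdigest()[:16] + '"'
    headers = {
        "Cache-Control": f"public, max-age={max_age}",
        "ETag": etag,
    }
    if if_none_match == etag:
        # The client's copy is still current, so skip sending the body again.
        return 304, headers, b""
    return 200, headers, body
```

With a `max_age` of, say, 30 seconds on a station-status feed, most requests during a traffic spike would be answered by a cache rather than by the application or database.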
HTTP caching works. The problem with the bike share site, in my opinion, is that instead of using established methods for scaling (like HTTP caching) and other good practices, they reinvented the wheel as a Rube Goldberg monstrosity which, big surprise, didn't work.
I've updated my comment above to say "and" instead of "or" as you're quite correct.
From the timeline of events (surviving the initial onslaught, then falling over), I'd assumed that was a large part of the problem: people sitting on a live-updating map.