It’s time to update Denver Traffic Accidents on GitHub. The easiest thing would be to do this manually, but I don’t want to keep doing that, no matter how infrequently. However, there is now a snag I need to deal with: as with IMP, the data source has changed somewhat. Both datasets are now served from ArcGIS servers.
One part of this challenge is easy: I don’t know all that much about the GeoJSON format, but that is fixable. I’ve worked with some heavily nested JSON before and didn’t enjoy it, but that’s not the case here. Some research on the format should solve this easily.
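As a rough sketch of why the nesting is shallow: a GeoJSON `FeatureCollection` holds a list of features, and each feature has a flat `properties` object plus a `geometry`. For point data, flattening comes down to copying the properties and pulling the two coordinates out. The field names in the sample (`incident_id`, `offense`) are made up for illustration and are not the dataset's real columns.

```python
def features_to_rows(collection):
    """Flatten a GeoJSON FeatureCollection into a list of flat dicts."""
    rows = []
    for feature in collection.get("features", []):
        # "properties" may be null in GeoJSON, so fall back to an empty dict.
        row = dict(feature.get("properties") or {})
        geometry = feature.get("geometry") or {}
        if geometry.get("type") == "Point":
            # GeoJSON stores point coordinates as [longitude, latitude].
            lon, lat = geometry["coordinates"][:2]
            row["longitude"] = lon
            row["latitude"] = lat
        rows.append(row)
    return rows


# Hypothetical sample; the real property names will differ.
sample = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "geometry": {"type": "Point", "coordinates": [-104.99, 39.74]},
            "properties": {"incident_id": "1", "offense": "example"},
        },
        {
            "type": "Feature",
            "geometry": None,  # features without a location are allowed
            "properties": {"incident_id": "2", "offense": "example"},
        },
    ],
}

rows = features_to_rows(sample)
```

Rows without a geometry simply come back without `longitude` and `latitude` keys, which a CSV writer can fill with blanks.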
The other part is tougher. Esri ArcGIS servers have a default limit of 2,000 rows per API call. The limit can be changed, but it hasn’t been here. There are about 250,000 accidents in the database, and I have all of them up until about April 23 in CSV. There is also a CSV download option that doesn’t limit the number of rows, but it has its own challenge: there is no API for it. Selenium may be an option for getting the full CSV download, but the URLs seem to change and some parts are unlabeled.
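One way around the row limit, sketched below, is to page through the query endpoint 2,000 rows at a time using the ArcGIS REST `resultOffset` and `resultRecordCount` parameters. This assumes the layer has pagination enabled, which I have not confirmed for this server. The `query_url` argument is a placeholder for the layer's `/query` endpoint, and the `fetch` parameter exists only so the loop can be exercised without a network call.

```python
import json
import urllib.parse
import urllib.request

PAGE_SIZE = 2000  # matches the server's default row limit


def fetch_page(query_url, offset, page_size=PAGE_SIZE):
    """Request one page of features as GeoJSON from an ArcGIS query endpoint."""
    params = urllib.parse.urlencode({
        "where": "1=1",          # no filter: ask for every row
        "outFields": "*",        # all attribute columns
        "f": "geojson",
        "resultOffset": offset,  # assumes the layer supports pagination
        "resultRecordCount": page_size,
    })
    with urllib.request.urlopen(f"{query_url}?{params}") as response:
        return json.load(response)


def fetch_all(query_url, fetch=fetch_page, page_size=PAGE_SIZE):
    """Collect features page by page until a short (or empty) page arrives."""
    features = []
    offset = 0
    while True:
        page = fetch(query_url, offset, page_size)
        batch = page.get("features", [])
        features.extend(batch)
        if len(batch) < page_size:
            break  # last page reached
        offset += page_size
    return features
```

At roughly 250,000 rows this works out to about 125 requests. Paging is only reliable if the server returns rows in a stable order between requests, so adding an `orderByFields` parameter on the layer's ID column would be a sensible precaution.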
A third option is to simply augment the data I already put on Kaggle. That would involve converting the GeoJSON to CSV, or to Parquet for compactness. Changing formats shouldn’t be too hard, but the database goes back and relabels entries that have pending court cases, which usually involve felonies. There aren’t too many of these, but a clean download would be preferable.
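If augmenting turns out to be the route, the merge itself could look something like the sketch below: key both the existing rows and the freshly downloaded rows on a unique ID, and let the fresh copy win so relabeled entries are overwritten. The key name `incident_id` is a hypothetical stand-in for whatever unique identifier the dataset actually provides, and the output is plain CSV (Parquet would need an extra library).

```python
import csv


def merge_rows(existing, fresh, key="incident_id"):
    """Combine two lists of row dicts; rows in `fresh` replace matching old rows."""
    merged = {row[key]: row for row in existing}
    for row in fresh:
        merged[row[key]] = row  # new row, or an updated label for an old one
    return list(merged.values())


def write_csv(rows, handle):
    """Write row dicts to an open text file, using the union of all columns."""
    fieldnames = []
    for row in rows:
        for name in row:
            if name not in fieldnames:
                fieldnames.append(name)
    writer = csv.DictWriter(handle, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(rows)
```

This only picks up a relabel when the affected row is part of the fresh download, so older entries that changed after my last pull would still be stale unless they are fetched again.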