Corona Country is a twist on my earlier project, Covid19 Progress. This app shows you coronavirus data at the county level, so you can see how the outbreak is developing where you live.
There are a couple of reasons I decided to launch this project. While it’s interesting to compare countries and states, I think it’s more interesting to feel some agency and control over your own personal choices. Knowing which areas to avoid and which areas to frequent for things like shopping and essential services could mean the difference between life and death for people with comorbidity factors.
Another major difference from the previous project is that it shows a two-week average of the daily change rather than focusing on the current day's data. The problem with showing the current day's data is that, since this is a chaotic disaster, reporting is often batched: zeros show up in between days with counts in the thousands. Averaging smooths that out, giving you a single number that makes it much clearer what's going on when you're comparing counties.
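To make that concrete, here's a minimal sketch of the calculation (the function name and data shape are hypothetical, not the app's actual code). Since the source data is cumulative, the average daily change over a window is just the total change divided by the number of days:

```typescript
// Minimal sketch: given a county's cumulative case counts (one entry per
// day), compute the average daily change over the most recent 14 days.
function twoWeekAverageDailyChange(cumulativeCases: number[]): number {
  const window = 14;
  // We need window + 1 data points to get `window` daily deltas.
  const recent = cumulativeCases.slice(-(window + 1));
  if (recent.length < 2) return 0;
  // Total change across the window divided by the number of days.
  const totalChange = recent[recent.length - 1] - recent[0];
  return totalChange / (recent.length - 1);
}

// Example: batched reporting (flat days followed by a big jump) averages out.
console.log(twoWeekAverageDailyChange([100, 100, 100, 700, 700, 700, 1300]));
// -> 200 cases/day
```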
Since Corona Country is GPS-based and defaults to showing nearby statistics, it also features a dropdown menu so you can check out other metropolitan areas.
Tech Stack
This project is very different from Covid19 Progress on the back-end, and the challenge is much harder: we need to handle both time-series and statistical data for approximately 4,000 counties around the country, covering several months of history and probably at least another year into the future.
Each day, Johns Hopkins releases a very large CSV file containing lots of information about every county. The first step is fetching this file. A lot of the data is junk: many of the fields are never filled out, or contain irrelevant or redundant data. The file is too large to hold in memory, so the script streams through it line by line, looking for the current data for each county. Next, that data is parsed and appended to each county's individual archive file. The current data is also updated in the database. (The database contains only the current day's data for each county.)
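The streaming pass looks something like this (a simplified sketch, not the actual script: the column positions are made up, and a real version would use a proper CSV parser to handle quoted fields containing commas):

```typescript
import { createReadStream } from "fs";
import { createInterface } from "readline";

// Stubs for the two side effects; real implementations are omitted here.
async function appendToArchive(
  fips: string,
  row: { confirmed: number; deaths: number }
): Promise<void> {}
async function updateCurrentInDb(
  fips: string,
  row: { confirmed: number; deaths: number }
): Promise<void> {}

// Stream the daily report line by line instead of loading the whole CSV
// into memory.
async function processDailyReport(path: string): Promise<void> {
  const lines = createInterface({
    input: createReadStream(path),
    crlfDelay: Infinity, // treat \r\n as a single line break
  });

  for await (const line of lines) {
    const fields = line.split(",");
    const fips = fields[0]; // county identifier (hypothetical column)
    const confirmed = Number(fields[7]);
    const deaths = Number(fields[8]);
    if (!fips || Number.isNaN(confirmed)) continue; // skip junk rows

    await appendToArchive(fips, { confirmed, deaths });
    await updateCurrentInDb(fips, { confirmed, deaths });
  }
}
```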
I know what you're thinking: why not just put everything in the database? The problem is that each time a user loaded the page, we would have to fetch thousands of rows and then parse them into the format chart.js wants. The other problem is that past rows are often revised in new releases of the Johns Hopkins file, so we would have to check hundreds of thousands of records every day to see whether they need updating. By the time this is all over, that will probably grow into millions of rows. Instead, we put just the basics in the database and use it only to select the closest counties and fetch their static data. This is much faster. The rest of the data is already parsed and waiting for us in each county's archive file. Parsing those files is slow work that only needs to be done once a day.
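For illustration, each county's archive could be a JSON file already shaped the way chart.js consumes data, so the API can return it verbatim with no per-request parsing. This layout is my assumption, not necessarily the app's actual format:

```typescript
import { promises as fs } from "fs";

// Hypothetical archive layout: one pre-parsed JSON file per county, holding
// chart.js-style parallel arrays of labels and values.
interface CountyArchive {
  fips: string;
  name: string;
  labels: string[];    // dates, e.g. "2020-04-01"
  confirmed: number[]; // cumulative cases, aligned with labels
  deaths: number[];    // cumulative deaths, aligned with labels
}

async function appendDay(
  archiveDir: string,
  fips: string,
  date: string,
  confirmed: number,
  deaths: number
): Promise<void> {
  const path = `${archiveDir}/${fips}.json`;
  const archive: CountyArchive = JSON.parse(await fs.readFile(path, "utf8"));
  const i = archive.labels.indexOf(date);
  if (i >= 0) {
    // Johns Hopkins sometimes revises past days, so overwrite in place.
    archive.confirmed[i] = confirmed;
    archive.deaths[i] = deaths;
  } else {
    archive.labels.push(date);
    archive.confirmed.push(confirmed);
    archive.deaths.push(deaths);
  }
  await fs.writeFile(path, JSON.stringify(archive));
}
```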
When a user opens the site, the static homepage asks for the user's GPS location or offers to let them choose a metropolitan area. The client then queries the API, which finds any counties within 50 miles, fetches the archived data for those counties, and returns it to the user. This takes an average of just 30 milliseconds and about 180 KB of RAM per request.
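The 50-mile lookup can be done with a plain haversine filter over county centroids. Here's a rough sketch (the row shape and in-memory filter are assumptions; a real deployment might push this into a SQL query instead):

```typescript
// Sketch of the proximity lookup, assuming the database holds one row per
// county with its centroid coordinates. Names and shapes are hypothetical.
interface CountyRow {
  fips: string;
  lat: number;
  lon: number;
}

// Great-circle distance in miles via the haversine formula.
function distanceMiles(
  lat1: number, lon1: number,
  lat2: number, lon2: number
): number {
  const R = 3958.8; // Earth's mean radius in miles
  const toRad = (d: number) => (d * Math.PI) / 180;
  const dLat = toRad(lat2 - lat1);
  const dLon = toRad(lon2 - lon1);
  const a =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(lat1)) * Math.cos(toRad(lat2)) * Math.sin(dLon / 2) ** 2;
  return 2 * R * Math.asin(Math.sqrt(a));
}

// Return every county whose centroid is within 50 miles of the user.
function countiesNearby(
  counties: CountyRow[],
  lat: number,
  lon: number
): CountyRow[] {
  return counties.filter((c) => distanceMiles(lat, lon, c.lat, c.lon) <= 50);
}
```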