Posts

CrimsonCache overview

CrimsonCache generates a synthetic set of blood donors, which is then used to populate a MySQL database of blood donations over time. This is for SQL practice. Additionally, there is an update script that generates a much smaller second MySQL database that represents daily donations. This feeds into …

Starting Hemolytics

It might seem a little nuts but I’m starting a sister project for Crimson_Cache. This one will take the SQLite database of donors and ingest it as a Postgres database. It will do the same with the daily blood donations databases. And then a separate (DuckDB) database for analysis. The idea …

Why a Synthetic Dataset

I’m back on CrimsonCache for a while. But it dawned on me that it’s worth laying out my reasons for making a synthetic dataset vs finding one that is real since I have a strong preference for real data. It comes down to four reasons. Overcomes data scarcity. I don’t have access to …

Argumentum Ad Verecundiam

You weren’t feeling well, so you went to your doctor. Based on training, years of experience, and a careful examination of your signs and symptoms, they order tests. The results come back: cancer. They refer you to a medical oncologist, who confirms the diagnosis and outlines a treatment plan, …

Updating Denver Traffic …

It’s time to update Denver Traffic Accidents on GitHub. The easiest thing would be to do this manually, but I don’t want to have to keep doing it, no matter how infrequently. However, there is now a snag that I need to deal with – like IMP, the data source has changed somewhat. Both …

Streamlit Is Good but...

Let me first say that I like Streamlit. If you don’t have a large or difficult dataset to work with, it’s pretty easy to make a good looking dashboard pretty quickly. It is so much faster to create something good looking than Plotly Dash. That said, here are some areas I’ve found …

Quakes overview

Quakes is a multipart project that displays earthquakes worldwide. Let’s explain it with the graphic above: At midnight GMT, GitHub Actions creates an ephemeral runner that runs data_processing.py, which downloads USGS earthquake data. The data is already clean but needs to be transformed …

Tweaking_Quakes

I'm ~really thrilled~ pretty happy with [Streamlit](Streamlit.io). I have not seen something this good looking and easy to work with since folium. It's what I hoped Dash would be. As part of getting to know it, I'm redoing - or at least adding to - Quakes. Essentially, a web app and I may go on …

IMP Datasource Changed

To be brief, NGA has changed the way they serve data again. While inconvenient, this just might be a good thing. When I first started working with their data, I think you could get it in XML, JSON, and maybe something else. I don’t remember which one I used, but it was a nightmare because there was …

CrimsonCache Part2

CrimsonCache is now minimally functional and has a home on GitHub. I still have more to do though. For now the task list is: fix the annoying spike of donations at the end; export to SQL; add tests to confirm aggregate analysis holds but variation is functional. Further work might include a dashboard …

What Bing Doing?!?

If you don’t get the title reference, you need to know your meme. I was searching for something on Microsoft’s Bing today. I find it hit or miss in general and this was a miss. I entered ‘Bing AI’ in the search and inadvertently hit enter. And a lot of results of how to use it …

Azure Should Eat Its Own …

I finished the recert for the Azure Data Scientist tonight. Not a bad test for several reasons. First, Microsoft has what they call a ‘generous retake policy’. Second, it’s essentially an open book exam – as development is in real life. Third, it’s run totally by …