UmpScorecards

An online platform dedicated to measuring the accuracy, consistency, and favor of MLB umpires.

The @UmpScorecards platform is by far my largest and most time-consuming personal project. Started in August 2020, the platform consists of an automated Twitter account, with over 300k followers and 350m impressions (as of the end of the 2022 season), and a dedicated data archive with over 20k users per week.

Here are some technologies and techniques I've learned while working on different parts of this project:

Data Analysis

Monte Carlo simulations: @UmpScorecards accuracy statistics are generated using a Monte Carlo simulation that accounts for potential measurement error in the tracking system.
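To make the idea concrete, here is a minimal sketch of how a Monte Carlo simulation can account for measurement error on a single pitch. It is an illustration under stated assumptions, not the actual @UmpScorecards algorithm: the zone bounds, the Gaussian error model, the `sigma` value, and the function name `strike_probability` are all hypothetical.

```python
import random

# Hypothetical strike-zone bounds in feet (assumed for illustration only).
ZONE_LEFT, ZONE_RIGHT = -0.83, 0.83
ZONE_BOTTOM, ZONE_TOP = 1.5, 3.5


def strike_probability(x, z, sigma=0.08, n_draws=10_000, seed=0):
    """Estimate the chance that a pitch measured at (x, z) truly crossed the zone.

    sigma is an assumed standard deviation (in feet) for the tracking error.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_draws):
        # Perturb the measured location with assumed Gaussian tracking error.
        true_x = rng.gauss(x, sigma)
        true_z = rng.gauss(z, sigma)
        if ZONE_LEFT <= true_x <= ZONE_RIGHT and ZONE_BOTTOM <= true_z <= ZONE_TOP:
            hits += 1
    return hits / n_draws
```

A pitch measured well inside the zone comes out as a near-certain strike, while a pitch measured right on the edge lands near 50%. That uncertainty is what a simulation-based accuracy statistic can take into account before labeling a call as missed.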

Density estimation: @UmpScorecards consistency statistics use Kernel Density Estimation and a modified version of Bayes' rule to estimate an umpire's individual strike zone.
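As a rough illustration of combining Kernel Density Estimation with Bayes' rule, the sketch below estimates the probability that a pitch at a given location is called a strike, based on an umpire's earlier calls. It uses the standard form of Bayes' rule rather than the modified version mentioned above, and the Gaussian kernel, the `bandwidth` value, and the function names are assumptions made for this example.

```python
import math


def gaussian_kde(points, bandwidth):
    """Return a function that evaluates a 2D Gaussian KDE built from points."""
    norm = 1.0 / (2.0 * math.pi * bandwidth ** 2 * len(points))

    def density(x, z):
        total = 0.0
        for px, pz in points:
            d2 = (x - px) ** 2 + (z - pz) ** 2
            total += math.exp(-d2 / (2.0 * bandwidth ** 2))
        return norm * total

    return density


def called_strike_probability(x, z, strikes, balls, bandwidth=0.25):
    """Apply Bayes' rule: P(strike | location) from two location densities."""
    prior_strike = len(strikes) / (len(strikes) + len(balls))
    like_strike = gaussian_kde(strikes, bandwidth)(x, z)
    like_ball = gaussian_kde(balls, bandwidth)(x, z)
    evidence = like_strike * prior_strike + like_ball * (1.0 - prior_strike)
    # Fall back to the prior when both densities underflow to zero.
    return like_strike * prior_strike / evidence if evidence > 0 else prior_strike
```

Here `strikes` and `balls` are lists of `(x, z)` locations for one umpire's called strikes and called balls. Evaluating the function over a grid of locations traces out that umpire's estimated zone, for example as the region where the probability exceeds 50%.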

The Twitter

Data visualization: Individual components of @UmpScorecards graphics are generated in Matplotlib and joined together to make the final image.

Automated workflows: The entire daily run process for @UmpScorecards is automated using Google Compute Engine and Cloud Scheduler.

The Data Archive

Cloud infrastructure: The @UmpScorecards data archive is hosted on Google App Engine (GAE), part of the Google Cloud Platform (GCP). The app takes advantage of several of the GCP APIs, including Cloud Storage, Compute Engine, and Cloud Scheduler.

Backend development: The @UmpScorecards data archive takes advantage of a proprietary API to return data to users. The API is also hosted on GCP, and is written in Python and Django. The API allows users to make complex requests with numerous filters.
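To give a sense of what a filterable request might look like, here is a small, framework-free sketch that turns a query string into filters over a list of records. It is not the actual @UmpScorecards API: the field names, the `field__op` suffix convention (borrowed from Django's lookup style), and the function `filter_games` are all hypothetical.

```python
from urllib.parse import parse_qs

# Supported comparison operators, selected by a "__op" suffix on the field name.
OPS = {
    "gte": lambda value, bound: value >= bound,
    "lte": lambda value, bound: value <= bound,
    "exact": lambda value, bound: value == bound,
}


def filter_games(games, query_string):
    """Return the records in games that satisfy every filter in query_string."""
    results = games
    for key, raw_values in parse_qs(query_string).items():
        field, _, op = key.partition("__")
        compare = OPS.get(op or "exact")
        if compare is None:
            raise ValueError(f"unsupported filter: {key}")
        raw = raw_values[0]
        # Coerce the query value to the type of the stored field before comparing.
        results = [g for g in results if compare(g[field], type(g[field])(raw))]
    return results
```

For example, a query string such as `umpire=A&accuracy__gte=90` would keep only the records for umpire `A` with an accuracy of at least 90. In a Django-backed API, the same idea is usually expressed by passing the parsed lookups to a queryset's `filter()` method.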

Frontend development: The @UmpScorecards data archive relies on numerous open-source projects, such as d3.js, DataTables.js, and datetime.js, to serve users content. The frontend of the site was built in Django with help from Bootstrap.

General web development: In building out the data archive, I've been introduced to a wide array of other parts of the web development process, including tracking site analytics (done through Google Analytics) and generating ad revenue (done through Google AdSense).