dltHub

Finding the UFC GOAT: A Full Stack Pipeline with dlt, dbt and Metabase

  • Reshef Sharvit,
    Principal Engineer at Skyhawk Security

INTRO

I’m Reshef, a software engineer passionate about distributed systems, infrastructure, and databases. I enjoy taking real-world subjects and turning them into projects. I have always found this approach more effective than reading manuals - I’d simply rather build something.

GOATs

One of the most common questions in sports is, “Who’s the GOAT?”

It’s popular because it blends objective achievement with subjective judgment, inviting endless debate across eras, rules, and styles of play. Fans use it as a way to compare greatness, legacy, and impact - not just raw statistics.

In the NBA, Michael Jordan is widely regarded as the greatest of all time. A perfect 6–0 Finals record, the highest career points-per-game average in both the regular season and the playoffs, and a career PER of 27.9 make his case hard to dispute.

PER (Player Efficiency Rating) is a metric designed to summarize a player’s per-minute statistical impact into a single number, with 15 representing league average.

Things are different in the UFC, a fundamentally individual sport. While fights have been tracked since 1993, differences across eras, weight classes, and competition levels make crowning a single GOAT far more complex - or do they?

If we rely on surface-level stats like win-loss records, we might crown Khabib Nurmagomedov, who retired undefeated at 29–0. It’s a valid case - but is it complete? How many champions did he beat, and how deep was the competition he consistently faced?

Or is the answer Georges St-Pierre (GSP): a two-division champion with nine consecutive title defenses across two eras and an overall record of 26–2?

I believe the answer is far more complicated - and requires significantly deeper analysis and meaningful comparisons. That’s exactly why I assembled a set of 15 (and growing) metrics designed to help move toward a more definitive, data-driven answer.

Addressing the GOAT question in the UFC

To tackle the GOAT question properly, I needed access to the complete history of UFC fights, along with the ability to query that data in depth and extract meaningful insights.

Fortunately, I came across a GitHub repository that scrapes UFC statistics directly from ufcstats.com and exports the data as CSV files:

https://github.com/Greco1899/scrape_ufc_stats

The statistical data alone accounts for roughly 95% of the work. However, the UFC is occasionally associated with scandals and other disruptive events, such as PED usage, injuries, or retirements. For that reason, I also scraped Wikipedia to extract this information and stored it in CSV format.

These CSV files contained raw data that needed to be loaded into a database. I chose PostgreSQL - the go-to database for basically 90% of use cases and 100% of personal projects.

Technically, I could have performed the analysis directly on the CSV files.

Modern tools make it possible to query flat files, and for small, one-off analyses, this approach can be perfectly acceptable. However, this project was about more than answering a single question. I wanted a foundation that could evolve.

dlt

I used dlt to load the data into the warehouse. While there are dozens of alternative tools available, dlt stood out for being both elegant and practical. It provides automatic schema inference, incremental loading, and sensible defaults with minimal configuration.

With just a few lines of code, the CSV files were loaded into automatically created tables with inferred schemas. dlt also keeps track of load history, schemas, and schema versions out of the box, removing the need for additional metadata or orchestration layers.

In practice, it took only a few minutes to go from raw CSV files to a fully queryable dataset.
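The post doesn't include the loading code. The following is a minimal sketch of what "a few lines of dlt" can look like for a directory of CSV files, not the author's actual script. The pipeline name, the `ufc_raw` dataset name, and the one-table-per-file layout are assumptions. Postgres credentials are assumed to be configured for dlt separately, for example in `.dlt/secrets.toml` or the `DESTINATION__POSTGRES__CREDENTIALS` environment variable.

```python
from __future__ import annotations

import csv
from pathlib import Path
from typing import Iterator


def read_csv_rows(path: Path) -> Iterator[dict[str, str]]:
    """Yield each CSV row as a dict keyed by the header line."""
    with path.open(newline="", encoding="utf-8") as f:
        yield from csv.DictReader(f)


def load_csv_dir(csv_dir: str, dataset_name: str = "ufc_raw") -> None:
    """Load every *.csv file in csv_dir into its own Postgres table via dlt.

    Assumes dlt's Postgres credentials are already configured
    (e.g. .dlt/secrets.toml or DESTINATION__POSTGRES__CREDENTIALS).
    """
    import dlt  # imported here so read_csv_rows() works without dlt installed

    pipeline = dlt.pipeline(
        pipeline_name="ufc_stats",  # hypothetical pipeline name
        destination="postgres",
        dataset_name=dataset_name,  # used as the Postgres schema name
    )
    for path in sorted(Path(csv_dir).glob("*.csv")):
        # One table per file (e.g. fights.csv -> fights). dlt creates the
        # table from the rows' keys and records the load in _dlt_loads.
        load_info = pipeline.run(
            read_csv_rows(path),
            table_name=path.stem,
            write_disposition="replace",  # full reload on every run
        )
        print(load_info)
```

Calling `load_csv_dir("data")` would load each file as a separate table. Because `csv.DictReader` yields strings, most columns will arrive as text. Casting values in `read_csv_rows` or passing column hints to dlt is one way to get typed columns.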

dbt

Once dlt had loaded the raw data into fact tables, the next step was to derive more meaningful, analysis-ready representations from it. Since the data already lived in the database - and this was a personal project - dbt felt like the natural choice.

dbt provides an SQL-first, centralized transformation layer that makes data transformations explicit, version-controlled, and reproducible. Just as importantly, it allows you to define tests alongside your models, enforcing assumptions like uniqueness, non-null constraints, and referential integrity directly in the warehouse.
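dbt ships these checks as built-in generic tests named `unique`, `not_null`, and `relationships`, declared in a model's YAML properties file. To make their semantics concrete, here is a plain-Python sketch of what each one flags. It is an illustration over in-memory rows, not how dbt runs them (dbt compiles them to SQL against the warehouse). The table and column names in the usage are hypothetical.

```python
from __future__ import annotations

from collections import Counter
from typing import Any, Mapping

Row = Mapping[str, Any]


def check_not_null(rows: list[Row], column: str) -> list[Row]:
    """Rows whose `column` is missing or None (what not_null fails on)."""
    return [r for r in rows if r.get(column) is None]


def check_unique(rows: list[Row], column: str) -> list[Any]:
    """Non-null values of `column` that appear more than once (what unique fails on)."""
    counts = Counter(r.get(column) for r in rows)
    return [value for value, n in counts.items() if value is not None and n > 1]


def check_relationships(child_rows: list[Row], column: str,
                        parent_rows: list[Row], parent_column: str) -> list[Row]:
    """Child rows whose non-null key has no match in the parent table
    (what relationships, i.e. referential integrity, fails on)."""
    parent_keys = {r.get(parent_column) for r in parent_rows}
    return [r for r in child_rows
            if r.get(column) is not None and r.get(column) not in parent_keys]
```

An empty result means the test passes. In dbt, a failing test returns the offending rows in the same way.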

With the data already organized into fact and dimension tables, all that remained was to write dbt models that transformed this raw data into analytical views tailored to the questions I wanted to ask. Once those models were in place, querying the data became straightforward and repeatable.
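The post doesn't list the models themselves, and in the project they are SQL. To show the kind of aggregation such an analytical view performs, here is a Python sketch. It computes wins, losses, finishes, title-fight wins, and distinct champions beaten per fighter. The input shape is an assumption: one row per bout with `winner`, `loser`, `method`, and `is_title_fight`. The method labels and the `champions` set are also hypothetical.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass, field

# Hypothetical method labels; the scraped data may spell these differently.
FINISH_METHODS = ("KO/TKO", "Submission")


@dataclass
class FighterSummary:
    wins: int = 0
    losses: int = 0
    finishes: int = 0
    title_fight_wins: int = 0
    champions_beaten: set[str] = field(default_factory=set)


def summarize(fights: list[dict], champions: set[str]) -> dict[str, FighterSummary]:
    """Aggregate per-fighter stats from one row per bout.

    `champions` is a hypothetical set of fighters treated as champions;
    how that set is derived is left to the model.
    """
    summary: defaultdict[str, FighterSummary] = defaultdict(FighterSummary)
    for bout in fights:
        winner, loser = bout["winner"], bout["loser"]
        if winner is None:  # draw or no contest: no win or loss recorded
            continue
        summary[winner].wins += 1
        summary[loser].losses += 1
        if bout["method"].startswith(FINISH_METHODS):
            summary[winner].finishes += 1
        if bout["is_title_fight"]:
            summary[winner].title_fight_wins += 1
        if loser in champions:
            summary[winner].champions_beaten.add(loser)  # counted once per opponent
    return dict(summary)
```

Counting champions beaten as a set of distinct opponents means that beating the same champion twice does not inflate the metric. Whether it should is a modeling choice.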

Metabase

With the data loaded and modeled, the next step was making it accessible to others - especially non-technical fans. I needed a way to present the results without requiring SQL or database knowledge.

I hadn’t worked with Metabase before, but after seeing it recommended by a few Reddit users, I decided to give it a try. Much like dlt and dbt, which I had previously worked with, Metabase has an exceptionally low barrier to entry. I was able to set it up and start exploring the data within minutes.

The interface is clean, intuitive, and requires very little configuration. Within a couple of hours, I had built table-based visualizations on top of all my analytical views, making the data easy to explore and understand.

The Metabase API is easy to get started with and lets you configure and generate visualizations via HTTP requests. I wrapped this functionality in scripts (see metabase/setup.sh and metabase/charts.sh).
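The author's scripts are shell scripts and aren't reproduced here. As a rough illustration of the same kind of HTTP calls, here is a Python sketch using only the standard library. It logs in via `POST /api/session`, then creates a native-SQL question displayed as a table via `POST /api/card`, passing the session token in the `X-Metabase-Session` header. The function names are hypothetical. The database id refers to the Postgres connection as registered in Metabase and is an assumption.

```python
from __future__ import annotations

import json
import urllib.request


def table_card_payload(name: str, sql: str, database_id: int) -> dict:
    """Request body for POST /api/card: a native SQL question shown as a table."""
    return {
        "name": name,
        "display": "table",
        "visualization_settings": {},
        "dataset_query": {
            "type": "native",
            "native": {"query": sql},
            "database": database_id,
        },
    }


def metabase_post(base_url: str, path: str, body: dict,
                  session_id: str | None = None) -> dict:
    """POST a JSON body to the Metabase API and return the decoded JSON response."""
    headers = {"Content-Type": "application/json"}
    if session_id:
        headers["X-Metabase-Session"] = session_id
    request = urllib.request.Request(
        base_url.rstrip("/") + path,
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def create_table_card(base_url: str, username: str, password: str,
                      name: str, sql: str, database_id: int) -> int:
    """Log in, create a table question for `sql`, and return the new card's id."""
    session = metabase_post(base_url, "/api/session",
                            {"username": username, "password": password})
    card = metabase_post(base_url, "/api/card",
                         table_card_payload(name, sql, database_id),
                         session_id=session["id"])
    return card["id"]
```

Looping `create_table_card` over one SQL query per analytical view is one way to generate a table for every view in a single run.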

Last but not least, image display: Metabase can show fighter images directly from a simple Python HTTP web server that serves them, which is a neat touch that improves the user experience.
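The post doesn't show the image server. Here is a minimal standard-library sketch, assuming the fighter images are files in a single directory. The directory, host, port, and file naming are hypothetical. The idea is that each row in a view carries an image URL built this way, which Metabase can then be configured to display as an image.

```python
import functools
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer
from urllib.parse import quote


def make_image_server(image_dir: str, host: str = "0.0.0.0",
                      port: int = 8000) -> ThreadingHTTPServer:
    """Create a static file server for image_dir (call serve_forever() to run it).

    Binding to 0.0.0.0 (an assumption) lets e.g. a Metabase container reach it.
    """
    handler = functools.partial(SimpleHTTPRequestHandler, directory=image_dir)
    return ThreadingHTTPServer((host, port), handler)


def image_url(base_url: str, filename: str) -> str:
    """URL for one image file, percent-encoding characters such as spaces."""
    return f"{base_url.rstrip('/')}/{quote(filename)}"
```

For quick experiments, `python -m http.server 8000 --directory <dir>` does the same job from the command line.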

[Figure: How I programmatically created the tables]

[Figure: Example Metabase tables]

My Verdict

The combination of dlt, dbt, and Metabase proved highly effective. Low barriers to entry and batteries included (but only the ones I need) make life much easier.

As for the UFC, I like the analogy between Postgres and Jon Jones. Both have been around forever, survived multiple generations of fighters and hype cycles, and kept improving with age.

Jones has defeated wave after wave of champions and “next big things.” Every few years it’s the same refrain: “Have you seen this new NoSQL database that scales to trillions of requests per second?” Sure, but I’ll still use Postgres.

Jones ranks near the top in almost every metric: a two-division champion, record holder for title fight wins and champions defeated, and among the best in overall record and finishes.

There are caveats. Jones failed drug tests twice (in an era when many did), declined short-notice fights, and has a controversial win over Dominick Reyes.

Still, the eye test reveals what stats can’t: elite fight IQ, near-invulnerability, mastery of distance, and constant adaptation. That’s what separates the truly great from the merely impressive.