We Run Our SDLC Out of Git
We put our entire SDLC in git. Requirements, decisions, task assignments, everything. Then we cancelled standup. Nobody complained. OK, I complained, which is apparently how you get assigned the blog post about it.
A practical reference for data pipeline patterns: loading strategies; slowly changing dimensions; change data capture; Lambda, Kappa, and Medallion architectures; reliability fundamentals like idempotency and atomic swap; and orchestration patterns.
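The atomic-swap idea in that list is small enough to sketch: write the new output under a temporary name, then rename it over the old one in a single step, so a crashed or rerun job never leaves readers a half-written file. The JSONL format and the helper name here are illustrative, not taken from the reference.

```python
import json
import os
import tempfile

def write_atomically(path: str, records: list) -> None:
    """Write records as JSONL to a temp file, then atomically swap it into place."""
    # The temp file must live on the same filesystem as the target,
    # or the final rename stops being atomic.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            for rec in records:
                f.write(json.dumps(rec) + "\n")
        # Atomic replace: readers see the old file or the new one, never a partial write.
        os.replace(tmp, path)
    except BaseException:
        os.unlink(tmp)
        raise
```

Because a rerun rebuilds the whole artifact and swaps it in, the write is also idempotent: running the job twice leaves the same file as running it once.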
The hyperscale data platform pitch is compelling, but it was designed for a different customer. Here is an honest look at where open source tools like PostgreSQL, DuckDB, Airflow, and dbt outperform proprietary platforms for most Oklahoma organizations, and when the proprietary option is actually the right call.
Stop coupling DAGs by time or ExternalTaskSensor. Airflow's dataset scheduling lets you wire pipelines together through the data they produce and consume, so the right DAGs run at the right time without the fragility of cross-DAG timing assumptions.
Oklahoma energy companies are sitting on enormous amounts of data spread across systems that were never designed to talk to each other. PPDM gives you a standard. Data engineering makes it actually work.
We've seen a trend of small-to-midsize Oklahoma businesses outgrowing their data setup. Here's how to tell if it's time to bring in a real data engineer.
If your Airflow Variables, Connections, and secrets exist only in the UI or in someone's memory, you don't have a config strategy; you have a time bomb. Here's how to actually fix that.
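One common fix, sketched here rather than the post's full approach, is to define Connections and Variables as environment variables that live in version control or come from a secrets manager, using Airflow's AIRFLOW_CONN_* and AIRFLOW_VAR_* conventions. The connection id and values below are made up:

```shell
# Connections: AIRFLOW_CONN_<CONN_ID> holds a connection URI.
# The id "warehouse" and the credentials are illustrative; the password
# is expanded from a secret provided by the environment, never hardcoded.
export AIRFLOW_CONN_WAREHOUSE="postgres://etl:${WAREHOUSE_PW}@db.internal:5432/analytics"

# Variables: AIRFLOW_VAR_<KEY> is what Variable.get("batch_size") reads.
export AIRFLOW_VAR_BATCH_SIZE='5000'
```

Anything defined this way is reproducible on every scheduler and worker, and never exists only in one person's browser session.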
Learn how to build portable, testable data pipelines by containerizing your ETL logic and using Airflow purely as a scheduler, keeping your code independent from any specific orchestration tool.
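The pattern can be sketched as a plain CLI entrypoint: the container image owns the ETL logic, and Airflow (or cron, or a shell) merely invokes it. The flag names and the empty run() body below are hypothetical stand-ins, not the article's code:

```python
# Orchestrator-agnostic ETL entrypoint: note there are no Airflow imports.
import argparse
import sys

def run(source: str, target: str) -> int:
    # Extract from `source` and load into `target` (elided in this sketch).
    print(f"extracting from {source}, loading into {target}")
    return 0  # process exit code: 0 on success, nonzero on failure

def main(argv=None) -> int:
    parser = argparse.ArgumentParser(description="Containerized ETL job")
    parser.add_argument("--source", required=True)
    parser.add_argument("--target", required=True)
    args = parser.parse_args(argv)
    return run(args.source, args.target)

if __name__ == "__main__":
    sys.exit(main())
```

An orchestrator then runs the image with something like `docker run etl-image --source api --target warehouse`; swapping Airflow for any other scheduler changes nothing inside the image, which is the portability the teaser describes.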
Stop fighting for inbound VPN access. Put your Airflow workers where the data lives and let them call home.
This guide takes an existing Meltano proof of concept and upgrades it to a real data source and target: public REST API endpoints as the extractor, and PostgreSQL as the loader.
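As a rough idea of what that pairing looks like in meltano.yml (the plugin names are common community variants and the settings are illustrative, not the guide's exact configuration):

```yaml
plugins:
  extractors:
    - name: tap-rest-api-msdk   # generic REST API extractor; illustrative choice
  loaders:
    - name: target-postgres
      config:
        host: localhost
        port: 5432
        database: warehouse
        user: meltano             # password supplied via environment, not committed
```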
Containerize a Meltano EL pipeline with Docker to get a reproducible, self-contained workflow that produces a JSONL artifact.