SA
All writing

February 2025

The Data Quality Tax

Before you think about AI, think about your data. Most organizations are paying a hidden tax they can't see.

Every organization that has tried to build an AI system on top of real enterprise data has encountered some version of the same problem: the data isn't what they thought it was. Columns that should be consistent aren't. Definitions that seemed shared turn out to be local conventions. Timestamps that look like dates encode business logic no one documented. The data exists, technically. It just doesn't mean what you assumed it meant.

I call this the data quality tax. It's the hidden cost that every AI initiative pays — in delayed timelines, degraded model performance, and the organizational energy spent debugging data problems that were mistaken for model problems. Most teams don't see it coming because they assess data readiness the wrong way: they look at whether data exists rather than whether it's usable.

The distinction matters enormously. 'We have transaction data going back five years' is a data existence statement. 'Our transaction data has consistent schema, accurate timestamps, complete records for the fields we need, and is refreshed on a cadence that matches our use case' is a data usability statement. Most organizations can make the first claim confidently. Far fewer can make the second.
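One way to make the usability statement concrete is to turn each of its clauses into a check that can fail. The sketch below is a minimal illustration under stated assumptions, not a prescribed method. It assumes records arrive as Python dicts and that timestamps are ISO 8601 strings with explicit UTC offsets. It also assumes the caller passes a timezone-aware `now`. The names `assess_usability`, `parse_timestamp`, `REQUIRED_FIELDS` and the `created_at` field are hypothetical, chosen for illustration. The function reports four things: how many distinct schemas appear, how many rows are missing required fields, and how many timestamps are unparseable or in the future (a weak proxy for accuracy). It also reports whether the newest valid record falls within a freshness window.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical required fields for an illustrative transaction table.
REQUIRED_FIELDS = ("transaction_id", "amount", "currency", "created_at")


def parse_timestamp(value):
    """Return a timezone-aware datetime for an ISO 8601 string, else None."""
    if not isinstance(value, str):
        return None
    try:
        ts = datetime.fromisoformat(value)
    except ValueError:
        return None  # e.g. "02/08/2025": the value exists but isn't usable
    # Naive timestamps are ambiguous, so count them as unusable too.
    return ts if ts.tzinfo is not None else None


def assess_usability(records, required_fields, max_staleness, now,
                     timestamp_field="created_at"):
    """Score a list of dict records against the usability criteria."""
    schemas = set()
    incomplete = 0
    bad_timestamps = 0
    latest = None
    for row in records:
        # Consistent schema: every row should expose the same set of columns.
        schemas.add(frozenset(row))
        # Completeness: required fields present and non-empty.
        if any(row.get(f) in (None, "") for f in required_fields):
            incomplete += 1
        # Timestamp sanity (a weak proxy for accuracy): parseable, not in the future.
        ts = parse_timestamp(row.get(timestamp_field))
        if ts is None or ts > now:
            bad_timestamps += 1
        elif latest is None or ts > latest:
            latest = ts
    return {
        "row_count": len(records),
        "schema_variants": len(schemas),
        "incomplete_rows": incomplete,
        "bad_timestamps": bad_timestamps,
        # Refresh cadence: the newest valid record must fall within the window.
        "is_fresh": latest is not None and now - latest <= max_staleness,
    }


now = datetime(2025, 2, 10, 12, 0, tzinfo=timezone.utc)
records = [
    {"transaction_id": "t1", "amount": 42.0, "currency": "USD",
     "created_at": "2025-02-10T09:30:00+00:00"},
    {"transaction_id": "t2", "amount": 13.5, "currency": None,
     "created_at": "2025-02-09T17:05:00+00:00"},
    {"transaction_id": "t3", "amount": 7.25, "currency": "EUR",
     "created_at": "02/08/2025", "region": "EMEA"},
]
print(assess_usability(records, REQUIRED_FIELDS, timedelta(days=1), now))
```

On these three sample rows, the report finds two schema variants because the third row carries an extra `region` column. It also finds one incomplete row and one timestamp that fails to parse. The data still counts as fresh. Every one of these rows would pass a pure existence check. A production pipeline might use a dedicated validation library instead. The point is that each clause of the usability statement becomes something you can test.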

In progress

The full piece is being written.

I write slowly and edit more. The intro above is the real opening — the rest is coming. If you want to be notified when it's done, send me a note.