
From Dogpile to Deepfakes of Truth: What Happens When GenAI Meets Your Messy Data
Remember when solving a technical problem meant opening five tabs, three forum threads, and a frenzied blog post from 2009? We had our favourite search engines (I used Dogpile; no shame), and the ritual was familiar: cast a query, skim a stack of results, chase the golden nugget through page two, three, maybe four. It was detective work with patience.
Then came Generative AI. Same questions, fewer tabs. Ask, get an answer—clean, authoritative, often helpful. After a few wins, we start to trust it. It’s like having a senior developer who never sleeps… until it’s confidently wrong.
Developers already have a name for this: vibe coding. The model sounds right, compiles sometimes, and invents a configuration flag when it’s feeling brave. Fine when Stack Overflow is the source of truth—you test it, it breaks, you fix it. But what happens when we aim that same confidence cannon at our corporate data?
When AI gets access to your internal truth
GenAI plugged into enterprise systems feels like magic: “Summarise last quarter’s performance.” “Explain the drop in churn.” “Which products drove margin?” It answers in perfect prose, with charts that look expensive. If your data is clean and your master entities are stable, this is glorious.
If not? You’ve built a confidence amplifier for inconsistencies.
- Authority first, audit later: The model speaks in certainties; humans rarely audit the first draft.
- Aggregation hides drift: If “customer” means three different things across systems, GenAI elegantly averages the contradictions (see the sketch after this list).
- Plausible glue: Mismatched keys and stale reference data become smooth narratives—wrong, but smooth.
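To make the aggregation point concrete, here is a deliberately tiny, hypothetical sketch (all system names and records are invented): three sources each count “customers” by their own definition, and a naive roll-up reports a confident total that none of them would recognise.

```python
# Hypothetical sources, each with its own idea of what a "customer" is.
crm = [
    {"id": "C-001", "name": "Alice Smith", "status": "active"},    # CRM: anyone with a contact record
    {"id": "C-002", "name": "Bob Jones", "status": "prospect"},
]
billing = [
    {"account": "A-9001", "name": "Alice Smith"},                  # Billing: anyone ever invoiced
    {"account": "A-9002", "name": "A. Smith"},                     # ...including a second spelling of Alice
]
support = [
    {"requester": "alice.smith@example.com"},                      # Support: anyone who raised a ticket
]

# A naive "how many customers do we have?" simply sums the sources.
naive_total = len(crm) + len(billing) + len(support)
print(f"Customers: {naive_total}")  # prints 5, stated with total confidence; the honest answer is closer to 2
```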
MDM: the boring prerequisite for useful AI
Master Data Management and data cleansing aren’t cosmetic; they’re the rules that stop “Alice Smith” being four different people, keep product hierarchies consistent, and prevent addresses, currencies, and statuses from playing musical chairs across systems. Without that foundation, AI doesn’t just make mistakes—it publishes them in well‑written English.
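As a sketch of what those rules look like in practice, here is a minimal, assumed example of one survivorship policy (“most recently updated non-empty value wins”); the records, the fields, and the policy itself are illustrative, not a prescription.

```python
from datetime import date

# Hypothetical duplicates for the same person, landed from different systems.
records = [
    {"source": "crm", "name": "Alice Smith", "email": "alice@example.com", "updated": date(2024, 3, 1)},
    {"source": "billing", "name": "A. Smith", "email": None, "updated": date(2023, 11, 5)},
    {"source": "support", "name": "Alice Smyth", "email": "alice@example.com", "updated": date(2022, 6, 20)},
]

def golden_record(duplicates):
    """Survivorship: for each attribute, keep the newest non-empty value."""
    newest_first = sorted(duplicates, key=lambda r: r["updated"], reverse=True)
    return {field: next((r[field] for r in newest_first if r[field]), None)
            for field in ("name", "email")}

print(golden_record(records))  # {'name': 'Alice Smith', 'email': 'alice@example.com'}
```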
A quick sanity check
Switch off every pipeline. Which master entities could you still trust tomorrow: Customer, Product, Supplier? If the answer is “depends which system,” you have an MDM problem, not a platform one.
Now the uncomfortable question: why don’t job specs say this out loud?
Here’s the twist. Skimming UK listings, you’ll find plenty of Data Engineer, Architect, and Analytics roles, but relatively few that explicitly call out “data cleansing” or MDM. The tooling parade is long (Spark, Snowflake, dbt); the quality work is a footnote, if it appears at all.
Why the omission?
- It’s not glamorous: “We prevent duplicate customers” doesn’t sparkle like “We stream billions of events.”
- One‑off myth: “We’ll cleanse during migration” (translation: we won’t revisit it).
- Ownership fog: If quality is “everyone’s job,” it becomes no one’s responsibility.
- Tool over outcome: Listing platforms feels concrete; promising “high‑integrity master data” sounds like homework.
So, are organisations confident—or just quiet? Two possibilities:
- They’ve nailed data quality and see no need to advertise it. (Rare, but bless them.)
- They’re under‑investing in MDM and cleansing, assuming modern stacks will magically fix semantics. (Common, and expensive.)
If the second sounds familiar, expect AI to amplify the wobble. Shiny pipelines are great. Shaky truths are costly.
A quick snapshot of UK job listings shows “data cleansing” appears, but not nearly as often as you’d expect given the stakes. Counts fluctuate and many boards scope by location, so treat this as directional, not definitive. Still, the pattern holds: tooling gets the headline; cleansing and MDM are quietly implied or omitted.
What to put in the job spec—no fluff
- Engineer: Implement and monitor data quality rules with automated alerts; design remediation playbooks.
- Architect: Define MDM patterns (golden records, survivorship, hierarchy management) and enforce propagation.
- Analytics: Model dimensions/facts against master data; treat KPI definitions as versioned contracts.
- Stewardship: Own core entity definitions; publish trust metrics (completeness, uniqueness, validity). See the sketch after this list.
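To show what “publish trust metrics” might mean day to day, here is a minimal sketch; the sample table, the email rule, and the idea of wiring these numbers into alerts are assumptions for illustration only.

```python
import re

# Hypothetical master Customer table; in practice this comes from your MDM hub or warehouse.
customers = [
    {"customer_id": "C-001", "email": "alice@example.com"},
    {"customer_id": "C-002", "email": "bob@example"},      # fails the validity rule
    {"customer_id": "C-002", "email": "bob@example.com"},  # duplicate key
    {"customer_id": "C-003", "email": None},               # missing email
]

EMAIL = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
total = len(customers)

completeness = sum(1 for c in customers if c["email"]) / total
uniqueness = len({c["customer_id"] for c in customers}) / total
validity = sum(1 for c in customers if c["email"] and EMAIL.match(c["email"])) / total

# Published as a scorecard, or fed to alerting when a metric drops below a threshold.
print(f"completeness={completeness:.0%}  uniqueness={uniqueness:.0%}  validity={validity:.0%}")
# completeness=75%  uniqueness=75%  validity=50%
```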
The punchline
Search engines made us sceptical; GenAI makes us decisive. That’s an upgrade only if your data deserves the confidence. Put cleansing and MDM on the critical path—and in the job descriptions. Otherwise, you’ll get authoritative answers that sound brilliant, cost real money, and quietly disagree with reality.
Single‑page snapshot: On one Reed results page (~12 postings visible), 2 explicitly used “data cleansing/data cleanse,” while ~10 leaned on adjacent phrasing (“data quality,” “preparation,” “migration”)—a suggestive gap, not a final verdict.