Building a smart-metering data platform for a UK energy supplier

Client

A top-six UK energy retailer

Sector

Energy & Utilities

Engagement

Strategy, delivery and embedded engineering - multi-quarter programme.

The challenge

What the client needed

The client had grown rapidly through acquisition and was running smart-metering data across four different storage systems, with inconsistent quality, no end-to-end lineage and a ten-day delay between meter reading and customer-facing insight. Regulatory reporting was manual, error-prone and increasingly unsustainable as the rollout of half-hourly settlement approached.

Our approach

How we worked

Ran a six-week discovery to map every consumer of smart-metering data - internal teams, external regulators and customer-facing products - and quantify the impact of the existing latency and quality issues.
Designed a target architecture based on a lakehouse pattern with open table formats, separating raw, conformed and product-ready zones with explicit data contracts between them.
Stood up an embedded squad of senior data engineers, SREs and a product manager working alongside the client’s in-house team, using paired delivery from day one.
Replaced manual regulatory extracts with an automated reporting service driven from the conformed zone, with full lineage back to source meter readings.
Built a customer-facing API layer that powered three new products in the first 12 months, including a personal-usage app feature with measurable retention impact.
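
To illustrate what an explicit data contract between zones can look like, here is a minimal Python sketch. The field names (meter_id, read_at, kwh), the contract and the violations helper are hypothetical and are not the client's actual schema.

```python
from datetime import datetime

# Hypothetical contract for half-hourly reads entering the conformed zone.
# Field names and types are illustrative assumptions only.
CONFORMED_READ_CONTRACT = {
    "meter_id": str,
    "read_at": datetime,
    "kwh": float,
}


def violations(record: dict, contract: dict) -> list[str]:
    """Return a list of human-readable contract violations for one record."""
    problems = []
    for field, expected_type in contract.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not isinstance(record[field], expected_type):
            problems.append(f"wrong type for {field}: {type(record[field]).__name__}")
    # Reject fields the contract does not declare, so schemas cannot drift silently.
    for field in record:
        if field not in contract:
            problems.append(f"undeclared field: {field}")
    return problems


good = {"meter_id": "M-001", "read_at": datetime(2024, 1, 1, 0, 30), "kwh": 0.42}
bad = {"meter_id": "M-001", "kwh": "0.42", "source": "legacy"}

print(violations(good, CONFORMED_READ_CONTRACT))
print(violations(bad, CONFORMED_READ_CONTRACT))
```

In a sketch like this, a record only crosses a zone boundary when its list of violations is empty, so producers and consumers argue about the contract rather than about each other's data.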

Outcomes

Measured results

All figures verified with the client. Specific identifiers withheld in line with our standard confidentiality terms.

End-to-end latency from meter reading to customer-facing insight reduced from 10 days to under 30 minutes.
Regulatory reporting cycle time reduced by 80% with zero significant data quality findings in the following two reporting periods.
Three new customer-facing products launched on the platform within 12 months, with combined attributable margin improvement in the low single-digit millions.
Total cost of ownership across the previous four data stores reduced by 42% within 18 months, after decommissioning.
Internal data engineering team transitioned to full ownership of the platform within nine months.

“We have tried this twice before. The difference this time was the engineering depth on the ground and the discipline of the data contracts approach - both of which Halfteck brought from day one.”
- Director of Data, UK Energy Retail

Working on something similar?

If this engagement looks like the kind of problem you are facing, we would be glad to compare notes by email.

sales@halfteck.com

Context and constraints

The supplier had grown through a combination of acquisition and consolidation, and as a result the smart-metering estate had become a patchwork of legacy data stores. Reads arrived from the national metering infrastructure in several formats, were landed in a handful of relational databases, and were then copied, transformed and re-copied by a long tail of overnight batch jobs. Each downstream team had quietly built its own version of the truth, which meant that billing, customer service and regulatory reporting frequently disagreed about something as basic as a single household's consumption for a given day.

The constraints were significant. Smart-metering data is governed tightly in the United Kingdom, and the supplier was bound by strict obligations around consent, data retention and the separation of personal and consumption data. Any platform we built had to honour those obligations by design rather than as an afterthought. There was also no appetite for a risky big-bang cut-over: with more than five million domestic customers depending on accurate bills, the business needed a migration path that preserved continuity at every step. Finally, the existing batch windows were already overrunning, so the new platform had to be materially faster while costing less to operate.

The approach in depth

We began with a short discovery phase to map every source, every consumer and every transformation in the current estate. This was deliberately unglamorous work, but it surfaced the hidden couplings that tend to derail modernisation programmes. We catalogued data lineage end to end, identified which transformations were genuinely business logic and which were merely accidents of history, and agreed a canonical data model that all downstream teams could rally around. That canonical model became the contract for everything that followed.

From there we designed a layered platform built around a clear separation of concerns. A landing zone ingested raw reads exactly as received, preserving them immutably for audit and replay. A curation layer applied validation, de-duplication and the agreed canonical model. A serving layer then exposed clean, governed datasets to consumers through well-defined interfaces. Crucially, we treated the canonical model as the only sanctioned route to consumption data, which over time allowed us to retire the shadow copies that had caused so much disagreement.
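
The split between an immutable landing zone and a curation layer can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the platform's code: the RawRead fields, the negative-consumption rule and the policy that a later arrival supersedes an earlier one for the same meter and period are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen: a landed read cannot be mutated after the fact
class RawRead:
    meter_id: str
    period_start: str  # ISO 8601 start of the half-hour period (assumed format)
    kwh: float
    received_seq: int  # arrival order in the landing zone (assumed field)


def curate(landed: list[RawRead]) -> list[RawRead]:
    """Validate and de-duplicate reads; the landed list itself is never modified."""
    latest: dict[tuple[str, str], RawRead] = {}
    for read in sorted(landed, key=lambda r: r.received_seq):
        if read.kwh < 0:  # illustrative validation rule only
            continue
        # Assumed policy: a later arrival for the same meter and period wins.
        latest[(read.meter_id, read.period_start)] = read
    return sorted(latest.values(), key=lambda r: (r.meter_id, r.period_start))


landing_zone = [
    RawRead("M-001", "2024-01-01T00:00", 0.40, 1),
    RawRead("M-001", "2024-01-01T00:00", 0.42, 2),  # corrected resend
    RawRead("M-002", "2024-01-01T00:00", -1.0, 3),  # fails validation
]
curated = curate(landing_zone)
print(curated)
```

Because curation only reads from the landing zone, any period can be re-curated from the raw reads if a rule changes, which is the replay property described above.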

We also invested early in data quality as a first-class capability rather than a bolt-on. Validation rules, completeness checks and reconciliation against control totals were embedded into the pipelines so that problems were caught at the point of ingestion rather than discovered weeks later in a regulatory submission. We typically find that this front-loading of quality effort pays for itself many times over, because the cost of a defect rises sharply the further downstream it travels.
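
As a rough illustration of checks at the point of ingestion, the Python sketch below shows a completeness check and a reconciliation against a control total. The function names and the tolerance are assumptions, and the figure of 48 periods per day ignores clock-change days.

```python
import math

EXPECTED_PERIODS_PER_DAY = 48  # half-hourly reads; clock-change days are ignored here


def completeness(periods_received: set[int]) -> float:
    """Fraction of the day's half-hour periods (numbered 0 to 47) that have a read."""
    valid = {p for p in periods_received if 0 <= p < EXPECTED_PERIODS_PER_DAY}
    return len(valid) / EXPECTED_PERIODS_PER_DAY


def reconciles(
    reads_kwh: list[float],
    control_total_kwh: float,
    tolerance_kwh: float = 0.001,  # hypothetical tolerance
) -> bool:
    """Check that ingested reads sum to the control total supplied with the batch."""
    return math.isclose(math.fsum(reads_kwh), control_total_kwh, abs_tol=tolerance_kwh)


print(completeness(set(range(24))))
print(reconciles([0.1, 0.2, 0.3], 0.6))
```

A batch that fails either check would be held at ingestion rather than passed on, which is what keeps defects from reaching a regulatory submission.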

Delivery phases and sequencing

We sequenced delivery to reduce risk and to demonstrate value early. The first phase stood up the landing zone and curation layer for a single, well-understood data domain, running it in parallel with the legacy estate so that outputs could be reconciled line by line. Only once we were confident the new platform matched or improved on the old one did we begin redirecting consumers.
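
A line-by-line reconciliation of the two systems during parallel running might look like the following Python sketch. The key format (meter and day joined by a pipe), the tolerance and the sample figures are hypothetical.

```python
def reconcile(
    legacy: dict[str, float],
    new: dict[str, float],
    tolerance: float = 1e-6,  # hypothetical tolerance for floating-point noise
) -> dict[str, list[str]]:
    """Compare per-key outputs from both systems and bucket the differences."""
    report: dict[str, list[str]] = {
        "missing_in_new": [],
        "missing_in_legacy": [],
        "mismatched": [],
    }
    for key in sorted(legacy.keys() | new.keys()):
        if key not in new:
            report["missing_in_new"].append(key)
        elif key not in legacy:
            report["missing_in_legacy"].append(key)
        elif abs(legacy[key] - new[key]) > tolerance:
            report["mismatched"].append(key)
    return report


# Illustrative daily consumption (kWh) keyed by "meter|day".
legacy_output = {"M-001|2024-01-01": 9.6, "M-002|2024-01-01": 7.1, "M-003|2024-01-01": 3.3}
new_output = {"M-001|2024-01-01": 9.6, "M-002|2024-01-01": 7.4, "M-004|2024-01-01": 5.0}
report = reconcile(legacy_output, new_output)
print(report)
```

An empty report for a domain over an agreed period is the kind of evidence that supports redirecting its consumers.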

Subsequent phases onboarded additional domains and progressively decommissioned legacy jobs. We used a strangler pattern throughout, allowing the old and new systems to coexist while the centre of gravity shifted steadily towards the new platform. This meant the business never faced a single high-stakes weekend on which everything had to work; instead, risk was spread across many small, reversible steps. Each phase concluded with a formal verification against the source data and a sign-off from the relevant business owners, which kept trust high and avoided the accumulation of unvalidated change.
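
The strangler pattern can be sketched as a thin router that decides, per data domain, whether a request goes to the legacy estate or to the new platform. The class, the reader functions and the domain names below are hypothetical and exist only to show the shape of a small, reversible cut-over.

```python
from typing import Callable

Reader = Callable[[str], str]


def legacy_reader(meter_id: str) -> str:
    # Stand-in for a call into the legacy estate.
    return f"legacy:{meter_id}"


def platform_reader(meter_id: str) -> str:
    # Stand-in for a call into the new platform's serving layer.
    return f"platform:{meter_id}"


class StranglerRouter:
    """Routes each data domain to the legacy or new reader, one domain at a time."""

    def __init__(self, legacy: Reader, new: Reader) -> None:
        self._legacy = legacy
        self._new = new
        self._migrated: set[str] = set()

    def cut_over(self, domain: str) -> None:
        self._migrated.add(domain)

    def roll_back(self, domain: str) -> None:
        self._migrated.discard(domain)

    def read(self, domain: str, meter_id: str) -> str:
        reader = self._new if domain in self._migrated else self._legacy
        return reader(meter_id)


router = StranglerRouter(legacy_reader, platform_reader)
router.cut_over("consumption")
print(router.read("consumption", "M-001"))
print(router.read("billing", "M-001"))
```

Each cut-over touches one domain and can be undone with a single call, which is what spreads the risk across many small steps.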

Architecture, technology decisions and trade-offs

We favoured managed cloud services wherever they reduced operational toil, reserving bespoke engineering for the genuinely differentiating parts of the platform. The ingestion and storage layers leaned on object storage with a well-governed catalogue, which gave us cheap, durable retention and straightforward replay. For transformation we chose a declarative, version-controlled approach so that every change to business logic was reviewable, testable and reversible, rather than locked away in opaque scripts.

Not every decision was clear-cut, and we were candid with the supplier about the trade-offs. A more event-driven, near-real-time design was attractive and would have unlocked some customer-facing features sooner, but it carried higher operational complexity and a steeper learning curve for the in-house team. We therefore adopted a pragmatic middle path: a batch and micro-batch core that met the immediate need reliably, with clearly defined seams that would allow streaming to be introduced later for specific use cases without re-architecting the platform. We also accepted some additional storage cost in the landing zone in exchange for the audit and replay benefits of immutable raw data, judging that the regulatory and operational value comfortably outweighed the expense.
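
One way to picture a micro-batch core with a seam for later streaming is to have the processing step depend only on an iterable of reads. The Python sketch below is an assumption-laden illustration: the function names and record shape are hypothetical, and the source could equally be a file reader today or a stream consumer later.

```python
from itertools import islice
from typing import Iterable, Iterator


def micro_batches(reads: Iterable[dict], size: int) -> Iterator[list[dict]]:
    """Group any iterable of reads into micro-batches of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(reads)
    while batch := list(islice(iterator, size)):
        yield batch


def process(batch: list[dict]) -> float:
    """Placeholder for the curation step: here it only totals consumption."""
    return sum(read["kwh"] for read in batch)


# The source is a generator; the core does not care where the reads come from.
source = ({"meter_id": f"M-{i}", "kwh": 1.0} for i in range(5))
totals = [process(batch) for batch in micro_batches(source, size=2)]
print(totals)
```

Because the batching function never asks where the reads originate, swapping the source for a streaming consumer would not require changes to the processing step.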

Measurable outcomes

The platform delivered improvements across cost, speed and trust. Overnight processing that had been at risk of breaching its window now completed comfortably with headroom to spare, and the consolidation of shadow copies meant that billing, customer service and regulatory teams finally worked from a single, reconciled view. We typically see a marked reduction in data-related billing disputes once a canonical model is enforced, and this engagement followed that pattern.

Just as importantly, the platform became an enabler rather than a constraint. With clean, governed datasets available through stable interfaces, the supplier's product teams were able to launch new customer-facing propositions, such as richer consumption insights and tariff guidance, far more quickly than before. The marginal cost of a new data product fell substantially, because the heavy lifting of ingestion, quality and governance had already been done once and could be reused.

Single canonical model agreed across billing, service and regulatory teams, retiring shadow copies of consumption data.
Immutable landing zone giving full auditability and the ability to replay any period on demand.
Embedded data quality with validation and reconciliation at the point of ingestion rather than downstream.
Strangler-pattern migration running old and new in parallel to avoid a risky big-bang cut-over.
Reusable serving interfaces that cut the marginal cost and lead time of launching new data products.
Privacy and consent by design aligned to UK smart-metering obligations from the outset.

Lessons learned

The clearest lesson was the value of treating discovery as engineering rather than paperwork. The time spent mapping lineage and untangling accidental complexity repaid itself many times over, because it allowed us to migrate with confidence rather than hope. A second lesson was the discipline of parallel running and reconciliation: it is tempting to switch off legacy systems quickly to bank the savings, but the trust earned by proving equivalence line by line was worth the modest extra cost of running both for a while.

Finally, we were reminded that the most durable platforms are those that make the right thing the easy thing. By making the canonical model the path of least resistance for consumers, we did not have to police compliance through governance committees; teams adopted it because it was simply the most convenient and reliable source available. That principle, designing for adoption rather than mandating it, is something we carry into every data engagement.

If you are wrestling with a fragmented data estate and need to modernise without putting billing or regulatory reporting at risk, we would be glad to help. Talk to us about a similar engagement. Email sales@halfteck.com.