Real-time supply chain visibility platform for a UK parcel carrier

Client

A major UK parcel and express freight carrier

Sector

Logistics & Transport

Engagement

Architecture, engineering delivery and embedded SRE - multi-quarter programme.

The challenge

What the client needed

The carrier had expanded through acquisitions over five years and was running parcel tracking and depot management across three separate legacy systems that could not share data in real time. Operations dashboards were updated by overnight batch jobs, meaning supervisors made decisions on information up to twelve hours old. Customers received estimated delivery windows calculated at booking rather than live ETAs based on actual route progress. Seasonal surges caused batch jobs to overrun, driving high inbound contact volumes that were largely attributable to poor visibility rather than genuine delivery failures.

Our approach

How we worked

Ran a discovery phase to map every tracking data source, consumer, and latency profile across the existing estate, including depot scanning, driver mobile events, and third-party carrier feeds.
Designed an event-driven target architecture that replaced batch polling with a real-time event stream, using a canonical event schema agreed across all three legacy source systems.
Stood up an embedded squad of platform engineers and SREs working alongside the client's operations and software teams from the first sprint.
Built a unified operational visibility layer giving depot and regional operations teams live throughput, exception, and capacity data for the first time.
Delivered a customer-facing tracking API with sub-minute refresh rates, replacing the static estimated delivery windows shown throughout the customer journey.
Ran the new platform in parallel with legacy batch processes for each domain, reconciling outputs line by line before redirecting consumers.

Outcomes

Measured results

All figures verified with the client. Specific identifiers withheld in line with our standard confidentiality terms.

Tracking data latency reduced from up to 12 hours to an average of under 90 seconds end to end.
Inbound customer contact related to tracking and delivery status queries reduced by 38% in the six months following go-live.
Three consecutive seasonal peak periods handled without a single batch overrun incident on the new platform.
Regional operations teams reported materially faster exception identification and recovery, reducing the average time to resolve a depot-level problem by more than half.
In-house engineering team transitioned to full ownership of the platform within eight months of the initial delivery squad standing up.

"We knew the data was there. The problem was that it arrived too late to act on. This platform changed the relationship between our operations teams and the information fundamentally - we are running on live data for the first time, and it shows in how quickly we catch and recover from problems."
- Chief Operating Officer, UK Parcel Carrier

Working on something similar?

If this engagement looks like the kind of problem you are facing, we would be glad to compare notes by email.

sales@halfteck.com

Context and constraints

The carrier's operational footprint had grown substantially through a series of acquisitions, each bringing its own depot management software, scanning infrastructure, and data conventions. The result was a fragmented estate where a single parcel's journey could generate events in three different systems, none of which had been designed to talk to each other in real time. The overnight batch process that consolidated these events into a single view had grown over years into a fragile chain of dependencies: if any stage overran, or a source file arrived late, the entire chain was delayed and operations began the day working from yesterday's numbers.

The constraints were both technical and organisational. The client could not shut down existing systems until replacements were proven, which ruled out any big-bang migration. The seasonal peaks, particularly the pre-Christmas and sale periods, imposed hard delivery windows on the programme itself: new capabilities had to be stable and bedded in before any major volume event, or the risks were unacceptable. There was also a genuine capability gap to address: the in-house engineering team was experienced in the existing batch systems but had limited exposure to event-driven architecture, so the programme had to be a skills transfer as well as a technical delivery.

The approach in depth

We began by spending time with the people who actually used the data: depot supervisors, regional operations managers, customer service agents, and the commercial team who set delivery promise windows. This was important because the data problems were well understood at the systems level, but the business impact had never been precisely quantified. The discovery surfaced that a significant proportion of inbound customer contact, which the client had attributed to delivery exceptions, was actually driven by customers who had not received a delivery update after the initial booking confirmation and were checking in proactively. That finding shaped the priority order for the programme significantly.

From discovery we moved to architecture. The central design decision was to introduce a canonical event schema, agreed across all three source systems, that would represent tracking milestones in a common format regardless of which underlying system generated them. This canonical schema became the backbone of everything that followed. Rather than building point-to-point integrations between the legacy sources and each consumer, we built a single event stream that any producer could write to and any consumer could read from. That separation meant legacy systems could be replaced incrementally without forcing downstream consumers to change at the same time.
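As an illustration of the idea, here is a minimal Python sketch of a canonical event with two source adapters. Everything specific in it is an assumption made for the example: the field names, the milestone set, the status codes, and the record shapes of the two hypothetical systems are not taken from the client's actual schema.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import Enum


class Milestone(Enum):
    COLLECTED = "collected"
    AT_DEPOT = "at_depot"
    OUT_FOR_DELIVERY = "out_for_delivery"
    DELIVERED = "delivered"


@dataclass(frozen=True)
class TrackingEvent:
    """Canonical tracking milestone, independent of the originating system."""
    event_id: str
    parcel_id: str
    milestone: Milestone
    occurred_at: datetime  # timezone-aware, normalised to UTC
    source_system: str


# Hypothetical legacy status codes; the real mappings are not given in this case study.
SYSTEM_A_CODES = {"COL": Milestone.COLLECTED, "DEP": Milestone.AT_DEPOT,
                  "OFD": Milestone.OUT_FOR_DELIVERY, "DEL": Milestone.DELIVERED}
SYSTEM_B_CODES = {"picked_up": Milestone.COLLECTED, "arrived_hub": Milestone.AT_DEPOT,
                  "on_vehicle": Milestone.OUT_FOR_DELIVERY, "delivered": Milestone.DELIVERED}


def from_system_a(record: dict) -> TrackingEvent:
    """Assumed shape: short status codes and Unix epoch seconds."""
    return TrackingEvent(
        event_id=f"a:{record['scan_ref']}",
        parcel_id=record["consignment"],
        milestone=SYSTEM_A_CODES[record["status"]],
        occurred_at=datetime.fromtimestamp(record["epoch_seconds"], tz=timezone.utc),
        source_system="system_a",
    )


def from_system_b(record: dict) -> TrackingEvent:
    """Assumed shape: descriptive event names and ISO 8601 timestamps with an offset."""
    return TrackingEvent(
        event_id=f"b:{record['id']}",
        parcel_id=record["parcel"],
        milestone=SYSTEM_B_CODES[record["event"]],
        occurred_at=datetime.fromisoformat(record["timestamp"]).astimezone(timezone.utc),
        source_system="system_b",
    )
```

Two records that describe the same depot arrival in different conventions come out as comparable TrackingEvent values, so a consumer only ever needs to understand one format.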

We structured the platform in three layers. An ingestion layer accepted events from each source system, applied the canonical schema, and published to the stream. A processing layer handled enrichment, route calculation, and exception detection in near real time. A serving layer exposed the processed data through clearly versioned APIs, one oriented to operational consumers, another to the customer-facing tracking experience. The serving interfaces were designed to be stable even as the upstream processing evolved, which protected the customer-facing products from internal changes.
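The separation between the three layers can be sketched in a few lines of Python. This is a toy, in-memory stand-in rather than the delivered platform: the list used as a stream, the "delivery_failed" exception rule, and the function names tracking_v1 and ops_v1 are all hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Event = Dict[str, str]

# Stand-in for the managed streaming service; a real stream would be durable and partitioned.
stream: List[Event] = []


def ingest(raw: dict, to_canonical: Callable[[dict], Event]) -> None:
    """Ingestion layer: apply the canonical schema, then publish to the stream."""
    stream.append(to_canonical(raw))


class Processor:
    """Processing layer: keeps the latest event per parcel and flags exceptions."""

    def __init__(self) -> None:
        self.latest: Dict[str, Event] = {}
        self.open_exceptions: Dict[str, List[str]] = defaultdict(list)

    def handle(self, event: Event) -> None:
        self.latest[event["parcel_id"]] = event
        if event["milestone"] == "delivery_failed":  # hypothetical exception rule
            self.open_exceptions[event["depot"]].append(event["parcel_id"])


def tracking_v1(p: Processor, parcel_id: str) -> dict:
    """Serving layer, customer-facing: exposes only what the customer needs."""
    event = p.latest[parcel_id]
    return {"parcel_id": parcel_id, "status": event["milestone"]}


def ops_v1(p: Processor, depot: str) -> dict:
    """Serving layer, operational: depot-level exception view."""
    return {"depot": depot, "open_exceptions": list(p.open_exceptions.get(depot, []))}
```

Because tracking_v1 and ops_v1 return fixed response shapes, the internals of Processor can change without either group of consumers noticing, which is the property the versioned serving interfaces were intended to provide.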

Delivery phases and sequencing

We sequenced delivery around value and risk. The first phase targeted the most data-rich and well-understood source system, running the new event stream in parallel with the existing batch process and reconciling outputs continuously. This gave us an early proving ground for the canonical schema and surfaced edge cases we would not have anticipated from documentation alone: scanning events that arrived out of order, duplicate records from driver devices reconnecting after a signal gap, and a small but consistent class of records that the source system produced in a non-standard format due to a legacy configuration nobody had documented.
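Two of those edge cases, out-of-order scans and duplicates from reconnecting devices, lend themselves to a short sketch. The Python below shows one common way to handle them, assuming every event carries a stable identifier and an event-time timestamp; it is an illustration of the general technique, not the processing logic used on the programme, and the names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class ParcelState:
    seen_event_ids: Set[str] = field(default_factory=set)
    latest_at: float = float("-inf")  # event time of the newest applied milestone
    latest_milestone: Optional[str] = None


def apply_event(state: ParcelState, event_id: str,
                occurred_at: float, milestone: str) -> bool:
    """Apply one event to a parcel's state. Returns False for a duplicate."""
    if event_id in state.seen_event_ids:
        return False  # already processed, e.g. a device resending after a signal gap
    state.seen_event_ids.add(event_id)
    # A late-arriving older scan is recorded as seen but must not overwrite
    # a newer status, so compare on event time rather than arrival order.
    if occurred_at > state.latest_at:
        state.latest_at = occurred_at
        state.latest_milestone = milestone
    return True
```

In this sketch the set of seen identifiers grows without bound; a production version would need to expire old identifiers, for example once a parcel has been delivered.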

Addressing those edge cases in the first phase meant subsequent phases onboarding the other two source systems were faster and more predictable. Each phase followed the same pattern: parallel running, automated reconciliation, a period of validation with operational users, then a formal sign-off before the legacy feed was deprecated. The strangler approach meant no single moment carried the full programme risk, and the operational teams built trust in the new data incrementally rather than being asked to switch from old to new overnight.
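The automated reconciliation step can be pictured as a keyed comparison between the legacy batch output and the view derived from the new stream. The following Python sketch assumes both sides can be reduced to a mapping of parcel identifier to status; that reduction, and the report field names, are assumptions made for the example.

```python
from typing import Dict, List


def reconcile(legacy: Dict[str, str], streamed: Dict[str, str]) -> Dict[str, List[str]]:
    """Compare parcel statuses from the legacy batch and the new stream."""
    shared = legacy.keys() & streamed.keys()
    return {
        "missing_from_stream": sorted(legacy.keys() - streamed.keys()),
        "missing_from_legacy": sorted(streamed.keys() - legacy.keys()),
        "mismatched": sorted(k for k in shared if legacy[k] != streamed[k]),
    }


def is_clean(report: Dict[str, List[str]]) -> bool:
    """True only when no discrepancy of any kind was found."""
    return not any(report.values())
```

A report that stays clean across a validation period is the kind of evidence that supports a formal sign-off; any non-empty category points directly at the records to investigate.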

We timed the phases deliberately to avoid the peak season windows. The first two phases completed before the summer volume uplift, and the third phase, which onboarded the most complex source system, was delivered in the quieter autumn period before the pre-Christmas peak. That sequencing required programme discipline and some commercial negotiation about what could be delivered when, but it avoided the alternative, which was testing a new production system under maximum load.

Architecture, technology decisions and trade-offs

The core technology choices were driven by the need to support high event throughput reliably while remaining operationally straightforward for an in-house team that was building its event-driven capabilities. We favoured managed cloud services for the streaming infrastructure, choosing a provider the client already had a commercial relationship with to reduce procurement and security review overhead. For the processing layer we adopted a declarative, version-controlled approach to transformation logic so that business rule changes were reviewable and testable before deployment, rather than embedded in scripts owned by individuals.

One significant trade-off involved the level of processing applied in the stream versus deferred to the serving layer. A more aggressive real-time enrichment design would have delivered richer data to consumers faster, but it added latency and complexity to the hot path and increased the blast radius of any processing failure. We took a conservative position: the stream carried canonical events with a defined minimum set of enrichment, and consumers that needed additional derived data requested it through the serving API rather than expecting it in the stream. That design proved its worth during an early incident when a third-party geocoding service used for route enrichment became temporarily unavailable; because enrichment was not in the hot path, the core tracking capability continued without interruption.
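The effect of keeping enrichment out of the hot path can be shown with a small Python sketch. It assumes a serving function that looks up the core tracking record first and treats geocoding as optional; the function names, the GeocoderUnavailable exception, and the record fields are hypothetical and stand in for whatever the real serving API and geocoding client look like.

```python
from typing import Callable, Dict, Optional


class GeocoderUnavailable(Exception):
    """Raised by the (hypothetical) geocoding client when the service is down."""


def get_tracking(parcel_id: str,
                 store: Dict[str, dict],
                 geocode: Callable[[str], str]) -> dict:
    """Serve core tracking data; add location enrichment only if it is available."""
    event = store[parcel_id]  # core data comes from the canonical stream
    location: Optional[str] = None
    try:
        location = geocode(event["depot_code"])
    except GeocoderUnavailable:
        pass  # degrade gracefully: the response is still valid without enrichment
    return {"parcel_id": parcel_id, "milestone": event["milestone"], "location": location}
```

If the geocoder fails, the caller still receives the parcel's milestone with an empty location, rather than an error for the whole request.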

We were also candid with the client about the ongoing cost profile of the new platform. Event-driven systems at this scale carry meaningful infrastructure running costs, and the team needed to understand the unit economics before committing. We modelled the cost against the expected reduction in customer contact volumes and the batch infrastructure that would be decommissioned, and the business case was clear, but only because we had the actual contact data from the discovery phase to anchor the projection.

Measurable outcomes

The most visible outcome was the change in how operations teams started their working day. Previously, the morning operational review was conducted against data that was up to twelve hours old, meaning decisions about resource allocation and exception response were made on a picture of the network that no longer reflected reality. With live data available, those decisions became genuinely responsive to current conditions, and the time between a problem developing and the operations team becoming aware of it fell significantly.

The customer-facing impact was measured through the contact centre data. The reduction in tracking-related inbound contact freed capacity that was redirected to handling genuine delivery exceptions, which improved resolution quality for customers with real problems. The commercial team also observed that conversion on the tracking-related touchpoints in the customer journey improved once live ETAs replaced static estimated windows, though attributing that cleanly required careful analysis given other concurrent changes to the customer experience.

Canonical event schema agreed across all three source systems, providing a single, consistent tracking record regardless of originating platform.
Event-driven architecture replacing overnight batch, with end-to-end latency under 90 seconds from scanning event to customer-visible update.
Parallel running and reconciliation for each source system migration, ensuring operational continuity at every stage of the transition.
Stable serving interfaces that decoupled downstream consumers from upstream processing changes, protecting the customer experience from internal delivery risk.
Embedded skills transfer so that the in-house team understood and owned every layer of the platform before Halfteck stepped back.
Season-aware phasing that ensured no major delivery milestone coincided with a high-volume operational period.

Lessons learned

The value of the discovery phase, specifically the time spent quantifying the business impact of the data latency rather than treating it as a self-evident technical problem, was the clearest lesson from this engagement. Knowing that a large share of inbound contact was driven by visibility gaps rather than delivery failures changed the priority order of the programme and made the business case for investment substantially more concrete. Technical teams often skip this step in favour of moving quickly to architecture, and it is almost always a mistake.

The second lesson was the importance of the canonical schema as a first-class deliverable rather than an early design artefact that gets overtaken by implementation detail. The schema required genuine negotiation between teams who had been running their own data conventions for years, and it required governance to ensure that new events were added through a defined process rather than by individual teams modifying their source feeds. That governance overhead felt heavy at the time; it was justified by the number of integration problems it prevented downstream.

Finally, the programme reinforced our view that the in-house team's ability to own the platform independently is not a nice-to-have; it is a delivery criterion. Capability transfer, achieved through paired delivery from the first sprint rather than a handover phase at the end, was what made the client's eight-month transition to full ownership realistic.

If you are running a logistics or distribution operation where data latency is limiting your ability to respond to operational problems or meet customer expectations, we would be glad to discuss what a similar programme might look like. Email sales@halfteck.com.