A walk-through of the methodology we've shipped across forty-plus migration programmes since 2010 — together with a concrete anonymised case study from a 2025 UK challenger-bank engagement. Co-written by the five engineers who staffed it.
Every cloud migration is different. Every cloud migration that goes well is different in roughly the same ways. After forty-plus programmes, here's the structure we've settled on — and a concrete case study from a 2025 engagement that exercised every part of it.
Methodology gets a bad name because it's usually presented as a 60-slide framework that nobody on the delivery team actually reads. What we mean by methodology is much narrower: a set of standing decisions about how we approach the problem, so that we're not re-debating first principles on every engagement.
The point of a good methodology isn't rigidity. It's shared defaults — so that when a real surprise comes up (and one always does), the team has more energy to think about it. The bits that aren't surprising should be on autopilot.
To make the next six sections concrete, we'll thread a real engagement through them. The client was a UK challenger bank, ~£280M GMV through their card products, ~420 staff, FCA-regulated. They came to us with a legacy estate hosted across two co-lo data centres in Slough and Manchester — Java monoliths against an Oracle Exadata, batch jobs running through TWS, modest Linux fleet, on-prem Splunk for SOC.
The brief was straightforward to write and hard to scope: move to public cloud, satisfy FCA operational resilience (SS1/21), reduce infrastructure spend, and have it done before the data-centre lease at Slough expired in eighteen months. We ran the diagnostic over two weeks, then the full programme over thirty-six.
The five of us each owned a workstream. Muffaddal ran the engagement and held the architecture record. Vijeet led the cloud landing zone and infrastructure-as-code. Taha ran the security and FCA-resilience workstream end-to-end. Jimmy owned the data-migration strategy, including the Oracle-to-Aurora cutover that turned out to be the hardest single piece of work. Ashok embedded with the bank's two Java teams to refactor and re-platform the monoliths a slice at a time.
The first two weeks of every migration are spent understanding what you actually have — which is almost always different from what the architecture diagrams say.
Specifically, we want answers to three questions. What's running? An inventory at the workload level, not the server level. What does it cost? The current run-rate, broken down by workload, including the costs that won't show up on any cloud bill: licensing, on-prem hardware refresh, people. What does it depend on? The messy graph of which workloads talk to which, including the integrations nobody documented.
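As a sketch of what the dependency question looks like in practice, assuming you have flow logs and an interview-built host-to-workload map (the hostnames and workload names below are invented): edges between known workloads become the dependency graph, and hosts nobody claims become the shadow-IT candidate list.

```python
from collections import defaultdict

# Hypothetical input: observed (source_host, dest_host) pairs from flow logs,
# plus a host -> workload mapping built during discovery interviews.
flows = [
    ("app-01", "db-03"),
    ("app-01", "db-03"),      # duplicates are expected; the graph dedupes
    ("batch-07", "app-01"),
    ("batch-07", "smtp-02"),
]
host_to_workload = {
    "app-01": "card-issuance-api",
    "db-03": "core-ledger-db",
    "batch-07": "eod-settlement-batch",
    # "smtp-02" is deliberately unmapped: hosts with no owner are the
    # shadow-IT candidates the system catalogue missed.
}

def dependency_graph(flows, host_to_workload):
    graph = defaultdict(set)   # workload -> set of workloads it calls
    unmapped = set()           # hosts nobody claimed in interviews
    for src, dst in flows:
        s = host_to_workload.get(src)
        d = host_to_workload.get(dst)
        for host, workload in ((src, s), (dst, d)):
            if workload is None:
                unmapped.add(host)
        if s and d and s != d:
            graph[s].add(d)
    return dict(graph), unmapped

graph, unmapped = dependency_graph(flows, host_to_workload)
```

Run over two weeks of traffic, the `unmapped` set is what turns a 47-application catalogue into a 71-application inventory.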
The architecture diagram is always a lie of omission. The current-state map is the truth.
Jimmy on the case: the bank had a documented system catalogue listing 47 applications. After two weeks of discovery interviews and traffic analysis we'd identified 71. The 24 missing ones were a mixture of shadow IT, vendor-managed appliances nobody had updated the catalogue for, and three batch jobs that had been running on a Windows desktop under a developer's desk since 2017. None of this is unusual.
With the current state mapped, we work backwards from the outcome. Not "we'll move to AWS" — that's a destination, not a target state. The target state we care about is the operating model: how the platform will be supported, who'll have access, how changes flow, what the SLOs are, how cost is tracked and attributed.
Three options at this stage, always. The recommendation, the alternative we'd be happy with, and the one we'd advise against. Even if the recommendation is obvious, naming the others forces clarity about what's being chosen and what's being given up.
Muffaddal on the case: we presented three options — full AWS landing zone with multi-account separation, AWS plus a UK sovereign-cloud island for the most sensitive PII, or Azure with a Microsoft-led identity story. Recommendation: option one. The bank's CIO had been on the Azure side in a previous role and pushed back. We wrote the alternative-options summary specifically so the board could see what they'd be choosing if they overruled us. They didn't.
The most common cause of migration regret isn't a technical failure. It's a sequencing failure — moving the workloads in the wrong order. The principle we use is straightforward: migrate the workloads that earn the most leverage soonest. Usually that means starting with the ones that are most painful in the current environment (slow to deploy, hard to scale, dependent on legacy capacity) — because the relief is immediate and visible, and you build credibility for the rest of the programme.
The opposite mistake is starting with the easiest workload because it's the easiest. That's the slowest path: nothing meaningful improves, sponsors get nervous, and the difficult workloads still have to be done at the end with less goodwill.
Ashok on the case: the obvious "easy first" candidate was the company intranet — three pages of HTML and a search box. Boring, low-risk, no users at risk. We deliberately skipped it. The first workload we moved was the card-issuance API — the slowest-deploying, most-on-call-paged service in the estate. Six weeks in, the on-call team's pager nights dropped from twelve to two. That's the credibility budget for everything that came after.
If the migration has any complexity, it lives here. Compute and code are relatively easy to move. Data, especially live transactional data, is where careers go wrong. There are really only three strategies, and you must pick one consciously: a big-bang cutover (freeze writes, copy, verify, switch), a phased batch sync (periodic replication with a reconciliation pass before the switch), or continuous replication via change data capture (CDC), where the target shadows the source until you're confident enough to flip.
We pick based on the answer to one question: what's the worst acceptable outcome if data drifts for an hour? If the answer involves regulators or refunds, CDC. If the answer is "we just re-run a reconciliation overnight", cutover.
Jimmy on the case: banking transaction data sits squarely in the "regulators or refunds" category, so CDC was the only option. We used Debezium against Oracle GoldenGate to stream into Aurora Postgres, with reconciliation jobs running every fifteen minutes against a deliberate two-week shadow period. The hardest single problem in the entire programme was a clock-drift issue between the Oracle source and the Aurora target that introduced a 47-millisecond skew on transaction ordering. We caught it because the reconciliation jobs were tracking it. Without the shadow period, it would have gone live and been very difficult to unwind.
We've never run a migration cutover without a documented rollback path — even for clients who insisted they didn't need one. Twice in fifteen years we've actually had to use it. Both times the team that designed it didn't believe we'd ever need it, and both times it saved the engagement.
The rollback isn't just a technical procedure. It's a decision tree: at each stage of cutover, what would have to go wrong for us to abort, who has the authority to make that call, and what the recovery path looks like for everything that's already moved. Most teams write the technical procedure and skip the decision tree. The decision tree is what matters.
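One way to make that concrete: the decision tree is small enough to be data, not prose. The gates, conditions, and authorities below are hypothetical placeholders rather than any engagement's actual tree; the point is that every stage names an abort condition, a decision-maker, and a recovery path, so nobody is improvising at 3 a.m.

```python
from dataclasses import dataclass

@dataclass
class Gate:
    stage: str
    abort_if: str        # the condition that triggers an abort
    authority: str       # who can make the call
    recovery: str        # what rolling back from here involves

# Hypothetical gates for illustration only.
GATES = [
    Gate("freeze writes", "replication lag > 5 min", "engagement lead",
         "unfreeze, nothing moved"),
    Gate("switch ingress", "elevated error rate or packet loss", "engagement lead",
         "re-point ingress to legacy, replay in-flight transactions"),
    Gate("decommission legacy path", "any unresolved reconciliation finding", "COO",
         "none: this is the no-return point, so it gates hardest"),
]

def authority_for(stage: str) -> str:
    return next(g.authority for g in GATES if g.stage == stage)
```

Writing it as data forces the questions the prose version lets you skip: every stage must have an owner, and the stage with no recovery path is, by construction, the one with the most senior authority.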
Taha on the case: on the final cutover weekend we had a four-stage rollback decision tree, with named authority at each gate — the bank's COO at the highest, our engagement lead (Muffaddal) at the lowest. At stage two we caught a packet-loss spike on the new ingress (turned out to be a misconfigured route in the transit gateway). We were forty-three minutes from the no-return point. Muffaddal called the abort, we rolled the in-flight transactions back to the legacy stack, fixed the route, and restarted the cutover six hours later. The bank's regulator was watching this in real time and signed off the cutover the following Tuesday — partly because the rollback worked cleanly.
Two weeks of post-migration operation, then we leave. That's the design. We hand back runbooks, dashboards, an updated architecture record, and recorded training sessions. The team that takes over is the team you already had, augmented by whatever hiring we recommended during the engagement.
The temptation, on both sides, is to convert the engagement into ongoing managed services. Sometimes that's the right call — but it's a separate decision, scoped fresh, with the cost-benefit honestly examined. Migration energy and ongoing-operations energy are different things, and you don't want one bleeding into the other unconsciously.
Vijeet on the case: we left the bank with twenty-eight Terraform modules, a documented landing-zone account structure, runbooks for the seven operational scenarios that mattered, and a hiring brief for two SREs they recruited inside three months. The bank chose not to retain us for ongoing managed services — they had the team in place. Eighteen months later they came back for a separate FinOps engagement to consolidate three years of reserved-instance and savings-plan commitments. That's the relationship the methodology is designed to produce.
For the case study above, at twelve-month steady state:
Two weeks discovering what's really there. Two weeks designing what should replace it. Most of the rest is sequencing, data strategy, and cutover discipline. The methodology isn't magic — it's a set of standing decisions that free the team to think about the surprises that actually matter on your specific engagement. The case study walks through one of those engagements; we've done forty more like it.