Your cloud estate has outgrown your security model. Now what?

Most UK financial services firms I talk to do not have a cloud strategy problem. What they have is cloud sprawl. The first wave of workloads went up three or four years ago, the security model went up with them, and nobody has had time to step back and ask whether any of it still hangs together. If that sounds familiar, this is for you.

If I'm honest, the real answer to "is our cloud secure?" in most regulated firms is "we don't really know any more". Not because anyone has been negligent, but because the estate has grown faster than the controls around it. New SaaS platforms got signed off by individual business units. Developers spun up environments to ship faster. The original landing zone was designed for two or three workloads and now hosts at least thirty. Every individual decision was reasonable. The cumulative result is something nobody can confidently describe end to end.

This is the bit the FCA actually cares about. Not whether you are in the cloud, but whether you can evidence that the controls protecting your important business services still work, across an estate that has changed considerably since you last lifted up the covers.

Three things tend to be true at this point. The original landing zone has been outgrown. Identity has fragmented across multiple tenants, SaaS platforms and federated providers. And nobody has a clean view of which workloads support which important business services under PS21/3. Sound familiar? Right, the rest of this post is about what to do about it.

Why the original landing zone usually breaks first

Most landing zones I review were designed in 2021 or 2022, when the firm had a handful of workloads and a clear migration roadmap. The architect made sensible choices at the time. Three years on, those same choices are creaking under ten times the load or more, and nobody has refactored them because everyone is too busy delivering features.

The usual symptoms:

Subscription sprawl with no consistent guardrails. New subs get spun up to unblock projects. The policies that should apply never get attached, or get exempted as a workaround to get an application live. You end up with a long tail of environments that drift away from your baseline.
Centralised logging that has quietly stopped being centralised. Some workloads forward to Sentinel, some go to a different SIEM, some go nowhere at all because the original diagnostic settings were never enforced as policy.
Network segmentation that worked for two hub-and-spoke environments and now does not scale. Hub firewalls become a bottleneck. Exceptions pile up. The original zero trust ambition gets quietly abandoned.
No clear ownership of the platform itself. The team that built the landing zone has rotated out, and the team running it now is firefighting tickets rather than refactoring foundations.

If any of that lands, the fix is not to start from scratch. It is to do a proper landing zone uplift: re-baseline your policies, enforce them at scale, and prioritise the gaps that map to your most material workloads first. Microsoft's Cloud Adoption Framework has a decent landing zone review pattern that gives you a structured way to do this without throwing the baby out with the bathwater.
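To make "re-baseline, then prioritise by materiality" concrete, here is a minimal sketch of a guardrail coverage check. It assumes you have already exported an inventory of subscriptions and their assigned policies from your own tooling. The policy names, the Subscription type and the supports_ibs flag are all hypothetical and exist only for illustration; they are not real Azure Policy definitions.

```python
from dataclasses import dataclass, field

# Hypothetical baseline: illustrative names, not real Azure Policy definitions.
BASELINE = {"require-diagnostic-settings", "deny-public-ip", "allowed-regions"}


@dataclass
class Subscription:
    name: str
    supports_ibs: bool  # hosts a workload behind an important business service?
    assigned_policies: set[str] = field(default_factory=set)


def guardrail_gaps(subs, baseline=BASELINE):
    """Return (subscription name, missing policies) pairs, most material first."""
    gaps = [(s, baseline - s.assigned_policies) for s in subs]
    gaps = [(s, missing) for s, missing in gaps if missing]
    # Subscriptions behind important business services sort first,
    # then those with the most missing guardrails.
    gaps.sort(key=lambda g: (not g[0].supports_ibs, -len(g[1]), g[0].name))
    return [(s.name, sorted(missing)) for s, missing in gaps]


# Made-up estate for the example.
estate = [
    Subscription("payments-prod", True, {"deny-public-ip", "allowed-regions"}),
    Subscription("dev-sandbox-07", False, set()),
    Subscription("core-platform", True, set(BASELINE)),
]

for name, missing in guardrail_gaps(estate):
    print(f"{name}: missing {', '.join(missing)}")
```

The sort order is the point: a single missing guardrail on a subscription behind an important business service is listed ahead of a sandbox that is missing everything.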

The trap firms fall into is treating this as a tech refresh. It is not. It is a controls problem with technology underneath, and your risk and compliance teams should be in the room from day one.

Identity is almost always where the real risk is hiding

Whenever I do a cloud security review for a financial services firm, identity is consistently the area that has degraded most since the original deployment. There are a few reasons for this.

The original Entra ID setup (or AAD, depending on whether you have got over Microsoft's obsession with changing names) was probably configured for one set of conditional access policies, a single MFA stance, and a manageable number of admin roles. Then mergers, acquisitions, contractor populations, customer-facing identities and SaaS platforms all got bolted on. Each addition made sense individually. The combined picture is now genuinely hard to reason about.

Privileged access has almost certainly grown beyond what was originally designed too. PIM was the plan, but nobody wants to be the person blocking a senior leader's permanent Global Admin assignment, so exceptions accumulate. By the time you do a posture review, the "select few" privileged accounts have quietly become forty.

Then there are the service principals and managed identities. This is the bit that worries me most in any review I do, because it is also the bit attackers know to look for. App registrations from three years ago are still active, with secrets that have been rotated zero times, granting permissions to APIs that nobody currently working at the firm fully understands. Nobody audits them because nobody owns them.

If your firm is in this position, prioritise three things:

A genuine privileged access cleanup. Not "we'll do PIM next quarter", but a hard reset on who has standing access to what, and why. Break glass accounts, just-in-time elevation, approval workflows for sensitive roles, and proper auditing of every privileged action.
An app registration and service principal review. Inventory every one. Check what permissions they hold. Rotate every secret. Disable anything that has not authenticated in 90 days. This is dull work and it matters more than almost anything else on the list.
A joiner-mover-leaver process that actually works across hybrid identity. Not the JML process you wrote in 2020. The one that handles the contractor who joined via a SaaS platform, picked up entitlements via a group they shouldn't be in, and is now offboarding while their access lingers across four different systems.
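The service principal review in particular lends itself to a simple triage rule. The sketch below assumes you have exported an inventory with an owner, a last sign-in date and a secret creation date for each principal. The ServicePrincipal type and the review function are hypothetical, and the 180-day secret age is my own placeholder rather than a recommendation; only the 90-day inactivity rule comes from the list above.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

STALE_AFTER = timedelta(days=90)      # from the list above: no sign-in for 90 days
MAX_SECRET_AGE = timedelta(days=180)  # assumption: substitute your own rotation policy


@dataclass
class ServicePrincipal:
    name: str
    owner: Optional[str]
    last_sign_in: Optional[date]  # None means no sign-in on record
    secret_created: date


def review(sp, today):
    """Return the actions a reviewer should take for one service principal."""
    actions = []
    if sp.last_sign_in is None or today - sp.last_sign_in > STALE_AFTER:
        actions.append("disable")
    if today - sp.secret_created > MAX_SECRET_AGE:
        actions.append("rotate-secret")
    if sp.owner is None:
        actions.append("assign-owner")
    return actions


# Made-up inventory and a fixed review date so the example is repeatable.
today = date(2025, 6, 1)
inventory = [
    ServicePrincipal("legacy-etl", None, date(2024, 11, 1), date(2022, 3, 1)),
    ServicePrincipal("payments-api", "platform-team", date(2025, 5, 28), date(2025, 4, 1)),
]

for sp in inventory:
    print(sp.name, review(sp, today) or "no action")
```

For the initial cleanup you would rotate every secret regardless of age, as the list says. The age threshold only becomes useful once the review turns into a recurring control.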

None of this is glamorous. All of it is what an FCA section 166 review would ask you to evidence.

Mapping cloud workloads to important business services

This is the bit that connects the technical work to the regulatory conversation, and it is where most firms have a gap.

Under PS21/3, you are required to identify your important business services, set impact tolerances for them, and evidence that you can stay within those tolerances under severe but plausible scenarios. The transition period ended on 31 March 2025, so this is no longer aspirational. The regulator expects you to be doing it now, and to be doing it well.

The cloud version of this question is straightforward to ask and harder to answer: which cloud services, in which regions, with which dependencies, support each of your important business services? If you cannot answer that for every service in scope, you have work to do.

The bit that catches firms out is the dependency chain. Your customer-facing platform might be hosted in Azure West Europe, but it depends on an Entra ID tenant, a third-party identity provider for federation, a SaaS platform for document signing, a managed database, and a payment gateway. A failure in any one of those breaks the service. Your impact tolerance has to account for the whole chain, not just the bit you operate.
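A rough bit of arithmetic shows why the chain matters. The sketch below assumes every dependency must be up for the service to work and that failures are independent, which is a simplification. The availability figures are purely illustrative and are not quoted from any provider's SLA. Impact tolerances are set as a maximum tolerable disruption rather than an availability percentage, so treat this as an illustration of compounding risk, not a tolerance calculation.

```python
import math

# Illustrative availability figures only; not taken from any provider's SLA.
chain = {
    "cloud hosting region": 0.9995,
    "Entra ID tenant": 0.9999,
    "federated identity provider": 0.999,
    "document signing SaaS": 0.999,
    "managed database": 0.9995,
    "payment gateway": 0.999,
}


def composite_availability(components):
    """Availability of a service that needs every component up.

    Assumes independent failures, so the result is the product of the parts.
    """
    return math.prod(components.values())


HOURS_PER_YEAR = 24 * 365
total = composite_availability(chain)
weakest = min(chain.values())

print(f"weakest single link: {weakest:.4%}")
print(f"whole chain:         {total:.4%}")
print(f"expected downtime:   {(1 - total) * HOURS_PER_YEAR:.1f} hours/year")
```

The composite figure is always worse than the weakest single link, which is why a tolerance based only on the platform you operate will be optimistic.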

This is also where the new Critical Third Parties regime comes in (FCA PS24/16 and PRA PS16/24, in force from 1 January 2025). The Treasury can now designate suppliers as CTPs where their failure could threaten financial stability. Importantly, the regime does not transfer your accountability to the supplier. You still own the risk. What it does is give the regulators powers to oversee the resilience of the suppliers you depend on most.

Practically, this means three things for your cloud estate:

Map your fourth-party dependencies, not just your third parties. If your SaaS provider depends on AWS, you depend on AWS. The regulator expects you to understand that.
Have a stressed exit plan for every material cloud service, not just a contractual right to exit. The difference matters. Stressed exit means "we can actually get out in 90 days if we have to", not "the contract says we can".
Take concentration risk seriously. If three of your important business services all run on the same hyperscaler in the same region, that is a concentration risk your CRO will want a view on, regardless of how good the provider is.
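The first and third points above can be sketched as a small graph walk. The dependency map, the supplier labels and the threshold of three are all hypothetical and chosen to mirror the examples in the list. In practice the map would come from your service catalogue or CMDB, assuming it is accurate enough to trust.

```python
from collections import defaultdict

# Hypothetical dependency map: each key depends directly on the listed suppliers.
DEPENDS_ON = {
    "payments": ["azure:uksouth", "signing-saas"],
    "onboarding": ["azure:uksouth", "kyc-saas"],
    "customer-portal": ["azure:uksouth"],
    "signing-saas": ["aws:eu-west-2"],
    "kyc-saas": ["aws:eu-west-2"],
}
IMPORTANT_BUSINESS_SERVICES = ["payments", "onboarding", "customer-portal"]


def all_dependencies(node, graph):
    """Every direct and indirect dependency of node (third, fourth, nth party)."""
    seen, stack = set(), list(graph.get(node, []))
    while stack:
        dep = stack.pop()
        if dep not in seen:
            seen.add(dep)
            stack.extend(graph.get(dep, []))
    return seen


def concentration(services, graph, threshold=3):
    """Suppliers that at least `threshold` important business services rely on."""
    users = defaultdict(set)
    for svc in services:
        for dep in all_dependencies(svc, graph):
            users[dep].add(svc)
    return {dep: sorted(s) for dep, s in users.items() if len(s) >= threshold}


print(sorted(all_dependencies("payments", DEPENDS_ON)))
print(concentration(IMPORTANT_BUSINESS_SERVICES, DEPENDS_ON))
```

In this made-up estate, payments never contracts with AWS directly but still depends on it through the signing platform. Dropping the threshold to two surfaces that shared fourth party alongside the obvious hyperscaler concentration.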

Data residency, lawful access and the bits people skip

Data residency comes up in every cloud security review I do, and most firms have a thinner answer than they think.

The basics are usually fine. You can show that data is stored in the UK or EU. Encrypted at rest. Encrypted in transit. Box ticked. The bit that gets harder is around lawful access. Post-Schrems II, and given the US CLOUD Act, the question is not just where the data sits. It is who can compel access to it, under what jurisdiction, and what your provider would actually do if served.

Most hyperscalers have published transparency reports and data boundary commitments that are genuinely useful here. Microsoft's EU Data Boundary, AWS's Digital Sovereignty Pledge and Google's Sovereign Controls offerings are all worth understanding in detail before you make assumptions about what your existing setup actually delivers. The capabilities are real. They are also not on by default, and they often cost extra.

Two specific things worth checking:

Where do your encryption keys live, and who can access them under what circumstances? Customer-managed keys held in your own HSM are a meaningfully different posture from provider-managed keys, even if both technically tick the "encryption at rest" box.
Where do logs and metadata live? Data residency promises often cover the primary data and not the operational telemetry. For financial services, that distinction matters.

What good looks like at this stage of the journey

If you are at the cloud sprawl point, "good" is not a perfect estate. Nobody has one of those. Good is an estate where you can:

Describe end to end which workloads support which important business services
Evidence that your guardrails actually apply to every subscription and every workload, not just the ones the original architect remembered
Show a clean privileged access model with no standing Global Admins beyond your break glass accounts, proper PIM, and reviewed service principals
Demonstrate that your impact tolerances under PS21/3 are being met in practice, with scenario testing that includes cloud-specific failure modes
Articulate your fourth-party dependencies and have stressed exit plans for the workloads that matter most
Explain in plain English to your CRO what would happen if your primary cloud region went down for 24 hours, and have evidence to back it up

None of that requires you to start over. All of it requires you to slow down long enough to step back and look at the whole picture. In my experience, the firms that do this best treat it as a 12 to 18 month programme of focused remediation, owned by a named individual, with quarterly checkpoints to the risk committee. The ones that struggle are the ones that try to bolt it onto an already overloaded BAU team and hope for the best.

If your firm is somewhere on this journey and you want to compare notes, give me a shout. Always happy to grab a coffee or jump on a call to talk through where you are and what good might look like for your specific shape of estate.

Frequently asked questions

Do we need to redo our landing zone if we already deployed one in 2021?

Probably not redo, but almost certainly uplift. Most landing zones designed for the first wave of cloud adoption do not scale to the workloads firms now run on them. The fix is to re-baseline your guardrails, enforce them through policy at scale, and prioritise gaps tied to your most material workloads. Treat it as a controls refresh with technology underneath, not a tech refresh on its own.

How does PS21/3 actually apply to cloud workloads?

PS21/3 requires you to identify your important business services, set impact tolerances, and evidence you can stay within them under severe but plausible scenarios. For cloud, that means mapping every cloud service, dependency and third party that supports each important business service, and proving the chain stays within tolerance even when something fails. The transition period ended on 31 March 2025, so the FCA now expects this to be operational rather than in flight.

Does the Critical Third Parties regime move my regulatory risk to the cloud provider?

No. The CTP regime (FCA PS24/16 and PRA PS16/24, in force from 1 January 2025) gives regulators oversight powers over designated providers, but accountability for managing your third-party risk stays with you. Your firm still owns the risk in any outsourcing arrangement. What changes is that the regulators can now intervene to raise the resilience of providers they have designated as systemically important.

How do we know if our cloud privileged access is in good shape?

Three quick tests. First, count your standing Global Admins. If it is more than three or four, you have a problem. Second, look at how many service principals exist, when their secrets were last rotated, and whether anyone owns them. Third, check whether your PIM elevation requests are actually being reviewed, or just rubber-stamped. If any of those three answers makes you wince, prioritise a privileged access cleanup before anything else.

What's a realistic timeline for getting a sprawled cloud estate back under control?

12 to 18 months of focused remediation is what I see working in practice, depending on the size of the estate and how much technical debt has accumulated. The first 90 days should be discovery and prioritisation: what do you actually have, what maps to important business services, where are the worst gaps. After that, you can sequence the remediation by risk and tie it directly to your operational resilience programme rather than running it as a separate workstream.