API Management as a Runtime Control Plane for AKS and Container Apps

I put Azure API Management in front of an AKS cluster for the first time about three years ago, and I treated it as a fancy reverse proxy. I set up a couple of APIs, pointed them at my backend services, added a validate-jwt policy because the docs said I should, and called it a day. Four months later I had services running on both AKS and Container Apps, two completely different auth implementations for the same logical API, and a rate limiting story that was basically "whatever the backend can handle before it falls over." Boy, what a mess that was :)

What I learned from that experience, and what I keep seeing other teams learn the hard way, is that the moment you deploy services across AKS and Container Apps, you inherit a governance problem that nobody thinks through until they're neck-deep in it. Each runtime brings its own networking assumptions, scaling behavior, certificate handling, and secret injection mechanics. Then you need to put something in front of them to handle authentication, rate limiting, version management, routing logic, and incident response procedures that work across both platforms simultaneously. The proxy is the part you see first, but the real work is establishing a consistent API governance layer above all that inconsistency.

What Does APIM Actually Do?

The way I think about API Management is: hosted API gateway, developer portal, and policy engine rolled into one service. It intercepts HTTP(S) requests, applies transformations and business rules, routes traffic to backends, and collects telemetry. Let me be clear about what it isn't, though: it's not an ingress controller, not a service mesh, not a replacement for managed identity or Entra integration at the application level, not a load balancer, and not a data plane for east-west (service-to-service) traffic.

I've seen teams confuse APIM with Kubernetes ingress all the time since both sit in front of services and route traffic. The difference is actually structural. An ingress controller lives inside Kubernetes, understands Kubernetes abstractions, and binds tightly to workload definitions. APIM lives outside all your runtimes, speaks to them via explicit backends, and manages governance policies that exist independently of how you deploy code. This means you can change how you deploy without touching how you govern.

When you've got only AKS, a Kubernetes-native gateway such as Envoy Gateway or Contour (implementing the Gateway API) is usually enough, since it integrates with your cluster and scales with it. The community-maintained ingress-nginx controller was retired in early 2026, and the Gateway API is the recommended path forward. When you've got AKS plus Container Apps plus maybe a few App Service instances, APIM becomes the right answer because it sits above all of them and enforces consistent rules regardless of how each runtime works internally.

APIM Solves the Governance Gap in Mixed Runtimes

AKS gives you control over networking, scaling, identity, and observability, but it requires you to make those decisions correctly and maintain them. Container Apps abstracts away networking complexity, handles cert provisioning, and gives you consumption-based scaling, but you relinquish low-level control. They don't work the same way.

I ran into this exact problem when I needed to gate one of my APIs behind Entra ID. In AKS, I used an OAuth proxy sidecar. In Container Apps, I used the built-in authentication feature because it was convenient. Then I realized I had two completely different auth implementations for the same logical API, which meant a client calling the AKS endpoint saw different behavior than one calling the Container Apps endpoint, and when I changed the auth policy later, I had to remember to update both implementations, in both places, using different tools. That's when I moved the validate-jwt policy into APIM and defined the Entra integration once. I changed it once, I debugged it once, and auth drift stopped being a problem.

That being said, the tension is real: APIM introduces operational complexity and you now have another service to actually run, configure, monitor, and troubleshoot. If you do that carelessly, you just add friction without gaining control. But if you ignore APIM and let each service implement its own governance, you fragment your API surface and make compliance, quota management, and incident response much harder. The right answer depends on whether you've got enough API heterogeneity that consistency matters. A single small team running three services in AKS might be fine without it, but a platform team supporting dozens of services across multiple runtimes with compliance requirements can't afford not to have it.

Reference Architecture

graph TD
    FD["Azure Front Door\nDDoS + TLS"] --> APIM["API Management\nGateway + Policies"]
    APIM --> AKS["AKS Services\nPrivate Endpoints"]
    APIM --> CA["Container Apps\nVNet Integration"]
    APIM -->|"Managed Identity"| KV["Key Vault\nCerts & Secrets"]
    AKS --> LA["Log Analytics\nCentralized Telemetry"]
    CA --> LA
    APIM --> LA
    AKS -->|"Workload Identity"| KV
    CA -->|"Managed Identity"| KV

A production setup looks like this:

[Screenshot: API Management Consumption tier overview with gateway URL and API configuration]

External traffic arrives at Azure Front Door (or Application Gateway for internal-only deployments), which provides DDoS protection, geographic routing, and TLS termination. Front Door routes traffic to APIM, which may be deployed as a managed service or with self-hosted gateways in your VNets. APIM authenticates requests, applies rate limiting and transformation policies, and routes requests to backends.

Backends are spread across two platforms. AKS services are reached through an internal load balancer or in-cluster gateway inside the VNet, or over Private Link, and Container Apps services are reached via their FQDN, either publicly or, in production, through VNet integration. Container Apps jobs run asynchronously and have no ingress of their own, so if you want to trigger them or check their status through APIM, you expose the operations that start them or report on them rather than the job itself.

All services authenticate downstream using managed identities: APIM uses its managed identity to read certificates and secrets from Key Vault, AKS workloads use Microsoft Entra Workload ID, and Container Apps use their assigned managed identities to reach databases, Key Vault, and other Azure resources. Nobody passes keys or connection strings around.

For observability, APIM emits logs and metrics to Log Analytics, while AKS and Container Apps emit their own telemetry. Aggregate all of it in a shared Log Analytics workspace and query it with KQL, and use Application Insights to instrument applications in both runtimes so you can track request traces, dependencies, and errors end-to-end.

On the networking side, no backend is publicly reachable: APIM reaches every backend through private endpoints or VNet integration. APIM itself can sit behind a private endpoint so that traffic between Front Door and APIM stays off the public internet, and AKS services are never exposed directly, since the only way in is through APIM.

For secrets, APIM accesses certificates and keys in Key Vault via its managed identity, and services pull secrets at runtime using their workload or managed identities. Quite straightforward once you've got the plumbing in place.

Policy Design: The Working Patterns

Alright, let's talk about the policies. APIM policies are XML documents, split into inbound, backend, outbound, and on-error sections, that define how requests and responses flow through the gateway. The policy reference is massive, but understanding the working patterns is essential to avoid common mistakes.

Authentication and Authorization

The validate-jwt policy is the most common starting point. It basically checks JWT signatures against a public key endpoint (the JWKS endpoint from your Entra tenant) and extracts claims you can use downstream:

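<validate-jwt header-name="Authorization" failed-validation-httpcode="401" require-scheme="Bearer">
  <openid-config url="https://login.microsoftonline.com/TENANT/v2.0/.well-known/openid-configuration" />
  <issuers>
    <issuer>https://login.microsoftonline.com/TENANT/v2.0</issuer>
  </issuers>
  <audiences>
    <!-- v2.0 access tokens carry the API's application (client) ID in "aud", not the api:// URI -->
    <audience>API-CLIENT-ID</audience>
  </audiences>
  <required-claims>
    <!-- "scp" is one space-separated string, so split it before matching -->
    <claim name="scp" match="all" separator=" ">
      <value>admin</value>
    </claim>
  </required-claims>
</validate-jwt>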

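This requires a bearer token issued by your tenant for this API that includes the "admin" scope. If validation fails (bad signature, wrong issuer or audience, expired token, or missing scope), APIM rejects the request with a 401 before it reaches any backend.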
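To make the claim rules concrete, here is a minimal Python sketch of the checks validate-jwt applies once the signature is verified: issuer, audience, lifetime, and required scopes. It assumes signature verification against the tenant's signing keys has already happened and doesn't model clock skew; TENANT and API-CLIENT-ID are placeholders, not real values.

```python
import time
from typing import Optional

# Placeholders, not real values: substitute your tenant ID and the API's client ID.
TENANT = "TENANT"
EXPECTED_ISSUER = f"https://login.microsoftonline.com/{TENANT}/v2.0"
EXPECTED_AUDIENCES = {"API-CLIENT-ID"}
REQUIRED_SCOPES = {"admin"}


def claims_are_valid(claims: dict, now: Optional[float] = None) -> bool:
    """Check issuer, audience, lifetime, and required scopes on already-verified claims.

    Signature verification is assumed to have happened before this is called;
    it is deliberately not modeled here.
    """
    now = time.time() if now is None else now
    if claims.get("iss") != EXPECTED_ISSUER:
        return False
    aud = claims.get("aud")
    audiences = set(aud) if isinstance(aud, list) else {aud}
    if not audiences & EXPECTED_AUDIENCES:
        return False
    # Lifetime: reject expired or not-yet-valid tokens
    if claims.get("exp", 0) <= now or claims.get("nbf", now) > now:
        return False
    # "scp" is one space-separated string; match="all" means every scope must be present
    granted = set(claims.get("scp", "").split())
    return REQUIRED_SCOPES <= granted
```

Note that the scope check splits the scp string first. Comparing the raw value "read admin" against "admin" would fail, which is the mistake the claim element's separator attribute avoids.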

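You can also gate on a static API key header, though APIM's built-in subscription keys (sent in the Ocp-Apim-Subscription-Key header) usually cover this case better. If you do check a custom header, reference a named value backed by Key Vault instead of hardcoding the key in the policy (backend-api-key below is a hypothetical named value):

<check-header name="X-API-Key" failed-check-httpcode="401" failed-check-error-message="API key is missing or invalid" ignore-case="false">
  <value>{{backend-api-key}}</value>
</check-header>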
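For reference, the same gate can be sketched in Python. The header name is looked up case-insensitively (as HTTP requires), the value is compared case-sensitively to mirror ignore-case="false", and hmac.compare_digest keeps response timing from leaking how much of the key matched. This illustrates the semantics only; it is not how APIM implements check-header.

```python
import hmac
from typing import Mapping, Optional, Sequence, Tuple

ERROR = "API key is missing or invalid"


def check_api_key(headers: Mapping[str, str], allowed: Sequence[str],
                  header_name: str = "X-API-Key") -> Tuple[int, Optional[str]]:
    """Return (status, error message). 200 means the request may proceed."""
    # HTTP header *names* are case-insensitive
    presented = next((v for k, v in headers.items()
                      if k.lower() == header_name.lower()), None)
    if presented is None:
        return 401, ERROR
    # Header *values* are compared case-sensitively and in constant time
    for key in allowed:
        if hmac.compare_digest(presented.encode(), key.encode()):
            return 200, None
    return 401, ERROR
```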

Rate Limiting and Quotas

Rate limiting controls how many requests a client can make in a time window, and quotas cap total consumption over a longer period. I use both to protect backend services from overload and enforce SLAs.

<rate-limit-by-key calls="100" renewal-period="60" counter-key="@(context.Request.IpAddress)" />

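This allows 100 requests per client IP address per minute (renewal-period is in seconds), and requests over the limit get a 429 Too Many Requests. You can also rate-limit by subscription, user, or any custom expression. One caveat for this architecture: behind Front Door, context.Request.IpAddress is the Front Door edge address rather than the original client's, so keying on it lumps many clients into one counter. Key on the X-Azure-ClientIP header that Front Door adds, or on the subscription, instead.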
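The per-key idea behind rate-limit-by-key can be sketched as a simplified fixed-window counter in Python. APIM's own counting algorithm and the way it synchronizes counters across gateway nodes differ; this only shows how each counter-key gets its own budget. The clock is injectable so the window can be tested without waiting, and the IPs are documentation addresses.

```python
import time
from typing import Callable, Dict, Tuple


class FixedWindowRateLimiter:
    """Allow `calls` requests per key per `renewal_period` seconds (fixed windows)."""

    def __init__(self, calls: int, renewal_period: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.calls = calls
        self.period = renewal_period
        self.clock = clock
        # key -> (window start time, requests counted in that window)
        self.windows: Dict[str, Tuple[float, int]] = {}

    def allow(self, key: str) -> bool:
        now = self.clock()
        start, count = self.windows.get(key, (now, 0))
        if now - start >= self.period:  # window expired: start a fresh one
            start, count = now, 0
        if count >= self.calls:
            return False  # the gateway would answer 429 here
        self.windows[key] = (start, count + 1)
        return True
```

Usage mirrors the policy: FixedWindowRateLimiter(calls=100, renewal_period=60), with allow() called with whatever value you use as the counter-key.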

For quotas tied to a product subscription:

<quota-by-key calls="10000" renewal-period="2592000" counter-key="@(context.Subscription.Id)" />

This allows 10,000 requests per subscription per 30 days.
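Because renewal-period is always in seconds, it's worth checking the arithmetic, and worth seeing how the two limits interact. Assuming a single client sits behind one IP and one subscription, a quick Python calculation shows the per-minute rate limit doesn't stop it from draining the 30-day quota quickly:

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def renewal_period(days: int) -> int:
    """Seconds value for quota-by-key's renewal-period attribute."""
    return days * SECONDS_PER_DAY


def minutes_to_exhaust(quota_calls: int, rate_calls_per_minute: int) -> float:
    """How fast a client pinned at the rate limit burns through its quota."""
    return quota_calls / rate_calls_per_minute


print(renewal_period(30))               # 2592000
print(minutes_to_exhaust(10_000, 100))  # 100.0
```

A client running flat out at 100 calls per minute uses up 10,000 calls in 100 minutes, so the quota, not the rate limit, is what actually caps monthly consumption.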

Retries and Circuit Breaking

For transient backend failures, wrap forward-request in a retry:

<retry condition="@(context.Response.StatusCode == 500 || context.Response.StatusCode == 503)" count="3" interval="1" delta="2" max-interval="20">
  <forward-request />
</retry>

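This retries the backend request up to 3 times when the backend returns 500 or 503. Because interval, delta, and max-interval are all set, the wait between attempts grows exponentially from 1 second up to a 20-second cap. The retry wraps forward-request, so it belongs in the backend section, and you should only use it on idempotent operations: replaying a POST that partially succeeded can duplicate writes. Circuit breaking isn't a policy at all; you configure it as circuit breaker rules on the backend entity, which temporarily stop sending traffic to a backend that keeps failing.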
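The shape of that retry loop can be sketched in Python: one initial attempt, up to count retries, retry only on 500 or 503, and waits that grow from interval to a max-interval cap. This uses plain doubling and is not APIM's exact interval formula, which also factors in delta. send and sleep are injectable parameters so the loop can be tested without a network.

```python
import time
from typing import Callable, List

RETRYABLE = {500, 503}


def backoff_delays(count: int, interval: float, max_interval: float) -> List[float]:
    """Doubling delays starting at `interval`, capped at `max_interval` (no jitter)."""
    return [min(interval * 2 ** i, max_interval) for i in range(count)]


def call_with_retry(send: Callable[[], int], count: int = 3, interval: float = 1.0,
                    max_interval: float = 20.0,
                    sleep: Callable[[float], None] = time.sleep) -> int:
    """Call `send` (returns an HTTP status) and retry retryable failures."""
    status = send()  # the first attempt is not a retry
    for delay in backoff_delays(count, interval, max_interval):
        if status not in RETRYABLE:
            break
        sleep(delay)
        status = send()
    return status
```

With the defaults, a backend that keeps returning 503 is called four times in total (the original attempt plus three retries), with waits of 1, 2, and 4 seconds in between.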

Products, Versions, and Revisions

APIM organizes APIs into products, versions, and revisions. Products are access control boundaries: a single product might contain the customer API v1 and v2 plus the order API v1, and a subscription to that product grants access to all of them. Versions let multiple, potentially breaking versions of the same API coexist. Revisions are for non-breaking changes within a version: you create a new revision, change its policies or definition, test it through its revision-specific URL, and then make it current. If something breaks, you make the previous revision current again, which is quite nice when you're iterating on policies in production.
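Non-current revisions are reachable through a ;rev=N suffix on the API's path segment, which is how you test a revision before making it current. A tiny Python helper to build those URLs, assuming an API without a path-based version segment; apis.contoso.com, customers, and leads are example names:

```python
def revision_url(gateway_host: str, api_suffix: str, operation_path: str,
                 revision: int) -> str:
    """URL that targets a specific (possibly non-current) revision of an API."""
    api = api_suffix.strip("/")
    op = operation_path.lstrip("/")
    # The ";rev=N" suffix on the API path segment selects the revision
    return f"https://{gateway_host}/{api};rev={revision}/{op}"


print(revision_url("apis.contoso.com", "customers", "/leads", 3))
# https://apis.contoso.com/customers;rev=3/leads
```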

When Should You Run Self-Hosted Gateways?

APIM comes in two deployment models: the managed service and self-hosted gateways.

The managed service is the default. You create an APIM instance, define your APIs and policies, and it just runs on Microsoft's infrastructure. It handles scaling, patching, and availability for you.

Self-hosted gateways are containerized replicas of the APIM data plane (available only in the Developer and Premium tiers). You deploy them as containers on Kubernetes (AKS or elsewhere, including on-premises) or Docker. The gateway synchronizes its configuration from the managed APIM instance but runs the actual request/response processing locally, which gives you the best of both worlds if your latency or compliance requirements demand it.

So self-hosted gateways are useful for three reasons: latency (a gateway in the same region or VNet as its backends cuts round-trip time), compliance (APIs must stay within a specific network boundary), and hybrid scenarios where some backends run on-premises and some in the cloud. The self-hosted gateway docs go into detail on how to set this up.

The tradeoff is operational burden. Self-hosted gateways are containers you have to patch, monitor, scale, and troubleshoot like any other workload. They don't auto-scale as readily as the managed service, and you're basically trading managed convenience for network proximity. Start with the managed APIM service and only move to self-hosted gateways if you've got a concrete requirement. Whatever you do, don't deploy them just because they sound cool.

Internal Versus External API Patterns

APIM works equally well for both, but the patterns differ. Here's a quick comparison I keep in mind:

| Aspect | External APIs | Internal APIs |
|---|---|---|
| Consumers | Third-party / customer apps | Your own services |
| Authentication | API keys, OAuth, mutual TLS | Managed identity, Entra ID |
| Versioning | Conservative; you can't force client upgrades | Aggressive; you coordinate deployments |
| Rate limits | Strict, tied to SLA contracts | More generous; you control the traffic |
| Documentation | Developer portal, formal specs | Internal wiki, OpenAPI specs |
| Breaking changes | Require a deprecation period | Coordinated release |

External APIs get published to the developer portal and need complete, up-to-date documentation, clear versioning, and formal SLA contracts. Internal APIs are consumed by your own services; since you control both the client and the server, you can move quite a bit faster. The APIM developer portal docs cover how to set up the external-facing portal if you need it.

What Are the Common APIM Mistakes?

Auth in Both APIM and Backend

I've seen teams apply JWT validation in APIM policies and again in application code. If APIM validates the token and forwards the claims, and the backend is reachable only through APIM (private networking, or mutual TLS or a managed identity token from APIM), the backend can trust those claims. In that setup, re-validating in the backend adds latency and complexity for little security benefit.

The exception is when you don't fully trust APIM itself, either because it's self-hosted and you've got low confidence in how it's operated, or because your compliance posture requires defense-in-depth. Otherwise, once APIM has validated the token, the backend can treat it as good.

Policy Drift

APIM configuration drifts over time, and if those changes are manual and ad-hoc, production diverges from documentation. Six months later no one remembers why a specific rate limit is set to 500 calls per minute. Manage APIM configuration in code (Terraform, Bicep, or OpenAPI definitions), store it in Git, review changes in pull requests, and apply it consistently across all environments. You can find good policy samples on GitHub to get started.
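One way to catch drift is to compare the policy XML exported from a live environment with the version in Git. A minimal Python sketch, assuming you already have both as strings (how you export them is up to your tooling): it canonicalizes both with the standard library's C14N 2.0 support, so attribute order, self-closing tags, comments, and indentation don't count as drift, while a changed value does.

```python
import xml.etree.ElementTree as ET


def normalize_policy(xml_text: str) -> str:
    """Canonical XML (C14N 2.0), comments dropped, surrounding whitespace stripped."""
    return ET.canonicalize(xml_data=xml_text, strip_text=True)


def has_drifted(deployed_xml: str, git_xml: str) -> bool:
    """True when the deployed policy differs in substance from the Git copy."""
    return normalize_policy(deployed_xml) != normalize_policy(git_xml)
```

Run it in CI against every environment and fail the pipeline on drift, so manual edits surface as a diff instead of a mystery six months later.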

Over-Complicated Policies

APIM policies can do a lot of things (transform requests, rewrite URLs, generate tokens, cache responses), but large policies that do ten things at once are a nightmare to test and debug. Keep policies focused. When a policy grows large (and they always do), move reusable pieces into policy fragments, or push logic down to the product, API, or operation scope where it belongs.

Missing Observability in Custom Code

If you write custom code in a policy, make sure it actually logs what it does. Use the trace policy to write debug output:

<trace source="custom-policy" severity="verbose">
    <message>@("Check result: " + context.Variables.GetValueOrDefault<bool>("some-check", false))</message>
</trace>

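Those traces appear in request tracing, and in Application Insights telemetry and resource logs when the diagnostic verbosity is set low enough to include them (verbose, for this example), so you can query them like any other log entry.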

Treating APIM as a DDoS Mitigation

APIM rate limiting won't protect you from a large-scale DDoS attack. Use Azure DDoS Protection, or the platform DDoS protection and WAF rate-limit rules that come with Azure Front Door, for that. APIM rate limiting is appropriate for quota enforcement and preventing accidental API abuse, not for defending against intentional attacks.

Observability: Logging, Metrics, and Incident Response

On the observability side, APIM gives you diagnostic logs (HTTP request/response details like headers, body, status codes, and latencies), metrics (request counts, latency, capacity, failed requests) that you can build alerts on, and structured logs emitted from policies that capture business-level data.

Once you route them there with diagnostic settings, all of this lands in Log Analytics, queryable with KQL.

A common diagnostic query:

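ApiManagementGatewayLogs
| where TimeGenerated > ago(1h)
| where ResponseCode >= 400
| summarize count() by ResponseCode, ApiId
| order by count_ desc

This shows which APIs returned errors in the past hour. It assumes your diagnostic settings write to resource-specific tables; in the legacy AzureDiagnostics mode the same data lands in the AzureDiagnostics table under different column names.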

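To track errors end-to-end across APIM and application services, use Application Insights with distributed tracing. APIM's Application Insights integration can already propagate W3C trace context to backends; an explicit ID header on top of that gives you a value you can search for in any log, even outside Application Insights. Set it in APIM: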

<set-variable name="trace-id" value="@(context.RequestId.ToString())" />

Then inject that trace ID into the headers sent to backends:

<set-header name="X-Trace-ID" exists-action="override">
  <value>@((string)context.Variables["trace-id"])</value>
</set-header>

In your application code (running in AKS or Container Apps), read that header and attach it to all Application Insights events:

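// ASP.NET Core: Headers[...] returns StringValues; ToString() yields "" if the header is absent
var traceId = request.Headers["X-Trace-ID"].ToString();
// Attach the gateway's trace ID as a custom property on the trace telemetry
telemetryClient.TrackTrace("Processing customer request",
  new Dictionary<string, string> { { "traceId", traceId } });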
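For services written in Python, a comparable sketch uses the standard library's logging.LoggerAdapter so every log record from a request carries the gateway's trace ID. The traceId field name mirrors the C# example, "none" is an arbitrary fallback, and shipping these records to Application Insights (for example through an OpenTelemetry exporter) is left out and assumed to be configured separately.

```python
import logging
from typing import Mapping

TRACE_HEADER = "X-Trace-ID"


def request_logger(base: logging.Logger,
                   headers: Mapping[str, str]) -> logging.LoggerAdapter:
    """Wrap a logger so every record from this request carries APIM's trace ID."""
    # HTTP header names are case-insensitive
    trace_id = next((v for k, v in headers.items()
                     if k.lower() == TRACE_HEADER.lower()), "none")
    return logging.LoggerAdapter(base, {"traceId": trace_id})


log = request_logger(logging.getLogger("orders"), {"x-trace-id": "abc123"})
log.info("Processing customer request")  # record carries traceId="abc123"
```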

For incident response, APIM logs and metrics should feed into your alerting system. Typical alert thresholds I use:

  • error rate above 5% over the last 5 minutes (this is usually the first thing that fires when something breaks)
  • p99 latency above 2000 ms over the last 5 minutes, since that quite often signals a backend bottleneck or a policy doing too much work
  • more than 10 backend connectivity failures in the last 10 minutes
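The three thresholds can be expressed as a small Python evaluation over one window of samples. Counting only 5xx responses as errors is an assumption (4xx usually reflects client mistakes), p99 uses the nearest-rank method, and in practice you would express these as Azure Monitor alert rules rather than application code.

```python
from typing import List, Sequence


def error_rate(status_codes: Sequence[int]) -> float:
    """Share of responses in the window that are server errors (5xx)."""
    if not status_codes:
        return 0.0
    return sum(1 for s in status_codes if s >= 500) / len(status_codes)


def p99(latencies_ms: Sequence[float]) -> float:
    """Nearest-rank 99th percentile (integer ceiling avoids float rounding)."""
    ordered = sorted(latencies_ms)
    rank = -(-99 * len(ordered) // 100)  # ceil(0.99 * n)
    return ordered[rank - 1]


def alerts(status_codes: Sequence[int], latencies_ms: Sequence[float],
           backend_failures: int) -> List[str]:
    fired = []
    if error_rate(status_codes) > 0.05:
        fired.append("error rate above 5% over 5 min")
    if latencies_ms and p99(latencies_ms) > 2000:
        fired.append("p99 latency above 2000 ms over 5 min")
    if backend_failures > 10:
        fired.append("more than 10 backend failures over 10 min")
    return fired
```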

When an alert fires, the on-duty engineer checks the KQL logs to understand what failed.

Is APIM Worth the Cost?

APIM pricing depends heavily on which tier you pick. The classic tiers (Developer, Basic, Standard, Premium) charge a flat monthly rate per gateway unit with no per-call fees, so you're basically paying for capacity, not consumption. The Consumption tier flips that model: no base cost, but you pay per API call. The v2 tiers (Basic v2, Standard v2, and Premium v2) sit somewhere in between with unit-based pricing, faster deployment, and VNet injection support in Premium v2.

There's also the cost of self-hosted gateways if you deploy them (Developer and Premium tiers only). Each gateway runs as a container, consuming cluster resources on top of the APIM license.

APIM isn't cheap. Check the pricing page and you'll see what I mean. The real question is whether it's worth the cost relative to the operational burden it removes.

If you've got five teams, each implementing their own auth, rate limiting, logging, and API versioning logic, your total operational cost is high. That work spreads across teams, it's inconsistent, and it's difficult to aggregate for compliance or debugging. Centralizing that work in APIM costs money, but it frees up team time and eliminates inconsistency.

If you've got a small team with a small API surface, APIM might not pay for itself. You're paying for a managed service and adding a new operational touch point for consistency you don't yet need.

That being said, APIM adds latency because all requests flow through it before reaching backends. In my experience the added overhead on a local network runs about 10-50ms for simple policies, and more if you're doing heavy transformations or body inspection (response caching, by contrast, can cut total latency on cache hits). For APIs where that's acceptable, it's a non-issue. For low-latency services where every millisecond actually matters, it might not be. You can reduce latency by deploying self-hosted gateways in the same VNet or stripping unnecessary transformations from your policies, but APIM isn't a zero-latency solution.

Multi-Region Deployment

When you deploy APIM across regions, the architecture becomes more complex but also more resilient.

The Premium tier supports multi-region deployment within a single APIM instance. You add regional gateways, and APIM synchronizes API definitions and policy configuration across all of them, typically within about 10 seconds. Only the gateway component replicates; the management plane and developer portal stay in the primary region.

This means that if the primary region goes down, the secondary gateways keep serving API requests using the last-known configuration. Traffic keeps flowing; you just lose the ability to make configuration changes until the primary recovers, and that's an acceptable tradeoff for most production workloads.

If you're not on Premium (and it isn't cheap), you'll need separate APIM instances per region. Keeping their configuration in sync then becomes your responsibility and part of your disaster recovery plan: a policy change in the US instance doesn't automatically apply to the EU instance. Use infrastructure-as-code (Terraform or Bicep) to deploy the same configuration to every instance.

Either way, for true active-active replication where all regions serve the same customers simultaneously, you also need backends that are themselves replicated across regions. APIM just sits in front of them.

Operating Model and Incident Response

APIM is on the critical path. If it fails, every API behind it is unreachable, which means APIM must be part of your disaster recovery planning and operational discipline.

Define clear ownership: the platform team owns APIM instances, shared policies, products, diagnostics, certificates, and network connectivity. API product owners own backend correctness, API contracts, and version lifecycle. Application teams remain responsible for backend latency, retry behavior, and schema compatibility. That split prevents vague shared ownership, which is basically the worst anti-pattern in APIM programs.

Treat APIM changes like platform changes. Run a weekly review of failed requests by API and backend, review rate-limit hits and quota exhaustion, and review any temporary policy exceptions introduced during incidents.

Now, run incident drills quarterly. Pick a failure scenario, execute it, and measure your response. For example: your primary APIM instance becomes unavailable. How long does failover take? Or: a policy change introduces a bug that breaks all authentication. How quickly can you detect it and roll back?

In my experience, the teams that get the most value from APIM are the ones that manage it in code, monitor it heavily, and have runbooks for failure. That operational discipline matters quite a bit more than the technology itself.

APIM Is a Governance Layer, Not a Feature

I wouldn't go back to managing auth, rate limiting, and versioning per-service across multiple runtimes. APIM is an operational system that you have to manage like any other infrastructure (versioned, monitored, backed up, and scaled appropriately), but what it gives you is a single point of control above platform diversity. Need a new authentication provider? You add it to APIM once instead of updating ten places. Need to debug why an API is failing? One place to look. Need a new rate limiting policy? Change it once.

For teams running AKS and Container Apps, APIM trades operational complexity for architectural clarity. Whether that trade is worth it depends on the size and complexity of your API surface. For a small team with three services it's probably not worth it, but for a platform team with fifty services across multiple runtimes, you can't afford not to have it. Define clear ownership, manage configuration in code (the Bicep resource definitions are a good starting point), monitor it heavily, and have a plan for when it fails.

Have a good one!