Engineering · March 22, 2026 · 12 min read

How We Achieve Sub-Millisecond Authorization at Scale


Zafer Polat Kalender

Founder & CEO

When we set out to build PermitNetworks, we had one non-negotiable requirement: authorization decisions must take less than 1 millisecond. Not on average — at the 99th percentile. Here's how we built an authorization engine that evaluates complex, multi-layered policies in 0.4ms p50 and 0.8ms p99.

Why Latency Matters for Agent Authorization

AI agents operate at machine speed. A trading bot might execute 500 transactions per second. A customer support agent might handle 50 concurrent conversations. If your authorization layer adds even 10ms of latency per decision, you've introduced 5 seconds of delay per second of operation for that trading bot — making it effectively useless.

Most existing authorization services operate in the 5-50ms range. Some are even slower, requiring network round-trips to centralized policy engines. For human-facing applications, this is fine. For autonomous agents making hundreds of decisions per second, it's a dealbreaker.

The Rust Policy Engine

The core of PermitNetworks is a policy evaluation engine written in Rust. We chose Rust for three reasons: zero-cost abstractions, no garbage collector pauses, and memory safety guarantees. In an authorization system, a GC pause at the wrong moment means a delayed decision — and a delayed decision for a trading bot can mean real money lost.

Our engine compiles policies into an optimized intermediate representation at deploy time, not at evaluation time. When an authorization request arrives, the engine doesn't parse policy text — it walks a pre-compiled decision tree that's been optimized for the specific combination of agents, actions, and resources in your configuration.

```rust
// Simplified policy evaluation path
fn evaluate(request: &AuthzRequest, tree: &DecisionTree) -> Decision {
    let mut node = tree.root();
    loop {
        match node.evaluate(&request.context) {
            Branch::Left(next) => node = next,
            Branch::Right(next) => node = next,
            Branch::Terminal(decision) => return decision,
        }
    }
}
```

Data Structures That Matter

The choice of data structures is the single biggest factor in authorization performance. We use three key structures:

Patricia Tries for Permission Lookup

Permission paths like agents.purchase-bot.actions.spend.resources.company-funds map naturally to a trie structure. Patricia tries (radix trees) compress common prefixes, reducing memory usage and lookup time. A permission check that would require string splitting and hash map lookups in a naive implementation becomes a single trie traversal — O(k) where k is the depth of the permission path, typically 4-6 levels.
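To make the lookup concrete, here is a minimal, segment-level trie sketch. The names (`PermissionTrie`, `check`) are illustrative, not our production API, and this version keys one path segment per edge; a true Patricia/radix tree would additionally collapse single-child chains into one compressed edge.

```rust
use std::collections::HashMap;

// Illustrative segment-level trie for dot-separated permission paths.
// Common prefixes (e.g. "agents.purchase-bot") share nodes.
#[derive(Default)]
struct RadixNode {
    children: HashMap<String, RadixNode>, // path segment -> child node
    allowed: bool,                        // terminal: permission granted here
}

struct PermissionTrie {
    root: RadixNode,
}

impl PermissionTrie {
    fn new() -> Self {
        Self { root: RadixNode::default() }
    }

    fn insert(&mut self, path: &str) {
        let mut node = &mut self.root;
        for seg in path.split('.') {
            node = node.children.entry(seg.to_string()).or_default();
        }
        node.allowed = true;
    }

    // O(k) lookup, where k is the number of path segments (typically 4-6):
    // no string re-hashing per level beyond the segment itself, no allocation.
    fn check(&self, path: &str) -> bool {
        let mut node = &self.root;
        for seg in path.split('.') {
            match node.children.get(seg) {
                Some(child) => node = child,
                None => return false,
            }
        }
        node.allowed
    }
}
```

The key property is that lookup cost is bounded by path depth, not by the total number of permissions stored.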

Bloom Filters for Fast Rejection

Most authorization requests are for common, well-known agent/action combinations. We use bloom filters as a fast-path: if the bloom filter says "definitely not in the deny list," we can skip the full deny-rule evaluation entirely. This eliminates ~70% of unnecessary policy evaluations in production workloads.
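The fast-path logic can be sketched as follows. This is a textbook Bloom filter, not our production code; the bit count and hash count are placeholder values, and a real deployment would size them from the expected deny-list cardinality and target false-positive rate.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Minimal Bloom filter for deny-list fast rejection. A "false" answer is
// definitive (the key is certainly NOT in the deny list, so the full
// deny-rule evaluation can be skipped); a "true" answer may be a false
// positive and must fall through to the full check.
struct BloomFilter {
    bits: Vec<bool>,
    hashes: u64,
}

impl BloomFilter {
    fn new(bits: usize, hashes: u64) -> Self {
        Self { bits: vec![false; bits], hashes }
    }

    // Derive the i-th hash by mixing a seed into the standard hasher.
    fn index(&self, key: &str, seed: u64) -> usize {
        let mut h = DefaultHasher::new();
        seed.hash(&mut h);
        key.hash(&mut h);
        (h.finish() as usize) % self.bits.len()
    }

    fn insert(&mut self, key: &str) {
        for seed in 0..self.hashes {
            let i = self.index(key, seed);
            self.bits[i] = true;
        }
    }

    fn may_contain(&self, key: &str) -> bool {
        (0..self.hashes).all(|seed| self.bits[self.index(key, seed)])
    }
}
```

Because the filter only produces false positives, never false negatives, the fast path is always safe: a rejected lookup can skip the deny rules without risking an incorrect allow.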

Ring Buffers for Rate Limiting

Rate limit evaluation requires tracking request counts over sliding time windows. We use lock-free ring buffers with atomic counters, avoiding any mutex contention. Each agent/action pair gets its own ring buffer, pre-allocated at policy deployment time.
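A simplified sliding-window counter over atomics looks roughly like this. This is a sketch, not the production structure: it keeps the epoch tag and count in separate atomics, so there is a narrow reset race between them; a production lock-free implementation would pack both into a single atomic word (or use a CAS loop) to close it.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

const SLOTS: usize = 10; // e.g. 10 x 100ms slices = a 1-second window

// Sliding-window request counter: one bucket per time slice, rotated in a
// ring. Concurrent agents increment with atomics only -- no mutex.
struct SlidingWindow {
    buckets: [AtomicU64; SLOTS], // request count per slice
    epochs: [AtomicU64; SLOTS],  // slice id last written to each bucket
}

impl SlidingWindow {
    fn new() -> Self {
        Self {
            buckets: std::array::from_fn(|_| AtomicU64::new(0)),
            epochs: std::array::from_fn(|_| AtomicU64::new(0)),
        }
    }

    // Record one request in the bucket for `slice` (a monotonic time-slice
    // id, e.g. unix_millis / 100), then return the window total.
    fn record(&self, slice: u64) -> u64 {
        let i = (slice as usize) % SLOTS;
        // Bucket still holds a count from a previous rotation? Reset it.
        if self.epochs[i].swap(slice, Ordering::AcqRel) != slice {
            self.buckets[i].store(0, Ordering::Release);
        }
        self.buckets[i].fetch_add(1, Ordering::AcqRel);
        // Sum only buckets whose slice id falls inside the current window.
        (0..SLOTS)
            .filter(|&j| self.epochs[j].load(Ordering::Acquire) + SLOTS as u64 > slice)
            .map(|j| self.buckets[j].load(Ordering::Acquire))
            .sum()
    }
}
```

Pre-allocating one such structure per agent/action pair at deploy time, as the post describes, means the hot path never allocates and never blocks.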

Edge Deployment Strategy

The fastest network request is the one you don't make. Our policy engine runs at the edge, deployed to 40+ points of presence globally. When an agent in your US-East infrastructure makes an authorization request, it's evaluated by an engine in the same region — no cross-region network hop required.

Policy updates propagate to all edge nodes within 500ms using a gossip protocol. This means you can update a spending limit and know it's enforced globally in under a second, while still maintaining sub-millisecond evaluation latency.
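As a back-of-envelope check on that 500ms figure: in a simple push-gossip model where each node forwards an update to `fanout` random peers per round, an update reaches roughly all N nodes in about log_fanout(N) rounds. The fanout and round interval below are assumptions for illustration; the post does not specify PermitNetworks' actual gossip parameters.

```rust
// Approximate rounds for a push-gossip protocol to reach ~all N nodes,
// assuming each node forwards to `fanout` random peers per round.
fn gossip_rounds(nodes: f64, fanout: f64) -> u32 {
    (nodes.ln() / fanout.ln()).ceil() as u32
}
```

With 40 points of presence and a hypothetical fanout of 3, `gossip_rounds(40.0, 3.0)` gives 4 rounds; at 100ms per round, that is ~400ms, consistent with a 500ms propagation budget.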

Benchmarks

We run continuous benchmarks against production-representative workloads. Here are our current numbers:

```
# Policy evaluation latency (10-rule policy, 1M requests)
p50: 0.4ms
p90: 0.6ms
p99: 0.8ms

# Throughput (single core)
~45,000 decisions/second

# Memory usage (1,000 policies, 10,000 agents)
~12MB resident
```

For comparison, a typical cloud-based policy engine like OPA with a REST API adds 5-15ms of network latency alone, before policy evaluation even begins. Even running OPA as a sidecar adds 2-5ms. Our edge-deployed Rust engine eliminates both the network hop and the evaluation overhead.

The Tradeoffs We Made

Speed isn't free. We made deliberate tradeoffs to achieve sub-millisecond performance:

  • Pre-compilation over flexibility: Policies are compiled at deploy time, not interpreted at runtime. This means policy changes require a redeploy (which takes <500ms), but evaluation is dramatically faster.
  • Memory over disk: All active policies and agent state are held in memory. This limits the total number of policies per edge node, but eliminates disk I/O from the hot path.
  • Eventual consistency for updates: Policy propagation is eventually consistent (500ms window). For the vast majority of use cases, this is indistinguishable from strong consistency.

These tradeoffs are the right ones for agent authorization, where every millisecond counts and policies change infrequently relative to the rate of authorization decisions. When your agent is making 500 decisions per second, you need an engine that was built for that workload from the ground up.