
[COSMOS] HTTP/2 maxFrameSize hardcoded to 64KB causes Http2Exception + IllegalReferenceCountException with Cosmos Gateway responses #49397

@tvaron3

Description

Summary

ReactorNettyClient advertises SETTINGS_MAX_FRAME_SIZE = 65536 (64 KB) when negotiating HTTP/2 with the Cosmos Gateway, but the Gateway sends DATA frames larger than this. Frames of up to 3,683,373 bytes (~3.7 MB) were observed against *-eastus2.documents.azure.com (multi-write account, eventual consistency). When this happens, Netty rejects the frame and the entire parent HTTP/2 TCP connection becomes unusable.

Repro

  • Java SDK built from main + PR #49095 ("Add HTTP/2 PING for broken connection detection"), HEAD 3d10f06fe4a at the time
  • azure-cosmos-benchmark AsyncWriteBenchmark / AsyncQueryBenchmark
  • Gateway mode, HTTP/2 enabled (Http2ConnectionConfig.setEnabled(true)), thin-client disabled
  • Workload mix that includes large reads / queries (point reads, ReadFeed of a logical partition, parallel queries)

Within ~30 s of warm-up, the JVM running the write workload logs 100+ events like:

io.netty.handler.codec.http2.Http2Exception: Frame length: 3683373 exceeds maximum: 65536
   at io.netty.handler.codec.http2.DefaultHttp2FrameReader.preProcessFrame(DefaultHttp2FrameReader.java:195)
   ...

each immediately followed by a cascade:

WARN  com.azure.cosmos.implementation.http.Http2ParentChannelExceptionHandler -
  Exception on HTTP/2 parent connection
  [channel=[id: ..., L:/10.0.0.4:33134 - R:thin-client-mwr-eventual-ci-eastus2.documents.azure.com/40.84.77.110:443],
   activeStreams=2, channelActive=true, ...]
io.netty.util.IllegalReferenceCountException: Http2FrameCodec#decode() might have released its input buffer...
Caused by: io.netty.util.IllegalReferenceCountException: refCnt: 0, decrement: 1

The IllegalReferenceCountException is a secondary symptom: the failed frame decode leaves the inbound ByteBuf in an inconsistent reference-count state, and the next handler's release() then fails because refCnt is already 0.
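To make the secondary symptom concrete, here is a toy model of reference counting in plain Java. It is an illustration only, not Netty's implementation: the class name ToyRefCounted is hypothetical, and the exception message merely mimics the "refCnt: 0, decrement: 1" wording in the log above. It shows why a second release() of a buffer that one handler has already freed has to fail.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Toy reference-counting model (NOT Netty's code) illustrating a double release. */
public class RefCountDemo {

    static final class ToyRefCounted {
        // A freshly allocated buffer is owned once.
        private final AtomicInteger refCnt = new AtomicInteger(1);

        int refCnt() {
            return refCnt.get();
        }

        /** Decrements the count; returns true when this call freed the buffer. */
        boolean release() {
            while (true) {
                int current = refCnt.get();
                if (current <= 0) {
                    // Already freed: releasing again is a bug, so fail loudly.
                    throw new IllegalStateException("refCnt: " + current + ", decrement: 1");
                }
                if (refCnt.compareAndSet(current, current - 1)) {
                    return current == 1;
                }
            }
        }
    }

    public static void main(String[] args) {
        ToyRefCounted buf = new ToyRefCounted();
        // Handler A (e.g. a decoder that failed mid-frame) releases its input.
        System.out.println("first release freed buffer: " + buf.release());
        // prints "first release freed buffer: true"
        try {
            // Handler B, unaware of the first release, releases the same buffer again.
            buf.release();
        } catch (IllegalStateException e) {
            System.out.println("second release failed: " + e.getMessage());
            // prints "second release failed: refCnt: 0, decrement: 1"
        }
    }
}
```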

In a single 3-minute window, one write-workload JVM at concurrency 25 logged ~168 events. Each event takes down the parent TCP connection along with its 2–3 in-flight streams, which then retry on a fresh connection. The requests succeed end-to-end, but at the cost of higher tail latency and very noisy logs.

Root cause

sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/http/ReactorNettyClient.java#L153-L157

.http2Settings(settings -> settings
    .initialWindowSize(1024 * 1024) // 1MB initial window size
    .maxFrameSize(64 * 1024)        // 64KB max frame size
    .maxConcurrentStreams(httpCfgAccessor().getEffectiveMaxConcurrentStreams(http2Cfg))
)

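The 64 KB cap is far below the frame sizes the Cosmos Gateway sends back for these reads/queries. RFC 7540 requires a receiver to treat any frame larger than its advertised SETTINGS_MAX_FRAME_SIZE as a FRAME_SIZE_ERROR, and Netty raises it as a connection-level error, which is why the whole parent connection is torn down rather than a single stream. RFC 7540 allows SETTINGS_MAX_FRAME_SIZE values from 2^14 (16,384, the initial value) up to 2^24 - 1 (16,777,215).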
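The rejection in the log follows from the frame format: every HTTP/2 frame starts with a 9-byte header whose first three bytes are a 24-bit payload length, and the receiver compares that length with the SETTINGS_MAX_FRAME_SIZE it advertised. The sketch below expresses that rule in plain Java. It is not Netty's DefaultHttp2FrameReader; the class and method names are hypothetical, and the error string is only formatted to match the logged message.

```java
/** Hypothetical sketch of the RFC 7540 frame-length check (sections 4.1, 4.2, 6.5.2). */
public final class FrameLengthCheck {

    static final int MIN_MAX_FRAME_SIZE = 1 << 14;        // 16,384: initial value and lower bound
    static final int MAX_MAX_FRAME_SIZE = (1 << 24) - 1;  // 16,777,215: upper bound
    static final int FRAME_HEADER_LENGTH = 9;

    /** Reads the 24-bit payload length from the first three bytes of a frame header. */
    static int payloadLength(byte[] header) {
        if (header.length < FRAME_HEADER_LENGTH) {
            throw new IllegalArgumentException("frame header must be 9 bytes");
        }
        return ((header[0] & 0xFF) << 16) | ((header[1] & 0xFF) << 8) | (header[2] & 0xFF);
    }

    /** Returns null if the frame is acceptable, otherwise an error message. */
    static String check(byte[] header, int advertisedMaxFrameSize) {
        if (advertisedMaxFrameSize < MIN_MAX_FRAME_SIZE || advertisedMaxFrameSize > MAX_MAX_FRAME_SIZE) {
            throw new IllegalArgumentException("SETTINGS_MAX_FRAME_SIZE outside RFC 7540 range");
        }
        int length = payloadLength(header);
        return length > advertisedMaxFrameSize
                ? "Frame length: " + length + " exceeds maximum: " + advertisedMaxFrameSize
                : null;
    }

    public static void main(String[] args) {
        // Length field 0x38342D = 3,683,373; type 0x00 (DATA), no flags, stream id 1.
        byte[] header = {0x38, 0x34, 0x2D, 0x00, 0x00, 0x00, 0x00, 0x00, 0x01};
        System.out.println(check(header, 64 * 1024));       // Frame length: 3683373 exceeds maximum: 65536
        System.out.println(check(header, 4 * 1024 * 1024)); // null (accepted under a 4 MB limit)
    }
}
```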

Proposed fix

  1. Raise the default maxFrameSize to 1 MB (16× current). 1 MB does not cover the observed 3.7 MB outliers, but raising the ceiling should sharply cut event frequency even if the server still occasionally sends larger frames. (To fully cover the observed payload, 4 MB is safer.)
  2. Raise initialWindowSize alongside it (keep it ≥ maxFrameSize) so flow control does not become the new bottleneck, e.g. initialWindowSize = max(1 MB, 2 × maxFrameSize).
  3. Expose both as knobs on Http2ConnectionConfig (setMaxFrameSize, setInitialWindowSize) so callers with unusual workloads (very large docs, big paginated queries) can tune them.
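A minimal sketch of how the proposed knobs could behave, assuming setters named as in step 3 and the defaults from steps 1 and 2. Nothing here is the SDK's current API: the class name Http2FrameSettingsSketch, the validation, and the choice to reject an explicit window smaller than maxFrameSize are assumptions for illustration.

```java
/** Hypothetical sketch of the proposed Http2ConnectionConfig frame/window knobs. */
public final class Http2FrameSettingsSketch {

    static final int ONE_MB = 1024 * 1024;
    static final int MIN_MAX_FRAME_SIZE = 1 << 14;        // RFC 7540 lower bound
    static final int MAX_MAX_FRAME_SIZE = (1 << 24) - 1;  // RFC 7540 upper bound

    private int maxFrameSize = ONE_MB;        // proposed default (step 1)
    private Integer initialWindowSize = null; // null = derive from maxFrameSize (step 2)

    public Http2FrameSettingsSketch setMaxFrameSize(int bytes) {
        if (bytes < MIN_MAX_FRAME_SIZE || bytes > MAX_MAX_FRAME_SIZE) {
            throw new IllegalArgumentException("maxFrameSize must be in [16384, 16777215]: " + bytes);
        }
        this.maxFrameSize = bytes;
        return this;
    }

    public Http2FrameSettingsSketch setInitialWindowSize(int bytes) {
        if (bytes <= 0) {
            throw new IllegalArgumentException("initialWindowSize must be positive: " + bytes);
        }
        this.initialWindowSize = bytes;
        return this;
    }

    public int getMaxFrameSize() {
        return maxFrameSize;
    }

    /** Explicit value if set (must be >= maxFrameSize), else max(1 MB, 2 x maxFrameSize). */
    public int getEffectiveInitialWindowSize() {
        if (initialWindowSize != null) {
            if (initialWindowSize < maxFrameSize) {
                throw new IllegalStateException("initialWindowSize must be >= maxFrameSize");
            }
            return initialWindowSize;
        }
        // At most 2 * 16,777,215, which still fits in an int.
        return Math.max(ONE_MB, 2 * maxFrameSize);
    }

    public static void main(String[] args) {
        Http2FrameSettingsSketch cfg = new Http2FrameSettingsSketch();
        System.out.println(cfg.getMaxFrameSize() + " " + cfg.getEffectiveInitialWindowSize()); // 1048576 2097152
        cfg.setMaxFrameSize(4 * ONE_MB);
        System.out.println(cfg.getMaxFrameSize() + " " + cfg.getEffectiveInitialWindowSize()); // 4194304 8388608
    }
}
```

The resulting values would then feed the existing `.http2Settings(settings -> settings.initialWindowSize(...).maxFrameSize(...))` call in ReactorNettyClient. Deriving the window from maxFrameSize when it is not set explicitly keeps the two in step, which is the mitigation for the flow-control mismatch listed under side effects.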

Side effects to consider

| Side effect | Severity | Mitigation |
|---|---|---|
| Per-frame memory: Netty must hold a contiguous ByteBuf of up to maxFrameSize while decoding. 1 MB × 30 concurrent streams = ~30 MB worst case per HTTP/2 connection. Pooled direct buffers from PooledByteBufAllocator handle this efficiently. | Low | The default JVM MaxDirectMemorySize is usually sufficient. Document the new ceiling. |
| Head-of-line blocking on the parent TCP connection: a frame is transmitted atomically, so a 1 MB frame on stream A delays streams B, C, D on the same connection until it finishes. Larger frames mean more HOL impact. | Medium | 1 MB is a moderate default. (The 4 MB figure often quoted for gRPC is its default maximum message size, not a frame-size setting; Envoy's default is also conservative.) 16 MB would noticeably hurt tail latency of small concurrent requests, so don't max it out by default. |
| Pooled allocator fragmentation: the default arena chunk is 16 MB with 8 KB pages (verify against the Netty version in use, since this default has changed across 4.1.x releases); allocations larger than one chunk bypass the pool. 1 MB allocations compose cleanly, 4 MB still pools well, and 16 MB may force unpooled allocation in some configurations. | Low at 1 MB | Stay ≤ 4 MB by default. |
| Flow-control mismatch: if maxFrameSize > initialWindowSize, the server stalls between frames waiting for WINDOW_UPDATE. | Medium | Always raise initialWindowSize together with maxFrameSize. |
| TLS record overhead: TLS records carry at most 16 KB of plaintext, so for larger frames SslHandler decrypts more records before passing a complete frame up. CPU impact is negligible. | Negligible | None needed. |
| DoS surface from a misbehaving peer: maxConcurrentStreams × maxFrameSize is the worst-case memory pin per connection. With the current maxConcurrentStreams and 1 MB, this is bounded and small. | Low | The Cosmos Gateway is trusted; mTLS + TLS validates origin. |
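The arithmetic behind these rows can be checked with a few lines of Java. The inputs are assumptions rather than measurements: 30 concurrent streams (as in the per-frame memory row), a 16 MB pool chunk (as in the allocator row), and a hypothetical 1 Gbit/s link for estimating how long one maximum-size frame holds up other streams. 16,777,215 bytes is the RFC 7540 maximum.

```java
import java.util.List;

/** Back-of-the-envelope trade-offs for candidate maxFrameSize values (assumed inputs). */
public final class FrameSizeTradeoffs {

    static final long MB = 1024L * 1024L;
    static final int CONCURRENT_STREAMS = 30;        // assumption, from the table
    static final long POOL_CHUNK_BYTES = 16 * MB;    // assumption, from the table
    static final double LINK_BITS_PER_SECOND = 1e9;  // hypothetical 1 Gbit/s link
    static final int FRAME_HEADER_BYTES = 9;

    /** Worst-case bytes pinned per connection: maxConcurrentStreams x maxFrameSize. */
    static long worstCasePinnedBytes(long maxFrameSize) {
        return CONCURRENT_STREAMS * maxFrameSize;
    }

    /** Milliseconds one maximum-size frame occupies the link, delaying other streams. */
    static double holDelayMillis(long maxFrameSize) {
        return (maxFrameSize + FRAME_HEADER_BYTES) * 8 / LINK_BITS_PER_SECOND * 1000.0;
    }

    /** Whether a buffer holding one whole frame still fits in a single pool chunk. */
    static boolean fitsInPoolChunk(long maxFrameSize) {
        return maxFrameSize + FRAME_HEADER_BYTES <= POOL_CHUNK_BYTES;
    }

    public static void main(String[] args) {
        for (long size : List.of(64 * 1024L, MB, 4 * MB, 16 * MB - 1)) {
            System.out.printf("maxFrameSize=%d  pinned=%.1f MB  HOL=%.1f ms  pooled=%b%n",
                    size, worstCasePinnedBytes(size) / (double) MB, holDelayMillis(size), fitsInPoolChunk(size));
        }
    }
}
```

Under these inputs, 1 MB frames pin about 30 MB per connection and occupy the link for roughly 8.4 ms each, 4 MB frames about 120 MB and 33.6 ms, and a maximum-size frame no longer fits in a 16 MB chunk.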

Environment

  • Java 21 (Azul Zulu) on Ubuntu 22.04, -Xmx6g -Xms6g -XX:MaxDirectMemorySize=4g -XX:+UseG1GC
  • Netty version: the one shipped with the azure-cosmos 4.0.1-beta.1 build from main + PR #49095 ("Add HTTP/2 PING for broken connection detection")
  • Cosmos account: multi-write, eventual consistency, Gateway mode, ~50 partitions, mixed point reads/queries/writes
  • Concurrency: 25 per workload split (writes / reads / queries in separate JVMs)

Happy to grab full thread dumps, GC logs, or netty wiretap captures if useful.

Metadata

Assignees

No one assigned

    Labels

    Client, Cosmos

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
