
Conversation

@johnpanos johnpanos commented Jan 22, 2026

Description

Adds Redis-based response caching to reduce latency and costs for repeated LLM requests. Our NLP team needed configurable caching with standard HTTP semantics for cache control.

Features:

  • Shared Redis cache across all ext_proc instances
  • HTTP Cache-Control support (no-cache, no-store, private, max-age); see the directive-handling sketch below
  • Per-route TTL configuration (default: 1 hour)
  • x-aigw-cache: hit/miss response header for observability
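How those directives interact with caching can be pictured with a minimal sketch, assuming a plain header parser: no-store and private suppress writes to the shared cache, no-cache bypasses lookup, and max-age can only tighten the route's TTL. The names (Directives, ParseCacheControl, EffectiveTTL) are illustrative, not the actual internal/cache API.

```go
package cache

import (
	"strconv"
	"strings"
	"time"
)

// Directives captures the Cache-Control subset listed above.
type Directives struct {
	NoCache   bool          // bypass the cached copy and go to the upstream
	NoStore   bool          // never write this response to the cache
	Private   bool          // user-specific response; skip the shared Redis cache
	MaxAge    time.Duration // caller-supplied freshness bound
	HasMaxAge bool
}

// ParseCacheControl splits a Cache-Control header value into the directives above.
func ParseCacheControl(header string) Directives {
	var d Directives
	for _, part := range strings.Split(header, ",") {
		part = strings.TrimSpace(strings.ToLower(part))
		switch {
		case part == "no-cache":
			d.NoCache = true
		case part == "no-store":
			d.NoStore = true
		case part == "private":
			d.Private = true
		case strings.HasPrefix(part, "max-age="):
			if secs, err := strconv.Atoi(strings.TrimPrefix(part, "max-age=")); err == nil {
				d.MaxAge = time.Duration(secs) * time.Second
				d.HasMaxAge = true
			}
		}
	}
	return d
}

// EffectiveTTL clamps the route's configured TTL (default 1h) by max-age, if set.
func EffectiveTTL(routeTTL time.Duration, d Directives) time.Duration {
	if d.HasMaxAge && d.MaxAge < routeTTL {
		return d.MaxAge
	}
	return routeTTL
}
```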

Configuration:

  • responseCache field on AIGatewayRoute (sketched below)
  • extProc.redis.addr in Helm values (or secretRef for production)
  • Controller flags for Redis connection
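For orientation, the route-level API surface could look roughly like the sketch below. Only the responseCache field name and the one-hour default come from this PR; the Go type and json tag are assumptions rather than the shipped api/v1alpha1 definitions.

```go
// Sketch only; the real definition lives in api/v1alpha1/ai_gateway_route.go.
package v1alpha1

import metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

// ResponseCacheConfig enables Redis-backed response caching for a route.
type ResponseCacheConfig struct {
	// TTL bounds how long a cached response is served before it expires.
	// When unset, the default described in this PR is one hour.
	TTL *metav1.Duration `json:"ttl,omitempty"`
}
```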

Related Issues/PRs

N/A

Notes for reviewers

Main changes:

  • internal/cache/ - Cache interface, Redis client, Cache-Control parsing
  • internal/extproc/processor_impl.go - Cache lookup/store logic (see the flow sketch below)
  • api/v1alpha1/ai_gateway_route.go - ResponseCacheConfig API
  • examples/response-cache/ - Example manifests
  • site/docs/capabilities/traffic/response-caching.md - Documentation
  • tests/e2e/response_cache_test.go - E2E tests
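To orient the review of processor_impl.go, the lookup/store flow can be summarized roughly as below. This assumes go-redis and a SHA-256 of the request body as the cache key; the actual key scheme, method names, and x-aigw-cache wiring in the PR may differ.

```go
package extproc

import (
	"context"
	"crypto/sha256"
	"encoding/hex"
	"time"

	"github.com/redis/go-redis/v9"
)

// responseCache wraps the shared Redis client with the route's TTL.
type responseCache struct {
	rdb *redis.Client
	ttl time.Duration // per-route TTL, defaulting to one hour
}

// key hashes the request body so identical LLM requests map to the same entry.
func (c *responseCache) key(body []byte) string {
	sum := sha256.Sum256(body)
	return "aigw:response-cache:" + hex.EncodeToString(sum[:])
}

// Lookup returns the cached response and whether it was a hit; the processor
// can surface the result via the x-aigw-cache: hit/miss header.
func (c *responseCache) Lookup(ctx context.Context, reqBody []byte) ([]byte, bool, error) {
	val, err := c.rdb.Get(ctx, c.key(reqBody)).Bytes()
	if err == redis.Nil {
		return nil, false, nil
	}
	if err != nil {
		return nil, false, err
	}
	return val, true, nil
}

// Store writes the upstream response under the request's key with the route TTL.
func (c *responseCache) Store(ctx context.Context, reqBody, respBody []byte) error {
	return c.rdb.Set(ctx, c.key(reqBody), respBody, c.ttl).Err()
}
```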

@johnpanos johnpanos requested a review from a team as a code owner January 22, 2026 03:13
@dosubot dosubot bot added the size:XXL label (This PR changes 1000+ lines, ignoring generated files) on Jan 22, 2026
Signed-off-by: John Panos <[email protected]>
@johnpanos johnpanos force-pushed the add-response-caching branch from a33f7f5 to 4839284 on January 22, 2026 03:13
@johnpanos johnpanos changed the title from "Add response caching" to "feat: add redis response caching" on Jan 22, 2026
codecov-commenter commented Jan 22, 2026

Codecov Report

❌ Patch coverage is 95.58011% with 8 lines in your changes missing coverage. Please review.
✅ Project coverage is 84.20%. Comparing base (fefb039) to head (d6357f5).

Files with missing lines              Patch %   Lines
internal/extproc/processor_impl.go    93.26%    3 Missing and 4 partials ⚠️
internal/cache/cache.go               96.96%    1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1803      +/-   ##
==========================================
+ Coverage   84.04%   84.20%   +0.15%     
==========================================
  Files         117      120       +3     
  Lines       12990    13171     +181     
==========================================
+ Hits        10917    11090     +173     
- Misses       1418     1422       +4     
- Partials      655      659       +4     

☔ View full report in Codecov by Sentry.

@johnpanos johnpanos force-pushed the add-response-caching branch 3 times, most recently from 443a99d to 166d190 on January 22, 2026 06:24
@johnpanos johnpanos force-pushed the add-response-caching branch from 166d190 to d6357f5 on January 22, 2026 06:39
johnpanos commented Jan 23, 2026

[image]

We've been running this in prod for a few days now, and so far it's been holding up really well.

@missBerg
Contributor

Might be too much of a stretch for the initial implementation, but I'm wondering if this could be done in a way that works for pure Envoy Gateway 🤔 so that AIGW simply utilizes it with a different cache key pattern.

That way people needing caching for non-inference traffic could use it too.

Appreciate this may be too much for the initial implementation, but it would be sooo cool!
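Not something this PR attempts; purely to illustrate the idea, a pluggable key scheme (all names hypothetical) could let the Redis/TTL machinery be shared while AIGW and plain Envoy Gateway differ only in how keys are derived:

```go
// Hypothetical sketch, not part of this PR: decouple key derivation from storage.
package cache

import "context"

// KeyFunc derives a cache key from a request; ok=false means "do not cache".
type KeyFunc func(ctx context.Context, method, path string, body []byte) (key string, ok bool)

// An inference-aware KeyFunc could hash the normalized LLM request body, while
// a generic Envoy Gateway deployment could key on method, path, and body.
```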
