feat: add redis response caching #1803
Signed-off-by: John Panos <[email protected]>
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1803      +/-   ##
==========================================
+ Coverage   84.04%   84.20%   +0.15%
==========================================
  Files         117      120       +3
  Lines       12990    13171     +181
==========================================
+ Hits        10917    11090     +173
- Misses       1418     1422       +4
- Partials      655      659       +4
Might be too much of a stretch for the initial implementation, but I'm wondering whether this could be done in a way that also works for pure Envoy Gateway 🤔, with AIGW simply using it with a different cache key pattern. That way people who need caching for non-inference traffic could use it too. Appreciate this may be too much for a first pass, but it would be sooo cool!

Description
Adds Redis-based response caching to reduce latency and costs for repeated LLM requests. Our NLP team needed configurable caching with standard HTTP semantics for cache control.
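As a rough illustration of the idea (not the code in this PR), a look-aside cache for repeated requests could be sketched as follows. The `responseCache` interface, the `memoryCache` stand-in for Redis, the `aigw:` key prefix, and hashing the raw request body with SHA-256 are all assumptions made for this example; the returned status string mirrors the `hit`/`miss` values this PR exposes for observability.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// responseCache is a hypothetical interface for this sketch only.
type responseCache interface {
	Get(key string) ([]byte, bool)
	Set(key string, value []byte)
}

// memoryCache is an in-memory stand-in for Redis so the example runs
// without external dependencies.
type memoryCache map[string][]byte

func (m memoryCache) Get(key string) ([]byte, bool) { v, ok := m[key]; return v, ok }
func (m memoryCache) Set(key string, value []byte)  { m[key] = value }

// cacheKey derives a key from the request body. Both the "aigw:" prefix
// and the choice of SHA-256 over the raw body are assumptions.
func cacheKey(body []byte) string {
	sum := sha256.Sum256(body)
	return "aigw:" + hex.EncodeToString(sum[:])
}

// handle returns the cached response when one exists; otherwise it calls
// upstream, stores the result, and reports a miss.
func handle(c responseCache, body []byte, upstream func([]byte) []byte) ([]byte, string) {
	key := cacheKey(body)
	if cached, ok := c.Get(key); ok {
		return cached, "hit"
	}
	resp := upstream(body)
	c.Set(key, resp)
	return resp, "miss"
}

func main() {
	calls := 0
	upstream := func(_ []byte) []byte {
		calls++
		return []byte(`{"answer":"hi"}`)
	}
	c := memoryCache{}
	req := []byte(`{"model":"m","messages":[]}`)

	_, first := handle(c, req, upstream)
	_, second := handle(c, req, upstream)
	fmt.Println(first, second, calls) // prints "miss hit 1"
}
```

The second identical request never reaches the upstream, which is where the latency and cost savings would come from.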
Features:
- Cache-Control directives (`no-cache`, `no-store`, `private`, `max-age`)
- `x-aigw-cache: hit/miss` response header for observability

Configuration:
- `responseCache` field on AIGatewayRoute
- `extProc.redis.addr` in Helm values (or `secretRef` for production)

Related Issues/PRs
N/A
Notes for reviewers
Main changes:
- `internal/cache/` - Cache interface, Redis client, Cache-Control parsing
- `internal/extproc/processor_impl.go` - Cache lookup/store logic
- `api/v1alpha1/ai_gateway_route.go` - ResponseCacheConfig API
- `examples/response-cache/` - Example manifests
- `site/docs/capabilities/traffic/response-caching.md` - Documentation
- `tests/e2e/response_cache_test.go` - E2E tests