build: Add gevals testing targets and enhance server management #638
Closed
lyarwood wants to merge 3 commits into containers:main from lyarwood:add-gevals-commands-and-imporove-start-server
Enhance the local development environment to fully support Podman as an
alternative to Docker, with improved container engine detection and fixes
for Kind cluster networking.

Container Engine Detection:
- Verify that docker or podman are actually running before selecting them
  by checking `docker info` / `podman info`
- Prevents errors when a container engine is installed but not active
- Consistently use CONTAINER_ENGINE variable throughout makefiles

Keycloak Connectivity:
- Add curl --resolve flag to bypass DNS and connect directly to localhost
- Ensures reliable connectivity to Keycloak running in the Kind cluster
- Prevents DNS-related connection failures

Kind Cluster Management:
- Set KIND_EXPERIMENTAL_PROVIDER inline when using podman on Linux
- Simplify cluster creation and deletion logic
- Consistently use CONTAINER_ENGINE for all container operations

Ingress Controller Fixes:
- Enable hostNetwork mode for nginx ingress controller
- Required for proper port binding with Podman in Kind clusters
- Limit nginx worker processes to 2 to avoid pthread resource exhaustion
- Remove redundant hostPort bindings (not needed with hostNetwork)
- Set dnsPolicy to ClusterFirstWithHostNet for proper name resolution

These changes enable seamless use of Podman for local Kubernetes
development with Kind, Keycloak, and the MCP server.

Assisted-By: Claude <[email protected]>
Signed-off-by: Lee Yarwood <[email protected]>
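The engine detection, Kind provider selection, and curl --resolve behaviour described in this commit could be sketched roughly as the shell helpers below. This is not the code from the makefiles: the function names, the docker-before-podman probing order, and the example host and port are assumptions made for illustration only.

```shell
#!/bin/sh
# Hypothetical helper: print the first container engine that is both
# installed and responding. The docker-then-podman order is an assumption.
detect_container_engine() {
    for engine in docker podman; do
        # `command -v` checks that the command exists; `<engine> info`
        # fails when the engine is installed but not actually usable.
        if command -v "$engine" >/dev/null 2>&1 && "$engine" info >/dev/null 2>&1; then
            echo "$engine"
            return 0
        fi
    done
    return 1
}

# Hypothetical helper: Kind only needs the experimental provider variable
# when podman is used on Linux.
# $1 = engine name, $2 = OS name as printed by `uname -s`
kind_provider_env() {
    if [ "$1" = "podman" ] && [ "$2" = "Linux" ]; then
        echo "KIND_EXPERIMENTAL_PROVIDER=podman"
    fi
}

# Hypothetical helper: build the value for curl's --resolve option, which
# has the form host:port:address and pins the host to 127.0.0.1 here.
# $1 = host name, $2 = port
curl_resolve_arg() {
    echo "$1:$2:127.0.0.1"
}
```

A caller might then run something like `CONTAINER_ENGINE="$(detect_container_engine)" || exit 1` and pass `--resolve "$(curl_resolve_arg keycloak.example.test 8443)"` to curl, where the host name and port are placeholders rather than values taken from this PR.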
Add comprehensive KubeVirt support to enable local testing of VM
management tools with a complete Kind + KubeVirt setup.

New Makefile Targets:
- local-env-setup-kubevirt: Complete environment setup including KubeVirt
- kubevirt-install: Install KubeVirt operator and CDI
- kubevirt-uninstall: Remove KubeVirt and CDI from cluster
- kubevirt-status: Display KubeVirt, CDI, and VM status

KubeVirt Installation (build/kubevirt.mk):
- Installs KubeVirt v1.7.0 operator and custom resource
- Installs CDI (Containerized Data Importer) v1.64.0 for disk management
- Waits for components to be ready before proceeding
- Provides status checking for all KubeVirt resources

The local-env-setup-kubevirt target orchestrates a complete setup:
1. Creates Kind cluster with ingress and cert-manager
2. Installs KubeVirt and CDI
3. Builds the MCP server binary

This enables developers to test VM lifecycle management tools (start,
stop, restart, create) in a local Kubernetes environment.

Signed-off-by: Lee Yarwood <[email protected]>
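For orientation, the install-and-wait sequence described in this commit might look roughly like the sketch below. It is not the content of build/kubevirt.mk: the helper names, the 10 minute timeouts, and the assumption that the manifests are fetched from the upstream GitHub release assets are all illustrative guesses.

```shell
#!/bin/sh
# Versions named in the commit message; overridable from the environment.
KUBEVIRT_VERSION="${KUBEVIRT_VERSION:-v1.7.0}"
CDI_VERSION="${CDI_VERSION:-v1.64.0}"

# Hypothetical helper: URL of a KubeVirt release manifest.
# $1 = manifest base name, e.g. kubevirt-operator or kubevirt-cr
kubevirt_manifest_url() {
    echo "https://github.com/kubevirt/kubevirt/releases/download/${KUBEVIRT_VERSION}/$1.yaml"
}

# Hypothetical helper: URL of a CDI release manifest.
# $1 = manifest base name, e.g. cdi-operator or cdi-cr
cdi_manifest_url() {
    echo "https://github.com/kubevirt/containerized-data-importer/releases/download/${CDI_VERSION}/$1.yaml"
}

# Hypothetical equivalent of the kubevirt-install target: apply the
# operators and custom resources, then block until both report Available.
kubevirt_install() {
    kubectl apply -f "$(kubevirt_manifest_url kubevirt-operator)" &&
    kubectl apply -f "$(kubevirt_manifest_url kubevirt-cr)" &&
    kubectl apply -f "$(cdi_manifest_url cdi-operator)" &&
    kubectl apply -f "$(cdi_manifest_url cdi-cr)" &&
    kubectl -n kubevirt wait kv kubevirt --for=condition=Available --timeout=10m &&
    kubectl wait cdi cdi --for=condition=Available --timeout=10m
}
```

Keeping the URL construction in small functions means the pinned versions can be overridden, for example `KUBEVIRT_VERSION=<other-tag> kubevirt_install`, without touching the install steps.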
Add Makefile targets to run gevals tests against the MCP server using
STDIO transport, and enhance the existing run-server target with
automatic KUBECONFIG detection and configurable toolsets.

New Gevals Testing Targets (build/gevals.mk):
- gevals-check: Verify gevals is available and configured
- gevals-run: Run gevals tests with OpenAI-compatible agent
- gevals-run-claude: Run gevals tests with Claude Code agent

Enhanced Existing Target:
- run-server: Now automatically sets KUBECONFIG with priority:
  _output/kubeconfig (Kind cluster) → $KUBECONFIG → ~/.kube/config
  This ensures the server uses the local Kind cluster when available.
  Default port changed to 8008 (from 8080).

Preserved Existing Target:
- stop-server: Unchanged, stops the background MCP server

Configurable Toolsets (TOOLSETS variable):
- Default: core,config,helm,kubevirt (includes KubeVirt for VM testing)
- Used by both run-server and gevals targets
- Override via environment: export TOOLSETS=core,config,helm
- Or via make argument: make gevals-run TOOLSETS=core,kubevirt

Gevals Testing Workflow:
1. Build the MCP server binary (added as dependency)
2. Export KUBECONFIG for task verification scripts
3. Generate temporary MCP config with absolute path to binary
4. Generate temporary eval config with corrected absolute paths
5. Execute gevals evaluation with configured agent

STDIO Transport for Gevals:
- Gevals spawns the MCP server binary directly as a subprocess
- No HTTP server required (unlike run-server which uses HTTP/SSE)
- Temporary MCP config generated with: command, args (--toolsets), env
- Each test run gets fresh server instance with specified toolsets

Key Features:
- Dynamic model name injection via MODEL_NAME environment variable
- Optional LLM judge (disabled when JUDGE_* vars not set)
- Fixes relative paths in eval files (../tasks → absolute paths)
- Uses eval-inline.yaml for simpler agent configuration
- Consistent KUBECONFIG handling across all targets
- Displays active configuration (binary path, toolsets, model) before
  running

Uses in-tree evaluation files from evals/ directory:
- evals/openai-agent/eval-inline.yaml
- evals/claude-code/eval-inline.yaml
- evals/tasks/ (shared task definitions)
- evals/mcp-config.yaml (updated to use STDIO transport)

The integration works seamlessly with local-env-setup and
local-env-setup-kubevirt targets, providing end-to-end testing of
the MCP server with a local Kind cluster.

Configuration via Environment Variables:

MCP Server (run-server):
- MCP_PORT: Server port (default: 8008)
- MCP_HEALTH_TIMEOUT: Health check timeout in seconds (default: 60)
- MCP_HEALTH_INTERVAL: Health check interval in seconds (default: 2)

Toolsets (both run-server and gevals):
- TOOLSETS: Comma-separated toolsets to enable
  (default: core,config,helm,kubevirt)

Gevals Testing:
- GEVALS_BIN: Path to gevals binary (default: looks in PATH)
- MODEL_NAME: AI model to use (default: gemini-2.0-flash)
- MODEL_BASE_URL/MODEL_KEY: AI API credentials (required)
- JUDGE_BASE_URL/JUDGE_API_KEY/JUDGE_MODEL_NAME: Optional LLM judge
- GEVALS_ARGS: Additional arguments (e.g., -r "pattern" for filtering)

Documentation (docs/gevals-testing.md):
- Complete guide for gevals testing with the MCP server
- Documents both automatic (gevals spawns server) and manual workflows
- Includes configuration, task filtering, and troubleshooting
- Examples for running specific test subsets (kubevirt, kubernetes, etc.)
- Explains STDIO vs HTTP transport modes

Assisted-By: Claude <[email protected]>
Signed-off-by: Lee Yarwood <[email protected]>
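The KUBECONFIG priority that this commit describes for run-server (_output/kubeconfig, then $KUBECONFIG, then ~/.kube/config) can be illustrated with a small shell sketch. The function name and its repository-root parameter are hypothetical and are not taken from the makefiles; only the lookup order and the defaults come from the commit message.

```shell
#!/bin/sh
# Hypothetical helper: print the kubeconfig path to use, preferring the
# Kind cluster's file, then an existing $KUBECONFIG, then the user default.
# $1 = repository root (assumed parameter; defaults to the current directory)
resolve_kubeconfig() {
    root="${1:-.}"
    if [ -f "$root/_output/kubeconfig" ]; then
        # A local Kind cluster has been created; use its kubeconfig.
        echo "$root/_output/kubeconfig"
    elif [ -n "${KUBECONFIG:-}" ]; then
        # Fall back to whatever the caller already exported.
        echo "$KUBECONFIG"
    else
        echo "$HOME/.kube/config"
    fi
}

# Defaults documented in the commit message; both can be overridden
# from the environment.
TOOLSETS="${TOOLSETS:-core,config,helm,kubevirt}"
MCP_PORT="${MCP_PORT:-8008}"
```

A wrapper script could then start with `export KUBECONFIG="$(resolve_kubeconfig)"` so that the server and any task verification scripts all see the same cluster.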
de1f3fb to 92623cc
lyarwood (Contributor, Author) commented:
/close