build: Add gevals testing targets and enhance server management #638

lyarwood · 2026-01-12T19:06:27Z

Add Makefile targets to run gevals tests against the MCP server using
in-tree evaluation files, and enhance the existing run-server target
with automatic KUBECONFIG detection for local Kind clusters.

New Gevals Testing Targets (build/gevals.mk):

gevals-check: Verify gevals is available and configured
gevals-run: Run gevals tests with OpenAI-compatible agent
gevals-run-claude: Run gevals tests with Claude Code agent

Enhanced Existing Target:

run-server: Now automatically sets KUBECONFIG with priority:
_output/kubeconfig (Kind cluster) → $KUBECONFIG → ~/.kube/config
This ensures the server uses the local Kind cluster when available.
Default port changed to 8008 (from 8080).
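The KUBECONFIG priority for run-server can be sketched in Python as below. This illustrates the lookup order only and is not the Makefile logic itself. `resolve_kubeconfig` and its `is_file` parameter are hypothetical names, and treating an empty $KUBECONFIG as unset is an assumption.

```python
from pathlib import Path
from typing import Callable, Mapping


def resolve_kubeconfig(
    repo_root: Path,
    env: Mapping[str, str],
    home: Path,
    is_file: Callable[[Path], bool] = Path.is_file,
) -> Path:
    """Pick a kubeconfig using the order described for run-server (sketch).

    1. _output/kubeconfig written by the local Kind setup
    2. the $KUBECONFIG environment variable
    3. ~/.kube/config
    """
    kind_config = repo_root / "_output" / "kubeconfig"
    if is_file(kind_config):
        return kind_config
    # Assumption: an empty KUBECONFIG is treated the same as an unset one.
    if env.get("KUBECONFIG"):
        return Path(env["KUBECONFIG"])
    # Final fallback; this sketch does not check that the file exists.
    return home / ".kube" / "config"
```

Called as `resolve_kubeconfig(Path.cwd(), os.environ, Path.home())`, it returns the Kind kubeconfig whenever the local Kind setup has written one, which is what makes the local cluster the default target.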

Preserved Existing Target:

stop-server: Unchanged, stops the background MCP server

Gevals Testing Workflow:

Check gevals availability (looks in PATH)
Export KUBECONFIG for task verification scripts
Create temporary eval config with corrected absolute paths
Execute gevals evaluation with configured agent
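The "corrected absolute paths" step amounts to rewriting `../tasks`, which is relative to the eval file's own directory, into an absolute path, presumably because the temporary copy no longer sits next to evals/tasks. A minimal sketch, assuming a plain text substitution like a `sed` call would perform rather than a YAML-aware rewrite; the `fix_task_paths` name and the `glob:` key used in the test are hypothetical.

```python
import os
from pathlib import Path


def fix_task_paths(eval_yaml: str, eval_dir: Path) -> str:
    """Replace '../tasks' with the absolute path of the shared tasks directory.

    `eval_dir` is the directory the eval file originally lived in
    (e.g. evals/openai-agent), so '../tasks' resolves to evals/tasks.
    This is a plain string substitution, not a YAML-aware rewrite.
    """
    # abspath also normalises the '..' component.
    tasks_dir = os.path.abspath(eval_dir / ".." / "tasks")
    return eval_yaml.replace("../tasks", tasks_dir)
```

A text substitution keeps the sketch dependency-free, but it would also rewrite any unrelated string containing `../tasks`; parsing the YAML would avoid that.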

Key Features:

Dynamic model name injection via MODEL_NAME environment variable
Optional LLM judge (disabled when JUDGE_* vars not set)
Fixes relative paths in eval files (../tasks → absolute paths)
Uses eval-inline.yaml for simpler agent configuration
Consistent KUBECONFIG handling across all targets
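The first two features reduce to reading environment variables with a fallback. A sketch under stated assumptions: the default model is the one listed in the configuration section of this PR (gemini-2.0-flash), and the judge is treated as enabled only when all three JUDGE_* variables are non-empty. The PR only says the judge is disabled when they are not set, so the all-or-nothing rule is an assumption.

```python
from typing import Dict, Mapping, Optional

DEFAULT_MODEL = "gemini-2.0-flash"
JUDGE_VARS = ("JUDGE_BASE_URL", "JUDGE_API_KEY", "JUDGE_MODEL_NAME")


def model_name(env: Mapping[str, str]) -> str:
    """Model injected into the eval config; MODEL_NAME overrides the default."""
    return env.get("MODEL_NAME") or DEFAULT_MODEL


def judge_settings(env: Mapping[str, str]) -> Optional[Dict[str, str]]:
    """Return the judge settings, or None to run without an LLM judge.

    Assumption: the judge is enabled only when every JUDGE_* variable is set.
    """
    values = {name: env.get(name, "") for name in JUDGE_VARS}
    if not all(values.values()):
        return None
    return values
```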

Uses in-tree evaluation files from evals/ directory:

evals/openai-agent/eval-inline.yaml
evals/claude-code/eval-inline.yaml
evals/tasks/ (shared task definitions)

The integration works with the local-env-setup and
local-env-setup-kubevirt targets, providing end-to-end testing of
the MCP server against a local Kind cluster.

Configuration via Environment Variables:

MCP Server (run-server):

MCP_PORT: Server port (default: 8008)
TOOLSETS: Comma-separated toolsets to enable
MCP_HEALTH_TIMEOUT: Health check timeout in seconds (default: 60)
MCP_HEALTH_INTERVAL: Health check interval in seconds (default: 2)
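MCP_HEALTH_TIMEOUT and MCP_HEALTH_INTERVAL describe a poll-until-healthy loop. The sketch below shows that loop with the stated defaults (port 8008, 60 s timeout, 2 s interval). The health probe is passed in as a callable because the PR does not say which endpoint run-server polls; `server_settings` and `wait_for_health` are hypothetical names.

```python
import time
from typing import Callable, Mapping, Tuple


def server_settings(env: Mapping[str, str]) -> Tuple[int, float, float]:
    """Read MCP_PORT, MCP_HEALTH_TIMEOUT and MCP_HEALTH_INTERVAL with their defaults."""
    port = int(env.get("MCP_PORT", "8008"))
    timeout = float(env.get("MCP_HEALTH_TIMEOUT", "60"))
    interval = float(env.get("MCP_HEALTH_INTERVAL", "2"))
    return port, timeout, interval


def wait_for_health(
    check: Callable[[], bool],
    timeout: float,
    interval: float,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Call `check` every `interval` seconds until it succeeds or `timeout` passes."""
    deadline = clock() + timeout
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

In practice `check` would send an HTTP request to the server on `port` and return True on a successful response; `sleep` and `clock` are injectable so the loop can be tested without real waiting.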

Gevals Testing:

GEVALS_BIN: Path to gevals binary (default: looks in PATH)
MODEL_NAME: AI model to use (default: gemini-2.0-flash)
MODEL_BASE_URL/MODEL_KEY: AI API credentials (required)
JUDGE_BASE_URL/JUDGE_API_KEY/JUDGE_MODEL_NAME: Optional LLM judge
GEVALS_ARGS: Additional arguments (e.g., -r "pattern" for filtering)
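A rough Python equivalent of what gevals-check is described as doing: locate the gevals binary (GEVALS_BIN first, then PATH) and confirm that the required model settings are present. The PR only says the target verifies gevals is "available and configured", so the exact checks below are assumptions. GEVALS_ARGS is split with shell-style quoting so that `-r "pattern"` stays a single argument.

```python
import shlex
import shutil
from typing import Callable, List, Mapping, Optional


def gevals_check(
    env: Mapping[str, str],
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return a list of problems; an empty list means gevals looks usable."""
    problems = []
    # Only checks that GEVALS_BIN is set, not that it points at an executable.
    if not (env.get("GEVALS_BIN") or which("gevals")):
        problems.append("gevals not found: set GEVALS_BIN or add gevals to PATH")
    for name in ("MODEL_BASE_URL", "MODEL_KEY"):
        if not env.get(name):
            problems.append(f"{name} is required")
    return problems


def gevals_extra_args(env: Mapping[str, str]) -> List[str]:
    """Split GEVALS_ARGS the way a POSIX shell would."""
    return shlex.split(env.get("GEVALS_ARGS", ""))
```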

Documentation (docs/gevals-testing.md):

Complete guide for gevals testing with the MCP server
Documents both automatic and manual server management workflows
Includes configuration, task filtering, and troubleshooting
Examples for running specific test subsets (kubevirt, kubernetes, etc.)

Enhance the local development environment to fully support Podman as an alternative to Docker, with improved container engine detection and fixes for Kind cluster networking.

Container Engine Detection:
- Verify that docker or podman is actually running before selecting it, by checking `docker info` / `podman info`
- Prevents errors when a container engine is installed but not active
- Consistently use the CONTAINER_ENGINE variable throughout the makefiles

Keycloak Connectivity:
- Add the curl --resolve flag to bypass DNS and connect directly to localhost
- Ensures reliable connectivity to Keycloak running in the Kind cluster
- Prevents DNS-related connection failures

Kind Cluster Management:
- Set KIND_EXPERIMENTAL_PROVIDER inline when using podman on Linux
- Simplify cluster creation and deletion logic
- Consistently use CONTAINER_ENGINE for all container operations

Ingress Controller Fixes:
- Enable hostNetwork mode for the nginx ingress controller
- Required for proper port binding with Podman in Kind clusters
- Limit nginx worker processes to 2 to avoid pthread resource exhaustion
- Remove redundant hostPort bindings (not needed with hostNetwork)
- Set dnsPolicy to ClusterFirstWithHostNet for proper name resolution

These changes enable the use of Podman for local Kubernetes development with Kind, Keycloak, and the MCP server.

Assisted-By: Claude
Signed-off-by: Lee Yarwood
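The engine detection described in this commit checks that an engine responds, not merely that its CLI is installed. A Python sketch of that idea, assuming docker is tried before podman (the commit does not state the order). `engine_responds`, `detect_container_engine` and `kind_env` are hypothetical helpers; `docker info`, `podman info` and KIND_EXPERIMENTAL_PROVIDER come from the commit message.

```python
import subprocess
from typing import Callable, Dict, Optional, Sequence


def engine_responds(cmd: Sequence[str]) -> bool:
    """True if `<engine> info` exits 0, i.e. the engine is installed and running."""
    try:
        result = subprocess.run(
            list(cmd), stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL, check=False
        )
    except FileNotFoundError:  # CLI not installed at all
        return False
    return result.returncode == 0


def detect_container_engine(
    candidates: Sequence[str] = ("docker", "podman"),
    responds: Callable[[Sequence[str]], bool] = engine_responds,
) -> Optional[str]:
    """Return the first engine whose `info` command succeeds, or None."""
    for engine in candidates:
        if responds([engine, "info"]):
            return engine
    return None


def kind_env(engine: str, system: str) -> Dict[str, str]:
    """Extra environment for kind: only podman on Linux gets the provider override."""
    if engine == "podman" and system == "Linux":
        return {"KIND_EXPERIMENTAL_PROVIDER": "podman"}
    return {}
```

Here `system` would come from `platform.system()`. Setting the provider variable per invocation, rather than exporting it globally, matches the "inline" wording of the commit.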

Add comprehensive KubeVirt support to enable local testing of VM management tools with a complete Kind + KubeVirt setup.

New Makefile Targets:
- local-env-setup-kubevirt: Complete environment setup including KubeVirt
- kubevirt-install: Install KubeVirt operator and CDI
- kubevirt-uninstall: Remove KubeVirt and CDI from cluster
- kubevirt-status: Display KubeVirt, CDI, and VM status

KubeVirt Installation (build/kubevirt.mk):
- Installs KubeVirt v1.7.0 operator and custom resource
- Installs CDI (Containerized Data Importer) v1.64.0 for disk management
- Waits for components to be ready before proceeding
- Provides status checking for all KubeVirt resources

The local-env-setup-kubevirt target orchestrates a complete setup:
1. Creates Kind cluster with ingress and cert-manager
2. Installs KubeVirt and CDI
3. Builds the MCP server binary

This enables developers to test VM lifecycle management tools (start, stop, restart, create) in a local Kubernetes environment.

Signed-off-by: Lee Yarwood
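The install sequence in build/kubevirt.mk is not shown in this PR, so the following only sketches the usual order: each operator before its custom resource, KubeVirt before CDI, then wait for readiness. The release URLs follow the layout of the upstream KubeVirt and CDI GitHub release pages, while the `kubectl wait` conditions and 5-minute timeouts are assumptions. `kubevirt_install_plan` is a hypothetical helper that returns commands instead of running them.

```python
import shlex
from typing import List

KUBEVIRT_RELEASES = "https://github.com/kubevirt/kubevirt/releases/download"
CDI_RELEASES = "https://github.com/kubevirt/containerized-data-importer/releases/download"


def kubevirt_install_plan(kubevirt: str = "v1.7.0", cdi: str = "v1.64.0") -> List[List[str]]:
    """Return kubectl commands, in order, for installing KubeVirt and CDI."""
    return [
        # Each operator manifest ships its CRD, so it goes before the matching CR.
        ["kubectl", "create", "-f", f"{KUBEVIRT_RELEASES}/{kubevirt}/kubevirt-operator.yaml"],
        ["kubectl", "create", "-f", f"{KUBEVIRT_RELEASES}/{kubevirt}/kubevirt-cr.yaml"],
        ["kubectl", "create", "-f", f"{CDI_RELEASES}/{cdi}/cdi-operator.yaml"],
        ["kubectl", "create", "-f", f"{CDI_RELEASES}/{cdi}/cdi-cr.yaml"],
        # Block until both components report Available (assumed conditions/timeouts).
        ["kubectl", "-n", "kubevirt", "wait", "kv", "kubevirt",
         "--for=condition=Available", "--timeout=5m"],
        ["kubectl", "wait", "cdi", "cdi", "--for=condition=Available", "--timeout=5m"],
    ]


if __name__ == "__main__":
    for cmd in kubevirt_install_plan():
        print(shlex.join(cmd))  # dry run: print instead of executing
```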

Add Makefile targets to run gevals tests against the MCP server using STDIO transport, and enhance the existing run-server target with automatic KUBECONFIG detection and configurable toolsets.

New Gevals Testing Targets (build/gevals.mk):
- gevals-check: Verify gevals is available and configured
- gevals-run: Run gevals tests with OpenAI-compatible agent
- gevals-run-claude: Run gevals tests with Claude Code agent

Enhanced Existing Target:
- run-server: Now automatically sets KUBECONFIG with priority:
  _output/kubeconfig (Kind cluster) → $KUBECONFIG → ~/.kube/config
  This ensures the server uses the local Kind cluster when available.
  Default port changed to 8008 (from 8080).

Preserved Existing Target:
- stop-server: Unchanged, stops the background MCP server

Configurable Toolsets (TOOLSETS variable):
- Default: core,config,helm,kubevirt (includes KubeVirt for VM testing)
- Used by both run-server and gevals targets
- Override via environment: export TOOLSETS=core,config,helm
- Or via make argument: make gevals-run TOOLSETS=core,kubevirt

Gevals Testing Workflow:
1. Build the MCP server binary (added as dependency)
2. Export KUBECONFIG for task verification scripts
3. Generate temporary MCP config with absolute path to binary
4. Generate temporary eval config with corrected absolute paths
5. Execute gevals evaluation with configured agent

STDIO Transport for Gevals:
- Gevals spawns the MCP server binary directly as a subprocess
- No HTTP server required (unlike run-server, which uses HTTP/SSE)
- Temporary MCP config generated with: command, args (--toolsets), env
- Each test run gets a fresh server instance with the specified toolsets

Key Features:
- Dynamic model name injection via MODEL_NAME environment variable
- Optional LLM judge (disabled when JUDGE_* vars not set)
- Fixes relative paths in eval files (../tasks → absolute paths)
- Uses eval-inline.yaml for simpler agent configuration
- Consistent KUBECONFIG handling across all targets
- Displays active configuration (binary path, toolsets, model) before running

Uses in-tree evaluation files from evals/ directory:
- evals/openai-agent/eval-inline.yaml
- evals/claude-code/eval-inline.yaml
- evals/tasks/ (shared task definitions)
- evals/mcp-config.yaml (updated to use STDIO transport)

The integration works with the local-env-setup and local-env-setup-kubevirt targets, providing end-to-end testing of the MCP server against a local Kind cluster.

Configuration via Environment Variables:

MCP Server (run-server):
- MCP_PORT: Server port (default: 8008)
- MCP_HEALTH_TIMEOUT: Health check timeout in seconds (default: 60)
- MCP_HEALTH_INTERVAL: Health check interval in seconds (default: 2)

Toolsets (both run-server and gevals):
- TOOLSETS: Comma-separated toolsets to enable (default: core,config,helm,kubevirt)

Gevals Testing:
- GEVALS_BIN: Path to gevals binary (default: looks in PATH)
- MODEL_NAME: AI model to use (default: gemini-2.0-flash)
- MODEL_BASE_URL/MODEL_KEY: AI API credentials (required)
- JUDGE_BASE_URL/JUDGE_API_KEY/JUDGE_MODEL_NAME: Optional LLM judge
- GEVALS_ARGS: Additional arguments (e.g., -r "pattern" for filtering)

Documentation (docs/gevals-testing.md):
- Complete guide for gevals testing with the MCP server
- Documents both automatic (gevals spawns server) and manual workflows
- Includes configuration, task filtering, and troubleshooting
- Examples for running specific test subsets (kubevirt, kubernetes, etc.)
- Explains STDIO vs HTTP transport modes

Assisted-By: Claude
Signed-off-by: Lee Yarwood
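For STDIO transport, the temporary MCP config only needs to tell gevals which command to spawn, with which arguments and environment. The sketch below assumes the widely used `mcpServers` → `command`/`args`/`env` layout; the actual schema of evals/mcp-config.yaml is not shown in this PR, and the server key `kubernetes` and the binary path in the test are hypothetical. It writes JSON, which is valid YAML 1.2, to stay within the standard library.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Any, Dict, Mapping

DEFAULT_TOOLSETS = "core,config,helm,kubevirt"


def toolsets_from(env: Mapping[str, str]) -> str:
    """TOOLSETS override, falling back to the default stated in this commit."""
    return env.get("TOOLSETS") or DEFAULT_TOOLSETS


def build_mcp_config(binary: Path, toolsets: str, kubeconfig: str) -> Dict[str, Any]:
    """Describe how gevals should spawn the server over STDIO (assumed schema)."""
    return {
        "mcpServers": {
            "kubernetes": {  # hypothetical server key
                "command": os.path.abspath(binary),  # absolute path, as in workflow step 3
                "args": ["--toolsets", toolsets],
                "env": {"KUBECONFIG": kubeconfig},
            }
        }
    }


def write_temp_mcp_config(config: Dict[str, Any]) -> Path:
    """Write the config to a temporary file and return its path."""
    fd, path = tempfile.mkstemp(prefix="mcp-config-", suffix=".yaml")
    with os.fdopen(fd, "w") as fh:
        json.dump(config, fh, indent=2)  # JSON is valid YAML 1.2
    return Path(path)
```

Because gevals launches the process itself from this config, neither run-server nor stop-server is involved, and every run starts a fresh server with the requested toolsets.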

lyarwood · 2026-01-28T18:21:41Z

/close

lyarwood added 3 commits January 12, 2026 18:35

lyarwood force-pushed the add-gevals-commands-and-imporove-start-server branch from de1f3fb to 92623cc January 12, 2026 19:36

Cali0707 self-requested a review January 12, 2026 19:50

lyarwood mentioned this pull request Jan 16, 2026

chore(kubevirt): add eval tasks for VM creation and lifecycle #626

Merged

lyarwood closed this Jan 28, 2026
