@lyarwood
Contributor

Add Makefile targets to run gevals tests against the MCP server using
in-tree evaluation files, and enhance the existing run-server target
with automatic KUBECONFIG detection for local Kind clusters.

New Gevals Testing Targets (build/gevals.mk):

  • gevals-check: Verify gevals is available and configured
  • gevals-run: Run gevals tests with OpenAI-compatible agent
  • gevals-run-claude: Run gevals tests with Claude Code agent

Enhanced Existing Target:

  • run-server: Now automatically sets KUBECONFIG with priority:
    _output/kubeconfig (Kind cluster) → $KUBECONFIG → ~/.kube/config
    This ensures the server uses the local Kind cluster when available.
    Default port changed to 8008 (from 8080).
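
The KUBECONFIG priority above could be sketched as a small shell helper. This is an illustration, not the actual recipe in the Makefile; the function name `resolve_kubeconfig` and its repository-root argument are hypothetical.

```shell
# Print the kubeconfig path to use, following the priority described above:
#   1. <repo root>/_output/kubeconfig (written for the local Kind cluster)
#   2. the caller's $KUBECONFIG, if set and non-empty
#   3. ~/.kube/config
# $1 = repository root (hypothetical parameter; defaults to the current directory)
resolve_kubeconfig() {
    if [ -f "${1:-.}/_output/kubeconfig" ]; then
        printf '%s\n' "${1:-.}/_output/kubeconfig"
    elif [ -n "${KUBECONFIG:-}" ]; then
        printf '%s\n' "${KUBECONFIG}"
    else
        printf '%s\n' "${HOME}/.kube/config"
    fi
}
```

A caller would then run something like `export KUBECONFIG="$(resolve_kubeconfig "$PWD")"` before starting the server.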

Preserved Existing Target:

  • stop-server: Unchanged, stops the background MCP server

Gevals Testing Workflow:

  1. Check gevals availability (looks in PATH)
  2. Export KUBECONFIG for task verification scripts
  3. Create temporary eval config with corrected absolute paths
  4. Execute gevals evaluation with configured agent

Key Features:

  • Dynamic model name injection via MODEL_NAME environment variable
  • Optional LLM judge (disabled when JUDGE_* vars not set)
  • Fixes relative paths in eval files (../tasks → absolute paths)
  • Uses eval-inline.yaml for simpler agent configuration
  • Consistent KUBECONFIG handling across all targets
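
As a rough sketch of the path fix-up listed above, the relative `../tasks` references could be rewritten with `sed` into a temporary copy of the eval file. The function name `rewrite_task_paths` is hypothetical, and the sketch assumes the absolute path contains no `|` or `&` characters, which would need escaping.

```shell
# Copy an eval file to a temporary location, replacing every "../tasks"
# reference with an absolute tasks directory. Prints the temporary file path.
# $1 = source eval file, $2 = absolute path to the tasks directory
rewrite_task_paths() {
    out="$(mktemp)"
    # "|" is used as the sed delimiter so slashes in the path need no escaping.
    sed "s|\.\./tasks|$2|g" "$1" > "$out"
    printf '%s\n' "$out"
}
```

For example, `rewrite_task_paths evals/openai-agent/eval-inline.yaml "$PWD/evals/tasks"` would print the path of a copy that can be passed to gevals from any working directory.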

Uses in-tree evaluation files from evals/ directory:

  • evals/openai-agent/eval-inline.yaml
  • evals/claude-code/eval-inline.yaml
  • evals/tasks/ (shared task definitions)

The gevals targets build on the local-env-setup and
local-env-setup-kubevirt targets, so the MCP server can be tested
end to end against a local Kind cluster.

Configuration via Environment Variables:

MCP Server (run-server):

  • MCP_PORT: Server port (default: 8008)
  • TOOLSETS: Comma-separated toolsets to enable
  • MCP_HEALTH_TIMEOUT: Health check timeout in seconds (default: 60)
  • MCP_HEALTH_INTERVAL: Health check interval in seconds (default: 2)
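
One way the two health check variables might interact is a simple polling loop, sketched below. The function name `wait_until_healthy` is hypothetical, and the check command is left to the caller because the server's health endpoint is not stated here.

```shell
# Poll a check command until it succeeds or MCP_HEALTH_TIMEOUT seconds elapse,
# sleeping MCP_HEALTH_INTERVAL seconds between attempts.
# Usage: wait_until_healthy <check command and its arguments>
# Note: an interval of 0 would loop forever; this sketch does not guard against it.
wait_until_healthy() {
    timeout="${MCP_HEALTH_TIMEOUT:-60}"
    interval="${MCP_HEALTH_INTERVAL:-2}"
    elapsed=0
    while [ "$elapsed" -lt "$timeout" ]; do
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        sleep "$interval"
        elapsed=$((elapsed + interval))
    done
    return 1
}
```

A caller could pass any probe, for instance a `curl -fsS` request against the server port, and treat a non-zero return as a startup failure.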

Gevals Testing:

  • GEVALS_BIN: Path to gevals binary (default: looks in PATH)
  • MODEL_NAME: AI model to use (default: gemini-2.0-flash)
  • MODEL_BASE_URL/MODEL_KEY: AI API credentials (required)
  • JUDGE_BASE_URL/JUDGE_API_KEY/JUDGE_MODEL_NAME: Optional LLM judge
  • GEVALS_ARGS: Additional arguments (e.g., -r "pattern" for filtering)

Documentation (docs/gevals-testing.md):

  • Complete guide for gevals testing with the MCP server
  • Documents both automatic and manual server management workflows
  • Includes configuration, task filtering, and troubleshooting
  • Examples for running specific test subsets (kubevirt, kubernetes, etc.)

Enhance the local development environment to fully support Podman as an
alternative to Docker, with improved container engine detection and
fixes for Kind cluster networking.

Container Engine Detection:
- Verify that docker or podman is actually running before selecting it
  by checking `docker info` / `podman info`
- Prevents errors when a container engine is installed but not active
- Consistently use CONTAINER_ENGINE variable throughout makefiles
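
A minimal sketch of this detection, assuming docker is tried before podman (the order is not stated above) and using a hypothetical function name:

```shell
# Print the first container engine that is both installed and responding.
# An engine counts as usable only if "<engine> info" exits successfully,
# which fails when the binary exists but its daemon or service is not running.
detect_container_engine() {
    for engine in docker podman; do
        if command -v "$engine" >/dev/null 2>&1 && "$engine" info >/dev/null 2>&1; then
            printf '%s\n' "$engine"
            return 0
        fi
    done
    return 1
}
```

The result could then seed the variable, for example `CONTAINER_ENGINE="$(detect_container_engine)"`, with a non-zero exit status signalling that neither engine is usable.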

Keycloak Connectivity:
- Add curl --resolve flag to bypass DNS and connect directly to localhost
- Ensures reliable connectivity to Keycloak running in the Kind cluster
- Prevents DNS-related connection failures
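
For illustration, curl's `--resolve` flag takes a `host:port:address` triple, so pinning a hostname to localhost might look like the sketch below. Both function names are hypothetical, and the Keycloak hostname and port used by the real targets are not shown here.

```shell
# Build the value for curl's --resolve flag, pinning HOST:PORT to 127.0.0.1.
# $1 = hostname, $2 = port
localhost_resolve_arg() {
    printf '%s:%s:127.0.0.1\n' "$1" "$2"
}

# Run curl while forcing HOST:PORT to resolve to localhost, bypassing DNS.
# $1 = hostname, $2 = port; remaining arguments are passed to curl unchanged.
curl_via_localhost() {
    host="$1"
    port="$2"
    shift 2
    curl --resolve "$(localhost_resolve_arg "$host" "$port")" "$@"
}
```

The request still carries the original hostname in its Host header and TLS SNI, which is what lets an ingress route it correctly even though the connection goes to 127.0.0.1.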

Kind Cluster Management:
- Set KIND_EXPERIMENTAL_PROVIDER inline when using podman on Linux
- Simplify cluster creation and deletion logic
- Consistently use CONTAINER_ENGINE for all container operations
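
Setting the provider inline, rather than exporting it, might look like the following wrapper. The name `kind_cmd` is hypothetical and the default of `docker` when CONTAINER_ENGINE is unset is an assumption.

```shell
# Run kind, setting KIND_EXPERIMENTAL_PROVIDER=podman only for that one
# invocation when the selected engine is podman and the host is Linux.
# The variable is scoped to the kind process and never exported to the shell.
kind_cmd() {
    if [ "${CONTAINER_ENGINE:-docker}" = "podman" ] && [ "$(uname -s)" = "Linux" ]; then
        KIND_EXPERIMENTAL_PROVIDER=podman kind "$@"
    else
        kind "$@"
    fi
}
```

Cluster creation and deletion can then share one code path, for example `kind_cmd create cluster` and `kind_cmd delete cluster`.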

Ingress Controller Fixes:
- Enable hostNetwork mode for nginx ingress controller
- Required for proper port binding with Podman in Kind clusters
- Limit nginx worker processes to 2 to avoid pthread resource exhaustion
- Remove redundant hostPort bindings (not needed with hostNetwork)
- Set dnsPolicy to ClusterFirstWithHostNet for proper name resolution

These changes enable seamless use of Podman for local Kubernetes
development with Kind, Keycloak, and the MCP server.

Assisted-By: Claude <[email protected]>
Signed-off-by: Lee Yarwood <[email protected]>
Add comprehensive KubeVirt support to enable local testing of VM
management tools with a complete Kind + KubeVirt setup.

New Makefile Targets:
- local-env-setup-kubevirt: Complete environment setup including KubeVirt
- kubevirt-install: Install KubeVirt operator and CDI
- kubevirt-uninstall: Remove KubeVirt and CDI from cluster
- kubevirt-status: Display KubeVirt, CDI, and VM status

KubeVirt Installation (build/kubevirt.mk):
- Installs KubeVirt v1.7.0 operator and custom resource
- Installs CDI (Containerized Data Importer) v1.64.0 for disk management
- Waits for components to be ready before proceeding
- Provides status checking for all KubeVirt resources

The local-env-setup-kubevirt target orchestrates a complete setup:
1. Creates Kind cluster with ingress and cert-manager
2. Installs KubeVirt and CDI
3. Builds the MCP server binary

This enables developers to test VM lifecycle management tools
(start, stop, restart, create) in a local Kubernetes environment.

Signed-off-by: Lee Yarwood <[email protected]>
Add Makefile targets to run gevals tests against the MCP server using
STDIO transport, and enhance the existing run-server target with
automatic KUBECONFIG detection and configurable toolsets.

New Gevals Testing Targets (build/gevals.mk):
- gevals-check: Verify gevals is available and configured
- gevals-run: Run gevals tests with OpenAI-compatible agent
- gevals-run-claude: Run gevals tests with Claude Code agent

Enhanced Existing Target:
- run-server: Now automatically sets KUBECONFIG with priority:
  _output/kubeconfig (Kind cluster) → $KUBECONFIG → ~/.kube/config
  This ensures the server uses the local Kind cluster when available.
  Default port changed to 8008 (from 8080).

Preserved Existing Target:
- stop-server: Unchanged, stops the background MCP server

Configurable Toolsets (TOOLSETS variable):
- Default: core,config,helm,kubevirt (includes KubeVirt for VM testing)
- Used by both run-server and gevals targets
- Override via environment: export TOOLSETS=core,config,helm
- Or via make argument: make gevals-run TOOLSETS=core,kubevirt

Gevals Testing Workflow:
1. Build the MCP server binary (added as dependency)
2. Export KUBECONFIG for task verification scripts
3. Generate temporary MCP config with absolute path to binary
4. Generate temporary eval config with corrected absolute paths
5. Execute gevals evaluation with configured agent

STDIO Transport for Gevals:
- Gevals spawns the MCP server binary directly as a subprocess
- No HTTP server required (unlike run-server which uses HTTP/SSE)
- Temporary MCP config generated with: command, args (--toolsets), env
- Each test run gets fresh server instance with specified toolsets
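
The temporary MCP config generation could be sketched roughly as follows. The YAML key layout (`mcpServers`, the `kubernetes` server name, `type: stdio`) is an assumption made for illustration and may not match the schema gevals expects; only the `command`, `args` (`--toolsets`), and `env` fields come from the description above. The function name is hypothetical.

```shell
# Write a throwaway MCP config that tells the client to spawn the server
# binary over STDIO, and print the path of the generated file.
# $1 = absolute path to the MCP server binary, $2 = kubeconfig path
# NOTE: the key layout below is assumed, not taken from the gevals schema.
generate_mcp_config() {
    config_file="$(mktemp)"
    cat > "$config_file" <<EOF
mcpServers:
  kubernetes:
    type: stdio
    command: $1
    args: ["--toolsets", "${TOOLSETS:-core,config,helm,kubevirt}"]
    env:
      KUBECONFIG: $2
EOF
    printf '%s\n' "$config_file"
}
```

Because the file is regenerated on every run, changing TOOLSETS between runs changes the toolsets of the freshly spawned server without touching the in-tree config.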

Key Features:
- Dynamic model name injection via MODEL_NAME environment variable
- Optional LLM judge (disabled when JUDGE_* vars not set)
- Fixes relative paths in eval files (../tasks → absolute paths)
- Uses eval-inline.yaml for simpler agent configuration
- Consistent KUBECONFIG handling across all targets
- Displays active configuration (binary path, toolsets, model) before running

Uses in-tree evaluation files from evals/ directory:
- evals/openai-agent/eval-inline.yaml
- evals/claude-code/eval-inline.yaml
- evals/tasks/ (shared task definitions)
- evals/mcp-config.yaml (updated to use STDIO transport)

The gevals targets build on the local-env-setup and
local-env-setup-kubevirt targets, so the MCP server can be tested
end to end against a local Kind cluster.

Configuration via Environment Variables:

MCP Server (run-server):
- MCP_PORT: Server port (default: 8008)
- MCP_HEALTH_TIMEOUT: Health check timeout in seconds (default: 60)
- MCP_HEALTH_INTERVAL: Health check interval in seconds (default: 2)

Toolsets (both run-server and gevals):
- TOOLSETS: Comma-separated toolsets to enable (default: core,config,helm,kubevirt)

Gevals Testing:
- GEVALS_BIN: Path to gevals binary (default: looks in PATH)
- MODEL_NAME: AI model to use (default: gemini-2.0-flash)
- MODEL_BASE_URL/MODEL_KEY: AI API credentials (required)
- JUDGE_BASE_URL/JUDGE_API_KEY/JUDGE_MODEL_NAME: Optional LLM judge
- GEVALS_ARGS: Additional arguments (e.g., -r "pattern" for filtering)

Documentation (docs/gevals-testing.md):
- Complete guide for gevals testing with the MCP server
- Documents both automatic (gevals spawns server) and manual workflows
- Includes configuration, task filtering, and troubleshooting
- Examples for running specific test subsets (kubevirt, kubernetes, etc.)
- Explains STDIO vs HTTP transport modes

Assisted-By: Claude <[email protected]>
Signed-off-by: Lee Yarwood <[email protected]>
@lyarwood lyarwood force-pushed the add-gevals-commands-and-imporove-start-server branch from de1f3fb to 92623cc Compare January 12, 2026 19:36
@Cali0707 Cali0707 self-requested a review January 12, 2026 19:50
@lyarwood
Contributor Author

/close

@lyarwood lyarwood closed this Jan 28, 2026