[Feature]: Parallel evaluation runs #20

@danishcontractor

Feature Request

Cuga has no easy way to override settings such as the registry port, which is needed when several evaluation runs execute in parallel on the same node of a cluster.

I tried a quick fix by adding a command-line option in cli.py, but multiple places in the code use hard-coded strings to read values from the settings file, so the override is not picked up everywhere:

@app.command(help="Evaluate Cuga on your test cases", short_help="Run Cuga Evaluation")
def evaluate(
    test_cases_file_path: str = typer.Argument("", help="Path to your test cases file"),
    output_file_path: str = typer.Argument("results.json", help="Path to your output file, it defaults to 'results.json'"),
    registry_port: Optional[int] = typer.Option(None, "--registry-port", help="Port for the MCP registry server"),
):
    """Run Cuga on your test cases."""
    try:
        # Decide which port to use
        port_to_use = registry_port if registry_port is not None else settings.server_ports.registry

        # Optional: show what was parsed
        print(f"[evaluate] registry_port parsed: {registry_port}, using: {port_to_use}")

        # Launch registry on the chosen port
        run_direct_service(
            "registry",
            [
                "uvicorn",
                "cuga.backend.tools_env.registry.registry.api_registry_server:app",
                "--host", "127.0.0.1",
                "--port", str(port_to_use),
            ],
        )


        if direct_processes:
            console.print()
            console.print(
                Panel(
                    f"[bold white]Registry:[/bold white] [cyan]http://localhost:{port_to_use}[/cyan]",
                    title="[bold yellow]Registry service is running. Press Ctrl+C to stop[/bold yellow]",
                    border_style="cyan",
                    padding=(1, 2),
                )
            )
            # Wait for registry to start
            logger.info("Waiting for registry to start...")
            wait_for_registry_server(port_to_use)

            # Then start the evaluation script via an explicit uv command
            run_direct_service(
                "evaluation",
                [
                    "uv",
                    "run",
                    "--group",
                    "dev",
                    os.path.join(PACKAGE_ROOT, "evaluation/evaluate_cuga.py"),
                    "-t",
                    test_cases_file_path,
                    "-r",
                    output_file_path,
                ],
            )
        wait_for_direct_processes()

    except Exception as e:
        logger.error(f"Error starting registry service: {e}")
        stop_direct_processes()
        raise typer.Exit(1)
    return
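
The snippet above only changes the port inside `evaluate`, so any other module that reads `settings.server_ports.registry` directly would still see the default. One possible direction, sketched here under assumptions, is to route every lookup through a single helper with a fixed precedence: CLI value, then an environment variable, then the settings default. The names `resolve_registry_port`, `CUGA_REGISTRY_PORT`, and `DEFAULT_REGISTRY_PORT` are hypothetical and do not exist in Cuga today; the default value is a placeholder.

```python
import os
from typing import Mapping, Optional

# Placeholder only: in Cuga the real default would come from the settings file.
DEFAULT_REGISTRY_PORT = 8001

# Hypothetical environment variable name, chosen for illustration.
ENV_VAR = "CUGA_REGISTRY_PORT"


def resolve_registry_port(
    cli_value: Optional[int] = None,
    env: Optional[Mapping[str, str]] = None,
    default: int = DEFAULT_REGISTRY_PORT,
) -> int:
    """Return the registry port using the precedence CLI > env var > default."""
    env = os.environ if env is None else env
    if cli_value is not None:
        port = cli_value
    elif env.get(ENV_VAR):
        port = int(env[ENV_VAR])  # raises ValueError on non-numeric input
    else:
        port = default
    if not 0 < port < 65536:
        raise ValueError(f"registry port out of range: {port}")
    return port


if __name__ == "__main__":
    # A child process (e.g. the evaluation script) would inherit the env var,
    # so it resolves the same port without reading the settings file directly.
    print(resolve_registry_port(cli_value=None, env={ENV_VAR: "9100"}))
```

With a helper like this, the CLI could export the chosen port into the environment before launching the registry and the evaluation script, so both resolve the same value.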

Motivation / Problem

Allow an evaluation to be executed as follows:
MCP_SERVERS_FILE = cuga evaluate <output_dir> --registry-port <port>

Use Case

Parallel evaluation of multiple domains
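
As a rough illustration of this use case, the sketch below reserves one distinct free port per domain and builds the corresponding command lines. It assumes the `--registry-port` option proposed in this issue exists; the domain names and file names are made up for the example. Ports are probed by binding to port 0 and then released, so another process could still claim one before the registry starts.

```python
import socket
from typing import Dict, List


def find_free_ports(count: int, host: str = "127.0.0.1") -> List[int]:
    """Return `count` distinct ports that were free at the time of the call."""
    sockets = []
    try:
        for _ in range(count):
            s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
            sockets.append(s)
            s.bind((host, 0))  # port 0 asks the OS for any free port
        # All sockets are still bound here, so the ports cannot repeat.
        return [s.getsockname()[1] for s in sockets]
    finally:
        for s in sockets:
            s.close()


def build_eval_commands(domains: List[str]) -> Dict[str, List[str]]:
    """Build one `cuga evaluate` command per domain, each with its own port."""
    ports = find_free_ports(len(domains))
    return {
        domain: [
            "cuga", "evaluate",
            f"{domain}_tests.json",      # hypothetical test cases file
            f"{domain}_results.json",    # hypothetical output file
            "--registry-port", str(port),  # option proposed in this issue
        ]
        for domain, port in zip(domains, ports)
    }


if __name__ == "__main__":
    for name, cmd in build_eval_commands(["domain_a", "domain_b"]).items():
        print(name, " ".join(cmd))
```

Each command could then be launched with `subprocess.Popen`, provided the other ports and settings the runs share are made overridable in the same way.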

Proposed Solution

Unclear

Alternatives Considered

See main method

Priority

High - Important for my workflow

Implementation Complexity (if known)

None

Additional Context

No response

Checklist

  • I have searched existing issues and feature requests to ensure this is not a duplicate
  • I have provided a clear use case and motivation for this feature
  • I am willing to help test this feature once implemented
  • I am interested in contributing to the implementation of this feature

Metadata

Labels: enhancement (New feature or request)