Evaluation #16

@ChicyChen

Dear authors,

Could you please provide more details on how you evaluated the different benchmarks (CV-Bench, BLINK, RoboSpatial, etc.) for the different models (Qwen-2.5-VL-7B, SpaceLLaVA, RoboPoint, etc.)? I tried to reproduce the results in the paper but found large differences. For example, the results I got for SpaceLLaVA using the official evaluation code for RoboSpatial are much lower than what the paper reports. I would really appreciate your help.

Thanks!
