Open
Description
Dear authors,
Could you please provide more details on how you evaluated the different benchmarks (CV-Bench, BLINK, RoboSpatial, etc.) for the different models (Qwen-2.5-VL-7B, SpaceLLaVA, RoboPoint, etc.)? I tried to reproduce the results in the paper, but found large differences. For example, the results I got for SpaceLLaVA on RoboSpatial using the official evaluation code are much lower than what the paper reports. I would really appreciate your help.
Thanks!
Zhoues