System Info
Google Colab
- `transformers` version: 4.57.6
- Platform: Linux-6.6.105+-x86_64-with-glibc2.35
- Python version: 3.12.12
- Huggingface_hub version: 0.36.0
- Safetensors version: 0.7.0
- Accelerate version: 1.12.0
- Accelerate config: not found
- DeepSpeed version: not installed
- PyTorch version (accelerator?): 2.9.0+cu126 (CUDA)
- Tensorflow version (GPU?): 2.19.1 (False)
- Flax version (CPU?/GPU?/TPU?): 0.11.2 (gpu)
- Jax version: 0.7.2
- JaxLib version: 0.7.2
- Using distributed or parallel set-up in script?:
- Using GPU in script?:
- GPU type: Tesla T4
Who can help?
@yonigozlan @molbap @zucchini-nlp
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
Steps to reproduce:
- The issue becomes apparent when comparing the official JAX implementation from @mitscha (link to example) with the official SigLIP2 example from the HF docs.
- When running inference with the same model config, the learned temperature and bias differ, leading to "not ideal" performance in zero-shot image classification. Note that as of recently the model is "operational"; in previous versions of HF Transformers it always assigned 0% probability to all candidate labels. A minimal snippet for inspecting the HF-side values follows the table below.
| Implementation | Temperature (Logit scale) | Bias |
|---|---|---|
| JAX Official | 109.9 | -15.9 |
| HF Transformers 4.57.6 | 4.6994 | -15.9324 |
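
For reference, a minimal sketch of how the HF-side values in the table can be inspected (the checkpoint name is an assumption; the attached notebook may use a different one):

```python
from transformers import AutoModel

# Assumed checkpoint for illustration; the attached notebook may use another.
model = AutoModel.from_pretrained("google/siglip2-base-patch16-224")

# The learned temperature and bias are stored as scalar parameters on the model.
print("logit_scale (raw):", model.logit_scale.item())  # ~4.6994, as reported above
print("logit_bias:", model.logit_bias.item())           # ~-15.9324, as reported above

# The HF SigLIP modeling code exponentiates logit_scale at inference time,
# so the effective temperature applied to the logits is exp(logit_scale).
print("exp(logit_scale):", model.logit_scale.exp().item())
```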
Expected behavior
I expected the learned temperature and bias to be similar across the two implementations.
When the temperature is set to 100+ in the HF implementation, model performance improves; a sketch of this override follows below.
Please see the attached notebook for a full example reproducing this issue.
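
A literal sketch of that workaround (the value 100.0 and the in-place override are assumptions; the attached notebook contains the exact change):

```python
import torch

# Hypothetical override matching the report: force the temperature parameter
# to 100+ before running zero-shot classification. If the target is instead the
# *effective* temperature, the raw parameter would be set to log(100), since
# the modeling code applies exp() to logit_scale at inference time.
with torch.no_grad():
    model.logit_scale.fill_(100.0)  # `model` as loaded in the snippet above
```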