System Info
Google Colab
- `transformers` version: 4.57.6
- Platform: Linux-6.6.105+-x86_64-with-glibc2.35
- Python version: 3.12.12
- Huggingface_hub version: 0.36.0
- Safetensors version: 0.7.0
- Accelerate version: 1.12.0
- Accelerate config: not found
- DeepSpeed version: not installed
- PyTorch version (accelerator?): 2.9.0+cu126 (CUDA)
- Tensorflow version (GPU?): 2.19.1 (False)
- Flax version (CPU?/GPU?/TPU?): 0.11.2 (gpu)
- Jax version: 0.7.2
- JaxLib version: 0.7.2
- Using distributed or parallel set-up in script?:
- Using GPU in script?:
- GPU type: Tesla T4
Who can help?
@yonigozlan @molbap @zucchini-nlp
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
Steps to reproduce:
- The issue becomes apparent when comparing the official JAX implementation from @mitscha (link to example) with the official SigLIP2 example from the HF docs.
- When running inference with the same model config, the learned temperature and bias differ, leading to "not ideal" performance in zero-shot image classification. Note that as of recently the model is "operational"; in previous versions of HF Transformers it always assigned 0% probability to all candidate labels. A minimal snippet for inspecting the HF-side values follows the table below.
| Implementation | Temperature (Logit scale) | Bias |
|---|---|---|
| JAX Official | 109.9 | -15.9 |
| HF Transformers 4.57.6 | 4.6994 | -15.9324 |
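
For reference, a minimal sketch of how the HF-side values in the table can be inspected (the checkpoint name is an assumption; the attached notebook may use a different one):

```python
from transformers import AutoModel

# Assumed checkpoint for illustration; the attached notebook may use another.
model = AutoModel.from_pretrained("google/siglip2-base-patch16-224")

# The learned temperature and bias are stored as scalar parameters on the model.
print("logit_scale (raw):", model.logit_scale.item())  # ~4.6994, as reported above
print("logit_bias:", model.logit_bias.item())           # ~-15.9324, as reported above

# The HF SigLIP modeling code exponentiates logit_scale at inference time,
# so the effective temperature applied to the logits is exp(logit_scale).
print("exp(logit_scale):", model.logit_scale.exp().item())
```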
Expected behavior
I expected the learned temperature and bias to be similar across the two implementations.
When the temperature is set to 100+ in the HF implementation, model performance improves; a sketch of this override follows below.
Please see the attached notebook for a full example reproducing this issue.
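
A literal sketch of that workaround (the value 100.0 and the in-place override are assumptions; the attached notebook contains the exact change):

```python
import torch

# Hypothetical override matching the report: force the temperature parameter
# to 100+ before running zero-shot classification. If the target is instead the
# *effective* temperature, the raw parameter would be set to log(100), since
# the modeling code applies exp() to logit_scale at inference time.
with torch.no_grad():
    model.logit_scale.fill_(100.0)  # `model` as loaded in the snippet above
```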