Haixuan Xavier Tao
cf88bcd8fa
Update dependency transformers to >=4.48.0,<=4.48.0 [SECURITY] - abandoned (#778)
This PR contains the following updates:
| Package | Change | Age | Adoption | Passing | Confidence |
|---|---|---|---|---|---|
| [transformers](https://redirect.github.com/huggingface/transformers) | `>=4.43.0,<=4.43.3` -> `>=4.48.0,<=4.48.0` | [age](https://docs.renovatebot.com/merge-confidence/) | [adoption](https://docs.renovatebot.com/merge-confidence/) | [passing](https://docs.renovatebot.com/merge-confidence/) | [confidence](https://docs.renovatebot.com/merge-confidence/) |
### GitHub Vulnerability Alerts
#### [CVE-2024-11392](https://nvd.nist.gov/vuln/detail/CVE-2024-11392 )
Hugging Face Transformers MobileViTV2 Deserialization of Untrusted Data
Remote Code Execution Vulnerability. This vulnerability allows remote
attackers to execute arbitrary code on affected installations of Hugging
Face Transformers. User interaction is required to exploit this
vulnerability in that the target must visit a malicious page or open a
malicious file.
The specific flaw exists within the handling of configuration files. The
issue results from the lack of proper validation of user-supplied data,
which can result in deserialization of untrusted data. An attacker can
leverage this vulnerability to execute code in the context of the
current user. Was ZDI-CAN-24322.
#### [CVE-2024-11394](https://nvd.nist.gov/vuln/detail/CVE-2024-11394 )
Hugging Face Transformers Trax Model Deserialization of Untrusted Data
Remote Code Execution Vulnerability. This vulnerability allows remote
attackers to execute arbitrary code on affected installations of Hugging
Face Transformers. User interaction is required to exploit this
vulnerability in that the target must visit a malicious page or open a
malicious file.
The specific flaw exists within the handling of model files. The issue
results from the lack of proper validation of user-supplied data, which
can result in deserialization of untrusted data. An attacker can
leverage this vulnerability to execute code in the context of the
current user. Was ZDI-CAN-25012.
#### [CVE-2024-11393](https://nvd.nist.gov/vuln/detail/CVE-2024-11393 )
Hugging Face Transformers MaskFormer Model Deserialization of Untrusted
Data Remote Code Execution Vulnerability. This vulnerability allows
remote attackers to execute arbitrary code on affected installations of
Hugging Face Transformers. User interaction is required to exploit this
vulnerability in that the target must visit a malicious page or open a
malicious file.
The specific flaw exists within the parsing of model files. The issue
results from the lack of proper validation of user-supplied data, which
can result in deserialization of untrusted data. An attacker can
leverage this vulnerability to execute code in the context of the
current user. Was ZDI-CAN-25191.
---
### Release Notes
<details>
<summary>huggingface/transformers (transformers)</summary>
### [`v4.48.0`](https://redirect.github.com/huggingface/transformers/releases/tag/v4.48.0): ModernBERT, Aria, TimmWrapper, ColPali, Falcon3, Bamba, VitPose, DinoV2 w/ Registers, Emu3, Cohere v2, TextNet, DiffLlama, PixtralLarge, Moonshine
[Compare Source](https://redirect.github.com/huggingface/transformers/compare/v4.47.1...v4.48.0)
#### New models
##### ModernBERT
The ModernBERT model was proposed in [Smarter, Better, Faster, Longer: A
Modern Bidirectional Encoder for Fast, Memory Efficient, and Long
Context Finetuning and Inference](https://arxiv.org/abs/2412.13663) by
Benjamin Warner, Antoine Chaffin, Benjamin Clavié, Orion Weller, Oskar
Hallström, Said Taghadouini, Alexis Gallagher, Raja Biswas, Faisal
Ladhak, Tom Aarsen, Nathan Cooper, Griffin Adams, Jeremy Howard and
Iacopo Poli.
It is a refresh of the traditional encoder architecture, as used in
previous models such as
[BERT](https://huggingface.co/docs/transformers/en/model_doc/bert ) and
[RoBERTa](https://huggingface.co/docs/transformers/en/model_doc/roberta ).
It builds on BERT and implements many modern architectural improvements
which have been developed since its original release, such as:
- [Rotary Positional
Embeddings](https://huggingface.co/blog/designing-positional-encoding )
to support sequences of up to 8192 tokens.
- [Unpadding](https://arxiv.org/abs/2208.08124 ) to ensure no compute is
wasted on padding tokens, speeding up processing time for batches with
mixed-length sequences.
- [GeGLU](https://arxiv.org/abs/2002.05202) layers replace the original MLP
layers, a change shown to improve performance.
- [Alternating Attention](https://arxiv.org/abs/2004.05150v2), where most
attention layers use a sliding window of 128 tokens and global
attention is used only in every third layer.
- [Flash
Attention](https://redirect.github.com/Dao-AILab/flash-attention) to
speed up processing.
- A design that follows the recommendations of [The Case for Co-Designing Model
Architectures with Hardware](https://arxiv.org/abs/2401.14489), to maximize
efficiency across common inference GPUs.
- Modern training data scales (2 trillion tokens) and mixtures
(including code and math data).
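
The unpadding and alternating-attention ideas above can be illustrated with a small, framework-free Python sketch. The function names are hypothetical and this is not ModernBERT's implementation. It assumes that layer 0 is a global layer and that the 128-token window is split evenly (64 tokens) on each side of the query. The `cu_seqlens` list mirrors the cumulative-length format taken by variable-length attention kernels such as FlashAttention's varlen functions.

```python
from itertools import accumulate


def unpad(batch, attention_mask):
    """Drop padding tokens; return the flat token list and cumulative lengths.

    `batch` and `attention_mask` are equal-shaped lists of rows; a mask value
    of 1 marks a real token and 0 marks padding.
    """
    flat = [tok for row, mask in zip(batch, attention_mask)
            for tok, keep in zip(row, mask) if keep]
    lengths = [sum(mask) for mask in attention_mask]
    # Sequence i occupies flat[cu_seqlens[i]:cu_seqlens[i + 1]]
    cu_seqlens = [0, *accumulate(lengths)]
    return flat, cu_seqlens


def layer_attention_kinds(num_layers, global_every=3):
    """Label each layer: every `global_every`-th layer (starting at 0) is global."""
    return ["global" if i % global_every == 0 else "local"
            for i in range(num_layers)]


def local_attention_allowed(query_pos, key_pos, window=128):
    """True if key_pos lies within window // 2 tokens of query_pos."""
    return abs(query_pos - key_pos) <= window // 2


tokens, cu_seqlens = unpad([[5, 6, 7, 0], [8, 9, 0, 0]],
                           [[1, 1, 1, 0], [1, 1, 0, 0]])
print(tokens, cu_seqlens)        # [5, 6, 7, 8, 9] [0, 3, 5]
print(layer_attention_kinds(6))  # ['global', 'local', 'local', 'global', 'local', 'local']
print(local_attention_allowed(100, 164), local_attention_allowed(100, 165))  # True False
```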

- Add ModernBERT to Transformers by
[@​warner-benjamin](https://redirect.github.com/warner-benjamin )
in
[#​35158](https://redirect.github.com/huggingface/transformers/issues/35158 )
##### Aria
The Aria model was proposed in [Aria: An Open Multimodal Native
Mixture-of-Experts Model](https://huggingface.co/papers/2410.05993 ) by
Li et al. from the Rhymes.AI team.
Aria is an open multimodal-native model with best-in-class performance
across a wide range of multimodal, language, and coding tasks. It has a
Mixture-of-Experts architecture with 3.9B activated parameters per visual
token and 3.5B activated parameters per text token.
- Add Aria by
[@​aymeric-roucher](https://redirect.github.com/aymeric-roucher )
in
[#​34157](https://redirect.github.com/huggingface/transformers/issues/34157 )

##### TimmWrapper
We added a set of `TimmWrapper` classes so that timm models can be
loaded into the library as Transformers models.
Here's a general usage example:
```py
import torch
from urllib.request import urlopen
from PIL import Image
from transformers import AutoImageProcessor, AutoModelForImageClassification

checkpoint = "timm/resnet50.a1_in1k"
img = Image.open(urlopen(
    "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png"
))

# The Auto classes resolve the timm checkpoint to the TimmWrapper classes
image_processor = AutoImageProcessor.from_pretrained(checkpoint)
inputs = image_processor(img, return_tensors="pt")
model = AutoModelForImageClassification.from_pretrained(checkpoint)

with torch.no_grad():
    logits = model(**inputs).logits

# Top-5 class probabilities (in percent) and their class indices
top5_probabilities, top5_class_indices = torch.topk(logits.softmax(dim=1) * 100, k=5)
```
Thanks to this, timm models now have access to pipelines, as well as
`Trainer`, accelerate device maps, quantization, etc:
```py
from urllib.request import urlopen
from PIL import Image
from transformers import pipeline

img = Image.open(urlopen(
    "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png"
))

# A timm checkpoint can be passed straight to the image-classification pipeline
pipe = pipeline("image-classification", model="timm/resnet18.a1_in1k")
print(pipe(img))
```
- Add TimmWrapper by
[@​qubvel](https://redirect.github.com/qubvel ) and
[@​amyeroberts](https://redirect.github.com/amyeroberts ) in
[#​34564](https://redirect.github.com/huggingface/transformers/issues/34564 )
##### Pixtral-Large
Pixtral modeling and checkpoint conversion code has been updated to
support the new
[Pixtral-Large](https://huggingface.co/mistralai/Pixtral-Large-Instruct-2411 )
model.
- Update Pixtral conversion script to support large format! by
[@​arthurzucker](https://redirect.github.com/arthurzucker ) in
[#​34801](https://redirect.github.com/huggingface/transformers/issues/34801 )
##### ColPali
The ColPali model was proposed in [ColPali: Efficient Document Retrieval
with Vision Language Models](https://doi.org/10.48550/arXiv.2407.01449 )
by Manuel Faysse\*, Hugues Sibille\*, Tony Wu\*, Bilel Omrani, Gautier
Viaud, Céline Hudelot, Pierre Colombo (\* denotes equal contribution).
Work led by ILLUIN Technology.
In the proposed ColPali approach, the authors leverage VLMs to construct
efficient multi-vector embeddings directly from document images
(“screenshots”) for document retrieval. They train the model to maximize
the similarity between these document embeddings and the corresponding
query embeddings, using the late interaction method introduced in
ColBERT.
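
Late interaction scores a query against a document by matching each query-token embedding to its most similar document embedding and summing those best matches (ColBERT's MaxSim). The following plain-Python sketch uses tiny hand-written vectors as stand-ins for the multi-vector embeddings ColPali would produce from a query and from page screenshots; the function names and values are illustrative only.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def maxsim_score(query_embeddings, doc_embeddings):
    """Late interaction: best-matching document vector per query vector, summed."""
    return sum(max(dot(q, d) for d in doc_embeddings) for q in query_embeddings)


# Toy 2-d embeddings (real ColPali embeddings are much higher-dimensional)
query = [[1.0, 0.0], [0.0, 1.0]]
pages = {
    "page_a": [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]],
    "page_b": [[0.1, 0.1], [0.3, 0.2]],
}
scores = {name: maxsim_score(query, emb) for name, emb in pages.items()}
print(max(scores, key=scores.get))  # page_a (score ~1.7 vs ~0.5)
```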

- Add ColPali to 🤗 transformers by
[@​tonywu71](https://redirect.github.com/tonywu71 ) and
[@​yonigozlan](https://redirect.github.com/yonigozlan ) in
[#​33736](https://redirect.github.com/huggingface/transformers/issues/33736 )
##### Falcon3
Falcon3 represents a natural evolution from previous releases, with an
emphasis on expanding the models’ science, math, and code capabilities.
This iteration includes five base models: Falcon3-1B-Base,
Falcon3-3B-Base, Falcon3-Mamba-7B-Base, Falcon3-7B-Base, and
Falcon3-10B-Base. In developing these models, the authors incorporated
several key innovations aimed at improving the models’ performance
while reducing training costs:
- **One pre-training:** They conducted a single large-scale pre-training
run on the 7B model, using 2048 H100 GPUs and 14 trillion tokens of web,
code, STEM, and curated high-quality and multilingual data.
- **Depth up-scaling for improved reasoning:** Building on recent studies
on the effects of model depth, they upscaled the 7B model to a 10B
parameter model by duplicating the redundant layers and continuing
pre-training with 2 trillion tokens (2TT) of high-quality data. This
yielded Falcon3-10B-Base, which achieves state-of-the-art zero-shot and
few-shot performance for models under 13B parameters.
- **Knowledge distillation for better tiny models:** To provide compact
and efficient alternatives, they developed Falcon3-1B-Base and
Falcon3-3B-Base using pruning and knowledge distillation on less than
100 billion tokens (100GT) of curated high-quality data, thereby
redefining pre-training efficiency.
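
Depth up-scaling, as described above, amounts to repeating a block of existing layers and then continuing pre-training. The plain-Python sketch below shows only the layer-stacking step. The 28-layer stack, the choice of layers 14 to 25 as the duplicated block, and the dict stand-ins for layers are all hypothetical; the notes do not say which layers Falcon3 duplicated.

```python
import copy


def depth_upscale(layers, start, end):
    """Insert deep copies of layers[start:end] right after the original block.

    Deep copies let the duplicated layers diverge during continued pre-training.
    """
    duplicated = [copy.deepcopy(layer) for layer in layers[start:end]]
    return layers[:end] + duplicated + layers[end:]


# Hypothetical 28-layer model; dicts stand in for layer weights
base = [{"layer": i} for i in range(28)]
upscaled = depth_upscale(base, 14, 26)
print(len(upscaled))  # 40
print([layer["layer"] for layer in upscaled[24:30]])  # [24, 25, 14, 15, 16, 17]
```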
- Add Falcon3 documentation by
[@​mokeddembillel](https://redirect.github.com/mokeddembillel ) in
[#​35307](https://redirect.github.com/huggingface/transformers/issues/35307 )
##### Bamba
Bamba-9B is a decoder-only language model based on the
[Mamba-2](https://redirect.github.com/state-spaces/mamba ) architecture
and is designed to handle a wide range of text generation tasks. It is
trained from scratch using a two-stage training approach. In the first
stage, the model is trained on 2 trillion tokens from the Dolma v1.7
dataset. In the second stage, it undergoes additional training on 200
billion tokens, leveraging a carefully curated blend of high-quality
data to further refine its performance and enhance output quality.
Check out all Bamba-9B model checkpoints
[here](https://redirect.github.com/foundation-model-stack/bamba).
- Add the Bamba Model by
[@​fabianlim](https://redirect.github.com/fabianlim ) in
[#​34982](https://redirect.github.com/huggingface/transformers/issues/34982 )
##### VitPose
ViTPose is a state-of-the-art vision transformer-based model for human
pose estimation, introduced by Yufei Xu, Jing Zhang, Qiming Zhang, and
Dacheng Tao in ["ViTPose: Simple Vision Transformer Baselines for Human
Pose Estimation"](https://arxiv.org/abs/2204.12484).
The model leverages the capabilities of vision transformers to
accurately predict 2D human keypoints. Adopting a top-down approach,
ViTPose estimates keypoint locations for each detected person, allowing
it to be easily used with any object detection model.

- Add VitPose by
[@​SangbumChoi](https://redirect.github.com/SangbumChoi ) and
[@​NielsRogge](https://redirect.github.com/NielsRogge ) in
[#​30530](https://redirect.github.com/huggingface/transformers/issues/30530 )
##### DINOv2 with registers
The DINOv2 with Registers model was proposed in [Vision Transformers
Need Registers](https://arxiv.org/abs/2309.16588 ) by Timothée Darcet,
Maxime Oquab, Julien Mairal, Piotr Bojanowski.
The [Vision
Transformer](https://huggingface.co/docs/transformers/main/en/model_doc/vit )
(ViT) is a transformer encoder model (BERT-like) originally introduced
to do supervised image classification on ImageNet.
Next, people figured out ways to make ViT work really well on
self-supervised image feature extraction (i.e. learning meaningful
features, also called embeddings) on images without requiring any
labels. Some example papers here include
[DINOv2](https://huggingface.co/docs/transformers/main/en/model_doc/dinov2 )
and
[MAE](https://huggingface.co/docs/transformers/main/en/model_doc/vit_mae ).
The authors of DINOv2 noticed that ViTs have artifacts in attention
maps. This is due to the model using some image patches as “registers”. The
authors propose a fix: add some new tokens (called “register”
tokens), which are appended to the input sequence and whose outputs are
discarded at the end of the network. This results in:
- no artifacts
- interpretable attention maps
- improved performance.
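
A minimal plain-Python sketch of where register tokens sit in the sequence: they follow the [CLS] token, pass through the network alongside the patch tokens, and their outputs are dropped before the features are used. The default of 4 registers, the zero placeholders (real registers are learned parameters), and the helper names are assumptions made for illustration.

```python
def add_registers(cls_token, patch_tokens, num_registers=4):
    """Build the input sequence [CLS] + registers + patches.

    Real register tokens are learned parameters; zeros are placeholders here.
    """
    dim = len(cls_token)
    registers = [[0.0] * dim for _ in range(num_registers)]
    return [cls_token, *registers, *patch_tokens]


def drop_registers(outputs, num_registers=4):
    """Keep the [CLS] output and the patch outputs; discard register outputs."""
    return [outputs[0], *outputs[1 + num_registers:]]


cls_token = [1.0, 1.0]
patches = [[float(i), float(i)] for i in range(256)]  # e.g. a 16x16 patch grid
sequence = add_registers(cls_token, patches)
print(len(sequence))  # 261 = 1 + 4 + 256
# The transformer layers would run here; the identity stands in for them.
features = drop_registers(sequence)
print(len(features))  # 257
```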
<!---->
- Add DINOv2 with registers by
[@​NielsRogge](https://redirect.github.com/NielsRogge ) in
[#​35348](https://redirect.github.com/huggingface/transformers/issues/35348 )
##### Emu3
The Emu3 model was proposed in [Emu3: Next-Token Prediction is All You
Need](https://arxiv.org/abs/2409.18869 ) by Xinlong Wang, Xiaosong Zhang,
Zhengxiong Luo, Quan Sun, Yufeng Cui, Jinsheng Wang, Fan Zhang, Yueze
Wang, Zhen Li, Qiying Yu, Yingli Zhao, Yulong Ao, Xuebin Min, Tao Li,
Boya Wu, Bo Zhao, Bowen Zhang, Liangdong Wang, Guang Liu, Zheqi He, Xi
Yang, Jingjing Liu, Yonghua Lin, Tiejun Huang, Zhongyuan Wang.
Emu3 sets a new standard in multimodal AI by using next-token prediction
to handle images, text, and videos. It simplifies multimodal modeling by
tokenizing all data into a unified format and training a single
transformer. Visual data is tokenized using vector quantization methods
based on the [VQ-VAE](https://arxiv.org/abs/1711.00937) model. Discretized
visual tokens are later fused with text token ids for image and text
generation.
Emu3 outperforms leading models like SDXL and LLaVA-1.6 in both
generation and perception tasks, without relying on diffusion or
compositional methods.
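
To make the tokenization step concrete, the plain-Python sketch below maps each visual feature vector to its nearest codebook entry, as in VQ-VAE, and then shifts the resulting codes past a text vocabulary so that image and text tokens can share one sequence. The codebook, the offset-by-vocabulary-size scheme, and all ids are hypothetical; Emu3's actual tokenizer and id layout may differ.

```python
def nearest_code(vector, codebook):
    """Index of the codebook entry with the smallest squared L2 distance."""
    def sq_dist(code):
        return sum((a - b) ** 2 for a, b in zip(vector, code))
    return min(range(len(codebook)), key=lambda i: sq_dist(codebook[i]))


def visual_token_ids(features, codebook, text_vocab_size):
    """Quantize each feature, then offset codes past the text vocabulary."""
    return [text_vocab_size + nearest_code(f, codebook) for f in features]


codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy 4-entry codebook
features = [[0.9, 0.1], [0.1, 0.8], [0.6, 0.7]]               # toy patch features
text_ids = [101, 2023, 102]                                    # hypothetical text ids
sequence = text_ids + visual_token_ids(features, codebook, text_vocab_size=32000)
print(sequence)  # [101, 2023, 102, 32001, 32002, 32003]
```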
- Add Emu3 by
[@​zucchini-nlp](https://redirect.github.com/zucchini-nlp ) in
[#​33770](https://redirect.github.com/huggingface/transformers/issues/33770 )
##### Cohere2
A new Cohere update was added through a new "Cohere2" set of classes.
- Add Cohere2 model by
[@​alexrs-cohere](https://redirect.github.com/alexrs-cohere ) in
[#​35224](https://redirect.github.com/huggingface/transformers/issues/35224 )
##### TextNet
[TextNet](https://arxiv.org/abs/2111.02394 ) is a lightweight and
efficient architecture designed specifically for text detection,
offering superior performance compared to traditional models like
MobileNetV3. With variants TextNet-T, TextNet-S, and TextNet-B (6.8M,
8.0M, and 8.9M parameters respectively), it achieves an excellent
balance between accuracy and inference speed.
- Add TextNet by
[@​jadechoghari](https://redirect.github.com/jadechoghari ) in
[#​34979](https://redirect.github.com/huggingface/transformers/issues/34979 )
##### DiffLlama
DiffLlama combines the Llama architecture with the attention mechanism of the
[Differential Transformer](https://arxiv.org/abs/2410.05258).
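
Differential attention computes two softmax attention maps and subtracts the second, scaled by a factor lambda, from the first before applying the result to the values: `(softmax(Q1 K1^T / sqrt(d)) - lambda * softmax(Q2 K2^T / sqrt(d))) V`. The plain-Python sketch below implements a single head with a fixed scalar `lam`. It omits the learned lambda re-parameterization, the multi-head layout, and the per-head normalization described in the paper, so treat it as an illustration of the formula rather than DiffLlama's code.

```python
import math


def softmax(row):
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention_map(q, k):
    """Row-wise softmax(q @ k^T / sqrt(d)) for lists of row vectors."""
    d = len(q[0])
    return [softmax([sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in k])
            for qi in q]


def diff_attention(q1, k1, q2, k2, v, lam):
    """(softmax(Q1 K1^T / sqrt d) - lam * softmax(Q2 K2^T / sqrt d)) @ V"""
    a1, a2 = attention_map(q1, k1), attention_map(q2, k2)
    weights = [[x - lam * y for x, y in zip(r1, r2)] for r1, r2 in zip(a1, a2)]
    return [[sum(w * vj[c] for w, vj in zip(row, v)) for c in range(len(v[0]))]
            for row in weights]


q1, k1 = [[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]]
q2, k2 = [[0.5, 0.5], [0.5, 0.5]], [[1.0, 0.0], [0.0, 1.0]]
v = [[1.0, 2.0], [3.0, 4.0]]
print(diff_attention(q1, k1, q2, k2, v, lam=0.8))
```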
- Add DiffLlama by
[@​weak-kajuma](https://redirect.github.com/weak-kajuma ) in
[#​34083](https://redirect.github.com/huggingface/transformers/issues/34083 )
##### PixtralLarge
The conversion script needed a few updates, while the modeling code was
barely changed!
- \[PixtralLarge] Update Pixtral conversion script to support large
format!
([#​34801](https://redirect.github.com/huggingface/transformers/issues/34801 ))
##### Moonshine
Moonshine is an autoregressive speech recognition encoder-decoder model
that improves upon Whisper's architecture. Namely, it replaces absolute
position embeddings with Rotary Position Embeddings (RoPE). This allows
Moonshine to handle audio inputs of any length, unlike Whisper, which is
restricted to fixed 30-second windows. It was introduced by Nat
Jeffries, Evan King, Manjunath Kudlur, Guy Nicholson, James Wang, and
Pete Warden in [Moonshine: Speech Recognition for Live Transcription and
Voice Commands](https://arxiv.org/abs/2410.15608).
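
Rotary embeddings encode position by rotating pairs of query/key features through an angle proportional to the token position, so the dot product between a rotated query and a rotated key depends only on their relative offset; no fixed-size position table caps the input length. A minimal plain-Python sketch follows, using the interleaved-pair convention (some implementations rotate the two halves of the vector instead, with the same relative-position property). The function names are hypothetical.

```python
import math


def rope(vec, pos, base=10000.0):
    """Rotate each pair (vec[i], vec[i+1]) by pos * base**(-i/d), i = 0, 2, 4, ..."""
    d = len(vec)
    assert d % 2 == 0, "feature dimension must be even"
    out = []
    for i in range(0, d, 2):
        theta = pos * base ** (-i / d)
        x, y = vec[i], vec[i + 1]
        out += [x * math.cos(theta) - y * math.sin(theta),
                x * math.sin(theta) + y * math.cos(theta)]
    return out


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


q, k = [0.3, -1.2, 0.8, 0.5], [1.0, 0.4, -0.7, 0.2]
near = dot(rope(q, 5), rope(k, 2))       # positions 5 and 2
far = dot(rope(q, 1005), rope(k, 1002))  # same offset, much later in the input
print(math.isclose(near, far, abs_tol=1e-9))  # True
```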
- Add Moonshine by [@​eustlb](https://redirect.github.com/eustlb )
in
[#​34784](https://redirect.github.com/huggingface/transformers/issues/34784 )
#### Quantization methods
##### VPTQ Quantization
From the VPTQ contributors:
> VPTQ is a novel Post-Training Quantization method that leverages
Vector Quantization to achieve high accuracy on LLMs at an extremely low
bit-width (<2-bit). VPTQ can compress 70B, even the 405B model, to 1-2
bits without retraining and maintain high accuracy. More details here:
https://github.com/microsoft/vptq
- FEAT : Adding VPTQ quantization method to HFQuantizer by
[@​wejoncy](https://redirect.github.com/wejoncy ) in
[#​34770](https://redirect.github.com/huggingface/transformers/issues/34770 )
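
Vector quantization reaches sub-2-bit budgets through simple arithmetic: a group of weights is stored as a single codebook index, so the index cost is spread over the whole group. The plain-Python sketch below computes that per-weight cost and the resulting weight storage. The codebook sizes and vector length are illustrative choices, and the estimate ignores codebook storage and any extra components VPTQ itself uses.

```python
import math


def vq_bits_per_weight(codebook_size, vector_length):
    """Index bits per weight when `vector_length` weights share one codebook index."""
    return math.log2(codebook_size) / vector_length


def weight_storage_gb(num_params, bits_per_weight):
    """Approximate weight storage in GB (1 GB = 1e9 bytes), ignoring codebooks."""
    return num_params * bits_per_weight / 8 / 1e9


print(vq_bits_per_weight(65536, 8))  # 2.0
print(vq_bits_per_weight(4096, 8))   # 1.5
print(weight_storage_gb(70e9, 16))   # 140.0 (16-bit baseline)
print(weight_storage_gb(70e9, 2.0))  # 17.5
```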
##### HIGGS Quantization
From the contributors:
> HIGGS is a new 0-shot quantization algorithm that combines Hadamard
preprocessing with MSE-Optimal quantization grids to achieve lower
quantization error and SOTA performance. You can find more information
in the [paper](https://arxiv.org/abs/2411.17525 ).
>
> Runtime support for HIGGS is implemented through
[FLUTE](https://arxiv.org/abs/2407.10960 ), and its
[library](https://redirect.github.com/HanGuo97/flute?tab=readme-ov-file ).
>
> This PR adds support for HIGGS+FLUTE into transformers allowing for
low-error 0-shot quantization and fast LLM inference.
- HIGGS Quantization Support by
[@​BlackSamorez](https://redirect.github.com/BlackSamorez ) in
[#​34997](https://redirect.github.com/huggingface/transformers/issues/34997 )
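
The Hadamard preprocessing step can be sketched as follows: an orthonormal Walsh-Hadamard transform mixes each group of weights so that an outlier is spread across all coordinates, which makes the values easier to fit with a fixed quantization grid. Because the normalized transform is its own inverse, it can be undone after dequantization. This plain-Python sketch uses a non-randomized transform and a hypothetical uniform 4-level grid in place of HIGGS's MSE-optimal grids; FLUTE's kernels are not modeled.

```python
import math


def hadamard_transform(x):
    """Orthonormal fast Walsh-Hadamard transform; len(x) must be a power of two."""
    n = len(x)
    assert n > 0 and n & (n - 1) == 0, "length must be a power of two"
    y = list(x)
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                y[i], y[i + h] = y[i] + y[i + h], y[i] - y[i + h]
        h *= 2
    scale = 1 / math.sqrt(n)
    return [v * scale for v in y]


def quantize_to_grid(values, grid):
    """Replace each value with the nearest grid point."""
    return [min(grid, key=lambda g: abs(v - g)) for v in values]


weights = [0.1, -2.5, 0.3, 0.05, 8.0, -0.2, 0.4, 0.0]  # one group, with an outlier
rotated = hadamard_transform(weights)
grid = [-3.0, -1.0, 1.0, 3.0]  # hypothetical 2-bit grid (4 levels)
# The transform is its own inverse, so applying it again returns to weight space
dequantized = hadamard_transform(quantize_to_grid(rotated, grid))
print([round(w, 2) for w in dequantized])
```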
#### Cleanup
We merged a cleanup for vision language models, to make sure all
models are standardized.
- VLMs: major clean up 🧼
([#​34502](https://redirect.github.com/huggingface/transformers/issues/34502 ))
#### Breaking changes
##### Conversion scripts
Many models in Transformers include scripts to convert the original
model checkpoints into a Transformers-compatible format. These scripts
can be found in the repo using the glob pattern
`models/**/convert_*.py`. They were a recurring source of vulnerability
reports and CVEs because many models were originally released using
insecure formats like older PyTorch `.bin` weights or `pickle` files.
The conversion scripts had to open these formats, and this meant that
they were vulnerable to maliciously crafted inputs.
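
In a source checkout, the glob pattern above can be evaluated with `pathlib`. This sketch assumes the pattern is relative to the package directory `src/transformers` inside the repository; the helper name is hypothetical.

```python
from pathlib import Path


def find_conversion_scripts(repo_root):
    """Return scripts matching models/**/convert_*.py, relative to
    src/transformers (assumed package location inside the repository)."""
    package_dir = Path(repo_root) / "src" / "transformers"
    return sorted(path.relative_to(package_dir).as_posix()
                  for path in package_dir.glob("models/**/convert_*.py"))


if __name__ == "__main__":
    # Run from the root of a source checkout
    for script in find_conversion_scripts("."):
        print(script)
```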
In practice, we do not see this as a serious vulnerability. The
conversion scripts are never imported or called by the rest of the
library; each script is standalone, and so the only way to exploit the
vulnerability is to create a malicious checkpoint, induce a user to
download it, and then also induce them to manually call a specific
conversion script on it.
However, even if there is little practical risk of an exploit, we are
aware that open vulnerability reports create a compliance problem for
users, and so beginning with this release we will be excluding these
conversion scripts from release branches and wheels. They will remain
accessible to developers on the `main` branch.
- 🚨🚨🚨 Delete conversion scripts when making release wheels by
[@​Rocketknight1](https://redirect.github.com/Rocketknight1 ) in
[#​35296](https://redirect.github.com/huggingface/transformers/issues/35296 )
##### Backtracking in Nougat
A regular expression used within the Nougat code has been modified to
ensure it does not hang. The method should output the same results but
we cannot guarantee it; we recommend upgrading to the latest
transformers if you use this model to ensure your code is
performance-optimized.
- 🚨🚨🚨 Limit backtracking in Nougat regexp by
[@​qubvel](https://redirect.github.com/qubvel ) in
[#​35264](https://redirect.github.com/huggingface/transformers/issues/35264 )
##### Whisper decoding
This PR finalizes work that aims to enable short-form (< 30 secs) and
long-form generation using temperature fallback. It is a significant
improvement to the whisper codebase, but it does result in the following
breaking changes:
➡️ **Previously:**\
• Short-form: Returned a `ModelOutput` or `torch.LongTensor`, including
decoder input IDs and the EOS token ID.\
• Long-form: Returned a `Dict` or `torch.LongTensor`, excluding decoder
input IDs and the EOS token ID.
➡️ **From now on:**\
Short-form and long-form generation are now treated identically, meaning
output differentiation based on these modes is no longer applicable.
Decoder input IDs and EOS token IDs are never returned, except in two
specific cases: when `return_dict_in_generate=True` and
(`return_timestamps=False` or `force_unique_generate_call=True`).
In these cases, the output will be a `ModelOutput`, which is the result of
the underlying call to GenerationMixin’s generate. Indeed,
`return_timestamps=False` ensures no seeking occurs; only a single call
to generate is made. Therefore, this output includes both decoder input
IDs and the EOS token ID.
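
The return rule above can be restated as a small predicate. The helper below is hypothetical and only encodes the conditions listed in these notes; its keyword names match the `generate` arguments they mention.

```python
def returns_prompt_and_eos(return_dict_in_generate=False,
                           return_timestamps=False,
                           force_unique_generate_call=False):
    """True when, per the notes above, Whisper's output is a ModelOutput that
    still contains the decoder input ids and the EOS token id."""
    return return_dict_in_generate and (
        not return_timestamps or force_unique_generate_call
    )


# e.g. model.generate(input_features, return_dict_in_generate=True, return_timestamps=False)
print(returns_prompt_and_eos(True, False, False))   # True
print(returns_prompt_and_eos(True, True, False))    # False
print(returns_prompt_and_eos(True, True, True))     # True
print(returns_prompt_and_eos(False, False, False))  # False
```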
- \[Whisper] 🚨 Fix whisper decoding 🚨 by
[@​eustlb](https://redirect.github.com/eustlb ) in
[#​34135](https://redirect.github.com/huggingface/transformers/issues/34135 )
##### Attention refactor
To make the attention code cleaner, better isolated, and more future-proof,
the attention layers have been refactored: each model keeps its attention
layer code in its own modeling file, while the attention functions for SDPA,
Flash Attention, and other attention types have been moved to a common file.
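
A minimal plain-Python sketch of the pattern described above: each model file keeps its own attention layer, while the attention kernels live in one shared registry and are looked up by name. The registry, decorator, and class names are hypothetical and only illustrate the design. In Transformers the backend is chosen through the `attn_implementation` argument (for example `"eager"` or `"sdpa"`), and the library's internal registry may be organized differently.

```python
import math
from typing import Callable, Dict, List

Matrix = List[List[float]]
AttentionFn = Callable[[Matrix, Matrix, Matrix], Matrix]

# Shared module: every attention backend is registered under a name
ATTENTION_BACKENDS: Dict[str, AttentionFn] = {}


def register_attention(name: str):
    def decorator(fn: AttentionFn) -> AttentionFn:
        ATTENTION_BACKENDS[name] = fn
        return fn
    return decorator


@register_attention("eager")
def eager_attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Plain scaled dot-product attention, one query row at a time."""
    d = len(q[0])
    out = []
    for qi in q:
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in k]
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        out.append([sum(e / total * vj[c] for e, vj in zip(exps, v))
                    for c in range(len(v[0]))])
    return out


# Model file: the layer stays local, only the kernel is looked up by name
class ToyAttentionLayer:
    def __init__(self, attn_implementation: str = "eager"):
        self.attention_fn = ATTENTION_BACKENDS[attn_implementation]

    def __call__(self, q: Matrix, k: Matrix, v: Matrix) -> Matrix:
        return self.attention_fn(q, k, v)


layer = ToyAttentionLayer("eager")
print(layer([[1.0, 0.0]], [[1.0, 0.0], [0.0, 1.0]], [[2.0, 4.0], [2.0, 4.0]]))
```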
- 🚨 All attention refactor🚨 by
[@​ArthurZucker](https://redirect.github.com/ArthurZucker ) in
[#​35235](https://redirect.github.com/huggingface/transformers/issues/35235 )
#### Bugfixes and improvements
- \[tokenizers] Ensure that add_prefix_space is propagated to
backend_tokenizer.pre_tokenizer
([#​35593](https://redirect.github.com/huggingface/transformers/issues/35593 ))
- Setup loss_type in config at model init time
([#​34616](https://redirect.github.com/huggingface/transformers/issues/34616 ))
- \[docs] Update Python version in translations by
[@​jla524](https://redirect.github.com/jla524 ) in
[#​35096](https://redirect.github.com/huggingface/transformers/issues/35096 )
- \[docs] top_p, top_k, temperature docstrings by
[@​stevhliu](https://redirect.github.com/stevhliu ) in
[#​35065](https://redirect.github.com/huggingface/transformers/issues/35065 )
- Fix private forked repo. CI by
[@​ydshieh](https://redirect.github.com/ydshieh ) in
[#​35114](https://redirect.github.com/huggingface/transformers/issues/35114 )
- Add feature dim attributes to BitLinear for easier PEFT integration by
[@​agostinv](https://redirect.github.com/agostinv ) in
[#​34946](https://redirect.github.com/huggingface/transformers/issues/34946 )
- Update I-JEPA checkpoints path by
[@​qubvel](https://redirect.github.com/qubvel ) in
[#​35120](https://redirect.github.com/huggingface/transformers/issues/35120 )
- Fix GA loss bugs and add unit test by
[@​techkang](https://redirect.github.com/techkang ) in
[#​35121](https://redirect.github.com/huggingface/transformers/issues/35121 )
- \[I-JEPA] Update docs by
[@​NielsRogge](https://redirect.github.com/NielsRogge ) in
[#​35148](https://redirect.github.com/huggingface/transformers/issues/35148 )
- Corrected typo in agent system prompts by
[@​Uvi-12](https://redirect.github.com/Uvi-12 ) in
[#​35143](https://redirect.github.com/huggingface/transformers/issues/35143 )
- Option to set 'non_blocking' for to(device) in BatchEncoding and
BatchFeature by
[@​daniel-bogdoll](https://redirect.github.com/daniel-bogdoll ) in
[#​34883](https://redirect.github.com/huggingface/transformers/issues/34883 )
- Fix typo in EETQ Tests by
[@​MekkCyber](https://redirect.github.com/MekkCyber ) in
[#​35160](https://redirect.github.com/huggingface/transformers/issues/35160 )
- Cleanup: continue the init refactor by
[@​LysandreJik](https://redirect.github.com/LysandreJik ) in
[#​35167](https://redirect.github.com/huggingface/transformers/issues/35167 )
- Super tiny fix logging message by
[@​fzyzcjy](https://redirect.github.com/fzyzcjy ) in
[#​35132](https://redirect.github.com/huggingface/transformers/issues/35132 )
- Fixed typo of 'avilable' in prompts.py by
[@​Uvi-12](https://redirect.github.com/Uvi-12 ) in
[#​35145](https://redirect.github.com/huggingface/transformers/issues/35145 )
- \[CI] Fix bnb quantization tests with accelerate>=1.2.0 by
[@​matthewdouglas](https://redirect.github.com/matthewdouglas ) in
[#​35172](https://redirect.github.com/huggingface/transformers/issues/35172 )
- Fix `num_items_in_batch` not being an integer by
[@​xspirus](https://redirect.github.com/xspirus ) in
[#​35115](https://redirect.github.com/huggingface/transformers/issues/35115 )
- Assisted decoding multi-gpu by
[@​zucchini-nlp](https://redirect.github.com/zucchini-nlp ) in
[#​35116](https://redirect.github.com/huggingface/transformers/issues/35116 )
- Fix file path for shard_num 1 with mllama converter by
[@​strangiato](https://redirect.github.com/strangiato ) in
[#​35053](https://redirect.github.com/huggingface/transformers/issues/35053 )
- Support BatchNorm in Hubert pos_conv_emb as in fairseq by
[@​gallilmaimon](https://redirect.github.com/gallilmaimon ) in
[#​34389](https://redirect.github.com/huggingface/transformers/issues/34389 )
- Remove unnecessary masked_fill in deberta models by
[@​xadupre](https://redirect.github.com/xadupre ) in
[#​35182](https://redirect.github.com/huggingface/transformers/issues/35182 )
- Fix DBRX LayerNorm init method by
[@​hgt312](https://redirect.github.com/hgt312 ) in
[#​35177](https://redirect.github.com/huggingface/transformers/issues/35177 )
- Fixing GGUF support for StableLm by
[@​MekkCyber](https://redirect.github.com/MekkCyber ) in
[#​35060](https://redirect.github.com/huggingface/transformers/issues/35060 )
- \[i18n-ar] Translated file : `docs/source/ar/community.md` into Arabic
by [@​AhmedAlmaghz](https://redirect.github.com/AhmedAlmaghz ) in
[#​33027](https://redirect.github.com/huggingface/transformers/issues/33027 )
- Multiple typo fixes in NLP, Audio docs by
[@​henryhmko](https://redirect.github.com/henryhmko ) in
[#​35181](https://redirect.github.com/huggingface/transformers/issues/35181 )
- Only import torch.distributed if it is available by
[@​GaetanLepage](https://redirect.github.com/GaetanLepage ) in
[#​35133](https://redirect.github.com/huggingface/transformers/issues/35133 )
- \[i18n-<languageCode>] Translating Benchmarks.md to Chinese by
[@​asdkfjsd](https://redirect.github.com/asdkfjsd ) in
[#​35137](https://redirect.github.com/huggingface/transformers/issues/35137 )
- \[docs] Fix FlashAttention link by
[@​stevhliu](https://redirect.github.com/stevhliu ) in
[#​35171](https://redirect.github.com/huggingface/transformers/issues/35171 )
- Update data collator docstrings to accurately reference Nvidia tensor
core compute capability version by
[@​johngrahamreynolds](https://redirect.github.com/johngrahamreynolds )
in
[#​35188](https://redirect.github.com/huggingface/transformers/issues/35188 )
- \[i18n-<languageCode>] Translating agents.md to Chinese by
[@​HMJ0628](https://redirect.github.com/HMJ0628 ) in
[#​35139](https://redirect.github.com/huggingface/transformers/issues/35139 )
- BLIP: enable device map by
[@​zucchini-nlp](https://redirect.github.com/zucchini-nlp ) in
[#​34850](https://redirect.github.com/huggingface/transformers/issues/34850 )
- 🧹 Remove deprecated RotaryEmbedding parts in the Attention layers by
[@​Cyrilvallez](https://redirect.github.com/Cyrilvallez ) in
[#​34858](https://redirect.github.com/huggingface/transformers/issues/34858 )
- \[PEFT] Better Trainer error when prompt learning with loading best
model at the end by
[@​BenjaminBossan](https://redirect.github.com/BenjaminBossan ) in
[#​35087](https://redirect.github.com/huggingface/transformers/issues/35087 )
- Cleanup: continue the init refactor by
[@​LysandreJik](https://redirect.github.com/LysandreJik ) in
[#​35170](https://redirect.github.com/huggingface/transformers/issues/35170 )
- Fix CI by
[@​Cyrilvallez](https://redirect.github.com/Cyrilvallez ) in
[#​35208](https://redirect.github.com/huggingface/transformers/issues/35208 )
- Fix seamless TTS generate by
[@​ylacombe](https://redirect.github.com/ylacombe ) in
[#​34968](https://redirect.github.com/huggingface/transformers/issues/34968 )
- docs: clarify initializer_range parameter description in
Idefics3VisionConfig by
[@​h3110Fr13nd](https://redirect.github.com/h3110Fr13nd ) in
[#​35215](https://redirect.github.com/huggingface/transformers/issues/35215 )
- Fixed typo of 'indentifier' in audio_utils.py by
[@​Uvi-12](https://redirect.github.com/Uvi-12 ) in
[#​35226](https://redirect.github.com/huggingface/transformers/issues/35226 )
- Fix type hints for apply_chat_template by
[@​Rocketknight1](https://redirect.github.com/Rocketknight1 ) in
[#​35216](https://redirect.github.com/huggingface/transformers/issues/35216 )
- Support Python 3.10+ Union style in chat template type hints parsing
by [@​RezaRahemtola](https://redirect.github.com/RezaRahemtola ) in
[#​35103](https://redirect.github.com/huggingface/transformers/issues/35103 )
- Refactoring `AssistedCandidateGenerator` for Improved Modularity and
Reusability by
[@​keyboardAnt](https://redirect.github.com/keyboardAnt ) and
[@​jmamou](https://redirect.github.com/jmamou ) in
[#​35009](https://redirect.github.com/huggingface/transformers/issues/35009 )
- Change back to `Thread` for SF conversion by
[@​ydshieh](https://redirect.github.com/ydshieh ) in
[#​35236](https://redirect.github.com/huggingface/transformers/issues/35236 )
- \[Init refactor] Modular changes by
[@​LysandreJik](https://redirect.github.com/LysandreJik ) in
[#​35240](https://redirect.github.com/huggingface/transformers/issues/35240 )
- Fix typo in chat template example by
[@​EricWinsorDSIT](https://redirect.github.com/EricWinsorDSIT ) in
[#​35250](https://redirect.github.com/huggingface/transformers/issues/35250 )
- Run model as compressed/uncompressed mode by
[@​horheynm](https://redirect.github.com/horheynm ) in
[#​34719](https://redirect.github.com/huggingface/transformers/issues/34719 )
- skip Fuyu from test_generate by
[@​nhamanasu](https://redirect.github.com/nhamanasu ) in
[#​35246](https://redirect.github.com/huggingface/transformers/issues/35246 )
- \[tests] fix "Tester object has no attribute '\_testMethodName'" by
[@​faaany](https://redirect.github.com/faaany ) in
[#​34910](https://redirect.github.com/huggingface/transformers/issues/34910 )
- Use `rsfE` with `pytest` by
[@​ydshieh](https://redirect.github.com/ydshieh ) in
[#​35119](https://redirect.github.com/huggingface/transformers/issues/35119 )
- Update AMD docker image (rocm 6.1) by
[@​ivarflakstad](https://redirect.github.com/ivarflakstad ) in
[#​35259](https://redirect.github.com/huggingface/transformers/issues/35259 )
- Fixed typos in Audio Classification Documentation by
[@​Uvi-12](https://redirect.github.com/Uvi-12 ) in
[#​35263](https://redirect.github.com/huggingface/transformers/issues/35263 )
- Translating agents_advanced.md to Chinese by
[@​HMJ0628](https://redirect.github.com/HMJ0628 ) in
[#​35231](https://redirect.github.com/huggingface/transformers/issues/35231 )
- Fix FSDP no longer working by
[@​muellerzr](https://redirect.github.com/muellerzr ) in
[#​35212](https://redirect.github.com/huggingface/transformers/issues/35212 )
- don't use no_sync when deepspeed doesn't support it for certain zero
stages by [@​winglian](https://redirect.github.com/winglian ) in
[#​35157](https://redirect.github.com/huggingface/transformers/issues/35157 )
- \[i18n-Chinese] Translating perf_train_cpu.md to Chinese by
[@​asdkfjsd](https://redirect.github.com/asdkfjsd ) in
[#​35242](https://redirect.github.com/huggingface/transformers/issues/35242 )
- Fall back to slow image processor in ImageProcessingAuto when no fast
processor available by
[@​yonigozlan](https://redirect.github.com/yonigozlan ) in
[#​34785](https://redirect.github.com/huggingface/transformers/issues/34785 )
- Aggregate test summary files in CircleCI workflow runs by
[@​ydshieh](https://redirect.github.com/ydshieh ) in
[#​34989](https://redirect.github.com/huggingface/transformers/issues/34989 )
- Blip: fix offloading and MP tests by
[@​zucchini-nlp](https://redirect.github.com/zucchini-nlp ) in
[#​35239](https://redirect.github.com/huggingface/transformers/issues/35239 )
- Fix: model used to test ggml conversion of Falcon-7b is incorrect by
[@​MekkCyber](https://redirect.github.com/MekkCyber ) in
[#​35083](https://redirect.github.com/huggingface/transformers/issues/35083 )
- Temporarily disable amd push ci by
[@​ivarflakstad](https://redirect.github.com/ivarflakstad ) in
[#​35293](https://redirect.github.com/huggingface/transformers/issues/35293 )
- Delete redundant for-loop checks by
[@​zhanluxianshen](https://redirect.github.com/zhanluxianshen ) in
[#​35288](https://redirect.github.com/huggingface/transformers/issues/35288 )
- \[Whisper] patch float type on mps by
[@​eustlb](https://redirect.github.com/eustlb ) in
[#​35295](https://redirect.github.com/huggingface/transformers/issues/35295 )
- Fix typos in Translated Audio Classification Docs by
[@​jla524](https://redirect.github.com/jla524 ) in
[#​35287](https://redirect.github.com/huggingface/transformers/issues/35287 )
- Translating perf_infer_gpu_multi.md to Chinese by
[@​HMJ0628](https://redirect.github.com/HMJ0628 ) in
[#​35271](https://redirect.github.com/huggingface/transformers/issues/35271 )
- Fix errors in quicktour\[zh] by
[@​zhanluxianshen](https://redirect.github.com/zhanluxianshen ) in
[#​35272](https://redirect.github.com/huggingface/transformers/issues/35272 )
- Improved documentation of Automatic speech recognition by
[@​Uvi-12](https://redirect.github.com/Uvi-12 ) in
[#​35268](https://redirect.github.com/huggingface/transformers/issues/35268 )
- fix modular order by
[@​ArthurZucker](https://redirect.github.com/ArthurZucker ) in
[#​35297](https://redirect.github.com/huggingface/transformers/issues/35297 )
- Add sdpa for Beit by
[@​OmarManzoor](https://redirect.github.com/OmarManzoor ) in
[#​34941](https://redirect.github.com/huggingface/transformers/issues/34941 )
- Support for SDPA for SAM models by
[@​MagnusS0](https://redirect.github.com/MagnusS0 ) in
[#​34110](https://redirect.github.com/huggingface/transformers/issues/34110 )
- remove `benchmark` job in `push-important-models.yml` by
[@​ydshieh](https://redirect.github.com/ydshieh ) in
[#​35292](https://redirect.github.com/huggingface/transformers/issues/35292 )
- Fix typos in translated quicktour docs by
[@​jla524](https://redirect.github.com/jla524 ) in
[#​35302](https://redirect.github.com/huggingface/transformers/issues/35302 )
- Fix image preview in multi-GPU inference docs by
[@​jla524](https://redirect.github.com/jla524 ) in
[#​35303](https://redirect.github.com/huggingface/transformers/issues/35303 )
- Remove unused parameter in docs by
[@​zzzzzsa](https://redirect.github.com/zzzzzsa ) in
[#​35306](https://redirect.github.com/huggingface/transformers/issues/35306 )
- Add Cohere2 docs details by
[@​alexrs-cohere](https://redirect.github.com/alexrs-cohere ) in
[#​35294](https://redirect.github.com/huggingface/transformers/issues/35294 )
- Fixed typo in audio_classification.md by
[@​Uvi-12](https://redirect.github.com/Uvi-12 ) in
[#​35305](https://redirect.github.com/huggingface/transformers/issues/35305 )
- \[docs] Improve register_pipeline by
[@​stevhliu](https://redirect.github.com/stevhliu ) in
[#​35300](https://redirect.github.com/huggingface/transformers/issues/35300 )
- Fix loading with only state dict and low_cpu_mem_usage = True by
[@​SunMarc](https://redirect.github.com/SunMarc ) in
[#​35217](https://redirect.github.com/huggingface/transformers/issues/35217 )
- \[tests] make cuda-only tests device-agnostic by
[@​faaany](https://redirect.github.com/faaany ) in
[#​35222](https://redirect.github.com/huggingface/transformers/issues/35222 )
- Trigger GitHub CI with a comment on PR by
[@​ydshieh](https://redirect.github.com/ydshieh ) in
[#​35211](https://redirect.github.com/huggingface/transformers/issues/35211 )
- change bnb tests by
[@​jiqing-feng](https://redirect.github.com/jiqing-feng ) in
[#​34713](https://redirect.github.com/huggingface/transformers/issues/34713 )
- \[Whisper] fix docstrings typo by
[@​eustlb](https://redirect.github.com/eustlb ) in
[#​35319](https://redirect.github.com/huggingface/transformers/issues/35319 )
- feat: add `benchmarks_entrypoint.py` by
[@​McPatate](https://redirect.github.com/McPatate ) in
[#​34495](https://redirect.github.com/huggingface/transformers/issues/34495 )
- Fix documentation for ColPali by
[@​tonywu71](https://redirect.github.com/tonywu71 ) in
[#​35321](https://redirect.github.com/huggingface/transformers/issues/35321 )
- Update comment CI bot by
[@​ydshieh](https://redirect.github.com/ydshieh ) in
[#​35323](https://redirect.github.com/huggingface/transformers/issues/35323 )
- PaliGemma: Make sure to add `<eos>` to suffix if `<image>` is present in
`text` by [@​probicheaux](https://redirect.github.com/probicheaux )
in
[#​35201](https://redirect.github.com/huggingface/transformers/issues/35201 )
- Fix some fa2 tests by
[@​ArthurZucker](https://redirect.github.com/ArthurZucker ) in
[#​35340](https://redirect.github.com/huggingface/transformers/issues/35340 )
- Modernbert Release Fixes by
[@​warner-benjamin](https://redirect.github.com/warner-benjamin )
in
[#​35344](https://redirect.github.com/huggingface/transformers/issues/35344 )
- \[`docs`] Add link to ModernBERT Text Classification GLUE finetuning
script by [@​tomaarsen](https://redirect.github.com/tomaarsen ) in
[#​35347](https://redirect.github.com/huggingface/transformers/issues/35347 )
- fix onnx export of speech foundation models by
[@​nikosanto13](https://redirect.github.com/nikosanto13 ) in
[#​34224](https://redirect.github.com/huggingface/transformers/issues/34224 )
- \[`Mamba2`] Fix caching, slow path, and multi-gpu by
[@​vasqu](https://redirect.github.com/vasqu ) in
[#​35154](https://redirect.github.com/huggingface/transformers/issues/35154 )
- Reduce CircleCI usage by
[@​ydshieh](https://redirect.github.com/ydshieh ) in
[#​35355](https://redirect.github.com/huggingface/transformers/issues/35355 )
- Implement AsyncTextIteratorStreamer for asynchronous streaming by
[@​CISC](https://redirect.github.com/CISC ) in
[#​34931](https://redirect.github.com/huggingface/transformers/issues/34931 )
- Cleaner attention interfaces by
[@​Cyrilvallez](https://redirect.github.com/Cyrilvallez ) in
[#​35342](https://redirect.github.com/huggingface/transformers/issues/35342 )
- Add Tensor Parallel support for Qwen2VL by
[@​jla524](https://redirect.github.com/jla524 ) in
[#​35050](https://redirect.github.com/huggingface/transformers/issues/35050 )
- fix zoedepth initialization error under deepspeed zero3 by
[@​Tavish9](https://redirect.github.com/Tavish9 ) in
[#​35011](https://redirect.github.com/huggingface/transformers/issues/35011 )
- Aurevoir PyTorch 1 by
[@​ydshieh](https://redirect.github.com/ydshieh ) in
[#​35358](https://redirect.github.com/huggingface/transformers/issues/35358 )
- bugfix: torch.export failure caused by `_make_causal_mask` by
[@​jiwoong-choi](https://redirect.github.com/jiwoong-choi ) in
[#​35291](https://redirect.github.com/huggingface/transformers/issues/35291 )
- update codecarbon by
[@​nhamanasu](https://redirect.github.com/nhamanasu ) in
[#​35243](https://redirect.github.com/huggingface/transformers/issues/35243 )
- Update test fetcher when we want to test all by
[@​ArthurZucker](https://redirect.github.com/ArthurZucker ) in
[#​35364](https://redirect.github.com/huggingface/transformers/issues/35364 )
- Use `weights_only=True` with `torch.load` for `transfo_xl` by
[@​ydshieh](https://redirect.github.com/ydshieh ) in
[#​35241](https://redirect.github.com/huggingface/transformers/issues/35241 )
- Make `test_generate_with_static_cache` even less flaky by
[@​ydshieh](https://redirect.github.com/ydshieh ) in
[#​34995](https://redirect.github.com/huggingface/transformers/issues/34995 )
- Improve modular transformers documentation by
[@​joelpaulkoch](https://redirect.github.com/joelpaulkoch ) in
[#​35322](https://redirect.github.com/huggingface/transformers/issues/35322 )
- Improved Documentation Of Audio Classification by
[@​Uvi-12](https://redirect.github.com/Uvi-12 ) in
[#​35368](https://redirect.github.com/huggingface/transformers/issues/35368 )
- \[docs] Follow up register_pipeline by
[@​stevhliu](https://redirect.github.com/stevhliu ) in
[#​35310](https://redirect.github.com/huggingface/transformers/issues/35310 )
- owlvit/2 dynamic input resolution by
[@​bastrob](https://redirect.github.com/bastrob ) in
[#​34764](https://redirect.github.com/huggingface/transformers/issues/34764 )
- Fix new FA2 if `is_causal` is passed explicitly by
[@​Cyrilvallez](https://redirect.github.com/Cyrilvallez ) in
[#​35390](https://redirect.github.com/huggingface/transformers/issues/35390 )
- bitsandbytes: simplify 8bit dequantization by
[@​matthewdouglas](https://redirect.github.com/matthewdouglas ) in
[#​35068](https://redirect.github.com/huggingface/transformers/issues/35068 )
- make LlamaModel.\_update_causal_mask torch compilable by
[@​winglian](https://redirect.github.com/winglian ) in
[#​35187](https://redirect.github.com/huggingface/transformers/issues/35187 )
- Patch GPTNeoX to use adequate FA2 if position_ids is provided by
[@​taha-yassine](https://redirect.github.com/taha-yassine ) in
[#​35318](https://redirect.github.com/huggingface/transformers/issues/35318 )
- uniformize kwargs for SAM by
[@​tibor-reiss](https://redirect.github.com/tibor-reiss ) in
[#​34578](https://redirect.github.com/huggingface/transformers/issues/34578 )
- Deprecate \_is_quantized_training_enabled by
[@​MekkCyber](https://redirect.github.com/MekkCyber ) in
[#​34991](https://redirect.github.com/huggingface/transformers/issues/34991 )
- Scale loss before backward by
[@​qgallouedec](https://redirect.github.com/qgallouedec ) in
[#​35207](https://redirect.github.com/huggingface/transformers/issues/35207 )
- Fix typing in docstring for `PaliGemmaProcessor` by
[@​alvarobartt](https://redirect.github.com/alvarobartt ) in
[#​35278](https://redirect.github.com/huggingface/transformers/issues/35278 )
- Fix: VPTQ test by
[@​MekkCyber](https://redirect.github.com/MekkCyber ) in
[#​35394](https://redirect.github.com/huggingface/transformers/issues/35394 )
- add bnb support for Ascend NPU by
[@​statelesshz](https://redirect.github.com/statelesshz ) in
[#​31512](https://redirect.github.com/huggingface/transformers/issues/31512 )
- bugfix Idefics3 processor: gracefully handle cases with text and no
images by [@​mfarre](https://redirect.github.com/mfarre ) in
[#​35363](https://redirect.github.com/huggingface/transformers/issues/35363 )
- Adding logger.info about update_torch_dtype in some quantizers by
[@​MekkCyber](https://redirect.github.com/MekkCyber ) in
[#​35046](https://redirect.github.com/huggingface/transformers/issues/35046 )
- Add compile test for fast image processor by
[@​yonigozlan](https://redirect.github.com/yonigozlan ) in
[#​35184](https://redirect.github.com/huggingface/transformers/issues/35184 )
- Disable `.github/workflows/self-comment-ci.yml` for now by
[@​ydshieh](https://redirect.github.com/ydshieh ) in
[#​35366](https://redirect.github.com/huggingface/transformers/issues/35366 )
- enable non-CUDA AWQ model support without modifying version by
[@​jiqing-feng](https://redirect.github.com/jiqing-feng ) in
[#​35334](https://redirect.github.com/huggingface/transformers/issues/35334 )
- \[`GPTQ`, `CompressedTensors`] Fix unsafe imports and metadata check by
[@​vasqu](https://redirect.github.com/vasqu ) in
[#​34815](https://redirect.github.com/huggingface/transformers/issues/34815 )
- Drop inplace operation for loss computation with gradient accumulation
by [@​qgallouedec](https://redirect.github.com/qgallouedec ) in
[#​35416](https://redirect.github.com/huggingface/transformers/issues/35416 )
- Fix: Rename keyword argument in_channels to num_channels by
[@​ningyuv](https://redirect.github.com/ningyuv ) in
[#​35289](https://redirect.github.com/huggingface/transformers/issues/35289 )
- CLIP conversion script - Change fairseq to OpenAI by
[@​gau-nernst](https://redirect.github.com/gau-nernst ) in
[#​35384](https://redirect.github.com/huggingface/transformers/issues/35384 )
- Fix f-string to show `ACCELERATE_MIN_VERSION` on error by
[@​KSafran](https://redirect.github.com/KSafran ) in
[#​35189](https://redirect.github.com/huggingface/transformers/issues/35189 )
- Fix `model_accepts_loss_kwargs` for timm model by
[@​qubvel](https://redirect.github.com/qubvel ) in
[#​35257](https://redirect.github.com/huggingface/transformers/issues/35257 )
- Update perf_infer_gpu_one.md: fix a typo by
[@​martin0258](https://redirect.github.com/martin0258 ) in
[#​35441](https://redirect.github.com/huggingface/transformers/issues/35441 )
- Add compute_loss_func to Seq2SeqTrainer by
[@​d223302](https://redirect.github.com/d223302 ) in
[#​35136](https://redirect.github.com/huggingface/transformers/issues/35136 )
- Update docs for `sdpa_kernel` by
[@​jla524](https://redirect.github.com/jla524 ) in
[#​35410](https://redirect.github.com/huggingface/transformers/issues/35410 )
- \[i18n-ar] Translated file:
`docs/source/ar/tasks/question_answering.md` into Arabic by
[@​AhmedAlmaghz](https://redirect.github.com/AhmedAlmaghz ) in
[#​35196](https://redirect.github.com/huggingface/transformers/issues/35196 )
- \[i18n-ar] Translated file: `docs/source/ar/tasks/summarization.md`
into Arabic by
[@​AhmedAlmaghz](https://redirect.github.com/AhmedAlmaghz ) in
[#​35195](https://redirect.github.com/huggingface/transformers/issues/35195 )
- Update translated docs for `sdpa_kernel` by
[@​jla524](https://redirect.github.com/jla524 ) in
[#​35461](https://redirect.github.com/huggingface/transformers/issues/35461 )
- Reintroduce Python 3.9 support for ModernBERT by
[@​tomaarsen](https://redirect.github.com/tomaarsen ) in
[#​35458](https://redirect.github.com/huggingface/transformers/issues/35458 )
- Fix new BNB test failures by
[@​matthewdouglas](https://redirect.github.com/matthewdouglas ) in
[#​35345](https://redirect.github.com/huggingface/transformers/issues/35345 )
- Fix docs typos. by
[@​zhanluxianshen](https://redirect.github.com/zhanluxianshen ) in
[#​35465](https://redirect.github.com/huggingface/transformers/issues/35465 )
- Fix paligemma warning message by
[@​hiyouga](https://redirect.github.com/hiyouga ) in
[#​35486](https://redirect.github.com/huggingface/transformers/issues/35486 )
#### Significant community contributions
The following contributors have made significant changes to the library
over the last release:
- [@​ydshieh](https://redirect.github.com/ydshieh )
- Fix private forked repo. CI
([#​35114](https://redirect.github.com/huggingface/transformers/issues/35114 ))
- Change back to `Thread` for SF conversion
([#​35236](https://redirect.github.com/huggingface/transformers/issues/35236 ))
- Use `rsfE` with `pytest`
([#​35119](https://redirect.github.com/huggingface/transformers/issues/35119 ))
- Aggregate test summary files in CircleCI workflow runs
([#​34989](https://redirect.github.com/huggingface/transformers/issues/34989 ))
- remove `benchmark` job in `push-important-models.yml`
([#​35292](https://redirect.github.com/huggingface/transformers/issues/35292 ))
- Trigger GitHub CI with a comment on PR
([#​35211](https://redirect.github.com/huggingface/transformers/issues/35211 ))
- Update comment CI bot
([#​35323](https://redirect.github.com/huggingface/transformers/issues/35323 ))
- Reduce CircleCI usage
([#​35355](https://redirect.github.com/huggingface/transformers/issues/35355 ))
- Aurevoir PyTorch 1
([#​35358](https://redirect.github.com/huggingface/transformers/issues/35358 ))
- Use `weights_only=True` with `torch.load` for `transfo_xl`
([#​35241](https://redirect.github.com/huggingface/transformers/issues/35241 ))
- Make `test_generate_with_static_cache` even less flaky
([#​34995](https://redirect.github.com/huggingface/transformers/issues/34995 ))
- Disable `.github/workflows/self-comment-ci.yml` for now
([#​35366](https://redirect.github.com/huggingface/transformers/issues/35366 ))
- [@​aymeric-roucher](https://redirect.github.com/aymeric-roucher )
- Add Aria
([#​34157](https://redirect.github.com/huggingface/transformers/issues/34157 ))
- [@​NielsRogge](https://redirect.github.com/NielsRogge )
- \[I-JEPA] Update docs
([#​35148](https://redirect.github.com/huggingface/transformers/issues/35148 ))
- Add DINOv2 with registers
([#​35348](https://redirect.github.com/huggingface/transformers/issues/35348 ))
- [@​HMJ0628](https://redirect.github.com/HMJ0628 )
- \[i18n-`<languageCode>`] Translating agents.md to Chinese
([#​35139](https://redirect.github.com/huggingface/transformers/issues/35139 ))
- Translating agents_advanced.md to Chinese
([#​35231](https://redirect.github.com/huggingface/transformers/issues/35231 ))
- Translating perf_infer_gpu_multi.md to Chinese
([#​35271](https://redirect.github.com/huggingface/transformers/issues/35271 ))
- [@​alexrs-cohere](https://redirect.github.com/alexrs-cohere )
- Add Cohere2 model
([#​35224](https://redirect.github.com/huggingface/transformers/issues/35224 ))
- Add Cohere2 docs details
([#​35294](https://redirect.github.com/huggingface/transformers/issues/35294 ))
- [@​ArthurZucker](https://redirect.github.com/ArthurZucker )
- fix modular order
([#​35297](https://redirect.github.com/huggingface/transformers/issues/35297 ))
- 🚨 All attention refactor🚨
([#​35235](https://redirect.github.com/huggingface/transformers/issues/35235 ))
- Fix some fa2 tests
([#​35340](https://redirect.github.com/huggingface/transformers/issues/35340 ))
- Update test fetcher when we want to test all
([#​35364](https://redirect.github.com/huggingface/transformers/issues/35364 ))
- [@​tonywu71](https://redirect.github.com/tonywu71 )
- Add ColPali to 🤗 transformers
([#​33736](https://redirect.github.com/huggingface/transformers/issues/33736 ))
- Fix documentation for ColPali
([#​35321](https://redirect.github.com/huggingface/transformers/issues/35321 ))
- [@​OmarManzoor](https://redirect.github.com/OmarManzoor )
- Add sdpa for Beit
([#​34941](https://redirect.github.com/huggingface/transformers/issues/34941 ))
- [@​fabianlim](https://redirect.github.com/fabianlim )
- Add the Bamba Model
([#​34982](https://redirect.github.com/huggingface/transformers/issues/34982 ))
- [@​warner-benjamin](https://redirect.github.com/warner-benjamin )
- Add ModernBERT to Transformers
([#​35158](https://redirect.github.com/huggingface/transformers/issues/35158 ))
- Modernbert Release Fixes
([#​35344](https://redirect.github.com/huggingface/transformers/issues/35344 ))
- [@​wejoncy](https://redirect.github.com/wejoncy )
- FEAT : Adding VPTQ quantization method to HFQuantizer
([#​34770](https://redirect.github.com/huggingface/transformers/issues/34770 ))
- [@​bastrob](https://redirect.github.com/bastrob )
- owlvit/2 dynamic input resolution
([#​34764](https://redirect.github.com/huggingface/transformers/issues/34764 ))
- [@​BlackSamorez](https://redirect.github.com/BlackSamorez )
- HIGGS Quantization Support
([#​34997](https://redirect.github.com/huggingface/transformers/issues/34997 ))
### [`v4.47.1`](https://redirect.github.com/huggingface/transformers/releases/tag/v4.47.1 )
[Compare Source](https://redirect.github.com/huggingface/transformers/compare/v4.47.0...v4.47.1 )
### Patch release v4.47.1
We waited a little while to make sure it was stable. Thanks to
[@​winglian](https://redirect.github.com/winglian ) for double-checking
and to everyone for the fixes!
- Fix gradient accumulation (GA) loss bugs and add unit test
([#​35121](https://redirect.github.com/huggingface/transformers/pull/35121 ))
Contributed by [@​techkang](https://redirect.github.com/techkang )
and [@​ArthurZucker](https://redirect.github.com/ArthurZucker ).
- Fix num_items_in_batch not being an integer
([#​35115](https://redirect.github.com/huggingface/transformers/pull/35115 ))
Contributed by [@​xspirus](https://redirect.github.com/xspirus ).
- Fix FSDP no longer working
([#​35212](https://redirect.github.com/huggingface/transformers/pull/35212 ))
Contributed by
[@​muellerzr](https://redirect.github.com/muellerzr ).
- Don't use no_sync when DeepSpeed doesn't support it for certain ZeRO
configurations
([#​35157](https://redirect.github.com/huggingface/transformers/pull/35157 ))
Contributed by [@​winglian](https://redirect.github.com/winglian ).
- Only import torch.distributed if it is available
([#​35133](https://redirect.github.com/huggingface/transformers/pull/35133 ))
Contributed by
[@​GaetanLepage](https://redirect.github.com/GaetanLepage ).
- \[Whisper] Patch float type on MPS
([#​35295](https://redirect.github.com/huggingface/transformers/pull/35295 ))
Contributed by [@​eustlb](https://redirect.github.com/eustlb ). 🔜
We should probably have MPS CI to avoid repeating this!
### [`v4.47.0`](https://redirect.github.com/huggingface/transformers/releases/tag/v4.47.0 ): v4.47.0: PaliGemma-2, I-JEPA, OLMo-2, LayerSkip, Tensor Parallel
[Compare Source](https://redirect.github.com/huggingface/transformers/compare/v4.46.3...v4.47.0 )
#### New models
##### PaliGemma-2
PaliGemma 2 and PaliGemma are lightweight open vision-language models
(VLMs) inspired by [PaLI-3](https://arxiv.org/abs/2310.09199 ) and based
on open components such as the [SigLIP vision
model](https://arxiv.org/abs/2303.15343 ) and the [Gemma language
model](https://arxiv.org/abs/2403.08295 ). PaliGemma takes both images
and text as inputs and can answer detailed, context-aware questions about
images. This lets it perform deeper image analysis and handle tasks such
as captioning images and short videos, detecting objects, and reading
text embedded in images.
PaliGemma 2 is available in 3B, 10B, and 28B parameter sizes, which are
based on Gemma 2 2B, 9B, and 27B models, respectively. The original
PaliGemma models are available in the 3B size. For more information on
Gemma model variants, see the [Gemma models
list](https://ai.google.dev/gemma/docs/get_started#models-list ).
PaliGemma model variants support different pixel resolutions for image
inputs, including 224 x 224, 448 x 448, and 896 x 896 pixels.
<img width="743" alt="image"
src="https://github.com/user-attachments/assets/55cda8a6-b463-4a58-b7d3-f7d50ee2fa11 ">
##### I-JEPA
The I-JEPA model was proposed in [Image-based Joint-Embedding Predictive
Architecture](https://arxiv.org/pdf/2301.08243.pdf ) by Mahmoud Assran,
Quentin Duval, Ishan Misra, Piotr Bojanowski, Pascal Vincent, Michael
Rabbat, Yann LeCun, and Nicolas Ballas. I-JEPA is a self-supervised
learning method that predicts the representations of one part of an image
from other parts of the same image. This approach focuses on learning
semantic features without relying on pre-defined invariances from
hand-crafted data transformations, which can bias the features toward
specific tasks, or on filling in pixel-level details, which often leads
to less meaningful representations.
<img width="413" alt="image"
src="https://github.com/user-attachments/assets/561ca9d7-0327-477a-96b8-61d2af0caf34 ">
- Add I-JEPA by [@​jmtzt](https://redirect.github.com/jmtzt ) in
[#​33125](https://redirect.github.com/huggingface/transformers/issues/33125 )
##### OLMo 2
<img width="833" alt="image"
src="https://github.com/user-attachments/assets/1abdde92-0aae-404a-b83e-77ec8bd13b7f ">
The OLMo 2 model is the successor to the OLMo model, which was proposed
in [OLMo: Accelerating the Science of Language
Models](https://arxiv.org/abs/2402.00838 ).
The architectural chan
</details>
---
### Configuration
📅 **Schedule**: Branch creation - "" (UTC), Automerge - At any time (no
schedule defined).
🚦 **Automerge**: Disabled by config. Please merge this manually once you
are satisfied.
♻ **Rebasing**: Whenever the PR becomes conflicted, or you tick the
rebase/retry checkbox.
🔕 **Ignore**: Close this PR and you won't be reminded about this update
again.
---
- [ ] <!-- rebase-check -->If you want to rebase/retry this PR, check
this box
---
This PR was generated by [Mend Renovate](https://mend.io/renovate/ ).
View the [repository job
log](https://developer.mend.io/github/dora-rs/dora ).