Use the Transformers library to work with state-of-the-art pretrained machine learning models for text, computer vision, audio, video, and multimodal tasks. It provides a unified API for inference and training with PyTorch and gives access to over 1 million model checkpoints on the Hugging Face Hub.
When the user asks to work with transformer models, large language models, vision models, or needs to use pretrained ML models:
1. **Installation Setup**
- Verify Python 3.9+ and PyTorch 2.1+ are installed
- Create a virtual environment if one doesn't exist
- Install transformers with: `pip install "transformers[torch]"`
- For latest features: clone from `https://github.com/huggingface/transformers.git` and install with `pip install '.[torch]'`
2. **Using the Pipeline API**
- Use `pipeline()` as the high-level interface for quick inference
- Common tasks: `text-generation`, `text-classification`, `automatic-speech-recognition`, `image-classification`, `visual-question-answering`, `translation`, `summarization`, `zero-shot-classification`
- Pattern: `pipe = pipeline(task="task-name", model="model-id")` (avoid naming the result `pipeline`, which shadows the imported function)
- The model is automatically downloaded and cached from Hugging Face Hub
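As an example of one of the tasks above, zero-shot classification can be sketched as follows; the model ID is a commonly used NLI checkpoint, but any compatible model works:

```python
from transformers import pipeline

# Zero-shot classification: score a text against arbitrary candidate
# labels without task-specific fine-tuning
clf = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")
result = clf(
    "The new GPU delivers twice the throughput of the last generation.",
    candidate_labels=["technology", "sports", "cooking"],
)
print(result["labels"][0])  # highest-scoring label comes first
```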
3. **Text Generation**
- For simple generation: pass a string prompt to the pipeline
- For chat/instruct models: construct a chat history with roles (`system`, `user`, `assistant`)
- Set generation parameters like `max_new_tokens`, `temperature`, `top_p` as needed
- Use `dtype=torch.bfloat16` and `device_map="auto"` for efficient GPU usage
4. **Model Selection**
- Browse models at `https://huggingface.co/models`
- Filter by task, library (transformers), language, license
- Popular models: Llama, Qwen, BERT, GPT, ViT, Whisper, CLIP
- Use model ID format: `organization/model-name` (e.g., `meta-llama/Meta-Llama-3-8B-Instruct`)
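Besides browsing the website, models can be queried programmatically with the `huggingface_hub` client (installed alongside transformers); a minimal sketch:

```python
from huggingface_hub import HfApi

api = HfApi()
# Query the Hub for popular text-classification models that work
# with the transformers library, sorted by download count
models = list(api.list_models(
    task="text-classification",
    library="transformers",
    sort="downloads",
    limit=5,
))
for m in models:
    print(m.id)  # model IDs, typically "organization/model-name"
```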
5. **Multimodal Tasks**
- For vision: pass image URLs or file paths directly to pipeline
- For audio: pass audio file URLs or paths to ASR/audio classification pipelines
- For VQA: provide both `image` and `question` parameters
- Pipeline automatically handles preprocessing and postprocessing
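A VQA call combining the `image` and `question` parameters can be sketched like this; the ViLT checkpoint is illustrative, and the image is a public COCO sample:

```python
from transformers import pipeline

# Visual question answering: pass an image (URL or local path)
# together with a natural-language question
vqa = pipeline("visual-question-answering", model="dandelin/vilt-b32-finetuned-vqa")
result = vqa(
    image="http://images.cocodataset.org/val2017/000000039769.jpg",
    question="How many cats are there?",
)
print(result[0]["answer"])  # answers are returned ranked by score
```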
6. **Best Practices**
- Models are downloaded and cached locally on first use; subsequent runs reuse the cached weights
- Use appropriate dtype (bfloat16/float16) to reduce memory usage
- Set `device_map="auto"` for automatic GPU/CPU distribution
- Check model cards on Hub for usage examples, limitations, and license info
- For production: consider dedicated inference engines (vLLM, TGI) that build on Transformers model definitions
7. **Code Patterns**
```python
from transformers import pipeline
import torch

# Text generation
generator = pipeline("text-generation", model="model-id")
result = generator("prompt text", max_new_tokens=100)

# Chat
chat_pipeline = pipeline(
    "text-generation", model="chat-model-id",
    device_map="auto", dtype=torch.bfloat16,
)
messages = [{"role": "user", "content": "question"}]
response = chat_pipeline(messages, max_new_tokens=512)

# Vision/Audio
classifier = pipeline("image-classification", model="model-id")
result = classifier("image_url_or_path")
```
8. **Troubleshooting**
- Out of memory: reduce batch size, use smaller model, or enable 8-bit/4-bit quantization
- Slow inference: ensure GPU is available, use bfloat16 dtype, consider vLLM for production
- Model not found: verify model ID spelling and check if model exists on Hub
- Authentication: use `huggingface-cli login` for gated models
**Generate text with Qwen:**
```python
from transformers import pipeline
generator = pipeline("text-generation", model="Qwen/Qwen2.5-1.5B")
result = generator("The future of AI is")
print(result[0]["generated_text"])
```
**Chat with Llama:**
```python
import torch
from transformers import pipeline
chat = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain transformers in simple terms."},
]
pipe = pipeline(
    "text-generation",
    model="meta-llama/Meta-Llama-3-8B-Instruct",
    dtype=torch.bfloat16,
    device_map="auto",
)
response = pipe(chat, max_new_tokens=256)
print(response[0]["generated_text"][-1]["content"])
```
**Image classification:**
```python
from transformers import pipeline
classifier = pipeline("image-classification", model="facebook/dinov2-small-imagenet1k-1-layer")
result = classifier("path/to/image.jpg")
print(result)
```
**Speech recognition:**
```python
from transformers import pipeline
transcriber = pipeline("automatic-speech-recognition", model="openai/whisper-large-v3")
result = transcriber("audio_file.mp3")
print(result["text"])
```