Use the Transformers library to work with state-of-the-art pretrained machine learning models for text, computer vision, audio, video, and multimodal tasks. It provides a unified API for inference and training with PyTorch and gives access to over 1 million model checkpoints on the Hugging Face Hub.
When the user asks to work with transformer models, large language models, vision models, or needs to use pretrained ML models:
1. **Installation Setup**
- Verify Python 3.9+ and PyTorch 2.1+ are installed
- Create a virtual environment if one doesn't exist
- Install transformers with: `pip install "transformers[torch]"`
- For latest features: clone from `https://github.com/huggingface/transformers.git` and install with `pip install '.[torch]'`
2. **Using the Pipeline API**
- Use `pipeline()` as the high-level interface for quick inference
- Common tasks: `text-generation`, `text-classification`, `automatic-speech-recognition`, `image-classification`, `visual-question-answering`, `translation`, `summarization`, `zero-shot-classification`
- Pattern: `pipe = pipeline(task="task-name", model="model-id")` (avoid naming the result `pipeline`, which shadows the imported function)
- The model is automatically downloaded and cached from Hugging Face Hub
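As an example of one of the tasks above, zero-shot classification can be sketched as follows; the model ID is a commonly used NLI checkpoint, but any compatible model works:

```python
from transformers import pipeline

# Zero-shot classification: score a text against arbitrary candidate
# labels without task-specific fine-tuning
clf = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")
result = clf(
    "The new GPU delivers twice the throughput of the last generation.",
    candidate_labels=["technology", "sports", "cooking"],
)
print(result["labels"][0])  # highest-scoring label comes first
```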
3. **Text Generation**
- For simple generation: pass a string prompt to the pipeline
- For chat/instruct models: construct a chat history with roles (`system`, `user`, `assistant`)
- Set generation parameters like `max_new_tokens`, `temperature`, `top_p` as needed
- Use `dtype=torch.bfloat16` and `device_map="auto"` for efficient GPU usage
4. **Model Selection**
- Browse models at `https://huggingface.co/models`
- Filter by task, library (transformers), language, license
- Popular models: Llama, Qwen, BERT, GPT, ViT, Whisper, CLIP
- Use model ID format: `organization/model-name` (e.g., `meta-llama/Meta-Llama-3-8B-Instruct`)
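Besides browsing the website, models can be queried programmatically with the `huggingface_hub` client (installed alongside transformers); a minimal sketch:

```python
from huggingface_hub import HfApi

api = HfApi()
# Query the Hub for popular text-classification models that work
# with the transformers library, sorted by download count
models = list(api.list_models(
    task="text-classification",
    library="transformers",
    sort="downloads",
    limit=5,
))
for m in models:
    print(m.id)  # model IDs, typically "organization/model-name"
```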
5. **Multimodal Tasks**
- For vision: pass image URLs or file paths directly to pipeline
- For audio: pass audio file URLs or paths to ASR/audio classification pipelines
- For VQA: provide both `image` and `question` parameters
- Pipeline automatically handles preprocessing and postprocessing
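A VQA call combining the `image` and `question` parameters can be sketched like this; the ViLT checkpoint is illustrative, and the image is a public COCO sample:

```python
from transformers import pipeline

# Visual question answering: pass an image (URL or local path)
# together with a natural-language question
vqa = pipeline("visual-question-answering", model="dandelin/vilt-b32-finetuned-vqa")
result = vqa(
    image="http://images.cocodataset.org/val2017/000000039769.jpg",
    question="How many cats are there?",
)
print(result[0]["answer"])  # answers are returned ranked by score
```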
6. **Best Practices**
- Models are downloaded and cached locally on first use; subsequent runs reuse the cached weights
- Use appropriate dtype (bfloat16/float16) to reduce memory usage
- Set `device_map="auto"` for automatic GPU/CPU distribution
- Check model cards on Hub for usage examples, limitations, and license info
- For production: consider dedicated inference engines (vLLM, TGI) that build on Transformers model definitions
7. **Code Patterns**
```python
from transformers import pipeline
import torch

# Text generation
generator = pipeline("text-generation", model="model-id")
result = generator("prompt text", max_new_tokens=100)

# Chat
chat_pipeline = pipeline(
    "text-generation", model="chat-model-id",
    device_map="auto", dtype=torch.bfloat16,
)
messages = [{"role": "user", "content": "question"}]
response = chat_pipeline(messages, max_new_tokens=512)

# Vision/Audio
classifier = pipeline("image-classification", model="model-id")
result = classifier("image_url_or_path")
```
8. **Troubleshooting**
- Out of memory: reduce batch size, use smaller model, or enable 8-bit/4-bit quantization
- Slow inference: ensure GPU is available, use bfloat16 dtype, consider vLLM for production
- Model not found: verify model ID spelling and check if model exists on Hub
- Authentication: use `huggingface-cli login` for gated models
**Generate text with Qwen:**
```python
from transformers import pipeline
generator = pipeline("text-generation", model="Qwen/Qwen2.5-1.5B")
result = generator("The future of AI is")
print(result[0]["generated_text"])
```
**Chat with Llama:**
```python
import torch
from transformers import pipeline
chat = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain transformers in simple terms."},
]
pipe = pipeline(
    "text-generation",
    model="meta-llama/Meta-Llama-3-8B-Instruct",
    dtype=torch.bfloat16,
    device_map="auto",
)
response = pipe(chat, max_new_tokens=256)
print(response[0]["generated_text"][-1]["content"])
```
**Image classification:**
```python
from transformers import pipeline
classifier = pipeline("image-classification", model="facebook/dinov2-small-imagenet1k-1-layer")
result = classifier("path/to/image.jpg")
print(result)
```
**Speech recognition:**
```python
from transformers import pipeline
transcriber = pipeline("automatic-speech-recognition", model="openai/whisper-large-v3")
result = transcriber("audio_file.mp3")
print(result["text"])
```