Official Python library for Groq's REST API with sync/async support, type definitions, and automatic retries for fast LLM inference.
Use the official Groq Python library to access Groq's ultra-fast LLM inference API from any Python 3.9+ application. This skill provides type-safe synchronous and asynchronous clients with automatic retries, error handling, and streaming support.
This skill enables you to:
1. Make synchronous and asynchronous chat completion requests to Groq's API
2. Stream responses for real-time token delivery
3. Handle errors gracefully with typed exceptions
4. Transcribe audio files using Whisper models
5. Configure retries, timeouts, and custom HTTP clients
6. Access raw response headers and metadata
7. Use type-safe request parameters and Pydantic response models
```bash
pip install groq
```
For improved async performance with the aiohttp backend:
```bash
pip install 'groq[aiohttp]'
```
Sign up at [console.groq.com](https://console.groq.com) and generate an API key.
Create a `.env` file (never commit this to version control):
```bash
GROQ_API_KEY=your_api_key_here
```
Install python-dotenv to load environment variables:
```bash
pip install python-dotenv
```
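A minimal sketch of loading the key with python-dotenv before constructing the client:
```python
import os

from dotenv import load_dotenv
from groq import Groq

load_dotenv()  # reads .env in the current directory into os.environ
client = Groq(api_key=os.environ["GROQ_API_KEY"])
```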
Create a client and request a chat completion (the client also reads `GROQ_API_KEY` from the environment if `api_key` is omitted):
```python
import os

from groq import Groq

client = Groq(api_key=os.environ.get("GROQ_API_KEY"))

chat_completion = client.chat.completions.create(
    messages=[
        {
            "role": "system",
            "content": "You are a helpful assistant.",
        },
        {
            "role": "user",
            "content": "Explain the importance of low latency LLMs",
        },
    ],
    model="meta-llama/llama-4-scout-17b-16e-instruct",
)
print(chat_completion.choices[0].message.content)
```
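Responses also carry token counts in the OpenAI-compatible `usage` field, which is handy for latency and cost tracking; a quick sketch using the completion from above:
```python
usage = chat_completion.usage
if usage:
    print(f"prompt={usage.prompt_tokens} "
          f"completion={usage.completion_tokens} "
          f"total={usage.total_tokens}")
```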
The async client mirrors the sync API; just `await` each call:
```python
import os
import asyncio

from groq import AsyncGroq

client = AsyncGroq(api_key=os.environ.get("GROQ_API_KEY"))


async def main() -> None:
    chat_completion = await client.chat.completions.create(
        messages=[
            {
                "role": "user",
                "content": "Explain quantum computing in simple terms",
            }
        ],
        model="meta-llama/llama-4-scout-17b-16e-instruct",
    )
    print(chat_completion.choices[0].message.content)


asyncio.run(main())
```
With the `aiohttp` extra installed, pass `DefaultAioHttpClient` to use aiohttp as the HTTP backend:
```python
import os
import asyncio

from groq import AsyncGroq, DefaultAioHttpClient


async def main() -> None:
    async with AsyncGroq(
        api_key=os.environ.get("GROQ_API_KEY"),
        http_client=DefaultAioHttpClient(),
    ) as client:
        chat_completion = await client.chat.completions.create(
            messages=[{"role": "user", "content": "Hello!"}],
            model="meta-llama/llama-4-scout-17b-16e-instruct",
        )
        print(chat_completion.choices[0].message.content)


asyncio.run(main())
```
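The async client pays off when fanning out multiple requests; a sketch using `asyncio.gather` (the prompts are illustrative):
```python
import os
import asyncio

from groq import AsyncGroq


async def main() -> None:
    async with AsyncGroq(api_key=os.environ.get("GROQ_API_KEY")) as client:
        prompts = ["What is Groq?", "What is an LPU?", "Why does latency matter?"]
        tasks = [
            client.chat.completions.create(
                messages=[{"role": "user", "content": p}],
                model="meta-llama/llama-4-scout-17b-16e-instruct",
            )
            for p in prompts
        ]
        # Requests run concurrently over the shared connection pool
        results = await asyncio.gather(*tasks)
        for prompt, completion in zip(prompts, results):
            print(prompt, "->", completion.choices[0].message.content[:80])


asyncio.run(main())
```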
To stream a response and work with the raw server-sent event lines, combine `stream=True` with `.with_streaming_response` (reusing the `client` created earlier):
```python
with client.chat.completions.with_streaming_response.create(
    messages=[{"role": "user", "content": "Tell me a story"}],
    model="meta-llama/llama-4-scout-17b-16e-instruct",
    stream=True,
) as response:
    # Each line is a raw `data: {...}` SSE payload
    for line in response.iter_lines():
        print(line)
```
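For parsed token deltas rather than raw lines, pass `stream=True` directly to `create()`; the call then returns an iterator of chunk objects whose `delta.content` carries each new token:
```python
stream = client.chat.completions.create(
    messages=[{"role": "user", "content": "Tell me a story"}],
    model="meta-llama/llama-4-scout-17b-16e-instruct",
    stream=True,
)
for chunk in stream:
    # delta.content can be None (e.g. role-only or final chunks)
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()
```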
Transcribe an audio file with a Whisper model:
```python
from pathlib import Path

transcription = client.audio.transcriptions.create(
    model="whisper-large-v3-turbo",
    file=Path("/path/to/audio.mp3"),
)
print(transcription.text)
```
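The endpoint also accepts an open file handle, and Groq's speech-to-text API documents optional parameters such as `response_format` and `language`; a sketch (verify the exact parameter set against the current API reference):
```python
with open("/path/to/audio.mp3", "rb") as audio_file:
    transcription = client.audio.transcriptions.create(
        model="whisper-large-v3-turbo",
        file=audio_file,
        response_format="verbose_json",  # includes segment timestamps
        language="en",                   # optional language hint
    )
print(transcription.text)
```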
Catch the SDK's typed exceptions, from most to least specific:
```python
import groq
from groq import Groq

client = Groq()  # reads GROQ_API_KEY from the environment

try:
    chat_completion = client.chat.completions.create(
        messages=[{"role": "user", "content": "Hello"}],
        model="meta-llama/llama-4-scout-17b-16e-instruct",
    )
except groq.APIConnectionError as e:
    print("Network error:", e.__cause__)
except groq.RateLimitError as e:
    # Raised only after the SDK's automatic retries are exhausted
    print("Rate limit hit, back off")
except groq.AuthenticationError as e:
    print("Invalid API key")
except groq.APIStatusError as e:
    # Catch-all for any other non-2xx response
    print(f"API error: {e.status_code} - {e.response}")
```
Configure retries and timeouts on the client, or override them per request with `with_options()`:
```python
import httpx
from groq import Groq

client = Groq(
    max_retries=5,  # default is 2
    timeout=httpx.Timeout(60.0, read=10.0, write=10.0, connect=5.0),
)

# Per-request overrides return a new client view; the original is unchanged
client.with_options(timeout=30.0, max_retries=3).chat.completions.create(
    messages=[{"role": "user", "content": "Quick question"}],
    model="meta-llama/llama-4-scout-17b-16e-instruct",
)
```
Use `.with_raw_response` to read headers before parsing the body:
```python
response = client.chat.completions.with_raw_response.create(
    messages=[{"role": "user", "content": "Hello"}],
    model="meta-llama/llama-4-scout-17b-16e-instruct",
)
print(response.headers.get("X-Request-ID"))

completion = response.parse()  # returns the typed ChatCompletion model
print(completion.id)
```
Build multi-turn conversations by appending each turn to the message list:
```python
messages = [
    {"role": "system", "content": "You are a Python expert."},
]

messages.append({"role": "user", "content": "What are list comprehensions?"})
response = client.chat.completions.create(
    messages=messages,
    model="meta-llama/llama-4-scout-17b-16e-instruct",
)

# Keep the assistant's reply in history so follow-ups have context
messages.append({"role": "assistant", "content": response.choices[0].message.content})
messages.append({"role": "user", "content": "Show me an example"})
response = client.chat.completions.create(
    messages=messages,
    model="meta-llama/llama-4-scout-17b-16e-instruct",
)
print(response.choices[0].message.content)
```
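The same pattern extends to an interactive loop; a minimal sketch:
```python
messages = [{"role": "system", "content": "You are a helpful assistant."}]

while True:
    user_input = input("> ")
    if user_input.lower() in {"quit", "exit"}:
        break
    messages.append({"role": "user", "content": user_input})
    response = client.chat.completions.create(
        messages=messages,
        model="meta-llama/llama-4-scout-17b-16e-instruct",
    )
    reply = response.choices[0].message.content
    # Store the assistant turn so the model retains context next round
    messages.append({"role": "assistant", "content": reply})
    print(reply)
```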
All responses are Pydantic models with helper methods:
```python
response = client.chat.completions.create(
    messages=[{"role": "user", "content": "Hello"}],
    model="meta-llama/llama-4-scout-17b-16e-instruct",
)

json_str = response.to_json()  # serialized JSON string
data = response.to_dict()      # plain dict

# model_fields_set tracks which fields the API actually returned
if "id" in response.model_fields_set:
    print(f"Response ID: {response.id}")
```
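The exception raised depends on the HTTP status code: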
| Status Code | Error Type |
|------------|-----------|
| 400 | `BadRequestError` |
| 401 | `AuthenticationError` |
| 403 | `PermissionDeniedError` |
| 404 | `NotFoundError` |
| 422 | `UnprocessableEntityError` |
| 429 | `RateLimitError` |
| ≥500 | `InternalServerError` |
| N/A | `APIConnectionError` |
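All of these inherit from `groq.APIError`, so a single `except groq.APIError` can serve as a catch-all.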
Enable debug logging:
```bash
export GROQ_LOG=debug
```
Or in code:
```python
import logging

# DEBUG on the root logger also surfaces httpx request/response logs
logging.basicConfig(level=logging.DEBUG)
```