Enforce PEP 8, Ruff formatting, comprehensive type hints, Google-style docstrings, and data science best practices for Python development with pandas, numpy, and plotly.
A comprehensive skill for generating clean, well-documented Python code following industry best practices, with a focus on data science workflows using pandas, numpy, and plotly.
This skill enforces strict code quality standards for Python development. Generated code must follow these rules:
1. **Adhere to PEP 8 and Ruff standards**
- Use descriptive variable names without unnecessary abbreviations
- Prefer early returns to reduce nesting complexity
- Format code using standard Python conventions
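Example of an early return with descriptive names (a minimal sketch; the discount logic is illustrative):
```python
def apply_member_discount(order_total: float, is_member: bool) -> float:
    """Return the order total after applying any membership discount."""
    # Early return keeps the main calculation out of a nested else block.
    if not is_member:
        return order_total
    return order_total * 0.9  # illustrative 10% member discount
```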
2. **Generate complete, working code**
- Never include TODOs or placeholders
- All code must be production-ready and fully functional
- Include all necessary imports at the top of each code block
3. **Prioritize readability**
- Use vectorized operations (list comprehensions, pandas methods) when available
- Break complex operations into logical, modular functions or classes
- Add inline comments explaining the "why" behind key operations
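Example of preferring a vectorized pandas operation over an explicit loop (a sketch; the 'price' column and tax rate are illustrative):
```python
import pandas as pd


def add_price_with_tax(df: pd.DataFrame, tax_rate: float = 0.07) -> pd.DataFrame:
    """Add a 'price_with_tax' column derived from the 'price' column."""
    df = df.copy()
    # Vectorized arithmetic is clearer and faster than iterating row by row.
    df['price_with_tax'] = df['price'] * (1 + tax_rate)
    return df
```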
4. **Always include Python type hints**
- Add type hints to every function parameter and return value
- Use appropriate types from `typing` module when needed
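Example of a fully annotated signature using the `typing` module (a sketch; the function and its parameters are illustrative):
```python
from typing import Optional

import pandas as pd


def filter_above_threshold(
    df: pd.DataFrame,
    column: str,
    threshold: float,
    limit: Optional[int] = None,
) -> pd.DataFrame:
    """Return rows where `column` exceeds `threshold`, optionally capped at `limit` rows."""
    result = df[df[column] > threshold]
    return result if limit is None else result.head(limit)
```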
5. **Generate Google-style docstrings for all functions and classes**
Required sections:
- Brief one-sentence description
- `Args:` section with parameter types and descriptions
- `Returns:` section with return type and explanation
- `Example:` section showing usage (especially for data manipulation)
Template:
```python
def function_name(param1: Type, param2: Type) -> ReturnType:
    """
    Brief one-sentence description.

    Args:
        param1 (Type): Description of parameter 1.
        param2 (Type): Description of parameter 2.

    Returns:
        ReturnType: Description of the returned value.

    Example:
        >>> function_name(example_param1, example_param2)
        expected_output
    """
    # Function implementation
```
6. **For DataFrame operations, document expected structure**
- Include column names and data types
- Specify assumptions about missing values
- Example structure in docstring or comments
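Example of documenting the expected DataFrame structure inside the docstring (a sketch; the column names and missing-value policy are illustrative):
```python
import pandas as pd


def summarize_sales_by_region(df: pd.DataFrame) -> pd.Series:
    """
    Summarize total sales per region.

    Expected DataFrame structure:
        - 'region' (str): Region name; assumed to have no missing values.
        - 'sales' (float): Sale amount; missing values are treated as zero.

    Args:
        df (pd.DataFrame): Sales records with the columns described above.

    Returns:
        pd.Series: Total sales indexed by region.
    """
    return df.fillna({'sales': 0}).groupby('region')['sales'].sum()
```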
7. **Default to data science libraries**
- Use `pandas`, `numpy`, and `plotly` as primary libraries
- Standard imports:
```python
import pandas as pd
import numpy as np
import plotly.express as px
```
8. **Include error handling and logging**
- Add try-except blocks for file operations (e.g., `pd.read_csv`)
- Log important operations, especially in statistical/ML tasks
- Handle edge cases gracefully
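Example of a try-except block combined with logging (a sketch using the standard `logging` module; the helper name is illustrative):
```python
import logging

import pandas as pd

logger = logging.getLogger(__name__)


def read_csv_logged(filepath: str) -> pd.DataFrame:
    """Read a CSV file, logging the outcome of the operation."""
    try:
        df = pd.read_csv(filepath)
    except FileNotFoundError:
        logger.error("File not found: %s", filepath)
        raise
    logger.info("Loaded %d rows from %s", len(df), filepath)
    return df
```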
9. **Generate modular, logical code structure**
- Break down complex operations into smaller functions
- Each function should have a single, clear responsibility
- Ensure functions are reusable and well-encapsulated
10. **When refactoring**
- Maintain full functionality of original code
- Improve readability without sacrificing performance
- Ensure all tests still pass
11. **Assume data science context**
- Default to DataFrame and array operations
- Use pandas for CSV operations with proper error checks
- Include data validation where appropriate
12. **For machine learning code**
- Assume PyTorch workflow when applicable
- Include device settings: `torch.device('cuda' if torch.cuda.is_available() else 'cpu')`
- Add model evaluation and metrics tracking
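Example of the device setup plus a simple evaluation loop (a minimal sketch assuming a classification model that returns logits and a `DataLoader` yielding `(inputs, labels)` batches):
```python
import torch
from torch.utils.data import DataLoader


def evaluate_accuracy(model: torch.nn.Module, loader: DataLoader) -> float:
    """Compute classification accuracy of `model` over the batches in `loader`."""
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    model.to(device)
    model.eval()
    correct = 0
    total = 0
    with torch.no_grad():
        for inputs, labels in loader:
            inputs, labels = inputs.to(device), labels.to(device)
            predictions = model(inputs).argmax(dim=1)
            correct += (predictions == labels).sum().item()
            total += labels.size(0)
    return correct / total
```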
13. **Multi-file project support**
- Coordinate code across modules using context
- Maintain consistent style and documentation across files
- Use relative imports appropriately
14. **Documentation export compatibility**
- Ensure docstrings are copy-paste ready for Google Docs
- Include comments explaining integration processes when relevant
15. **Data cleaning and analysis functions**
- Document assumptions (e.g., missing value handling strategy)
- Include data validation checks
- Provide clear error messages for invalid inputs
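Example of a cleaning function with validation and a clear error message (a sketch; the 'email' column and drop policy are illustrative):
```python
import pandas as pd


def clean_email_column(df: pd.DataFrame) -> pd.DataFrame:
    """
    Normalize the 'email' column to lowercase and drop rows with missing emails.

    Assumes missing emails cannot be recovered and are dropped rather than imputed.
    """
    if 'email' not in df.columns:
        raise ValueError("Expected an 'email' column but it is missing from the DataFrame.")
    df = df.copy()
    df['email'] = df['email'].str.lower()
    return df.dropna(subset=['email'])
```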
16. **Statistical operations**
- Use vectorized pandas/numpy operations
- Avoid explicit loops when vectorization is possible
- Include appropriate statistical tests and confidence intervals
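Example of a vectorized mean with a confidence interval (a sketch using a normal approximation; for small samples a t-distribution from `scipy.stats` would be more appropriate):
```python
from typing import Tuple

import numpy as np


def mean_confidence_interval(values: np.ndarray, z_score: float = 1.96) -> Tuple[float, float, float]:
    """Return the mean and its lower and upper confidence bounds."""
    mean = float(values.mean())
    # Standard error of the mean; ddof=1 uses the sample standard deviation.
    standard_error = values.std(ddof=1) / np.sqrt(len(values))
    return mean, mean - z_score * standard_error, mean + z_score * standard_error
```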
17. **Visualization code**
- Use plotly for interactive visualizations
- Include clear axis labels and titles
- Make plots accessible with proper color schemes
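Example of an interactive plot with explicit labels, a title, and a colorblind-friendly palette (a sketch; the column names are illustrative):
```python
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go


def plot_monthly_revenue(df: pd.DataFrame) -> go.Figure:
    """Create an interactive bar chart of revenue per month."""
    return px.bar(
        df,
        x='month',
        y='revenue',
        title='Monthly Revenue',
        labels={'month': 'Month', 'revenue': 'Revenue (USD)'},
        color_discrete_sequence=px.colors.qualitative.Safe,  # colorblind-friendly palette
    )
```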
```python
import pandas as pd


def load_customer_data(filepath: str, encoding: str = 'utf-8') -> pd.DataFrame:
    """
    Load customer data from a CSV file with error handling.

    Args:
        filepath (str): Path to the CSV file.
        encoding (str): File encoding. Defaults to 'utf-8'.

    Returns:
        pd.DataFrame: DataFrame with columns ['customer_id', 'name', 'email', 'signup_date'].

    Raises:
        FileNotFoundError: If the specified file does not exist.
        pd.errors.EmptyDataError: If the file is empty.
        ValueError: If required columns are missing.

    Example:
        >>> df = load_customer_data('data/customers.csv')
        >>> df.head()
    """
    try:
        df = pd.read_csv(filepath, encoding=encoding)
        required_columns = ['customer_id', 'name', 'email', 'signup_date']
        if not all(col in df.columns for col in required_columns):
            raise ValueError(f"Missing required columns. Expected: {required_columns}")
        return df
    except FileNotFoundError:
        raise FileNotFoundError(f"File not found: {filepath}")
    except pd.errors.EmptyDataError:
        raise pd.errors.EmptyDataError(f"Empty file: {filepath}")
```
```python
import pandas as pd
from typing import Dict


def calculate_customer_metrics(df: pd.DataFrame) -> Dict[str, float]:
    """
    Calculate key customer engagement metrics from transaction data.

    Assumes df contains 'customer_id', 'purchase_amount', and 'purchase_date' columns.
    Missing values in 'purchase_amount' are treated as zero.

    Args:
        df (pd.DataFrame): Transaction data with customer purchases.

    Returns:
        Dict[str, float]: Dictionary containing:
            - 'avg_purchase': Average purchase amount
            - 'total_revenue': Total revenue
            - 'unique_customers': Number of unique customers

    Example:
        >>> metrics = calculate_customer_metrics(transaction_df)
        >>> print(f"Average purchase: ${metrics['avg_purchase']:.2f}")
    """
    df = df.copy()
    df['purchase_amount'] = df['purchase_amount'].fillna(0)
    metrics = {
        'avg_purchase': df['purchase_amount'].mean(),
        'total_revenue': df['purchase_amount'].sum(),
        'unique_customers': df['customer_id'].nunique(),
    }
    return metrics
```