Enforces PEP 8 standards, automatic type hints, Google-style docstrings, and data science best practices for Python development with pandas, numpy, and plotly.
This skill enforces strict Python coding standards with a focus on data science workflows. It ensures all generated code follows PEP 8, includes comprehensive type hints and Google-style docstrings, and integrates best practices for pandas, numpy, and plotly development.
1. **Adhere strictly to PEP 8 and Ruff formatting standards**
- Use descriptive variable names and avoid unnecessary abbreviations
- Prefer early returns for clarity and minimal nesting
- Format code using standard Python conventions with minimal manual adjustments
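As a minimal sketch of these conventions (the function name and behavior are invented for illustration):

```python
def average_order_value(order_totals: list[float]) -> float:
    """Return the mean order value, or 0.0 for an empty input."""
    # Early return avoids nesting the main computation in an else branch
    if not order_totals:
        return 0.0
    return sum(order_totals) / len(order_totals)
```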
2. **Always generate complete, working code**
- No TODOs or placeholders
- Include all necessary imports at the top
- Ensure code is fully functional and ready to run
3. **Automatically include Python type hints for every function and method**
- Add type hints to all parameters and return values
- Use appropriate types from `typing` module when needed
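For example, a hypothetical function whose return value may be absent would be annotated with `Optional`:

```python
from typing import Optional


def first_positive(values: list[float]) -> Optional[float]:
    """Return the first value greater than zero, or None if there is none."""
    for value in values:
        if value > 0:
            return value
    return None
```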
4. **Generate Google-style docstrings for every function and class**
- Brief description of the function's purpose
- Parameters section with types and descriptions
- Returns section with type and explanation
- Example section when applicable (especially for data manipulation functions)
- For DataFrame operations, include expected structure (column names and data types)
- Ensure docstrings are copy-paste ready
5. **Follow this docstring template universally:**
```python
def function_name(param1: Type, param2: Type) -> ReturnType:
    """
    Brief one-sentence description.

    Args:
        param1 (Type): Description.
        param2 (Type): Description.

    Returns:
        ReturnType: Description of the returned value.

    Example:
        >>> function_name(example_param1, example_param2)
        expected_output
    """
    # function body
```
6. **Produce modular code broken into logical functions or classes**
- Each function should have a single, clear responsibility
- When suggesting refactoring, ensure revised code remains fully functional
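A small sketch of this modular style, splitting a data-preparation pipeline into single-responsibility steps (the column names `price` and `tax` are hypothetical):

```python
import pandas as pd


def drop_incomplete_rows(df: pd.DataFrame) -> pd.DataFrame:
    """Remove rows containing any missing values."""
    return df.dropna()


def add_total_column(df: pd.DataFrame) -> pd.DataFrame:
    """Add a 'total' column as the sum of 'price' and 'tax'."""
    df = df.copy()
    df['total'] = df['price'] + df['tax']
    return df


def prepare_orders(df: pd.DataFrame) -> pd.DataFrame:
    """Compose the single-responsibility steps into one pipeline."""
    return add_total_column(drop_incomplete_rows(df))
```

Each step stays independently testable, and the composing function documents the overall flow.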
7. **Use data science libraries by default:**
- pandas for data manipulation
- numpy for numerical operations
- plotly for visualization
- Include standard imports: `import pandas as pd`, `import numpy as np`, `import plotly.express as px`
8. **Include robust error handling and logging**
- Add try-except blocks for statistical or machine learning tasks
- Include basic logging for complex operations
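One way this can look in practice (a sketch, not a prescribed pattern; the function and message are invented):

```python
import logging

import numpy as np

logger = logging.getLogger(__name__)


def safe_mean(values: np.ndarray) -> float:
    """Compute the mean of an array, logging failures instead of crashing."""
    try:
        if values.size == 0:
            raise ValueError("cannot compute the mean of an empty array")
        return float(np.mean(values))
    except ValueError:
        # logger.exception records the full traceback for later debugging
        logger.exception("Mean computation failed")
        return float('nan')
```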
9. **Add inline comments for complex operations**
- Describe the "why" behind key operations
- Facilitate future documentation and maintenance
10. **Assume data science context by default**
- Check if operations are on DataFrames or numerical arrays
- Tailor examples to the data structure being used
11. **For DataFrame operations:**
- Use `pandas.read_csv` with proper error checks for CSV reading
- Include explanations of assumptions (e.g., missing values handling)
- Prioritize vectorized operations over loops
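A hedged sketch combining these points, assuming a CSV with a `price` column (the function name, column, and discount factor are invented for the example):

```python
import logging

import pandas as pd

logger = logging.getLogger(__name__)


def load_prices(csv_path: str) -> pd.DataFrame:
    """Load a price table from CSV with basic error checks.

    Assumption: the file has a 'price' column, and missing prices
    are treated as zero.
    """
    try:
        df = pd.read_csv(csv_path)
    except FileNotFoundError:
        logger.error("CSV file not found: %s", csv_path)
        return pd.DataFrame()
    # Stated assumption: missing prices are filled with zero
    df['price'] = df['price'].fillna(0.0)
    # Vectorized arithmetic instead of a row-by-row loop
    df['discounted'] = df['price'] * 0.5
    return df
```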
12. **For machine learning code:**
- Assume PyTorch workflow when applicable
- Include device settings using `torch.device(...)`
13. **Prioritize readability and efficiency:**
- Use list comprehensions or pandas vectorized methods when available
- Choose concise, clear solutions over verbose implementations
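The trade-off can be illustrated with a simple squaring operation on a NumPy array (the values are arbitrary):

```python
import numpy as np

values = np.array([1.0, 2.0, 3.0, 4.0])

# Verbose loop version
squared_loop = []
for value in values:
    squared_loop.append(value ** 2)

# Concise list comprehension
squared_comp = [value ** 2 for value in values]

# Vectorized NumPy version (preferred when working with arrays)
squared_vec = values ** 2
```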
14. **Support multi-file projects:**
- Coordinate code across modules using proper imports
- Include comments explaining integration with external systems (e.g., Google Docs export)
When asked to "create a function to clean a DataFrame":
```python
import pandas as pd
from typing import List, Optional


def clean_dataframe(
    df: pd.DataFrame,
    columns_to_drop: Optional[List[str]] = None,
    fill_na_value: float = 0.0,
) -> pd.DataFrame:
    """
    Clean a DataFrame by dropping specified columns and filling missing values.

    Args:
        df (pd.DataFrame): Input DataFrame to clean.
        columns_to_drop (Optional[List[str]]): List of column names to drop.
            Defaults to None.
        fill_na_value (float): Value used to fill NaN entries. Defaults to 0.0.

    Returns:
        pd.DataFrame: Cleaned DataFrame with specified columns removed and
            NaN values filled.

    Example:
        >>> data = pd.DataFrame({'A': [1, 2, None], 'B': [4, None, 6], 'C': [7, 8, 9]})
        >>> clean_dataframe(data, columns_to_drop=['C'], fill_na_value=0)
             A    B
        0  1.0  4.0
        1  2.0  0.0
        2  0.0  6.0
    """
    # Create a copy to avoid modifying the original DataFrame
    df_cleaned = df.copy()

    # Drop specified columns if provided; ignore names that do not exist
    if columns_to_drop:
        df_cleaned = df_cleaned.drop(columns=columns_to_drop, errors='ignore')

    # Fill missing values with the specified value
    return df_cleaned.fillna(fill_na_value)
```