Interactive web-based sales data analysis tool with Excel upload, statistical reporting, and visualizations. Bilingual (Russian/English) Streamlit application for time series analysis.
A comprehensive data analysis skill for building interactive Streamlit applications that analyze sales data from Excel files. This skill demonstrates best practices for data analysis workflows, including statistical reporting, time series visualization, and bilingual documentation.
This skill enables you to create a professional data analysis web application with Excel upload, statistical reporting, and interactive time series visualizations.
When using this skill, you'll create:
```
project_root/
├── streamlit_app.py # Main Streamlit application (self-contained)
├── requirements.txt # Python dependencies
├── Dockerfile # Container configuration
├── .dockerignore # Docker build exclusions
├── pytest.ini # Pytest configuration
├── tests/
│ ├── conftest.py # Shared test fixtures
│ ├── test_analysis_functions.py # Unit tests
│ ├── test_streamlit_integration.py # Integration tests
│ ├── test_code_quality.py # Code quality tests
│ ├── test_data_samples.py # Data validation tests
│ └── run_tests.py # Automated test runner
├── .github/workflows/
│ └── ci-cd.yml # GitHub Actions pipeline
└── docs/
└── sample_sales_data.xlsx # Example data (optional)
```
1. **Create virtual environment and install dependencies**
```bash
python -m venv venv
source venv/bin/activate # Linux/Mac
# or
.\venv\Scripts\activate # Windows
```
2. **Create `requirements.txt`** with production dependencies:
```
streamlit>=1.28.0
pandas>=2.0.0
openpyxl>=3.1.0
matplotlib>=3.7.0
seaborn>=0.12.0
```
3. **Build main Streamlit application** (`streamlit_app.py`):
- Set page config with Russian title and wide layout
- Create file uploader for Excel files (`.xlsx`)
- Implement sample data generator function for testing
- Add data validation (check for required columns: Дата, Продукт, Продажи)
- Display raw data preview with `st.dataframe()`
- Convert date columns to datetime format automatically
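The validation and date-conversion logic from the bullets above can be sketched in pandas; the column names follow the required-columns spec, while helper names like `validate_columns` and `prepare_dataframe` are illustrative, not a fixed API:

```python
import pandas as pd

# Required columns per the data format spec (Дата = Date, Продукт = Product, Продажи = Sales)
REQUIRED_COLUMNS = ["Дата", "Продукт", "Продажи"]

def validate_columns(df: pd.DataFrame) -> list:
    """Return required columns that are missing from the uploaded DataFrame."""
    return [col for col in REQUIRED_COLUMNS if col not in df.columns]

def prepare_dataframe(df: pd.DataFrame) -> pd.DataFrame:
    """Convert the date column to datetime and drop rows that fail to parse."""
    out = df.copy()
    out["Дата"] = pd.to_datetime(out["Дата"], errors="coerce")
    return out.dropna(subset=["Дата"])
```

In the app, `validate_columns` would run right after `pd.read_excel(uploaded_file)`, with `st.error()` reporting any missing columns before analysis proceeds.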
4. **Implement statistical analysis functions**:
- `calculate_basic_stats()`: Mean, median, std, min, max for numeric columns
- `calculate_trends()`: Group by product, calculate totals and averages
- `identify_peak_periods()`: Find top N periods by sales volume
- `generate_analysis_report()`: Create bilingual summary report (Russian + English)
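Three of the four functions above can be sketched with plain pandas (the bilingual report generator is omitted for brevity; signatures and return types are assumptions, not a fixed contract):

```python
import pandas as pd

def calculate_basic_stats(df: pd.DataFrame, column: str = "Продажи") -> dict:
    """Descriptive statistics for one numeric column."""
    s = df[column]
    return {"mean": s.mean(), "median": s.median(), "std": s.std(),
            "min": s.min(), "max": s.max()}

def calculate_trends(df: pd.DataFrame) -> pd.DataFrame:
    """Total and average sales per product."""
    return df.groupby("Продукт")["Продажи"].agg(["sum", "mean"])

def identify_peak_periods(df: pd.DataFrame, n: int = 3) -> pd.Series:
    """Top-n dates by total sales volume."""
    return df.groupby("Дата")["Продажи"].sum().nlargest(n)
```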
5. **Add visualization functions**:
- `plot_time_series()`: Line chart of sales over time by product
- `plot_product_comparison()`: Bar chart comparing total sales by product
- `plot_correlation_heatmap()`: Seaborn heatmap of numeric correlations
- Ensure all plots use Russian labels and professional styling
- Use `st.pyplot()` to display matplotlib figures
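One of the plot functions above might look like this (the correlation heatmap is analogous with `seaborn.heatmap`; figure size and styling choices here are assumptions):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

def plot_time_series(df: pd.DataFrame):
    """Line chart of sales over time, one line per product, with Russian labels."""
    fig, ax = plt.subplots(figsize=(10, 5))
    for product, group in df.groupby("Продукт"):
        ax.plot(group["Дата"], group["Продажи"], marker="o", label=product)
    ax.set_title("Динамика продаж")  # "Sales over time"
    ax.set_xlabel("Дата")
    ax.set_ylabel("Продажи")
    ax.legend()
    fig.autofmt_xdate()
    return fig

# In the app: st.pyplot(plot_time_series(df))
```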
6. **Create download functionality**:
- Generate CSV export of analysis results
- Use `st.download_button()` for report downloads
- Include timestamp in filename
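A sketch of the export step: the helper builds the timestamped filename and CSV bytes, and the commented lines show how it would plug into `st.download_button()` (the function name and label text are illustrative):

```python
from datetime import datetime
import pandas as pd

def export_csv(results: pd.DataFrame) -> tuple:
    """Return (filename, bytes) for a timestamped CSV download."""
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    filename = f"sales_report_{stamp}.csv"
    # utf-8-sig adds a BOM so Excel opens Cyrillic text correctly
    data = results.to_csv(index=False).encode("utf-8-sig")
    return filename, data

# In the app:
# filename, data = export_csv(report_df)
# st.download_button("Скачать отчёт (CSV)", data=data, file_name=filename, mime="text/csv")
```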
7. **Create pytest configuration** (`pytest.ini`):
```ini
[pytest]
testpaths = tests
python_files = test_*.py
python_classes = Test*
python_functions = test_*
markers =
    slow: marks tests as slow (deselect with '-m "not slow"')
    performance: marks performance-related tests
```
8. **Set up test fixtures** (`tests/conftest.py`):
- `sample_dataframe`: Create realistic sales data for testing
- `invalid_dataframe`: Missing columns for error testing
- `empty_dataframe`: Empty DataFrame for edge cases
- `large_dataframe`: Performance testing dataset (1000+ rows)
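The fixtures above might be built from plain factory functions so the same data is reusable outside pytest; sizes, product names, and the seed are illustrative choices:

```python
import numpy as np
import pandas as pd
import pytest

def make_sample_dataframe() -> pd.DataFrame:
    """Realistic sales data: 30 days, two products, seeded for reproducibility."""
    rng = np.random.default_rng(42)
    dates = pd.date_range("2024-01-01", periods=30, freq="D")
    return pd.DataFrame({
        "Дата": dates.repeat(2),
        "Продукт": ["Товар А", "Товар Б"] * 30,
        "Продажи": rng.integers(50, 500, size=60),
    })

@pytest.fixture
def sample_dataframe() -> pd.DataFrame:
    return make_sample_dataframe()

@pytest.fixture
def invalid_dataframe() -> pd.DataFrame:
    """Missing the required 'Продажи' column, for error-path tests."""
    return pd.DataFrame({"Дата": ["2024-01-01"], "Продукт": ["Товар А"]})

@pytest.fixture
def empty_dataframe() -> pd.DataFrame:
    return pd.DataFrame(columns=["Дата", "Продукт", "Продажи"])
```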
9. **Write unit tests** (`tests/test_analysis_functions.py`):
- Test each analysis function with various data scenarios
- Mock Streamlit components using `unittest.mock`
- Validate output types and value ranges
- Test error handling with invalid inputs
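A minimal shape for these unit tests; in the real suite the function under test would be imported from the app module, so the stand-in definition below is just a placeholder for illustration:

```python
import pandas as pd
import pytest

# Stand-in for: from streamlit_app import calculate_basic_stats
def calculate_basic_stats(df, column="Продажи"):
    s = df[column]
    return {"mean": s.mean(), "median": s.median(), "min": s.min(), "max": s.max()}

def test_basic_stats_values():
    df = pd.DataFrame({"Продажи": [10, 20, 30]})
    stats = calculate_basic_stats(df)
    assert stats["mean"] == 20
    assert stats["min"] == 10

def test_basic_stats_missing_column():
    # Invalid input: the required column is absent
    with pytest.raises(KeyError):
        calculate_basic_stats(pd.DataFrame({"Другое": [1]}))
```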
10. **Write integration tests** (`tests/test_streamlit_integration.py`):
- Test full application workflow from upload to visualization
- Verify component interactions (file upload → analysis → display)
- Test sample data generation and loading
- Validate end-to-end data processing pipeline
11. **Write code quality tests** (`tests/test_code_quality.py`):
- Syntax validation with `py_compile`
- Code structure checks (imports, functions, classes)
- Security checks (no hardcoded credentials, SQL injection risks)
- Performance benchmarks (marked with `@pytest.mark.slow`)
12. **Write data validation tests** (`tests/test_data_samples.py`):
- Test with realistic data scenarios
- Validate edge cases (single row, missing dates, negative values)
- Test data type conversions and error handling
13. **Create test runner script** (`tests/run_tests.py`):
- Automated test execution with summary report
- Optional performance testing flag
- Coverage reporting integration
- Colored output for better readability
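The runner's core can delegate to `pytest.main()`; splitting out the argument builder keeps the flag logic testable. The `--performance` flag name is an assumption:

```python
import sys
import pytest

def build_args(argv) -> list:
    """Build pytest arguments; slow tests run only with --performance."""
    args = ["tests/", "-v"]
    if "--performance" not in argv:
        args += ["-m", "not slow"]
    return args

def main() -> int:
    return pytest.main(build_args(sys.argv[1:]))

if __name__ == "__main__":
    sys.exit(main())
```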
14. **Create Dockerfile**:
```dockerfile
FROM python:3.13-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY streamlit_app.py .
EXPOSE 8501
CMD ["streamlit", "run", "streamlit_app.py", "--server.port=8501", "--server.address=0.0.0.0"]
```
15. **Create `.dockerignore`**:
- Exclude: `venv/`, `__pycache__/`, `.git/`, `tests/`, `docs/`
- Include only: `streamlit_app.py`, `requirements.txt`
- Keep image size minimal (production-only files)
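A `.dockerignore` matching the exclusions above might look like:

```
venv/
__pycache__/
*.pyc
.git/
.github/
tests/
docs/
```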
16. **Test Docker build locally**:
```bash
docker build -t sales-analytics-app .
docker run -p 8501:8501 sales-analytics-app
```
17. **Create GitHub Actions workflow** (`.github/workflows/ci-cd.yml`):
- **Test job**: Run on PRs to `main`
- Set up Python 3.13
- Install dependencies from requirements.txt
- Run pytest with coverage report
- Upload coverage artifact
- **Build job**: Run on PRs (depends on test job)
- Build Docker image for validation
- No push to registry (PR validation only)
- **Deploy job**: Run on push to `main` (depends on test job)
- Set up Docker Buildx for multi-platform builds
- Login to Docker Hub
- Build and push with tags: `latest`, `branch-name`, `sha-<commit>`
- Target platforms: `linux/amd64`, `linux/arm64`
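A condensed sketch of this workflow; action versions and job layout are assumptions, and the real file would add the PR-only build job plus the branch-name and commit-SHA image tags:

```yaml
name: CI/CD
on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.13"
      - run: pip install -r requirements.txt pytest pytest-cov
      - run: pytest tests/ --cov=streamlit_app

  deploy:
    if: github.event_name == 'push'
    needs: test
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: docker/setup-buildx-action@v3
      - uses: docker/login-action@v3
        with:
          username: ${{ secrets.DOCKER_USERNAME }}
          password: ${{ secrets.DOCKER_TOKEN }}
      - uses: docker/build-push-action@v5
        with:
          push: true
          platforms: linux/amd64,linux/arm64
          tags: ${{ secrets.DOCKER_USERNAME }}/sales-analytics-app:latest
```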
18. **Configure GitHub Secrets**:
- `DOCKER_USERNAME`: Docker Hub username
- `DOCKER_TOKEN`: Docker Hub access token (generate from Docker Hub settings)
19. **Set up branch protection rules** (recommended):
- Require passing tests before merge to `main`
- Require at least 1 approval for PRs
- Enable automatic deletion of merged branches
20. **Create comprehensive README.md**:
- Project overview and features
- Installation instructions (local + Docker)
- Usage examples with screenshots
- Testing instructions
- CI/CD workflow explanation
- Contribution guidelines
21. **Create CLAUDE.md** (this file) for AI agent guidance:
- Project structure overview
- Key files and their purposes
- Development commands
- Testing architecture
- Known issues and troubleshooting
22. **Test deployment workflow**:
- Create feature branch from `dev`
- Make changes and push
- Create PR to `main` (triggers test + build jobs)
- Merge to `main` (triggers deploy job)
- Verify image available on Docker Hub
- Pull and run production image
**Run locally**:
```bash
pip install -r requirements.txt
streamlit run streamlit_app.py
```
**Run tests**:
```bash
pytest tests/ -v
pytest tests/ --cov=streamlit_app --cov-report=term-missing
pytest tests/ -v -m "not slow"
python tests/run_tests.py
```
**Docker commands**:
```bash
docker build -t sales-analytics-app .
docker run -p 8501:8501 sales-analytics-app
docker pull YOUR_USERNAME/sales-analytics-app:latest
docker run -p 8501:8501 YOUR_USERNAME/sales-analytics-app:latest
```
**Deployment workflow**:
```bash
git checkout dev
git add .
git commit -m "Add feature"
git push origin dev
git checkout main
git merge dev
git push origin main  # Triggers CI/CD deployment
```
1. **Data Format Requirements**:
- Excel files must have columns: `Дата` (Date), `Продукт` (Product), `Продажи` (Sales)
- Date format: YYYY-MM-DD or automatic conversion
- Sales values: numeric (integer or float)
2. **Language Considerations**:
- Primary interface in Russian
- Bilingual documentation (Russian + English)
- Use UTF-8 encoding for all files
- Windows console may have Cyrillic display issues (use Docker for production)
3. **Performance Guidelines**:
- Sample data generation: up to 1000 rows recommended
- Large file uploads (>10MB): add progress indicators
- Cache expensive computations with `@st.cache_data`
- Mark performance tests with `@pytest.mark.slow`
4. **Testing Standards**:
- Minimum 80% code coverage target
- All tests must pass before merging to `main`
- Mock Streamlit components in unit tests
- Use fixtures for consistent test data
5. **Docker Best Practices**:
- Use slim Python base image (python:3.13-slim)
- Exclude development files via `.dockerignore`
- Self-contained application (no external data dependencies)
- Multi-platform builds for broader compatibility
6. **CI/CD Requirements**:
- PRs trigger test + build jobs (no deployment)
- Push to `main` triggers full deployment pipeline
- Docker Hub credentials required in GitHub secrets
- Branch protection recommended for production safety
Your implementation is complete when all tests pass locally and in CI, the Docker image builds and runs, and a push to `main` publishes the image to Docker Hub.