Interactive web-based sales data analysis tool with Excel upload, statistical reporting, and visualizations. Bilingual (Russian/English) Streamlit application for time series analysis.
A comprehensive data analysis skill for building interactive Streamlit applications that analyze sales data from Excel files. This skill demonstrates best practices for data analysis workflows, including statistical reporting, time series visualization, and bilingual documentation.
This skill enables you to create a professional data analysis web application with Excel upload, statistical reporting, and interactive time series visualizations.
When using this skill, you'll create:
```
project_root/
├── streamlit_app.py # Main Streamlit application (self-contained)
├── requirements.txt # Python dependencies
├── Dockerfile # Container configuration
├── .dockerignore # Docker build exclusions
├── pytest.ini # Pytest configuration
├── tests/
│ ├── conftest.py # Shared test fixtures
│ ├── test_analysis_functions.py # Unit tests
│ ├── test_streamlit_integration.py # Integration tests
│ ├── test_code_quality.py # Code quality tests
│ ├── test_data_samples.py # Data validation tests
│ └── run_tests.py # Automated test runner
├── .github/workflows/
│ └── ci-cd.yml # GitHub Actions pipeline
└── docs/
└── sample_sales_data.xlsx # Example data (optional)
```
1. **Create virtual environment and install dependencies**
```bash
python -m venv venv
source venv/bin/activate # Linux/Mac
# or
.\venv\Scripts\activate # Windows
```
2. **Create `requirements.txt`** with production dependencies:
```
streamlit>=1.28.0
pandas>=2.0.0
openpyxl>=3.1.0
matplotlib>=3.7.0
seaborn>=0.12.0
```
3. **Build main Streamlit application** (`streamlit_app.py`):
- Set page config with Russian title and wide layout
- Create file uploader for Excel files (`.xlsx`)
- Implement sample data generator function for testing
- Add data validation (check for required columns: Дата, Продукт, Продажи)
- Display raw data preview with `st.dataframe()`
- Convert date columns to datetime format automatically
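The validation and date-conversion logic from the bullets above can be sketched in pandas; the column names follow the required-columns spec, while helper names like `validate_columns` and `prepare_dataframe` are illustrative, not a fixed API:

```python
import pandas as pd

# Required columns per the data format spec (Дата = Date, Продукт = Product, Продажи = Sales)
REQUIRED_COLUMNS = ["Дата", "Продукт", "Продажи"]

def validate_columns(df: pd.DataFrame) -> list:
    """Return required columns that are missing from the uploaded DataFrame."""
    return [col for col in REQUIRED_COLUMNS if col not in df.columns]

def prepare_dataframe(df: pd.DataFrame) -> pd.DataFrame:
    """Convert the date column to datetime and drop rows that fail to parse."""
    out = df.copy()
    out["Дата"] = pd.to_datetime(out["Дата"], errors="coerce")
    return out.dropna(subset=["Дата"])
```

In the app, `validate_columns` would run right after `pd.read_excel(uploaded_file)`, with `st.error()` reporting any missing columns before analysis proceeds.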
4. **Implement statistical analysis functions**:
- `calculate_basic_stats()`: Mean, median, std, min, max for numeric columns
- `calculate_trends()`: Group by product, calculate totals and averages
- `identify_peak_periods()`: Find top N periods by sales volume
- `generate_analysis_report()`: Create bilingual summary report (Russian + English)
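Three of the four functions above can be sketched with plain pandas (the bilingual report generator is omitted for brevity; signatures and return types are assumptions, not a fixed contract):

```python
import pandas as pd

def calculate_basic_stats(df: pd.DataFrame, column: str = "Продажи") -> dict:
    """Descriptive statistics for one numeric column."""
    s = df[column]
    return {"mean": s.mean(), "median": s.median(), "std": s.std(),
            "min": s.min(), "max": s.max()}

def calculate_trends(df: pd.DataFrame) -> pd.DataFrame:
    """Total and average sales per product."""
    return df.groupby("Продукт")["Продажи"].agg(["sum", "mean"])

def identify_peak_periods(df: pd.DataFrame, n: int = 3) -> pd.Series:
    """Top-n dates by total sales volume."""
    return df.groupby("Дата")["Продажи"].sum().nlargest(n)
```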
5. **Add visualization functions**:
- `plot_time_series()`: Line chart of sales over time by product
- `plot_product_comparison()`: Bar chart comparing total sales by product
- `plot_correlation_heatmap()`: Seaborn heatmap of numeric correlations
- Ensure all plots use Russian labels and professional styling
- Use `st.pyplot()` to display matplotlib figures
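One of the plot functions above might look like this (the correlation heatmap is analogous with `seaborn.heatmap`; figure size and styling choices here are assumptions):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

def plot_time_series(df: pd.DataFrame):
    """Line chart of sales over time, one line per product, with Russian labels."""
    fig, ax = plt.subplots(figsize=(10, 5))
    for product, group in df.groupby("Продукт"):
        ax.plot(group["Дата"], group["Продажи"], marker="o", label=product)
    ax.set_title("Динамика продаж")  # "Sales over time"
    ax.set_xlabel("Дата")
    ax.set_ylabel("Продажи")
    ax.legend()
    fig.autofmt_xdate()
    return fig

# In the app: st.pyplot(plot_time_series(df))
```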
6. **Create download functionality**:
- Generate CSV export of analysis results
- Use `st.download_button()` for report downloads
- Include timestamp in filename
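A sketch of the export step: the helper builds the timestamped filename and CSV bytes, and the commented lines show how it would plug into `st.download_button()` (the function name and label text are illustrative):

```python
from datetime import datetime
import pandas as pd

def export_csv(results: pd.DataFrame) -> tuple:
    """Return (filename, bytes) for a timestamped CSV download."""
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    filename = f"sales_report_{stamp}.csv"
    # utf-8-sig adds a BOM so Excel opens Cyrillic text correctly
    data = results.to_csv(index=False).encode("utf-8-sig")
    return filename, data

# In the app:
# filename, data = export_csv(report_df)
# st.download_button("Скачать отчёт (CSV)", data=data, file_name=filename, mime="text/csv")
```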
7. **Create pytest configuration** (`pytest.ini`):
```ini
[pytest]
testpaths = tests
python_files = test_*.py
python_classes = Test*
python_functions = test_*
markers =
    slow: marks tests as slow (deselect with '-m "not slow"')
    performance: marks performance-related tests
```
8. **Set up test fixtures** (`tests/conftest.py`):
- `sample_dataframe`: Create realistic sales data for testing
- `invalid_dataframe`: Missing columns for error testing
- `empty_dataframe`: Empty DataFrame for edge cases
- `large_dataframe`: Performance testing dataset (1000+ rows)
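The fixtures above might be built from plain factory functions so the same data is reusable outside pytest; sizes, product names, and the seed are illustrative choices:

```python
import numpy as np
import pandas as pd
import pytest

def make_sample_dataframe() -> pd.DataFrame:
    """Realistic sales data: 30 days, two products, seeded for reproducibility."""
    rng = np.random.default_rng(42)
    dates = pd.date_range("2024-01-01", periods=30, freq="D")
    return pd.DataFrame({
        "Дата": dates.repeat(2),
        "Продукт": ["Товар А", "Товар Б"] * 30,
        "Продажи": rng.integers(50, 500, size=60),
    })

@pytest.fixture
def sample_dataframe() -> pd.DataFrame:
    return make_sample_dataframe()

@pytest.fixture
def invalid_dataframe() -> pd.DataFrame:
    """Missing the required 'Продажи' column, for error-path tests."""
    return pd.DataFrame({"Дата": ["2024-01-01"], "Продукт": ["Товар А"]})

@pytest.fixture
def empty_dataframe() -> pd.DataFrame:
    return pd.DataFrame(columns=["Дата", "Продукт", "Продажи"])
```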
9. **Write unit tests** (`tests/test_analysis_functions.py`):
- Test each analysis function with various data scenarios
- Mock Streamlit components using `unittest.mock`
- Validate output types and value ranges
- Test error handling with invalid inputs
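A minimal shape for these unit tests; in the real suite the function under test would be imported from the app module, so the stand-in definition below is just a placeholder for illustration:

```python
import pandas as pd
import pytest

# Stand-in for: from streamlit_app import calculate_basic_stats
def calculate_basic_stats(df, column="Продажи"):
    s = df[column]
    return {"mean": s.mean(), "median": s.median(), "min": s.min(), "max": s.max()}

def test_basic_stats_values():
    df = pd.DataFrame({"Продажи": [10, 20, 30]})
    stats = calculate_basic_stats(df)
    assert stats["mean"] == 20
    assert stats["min"] == 10

def test_basic_stats_missing_column():
    # Invalid input: the required column is absent
    with pytest.raises(KeyError):
        calculate_basic_stats(pd.DataFrame({"Другое": [1]}))
```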
10. **Write integration tests** (`tests/test_streamlit_integration.py`):
- Test full application workflow from upload to visualization
- Verify component interactions (file upload → analysis → display)
- Test sample data generation and loading
- Validate end-to-end data processing pipeline
11. **Write code quality tests** (`tests/test_code_quality.py`):
- Syntax validation with `py_compile`
- Code structure checks (imports, functions, classes)
- Security checks (no hardcoded credentials, SQL injection risks)
- Performance benchmarks (marked with `@pytest.mark.slow`)
12. **Write data validation tests** (`tests/test_data_samples.py`):
- Test with realistic data scenarios
- Validate edge cases (single row, missing dates, negative values)
- Test data type conversions and error handling
13. **Create test runner script** (`tests/run_tests.py`):
- Automated test execution with summary report
- Optional performance testing flag
- Coverage reporting integration
- Colored output for better readability
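The runner's core can delegate to `pytest.main()`; splitting out the argument builder keeps the flag logic testable. The `--performance` flag name is an assumption:

```python
import sys
import pytest

def build_args(argv) -> list:
    """Build pytest arguments; slow tests run only with --performance."""
    args = ["tests/", "-v"]
    if "--performance" not in argv:
        args += ["-m", "not slow"]
    return args

def main() -> int:
    return pytest.main(build_args(sys.argv[1:]))

if __name__ == "__main__":
    sys.exit(main())
```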
14. **Create Dockerfile**:
```dockerfile
FROM python:3.13-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY streamlit_app.py .
EXPOSE 8501
CMD ["streamlit", "run", "streamlit_app.py", "--server.port=8501", "--server.address=0.0.0.0"]
```
15. **Create `.dockerignore`**:
- Exclude: `venv/`, `__pycache__/`, `.git/`, `tests/`, `docs/`
- Include only: `streamlit_app.py`, `requirements.txt`
- Keep image size minimal (production-only files)
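A `.dockerignore` matching the exclusions above might look like:

```
venv/
__pycache__/
*.pyc
.git/
.github/
tests/
docs/
```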
16. **Test Docker build locally**:
```bash
docker build -t sales-analytics-app .
docker run -p 8501:8501 sales-analytics-app
```
17. **Create GitHub Actions workflow** (`.github/workflows/ci-cd.yml`):
- **Test job**: Run on PRs to `main`
- Set up Python 3.13
- Install dependencies from requirements.txt
- Run pytest with coverage report
- Upload coverage artifact
- **Build job**: Run on PRs (depends on test job)
- Build Docker image for validation
- No push to registry (PR validation only)
- **Deploy job**: Run on push to `main` (depends on test job)
- Set up Docker Buildx for multi-platform builds
- Login to Docker Hub
- Build and push with tags: `latest`, `branch-name`, `sha-<commit>`
- Target platforms: `linux/amd64`, `linux/arm64`
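A condensed sketch of this workflow; action versions and job layout are assumptions, and the real file would add the PR-only build job plus the branch-name and commit-SHA image tags:

```yaml
name: CI/CD
on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.13"
      - run: pip install -r requirements.txt pytest pytest-cov
      - run: pytest tests/ --cov=streamlit_app

  deploy:
    if: github.event_name == 'push'
    needs: test
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: docker/setup-buildx-action@v3
      - uses: docker/login-action@v3
        with:
          username: ${{ secrets.DOCKER_USERNAME }}
          password: ${{ secrets.DOCKER_TOKEN }}
      - uses: docker/build-push-action@v5
        with:
          push: true
          platforms: linux/amd64,linux/arm64
          tags: ${{ secrets.DOCKER_USERNAME }}/sales-analytics-app:latest
```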
18. **Configure GitHub Secrets**:
- `DOCKER_USERNAME`: Docker Hub username
- `DOCKER_TOKEN`: Docker Hub access token (generate from Docker Hub settings)
19. **Set up branch protection rules** (recommended):
- Require passing tests before merge to `main`
- Require at least 1 approval for PRs
- Enable automatic deletion of merged branches
20. **Create comprehensive README.md**:
- Project overview and features
- Installation instructions (local + Docker)
- Usage examples with screenshots
- Testing instructions
- CI/CD workflow explanation
- Contribution guidelines
21. **Create CLAUDE.md** (this file) for AI agent guidance:
- Project structure overview
- Key files and their purposes
- Development commands
- Testing architecture
- Known issues and troubleshooting
22. **Test deployment workflow**:
- Create feature branch from `dev`
- Make changes and push
- Create PR to `main` (triggers test + build jobs)
- Merge to `main` (triggers deploy job)
- Verify image available on Docker Hub
- Pull and run production image
**Run locally**:
```bash
pip install -r requirements.txt
streamlit run streamlit_app.py
```
**Run tests**:
```bash
pytest tests/ -v
pytest tests/ --cov=streamlit_app --cov-report=term-missing
pytest tests/ -v -m "not slow"
python tests/run_tests.py
```
**Docker commands**:
```bash
docker build -t sales-analytics-app .
docker run -p 8501:8501 sales-analytics-app
docker pull YOUR_USERNAME/sales-analytics-app:latest
docker run -p 8501:8501 YOUR_USERNAME/sales-analytics-app:latest
```
**Deployment workflow**:
```bash
git checkout dev
git add .
git commit -m "Add feature"
git push origin dev
git checkout main
git merge dev
git push origin main  # Triggers CI/CD deployment
```
1. **Data Format Requirements**:
- Excel files must have columns: `Дата` (Date), `Продукт` (Product), `Продажи` (Sales)
- Date format: YYYY-MM-DD or automatic conversion
- Sales values: numeric (integer or float)
2. **Language Considerations**:
- Primary interface in Russian
- Bilingual documentation (Russian + English)
- Use UTF-8 encoding for all files
- Windows console may have Cyrillic display issues (use Docker for production)
3. **Performance Guidelines**:
- Sample data generation: up to 1000 rows recommended
- Large file uploads (>10MB): add progress indicators
- Cache expensive computations with `@st.cache_data`
- Mark performance tests with `@pytest.mark.slow`
4. **Testing Standards**:
- Minimum 80% code coverage target
- All tests must pass before merging to `main`
- Mock Streamlit components in unit tests
- Use fixtures for consistent test data
5. **Docker Best Practices**:
- Use slim Python base image (python:3.13-slim)
- Exclude development files via `.dockerignore`
- Self-contained application (no external data dependencies)
- Multi-platform builds for broader compatibility
6. **CI/CD Requirements**:
- PRs trigger test + build jobs (no deployment)
- Push to `main` triggers full deployment pipeline
- Docker Hub credentials required in GitHub secrets
- Branch protection recommended for production safety
Your implementation is complete when all tests pass locally and in CI, the Docker image builds and runs, and a push to `main` publishes the image to Docker Hub.