# DictPFL Research Implementation Assistant
Expert assistant for working with DictPFL (Dictionary-based Private Federated Learning), a research implementation demonstrating efficient homomorphic encryption in federated learning using dictionary decomposition (DePE) and pruning with reactivation (PrME).
## What This Skill Does
This skill helps you understand, develop, test, and extend the DictPFL research codebase, which compares three federated learning approaches with homomorphic encryption:
1. **FedHE-Full** - Baseline with full gradient encryption
2. **FedML-HE** - Partial encryption (top 10% gradients)
3. **DictPFL** - Dictionary decomposition with pruned encrypted gradients
Target performance: < 60 seconds total training time, < 1 second per round on toy-scale datasets.
## Step-by-Step Instructions
### 1. Repository Setup and Environment Verification
When starting work on the DictPFL project:
- Verify the repository structure contains the core files: `demo.py`, `dataset.py`, `model.py`, `fhe_utils.py`, `fedhe_full.py`, `fedml_he.py`, `dictpfl.py`, `metrics.py`, `plots.py`
- Check that the Python dependencies are installed: `torch`, `numpy`, `scikit-learn`, `matplotlib`, `gradio`, `Pyfhel`
- If dependencies are missing, install them via: `pip install torch numpy scikit-learn matplotlib gradio Pyfhel`
- Locate the research paper `2510.21086v1.pdf` in the repository root for the mathematical formulations

### 2. Understanding the Architecture
Before making changes, explain the architecture:
- **Federated Learning Pipeline**: 5 clients with non-IID data partitioning, local training, gradient encryption, server aggregation
- **FHE Layer**: Pyfhel library with the CKKS scheme (n=8192, scale=2^30); only `encryptFrac()`, `decryptFrac()`, and ciphertext addition are supported
- **DictPFL Specifics**:
  - DePE: SVD factorization (W ≈ D × T); encrypt only the lookup table T gradients
  - PrME: Gradient pruning with probabilistic reactivation (β=0.2)
  - The dictionary D is fixed; only the lookup table T is trained
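The DePE decomposition above can be sketched in plain NumPy (the shapes and rank below are hypothetical illustrations; the repo's actual factorization lives in `dictpfl.py`):

```python
import numpy as np

# DePE sketch: factor a weight matrix W (64 x 32) into a fixed dictionary
# D (64 x r) and a trainable lookup table T (r x 32) via truncated SVD.
# Only T's gradients (r*32 values instead of 64*32) would be encrypted.
rng = np.random.default_rng(0)
W = rng.standard_normal((64, 32))
r = 10  # dictionary rank (hypothetical value)

U, S, Vt = np.linalg.svd(W, full_matrices=False)
D = U[:, :r]                   # fixed dictionary: never retrained
T = np.diag(S[:r]) @ Vt[:r]    # lookup table: the only trained factor

# Truncated SVD is the best rank-r approximation of W (Eckart-Young),
# so the error is controlled by the discarded singular values.
rel_err = np.linalg.norm(W - D @ T) / np.linalg.norm(W)
print(f"encrypted values per step: {T.size} vs {W.size}; rel. error {rel_err:.2f}")
```

Lowering `r` shrinks the encrypted payload at the cost of a coarser approximation.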
- **Simulation Flow**: Client-side local training → gradient selection → encryption → server aggregation → decryption → model update

### 3. Running the Demo
To execute the interactive Gradio demo:
- Run: `python demo.py`
- The demo allows selecting the dataset (make_moons or an MNIST subset), the method (FedHE-Full, FedML-HE, DictPFL), and hyperparameters
- Expected output: real-time training metrics (accuracy, communication cost, encryption time)
- Performance target: complete training in < 60 seconds

### 4. Implementing Tests
When adding test coverage:
- Create a `tests/` directory if not present
- Implement unit tests for FHE operations in `tests/test_fhe_utils.py`:
  - Encrypt/decrypt correctness (the roundtrip should match the original float)
  - Homomorphic addition: decrypt(enc(a) + enc(b)) == a + b
  - Batch encryption correctness
- Integration tests for each method (`tests/test_fedhe_full.py`, etc.):
  - End-to-end gradient aggregation
  - Verify that encrypted aggregation matches plaintext aggregation
- Performance benchmarks in `tests/test_performance.py`:
  - Time per round < 1 second
  - Communication bytes comparison across methods
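A back-of-the-envelope communication comparison can anchor the benchmark expectations (the model size, DictPFL payload, and per-method fractions below are illustrative assumptions; real ciphertext sizes depend on the CKKS context and serialization):

```python
# CKKS with poly_modulus_degree n = 8192 packs n/2 = 4096 float slots per
# ciphertext, so the ciphertext count per round is ceil(values / 4096).
N_SLOTS = 8192 // 2
P = 10_000  # total gradient values (hypothetical model size)

def ciphertexts_needed(values: int) -> int:
    return -(-values // N_SLOTS)  # ceiling division

methods = {
    "FedHE-Full": P,      # every gradient encrypted
    "FedML-HE": P // 10,  # top 10% encrypted (rest sent in plaintext)
    "DictPFL": 500,       # lookup-table gradients only (hypothetical; depends on rank r)
}
for name, values in methods.items():
    print(f"{name}: {values} encrypted values -> {ciphertexts_needed(values)} ciphertext(s)")
```

Counting ciphertexts rather than raw bytes keeps the benchmark independent of library-specific serialization overhead.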
- Convergence tests: verify that accuracy targets are met within the expected number of rounds
- Run tests: `python -m pytest tests/ -v`

### 5. Modifying FHE Parameters
If adjusting homomorphic encryption settings in `fhe_utils.py`:
- CKKS context parameters: `poly_modulus_degree=8192`, `coeff_mod_bit_sizes=[60,40,40,60]`, `scale=2^30`
- Increasing `poly_modulus_degree` improves security but slows operations
- Adjusting `scale` affects precision (a higher scale gives more precision but larger ciphertexts)
- Only modify these if you understand the CKKS scheme deeply; the default parameters are tuned for demo performance
- Test encryption/decryption accuracy after any change

### 6. Extending to New Datasets
To add support for new datasets beyond make_moons and MNIST:
- Modify `dataset.py` to include the new data-loading function
- Ensure non-IID partitioning logic is applied (label skew or quantity skew)
- Keep the dataset small (1000-5000 samples total) to meet the < 60s performance target
- Update the `demo.py` Gradio interface to include the new dataset option
- Test convergence on the new dataset for all three methods

### 7. Optimizing Performance
When performance targets are not met:
- **Profile bottlenecks**: Use Python's `cProfile` or `line_profiler` to identify slow operations
- **Batch encryption**: Pack multiple gradient values into a single ciphertext (see the batch functions in `fhe_utils.py`)
- **Reduce model size**: Use fewer parameters (hundreds, not thousands)
- **Prune more aggressively**: Increase the pruning ratio `s` in PrME (trade-off: slower convergence)
- **Parallelize client operations**: Use Python's `multiprocessing` for independent client training
- Verify per-round time < 1 second and total time < 60 seconds

### 8. Privacy and Security Analysis
When evaluating privacy guarantees:
- **FedHE-Full**: Full gradient-level privacy, highest communication overhead
- **FedML-HE**: Partial privacy (90% of gradients are exposed in plaintext)
- **DictPFL**: Full privacy with reduced communication (a smaller encrypted lookup table)
- Refer to Section 3 of the paper (`2510.21086v1.pdf`) for the formal privacy analysis
- FHE provides computational security; no multiplication or bootstrapping is needed in this implementation
- The server cannot infer individual gradients from the homomorphic sum

### 9. Code Maintenance Best Practices
When modifying the codebase:
- Keep FHE operations simple: only addition, no multiplication or rotation
- Document any change to the dictionary rank `r`, pruning ratio `s`, or reactivation probability `β`
- Maintain a single CKKS context for simplicity (the demo assumes a shared context)
- Add type hints to new functions for clarity
- Update this CLAUDE.md file if the architecture changes
- Test convergence after any method modification

### 10. Troubleshooting Common Issues
If encountering problems:
- **Import errors**: Verify that Pyfhel installed correctly (it may require system dependencies on some operating systems)
- **Slow encryption**: Check the CKKS parameters; consider batch encryption
- **Poor convergence**: Tune the learning rate, reduce the pruning ratio, adjust the dictionary rank
- **Memory errors**: Reduce the dataset or model size
- **Gradio interface not loading**: Check for port conflicts; try `demo.launch(share=True)`

## Important Constraints
- **FHE Limitation**: Only ciphertext addition is used; no ciphertext multiplication or bootstrapping
- **Simulation**: A single shared context across clients (production would use individual key pairs)
- **Performance Target**: The demo must complete in < 60 seconds; prioritize speed over scalability
- **Dataset Scale**: Use toy datasets (1000-5000 samples) for demo purposes
- **Privacy Trade-off**: DictPFL reduces communication cost while maintaining full gradient privacy

## Reference Materials
- Research paper: `2510.21086v1.pdf` (in the repository root)
- Pyfhel documentation: [https://pyfhel.readthedocs.io/](https://pyfhel.readthedocs.io/)
- CKKS scheme overview: see Section 2.1 of the paper for cryptographic details

## Example Usage
**Adding a new test:**
```python
# tests/test_fhe_utils.py
from fhe_utils import FHEContext

def test_homomorphic_addition():
    ctx = FHEContext()
    a, b = 3.14, 2.71
    ct_a = ctx.encryptFrac(a)
    ct_b = ctx.encryptFrac(b)
    ct_sum = ct_a + ct_b
    result = ctx.decryptFrac(ct_sum)
    assert abs(result - (a + b)) < 1e-3, "Homomorphic addition failed"
```
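**Testing aggregation in plaintext (sketch):** Before testing the encrypted path, the round logic can be exercised in plaintext; the encrypted aggregation (a sum of ciphertexts, decrypted and divided by the client count) should reproduce this average. The linear least-squares model and client data below are hypothetical stand-ins for the repo's actual models:

```python
import numpy as np

# Five clients each hold least-squares data; one federated round =
# local gradients -> server average -> global update. Homomorphic
# aggregation replaces only the averaging step.
rng = np.random.default_rng(1)
w_true = np.array([2.0, -1.0])
clients = []
for _ in range(5):
    X = rng.standard_normal((50, 2))
    clients.append((X, X @ w_true))

w = np.zeros(2)
for _ in range(30):  # rounds
    grads = [2 * X.T @ (X @ w - y) / len(y) for X, y in clients]
    w -= 0.1 * np.mean(grads, axis=0)  # plaintext stand-in for encrypted sum / n_clients

print("recovered weights:", np.round(w, 3))
```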
**Modifying dictionary rank:**
```python
# dictpfl.py
# Change the rank from the default to a smaller value for faster encryption
DICTIONARY_RANK = 10  # Reduce from the default (e.g., 20) for speed
```
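**Sketching a PrME-style pruning mask:** The prune-and-reactivate idea (pruning ratio `s`, reactivation probability β=0.2) can be illustrated as follows; the repo's exact logic in `dictpfl.py` may differ, and `s=0.5` here is a hypothetical value:

```python
import numpy as np

# Keep the largest (1 - s) fraction of gradient entries by magnitude,
# then reactivate pruned entries with probability beta so that no
# coordinate is permanently starved of updates. Illustrative only.
rng = np.random.default_rng(42)
s, beta = 0.5, 0.2  # pruning ratio and reactivation probability

grad = rng.standard_normal(100)
k = int((1 - s) * grad.size)
keep = np.zeros(grad.size, dtype=bool)
keep[np.argsort(np.abs(grad))[-k:]] = True

reactivated = (~keep) & (rng.random(grad.size) < beta)
mask = keep | reactivated

sparse_grad = grad * mask  # only these values are encrypted and sent
print(f"{mask.sum()} of {grad.size} entries transmitted")
```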
**Profiling performance:**
```bash
python -m cProfile -o profile.stats demo.py
python -c "import pstats; pstats.Stats('profile.stats').sort_stats('cumtime').print_stats(20)"
```