Arduino Sensor Analytics & LSTM Prediction System
Real-time sensor data collection, streaming, analysis, and prediction system. Collects temperature and humidity readings from an Arduino UNO with a DHT11 sensor, streams them through Apache Kafka, analyzes them with Apache Spark, predicts future values using dual LSTM models, and visualizes everything in modern web dashboards.
System Architecture
```
Arduino UNO + DHT11 → Serial → Python Producer → Kafka → Spark Streaming → Analysis
                                                               ↓
                                     Dual LSTM Models ← Historical Data
                                             ↓
                          Web Dashboard (Real-time visualization)
```
Instructions
1. Understand the Project Structure
Before starting, familiarize yourself with the key components:
- **docker-compose.yml**: Kafka (KRaft mode) + Spark cluster infrastructure
- **environment.yml**: Conda Python 3.9 environment with TensorFlow 2.13.0 and Dash 2.14.1
- **simple_producer.py**: Kafka producer supporting sample data or real Arduino serial input (a minimal sketch follows this list)
- **spark-apps/spark_streaming.py**: Spark streaming job for hourly/minute aggregations
- **simple_lstm.py**: Dual LSTM models (separate for temperature and humidity)
- **dashboard.py**: Modern glassmorphism dashboard on port 8050
- **dashboard_legacy.py**: Simple minimal dashboard on port 8060
- **arduino_code.ino**: DHT11 sensor code for Arduino UNO (digital pin 2)
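For orientation, here is a minimal sketch of what the sample-data producer path might look like, assuming the kafka-python package and a broker on localhost:9092; see simple_producer.py for the actual implementation.

```python
# Minimal producer sketch (sample-data mode). Assumes the kafka-python
# package and a broker on localhost:9092; simple_producer.py may differ.
import json
import random
import time

from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

while True:
    reading = {
        "timestamp": time.time(),                              # epoch seconds
        "temperature": round(random.uniform(20.0, 30.0), 1),   # °C
        "humidity": round(random.uniform(40.0, 60.0), 1),      # % RH
    }
    producer.send("sensor-data", reading)  # topic from "Key Configuration Details"
    time.sleep(2)  # matches the 2-second cadence in the data-flow description
```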
2. Start the System
Use the provided scripts to launch all components:
```bash
# Start Docker (Kafka + Spark) and Conda environment
./start.sh
# In separate terminals:
./run_producer.sh # Terminal 1: Kafka producer
./run_spark.sh # Terminal 2: Spark streaming
./run_dashboard.sh # Terminal 3: Main dashboard
```
Or manually:
```bash
# Start infrastructure
docker-compose up -d
# Create (first time only) and activate the Conda environment
conda env create -f environment.yml
conda activate sensor-analytics
# Run components individually
python simple_producer.py
spark-submit --packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.4.1 spark-apps/spark_streaming.py
python dashboard.py
```
3. Access Web Interfaces
- **Main Dashboard**: http://localhost:8050 (modern design with glassmorphism)
- **Legacy Dashboard**: http://localhost:8060 (simple design)
- **Kafka UI**: http://localhost:8081 (topic/message inspection)
- **Spark UI**: http://localhost:8080 (job monitoring)
4. Configure Arduino Integration (Optional)
For real sensor data instead of sample data:
1. Upload **arduino_code.ino** to Arduino UNO with DHT11 sensor on digital pin 2
2. Find serial port:
```bash
# macOS
ls /dev/cu.*
# Linux
ls /dev/ttyUSB* /dev/ttyACM*
```
3. Edit **simple_producer.py** (a serial-reading sketch follows these steps):
```python
USE_SAMPLE_DATA = False
serial_port = '/dev/cu.usbserial-140' # Update with your port
```
4. Restart producer: `./run_producer.sh`
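For reference, the serial-reading path from step 3 might look like the sketch below. It assumes the pyserial package, 9600 baud, and a comma-separated "temperature,humidity" line format; the actual parsing in simple_producer.py may differ.

```python
# Serial-reading sketch. Assumes pyserial, 9600 baud, and lines of the form
# "temperature,humidity"; adjust the port and parsing to match your setup.
import serial

ser = serial.Serial("/dev/cu.usbserial-140", 9600, timeout=5)  # your port here

line = ser.readline().decode("ascii", errors="ignore").strip()
if line:
    temperature, humidity = (float(v) for v in line.split(","))
    print(f"temperature={temperature} °C, humidity={humidity} %")
```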
5. Monitor LSTM Model Training
The dashboard automatically retrains both LSTM models every minute:
- **Temperature LSTM**: Predicts the next minute's temperature
- **Humidity LSTM**: Predicts the next minute's humidity
- **Training Data**: Minute-averaged sensor readings (minimum 20 points)
- **Architecture**: Sequence length 10, 10 epochs, batch size 4
- **Performance**: MSE tracked separately for each model

Check the console logs for retraining status and MSE values.
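A minimal sketch of how one of the two per-metric models might be built and retrained, consistent with the parameters above and assuming scikit-learn is available for MinMaxScaler; simple_lstm.py is the authoritative implementation, and the layer size and stand-in data here are illustrative.

```python
# Sketch of one per-metric LSTM (sequence length 10, 10 epochs, batch size 4).
# The 32-unit layer and the random stand-in data are assumptions.
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from tensorflow.keras.layers import LSTM, Dense
from tensorflow.keras.models import Sequential

SEQ_LEN = 10

def make_sequences(series):
    """Build sliding-window (X, y) pairs from a 1-D scaled series."""
    X = [series[i:i + SEQ_LEN] for i in range(len(series) - SEQ_LEN)]
    y = series[SEQ_LEN:]
    return np.array(X)[..., np.newaxis], np.array(y)

readings = np.random.uniform(20, 30, size=60)  # stand-in for minute averages

scaler = MinMaxScaler()
scaled = scaler.fit_transform(readings.reshape(-1, 1)).ravel()
X, y = make_sequences(scaled)

model = Sequential([LSTM(32, input_shape=(SEQ_LEN, 1)), Dense(1)])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=10, batch_size=4, verbose=0)

# Predict the next minute from the latest window, then undo the scaling.
next_scaled = model.predict(scaled[-SEQ_LEN:].reshape(1, SEQ_LEN, 1), verbose=0)
print(scaler.inverse_transform(next_scaled)[0, 0])
```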
6. Debug Common Issues
**LSTM predictions showing 0.0 or fake values:**
- Ensure models are training with new data (check console logs)
- Verify separate temperature and humidity models are initialized
- Wait for at least 20 data points before retraining

**Graphs showing only one line:**
- Fixed Y-axis ranges prevent scaling issues
- Temperature: 15-35°C, Humidity: 25-85%
- Check that the `historical_data` list is populated

**Kafka connection errors:**
```bash
docker ps | grep kafka
docker logs kafka
```
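If the broker is up but messages are in doubt, a quick consumer check along these lines can help; it assumes the kafka-python package and the default localhost:9092 broker.

```python
# Quick Kafka sanity check: print a few raw messages from the sensor-data
# topic, or exit quietly if nothing arrives within 5 seconds.
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "sensor-data",
    bootstrap_servers="localhost:9092",
    auto_offset_reset="earliest",
    consumer_timeout_ms=5000,
)
for i, msg in enumerate(consumer):
    print(msg.value)
    if i >= 4:
        break
```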
**Serial port not found:**
- Reconnect the Arduino
- Check permissions: `sudo chmod 666 /dev/ttyUSB0`
7. Test Components Independently
```bash
# Test LSTM model standalone
python simple_lstm.py

# Test producer without Kafka
# (edit simple_producer.py to add print statements)

# Check Spark streaming output
# (view the terminal running run_spark.sh)
```
8. Stop the System
```bash
./stop.sh
# Or manually:
docker-compose down
conda deactivate
```
9. Customize Dashboard
**Modern Dashboard (dashboard.py):**
- Glassmorphism design with CSS Grid
- Fixed Y-axis ranges for visibility
- Auto-refresh every 2 seconds
- Gradient backgrounds

**Legacy Dashboard (dashboard_legacy.py):**
- Basic HTML tables
- Simple charts
- Minimal styling
- Port 8060

Modify the CSS in dashboard.py, or create new layouts by copying its structure.
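The 2-second auto-refresh in both dashboards follows the standard Dash polling pattern, roughly as sketched below; the element IDs and the returned value are illustrative assumptions.

```python
# Dash auto-refresh sketch: a dcc.Interval fires every 2 s and triggers a
# callback that re-renders the latest value. IDs here are hypothetical.
from dash import Dash, Input, Output, dcc, html

app = Dash(__name__)
app.layout = html.Div([
    html.H3(id="latest-reading"),
    dcc.Interval(id="refresh", interval=2000),  # milliseconds
])

@app.callback(Output("latest-reading", "children"), Input("refresh", "n_intervals"))
def update(_):
    # dashboard.py would read the newest entry from historical_data here.
    return "temperature: 24.3 °C"

if __name__ == "__main__":
    app.run(port=8050)
```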
10. Extend the System
**Add new sensors:**
1. Update arduino_code.ino with new sensor readings
2. Modify simple_producer.py to parse additional fields
3. Update Spark aggregations in spark_streaming.py (see the sketch after this list)
4. Add new LSTM models in simple_lstm.py for new metrics
5. Extend dashboard charts
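As an example of step 3, a new field could be carried through the streaming aggregation roughly as below; the "pressure" column, the schema, and the console sink are illustrative assumptions, while the topic name and 1-minute window follow the existing pipeline.

```python
# Hypothetical extension of the 1-minute aggregation for a new "pressure"
# field; spark_streaming.py is the authoritative version.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import DoubleType, StructField, StructType

spark = SparkSession.builder.appName("sensor-aggregation").getOrCreate()

schema = StructType([
    StructField("timestamp", DoubleType()),
    StructField("temperature", DoubleType()),
    StructField("humidity", DoubleType()),
    StructField("pressure", DoubleType()),  # the newly added metric
])

raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "localhost:9092")
       .option("subscribe", "sensor-data")
       .load())

parsed = (raw.selectExpr("CAST(value AS STRING) AS json")
          .select(F.from_json("json", schema).alias("r"))
          .select(F.col("r.timestamp").cast("timestamp").alias("ts"),
                  "r.temperature", "r.humidity", "r.pressure"))

per_minute = (parsed
              .withWatermark("ts", "2 minutes")
              .groupBy(F.window("ts", "1 minute"))
              .agg(F.avg("temperature").alias("avg_temp"),
                   F.avg("humidity").alias("avg_hum"),
                   F.avg("pressure").alias("avg_pressure")))  # new metric

per_minute.writeStream.outputMode("update").format("console").start().awaitTermination()
```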
**Change prediction intervals:**
- Edit the retraining interval in dashboard.py (default: 60 seconds)
- Adjust the LSTM sequence length in simple_lstm.py (default: 10)

**Store predictions:**
- Add a database connector in dashboard.py
- Save predictions to PostgreSQL/MongoDB
- Create historical prediction analysis views
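A persistence hook might look like the following sketch, assuming the psycopg2 package and a local PostgreSQL instance; the connection parameters, table, and column names are hypothetical.

```python
# Hypothetical prediction store: create a table once, then insert one row
# per prediction. Connection parameters and schema are illustrative.
import psycopg2

conn = psycopg2.connect("dbname=sensors user=postgres host=localhost")
with conn, conn.cursor() as cur:
    cur.execute(
        """CREATE TABLE IF NOT EXISTS predictions (
               ts        TIMESTAMPTZ DEFAULT now(),
               metric    TEXT NOT NULL,
               predicted REAL NOT NULL
           )"""
    )
    cur.execute(
        "INSERT INTO predictions (metric, predicted) VALUES (%s, %s)",
        ("temperature", 24.7),
    )
conn.close()
```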
Key Configuration Details
- **Kafka Topic**: `sensor-data` (KRaft mode, no Zookeeper)
- **Spark Version**: 3.4.1 with the Kafka 0-10 connector
- **LSTM Architecture**: Separate models, MinMaxScaler normalization, sliding window
- **Data Flow**: Arduino → Serial (2s) → Kafka → Spark (1-min aggregation) → LSTM → Dashboard
- **Dashboard Refresh**: 2 seconds
- **Model Retraining**: 1 minute (minimum 20 data points)

Important Notes
- The system requires Docker and Conda for full functionality
- Arduino integration is optional; a sample data generator is included
- The modern dashboard uses fixed Y-axis ranges to prevent visibility issues
- LSTM models train separately for temperature and humidity to avoid prediction conflicts
- Historical data is limited to the last 100 readings to prevent memory issues
- Suitable for educational/research purposes; production use requires additional error handling and persistence