Arduino Sensor Analytics & LSTM Prediction System
Real-time sensor data collection, streaming, analysis, and prediction system. Collects temperature and humidity readings from an Arduino UNO with a DHT11 sensor, streams them through Apache Kafka, analyzes them with Apache Spark, predicts future values using dual LSTM models, and visualizes everything in modern web dashboards.
System Architecture
```
Arduino UNO + DHT11 → Serial → Python Producer → Kafka → Spark Streaming → Analysis
                                                               ↓
                                     Dual LSTM Models ← Historical Data
                                             ↓
                          Web Dashboard (Real-time visualization)
```
Instructions
1. Understand the Project Structure
Before starting, familiarize yourself with the key components:
- **docker-compose.yml**: Kafka (KRaft mode) + Spark cluster infrastructure
- **environment.yml**: Conda Python 3.9 environment with TensorFlow 2.13.0 and Dash 2.14.1
- **simple_producer.py**: Kafka producer supporting sample data or real Arduino serial input (a minimal sketch follows this list)
- **spark-apps/spark_streaming.py**: Spark streaming job for hourly/minute aggregations
- **simple_lstm.py**: Dual LSTM models (separate for temperature and humidity)
- **dashboard.py**: Modern glassmorphism dashboard on port 8050
- **dashboard_legacy.py**: Simple minimal dashboard on port 8060
- **arduino_code.ino**: DHT11 sensor code for Arduino UNO (digital pin 2)
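For orientation, here is a minimal sketch of what the sample-data producer path might look like, assuming the kafka-python package and a broker on localhost:9092; see simple_producer.py for the actual implementation.

```python
# Minimal producer sketch (sample-data mode). Assumes the kafka-python
# package and a broker on localhost:9092; simple_producer.py may differ.
import json
import random
import time

from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

while True:
    reading = {
        "timestamp": time.time(),                              # epoch seconds
        "temperature": round(random.uniform(20.0, 30.0), 1),   # °C
        "humidity": round(random.uniform(40.0, 60.0), 1),      # % RH
    }
    producer.send("sensor-data", reading)  # topic from "Key Configuration Details"
    time.sleep(2)  # matches the 2-second cadence in the data-flow description
```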
2. Start the System
Use the provided scripts to launch all components:
```bash
# Start Docker (Kafka + Spark) and Conda environment
./start.sh
# In separate terminals:
./run_producer.sh # Terminal 1: Kafka producer
./run_spark.sh # Terminal 2: Spark streaming
./run_dashboard.sh # Terminal 3: Main dashboard
```
Or manually:
```bash
# Start infrastructure
docker-compose up -d
# Create (first time only) and activate the Conda environment
conda env create -f environment.yml
conda activate sensor-analytics
# Run components individually
python simple_producer.py
spark-submit --packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.4.1 spark-apps/spark_streaming.py
python dashboard.py
```
3. Access Web Interfaces
- **Main Dashboard**: http://localhost:8050 (modern design with glassmorphism)
- **Legacy Dashboard**: http://localhost:8060 (simple design)
- **Kafka UI**: http://localhost:8081 (topic/message inspection)
- **Spark UI**: http://localhost:8080 (job monitoring)
4. Configure Arduino Integration (Optional)
For real sensor data instead of sample data:
1. Upload **arduino_code.ino** to Arduino UNO with DHT11 sensor on digital pin 2
2. Find serial port:
```bash
# macOS
ls /dev/cu.*
# Linux
ls /dev/ttyUSB* /dev/ttyACM*
```
3. Edit **simple_producer.py** (a serial-reading sketch follows these steps):
```python
USE_SAMPLE_DATA = False
serial_port = '/dev/cu.usbserial-140' # Update with your port
```
4. Restart producer: `./run_producer.sh`
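For reference, the serial-reading path from step 3 might look like the sketch below. It assumes the pyserial package, 9600 baud, and a comma-separated "temperature,humidity" line format; the actual parsing in simple_producer.py may differ.

```python
# Serial-reading sketch. Assumes pyserial, 9600 baud, and lines of the form
# "temperature,humidity"; adjust the port and parsing to match your setup.
import serial

ser = serial.Serial("/dev/cu.usbserial-140", 9600, timeout=5)  # your port here

line = ser.readline().decode("ascii", errors="ignore").strip()
if line:
    temperature, humidity = (float(v) for v in line.split(","))
    print(f"temperature={temperature} °C, humidity={humidity} %")
```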
5. Monitor LSTM Model Training
The dashboard automatically retrains both LSTM models every minute:
- **Temperature LSTM**: Predicts the next minute's temperature
- **Humidity LSTM**: Predicts the next minute's humidity
- **Training Data**: Minute-averaged sensor readings (minimum 20 points)
- **Architecture**: Sequence length 10, 10 epochs, batch size 4
- **Performance**: MSE tracked separately for each model

Check the console logs for retraining status and MSE values.
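A minimal sketch of how one of the two per-metric models might be built and retrained, consistent with the parameters above and assuming scikit-learn is available for MinMaxScaler; simple_lstm.py is the authoritative implementation, and the layer size and stand-in data here are illustrative.

```python
# Sketch of one per-metric LSTM (sequence length 10, 10 epochs, batch size 4).
# The 32-unit layer and the random stand-in data are assumptions.
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from tensorflow.keras.layers import LSTM, Dense
from tensorflow.keras.models import Sequential

SEQ_LEN = 10

def make_sequences(series):
    """Build sliding-window (X, y) pairs from a 1-D scaled series."""
    X = [series[i:i + SEQ_LEN] for i in range(len(series) - SEQ_LEN)]
    y = series[SEQ_LEN:]
    return np.array(X)[..., np.newaxis], np.array(y)

readings = np.random.uniform(20, 30, size=60)  # stand-in for minute averages

scaler = MinMaxScaler()
scaled = scaler.fit_transform(readings.reshape(-1, 1)).ravel()
X, y = make_sequences(scaled)

model = Sequential([LSTM(32, input_shape=(SEQ_LEN, 1)), Dense(1)])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=10, batch_size=4, verbose=0)

# Predict the next minute from the latest window, then undo the scaling.
next_scaled = model.predict(scaled[-SEQ_LEN:].reshape(1, SEQ_LEN, 1), verbose=0)
print(scaler.inverse_transform(next_scaled)[0, 0])
```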
6. Debug Common Issues
**LSTM predictions showing 0.0 or fake values:**
- Ensure models are training with new data (check console logs)
- Verify separate temperature and humidity models are initialized
- Wait for at least 20 data points before retraining

**Graphs showing only one line:**
- Fixed Y-axis ranges prevent scaling issues
- Temperature: 15-35°C, Humidity: 25-85%
- Check that the `historical_data` list is populated

**Kafka connection errors:**
```bash
docker ps | grep kafka
docker logs kafka
```
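If the broker is up but messages are in doubt, a quick consumer check along these lines can help; it assumes the kafka-python package and the default localhost:9092 broker.

```python
# Quick Kafka sanity check: print a few raw messages from the sensor-data
# topic, or exit quietly if nothing arrives within 5 seconds.
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "sensor-data",
    bootstrap_servers="localhost:9092",
    auto_offset_reset="earliest",
    consumer_timeout_ms=5000,
)
for i, msg in enumerate(consumer):
    print(msg.value)
    if i >= 4:
        break
```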
**Serial port not found:**
- Reconnect the Arduino
- Check permissions: `sudo chmod 666 /dev/ttyUSB0`
7. Test Components Independently
```bash
# Test LSTM model standalone
python simple_lstm.py

# Test producer without Kafka
# (edit simple_producer.py to add print statements)

# Check Spark streaming output
# (view the terminal running run_spark.sh)
```
8. Stop the System
```bash
./stop.sh
# Or manually:
docker-compose down
conda deactivate
```
9. Customize Dashboard
**Modern Dashboard (dashboard.py):**
- Glassmorphism design with CSS Grid
- Fixed Y-axis ranges for visibility
- Auto-refresh every 2 seconds
- Gradient backgrounds

**Legacy Dashboard (dashboard_legacy.py):**
- Basic HTML tables
- Simple charts
- Minimal styling
- Port 8060

Modify the CSS in dashboard.py, or create new layouts by copying its structure.
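The 2-second auto-refresh in both dashboards follows the standard Dash polling pattern, roughly as sketched below; the element IDs and the returned value are illustrative assumptions.

```python
# Dash auto-refresh sketch: a dcc.Interval fires every 2 s and triggers a
# callback that re-renders the latest value. IDs here are hypothetical.
from dash import Dash, Input, Output, dcc, html

app = Dash(__name__)
app.layout = html.Div([
    html.H3(id="latest-reading"),
    dcc.Interval(id="refresh", interval=2000),  # milliseconds
])

@app.callback(Output("latest-reading", "children"), Input("refresh", "n_intervals"))
def update(_):
    # dashboard.py would read the newest entry from historical_data here.
    return "temperature: 24.3 °C"

if __name__ == "__main__":
    app.run(port=8050)
```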
10. Extend the System
**Add new sensors:**
1. Update arduino_code.ino with new sensor readings
2. Modify simple_producer.py to parse additional fields
3. Update Spark aggregations in spark_streaming.py (see the sketch after this list)
4. Add new LSTM models in simple_lstm.py for new metrics
5. Extend dashboard charts
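As an example of step 3, a new field could be carried through the streaming aggregation roughly as below; the "pressure" column, the schema, and the console sink are illustrative assumptions, while the topic name and 1-minute window follow the existing pipeline.

```python
# Hypothetical extension of the 1-minute aggregation for a new "pressure"
# field; spark_streaming.py is the authoritative version.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import DoubleType, StructField, StructType

spark = SparkSession.builder.appName("sensor-aggregation").getOrCreate()

schema = StructType([
    StructField("timestamp", DoubleType()),
    StructField("temperature", DoubleType()),
    StructField("humidity", DoubleType()),
    StructField("pressure", DoubleType()),  # the newly added metric
])

raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "localhost:9092")
       .option("subscribe", "sensor-data")
       .load())

parsed = (raw.selectExpr("CAST(value AS STRING) AS json")
          .select(F.from_json("json", schema).alias("r"))
          .select(F.col("r.timestamp").cast("timestamp").alias("ts"),
                  "r.temperature", "r.humidity", "r.pressure"))

per_minute = (parsed
              .withWatermark("ts", "2 minutes")
              .groupBy(F.window("ts", "1 minute"))
              .agg(F.avg("temperature").alias("avg_temp"),
                   F.avg("humidity").alias("avg_hum"),
                   F.avg("pressure").alias("avg_pressure")))  # new metric

per_minute.writeStream.outputMode("update").format("console").start().awaitTermination()
```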
**Change prediction intervals:**
- Edit the retraining interval in dashboard.py (default: 60 seconds)
- Adjust the LSTM sequence length in simple_lstm.py (default: 10)

**Store predictions:**
- Add a database connector in dashboard.py
- Save predictions to PostgreSQL/MongoDB
- Create historical prediction analysis views
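A persistence hook might look like the following sketch, assuming the psycopg2 package and a local PostgreSQL instance; the connection parameters, table, and column names are hypothetical.

```python
# Hypothetical prediction store: create a table once, then insert one row
# per prediction. Connection parameters and schema are illustrative.
import psycopg2

conn = psycopg2.connect("dbname=sensors user=postgres host=localhost")
with conn, conn.cursor() as cur:
    cur.execute(
        """CREATE TABLE IF NOT EXISTS predictions (
               ts        TIMESTAMPTZ DEFAULT now(),
               metric    TEXT NOT NULL,
               predicted REAL NOT NULL
           )"""
    )
    cur.execute(
        "INSERT INTO predictions (metric, predicted) VALUES (%s, %s)",
        ("temperature", 24.7),
    )
conn.close()
```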
Key Configuration Details
- **Kafka Topic**: `sensor-data` (KRaft mode, no Zookeeper)
- **Spark Version**: 3.4.1 with the Kafka 0-10 connector
- **LSTM Architecture**: Separate models, MinMaxScaler normalization, sliding window
- **Data Flow**: Arduino → Serial (2s) → Kafka → Spark (1-min aggregation) → LSTM → Dashboard
- **Dashboard Refresh**: 2 seconds
- **Model Retraining**: 1 minute (minimum 20 data points)

Important Notes
- The system requires Docker and Conda for full functionality
- Arduino integration is optional; a sample data generator is included
- The modern dashboard uses fixed Y-axis ranges to prevent visibility issues
- LSTM models train separately for temperature and humidity to avoid prediction conflicts
- Historical data is limited to the last 100 readings to prevent memory issues
- Suitable for educational/research purposes; production use requires additional error handling and persistence