# MNIST CUDA Neural Network
Build, train, and test a CUDA-accelerated multilayer perceptron for MNIST digit classification.
## What This Skill Does
This skill helps you work with a CUDA-accelerated neural network implementation for MNIST digit classification. The codebase includes:
- CUDA kernels for forward/backward passes
- MNIST IDX file loading and normalization
- ASCII art digit visualization
- Complete training pipeline with accuracy reporting
- Make-based build system for C and CUDA code

## Architecture Overview
**Network:** 784 (input) → 128 (hidden, ReLU) → 10 (output logits)
**Training Pipeline:**
```
MNIST IDX files → MNISTDataset → normalize_mnist() → NormalizedMNIST
↓
copy to GPU (d_input)
↓
forward() → compute_loss() → backward() → update_params()
```
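To make the pipeline concrete, here is a hedged sketch of one training epoch on the host side. The call names mirror the diagram above, but the signatures, the `count` field, and constants like `BATCH_SIZE` and `INPUT_DIM` are assumptions, not the repository's actual API:

```c
// Hypothetical glue code; the real loop lives in train.cu and may differ.
// Assumes device buffers d_input/d_labels were allocated elsewhere with
// CUDA_CHECK(cudaMalloc(...)).
void train_epoch(Model *model, TrainState *state, const NormalizedMNIST *norm,
                 float *d_input, unsigned char *d_labels, float lr) {
  int num_batches = norm->count / BATCH_SIZE; // `count` field is assumed
  for (int b = 0; b < num_batches; b++) {
    size_t px = (size_t)b * BATCH_SIZE * INPUT_DIM;
    CUDA_CHECK(cudaMemcpy(d_input, norm->pixels + px,
                          BATCH_SIZE * INPUT_DIM * sizeof(float),
                          cudaMemcpyHostToDevice));
    CUDA_CHECK(cudaMemcpy(d_labels, norm->labels + (size_t)b * BATCH_SIZE,
                          BATCH_SIZE * sizeof(*d_labels),
                          cudaMemcpyHostToDevice));
    forward(model, state, d_input);   // linear_relu_kernel + linear_kernel
    compute_loss(state, d_labels);    // softmax_cross_entropy_kernel
    backward(model, state, d_input);  // fills d_dW1/d_db1/d_dW2/d_db2
    update_params(model, state, lr);  // sgd_kernel on each parameter tensor
  }
}
```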
**Core Files:**
- `model.cu/h` - CUDA neural network implementation
- `train.cu` - Training program with GPU memory management
- `mnist.c/h` - IDX format data loader and normalizer
- `display.c/h` - ASCII visualization
- `bswap.c/h` - Big-endian byte swapping utilities

## Instructions
### 1. Verify Environment Setup
Check that CUDA is properly installed:
```bash
nvcc --version
```
If CUDA is not at `/usr/local/cuda`, set the `CUDA_PATH` environment variable:
```bash
export CUDA_PATH=/path/to/cuda
```
### 2. Verify MNIST Data Files
Ensure training and test data exist:
- `data/mnist/train-images.idx3-ubyte`
- `data/mnist/train-labels.idx1-ubyte`
- `data/mnist/t10k-images.idx3-ubyte`
- `data/mnist/t10k-labels.idx1-ubyte`

If any are missing, download them from the [MNIST Database](http://yann.lecun.com/exdb/mnist/).
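For background, IDX is a simple binary format: a big-endian header followed by raw bytes, which is why `bswap.c` exists. The image-file header is four big-endian 32-bit integers: magic `0x00000803`, image count, rows, and cols. A hedged sketch of reading it (the real loader is in `mnist.c`; these helper names are illustrative):

```c
#include <stdint.h>
#include <stdio.h>

// Illustrative big-endian read, analogous to what bswap.c provides.
static uint32_t read_be_u32(FILE *f) {
  uint8_t b[4];
  if (fread(b, 1, 4, f) != 4) return 0;
  return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
         ((uint32_t)b[2] << 8) | (uint32_t)b[3];
}

// Reads an idx3-ubyte image header; returns 0 on success.
// For MNIST: count = 60000 (train) or 10000 (test), rows = cols = 28.
int read_idx_image_header(FILE *f, uint32_t *count, uint32_t *rows,
                          uint32_t *cols) {
  if (read_be_u32(f) != 0x00000803) return -1; // image-file magic number
  *count = read_be_u32(f);
  *rows = read_be_u32(f);
  *cols = read_be_u32(f);
  return 0;
}
```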
### 3. Build Commands
**Build all targets:**
```bash
make all
```
**Build and run digit display:**
```bash
make display
```
**Build and run C tests:**
```bash
make test
```
**Build and run CUDA model tests:**
```bash
make test-cuda
```
**Train the model:**
```bash
make train
```
**Clean build artifacts:**
```bash
make clean
```
Build outputs go to the `bin/` directory.
### 4. Code Modification Guidelines
**Compilers:**
- C code: Clang with the C23 standard and strict warnings enabled
- CUDA code: NVCC with `-O2 -g` (optimized, with debug symbols)

**Memory Management Rules:**
- All `Model` and `TrainState` pointers use GPU device memory (prefix `d_`)
- The MNIST labels pointer is shared between `MNISTDataset` and `NormalizedMNIST`; free it only once
- Always wrap CUDA API calls in the `CUDA_CHECK` macro (see the sketch below)
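For reference, error-checking macros like `CUDA_CHECK` conventionally look something like this; the project's actual definition may differ in detail:

```c
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

// Conventional CUDA error-checking macro; the repository's actual
// definition may differ.
#define CUDA_CHECK(call)                                                  \
  do {                                                                    \
    cudaError_t err_ = (call);                                            \
    if (err_ != cudaSuccess) {                                            \
      fprintf(stderr, "CUDA error: %s at %s:%d\n",                        \
              cudaGetErrorString(err_), __FILE__, __LINE__);              \
      exit(EXIT_FAILURE);                                                 \
    }                                                                     \
  } while (0)
```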
**Code Style:**
- Formatting is enforced by `.clang-format` (LLVM style, 2-space indent)
- Run `make format` or ensure pre-commit hooks are active

### 5. Understanding CUDA Kernels
**Forward Pass:**
- `linear_relu_kernel` - Fused linear layer + ReLU for the hidden layer
- `linear_kernel` - Linear layer for the output logits
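As an illustration of the fused pattern, a minimal sketch follows; the actual kernel in `model.cu` may use a different layout or launch geometry, and all names and dimensions here are assumptions:

```cuda
// Sketch of a fused linear + ReLU kernel: out = max(0, x * W + b).
// One thread per (sample, output feature) pair; assumes row-major
// x [batch x in_dim] and W [in_dim x out_dim].
__global__ void linear_relu_sketch(const float *x, const float *W,
                                   const float *b, float *out,
                                   int batch, int in_dim, int out_dim) {
  int row = blockIdx.y * blockDim.y + threadIdx.y; // sample index
  int col = blockIdx.x * blockDim.x + threadIdx.x; // output feature
  if (row >= batch || col >= out_dim) return;
  float acc = b[col];
  for (int k = 0; k < in_dim; k++)
    acc += x[row * in_dim + k] * W[k * out_dim + col];
  out[row * out_dim + col] = fmaxf(acc, 0.0f); // fused ReLU
}
```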
**Loss & Gradients:**
- `softmax_cross_entropy_kernel` - Numerically stable fused softmax, loss, and gradient
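Numerical stability here means subtracting the per-row maximum before exponentiating, so `expf` never overflows; fusion means the gradient (`softmax - onehot`) comes out of the same pass. A hedged sketch, one thread per sample, with illustrative names:

```cuda
// Sketch of fused, numerically stable softmax + cross-entropy.
// losses[i] = -log softmax(logits[i])[labels[i]]
// dlogits   = (softmax - onehot) / batch  (mean-loss convention, assumed)
__global__ void softmax_xent_sketch(const float *logits,
                                    const unsigned char *labels,
                                    float *dlogits, float *losses,
                                    int batch, int classes) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i >= batch) return;
  const float *row = logits + i * classes;
  float m = row[0];
  for (int c = 1; c < classes; c++) m = fmaxf(m, row[c]); // row max
  float sum = 0.0f;
  for (int c = 0; c < classes; c++) sum += expf(row[c] - m);
  losses[i] = logf(sum) + m - row[labels[i]]; // -log p(label)
  for (int c = 0; c < classes; c++) {
    float p = expf(row[c] - m) / sum; // stable softmax probability
    dlogits[i * classes + c] = (p - (c == labels[i] ? 1.0f : 0.0f)) / batch;
  }
}
```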
**Backward Pass:**
- `matmul_at_b_kernel` - Weight gradients via C = A^T × B
- `bias_grad_kernel` - Bias gradients by summing over the batch
- `hidden_grad_kernel` - Backprop through layer 2 with the ReLU mask
**Optimization:**
- `sgd_kernel` - SGD update: `param -= lr * grad`
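The update is elementwise, which makes it the simplest kernel in the pipeline; a sketch with assumed names:

```cuda
// Sketch of an elementwise SGD step over a flattened parameter tensor.
__global__ void sgd_sketch(float *param, const float *grad, float lr, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) param[i] -= lr * grad[i];
}
```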
**Metrics:**
- `count_correct_kernel` - Accuracy via argmax comparison

### 6. Key Data Structures
**Model:**
- `d_W1`, `d_b1` - Hidden layer weights and biases (GPU)
- `d_W2`, `d_b2` - Output layer weights and biases (GPU)

**TrainState:**
- `d_hidden`, `d_logits` - Forward pass activations (GPU)
- `d_dW1`, `d_db1`, `d_dW2`, `d_db2` - Parameter gradients (GPU)
- `d_loss` - Scalar loss value (GPU)

**MNISTDataset:**
- `pixels` - Raw uint8 pixel data (0-255), CPU memory
- `labels` - Label array (0-9), CPU memory

**NormalizedMNIST:**
- `pixels` - Normalized float pixel data (0.0-1.0), CPU memory
- `labels` - Shared pointer from the source dataset
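Taken together, the GPU-side structs plausibly look like the following; the authoritative definitions are in `model.h`, so treat these field groupings as a sketch:

```c
// Plausible layout of the GPU-side structs; see model.h for the real ones.
typedef struct {
  float *d_W1, *d_b1; // hidden layer: 784x128 weights, 128 biases (device)
  float *d_W2, *d_b2; // output layer: 128x10 weights, 10 biases (device)
} Model;

typedef struct {
  float *d_hidden, *d_logits;           // forward activations (device)
  float *d_dW1, *d_db1, *d_dW2, *d_db2; // parameter gradients (device)
  float *d_loss;                        // scalar loss (device)
} TrainState;
```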
### 7. Current Implementation Status
**Implemented:**
- IDX data loading with big-endian support
- Xavier weight initialization
- Forward pass with ReLU activation
- Fused softmax + cross-entropy loss
- Full backward pass with gradient computation
- SGD parameter updates
- Accuracy computation
- Complete training loop

**Future Optimizations:**
- Shared memory tiling for matrix operations
- cuBLAS integration for GEMM operations
- Mixed-precision training

## Example Usage
**Train for 10 epochs and monitor accuracy:**
```bash
make train
```
**Visualize sample digits:**
```bash
make display
```
**Run test suite:**
```bash
make test && make test-cuda
```
## Important Notes
- Ensure `CUDA_PATH` is set if CUDA is not in `/usr/local/cuda`
- The network uses Xavier initialization for stable training
- Softmax + cross-entropy is fused for numerical stability
- All GPU memory uses device pointers prefixed with `d_`
- MNIST labels are shared between the raw and normalized datasets; free the pointer only once to avoid a double-free (see the teardown sketch below)
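Since both structs point at the same labels buffer, teardown has to free it exactly once. A minimal sketch, assuming helper and field names that may not match the repository:

```c
#include <stdlib.h>
#include "mnist.h" // repository header for MNISTDataset / NormalizedMNIST

// Hypothetical cleanup; the real teardown code in train.cu may differ.
// norm->labels aliases raw->labels, so the label buffer is freed once.
void free_datasets(MNISTDataset *raw, NormalizedMNIST *norm) {
  free(norm->pixels);  // normalized float copy, owned by norm
  free(raw->pixels);   // raw uint8 pixels, owned by raw
  free(raw->labels);   // shared with norm->labels -- free exactly once
  norm->labels = NULL; // clear the alias to avoid a dangling pointer
  raw->labels = NULL;
}
```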