Neural Network Fundamentals

Complete neural network math for chip designers. Perceptrons, layers, backpropagation, activation functions, and hardware implications.

By EcrioniX · Published June 13, 2026 · ~4800 words · 14 min read

1. The Perceptron Model

Foundation of all neural networks: a single processing unit that mimics a neuron.

Perceptron Output:
  output = activation(Σ(weight_i × input_i) + bias)

Example (3 inputs):
  z = w₁×x₁ + w₂×x₂ + w₃×x₃ + b
  output = ReLU(z) = max(0, z)

Hardware implication:
  - Multiply-accumulate (MAC): 3 multiplications + 3 additions (2 to sum the products, 1 for the bias)
  - Simple, parallelizable
  - When pipelined, can produce one result per cycle

Why it works:
  - Weights learn features (e.g., edge detection in images)
  - Bias adjusts threshold
  - Activation adds non-linearity (essential for learning)
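The operation count can be checked with a short sketch. The function name `perceptron` and the sample values are illustrative assumptions, not taken from any specific chip or framework. The sketch preloads the accumulator with the bias, which is one way hardware can fold the bias into the MAC loop: two additions sum the three products and the bias accounts for a third.

```python
def relu(z):
    # ReLU is a single comparison against zero
    return max(0.0, z)


def perceptron(inputs, weights, bias):
    """Return (output, multiplications, additions) for one ReLU perceptron."""
    acc = bias  # accumulator preloaded with the bias (assumed design choice)
    mults = 0
    adds = 0
    for x, w in zip(inputs, weights):
        acc += w * x  # one multiply-accumulate (MAC) step
        mults += 1
        adds += 1
    return relu(acc), mults, adds


# Hypothetical 3-input example: z = 0.5*1 + (-1)*2 + 0.25*3 + 1 = 0.25
out, mults, adds = perceptron([1.0, 2.0, 3.0], [0.5, -1.0, 0.25], 1.0)
print(out, mults, adds)
```

The loop body maps onto one MAC unit used three times, or three MAC units working in parallel followed by an adder tree.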

2. Multi-Layer Networks

Stacking perceptrons creates expressive models.

Neural Network Structure:
  Input Layer (28×28 = 784 pixels)
    ↓
  Hidden Layer 1 (128 neurons)
    ↓
  Hidden Layer 2 (64 neurons)
    ↓
  Output Layer (10 classes: digits 0-9)

Forward Pass:
  h₁ = ReLU(W₁ × x + b₁)       # 784×128 matrix multiply
  h₂ = ReLU(W₂ × h₁ + b₂)      # 128×64 matrix multiply
  y = softmax(W₃ × h₂ + b₃)    # 64×10 matrix multiply

Total MACs:
  784×128 + 128×64 + 64×10 = 109,184 ≈ 110,000 MACs per inference
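The MAC total for this network can be reproduced with a few lines. This is a minimal sketch; the helper name `dense_macs` is an assumption introduced here for illustration.

```python
def dense_macs(layer_sizes):
    """MACs per fully connected layer: inputs x outputs for each adjacent pair."""
    return [n_in * n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:])]


sizes = [784, 128, 64, 10]       # layer widths from the example network
per_layer = dense_macs(sizes)    # one entry per weight matrix
total_macs = sum(per_layer)
total_biases = sum(sizes[1:])    # one bias per neuron in each non-input layer

print(per_layer)     # [100352, 8192, 640]
print(total_macs)    # 109184
print(total_biases)  # 202
```

The first layer accounts for about 92% of the work, so in a network shaped like this the input layer dominates both compute and weight storage.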

3. Common Layer Types

Fully Connected (Dense)

Every input connected to every output. Most straightforward layer.

Convolutional (CNN)

Sliding window of weights across spatial input.
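A minimal sketch of the sliding window, assuming no padding and a stride of 1. The function name `conv2d_valid` and the toy image are illustrative. As in most deep learning frameworks, the kernel is not flipped, so this is strictly a cross-correlation.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1). Returns (output, MAC count)."""
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = []
    macs = 0
    for i in range(h - kh + 1):
        row = []
        for j in range(w - kw + 1):
            acc = 0
            # The same kh x kw weights are reused at every window position
            for di in range(kh):
                for dj in range(kw):
                    acc += image[i + di][j + dj] * kernel[di][dj]
                    macs += 1
            row.append(acc)
        out.append(row)
    return out, macs


# Toy 4x4 image with a vertical edge between columns 1 and 2
image = [[0, 0, 1, 1]] * 4
kernel = [[-1, 1], [-1, 1]]  # responds to left-to-right brightness increase
out, macs = conv2d_valid(image, kernel)
print(out)   # [[0, 2, 0], [0, 2, 0], [0, 2, 0]]
print(macs)  # 36 = 9 output positions x 4 weights
```

Only 4 weights are stored here, yet they drive 36 MACs. That weight reuse is what separates convolution from a dense layer in hardware terms.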

Recurrent (LSTM, GRU)

Sequential processing with hidden state (memory).
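The hidden-state idea can be sketched with a plain (Elman-style) recurrent step. This is a deliberate simplification: LSTM and GRU cells add gating on top of the same pattern. All names and weights below are hypothetical.

```python
import math


def rnn_step(x, h, w_x, w_h, b):
    """One recurrent step: h_new = tanh(W_x x + W_h h + b)."""
    h_new = []
    for i in range(len(h)):
        acc = b[i]
        acc += sum(w_x[i][j] * x[j] for j in range(len(x)))  # input contribution
        acc += sum(w_h[i][k] * h[k] for k in range(len(h)))  # memory contribution
        h_new.append(math.tanh(acc))
    return h_new


# Hypothetical weights: 1 input feature, 2 hidden units
w_x = [[0.5], [-0.5]]
w_h = [[0.9, 0.0], [0.0, 0.9]]
b = [0.0, 0.0]

h = [0.0, 0.0]
states = []
for x in ([1.0], [0.0], [0.0]):  # one pulse followed by silence
    h = rnn_step(x, h, w_x, w_h, b)  # step t needs the result of step t-1
    states.append(h)

print(states)
```

The hidden state stays non-zero after the input goes silent, which is the memory effect. Each step depends on the previous one, so time steps cannot run in parallel. Parallelism has to come from within a step or across a batch.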

Attention (Transformer)

Query, key, value dot products (matrix multiply again!).
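A minimal sketch of the attention computation. It uses the standard scaled dot-product form, where the 1/√d scaling is an addition not stated in the text above. The function names and toy matrices are illustrative assumptions.

```python
import math


def softmax(v):
    m = max(v)  # subtract the max for numerical stability
    e = [math.exp(x - m) for x in v]
    s = sum(e)
    return [x / s for x in e]


def attention(q_rows, k_rows, v_rows):
    """Scaled dot-product attention over lists of row vectors."""
    d = len(k_rows[0])
    out = []
    for q in q_rows:
        # First matrix multiply: query against every key
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in k_rows]
        weights = softmax(scores)
        # Second matrix multiply: weighted sum of the value rows
        out.append([sum(weights[t] * v_rows[t][c] for t in range(len(v_rows)))
                    for c in range(len(v_rows[0]))])
    return out


def attention_macs(n_q, n_k, d, d_v):
    """MACs for the two matrix multiplies (softmax not counted)."""
    return n_q * n_k * d + n_q * n_k * d_v


# Two identical keys get equal weight, so the output is the mean of the value rows
result = attention([[1.0, 0.0]], [[1.0, 0.0], [1.0, 0.0]], [[1.0, 2.0], [3.0, 4.0]])
print(result)
print(attention_macs(1, 2, 2, 2))  # 8
```

Both products scale with n_q × n_k, so the MAC count grows quadratically with sequence length when queries and keys come from the same sequence.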

4. Backpropagation (Training)

How weights are updated during training. Chips that target training, not just inference, need to support this.

Forward Pass:
  z = Wx + b
  a = ReLU(z)

Loss Computation:
  L = (a - target)²

Backward Pass (gradient computation):
  dL/da = 2(a - target)
  dL/dz = dL/da × ReLU'(z)
  dL/dW = dL/dz × x^T
  dL/db = dL/dz

Weight Update (Gradient Descent):
  W_new = W_old - learning_rate × dL/dW
  b_new = b_old - learning_rate × dL/db

Hardware: Forward + backward = 2-3x compute of inference alone
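The chain of gradients can be traced on a single neuron. This is a sketch with hypothetical weights, inputs, and learning rate. It follows the equations above term by term and takes ReLU'(z) as 1 for z > 0 and 0 otherwise.

```python
def forward(w, b, x):
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    a = max(0.0, z)  # ReLU
    return z, a


def loss(w, b, x, target):
    _, a = forward(w, b, x)
    return (a - target) ** 2


def backward(w, b, x, target):
    """Return (dL/dW, dL/db) for one sample."""
    z, a = forward(w, b, x)
    dl_da = 2.0 * (a - target)
    dl_dz = dl_da * (1.0 if z > 0 else 0.0)  # ReLU'(z)
    dl_dw = [dl_dz * xi for xi in x]         # dL/dz times the input
    dl_db = dl_dz
    return dl_dw, dl_db


def sgd_step(w, b, x, target, lr):
    dl_dw, dl_db = backward(w, b, x, target)
    w_new = [wi - lr * g for wi, g in zip(w, dl_dw)]
    b_new = b - lr * dl_db
    return w_new, b_new


# Hypothetical values: z = 0.25, a = 0.25, L = 0.5625
w, b = [0.5, -0.25], 0.25
x, target = [1.0, 2.0], 1.0

grad_w, grad_b = backward(w, b, x, target)
loss_before = loss(w, b, x, target)
w2, b2 = sgd_step(w, b, x, target, lr=0.1)
loss_after = loss(w2, b2, x, target)
print(grad_w, grad_b, loss_before, loss_after)
```

The backward pass needs `x` and `z` from the forward pass, so training hardware has to keep activations in memory until the gradients are computed. That storage cost comes on top of the extra compute.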

5. Activation Functions

Introduce non-linearity. Hardware must support these efficiently.

| Function | Formula | Hardware Cost | Use Case |
| --- | --- | --- | --- |
| ReLU | max(0, x) | 1 comparison | Hidden layers (most common) |
| Sigmoid | 1/(1+e^-x) | Expensive (exp) | Binary classification |
| Tanh | (e^x - e^-x)/(e^x + e^-x) | Expensive (exp) | RNN gates |
| Softmax | e^x_i / Σe^x_j | Expensive (exp, reduce) | Multi-class output |
| GELU | x × Φ(x) | Moderate (approx) | Transformers (modern) |

Hardware design tip: ReLU is nearly free (a single comparison against zero). Sigmoid, tanh, and softmax need exponential hardware or lookup tables, and GELU is usually approximated.
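The lookup-table option can be sketched as follows. The table size (256 entries), input range ([-8, 8]), and nearest-entry indexing are assumptions chosen for illustration. Real designs often add interpolation or use piecewise-linear segments instead.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


# Assumed table parameters (hypothetical, not from any specific chip)
LUT_SIZE = 256
X_MIN, X_MAX = -8.0, 8.0
STEP = (X_MAX - X_MIN) / (LUT_SIZE - 1)
LUT = [sigmoid(X_MIN + i * STEP) for i in range(LUT_SIZE)]  # filled once, offline


def sigmoid_lut(x):
    """Approximate sigmoid by the nearest table entry, saturating outside the range."""
    if x <= X_MIN:
        return LUT[0]
    if x >= X_MAX:
        return LUT[-1]
    idx = int(round((x - X_MIN) / STEP))
    return LUT[idx]


# Worst-case error over a sweep from -10 to 10 in steps of 0.01
samples = [i / 100.0 for i in range(-1000, 1001)]
max_err = max(abs(sigmoid(x) - sigmoid_lut(x)) for x in samples)
print(max_err)
```

At run time the lookup replaces the exponential and the division with one index computation and one memory read. The price is table storage and a bounded approximation error.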

6. Batch Processing

Processing multiple samples simultaneously (critical for throughput).

Single Sample:
  output = f(W × x + b)    # 1 sample
  Compute: K MACs (K = number of weights)

Batch of N Samples:
  Output = f(W × X + b)    # X is an input_dim×N matrix, one column per sample
  Compute: K × N MACs (same hardware, up to N times the throughput because each weight is reused N times)

Utilization:
  Single sample: Low (if hardware sized for batches)
  Batch of 32: Good
  Batch of 256: Excellent (memory bandwidth becomes the limit)

Chip design: Systolic array sized for batch processing
  Example: 256×256 array processes batch-of-16, 256-dimensional vectors
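The benefit of batching comes from weight reuse, which a small sketch can make concrete. The function names and toy values are illustrative assumptions. The loop order shown (fetch a weight once, apply it to every sample) is one possible schedule, not a description of any particular systolic array.

```python
def dense(w, x, b):
    """Single-sample dense layer with ReLU."""
    return [max(0.0, sum(wij * xj for wij, xj in zip(row, x)) + bi)
            for row, bi in zip(w, b)]


def dense_batch(w, samples, b):
    """Batched dense layer. Returns (outputs, MAC count, weight fetch count)."""
    n = len(samples)
    out = [[0.0] * len(w) for _ in range(n)]
    macs = 0
    weight_loads = 0
    for i, row in enumerate(w):
        for j, wij in enumerate(row):
            weight_loads += 1            # fetch W[i][j] once...
            for s in range(n):           # ...and reuse it for every sample
                out[s][i] += wij * samples[s][j]
                macs += 1
    for s in range(n):
        for i in range(len(w)):
            out[s][i] = max(0.0, out[s][i] + b[i])  # bias, then ReLU
    return out, macs, weight_loads


# Toy layer: 2 inputs, 2 outputs, batch of 2
w = [[1.0, 2.0], [3.0, 4.0]]
b = [0.0, -5.0]
samples = [[1.0, 1.0], [2.0, 0.0]]

out, macs, weight_loads = dense_batch(w, samples, b)
print(out)                   # [[3.0, 2.0], [2.0, 1.0]]
print(macs, weight_loads)    # 8 4
print(macs / weight_loads)   # 2.0 MACs per weight fetched (equals the batch size)
```

MACs per weight fetched equals the batch size, so a larger batch moves the layer from memory-bound toward compute-bound, until activation traffic or on-chip storage becomes the limit.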

7. Model Architectures

CNNs (Convolutional Neural Networks)

RNNs/LSTMs

Transformers

8. Hardware Implications

Next (Day 3): Inference architecture and data flow optimization.