
Design Decisions & Integration

Practical AI chip design decisions: architecture selection, technology choices, design flow, and system integration.

By EcrioniX · Published June 13, 2026 · ~3300 words · 9 min read

1. Technology Node Selection

Node	Year	Density	Cost/die	Use Case
5nm	2021+	Highest	High	High-perf (GPU, TPU)
7nm	2019+	High	Medium	Balanced (inference)
14nm	2015+	Medium	Low	Mobile, volume
28nm	2012+	Low	Very Low	Legacy, cost-critical

Trade-off: A more advanced node gives higher density and performance, but the mask set costs more, so it only pays off at high volume.

2. Architecture Decision Tree

Question 1: Use case?

Inference only → Simpler design (forward pass only), optimize latency/throughput
Training + inference → More complex, needs both forward and backward passes

Question 2: Performance target?

Latency-critical → Few large cores, high clock
Throughput-critical → Many smaller cores, lower clock

Question 3: Flexibility?

Specialized → Systolic array (fast, inflexible)
General-purpose → GPU-like (slower, flexible)
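
The three questions above can be written down as a small lookup. This is a minimal sketch that assumes each question has a simple yes/no answer; the names `Requirements` and `recommend` are hypothetical and only illustrate the tree, not a real sizing tool.

```python
from dataclasses import dataclass


@dataclass
class Requirements:
    """Answers to the three questions (hypothetical structure)."""
    needs_training: bool      # Question 1: training + inference, or inference only?
    latency_critical: bool    # Question 2: latency-critical or throughput-critical?
    needs_flexibility: bool   # Question 3: general-purpose or specialized?


def recommend(req: Requirements) -> dict:
    """Map the answers onto the options listed in the decision tree."""
    return {
        "datapath": "forward + backward" if req.needs_training else "forward only",
        "cores": ("few large cores, high clock" if req.latency_critical
                  else "many smaller cores, lower clock"),
        "compute": "GPU-like" if req.needs_flexibility else "systolic array",
    }


if __name__ == "__main__":
    # Example: a throughput-oriented, inference-only, specialized accelerator.
    print(recommend(Requirements(False, False, False)))
```

Real projects rarely get clean yes/no answers, so treat the output as a starting point for the architecture discussion rather than a final decision.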

3. Design Flow

Specification: Performance target, power budget, cost
Architecture: Core count, memory hierarchy, interconnect
RTL design: Implement in Verilog/SystemVerilog
Simulation: Verify functional correctness of the RTL against the specification
Synthesis: Convert RTL to a gate-level netlist under timing constraints
Physical design: Place & route, power/ground distribution
Sign-off: Static timing, DRC/LVS, power analysis
Tape-out: Send to fab
Post-silicon: Test, bring-up, production

4. Verification Strategy

Unit testing: Each module independently (MAC unit, cache, etc.)
Integration testing: Full chip with real models (ResNet, BERT)
Formal verification: Arithmetic correctness proofs
Timing closure: Static timing analysis (STA)
Power simulation: Worst-case power scenarios
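
Unit testing a MAC unit usually means comparing the hardware's output against a software reference ("golden") model on random and corner-case inputs. The sketch below assumes an INT8 × INT8 multiply with a wrapping 32-bit signed accumulator. `mac_dut_model` is a hypothetical stand-in for values that would come from an RTL simulation; here it is a shift-and-add model so the example runs on its own.

```python
import random


def to_int32(x: int) -> int:
    """Wrap a Python int to a signed 32-bit value (two's complement)."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x


def mac_reference(acc: int, a: int, b: int) -> int:
    """Golden model: acc + a*b with 32-bit wraparound."""
    return to_int32(acc + a * b)


def mac_dut_model(acc: int, a: int, b: int) -> int:
    """Stand-in for the design under test: shift-and-add multiplier."""
    negative = (a < 0) != (b < 0)
    ua, ub = abs(a), abs(b)
    product = 0
    for bit in range(8):              # |b| <= 128 fits in 8 bits
        if (ub >> bit) & 1:
            product += ua << bit
    if negative:
        product = -product
    return to_int32(acc + product)


def run_mac_tests(num_random: int = 1000, seed: int = 0) -> int:
    """Return the number of mismatches between DUT model and reference."""
    rng = random.Random(seed)
    int32_max, int32_min = 2**31 - 1, -2**31
    # Corner cases first: extreme operands and accumulator overflow.
    cases = [(0, -128, -128), (0, 127, 127), (int32_max, 127, 127),
             (int32_min, -128, 127), (0, 0, -128)]
    cases += [(rng.randint(int32_min, int32_max),
               rng.randint(-128, 127),
               rng.randint(-128, 127)) for _ in range(num_random)]
    return sum(mac_dut_model(*c) != mac_reference(*c) for c in cases)


if __name__ == "__main__":
    print("mismatches:", run_mac_tests())
```

The same pattern scales up to integration tests: run a layer or a whole model through the reference and compare the outputs against the chip simulation.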

5. Real-World Design Examples

Google TPU v4 Decisions

Node: 7nm
Architecture: 128×128 systolic arrays (MXUs), four per TensorCore, two TensorCores per chip (specialized)
Memory: 32GB HBM2 in-package, plus 128MB of on-chip CMEM
Power: ~170W typical, 192W max (datacenter power budget)
Design choice: Specialize for matrix multiply, accept inflexibility
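
The cost of specializing for matrix multiply can be shown with two back-of-the-envelope formulas: peak throughput of a systolic array, and how much of the array does useful work when a matrix does not tile evenly. This is a rough sketch; the function names and the 1 GHz clock are illustrative assumptions, not vendor figures.

```python
import math


def peak_tflops(rows: int, cols: int, clock_hz: float) -> float:
    """Peak throughput, assuming every processing element completes one
    multiply-accumulate (counted as 2 operations) per cycle."""
    return 2 * rows * cols * clock_hz / 1e12


def tile_utilization(m: int, n: int, rows: int, cols: int) -> float:
    """Fraction of PE slots holding real data when an m x n matrix is
    split into rows x cols tiles (edge tiles are padded)."""
    tiles = math.ceil(m / rows) * math.ceil(n / cols)
    return (m * n) / (tiles * rows * cols)


if __name__ == "__main__":
    # Hypothetical array size and clock, chosen only for illustration.
    print(peak_tflops(128, 128, 1e9))
    print(tile_utilization(192, 128, 128, 128))
```

A matrix whose dimensions are not multiples of the array size leaves part of the array idle, which is one concrete form the "accepted inflexibility" takes.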

NVIDIA H100 Decisions

Node: TSMC 4N (a custom 5nm-class process)
Architecture: GPU-like (many cores, tensor units)
Memory: 80GB HBM3 (large, flexible)
Power: 700W (higher than TPU)
Design choice: Flexibility over specialization (supports various models)

6. Cost Considerations

Mask cost (NRE): roughly $5M at 5nm, roughly $500K at 28nm

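Amortizes well only when volume exceeds ~100K units (at 100K units, a $5M mask set adds $50 per chip)
For low-volume projects (e.g. academia), use an FPGA or an older node such as 65nm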
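
The amortization argument is simple arithmetic: mask cost divided by volume is added to every chip. A minimal sketch using the mask figures quoted above; the function names are hypothetical, and packaging, test, and design labor are ignored.

```python
def nre_per_unit(mask_cost_usd: float, volume: int) -> float:
    """Mask (NRE) cost carried by each unit at a given production volume."""
    if volume <= 0:
        raise ValueError("volume must be positive")
    return mask_cost_usd / volume


def total_unit_cost(mask_cost_usd: float, die_cost_usd: float, volume: int) -> float:
    """Die cost plus the amortized share of the mask set."""
    return die_cost_usd + nre_per_unit(mask_cost_usd, volume)


# Mask costs taken from the text above.
MASK_COST_USD = {"5nm": 5_000_000, "28nm": 500_000}

if __name__ == "__main__":
    for volume in (1_000, 10_000, 100_000, 1_000_000):
        shares = {node: nre_per_unit(cost, volume)
                  for node, cost in MASK_COST_USD.items()}
        print(volume, shares)
```

At 1,000 units the 5nm mask set alone adds $5,000 per chip, which is why low-volume projects stay on FPGAs or older nodes.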

Die cost: Proportional to area and yield

256×256 systolic: ~100mm² at 5nm
Cost ~$50 per die (5nm, high yield)
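
"Proportional to area and yield" can be made concrete with a common dies-per-wafer approximation and a Poisson yield model. This is a sketch under stated assumptions: the wafer price and defect density in the usage example are hypothetical placeholders, and real foundry yield models are more detailed.

```python
import math


def dies_per_wafer(die_area_mm2: float, wafer_diameter_mm: float = 300.0) -> int:
    """Gross dies per wafer: wafer area / die area, minus an edge-loss term."""
    d, a = wafer_diameter_mm, die_area_mm2
    gross = math.pi * (d / 2) ** 2 / a - math.pi * d / math.sqrt(2 * a)
    return int(gross)


def poisson_yield(die_area_mm2: float, defects_per_cm2: float) -> float:
    """Poisson yield model: Y = exp(-A * D0), with A converted to cm^2."""
    return math.exp(-(die_area_mm2 / 100.0) * defects_per_cm2)


def die_cost(wafer_cost_usd: float, die_area_mm2: float,
             defects_per_cm2: float, wafer_diameter_mm: float = 300.0) -> float:
    """Wafer cost divided by the expected number of good dies."""
    good = dies_per_wafer(die_area_mm2, wafer_diameter_mm) * \
        poisson_yield(die_area_mm2, defects_per_cm2)
    return wafer_cost_usd / good


if __name__ == "__main__":
    # Hypothetical inputs: wafer price and defect density are assumptions.
    print(dies_per_wafer(100.0))
    print(die_cost(wafer_cost_usd=17_000, die_area_mm2=100.0, defects_per_cm2=0.1))
```

Because yield falls exponentially with area in this model, doubling the die area more than doubles the cost per good die.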

7. Design Checklist

✅ Define specs: Performance, power, cost targets
✅ Choose technology: Balance density, cost, schedule
✅ Architecture: Systolic, GPU-like, or hybrid?
✅ Memory hierarchy: On-chip SRAM, off-chip DRAM
✅ Precision support: FP32, BF16, INT8 combo
✅ Design flow: Tools (Synopsys, Cadence)
✅ Verification: Testbenches, formal proofs
✅ Timing/power: Closure and sign-off

Next (Day 15): System integration and production.