HomeDay 15

Real-World Case Studies

Lessons from production chip bugs and how better CDC design prevented them.

Case Study 1: The Missing Synchronizer

Bug: Engineer "optimized" away a synchronizer on an interrupt signal crossing from peripheral to CPU clock domain. Thought: "interrupt only changes once per second, we'll catch it in simulation."

Result: Rare metastability in production. Interrupt sometimes observed by CPU as 1 or 0 randomly. Field reports: system hangs every few hours.

Lesson: You cannot optimize away CDC synchronizers. Every async input needs 2+ FFs. No exceptions.

Case Study 2: Gray Code Misapplication

Bug: Binary counter pointer synced with Gray code across domains (correct). But the designer also applied Gray code to "optimize" a command encoding field (WRONG—Gray code only works for pointers).

Result: Commands randomly corrupted. Took weeks to isolate because tests passed 90% of the time.

Lesson: Gray code is only for counters/pointers. Use handshake or pulse synchronization for other data.

Case Study 3: Reset Never Synchronized

Bug: Global async reset applied directly to all clock domains (no synchronization). Led to metastability in secondary clock domains during power-up.

Result: Intermittent startup failures. Device worked 95% of the time, failed 5%. Extremely hard to reproduce and debug.

Lesson: Reset MUST synchronize to every clock domain. Use synchronizers, not just direct assertions.

Case Study 4: The Pulses That Collided

Bug: Pulse from domain A to domain B used simple 2-FF synchronizer (designed for levels, not pulses). If pulse occurred near synchronizer metastability window, it could be extended or lost.

Result: Transactions intermittently missed. Data loss in production.

Lesson: Use pulse synchronizers or stretchers, not 2-FF on pulses.

Key Lessons from Production

✅ CDC bugs are rare but catastrophic
✅ Simulation alone misses corner cases
✅ Always use formal CDC verification
✅ Document every cross-domain signal
✅ Review CDC before RTL freeze
✅ Train your team on CDC best practices
✅ CDC failures = field failures (can't be patched)

🎉 Congratulations!

You've completed the CDC Guide. You understand:

Next steps: Build a multi-clock design. Start with a simple async FIFO (Day 5), add a handshake protocol (Day 6), verify it with simulation and formal. That's where CDC mastery happens.

Clock domain crossing is hard, but now you know the rules. Every correct synchronizer protects billions of transistors. 🚀