VLSI Interview Questions – Google, Qualcomm, NVIDIA

01

Easy RTL Design

What is the difference between a latch and a flip-flop? When would you use each?

A latch is a level-sensitive storage element. When the enable (or clock) is HIGH, the output follows the input continuously — the latch is "transparent." When the enable goes LOW, the last value is held. A flip-flop is edge-triggered — it samples the input only at the exact rising (or falling) clock edge and ignores the input at all other times.

Why flip-flops dominate RTL design: Their predictable sampling window makes static timing analysis (STA) straightforward — setup and hold times are well-defined relative to one clock edge. Latches create "transparent windows" that make STA far more complex; timing tools must ensure that no combinational path through an open latch violates timing in any cycle.

When to use latches deliberately:

Power and area savings: A latch is roughly half of a master-slave flip-flop, so it presents less clock load and switches less internal capacitance per cycle, which lowers clock dynamic power and area.
High-performance pipelines: In "latch-based" designs (common in custom datapath and CPUs), a latch pair (master + slave) forms a pseudo-FF but allows time-borrowing — a slow first half-cycle can steal time from a fast second half-cycle, improving throughput.
Specialized cells: Sense amplifiers and certain memory cells use latch-based structures.

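Interview tip: Google interviewers often follow up: "What is an enable flip-flop and how is it synthesized?" Answer: it is a mux feeding the D input: D = en ? data_in : Q. Synthesis maps this to a recirculating mux in front of the flop or to a library enable-flop. When clock-gating insertion is turned on, the tool can instead replace the muxes of a register bank with a single integrated clock-gating (ICG) cell. In neither case is the storage element itself turned into a latch.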

02

Medium RTL Design

How do you design a clock divider by 3 with exactly 50% duty cycle?

Dividing by an odd number and achieving 50% duty cycle requires using both clock edges. A single-edge counter can only produce a 33%/67% duty cycle.

The technique: Create two signals derived from the same mod-3 counter — one toggled on the rising edge, one on the falling edge — then OR them together.

Use a 2-bit counter clocked on the rising edge counting 0→1→2→0. Generate out_r HIGH only while count == 0 and LOW while count is 1 or 2, so out_r is HIGH for one source period out of three.
Use the same counter logic clocked on the falling edge and generate out_f identically (equivalently, resample out_r in a falling-edge flip-flop). out_f is then a copy of out_r delayed by half a source period.
Final output = out_r OR out_f. Because out_r and out_f are offset by half a source clock period, their OR is HIGH for 1.5 periods and LOW for 1.5 periods out of every 3 source periods: a 50% duty cycle, provided the source clock itself has a 50% duty cycle.

Why this works: The rising-edge FF goes HIGH at the source rising edge that starts count 0, and the falling-edge FF goes HIGH half a period later. Each is HIGH for one source period, so they overlap for 0.5 period and together cover 1.5 periods. The combined waveform therefore has a transition every 1.5 source clock periods.
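The waveform argument can be checked with a small behavioral model. The sketch below is Python rather than RTL, steps time in half source periods, and models out_f as a falling-edge resample of out_r; the function name divide_by_3 is illustrative only.

```python
def divide_by_3(num_source_cycles):
    """Return clk_out sampled once per half source period."""
    count = 0   # mod-3 counter, updated on rising edges
    out_r = 0   # HIGH only while count == 0
    out_f = 0   # out_r resampled on the falling edge
    wave = []
    for half in range(2 * num_source_cycles):
        if half % 2 == 0:          # rising edge of the source clock
            count = (count + 1) % 3
            out_r = int(count == 0)
        else:                      # falling edge of the source clock
            out_f = out_r
        wave.append(out_r | out_f)
    return wave

wave = divide_by_3(12)
print("".join(str(bit) for bit in wave))
```

Once the counter has wrapped for the first time, the output repeats as three half-periods HIGH followed by three half-periods LOW, which is one output cycle per three source cycles.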

03

Medium RTL Design

How do you calculate the required depth of an asynchronous FIFO?

The FIFO must hold all data written during a burst before the reader catches up. The minimum depth is:

Depth ≥ (Write rate − Read rate) × Burst duration + Synchronizer latency guard

Breaking this down:

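Burst excess: If a writer sends at fw words per unit time for a burst of duration T, and the reader drains at fr words per unit time (both rates expressed against the same time base, not each side's own clock cycle), the net accumulation is (fw − fr) × T words. This is the minimum storage needed.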
Synchronizer latency: The gray-code read pointer takes 2–3 destination-clock cycles to propagate through the synchronizer. During this window, the write side may falsely see the FIFO as full (or the read side sees it as empty). Add 2–3 words of margin per side.
Round up to power of 2: Async FIFO address arithmetic requires a power-of-2 depth so that the gray code pointer MSB inversion trick for full/empty detection works correctly.

Example: Writer at 400 MHz (1 word/cycle), reader at 200 MHz (1 word/cycle), burst of 16 words. The burst lasts 16 write cycles (40 ns), during which the reader drains 8 words, so excess = (1 − 0.5) × 16 = 8 words. Adding 4 guard words gives 12; round up to 16.
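The same arithmetic as a small helper. This is a sketch that assumes one word per cycle on each side, a reader that drains continuously during the burst, and a guard of 4 words; the function name and its parameters are illustrative.

```python
def async_fifo_depth(burst_words, f_write_mhz, f_read_mhz, guard_words=4):
    """Return (excess, needed, power-of-2 depth) for one write burst."""
    # Words the reader drains while the burst is being written (integer math).
    drained = burst_words * f_read_mhz // f_write_mhz
    excess = max(0, burst_words - drained)
    needed = excess + guard_words
    depth = 1
    while depth < needed:          # round up to the next power of 2
        depth *= 2
    return excess, needed, depth

print(async_fifo_depth(16, 400, 200))
```

For the example above this yields an excess of 8 words, 12 words needed, and a depth of 16.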

04

Easy RTL Design

What is a glitch in combinational logic and how do you prevent it?

A glitch (or hazard) is a spurious, short-duration output pulse that occurs when an input change (or several near-simultaneous input changes) reaches a gate through paths of unequal delay. Even though the steady-state output is correct, the transient produces an unwanted transition.

Classic example: A static-0 hazard in a 2-input AND gate where both inputs come from the same signal A through paths with different delays: one direct, one through an inverter. Logically A AND NOT(A) = 0, but when A rises the direct input switches to 1 first, so the gate briefly sees 1 AND 1 = 1 until the inverted path catches up. (A static-1 hazard is the dual case: an output that should stay at 1 briefly dips to 0, as in A OR NOT(A).)
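A unit-delay model makes the transient visible. In this sketch the AND gate itself is ideal and only the inverter path is delayed; the function name and the delay of 2 time steps are assumptions chosen for illustration.

```python
def and_with_inverted_copy(a_wave, inv_delay):
    """Output of A AND NOT(A) when the inverted input lags by inv_delay steps."""
    out = []
    for t, a in enumerate(a_wave):
        # The inverter still sees the value A had inv_delay steps ago.
        a_seen_by_inverter = a_wave[t - inv_delay] if t >= inv_delay else a_wave[0]
        out.append(a & (1 - a_seen_by_inverter))
    return out

a_wave = [0] * 5 + [1] * 5          # A rises at t = 5
glitchy = and_with_inverted_copy(a_wave, inv_delay=2)
print(glitchy)
```

The output is 0 in steady state but pulses to 1 for exactly as long as the inverter delay after A rises; with zero inverter delay the pulse disappears.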

Why glitches matter:

Power waste: Every glitch is a switching event that consumes dynamic power (αCV²f). High-activity buses can waste significant power.
Clock path corruption: A glitch on a clock or enable line can clock a flip-flop at the wrong time, causing functional failure.
Latch transparency: Glitches on a latch enable propagate directly to the latch output while it is transparent.

Prevention:

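Register outputs: A flip-flop samples its input only at the clock edge, so any glitch that has settled before the setup window opens never reaches the registered output. STA guarantees this for timed paths regardless of how wide the glitch is.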
Hazard-free logic: In Karnaugh map minimization, add "consensus" prime implicant terms that cover the transition between any two adjacent groups — eliminates static hazards.
Clock gating cells (ICG): Use library clock gate cells that latch the enable on the clock LOW phase — ensures the gated clock output is always a complete pulse or no pulse at all.

05

Hard RTL Design

How do you implement a glitch-free clock multiplexer for two asynchronous clocks?

A naive assign clk_out = sel ? clk1 : clk0 will produce a glitch when sel changes — the output can get a truncated pulse from one clock or a merged pulse from both. This corrupts any flop clocked by clk_out.

The safe design uses an interlocked two-branch structure:

Each branch has a flip-flop clocked on the falling edge of its own clock to gate that branch on or off.
Branch 0 FF: D = !sel AND !en1_q, clocked on negedge clk0. Branch 1 FF: D = sel AND !en0_q, clocked on negedge clk1.
Each branch gates its own clock: clk0_g = clk0 AND en0_q, clk1_g = clk1 AND en1_q.
Output: clk_out = clk0_g OR clk1_g.

Why falling-edge clocking? Gating on the falling edge ensures that the enable change is captured while the clock is LOW, so the gated clock output is either a complete HIGH pulse or nothing — never a partial pulse.

Why the cross-interlocking? The !en1_q / !en0_q terms ensure only one branch is ever active at a time. The transition from clk0 to clk1 requires clk0's branch to deassert fully before clk1's branch asserts — preventing both from being active simultaneously.

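Caveat: Because sel and the cross-coupled enables are asynchronous to each branch clock, a production design places a synchronizing flip-flop ahead of each falling-edge flip-flop to contain metastability. The switch therefore takes a few cycles of both clocks to complete, and clk_out stays LOW during the handover. This is expected and acceptable, but never use this mux for cycle-accurate switching without accounting for the latency.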
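The difference between the naive mux and the interlocked structure can be seen in a discrete-time model. This is a simplified sketch: it ignores metastability and gate delays, uses assumed clock periods of 4 and 6 time steps, and switches sel at an arbitrary instant while clk0 is HIGH. All names are illustrative.

```python
def simulate_clock_mux(t_end=60, period0=4, period1=6, t_switch=9):
    """Return (naive, safe) clk_out waveforms, one sample per time step."""
    en0_q = en1_q = 0
    naive, safe = [], []
    prev = None
    for t in range(t_end):
        sel = int(t >= t_switch)
        clk0 = int(t % period0 < period0 // 2)
        clk1 = int(t % period1 < period1 // 2)
        if prev is not None:
            prev_sel, prev_clk0, prev_clk1 = prev
            old_en0, old_en1 = en0_q, en1_q
            if prev_clk0 and not clk0:      # negedge clk0: branch 0 flop
                en0_q = int((not prev_sel) and (not old_en1))
            if prev_clk1 and not clk1:      # negedge clk1: branch 1 flop
                en1_q = int(prev_sel and (not old_en0))
        naive.append(clk1 if sel else clk0)
        safe.append((clk0 & en0_q) | (clk1 & en1_q))
        prev = (sel, clk0, clk1)
    return naive, safe

def pulse_widths(wave):
    """Widths of completed HIGH pulses in a sampled waveform."""
    widths, run = [], 0
    for value in wave:
        if value:
            run += 1
        elif run:
            widths.append(run)
            run = 0
    return widths

naive, safe = simulate_clock_mux()
print("naive:", pulse_widths(naive))
print("safe: ", pulse_widths(safe))
```

In this model the naive mux emits a one-step runt pulse at the switch, while every pulse from the interlocked mux is a full HIGH phase of either clk0 (2 steps) or clk1 (3 steps).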

06

Easy Timing / STA

What are setup time and hold time? What happens when each is violated?

Setup time (t_su) is the minimum time the data input must be stable before the active clock edge for the flip-flop to reliably capture it. Hold time (t_h) is the minimum time the data must remain stable after the clock edge.

Together they define a "forbidden window" around the clock edge where data must not change.

Setup violation (data arrives too late): The flip-flop samples data before it has settled to a valid logic level. The FF may capture the wrong value or enter a metastable state. This is a functional failure at the target frequency — the design either works slowly or not at all. Setup violations are frequency-dependent: slow down the clock enough and they disappear.

Hold violation (data changes too soon after the clock edge): The flip-flop's captured value is overwritten before it is fully stored. This causes the FF to capture a corrupted value — either the new data that hasn't fully arrived, or garbage. Hold violations are frequency-independent — they occur even at 1 Hz and are caused by short combinational paths (fast data propagation relative to clock skew). They are the more dangerous class because no amount of slowing down the clock fixes them.

Key insight: Setup violations → fix by making data arrive earlier or giving more time (slow clock). Hold violations → fix by making data arrive later (insert delays), completely unrelated to clock frequency.

07

Medium Timing / STA

How do you fix a setup timing violation? How about a hold violation? Are the fixes different?

Yes. The fixes are completely different, and in several cases opposite, so they must not be confused.

Fixing a setup violation (data arrives too late — reduce data path delay):

Replace slow cells with faster variants: upsize weak cells to a higher drive strength, or swap high-Vt cells for low-Vt ones (higher-Vt cells are slower but leak less)
Reduce logic depth — restructure combinational logic to fewer gate stages
Use retiming — move registers across combinational logic to balance stages
Add pipeline registers to split a long path into two shorter ones
Optimize clock skew — use positive skew on the capture FF (delay the capture clock) to give the data path more time
Last resort: reduce the clock frequency

Fixing a hold violation (data arrives too early — increase minimum data path delay):

Insert delay buffers (DEL cells from the standard cell library) on the short data path
Swap cells on the short path for slower variants (lower drive strength or higher Vt), which adds delay without adding cells
Add logic stages that cancel each other (insert an even number of inverters)
Adjust clock skew: negative skew on the capture FF (advance the capture clock) moves the hold requirement earlier and so relaxes the hold check, at the cost of setup margin on the same path

Critical rule: When fixing hold violations by adding buffers, always re-check setup slack — the added delay consumes setup margin. Both must pass simultaneously.

08

Medium Timing / STA

What is the difference between clock skew and clock jitter? How does each affect timing?

Clock skew is a static, deterministic difference in clock arrival time between two flip-flops on the same chip. It is fixed for a given netlist and process corner. Skew arises from different buffer depths or wire lengths in the clock tree.

Clock jitter is a dynamic, cycle-to-cycle variation in the clock edge position. It is random (caused by power supply noise, substrate coupling, PLL VCO noise) and varies every cycle. You cannot predict the sign or magnitude of jitter in any given cycle.

Effect on timing:

Skew and setup: Positive skew (capture FF clock arrives later) helps setup — the data has more time to travel. Negative skew (capture clock earlier) hurts setup.
Skew and hold: Positive skew hurts hold — data launched early from the launch FF might arrive at the capture FF before its (delayed) clock edge. Negative skew helps hold.
Jitter and both: Jitter degrades both setup and hold because you can't know which direction the edge will shift. STA tools add a "clock uncertainty" (a worst-case jitter margin) that reduces both setup and hold slack. It cannot be recovered through skew optimization.

Rule of thumb: Skew is a tool — you can use it (via clock tree tuning) to deliberately help tight paths. Jitter is noise — you can only characterize it and budget for it.

09

Hard Timing / STA

Walk through a complete flip-flop to flip-flop timing path analysis. What is slack?

For a path from a launch flip-flop (FF1) to a capture flip-flop (FF2):

Setup check (data must arrive before the next capture edge, one clock period after launch):

Data arrival time = T_clk_launch + T_cq(FF1) + T_comb
Data required time = T_period + T_clk_capture − T_setup(FF2)
Setup slack = Required − Arrival = (T_period + T_clk_capture − T_setup) − (T_clk_launch + T_cq + T_comb)

Hold check (data must not arrive too early; checked against the same clock edge that launched it):

Data must arrive after: T_clk_capture + T_hold(FF2)
Hold slack = Arrival − Hold_required = (T_clk_launch + T_cq_min + T_comb_min) − (T_clk_capture + T_hold)

Where:

T_period = clock period
T_clk_launch / T_clk_capture = clock network delay from the clock source to FF1 / FF2
T_cq = clock-to-Q propagation delay of the launch FF (maximum for setup, minimum for hold)
T_comb = total combinational path delay (sum of gate + wire delays; maximum for setup, minimum for hold)
T_setup / T_hold = FF timing constraints from the cell library
T_clk_capture − T_clk_launch = clock skew (positive = capture is later)

Slack = margin above the requirement. Positive slack → timing met. Negative slack → timing violated. The most negative slack in the design = worst negative slack (WNS); summing all negative slacks = total negative slack (TNS).
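A small calculator for both checks. It is a sketch with made-up delay values in picoseconds; the setup check adds the clock period to the capture edge because the capturing edge is one cycle after the launching edge, while the hold check compares against the same edge.

```python
def setup_slack(t_period, t_clk_launch, t_clk_capture, t_cq_max, t_comb_max, t_setup):
    arrival = t_clk_launch + t_cq_max + t_comb_max
    required = t_period + t_clk_capture - t_setup
    return required - arrival

def hold_slack(t_clk_launch, t_clk_capture, t_cq_min, t_comb_min, t_hold):
    arrival = t_clk_launch + t_cq_min + t_comb_min
    required = t_clk_capture + t_hold
    return arrival - required

# Hypothetical path, all values in ps: 1 GHz clock, 50 ps of positive skew.
su = setup_slack(t_period=1000, t_clk_launch=100, t_clk_capture=150,
                 t_cq_max=80, t_comb_max=700, t_setup=50)
ho = hold_slack(t_clk_launch=100, t_clk_capture=150,
                t_cq_min=60, t_comb_min=20, t_hold=40)
print("setup slack:", su, "ps  hold slack:", ho, "ps")
```

With these numbers setup passes with 220 ps of slack and hold fails by 10 ps, which illustrates how positive skew helps setup and hurts hold on the same path.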

STA tools use MMMC: Multi-mode, multi-corner analysis — checking setup at slow-corner (slow cells, high temp, low Vdd) and hold at fast-corner (fast cells, low temp, high Vdd) simultaneously, since those are the worst cases for each check.

10

Hard Timing / STA

After place-and-route, a flip-flop path has both a setup AND a hold violation. How do you approach this?

A simultaneous setup and hold violation on the same path means the combinational delay window is too narrow — the path is fast enough to threaten hold, but not fast enough to meet setup. This typically means the logic between two FFs is very shallow (perhaps only a wire or one gate), but there is also a clock tree imbalance creating large skew.

Diagnosis first:

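Check the clock skew between launch and capture FFs in every analysis corner. Positive skew (late capture clock) hurts hold, and negative skew (early capture clock) hurts setup. Since setup is checked at the slow corner and hold at the fast corner, a clock tree whose skew changes sign or size across corners can hurt both checks on the same FF pair.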
Look at the path's actual combinational depth — very few gates means it's a structurally short path.

Fix strategy:

Rebalance the clock tree first: Reduce skew between these two FFs. This is the most targeted fix — less skew directly improves both simultaneously.
Insert delay cells: Add buffers on the data path to increase minimum delay (fix hold). Then verify setup is still met — if setup is tight, you may need to also optimize the logic depth.
Restructure logic: If setup is violated because the path is in a long combinational chain overall, pipelining it (adding an intermediate register) can help. But this changes the design architecture.

At advanced nodes (7nm, 5nm): This scenario is common because cells are extremely fast and clock skew control is harder with dense routing. PDKs provide dedicated delay cells (e.g., DLY4, BUF_DEL) tuned specifically for hold fixing without wasting area.

11

Medium CDC

What is metastability? Can it be completely eliminated?

Metastability occurs when a flip-flop's setup or hold time is violated — the flip-flop enters a metastable state where its output is neither a valid logic 0 nor a valid logic 1. The internal node of the FF is stuck near the switching threshold (V_DD/2) and takes an unpredictable time to resolve to a valid level.

The physics: A flip-flop is a bistable element with two stable equilibria (0 and 1) and one unstable equilibrium (the metastable point). When forced into the unstable point, it resolves exponentially fast — but how long it takes is governed by thermal noise and is therefore random.

Can it be eliminated? No — not completely. Any time asynchronous data crosses a clock boundary, there is a non-zero probability of violating the setup/hold window. The probability of remaining metastable beyond a time T_r decreases exponentially with T_r, but never reaches exactly zero.

What we do instead: We manage the probability using synchronizers. The key metric is MTBF (Mean Time Between Failures). A 2-flop synchronizer gives the metastable FF almost one full clock period to resolve. With τ ≈ 30ps and a 1 GHz clock, that is about 30 time constants, which for a slowly toggling control signal puts the MTBF in the range of thousands of years.

Critical: A metastable output propagating into combinational logic is dangerous — the intermediate voltage can cause multiple gates to output contradictory values, causing unpredictable circuit-wide failures. Always ensure metastability resolves before the signal fans out.

12

Medium CDC

What is a 2-flop synchronizer and why exactly 2 flops? Why not 1 or 3?

A 2-flop synchronizer consists of two back-to-back flip-flops, both clocked by the destination domain clock, inserted on a signal crossing from another clock domain.

How it works: The first FF may go metastable when it samples the asynchronous input. It has one full clock period (minus the FF's own propagation delay and the second FF's setup time) to resolve. Because metastability resolution time is exponential, the probability that it remains metastable long enough to corrupt the second FF is extremely small.

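Why not 1 flop? With a single flop, its output feeds destination logic directly, so the time available for resolution is only whatever slack that logic path leaves, which can be close to zero. Placing a second flop immediately after the first, with no logic in between, gives the first flop roughly T_clk − T_cq − T_setup2 to resolve: most of the 1000ps period at 1 GHz.

Why not 3 flops? Three flops are often unnecessary. With a 1 GHz destination clock, τ ≈ 30ps, and an assumed T_cq + T_setup2 ≈ 100ps, a 2-flop synchronizer gives:

Resolution time T_r ≈ 1000ps − 100ps = 900ps, so T_r/τ ≈ 30
MTBF = e^(T_r/τ) / (f_c × f_d × t_w) ≈ e^30 / (10^9 × f_d × t_w), with e^30 ≈ 10^13. For an assumed t_w ≈ 20ps and a control signal toggling at about 1 kHz, this is roughly 5×10^11 s, or about 17,000 years; MTBF falls in proportion as the toggle rate rises.

A third flop adds another clock period to T_r and multiplies MTBF by about e^(T_clk/τ). Add it when the 2-flop figure misses the target: at very high destination clock rates, for fast-toggling inputs, or in safety-critical applications (automotive ASIL-D, aerospace) where the required MTBF must be demonstrated with margin.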

13

Medium CDC

Why are Gray code counters used for asynchronous FIFO read/write pointers?

In an async FIFO, the read pointer lives in the read clock domain and the write pointer lives in the write clock domain. Each pointer must be compared against the other's synchronized version to determine full or empty.

The problem with binary counters: When a binary counter increments, multiple bits change simultaneously. For example, 0111 → 1000 changes all 4 bits. If you sample a binary counter while it's transitioning, you might read any of the 16 possible values — a catastrophic error that could falsely declare the FIFO full or empty, corrupting data.

Why Gray code solves this: A Gray code changes exactly one bit per count. When the pointer transitions from count N to N+1, only one bit flips. If the synchronized copy is sampled mid-transition, the worst case is that it sees either count N or count N+1 — off by at most one.

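Why a stale pointer is safe: The synchronized pointer is always a value the counter really held, just possibly one or more counts out of date. A stale read pointer makes the write side think the FIFO is fuller than it is, and a stale write pointer makes the read side think it is emptier than it is. Full and empty may therefore be asserted early or released late, which costs a little throughput but never causes overflow or underflow.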

Implementation note: The Gray code conversion is simple: gray = bin XOR (bin >> 1). Store the Gray counter as the pointer, convert back to binary (requires a loop of XORs) only if you need the absolute address for memory indexing.
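Both conversions, and the one-bit-per-step property, can be checked in a few lines. This is a Python sketch of the arithmetic only, not RTL; the function names are illustrative.

```python
def bin_to_gray(value):
    return value ^ (value >> 1)

def gray_to_bin(gray):
    # XOR together gray, gray >> 1, gray >> 2, ... until nothing is left.
    value = 0
    while gray:
        value ^= gray
        gray >>= 1
    return value

def bits_changed(a, b):
    return bin(a ^ b).count("1")

WIDTH = 4
codes = [bin_to_gray(i) for i in range(1 << WIDTH)]
print([format(code, "04b") for code in codes])
```

For a power-of-2 count range, consecutive codes differ in exactly one bit, including the wrap from the last code back to the first, whereas the binary step 0111 to 1000 flips four bits.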

14

Hard CDC

What is MTBF in the context of synchronizers and what parameters influence it?

MTBF (Mean Time Between Failures) quantifies how often a synchronizer is expected to allow a metastable signal to propagate into the destination domain. The standard formula is:

MTBF = e^(T_r / τ) / (f_c × f_d × t_w)

Where:

T_r — resolution time available for metastability to resolve (≈ T_clk − T_cq_ff1 − T_su_ff2). The more time, the exponentially higher the MTBF.
τ — technology metastability time constant. A smaller τ means the FF resolves faster, improving MTBF. Scales with process: ~100ps at 180nm, ~30ps at 7nm.
f_c — destination clock frequency. Higher frequency = more sampling opportunities per second = more chances for metastability to cause failure.
f_d — data toggle rate. How often does the incoming signal change near the clock edge?
t_w — the setup+hold window width. Narrower window = smaller probability of entering metastability per clock cycle.

The exponential dependence on T_r/τ is why adding a second synchronizer flip-flop dramatically improves MTBF — it adds one full clock period to T_r.

Practical target: Design teams typically target MTBF > 10,000 years per synchronizer. Whether a 2-FF synchronizer at 1 GHz meets this depends on τ, t_w, and the data toggle rate, so run the numbers for the actual library. If the target is missed, either add a third FF or investigate whether the signal can be designed to toggle only when safely away from clock edges.
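The formula is easy to evaluate for a given library. In the sketch below every numeric input is an assumed placeholder (900 ps of resolution time, τ = 30 ps, t_w = 20 ps, a 1 kHz toggle rate), not characterized silicon data.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 3600

def mtbf_seconds(t_r, tau, f_c, f_d, t_w):
    """MTBF = e^(T_r / tau) / (f_c * f_d * t_w), all inputs in SI units."""
    return math.exp(t_r / tau) / (f_c * f_d * t_w)

t_clk = 1e-9                      # 1 GHz destination clock
two_flop = mtbf_seconds(t_r=900e-12, tau=30e-12, f_c=1e9, f_d=1e3, t_w=20e-12)
three_flop = mtbf_seconds(t_r=900e-12 + t_clk, tau=30e-12, f_c=1e9, f_d=1e3, t_w=20e-12)

print("2-flop MTBF (years):", two_flop / SECONDS_PER_YEAR)
print("gain from a third flop:", three_flop / two_flop)
```

Because T_r sits in the exponent, the extra clock period contributed by a third flop multiplies MTBF by e^(T_clk/τ), while changes in f_d or t_w only scale it linearly.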

15

Easy Low Power

What is the difference between dynamic power and static (leakage) power? How do you reduce each?

Dynamic power = α × C_L × V_DD² × f — consumed when a node switches from 0 to 1 (charges the load capacitance) or 1 to 0 (discharges through the pull-down network). α is the activity factor (fraction of clock cycles the node switches). This power is zero when the circuit is idle.

Static (leakage) power = I_leakage × V_DD — consumed even when no gates are switching, due to sub-threshold current, gate oxide tunneling, and junction leakage. It does not depend on frequency and is present whenever power is supplied.

Historical trend: In nodes above 90nm, dynamic power dominated. Below 28nm and especially at 7nm/5nm, leakage has grown dramatically because transistors cannot be switched fully off at low supply voltages. Modern SoCs spend significant area on leakage management.

Reduction techniques:

Dynamic: Clock gating (reduce α), operand isolation (prevent toggling of datapath), voltage scaling (V² dependence), frequency scaling, low-swing signaling
Static: Power gating (cut V_DD to an entire domain via header/footer cells), multi-Vt design (use High-Vt cells in non-critical paths — slower but much lower leakage), reverse body biasing, state retention during power down

16

Medium Low Power

How is clock gating implemented correctly? Why can't you just AND the clock with an enable signal?

Clock gating removes the clock signal from a register bank when its value won't change, eliminating the switching power of the flip-flop clock pins and of the clock-tree branch below the gate. The clock network can account for 30–40% of total chip dynamic power, making clock gating one of the highest-impact power techniques.

Why you cannot simply write assign gated_clk = clk AND enable:

If enable changes while clk is HIGH, the AND gate output glitches: it produces a truncated clock pulse narrower than the normal HIGH phase. This runt pulse can violate the minimum pulse width and setup/hold requirements of any flip-flop it clocks, corrupting stored data or causing metastability.

The correct implementation — Integrated Clock Gating (ICG) cell:

A latch samples the enable signal on the LOW phase of the clock (when clock = 0)
The latched enable is then ANDed with the clock
Because the latch captures enable only when the clock is LOW, by the time the clock rises, the latch output is stable — the AND gate sees a stable enable and a clean rising edge → full clock pulse or no pulse, never a partial one
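The effect of the latch can be reproduced in a discrete-time model. This sketch assumes a clock that is HIGH for 5 of every 10 time steps and an enable that rises at step 2, in the middle of a HIGH phase; names and timings are illustrative.

```python
def simulate_gating(enable_at, t_end=40, period=10):
    """Return (naive AND gate, latch-based ICG) gated-clock waveforms."""
    naive, icg = [], []
    latched_en = 0
    for t in range(t_end):
        clk = int(t % period < period // 2)
        enable = int(t >= enable_at)
        if not clk:
            latched_en = enable        # latch is transparent only while clk is LOW
        naive.append(clk & enable)
        icg.append(clk & latched_en)
    return naive, icg

def pulse_widths(wave):
    """Widths of completed HIGH pulses in a sampled waveform."""
    widths, run = [], 0
    for value in wave:
        if value:
            run += 1
        elif run:
            widths.append(run)
            run = 0
    return widths

naive, icg = simulate_gating(enable_at=2)
print("naive:", pulse_widths(naive))
print("icg:  ", pulse_widths(icg))
```

The naive gate emits a 3-step runt pulse as its first pulse, whereas the latch-based version waits for the next LOW phase and then passes only full 5-step pulses.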

In RTL, you write: if (enable) register <= data; and, with clock-gating insertion enabled, the synthesis tool infers an ICG cell for sufficiently wide register banks. Avoid hand-writing gate-level clock gating in RTL; let the tool use the optimized library ICG cell.

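Interview follow-up: "What is operand isolation?" It holds the inputs of a datapath block (for example a multiplier) at a constant value whenever the block's result is not needed, so the combinational logic inside the block does not toggle. Clock gating alone does not achieve this, because the block's inputs can keep changing even while its output register is gated.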

17

Easy Verification / DFT

What is a scan chain and why is it used in Design for Test (DFT)?

A scan chain connects the flip-flops in a design into a long shift register that can be controlled and observed from the chip's I/O pins, purely for testing purposes.

How it works: Each flip-flop in the design is replaced with a scan flip-flop — identical to a normal FF but with an extra 2:1 mux at the data input:

In functional mode (scan_enable = 0): the mux passes normal D input — the design operates as designed.
In scan mode (scan_enable = 1): the mux passes the previous FF's output — all FFs form a shift register. You can shift in a test pattern, capture one functional clock cycle, and shift out the results for comparison.
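The shift, capture, shift sequence can be modeled for a three-flop chain. This is a sketch: the next-state function in capture() is made up purely to have something to observe, and flop 0 is taken as the scan-in end.

```python
def scan_shift(flops, scan_in_bits):
    """Scan mode: one bit enters flop 0 per clock, the last flop drives scan_out."""
    flops = list(flops)
    scan_out = []
    for bit in scan_in_bits:
        scan_out.append(flops[-1])
        flops = [bit] + flops[:-1]
    return flops, scan_out

def capture(flops):
    """Functional mode for one clock, using a hypothetical next-state function."""
    s0, s1, s2 = flops
    return [s2, s0 ^ s1, s0 & s1]

loaded, _ = scan_shift([0, 0, 0], [0, 1, 1])    # first bit in ends up in the last flop
captured = capture(loaded)                      # one functional clock
_, response = scan_shift(captured, [0, 0, 0])   # unload while shifting in zeros
print(loaded, captured, response)
```

The response comes out last flop first, so the tester compares it against the expected captured state in reverse order.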

Why it's essential: Without scan, testing whether a stuck-at fault (wire permanently stuck at 0 or 1) exists deep in the chip requires applying just the right sequence of primary input patterns — combinatorially explosive. With scan, an ATPG (Automatic Test Pattern Generation) tool can directly control any FF's state and observe any FF's captured output, enabling near-100% stuck-at fault coverage with a manageable number of test vectors.

Test flow at production: After packaging, every chip is tested on an ATE (Automated Test Equipment). The scan chain shifts in millions of vectors and compares the shifted-out responses against the fault-free model. Any mismatch → chip fails and is discarded.

DFT also covers: BIST (Built-In Self Test) for memories, boundary scan (JTAG IEEE 1149.1) for board-level interconnect testing, and scan compression (EDT, WBC) to reduce test data volume and test time while maintaining coverage.

18

Easy Protocols

Explain the AXI4 VALID/READY handshake mechanism. What rule must never be broken?

Every AXI4 channel (AW, W, B, AR, R) uses a two-signal handshake: VALID (driven by the sender) and READY (driven by the receiver). A transfer occurs on the rising clock edge when both VALID and READY are simultaneously HIGH.

Rules:

The sender asserts VALID when it has valid data/address to send and must not deassert VALID until the transfer completes (both signals HIGH on a clock edge).
The receiver asserts READY when it can accept data. READY may be HIGH before VALID (pre-ready) — this is fine.
If VALID is asserted and READY is LOW, both sides wait. Neither can "cancel" the transaction by deasserting VALID without completing the handshake.
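A cycle-by-cycle model of one channel shows these rules in action. In this sketch the sender raises VALID whenever it has data and never looks at READY, and the receiver's READY pattern is an arbitrary assumption; the function name is illustrative.

```python
def run_handshake(data, ready_pattern):
    """Return (received items, VALID value per cycle) for one channel."""
    received = []
    valid_trace = []
    index = 0
    for ready in ready_pattern:
        valid = index < len(data)      # VALID does not depend on READY
        valid_trace.append(int(valid))
        if valid and ready:            # transfer happens on this clock edge
            received.append(data[index])
            index += 1
    return received, valid_trace

received, valid_trace = run_handshake([10, 20, 30], [0, 0, 1, 0, 1, 1, 0])
print(received, valid_trace)
```

VALID stays HIGH through every stalled cycle and drops only after the last item has been accepted, and the items arrive in order regardless of how READY toggles.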

The rule that must never be broken: the sender must not wait for READY before asserting VALID, which also rules out any combinational path from READY to VALID. If the sender only asserts VALID after it sees READY, and the receiver only asserts READY after it sees VALID, the result is a deadlock in which neither ever fires first. READY is allowed to depend on VALID, but not vice versa.

AXI4 has 5 independent channels, enabling key features: a new write address (AW) can be accepted while write data (W) from a previous burst is still in-flight. Read and write transactions are completely independent, maximizing bus utilization.

19

Hard Protocols

How does AXI4 support out-of-order transaction completion? What is the role of transaction IDs?

AXI4 allows a master to issue multiple outstanding read or write transactions before receiving responses. Each transaction is tagged with a Transaction ID (ARID for reads, AWID for writes). The slave and interconnect are free to complete transactions in a different order from how they were issued — a fast SRAM access may return data before a slow DRAM access even if the DRAM request was issued first.

How the master reconciles responses: Read data returns on the R channel with RID matching the original ARID. Write responses return on the B channel with BID matching AWID. The master maintains an outstanding transaction table and uses the ID to match each response to the correct request.

Ordering rule per ID: Transactions with the same ID must complete in order. If a master issues two reads both with ARID=3, the interconnect must return them in order. Transactions with different IDs have no ordering guarantee relative to each other.

Interconnect ID widening: When multiple masters share an interconnect, the fabric appends a master-identifying prefix to each ID (e.g., 2-bit master select + original ARID = extended ARID seen by the slave). The slave returns the extended ID unchanged as RID. On the response path, the interconnect uses the prefix to route the response to the correct master port and strips it, so the master receives its original ID.

AXI4 vs AXI3: AXI3 allowed interleaved write data (WID specified which burst a W beat belonged to). AXI4 removed WID — write data must always be in the same order as write addresses. This simplification makes interconnects significantly cheaper to implement.

20

Easy Architecture

What is pipelining? What does it improve and what are its trade-offs?

Pipelining divides a long combinational operation into N sequential stages, each separated by flip-flops. Instead of one result every T_total clock period (dictated by the slowest path), you get one result per T_total/N clock period — throughput increases N× once the pipeline is full.

Example: A 5-stage 32-bit multiplier at 500 MHz produces one product every 2 ns. Without pipelining, the same logic would run at 100 MHz (5× slower combinational chain). With pipelining, a new multiply starts every cycle — though each individual result still takes 5 cycles of latency.
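The numbers in this example follow from simple arithmetic. The sketch below assumes a 10 ns unpipelined path split into perfectly balanced stages; the optional t_reg_ns parameter is an added assumption that models per-stage register overhead, which the example above ignores.

```python
def pipeline_metrics(t_total_ns, stages, t_reg_ns=0.0):
    """Return (clock frequency in MHz, latency in ns) for a balanced pipeline."""
    t_cycle_ns = t_total_ns / stages + t_reg_ns
    f_mhz = 1000.0 / t_cycle_ns
    latency_ns = stages * t_cycle_ns
    return f_mhz, latency_ns

print(pipeline_metrics(10.0, 1))             # unpipelined
print(pipeline_metrics(10.0, 5))             # 5 ideal stages
print(pipeline_metrics(10.0, 5, t_reg_ns=0.2))
```

With ideal registers, five stages raise the clock from 100 MHz to 500 MHz while latency stays at 10 ns; any register overhead lowers the achievable frequency and lengthens latency.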

What pipelining improves: Throughput (results per unit time) — directly, by allowing clock frequency to be multiplied by the number of stages.

Trade-offs:

Latency: Each result takes N cycles to complete instead of 1. This is often acceptable for bulk data, but hurts interactive or latency-sensitive operations.
Area: N−1 extra register stages add flip-flop area and routing overhead.
Power: More registers switch every cycle. However, if the extra throughput is not needed, the timing slack gained from shorter stages can be spent on a lower V_DD at the original clock rate, which may more than offset the register power.
Hazards: Data hazards (RAW, read after write: an instruction needs a result that an earlier instruction, still in a later pipeline stage, has not yet produced), control hazards (branches), and structural hazards (resource conflicts) require stalls, forwarding, or branch prediction logic, all of which reduce ideal throughput.
Balancing: If one stage is slower than others, it bottlenecks the pipeline. All stages must be balanced to the same worst-case delay for the frequency gain to be fully realized.

01

Easy RTL Design

What is the difference between blocking (=) and non-blocking (<=) assignments in Verilog? When should each be used?

Blocking assignment (=) executes sequentially within an always block — each statement completes before the next begins, exactly like a software assignment. The left-hand side updates immediately.

Non-blocking assignment (<=) evaluates all right-hand sides first (using values from the current time step), then schedules all left-hand side updates to happen simultaneously at the end of the time step. This models the parallel behavior of flip-flops sampling their D inputs on a clock edge.

The golden rules:

Use = (blocking) for combinational logic in always @(*) blocks. The sequential evaluation correctly implements the logic function.
Use <= (non-blocking) for sequential logic in always @(posedge clk) blocks. The simultaneous update models how FFs all sample their D input on the same clock edge.
Never mix both types in the same always block.

Classic bug with the wrong choice: A shift register written with blocking assignments (a = in; b = a; c = b;) immediately propagates the input through all stages in a single clock cycle. With non-blocking (a <= in; b <= a; c <= b;), all three FFs sample their current input simultaneously — correct shift register behavior.
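The two behaviors can be mimicked in Python, which only has immediate (blocking-style) assignment. In this sketch each function models one clock edge of the three-stage shift register; the non-blocking version computes every next value from the old state before updating anything.

```python
def posedge_blocking(state, d_in):
    """a = in; b = a; c = b;  each statement sees the previous one's result."""
    a, b, c = state
    a = d_in
    b = a
    c = b
    return (a, b, c)

def posedge_nonblocking(state, d_in):
    """a <= in; b <= a; c <= b;  all right-hand sides use the old values."""
    a, b, c = state
    next_a, next_b, next_c = d_in, a, b
    return (next_a, next_b, next_c)

print(posedge_blocking((0, 0, 0), 1))
print(posedge_nonblocking((0, 0, 0), 1))
```

After one edge the blocking version has pushed the input through all three stages, while the non-blocking version has moved it into the first stage only.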

Synthesis impact: Mixing = and <= in a clocked block can produce simulation–synthesis mismatches, where the simulator and the synthesized hardware behave differently. This is one of the most common Verilog pitfalls, both as an interview topic and in real designs.

02

Easy RTL Design

What are the differences between wire, reg, and logic in SystemVerilog? Why was logic introduced?

wire (Verilog): a net type representing a physical connection. It is driven by continuous assignments (assign), gate primitives, or module instance outputs. With multiple drivers, a plain wire resolves by driver strength and goes to X on a conflict; wand and wor nets resolve as wired-AND and wired-OR. It cannot hold state.

reg (Verilog): a variable that can be driven inside procedural blocks (always, initial). Despite its name, it does NOT necessarily synthesize to a register — a reg inside always @(*) synthesizes to combinational logic. The name is misleading and a common source of confusion.

logic (SystemVerilog): a 4-state data type that, used as a variable, replaces reg everywhere and wire in most single-driver cases. A logic variable can be driven either by procedural blocks or by one continuous assignment (or one port connection), but not by a mix and not by several continuous drivers. The compiler flags such multi-driver errors, which wire silently allows. This catches accidental bus conflicts at compile time.

Why logic was introduced:

Eliminates the confusing reg misnomer — logic communicates data type, not inferred hardware.
Provides compile-time multiple-driver checking that wire lacks.
Works in both continuous and procedural contexts, reducing declarations.

Modern SV practice: Use logic for almost everything. Use wire only when you explicitly need multiple drivers (e.g., tri-state buses, wired-AND). Avoid reg entirely in new SystemVerilog code.

03

Medium RTL Design

How do you detect full and empty conditions in a synchronous FIFO? What is the "extra bit" trick?

A synchronous FIFO uses a write pointer (wrptr) and a read pointer (rdptr) to track the head and tail. Both pointers start at 0. When the FIFO is empty, both point to the same location — and when it is completely full, both also point to the same location after wrapping around. This ambiguity is the core challenge of FIFO pointer design.

The naive approach fails: If both pointers are N-bit binary counters with range 0 to DEPTH-1, you cannot distinguish full from empty because both conditions result in wrptr == rdptr.

The extra bit trick: Use N+1 bit pointers, where N = log₂(DEPTH). The lower N bits are the actual memory address; the MSB (the "extra bit") acts as an overflow wrap indicator.

Empty: wrptr == rdptr (all N+1 bits equal — same wrap count, same address)
Full: wrptr[N-1:0] == rdptr[N-1:0] AND wrptr[N] != rdptr[N] (same address, but one extra wrap ahead)

The MSBs differ when the write pointer has wrapped one more time than the read pointer, meaning the FIFO currently holds exactly DEPTH entries.
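The full and empty rules above can be checked with a small behavioral model. This is a minimal sketch in Python rather than RTL, assuming a power-of-two depth; DEPTH = 8 and the helper names are hypothetical, chosen only for illustration.

```python
# Behavioral sketch of the "extra bit" scheme; DEPTH must be a power of two.
DEPTH = 8                        # hypothetical depth
N = DEPTH.bit_length() - 1       # address bits: log2(DEPTH)
PTR_MASK = (1 << (N + 1)) - 1    # pointers are N+1 bits wide
ADDR_MASK = (1 << N) - 1         # lower N bits address the memory


def is_empty(wrptr: int, rdptr: int) -> bool:
    # All N+1 bits equal: same wrap count, same address.
    return wrptr == rdptr


def is_full(wrptr: int, rdptr: int) -> bool:
    # Same address, but the wrap (MSB) bits differ.
    same_addr = (wrptr & ADDR_MASK) == (rdptr & ADDR_MASK)
    msb_differs = (wrptr >> N) != (rdptr >> N)
    return same_addr and msb_differs


wrptr = rdptr = 0
empty_at_reset = is_empty(wrptr, rdptr)

for _ in range(DEPTH):                   # write DEPTH entries
    wrptr = (wrptr + 1) & PTR_MASK
full_after_depth_writes = is_full(wrptr, rdptr)

rdptr = (rdptr + 1) & PTR_MASK           # one read frees a slot
full_after_one_read = is_full(wrptr, rdptr)
occupancy = (wrptr - rdptr) & PTR_MASK   # entries currently stored
```

A side benefit of the wider pointers is that the occupancy falls out of a plain modular subtraction, as the last line shows.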

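For async FIFOs: The same trick applies, but the pointers are Gray-coded before being synchronized across clock domains. Gray code changes only one bit per count, so a pointer sampled mid-transition resolves to either the old or the new value, never to an unrelated one. The synchronized pointer may lag the true pointer, but that only makes full and empty assert conservatively, which is safe. Note that the full comparison changes in Gray code: the top two bits differ and the remaining bits match.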
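The one-bit-per-count property is easy to confirm numerically. The sketch below uses the standard binary-to-Gray conversion; the 4-bit width is an arbitrary choice for illustration.

```python
def bin_to_gray(b: int) -> int:
    # Each Gray bit is the XOR of two adjacent binary bits.
    return b ^ (b >> 1)


def gray_to_bin(g: int) -> int:
    # Undo the XOR chain by folding in every right-shifted copy of g.
    b = 0
    while g:
        b ^= g
        g >>= 1
    return b


WIDTH = 4  # hypothetical pointer width
codes = [bin_to_gray(i) for i in range(1 << WIDTH)]

# Number of bits that change between consecutive counts,
# including the wrap from the last count back to zero.
distances = [
    bin(codes[i] ^ codes[(i + 1) % len(codes)]).count("1")
    for i in range(len(codes))
]
```

Every entry in distances is 1, including the wrap-around step, which is why the pointer width must cover a full power-of-two count range.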

04

Medium Timing / STA

What is the difference between a false path and a multi-cycle path in SDC constraints? How do you set each?

False path: a timing path that exists in the netlist but never carries valid data during real operation. STA should ignore it completely, with no setup or hold analysis. Examples:

Paths between two completely unrelated, never-simultaneously-active clock domains
Paths from a test-mode-only mux output that is static during functional operation
Reset synchronizer paths where the reset is never timing-critical
Paths through scan-mode logic that is inactive during functional operation

SDC: set_false_path -from [get_cells launch_ff] -to [get_cells capture_ff]

Multi-cycle path (MCP): a path where data is intentionally designed to take N clock cycles to settle. The designer tells STA to use N×T_clk as the available time for the setup check instead of 1×T_clk.

SDC for a 2-cycle setup path: set_multicycle_path 2 -setup -from ... -to ...

Critical rule for MCP: When you relax setup by N cycles, you MUST also adjust the hold check. By default, STA places the hold check one cycle before the setup capture edge, which is correct for 1-cycle paths. For a 2-cycle setup, the hold check must move back one cycle so that it stays aligned with the launch edge (in general, N−1 cycles for an N-cycle setup):

SDC: set_multicycle_path 1 -hold -from ... -to ...

Common mistake: Setting set_multicycle_path 2 -setup without the matching -hold exception creates an overly pessimistic hold check one cycle before the new setup capture — often an impossible hold requirement that forces unnecessary delay insertion.
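The edge arithmetic behind this rule can be written out explicitly. This is a simplified sketch assuming launch and capture flops on the same clock, with the launch edge at t = 0; the function name and the 2 ns period are hypothetical.

```python
def check_edges(period_ns: float, setup_mult: int = 1, hold_mult: int = 0):
    """Return (setup_capture_time, hold_check_time) relative to launch at t = 0."""
    setup_capture = setup_mult * period_ns
    # The default hold edge sits one cycle before the setup capture edge;
    # the -hold multiplier moves it back by that many further cycles.
    hold_check = (setup_mult - 1 - hold_mult) * period_ns
    return setup_capture, hold_check


T = 2.0  # hypothetical 500 MHz clock

single_cycle = check_edges(T)                               # default 1-cycle path
mcp_setup_only = check_edges(T, setup_mult=2)               # -setup 2 alone
mcp_with_hold = check_edges(T, setup_mult=2, hold_mult=1)   # -setup 2 and -hold 1
```

With -setup 2 alone the hold check lands at 2.0 ns, so the data would have to stay stable for a full extra cycle. Adding -hold 1 brings it back to 0.0 ns, the same edge as in the single-cycle case.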

05

Hard Timing / STA

What is OCV (On-Chip Variation)? What is the difference between flat OCV, AOCV, and POCV?

OCV (On-Chip Variation) acknowledges that cells at different locations on the same die do not experience identical conditions. Spatial gradients in temperature, VDD (due to IR drop), and manufacturing process (oxide thickness, doping) cause cells in different parts of the chip to have slightly different delays — even if they are the same cell type running at the same nominal conditions.

This matters for STA because the clock path and data path typically run through physically different areas of the chip. If both paths were derated the same way, the error would cancel. But since one may be faster and the other slower, we must be pessimistic.

Flat OCV (Flat Derating): Applies a single multiplicative derating factor to all cells. The launch data path is made slower (multiply delays by e.g. 1.05) and the capture clock path is made faster (multiply by 0.95) for setup — worst-case pessimism everywhere. Simple but overly conservative.

AOCV (Advanced OCV): Uses a lookup table indexed by path depth (number of logic stages) and distance. Longer paths with more stages average out variation: the total delay of a 30-stage path varies less, in relative terms, than that of a 2-stage path because random per-cell variations partially cancel. AOCV assigns less derating to deep paths, reducing pessimism and improving timing convergence without sacrificing accuracy.

POCV (Parametric OCV, with cell data supplied in LVF, the Liberty Variation Format): Uses full statistical distributions (mean and sigma) for each cell's delay, propagating uncertainties through the path using statistical addition. This is the most accurate method and is becoming the industry standard at 7nm and below, where AOCV is no longer accurate enough.
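A short numerical sketch shows why statistical addition is less pessimistic for deep paths. It assumes independent, identically distributed Gaussian stage delays, which is a simplification of what a real LVF-based flow does; the 20 ps mean and 1 ps sigma are hypothetical values.

```python
import math

STAGE_MEAN_PS = 20.0    # hypothetical nominal stage delay
STAGE_SIGMA_PS = 1.0    # hypothetical per-stage sigma


def worst_case_margin(stages: int) -> float:
    # Flat-style view: every stage sits at +3 sigma at the same time.
    return stages * 3 * STAGE_SIGMA_PS


def statistical_margin(stages: int) -> float:
    # Independent variations add in quadrature (root-sum-square).
    return 3 * math.sqrt(stages * STAGE_SIGMA_PS ** 2)


def relative_margin(margin_ps: float, stages: int) -> float:
    # Margin as a fraction of the nominal path delay.
    return margin_ps / (stages * STAGE_MEAN_PS)


shallow = relative_margin(statistical_margin(2), 2)    # about 0.106
deep = relative_margin(statistical_margin(30), 30)     # about 0.027
flat = relative_margin(worst_case_margin(30), 30)      # about 0.15 at any depth
```

Under these assumptions the 3-sigma margin grows with the square root of the stage count, so the relative margin shrinks as paths get deeper. This is the same effect that AOCV approximates with its depth-indexed table.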

Qualcomm context: Snapdragon SoCs run at high frequency with tight timing margins. Moving from flat OCV to AOCV/POCV can recover 50–150 ps of setup slack that flat OCV was needlessly consuming — directly enabling higher clock frequencies or lower voltage operation.

06

Hard Timing / STA

What is CRPR (Clock Reconvergence Pessimism Removal)? Why does it matter?

When STA analyzes a flip-flop-to-flip-flop path, the launch clock path (from clock source to FF1) and the capture clock path (from clock source to FF2) often share common clock buffers near the root of the clock tree before they diverge.

With OCV derating, the tool pessimistically applies opposite deratings to the launch and capture paths: the launch clock is made slower (derated up) and the capture clock is made faster (derated down) for setup analysis. But the shared portion of the two paths cannot simultaneously be both slow and fast — it is the same physical cell running at the same moment in time.

CRPR removes this double-counting. For the portion of clock tree that is common to both launch and capture paths, the STA tool calculates how much pessimism was added by applying opposite deratings to the same cells, and adds that amount back as credit. The formula:

CRPR credit = max_delay(common) − min_delay(common)

This credit is added back to the setup slack. Typical CRPR values range from 10 ps to 100 ps depending on how much of the clock tree is shared and how aggressive the OCV derating is.
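The credit formula can be illustrated with flat derates. This is a worked sketch, not a tool's actual algorithm: the 1.05/0.95 derates follow the flat OCV example, and all delay values are hypothetical.

```python
LATE, EARLY = 1.05, 0.95     # flat OCV derates for a setup check

common_ps = 400.0            # hypothetical delay of the shared clock buffers
launch_branch_ps = 200.0     # hypothetical launch-only branch
capture_branch_ps = 200.0    # hypothetical capture-only branch

# Setup analysis: launch clock derated late, capture clock derated early.
launch_clock = (common_ps + launch_branch_ps) * LATE
capture_clock = (common_ps + capture_branch_ps) * EARLY
total_pessimism = launch_clock - capture_clock            # about 60 ps

# CRPR credit = max_delay(common) - min_delay(common)
crpr_credit = common_ps * LATE - common_ps * EARLY        # about 40 ps

# What remains comes only from the branches that really are different cells.
remaining_pessimism = total_pessimism - crpr_credit       # about 20 ps
```

In this example two thirds of the clock tree is shared, so two thirds of the derating pessimism is returned as slack.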

CRPR is sometimes called CPPR (Common Path Pessimism Removal) — both terms mean the same thing. Modern STA tools (PrimeTime, Tempus) apply it automatically.

Why it matters: Without CRPR, many paths that physically meet timing are flagged as violations. This causes unnecessary engineering effort to "fix" timing that is already correct. Enabling CRPR can reduce the number of failing paths by 20–40% without any design changes.

07

Hard CDC

How do you safely transfer a multi-bit data bus across clock domains? Why can't you just synchronize each bit independently?

Why per-bit synchronization fails: Each bit of the bus passes through its own 2-FF synchronizer independently. Each synchronizer may sample from a different source clock cycle — bit 3 might capture the value from cycle N while bit 0 captures the value from cycle N+1. The destination domain then reads a "torn" word that never existed in the source domain. For a 32-bit bus, this can produce completely wrong data.
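The torn-word problem can be made concrete by enumerating what the destination could assemble when each bit independently resolves to its old or new value. This is an abstract sketch that does not model metastability itself; the function name is hypothetical. A Gray-coded pair is included for comparison.

```python
from itertools import product


def possible_captures(old: int, new: int, width: int) -> set:
    """Every word the destination could see if each bit picks old or new."""
    results = set()
    for choice in product((0, 1), repeat=width):
        value = 0
        for bit, take_new in enumerate(choice):
            source = new if take_new else old
            value |= source & (1 << bit)
        results.add(value)
    return results


# Binary count 7 -> 8: all four bits change at once.
binary_seen = possible_captures(0b0111, 0b1000, 4)

# The same step in Gray code (0100 -> 1100): only one bit changes.
gray_seen = possible_captures(0b0100, 0b1100, 4)
```

The binary transition can be captured as any of the 16 possible 4-bit values, while the Gray-coded transition can only yield the old word or the new word.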

Safe techniques for multi-bit CDC:

Gray code (for counters/pointers): If the bus is a counter that increments by one at a time, encode it in Gray code before the crossing. Only one bit changes per count, so a sampled-in-transition value is at most off by one — which FIFO logic tolerates.
Handshake (req/ack): Source asserts a request (req) after data has been stable for at least one source cycle. Destination synchronizes req (2-FF), samples the data only after req is asserted, then asserts ack. Source deasserts req after seeing synchronized ack. Both req and ack use separate 2-FF synchronizers. Low throughput (takes ~4–6 destination clock cycles per transfer) but works for any arbitrary data.
Asynchronous FIFO: For streaming data, use an async FIFO with Gray-coded pointers. The FIFO internally handles all multi-bit CDC safely.
Qualified sampling: Source keeps data stable for at least 3 destination clock cycles, then asserts a single "data valid" signal. Destination synchronizes the valid signal and samples the data on the synchronized valid. Risky — relies on the source holding data long enough.

Qualcomm modem chips have dozens of clock domains (CPU, DSP, RF, power management, audio) all communicating across boundaries. Multi-bit CDC handling is one of the most common bug sources in modem SoC development.

08

Medium CDC

What is a pulse synchronizer? When would you use it instead of a 2-FF level synchronizer?

A 2-FF level synchronizer is used when the source signal is a steady level that persists for many source clock cycles. The destination captures it safely after 2 destination clocks.

A pulse synchronizer is needed when the source generates a single-cycle pulse — a signal that is HIGH for exactly one source clock cycle. A 2-FF synchronizer cannot reliably capture this: if the destination clock is slower or at an unfortunate phase, the pulse may be missed entirely.

How a toggle-based pulse synchronizer works:

Source domain: A toggle flip-flop converts each incoming pulse into a level change. Every time a pulse arrives, the FF inverts its output. The toggle signal therefore holds its value until the next pulse — making it a persistent level that won't be missed.
Clock crossing: The toggle signal crosses the domain via a standard 2-FF synchronizer.
Destination domain: An XOR of the synchronized output and its one-cycle-delayed copy detects each edge → generates a clean single-cycle pulse in the destination domain.

Constraint: Source pulses must be spaced at least 3 destination clock cycles apart so the previous toggle has fully propagated through the synchronizer before the next pulse arrives. If pulses can arrive faster, use an async FIFO instead.
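The three stages can be modeled at destination-clock granularity. This is a minimal sketch that ignores metastability and simply treats the source toggle as a level sampled on each destination edge; the sample sequence is made up for illustration.

```python
def toggle_sync(toggle_samples):
    """toggle_samples[i] is the source toggle level seen at destination edge i.

    Returns the destination-domain pulse value after each edge.
    """
    ff1 = ff2 = ff3 = 0
    pulses = []
    for level in toggle_samples:
        # All three flops update on the same edge, using the previous values.
        ff1, ff2, ff3 = level, ff1, ff2
        # Edge detect: synchronized output XOR its one-cycle-delayed copy.
        pulses.append(ff2 ^ ff3)
    return pulses


# The source toggle flips at edge 2 (first event) and edge 7 (second event).
samples = [0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
pulses = toggle_sync(samples)
# pulses == [0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0]
```

Each level change on the toggle produces exactly one single-cycle pulse in the destination domain, regardless of whether the change was a rise or a fall.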

Use cases: Interrupt signals, one-shot event notifications, handshake request pulses — any scenario where the source generates a distinct, infrequent event and the destination must detect it exactly once.

09

Medium Low Power

What is UPF (Unified Power Format)? What does it define and why is it needed?

UPF (IEEE 1801) is a standard format for capturing the power intent of a chip design in a separate file that accompanies the RTL. As SoCs moved to multiple power domains, it became impossible to express power management purely in RTL — the RTL describes logical functionality, not which block gets what voltage or when a domain shuts off.

What UPF defines:

Supply networks: Which voltage rails exist (VDD_CPU, VDD_MODEM, VDD_AON), their nominal voltages, and how they connect to design blocks.
Power domains: Which RTL modules belong to which supply rail. Each domain has a defined primary power supply.
Power states: Which domains are ON or OFF in each operating mode (e.g., "sleep mode: modem ON, CPU OFF, AON ON").
Isolation cells: Specifies where isolation cells must be inserted at the boundary of power-gatable domains, and what value they should clamp to when the domain is off.
Retention registers: Which flip-flops need SRPG (State Retention Power Gating) cells to preserve state across a power-off event.
Level shifters: Where voltage-level-shifting cells are needed between domains running at different voltages.
Power switches: Header (PMOS) or footer (NMOS) transistors that gate the power supply to a domain.

Qualcomm uses UPF extensively in Snapdragon SoCs. A typical Snapdragon has 10+ power domains (CPU clusters, GPU, DSP, modem, camera, display, always-on). Without UPF, manually inserting and verifying isolation cells, retention registers, and level shifters across hundreds of domain boundaries would be error-prone and unmanageable.

10

Medium Low Power

What are isolation cells and retention registers (SRPG)? When are they required?

Isolation cells are required at the output boundary of any power-gated domain. When a domain's power supply is cut, its flip-flops lose their state and outputs become undefined (float to a random value or X). If an always-on domain receives these floating signals, it may malfunction — latching garbage data, causing spurious state transitions, or drawing excessive short-circuit current.

An isolation cell is inserted on each output net of the power-gated block. It is connected to an always-on supply. When the domain is OFF, the isolation cell clamps the output to a safe known value (typically 0 for AND-based isolation, or 1 for OR-based) as specified in UPF. When the domain is ON, the isolation cell passes the signal through transparently.

Retention registers (SRPG, State Retention Power Gating) are special flip-flop variants with a small "shadow latch" connected to a separate always-on power rail (typically a low-leakage supply). The shadow latch consists of only a few transistors, so it consumes a fraction of the normal FF's leakage.

Operation:

Before power-off: The power management controller sends a SAVE signal → each SRPG cell captures its current state into its shadow latch.
Domain is off: Main supply cut, shadow latch retains state at very low power.
After power-on: A RESTORE signal pushes the shadow state back into the main FF.

Without retention, the block must re-initialize from scratch after every power-up, adding latency and requiring software re-programming of registers.
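The save, power-off, and restore sequence, together with output isolation, can be sketched behaviorally. This is an illustrative model only: the class and function names are hypothetical, and None stands in for the unknown (X) state of an unpowered flop.

```python
class RetentionFlop:
    """Main flop on the switchable supply plus an always-on shadow latch."""

    def __init__(self):
        self.q = 0          # main flop state
        self.shadow = 0     # shadow latch state

    def save(self):
        self.shadow = self.q

    def power_off(self):
        self.q = None       # main state is lost; None models X

    def restore(self):
        self.q = self.shadow


def isolate(value, iso_en: bool, clamp: int = 0):
    # Clamp the output while isolation is enabled, otherwise pass it through.
    return clamp if iso_en else value


flop = RetentionFlop()
flop.q = 1

flop.save()                                          # before power-off
flop.power_off()                                     # domain is off
seen_by_always_on = isolate(flop.q, iso_en=True)     # clamped, not X
flop.restore()                                       # after power-on
seen_after_wakeup = isolate(flop.q, iso_en=False)    # original state is back
```

The ordering matters: isolation is enabled and state is saved before the supply is cut, and isolation is released only after the state has been restored.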

11

Easy Low Power

What is a voltage island in SoC design? What cells are required at the boundaries?

A voltage island is a physically distinct region of the chip that operates at a different supply voltage from surrounding blocks. Because dynamic power scales as V², running low-activity blocks at a lower V_DD gives dramatic savings: dropping from 1.0 V to 0.8 V reduces dynamic power by 36% at the same frequency.
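The 36% figure follows directly from the V² term. A quick check, assuming switching activity, capacitance, and frequency stay the same:

```python
def dynamic_power_ratio(v_new: float, v_old: float) -> float:
    # P_dyn = alpha * C * V^2 * f; with alpha, C and f fixed only V^2 changes.
    return (v_new / v_old) ** 2


saving = 1.0 - dynamic_power_ratio(0.8, 1.0)
saving_percent = round(saving * 100, 1)   # 36.0
```

In practice a lower voltage often comes with a lower clock frequency as well, which would reduce dynamic power further.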

Why Qualcomm uses voltage islands: A Snapdragon SoC has very different performance and power requirements across blocks. The modem baseband runs continuously but at moderate frequency. The application CPU cores spike to high performance on demand. The always-on sensor hub must run at <0.7V for weeks on battery. A single supply voltage optimized for the fastest block wastes enormous power in slower blocks.

Required boundary cells:

Level shifters (LS): Signals crossing between domains at different voltages must be shifted to the receiving domain's logic levels. A signal from a 0.8V domain HIGH (0.8V) is not guaranteed to be a valid HIGH in a 1.1V domain without level shifting. Level shifters are inserted on every signal crossing.
Isolation cells: If the lower-voltage island can be powered off completely, isolation cells (see previous question) are needed to clamp its outputs.
Level-shifting isolation cells: Combined cells that both shift voltage and isolate — used at boundaries between always-on and power-gatable domains at different voltages.

Design overhead: Level shifters add area, delay (~10–50 ps), and power. Proper floorplanning ensures domain boundaries are short, minimizing the number of crossing signals and therefore level shifters needed.

12

Easy Physical Design

What is Clock Tree Synthesis (CTS)? What does the tool try to achieve and what comes after it?

Clock Tree Synthesis (CTS) is the physical design step that builds the clock distribution network — a buffered tree that delivers the clock signal from the clock source (PLL output or pad) to every flip-flop's clock pin across the entire chip.

Goals of CTS:

Minimize clock skew: Every FF should see the clock edge at (nearly) the same time. Unbalanced trees create skew that consumes setup and hold timing margins.
Meet insertion delay target: Total latency from clock source to FF clock pins must be within the budgeted range (typically set in SDC via set_clock_latency).
Minimize clock power: The clock network toggles every cycle and can consume 30–40% of total chip dynamic power. The tool balances skew reduction against cell count and wire length.
Respect non-default routing (NDR) rules: Clock nets typically use special Non-Default Routing Rules (wider wires, more spacing, preferred upper metal layers) for reduced resistance and better EM reliability.

Flow position: CTS runs after placement (cell locations are fixed) but before detailed routing. After CTS, timing analysis uses real clock arrival times instead of ideal clock assumptions — hold violations often emerge here because real clock trees have skew that didn't exist in pre-CTS analysis.

Post-CTS hold fixing: After CTS, the timer switches from ideal clock to propagated clock. Paths that were hold-clean with ideal clocks often fail with real clock skew. Hold fixing (inserting delay buffers) is a major post-CTS activity before proceeding to routing.
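The effect of real skew on both checks can be seen from the textbook slack equations. This is a simplified single-path sketch, with skew defined as capture clock arrival minus launch clock arrival; all delay values (in ps) are hypothetical.

```python
def setup_slack(period, skew, t_cq, t_comb_max, t_setup):
    # Data must arrive before the next capture edge minus the setup time.
    return (period + skew) - (t_cq + t_comb_max + t_setup)


def hold_slack(skew, t_cq, t_comb_min, t_hold):
    # Data must not change until the hold time after the same capture edge.
    return (t_cq + t_comb_min) - (t_hold + skew)


# Pre-CTS: ideal clock, zero skew.
ideal_setup = setup_slack(period=1000, skew=0, t_cq=80, t_comb_max=800, t_setup=50)
ideal_hold = hold_slack(skew=0, t_cq=80, t_comb_min=20, t_hold=30)

# Post-CTS: the capture flop's clock arrives 90 ps later than the launch flop's.
real_setup = setup_slack(period=1000, skew=90, t_cq=80, t_comb_max=800, t_setup=50)
real_hold = hold_slack(skew=90, t_cq=80, t_comb_min=20, t_hold=30)
```

With these numbers the positive skew helps setup (70 ps becomes 160 ps) but turns a 70 ps hold margin into a 20 ps violation, which is the kind of path that post-CTS hold fixing pads with delay buffers.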

13

Medium Physical Design

What is IR drop in a VLSI design? How does it affect timing and how do you fix it?

IR drop is the voltage reduction along the power delivery network from the supply pins to the power pins of individual cells. The metal power grid has resistance (R), and the switching current (I) causes a voltage drop V = I × R. A cell operating at V_nominal − ΔV is slower than a cell at the full supply voltage.

Two types:

Static IR drop: Average current × grid resistance. Determined by the long-term average switching activity. Used for power integrity sign-off of DC operating point.
Dynamic (transient) IR drop: When a large number of cells switch simultaneously (e.g., a wide datapath all clocking at once), the instantaneous current surge exceeds the average. The power grid voltage transiently collapses by a larger amount, limited by the inductance and decoupling capacitance. This "voltage droop" is worse than static IR and is the primary concern at high frequencies.
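Static IR drop along a single power stripe can be estimated by applying V = I × R segment by segment. This is a deliberately simple one-dimensional sketch (a real grid is a two-dimensional mesh solved by a power-integrity tool); the supply voltage, segment resistance, and tap currents are hypothetical.

```python
def rail_voltages(v_supply, seg_res_ohm, tap_currents_a):
    """Voltage at each tap of a stripe fed from one end.

    Each segment carries the current of its own tap plus every tap beyond it.
    """
    voltages = []
    v = v_supply
    remaining = sum(tap_currents_a)
    for i_tap in tap_currents_a:
        v -= remaining * seg_res_ohm    # V = I x R across this segment
        voltages.append(v)
        remaining -= i_tap
    return voltages


# Four cell clusters drawing 10 mA each, 0.5 ohm of metal between taps.
taps = rail_voltages(0.80, 0.5, [0.010, 0.010, 0.010, 0.010])
worst_drop_mv = (0.80 - taps[-1]) * 1000    # about 50 mV at the far end
```

The cell farthest from the supply sees the largest drop, which is why adding straps and feeding the grid from more points is an effective fix.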

Effect on timing: In a high-IR-drop region, cells are slower than characterized at nominal voltage. A path that passes STA at nominal conditions may violate setup timing in silicon due to IR-induced delay increase. Hold violations are less common (slower cells improve hold margin).

Fixes:

Widen power stripes or add more power mesh layers
Add decoupling capacitors (decaps) near high-switching density regions
Spread high-activity cells during placement to avoid current hot spots
Use power gating with controlled wake-up sequences to avoid simultaneous switching
In STA: apply voltage derating in high-IR-drop regions for more accurate sign-off

14

Medium Physical Design

What is the antenna effect in VLSI fabrication? How is it detected and fixed?

During VLSI fabrication, metal layers are deposited and patterned one at a time using plasma etching. Plasma charges accumulate on exposed metal wires during etching. If a long metal wire is already connected to a transistor gate oxide but NOT yet connected to a diffusion region (which would discharge the charge safely), the accumulated charges can create a large voltage across the thin gate oxide — sufficient to cause permanent gate oxide damage: threshold voltage shifts, increased leakage, or immediate breakdown.

The antenna ratio = (metal area of the wire connected to the gate) / (gate oxide area). Process Design Kits (PDKs) specify maximum allowable antenna ratios (typically 400–1000 for metal, 200–600 for vias). Exceeding this ratio means the wire can accumulate enough charge to damage the oxide.

How it's detected: The router's DRC (Design Rule Check) engine computes the cumulative antenna ratio for every net using the partial routing built up layer by layer. If it exceeds the limit, an antenna violation is flagged.
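The basic ratio check looks like the sketch below. It is a simplification: real PDK rules are evaluated per layer and cumulatively, and diode protection changes the allowed ratio. The limit of 400 and all geometry values are hypothetical.

```python
MAX_RATIO = 400    # hypothetical limit; the real value comes from the PDK


def antenna_ratio(metal_area_um2: float, gate_area_um2: float) -> float:
    return metal_area_um2 / gate_area_um2


def violates(metal_area_um2: float, gate_area_um2: float) -> bool:
    return antenna_ratio(metal_area_um2, gate_area_um2) > MAX_RATIO


gate_area = 0.01                  # um^2, hypothetical gate oxide area
long_wire = 100 * 0.05            # 100 um x 0.05 um wire tied to the gate
short_stub = 30 * 0.05            # only 30 um remains after a layer hop

before_fix = violates(long_wire, gate_area)     # ratio about 500
after_fix = violates(short_stub, gate_area)     # ratio about 150
```

The layer hop works because only the metal already connected to the gate at the time a given layer is etched counts toward that layer's ratio.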

Fixes:

Metal jumper (layer hopping): Break the long wire by jumping to a higher metal layer and back. This "resets" the antenna accumulation because higher-layer routing is done later, after diffusion connections have been made. Most common fix.
Antenna diode: Insert a reverse-biased diode near the gate, connected to the same metal wire. During plasma etching, the diode provides a discharge path to substrate, preventing charge buildup. Small area cost, always effective.
Reduce net length: Re-route the net to use shorter wires on lower layers.

15

Easy Verification / DFT

What is the difference between functional coverage and code coverage? Which is more important?

Code coverage measures how much of the RTL source code was exercised by the simulation:

Line/statement coverage: Were all lines of RTL executed?
Branch coverage: Were both sides of every if/else and every case arm taken?
Toggle coverage: Did every signal toggle both 0→1 and 1→0?
FSM coverage: Were all states visited and all transitions taken?

Code coverage is automatically collected by the simulator with no extra specification — easy to get, but tells you nothing about what scenarios were verified. You can hit 100% branch coverage while never testing the most critical protocol corner case.

Functional coverage is user-defined. The verification engineer specifies which scenarios, protocol states, and parameter combinations are important to verify — then measures whether simulations actually exercised them:

Was an AXI4 burst of ARLEN=255 (256 beats) issued?
Did a FIFO simultaneously receive a write and a read when exactly one slot was free?
Did a CDC crossing happen with data changing every source cycle?

Which matters more? Both are necessary; neither alone is sufficient. Code coverage shows that no RTL was accidentally left unexercised. Functional coverage ensures the right scenarios were tested. A mature sign-off process requires both to be above target (typically 95%+ code coverage, 100% defined functional coverpoints).
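The idea of user-defined functional coverage can be sketched outside a simulator. The Python model below mimics a coverpoint on ARLEN with hand-chosen bins; the bin names and boundaries are hypothetical, and a real testbench would use a SystemVerilog covergroup instead.

```python
import random

# User-defined bins: the engineer decides which scenarios matter.
BINS = {
    "single_beat": range(0, 1),       # ARLEN = 0
    "short_burst": range(1, 16),
    "long_burst": range(16, 255),
    "max_burst": range(255, 256),     # ARLEN = 255, i.e. 256 beats
}


def sample(hits, arlen):
    for name, values in BINS.items():
        if arlen in values:
            hits[name] += 1


def coverage_percent(hits):
    return 100.0 * sum(1 for n in hits.values() if n > 0) / len(hits)


hits = {name: 0 for name in BINS}
rng = random.Random(1)                # fixed seed so the run is repeatable
for _ in range(50):
    sample(hits, rng.randint(0, 255))
random_coverage = coverage_percent(hits)

# Close any remaining holes with directed stimulus, one value per unhit bin.
for name, values in BINS.items():
    if hits[name] == 0:
        sample(hits, values[0])
final_coverage = coverage_percent(hits)
```

The single-value bins at the extremes are the ones random stimulus is least likely to hit, which is why they are worth naming explicitly.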

16

Medium Verification / DFT

Describe the key components of a UVM (Universal Verification Methodology) testbench. How does it differ from a traditional directed testbench?

UVM (IEEE 1800.2) is a standardized SystemVerilog methodology for building reusable, scalable verification environments using an object-oriented framework. It replaces brittle, one-off directed testbenches.

Key UVM components:

uvm_test: Top-level test class. Selects which scenario/sequence to run and configures the environment. Different tests reuse the same TB infrastructure.
uvm_env: Container that instantiates and connects agents, scoreboards, and coverage collectors for one DUT.
uvm_agent: Models one protocol interface (e.g., AXI4 master). Contains: Driver (applies stimulus to DUT pins), Monitor (observes DUT pins and creates transaction objects), Sequencer (arbitrates between sequences and feeds items to the driver).
uvm_sequence / uvm_sequence_item: Defines the actual stimulus transactions. Sequences can be layered (a higher-level sequence calls lower-level sequences) and constrained-random.
uvm_scoreboard: Compares DUT output (from monitor) against a reference model's expected output. Reports pass/fail.
TLM ports (uvm_analysis_port): Standardized communication channels between components — no direct references between classes.

Vs. directed testbench: A directed testbench hand-codes every stimulus vector — it only tests what the engineer explicitly wrote. A UVM testbench with constrained-random stimulus explores the full stimulus space automatically within user-specified constraints, finding corner cases no human would write by hand.

Coverage-driven verification: UVM enables "close-the-loop" verification: run simulations → check functional coverage → add constraints to target uncovered scenarios → repeat until all coverpoints hit. This replaces guesswork with a systematic, measurable sign-off process.

17

Medium Verification / DFT

What are the main fault models used in ATPG? What does each model test for?

ATPG (Automatic Test Pattern Generation) tools model physical manufacturing defects as logical faults and generate patterns to detect them. The main fault models are:

Stuck-At Fault (SAF): A wire is permanently stuck at logic 0 (SA0) or 1 (SA1), regardless of what drives it. Models open circuits, resistive shorts to VDD/GND, and broken connections. The most widely used model. A stuck-at fault is detected by finding a test that excites the fault (drives the opposite value) and propagates the effect to a primary output or scan chain output. Industry target: 95–99% fault coverage.
Transition Delay Fault (TDF): Tests whether a net can make a complete 0→1 or 1→0 transition within one clock cycle. Detects resistive defects that don't prevent correct logic levels but slow transitions — critical at high frequency where even a slightly slow net causes a setup violation. TDF requires two-pattern tests: launch the transition, then capture the response one cycle later.
Path Delay Fault (PDF): Tests the end-to-end propagation delay of a specific signal path. More accurate timing characterization than TDF — detects accumulated small delays across many gates. Requires many patterns but provides the most complete timing sign-off.
Bridging Fault: Models an unintended short between two adjacent nets, which combines the two signals via wired-AND or wired-OR behavior. Increasingly important at 7nm/5nm where metal pitch is very tight and shorts between adjacent wires are a common defect.
Cell-Aware Fault: Tests for defects inside standard cells at the transistor level (open/short in the cell's internal netlist). Catches defects that SAF, modeled at the cell's logical interface, would miss.
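The excite-and-propagate idea behind stuck-at testing can be shown by brute force on a tiny circuit. This is an illustrative sketch only: the circuit y = (a AND b) OR c and the fault site n1 are made up, and real ATPG tools use far more efficient search than exhaustive enumeration.

```python
from itertools import product


def circuit(a, b, c, n1_stuck=None):
    n1 = a & b
    if n1_stuck is not None:
        n1 = n1_stuck          # inject a stuck-at fault on internal net n1
    return n1 | c


def detecting_patterns(stuck_value):
    # A pattern detects the fault if good and faulty outputs differ.
    return [
        p for p in product((0, 1), repeat=3)
        if circuit(*p) != circuit(*p, n1_stuck=stuck_value)
    ]


sa0_tests = detecting_patterns(0)   # [(1, 1, 0)]
sa1_tests = detecting_patterns(1)   # [(0, 0, 0), (0, 1, 0), (1, 0, 0)]
```

For the stuck-at-0 fault, a = b = 1 excites it (drives n1 to the opposite value) and c = 0 propagates it through the OR gate. Any pattern with c = 1 masks the fault.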

Qualcomm mobile chips use both SA and transition delay fault testing at production. High volume = even a 0.01% defect escape rate means thousands of field failures. Comprehensive fault coverage is non-negotiable.

18

Medium Protocols

What is MIPI CSI-2? How is it used in mobile SoCs and what are its key electrical characteristics?

CSI-2 (Camera Serial Interface 2) is a MIPI Alliance standard for connecting image sensors to application processors. It is the dominant camera interface in smartphones — virtually every mobile camera uses CSI-2.

Physical layer (D-PHY): CSI-2 uses MIPI D-PHY, a differential serial interface with two operating modes:

High-Speed (HS) mode: Low-swing differential signaling (100–300 mV differential) at 80 Mbps to 4.5 Gbps per lane. Used for pixel data transmission.
Low-Power (LP) mode: CMOS-level single-ended signaling. Used for control, synchronization, and lane management. Much lower speed.

Architecture: One clock lane + 1 to 4 data lanes. Each lane is a differential pair (DP/DN). For a quad-lane sensor at 4.5 Gbps/lane: total bandwidth = 4 × 4.5 = 18 Gbps, enough raw bandwidth for very high-resolution sensors (200 MP class), with the achievable frame rate set by bit depth and protocol overhead.
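The link budget can be turned into an upper bound on frame rate. This is a back-of-envelope sketch that ignores packet headers, blanking, and lane overhead; the 10 bits per pixel (RAW10) figure is an assumption, not something stated above.

```python
def link_bandwidth_gbps(lanes: int, gbps_per_lane: float) -> float:
    return lanes * gbps_per_lane


def max_frame_rate(megapixels: float, bits_per_pixel: int, bandwidth_gbps: float) -> float:
    # Upper bound: raw pixel payload only, no protocol overhead.
    bits_per_frame = megapixels * 1e6 * bits_per_pixel
    return bandwidth_gbps * 1e9 / bits_per_frame


bandwidth = link_bandwidth_gbps(4, 4.5)                 # 18.0 Gbps
fps_200mp_raw10 = max_frame_rate(200, 10, bandwidth)    # 9.0 frames per second
fps_50mp_raw10 = max_frame_rate(50, 10, bandwidth)      # 36.0 frames per second
```

Under these assumptions a full-resolution 200 MP readout tops out at single-digit frame rates, which is one reason such sensors usually bin pixels for video modes.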

Virtual channels: Up to 4 virtual channel IDs allow multiple cameras to share the same physical CSI-2 interface, multiplexed by the sensor or ISP.

C-PHY (newer alternative): Uses 3-wire "trios" with 3-phase symbol encoding that carries about 2.28 bits per symbol (16 bits mapped onto 7 symbols). At 2.5 Gsymbols/s, one trio delivers roughly 5.7 Gbps, giving a higher effective data rate without increasing the symbol rate. Used in high-resolution cameras where D-PHY lane count limits bandwidth.

VLSI implementation: The CSI-2 receiver on a Snapdragon SoC consists of a D-PHY frontend (analog deserializer), a lane merger, a CSI-2 protocol decoder, and an interface to the Image Signal Processor (ISP). It must process pixels faster than they arrive to prevent FIFO overflow — typically 500 MHz+ operating frequency.
