DAY 16 · TIMING

Timing Constraints & Timing Closure

Q: What is setup time and hold time?

Setup time (Tsu) is the minimum time data must be stable BEFORE the clock edge for reliable capture. Hold time (Th) is the minimum time data must remain stable AFTER the clock edge. Violating setup time causes data to be missed (output metastable or wrong). Violating hold time causes the flip-flop to capture a transitioning value.

Q: What does WNS mean in timing reports?

WNS (Worst Negative Slack) is the slack of the most critical setup path in your design. If WNS is negative, the design is not guaranteed to run at the target clock frequency. TNS (Total Negative Slack) is the sum of all negative slack values, so it is never positive. To achieve timing closure, WNS must be ≥ 0 and TNS must be 0; the same applies to the hold metrics WHS and THS.

Q: Why does pipelining improve Fmax?

Pipelining inserts registers between combinational stages, breaking a long computation into smaller pieces. Each stage has a shorter critical path and therefore needs less time between clock edges. The clock can run faster (higher Fmax) at the cost of adding latency (extra clock cycles before the result appears).

By EcrioniX · Updated Jun 11, 2026

Your design simulates perfectly but fails mysteriously on real hardware. The culprit is often timing. This lesson demystifies setup time, hold time, critical paths, and WNS/TNS. You will write real XDC constraints and build a pipelined adder that demonstrates how pipeline registers break critical paths and push Fmax higher.

1. Setup time and hold time

Every flip-flop has two timing requirements around its clock edge:

Setup time (Tsu) — data must be stable for at least Tsu before the clock edge. If data arrives too late, the flip-flop may output a wrong value or enter metastability.
Hold time (Th) — data must remain stable for at least Th after the clock edge. If data changes too quickly, the flip-flop can latch the wrong value.
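
The two requirements above can also be expressed in Verilog itself. The following is a minimal sketch, assuming a hypothetical behavioural flip-flop named dff_checked and made-up limits of 0.3 ns (setup) and 0.1 ns (hold); neither the module nor the numbers come from this lesson or from a device datasheet. It uses the standard $setup and $hold timing checks inside a specify block. Simulator support for timing checks varies, so some simulators may parse the block without reporting violations.

```verilog
`timescale 1ns/1ps

// dff_checked: hypothetical behavioural flip-flop with setup/hold checks.
// The limits below are illustrative, not taken from any device datasheet.
module dff_checked (
    input  wire clk,
    input  wire d,
    output reg  q
);

// Plain rising-edge D flip-flop
always @(posedge clk)
    q <= d;

specify
    specparam TSU = 0.3;   // setup limit in ns (assumed value)
    specparam TH  = 0.1;   // hold limit in ns (assumed value)

    // Report a violation if d changes less than TSU before the rising edge
    $setup(d, posedge clk, TSU);
    // Report a violation if d changes less than TH after the rising edge
    $hold(posedge clk, d, TH);
endspecify

endmodule
```

In a simulator that evaluates timing checks, toggling d 0.2 ns before a rising clock edge would trigger the setup check, and toggling it 0.05 ns after the edge would trigger the hold check.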

The timing formula

For a path from flip-flop A to flip-flop B:
Slack = Tclk − (Tco_A + Tcomb + Tsu_B)

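Where Tco_A is the clock-to-output delay of A, Tcomb is the combinational logic and routing delay between the two flip-flops, and Tsu_B is the setup time of B. This simplified form ignores clock skew and clock uncertainty, which the tool also accounts for. If slack < 0, the path is not guaranteed to work at that clock period, and the tool reports it as negative slack.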
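
As a worked example of the formula, the numbers below are hypothetical values chosen for illustration, not figures from a real timing report. A simplified hold check is shown alongside for comparison; it also ignores clock skew, and T_{comb,min} (the shortest combinational delay on the path) and T_{h,B} (the hold time of B) are symbols introduced here.

```latex
\begin{align*}
\text{Setup slack} &= T_{clk} - (T_{co,A} + T_{comb} + T_{su,B}) \\
  &= 10.0 - (0.5 + 7.2 + 0.3) = +2.0\ \text{ns} \quad \text{(meets timing)} \\[4pt]
\text{With } T_{comb} = 9.6\ \text{ns:}\quad
\text{Setup slack} &= 10.0 - (0.5 + 9.6 + 0.3) = -0.4\ \text{ns} \quad \text{(fails)} \\[4pt]
\text{Hold slack} &= (T_{co,A} + T_{comb,min}) - T_{h,B} \\
  &= (0.5 + 0.4) - 0.2 = +0.7\ \text{ns}
\end{align*}
```

The hold check does not involve the clock period, so lowering the clock frequency fixes a setup violation but cannot fix a hold violation.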

2. WNS, TNS and timing closure

Metric	Meaning	Target
WNS	Worst Negative Slack — the single worst timing path	≥ 0 ns
TNS	Total Negative Slack — sum of all negative-slack paths	0 ns
WHS	Worst Hold Slack	≥ 0 ns
THS	Total Hold Slack	0 ns
Fmax	Maximum achievable clock frequency	≥ your target
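
To make the metrics concrete, here is a small worked example with three hypothetical path slacks at a 10 ns clock period. The Fmax line uses a common estimate, 1 / (Tclk − WNS), which is an assumption of this sketch and not a figure read directly from a timing report.

```latex
\begin{align*}
\text{Path slacks at } T_{clk} = 10\ \text{ns:}\quad & -0.4,\ -0.1,\ +0.3\ \text{ns} \\
\text{WNS} &= \min(-0.4,\ -0.1,\ +0.3) = -0.4\ \text{ns} \\
\text{TNS} &= (-0.4) + (-0.1) = -0.5\ \text{ns} \\
F_{max} &\approx \frac{1}{T_{clk} - \text{WNS}} = \frac{1}{10.4\ \text{ns}} \approx 96.2\ \text{MHz}
\end{align*}
```

Only the failing paths contribute to TNS, so a design that meets timing has a TNS of exactly 0 even when some paths have plenty of positive slack.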

3. XDC constraint examples

constraints.xdc

# constraints.xdc — example timing constraints for Basys3/Nexys4 (100 MHz)

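# 1. Define the 100 MHz clock on port clk (period = 10 ns)
#    On the Basys3 the oscillator is on pin W5; the pin assignment itself
#    (PACKAGE_PIN / IOSTANDARD) belongs in the board's master XDC and is not shown here.
create_clock -period 10.000 -name sys_clk -waveform {0.000 5.000} [get_ports clk]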

# 2. Input delay: tell the tool how long after the clock edge inputs arrive
#    -max is the latest arrival (used for setup), -min the earliest (used for hold)
#    Here external data becomes valid between 0.5 ns and 2 ns after the board clock edge:
set_input_delay -clock sys_clk -max 2.0 [get_ports {a[*] b[*] valid_in}]
set_input_delay -clock sys_clk -min 0.5 [get_ports {a[*] b[*] valid_in}]

# 3. Output delay: tell the tool how much of the clock period the external path needs
#    -max 2.0 means the external device needs data 2 ns before the next clock edge:
set_output_delay -clock sys_clk -max 2.0 [get_ports {result[*] valid_out}]
set_output_delay -clock sys_clk -min 0.5 [get_ports {result[*] valid_out}]

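# 4. False path: exclude the rst port from timing analysis
#    This is only safe when rst comes from a slow asynchronous source (such as a
#    push button) and is synchronised inside the design. pipe_add treats rst as a
#    synchronous reset, so a reset that must be timed needs set_input_delay instead.
set_false_path -from [get_ports rst]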

4. Port table — pipe_add

Port	Dir	Width	Description
clk	IN	1	System clock
rst	IN	1	Synchronous reset
a	IN	16	First addend, captured at stage 1
b	IN	16	Second addend, captured at stage 1
valid_in	IN	1	Marks a and b as valid in the current cycle
result	OUT	17	Sum a+b, valid 2 cycles after inputs (2-stage pipeline)
valid_out	OUT	1	Valid flag, pipelined alongside result

5. pipe_add.v — 2-stage pipelined adder

pipe_add.v

// pipe_add.v — 2-stage pipelined 16-bit adder
// Stage 1: capture inputs and compute partial sum (lower 8 bits + carry)
// Stage 2: complete upper 8 bits + combine — result appears 2 cycles later
// This shows how pipeline registers break the critical path.

module pipe_add (
    input  wire        clk,
    input  wire        rst,
    input  wire [15:0] a,
    input  wire [15:0] b,
    input  wire        valid_in,
    output reg  [16:0] result,
    output reg         valid_out
);

// ---- Stage 1 registers ----
// Latch inputs and compute low-byte sum
reg [7:0]  s1_sum_lo;   // a[7:0] + b[7:0]
reg        s1_carry;     // carry out from low byte
reg [7:0]  s1_a_hi;      // upper bytes passed through
reg [7:0]  s1_b_hi;
reg        s1_valid;

always @(posedge clk) begin
    if (rst) begin
        s1_sum_lo <= 0;
        s1_carry  <= 0;
        s1_a_hi   <= 0;
        s1_b_hi   <= 0;
        s1_valid  <= 0;
    end else begin
        {s1_carry, s1_sum_lo} <= a[7:0] + b[7:0];   // critical path: 8-bit adder
        s1_a_hi  <= a[15:8];
        s1_b_hi  <= b[15:8];
        s1_valid <= valid_in;
    end
end

// ---- Stage 2 registers ----
// Compute upper-byte sum using carry from stage 1
always @(posedge clk) begin
    if (rst) begin
        result    <= 0;
        valid_out <= 0;
    end else begin
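        // Operands inside a concatenation are self-determined, so widen them to
        // 9 bits; otherwise the carry out of the upper byte (result[16]) is lost.
        result    <= {({1'b0, s1_a_hi} + {1'b0, s1_b_hi} + s1_carry), s1_sum_lo};  // 8-bit adder + carry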
        valid_out <= s1_valid;
    end
end

endmodule

Why does pipelining help Fmax?

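Without pipelining, the carry of a 16-bit adder has to propagate through all 16 bit positions within one clock period. As an illustrative figure, suppose that takes ~3 ns on a Xilinx 7-series part; the adder alone would then limit Fmax to ~333 MHz. Splitting at bit 8 leaves each stage with an 8-bit carry chain (~1.5 ns in this example), roughly halving the adder delay at the cost of 1 extra cycle of latency. The real gain in Fmax is somewhat less than 2× because Tco, Tsu and routing delay do not shrink with the logic.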
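
One way to see the difference in a timing report is to synthesise a non-pipelined version with the same constraints and compare WNS. The following is a minimal sketch of such a baseline; the module name flat_add is hypothetical and is not one of this lesson's files. It keeps the same ports as pipe_add but performs the full 16-bit addition in a single stage, so the result appears 1 cycle after the inputs instead of 2.

```verilog
// flat_add.v: hypothetical single-stage baseline for comparison with pipe_add.
// Same ports as pipe_add, but the whole 16-bit carry chain sits in one stage.
module flat_add (
    input  wire        clk,
    input  wire        rst,
    input  wire [15:0] a,
    input  wire [15:0] b,
    input  wire        valid_in,
    output reg  [16:0] result,
    output reg         valid_out
);

always @(posedge clk) begin
    if (rst) begin
        result    <= 0;
        valid_out <= 0;
    end else begin
        // The 17-bit left-hand side makes the addition 17 bits wide,
        // so the carry out of bit 15 lands in result[16].
        result    <= a + b;        // critical path: full 16-bit adder
        valid_out <= valid_in;     // valid flag follows the data, 1 cycle later
    end
end

endmodule
```

Running both modules through implementation with the same constraints file should show less slack on the adder path of flat_add than on either stage of pipe_add, although the exact numbers depend on the device and tool version.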

6. Testbench — tb_pipe_add.v

tb_pipe_add.v

// tb_pipe_add.v — self-checking testbench for pipe_add
// Accounts for 2-cycle pipeline latency
`timescale 1ns/1ps

module tb_pipe_add;

reg        clk = 0;
reg        rst = 1;
reg [15:0] a = 0, b = 0;
reg        valid_in = 0;
wire[16:0] result;
wire       valid_out;

pipe_add dut(.clk(clk),.rst(rst),.a(a),.b(b),.valid_in(valid_in),
             .result(result),.valid_out(valid_out));

always #5 clk = ~clk;

integer pass_cnt = 0, fail_cnt = 0;

// Drive one input pair into the pipeline on the next rising edge.
// Non-blocking assignments avoid racing the DUT's clocked blocks.
task apply;
    input [15:0] ta, tb;
    begin
        @(posedge clk);
        a <= ta; b <= tb; valid_in <= 1;
    end
endtask

initial begin
    $dumpfile("tb_pipe_add.vcd");
    $dumpvars(0, tb_pipe_add);

    // Hold reset for 4 cycles, then release it
    repeat(4) @(posedge clk);
    rst <= 0;

    // Send a series of back-to-back inputs
    // Each result appears on valid_out 2 cycles after its input is sampled
    apply(16'h0001, 16'h0002);
    apply(16'hFFFF, 16'h0001);
    apply(16'h1234, 16'h5678);
    apply(16'hAAAA, 16'h5555);
    @(posedge clk); valid_in <= 0;

    // Allow 4 more cycles, enough for the 2-stage pipeline to drain.
    // The monitor below normally calls $finish first, after the 4th check;
    // reaching this point means some results never appeared.
    repeat(4) @(posedge clk);
    $display("FAILED: only %0d of 4 results seen", pass_cnt + fail_cnt);
    $finish;
end

// Self-checking monitor
reg [16:0] expected_vals [0:3];
integer    check_idx = 0;
initial begin
    expected_vals[0] = 17'h00003;  // 1+2
    expected_vals[1] = 17'h10000;  // 0xFFFF+1 = 0x10000
    expected_vals[2] = 17'h068AC;  // 0x1234+0x5678
    expected_vals[3] = 17'h0FFFF;  // 0xAAAA+0x5555
end

always @(posedge clk) begin
    if (valid_out && check_idx < 4) begin
        if (result === expected_vals[check_idx]) begin
            $display("PASS [%0d]: result=0x%05X", check_idx, result);
            pass_cnt = pass_cnt + 1;
        end else begin
            $display("FAIL [%0d]: got=0x%05X exp=0x%05X", check_idx, result, expected_vals[check_idx]);
            fail_cnt = fail_cnt + 1;
        end
        check_idx = check_idx + 1;
        if (check_idx == 4) begin
            if (fail_cnt == 0)
                $display("\nALL TESTS PASSED (%0d/4)", pass_cnt);
            else
                $display("\nFAILED: %0d passed, %0d failed", pass_cnt, fail_cnt);
            $finish;
        end
    end
end

initial #2000 begin $display("TIMEOUT"); $finish; end

endmodule

7. Expected output

PASS [0]: result=0x00003
PASS [1]: result=0x10000
PASS [2]: result=0x068ac
PASS [3]: result=0x0ffff

ALL TESTS PASSED (4/4)

Key Takeaways

Setup slack = clock period − (Tco + Tcomb + Tsu). Negative slack means timing failure.
WNS is the slack of the single worst path; TNS is the sum of all negative slacks. Timing closure needs WNS ≥ 0 and TNS = 0.
Pipelining breaks long combinational paths, raising Fmax at the cost of added latency.
create_clock is mandatory — without it the tools assume an infinite period and make no effort to optimise timing.
Always specify set_input_delay and set_output_delay on I/O ports for correct I/O timing analysis.

Frequently Asked Questions

What is setup time and hold time?

Setup time (Tsu) is the minimum time data must be stable before the clock edge. Hold time (Th) is the minimum time data must remain stable after the clock edge. Violating setup time causes wrong data capture. Violating hold time can cause the flip-flop to latch a transitioning value.

What does WNS mean in timing reports?

WNS (Worst Negative Slack) is the slack of the most critical timing path. If it is negative, the design is not guaranteed to run at the target clock. TNS (Total Negative Slack) is the sum of all negative slack values. Timing closure needs WNS ≥ 0 and TNS = 0.

Why does pipelining improve Fmax?

Pipelining inserts registers between combinational stages, giving each stage a shorter critical path. The clock can run faster (higher Fmax) at the cost of extra clock cycles of latency before results appear.