DAY 16 · TIMING

Timing Constraints & Timing Closure

By EcrioniX · Updated Jun 11, 2026

Your design simulates perfectly but fails mysteriously on real hardware. Very often the culprit is timing: RTL simulation checks your logic, not the propagation delays of the real silicon. This lesson explains setup time, hold time, critical paths, and WNS/TNS. You will write real XDC constraints and build a pipelined adder that shows how pipeline registers break a critical path and raise Fmax.

1. Setup time and hold time

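Every flip-flop has two timing requirements around its active clock edge:

- Setup time (Tsu): the data input must be stable for at least Tsu before the edge.
- Hold time (Th): the data input must stay stable for at least Th after the edge.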

The timing formula

For a path from flip-flop A to flip-flop B:
Slack = Tclk − (Tco_A + Tcomb + Tsu_B)

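Here Tco_A is flip-flop A's clock-to-output delay, Tcomb is the combinational logic plus routing delay between A and B, and Tsu_B is B's setup time. This simple form assumes the clock reaches A and B at the same instant; the tool also accounts for clock skew and clock uncertainty. If the slack is negative, the path fails setup: B may capture the wrong value on real hardware even though RTL simulation passes. Timing reports list such paths with negative slack.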
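As a worked illustration, here are assumed round numbers plugged into the formula (Tco_A = 0.4 ns, Tsu_B = 0.1 ns, and the Tcomb values are made up for the example, not datasheet figures for any device). The last line is the matching hold check under the same zero-skew assumption: the earliest data arrival at B must come after B's hold window closes.

```latex
% Setup check at T_clk = 10 ns (100 MHz); T_co,A = 0.4 ns and T_su,B = 0.1 ns are assumed
\begin{aligned}
\text{Slack}_{\text{setup}} &= T_{clk} - (T_{co,A} + T_{comb} + T_{su,B}) \\
T_{comb} = 8.0\ \text{ns}: \quad \text{Slack}_{\text{setup}} &= 10 - (0.4 + 8.0 + 0.1) = +1.5\ \text{ns} \quad \text{(meets timing)} \\
T_{comb} = 9.8\ \text{ns}: \quad \text{Slack}_{\text{setup}} &= 10 - (0.4 + 9.8 + 0.1) = -0.3\ \text{ns} \quad \text{(violation)} \\[6pt]
% Hold check, zero clock skew; T_comb = 0.2 ns and T_h,B = 0.1 ns are assumed
\text{Slack}_{\text{hold}} &= (T_{co,A} + T_{comb}) - T_{h,B} = (0.4 + 0.2) - 0.1 = +0.5\ \text{ns}
\end{aligned}
```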

2. WNS, TNS and timing closure

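Timing reports summarize slack across every timed path with the metrics below. A design has reached timing closure when all of these targets are met at the intended clock.

| Metric | Meaning | Target |
|---|---|---|
| WNS | Worst Negative Slack: slack of the single worst setup path | ≥ 0 ns |
| TNS | Total Negative Slack: sum of the negative setup slacks of all failing endpoints | 0 ns |
| WHS | Worst Hold Slack: slack of the single worst hold path | ≥ 0 ns |
| THS | Total Hold Slack: sum of the negative hold slacks of all failing endpoints | 0 ns |
| Fmax | Maximum clock frequency at which all setup paths still meet timing | ≥ your target |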
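A small worked example of how these numbers relate, using four hypothetical endpoint slacks at this lesson's 10 ns clock. The Fmax line uses the common single-clock approximation Fmax ≈ 1/(Tclk − WNS); treat it only as an estimate, because re-implementing at a new period changes placement and routing.

```latex
% Hypothetical setup slacks of four endpoints (ns) at T_clk = 10 ns
\begin{aligned}
s &= \{+1.5,\ -0.3,\ -0.2,\ +0.7\} \\
\text{WNS} &= \min_i s_i = -0.3\ \text{ns} \\
\text{TNS} &= \sum_{i:\, s_i < 0} s_i = (-0.3) + (-0.2) = -0.5\ \text{ns} \\
F_{max} &\approx \frac{1}{T_{clk} - \text{WNS}} = \frac{1}{10.3\ \text{ns}} \approx 97.1\ \text{MHz}
\end{aligned}
```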

3. XDC constraint examples

constraints.xdc
# constraints.xdc — example timing constraints for Basys3/Nexys4 (100 MHz)

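# 1. Define the 100 MHz board clock (period = 10 ns).
#    create_clock only describes timing; the pin is assigned separately.
#    Basys3's oscillator is on pin W5 (check the master XDC for other boards).
set_property PACKAGE_PIN W5 [get_ports clk]
set_property IOSTANDARD LVCMOS33 [get_ports clk]
create_clock -period 10.000 -name sys_clk -waveform {0.000 5.000} [get_ports clk]

# 2. Input delay: how long after the clock edge external inputs arrive
#    -max 2.0: data is valid at most 2 ns after the edge (setup analysis)
#    -min 0.5: data changes no earlier than 0.5 ns after the edge (hold analysis)
set_input_delay -clock sys_clk -max 2.0 [get_ports {a[*] b[*] valid_in}]
set_input_delay -clock sys_clk -min 0.5 [get_ports {a[*] b[*] valid_in}]

# 3. Output delay: how much of the period the external receiver consumes
#    -max 2.0: the receiving device needs data 2 ns before the next edge
#    -min 0.5: used for hold analysis on the output paths (example value)
set_output_delay -clock sys_clk -max 2.0 [get_ports {result[*] valid_out}]
set_output_delay -clock sys_clk -min 0.5 [get_ports {result[*] valid_out}]

# 4. False path: rst comes from a push-button with no timing relationship
#    to sys_clk, so it is excluded from timing analysis. pipe_add still uses
#    rst as a synchronous reset; a production design should synchronize it first.
set_false_path -from [get_ports rst]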
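To see what the I/O delays above mean for the design, here is a simplified setup budget that ignores clock-network delay and uncertainty (the tool accounts for both). The input delay is subtracted from the period for paths that start at a port, and the output delay for paths that end at a port.

```latex
% Simplified setup budgets at T_clk = 10 ns, using the -max values in constraints.xdc
\begin{aligned}
\text{input port} \to \text{first register:} \quad & T_{comb} + T_{su} \le T_{clk} - T_{in,max} = 10 - 2.0 = 8.0\ \text{ns} \\
\text{last register} \to \text{output port:} \quad & T_{co} + T_{comb} \le T_{clk} - T_{out,max} = 10 - 2.0 = 8.0\ \text{ns}
\end{aligned}
```

In pipe_add, the stage-1 low-byte adder sits directly on the input-port path, so it must fit inside that 8 ns input budget, while result and valid_out come straight from registers.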

4. Port table — pipe_add

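| Port | Dir | Width | Description |
|---|---|---|---|
| clk | IN | 1 | System clock |
| rst | IN | 1 | Synchronous reset, active high |
| a | IN | 16 | First addend, captured at stage 1 |
| b | IN | 16 | Second addend, captured at stage 1 |
| valid_in | IN | 1 | Marks a and b as valid on this cycle |
| result | OUT | 17 | Sum a + b, valid 2 cycles after the inputs are sampled (2-stage pipeline) |
| valid_out | OUT | 1 | valid_in delayed through the pipeline alongside result |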

5. pipe_add.v — 2-stage pipelined adder

pipe_add.v
// pipe_add.v — 2-stage pipelined 16-bit adder
// Stage 1: capture inputs and compute partial sum (lower 8 bits + carry)
// Stage 2: complete upper 8 bits + combine — result appears 2 cycles later
// This shows how pipeline registers break the critical path.

module pipe_add (
    input  wire        clk,
    input  wire        rst,
    input  wire [15:0] a,
    input  wire [15:0] b,
    input  wire        valid_in,
    output reg  [16:0] result,
    output reg         valid_out
);

// ---- Stage 1 registers ----
// Latch inputs and compute low-byte sum
reg [7:0]  s1_sum_lo;   // a[7:0] + b[7:0]
reg        s1_carry;     // carry out from low byte
reg [7:0]  s1_a_hi;      // upper bytes passed through
reg [7:0]  s1_b_hi;
reg        s1_valid;

always @(posedge clk) begin
    if (rst) begin
        s1_sum_lo <= 0;
        s1_carry  <= 0;
        s1_a_hi   <= 0;
        s1_b_hi   <= 0;
        s1_valid  <= 0;
    end else begin
        {s1_carry, s1_sum_lo} <= a[7:0] + b[7:0];   // critical path: 8-bit adder
        s1_a_hi  <= a[15:8];
        s1_b_hi  <= b[15:8];
        s1_valid <= valid_in;
    end
end

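// ---- Stage 2 registers ----
// Compute upper-byte sum using carry from stage 1.
// s2_sum_hi is 9 bits wide so the final carry-out (bit 16 of result) is kept.
// Writing the sum directly inside {...} would make it self-determined
// (only 8 bits wide) and silently drop that carry, e.g. 0xFFFF + 1 -> 0.
wire [8:0] s2_sum_hi = s1_a_hi + s1_b_hi + s1_carry;   // 8-bit adder + carry-in

always @(posedge clk) begin
    if (rst) begin
        result    <= 0;
        valid_out <= 0;
    end else begin
        result    <= {s2_sum_hi, s1_sum_lo};   // 9 + 8 = 17 bits
        valid_out <= s1_valid;
    end
end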

endmodule

Why does pipelining help Fmax?

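Without pipelining, the whole 16-bit add sits in one register-to-register path, and the carry can ripple across all 16 bit positions within a single clock period. Splitting the add at bit 8 leaves each stage with an 8-bit carry chain, which shortens Tcomb in the slack formula, at the cost of one extra cycle of latency. If the 16-bit chain took ~3 ns, each 8-bit half would take about 1.5 ns (illustrative figures, not measurements). Fmax does not double, however: Tco and Tsu are paid in full in every stage, and routing delay does not shrink in proportion. On Xilinx 7-series devices the dedicated CARRY4 carry logic already makes a 16-bit add fast, so the measured gain for this small adder is likely to be modest; the same technique matters far more for wide adders, multipliers, and deep logic.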
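A rough model of the trade-off, obtained by setting slack to zero in the timing formula. Tco = 0.5 ns and Tsu = 0.1 ns are assumed values, and the 3 ns and 1.5 ns adder delays are the illustrative figures from the paragraph above, not measurements:

```latex
\begin{aligned}
F_{max} &= \frac{1}{T_{co} + T_{comb} + T_{su}} \\
\text{unpipelined:} \quad F_{max} &= \frac{1}{0.5 + 3.0 + 0.1\ \text{ns}} = \frac{1}{3.6\ \text{ns}} \approx 277.8\ \text{MHz} \\
\text{2-stage:} \quad F_{max} &= \frac{1}{0.5 + 1.5 + 0.1\ \text{ns}} = \frac{1}{2.1\ \text{ns}} \approx 476.2\ \text{MHz} \quad (\approx 1.71\times,\ \text{not } 2\times)
\end{aligned}
```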

6. Testbench — tb_pipe_add.v

tb_pipe_add.v
// tb_pipe_add.v — self-checking testbench for pipe_add
// Accounts for 2-cycle pipeline latency
`timescale 1ns/1ps

module tb_pipe_add;

reg        clk = 0;
reg        rst = 1;
reg [15:0] a = 0, b = 0;
reg        valid_in = 0;
wire[16:0] result;
wire       valid_out;

pipe_add dut(.clk(clk),.rst(rst),.a(a),.b(b),.valid_in(valid_in),
             .result(result),.valid_out(valid_out));

always #5 clk = ~clk;

integer pass_cnt = 0, fail_cnt = 0;

initial begin
    $dumpfile("tb_pipe_add.vcd");
    $dumpvars(0, tb_pipe_add);

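    repeat(4) @(posedge clk);
    rst <= 0;   // nonblocking, so the DUT sees the change cleanly on the next edge

    // Send four back-to-back inputs, one per clock.
    // Each result appears on valid_out 2 cycles after its inputs are sampled.
    @(posedge clk); a<=16'h0001; b<=16'h0002; valid_in<=1;
    @(posedge clk); a<=16'hFFFF; b<=16'h0001; valid_in<=1;
    @(posedge clk); a<=16'h1234; b<=16'h5678; valid_in<=1;
    @(posedge clk); a<=16'hAAAA; b<=16'h5555; valid_in<=1;
    @(posedge clk); valid_in<=0;

    // Wait out the 2-cycle latency plus margin. When all 4 results arrive,
    // the monitor below prints the summary and calls $finish first.
    repeat(4) @(posedge clk);

    // Reached only if fewer than 4 valid results came out of the pipeline
    $display("\nFAILED: not all results were observed on valid_out");
    $finish;
end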

// Self-checking monitor
reg [16:0] expected_vals [0:3];
integer    check_idx = 0;
initial begin
    expected_vals[0] = 17'h00003;  // 1+2
    expected_vals[1] = 17'h10000;  // 0xFFFF+1 = 0x10000
    expected_vals[2] = 17'h068AC;  // 0x1234+0x5678
    expected_vals[3] = 17'h0FFFF;  // 0xAAAA+0x5555
end

always @(posedge clk) begin
    if (valid_out && check_idx < 4) begin
        if (result === expected_vals[check_idx]) begin
            $display("PASS [%0d]: result=0x%05X", check_idx, result);
            pass_cnt = pass_cnt + 1;
        end else begin
            $display("FAIL [%0d]: got=0x%05X exp=0x%05X", check_idx, result, expected_vals[check_idx]);
            fail_cnt = fail_cnt + 1;
        end
        check_idx = check_idx + 1;
        if (check_idx == 4) begin
            if (fail_cnt == 0)
                $display("\nALL TESTS PASSED (%0d/4)", pass_cnt);
            else
                $display("\nFAILED: %0d passed, %0d failed", pass_cnt, fail_cnt);
            $finish;
        end
    end
end

initial #2000 begin $display("TIMEOUT"); $finish; end

endmodule

7. Expected output

PASS [0]: result=0x00003
PASS [1]: result=0x10000
PASS [2]: result=0x068AC
PASS [3]: result=0x0FFFF

ALL TESTS PASSED (4/4)


Frequently Asked Questions

What is setup time and hold time?

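Setup time (Tsu) is the minimum time data must be stable before the clock edge. Hold time (Th) is the minimum time data must remain stable after the clock edge. A setup violation means the data arrives too late, so the flip-flop may capture the old value or go metastable. A hold violation means the next value arrives too soon and corrupts the one being captured, which can also cause metastability. Lowering the clock frequency fixes setup violations but not hold violations, because the hold check does not depend on the clock period.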

What does WNS mean in timing reports?

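WNS (Worst Negative Slack) is the slack of the most critical setup path. If it is negative, that path does not fit in the target clock period, so the design cannot be relied on at that frequency. TNS (Total Negative Slack) is the sum of the negative slacks of all failing endpoints. Timing closure requires WNS ≥ 0 and TNS = 0, together with the hold-side equivalents WHS ≥ 0 and THS = 0.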
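In Vivado, these numbers come from the timing summary report. A minimal sketch of Tcl console commands to run on an open implemented design; the output file name timing_summary.rpt is a hypothetical choice:

```tcl
# Run in the Vivado Tcl console after implementation (not inside the .xdc file)
report_clocks                  ;# confirm sys_clk exists with a 10 ns period
check_timing                   ;# flag unconstrained ports, missing clocks, etc.
# Setup (max) and hold (min) summary: WNS, TNS, WHS, THS plus the worst paths
report_timing_summary -delay_type min_max -max_paths 10 -file timing_summary.rpt
# Detailed view of the 5 worst setup paths, printed to the console
report_timing -delay_type max -max_paths 5
```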

Why does pipelining improve Fmax?

Pipelining inserts registers between combinational stages, giving each stage a shorter critical path. The clock can run faster (higher Fmax) at the cost of extra clock cycles of latency before results appear.
