Off-Chip Protocol

SPI – Serial Peripheral Interface

SPI is a synchronous, full-duplex serial protocol developed by Motorola. A single master drives the clock and selects slaves individually via dedicated chip-select lines, exchanging data simultaneously on two unidirectional data lines. Its simplicity and speed make it the de-facto interface for sensors, flash memory, ADCs, DACs, and display controllers.

Type — Synchronous, full-duplex
Wires — 4 (SCLK, MOSI, MISO, CS_N)
Topology — Single master, multi-slave
Speed — Up to tens of MHz (device-dependent)
Standard — De-facto (no formal spec)

Overview

SPI (Serial Peripheral Interface) is a synchronous serial bus in which each transfer is point-to-point between the master and one selected slave. The master generates the clock and controls every transfer; slaves are purely reactive. MOSI and MISO operate simultaneously: on each clock cycle one bit shifts out of the master's shift register while one bit shifts in, so every SPI transaction is a full-duplex swap of two shift registers.
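
The shift-register swap can be illustrated with a small behavioural sketch. It is a simulation-only toy, not part of the SPI master shown later; the module name swap_demo and the register names are made up for this example, and the clock is implied by the loop rather than modelled.

```verilog
// Behavioural sketch: two 8-bit shift registers exchange contents in 8 bit times.
// All names here (swap_demo, m_sr, s_sr, ...) are illustrative assumptions.
module swap_demo;
  reg [7:0] m_sr, s_sr;   // master and slave shift registers
  reg       m_in, s_in;   // bits captured on the sampling edge
  integer   i;

  initial begin
    m_sr = 8'hA5;         // value the master sends
    s_sr = 8'h3C;         // value the slave returns
    for (i = 0; i < 8; i = i + 1) begin
      // Sampling edge: each side captures the other's MSB.
      m_in = s_sr[7];     // what the master sees on MISO
      s_in = m_sr[7];     // what the slave sees on MOSI
      // Shifting edge: each side shifts left and takes in the captured bit.
      m_sr = {m_sr[6:0], m_in};
      s_sr = {s_sr[6:0], s_in};
    end
    $display("master=%h slave=%h", m_sr, s_sr);  // prints: master=3c slave=a5
    $finish;
  end
endmodule
```

After eight bit times the two registers have traded contents, which is why a "write-only" or "read-only" SPI transfer still moves data in both directions.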

There is no formal SPI specification, and vendors implement variations, but the core four-wire interface and shift-register model are consistent across devices. The most common source of incompatibility is the clock polarity and phase (CPOL/CPHA), which together define which clock edge samples data.

SPI vs I²C: SPI is faster (tens of MHz vs 100 kHz–3.4 MHz for I²C), full-duplex, and has no addressing overhead, but it requires one CS line per slave. I²C needs only two wires for a multi-master bus with addressing. Choose SPI for speed; choose I²C for bus simplicity.

Signal Reference

SPI uses four signals. All are driven by the master except MISO, which is driven by the selected slave.

Signal | Full Name | Direction | Description
SCLK | Serial Clock | Master → Slave | Clock generated by the master. Idle polarity is set by CPOL; frequency is set by the master's clock divider.
MOSI | Master Out Slave In | Master → Slave | Serial data from master to selected slave. MSB transmitted first by convention.
MISO | Master In Slave Out | Slave → Master | Serial data returned from selected slave. Only the active slave drives this line; others must tri-state it.
CS_N | Chip Select (active low) | Master → Slave | Selects the target slave. One dedicated CS_N line per slave. Also called SS_N (Slave Select).

MISO contention: In a multi-slave system, unselected slaves must tri-state (high-Z) their MISO output. If a slave device has no tri-state capability, a daisy-chain topology (see Variants) must be used instead.
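
On the slave side, the tri-state requirement comes down to a single conditional driver. The sketch below is an assumption about how a slave might be written, with hypothetical module and port names; it is not taken from any particular device.

```verilog
// Sketch of a slave-side MISO driver: drive only while selected, else high-Z.
// Module and port names are illustrative, not taken from a specific device.
module miso_tristate (
  input  wire cs_n,      // active-low chip select from the master
  input  wire data_bit,  // bit the slave wants to present (e.g. shift-register MSB)
  output wire miso       // shared MISO line
);
  // When cs_n is high the slave releases the line so another slave can drive it.
  assign miso = cs_n ? 1'bz : data_bit;
endmodule
```

On FPGAs, tri-state buffers generally exist only at the I/O pins, so a driver like this belongs at the top level of the slave design rather than deep inside it.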

CPOL / CPHA Modes

CPOL (Clock Polarity) sets the idle state of SCLK. CPHA (Clock Phase) selects which clock edge captures data. Together they define four SPI modes.

Mode | CPOL | CPHA | Clock idle | Sample edge | Shift edge | Common devices
0 | 0 | 0 | LOW | Rising ↑ | Falling ↓ | Most sensors, SD cards, STM32 default
1 | 0 | 1 | LOW | Falling ↓ | Rising ↑ | Some ADCs, shift registers
2 | 1 | 0 | HIGH | Falling ↓ | Rising ↑ | Some display controllers
3 | 1 | 1 | HIGH | Rising ↑ | Falling ↓ | SPI flash (QSPI mode 3)

Mode 0 (most common): CPOL=0, CPHA=0. Clock idle LOW. Data valid before first rising edge. Sample on ↑, shift on ↓.
Mode 1: CPOL=0, CPHA=1. Clock idle LOW. First clock edge shifts data, second edge samples it.
Mode 2: CPOL=1, CPHA=0. Clock idle HIGH. Data valid before first falling edge. Sample on ↓.
Mode 3: CPOL=1, CPHA=1. Clock idle HIGH. Mirror of Mode 0: same edges, opposite polarity. Used by QSPI flash.

Modes 0 and 3 both sample on the rising edge and shift on the falling edge; they differ only in the idle clock level, which is why some devices (many SPI flash parts, for example) accept either. Modes 1 and 2 are paired in the same way, both sampling on the falling edge. Always check the slave device datasheet for its required mode.
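
The mode table can also be written as a few lines of combinational logic. This is a minimal sketch, assuming the usual numbering in which the mode number is {CPOL, CPHA}; the module and output names are hypothetical.

```verilog
// Sketch: decode an SPI mode number into CPOL, CPHA and the sampling edge.
// Assumes mode = {CPOL, CPHA}; names are illustrative only.
module spi_mode_decode (
  input  wire [1:0] mode,             // SPI mode number 0..3
  output wire       cpol,             // idle level of SCLK
  output wire       cpha,             // 0 = sample on first edge, 1 = on second edge
  output wire       sample_on_rising  // 1 for modes 0 and 3, 0 for modes 1 and 2
);
  assign cpol = mode[1];
  assign cpha = mode[0];
  // The sample edge is rising exactly when CPOL and CPHA are equal.
  assign sample_on_rising = ~(cpol ^ cpha);
endmodule
```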

Mode 0 Timing Diagram

An 8-bit Mode 0 (CPOL=0, CPHA=0) transfer. CS_N asserts low, MOSI presents MSB (D7) immediately, and the master samples MISO on every rising edge of SCLK. MOSI advances to the next bit on every falling edge.

[Figure: Mode 0 timing diagram showing CS_N, SCLK (CPOL=0, idle LOW), MOSI (D7..D0, shifts on ↓), and MISO (Q7..Q0), with sample points marked on the rising edges.]

Green dashed columns = rising edges (SCLK ↑) → MISO sampled by master, MOSI sampled by slave. MOSI advances on falling edges (↓). Both MOSI (D7..D0) and MISO (Q7..Q0) carry independent 8-bit values exchanged simultaneously.

Mode 3 Comparison (CPOL=1, CPHA=1)

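In Mode 3, SCLK idles HIGH. Data is still sampled on the rising SCLK edge, as in Mode 0, but after CS_N asserts the clock falls first, and that falling edge precedes the first rising sample edge. Simply inverting SCLK does not turn a Mode 0 design into Mode 3: inversion moves the sample point to the falling edge, which is Mode 2. To support Mode 3, the RTL must idle SCLK HIGH while keeping the rising edge as the sample edge.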

[Figure: Mode 3 timing diagram showing CS_N, SCLK (idle HIGH), and DATA, with sample points on the rising edges.]

SPI Variants & Topologies

Bus width variants

Variant | Data lines | Direction | Use case
Standard SPI | 1 (MOSI + MISO) | Full-duplex | Sensors, ADC/DAC, general peripherals
Dual SPI | 2 (IO0, IO1) | Half-duplex | Flash read at 2× speed; both lines bidirectional
Quad SPI (QSPI) | 4 (IO0–IO3) | Half-duplex | NOR flash, PSRAM; 4× throughput over standard SPI
Octal SPI | 8 (IO0–IO7) | Half-duplex | HyperBus / high-density embedded flash

Multi-slave topologies

Topology | CS lines | MISO | Trade-off
Independent CS | 1 per slave | All slaves share one MISO line (must tri-state) | Fastest; simultaneous select impossible; needs tri-state capable slaves
Daisy-chain | 1 shared | MISO of each slave feeds MOSI of the next; last slave → master MISO | Fewer GPIO pins; data arrives N×8 bits later; limited to shift-register slaves
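
The daisy-chain case can be sketched with two shift-register slaves in simulation. The module names chain_slave and chain_demo are hypothetical, and the slave is a deliberately simplified model (it shifts on the rising edge only, much like a plain shift-register IC).

```verilog
`timescale 1ns/1ps
// Hypothetical shift-register slave: shifts SDI in on rising SCLK while selected.
module chain_slave (
  input  wire sclk, cs_n, sdi,
  output wire sdo
);
  reg [7:0] sr = 8'h00;
  assign sdo = sr[7];                       // oldest bit leaves first
  always @(posedge sclk)
    if (!cs_n) sr <= {sr[6:0], sdi};
endmodule

// Two slaves in a chain: master MOSI -> u_near -> u_far -> master MISO.
module chain_demo;
  reg        sclk = 0, cs_n = 1, mosi = 0;
  wire       link, miso;
  reg [15:0] frame = 16'hA53C;              // 16 bits, sent MSB first
  integer    i;

  chain_slave u_near (.sclk(sclk), .cs_n(cs_n), .sdi(mosi), .sdo(link));
  chain_slave u_far  (.sclk(sclk), .cs_n(cs_n), .sdi(link), .sdo(miso));

  initial begin
    #10 cs_n = 0;                           // one shared chip select
    for (i = 15; i >= 0; i = i - 1) begin
      mosi = frame[i];                      // present the bit while SCLK is low
      #5 sclk = 1;                          // both slaves shift on this edge
      #5 sclk = 0;
    end
    #5 cs_n = 1;
    $display("near=%h far=%h", u_near.sr, u_far.sr);  // prints: near=3c far=a5
    $finish;
  end
endmodule
```

The first byte sent travels through the near slave and ends up in the far one, which is the "N×8 bits later" latency noted in the table.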

QSPI address phase trick: Many QSPI flash devices send the command byte in standard SPI (1-bit) and then switch to quad mode for the address and data phases. The RTL must manage the mode switch mid-transaction.

Verilog RTL — SPI Master (Mode 0)

A parameterized SPI master implementing Mode 0 (CPOL=0, CPHA=0). CLK_DIV sets the SCLK half-period in system clock cycles, so SCLK frequency = clk / (2 × CLK_DIV); for example, a 100 MHz clk with CLK_DIV = 4 gives 12.5 MHz. Data is MSB-first, 8-bit by default. The done output pulses for one clock cycle after the last bit completes, and rx_data is valid from that cycle on.

Verilog — spi_master.v
// SPI Master — Mode 0 (CPOL=0, CPHA=0), MSB first
// SCLK = clk / (2 × CLK_DIV)   |   minimum CLK_DIV = 2
module spi_master #(
  parameter DATA_WIDTH = 8,
  parameter CLK_DIV    = 4
)(
  input  wire                  clk,
  input  wire                  rst_n,
  // user interface
  input  wire                  start,
  input  wire [DATA_WIDTH-1:0] tx_data,
  output reg  [DATA_WIDTH-1:0] rx_data,
  output reg                   busy,
  output reg                   done,
  // SPI pins
  output reg                   sclk,
  output wire                  mosi,
  input  wire                  miso,
  output reg                   cs_n
);
  localparam IDLE   = 2'd0;
  localparam XFER   = 2'd1;
  localparam FINISH = 2'd2;

  reg [DATA_WIDTH-1:0]         tx_sr, rx_sr;
  reg [$clog2(DATA_WIDTH)-1:0] bit_idx;  // counts down: DATA_WIDTH-1 → 0
  reg [$clog2(CLK_DIV)-1:0]   div_cnt;  // clock divider counter
  reg [1:0]                    state;

  assign mosi = tx_sr[DATA_WIDTH-1];   // MSB of shift register drives MOSI

  always @(posedge clk or negedge rst_n) begin
    if (!rst_n) begin
      state   <= IDLE; busy <= 0; done <= 0;
      cs_n    <= 1;    sclk <= 0;
      tx_sr   <= 0;    rx_sr <= 0; rx_data <= 0;
      bit_idx <= 0;    div_cnt <= 0;
    end else begin
      done <= 0;                          // single-cycle pulse

      case (state)

        IDLE: begin
          if (start && !busy) begin
            tx_sr   <= tx_data;
            bit_idx <= DATA_WIDTH - 1;
            div_cnt <= 0;
            cs_n    <= 0;               // assert CS
            busy    <= 1;
            state   <= XFER;
          end
        end

        XFER: begin
          if (div_cnt == CLK_DIV - 1) begin
            div_cnt <= 0;
            sclk    <= ~sclk;

            if (!sclk) begin             // → rising edge: sample MISO
              rx_sr <= {rx_sr[DATA_WIDTH-2:0], miso};
            end else begin              // → falling edge: advance MOSI
              if (bit_idx == 0)
                state <= FINISH;
              else begin
                tx_sr   <= {tx_sr[DATA_WIDTH-2:0], 1'b0};
                bit_idx <= bit_idx - 1;
              end
            end
          end else
            div_cnt <= div_cnt + 1;
        end

        FINISH: begin
          cs_n    <= 1;                  // deassert CS
          sclk    <= 0;                  // return clock to idle
          rx_data <= rx_sr;             // latch received byte
          busy    <= 0;
          done    <= 1;
          state   <= IDLE;
        end
        default: state <= IDLE;          // recover from the unused encoding 2'd3
      endcase
    end
  end
endmodule

Testbench & Simulation

The testbench wires MISO directly to MOSI (loopback). Because SPI is a shift-register swap, whatever the master shifts out arrives back on MISO — so rx_data must equal tx_data after every transfer. Five bytes are sent and the console prints PASS/FAIL for each.

Verilog — spi_tb.v
`timescale 1ns/1ps
module spi_master_tb;
  localparam DW  = 8;
  localparam DIV = 2;   // SCLK = 100 MHz / 4 = 25 MHz

  reg        clk, rst_n, start;
  reg  [7:0] tx_data;
  wire [7:0] rx_data;
  wire       busy, done;
  wire       sclk, mosi, cs_n;
  wire       miso;

  assign miso = mosi;   // loopback: slave echoes master

  spi_master #(.DATA_WIDTH(DW), .CLK_DIV(DIV)) uut (
    .clk(clk), .rst_n(rst_n), .start(start),
    .tx_data(tx_data), .rx_data(rx_data),
    .busy(busy), .done(done),
    .sclk(sclk), .mosi(mosi), .miso(miso), .cs_n(cs_n)
  );

  initial clk = 0;
  always #5 clk = ~clk;       // 100 MHz system clock

  task send;
    input [7:0] data;
    begin
      @(posedge clk); #1;
      tx_data = data; start = 1;
      @(posedge clk); #1; start = 0;
      @(posedge done); #1;
      $display("TX=0x%02h  RX=0x%02h  %s",
               data, rx_data, (rx_data===data)?"PASS":"FAIL");
    end
  endtask

  initial begin
    $dumpfile("spi.vcd"); $dumpvars(0, spi_master_tb);
    rst_n = 0; start = 0; tx_data = 0;
    #22 rst_n = 1;                // release reset between clock edges (posedges at 5, 15, 25, ...)

    send(8'hA5);  #20;
    send(8'h3C);  #20;
    send(8'hFF);  #20;
    send(8'h00);  #20;
    send(8'h69);  #20;

    $display("--- Simulation complete ---");
    $finish;
  end
endmodule

Expected console output:

Expected Output
TX=0xa5  RX=0xa5  PASS
TX=0x3c  RX=0x3c  PASS
TX=0xff  RX=0xff  PASS
TX=0x00  RX=0x00  PASS
TX=0x69  RX=0x69  PASS
--- Simulation complete ---
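
A loopback test only shows that the master's transmit and receive paths agree with each other. One way to exercise MISO independently is a small simulation-only slave that returns a fixed byte. The sketch below is an assumption, not part of the original testbench: the module name spi_slave_model and the REPLY parameter are hypothetical, and it has not been run against the master above.

```verilog
// Simulation-only Mode 0 slave sketch (hypothetical helper, not part of the original design).
module spi_slave_model #(
  parameter [7:0] REPLY = 8'h5A        // byte returned to the master on every transfer
)(
  input  wire sclk,
  input  wire cs_n,
  input  wire mosi,
  output wire miso
);
  reg [7:0] tx_sr = REPLY;             // bits shifted out on MISO, MSB first
  reg [7:0] rx_sr = 8'h00;             // bits captured from MOSI

  // Drive MISO only while selected; release the line otherwise.
  assign miso = cs_n ? 1'bz : tx_sr[7];

  // Mode 0: capture MOSI on the rising edge of SCLK.
  always @(posedge sclk)
    if (!cs_n) rx_sr <= {rx_sr[6:0], mosi};

  // Mode 0: advance MISO on the falling edge; reload the reply whenever CS_N is high.
  always @(negedge sclk or posedge cs_n)
    if (cs_n) tx_sr <= REPLY;
    else      tx_sr <= {tx_sr[6:0], 1'b0};
endmodule
```

To try it, one could replace the `assign miso = mosi;` line in the testbench with an instance of this module and compare rx_data against REPLY instead of tx_data.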

Interview Q&A

What is the role of CPOL and CPHA in SPI, and how do you choose the correct mode?
CPOL sets the idle (inactive) clock level: 0 = idle LOW, 1 = idle HIGH. CPHA determines which clock edge captures data: 0 = first (leading) edge, 1 = second (trailing) edge. Together they produce four modes. You choose the mode from the slave's datasheet, which either names the mode or shows it in the timing diagram. Connecting a master in the wrong mode causes bit errors because data is sampled on the wrong edge. Mode 0 is the most common, but some devices (certain ADCs and shift registers, for example) require another mode.
Why is SPI called full-duplex and what does that mean at the hardware level?
Full-duplex means both the master and slave transmit simultaneously on separate wires. Internally, both sides maintain a shift register. On each clock cycle, the master's shift register shifts one bit out onto MOSI and takes one MISO bit in, and the slave does the mirror image. Every SPI "write" is actually a swap: the master sends its 8-bit value and receives the slave's 8-bit value at the same time. If only one direction is needed, the unwanted data is simply discarded.
How does the RTL handle the timing between the SCLK rising edge (MISO sample) and the falling edge (MOSI shift)?
The sclk register is toggled using a non-blocking assignment (sclk <= ~sclk). The "if (!sclk)" branch executes in the same clock cycle that sclk is being set high — because sclk still reads its old (pre-toggle) value in a non-blocking context. So !sclk is true when sclk was 0 (going to 1 = rising edge), triggering rx_sr update. Similarly, the else branch captures the falling edge. CLK_DIV system clock cycles separate each edge, giving the slave's MISO output time to settle before the next rising sample.
What happens on the MISO line if two slaves are both selected in a standard SPI bus?
Bus contention. Both slaves actively drive the shared MISO line, potentially creating a direct short between a high driver and a low driver. This can corrupt data and damage output buffers. Properly designed slaves tri-state (disable) their MISO output when their CS_N is high. The master must never assert two CS_N lines simultaneously. In a daisy-chain topology this problem doesn't arise, because each slave's output drives only the next device's input and no two outputs share a wire.
What is the minimum value for CLK_DIV in this RTL, and what goes wrong with CLK_DIV=1?
The recommended minimum is CLK_DIV=2. With CLK_DIV=1, $clog2(1) = 0, so [$clog2(CLK_DIV)-1:0] becomes [-1:0]. Verilog accepts this as a 2-bit vector with a negative index rather than a zero-width one, so it is legal but misleading, and lint or synthesis tools may warn about it. The more important problem is timing: CLK_DIV=1 means the counter expires every clock cycle, so SCLK toggles on every system clock and runs at clk/2, which leaves only one system clock period for MISO to propagate from the slave before the next sample edge. CLK_DIV ≥ 2 gives at least one idle system clock cycle between edges.
How would you extend this master to support all four SPI modes?
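Two changes: (1) Add a CPOL parameter and drive sclk to CPOL instead of 0 on reset and in FINISH, so the clock idles at the right level. (2) Add a CPHA parameter. When CPHA=1, the first clock edge shifts data rather than sampling it, so the sample and shift edges swap in the XFER state. One clean way is to compute sample_edge = (sclk == (CPOL ^ CPHA)), evaluated on the pre-toggle value of sclk, and use that as the branch condition instead of !sclk; for Mode 0 it reduces to !sclk. With CPHA=1 two details also change: the first leading edge must not shift, because the MSB is already on MOSI, and the transfer ends on the last sampling edge rather than the last shift edge.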
Why is there no formal SPI standard, and how does that affect hardware design?
Motorola introduced SPI in the mid-1980s, but it was never ratified as a formal standard. Different manufacturers implement variations: some omit MISO (write-only slaves), some use active-high CS, some use frame lengths other than 8 bits, and Dual/Quad SPI are entirely vendor-defined extensions. In practice this means you must read the target device's datasheet carefully rather than relying on any standard. It also means your RTL may need to be parameterizable (CPOL, CPHA, frame length, CS polarity, inter-byte gap) to be reusable across projects.
What is the difference between Dual SPI and QSPI, and when would you use each?
Dual SPI uses IO0 and IO1 bidirectionally (typically command in standard mode, data in dual mode) giving 2× throughput. QSPI adds IO2 and IO3 for 4× throughput. Both are half-duplex during the data phase (you either send or receive, not both simultaneously). QSPI is the de-facto interface for external NOR flash on microcontrollers and FPGAs (e.g., Winbond W25Qxx). Dual SPI is a stepping stone — most QSPI-capable devices also support dual mode as a fallback. Use QSPI whenever boot time or read throughput from external flash is a system constraint.
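
The four-mode extension discussed in the Q&A above can be sketched as a variant of the Mode 0 master. This is an untested sketch under stated assumptions: it uses an edge counter (edge_cnt, a name introduced here) to tell sample edges from shift edges instead of decoding sclk directly, and the module name spi_master_modes, LAST_EDGE, and the CPOL/CPHA parameters are additions that do not appear in the original RTL.

```verilog
// Sketch: mode-generic SPI master (CPOL/CPHA as parameters), MSB first.
// Hypothetical variant of the Mode 0 master; not verified on hardware.
module spi_master_modes #(
  parameter DATA_WIDTH = 8,
  parameter CLK_DIV    = 4,
  parameter CPOL       = 0,   // idle level of SCLK
  parameter CPHA       = 0    // 0: sample on leading edge, 1: on trailing edge
)(
  input  wire                  clk,
  input  wire                  rst_n,
  input  wire                  start,
  input  wire [DATA_WIDTH-1:0] tx_data,
  output reg  [DATA_WIDTH-1:0] rx_data,
  output reg                   busy,
  output reg                   done,
  output reg                   sclk,
  output wire                  mosi,
  input  wire                  miso,
  output reg                   cs_n
);
  localparam IDLE = 2'd0, XFER = 2'd1, FINISH = 2'd2;
  localparam LAST_EDGE = 2*DATA_WIDTH - 1;   // edges are numbered 0 .. 2N-1

  reg [DATA_WIDTH-1:0]           tx_sr, rx_sr;
  reg [$clog2(2*DATA_WIDTH)-1:0] edge_cnt;   // even = leading edge, odd = trailing edge
  reg [$clog2(CLK_DIV):0]        div_cnt;    // one spare bit keeps the width positive
  reg [1:0]                      state;

  assign mosi = tx_sr[DATA_WIDTH-1];         // MSB is on MOSI before the first edge

  always @(posedge clk or negedge rst_n) begin
    if (!rst_n) begin
      state <= IDLE; busy <= 0; done <= 0; cs_n <= 1;
      sclk  <= (CPOL != 0);                  // idle at the CPOL level
      tx_sr <= 0; rx_sr <= 0; rx_data <= 0;
      edge_cnt <= 0; div_cnt <= 0;
    end else begin
      done <= 0;                             // single-cycle pulse
      case (state)
        IDLE: if (start) begin
          tx_sr <= tx_data; edge_cnt <= 0; div_cnt <= 0;
          cs_n  <= 0; busy <= 1; state <= XFER;
        end
        XFER: if (div_cnt == CLK_DIV - 1) begin
          div_cnt <= 0;
          sclk    <= ~sclk;                  // 2N toggles return SCLK to its idle level
          if (edge_cnt[0] == (CPHA != 0))    // sample edge for this CPHA
            rx_sr <= {rx_sr[DATA_WIDTH-2:0], miso};
          else if (edge_cnt != 0 && edge_cnt != LAST_EDGE)
            tx_sr <= {tx_sr[DATA_WIDTH-2:0], 1'b0};  // shift edge (never the first or last edge)
          if (edge_cnt == LAST_EDGE) state <= FINISH;
          else                       edge_cnt <= edge_cnt + 1;
        end else
          div_cnt <= div_cnt + 1;
        FINISH: begin
          cs_n <= 1; rx_data <= rx_sr; busy <= 0; done <= 1; state <= IDLE;
        end
        default: state <= IDLE;
      endcase
    end
  end
endmodule
```

Counting edges keeps the CPHA=1 special cases in one place: edge 0 never shifts, and the transfer always ends after edge 2N-1, whichever kind of edge that is.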