SPI – Serial Peripheral Interface
SPI is a synchronous, full-duplex serial protocol developed by Motorola. A single master drives the clock and selects slaves individually via dedicated chip-select lines, exchanging data simultaneously on two unidirectional data lines. Its simplicity and speed make it the de-facto interface for sensors, flash memory, ADCs, DACs, and display controllers.
Overview
SPI (Serial Peripheral Interface) is a synchronous serial bus in which the master talks to one selected slave at a time. The master generates the clock and controls every transfer; slaves are purely reactive. Because MOSI and MISO operate simultaneously, each clock cycle shifts one bit out of the master shift register while one bit shifts in, making every SPI transaction a full-duplex swap of two shift registers.
There is no formal SPI specification, and vendors implement variations, but the core four-wire interface and shift-register model are consistent across devices. The variation that matters most in practice is the clock polarity and phase (CPOL/CPHA), which determines which clock edge samples data. Bit order and word length can also differ between devices.
SPI vs I²C: SPI is faster (tens of MHz vs 100 kHz–3.4 MHz), full-duplex, and has no addressing overhead, but requires one CS line per slave. I²C needs only two wires for a multi-master bus with addressing. Choose SPI for speed; choose I²C for bus simplicity.
Signal Reference
SPI uses four signals. All are driven by the master except MISO, which is driven by the selected slave.
| Signal | Full Name | Direction | Description |
|---|---|---|---|
| SCLK | Serial Clock | Master → Slave | Clock generated by master. Idle level is set by CPOL; frequency is chosen by the master. |
| MOSI | Master Out Slave In | Master → Slave | Serial data from master to selected slave. MSB transmitted first by convention. |
| MISO | Master In Slave Out | Slave → Master | Serial data returned from selected slave. Only the active slave drives this line; others must tri-state it. |
| CS_N | Chip Select (active low) | Master → Slave | Selects the target slave. One dedicated CS_N line per slave. Also called SS_N (Slave Select). |
MISO contention: In a multi-slave system, unselected slaves must tri-state (high-Z) their MISO output. If a slave device has no tri-state capability, a daisy-chain topology (see Variants) must be used instead.
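As a minimal sketch of the tri-state requirement, a slave's MISO driver can be gated by its own chip select. The module and port names (`spi_slave_miso`, `tx_bit`) are hypothetical and are not part of the master presented later.

```verilog
// Hypothetical slave-side MISO driver: releases the shared line when deselected.
module spi_slave_miso (
    input  wire cs_n,    // active-low chip select from the master
    input  wire tx_bit,  // bit currently at the head of the slave's shift register
    output wire miso     // shared MISO line
);
    // Drive MISO only while selected; otherwise present high-Z so that
    // another slave can own the line.
    assign miso = cs_n ? 1'bz : tx_bit;
endmodule
```

On an FPGA the high-Z driver usually has to sit at the top-level I/O pin, since internal tri-state nets are generally not available.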
CPOL / CPHA Modes
CPOL (Clock Polarity) sets the idle state of SCLK. CPHA (Clock Phase) selects which clock edge captures data. Together they define four SPI modes.
| Mode | CPOL | CPHA | Clock idle | Sample edge | Shift edge | Common devices |
|---|---|---|---|---|---|---|
| 0 | 0 | 0 | LOW | Rising ↑ | Falling ↓ | Most sensors, SD cards, STM32 default |
| 1 | 0 | 1 | LOW | Falling ↓ | Rising ↑ | Some ADCs, shift registers |
| 2 | 1 | 0 | HIGH | Falling ↓ | Rising ↑ | Some display controllers |
| 3 | 1 | 1 | HIGH | Rising ↑ | Falling ↓ | SPI flash (QSPI mode 3) |
Modes 0 and 3 behave identically at the data-capture level: both sample on the rising edge and shift on the falling edge, differing only in the idle level of SCLK. Modes 1 and 2 are likewise paired, both sampling on the falling edge. Always check the slave device datasheet for its required mode.
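The mode table can be condensed into two expressions. The following is a sketch only, with hypothetical names (`spi_edge_decode`, `leading`, `sample_next`): given the current SCLK level, it reports whether the next toggle leaves the idle level and whether that toggle is a sampling edge.

```verilog
// Hypothetical helper: classifies the next SCLK toggle for any SPI mode.
module spi_edge_decode #(
    parameter CPOL = 0,   // idle level of SCLK
    parameter CPHA = 0    // 0: sample on leading edge, 1: sample on trailing edge
)(
    input  wire sclk,         // current SCLK level, before the toggle
    output wire leading,      // next toggle moves SCLK away from its idle level
    output wire sample_next   // next toggle is a sampling edge
);
    // SCLK sitting at its idle level means the next toggle is a leading edge.
    assign leading     = (sclk == CPOL);
    // CPHA=0 samples on the leading edge, CPHA=1 on the trailing edge.
    assign sample_next = CPHA ? !leading : leading;
endmodule
```

For Mode 0 (CPOL=0, CPHA=0), `sample_next` is high while SCLK is low, which is the rising-edge sampling shown in the table. For Mode 3 (CPOL=1, CPHA=1) it is also high while SCLK is low, so both modes sample on the rising edge.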
Mode 0 Timing Diagram
An 8-bit Mode 0 (CPOL=0, CPHA=0) transfer. CS_N asserts low, MOSI presents MSB (D7) immediately, and the master samples MISO on every rising edge of SCLK. MOSI advances to the next bit on every falling edge.
Green dashed columns = rising edges (SCLK ↑) → MISO sampled by master, MOSI sampled by slave. MOSI advances on falling edges (↓). Both MOSI (D7..D0) and MISO (Q7..Q0) carry independent 8-bit values exchanged simultaneously.
Mode 3 Comparison (CPOL=1, CPHA=1)
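In Mode 3, SCLK idles HIGH. Data is still sampled on the rising SCLK edge and shifted on the falling edge, as in Mode 0, but after CS_N asserts the clock first falls (the leading edge) before the first rising sample edge, and the transfer ends with SCLK back at HIGH after the last rising edge. Simply inverting SCLK in Mode 0 RTL would move sampling to the falling edge, which is Mode 2. Supporting Mode 3 instead means idling SCLK high and inserting that initial falling edge, without shifting on it, ahead of the first sample.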
SPI Variants & Topologies
Bus width variants
| Variant | Data lines | Direction | Use case |
|---|---|---|---|
| Standard SPI | 1 (MOSI + MISO) | Full-duplex | Sensors, ADC/DAC, general peripherals |
| Dual SPI | 2 (IO0, IO1) | Half-duplex | Flash read at 2× speed; both lines bidirectional |
| Quad SPI (QSPI) | 4 (IO0–IO3) | Half-duplex | NOR flash, PSRAM — 4× throughput over standard SPI |
| Octal SPI | 8 (IO0–IO7) | Half-duplex | HyperBus / high-density embedded flash |
Multi-slave topologies
| Topology | CS lines | MISO | Trade-off |
|---|---|---|---|
| Independent CS | 1 per slave | All slaves share one MISO line (must tri-state) | Fastest; simultaneous select impossible; needs tri-state capable slaves |
| Daisy-chain | 1 shared | MISO of each slave feeds MOSI of the next; last slave → master MISO | Fewer GPIO pins; data arrives N×8 bits later; limited to shift-register slaves |
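For the independent-CS topology, the master needs one active-low select per slave with at most one asserted at a time. The decoder below is a sketch under assumptions: the names `spi_cs_decode`, `active`, and `sel` are hypothetical, and `N_SLAVES` is assumed to be at least 2.

```verilog
// Hypothetical one-hot, active-low chip-select decoder (assumes N_SLAVES >= 2).
module spi_cs_decode #(
    parameter N_SLAVES = 4
)(
    input  wire                        active,  // high while a transfer is in progress
    input  wire [$clog2(N_SLAVES)-1:0] sel,     // index of the target slave
    output wire [N_SLAVES-1:0]         cs_n     // one active-low line per slave
);
    // Shift a single 1 to the selected position, then invert it so exactly
    // one line is low during a transfer; all lines stay high otherwise.
    assign cs_n = active ? ~({{(N_SLAVES-1){1'b0}}, 1'b1} << sel)
                         : {N_SLAVES{1'b1}};
endmodule
```

With `N_SLAVES = 4`, `active = 1`, and `sel = 2`, the output is `4'b1011`, so only slave 2 is selected.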
QSPI address phase trick: Many QSPI flash devices accept the command byte in standard SPI (1-bit) and then switch to quad mode for the address and data phases. The RTL must manage the mode switch mid-transaction.
Verilog RTL — SPI Master (Mode 0)
A parametric SPI master implementing Mode 0 (CPOL=0, CPHA=0). CLK_DIV sets the SCLK half-period in system clock cycles, so SCLK frequency = clk / (2 × CLK_DIV). Data is MSB-first, 8-bit by default. The done output pulses for one clock cycle after the last bit completes, at which point rx_data is valid.
```verilog
// SPI Master, Mode 0 (CPOL=0, CPHA=0), MSB first
// SCLK = clk / (2 × CLK_DIV) | minimum CLK_DIV = 2
module spi_master #(
    parameter DATA_WIDTH = 8,
    parameter CLK_DIV    = 4
)(
    input  wire                  clk,
    input  wire                  rst_n,
    // user interface
    input  wire                  start,
    input  wire [DATA_WIDTH-1:0] tx_data,
    output reg  [DATA_WIDTH-1:0] rx_data,
    output reg                   busy,
    output reg                   done,
    // SPI pins
    output reg                   sclk,
    output wire                  mosi,
    input  wire                  miso,
    output reg                   cs_n
);

    localparam IDLE   = 2'd0;
    localparam XFER   = 2'd1;
    localparam FINISH = 2'd2;

    reg [DATA_WIDTH-1:0]         tx_sr, rx_sr;
    reg [$clog2(DATA_WIDTH)-1:0] bit_idx;  // counts down: DATA_WIDTH-1 → 0
    reg [$clog2(CLK_DIV)-1:0]    div_cnt;  // clock divider counter
    reg [1:0]                    state;

    assign mosi = tx_sr[DATA_WIDTH-1];     // MSB of shift register drives MOSI

    always @(posedge clk or negedge rst_n) begin
        if (!rst_n) begin
            state   <= IDLE;
            busy    <= 0;
            done    <= 0;
            cs_n    <= 1;
            sclk    <= 0;
            tx_sr   <= 0;
            rx_sr   <= 0;
            rx_data <= 0;
            bit_idx <= 0;
            div_cnt <= 0;
        end else begin
            done <= 0;  // single-cycle pulse
            case (state)
                IDLE: begin
                    if (start && !busy) begin
                        tx_sr   <= tx_data;
                        bit_idx <= DATA_WIDTH - 1;
                        div_cnt <= 0;
                        cs_n    <= 0;  // assert CS
                        busy    <= 1;
                        state   <= XFER;
                    end
                end

                XFER: begin
                    if (div_cnt == CLK_DIV - 1) begin
                        div_cnt <= 0;
                        sclk    <= ~sclk;
                        if (!sclk) begin
                            // → rising edge: sample MISO
                            rx_sr <= {rx_sr[DATA_WIDTH-2:0], miso};
                        end else begin
                            // → falling edge: advance MOSI
                            if (bit_idx == 0)
                                state <= FINISH;
                            else begin
                                tx_sr   <= {tx_sr[DATA_WIDTH-2:0], 1'b0};
                                bit_idx <= bit_idx - 1;
                            end
                        end
                    end else
                        div_cnt <= div_cnt + 1;
                end

                FINISH: begin
                    cs_n    <= 1;      // deassert CS
                    sclk    <= 0;      // return clock to idle
                    rx_data <= rx_sr;  // latch received byte
                    busy    <= 0;
                    done    <= 1;
                    state   <= IDLE;
                end
            endcase
        end
    end

endmodule
```
Key design decisions:
- Shift register as MOSI source: `assign mosi = tx_sr[MSB]` means MOSI updates automatically on every non-blocking assignment to `tx_sr`. No extra mux needed.
- Single-cycle FINISH: CS deassertion and `done` happen in one clock. Real designs may add a CS hold time (1–2 ns) here if the slave datasheet requires it.
- Clock divider minimum: `CLK_DIV=2` is the minimum safe value. With CLK_DIV=1, `$clog2(1)` is 0, so `div_cnt` is declared as `[-1:0]` rather than as a sensibly sized counter, and SCLK would toggle on every system clock with no settling time between edges.
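Since SCLK = clk / (2 × CLK_DIV), the divider can be derived from the clock rates at elaboration time instead of being hand-computed. The wrapper below is a sketch: `spi_master_wrap`, `SYS_CLK_HZ`, and `SCLK_MAX_HZ` are hypothetical names, and the 100 MHz and 10 MHz defaults are assumed values for illustration.

```verilog
// Hypothetical wrapper: derives CLK_DIV from clock rates (names are illustrative).
module spi_master_wrap #(
    parameter SYS_CLK_HZ  = 100_000_000,  // assumed system clock frequency
    parameter SCLK_MAX_HZ = 10_000_000    // assumed maximum SCLK of the slave
)(
    input  wire       clk,
    input  wire       rst_n,
    input  wire       start,
    input  wire [7:0] tx_data,
    output wire [7:0] rx_data,
    output wire       busy,
    output wire       done,
    output wire       sclk,
    output wire       mosi,
    input  wire       miso,
    output wire       cs_n
);
    // Round up so that clk / (2 * CLK_DIV) never exceeds SCLK_MAX_HZ.
    localparam DIV_RAW = (SYS_CLK_HZ + 2*SCLK_MAX_HZ - 1) / (2*SCLK_MAX_HZ);
    // Enforce the CLK_DIV >= 2 floor that spi_master expects.
    localparam DIV     = (DIV_RAW < 2) ? 2 : DIV_RAW;

    spi_master #(.DATA_WIDTH(8), .CLK_DIV(DIV)) u_spi (
        .clk(clk), .rst_n(rst_n),
        .start(start), .tx_data(tx_data),
        .rx_data(rx_data), .busy(busy), .done(done),
        .sclk(sclk), .mosi(mosi), .miso(miso), .cs_n(cs_n)
    );
endmodule
```

With the default values, `DIV` evaluates to 5 and SCLK runs at 100 MHz / 10 = 10 MHz.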
Testbench & Simulation
The testbench wires MISO directly to MOSI (loopback). Because SPI is a shift-register swap, whatever the master shifts out arrives back on MISO — so rx_data must equal tx_data after every transfer. Five bytes are sent and the console prints PASS/FAIL for each.
```verilog
`timescale 1ns/1ps

module spi_master_tb;
    localparam DW  = 8;
    localparam DIV = 2;   // SCLK = 100 MHz / 4 = 25 MHz

    reg        clk, rst_n, start;
    reg  [7:0] tx_data;
    wire [7:0] rx_data;
    wire       busy, done;
    wire       sclk, mosi, cs_n;
    wire       miso;

    assign miso = mosi;   // loopback: slave echoes master

    spi_master #(.DATA_WIDTH(DW), .CLK_DIV(DIV)) uut (
        .clk(clk), .rst_n(rst_n),
        .start(start), .tx_data(tx_data),
        .rx_data(rx_data), .busy(busy), .done(done),
        .sclk(sclk), .mosi(mosi), .miso(miso), .cs_n(cs_n)
    );

    initial clk = 0;
    always #5 clk = ~clk;   // 100 MHz system clock

    task send;
        input [7:0] data;
        begin
            @(posedge clk); #1;
            tx_data = data; start = 1;
            @(posedge clk); #1;
            start = 0;
            @(posedge done); #1;
            $display("TX=0x%02h RX=0x%02h %s",
                     data, rx_data, (rx_data===data)?"PASS":"FAIL");
        end
    endtask

    initial begin
        $dumpfile("spi.vcd");
        $dumpvars(0, spi_master_tb);
        rst_n = 0; start = 0; tx_data = 0;
        #25 rst_n = 1;

        send(8'hA5); #20;
        send(8'h3C); #20;
        send(8'hFF); #20;
        send(8'h00); #20;
        send(8'h69); #20;

        $display("--- Simulation complete ---");
        $finish;
    end
endmodule
```
Expected console output:
```
TX=0xa5 RX=0xa5 PASS
TX=0x3c RX=0x3c PASS
TX=0xff RX=0xff PASS
TX=0x00 RX=0x00 PASS
TX=0x69 RX=0x69 PASS
--- Simulation complete ---
```
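Loopback only proves that the master's own shift and sample logic agree with each other. To exercise the interface against an independent device, a small behavioral slave can stand in for real hardware. The model below is a simulation-only sketch: `spi_slave_model`, `REPLY`, and `rx_byte` are hypothetical names, and the fixed reply byte 8'h5A is an arbitrary assumption.

```verilog
// Hypothetical behavioral Mode 0 slave for simulation only.
// tx_sr is written from two always blocks, which is fine in a
// testbench model but is not a synthesizable coding style.
module spi_slave_model #(
    parameter [7:0] REPLY = 8'h5A   // assumed fixed response byte
)(
    input  wire       sclk,
    input  wire       cs_n,
    input  wire       mosi,
    output wire       miso,
    output reg  [7:0] rx_byte       // bits captured from MOSI, MSB first
);
    reg [7:0] tx_sr;                // byte being shifted out on MISO

    // Release MISO when deselected; otherwise drive the current MSB.
    assign miso = cs_n ? 1'bz : tx_sr[7];

    // Load the reply when the master asserts chip select.
    always @(negedge cs_n)
        tx_sr <= REPLY;

    // Mode 0: capture MOSI on the rising SCLK edge.
    always @(posedge sclk)
        if (!cs_n) rx_byte <= {rx_byte[6:0], mosi};

    // Mode 0: present the next MISO bit on the falling SCLK edge.
    always @(negedge sclk)
        if (!cs_n) tx_sr <= {tx_sr[6:0], 1'b0};
endmodule
```

To use it, replace `assign miso = mosi;` in the testbench with an instance such as `spi_slave_model slave (.sclk(sclk), .cs_n(cs_n), .mosi(mosi), .miso(miso), .rx_byte());`. Then compare `rx_data` against 8'h5A instead of against the transmitted byte.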
Interview Q&A
The sclk register is toggled using a non-blocking assignment (sclk <= ~sclk). The "if (!sclk)" branch executes in the same clock cycle that sclk is being set high, because sclk still reads its old (pre-toggle) value in a non-blocking context. So !sclk is true when sclk was 0 (going to 1 = rising edge), triggering the rx_sr update. Similarly, the else branch handles the falling edge. CLK_DIV system clock cycles separate each edge, giving the slave's MISO output time to settle before the next rising sample.
$clog2(1) = 0, so [$clog2(CLK_DIV)-1:0] becomes [-1:0], which Verilog interprets as a 2-bit range (indices -1 and 0) rather than the intended counter width; some tools may flag it. More importantly, CLK_DIV=1 means the counter expires every clock cycle, toggling SCLK at the system clock rate, which leaves no time for MISO to propagate from the slave before the next sample edge. CLK_DIV ≥ 2 gives at least one idle system clock cycle between edges.
(1) Add a CPOL parameter. On reset and after FINISH, set sclk <= CPOL instead of 0. (2) Add a CPHA parameter. When CPHA=1, the first clock edge shifts data (rather than samples it), so swap the sample and shift roles of the two edges in the XFER state. One clean way is to compute sample_edge = ~(sclk ^ CPOL ^ CPHA) and use that as the branch condition instead of !sclk; for Mode 0 this reduces to !sclk. With CPHA=1, the shift on the very first edge must also be skipped, because the MSB is already on MOSI, and the transfer must end after the last sample edge rather than after the last shift edge.