What do CPOL and CPHA mean in SPI and how do the four modes differ?

CPOL (Clock Polarity) sets the idle state of SCLK: CPOL=0 means SCLK is low when idle; CPOL=1 means high. CPHA (Clock Phase) sets when data is sampled: CPHA=0 means data is captured on the first clock edge; CPHA=1 means on the second edge. The four modes are: Mode 0 (CPOL=0, CPHA=0) — idle low, sample on rising edge; Mode 1 (CPOL=0, CPHA=1) — idle low, sample on falling edge; Mode 2 (CPOL=1, CPHA=0) — idle high, sample on falling edge; Mode 3 (CPOL=1, CPHA=1) — idle high, sample on rising edge. Master and slave must be configured to the same mode.

How does SPI slave select (CS/SS) work with multiple slaves?

SPI uses an active-low chip select (CS or SS, Slave Select) line per slave device. The master pulls the target slave's CS low before the transaction and returns it high afterward. When CS is high, the slave ignores SCLK and MOSI and tri-states MISO. With N slaves, N dedicated GPIO lines are needed, which can become a pin-count problem. Alternatives include daisy-chaining (slaves connected in a shift-register chain sharing a single CS) or using an external decoder, which cuts the GPIO count from N to ⌈log2(N)⌉ select lines, typically plus an enable so that no slave is selected between transactions.
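The decoder option can be sketched in Verilog as follows. This is a minimal illustration, not a reference design: the module name spi_cs_decoder, its ports, and the SEL_W parameter are hypothetical, and SEL_W is assumed to be set to ⌈log2(N_SLAVES)⌉ by the user.

```verilog
// Sketch of an N-way active-low chip-select decoder (hypothetical helper).
module spi_cs_decoder #(
  parameter N_SLAVES = 4,
  parameter SEL_W    = 2               // assumed to equal ceil(log2(N_SLAVES))
)(
  input  wire                enable,   // high only while a transaction is active
  input  wire [SEL_W-1:0]    sel,      // binary index of the target slave
  output reg  [N_SLAVES-1:0] cs_n      // one active-low select per slave
);
  integer i;
  always @* begin
    // Exactly one cs_n bit goes low when enabled; all stay high otherwise.
    for (i = 0; i < N_SLAVES; i = i + 1)
      cs_n[i] = !(enable && (sel == i));
  end
endmodule
```

With enable low every cs_n bit is high, so no slave drives MISO between transactions.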

What are the key differences between SPI and I2C?

SPI uses 4 signals (SCLK, MOSI, MISO, CS): the first three are shared and each slave adds its own CS line, whereas I2C uses 2 wires (SDA, SCL) shared by all devices. SPI is full-duplex (simultaneous send and receive), while I2C is half-duplex. SPI has no protocol-defined speed limit and typically runs at tens of MHz, with some devices exceeding 100 MHz; I2C tops out at 3.4 Mbit/s in High-speed mode, or 5 Mbit/s in the unidirectional Ultra Fast-mode (UFm), and the 12.5 Mbit/s figure sometimes quoted belongs to its successor, I3C. SPI has no built-in ACK, addressing, or arbitration; the master controls everything through CS. I2C supports multiple masters and, with 7-bit addressing, up to 112 devices on one bus (128 addresses minus 16 reserved) without extra pins. SPI is preferred for high-speed sensors and displays; I2C for multi-device mixed-speed buses.

SPI – Serial Peripheral Interface

SPI is a synchronous, full-duplex serial protocol developed by Motorola. A single master drives the clock and selects slaves individually via dedicated chip-select lines, exchanging data simultaneously on two unidirectional data lines. Its simplicity and speed make it the de-facto interface for sensors, flash memory, ADCs, DACs, and display controllers.

Type — Synchronous, full-duplex

Wires — 4 (SCLK, MOSI, MISO, CS_N)

Topology — Single master, multi-slave

Speed — Up to tens of MHz (device-dependent)

Standard — De-facto (no formal spec)

Overview

SPI (Serial Peripheral Interface) is a synchronous serial bus on which every transfer is a point-to-point exchange between the master and one selected slave. The master generates the clock and controls every transfer; slaves are purely reactive. Because MOSI and MISO operate simultaneously, each clock cycle shifts one bit out of the master shift register while one bit shifts in, making every SPI transaction a full-duplex swap of two shift registers.
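The shift-register swap can be seen in a small behavioral simulation. This is a sketch only: the module name spi_swap_demo and the two starting values are arbitrary choices for illustration, and the clock is modeled implicitly as one loop iteration per SCLK cycle.

```verilog
// Behavioral sketch: two 8-bit shift registers exchanging contents MSB first.
module spi_swap_demo;
  reg [7:0] master_sr, slave_sr;
  reg       m_in, s_in;       // bits captured on the sampling edge
  integer   i;

  initial begin
    master_sr = 8'hA5;        // value the master wants to send
    slave_sr  = 8'h3C;        // value the slave wants to return
    for (i = 0; i < 8; i = i + 1) begin
      // Sampling edge: each side captures the other's current MSB.
      m_in = slave_sr[7];     // MISO as seen by the master
      s_in = master_sr[7];    // MOSI as seen by the slave
      // Shifting edge: each side shifts left and takes in the captured bit.
      master_sr = {master_sr[6:0], m_in};
      slave_sr  = {slave_sr[6:0],  s_in};
    end
    $display("master_sr=%h slave_sr=%h", master_sr, slave_sr);
    $finish;
  end
endmodule
```

After eight cycles master_sr holds 8'h3C and slave_sr holds 8'hA5: the two registers have traded contents.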

There is no formal SPI specification, and different vendors implement variations, but the core four-wire interface and shift-register model are consistent across devices. The most common difference between devices is the clock polarity and phase (CPOL/CPHA), which together define the idle clock level and which clock edge samples data.

SPI vs I²C: SPI is faster (tens of MHz vs 400 kHz–3.4 MHz), full-duplex, and has no addressing overhead, but requires one CS line per slave. I²C needs only two wires for a full multi-master bus with addressing. Choose SPI for speed; choose I²C for bus simplicity.

Signal Reference

SPI uses four signals. All are driven by the master except MISO, which is driven by the selected slave.

Signal	Full Name	Direction	Description
SCLK	Serial Clock	Master → Slave	Clock generated by master. Idle level is set by CPOL; frequency is set by the master's clock divider.
MOSI	Master Out Slave In	Master → Slave	Serial data from master to selected slave. MSB transmitted first by convention.
MISO	Master In Slave Out	Slave → Master	Serial data returned from selected slave. Only the active slave drives this line; others must tri-state it.
CS_N	Chip Select (active low)	Master → Slave	Selects the target slave. One dedicated CS_N line per slave. Also called SS_N (Slave Select).

MISO contention: In a multi-slave system, unselected slaves must tri-state (high-Z) their MISO output. If a slave device has no tri-state capability, a daisy-chain topology (see Variants) must be used instead.
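On the slave side, the tri-state requirement amounts to one conditional assignment. The following is a minimal sketch; the module name spi_slave_miso_drv and the tx_bit port (the slave's current output bit) are hypothetical names used only for illustration.

```verilog
// Sketch: a slave releases MISO whenever it is not selected.
module spi_slave_miso_drv (
  input  wire cs_n,     // active-low chip select from the master
  input  wire tx_bit,   // current MSB of the slave's transmit shift register
  output wire miso
);
  // Drive the line only while selected; otherwise present high-Z.
  assign miso = cs_n ? 1'bz : tx_bit;
endmodule
```

In a simulation with several such slaves wired to one MISO net, the net resolves to the selected slave's bit as long as only one cs_n is low.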

CPOL / CPHA Modes

CPOL (Clock Polarity) sets the idle state of SCLK. CPHA (Clock Phase) selects which clock edge captures data. Together they define four SPI modes.

Mode	CPOL	CPHA	Clock idle	Sample edge	Shift edge	Common devices
0	0	0	LOW	Rising ↑	Falling ↓	Most sensors, SD cards, STM32 default
1	0	1	LOW	Falling ↓	Rising ↑	Some ADCs, shift registers
2	1	0	HIGH	Falling ↓	Rising ↑	Some display controllers
3	1	1	HIGH	Rising ↑	Falling ↓	SPI flash (QSPI mode 3)

Mode 0 · Most common

CPOL=0, CPHA=0

Clock idle LOW. Data valid before first rising edge. Sample on ↑, shift on ↓.

Mode 1

CPOL=0, CPHA=1

Clock idle LOW. First clock edge shifts data, second edge samples it.

Mode 2

CPOL=1, CPHA=0

Clock idle HIGH. Data valid before first falling edge. Sample on ↓.

Mode 3

CPOL=1, CPHA=1

Clock idle HIGH. Mirror of Mode 0 — same edges, opposite polarity. Used by QSPI flash.

Modes 0 and 3 both sample on the rising edge and shift on the falling edge; they differ only in the idle level of SCLK, which is why many devices (SPI flash in particular) accept either. Modes 1 and 2 are paired the same way, sampling on the falling edge. Always check the slave device datasheet for its required mode.
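The mode table can be reproduced from a single expression: data is sampled on the rising SCLK edge exactly when CPOL and CPHA are equal. The sketch below prints that relationship for all four modes; the module name spi_mode_table and its variable names are made up for this example, and the mode number is assumed to be {CPOL, CPHA}.

```verilog
// Sketch: derive the sampling edge of each SPI mode from CPOL and CPHA.
module spi_mode_table;
  reg [2:0] m;                 // 3 bits wide so the loop bound m < 4 is reachable
  reg       cpol, cpha, sample_rising;

  initial begin
    for (m = 0; m < 4; m = m + 1) begin
      cpol = m[1];             // mode number = {CPOL, CPHA}
      cpha = m[0];
      // Rising-edge sampling when CPOL == CPHA (Modes 0 and 3).
      sample_rising = ~(cpol ^ cpha);
      $display("Mode %0d: CPOL=%b CPHA=%b sample_on_rising=%b",
               m, cpol, cpha, sample_rising);
    end
    $finish;
  end
endmodule
```

sample_on_rising is 1 for Modes 0 and 3 and 0 for Modes 1 and 2; the shift edge is always the opposite one.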

Mode 0 Timing Diagram

An 8-bit Mode 0 (CPOL=0, CPHA=0) transfer. CS_N asserts low, MOSI presents MSB (D7) immediately, and the master samples MISO on every rising edge of SCLK. MOSI advances to the next bit on every falling edge.

Green dashed columns = rising edges (SCLK ↑) → MISO sampled by master, MOSI sampled by slave. MOSI advances on falling edges (↓). Both MOSI (D7..D0) and MISO (Q7..Q0) carry independent 8-bit values exchanged simultaneously.

Mode 3 Comparison (CPOL=1, CPHA=1)

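In Mode 3, SCLK idles HIGH. Data is still sampled on the rising SCLK edge, as in Mode 0, but after CS_N asserts the clock first falls (the shift edge) and only then rises for the first sample. Simply inverting SCLK in the Mode 0 RTL is not enough, because that produces Mode 2 (idle HIGH, sample on falling); supporting Mode 3 means idling the clock HIGH and inserting that leading falling edge before the first sample.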

SPI Variants & Topologies

Bus width variants

Variant	Data lines	Direction	Use case
Standard SPI	1 (MOSI + MISO)	Full-duplex	Sensors, ADC/DAC, general peripherals
Dual SPI	2 (IO0, IO1)	Half-duplex	Flash read at 2× speed; both lines bidirectional
Quad SPI (QSPI)	4 (IO0–IO3)	Half-duplex	NOR flash, PSRAM — 4× throughput over standard SPI
Octal SPI	8 (IO0–IO7)	Half-duplex	HyperBus / high-density embedded flash

Multi-slave topologies

Topology	CS lines	MISO	Trade-off
Independent CS	1 per slave	All slaves share one MISO line (must tri-state)	Fastest; only one slave may be selected at a time; needs tri-state capable slaves
Daisy-chain	1 shared	MISO of each slave feeds MOSI of the next; last slave → master MISO	Fewer GPIO pins; data arrives N×8 bits later; limited to shift-register slaves
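The daisy-chain latency can be illustrated with a behavioral model of two 8-bit shift-register slaves in series. This is a sketch under simplifying assumptions: the names spi_daisy_chain_demo, a_sr, and b_sr are hypothetical, each loop iteration stands for one SCLK cycle, and the slaves are assumed to latch their contents when the shared CS is released.

```verilog
// Sketch: a 16-bit frame shifted through two chained 8-bit slaves (A then B).
module spi_daisy_chain_demo;
  reg [15:0] tx;             // master frame, sent MSB first
  reg [7:0]  a_sr, b_sr;     // slave A is nearest the master, B is last
  integer    i;

  initial begin
    tx   = 16'h1234;
    a_sr = 8'h00;
    b_sr = 8'h00;
    for (i = 15; i >= 0; i = i - 1) begin
      // All devices shift on the same clock. B is updated first so that
      // it takes A's MSB from before this cycle's shift.
      b_sr = {b_sr[6:0], a_sr[7]};
      a_sr = {a_sr[6:0], tx[i]};
    end
    $display("slave A=%h slave B=%h", a_sr, b_sr);
    $finish;
  end
endmodule
```

After 16 clocks slave B holds 8'h12 (the first byte sent) and slave A holds 8'h34, so the byte intended for the last device in the chain must be sent first.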

QSPI address phase trick: Many QSPI flash devices send the command byte in standard SPI (1-bit) and then switch to quad mode for the address and data phases. The RTL must manage the mode switch mid-transaction.

Verilog RTL — SPI Master (Mode 0)

A parametric SPI master implementing Mode 0 (CPOL=0, CPHA=0). CLK_DIV sets the SCLK half-period in system clock cycles, so SCLK frequency = clk / (2 × CLK_DIV). Data is MSB-first, 8-bit by default. The done output pulses for one clock cycle after the last bit completes and rx_data is valid.

Verilog — spi_master.v

// SPI Master — Mode 0 (CPOL=0, CPHA=0), MSB first
// SCLK = clk / (2 × CLK_DIV)   |   minimum CLK_DIV = 2
module spi_master #(
  parameter DATA_WIDTH = 8,
  parameter CLK_DIV    = 4
)(
  input  wire                  clk,
  input  wire                  rst_n,
  // user interface
  input  wire                  start,
  input  wire [DATA_WIDTH-1:0] tx_data,
  output reg  [DATA_WIDTH-1:0] rx_data,
  output reg                   busy,
  output reg                   done,
  // SPI pins
  output reg                   sclk,
  output wire                  mosi,
  input  wire                  miso,
  output reg                   cs_n
);
  localparam IDLE   = 2'd0;
  localparam XFER   = 2'd1;
  localparam FINISH = 2'd2;

  reg [DATA_WIDTH-1:0]         tx_sr, rx_sr;
  reg [$clog2(DATA_WIDTH)-1:0] bit_idx;  // counts down: DATA_WIDTH-1 → 0
  reg [$clog2(CLK_DIV)-1:0]   div_cnt;  // clock divider counter
  reg [1:0]                    state;

  assign mosi = tx_sr[DATA_WIDTH-1];   // MSB of shift register drives MOSI

  always @(posedge clk or negedge rst_n) begin
    if (!rst_n) begin
      state   <= IDLE; busy <= 0; done <= 0;
      cs_n    <= 1;    sclk <= 0;
      tx_sr   <= 0;    rx_sr <= 0; rx_data <= 0;
      bit_idx <= 0;    div_cnt <= 0;
    end else begin
      done <= 0;                          // single-cycle pulse

      case (state)

        IDLE: begin
          if (start && !busy) begin
            tx_sr   <= tx_data;
            bit_idx <= DATA_WIDTH - 1;
            div_cnt <= 0;
            cs_n    <= 0;               // assert CS
            busy    <= 1;
            state   <= XFER;
          end
        end

        XFER: begin
          if (div_cnt == CLK_DIV - 1) begin
            div_cnt <= 0;
            sclk    <= ~sclk;

            if (!sclk) begin             // → rising edge: sample MISO
              rx_sr <= {rx_sr[DATA_WIDTH-2:0], miso};
            end else begin              // → falling edge: advance MOSI
              if (bit_idx == 0)
                state <= FINISH;
              else begin
                tx_sr   <= {tx_sr[DATA_WIDTH-2:0], 1'b0};
                bit_idx <= bit_idx - 1;
              end
            end
          end else
            div_cnt <= div_cnt + 1;
        end

        FINISH: begin
          cs_n    <= 1;                  // deassert CS
          sclk    <= 0;                  // return clock to idle
          rx_data <= rx_sr;             // latch received byte
          busy    <= 0;
          done    <= 1;
          state   <= IDLE;
        end
      endcase
    end
  end
endmodule

Key design decisions:

Shift register as MOSI source: assign mosi = tx_sr[DATA_WIDTH-1] means MOSI follows every non-blocking update of tx_sr automatically. No extra mux is needed.
Single-cycle FINISH: CS deassertion and done happen in one clock, so CS_N rises one system clock after the final falling SCLK edge. Real designs may add extra hold cycles here if the slave datasheet specifies a longer CS hold time.
Clock divider minimum: CLK_DIV=2 is the minimum safe value. With CLK_DIV=1, $clog2(1) = 0 turns the div_cnt declaration into [-1:0], which is legal Verilog but declares an unintended 2-bit vector with a negative index, and SCLK would toggle on every system clock.
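One way to make the CLK_DIV restriction explicit is a simulation-time parameter check plus a counter width that never drops below one bit. The sketch below is an assumption about how this could be done, not part of the original module; the names spi_param_guard and DIV_W are hypothetical, and synthesis tools generally ignore the initial block.

```verilog
// Sketch: guard against CLK_DIV < 2 and keep the divider counter width sane.
module spi_param_guard #(
  parameter CLK_DIV = 4
);
  // $clog2(1) is 0, so clamp the width to at least 1 bit.
  localparam DIV_W = (CLK_DIV > 1) ? $clog2(CLK_DIV) : 1;

  reg [DIV_W-1:0] div_cnt;   // declared only to show the clamped width in use

  initial begin
    if (CLK_DIV < 2) begin
      $display("ERROR: CLK_DIV must be >= 2, got %0d", CLK_DIV);
      $finish;
    end
  end
endmodule
```

The same localparam and initial block could be pasted into spi_master so a bad parameter value stops the simulation immediately.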

Testbench & Simulation

The testbench wires MISO directly to MOSI (loopback). Because SPI is a shift-register swap, whatever the master shifts out arrives back on MISO — so rx_data must equal tx_data after every transfer. Five bytes are sent and the console prints PASS/FAIL for each.

Verilog — spi_tb.v

`timescale 1ns/1ps
module spi_master_tb;
  localparam DW  = 8;
  localparam DIV = 2;   // SCLK = 100 MHz / 4 = 25 MHz

  reg        clk, rst_n, start;
  reg  [7:0] tx_data;
  wire [7:0] rx_data;
  wire       busy, done;
  wire       sclk, mosi, cs_n;
  wire       miso;

  assign miso = mosi;   // loopback: slave echoes master

  spi_master #(.DATA_WIDTH(DW), .CLK_DIV(DIV)) uut (
    .clk(clk), .rst_n(rst_n), .start(start),
    .tx_data(tx_data), .rx_data(rx_data),
    .busy(busy), .done(done),
    .sclk(sclk), .mosi(mosi), .miso(miso), .cs_n(cs_n)
  );

  initial clk = 0;
  always #5 clk = ~clk;       // 100 MHz system clock

  task send;
    input [7:0] data;
    begin
      @(posedge clk); #1;
      tx_data = data; start = 1;
      @(posedge clk); #1; start = 0;
      @(posedge done); #1;
      $display("TX=0x%02h  RX=0x%02h  %s",
               data, rx_data, (rx_data===data)?"PASS":"FAIL");
    end
  endtask

  initial begin
    $dumpfile("spi.vcd"); $dumpvars(0, spi_master_tb);
    rst_n = 0; start = 0; tx_data = 0;
    #25 rst_n = 1;

    send(8'hA5);  #20;
    send(8'h3C);  #20;
    send(8'hFF);  #20;
    send(8'h00);  #20;
    send(8'h69);  #20;

    $display("--- Simulation complete ---");
    $finish;
  end
endmodule

Expected console output:

Expected Output

TX=0xa5  RX=0xa5  PASS
TX=0x3c  RX=0x3c  PASS
TX=0xff  RX=0xff  PASS
TX=0x00  RX=0x00  PASS
TX=0x69  RX=0x69  PASS
--- Simulation complete ---

Interview Q&A

What is the role of CPOL and CPHA in SPI, and how do you choose the correct mode?

CPOL sets the idle (inactive) clock level: 0 = idle LOW, 1 = idle HIGH. CPHA determines which clock edge captures data: 0 = first (leading) edge, 1 = second (trailing) edge. Together they produce four modes. You choose the mode by reading the slave's datasheet, which states the mode either by number or through a timing diagram showing the idle clock level and the sampling edge. Connecting a master in the wrong mode causes bit errors because data is sampled at the wrong edge. Mode 0 is the most common choice for sensors and ADCs.

Why is SPI called full-duplex and what does that mean at the hardware level?

Full-duplex means both the master and slave transmit simultaneously on separate wires. Internally, both sides maintain a shift register. In every clock cycle, one edge shifts the next bit out of the master onto MOSI and out of the slave onto MISO, and the opposite edge captures those bits at the receiving ends. Every SPI "write" is actually a swap: the master sends its 8-bit value and receives the slave's 8-bit value at the same time. If only one direction is needed, the unwanted data is simply discarded.

How does the RTL handle the timing between the SCLK rising edge (MISO sample) and the falling edge (MOSI shift)?

The sclk register is toggled using a non-blocking assignment (sclk <= ~sclk). The "if (!sclk)" branch executes in the same clock cycle that sclk is being set high — because sclk still reads its old (pre-toggle) value in a non-blocking context. So !sclk is true when sclk was 0 (going to 1 = rising edge), triggering rx_sr update. Similarly, the else branch captures the falling edge. CLK_DIV system clock cycles separate each edge, giving the slave's MISO output time to settle before the next rising sample.
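The old-value behavior of non-blocking assignments can be checked in isolation. The sketch below is a stand-alone demonstration, not part of the master; the module name nba_edge_demo is made up, and the divider is omitted so that sclk toggles on every system clock.

```verilog
`timescale 1ns/1ps
// Sketch: the branch sees the pre-toggle value of sclk.
module nba_edge_demo;
  reg clk  = 0;
  reg sclk = 0;

  always #5 clk = ~clk;

  always @(posedge clk) begin
    sclk <= ~sclk;                    // takes effect after this time step
    if (!sclk)                        // still reads the value before the toggle
      $display("%0t: sclk 0 -> 1, rising edge, sample MISO", $time);
    else
      $display("%0t: sclk 1 -> 0, falling edge, advance MOSI", $time);
  end

  initial #40 $finish;
endmodule
```

The messages alternate starting with the rising-edge one, which matches the order in which the master samples and then shifts.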

What happens on the MISO line if two slaves are both selected in a standard SPI bus?

Bus contention. Both slaves actively drive the shared MISO line, potentially creating a direct short between a high driver and a low driver. This can corrupt data and damage output buffers. Properly designed slaves tri-state (disable) their MISO output when their CS_N is high. The master must never assert two CS_N lines simultaneously. In a daisy-chain topology this problem doesn't arise since only one slave's output is connected to the next's input.

What is the minimum value for CLK_DIV in this RTL, and why does CLK_DIV=1 fail?

The minimum safe value is CLK_DIV=2. With CLK_DIV=1, $clog2(1) = 0, so [$clog2(CLK_DIV)-1:0] becomes [-1:0]. Verilog accepts that range, but it declares a 2-bit vector with a negative index rather than the intended counter, and lint or synthesis tools may warn about it. Additionally, CLK_DIV=1 means the counter expires every clock cycle, so SCLK toggles on every system clock (SCLK = clk/2), leaving the slave a single system clock period to drive MISO before the next sample. CLK_DIV ≥ 2 gives at least two system clock cycles per SCLK half-period.

How would you extend this master to support all four SPI modes?

Two changes: (1) Add a CPOL parameter. On reset and in FINISH, drive sclk <= CPOL instead of 0 so the clock idles at the right level. (2) Add a CPHA parameter. When CPHA=1, the first clock edge shifts data (rather than samples it), so swap the sample/shift roles of the two edges in the XFER state. One clean way is to compute sample_edge = !(sclk ^ CPOL ^ CPHA) from the pre-toggle sclk value and use that as the branch condition instead of !sclk; with CPOL=0 and CPHA=0 it reduces to !sclk. For CPHA=1 the transfer must also end on a sample edge rather than a shift edge, and the first shift edge should not advance tx_sr because the MSB is already on MOSI.
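A possible shape for the mode-aware edge logic is sketched below, with the sample condition written as !(sclk ^ CPOL ^ CPHA) so that it reduces to !sclk in Mode 0. The module name spi_edge_decode and the active, tick, sample_tick, and shift_tick signals are hypothetical; tick is assumed to be a one-cycle pulse from the clock divider, and the bit counter that ends the transfer is left out.

```verilog
// Sketch: generate SCLK and classify each toggle as a sample or shift edge.
module spi_edge_decode #(
  parameter [0:0] CPOL = 1'b0,   // idle level of SCLK
  parameter [0:0] CPHA = 1'b0    // 0: sample on leading edge, 1: on trailing
)(
  input  wire clk,
  input  wire rst_n,
  input  wire active,            // high while a transfer is in progress
  input  wire tick,              // one-cycle pulse every SCLK half-period
  output reg  sclk,
  output wire sample_tick,       // capture MISO in this cycle
  output wire shift_tick         // advance MOSI in this cycle
);
  // sclk here is the pre-toggle value, as in the Mode 0 master.
  wire sample_edge = !(sclk ^ CPOL ^ CPHA);

  assign sample_tick = active && tick &&  sample_edge;
  assign shift_tick  = active && tick && !sample_edge;

  always @(posedge clk or negedge rst_n) begin
    if (!rst_n)        sclk <= CPOL;    // idle level after reset
    else if (!active)  sclk <= CPOL;    // return to idle between transfers
    else if (tick)     sclk <= ~sclk;
  end
endmodule
```

Declaring CPOL and CPHA as 1-bit parameters keeps the XOR expression one bit wide, which avoids surprises from 32-bit integer parameters in a boolean context.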

Why is there no formal SPI standard, and how does that affect hardware design?

Motorola introduced SPI in the mid-1980s but never published a formal specification. Different manufacturers implement variations: some omit MISO (write-only slaves), some use active-high CS, some add a word-select line for different frame sizes, and Dual/Quad SPI are entirely vendor-defined extensions. In practice this means you must read the target device's datasheet carefully rather than relying on any standard. It also means your RTL may need to be parameterizable (CPOL, CPHA, frame length, CS polarity, inter-byte gap) to be reusable across projects.

What is the difference between Dual SPI and QSPI, and when would you use each?

Dual SPI uses IO0 and IO1 bidirectionally (typically command in standard mode, data in dual mode) giving 2× throughput. QSPI adds IO2 and IO3 for 4× throughput. Both are half-duplex during the data phase (you either send or receive, not both simultaneously). QSPI is the de-facto interface for external NOR flash on microcontrollers and FPGAs (e.g., Winbond W25Qxx). Dual SPI is a stepping stone — most QSPI-capable devices also support dual mode as a fallback. Use QSPI whenever boot time or read throughput from external flash is a system constraint.