EcrioniX ASIC Project · v1.0 — Architecture
SobelCore-01
Sobel Edge Detection ASIC IP — Architecture · Theory · Pipeline · Interface
Domain: Image Processing
Input: 8-bit grayscale, pixel-streaming
Kernel: 3×3 Sobel Gx + Gy
Output: 8-bit edge magnitude
Latency: 2 rows + 4 cycles
Simulator: Icarus Verilog
System Architecture
Where SobelCore-01 fits in a real imaging pipeline — sensor to edge map
[System diagram] Camera (CMOS sensor, RGB Bayer) → ISP (debayer, colour correction) → LumaCore-01 (RGB → gray, 8-bit luminance) → Gray[7:0] → SobelCore-01, this IP (line buffers, 3×3 window, Sobel Gx + Gy, |Gx|+|Gy|) → edge[7:0] → Threshold (optional stage, binary edge 0/1) → Application (vision / AI on CPU / NPU)
Internal Block Diagram
Every sub-block inside SobelCore-01 and the data flow between them
[Block diagram: SobelCore-01 IP boundary]
Inputs: pixel_in[7:0], valid_in, frame_width, clk, rst_n (clk / rst_n feed all registers)
Stage 1, line buffers: Row[n−2], Row[n−1] and Row[n] buffers, each frame_width pixels deep (3 rows)
Stage 2, 3×3 window: p00 p01 p02 / p10 p11 p12 / p20 p21 p22 (9 pixels; p11 = centre pixel of the current output)
Stages 3a/3b, Gx kernel [−1 0 +1; −2 0 +2; −1 0 +1] and Gy kernel [−1 −2 −1; 0 0 0; +1 +2 +1], additions only (no multiplier)
Stage 4, magnitude: |Gx| + |Gy|, clamped to 8 bits
Outputs: edge_out[7:0], valid_out
Use Cases
Where real-world systems embed a Sobel edge detection hardware block

Autonomous Vehicles

Lane marking detection, obstacle boundary extraction, traffic sign localization. A Sobel IP can run ahead of the CNN, pre-highlighting edges at 60 fps to reduce NPU workload.

Semiconductor Wafer Inspection

High-speed optical scanners use Sobel IPs in FPGAs to detect defect edges on wafer surfaces in real time, at data rates far too high for CPU-based processing at 4K resolution.

Medical Imaging

MRI, CT, and X-ray preprocessing pipelines embed Sobel filters to highlight tissue boundaries and organ contours before the radiologist or AI classifier sees the image.

Barcode & QR Detection

Edge detection greatly simplifies locating code regions in a frame. Embedded scanners (POS terminals, handheld readers) can run Sobel in dedicated silicon to help reach sub-millisecond decode times.

Surveillance & Motion Detection

Background-subtracted frames are edge-detected in the FPGA to isolate moving-object contours without streaming full pixel data to the CPU. This reduces bandwidth and wakes the system only on real events.

FPGA Embedded Vision

Industrial cameras feeding Xilinx/Intel FPGAs use Sobel IP cores as the first stage of an image-processing pipeline — before histogram equalization, optical flow, or CNN feature extraction.

Theory — How Sobel Works
Mathematical foundation translated to integer hardware
Step 1 — Gradient Computation (continuous math)
Gx = ∂I/∂x  ≈  (weighted right column) − (weighted left column)
Gy = ∂I/∂y  ≈  (weighted bottom row) − (weighted top row)
G = √(Gx² + Gy²)  ←  exact magnitude
An edge is where pixel intensity changes sharply. The derivative is large there. Sobel estimates the derivative using finite differences over a 3×3 neighbourhood.
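The finite-difference reading can be made concrete. Each Sobel kernel is the outer product of a central difference [−1 0 +1] along one axis and a [1 2 1] smoothing along the other. This is the standard separable form of Sobel, shown here only as a worked identity. The datapath described on this page computes the 3×3 sums directly.

```latex
G_x\ \text{kernel} =
\begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix}
\begin{bmatrix} -1 & 0 & +1 \end{bmatrix}
=
\begin{bmatrix} -1 & 0 & +1 \\ -2 & 0 & +2 \\ -1 & 0 & +1 \end{bmatrix},
\qquad
G_y\ \text{kernel} =
\begin{bmatrix} -1 \\ 0 \\ +1 \end{bmatrix}
\begin{bmatrix} 1 & 2 & 1 \end{bmatrix}
=
\begin{bmatrix} -1 & -2 & -1 \\ 0 & 0 & 0 \\ +1 & +2 & +1 \end{bmatrix}
```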
Step 2 — Hardware Approximation (no √, no ×)
G ≈ |Gx| + |Gy|  ←  Manhattan distance (hardware-friendly)

Gx = (p02 + 2×p12 + p22) − (p00 + 2×p10 + p20)
= (p02 − p00) + 2×(p12 − p10) + (p22 − p20)

Gy = (p20 + 2×p21 + p22) − (p00 + 2×p01 + p02)
= (p20 − p00) + 2×(p21 − p01) + (p22 − p02)
The ×2 factor is a 1-bit left shift. Every other weight is ±1 — just add or subtract. No multiplier hardware needed at all. This is exactly what makes Sobel so attractive for FPGA and ASIC implementations.
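Below is a bit-accurate Python sketch of the two equations above. It assumes the window is indexed p[row][col] with p[1][1] as the centre, as in the block diagram. The ×2 is written as a left shift to mirror the hardware. `sobel_gx_gy` is a hypothetical helper for a software model and is not part of the RTL.

```python
def sobel_gx_gy(p):
    """Return (Gx, Gy) for a 3x3 window p[row][col] of 8-bit pixels.

    Only add, subtract and a 1-bit left shift (x << 1 == 2 * x) are used,
    mirroring the multiplier-free hardware formulation.
    """
    gx = (p[0][2] - p[0][0]) + ((p[1][2] - p[1][0]) << 1) + (p[2][2] - p[2][0])
    gy = (p[2][0] - p[0][0]) + ((p[2][1] - p[0][1]) << 1) + (p[2][2] - p[0][2])
    return gx, gy


# A linear ramp: +10 per column (left to right), +30 per row (top to bottom).
ramp = [[10, 20, 30],
        [40, 50, 60],
        [70, 80, 90]]
print(sobel_gx_gy(ramp))  # (80, 240)
```

On a linear ramp each weighted row contributes a two-pixel difference and the weights sum to 4, so the outputs are 8× the per-pixel slope (8 × 10 = 80, 8 × 30 = 240). Python's `<<` is exact on negative integers, so the same code also covers falling edges.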

Gx: Horizontal Gradient Kernel

−1   0  +1
−2   0  +2
−1   0  +1
Strong response on vertical edges (left/right brightness change)

Gy: Vertical Gradient Kernel

−1  −2  −1
 0   0   0
+1  +2  +1
Strong response on horizontal edges (top/bottom brightness change)
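The Python sketch below applies the two kernels exactly as listed above to a small synthetic image with a single vertical edge, to show which edges each kernel picks up. It is a software model only. It multiplies by the kernel weights for readability, whereas the hardware uses adds and shifts. It also skips border pixels, which is an assumption because border handling is not specified here. `apply_kernel` and `response_map` are hypothetical helper names.

```python
# Kernel weights exactly as listed above, indexed [row][col].
GX = [[-1, 0, 1],
      [-2, 0, 2],
      [-1, 0, 1]]
GY = [[-1, -2, -1],
      [0, 0, 0],
      [1, 2, 1]]


def apply_kernel(img, k, r, c):
    """Weighted sum of the 3x3 neighbourhood centred on img[r][c] (no kernel flip)."""
    return sum(k[i][j] * img[r - 1 + i][c - 1 + j]
               for i in range(3) for j in range(3))


def response_map(img, k):
    """Kernel response at every interior pixel (border rows/columns skipped)."""
    h, w = len(img), len(img[0])
    return [[apply_kernel(img, k, r, c) for c in range(1, w - 1)]
            for r in range(1, h - 1)]


# 5x5 test image: dark left part, bright right part -> one vertical edge.
vertical_edge = [[0, 0, 200, 200, 200] for _ in range(5)]
print(response_map(vertical_edge, GX))  # [[800, 800, 0], [800, 800, 0], [800, 800, 0]]
print(response_map(vertical_edge, GY))  # [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
```

Gx fires on both columns that straddle the edge. Gy is zero everywhere because every row is identical. Transposing the image swaps the two roles.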
Pipeline Stages
Data flows through 4 registered stages — 4-cycle steady-state latency after fill
S1
LINE BUFFER FILL

3-Row Line Buffer

Each incoming pixel is written into a circular SRAM or shift-register chain of depth frame_width. Three such buffers hold rows n−2, n−1, and n simultaneously. A new output pixel can be produced every clock once 2 full rows are buffered. This is the latency "fill" cost = 2×frame_width cycles at startup.
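Below is a minimal Python model of the row-delay behaviour, assuming the shift-register-chain option. Two delay lines of depth frame_width delay the stream by exactly one and two rows, and the live input supplies row n. This is one common way to present all three rows at once. The class `LineBuffers` and its `push` method are hypothetical names, and buffer contents are assumed to power up as zero.

```python
from collections import deque


class LineBuffers:
    """Row-delay model returning the column taps (row n-2, row n-1, row n) per pixel."""

    def __init__(self, frame_width):
        self.w = frame_width
        self.d1 = deque([0] * frame_width)  # delay line: one row (frame_width pixels)
        self.d2 = deque([0] * frame_width)  # second delay line: another row
        self.count = 0                      # pixels pushed so far in this frame

    def push(self, pixel):
        """Clock in one pixel; return ((row n-2, row n-1, row n), filled)."""
        row_n1 = self.d1.popleft()          # pixel pushed exactly frame_width cycles ago
        row_n2 = self.d2.popleft()          # pixel pushed exactly 2*frame_width cycles ago
        self.d1.append(pixel)
        self.d2.append(row_n1)
        self.count += 1
        # The n-2 tap holds real data only after 2 full rows (2*frame_width pixels).
        filled = self.count > 2 * self.w
        return (row_n2, row_n1, pixel), filled


lb = LineBuffers(frame_width=4)
for k in range(12):
    taps, filled = lb.push(100 + k)  # pixel values 100..111, three rows of 4
    if k in (7, 8):
        print(k, taps, filled)
# 7 (0, 103, 107) False
# 8 (100, 104, 108) True
```

`filled` only indicates that the n−2 tap is valid. The window stage still needs three columns of the current row before it holds its first full 3×3 window.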

S2
3×3 WINDOW EXTRACTION

Sliding Window Register

A 3×3 array of 8-bit registers (p00–p22) is fed from the three row-buffer tails and two additional per-row shift registers. On every clock, the window slides one pixel right — the leftmost column is discarded and a new rightmost column is clocked in from the three buffers. Registered output ensures clean timing to stage 3.
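The sliding-window register can be modelled as below. This sketch assumes row 0 of the window is the oldest row (n−2) and column 2 is the newest column, so p[2][2] is the pixel that has just arrived. `Window3x3` and `shift_in` are hypothetical names.

```python
class Window3x3:
    """3x3 window register p[row][col]; row 0 = oldest row, col 2 = newest column."""

    def __init__(self):
        self.p = [[0, 0, 0] for _ in range(3)]

    def shift_in(self, column):
        """One clock: column = (row n-2 tap, row n-1 tap, row n tap) at the same x."""
        for r in range(3):
            # Discard the leftmost pixel, move the others left, load the new one on the right.
            self.p[r] = [self.p[r][1], self.p[r][2], column[r]]


win = Window3x3()
for col in [(1, 4, 7), (2, 5, 8), (3, 6, 9)]:
    win.shift_in(col)
print(win.p)  # [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
```

Each call stands for one clock edge. Feeding it the three row taps from a line-buffer model gives the window that Stage 3 sees.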

S3
SOBEL KERNEL — COMBINATIONAL

Gx and Gy Computation (parallel)

Both Gx and Gy are computed in the same clock cycle using only addition, subtraction, and 1-bit left shifts. Each result lies in the range −1020 to +1020, so it needs a signed 11-bit register at the end of this stage. No multipliers are used. The ×2 shifts are pure wiring, so the critical path is roughly three levels of adders/subtractors, and this stage is unlikely to be the timing bottleneck.
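The Python check below determines how wide the registered Gx/Gy values must be. It derives the extremes from the kernel weights and finds the smallest two's-complement width that holds them. `signed_width` is a hypothetical helper name.

```python
# Kernel weights as listed in the Gx kernel above, indexed [row][col].
GX = [[-1, 0, 1],
      [-2, 0, 2],
      [-1, 0, 1]]


def signed_width(lo, hi):
    """Smallest two's-complement width n with lo >= -2**(n-1) and hi <= 2**(n-1) - 1."""
    n = 1
    while not (-(1 << (n - 1)) <= lo and hi <= (1 << (n - 1)) - 1):
        n += 1
    return n


# Extremes: every positive-weight pixel at 255 and every negative-weight pixel at 0,
# or the reverse.
pos = sum(w for row in GX for w in row if w > 0)  # 1 + 2 + 1 = 4
neg = sum(w for row in GX for w in row if w < 0)  # -4
gx_max, gx_min = 255 * pos, 255 * neg
print(gx_min, gx_max, signed_width(gx_min, gx_max))  # -1020 1020 11
```

Gy uses the same weights transposed, so it needs the same 11 bits. A 10-bit signed register (−512 to +511) would overflow on a strong edge.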

S4
MAGNITUDE + CLIP

|Gx| + |Gy| → 8-bit Output

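Absolute values of Gx and Gy are summed. The sum is an 11-bit value: its maximum is 2×3×255 = 1530, reached on a diagonal step edge (even the naive bound 2×1020 = 2040 fits in 11 bits). It is clamped to 255 before being output as edge_out[7:0], and valid_out is asserted in the same cycle.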
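Below is a Python sketch of Stage 4 together with its worst case. The bound follows from |Gx| + |Gy| = max(|Gx + Gy|, |Gx − Gy|). From the expanded equations, Gx + Gy = 2(p12 + p21 + p22 − p00 − p01 − p10), which is at most 2×3×255 = 1530, and Gx − Gy = 2(p01 + p02 + p12 − p10 − p20 − p21) is bounded the same way. `magnitude_clamp` is a hypothetical helper name.

```python
def magnitude_clamp(gx, gy):
    """Stage 4 model: return (|Gx| + |Gy|, value saturated to 8 bits)."""
    mag = abs(gx) + abs(gy)           # at most 1530, fits in 11 unsigned bits
    edge = 255 if mag > 255 else mag  # clamp, since edge_out is only 8 bits wide
    return mag, edge


# Worst case: a diagonal step with p00, p01, p10 at 0, p12, p21, p22 at 255, the rest at 0.
# Gx = (0 + 2*255 + 255) - 0 = 765 and Gy = (0 + 2*255 + 255) - 0 = 765.
print(magnitude_clamp(765, 765))  # (1530, 255)
print(magnitude_clamp(-40, 25))   # (65, 65)
```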

IP Interface — Port List
Pixel-streaming interface — 1 pixel per clock, backpressure-free
Port         Width  Dir  Description
clk          1      IN   System clock, rising-edge active
rst_n        1      IN   Active-low synchronous reset; clears all line buffers and pipeline registers
frame_width  11     IN   Horizontal pixel count of the frame (11 bits, so at most 2047; must not exceed IMG_W). Must be stable during a frame
valid_in     1      IN   Assert high when pixel_in holds a valid pixel. No gaps allowed mid-row
pixel_in     8      IN   Grayscale pixel value, 8-bit unsigned (0 = black, 255 = white)
valid_out    1      OUT  High when edge_out is valid. Delayed by 2 rows + 4 pipeline stages after the first valid_in
edge_out     8      OUT  Edge magnitude = |Gx|+|Gy|, clamped to 8 bits. Bright = strong edge, 0 = flat region
Timing Diagram
Pixel flow during steady-state operation (after line buffer fill)
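[Timing diagram, steady state] Signals: clk, valid_in, pixel_in, valid_out, edge_out. pixel_in carries P[n] to P[n+4] on consecutive clocks with valid_in high. After the 4-cycle pipeline latency, edge_out carries E[n] to E[n+4] with valid_out high. E[n] is the edge magnitude of the 3×3 window completed by P[n]: P[n] is the window's bottom-right pixel, and the window's centre pixel arrived one row and one column earlier.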
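The 4-cycle latency comes from the valid strobe travelling through one register per stage alongside the data. Below is a minimal Python model. It assumes exactly one register per stage and line buffers that are already full. `pipeline_valid` is a hypothetical name.

```python
from collections import deque


def pipeline_valid(valid_in_seq, stages=4):
    """Delay a valid strobe through `stages` registers; return valid_out per clock."""
    regs = deque([0] * stages, maxlen=stages)  # regs[0] = first stage, regs[-1] = last
    out = []
    for v in valid_in_seq:
        out.append(regs[-1])  # valid_out during this cycle = last register's value
        regs.appendleft(v)    # clock edge: every register shifts by one stage
    return out


print(pipeline_valid([1, 1, 1, 1, 1, 0, 0, 0, 0, 0]))
# [0, 0, 0, 0, 1, 1, 1, 1, 1, 0]
```

The same shift-register pattern carries the pixel data itself, which keeps valid_out aligned with edge_out.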
Verilog Top-Level Module
Port declaration that the full RTL will implement
Verilog sobel_core.v
module sobel_core #(
    parameter IMG_W = 640   // max frame width (sets line-buffer depth)
) (
    input  wire        clk,
    input  wire        rst_n,
    // input stream
    input  wire        valid_in,
    input  wire [7:0]  pixel_in,   // 8-bit grayscale
    input  wire [10:0] frame_width, // actual columns this frame (must be <= IMG_W)
    // output stream
    output reg         valid_out,
    output reg  [7:0]  edge_out    // |Gx|+|Gy| clamped to 8-bit
);

// ─── Stage 1: Line buffers ──────────────────────────────────────
// Three circular FIFOs of depth IMG_W hold rows n-2, n-1, n

// ─── Stage 2: 3×3 sliding window register ──────────────────────
// reg [7:0] p[0:2][0:2]; (row, col)

// ─── Stage 3: Gx / Gy (combinational, registered at output) ────
// Gx = (p[0][2]+2*p[1][2]+p[2][2]) - (p[0][0]+2*p[1][0]+p[2][0])
// Gy = (p[2][0]+2*p[2][1]+p[2][2]) - (p[0][0]+2*p[0][1]+p[0][2])

// ─── Stage 4: magnitude + clip ─────────────────────────────────
// mag = |Gx| + |Gy|; edge_out = (mag > 255) ? 255 : mag[7:0]

endmodule
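A software golden model can pin down what edge_out should contain before the RTL exists, and the planned testbench output could later be compared against it. Below is a minimal Python sketch. It assumes the image is given as a list of rows of 8-bit values and skips borders, since border handling is not specified on this page. `sobel_reference` is a hypothetical name.

```python
def sobel_reference(img):
    """Golden model: clamped |Gx| + |Gy| for every interior pixel of a 2-D list.

    Border rows/columns are not produced (an assumption: border handling is
    not specified by the architecture).
    """
    h, w = len(img), len(img[0])
    out = []
    for r in range(1, h - 1):
        row = []
        for c in range(1, w - 1):
            # p[row][col] with p[1][1] as the centre pixel img[r][c]
            p = [img[r + i][c - 1:c + 2] for i in (-1, 0, 1)]
            gx = (p[0][2] + 2 * p[1][2] + p[2][2]) - (p[0][0] + 2 * p[1][0] + p[2][0])
            gy = (p[2][0] + 2 * p[2][1] + p[2][2]) - (p[0][0] + 2 * p[0][1] + p[0][2])
            row.append(min(abs(gx) + abs(gy), 255))
        out.append(row)
    return out


frame = [[0, 0, 50, 50] for _ in range(4)]
print(sobel_reference(frame))  # [[200, 200], [200, 200]]
```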
What Will Be Implemented
Planned deliverables for SobelCore-01 — this page is the architecture phase
Architecture Page

Theory, block diagrams, port spec, pipeline timing — this page

Verilog RTL — DUT

sobel_core.v — line buffers, 3×3 window, Gx/Gy, magnitude

Verilog Testbench

sobel_tb.v — reads grayscale hex, drives DUT, writes edge hex

Python Pipeline

gray → hex → iverilog → edge → PNG reconstruction

Browser Upload Tool

Upload any image → run Sobel DUT → download edge image

Synthesis Report

Yosys → SKY130 — gate count, area, critical path depth
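The "gray → hex" and "edge → PNG" steps of the planned Python pipeline need a text format that both Python and the testbench agree on. Below is a minimal sketch, assuming one two-digit hex value per line in row-major order, a layout that Verilog's $readmemh can load. The function names are hypothetical, and PNG encoding and decoding are left out.

```python
def to_hex_text(img):
    """Row-major, one two-digit hex value per line: a layout $readmemh accepts."""
    return "".join(f"{px:02x}\n" for row in img for px in row)


def from_hex_text(text, width):
    """Parse whitespace-separated hex values back into rows of `width` pixels."""
    vals = [int(tok, 16) for tok in text.split()]
    return [vals[i:i + width] for i in range(0, len(vals), width)]


gray = [[0, 128, 255],
        [16, 32, 64]]
text = to_hex_text(gray)
print(text.split())                    # ['00', '80', 'ff', '10', '20', '40']
print(from_hex_text(text, 3) == gray)  # True
```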

Related ASIC IP
LumaCore-01 — RGB to Grayscale ASIC IP
The upstream IP that feeds grayscale pixels to SobelCore-01