You don't need a fab, a billion dollars, or a 500-person team. Soft IP is the highest-margin business in semiconductors — and you can start it with open-source tools, a laptop, and deep RTL expertise.
This is the exact sequence every successful Soft IP company follows. Each stage builds on the last. Don't skip — especially legal and packaging. Those two kill more startups than bad engineering.
Every tool below is free, open-source, and used in real production IP development. You do not need a Synopsys or Cadence license to build and verify a sellable Soft IP.
| Tool | Category | What It Does | Get It |
|---|---|---|---|
| Verilator | Simulation | Fastest open-source RTL simulator; converts SV/Verilog to C++ for cycle-accurate simulation. Industry-trusted. | verilator.org |
| Icarus Verilog (iverilog) | Simulation | Classic event-driven Verilog simulator. Fast setup, good for directed testbenches and quick checks. | iverilog.icarus.com |
| cocotb | Verification | Python-based coroutine testbench framework — write UVM-equivalent testbenches in Python. Huge community. | cocotb.org |
| VUnit | Verification | VHDL/SystemVerilog unit test framework with automatic test discovery and CI integration. | vunit.github.io |
| GTKWave | Debug | Waveform viewer for VCD/FST/LXT files. View simulation output and debug RTL behaviour. | gtkwave.sourceforge.net |
| Yosys | Synthesis | Open-source RTL synthesis framework. Produces netlists from Verilog; also the engine behind SymbiYosys formal verification, with VHDL input available through the GHDL plugin. | yosyshq.net |
| OpenROAD | Physical Design | Full open-source RTL-to-GDSII flow: floorplan, placement, CTS, routing. Supports Sky130 and GF180 PDKs. | openroad.tools |
| SymbiYosys (sby) | Formal Verification | Property checking with SVA/PSL using open-source solvers (Boolector, Yices, Z3). Prove your IP correct. | symbiyosys.readthedocs.io |
| SkyWater Sky130 PDK | PDK (130nm) | Google-sponsored open PDK for 130nm. Enables full GDSII tapeout via OpenROAD. Great for IP validation. | github.com/google/skywater-pdk |
| KLayout | Layout Viewer | GDSII layout viewer and editor. View your synthesised layout after OpenROAD place-and-route. | klayout.de |
| Surfer | Waveform | Modern waveform viewer (Rust-based) — fast, supports large simulation dumps, browser-native option. | surfer-project.org |
| Slang | Lint | SystemVerilog language server and linter — catches RTL errors before simulation. VS Code integration. | sv-lang.com |
New to any of these tools? Read the step-by-step workflow guide: Open-Source EDA Tools — Git → iverilog → GTKWave → Verilator → cocotb → Yosys →
Let's walk through a real Soft IP product end to end — from RTL to revenue. We'll build a 32-channel AXI4 DMA Controller. This is one of the most purchased IPs in SoC design — every chip with DDR memory and a processor needs one.
| Parameter | Value |
|---|---|
| Channels | 32 independent DMA channels, individually configurable |
| Data width | 32 / 64 / 128-bit (parameterised) |
| Address space | 32-bit or 64-bit (parameterised) |
| Bus interface | AXI4 Master (data) + AXI4-Lite Slave (config/status) |
| Transfer modes | Memory-to-Memory, Peripheral-to-Memory, Memory-to-Peripheral, Scatter-Gather |
| Interrupts | Per-channel completion/error interrupt, coalescing support |
| Gate count | ~25K gates at 128-bit, 32-channel configuration |
| Max frequency | 800 MHz on TSMC 7nm; 500 MHz on Sky130 |
| Deliverables | RTL (SV), UVM testbench, cocotb testbench, SDC, synthesis scripts, 80-page datasheet, IP-XACT descriptor |
```systemverilog
// axi4_dma_top.sv: top-level wrapper
module axi4_dma_top #(
  parameter int CH        = 32,   // number of channels
  parameter int DW        = 64,   // data width: 32/64/128
  parameter int AW        = 32,   // address width: 32/64
  parameter int BURST_LEN = 256   // max AXI burst length
)(
  input  logic              clk, rst_n,

  // AXI4-Lite config slave
  input  logic [11:0]       s_axi_awaddr,
  input  logic              s_axi_awvalid,
  output logic              s_axi_awready,
  input  logic [31:0]       s_axi_wdata,
  input  logic              s_axi_wvalid,
  output logic              s_axi_wready,
  output logic [1:0]        s_axi_bresp,
  output logic              s_axi_bvalid,
  input  logic              s_axi_bready,
  input  logic [11:0]       s_axi_araddr,
  input  logic              s_axi_arvalid,
  output logic              s_axi_arready,
  output logic [31:0]       s_axi_rdata,
  output logic [1:0]        s_axi_rresp,
  output logic              s_axi_rvalid,
  input  logic              s_axi_rready,

  // AXI4 data master
  output logic [AW-1:0]     m_axi_araddr,
  output logic [7:0]        m_axi_arlen,
  output logic              m_axi_arvalid,
  input  logic              m_axi_arready,
  input  logic [DW-1:0]     m_axi_rdata,
  input  logic [1:0]        m_axi_rresp,
  input  logic              m_axi_rvalid,
  output logic              m_axi_rready,
  output logic [AW-1:0]     m_axi_awaddr,
  output logic [7:0]        m_axi_awlen,
  output logic              m_axi_awvalid,
  input  logic              m_axi_awready,
  output logic [DW-1:0]     m_axi_wdata,
  output logic [DW/8-1:0]   m_axi_wstrb,
  output logic              m_axi_wlast,
  output logic              m_axi_wvalid,
  input  logic              m_axi_wready,
  input  logic [1:0]        m_axi_bresp,
  input  logic              m_axi_bvalid,
  output logic              m_axi_bready,

  output logic [CH-1:0]     irq   // per-channel interrupt
);

  // Internal wiring
  logic [CH-1:0]            ch_en;
  logic [CH-1:0][AW-1:0]    src_addr, dst_addr;
  logic [CH-1:0][23:0]      xfer_len;
  logic [CH-1:0]            ch_done, ch_err;

  axi4_dma_regfile   #(.CH(CH))                   u_reg   (.*);  // AXI-Lite CSR block
  axi4_dma_scheduler #(.CH(CH))                   u_sched (.*);  // round-robin arbiter
  axi4_dma_engine    #(.CH(CH), .DW(DW), .AW(AW)) u_eng   (.*);  // burst engine
  axi4_dma_irq       #(.CH(CH))                   u_irq   (.*);  // interrupt controller

endmodule
```
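The `DW` and `BURST_LEN` parameters interact with two AXI4 rules: an INCR burst carries at most 256 beats, and no burst may cross a 4 KB address boundary. The sketch below is a small Python reference model of how a burst engine could split one transfer into legal bursts. It is an illustration under stated assumptions, not the engine's actual algorithm: the function name `split_into_bursts` is hypothetical, and it only handles transfers aligned to the bus width.

```python
def split_into_bursts(addr, length, data_width_bits=64, max_beats=256):
    """Split [addr, addr + length) into AXI4 INCR bursts.

    Returns a list of (start_address, beats) tuples. Each burst stays within
    max_beats and never crosses a 4 KB boundary. Hypothetical helper that
    assumes addr and length are multiples of the bus width in bytes.
    """
    bytes_per_beat = data_width_bits // 8
    if addr % bytes_per_beat or length % bytes_per_beat:
        raise ValueError("this sketch only handles bus-aligned transfers")

    bursts = []
    end = addr + length
    while addr < end:
        to_boundary = 4096 - (addr % 4096)       # bytes left before the 4 KB edge
        max_bytes = max_beats * bytes_per_beat   # burst-length limit in bytes
        chunk = min(end - addr, to_boundary, max_bytes)
        bursts.append((addr, chunk // bytes_per_beat))
        addr += chunk
    return bursts


if __name__ == "__main__":
    # 256 bytes on a 64-bit bus: one 32-beat burst (ARLEN/AWLEN would be 31)
    print(split_into_bursts(0x0000, 256))
    # Same length starting 128 bytes below a 4 KB boundary: two 16-beat bursts
    print(split_into_bursts(0x0F80, 256))
```

A model like this can double as a scoreboard in a Python testbench: compare the `(address, beats)` pairs it predicts against the `m_axi_araddr`/`m_axi_arlen` values the DUT actually issues.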
```python
import cocotb
from cocotb.clock import Clock
from cocotb.triggers import RisingEdge, Timer
from cocotbext.axi import AxiBus, AxiLiteBus, AxiLiteMaster, AxiRam


@cocotb.test()
async def test_mem_to_mem_transfer(dut):
    """Verify single M2M DMA transfer on channel 0"""
    clock = Clock(dut.clk, 5, units="ns")  # 200 MHz
    cocotb.start_soon(clock.start())

    # AXI-Lite master drives the CSR port; AxiRam is the memory model behind
    # the DMA's AXI4 master port. reset_active_level=False matches rst_n.
    axil = AxiLiteMaster(AxiLiteBus.from_prefix(dut, "s_axi"), dut.clk, dut.rst_n,
                         reset_active_level=False)
    ram = AxiRam(AxiBus.from_prefix(dut, "m_axi"), dut.clk, dut.rst_n,
                 reset_active_level=False, size=0x10000)

    # Reset (active-low)
    dut.rst_n.value = 0
    await Timer(100, units="ns")
    dut.rst_n.value = 1
    await RisingEdge(dut.clk)

    # Load test pattern into source region
    data = bytes(range(256))
    ram.write(0x0000, data)

    # Program DMA channel 0: src=0x0000, dst=0x1000, len=256
    # write_dword sends a 32-bit integer; write() expects a bytes payload.
    await axil.write_dword(0x000, 0x00000000)  # CH0_SRC_ADDR
    await axil.write_dword(0x004, 0x00001000)  # CH0_DST_ADDR
    await axil.write_dword(0x008, 0x00000100)  # CH0_LEN = 256 bytes
    await axil.write_dword(0x00C, 0x00000001)  # CH0_CTRL: enable

    # Wait for interrupt, failing the test if it never arrives
    for _ in range(10000):
        await RisingEdge(dut.clk)
        if dut.irq.value & 1:
            break
    else:
        raise AssertionError("DMA timeout: no IRQ on ch0")

    # Verify destination
    result = ram.read(0x1000, 256)
    assert result == data, f"Data mismatch: {result[:8]} != {data[:8]}"
    dut._log.info("PASS: M2M DMA transfer verified")
```
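The test above hard-codes the channel 0 register offsets (`0x000`, `0x004`, `0x008`, `0x00C`). A small address helper keeps tests for other channels readable. The sketch below assumes every channel repeats the same four registers at a `0x10` stride. That stride is an assumption for illustration, since the listing only shows channel 0, and `reg_addr` is a hypothetical name.

```python
CSR_ADDR_BITS = 12   # width of s_axi_awaddr / s_axi_araddr in the top-level wrapper
CH_STRIDE = 0x10     # ASSUMPTION: four 32-bit registers per channel, packed back to back

# Offsets within one channel's register block, taken from the channel 0 writes
REG_OFFSETS = {"SRC_ADDR": 0x0, "DST_ADDR": 0x4, "LEN": 0x8, "CTRL": 0xC}


def reg_addr(channel, reg, num_channels=32):
    """Return the AXI4-Lite byte address of `reg` for `channel` (hypothetical map)."""
    if not 0 <= channel < num_channels:
        raise ValueError(f"channel {channel} out of range 0..{num_channels - 1}")
    addr = channel * CH_STRIDE + REG_OFFSETS[reg]
    if addr >= (1 << CSR_ADDR_BITS):
        raise ValueError(f"address {addr:#x} does not fit in {CSR_ADDR_BITS} bits")
    return addr


if __name__ == "__main__":
    for ch in (0, 1, 31):
        print(ch, {name: hex(reg_addr(ch, name)) for name in REG_OFFSETS})
```

Under this assumed layout, 32 channels occupy 512 bytes, which fits comfortably in the 12-bit CSR address space. A 64-bit `AW` configuration would need wider or split address registers, which this sketch does not model.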
```tcl
## synth.tcl: quick gate-count estimate with Yosys (run with: yosys -c synth.tcl)
yosys -import

# Read RTL
read_verilog -sv rtl/axi4_dma_top.sv
read_verilog -sv rtl/axi4_dma_regfile.sv
read_verilog -sv rtl/axi4_dma_scheduler.sv
read_verilog -sv rtl/axi4_dma_engine.sv
read_verilog -sv rtl/axi4_dma_irq.sv

# Synthesize against generic cells
hierarchy -check -top axi4_dma_top
synth -top axi4_dma_top
stat  ;# prints cell, wire, and memory counts for the generic-cell netlist
write_verilog -noattr netlist/axi4_dma_netlist.v
```
Use three tiers. Most IP companies do. This maximises revenue across customer sizes without losing small design teams or leaving money on the table with large SoC houses.
| Tier | Price (AXI4-DMA example) | What They Get | Who Buys |
|---|---|---|---|
| Evaluation | Free (GitHub) | Lite RTL (4-channel, no scatter-gather), simulation scripts, basic docs | Engineers evaluating feasibility |
| Standard License | $25,000 upfront | Full 32-channel RTL, complete testbench, datasheet, 1 year email support, 1 project/1 chip family | Startups, university spin-outs, low-volume designs |
| Enterprise License | $80,000 upfront + $0.10/chip royalty | All Standard deliverables + source NDA + unlimited projects + phone support + 3 years updates | Mid-size SoC companies, automotive, IoT at scale |
Royalties are the real wealth engine. Even a modest royalty rate on a mass-market chip compounds into serious annual income:
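As a rough illustration, the sketch below applies the Enterprise tier's $0.10 per-chip royalty to a few annual shipment volumes. The volumes are hypothetical round numbers chosen for the example, not figures from any customer.

```python
ROYALTY_CENTS_PER_CHIP = 10   # $0.10 per chip, from the Enterprise tier
UPFRONT_FEE_USD = 80_000      # Enterprise upfront fee


def annual_royalty_usd(chips_per_year, cents_per_chip=ROYALTY_CENTS_PER_CHIP):
    """Royalty income in USD for one customer shipping `chips_per_year` chips."""
    return chips_per_year * cents_per_chip / 100


if __name__ == "__main__":
    # Hypothetical volumes: niche product, mid-volume SoC, mass-market part
    for chips in (100_000, 1_000_000, 10_000_000):
        royalty = annual_royalty_usd(chips)
        print(f"{chips:>10,} chips/yr -> ${royalty:>10,.0f}/yr "
              f"({royalty / UPFRONT_FEE_USD:.2f}x the upfront fee)")
```

At one million chips a year, a single design win already pays more in annual royalties than the upfront fee, and that income repeats for every year the chip ships.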
| Channel | How It Works | Revenue Share |
|---|---|---|
| Direct (your website) | Customer finds you via GitHub/blog/LinkedIn, buys directly via contract + wire transfer | 100% yours |
| ChipEstimate.com | IP marketplace — list your IP, customers browse and request datasheets | ~15% commission |
| Synopsys DesignWare Partner | Synopsys sells your IP alongside their portfolio to their installed base of 200+ customers | ~30–40% to Synopsys |
| Cadence IP Exchange | Similar to DesignWare partner — listed in Cadence's IP catalogue | ~30% to Cadence |
| GitHub Sponsors + eval | Open a free evaluation version; channel inbound leads to paid license | 100% on conversion |
| Item | Cost | Notes |
|---|---|---|
| EDA Tools | $0 | Yosys, Verilator, cocotb, GTKWave — all free |
| FPGA Development Board | $200–$500 (one-time) | Xilinx Arty A7 or Terasic DE10 for prototyping |
| Company incorporation | $500–$2,000 | LLC (US) or LLP (India) — use online services |
| IP attorney (license agreement) | $2,000–$5,000 | Do this once, reuse the template. Non-negotiable. |
| Copyright registration | $35–$65 | US Copyright Office or equivalent |
| Website + domain | $100–$200/yr | GitHub Pages is free; use it until you have revenue |
| Conference (DAC, DVCON) | $1,000–$3,000/event | Optional in year 1; direct outreach is more efficient |
| Total Year 1 | ~$5,000–$12,000 | Mostly your time. Tiny for a software business. |
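To sanity-check the budget, the short sketch below sums the itemized rows and compares the result with one Standard license at $25,000. It assumes a single conference in year 1. On that assumption the rows come out slightly below the table's rounded total, which leaves some headroom.

```python
# (low, high) cost in USD for each row of the table; one conference assumed
COSTS = {
    "EDA tools": (0, 0),
    "FPGA development board": (200, 500),
    "Company incorporation": (500, 2_000),
    "IP attorney": (2_000, 5_000),
    "Copyright registration": (35, 65),
    "Website + domain": (100, 200),
    "Conference (one event)": (1_000, 3_000),
}
STANDARD_LICENSE_USD = 25_000  # Standard tier price from the pricing table

low = sum(lo for lo, _ in COSTS.values())
high = sum(hi for _, hi in COSTS.values())

if __name__ == "__main__":
    print(f"Itemized year-1 spend: ${low:,} to ${high:,}")
    print(f"One Standard license covers the high end "
          f"{STANDARD_LICENSE_USD / high:.1f} times over")
```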
Not immediately. Many IP startups begin as side projects. Build and verify the IP on evenings and weekends, get one paying customer, then decide if the revenue justifies going full-time. The critical caveat: check your employment contract. Some companies have IP assignment clauses that cover work done in your spare time if it's related to your employer's business. When in doubt, get legal advice before sharing the IP commercially.
Several layers of protection:
1. NDA before sharing — always sign a mutual NDA before sharing any RTL or datasheet with a prospect. This is a legal deterrent and creates liability if they breach it.
2. License agreement scope — limit the license to specific projects, chip families, or volume. "Unlimited" licenses cost significantly more.
3. Copyright registration — your RTL is automatically copyrighted when written, but registered copyright makes litigation much easier and enables statutory damages.
4. Obfuscation for evaluation — the free GitHub version can use obfuscated or truncated code that proves the interface works but not the full implementation.
5. Reputation — the semiconductor industry is small. Companies that steal IP get blacklisted. This is a stronger deterrent than lawyers.
Per-project license: The customer pays to use your IP in one specific chip design. If they design a second chip using your IP, they pay again. This is the most common model for complex IP. Revenue scales with the customer's design activity.
Per-seat license: The customer pays for each engineer who can access the IP. Used more for tools and EDA software than for RTL IP, but sometimes seen for verification IP (VIP).
Perpetual royalty-free: One payment, forever, unlimited chips. Used when the customer wants certainty. Price this 5–10× higher than a per-project license to compensate for losing future royalties.
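One way to reason about the 5–10× multiplier is to ask how many per-project licenses a customer would need before the perpetual deal becomes cheaper. The sketch below uses the $25,000 Standard price purely as an illustrative per-project fee; `breakeven_projects` is a hypothetical helper name.

```python
import math


def breakeven_projects(per_project_fee, perpetual_fee):
    """Smallest number of per-project licenses whose total reaches the perpetual fee."""
    return math.ceil(perpetual_fee / per_project_fee)


if __name__ == "__main__":
    PER_PROJECT_USD = 25_000  # illustrative: the Standard tier price
    for multiple in (5, 10):
        perpetual = multiple * PER_PROJECT_USD
        n = breakeven_projects(PER_PROJECT_USD, perpetual)
        print(f"{multiple}x pricing: perpetual = ${perpetual:,}, "
              f"break-even at {n} projects")
```

A customer planning fewer chip designs than the break-even count is better off paying per project, so the multiplier is effectively your estimate of how many designs they will do with the IP.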
The chicken-and-egg problem. Four approaches that work:
1. Evaluation version on GitHub — engineers find it, run it, trust it through code quality. Engineers are the real decision-makers for IP selection at small companies.
2. University / research partnerships — offer a free research license to a university lab that publishes papers using your IP. Published papers with your IP name are social proof.
3. Former employer as first customer — if you built expertise at a large company, offer them a discounted first-customer deal. They know your quality.
4. Small startup as first customer — target a 5–30 person fabless startup. They can't afford full ARM licenses. Offer a founder-to-founder deal: lower upfront, keep royalties. They get affordable IP, you get a design win reference.
It depends entirely on the open-source license:
MIT / Apache 2.0: Yes, you can use the code as a base, modify it, and sell the result commercially. Attribution required. No copyleft.
CERN-OHL-S / GPL: Copyleft — any derivative must also be open-sourced. You cannot sell a closed commercial derivative.
Solderpad / CERN-OHL-P / CERN-OHL-W: Permissive or weak copyleft — commercial derivatives allowed with conditions.
Best approach: build your core IP from scratch (full ownership), and optionally use MIT/Apache libraries for testbench utilities or helper scripts. Always check the license of every file you incorporate.
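Checking every file is easier with a script. The sketch below scans a source tree for `SPDX-License-Identifier` headers and sorts each file into the three buckets described above. It is a first-pass filter under stated assumptions, not legal advice: it assumes files carry SPDX headers, reads only the first identifier on the line, and the `classify` and `scan` names are hypothetical.

```python
import re
from pathlib import Path

# Matches e.g. "// SPDX-License-Identifier: Apache-2.0" or "/* ... MIT */"
SPDX_RE = re.compile(r"SPDX-License-Identifier:\s*([^\s*]+)")

PERMISSIVE = ("MIT", "Apache-2.0")
COPYLEFT_PREFIXES = ("GPL-", "CERN-OHL-S")                        # derivatives must be open
CONDITIONAL_PREFIXES = ("SHL-", "CERN-OHL-P", "CERN-OHL-W")       # allowed with conditions


def classify(spdx_id):
    """Map one SPDX identifier to the buckets used in the text above."""
    if spdx_id.startswith(COPYLEFT_PREFIXES):
        return "copyleft"
    if spdx_id.startswith(CONDITIONAL_PREFIXES):
        return "conditions"
    if spdx_id in PERMISSIVE:
        return "permissive"
    return "unknown"


def scan(root):
    """Return {path: bucket} for every file under root; 'missing' if no header."""
    report = {}
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        head = path.read_text(errors="ignore")[:2048]  # headers sit near the top
        match = SPDX_RE.search(head)
        report[str(path)] = classify(match.group(1)) if match else "missing"
    return report


if __name__ == "__main__":
    for path, bucket in scan("rtl").items():
        if bucket != "permissive":
            print(f"REVIEW {bucket:<10} {path}")
```

Anything reported as `copyleft`, `unknown`, or `missing` deserves a manual look before it goes anywhere near a commercial deliverable.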
For a focused, well-defined IP like a UART/SPI controller: 2–4 months for an experienced RTL engineer working part-time.
For a medium-complexity IP like an AXI DMA controller: 6–12 months part-time, or 3–6 months full-time. This includes RTL, verification, documentation, and synthesis validation.
For a complex IP like a full RISC-V core: 1–3 years full-time with a small team.
The documentation and verification typically take as long as the RTL design itself. Don't underestimate this — it's what separates a product from a prototype.
| Week | Milestone | Output |
|---|---|---|
| Week 1–2 | Choose your niche IP | Written 1-page product definition: what it does, target customers, competitive landscape |
| Week 3–6 | Design RTL v0.1 | Working Verilog/SV module passing basic directed sim, checked into GitHub |
| Week 7–8 | Write testbench | cocotb or UVM TB running 20+ test cases; lint-clean with Verilator |
| Week 9–10 | FPGA prototype | IP running on Arty A7 or DE10-Lite, actual hardware behavior confirmed |
| Week 11–12 | Documentation v1 | 20-page datasheet: block diagram, port list, programming model, timing diagrams |
| Week 12 | First outreach | 10 LinkedIn messages sent to IC design managers. First prospect meeting booked. |
The semiconductor industry generates hundreds of billions of dollars per year, but most people only hear about it through chip manufacturers like TSMC or fabless design houses like Qualcomm and NVIDIA. Hidden behind these names is a quieter, more profitable layer: the IP vendors — companies that sell the reusable building blocks that every chip is assembled from.
Soft IP is the software of the hardware world. A well-designed AXI DMA controller, once built, can be licensed hundreds of times across different customers and generations of silicon. Each license is nearly pure margin. Each chip shipped earns a royalty. Unlike a chip startup — which burns tens of millions building a single product before generating a dollar — an IP company can reach profitability with two or three engineers and a handful of customers.
Until recently, professional semiconductor development required expensive Synopsys or Cadence licenses that put the barrier well beyond individual engineers. That barrier is gone. Yosys synthesises production RTL. Verilator simulates it at near-commercial speed. With cocotb, Python testbenches can rival UVM environments. OpenROAD routes chips to GDSII. SkyWater's Sky130 PDK tapes out silicon for free through the Google-sponsored Open MPW program.
The infrastructure that once required a $10M EDA budget is now free. What remains scarce is deep domain expertise — the ability to design a DMA controller that handles every corner case, or a crypto core that meets FIPS 140-3, or a DDR PHY that closes timing on your customer's specific process. That expertise is what engineers spend careers building. It is also exactly what you can package and sell.
The opportunity has never been larger, the tools have never been more accessible, and the world has never needed more semiconductor IP — every AI chip, every EV controller, every 5G radio is assembled from dozens of licensed IP blocks. The question is not whether this business model works. ARM, SiFive, CEVA, and Rambus proved it does. The question is whether you will build the next one.