Arty A7 Resource Budget
| Resource | Arty A7-35T Available | Our SoC Usage | Utilisation |
|---|---|---|---|
| LUTs | 20,800 | ~14,500 | ~70% |
| Flip-Flops | 41,600 | ~8,200 | ~20% |
| Block RAM (36 Kb) | 50 | 8 (288 Kb = 36 KB, SRAM) | 16% |
| DSP48E1 | 90 | 64 (4×4 array, 4 DSPs/PE) | 71% |
| Clock (MMCM) | 5 | 1 | 20% |
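The utilisation column can be re-derived from the other two columns. The sketch below does that arithmetic in Python; the `over_threshold` helper and the 65% threshold are illustrative assumptions, not part of the original flow or a Vivado rule.

```python
# Resource budget copied from the table: name -> (available, used).
BUDGET = {
    "LUTs": (20_800, 14_500),
    "Flip-Flops": (41_600, 8_200),
    "Block RAM (36 Kb)": (50, 8),
    "DSP48E1": (90, 64),
    "MMCM": (5, 1),
}


def utilisation(available: int, used: int) -> float:
    """Return utilisation as a percentage of the available resource."""
    return 100.0 * used / available


def over_threshold(budget: dict, limit_pct: float) -> list:
    """Hypothetical helper: names of resources whose utilisation exceeds limit_pct."""
    return [name for name, (avail, used) in budget.items()
            if utilisation(avail, used) > limit_pct]


if __name__ == "__main__":
    for name, (avail, used) in BUDGET.items():
        print(f"{name:<18} {used:>6} / {avail:<6} {utilisation(avail, used):5.1f}%")
    # 65% is an arbitrary illustrative threshold.
    print("Above 65%:", over_threshold(BUDGET, 65.0))
```

With these numbers, LUTs and DSP48E1 slices are the two resources with the least headroom, so they are the ones to watch when the accelerator array grows.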
XDC — Arty A7 constraint file

    ## Clock: 100 MHz on-board oscillator (pin E3 on the Arty A7)
    set_property PACKAGE_PIN E3 [get_ports CLK100MHZ]
    set_property IOSTANDARD LVCMOS33 [get_ports CLK100MHZ]
    create_clock -period 10.000 -name sys_clk [get_ports CLK100MHZ]

    ## SoC clock: derive 50 MHz from MMCM
    create_generated_clock -name soc_clk \
        -source [get_pins mmcm_inst/CLKIN1] \
        -divide_by 2 [get_pins mmcm_inst/CLKOUT0]

    ## UART pins (Arty A7 USB-UART bridge)
    ## FPGA TX drives D10 (board net uart_rxd_out); FPGA RX listens on A9 (uart_txd_in)
    set_property PACKAGE_PIN D10 [get_ports uart_tx]
    set_property IOSTANDARD LVCMOS33 [get_ports uart_tx]
    set_property PACKAGE_PIN A9 [get_ports uart_rx]
    set_property IOSTANDARD LVCMOS33 [get_ports uart_rx]

    ## Reset button (BTN0). Arty push-buttons read high when pressed,
    ## so an active-low rst_n must be inverted in the RTL.
    set_property PACKAGE_PIN D9 [get_ports rst_n]
    set_property IOSTANDARD LVCMOS33 [get_ports rst_n]

    ## LEDs: show accelerator status
    set_property PACKAGE_PIN H5 [get_ports {led[0]}]
    set_property IOSTANDARD LVCMOS33 [get_ports {led[0]}]
    set_property PACKAGE_PIN J5 [get_ports {led[1]}]
    set_property IOSTANDARD LVCMOS33 [get_ports {led[1]}]

    ## Timing exceptions: false paths on async reset
    set_false_path -from [get_ports rst_n]

    ## Input/output delay (UART)
    set_input_delay -clock soc_clk 2.0 [get_ports uart_rx]
    set_output_delay -clock soc_clk 2.0 [get_ports uart_tx]

TCL — Vivado build script

    # create_project.tcl
    create_project riscv_accel ./vivado_proj -part xc7a35ticsg324-1L
    add_files -fileset sources_1 [glob ./rtl/*.v]
    add_files -fileset constrs_1 ./constrs/arty_a7.xdc
    set_property top riscv_accel_soc [current_fileset]

    # Synthesis
    launch_runs synth_1 -jobs 4
    wait_on_run synth_1

    # Implementation (through bitstream generation)
    launch_runs impl_1 -to_step write_bitstream -jobs 4
    wait_on_run impl_1

    # Check timing
    open_run impl_1
    report_timing_summary -file timing_summary.rpt
    report_utilization -file utilization.rpt

    # Program board (if connected)
    open_hw_manager
    connect_hw_server
    open_hw_target
    set hw_dev [get_hw_devices xc7a35t_0]
    current_hw_device $hw_dev
    # Point the device at the bitstream written by impl_1 (default project-mode path)
    set_property PROGRAM.FILE ./vivado_proj/riscv_accel.runs/impl_1/riscv_accel_soc.bit $hw_dev
    program_hw_devices $hw_dev

Day 13 — Interview Questions
Q1. What is the difference between synthesis and implementation in the FPGA flow?
Synthesis translates RTL (Verilog/VHDL) into a netlist of FPGA primitives: LUTs, flip-flops, DSP48 blocks, block RAMs, and I/O buffers. It performs logic optimisation (Boolean minimisation, optional retiming) and maps the design onto the target device's primitive library. In Vivado the result is a synthesised netlist saved as a .dcp checkpoint.

Implementation takes the synthesised netlist and performs: (1) Placement: assigns each primitive to a physical location on the FPGA die. (2) Routing: connects the placed primitives using the FPGA's programmable interconnect. (3) Timing analysis: checks that every constrained path meets its clock period. Implementation usually takes longer than synthesis (hours for large designs) because placement and routing must be solved as a constrained optimisation problem.

Synthesis failures usually indicate RTL or library issues; implementation failures are typically timing-closure or resource-overuse problems.
Q2. What is setup slack and how do you fix negative slack in Vivado?
Setup slack = (clock period) − (data path delay) − (setup time); Vivado's timing report also accounts for clock skew and clock uncertainty. Negative slack means the data path is too slow: the signal does not arrive at the destination flip-flop early enough before the capturing clock edge.

Fixes:

1. Reduce the clock frequency (increase the period constraint). This is the simplest fix but costs performance.
2. Add pipeline registers to break long combinational paths, i.e. insert an FF in the middle of the critical path.
3. Enable retiming in synthesis so Vivado moves registers across logic automatically: `synth_design -retiming`, or in project mode `set_property STEPS.SYNTH_DESIGN.ARGS.RETIMING true [get_runs synth_1]`.
4. Manually restructure the RTL to shorten the critical path (e.g., reduce the adder tree depth in the PE accumulator).
5. Use DSP48 primitives for the multiplier: they are hard multiplier blocks with dedicated cascade routing and are faster than LUT-based multiplication.

Always check the timing report: the critical path shows exactly which logic stages are slow.
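As a worked instance of the simplified slack formula, the sketch below plugs in the 20 ns period of the 50 MHz `soc_clk`. The path delay and setup time are made-up illustrative numbers, not values from a real timing report, and the model ignores skew and uncertainty.

```python
def setup_slack(period_ns: float, data_path_ns: float, setup_ns: float) -> float:
    """Simplified setup slack in ns: period - data path delay - setup time."""
    return period_ns - data_path_ns - setup_ns


def max_frequency_mhz(data_path_ns: float, setup_ns: float) -> float:
    """Highest clock frequency (MHz) at which the path would have zero slack."""
    return 1000.0 / (data_path_ns + setup_ns)


if __name__ == "__main__":
    period = 20.0      # 50 MHz soc_clk
    setup = 0.1        # hypothetical FF setup time
    long_path = 23.4   # hypothetical unpipelined multiply-accumulate path

    print("slack, unpipelined:", setup_slack(period, long_path, setup))
    print("fmax, unpipelined :", max_frequency_mhz(long_path, setup))

    # Assume a pipeline register splits the path into two equal halves.
    half_path = long_path / 2
    print("slack, pipelined  :", setup_slack(period, half_path, setup))
```

In this example the unpipelined path fails timing while each half of the pipelined path passes with several nanoseconds to spare, at the cost of one extra cycle of latency. A real register rarely lands at an exact even split.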
Q3. How does a Block RAM (BRAM) differ from distributed RAM on an FPGA?
Block RAM is a dedicated, hard-wired, dual-port synchronous SRAM primitive embedded in the FPGA fabric (36 Kb per block on 7-series). It has a fixed read latency (1 clock cycle), high bandwidth (two independent ports), and uses no LUTs, which makes it ideal for large memories such as the SRAM scratchpad and instruction memory.

Distributed RAM is built from LUTs configured in RAM mode. It is flexible in size and supports asynchronous (combinational) reads, but it consumes LUT resources and suits only small memories: on 7-series, each LUT in a SLICEM stores 64 bits.

For our SoC, the SRAM uses 8× BRAM36 primitives: 8 × 36 Kb = 288 Kb, of which 256 Kb (32 KB) is data and the remainder is parity bits. The boot ROM uses a small BRAM initialised from a hex file with $readmemh in the RTL; Vivado reads that file during synthesis and infers a BRAM with matching initial contents. Block memories generated from IP take a COE file instead.
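The BRAM count can be estimated from the memory's depth and width. The sketch below is a lower bound that counts data bits only (32 Kb per 7-series BRAM36, excluding parity); the actual mapping depends on which aspect ratios the primitive supports. The 8192 × 32 organisation for the SRAM and the 1024 × 32 boot ROM are assumptions for illustration.

```python
import math

BRAM36_DATA_BITS = 32 * 1024    # data bits per BRAM36, parity excluded
BRAM36_TOTAL_BITS = 36 * 1024   # data plus parity bits


def bram36_needed(depth_words: int, width_bits: int) -> int:
    """Lower bound on the BRAM36 count for a depth x width memory (data bits only)."""
    return math.ceil(depth_words * width_bits / BRAM36_DATA_BITS)


if __name__ == "__main__":
    # Assumed organisation: 32 KB SRAM as 8192 words of 32 bits.
    sram_blocks = bram36_needed(8192, 32)
    print("SRAM BRAM36 count:", sram_blocks)
    print("Raw capacity (Kb):", sram_blocks * BRAM36_TOTAL_BITS // 1024)

    # Hypothetical 4 KB boot ROM: 1024 words of 32 bits.
    print("Boot ROM BRAM36 count:", bram36_needed(1024, 32))
```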
Q4. How do you debug a RISC-V SoC running on FPGA without a JTAG debugger?
The primary debug method is UART: implement a 16550-style UART peripheral at a known MMIO address, write printf-like functions in bare-metal C that transmit over it, and connect a serial terminal (minicom, TeraTerm) on the host PC at 115200 baud. Drive LED outputs from key status signals (accel_done, uart_tx_active, error flags) for instant visual feedback.

For deeper hardware debug, use the Xilinx ILA (Integrated Logic Analyser): a soft logic analyser that captures internal signals into on-chip BRAM and uploads them to Vivado over the board's built-in USB-JTAG link, so the CPU itself needs no debug module. Add an ILA core in the block diagram or instantiate it in the RTL: ila_0 inst (.clk(clk), .probe0(accel_start), .probe1(STATUS), ...). Trigger on accel_start to capture the accelerator's execution waveform, up to the configured sample depth, without any external equipment.
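A 16550-style UART derives its baud rate by dividing the input clock by 16 times an integer divisor, so the divisor has to be chosen for the SoC clock. The sketch below works that out for a 50 MHz clock and 115200 baud; it assumes the UART is clocked directly from `soc_clk`, which the source does not state.

```python
def uart_divisor(clk_hz: int, baud: int, oversample: int = 16) -> int:
    """Nearest integer divisor for a baud generator with the given oversampling."""
    return round(clk_hz / (oversample * baud))


def baud_error_pct(clk_hz: int, baud: int, oversample: int = 16) -> float:
    """Percentage difference between the achieved and the requested baud rate."""
    divisor = uart_divisor(clk_hz, baud, oversample)
    actual = clk_hz / (oversample * divisor)
    return 100.0 * (actual - baud) / baud


if __name__ == "__main__":
    clk_hz = 50_000_000   # assumed UART input clock (soc_clk)
    baud = 115_200
    print("divisor:", uart_divisor(clk_hz, baud))
    print("baud error (%):", round(baud_error_pct(clk_hz, baud), 3))
```

An error well under one percent, as here, is normally comfortable for an 8N1 link; garbled terminal output is more often a sign that the divisor was computed for the wrong clock.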
Q5. What is a false path and when do you use set_false_path in XDC?
A false path is a timing path that should be excluded from setup/hold analysis, either because it can never be exercised functionally or because its timing is not critical in practice. Common false paths: (1) Asynchronous reset inputs: the reset is applied manually and held for many cycles, so a single-cycle setup check on the input is meaningless. (2) Test-mode signals (scan_enable): only active in scan shift mode, not in functional mode. (3) Clock-domain crossings that already pass through a proper synchroniser (two-FF synchroniser): the synchroniser handles metastability, so the path into its first FF is a false path.

Declaring false paths stops Vivado from trying to meet timing on them, which avoids spurious failures and speeds up implementation. Incorrectly declaring a valid path as false is a serious mistake, because it hides real timing violations.
Q6. How do you initialise the RISC-V CPU's instruction memory on the FPGA?
The RISC-V boot ROM (and optionally the SRAM) must contain the compiled program before the CPU starts executing. Two approaches:

(1) BRAM initialisation from a COE/MEM file. Vivado can pre-load BRAM contents at bitstream generation time. Convert the compiled binary to a hex file (objcopy -O verilog), reference it in the RTL with $readmemh("boot.mem", mem_array), and Vivado embeds the contents in the bitstream. objcopy's Verilog output is byte-wide by default, so a 32-bit-wide memory needs the data packed into words (recent binutils versions accept --verilog-data-width 4). The CPU runs immediately after FPGA programming with no separate loading step.

(2) UART boot loader. The boot ROM contains a tiny UART receiver that waits for a binary image over UART, loads it into SRAM, then jumps to it. This allows updating the software without regenerating the bitstream.

Option 1 is simpler; Option 2 is more practical for iterative software development.
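For option 1, the hex file can also be produced from a raw binary (for example the output of `objcopy -O binary`) with a small script. The sketch below packs little-endian bytes into one 32-bit hex word per line, which is the layout `$readmemh` expects for a 32-bit-wide memory array. The function names and the `firmware.bin` path are hypothetical.

```python
def bin_to_mem_lines(image: bytes, word_bytes: int = 4) -> list:
    """Pack a little-endian binary image into hex words, one string per word."""
    # Zero-pad so the image is a whole number of words.
    padding = (-len(image)) % word_bytes
    padded = image + b"\x00" * padding
    lines = []
    for offset in range(0, len(padded), word_bytes):
        word = int.from_bytes(padded[offset:offset + word_bytes], "little")
        lines.append(f"{word:0{word_bytes * 2}x}")
    return lines


def write_mem_file(bin_path: str, mem_path: str) -> None:
    """Read a raw binary and write a $readmemh-compatible file (hypothetical helper)."""
    with open(bin_path, "rb") as src:
        lines = bin_to_mem_lines(src.read())
    with open(mem_path, "w") as dst:
        dst.write("\n".join(lines) + "\n")


if __name__ == "__main__":
    # In-memory demo: a RISC-V NOP (addi x0, x0, 0) followed by one stray byte.
    demo = bytes([0x13, 0x00, 0x00, 0x00, 0xAB])
    for line in bin_to_mem_lines(demo):
        print(line)
    # Typical use (paths are placeholders):
    # write_mem_file("firmware.bin", "boot.mem")
```

The byte order matters: RISC-V is little-endian, so the first byte in the binary becomes the least significant byte of the first word.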