Setting Up an Open Source FPGA
Hey folks, after a 3-year hiatus (in which time I found and married the love of my life!), we are back to it with The Lambda Scheme. Since my last post, AI has gone from a hot space in the software world to a mainstream conversation topic around the family dinner table. What a world. So you would think AI will surely be top of mind for us here on The Lambda Scheme. We'll get to that.
Where I left off, I was trying to make a neural net classifier for musical instruments based on their harmonic signatures. I never quite got it to work, so I thought that upon my return it would be good to build up to something that complex from some AI fundamentals. That's what we'll be doing in the next few posts.
In my next posts we will see how you can get a neural net running on different kinds of hardware to give it a speed boost. Doing so will give us a solid appreciation for how LLMs really work. As part of my research for the next post I had to get an FPGA running, which was a harrowing adventure, so in this post I'll share my findings for anyone who wants to do something similar with FPGAs. FPGAs are fun and versatile, so setting one up is a worthwhile investment, and we will for sure see them again on this blog.
For any of you who have never seen one before, an FPGA (field-programmable gate array) is basically a chip that can be configured to efficiently emulate other digital hardware. When chip designers want to test out their designs, they often do so on an FPGA before having the chips fabricated en masse. In college, FPGAs were central to our advanced CPU design courses. Today we won't be doing anything quite so advanced; we are just going to set up a basic hello world. Since even that is rather difficult, I made a post out of it. As is our fashion on The Lambda Scheme, we'll also get a little philosophical and take the opportunity to really understand the broader mental models that go into getting an FPGA set up.
Commercial vs Open Source
But first a rant...
When I first started this project, I figured I should go with the time-tested big names in hardware, so I ordered myself an Arty A7 board with a Xilinx chip on it. But seriously, just try getting this thing's toolchain running on Linux. After providing lots of personal details to make an account, I had to download a 54 GB (54 GIGABYTE!) toolchain from a CDN that was limited to single-digit Mbps... only to discover hours later that among the myriad options I'd selected, I'd opted for the enterprise version of the toolchain, which requires a paid license. FML. Even after I got the right version, setting up the toolchain proved to be a configuration nightmare.
So after that first experience I sent that thing straight back and opted for a fully open source board, the Radiona ULX3S, whose Lattice ECP5 chip is supported by a fully open source toolchain. Lean, easy to install, easy to use. God bless.
Since I didn't find a clear guide online for setting up the code scaffolding for the chip, think of this post as the missing tutorial with some general hardware knowledge sprinkled in for those who want to learn.
Basics of an FPGA
Before diving into our tutorial, let's talk a little bit about how FPGAs work, just enough to know how to use them correctly. If you ever took a course on computer architecture or read my previous post on quantum computing, this diagram will look familiar.
It represents a simple digital circuit using abstract building blocks called logic gates. In this example (taken from Boole's own original writings) we have a circuit to evaluate the Biblical dietary restriction "Clean beasts are those which both divide the hoof and chew the cud."
When digital circuits like our kosher box above have no state, we call them "combinational", and we can fully capture their behavior with a truth table like this one.
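To make the idea concrete, here is a minimal Python sketch that treats the kosher circuit as a function of its two inputs and enumerates every input combination to build its truth table. The function names are mine, for illustration only.

```python
from itertools import product


def kosher(divides_hoof: bool, chews_cud: bool) -> bool:
    # Boole's rule: a beast is clean only if BOTH conditions hold, i.e. an AND gate
    return divides_hoof and chews_cud


def truth_table(fn, n_inputs: int):
    """List every input combination alongside the circuit's output."""
    return [(inputs, fn(*inputs)) for inputs in product([False, True], repeat=n_inputs)]


for inputs, output in truth_table(kosher, 2):
    print(" ".join(str(int(bit)) for bit in inputs), "->", int(output))
```

Because the circuit has no state, these four rows are everything there is to know about it.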
So if we want to simulate this circuit, we don't actually need to simulate the logic gates that make it up and how they interact. Instead, we can store all of the possible outputs in memory and just "look up" the one that corresponds to our input.
This approach, while usually less efficient in hardware, is more versatile, because we can take the same "lookup circuit" and reconfigure it to mimic the behavior of any combinational circuit with the same number of inputs. Arrange these "lookup tables" (LUTs) so that they feed into each other based on your configuration, add some flip-flops to hold state, and you have yourself an FPGA!
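Here is a toy model of the lookup-table idea, as a hedged sketch: the hypothetical `LUT` class below stores nothing but a truth table, can be reprogrammed to act like a different gate, and can feed its output into another LUT. Real FPGA LUTs are hardware with a fixed number of inputs (four on our chip), not Python objects, so treat this purely as an illustration.

```python
class LUT:
    """A lookup table with n inputs: its entire configuration is a list of 2**n output bits."""

    def __init__(self, n_inputs: int):
        self.n_inputs = n_inputs
        self.table = [0] * (2 ** n_inputs)  # unconfigured: always outputs 0

    def configure(self, fn) -> "LUT":
        # "Program" the LUT by evaluating fn once for every input pattern and
        # storing the result. Input i of the circuit is bit i of the table index.
        for index in range(2 ** self.n_inputs):
            bits = [(index >> i) & 1 for i in range(self.n_inputs)]
            self.table[index] = int(bool(fn(*bits)))
        return self

    def __call__(self, *bits: int) -> int:
        # Evaluating the circuit is now a single memory read; no gates involved.
        if len(bits) != self.n_inputs:
            raise ValueError(f"expected {self.n_inputs} inputs")
        index = sum(bit << i for i, bit in enumerate(bits))
        return self.table[index]


lut = LUT(2).configure(lambda hoof, cud: hoof and cud)  # the kosher AND gate
print(lut.table)  # [0, 0, 0, 1]
lut.configure(lambda a, b: a ^ b)                       # same "hardware", now XOR
print(lut.table)  # [0, 1, 1, 0]

# Chaining: feeding one LUT's output into another builds bigger circuits,
# here a 3-input AND out of two 2-input LUTs.
and2 = LUT(2).configure(lambda a, b: a and b)


def and3(a: int, b: int, c: int) -> int:
    return and2(and2(a, b), c)
```

Reconfiguring only rewrote the stored table; the evaluation code never changed, which is the whole trick.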
Bigger grids of these circuits can mimic more complex designs. Using this knowledge, we can do some back-of-the-envelope math to check whether a given chip will meet our needs. Our FPGA has 12k LUTs, each of which takes four bits of input and turns them into one bit of output, so each LUT stores a 16-entry truth table, or about 192k bits of lookup values across the whole chip. A given model weight in an LLM will map to at least 8 bits for high-ish fidelity, so it's pretty clear we won't be able to handle even relatively small models like TinyStories.
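Here is the same back-of-the-envelope math spelled out in Python. The 1-million-parameter figure is an assumption for illustration (roughly the size of the smallest TinyStories models). The sketch also generously pretends every LUT could hold weights, and it ignores the chip's separate block RAM.

```python
LUTS = 12_000                 # LUTs on our ECP5 chip (from the text)
BITS_PER_LUT = 2 ** 4         # a 4-input LUT stores 16 output bits
lut_bits = LUTS * BITS_PER_LUT

# Assumption: a "small" model with about one million parameters
PARAMS = 1_000_000
BITS_PER_WEIGHT = 8           # "at least 8 bits for high-ish fidelity"
weight_bits = PARAMS * BITS_PER_WEIGHT

print(f"LUT truth-table bits: {lut_bits:,}")
print(f"Weight bits needed:   {weight_bits:,}")
print(f"Shortfall factor:     {weight_bits / lut_bits:.0f}x")
```

Even in this best case the weights alone are dozens of times too big, and LUTs that held weights couldn't also do the arithmetic.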
What we can more realistically do is break our LLM into parts and see if we can get some of those parts running on an FPGA. This will be a great opportunity for us to really learn the internals of an LLM. My suspicion is that a pipeline of FPGAs, each doing some stage of the transformation, could outperform a GPU, but that's what we're gonna find out! Before that, we'll do some basic mucking around with LLMs in the next few posts, and before any of that, we'll finish the basic setup of our FPGA.
The Open Source Toolchain
To get an FPGA to do what you want, you start with a digital circuit design that you want it to emulate. If you've ever used something like Logisim before, the idea is similar, but instead of GUIs, the pros use a hardware description language (HDL) such as Verilog to specify the circuit. From here, your steps are basically:
- Convert the design into a bitstream that you can send to the ULX3S over its USB port
- Send the bitstream onto the FPGA at which point it will start emulating your circuit
Yosys
Yosys is the synthesis tool: it reads your circuit design and converts it into a netlist, the first step on the way to a bitstream.
Yosys ships as part of the OSS CAD Suite, whose downloads are available from its official GitHub page. Once downloaded, follow the instructions to add oss-cad-suite to your PATH, and you should be able to see
$ yosys --version
Yosys 0.54+37 (git sha1 99f7d79ab, clang++ 18.1.8 -fPIC -O3)
OpenFPGALoader
Next, openFPGALoader is how you interface with the hardware itself and get your bitstream onto the FPGA. It should also come with the oss-cad-suite above. Confirm by running
$ openFPGALoader --Version
openFPGALoader v0.13.1
Some useful basic commands for testing this tool:
| Command | Function |
|---|---|
| --list-boards | List the boards that openFPGALoader supports |
| --list-cables | List the USB-JTAG adapters (cables) that openFPGALoader supports |
| --scan-usb | Scan for boards to connect to |
| --detect | Connect to a board and get info about its FPGA |
We'll use it here for a sanity check, to make sure we can connect to our board:
$ sudo openFPGALoader --scan-usb
empty
Bus device vid:pid probe_type manufacturer serial product
001 087 0x0403:0x6015 ft231X FER-RADIONA-EMARD D01205 ULX3S FPGA 12K v3.0.8
$ sudo openFPGALoader -b ulx3s --detect
empty
Jtag probe limited to 3MHz
Jtag frequency : requested 6000000Hz -> real 3000000Hz
ret 0
index 0:
idcode 0x21111043
manufacturer lattice
family ECP5
model LFE5U-12
irlength 8
Note that you will need to run these with sudo because we haven't set up our udev rules yet, which we'll do next!
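The idcode line in the --detect output is the 32-bit JTAG identification value that openFPGALoader reads from the chip. Here is a quick Python sketch that splits it into its standard IEEE 1149.1 fields. Mapping those numbers to "lattice" and "LFE5U-12" is done by openFPGALoader's own device tables, which this sketch doesn't attempt.

```python
def decode_idcode(idcode: int) -> dict:
    """Split a 32-bit JTAG IDCODE into its IEEE 1149.1 fields."""
    return {
        "version": (idcode >> 28) & 0xF,          # bits 31..28
        "part_number": (idcode >> 12) & 0xFFFF,   # bits 27..12
        "manufacturer": (idcode >> 1) & 0x7FF,    # bits 11..1
        "marker": idcode & 1,                     # always 1 for a valid IDCODE
    }


fields = decode_idcode(0x21111043)  # value reported by --detect above
print({name: hex(value) for name, value in fields.items()})
```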
Udev Rules
A udev rule is basically a Linux configuration that tells the operating system what to do when a device is connected. In this case, we'll set up a udev rule that gives you permission to access the FPGA when it's connected, so that you don't need to keep running commands as root.
So start by getting your device's vendor and product IDs. In this case, we'll list our USB devices, look for the FTDI USB bridge, and read the vendor and product IDs from the output.
$ lsusb | grep Future
Bus 001 Device 087: ID 0403:6015 Future Technology Devices International, Ltd Bridge(I2C/SPI/UART/FIFO)
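Plug these values into a new udev rules file. This one assigns the device to the "plugdev" group; note that MODE="0666" actually makes it readable and writable by every user, and TAG+="uaccess" additionally grants access to the user logged in at the machine.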
sudo tee /etc/udev/rules.d/99-ulx3s-ftdi.rules >/dev/null <<'RULES'
# ULX3S / FTDI access for non-root
SUBSYSTEM=="usb", ATTR{idVendor}=="0403", ATTR{idProduct}=="6015", GROUP="plugdev", MODE="0666", TAG+="uaccess"
RULES
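If you have a different board or bridge chip, only the two IDs change. Here is a small hypothetical Python helper that pulls them out of an lsusb line and fills in the same rule text as above. It's just a convenience sketch, not part of any udev tooling.

```python
import re


def udev_rule_from_lsusb(line: str, group: str = "plugdev") -> str:
    """Build the udev rule above from one line of `lsusb` output."""
    match = re.search(r"ID ([0-9a-fA-F]{4}):([0-9a-fA-F]{4})", line)
    if match is None:
        raise ValueError("no vendor:product ID found in line")
    vendor, product = match.group(1).lower(), match.group(2).lower()
    # Doubled braces produce literal { } inside the f-strings
    return (
        f'SUBSYSTEM=="usb", ATTR{{idVendor}}=="{vendor}", '
        f'ATTR{{idProduct}}=="{product}", GROUP="{group}", '
        f'MODE="0666", TAG+="uaccess"'
    )


line = ("Bus 001 Device 087: ID 0403:6015 Future Technology Devices "
        "International, Ltd Bridge(I2C/SPI/UART/FIFO)")
print(udev_rule_from_lsusb(line))
```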
Tell udev to load the new rule
sudo udevadm control --reload-rules
sudo udevadm trigger
and ensure you are in the right groups
sudo usermod -aG plugdev,dialout $USER
Log out and back in so the new group membership takes effect, and you should then be able to run all of the openFPGALoader commands above without sudo.
Nextpnr and ecppack
Two more tools sit between yosys, which takes your source code and starts turning it into something you can put on an FPGA, and openFPGALoader, which sends the final bitstream to the FPGA.
nextpnr-ecp5 and ecppack each take an intermediate representation of your design and add the specifics of your chip until you end up with the final configuration.
nextpnr-ecp5 is what's called a "place and route" tool which decides which specific LUTs will be used to emulate the different components of your design and which routes between them will model the connections between the components.
ecppack takes that placement and encodes it as a stream of bits that can be sent one by one to the board, in a format the FPGA understands, so it can configure itself.
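The full pipeline looks like this: top.v → yosys → top.json → nextpnr-ecp5 → top.config → ecppack → top.bit → openFPGALoader → board.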
Code Scaffolding
Makefile
OK, let's tie all these tools together into a Makefile.
Save this as Makefile (make requires every command line under a rule to start with a tab):
PACKAGE = CABGA381
top.json: top.v uart.v
	yosys -p 'read_verilog top.v uart.v; synth_ecp5 -top top -json top.json'
top.config: top.json ulx3s_12f.lpf
	nextpnr-ecp5 --json top.json --lpf ulx3s_12f.lpf --textcfg top.config --12k --package $(PACKAGE)
top.bit: top.config
	ecppack top.config top.bit
flash: top.bit
	openFPGALoader --board=ulx3s top.bit
.PHONY: flash clean
# make judges the top.v symlink by the timestamp of the file it points to, so force
# a full rebuild (-B) in case the newly linked design is older than top.json
flash-%:
	@echo "🔁 Linking $*.v to top.v and flashing..."
	ln -sf $*.v top.v
	$(MAKE) -B flash
clean:
	rm -f top.bit top.config top.json
I've configured this Makefile to support loading different designs onto the board depending on which one you select. For instance, if I have a file called echo.v, I can run make flash-echo to load it onto the board.
A handful of files (the UART module and the board's constraint file) are generic and always stay the same, so really what I'm doing is swapping in one particular design file (in this case echo.v) by symlinking it to top.v and then running the toolchain on top.v plus the shared files.
Let's go over these one by one.
Netlist JSON
top.json: top.v uart.v
	yosys -p 'read_verilog top.v uart.v; synth_ecp5 -top top -json top.json'
As mentioned, the first stage in our pipeline is to create a generic netlist JSON. This file describes your design in terms that don't depend on your particular board (it is already mapped onto ECP5 primitives like LUT4, but says nothing yet about which physical LUT or pin each one uses), and it is more dumbed down and easier to parse than raw Verilog. It takes as input the top.v file with my main design, as well as uart.v, a general UART module I'll cover in a later section. Once you have these files, you'll be able to run make top.json to get the netlist.
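If you want a rough sense of how much of the chip a design uses, a short Python script over the same JSON can count cells by type. This is a sketch: `print_summary` is a hypothetical helper name, and the type names are whatever Yosys wrote into the file (LUT4 among them).

```python
import json
from collections import Counter


def cell_type_counts(netlist: dict, module: str = "top") -> Counter:
    """Count the cells in one module of a Yosys JSON netlist by their type."""
    cells = netlist["modules"][module]["cells"]
    return Counter(cell["type"] for cell in cells.values())


def print_summary(path: str = "top.json") -> None:
    """Hypothetical helper: print each cell type in the synthesized design with its count."""
    with open(path) as f:
        netlist = json.load(f)
    for cell_type, count in cell_type_counts(netlist).most_common():
        print(f"{cell_type:<12} {count}")
```

Running `print_summary()` after make top.json gives a pre-place-and-route count to compare against the 12k LUT budget. nextpnr's own log is the better source for final utilization.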
We can probe into this file to get a little more detail on its internals. It has two top-level keys:
$ jq keys < top.json
[
"creator",
"modules"
]
See what it specifies as the creator
$ jq .creator < top.json
"Yosys 0.54+37 (git sha1 99f7d79ab, clang++ 18.1.8 -fPIC -O3)"
count the distinct modules it produces
$ jq '.modules | length' < top.json
91
probe into one particular module
$ jq '.modules.top | keys' < top.json
[
"attributes",
"cells",
"netnames",
"ports"
]
and even see how particular LUTs are configured
$ jq '.modules.top.cells.led_reg_LUT4_C_1' < top.json
{
  "hide_name": 0,
  "type": "LUT4",
  "parameters": {
    "INIT": "1111000011001100"
  },
  ...
}
Pretty cool!
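You can go one step further and decode that INIT string into a truth table. Here is a Python sketch that relies on two assumptions: Yosys writes the 16-bit parameter most significant bit first, and the ECP5 LUT4 uses its inputs as the index {D, C, B, A} into INIT, as in Yosys's ECP5 cell model.

```python
from itertools import product


def lut4(init: str, a: int, b: int, c: int, d: int) -> int:
    """Evaluate a LUT4 from the INIT string in the Yosys JSON.

    Assumes INIT is written most significant bit first (bit 15 leftmost) and
    that the LUT indexes it with {D, C, B, A}.
    """
    if len(init) != 16:
        raise ValueError("a LUT4 INIT has exactly 16 bits")
    index = (d << 3) | (c << 2) | (b << 1) | a
    return int(init[15 - index])


INIT = "1111000011001100"  # led_reg_LUT4_C_1 from the jq output above

# Print the truth table, one row per input combination
for d, c, b, a in product((0, 1), repeat=4):
    print(f"D={d} C={c} B={b} A={a} -> {lut4(INIT, a, b, c, d)}")
```

Under those assumptions this LUT ignores A and behaves as a 2:1 multiplexer: with D at 0 it passes B through, and with D at 1 it passes C. That is the kind of logic you'd expect in front of a register that either holds its value or loads a new one, though the cell's connections entry in the JSON would be needed to confirm which signals actually drive B, C, and D.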
Config
The next step is what we call "place and route" - this is where we take the abstract set of components and connections we have in our JSON file and map them to specific LUTs and wirings in our FPGA. To do this, our place and route tool nextpnr-ecp5 needs to know the specifics of our board like what LUTs are available and what they're connected to.
The model name of the FPGA on the ULX3S board looks like LFE5U-XXF-6BG381C, where XX indicates the size of the chip, roughly its LUT count in thousands (ours is a 12). In the suffix, 6 is the speed grade and C means the commercial temperature range, while BG381 describes the packaging: underneath the chip is a ball grid of 381 contact points, and these are how we interface with the chip. Since these contact points are in a grid, we can reference them by names like "H3" or "L1", meaning "row H, column 3" and so on. On the board, these contact points connect to actual devices like buttons, lights, and USB interfaces, which we want to reference with more abstract names like "led[3]" or "wifi_gpio0", so we need something that tells the tool how to map names to specific pins. This information all lives in a constraint file called an .lpf file.
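Each pin assignment in an .lpf file is a LOCATE COMP line that pairs a port name with a SITE (ball). Here is a small Python sketch that collects those pairs into a dictionary, handy for checking which ball a port lands on. It assumes the exact line shape shown in the example below and ignores every other kind of line.

```python
import re

# Matches constraint lines such as: LOCATE COMP "led[0]" SITE "B2";
LOCATE_RE = re.compile(r'^\s*LOCATE\s+COMP\s+"([^"]+)"\s+SITE\s+"([^"]+)"\s*;')


def pin_map(lpf_text: str) -> dict:
    """Map HDL port names to package ball sites; other lines are ignored."""
    pins = {}
    for line in lpf_text.splitlines():
        match = LOCATE_RE.match(line)
        if match:
            port, site = match.groups()
            pins[port] = site
    return pins
```

For example, `pin_map(open("ulx3s_12f.lpf").read())["led[0]"]` should give back the ball that led[0] is wired to.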
I've included my full lpf file for the ULX3S here. In it you'll find lines like
LOCATE COMP "led[0]" SITE "B2";
That basically says that when our HDL says something is connected to led[0], it's connected to touch point B2 on the chip (which on our board is wired to LED 0). Putting it all together, we get the make rule
PACKAGE = CABGA381
top.config: top.json ulx3s_12f.lpf
	nextpnr-ecp5 --json top.json --lpf ulx3s_12f.lpf --textcfg top.config --12k --package $(PACKAGE)
where --12k says we're targeting a twelve-thousand-LUT chip, --package CABGA381 selects the 381-ball caBGA package, --lpf provides our constraint file, ulx3s_12f.lpf, and --textcfg writes the result as a human-readable text config, top.config.
Some example lines you might see in the resulting top.config include
.tile R18C4:PLC2
arc: A5 V02S0101
word: SLICEA.K0.INIT 0011001100111100
enum: SLICEA.MODE CCU2
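This config line targets a "tile" in the FPGA, in this case a logic tile made up of four "slices": SLICEA, SLICEB, SLICEC, and SLICED. We specifically target tile R18C4:PLC2, meaning the PLC2-type tile at row 18, column 4 of the chip's tile grid. The arc directive switches on a routing connection between a general routing wire (V02S0101) and A5, one of the tile's local LUT input wires; despite the similar look, it has nothing to do with package ball names like B2. The word directive configures an individual LUT with particular lookup values (K0 is the first of the two LUTs in SLICEA), and the enum directive lets us set the mode of a slice. There are a few things a slice can be besides basic LUTs (CCU2, for instance, is a carry-chain mode used for arithmetic), but don't worry about it too much.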
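Here is a small Python sketch that groups those directives by tile, which makes a text config easier to poke at. It's an illustration, not part of nextpnr or Project Trellis. It assumes each directive's operands are space-separated tokens as in the sample lines above and skips anything else.

```python
def parse_textcfg(text: str) -> dict:
    """Group .tile / arc / word / enum lines of a nextpnr text config by tile."""
    tiles = {}
    current = None
    for raw in text.splitlines():
        parts = raw.split()
        if len(parts) < 2:
            continue
        directive = parts[0]
        if directive == ".tile":
            current = tiles.setdefault(parts[1], {"arcs": [], "words": {}, "enums": {}})
        elif current is None or len(parts) < 3:
            continue  # header lines before the first tile, or unexpected shapes
        elif directive == "arc:":
            current["arcs"].append((parts[1], parts[2]))
        elif directive == "word:":
            current["words"][parts[1]] = parts[2]
        elif directive == "enum:":
            current["enums"][parts[1]] = parts[2]
    return tiles


SAMPLE = """\
.tile R18C4:PLC2
arc: A5 V02S0101
word: SLICEA.K0.INIT 0011001100111100
enum: SLICEA.MODE CCU2
"""
print(parse_textcfg(SAMPLE))
```

On a real build, `len(parse_textcfg(open("top.config").read()))` gives a quick count of how many tiles the design touches.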
Tying it All Together and Flashing
Alright, that's enough horsing around with internals; let's get something on the board. With all of these bits in place, you can use the ecppack tool to turn a config file into a bitstream that can be streamed to the FPGA over USB to configure it.
top.bit: top.config
	ecppack top.config top.bit
and then openFPGALoader to actually do the streaming
flash: top.bit
	openFPGALoader --board=ulx3s top.bit
As a convenience, I wrote this top-level phony make target that wraps everything all together.
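# force a full rebuild (-B): make judges the top.v symlink by the timestamp of the file it points to
flash-%:
	@echo "🔁 Linking $*.v to top.v and flashing..."
	ln -sf $*.v top.v
	$(MAKE) -B flash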
It lets you keep a bunch of different designs in the same directory to play around with, and at build time it hot-swaps in the one you want (via a symbolic link) and flashes that onto the board. For instance, I have the echo.v program we'll go over in the next section and can run make flash-echo to flash it onto the board.
Hello, World!
And now the time has come. With all that investment, it's time to get a little payoff (more to come in future posts!) and set up our FPGA with a simple "hello, world!" program. This program reads characters one at a time from the USB serial stream and echoes them back. It also sets the FPGA's LEDs as it receives characters, so we can see a little action on the board. While not the most exciting, this program is an important step toward more complex behavior down the line, where we'll need to stream tokens to and from our board if we want to use it to run LLMs.
So let's write our echo program and save it as echo.v:
module top (
    input        clk_25mhz,
    output [7:0] led,
    output       wifi_gpio0,
    input        ftdi_txd,   // serial data coming from the FTDI bridge
    output       ftdi_rxd    // serial data going to the FTDI bridge
);
    // Tie GPIO0 high, keeping the on-board ESP32 from rebooting the board
    assign wifi_gpio0 = 1;
    wire clk = clk_25mhz;
    wire reset = 0;
    wire       uart_txd_ready, uart_rxd_strobe;
    reg        uart_txd_strobe = 0;
    reg  [7:0] uart_txd;
    wire [7:0] uart_rxd;
    reg  [7:0] led_reg;
    // 25 MHz / 2604 clock cycles per bit = ~9600 baud
    uart #(.DIVISOR(2604)) uart_inst (
        .clk(clk),
        .reset(reset),
        .serial_txd(ftdi_rxd),
        .serial_rxd(ftdi_txd),
        .txd(uart_txd),
        .txd_ready(uart_txd_ready),
        .txd_strobe(uart_txd_strobe),
        .rxd(uart_rxd),
        .rxd_strobe(uart_rxd_strobe)
    );
    assign led = led_reg;
    always @(posedge clk) begin
        uart_txd_strobe <= 0;              // the strobe is a one-cycle pulse
        if (uart_rxd_strobe) begin         // a byte just arrived
            led_reg         <= uart_rxd;   // show its bits on the LEDs
            uart_txd_strobe <= 1;          // and send it straight back
            uart_txd        <= uart_rxd;
        end
    end
endmodule
Breaking down a couple of the critical bits, we have:
module top (
    input        clk_25mhz,
    output [7:0] led,
    output       wifi_gpio0,
    input        ftdi_txd,
    output       ftdi_rxd
);
This top module is what yosys reads as the top-level description of our hardware (via the -top top CLI flag), sort of like a main function but for hardware. This top-level module can then pull in whatever other modules make up the rest of the design and tie them together as needed. In this case we include another component, a UART, which we'll cover in more detail below.
uart #(.DIVISOR(2604)) uart_inst (
    .clk(clk),
    .reset(reset),
    .serial_txd(ftdi_rxd),
    .serial_rxd(ftdi_txd),
    .txd(uart_txd),
    .txd_ready(uart_txd_ready),
    .txd_strobe(uart_txd_strobe),
    .rxd(uart_rxd),
    .rxd_strobe(uart_rxd_strobe)
);
For now, think of it as our bridge to the USB interface: we'll write and read characters over USB, and this component lets us stream them between our board and our computer.
always @(posedge clk) begin
    uart_txd_strobe <= 0;
    if (uart_rxd_strobe) begin
        led_reg <= uart_rxd;
        uart_txd_strobe <= 1;
        uart_txd <= uart_rxd;
    end
end
This says we want a hardware block that runs on every rising clock edge and checks whether the UART has a character for us. If so, it reads the character in, displays its bits on our LEDs so we can see it on the board, and writes the character back. For this to work, we need a UART that can stream these bits, so let's talk a little about what this component is.
When I write characters to my board, they get sent through the board's USB port into a chip called a USB bridge (the scan above identified ours as an FT231X). Chips like this are most commonly made by Future Technology Devices International, so we just call them FTDI chips for short. This chip handles all the complexity of communicating over USB and translates it into a much simpler protocol, in this case a plain serial (UART) signal at TTL logic levels.
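Here is a small Python sketch of what one character looks like on that serial line. It assumes the common 8N1 format (8 data bits, no parity, 1 stop bit), which matches the frame that the transmitter in the uart module below builds with {1'b1, txd, 1'b0}. The function name is just for illustration.

```python
def uart_frame(byte: int) -> list[int]:
    """Line levels for one 8N1 frame: start bit, 8 data bits LSB first, stop bit."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("byte out of range")
    data = [(byte >> i) & 1 for i in range(8)]  # least significant bit goes first
    return [0] + data + [1]  # the idle line is high, so a 0 start bit announces a frame


print(uart_frame(ord("A")))
```

Each entry is held on the wire for one bit period, so at 9600 baud a single character takes ten bit periods to send.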
Thanks to this approach, the component we need to write for our board is fairly simple. It looks like this:
module uart #(parameter DIVISOR=40)(
    input        clk,
    input        reset,
    output       serial_txd,   // line we drive (to the FTDI chip)
    input        serial_rxd,   // line we listen on (from the FTDI chip)
    input  [7:0] txd,          // byte to send
    input        txd_strobe,   // pulse high for one cycle to start sending txd
    output       txd_ready,    // high while the transmitter is idle
    output [7:0] rxd,          // last byte received
    output       rxd_strobe    // pulses high for one cycle when rxd is complete
);
    // TX: shift out a 10-bit frame (start bit, 8 data bits LSB first, stop bit),
    // holding each bit on the line for DIVISOR clock cycles
    reg [15:0] tx_cnt = 0;
    reg [3:0]  tx_bit = 0;
    reg [9:0]  tx_shift = 10'b1111111111;  // idle line is high
    reg        tx_busy = 0;
    assign serial_txd = tx_shift[0];
    assign txd_ready  = !tx_busy;
    always @(posedge clk) begin
        if (reset) begin
            tx_cnt <= 0; tx_bit <= 0; tx_busy <= 0; tx_shift <= 10'b1111111111;
        end else if (!tx_busy && txd_strobe) begin
            tx_shift <= {1'b1, txd, 1'b0};  // stop bit, data, start bit
            tx_busy  <= 1;
            tx_cnt   <= 0;
            tx_bit   <= 0;
        end else if (tx_busy) begin
            tx_cnt <= tx_cnt + 1;
            if (tx_cnt == DIVISOR-1) begin  // one bit period has elapsed
                tx_cnt   <= 0;
                tx_shift <= {1'b1, tx_shift[9:1]};  // next bit; refill with idle 1s
                tx_bit   <= tx_bit + 1;
                if (tx_bit == 9)
                    tx_busy <= 0;  // stop bit finished
            end
        end
    end
    // RX: wait for the falling edge of a start bit, then sample each data bit
    // near the middle of its bit period
    reg [15:0] rx_cnt = 0;
    reg [3:0]  rx_bit = 0;
    reg [7:0]  rx_shift = 0;
    reg        rx_reading = 0;
    reg        rxd_strobe_reg = 0;
    reg        last_rxd = 1;
    assign rxd        = rx_shift;
    assign rxd_strobe = rxd_strobe_reg;
    always @(posedge clk) begin
        rxd_strobe_reg <= 0;
        if (reset) begin
            rx_cnt <= 0; rx_bit <= 0; rx_reading <= 0;
            rxd_strobe_reg <= 0;
        end else if (!rx_reading) begin
            if (!serial_rxd && last_rxd) begin  // falling edge: a start bit begins
                rx_cnt     <= DIVISOR + (DIVISOR >> 1);  // wait a bit period and a half
                rx_bit     <= 0;
                rx_reading <= 1;
            end
        end else begin
            if (rx_cnt > 0) begin
                rx_cnt <= rx_cnt - 1;
            end else if (rx_bit < 8) begin
                rx_cnt   <= DIVISOR;
                rx_shift <= {serial_rxd, rx_shift[7:1]};  // data arrives LSB first
                rx_bit   <= rx_bit + 1;
            end else begin
                rx_reading     <= 0;  // in the stop bit: byte complete
                rxd_strobe_reg <= 1;
            end
        end
        last_rxd <= serial_rxd;
    end
endmodule
Fair warning: I attempted a few iterations of having ChatGPT write this UART for me, and every time it failed, so I eventually just wrote it myself. Caveat emptor.
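To see why the receiver waits a bit period and a half before its first sample, here is a behavioral Python model of the same strategy: find the falling edge of the start bit, skip 1.5 bit periods to land in the middle of data bit 0, then sample once per bit period. It is a sketch that assumes a perfectly clean line and matched clocks, and it ignores the Verilog's exact counter bookkeeping. `line_levels` is a hypothetical stimulus generator.

```python
def line_levels(byte: int, divisor: int, idle_cycles: int = 5) -> list[int]:
    """Hypothetical stimulus: the line level on every clock cycle for one 8N1 frame."""
    bits = [0] + [(byte >> i) & 1 for i in range(8)] + [1]  # start, data LSB first, stop
    levels = [1] * idle_cycles                               # idle line is high
    for bit in bits:
        levels += [bit] * divisor                            # hold each bit for divisor cycles
    return levels + [1] * idle_cycles


def receive(levels: list[int], divisor: int) -> list[int]:
    """Recover bytes by sampling in the middle of each bit period."""
    received = []
    t = 1
    while t < len(levels):
        if levels[t - 1] == 1 and levels[t] == 0:            # start bit begins
            first_sample = t + divisor + divisor // 2        # middle of data bit 0
            if first_sample + 7 * divisor >= len(levels):
                break                                        # frame cut off
            byte = 0
            for i in range(8):
                byte |= levels[first_sample + i * divisor] << i
            received.append(byte)
            t = first_sample + 8 * divisor                   # middle of the stop bit
        t += 1
    return received


stream = line_levels(ord("h"), 16) + line_levels(ord("i"), 16)
print(bytes(receive(stream, 16)))
```

Sampling mid-bit leaves half a bit period of slack on either side, which is what lets the receiver tolerate small differences between the sender's and receiver's clocks.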
And with that, we have all the pieces in place, so go ahead and run
$ make flash-echo
which should put our echo circuit design onto the board. Then run
$ screen /dev/ttyUSB0 9600
which opens a terminal session that talks directly to the board over its serial line at 9600 baud. That rate has to match the UART in our design: with DIVISOR set to 2604, each bit is held on the line for 2604 cycles of the 25 MHz clock, which works out to about 9600 bits per second. And if all goes according to plan, you'll see the characters you type written back to you! Exciting.
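The relationship between the clock, DIVISOR, and the baud rate is simple enough to check in a few lines of Python. This is a sketch; the function names are mine.

```python
CLOCK_HZ = 25_000_000  # clk_25mhz on the ULX3S


def divisor_for(baud: int, clock_hz: int = CLOCK_HZ) -> int:
    """Clock cycles per serial bit, rounded to the nearest whole cycle."""
    return round(clock_hz / baud)


def actual_baud(divisor: int, clock_hz: int = CLOCK_HZ) -> float:
    """The bit rate a given DIVISOR really produces."""
    return clock_hz / divisor


d = divisor_for(9600)
error = (actual_baud(d) - 9600) / 9600
print(d, f"{actual_baud(d):.1f} baud", f"{error:+.4%}")
```

Because the divisor has to be a whole number of cycles, the real rate is slightly off 9600, but the error is a tiny fraction of a percent, far less than a UART receiver needs to stay in sync.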
OK, well, for me it was very exciting, but we'll be getting to some of the meatier stuff in future posts. Until then, I hope you enjoyed this deep dive into hardware tools and feel you have a few more skills in your arsenal to build cool and interesting things. If this didn't work for you right off the bat (how could it not?), I'll leave you with some debugging tips I learned along the way.
Stay curious and stay tuned everyone!
Troubleshooting
Some troubleshooting tips based on my experience:
- If your USB cable seems to be working (your board lights up) but your computer doesn't discover the board, you may be using a power-only USB cable and need to switch to one that also has data lines (I did not know this was a thing).
- Use dmesg -w to monitor kernel logs when connecting your device. This can help you debug other communication issues between your board and your computer.
- The READMEs for Project Trellis and nextpnr (specifically for ECP5) were great for helping me get the toolchain set up. At one point I tried using a container with the toolchain, and if you know me you know I'm a big fan of containers, but I ultimately abandoned this and went for a host-level installation.
- Once you have the toolchain and can flash a design onto the device, but are struggling to get communication working, use the on-board LEDs to diagnose the issue. I had to use these a bunch by setting values for them to see what the board was receiving from the UART. It takes some ingenuity, but by encoding debug messages to yourself by turning lights on and off you can figure out a lot of what's going on inside the board.
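Since the echo design copies every received byte straight onto led[7:0], it helps to know what pattern to expect. Here is a hypothetical Python helper that prints which LEDs should light for a typed ASCII character. It assumes led[7] is printed leftmost, so check your board's silkscreen for the physical order.

```python
def led_pattern(ch: str) -> str:
    """Which of led[7:0] light up when the echo design receives `ch`.

    Printed led[7] first; '#' means lit and '.' means dark.
    """
    value = ord(ch)
    if value > 0x7F:
        raise ValueError("non-ASCII characters are sent as several bytes")
    return "".join("#" if (value >> i) & 1 else "." for i in range(7, -1, -1))


for ch in "A0":
    print(ch, led_pattern(ch))
```

If the pattern you see doesn't match what you typed, a consistent shift or garbage usually points at the serial settings rather than the LED wiring.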
Additional Resources
Some things to help you along the way