Skip to main content

Who needs VRAM anyway? Or, calculating Mandelbrot on an FPGA

About a month ago Terasic had a 50% off on their DE2 board if you could prove that you're a student (check) and attended a course using FPGAs (check). Sadly I was involved in too many other things at the moment to actually use it.

Fast forward to a week ago. I finally took the time to install Quartus and begin playing around with the hardware. All my previous encounters with FPGAs and CPLDs where with VHDL and cumbersome programs like WebPACK (slow) and HDL Designer (complex). Since my first to programs had been a disappointment I had almost given up on finding a good environment (what's wrong with a CLI compiler and Makefiles!?) but after spending a day or so in Quartus I can honestly say I enjoy it.

During the course mentioned, one laboratory exercise was to create a functional VGA generator using only a VGA DAC located on the board. I had since then become very curious about Verilog, which seemed to fit my preference of C over Ada quite well. This made chosing my first project quite easy; porting the VGA generator to Verilog.


First VGA success!

Success! After some minor amount of bugs from translating the code it worked! I was very happy to have had such easy victory over something that potentially could have been so much hassle. To my knowledge the fitter didn't even mess things up - which I have had my share of issues with in other tools.

Now then, the VGA output generation works but it is not that thrilling to stare at a green box. Some time ago I discussed with a friend about the possibility of implemented a pipelined Mandelbrot generator using a fixed-point decimal system. We thought that the easy operations of Mandelbrot wouldn't pose a problem really and though the idea was plausible. This provided the seed for the next part: generating Mandelbrot in hardware.

A quick walkthrough on Mandelbrot. The Mandelbrot set consists of the points where the equation z_n+1 = z_n^2 + c is bounded (i.e does not diverge to infinity). A rule of thumb can be applied saying that if |z_n| > 2 then it will diverge. Coloring all points belonging to the set white and using 100 iterations before giving up and assume that it is bounded results in the following image.


Mandelbrot set colored white.

That's nice and all, but the real beauty of Mandelbrot sets become apparent if we instead color each pixel with a intensity proportional to the amount of iterations it takes before we know that it is not bounded - if it is bounded (i.e we gave up) we color it black.


Mandelbrot set colored green w/ varied intensity.

Back to the hardware! A normal Mandelbrot set is calculated from -2.0 to 1 (real) and -1.5 and 1.5 (imaginary) or something close to that. This poses a problem: floating point calculations. I do not want to do floating point calculations in the processor because (1) it lacks the blocks for it and (2) it would be a waste of resources I imagine (I would love to be proven wrong here). I decided to go for the solution previously discussed, namely a fixed-point decimal system.

A fixed-point decimal system is basically a scale applied to the initial number. After some experimentation I found that 25 bit of decimal, 7 bit of integer and 1 sign bit probably would be good - total 33 bits.

Since addition and subtraction works as usual with fixed-point as they do with integers, the only operator I needed to implement was multiplication. This is implemented as ab = a*b*1/scale where 1/scale is a power of two and thus implemented as a right shift.

Simple, isn't it? Moving on the Mandelbrot calculation.

The calculations for out_re and out_im are very straight forward to derive and I will not do so in this post, but I will mention the magic value 134217728. 134217728 is 4 in this fixed point system, remember that the rule of thumb was that |Z| > 2? We implement that as Re(z)^2 + Im(z)^2 > 4. I am very satisfied in how clean the implementation of this one iteration ended up being.

Final step, pushing the limits of the hardware and adding as many of these as possible. Testing various combinations proved that a 12 stage pipeline with one calculation per stage is the best. It was possible to have multiple calculations per stage up until some limit at which the fitter would scream that I violated setup and hold times. Since the results are quite funky, I have chosen to include them at the end with a quick description of my diagnosis why they look like they do.

Anyway, final image and the interesting parts of the code follows.


Mandelbrot final output - 12 iterations.

As promised, the goofs follows.


Mandelbrot w/o pipeline, calculation errors and violation of hold time.

Mandelbrot w/o pipeline, violation of hold time.

Mandelbrot w/ pipeline, rounding errors, violating hold time.

Mandelbrot w/ pipeline, rounding errors.

Comments

Popular posts from this blog

Buying an IBM Mainframe

I bought an IBM mainframe for personal use. I am doing this for learning and figuring out how it works. If you are curious about what goes into this process, I hope this post will interest you. I am not the first one by far to do something like this. There are some people on the internet that I know have their own personal mainframes, and I have drawn inspiration from each and every one of them. You should follow them if you are interested in these things: @connorkrukosky @sebastian_wind @faultywarrior @kevinbowling1 This post is about buying an IBM z114 mainframe (picture 1) but should translate well to any of the IBM mainframes from z9 to z14. Picture 1: An IBM z114 mainframe in all its glory Source: IBM What to expect of the process Buying a mainframe takes time. I never spent so much time on a purchase before. In fact - I purchased my first apartment with probably less planning and research. Compared to buying an apartment you have no guard rails. You are left

zBC12, the new family member

Yesterday after more than a year's delay my zBC12 mainframe finally booted up. This is a machine that was donated to me in hopes to advance the hobbyist community, which I am eternally grateful for. Image 1: Athena, the zBC12 that just now got online Then what is the main selling point of the zBC12 versus the z114? You might recall my article  System z on contemporary zLinux  where I explained that running modern Linux on a z114 is hard. This is the main selling point for me to upgrade - being able to run things like more modern Linuxes than z114. While the latest OSes in zLinux, z/VM, and z/OS require z13 or newer - a zBC12 still allows me to run a few releases newer software. Image 2: The operator himself in the picture with Athena Perhaps one of the bigger deals that is very welcome is the support for OSA-Express5S. This means that while previously you needed both PCIe and I/O bays in order to have both effective higher speed connectivity like 8G FC or 10 GB Ethernet as well as

Brocade Fabric OS downloads

Fabric OS is what runs on the SAN switches I will be using for the mainframe. It has a bit of annoying upgrade path as the guldmyr blog can attest to. TL;DR is that you need to do minor upgrades (6.3 -> 6.4 -> 7.0 -> ... > 7.4) which requires you to get all  Fabric OS images for those versions. Not always easy. So, let's make it a bit easier. Hopefully this will not end up with the links being taken down, but at least it helped somebody I hope. These downloads worked for me and are hash-verified when I could find a hash to verify against. Use at your own risk etc. The URLs are: ftp://ftp.hp.com/pub/softlib/software13/COL59674/co-168954-1/v7.3.2a.zip ftp://ftp.hp.com/pub/softlib/software13/COL59674/co-157071-1/v7.2.1g.zip ftp://ftp.hp.com/pub/softlib/software13/COL59674/co-150357-1/v7.1.2b.zip ftp://ftp.hp.com/pub/softlib/software12/COL38684/co-133135-1/v7.0.2e.zip ftp://ftp.hp.com/pub/softlib/software13/COL22074/co-155018-1/v6.4.3h.zip ftp://ftp.hp.c