Skip to main content

Who needs VRAM anyway? Or, calculating Mandelbrot on an FPGA

About a month ago Terasic had a 50% off on their DE2 board if you could prove that you're a student (check) and attended a course using FPGAs (check). Sadly I was involved in too many other things at the moment to actually use it.

Fast forward to a week ago. I finally took the time to install Quartus and begin playing around with the hardware. All my previous encounters with FPGAs and CPLDs where with VHDL and cumbersome programs like WebPACK (slow) and HDL Designer (complex). Since my first to programs had been a disappointment I had almost given up on finding a good environment (what's wrong with a CLI compiler and Makefiles!?) but after spending a day or so in Quartus I can honestly say I enjoy it.

During the course mentioned, one laboratory exercise was to create a functional VGA generator using only a VGA DAC located on the board. I had since then become very curious about Verilog, which seemed to fit my preference of C over Ada quite well. This made chosing my first project quite easy; porting the VGA generator to Verilog.


First VGA success!

Success! After some minor amount of bugs from translating the code it worked! I was very happy to have had such easy victory over something that potentially could have been so much hassle. To my knowledge the fitter didn't even mess things up - which I have had my share of issues with in other tools.

Now then, the VGA output generation works but it is not that thrilling to stare at a green box. Some time ago I discussed with a friend about the possibility of implemented a pipelined Mandelbrot generator using a fixed-point decimal system. We thought that the easy operations of Mandelbrot wouldn't pose a problem really and though the idea was plausible. This provided the seed for the next part: generating Mandelbrot in hardware.

A quick walkthrough on Mandelbrot. The Mandelbrot set consists of the points where the equation z_n+1 = z_n^2 + c is bounded (i.e does not diverge to infinity). A rule of thumb can be applied saying that if |z_n| > 2 then it will diverge. Coloring all points belonging to the set white and using 100 iterations before giving up and assume that it is bounded results in the following image.


Mandelbrot set colored white.

That's nice and all, but the real beauty of Mandelbrot sets become apparent if we instead color each pixel with a intensity proportional to the amount of iterations it takes before we know that it is not bounded - if it is bounded (i.e we gave up) we color it black.


Mandelbrot set colored green w/ varied intensity.

Back to the hardware! A normal Mandelbrot set is calculated from -2.0 to 1 (real) and -1.5 and 1.5 (imaginary) or something close to that. This poses a problem: floating point calculations. I do not want to do floating point calculations in the processor because (1) it lacks the blocks for it and (2) it would be a waste of resources I imagine (I would love to be proven wrong here). I decided to go for the solution previously discussed, namely a fixed-point decimal system.

A fixed-point decimal system is basically a scale applied to the initial number. After some experimentation I found that 25 bit of decimal, 7 bit of integer and 1 sign bit probably would be good - total 33 bits.

Since addition and subtraction works as usual with fixed-point as they do with integers, the only operator I needed to implement was multiplication. This is implemented as ab = a*b*1/scale where 1/scale is a power of two and thus implemented as a right shift.

Simple, isn't it? Moving on the Mandelbrot calculation.

The calculations for out_re and out_im are very straight forward to derive and I will not do so in this post, but I will mention the magic value 134217728. 134217728 is 4 in this fixed point system, remember that the rule of thumb was that |Z| > 2? We implement that as Re(z)^2 + Im(z)^2 > 4. I am very satisfied in how clean the implementation of this one iteration ended up being.

Final step, pushing the limits of the hardware and adding as many of these as possible. Testing various combinations proved that a 12 stage pipeline with one calculation per stage is the best. It was possible to have multiple calculations per stage up until some limit at which the fitter would scream that I violated setup and hold times. Since the results are quite funky, I have chosen to include them at the end with a quick description of my diagnosis why they look like they do.

Anyway, final image and the interesting parts of the code follows.


Mandelbrot final output - 12 iterations.

As promised, the goofs follows.


Mandelbrot w/o pipeline, calculation errors and violation of hold time.

Mandelbrot w/o pipeline, violation of hold time.

Mandelbrot w/ pipeline, rounding errors, violating hold time.

Mandelbrot w/ pipeline, rounding errors.

Comments

Popular posts from this blog

Buying an IBM Mainframe

I bought an IBM mainframe for personal use. I am doing this for learning and figuring out how it works. If you are curious about what goes into this process, I hope this post will interest you.

I am not the first one by far to do something like this. There are some people on the internet that I know have their own personal mainframes, and I have drawn inspiration from each and every one of them. You should follow them if you are interested in these things:
@connorkrukosky@sebastian_wind@faultywarrior@kevinbowling1 This post is about buying an IBM z114 mainframe (picture 1) but should translate well to any of the IBM mainframes from z9 to z14.

What to expect of the process Buying a mainframe takes time. I never spent so much time on a purchase before. In fact - I purchased my first apartment with probably less planning and research. Compared to buying an apartment you have no guard rails. You are left to your own devices to ensure the state of whatever thing you are buying as it likely…

Powering a mainframe

The last few days have been eventful. I was contacted by the datacenter that the mainframe's cage is now ready for moving in, and the power has been made available. Very exciting! I grabbed my home-made power cables (more on that later) and my best screwdrivers and set off to the datacenter.


The datacenter staff, not needing a forklift in their day-to-day, had managed to solicit the services of a forklift, the associated operator, and some very handy folks to help navigate the mainframe from the storage space to its final location.



After some intense period of fighting the inclination of the road between the storage facility and the cage (and a door that was a bit too small) it was finally in place. Incidentally we were forced to trust the wheels on this pretty rough floor. I did not expect it to roll that well on raw concrete, I was pleasantly surprised. This thing is a tank!

Now, everybody wanted to see if it was working. My machine did not come with a power cable so I had to so…

Open Datacenter Hardware - Leopard Server

Introduction The Leopard is an OpenRack v1 compliant 12V server commissioned by Facebook to offer compute power. It consists of 2x Intel Xeon E5-2678 v3 and is available with either DDR3 or DDR4 memory. The model is manufactured by two vendors primarily: Quanta and Wiwynn.

Leopard features a 24x PCIe slot which can fit either a PCIe card with low profile, or a riser card with 1x 16x and 1x 8x slots. The server also supports a 3.5" SATA drive as well as either an mSATA or an M.2 drive mounted on the motherboard.

Connectivity wise Leopard has a mezzanine card slot allowing for example 10Gb/s or 25Gb/s Ethernet.

Figure 1 and figure 2 shows the server layout. The server is made to fit inside an OpenRack v1 enclosure, at which point it looks something like figure 3. Due to power constraints an OpenRack v1 can fit 30 of these servers before power starts to become an issue. The Leopard servers that the organization Serverfarm Upstate provides are all fitted with 256GiB DDR3 RAM and 2x …