Unboxing accessories and DS6800 troubles

I am back in Switzerland. It has been a good two months away from work, and the first morning after coming back I rushed to the office to unpack and inspect the goods that will hopefully make the mainframe come alive.

So far the equipment is:
  • DLm2000 virtual tape system
    • 1x Virtual Tape Engine (VTE)
    • 1x Access Control Point (ACP)
  • 2x Brocade 5100 SAN switch
  • 1x Arista DCS 7050S-64-R 10/40 Gbps switch
I am also eagerly awaiting 2x Brocade 7800 SAN switches, which should help me connect with other mainframers around the world and share some storage with them.

The Arista and the Brocade 5100s held no surprises: they booted up fine and had decently recent firmware. No worries there, so I'll skip the details.

DLm2000

The DLm2000 is a component I am excited about, and I think it has great hack potential. It is more or less 2x FICON PCIe cards and 1x 10 Gbit/s card. It presents itself as one or multiple tape drives and stores the tapes as AWS tape files. Given the size of the server itself (2U + 1U), and to minimize running expenses, I want to virtualize the server, so the first step is to take disk images of everything. The ACP had a normal SATA drive, so that was no issue. You can see the very normal server in picture 1.

Picture 1: DLm2000 ACP from the inside
The ACP is a single CPU board with a 2x 1GbE Intel network card. Nothing exciting, which makes it a prime candidate for being virtualized.

The VTE is more of a challenge though. See Picture 2.

Picture 2: DLm2000 VTE from the inside
The VTE is more of a beast. I might end up using this platform to run the virtualization on, we will see. With dual Xeon CPUs and free memory slots it could become quite a beefy platform. To take disk images I prefer not to boot the system; instead I have a helper machine called "slurpee" (Picture 3) that I connect to various media to make images of them. The problem is that the VTE is SAS RAID based. By the looks of it the system uses RAID-1 (two identical 15k drives), so stripping away the RAID metadata should be easy enough, but connecting the drives is another matter.
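As a rough illustration of the imaging step, here is a minimal Python sketch. It is not the tooling I actually use on "slurpee", and the function names and the 1 MiB chunk size are my own assumptions. It copies a device (or any file) into an image while hashing it, and reports the first byte offset at which two images differ. For a RAID-1 pair I would expect the data areas to match, while whatever region the controller keeps its metadata in may legitimately differ; where that region sits is not something I know yet.

```python
import hashlib

CHUNK_SIZE = 1024 * 1024  # read 1 MiB at a time (arbitrary choice)


def image_device(source_path, image_path, chunk_size=CHUNK_SIZE):
    """Copy source_path to image_path; return (bytes_copied, sha256_hex)."""
    digest = hashlib.sha256()
    copied = 0
    with open(source_path, "rb") as src, open(image_path, "wb") as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            dst.write(chunk)
            digest.update(chunk)
            copied += len(chunk)
    return copied, digest.hexdigest()


def first_difference(image_a, image_b, chunk_size=CHUNK_SIZE):
    """Return the offset of the first differing byte, or None if identical."""
    offset = 0
    with open(image_a, "rb") as a, open(image_b, "rb") as b:
        while True:
            chunk_a = a.read(chunk_size)
            chunk_b = b.read(chunk_size)
            if chunk_a != chunk_b:
                for i, (x, y) in enumerate(zip(chunk_a, chunk_b)):
                    if x != y:
                        return offset + i
                # No differing byte in the overlap: one image is shorter.
                return offset + min(len(chunk_a), len(chunk_b))
            if not chunk_a:
                return None
            offset += len(chunk_a)
```

On Linux the source path would be something like a `/dev/sdX` node opened read-only, which requires root. The hash is there so the image can be re-verified later without touching the original drive again.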

Picture 3: "slurpee" the data cloner
So, I am back to doing this the old-fashioned way: booting from a USB drive and doing the clone on the host system. Now the second problem: VGA. Over the past decade VGA appears to have left the modern company, as proven by the fact that I could not find a single VGA monitor, or an adapter to convert VGA to HDMI. Luckily the system has an Intel RMM (version 3, I assume). Unluckily it appears not to be configured and is thus deactivated. It produced no Ethernet packets that would hint at which IP address it was configured for, nor did it attempt DHCP. The manual states that the way to enable it is through the host BIOS, so we are back to square one.

I have ordered the cheapest VGA -> HDMI converter I could find; it should arrive early this week. More updates as that progresses.

DS6800

When I bought the DS6800 I knew it would be a gamble. It is known to be a really unreliable machine, as one Hacker News commenter confirms:
I cringed when he said he bought a DS6800. When I worked at IBM we had about 20 of them and they were shit. Always broke and getting into weird states. No way I'd run one with a support contract.
Long story short, the DS6800 does turn on, and one controller seems to work well enough. The internal diagnostics tell the story of the other controller, though.

Sanity Checker v0.30 invoked on c1 (noname)
------------------------------------------ Kona 0 --------- Kona 1 ---------
Checking free memory...................... FAILED!          Passed           
Verifying RW partitions................... FAILED!          Passed           
Verifying Kona replacement is enabled..... FAILED!          Passed           
Checking running processes................ FAILED!          Passed           
Checking disk space....................... FAILED!          Passed           
Checking SBR status....................... skipped          skipped          
Verifying four online DA partitions....... FAILED!          Passed           
Verifying certain files do not exist...... Passed           Passed           
Verifying that LCPSS is in Dual mode...... FAILED!          FAILED!          
Verifying no open hardware problems....... FAILED!          FAILED!          
Verifying no open software problems....... FAILED!          FAILED!          
Checking file permissions................. FAILED!          Passed           
Verifying no open cabling problems........ FAILED!          Passed           
Verifying no open data loss problems...... FAILED!          Passed           
Checking symbolic links................... FAILED!          Passed           
Checking number of IML retries............ FAILED!          Passed           
Verifying no CF R/W errors................ FAILED!          Passed           
Scanning ranks............................ FAILED!          Passed           
Checking serials in ncipl (strict)........ FAILED!          FAILED!          
Checking serials in ncipl (vote).......... skipped          skipped          
Checking PDM ISS consistency.............. skipped          skipped          
Checking PDM corruption................... skipped          Passed           
Checking Pulled out BANJO................. FAILED!          Passed           
----------------------------------------------------------------------------
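A report like this is easier to reason about once it is tallied per controller. The following Python sketch is my own helper, not part of the DS6800 tooling. It assumes every check line has the shape shown above: a check name, a run of dots, then one status word for each Kona.

```python
import re
from collections import Counter

# A check row: name, at least two dots, then the Kona 0 and Kona 1 status.
ROW = re.compile(r"^(?P<check>.+?)\.{2,}\s+(?P<kona0>\S+)\s+(?P<kona1>\S+)\s*$")


def parse_report(text):
    """Return a list of (check, kona0_status, kona1_status) tuples."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            rows.append((match.group("check").strip(),
                         match.group("kona0"),
                         match.group("kona1")))
    return rows


def summarize(rows):
    """Count the statuses per controller: (kona0_counts, kona1_counts)."""
    kona0 = Counter(status for _, status, _ in rows)
    kona1 = Counter(status for _, _, status in rows)
    return kona0, kona1


def failed_on_both(rows):
    """List the checks that failed on both controllers."""
    return [check for check, k0, k1 in rows if k0 == k1 == "FAILED!"]
```

Fed the report above, this shows Kona 0 failing 18 of the 23 checks while Kona 1 fails only 4. The four that fail on both sides are the LCPSS Dual mode check, the open hardware and software problem checks, and the strict ncipl serial check, which looks consistent with a healthy controller complaining about a missing partner.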
Looking at the traffic from both controllers, one is really lively with ARPs and ICMPs and everything, while the other one is just dead. I have heard stories about how these controllers die, ranging from ridiculous things like not being able to handle a full filesystem to the RAID controller simply dying.

So, I have one working controller - shouldn't that be enough? Maybe, but probably not. From people who have far more experience than me operating these systems I have learned that the system will continue to function with one controller, but will not accept any array changes. This means that your existing data will live on, but you cannot create any new arrays. And since I want to start with a clean array, that is no good.

Swapping the places of the two controllers worked in that the former Kona 1 became Kona 0, but the other controller is still dead. So at least the chassis is functional, and the fault follows the controller.

I am also in talks with a seller on Alibaba about buying more controllers to see if I can brute force it, but given the reliability reputation I do not wish to put a lot more money into the DS6800 unless I can be certain things will work out.

If I could figure out where the flash is located and how to access it offline, I might be more tempted to try repairing these things. The controller has a bunch of headers (see picture 4) that I am sure would be useful, but so far I haven't been able to figure out where the 2 GB system flash is located.

Picture 4: Inside the DS6800 controller
The next step in debugging this will be to connect to the cards' serial console to see whether it allows any form of recovery. That requires a custom RJ11 -> DB9 cable, which I will assemble as soon as I figure out the pinout.

That's all for now!
