Unboxing accessories and DS6800 troubles

I am back in Switzerland. It has been a good two months away from work, and the first morning after coming back I rushed to the office to unpack and inspect the goods that will hopefully make the mainframe come alive.

So far the equipment is:

DLm2000 virtual tape system

1x Virtual Tape Engine (VTE)
1x Access Control Point (ACP)

2x Brocade 5100 SAN switch
1x Arista DCS 7050S-64-R 10/40 Gbps switch

I am also eagerly awaiting 2x Brocade 7800 SAN switch which should help me connect with other mainframers in the world and share some storage with them.

The Arista and the Brocade 5100 were really no surprises, booted up fine and had decently recent firmware. No worries there, so I'll skip the details.

DLm2000

The DLm2000 is a component I am excited about and I think has great hack potential. It is more or less 2x FICON PCIe cards, and 1x 10 Gbit/s card. It presents itself as one or multiple tape drives and stores the drives as AWS tape files. Given the size of the server itself (2U + 1U) and to minimize running expenses I want to virtualize the server, so the first step is to take disk images of everything. The ACP had a normal SATA drive so that was no issue. You can see the very normal server in picture 1.

Picture 1: DLm2000 ACP from the inside

The ACP is a single CPU board with a 2x 1GbE Intel network card. Nothing exciting, which makes it a prime candidate for being virtualized.

The VTE is more of a challenge though. See Picture 2.

Picture 2: DLm2000 VTE from the inside

The DLm2000 is more of a beast. I might end up using this platform to run the virtualization on, we will see. Dual Xeon CPUs and memory slots available to make it a quite beefy platform. In order to take disk images I prefer to not boot the system, I have a helper machine called "slurpee" (Picture 3) that I use to connect to various media to make images out of them. The problem is that the VTE is SAS RAID based. By the looks of it the system uses RAID-1 (two identical 15k drives) so stripping away the RAID metadata should be easy enough, but connecting the drives is another matter.

Picture 3: "slurpee" the data cloner

So, I am back doing this the old fashioned way of booting through a USB drive and doing the clone on the host system. Now the second problem: VGA. Over the past decade it appears that VGA has left the modern company proven by the fact that I could not find a single monitor or adapter to connect VGA out to HDMI. Luckily the system has an Intel RMM (version 3 I assume). Unluckily it appeared to not be configured and thus deactivated. It produced no Ethernet packets that would give a hint of which IP it was configured for, nor did it try to DHCP. The manual states that the way to enable it is through the host BIOS, so we are back to square one.

I have ordered the cheapest VGA -> HDMI converter I could find, it should arrive early this week. More updates as that progresses.

DS6800

When I bought the DS6800 I knew it would be a gamble. It is known to be a really unreliable machine, as one Hacker News commenter confirms:

I cringed when he said he bought a DS6800. When I worked at IBM we had about 20 of them and they were shit. Always broke and getting into weird states. No way I'd run one with a support contract.

Long story short, the DS6800 does turn on and one controller works from what it seems well enough. The internal diagnostics tells the story about the other controller though.

Sanity Checker v0.30 invoked on c1 (noname)
------------------------------------------ Kona 0 --------- Kona 1 ---------
Checking free memory...................... FAILED!          Passed           
Verifying RW partitions................... FAILED!          Passed           
Verifying Kona replacement is enabled..... FAILED!          Passed           
Checking running processes................ FAILED!          Passed           
Checking disk space....................... FAILED!          Passed           
Checking SBR status....................... skipped          skipped          
Verifying four online DA partitions....... FAILED!          Passed           
Verifying certain files do not exist...... Passed           Passed           
Verifying that LCPSS is in Dual mode...... FAILED!          FAILED!          
Verifying no open hardware problems....... FAILED!          FAILED!          
Verifying no open software problems....... FAILED!          FAILED!          
Checking file permissions................. FAILED!          Passed           
Verifying no open cabling problems........ FAILED!          Passed           
Verifying no open data loss problems...... FAILED!          Passed           
Checking symbolic links................... FAILED!          Passed           
Checking number of IML retries............ FAILED!          Passed           
Verifying no CF R/W errors................ FAILED!          Passed           
Scanning ranks............................ FAILED!          Passed           
Checking serials in ncipl (strict)........ FAILED!          FAILED!          
Checking serials in ncipl (vote).......... skipped          skipped          
Checking PDM ISS consistency.............. skipped          skipped          
Checking PDM corruption................... skipped          Passed           
Checking Pulled out BANJO................. FAILED!          Passed           
----------------------------------------------------------------------------

Looking at the traffic from both controllers one is really lively with ARPs and ICMPs and everything, while the other one is just dead. I have heard stories from how these controllers die that range from ridiculous things like that they cannot handle a full filesystem to the RAID controller just dies.

So, I have one working controller - shouldn't that be enough? Maybe, but probably not. From people that have way more experience than me operating these systems I have learned that the system will continue to function with one controller, but will not accept any array changes. This means that your data will continue to live on, but you cannot make any new. And since I want to start with a clean array, that is no good.

Swapping the places of the two controllers worked in that the Kona 1 now became Kona 0, but the other controller is still dead - so at least the chassis is functional.

I am also in talks with a seller on Alibaba about buying more controllers to see if I can brute force it, but given the reliability reputation I do not wish to put a lot more money in the DS6800 unless I can be certain things will work out.

If I were to figure out where the flash is located and how to access it offline I might be more tempted in trying to repair these things. The controller has a bunch of headers (see picture 4) that I am sure would be useful, but so far I haven't been able to figure out where the 2 GB system flash is located.

Picture 4: Inside the DS6800 controller

Next step in debugging this will be to connect to the serial console the cards have to see if it allows any form of recovery. It is a custom RJ11 -> DB9 cable that I will need to assemble as soon as I figure out the pinout.

That's all for now!

System z on contemporary zLinux

IBM System z supports a handful of operating systems; z/VM, z/VSE, z/OS, z/TPF, and finally zLinux. All the earlier mentioned OSes are proprietary except for zLinux which is simply Linux with a fancy z in the name. zLinux is the term used to describe a Linux distribution compiled for S390 (31 bit) or S390X (64 bit). As we are talking about modern mainframes I will not be discussing S390, only S390X. There is a comfortable amount of distributions that support S390X - more or less all of the popular distributions do. In this list we find distributions like Debian, Ubuntu, Gentoo, Fedora, and RHEL. Noticeably Arch is missing but then again they only have an official port for x86-64. This is great - this means that we could download the latest Ubuntu, boot the DVD, and be up and running in no time, right? Well, sadly no. The devil is, as always, in the details. When compiling high level code like C/C++/Go the compiler needs to select an instruction set to use for the compi...

mainframe.dev