I am back in Switzerland. It has been a good two months away from work, and the first morning after coming back I rushed to the office to unpack and inspect the goods that will hopefully make the mainframe come alive.
So far the equipment is:
- DLm2000 virtual tape system, consisting of:
  - 1x Virtual Tape Engine (VTE)
  - 1x Access Control Point (ACP)
- 2x Brocade 5100 SAN switch
- 1x Arista DCS 7050S-64-R 10/40 Gbps switch
I am also eagerly awaiting two Brocade 7800 SAN switches, which should help me connect with other mainframers around the world and share some storage with them.
The Arista and the Brocade 5100s held no surprises: they booted up fine and had decently recent firmware. No worries there, so I'll skip the details.
The DLm2000 is a component I am excited about, and I think it has great hack potential. It is more or less a server with two FICON PCIe cards and one 10 Gbit/s card. It presents itself as one or more tape drives and stores their contents as AWS tape files. Given the size of the hardware (2U + 1U), and to minimize running expenses, I want to virtualize it, so the first step is to take disk images of everything. The ACP had a normal SATA drive, so that was no issue. You can see the very normal server in picture 1.
|Picture 1: DLm2000 ACP from the inside
The ACP is a single CPU board with a 2x 1GbE Intel network card. Nothing exciting, which makes it a prime candidate for being virtualized.
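For a plain SATA drive like the one in the ACP, the imaging step can be as simple as a chunked copy with a checksum. The following is only a minimal sketch of that idea, not the tool actually used here: the function names `clone_image` and `sha256_of` are hypothetical, and it assumes the source is readable as an ordinary file or block device (reading a raw device normally requires root).

```python
import hashlib


def clone_image(src_path, dst_path, chunk_size=4 * 1024 * 1024):
    """Copy src_path to dst_path in chunks.

    Returns (bytes_copied, sha256_hex) of the data that was read.
    src_path may be a regular file or a block device node (assumption:
    the caller has permission to read it).
    """
    digest = hashlib.sha256()
    copied = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:  # end of device or file
                break
            dst.write(chunk)
            digest.update(chunk)
            copied += len(chunk)
    return copied, digest.hexdigest()


def sha256_of(path, chunk_size=4 * 1024 * 1024):
    """Re-read a file and return its SHA-256, to verify the written image."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Comparing the hash returned by `clone_image` with `sha256_of` on the resulting image is a cheap way to confirm the copy landed intact before the original drive goes back into the chassis.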
The VTE is more of a challenge though. See Picture 2.
|Picture 2: DLm2000 VTE from the inside
The VTE is more of a beast. I might end up using this platform to run the virtualization on, we will see. With dual Xeon CPUs and free memory slots, it could be made into quite a beefy platform. To take disk images I prefer not to boot the system; instead I have a helper machine called "slurpee" (Picture 3) that I connect to various media to make images of them. The problem is that the VTE is SAS RAID based. By the looks of it the system uses RAID-1 (two identical 15k drives), so stripping away the RAID metadata should be easy enough, but connecting the drives is another matter.
|Picture 3: "slurpee" the data cloner
So I am back to doing this the old-fashioned way: booting from a USB drive and doing the clone on the host system. Now the second problem: VGA. Over the past decade VGA appears to have disappeared from the modern company, as proven by the fact that I could not find a single monitor or adapter to connect VGA out to HDMI. Luckily the system has an Intel RMM (version 3, I assume). Unluckily it appears to be unconfigured and thus deactivated: it produced no Ethernet packets that would hint at which IP it was configured for, nor did it try DHCP. The manual states that the way to enable it is through the host BIOS, so we are back to square one.
I have ordered the cheapest VGA -> HDMI converter I could find; it should arrive early this week. More updates as that progresses.
When I bought the DS6800 I knew it would be a gamble. It is known to be a really unreliable machine, as one Hacker News commenter confirms:
"I cringed when he said he bought a DS6800. When I worked at IBM we had about 20 of them and they were shit. Always broke and getting into weird states. No way I'd run one with a support contract."
Long story short, the DS6800 does turn on, and one controller seems to work well enough. The internal diagnostics tell the story of the other controller, though.
Sanity Checker v0.30 invoked on c1 (noname)
------------------------------------------ Kona 0 --------- Kona 1 ---------
Checking free memory...................... FAILED!          Passed
Verifying RW partitions................... FAILED!          Passed
Verifying Kona replacement is enabled..... FAILED!          Passed
Checking running processes................ FAILED!          Passed
Checking disk space....................... FAILED!          Passed
Checking SBR status....................... skipped          skipped
Verifying four online DA partitions....... FAILED!          Passed
Verifying certain files do not exist...... Passed           Passed
Verifying that LCPSS is in Dual mode...... FAILED!          FAILED!
Verifying no open hardware problems....... FAILED!          FAILED!
Verifying no open software problems....... FAILED!          FAILED!
Checking file permissions................. FAILED!          Passed
Verifying no open cabling problems........ FAILED!          Passed
Verifying no open data loss problems...... FAILED!          Passed
Checking symbolic links................... FAILED!          Passed
Checking number of IML retries............ FAILED!          Passed
Verifying no CF R/W errors................ FAILED!          Passed
Scanning ranks............................ FAILED!          Passed
Checking serials in ncipl (strict)........ FAILED!          FAILED!
Checking serials in ncipl (vote).......... skipped          skipped
Checking PDM ISS consistency.............. skipped          skipped
Checking PDM corruption................... skipped          Passed
Checking Pulled out BANJO................. FAILED!          Passed
----------------------------------------------------------------------------
Looking at the traffic from both controllers, one is really lively with ARPs and ICMPs and everything, while the other one is just dead. I have heard stories about how these controllers die that range from the ridiculous, such as not being able to handle a full filesystem, to the RAID controller simply dying.
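To see at a glance which checks fail on which controller, the Sanity Checker output can be parsed with a few lines of Python. This is just a sketch: the row layout (a label padded with dots, followed by one status per controller) is inferred from the output above, and the names `parse_sanity` and `failed_on` are made up for illustration. `SAMPLE` holds only a few of the rows.

```python
import re

# One result row: a label padded with dots, then the Kona 0 and Kona 1 status.
# Layout inferred from the output shown above; not an official format.
ROW = re.compile(r"^(?P<check>.+?)\.{2,}\s+(?P<kona0>\S+)\s+(?P<kona1>\S+)\s*$")

SAMPLE = """\
Sanity Checker v0.30 invoked on c1 (noname)
------------------------------------------ Kona 0 --------- Kona 1 ---------
Checking free memory...................... FAILED!          Passed
Checking SBR status....................... skipped          skipped
Verifying certain files do not exist...... Passed           Passed
Verifying that LCPSS is in Dual mode...... FAILED!          FAILED!
"""


def parse_sanity(text):
    """Return {check name: (kona0 status, kona1 status)} for every result row."""
    results = {}
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match:  # header and separator lines do not match and are skipped
            results[match["check"]] = (match["kona0"], match["kona1"])
    return results


def failed_on(results, controller):
    """List the checks reported as FAILED! on controller 0 or 1."""
    return [name for name, status in results.items()
            if status[controller] == "FAILED!"]


results = parse_sanity(SAMPLE)
```

Run against the full output, `failed_on(results, 1)` would list the four checks that fail even on the live controller: LCPSS Dual mode, open hardware problems, open software problems, and the strict ncipl serial check.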
So, I have one working controller. Shouldn't that be enough? Maybe, but probably not. From people who have way more experience than me operating these systems I have learned that the system will continue to function with one controller, but will not accept any array changes. This means that your existing data will live on, but you cannot create any new arrays. And since I want to start with a clean array, that is no good.
Swapping the places of the two controllers worked in the sense that the former Kona 1 became Kona 0, while the other controller stayed dead, so at least the chassis is functional.
I am also in talks with a seller on Alibaba about buying more controllers to see if I can brute force it, but given the reliability reputation I do not wish to put a lot more money into the DS6800 unless I can be certain things will work out.
If I were to figure out where the flash is located and how to access it offline, I might be more tempted to try repairing these things. The controller has a bunch of headers (see picture 4) that I am sure would be useful, but so far I haven't been able to figure out where the 2 GB system flash is located.
|Picture 4: Inside the DS6800 controller
The next step in debugging this will be to connect to the serial console on the cards to see if it allows any form of recovery. It needs a custom RJ11 -> DB9 cable that I will assemble as soon as I figure out the pinout.
That's all for now!