Unboxing accessories and DS6800 troubles

I am back in Switzerland. It has been a good two months away from work, and the first morning after coming back I rushed to the office to unpack and inspect the goods that will hopefully make the mainframe come alive.

So far the equipment is:
  • DLm2000 virtual tape system
    • 1x Virtual Tape Engine (VTE)
    • 1x Access Control Point (ACP)
  • 2x Brocade 5100 SAN switch
  • 1x Arista DCS 7050S-64-R 10/40 Gbps switch
I am also eagerly awaiting 2x Brocade 7800 SAN switches, which should help me connect with other mainframers around the world and share some storage with them.

The Arista and the Brocade 5100s were really no surprise: they booted up fine and had decently recent firmware. No worries there, so I'll skip the details.

DLm2000

The DLm2000 is a component I am excited about, and I think it has great hack potential. It is more or less a server with 2x FICON PCIe cards and 1x 10 Gbit/s card. It presents itself as one or more tape drives and stores the tape contents as AWS-format tape files. Given the size of the system itself (2U + 1U), and to minimize running expenses, I want to virtualize the server, so the first step is to take disk images of everything. The ACP had a normal SATA drive, so that was no issue. You can see the very normal server in picture 1.

Picture 1: DLm2000 ACP from the inside
The ACP is a single-CPU board with a 2x 1GbE Intel network card. Nothing exciting, which makes it a prime candidate for virtualization.
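
As an aside on the AWS format: it appears to be the same chained-block layout that emulators like Hercules document, where each block is preceded by a 6-byte header holding the current and previous block lengths as little-endian 16-bit integers plus two flag bytes. A minimal sketch of a file walker under that assumption (the 0x40 tape-mark flag value is taken from the Hercules definition; I have not yet verified the DLm writes byte-identical headers):

import struct
import sys

def walk_aws(path):
    """Walk the chained block headers of an AWS tape file."""
    blocks = 0
    marks = 0
    with open(path, "rb") as f:
        while True:
            hdr = f.read(6)
            if len(hdr) < 6:
                break                      # end of file
            curlen, prvlen, flags1, flags2 = struct.unpack("<HHBB", hdr)
            if flags1 & 0x40:              # tape mark
                marks += 1
            elif curlen:
                blocks += 1
                f.seek(curlen, 1)          # skip past the data block
    print(f"{path}: {blocks} data blocks, {marks} tape marks")

if __name__ == "__main__":
    walk_aws(sys.argv[1])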

The VTE is more of a challenge though. See Picture 2.

Picture 2: DLm2000 VTE from the inside
The VTE is more of a beast: dual Xeon CPUs and plenty of available memory slots make it quite a beefy platform, so I might end up running the virtualization on it; we will see. To take disk images I prefer not to boot the system, so I have a helper machine called "slurpee" (picture 3) that I use to connect to various media and make images of them. The problem is that the VTE is SAS RAID based. By the looks of it the system uses RAID-1 (two identical 15k drives), so stripping away the RAID metadata should be easy enough, but connecting the drives is another matter.

"Slurpee" the data cloner
Picture 3: "slurpee" the data cloner
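If the drives do eventually get connected and imaged, the hope is that the RAID-1 members keep their metadata at the end of the disk, as controllers following the SNIA DDF convention do, which would leave the start of each member a clean filesystem image. A sketch of a tail scan under that assumption (the signature value and the 512-byte sector size come from the DDF spec and are not verified against this particular controller):

import struct
import sys

SECTOR = 512                    # assumed sector size
DDF_SIGNATURE = 0xDE11DE11      # SNIA DDF anchor header signature

def find_ddf_anchor(path, scan_sectors=32):
    """Scan the tail of a raw disk image for a DDF anchor block."""
    with open(path, "rb") as f:
        f.seek(0, 2)
        size = f.tell()
        for i in range(1, scan_sectors + 1):
            f.seek(size - i * SECTOR)
            (sig,) = struct.unpack(">I", f.read(4))
            if sig == DDF_SIGNATURE:
                return size - i * SECTOR
    return None

if __name__ == "__main__":
    offset = find_ddf_anchor(sys.argv[1])
    print("no DDF anchor found" if offset is None
          else f"DDF anchor at byte offset {offset}")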
So, I am back to doing this the old-fashioned way: booting from a USB drive and doing the clone on the host system. That led to the second problem: VGA. Over the past decade VGA appears to have left the modern office, proven by the fact that I could not find a single monitor or adapter to connect the VGA output to HDMI. Luckily the system has an Intel RMM (version 3, I assume). Unluckily, it appeared to be unconfigured and thus deactivated. It produced no Ethernet packets that would hint at which IP it was configured for, nor did it try to DHCP. The manual states that the way to enable it is through the host BIOS, so we are back to square one.
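
For what it is worth, concluding that the RMM was silent amounted to watching the wire; something like this raw-socket listener would do the job (the interface name is a placeholder, and it assumes a Linux host wired directly to the RMM port):

import socket

IFACE = "eth1"   # assumed: the NIC wired directly to the RMM port

# Promiscuous raw socket (Linux only, needs root): an unconfigured BMC
# that ARPs or tries to DHCP would show up here immediately.
ETH_P_ALL = 0x0003
s = socket.socket(socket.AF_PACKET, socket.SOCK_RAW, socket.ntohs(ETH_P_ALL))
s.bind((IFACE, 0))
while True:
    frame = s.recv(2048)
    src = frame[6:12].hex(":")
    ethertype = int.from_bytes(frame[12:14], "big")
    print(f"src {src} ethertype 0x{ethertype:04x}")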

I have ordered the cheapest VGA -> HDMI converter I could find; it should arrive early this week. More updates as that progresses.

DS6800

When I bought the DS6800 I knew it would be a gamble. It is known to be a really unreliable machine, as one Hacker News commenter confirms:
"I cringed when he said he bought a DS6800. When I worked at IBM we had about 20 of them and they were shit. Always broke and getting into weird states. No way I'd run one with a support contract."
Long story short: the DS6800 does turn on, and one controller seems to work well enough. The internal diagnostics tell the story about the other controller though.

Sanity Checker v0.30 invoked on c1 (noname)
------------------------------------------ Kona 0 --------- Kona 1 ---------
Checking free memory...................... FAILED!          Passed           
Verifying RW partitions................... FAILED!          Passed           
Verifying Kona replacement is enabled..... FAILED!          Passed           
Checking running processes................ FAILED!          Passed           
Checking disk space....................... FAILED!          Passed           
Checking SBR status....................... skipped          skipped          
Verifying four online DA partitions....... FAILED!          Passed           
Verifying certain files do not exist...... Passed           Passed           
Verifying that LCPSS is in Dual mode...... FAILED!          FAILED!          
Verifying no open hardware problems....... FAILED!          FAILED!          
Verifying no open software problems....... FAILED!          FAILED!          
Checking file permissions................. FAILED!          Passed           
Verifying no open cabling problems........ FAILED!          Passed           
Verifying no open data loss problems...... FAILED!          Passed           
Checking symbolic links................... FAILED!          Passed           
Checking number of IML retries............ FAILED!          Passed           
Verifying no CF R/W errors................ FAILED!          Passed           
Scanning ranks............................ FAILED!          Passed           
Checking serials in ncipl (strict)........ FAILED!          FAILED!          
Checking serials in ncipl (vote).......... skipped          skipped          
Checking PDM ISS consistency.............. skipped          skipped          
Checking PDM corruption................... skipped          Passed           
Checking Pulled out BANJO................. FAILED!          Passed           
----------------------------------------------------------------------------
Looking at the traffic from both controllers, one is really lively with ARPs and ICMPs and everything, while the other one is just dead. I have heard stories about how these controllers die, ranging from ridiculous things like not being able to handle a full filesystem to the RAID controller simply dying.

So, I have one working controller - shouldn't that be enough? Maybe, but probably not. From people with far more experience operating these systems than me, I have learned that the system will continue to function with one controller but will not accept any array changes. This means that existing data lives on, but you cannot create anything new. And since I want to start with a clean array, that is no good.

Swapping the places of the two controllers worked in the sense that Kona 1 now became Kona 0, but the other controller is still dead - so at least the chassis is functional.

I am also in talks with a seller on Alibaba about buying more controllers to see if I can brute-force it, but given the machine's reputation for unreliability I do not wish to put much more money into the DS6800 unless I can be certain things will work out.

If I could figure out where the flash is located and how to access it offline, I might be more tempted to try to repair these things. The controller has a bunch of headers (see picture 4) that I am sure would be useful, but so far I have not been able to figure out where the 2 GB system flash is located.

Picture 4: Inside the DS6800 controller
The next step in debugging will be to connect to the serial console the cards have and see if it allows any form of recovery. That requires a custom RJ11 -> DB9 cable, which I will assemble as soon as I figure out the pinout.
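
Once the cable exists, checking for signs of life is simple enough; a pyserial sketch, with the device name and the list of baud rates being guesses:

import serial   # pyserial

PORT = "/dev/ttyUSB0"                    # assumed adapter device
RATES = [9600, 19200, 38400, 57600, 115200]

# Poke the console at common baud rates; a readable banner instead of
# garbage means the rate is right.
for rate in RATES:
    with serial.Serial(PORT, rate, timeout=2) as tty:
        tty.write(b"\r\n")
        data = tty.read(256)
        if data:
            print(f"{rate} baud: {data!r}")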

That's all for now!
