Mainframe setup 1.0

A couple of weeks ago we streamed installing the mainframe in the datacenter from scratch. When we arrived the only things there were the mainframe itself, some power cables, a fiber drop for Internet, and some shelves. Nothing was connected together.

Being a datacenter it is of course noisy from all the fans, so I am not surprised that streaming with sound did not work out. Nevertheless, it seems like at least some folks enjoyed the stream.

Given that the stream content is somewhere around 12 hours, I figured a written walk-through of what we connected and why is in order. This is it.

The main connections you will need to your mainframe are:

Power
Fibre Channel (FICON and/or FCP)
Ethernet Fiber
Ethernet Copper (Management)

Power

Power has already been covered in the "Powering the mainframe" article, with the notable update that connecting both power cables for some reason trips the fuse in the datacenter. The manual says this:

Depending on the server configuration, this leakage current can reach 350mA (350 milliamps). For most reliable operation, Ground Fault Circuit Interrupter (GFCI), Earth Leakage Circuit Breaker (ELCB) or Residual Current Circuit Breaker (RCCB) type circuit breakers are not recommended for use with System z servers.

We will need to do some measurements to confirm this is the case, but for now we will run with only one connection hooked up. It does cause a couple of warnings to show up in the HMC and it does not report any power consumption data, but things at least work :-).
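
To see why a residual-current breaker is a plausible culprit, here is a minimal sketch that compares the manual's 350 mA worst case against a few breaker ratings. The ratings in the loop are common residual-current ratings picked for illustration, not the datacenter's actual breaker. The rule that such a breaker may trip from half its rated current and must trip at the rated current is an assumption about how these devices are specified. It does not replace the measurements.

```python
# Worst-case leakage current quoted from the manual, in milliamps.
MAX_LEAKAGE_MA = 350.0


def may_trip(leakage_ma: float, rated_ma: float) -> bool:
    """Assumption: a residual-current breaker is allowed to trip
    once the leakage reaches half of its rated residual current."""
    return leakage_ma >= 0.5 * rated_ma


def must_trip(leakage_ma: float, rated_ma: float) -> bool:
    """Assumption: the breaker has to trip at its rated residual current."""
    return leakage_ma >= rated_ma


if __name__ == "__main__":
    # Illustrative ratings only; the real breaker has not been identified.
    for rated_ma in (30, 100, 300, 500):
        if must_trip(MAX_LEAKAGE_MA, rated_ma):
            verdict = "must trip"
        elif may_trip(MAX_LEAKAGE_MA, rated_ma):
            verdict = "may trip"
        else:
            verdict = "should hold"
        print(f"{rated_ma} mA breaker: {verdict}")
```

If the real leakage is well below the worst case, a single cable staying under the threshold while two cables together exceed it would match what we saw.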

Fibre Channel and Ethernet Fiber

These are the two main interconnects used to connect the mainframe to the outside world. Fibre Channel (FC), either in FICON or FCP (SCSI) mode, is used for storage systems like hard drives and RAID arrays. Ethernet is connected to cards that are called OSA-Express cards. For z114 the two versions in use are OSA-Express3 and OSA-Express4S - the former sits in the I/O cage, the latter in the PCIe cage.

Picture 1: The front PCIe cage

The PCIe cage (Picture 1) is the replacement for the I/O cage and the one I am focusing on, both in this blog and in real life. My I/O cage mainly contains ESCON ports - which, ESCON being the predecessor of FICON, are not interesting at all to me.

When cabling, remember that you need to cable both the front and the back. I ended up buying trunk cables from FS.com that neatly allowed me to connect all 10 fiber pairs per side with only three cables - one each for FC, 1G, and 10G. The cables are also armored, which is nice.

Picture 2: SAN A, Brocade 7800 Extension Switch

The other side of the FC cables needs to go somewhere, and talking to folks that build FC networks as a day job I found out that the common way to set things up is to have an A (Picture 2) and a B (Picture 3) side. When changing something you change one side at a time - it is fine if you disrupt one side, as traffic will fail over to the other in that case. The downside is that you need to configure two sets of devices identically.

Picture 3: SAN B, Brocade 5100 Switch

I chose two different models mainly because I wanted to try these two different switches to compare how they are to manage and operate - but also because I want to give Fibre Channel over IP (FCIP) a try. The Brocade 7800 supports FCIP but is significantly more expensive. FCIP is a quite commonly used protocol to link mainframe storage over the Internet, and some preliminary experiments I have conducted show that it should be able to offer some non-trivial amount of I/O over even quite poor links. We will see how it stands up in reality down the road.
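
As a rough way to reason about what a poor link can deliver: FCIP carries Fibre Channel frames over TCP, and a single TCP connection cannot move more than one window of data per round trip. The sketch below computes that ceiling. The window sizes and the 50 ms round-trip time are made-up illustration values, not measurements from my experiments, and real FCIP gear may use several connections or other tricks this ignores.

```python
def tcp_throughput_ceiling_mbit(window_bytes: int, rtt_ms: float) -> float:
    """Upper bound for one TCP connection in Mbit/s: window / round-trip time."""
    if rtt_ms <= 0:
        raise ValueError("rtt_ms must be positive")
    bits_per_second = window_bytes * 8 / (rtt_ms / 1000.0)
    return bits_per_second / 1e6


if __name__ == "__main__":
    # Hypothetical link: 50 ms round trip, two different window sizes.
    for window in (64 * 1024, 4 * 1024 * 1024):
        ceiling = tcp_throughput_ceiling_mbit(window, rtt_ms=50.0)
        print(f"window {window} bytes: at most {ceiling:.1f} Mbit/s")
```

The takeaway is that on a high-latency link the window size, not the raw bandwidth, can end up being the limit.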


Picture 4: Arista Ethernet Switch

Given that you will want to access your mainframe and its workloads from outside the datacenter, you need an Ethernet switch. Getting a cheap 10G switch from eBay is highly recommended. The z114 mostly uses 1G fiber, but I purchased a pair of 10G cards as well to give them a try.

Ethernet Copper (Management)

While you can manage the mainframe fully from the Support Elements (SE) that are bolted to the frame, you will most likely want to run a Hardware Management Console (HMC) that allows you to use an ordinary web browser (IBM apparently recommends Firefox) to access the management interface.

For this to work you need to connect port J02 on both the front and the back BPH-e (Bulk Power Hub) switch; in Picture 5 it is the second cable from the left (red). This is how the HMC can talk to the primary and the secondary SE.

The J01 port is used to connect the SEs to the Internet, I suppose so that they can download things from IBM. Since I do not have any service contract I have not bothered to set them up - but we connected them nonetheless.

Picture 5: BPH-e Mainframe Ethernet Switch

OSA-ICC

In addition to this there are most likely some Ethernet RJ45 ports in either the I/O cage or the PCIe cage that are so-called OSA-ICC ports. These ports can be configured to be TN3270 IP consoles. You can assign one of these ports as a console in an LPAR, which gives you a pretty foolproof way of accessing a console for maintenance or in emergencies.

This is only supported on OSA cards with copper ports, and sadly for z114 that means exclusively models that use the I/O cage. In order to conserve power I aim to disconnect my I/O cage, as according to the documentation it consumes about 500 W. If it ends up being less power-hungry without all the ESCON cards I might keep it to get these pretty cool ports.

Accessories

I chose to run the HMC software in a VM on a pretty powerful machine (Picture 6) I had lying around. Running the HMC virtualized seems to be an unsupported configuration, but it required no hackery to install and seems to be working really well. The only gotchas with ESXi were making sure to use SATA as the emulated storage backend, and that the USB 3.0 controller seemed to work better than the default USB 2.0 one. Even VMXNET 3 worked out of the box.

I chose to install a Fibre Channel card (not pictured) in the machine as well. This card is then hooked up to a Linux VM using PCIe pass-through, allowing the VM to fully utilize the card as if it were a "proper" server. Why? This makes it possible for the Linux VM to be a so-called FC target - i.e. a virtual hard drive that the mainframe can boot from or use as a normal hard drive.

Picture 6: OCP Leopard as a VM host

Emergency Access

Everything above is enough to give access to both maintenance and workloads running on the mainframe on sunny days. However, as part of a serious production network - or a chaotic laboratory network in this case - things might not always go as planned, and in those cases you will need to be able to access things outside the normal routes. This is usually called out-of-band access and involves a secondary connection, preferably not sharing any of the points of failure of the primary route.

In order to provide this kind of secondary route I chose to use 4G/LTE. It is quite cheap to buy a second-hand LTE modem for use in laptops, and the form factor is the same for embedded Linux servers. I went with the APU2 from PCengines (Picture 7). They make very affordable and open-source embedded servers in the vicinity of where I live in Switzerland. I had the device in my hands within 24 hours of placing the order.

Picture 7: PCengines APU2 computer with LTE connection

Not pictured is the external LTE antenna that one of the datacenter staff mounted in the window. All in all the latency and throughput offered are excellent for an emergency access alternative. Over a WireGuard tunnel the resulting latency is 55 ms, compared with 300+ ms on 3G, which was utterly unusable.

Given the APU2 has so many ports, I opted to use the WiFi card as a third emergency route connected to the datacenter's office WiFi. I haven't told them about this yet, so when they read this I hope they are OK with it ;-). It is of course much nicer to use than LTE but it also shares a couple of points of failure with the primary route. Realistically the most probable failure mode is me accidentally configuring the wrong port on the network switch, and for that the WiFi works great.

I also connected the PDUs with remote management to this box to allow me to power-cycle almost all the gear if need be.
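
The preference order between the routes (normal path first, then WiFi, then LTE) can be sketched as a small script to run from a laptop. This is only an illustration: the addresses below come from the reserved documentation range, port 22 assumes SSH is what answers on the other end, and the names `tcp_probe`, `pick_route` and `ROUTES` are made up for this example.

```python
import socket
from typing import Callable, Optional, Sequence, Tuple

Route = Tuple[str, str, int]  # (name, host, port)

# Placeholder addresses from the documentation range, in order of preference.
ROUTES: Sequence[Route] = (
    ("primary", "192.0.2.10", 22),
    ("wifi", "192.0.2.20", 22),
    ("lte", "192.0.2.30", 22),
)


def tcp_probe(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def pick_route(
    routes: Sequence[Route],
    probe: Callable[[str, int], bool] = tcp_probe,
) -> Optional[str]:
    """Return the name of the first route that answers, or None if all are down."""
    for name, host, port in routes:
        if probe(host, port):
            return name
    return None
```

Calling `pick_route(ROUTES)` tries each route in turn and returns the first one that answers. The `probe` argument is there so the selection logic can be exercised without touching the network.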

Finished 1.0 Setup

In Picture 8 you can see how it all looks connected. Some parts, like the DS6800 array that is still being repaired, will have to wait for 2.0.

Picture 8: Finished 1.0 setup

I must say I am very happy with how it all turned out, and it wouldn't have been possible without the help of some seriously awesome people. I will make an attempt at listing them below but I will without doubt forget some folks - I hope they do not hold it against me.

The wonderful staff at the Ungleich datacenter
Emil for helping with the logistical nightmare
Markus for helping me connect everything together
Andreas for helping move equipment between buildings
Sebastian for all the help in becoming a mainframe hobbyist myself
Moshix for all the excellent mainframe knowledge
Jonathan Rauch for providing me with IBM help
Mattson for sprinkling some of that certified electrician dust over the power cables
Google's shipping & receiving, you accepted way more packages than anyone reasonably should
My partner for supporting my crazy hobbies, I love you

As always, thanks for reading!

mainframe.dev