I have done systems administration for as long as I can remember, and while I have set up countless services and servers, I have quite limited experience working with the full life-cycle of truly enterprise software. Therefore, I thought it would be interesting to understand more about how one would plan for and execute updates on an IBM Z Series mainframe.
If you consider a typical Linux system you generally have some sort of distribution release number (e.g. Ubuntu 20.04 LTS) as well as maybe a service pack (e.g. 20.04.1). Drilling a bit deeper you have individual packages like Apache2 at a certain version number. As part of keeping the system up to date you would typically have a schedule like the following:
- Update all packages once every week
- Update the release once every two years
The service packs in Linux are typically implicit. If you install the latest package updates, that means you are up to date on the latest service pack. In Windows the service packs used to be explicit (e.g. Windows XP SP3) but Microsoft has since adopted so-called feature updates that act similarly to Linux releases. When you install a feature update the whole OS is reinstalled as part of the update, while trying to preserve the data as best it can.
What about mainframes?
Mainframes work quite similarly to what we are used to, but not exactly. This is where the acronyms start. IBM calls software Licensed Internal Code (LIC). If you see the term LIC you can mentally replace it with "program", "software", or possibly "firmware". The version of a LIC is referred to as its Control Level (CL). The LIC has releases similar to Linux releases, while the CL is more like service packs.
Example: The Support Elements' (SE) LIC is called "Driver XYZ" where XYZ is the release name. For the z114 the latest LIC release is "Driver 93G" - commonly referred to as just "Driver 93". In my case, the full version I am using is "Driver 93G CL0011" which you now should be able to parse.
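To make the naming concrete, here is a minimal sketch that splits such a string into its release and CL parts. The pattern is only inferred from the single example above ("Driver 93G CL0011"), so treat the regular expression and the names `DriverVersion` and `parse_driver` as assumptions rather than anything IBM documents.

```python
import re
from typing import NamedTuple


class DriverVersion(NamedTuple):
    release: str        # e.g. "93G"
    control_level: int  # e.g. 11


# Hypothetical pattern, inferred from the one example "Driver 93G CL0011".
_DRIVER_PATTERN = re.compile(r"^Driver\s+(?P<release>[0-9]+[A-Z])\s+CL(?P<cl>[0-9]+)$")


def parse_driver(text: str) -> DriverVersion:
    """Split a 'Driver <release> CL<level>' string into its two parts."""
    match = _DRIVER_PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized driver string: {text!r}")
    return DriverVersion(match.group("release"), int(match.group("cl")))


print(parse_driver("Driver 93G CL0011"))
# DriverVersion(release='93G', control_level=11)
```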
Side note: The installation medium for the LIC is commonly referred to as activated read-only memory (AROM). If anyone knows the history behind that naming I would love to hear it!
So far there are next to no differences from a typical system except for the new acronyms and names. That ends here. You would be excused for thinking that users of CL 1 would update to CL 2 when it is released; it is the logical thing to do coming from a typical Linux world! However, that is not what you typically do here. The only time you would change the CL is when you upgrade to a new release (e.g. Driver 86E CL x to 93G CL y). In fact, if you try to restore a backup to a system on a newer CL, it will fail. In practice you install the recommended CL at the time of the LIC upgrade and you stick with that.
So, what about day-to-day updates?
So if you do not update the CL, how do you make sure you have all the updates? Unsurprisingly, in a similar fashion to that typical Linux system we keep referring to. A CL is simply a "lower bar" for the versions of all packages. If you were to compare CL 1 and CL 10 you would see the same packages but newer versions, no other differences. In fact, a fully updated CL 1 and a fully updated CL 10 will have an identical set of package versions.
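The "lower bar" idea can be sketched as a toy model. Everything here is made up for illustration: the package names, the version numbers, and the `fully_update` helper are assumptions, not real LIC contents.

```python
# Toy model: a CL is a floor ("lower bar") on every package version.
# Package names and version numbers are invented for illustration.
CL1_FLOOR = {"se-console": 1, "ficon-firmware": 1, "power-control": 1}
CL10_FLOOR = {"se-console": 8, "ficon-firmware": 6, "power-control": 3}
LATEST_AVAILABLE = {"se-console": 12, "ficon-firmware": 11, "power-control": 5}


def fully_update(floor, latest):
    """Raise every package to the newest available version, never below the floor."""
    return {
        name: max(version, latest.get(name, version))
        for name, version in floor.items()
    }


from_cl1 = fully_update(CL1_FLOOR, LATEST_AVAILABLE)
from_cl10 = fully_update(CL10_FLOOR, LATEST_AVAILABLE)

# Both starting points converge on the same set of versions.
print(from_cl1 == from_cl10)  # True
```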
Fun note! Knowing that CL 1 and CL 10, when fully updated, contain the same package versions, why is it that you cannot restore a backup across them? It has to do with the way backups are implemented on the support elements. A backup contains the system modifications made since the CL the system is based on. This means that a system based on CL 1 would have larger backups than a system based on CL 10. Knowing this, it is hopefully easy to see why restoring a CL 1 backup onto a CL 10 system might lead to an inconsistent system - which is why I am guessing IBM chose this particular limitation.
So far I have used the term packages - that term is actually not used for IBM LIC updates. They are instead called Engineering Change streams (EC or EC Stream) and are associated with an identification number and a description. The version is also not called a version; it is referred to as the Microcode Change Level (MCL).
An EC number identifies a particular set of internal code that collectively has a common purpose. Within each EC, an MCL identifies a particular internal code change and distinguishes it from previous and subsequent changes (i.e. other MCLs) to the same EC.
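Here is a rough sketch of that reasoning, treating a backup as nothing more than the differences from the CL baseline. This is my own toy model and not how the support element actually stores backups; the package names, version numbers, and helper functions are all hypothetical.

```python
# Toy model only: a backup stores just what differs from the CL baseline.
# Package names and version numbers are invented for illustration.
CL1_BASELINE = {"pkg-a": 1, "pkg-b": 1, "pkg-c": 1}
CL10_BASELINE = {"pkg-a": 7, "pkg-b": 4, "pkg-c": 3}


def make_backup(baseline, current):
    """Keep only the entries that differ from the baseline."""
    return {name: level for name, level in current.items() if baseline.get(name) != level}


def restore(baseline, backup):
    """Lay the backed-up differences on top of a baseline."""
    return {**baseline, **backup}


def below_floor(baseline, state):
    """List packages that ended up older than the baseline allows."""
    return sorted(name for name, level in state.items() if level < baseline[name])


# A CL 1 system that has only been partially updated.
running_on_cl1 = {"pkg-a": 5, "pkg-b": 1, "pkg-c": 1}
backup = make_backup(CL1_BASELINE, running_on_cl1)

# Restoring onto the CL it was taken from reproduces the system exactly.
print(restore(CL1_BASELINE, backup) == running_on_cl1)  # True

# Restoring onto CL 10 yields a mix that never existed on either system.
mixed = restore(CL10_BASELINE, backup)
print(mixed)                              # {'pkg-a': 5, 'pkg-b': 4, 'pkg-c': 3}
print(below_floor(CL10_BASELINE, mixed))  # ['pkg-a']
```

In this model the restored CL 10 system has one package below its own lower bar, which is the kind of inconsistency a blanket "same CL only" rule would prevent.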
Example: The FICON firmware for z114 is called N48123 and has the description "Ficon Express8S LIC". On my z114 I am running MCL 11.
If you ever mess up an update and need to roll it back, you can always do that from the support element. An EC Stream has four different versions: Retrieved, Installable, Activated, and Accepted. When an update is made effective it is referred to as activated. An activated but not accepted update can be rolled back. When the update is known to be good it can then be accepted, which updates the known good state to include that particular update. In practice you would accept all currently activated updates when you start a new update. That means that you can always roll back to the last known good state.
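The accept-then-update routine can be sketched as a small state model. This is a simplification under my own assumptions: I treat Retrieved as "highest MCL downloaded" and Installable as "highest MCL ready to activate", and the class and method names are hypothetical rather than anything from the support element. The starting MCL of 10 is also made up.

```python
from dataclasses import dataclass


@dataclass
class ECStream:
    ec_number: str
    description: str
    accepted: int = 0     # last known good MCL
    activated: int = 0    # MCL currently in effect
    installable: int = 0  # highest MCL ready to activate (assumed meaning)
    retrieved: int = 0    # highest MCL downloaded (assumed meaning)

    def retrieve(self, mcl: int) -> None:
        """Download a newer MCL; here it becomes installable right away."""
        self.retrieved = max(self.retrieved, mcl)
        self.installable = self.retrieved

    def activate(self) -> None:
        """Make the installable MCL the one in effect."""
        self.activated = self.installable

    def accept(self) -> None:
        """Declare the activated MCL the new known good state."""
        self.accepted = self.activated

    def roll_back(self) -> None:
        """Return to the last accepted (known good) MCL."""
        self.activated = self.accepted


ficon = ECStream("N48123", "Ficon Express8S LIC",
                 accepted=10, activated=10, installable=10, retrieved=10)

ficon.accept()      # start of an update: accept what is currently active
ficon.retrieve(11)
ficon.activate()
print(ficon.activated, ficon.accepted)  # 11 10

ficon.roll_back()   # the update misbehaved, so return to known good
print(ficon.activated, ficon.accepted)  # 10 10
```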
With all that explained you should now be able to parse the following dialog. This is the dialog that displays exactly what management software and firmware is running on the mainframe.
If things have gone really bad you can reinstall the support elements and restore from a backup, at which point all firmware and management software will be returned to the known working version. I find that particular fact really nice. In a PC server if you want to downgrade firmware you will likely have to do an ad-hoc process per component (e.g. one for BIOS, one for NIC firmware, etc.).
There are quite a number of processes I have not covered in detail, but they are worth mentioning. So here goes.
Some mainframes have their support elements kept disconnected from the internet for security reasons. When doing an update you then order a SUL (sorry, do not know the acronym!) package from IBM which can be used to install all available updates.
If you do not wish to install a particular update, there are plenty of ways you can select some but not others - that is something that I wish more modern systems would consider. Indeed, this function is part of a critical feature called Concurrent Driver Upgrade (CDU). It works by upgrading to a fixed point but not further. The driver upgrade has been crafted by IBM to support doing hitless upgrades from these CDU points. I can highly recommend that the interested reader read more about this process in the z Series HMC Redbook.
Another important subject is the so-called High Impact/Pervasive Program Temporary Fix (HIPER) alerts. These are sent out by IBM to inform customers about critical bugs and which updates need to be installed to work around these issues. This is what an example HIPER looks like:
I asked an IBMer "How often does a typical mainframe customer install the latest updates?". I was not sure what I expected, but the answer was that the recommendation is to do an update every 3 months, and that it is done by an IBM technician. I take this to mean that few if any customers go through the update process themselves.
That is all I had for you folks this time! Since my z114 does not have an IBM support contract my experience running these processes described above is limited. Do you have stories to tell or experiences to share? Let me know down in the comments!
Thanks for reading!