SAPonPower

An ongoing discussion about SAP infrastructure

There they go again, HPE misleading customers about Superdome X futures

When the acquisition of SGI by HPE was announced last year, I was openly skeptical about HPE’s motives and wrote a blog post to discuss the potential reasons and implications behind this decision. I felt pretty certain that HPE would not retain both Superdome X (SD-X) and the high end SGI system, now called MC990 X, as they play in the same space. I speculated that the MC990 X would not be the winner because it has inferior memory reliability technology and uses a hand wired matrix to interconnect its nodes, whereas SD-X uses a backplane to connect nodes to switches for connectivity to other nodes.

At SapphireNow this past week, I learned, from a couple of different sources, that Intel’s next generation platform, “Purley”, using “Skylake” processors, includes so many significant changes that SD-X will very likely not be able to support it. Those changes include a new chip level interconnect technology, “UltraPath” (UPI), as a follow on to QPI, faster memory and a larger footprint which, according to pictures available on the internet, appears to be at least 25% larger. This makes sense since the high end Skylake processor will have up to 28 cores versus Broadwell’s 24 cores and includes 25% more L3 cache but uses the same 14nm lithography. Though specs are limited, my sources also believe this new platform will require more power and cooling than the “Broadwell” chips used in SD-X today. SD-X cell boards are already pretty compact nodes, which made sense when HPE was trying to deliver a “converged” solution, but which means that an increase in the footprint of any component could push these boards beyond their limits. Clearly, the UPI change alone would require HPE’s XNC2 cell controller to be redesigned even if no power or cooling limits were exceeded. Considering how few of these chips are used per year, it would be impractical for HPE to maintain development programs for both XNC2 (and the associated SX3000 switch) and the SGI acquired NUMALink7 cell controllers. The speculation is therefore that SD-X will, in fact, be the loser and HPE will utilize the MC990 X as the go forward, high end platform.

If this turns out to be a correct conclusion, then a lot of very large customers are likely to be in for a major shock. They purchased SD-X with the expectation of being able to upgrade it and add on like architecture systems as their SAP HANA workloads, and other large memory applications, grow over time. Had they been informed about the upcoming obsolescence of SD-X, many of these customers may have made a different purchasing decision. I am not convinced they would have purchased the MC990 X instead, for a variety of reasons, not the least of which is that it lacks the robust memory protection available with SD-X, known as RAS, Lockstep or DDDC+ mode. Without it, the failure of a chip on a memory DIMM could result in a HANA failure, a system failure or the need to immediately shut down the system to diagnose and fix the failed DIMM. Most SAP customers running Suite on HANA or S/4HANA are unlikely to find this level of protection acceptable, especially when they are running massive HANA transactional systems.

On the subject of reliability, physical connections are among the biggest exposures in any system, and the MC990 X has a lot of them to form the ccNUMA mesh. Each cable has a connector on each end, and each connector is inserted into the appropriate port on one of a pair of node controllers. Any time a cable is inserted into any sort of receptacle, there is a possibility that it will fail due to the pressure required to insert it or due to corrosion. A system with 2 nodes has 8 such cables, 2 between the pair of controllers on each node and 4 between the nodes. A system with 5 nodes has approximately 50 such cables. While not necessarily catastrophic, the failure of one of these cables would require a planned outage as soon as possible, as there would be a performance impact from the loss of the connection.
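As a rough check on those cable counts, here is a minimal sketch. It assumes, and this is my assumption rather than anything from HPE documentation, that every pair of nodes is cabled directly to every other with the same 4 cables seen in the 2 node case, plus the 2 cables inside each node. The function name `cable_count` is purely illustrative.

```python
from math import comb

def cable_count(nodes, intra_per_node=2, inter_per_pair=4):
    """Estimate cable count for a hypothetical all-to-all node mesh.

    intra_per_node: cables between the two controllers inside each node.
    inter_per_pair: cables between each pair of nodes (assumed constant).
    """
    intra = nodes * intra_per_node
    inter = comb(nodes, 2) * inter_per_pair  # one bundle per node pair
    return intra + inter

for n in (2, 5):
    print(n, "nodes:", cable_count(n), "cables")
```

Under those assumptions the estimate reproduces both figures quoted above, 8 cables for 2 nodes and 50 for 5 nodes, and it shows why the count grows much faster than the node count: the node to node term scales with the number of node pairs, not the number of nodes.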

Another limitation of the MC990 X is the lack of virtualization technology. As each node in a MC990 X contains 4 sockets, the granularity of this system is so coarse that very poor utilization is likely to result, and a massive increase in footprint will be required to handle the same set of workloads that might previously have been handled by SD-X. Physical partitioning is available, but only in units of one or more 4 socket nodes, which is so wasteful as to make it irrelevant.
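To illustrate the granularity point, here is a small sketch of what happens when every partition must be rounded up to whole 4 socket nodes. The workload sizes are hypothetical, chosen only to show the rounding effect, and the helper names are mine.

```python
import math

NODE_SOCKETS = 4  # per the text: each MC990 X node contains 4 sockets

def allocated_sockets(needed):
    """Round a workload's socket requirement up to whole nodes."""
    return math.ceil(needed / NODE_SOCKETS) * NODE_SOCKETS

def utilization(workloads):
    """Fraction of allocated sockets that the workloads actually need."""
    allocated = sum(allocated_sockets(w) for w in workloads)
    return sum(workloads) / allocated

# Hypothetical socket requirements for four separate workloads.
workloads = [1, 2, 5, 6]
print("allocated:", [allocated_sockets(w) for w in workloads])
print("utilization: {:.0%}".format(utilization(workloads)))
```

A workload that needs 5 sockets ends up holding 8, so any mix of workloads that does not land on multiples of 4 leaves sockets idle.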

If this speculation proves to be true, and HPE has not been sharing this roadmap with customers, then those customers have been convinced to invest in an obsolete platform and they should be furious. Those in the process of making a decision should be asking HPE the right questions about SD-X product futures and then taking the answers into account, along with the product weaknesses of MC990 X.

I should note that IBM has but one go forward architecture for Power Systems based on the on-chip interconnect technology that has been included and evolving since POWER4. Customers that invested in Power technology have seen a consistent 3 to 4 year cycle to the next generation with those who purchased high end models often having the option to upgrade from one generation to the next.

If you read my blog post last week, you may recall that I pointed out how HPE was lying to customers about HANA on IBM Power Systems. Now we learn that they may also be doing the same sort of thing about their own products. Methinks a trend is emerging here!?!?


Posted May 23, 2017 | Uncategorized