POWER10 – Memory Sharing and how HANA customers will benefit
As an in-memory database, SAP HANA is obviously limited by access to memory. Having massive CPU throughput with a small amount of memory could be useful for an HPC application that needs to crunch through trillions of operations on a small amount of data. By comparison, a HANA system typically scales up with both CPU and memory at the same time.
Intel attempted to solve this problem through the use of large-scale persistent DIMMs. Unfortunately, they delivered a completely unbalanced solution: Cascade Lake processors with a small incremental performance increase, coupled with Optane DIMMs that are, at best, 3 to 5 times slower than DDR4 DIMMs. By the way, the new “Barlow Pass” Optane DIMMs, which will be available with next-gen Cooper Lake and Ice Lake systems, will reportedly deliver only a 15% bandwidth improvement over today’s “Apache Pass” DIMMs.[i] Allow me to clap with one hand at that yawner of an improvement. Their solution is somewhat analogous to a transportation problem where a two-lane road is out of capacity. You can increase the horsepower of each vehicle a bit and pack far more seats into each vehicle, but it will likely take longer to get all of the passengers in their various vehicles to their destinations, and they will most assuredly be much less comfortable.
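To put that 3x-to-5x latency gap in perspective, here is a quick back-of-the-envelope sketch in Python. The ~80 ns DDR4 latency and the 30% hot/cold split are purely illustrative assumptions of mine, not measured values:

```python
# Back-of-the-envelope: blended latency when part of the working set
# sits on persistent DIMMs. All inputs are illustrative assumptions.
DDR4_LATENCY_NS = 80      # assumed typical DDR4 load-to-use latency

for slowdown in (3, 5):   # the 3x-5x range cited above
    optane_ns = DDR4_LATENCY_NS * slowdown
    # Assume 30% of accesses land on columns placed on Optane DIMMs
    optane_fraction = 0.30
    blended = (1 - optane_fraction) * DDR4_LATENCY_NS + optane_fraction * optane_ns
    print(f"{slowdown}x Optane: {optane_ns:.0f} ns per access, "
          f"blended average {blended:.0f} ns "
          f"({blended / DDR4_LATENCY_NS:.1f}x slower overall)")
```

Even with only a modest slice of the data on Optane, the average access cost climbs well past what an in-memory database was sized for.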
IBM, by comparison, attacked this problem with POWER10 by addressing all aspects simultaneously. As mentioned in part 1, POWER10 sockets have the potential of delivering 3 times the workload of POWER9 sockets. So, not a small incremental improvement as with Cascade Lake, but a massive one. Then they increased memory bandwidth by at least 4x, meaning they can keep the CPUs fed with data, and in case memory still can’t keep up, they added support for DDR5 and its much faster speeds and throughput. Then they increased socket-to-socket communication bandwidth by a factor of four, since transactional and analytic workloads like HANA tend to be spread across sockets and often need to access data from another socket. And just in case the system runs out of DIMM sockets, they introduced a new capability, “memory clustering” or “memory inception”,[ii] which allows memory on another physical system to be accessed remotely (more on this later) with a 50 to 100 ns latency hit.[iii] And just to make sure that I/O does not become the next bottleneck, they doubled down on their previous leadership as the first major vendor to support PCIe Gen4 by including support for PCIe Gen5, with the potential for twice the I/O throughput.
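One bit of arithmetic worth doing here: if the compute claim (3x) and the bandwidth claim (at least 4x) both hold, the memory subsystem actually gains headroom relative to the cores. A trivial Python check, using only the projected figures above:

```python
# Does memory bandwidth keep pace with compute? A sanity check on the
# factors claimed above (projections, not measurements).
compute_factor = 3.0     # projected per-socket workload vs. POWER9
bandwidth_factor = 4.0   # "at least 4x" memory bandwidth vs. POWER9

headroom = bandwidth_factor / compute_factor
print(f"Bandwidth per unit of compute changes by {headroom:.2f}x")
# A result above 1.0 means each unit of compute gets *more* memory
# bandwidth than before, the opposite of the Cascade Lake + Optane
# imbalance described earlier.
```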
Using the previous analogy, IBM attacked the problem by tripling the horsepower of each vehicle (with lots of extra doors and comfortable seats), quadrupling the number of lanes on the road, and enabling each vehicle to support tandem additions. In other words, everybody gets to their destination much faster and in great comfort.
So, what is this memory clustering? Put simply, it is an IBM-developed technology which enables a VM on one system to map memory allocated to it by PowerVM on another system as if it were locally attached. In this way, a VM which requires more memory than is available on the system it is running on can be provided that memory by one or more other systems in the cluster. It does this over the same PowerAXON interconnect (IBM’s SMP interconnect technology) that is used between sockets within each system. As a result, the projected additional latency is likely to be only slightly higher than accessing memory across the NUMA fabric.
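To make that concrete, here is a rough estimate using the 50 to 100 ns penalty cited above; the ~100 ns figure for local DRAM access is my own illustrative assumption:

```python
# Rough estimate of memory-inception latency vs. local access.
# The 50-100 ns penalty comes from the cited figure; the ~100 ns
# local DRAM latency is an illustrative assumption of mine.
LOCAL_DRAM_NS = 100

for penalty in (50, 100):
    remote_ns = LOCAL_DRAM_NS + penalty
    print(f"+{penalty} ns penalty -> {remote_ns} ns remote access "
          f"({remote_ns / LOCAL_DRAM_NS:.1f}x local DRAM)")
# Contrast with Optane at 3x-5x DRAM latency: even the worst-case
# inception access lands at roughly 2x, and only for the remote slice
# of memory, not for every access.
```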
IBM described multiple potential topologies, ranging from “Enterprise Class” clusters with extreme bandwidth, to hub-and-spoke designs mixing and matching CPU-heavy and memory-heavy nodes, to multi-hop, pod-level clustering of potentially thousands of nodes. With POWER10’s 2-petabyte memory addressability, the possibilities are mind-boggling.
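For a sense of scale, here is the arithmetic on that 2 PB address space; the 4 TB-per-node figure is just an assumption for illustration:

```python
# What does 2 PB of addressability buy? Count how many fully loaded
# nodes one address space could span (the node size is an assumption).
PETABYTE = 2**50                 # bytes
addressable = 2 * PETABYTE       # POWER10's stated 2 PB
node_memory = 4 * 2**40          # assume 4 TB of DRAM per node

print(f"Address bits needed: {addressable.bit_length() - 1}")        # 51
print(f"4 TB nodes per address space: {addressable // node_memory}")  # 512
```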
For HANA workloads, I see a range of possibilities. The idea that a customer could extend memory across systems is the utopia of “never having to say I’m sorry”. By that, I mean that in the bad old days (current times, that is), if you purchased, for example, a 2TB system with all DIMM slots populated, and your HANA instance needed just a bit more memory than was available on the system, you had three choices: 1) let HANA deal with insufficient memory and start moving columns in and out of memory, with all of the performance impact that implies; 2) move the workload to a larger system at a substantial cost and with a loss of the existing investment (which always brings a smile and a hug from the CFO); or 3) if possible, shut down the instance and the system, rip out all existing DIMMs and replace them with larger ones (even more disruptive and still very expensive).
With memory clustering, you could instead harvest unused capacity elsewhere in your cluster at no incremental cost. Or, if all memory was in use, you could resize a less important workload, e.g. a HANA sandbox VM or a non-prod app server, and reallocate its memory to the production VM that needs it. Or you could move a less important workload to a different server, potentially in a different data center or on a much smaller system, and reclaim its memory via clustering. Or you could purchase a small, low-GHz system with a minimal number of activated cores and plenty of available memory, and add it to the cluster for the various VMs to draw on. The possibilities are endless, but you will notice that having to tell management “I blew it” was not one of the options.
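Here is a toy sketch of the kind of capacity harvesting I have in mind. The node names, sizes and planning logic are entirely invented by me; this is not a real PowerVM interface, just the arithmetic a capacity planner might do:

```python
# Toy illustration of harvesting free memory across a cluster for a
# production VM that has outgrown its host. Node names and sizes are
# invented; this is planning arithmetic, not an actual PowerVM API.
cluster_free_tb = {"host-a": 0.25, "host-b": 1.0, "sandbox-host": 0.5}
needed_tb = 1.2   # extra memory the production HANA VM requires

plan, remaining = {}, needed_tb
# Borrow from the nodes with the most free memory first
for node, free in sorted(cluster_free_tb.items(), key=lambda kv: -kv[1]):
    take = min(free, remaining)
    if take > 0:
        plan[node] = take
        remaining -= take

if remaining > 0:
    print(f"Short by {remaining:.2f} TB: resize or add a memory-rich node")
else:
    print("Borrow plan:", {n: f"{t:.2f} TB" for n, t in plan.items()})
```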
Does this take the place of “Storage Class Memory” (SCM), aka persistent memory? Not at all. In fact, POWER10 has explicit support for SCM DIMMs. The question is more whether SCM technology is ready for HANA. At 3 to 5 times worse latency than DRAM, Intel’s SCM, Optane, most certainly is not. In fact, I call it highly irresponsible to promote a technology with barely a mention of its likely performance drawbacks, as Intel and their merry band of misinformation brethren, e.g. HPE, Cisco and Dell, have done.
I prefer IBM’s more measured approach of supporting technology options, encouraging openness and ecosystem innovation, and focusing on delivering real value with solutions that make sense now as opposed to others’ approaches that can lead customers down a path where they will inevitably have to apologize later when things don’t work as promised. I am also looking forward to 2021 to see what sort of POWER10 systems and related infrastructure options IBM will announce.
[i] https://www.tomshardware.com/news/intel-barlow-pass-dimm-3200mts-support-15w-tdp
[ii] https://www.crn.com/news/components-peripherals/ibm-power10-cpu-s-memory-inception-is-industry-s-holy-grail- and https://www.servethehome.com/ibm-power10-searching-for-the-holy-grail-of-compute/hot-chips-32-ibm-power10-memory-clustering-enterprise-scale-memory-sharing/
[iii] https://www.hpcwire.com/2020/08/17/ibm-debuts-power10-touts-new-memory-scheme-security-and-inferencing/