SAPonPower

An ongoing discussion about SAP infrastructure

3D XPoint Memory – The best thing for SAP HANA since HANA was invented?

At #SapphireNow, the Intel booth was all atwitter about the new “game changer”, “revolutionary”, “future of computing”, “best thing since the wheel” (ok, I made that last one up).  Yes, they were thrilled with 3D XPoint Optane memory.[i]  It is being positioned as persistent memory, like SSD but much faster, which can take the place of real, a.k.a. DRAM, memory … eventually.  Paraphrasing them: “You will be able to replace conventional memory with 3D XPoint memory at almost the same speed, but with the ability to restart your system after failure in a matter of seconds, not minutes or hours, because the entire HANA image will be stored in persistent memory, not on disk or SSDs.”

This sounds fantastic as long as we completely ignore reality.  Let’s dissect the above sentence.

“almost the same speed” – current speculation is that 3D XPoint memory will be about 10 times slower than conventional memory.  That is WAY better than external SSD storage, which is around 1,000 times slower, but for memory-resident applications like HANA, memory that is 10 times slower could mean as much as a 10x performance reduction.  Remember, we have no idea how this might affect an application which expects very fast access to memory.
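To make the gap concrete, here is a minimal back-of-envelope sketch.  The latency figures are round-number assumptions for illustration, not vendor specifications, and real hardware hides some latency with caches and prefetching; the point is simply the order of magnitude.

```python
# Back-of-envelope latency comparison; all figures are illustrative assumptions.
DRAM_NS = 100          # rough DRAM access latency, in nanoseconds
XPOINT_NS = 1_000      # ~10x slower, per the speculation cited above
SSD_NS = 100_000       # external SSD storage, ~1000x slower than DRAM

ACCESSES = 1_000_000   # memory accesses in a hypothetical column scan

for name, ns in [("DRAM", DRAM_NS), ("3D XPoint", XPOINT_NS), ("SSD", SSD_NS)]:
    print(f"{name:10s}: {ACCESSES * ns / 1e6:9.0f} ms per scan "
          f"({ns / DRAM_NS:.0f}x DRAM latency)")
```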

“restart your system after failure” – silly me, I thought the idea was to prevent failure in the first place.  I am curious how often system failure is caused by memory errors or some other cause for which diagnostics are required to evaluate the underlying problem, followed by a repair action to fix it.  The question then becomes: in which of those scenarios would a customer be willing to skip diagnostics and return the system to productive use immediately?  This also assumes that customers are willing to run mission-critical systems without any sort of HA solution, such as HANA System Replication or HANA Host Auto-Failover.  An HA solution would fail over production to a secondary system, which means that any memory image on the primary system would be out of date almost instantly.

“restart … in seconds” – So, your system has failed for unknown reasons and you are willing to forgo any sort of evaluation of the underlying cause.  So far so good.  So, Linux is capable of restarting while keeping the memory image as it was beforehand, utilizing persistent main memory?  Not entirely, but with RHEL 7.3 (not yet supported for HANA), using special device drivers, applications may be rewritten to utilize “pmem” pseudo-storage devices.[ii]  And HANA is capable of restarting from whatever point it was at when the failure occurred?  I did not know HANA could do this, and I am surprised that SAP prioritized fast restart ahead of the long laundry list of customer-provided requirements … which I doubt they did.  And HANA can figure out which transactions were in flight at the time of failure and which ones had made some, but not all, of their changes to memory, e.g. had started to insert data into a delta table but had not completed that action at the time of failure?  Totally wicked!! … and total fantasy, at least for now.
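For the curious, this is roughly what the RHEL 7.3 approach looks like from an application’s point of view.  Below is a minimal Python sketch, assuming a /dev/pmem0 device has already been formatted and mounted with the DAX option at a hypothetical /mnt/pmem mount point; the file name is made up for illustration.

```python
import mmap
import os

# Hypothetical file on a DAX-mounted filesystem backed by /dev/pmem0,
# e.g. after: mount -o dax /dev/pmem0 /mnt/pmem
PATH = "/mnt/pmem/scratch.bin"
SIZE = 4096

fd = os.open(PATH, os.O_CREAT | os.O_RDWR, 0o600)
os.ftruncate(fd, SIZE)

# With DAX, the mapping gives direct load/store access to the persistent
# media, bypassing the page cache; no read()/write() system calls needed.
buf = mmap.mmap(fd, SIZE)
buf[0:11] = b"hello, pmem"
buf.flush()            # msync; real pmem code must also flush CPU caches
buf.close()
os.close(fd)
```

The last comment is the point: the application, not the operating system, becomes responsible for deciding when data is truly persistent, which is exactly the kind of rework HANA would need before any of these restart claims could hold.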

You can easily imagine a variety of other conditions in which columns are being updated, e.g. during a delta merge, but the update has not finished, leaving some columns with updated elements and others without.  I am not saying these are insurmountable problems, but considering that you currently can’t even change the size of a HANA system without restarting it, it is a massive stretch to imagine that SAP has invested, or is willing to invest, the time and effort to make this work for a highly questionable benefit with likely severe performance degradation.

So, 3D XPoint memory as a replacement for conventional memory is clearly all hype, but don’t expect anyone from Intel or its proponents to tell you this.  How about as a technology for much faster SSDs?  Now we are talking!  I see no reason why this will not be quickly adopted by disk subsystem vendors and become available from multiple sources.

As to whether HANA workloads will benefit, that is a different story.  Remember, HANA is a read-once workload.  Once a column is loaded into memory, it is never read from storage again until it is unloaded, which should only occur if the memory subsystem is undersized or the system is restarted after maintenance.  So, fast storage is useful for restarts, but super-fast storage is only needed when a system must return to full operation after maintenance very quickly and without any performance degradation, i.e. with every column loaded back into memory in 10 minutes or so.  Just as a point of comparison, IBM ran a test with 10 NVMe cards and delivered about 1TB per minute when restarting HANA.  To the best of my knowledge, few customers have expressed more than a passing interest in this capability.  I could imagine a scenario in which customers are willing to put a somewhat recent tier of data, e.g. 1 to 2 year old data, on persistent main memory, with perhaps external, and orders of magnitude slower, storage used for older data.  Once again, this is a nice concept, but until SAP writes or adopts code to enable it, it is just a theory.
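To put that reload figure in perspective, here is a trivial sketch using the roughly 1TB per minute throughput from the NVMe test above; the system sizes are hypothetical examples, not measurements.

```python
# Reload-time estimate based on the ~1 TB/min NVMe throughput cited above.
THROUGHPUT_TB_PER_MIN = 1.0

for system_tb in (2, 6, 12, 24):
    minutes = system_tb / THROUGHPUT_TB_PER_MIN
    print(f"{system_tb:3d} TB HANA image -> ~{minutes:.0f} min to reload every column")
```

Even a 12TB system lands near the 10-minute mark with conventional NVMe storage, which is why persistent main memory buys so little here.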

As to writes, most enterprise storage subsystems can already deliver response times twice as fast as SAP requires.  IBM SVC (SAN Volume Controller) connected to an IBM Power System has been tested in real customer installations and delivered the fastest times of any storage subsystem in the industry: a peak latency of only 161us (microseconds) for 4K block size log writes as measured by HWCCT, SAP’s Hardware Configuration Check Tool, i.e. over 6 times better latency than SAP requires.  SVC is part of a family of products, including V7000, V9000 and Spectrum Virtualize software, which all utilize similar concepts and software.
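As a quick sanity check on those numbers, here is a small sketch.  The KPI value is an assumption inferred from the “over 6 times better” comparison above (161us x 6.2 ≈ 1,000us), not an official SAP figure.

```python
# Log-write latency comparison; the KPI value is inferred, not official.
SAP_KPI_US = 1_000     # assumed HWCCT KPI for 4K log writes, in microseconds
TYPICAL_US = 500       # "twice as fast as SAP requires", per the text
SVC_US = 161           # measured peak latency cited above

for name, us in [("SAP KPI", SAP_KPI_US), ("typical subsystem", TYPICAL_US),
                 ("IBM SVC + Power", SVC_US)]:
    print(f"{name:18s}: {us:5d} us ({SAP_KPI_US / us:.1f}x headroom vs KPI)")
```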

In other words, you don’t have to wait for tomorrow to get fast restarts and minimal transaction log write latency; you just need to select the right infrastructure partner, IBM.

[i] https://www.theregister.co.uk/2017/05/17/coming_xeon_sps_will_run_sap_hana_16_times_faster/
[ii] https://developers.redhat.com/blog/2016/12/05/configuring-and-using-persistent-memory-rhel-7-3/

June 12, 2017