SAPonPower

An ongoing discussion about SAP infrastructure

Optane DC Persistent Memory – Proven, industrial strength or full of hype – Detail, part 2

If the performance considerations from part 1 were the only issues, a reasonable case could be made for the potential value of doing a PoC with this technology.  But, of course, those are not the only issues.  One of the reasons that NVDIMMs have longer latencies than DRAM is due to their persistence and therefore the need to encrypt data placed on these components.  Encryption and decryption take a lot of computational power and can have a substantial impact on latency and bandwidth.  The funny thing is that encryption of these NVDIMMs can be turned off if desired, presumably with a resulting improvement to performance.  But what kind of customer would be willing to turn off this vital security technology?

Another desirable trait of modern, in-memory platforms is advanced memory protection which allows a system to continue to operate in the event of a DIMM failure.  This often starts with basic ECC, but then progresses to SDDC, DDDC (Chipkill or Lockstep), ADDDC (Skylake and beyond only) and IBM’s unique Chipkill + chip sparing technology.  ADDDC is not available for NVDIMMs, but DDDC is.  The downside of DDDC is that it comes with a significant performance penalty. No performance numbers have been provided for NVDIMMs configured with DDDC, but previous generations saw 20% to 40% degradation when using this mode.[i][ii]

What kind of customer would be willing to disable key security features or run critical systems without the best available reliability technologies?  I would certainly advise customers to use encryption and advanced reliability technologies in most circumstances.  Only those customers that can scramble business critical, PII and/or HIPAA data should ever consider disabling persistent memory encryption.  I searched, using every option that I could imagine, and failed to find a single web site that recommended ever disabling NVDIMM encryption.

SAP benchmark results posted on the external web site do not show the details of how security and reliability configuration parameters have been set.  It is therefore impossible to say whether HPE enabled or disabled these protection features.  In my many years of experience and extensive discussion with benchmarking experts, I can share that every single one, at every vendor, used every tool or technology that did not violate official rules to enhance results.  It would not be too much of a leap to project that HPE, and other vendors posting results with NVDIMMs, have likely disabled anything that might cause their results to diminish in any way.  (HPE, if you would like to share your configuration details, I would be happy to post them and if I have mischaracterized how you ran these benchmarks, will also post a retraction.) As a result, these BWH results may not only be relevant to just a small subset of the potential workloads but may also represent an unacceptable exposure to any company that has high single system availability requirements or has one of those unreasonable security departments which thinks that data protection is actually worthwhile.

And then, there are OLTP customers.  Based on the lack of benchmark testing of Suite on HANA, S/4HANA or C/4HANA combined with the above data from Lenovo about the massive reduction of bandwidth and associated huge increase in latency for OLTP, it would be MOST unwise to place any of these types of environments on systems with NVDIMMs without extensive testing of real customer workloads to ensure that internal performance SLAs can be met.

Certain types of workloads may perform decently with NVDIMMs.  BW environments where the primary use is for predictable and repeatable queries and reports may see only moderate performance degradation compared to DRAM based systems, but still orders of magnitude better performance than AnyDB systems which merely cache recently used data in memory and keep most data on external storage.  BW Extension nodes, S/4 Data aging objects and other types of archival systems that take older, less frequently used data and place them on other tiers of storage or systems, could certainly benefit from NVDIMMs.  Non-prod workloads which are not in the critical path to production, e.g. dev, test, sandbox, might make sense to place on systems with NVDIMMs.  All of these depend on an acceptance of potential performance issues and hardware/firmware/software fixes that inevitably come once customers start playing with version 1.0 of any new technology.

Based on likely performance issues, inferior RAS technology and the above mentioned “fix” dilemma, I would strongly advise that critical systems like production, QA, pre-prod, HA and DR should stay on DRAM based systems until bleeding edge customers prove the value of NVDIMMs and are willing to publicly share their journey.

The question then becomes whether the benefit to a subset of the environments is so substantial that it makes sense to select a vendor for HANA systems based on their ability to utilize NVDIMMs even when this technology might not be used for the most critical of the workloads and their associated critical path and HA/DR systems. This gets into the subjects of cost reduction and restart speeds which will be covered in part 3 of this series.

[i]https://lenovopress.com/lp0048.pdf

[ii]https://sp.ts.fujitsu.com/dmsp/Publications/public/wp-broadwell-ex-memory-performance-ww-en.pdf

May 27, 2019 | Uncategorized

Optane DC Persistent Memory – Proven, industrial strength or full of hype – Detail, part 1

Several non-Intel sites suggest that Intel’s storage class memory (Lenovo abbreviates these as DCPMM, while many others refer to them with the more generic term NVDIMM) delivers a read latency of roughly 5 times slower than DRAM, e.g. 350 nanoseconds for NVDIMM vs. 70 nanoseconds for DRAM.[i]  A much better analysis comes from Lenovo which examined a variety of load conditions and published their results in a white paper.[ii]  Here are some of the results:

  • A fully populated 6x DCPMM socket could deliver up to 40GB/s read throughput, 15GB/s write
  • Each additional pair of DCPMMs delivered proportional increases in throughput
  • Random reads had a load to use latency that was roughly 50% higher than sequential reads
  • Random reads had a max per socket (6x DCPMM) throughput that was between 10 and 13GB/s compared to 40 to 45GB/s for sequential reads

The most interesting quote from this section was: “Overall, workloads that are more read intensive and sequential in nature will see the best performance.”  This echoes the quote from SAP’s NVRAM white paper: “From the perspective (of) read accesses, sequential scans fare better in NVRAM than point reads: the cache line pre-fetch is expected to mitigate the higher latency.”[iii]

The next section is even more interesting.  Some of its results, showing how DRAM performed relative to DCPMM, were:

  • Almost 3x better max sequential read bandwidth
  • Over 5x better max random read bandwidth
  • Over 5x better max sequential 2:1 R/W bandwidth
  • Over 8x better max random 2:1 R/W bandwidth
  • Latencies for DCPMM in the random 2:1 R/W test hit a severe knee of the curve and showed max latencies over 8x that of DRAM at very light bandwidth loads
  • DRAM, by comparison, continued to deliver significantly increasing bandwidth with only a small amount of latency degradation until it hit a knee of the curve at over 10x of the max DCPMM bandwidth

Unfortunately, this is not a direct indication of how an application like HANA might perform.  For that, we have to look at available benchmarks. To date, none of the SD benchmarks have utilized NVDIMMs.  Lenovo published a couple of BWH results, one with and one without NVDIMMs, but used different numbers of records, so they are not directly comparable.  HPE, on the other hand, published a couple of BWH results using the exact same systems and numbers of records.[iv]  Remarkably, only a small 6% performance degradation occurred in the parallel query execution phase of the benchmark when going from an all DRAM 3TB configuration to a mixed 768GB DRAM/3TB NVDIMM configuration.  The exact configuration is not shown on the public web site, but we can assume something about the config based on SAP Note: 2700084 – FAQ: SAP HANA Persistent Memory: “To achieve highest memory performance, all DIMM slots have to be used in pairs of DRAM DIMMs and persistent memory DIMMs, i.e. the system must be equipped with one DRAM DIMM and one NVDIMM in each memory channel.”  Vendors submitting benchmark results do not have to follow these guidelines, but if HPE did, then they used 24@32GB DRAM DIMMs and 24@128GB NVDIMMs.  Also, following other guidelines in the same SAP Note and the SAP HANA Administration Guide, HPE most likely placed the column store on NVDIMMs with row store, caches, intermediate and final results calculations on DRAM DIMMs.
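As a quick sanity check on that assumed DIMM layout, here is a minimal sketch in Python.  The DIMM counts and sizes are my inference from the SAP Note pairing rule, not a configuration published by HPE, and the helper name is purely illustrative.

```python
GB_PER_TB = 1024

def total_capacity_gb(dimm_count, dimm_size_gb):
    """Total capacity of a set of identical DIMMs, in GB."""
    return dimm_count * dimm_size_gb

# Assumed layout: one DRAM DIMM and one NVDIMM in each of 24 memory channels.
dram_gb = total_capacity_gb(24, 32)      # 32GB DRAM DIMMs
nvdimm_gb = total_capacity_gb(24, 128)   # 128GB NVDIMMs

print(f"DRAM:   {dram_gb} GB")
print(f"NVDIMM: {nvdimm_gb / GB_PER_TB} TB")
```

Those totals line up with the 768GB DRAM and 3TB NVDIMM figures in the benchmark, which is why 128GB is the only NVDIMM size that fits this assumption.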

BWH is a benchmark composed of 1.3 billion records which can easily be loaded into a 1TB system with room to spare.  To achieve larger configurations, vendors can load the same 1.3B records a second, third or more times, which HPE did a total of 5 times to get to 6.5B records.  The column compression dictionary tables only grow with unique data, i.e. they do not grow when you repeat the same data set, regardless of the number of times it is added.

BWH includes 3 phases, a load phase which represents data ingestion from ERP, a parallel query phase and a sequential, single user complex query phase.  Some have focused on the ingestion and complex query phases, because they show the most degradation in performance vs. DRAM.  While that is tempting, I believe the parallel query phase is of the most relevance.  During this phase, 385 queries of low, medium and high complexity (no clue as to how SAP defines those complexities, what their SQL looks like or how many of each type are included) are run, in parallel and randomly.  After an hour, the total count of queries completed is reported. In theory, the larger the database, the fewer the queries that could be run per hour as each query would have more data to traverse.  However, that is not what we see in these results.

Lenovo, once again, provides the best insights here.  With Skylake processors, they reported two results.  On the first, they loaded 1.3B records, on the second 5.2B records or 4 times the number of rows with only twice the memory.  One might predict that queries per hour would be 4 times or more worse considering the non-proportionate increase in memory.  The results, however, show only a little over 2x decrease in Query/hr. Dell reported a similar set of results, this time with Cascade Lake, also with only real memory and also only around 2x decrease in Query/hr for 4X larger number of records.

What does that tell us? It is impossible to say for sure. From the SAP NVRAM white paper referenced earlier, “One can observe that some of the queries are more sensitive to the latency of the persistent memory than others. This can be explained by multiple factors:

  1. Does the query exhibit a memory access pattern that can easily prefetch by the hardware prefetchers?
  2. Is the working set of queries small enough to fit in CPU cache and hence agnostic to persistent memory latency?
  3. Is processing of the query compute or latency bound?”

SAP stores results in the “Static Cache”. “The static result cache is particularly helpful in the following scenario:  Complex query based on a view; Rather small result set; Limited amount of changes in the underlying tables.  The static result cache can provide the following advantages: Reduction of CPU consumption; Reduction of SAP HANA thread utilization; Performance improvements”[v]

Other areas like delta storage, caches, intermediate result sets or row store remain solely in dynamic RAM (DRAM), not NVDIMMs.[vi]

The data in BWH is completely static.  Some queries are complex and presumably based on views.  Since the same queries execute over and over again, prefetchers may become especially effective.  Some or many of the 385 queries in BWH may be hitting the results cache in DRAM.  In other words, after the first set of queries run, a decent percentage of accesses may be hitting only the DRAM portion of memory, masking much of the latency and bandwidth issues of NVRAM.  This benchmark may therefore be testing CPU power against a set of results cached in working memory more than actual query speed against column store.

So, let us now consider the HPE benchmark with NVDIMMs.  On the surface, 6% degradation with NVDIMMs vs. all DRAM seems improbable considering NVDIMM higher latency/lower bandwidth.  But after considering the above caching, repetitive data and repeating query set, it should not be much of a shock that this sort of benchmark could be masking the real performance effects.  Then we should consider the quote from Lenovo’s white paper above which said that NVDIMMs are a great technology for read intensive, sequential workloads.

Taken together, while not definitive, we can deduce that a real workload using more varied and random reads, against a non-repeating set of records might see a substantially different query throughput than demonstrated by this benchmark.

Believe it or not, there is even more detail on this subject, which will be the focus of a part 2 post.

 

[i]https://www.pcper.com/news/Storage/Intels-Optane-DC-Persistent-Memory-DIMMs-Push-Latency-Closer-DRAM

[ii]https://lenovopress.com/lp1083.pdf

[iii]http://www.vldb.org/pvldb/vol10/p1754-andrei.pdf

[iv]https://www.sap.com/dmc/exp/2018-benchmark-directory/#/bwh

[v]https://launchpad.support.sap.com/#/notes/2336344

[vi]https://launchpad.support.sap.com/#/notes/2700084

May 20, 2019 | Uncategorized

SAP HANA support for HPE nPar on Superdome Flex update

In addition to the outstanding support for virtualization technologies like PowerVM for HANA and the lukewarm support for VMware by SAP, SAP also supports other technologies that allow larger systems to be subdivided into smaller nodes.  Note that I did not say virtualization, but subdivision.  Physical partitioning (PPAR) is a technology invented in the 1990s and only allows components, e.g. boards or NUMA nodes, to be allocated to a separate workload from others on the same physical system.

On October 22, 2018, SAP updated its SAP Note for HPE nPar technology.[i]  With this update, SAP now supports nPars with Superdome Flex.  Granularity is incredibly fine (not).  As noted in the SAP note, “Via nPartitions, the following  partition sizes are supported in terms of the number of sockets:

    • Skylake based architecture: ScaleUp 16s, 12s, 8s, 4s; ScaleOut 4s, 8s, 16s”

Or to put it in terms of cores, each socket has 28 cores, so granularity is 112 cores.  You need only 20 cores?  No problem, you get to consume 112.  You need 113 cores? Also no problem, you get to consume 224 cores.  But, on the positive side, these nPars are “electrically isolated” which has 2 really important implications.  First, the only way to isolate one or more Superdome Flex drawers into a separate nPar is to physically change the mesh wiring of the entire system.  That means that if you decide to change the configuration of nPars, dynamic changes would be the exact opposite of what is supported.  In fact, according to customer reports, HPE requires a Statement of Work service contract to come out and rewire the system and it takes multiple days … one customer reported multiple weeks.  The second implication is that all resources on the node(s) in an nPar are dedicated to that nPar.  In the above example, if you need 20 cores, you probably require around a ½ TB of memory for BW or 1TB of memory for S/4.  It is possible to configure an nPar with as little as 1.5TB of memory which means that you might waste an entire TB if you only need ½ TB.  Alternately, if you have other workloads on other nPars that require more cores and memory and you want to keep all drawers consistent to allow for future changes, you might actually have up to 6TB per drawer meaning much more wasted memory if you only require ½ TB for a particular workload.  By the way, the only other elements that are shared when a system is broken up into physically isolated nPars are the frame(s), power supplies and the RMC – Rack Management Controller.  PCIe cards cannot be shared due to the physical isolation, so by using nPars, you essentially take a very expensive system and carve it into a bunch of smaller and very expensive, isolated systems which are difficult to reconfigure.

Alternately, if you really must use HPE technology for smaller workloads, you could purchase smaller systems at much lower prices.
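To make the granularity arithmetic concrete, here is a minimal Python sketch.  It assumes the 28 cores per socket and 4-socket steps described above, with a 16-socket ceiling taken from the quoted SAP Note; the function and constant names are my own and do not come from any HPE or SAP tool.

```python
import math

CORES_PER_SOCKET = 28      # Skylake socket size used in this post
SOCKETS_PER_STEP = 4       # smallest supported nPar, and the step between sizes
MAX_SOCKETS = 16           # largest partition size in the quoted SAP Note

def npar_cores(cores_needed):
    """Round a core requirement up to the next supported nPar size."""
    step = CORES_PER_SOCKET * SOCKETS_PER_STEP           # 112 cores
    steps = max(1, math.ceil(cores_needed / step))
    if steps * SOCKETS_PER_STEP > MAX_SOCKETS:
        raise ValueError("requirement exceeds the largest supported nPar")
    return steps * step

for need in (20, 113):
    got = npar_cores(need)
    print(f"need {need} cores -> consume {got} cores ({got - need} idle)")

# Memory works the same way: a 1.5TB minimum nPar for a 0.5TB workload.
wasted_tb = 1.5 - 0.5
print(f"wasted memory: {wasted_tb} TB")
```

The 113-core case is the painful one: a single extra core doubles the partition and leaves 111 cores idle.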

I have really been trying to scratch my head and understand why anyone would want this type of 1990s era partitioning technology.  HPE certainly does because it results in higher profits from selling larger systems with more aggregate capacity while giving the false appearance of flexibility.  For customers, on the other hand, it offers massive waste and very limited flexibility.

My advice: Don’t be a sucker and get taken in by HPE’s misdirection play.  Either purchase appropriately sized systems for each workload or purchase systems that offer real virtualization, such as IBM Power Systems, with fine grained allocation of resources, sharing of components such as PCIe adapters and true server consolidation, but don’t purchase one of these massive HPE systems and then eliminate any perceived value of using such a large system by cutting it up into smaller systems.

 

 

[i]2103848 – SAP HANA on HPE nPartitions in production

 

March 25, 2019 | Uncategorized

Power Systems – Delivering best of breed scalability for SAP HANA

SAP quietly revised a SAP Note last week but it certainly made a loud sound for some.  Version 47 of https://launchpad.support.sap.com/#/notes/2188482 now says that OLTP workloads, such as Suite on HANA or S/4HANA are now supported on IBM Power Systems up to 24TB.  OLAP workloads, like BW HANA may be implemented on IBM Power Systems with up to 16TB for a single scale-up instance.  As noted in https://launchpad.support.sap.com/#/notes/2055470, scale-out BW is supported with up to 16 nodes bringing the maximum supported BW environment to a whopping 256TB.

As impressive as those stats are, it should also be noted that SAP also provided new core-to-memory (CTM) guidance with the 24TB OLTP system sized at 176-cores which results in 140GB/core, up from the previous 113.7GB/core at 16TB.  The 16TB OLAP system, sized at 192-cores, translates to 85.3GB/core, up from the previous 50GB/core for 4-socket and above systems.

By comparison, the maximum supported sizes for Intel Skylake systems are 6TB for OLAP and 12TB for OLTP which correlates to 27.4GB/core OLAP and 54.9GB/core OLTP.  In other words, SAP has published numbers which suggest Power Systems can handle workloads that are 2.7x (OLAP) and 2x (OLTP) the size of the maximum supported Skylake systems.  On the CTM side, this works out to a maximum of 3.1x (OLAP) and 2.6x (OLTP) better performance per core for Power Systems over Skylake.
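For anyone who wants to reproduce the core-to-memory figures, here is a minimal Python sketch.  The Power core counts come from the numbers above; the Skylake core count of 224 (8 sockets at 28 cores each) is my assumption, chosen because it reproduces the 27.4 and 54.9GB/core figures.  The function name is illustrative only.

```python
GB_PER_TB = 1024

def ctm_gb_per_core(memory_tb, cores):
    """Core-to-memory ratio: GB of supported memory per core."""
    return memory_tb * GB_PER_TB / cores

# Power Systems, from the revised SAP Note figures quoted above.
power_oltp = ctm_gb_per_core(24, 176)
power_olap = ctm_gb_per_core(16, 192)

# Skylake: 8 sockets x 28 cores is an assumption, not stated in the post.
SKYLAKE_CORES = 8 * 28
skylake_oltp = ctm_gb_per_core(12, SKYLAKE_CORES)
skylake_olap = ctm_gb_per_core(6, SKYLAKE_CORES)

print(f"Power   OLTP {power_oltp:.1f}  OLAP {power_olap:.1f} GB/core")
print(f"Skylake OLTP {skylake_oltp:.1f}  OLAP {skylake_olap:.1f} GB/core")
print(f"Size ratio  OLAP {16 / 6:.1f}x  OLTP {24 / 12:.1f}x")
print(f"CTM ratio   OLAP {power_olap / skylake_olap:.2f}x  "
      f"OLTP {power_oltp / skylake_oltp:.2f}x")
```

With unrounded inputs the OLTP CTM ratio comes out at about 2.55x; dividing the rounded 140 by 54.9 gives the same 2.55, which rounds up to the 2.6x quoted above.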

Full disclosure, these numbers do not represent the highest scaling Intel systems.  In order to find them, you must look at the previous generation of systems.  Some may consider them obsolete, but for customers that must scale beyond 6TB/12TB (OLAP/OLTP) and are unwilling or unable to consider Power Systems, an immediate sunk investment may be their only choice.  (Note to customers in this undesirable predicament, if you really want to get an independent, third party verification of potential obsolescence, ask your favorite leasing companies, not associated or owned by the vendor, what residual value they would assume after 1 year for these systems vs. what they would assume for similar Skylake systems after 1 year.)

The previous “generation” of HPE Superdome, “X”, which as discussed in my last blog post shares 0% technology with Skylake based HPE Superdome “Flex”, was supported up to 8TB/16TB with 384 cores for both OLAP and OLTP, resulting in CTM of 21.3GB/42.7GB/core.  The SGI derived HPE MC990 X, which is the real predecessor to the new “Flex” system, was supported up to 4TB/20TB, with 192 cores for OLAP and 480 cores for OLTP.

Strangely, “Flex” is only supported for HANA with 2 nodes or chassis where “MC990 X” was supported with up to 5 nodes.  It has been over 4 months since “Flex” was announced and at announcement date, HPE loudly proclaimed that “Flex” could support 48TB with 8 chassis/32 sockets https://news.hpe.com/hewlett-packard-enterprise-unveils-the-worlds-most-scalable-and-modular-in-memory-computing-platform/.  Since that time, some HPE reps have been telling customers that 32TB support with HANA was imminent.  One has to wonder what the hold up is.  First it took a couple of months just to get 128GB DIMM support. Now, it is taking even longer to get more than 2-node support for HANA.  If I were a potential HPE customer, I would be very curious and asking my rep about these delays (and I would have my BS detector set to high sensitivity).

Customers have now been presented with a stark contrast.  On one side, Power Systems has been on a roll; growing market share in HANA, regular increases in supported memory sizes, the ability to handle the largest single image HANA memory sizes of any vendor, outstanding mainframe derived reliability and radically better flexibility with built in virtualization and support for a maximum of 8 concurrent production HANA instances or 7 production with many dozens of non-prod HANA, application servers, non-HANA DBs and/or a wide variety of other applications supported in a shared pool, all at competitive price points.

On the other hand, Intel based HANA systems seem to be stuck in a rut with decreased maximum memory sizes (admittedly, this may be temporary), anemic increases in CTM, improved RAS but not yet to the league of Power Systems and a very questionable VMware based virtualization support filled with caveats, limitations, overhead and poor, at best, sharing of resources.

March 28, 2018 | Uncategorized

HPE Superdome is dead, but HPE marketing continues its deceptive ways.

Today, 11/6/17, HPE announced the “New” Superdome Flex.  If you did not look too closely, you would think that this was some sort of descendant of Superdome.  After all, the Integrity Superdome took the original Superdome and replaced PA-RISC chips and the SX1000 cell controller with Itanium chips and a faster SX2000 cell controller.  Superdome 2 took this further by upgrading to the latest Itanium chips, an even faster SX3000 cell controller and moved from a cell board to a blade configuration.  Superdome X changed out the Itanium chips for Intel Xeon chips which it upgraded over several generations.  So, it would be only natural to think that Superdome Flex did something similar and that is exactly what HPE wants you to think.

Except, this is not even remotely like any prior Superdome and has inherited almost nothing from it.  In fact, this is a very straightforward descendant of the SGI UV 300H, which HPE renamed the MC990 X after the acquisition.  A glance at the front of the “new” system shows the same basic design, a 4-socket, 5U chassis even down to the unique diagonal handles on the fans, but they apparently moved the NUMAlink fabric ports (no longer called that; renamed Superdome Flex ports) from the back to the front, perhaps to get rid of a little of the rats nest of cables which defined the SGI UV 300H.  This means there is no SX3000 or cross bar switch in the Flex and the blade design is gone.  Even the memory DIMMS are different which implies that nothing could be moved from an old Superdome X to a new “Flex” other than perhaps some old PCIe adapters.

So, if the entire design is based on an SGI acquired technology and it shares nothing from its “namesake”, one would need to have avoided that course in ethics in high school or college to find it appropriate to suggest to customers that this is a related technology.  Imagine if Honda changed the engine, frame, transmission, trim and body style but called their new car an Accord “Flex” because it used the same bumper and tire sizes, would you feel as if they were trying to manipulate you?

Back to the more important topic, Superdome is now dead!  I have been saying this for a while and blogged about this several months ago.  I suggested that any customer considering investing in this technology view it as instantly obsolete and a sunk investment.  I pointed out the huge investment in ccNUMA interconnect technologies and how it was hard to imagine how HPE could afford to invest in 2 different ones at the same time, so only one system was likely to survive.  I explained that the SGI technology offered more space and power to host the new, larger, higher wattage and heat dissipating Skylake processors.  It appears that my projections were correct.  For customers that ignored that advice, I just hope you got a really great price and don’t mind paying a lot for old technology for any upgrades or dumping your old systems at a huge financial loss.  For any customer still considering a Superdome X, the writing is no longer on the wall.  It is on HPE’s web site.  https://news.hpe.com/hewlett-packard-enterprise-unveils-the-worlds-most-scalable-and-modular-in-memory-computing-platform/

Currently, no white papers have been published showing the architecture and detailed specs of this “new” system, only a relatively high level “Spec” sheet.  Perhaps HPE is too embarrassed to publish this since it would likely resemble the SGI UV 300H in way too many ways, including the old rats nest of 4-bit wide interconnect cables.  Once they do, I will investigate and will likely publish a separate post to share what I find.

On the SAP front, new HANA appliance specs have been published for “Flex”.  It is interesting, and again embarrassing for HPE, that only up to 8-socket configs are shown, with less BWoH memory support @ 6TB max than the old, and now obsolete, Superdome X.  Even more interesting is the lack of SoH and S/4 configs, and I have a suspicion as to why.  Turns out that the spec sheet does have one interesting point after all.  It shows the maximum size memory DIMMs are 64GB and the number of DIMM slots is 48 with a max supported memory of 3TB per chassis, i.e. half of what is necessary to support the 6TB per 4-sockets that other competing Intel vendors support.

So, if you need a supported HANA configuration today with current generation processors for BWoH beyond 6TB, look at any vendor other than HPE with 8-socket Skylake systems or IBM Power Systems.  If you need a supported SoH or S/4 configuration with current gen processors, look at any vendor other than HPE and beyond 12TB, only IBM Power Systems is supported at this level.

November 6, 2017 | Uncategorized

HPE, still playing fast and loose with the facts about SAP HANA on Power

Writing a blog post would be so much simpler if IBM permitted me to lie, but that is prohibited.  That is clearly not the case at HPE, see this recent blog post: https://community.hpe.com/t5/Alliances/SAP-HANA-runs-best-on-x86-Period/ba-p/6971659#.WZxDPa01TxW

It contains so many lies, it is hard to know where to start.   Let’s start with the biggest one.  There is a 10x difference in performance KPIs required by SAP to certify and ship a HANA appliance vs. a solution certified for TDI only.

You really have to love those lies that are refuted by such easily obtained facts from documentation that is apparently not used by HPE called SAP Notes.  SAP note 1943937 specifically states: “All HWCCT tests of appliances (compute servers) certified with scenario HANA-HWC-AP SU 1.1 or HANA-HWC-AP RH 1.1 must use HWCCT of SAP HANA SPS10 or higher or a related SAP HANA revision”  Interesting that appliances must use the same HWCCT test of SAP KPIs as used by TDI.  So, based on HPE’s blog post, does this mean that if an HPE appliance compute server is used for TDI, it will perform 10x worse than if it is used in an appliance? That would imply that the secret sauce of HPE’s appliances is so incredible that it acts like a dual turbocharger on a car!

The blog post goes on to say “It (Power) ‘works’ … but it is just held to a ~10x lower standard without any of the performance optimizations attributed to SAP’s co-innovation efforts with Intel.”  OK, two lies in one sentence, obviously going for the gold here.  As we have already discussed the 10x lie, let us just look at the second one, i.e. performance optimizations.  Specifically, the blog calls out AVX and TSX as the performance optimizations for the Intel platform.  They are correct, those optimizations don’t work on Power as, instead of AVX (Advanced Vector Extensions), Power has two fully symmetric vector pipelines called via VSX (Vector-Scalar eXtensions) instructions which HANA has been “optimized” to use in the same manner as AVX.  And TSX, a.k.a. Transactional Synchronization Extensions, came out after POWER8 Transactional Memory, but HANA was optimized for both at the same time.

The blog post also stated “Intel E7v3 CPUs for HANA (TSX and AVX) that offer a 5x performance boost over older Intel E7 or Power 8 CPUs.”  Awesome, but where is the proof behind this statement?  Perhaps a benchmark?  Nope, not one published on SAP’s site, not even the old SD benchmark, backs up this claim (the SD benchmark shows, by the way, almost the same SAPS/core for E7v3 vs. E7v2 and way less than POWER8, but maybe that benchmark does not use those optimizations?).  Ok, then maybe the sizing certifications show this?  Nope.  A 4 socket CS500 Ivy Bridge system (E7v2) is published as supporting up to a 2TB SoH solution where a 4 socket CS500 Haswell system (E7v3) is shown as supporting up to a 3TB SoH solution.  So far, that is just 50% more, not 5x, but perhaps HPE can’t tell the difference between 0.5 and 5.0?  But didn’t Haswell have more cores per socket?  Yes, it had 18 cores/socket vs. 15 cores/socket for Ivy Bridge, i.e. 20% more cores/socket.  So, E7v3 based systems could actually host 25% more memory and associated workload than E7v2 based systems per core.  Of course, I am sure that the switch from DDR3 to DDR4 from E7v2 to E7v3 had nothing to do with this performance improvement.  So, 5x performance boost is clearly nothing but a big fat lie.
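The per-core arithmetic is easy to check.  Here is a minimal Python sketch using only the sizing and core counts quoted above; the variable names are mine, and supported memory is used as a rough proxy for workload capacity, which is an assumption rather than an SAP sizing rule.

```python
SOCKETS = 4

# Published SoH maximums and cores per socket for the 4-socket CS500.
ivy_bridge = {"memory_tb": 2, "cores_per_socket": 15}   # E7v2
haswell = {"memory_tb": 3, "cores_per_socket": 18}      # E7v3

def tb_per_core(system):
    """Supported memory per core for a 4-socket system, in TB."""
    return system["memory_tb"] / (SOCKETS * system["cores_per_socket"])

memory_ratio = haswell["memory_tb"] / ivy_bridge["memory_tb"]
core_ratio = haswell["cores_per_socket"] / ivy_bridge["cores_per_socket"]
per_core_ratio = tb_per_core(haswell) / tb_per_core(ivy_bridge)

print(f"memory:   {memory_ratio:.2f}x")
print(f"cores:    {core_ratio:.2f}x")
print(f"per core: {per_core_ratio:.2f}x")
```

A 1.5x memory increase spread over 1.2x the cores leaves a 1.25x per-core gain, a long way from 5x.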

And the hits just keep coming.  The next statement is just lovely “Naturally, this only matters if you want to be able to call SAP support to get help on nuance performance issues impacting your productive SAP HANA deployment.”  I guess he is trying to suggest that SAP won’t help you with performance problems if you are running any TDI solution including Power Systems, except this is contradicted by all of the SAP notes about TDI and HWCCT, not to mention the experience of customers who have implemented HANA on Power.

The post continues: “IBM wants to be the king of legacy businesses like mainframe and UNIX. That’s pretty much the only platforms they have left. So now that they can state that HANA “works” on Power, they can make a case to their AIX/Power customers that they should stay on AIX/Power for SAP and HANA and avoid what IBM claims to be an ‘oh so painful’ Unix to x86 migration.”  HPE, suggesting that you can run HANA on AIX/Power when HANA only runs on Linux/Power, might not be telling a lie but might just be expressing ignorance and the inability to use sophisticated and obscure search tools like Google.  The suggestion that IBM has said that a SAP heterogeneous migration is “oh so painful” ignores the fact that we have done hundreds of such migrations from HP/UX among others.  Perhaps the author is reflecting HPE’s migration experience with what every other migration provider sees as a very well understood and fully supported SAP process.  As to IBM’s motivation, HPE is trying to suggest that IBM is only in the HANA business to support its legacy SAP on AIX/Power.  Just looking at the thriving business of HANA on Power, over 850 HANA on Power wins since becoming a supported HANA provider, might suggest otherwise.  How about the complete absence of any quotes, marketing materials or other documentation that shows that IBM is in this market for any other reason than it is the future of SAP and IBM intends to remain a premier partner of SAP and our customers?

Taken together, this blog post shows that HPE must be using the advanced Skylake processors in their Superdome-X and MC990 X (SGI UV 300H) to generate lies at an astounding pace.  Oh wait, I forgot (not really) that HPE still has not announced support for Skylake in their high end systems and is only certified for BWoH, not SoH or S/4HANA, with Skylake and only up to 3TB at that, … one month after announcement of Skylake!  Wow, I guess this shows simultaneously how much their pace of technology innovation has slowed down and partnership with SAP has decreased!

And, let’s end on their last amazing sentence, “Every day you put off that UNIX to x86 migration, you are running HANA in a performance degraded mode with production support limitations.“  Yes, HPE seems to be recommending that you migrate your UNIX based systems, i.e. those running Business Suite 7, to x86.  In other words, do two migrations, once from UNIX to x86 and a second to HANA.  Sounds like a total disconnect from business reality!  As to the second part of that sentence, it simply does not make sense, so we are not going to attribute that to a lie, but to a simple logic error.

What are we to conclude about this blog post?  Taken on its own, it is the work of a rogue employee.  Taken with the deluge of other similarly misleading statements and outright lies emerging from HPE, we see a trend.  HPE has gone from being a well-respected systems supplier to a struggling company that promotes and condones lies and hires less than competent individuals to propagate misinformation.  Sounds more like a failing communist state than a company worthy of a customer’s trust.

August 22, 2017 Posted by | Uncategorized | 4 Comments

Intel Skylake has been announced and the self-described HANA “market leader”, HPE, is curiously trailing the field

Intel announced general availability of their “Skylake” processor on the “Purley” platform last week.  Soon after, SAP posted certified HANA configurations for Lenovo and Fujitsu up to 8 sockets and 12TB memory for Suite on HANA (SoH) and S/4HANA (S4) and 6TB for BW on HANA (BWoH).  They also posted certified configurations for Dell and Cisco up to 4-socket systems with 6TB SoH/S4 and 3TB BWoH.  The certified configurations posted for HPE, which describes itself as the HANA market leader, only included up to 4-socket/3TB BWoH configurations, no configurations for SoH/S4 and nothing for any larger systems.

It is still early and more certified configurations will no doubt emerge over time, but these early results do beg the question, “what is going on with HPE?”  I checked the most recent press releases for HPE and they did not even mention the Skylake debut much less their certification with SAP HANA.  If you Google using the keywords, HPE, Skylake and HANA, you may find a few discussions about HPE’s acquisition of SGI and my previous blog posts with my speculation about Superdome’s demise and HPE’s misleading of customers about this impending event, but nothing from HPE.

So, I will share a little more speculation as to what this slow start for HPE in the Skylake space might portend.

Option 1 – HPE is not investing the funds necessary to certify all of their possible configurations and SoH/S4.  Anyone that has been involved with the HANA certification process will tell you that it is very time consuming and expensive.  As you can see from HPE’s primary Intel based competitors, they are all very eager to increase their market share and acted quickly.  Is HPE becoming complacent?  Are they having financial restrictions that have not been made public?

Option 2 – HPE’s technology limitations are becoming apparent.  The Converged System 500 is based on Proliant DL560/580 systems which support a maximum of 4 sockets.  These systems utilize Intel QPI and now UPI interconnect technologies, i.e. no custom ASICs or ccNUMA switches are required.  The CS900 based on the Superdome X and the MC990 X (SGI UV 300H) utilize custom ASICs and, in the case of Superdome X, a set of ccNUMA switches.  As I speculated previously, Superdome X is probably at end of life, so it may never see another certification on SAP’s HANA site.  As to the MC990 X, the crystal ball is a bit more hazy.  Perhaps HPE is trying to shoot for the moon and hit a number beyond the 20TB for SoH/S4 that is currently supported meaning a much longer and more complex set of certification tests.  Or perhaps they are running into technical challenges with the new ASICs required to support UPI.

Option 3 – MC990 X is going to officially become HPE’s only high end offering to support Skylake and subsequent processors and Superdome X is going to be announced at end of life.  If this were to happen, it would mean that anyone that had recently purchased such a system would have purchased a system that is immediately obsolete.

If Option 1 turns out to be true, one would have to be concerned about HPE’s future in the HANA space.  If Option 2 turns out to be true, one would have to be really concerned about HPE’s future in the HANA space.  And if Option 3 turns out to be true, why would HPE be waiting?  The answer may be inventory.  If HPE has a substantial inventory of “old” Broadwell based blades and Superdome X chassis, they will undoubtedly want to unload these at the highest price possible and they know that the value of obsolete systems after such an announcement would drop below the cost of manufacturing.

So, you pick the most likely scenario.  Worst case for HPE is that they are just a little slow or shooting too high.  Worst case for customers is that they purchase a HANA system based on Superdome X and end up with a few hundred thousand dollar boat anchor.  If you work for a company considering the purchase of an HPE Superdome X solution, you may want to ask about its future and, if you find it is at end of life, select another solution for your SAP HANA requirements.

Inevitably, more systems will be published on SAP’s certification page, https://www.sap.com/dmc/exp/2014-09-02-hana-hardware/enEN/appliances.html#viewcount=100&categories=certified%2CIntel%20Skylake%20SP .  When that happens, especially if any of my predictions turn out to be true or if they are all wrong and another scenario emerges, I will post an update.

July 20, 2017 Posted by | Uncategorized | Leave a comment

There they go again, HPE misleading customers about Superdome X futures

When the acquisition of SGI by HPE was announced last year, I was openly skeptical about HPE’s motives and wrote a blog post to discuss the potential reasons and implications behind this decision. I felt pretty certain that HPE would not retain both Superdome X (SD-X) and the high end SGI system, now called MC990 X as they play in the same space. I speculated that the MC990 X would not be the winner because it has inferior memory reliability technology and uses a hand wired matrix to interconnect different nodes compared to SD-X which uses a backplane to connect nodes to switches for connectivity to other nodes.

At SapphireNow this past week, I learned, from a couple of different sources, that Intel’s next generation platform, “Purley” using “Skylake” processors, includes so many significant changes that SD-X will very likely not be able to support them. Those changes include a new chip level interconnect technology “UltraPath” (UPI), as a follow on to QPI, faster memory and a larger footprint which, according to pictures available on the internet, appears to be at least 25% larger. This makes sense since the high end Skylake processor will have up to 28 cores versus Broadwell’s 24 cores and includes 25% more L3 cache but uses the same 14nm lithography. Though specs are limited, my sources also believe this new platform will require more power and cooling than the “Broadwell” chips used in SD-X today. SD-X cell boards are already pretty compact nodes, which made sense when they were trying to deliver a “converged” solution, but which means that any increase in footprint of any component could push these boards beyond their limits. Clearly, the UPI change alone would require HPE’s XNC2 cell controller to be redesigned even if no power or cooling limits were exceeded. Considering how few of these chips are used per year, it would be impractical for HPE to maintain a development program for both XNC2 (and the associated SX3000 switch) and the SGI acquired NUMALink7 cell controllers. The speculation is therefore that SD-X will, in fact, be the loser and HPE will utilize the MC990 X as the go forward, high end platform.

If this turns out to be a correct conclusion, then a lot of very large customers are likely to be in for a major shock since they have purchased SD-X with the expectation of being able to upgrade it and add on like architecture systems for their SAP HANA workloads as they grow over time as well as for other large memory applications. Had they been informed about the upcoming obsolescence of SD-X, many of these customers may have made a different purchasing decision. I am not convinced they would have purchased the MC990 X instead for a variety of reasons, not the least of which is the lack of robust memory protection, available with SD-X, known as RAS, Lockstep or DDDC+ mode, but not with MC990 X. Lacking this, the failure of a chip on a memory DIMM could result in a HANA failure, system failure or the need to immediately shut down the system to diagnose and fix the failed DIMM. Most SAP customers running Suite on HANA or S/4HANA are unlikely to find this level of protection to be acceptable, especially when they are running massive HANA transactional systems.

On the subject of reliability, one of the highest exposures that any system has is its physical connections and the MC990 X has a lot of them to form the ccNUMA mesh. Each cable has a connector on each end which is then inserted into the appropriate port on each pair of node controllers. Any time a cable is inserted into any sort of receptacle, there is a possibility that it will fail due to the pressure required to insert it or corrosion. A system with 2 nodes has 8 such cables, 2 between the pair of controllers on each node and 4 between the nodes. A system with 5 nodes has approximately 50 such cables. While not necessarily catastrophic, a failure of one of these cables would require a planned outage as soon as possible as there would be a performance impact from the loss of the connection.
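One wiring rule that reproduces both cable counts is 2 cables inside each node plus 4 cables between every pair of nodes, i.e. a full mesh.  That rule is an assumption inferred from the two data points above, not a published HPE specification, and the function name below is purely illustrative.

```python
from math import comb


def cable_count(nodes, intra_node=2, inter_pair=4):
    """Estimate ccNUMA cables, assuming a full mesh between nodes.

    intra_node: cables between the pair of controllers inside each node.
    inter_pair: cables between each pair of nodes (assumed, full mesh).
    """
    return nodes * intra_node + comb(nodes, 2) * inter_pair


for n in (2, 5):
    print(f"{n} nodes: {cable_count(n)} cables")
```

Under this assumption the cable count grows with the square of the node count, which is why the exposure rises so quickly on larger configurations.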

Another limitation of the MC990 X is the lack of virtualization technology. As each node in a MC990 X contains 4 sockets, the level of granularity of this system is so coarse that very poor utilization is likely to result and a massive increase in footprint will be required to handle the same set of workloads that might have been previously handled by SD-X. Physical partitioning is available, but based on one or more 4 socket nodes, such a waste as to be completely irrelevant.

If this speculation proves to be true, and HPE has not been sharing this roadmap with customers, then those customers have been convinced to invest in an obsolete platform and they should be furious. For those in the process of making a decision, they should be asking HPE the right questions about SD-X product futures and then taking that into account as well as the product weaknesses of MC990 X.

I should note that IBM has but one go forward architecture for Power Systems based on the on-chip interconnect technology that has been included and evolving since POWER4. Customers that invested in Power technology have seen a consistent 3 to 4 year cycle to the next generation with those who purchased high end models often having the option to upgrade from one generation to the next.

If you read my blog post last week, you may recall that I pointed out how HPE was lying to customers about HANA on IBM Power Systems.  Now we learn that they may also be doing the same sort of thing about their own products.  Methinks a trend is emerging here!?!?

May 23, 2017 Posted by | Uncategorized | 2 Comments

HPE, in an act of desperation, is spreading misinformation about SAP HANA on IBM Power Systems

Misinformation is a poor characterization of HPE’s behavior.  HPE, or some of its employees, are showing customers charts with a variety of statements which are simply untrue.  In any normal definition, this is called a lie.  This is unethical and unprofessional.  I will repeat what it says in my profile, these are my opinions, not a reflection of those of IBM.

You may have seen the blog post from Vicente Moranta: https://www.linkedin.com/pulse/truth-wild-lies-being-told-hanaonpower-vicente-moranta or my own post on IBM Systems Blog: https://www.ibm.com/blogs/systems/can-you-tell-hana-on-ibm-power-systems-fact-from-fiction/ . At IBM, we have this set of ethical rules called “IBM Business Conduct Guidelines” (BCG).  This 15 to 20 page document is required reading every year with a mandatory test to ensure comprehensive understanding of these rules.  I can boil down one of the most important themes into two words: DON’T LIE!

For those of you who have been reading this blog for a while, you may question whether I am too verbose and that may be fair as I thoroughly research each subject and include attribution for claims, usually including direct links to the source of those claims.  I would never think of making up “facts” and, on rare occasion when a reader has informed me of a mistake, I always correct the mistake as well as include a comment to that effect.

Some background:  A few weeks ago, a customer sent us a list of questions about SAP HANA on IBM Power Systems.  At first, the questions seemed bizarre as they included some very pointed misunderstandings about HANA and SAP in general and IBM’s role with SAP in particular.  As I read them more thoroughly, I realized that someone or some entity had coached the customer.  This was confirmed when I received a copy of a HPE presentation from a completely different source with almost identically worded statements.  By the way, back to the BCG, IBM employees are not allowed to view much less share information from a competitor marked confidential and this presentation was not marked with anything, meaning it was being shown to customers with, or without, HPE management’s official knowledge.

Some of the lies it shares:

  • HPE has 99%+ share of the HANA market. It is kind of funny to note that this claim is contradicted in the same table where it shows 80% share for Intel.  I guess they are confusing SAP and SAP HANA markets which is misleading at best.  More importantly, SAP does not release market share information and even if they did, I think that Lenovo, Cisco, Dell and Fujitsu might together claim more than 1% of the market.
  • IBM, not SAP “delivers” HANA code to customers because they have access to SAP code and have created a “special” version of SAP HANA. Wow, it is hard to figure out where to start here.  SAP owns HANA and only they distribute code.  They refused to support other operating systems than Linux, including AIX, for the very reason of wanting a common code tree for all platforms.  HPE is correct that IBM works closely with SAP to optimize HANA code, a fact which should be lauded not criticized.  Apparently, HPE must not have such a relationship and are jealous?  What HPE does not understand is that regardless of who, IBM, Intel or other, contributes code to SAP or suggests modifications to code, SAP makes all decisions regarding that code, including support, and incorporates it into the common code tree meaning all platforms can benefit if the code is not related to a specific, proprietary instruction set.  When Intel contributed code for TSX, Power HANA was not able to use this code, but with appropriate modifications, SAP was able to add the code to call IBM’s similar “Transactional Memory” calls.  Now, there is simple logic which ensures the appropriate call is made depending on the underlying processor architecture.  Likewise, when IBM saw that the huge number of threads in its architecture might push limits in HANA, it worked with SAP to improve the thread and workload dispatch mechanisms in HANA.  When Intel released their Broadwell-EX 24-core chips and SAP approved large socket counts, these systems would have hit the same threading issues, but with the new mechanisms already in place, were able to benefit from IBM & SAP’s joint effort.  Maybe HPE means that SAP has to compile the same code as used for Intel systems on the Power platform.  Well duh, it is a different chip architecture, so this is computer science 101, but hardly a different “version”.
  • Release priority – #1 Intel, #2 Power. Wrong again HPE!  HANA 2.0 released simultaneously on Intel and Power, as they did for S/4HANA 1610 on-prem edition, support for SoH with HANA 2.0, etc.  Where do you get your misinformation HPE?  This information is widely available on SAP’s Service Marketplace and the SAP PAM.
  • Sizes supported – HPE shows Power support of “only” 4.8TB for BW, 9TB for SoH vs. 24TB for Intel and No scale-out HANA on Power – I will give HPE the benefit of the doubt on the 4.8TB statement as 6TB just came out, but the “only” part is strange in that in the same table it shows “only” 4TB support on Intel. As to 9TB SoH and lack of scale-out HANA, both are wrong and have been for a while with 16TB SoH available since December 4th, 2016, see SAP Note 2188482 and scale-out HANA since November 2015. As to the 24TB claim for Intel, the largest supported HANA appliance is 20TB, so HPE, once again, seems to be making up facts.

There were other lies, but I think you get the idea.  Here are a few suggestions:

To HPE management: Shame on you for permitting such behavior or if done with your knowledge, for encouraging it.  If you have any “integrity” (pun intended), you will fire the employees and managers responsible for knowingly spreading lies and will print a retraction in appropriate press sources and on your web site.  If you don’t, then you are demonstrating, loud and clear, that your company is not to be trusted.

To HPE employees: Unless your management takes the above suggestions to heart with appropriate action to rectify this wrong, I am not sure how you can sleep well working for a company that considers truth to be something to be sacrificed at their convenience.  Hope they are truthful about your benefits.

To SAP customers: I can only speak for myself; when a restaurant, retailer or manufacturer lies to me and/or the public, I refuse to ever do business with them again.  The old saying applies, “fool me once, shame on you, fool me twice, shame on me.”  When you consider the minimal differences, if any, in cost of acquisition between all HANA system providers on the market, including IBM Power

May 15, 2017 Posted by | Uncategorized | Leave a comment

HANA on Power hits the Trifecta!

Actually, trifecta would imply only 3 big wins at the same time and HANA on Power Systems just hit 4 such big wins.

Win 1 – HANA 2.0 was announced by SAP with availability on Power Systems at the same time as on Intel based systems.[i]  Previous announcements by SAP had indicated that Power was now on an even footing with Intel for HANA from an application support perspective; however, until this announcement, some customers may have still been unconvinced.  I noticed this on occasion when I made such an assertion while presenting to customers and saw a little disbelief on some faces.  This announcement leaves no doubt.

Win 2 – HANA 2.0 is only available on Power Systems with SUSE SLES 12 SP1 in Little Endian (LE) mode.  Why, you might ask, is this a “win”?  Because true database portability is now a reality.  In LE mode, it is possible to pick up a HANA database built on Intel, make no modifications at all, and drop it on a Power box.  This removes a major barrier to customers that might have considered a move but were unwilling to deal with the hassle, time requirements, effort and cost of an export/import.  Of course, the destination will be HANA 2.0, so an upgrade from HANA 1.0 to 2.0 on the source system will be required prior to a move to Power among various other migration options.   This subject will likely be covered in a separate blog post at a later date.  This also means that customers that want to test how HANA will perform on Power compared to an incumbent x86 system will have a far easier time doing such a PoC.

Win 3 – Support for BW on the IBM E850C @ 50GB/core allowing this system to now support 2.4TB.[ii]  The previous limit was 32GB/core meaning a maximum size of 1.5TB.  This is a huge, 56% improvement which means that this, already very competitive platform, has become even stronger.
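The numbers in Win 3 can be checked with a few lines.  The 48-core count used here is not quoted above; it is an assumption inferred from 2.4TB divided by 50GB/core.

```python
# Per-core limits quoted in the text; the core count is inferred, not quoted.
CORES = 48  # assumption: 2.4TB / 50GB per core

old_limit_gb = CORES * 32  # previous BW limit of 32GB/core
new_limit_gb = CORES * 50  # new BW limit of 50GB/core
improvement = new_limit_gb / old_limit_gb - 1

print(f"{old_limit_gb}GB -> {new_limit_gb}GB, +{improvement:.0%}")
```

This gives 1536GB (about 1.5TB) rising to 2400GB (2.4TB), an improvement of about 56%.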

Win 4 – Saving the best for last, SAP announced support for Suite on HANA (SoH) and S/4HANA of up to 16TB with 144 cores on IBM Power E880 and E880C systems.[ii]  Several very large customers were already pushing the previous 9TB boundary and/or had run the SAP sizing tools and realized that more than 9TB would be required to move to HANA.  This announcement now puts IBM Power Systems on an even footing with HPE Superdome X.  Only the lame duck SGI UV 300H has support for a larger single image size @ 20TB, but not by much.  Also notice that to get to 16TB, only 144 cores are required for Power which means that there are still 48 cores unused in a potential 192 core system, i.e. room for growth to a future limit once appropriate KPIs are met.  Consider that the HPE Superdome X requires all 16 sockets to hit 16TB … makes you wonder how they will achieve a higher size prior to a new chip from Intel.

Win 5 – Oops, did I say there were only 4 major wins?  My bad!  Turns out there is a hidden win in the prior announcement, easily overlooked.  Prior to this new, higher memory support, a maximum of 96GB/core was allowed for SoH and S/4HANA workloads.  If one divides 16TB by 144 cores, the new ratio works out to 113.8GB/core or an 18.5% increase.  Let’s do the same for HPE Superdome X.  16 sockets times 24 core/socket = 384 cores.  16TB / 384 cores = 42.7GB/core.  This implies that a POWER8 core can handle 2.7 times the workload of an Intel core for this type of workload.  Back in July, I published a two-part blog post on scaling up large transactional workloads.[iii]  In that post, I noted that transactional workloads access data primarily in rows, not in columns, meaning they traverse columns that are typically spread across many cores and sockets.  Clearly, being able to handle more memory per core and per socket means that less traversing is necessary resulting in a high probability of significantly better performance with HANA on Power compared to competing platforms, especially when one takes into consideration their radically higher ccNUMA latencies and dramatically lower ccNUMA bandwidth.
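A short sketch of the memory per core arithmetic in Win 5, using only the figures quoted above.  The variable names are illustrative, and 1TB is taken as 1024GB, which is the assumption that reproduces the 113.8GB/core and 42.7GB/core values.

```python
GB_PER_TB = 1024  # assumption: binary terabytes

# IBM Power E880/E880C: 16TB SoH/S4 on 144 cores, previous limit 96GB/core
power_gb_per_core = 16 * GB_PER_TB / 144
increase_over_previous = power_gb_per_core / 96 - 1

# HPE Superdome X: 16 sockets x 24 cores/socket for the same 16TB
sdx_cores = 16 * 24
sdx_gb_per_core = 16 * GB_PER_TB / sdx_cores

ratio = power_gb_per_core / sdx_gb_per_core

print(f"Power: {power_gb_per_core:.1f}GB/core (+{increase_over_previous:.1%})")
print(f"Superdome X: {sdx_gb_per_core:.1f}GB/core on {sdx_cores} cores")
print(f"Power/Intel ratio: {ratio:.1f}x")
```

Note that the ratio reduces to 384 / 144, so it is independent of how a terabyte is defined.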

Taken together, these announcements have catapulted HANA on IBM Power Systems from being an outstanding option for most customers, but with a few annoying restrictions and limits especially for larger customers, to being a best-of-breed option for all customers, even those pushing much higher limits than the typical customer does.

[i] https://launchpad.support.sap.com/#/notes/2235581

[ii] https://launchpad.support.sap.com/#/notes/2188482

[iii] https://saponpower.wordpress.com/2016/07/01/large-scale-up-transactional-hana-systems-part-1/

December 6, 2016 Posted by | Uncategorized | 3 Comments