SAPonPower

An ongoing discussion about SAP infrastructure

POWER10 – Opening the door for new features and radical TCO reduction to SAP HANA customers

As predicted, IBM continued its cycle of introducing a new POWER chip every four years (after a set of incremental or “plus” enhancements, usually after 2 years).[i]  POWER10, revealed today at HotChips, has already been described in copious detail by the major chip and technology publications.[ii]  I really want to talk about how I believe this new chip may affect SAP HANA workloads, but please bear with me for a paragraph or so while I totally geek out over the technology.

POWER10, manufactured by Samsung using 7nm lithography, will feature up to 15 cores/chip, up from the current POWER9 maximum of 12 cores (although most systems have been shipping with chips with 11 cores or fewer enabled).  IBM found, as did Intel, that manufacturing chips at ever smaller lithography with both maximum GHz and maximum core count is extremely difficult.  The result is an expensive process with near-inevitable microscopic defects in a core or two, so IBM intentionally designed a 16-core chip that will never ship with more than 15 cores enabled.  This means they can tolerate a reasonable number of manufacturing defects without a substantial impact to chip quality, simply by turning off whichever core does not pass their very stringent quality tests.  This will raise the maximum number of cores in all systems proportionately, based on the number of sockets in each system.  If that were the only improvement, it would be nice, but not earth-shattering, as every other chip vendor also usually introduces more or faster cores with each iteration.
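To see why a single spare core matters so much for yield, consider a toy binomial model. The per-core yield figure below is purely illustrative (IBM has published no such number), and the model assumes core defects are independent:

```python
from math import comb

def p_usable(total_cores, needed_cores, p_core_good):
    """Probability that at least `needed_cores` of `total_cores` cores
    pass test, assuming independent per-core defects."""
    return sum(
        comb(total_cores, k) * p_core_good**k * (1 - p_core_good)**(total_cores - k)
        for k in range(needed_cores, total_cores + 1)
    )

p = 0.95  # illustrative per-core yield, not an IBM figure

no_spare = p_usable(15, 15, p)    # a 15-core die needing all 15 cores good
with_spare = p_usable(16, 15, p)  # a 16-core die shipping any 15 of 16

print(f"no spare core:  {no_spare:.1%} of dies usable")
print(f"one spare core: {with_spare:.1%} of dies usable")
```

Even at this modest illustrative defect rate, the spare core dramatically increases the fraction of usable dies, which is exactly the manufacturing cushion described above.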

Of course, POWER10 will have much more than that.  Relative to POWER9, each core will feature 1.5 times the amount of L1 cache, 4x the L2 cache, 4x the TLB and 2x general and 4x matrix SIMD.  (Sorry, not going to explain each as I am already getting a lot more geeky than my audience typically prefers.)  Each chip will feature over 4 times the memory bandwidth and 4x the interconnect bandwidth.  Of course, if IBM only focused on CPU performance and interconnect and memory bandwidths, then the bottleneck would naturally just be pushed to the next logical points, i.e. memory and I/O.  So, per a long-standing philosophy at IBM, they didn’t ignore these areas and instead built in support for both existing PCIe Gen 4 and emerging PCIe Gen 5, and for both existing DDR4 and emerging DDR5 memory DIMMs, not to mention native support for future GDDR DIMMs.  And if this were not enough, POWER10 includes all sorts of core microarchitecture improvements which I will most definitely not get into here, as most of my friends in the SAP world are probably already shaking their heads trying to understand the implications of the above improvements.

You might think with all of these enhancements, this chip would run so hot that you could heat an Olympic swimming pool with just one system, but due to a variety of process improvements, this chip actually runs at 3x the power efficiency of POWER9.

Current in-memory workloads, like SAP HANA, rarely face a performance issue due to computational performance.  It is reasonable to ask whether this is because these workloads are designed with an understanding of the current limitations of systems, avoiding computationally intense actions that would deliver unsatisfying performance to users who are increasingly impatient with any delays.  HANA has more or less set the standard for rapid response, so delivering anything else would be seen as a failure.  Is HANA holding back on new but very costly (in computational terms) features that could be unleashed once a sufficiently fast CPU becomes available?  If so, then POWER10 could be the catalyst for some incredible new HANA capabilities, if SAP takes advantage of this opportunity.

A related issue is that perhaps existing cores could deliver better performance, but memory has not been able to keep pace.  Remember, 128GB DRAM DIMMs are still the maximum size accepted by SAP for HANA, even though 256GB DIMMs have been on the market for some time now.  As SAP has long used internal benchmarks to determine ratios of memory to CPU, could POWER10 enable the use of much denser memory to drive down the number of sockets required to support HANA workloads, thereby decreasing TCO or enabling more server consolidation?  Remember, Power Systems all feature embedded, hardware/hypervisor-based virtualization, so adding workloads to a system is just a matter of harnessing any unused capacity.
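The socket-count arithmetic behind that TCO question is simple. The sketch below assumes a hypothetical 16 DIMM slots per socket (actual slot counts vary by system) and that memory capacity, not CPU, is the binding constraint:

```python
from math import ceil

def sockets_needed(hana_memory_gb, dimm_gb, slots_per_socket=16):
    """Sockets required to hold a HANA instance when memory capacity,
    not CPU, is the binding constraint."""
    return ceil(hana_memory_gb / (dimm_gb * slots_per_socket))

workload_gb = 6144  # hypothetical 6TB HANA instance

print(sockets_needed(workload_gb, dimm_gb=128))  # 3 sockets with 128GB DIMMs
print(sockets_needed(workload_gb, dimm_gb=256))  # 2 sockets with 256GB DIMMs
```

Cutting the socket count this way is exactly where the consolidation and TCO savings would come from, if SAP's sizing rules permit the denser DIMMs.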

IBM has not released any performance projections for HANA running on POWER10 but has provided some unrelated number crunching and AI projections.  Based on those plus the raw and incredibly impressive improvements in both the microarchitecture and number of cores, dramatic cache and TLB increases and gigantic memory and interconnect bandwidth expansions, I predict that each socket will support 2 to 3 times the size of HANA workloads that are possible today (assuming sufficient memory is installed).

In the short term, I expect that customers will be able to utilize TDI 5 relaxed sizing rules to use much larger DIMMs and amounts of memory per POWER10 system to accomplish two goals.  Customers with relatively small HANA systems will be able to cut in half the number of sockets required compared to existing HANA systems.  Customers that currently have large numbers of systems will be able to consolidate those into many fewer POWER10 systems.  Either way, customers will see dramatic reductions in TCO with POWER10.

As to those customers that often run out of memory before they run out of CPU, stay tuned for part 2 of this blog post as we discuss perhaps the most exciting new innovation with POWER10, Memory Clustering.

 

[i] https://twitter.com/IBMPowerSystems/status/1295382644402917376
https://www.linkedin.com/posts/ibm-systems_power10-activity-6701148328098369536-jU9l
https://twitter.com/IBMNews/status/1295361283307581442
[ii] Forbes – https://www.forbes.com/sites/moorinsights/2020/08/17/ibm-discloses-new-power10-processer-at-hot-chips-conference/#12e7a5814b31
VentureBeat – https://venturebeat.com/2020/08/16/ibm-unveils-power10-processor-for-big-data-analytics-and-ai-workloads/
Reuters – https://www.reuters.com/article/us-ibm-samsung-elec/ibm-rolls-out-newest-processor-chip-taps-samsung-for-manufacturing-idUSKCN25D09L
IT Jungle – https://www.itjungle.com/2020/08/17/power-to-the-tenth-power/

 

August 17, 2020

Optane DC Persistent Memory – Proven, industrial strength or full of hype – Detail, part 1

Several non-Intel sites suggest that Intel’s storage class memory (Lenovo abbreviates these as DCPMM, while many others refer to them with the more generic term NVDIMM) delivers a read latency roughly 5 times higher than DRAM, e.g. 350 nanoseconds for NVDIMM vs. 70 nanoseconds for DRAM.[i]  A much better analysis comes from Lenovo, which examined a variety of load conditions and published their results in a white paper.[ii]  Here are some of the results:

  • A fully populated 6x DCPMM socket could deliver up to 40GB/s read throughput, 15GB/s write
  • Each additional pair of DCPMMs delivered proportional increases in throughput
  • Random reads had a load to use latency that was roughly 50% higher than sequential reads
  • Random reads had a max per socket (6x DCPMM) throughput that was between 10 and 13GB/s compared to 40 to 45GB/s for sequential reads

The most interesting quote from this section was: “Overall, workloads that are more read intensive and sequential in nature will see the best performance.”  This echoes the quote from SAP’s NVRAM white paper: “From the perspective (of) read accesses, sequential scans fare better in NVRAM than point reads: the cache line pre-fetch is expected to mitigate the higher latency.[iii]

The next section is even more interesting.  Some of its results comparing the performance differences of DRAM to DCPMM were:

  • Almost 3x better max sequential read bandwidth
  • Over 5x better max random read bandwidth
  • Over 5x better max sequential 2:1 R/W bandwidth
  • Over 8x better max random 2:1 R/W bandwidth
  • Latencies for DCPMM in the random 2:1 R/W test hit a severe knee of the curve and showed max latencies over 8x that of DRAM at very light bandwidth loads
  • DRAM, by comparison, continued to deliver significantly increasing bandwidth with only a small amount of latency degradation until it hit a knee of the curve at over 10x of the max DCPMM bandwidth

Unfortunately, this is not a direct indication of how an application like HANA might perform.  For that, we have to look at available benchmarks.  To date, none of the SD benchmarks have utilized NVDIMMs.  Lenovo published a couple of BWH results, one with and one without NVDIMMs, but used different numbers of records, so they are not directly comparable.  HPE, on the other hand, published a couple of BWH results using the exact same systems and numbers of records.[iv]  Remarkably, going from an all-DRAM 3TB configuration to a mixed 768GB DRAM/3TB NVDIMM configuration produced only a small, 6% performance degradation in the parallel query execution phase of the benchmark.  The exact configuration is not shown on the public web site, but we can assume something about the config based on SAP Note 2700084 – FAQ: SAP HANA Persistent Memory: “To achieve highest memory performance, all DIMM slots have to be used in pairs of DRAM DIMMs and persistent memory DIMMs, i.e. the system must be equipped with one DRAM DIMM and one NVDIMM in each memory channel.”  Vendors submitting benchmark results do not have to follow these guidelines, but if HPE did, then they used 24@32GB DRAM DIMMs and 24@128GB NVDIMMs.  Also, following other guidelines in the same SAP Note and the SAP HANA Administration Guide, HPE most likely placed the column store on NVDIMMs, with row store, caches, and intermediate and final results calculations on DRAM DIMMs.

BWH is a benchmark composed of 1.3 billion records, which can easily be loaded into a 1TB system with room to spare.  To achieve larger configurations, vendors can load the same 1.3B records a second, third or more times, which HPE did a total of 5 times to get to 6.5B records.  The column compression dictionary tables only grow with unique data, i.e. they do not grow when you repeat the same data set, regardless of the number of times it is added.
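The dictionary effect is easy to demonstrate with a minimal sketch of dictionary encoding (my own illustration, not HANA code): reloading identical values adds rows but no new dictionary entries.

```python
def dictionary_encode(values, dictionary, encoded):
    """Append values to a dictionary-encoded column; only previously
    unseen values enlarge the dictionary."""
    for v in values:
        if v not in dictionary:
            dictionary[v] = len(dictionary)
        encoded.append(dictionary[v])

dictionary, encoded = {}, []
batch = ["DE", "US", "FR", "US"]

dictionary_encode(batch, dictionary, encoded)
print(len(dictionary), len(encoded))  # 3 unique values, 4 rows

# Load the identical batch four more times, as BWH does with its records:
for _ in range(4):
    dictionary_encode(batch, dictionary, encoded)
print(len(dictionary), len(encoded))  # still 3 unique values, 20 rows
```

This is why loading the same 1.3B records five times inflates the row count to 6.5B without growing the compression dictionaries proportionately.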

BWH includes 3 phases: a load phase which represents data ingestion from ERP, a parallel query phase, and a sequential, single-user complex query phase.  Some have focused on the ingestion and complex query phases because they show the most degradation in performance vs. DRAM.  While that is tempting, I believe the parallel query phase is the most relevant.  During this phase, 385 queries of low, medium and high complexity (no clue as to how SAP defines those complexities, what their SQL looks like or how many of each type are included) are run in parallel, in random order.  After an hour, the total count of queries completed is reported.  In theory, the larger the database, the fewer the queries that could be run per hour, as each query would have more data to traverse.  However, that is not what we see in these results.

Lenovo, once again, provides the best insights here.  With Skylake processors, they reported two results: on the first, they loaded 1.3B records; on the second, 5.2B records, i.e. 4 times the number of rows with only twice the memory.  One might predict that queries per hour would be 4 times or more worse, considering the non-proportionate increase in memory.  The results, however, show only a little over a 2x decrease in queries/hr.  Dell reported a similar set of results, this time with Cascade Lake, also using only DRAM, and also with only around a 2x decrease in queries/hr for a 4x larger number of records.
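The arithmetic here is worth spelling out. The record ratio comes from the results just described; the absolute queries/hour figures below are hypothetical placeholders, chosen only to illustrate the shape of the comparison:

```python
# 5.2B records vs. 1.3B records: 4x the rows, with only 2x the memory
records_ratio = 5.2e9 / 1.3e9

qph_small = 1000   # hypothetical queries/hr at 1.3B records
qph_large = 480    # hypothetical queries/hr at 5.2B records (~2x lower)

naive_expected_qph = qph_small / records_ratio   # if slowdown tracked data size
observed_slowdown = qph_small / qph_large

print(f"naive expectation: {naive_expected_qph:.0f} queries/hr")
print(f"observed slowdown: {observed_slowdown:.2f}x, not {records_ratio:.0f}x")
```

A roughly 2x slowdown for 4x the data is better than proportional, and that anomaly is what the following paragraphs try to explain.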

What does that tell us? It is impossible to say for sure. From the SAP NVRAM white paper referenced earlier, “One can observe that some of the queries are more sensitive to the latency of the persistent memory than others. This can be explained by multiple factors:

  1. Does the query exhibit a memory access pattern that can easily be prefetched by the hardware prefetchers?
  2. Is the working set of queries small enough to fit in CPU cache and hence agnostic to persistent memory latency?
  3. Is processing of the query compute or latency bound?”

SAP stores results in the “Static Cache”: “The static result cache is particularly helpful in the following scenario: complex query based on a view; rather small result set; limited amount of changes in the underlying tables.  The static result cache can provide the following advantages: reduction of CPU consumption; reduction of SAP HANA thread utilization; performance improvements.”[v]

Other areas, like delta storage, caches, intermediate result sets and row store, remain solely in DRAM, not NVDIMMs.[vi]

The data in BWH is completely static.  Some queries are complex and presumably based on views.  Since the same queries execute over and over again, prefetchers may become especially effective.  It is possible that some or many of the 385 queries in BWH are hitting the results cache in DRAM.  In other words, after the first set of queries runs, a decent percentage of accesses may be hitting only the DRAM portion of memory, masking much of the latency and bandwidth issues of NVRAM.  Put another way, this benchmark may actually be testing CPU power against a set of results cached in working memory more than actual query speed against the column store.

So, let us now consider the HPE benchmark with NVDIMMs.  On the surface, a 6% degradation with NVDIMMs vs. all DRAM seems improbable considering NVDIMMs’ higher latency and lower bandwidth.  But after considering the above caching, repetitive data and repeating query set, it should not be much of a shock that this sort of benchmark could be masking the real performance effects.  Add to that the quote from Lenovo’s white paper above, which said that NVDIMMs are a great technology for read-intensive, sequential workloads.

Taken together, while not definitive, we can deduce that a real workload using more varied and random reads, against a non-repeating set of records might see a substantially different query throughput than demonstrated by this benchmark.

Believe it or not, there is even more detail on this subject, which will be the focus of a part 2 post.

 

[i]https://www.pcper.com/news/Storage/Intels-Optane-DC-Persistent-Memory-DIMMs-Push-Latency-Closer-DRAM

[ii]https://lenovopress.com/lp1083.pdf

[iii]http://www.vldb.org/pvldb/vol10/p1754-andrei.pdf

[iv]https://www.sap.com/dmc/exp/2018-benchmark-directory/#/bwh

[v]https://launchpad.support.sap.com/#/notes/2336344

[vi]https://launchpad.support.sap.com/#/notes/2700084

May 20, 2019

Optane DC Persistent Memory – Proven, industrial strength or full of hype?

“Intel® Optane™ DC persistent memory represents a groundbreaking technology innovation,” says the press release from Intel.  They go on to say that it “represents an entirely new way of managing data for demanding workloads like the SAP HANA platform. It is non-volatile, meaning data does not need to be re-loaded from persistent storage to memory after a shutdown. Meanwhile, it runs at near-DRAM speeds, keeping up with the performance needs and expectations of complex SAP HANA environments, and their users.” and “Total cost of ownership for memory for an SAP HANA environment can be reduced by replacing expensive DRAM modules with non-volatile persistent memory.”  In other words, they are saying that it performs well, lowers cost and improves restart speeds dramatically.  Let’s take a look at each of these potential benefits, starting with performance, examine their veracity and evaluate other options to achieve these same goals.

I know that some readers appreciate the long and detailed posts that I typically write.  Others might find them overwhelming.  So, I am going to start with my conclusions and then provide the reasoning behind them in separate posts.

Conclusions

Performance

Storage class memory is an emerging type of memory with great potential, but its current form, Intel DC Persistent Memory, is unproven.  It could have a moderate performance impact on highly predictable, low-complexity workloads; will likely have a much higher impact on more complex workloads; and could impose a significant performance degradation on OLTP workloads that might make meeting performance SLAs impossible.

Some data, e.g. aged data in the form of extension nodes, data aging objects, HANA native storage extensions, data tiering or archives, could be placed on this type of storage to improve speed of access.  On the other hand, if the SLAs for access to aged data do not require near in-memory speeds, then the additional cost of persistent memory over old, and very cheap, spinning disk may not be justified.

Highly predictable, simple, read-only query environments, such as canned reporting from a BW system, may derive some value from this class of memory; however, data load speeds will need to be carefully examined to ensure data ingestion throughput to encrypted persistent storage allows for daily updates within the allowed outage window.

Restart Speeds

Intel’s storage class memory is clearly orders of magnitude faster than external storage, whether SSD or other types of media.  If restart time were the only issue customers faced, with no performance or reliability implications and no other way to address it, then this might be a valuable technology.  But as SAP has announced DRAM-based HANA Fast Restart with HANA 2.0 SPS04, and most customers with high uptime requirements use HANA System Replication, the need for rapid restarts may be significantly diminished.  Also, this may be a solution to a problem of Intel’s own making, as IBM Power Systems customers rarely share this concern, perhaps because IBM invested heavily in fast I/O processing in their processor chips.

TCO

On a GB-to-GB comparison, Optane is indeed less expensive than DRAM … assuming you are able to use all of it.  Several vendors’ and SAP’s guidance suggests you populate the same number of slots with NVDIMMs as with DRAM DIMMs.  SAP recommends only using NVDIMMs for columnar storage, and historic memory/slot limitations are largely based on performance.  This means that some of this new storage may go unused, which means the cost per used GB may not be as low as the cost per installed GB.
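Put as arithmetic, the effective price depends on how much of the installed capacity is actually usable. The prices and the 60% utilization figure below are hypothetical placeholders, not real quotes:

```python
def effective_cost_per_gb(list_price_per_gb, usable_fraction):
    """$ per GB actually used, when part of installed capacity sits idle."""
    return list_price_per_gb / usable_fraction

nvdimm_per_gb, dram_per_gb = 5.0, 10.0  # hypothetical list prices

# If only 60% of installed NVDIMM capacity can hold column store data,
# the apparent 2x price advantage shrinks considerably:
print(effective_cost_per_gb(nvdimm_per_gb, 0.60))  # ~8.33 per used GB
print(effective_cost_per_gb(dram_per_gb, 1.00))    # 10.0 per used GB
```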

And if saving TCO is the goal, there are dozens of other ways in which TCO can be minimized, not just lowering the cost of DIMMs.  For customers that are really focused on reducing TCO, effective virtualization, different HA/DR methodologies, optimized storage and other associated IT cost optimization may have as much or more impact on TCO as may be possible with the use of storage class memory.  In addition, the cost of downtime should be included in any TCO analysis; since this type of memory is unproven in widespread and/or large memory installations, and the available memory protection is less than that available for DRAM-based DIMMs, this potential cost to the enterprise may dwarf the savings from using this technology today.

May 13, 2019

SAP support for multiple production HANA VMs with VMware

Recently, SAP updated their SAP notes regarding the ability to run multiple production HANA VMs with VMware.  On the surface, this sounds like VMware has achieved parity with IBM’s PowerVM, but the reality could not be much farther away from that perception.  This is not to say that users of VMware for HANA will see no improvement.  For a few customers, this will be a good option, but as always, the devil is in the details and, as always, I will play the part of the devil.

Level of VMware supported: 5.5 … still.  VMware 6.0 is supported only for non-production.[i]  If VMware 6.0 is so wonderful and they are such “great” partners with SAP, it seems awfully curious why a product announced on Feb 3, 2015 is still not supported by SAP.

Maximum size of each production HANA instance: 64 virtual processors and 1TB of memory; however, this translates to 32 physical processors with Hyperthreading enabled, and sizing guidelines must still be followed.  Currently, BW HANA is sized @ 2TB for 4 Haswell chips, i.e. 28.4GB/core, which translates to a maximum size of 910GB for a 32-core/64-vp VM, so slightly less than the 1TB supported.  Suite on HANA on Intel is supported at a 1.5x higher memory ratio than BW, but since the size of the VM is limited to 1TB, this point is largely moot.
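The sizing arithmetic above can be reproduced directly (Haswell EX at 18 cores per chip):

```python
# BW sizing: 2TB spread across 4 Haswell chips of 18 cores each
bw_gb_per_core = 2048 / (4 * 18)     # ~28.4 GB/core

vm_vcpus = 64
vm_physical_cores = vm_vcpus // 2    # Hyperthreading: 2 vCPUs per physical core
max_bw_vm_gb = bw_gb_per_core * vm_physical_cores

print(f"{bw_gb_per_core:.1f} GB/core -> {max_bw_vm_gb:.0f} GB max BW VM")
# 28.4 GB/core -> 910 GB, just under the 1TB VM memory cap
```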

Performance impact: At a minimum, SAP estimates a 12% performance degradation compared to bare metal (upon which most benchmarks are run and from which most sizings are derived), so one would logically conclude that the memory/CPU ratio should be reduced by the same amount.  The 12% performance impact, but not the reduced sizing effect that I believe should result, is detailed in an SAP note.[ii]  It goes on to state: “However, there are around 100 low-level performance tests in the test suite exercising various HANA kernel components that exhibit a performance degradation of more than 12%. This indicates that there are particular scenarios which might not be suited for HANA on VMware.”  Only 100?  When you consider the only like-for-like published benchmarks using VMware and HANA[iii], which showed a 12% degradation (coincidence? I think not) for a single-VM HANA system vs. bare metal, it leaves one to wonder what sort of degradation might occur in a multiple-VM HANA environment.  There is no guidance provided on this, which should make anything less than a bleeding-edge customer with no regard for SLAs VERY cautious.  Another SAP note[iv] goes on to state, “For optimal VM performance, VMware recommends to size the VMs within the NUMA node boundaries of the specific server system (CPU cores and local NUMA node memory).”  How much impact?  Not provided here.  So, either you size your VMs to fit within NUMA building blocks, i.e. a single 18-core socket, or you suffer an undefined performance penalty.  It is also interesting to note what VMware said in Performance Best Practices for VMware vSphere® 5.5: “Be careful when using CPU affinity on systems with hyper-threading.  Because the two logical processors share most of the processor resources, pinning vCPUs, whether from different virtual machines or from a single SMP virtual machine, to both logical processors on one core (CPUs 0 and 1, for example) could cause poor performance.”  That certainly gives me the warm and fuzzy!

Multiple VM support: Yes, you can now run multiple production HANA VMs on a system[v].  HOWEVER, “The vCPUs of a single VM must be pinned to physical cores, so the CPU cores of a socket get exclusively used by only one single VM. A single VM may span more than one socket, however. CPU and Memory overcommitting must not be used.”  This is NOT the value of virtualization but of physical partitioning, a wonderful technology if we were living in the 1990s.  So, if you have an 8-socket system, you can run up to 8 simultaneous production VMs, as long as all VMs are smaller than 511GB for BW or 767GB for SoH.  Need 600GB for BW?  Well, that will cost you a second full socket, even though you only need a few of its cores, thereby reducing the maximum number of VMs you can support on the system.  And this is before we take the 12% performance impact detailed above into consideration, which could further limit the memory per core and the number of VMs supported.
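The socket-granularity penalty is easy to quantify; the 511GB-per-socket BW figure is the one cited just above:

```python
from math import ceil

def sockets_consumed(vm_memory_gb, per_socket_gb):
    """Whole sockets a VM occupies when allocation granularity is a full socket."""
    return ceil(vm_memory_gb / per_socket_gb)

BW_PER_SOCKET_GB = 511

print(sockets_consumed(600, BW_PER_SOCKET_GB))       # a 600GB BW VM takes 2 sockets
print(8 // sockets_consumed(600, BW_PER_SOCKET_GB))  # so an 8-socket box fits only 4
```

Core-granular hypervisors avoid this rounding loss entirely, which is the contrast drawn later in this post.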

Support for mixed production HANA VMs and non-production: Not included in any of the above mentioned SAP notes.  One can infer from the above notes that this is not permitted meaning that there is no way to harvest unused cycles from production for the use of ANY other workload, whether non-prod or non-HANA DB.

Problem resolution: SAP Note 1995460 details the process by which problems may be resolved and while they are guided via SAP’s OSS system, there is transfer of ownership of problems when a known VMware related fix is not available.  The exact words are: “For all other performance related issues, the customer will be referred within SAP’s OSS system to VMware for support. VMware will take ownership and work with SAP HANA HW/OS partner, SAP and the customer to identify the root cause. Due to the abstraction of hardware that occurs when using virtualization, some hardware details are not directly available to SAP HANA support.” and of a little more concern “SAP support may request that additional details be gathered by the customer or the SAP HANA HW partner to help with troubleshooting issues or VMware to reproduce the issue on SAP HANA running in a bare metal environment.”

My summary of the above: one or more production HANA instances may be run under VMware 5.5 with a maximum of 64 vp/32 pp (assuming Hyperthreading is enabled) and a minimum 12% performance degradation with potential proportionate impact on sizing, with guidance that some scenarios “might not be suited” for HANA on VMware, with potential performance issues when VMs cross socket boundaries, with physical partitioning at the socket level, no sharing of CPU resources, no support for running non-production on the same system to harvest unused cycles, and a potential requirement to reproduce issues on a bare metal system if necessary.

Yes, that was a long, run-on sentence.  But it begs the question: just when would VMware be a good choice for hosting one or more production HANA instances?  My take is that unless you have very small instances which are unsuitable for HANA MDC (multitenancy), or you are a cloud provider for very small companies, there is simply no value in this solution.  For those potential cloud providers, their target customer set would include companies with very small HANA requirements and the willingness to accept an SLA that is very flexible on performance targets, while using a shared infrastructure in which a problem in one VM could cause the whole system to fail, impacting multiple customers simultaneously.

And in case anyone is concerned that I am simply the bearer of bad news, let me remind the reader that IBM Power Systems with PowerVM is supported by SAP with up to 4 production HANA VMs (on the E870 and E880; 3 on all other HANA-supported Power Systems) with granularity at the core level, no restrictions on NUMA boundaries, the ability to have a shared pool in place of one of the above production VMs with any number of non-production HANA VMs, up to the limits of PowerVM, which can utilize unused cycles from the production VMs, no performance penalties, no caveats about what types of workloads are well suited for PowerVM, excellent partition isolation preventing the vast majority of issues that could occur in one VM from affecting any others, and no problem-resolution handoffs or ownership changes.

In other words, if customers want to continue the journey around virtualization and server consolidation that they started in the early 2000s, want to have a very flexible infrastructure which can grow as they move to SoH, shrink as they move to S/4, grow as they consolidate more workloads into their primary instance, shrink again as they roll off data using data tiering, data aging or perhaps Hadoop and all without having to take significant system outages or throw away investment and purchase additional systems, IBM Power Systems with PowerVM can support this; VMware cannot.

 

———————

[i] 1788665 – SAP HANA Support for virtualized / partitioned (multi-tenant) environments

[ii] 1995460 – Single SAP HANA VM on VMware vSphere in production

[iii] Benchmark detail for bare metal and VMware 5.5 based runs from http://global.sap.com/solutions/benchmark/bweml-results.htm:

06/02/2014 HP 2,000,000,000 111,850 SuSE Linux Enterprise Server 11 on VMWARE ESX 5.5 SAP HANA 1.0 SAP NetWeaver 7.30 1 database server: HP DL580 Gen8, 4 processors / 60 cores / 120 threads, Intel Xeon Processor E7-4880 v2, 2.50 GHz, 64 KB L1 cache and 256 KB L2 cache per core, 30 MB L3 cache per processor, 1024 GB main memory

03/26/2014 HP 2,000,000,000 126,980 SuSE Linux Enterprise Server 11 SAP HANA 1.0 SAP NetWeaver 7.30 1 database server: HP DL580 Gen8, 4 processors / 60 cores / 120 threads, Intel Xeon Processor E7-4880 v2, 2.50 GHz, 64 KB L1 cache and 256 KB L2 cache per core, 30 MB L3 cache per processor, 1024 GB main memory

[iv] 2161991 – VMware vSphere configuration guidelines

[v] 2024433 – Multiple SAP HANA VMs on VMware vSphere in production

April 5, 2016

SAP performance report sponsored by HP, Intel and VMware shows startling results

Not often does a sponsored study show the opposite of what was intended, but this study does.  An astute blog reader alerted me to a white paper sponsored by HP, VMware and Intel from an organization called Enterprise Strategy Group (ESG).  The white paper is entitled “Lab Validation Report – HP ProLiant DL980, Intel Xeon, and VMware vSphere 5 SAP Performance Analysis – Effectively Virtualizing Tier-1 Application Workloads – By Tony Palmer, Brian Garrett, and Ajen Johan – July 2011.”  The words they used to describe the results are, as expected, highly complimentary to HP, Intel and VMware.  In this paper, ESG points out that almost 60% of the respondents to their study have not yet virtualized “tier 1” applications like SAP but expect a rapid increase in the use of virtualization.  We can only assume that they surveyed only x86 customers, as 100% of Power Systems customers are virtualized since the PowerVM hypervisor is baked into the hardware and firmware of every system and can’t be removed.  Nevertheless, it is encouraging that customers are moving in the right direction and that there is so much potential for the increased use of virtualization.

 

ESG provided some amazing statistics regarding scalability.  ESG probably does not realize just how bad this makes VMware and HP look; otherwise, they probably would not have published it.  They ran an SAP ECC 6.0 workload which they describe as “real world” but for which they provide no backup as to what this workload was comprised of, so it is possible that a given customer’s workload may be even more intensive than the one tested.  They ran a single VM with 4 vcpu, then 8, 16 and 32.  They show both the number of users supported as well as the IOPS and dialog response time.  Then, in their conclusions, they state that scaling was nearly linear.  The data shows that when scaling from 4 to 32 vcpu, an 8x increase, the number of users supported increased from 600 to 3,000, a 5x increase.  Put a different way, 5/8 = .625, or 62.5% scalability.  Not only is this not even remotely close to linear scaling, but it is an amazingly poor level of scalability.  IOPS, likewise, increased from 140 to 630, demonstrating 56.3% scalability, and response time went from .2 seconds to 1 second, which, while respectable, was 5 times that of the 4 vcpu VM.
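The scalability figures ESG reported can be converted to efficiency numbers with a small helper of my own, using the values quoted above:

```python
def scaling_efficiency(base_units, base_value, scaled_units, scaled_value):
    """Achieved speedup divided by ideal (linear) speedup."""
    return (scaled_value / base_value) / (scaled_units / base_units)

# ESG figures, scaling a single VM from 4 vcpu to 32 vcpu:
print(f"users: {scaling_efficiency(4, 600, 32, 3000):.1%}")  # 62.5%
print(f"IOPS:  {scaling_efficiency(4, 140, 32, 630):.2%}")   # 56.25%
```

Anything near 100% would be linear scaling; these numbers are nowhere close.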

 

ESG also ran a non-virtualized test with 32 physical cores.  In this test, they achieved only 4,400 users/943 IOPS.  Remember, VMware is limited to 32 vcpu which works out to the equivalent of 16 cores.  So, with twice the number of effective physical cores, they were only able to support 46.7% more users and 49.7% more IOPS.  To make matters much worse, response time almost doubled to 1.9 seconds.

 

ESG went on to make the following statement: “Considering that the SAP workload tested utilized only half of the CPU and one quarter of the available RAM installed in the DL980 tested, it is not unreasonable to expect that a single DL980 could easily support a second virtualized SAP workload at a similarly high utilization level and/or multiple less intensive workloads driven by other applications.”  If response time is already borderline poor with VMware managing only a single workload, is it reasonable to assume that response time will improve if you add a second workload?  If IOPS are not even keeping pace with the poor scalability of vcpu, is it reasonable to assume that IOPS will all of a sudden start improving faster?  If you have not tested the effect of running a second workload, is it reasonable to speculate about what might happen under drastically different conditions?  This is like saying that on a hot summer day, an air conditioner was able to maintain a cool temperature in a sunny room with half of the chairs occupied, and therefore it is not “unreasonable” to assume that it could do the same with all chairs occupied.  That might be the case, but there is absolutely no evidence to support such speculation.

ESG further speculates that because this test utilized default values for BIOS, OS, SAP and SQL Server settings, performance would likely be higher with tuning.  … And my car will probably go faster if I wash it and add air to the tires, but by how much?  In summary (and I am paraphrasing), ESG says that VMware, Intel processors and HP servers are ready for SAP primetime, providing reliability and performance while simplifying operations and lowering costs.  Interesting that they talk about reliability, yet once again provide no supporting evidence; they did not mention a single thing about reliability earlier in the paper other than to say that the HP DL980 G7 delivers “enhanced reliability”.  I certainly believe every marketing claim that a company makes without data to back it up, don’t you?

There are three ways that you can read this white paper.

  1. ESG has done a thorough job of evaluating HP x86 systems, Intel and VMware and has proven that this environment can handle SAP workloads with ease
  2. ESG has proven that VMware has either incredibly poor scalability or high overhead or both
  3. ESG has limited credibility as they make predictions for which they have no data to support their conclusions

While I might question how ESG makes predictions, I don’t believe that they do a poor job at performance testing.  They seem to operate like economists, i.e. they are very good at collecting data but make predictions based on past experience, not hard data.  When was the last time that economists correctly predicted market fluctuations?  If they could, they would all be incredibly rich!

I think it would be irresponsible to say that VMware based environments are incapable of handling SAP workloads.  On the contrary, VMware is quite capable, but there are significant caveats.  VMware does best with small workloads, e.g. 4 to 8 vcpu, not with larger workloads, e.g. 16 to 32 vcpu.  This means that if a customer utilizes SAP on VMware, they will need more and smaller images than they would on an excellent scaling platform like IBM Power Systems, which drives up management costs substantially and reduces flexibility.  By way of comparison, published SAP SD 2-tier benchmark results for IBM Power Systems utilizing POWER7 technology show 99% scalability when comparing a 16-core to a 32-core system at the same MHz, and 89.3% scalability when comparing a 64-core to a 128-core system with a 5% higher MHz, which, when normalized to the same MHz, still shows 99% scalability even at this extremely high performance level.

The second caveat for VMware and HP/Intel systems lies in the area that ESG brushed over as if it were a foregone conclusion, i.e. reliability.  Solitaire Interglobal examined data from over 40,000 customers and found that Linux based x86 systems suffer 3 times or more the system outages of Power Systems, and Windows based x86 systems up to 10 times more.  They also found radically higher outage durations for both Linux and Windows compared to Power, and much lower overall availability when looking at both planned and unplanned outages, in general: http://ibm.co/strategicOS and specifically in virtualized environments: http://ibm.co/virtualizationplatformmatters.  Furthermore, as noted in my post from late last year, https://saponpower.wordpress.com/2011/08/29/vsphere-5-0-compared-to-powervm/, VMware introduces a number of single points of failure when mission critical applications demand just the opposite, i.e. the elimination of single points of failure.

I am actually very happy to see this ESG white paper, as it has proven how poorly VMware scales for large workloads like SAP in ways that few other published studies have ever exposed.  Power Systems continues to set the bar very high when it comes to delivering effective virtualization for large and small SAP environments while offering outstanding, mission critical reliability.  As noted in https://saponpower.wordpress.com/2011/08/15/ibm-power-systems-compared-to-x86-for-sap-landscapes/, IBM does this while maintaining a similar or lower TCO when all production, HA and non-production systems, 3 years of 24x7x365 hardware maintenance, and licenses and 24x7x365 support for Enterprise Linux and vSphere 5.0 Enterprise Plus are included … and that analysis was done back when I did not have ESG’s lab report showing how poorly VMware scales.  I may have to revise my TCO estimates based on this new data.

October 23, 2012 Posted by | Uncategorized | , , , , , , , , , | 3 Comments

vSphere 5.0 compared to PowerVM

Until recently, VMware partitions suffered from a significant scalability limitation. Each partition could scale to a maximum of 8 virtual processors (vp) with vSphere 4.1 Enterprise Edition. For many customers and uses, this did not pose much of an issue, as some of the best candidates for x86 virtualization are the thousands of small, older servers which can easily fit within a single core of a modern Intel or AMD chip. For SAP customers, however, the story was often quite different. Eight vp does not equate to 8 cores; it equates to 8 processor threads. Starting with Nehalem, Intel offered HyperThreading, which allowed each core to run two different OS threads simultaneously. This feature boosted throughput, on average, by about 30%, and just about all benchmarks since that time have been run with HyperThreading enabled. Although it is possible to disable it, few customers elect to do so as it removes that 30% of increased throughput from the system. With HyperThreading enabled, 8 VMware vp utilize 4 cores/8 threads, which can be as little as 20% of the cores on a single chip. Put in simple terms, this can be as little as 5,000 SAPS depending on the version and MHz of the chip. Many SAP customers routinely run their current application servers at 5,000 to 10,000 SAPS, meaning moving these servers to VMware partitions would result in the dreaded hotspot, i.e. bad performance and a flood of calls to the help desk. By comparison, PowerVM (IBM’s Power Systems virtualization technology) partitions may scale as large as the underlying hardware allows and, if that limit is reached, may be migrated live to a larger server, assuming one exists in the cluster, allowing the partition to continue operating without interruption at a much larger size.
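The vp-to-capacity arithmetic above can be sketched as follows. Note that the 1,250 SAPS-per-core figure is purely illustrative, chosen because it reproduces the roughly 5,000 SAPS cited; it is not a published rating for any specific chip:

```python
def vm_capacity_saps(vcpus, saps_per_core, hyperthreading=True):
    """Rough SAPS ceiling for a VM, given its vcpu count.

    With HyperThreading, each physical core presents two threads,
    so 8 vp consume only 4 physical cores.
    """
    threads_per_core = 2 if hyperthreading else 1
    cores = vcpus / threads_per_core
    return cores * saps_per_core

# Hypothetical 1,250 SAPS/core x86 chip: an 8-vp VM tops out near 5,000 SAPS
print(vm_capacity_saps(8, 1250))  # 5000.0
```

An application server sized for 10,000 SAPS would therefore overflow such a VM by a factor of two, which is the hotspot scenario described above.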

 
VMware recently introduced vSphere 5.0. Among a long list of improvements is the ability to utilize 32 vp in a single partition. On the surface, this would seem to imply that VMware can scale to all but a very few large demands. Once you dig deeper, several factors emerge. As vSphere 5.0 is very new, there are few benchmarks and even less customer experience. There is no such thing as a linearly scalable server, despite benchmarks that seem to imply this, even from my own company; all systems have a scalability knee of the curve. While some workloads, e.g. AIM7, when tested by IBM showed up to 7.5 times the performance with 8 vp compared to 1 vp on a Xeon 5570 system with vSphere 4.0 update 1, it is worthwhile to note that this was only achieved when no other partitions were running, clearly not the way anyone would actually utilize VMware. In fact, one would expect just the opposite, that an overcommitment of CPU resources would be utilized to get the maximum throughput out of a system. On another test, DayTrader 2.0 in JDBC mode, a scalability maximum of 4.67 times the performance of a single thread was reached with 8 vp, once again while running no other VMs. It would be reasonable to assume that VMware has done some scaling optimization, but it would be premature and quite unlikely to assume that 32 vp will scale even remotely close to 4 times the performance of an 8 vp VM. When multiple VMs run at the same time, VMware overhead and thread contention may reduce effective scaling even further. For the time being, a wise customer would be well advised to wait until more evidence is presented before assuming that all scaling issues have been resolved.

But this is just one issue and, perhaps, not the most important one. SAP servers are, by their very nature, mission critical. For database servers, any downtime can have severe consequences. For application servers, depending on how customers implement their SAP landscapes and the cost of downtime, some outages may not have as large a consequence. It is important to note that when an application server fails, the context for each user’s session is lost. In a best case scenario, the users can recall all necessary details to re-run the transactions in flight after logging on to another application server. This means that the only loss is the productivity of that user multiplied by the number of users previously logged on and doing productive work on that server. Assuming 500 users and 5 minutes each to get logged back on and re-run their transactions through to completion, this is only 2,500 minutes of lost productivity, which at a loaded cost of $75,000 per employee is only a total loss to the company of $1,500 per occurrence. With one such occurrence per application server per year, this would result in $6,000 of cost over 5 years and should be included in any comparison of TCO. Of course, this does not take into consideration any IT staff time required to fix the server, any load on the help desk to help resolve issues, nor any political cost to IT if failures happen too frequently. But what happens if the users are unable to recall all of the details necessary to re-run the transactions, or what happens if tight integration with production requires that manufacturing be suspended until all users are able to get back to where they had been? The costs can escalate very quickly.

So, what is my point? All x86 hypervisors, including VMware 4.1 and 5.0, are software layers on top of the hardware. In the event of an uncorrectable error in the hardware, the hypervisor usually fails and, in turn, takes down all VMs that it is hosting. Furthermore, problems are not just confined to the CPU, but could be caused by memory, power supplies, fans or a large variety of other components. I/O is yet another critical issue. VMware provides shared I/O resources to partitions, but it does this sharing within the same hypervisor. A device driver error, physical card error or, in some cases, even an external error in a cable, for example, might result in a hypervisor critical error and a resulting outage. In other words, the hypervisor becomes a very large single point of failure. In order to avoid the sort of costs described above, most customers try to architect mission critical systems to reduce single points of failure, not introduce new ones.

PowerVM takes the opposite approach. First, it is implemented in hardware and firmware. As the name implies, hardware is hardened meaning it is inherently more reliable and far less code is required since many functions are built into the chip.

Second, PowerVM acts primarily as an elegant dispatcher. In other words, it decides which partition executes next in a given core, but then it gets out of the way and allows that partition to execute natively in that core with no hypervisor in the middle of it. This means that if an uncorrectable error were to occur, an exceedingly rare event for Power Systems due to the wide array of fault tolerant components not available in any x86 server, in most situations the error would be confined to a single core and the partition executing in that core at that moment.

Third, sharing of I/O is done through the use of a separate partition called the Virtual I/O (VIO) server. This is done to remove this code from the hypervisor, thereby making the hypervisor more resilient, and also to allow for extra redundancy. In most situations, IBM recommends that customers utilize more than one VIO server and spread I/O adapters across those servers with redundant virtual connections to each partition. This means that if an error were to occur in a VIO server, once again a very rare event, only that VIO server might fail; the other VIO servers would not, and there would be no impact on the hypervisor since it is not involved in the sharing of I/O at all. Furthermore, partitions would not fail since they would be multipathing virtual devices across more than one VIO server.

So even if VMware can scale beyond 8vp, the question is how much of your enterprise are you ready to place on a single x86 server? 500 users? 1,000 users? 5,000 users? Remember, 500 users calling the help desk at one time would result in long delays. 1,000 at the same time would result in many individuals not waiting and calling their LOB execs instead.

In the event that this is not quite enough of a reason to select Power and PowerVM over x86 with VMware, it is worthwhile to consider the security exposure differences. This has been covered already in a prior blog entry comparing Power to x86 servers, but is worth noting again. PowerVM has no known vulnerabilities according to the National Vulnerability Database, http://nvd.nist.gov. By comparison, a search on that web site for VMware results in 119 hits. Admittedly, this includes older versions as well as workstation versions, but it is clear that hackers have historically found weaknesses to exploit. VMware has introduced vShield with vSphere 5.0, a set of technologies intended to make VMware more secure, but it would be prudent to wait and see whether this closes all holes or opens new ones.

Also covered in the prior blog entry, the security of the hypervisor is only one piece of the equation. Equally, or perhaps more important is the security of the underlying OSs. Likewise, AIX is among the least vulnerable OSs with Linux and Windows having an order of magnitude more vulnerabilities. Also covered in that blog was a discussion about problem isolation, determination and vendor ownership of problems to drive them to successful resolution. With IBM, almost the entire stack is owned by IBM and supported for mission critical computing whereas with x86, the stack is a hodgepodge of vendors with different support agreements, capabilities and views on who should be responsible for a problem often resulting in finger pointing.

There is no question that VMware has tremendous potential for applications that are not mission critical as well as being an excellent fit for many non-production environments. For SAP, the very definition of mission critical, a more robust, more scalable and better secured environment is needed and Power Systems with PowerVM does an excellent job of delivering on these requirements.

Oh, I did not mention cost. With the new memory based pricing model for vSphere 5.0, applications such as SAP, which demand enormous quantities of memory, may easily exceed the new limits for memory pool size, forcing the purchase of additional VMware licenses. Those extra license costs, and their associated maintenance, can easily add enough cost that any remaining price difference between Power and x86 closes further, to the point of being almost meaningless.
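To illustrate how memory based pricing can bite, here is a hedged sketch. The per-license vRAM entitlement below is the figure widely reported for vSphere 5.0 Enterprise Plus at the time; treat it, and the 1 TB host, as illustrative assumptions rather than a quote from VMware's price list:

```python
import math

def extra_vsphere_licenses(total_vram_gb, sockets, entitlement_gb=96):
    """Licenses needed beyond the per-socket baseline under vRAM pricing.

    One license is normally bought per CPU socket; extra licenses are
    needed when pooled vRAM exceeds the combined entitlement.
    entitlement_gb is illustrative; check VMware's current terms.
    """
    licenses_needed = math.ceil(total_vram_gb / entitlement_gb)
    return max(0, licenses_needed - sockets)

# Hypothetical 4-socket SAP host configured with 1 TB of vRAM
print(extra_vsphere_licenses(1024, 4))  # 7 extra licenses beyond the 4 baseline
```

Nearly tripling the license count on a single memory-heavy host is exactly the kind of hidden cost that narrows any x86 price advantage.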

August 29, 2011 Posted by | Uncategorized | , , , , , , , , , , , , , , , , , , , , , | 5 Comments