IBM Power10 debuts with a new SAP Benchmark!
Today, SAP published a new SD 2-tier result for IBM’s soon-to-be-announced Power E1080.[i] First, the highlights:
- 174,000 SD Users
- 955,050 SAPS
- 120 cores
Wait, almost 1M SAPS with only 120 cores? HPE achieved 670,830 SAPS (122,300 users) with 224 cores on its Superdome Flex 280 with the Intel Xeon Platinum 8380H processor in January 2021.
This new result is almost 3 times the SAPS/core of HPE’s biggest and baddest system. (Funny note: autocorrect tried to change “baddest” to “saddest”.) It is also about 33% faster, on a per-core basis, than the previous Power E980 result published at the end of 2018. That is certainly not remarkable, since Intel’s per-core performance on this benchmark also increased about 69.5% since 2017 … sorry, missed the decimal, 0.695%. (Comparing two Dell 2-socket results, Intel 8180 & Intel 8380.)
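For anyone who wants to check the arithmetic, here is a minimal sketch using only the published figures quoted above:

```python
# Per-core comparison using the published SD 2-tier figures quoted above.
power_e1080 = {"saps": 955_050, "cores": 120}
superdome_flex_280 = {"saps": 670_830, "cores": 224}

for name, result in (("Power E1080", power_e1080),
                     ("Superdome Flex 280", superdome_flex_280)):
    print(f"{name}: {result['saps'] / result['cores']:,.0f} SAPS/core")

ratio = (power_e1080["saps"] / power_e1080["cores"]) / \
        (superdome_flex_280["saps"] / superdome_flex_280["cores"])
print(f"Per-core advantage: {ratio:.2f}x")  # roughly 2.7x, i.e. "almost 3 times"
```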
Clearly, IBM has moved the microarchitecture technology ball forward with a huge improvement in per-core performance. And that is significant, given that Intel appears to have given up on the microarchitecture game and focused instead on increasing the core count (now up to 40 per socket).
But isn’t the SD benchmark based on ECC 6.0 and primarily an app server benchmark? So do we really care about it when we are talking about SAP workloads? For that matter, isn’t HANA the name of the game now, and how can we correlate this result against HANA workloads?
Yes, and you can’t. I will answer the second question first. SAP rules forbid comparisons across different benchmarks, and for good reason; they don’t have the same logic, application code, database usage, memory dependency or anything else for that matter. But we will get to the impact on HANA a bit later in this blog.
The SD benchmark is rather removed from reality by its age, its dependence on an outdated interface (the old and much loved SAP GUI, not) and its reliance on old, non-HANA databases. Fun fact: since 2005, 96 results used MaxDB or Sybase, 155 used IBM Db2, 52 used Oracle, 413 used Microsoft SQL Server and 0 used HANA. And since application servers can easily scale across dozens of systems, the performance per core doesn’t really matter all that much; this equation usually boils down to $/SAPS.
At Hot Chips 2020, Bill Starke, IBM Power chief architect, and Brian Thompto, Power10 core architect, revealed a bunch of amazing speeds and feeds, including 2.25x the memory bandwidth per socket for Power10 vs. Power9.[ii] We know that HANA eats memory bandwidth for breakfast, lunch, dinner and all snacks in between. This new SD benchmark (and others that IBM will undoubtedly publish very soon) suggests that these new Power processors will be able to handle all workloads, including SAP HANA, with either fewer cores or with the same number of cores and tons of CPU cycles to spare.
It might be tempting to consider using a smaller Power10 system, but this is where the problem gets a bit sticky. HANA not only loves memory bandwidth, but unless you are going to provision a server with less memory than SAP recommends or use one of their tiered approaches, you still need the same quantity of memory regardless of server or microarchitecture. You could certainly reduce the number of cores per socket or go with slower chip speeds, and this might be a very good approach for reducing HANA system costs for a lot of customers. Another option to consider is using those spare cycles for something else; after all, HANA is supported by SAP for use with IBM PowerVM shared processor pools.
What other workloads might you use those cycles for? We could get into a big discussion about all sorts of other workloads, like AI, HPC, etc., but how about we keep this simple? How about the thing that the SD benchmark actually does test, application serving? Even with S/4HANA and Fiori, you still need application servers. And if you already purchased a server for HANA based on memory requirements and you have a ton of cycles left over, the $/SAPS for those application servers essentially goes toward $0! I have not priced an Intel server lately, but I am pretty certain that the price is not even remotely close to $0.
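To make the point concrete, here is a minimal sketch; the prices and SAPS figures in it are hypothetical placeholders of my own, not quotes or published ratings:

```python
# Hypothetical illustration: app-server SAPS served from spare cycles on a HANA
# server that was already purchased (sized by memory) carry no incremental
# hardware cost, unlike SAPS bought on additional x86 app servers.
spare_saps_on_hana_box = 200_000     # assumed leftover capacity on the HANA server
incremental_cost_power = 0           # the server is already paid for

x86_app_server_price = 25_000        # hypothetical price of one 2-socket server
x86_app_server_saps = 100_000        # hypothetical SAPS rating of that server

print("Incremental $/SAPS on the Power box :",
      incremental_cost_power / spare_saps_on_hana_box)
print("Incremental $/SAPS on new x86 servers:",
      round(x86_app_server_price / x86_app_server_saps, 2))
```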
For existing SAP on Power customers (both HANA and non-HANA), Power10 is going to be amazing, resulting in better performance, lower cost or both! For customers still trying to decide which type of system to use, I would strongly encourage performing a full landscape cost comparison, including production HANA and application servers, HA, non-prod and DR.
And as good as this news is for on-premises customers, cloud vendors that offer HANA on Power, such as IBM, Syntax and SAP, should be even more excited about how they can decrease their costs while offering better solutions to their customers with Power10.
[i] https://www.sap.com/dmc/exp/2018-benchmark-directory/#/sd
[ii] https://www.nextplatform.com/2020/08/18/ibm-brings-an-architecture-gun-to-a-chip-knife-fight/
POWER10 – Opening the door for new features and radical TCO reduction for SAP HANA customers
As predicted, IBM continued its cycle of introducing a new POWER chip every four years (with a set of incremental or “plus” enhancements usually arriving after two years).[i] POWER10, revealed today at Hot Chips, has already been described in copious detail by the major chip and technology publications.[ii] I really want to talk about how I believe this new chip may affect SAP HANA workloads, but please bear with me for a paragraph or so while I totally geek out over the technology.
POWER10, manufactured by Samsung using 7nm lithography, will feature up to 15 cores/chip, up from the current POWER9 maximum of 12 cores (although most systems have been shipping with chips with 11 cores or fewer enabled). IBM found, as did Intel, that manufacturing chips at increasingly smaller lithography with maximum GHz and maximum core counts was extremely difficult. This resulted in an expensive process with inevitable microscopic defects in a core or two, so they have intentionally designed a 16-core chip that will never ship with more than 15 cores enabled. This means that they can handle a reasonable number of manufacturing defects without a substantial impact to chip quality, simply by turning off whichever core does not pass their very stringent quality tests. This will take the maximum number of cores in all systems up proportionately based on the number of sockets in each system. If that were the only improvement, it would be nice, but not earth shattering, as every other chip vendor also usually introduces more or faster cores with each iteration.
Of course, POWER10 will have much more than that. Relative to POWER9, each core will feature 1.5 times the amount of L1 cache, 4x the L2 cache, 4x the TLB, and 2x general and 4x matrix SIMD. (Sorry, not going to explain each as I am already getting a lot more geeky than my audience typically prefers.) Each chip will feature over 4 times the memory bandwidth and 4x the interconnect bandwidth. Of course, if IBM only focused on CPU performance and interconnect and memory bandwidths, then the bottleneck would naturally just be pushed to the next logical points, i.e. memory and I/O. So, per a long-standing philosophy at IBM, they didn’t ignore these areas and instead built in support for both existing PCIe Gen 4 and emerging PCIe Gen 5, and both existing DDR4 and emerging DDR5 memory DIMMs, not to mention native support for future GDDR DIMMs. And if this were not enough, POWER10 includes all sorts of core microarchitecture improvements which I will most definitely not get into here, as most of my friends in the SAP world are probably already shaking their heads trying to understand the implications of the above improvements.
You might think with all of these enhancements, this chip would run so hot that you could heat an Olympic swimming pool with just one system, but due to a variety of process improvements, this chip actually runs at 3x the power efficiency of POWER9.
Current in-memory workloads, like SAP HANA, rarely face a performance issue due to computational performance. It is reasonable to ask whether this is because these workloads are designed around the current limitations of systems, avoiding computationally intense actions that would deliver unsatisfying performance to users who are increasingly impatient with any delay. HANA has kind of set the standard for rapid response, so delivering anything else would be seen as a failure. Is HANA holding back on new, but very costly (in computational terms) features that could be unleashed once a sufficiently fast CPU becomes available? If so, then POWER10 could be the catalyst for unleashing some incredible new HANA capabilities, if SAP takes advantage of this opportunity.
A related issue is that perhaps existing cores could deliver better performance, but memory has not been able to keep pace. Remember, 128GB DRAM DIMMs are still the maximum size accepted by SAP for HANA, even though 256GB DIMMs have been on the market for some time now. As SAP has long used internal benchmarks to determine ratios of memory to CPU, could POWER10 enable the use of much denser memory to drive down the number of sockets required to support HANA workloads, thereby decreasing TCO or enabling more server consolidation? Remember, Power Systems have always featured embedded, hardware/hypervisor-based virtualization, so adding workloads to a system is just a matter of harnessing any unused extra capacity.
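A rough, back-of-the-envelope sketch of that socket-count argument follows; the DIMM-slots-per-socket figure and the HANA memory requirement are assumptions of mine for illustration, not specifications:

```python
import math

# Illustration of how denser DIMMs could reduce socket count for a fixed HANA
# memory requirement. Both figures below are assumed values, not specs.
hana_memory_tb = 12            # assumed HANA memory requirement
dimm_slots_per_socket = 16     # assumed DIMM slots available per socket

for dimm_gb in (128, 256):
    tb_per_socket = dimm_slots_per_socket * dimm_gb / 1024
    sockets_needed = math.ceil(hana_memory_tb / tb_per_socket)
    print(f"{dimm_gb} GB DIMMs: {tb_per_socket:.0f} TB/socket -> "
          f"{sockets_needed} sockets for {hana_memory_tb} TB")
```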
IBM has not released any performance projections for HANA running on POWER10 but has provided some unrelated number crunching and AI projections. Based on those plus the raw and incredibly impressive improvements in both the microarchitecture and number of cores, dramatic cache and TLB increases and gigantic memory and interconnect bandwidth expansions, I predict that each socket will support 2 to 3 times the size of HANA workloads that are possible today (assuming sufficient memory is installed).
In the short term, I expect that customers will be able to utilize TDI 5 relaxed sizing rules to use much larger DIMMs and amounts of memory per POWER10 system to accomplish two goals. Customers with relatively small HANA systems will be able to cut in half the number of sockets required compared to existing HANA systems. Customers that currently have large numbers of systems will be able to consolidate those into many fewer POWER10 systems. Either way, customers will see dramatic reductions in TCO using POWER10.
As to those customers that often run out of memory before they run out of CPU, stay tuned for part 2 of this blog post as we discuss perhaps the most exciting new innovation with POWER10, Memory Clustering.
[i] https://twitter.com/IBMPowerSystems/status/1295382644402917376
https://www.linkedin.com/posts/ibm-systems_power10-activity-6701148328098369536-jU9l
https://twitter.com/IBMNews/status/1295361283307581442
[ii] Forbes – https://www.forbes.com/sites/moorinsights/2020/08/17/ibm-discloses-new-power10-processer-at-hot-chips-conference/#12e7a5814b31
VentureBeat – https://venturebeat.com/2020/08/16/ibm-unveils-power10-processor-for-big-data-analytics-and-ai-workloads/
Reuters – https://www.reuters.com/article/us-ibm-samsung-elec/ibm-rolls-out-newest-processor-chip-taps-samsung-for-manufacturing-idUSKCN25D09L
IT Jungle – https://www.itjungle.com/2020/08/17/power-to-the-tenth-power/
SAP increases support for HANA on Power, now up to 16 concurrent production VMs with IBM PowerVM
On March 1, 2019, SAP updated SAP Note 2230704 – SAP HANA on IBM Power Systems with multiple LPARs per physical host. Previously, up to 8 concurrent HANA production VMs were supported on the Power E880 system with 16 sockets. Now, the new POWER9-based E980, also with 16 sockets, is supported with up to 16 concurrent HANA production VMs. As was the case prior to this update, each VM must have a minimum of 4 cores and 128GB of memory and can grow to as large as 16TB for OLAP and 24TB for OLTP. The maximum VM count may be reduced by 1 if a shared pool is desired for one or more non-production HANA, any other SAP or non-SAP workloads. There is no restriction on the number of VMs that can run in a shared pool from an SAP perspective, but practical physical limits are usually hit before any PowerVM architectural limits. CPU capacity not used by the production VMs may be shared, temporarily, with VMs in the shared pool using a proprietary technology called dedicated-donating, where the production VM, which owns the CPU capacity, may loan part of it to the shared pool and get it back immediately when needed for that production workload.
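As a quick sanity check on those limits, here is a minimal sketch; the per-socket core count and installed memory are placeholder assumptions for illustration, not E980 specifications:

```python
# Feasibility check: do 16 production VMs at SAP's stated minimums fit?
vms = 16
min_cores_per_vm = 4           # SAP minimum per production HANA VM
min_mem_gb_per_vm = 128        # SAP minimum per production HANA VM

sockets = 16
cores_per_socket = 12          # assumed for illustration
installed_mem_tb = 32          # assumed for illustration

print(f"Cores needed : {vms * min_cores_per_vm} of {sockets * cores_per_socket}")
print(f"Memory needed: {vms * min_mem_gb_per_vm} GB of {installed_mem_tb * 1024} GB")
```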
Most customers were quite happy with 8 concurrent VMs, so why should anyone care about 16? It turns out that some customers have really complex landscapes. I recently had discussions with a customer that has around 12 current and planned production HANA instances. They were debating whether to use HANA in a multi-tenant configuration. The problem is that all HANA tenants in a multi-tenant VM are tightly bound, i.e. when the VM, OS or SAP software needs to be updated or reconfigured, all tenants are affected simultaneously. While not impossible to deal with, this introduces operational complexity. If those same 12 instances were placed in separate VMs on a new POWER server, that operational complexity could be eliminated.
As much as this might benefit the edge customers with a large number of instances, it will really benefit cloud vendors that utilize Power, giving them greater flexibility, more sharing of resources and lower management and infrastructure costs. Also, isolation between cloud clients is essential, so multi-tenancy is rarely an effective option. PowerVM, on the other hand, offers very strong isolation, so this is an excellent option for cloud providers even when different clients share the same infrastructure.
This announcement also closes a perceived gap where VMware could already run up to 16 concurrent VMs on an 8-socket system. The caveat to that support was that, when running at the 16-VM level, the minimum and maximum size of each VM was half a socket. Of course, you could play the mix-and-match game with some VMs at the half-socket level and others at the full or multi-socket level, but neither of these options provides very good granularity. For systems with 28-core sockets, the granularity per VM is 14 cores, 28 cores and then multiples of 28 cores up to 112 cores. For VMs configured at half a socket, if there is no other workload to consume the other half socket, then that capacity is simply wasted. Memory, likewise, has some granularity limitations. According to VMware’s Best Practice Guide, “When an SAP HANA VM needs to be larger than a single NUMA node, allocate all resource of this NUMA node; and to maintain a high CPU cache hit ratio, never share a NUMA node for production use cases with other workloads. Also, avoid allocation of more memory than a single NUMA node has connected to it because this would force it to access memory that is not local to the SAP HANA processes scheduled on the NUMA node.” In other words, any memory not consumed by the HANA VM(s) on a particular socket/node is simply wasted, since other nodes should not utilize memory across nodes.
By comparison, production HANA workloads running on PowerVM may be adjusted by 1 core at a time with memory granularity measured in MB, not GB or TB.
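A small sketch of what that granularity difference means in practice, using the 28-core socket example above (the core requirements below are arbitrary examples, and the rounding-up behavior is my reading of the allowed sizes):

```python
import math

# Allowed VMware allocation sizes on a 28-core socket, per the rules above:
# half a socket (14 cores), one socket (28), then whole-socket multiples.
def vmware_cores_granted(cores_needed, cores_per_socket=28):
    half_socket = cores_per_socket // 2
    if cores_needed <= half_socket:
        return half_socket
    return math.ceil(cores_needed / cores_per_socket) * cores_per_socket

def powervm_cores_granted(cores_needed):
    return cores_needed            # adjustable one core at a time

for need in (10, 20, 40, 60):
    print(f"need {need:>2} cores -> VMware allocates {vmware_cores_granted(need):>3}, "
          f"PowerVM allocates {powervm_cores_granted(need):>3}")
```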
In an upcoming blog post, I will give some practical examples of landscapes and how PowerVM and VMware virtualization would utilize resources.
With this enhanced level of support from SAP, IBM Power Systems with PowerVM is once again the clear leader in terms of virtualization for HANA environments.
What should you do when your LoBs say they are not ready for S/4HANA – part 2, Why choice of Infrastructure matters
Conversions to S/4HANA usually take place over several months and involve dozens to hundreds of steps. With proper care and planning, these projects can run on time, within budget and result in a final Go-Live that is smooth and occurs within an acceptable outage window. Alternately, horror stories abound of projects delayed, errors made, outages far beyond what was considered acceptable by the business, etc. The choice of infrastructure to support a conversion may be the last thing on anyone’s mind, but it can have a dramatic impact on achieving a successful outcome.
A conversion includes running many pre-checks[i], which can run for quite a while[ii], meaning they can drive CPU utilization to high levels for a significant duration and impact other running workloads. As a result, consultants routinely recommend that you make a copy of any system, especially production, against which you will run these pre-checks. SAP recommends that you run these pre-checks against every system to be converted, e.g. Development, Test, Sandbox, QA, etc. If those systems are being used for on-going work, it may be advisable to also make copies of them and run those copies on other systems or within virtual machines which can be limited to avoid causing performance issues with other co-resident virtual machines.
In order to find issues and correct them, conversion efforts usually involve a phased approach with multiple conversions of support systems, e.g. Dev, Test, Sandbox, QA, using a tool such as SAP’s Software Update Manager with the Database Migration Option (SUM w/DMO). One of the goals of each run is to figure out how long it will take and what actions need to be taken to ensure the Go-Live production conversion completes within the required outage window, including any post-processing, performance tuning, validation and backups.
In an attempt to keep expenses low, many customers will choose to use existing systems or VMs in addition to a new “target” system, or systems if HA is to be tested. This means that the customer’s network will likely be used in support of these connections. Taken together, the use of shared infrastructure components means that events among those shared components can impact these tests and activities. For example, if a VM is used but not enough CPU or network bandwidth is provided, the duration of the test may extend well beyond what was planned, meaning more cost for per-hour consulting, and the test may not provide the insight into what needs to be fixed and how long the actual migration may take. How about if you have plenty of CPU capacity, or even a dedicated system, but the backup group decides to initiate a large database backup at the same time and on the same network that your migration test is using? Or maybe you decide to run a test at a time when another group, e.g. operations, needs to test something that impacts it, or when new equipment or firmware is being installed and modifications to shared infrastructure are occurring, etc. Of course, you can have good change management and carefully arrange when your conversion tests will occur, which means that you may have restricted windows of opportunity at times that are not always convenient for your team.
Let’s not forget that the application/database conversion is only one part of a successful conversion. Functional validation tests are often required, which could overwhelm limited infrastructure or take it away from parallel conversion tasks. Other easily overlooked but critical tasks include ensuring all necessary interfaces work; that third-party middleware and applications install and operate correctly; that backups can be taken and recovered; that HA systems are tested with acceptable RPO and RTO; and that DR is set up and running properly, also with acceptable RPO and RTO. And since this will be a new application suite with different business processes and a brand new Fiori interface, training most likely will be required as well.
So, how can the choice of infrastructure make a difference to this almost overwhelming set of issues and requirements? It comes down to flexibility. Infrastructure which is built on virtualization allows many of these challenges to be easily addressed. I will use an existing IBM Power Systems customer running Oracle, DB2 or Sybase to demonstrate how this would work.
The first issue dealt with running pre-checks on existing systems. If those existing systems are Power Systems and enough excess capacity is available, PowerVM, the IBM virtualization hypervisor, allows a VM to be started with an exact copy of production, either passed through normal post-processing such as BDLS to create a non-production copy, or cloned and placed behind a network firewall. This VM could be fenced off logically and throttled such that production running on the same system would always be given preference for CPU resources. By comparison, a similar database located on an x86 system would likely not be able to use this process, as the database is usually running on a bare-metal system for which no VM can be created.
Alternately, for Power Systems, the exact same process could be utilized to carve out a VM on a new HANA target system and this is where the real value starts to emerge. Once a copy or clone is available on the target HANA on Power system, as much capacity can be allocated to the various pre-checks and related tasks as needed without any concern for the impact on production or the need to throttle these processes thereby optimizing the duration of these tasks. On the same system, HANA target VMs may be created. As mock conversions take place, an internal virtual network may be utilized. Not only is such a network faster by a factor of 2 or more, but it is dedicated to this single purpose and completely unaffected by anything else going on within the datacenter network. No coordination is required beyond the conversion team which means that there is no externally imposed delay to begin a test or constraints on how long such a test may take or, for that matter, how many times such a test may be run.
The story only gets better. Remember, SAP suggests you run these pre-checks on all non-prod landscapes. With PowerVM, you may fire up any number of different copy/clone VMs and/or HANA VMs. As you move from one phase to the next, from one instance to the next, from conversion to production, run a static validation environment while other tasks continue, conduct training classes, or run many different phases for different projects at the same time, PowerVM enables the system to respond to your changing requirements. This helps avoid the need to purchase extra interim systems, letting you buy when needed rather than significantly ahead of time due to the inflexibility of other platforms. You can even simulate an HA environment to allow you to test your HA strategy without needing a second system, up to the physical limits of the system, of course. This is where a tool like SAP’s TDMS, Test Data Migration Server, might come in very handy.
And when it comes time for the actual Go-Live conversion, the running production database VM may be moved live from the “old” system to the “new” system without any downtime, and the migration may then proceed using the virtual, in-memory network at the fastest possible speed and with all external factors removed. Of course, if the “old” system is based on POWER8, it may then be used/upgraded for other HANA purposes. Prior generations of Power Systems, as well as current-generation POWER8 systems, can be used for a wide variety of other purposes, both SAP and non-SAP.
Bottom line: The choice of infrastructure can help you eliminate external influences that cause delays and complications to your conversion project, optimize your spend on infrastructure, and deliver the best possible throughput and shortest outage window when it comes to the Go-Live cut-over. If complete control over your conversion timeline were not enough, avoidance of delays keeps costs for non-fixed-cost resources to a minimum. For any SAP customer not using Power Systems today, this flexibility can provide enormous benefits; however, the process of moving between systems would be somewhat different. For any existing Power Systems customer, this flexibility makes a move to HANA on Power Systems almost a no-brainer, especially since IBM has so effectively removed TCA as a barrier to adoption.
[i] https://blogs.sap.com/2017/01/20/system-conversion-to-s4hana-1610-part-2-pre-checks/
[ii] https://uacp.hana.ondemand.com/http.svc/rc/PRODUCTION/pdfe68bfa55e988410ee10000000a441470/1511%20001/en-US/CONV_OP1511_FPS01.pdf page 19
SAP HANA on Power support expands dramatically
SAP’s release of HANA SPS11 marks a critical milestone for SAP/IBM customers. About a year ago, I wrote that there was Hope for HoP, HANA on Power. Some considered this wishful thinking, little more than a match struck in the Windy City. In August, that hope became a pilot light with SAP’s announcement of General Availability of Scale-up BW HANA running on the Power Systems platform. Still, the doubters questioned whether Power could make a dent in a field already populated by dozens of x86 vendors with hundreds of supported appliances and thousands of installed customers. With almost 1 new customer per business day deciding to implement HANA on Power since that time, the pilot light has quickly evolved into a nice strong flame on a stove.
In November 2015, SAP unleashed a large assortment of support for HoP. First, they released first-of-a-kind support for running more than one production instance using virtualization on a single system.[1] For those that don’t recall, SAP limits systems running HANA in production on VMware to one[2], count that as 1, total VM on the entire system. Yes, non-prod can utilize VMware to its heart’s content, but is it wise to mess with best practices and utilize different stacks for prod and non-prod, much less deal with restrictions that limit the number of virtual processors to 64, i.e. 32 real processors not counting VMware overhead, and memory to 1TB? Power now supports up to 4 resource pools on E870 and E880 systems and 3 on systems below this level. One of those resource pools can be a “shared pool” supporting many VMs of any kind and any supported OS, as long as none of them run production HANA instances. Any production HANA instance must run in a dedicated or dedicated-donating partition: when production HANA needs CPU resources, it gets them without any negotiation or delay, but when it does not require all of its resources, it allows partitions in the shared pool to utilize the unused capacity. This is ideal for HANA, which is often characterized by wide variations in load, often low utilization and very low utilization on non-prod, HA and DR systems, resulting in much better flexibility and resource utilization (read that as reduced cost).
But SAP did not stop there. Right before the US Thanksgiving holiday, SAP released support for running HANA on Power with Business Suite, specifically ERP 6.0 EHP7, CRM 7.0 EHP3 and SRM 7.0 EHP3, as well as SAP Landscape Transformation Replication Server 2.0, HANA dynamic tiering, BusinessObjects Business Intelligence platform 4.1 SP03, HANA smart data integration 1.0 SP02, HANA spatial SPS 11, controlled availability of BPC[3] and scale-out BW[4] using the TDI model with up to 16 nodes. SAP plans to update the application support note as each additional application passes customer and/or internal tests, with support rolling out rapidly in the next few months.
Not enough? Well, SAP took the next step and increased the memory-per-core ratio on high-end systems, i.e. the E870 and E880, to 50GB/core for BW workloads, thereby increasing the total memory supported in a scale-up configuration to 4.8TB.[5]
What does this mean for SAP customers? It means that the long wait is over. Finally, a robust, reliable, scalable and flexible platform is available to support a wide variety of HANA environments, especially those considered to be mission critical. Those customers that were waiting for a bet-your-business solution need wait no more. In short order, the match jumped to a pilot light, then a flame to a full cooktop. Just wait until S/4HANA, SCM and LiveCache are supported on HoP, likely not a long wait at this rate, and the flame will have jumped to one of those jet burners used for crawfish boils in my old hometown of New Orleans! Sorry, did I push the metaphor too far? 🙂
[1] 2230704 – SAP HANA on IBM Power Systems with multiple LPARs per physical host
[2] 1995460 – Single SAP HANA VM on VMware vSphere in production
[3] 2218464 – Supported products when running SAP HANA on IBM Power Systems and http://news.sap.com/customers-choose-sap-hana-to-run-their-business/
[4] BW Scale-out support restriction that was previously present has been removed from 2133369 – SAP HANA on IBM Power Systems: Central Release Note for SPS 09 and SPS 10
[5] 2188482 – SAP HANA on IBM Power Systems: Allowed Hardware
HoP keeps Hopping Forward – GA Announcement and New IBM Solution Editions for SAP HANA
Almost two years ago, I speculated about the potential value of a HANA on Power solution. In June 2014, SAP announced a Test and Evaluation program for scale-up BW HANA on Power. That program shifted into high gear in October 2014, and roughly 10 customers got to start kicking the tires on this solution. Those customers had the opportunity to push HANA to its very limits. Remember, where Intel systems have 2 threads per core, POWER8 has up to 8 threads per core. Where the maximum size of most conventional Intel systems tops out at 240 threads, the IBM Power E870 can scale to an impressive 640 threads and the E880 system can scale to 1,536 threads. This means that IBM is able to provide an invaluable test bed for system scalability to SAP. As SAP’s largest customers move toward Suite “4” HANA (S/4HANA), they need to have confidence in the scalability of HANA, and IBM is leading the way in proving this capability.
A ramp-up program began in March, with approximately 25 customers around the world being given the opportunity to access GA-level code and start to build out BW POC and production environments. This brings us forward to the announcement by SAP this week @ SapphireNow in Orlando of the GA of HANA on Power. SAP announced that customers will have the option of choosing Power for their BW HANA platform, initially to be used in scale-up mode, with plans to support scale-out BW, Suite on HANA and the full complement of side-car applications over the next 12 to 18 months.
Even the most loyal IBM customer knows the comparative value of other BW HANA solutions already available on the market. To this end, IBM announced new “solution editions”. A solution edition is simply a packaging of components, often with special pricing, to match expectations of the industry for a specific type of solution. “Sounds like an appliance to me,” says the guy with a Monty Python type of accent and intonation (no, I am not making fun of the English and am, in fact, a huge fan of Cleese and company). True, if one were to look only at the headline and ignore the details. In reality, IBM is looking at these as starting points, not end points, and most certainly not as any sort of implied limitation. Remember, IBM Power Systems are based on the concept of logical partitions using PowerVM virtualization. As a result, a Power “box” is simply that, a physical container within which one or multiple logical systems reside, and the size of each “system” is completely arbitrary, based on customer requirements.
So, a “solution edition” simply defines a base configuration designed to be price competitive with the industry while allowing customers to flexibly define “systems” within it to meet their specific requirements and add incremental capability above that minimum as is appropriate for their business needs. While a conventional x86 system might have 1TB of memory to support a system that requires 768GB, leaving the rest unutilized, a Power System provides for that 768GB system and allows the rest of the memory to be allocated to other virtual machines. Likewise, HANA is often characterized by periods of 100% utilization, in support of the instantaneous response time demanded of ad-hoc queries, followed by unfathomably long periods (in computer terms) of little to no activity. Many customers might consider this a waste of valuable computing resources and look forward to being able to harness it for the myriad other business purposes that their businesses actually depend on. This is the promise of Power. Put another way, the appliance model results in islands of automation like we saw in the 1990s, whereas Power continues the model of server consolidation and virtualization that has become the modus operandi of the 2000s.
But, says the pitchman for a made-for-TV product, if you call right now, we will double the offer. If you believe that, then you are probably not reading my blog. If a product were that good, they would not have to give you more for the same price. Power, on the other hand, takes a different approach. Where conventional BW HANA systems offer a maximum size of 2TB for a single node, Power has no such inherent limitation. To handle larger sizes, conventional systems must “scale out” using a variety of techniques, with potentially significantly increased costs and complexity. Power offers the potential to simply “scale up”. Future IBM Power solutions may be able to scale up to 4TB, 8TB or even 16TB. In a recent post to this blog, I explained that to match the built-in redundancy and mission-critical memory reliability of Power, x86 systems would require memory mirroring, with twice the amount of memory and an associated increase in CPU utilization and reduction in memory bandwidth. SAP is pushing the concepts of MCOS, MCOD and multi-tenancy, meaning that customers are likely to have even more of their workloads consolidated on fewer systems in the future. This will result in demand for very large scaling systems with unprecedented levels of availability. Only IBM is in position to deliver systems that meet this requirement in the near future.
Details on these solution editions can be found at http://www-03.ibm.com/systems/power/hardware/sod.html
In the last few days, IBM and other organizations have published information about the solution editions and the value of HANA on Power. Here are some sites worth visiting:
Press Release: IBM Unveils Power Systems Solutions to Support SAP HANA
Video: The Next Chapter in IBM and SAP Innovation: Doug Balog announces SAP HANA on POWER8
Case study: Technische Universität München offers fast, simple and smart hosting services with SAP and IBM
Video: Technische Universität München meet customer expectations with SAP HANA on IBM POWER8
Analyst paper: IBM: Empowering SAP HANA Customers and Use Cases
Article: HANA On Power Marches Toward GA
Selected SAP Press
ComputerWorld: IBM’s new Power Systems servers are just made for SAP Hana
eWEEK, IBM Launches Power Systems for SAP HANA
ExecutiveBiz, IBM Launches Power Systems Servers for SAP Hana Database System; Doug Balog Comments
TechEYE.net, IBM and SAP work together again
ZDNet: IBM challenges Intel for space on SAP HANA
Data Center Knowledge: IBM Stakes POWER8 Claim to SAP Hana Hardware Market
Enterprise Times: IBM gives SAP HANA a POWER8 boost
The Platform: IBM Scales Up Power8 Iron, Targets In-Memory
Also a planning guide for HANA on Power has been published at http://www-03.ibm.com/support/techdocs/atsmastr.nsf/WebIndex/WP102502 .
How to ensure Business Suite on HANA infrastructure is mission critical ready
Companies that plan on running Business Suite on HANA (SoH) require systems that are at least as fault tolerant as their current mission critical database systems. Actually, the case can be made that these systems have to exceed current reliability design specifications due to the intrinsic conditions of HANA, most notably, but not limited to, extremely large memory sizes. Other factors that will further exacerbate this include MCOD, MCOS, Virtualization and the new SPS09 feature, Multi-Tenancy.
A customer with 5TB of data in their current uncompressed Suite database will most likely see a reduction due to HANA compression (SAP Note 1793345 and the HANA cookbook²), bringing their system size, including HANA work space, to roughly 3TB. That same customer may have previously been using a database buffer of 100GB +/- 50GB. At a current buffer size of 100GB, their new HANA system will require 30 times the amount of memory that the conventional database did. All else being equal, 30x of any component will result in 30x the failures. In 2009, Google engineers wrote a white paper in which they noted that 8% of DIMMs experienced errors every year, with most being hard errors, and that when a correctable error occurred in a DIMM, there was a much higher chance that another would occur in that same DIMM, leading, potentially, to uncorrectable errors.¹ As memory technology has not changed much since then, other than getting denser, which could make errors from cosmic rays and other sources even more likely, the risk has likely not decreased. As a result, unless companies wish to take chances with their most critical asset, they should elect to use the most reliable memory available.
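The arithmetic behind that 30x figure is simple enough to show, using only the numbers quoted above:

```python
# Memory that must now hold (and protect) critical data: the whole compressed
# database plus work space, versus the old 100GB database buffer.
hana_in_memory_gb = 3000              # ~3TB HANA size from the example above
conventional_buffer_gb = 100          # typical buffer size from the example above

growth = hana_in_memory_gb / conventional_buffer_gb
print(f"In-memory footprint grows by roughly {growth:.0f}x")
# All else being equal, ~30x the DIMMs implies ~30x the exposure to DIMM errors.
```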
IBM provides exactly that, best-of-breed open systems memory reliability, not as an option at a higher cost, but included with every POWER8 system, from the one- and two-socket scale-out systems to even more advanced capabilities in the 4- and 8-socket systems, some of which will scale to 16 sockets (announced as a Statement of Direction for 2015). This memory protection is represented in multiple discrete features that work together to deliver unprecedented reliability. The following gets into quite a bit of technical detail, so if you don’t have your geek hat on (mine can’t be removed as it was bonded to my head when I was reading Heinlein in 6th grade; yes, I know that dates me), then you may want to jump to the conclusions at the end.
Chipkill – Essentially a RAID-like technology that spans data and ECC recovery information across multiple memory chips such that, in the event of a chip failure, operations may continue without interruption. Using x8 chips, Chipkill provides Single Device Data Correction (SDDC), and with x4 chips, it provides Double Device Data Correction (DDDC) due to the way in which data and ECC are spread across more chips simultaneously.
Spare DRAM modules – Each rank of memory (4 ranks per card on scale-out systems, 8 ranks per card on enterprise systems) contains an extra memory chip. This chip is used to automatically rebuild the data previously held on the failed chip in the above scenario. This happens transparently and automatically. The effect is two-fold: one, once the recovery is complete, no additional processing is required to perform Chipkill recovery, allowing performance to return to pre-failure levels; two, maintenance may be deferred as desired by the customer, as Chipkill can, yet again, allow for uninterrupted operations in the event of a second memory chip failure, and, in fact, IBM does not even make a callout for repair until a second chip fails.
Dynamic memory migration and hypervisor memory mirroring – These are unique technologies only available on IBM’s enterprise E870 and E880 systems. In the event that a DIMM experiences errors that cannot be permanently corrected using the sparing capability, the DIMM is called out for replacement. If the ECC is capable of continuing to correct the errors, the callout is known as a predictive callout, indicating the possibility of a future failure. In such cases, if an E870 or E880 has unlicensed or unassigned DIMMs with sufficient capacity, logical memory blocks using memory from the predictively failing DIMM will be dynamically migrated to the spare/unused capacity. When this is successful, the system can continue to operate until the failing DIMM is replaced, without concern as to whether the failing DIMM might cause any future uncorrectable error. Hypervisor memory mirroring is a selective mirroring technology for the memory used by the hypervisor, which means that even a triple chip failure in a memory DIMM would not affect the operations of the hypervisor, as it would simply start using the mirror.
L4 cache – Instead of conventional parity or ECC protected memory buffers used by other vendors, IBM utilizes special eDRAM (a more reliable technology to start with) which not only offers dramatically better performance but includes advanced techniques to delete cache lines for persistent recoverable and non-recoverable fault scenarios as well as to deallocate portions of the cache spanning multiple cache lines.
Extra memory lane – The connection to memory DIMMs or cards is made up of dozens of “lanes”, which we can see visually as “pins”. POWER8 systems feature an extra lane on each POWER8 chip. In the event of an error, the system will attempt to retry the transfer and use ECC correction, and if the error is determined by the service processor to be a hard error (as opposed to a soft/transient error), the system can deallocate the failing lane and allocate the spare lane to take its place. As a result, no downtime is incurred, and planned maintenance may be scheduled at a time that is convenient for the customer, since all lanes, including the “replaced” one, are still fully protected by ECC.
L2 and L3 caches likewise have an array of protection technologies, including both cache line delete and cache column repair, in addition to ECC and special hardening called “soft latches” which makes these caches less susceptible to soft error events.
As readers of my blog know, I rarely point out only one side of the equation without the other and in this case, the contrast to existing HANA capable systems could not be more dramatic making the symbol between the two sides a very big > symbol; details to follow.
Intel offers a variety of protection technologies for memory but leaves the decision as to which to employ up to customers. This ranges from “performance mode” which has the least protection to “RAS mode” which has more protection at the cost of reduced performance.
Let’s start with the exclusives for IBM: eDRAM L4 cache with its inherent superior protection and performance over conventional memory buffer chips, dynamic memory migration and hypervisor memory mirroring available on IBM Enterprise class servers, none of which are available in any form on x86 servers. If these were the only advantages for Power Systems, this would already be compelling for mission critical systems, but this is only the start:
Lockstep – Intel includes technology similar to Chipkill in all of their chips, which they call Lockstep. Lockstep utilizes two DIMMs behind a single memory buffer chip, instead of the standard single DIMM, to store a 64-byte cache line plus ECC data, providing 1x or 2x 8-bit error detection and 8-bit error correction within a single x8 or x4 DRAM respectively (with x4 modules, this is known as Double Device Data Correction or DDDC and is similar to standard POWER Chipkill with x4 modules). Lockstep is only available in RAS mode, which incurs a penalty relative to performance mode. Fujitsu released a performance white paper³ describing the results of a memory bandwidth benchmark called STREAM, in which Lockstep memory ran at only 57% of the speed of performance mode memory.
Lockstep is certainly an improvement over standard or performance mode in that most single device events can be corrected on the fly (and two such events serially for x4 DIMMs), but correction incurs a performance penalty above and beyond that incurred from being in Lockstep mode in the first place. After the first such failure, for x8 DIMMs, the system cannot withstand a second failure in that Lockstep pair of DIMMs, and a callout for repair (read this as: make a planned shutdown as soon as possible) must be made to prevent a second and fatal error. For x4 DIMMs, assuming the performance penalty is acceptable, the planned shutdown could be postponed to a more convenient time. Remember, with the POWER spare DRAMs, no such immediate action is required.
Memory sparing – Since taking an emergency shutdown is unacceptable for an SoH system, Lockstep memory is insufficient on its own: it handles the emergency situation but does not eliminate the need for a repair action (as the POWER memory spare does), and it incurs a performance penalty due to having to “lash” together two cards to act as one (compared to POWER, which achieves superior reliability with a single memory card). Some x86 systems therefore offer memory sparing, in which one rank per memory channel is configured as a spare. For instance, with the Lenovo System x x3850, each memory channel supports 3 DIMMs or ranks. If sparing is used, the effective memory capacity of the system is reduced by 1/3, since one of every 3 DIMMs is no longer available for normal operations, and the memory that must be purchased is increased by 50%. In other words, 1TB of usable memory requires 1.5TB of installed memory. The downside of sparing is that it is a predictive failure technology, not a reactive one. According to the IBM X6 Servers: Technical Overview Redbook, “Sparing provides a degree of redundancy in the memory subsystem, but not to the extent of mirroring. In contrast to mirroring, sparing leaves more memory for the operating system. In sparing mode, the trigger for failover is a preset threshold of correctable errors. When this threshold is reached, the content is copied to its spare. The failed rank is then taken offline, and the spare counterpart is activated for use.” In other words, this works best when you can see a failure coming, not after a part of the memory has already failed. When I asked a gentleman manning the Lenovo booth at TechEd && d-code about sparing, he first looked at me as if I had a horn sticking out of my head and then replied that almost no one uses this technology. Now, I think I understand why. This is a good option, but it comes at a high cost and still falls short of POWER8 memory protection, which is both predictive and reactive and dynamically responds to unforeseen events. By comparison, memory sparing requires a threshold to be reached and then enough time to complete a full rank copy, even if only a single chip is showing signs of imminent failure.
Memory mirroring – This technology utilizes a complete second set of memory channels and DIMMs to maintain a second copy of memory at all times. This allows a chip or an entire DIMM to fail with no loss of data, as the second copy immediately takes over. This option, however, requires that you double the amount of memory in the system, spend plenty of system overhead keeping the pairs synchronized and give up half of the memory bandwidth (which goes to the copy). This option may perform better than the memory sparing option because reads occur from both copies in an interleaved manner, but writes have to occur to both synchronously.
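To put the two x86 options side by side in terms of what you actually have to buy, here is a minimal sketch using the overhead ratios described above:

```python
# Installed memory required for a given usable capacity under each x86 scheme,
# using the ratios described above (1 spare rank in 3, or a full mirror copy).
usable_tb = 1.0

sparing_installed_tb = usable_tb * 1.5     # one of every three DIMMs held as spare
mirroring_installed_tb = usable_tb * 2.0   # a complete second copy of all memory

print(f"Sparing  : install {sparing_installed_tb:.1f} TB to use {usable_tb:.1f} TB")
print(f"Mirroring: install {mirroring_installed_tb:.1f} TB to use {usable_tb:.1f} TB")
```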
Conclusions:
Memory mirroring for x86 systems is the closest option to the continuous memory availability that POWER8 delivers. Of course, having to purchase 2TB of memory in order to have proper protection of 1TB of effective memory adds a significant cost to the system and takes away substantial memory bandwidth. HANA utilizes memory as few other systems do.
The problem is that x86 vendors won’t tell customers this. Why? Now, I can only speculate, but that is why I have a blog. The x86 market is extremely competitive. Most customers ask multiple vendors to bid on HANA opportunities. It would put a vendor at a disadvantage to include this sort of option if the customer has not required it of all vendors. In turn, x86 vendors don’t want to even insinuate that they might need such additional protection, as that would imply a lack of reliability to meet mission-critical standards.
So, let’s take this to the next logical step. If a company is planning on implementing SoH using the above protection, they will need to double their real memory. Many customers will need 4TB, 8TB or even some in the 12TB to 16TB range, with a few even larger. For the 4TB example, an 8TB system would be required which, as of the writing of this blog post, is not currently certified by SAP. For the 8TB example, 16TB would be required, which exceeds most x86 vendors’ capabilities. At 12TB, only two vendors have even announced the intention of building a system to support 24TB, and at 16TB, no vendor has currently announced plans to support 32TB of memory.
Oh, by the way, Fujitsu, in the above-referenced white paper, measured the memory throughput of a system with memory mirroring and found it to be 69% of that of a performance-optimized system. Remember, HANA demands extreme memory throughput, and benchmarks typically use the fastest memory, not necessarily the most reliable, meaning that if sizings are based on benchmarks, they may require adjustment when more reliable memory options are utilized. Would larger core counts then be required to drive the necessary memory bandwidth?
Clearly, until SAP writes new rules to accommodate this necessary technology, or vendors run realistic benchmarks showing just how much CPU and memory capacity is needed to support a properly mirrored memory subsystem on an x86 box, customers will be on their own to figure out what to do.
That guesswork will be removed once HANA on Power reaches GA, as it already includes the mission-critical level of memory protection required for SoH and does so without any performance penalty.
Many thanks to Dan Henderson, IBM RAS expert extraordinaire, from whose latest POWER8 RAS whitepaper¹¹ I liberally borrowed some of the more technically accurate sentences in this post, and who reviewed this post to make sure that I properly represented both IBM and non-IBM RAS options.
¹ http://www.cs.toronto.edu/~bianca/papers/sigmetrics09.pdf
² https://cookbook.experiencesaphana.com/bw/operating-bw-on-hana/hana-database-administration/monitoring-landscape/memory-usage/
³ http://docs.ts.fujitsu.com/dl.aspx?id=8ff6579c-966c-4bce-8be0-fc7a541b4a02
¹¹ http://www-01.ibm.com/common/ssi/cgi-bin/ssialias?subtype=WH&infotype=SA&appname=STGE_PO_PO_USEN&htmlfid=POW03133USEN&attachment=POW03133USEN.PDF#loaded.
Why SAP HANA on IBM Power Systems
This entry has been superseded by a new one: https://saponpower.wordpress.com/2014/06/06/there-is-hop-for-hana-hana-on-power-te-program-begins/
Before you get the wrong impression, SAP has not announced the availability of HANA on Power, and in no way should you interpret this posting as any sort of pre-announcement. This is purely a discussion about why you should care whether SAP decides to support HANA on Power.
As you may be aware, during SAP’s announcement of the availability of HANA as the Business Suite DB for ramp-up customers in early January, Vishal Sikka, Chief Technology Officer and member of the Executive Board at SAP, stated: “We have been working heavily with [IBM]. All the way from lifecycle management and servers to services even Cognos, Business Intelligence on top of HANA – and also evaluating the work that we have been doing on POWER. As to see how far we can be go with POWER – the work that we have been doing jointly at HPI. This is a true example of open co-innovation, that we have been working on.” Ken Tsai, VP of SAP HANA product marketing, later added in an interview with IT Jungle, “Power is something that we’re looking at very closely now.” http://www.itjungle.com/fhs/fhs011513-story01.html. And from Amit Sinha, head of database and technology product marketing: “[HANA] on Power is a research project currently sponsored at Hasso Plattner Institute. We await results from that to take next meaningful steps jointly with IBM.” Clearly, something significant is going on. So, why should you care?
Very simply, the reasons why customers chose Power Systems (and perhaps HP Integrity and Oracle/Fujitsu SPARC/Solaris) for SAP DBs in the past, i.e. scalability, reliability and security, are just as relevant with HANA as they were with conventional databases, perhaps even more so. Why more so? Because once the promise of real-time analytics on an operational database is realized, not necessarily in version 1.0 of the product but undoubtedly in the future, the value obtained from this capability would be lost just as surely as with a conventional database if the system were not available or did not respond with the speed necessary for real-time analytics.
A little known fact is that HANA as the Business Suite DB is currently limited to a single node. This means that scale-out options, common in the BW HANA space and others, are not available for this implementation of the product. Until that changes, customers that wish to host large databases may require a larger number of cores than x86 vendors currently offer.
A second known but often overlooked fact is that parallel transactional database systems for SAP are often complex, expensive and have so many limitations that only two types of customers consider this option: those which need continuous or near-continuous availability, and those that want to move away from a robust UNIX solution and realize that to attain the same level of uptime as a single-node UNIX system with conventional HA, an Oracle RAC or DB2 PureScale cluster is required. Why is it so complex? Without getting into too much detail, we need to look at the way SAP applications work and interact with the database. As most are aware, when a user logs on to SAP, they are connecting to a unique application server and, until they log off, will remain connected to that server. Each application server is, in turn, connected to one node of a parallel DB cluster. Each request to read or write data is sent to that node and, if the data is local, i.e. in the memory of that node, the processing occurs very rapidly. If, on the other hand, the data is on another node, that data must be moved from the remote node to the local node. Oracle RAC and DB2 PureScale use two different approaches, with Oracle RAC using Cache Fusion to move the data across an IP network and DB2 PureScale using Remote DMA to move the data across the network without using an IP stack, thereby improving speed and reducing overhead. Though there may be benefits of one over the other, this posting is not intended to debate this point, but instead to point out that even with the fastest, lowest-overhead transfer on an InfiniBand network, access to remote memory is still thousands of times slower than accessing local memory.
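To give a feel for the scale of that gap, here is a minimal sketch; the latency figures are rough, commonly cited ballpark values of my own, not measurements from any specific cluster:

```python
# Order-of-magnitude comparison of local memory access vs data shipped from a
# remote cluster node. Both figures are rough ballpark assumptions.
local_dram_access_ns = 100          # local memory access, ~100 nanoseconds
remote_node_access_us = 100         # remote node round trip, ~100 microseconds

slowdown = (remote_node_access_us * 1_000) / local_dram_access_ns
print(f"Remote access is roughly {slowdown:,.0f}x slower than local access")
```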
Some applications are “cluster aware”, i.e. application servers connect to multiple DB nodes at the same time and direct traffic based on data locality, which is only possible if the DB and app servers work cooperatively to communicate which data is located where. SAP Business Suite is not currently cluster aware, meaning that without a major change in the NetWeaver stack, replacing a conventional DB with a HANA DB will not result in cluster awareness, and the HANA DB for Business Suite may need to remain a single-node implementation for some time.
Reliability and security have been the subjects of previous blog posts and will be reviewed in some detail in an upcoming post. Clearly, where some level of outage may be tolerable for application servers due to an n+1 architecture, few customers consider outages of a DB server to be acceptable unless they have implemented a parallel cluster, and even then outages may be mitigated but are still not considered tolerable. Also, as mentioned above, in order to achieve this, one must deal with the complexity, cost and limitations of a parallel DB. Since HANA for Business Suite is a single-node implementation, at least for the time being, an outage or security intrusion would result in a complete outage of that SAP instance, perhaps more depending on interaction and interfaces between SAP components. Power Systems has a proven track record among medium and large enterprise SAP customers of delivering the lowest level of both planned and unplanned outages and security vulnerabilities of any open system.
Virtualization and partition mobility may also be important factors to consider. As all Power partitions are by definition “virtualized”, it should be possible to dynamically resize a HANA DB partition, host multiple HANA DB partitions on the same system and even move those partitions around using Live Partition Mobility. By comparison, an x86 environment lacking VMware or similar virtualization technology could do none of the above. Though, in theory, SAP might support x86 virtualization at some point for production HANA Business Suite DBs, they don’t currently, and there are a host of reasons why they should not, which are the same reasons why any production SAP databases should not be hosted on VMware, as I discussed in my blog posting: https://saponpower.wordpress.com/2011/08/29/vsphere-5-0-compared-to-powervm/ Lacking x86 virtualization, a customer might conceivably need a DB/HA pair of physical machines for each DB instance, compared to potentially a single DB/HA pair for a Power-based virtualized environment.
And now a point of pure speculation. With a conventional database, basis administrators and DBAs weigh the cost/benefit of different levels in a storage hierarchy including main memory, flash and HDDs. Usually, main memory is sized to contain upwards of 95% of commonly accessed data, with flash used for logs and some hot data files and HDDs for everything else. For some customers, 30% to 80% of an SAP database is utilized so infrequently that keeping aged items in memory makes little sense and would add cost without any associated benefit. With HANA, unlike conventional DBs, there is no choice: 100% of an SAP database must reside in memory, with flash used for logs and HDDs used for a copy of the in-memory data. Not only does this mean radically larger amounts of memory must be used, but as the DB grows, more memory must be added over time. More memory also means more DIMMs, with an associated increase in DIMM failure rates, power consumption and heat dissipation. Here Power Systems once again shines. IBM not only offers Power Systems with much larger memory capacities, but also offers Memory on Demand on Power 770 and above systems. With this, customers can pay for just the memory they need today and incrementally and non-disruptively add more as they need it.
That is not speculation, but the following is. Power Systems running AIX offers Active Memory Expansion (AME), a unique feature which allows infrequently accessed memory pages to be placed into a compressed pool which occupies much less space than uncompressed pages. AIX then transparently moves pages between the uncompressed and compressed pools based on page activity, using a hardware accelerator in POWER7+. In theory, a HANA DB could take advantage of this in an unprecedented way. Tests with DB2 have shown a 30% to 40% expansion rate (i.e. 10GB of real memory looks like 13GB to 14GB to the application); since potentially far more of a HANA DB would have low-use patterns, it may be possible to size the memory of a HANA DB at a small fraction of the actual data size and consequently at a much lower cost, with the associated lower rates of DIMM failure, power and cooling.
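Continuing the speculation, here is a minimal sketch of the sizing arithmetic; the database size and expansion factors are placeholders of mine, with 1.3x to 1.4x echoing the DB2 experience mentioned above:

```python
# Speculative sizing sketch: physical memory needed for an in-memory DB
# if an AME-style expansion factor could be applied to it.
# The database size and expansion factors below are illustrative only.

def physical_memory_needed(db_size_gb: float, expansion_factor: float) -> float:
    """Physical GB required so that the compressed + uncompressed pools
    present db_size_gb of effective memory to the application."""
    return db_size_gb / expansion_factor

db_size_gb = 1_000  # assumed in-memory footprint of the HANA DB

for factor in (1.0, 1.3, 1.4, 2.0):   # 1.3-1.4 mirrors the DB2 results cited above
    needed = physical_memory_needed(db_size_gb, factor)
    print(f"expansion {factor:.1f}x -> {needed:,.0f} GB physical memory")
```

Whether HANA's already-compressed column store would expand anywhere near these rates is exactly the open question, which is why this remains speculation rather than a sizing recommendation.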
If you feel that these potential benefits make sense and that you would like to see a HoP option, it is important that you share this desire with SAP, as they are the only ones that can make the decision to support Power. Sharing your desire does not imply that you are ready to pull the trigger or that you won’t consider all available options, simply that you would like to be informed about SAP’s plans. In this way, SAP can gauge customer interest and you can have the opportunity to find out which of the benefits suggested above might actually be part of a HoP implementation, or even get SAP to consider supporting one or more that you consider important. Customers interested in receiving more detailed information on the HANA on Power effort should approach their local SAP Account Executive in writing, requesting disclosure information on this platform technology effort.
Oracle M9000 SAP ATO Benchmark analysis
SAP has a large collection of different benchmark suites. Most people are familiar with the SAP Sales and Distribution (SD) 2-tier benchmark, as the vast majority of all results have been published using this suite. A lesser known benchmark suite is called ATO, or Assemble-to-Order. When the ATO benchmark was designed, it was intended to replace SD as a more “realistic” workload. Because the benchmark is a little more complicated to run and SAP Quick Sizer sizings are based on the SD workload, the ATO benchmark never got much traction, and from 1998 through 2003 only 19 results were published. Prior to September 2, 2011, this benchmark had seemed to be extinct. On that date, Oracle and Fujitsu published a 2-tier result for the SPARC M9000, along with the predictable claim of a world record. Oracle should be commended for having beaten results published in 2003. Of course, we might want to consider that a 2-processor/12-core, 2U Intel based system of today has already surpassed the TPC-C result of a 64-core HP Itanium Superdome that “set the record” back in 2003, at a tiny fraction of the cost and floor space.
So we give Oracle a one-handed clap for this “accomplishment”. But if I left it at that, you might question why I would even bother to post this blog entry, so let’s delve a little deeper to find the story within the story. First, let me remind the reader that these are my opinions; in no way do they reflect the opinions of IBM, nor has IBM endorsed or reviewed them.
In 2003, Fujitsu-Siemens published a couple of ATO results using a predecessor of today’s SPARC64 VII chip, the SPARC64 V at 1.35GHz, with SAP 4.6C. The just-published M9000 result used the SPARC64 VII at 3.0GHz and SAP EP4 for SAP ERP 6.0 with Unicode. If one were to divide the results achieved by both systems by the number of cores and compare them, one might find that the new result delivers only a very small increase in throughput per core, roughly 6%, over the old results. Of course, this does not account for the changes in SAP software, Unicode or benchmark requirements. SAP rules do not allow for extrapolations, so I will instead provide you with the data from which to make your own calculations. 100 SAPS using SAP 4.6C is equal to about 55 SAPS using Business Suite 7 with Unicode. If you were to multiply the old result by 55/100 and then divide by the number of cores, you could determine the effective throughput per core of the old system if it were running the current benchmark suite. I can’t show you the result, but will show you the formula that you can use to determine it yourself at the end of this posting.
For comparison, I wanted to figure out how Oracle did on the SD 2-tier benchmark compared to systems from 2003. It turns out that almost identical systems were used in both 2003 and 2009, with the exception that the Sun M9000 used 2.8GHz processors, each of which had half the L2 cache of the 3.0GHz system used in the ATO benchmark. If you were to use a formula similar to the one described above, and then perhaps multiply by the difference in clock speed, i.e. 3.0/2.8, you could derive a similar per-core performance comparison of the new and old systems. Even prior to performing any extrapolations, the benchmark users per core actually decreased between 2003 and 2009 by roughly 10%.
I also wanted to take a look at similar systems from IBM, then and now. Fortunately, IBM published SD 2-tier results for the 8-core 1.45GHz pSeries 650 in 2003 and for the 256-core 4.0GHz Power 795 late last year, with the SAP levels being identical to the ones used by Sun and Fujitsu-Siemens respectively. Using the same calculations as were done for the SD and ATO comparisons above, IBM achieved 223% more benchmark users per core than it achieved in 2003, prior to any extrapolations.
Yes, there was no typo there. While the IBM results improved by 223% on a per-core basis, the Fujitsu processor based systems either improved by only about 6% or decreased by roughly 10%, depending on which benchmark you choose. Interestingly enough, IBM had only a 9% per-core advantage over Fujitsu-Siemens in 2003, which grew to a 294% advantage in 2009/2010 based on the SD 2-tier benchmark.
It is remarkable that since November 18, 2009, Oracle (Sun) has not published a single SPARC based SAP SD benchmark result, while over 70 results were published by a variety of vendors, including two by Sun for their Intel systems. When Oracle finally decided to get back into the game to try to prove their relevance, despite a veritable flood of analyst and press suggestions to the contrary, rather than competing on the established and vibrant SD benchmark, they chose to stand on top of a small heap of dead carcasses to say they are better than the rotting husks upon which they stand.
For full disclosure, here are the actual results:
SD 2-tier Benchmark Results
| Certification Date | System | Benchmark Users | SAPS | Cert # |
|---|---|---|---|---|
| 1/16/2003 | IBM eServer pSeries 650, 8 cores | 1,220 | 6,130 | 2003002 |
| 3/11/2003 | Fujitsu Siemens Computers PrimePower 900, 8 cores | 1,120 | 5,620 | 2003009 |
| 3/11/2003 | Fujitsu Siemens Computers PrimePower 900, 16 cores | 2,200 | 11,080 | 2003010 |
| 11/18/2009 | Sun Microsystems M9000, 256 cores | 32,000 | 175,600 | 2009046 |
| 11/15/2010 | IBM Power 795, 256 cores | 126,063 | 688,630 | 2010046 |
ATO 2-tier results:
| Certification Date | System | Fully Processed Assembly Orders/Hr | Cert # |
|---|---|---|---|
| 3/11/2003 | Fujitsu Siemens Computers PrimePower 900, 8 cores | 6,220 | 2003011 |
| 3/11/2003 | Fujitsu Siemens Computers PrimePower 900, 16 cores | 12,170 | 2003012 |
| 9/2/2011 | Oracle M9000, 256 cores | 206,360 | 2011033 |
Formulas that you might use, assuming you agree with the assumptions:
- Performance of old system / number of cores * 55/100 = effective per-core performance on the new benchmark suite (EP)
- (Performance of new system / cores) / EP = relative ratio of per-core performance of the new system compared to the old system
- Improvement per core = relative ratio - 1
This can be applied to both the SD and ATO results using the appropriate throughput measurements.
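For readers who would rather not do the arithmetic by hand, here is a minimal script implementing the formulas above against the disclosed results; the 55/100 factor is the conversion assumption stated earlier, and the output is your own extrapolation, not a published result.

```python
# Implements the formulas above against the disclosed benchmark results.
# The 0.55 factor is the assumed SAP 4.6C -> Business Suite 7 (Unicode)
# conversion discussed in the text; running this is your own extrapolation.

CONVERSION = 55 / 100  # 100 SAPS on 4.6C ~= 55 SAPS on Business Suite 7 w/ Unicode

def improvement_per_core(old_result, old_cores, new_result, new_cores,
                         conversion=CONVERSION):
    ep = old_result / old_cores * conversion   # effective old per-core performance
    ratio = (new_result / new_cores) / ep      # relative ratio, new vs old
    return ratio - 1                           # improvement per core

# SD 2-tier, benchmark users (from the table above)
print("FSC/Sun SD:     {:+.0%}".format(improvement_per_core(2_200, 16, 32_000, 256)))
print("IBM SD:         {:+.0%}".format(improvement_per_core(1_220, 8, 126_063, 256)))

# ATO 2-tier, fully processed assembly orders per hour (from the table above)
print("FSC/Oracle ATO: {:+.0%}".format(improvement_per_core(12_170, 16, 206_360, 256)))
```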
vSphere 5.0 compared to PowerVM
Until recently, VMware partitions suffered from a significant scalability limitation: each partition could scale to a maximum of 8 virtual processors (vp) with vSphere 4.1 Enterprise Edition. For many customers and uses, this did not pose much of an issue, as some of the best candidates for x86 virtualization are the thousands of small, older servers which can easily fit within a single core of a modern Intel or AMD chip.
For SAP customers, however, the story was often quite different. Eight vp does not equate to 8 cores; it equates to 8 processor threads. Starting with Nehalem, Intel reintroduced HyperThreading, which allows each core to run two different OS threads simultaneously. This feature boosts throughput, on average, by about 30%, and just about all benchmarks since that time have been run with HyperThreading enabled. Although it is possible to disable it, few customers elect to do so, as that would remove the 30% throughput gain from the system. With HyperThreading enabled, 8 VMware vp utilize 4 cores/8 threads, which can be as little as 20% of the cores on a single chip. Put in simple terms, this can be as little as 5,000 SAPS depending on the version and MHz of the chip. Many SAP customers routinely run their current application servers at 5,000 to 10,000 SAPS, meaning moving these servers to VMware partitions would result in the dreaded hotspot, i.e. bad performance and a flood of calls to the help desk. By comparison, PowerVM (IBM’s Power Systems virtualization technology) partitions may scale as large as the underlying hardware allows and, if that limit is reached, may be migrated live to a larger server, assuming one exists in the cluster, where the partition can continue to operate without interruption and with a much higher partition size limit.
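As a rough illustration of that arithmetic (my own sketch with an assumed per-core SAPS figure, not a vendor sizing), consider what an 8-vp VM actually delivers when eight threads map onto four physical cores:

```python
# Rough illustration: usable SAPS of an 8-vp VMware VM when HyperThreading
# maps two vp onto each physical core. The per-core SAPS value is an assumed
# figure (published benchmark results normally already include HT).

SAPS_PER_CORE = 1_250          # assumed SAPS per core, HyperThreading enabled

def vm_saps(vcpus: int) -> float:
    cores_used = vcpus / 2     # two threads (vp) per physical core with HT
    return cores_used * SAPS_PER_CORE

demand = 10_000                # a typical app server instance per the text
capacity = vm_saps(8)
print(f"8-vp VM capacity: ~{capacity:,.0f} SAPS")
print(f"Shortfall vs a {demand:,} SAPS app server: {demand - capacity:,.0f} SAPS")
```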
VMware recently introduced vSphere 5.0. Among a long list of improvements is the ability to use 32 vp in a single partition. On the surface, this would seem to imply that VMware can scale to all but a very few large demands. Once you dig deeper, several factors emerge. As vSphere 5.0 is very new, there are not many benchmarks and even less customer experience. There is no such thing as a linearly scalable server, despite benchmarks that seem to imply this, even from my own company; all systems have a scalability knee of the curve. While some workloads, e.g. AIM7, when tested by IBM showed up to 7.5 times the performance with 8 vp compared to 1 vp on a Xeon 5570 system with vSphere 4.0 update 1, it is worth noting that this was only achieved when no other partitions were running, which is clearly not the reason anyone would use VMware. In fact, one would expect just the opposite: that CPU resources would be overcommitted to get the maximum throughput out of a system. On another test, DayTrader2.0 in JDBC mode, a scalability maximum of 4.67 times the performance of a single thread was reached with 8 vp, once again while running no other VMs. It would be reasonable to assume that VMware has done some scaling optimization, but it would be premature and quite unlikely to assume that 32 vp will scale even remotely close to 4 times the performance of an 8 vp VM. When multiple VMs run at the same time, VMware overhead and thread contention may reduce effective scaling even further. For the time being, a wise customer would be well advised to wait until more evidence is presented before assuming that all scaling issues have been resolved.
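To put the knee of the curve in perspective, here is a simple per-vp efficiency calculation using the figures quoted above (an illustration only, not a new measurement):

```python
# Scaling efficiency = observed speedup / number of virtual processors.
# The speedups below are the single-VM figures cited in the text.

observed_speedups = {
    "AIM7, 8 vp (vSphere 4.0u1)":  7.5,
    "DayTrader2.0 JDBC, 8 vp":     4.67,
}

for workload, speedup in observed_speedups.items():
    efficiency = speedup / 8
    print(f"{workload}: speedup {speedup}x, efficiency {efficiency:.0%} per vp")

# Even at the better of these efficiencies, 32 vp would only deliver 4x the
# throughput of 8 vp if efficiency held constant as the VM grew, which the
# knee-of-the-curve behavior argues against.
```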
But this is just one issue and, perhaps, not the most important one. SAP servers are, by their very nature, mission critical. For database servers, any downtime can have severe consequences. For application servers, depending on how customers implement their SAP landscapes and the cost of downtime, some outages may not have as large a consequence. It is important to note that when an application server fails, the context for each user’s session is lost. In a best case scenario, the users can recall all the details necessary to re-run the transactions in flight after logging on to another application server. This means that the only loss is the productivity of each user, multiplied by the number of users previously logged on and doing productive work on that server. Assuming 500 users and 5 minutes to get logged back on and re-run each transaction through to completion, this is 2,500 minutes of lost productivity, which at a loaded cost of $75,000 per employee per year is a total loss to the company of roughly $1,500 per occurrence. With one such occurrence per application server per year, this would amount to roughly $7,500 over 5 years and should be included in any comparison of TCO. Of course, this does not take into consideration any IT staff time required to fix the server, any load on the help desk to help resolve issues, nor any political cost to IT if failures happen too frequently. But what happens if the users are unable to recall all of the details necessary to re-run the transactions, or what happens if tight integration with production requires that manufacturing be suspended until all users are able to get back to where they had been? The costs can escalate very quickly.
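For readers who want to plug in their own numbers, here is the same downtime arithmetic as a small script; the user count, recovery time and loaded cost are the assumptions stated above, and the working-minutes-per-year figure is my own assumption.

```python
# Downtime cost arithmetic from the text: lost user productivity when an
# application server fails. All inputs are the assumptions stated above.

USERS = 500                          # users logged on to the failed app server
MINUTES_LOST_PER_USER = 5            # time to log back on and re-run transactions
LOADED_COST_PER_YEAR = 75_000        # fully loaded cost per employee, per year
WORK_MINUTES_PER_YEAR = 2_000 * 60   # ~2,000 working hours per year (assumed)

cost_per_minute = LOADED_COST_PER_YEAR / WORK_MINUTES_PER_YEAR
minutes_lost = USERS * MINUTES_LOST_PER_USER
cost_per_outage = minutes_lost * cost_per_minute

print(f"Lost minutes per outage: {minutes_lost:,}")
print(f"Cost per outage:         ${cost_per_outage:,.0f}")
print(f"Cost over 5 years at one outage per year: ${cost_per_outage * 5:,.0f}")
```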
So, what is my point? All x86 hypervisors, including VMware vSphere 4.1 and 5.0, are software layers on top of the hardware. In the event of an uncorrectable error in the hardware, the hypervisor usually fails and, in turn, takes down all the VMs it is hosting. Furthermore, problems are not confined to the CPU; they can be caused by memory, power supplies, fans or a large variety of other components. I/O is yet another critical issue. VMware provides shared I/O resources to partitions, but it does this sharing within the hypervisor itself. A device driver error, a physical card error or, in some cases, even an external error in a cable, for example, might result in a hypervisor critical error and a resulting outage. In other words, the hypervisor becomes a very large single point of failure. To avoid the sort of costs described above, most customers try to architect mission critical systems to reduce single points of failure, not introduce new ones.
PowerVM takes the opposite approach. First, it is implemented in hardware and firmware. As the name implies, hardware is hardened, meaning it is inherently more reliable, and far less code is required since many functions are built into the chip.
Second, PowerVM acts primarily as an elegant dispatcher. In other words, it decides which partition executes next in a given core, but then it gets out of the way and allows that partition to execute natively in that core with no hypervisor in the middle of it. This means that if an uncorrectable error were to occur, an exceedingly rare event for Power Systems due to the wide array of fault tolerant components not available in any x86 server, in most situations the error would be confined to a single core and the partition executing in that core at that moment.
Third, sharing of I/O is done through the use of a separate partition called the Virtual I/O (VIO) server. This removes the I/O sharing code from the hypervisor, thereby making the hypervisor more resilient, and also allows for extra redundancy. In most situations, IBM recommends that customers utilize more than one VIO server and spread I/O adapters across those servers, with redundant virtual connections to each partition. This means that if an error were to occur in a VIO server, once again a very rare event, only that VIO server might fail; the other VIO servers would not, and there would be no impact on the hypervisor since it is not involved in the sharing of I/O at all. Furthermore, partitions would not fail since they would be multipathing virtual devices across more than one VIO server.
So even if VMware can scale beyond 8 vp, the question is how much of your enterprise are you ready to place on a single x86 server? 500 users? 1,000 users? 5,000 users? Remember, 500 users calling the help desk at one time would result in long delays; 1,000 at the same time would result in many individuals not waiting and calling their LOB execs instead.
In the event that this is not quite enough of a reason to select Power and PowerVM over x86 with VMware, it is worthwhile to consider the differences in security exposure. This has been covered already in a prior blog entry comparing Power to x86 servers, but is worth noting again. PowerVM has no known vulnerabilities according to the National Vulnerability Database, http://nvd.nist.gov. By comparison, a search on that web site for VMware returns 119 hits. Admittedly, this includes older versions as well as workstation versions, but it is clear that hackers have historically found weaknesses to exploit. VMware has introduced vShield with vSphere 5.0, a set of technologies intended to make VMware more secure, but it would be prudent to wait and see whether this closes all the holes or opens new ones.
Also covered in the prior blog entry, the security of the hypervisor is only one piece of the equation. Equally, or perhaps more, important is the security of the underlying OSs. Here too, AIX is among the least vulnerable OSs, with Linux and Windows having an order of magnitude more vulnerabilities. Also covered in that blog was a discussion of problem isolation, determination and vendor ownership of problems to drive them to successful resolution. With IBM, almost the entire stack is owned by IBM and supported for mission critical computing, whereas with x86 the stack is a hodgepodge of vendors with different support agreements, capabilities and views on who should be responsible for a problem, often resulting in finger pointing.
There is no question that VMware has tremendous potential for applications that are not mission critical as well as being an excellent fit for many non-production environments. For SAP, the very definition of mission critical, a more robust, more scalable and better secured environment is needed and Power Systems with PowerVM does an excellent job of delivering on these requirements.
Oh, and I did not mention cost. With the new memory based pricing model for vSphere 5.0, applications such as SAP, which demand enormous quantities of memory, may easily exceed the new limits on memory pool size, forcing the purchase of additional VMware licenses. Those extra license costs, and their associated maintenance, can easily add enough cost that the price differences between Power and x86, if there are any, shrink to almost nothing.
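As a closing illustration, here is a minimal sketch of that licensing arithmetic; the per-license vRAM entitlement, socket count and memory sizes are placeholders of mine, so substitute the terms of your own VMware agreement.

```python
import math

# Illustrative vRAM licensing arithmetic for a memory-hungry SAP landscape.
# The entitlement and memory figures are placeholders, not VMware's price list.

VRAM_PER_LICENSE_GB = 96        # assumed vRAM entitlement per CPU license
CPU_LICENSES_FOR_SOCKETS = 4    # licenses already required for a 4-socket host

def extra_licenses(configured_vram_gb: float) -> int:
    """CPU licenses required beyond the per-socket minimum to cover vRAM."""
    needed = math.ceil(configured_vram_gb / VRAM_PER_LICENSE_GB)
    return max(0, needed - CPU_LICENSES_FOR_SOCKETS)

for vram_gb in (256, 512, 1_024):
    print(f"{vram_gb:>5} GB of configured vRAM -> "
          f"{extra_licenses(vram_gb)} additional license(s)")
```

The larger the SAP memory footprint, the faster those additional licenses and their maintenance accumulate, which is precisely the effect described above.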