POWER10 – Opening the door to new features and radical TCO reduction for SAP HANA customers
As predicted, IBM has continued its cycle of introducing a new POWER chip every four years (with a set of incremental or “plus” enhancements usually arriving after 2 years).[i] POWER10, revealed today at Hot Chips, has already been described in copious detail by the major chip and technology publications.[ii] I really want to talk about how I believe this new chip may affect SAP HANA workloads, but please bear with me for a paragraph or so while I totally geek out over the technology.
POWER10, manufactured by Samsung using 7nm lithography, will feature up to 15 cores/chip, up from the current POWER9 maximum of 12 cores (although most systems have been shipping with chips with 11 cores or fewer enabled). IBM found, as did Intel, that manufacturing chips at ever smaller lithography with both maximum GHz and maximum core counts is extremely difficult. The result is an expensive process with inevitable microscopic defects in a core or two, so IBM intentionally designed a 16-core chip that will never ship with all 16 cores enabled. This means they can tolerate a reasonable number of manufacturing defects without a substantial impact to chip quality, simply by turning off whichever core does not pass their very stringent quality tests. This takes the maximum number of cores in all systems up proportionately, based on the number of sockets in each system. If that were the only improvement, it would be nice but not earth shattering, as every other chip vendor also usually introduces more or faster cores with each iteration.
Of course, POWER10 will have much more than that. Relative to POWER9, each core will feature 1.5 times the L1 cache, 4x the L2 cache, 4x the TLB, 2x the general SIMD and 4x the matrix SIMD. (Sorry, not going to explain each as I am already getting a lot more geeky than my audience typically prefers.) Each chip will feature over 4 times the memory bandwidth and 4x the interconnect bandwidth. Of course, if IBM only focused on CPU performance and interconnect and memory bandwidths, the bottleneck would naturally just be pushed to the next logical points, i.e. memory and I/O. So, per a long-standing philosophy at IBM, they didn’t ignore these areas and instead built in support for both existing PCIe Gen 4 and emerging PCIe Gen 5, and for both existing DDR4 and emerging DDR5 memory DIMMs, not to mention native support for future GDDR DIMMs. And if this were not enough, POWER10 includes all sorts of core microarchitecture improvements which I will most definitely not get into here, as most of my friends in the SAP world are probably already shaking their heads trying to understand the implications of the above improvements.
You might think with all of these enhancements, this chip would run so hot that you could heat an Olympic swimming pool with just one system, but due to a variety of process improvements, this chip actually runs at 3x the power efficiency of POWER9.
Current in-memory workloads, like SAP HANA, rarely run into computational performance limits. It is reasonable to ask whether this is because these workloads are designed around the current limitations of systems, avoiding more computationally intense operations that would deliver unsatisfying performance to users who are increasingly impatient with any delays. HANA has more or less set the standard for rapid response, so delivering anything else would be seen as a failure. Is HANA holding back on new, but very costly (in computational terms) features that could be unleashed once a sufficiently fast CPU becomes available? If so, then POWER10 could be the catalyst for unleashing some incredible new HANA capabilities, if SAP takes advantage of this opportunity.
A related issue is that perhaps existing cores could deliver better performance, but memory has not been able to keep pace. Remember, 128GB DRAM DIMMs are still the maximum size accepted by SAP for HANA even though 256GB DIMMs have been on the market for some time now. As SAP has long used internal benchmarks to determine ratios of memory to CPU, could POWER10 enable the use of much denser memory to drive down the number of sockets required to support HANA workloads, thereby decreasing TCO or enabling more server consolidation? Remember, Power Systems all feature embedded, hardware/hypervisor-based virtualization, so adding workloads to a system is just a matter of harnessing any unused capacity.
IBM has not released any performance projections for HANA running on POWER10, but has provided some unrelated number-crunching and AI projections. Based on those, plus the raw and incredibly impressive improvements in the microarchitecture and number of cores, the dramatic cache and TLB increases and the gigantic memory and interconnect bandwidth expansions, I predict that each socket will support 2 to 3 times the size of HANA workloads that are possible today (assuming sufficient memory is installed).
In the short term, I expect that customers will be able to utilize TDI 5 relaxed sizing rules to use much larger DIMMs and amounts of memory per POWER10 system to accomplish two goals. Customers with relatively small HANA systems will be able to cut in half the number of sockets required compared to existing HANA systems. Customers that currently have large numbers of systems will be able to consolidate them into many fewer POWER10 systems. Either way, customers will see dramatic reductions in TCO using POWER10.
As to those customers that often run out of memory before they run out of CPU, stay tuned for part 2 of this blog post as we discuss perhaps the most exciting new innovation with POWER10, Memory Clustering.
[i] https://twitter.com/IBMPowerSystems/status/1295382644402917376
https://www.linkedin.com/posts/ibm-systems_power10-activity-6701148328098369536-jU9l
https://twitter.com/IBMNews/status/1295361283307581442
[ii] Forbes – https://www.forbes.com/sites/moorinsights/2020/08/17/ibm-discloses-new-power10-processer-at-hot-chips-conference/#12e7a5814b31
VentureBeat – https://venturebeat.com/2020/08/16/ibm-unveils-power10-processor-for-big-data-analytics-and-ai-workloads/
Reuters – https://www.reuters.com/article/us-ibm-samsung-elec/ibm-rolls-out-newest-processor-chip-taps-samsung-for-manufacturing-idUSKCN25D09L
IT Jungle – https://www.itjungle.com/2020/08/17/power-to-the-tenth-power/
Optane DC Persistent Memory – Proven, industrial strength or full of hype – Detail, part 3
In this final post of a three-part series, we will explore the two other major “benefits” of Optane DIMMs: fast restart and TCO.
Fast restart
HANA, as an in-memory database, must be loaded into memory to perform well. Intel has, for years and apparently up to the current day, suffered from a major bottleneck in its I/O subsystem. As a result, loading a single terabyte of data into memory could take 10 to 20 minutes in a best-case scenario. Anecdotally, some customers have remarked that placing superfast, all-flash subsystems, such as IBM’s FlashSystem 9100, behind an Intel HANA system resulted in little improvement in load times compared to mid-range SSD subsystems. For customers attempting to bring up a 10TB storage/20TB memory HANA system, this could result in load times measured in hours. As a result, a faster way of getting a HANA system up and running was sorely needed.
This did not appear to be a problem for customers using IBM’s Power Systems. Not only has Power delivered roughly twice the I/O bandwidth of Intel systems for years, but with POWER9, IBM introduced PCIe Gen4, further extending its leadership in this area. The bottleneck is actually in the storage subsystem and the number of paths that it can drive, not in the processor. To prove this, IBM ran a test with 10 NVMe cards in PCIe slots and was able to drive load speeds into HANA of almost 1TB/min.[i] In other words, to improve restart times, Power Systems customers need only move to faster subsystems and/or add more or faster paths.
This suggests that Intel’s motivation for NVDIMMs may be to solve a problem of their own making. But it also raises a question about their understanding of HANA. If a customer is running a transactional workload such as Suite on HANA, S/4 or C/4, and is using HANA System Replication, wouldn’t at least one of the pair of nodes be available at all times? SAP supports near zero downtime upgrades[ii], so systems, firmware, the OS or even HANA itself may be updated on one of the pair of nodes while the other continues to operate, followed by a synchronization of changed data and a controlled failover so that the first node may be updated. In this way, cold restarts of HANA, where a fast restart option might make a big difference, may be driven down to a very rare occurrence. In other words, wouldn’t this be a better option than causing poor performance across the board due to DIMMs that are radically slower than DRAM, as has been discussed in gory detail in the previous two posts of this series?
HANA also offers a quick restart option whereby HANA can be started and the database made available within minutes even though all of the columns have not yet been loaded into memory. Yes, performance will be pretty bad until all columns are loaded into memory, but for non-production systems and non-mission critical systems, this might be an acceptable option. Lastly, with HANA 2.0 SPS04, SAP now supports fast restart with conventional memory.[iii] This only works when the OS stays up and running, i.e. can’t be used when the system, firmware or OS is being updated, but this can be used for the vast majority of required restarts, e.g. HANA upgrades, patches and restarts when a bounce of the HANA environment is needed. Though this is not mentioned in the help documentation, it may even be possible to patch the Linux kernel while using the fast restart option if SUSE SLES is used with their “Live Patching” function.[iv]
TCO
Optane DIMMs are less expensive than DRAM DIMMs. List prices appear to be about 40% cheaper when comparing same-size DIMMs. Effective prices, however, may have a much smaller delta since there is competition for DRAM, meaning discounts may be much deeper than for the NVDIMMs from Intel, currently the only source. This assumes full utilization of those NVDIMMs, which may prove to be a drastically bad assumption. Sizing guidance from SAP[v] shows that the ratio of DRAM vs. PMEM (their term for NVDIMMs) capacity can be anything from 2:1 to 1:4, but it provides no guidance as to where a given workload might fall or what sort of performance impact might result. This means that a customer might purchase NVDIMMs at a capacity ratio of 1:2, e.g. 1TB DRAM:2TB PMEM, but might end up being able to utilize only 512GB or 1TB of PMEM due to negative performance results. In that case, the effective cost of the NVDIMMs would have instantly doubled or quadrupled and would, effectively, be more expensive than DRAM DIMMs.
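To make that concrete, here is a minimal sketch with purely hypothetical per-GB prices (not vendor quotes) showing how the effective cost per usable GB moves as the exploitable portion of the PMEM shrinks.

```python
# Hypothetical prices for illustration only -- not vendor quotes.
DRAM_PRICE_PER_GB = 10.00
PMEM_PRICE_PER_GB = 6.00   # assumes ~40% below DRAM, per the list-price observation above

installed_pmem_gb = 2048   # 2TB of PMEM bought alongside 1TB of DRAM (a 1:2 ratio)

def effective_cost_per_gb(usable_gb):
    """Cost per GB of PMEM actually exploited, not per GB installed."""
    return (installed_pmem_gb * PMEM_PRICE_PER_GB) / usable_gb

for usable in (2048, 1024, 512):
    print(f"usable PMEM {usable:>4} GB -> ${effective_cost_per_gb(usable):5.2f}/GB "
          f"(DRAM assumed at ${DRAM_PRICE_PER_GB:.2f}/GB)")
```

At half utilization, the assumed 40% list-price advantage is already gone; at a quarter, the PMEM costs far more per usable GB than the DRAM it was meant to undercut.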
But let us assume the best rather than the worst. Even if only a 2:1 ratio works relatively well, the cost of the NVDIMMs, if sized for that ratio, would be somewhat lower than the equivalent cost of DRAM DIMMs. The problem is that memory, while a significant portion of the cost of systems, is but one element in the overall TCO of a HANA landscape. If reducing TCO is the goal, shouldn’t all options be considered?
Virtualization has been in heavy use by most customers for years, helping to drive up system utilization, resulting in the need for fewer systems, decreasing network and SAN ports, reducing floor space and power/cooling and, perhaps most importantly, reducing the cost of IT management. Unfortunately, few high-end customers, other than those using IBM Power Systems, can take advantage of this technology in the HANA world due to the many reasons identified in the latest of many previous posts.[vi] Put another way, if a customer utilizes an industrial-strength and proven virtualization solution for HANA, i.e. IBM PowerVM, they may be able to reduce TCO considerably[vii] and potentially much more than the relatively small improvement due to NVDIMMs.
But if driving down memory costs is the only goal, there are a couple of ideas worth investigating that are less radical than using NVDIMMs. Depending on RTO requirements, some workloads might need an HA option but might not require it to be ready in minutes. If this is the case, then a cold standby server running other workloads, which could be killed in the event of a system outage, could be utilized, e.g. QA, Dev, Test, Sandbox, Hadoop. Since no incremental memory would be required, memory costs would be substantially lower than those required for System Replication, even if NVDIMMs are used. IBM offers a tool called VM Recovery Manager which can instrument and automate such a configuration.
Another option worth considering, only for non-production workloads, is a feature of IBM PowerVM called Memory Deduplication. After different VMs are started using “a shared memory pool”, the hypervisor builds a logical memory map. It then scans the pages of each VM looking for identical memory pages at which time it uses the logical memory map to point each VM to the same real memory page thereby freeing up the redundant memory pages for use by other workloads. If a page is subsequently changed by one of the VMs, the hypervisor simply recreates a unique real memory page for that VM. The upshot of this feature is that the total quantity of DRAM memory may be reduced substantially for workloads that are relatively static and have large amounts of duplication between them. The reason that this should not be used for production is because when the VMs start, the hypervisor has not yet had the chance to deduplicate the memory pages and, if the sum of logical memory of all VMs is larger than the total memory, paging will occur. This will subside over time and may be of little consequence to non-production workloads, but the risk to performance for production might be considered unacceptable and, besides, “Memory over-commitment must not be used” for production HANA according to SAP.
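The mechanics are easier to picture with a toy model. The sketch below is not PowerVM code, just an illustration of the general idea: identical pages are detected (here via a content hash) and collapsed onto a single real page, and a VM gets its own private copy back the moment it writes to a shared page.

```python
import hashlib

# Toy illustration of page deduplication -- not PowerVM internals.
real_pages = {}     # content hash -> the single real copy of that page
vm_page_map = {}    # (vm, virtual page number) -> content hash

def map_page(vm, vpn, content):
    """Point a VM's page at a shared real page, creating the real page only if new."""
    digest = hashlib.sha256(content).hexdigest()
    real_pages.setdefault(digest, content)
    vm_page_map[(vm, vpn)] = digest

def write_page(vm, vpn, new_content):
    """On write, the VM gets its own real page again (copy-on-write)."""
    map_page(vm, vpn, new_content)

# Three non-prod VMs booted from the same image share most of their pages...
for vm in ("QA", "Dev", "Sandbox"):
    map_page(vm, 0, b"identical kernel page")
    map_page(vm, 1, b"identical library page")
print(len(real_pages), "real pages back 6 virtual pages")    # -> 2

# ...until one of them modifies a page and receives a private copy.
write_page("Dev", 1, b"patched library page")
print(len(real_pages), "real pages after the write")          # -> 3
```

The real hypervisor does this scanning continuously and transparently, which is also why the savings only materialize some time after the VMs have started.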
Summary
Faster restarts than may be possible with traditional Intel systems may be achieved by using near zero HANA upgrades with System Replication, HANA fast restart or by switching to a system with a radically faster I/O subsystem, e.g. IBM Power Systems. TCO may be reduced with tried and proven virtualization technologies as provided with IBM PowerVM, cold standby systems or memory deduplication rather than experimenting with version 1.0 of a new technology with no track record, unknown reliability, poor guidance on sizing and potentially huge impacts to performance.
[i] https://www.ibm.com/downloads/cas/WQDZWBYJ
[ii] https://launchpad.support.sap.com/#/notes/1984882
[iii] https://help.sap.com/viewer/6b94445c94ae495c83a19646e7c3fd56/2.0.04/en-US/ce158d28135147f099b761f8b1ee43fc.html
[iv] https://launchpad.support.sap.com/#/notes/1984787
[v] https://launchpad.support.sap.com/#/notes/2786237
[vi] https://saponpower.wordpress.com/2018/09/26/vmware-pushes-past-4tb-sap-hana-limit/
Optane DC Persistent Memory – Proven, industrial strength or full of hype?
“Intel® Optane™ DC persistent memory represents a groundbreaking technology innovation” says the press release from Intel. They go on to say that it “represents an entirely new way of managing data for demanding workloads like the SAP HANA platform. It is non-volatile, meaning data does not need to be re-loaded from persistent storage to memory after a shutdown. Meanwhile, it runs at near-DRAM speeds, keeping up with the performance needs and expectations of complex SAP HANA environments, and their users.” and “Total cost of ownership for memory for an SAP HANA environment can be reduced by replacing expensive DRAM modules with non-volatile persistent memory.” In other words, they are saying that it performs well, lowers cost and improves restart speeds dramatically. Let’s take a look at each of these potential benefits, starting with Performance, examine their veracity and evaluate other options to achieve these same goals.
I know that some readers appreciate the long and detailed posts that I typically write. Others might find them overwhelming. So, I am going to start with my conclusions and then provide the reasoning behind them in separate posts.
Conclusions
Performance
Storage class memory is an emerging type of memory with great potential, but its current form, Intel DC Persistent Memory, is unproven. It could have a moderate performance impact on highly predictable, low-complexity workloads; it will likely have a much higher impact on more complex workloads; and it could cause a significant performance degradation for OLTP workloads that might make meeting performance SLAs impossible.
Some data, e.g. aged data held in extension nodes, data aging objects, HANA native storage extension, data tiering or archives, could be placed on this type of memory to improve speed of access. On the other hand, if the SLAs for access to aged data do not require near in-memory speeds, then the additional cost of persistent memory over old, and very cheap, spinning disk may not be justified.
Highly predictable, simple, read-only query environments, such as canned reporting from a BW system, may derive some value from this class of memory; however, data load speeds will need to be carefully examined to ensure that data ingestion throughput to encrypted persistent storage allows for daily updates within the allowed outage window.
Restart Speeds
Intel’s storage class memory is clearly orders of magnitude faster than external storage, whether SSD or other types of media. If this were the only issue that customers were facing, if there were no performance or reliability implications and if there were no other way to address restart times, then this might be a valuable technology. But since SAP has announced DRAM-based HANA Fast Restart with HANA 2.0 SPS04 and most customers use HANA System Replication when they have high uptime requirements, the need for rapid restarts may be significantly diminished. Also, this may be a solution to a problem of Intel’s own making, as IBM Power Systems customers rarely share this concern, perhaps because IBM invested heavily in fast I/O processing in their processor chips.
TCO
On a GB-to-GB comparison, Optane is indeed less expensive than DRAM … assuming you are able to use all of it. Several vendors’ and SAP’s guidance suggests you populate the same number of slots with NVDIMMs as are used for DRAM DIMMs, SAP recommends only using NVDIMMs for columnar storage, and historic memory/slot limitations are largely based on performance. This means that some of this new storage may go unused, which means the cost per used GB may not be as low as the cost per installed GB.
And if saving TCO is the goal, there are dozens of other ways in which TCO can be minimized, not just lowering the cost of DIMMs. For customers that are really focused on reducing TCO, effective virtualization, different HA/DR methodologies, optimized storage and other associated IT cost optimization may have as much or more impact on TCO as may be possible with the use of storage class memory. In addition, the cost of downtime should be included in any TCO analysis. Since this type of memory is unproven in widespread and/or large-memory installations, and the available memory protection is less than that available for DRAM-based DIMMs, this potential cost to the enterprise may dwarf the savings from using this technology today.
Scale-up vs. scale-out architectures for SAP HANA – part 1
Dozens of articles, blog posts, how-to guides and SAP notes have been written about this subject. One of the best was by John Appleby, now Global Head of DDM/HANA COEs @ SAP.[i] Several others have been written by vendors with a vested interest in the proposed option. The vendor for which I work, IBM, offers excellent solutions for both options, so my perspective is based on both my own experiences and those of our many customers, some of whom have chosen one or the other option, or both in some cases.
Scale-out for BW is well established, understood, fully supported by SAP and can be cost effective from the perspective of systems acquisition costs. Scale-out for S/4HANA, by comparison, is in use by very few customers and not well understood, yet is supported by SAP for configurations of up to 4 nodes. Does this mean that a scale-out architecture should always be used for BW and that scale-up is the only viable choice for S/4HANA? This blog post will discuss only BW and similar analytical environments including BW/4HANA, data marts, data lakes, etc. The next will discuss S/4HANA, and the third in the series will discuss vendor selection and where one might have an advantage over the others.
Scale-out has 3 key advantages over scale-up:
- Every vendor can participate therefore competitive bidding of “commodity” level systems can result in optimal pricing.
- High availability, using host auto-failover, requires nothing more than n+1 systems, as the hot standby node can take over the role of any other node (some customers choose n+2 or group nodes and standby nodes).
- Some environments are simply too large to fit in even the largest supported scale-up systems.
Scale-up, likewise, has 3 key advantages over scale-out:
- Performance is, inevitably, better as joins across memory are always faster than joins across a network
- Management is much simpler as query analysis and data distribution decisions need not be performed on a regular basis plus fewer systems are involved with the corresponding decrease in monitoring, updating, connectivity, etc.
- TCO can be lower when the costs of systems, storage, network and basis management are included.
Business requirements, as always, should drive the decision as to which to use. As mentioned, when an environment is simply too large, unless a customer is willing to ask for an exception from SAP (and SAP is willing to grant it), then scale-out may be the only option. Currently, SAP supports BW configurations of up to 6TB on many 8-socket Intel Skylake based systems (up to 12TB on HPE’s 16-socket system) and up to 16TB on IBM Power Systems.
The next most important issue is usually cost. Let’s take a simple example of an 8TB BW HANA requirement. With scale-out, 4 @ 2TB nodes may be used with a single 2TB node for hot standby for a total of 10TB of memory. If scale-up is used, the primary system must be 8TB and the hot-standby another 8TB for a total of 16TB of memory. Considering that memory is the primary driver of the cost of acquisition, 16TB, from any vendor, will cost more than 10TB. If the analysis stops there, then the decision is obvious. However, I would strongly encourage all customers to examine all costs, not just TCA.
In the above example, 5 systems are required for the scale-out configuration vs. 2 for scale-up. The scale-out config could be reduced to 4 systems if 3TB nodes are used with 1TB left unused although the total memory requirement would go up to 12TB. At a minimum, twice the management activities, trouble-shooting and connectivity would be required. Also, remember, prod rarely exists on its own with some semblance of the configuration existing in QA, often DR and sometimes other non-prod instances.
The other set of activities is much more intensive. To distribute load amongst the systems, first data must be distributed. Some data must reside on the master node, e.g. all row-store tables, ABAP tables and general operations tables. Other data, such as Fact, DataStore Object (DSO) and Persistent Staging Area (PSA) tables, is distributed evenly across the slave nodes based on the desired partitioning specification, e.g. hash, round robin or range. There are also more complex options where specifications can be mixed to get around hash or range limitations and create a multi-level partitioning plan. And, of course, you can partition different tables using different specifications. Which set of distribution specifications you use is highly dependent on how data is accessed, and this is where it gets really complicated. Most customers start with a simple specification, then begin monitoring placement using the table distribution editor and performance using ST03N, plus getting feedback from end users (read that as complaints to the help desk). After some period of time and analysis of performance, many customers elect to redistribute data using a better or more complex set of specifications. Unfortunately, what is good for one query, e.g. distributing data based on month, is bad for another which looks for data based on zipcode, customer name or product number, as illustrated in the sketch below. Some customers report that the above set of activities can consume part or all of one or more FTEs.
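To make the trade-off concrete, here is a deliberately simplified sketch, not HANA’s actual partitioning engine, showing why a month-based distribution helps a month-restricted query but leaves a zipcode-restricted query scanning every node.

```python
# Simplified sketch of the distribution trade-off -- not HANA's partitioning engine.
NODES = 3   # slave nodes holding partitioned fact data

def node_for(partition_key):
    """Hash-style placement: a row lands on a node based on the partitioning column."""
    return hash(partition_key) % NODES

rows = [
    {"month": "2019-01", "zipcode": "78701"},
    {"month": "2019-01", "zipcode": "10001"},
    {"month": "2019-02", "zipcode": "78701"},
    {"month": "2019-03", "zipcode": "94105"},
]

# Table partitioned on month: a query restricted to one month touches only the
# node(s) holding that month's partition.
month_query_nodes = {node_for(r["month"]) for r in rows if r["month"] == "2019-01"}
print("month = 2019-01 touches nodes:", month_query_nodes)    # a single node

# The same table queried by zipcode gets no placement hint from the month-based
# specification, so every node must be scanned and results merged over the network.
print("zipcode = 78701 touches nodes:", set(range(NODES)))    # all nodes
```

Reverse the specification to favor zipcode and the month queries suffer instead, which is exactly why ongoing monitoring and periodic redistribution become a standing task.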
Back to the above example: 10TB vs. 16TB, which we will assume is replicated in QA and DR for the sake of argument, i.e. the scale-up solution requires 18TB more memory. If the price per TB is $35,000, then the cost difference in TCA would be $630,000. The average cost of a senior basis administrator (required for this sort of complex task) in most western countries is in the $150,000 range. That means that over the course of 5 years, the TCO of the scale-up solution, considering only TCA and basis admin costs, would be roughly equivalent to the cost of the scale-out solution. Systems, storage and network administration costs could push the TCO of the scale-out solution up relative to the scale-up solution.
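The arithmetic behind that comparison, using only the assumptions stated above ($35,000 per TB of memory and roughly one senior basis FTE at $150,000 per year for the ongoing data distribution work), looks like this.

```python
# Worked version of the example above; every figure is an assumption from the text.
MEM_COST_PER_TB = 35_000       # assumed memory price per TB
BASIS_FTE_PER_YEAR = 150_000   # assumed cost of a senior basis administrator
YEARS = 5

scale_out_tb = 10              # 4 x 2TB worker nodes + 1 x 2TB hot standby
scale_up_tb = 16               # 8TB primary + 8TB hot standby
environments = 3               # prod, plus the same footprint assumed for QA and DR

extra_memory_tb = (scale_up_tb - scale_out_tb) * environments   # 18 TB
scale_up_memory_premium = extra_memory_tb * MEM_COST_PER_TB     # $630,000
scale_out_admin_cost = BASIS_FTE_PER_YEAR * YEARS               # ~1 FTE for data distribution

print(f"Extra memory for scale-up (TCA):      ${scale_up_memory_premium:,}")
print(f"Extra basis admin for scale-out (5y): ${scale_out_admin_cost:,}")
print(f"Net 5-year difference:                ${scale_up_memory_premium - scale_out_admin_cost:,}")
```

With these assumptions the two deltas land within roughly $120,000 of each other over five years, before the additional systems, storage and network administration for five nodes is even counted.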
And then there is performance. Some very high performance network adapter companies have been able to drive TCP latency across 10Gb Ethernet down to 3.6us, which sounds really good until you consider that memory latency is around 120ns, i.e. 30 times faster. Joining tables across nodes is not only substantially slower, but also results in more CPU and memory overhead.[ii] A retailer in Switzerland, Coop Group, reported 5 times quicker analytics while using 85% fewer cores after migrating from an 8-node x86 scale-out BW HANA cluster with 320 total cores to a single scale-up 96-core IBM Power System.[iii] While various benchmarks suggest 2x or better per-core performance of Power Systems vs. x86, the results suggest far higher, much of which can, no doubt, be attributed to the effect of using a scale-up architecture.
Of course, performance is relative. BW queries run with scale-out HANA will usually outperform BW on a conventional DB by an order of magnitude or more. If this is sufficient for business purposes, then it may be hard to build a case for why faster is required. But end users have a tendency to soak up additional horsepower once they understand what is possible. They do this in the form of more what-if analyses, interactive drill downs, more frequent mock-closes, etc.
If the TCO is similar or better and a scale-up approach delivers superior performance with many fewer headaches and calls to the help desk for intermittent performance problems, then it would be very worthwhile to investigate this option.
To recap: for BW HANA and similar analytical environments, scale-out architectures usually offer the lowest TCA plus scalability beyond the largest scale-up environment. Scale-up architectures offer significantly easier administration, much better performance and competitive to superior TCO.
[i] https://blogs.saphana.com/2014/12/10/sap-hana-scale-scale-hardware/
[ii] https://launchpad.support.sap.com/#/notes/2044468 (see FAQ 8)
[iii] https://www.ibm.com/case-studies/coop-group-technical-reference
Lintel for SAP App Servers – The right choice
Or is it? Running SAP application servers on IBM Power Systems with Linux results in a lower TCA than using x86 systems with Linux and VMware. Usually, I don’t start a blog post with the conclusion, but was so amazed by the results of this analysis, that I could not help myself.
For several years now, I have seen many customers move older legacy app servers to x86 systems using Linux and VMware, as well as implementing new SAP app servers on the same. When asked why, the answers boil down to cost, skills and standards. Historically, Lintel servers were not just perceived to cost less; the cost differences could be easily demonstrated. Students emerging from colleges have worked with Linux far more often than with UNIX, and although learning UNIX and how it is implemented in actual production environments makes very little difference in real effort/cost, the perception of Linux skills being more plentiful and consequently less expensive persists. Many companies and government entities have decided to standardize on Linux. For some legacy IBM Power Systems customers, a complicating factor, or perhaps a compelling factor in the analysis, has been a comparison of partitions on large enterprise-class systems against low-cost 2-socket x86 servers. And so, increasingly, enterprises have defaulted to Lintel as the app server of choice.
Something has changed, however, and it is completely overturning the conventional wisdom discussed above. There is a new technology which takes advantage of all of those Linux skills on the market, obeys the Linux standards mandate and costs less than virtualized Lintel systems. What is this amazing new technology? Surprise, it is a descendant of the technology introduced in 1990 by IBM called the RS/6000, in the form of new Linux-only POWER8 systems. (OK, since you know that I am an IBM Power guy and I gave away the conclusion at the start of this post, that was probably not much of a surprise.) At least, this is what the marketing guys have been telling us, and they have some impressive consultant studies and internal analyses that back up their claims.
For those of you who have been following this blog for a while, you know that I am skeptical of consultant analyses and even more so of internal analyses. So, instead of depending on those, I set out to prove, or disprove, this assertion. The journey began with setting reasonable assumptions. Actually, I went a little overboard and gave every benefit of the doubt to x86 and did the opposite for Power.
Overhead – The pundits, both internal and external, seem to suggest that 10% or more overhead for VMware is reasonable. Even VMware’s best practices guide for HANA suggests an overhead of 10%. However, I have heard some customers claim that 5% is possible. So, I decided to use the most favorable number and settled on 5% overhead. PowerVM does have overhead, but it is already baked into all benchmarks and sizings since it is built into the embedded hypervisor, i.e. it is there even when you are running a single virtual machine on a system.
Utilization – Many experts have suggested that average utilization of VMware systems falls in the 20% to 30% range. I found at least one analyst who said that the best-run shops can drive their VMware systems up to 45%. I selected 45%, once again since I want to give all of the benefit of the doubt to Lintel systems. By comparison, many experts suggest that 85% utilization is reasonable for PowerVM based systems, but I selected 75% simply to not give Power all of the benefit of the doubt that I was giving to x86.
SAPS – Since we are talking about SAP app servers, it is logical to use SAP benchmarks. The best result that I could find for a 2 socket Linux Intel Haswell-EP system was posted by Dell @ 90,120 SAPS (1). A similar 2-socket server from IBM was posted @ 115,870 SAPS (2).
IBM has internal sizing tables, as does every vendor, in which it estimates the SAPS capacity of different servers based on different OSs. One of those servers, the Power S822L, a 20-core Linux only system, is estimated to be able to attain roughly 35% less SAPS than the benchmark result for its slightly larger cousin running AIX, but this takes into consideration differences in MHz, number of cores and small differences due to the compilers used for SAP Linux binaries.
For our hypothetical comparison, let us assume that a customer needs approximately the SAPS capacity that can be attained with three Lintel systems running VMware, including the 5% overhead mentioned above, a sustained utilization of 45% and 256GB per server. Extrapolating the IBM numbers, including no additional PowerVM overhead and a sustained utilization of 75%, results in a requirement of two S822L systems, each with 384GB.
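For anyone who wants to retrace that sizing, here is a minimal sketch of the arithmetic under the stated assumptions; note that the roughly 35% Linux derating of the S822L is the internal estimate mentioned above, not a published benchmark result.

```python
# Sketch of the sizing logic using the figures above. The benchmark numbers come
# from the cited SD results; the ~35% Linux derating for the S822L is the internal
# estimate quoted in the text, not a published benchmark.

# x86 side: 2-socket Haswell-EP result, 5% VMware overhead, 45% sustained utilization
x86_rated = 90_120
x86_effective = x86_rated * (1 - 0.05) * 0.45
print(f"per Lintel server: {x86_effective:,.0f} effective SAPS; "
      f"three servers: {3 * x86_effective:,.0f}")

# Power side: S824 AIX result derated ~35% for the Linux-only S822L, no additional
# PowerVM overhead (already baked into the benchmark), 75% sustained utilization
s824_rated = 115_870
s822l_effective = s824_rated * (1 - 0.35) * 0.75
print(f"per S822L:         {s822l_effective:,.0f} effective SAPS; "
      f"two servers:   {2 * s822l_effective:,.0f}")
```

Three Lintel servers and two S822Ls land within a few percent of each other, which is what makes the 3-versus-2 comparison roughly apples to apples.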
Lenovo, HP and Dell all offer easy to use configurators on the web. I ran through the same configuration for each: 2 @ Intel Xeon Processor E5-2699 v3 18C 2.3GHz 45MB Cache 2133MHz 145W, 16 @ 16GB x4 DIMMS, 1 @ Dual-port 10GB Base-T Ethernet adapter, 2 @ 300GB 10K RPM disk (2 @ 1TB 7200 RPM for Dell) and 24x7x4 hour 3-year warranty upgrades (3). Both the Lenovo and HP sites show an almost identical number for RedHat Enterprise Linux with unlimited guests (Dell’s was harder to decipher since they apply discounts to the prices shown), so for consistency, I used the same price for RHEL including 3-yr premium subscription and support. VMware also offers their list prices on the web and the same numbers were used for each system, i.e. Version 5.5, 2-socket, premium support, 3yr (4).
The configuration for the S822L was created using IBM’s eConfig tool: 2 @ 10-core 3.42 GHz POWER8 Processor Card, 12 @ 32GB x4 DIMMS, 1 @ Dual-port 10GB Base-T Ethernet adapter, 2 @ 300GB 10K RPM disk and a 24x7x4 hour 3-year warranty upgrade, RHEL with unlimited guests and 3yr premium subscription and support and PowerVM with unlimited guests, 3yr 24×7 support (SWMA). Quick disclaimer; I am not a configuration expert with IBM’s products much less those from other companies which means there may be small errors, so don’t hold me to these numbers as being exact. In fact, if anyone with more expertise would like to comment on this post and provide more accurate numbers, I would appreciate that. You will see, however, that all three x86 systems fell in the same basic range, so small errors are likely of limited consequence.
The best list price among the Lintel vendors came in at $24,783 including the warranty upgrade. RHEL 7 came in at $9,259 and VMware @ $9,356, for a grand total of $43,398 and, for 3 systems, $130,194. For the IBM Power System, the hardware list was $33,136 including the warranty upgrade, PowerVM for Linux $10,450 and RHEL 7 $6,895, for a grand total of $51,109 and, for 2 systems, $102,218.
So, for equivalent effective SAPS capacity, Lintel systems cost around $130K vs. $102K for Power … and this is before we consider the reliability and security advantages not to mention scalability, peak workload handling characteristics, reduced footprint, power and cooling. Just to meet the list price of the Power System, the Lintel vendors would have to deliver a minimum of 22% discount including RHEL and VMware.
Conclusions:
For customers making HANA decisions, it is important to note that the app server does not go away and that SAP fully supports heterogeneous configurations, i.e. it does not matter if the app server is on a different platform or even a different OS than the HANA DB server. This means that Linux-based Power boxes are the perfect companion to HANA DB servers regardless of vendor.
For customers that are refreshing older Power app servers, the comparisons can be a little more complicated, in that there is a reasonable case to be made for running app servers on enterprise-class systems that potentially also house database servers: higher effective utilization, higher reliability, the ability to run app servers in an IFL (Integrated Facility for Linux) at very attractive prices, increased efficiencies and improved speeds through the use of virtual Ethernet for app-to-DB communications. That said, any analysis should start with like for like, e.g. two-socket scale-out Linux servers, and then consider any additional value that can be gained through the use of AIX (with Active Memory Expansion) and/or enterprise-class servers with or without IFLs. As such, this post makes a clear point that, in a worst-case scenario, scale-out Linux-only Power Systems are less expensive than x86. In a best-case scenario, the TCO, reliability and security advantages of enterprise-class Power Systems make the value proposition of IBM Power even more compelling.
For customers that have already made the move to Lintel, the message is clear. You moved for sound economic, skills and standards based reasons. When it is time to refresh your app servers or add additional ones for growth or other purposes, those same reasons should drive you to make a decision to utilize IBM Power Systems for your app servers. Any customer that wishes to pursue such an option is welcome to contact me, your local IBM rep or an IBM business partner.
Footnotes:
1. Dell PowerEdge R730 – 2 Processors / 36 Cores / 72 Threads 16,500 users, Red Hat Enterprise Linux 7, SAP ASE 16, SAP enhancement package 5 for SAP ERP 6.0, Intel Xeon Processor E5-2699 v3, 2.3 Ghz, 262,144MB, Cert # 2014033, 9/10/2014
2. IBM Power System S824, 4 Processors / 24 Cores / 192 Threads, 21,212 Users, AIX 7.1, DB2 10.5, SAP enhancement package 5 for SAP ERP 6.0, POWER8, 3.52 Ghz, 524,288MB, Cert # 2014016, 4/28/2014
3. https://www-01.ibm.com/products/hardware/configurator/americas/bhui/flowAction.wss?_eventId=launchNIConfigSession&CONTROL_Model_BasePN=5462AC1&_flowExecutionKey=_cF5B38036-BD56-7C78-D1F7-C82B3E821957_k34676A10-590F-03C2-16B2-D9B5CE08DCC9
http://configure.us.dell.com/dellstore/config.aspx?c=us&cs=04&fb=1&l=en&model_id=poweredge-r730&oc=pe_r730_1356&s=bsd&vw=classic
http://h71016.www7.hp.com/MiddleFrame.asp?view=std&oi=E9CED&BEID=19701&SBLID=&AirTime=False&BaseId=45441&FamilyID=3852&ProductLineID=431
Rebuttal to “Why choose x86” for SAP blog posting
I was intrigued by a recent blog post, entitled: Part 1: SAP on VMware : Why choose x86. https://communities.vmware.com/blogs/walkonblock/2014/02/06/part-1-sap-on-vmware-why-choose-x86. I will get to the credibility of the author in just a moment. First, however, I felt it might be interesting to review the points that were made and discuss these, point by point.
- No Vendor Lock-in: “When it comes to x86 world, there is no vendor lock-in as you can use any vendor and any make and model as per your requirements”. Interesting that the author did not discuss the vendor lock-in on chip, firmware or hypervisor. Intel, or to a very minor degree, AMD, is required for all x86 systems. This would be like being able to choose any car as long as the engine was manufactured by Toyota (a very capable manufacturer but with a lock on the industry, might not offer the best price or innovation). As any customer knows, each x86 system has its own unique BIOS and/or firmware. Sure, you can switch from one vendor to another or add a second vendor, but lacking proper QA, training, and potentially different operational procedures, this can result in problems. And then there is the hypervisor with VMware clearly the preference of the author as it is for most SAP x86 virtualization customers. No lock-in there?
SAP certifies multiple different OS and hypervisor environments for their code. Customers can utilize one or more at any given time. As all logic is written in 3rd and 4th GL languages, i.e. ABAP and JAVA, and is contained within the DB server, customers can move from one OS, HW platform and/or hypervisor to another and only have to, wait for it, do proper QA, training and modify operational procedures as appropriate. So, SAP has removed lock-in regardless of OS, HW or hypervisor.
Likewise, Oracle, DB2 and Sybase support most OS’s, HW and hypervisors (with some restrictions). Yes, a migration is required for movement between dissimilar stacks, but this could be said for moving from Windows to Linux and any move between different stacks still requires all migration activities to be completed with the potential exception of data movement when you “simply” change the HW vendor.
- Lower hardware & maintenance costs: “x86 servers are far better than cheaper than non-x86 servers. This also includes the ongoing annual maintenance costs (AMC) as well.” Funny, however, that the author only compared HW and maintenance costs and conveniently forgot about OS and hypervisor costs. Also interesting that the author forgot about utilization of systems. If one system is ½ the cost of another, but you can only drive, effectively, ½ the workload, then the cost is the same per unit of work. Industry analysts have suggested that 45% utilization is the maximum sustained to be expected out of VMware SAP systems with most seeing far less. By the same token, those analysts say that 85% or higher is to be expected of Power Systems. Also interesting to note that the author did not say which systems were being compared as new systems and options from IBM Power Systems offer close to price parity with x86 systems when HW, OS, hypervisor and 3 years of maintenance are included.
- Better performance: “Some of the models of x86 servers can actually out-perform the non-x86 servers in various forms.” Itanium is one of the examples, which is a no-duh for anyone watching published benchmarks. The other example is a Gartner paper sponsored by Intel which does not actually quote a single SAP benchmark. Too bad the author suggested this was a discussion of SAP. Last I checked (today, 2/10/14), IBM Power Systems can deliver almost 5 times the SAPS performance of the largest x86 server (as measured by the 2-tier SD benchmark). On a SAPS/core basis, Power delivers almost 30% more SAPS/core compared to Windows systems and almost 60% more than Linux/x86 systems. Likewise, on the 3-tier benchmark, the latest Power result is almost 4.5 times that of the latest x86 result. So much for point 3.
- Choice of OS: “You have choice of using any OS of your choice and not forced to choose a specific OS.” Yes, it really sucks that with Power, you are forced to choose, AIX … or IBM I for Business … or SUSE Linux … or RedHat Linux which is so much worse than being forced to choose Microsoft Windows … or Oracle Solaris … or SUSE Linux … or RedHat Linux.
- Disaster Recovery: “You can use any type of hardware, make and model when it comes to disaster recovery (DR). You don’t need to maintain hardware from same vendor.” Oh, really? First, I have not met any customers that use one stack for production and a totally different one in DR, but that is not to say that it can’t be done. Second, remember the discussion about BIOS and firmware? There can be different patches, prerequisites and workarounds for different stacks. Few customers want to spend all of the money they “saved” by investing in a separate QA cycle for DR. Even fewer want to take a chance of DR not working when they can least afford it, i.e. when there is a disaster. Interestingly, Power actually supports this better than x86 as the stack is identical regardless of which generation, model, mhz is used. You can even run in Power6 mode on a Power7+ server further enabling complete compatibility regardless of chip type meaning you can use older systems in DR to back up brand new systems in production.
- Unprecedented scalability: “You can now scale the x86 servers the way you want, TB’s of RAM’s , more than 64 cores etc is very much possible/available in x86 environment.” Yes, any way that you want as long as you don’t need more capacity than is available with the current 80-core systems. Any way that you want as long as you are not running with VMware, which limits partitions to 128 threads, which equates to 64 cores. Any way that you want as long as you ignore that VMware suggests you contain partitions within a NUMA block, which means a max of 40 cores. http://blogs.vmware.com/apps/sap Any way that you want as long as you recognize that VMware partitions are further limited in terms of scalability, which results in an effective limit of 32 threads/16 cores as I have discussed in this blog previously.
- Support from Implementation Vendor: “If you check with your implementation vendor/partner, you will find they that almost all of them can certify/support implementation of SAP on x86 environment. The same is the case if you are thinking about migrating from non-x86 to x86 world.” No clue what point is being made here as all vendors on all supported systems and OSs support SAP on their systems.
The author referred to my blog as part of the proof of his/her theories which is the only reason why I noticed this blog in the first place. The author describes him/herself as “Working with Channel Presales of an MNC”. Interesting that he/she hides him/herself behind “MNC” because the “MNC” that I work for believes that transparency and honesty are required in all internet postings. That said, the author writes about nothing but VMware, so you will have to draw your own conclusions as to where this individual works or with which “MNC” his/her biases lie.
The author, in the reference to my posting, completely misunderstood the point that I made regarding the use of 2-tier SAP benchmark data in projecting the requirements of database only workloads and apparently did not even read the “about me” which shows up by default when you open my blog. I do not work for SAP and nothing that I say can be considered to represent them in any way.
Fundamentally, the author’s bottom line comment, “x86 delivers compelling total cost of ownership (TCO) while considering SAP on x86 environment” is neither supported by the facts that he/she shared nor by those shared by others. IBM Power Systems continues to offer very competitive costs with significantly superior operational characteristics for SAP and non-SAP customers.
High end Power Systems customers have a new option for SAP app servers that is dramatically less expensive than x86 Linux solutions
Up until recently, if you were expanding the use of your SAP infrastructure or had some older Power Systems that you were considering replacing with x86 Linux systems, I could give you a TCO argument that showed how you could see roughly equivalent TCO using lower-end Power servers. Of course, some people might not buy into all of the assumptions or might state that Linux was their new standard, such that AIX was no longer an acceptable option. Recently, IBM made an announcement which has changed the landscape so dramatically that you can now obtain the needed capacity using high-end server “dark cores” with Linux, not at an equivalent TCO, but at a dramatically lower TCA.
The new offering is called IFL which stands for Integrated Facility for Linux. This concept originated with System Z (aka mainframe) several years ago. It allows customers that have existing Power 770, 780 or 795 servers with capacity on demand “dark cores”, i.e. for which no workload currently runs and the license to use the hardware, virtualization and OS software have not been activated, to turn on a group of cores and memory specifically to be used for Linux only workloads. A Power IFL is composed of 4 cores with 32GB of memory and has a list price of $8,591.
In the announcement materials provided by IBM Marketing, an example is provided of a customer that would need to add the equivalent of 16 cores @ 80% utilization and 128GB of memory to an existing Power 780 4.4GHz system or would need the equivalent capacity using a 32-core HP DL560 2.7GHz system running at 60% utilization. They used SPECint_rate as the basis of this comparison. Including 3 year license for PowerVM, Linux subscription and support, 24×7 hardware maintenance and the above mentioned Power activations, the estimated street price would be approximately $39,100. By comparison, the above HP system plus Linux subscription and support, VMware vSphere and 24×7 hardware maintenance would come in at an estimated street price of approximately $55,200.
Already sounds like a good deal, but I am a skeptic, so I needed to run the numbers myself. I find SPECint_rate to be a good indicator of performance for almost no workloads and an incredibly terrible indicator of performance for SAP workloads. So, I took a different approach. I found a set of data from an existing SAP customer of IBM which I then used to extrapolate capacity requirements. I selected the workloads necessary to drive 16 cores of a Power 780 3.8GHz system @ 85% utilization. Why 85%? Because we, and independent sources such as Solitaire Interglobal, have data from many large customers that report routinely driving their Power Systems to a sustained utilization of 85% or higher. I then took those exact same workloads and modeled them onto x86 servers assuming that they would be virtualized using VMware. Once again, Solitaire Interglobal reports that almost no customers are able to drive a sustained utilization of 45% in this environment and that 35% would be more typical, but I chose a target utilization of 55% instead to make this as optimistic for the x86 servers as possible. I also applied only a 10% VMware overhead factor although many sources say that is also optimistic. It took almost 6 systems with each hosting about 3 partitions to handle the same workload as the above 16-core IFL pool did.
Once again, I was concerned that some of you might be even more optimistic about VMware, so I reran the model using a 65% target utilization (completely unattainable in my mind, but I wanted to work out the ultimate, all stars aligned, best admins on the planet, tons of time to tune systems, scenario) and 5% VMware overhead (I don’t know anyone that believes VMware overhead to be this low). With each system hosting 3 to 4 partitions, I was able to fit the workloads on 5 systems. If we just go crazy with unrealistic assumptions, I am sure there is a way that you could imagine these workloads fitting onto 4 systems.
Next, I wanted to determine the accurate price for those x86 systems. I used HP’s handy on-line ordering web site to price some systems. Instead of the DL560 that IBM Marketing used, I chose the DL360e Gen8 system, with 2@8-core 1.8GHz processors, 64GB of memory, a pair of 7200rpm 500GB hard drives, VMware Enterprise for 2 processors with 3 yr subscription, RH Enterprise Linux 2 socket/4 guest with 3 yr subscription, 3yr 24×7 ProCare Service and HP installation services. The total price comes to $27,871 which after an estimated discount of 25% on everything (probably not realistic), results in a street price of $20,903.
Let’s do the math. Depending on which x86 scenario you believe is reasonable, it either takes 6 systems at a cost of $125,419, 5 systems @ $104,515 or 4 systems @ $83,612 to handle the same load as a 4 IFL/16-core pool of partitions on a 780 at a cost of $39,100. So, in the most optimistic case for x86, you would still have to pay $44,512 more. It does not take a rocket scientist to realize that using Power IFLs would result in a far less expensive solution with far better reliability and flexibility characteristics not to mention better performance since communication to/from the DB servers would utilize the radically faster backplane instead of an external TCP/IP network.
But wait, you say. There is a better solution. You could use bigger x86 systems with more partitions on each one. You are correct. Thanks for bringing that up. Turns out, just as with Power Systems, if you put more partitions on each VMware system, the aggregate peaks never add up to the sum of the individual peaks. Using 32-core DL560s @ 2.2GHz, 5% VMware overhead and 65% target utilization, you would only need 2 systems. I priced them on the HP web site with RH Linux 4 socket/unlimited guests 3yr subscription, VMware Enterprise 4 socket/3yr, 24×7 ProCare and HP installation service and found the price to be $70,626 per system, i.e. $141,252 for two systems, or $105,939 after the same, perhaps unattainable, 25% discount. Clearly, 2 systems are more elegant than 4 to 6, but this solution is still $66,839 more expensive than the IFL solution.
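Pulling those scenarios together, here is a small summary calculation; the prices are simply the street-price estimates used above (x86 after the assumed 25% discount, the IFL pool at IBM Marketing’s estimated street price).

```python
# Summary of the scenarios above; all prices are the estimates quoted in the text.
ifl_pool = 39_100                  # 4 IFLs (16 cores/128GB) incl. PowerVM, RHEL, maintenance

dl360e_street = 27_871 * 0.75      # ~$20,903 after the assumed 25% discount
dl560_street = 70_626 * 0.75       # ~$52,970 after the same discount

scenarios = {
    "6 x DL360e (10% overhead, 55% utilization)": 6 * dl360e_street,
    "5 x DL360e (5% overhead, 65% utilization)":  5 * dl360e_street,
    "4 x DL360e (beyond-optimistic case)":        4 * dl360e_street,
    "2 x DL560 (fewer, larger nodes)":            2 * dl560_street,
}

for label, cost in scenarios.items():
    print(f"{label:<44} ${cost:>10,.0f}  premium over IFLs: ${cost - ifl_pool:>9,.0f}")
```

Even the most generous x86 scenario carries a premium of more than $44,000 over the IFL pool, and the more realistic ones roughly double or triple that gap.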
I started off to try and prove that IBM Marketing was being overly optimistic and ended up realizing that they were highly conservative. The business case for using IFLs for SAP app servers on an existing IBM high end system with unutilized dark cores compared to net new VMware/Linux/x86 systems is overwhelming. As many customers have decided to utilize high end Power servers for DB due to their reliability, security, flexibility and performance characteristics, the introduction of IFLs for app servers is almost a no-brainer.
Configuration details:
HP ProLiant DL360e Gen8 8 SFF Configure-to-order Server – (Energy Star) 661189-ESC $11,435.00
HP ProLiant DL360e Gen8 Server
HP DL360e Gen8 Intel® Xeon® E5-2450L (1.8GHz/8-core/20MB/70W) Processor FIO Kit x 2
HP 32GB (4x8GB) Dual Rank x4 PC3L-10600 (DDR3-1333) Reg CAS-9 LP Memory Kit x 2
HP Integrated Lights Out 4 (iLO 4) Management Engine
HP Embedded B120i SATA Controller
HP 8-Bay Small Form Factor Drive Cage
HP Gen8 CPU1 Riser Kit with SAS Kit + SAS License Kit
HP 500GB 6G SATA 7.2K rpm SFF (2.5-inch) SC Midline 1yr Warranty Hard Drive x 2
HP 460W Common Slot Platinum Plus Hot Plug Power Supply
HP 1U Small Form Factor Ball Bearing Gen8 Rail Kit
3-Year Limited Warranty Included
3yr, 24×7 4hr ProCare Service $1,300.00
HP Install HP ProLiant $225.00
Red Hat Enterprise Linux 2 Sockets 4 Guest 3 Year Subscription 24×7 Support No Media Lic E-LTU $5,555.00
VMware vSphere Enterprise 1 Processor 3 yr software $4,678.00 x 2 = $9,356.00
DL360e Total price $27,871.00
ProLiant DL560 Gen8 Configure-to-order Server (Energy Star) 686792-ESC $29,364.00
HP ProLiant DL560 Gen8 Configure-to-order Server
HP DL560 Gen8 Intel® Xeon® E5-4620 (2.2GHz/8-core/16MB/95W) Processor FIO Kit
HP DL560 Gen8 Intel® Xeon® E5-4620 (2.2GHz/8-core/16MB/95W) Processor Kit x3
HP 16GB (2x8GB) Dual Rank x4 PC3L-10600 (DDR3-1333) Reg CAS-9 LP Memory Kit x 4
ENERGY STAR® qualified model
HP Embedded Smart Array P420i/2GB FBWC Controller
HP 500GB 6G SAS 7.2K rpm SFF (2.5-inch) SC Midline 1yr Warranty Hard Drive x 2
HP iLO Management Engine(iLO 4)
3 years parts, labor and onsite service (3/3/3) standard warranty. Certain restrictions and exclusions apply.
HP 3y 4h 24×7 ProCare Service $3,536.00
Red Hat Enterprise Linux 4 Sockets Unlimited Guest 3 Yr Subscription 24×7 Support No Media Lic E-LTU $18,519.00
VMware vSphere Enterprise 1 Processor 3 yr software $4,678.00 x 4 = $18,712.00
HP Install DL560 Service $495.00
DL560 Total price: $70,626.00
SAP TechEd Video – Shall I Stay or Shall I Go … to x86? Technical Factors and Considerations.
For any that might be interested in seeing a presentation that I delivered at SAP TechEd Las Vegas 2011, please click on this link. This presentation discusses why customers might consider x86 for their SAP environments and the reasons why Power Systems may deliver lower costs, better reliability and security. The video lasts approximately 60 minutes.
IBM Power Systems compared to x86 for SAP landscapes
It seems like every other day, someone asks me to help them justify why a customer should select IBM Power Systems over x86 alternatives for new or existing SAP customers. Here is a short summary of the key attributes that most customers require and the reasons why Power Systems excels or conversely, where x86 systems fall short.
TCO – Total Cost of Ownership is usually at the top of everyone’s list. Often this is confused with TCA or Total Cost of Acquisition. TCA can be very important for some individuals within customer organizations, especially when those individuals are only responsible for capital acquisition costs and not operational costs such as maintenance, power, cooling, floor space, personnel, software and other assorted costs. TCA can also be important when only capital budgets are restricted. For most customers, however, TCO is far more important. Some evaluators compare systems, one for one. While this might seem to make sense, would it be reasonable to compare a pickup truck and an 18-wheeler semi? Obviously not, so, to do a fair job of comparing TCO, a company must look at all aspects, purposes and effects of different choices. For instance, with IBM Power Systems, customers routinely utilize PowerVM, the IBM Power virtualization technology, to combine many different workloads including ERP, CRM, BW, EP, SCM, SRM and other production database and application servers, high availability servers, backup/recovery servers and non-production servers onto a single, small set of servers. While some of this is possible with x86 virtualization technologies, it is rarely done, partly due to “best practices” separation of workloads and also due to support restrictions by some software products, such as Oracle database, when used in a virtualized x86 environment. This typically results in a requirement for many more servers. Likewise, many Power Systems customers routinely drive their utilization to 80% or higher, where the best of x86 virtualization customers rarely drive to even 50% utilization. Taken together, it is very common to see 2 or 3 times the number of systems for x86 customers than for equivalently sized Power Systems customers and I provided only two reasons of the many frequently experienced by SAP customers. So, where an individual Power System might be slightly higher in cost than the equivalent x86 server, full SAP landscapes on Power Systems often require far fewer systems. Between a potentially lower cost of acquisition and the associated lower cost of management, less power, cooling, floor space and often lower cost of third party software, customers can see a significantly lower TCO with IBM Power Systems.
For customers which are approaching the limits on their data centers, either in terms of floor space, power or cooling, x86 horizontal proliferation may drive the need for data center expansion that could cost into the many millions of dollars. Power Systems may help customers to achieve radically higher levels of consolidation through its far more advanced virtualization and much higher scalability thereby potentially avoiding the need for that data center expansion. The savings, in this event, would make the other savings seem trivial by comparison.
Reliability – A system which is low cost but suffers relatively high numbers of outages may not be the best option for mission critical systems such as SAP. IBM Power Systems feature an impressive array of reliability technologies that are not available on any x86 system. This starts with failure detection circuitry which is built into the entire system including the processor chips and is called First Failure Data Capture (FFDC). FFDC has been offered and improved upon since the mid-90’s for Power Systems and its predecessors. This unique technology captures soft and hard errors from within the hardware allowing the service processor, standard with every system, to predict failures which could impact application availability and take preventive action such as dynamically deallocating components from adapter cards to memory and cache lines and even processor cores. Intel, starting with Nehalem-EX, offers Machine Check Architecture Recovery (MCA), their first version of a similar concept. As a first version, it is doubtful that it can approach the much more mature FFDC technology from IBM. Even more important is the “architecture” which, once errors are detected, passes that information, not to a service processor, but to the Operating System or Virtualization Manager with the “option” for that software to fix the problem in the hardware. This is like your car telling you that your braking system has a problem. Even if you have the mechanical ability to run advanced diagnostics, remove and replace parts, bleed the system, etc., this would involve a significant outage and most certainly could not be done on the fly. Likewise, it is extremely doubtful that Microsoft, for instance, is going to invest in software to fix a problem in an Intel processor especially since this area is likely going to change and only addresses one potential area of reliability. Furthermore, does Microsoft actually want to take on responsibility for hardware reliability? This is just one example, of many, that affect uptime, but without which SAP systems can be exposed.
Equally important is what happens if a problem does occur. Unless you are very lucky, you have experienced the Blue Screen of Death at least once, or a hundred times, in your past. This is one of those wonderful things that can occur when you don’t have a comprehensive reliability architecture such as that of IBM Power Systems. With x86 systems, essentially, the OS reports that a problem has occurred which could be related to the CPU, system hardware, OS, device driver, firmware, memory, application software, adapter cards, etc. and that your best course of action is to remove the last thing you installed and reboot your system. When you call your system vendor, they might suggest that you contact your OS vendor, which might suggest you contact your virtualization vendor, which might suggest the problem lies in your BIOS, and on and on. Who takes responsibility and ownership and drives the problem to resolution? With IBM Power Systems, IBM develops and supports its own CPU, firmware, system hardware, virtualization, device drivers, OS (assuming AIX or i for Business), memory controllers and buffer chips, and has a comprehensive set of rules and detection circuitry for third-party hardware and software. This means that in the very rare event that an intermittent or hard-to-identify error occurs which is not detected and corrected automatically, IBM takes ownership and resolves the problem unless it is determined that a third-party piece of hardware or software caused the problem. In that case, IBM works diligently with its partners to resolve the issue, which includes IBM personnel who work on site at many of their partner locations such as Oracle and SAP.
Security – Often an afterthought, but potentially an extremely expensive one, security should be carefully considered. PowerVM has never been successfully hacked, as noted at http://nvd.nist.gov. AIX has approximately 0% of Critical and High vulnerabilities and 2% of all OS vulnerabilities, compared with 73% and 27% for Microsoft, respectively, and 16% and 31% for Linux, respectively (X-Force report – Mid-year 2010, http://www-935.ibm.com/services/us/iss/xforce/trendreports/). A successful hack could result in anything from an inconvenience for the IT staff, to the loss of systems and/or, in a worst-case scenario, the theft of proprietary and/or personal data. SAP systems usually hold the crown jewels of an enterprise customer and should be among the best protected of any customer systems.
Bottom line – Where individual x86 systems may have a lower price tag than the equivalent Power System, full SAP landscapes will often require far fewer systems with Power Systems resulting in a lower TCO. Add to that much better reliability, fault detection, comprehensive problem resolution and ownership and rock solid security and the case for IBM Power Systems for SAP landscapes is pretty overwhelming.