SAPonPower

An ongoing discussion about SAP infrastructure

HPE acquisition of SGI implications to HANA customers

Last week, HP announced their intention to acquire SGI for $275M, a 30% premium over SGI's market cap prior to the announcement. This came as a surprise to most people, including me, both because HP would want to do this and because of how little SGI was worth, a fact that eWeek called "Embarrassing", http://www.eweek.com/innovation/hpe-buys-sgi-for-275-million-how-far-the-mighty-have-fallen.html. This raises a whole host of questions that HANA customers might want to consider.

First and foremost, there would appear to be three possible reasons why HP made this acquisition: 1) eliminate a key competitor and undercut Dell and Cisco at the same time, 2) acquire market share, 3) obtain access to superior technology and resources and/or keep them out of the hands of a competitor, or some combination of the above.

If HP considered SGI a key competitor, it means that HP was losing a large number of deals to SGI, a fact that has not been released publicly but which would imply that customers are not convinced of Superdome X’s strength in this market.  As many are aware, both Dell and Cisco have resell agreements with SGI for their high end UV HANA systems.  It would seem unlikely that either Dell or Cisco would continue such a relationship with their arch nemesis and as such, this acquisition will seriously undermine the prospects of Dell and Cisco to compete for scale-up HANA workloads such as Suite on HANA and S/4HANA among customers that may need more than 4TB, in the case of Dell, and 6TB, in the case of Cisco.

On the other hand, market share is a reasonable goal, but SGI’s total revenue in the year ending 6/26/15 was only $512M, which would barely be a rounding error on HP’s revenue of $52B for the year ending 10/31/15.  Hard to imagine that HP could be this desperate for a potential 1% increase in revenue, assuming they had 0% overlap in markets.  Of course, they compete in the same markets, so the likely revenue increase is considerably less than even that paltry number.

That brings us to the third option, that the technology of SGI is so good that HP wanted to get their hands on it. If that is the case, then HP would be admitting that the technology in Superdome X is inadequate for the demands of Big Data and analytics. I could not agree more and made such a case in a recent post on this blog. In that post, I noted the latency inherent in HP's minimum 8-hop round trip access to any off-board resources (remote memory accesses add another two hops); remember, there are only two Intel processors per board in a Superdome X system, which can accommodate up to 16 processors. Scale-up transactional workloads typically access data in rows dispersed across NUMA-aligned columns, i.e. they will constantly be traversing this high-latency network. Of course, this is not surprising since the architecture used in this system is INCREDIBLY OLD, having been developed in the early 2000s, i.e. way before the era of Big Data. But the surprising thing is that this would imply that HP believes SGI's architecture is better than their own. Remember, SGI's UV system uses a point-to-point, hand-wired, 4-bit wide mesh of wires between every two NUMA ASICs in the system, which, for the potential 32 sockets in a single-cabinet system, means 16 ASICs and 136 wires, if I have done the math correctly.

HP has been critical of the memory protection employed in systems like SGI's UV, which is based on SDDC (Single Device Data Correction). In HP's own words about their DDDC+1 memory protection: "This technology delivers up to a 17x improvement in the number of DIMM replacements versus those systems that use only Single-Chip Sparing technologies. Furthermore, DDDC +1 significantly reduces the chances of memory related crashes compared to systems that only have Single-Chip Sparing capabilities" (HP Integrity Superdome X system architecture and RAS whitepaper). What is really interesting is that SDDC does not even imply Single-Chip Sparing.
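Going back to the hand-wired mesh for a moment: as a quick sanity check of the wiring arithmetic, here is a minimal sketch that assumes a pure point-to-point full mesh between hub ASICs (an assumption of mine, not an SGI specification).

```python
# Back-of-the-envelope check of the hand-wired mesh claim above.
# Assumption (mine, not an SGI specification): every hub ASIC has a direct
# point-to-point link to every other hub ASIC, i.e. a full mesh of n nodes
# needs n * (n - 1) / 2 physical connections.

def full_mesh_links(num_asics: int) -> int:
    """Number of point-to-point links in a full mesh of num_asics nodes."""
    return num_asics * (num_asics - 1) // 2

for sockets in (8, 16, 32):
    asics = sockets // 2   # two hub ASICs per 4-socket board, per the post
    print(f"{sockets:>2} sockets -> {asics:>2} hub ASICs -> "
          f"{full_mesh_links(asics):>3} direct links")

# A pure full mesh of 16 ASICs works out to 120 links; the 136 wires cited
# above may include connections beyond the ASIC-to-ASIC mesh.
```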

The only one of those options which makes sense to me is the first one, but of course, I have no way of knowing which of the above is correct. One thing is certain: customers considering implementing HANA on a high-end scale-up architecture from either HP or SGI, and Dell and Cisco by extension, are going to have to rethink their options. HP has not stated which architecture will prevail or whether they will keep both, hard to imagine but not out of the question either. Without concrete direction from HP, it is possible that a customer decision for either architecture could result in almost immediate obsolescence. I would not enjoy being in the board room of a company, or meeting with my CFO, to explain how I made a decision for a multi-million dollar solution which is dead-ended, worth 1/2 or less overnight and for which a complete replacement will be required sooner rather than later. Likewise, it would be hard to imagine basing an upcoming decision for a strategic system to run the company's entire business on a single flip of a coin.

Now I am going to sound like a commercial for IBM, sorry. There is an alternative which is rock solid, increasing, not decreasing, in value, and has a strong and very clear roadmap: IBM Power Systems. POWER8 for HANA has seen one of the fastest market acceptance rates of any solution in recent memory, with hundreds of customers implementing HANA on Power Systems and/or purchasing systems in preparation for such an implementation, ranging from medium sized businesses in high growth markets to huge brand names. The roadmap was revealed earlier this year at the OpenPOWER Summit, http://www.nextplatform.com/2016/04/07/ibm-unfolds-power-chip-roadmap-past-2020/. This roadmap was further backed up by Google's announcement of their plans for POWER9, http://www.nextplatform.com/2016/04/06/inside-future-google-rackspace-power9-system/, the UK's STFC plans, http://www.eweek.com/database/ibm-powers-uk-big-data-research.html, worth £313M (roughly $400M based on current exchange rates), and the US DOE's decision to base its Summit and Sierra supercomputers on IBM POWER9 systems, http://www.anandtech.com/show/8727/nvidia-ibm-supercomputers, a $375M investment, either of which is worth more than the entire value of SGI, interestingly enough. More importantly, these two major wins mean IBM is now contractually obligated to deliver POWER9, thereby ensuring the Power roadmap for a long time to come. And, of course, IBM has a long history of delivering mission critical systems to customers, evolving and improving the scaling and workload handling characteristics over time while simultaneously improving system availability.


August 15, 2016 | Uncategorized

Large scale-up transactional HANA systems – part 2

Part 1 of this subject detailed the challenges of sizing large scale-up transactional HANA environments. This part will dive into the details and methodology by which customers may select a vendor in the absence of an independent transactional HANA benchmark.

Past history with large transactional workloads

Before I start down this path, it would first be useful to understand why it is relevant. HANA transaction processing utilizes many of the same techniques as a conventional database. It accesses rows; although each column is physically separate, the transaction does not know this and gets all of the data together in one place prior to presenting the results to the dialog calling it. Likewise, a write must follow ACID properties, including the rule that only one update against a piece of data can occur at any time, which requires that cache coherency mechanisms are employed to ensure this. And a write must occur both to a log and to the memory location of the data to be changed or updated. This sounds an awful lot like a conventional DB, which is why past history handling these sorts of transactional workloads makes plenty of sense.

HPE has a long history with large scale transactional workloads and Superdome systems, but this was primarily based on Integrity Superdome systems using Itanium processors and HP-UX, not on Intel x86 systems and Linux. Among the Fortune 100, approximately 20 customers utilized HPE's systems for their SAP database workloads, almost entirely based on Oracle with HP-UX. Not bad, and coming in second place to IBM Power Systems with approximately 40 of the Fortune 100 customers that use SAP. SGI has exactly 0 of those customers. Intel x86 systems represent 8 of that customer set, with 2 being on Exadata, not even close to a standard x86 implementation with its Oracle RAC and highly proprietary storage environment. Three of the remaining x86 systems are utilized by vendors whose very existence is dependent on x86, so running on anything else would be contradictory to their mission and these customers must make this solution work no matter what the expense and complexity might be. That leaves 3 customers, none of which utilize Superdome X technology for their database systems. To summarize: IBM Power has a robust set of high end current SAP transactional customers; HPE has a smaller set, entirely based on a different chip and OS than is offered with Superdome X; SGI has no experience in this space whatsoever; and x86 in general has limited experience confined to designs that have nothing in common with today's high end x86 technology.

Industry Standard Benchmarks

A bit of background. Benchmarks are lab experiments, open to optimization and exploitation by experts in the area, and have little resemblance to reality. Unfortunately, they are the only third-party metrics by which systems can be compared. Benchmarks fall into two general categories: those that are horrible and those that are not horrible (note I did not say good). Horrible ones sometimes test nothing but the speed of CPUs by placing the entire running code in instruction cache and the entire read-only dataset upon which the code executes in data cache, meaning no network or disk I/O, much less any memory I/O or cache coherency traffic. SPEC benchmarks such as SPECint2006 and SPECint_rate2006 fall into this category. They are uniquely suited to ccNUMA systems as there is absolutely no communication between any sockets, meaning they represent the best case scenario for a ccNUMA system.

It is therefore revealing that SGI, with 32 sockets and 288 cores, was only able to achieve 11,400 on this ideal ccNUMA benchmark, slightly beating HP Superdome X’s result of 11,100, also with 288 cores.  By comparison, the IBM Power Systems E880 with only 192 cores, i.e. 2/3 of the cores, achieved 14,400, i.e. 26% better performance.
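The per-core arithmetic makes the gap even starker; a quick sketch using only the figures quoted above:

```python
# Per-core view of the SPECint_rate2006 results quoted above.
results = {
    "SGI UV (32 sockets)": (288, 11_400),
    "HPE Superdome X":     (288, 11_100),
    "IBM Power E880":      (192, 14_400),
}

for system, (cores, rate) in results.items():
    print(f"{system:<22} {rate:>6} with {cores} cores -> {rate / cores:5.1f} per core")
```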

In descending order from horrible to not as bad, there are other benchmarks which can be used to compare systems. The list includes SAP SD 2-tier, SAP BW-EML, TPC-C and SAP SD 3-tier. Of those, the SD 2-tier has the most participation among vendors and includes real SAP code and a real database, but suffers from the database being a tiny percentage of the workload, approximately 6 to 8%, meaning that on ccNUMA systems, multiple app servers can be placed on each system board, resulting in only database communication going across a pretty darned fast network represented by the ccNUMA fabric. SGI is a no-show on this benchmark. HPE did show up with Superdome X @ 288 cores and achieved 545,780 SAPS (100,000 users, Ref# 2016002), which is still the world record. IBM Power showed up with the E870, an 80-core system (28% of the cores of the HPE system), and achieved 436,100 SAPS (79,750 users, Ref# 2014034), i.e. 80% of the SAPS of the HPE system. Imagine what IBM would have been able to achieve on this almost linearly scalable benchmark had they attempted to run it on the E880 with 192 cores (probably close to 436,100 * 192/80, although no vendor is allowed to publish the "results" of any extrapolations of SAP benchmarks, but no one can stop a customer from entering those numbers into a calculator).
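For readers who want to run the extrapolation alluded to above, a minimal sketch (this is illustrative arithmetic only; it is not a published SAP benchmark result and real scaling is never perfectly linear):

```python
# Linear extrapolation of the published E870 SD 2-tier result to more cores.
# Illustrative arithmetic only: not a published SAP benchmark result, and
# real scaling is never perfectly linear.
published_saps = 436_100     # IBM Power E870, 80 cores (Ref# 2014034)
published_cores = 80
target_cores = 192           # E880 core count used in the SPECint comparison

estimated_saps = published_saps * target_cores / published_cores
print(f"Extrapolated SAPS at {target_cores} cores: {estimated_saps:,.0f}")
# Roughly 1,046,640, vs. the 545,780 SAPS published for the 288-core Superdome X.
```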

BW-EML was SAP's first benchmark designed for HANA, although not restricted to it. As the name implies, it is a BW benchmark, so it is difficult to derive any correlation to transaction processing, but at least it does show some aspect of performance with HANA, analytic if nothing else, and concurrent analytics is one of the core value propositions of HANA. HPE was a frequent contributor to this benchmark, but always with something other than Superdome X. It is important to note that Superdome X is the only Intel-based system to utilize RAS mode, or Intel Lockstep, by default rather than as an option. That mode has a memory throughput impact of 40% to 60% based on published numbers from a variety of vendors, but, to date, no published benchmarks of any sort have been run in this mode. As a result, it is impossible to predict how well Superdome X might perform on this benchmark. Still, kudos to HPE for their past participation. Much better than SGI, which is, once again, a no-show on this benchmark. IBM Power Systems, as you might predict, still holds the record for best performance on this benchmark with the 40-core E870 system at 2 billion rows.

TPC-C was a transaction processing benchmark that, at least for some time period, had good participation, including from HP Superdome. That is, until IBM embarrassed HPE by delivering 50% more performance with half the number of cores. After that, HPE never published another result on Superdome, and that was back in the 2007/2008 time frame. TPC-C was certainly not a perfect benchmark, but it did have real transactions with real updates, and about 10% of the benchmark involved remote accesses. Still, SGI was a no-show and HPE stopped publishing on this class of system in 2007, while IBM continued publishing through 2010, until there was no one left to challenge their results. A benchmark is only interesting when multiple vendors are vying for the top spot.

Last, but certainly not least, is the SAP SD 3-tier benchmark.  In this one, the database was kept on a totally separate server and there was almost no way to optimize it to remove any ccNUMA effects.  Only IBM had the guts to participate in this benchmark at a large scale with a 64-core POWER7+ system (the previous generation to POWER8).  There was no submission from HPE that came even remotely close and, once again, SGI was MIA.

Architecture

Where IBM Power Systems utilizes a "glueless" interconnect up to 16 sockets, meaning all processor chips connect to each other directly without the use of specialized hub chips or switches, Intel systems beyond 8 sockets utilize a "glued" architecture. Currently, only HPE and SGI offer solutions beyond 8 sockets. HPE is using a very old architecture in the Superdome X, first deployed for PA-RISC (remember those?) in the Superdome introduced in 2000. Back then, they were using a cell controller (a.k.a. hub chip) on each system board. When they introduced the Itanium processor in 2002, they replaced this hub chip with a new one called the SX1000, basically an ASIC that connected the various components on the system board together and to the central switch by which it communicates with other system boards. Since 2002, HPE has moved through three generations of ASICs and now uses the SX3000, which features considerably faster speeds, better reliability, some ccNUMA enhancements and connectivity to multiple interconnect switches. Yes, you read that correctly: where Intel has delivered a new generation of x86 chips just about every year over the last 14 years, HPE has delivered 3 generations of hub chips. Pace of innovation is clearly tied directly to volume, and Superdome has never achieved sufficient volume alone, nor use by other vendors, to increase the speed of innovation. This means that while HPE may have delivered a major step forward at a particular point in time, it suffers from a long lag and diminishing returns as time and Intel chip generations progress. The important thing to understand is that every remote access, from either of the two Intel EX chips on each system board, to cache, memory or I/O connected to another system board, must pass through 8 hops at a minimum, i.e. from calling socket, to SX3000, to central switch, to remote SX3000, to remote socket, and the same trip in return, and that is assuming that the data was resident in an on-board cache.

SGI, the other player in the beyond-8-socket space, is using a totally different approach, derived from their experience in the HPC space. They are also using a hub chip, called a HARP ASIC, but rather than connecting through one or more central switches, in the up-to-32-socket UV 300H system each system board, featuring 4 Intel EX chips and a proprietary ASIC per memory riser, includes two hub chips which are linked directly to each of the other hub chips in the system. This mesh is hand wired, with a separate physical cable for every single connection. Again, you read that correctly: hand wired. This means that not only are physical connections made for every hub chip to hub chip connection, with the inherent potential for an insertion or contact problem on each end of that wire, but as implementation size increases, say from 8 sockets/2 boards to 16 sockets/4 boards or to 32 sockets/8 boards, the number of physical, hand wired connections grows roughly with the square of the number of hub chips. OK, assuming that does not make you just a little bit apprehensive, consider this: where HPE uses a memory protection technology called Double Device Data Correction + 1 (DDDC+1) in their Superdome X system, basically the ability to handle not just a single memory chip failure but at least 2 (not at the same time), SGI utilizes SDDC, i.e. Single Device Data Correction. This means that after detection of the first failure, customers must rapidly decide whether to shut down the system and replace the failing memory component (assuming it has been accurately identified), or hope their software based page deallocation technology works fast enough to avert a catastrophic system failure due to a subsequent memory failure. Even with that software, if a memory fault occurs in a different page, the SGI system would still be exposed. My personal opinion is that memory protection is so important in any system, but especially in large scale-up HANA systems, that anything short of true enterprise memory protection of at least DDDC does nothing other than increase customer risk.

Summary

SGI is asking customers to accept their assertion that SAP's certification of the SGI UV 300H at 20TB implies they can scale better than any other platform and perform well at that level, but they are providing no evidence in support of that claim. SAP does not publish the criteria with which it certifies a solution, so it is possible that SGI has been able to "prove" addressability at 20TB, the ability to initialize a HANA system and maybe even the ability to handle a moderate number of transactions. Lacking any sort of independent, auditable proof via a benchmark, lacking any reasonable body of customers (one would be nice at least) driving high transaction volumes with HANA or a conventional database, and offering nothing other than a 4-bit wide, hand wired ccNUMA nest that would seem prone to low throughput and high error rates, especially with substandard memory protection, it is hard to imagine why anyone would find this solution appealing.

HPE, by comparison, does have some history with transactional systems at high transaction volumes, but with a completely different CPU, OS and memory architecture, and nothing with Superdome X. HPE has a few benchmarks, however poor, once again on systems from long ago, plus mediocre results with the current generation and an architecture that has a minimum 8-hop round trip for every remote access. On the positive side, at least HPE gets it regarding proper memory protection, but does not address how much performance degradation results from this protection. Once again, SAP's certification at 16TB for Superdome X must be taken with the same grain of salt as SGI's.

IBM Power Systems has an outstanding history with transactional systems at very high transaction volumes using current generation POWER8 systems. Power also dominates the benchmark space and continued to deliver better and better results until no competitor dared risk the fight. Lastly, POWER8 is the latest generation of a chip designed from the ground up with ccNUMA optimization in mind and with reliability as its cornerstone, i.e. the results already include any overhead necessary to support this level of RAS. Yes, POWER8 is only supported at 9TB today for SAP SoH and S/4HANA, but lest we forget, it is the new competitor in the HANA market and the other guys only achieved their higher supported numbers after extensive customer and internal benchmark testing, both of which are underway with Power.

July 7, 2016 | Uncategorized

Large scale-up transactional HANA systems – part 1

Customers that require Suite on HANA (SoH) and S/4HANA systems with 6TB of memory or less will find a wide variety of available options. Those options do not require any specialized type of hardware, just systems that can scale up to 8 sockets with Intel-based systems and up to 64 cores with IBM Power Systems (socket count depends on the number of active cores per socket, which varies by system). If you require 6TB or less, or can't imagine ever needing more, then sizing is a fairly easy process, i.e. look at the sizing matrix from SAP and select a system which meets your needs. If you need to plan for more than 6TB, this is where it gets a bit more challenging. The list of options narrows to 5 vendors between 6TB and 8TB, IBM, Fujitsu, HPE, SGI and Lenovo, and gets progressively smaller beyond that.

All systems with more than one socket today are ccNUMA, i.e. remote cache, memory and I/O accesses are delivered with more latency and lower bandwidth than accesses local to the processor. HANA is highly optimized for analytics, which most of you probably already know. The way it is optimized may not be as obvious. Most tables in HANA are columnar, i.e. every column in a table is kept in its own structure with its own dictionary, and the elements of the column are replaced with a very short dictionary pointer, resulting in outstanding compression in most cases. Each column is placed in as few memory pages as possible, which means that queries which scan through a column can run at crazy fast speeds as all of the data in the column is as "close" as possible to the rest of it. This columnar structure is beautifully suited to analytics on ccNUMA systems since different columns will typically be placed behind different sockets, which means that only queries that cross columns and joins will have to access columns that may not be local to a socket, and, even then, usually only the results have to be sent across the ccNUMA fabric. There was a key word in the previous sentence that might have easily been missed: "analytics". Where analytical queries scan down columns, transactional queries typically go across rows in which, due to the structure of a columnar database, every element is located in a different column, potentially spanning the entire ccNUMA system. As a result, minimizing latency and maximizing cross-system bandwidth may be more important than ever.
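To make the columnar/dictionary idea concrete, here is a toy sketch of dictionary encoding; it illustrates the general technique described above, not HANA's actual implementation:

```python
# Toy dictionary encoding of a single column, illustrating the technique
# described above (not HANA's actual implementation).
from typing import List, Tuple

def dictionary_encode(column: List[str]) -> Tuple[List[str], List[int]]:
    """Return (dictionary, encoded column) where each distinct value is stored once."""
    dictionary: List[str] = []
    positions = {}                     # value -> position in the dictionary
    encoded: List[int] = []
    for value in column:
        if value not in positions:
            positions[value] = len(dictionary)
            dictionary.append(value)
        encoded.append(positions[value])
    return dictionary, encoded

city_column = ["Austin", "Houston", "Austin", "Dallas", "Austin", "Houston"]
dictionary, encoded = dictionary_encode(city_column)
print(dictionary)   # ['Austin', 'Houston', 'Dallas']
print(encoded)      # [0, 1, 0, 2, 0, 1]
# An analytic scan ("how many Austin rows?") touches only this one compact
# structure; a transactional row lookup must visit every column's structure.
```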

Let me stop here and give an example so that I don't lose the readers that aren't system nerds like myself. I will use a utility company as an example, as everyone is a utility customer. For analytics, an executive might want to know the average usage of electricity on a given day at a given time, meaning the query is composed of three elements, all contained in one table: usage, date and time. Unless these columns are enormous, i.e. over 2 billion rows, they are very likely stored behind a single socket with no remote accesses required. Now, take that same company's customer care center, where a utility consumer wants to turn on service, report an outage or find out what their last few months or years of bills have been. In this case, all sorts of information is required to populate the appropriate screens: first name, last name, street address, city, state, meter number, account number, usage, billed amount and on and on. Scans of columns are not required and a simple index lookup suffices, but every element is located in a different column which has to be resolved by an independent dictionary lookup/replacement of the compressed elements, meaning several or several dozen communications across the system as the columns are most likely distributed across the system. While an individual remote access may take longer, almost 5x in a worst case scenario[i], we are still talking nanoseconds here, and even 100 of those still results in a "delay" of 50 microseconds. I know, what are you going to do while you are waiting! Of course, a utility company is more likely to have hundreds, thousands or tens of thousands of transactions in flight at any given point in time, and there is the problem. An increased latency of 5x for every remote access may severely diminish the scalability of the system.
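The arithmetic behind the "we are still talking nanoseconds" point, and why it changes at scale, looks roughly like this; the local latency is an assumed round number of mine, while the ~5x worst-case remote penalty comes from the paper cited in footnote [i]:

```python
# Illustrative latency arithmetic for the example above.
# LOCAL_NS is an assumed round number; the ~5x worst-case remote penalty
# comes from the VLDB paper cited in footnote [i].
LOCAL_NS = 100                    # assumed local memory access latency, ns
REMOTE_NS = LOCAL_NS * 5          # worst-case remote access latency, ns

remote_accesses = 100             # remote accesses for one transactional lookup
delay_us = remote_accesses * REMOTE_NS / 1_000
print(f"Delay for one transaction: {delay_us:.0f} microseconds")   # ~50 us

# One transaction barely notices, but thousands of concurrent transactions
# all paying the 5x premium is where scalability starts to suffer.
concurrent_txns = 10_000
aggregate_extra_s = concurrent_txns * remote_accesses * (REMOTE_NS - LOCAL_NS) / 1e9
print(f"Aggregate extra remote-access time across {concurrent_txns:,} "
      f"transactions: {aggregate_extra_s * 1000:.0f} ms")
```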

Does this mean that it is not possible to scale up a HANA transactional environment? Not at all, but it does take more than being able to physically place a lot of memory in a system to be able to utilize it in a productive manner with good scalability. How can you evaluate vendor claims then? Unfortunately, the old tried and true SAP SD benchmark has not been made available to run in HANA environments. Lacking that, you could flip a coin, believe vendor claims without proof or demand proof. Clearly, demanding proof is the most reasonable approach, but what proof? There are three types of proof to look at: past history with large transactional workloads, industry standard benchmarks and architecture.

In the over-8TB HANA space, there are three competitors: HPE Superdome X, SGI UV 300H and IBM Power Systems E870/E880. I will address those systems and these three proof points in part 2 of this blog post.

[i] http://www.vldb.org/pvldb/vol8/p1442-psaroudakis.pdf

July 1, 2016 | Uncategorized

SAP support for multiple production HANA VMs with VMware

Recently, SAP updated their SAP notes regarding the ability to run multiple production HANA VMs with VMware. On the surface, this sounds like VMware has achieved parity with IBM's PowerVM, but the reality could not be much further from that perception. This is not to say that users of VMware for HANA will see no improvement. For a few customers, this will be a good option, but as always, the devil is in the details and, as always, I will play the part of the devil.

Level of VMware supported: 5.5 … still.  VMware 6.0 is supported only for non-production.[i]  If VMware 6.0 is so wonderful and they are such “great” partners with SAP, it seems awfully curious why a product announced on Feb 3, 2015 is still not supported by SAP.

Maximum size of each production HANA instance: 64 virtual processors and 1TB of memory; with Hyperthreading enabled, this translates to 32 physical cores, and sizing guidelines must still be followed. Currently, BW HANA is sized @ 2TB for 4 Haswell chips, i.e. 28.4GB/core, which translates to a maximum size of 910GB for a 32-core/64-vp VM, so slightly less than the 1TB supported. Suite on HANA on Intel is supported at a 1.5x higher memory ratio than BW, but since the size of the VM is limited to 1TB, this point is largely moot.
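For anyone who wants to follow the sizing arithmetic behind those numbers, here is a small sketch; the 18-cores-per-chip figure is my assumption for the Haswell-EX parts referenced above:

```python
# Sizing arithmetic behind the figures quoted above.
# Assumption: 18-core Haswell-EX chips, so a 4-chip, 2TB BW appliance has 72 cores.
BW_APPLIANCE_GB = 2048
CHIPS = 4
CORES_PER_CHIP = 18

gb_per_core = BW_APPLIANCE_GB / (CHIPS * CORES_PER_CHIP)
print(f"BW memory per core: {gb_per_core:.1f} GB")              # ~28.4 GB/core

vm_cores = 32          # 64 vCPUs with Hyperthreading = 32 physical cores
max_bw_vm_gb = vm_cores * gb_per_core
print(f"Max BW VM at 32 cores: {max_bw_vm_gb:.0f} GB")          # ~910 GB

# Suite on HANA is allowed 1.5x the BW ratio, but the 1TB per-VM cap bites first.
max_soh_vm_gb = min(vm_cores * gb_per_core * 1.5, 1024)
print(f"Max SoH VM at 32 cores (1TB cap): {max_soh_vm_gb:.0f} GB")
```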

Performance impact: At a minimum, SAP estimates a 12% performance degradation compared to bare metal (upon which most benchmarks are run and from which most sizings are derived), so one would logically conclude that the memory/cpu ratio should be reduced by the same amount. The 12% performance impact, but not the reduced sizing effect that I believe should be the result, is detailed in an SAP note.[ii] It goes on to state: "However, there are around 100 low-level performance tests in the test suite exercising various HANA kernel components that exhibit a performance degradation of more than 12%. This indicates that there are particular scenarios which might not be suited for HANA on VMware." Only 100? When you consider the only like-for-like published benchmarks using VMware and HANA[iii], which showed a 12% degradation (coincidence, I think not) for a single-VM HANA system vs. bare metal, it leaves one to wonder what sort of degradation might occur in a multiple-VM HANA environment. There is no guidance provided on this, which should make anyone other than a bleeding edge customer with no regard for SLAs VERY cautious. Another SAP note[iv] goes on to state: "For optimal VM performance, VMware recommends to size the VMs within the NUMA node boundaries of the specific server system (CPU cores and local NUMA node memory)." How much impact? Not provided here. So, either you size your VMs to fit within NUMA building blocks, i.e. a single 18-core socket, or you suffer an undefined performance penalty. It is also interesting to note what VMware said in Performance Best Practices for VMware vSphere® 5.5: "Be careful when using CPU affinity on systems with hyper-threading. Because the two logical processors share most of the processor resources, pinning vCPUs, whether from different virtual machines or from a single SMP virtual machine, to both logical processors on one core (CPUs 0 and 1, for example) could cause poor performance." That certainly gives me the warm and fuzzy!

Multiple VM support: Yes, you can now run multiple production HANA VMs on a system[v]. HOWEVER, "The vCPUs of a single VM must be pinned to physical cores, so the CPU cores of a socket get exclusively used by only one single VM. A single VM may span more than one socket, however. CPU and Memory overcommitting must not be used." This is NOT the value of virtualization, but of physical partitioning, a wonderful technology if we were living in the 1990s. So, if you have an 8-socket system, you can run up to 8 simultaneous production VMs as long as all VMs are smaller than 511GB for BW or 767GB for SoH. Need 600GB for BW? Well, that will cost you a second full socket even though you only need a few of its cores, thereby reducing the maximum number of VMs you can support on the system. And this is before we take the 12% performance impact detailed above into consideration, which could further limit the memory per core and the number of VMs supported.
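A short sketch of the socket-granularity math described above, reusing the assumed 18-core sockets and the 28.4GB/core BW ratio from the earlier sizing note:

```python
# Socket-granularity arithmetic for pinned production HANA VMs under VMware.
# Assumptions: 18-core sockets and the 28.4 GB/core BW ratio used earlier;
# SoH is allowed 1.5x the BW memory ratio; a socket cannot be shared by VMs.
import math

CORES_PER_SOCKET = 18
GB_PER_CORE_BW = 28.4

def sockets_required(memory_gb: float, workload: str = "BW") -> int:
    ratio = GB_PER_CORE_BW * (1.5 if workload == "SoH" else 1.0)
    per_socket_gb = CORES_PER_SOCKET * ratio     # ~511 GB for BW, ~767 GB for SoH
    return math.ceil(memory_gb / per_socket_gb)  # whole sockets only

for need_gb in (500, 600, 900):
    s = sockets_required(need_gb, "BW")
    print(f"{need_gb} GB BW VM -> {s} socket(s), i.e. {s * CORES_PER_SOCKET} "
          f"cores reserved whether they are needed or not")
```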

Support for mixed production HANA VMs and non-production: Not included in any of the above mentioned SAP notes.  One can infer from the above notes that this is not permitted meaning that there is no way to harvest unused cycles from production for the use of ANY other workload, whether non-prod or non-HANA DB.

Problem resolution: SAP Note 1995460 details the process by which problems may be resolved and while they are guided via SAP’s OSS system, there is transfer of ownership of problems when a known VMware related fix is not available.  The exact words are: “For all other performance related issues, the customer will be referred within SAP’s OSS system to VMware for support. VMware will take ownership and work with SAP HANA HW/OS partner, SAP and the customer to identify the root cause. Due to the abstraction of hardware that occurs when using virtualization, some hardware details are not directly available to SAP HANA support.” and of a little more concern “SAP support may request that additional details be gathered by the customer or the SAP HANA HW partner to help with troubleshooting issues or VMware to reproduce the issue on SAP HANA running in a bare metal environment.”

My summary of the above: One or more production HANA instances may be run under VMware 5.5 with a maximum of 64 vp/32 physical cores (assuming Hyperthreading is enabled) and a minimum of 12% performance degradation with a potential proportionate impact on sizing, with guidance that "particular scenarios might not be suited" for HANA on VMware, with potential performance issues when VMs cross socket boundaries, with physical partitioning at the socket level, no sharing of CPU resources, no support for running non-production on the same system to harvest unused cycles and a potential requirement to reproduce issues on a bare metal system if necessary.

Yes, that was a long, run-on sentence. But it begs the question of just when VMware would be a good choice for hosting one or more production HANA instances. My take is that unless you have very small instances which are unsuitable for HANA MDC (multitenancy), or you are a cloud provider for very small companies, there is simply no value in this solution. For those potential cloud providers, their target customer set would include companies with very small HANA requirements and the willingness to accept an SLA that is very flexible on performance targets while using a shared infrastructure in which there are a wide variety of issues for which a problem in one VM could result in the whole system failing, with multiple customers impacted simultaneously.

And in case anyone is concerned that I am simply the bearer of bad news, let me remind the reader that IBM Power Systems with PowerVM is supported by SAP with up to 4 production HANA VMs (on the E870 and E880; 3 on all other HANA supported Power Systems) with granularity at the core level, no restrictions on NUMA boundaries, the ability to have a shared pool in place of one of the above production VMs with any number of non-production HANA VMs up to the limits of PowerVM which can utilize unused cycles from the production VMs, no performance penalties, no caveats about what types of workloads are well suited for PowerVM, excellent partition isolation preventing the vast majority of problems that could happen in one VM from affecting any other ones, and no problem resolution handoffs or ownership changes.

In other words, if customers want to continue the journey of virtualization and server consolidation that they started in the early 2000s, and want to have a very flexible infrastructure which can grow as they move to SoH, shrink as they move to S/4, grow as they consolidate more workloads into their primary instance, and shrink again as they roll off data using data tiering, data aging or perhaps Hadoop, all without having to take significant system outages or throw away investment and purchase additional systems, IBM Power Systems with PowerVM can support this; VMware cannot.

 

———————

[i] 1788665 – SAP HANA Support for virtualized / partitioned (multi-tenant) environments

[ii] 1995460 – Single SAP HANA VM on VMware vSphere in production

[iii] Benchmark detail for bare metal and VMware 5.5 based runs from http://global.sap.com/solutions/benchmark/bweml-results.htm:

06/02/2014 HP 2,000,000,000 111,850 SuSE Linux Enterprise Server 11 on VMWARE ESX 5.5 SAP HANA 1.0 SAP NetWeaver 7.30 1 database server: HP DL580 Gen8, 4 processors / 60 cores / 120 threads, Intel Xeon Processor E7-4880 v2, 2.50 GHz, 64 KB L1 cache and 256 KB L2 cache per core, 30 MB L3 cache per processor, 1024 GB main memory

03/26/2014 HP 2,000,000,000 126,980 SuSE Linux Enterprise Server 11 SAP HANA 1.0 SAP NetWeaver 7.30 1 database server: HP DL580 Gen8, 4 processors / 60 cores / 120 threads, Intel Xeon Processor E7-4880 v2, 2.50 GHz, 64 KB L1 cache and 256 KB L2 cache per core, 30 MB L3 cache per processor, 1024 GB main memory

[iv] 2161991 – VMware vSphere configuration guidelines

[v] 2024433 – Multiple SAP HANA VMs on VMware vSphere in production

April 5, 2016 | Uncategorized

HoP keeps Hopping Forward – GA Announcement and New IBM Solution Editions for SAP HANA

Almost two years ago, I speculated about the potential value of a HANA on Power solution. In June 2014, SAP announced a Test and Evaluation program for scale-up BW HANA on Power. That program shifted into high gear in October 2014, and roughly 10 customers got to start kicking the tires on this solution. Those customers had the opportunity to push HANA to its very limits. Remember, where Intel systems have 2 threads per core, POWER8 has up to 8 threads per core. Where most conventional Intel systems max out at 240 threads, the IBM Power E870 can scale to an impressive 640 threads and the E880 system can scale to 1,536 threads. This means that IBM is able to provide an invaluable test bed for system scalability to SAP. As SAP's largest customers move toward Suite "4" HANA (S4HANA), they need to have confidence in the scalability of HANA, and IBM is leading the way in proving this capability.

A Ramp-up program began in March with approximately 25 customers around the world being given the opportunity to have access to GA level code and start to build out BW POC and production environments.  This brings us forward to the announcement by SAP this week @ SapphireNow in Orlando of the GA of HANA on Power.  SAP announced that customers will have the option of choosing Power for their BW HANA platform, initially to be used in a scale-up mode and plans to support scale-out BW, Suite on HANA and the full complement of side-car applications over the next 12 to 18 months.

Even the most loyal IBM customer knows the comparative value of other BW HANA solutions already available on the market. To this end, IBM announced new "solution editions". A solution edition is simply a packaging of components, often with special pricing, to match expectations of the industry for a specific type of solution. "Sounds like an appliance to me," says the guy with a Monty Python type of accent and intonation (no, I am not making fun of the English and am, in fact, a huge fan of Cleese and company). True, if one were to look only at the headline and ignore the details. In reality, IBM is looking at these as starting points, not end points, and most certainly not as any sort of implied limitation. Remember, IBM Power Systems are based on the concept of logical partitions using PowerVM. As a result, a Power "box" is simply that, a physical container within which one or multiple logical systems reside, and the size of each "system" is completely arbitrary based on customer requirements.

So, a "solution edition" simply defines a base configuration designed to be price competitive with the industry while allowing customers to flexibly define "systems" within it to meet their specific requirements and add incremental capability above that minimum as is appropriate for their business needs. While a conventional x86 system might have 1TB of memory to support a system that requires 768GB, leaving the rest unutilized, a Power System provides for that 768GB system and allows the rest of the memory to be allocated to other virtual machines. Likewise, HANA is often characterized by periods of 100% utilization, in support of the instantaneous response time demanded of ad-hoc queries, followed by unfathomably long periods (in computer terms) of little to no activity. Many customers might consider this to be a waste of valuable computing resource and look forward to being able to harness it for the myriad of other business purposes that their businesses actually depend on. This is the promise of Power. Put another way, the appliance model results in islands of automation like we saw in the 1990s, whereas Power continues the model of server consolidation and virtualization that has become the modus operandi of the 2000s.

But, says the pitchman for a made-for-TV product, if you call right now, we will double the offer. If you believe that, then you are probably not reading my blog. If a product were that good, they would not have to give you more for the same price. Power, on the other hand, takes a different approach. Where conventional BW HANA systems offer a maximum size of 2TB for a single node, Power has no such inherent limitations. To handle larger sizes, conventional systems must "scale out" with a variety of techniques, potentially significantly increased costs and added complexity. Power offers the potential to simply "scale up". Future IBM Power solutions may be able to scale up to 4TB, 8TB or even 16TB. In a recent post to this blog, I explained that to match the built-in redundancy for mission critical reliability of memory in Power, conventional x86 systems would require memory mirroring, i.e. twice the amount of memory, with an associated increase in CPU utilization and reduction in memory bandwidth. SAP is pushing the concepts of MCOS, MCOD and multi-tenancy, meaning that customers are likely to have even more of their workloads consolidated on fewer systems in the future. This will result in demand for very large scaling systems with unprecedented levels of availability. Only IBM is in a position to deliver systems that meet this requirement in the near future.

Details on these solution editions can be found at http://www-03.ibm.com/systems/power/hardware/sod.html
In the last few days, IBM and other organizations have published information about the solution editions and the value of HANA on Power.  Here are some sites worth visiting:

Press Release: IBM Unveils Power Systems Solutions to Support SAP HANA
Video: The Next Chapter in IBM and SAP Innovation: Doug Balog announces SAP HANA on POWER8
Case study: Technische Universität München offers fast, simple and smart hosting services with SAP and IBM 
Video: Technische Universität München meet customer expectations with SAP HANA on IBM POWER8 
Analyst paper: IBM: Empowering SAP HANA Customers and Use Cases 
Article: HANA On Power Marches Toward GA

Selected SAP Press
ComputerWorld: IBM’s new Power Systems servers are just made for SAP Hana
eWEEK, IBM Launches Power Systems for SAP HANA
ExecutiveBiz, IBM Launches Power Systems Servers for SAP Hana Database System; Doug Balog Comments
TechEYE.net: IBM and SAP work together again
ZDNet: IBM challenges Intel for space on SAP HANA
Data Center Knowledge: IBM Stakes POWER8 Claim to SAP Hana Hardware Market
Enterprise Times: IBM gives SAP HANA a POWER8 boost
The Platform: IBM Scales Up Power8 Iron, Targets In-Memory

Also a planning guide for HANA on Power has been published at http://www-03.ibm.com/support/techdocs/atsmastr.nsf/WebIndex/WP102502 .

May 7, 2015 | Uncategorized

Lintel for SAP App Servers – The right choice

Or is it? Running SAP application servers on IBM Power Systems with Linux results in a lower TCA than using x86 systems with Linux and VMware. Usually, I don't start a blog post with the conclusion, but I was so amazed by the results of this analysis that I could not help myself.

For several years now, I have seen many customers move older legacy app servers to x86 systems using Linux and VMware, as well as implementing new SAP app servers on the same. When asked why, the answers boil down to cost, skills and standards. Historically, Lintel servers were not just perceived to cost less; the cost differences could be easily demonstrated. Students emerging from college have worked with Linux far more often than with UNIX, and despite the fact that learning UNIX and how it is implemented in actual production environments takes very little additional effort/cost, the perception that Linux skills are more plentiful and consequently less expensive persists. Many companies and government entities have decided to standardize on Linux. For some legacy IBM Power Systems customers, a complicating factor, or perhaps a compelling factor in the analysis, has been the comparison of partitions on large enterprise class systems against low cost 2-socket x86 servers. And so, increasingly, enterprises have defaulted to Lintel as the app server of choice.

Something has changed, however, and is completely overturning the conventional wisdom discussed above. There is a new technology which takes advantage of all of those Linux skills on the market, obeys the Linux standards mandate and costs less than virtualized Lintel systems. What is this amazing new technology? Surprise: it is a descendant of the technology introduced in 1990 by IBM called the RS/6000, in the form of new Linux-only POWER8 systems. (OK, since you know that I am an IBM Power guy and I gave away the conclusion at the start of this post, that was probably not much of a surprise.) At least, this is what the marketing guys have been telling us, and they have some impressive consultant studies and internal analyses that back up their claims.

For those of you who have been following this blog for a while, you know that I am skeptical of consultant analyses and even more so of internal analyses. So, instead of depending on those, I set out to prove, or disprove, this assertion. The journey began with setting reasonable assumptions. Actually, I went a little overboard and gave every benefit of the doubt to x86 and did the opposite for Power.

Overhead – The pundits, both internal and external, seem to suggest that 10% or more overhead for VMware is reasonable. Even VMware’s best practices guide for HANA suggests an overhead of 10%. However, I have heard some customers claim that 5% is possible. So, I decided to use the most favorable number and settled on 5% overhead. PowerVM does have overhead, but it is already baked into all benchmarks and sizings since it is built into the embedded hypervisor, i.e. it is there even when you are running a single virtual machine on a system.

Utilization – Many experts have suggested that average utilization of VMware systems falls in the 20% to 30% range. I found at least one analyst that said that the best run shops can drive their VMware systems up to 45%. I selected 45%, once again because I want to give all of the benefit of the doubt to Lintel systems. By comparison, many experts suggest that 85% utilization is reasonable for PowerVM based systems, but I selected 75% simply to not give any of the benefit of the doubt to Power that I was giving to x86.

SAPS – Since we are talking about SAP app servers, it is logical to use SAP benchmarks. The best result that I could find for a 2 socket Linux Intel Haswell-EP system was posted by Dell @ 90,120 SAPS (1). A similar 2-socket server from IBM was posted @ 115,870 SAPS (2).

IBM has internal sizing tables, as does every vendor, in which it estimates the SAPS capacity of different servers based on different OSs. One of those servers, the Power S822L, a 20-core Linux only system, is estimated to be able to attain roughly 35% less SAPS than the benchmark result for its slightly larger cousin running AIX, but this takes into consideration differences in MHz, number of cores and small differences due to the compilers used for SAP Linux binaries.

For our hypothetical comparison, let us assume that a customer needs approximately the SAPS capacity that can be attained with three Lintel systems running VMware, including the 5% overhead mentioned above, a sustained utilization of 45% and 256GB per server. Extrapolating the IBM numbers, including no additional PowerVM overhead and a sustained utilization of 75%, results in a requirement of two S822L systems, each with 384GB.
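Here is a minimal sketch of the capacity arithmetic using the assumptions stated above; the S822L per-box SAPS figure is my estimate (roughly 35% below the S824 benchmark result), not a published benchmark:

```python
# Effective-capacity arithmetic behind the three-Lintel vs. two-S822L sizing.
# Inputs are the assumptions stated above; the S822L figure is an estimate
# (about 35% below the published S824 result), not a benchmark result.
X86_RATED_SAPS = 90_120            # Dell R730 2-socket SD 2-tier result
VMWARE_OVERHEAD = 0.05             # generous 5% assumption
X86_UTILIZATION = 0.45             # best-case sustained utilization

POWER_EST_SAPS = 115_870 * 0.65    # S822L estimated ~35% below the S824 result
POWER_UTILIZATION = 0.75           # deliberately conservative for PowerVM

x86_effective = X86_RATED_SAPS * (1 - VMWARE_OVERHEAD) * X86_UTILIZATION
power_effective = POWER_EST_SAPS * POWER_UTILIZATION

print(f"3 x Lintel servers: {3 * x86_effective:,.0f} effective SAPS")
print(f"2 x Power S822L   : {2 * power_effective:,.0f} effective SAPS")
```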

Lenovo, HP and Dell all offer easy to use configurators on the web. I ran through the same configuration for each: 2 @ Intel Xeon Processor E5-2699 v3 18C 2.3GHz 45MB Cache 2133MHz 145W, 16 @ 16GB x4 DIMMS, 1 @ Dual-port 10GB Base-T Ethernet adapter, 2 @ 300GB 10K RPM disk (2 @ 1TB 7200 RPM for Dell) and 24x7x4 hour 3-year warranty upgrades (3). Both the Lenovo and HP sites show an almost identical number for RedHat Enterprise Linux with unlimited guests (Dell’s was harder to decipher since they apply discounts to the prices shown), so for consistency, I used the same price for RHEL including 3-yr premium subscription and support. VMware also offers their list prices on the web and the same numbers were used for each system, i.e. Version 5.5, 2-socket, premium support, 3yr (4).

The configuration for the S822L was created using IBM’s eConfig tool: 2 @ 10-core 3.42 GHz POWER8 Processor Card, 12 @ 32GB x4 DIMMS, 1 @ Dual-port 10GB Base-T Ethernet adapter, 2 @ 300GB 10K RPM disk and a 24x7x4 hour 3-year warranty upgrade, RHEL with unlimited guests and 3yr premium subscription and support and PowerVM with unlimited guests, 3yr 24×7 support (SWMA). Quick disclaimer; I am not a configuration expert with IBM’s products much less those from other companies which means there may be small errors, so don’t hold me to these numbers as being exact. In fact, if anyone with more expertise would like to comment on this post and provide more accurate numbers, I would appreciate that. You will see, however, that all three x86 systems fell in the same basic range, so small errors are likely of limited consequence.

The best list price among the Lintel vendors came in at $24,783 including the warranty upgrade. RHEL 7 came in at $9,259 and VMware at $9,356, for a grand total of $43,398, and for 3 systems, $130,194. For the IBM Power System, the hardware list was $33,136 including the warranty upgrade, PowerVM for Linux $10,450 and RHEL 7 $6,895, for a grand total of $51,109, and for 2 systems, $102,218.

So, for equivalent effective SAPS capacity, Lintel systems cost around $130K vs. $102K for Power … and this is before we consider the reliability and security advantages not to mention scalability, peak workload handling characteristics, reduced footprint, power and cooling. Just to meet the list price of the Power System, the Lintel vendors would have to deliver a minimum of 22% discount including RHEL and VMware.
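Putting the list prices together, the break-even math looks roughly like this (same figures as above; treat it as illustrative arithmetic, not a formal TCO study):

```python
# List-price totals from the configurations above, and the discount the
# Lintel stack would need just to reach price parity with the two S822Ls.
x86_per_box = 24_783 + 9_259 + 9_356      # HW + warranty, RHEL, VMware
power_per_box = 51_109                    # S822L grand total as quoted above

x86_total = 3 * x86_per_box               # three Lintel servers
power_total = 2 * power_per_box           # two S822L servers

discount_needed = 1 - power_total / x86_total
print(f"Lintel total : ${x86_total:,}")
print(f"Power total  : ${power_total:,}")
print(f"Discount needed on the Lintel stack to match: {discount_needed:.1%}")
# Works out to roughly 21-22% across hardware, RHEL and VMware.
```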

Conclusions:
For customers making HANA decisions, it is important to note that the app server does not go away and that SAP fully supports heterogeneous configurations, i.e. it does not matter if the app server is on a different platform or even a different OS than the HANA DB server. This means that Linux-based Power boxes are the perfect companion to HANA DB servers regardless of vendor.

For customers that are refreshing older Power app servers, the comparisons can be a little more complicated in that there is a reasonable case to be made for running app servers on enterprise class systems potentially also housing database servers in terms of higher effective utilization, higher reliability, the ability to run app servers in an IFL (Integrated Facility for Linux) at very attractive prices, increased efficiencies and improved speeds through use of virtual Ethernet for app to DB communications. That said, any analysis should start with like for like, e.g. two socket scale-out Linux servers, and then consider any additional value that can be gained through the use of AIX (with active memory expansion) and/or enterprise class servers with or without IFLs. As such, this post makes a clear point that, in a worst case scenario, scale-out Linux only Power Systems are less expensive than x86. In a best case scenario, the TCO, reliability and security advantages of enterprise class Power Systems make the value proposition of IBM Power even more compelling.

For customers that have already made the move to Lintel, the message is clear. You moved for sound economic, skills and standards based reasons. When it is time to refresh your app servers or add additional ones for growth or other purposes, those same reasons should drive you to make a decision to utilize IBM Power Systems for your app servers. Any customer that wishes to pursue such an option is welcome to contact me, your local IBM rep or an IBM business partner.

Footnotes:
1. Dell PowerEdge R730 – 2 Processors / 36 Cores / 72 Threads 16,500 users, Red Hat Enterprise Linux 7, SAP ASE 16, SAP enhancement package 5 for SAP ERP 6.0, Intel Xeon Processor E5-2699 v3, 2.3 Ghz, 262,144MB, Cert # 2014033, 9/10/2014

2. IBM Power System S824, 4 Processors / 24 Cores / 192 Threads, 21,212 Users, AIX 7.1, DB2 10.5, SAP enhancement package 5 for SAP ERP 6.0, POWER8, 3.52 Ghz, 524,288MB, Cert # 2014016, 4/28/2014

3. https://www-01.ibm.com/products/hardware/configurator/americas/bhui/flowAction.wss?_eventId=launchNIConfigSession&CONTROL_Model_BasePN=5462AC1&_flowExecutionKey=_cF5B38036-BD56-7C78-D1F7-C82B3E821957_k34676A10-590F-03C2-16B2-D9B5CE08DCC9
http://configure.us.dell.com/dellstore/config.aspx?c=us&cs=04&fb=1&l=en&model_id=poweredge-r730&oc=pe_r730_1356&s=bsd&vw=classic
http://h71016.www7.hp.com/MiddleFrame.asp?view=std&oi=E9CED&BEID=19701&SBLID=&AirTime=False&BaseId=45441&FamilyID=3852&ProductLineID=431

4. http://store.vmware.com/store/vmware/en_US/pd/productID.288070900&src=WWW_eBIZ_productpage_vSphere_Enterprise_Buy_US

January 23, 2015 | Uncategorized

Rebuttal to “Why choose x86” for SAP blog posting

I was intrigued by a recent blog post, entitled: Part 1: SAP on VMware : Why choose x86.  https://communities.vmware.com/blogs/walkonblock/2014/02/06/part-1-sap-on-vmware-why-choose-x86.  I will get to the credibility of the author in just a moment.  First, however, I felt it might be interesting to review the points that were made and discuss these, point by point.

  1. "No Vendor Lock-in: When it comes to x86 world, there is no vendor lock-in as you can use any vendor and any make and model as per your requirements."  Interesting that the author did not discuss the vendor lock-in at the chip, firmware or hypervisor level.  Intel or, to a very minor degree, AMD is required for all x86 systems.  This would be like being able to choose any car as long as the engine was manufactured by Toyota (a very capable manufacturer, but one which, with a lock on the industry, might not offer the best price or innovation).  As any customer knows, each x86 system has its own unique BIOS and/or firmware.  Sure, you can switch from one vendor to another or add a second vendor, but lacking proper QA, training, and potentially different operational procedures, this can result in problems.  And then there is the hypervisor, with VMware clearly the preference of the author as it is for most SAP x86 virtualization customers.  No lock-in there?

SAP certifies multiple different OS and hypervisor environments for their code.  Customers can utilize one or more at any given time.  As all logic is written in 3rd and 4th GL languages, i.e. ABAP and JAVA, and is contained within the DB server, customers can move from one OS, HW platform and/or hypervisor to another and only have to, wait for it, do proper QA, training and modify operational procedures as appropriate.  So, SAP has removed lock-in regardless of OS, HW or hypervisor.

Likewise, Oracle, DB2 and Sybase support most OS’s, HW and hypervisors (with some restrictions).  Yes, a migration is required for movement between dissimilar stacks, but this could be said for moving from Windows to Linux and any move between different stacks still requires all migration activities to be completed with the potential exception of data movement when you “simply” change the HW vendor.

  2. "Lower hardware & maintenance costs: x86 servers are far better than cheaper than non-x86 servers. This also includes the ongoing annual maintenance costs (AMC) as well."  Funny, however, that the author only compared HW and maintenance costs and conveniently forgot about OS and hypervisor costs.  Also interesting that the author forgot about utilization of systems.  If one system is ½ the cost of another, but you can only drive, effectively, ½ the workload, then the cost is the same per unit of work.  Industry analysts have suggested that 45% utilization is the maximum sustained to be expected out of VMware SAP systems, with most seeing far less.  By the same token, those analysts say that 85% or higher is to be expected of Power Systems.  Also interesting to note that the author did not say which systems were being compared, as new systems and options from IBM Power Systems offer close to price parity with x86 systems when HW, OS, hypervisor and 3 years of maintenance are included.
  3. "Better performance: Some of the models of x86 servers can actually out-perform the non-x86 servers in various forms."  Itanium is one of the examples given, which is a no-duh for anyone watching published benchmarks.  The other example is a Gartner paper sponsored by Intel which does not actually quote a single SAP benchmark.  Too bad the author suggested this was a discussion of SAP.  Last I checked (today, 2/10/14), IBM Power Systems can deliver almost 5 times the SAPS performance of the largest x86 server (as measured by the 2-tier SD benchmark).  On a SAPS/core basis, Power delivers almost 30% more SAPS/core compared to Windows systems and almost 60% more than Linux/x86 systems.  Likewise, on the 3-tier benchmark, the latest Power result is almost 4.5 times that of the latest x86 result.  So much for point 3.
  4. Choice of OS: “You have choice of using any OS of your choice and not forced to choose a specific OS.”  Yes, it really sucks that with Power, you are forced to choose AIX … or IBM i … or SUSE Linux … or RedHat Linux, which is so much worse than being forced to choose Microsoft Windows … or Oracle Solaris … or SUSE Linux … or RedHat Linux on x86.
  5. Disaster Recovery: “You can use any type of hardware, make and model when it comes to disaster recovery (DR). You don’t need to maintain hardware from same vendor.”  Oh, really?  First, I have not met any customers that use one stack for production and a totally different one for DR, but that is not to say that it can’t be done.  Second, remember the discussion about BIOS and firmware?  There can be different patches, prerequisites and workarounds for different stacks.  Few customers want to spend all of the money they “saved” on a separate QA cycle for DR.  Even fewer want to take a chance on DR not working when they can least afford it, i.e. when there is a disaster.  Interestingly, Power actually supports this better than x86, as the stack is identical regardless of which generation, model or MHz is used.  You can even run in POWER6 mode on a POWER7+ server, further enabling complete compatibility regardless of chip type, meaning you can use older systems in DR to back up brand new systems in production.
  6. Unprecedented scalability: “You can now scale the x86 servers the way you want, TB’s of RAM’s , more than 64 cores etc is very much possible/available in x86 environment.”  Yes, any way that you want as long as you don’t need more capacity than is available with the current 80-core systems.  Any way that you want as long as you are not running with VMware, which limits partitions to 128 threads, which equates to 64 cores.  Any way that you want as long as you follow VMware’s suggestion to contain partitions within a NUMA block, which means a max of 40 cores.  http://blogs.vmware.com/apps/sap  Any way that you want as long as you recognize that VMware partitions are further limited in terms of scalability, which results in an effective limit of 32 threads/16 cores, as I have discussed in this blog previously.
  7. Support from Implementation Vendor: “If you check with your implementation vendor/partner, you will find they that almost all of them can certify/support implementation of SAP on x86 environment. The same is the case if you are thinking about migrating from non-x86 to x86 world.”  No clue what point is being made here, as implementation partners support SAP on all certified systems and OSs, x86 or otherwise.
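
For the curious, here is a minimal sketch of the cost-per-unit-of-work arithmetic behind point 2.  The acquisition prices and rated SAPS capacities are hypothetical placeholders, not vendor quotes; only the 45% and 85% sustained utilization assumptions come from the discussion above.

```python
# Minimal sketch: cost per unit of work delivered when sustained utilization differs.
# Prices and rated SAPS capacities are hypothetical placeholders; only the
# utilization assumptions (45% for VMware/x86, 85% for Power) come from the text.

def cost_per_delivered_saps(acquisition_cost, rated_saps, sustained_utilization):
    """Cost per SAPS of work the system actually delivers on a sustained basis."""
    return acquisition_cost / (rated_saps * sustained_utilization)

x86   = cost_per_delivered_saps(50_000, 60_000, 0.45)   # hypothetical x86 server at half the price
power = cost_per_delivered_saps(100_000, 60_000, 0.85)  # hypothetical Power server at twice the price

print(f"x86:   ${x86:.2f} per sustained SAPS")
print(f"Power: ${power:.2f} per sustained SAPS")
# Even at twice the acquisition cost, the higher sustained utilization brings the
# cost per unit of work delivered down to roughly the same level.
```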

The author referred to my blog as part of the proof of his/her theories which is the only reason why I noticed this blog in the first place.  The author describes him/herself as “Working with Channel Presales of an MNC”.  Interesting that he/she hides him/herself behind “MNC” because the “MNC” that I work for believes that transparency and honesty are required in all internet postings.  That said, the author writes about nothing but VMware, so you will have to draw your own conclusions as to where this individual works or with which “MNC” his/her biases lie.

The author, in the reference to my posting, completely misunderstood the point that I made regarding the use of 2-tier SAP benchmark data in projecting the requirements of database only workloads and apparently did not even read the “about me” which shows up by default when you open my blog.  I do not work for SAP and nothing that I say can be considered to represent them in any way.

Fundamentally, the author’s bottom line comment, “x86 delivers compelling total cost of ownership (TCO) while considering SAP on x86 environment” is neither supported by the facts that he/she shared nor by those shared by others.  IBM Power Systems continues to offer very competitive costs with significantly superior operational characteristics for SAP and non-SAP customers.

February 17, 2014 Posted by | Uncategorized | 2 Comments

High end Power Systems customers have a new option for SAP app servers that is dramatically less expensive than x86 Linux solutions

Up until recently, if you were expanding the use of your SAP infrastructure or had some older Power Systems that you were considering replacing with x86 Linux systems, I could give you a TCO argument that showed how you could see roughly equivalent TCO using lower end Power servers.  Of course, some people might not buy into all of the assumptions or might state that Linux was their new standard such that AIX was no longer an acceptable option.  Recently, IBM made an announcement which has changed the landscape so dramatically that you can now obtain the needed capacity using high end server “dark cores” with Linux, not at an equivalent TCO, but at a dramatically lower TCA.

The new offering is called IFL, which stands for Integrated Facility for Linux.  The concept originated with System z (aka mainframe) several years ago.  It allows customers with existing Power 770, 780 or 795 servers that have capacity on demand “dark cores”, i.e. cores on which no workload currently runs and for which the hardware, virtualization and OS software licenses have not been activated, to turn on a group of cores and memory specifically for Linux-only workloads.  A Power IFL is composed of 4 cores with 32GB of memory and has a list price of $8,591.

In the announcement materials provided by IBM Marketing, an example is provided of a customer that would need to add the equivalent of 16 cores @ 80% utilization and 128GB of memory to an existing Power 780 4.4GHz system or would need the equivalent capacity using a 32-core HP DL560 2.7GHz system running at 60% utilization.  They used SPECint_rate as the basis of this comparison.  Including 3 year license for PowerVM, Linux subscription and support, 24×7 hardware maintenance and the above mentioned Power activations, the estimated street price would be approximately $39,100.  By comparison, the above HP system plus Linux subscription and support, VMware vSphere and 24×7 hardware maintenance would come in at an estimated street price of approximately $55,200.

Already sounds like a good deal, but I am a skeptic, so I needed to run the numbers myself.  I find SPECint_rate to be a good indicator of performance for almost no workloads and an incredibly terrible indicator of performance for SAP workloads.  So, I took a different approach.  I found a set of data from an existing SAP customer of IBM which I then used to extrapolate capacity requirements.  I selected the workloads necessary to drive 16 cores of a Power 780 3.8GHz system @ 85% utilization.  Why 85%?  Because we, and independent sources such as Solitaire Interglobal, have data from many large customers that report routinely driving their Power Systems to a sustained utilization of 85% or higher.  I then took those exact same workloads and modeled them onto x86 servers assuming that they would be virtualized using VMware.  Once again, Solitaire Interglobal reports that almost no customers are able to drive a sustained utilization of 45% in this environment and that 35% would be more typical, but I chose a target utilization of 55% instead to make this as optimistic for the x86 servers as possible.  I also applied only a 10% VMware overhead factor although many sources say that is also optimistic.  It took almost 6 systems with each hosting about 3 partitions to handle the same workload as the above 16-core IFL pool did.

Once again, I was concerned that some of you might be even more optimistic about VMware, so I reran the model using a 65% target utilization (completely unattainable in my mind, but I wanted to work out the ultimate, all stars aligned, best admins on the planet, tons of time to tune systems, scenario) and 5% VMware overhead (I don’t know anyone that believes VMware overhead to be this low).  With each system hosting 3 to 4 partitions, I was able to fit the workloads on 5 systems.  If we just go crazy with unrealistic assumptions, I am sure there is a way that you could imagine these workloads fitting onto 4 systems.
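
To make the shape of that model concrete, here is a simplified sketch of the server-count arithmetic (the real modeling was done against individual workload peaks).  The absolute SAPS capacities are hypothetical placeholders, chosen only so the outputs land near the counts described above; the utilization targets (85% sustained on Power, 55% or 65% on VMware/x86) and the VMware overhead factors (10% or 5%) are the assumptions I used.

```python
import math

# Sketch of the server-count model described above.  Absolute SAPS figures are
# hypothetical placeholders; the utilization targets and VMware overhead factors
# are the assumptions discussed in the text.

POWER_POOL_SAPS = 80_000   # hypothetical rated capacity of the 16-core IFL pool
X86_SERVER_SAPS = 23_000   # hypothetical rated capacity of one 2-socket x86 server

workload = POWER_POOL_SAPS * 0.85   # what the pool sustains at 85% utilization

def x86_servers_needed(workload_saps, target_utilization, vmware_overhead):
    """x86 servers required to carry the workload under the given assumptions."""
    usable_per_server = X86_SERVER_SAPS * target_utilization * (1 - vmware_overhead)
    return math.ceil(workload_saps / usable_per_server)

print(x86_servers_needed(workload, 0.55, 0.10))   # base case       -> 6 servers
print(x86_servers_needed(workload, 0.65, 0.05))   # optimistic case -> 5 servers
```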

Next, I wanted to determine the accurate price for those x86 systems.  I used HP’s handy on-line ordering web site to price some systems.  Instead of the DL560 that IBM Marketing used, I chose the DL360e Gen8 system, with 2@8-core 1.8GHz processors, 64GB of memory, a pair of 7200rpm 500GB hard drives, VMware Enterprise for 2 processors with 3 yr subscription, RH Enterprise Linux 2 socket/4 guest with 3 yr subscription, 3yr 24×7  ProCare Service and HP installation services.  The total price comes to $27,871 which after an estimated discount of 25% on everything (probably not realistic), results in a street price of $20,903.

Let’s do the math.  Depending on which x86 scenario you believe is reasonable, it either takes 6 systems at a cost of $125,419, 5 systems @ $104,515 or 4 systems @ $83,612 to handle the same load as a 4 IFL/16-core pool of partitions on a 780 at a cost of $39,100.  So, in the most optimistic case for x86, you would still have to pay $44,512 more.  It does not take a rocket scientist to realize that using Power IFLs would result in a far less expensive solution with far better reliability and flexibility characteristics not to mention better performance since communication to/from the DB servers would utilize the radically faster backplane instead of an external TCP/IP network.
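
The arithmetic is simple enough to check for yourself.  Here is a quick sketch using only the figures quoted above: roughly $20,903 street price per discounted DL360e and roughly $39,100 for the 4 IFL/16-core pool (totals may differ from the text by a dollar due to rounding of the per-server discount).

```python
# Cost comparison using the figures quoted in the text.
IFL_POOL_COST = 39_100      # 4 IFLs / 16 cores on the Power 780, estimated street price
DL360E_STREET = 20_903      # DL360e Gen8, estimated street price after 25% discount

for servers in (6, 5, 4):   # pessimistic through wildly optimistic x86 server counts
    x86_total = servers * DL360E_STREET
    print(f"{servers} x DL360e = ${x86_total:,} -> ${x86_total - IFL_POOL_COST:,} more than the IFL pool")
```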

But wait, you say.  There is a better solution.  You could use bigger x86 systems with more partitions on each one.  You are correct.  Thanks for bringing that up.  Turns out, just as with Power Systems, if you put more partitions on each VMware system, the aggregate peaks never add up to the sum of the individual peaks.  Using 32-core DL560s @ 2.2GHz, 5% VMware overhead and a 65% target utilization, you would only need 2 systems.  I priced them on the HP web site with RH Linux 4 socket/unlimited guests 3 yr subscription, VMware Enterprise 4 socket/3 yr, 24×7 ProCare and HP installation service and found the price to be $70,626 per system, i.e. $141,252 for two systems, or $105,939 after the same, perhaps unattainable, 25% discount.  Clearly, 2 systems are more elegant than 4 to 6, but this solution is still $66,839 more expensive than the IFL solution.

I started off to try and prove that IBM Marketing was being overly optimistic and ended up realizing that they were highly conservative.  The business case for using IFLs for SAP app servers on an existing IBM high end system with unutilized dark cores compared to net new VMware/Linux/x86 systems is overwhelming.  As many customers have decided to utilize high end Power servers for DB due to their reliability, security, flexibility and performance characteristics, the introduction of IFLs for app servers is almost a no-brainer.


Configuration details:

HP ProLiant DL360e Gen8 8 SFF Configure-to-order Server – (Energy Star) 661189-ESC $11,435.00

HP ProLiant DL360e Gen8 Server
HP DL360e Gen8 Intel® Xeon® E5-2450L (1.8GHz/8-core/20MB/70W) Processor FIO Kit x 2
HP 32GB (4x8GB) Dual Rank x4 PC3L-10600 (DDR3-1333) Reg CAS-9 LP Memory Kit x 2
HP Integrated Lights Out 4 (iLO 4) Management Engine
HP Embedded B120i SATA Controller
HP 8-Bay Small Form Factor Drive Cage
HP Gen8 CPU1 Riser Kit with SAS Kit + SAS License Kit
HP 500GB 6G SATA 7.2K rpm SFF (2.5-inch) SC Midline 1yr Warranty Hard Drive x 2
HP 460W Common Slot Platinum Plus Hot Plug Power Supply
HP 1U Small Form Factor Ball Bearing Gen8 Rail Kit
3-Year Limited Warranty Included

3yr, 24×7 4hr ProCare Service $1,300.00

HP Install HP ProLiant $225.00

Red Hat Enterprise Linux 2 Sockets 4 Guest 3 Year Subscription 24×7 Support No Media Lic E-LTU $5,555.00

VMware vSphere Enterprise 1 Processor 3 yr software $4,678.00 x 2 = $9,356.00

DL360e Total price $27,871.00

 

ProLiant DL560 Gen8 Configure-to-order Server (Energy Star) 686792-ESC    $29,364.00

HP ProLiant DL560 Gen8 Configure-to-order Server
HP DL560 Gen8 Intel® Xeon® E5-4620 (2.2GHz/8-core/16MB/95W) Processor FIO Kit
HP DL560 Gen8 Intel® Xeon® E5-4620 (2.2GHz/8-core/16MB/95W) Processor Kit x3
HP 16GB (2x8GB) Dual Rank x4 PC3L-10600 (DDR3-1333) Reg CAS-9 LP Memory Kit x 4
ENERGY STAR® qualified model
HP Embedded Smart Array P420i/2GB FBWC Controller
HP 500GB 6G SAS 7.2K rpm SFF (2.5-inch) SC Midline 1yr Warranty Hard Drive x 2
HP iLO Management Engine(iLO 4)
3 years parts, labor and onsite service (3/3/3) standard warranty. Certain restrictions and exclusions apply.

HP 3y 4h 24×7 ProCare Service   $3,536.00

Red Hat Enterprise Linux 4 Sockets Unlimited Guest 3 Yr Subscription 24×7 Support No Media Lic E-LTU     $18,519.00

VMware vSphere Enterprise 1 Processor 3 yr software $4,678.00 x 4 = $18,712.00

HP Install DL560 Service   $495.00

DL560 Total price:   $70,626.00

October 21, 2013 Posted by | Uncategorized | Leave a comment

Head in the cloud? Keep your feet on the ground with IBM cloud computing for SAP.

Cloud means many things to many people.  One definition, popularized by various internet based organizations, refers to cloud as a repository of web URLs, email, documents, pictures, videos, information about items for sale, etc. on a set of servers maintained by an internet provider, where any server in that cluster may access and make available to the end user the requested object.  This is a good definition for those types of services; however, SAP does not exist as a set of independent objects that can be stored and made available on such a cloud.

 

Another definition involves the dynamic creation, usage and deletion of system images on a set of internet based servers hosted by a provider.  Those images could contain just about anything, including SAP software and customer data.  Security of customer data, both on disk and in transit across the internet, service level agreements, reliability, backup/recovery and government compliance (where appropriate) are just a few of the many issues that have to be addressed in such implementations.  Non-production systems are well suited for this type of cloud since many of the above issues may be less of a concern than for production systems.  Of course, that is only the case when no business data or intellectual property, e.g. developed ABAP or Java code, is stored on such servers; once it is, these systems become nearly as sensitive as production.  This type of public cloud may offer a low cost for infrequently accessed or low utilization environments.  Those economics can often change dramatically as usage increases or if more controls are desired.

 

Yet another definition utilizes traditional data center hosting providers that offer robust security, virtual private networks, high speed communications, high availability, backup/recovery and thorough controls.  The difference between conventional static hosting and cloud hosting is that the resources utilized for a given customer or application instance may be hosted on virtual rather than dedicated systems, available on demand, may be activated or removed via a self-service portal and may be multi-tenant, i.e. multiple customers may be hosted on a shared cloud.  While more expensive than the above cloud, this sort of cloud is usually more appropriate for SAP production implementations and is often less expensive than building a data center, staffing it with experts, acquiring the necessary support infrastructure, etc.

 

As many customers already own data centers, have large staffs of experts and host their own SAP systems today, another cloud alternative is often required: a Private Cloud.  These customers often wish to reduce the cost of systems by driving higher utilization, shared use of infrastructure among various workloads, automatic load balancing, improvements in staff productivity and potentially even self-service portals for on demand systems with charge back accounting to departments based on usage.

 

Utilizing a combination of tools from IBM and SAP, customers can implement a private cloud and achieve as many of the above goals as desired.  Let’s start with SAP.  SAP made its first foray into this area several years ago with its Adaptive Computing Controller (ACC).  Leveraging SAP application virtualization, it allowed basis administrators to start, stop and relocate SAP instances.  This helped SAP to get a much deeper appreciation for customer requirements, which enabled them to develop SAP NetWeaver Landscape Virtualization Management (LVM).  SAP, very wisely, realized that attempting to control infrastructure resources directly would require a huge effort and continuous updates as partner technology changed, not to mention an almost unlimited number of testing and support scenarios.  Instead, SAP developed a set of business workflows to allow basis admins to perform a wide array of common tasks.  They also developed an API and invited partners to write interfaces to their respective cloud enabling solutions.  In this way, while governing a workflow, SAP LVM simply requests a resource, for example from the partner’s systems or storage manager, and once that resource is delivered, continues with the rest of the workflow at the SAP application level.

 

IBM was an early partner with SAP ACC and has continued that partnership with SAP LVM.  By integrating storage management, the solution and enablement in the IBM Power Systems environment is particularly thorough and is probably the most complete of its kind on the market.  IBM offers two types of systems managers, IBM Systems Director (SD) and IBM Flex System Manager (FSM).  SD is appropriate for rack based systems, including conventional Power Systems, in addition to IBM’s complete portfolio of systems and storage.  As part of that solution, customers can manage physical and virtual resources, maintain operating systems, consolidate error management, control high availability and even optimize data center energy utilization.  FSM is a manager specifically for IBM’s new PureSystems family of products, including several Power Systems nodes.  FSM is focused on the management of the components delivered as part of a PureSystems environment, whereas SD is focused on the entire data center including PureSystems, storage and rack based systems.  Otherwise, the functions, in an LVM context, are largely the same.  FSM may be used with SD in a data center, either side by side or with FSM feeding certain types of information up to SD.  IBM also offers a storage management solution called Tivoli Storage FlashCopy Manager (FCM).  This solution drives the non-disruptive copying of filesystems on appropriate storage subsystems such as IBM’s XIV, as well as virtually any IBM or non-IBM storage subsystem through the IBM SAN Volume Controller (SVC) or V7000 (basically an SVC packaged with its own HDD and SSD).

 

Using the above, SAP LVM can capture OS images including SAP software, find resources on which to create new instances, rapidly deploy images, move them around as desired to load balance systems or when preventative maintenance is desired, monitor SAP instances, provide advanced dashboards, a variety of reports, and make SAP system/db copies, clones or refreshes including the SAP relevant post-copy automation tasks.

 

What makes the IBM Power Systems implementation unique is the integration between all of the pieces of the solution.  Using LVM with Power Systems and either SD, FSM or both, a basis admin can see and control both physical and virtual resources, as PowerVM is built in and is part of every Power System automatically.  This means that when, for instance, a physical node is added to an environment, SD and FSM can see it immediately, meaning LVM can also see it and start using it.  In the x86 world, there are two supported configurations for LVM, native and virtualized.  Clearly, a native installation is limited by its very definition, as all of the attributes of resource sharing, movement and some management features that come with virtualization are not present in a native installation.

 

According to SAP Note 1527538 – SAP NetWeaver Landscape Virtualization Management 1.0, currently only VMware is supported for virtualized x86 environments.  LVM with VMware/x86 based implementations rely on VMware vCenter meaning they can control only virtual resources.  Depending on customer implementation, systems admins may have to use a centralized systems management tool for installation, network, configuration and problem management, i.e. the physical world,  and vCenter for the virtual world.  This contrasts with SD or FSM which can manage the entire Power Systems physical and virtual environment plus all of the associated network and chassis management, where appropriate.

 

LVM with Power Systems and FCM can drive full database copy/clone/refresh activity through disk subsystems.  Disk subsystems such as IBM XIV can make copies very fast in a variety of ways.  Some make pointer based copies, which means that only changed blocks are duplicated and a “copy” is made available almost immediately for further processing by LVM.  In some situations and/or with some disk subsystems, a full copy process, in which every block is duplicated, might be utilized, but this happens at the disk subsystem or SAN level without involving a host system, so it is not only reasonably fast but also does not take host system resources.  In fact, a host system in this configuration does not even need to stop processing but merely places the source DB into “logging only” mode, which is then returned to normal operating mode a short time later, once the copy has been initiated.

 

LVM with x86 offers two options.  Option 1: utilize VMware and its storage copy service.  Option 2: utilize LVM native or with VMware and use a separate plugin from a storage subsystem vendor.  Option 2 works pretty much the same as the Power/FCM solution described above, except that only certain vendors are supported, and any integration of plugins from different companies, not to mention any troubleshooting, is a customer task.  It might be worthwhile to consider the number of companies that might be required to solve a problem in this environment, e.g. SAP, VMware, the storage subsystem vendor, the OS vendor and the systems vendor.

 

For Option 1, VMware drives copies via vCenter using a host only process.  According to the above mentioned SAP Note, “virtualization based cloning is only supported with Offline Database.”  This might be considered a bit disruptive by some and impossible to accommodate by others.  Even though it might be theoretically possible to use a VMware snapshot, a SID rename process must be employed for a clone, and every table must be read in and then out again with changes to the SID.  (That said, for some other LVM activities not involving a full clone, a VMware snapshot might be used.)  As a result, VMware snapshots quickly take on the appearance of a full copy, so they may not be the best technology to use, both because of the overhead on the system and because VMware itself does not recommend keeping database snapshots around for more than a few days at most; the clone process therefore typically uses the full copy option.  When the full copy is initiated by VMware, every block must be read into VMware and then back out.  Not only is this process slow for large databases, but it places a large load on the source system, potentially resulting in poor performance for other partitions during this time.  Since a full copy is utilized, a VMware based copy/clone will also take radically more disk storage than a Power/XIV based clone, which is fully supported with a “changed block only” copy.

 

Of course, the whole discussion of using LVM with vCenter may be moot.  After all, the assumption is that one would be utilizing VMware for database systems.  Many customers choose not to do this for a variety of reasons, from multiple single points of failure, to scaling, to database vendor support, to potential issues in problem resolution due to the use of a multi-layer, multi-vendor stack, e.g. hardware from one vendor with proprietary firmware from another vendor, a processor chip from another vendor, virtualization software from VMware, an OS from Microsoft, SUSE, Red Hat or Oracle, not to mention high availability and other potential issues.  Clearly, this would not be an issue if one eliminates database systems from the environment, but that is where some of the biggest benefits of LVM are realized.

 

LVM, as sophisticated as it is currently, does not address all of the requirements that some customers might have for a private cloud.  The good news is that it doesn’t have to.  IBM supplies a full range of cloud enabling products under the brand name of IBM SmartCloud.  These tools range from an “Entry” product suitable for adding a simple self-service portal, some additional automation and some accounting features, to a full featured “Enterprise” version.  Those tools call SD or FSM functions to manage the environment, which is quite fortunate as any changes made by those tools are immediately visible to LVM, thereby completing the circle.

 

SAP and IBM collaborated to produce a wonderful and in-depth document that details the IBM/Power solution:  https://scn.sap.com/docs/DOC-24822

 

A blogger at SAP has also written extensively on the topic of Cloud for SAP.  You can see his blog at: http://blogs.sap.com/cloud/2012/07/24/sap-industry-analyst-base-camp-a-recap-of-the-sap-cloud-strategy-session/

January 4, 2013 Posted by | Uncategorized | Leave a comment

Is the SAP 2-tier benchmark a good predictor of database performance?

Answer: Not even close, especially for x86 systems.  Sizings for x86 systems based on the 2-tier benchmark can be as much as 50% smaller for database-only workloads than what the 3-tier benchmark would predict.  Bottom line, I recommend that any database-only sizing for x86 systems or partitions be at least doubled to ensure that enough capacity is available for the workload.  At the same time, IBM Power Systems sizings are extremely conservative and have built-in allowances for reality vs. hypothetical 2-tier benchmark based sizings.  What follows is a somewhat technical and detailed analysis, but this topic cannot, unfortunately, be boiled down into a simple set of assertions.

 

The details: The SAP Sales and Distribution (S&D) 2-tier benchmark is absolutely vital to SAP sizings as workloads are measured in SAPS (SAP Application Performance Standard)[i], a unit of measurement based on the 2-tier benchmark.  The goal of this benchmark is to be hardware independent and useful for all types of workloads, but the reality of this benchmark is quite different.  The capacity required for the database server portion of the workload is 7% to 9% of the total capacity with the remainder used by multiple instances of dialog/update servers and a message/enqueue server.  This contrasts with the real world where the ratio of app to DB servers is more in the 4 to 1 range for transactional systems and 2 or 1 to 1 for BW.  In other words, this benchmark is primarily an application server benchmark with a relatively small database server.  Even if a particular system or database software delivered 50% higher performance for the DB server compared to what would be predicted by the 2-tier benchmark, the result on the 2-tier benchmark would only change by .07 * .5 = 3.5%.
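
A quick way to see how insensitive the 2-tier result is to the DB tier is to multiply the DB share of the workload by any hypothetical DB-side improvement.  The sketch below simply restates the arithmetic in the paragraph above (7% to 9% DB share, 50% faster DB tier), treating the product as a rough approximation of the overall change in the benchmark score.

```python
# Rough sensitivity of the 2-tier result to a DB-tier-only improvement.
def approx_two_tier_gain(db_share, db_improvement):
    """Approximate overall 2-tier gain when only the DB tier gets faster."""
    return db_share * db_improvement

print(f"{approx_two_tier_gain(0.07, 0.50):.1%}")   # 7% DB share, 50% faster DB -> ~3.5%
print(f"{approx_two_tier_gain(0.09, 0.50):.1%}")   # 9% DB share, 50% faster DB -> ~4.5%
```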

 

How then is one supposed to size database servers when the SAP Quicksizer shows the capacity requirements based on 2-tier SAPS?  A clue may be found by examining another, closely related SAP benchmark, the S&D 3-tier benchmark.  The workload used in this benchmark is identical to that used in the 2-tier benchmark, with the difference being that in the 2-tier benchmark all instances of DB and App servers must be located within one operating system (OS) image, whereas with the 3-tier benchmark, DB and App server instances may be distributed across multiple different OS images and servers.  Unfortunately, the unit of measurement is still SAPS, but this represents the total SAPS handled by all servers working together.  Fortunately, 100% of the SAPS must be funneled through the database server, i.e. this SAPS measurement, which I will call DB SAPS, represents the maximum capacity of the DB server.

 

Now, we can compare different SAPS and DB SAPS results or sizing estimates for various systems to see how well 2-tier and 3-tier SAPS correlate with one another.  Turns out, this is easier said than done as there are precious few 3-tier published results available compared to the hundreds of results published for the 2-tier benchmark.  But, I would not be posting this blog entry if I did not find a way to accomplish this, would I?  I first wanted to find two results on the 3-tier benchmark that achieved similar results.  Fortunately, HP and IBM both published results within a month of one another back in 2008, with HP hitting 170,200 DB SAPS[ii] on a 16-core x86 system and IBM hitting 161,520 DB SAPS[iii] on a 4-core Power system.

 

While the stars did not line up precisely, it turns out that 2-tier results were published by both vendors just a few months earlier, with HP achieving 17,550 SAPS[iv] on the same 16-core x86 system and IBM achieving 10,180 SAPS[v] on a 4-core Power system with a slightly higher clock speed (4.7GHz, or 12% faster) than the one used in the 3-tier benchmark.

 

Notice that the HP 2-tier result is 72% higher than the IBM result, even though the IBM system used the faster processor.  Clearly, this lead would have been even higher had IBM published a result on the slower processor.  SAP benchmark rules do not allow vendors to extrapolate results from slower to faster processors, and even though I am posting this as an individual and not on behalf of IBM, I will err on the side of caution and give you only the formula, not the estimated result:  17,550 / (10,180 * 4.2 / 4.7) = the ratio of the published HP result to the projected slower IBM processor.  At the same time, HP achieved only a 5.4% higher 3-tier result.  How does one go from almost twice the performance to essentially tied?  Easy answer: the IBM system was designed for database workloads, with a whole boatload of attributes that go almost unused in application server workloads, e.g. extremely high I/O throughput and advanced cache coherency mechanisms.

 

One might point out that Intel has really turned up its game since 2008 with the introduction of the Nehalem and Westmere chips and closed the gap, somewhat, against IBM’s Power Systems.  There is some truth in that, but let’s take a look at a more recent result.  In late 2011, HP published a 3-tier result of 175,320 DB SAPS[vi].  A direct comparison of old and new results shows that the new result delivered 3% more performance than the old with 12 cores instead of 16, which works out to about 37% more performance per core.  Admittedly, this is not completely correct, as the old benchmark utilized SAP ECC 6.0 with ASCII and the new one used SAP ECC 6.0 EP4 with Unicode, which is estimated to be a 28% higher resource workload, so in reality this new result is closer to 76% more performance per core.  By comparison, a slightly faster DL380 G7[vii], but otherwise an almost identical system to the BL460c G7, delivered 112% more SAPS/core on the 2-tier benchmark compared to the BL680c G5, and almost 171% more SAPS/core once the 28% factor mentioned above is taken into consideration.  Once again, one would need to adjust these numbers based on differences in MHz, and the formula for that would be: either of the above numbers * 3.06 / 3.33 = estimated SAPS/core.
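
If you want to check the per-core math yourself, here is a small sketch using only the published results listed in the certification details at the end of this post; the 1.28 Unicode factor and the 3.06/3.33 clock adjustment are the ones described above.

```python
# Recomputing the per-core comparisons above from the cited results.
UNICODE_FACTOR = 1.28          # ECC 6.0 EP4 Unicode vs. ECC 6.0 ASCII, as stated above

# 3-tier (DB SAPS): BL680c G5 (16 cores, 2008) vs. BL460c G7 (12 cores, 2011)
old_3tier_per_core = 170_200 / 16
new_3tier_per_core = 175_320 / 12
print(new_3tier_per_core / old_3tier_per_core)                   # ~1.37 -> ~37% more per core
print(new_3tier_per_core * UNICODE_FACTOR / old_3tier_per_core)  # ~1.76 -> ~76% more per core

# 2-tier (SAPS): BL680c G5 (16 cores) vs. DL380 G7 (12 cores, 3.33GHz)
old_2tier_per_core = 17_550 / 16
new_2tier_per_core = 27_880 / 12
print(new_2tier_per_core / old_2tier_per_core)                   # ~2.12 -> ~112% more per core
print(new_2tier_per_core * UNICODE_FACTOR / old_2tier_per_core)  # ~2.71 -> ~171% more per core

# Clock adjustment described in the text (DL380 G7 at 3.33GHz vs. BL460c G7 at 3.06GHz):
# multiply either 2-tier ratio by 3.06 / 3.33 before comparing against the 3-tier ratios.
```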

 

After one does this math, one would find that the improvement in 2-tier results was almost 3 times the improvement in 3-tier results, further questioning whether the 2-tier benchmark has any relevance to the database tier.  And there is just one more complicating factor: how vendors interpret SAP Quicksizer output.  The Quicksizer conveniently breaks down the amount of workload required of both the DB and App tiers.  Unfortunately, experience shows that this breakdown does not hold in reality, so vendors can make modifications to the ratios based on their experience.  Some, such as IBM, have found that DB loads are significantly higher than the Quicksizer estimates and have made sure that this tier is sized higher.  Remember, while app servers can scale out horizontally, the DB server cannot unless a parallel DB is used, so making sure that you don’t run out of capacity is essential.  What happens when you compare the sizing from IBM to that of another vendor?  That is hard to say, since each can use whatever ratio they believe is correct.  If you don’t know what ratio the different vendors use, you may be comparing apples and oranges.

 

Great!  Now, what is a customer to do now that I have completely destroyed any illusion that database sizing based on 2-tier SAPS is even remotely close to reality?

 

One option is to say, “I have no clue” and simply add a fudge factor, perhaps 100%, to the database sizing.  One could not be faulted for such a decision, as there is no other simple answer.  But one could also not be certain that this sizing was correct.  For example, how does I/O throughput fit into the equation?  It is possible for a system to be able to handle a certain amount of processing but not be able to feed data in at the rate necessary to sustain that processing.  Some virtualization managers, such as VMware, have to transfer data first to the hypervisor and then to the partition, or in the other direction down to the disk subsystem.  This causes additional latency and overhead and may be hard to estimate.

 

A better option is to start with IBM.  IBM Power Systems is the “gold standard” for SAP open systems database hosting.  A huge population of very large SAP customers, some of which have decided to utilize x86 systems for the app tier, use Power for the DB tier.  This has allowed IBM to gain real world experience in how to size DB systems which has been incorporated into its sizing methodology.  As a result, customers should feel a great deal of trust in the sizing that IBM delivers and once you have this sizing, you can work backwards into what an x86 system should require.  Then you can compare this to the sizing delivered by the x86 vendor and have a good discussion about why there are differences.  How do you work backwards?  A fine question for which I will propose a methodology.

 

Ideally, IBM would have a 3-tier benchmark for a current system from which you could extrapolate, but that is not the case.  Instead, you could extrapolate from the published result for the Power 550 mentioned above using IBM’s rperf, an internal estimate of relative performance for database intensive environments which is published externally.  The IBM Power Systems Performance Report[viii] includes rperf ratings for current and past systems.  If we multiply the size of the database system, as estimated by the IBM ERP sizer, by the ratio of per-core performance of the IBM and x86 systems, we should be able to estimate how much capacity is required on the x86 system.  For simplicity, we will assume the sizer has determined that the database requires 10 of the 16 cores of an IBM Power 740 3.55GHz system.  Here is the proposed formula:

 

Power 550 DB SAPS x 1/1.28 (old SAPS to new SAPS conversion) x rperf of 740 / rperf of 550

161,520 DB SAPS x 1/1.28 x 176.57 / 36.28 = estimated DB SAPS of 740 @ 16 cores

Then we can divide that above number by the number of cores to get a per core DB SAPS estimate.  By the same token you can divide the published HP BL 460c G7 DB SAPS number by the number of cores.  Then:

Estimated Power 740 DB SAPS/core / Estimated BL460c G7 DB SAPS/core = ratio to apply to sizing

The result is a ratio of 2.6, e.g. if a workload requires 10 IBM Power 740 3.55GHz cores, it would require 26 BL460c G7 cores.  This contrasts with the per-core estimated SAPS based on the 2-tier benchmark, which suggests that the Power 740 would have just 1.4 times the performance per core.  In other words, a 2-tier based sizing would suggest that the x86 system requires just 14 cores, where the 3-tier comparison suggests it actually needs almost twice that.  This is assuming the I/O throughput is sufficient.  It also assumes that both systems have the same target utilization.  In reality, where x86 systems are usually sized for no more than 65% utilization, Power Systems are routinely sized for up to 85% utilization.
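
For those who prefer to see the whole calculation in one place, here is a small sketch of the extrapolation, using only the figures quoted above: the published 3-tier results, the 1/1.28 SAPS conversion and the published rperf ratings.

```python
# Sketch of the extrapolation described above, using the figures quoted in the text.
SAPS_CONVERSION = 1.28               # old-SAPS to new-SAPS factor from the text
P550_DB_SAPS    = 161_520            # published Power 550 3-tier result
RPERF_740       = 176.57             # Power 740 3.55GHz, 16 cores
RPERF_550       = 36.28              # Power 550, as used in the formula above
BL460C_DB_SAPS  = 175_320            # published BL460c G7 3-tier result (12 cores)

p740_db_saps_per_core   = (P550_DB_SAPS / SAPS_CONVERSION) * (RPERF_740 / RPERF_550) / 16
bl460c_db_saps_per_core = BL460C_DB_SAPS / 12

ratio = p740_db_saps_per_core / bl460c_db_saps_per_core
print(f"per-core ratio: {ratio:.1f}")             # ~2.6, as stated above

power_cores_needed = 10                           # example sizing from the text
x86_cores_needed   = power_cores_needed * ratio   # ~26 BL460c G7 cores
x86_vcpus_needed   = x86_cores_needed * 2         # ~52 vcpus, since vcpus are twice the cores
print(round(x86_cores_needed), round(x86_vcpus_needed))
```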

 

If this workload were planned to run under VMware, the number of vcpus must also be considered, which is twice the number of cores, i.e. this workload would require 52 vcpus, which is over the 32-vcpu limit of VMware 5.0.  Even when VMware can handle 64 vcpus, the overhead of VMware and its ability to sustain the high I/O of such a workload must be included in any sizing.

 

Of course, technology moves on and Intel based servers are now into Gen8 models.  So, you may have to adjust what you believe to be the effective throughput of the x86 system based on its relative performance to the BL460c G7 above, but now, at least, you have a frame of reference for doing the appropriate calculations.  Clearly, we have shown that the 2-tier benchmark is an unreliable basis for sizing database-only systems or partitions and can easily be off by 100% for x86 systems.

 


[ii] 170,200 SAPS/34,000 users, HP ProLiant BL680c G5, 4 Processor/16 Core/16 Thread, E7340, 2.4 Ghz, Windows Server 2008 Enterprise Edition, SQL Server 2008, Certification # 2008003

[iii] 161,520 SAPS/32,000 users, IBM System p 550, 2 Processor/4 Core/8 Thread, POWER6, 4.2 Ghz , AIX 5.3,  DB2 9.5, Certification # 2008001

[iv] 17,550 SAPS/3,500 users , HP ProLiant BL680c G5, 4 Processor/16 Core/16 Thread, E7340, 2.4 Ghz, Windows Server 2008 Enterprise Edition, SQL Server 2008, Certification # 2007055

[v] 10,180 SAPS/2,035 users, IBM System p 570, 2 Processor/4 Core/8 Thread, POWER6, 4.7 Ghz, AIX 6.1, Oracle 10G, Certification # 2007037

[vi] 175,320 SAPS/32,125 users, HP ProLiant BL460c G7, 2 Processor/12 Core/24 Thread, X5675, 3.06 Ghz,  Windows Server 2008 R2 Enterprise on VMware ESX 5.0, SQL Server 2008, Certification # 2011044

[vii]27,880 SAPS/ 5,110 users, HP ProLiant DL380 G7, 2 Processor/12 Core/24 Thread, X5680, 3.33 Ghz, Windows Server 2008 R2 Enterprise, SQL Server 2008, Certification # 2010031

July 30, 2012 Posted by | Uncategorized | 1 Comment