SAPonPower

An ongoing discussion about SAP infrastructure

IBM Power10 debuts with a new SAP Benchmark!

Today, SAP published a new SD 2-tier result for IBM’s soon-to-be-announced Power E1080.[i]  First, the highlights:

  • 174,000 SD Users 
  • 955,050 SAPS
  • 120 cores

Wait, almost 1M SAPS with only 120 cores?  HPE achieved 670,830 SAPS (122,300 users) with 224 cores on their Superdome Flex 280 with the Intel Xeon Platinum 8380H processor in January 2021.

This new result is almost 3 times the SAPS/core of HPE’s biggest and baddest system.  (Funny note: autocorrect tried to change “baddest” to “saddest”.)  It is also about 33% faster, on a per core basis, than the previous Power E980 result published at the end of 2018.  That is certainly not remarkable since Intel’s per core performance on this benchmark also increased about 69.5% since 2017 … sorry, missed the decimal, 0.695%.  (Comparing two Dell 2-socket results, Intel 8180 & Intel 8380.)
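For anyone who wants to check the math, here is a minimal sketch in Python using just the published figures above:

```python
# Per-core SAPS comparison, using the published figures cited above.
results = {
    "IBM Power E1080 (Power10), 120 cores": (955_050, 120),
    "HPE Superdome Flex 280 (Xeon 8380H), 224 cores": (670_830, 224),
}
for system, (saps, cores) in results.items():
    print(f"{system}: {saps / cores:,.0f} SAPS/core")
# -> roughly 7,959 vs. 2,995 SAPS/core, i.e. almost 3x per core
```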

Clearly, IBM has moved the microarchitecture technology ball forward with a huge improvement in per core performance.  And that is significant in that Intel seems to have given up on the microarchitecture game and instead appears focused solely on increasing core count (now up to 40 per socket).

But isn’t the SD benchmark based on ECC 6.0 and primarily an app server benchmark, so do we really care if we are talking about SAP workloads?  For that matter, isn’t HANA the name of the game now and how can we correlate this result against HANA workloads?

Yes, and you can’t.  I will answer the second question first.  SAP rules forbid comparisons across different benchmarks, and for good reason; they don’t have the same logic, application code, database usage, memory dependency or anything else for that matter.  But we will get to the impact on HANA a bit later in this blog.

The SD benchmark is rather removed from reality, by its age, by its dependence on an outdated interface (the old and much loved SAP GUI … not), not to mention old non-HANA databases.  Fun fact: since 2005, 96 results used MaxDB or Sybase, 155 used IBM Db2, 52 Oracle, 413 Microsoft SQL Server and 0 used HANA.  And since application servers can easily scale across dozens of systems, the performance per core doesn’t really matter all that much, and this equation usually boils down to $/SAPS.

At Hot Chips 2020, Bill Starke, IBM Power chief architect, and Brian Thompto, Power10 core architect, revealed a bunch of amazing speeds and feeds, including 2.25x the memory bandwidth per socket for Power10 vs Power9.[ii]  We know that HANA eats memory bandwidth for breakfast, lunch, dinner and all snacks in between.  This new SD benchmark (and others that IBM will undoubtedly publish very soon) suggests that these new Power processors will be able to handle all workloads, including SAP HANA, with either fewer cores or with the same number of cores and tons of CPU cycles to spare.

It might be tempting to consider using a smaller Power10 system, but this is where the problem gets a bit sticky.  HANA not only loves memory bandwidth but, unless you are going to provision a server with less memory than SAP recommends or use one of their tiered approaches, you still need the same quantity of memory regardless of server or microarchitecture.  You could certainly reduce the number of cores per socket or go to slower chip speeds, and this might be a very good approach for reducing HANA system costs for a lot of customers.  Another option to consider is using those spare cycles for something else; after all, HANA is supported by SAP for use with IBM PowerVM shared processor pools.

What other workloads might you use those cycles for?  We could get into a big discussion about all sorts of other workloads, like AI, HPC, etc., but how about we keep this simple?  How about the thing that the SD benchmark actually does test: application serving?  Even with S/4HANA and Fiori, you still need application servers.  And if you already purchased a server for HANA based on memory requirements and you have a ton of cycles left over, the $/SAPS for those application servers essentially goes toward $0!  I have not priced an Intel server lately, but I am pretty certain that the price is not even remotely close to $0.

For existing SAP on Power customers (both HANA and non-HANA), Power10 is going to be amazing, resulting in better performance, lower cost or both!  For customers still trying to decide on which type of system to use, I would strongly encourage performing a full landscape cost comparison, including production HANA and application servers, HA, non-prod and DR.

And as good as this news is for on-premises customers, cloud vendors that offer HANA on Power, such as IBM, Syntax and SAP, should be even more excited about how they can decrease their costs while offering better solutions to their customers with Power10.


[i] https://www.sap.com/dmc/exp/2018-benchmark-directory/#/sd

[ii] https://www.nextplatform.com/2020/08/18/ibm-brings-an-architecture-gun-to-a-chip-knife-fight/

September 1, 2021

Haswell-EX for HANA looks good on paper, POWER8 for HANA looks even better in real life

I was delighted to read Hasso Plattner’s recent blog on the strengths of HANA on platforms using the Haswell-EX chip from Intel:  https://blogs.saphana.com/2015/06/29/impact-of-haswell-on-hana/  In that blog, he did an excellent job of explaining how technical enhancements at a processor and memory subsystem level can result in dramatic improvement in the way that HANA operates.  Now, I know what you are thinking; he likes what Dr. Plattner has to say about a competitor’s technology?  Strange as it may seem, yes … in that he has pointed out a number of relevant features that, as good as Haswell-EX might be, POWER8 surpassed even before Haswell-EX was announced.

All of these technical features are quite interesting to us propeller heads.  Most business people, on the other hand, would probably prefer to discuss how to improve HANA operational characteristics, deliver flexibility to respond to changing business demands and meet end user SLAs, including response time and continuous availability.  This is where POWER8 really shines.  With PowerVM at its core, Power Systems can be tailored to deliver capacity for HANA production to ensure consistent response time and peak load capacity during high demand times, while allowing other applications and partitions to utilize capacity unused by the HANA production partition.  It can easily mix production with other production and non-production partitions.  It features the ability to utilize shared network and SAN resources, if desired, to reduce datacenter cost and complexity.  And POWER8 delivers unmatched reliability by default, not as an option or a tradeoff against performance.

Regarding the technical features, Herr Dr. Plattner points out that Haswell-EX systems:

  • Support up to 144 cores per system with 12TB of memory.  POWER8 supports up to 192 cores and 16TB.  This actually understates the memory on a POWER8 system, which is up to 17.7TB, but IBM includes the extra 1.7TB at no extra cost as hot spare chips, not available with Haswell-EX systems.
  • Deliver L1, L2 and L3 cache size increases which, though he does not state them, are, in fact, 32KB (16KB in enterprise RAS mode), 256KB and 45MB respectively, compared to POWER8’s 64KB, 512KB and 96MB respectively, plus a 128MB L4, not available with Haswell-EX systems.
  • Introduce enhancements to vector processing via the new AVX2 instruction unit, compared to POWER8’s dual VMX instruction units.
  • Rely on local memory access for HANA performance, which is absolutely true and underlines why POWER8, with up to 4 times more bandwidth to memory, is such a good fit for HANA.
  • Feature TSX, Transactional Synchronization Extensions, to improve lock synchronization, an area that Power Systems has excelled at for decades.  POWER8 was a bit earlier to market in the whole transactional memory area, itself preceded by IBM Blue Gene/Q, another PowerPC based technology.

He concludes by pointing out that internal benchmarks are of limited value but then explains what they achieved with Haswell-EX.  As these results are neither externally audited nor published, it is hard to comment on their validity.

By comparison, SAP has only one certified benchmark for which HANA systems have been utilized, called BW-EML.  Haswell-EX CPUs were used in the 2-billion-row Dell PowerEdge R930 benchmark and delivered 172,450 Ad-hoc Navigation Steps/Hr.  This is impressive in that it surpassed the previous Ivy Bridge based result of 137,010 Ad-hoc Navigation Steps/Hr on the Dell PowerEdge R920, an increase of almost 26%, which would normally be noteworthy were it not for the fact that the system includes 20% more cores and 50% more memory.  By comparison, POWER8 delivered 192,750 Ad-hoc Navigation Steps/Hr with the IBM Power System E870, or 12% more performance with 45% fewer cores and 33% less memory, resulting in twice the performance per core.

It would be ideal to run the SAP SD 3-tier benchmark against a system running Suite on HANA, as that would do away with discussions of benchmarks that can’t be verified and/or may have limited relevance to a transactional environment typical of Suite on HANA.  From what I understand, the current SD benchmark depends on an older version of SAP code which is not supported on HANA.  I hope that SAP is able to update the benchmark test kit to enable this benchmark to be run on HANA, as that would be far better than any sort of speculation.  In the meantime, we can only rely on assertions without detail or external review, or on decades of proven experience handling large, scaling transactional environments with mission critical levels of availability, not to mention a wide variety of audited benchmarks demonstrating this ability.  Power Systems stands alone in this respect.

Benchmark details:

Dell PowerEdge R930: 172,450 Ad-hoc Navigation Steps/Hr, 4 processors / 72 cores / 144 threads, Intel Xeon Processor E7-8890 v3, 2.50 GHz, 1.5TB main memory, Certification #: 2015014

Dell PowerEdge R920: 137,010 Ad-hoc Navigation Steps/Hr, 4 processors / 60 cores / 120 threads, Intel Xeon Processor E7-4890 v2, 2.80 GHz, 1TB main memory, Certification #: 2014044

IBM Power System E870: 192,750 Ad-hoc Navigation Steps/Hr, 4 processors / 40 cores / 320 threads, POWER8, 4.19 GHz, 1TB main memory, Certification #: 2015024
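If you would like to reproduce the per core comparison, here is a minimal sketch using the certified figures listed above (the parenthetical chip labels are mine):

```python
# Per-core BW-EML comparison from the certified results listed above.
results = {
    "Dell PowerEdge R930 (Haswell-EX)": (172_450, 72),
    "Dell PowerEdge R920 (Ivy Bridge)": (137_010, 60),
    "IBM Power System E870 (POWER8)": (192_750, 40),
}
for system, (steps, cores) in results.items():
    print(f"{system}: {steps / cores:,.0f} steps/hr per core")
# POWER8 at ~4,819 steps/hr per core is roughly double Haswell-EX's ~2,395.
```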

July 22, 2015

Is the SAP 2-tier benchmark a good predictor of database performance?

Answer: Not even close, especially for x86 systems.  Sizings for x86 systems based on the 2-tier benchmark can be as much as 50% smaller for database-only workloads than would be predicted by the 3-tier benchmark.  Bottom line: I recommend that any database-only sizing for x86 systems or partitions be at least doubled to ensure that enough capacity is available for the workload.  At the same time, IBM Power Systems sizings are extremely conservative and have built-in allowances for reality vs. hypothetical 2-tier benchmark based sizings.  What follows is a somewhat technical and detailed analysis, but this topic cannot, unfortunately, be boiled down into a simple set of assertions.

The details: The SAP Sales and Distribution (S&D) 2-tier benchmark is absolutely vital to SAP sizings, as workloads are measured in SAPS (SAP Application Performance Standard)[i], a unit of measurement based on the 2-tier benchmark.  The goal of this benchmark is to be hardware independent and useful for all types of workloads, but the reality of this benchmark is quite different.  The capacity required for the database server portion of the workload is 7% to 9% of the total capacity, with the remainder used by multiple instances of dialog/update servers and a message/enqueue server.  This contrasts with the real world, where the ratio of app to DB servers is more in the 4 to 1 range for transactional systems and 2 to 1 or 1 to 1 for BW.  In other words, this benchmark is primarily an application server benchmark with a relatively small database server.  Even if a particular system or database software delivered 50% higher performance for the DB server compared to what would be predicted by the 2-tier benchmark, the result on the 2-tier benchmark would only change by 0.07 x 0.5 = 3.5%.
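That sensitivity argument is easy to verify; here is a minimal sketch, assuming the 7% DB share from above:

```python
# How much a DB-tier speedup moves the overall 2-tier SD result.
db_share = 0.07      # DB server is ~7-9% of total 2-tier capacity (per the text)
db_speedup = 0.50    # hypothetical: DB tier performs 50% better than predicted
overall_change = db_share * db_speedup
print(f"Overall 2-tier result changes by only {overall_change:.1%}")  # 3.5%
```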

 

How then is one supposed to size database servers when the SAP Quicksizer shows the capacity requirements based on 2-tier SAPS?  A clue may be found by examining another, closely related SAP benchmark, the S&D 3-tier benchmark.  The workload used in this benchmark is identical to that used in the 2-tier benchmark, the difference being that in the 2-tier benchmark, all instances of DB and app servers must be located within one operating system (OS) image, whereas with the 3-tier benchmark, DB and app server instances may be distributed to multiple different OS images and servers.  Unfortunately, the unit of measurement is still SAPS, but this represents the total SAPS handled by all servers working together.  Fortunately, 100% of the SAPS must be funneled through the database server, i.e. this SAPS measurement, which I will call DB SAPS, represents the maximum capacity of the DB server.

Now we can compare different SAPS and DB SAPS results or sizing estimates for various systems to see how well 2-tier and 3-tier SAPS correlate with one another.  Turns out, this is easier said than done, as there are precious few published 3-tier results compared to the hundreds published for the 2-tier benchmark.  But I would not be posting this blog entry if I had not found a way to accomplish this, would I?  I first wanted to find two 3-tier results that were close to one another.  Fortunately, HP and IBM both published results within a month of one another back in 2008, with HP hitting 170,200 DB SAPS[ii] on a 16-core x86 system and IBM hitting 161,520 DB SAPS[iii] on a 4-core Power system.

 

While the stars did not line up precisely, it turns out that 2-tier results were published by both vendors just a few months earlier, with HP achieving 17,550 SAPS[iv] on the same 16-core x86 system and IBM achieving 10,180 SAPS[v] on a 4-core Power system clocked slightly higher (4.7GHz, or 12% faster) than the one used in the 3-tier benchmark.

Notice that the HP 2-tier result is 72% higher than the IBM result using the faster IBM processor.  Clearly, this lead would have been even higher had IBM published a result on the slower processor.  While SAP benchmark rules do not allow vendors to extrapolate from slower to faster processors, and even though I am posting this as an individual, not on behalf of IBM, I will err on the side of caution and give you only the formula, not the estimated result:  17,550 / (10,180 x 4.2 / 4.7) = the ratio of the published HP result to the projected slower IBM processor.  At the same time, HP achieved only a 5.4% higher 3-tier result.  How does one go from almost twice the performance to essentially tied?  Easy answer: the IBM system was designed for database workloads with a whole boatload of attributes that go almost unused in application server workloads, e.g. extremely high I/O throughput and advanced cache coherency mechanisms.

 

One might point out that Intel has really turned up its game since 2008 with the introduction of the Nehalem and Westmere chips and closed the gap, somewhat, against IBM’s Power Systems.  There is some truth in that, but let’s take a look at a more recent result.  In late 2011, HP published a 3-tier result of 175,320 DB SAPS[vi].  A direct comparison of old and new results shows that the new result delivered 3% more performance than the old with 12 cores instead of 16, which works out to about 37% more performance per core.  Admittedly, this is not completely correct, as the old benchmark utilized SAP ECC 6.0 with ASCII and the new one used SAP ECC 6.0 EP4 with Unicode, which is estimated to be a 28% higher resource workload, so in reality this new result is closer to 76% more performance per core.  By comparison, a slightly faster DL380 G7[vii], otherwise an almost identical system to the BL460c G7, delivered 112% more SAPS/core on the 2-tier benchmark compared to the BL680c G5, and almost 171% more SAPS/core once the 28% factor mentioned above is taken into consideration.  Once again, one would need to adjust these numbers based on differences in MHz, and the formula for that would be: either of the above numbers x 3.06/3.33 = estimated SAPS/core.

After one does this math, one would find that the improvement in 2-tier results was almost 3 times the improvement in 3-tier results, further questioning whether the 2-tier benchmark has any relevance to the database tier.  And just one more complicating factor: how vendors interpret SAP Quicksizer output.  The Quicksizer conveniently breaks down the amount of workload required of both the DB and app tiers.  Unfortunately, experience shows that this breakdown does not work in reality, so vendors can make modifications to the ratios based on their experience.  Some, such as IBM, have found that DB loads are significantly higher than the Quicksizer estimates and have made sure that this tier is sized higher.  Remember, while app servers can scale out horizontally, the DB server cannot unless a parallel DB is used, so making sure that you don’t run out of capacity is essential.  What happens when you compare the sizing from IBM to that of another vendor?  That is hard to say, since each can use whatever ratio they believe is correct.  If you don’t know what ratios the different vendors use, you may be comparing apples and oranges.

 

Great!  What is a customer to do now that I have completely destroyed any illusion that database sizing based on 2-tier SAPS is even remotely close to reality?

One option is to say, “I have no clue” and simply add a fudge factor, perhaps 100%, to the database sizing.  One could not be faulted for such a decision, as there is no other simple answer.  But one could also not be certain that this sizing was correct.  For example, how does I/O throughput fit into the equation?  It is possible for a system to be able to handle a certain amount of processing but not be able to feed data in at the rate necessary to sustain that processing.  Some virtualization managers, such as VMware, have to transfer data first to the hypervisor and then to the partition, or in the other direction to the disk subsystem.  This causes additional latency and overhead and may be hard to estimate.

 

A better option is to start with IBM.  IBM Power Systems is the “gold standard” for SAP open systems database hosting.  A huge population of very large SAP customers, some of which have decided to utilize x86 systems for the app tier, use Power for the DB tier.  This has allowed IBM to gain real-world experience in how to size DB systems, which has been incorporated into its sizing methodology.  As a result, customers should feel a great deal of trust in the sizing that IBM delivers, and once you have this sizing, you can work backwards into what an x86 system should require.  Then you can compare this to the sizing delivered by the x86 vendor and have a good discussion about why there are differences.  How do you work backwards?  A fine question, for which I will propose a methodology.

Ideally, IBM would have a 3-tier benchmark for a current system from which you could extrapolate, but that is not the case.  Instead, you could extrapolate from the published result for the Power 550 mentioned above using IBM’s rperf, an internal estimate of relative performance for database-intensive environments which is published externally.  The IBM Power Systems Performance Report[viii] includes rperf ratings for current and past systems.  If we multiply the size of the database system as estimated by the IBM ERP sizer by the ratio of per core performance of IBM and x86 systems, we should be able to estimate how much capacity is required on the x86 system.  For simplicity, we will assume the sizer has determined that the database requires 10 cores of a 16-core IBM Power 740 @ 3.55GHz.  Here is the proposed formula:


Power 550 DB SAPS x 1/1.28 (old SAPS to new SAPS conversion) x rperf of 740 / rperf of 550

161,520 DB SAPS x 1/1.28 x 176.57 / 36.28 = estimated DB SAPS of 740 @ 16 cores

Then we can divide the above number by the number of cores to get a per-core DB SAPS estimate.  By the same token, you can divide the published HP BL460c G7 DB SAPS number by its number of cores.  Then:

Estimated Power 740 DB SAPS/core / Estimated BL460c G7 DB SAPS/core = ratio to apply to sizing

The result is a ratio of 2.6, e.g. if a workload requires 10 IBM Power 740 3.55GHz cores, it would require 26 BL460c G7 cores.  This contrasts with the per core estimate based on the 2-tier benchmark, which suggests that the Power 740 would deliver just 1.4 times the performance per core.  In other words, a 2-tier based sizing would suggest that the x86 system requires just 14 cores, where the 3-tier comparison suggests it actually needs almost twice that.  This assumes the I/O throughput is sufficient.  It also assumes that both systems have the same target utilization; in reality, where x86 systems are usually sized for no more than 65% utilization, Power Systems are routinely sized for up to 85% utilization.
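Putting the proposed methodology together, here is a minimal sketch using the rperf ratings and published results quoted above; the linear scaling by rperf and the 1.28 SAPS conversion are the assumptions stated in the text:

```python
# Back-calculating an x86 DB sizing from the IBM Power 550 3-tier result.
p550_db_saps = 161_520            # IBM System p 550 3-tier result (cert 2008001)
saps_conversion = 1 / 1.28        # old SAPS -> new (Unicode/EhP4) SAPS
rperf_740, rperf_550 = 176.57, 36.28
cores_740, cores_bl460c = 16, 12

est_740_db_saps = p550_db_saps * saps_conversion * rperf_740 / rperf_550
per_core_740 = est_740_db_saps / cores_740
per_core_bl460c = 175_320 / cores_bl460c   # HP BL460c G7 3-tier (cert 2011044)

ratio = per_core_740 / per_core_bl460c
print(f"Sizing ratio: {ratio:.1f}")                               # ~2.6
print(f"10 Power 740 cores -> {10 * ratio:.0f} BL460c G7 cores")  # ~26
```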


If this workload were planned to run under VMware, the number of vcpus must be considered, which is twice the number of cores, i.e. this workload would require 52 vcpus, which is over the 32-vcpu limit of VMware 5.0.  Even when VMware can handle 64 vcpus, the overhead of VMware and its ability to sustain the high I/O of such a workload must be included in any sizing.


Of course, technology moves on and Intel is into its Gen8 processors.  So, you may have to adjust what you believe to be the effective throughput of the x86 system based on its relative performance to the BL460c G7 above, but now, at least, you have a frame of reference for doing the appropriate calculations.  Clearly, we have shown that 2-tier is an unreliable benchmark by which to size database-only systems or partitions and can easily be off by 100% for x86 systems.



[ii] 170,200 SAPS/34,000 users, HP ProLiant BL680c G5, 4 processors/16 cores/16 threads, E7340, 2.4 GHz, Windows Server 2008 Enterprise Edition, SQL Server 2008, Certification # 2008003

[iii] 161,520 SAPS/32,000 users, IBM System p 550, 2 processors/4 cores/8 threads, POWER6, 4.2 GHz, AIX 5.3, DB2 9.5, Certification # 2008001

[iv] 17,550 SAPS/3,500 users, HP ProLiant BL680c G5, 4 processors/16 cores/16 threads, E7340, 2.4 GHz, Windows Server 2008 Enterprise Edition, SQL Server 2008, Certification # 2007055

[v] 10,180 SAPS/2,035 users, IBM System p 570, 2 processors/4 cores/8 threads, POWER6, 4.7 GHz, AIX 6.1, Oracle 10G, Certification # 2007037

[vi] 175,320 SAPS/32,125 users, HP ProLiant BL460c G7, 2 processors/12 cores/24 threads, X5675, 3.06 GHz, Windows Server 2008 R2 Enterprise on VMware ESX 5.0, SQL Server 2008, Certification # 2011044

[vii] 27,880 SAPS/5,110 users, HP ProLiant DL380 G7, 2 processors/12 cores/24 threads, X5680, 3.33 GHz, Windows Server 2008 R2 Enterprise, SQL Server 2008, Certification # 2010031

July 30, 2012

Oracle publishes another SAP benchmark result with limited value

About a month ago, I posted a review of Oracle’s SAP ATO benchmark result and pointed out that ATO is so obscure and has so few results that, other than marketing value, their result was completely irrelevant.  About two weeks later, they published a result on another rarely used SAP benchmark, Sales and Distribution-Parallel.  While there have been a few more publications on this variety of SD than on the ATO benchmark, the number published prior to this one over the past two years could be counted on one hand, all by Oracle/Sun.

This benchmark is not completely irrelevant, but without context and competitors it says very little about the performance of systems for SAP.  As the name implies, it requires the use of a parallel database.  While some customers have implemented SAP with a parallel database like Oracle RAC, these customers represent a very small minority, reportedly less than 1% of all SAP customers.  The reason has been discussed in my post on Exadata for SAP, so I will only summarize it here.  SAP is RAC enabled, not RAC aware, meaning that tuning and scalability can be a real challenge and not for the faint of heart.  While a good solution for very high availability, the benefit depends on how often you think you will avoid a 20-minute-or-so outage.  Some non-IBM customers predict their DB server will only fail every 2 years, meaning RAC may help avoid 10 minutes of downtime per year for those customers.  Obviously, if the predicted failure rate is higher or the time for recovery is longer, the benefits of RAC increase proportionately, and if failures occur less often, the value decreases.

But that is not why this benchmark is of little value.  In reality, the SD benchmark is approximately 1/16 DB workload, the rest being app servers and the CI.  To put that in context, for this benchmark, at 1/16 of the total, the DB workload would be approximately 46,265 SAPS.  A 16-core Power 730 has a single-system result higher than that, as do just about all Westmere-EX systems.  In other words, for scalability purposes, this workload simply does not justify the need for a parallel database.  In addition, the SD benchmark requires that the app servers run on the same OS as the DB server, but since this is SD-Parallel, app servers must run on each node in the cluster.  This turns out to be perfect for benchmark optimization.  Each group of users assigned to an app server is uniquely associated with the DB node on the same server.  The data that they utilize is also loaded into the local memory of that same server, and virtually no cross-talk, i.e. remote memory accesses, occurs.  These types of clustered results inevitably show near-linear scalability.  As most people know, near-linear scalability is not really possible within an SMP, much less across a cluster.  This means that the high apparent scalability in this benchmark is mostly a work of fiction.

Before I am accused of hypocrisy, I should mention that IBM also published results on the SD-Parallel benchmark back in early 2008.  Back then, the largest single-image SD result, achieved by HP on the 128-core Superdome of that era, was 30,000 users @ 152,530 SAPS.  While a large number, there were customers that already had larger SAP ERP instances than this.  So, when IBM proved that it could achieve a higher number, 37,040 users @ 187,450 SAPS with a 5-node cluster with a total of only 80 cores, this was an interesting proof point, especially since we also published a 64-core single-image result of 35,400 users @ 177,950 SAPS on the Power 595 within a few days.  In other words, IBM did not try to prove that the only way to achieve high results was using a cluster, but that a cluster could produce comparable results with a few more cores.  The published result was not a substitute for real, substantial results, but an addition to them as a proof point of support for Oracle and Oracle RAC.  The last time that Oracle or Sun provided a single-image SD result was way back in December 2009, almost ancient history in the computer world.

This new result, 134,080 users @ 740,250 SAPS on a cluster of 6 Sun Fire x4800 systems, each with 80 Intel Xeon cores, is very high, but it only surpasses the previous high-water result on any version of the SD benchmark by 6% while requiring 87.5% more cores.  We can debate whether any customer would be willing to run a 6-node RAC cluster for OLTP.  We can also debate how many customers … in the entire world … have single-instance requirements anywhere close to this level.  A typical SAP customer might have 1,000 to 5,000 named users but far fewer concurrent users.  This benchmark does nothing to help inform those customers about the performance they could expect using Oracle systems.

So, this new parallel result demonstrates neither true parallel scalability, nor single-system scalability, nor even relevance for small to very large SAP workloads.  In other words, what value does it provide to evaluators of technology?  Nothing!  What value does it provide to Oracle?  Plenty!  Not only do they get to beat their chest about another “leadership” result, but they get to imply that customers can actually achieve these sorts of results with this and various other untested and unproven configurations.  More importantly, if customers were actually to buy into RAC as being the right answer for scalability, Oracle would get to harvest untold millions of dollars in license and maintenance revenues.  This configuration included 480 cores, meaning customers not utilizing an OEM license through SAP would have to pay 480 x 0.5 (core license factor) x ($47,500 (Oracle license cost) + $22,500 (Oracle RAC license cost)) = $16.8M @ list for the Oracle licenses and another $18.5M for 5 years of maintenance, and this is assuming no Oracle tools such as Partitioning, Advanced Compression, Diagnostic Pack, Tuning Pack, Change Management Pack or Active Data Guard.

For comparison, the largest single-image system result, for the IBM Power 795 mentioned above, achieved just 6% fewer users with DB2 on a 256-core system.  A 256-core license of DB2 would cost a customer 256 x 120 PVU x $405/PVU = $12.4M @ list for the DB2 licenses and another $10.9M for 4 years of maintenance (the first year of maintenance is included as warranty, as opposed to Oracle, which charges for the first year of maintenance).  So, the DB2 license would not be inexpensive, a total of $23.3M over 5 years, but that is quite a bit better than the $35.3M for the Oracle licenses mentioned above.
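For those who want to check the list-price arithmetic, here is a minimal sketch; the 22% annual maintenance rate is my assumption, chosen because it reproduces the maintenance figures quoted above, and actual pricing will vary:

```python
# List-price comparison of Oracle+RAC vs. DB2 licensing from the text.
oracle_cores = 480
oracle_license = oracle_cores * 0.5 * (47_500 + 22_500)  # 0.5 core factor
oracle_maint = oracle_license * 0.22 * 5                 # 5 years, year 1 charged

db2_cores = 256
db2_license = db2_cores * 120 * 405                      # 120 PVU/core @ $405/PVU
db2_maint = db2_license * 0.22 * 4                       # year 1 covered by warranty

print(f"Oracle: ${oracle_license/1e6:.1f}M license + ${oracle_maint/1e6:.1f}M maintenance")
print(f"DB2:    ${db2_license/1e6:.1f}M license + ${db2_maint/1e6:.1f}M maintenance")
```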

Full results are available at: http://www.sap.com/solutions/benchmark/index.epx

October 4, 2011

Oracle M9000 SAP ATO Benchmark analysis

SAP has a large collection of different benchmark suites.  Most people are familiar with the SAP Sales and Distribution (SD) 2-tier benchmark, as the vast majority of all results have been published using this benchmark suite.  A lesser known benchmark suite is called ATO, or Assemble-to-Order.  When the ATO benchmark was designed, it was intended to replace SD as a more “realistic” workload.  As the benchmark is a little more complicated to run and SAP Quicksizer sizings are based on the SD workload, the ATO benchmark never got much traction, and from 1998 through 2003, only 19 results were published.  Prior to September 2, 2011, this benchmark had seemed to become extinct.  On that date, Oracle and Fujitsu published a 2-tier result for the SPARC M9000 along with the predictable claim of a world record result.  Oracle should be commended for having beaten the results published in 2003.  Of course, we might want to consider that a 2-processor/12-core, 2U Intel based system of today has already surpassed the TPC-C results of a 64-core HP Itanium Superdome that “set the record” back in 2003, at a tiny fraction of the cost and floor space.

So we give Oracle a one-handed clap for this “accomplishment”.  But if I left it at that, you might question why I would even bother to post this blog entry.  Let’s delve a little deeper to find the story within the story.  First, let me remind the reader: these are my opinions, and in no way do they reflect the opinions of IBM, nor has IBM endorsed or reviewed them.

 

In 2003, Fujitsu-Siemens published a couple of ATO results using a predecessor of today’s SPARC64 VII chip, the SPARC64 V at 1.35GHz, and SAP 4.6C.  The just-published M9000 result used the SPARC64 VII at 3.0GHz and SAP EP4 for SAP ERP 6.0 with Unicode.  If one were to divide the results achieved by both systems by the number of cores and compare them, one might find that the new result delivers a very small increase in throughput per core, roughly 6%, over the old results.  Of course, this does not account for the changes in SAP software, Unicode or benchmark requirements.  SAP rules do not allow for extrapolations, so I will instead provide you with the data from which to make your own calculations.  100 SAPS using SAP 4.6C is equal to about 55 SAPS using Business Suite 7 with Unicode.  If you were to multiply the old result by 55/100 and then divide by the number of cores, you could determine the effective throughput per core of the old system if it were running the current benchmark suite.  I can’t show you the result, but I will show you the formula that you can use to determine this result yourself at the end of this posting.

For comparison, I wanted to figure out how Oracle did on the SD 2-tier benchmark compared to systems back in 2003.  Turns out that almost identical systems were used both in 2003 and in 2009, with the exception of the Sun M9000, which used 2.8GHz processors, each of which had half the L2 cache of the 3.0GHz system used in the ATO benchmark.  If you were to use a formula similar to the one described above and then perhaps multiply by the difference in MHz, i.e. 3.0/2.8, you could derive a similar per core performance comparison of the new and old systems.  Prior to performing any extrapolations, the benchmark users per core actually decreased between 2003 and 2009 by roughly 10%.

 

I also wanted to take a look at similar systems from IBM then and now.  Fortunately, IBM published SD 2-tier results for the 8-core 1.45GHz pSeries 650 in 2003 and for a 256-core 4.0GHz Power 795 late last year, with the SAP levels being identical to the ones used by Sun and Fujitsu-Siemens respectively.  Using the same calculations as were done for the SD and ATO comparisons above, IBM achieved 223% more benchmark users per core than it achieved in 2003, prior to any extrapolations.


Yes, there was no typo there.  While the results by IBM improved by 223% on a per core basis, the Fujitsu processor based systems either improved by only 6% or decreased by 10%, depending on which benchmark you chose.  Interestingly enough, IBM had only a 9% per core advantage over Fujitsu-Siemens in 2003, which increased to a 294% advantage in 2009/2010 based on the SD 2-tier benchmark.

 

It is remarkable that since November 18, 2009, Oracle (Sun) has not published a single SPARC based SAP SD benchmark result, while over 70 results were published by a variety of vendors, including two by Sun for their Intel systems.  When Oracle finally decided to get back into the game to try to prove their relevance, despite a veritable flood of analyst and press suggestions to the contrary, rather than competing on the established and vibrant SD benchmark, they chose to stand on top of a small heap of dead carcasses to say they are better than the rotting husks upon which they stand.


For full disclosure, here are the actual results:

SD 2-tier Benchmark Results

Certification Date   System                                             Benchmark Users   SAPS      Cert #
1/16/2003            IBM eServer pSeries 650, 8 cores                   1,220             6,130     2003002
3/11/2003            Fujitsu Siemens Computers PrimePower 900, 8 cores  1,120             5,620     2003009
3/11/2003            Fujitsu Siemens Computers PrimePower 900, 16 cores 2,200             11,080    2003010
11/18/2009           Sun Microsystems M9000, 256 cores                  32,000            175,600   2009046
11/15/2010           IBM Power 795, 256 cores                           126,063           688,630   2010046

ATO 2-tier results:

Certification Date   System                                             Fully Processed Assembly Orders/Hr   Cert #
3/11/2003            Fujitsu Siemens Computers PrimePower 900, 8 cores  6,220                                2003011
3/11/2003            Fujitsu Siemens Computers PrimePower 900, 16 cores 12,170                               2003012
9/2/2011             Oracle M9000, 256 cores                            206,360                              2011033

Formulas that you might use, assuming you agree with the assumptions:

Performance of old system / number of cores x 55/100 = effective performance per core on the new benchmark suite (EP)

(Performance of new system / cores) / EP = relative ratio of performance per core of the new system compared to the old system

Improvement per core = relative ratio – 1


This can be applied to both the SD and ATO results using the appropriate throughput measurements.
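As a minimal sketch, these formulas can be expressed as a couple of Python functions so that you can run the calculations yourself against the disclosed results; any conclusions drawn from the output are your own extrapolations, not SAP-sanctioned comparisons:

```python
# The normalization formulas above, expressed as code.
def per_core_new_equiv(old_result, old_cores):
    """Old per-core throughput expressed in new-benchmark-suite terms (EP)."""
    return old_result / old_cores * 55 / 100

def improvement_per_core(new_result, new_cores, old_result, old_cores):
    relative_ratio = (new_result / new_cores) / per_core_new_equiv(old_result, old_cores)
    return relative_ratio - 1

# Example: ATO, Oracle M9000 (2011) vs. PrimePower 900 16-core (2003)
print(f"{improvement_per_core(206_360, 256, 12_170, 16):+.0%} per core")
```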

September 9, 2011