SAPonPower

An ongoing discussion about SAP infrastructure

IBM Power10 debuts with a new SAP Benchmark!

Today, SAP published a new SD 2-tier result for IBM’s soon-to-be-announced Power E1080.[i]  First, the highlights:

  • 174,000 SD Users 
  • 955,050 SAPS
  • 120 cores

Wait, almost 1M SAPS with only 120 cores?  HPE achieved 670,830 SAPS (122,300 users) with 224 cores on their Superdome Flex 280 with the Intel Xeon Platinum 8380H processor in January 2021.

This new result is almost 3 times the SAPS/core of HPE’s biggest and baddest system.  (Funny note: autocorrect tried to change “baddest” to “saddest”.)  It is also about 33% faster, on a per-core basis, than the previous Power E980 result published at the end of 2018.  That is certainly not remarkable, since Intel’s per-core performance on this benchmark also increased by about 69.5% since 2017 … sorry, missed the decimal, 0.695%.  (Comparing two Dell 2-socket results, Intel 8180 vs. Intel 8380.)
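For anyone who wants to check the arithmetic, here is a minimal sketch in Python using only the published figures above:

```python
# Per-core SAPS from the two published 2-tier SD results cited above
e1080_saps, e1080_cores = 955_050, 120      # IBM Power E1080 result
sdf280_saps, sdf280_cores = 670_830, 224    # HPE Superdome Flex 280 result

e1080_per_core = e1080_saps / e1080_cores       # ~7,959 SAPS/core
sdf280_per_core = sdf280_saps / sdf280_cores    # ~2,995 SAPS/core

print(f"Power E1080: {e1080_per_core:,.0f} SAPS/core")
print(f"Superdome Flex 280: {sdf280_per_core:,.0f} SAPS/core")
print(f"Ratio: {e1080_per_core / sdf280_per_core:.2f}x")  # ~2.66x, i.e. almost 3x
```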

Clearly, IBM has moved the microarchitecture technology ball forward with a huge improvement in per-core performance.  That is significant because Intel seems to have given up on the microarchitecture game, focusing instead on increasing core count (now up to 40 per socket).

But isn’t the SD benchmark based on ECC 6.0 and primarily an app server benchmark, so does it really matter for today’s SAP workloads?  For that matter, isn’t HANA the name of the game now, and how can we correlate this result with HANA workloads?

Yes, and you can’t.  Let me answer the second question first.  SAP rules forbid comparisons across different benchmarks, and for good reason: they don’t share the same logic, application code, database usage, memory dependency or anything else for that matter.  But we will get to the impact on HANA a bit later in this post.

The SD benchmark is rather removed from reality: it is old, it depends on an outdated interface (the old and much loved SAP GUI, not), and it runs against old non-HANA databases.  Fun fact: of the results published since 2005, 96 used MaxDB or Sybase, 155 used IBM Db2, 52 used Oracle, 413 used Microsoft SQL Server and 0 used HANA.  And since application servers can easily scale across dozens of systems, performance per core doesn’t really matter all that much; the equation usually boils down to $/SAPS.

At Hot Chips 2020, Bill Starke, IBM Power Chief Architect, and Brian Thompto, Power10 core architect, revealed a bunch of amazing speeds and feeds, including 2.25x the per-socket memory bandwidth of Power10 vs. Power9.[ii]  We know that HANA eats memory bandwidth for breakfast, lunch, dinner and all snacks in between.  This new SD benchmark (and others that IBM will undoubtedly publish very soon) suggests that these new Power processors will be able to handle all workloads, including SAP HANA, with either fewer cores or the same number of cores and tons of CPU cycles to spare.

It might be tempting to consider a smaller Power10 system, but this is where the problem gets a bit sticky.  HANA not only loves memory bandwidth, but unless you are going to provision a server with less memory than SAP recommends, or use one of their tiered approaches, you still need the same quantity of memory regardless of server or microarchitecture.  You could certainly reduce the number of cores per socket or go with slower chip speeds, and this might be a very good approach for reducing HANA system costs for a lot of customers.  Another option is to use those spare cycles for something else; after all, HANA is supported by SAP for use with IBM PowerVM shared processor pools.

What other workloads might you use those cycles for?  We could get into a big discussion about all sorts of other workloads, like AI, HPC, etc., but let’s keep this simple: how about the thing that the SD benchmark actually does test, application serving?  Even with S/4HANA and Fiori, you still need application servers.  And if you have already purchased a server for HANA based on memory requirements and have a ton of cycles left over, the $/SAPS for those application servers essentially goes toward $0!  I have not priced an Intel server lately, but I am pretty certain that the price is not even remotely close to $0.

For existing SAP on Power customers (both HANA and non-HANA), Power10 is going to be amazing, resulting in better performance, lower cost or both!  For customers still trying to decide which type of system to use, I would strongly encourage a full landscape cost comparison, including production HANA and application servers, HA, non-production and DR.

And as good as this news is for on-premises customers, cloud vendors that offer HANA on Power, such as IBM, Syntax and SAP, should be even more excited about how Power10 lets them decrease their costs while offering better solutions to their customers.


[i] https://www.sap.com/dmc/exp/2018-benchmark-directory/#/sd

[ii] https://www.nextplatform.com/2020/08/18/ibm-brings-an-architecture-gun-to-a-chip-knife-fight/

September 1, 2021

Oracle publishes another SAP benchmark result with limited value

About a month ago, I posted a review of Oracle’s SAP ATO benchmark result and pointed out that ATO is so obscure and has so few results that, marketing value aside, their result was completely irrelevant.  About two weeks later, they published a result on another rarely used SAP benchmark, Sales and Distribution-Parallel.  While there have been a few more publications on this variety of SD than on the ATO benchmark, the number published in the two years prior to this one could be counted on one hand, all of them by Oracle/Sun.

This benchmark is not completely irrelevant, but without context and competitors it says very little about the performance of systems for SAP.  As the name implies, it requires the use of a parallel database.  While some customers have implemented SAP with a parallel database like Oracle RAC, these customers represent a very small minority, reportedly less than 1% of all SAP customers.  The reason has been discussed in my post on Exadata for SAP, so I will only summarize it here: SAP is RAC enabled, not RAC aware, meaning that tuning and scalability can be a real challenge and are not for the faint of heart.  While RAC is a good solution for very high availability, the benefit depends on how often you think you will avoid a 20-minute or so outage.  Some non-IBM customers predict their DB server will fail only once every 2 years, meaning RAC may help those customers avoid about 10 minutes of downtime per year.  Obviously, if the predicted failure rate is higher or the recovery time is longer, the benefits of RAC increase proportionately, and if failures occur less often, the value decreases.
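To make that trade-off concrete, here is a minimal sketch using the assumed numbers from the paragraph above (the failure rate and recovery time are illustrative assumptions, not measurements):

```python
# Expected downtime avoided per year by RAC, given an assumed DB server
# failure rate and an assumed recovery time without RAC.
outage_minutes_per_failure = 20   # assumed recovery time for a non-RAC restart/failover
failures_per_year = 1 / 2         # assumed: one DB server failure every 2 years

avoided_minutes_per_year = outage_minutes_per_failure * failures_per_year
print(f"Downtime avoided per year: {avoided_minutes_per_year:.0f} minutes")  # 10 minutes
```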

But that is not why this benchmark is of little value.  In reality, the SD benchmark is approximately 1/16 DB workload, the rest being app servers and the CI.  To put that in context, at 1/16 of the total, the DB portion of this result would be approximately 46,265 SAPS.  A 16-core Power 730 has a single-system result higher than that, as do just about all Westmere-EX systems.  In other words, for scalability purposes this workload simply does not justify a parallel database.  In addition, the SD benchmark requires that the app servers run on the same OS as the DB server, and since this is SD-Parallel, app servers must run on each node in the cluster.  This turns out to be perfect for benchmark optimization.  Each group of users assigned to an app server is uniquely associated with the DB node on the same server.  The data they use is also loaded into the local memory of that same server, so virtually no cross-talk, i.e. remote memory accesses, occurs.  These types of clustered results inevitably show near-linear scalability.  As most people know, near-linear scalability is not really achievable within an SMP, much less across a cluster, which means the high apparent scalability in this benchmark is mostly a work of fiction.
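Making the arithmetic explicit (a sketch that simply applies the rough 1/16 DB share assumed above):

```python
# Approximate DB-tier portion of the new SD-Parallel result, assuming the
# DB share of an SD workload is roughly 1/16 of the total.
total_saps = 740_250   # the Oracle/Sun 6-node SD-Parallel result discussed below
db_share = 1 / 16      # assumed DB fraction of the SD workload

db_saps = total_saps * db_share
print(f"Approximate DB workload: {int(db_saps):,} SAPS")  # 46,265 SAPS
```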

Before I am accused of hypocrisy, I should mention that IBM also published results on the SD-Parallel benchmark back in early 2008.  Back then, the largest single-image SD result was 30,000 users @ 152,530 SAPS, achieved by HP on the 128-core Superdome of that era.  While a large number, there were customers that already had larger SAP ERP instances than this.  So when IBM proved that it could achieve a higher number, 37,040 users @ 187,450 SAPS, with a 5-node cluster totaling only 80 cores, it was an interesting proof point, especially since we also published a 64-core single-image result of 35,400 users @ 177,950 SAPS on the Power 595 within a few days.  In other words, IBM did not try to prove that the only way to achieve high results was a cluster, but that a cluster could produce comparable results with a few more cores.  The published result was not a substitute for real, substantial single-image results, but an addition to them as a proof point of support for Oracle and Oracle RAC.  The last time Oracle or Sun provided a single-image SD result was way back in December 2009, almost ancient history in the computer world.

This new result, 134,080 users @ 740,250 SAPS on a cluster of 6 Sun Fire X4800 systems, each with 80 Intel Xeon cores, is a very high result, but it surpasses the previous high-water mark on any version of the SD benchmark by only 6% while requiring 87.5% more cores.  We can debate whether any customer would be willing to run a 6-node RAC cluster for OLTP.  We can also debate how many customers in the entire world have single-instance requirements anywhere close to this level.  A typical SAP customer might have 1,000 to 5,000 named users and far fewer concurrent users.  This benchmark does nothing to help inform those customers about the performance they could expect from Oracle systems.
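The percentages come straight from the published numbers; here is the same comparison as a quick sketch:

```python
# New 6-node SD-Parallel result vs. the previous high-water single-image SD result
new_users, new_cores = 134_080, 480   # Oracle/Sun, 6 x Sun Fire X4800
old_users, old_cores = 126_063, 256   # IBM Power 795 single-image result

print(f"User improvement: {new_users / old_users - 1:.1%}")   # ~6.4%
print(f"Additional cores: {new_cores / old_cores - 1:.1%}")   # 87.5%
```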

So this new parallel result demonstrates neither true parallel scalability nor single-system scalability, nor even relevance for small or even very large SAP workloads.  In other words, what value does it provide to evaluators of technology?  Nothing!  What value does it provide to Oracle?  Plenty!  Not only do they get to beat their chest about another “leadership” result, they get to imply that customers can actually achieve these sorts of results with this and various other untested and unproven configurations.  More importantly, if customers were actually to buy into RAC as the right answer for scalability, Oracle would get to harvest untold millions of dollars in license and maintenance revenue.  This configuration included 480 cores, meaning customers not utilizing an OEM license through SAP would have to pay 480 x 0.5 (core license factor) x ($47,500 (Oracle license cost) + $22,500 (Oracle RAC license cost)) = $16.8M at list for the Oracle licenses, plus another $18.5M for 5 years of maintenance, and this assumes no Oracle tools such as Partitioning, Advanced Compression, Diagnostic Pack, Tuning Pack, Change Management Pack or Active Data Guard.

For comparison, the largest single-image result, on the IBM Power 795 mentioned above, achieved just 6% fewer users with DB2 on a 256-core system.  A 256-core license of DB2 would cost a customer 256 x 120 PVUs x $405/PVU = $12.4M at list for the DB2 licenses, plus another $10.9M for 4 years of maintenance (the first year of maintenance is included with the warranty, whereas Oracle charges for the first year).  So the DB2 license would not be inexpensive, a total of $23.3M over 5 years, but that is quite a bit better than the roughly $35.3M for the Oracle licenses mentioned above.
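Here is a sketch of both license calculations side by side, using the list prices above and a 22% annual maintenance rate (the rate is my assumption, implied by the maintenance figures quoted; actual contract pricing varies):

```python
# Oracle EE + RAC licensing for 480 x86 cores at a 0.5 core license factor, list prices
oracle_license = 480 * 0.5 * (47_500 + 22_500)   # $16.8M
oracle_maint_5yr = oracle_license * 0.22 * 5     # ~$18.5M, assuming 22%/year for 5 years

# DB2 licensing for 256 Power cores at 120 PVUs per core, list price $405/PVU
db2_license = 256 * 120 * 405                    # ~$12.4M
db2_maint_4yr = db2_license * 0.22 * 4           # ~$10.9M, year 1 covered by warranty

print(f"Oracle 5-year total: ${(oracle_license + oracle_maint_5yr) / 1e6:.1f}M")  # ~$35.3M
print(f"DB2 5-year total:    ${(db2_license + db2_maint_4yr) / 1e6:.1f}M")        # ~$23.4M exact; ~$23.3M with rounded figures
```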

Full results are available at: http://www.sap.com/solutions/benchmark/index.epx

October 4, 2011

Oracle M9000 SAP ATO Benchmark analysis

SAP has a large collection of different benchmark suites.  Most people are familiar with the SAP Sales and Distribution (SD) 2-tier benchmark, as the vast majority of all results have been published using that suite.  A lesser known suite is called ATO, or Assemble-to-Order.  When the ATO benchmark was designed, it was intended to replace SD as a more “realistic” workload.  Because the benchmark is a little more complicated to run, and because SAP Quicksizer sizings are based on the SD workload, ATO never got much traction; from 1998 through 2003, only 19 results were published.  Prior to September 2, 2011, this benchmark seemed to have become extinct.  On that date, Oracle and Fujitsu published a 2-tier result for the SPARC M9000, along with the predictable claim of a world record.  Oracle should be commended for having beaten results published in 2003.  Of course, we might also want to consider that a 2-processor/12-core, 2U Intel based system of today has already surpassed the TPC-C result of the 64-core HP Itanium Superdome that “set the record” back in 2003, at a tiny fraction of the cost and floor space.

So we give Oracle a one-handed clap for this “accomplishment”.  But if I left it at that, you might question why I would even bother to post this blog entry.  Let’s delve a little deeper to find the story within the story.  First, let me remind the reader that these are my opinions; they in no way reflect the opinions of IBM, nor has IBM endorsed or reviewed them.

In 2003, Fujitsu-Siemens published a couple of ATO results using a predecessor of today’s SPARC64 VII chip, the SPARC64™ V at 1.35GHz, with SAP 4.6C.  The just published M9000 result used the SPARC64 VII at 3.0GHz with SAP EHP4 for SAP ERP 6.0 with Unicode.  If one were to divide each system’s result by its number of cores and compare them, one might find that the new result delivers only a very small increase in throughput per core, roughly 6%, over the old results.  Of course, this does not account for the changes in SAP software, Unicode or benchmark requirements.  SAP rules do not allow extrapolations, so I will instead provide you with the data from which to make your own calculations.  100 SAPS using SAP 4.6C is equal to about 55 SAPS using Business Suite 7 with Unicode.  If you were to multiply the old result by 55/100 and then divide by the number of cores, you could determine the effective throughput per core of the old system if it were running the current benchmark suite.  I can’t show you the result, but I will show you the formula you can use to determine it yourself at the end of this posting.

For comparison, I wanted to figure out how Oracle did on the SD 2-tier benchmark compared to systems back in 2003.  It turns out that almost identical systems were used in 2003 and in 2009, with the exception that the Sun M9000 used 2.8GHz processors, each with half the L2 cache of the 3.0GHz chips used in the ATO benchmark.  If you were to use a formula similar to the one described above, and then perhaps multiply by the difference in clock speed, i.e. 3.0/2.8, you could derive a similar per-core performance comparison of the new and old systems.  Prior to any extrapolations, the benchmark users per core actually decreased by roughly 10% between 2003 and 2009.

I also wanted to take a look at similar systems from IBM, then and now.  Fortunately, IBM published SD 2-tier results for the 8-core 1.45GHz pSeries 650 in 2003 and for the 256-core 4.0GHz Power 795 late last year, with SAP levels identical to those used by Fujitsu-Siemens and Sun respectively.  Using the same calculations as the SD and ATO comparisons above, IBM achieved 223% more benchmark users per core than it did in 2003, prior to any extrapolations.

Yes, there is no typo there.  While IBM’s results improved by 223% on a per-core basis, the Fujitsu processor based systems either improved by only about 6% or decreased by roughly 10%, depending on which benchmark you choose.  Interestingly enough, IBM had only a 9% per-core advantage over Fujitsu-Siemens in 2003, which grew to a 294% advantage in 2009/2010 based on the SD 2-tier benchmark.

It is remarkable that since November 18, 2009, Oracle (Sun) has not published a single SPARC based SAP SD benchmark result, while over 70 results have been published by a variety of vendors, including two by Sun for their Intel systems.  When Oracle finally decided to get back into the game to try to prove their relevance, despite a veritable flood of analyst and press suggestions to the contrary, rather than competing on the established and vibrant SD benchmark, they chose to stand on top of a small heap of dead carcasses and declare themselves better than the rotting husks upon which they stand.

For full disclosure, here are the actual results:

SD 2-tier Benchmark Results

Certification Date | System | Benchmark Users | SAPS | Cert #
1/16/2003 | IBM eServer pSeries 650, 8 cores | 1,220 | 6,130 | 2003002
3/11/2003 | Fujitsu Siemens Computers PrimePower 900, 8 cores | 1,120 | 5,620 | 2003009
3/11/2003 | Fujitsu Siemens Computers PrimePower 900, 16 cores | 2,200 | 11,080 | 2003010
11/18/2009 | Sun Microsystems M9000, 256 cores | 32,000 | 175,600 | 2009046
11/15/2010 | IBM Power 795, 256 cores | 126,063 | 688,630 | 2010046

ATO 2-tier results:

Certification Date | System | Fully Processed Assembly Orders/Hr | Cert #
3/11/2003 | Fujitsu Siemens Computers PrimePower 900, 8 cores | 6,220 | 2003011
3/11/2003 | Fujitsu Siemens Computers PrimePower 900, 16 cores | 12,170 | 2003012
9/2/2011 | Oracle M9000, 256 cores | 206,360 | 2011033

Formulas that you might use, assuming you agree with the assumptions:

Performance of old system / number of cores x 55/100 = effective performance per core on the new benchmark suite (EP)

(Performance of new system / cores) / EP = relative ratio of per-core performance of the new system compared to the old system

Improvement per core = relative ratio – 1

These formulas can be applied to both the SD and ATO results using the appropriate throughput measurements.
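For anyone who wants to apply them, here is a small Python sketch that simply encodes the formulas above (SAP rules prohibit publishing extrapolated results, so no outputs are shown; plug in whichever pair of results from the tables you like):

```python
# Encodes the formulas above: normalize an old SAP 4.6C-era per-core result to the
# newer benchmark suite, then compare a new result against it.
CONVERSION = 55 / 100   # ~55 SAPS on Business Suite 7 with Unicode per 100 SAPS on SAP 4.6C

def improvement_per_core(old_perf, old_cores, new_perf, new_cores, clock_adjust=1.0):
    ep = (old_perf / old_cores) * CONVERSION            # effective per-core performance on the new suite
    ratio = (new_perf / new_cores) / ep * clock_adjust  # relative ratio of new vs. old per-core performance
    return ratio - 1                                    # improvement per core (e.g. 0.06 = 6%)

# clock_adjust is the optional clock-speed adjustment suggested above (e.g. 3.0/2.8 for
# the SD comparison); leave it at 1.0 to skip that extrapolation.
```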

September 9, 2011