An ongoing discussion about SAP infrastructure

Is the SAP 2-tier benchmark a good predictor of database performance?

Answer: Not even close, especially for x86 systems.  Sizings for x86 systems based on the 2-tier benchmark can be as much as 50% smaller for database-only workloads than the 3-tier benchmark would predict.  Bottom line: I recommend that any database-only sizings for x86 systems or partitions be at least doubled to ensure that enough capacity is available for the workload.  At the same time, IBM Power Systems sizings are extremely conservative and have built-in allowances for real-world behavior versus hypothetical sizings based on the 2-tier benchmark.  What follows is a somewhat technical and detailed analysis, but this topic cannot, unfortunately, be boiled down into a simple set of assertions.


The details: The SAP Sales and Distribution (S&D) 2-tier benchmark is absolutely vital to SAP sizings, as workloads are measured in SAPS (SAP Application Performance Standard)[i], a unit of measurement based on the 2-tier benchmark.  The goal of this benchmark is to be hardware independent and useful for all types of workloads, but the reality is quite different.  The database server portion of the workload requires only 7% to 9% of the total capacity, with the remainder used by multiple instances of dialog/update servers and a message/enqueue server.  This contrasts with the real world, where the ratio of app to DB servers is closer to 4:1 for transactional systems and 2:1 or 1:1 for BW.  In other words, this benchmark is primarily an application server benchmark with a relatively small database server.  Even if a particular system or database software delivered 50% higher performance for the DB server than the 2-tier benchmark would predict, the 2-tier result would change by only about .07 * .5 = 3.5%.
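A quick sketch of how little a faster DB tier moves the 2-tier result. The first function reproduces the back-of-envelope estimate above (DB share * DB gain); the second swaps in Amdahl's law, which treats the DB share as the only part of each transaction that speeds up. Both are simplifications; the 7% to 9% shares and the 50% gain are the figures from the text, and the function names are illustrative.

```python
def overall_gain_linear(db_share: float, db_gain: float) -> float:
    """Back-of-envelope estimate used in the text: DB share x DB gain."""
    return db_share * db_gain


def overall_gain_amdahl(db_share: float, db_gain: float) -> float:
    """Amdahl's-law estimate: only the DB fraction of the work gets faster."""
    # Normalized CPU time per benchmark transaction after the DB speedup.
    new_time = (1.0 - db_share) + db_share / (1.0 + db_gain)
    return 1.0 / new_time - 1.0


for share in (0.07, 0.09):  # DB tier's share of 2-tier benchmark capacity
    print(f"DB share {share:.0%}: "
          f"linear {overall_gain_linear(share, 0.5):.1%}, "
          f"Amdahl {overall_gain_amdahl(share, 0.5):.1%}")
```

With a 7% DB share, the Amdahl estimate comes to about 2.4%, even smaller than the 3.5% back-of-envelope figure, so the point only gets stronger.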


How, then, is one supposed to size database servers when the SAP Quicksizer shows the capacity requirements in 2-tier SAPS?   A clue may be found by examining another, closely related SAP benchmark, the S&D 3-tier benchmark.  The workload used in this benchmark is identical to that used in the 2-tier benchmark; the difference is that in the 2-tier benchmark, all instances of DB and App servers must be located within one operating system (OS) image, whereas in the 3-tier benchmark, DB and App server instances may be distributed across multiple OS images and servers.  Unfortunately, the unit of measurement is still SAPS, and it represents the total SAPS handled by all servers working together.  Fortunately, 100% of the SAPS must be funneled through the database server, so this SAPS measurement, which I will call DB SAPS, represents the maximum capacity of the DB server.


Now we can compare different SAPS and DB SAPS results or sizing estimates for various systems to see how well 2-tier and 3-tier SAPS correlate with one another.  It turns out this is easier said than done, as there are precious few published 3-tier results compared to the hundreds of published 2-tier results.  But I would not be posting this blog entry if I had not found a way to accomplish this, would I?  I first wanted to find two systems that achieved similar results on the 3-tier benchmark.  Fortunately, HP and IBM both published results within a month of one another back in 2008, with HP hitting 170,200 DB SAPS[ii] on a 16-core x86 system and IBM hitting 161,520 DB SAPS[iii] on a 4-core Power system.


While the stars did not line up precisely, it turns out that both vendors had published 2-tier results just a few months earlier: HP achieved 17,550 SAPS[iv] on the same 16-core x86 system, and IBM achieved 10,180 SAPS[v] on a 4-core Power system with a slightly higher clock speed (4.7 GHz, about 12% faster than the 4.2 GHz system used in the 3-tier benchmark).


Notice that the HP 2-tier result is 72% higher than the IBM result, even though IBM used the faster processor.  Clearly, this lead would have been even larger had IBM published a result on the slower processor.  SAP benchmark rules do not allow vendors to project results from one processor speed to another, and even though I am posting this as an individual, not on behalf of IBM, I will err on the side of caution and give you only the formula, not the estimated result:  17,550 / (10,180 * 4.2 / 4.7) = the ratio of the published HP result to the projected result for the slower IBM processor.  At the same time, HP achieved only a 5.4% higher 3-tier result.  How does one go from almost twice the performance to essentially tied?  Easy answer: the IBM system was designed for database workloads, with a whole boatload of attributes that go almost unused in application server workloads, e.g. extremely high I/O throughput and advanced cache coherency mechanisms.
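The 72% and 5.4% figures can be checked in a few lines of Python using the published results from the footnotes. This sketch stops at the published numbers and does not compute the frequency-adjusted projection; the `lead` helper is illustrative, and the per-core line is simply each 3-tier result divided by its core count.

```python
def lead(a: float, b: float) -> float:
    """How much larger a is than b, as a fraction (0.72 means 72% higher)."""
    return a / b - 1.0


# 2-tier SAPS: HP BL680c G5 (16 cores) [iv], IBM p570 4.7 GHz (4 cores) [v]
hp_2tier, ibm_2tier = 17_550, 10_180
# 3-tier DB SAPS: HP BL680c G5 (16 cores) [ii], IBM p550 4.2 GHz (4 cores) [iii]
hp_3tier, ibm_3tier = 170_200, 161_520

print(f"HP lead on 2-tier: {lead(hp_2tier, ibm_2tier):.1%}")
print(f"HP lead on 3-tier: {lead(hp_3tier, ibm_3tier):.1%}")
print(f"3-tier DB SAPS per core: HP {hp_3tier / 16:,.0f}, IBM {ibm_3tier / 4:,.0f}")
```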


One might point out that Intel has really stepped up its game since 2008 with the introduction of the Nehalem and Westmere chips and has closed the gap somewhat with IBM’s Power Systems.  There is some truth in that, but let’s take a look at a more recent result.  In late 2011, HP published a 3-tier result of 175,320 DB SAPS[vi] on a 12-core BL460c G7.  A direct comparison of the old and new results shows that the new result delivered 3% more performance than the old one with 12 cores instead of 16, which works out to about 37% more performance per core.  Admittedly, this is not a completely fair comparison: the old benchmark used SAP ECC 6.0 with ASCII, while the new one used SAP ECC 6.0 EP4 with Unicode, which is estimated to be a 28% more resource-intensive workload, so in reality the new result is closer to 76% more performance per core.  By comparison, the DL380 G7[vii], a slightly faster but otherwise almost identical system to the BL460c G7, delivered 112% more SAPS/core on the 2-tier benchmark than the BL680c G5, and almost 171% more SAPS/core once the 28% factor mentioned above is taken into consideration.  Once again, one would need to adjust these numbers for the difference in clock speed, by scaling the DL380 G7’s SAPS/core by 3.06/3.33 to estimate the SAPS/core of the 3.06 GHz BL460c G7.
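The per-core comparisons in this paragraph can be reproduced from the footnoted results. This sketch assumes, as the text does, a flat 28% Unicode factor and a clock-speed adjustment that scales linearly from 3.33 GHz to 3.06 GHz; the `gain` helper and variable names are illustrative.

```python
UNICODE_FACTOR = 1.28  # ECC 6.0 EP4 Unicode vs. ECC 6.0 ASCII, per the text


def gain(new: float, old: float, factor: float = 1.0) -> float:
    """Fractional per-core improvement, optionally scaled by a workload factor."""
    return new * factor / old - 1.0


# 3-tier DB SAPS per core: BL680c G5 (2008) [ii] vs. BL460c G7 (2011) [vi]
old_3t = 170_200 / 16
new_3t = 175_320 / 12

# 2-tier SAPS per core: BL680c G5 [iv] vs. DL380 G7 [vii], raw and scaled to 3.06 GHz
old_2t = 17_550 / 16
new_2t_raw = 27_880 / 12
new_2t_mhz = new_2t_raw * 3.06 / 3.33

print(f"3-tier: {gain(new_3t, old_3t):.0%} raw, "
      f"{gain(new_3t, old_3t, UNICODE_FACTOR):.0%} with Unicode factor")
print(f"2-tier: {gain(new_2t_raw, old_2t):.0%} raw, "
      f"{gain(new_2t_raw, old_2t, UNICODE_FACTOR):.0%} with Unicode factor")
print(f"2-tier @ 3.06 GHz: {gain(new_2t_mhz, old_2t):.0%} raw, "
      f"{gain(new_2t_mhz, old_2t, UNICODE_FACTOR):.0%} with Unicode factor")
```

Scaled to 3.06 GHz, the 2-tier per-core gain comes to roughly 95% (149% with the Unicode factor), versus 37% (76%) on the 3-tier benchmark.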


After one does this math, one finds that the per-core improvement in 2-tier results was roughly two to two and a half times the improvement in 3-tier results (depending on whether the Unicode factor is applied), which further calls into question whether the 2-tier benchmark has any relevance to the database tier.  And there is one more complicating factor: how vendors interpret SAP Quicksizer output.  The Quicksizer conveniently breaks down the amount of workload required of both the DB and App tiers.  Unfortunately, experience shows that this breakdown does not hold up in practice, so vendors may adjust the ratios based on their own experience.  Some, such as IBM, have found that DB loads are significantly higher than the Quicksizer estimates and have made sure that this tier is sized higher.  Remember, while app servers can scale out horizontally, the DB server cannot unless a parallel DB is used, so making sure that you don’t run out of capacity is essential.  What happens when you compare the sizing from IBM to that of another vendor?  That is hard to say, since each can use whatever ratio it believes is correct.  If you don’t know what ratios the different vendors use, you may be comparing apples and oranges.
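When you do know each vendor’s DB share, one way to put proposals on the same footing is to back out the total SAPS each vendor assumed and re-split it at a single share of your choosing. The figures below are entirely hypothetical (not from any vendor or from the Quicksizer), as is the `normalize` helper; the sketch only illustrates how two DB sizings that differ by 2.5x can describe the same total workload.

```python
# Hypothetical proposals for illustration only: neither the SAPS figures nor
# the DB shares come from any real vendor or from the SAP Quicksizer.
proposals = {
    "vendor_a": {"db_saps": 20_000, "db_share": 0.10},
    "vendor_b": {"db_saps": 50_000, "db_share": 0.25},
}
COMMON_DB_SHARE = 0.25  # the share you choose to compare at (an assumption)


def normalize(db_saps: float, db_share: float, common_share: float) -> float:
    """Back out the total SAPS a vendor assumed, then re-split it at a common share."""
    total_saps = db_saps / db_share
    return total_saps * common_share


for name, p in proposals.items():
    print(name, round(normalize(p["db_saps"], p["db_share"], COMMON_DB_SHARE)))
```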


Great!  So what is a customer to do, now that I have completely destroyed any illusion that database sizing based on 2-tier SAPS is even remotely close to reality?


One option is to say, “I have no clue,” and simply add a fudge factor, perhaps 100%, to the database sizing.  One could not be faulted for such a decision, as there is no other simple answer.  But one also could not be certain that this sizing was correct.  For example, how does I/O throughput fit into the equation?  A system may be able to handle a certain amount of processing but be unable to feed data in at the rate necessary to sustain that processing.  Some virtualization managers, such as VMware, must transfer data first to the hypervisor and then to the partition, or in the other direction to the disk subsystem.  This adds latency and overhead that may be hard to estimate.


A better option is to start with IBM.  IBM Power Systems is the “gold standard” for SAP open systems database hosting.  A huge population of very large SAP customers, some of whom have decided to use x86 systems for the app tier, use Power for the DB tier.  This has allowed IBM to gain real-world experience in how to size DB systems, which has been incorporated into its sizing methodology.  As a result, customers should feel a great deal of trust in the sizing that IBM delivers, and once you have this sizing, you can work backwards to what an x86 system should require.  Then you can compare this to the sizing delivered by the x86 vendor and have a good discussion about why there are differences.  How do you work backwards?  A fine question, for which I will propose a methodology.


Ideally, IBM would have a 3-tier benchmark result for a current system from which you could extrapolate, but that is not the case.  Instead, you could extrapolate from the published result for the Power 550 (the 4-core Power system mentioned above) using IBM’s rperf, an IBM-internal estimate of relative performance for database-intensive environments that is also published externally.  The IBM Power Systems Performance Report[viii] includes rperf ratings for current and past systems.  If we multiply the size of the database system, as estimated by the IBM ERP sizer, by the ratio of per-core performance of the IBM and x86 systems, we should be able to estimate how much capacity is required on the x86 system.  For simplicity, we will assume the sizer has determined that the database requires 10 of the 16 cores of an IBM Power 740 at 3.55 GHz.  Here is the proposed formula:


Power 550 DB SAPS x 1/1.28 (old SAPS to new SAPS conversion) x rperf of 740 / rperf of 550

161,520 DB SAPS x 1/1.28 x 176.57 / 36.28 = estimated DB SAPS of 740 @ 16 cores

Then we can divide that number by the number of cores (16) to get a per-core DB SAPS estimate.  By the same token, you can divide the published HP BL460c G7 DB SAPS number by its number of cores (12).  Then:

Estimated Power 740 DB SAPS/core / Estimated BL460c G7 DB SAPS/core = ratio to apply to sizing
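The proposed method can be written out as a short script. The constants are the figures quoted above (the 1.28 conversion, the two rperf values, and the published 3-tier results); the constant names are illustrative, and the script prints only the resulting sizing ratio.

```python
OLD_TO_NEW_SAPS = 1.28   # old (ASCII) SAPS -> new (Unicode EP4) SAPS, per the text
P550_DB_SAPS = 161_520   # 3-tier result, Power 550, 4 cores [iii]
RPERF_740 = 176.57       # rperf, Power 740 3.55 GHz, 16 cores (figure used in the text)
RPERF_550 = 36.28        # rperf, Power 550 (figure used in the text)
P740_CORES = 16
BL460C_DB_SAPS = 175_320  # 3-tier result, BL460c G7 [vi]
BL460C_CORES = 12

# Step 1: project the Power 550 3-tier result onto a 16-core Power 740.
p740_db_saps = P550_DB_SAPS / OLD_TO_NEW_SAPS * RPERF_740 / RPERF_550

# Step 2: per-core DB SAPS for each system.
p740_per_core = p740_db_saps / P740_CORES
bl460c_per_core = BL460C_DB_SAPS / BL460C_CORES

# Step 3: the ratio to apply to an IBM core count to get x86 cores.
sizing_ratio = p740_per_core / bl460c_per_core
print(f"Sizing ratio: {sizing_ratio:.1f}")
```

Swapping in the rperf and core count of a different Power model, or the result for a different x86 system, reuses the same three steps.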

The result is a ratio of 2.6, i.e. if a workload requires 10 IBM Power 740 3.55 GHz cores, it would require 26 BL460c G7 cores.  This contrasts with the per-core estimated SAPS based on the 2-tier benchmark, which suggests that the Power 740 delivers just 1.4 times the performance per core.  In other words, a 2-tier based sizing would suggest that the x86 system requires just 14 cores, whereas the 3-tier comparison suggests it actually needs almost twice that.  This assumes that the I/O throughput is sufficient.  It also assumes that both systems have the same target utilization; in reality, x86 systems are usually sized for no more than 65% utilization, while Power Systems are routinely sized for up to 85% utilization.
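To make the core-count arithmetic concrete, here is a small sketch that converts the 10-core Power 740 sizing into x86 cores using the 1.4 (2-tier) and 2.6 (3-tier) per-core ratios. The last call also applies the 85% versus 65% target utilizations mentioned above; that adjustment, and the assumption that capacity scales linearly with core count, are illustrative simplifications rather than part of any vendor’s sizing method. The `x86_cores` function is hypothetical.

```python
import math


def x86_cores(power_cores: int, per_core_ratio: float,
              power_util: float = 1.0, x86_util: float = 1.0) -> int:
    """Scale a Power core count to x86 cores (hypothetical helper).

    per_core_ratio: Power DB capacity per core / x86 DB capacity per core.
    power_util, x86_util: target peak utilization each platform is sized to.
    Assumes capacity scales linearly with core count, which is a simplification.
    """
    raw = power_cores * per_core_ratio * power_util / x86_util
    return math.ceil(round(raw, 6))  # round() strips float noise before ceil


POWER_740_CORES = 10  # the example sizing from the text

print(x86_cores(POWER_740_CORES, 1.4))              # 2-tier based ratio
print(x86_cores(POWER_740_CORES, 2.6))              # 3-tier based ratio
print(x86_cores(POWER_740_CORES, 2.6, 0.85, 0.65))  # 3-tier, 85% vs. 65% targets
```

Under these assumptions, the utilization difference alone would push the 26-core x86 estimate to 34 cores.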


If this workload were to run under VMware, the number of vCPUs must also be considered, which is twice the number of cores; i.e. this workload would require 52 vCPUs, which exceeds the 32-vCPU limit of VMware 5.0.  Even when VMware can handle 64 vCPUs, the overhead of VMware and its ability to sustain the high I/O of such a workload must be included in any sizing.
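A minimal check of the vCPU arithmetic, assuming (as the text does) two vCPUs per core and using the 32- and 64-vCPU limits cited above. The helper names are hypothetical.

```python
VCPUS_PER_CORE = 2  # per the text: vCPU count is twice the number of cores


def vcpus_required(cores: int) -> int:
    """vCPUs needed for a given x86 core count (hypothetical helper)."""
    return cores * VCPUS_PER_CORE


def fits(cores: int, vcpu_limit: int) -> bool:
    """True if a single VM with this many cores stays within the vCPU limit."""
    return vcpus_required(cores) <= vcpu_limit


for limit in (32, 64):  # the two VMware limits cited in the text
    print(f"26 cores -> {vcpus_required(26)} vCPUs; "
          f"fits within {limit}? {fits(26, limit)}")
```

Passing the 64-vCPU check only means the VM can be configured; hypervisor overhead and I/O capacity still have to be sized separately.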


Of course, technology moves on, and HP has moved on to its Gen8 ProLiant servers based on newer Intel processors.  So you may have to adjust your estimate of the effective throughput of the x86 system based on its performance relative to the BL460c G7 above, but at least you now have a frame of reference for doing the appropriate calculations.  Clearly, we have shown that the 2-tier benchmark is an unreliable basis for sizing database-only systems or partitions and can easily be off by 100% for x86 systems.


[ii] 170,200 SAPS/34,000 users, HP ProLiant BL680c G5, 4 Processor/16 Core/16 Thread, E7340, 2.4 GHz, Windows Server 2008 Enterprise Edition, SQL Server 2008, Certification # 2008003

[iii] 161,520 SAPS/32,000 users, IBM System p 550, 2 Processor/4 Core/8 Thread, POWER6, 4.2 GHz, AIX 5.3, DB2 9.5, Certification # 2008001

[iv] 17,550 SAPS/3,500 users, HP ProLiant BL680c G5, 4 Processor/16 Core/16 Thread, E7340, 2.4 GHz, Windows Server 2008 Enterprise Edition, SQL Server 2008, Certification # 2007055

[v] 10,180 SAPS/2,035 users, IBM System p 570, 2 Processor/4 Core/8 Thread, POWER6, 4.7 GHz, AIX 6.1, Oracle 10g, Certification # 2007037

[vi] 175,320 SAPS/32,125 users, HP ProLiant BL460c G7, 2 Processor/12 Core/24 Thread, X5675, 3.06 GHz, Windows Server 2008 R2 Enterprise on VMware ESX 5.0, SQL Server 2008, Certification # 2011044

[vii] 27,880 SAPS/5,110 users, HP ProLiant DL380 G7, 2 Processor/12 Core/24 Thread, X5680, 3.33 GHz, Windows Server 2008 R2 Enterprise, SQL Server 2008, Certification # 2010031


July 30, 2012 | Uncategorized
