An ongoing discussion about SAP infrastructure

SAP Extends support for Business Suite 7 to 2027 and beyond … but the devil is always in the detail

The SAP world has been all abuzz since SAP’s announcement on February 4, 2020 about their extension of support for Business Suite 7 (BS7) which many people know as ECC 6.0 and/or related components.  According to the press release[i], customers with existing maintenance contracts will be able to continue using and getting support for BS7 through the end of 2027 and will be able to purchase extended maintenance support through the end of 2030 for an additional 2 points over their current contracts.

It is clear that SAP blinked first although, in an interview[ii], SAP positions this as a “proactive step”, not as a reaction to customer pushback.   Many tweets and articles have already come out talking about how customers now have breathing room and have clarity on support for BS7.  And if I just jumped on the bandwagon here, those of you who have been reading my blog for years would be sorely disappointed.

And now for the rest of the story

Most of you are aware that BS7 is the application suite which can use one of several 3rd party database options.  Historically, the most popular database platform for medium to large customers was Oracle DB, followed by IBM Db2.  BS7 can also run on HANA and in that context is considered Suite on HANA (SoH).

What was not mentioned in this latest announcement is the support for the underlying databases.  For this, one must access the respective SAP Notes for Oracle[iii] and Db2[iv].

This may come as a surprise to some, but if you are running Oracle, you only have until November of this year to move to Oracle 19c (or Oracle 18c, but that would seem pretty pointless as its support ends in June of 2021). But it gets much more fun, as 19c is only supported under normal maintenance until March, 2023 and under extended support until March, 2026. In theory, there might be another version or dot release supported beyond this time, but that is not detailed in any SAP Note. In the best-case scenario, Oracle 12 customers will have to upgrade to 19c and then later to an as yet unannounced later version, which may be more transition than many customers are willing to accept.

Likewise, for Db2 customers, 10.5, 11.1 and 11.5 are all supported through December, 2025.  The good news is that no upgrades are required through the end of 2025.

For both, however, what happens if later versions of either DB are not announced as being supported by SAP? Presumably, this means that a heterogeneous migration to Suite on HANA would be required. In other words, unless SAP provides clarity on the DB support picture, customers using either Oracle DB or IBM Db2 may be faced with an expensive, time-consuming and disruptive migration to SoH near the end of 2025. Most customers have expressed that they are unwilling to do back-to-back migrations, so if they are required to migrate to SoH in 2025 and then migrate to S/4HANA in 2027, that is simply too close for comfort.

Lacking any further clarification from SAP, it still seems as if it would be best to complete your conversion to S/4HANA by the end of 2025. Alternatively, you may want to ask SAP for a commitment to support your current and/or planned DB for BS7 through the end of 2027, then see how they respond and how much they will charge.


Posted February 6, 2020 | Uncategorized

Oracle Exadata for SAP revisited

Oracle’s Exadata, Exalogic and Exalytics systems have failed to take the market by storm, but that has not stopped Oracle from pushing them at every opportunity. Recently, an SAP customer started to investigate the potential of an Exadata system for a BW environment. I was called in to explain the issues surrounding such an implementation. A couple of disclaimers before I start: I am not an Oracle expert, nor have I placed hands on an Exadata system, so what I present here is the result of my effort to get educated on this topic. Thanks go to some brilliant people in IBM who are incredible Oracle and SAP experts and whose initials are R.B., M.C., R.K. and D.R.

My first question is: why would any customer implement BW on a non-strategic platform such as Exadata when BW HANA is available? It turns out there are some reasons, albeit a bit of a stretch. Some customers may feel that BW HANA is immature and lacks the ecosystem and robust tools necessary for production use today. This is somewhat valid and, in my experience, many customers wait a year or so after V1.0 of any product before considering it for production. That said, even prior to the GA of BW HANA, SAP reported that HANA sales were very strong, presumably for non-BW purposes. Some customers may be abandoning the V1.0 principle, which makes sense for the many HANA environments where there may be no other way, or only very limited ways, of accomplishing the task at hand, e.g. COPA. The jury is out on BW HANA, as there are valid and viable solutions today, including BW with conventional DBs and BWA. Another reason revolves around sweetheart deals in which Oracle gives discounts of 80% or more to get the first footprint in a customer’s door. Of course, sweetheart deals usually apply only to the first installation, rarely to upgrades or additional systems, which may result in an unpleasant surprise at that time. Oracle has also signed a number of ULAs (Unlimited License Agreements) with customers that include an Exadata as part of the agreement. Some IT departments have learned about this only when systems actually arrived on their loading docks, not always something they were prepared to deal with.

Beside the above, what are the primary obstacles to implementing Exadata?  Most of these considerations are not limited to SAP.  Let’s consider them one at a time.

Basic OS installation and maintenance. It turns out that, despite the system looking like a single system to the end user, it operates like two distinct clusters to the administrator and DBA. One is the RAC database cluster, which involves a minimum of two servers in a quarter rack of the “EP” nodes or full rack of “EX” nodes and up to 8 servers in a full rack of the “EP” nodes. Each node must have not only its own copy of Oracle Enterprise Linux, but also a copy of the Oracle database software, Oracle Grid Infrastructure (CRS + ASM) and any Oracle tools that are desired, a list which can be quite long. The second is the storage cluster, which involves a minimum of 3 storage servers for a quarter rack, 7 for a half rack and 14 for a full rack. Each of these nodes has its own copy of Oracle Enterprise Linux and Exadata Storage software. So, for a half rack of “EP” nodes, a customer would have 4 RAC nodes, 7 storage nodes and 3 Infiniband switches, each of which may require its own unique updates. I am told that the process for applying an update is complex, manual and typically sequential. Updates typically come out about once a month, sometimes more often. Most updates can be applied while the Exadata server is up, but storage nodes must be brought down, one at a time, to apply maintenance. When a storage node is taken down for maintenance, apparently its data may not be preserved, i.e. it must be wiped clean, which means that after a patchset is applied the data must be copied back from one of its ASM created copies.
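To make the maintenance burden concrete, here is a minimal Python sketch that simply counts the components in the half rack described above. The component counts come from the paragraph; the names `HALF_RACK`, `patch_targets` and `sequential_patch_order` are hypothetical and exist only for illustration. It is not an Oracle tool and does not model the real patching procedure.

```python
# Illustrative only: counts taken from the half rack of "EP" nodes described above.
HALF_RACK = {
    "RAC database node": 4,
    "storage server": 7,
    "Infiniband switch": 3,
}


def patch_targets(inventory):
    """Total number of components that may each need their own update."""
    return sum(inventory.values())


def sequential_patch_order(inventory):
    """List every component one at a time, as a sequential process would visit them."""
    return [
        f"{kind} {number}"
        for kind, count in inventory.items()
        for number in range(1, count + 1)
    ]


total = patch_targets(HALF_RACK)
order = sequential_patch_order(HALF_RACK)
print(total, "components to patch, starting with", order[0])
```

With monthly updates, every one of those components is a separate item on the administrator's list, which is the point of the paragraph above.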

The SAP Central Instance may be installed on an Exadata server, but if this is done, several issues must be considered.  One, the CI must be installed on every RAC node, individually.   The same for any updates.  When storage nodes are updated, the SAP/Exadata  best practices manual states that the CI must be tested after the storage nodes are updated, i.e. you have to bring down the CI and consequently must incur an outage of the SAP environment.

Effective vs. configured storage. Exadata offers no hardware RAID for storage, only ASM software based RAID10, i.e. it stripes the data across all available disks and mirrors those stripes to a minimum of one other storage server unless you are using SAP, in which case the best practices manual states that you must mirror across 3 storage servers in total. This offers effectively the same protection as RAID5 with a spare, i.e. if you lose a storage server, you can fail over access to the storage behind that storage server, which in turn is protected by a third server. But this comes at the cost of the effective amount of storage, which is 1/3 of the total installed. So, for every 100TB of installed disks, you only get 33TB of usable space, compared to RAID5 with a 6+1+1 configuration, which results in 75TB of usable space. Not only is the ASM triple copy a waste of space, but every spinning disk uses energy, creates heat which must be removed and increases the number of potential failures which must be dealt with.
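The arithmetic behind those two figures can be checked with a few lines of Python. This is only a sketch of the ratios quoted above; the function name `usable_tb` is made up for this example and the calculation ignores file system overhead, hot spares beyond the one named, and any compression.

```python
def usable_tb(installed_tb, data_units, total_units):
    """Usable capacity when only data_units out of total_units hold unique data."""
    return installed_tb * data_units / total_units


# ASM triple mirroring: one usable copy out of three copies written.
asm_triple_mirror = usable_tb(100, 1, 3)

# RAID5 6+1+1: six data disks out of eight (one parity, one spare).
raid5_6_1_1 = usable_tb(100, 6, 8)

print(f"ASM triple mirror: {asm_triple_mirror:.0f}TB usable of 100TB")
print(f"RAID5 6+1+1:       {raid5_6_1_1:.0f}TB usable of 100TB")
```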

Single points of failure. Each storage server has not one, but over a dozen single points of failure. The Infiniband controller, the disk controller and every single disk in the storage server (12 per storage server) represent single points of failure. Remember, data is striped across every disk, which means that if a disk is lost, the stripe cannot be repaired and another storage server must fulfill that request. No problem, as you usually have 1 or 2 other storage servers to which that data has been replicated. Well, big problem, in that the system is tuned to stripe data not just across the disks within a storage server, but across all available storage servers. In other words, while a single database request might access data behind a single storage server, complex or large requests will have data spread across all available storage servers. This is terrific in normal operations as it optimizes parallel read and write operations, but when a storage server fails and another picks up its duties, the one that picks up its duties now has twice the amount of storage it must manage, resulting in more contention for its disks, cache, Infiniband and disk controllers, i.e. the tuning for that node is pretty much wiped out until the failed storage node can be fixed.
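A rough Python sketch of the imbalance described above follows. It assumes, as the paragraph does, that a single surviving storage server takes over everything the failed one was serving; the server names, the 10TB figures and the function `load_after_failure` are invented for illustration, and the real redistribution behavior of ASM is not modeled here.

```python
def load_after_failure(loads_tb, failed, takeover):
    """Return per-server load after `takeover` absorbs the load of `failed`."""
    remaining = dict(loads_tb)
    remaining[takeover] += remaining.pop(failed)
    return remaining


# Hypothetical, evenly balanced quarter rack: three storage servers, 10TB each.
balanced = {"cell1": 10, "cell2": 10, "cell3": 10}
degraded = load_after_failure(balanced, failed="cell1", takeover="cell2")
print(degraded)
```

In this simplified picture one server ends up managing twice what its neighbor does, so any query striped across all servers now waits on the overloaded one.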

Smart scans, not always so smart. Like many other specialized data warehouse appliance solutions, including IBM’s Netezza, Exadata does some very clever things to speed up queries. For instance, Exadata uses a range “index” to describe the minimum and maximum values for each column in a table for a selected set of rows. In theory, this means that if a “where” clause requests data that is not contained in a certain set of rows, those rows will not be retrieved. Likewise, “Smart scan” will only retrieve the columns that are requested, not all columns in a table for the selected query. This sounds great, and several documents have explained when and why this works and does not work, so I will not try to do so in this document. Instead, I will point out the operational difficulties with this. The storage “index” is not a real index and works only with a “brute force” full table scan. It is not a substitute for an intelligent partitioning and indexing strategy. In fact, the term that Oracle uses is misleading, as it is not a database index at all. Likewise, smart scans are brute force full table scans and don’t work with indexes. This makes them useful for a small subset of queries that would normally do a full table scan. Neither of these is well suited for OLTP, as OLTP typically deals with individual rows and uses indexes to determine the row to be queried or updated. This means that these Exadata technologies are useful, primarily, for data warehouse environments. Some customers may not want
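The min/max idea described above can be sketched in a few lines of Python. This is a generic illustration of range-based pruning during a scan, not Oracle's implementation; the region layout, the column name `order_id` and the functions `build_summaries` and `pruned_scan` are all assumptions made for the example.

```python
def build_summaries(regions, column):
    """Record the (min, max) of one column for each set of rows."""
    return [
        (min(row[column] for row in rows), max(row[column] for row in rows))
        for rows in regions
    ]


def pruned_scan(regions, summaries, column, target):
    """Scan every region in order, skipping those whose range excludes target."""
    hits, regions_read = [], 0
    for rows, (low, high) in zip(regions, summaries):
        if target < low or target > high:
            continue  # the summary proves the value cannot be in this region
        regions_read += 1
        hits.extend(row for row in rows if row[column] == target)
    return hits, regions_read


# Hypothetical table split into three sets of rows.
regions = [
    [{"order_id": 1}, {"order_id": 5}],
    [{"order_id": 10}, {"order_id": 20}],
    [{"order_id": 21}, {"order_id": 30}],
]
summaries = build_summaries(regions, "order_id")
hits, regions_read = pruned_scan(regions, summaries, "order_id", 20)
print(hits, regions_read)
```

Note that the loop still walks the whole table region by region; the summaries only let it skip reading some regions. That is why this helps queries that would have done a full table scan anyway and does nothing for a single-row lookup that a conventional index already answers directly.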

So, let’s consider SAP BW.  Customers of SAP BW may have ad-hoc queries enabled where data is accessed in an unstructured and often poorly tuned way.  For these types of queries, smart scans may be very useful.  But those same customers may have dozens of reports and “canned” queries which are very specific about what they are designed to do and have dozens of well constructed indexes to enable fast access.  Those types of queries would see little or no benefit from smart scans.  Furthermore, SAP offers BWA and HANA that do an amazing job of delivering outstanding performance on ad-hoc queries.

Exadata also uses Hybrid Columnar Compression (HCC), which is quite effective at reducing the size of tables; Oracle claims about a 10 to 1 reduction. This works very well at reducing the amount of space required on disk and in the solid state disk caches, but at a price that some customers may be unaware of. One of the “costs” is that to enable HCC, processing must be done during construction of the table, meaning that importing data may take substantially longer. Another “cost” is the voids that are left when data is inserted or deleted. HCC works best for infrequent bulk load updates, e.g. remove the entire table and reload it with new data, not daily or more frequent inserts and deletes. In addition to the voids that it leaves, for each insert, update or delete, the “compression unit” (CU) must first be uncompressed and then recompressed, with the entire CU written out to disk, as the solid state caches are for reads only. This can be a time consuming process, once again making this technology unsuitable for OLTP, much less for DW/BW databases with regular update processes. HCC is unique to Exadata, which means that data which is backed up from an Exadata system may only be recovered to an Exadata system. That is fine if Exadata is the only type of system used, but not so good if a customer has a mixed environment with Exadata for production and, perhaps, conventional Oracle DB systems for other purposes, e.g. disaster recovery.

Speaking of backup, it is interesting to note that Oracle only supports their own Infiniband attached backup system.  The manuals state that other “light weight” backup agents are supported but apparently, third parties like Tivoli Storage Manager, EMC’s Legato Networker or Symantec Netbackup are not considered “light weight” and consequently, not supported.  Perhaps you use a typical split mirror or “flash” backup image that allows you to attach a static copy of the database to another system for backup purposes with minimal interruption to the production environment.  This sort of copy is often kept around for 24 hours in case of data corruption allowing for a very fast recovery.  Sorry, but not only can’t you use whatever storage standard that you may have in your enterprise since Exadata has its own internal storage, but you can’t use that sort of backup methodology either.  Same goes for DR, where you might use storage replication today.  Not an option and only Oracle DataGuard is supported for DR.

Assuming you are still unconvinced, there are a few other “minor” issues. SAP is not “RAC aware”, as has been covered in a previous blog posting. This means that Exadata performance is limited by two factors. First, a single RAC node represents the maximum possible capacity for a given transaction or query, since SAP issues no parallel queries. Second, for OLTP transactions such as those issued by ECC or CRM, unless the application server that is uniquely associated with a particular RAC node requests data that is hosted on that particular RAC node, the data will have to be transferred across the Infiniband network within the Exadata system at speeds that are 100,000 times slower than local memory accesses. Exadata supports no virtualization, meaning that you have to go back to a 1990s concept of separate systems for separate purposes. While some customers may get “sweetheart” deals on the purchase of their first Exadata system, unless customers are unprecedentedly brilliant negotiators, and better than Oracle at that, these “sweetheart” conditions are unlikely to last, meaning that upgrades may be much more expensive than the first expenditure. Next is the granularity. An Exadata system may be purchased in a ¼ rack, ½ rack or full rack configuration. While storage nodes may be increased separately from RAC nodes, these upgrades are also not very granular. I spoke with a customer recently that wanted to upgrade their system from 15 cores to 16 on an IBM server. As they had a Capacity on Demand server, this was no problem. Try adding a single core, 6.25% of the resulting 16, to an Exadata system when the minimum upgrade step is 100%!! And the next level of granularity is 100% on top of the first, assuming you went from ¼ to ½ to full rack.
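The granularity comparison above comes down to simple percentages, sketched here in Python. The helper `step_percent` is a made-up name; the inputs are the 15 to 16 core upgrade and the ¼, ½ and full rack sizes mentioned in the paragraph. One core added to 15 is an increase of roughly 6.7% over the starting capacity, which is the same single core expressed as 6.25% of the resulting 16.

```python
def step_percent(before, after):
    """Capacity increase of an upgrade, as a percentage of the starting capacity."""
    return (after - before) / before * 100


one_core = step_percent(15, 16)            # Capacity on Demand: add a single core
quarter_to_half = step_percent(0.25, 0.5)  # 1/4 rack to 1/2 rack
half_to_full = step_percent(0.5, 1.0)      # 1/2 rack to full rack

print(f"15 -> 16 cores:       +{one_core:.2f}%")
print(f"1/4 -> 1/2 rack:      +{quarter_to_half:.0f}%")
print(f"1/2 -> full rack:     +{half_to_full:.0f}%")
```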

Also consider best practices for High Availability.  Of course, we want redundancy among nodes, but we usually want to separate components as much as possible.  Many customers that I have worked with place each node in an HA cluster in separate parts of their datacenter complex, often in separate buildings on their campus or even geographic separation.  A single Exadata system, while offering plenty of internal redundancy, does not protect against the old “water line” break, fire in that part of the datacenter, or someone hitting the big red button.  Of course, you can add that by adding another ¼ or larger Exadata rack, but that comes with more storage that you may or may not need and a mountain of expensive software.    Remember, when you utilize conventional HA for Oracle, Oracle’s terms and conditions allow for your Oracle licenses to transfer, temporarily, to that backup server so that additional licenses are not required.   No such provision exists for Exadata.

How about test, dev, sandbox and QA? Well, either you create multiple separate clusters within each Exadata system, each with a minimum of 2 RAC nodes and 3 storage nodes, or you have to combine different purposes and share systems between environments that your internal best practices suggest should be separated. The result is that either multiple non-prod systems or larger systems with considerable excess capacity may be required. Costs, of course, go up proportionately or worse, may not be part of the original deal and may receive a different level of discounts. This compares to a virtualized Power Systems box which can host partitions for dev, test, QA and DR replication servers simultaneously and without the need for any incremental hardware, beyond memory perhaps. In the event of a disaster declaration, capacity is automatically shifted toward production, but dev, test and QA don’t have to be shut down unless the memory for those partitions is needed for production. Instead, those partitions simply get the “left over” cycles that production does not require.

Bottom line: Exadata is largely useful only for infrequently updated DW environments, not the typical SAP BW environment; provides acceleration for only a subset of typical queries; is not useful for OLTP like ECC and CRM; is inflexible, lacking virtualization and offering poor granularity; can be very costly once a proper HA environment is constructed; requires non-standard and potentially duplicative backup and DR environments; is a potential maintenance nightmare; and is not strategic to SAP.

I welcome comments and will update this posting if anyone points out any factual errors that can be verified.


I just found a blog that has a very detailed analysis of the financials surrounding Exadata.  It is interesting to note that the author came to similar conclusions as I did, albeit from a completely different perspective.

Posted May 17, 2012 | Uncategorized | 6 Comments

IBM PureSystems for SAP

On April 11, 2012, IBM announced a new family of converged architecture systems called PureSystems.  IBM does not use the term “converged architecture” in the materials released with this announcement, preferring the term “Expert Integrated Systems” due to the fact that it goes well beyond the traditional definition of converged infrastructure.   Other companies have offered converged infrastructure in one form or another for a few years now.  HP introduced this concept several years ago, but in my countless meetings with customers, I have yet to hear a single customer mention it.  They talk about HP blades frequently, but nothing about the converged solution.  Cisco UCS, on the other hand, is much more often mentioned in this context.  While Oracle might try to suggest that they offer a set of converged infrastructure solutions, I believe that would be a stretch as each of the Exa offerings stand on their own, each with their own management, storage and network framework.  The Exa solutions might better be described as special purpose appliances with a converged hardware/software stack.  Dell’s converged solution is basically a management layer on top of their existing systems.  This would be like IBM trying to suggest that IBM Systems Director is a converged infrastructure, which has never been the case as this would imply that IBM is ignorant.


IBM learned from the missteps and mistakes of our competitors and designed a solution that takes a leadership position.  Let’s take a short journey through this new set of offerings during which I will attempt to illustrate how it is superior to competitive offerings.  A more comprehensive look at PureSystems can be found at:


Converged infrastructures generally include servers, storage, networking, virtualization and management.  Efficient utilization of resources is at the cornerstone of the value proposition in that businesses can deliver significantly more application environments with fewer personnel, lower hardware costs, greater datacenter density and lower environmental costs.  Whether and how well companies deliver on these promises is where the story gets interesting.


Issue #1: open vs. closed. Some, such as Oracle’s Exa systems, are so closed that existing storage, servers, virtualization or software that you may have from a company other than Oracle, with rare exceptions, cannot be part of an Exa system. Others suggest openness, but rapidly become more closed as you take a closer look. Cisco UCS is open as long as you only want x86 systems, networking and SAN switches only from Cisco and virtualization only from VMware or Microsoft Hyper-V. VCE takes UCS further and limits choices by including only EMC V-Max or CLARiiON CX4-480 and VMware. By comparison, PureSystems are built on openness, starting with the choice of nodes (x86 and Power) and OSs (Microsoft Windows, Red Hat and SUSE Linux, AIX and IBM i). Supported virtualization offerings include VMware, Hyper-V, KVM and PowerVM. Storage can be almost anything that is supported by the IBM V7000, which includes most EMC, HDS, HP, NetApp and, of course, all IBM storage subsystems. Networking is built into each PureSystems chassis but supports network adapters from Broadcom, Emulex and Mellanox and Fibre Channel adapters from Qlogic, Emulex and Brocade, plus both QDR and FDR Infiniband adapters. Top of rack (TOR) switching can be provided by just about any network technology of your choosing. Management of the nodes, networking, storage and chassis is provided by IBM, but is designed to be compatible with IBM Systems Director, Tivoli and a variety of upstream managers.


Issue #2: management interface. IBM spent a great many person years developing a consistent, intuitive and integrated management environment for PureSystems. Among a wide variety of cross system management features, this new interface provides a global search feature allowing an administrator to quickly identify where a virtual resource is located in the physical world. Ask any administrator and you will find this is a lot more difficult than it sounds. Cisco does a great job of demonstrating UCS based on an impressive level of prep work. They show how easily images can be cloned and provisioned and this is indeed a significant accomplishment. The problem is that a significant amount of prep work is required. Likewise, when changes occur in the underlying environment, e.g. a new storage subsystem is attached or expanded, or a new network environment is added, a different set of management tools must be used, each with its own interface and some less intuitive than others. VCE offers a more consistent and intuitive interface, but at the cost of a very rigid set of components and software. For instance, “Vblocks”, the term for VCE systems, must be implemented in large chunks, not granularly based on customer demands; must be “approved” by VCE for SW or firmware updates, even emergency fixes for known security issues; and do not allow any sort of outside components at all.


Issue #3: the network is the computer. This is a bit tongue in cheek, as that was the slogan of the old Sun company; anyone remember them? Cisco’s architecture seems to be an echo of this old and outdated concept. PureSystems, as noted above, provides an integrated network but allows a wide variety of adapters and upstream devices. By choice, customers can directly integrate multiple chassis without the need for a top of rack switch until they want to communicate with external networks. For instance, should an SAP application server have to use a TOR switch to talk to an SAP DB server? Should a DB2 PureScale cluster have to use a TOR switch to talk among its nodes and to a central lock manager? Should an Oracle RAC cluster have to incur additional latency when communicating with its distributed lock manager? IBM believes the answer to all of the above is that it is up to the customer. If you want to use a TOR switch, you should, but that should be your choice, not a mandate. After all, IBM’s goal is to provide an excellent computing infrastructure, not sell switches. By comparison, Cisco’s architecture is dependent on very expensive 6100 and similar interconnects. In fact, Cisco even suggests that customers use VM-FEX technology, as they claim that it greatly simplifies network management. What some customers may not realize is that to use this technology, you must disable the virtual switch used by VMware. This switch allows different VMs on a single system to communicate at near memory speeds. Using VM-FEX, this switch is disabled and traffic, even between adjacent VMs, must travel via TOR switches, so instead of latencies measured in 100s of nanoseconds, communication can take several orders of magnitude longer.


For SAP, it is reasonable to ask whether a converged infrastructure solution is required. Clearly, the answer is no, as customers have been successfully implementing SAP on everything from single, massive, 2-tier virtualized servers to large arrays of 3-tier, small, non-virtualized systems and everything in between for many years now. There is nothing on the SAP roadmap that specifies or certifies such technology. But is there value in such a solution? The answer, obviously, is yes.


While consolidation of many different SAP instances on large, 2-tier virtualized systems offers tremendous value to customers, there are a variety of reasons why customers choose to use a landscape with multiple, smaller servers. Cost of acquisition is usually the biggest factor and is almost always lower when small servers are used. The desire to not have all eggs in one basket is another. Some customers prefer to keep production and non-production separate. Yet others are uncomfortable with the use of virtualization for some systems, e.g. Oracle database systems under VMware. This is not intended to be a comprehensive list, as there may be many other factors that influence the use of such an architecture.


If multiple systems are utilized, it is very easy to get into a situation in which their utilization is low, the number of systems is multiplying like rabbits, the cost of management is high, flexibility is low, the space required is ever increasing and power/cooling is a growing concern.  In this situation, a suitably flexible converged infrastructure solution may be the optimal solution to these problems.


PureSystems may be the best solution for many SAP customers. For existing Power Systems customers, it allows for a very smooth and completely binary compatible path to move into a converged architecture. Both 2 socket and 4 socket Power nodes are available, the p260 and p460. PureSystems feature an improved airflow design and a higher power capacity than IBM’s BladeCenter, which allows for nodes that can be outfitted with the latest processors running at their nominal frequency and with full memory complements. As a result, these new nodes feature performance that is very close to the standalone Power 740 and 750 systems respectively. With a very fast 10Gb/sec Ethernet backbone, these new nodes are ideal for virtualized application and non-production servers. The p460 offers additional redundancy and support of dual VIO servers, which makes it an excellent platform for all types of servers including database. One, or many, of these nodes can be used as part of an SAP landscape featuring existing rack mount servers and BladeCenter blades. Live Partition Mobility is supported between any of these nodes assuming compatible management devices, e.g. HMC, SDMC or PureFlex Manager.


Entire SAP landscapes can be hosted completely within one or more PureSystems chassis. Not only would such a configuration result in the most space efficient layout, but it would provide for optimized management and the fastest, lowest latency connections possible between app and DB servers and between various SAP components.


Some SAP customers feel that a hybrid approach, e.g. using Power Systems for database and x86 systems for application servers, is the right choice for them.  Once again, PureSystems delivers.  Power and x86 nodes may coexist in the same chassis, using the same management environment, the same V7000 virtualized data storage and, of course, the same network environment.  Clearly, the OSs, virtualization stacks and system characteristics are dependent on the underlying type of node, but regardless, they are all designed to work seamlessly together.


Yet other customers may prefer a 100% x86 landscape. This is also completely supported and offers benefits similar to those of the 100% Power environment described above, with the inherent advantages and disadvantages of each platform’s characteristics, which have been discussed at some length in my other blog postings.


There are many good blogs that have discussed PureSystems.  Here are but a few that you may wish to check out:


Posted May 8, 2012 | Uncategorized | 5 Comments

ERP Platform Selection – An analysis of Gartner Group’s presentation

Recently, a document from The Gartner Group found its way across my desk. This document was written by Philip Dawson and Donald Feinberg and was called: “Virtualizing SAP and Oracle: ERP Optimization”. The document seemingly has a single point: everything except x86 is dead, so all new implementations should go on x86 and almost everything else should be migrated to x86 when possible. Not only is this conclusion short sighted and oblivious to the requirements of large SAP implementations, but we should remember that this is the same company that proclaimed the mainframe dead about 20 years ago, a claim that is as incorrect today as it was back then.

First, the argument that Gartner did not make, which is strangely absent but which, I think, most customers have as their number one criterion: TCO. Actually, they did make it in a backhanded sort of way, but in support of legacy UNIX systems and only for tier 2 ERP instances. I have worked directly with hundreds of SAP and Oracle customers. While it is a delight to work with one without financial constraints, these are few and far between. Most customers must find ways to save money. A complex ERP landscape requires a wide array of systems/partitions, high availability, DR, backup/recovery, storage devices, etc. Companies like International Technology Group (ITG) found that when TCO is analyzed, landscapes based on IBM Power Systems are less expensive than x86. Solitaire Interglobal Ltd analyzed TCO and many other factors without limiting their analysis to ERP systems and came to a similar conclusion. I have put together many landscape comparisons for SAP and have also been able to show the same effect when considering the support that is required for mission critical environments, the cost of enterprise level virtualization technologies and the cost of high availability software, with all being supported for a minimum of 3 years with 24×7 by 4 hour maintenance.

All by itself, a consultant’s report which purports to help customers choose their ERP platforms but does not take this vital issue into consideration is questionable at best.  That said, allow me to take a few of the arguments they did make in the document and explain where each falls short.

Argument #1 – “ISVs focus new application functionality first on x86 Tier 1 ports. Largest installed base. – Invest!”  At least in the case of SAP, with the exception of HANA, new functionality is available on all Tier 1 platforms simultaneously, not first on x86.  AIX is considered a Tier 1 platform and shares in this “first” port.  Oracle understandably supports its OSs first, i.e. Linux and Solaris, but actually ports to AIX and other Tier 1 platforms during the same development process and simply limits other OS availability for marketing purposes.  Second point; Gartner, as a company, certainly has the experience that apparently these two, either naive or misinformed, individuals don’t, i.e. that number of installs does not equate to size of the install base.   A Fortune 100 company may have 1,000 or more times the number of SAP users and hundreds of thousands more employees compared to a small customer, but the number of installed copies of an ISV’s software may be exactly the same.  Install base is far more relevant when one considers the % of revenue/profit derived from a type of platform than number of CDs cut.  I know of a few food manufacturers that have thousands of retail customers, but sell the vast majority of their products through “just” Walmart and a few other mega retailers.  By the authors’ logic, they should abandon the mega retailers as the “largest install base” as a count of individual companies they deal with lies in the mom and pop grocery stores.

Argument #2 – “Unix/RISC & Itanium  Viable for mainstream applications and functions (but be prepared to accept delays in new features, functions and patches). Declining installed bases —Move to mainstream on system refresh!”  Some of this is certainly true for the Itanium environment due to Oracle’s explicit removal of Itanium support.  On the AIX side, however,  this statement could not be more incorrect.  Neither Oracle nor SAP has published any such guidance, made any public pronouncements or changed their level of investment with IBM and ongoing development and support efforts continue as before, i.e. AIX is a tier 1 platform and supported as such.  The install base for Power, by the way, is not declining and is benefiting from those customers moving away from HP/Itanium and Oracle/Solaris.

Argument #3 – “Move ERP application deployments toward Linux or Windows on x86. Everything else moving to niche.”  Their conclusion, as with the first argument, is based on market share numbers which are not published by SAP or Oracle and are unrelated to the value of installations.  SAP, for example, has a product availability matrix and nothing in that matrix suggests anything remotely close to niche status for AIX.

Argument #4 – “x86 performance and RAS features reach parity, drive Windows and Linux volumes further.”  If only there were a shred of proof for that, this might be an interesting conclusion.  The highest x86 result on the single system SAP SD 2-tier benchmark is 25,500 users/140,720 SAPS using the IBM x3850 x5 with 80 cores.  Not bad, but the highest Power Systems result is 126,063 users/688,630 SAPS using the Power 795 with 256 cores.  That is almost 5 times the size of the x86 result.  Perhaps they meant per core.  While Gartner published this last year and therefore did not have the benefit of the recent results published on the SAP benchmark web site, I will share them here: the best x86 result, for the IBM x3650 M4 with 2 E5-2690 processors/16 cores, is 7,855 users/42,880 SAPS, or 2,680 SAPS/core.  The Power 730, with 2 Power7 3.55GHz chips/16 cores, achieved 8,704 users/47,600 SAPS, or 2,975 SAPS/core.  While the x86 results are impressive, they do not achieve parity with Power unless Gartner has submitted a new definition to Webster’s Dictionary of parity being 90%, not 100%.
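The per-core comparison above is simple enough to check by hand, but a short script makes the arithmetic explicit. This is only a sketch using the SAPS and core counts quoted in this post; the dictionary layout and function name are illustrative choices, not anything from SAP's benchmark tooling.

```python
# Figures quoted in the paragraph above: system -> (SAPS, cores).
RESULTS = {
    "IBM x3650 M4 (2 x E5-2690)": (42_880, 16),
    "IBM Power 730 (2 x POWER7 3.55GHz)": (47_600, 16),
}


def saps_per_core(saps: int, cores: int) -> float:
    """Throughput per core for a single benchmark result."""
    return saps / cores


x86 = saps_per_core(*RESULTS["IBM x3650 M4 (2 x E5-2690)"])
power = saps_per_core(*RESULTS["IBM Power 730 (2 x POWER7 3.55GHz)"])

print(f"x86:   {x86:,.0f} SAPS/core")                    # 2,680
print(f"Power: {power:,.0f} SAPS/core")                  # 2,975
print(f"x86 as a share of Power: {x86 / power:.1%}")     # 90.1%

# Largest single-system results quoted above (total SAPS, not per core).
print(f"Power 795 vs x3850 x5: {688_630 / 140_720:.2f}x")  # 4.89x
```

On these numbers the x86 system delivers roughly 90% of the Power 730's per-core throughput, which is where the "parity being 90%" remark comes from.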

As to RAS, wow!, where do I start?  Well, RAS starts with error detection.  Since the mid 90’s, Power Systems have offered First Failure Data Capture (FFDC), previously only found on mainframe systems.  This technology is used pervasively throughout the system down to the processors and memory subsystems.  It is instrumented to detect soft and hard errors in virtually every part of the system and each chip such that the service processor can make intelligent decisions as to whether an error might cause a partition or system failure and then take appropriate actions to avoid or minimize such an event.  In the unlikely event of an unpredicted failure that can’t be contained, this technology allows the service processor to pinpoint the location of the failure such that the failing component can be blacklisted, the system or partition rebooted around the failing component or, in a worst case scenario, a FRU number for the failing component sent to IBM for immediate replacement.  Intel recently introduced Machine Check Architecture Recovery (MCA Recovery), which is less instrumented and less functional than FFDC was back in the mid 90’s.  Currently, it can pinpoint only memory errors, and it does not utilize a service processor but instead asks the operating system or virtualization manager to handle the error.  This is sort of like GM instrumenting their engines such that when they detect a problem such as poor fuel quality or low air temperature, the car does nothing or perhaps fails and suggests the user call a repair shop, instead of the system automatically detecting the problem and adjusting the fuel injection system to compensate.

Add to this features like dynamic processor deallocation, L2 and L3 cache line delete, and enhanced error detection and recovery for PCI adapters, to name just a few, for which there is no equivalent in x86 systems.  Or go one step further and look at Power Systems’ ability to retry an instruction on any processor in the event of a processor core error detection, or to pick up the actual thread and drop it on another processor in the system, with no interruption to application execution, if the core is determined to be failing.  No such thing exists in x86.  How about memory protection keys, which lock down memory such that, for instance, a failing device driver cannot corrupt the OS kernel’s memory and a misbehaving application cannot corrupt the file system’s memory?  No such thing exists in x86 land.  High end Power Systems even offer the ability to recover from a system clock failure through a redundant clock, as well as to add/remove a processor card while the system is running without any interruption to the running partitions such that a failing component can be replaced.  Once again, x86 systems do not offer these sorts of features.  So, it makes you wonder on which planet these Gartner guys reside when they make such statements!

Argument #5 – “Consider x86 virtualization for HA.” Where do I start with such an absurd statement?   First of all, Oracle has stated that if you run with VMware or other non-Oracle VM products and have a problem, you must prove that the problem was not caused by the virtualization software by recreating the problem on a standalone system.  If that was not enough to cause a customer to avoid placing a database under virtualization, then the high I/O overhead and increased latencies probably will.  That said, some customers might still put a DB under x86 virtualization, but then the issue of HA must be considered.   VMware HA does a great job of detecting a failed system or partition and recovering one or more on other systems, but VMware HA does not do system stack recovery.  For customers that run SAP, for instance, in addition to the database running, one must also ensure that the messaging and enqueue servers are running, that all network files that must be accessed are available, that all file systems are available and that if something is not, that retry or repair actions are initiated prior to restarting the entire partition.  VMware does provide an API by which third parties might write application APIs, as Symantec has done, but few proof points exist as to how well this works in production SAP environments.

Argument #6 – “All UNIX for Large Oracle DBMS deployments greater than 64 cores.”  So, if a Power platform running AIX is less expensive, more reliable and more secure, Gartner’s argument is that you should ignore these facts and, for DBMS instances smaller than 64 cores, place these on x86.  Interesting, and hard to argue with a statement that makes no logical sense.

In conclusion, Gartner has produced a document which suggests a system selection based on incorrect facts and assumptions and which ignores the criteria that are important to customers.  By comparison, there is a wealth of arguments that can be made as to why Power Systems continues to be the most robust, cost effective, reliable and secure platform for ERP and other applications.

March 23, 2012 Posted by | Uncategorized | 2 Comments

The top 3 things that SAP needs are memory, memory and I can’t remember the third. :-) A review of the IBM Power Systems announcements with a focus on the memory enhancements.

While this might not exactly be new news, it is worthwhile to consider the value of the latest Power Systems announcements for SAP workloads.  On October 12, 2011, IBM released a wide range of enhancements to the Power Systems family.  The ones that might have received the most publicity, not to mention new model numbers, were valuable but not the most important part of the announcement, from my point of view.  Yes, the new higher MHz Power 770 and 780 and the ability to order a 780 with 2 chips per socket thereby allowing the system to grow to 96 cores were certainly very welcome additions to the family.  Especially nice was that the 3.3 GHz processors in the new MMC model of the 770 came in at the same price as the 3.1 GHz processors in the previous MMB model.  So, 6.5% more performance at no additional cost.

For SAP, however, raw performance often plays second fiddle to memory.   The old rule is that for SAP workloads, we run out of memory long before we run out of CPU.   IBM started to address this issue in 2010 with the announcement of the Active Memory Expansion (AME) feature of POWER7 systems.  This feature allows for dynamic compression/decompression of memory pages, thereby making memory appear to be larger than it really is.   The administrator of a system can select the target “expansion” and the system will then build a “compressed” pool in memory into which pages are compressed and placed, starting from those pages less frequently accessed to those more frequently accessed.  As pages are touched, they are uncompressed and moved into the regular memory pool from which they are accessed normally.  Applications run unchanged as AIX performs all of the moves without any interaction or awareness required by the application.   The point at which response time or throughput starts to degrade, or a large amount of CPU overhead starts to occur, is the “knee of the curve”, i.e. slightly higher than the point at which the expansion should be set.  A tool, called AMEPAT, allows the administrator to “model” the workload prior to turning AME on, or for that matter on older hardware as long as the OS level is AIX 6.1 TL4 SP2 or later.

Some workloads will see more benefit than others.  For instance, during internal tests run by IBM, the 2-tier SD benchmark showed outstanding opportunities for compression and hit 111% expansion, e.g. 10GB of real memory appears to be 21GB to the application, before response time or throughput showed any negative effect from the compression/decompression activity.  During testing of a retail BW workload, 160% expansion was reached.  Even database workloads tend to benefit from AME.  DB2 databases, which already feature outstanding compression, have seen another 30% or 40% expansion.  The reason for this difference comes from the different approaches to compression.  In DB2, if 1,000 residences or businesses have an address on Main Street, Austin, Texas (had to pick a city so selected my own), DB2 replaces Main Street, Austin, Texas in each row with a pointer to another table that has a single row entitled Main Street, Austin, Texas.  AME, by comparison, is more of an inline compression, e.g. if it sees a repeating pattern, it can replace that pattern with a symbol that represents the pattern and how often it repeats.  Oracle recently announced that they would also support AME.  The amount of expansion with AME will likely vary from something close to DB2, if Oracle Advanced Compression is used, to significantly higher if Advanced Compression is not used since many more opportunities for compression will likely exist.
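To make the expansion percentages concrete, here is a minimal sketch of the arithmetic, assuming that "N% expansion" means the partition sees (1 + N/100) times its real memory, which is how the 10GB-to-21GB example above works out. The function name and the 10GB baseline are illustrative only.

```python
def effective_memory_gb(real_gb: float, expansion_pct: float) -> float:
    """Memory a partition appears to have at a given AME expansion percentage.

    Assumes 'N% expansion' means real memory is multiplied by (1 + N/100).
    """
    return real_gb * (1 + expansion_pct / 100)


# Expansion figures quoted in the paragraph above.
workloads = [
    ("SD 2-tier benchmark", 111),
    ("retail BW workload", 160),
    ("DB2 database, low end", 30),
    ("DB2 database, high end", 40),
]

for label, pct in workloads:
    print(f"{label}: 10 GB real -> {effective_memory_gb(10, pct):.1f} GB effective")
```

The same function can be turned around for sizing: a workload that needs 21GB would, at 111% expansion, need only about 10GB of real memory, which is the cost argument made in the next paragraph.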

So, AME can help SAP workloads close the capacity gap between memory and CPU.  Another way to view this is that this technology can decrease the cost of Power Systems by either allowing customers to purchase less memory or to place more workloads on the same system, thereby driving up utilization and decreasing the cost per workload.  It is worthwhile to note that many x86 systems have also tried to address this gap, but as none offer anything even remotely close to AME, they have instead resorted to more DIMM slots.  While this is a good solution, it should be noted that twice the number of DIMMs requires twice the amount of power and cooling and suffers from twice the failures, i.e. TANSTAAFL: there ain’t no such thing as a free lunch.

In the latest announcements, IBM introduced support for the new 32GB DIMMs.  This effectively doubled the maximum memory on most models, from the 710 through the 795.  Combined with AME, this decreases or eliminates the gap between memory capacity and CPU and makes these models even more cost effective since more workloads can share the same hardware.  Two other systems received similar enhancements recently, but these were not part of the formal announcement.  The two latest blades in the Power Systems portfolio, the PS703 and the PS704, were announced earlier this year with twice the number of cores but the same memory as the PS701 and PS702 respectively.  Now, using 16GB DIMMs, the PS703/PS704 can support up to 256GB/512GB of memory, making these blades very respectable, especially for application server workloads.  Add to that, with the Systems Director Management Console (SDMC), AME can be implemented for blades, allowing for even more effective memory per blade.   Combined, these enhancements have closed the price difference even further compared to similar x86 blades.

One last memory related announcement may have been largely overlooked by many because it involved an enhancement to the Active Memory Sharing (AMS) feature of PowerVM.  AMS has historically been a technology that allowed for overcommitment of memory.  While CPU overcommitment is now routine, memory overcommitment means that some % of memory pages will have to be paged out to solid state or other types of disk.  The performance penalty is well understood making this not appropriate for production workloads but potentially beneficial for many other non-prod, HA or DR workloads.  That said, few SAP customers have implemented this technology due to the complexity and performance variability that can result.  The new announcement introduces Active Memory™ Deduplication for AMS implementations.   Using this new technology, PowerVM will scan partitions after they finish booting and locate  identical pages within and across all partitions on the system.  When identical pages are detected, all copies, except one, will be removed and all memory references will point to the same “first copy” of the page.   Since PowerVM is doing this, even the OSs can be unaware of this action.  Instead, as this post processing proceeds, the PowerVM free memory counter will increase until a steady state has been reached.  Once enough memory is freed up in this manner, new partitions may be started.  It is quite easy to imagine that a large number of pages are duplicates, e.g. each instance of an OS has many read only pages which are identical and multiple instances of an application, e.g. SAP app servers, will likewise have executable pages which are identical.  The expectation is that another 30% to 40% effective memory expansion will occur for many workloads using this new technology.  
One caveat however; since the scan is after a partition boots, operationally it will be important to have a phased booting schedule to allow for the dedupe process to free up pages prior to starting more partitions thereby avoiding the possibility of paging.  Early testing suggests that the dedupe process should arrive at a steady state approximately 20 minutes after partitions are booted.
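The idea of collapsing identical pages onto a single shared copy can be illustrated with a toy model. This is not how PowerVM implements it; it is a sketch that assumes a 4KB page size and uses a SHA-256 digest as a stand-in for a real byte-for-byte comparison. The partition names and page contents are made up.

```python
import hashlib

PAGE_SIZE = 4096  # assumed page size, for illustration only


def deduplicate(partitions: dict[str, list[bytes]]):
    """Return (shared page store, per-partition page references, pages freed).

    Each distinct page content is stored once; every partition keeps only a
    reference (the digest) to that single "first copy".
    """
    store: dict[str, bytes] = {}
    refs: dict[str, list[str]] = {}
    total_pages = 0
    for name, pages in partitions.items():
        refs[name] = []
        for page in pages:
            total_pages += 1
            digest = hashlib.sha256(page).hexdigest()
            store.setdefault(digest, page)  # keep only the first copy seen
            refs[name].append(digest)
    return store, refs, total_pages - len(store)


# Hypothetical partitions: identical OS and app-server pages plus one
# page of private data each.
os_page = b"\x01" * PAGE_SIZE
app_page = b"\x02" * PAGE_SIZE
partitions = {
    "lpar1": [os_page, app_page, b"\x0a" * PAGE_SIZE],
    "lpar2": [os_page, app_page, b"\x0b" * PAGE_SIZE],
    "lpar3": [os_page, app_page, b"\x0c" * PAGE_SIZE],
}

store, refs, freed = deduplicate(partitions)
print(f"{freed} of 9 pages freed, {len(store)} unique pages kept")
```

In this toy case 4 of 9 pages are freed. The phased-boot caveat above maps onto the model directly: the savings only exist after the scan has run, so memory for a newly started partition should not be counted on until the free memory counter has settled.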

The bottom line is that with the larger DIMMs, AME and AMS Memory Deduplication, IBM Power Systems are in a great position to allow customers to fully exploit the CPU power of these systems by combining even more workloads together on fewer servers.  This will effectively drive down the TCA for customers and remove what little difference there might be between Power Systems and systems from various x86 vendors.

November 29, 2011 Posted by | Uncategorized | 4 Comments

Oracle publishes another SAP benchmark result with limited value

About a month ago, I posted a review of Oracle’s SAP ATO benchmark result and pointed out that ATO is so obscure and has so few results that, other than marketing value, their result was completely irrelevant.  About two weeks later, they published a result on another rarely used SAP benchmark, Sales and Distribution-Parallel.  While there have been a few more results published on this variety of SD than on the ATO benchmark, the number of results published prior to this one over the past two years could be counted on one hand, all by Oracle/Sun.

This benchmark is not completely irrelevant, but without context and competitors, it says very little about the performance of systems for SAP.  As the name implies, it requires the use of a parallel database.  While some customers have implemented SAP with a parallel database like Oracle RAC, these customers represent a very small minority, reportedly less than 1% of all SAP customers.  The reason has been discussed in my post on Exadata for SAP, so I will only summarize it here.  SAP is RAC enabled, not RAC aware, meaning that tuning and scalability can be a real challenge and not for the faint of heart.  While RAC is a good solution for very high availability, the benefit depends on how often you think that you will avoid a 20 minute or so outage.   Some non-IBM customers predict their DB server will only fail every 2 years, meaning RAC may help avoid 10 minutes of downtime per year for those customers.  Obviously, if the predicted failure rate is higher or the time for recovery is longer, the benefits of RAC can increase proportionately, and if failures occur less often, then the value decreases.

But that is not why this benchmark is of little value.  In reality, the SD benchmark is approximately 1/16 DB workload, the rest being app servers and CI.  To put that in context, for this benchmark, at 1/16 of the total, the DB workload would be approximately 46,265 SAPS.  A 16-core Power 730 has a single system result higher than that as do just about all Westmere-EX systems.  In other words, for scalability purposes, this workload simply does not justify the need for a parallel database.  In addition, the SD benchmark requires that the app servers run on the same OS as the DB server, but since this is SD-Parallel, app servers must run on each node in the cluster.  This turns out to be perfect for benchmark optimization.   Each group of users assigned to an app server is uniquely associated with the DB node on the same server.  The data that they utilize is also loaded into the local memory of that same server and virtually no cross–talk, i.e. remote memory accesses, occurs.  These types of clustered results inevitably show near-linear scalability.  As most people know, near-linear scalability is not really possible within an SMP much less across a cluster.  This means that high apparent scalability in this benchmark is mostly a work of fiction.

Before I am accused of hypocrisy, I should mention that IBM also published results on the SD-parallel benchmark back in early 2008.  Back then, the largest single image SD result achieved 30,000 users @ 152,530 SAPS by HP on the 128 core Superdome of that era.  While a large number, there were customers that already had larger SAP ERP instances than this.  So, when IBM proved that it could achieve a higher number, 37,040 users @ 187,450 SAPS with a 5-node cluster with a total of only 80 cores, this was an interesting proof point especially since we also published a 64-core single image result of 35,400 users @ 177,950 SAPS using the Power 595 within a few days.  In other words, IBM did not try to prove that the only way to achieve high results was using a cluster, but that a cluster could produce comparable results with a few more cores.  In other words, the published result was not a substitute for providing real, substantial results, but in addition to those as a proof of support of Oracle and Oracle RAC.   The last time that Oracle or Sun provided a single image SD result was way back in December, 2009, almost ancient history in the computer world.

This new result, 134,080 users @ 740,250 SAPS on a cluster of 6 Sun Fire x4800 systems, each with 80 Intel Xeon cores is a very high result, but only surpasses the previous high water result on any version of the SD benchmark by 6% while requiring 87.5% more cores.  We can debate whether any customer would be willing to run a 6-node RAC cluster for OLTP.  We can also debate how many customers … in the entire world, have single instance requirements anywhere close to this level.  A typical SAP customer might have 1,000 to 5,000 named users but far fewer concurrent users.  This benchmark does nothing to help inform those customers about the performance they could expect using Oracle systems.

So, this new parallel result neither demonstrates true parallel scalability nor single system scalability or even relevance for small to even very large SAP workloads.  In other words, what value does it provide to the evaluators of technology?   Nothing!  What value does it provide to Oracle?  Plenty!  Not only do they get to beat their chest about another “leadership” result, but they get to imply that customers can actually achieve these sorts of results with this and various other untested and unproven configurations.  More importantly, if customers were actually to buy into RAC as being the right answer for scalability, Oracle would get to harvest untold millions of dollars in license and maintenance revenues.  This configuration included 480 cores meaning customers not utilizing an OEM license through SAP, would have to pay, 480 x .5 (core license factor) x ($47,500 (Oracle license cost) + $22,500 (Oracle RAC license cost)) = $16.8M @ list for the Oracle licenses and another $18.5M for 5 years of maintenance and this is assuming no Oracle tools such as Partitioning, Advanced Compression, Diagnostic Pack, Tuning Pack, Change Management Pack or Active Data Guard.

For comparison, the largest single image system result for IBM Power 795, mentioned above, achieved just 6% fewer users with DB2 on a 256 core system.  A 256-core license of DB2 would cost a customer 256 x (120 PVU) x ($405/PVU) = $12.4M @ list for the DB2 licenses and another $10.9M for 4 years of maintenance (first year of maintenance is included as warranty, as opposed to Oracle, which charges for the first year of maintenance).  So, the DB2 license would not be inexpensive, a total of $23.3M over 5 years, but that is quite a bit better than the $35.3M for the Oracle licenses and maintenance mentioned above.
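The two license calculations can be laid side by side in a few lines. This is a sketch that uses only the list prices, core factor, PVU rating and maintenance totals quoted in this post; the maintenance figures are taken as given rather than recomputed, and the function names are illustrative.

```python
def oracle_license(cores: int, core_factor: float, db_price: float, rac_price: float) -> float:
    """List license cost: cores x core factor x (database + RAC price)."""
    return cores * core_factor * (db_price + rac_price)


def db2_license(cores: int, pvu_per_core: int, price_per_pvu: float) -> float:
    """List license cost: cores x PVUs per core x price per PVU."""
    return cores * pvu_per_core * price_per_pvu


oracle = oracle_license(cores=480, core_factor=0.5, db_price=47_500, rac_price=22_500)
db2 = db2_license(cores=256, pvu_per_core=120, price_per_pvu=405)

# Maintenance totals as quoted in the post (not derived here).
ORACLE_MAINTENANCE_5Y = 18.5e6
DB2_MAINTENANCE_4Y = 10.9e6

print(f"Oracle license: ${oracle / 1e6:.1f}M, 5-year total: ${(oracle + ORACLE_MAINTENANCE_5Y) / 1e6:.1f}M")
print(f"DB2 license:    ${db2 / 1e6:.1f}M, 5-year total: ${(db2 + DB2_MAINTENANCE_4Y) / 1e6:.1f}M")
```

Changing the core count or leaving out the RAC price shows how quickly the gap moves, which is the point of the comparison: the cluster needs 480 licensed cores where the single system needs 256.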

Full results are available at:

October 4, 2011 Posted by | Uncategorized

Oracle M9000 SAP ATO Benchmark analysis

SAP has a large collection of different benchmark suites.  Most people are familiar with the SAP Sales and Distribution (SD) 2-tier benchmark as the vast majority of all results have been published using this benchmark suite.   A lesser known benchmark suite is called ATO or Assemble-to-Order.  When the ATO benchmark was designed, it was intended to replace SD as a more “realistic” workload. As the benchmark is a little more complicated to run and SAP Quicksizer sizings are based on the SD workload, the ATO benchmark never got much traction and, from 1998 through 2003, only 19 results were published.  Prior to September 2, 2011, this benchmark had seemed to become extinct.  On that date, Oracle and Fujitsu published a 2-tier result for the SPARC M9000 along with the predictable claim of a world record result.  Oracle should be commended for having beaten the results published in 2003.  Of course, we might want to consider that a 2-processor/12-core, 2U Intel based system of today has already surpassed the TPC-C results of a 64-core HP Itanium Superdome that “set the record” back in 2003, at a tiny fraction of the cost and floor space.


So we give Oracle a one-handed clap for this “accomplishment”.  But if I left it at that, you might question why I would even bother to post this blog entry.  Let’s delve a little deeper to find the story within the story.  First let me remind the reader, these are my opinions and in no way do they reflect the opinions of IBM nor has IBM endorsed or reviewed my opinions.


In 2003, Fujitsu-Siemens published a couple of ATO results using a predecessor of today’s SPARC64 VII chip called SPARC64™ V at 1.35GHz and SAP 4.6C.  The just published M9000 result used the SPARC64 VII at 3.0GHz and SAP EP4 for SAP ERP 6.0 with Unicode.  If one were to divide the results achieved by both systems by the number of cores and compare them, one might find that the new results deliver only a very small increase in throughput per core, roughly 6%, over the old results.  Of course, this does not account for the changes in SAP software, Unicode or benchmark requirements.   SAP rules do not allow for extrapolations, so I will instead provide you with the data from which to make your own calculations.  100 SAPS using SAP 4.6C is equal to about 55 SAPS using Business Suite 7 with Unicode.   If you were to multiply the old result by 55/100 and then divide by the number of cores, you could determine the effective throughput per core of the old system if it were running the current benchmark suite.  I can’t show you the result, but will show you the formula that you can use to determine this result yourself at the end of this posting.


For comparison, I wanted to figure out how Oracle did on the SD 2-tier benchmark compared to systems back in 2003.  Turns out that almost identical systems were used both in 2003 and in 2009 with the exception of the Sun M9000 which used 2.8GHz processors each of which had half of the L2 cache of the 3.0GHz system used in the ATO benchmark.  If you were to use a similar formula to the one described above and then perhaps multiply by the difference in MHz, i.e. 3.0/2.8 you could derive a similar per core performance comparison of the new and old systems.  Prior to performing any extrapolations, the benchmark users per core actually decreased between 2003 and 2009 by roughly 10%.


I also wanted to take a look at similar systems from IBM then and now.  Fortunately, IBM published SD 2-tier results for the 8-core 1.45GHz pSeries 650 in 2003 and for the 256-core 4.0GHz Power 795 late last year, with the SAP levels being identical to the ones used by Sun and Fujitsu-Siemens respectively.  Using the same calculations as were done for the SD and ATO comparisons above, IBM achieved 223% more benchmark users per core than they achieved in 2003 prior to any extrapolations.


Yes, there was no typo there.  While the results by IBM improved by 223% on a per core basis, the Fujitsu processor based systems either improved by only about 6% or decreased by 10% depending on which benchmark you chose.  Interestingly enough, IBM had only a 9% per core advantage over Fujitsu-Siemens in 2003, which increased to a 294% advantage in 2009/2010 based on the SD 2-tier benchmark.


It is remarkable that since November 18, 2009, Oracle (Sun) has not published a single SPARC based SAP SD benchmark result while over 70 results were published by a variety of vendors, including two by Sun for their Intel systems.  When Oracle finally decided to get back into the game to try to prove their relevance despite a veritable flood of analyst and press suggestions to the contrary, rather than competing on the established and vibrant SD benchmark, they chose to stand on top of a small heap of dead carcasses to say they are better than the rotting husks upon which they stand.


For full disclosure, here are the actual results:

SD 2-tier Benchmark Results

Certification Date        System                                                                                # Benchmark Users               SAPS                Cert #

1/16/2003                 IBM eServer pSeries 650, 8-cores                                                  1,220                            6,130               2003002

3/11/2003                 Fujitsu Siemens Computers, PrimePower 900,  8-cores                     1,120                            5,620               2003009

3/11/2003                 Fujitsu Siemens Computers, PrimePower 900, 16-cores                    2,200                            11,080              2003010

11/18/2009               Sun Microsystems, M9000, 256-cores                                             32,000                          175,600            2009046

11/15/2010               IBM Power 795, 256-cores                                                           126,063                        688,630            2010046


ATO 2-tier results:

Certification Date        System                                                                     Fully Processed Assembly Orders/Hr            Cert #

3/11/2003                 Fujitsu Siemens Computers, PrimePower 900,  8-cores                6,220                                        2003011

03/11/2003               Fujitsu Siemens Computers, PrimePower 900, 16-cores               12,170                                       2003012

09/02/2011               Oracle M9000, 256-cores                                                        206,360                                     2011033


Formulas that you might use assuming you agree with the assumptions:


Performance of old system / number of cores * 55/100 = effective performance per core on new benchmark suite (EP)


(Performance of new system / cores ) / EP = relative ratio of performance per core of new system compared to old system


Improvement per core = relative ratio – 1


This can be applied to both the SD and ATO results using the appropriate throughput measurements.
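For readers who would rather not do this by hand, the formulas translate into a few small functions. This is a sketch: the function names are illustrative, the 55/100 conversion is the figure given earlier in this post, and improvement is expressed as the relative ratio minus one so that a positive number means the new system does more per core. The usage example deliberately sets the conversion to 1.0, i.e. no extrapolation, and uses the IBM SD 2-tier rows from the table above.

```python
# 100 SAPS on SAP 4.6C is about 55 SAPS on Business Suite 7 with Unicode
# (conversion figure quoted earlier in this post).
CONVERSION = 55 / 100


def effective_per_core(old_result: float, old_cores: int, conversion: float = CONVERSION) -> float:
    """Old system's per-core throughput restated for the new benchmark suite (EP)."""
    return old_result / old_cores * conversion


def relative_ratio(new_result: float, new_cores: int,
                   old_result: float, old_cores: int,
                   conversion: float = CONVERSION) -> float:
    """Per-core throughput of the new system relative to the old one."""
    return (new_result / new_cores) / effective_per_core(old_result, old_cores, conversion)


def improvement(ratio: float) -> float:
    """Per-core improvement; positive means the new system is faster per core."""
    return ratio - 1


# Raw comparison with no extrapolation (conversion=1.0), SD 2-tier users:
# IBM Power 795 (126,063 users, 256 cores) vs pSeries 650 (1,220 users, 8 cores).
raw = relative_ratio(126_063, 256, 1_220, 8, conversion=1.0)
print(f"IBM per-core change, 2003 to 2010, before extrapolation: {improvement(raw):.0%}")
```

Run with the default conversion and the ATO or SD rows of your choice, the same functions give the extrapolated comparison described above.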

September 9, 2011 Posted by | Uncategorized

HANA – Implications for UNIX systems

Ever since SAP announced HANA, I have received the occasional question about what this product means to UNIX systems, but the pace of those questions picked up significantly after Sapphire. Let me address the three phases of SAP in-memory database computing as I understand them.

HANA, High performance ANalytical Appliance, is the first in-memory database application. According to SAP, with very little effort, a company can extract large sets of data from their current SAP and non-SAP systems and, in near real time, keep that data extract up to date, at least for SAP systems. The data is placed into in-memory columns which are not only highly compressible but are very fast for ad-hoc searches. Though Hasso Plattner talked about 10 to 1 compression, individuals that I have talked to that have direct experience with the current technology tell me that 5 to 1 is more likely. Even at 5 to 1, a 1TB conventional DB would fit into 200GB using HANA. The goal is not necessarily to replicate entire databases, including aged data that might be best archived, but to replicate only data that is useful in analyzing the business and developing new opportunities for driving revenue or reducing expenses. The promise is that analyses for which the underlying systems and database schemas would have been prohibitively expensive and time consuming to construct will now be very affordable. If true, companies could extend innovation potential to just about anyone in the company with a good idea rather than just the elite few analysts that perform this work at the direction of only top executives. This solution is currently based on Intel based systems running Linux from a pretty decent set of SAP technology partners. Though SAP has not eliminated any other type of systems from being considered for support, they have also not indicated a plan for support of any other type of system.

The next phase of in-memory database technology, I picked up from various conversations and presentations at Sapphire. Two major areas were discussed. The first deals entirely with BW. The insinuation was that BWA and HANA are likely to be combined into a single set of technology and have the ability to run the entire BW database stack, thereby eliminating the need for a separate BW database server. I can imagine a lot of customers that already have BWAs or are planning on HANA finding this to be a very useful direction. The lack of transactional updates in such an environment makes this a very doable goal. Once again, SAP made no statements of support or elimination of support for any platform or technology.

The second area involves a small, but historically troublesome portion of SAP transactions which involve much longer run times and/or large amounts of data transfer back and forth between database and application servers and consequently consume much larger amounts of resources. Though SAP was not specific, their goal is to use in-memory database technology to run those sets of SAP transactions that have these sorts of characteristics. Consider this a sort of coprocessor similar to the way that BWA acted as a back end database coprocessor for BW. Other than faster performance, this would be transparent to the end user. Programmers would see this, but perhaps just as an extension of the ABAP language for these sorts of transactions. Not all customers are experiencing problems in this area. On the other hand, there are some customers that deal with these sorts of pesky performance issues quite regularly and therefore would be prime candidates for such a technology. It is also technically quite a bit more complex to develop this sort of coprocessor. I would envision this coming out somewhat later than the in-memory BW database technology described above.

The last phase, pushed strongly by Hasso Plattner, but barely mentioned by anyone else at SAP, involves a full transactional in-memory database. This would act as a full replacement for Oracle, DB2 and SQL Server databases. Strangely, no one representing those companies seemed to be very concerned about this so, clearly, this sparked my interest. When I asked some database experts, I was given a little rudimentary education. Transactional databases are fundamentally different from primarily read-only databases populated by other databases. At the most basic level, a query in a read-only database can examine any data element with no regard for any other query that might be doing the same. A transactional database must determine if a data element that may be changed by a transaction is locked by another transaction and, if so, what to do about it, e.g. wait, steal the lock, abandon the task, etc. At a slightly more advanced level, if an update to a read-only database fails, the data can be simply repopulated from the source. If an update fails in a transactional database, real data loss with potentially profound implications can result. Backup, recovery, roll forward, roll back, security, high availability, disaster recovery and dozens of other technologies have been developed by the database companies over time to ensure comprehensive database integrity. These companies therefore believe that if SAP goes down this path, it will not be an easy or quick one and may be fraught with complications.
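
The locking difference is easier to see in a toy example. The Python sketch below is purely illustrative and is not how HANA, Oracle or any real database engine is implemented; the class, method and policy names are all hypothetical. It only shows that a read-only query can go straight to the data, while an update must first check for a lock holder and then pick one of the responses mentioned above.

```python
from enum import Enum


class OnConflict(Enum):
    """What a transaction does when the element it wants is already locked."""
    WAIT = "wait"
    STEAL = "steal the lock"
    ABANDON = "abandon the task"


class LockTable:
    """Toy lock manager: tracks which transaction holds each data element."""

    def __init__(self):
        self._owner = {}  # data element -> id of the transaction holding the lock

    def acquire(self, txn, element, on_conflict):
        """Return a short string describing what happened to the lock request."""
        holder = self._owner.get(element)
        if holder is None or holder == txn:
            self._owner[element] = txn
            return "granted"
        if on_conflict is OnConflict.STEAL:
            # A real engine would also have to roll back the previous holder.
            self._owner[element] = txn
            return f"stolen from {holder}"
        if on_conflict is OnConflict.WAIT:
            # A real engine would queue the request and watch for deadlocks.
            return f"waiting for {holder}"
        return "abandoned"

    def release(self, txn, element):
        """Release the lock, but only if this transaction actually holds it."""
        if self._owner.get(element) == txn:
            del self._owner[element]


def read_only_query(data, element):
    """A read-only copy needs no lock check: just look at the data."""
    return data[element]


if __name__ == "__main__":
    locks = LockTable()
    print(locks.acquire("T1", "order-42", OnConflict.WAIT))
    print(locks.acquire("T2", "order-42", OnConflict.WAIT))
    print(locks.acquire("T2", "order-42", OnConflict.ABANDON))
    print(locks.acquire("T2", "order-42", OnConflict.STEAL))
```

Even this toy version leaves out everything that makes the real problem hard: queuing, deadlock detection, rollback of the transaction that lost its lock, and recovery after a failure.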

And then there is a matter of cost. The software portion of HANA is not inexpensive today. If SAP were to maintain a similar pricing model for the substantially more complicated transactional database of the future, customers may be faced with database licensing costs that could be twice or more what they pay currently for the SAP OEM editions of DB2 and SQL Server, both licensed at 8% of SAV (SAP Application Value), or Oracle, licensed at 11% of SAV (but announced as growing to 15% this month, August).
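
To put rough numbers on this, here is a small Python sketch of the OEM licensing arithmetic. The percentages are the ones quoted above; the SAV amount is a made-up round number, and the doubling is only my "twice or more" speculation, not any published SAP price.

```python
# OEM database license rates as a fraction of SAV (SAP Application Value),
# taken from the figures quoted in the text.
OEM_RATE_OF_SAV = {
    "DB2": 0.08,
    "SQL Server": 0.08,
    "Oracle (current)": 0.11,
    "Oracle (announced)": 0.15,
}


def oem_license_cost(sav: float, rate: float) -> float:
    """Database license cost for a given SAP Application Value and OEM rate."""
    return sav * rate


if __name__ == "__main__":
    sav = 1_000_000  # hypothetical SAV in any currency, for illustration only
    for database, rate in OEM_RATE_OF_SAV.items():
        today = oem_license_cost(sav, rate)
        doubled = 2 * today  # the speculative "twice or more" scenario
        print(f"{database}: {today:,.0f} today, {doubled:,.0f} if the cost doubled")
```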

This begs the question, then: what is broken today that an SAP in-memory transactional database would fix? If you can maintain virtually all of your valuable data in a read-only copy on HANA and perform all of the analyses that your heart desires, what will a single transactional and analytical repository do that you can’t do today with the separate databases? Ten years ago, having two copies of a 10TB database would have required a big investment in disk subsystems. Now, 20TB is incredibly inexpensive and is almost a rounding error in many IT budgets.

Bottom line: HANA looks like a real winner. Phase two has a lot of promise. Phase three looks like a solution looking for a problem. So for those UNIX fans, database and application server demands will continue to be met primarily by existing technology solutions for a long time to come.

August 5, 2011 | Uncategorized

Oracle Exadata for SAP

On June 10, 2011, Oracle announced that SAP applications had been certified for use with their Exadata Database Machine. I was intrigued as to what this actually meant, what was included, what requirement this was intended to address and what limitations might be imposed by such a system. First the meaning: Did this mean that you could actually run SAP applications on an Exadata system? Absolutely not! Exadata is a database machine. It runs Oracle RAC. Exadata has been on the market for almost 2 years. Oracle 11g RAC has been certified to run SAP databases for well over a year now. Now, there is a formal statement of support for running SAP databases on Exadata. So, the obvious question, at least to me, is why did it take so long? What is fundamentally different about Oracle RAC on Exadata vs. Oracle RAC on any x86 cluster from an SAP perspective? To the best of my knowledge, SAP sees only a RAC cluster, not an Exadata system. I offer no conclusion, just an observation that this “certification” seems to have taken an awfully long time.

What was included? As mentioned before, you can’t run SAP applications on Exadata, which means that you must purchase other systems for application servers. Thankfully, you can run the CI on Exadata and can use Oracle Clusterware to protect it. In the FAQ and white papers published by Oracle, there is no mention of Oracle VM or any other type of virtualization. While you can run multiple databases on a single Exadata system, all would have to reside in the same set of OS images. This could involve multiple Oracle instances, whether RAC or not, under one OS, multiple databases under one Oracle instance, or even running different database instances on different nodes, for example. Many customers choose to have one OS image per database instance to give them the flexibility of upgrading one instance at a time. Apparently, that is not an option when a customer chooses to use Exadata, so if a customer has this requirement, they may need to purchase additional Exadata systems. So, it might seem natural to assume that all of the software required or recommended to support this environment would be included in the SAP OEM edition of Oracle, but that would be wrong. Since Exadata is based on Oracle RAC, the RAC license must be obtained either through an additional cost for the OEM license from SAP or directly through Oracle. Active Data Guard and Real Application Testing, optional components but considered by many to be important when RAC is utilized, are also not included and must be purchased separately. Lastly, Oracle Exadata Storage Server must be purchased separately.

So, what problem is this intended to solve? Scalability? IBM’s System x, as well as several other x86 vendors, have published SAP 2-tier benchmarks in excess of 100,000 SAPS, not to mention over 680,000 for IBM’s Power Systems. Using the typical 1:4 ratio of database to application server SAPS, this means that you could support an SAP requirement of at least 500,000 SAPS with a single high-end x86 database server. Perhaps 1% or less of all SAP implementations need more capacity than this, so this can’t be the requirement for Exadata. How about high availability? Oracle RAC was already available on a variety of systems, so this is not unique to Exadata. A primary requirement of HA is to physically separate systems, but Exadata places all of the nodes in a single rack, unless you get up to really huge multi-rack configurations, so using Exadata would run contrary to HA best practices.
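
The 500,000 SAPS figure comes from simple sizing arithmetic, sketched here in Python. The function name is my own, and the 1:4 ratio is the rule of thumb quoted above rather than a guarantee for any particular workload.

```python
def total_saps_supported(db_server_saps: int, app_saps_per_db_saps: int = 4) -> int:
    """Total SAP requirement (database plus application tier) that a database
    server of the given capacity can anchor, using a 1:N database to
    application server SAPS ratio.
    """
    app_tier_saps = db_server_saps * app_saps_per_db_saps
    return db_server_saps + app_tier_saps


if __name__ == "__main__":
    # A single x86 server with a published 2-tier result of 100,000 SAPS,
    # used purely as the database server.
    print(total_saps_supported(100_000))  # prints 500000
```

In other words, 100,000 SAPS of database capacity pairs with 400,000 SAPS of application servers, for a total requirement of 500,000 SAPS.
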

Let us not forget about limitations. Already mentioned is the lack of virtualization. This can be a major issue for customers with more than one database. But what about non-production? For a customer that requires a database server for each of development, test and QA, not to mention pre-prod, post-prod or any of a variety of other purposes, this could drive the need for multiple other Exadata systems, and each would have radically more capacity than a customer could reasonably be expected to utilize. What if a customer has an existing storage standard? Exadata supports only its own storage, so it must be managed separately. Many customers utilize a static and/or space efficient image of the database for backup purposes, but that requires a storage system that supports such a capability, not to mention the ability to mount that image on a separate server, neither of which is possible with Exadata. A workaround might involve the use of Active Data Guard to create a synchronous copy on another system which can be utilized for backup purposes, but this not only comes with additional cost, it is also more complex, is not space efficient and might require additional systems capacity. And then there are the known limitations of RAC for SAP. While SAP is RAC enabled, it is not RAC aware. In other words, each SAP application server must be uniquely associated with a single RAC node and all of its traffic directed to that node. If data is located in the memory of another RAC node, that data must be moved, through cache fusion, to the requesting node, which is over 100,000 times slower than moving data through main memory. This is but one of the many tuning issues related to RAC, and is not intended as a knock on RAC, just a reality check. For customers that require the highest possible Oracle database availability, RAC is the best choice, but it comes at the cost of tuning and other limitations.
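
To see why even a small share of remote accesses matters, here is a rough Python sketch. It takes the 100,000 times figure above at face value and uses a deliberately simplified model (a plain weighted average, ignoring disk I/O, caching effects and everything else a real tuning exercise would consider). The function name and the remote fractions are hypothetical.

```python
def average_relative_access_cost(remote_fraction: float,
                                 remote_penalty: float = 100_000) -> float:
    """Average cost of a data access relative to a local main-memory access (= 1).

    remote_fraction is the share of accesses that must be shipped from another
    RAC node; remote_penalty is how many times slower such an access is.
    """
    if not 0.0 <= remote_fraction <= 1.0:
        raise ValueError("remote_fraction must be between 0 and 1")
    local_fraction = 1.0 - remote_fraction
    return local_fraction * 1.0 + remote_fraction * remote_penalty


if __name__ == "__main__":
    for fraction in (0.0, 0.001, 0.01, 0.1):  # hypothetical remote shares
        cost = average_relative_access_cost(fraction)
        print(f"{fraction:.1%} remote accesses: about {cost:,.0f}x a local access")
```

Under these assumptions, the average is dominated by the remote accesses long before they become common, which is why keeping each application server's data on its own node is such a big part of RAC tuning.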

I am sure that I must be missing something, but I can’t figure out why any customer would need an Exadata system for SAP.

July 27, 2011 | Uncategorized

FlexPod for SAP – observations

IBM Power Systems represents one of the largest single brands of systems installed by customers around the world for SAP, both in terms of the number of installed instances of SAP and the number of seats on those systems. In the US Fortune 100, IBM Power Systems are utilized by over 50% of those customers that have SAP installed. While many other UNIX brands are seeing medium to rapid declines in their market share, IBM Power Systems continue to gain share in both the UNIX market as a whole and within the SAP UNIX market.

Within the last year or so, a number of x86 vendors have set their sights on our enviable position. To that end, they have delivered both higher performance systems as well as creative packaging that may be interesting to some customers. In this posting, I would like to discuss the latest one that I have become aware of: FlexPod for SAP, a joint collaboration of Cisco, NetApp and VMware. It is built on a base of Cisco’s Unified Computing System (UCS) and is billed as a cloud reference platform.

Flexible (as in the first syllable of FlexPod) is defined as using only systems and interconnect components from Cisco’s UCS environment (no other x86 or RISC systems), only Linux (Microsoft and other OSs not supported), only VMware virtualization (KVM, Oracle VM and Microsoft Hyper-V apparently excluded), only NetApp MultiStore (hope you don’t have another storage standard) and only Oracle RDBMS (no DB2, SQL Server or Sybase). As an engineering major, I did not spend a lot of time on English, but I don’t recall seeing the word, flexible, ever used before with so many restrictions.

VMware, as a full partner in the solution, is obviously a key technology, but there is no mention of the lack of support for Oracle in this environment. According to SAP Note 1173954: “Oracle has not certified any of its products on VMware virtualized environments.” That note goes on to say that Oracle will only provide support if a problem can be demonstrated as occurring when running outside of VMware. Most problems seem to occur under high loads, which are most difficult to simulate when running on a “test” system as might be required to demonstrate this. Interestingly enough, Cisco seems to recommend the use of NFS for common file systems, whereas VMware often recommends that NFS not be used with VMware. Not sure where that contradiction leaves the customer.

On the subject of Oracle, since FlexPod supposedly offers great scalability, it is glaring that there is no mention of Oracle Real Application Clusters which is touted by many other x86 vendors, as well as Oracle, as a way of achieving scalability for large SAP databases.

As any SAP customer knows, building systems can be a daunting task, but managing them in a best practices manner requires even more work. The documentation on FlexPod makes only passing references to high availability with the suggestion that VMware delivers on this requirement. VMware HA is a nice utility to recover a failing hypervisor, but does not offer full SAP stack availability as many HA products do. In other words, if the SAP message server fails, it would be nice to restart that server, which only happens if you layer on yet another vendor’s VMware plug-in for HA. Of course, there is no mention of Symantec or any other VMware plug-in in the FlexPod literature. Backup and recovery are mentioned in the context of NetApp SMSAP and Protection Manager with SAP BR*Tools. I can’t speak to NetApp’s tools, but I don’t recall seeing these on SAP’s list of certified backup software offerings the way Tivoli Storage Manager and Symantec NetBackup are. Disaster recovery is not even mentioned, nor are SAP monitoring, SAP archiving or security. As to cloud enablement technologies such as catalog management, self-service portal, automatic load balancing and charge back accounting, to name just a few, BMC is mentioned as the provider of these capabilities, but there are no details behind this. Sounds more like a completely separate solution with no real relation to FlexPod.

FlexPod is described as “a unified, pretested, and validated data center solution” for SAP. But it seems more equivalent to a comprehensive clothes washing system that can wash clothes if someone else sorts them, washes each load separately after careful placement, adds the right amount of soap, monitors the progress to make sure the clothes are washing correctly, removes each piece at the end, places them in the dryer and subsequently takes out only the dry pieces, irons them and hangs them up. In the Q&A, it is noted that FlexPod is not available as a single SKU, but instead a partner will “use the FlexPod reference bill of materials, sizing guide, and Cisco Validated Design to architect the solution.” Sounds more like a set of unrelated components tied together with some nice chartware and a setup white paper.

One final word: when customers bet their business on an application, as they typically do with SAP, unless they want to live on the bleeding edge with an untried and unproven solution, they are strongly advised to ask for references of a similar size and scope to what they are considering.

July 15, 2011 | Uncategorized