An ongoing discussion about SAP infrastructure

Scale-up vs. scale-out architectures for SAP HANA – part 1

Dozens of articles, blog posts, how-to guides and SAP notes have been written about this subject.  One of the best was by John Appleby, now Global Head of DDM/HANA COEs @ SAP.[i]  Several others have been written by vendors with a vested interest in the proposed option. The vendor for which I work, IBM, offers excellent solutions for both options, so my perspective is based on both my own experience and that of our many customers, some of whom have chosen one option, some the other and, in some cases, both.

Scale-out for BW is well established, understood, fully supported by SAP and can be cost effective from the perspective of systems acquisition costs.  Scale-out for S/4HANA, by comparison, is in use by very few customers and is not well understood, yet is supported by SAP for configurations up to 4 nodes.  Does this mean that a scale-out architecture should always be used for BW and that a scale-up architecture is the only viable choice for S/4HANA?  This blog post will discuss only BW and similar analytical environments including BW/4HANA, data marts, data lakes, etc.  The next will discuss S/4HANA and the third in the series will discuss vendor selection and where one might have an advantage over the others.

Scale-out has 3 key advantages over scale-up:

  • Every vendor can participate therefore competitive bidding of “commodity” level systems can result in optimal pricing.
  • High availability using host auto-failover requires nothing more than n+1 systems, as the hot standby node can take over the role of any other node (some customers choose n+2 or group nodes and standby nodes).
  • Some environments are simply too large to fit in even the largest supported scale-up systems.

Scale-up, likewise, has 3 key advantages over scale-out:

  • Performance is, inevitably, better, as joins across memory are always faster than joins across a network.
  • Management is much simpler as query analysis and data distribution decisions need not be performed on a regular basis plus fewer systems are involved with the corresponding decrease in monitoring, updating, connectivity, etc.
  • TCO can be lower when the costs of systems, storage, network and basis management are included.

Business requirements, as always, should drive the decision as to which to use.  As mentioned, when an environment is simply too large, unless a customer is willing to ask for an exception from SAP (and SAP is willing to grant it), then scale-out may be the only option.  Currently, SAP supports BW configurations of up to 6TB on many 8-socket Intel Skylake based systems (up to 12TB on HPE’s 16-socket system) and up to 16TB on IBM Power Systems.

The next most important issue is usually cost.  Let’s take a simple example of an 8TB BW HANA requirement.  With scale-out, 4 @ 2TB nodes may be used with a single 2TB node for hot standby for a total of 10TB of memory.  If scale-up is used, the primary system must be 8TB and the hot-standby another 8TB for a total of 16TB of memory.  Considering that memory is the primary driver of the cost of acquisition, 16TB, from any vendor, will cost more than 10TB.  If the analysis stops there, then the decision is obvious. However, I would strongly encourage all customers to examine all costs, not just TCA.

In the above example, 5 systems are required for the scale-out configuration vs. 2 for scale-up. The scale-out config could be reduced to 4 systems if 3TB nodes are used with 1TB left unused although the total memory requirement would go up to 12TB.  At a minimum, twice the management activities, trouble-shooting and connectivity would be required. Also, remember, prod rarely exists on its own with some semblance of the configuration existing in QA, often DR and sometimes other non-prod instances.
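The system and memory counts in this example can be reproduced with a few lines of arithmetic. The following is only a sketch: the function names are hypothetical, and it assumes a single hot standby for each approach, as in the text.

```python
import math

def scale_out_config(required_tb, node_tb, standby_nodes=1):
    """Return (systems, total_memory_tb) for a scale-out cluster with hot standby."""
    worker_nodes = math.ceil(required_tb / node_tb)  # nodes needed to hold the data
    systems = worker_nodes + standby_nodes           # n+1 by default
    return systems, systems * node_tb

def scale_up_config(required_tb, standby_systems=1):
    """Return (systems, total_memory_tb) for scale-up with a same-size standby."""
    systems = 1 + standby_systems
    return systems, systems * required_tb

print(scale_out_config(8, 2))  # (5, 10): 4 x 2TB nodes plus one 2TB standby
print(scale_out_config(8, 3))  # (4, 12): 3 x 3TB nodes plus one 3TB standby
print(scale_up_config(8))      # (2, 16): 8TB primary plus 8TB standby
```

Changing `node_tb` or `standby_nodes` shows how quickly the system count, and with it the management effort, moves for a given memory requirement.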

The other set of activities is much more intensive.  To distribute load amongst the systems, first data must be distributed.  Some data must reside on the master node, e.g. all row-store tables, ABAP tables and general operations tables.  Other data, such as Fact, DataStore Object (DSO) and Persistent Staging Area (PSA) tables, is distributed evenly across the slave nodes based on the desired partitioning specification, e.g. hash, round robin or range.  There are also more complex options where specifications can be mixed to get around hash or range limitations and create a multi-level partitioning plan.  And, of course, you can partition different tables using different specifications.  Which set of distribution specifications you use is highly dependent on how data is accessed and this is where it gets really complicated.  Most customers start with a simple specification, then begin monitoring placement using the table distribution editor and performance using ST03N, plus getting feedback from end users (read that as complaints to the help desk).  After some period of time and analysis of performance, many customers elect to redistribute data using a better or more complex set of specifications. Unfortunately, what is good for one query, e.g. distributing data based on month, is bad for another which looks for data based on zipcode, customer name or product number.  Some customers report that the above set of activities can consume part or all of one or more FTEs.
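To make the month vs. zipcode conflict concrete, here is a toy Python model of the three basic partitioning specifications. It is not HANA code: the node count, the sample rows and the function names are all assumptions made for illustration, and it only counts how many nodes a query has to touch.

```python
import zlib

NODES = 4  # hypothetical number of worker nodes

def by_hash(row):
    # Hash partitioning on zipcode; crc32 keeps the result deterministic.
    return zlib.crc32(str(row["zipcode"]).encode()) % NODES

def by_round_robin(row):
    # Round robin: rows are dealt to nodes in arrival order.
    return row["row_id"] % NODES

def by_range(row):
    # Range partitioning on month: one calendar quarter per node.
    return (row["month"] - 1) * NODES // 12

# Made-up fact rows: 12 months x 50 zipcodes.
rows = [
    {"row_id": i, "month": month, "zipcode": zipcode}
    for i, (month, zipcode) in enumerate(
        (m, z) for m in range(1, 13) for z in range(10000, 10050)
    )
]

def nodes_touched(placement, predicate):
    """Set of nodes that hold at least one row matching the query."""
    return {placement(r) for r in rows if predicate(r)}

queries = {
    "month = 5": lambda r: r["month"] == 5,
    "zipcode = 10007": lambda r: r["zipcode"] == 10007,
}
schemes = {
    "hash(zipcode)": by_hash,
    "round robin": by_round_robin,
    "range(month)": by_range,
}

for scheme_name, placement in schemes.items():
    for query_name, predicate in queries.items():
        touched = nodes_touched(placement, predicate)
        print(f"{scheme_name:14} {query_name:16} -> {len(touched)} node(s)")
```

In this model, range partitioning by month answers the month query from a single node but sends the zipcode query to all four, while hashing on zipcode does the reverse for the zipcode query. No single specification keeps both queries local, which is why redistribution tends to be a recurring task.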

Back to the above example: 10TB vs. 16TB, which we will assume, for the sake of argument, is replicated in QA and DR, i.e. the scale-up solution requires 18TB more memory.  If the price per TB is $35,000 then the cost difference in TCA would be $630,000.  The average cost of a senior basis administrator (required for this sort of complex task) in most western countries is in the $150,000 per year range.  That means that over the course of 5 years, during which one such FTE would cost about $750,000, the TCO of the scale-up solution, considering only TCA and basis admin costs, would be roughly equivalent to the cost of the scale-out solution.  Systems, storage and network administration costs could push the TCO of the scale-out solution up relative to the scale-up solution.
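The arithmetic behind this comparison is easy to rerun with your own numbers. The sketch below uses the figures from the text and assumes, purely for illustration, that the scale-out landscape consumes one additional full-time basis administrator and the scale-up landscape none.

```python
PRICE_PER_TB = 35_000          # USD, from the example above
ADMIN_COST_PER_YEAR = 150_000  # USD, senior basis administrator
ENVIRONMENTS = 3               # prod, QA and DR

SCALE_OUT_TB = 10
SCALE_UP_TB = 16

# Extra memory and acquisition cost (TCA) of scale-up across all environments.
extra_memory_tb = (SCALE_UP_TB - SCALE_OUT_TB) * ENVIRONMENTS
tca_difference = extra_memory_tb * PRICE_PER_TB

def extra_admin_cost(years, ftes=1.0):
    """Additional basis admin cost of scale-out (assumption: `ftes` extra FTEs)."""
    return years * ftes * ADMIN_COST_PER_YEAR

break_even_years = tca_difference / ADMIN_COST_PER_YEAR

print(extra_memory_tb)      # 18
print(tca_difference)       # 630000
print(extra_admin_cost(5))  # 750000.0
print(break_even_years)     # 4.2
```

With half an FTE instead of a full one, `extra_admin_cost(5, ftes=0.5)` drops to $375,000 and the scale-out solution stays cheaper on these two cost items alone, so the outcome depends heavily on how much administration effort redistribution really takes.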

And then there is performance.  Some very high performance network adapter companies have been able to drive TCP latency across a 10Gb Ethernet down to 3.6µs, which sounds really good until you consider that memory latency is around 120ns, i.e. 30 times faster.  Joining tables across nodes is not only substantially slower, but also results in more CPU and memory overhead.[ii]  A retailer in Switzerland, Coop Group, reported 5 times quicker analytics while using 85% fewer cores after migrating from an 8-node x86 scale-out BW HANA cluster with 320 total cores to a single scale-up 96-core IBM Power Systems server.[iii]  While various benchmarks suggest 2x or better per core performance of Power Systems vs. x86, these results suggest a far higher ratio, much of which can, no doubt, be attributed to the effect of using a scale-up architecture.
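For a feel of what a 30x latency gap means, the following back-of-the-envelope sketch multiplies the two latencies quoted above by a hypothetical number of dependent lookups. It is a deliberately crude model: real joins batch and pipeline their requests, so treat the result as an illustration of the ratio rather than a prediction of query time.

```python
NETWORK_LATENCY_NS = 3_600  # 3.6 microseconds, best-case TCP over 10Gb Ethernet
MEMORY_LATENCY_NS = 120     # approximate local memory latency

ratio = NETWORK_LATENCY_NS / MEMORY_LATENCY_NS

def lookup_time_ms(lookups, latency_ns):
    """Time for `lookups` strictly sequential accesses, in milliseconds."""
    return lookups * latency_ns / 1_000_000

LOOKUPS = 1_000_000  # hypothetical number of dependent lookups in a join

print(ratio)                                        # 30.0
print(lookup_time_ms(LOOKUPS, MEMORY_LATENCY_NS))   # 120.0
print(lookup_time_ms(LOOKUPS, NETWORK_LATENCY_NS))  # 3600.0
```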

Of course, performance is relative.  BW queries run with scale-out HANA will usually outperform BW on a conventional DB by an order of magnitude or more.  If this is sufficient for business purposes, then it may be hard to build a case for why faster is required.  But end users have a tendency to soak up additional horsepower once they understand what is possible.  They do this in the form of more what-if analyses, interactive drill downs, more frequent mock-closes, etc.

If the TCO is similar or better and a scale-up approach delivers superior performance with many fewer headaches and calls to the help desk for intermittent performance problems, then it would be very worthwhile to investigate this option.


To recap: for BW HANA and similar analytical environments, scale-out architectures usually offer the lowest TCA and scalability beyond the largest scale-up environment.  Scale-up architectures offer significantly easier administration, much better performance and competitive to superior TCO.


[ii] FAQ 8)



July 9, 2018 | Uncategorized

IBM’s HANA solution

IBM’s implementation of an SAP HANA appliance is nothing short of a technological coup de grace over the competition.   These are words that you would never normally see me write as I am not one to use superlatives, but in this case, they are appropriate.  This is not to say that I am contradicting anything that I said in my posting last year, where I discussed when and where HANA is the right choice.

As expected, HANA has been utilized for more applications and is about to GA for BW in the very near future.  SAP reports brisk sales of systems and our experience echoes this, especially for proof of concept systems.  Even though all of the warts of a V1.0 product have not been overcome, the challenges are being met by a very determined development team at SAP following a corporate mandate.  It is only a matter of time before customers move into productive environments with more and more applications of HANA.  That said, conventional databases and systems are not going away any time soon.  Many systems such as ERP, CRM and SCM run brilliantly with conventional database systems.  This is a tribute to the way that SAP delivered a strong architecture in the past.   There are pieces of those systems which are “problem” areas and SAP is rapidly deploying solutions to fix those problems, often with HANA based point solutions, e.g. COPA.  SAP envisions a future in which HANA replaces conventional databases, but not only are there challenges to be overcome, there are also simply not that many problems with databases based on DB2, Oracle, SQL Server or Sybase, and so there is not, as of yet, a compelling business case for change.

Of course, I am not trying to suggest that HANA is not appropriate for some customers or that it does not deliver outstanding results in many cases.  In many situations, the consolidation of a BW and BWA set of solutions makes tremendous sense.  As we evolve into the future, HANA will make progressively more sense to more and more customers.  And this brings me back to my original “superlative” laden statement.  How does IBM deliver what none of the competitors do?

Simply put (yes, I know, I rarely put anything simply), IBM’s HANA appliance utilizes a single, high performance stack, regardless of how small or large a customer’s environment is.  And with IBM’s solution, as demonstrated at Sapphire two weeks ago, 100TB across 100 nodes is neither a challenge nor a limit of its architecture.   By the way, Hasso Plattner unveiled this solution in his keynote speech at Sapphire and, surprisingly, he called out and commended IBM.  Let us delve a little deeper and get a little less simple.

At the heart of the IBM solution is GPFS, the General Parallel File System.  This is, strangely enough, a product of IBM Power Systems.  It allows for striping of data among local SSD and HDD as well as spanning of data across clustered systems, replication of data, high availability and disaster recovery.  On a single node system, utilizing the same number of drives, up to twice the IOPS of an ext3 file system can be expected with GPFS.  When, not if, a customer grows and needs either larger systems or multiple systems, the file system stays the same and simply adapts to its different requirements.

GPFS owes its roots to high performance computing.  Customers trying to find the answer (not 42 for those of you geeky enough to understand what that means) to complex problems often required dozens, hundreds or even thousands of nodes and had to connect all of those nodes to a single set of data.  As these solutions would often run for ridiculous periods of time, sometimes counted in months or even years, the file system upon which they relied simply could not break no matter what the underlying hardware did.  ASCI, the US Accelerated Strategic Computing Initiative, which focused on, among other things, simulating the effect of nuclear weapons storage decay, drove this requirement.  The same requirements exist in many other “grand challenge” problems, whether they be Deep Blue or Watson and their ability to play games better than humans, or “simple” problems of unfolding DNA or figuring out how weather systems work or how airplane wings can fly more efficiently.   More modestly, allowing thousands of scientists and engineers to collaborate using a single file system or thousands of individuals to access a file system without regard to location or the boundaries of the underlying storage technology, e.g. IBM SONAS, are results of this technology.

Slightly less modestly, GPFS is the file system used by DB2 pureScale and, historically, Oracle RAC.  Even though Oracle ASM is now supported on all platforms, GPFS is still frequently used despite the fact that GPFS is an additional cost over ASM, which is included with the RAC license.  The outcome is an incredibly robust, fault resilient and consistent file system.

Why did I go down that rabbit hole?  Because it is important to understand that whether a customer utilizes one HANA node with a total of 128GB of memory or a thousand nodes with 2PB of memory, the technology does not have to be developed to support this; it has already been done with GPFS.

Now for the really amazing part of this solution:  All drives, whether HDD or SSD, are located within each of the nodes but, through the magic of GPFS, are available to all nodes within the system.  This means there is no SAN, no NAS, no NFS, no specialized switches, no gateways, just a simple 10Gb/s Ethernet for GPFS to communicate amongst its nodes.  Replication is built in so that the data and logs physically located on each node can be duplicated to one or more other nodes.  This provides for an automatic HA of the file systems where, even if a node fails, all data can be accessed from other nodes and HANA can restart the failed node on a standby system.
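The idea that node-local data stays reachable after a node failure can be illustrated with a small toy model. This is not how GPFS actually places replicas; the ring placement, node names and function names below are assumptions chosen only to show why a second copy on another node is enough to survive a single failure.

```python
def place_replicas(nodes, copies=2):
    """Map each node's local data to the nodes holding a copy.

    Assumption for this sketch: copies go to the next nodes in a ring.
    """
    n = len(nodes)
    return {
        owner: [nodes[(i + k) % n] for k in range(copies)]
        for i, owner in enumerate(nodes)
    }

def data_available(placement, failed):
    """True if every node's data still has at least one surviving copy."""
    return all(
        any(holder not in failed for holder in holders)
        for holders in placement.values()
    )

nodes = ["node1", "node2", "node3", "node4"]
placement = place_replicas(nodes)

print(placement["node2"])                                    # ['node2', 'node3']
print(data_available(placement, failed={"node2"}))           # True
print(data_available(placement, failed={"node2", "node3"}))  # False
```

With two copies, any single node can fail without data loss, which is the case the text describes. Surviving the loss of two adjacent nodes in this model would require `copies=3`.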

Some other features of IBM’s implementation are worth noting.  IBM offers two different types of systems for HANA: the x3690 X5, a 1 or 2 socket system, and the x3950 X5, a 2, 4 or 8 socket system.  Either system may be used standalone or for scale out, but currently the x3690 is only certified to scale to 4 nodes whereas the x3950 is certified to scale to 16 nodes.  While it is not possible to mix and match these nodes in a scale out configuration, it is possible to utilize one in production and another in non-production, for example, as they have identical stacks.  It is interesting to note another valuable feature.  The x3950 is capable of scaling to 8 sockets, but customers don’t have to purchase an 8 socket system up front.  This is because each “drawer” of an x3950 supports up to 4 sockets and a second drawer may be added, at any time, to upgrade a system to 8 sockets.  Taken all together, an 8 socket standalone system costs roughly twice what a 4 socket standalone system does and a two node scale out implementation costs roughly twice what a single node standalone system does.  For that matter, a 16 node scale out implementation costs 8 times what a 2 node scale out implementation does.

How does this compare with implementations from HP, Dell, Cisco, Fujitsu, Hitachi and NEC?  For HP, a customer must choose whether to utilize a 4 or 8 socket system and consequently must pay for a 4 or 8 socket system up front as there is no path from the DL580 4 socket system to the DL980 8 socket system.  Many more drives are required, e.g. a 128GB configuration requires 24@15K HDDs compared to an equivalent IBM solution which only requires 8@10K.  Next, if one decides to start standalone and then move into a parallel implementation, one must move from ext3 and xfs file systems to NFS.  Unfortunately, HP has not certified either of those systems for scale out, so customers must move to the BL680c for scale out implementations, but the BL680c is only available as 4 socket/512GB nodes.  A standalone implementation utilizes internal disks plus, frequently, a disk drawer in order to deliver the high IOPS that HANA requires, but a scale out implementation requires one HP P6500EVA for each 4 nodes and one HP X9300 Network Storage Gateway for every 2 nodes.   The result is that not only does the stack change from standalone to scale out, but the systems change, the enclosure and interconnections change and, as more nodes are added, complexity grows dramatically.  Also, cost is not proportional but instead grows at an ever increasing rate as more nodes are added.

The other vendors’ implementations all share similar characteristics with HP’s with, of course, different types of nodes, file systems and storage architecture, e.g. Cisco uses EMC’s VNX5300 and MPFS parallel file system for scale-out.

At the time of writing this, only Cisco and Dell were not certified for the 1TB HANA configuration as SAP requires 8 sockets for this size system, based on SAP sizing rules, not based on limitations of the systems to support 1TB.  Also, only IBM was certified for a scale out configuration utilizing 1TB nodes and up to 16 of those nodes.

The lead that IBM has over the competition is almost unprecedented.  In fact, in the x86 space, I am not aware of this wide a gap at any time in IBM’s history.  If you would like to get the technical details behind IBM’s HANA implementation, please visit and to see a short video on the subject from Rich Travis, one of IBM’s foremost experts on HANA and the person that I credit with most of my knowledge of HANA, but none of the mistakes in this blog post, please visit:

May 30, 2012 | Uncategorized

Excellent PowerVM for SAP document

About 3 years ago, the IBM SAP Competency Center in Germany produced a very good document that took the reader through the reasons and rationale for virtualizing SAP landscapes and then explained all of the technologies available on the Power Systems platform to allow users to accomplish that goal.  As many improvements have been introduced in the Power Systems line as well as with its Systems Software, a new updated version was needed.  The Competency Center rose to the task and produced this completely refreshed document.


Here is the table of contents to give you a small taste of what it covers.

Chapter 1. From a non-virtualized to a virtualized infrastructure

Chapter 2. PowerVM virtualization technologies

Chapter 3. Best practice implementation example at a customer site

Chapter 4. Hands-on management tasks

Chapter 5. Virtual I/O Server

Chapter 6. IBM PowerVM Live Partition Mobility

Chapter 7. Workload partitions

Chapter 8. SAP system setup for virtualization

Chapter 9. Monitoring

Chapter 10. Support statements by IBM and SAP


It is not what one might call “light reading”, but it is a comprehensive and well written guide to the leading edge virtualization technologies offered by IBM on Power Systems and how SAP landscapes can benefit from them.

October 26, 2011 | Uncategorized

SAP TechEd Video – Shall I Stay or Shall I Go … to x86? Technical Factors and Considerations.

For any that might be interested in seeing a presentation that I delivered at SAP TechEd Las Vegas 2011, please click on this link.   This presentation discusses why customers might consider x86 for their SAP environments and the reasons why Power Systems may deliver lower costs, better reliability and security.   The video lasts approximately 60 minutes.

October 4, 2011 | Uncategorized