Scale-up vs. scale-out architectures for SAP HANA – part 1
Dozens of articles, blog posts, how-to guides and SAP notes have been written about this subject. One of the best was by John Appleby, now Global Head of DDM/HANA COEs @ SAP.[i] Several others have been written by vendors with a vested interest in the proposed option. The vendor for which I work, IBM, offers excellent solutions for both options, so my perspective is based on my own experience and that of our many customers, some of whom have chosen one option, others the other, and some both.
Scale-out for BW is well established, well understood, fully supported by SAP and can be cost effective from the perspective of systems acquisition costs. Scale-out for S/4HANA, by comparison, is in use by very few customers and not well understood, yet is supported by SAP for configurations of up to 4 nodes. Does this mean that a scale-out architecture should always be used for BW and that a scale-up architecture is the only viable choice for S/4HANA? This blog post will discuss only BW and similar analytical environments, including BW/4HANA, data marts, data lakes, etc. The next post will discuss S/4HANA, and the third in the series will discuss vendor selection and where one might have an advantage over the others.
Scale-out has 3 key advantages over scale-up:
- Every vendor can participate, so competitive bidding of “commodity”-level systems can result in optimal pricing.
- High availability using host auto-failover requires nothing more than n+1 systems, as the hot-standby node can take over the role of any other node (some customers choose n+2, or group worker nodes with standby nodes).
- Some environments are simply too large to fit in even the largest supported scale-up systems.
Scale-up, likewise, has 3 key advantages over scale-out:
- Performance is, inevitably, better, as joins within a single system’s memory are always faster than joins across a network.
- Management is much simpler: query analysis and data distribution decisions need not be performed on a regular basis, and fewer systems are involved, with a corresponding decrease in monitoring, updating, connectivity, etc.
- TCO (total cost of ownership) can be lower when the costs of systems, storage, network and basis management are included.
Business requirements, as always, should drive the decision as to which to use. As mentioned, when an environment is simply too large for scale-up, scale-out may be the only option unless a customer is willing to ask SAP for an exception (and SAP is willing to grant it). Currently, SAP supports BW configurations of up to 6TB on many 8-socket Intel Skylake-based systems (up to 12TB on HPE’s 16-socket system) and up to 16TB on IBM Power Systems.
The next most important issue is usually cost. Let’s take a simple example of an 8TB BW HANA requirement. With scale-out, 4 @ 2TB nodes may be used with a single 2TB node for hot standby, for a total of 10TB of memory. If scale-up is used, the primary system must be 8TB and the hot standby another 8TB, for a total of 16TB of memory. Considering that memory is the primary driver of the cost of acquisition, 16TB, from any vendor, will cost more than 10TB. If the analysis stops there, then the decision is obvious. However, I would strongly encourage all customers to examine all costs, not just TCA (total cost of acquisition).
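For readers who want to plug in their own numbers, below is a minimal Python sketch of the sizing arithmetic used in this example; the node sizes, node counts and the assumption of a single hot-standby node are taken from the scenario above, not a sizing recommendation.

```python
# Illustrative sizing arithmetic for the 8TB BW on HANA example above.
# Adjust the figures for your own landscape; this is not a sizing tool.

def scale_out_memory_tb(data_tb, node_tb, standby_nodes=1):
    """Scale-out: enough worker nodes to hold the data, plus hot-standby
    node(s) of the same size."""
    workers = -(-data_tb // node_tb)  # ceiling division
    return (workers + standby_nodes) * node_tb

def scale_up_memory_tb(data_tb):
    """Scale-up: one primary plus one hot-standby system, each sized for
    the full data set."""
    return data_tb * 2

if __name__ == "__main__":
    data = 8  # TB of BW data in the example
    print("Scale-out, 2TB nodes:", scale_out_memory_tb(data, 2), "TB")  # 10 TB
    print("Scale-out, 3TB nodes:", scale_out_memory_tb(data, 3), "TB")  # 12 TB
    print("Scale-up:            ", scale_up_memory_tb(data), "TB")      # 16 TB
```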
In the above example, 5 systems are required for the scale-out configuration vs. 2 for scale-up. The scale-out config could be reduced to 4 systems if 3TB nodes are used, with 1TB left unused, although the total memory requirement would go up to 12TB. At a minimum, twice the management activity, troubleshooting and connectivity would be required. Also, remember that production rarely exists on its own; some semblance of the configuration usually exists in QA, often in DR and sometimes in other non-production instances.
The other set of activities is much more intensive. To distribute load amongst the systems, the data must first be distributed. Some data must reside on the master node, e.g. all row-store tables, ABAP tables and general operations tables. Other data, such as fact tables, DataStore Objects (DSOs) and Persistent Staging Area (PSA) tables, is distributed evenly across the slave nodes based on the desired partitioning specification, e.g. hash, round robin or range. There are also more complex options where specifications can be mixed to get around hash or range limitations and create a multi-level partitioning plan, and, of course, different tables can be partitioned using different specifications.

Which set of distribution specifications you use is highly dependent on how data is accessed, and this is where it gets really complicated. Most customers start with a simple specification, then monitor placement using the table distribution editor and performance using ST03N, plus gather feedback from end users (read that as complaints to the help desk). After some period of time and analysis of performance, many customers elect to redistribute data using a better or more complex set of specifications. Unfortunately, what is good for one query, e.g. distributing data by month, is bad for another that looks for data by zip code, customer name or product number. Some customers report that the above set of activities can consume part or all of one or more FTEs.
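To make that trade-off concrete, here is a toy Python sketch of partition placement; the node count, the greatly simplified partition functions and the example values are invented purely for illustration and do not reflect HANA’s actual hash or range implementation.

```python
# Toy illustration (not HANA's actual partitioning engine) of why a
# distribution specification that helps one query penalises another.

NODES = 4  # worker nodes in the example cluster

def range_partition_by_month(month: int) -> int:
    """Range-style placement: three consecutive months per node."""
    return (month - 1) * NODES // 12

def hash_partition_by_zipcode(zipcode: str) -> int:
    """Greatly simplified hash-style placement on zip code."""
    return int(zipcode) % NODES

rows = [{"month": m, "zipcode": z}
        for m in (1, 4, 7, 11) for z in ("8001", "3011", "1018", "75001")]

# Range-by-month: "WHERE month = 4" is pruned to a single node...
print({range_partition_by_month(r["month"]) for r in rows if r["month"] == 4})         # {1}
# ...but "WHERE zipcode = '8001'" must fan out to every node.
print({range_partition_by_month(r["month"]) for r in rows if r["zipcode"] == "8001"})  # {0, 1, 2, 3}

# Hash-by-zipcode: the zip-code query now hits one node, the month query fans out.
print({hash_partition_by_zipcode(r["zipcode"]) for r in rows if r["zipcode"] == "8001"})  # {1}
print({hash_partition_by_zipcode(r["zipcode"]) for r in rows if r["month"] == 4})         # {1, 2, 3}
```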
Back to the above example: 10TB vs. 16TB, each of which we will assume is replicated in QA and DR for the sake of argument, i.e. the scale-up solution requires 18TB more memory across the three environments. If the price per TB is $35,000, then the difference in TCA would be $630,000. The average cost of a senior basis administrator (required for this sort of complex task) in most western countries is in the $150,000 range. That means that over the course of 5 years, the TCO of the scale-up solution, considering only TCA and basis admin costs, would be roughly equivalent to that of the scale-out solution. Systems, storage and network administration costs could push the TCO of the scale-out solution up relative to the scale-up solution.
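The back-of-the-envelope comparison can be written out as a short calculation; the $35,000 per TB, $150,000 per year, 5-year horizon and the assumption that roughly one FTE is consumed by distribution work are simply the figures assumed above.

```python
# Back-of-the-envelope 5-year comparison using the assumptions in the text.
PRICE_PER_TB = 35_000            # assumed memory-driven system price per TB
BASIS_ADMIN_PER_YEAR = 150_000   # assumed fully loaded senior basis admin cost
YEARS = 5
ENVIRONMENTS = 3                 # prod, QA and DR, each sized the same

extra_memory_tb = (16 - 10) * ENVIRONMENTS              # 18TB more for scale-up
extra_tca_scale_up = extra_memory_tb * PRICE_PER_TB     # $630,000
extra_admin_scale_out = BASIS_ADMIN_PER_YEAR * YEARS    # $750,000 if ~1 FTE is consumed

print(f"Extra TCA for scale-up:          ${extra_tca_scale_up:,}")
print(f"Extra basis admin for scale-out: ${extra_admin_scale_out:,}")
```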
And then there is performance. Some very high performance network adapter companies have been able to drive TCP latency across 10Gb Ethernet down to 3.6µs, which sounds really good until you consider that memory latency is around 120ns, i.e. roughly 30 times faster. Joining tables across nodes is not only substantially slower, but also results in more CPU and memory overhead.[ii] A retailer in Switzerland, Coop Group, reported 5 times faster analytics while using 85% fewer cores after migrating from an 8-node x86 scale-out BW HANA cluster with 320 total cores to a single scale-up 96-core IBM Power Systems server.[iii] While various benchmarks suggest 2x or better per-core performance for Power Systems vs. x86, these results suggest far more, much of which can, no doubt, be attributed to the effect of using a scale-up architecture.
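As a crude illustration of that latency gap, here is a small calculation using the figures above; the one-million-access example assumes one full network round trip per remote access, which real query engines mitigate through batching and pipelining, so treat it as a rough upper-bound sketch rather than a performance prediction.

```python
# Rough ratios implied by the latency figures quoted above.
NETWORK_LATENCY_NS = 3_600   # ~3.6µs TCP latency over 10Gb Ethernet
MEMORY_LATENCY_NS = 120      # ~120ns local memory latency

print(f"Network latency is ~{NETWORK_LATENCY_NS / MEMORY_LATENCY_NS:.0f}x memory latency")

# Hypothetical: one million remote row accesses, one round trip each,
# ignoring batching, pipelining and the extra CPU/memory overhead noted
# in SAP Note 2044468 (FAQ 8).
accesses = 1_000_000
print(f"Remote: {accesses * NETWORK_LATENCY_NS / 1e9:.1f} s   "
      f"Local: {accesses * MEMORY_LATENCY_NS / 1e9:.2f} s")
```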
Of course, performance is relative. BW queries run with scale-out HANA will usually outperform BW on a conventional DB by an order of magnitude or more. If this is sufficient for business purposes, then it may be hard to build a case for why faster is required. But end users have a tendency to soak up additional horsepower once they understand what is possible. They do this in the form of more what-if analyses, interactive drill downs, more frequent mock-closes, etc.
If the TCO is similar or better and a scale-up approach delivers superior performance with many fewer headaches and calls to the help desk for intermittent performance problems, then it would be very worthwhile to investigate this option.
To recap: for BW HANA and similar analytical environments, scale-out architectures usually offer the lowest TCA and scalability beyond the largest scale-up environment. Scale-up architectures offer significantly easier administration, much better performance and competitive to superior TCO.
[i] https://blogs.saphana.com/2014/12/10/sap-hana-scale-scale-hardware/
[ii] https://launchpad.support.sap.com/#/notes/2044468 (see FAQ 8)
[iii] https://www.ibm.com/case-studies/coop-group-technical-reference