An ongoing discussion about SAP infrastructure

Scale-up vs. scale-out architectures for SAP HANA – part 2

S/4HANA is enabled for scale-out up to 4 nodes plus one hot-standby.  Enablement does not mean it is easy or advisable.  SAP states clearly: “We recommend using scale-up configurations as long as this is economically justifiable, taking operational costs and drawbacks into account.”[i]  This same note goes on to say: Limited knowledge about S/4HANA customer scenarios using scale-out is currently available.”

For very large customers, e.g. those for which an S/4HANA system’s memory is predicted to be larger than 24TB currently, scale-out may be the best option.  Best, of course, implies that there may be other options which will be discussed later in this post.

It is a reasonable question to ask why does SAP offer such conditional advice.  We can only speculate since SAP does not provide a direct explanation.  Some insight may be gained by reading the SAP note on scale-out sizing.[ii]  Unlike analytical applications such as BW/4HANA, partitioning of S/4HANA tables across nodes is not permitted.  Instead, all tables of a particular module are grouped together and the entire group must be placed on an individual node in the cluster.

Let’s consider a simple example of three commonly used modules, FI, MM and SD (Financial, Materials, Sales).  The tables associated with each module belong to their respective groups.  Placing each on a different node may help to minimize the size of any one node, but several issues arise.

  • Each will probably be a different size.  This fully supported, but the uneven load distribution may result in one node running at high utilization while another is barely using any capacity.  Not only does this mean wasted computing power, power and cooling but could result in inferior performance on the hot node.
  • Since most customers prefer to size all nodes in a cluster the same way, considerable over capacity of memory might result further driving up infrastructure costs.
  • Transactions often do not fit comfortably within a single module, e.g. a sales order might result in financial tables being updated with billing, accounts receivable and revenue data and materials tables being adjusted with a decrement of available stock.  If a transaction is running on node 1 (the master node) and needs to access/update tables on nodes 2 and 3, those communications run across a network.  As with the BW example in the previous blog post, each communication is at least 30 times slower across a network than across memory.

It is important to consider that every transaction that comes into an S/4HANA system connects to the index server on the master node with queries distributed by the master node to the appropriate index server.  This means that every transaction not handled directly by the master node must involve at least one send and one receive with the associated 30 times slower latency.

Some cross-node latency may be reduced by collocating appropriate groups resulting in fewer total nodes and/or by replicating some tables.  Unfortunately, if a table is replicated, this would result in the breaking of a fundamental SAP rule noted in SAP Note 2408419 (see footnote #1 below), that all tables of a group must be located on the same node.

As with the BW example, what works well for one scenario may not work well for another.  One of the significant advantages of S/4HANA over Business Suite 7 is the consolidation and dramatic reduction of tables resulting in fewer, much larger tables.  Conversely, this makes table distribution in a scale-out cluster much more challenging. It is not hard to imagine that performance management could be quite a task in a scale-out scenario.

So, if scale-out is not an option for many/most customers, what should be done if approaching a significant memory barrier?  Options include:

  • Cleanup, use of hybrid LOBs, index optimization, etc
  • Archiving data to reduce the size of the system
  • Eliminating duplicate data or easily reproduced data, e.g. iDocs, data from Hadoop
  • Usage of Data Aging[iii]
  • Sizing memory smaller than predicted
  • Request an exception to size the system larger than officially supported

Cleaning up your system and getting rid of various unnecessary memory consumers should be the first approach undertaken.[iv]  Remember, what might have been important with a conventional DB may either be not needed with S/4HANA or a better technique may exist.  The expected memory reduction is usually shown as part of an ERP sizing report.

Archiving is another obvious approach but since the data is kept on very slow media, compared to in-memory data, and cannot be changed, the decision as to what to archive and where to place it can be very challenging for some organizations.

iDocs are, by definition, intermediate documents and are used primarily for sending and receiving documents to/from third parties, e.g. sales orders, purchase orders, invoices, shipping notices.  Every iDoc sent or received should have a corresponding transaction within the SAP system which means that it is essentially a duplicate record once processed by the SAP system.  Many customers keep these documents indefinitely just in case any disputes occur with those third parties.  Often, these iDocs just sit around collecting digital dust and may be prime candidates for deletion or archival.  Likewise, data from an external source, e.g. Hadoop, should still exist in that source and could potentially be deleted from HANA.

Data Aging only covers a subset of data objects and requires some effort to utilize it.[v]  By default, the ABAP server adds “WITH RANGE RESTRICTION (‘CURRENT’)” to all queries to prevent unintended access to aged or cold partitions, which means that to access aged data, a query must specify which aged partition to access.  This implies special transactions or at least different training for users to access aged data.  Data Aging does allow aged data to be updated, so may be more desirable than archiving in some cases.  Aged data is stored on storage devices which means many order of magnitude slower than memory, however this can be mitigated, to some extent by faster media, e.g. NVMe drives on PCIe cards. Unfortunately, Data Aging has not been implemented by many customers meaning a potentially steep learning curve.

Deliberately undersizing a system is not recommended by SAP and I am not recommending it either.  That said, if an implementation is approaching a memory boundary and scaling to a larger VM or platform is not possible (physically, politically or financially), then this technique may be considered. It comes with some risk, however, so should be considered a last resort only.  HANA enables “lazy loading” of columns[vi] whereby columns are not loaded until needed. If your system has a large number of columns which consume space on disk but are never or rarely accessed, the memory reserved for these columns will, likewise, go unused or underused.  HANA also will attempt to unload columns when the system runs out of allocable memory based on a least frequently used algorithm. Unless a problem occurs, a system configured with less memory than the sizing report predicts will start without problem and unload columns when needed.  The penalty comes when those columns that are not memory resident are accessed at which time other column(s) must first be unloaded and the entire requested column loaded, i.e. significant latency for the first access is incurred. As mentioned earlier, this should be considered only in a worst case scenario and only if scaling up/out is not desired or an option.

Lastly, requesting an exception from SAP to allow a system size greater than officially supported may be a viable choice for the customers that are expected to exceed current maximums. This may not be without difficulty as when you embark on a journey where few or none have gone before, inevitably, you will run into obstacles that others have not yet encountered.  Dispatch mechanisms, delta merge operations, transactional log latency, savepoint I/O throughput, system startup times, backup/recovery and system replication are among some of the more significant areas that would be stressed and some might break.

My advice: Scale-up only in all S/4HANA cases unless the predicted memory for the immediate planning horizon exceeds the official SAP maximum supported size.  Before considering scale-out solutions, use every available tool to reduce the size of the system and ask SAP for an exception if the resulting size is still above the maximum.  Lastly, remember that SAP and its hardware partners are constantly working to enable larger HANA system sizes.  If the size required today fits within the largest supported system but is expected to exceed the limit over time, it may be reasonable to start your implementation or migration effort today with the expectation that the maximum will be increased by the time you need it.  Admittedly, this is taking a risk, but one that may be tolerable and if the limit is not raised in time, scale-out is still an option.

[i]2408419 – SAP S/4HANA – Multi-Node Support
[ii]2428711 – S/4HANA Scale-Out Sizing
[iii]2416490 – FAQ: SAP HANA Data Aging in SAP S/4HANA
[iv]1999997 – FAQ: SAP HANA MemoryFAQ 5
[v]1872170 – Business Suite on HANA and S/4HANA sizing report.

July 16, 2018 Posted by | Uncategorized | , , , , , , , | 1 Comment

Scale-up vs. scale-out architectures for SAP HANA – part 1

Dozens of articles, blog posts, how-to guides and SAP notes have been written about this subject.  One of the best was by John Appleby, now Global Head of DDM/HANA COEs @ SAP.[i]  Several others have been written by vendors with a vested interest in the proposed option. The vendor for which I work, IBM, offers excellent solutions for both options, so my perspective is based on both my and the experiences of our many customers, some that have chosen one or the other option, or both, in some cases.

Scale-out for BW is well established, understood, fully supported by SAP and can be cost effective from the perspective of systems acquisition costs.  Scale-out for S/4HANA, by comparison, is in use by very few customers, not well understood, yet is support by SAP for configurations up to 4 nodes.  Does this mean that a scale-out architecture should always be used for BW and a scale-up architecture for S/4HANA the only viable choice?  This blog post will discuss only BW and similar analytical environments including BW/4HANA, data marts, data lakes, etc.  The next will discuss S/4HANA and the third in the series will discuss vendor selection and where one might have an advantage over the others. 

Scale-out has 3 key advantages over scale-up:

  • Every vendor can participate therefore competitive bidding of “commodity” level systems can result in optimal pricing.
  • High availability, using host auto-failover requires nothing more than n+1 systems as the hot standby node can take over the role of any other node (some customers chose n+2 or group nodes and standby nodes).
  • Some environments are simply too large to fit in even the largest supported scale-up systems.

Scale-up, likewise, has 3 key advantages over scale-out:

  • Performance is, inevitably, better as joins across memory are always faster than joins across a network
  • Management is much simpler as query analysis and data distribution decisions need not be performed on a regular basis plus fewer systems are involved with the corresponding decrease in monitoring, updating, connectivity, etc.
  • TCO can be lower when the costs of systems, storage, network and basis management are included.

Business requirements, as always, should drive the decision as to which to use.  As mentioned, when an environment is simply too large, unless a customer is willing to ask for an exception from SAP (and SAP is willing to grant it), then scale-out may be the only option.  Currently, SAP supports BW configurations of up to 6TB on many 8-socket Intel Skylake based systems (up to 12TB on HPE’s 16-socket system) and up to 16TB on IBM Power Systems.

The next most important issue is usually cost.  Let’s take a simple example of an 8TB BW HANA requirement.  With scale-out, 4 @ 2TB nodes may be used with a single 2TB node for hot standby for a total of 10TB of memory.  If scale-up is used, the primary system must be 8TB and the hot-standby another 8TB for a total of 16TB of memory.  Considering that memory is the primary driver of the cost of acquisition, 16TB, from any vendor, will cost more than 10TB.  If the analysis stops there, then the decision is obvious. However, I would strongly encourage all customers to examine all costs, not just TCA.

In the above example, 5 systems are required for the scale-out configuration vs. 2 for scale-up. The scale-out config could be reduced to 4 systems if 3TB nodes are used with 1TB left unused although the total memory requirement would go up to 12TB.  At a minimum, twice the management activities, trouble-shooting and connectivity would be required. Also, remember, prod rarely exists on its own with some semblance of the configuration existing in QA, often DR and sometimes other non-prod instances.

The other set of activities is much more intensive.  To distribute load amongst the systems, first data must be distributed.  Some data must reside on the master node, e.g. all row-store tables, ABAP tables, general operations tables.  Other data such as Fact, DataStore Object (DSO), Persistent Staging Area (PSA) is distributed evenly across the slave nodes based on the desired partitioning specification, e.g. hash, round robin or range.  There are also more complex options where specifications can be mixed to get around hash or range limitations and create a multi-level partitioning plan).   And, of course, you can partition different tables using different specifications.  Which set of distribution specifications you use is highly dependent on how data is accessed and this is where it gets really complicated.  Most customers start with a simple specification, begin monitoring placement using the table distribution editor and performance using STO3N plus getting feedback from end users (read that as complaints to the help desk).  After some period of time and analysis of performance, many customers elect to redistribute data using a better or more complex set of specifications. Unfortunately, what is good for one query, e.g. distribute data based on month, is bad for another which looks for data based on zipcode, customer name or product number.  Some customers report that the above set of activities can consume part or all of one or more FTEs.

Back to the above example. 10TB vs. 16TB which we will assume is replicated in QA and DR, for sake of argument, i.e. the scale-up solution requires 18TB more memory.  If the price per TB is $35,000 then the cost different in TCA would be $630,000.  The average cost of a senior basis administrator (required for this sort of complex task) in most western countries is in the $150,000 range.  That means that over the course of 5 years, the TCO of the scale-up solution, considering only TCA and basis admin costs would be roughly equivalent to the cost of the scale-out solution.  Systems, storage and network administration costs could push the TCO of the scale-out solution up relative to the scale-up solution.

And then there is performance.  Some very high performance network adapter companies have been able to drive TCP latency across a 10Gb Ethernet down to 3.6us which sounds really good until you consider memory latency is around 120ns, i.e. 30 times faster.  Joining tables across nodes not only is substantially slower, but also results in more CPU and memory overhead.[ii]  A retailer in Switzerland, Coop Group, reported 5 times quicker analytics while using 85% fewer cores after migrating from an 8-node x86 scale-out BW HANA cluster with 320 total cores to a single scale-up 96-core IBM Power Systems.[iii]  While various benchmarks suggest 2x or better per core performance of Power Systems vs. x86, the results suggest far higher, much of which can, no doubt, be attributed to the effect of using a scale-up architecture.

Of course, performance is relative.  BW queries run with scale-out HANA will usually outperform BW on a conventional DB by an order of magnitude or more.  If this is sufficient for business purposes, then it may be hard to build a case for why faster is required.  But end users have a tendency to soak up additional horsepower once they understand what is possible.  They do this in the form of more what-if analyses, interactive drill downs, more frequent mock-closes, etc.

If the TCO is similar or better and a scale-up approach delivers superior performance with many fewer headaches and calls to the help desk for intermittent performance problems, then it would be very worthwhile to investigate this option.


To recap; For BW HANA and similar analytical environments, Scale-out architectures usually offer the lowest TCA and scalability beyond the largest scale-up environment.  Scale-up architectures offers significantly easier administration, much better performance and competitive to superior TCO.


[ii] FAQ 8)


July 9, 2018 Posted by | Uncategorized | , , , , , , , , , , , | 3 Comments

HANA on Power hits the Trifecta!

Actually, trifecta would imply only 3 big wins at the same time and HANA on Power Systems just hit 4 such big wins.

Win 1 – HANA 2.0 was announced by SAP with availability on Power Systems simultaneously as with Intel based systems.[i]  Previous announcements by SAP had indicated that Power was now on an even footing as Intel for HANA from an application support perspective, however until this announcement, some customers may have still been unconvinced.  I noticed this on occasion when presenting to customers and I made such an assertion and saw a little disbelief on some faces.  This announcement leaves no doubt.

Win 2 – HANA 2.0 is only available on Power Systems with SUSE SLES 12 SP1 in Little Endian (LE) mode.  Why, you might ask, is this a “win”?  Because true database portability is now a reality.  In LE mode, it is possible to pick up a HANA database built on Intel, make no modifications at all, and drop it on a Power box.  This removes a major barrier to customers that might have considered a move but were unwilling to deal with the hassle, time requirements, effort and cost of an export/import.  Of course, the destination will be HANA 2.0, so an upgrade from HANA 1.0 to 2.0 on the source system will be required prior to a move to Power among various other migration options.   This subject will likely be covered in a separate blog post at a later date.  This also means that customers that want to test how HANA will perform on Power compared to an incumbent x86 system will have a far easier time doing such a PoC.

Win 3 – Support for BW on the IBM E850C @ 50GB/core allowing this system to now support 2.4TB.[ii]  The previous limit was 32GB/core meaning a maximum size of 1.5TB.  This is a huge, 56% improvement which means that this, already very competitive platform, has become even stronger.

Win 4 – Saving the best for last, SAP announced support for Suite on HANA (SoH) and S/4HANA of up to 16TB with 144 cores on IBM Power E880 and E880C systems.ii  Several very large customers were already pushing the previous 9TB boundary and/or had run the SAP sizing tools and realized that more than 9TB would be required to move to HANA.  This announcement now puts IBM Power Systems on an even footing with HPE Superdome X.  Only the lame duck SGI UV 300H has support for a larger single image size @ 20TB, but not by much.  Also notice that to get to 16TB, only 144 cores are required for Power which means that there are still 48 cores unused in a potential 192 core systems, i.e. room for growth to a future limit once appropriate KPIs are met.  Consider that the HPE Superdome X requires all 16 sockets to hit 16TB … makes you wonder how they will achieve a higher size prior to a new chip from Intel.

Win 5 – Oops, did I say there were only 4 major wins?  My bad!  Turns out there is a hidden win in the prior announcement, easily overlooked.  Prior to this new, higher memory support, a maximum of 96GB/core was allowed for SoH and S/4HANA workloads.  If one divides 16TB by 144 cores, the new ratio works out to 113.8GB/core or an 18.5% increase.  Let’s do the same for HPE Superdome X.  16 sockets times 24 core/socket = 384 cores.  16TB / 384 cores = 42.7GB/core.  This implies that a POWER8 core can handle 2.7 times the workload of an Intel core for this type of workload.  Back in July, I published a two-part blog post on scaling up large transactional workloads.[iii]  In that post, I noted that transactional workloads access data primarily in rows, not in columns, meaning they traverse columns that are typically spread across many cores and sockets.  Clearly, being able to handle more memory per core and per socket means that less traversing is necessary resulting in a high probability of significantly better performance with HANA on Power compared to competing platforms, especially when one takes into consideration their radically higher ccNUMA latencies and dramatically lower ccNUMA bandwidth.

Taken together, these announcements have catapulted HANA on IBM Power Systems from being an outstanding option for most customers, but with a few annoying restrictions and limits especially for larger customers, to being a best-of-breed option for all customers, even those pushing much higher limits than the typical customer does.




December 6, 2016 Posted by | Uncategorized | , , , , , , , , , , , , , , , , , , , | 3 Comments

ASUG Webinar next week – Scale-Up Architecture Makes Deploying SAP HANA® Simple

On October 4, 2016, Joe Caruso, Director – ERP Technical Architecture at Pfizer, will join me presenting on an ASUG Webinar.  Pfizer is not only a huge pharmaceutical company but, more importantly, has implemented SAP throughout their business including just about every module and component that SAP has to offer in their industry.  Almost 2 years ago, Pfizer decided to begin their journey to HANA starting with BW.  Pfizer is a leader in their industry and in the world of SAP and has never been afraid to try new things including a large scale PoC to evaluate scale-out vs. scale-up architectures for BW.  After completion of this PoC, Pfizer made a decision regarding which one worked better, proceeded to implement BW HANA and had their go-live just recently.  Please join us to hear about this fascinating journey.  For those that are ASUG members, simply follow this link.

If you are an employee of an ASUG member company, either an Installation or Affiliate member, but not registered for ASUG, you can follow this link to join at no cost.  That link also offers the opportunity for companies to join ASUG, a very worthwhile organization that offers chapter meetings all over North America, a wide array of presentations at their annual meeting during Sapphire, the BI+Analytics conference coming up in New Orleans, October 17 – 20, 2016, hundreds of webinars, not to mention networking opportunities with other member companies and the ability to influence SAP through their combined power.

This session will be recorded and made available at a later date for those not able to attend.  When the link to the recording is made available, I will amend this blog post with that information.

September 29, 2016 Posted by | Uncategorized | , , , , , , , | 5 Comments

Large scale-up transactional HANA systems – part 1

Customers which require Suite on HANA (SoH) and S/4HANA systems with 6TB of memory or less will find a wide variety of available options.  Those options do not require any specialized type of hardware, just systems that can scale up to 8 sockets with Intel based systems and up to 64 cores with IBM Power Systems (socket count depends on the number of active cores per socket which varies by system).  If you require 6TB or less or can’t imagine ever needing more, then sizing is a fairly easy process, i.e. look at the sizing matrix from SAP and select a system which meets your needs.  If you need to plan for more than 6TB, this is where it gets a bit more challenging.   The list of options narrows to 5 vendors between 6TB and 8TB, IBM, Fujitsu, HPE, SGI and Lenovo and gets progressively smaller beyond that.

All systems with more than one socket today are ccNUMA, i.e. remote cache, memory and I/O accesses are delivered with more latency and lower bandwidth than local to the processor accesses.   HANA is highly optimized for analytics, which most of you probably already know.  The way it is optimized may not be as obvious.  Most tables in HANA are columnar, i.e. every column in a table is kept in its own structure with its own dictionary and the elements of the column are replaced with a very short dictionary pointer resulting in outstanding compression, in most cases.  Each column is placed in as few memory pages as possible which means that queries which scan through a column can run at crazy fast speeds as all of the data in the column is as “close” as possible to each other.  This columnar structure is beautifully suited for analytics on ccNUMA systems since different columns will typically be placed behind different sockets which means that only queries that cross columns and joins will have to access columns that may not be local to a socket and, even then, usually only the results have to be sent across the ccNUMA fabric.  There was a key word in the previous sentence that might have easily been missed: “analytics”.  Where analytical queries scan down columns, transactional queries typically go across rows in which, due to the structure of a columnar database, every element in located in a different column, potentially spanning across the entire ccNUMA system.  As a result, minimized latency and high cross system bandwidth may be more important than ever.

Let me stop here and give an example so that I don’t lose the readers that aren’t system nerds like myself.  I will use a utility company as an example as everyone is a utility customer.  For analytics, an executive might want to know the average usage of electricity on a given day at a given time meaning the query is composed of three elements, all contained in one table: usage, date and time.    Unless these columns are enormous, i.e. over 2 Billion rows, they are very likely stored behind a single socket with no remote accesses required.  Now, take that same company’s customer care center, where a utility consumer wants to turn on service, report an outage or find out what their last few months or years of bills have been.  In this case, all sorts of information is required to populate the appropriate screens, first name, last name, street address, city, state, meter number, account number, usage, billed amount and on and on.  Scans of columns are not required and a simple index lookup suffices, but every element is located in a different column which has to be resolved by an independent dictionary lookup/replacement of the compressed elements meaning several or several dozen communications across the systems as the columns are most likely distributed across the system.  While an individual remote access may take longer, almost 5x in a worst case scenario[i], we are still talking nanoseconds here and even 100 of those still results in a “delay” of 50 microseconds.  I know, what are you going to do while you are waiting!  Of course, a utility customer is more likely to have hundreds, thousands or tens of thousands of transactions at any given point in time and there is the problem.  An increased latency of 5x for every remote access may severely diminish the scalability of the system.

Does this mean that is not possible to scale-up a HANA transactional environment?  Not at all, but it does take more than being able to physically place a lot of memory in a system to be able to utilize it in a productive manner with good scalability.  How can you evaluate vendor claims then?  Unfortunately, the old tried and true SAP SD benchmark has not been made available to run in HANA environments.  Lacking that, you could flip a coin, believe vendor claims without proof or demand proof.  Clearly, demanding proof is the most reasonable approach, but what proof?  There are three types of proof to look at: past history with large transactional workloads, industry standard benchmarks and architecture.

In the over 8TB HANA space, there are three competitors; HPE Superdome X, SGI UV 300H and IBM Power Systems E870/E880.  I will address those systems and these three proof points in part 2 of this blog post.


July 1, 2016 Posted by | Uncategorized | , , , , , , , , , , , , , , , | 3 Comments