SAPonPower

An ongoing discussion about SAP infrastructure

Support for HANA on Power with RedHat is finally here!!

SAP made a very exciting announcement this past Friday.  While it took a bit longer than expected, support for RHEL 7.3 with HANA on Power was announced by SAP in their usual overwhelming manner, i.e. they updated an SAP note: SAP Note 2235581, SAP HANA: Supported Operating Systems.  RHEL is only supported on Power in Little Endian mode, i.e. it only works with HANA 2.0.  This support is incredibly important for customers that have established RHEL as their standard for Linux and were either reluctant to introduce a different Linux distribution or were outright forbidden to by their corporate standards.  Taken together with the TDI Phase 5 SAPS based sizing announcement, yet another element that was inhibiting the already explosive growth of HANA on Power was removed.  I described that announcement as allowing the use of 5th gear after being limited to only 4.  Taking this metaphor a step further, RHEL support is like disengaging the parking brake.  I should mention that IBM does not develop a Linux variant of their own nor do they endorse any particular variety.  As such, I will not suggest any advantage of running RHEL or SLES for HANA and recommend, if a company has no firm policy either way, that you ask each distro partner to explain why they think theirs is better for HANA than the other.  At this time, both are supported only with PowerVM on Power Systems and with the exact same limits and multi-tenant/multi-VM flexibility.


October 9, 2017 | Uncategorized | 4 Comments

TDI Phase 5 – SAPS based sizing bringing better TCO to new and existing Power Systems customers

SAP made a fundamental and incredibly important announcement this week at SAP TechEd in Las Vegas: TDI Phase 5 – SAPS based sizing for HANA workloads.  Since its debut, HANA has been sized based on a strict memory to core ratio determined by SAP based on workloads and platform characteristics, e.g. generation of processor, MHz, interconnect technology, etc.  This might have made some sense in the early days when much was not known about the loads that customers were likely to experience and SAP still had high hopes for enabling all customer employees to become knowledge workers with direct access to analytics.  Over time, with very rare exception, it turned out that CPU loads were far lower than the ratios might have predicted.

I have only run into one customer in the past two years that was able to drive a high utilization of their HANA systems and that was a customer running an x86 BW implementation with an impressively high number of concurrent users at one point in their month.  Most customers have experienced just the opposite, consistently low utilization regardless of technology.

For many customers, especially those running x86 systems, this has not been an issue.  First, it is not a significant departure from what many have experienced for years, even those running VMware.  Second, to compensate for relatively low memory and socket-to-socket bandwidth combined with high latency interconnects, many x86 systems work best with an excess of CPU.  Third, many x86 vendors have focused on HANA appliances which are rarely utilized with virtualization and are therefore often single instance systems.

IBM Power Systems customers, by comparison, have been almost universal in their concern about poor utilization.  These customers have historically driven high utilization, often over 65%.  Power has up to 5 times the memory bandwidth per socket of x86 systems (without compromising reliability) and very wide and parallel interconnect paths with very low latencies.  HANA has never been offered as an appliance on Power Systems, instead being offered only using a Tailored Datacenter Infrastructure (TDI) approach.  As a result, customers view on-premise Power Systems as a sort of utility, i.e. that they should be able to use them as they see fit and drive as much workload through them as possible while maintaining the Service Level Agreements (SLA) that their end users require.  The idea of running a system at 5%, or even 25%, utilization is almost an affront to these customers, but that is what they have experienced with the memory to core restrictions previously in place.

IBM’s virtualization solution, PowerVM, enabled SAP customers to run multiple production workloads (up to 8 on the largest systems) or a mix of production workloads (up to 7) with a shared pool of CPU resources within which an almost unlimited mix of VMs could run including non-prod HANA, application servers, as well as non-SAP and even other OS workloads, e.g. AIX and IBM i.  In this mixed mode, some of the excess CPU resource not used by the production workloads could be utilized by the shared-pool workloads.  This helped drive up utilization somewhat, but not enough for many.

These customers would like to do what they have historically done.  They would like to negotiate response time agreements with their end user departments then size their systems to meet those agreements and resize if they need more capacity or end up with too much capacity.

The newly released TDI Overview document http://bit.ly/2fLRFPb describes the new methodology: “SAP HANA quicksizer and SAP HANA sizing reports have been enhanced to provide separate CPU and RAM sizing results in SAPS”.  I was able to verify that Quicksizer shows SAPS, but not that the sizing reports do.  An SAP expert I ran into at TechEd suggested that getting the sizing reports to determine SAPS would be a tall order since they would have to include a database of SAPS capacity for every system on the market as well as number of cores and MHz for each one.  (In a separate blog post, I will share how IBM can help customers to calculate utilized SAPS on existing systems).  Customers are instructed to work with their hardware partner to determine the number of cores required based on the SAPS projected above.  The document goes on to state: “The resulting HANA TDI configurations will extend the choice of HANA system sizes; and customers with less CPU intensive workloads may have bigger main memory capacity compared to SAP HANA appliance based solutions using fixed core to memory sizing approach (that’s more geared towards delivery of optimal performance for any type of a workload).”

Using a SAPS based methodology will be a good start and may result in fewer cores required for the same workload than would have been previously calculated based on a memory/core ratio.  Customers that wish to allocate more or less CPU to those workloads will now have this option, meaning that an even more significant reduction of CPU may be possible.  This will likely result in much more efficient use of CPU resources, more capacity available to other workloads and/or the ability to size systems with fewer resources to drive down the cost of those systems.  Either way helps drive much better TCO by reducing numbers and sizes of systems with the associated datacenter and personnel costs.

Existing Power customers will undoubtedly be delighted by this news.  Those customers will be able to start experimenting with different core allocations and most will find they are able to decrease their current HANA VM sizes substantially.  With the resources no longer required to support production, other workloads currently implemented on external systems may be consolidated to the newly right-sized system.  Application servers, central services, Hadoop, HPC, AI, etc. are candidates to be consolidated in this way.

Here is a very simple example:  A hypothetical customer has two production workloads, BW/4HANA and S/4HANA which require 4TB and 3TB respectively.  For each, HA is required as is Dev/Test, Sandbox and QA.  Prior to TDI Phase 5, using Power Systems, the 4TB BW system would require roughly 82-cores due to the 50GB/core ratio and the S/4 workload would require roughly 33 cores due to the 96GB/core ratio.  Including HA and non-prod, the systems might look something like:

[Figure: TDI Phase 4]

Note the relatively small number of cores available in the shared pool (might be less than optimal) and the total number of cores in the system. Some customers may have elected to increase to an even larger system or utilize additional systems as a result.  As this stood, this was already a pretty compelling TCO and consolidation story to customers.
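The Phase 4 core counts quoted above follow directly from the memory-to-core ratios.  Here is a minimal sketch of that arithmetic, assuming binary terabytes (1TB = 1024GB) and rounding up to whole cores; the function name is mine and purely illustrative.  Under those assumptions BW comes out at 82 cores and S/4 at 32, close to the rough figures in the example.

```python
import math

GB_PER_TB = 1024  # assumption: binary terabytes


def cores_from_ratio(memory_tb, gb_per_core):
    """Cores implied by a fixed memory-to-core ratio, rounded up to whole cores."""
    return math.ceil(memory_tb * GB_PER_TB / gb_per_core)


bw_cores = cores_from_ratio(4, 50)  # BW/4HANA, 4TB at 50GB/core
s4_cores = cores_from_ratio(3, 96)  # S/4HANA, 3TB at 96GB/core
print(bw_cores, s4_cores)
```

The point is simply that, before TDI Phase 5, memory size alone dictated the core count, regardless of how busy those cores turned out to be.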

With SAPS based sizing, the BW workload may require only 70 cores and S/4 21 cores (both are guesses based on early sizing examples and proper analysis of the SAP sizing reports and per core SAPS ratings of servers is required to determine actual core requirements).  The resulting architecture could look like:

[Figure: TDI Phase 5 est]

Note the smaller core count in each system.  By switching to this methodology, lower cost CPU sockets may be employed and processor activation costs decreased by 24 cores per system.  But the number of cores in the shared pool remains the same, so that could still be improved a bit.
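The 24-core figure is just the sum of the per-workload reductions in this example.  A small sketch, using the rough core counts quoted in the text (the Phase 5 numbers are, as stated, guesses and not sizing results):

```python
# Core counts taken from the hypothetical example in the text.
phase4_cores = {"BW/4HANA": 82, "S/4HANA": 33}      # memory/core ratio sizing
phase5_est_cores = {"BW/4HANA": 70, "S/4HANA": 21}  # guessed SAPS based sizing

# Cores no longer needed per workload, and in total.
saved = {name: phase4_cores[name] - phase5_est_cores[name] for name in phase4_cores}
total_saved = sum(saved.values())
print(saved, total_saved)
```

Each workload gives back 12 cores, which is where the 24 fewer activations come from.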

During a landscape session at SAP TechEd in Las Vegas, an SAP expert stated that customers will be responsible for performance and CPU allocation will not be enforced by SAP through HWCCT as had been the case in the past.  This means that customers will be able to determine the number of cores to allocate to their various instances.  It is conceivable that some customers will find that instead of the 70 cores in the above example, 60, 50 or fewer cores may be required for BW with decreased requirements for S/4HANA as well.  Using this approach, a customer choosing this more hypothetical approach might see the following:

[Figure: TDI Phase 5 hyp]

Note how the number of cores in the shared pool has increased substantially, allowing for more workloads to be consolidated to these systems, further decreasing costs by eliminating those external systems as well as being able to consolidate more SAN and Network cards, decreasing computer room space and reducing energy/cooling requirements.

A reasonable question is whether these same savings would accrue to an x86 implementation.  The answer is not necessarily.  Yes, fewer cores would also be required, but to take advantage of a similar type of consolidation, VMware must be employed.  And if VMware is used, then a host of caveats must be taken into consideration.  1) Overhead, reportedly 12% or more, must be added to the capacity requirements.  2) I/O throughput must be tested to ensure that load times, log writes, savepoints, snapshots and backup speeds are acceptable to the business.  3) Limits must be understood, e.g. max memory in a VM is 4TB which means that BW cannot grow by even 1KB.  4) Socket isolation is required as SAP does not permit the sharing of a socket in a HANA production/VMware environment, meaning that reducing core requirements may not result in fewer sockets, i.e. this may not eliminate underutilized cores in an Intel/VMware system.  5) Non-prod workloads can’t take advantage of capacity not used by production for several reasons, not the least of which is that SAP does not permit sharing of sockets between VM prod and non-prod instances, not to mention the reluctance of many customers to mix prod and non-prod using a software hypervisor such as VMware even if SAP permitted this.  Bottom line is that most customers, through an abundance of caution, or actual experience with VMware, choose to place production on bare-metal and non-prod, which does not require the same stack as prod, on VMware.  Workloads which do require the same stack as prod, e.g. QA, also are usually placed on bare-metal.  After closer evaluation, this means that TDI Phase 5 will have limited benefits to x86 customers.
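To illustrate caveats 1 and 4 together, here is a rough sketch of how hypervisor overhead plus whole-socket rounding can swallow a core reduction.  The 12% overhead comes from the text; the 24-core socket and the reuse of the 82 and 70 core BW figures from the Power example are my assumptions for illustration only, not x86 sizing results.

```python
import math

VMWARE_OVERHEAD_PCT = 12  # "reportedly 12% or more" per the text
CORES_PER_SOCKET = 24     # assumption: a 24-core x86 socket


def sockets_needed(workload_cores):
    """Whole sockets required once hypervisor overhead is added.

    Production HANA VMs may not share a socket, so the result is
    rounded up to a full socket.
    """
    cores_with_overhead = math.ceil(workload_cores * (100 + VMWARE_OVERHEAD_PCT) / 100)
    return math.ceil(cores_with_overhead / CORES_PER_SOCKET)


sockets_before = sockets_needed(82)  # illustrative ratio-based core count
sockets_after = sockets_needed(70)   # illustrative SAPS based core count
print(sockets_before, sockets_after)
```

Under these assumptions both cases land on the same socket count, so the 12 cores saved simply sit idle inside sockets that cannot be shared.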

This announcement is the equivalent of finally being allowed to use 5th gear on your car after having been limited to only 4 for a long time.  HANA on IBM Power Systems already had the fastest adoption in recent SAP history with roughly 950 customers selecting HANA on Power in just 2 years. TDI Phase 5 uniquely benefits Power Systems customers which will continue the acceleration of HANA on Power.  Those individuals that recommended or made decisions to select HANA on Power will look like geniuses to their CFOs as they will now get the equivalent of new systems capacity at no cost.

September 29, 2017 | Uncategorized | 3 Comments

HPE, still playing fast and loose with the facts about SAP HANA on Power

Writing a blog post would be so much simpler if IBM permitted me to lie, but that is prohibited.  That is clearly not the case at HPE, see this recent blog post: https://community.hpe.com/t5/Alliances/SAP-HANA-runs-best-on-x86-Period/ba-p/6971659#.WZxDPa01TxW

It contains so many lies, it is hard to know where to start.  Let’s start with the biggest one.  “There is a 10x difference in performance KPIs required by SAP to certify and ship a HANA appliance vs. a solution certified for TDI only.”

You really have to love those lies that are refuted by such easily obtained facts from documentation that is apparently not used by HPE called SAP Notes.  SAP note 1943937 specifically states: “All HWCCT tests of appliances (compute servers) certified with scenario HANA-HWC-AP SU 1.1 or HANA-HWC-AP RH 1.1 must use HWCCT of SAP HANA SPS10 or higher or a related SAP HANA revision”.  Interesting that appliances must use the same HWCCT test of SAP KPIs as used by TDI.  So, based on HPE’s blog post, does this mean that if an HPE appliance compute server is used for TDI, it will perform 10x worse than if it is used in an appliance?  That would imply that the secret sauce of HPE’s appliances is so incredible that it acts like a dual turbocharger on a car!

The blog post goes on to say “It (Power) ‘works’ … but it is just held to a ~10x lower standard without any of the performance optimizations attributed to SAP’s co-innovation efforts with Intel.”  OK, two lies in one sentence, obviously going for the gold here.  As we have already discussed the 10x lie, let us just look at the second one, i.e. performance optimizations.  Specifically, the blog calls out AVX and TSX as the performance optimizations for the Intel platform.  They are correct, those optimizations don’t work on Power as, instead of AVX (Advanced Vector Extensions), Power has two fully symmetric vector pipelines called via VSX (Vector-Scalar eXtensions) instructions which HANA has been “optimized” to use in the same manner as AVX.  And TSX, a.k.a. Transactional Synchronization Extensions, came out after POWER8 Transactional Memory, but HANA was optimized for both at the same time.

The blog post also stated “Intel E7v3 CPUs for HANA (TSX and AVX) that offer a 5x performance boost over older Intel E7 or Power 8 CPUs.”  Awesome, but where is the proof behind this statement?  Perhaps a benchmark?  Nope, not one published on SAP’s site; not even the old SD benchmark backs up this claim (it shows, by the way, almost the same SAPS/core for E7v3 vs. E7v2 and way less than POWER8, but maybe that benchmark does not use those optimizations?).  Ok, then maybe the sizing certifications show this?  Nope.  A 4 socket CS500 Ivy Bridge system (E7v2) is published as supporting up to a 2TB SoH solution where a 4 socket CS500 Haswell system (E7v3) is shown as supporting up to a 3TB SoH solution.  So far, that is just 50% more, not 5x, but perhaps HPE can’t tell the difference between 0.5 and 5.0?  But didn’t Haswell have more cores per socket?  Yes, it had 18 cores/socket vs. 15 cores/socket for Ivy Bridge, i.e. 20% more cores/socket.  So, per core, E7v3 based systems could actually host 25% more memory and associated workload than E7v2 based systems.  Of course, I am sure that the switch from DDR3 to DDR4 from E7v2 to E7v3 had nothing to do with this performance improvement.  So, 5x performance boost is clearly nothing but a big fat lie.
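For anyone who wants to check the 50%, 20% and 25% figures, here is the arithmetic as a short sketch.  It uses only the numbers quoted above (2TB vs. 3TB, 15 vs. 18 cores per socket); the variable names are mine.

```python
from fractions import Fraction

# Certified SoH memory for the 4 socket CS500: 3TB (E7v3) vs. 2TB (E7v2).
memory_gain = Fraction(3, 2)

# Cores per socket: 18 (Haswell, E7v3) vs. 15 (Ivy Bridge, E7v2).
core_gain = Fraction(18, 15)

# Memory supported per core is the ratio of the two gains.
per_core_gain = memory_gain / core_gain

print(f"memory: +{float(memory_gain - 1):.0%}")
print(f"cores: +{float(core_gain - 1):.0%}")
print(f"memory per core: +{float(per_core_gain - 1):.0%}")
```

Exact fractions are used so that the 25% result is not blurred by floating point rounding.  Whichever way you slice it, nothing here gets anywhere near 5x.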

And the hits just keep coming.  The next statement is just lovely “Naturally, this only matters if you want to be able to call SAP support to get help on nuance performance issues impacting your productive SAP HANA deployment.“  I guess he is trying to suggest that SAP won’t help you with performance problems if you are running any TDI solution including Power Systems, except this is contradicted by all of the SAP notes about TDI and HWCCT, not to mention the experience of customers who have implemented HANA on Power.

“IBM wants to be the king of legacy businesses like mainframe and UNIX. That’s pretty much the only platforms they have left. So now that they can state that HANA “works” on Power, they can make a case to their AIX/Power customers that they should stay on AIX/Power for SAP and HANA and avoid what IBM claims to be an ‘oh so painful’ Unix to x86 migration.”  In suggesting that you can run HANA on AIX/Power, when HANA only runs on Linux on Power, HPE might not be telling a lie but might just be expressing ignorance and the inability to use sophisticated and obscure search tools like Google.  The suggestion that IBM has said that an SAP heterogeneous migration is “oh so painful” ignores the fact that we have done hundreds of such migrations from HP/UX among others.  Perhaps the author is reflecting HPE’s migration experience with what every other migration provider sees as a very well understood and fully supported SAP process.  As to IBM’s motivation, HPE is trying to suggest that IBM is only in the HANA business to support its legacy SAP on AIX/Power.  Just looking at the thriving business of HANA on Power, over 850 HANA on Power wins since becoming a supported HANA provider, might suggest otherwise.  How about the complete absence of any quotes, marketing materials or other documentation that shows that IBM is in this market for any other reason than it is the future of SAP and IBM intends to remain a premier partner of SAP and our customers?

Taken together, this blog post shows that HPE must be using the advanced Skylake processors in their Superdome-X and MC990 X (SGI UV 300H) to generate lies at an astounding pace.  Oh wait, I forgot (not really) that HPE still has not announced support for Skylake in their high end systems and is only certified for BWoH, not SoH or S/4HANA, with Skylake and only up to 3TB at that, … one month after announcement of Skylake!  Wow, I guess this shows simultaneously how much their pace of technology innovation has slowed down and partnership with SAP has decreased!

And, let’s end on their last amazing sentence, “Every day you put off that UNIX to x86 migration, you are running HANA in a performance degraded mode with production support limitations.”  Yes, HPE seems to be recommending that you migrate your UNIX based systems, i.e. those running Business Suite 7, to x86.  In other words, do two migrations, once from UNIX to x86 and a second to HANA.  Sounds like a total disconnect from business reality!  As to the second part of that sentence, it simply does not make sense, so we are not going to attribute that to a lie, but to a simple logic error.

What are we to conclude about this blog post?  Taken on its own, it is a rogue employee.  Taken with the deluge of other similar misleading statements and outright lies emerging from HPE, we see a trend.  HPE has gone from being a well-respected systems supplier to a struggling company that promotes and condones lies and hires less than competent individuals to propagate misinformation.  Sounds more like a failing communist state than a company worthy of a customer’s trust.

August 22, 2017 | Uncategorized | 4 Comments

Intel Skylake has been announced and the self-described HANA “market leader”, HPE, is curiously trailing the field

Intel announced general availability of their “Skylake” processor on the “Purley” platform last week.  Soon after, SAP posted certified HANA configurations for Lenovo and Fujitsu up to 8 sockets and 12TB memory for Suite on HANA (SoH) and S/4HANA (S4) and 6TB for BW on HANA (BWoH).  They also posted certified configurations for Dell and Cisco up to 4-socket systems with 6TB SoH/S4 and 3TB BWoH.  The certified configurations posted for HPE, which describes itself as the HANA market leader, only included up to 4-socket/3TB BWoH configurations, no configurations for SoH/S4 and nothing for any larger systems.

It is still early and more certified configurations will no doubt emerge over time, but these early results do beg the question, “what is going on with HPE?”  I checked the most recent press releases for HPE and they did not even mention the Skylake debut much less their certification with SAP HANA.  If you Google using the keywords, HPE, Skylake and HANA, you may find a few discussions about HPE’s acquisition of SGI and my previous blog posts with my speculation about Superdome’s demise and HPE’s misleading of customers about this impending event, but nothing from HPE.

So, I will share a little more speculation as to what this slow start for HPE in the Skylake space might portend.

Option 1 – HPE is not investing the funds necessary to certify all of their possible configurations and SoH/S4.  Anyone that has been involved with the HANA certification process will tell you that it is very time consuming and expensive.  As you can see from HPE’s primary Intel based competitors, they are all very eager to increase their market share and acted quickly.  Is HPE becoming complacent?  Are they having financial restrictions that have not been made public?

Option 2 – HPE’s technology limitations are becoming apparent.  The Converged System 500 is based on Proliant DL560/580 systems which support a maximum of 4 sockets.  These systems utilize Intel QPI and now UPI interconnect technologies, i.e. no custom ASICs or ccNUMA switches are required.  The CS900 based on the Superdome X and the MC990 X (SGI UV 300H) utilize custom ASICs and, in the case of Superdome X, a set of ccNUMA switches.  As I speculated previously, Superdome X is probably at end of life, so it may never see another certification on SAP’s HANA site.  As to the MC990 X, the crystal ball is a bit more hazy.  Perhaps HPE is trying to shoot for the moon and hit a number beyond the 20TB for SoH/S4 that is currently supported meaning a much longer and more complex set of certification tests.  Or perhaps they are running into technical challenges with the new ASICs required to support UPI.

Option 3 – MC990 X is going to officially become HPE’s only high end offering to support Skylake and subsequent processors and Superdome X is going to be announced at end of life.  If this were to happen, it would mean that anyone that had recently purchased such a system would have purchased a system that is immediately obsolete.

If Option 1 turns out to be true, one would have to be concerned about HPE’s future in the HANA space.  If Option 2 turns out to be true, one would have to be really concerned about HPE’s future in the HANA space.  And if Option 3 turns out to be true, why would HPE be waiting?  The answer may be inventory.  If HPE has a substantial inventory of “old” Broadwell based blades and Superdome X chassis, they will undoubtedly want to unload these at the highest price possible and they know that the value of obsolete systems after such an announcement would drop below the cost of manufacturing.

So, you pick the most likely scenario.  Worst case for HPE is that they are just a little slow or shooting too high.  Worst case for customers is that they purchase a HANA system based on Superdome X and end up with a few hundred thousand dollar boat anchor.  If you work for a company considering the purchase of an HPE Superdome X solution, you may want to ask about its future and, if you find it is at end of life, select another solution for your SAP HANA requirements.

Inevitably, more systems will be published on SAP’s certification page, https://www.sap.com/dmc/exp/2014-09-02-hana-hardware/enEN/appliances.html#viewcount=100&categories=certified%2CIntel%20Skylake%20SP .  When that happens, especially if any of my predictions turn out to be true or if they are all wrong and another scenario emerges, I will post an update.

July 20, 2017 | Uncategorized | Leave a comment

3D XPoint Memory – The best thing for SAP HANA since HANA was invented?

At #SapphireNow, the Intel booth was all atwitter about the new “game changer”, “revolutionary”, “future of computing”, “best thing since the wheel” (ok, I made that last one up).  Yes, they were thrilled with 3D XPoint Optane memory.[i]  It is being positioned as persistent memory, like SSD but much faster and which can take the place of real, a.k.a. DRAM, memory … eventually.  Paraphrasing them, “You will be able to replace conventional memory with 3D XPoint memory at almost the same speed but which gives you the ability to restart your system after failure in a matter of seconds, not minutes or hours, because the entire HANA image will be stored in persistent memory, not on disk or SSDs.”

This sounds fantastic as long as we completely ignore reality.  Let’s dissect the above sentence.

“almost the same speed” – current speculation is that 3DXPoint memory will be about 10 times slower than conventional memory.  That is WAY better than external SSD storage, which is around 1000 times slower, but for memory resident applications, like HANA, 10 times slower will result in at least a 10x performance reduction for HANA.  Remember, we have no idea how this might affect an application which expects very fast access to memory.

“restart your system after failure” – silly me, I thought the idea was to prevent failure in the first place.  I am curious how often system failure is caused by memory errors or any other cause for which diagnostics might be required to evaluate the underlying problem as well as a repair action to fix that problem.  Then the question is in which scenario is a customer willing to circumvent diagnostics and return the system to productive use.  This also assumes that customers are willing to run mission critical systems without any sort of HA solution such as HANA System Replication or HANA Host Auto-Failover.  The use of an HA solution would fail-over production to a secondary system which means that any memory image on the primary system would be out of date almost instantly.

“restart … in seconds” – So, your system has failed for unknown reasons and you are willing to forgo any sort of evaluation of the underlying cause.  So far so good.  So, Linux is capable of restarting and keeping the memory image as it was beforehand and utilizing persistent main memory?  Not entirely, but with RHEL 7.3 (not supported for HANA yet), using special device drivers, applications may be rewritten to utilize “pmem” for pseudo storage devices.[ii]  And HANA is capable of restarting as well from whatever point it was in at the time of failure.  I also did not know HANA could do this and am surprised that SAP prioritized fast restart ahead of the long laundry list of customer provided requirements … which I doubt they did.  And HANA can figure out what transactions were in flight at the time of failure, which ones had made some changes to memory, but not all, e.g. started to insert data into a delta table but perhaps had not completed this action at time of failure?  Totally wicked!! … and total fantasy, at least for now.

You can easily imagine a variety of other conditions where columns are being updated, e.g. during a delta merge, but have not finished in which some columns contain updated elements and others do not.  I am not saying these are insurmountable problems, but considering that you can’t even make a change to the size of a HANA system without restarting HANA currently, it is a massive stretch to imagine how SAP has or is willing to invest the time and effort to make this work for a highly questionable benefit with likely severe performance degradation.

So, 3D XPoint memory as a replacement for conventional memory is clearly all hype, but don’t expect anyone from Intel or their proponents to tell you this.  How about as a technology for much faster SSDs?  Now we are talking!  I doubt there is any reason why this will not be quickly adopted by disk subsystem vendors and available from multiple sources.

As to whether HANA workloads will benefit, that is a different story.  Remember, HANA is a read-once workload.  Once a column is loaded into memory, it is never read from storage again until it is unloaded, and this should only occur if the memory subsystem is undersized or the system is restarted after maintenance.  So, fast storage is useful for restarts, but super-fast storage is only needed when a system must return to full operation after maintenance very quickly and without any performance degradation, i.e. every column loaded into memory, in 10 minutes or so.  Just as a point of comparison, IBM ran a test with 10 NVMe cards and delivered about 1TB per minute when restarting HANA.  To the best of my knowledge, few customers have expressed more than a passing interest in this capability.  I could imagine a scenario in which customers are willing to put a somewhat recent tier of data, e.g. 1 to 2 year old data, on persistent main memory, with perhaps external, and orders of magnitude slower, storage used for older data.  Once again, this is a nice concept but until SAP writes or adopts code to enable this, it is just a theory.
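To put the 1TB per minute figure in perspective, here is a back-of-the-envelope sketch.  The 4TB system size is borrowed from the BW example in an earlier post and the function names are hypothetical; real reload times depend on far more than raw storage throughput.

```python
def reload_minutes(memory_tb, tb_per_minute=1.0):
    """Minutes to reload all columns at a given storage throughput."""
    return memory_tb / tb_per_minute


def required_tb_per_minute(memory_tb, target_minutes):
    """Throughput needed to reload all columns within a target window."""
    return memory_tb / target_minutes


print(reload_minutes(4))              # 4TB at the measured ~1TB/minute
print(required_tb_per_minute(4, 10))  # throughput needed for a 10 minute target
```

In other words, for a system of that size, the NVMe result already beats a 10 minute target with room to spare, which is why even faster storage buys so little here.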

As to writes, most enterprise storage subsystems can deliver response times that are twice as fast as SAP requires.  IBM SVC (SAN Volume Controller) connected to an IBM Power System has been tested in real customer installations and has delivered the fastest times of any storage subsystem in the industry with a peak latency of only 161us (microseconds) for 4K block size log writes as measured by HWCCT or over 6 times better latency than what SAP requires.   SVC is part of a family of products including V7000, V9000 and Spectrum Virtualization Software which all utilize similar concepts and software.

In other words, you don’t have to wait for tomorrow to get fast restarts and minimized transactional log writes, you just need to select the write infrastructure partner, IBM.

[i] https://www.theregister.co.uk/2017/05/17/coming_xeon_sps_will_run_sap_hana_16_times_faster/
[ii] https://developers.redhat.com/blog/2016/12/05/configuring-and-using-persistent-memory-rhel-7-3/

June 12, 2017 | Uncategorized | 4 Comments

There they go again, HPE misleading customers about Superdome X futures

When the acquisition of SGI by HPE was announced last year, I was openly skeptical about HPE’s motives and wrote a blog post to discuss the potential reasons and implications behind this decision. I felt pretty certain that HPE would not retain both Superdome X (SD-X) and the high end SGI system, now called MC990 X as they play in the same space. I speculated that the MC990 X would not be the winner because it has inferior memory reliability technology and uses a hand wired matrix to interconnect different nodes compared to SD-X which uses a backplane to connect nodes to switches for connectivity to other nodes.

At SapphireNow this past week, I learned, from a couple of different sources, that Intel’s next generation platform, “Purley” using “Skylake” processors, includes so many significant changes that SD-X will very likely not be able to support them. Those changes include a new chip level interconnect technology “UltraPath” (UPI), as a follow on to QPI, faster memory and a larger footprint which, according to pictures available on the internet, appears to be at least 25% larger. This makes sense since the high end Skylake processor will have up to 28 cores versus Broadwell’s 24 cores and includes 25% more L3 cache but uses the same 14nm lithography. Though specs are limited, my sources also believe this new platform will require more power and cooling than the “Broadwell” chips used in SD-X today. SD-X cell boards are already pretty compact nodes, which made sense when they were trying to deliver a “converged” solution, but which means that any increase in footprint of any component could push these boards beyond their limits. Clearly, the UPI change alone would require HPE’s XNC2 cell controller to be redesigned even if no power or cooling limits were exceeded. Considering how few of these chips are used per year, it would be impractical for HPE to maintain a development program for both XNC2 (and the associated SX3000 switch) and the SGI acquired NUMALink7 cell controllers. The speculation is therefore that SD-X will, in fact, be the loser and HPE will utilize the MC990 X as the go forward, high end platform.

If this turns out to be a correct conclusion, then a lot of very large customers are likely to be in for a major shock since they have purchased SD-X with the expectation of being able to upgrade it and add on like architecture systems for their SAP HANA workloads as they grow over time as well as for other large memory applications. Had they been informed about the upcoming obsolescence of SD-X, many of these customers may have made a different purchasing decision. I am not convinced they would have purchased the MC990 X instead for a variety of reasons, not the least of which is the lack of robust memory protection, available with SD-X, known as RAS, Lockstep or DDDC+ mode, but not with MC990 X. Lacking this, the failure of a chip on a memory DIMM could result in a HANA failure, system failure or the need to immediately shut down the system to diagnose and fix the failed DIMM. Most SAP customers running Suite on HANA or S/4HANA are unlikely to find this level of protection to be acceptable, especially when they are running massive HANA transactional systems.

On the subject of reliability, one of the highest exposures that any system has is its physical connections, and the MC990 X has a lot of them to form the ccNUMA mesh. Each cable has a connector on each end, and each connector is inserted into the appropriate port on one of a pair of node controllers. Any time a cable is inserted into any sort of receptacle, there is a possibility that it will fail due to the pressure required to insert it or corrosion. A system with 2 nodes has 8 such cables, 2 between the pair of controllers on each node and 4 between the nodes. A system with 5 nodes has approximately 50 such cables. While not necessarily catastrophic, a failure of one of these cables would require a planned outage as soon as possible as there would be a performance impact from the loss of the connection.

Another limitation of the MC990 X is the lack of virtualization technology. As each node in a MC990 X contains 4 sockets, the level of granularity of this system is so coarse that very poor utilization is likely to result and a massive increase in footprint will be required to handle the same set of workloads that might have been previously handled by SD-X. Physical partitioning is available, but based on one or more 4 socket nodes, such a waste as to be completely irrelevant.

If this speculation proves to be true, and HPE has not been sharing this roadmap with customers, then those customers have been convinced to invest in an obsolete platform and they should be furious. For those in the process of making a decision, they should be asking HPE the right questions about SD-X product futures and then taking that into account as well as the product weaknesses of MC990 X.

I should note that IBM has but one go forward architecture for Power Systems based on the on-chip interconnect technology that has been included and evolving since POWER4. Customers that invested in Power technology have seen a consistent 3 to 4 year cycle to the next generation with those who purchased high end models often having the option to upgrade from one generation to the next.

If you read my blog post last week, you may recall that I pointed out how HPE was lying to customers about HANA on IBM Power Systems.  Now we learn that they may also be doing the same sort of thing about their own products.  Methinks a trend is emerging here!?!?

May 23, 2017 Posted by | Uncategorized | 2 Comments

HPE, in an act of desperation, is spreading misinformation about SAP HANA on IBM Power Systems

Misinformation is a poor characterization of HPE’s behavior.  HPE, or some of its employees, are showing customers charts with a variety of statements which are simply untrue.  In any normal definition, this is called a lie.  This is unethical and unprofessional.  I will repeat what it says in my profile, these are my opinions, not a reflection of those of IBM.

You may have seen the blog post from Vicente Moranta: https://www.linkedin.com/pulse/truth-wild-lies-being-told-hanaonpower-vicente-moranta or my own post on IBM Systems Blog: https://www.ibm.com/blogs/systems/can-you-tell-hana-on-ibm-power-systems-fact-from-fiction/ . At IBM, we have this set of ethical rules called “IBM Business Conduct Guidelines” (BCG).  This 15 to 20 page document is required reading every year with a mandatory test to ensure comprehensive understanding of these rules.  I can boil down one of the most important themes into two words: DON’T LIE!

For those of you who have been reading this blog for a while, you may question whether I am too verbose and that may be fair as I thoroughly research each subject and include attribution for claims, usually including direct links to the source of those claims.  I would never think of making up “facts” and, on rare occasion when a reader has informed me of a mistake, I always correct the mistake as well as include a comment to that effect.

Some background:  A few weeks ago, a customer sent us a list of questions about SAP HANA on IBM Power Systems.  At first, the questions seemed bizarre as they included some very pointed misunderstandings about HANA and SAP in general and IBM’s role with SAP in particular.  As I read them more thoroughly, I realized that someone or some entity had coached the customer.  This was confirmed when I received a copy of an HPE presentation from a completely different source with almost identically worded statements.  By the way, back to the BCG, IBM employees are not allowed to view, much less share, information from a competitor marked confidential, and this presentation was not marked with anything, meaning it was being shown to customers with, or without, HPE management’s official knowledge.

Some of the lies it shares:

  • HPE has 99%+ share of the HANA market. It is kind of funny to note that this claim is contradicted in the same table where it shows 80% share for Intel.  I guess they are confusing SAP and SAP HANA markets which is misleading at best.  More importantly, SAP does not release market share information and even if they did, I think Lenovo, Cisco, Dell and Fujitsu might together claim more than 1% of the market.
  • IBM, not SAP “delivers” HANA code to customers because they have access to SAP code and have created a “special” version of SAP HANA. Wow, it is hard to figure out where to start here.  SAP owns HANA and only they distribute code.  They refused to support other operating systems than Linux, including AIX, for the very reason of wanting a common code tree for all platforms.  HPE is correct that IBM works closely with SAP to optimize HANA code, a fact which should be lauded not criticized.  Apparently, HPE must not have such a relationship and are jealous?  What HPE does not understand is that regardless of who, IBM, Intel or other, contributes code to SAP or suggests modifications to code, SAP makes all decisions regarding that code, including support, and incorporates it into the common code tree meaning all platforms can benefit if the code is not related to a specific, proprietary instruction set.  When Intel contributed code for TSX, Power HANA was not able to use this code, but with appropriate modifications, SAP was able to add the code to call IBM’s similar “Transactional Memory” calls.  Now, there is simple logic which ensures the appropriate call is made depending on the underlying processor architecture.  Likewise, when IBM saw that the huge number of threads in its architecture might push limits in HANA, it worked with SAP to improve the thread and workload dispatch mechanisms in HANA.  When Intel released their Broadwell-EX 24-core chips and SAP approved large socket counts, these systems would have hit the same threading issues, but with the new mechanisms already in place, were able to benefit from IBM & SAP’s joint effort.  Maybe HPE means that SAP has to compile the same code as used for Intel systems on the Power platform.  Well duh, it is a different chip architecture, so this is computer science 101, but hardly a different “version”.
  • Release priority – #1 Intel, #2 Power. Wrong again HPE!  HANA 2.0 was released simultaneously on Intel and Power, as were S/4HANA 1610 on-prem edition, support for SoH with HANA 2.0, etc.  Where do you get your misinformation HPE?  This information is widely available on SAP’s Service Marketplace and the SAP PAM.
  • Sizes supported – HPE shows Power support of “only” 4.8TB for BW, 9TB for SoH vs. 24TB for Intel and No scale-out HANA on Power – I will give HPE the benefit of the doubt on the 4.8TB statement as 6TB just came out, but the “only” part is strange in that in the same table it shows “only” 4TB support on Intel. As to 9TB SoH and lack of scale-out HANA, both are wrong and have been for a while with 16TB SoH available since December 4th, 2016, see SAP Note 2188482 and scale-out HANA since November 2015. As to the 24TB claim for Intel, the largest supported HANA appliance is 20TB, so HPE, once again, seems to be making up facts.
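
The “simple logic” mentioned in the second bullet, where the appropriate transactional memory call is chosen based on the underlying processor architecture, can be pictured roughly as follows.  This is only an illustrative sketch in Python and is not SAP’s code: the function name, the backend labels and the fallback to ordinary locking are all assumptions made up for this example.

```python
import platform

# Hypothetical mapping from CPU architecture to a transactional-memory backend.
# The labels are illustrative only; they are not names used inside HANA.
TM_BACKENDS = {
    "x86_64": "intel_tsx",    # Intel TSX
    "ppc64le": "power_htm",   # IBM Power "Transactional Memory"
}


def select_tm_backend(machine=None):
    """Pick a transactional-memory backend for the given architecture.

    When no architecture is passed in, the one reported by the running
    system is used. Unknown architectures fall back to plain locking.
    """
    if machine is None:
        machine = platform.machine()
    return TM_BACKENDS.get(machine.lower(), "plain_locks")


if __name__ == "__main__":
    print(select_tm_backend())
```

The point of the sketch is that the calling code stays identical on every platform; only the small lookup differs, which is why one common code tree can serve both Intel and Power.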

There were other lies, but I think you get the idea.  Here are a few suggestions:

To HPE management: Shame on you for permitting such behavior or if done with your knowledge, for encouraging it.  If you have any “integrity” (pun intended), you will fire the employees and managers responsible for knowingly spreading lies and will print a retraction in appropriate press sources and on your web site.  If you don’t, then you are demonstrating, loud and clear, that your company is not to be trusted.

To HPE employees: Unless your management takes the above suggestions to heart with appropriate action to rectify this wrong, I am not sure how you can sleep well working for a company that considers truth to be something to be sacrificed at their convenience.  Hope they are truthful about your benefits.

To SAP customers: I can only speak for myself; when a restaurant, retailer or manufacturer lies to me and/or the public, I refuse to ever do business with them again.  The old saying applies, “fool me once, shame on you, fool me twice, shame on me.”  When you consider the minimal differences, if any, in cost of acquisition between all HANA system providers on the market, including IBM Power

May 15, 2017 Posted by | Uncategorized | Leave a comment

Is your company ready to put S/4HANA into the Cloud? – part 4

This is the 4th and final installment on this topic.  Sorry for the length of each part, but the issues surrounding placement of corporate application environments cannot be boiled down into simple statements like “always think cloud first” or “cloud is no place for a corporate application”.

  • How will you get from your current on-premise SAP landscape to the cloud? As mentioned in a recent blog post[i], database conversions from a conventional database or Suite on HANA to S/4HANA are not trivial to start with.  Now add the complexity of doing that across a WAN with system characteristics and technologies which you may not be able to control and you have just made a difficult task even more so.
    • Can a migration be completed within the outage window that your business allows? Fundamentally, the business will only allow outages which result in little to no lost business or financial penalties.  Whereas on-premise you may be able to use dedicated 1Gb or 10Gb Ethernet or even faster internal networks (in the case of Power Systems), across a WAN, unless you are able to purchase temporary, massive bandwidth, you may be faced with an outage that is longer than the business will allow.
    • At what cost, complexity and risk? If such a migration would take longer than allowable, there are strategies and solutions to deal with this, e.g. SAP MDS (Minimized Downtime Service), IBM CDC (InfoSphere Change Data Capture), SNP Transformation Backbone, Dell Shareplex, but these add cost, require much more planning and testing and might impose some additional risk especially across WAN communications, see the discussion on security across the WAN above.
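
To get a rough feel for the outage window question above, the transfer time can be estimated from the database size and the available bandwidth.  The sketch below is a back of the envelope calculation only; the function names, the default efficiency factor and the example figures are my own assumptions, and it ignores export, import and conversion time, which are often the larger part of a migration.

```python
def transfer_hours(data_tb, link_gbps, efficiency=0.7):
    """Hours needed to move data_tb terabytes over a link_gbps link.

    efficiency is an assumed fraction of the nominal bandwidth that is
    actually achieved (protocol overhead, contention, etc.).
    """
    if link_gbps <= 0:
        raise ValueError("link_gbps must be positive")
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    bits = data_tb * 8e12  # decimal terabytes to bits
    seconds = bits / (link_gbps * 1e9 * efficiency)
    return seconds / 3600


def fits_outage_window(data_tb, link_gbps, window_hours, efficiency=0.7):
    """True if the raw transfer alone fits inside the outage window."""
    return transfer_hours(data_tb, link_gbps, efficiency) <= window_hours


if __name__ == "__main__":
    # Hypothetical 10 TB database, compared across two link speeds.
    for gbps in (1, 10):
        print(gbps, "Gb/s:", round(transfer_hours(10, gbps), 1), "hours")
```

As an example, a hypothetical 10 TB database over a 1 Gb/s link running at its full nominal speed would need a little over 22 hours for the transfer alone, which is why the bandwidth question deserves attention early.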

Lest you feel that this post is overly focused on issues which might prevent you from moving to the cloud, there are good reasons as well.  As I am not an expert on that part of the story, I will refer you to some pretty good articles on the subject.[ii]  The common theme across these sites is that cloud can a) result in cost savings, b) improve agility, c) provide more elasticity and scaling, d) move from a CapEx model to OpEx.  Let’s take these one at a time.

a) cost savings – For customers that are growing rapidly, are startups, or have never implemented a complex ERP system, the cloud certainly can offer major cost avoidance.  For customers with existing data centers, Linux or Unix trained support staffs, UPS and diesel generator power units, storage, security and operations standards, investments and teams, and backup and recovery solutions, it may be more challenging to figure out exactly what sort of savings result from a move to the cloud, unless the move of SAP to the cloud along with other potential moves will allow for a large portion of those staffs to be laid off and data centers sold to another company.  Once you address all of your corporate requirements, discussed in detail in parts 2 and 3 of this blog, a new price for the cloud services to support your SAP S/4HANA environment may emerge and then you can start the process of determining what sort of cost savings are likely to be forthcoming.  From my personal experience with customers, it often turns out that little to no cost savings actually result.

b) improve agility – This one is more clear cut.  When on-premise systems are purchased and your requirements change, often you may find out that you under or overbought and that adjusting capacity, starting up or shutting down systems or simply running power, planning for cooling, running network and storage cables, to name just a few tasks can take weeks or months.  Cloud data centers often pre-provision technology to be ready for growth and changes in demands plus this is the business they are in, so they tend to be very good at keeping ahead of the demands for their services.  Admittedly, some customers are also excellent at this and those that have chosen IBM Power Systems with PowerVM find that making adjustments to systems is so easy that agility is not a major issue.  I know of some customers that purchase larger systems than initially required with large amounts of Capacity on Demand CPUs and memory so that growth can be accommodated without any need for physical changes, simply logical activations to deal with this very issue.

c) elasticity and scaling – Elasticity is usually considered in a cost context, i.e. pay for what you use, which cloud models do very well with utility models and use of shared infrastructure plus ability to charge per unit regardless of what size systems are required, meaning nothing is lost if you start on one size system and have to move to another.  Scaling usually refers to the ability to add an almost unlimited number of additional servers very quickly and easily, once again because cloud providers focus on this and are very good at rapid provisioning.  Whether this is required for S/4HANA is a more important question.  Even after going through a proper sizing, just about all customers get something wrong.  A study by Solitaire Interglobal [iii] a few years ago revealed that customers using x86 systems for SAP, on average, were quoted a starting price that was no more than 40% of the eventual cost.  I have seen this personally with undersized offerings or ones that “answered the mail” but did not address necessary project requirements.  Customers that have experienced this sort of cost overrun will find a cloud option especially attractive because of the ability to seamlessly move between systems or scale-out as necessary.  By comparison, that same Solitaire study showed that customers that purchased IBM Power Systems for SAP were quoted a starting price of 85% to 90% of the eventual cost.  Once again, that is because we ask the right questions up front so sizes of systems are much more accurate, most project requirements have been accounted for and overruns are less common.  These sorts of customers may not find cloud quite as much of a boon for scaling.  On the elasticity front, Power Systems offer a pay as you go model with capacity on demand or flexible financing, so this issue can also be addressed for on-premise implementations.
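
To put the quote percentages cited above into perspective, it helps to turn them around and ask how much larger the eventual cost was than the original quote.  The small sketch below does only that arithmetic; the function names are mine, and the 40%, 85% and 90% inputs are simply the figures mentioned in the paragraph above.

```python
def overrun_factor(quote_share):
    """How many times larger the eventual cost is than the quoted price.

    quote_share is the quoted price expressed as a fraction of the
    eventual cost, e.g. 0.40 for a quote that covered 40% of it.
    """
    if not 0 < quote_share <= 1:
        raise ValueError("quote_share must be in (0, 1]")
    return 1 / quote_share


def eventual_cost(quoted_price, quote_share):
    """Eventual cost implied by a quote and the share it represented."""
    return quoted_price * overrun_factor(quote_share)


if __name__ == "__main__":
    for share in (0.40, 0.85, 0.90):
        print(f"quote = {share:.0%} of eventual cost -> "
              f"eventual cost = {overrun_factor(share):.2f}x the quote")
```

A quote that covers no more than 40% of the eventual cost means the customer ends up paying at least 2.5 times what was quoted, whereas a quote covering 85% to 90% implies roughly 1.1 to 1.2 times.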

d) CapEx vs. OpEx – Cloud is all OpEx.  Some customers’ CFOs have decided that this is necessary even though the rationale is not always clear to those of us without a finance or business degree.  Leases for on-premise systems can be structured to be mostly or all OpEx.  Of course, that only accounts for the systems, so data center infrastructure would likely fall more under CapEx.  If those are sunk assets, however, then unless they are to be sold, depreciation under CapEx will continue whether SAP systems are moved to the cloud or not.

I am sure there are plenty of other reasons to move to the cloud.  I would simply encourage customers to get informed about the challenges of migration; the costs once real corporate requirements are included; the security and control or lack thereof you will have of your mission critical systems; the options you can utilize to resolve some of the issues driving you to cloud today.  For any customer that would like to have a discussion with me about these issues, costs and solutions, please respond to this blog or send me an email: afreude@us.ibm.com

[i] https://wordpress.com/post/saponpower.wordpress.com/524
[ii] https://doublehorn.com/why-move-to-the-cloud/
http://www.belden.com/blog/datacenters/6-reasons-why-enterprises-are-moving-to-the-cloud.cfm
https://www.salesforce.com/uk/blog/2015/11/why-move-to-the-cloud-10-benefits-of-cloud-computing.html
https://www.cardinalsolutions.com/blog/2016/08/top-reasons-for-moving-to-the-cloud
https://www.l-tron.com/top-10-reasons-to-move-your-enterprise-to-the-cloud/
[iii] http://sil-usa.com/index.php?route=product/product&product_id=54

May 8, 2017 Posted by | Uncategorized | Leave a comment

Is your company ready to put S/4HANA into the cloud? – Part 3

The third of a 4 part discussion about corporate requirements in support of S/4HANA and questions to be asked of cloud providers in support of placing this landscape in the cloud.

  • What backups must be performed? Some cloud providers might include daily, weekly, incremental or no backups.  They may include raw image backups vs. database aware backups. Just make sure that whatever backups you require are supported and included in the price for the cloud services.
    • Should corporate backup solution be used? For flexibility reasons as well as visibility, you may prefer that a backup solution that you have tested and approved works in the cloud environment.  Or, perhaps, it is an audit requirement.
    • Are extra server(s) required for backup solution? Your security and audit departments may not permit your backups to share infrastructure, including the network, with any other clients in a provider’s cloud environment.  One or more servers with or without dedicated network infrastructure may be required.
    • How quickly must backups be performed, restored and what is the RTO after database corruption? SAP HANA backups can generally take their time as long as the aggregate transfer rate is sufficient to back up the entire database prior to the next backup.  Well, that is unless you want to be able to restore to the prior day in the event of database corruption in which case you may want the backup finished prior to a specific time on the same day.   Just make sure you think about this and have the infrastructure necessary to meet your backup speed requirements included in the price.  Even more important is how quickly the backup can be restored as well as what services are offered to restore the backups and roll forward any logs that have been created since that backup was initiated, i.e. the RTO for getting back up and running.
    • Will backups be available to DR systems? Not to be overlooked are backups in DR.  Not only will you want to be able to take backups in DR, but you would also need to be able to restore from a backup in the primary site to the DR site, not so easily done if the primary site is truly down and unavailable.  This means that you would need the backup server to have bi-directional replication with DR site as well as testing to ensure this works correctly.  What incremental costs are required for the replicated backup bandwidth?
  • Security – First a disclaimer. I am not a security expert, so may be only addressing a subset of the real requirements.
    • How will corporate single sign-on operate with the cloud solution? Whether you use Microsoft Active Directory, CA SSO, IBM Tivoli Access Manager or one of the dozens of other products on the market, you are probably using this solution to authenticate and authorize users in SAP.  Make sure it can integrate with the potential S/4HANA system in the cloud.  Make sure that your security administrators can control policies, assign and revoke privileges and audit as necessary.
    • Must communications to/from cloud be encrypted and what solution will be used? We all know that hackers want to access your data for malicious reasons, financial gain or industrial espionage.  Do you want your key strokes and data to and from the cloud to transmit in clear text?  If not, which solution will you use and how might the use of that solution impact performance?  How about between application servers and database servers at the cloud provider?
    • How will data stored in cloud be secured? It is one thing to have your personal email stored on storage devices shared with millions of other users, but do your corporate policies allow for corporate databases to be located on storage devices that are shared with other customers?  If not, do you require dedicated devices, of what kind and at what cost?
    • How will backups be secured? We touched on backups earlier, but this is now specific to the physical media on which those backups are stored not to mention replicated to the DR site as well as any external media that you might require, e.g. tapes, DVDs or removable disks.  How can you be assured that no one makes a copy, removes a disk, etc.?
  • What are the non-production requirements? All of the above was just talking about production, but most customers have an even more extensive non-production landscape.  Many, if not most, of those same questions can be applied to non-production.  Remember, there are few employees that command a higher salary than your developers, whether internal or external.  They create corporate intellectual property and often work with copies of production data.  Their workloads vary based on project demands, phases of implementation or problems to be addressed.  Many customers utilize DR capacity or underutilized capacity on HA systems to address non-prod requirements, however this may not be an option in a cloud environment, or if it is, at what cost?
    • How will images be created/copied, managed, isolated, secured? You may use SAP LaMa (Landscape Management, previously known as Landscape Virtualization Management (LVM)), backup/restore, disk replication, TDMS, BDLS and/or custom scripts to populate non-prod systems.  Will those tools and techniques work in the cloud and at what cost?
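
The backup speed question in the list above comes down to simple arithmetic: the database size divided by the time you can allow.  The following is a minimal sketch of that calculation, assuming a full backup of the entire database and a steady transfer rate; the function names and example figures are hypothetical and it says nothing about log roll forward time, which also counts toward the RTO.

```python
def required_throughput_mb_s(db_size_tb, window_hours):
    """Sustained MB/s needed to back up db_size_tb within window_hours."""
    if window_hours <= 0:
        raise ValueError("window_hours must be positive")
    megabytes = db_size_tb * 1e6  # decimal terabytes to megabytes
    return megabytes / (window_hours * 3600)


def restore_hours(db_size_tb, throughput_mb_s):
    """Hours needed to read a backup of db_size_tb back at throughput_mb_s."""
    if throughput_mb_s <= 0:
        raise ValueError("throughput_mb_s must be positive")
    return db_size_tb * 1e6 / throughput_mb_s / 3600


if __name__ == "__main__":
    # Hypothetical 4 TB database with an 8 hour backup window.
    print(round(required_throughput_mb_s(4, 8), 1), "MB/s needed for backup")
    print(round(restore_hours(4, 500), 2), "hours to restore at 500 MB/s")
```

Running the same numbers for the restore direction is worthwhile, since the restore time is a floor under the RTO before any logs have been rolled forward.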

 

The last part of this discussion will deal with migration challenges when moving to the cloud and lastly, a few of the reasons that are often used to justify a move to the cloud.

May 5, 2017 Posted by | Uncategorized | Leave a comment

Is your company ready to put S/4HANA into the cloud? – Part 2

And now, the details and rationale behind the questions posed in Part 1.

  • What is the expected memory size for the HANA DB? Your HANA instances may fit comfortably within the provider’s offerings, or may force a bare-metal option, or may not be offered at all.  Equally important is expected growth as you may start within one tier and end in another or may be unable to fit in a provider’s cloud environment.
  • What are your performance objectives and how will they be measured/enforced? This may not be that important for some non-production environments, but production is used to run part, or all, of a company.  The last thing you want is to find out that transaction performance is not measured or that no enforcement exists for missing an objective.  Even worse, what happens if these are measured, but only up to the edge of the provider’s cloud, not inclusive of WAN latency?  Sub-second response time is usually required, but if the WAN adds 0.5 seconds, your end users may not find this acceptable.  How about if the WAN latency varies?  The only thing worse than poor performance is unpredictable performance.
    • Who is responsible for addressing any performance issues? No one wants finger pointing so is the cloud provider willing to be responsible for end-user performance including WAN latency and at what cost?
    • Is bare-metal required or if shared, how much overhead, how much over-commitment? One of the ways that some cloud providers offer a competitive price is using shared infrastructure, virtualized with VMware or PowerVM for example.  Each of these has different limits and overhead with VMware noted by SAP as having a minimum of 12% overhead and PowerVM with 0% as the benchmarks were run under PowerVM to start with.  Likewise, VMware environments are limited to 4TB per instance and often multiple different instances may not run on shared infrastructure based on a very difficult to understand set of rules from SAP.  PowerVM has no such limits or rules and allows up to 8 concurrent production instances, each up to 16TB for S/4 or SoH up to the physical limits of the system.  If the cloud provider is offering a shared environment, are they running under SAP’s definition of “supported” or are they taking the chance and running “unsupported”?  Lastly, if it is a shared environment, is it possible that your performance or security may suffer because of another client’s use of that shared infrastructure?
  • What availability is required? 99.8%? 99.9%?  99.95%? 4 nines or higher?  Not all cloud providers can address the higher limits, so you should be clear about what your business requires.
  • Is HA mandatory? HA is usually an option, at a higher price.  The type of HA you desire may, or may not, be offered by each cloud provider.  Testing of that HA solution periodically may, or may not be offered so if you need or expect this, make sure you ask about it.
    • For HA, what are the RPO, RTO and RTP time limits? Not all HA solutions are created equal.  How much data loss is acceptable to your business and how quickly must you be able to get back up and running after a failure?  RTP is a term that you may not have heard too often and refers to “Return to Processing”, i.e. it is not enough to get the system back to a point of full data integrity and ready to work, but the system must be at a point that the business expects with a clear understanding of what transactions have or have not been committed.  Imagine a situation where a customer places an order or pays a bill, but it gets lost or where you paid a supplier and mistakenly pay them a second time.
  • Is DR mandatory and what are the RPO, RTO and RTP time limits? Same rationale for these questions as for HA, once again DR, when available, is always offered at an additional charge and is highly dependent on the type of replication used, with disk based replication usually less expensive than HANA System Replication but with a longer RTO/RTP.
    • Incremental costs for DR replication bandwidth? Often overlooked are the network costs for replicating data from the primary site to the DR site, clearly a line item that should be included.  Some customers may decide to use two different cloud providers for primary and DR in which case not only may pricing for each be different but WAN capacity may be even more critical and pricey.
    • Disaster readiness assessment, mock drills or full, periodic data center flips? Having a DR site available is wonderful provided that, when you actually need it, everything works correctly.  As this is an entire discussion unto itself, let it be said that every business recovery expert will tell you to plan and test thoroughly.  Make sure you discuss this with a potential cloud provider and have the price to support whatever you require included in their bid.
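
When discussing the availability percentages in the list above with a provider, it can help to translate them into the downtime they actually permit.  The sketch below does that conversion; it assumes a 365 day year and counts all downtime together, whereas a real contract may exclude planned maintenance or measure per month, so treat the function name and the results as illustrative only.

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years


def downtime_hours_per_year(availability_percent):
    """Hours of downtime per year permitted by an availability target."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability_percent must be between 0 and 100")
    return (1 - availability_percent / 100) * HOURS_PER_YEAR


if __name__ == "__main__":
    for target in (99.8, 99.95, 99.99):
        hours = downtime_hours_per_year(target)
        print(f"{target}% -> {hours:.2f} hours ({hours * 60:.0f} minutes) per year")
```

Seen this way, the gap between 99.8% and 4 nines is the difference between roughly seventeen and a half hours and under one hour of outage per year, which is the number your business should be asked to sign off on.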

 

I said this would be a two part post, but there is simply too much to include in only 2 parts, so the parts will go on until I address all of the questions and issues.

May 4, 2017 Posted by | Uncategorized | Leave a comment