Actually, a trifecta would imply only three big wins at the same time, and HANA on Power Systems just hit four such wins.
Win 1 – HANA 2.0 was announced by SAP with availability on Power Systems at the same time as on Intel-based systems.[i] Previous announcements by SAP had indicated that Power was on an even footing with Intel for HANA from an application support perspective; however, until this announcement, some customers may still have been unconvinced. I noticed this on occasion when I made such an assertion while presenting to customers and saw a little disbelief on some faces. This announcement leaves no doubt.
Win 2 – HANA 2.0 is only available on Power Systems with SUSE SLES 12 SP1 in Little Endian (LE) mode. Why, you might ask, is this a “win”? Because true database portability is now a reality. In LE mode, it is possible to pick up a HANA database built on Intel, make no modifications at all, and drop it on a Power box. This removes a major barrier for customers who might have considered a move but were unwilling to deal with the hassle, time, effort and cost of an export/import. Of course, the destination will be HANA 2.0, so an upgrade from HANA 1.0 to 2.0 on the source system will be required prior to a move to Power, among various other migration options. This subject will likely be covered in a separate blog post at a later date. This also means that customers who want to test how HANA will perform on Power compared to an incumbent x86 system will have a far easier time running such a PoC.
Win 3 – Support for BW on the IBM E850C @ 50GB/core, allowing this system to now support 2.4TB.[ii] The previous limit was 32GB/core, meaning a maximum size of 1.5TB. This is a huge 56% improvement, which means that this already very competitive platform has become even stronger.
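For those who like to see the arithmetic, a quick sketch checks where the 56% and 2.4TB figures come from (the 48-core count is inferred from the published figures, not from an official spec sheet):

```python
# BW on E850C: maximum HANA size = cores x GB-per-core ratio
cores = 48               # implied by 2.4 TB at the new 50 GB/core ratio
old_gb_per_core = 32     # previous BW memory limit per core
new_gb_per_core = 50     # newly announced BW memory limit per core

old_max_gb = cores * old_gb_per_core    # 1536 GB, i.e. ~1.5 TB
new_max_gb = cores * new_gb_per_core    # 2400 GB, i.e. 2.4 TB

# The improvement is the ratio change: 50/32 - 1 = 56.25%
improvement = new_gb_per_core / old_gb_per_core - 1
print(f"{old_max_gb} GB -> {new_max_gb} GB (+{improvement:.0%})")
```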
Win 4 – Saving the best for last, SAP announced support for Suite on HANA (SoH) and S/4HANA of up to 16TB with 144 cores on IBM Power E880 and E880C systems.[ii] Several very large customers were already pushing the previous 9TB boundary and/or had run the SAP sizing tools and realized that more than 9TB would be required to move to HANA. This announcement now puts IBM Power Systems on an even footing with HPE Superdome X. Only the lame-duck SGI UV 300H has support for a larger single image size @ 20TB, but not by much. Also notice that to get to 16TB, only 144 cores are required for Power, which means that there are still 48 cores unused in a potential 192-core system, i.e. room for growth to a future limit once appropriate KPIs are met. Consider that the HPE Superdome X requires all 16 sockets to hit 16TB, which makes you wonder how they will achieve a higher size prior to a new chip from Intel.
Win 5 – Oops, did I say there were only 4 major wins? My bad! It turns out there is a hidden win in the prior announcement, easily overlooked. Prior to this new, higher memory support, a maximum of 96GB/core was allowed for SoH and S/4HANA workloads. If one divides 16TB by 144 cores, the new ratio works out to 113.8GB/core, an 18.5% increase. Let’s do the same for HPE Superdome X: 16 sockets times 24 cores/socket = 384 cores, and 16TB / 384 cores = 42.7GB/core. This implies that a POWER8 core can handle 2.7 times the workload of an Intel core for this type of workload. Back in July, I published a two-part blog post on scaling up large transactional workloads.[iii] In that post, I noted that transactional workloads access data primarily in rows, not in columns, meaning they traverse columns that are typically spread across many cores and sockets. Clearly, being able to handle more memory per core and per socket means that less traversing is necessary, resulting in a high probability of significantly better performance with HANA on Power compared to competing platforms, especially when one takes into consideration their radically higher ccNUMA latencies and dramatically lower ccNUMA bandwidth.
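The per-core comparison above is easy to verify; here is the same arithmetic spelled out (illustrative only, using the core counts and limits quoted in this post):

```python
# Memory-per-core comparison for 16 TB SoH / S/4HANA support
GB_PER_TB = 1024

power_cores = 144            # IBM Power E880/E880C cores needed for 16 TB
hpe_cores = 16 * 24          # HPE Superdome X: 16 sockets x 24 cores each

power_ratio = 16 * GB_PER_TB / power_cores   # ~113.8 GB/core
hpe_ratio = 16 * GB_PER_TB / hpe_cores       # ~42.7 GB/core

increase = power_ratio / 96 - 1      # vs. the prior 96 GB/core limit: ~18.5%
advantage = power_ratio / hpe_ratio  # ~2.7x memory handled per core
print(f"Power: {power_ratio:.1f} GB/core, HPE: {hpe_ratio:.1f} GB/core, "
      f"advantage {advantage:.1f}x, increase {increase:.1%}")
```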
Taken together, these announcements have catapulted HANA on IBM Power Systems from being an outstanding option for most customers, but with a few annoying restrictions and limits especially for larger customers, to being a best-of-breed option for all customers, even those pushing much higher limits than the typical customer does.
On October 4, 2016, Joe Caruso, Director – ERP Technical Architecture at Pfizer, will join me in presenting an ASUG webinar. Pfizer is not only a huge pharmaceutical company but, more importantly, has implemented SAP throughout its business, including just about every module and component that SAP has to offer in its industry. Almost two years ago, Pfizer decided to begin its journey to HANA, starting with BW. Pfizer is a leader in its industry and in the world of SAP and has never been afraid to try new things, including a large-scale PoC to evaluate scale-out vs. scale-up architectures for BW. After completing this PoC, Pfizer decided which architecture worked better, proceeded to implement BW on HANA and went live just recently. Please join us to hear about this fascinating journey. For those who are ASUG members, simply follow this link.
If you are an employee of an ASUG member company, either an Installation or Affiliate member, but are not yet registered for ASUG, you can follow this link to join at no cost. That link also offers companies the opportunity to join ASUG, a very worthwhile organization that offers chapter meetings all over North America, a wide array of presentations at its annual meeting during Sapphire, the BI+Analytics conference coming up in New Orleans, October 17–20, 2016, and hundreds of webinars, not to mention networking opportunities with other member companies and the ability to influence SAP through their combined power.
This session will be recorded and made available at a later date for those not able to attend. When the link to the recording is made available, I will amend this blog post with that information.
SAP HANA on Power is building momentum at a breakneck pace at IBM, and IBM’s commitment to HANA is quite apparent when you look at the amount of time and attention being dedicated to this topic at IBM’s premier systems event, IBM Edge 2016. The information presented below is copied from a newsletter that my colleague, Bob Wolf, shared with his distribution list.
In an odd coincidence, SAP, IBM, and even Oracle are all holding major customer conferences in the same week in September. IBM’s Edge 2016 conference will be held in Las Vegas on September 19–22 at the MGM Grand. It will feature over 1,000 technical sessions revolving around infrastructure, Linux, cloud, and yes, SAP.
There will be a total of 25 sessions related to SAP at the Edge 2016 conference. Many of these sessions will discuss various aspects of, and customer experiences with, HANA on Power. There will also be SAP-related sessions covering cloud, IBM Flash and NVMe storage, as well as DB2. In addition to IBM presenters, a number of HANA on Power sessions will be delivered by customers, including Bosch, Ctac, CenturyLink, CPFL Energia, Deloitte, and South Shore. Two of IBM’s business partners, Meridian and Ciber, will each present a session.
Many of the IBM speakers covering the SAP topics will be shuttling back and forth between the IBM Edge conference at the MGM Grand and SAP’s TechEd conference at the Venetian. If you see any of the presenters wearing running shoes, rollerblades, or riding on a unicycle or hoverboard, they are not trying to make a fashion statement as much as they are just trying to run the 2 miles between the two conferences.
In case you would like to attend IBM’s Edge 2016 conference and haven’t registered already, here are some links for more information on the conference including how to register. Further down are the descriptions of the 25 SAP related sessions.
Attend IBM Edge 2016 and join the IT professionals, IT leaders and business leaders who will design, build and deliver infrastructure for the cognitive era.
Here’s what you won’t want to miss:
Over 1,000 technical sessions, demonstrations, deep dives, certifications, and labs
Birds of a Feather panel discussions to learn from the experiences of fellow IT professionals
Client success stories and provocative presentations from industry thought leaders
Networking opportunities with experts, peers and solution providers in our state-of-the-art Solution EXPO
“Edge at Night” on Tuesday evening:
Enjoy a concert by Train – the GRAMMY Award-winning band.
Before and after the concert, meet with other attendees pool-side for hors d’oeuvres, beverages and networking.
Attached below are short descriptions of all 25 of the SAP-related sessions featured at Edge 2016.
1) PAD-1623: Architecting SAP HANA for Availability, Flexibility and Scalability to Handle Large Databases
SAP HANA is one of the hottest products around, and no system offers better characteristics to enable large-scale transaction processing and concurrent analytics than IBM Power Systems. This session will offer a deep-dive into the issues that must be addressed to deliver maximum availability while keeping costs under control through effective use of PowerVM. Of significant importance is the dilemma that many very large customers must consider: how to determine the best large-scale system for SAP Business Suite 4 SAP HANA (or SAP S/4HANA) lacking any specific benchmarks. This session will provide a methodology by which to evaluate both IBM and non-IBM systems for this purpose.
Alfred Freudenberger, IBM
2) PAD-1922: Benefits of Deploying SAP HANA on IBM Power Systems and Future Roadmap
The joint program between IBM and Bosch will present a case study highlighting the benefits of running SAP HANA on IBM Power Systems. Bosch will review why they chose IBM Power Systems as their platform of choice, their experience with the platform, and the benefits they received. Following this, we will present the roadmap for SAP HANA on Power to showcase what you can expect in the near future.
Sara Cohen, IBM
Erik Thorwirth, BOSCH
3) PAD-1921: Client Roundtable: SAP HANA on IBM Power Systems Experiences
Join this roundtable of clients who have deployed IBM Power Systems for SAP HANA. The panelists will discuss the benefits they have already achieved by using Power as the flexible infrastructure for HANA deployments, and why they chose Power to replace existing x86 servers running HANA. Attendees will also hear views on market trends and future joint IBM/SAP plans for HANA.
Vicente Moranta, IBM
Niek Verhaar, Ctac N.V.
Erik Thorwirth, BOSCH
Volker Fischer, BOSCH
Claude Bernier, South Shore
4) CCL-2387: Create a Fully Managed SAP HANA Environment in Minutes!
Participate in an interactive demo that walks you through entering a provisioning request process for a fully managed SAP Cloud environment. Then work with the IBM SAP on Cloud Benefits Estimator tool, where you can gauge your potential benefits in the areas of economics, availability, customer reach, innovation and security. See how you can deploy SAP HANA quicker than you ever thought possible.
Brian Burke, IBM
5) PAD-1163: Customer Experiences with Installing and Running SAP HANA on Linux on IBM Power Systems
Based on over a dozen customer installs of SAP HANA on Linux on IBM Power Systems, this talk will discuss: why customers choose Linux on Power for HANA; what customers like about Linux on Power for HANA; what customers are using for their high availability solution; what storage strategies customers have implemented; what applications customers are running; and what their production workloads look like. Come find out what to plan for with SAP HANA on Power, and how to architect a successful solution.
Kurt Koehle, IBM
6) PAD-1509: Dynamic Power Cloud Manager: Concurrent Hybrid Operation of AIX, IBM i, Linux and SAP HANA on Power
In 2014, FRITZ & MACZIOL won an IBM Beacon Award for “Outstanding IT Transformation Solution – Power Systems” for the Dynamic Power Cloud Manager (DPCM). This comprehensive administration and automation solution for IBM Power environments enables you to manage and automate AIX, IBM i, Linux and even SAP HANA concurrently on one or more IBM Power systems via the same, convenient GUI. It can deliver significant cost and time savings, reduce errors, and in case of failure a restore takes only a few mouse-clicks. Attend this session to learn how easy it can be to manage Power environments in this “smarter” way, and hear about its potential compliance, cost reduction and standardization benefits.
Rainer Schilling, FRITZ & MACZIOL Software und Computervertrieb GmbH
7) PAD-2543: Enterprise HANA
SAP HANA on IBM Power Systems offers enterprise virtualization capabilities such as capacity on-demand and dynamic resizing of compute and memory resources for on-premise and cloud deployments. These capabilities, along with Live Partition Mobility, high availability and disaster recovery features, bring unprecedented flexibility and allow customers to achieve lower TCO. Come hear client perspectives on these benefits.
Niek Verhaar, Ctac N.V.
Erik Thorwirth, BOSCH
Mysore S. Srinivas, IBM
8) PAD-1044: Extended Solutions for SAP HANA and IBM Power Systems Landscapes
This presentation will provide an extended overview of the most important enterprise solutions encapsulating the SAP HANA and IBM Power and Storage landscapes. It will discuss the portfolio of architectural elements needed to increase enterprise-class high availability and simplify disaster recovery using different data replication methods or more common backup/restore/recovery solutions. This session is an extension of the session “Optimal Deployment of SAP HANA on IBM Power and IBM Storage Platforms Based on Customer Experience.”
Damir Rubic, IBM
9) PAD-2509: How CenturyLink Optimized Its SAP and SAP HANA Deployments with a Hybrid SAP Infrastructure
At CenturyLink, we strategically leveraged a hybrid SAP infrastructure on IBM Power Systems, giving us investment protection in addressing today’s SAP business requirements while positioning us for our future SAP HANA requirements. We weren’t exactly sure when we would be ready for SAP HANA; however, we knew we needed to add capacity and upgrade our current SAP environment. Our POWER8 solution gave us flexibility: regardless of the direction and timing we chose, it allowed us to run our traditional SAP workload while giving us the ability to evaluate benefits of HANA on POWER8. Once the decision was made to move ECC to HANA, the infrastructure investments offered significant value over the alternatives.
Odell Riley, CenturyLink
Connie Walden, CenturyLink
10) WET-2626: How to Choose the Best Cloud Provider to Maximize the Value of Your SAP Deployment
More and more companies are moving their SAP systems to the cloud, and they’re turning to third-party experts for help. But you can’t risk partnering with the wrong cloud services provider (CSP). How do you choose the right CSP with the right expertise for your business, SAP landscape and workloads, while ensuring that all stakeholder concerns (CIO, CFO, LOB managers and SAP project leads) are addressed? You’ll walk away from this session with a list of what to look for when choosing a CSP, what questions to ask, and the determining factors to help you make that decision.
Brian Burke, IBM
John Harris, IBM
11) CDE-1383: IBM POWER8 and DB2 BLU: The Foundation of a Cognitive Analytics Platform
In this session, we will share a real-world customer example showcasing the advantages of IBM POWER8 in combination with IBM DB2 BLU. We will discuss the decision process and consider alternatives, such as SAP HANA and other options. We also will provide information on the sizing process and the architectural planning, including TCO/ROI and delivery considerations. In addition, we will explore other advantages, such as IBM PowerHA, AIX Enterprise Edition (EE) and Capacity on Demand (CoD), which provided additional value. The solution was deployed on four IBM Power System E850 servers.
Ulrich Walter, IBM
12) PAD-2305: Implementing S/4HANA on IBM Power Systems to Maximize Performance for Analytics and Big Data
This session covers the current implementation of S/4HANA and reviews lessons learned on the IBM Power Systems platform for analytics. We will also share the performance benefits of the Power Systems platform for enterprise SAP workloads that can help reduce cost and increase productivity. In addition, we will describe how the virtualization built into the Power Systems architecture can allow multiple discrete production instances in virtual machines on the same physical server, thus reducing TCO. Finally, we will review the current state of SAP on Power Systems in the IBM cloud, and its future roadmap.
Balbir Wadhwa, Deloitte
James Seaman, IBM
13) PAD-2342: Leverage the Full Capabilities of IBM Power Systems to Run SAP HANA on a Flexible, In-Memory Cloud
The session will offer insight into how to run SAP HANA environments on a cloud infrastructure built with IBM PowerVM and IBM hardware. Find out why you should choose IBM technology to create a highly scalable platform that is optimized to run in-memory SAP solutions, and what the design and architecture concepts look like. You’ll also hear an overview of how Ctac built their platform with IBM components, the project approach (smooth cooperation with IBM), challenges and necessary SAP performance tests. Ctac will also share the capabilities of their platform and how they can cope with scalability and capacity on-demand.
Niek Verhaar, Ctac N.V.
14) SRA-2554: Leverage the Full Capabilities of IBM Technology to Run SAP HANA on a Flexible In-Memory Cloud
Ctac, a business consultant and cloud provider in the Netherlands, wanted to enhance analytics support for its customers with an in-memory cloud solution that could deliver high performance and simple scalability. This session will give attendees a technical deep-dive into how to run SAP HANA environments on a cloud infrastructure built on IBM PowerVM and IBM FlashSystem. We will describe how to design the architecture of a highly scalable platform that is optimized to run in-memory SAP solutions, share our business challenges and project approach, present the necessary SAP performance test, and demonstrate the capabilities that help us cope with scalability and capacity changes on-demand.
Eric Sperley, IBM
Niek Verhaar, Ctac N.V.
15) PAD-1243: Major Client SAP Landscape Transformation to SAP HANA on POWER8 in a Private Cloud Environment
This talk describes a first-of-a-kind SAP HANA on IBM Power Systems private cloud solution in the datacenter of a large Automotive customer. The solution includes a fully automated installation of Linux LPARs through PowerVC, as well as the unattended installation of SAP HANA with the required storage, CPU and memory. Now the customer’s development team can deploy SAP HANA installations on-demand. As an orchestration tool, we use IBM Cloud Orchestrator. For the infrastructure, the client chose IBM Power System E880 servers and IBM Elastic Storage Server (ESS) with an InfiniBand network. The session will also cover different use cases we implemented to modify the deployed SAP HANA installations (SAP BW, SAP Business Suite on HANA).
Carsten Dieterle, IBM
Dietmar Wierzimok, IBM
16) PBD-1262: Meeting Ultra Stringent HA and DR SLAs for an SAP Ecosystem with IBM Power Systems and Storage
A global manufacturing company needed its SAP ecosystem to be available 24×7, with SLAs of one hour for disaster recovery (DR) and ten minutes for high availability (HA). Hear a real-world case study on how these requirements were met using IBM Power System S824 servers and IBM Storwize V7000 storage–while lowering TCO. After the rollout the client successfully ran a live DR drill on the production environment within the defined SLA. This IBM solution is now a reference for this client’s sister companies.
Hoson Rim, Meridian IT
17) PAD-2545: NVMe Saves Customers Time and Money
New NVMe non-volatile memory technology on IBM Power Systems makes SAP HANA more efficient by saving customers time and money when restarting HANA databases. SAP HANA databases on Power Systems now leverage the latest in server-side flash caching technology that revolutionizes server I/O performance and reliability.
Vicente Moranta, IBM
18) PAD-1043: Optimal Deployment of SAP HANA on IBM Power and IBM Storage Platforms Based on Customer Experience
The presentation will provide a detailed description of the core SAP HANA architectural elements and use cases that are being considered for optimal deployment on the IBM Power Systems platform. It will also discuss the infrastructure architectural building blocks utilized to achieve an accelerated implementation and improved performance synergy between SAP HANA and IBM Linux on Power-based enterprise landscapes.
Damir Rubic, IBM
19) PAD-1045: Planning for a Successful SAP HANA on IBM Power Systems Proof of Concept
This presentation consolidates the basic and advanced SAP HANA and IBM Power Systems architectural elements described in other Edge 2016 sessions into the optimal set of steps required for the successful execution of a Proof of Concept (PoC) for this enterprise landscape.
Damir Rubic, IBM
20) PAD-1353: Running SAP Faster on FlashSystem and POWER8
Facing a 25% increase in the number of customers, CPFL Energia was close to disrupting its business operations. See how POWER8 plus IBM FlashSystem reduced the overnight billing process from eight hours to five hours (a reduction of 37.5%), which released the capacity to receive two million additional customers and improved call center responsiveness. Attendees will see how a high-performance computing solution can turn a problem into a revolution within the enterprise. Also, you will gain insight into how to extract the maximum from your infrastructure to obtain better SAP HANA performance.
Marcio Felix, CPFL Energia
Tiago Machado, CPFL Energia
21) PAD-1112: SAP HANA Data Protection Capabilities and Customer Experiences
HANA is SAP’s strategic in-memory computing platform that is designed to deliver better performance for analytic and transactional applications. HANA’s platforms and business environments continue to expand. IBM’s Spectrum Protect (formerly TSM) has a long tradition of protecting SAP databases. SAP has announced HANA on the IBM Power Systems platform, and again IBM Spectrum Protect was the first partner on the market to support this new platform. The number of reference customers and production accounts is constantly increasing on all HANA platforms. Come learn about this exciting solution, and hear about selected worldwide customer scenarios and best practices for meeting SAP HANA data protection challenges on Intel and Power platforms.
Gerd Munz, IBM
Carsten Dieterle, IBM
22) PAD-2385: SAP HANA and IBM Power Systems: A Real-Time View into Your Business
The new SAP HANA on IBM Power Systems solution is ideal for IBM Power Systems users who are moving to SAP HANA, as well as for existing SAP HANA and SAP BI Accelerator users running on x86 who are looking to take advantage of the latest POWER8 technologies. Join us to hear IBM’s foremost expert on SAP HANA and IBM Power Systems share key information on: The SAP/IBM alliance–the best of both worlds as you move to SAP HANA; unique SAP HANA attributes delivered on IBM Power Systems, including performance, architecture, and low TCA/TCO; and a comparison of the on-premise IBM Power Systems for SAP HANA with cloud solutions for SAP HANA.
Brian Burke, IBM
John Wise, IBM
23) PAD-2544: SAP S/4HANA on IBM Power Systems: Resiliency and Scalability
Memory resiliency is key for large, multi-terabyte, in-memory databases such as SAP Business Suite 4 SAP HANA (SAP S/4HANA). As system memory capacity grows, memory failure rates go up, resulting in unplanned outages of mission-critical deployments. IBM Power Systems offers resilient memory subsystems as good as mainframes with DRAM sparing and dynamic memory deallocation for predictive failures. Come hear more about Power Systems’ memory features, scalability and resiliency.
Mysore S. Srinivas, IBM
24) PAD-1658: Technical and Financial Benefits of Running SAP HANA on IBM POWER8 Versus Intel
This session starts with an overview of what SAP supports with S/4HANA running SUSE Linux on IBM POWER8 servers, followed by an in-depth overview of how POWER8 technology delivers the solution for less, with higher performance and greater availability. We will contrast the technical and financial benefits of IBM POWER versus Intel options to put it into perspective. Understanding the technical and financial advantages is critical for line-of-business, C-level and SAP stakeholders. Some stakeholders may overlook the benefit provided by the foundation of the SAP solution by viewing infrastructure as not playing a role in the success of SAP, which can lead to overpaying while accepting an inferior solution.
Brett Murphy, Ciber
25) PAD-2254: Why SAP HANA on IBM Power Systems is More Efficient and Cost-Effective than on Intel
This session demonstrates how SAP HANA on IBM Power Systems not only provides a higher quality of service, but also reduces IT cost. You will see how the specific platform characteristics of IBM Power Systems translate into cost savings compared to x86. The discussion will include detailed total cost of ownership examples from real client case studies covering SAP Business Suite and Business Warehouse applications that demonstrate efficiencies and savings to businesses.
Program: Accelerate Cloud Innovation with Power Systems
Track: /ENHANCE/ Big Data and In-Memory Analytics
Information is subject to change. Please consult the IBM Events mobile app for complete agenda and session details.
Christopher von Koschembahr, IBM
Last week, HP announced its intention to acquire SGI for $275M, a 30% premium over SGI’s market cap prior to the announcement. This came as a surprise to most people, including me, both that HP would want to do this and that SGI was worth so little, a fact that eWeek called “Embarrassing”: http://www.eweek.com/innovation/hpe-buys-sgi-for-275-million-how-far-the-mighty-have-fallen.html. This raises a whole host of questions that HANA customers might want to consider.
First and foremost, there would appear to be three possible reasons why HP made this acquisition: 1) to eliminate a key competitor and undercut Dell and Cisco at the same time, 2) to acquire market share, or 3) to obtain access to superior technology and resources and/or keep them out of the hands of a competitor, or some combination of the above.
If HP considered SGI a key competitor, it means that HP was losing a large number of deals to SGI, a fact that has not been released publicly but which would imply that customers are not convinced of Superdome X’s strength in this market. As many are aware, both Dell and Cisco have resell agreements with SGI for their high-end UV HANA systems. It would seem unlikely that either Dell or Cisco would continue such a relationship with their arch-nemesis, and as such, this acquisition will seriously undermine the prospects of Dell and Cisco to compete for scale-up HANA workloads such as Suite on HANA and S/4HANA among customers that may need more than 4TB, in the case of Dell, or 6TB, in the case of Cisco.
On the other hand, market share is a reasonable goal, but SGI’s total revenue in the year ending 6/26/15 was only $512M, which would barely be a rounding error on HP’s revenue of $52B for the year ending 10/31/15. It is hard to imagine that HP could be this desperate for a potential 1% increase in revenue, and that assumes 0% overlap in markets. Of course, they compete in the same markets, so the likely revenue increase is considerably less than even that paltry number.
That brings us to the third option, that the technology of SGI is so good that HP wanted to get its hands on it. If that is the case, then HP would be admitting that the technology in Superdome X is inadequate for the demands of Big Data and analytics. I could not agree more, and made such a case in a recent post on this blog. In that post, I noted the latency inherent in HP’s minimum 8-hop round trip to any off-board resources (remote memory accesses add another two hops); remember, there are only two Intel processors per board in a Superdome X system, which can accommodate up to 16 processors. Scale-up transactional workloads typically access data in rows dispersed across NUMA-aligned columns, i.e. they will constantly be traversing this high-latency network. Of course, this is not surprising, since the architecture used in this system is INCREDIBLY OLD, having been developed in the early 2000s, i.e. well before the era of Big Data. But the surprising thing is that this would imply that HP believes SGI’s architecture is better than its own. Remember, SGI’s UV system uses a point-to-point, hand-wired, 4-bit-wide mesh of wires between every two NUMA ASICs in the system, which, for the potential 32 sockets in a single-cabinet system, means 16 ASICs and 136 wires, if I have done the math correctly. HP has been critical of the memory protection employed in systems like SGI’s UV, which is based on SDDC (Single Device Data Correction). In HP’s own words about its DDDC+1 memory protection: “This technology delivers up to a 17x improvement in the number of DIMM replacements versus those systems that use only Single-Chip Sparing technologies. Furthermore, DDDC +1 significantly reduces the chances of memory related crashes compared to systems that only have Single-Chip Sparing capabilities” (HP Integrity Superdome X system architecture and RAS whitepaper). What is really interesting is that SDDC does not imply even Single-Chip Sparing.
The only one of those options which makes sense to me is the first one, but of course, I have no way of knowing which of the above is correct. One thing is certain: customers considering implementing HANA on a high-end scale-up architecture from either HP or SGI, and from Dell and Cisco by extension, are going to have to rethink their options. HP has not stated which architecture will prevail or whether it will keep both, hard to imagine but not out of the question either. Without concrete direction from HP, it is possible that a customer decision for either architecture could result in almost immediate obsolescence. I would not enjoy being in a board room, or meeting with my CFO, to explain how I made a decision for a multi-million-dollar solution which is dead-ended, worth half or less overnight, and for which a complete replacement will be required sooner rather than later. Likewise, it would be hard to imagine basing an upcoming decision for a strategic system to run the company’s entire business on a single flip of a coin.
Now I am going to sound like a commercial for IBM, sorry. There is an alternative which is rock solid, increasing, not decreasing, in value, and has a strong and very clear roadmap: IBM Power Systems. POWER8 for HANA has seen one of the fastest market acceptance rates of any solution in recent memory, with hundreds of customers implementing HANA on Power Systems and/or purchasing systems in preparation for such an implementation, ranging from medium-sized businesses in high-growth markets to huge brand names. The roadmap was revealed earlier this year at the OpenPOWER Summit, http://www.nextplatform.com/2016/04/07/ibm-unfolds-power-chip-roadmap-past-2020/. This roadmap was further backed up by Google’s announcement of its plans for POWER9, http://www.nextplatform.com/2016/04/06/inside-future-google-rackspace-power9-system/, the UK STFC’s plans, http://www.eweek.com/database/ibm-powers-uk-big-data-research.html, worth £313M (roughly $400M based on current exchange rates), and the US DOE’s decision to base its Summit and Sierra supercomputers on IBM POWER9 systems, http://www.anandtech.com/show/8727/nvidia-ibm-supercomputers, a $375M investment; interestingly enough, either of those deals is worth more than the entire value of SGI. More importantly, these two major wins mean IBM is now contractually obligated to deliver POWER9, thereby ensuring the Power roadmap for a long time to come. And, of course, IBM has a long history of delivering mission-critical systems to customers, evolving and improving the scaling and workload-handling characteristics over time while simultaneously improving systems availability.
SAP made a major revision of the SAP note describing the support of SAP applications with SAP HANA on IBM Power Systems on July 14, 2016. Previously, SAP note 2218464 – Supported products when running SAP HANA on IBM Power Systems – had said, among other things, “The list is complete. If a software component, product or add-on is not listed in this note or the notes referenced directly by this note, it is not supported with a SAP HANA database running on the IBM POWER Architecture.” As of version 43 of this note, it now says: “A SAP product version or add-on product version is supported on SAP HANA SPS 11 and newer running on IBM Power Systems, if and only if the product version or add-on version meets all of the following criteria:” and goes on to detail the criteria. The upshot: unless an SAP application is officially not supported (only 5 remain in that category) or requires a co-requisite or prerequisite which is not supported, and provided it meets the minimum documented support levels, it is, by default, fully supported.
This is an incredibly important change. It takes HANA on Power from being a one-off product with an official list of supported applications to being a standard part of the SAP portfolio with only a list of unsupported applications. Please note, unsupported does not imply that it won’t work or that SAP is not trying to move it to full support, simply that they have not made those applications GA with HANA on Power. As with the previous versions of this note, customers requiring support for any SAP applications which are not supported today should communicate with their SAP AE and request this support. In general, unless there are actual software issues, SAP will consider each request individually and decide whether to provide support, to invite the customer to try out the application in a PoC or to suggest other alternatives.
Part 1 of this subject detailed the challenges of sizing large scale-up transactional HANA environments. This part will dive into the details and methodology by which customers may select a vendor in the absence of an independent transactional HANA benchmark.
Past history with large transactional workloads
Before I start down this path, it would be useful to understand why it is relevant. HANA transaction processing utilizes many of the same techniques as a conventional database. It accesses rows; although each column is physically separate, the transaction does not know this and gets all of the data together in one place prior to presenting the results to the dialog calling it. Likewise, a write must follow ACID properties, including that only one update against a piece of data can occur at any time, which requires that cache coherency mechanisms be employed to ensure this. And a write to a log, in addition to the memory location of the data to be changed or updated, must occur. Sounds an awful lot like a conventional DB, which is why past history handling these sorts of transactional workloads makes plenty of sense.
HPE has a long history with large scale transactional workloads and Superdome systems, but this was primarily based on Integrity Superdome systems using Itanium processors and HP-UX, not on Intel x86 systems and Linux. Among the Fortune 100, approximately 20 customers utilized HPE’s systems for their SAP database workloads, almost entirely based on Oracle with HP-UX. Not bad, and coming in second place to IBM Power Systems with approximately 40 of the Fortune 100 customers that use SAP. SGI has exactly 0 of those customers. Intel x86 systems represent 8 of that customer set, with 2 being on Exadata, not even close to a standard x86 implementation with its Oracle RAC and highly proprietary storage environment. Three of the remaining x86 systems are utilized by vendors whose very existence is dependent on x86, so running on anything else would be contradictory to their mission and these customers must make this solution work no matter what the expense and complexity might be. That leaves 3 customers, none of which utilize Superdome X technology for their database systems. To summarize: IBM Power has a robust set of high end current SAP transactional customers; HPE a smaller set entirely based on a different chip and OS than is offered with Superdome X; SGI has no experience in this space whatsoever; and x86 in general has limited experience confined to designs that have nothing in common with today’s high end x86 technology.
Industry Standard Benchmarks
A bit of background. Benchmarks are lab experiments open to optimization and exploitation by experts in the area and have little resemblance to reality. Unfortunately, they are the only third-party metrics by which systems can be compared. Benchmarks fall into two general categories: those that are horrible and those that are not horrible (note I did not say good). Horrible ones sometimes test nothing but the speed of CPUs by placing the entire running code in instruction cache and the entire read-only dataset upon which the code executes in data cache, meaning no network or disk I/O, much less memory access or cache coherency traffic. SPEC benchmarks such as SPECint2006 and SPECint_rate2006 fall into this category. They are uniquely suited for ccNUMA systems as there is absolutely no communication between any sockets, meaning this represents the best case scenario for a ccNUMA system.
It is therefore revealing that SGI, with 32 sockets and 288 cores, was only able to achieve 11,400 on this ideal ccNUMA benchmark, slightly beating HP Superdome X’s result of 11,100, also with 288 cores. By comparison, the IBM Power Systems E880 with only 192 cores, i.e. 2/3 of the cores, achieved 14,400, i.e. 26% better performance.
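The per-core arithmetic behind that comparison is worth spelling out. A quick sketch (scores and core counts taken from the text above, not pulled from spec.org directly):

```python
# Illustrative per-core throughput math for the SPECint_rate2006 results
# quoted above. Figures come from the text, not an independent lookup.
results = {
    "SGI UV (32 sockets)": {"score": 11400, "cores": 288},
    "HPE Superdome X":     {"score": 11100, "cores": 288},
    "IBM Power E880":      {"score": 14400, "cores": 192},
}

for name, r in results.items():
    print(f"{name}: {r['score'] / r['cores']:.1f} per core")

# Aggregate advantage of the E880 over the UV, with only 2/3 of the cores:
advantage = results["IBM Power E880"]["score"] / results["SGI UV (32 sockets)"]["score"] - 1
print(f"E880 vs UV: {advantage:.0%} better")
```

Per core, the gap is even wider than the aggregate 26%, since the E880 does it with 96 fewer cores.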
In descending order from horrible to not as bad, there are other benchmarks which can be used to compare systems. The list includes SAP SD 2-tier, SAP BW-EML, TPC-C and SAP SD 3-tier. Of those, the SD 2-tier has the most participation among vendors and includes real SAP code and a real database, but suffers from the database being a tiny percentage of the workload, approximately 6 to 8%, meaning on ccNUMA systems, multiple app servers can be placed on each system board, resulting in only database communication going across a pretty darned fast network represented by the ccNUMA fabric. SGI is a no-show on this benchmark. HPE did show with Superdome X @ 288 cores and achieved 545,780 SAPS (100,000 users, Ref# 2016002), which is still the world record. IBM Power showed up with the E870, an 80-core system (28% of the HPE system’s cores), and achieved 436,100 SAPS (79,750 users, Ref# 2014034), 80% of the SAPS of the HPE system. Imagine what IBM would have been able to achieve with this almost linearly scalable benchmark had they attempted to run it on the E880 with 192 cores (probably close to 436,100 * 192/80, although no vendor is allowed to publish the “results” of any extrapolations of SAP benchmarks, but no one can stop a customer from inputting those numbers into a calculator).
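For the reader with a calculator handy, that extrapolation looks like this. To be clear, this is my hypothetical arithmetic, not a published or publishable SAP benchmark result, and it assumes near-linear scaling:

```python
# Hypothetical extrapolation only: SAP rules forbid vendors from publishing
# extrapolated benchmark results, but nothing stops a reader from doing the
# arithmetic. Figures are from the cited SD 2-tier certification (Ref# 2014034).
e870_saps, e870_cores = 436_100, 80
e880_cores = 192

# Assumes near-linear scaling, which this benchmark has historically shown.
extrapolated = e870_saps * e880_cores / e870_cores
print(f"Extrapolated E880 SAPS: {extrapolated:,.0f}")
```

That lands just over a million SAPS, roughly double the Superdome X record, on a system that actually exists today.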
BW-EML was SAP’s first benchmark designed for HANA, although not restricted to it. As the name implies, it is a BW benchmark, so it is difficult to derive any correlation to transaction processing, but at least it does show some aspect of performance with HANA, analytic if nothing else and concurrent analytics is one of the core value propositions of HANA. HPE was a frequent contributor to this benchmark, but always with something other than Superdome X. It is important to note that Superdome X is the only Intel based system to utilize RAS mode or Intel Lockstep, by default, not as an option. That mode has a memory throughput impact of 40% to 60% based on published numbers from a variety of vendors, but, to date, no published benchmarks, of any sort, have been run in this mode. As a result, it is impossible to predict how well Superdome X might perform on this benchmark. Still, kudos to HPE for their past participation. Much better than SGI which is, once again, a no-show on this benchmark. IBM Power Systems, as you might predict, still holds the record for best performance on this benchmark with the 40 core E870 system @ 2 Billion rows.
TPC-C was a transaction processing benchmark that, at least for some time period, had good participation, including from HP Superdome. That is, until IBM embarrassed HPE by delivering 50% more performance with half the number of cores. After this, HPE never published another result on Superdome, and that was back in the 2007/2008 time frame. TPC-C was certainly not a perfect benchmark, but it did have real transactions with real updates and about 10% of the benchmark involved remote accesses. Still, SGI was a no-show and HPE stopped publishing on this level of system in 2007, while IBM continued publishing through 2010 until there was no one left to challenge their results. A benchmark is only interesting when multiple vendors are vying for the top spot.
Last, but certainly not least, is the SAP SD 3-tier benchmark. In this one, the database was kept on a totally separate server and there was almost no way to optimize it to remove any ccNUMA effects. Only IBM had the guts to participate in this benchmark at a large scale with a 64-core POWER7+ system (the previous generation to POWER8). There was no submission from HPE that came even remotely close and, once again, SGI was MIA.
Where IBM Power Systems utilizes a “glueless” interconnect up to 16 sockets, meaning all processor chips connect to each other directly, without the use of specialized hub chips or switches, Intel systems beyond 8 sockets utilize a “glued” architecture. Currently, only HPE and SGI offer solutions beyond 8 sockets. HPE is using a very old architecture in the Superdome X, first deployed for PA-RISC (remember those?) in the Superdome introduced in 2000. Back then, they were using a cell controller (a.k.a. hub chip) on each system board. When they introduced the Itanium processor in 2002, they replaced this hub chip with a new one called SX1000; basically an ASIC that connected the various components on the system board together and to the central switch by which it communicates with other system boards. Since 2002, HPE has moved through three generations of ASICs and now is using the SX3000, which features considerably faster speeds, better reliability, some ccNUMA enhancements and connectivity to multiple interconnect switches. Yes, you read that correctly: where Intel has delivered a new generation of x86 chips just about every year over the last 14 years, HPE has delivered 3 generations of hub chips. Pace of innovation is clearly tied to volume, and Superdome has never achieved sufficient volume alone, nor use by other vendors, to increase the speed of innovation. This means that while HPE may have delivered a major step forward at a particular point in time, it suffers from a long lag and diminishing returns as time and Intel chip generations progress. The important thing to understand is that every remote access, from either of the two Intel EX chips on each system board, to cache, memory or I/O connected to another system board, must pass through 8 hops at a minimum, i.e. from calling socket, to SX3000, to central switch, to remote SX3000, to remote socket, and the same trip in return, and that is assuming the data was resident in an on-board cache.
SGI, the other player in the beyond 8 socket space, is using a totally different approach, derived from their experience in the HPC space. They are also using a hub chip, called a HARP ASIC, but rather than connecting through one or more central switches, in the up-to-32-socket UV 300H system, each system board, featuring 4 Intel EX chips and a proprietary ASIC per memory riser, includes two hub chips which are linked directly to each of the other hub chips in the system. This mesh is hand wired with a separate physical cable for every single connection. Again, you read that correctly: hand wired. This means that not only are physical connections made for every hub chip to hub chip connection, with the inherent potential for an insertion or contact problem on each end of that wire, but as implementation size increases, say from 8-sockets/2 boards to 16-sockets/4 boards or to 32-sockets/8 boards, the number of physical, hand wired connections grows quadratically. OK, assuming that does not make you just a little bit apprehensive, consider this: where HPE uses a memory protection technology called Double Device Data Correction + 1 (DDDC+1) in their Superdome X system, basically the ability to handle not just a single memory chip failure but at least 2 (not at the same time), SGI utilizes SDDC, i.e. Single Device Data Correction. This means that after detection of the first failure, customers must rapidly decide whether to shut down the system and replace the failing memory component (assuming it has been accurately identified), or hope their software based page deallocation technology works fast enough to avert a catastrophic system failure due to a subsequent memory failure. Even with that software, if a memory fault occurs in a different page, the SGI system would still be exposed.
My personal opinion is that memory protection is so important in any system, but especially in large scale scale-up HANA systems, that anything short of true enterprise memory protection of at least DDDC is doing nothing other than increasing customer risk.
SGI is asking customers to accept their assertion that SAP’s certification of the SGI UV 300H at 20TB implies they can scale better than any other platform and perform well at that level, but they are providing no evidence in support of that claim. SAP does not publish the criteria with which it certifies a solution, so it is possible that SGI has been able to “prove” addressability at 20TB, the ability to initialize a HANA system and maybe even to handle a moderate number of transactions. Lacking any sort of independent, auditable proof via a benchmark, or any reasonable body of customers (one would be nice at least) driving high transaction volumes with HANA or a conventional database, and with nothing other than a 4-bit wide, hand wired ccNUMA nest that would seem prone to low throughput and high error rates, especially with substandard memory protection, it is hard to imagine why anyone would find this solution appealing.
HPE, by comparison, does have some history in transactional systems at high transactional volumes with a completely different CPU, OS and memory architecture, but nothing with Superdome X. HPE has a few benchmarks, however poor, once again on systems from long ago plus mediocre results with the current generation and an architecture that has a minimum of 8-hops round trip for every remote access. On the positive side, at least HPE gets it regarding proper memory protection, but does not address how much performance degradation results from this protection. Once again, SAP’s certification at 16TB for Superdome X must be taken with the same grain of salt as SGI’s.
IBM Power Systems has an outstanding history with transactional systems at very high transactional volumes using current generation POWER8 systems. Power also dominates the benchmark space and continued to deliver better and better results until no competitor dared risk the fight. Lastly, POWER8 is the latest generation of a chip designed from the ground up with ccNUMA optimization in mind and with reliability as its cornerstone, i.e. the results already include any overhead necessary to support this level of RAS. Yes, POWER8 is only supported at 9TB today for SAP SoH and S/4HANA, but lest we forget, it is the new competitor in the HANA market and the other guys only achieved their higher supported numbers after extensive customer and internal benchmark testing, both of which are underway with Power.
Customers that require Suite on HANA (SoH) and S/4HANA systems with 6TB of memory or less will find a wide variety of available options. Those options do not require any specialized type of hardware, just systems that can scale up to 8 sockets with Intel based systems and up to 64 cores with IBM Power Systems (socket count depends on the number of active cores per socket, which varies by system). If you require 6TB or less, or can’t imagine ever needing more, then sizing is a fairly easy process, i.e. look at the sizing matrix from SAP and select a system which meets your needs. If you need to plan for more than 6TB, this is where it gets a bit more challenging. The list of options narrows to 5 vendors between 6TB and 8TB: IBM, Fujitsu, HPE, SGI and Lenovo, and it gets progressively smaller beyond that.
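The narrowing described above can be sketched as a simple lookup. The thresholds and vendor lists below mirror this post only; they are not an official SAP support matrix, so always check the current SAP sizing notes before making decisions:

```python
# Minimal sketch of the scale-up vendor-narrowing logic described above.
# Tiers and vendor names mirror the text of this post; illustrative only.
def scale_up_options(memory_tb):
    if memory_tb <= 6:
        return ["many vendors (standard 8-socket x86 or up-to-64-core Power)"]
    if memory_tb <= 8:
        return ["IBM", "Fujitsu", "HPE", "SGI", "Lenovo"]
    # Beyond 8TB the field narrows to three (covered in part 2).
    return ["IBM (E870/E880)", "HPE (Superdome X)", "SGI (UV 300H)"]

print(scale_up_options(7))
print(scale_up_options(12))
```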
All systems with more than one socket today are ccNUMA, i.e. remote cache, memory and I/O accesses are delivered with more latency and lower bandwidth than accesses local to the processor. HANA is highly optimized for analytics, which most of you probably already know. The way it is optimized may not be as obvious. Most tables in HANA are columnar, i.e. every column in a table is kept in its own structure with its own dictionary and the elements of the column are replaced with a very short dictionary pointer, resulting in outstanding compression in most cases. Each column is placed in as few memory pages as possible, which means that queries which scan through a column can run at crazy fast speeds as all of the data in the column is as “close” as possible to each other. This columnar structure is beautifully suited for analytics on ccNUMA systems since different columns will typically be placed behind different sockets, which means that only queries that cross columns and joins will have to access columns that may not be local to a socket and, even then, usually only the results have to be sent across the ccNUMA fabric. There was a key word in the previous sentence that might have easily been missed: “analytics”. Where analytical queries scan down columns, transactional queries typically go across rows in which, due to the structure of a columnar database, every element is located in a different column, potentially spanning the entire ccNUMA system. As a result, minimized latency and high cross system bandwidth may be more important than ever.
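The columnar, dictionary-encoded layout just described can be sketched in a few lines. This is my own toy illustration, not SAP code; it shows why a scan touches one contiguous structure while a row lookup must visit every column:

```python
# Toy illustration (not SAP code) of dictionary-encoded columnar storage.
# Each column keeps its own dictionary; values are stored as short integer codes.
class Column:
    def __init__(self, values):
        self.dictionary = sorted(set(values))
        index = {v: i for i, v in enumerate(self.dictionary)}
        self.codes = [index[v] for v in values]   # compressed representation

    def decode(self, row):
        return self.dictionary[self.codes[row]]

# An analytic scan touches ONE column's codes, laid out contiguously:
usage = Column([12.5, 30.1, 12.5, 8.0])
avg = sum(usage.dictionary[c] for c in usage.codes) / len(usage.codes)

# A transactional row read must visit EVERY column (each potentially behind
# a different socket) and perform a dictionary decode per column:
city = Column(["Austin", "Boston", "Austin", "Denver"])
meter = Column(["M-17", "M-42", "M-99", "M-07"])
row2 = [col.decode(2) for col in (usage, city, meter)]
print(avg, row2)
```

In a real system, each `Column` could live behind a different socket, which is exactly why the row reconstruction path crosses the ccNUMA fabric while the scan does not.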
Let me stop here and give an example so that I don’t lose the readers that aren’t system nerds like myself. I will use a utility company as an example as everyone is a utility customer. For analytics, an executive might want to know the average usage of electricity on a given day at a given time meaning the query is composed of three elements, all contained in one table: usage, date and time. Unless these columns are enormous, i.e. over 2 Billion rows, they are very likely stored behind a single socket with no remote accesses required. Now, take that same company’s customer care center, where a utility consumer wants to turn on service, report an outage or find out what their last few months or years of bills have been. In this case, all sorts of information is required to populate the appropriate screens, first name, last name, street address, city, state, meter number, account number, usage, billed amount and on and on. Scans of columns are not required and a simple index lookup suffices, but every element is located in a different column which has to be resolved by an independent dictionary lookup/replacement of the compressed elements meaning several or several dozen communications across the systems as the columns are most likely distributed across the system. While an individual remote access may take longer, almost 5x in a worst case scenario[i], we are still talking nanoseconds here and even 100 of those still results in a “delay” of 50 microseconds. I know, what are you going to do while you are waiting! Of course, a utility customer is more likely to have hundreds, thousands or tens of thousands of transactions at any given point in time and there is the problem. An increased latency of 5x for every remote access may severely diminish the scalability of the system.
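The 50-microsecond figure above is simple arithmetic. The baseline latency and 5x penalty below are illustrative assumptions for the sake of the calculation, not measured values for any specific system:

```python
# Back-of-envelope math behind the 50-microsecond figure above.
# The 100ns local latency and 5x remote penalty are assumed, illustrative values.
local_ns = 100              # assumed local memory access latency
remote_ns = 5 * local_ns    # ~5x worst-case remote-access penalty
accesses = 100              # remote accesses for one transaction

delay_us = accesses * remote_ns / 1000
print(f"Per-transaction remote-access delay: {delay_us:.0f} us")
```

Negligible for one transaction; the problem is that this overhead multiplies across thousands of concurrent transactions contending on the same fabric.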
Does this mean that it is not possible to scale up a HANA transactional environment? Not at all, but it does take more than being able to physically place a lot of memory in a system to be able to utilize it in a productive manner with good scalability. How can you evaluate vendor claims then? Unfortunately, the old tried and true SAP SD benchmark has not been made available to run in HANA environments. Lacking that, you could flip a coin, believe vendor claims without proof or demand proof. Clearly, demanding proof is the most reasonable approach, but what proof? There are three types of proof to look at: past history with large transactional workloads, industry standard benchmarks and architecture.
In the over 8TB HANA space, there are three competitors; HPE Superdome X, SGI UV 300H and IBM Power Systems E870/E880. I will address those systems and these three proof points in part 2 of this blog post.
The week before Sapphire, SAP unveiled a number of significant enhancements. VMware 6.0 is now supported for a production VM (notice the lack of a plural); more on that below. Hybris Commerce and a number of apps surrounding SoH and S/4HANA are now supported on IBM Power Systems. Yes, you read that right. The Holy Grail of SAP, S/4, or more specifically 1511 FPS 02, is now supported on HoP. Details, as always, can be found in SAP note: 2218464 – Supported products when running SAP HANA on IBM Power Systems. I say non-announcement because the above SAP note, which I watch on almost a daily basis because it changes so often, was the only place where this was mentioned. This is not a dig at SAP, as this is their characteristic way of releasing updates on availability to previously suggested intentions; it was how they non-announced VMware 6.0 support as well. Hasso Plattner and various SAP executives and employees, in Sapphire keynotes and other sessions, mentioned support for IBM Power Systems in an almost nonchalant manner, clearly demonstrating that HANA on Power has moved from being a niche product to mainstream.
Also of note, Pfizer delivered an ASUG session during Sapphire including significant discussion about their use of IBM Power Systems. I was particularly struck by how Joe Caruso, Director, ERP Technical Architecture at Pfizer, described how Pfizer tested a large BW environment on both a single scale-up Power System with 50 cores and 5TB of memory and on a 6-node x86 scale-out cluster (tested with two different vendors, not mentioned in this session but probably not critical as their performance differences were negligible), 60 cores on each node, with 1 master node and 4 worker nodes plus a hot-standby. After appropriate tuning, including utilizing table partitioning on all systems, including Power, the results were pretty astounding; both environments performed almost identically, executing Pfizer’s sample set, composed of 75+ queries, in 5.7 seconds, an impressive 6 to 1 performance advantage for Power on a per-core basis, not including the hot-standby node. What makes this incredible is that the official BW-EML benchmark only shows an advantage of 1.8 to 1 vs. the best of breed x86 competitor, and another set of BW-EML benchmark results published by another x86 competitor shows scale-out to be only 15% slower than scale-up. For anyone that has studied the Power architecture, especially POWER8, you probably know that it has intrinsics that suggest it should handle mixed, complex and very large workloads far better than x86, but it takes a customer executing against their real data with their own queries to show what this platform can really do. Consider benchmarks to be the rough equivalent of a NASCAR race car, with the best of engineering, mechanics, analytics, etc., vs. customer workloads which, in this analogy, involve transporting varied precious cargo in traffic, on the highway and on sub-par road conditions.
Pfizer decided that the performance demonstrated in this PoC was compelling enough to substantiate their decision to implement using IBM Power Systems with an expected go-live later this year. Also of interest, Pfizer evaluated the reliability characteristics of Power, based in part on their use of Power Systems for conventional database systems over the past few years, and decided that a hot-standby node for Power was unnecessary, further improving the overall TCO for their BW project. I had not previously considered this option, but it makes sense considering how rarely Power Systems are unable to handle predictable, or even unpredictable, faults without interrupting running workloads. Add to this that, for many, the loss of an analytical environment is unlikely to result in significant economic loss.
Also in a Sapphire session, Steve Parker, Director Application Development, Kennametal, shared a very interesting story about their journey to HANA on Power. Though they encountered quite a few challenges along the way, not the least being that they started down the path to Suite on HANA and S/4HANA prior to it being officially supported by SAP, they found the Power platform to be highly stable and its flexibility was of critical importance to them. Very impressively, they reduced response times compared to their old database, Oracle, by 60% and reduced the run-time of a critical daily report from 4.5 hours to just 45 minutes, an 83% improvement and month end batch now completes 33% faster than before. Kennametal was kind enough to participate in a video, available on YouTube at: https://www.youtube.com/watch?v=8sHDBFTBhuk as well as a write up on their experience at: http://www-03.ibm.com/software/businesscasestudies/us/en/gicss67sap?synkey=W626308J29266Y50.
As I mentioned earlier, SAP snuck in a non-announcement about VMware and how a single production VM is now supported with VMware 6.0 in the week prior to Sapphire. SAP note 2315348 describes how a customer may support a single SAP HANA VM on VMware vSphere 6 in production. One might reasonably question why anyone would want to do this. I will withhold any observations on the mindset of such an individual and instead focus on what is, and is not, possible with this support. What is not possible: the ability to run multiple production VMs on a system or to mix production and non-prod. What is possible: the ability to utilize up to 128 virtual processors and 4TB of memory for a production VM, utilize vMotion and DRS for that VM and to deliver DRAMATICALLY worse performance than would be possible with a bare-metal 4TB system. Why? Because 128 vps with Hyperthreading enabled (which just about everyone does) utilizes 64 cores. To support 6TB today, a bare-metal Haswell-EX system with 144 cores is required. If we extrapolate that requirement to 4TB, 96 cores would be required. Remember, SAP previously explained that a minimum overhead of 12% was observed with a VM vs. bare-metal, i.e. those 64 cores under VMware 6.0 would operate, at best, like 56 cores on bare-metal, or 42% less capacity than required. Add to this the fact that you can’t recover any capacity left over on that system and you are left with a hobbled HANA VM and lots of leftover CPU resources. So, vMotion is the only thing of real value to be gained? Isn’t HANA System Replication with a controlled failover a much more viable way of moving from one system to another? Even if vMotion might be preferred, does vMotion move memory pages from source to target system using the EXACT same layout as was implemented on the source system? I suspect the answer is no, as vMotion is designed to work even if other VMs are currently running on the target system, i.e. it will fill memory pages based on availability, not location. As a result, all of the careful CPU/memory affinity that HANA established on the source system would likely be lost, with a potentially huge impact on performance.
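The capacity arithmetic above is worth laying out explicitly. The 12% overhead and the 144-cores-for-6TB figure come from the text; the 4TB extrapolation is mine:

```python
# Capacity arithmetic behind the VMware 6.0 claim above, using figures from
# the text: 128 vps, Hyperthreading (2 threads/core), 12% VM overhead, and
# 144 bare-metal cores required for 6TB (extrapolated linearly to 4TB).
vps, threads_per_core = 128, 2
vm_cores = vps // threads_per_core          # 64 physical cores
effective = round(vm_cores * (1 - 0.12))    # ~56 bare-metal-equivalent cores

bare_metal_6tb_cores = 144
needed_4tb = bare_metal_6tb_cores * 4 / 6   # 96 cores by linear extrapolation

shortfall = 1 - effective / needed_4tb
print(f"effective ~{effective} cores vs {needed_4tb:.0f} needed "
      f"({shortfall:.0%} short)")
```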
So, to summarize, this new VMware 6.0 support promises bad performance and incredibly poor utilization in return for the ability to avoid System Replication, plus even more performance degradation upon movement of a VM from one system to another using vMotion. Sounds awesome, but now I understand why no one at the VMware booth at Sapphire was popping Champagne or in a celebratory mood. (Ok, I just made that up as I did not exactly sit and stare at their booth.)
There seems to be a lot of confusion about the terms “Certified” and “Supported” in the HANA context. Those are not qualitative terms but more of a definition of how solutions are put together and delivered. SAP recognized that HANA was such a new technology, back in 2011, and had so many variables which could impact performance and support, that they asked technology vendors to design appliances which SAP could review, test and ensure that all performance characteristics met SAP’s KPIs. Furthermore, with a comprehensive understanding of what was included in an appliance, SAP could offer a one-stop-shop approach to support, i.e. if a customer has a problem with a “Certified” appliance, just call SAP and they will manage the problem and work with the technology vendor to determine where the problem is, how to fix it and drive it to full resolution.
Sounds perfect, right? Yes … as long as you don’t need to make any modifications as business needs change. Yes … as long as you don’t mind the system running at low utilization most of the time. Yes … as long as the systems, storage and interconnects that are included in the “certified” solution match the characteristics that you consider important, are compatible with your IT infrastructure and allow you to use the management tools of your choice.
So, what is the option? SAP introduced TDI, the Tailored Datacenter Integration approach. It allows customers to put together HANA environments in a more flexible manner using a customer defined set of components (with some restrictions) which meet SAP’s performance KPIs. What is the downside? Meeting those KPIs and problem resolution are customer responsibilities. Sounds daunting, but it is not. Fortunately, SAP doesn’t just say, go forward and put anything together that you want. Instead, they restrict servers and storage subsystems to those for which internal or external performance tests have been completed to SAP standards. This allows reasonable ratios to be derived, e.g. the memory to core ratio for various types of systems and HANA implementation choices. Some restrictions do apply, for example Intel Haswell-EX environments must utilize systems which have been approved for use in appliances and Haswell-EP and IBM Power Systems environments must use systems listed on the appropriate “supported” tabs of the official HANA infrastructure support guide.[i] Likewise, Certified Enterprise storage subsystems are also listed, but this does not rule out the use of internal drives for TDI solutions.
Any HANA solution, whether an appliance or a TDI defined system, is equally capable of handling a HANA workload which falls within the maximums that SAP has identified. SAP will support HANA on any of the above. As to full solution support, as mentioned previously, this is a customer responsibility. Fortunately, vendors such as IBM offer a one-stop-shop support package. IBM calls its package Custom Technical Support (US) or Total Solution Service (outside of US). Similar to the way that SAP supports an appliance, with this offering, a customer need call only one number for support. IBM’s support center will then work with the customer to do problem determination and problem source identification. When problems are determined to be caused by IBM, SAP or SUSE products, warm transfers are made to those groups. The IBM support center stays engaged even after the warm transfer occurs to ensure the problem is resolved and the resolution delivered to the customer. In addition, customers may benefit from optional proactive services (on-site or remote) to analyze the system in order to receive recommendations to keep the system up to date and/or to perform necessary OS, firmware or hardware updates and upgrades. With these proactive support offerings, customers can ensure that their HANA systems are maintained in line with SAP’s planned release calendar and are fully prepared for future upgrades.
There are a couple of caveats however. Since TDI permits the use of the customer’s preferred network and storage vendors, these sorts of support offerings typically encompass only the vendors’ products that are within the scope of the warm transfer agreements of each offering vendor. As a result, a problem with a network switch or a third-party storage subsystem for which the proactive support vendor does not have a warm transfer support agreement would still be the responsibility of the customer.
So, should a customer choose a “certified” solution or a TDI supported solution? The answer depends on the scope of the HANA implementation, the customer’s existing standards, skills and desire to utilize them, the flexibility with resource utilization and instance placement desired and, of course, cost.
Scope – If HANA is used as a side-car, a small BW environment, or perhaps for Business One, an appliance can be a viable option, especially if the HANA solution will be located in a setting for which local skilled personnel are not readily available. If, however, the HANA environment is more complex, e.g. BW scale-out, SoH, S/4, large, etc., and located in a company’s main data centers with properly skilled individuals, then a TDI supported approach may be more desirable.
Standards – Many customers have made large investments in network infrastructure, storage subsystems and the tools and skills necessary to manage them. Appliances that include components which are not part of those standards not only bring in new devices that are unfamiliar to the support staff, but may be largely invisible to the tools currently in use.
Flexibility – Appliances are well defined, single purpose devices. That definition includes a fixed amount of memory, CPU resources, I/O adapters, SSD and/or HDD devices. Simple to order, inflexible to change. If a different amount of any of the above resources is desired, in theory, any change permitted by the offering vendor moves the device from an SAP supported appliance to a TDI supported configuration, instantly requiring the customer to accept responsibility for everything just as quickly. By comparison, a TDI supported solution starts out as a customer responsibility, meaning it has been tailored around the customer’s standards and can be modified as desired at any time. All that is required for support is to run SAP’s HWCCT (Hardware Configuration Check Tool) to ensure that the resulting configuration still meets all SAP KPIs. As a result, if a customer desires to virtualize, mixing multiple production, non-production or even non-SAP workloads (when supported by the chosen virtualization solution; see my recently published blog post on VMware and HANA), a TDI solution, vendor and technology dependent, supports this by definition; an appliance does not. Likewise, changes in capacity, e.g. physical addition/removal of components or logical changes of capacity (often called Capacity on Demand and VM resizing), are fully supported with TDI, not with appliances. As a result, once a limit is reached with an appliance, either a scale-out approach must be utilized, in the case of analytics workloads that support scale-out, or the appliance must be decommissioned and replaced with a larger one. A TDI solution with available additional capacity, or the ability to add more, gives the customer the ability to upgrade in place, thereby providing greater investment protection.
Cost – An appliance trades simplicity for potentially higher TCO, as it is designed to meet the above-mentioned KPIs without a comprehensive understanding of what workload will be handled by said appliance, often resulting in dramatic over-capacity, e.g. using dozens of HDDs to meet disk throughput requirements. By comparison, customers with existing enterprise storage subsystems may need only a few additional SSDs and/or HDDs to meet those KPIs with limited incremental cost to infrastructure, environmentals and support. Likewise, the ability to use fewer, potentially larger systems with the ability to co-locate production, non-prod, app servers, non-SAP VMs, HA, etc., can result in significant reductions in systems footprints, power, cooling, management and associated costs.
IBM Power Systems has chosen to take the TDI-only approach as a direct result of feedback received from customers, especially enterprise customers, that are used to managing their own systems, have available skills, have prevalent IT standards, etc. HANA on Power is based on virtualization, so is designed, by default, to be a TDI based solution. HANA on Power allows for one or many HANA instances, a mixture of prod and potentially non-prod or non-HANA to share the system. HA and DR can be mixed with other prod and non-prod instances.
I am often asked about t-shirt sizes, but this is a clear indication that the individual asking the question has a mindset based on appliance sizing. TDI architecture involves landscape sizing and encompasses all of the various components required to support the HANA and non-HANA systems, so a t-shirt sizing would end up being completely misleading unless a customer only needs a single system, no HA, no DR, no non-prod, no app servers, etc.
Recently, SAP updated their SAP notes regarding the ability to run multiple production HANA VMs with VMware. On the surface, this sounds like VMware has achieved parity with IBM’s PowerVM, but the reality could not be much farther away from that perception. This is not to say that users of VMware for HANA will see no improvement. For a few customers, this will be a good option, but as always, the devil is in the details and, as always, I will play the part of the devil.
Level of VMware supported: 5.5 … still. VMware 6.0 is supported only for non-production.[i] If VMware 6.0 is so wonderful and they are such “great” partners with SAP, it seems awfully curious why a product announced on Feb 3, 2015 is still not supported by SAP.
Maximum size of each production HANA instance: 64 virtual processors and 1TB of memory, however this translates to 32 physical processors with Hyperthreading enabled and sizing guidelines must still be followed. Currently, BW HANA is sized @ 2TB for 4 Haswell chips, i.e. 28.4GB/core which translates to a maximum size of 910GB for a 32 core VM/64 vp, so slightly less than the 1TB supported. Suite on HANA on Intel is supported at 1.5x higher memory ratio than BW, but since the size of the VM is limited to 1TB, this point is largely moot.
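That sizing arithmetic can be sketched in a few lines. This is purely illustrative, using the figures stated above (2TB supported for BW on a 4-socket/72-core Haswell server, 64 vCPU VM limit, 2 vCPUs per physical core with Hyperthreading); it is not an official SAP sizing tool.

```python
# Sketch of the BW-on-HANA VM sizing arithmetic described above.
# Figures (2 TB supported on a 4-socket Haswell server, 18 cores per
# socket, 64 vCPU VM limit with Hyperthreading) come from the text;
# this is illustrative only, not an official SAP sizing tool.

BW_MEMORY_GB = 2048                  # supported BW memory, 4-socket Haswell
TOTAL_CORES = 4 * 18                 # 4 chips x 18 cores
gb_per_core = BW_MEMORY_GB / TOTAL_CORES          # ~28.4 GB/core

vm_physical_cores = 64 // 2          # 64 vCPUs = 32 cores with Hyperthreading
max_bw_vm_gb = vm_physical_cores * gb_per_core

print(f"{gb_per_core:.1f} GB/core -> max BW VM = {max_bw_vm_gb:.0f} GB")
# prints: 28.4 GB/core -> max BW VM = 910 GB
```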
Performance impact: At a minimum, SAP estimates a 12% performance degradation compared to bare metal (upon which most benchmarks are run and from which most sizings are based), so one would logically conclude that the memory/CPU ratio should be reduced by the same amount. The 12% performance impact, but not the reduced sizing effect that I believe should result from it, is detailed in a SAP note.[ii] It goes on to state “However, there are around 100 low-level performance tests in the test suite exercising various HANA kernel components that exhibit a performance degradation of more than 12%. This indicates that there are particular scenarios which might not be suited for HANA on VMware.” Only 100? When you consider that the only like-for-like published benchmarks using VMware and HANA[iii] showed a 12% degradation (coincidence? I think not) for a single-VM HANA system vs. bare metal, it leaves one to wonder what sort of degradation might occur in a multiple-VM HANA environment. No guidance is provided on this, which should make anyone other than a bleeding edge customer with no regard for SLAs VERY cautious. Another SAP note[iv] goes on to state, “For optimal VM performance, VMware recommends to size the VMs within the NUMA node boundaries of the specific server system (CPU cores and local NUMA node memory).” How much impact? Not provided here. So, either you size your VMs to fit within NUMA building blocks, i.e. a single 18-core socket, or you suffer an undefined performance penalty. It is also interesting to note what VMware said in Performance Best Practices for VMware vSphere® 5.5: “Be careful when using CPU affinity on systems with hyper-threading. 
Because the two logical processors share most of the processor resources, pinning vCPUs, whether from different virtual machines or from a single SMP virtual machine, to both logical processors on one core (CPUs 0 and 1, for example) could cause poor performance.” That certainly gives me the warm and fuzzy!
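As a sanity check, the ~12% figure lines up with the two published BW-EML results cited in footnote [iii] (identical HP DL580 Gen8 hardware, bare metal vs. ESX 5.5); a quick calculation:

```python
# Cross-check of the ~12% VMware degradation using the two BW-EML
# benchmark results cited in footnote [iii]: identical HP DL580 Gen8
# hardware, bare metal vs. VMware ESX 5.5.

bare_metal_steps = 126_980   # 03/26/2014 result, bare metal
vmware_steps = 111_850       # 06/02/2014 result, under ESX 5.5

degradation = 1 - vmware_steps / bare_metal_steps
print(f"{degradation:.1%}")
# prints: 11.9%
```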
Multiple VM support: Yes, you can now run multiple production HANA VMs on a system[v]. HOWEVER, “The vCPUs of a single VM must be pinned to physical cores, so the CPU cores of a socket get exclusively used by only one single VM. A single VM may span more than one socket, however. CPU and Memory overcommitting must not be used.” This is NOT the value of virtualization, but of physical partitioning, a wonderful technology if we were living in the 1990s. So, if you have an 8-socket system, you can run up to 8 simultaneous production VMs as long as all VMs are smaller than 511GB for BW or 767GB for SoH. Need 600GB for BW? That will cost you a second full socket even though you only need a few of its cores, thereby reducing the maximum number of VMs you can support on the system. And this is before we take the 12% performance impact detailed above into consideration, which could further limit the memory per core and the number of VMs supported.
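The per-socket math behind those VM size limits can be sketched as follows, using the ~28.4GB/core BW ratio and the 1.5x SoH factor discussed earlier; again, illustrative arithmetic only, not official SAP guidance.

```python
import math

# Per-socket VM sizing under VMware's socket-exclusive pinning rule,
# using the ~28.4 GB/core BW ratio discussed earlier. Illustrative
# arithmetic only -- not official SAP sizing guidance.

GB_PER_CORE_BW = 28.4
CORES_PER_SOCKET = 18
SOH_FACTOR = 1.5                       # SoH is sized at 1.5x the BW ratio

bw_per_socket = GB_PER_CORE_BW * CORES_PER_SOCKET                # ~511 GB
soh_per_socket = GB_PER_CORE_BW * SOH_FACTOR * CORES_PER_SOCKET  # ~767 GB

# A 600 GB BW VM exceeds one socket's worth of memory, so it must be
# pinned across two full sockets -- consuming all 36 cores even if the
# workload needs only a few of them.
sockets_for_600gb_bw = math.ceil(600 / bw_per_socket)

print(round(bw_per_socket), round(soh_per_socket), sockets_for_600gb_bw)
# prints: 511 767 2
```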
Support for mixed production HANA VMs and non-production: Not included in any of the above mentioned SAP notes. One can infer from the above notes that this is not permitted meaning that there is no way to harvest unused cycles from production for the use of ANY other workload, whether non-prod or non-HANA DB.
Problem resolution: SAP Note 1995460 details the process by which problems may be resolved and while they are guided via SAP’s OSS system, there is transfer of ownership of problems when a known VMware related fix is not available. The exact words are: “For all other performance related issues, the customer will be referred within SAP’s OSS system to VMware for support. VMware will take ownership and work with SAP HANA HW/OS partner, SAP and the customer to identify the root cause. Due to the abstraction of hardware that occurs when using virtualization, some hardware details are not directly available to SAP HANA support.” and of a little more concern “SAP support may request that additional details be gathered by the customer or the SAP HANA HW partner to help with troubleshooting issues or VMware to reproduce the issue on SAP HANA running in a bare metal environment.”
My summary of the above: One or more production HANA instances may be run under VMware 5.5 with a maximum of 64 vp/32 pp (assuming Hyperthreading is enabled) and a minimum of 12% performance degradation with potential proportionate impact on sizing, with guidance that some scenarios “might not be suited” for HANA on VMware, with potential performance issues when VMs cross socket boundaries, with physical partitioning at the socket level, no sharing of CPU resources, no support for running non-production on the same system to harvest unused cycles and a potential requirement to reproduce issues on a bare metal system if necessary.
Yes, that was a long, run-on sentence. But it begs the question: just when would VMware be a good choice for hosting one or more production HANA instances? My take is that unless you have very small instances which are unsuitable for HANA MDC (multitenancy) or are a cloud provider for very small companies, there is simply no value in this solution. For those potential cloud providers, the target customer set would include companies with very small HANA requirements and a willingness to accept an SLA that is very flexible on performance targets while using a shared infrastructure, on which any of a wide variety of issues in one VM could cause the whole system to fail, impacting multiple customers simultaneously.
And in case anyone is concerned that I am simply the bearer of bad news, let me remind the reader that IBM Power Systems with PowerVM is supported by SAP with up to 4 production HANA VMs (on the E870 and E880; 3 on all other HANA supported Power Systems) with granularity at the core level, no restrictions on NUMA boundaries, the ability to have a shared pool in place of one of the above production VMs with any number of non-production HANA VMs up to the limits of PowerVM which can utilize unused cycles from the production VMs, no performance penalties, no caveats about what types of workloads are well suited for PowerVM, excellent partition isolation preventing the vast majority of issues that could occur in one VM from affecting any others, and no problem resolution handoffs or ownership changes.
In other words, if customers want to continue the journey around virtualization and server consolidation that they started in the early 2000s, want to have a very flexible infrastructure which can grow as they move to SoH, shrink as they move to S/4, grow as they consolidate more workloads into their primary instance, shrink again as they roll off data using data tiering, data aging or perhaps Hadoop and all without having to take significant system outages or throw away investment and purchase additional systems, IBM Power Systems with PowerVM can support this; VMware cannot.
[iii] Benchmark detail for bare metal and VMware 5.5 based runs from http://global.sap.com/solutions/benchmark/bweml-results.htm:
06/02/2014 HP 2,000,000,000 111,850 SuSE Linux Enterprise Server 11 on VMWARE ESX 5.5 SAP HANA 1.0 SAP NetWeaver 7.30 1 database server: HP DL580 Gen8, 4 processors / 60 cores / 120 threads, Intel Xeon Processor E7-4880 v2, 2.50 GHz, 64 KB L1 cache and 256 KB L2 cache per core, 30 MB L3 cache per processor, 1024 GB main memory
03/26/2014 HP 2,000,000,000 126,980 SuSE Linux Enterprise Server 11 SAP HANA 1.0 SAP NetWeaver 7.30 1 database server: HP DL580 Gen8, 4 processors / 60 cores / 120 threads, Intel Xeon Processor E7-4880 v2, 2.50 GHz, 64 KB L1 cache and 256 KB L2 cache per core, 30 MB L3 cache per processor, 1024 GB main memory
[v] 2024433 – Multiple SAP HANA VMs on VMware vSphere in production