Michael Thompson, Director, Systems Management Product Marketing, SolarWinds
If you listen to vendors such as VMware and Microsoft, you would think that virtualizing mission-critical applications such as transaction processing and CRM should be an easy decision, given the improved hardware utilization and increased flexibility that virtualization provides. Impressive performance and capacity improvements in both VMware vSphere and Microsoft Hyper-V have also alleviated many of the basic resource concerns about CPU, memory and storage input/output operations per second (IOPS). In addition, high availability, failover and clustering can limit downtime when failures occur. Still, businesses are cautious about the change. One reason is that despite all these advances, systems still go down, and when they do, it can be expensive.
Systems still go down because the most common problems are not hardware or virtual machine failures. According to Gartner, “Through 2015, 80 percent of outages impacting mission-critical services will be caused by people and process issues, and more than 50 percent of those outages will be caused by change/configuration/release integration and hand-off issues.” In practice, minor human errors are quickly turning into expensive and embarrassing business problems. That is why it is extremely difficult to fully protect these systems simply by increasing capacity or implementing redundant systems.
As a result, just as with any other critical risk that is hard to predict, businesses are looking for an “insurance policy,” because the stakes of virtualizing critical applications can include financial survival or the loss of a good reputation. While you could probably buy a literal insurance policy for such catastrophic IT events, I’m sure it would be expensive and would probably be cashed in while you, the CIO, are out looking for a new job. Instead, there are more practical steps IT can take to help prevent or mitigate outages when virtualizing mission-critical applications. In particular, a combination of three approaches can significantly reduce the risk.
Leverage Appropriate HA, Failover Technology, and Procedures
High availability (HA) and failover technologies can increase costs, not only for the HA or failover software itself but also for the redundant hardware and software licenses that standby systems require. However, cloud solutions and the reduced cost of standby systems at many companies can help offset this. That said, while technology can provide much of the solution, you can’t forget the people and procedures portion of the equation: a surprising number of companies that purchase or set up failover systems never test them. Alternatively, you can use virtualization technology to build a poor man’s HA system from regular snapshots, backups and the ongoing maintenance of a set of golden master images.
Implement Virtualization-Appropriate Change Management
Old-school process management approaches that lock down all changes except those approved by a 20-person change committee meeting once a month can be impossible to apply in a dynamic virtualization environment, which is built on the ability to create, destroy or modify things in seconds. However, it is possible to take change management concepts and apply them to virtual environments, including roll-back plans, snapshots and approvals for high-risk activities.
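To make the idea concrete, here is a minimal sketch of a lightweight change gate for virtual machines. The names (ChangeRequest, Risk, can_proceed) and the policy itself are hypothetical assumptions, not features of any particular product: every change must have a snapshot to roll back to, and only high-risk changes additionally need a written roll-back plan and an approval.

```python
from dataclasses import dataclass, field
from enum import Enum


class Risk(Enum):
    """Hypothetical two-level risk rating for a change."""
    LOW = 1
    HIGH = 2


@dataclass
class ChangeRequest:
    """A proposed change to a virtual machine (illustrative fields only)."""
    vm_name: str
    description: str
    risk: Risk
    rollback_plan: str = ""
    snapshot_taken: bool = False
    approvers: list[str] = field(default_factory=list)


def can_proceed(change: ChangeRequest, required_approvals: int = 1) -> tuple[bool, list[str]]:
    """Return (allowed, reasons_blocked) under the assumed policy.

    Assumed policy: snapshots are cheap in a virtual environment, so every
    change needs one; roll-back plans and approvals are reserved for
    high-risk changes so routine work is not slowed down.
    """
    problems: list[str] = []
    if not change.snapshot_taken:
        problems.append("no snapshot to roll back to")
    if change.risk is Risk.HIGH:
        if not change.rollback_plan:
            problems.append("high-risk change has no roll-back plan")
        if len(change.approvers) < required_approvals:
            problems.append(
                f"needs {required_approvals} approval(s), has {len(change.approvers)}"
            )
    return (not problems, problems)
```

A routine change such as adding a vCPU passes as soon as a snapshot exists, while a database engine upgrade on the same VM stays blocked until someone records a roll-back plan and signs off. The point of the design is that the approval step scales with risk rather than applying to every change.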
Leverage a Management System with End-to-End Visibility and Configuration Tracking
No matter how much technology gets implemented, a critical line of defense is the ability to monitor and quickly troubleshoot problems when they occur, ideally before they impact end users. While many tools provide siloed views of a single domain (e.g., application, virtualization or storage), when virtualizing mission-critical applications it is especially important to see performance data and root-cause information across these silos in the context of the application being virtualized. Tools that provide this integrated, cross-silo visibility help ensure that contention for shared resources, or changes in any layer, will not impact business-critical applications. Additionally, the ability to track historical configuration changes and associate them with event data is essential for finding problems that result from human error rather than hardware or software failure.
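Associating configuration history with event data can be as simple as asking “what changed shortly before this alert, on this component or anything it depends on?” The sketch below assumes a hypothetical in-memory list of changes, a fixed look-back window, and a caller-supplied list of related components (for example the host and datastore under an application VM) to stand in for cross-silo context. A real management system would draw these from its own inventory and change logs.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, Optional


@dataclass(frozen=True)
class ConfigChange:
    timestamp: datetime
    component: str  # e.g. a VM, host, datastore or application name (hypothetical)
    detail: str


@dataclass(frozen=True)
class Event:
    timestamp: datetime
    component: str
    message: str


def suspect_changes(
    event: Event,
    changes: Iterable[ConfigChange],
    window: timedelta = timedelta(hours=4),
    related: Optional[Iterable[str]] = None,
) -> list[ConfigChange]:
    """Return changes made within `window` before `event`, newest first.

    Only changes to the event's own component or to the `related`
    components (its dependencies in other silos) are considered.
    """
    scope = {event.component} | set(related or ())
    start = event.timestamp - window
    hits = [
        c for c in changes
        if c.component in scope and start <= c.timestamp <= event.timestamp
    ]
    # Most recent changes first: they are usually the likeliest culprits.
    return sorted(hits, key=lambda c: c.timestamp, reverse=True)
```

For example, a latency alert on an application VM might surface a storage I/O limit that someone lowered on the underlying datastore an hour earlier, which a tool looking only at the VM would never show.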
As more companies look to virtualize mission-critical applications, CIOs and their IT teams need to carefully evaluate both the technologies and the procedures they use to make sure these systems remain highly available. Given the potential business risks, companies will need to combine HA and failover planning, change management, and a management system that provides visibility and configuration tracking to give both the business and their careers an extra insurance policy.