Much of my professional career has been devoted to the design and development of storage software and systems. I started this journey almost by accident as a graduate student at the University of Virginia. At the time, I was doing research in high-performance computing and kept bumping into the performance and scalability limitations of memory and storage systems, which ultimately led me to focus my work in those areas.
Back then, data storage was viewed as basic plumbing and received little press. Fast forward 25 years and, with the amount of electronic data now doubling every two years, storage is one of the most exciting areas in IT! Case in point – I walked the show floor at VMworld 2014 in San Francisco and found that the vast majority of products, demos and talks revolved around storage software and systems.
So, outside of raw data growth, what are the key drivers of innovation in storage today? I believe they are the emergence of cloud-based IT, the agility of software-defined storage, and the disruptive nature of flash storage and other forms of non-volatile memory.
Cloud-based IT builds on the hardware virtualization provided by hypervisors such as VMware ESXi, Microsoft Hyper-V, and Linux KVM, as well as the OS virtualization provided by container solutions like Docker and Rocket. Layered on top of this virtualization technology is an orchestration service for provisioning, monitoring and managing virtualized servers and applications. This basic architecture applies to both private and public cloud environments.
Cloud-based IT places unique requirements on the storage software and systems that store virtual machines (VMs) and their associated virtual disks (VDs). First, to integrate with orchestration services, these storage solutions must provide REST-based APIs or similar mechanisms for automating all facets of storage management. Second, because multiple VMs and VDs can reside on a single block storage device or NAS share, these storage solutions must be able to both interpret VM and VD formats and track the I/Os associated with a given VD in order to provide per-VM and per-VD management and performance monitoring. This capability is often referred to as being “VM-aware.” Finally, because having multiple VDs reside on a single block device or NAS share greatly increases both the intensity and randomness of the I/Os observed by that device or share, these storage solutions must deliver high IOPS and low latency. This interleaving of VM I/Os is often referred to as the “I/O blender effect” and is where flash storage can come to the rescue, as I’ll discuss later.
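To make the I/O blender effect concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not a model of any real hypervisor: each hypothetical VM reads its own virtual disk strictly sequentially, the virtual disks occupy separate regions of one shared device, and the device sees the requests interleaved round-robin. The function names and the 1,000,000-block spacing are invented for the example.

```python
def vm_stream(base, length):
    """Sequential block addresses issued by one hypothetical VM."""
    return [base + i for i in range(length)]


def blend(streams):
    """Round-robin interleave of per-VM streams (an assumed arrival order)."""
    blended = []
    for group in zip(*streams):
        blended.extend(group)
    return blended


def sequential_fraction(addresses):
    """Fraction of requests that directly follow the previous block address."""
    if len(addresses) < 2:
        return 1.0
    pairs = zip(addresses, addresses[1:])
    hits = sum(1 for prev, cur in pairs if cur == prev + 1)
    return hits / (len(addresses) - 1)


# Eight VMs, each with its own region of the shared device (assumed layout).
streams = [vm_stream(base=vm * 1_000_000, length=1000) for vm in range(8)]

single = sequential_fraction(streams[0])      # what one VM issues
shared = sequential_fraction(blend(streams))  # what the shared device sees

print(f"sequential fraction, single VM:     {single:.2f}")
print(f"sequential fraction, shared device: {shared:.2f}")
```

Every VM in this sketch is perfectly sequential on its own, yet the shared device never sees two adjacent blocks back to back. Real arrival orders are less regular than round-robin, but the tendency is the same: the more VDs share a device, the more random the workload looks to it.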
Software-defined storage (SDS) refers to storage solutions comprising storage software running on commodity servers and using direct-attached HDD and SSD storage devices. The concept of SDS is not new and has its roots in the NFS storage software originally developed by Sun Microsystems (now Oracle) for servers running SunOS, as well as the AFS and DCE/DFS storage software commercially developed by Transarc Corporation (now IBM) for servers running multiple variants of Unix. Full disclosure: I previously worked on both AFS and DCE/DFS. SDS has re-emerged as a major trend because it provides an opportunity to reduce the total cost of storage solutions and because it is a natural fit for cloud-based IT, where it enables a complete storage system to be provisioned and managed as easily as any other application. Now that’s agility!
Finally, flash storage and other forms of non-volatile memory (NVM) represent the greatest disruption to storage software and systems since the advent of the hard disk drive nearly 60 years ago. It’s no wonder that flash storage is generating so much excitement in the IT industry!
Today, most IT administrators encounter flash storage in the form of SATA and SAS SSDs. Packaged this way, flash storage is a direct replacement for HDDs but with a level of performance that far exceeds spinning media. For example, the Micron M500DC SATA SSD can deliver random read/write performance of 63,000/35,000 IOPS, respectively. Compare that with a high-end 15K RPM SAS HDD delivering 500 IOPS of random read/write performance and you can see how SSDs come to the rescue in addressing the I/O blender effect in virtualized workloads.
PCIe SSDs, which connect directly to the PCIe bus without the overheads associated with SATA and SAS, can deliver even higher levels of performance. For example, the Micron P420m PCIe SSD can deliver random read/write performance of 750,000/95,000 IOPS, respectively. This is clearly a disruptive technology and is forcing storage software and operating systems to evolve to keep pace.
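As a rough back-of-the-envelope check on these numbers, the following Python sketch estimates how many 500 IOPS HDDs it would take to match each SSD on random IOPS alone. It uses only the figures quoted above and deliberately ignores RAID overhead, latency, queue depth, and capacity, so treat the results as illustrative rather than as sizing guidance. The helper name hdds_to_match is made up for this example.

```python
import math

# Random IOPS figures as quoted in the text above.
HDD_IOPS = 500
SSDS = {
    "Micron M500DC (SATA)": (63_000, 35_000),   # (read, write)
    "Micron P420m (PCIe)": (750_000, 95_000),   # (read, write)
}


def hdds_to_match(ssd_iops, hdd_iops=HDD_IOPS):
    """Number of HDDs needed to reach ssd_iops, counting random IOPS only."""
    return math.ceil(ssd_iops / hdd_iops)


for name, (read_iops, write_iops) in SSDS.items():
    reads = hdds_to_match(read_iops)
    writes = hdds_to_match(write_iops)
    print(f"{name}: {reads} HDDs for reads, {writes} HDDs for writes")
```

On these figures, a single SATA SSD stands in for well over a hundred HDDs on random reads, and a single PCIe SSD for more than a thousand.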
In addition to performance, SSDs bring other benefits by removing all of the mechanicals associated with a spinning disk. A 99% reduction in power consumption, measured in watts per I/O, is typical in mixed workloads, and SSDs are in an active state for far less time than HDDs, further reducing power consumption and heat generation. SSDs also come in alternative form factors such as mSATA, M.2, and PCIe add-in cards, providing greater flexibility in how storage is incorporated into servers and other computing platforms.
Positioning SSDs as a direct replacement for HDDs represents only the starting point for introducing flash storage into the IT environment. To get the most out of flash, we need to develop storage software and operating systems that understand flash topology and behavior and are designed to work with it, rather than hiding flash behind the legacy Logical Block Addressing model defined for HDDs. In doing so, I believe we can get 1.5-2X more performance and endurance than can be achieved today with PCIe SSDs.
Cloud-based IT, SDS, and (perhaps most of all) flash and other forms of NVM are the key drivers of innovation in storage today. So, while the start of my journey into the world of storage software and systems may have been an accident, staying on that path has been a very conscious decision that continues to present exciting opportunities each day.