August 2019

For those racing to be first across the line with advancements in autonomous vehicles, healthcare or the private sector, time to market is typically measured by how fast machine learning models are trained, and often by how big the data set driving the trained model is. In other words, more data leads to more accurate models, and high-performance computing (HPC) leads to faster training.

The Rise of the Data-Centric Data Center

In today's data-centric, sensor-filled universe, data throughput and greater scalability for data analysis have gone beyond business-critical. The applications for HPC and artificial intelligence are vast, from image and voice recognition and autonomous systems to advanced weather prediction, medical research and more, but what they all share is the use of data, and lots of it.

While moving data is critical for both HPC and AI, equally important is the ability to compute and consume results in real time. Both HPC and AI require the ability to exchange messages and compute results efficiently across thousands of parallel processes. In yesterday's data center, data had to be moved to the compute element before it could be processed. Today, the CPU has reached the limits of its scalability for HPC and AI workloads, and modern data centers are instead built on a data-centric architecture. With this approach to scale-out, the goal is to work on data wherever it exists in the data center, including in the network fabric itself.

Given the CPU's limitations, several types of compute accelerators (such as GPUs and FPGAs) have emerged to address the need to process vast amounts of data. Today's server real estate, which might be occupied by 16 or more accelerators in a single server, now requires a network that delivers data at cutting-edge speed and ultra-low latency. This transition to the new data-centric architecture requires networks to be fast and efficient.
Moving data at 200Gb/s is already commonplace. Hand in hand with high throughput, low latency and native RDMA capabilities goes the ability to offload as many functions as possible from the CPU, GPU, FPGA or other accelerator, enabling compute elements to focus on their primary role of computation rather than managing network operations. Mellanox Technologies, for example, is renowned as the fastest, most efficient interconnect solution provider, with a primary focus on network offload architecture and flexible topologies for any deployment size. If you already have Mellanox hardware in your data center, you are probably already taking advantage of the next evolution in data-centric architecture, one that will lead us to the exascale-class generation of HPC and AI deployments.

Welcoming the Era of In-Network Computing

Analyzing today's growing volumes of data and supporting ever more complex, higher-fidelity simulations depends on the ability to manage and carry out computational operations on data as it is being transferred by the data center interconnect. Working as a powerful co-processor, the network fabric elements allow data-driven algorithms to be executed within the fabric itself. This is where Mellanox's leadership plays such an important role, for both HPC and AI. While in-network computing might seem a novel concept, it is widely deployed in the world's most powerful supercomputers. In fact, Mellanox accelerates the world's top three supercomputers and consistently develops new capabilities that improve performance for the most challenging simulations in scientific research. For modern workloads such as deep learning, Mellanox InfiniBand provides the highest performance and scalability and has already become the de facto standard interconnect of choice.
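To picture the in-network computing idea, imagine switches combining partial results on the way to the destination, so the root receives one aggregated value instead of one message per compute node. The following is a minimal pure-Python sketch of that tree-style aggregation, an illustration of the concept only, not Mellanox's actual protocol; the topology and function names are hypothetical:

```python
# Toy model of in-network aggregation: each "switch" sums the partial
# results of its children before forwarding a single value upstream.
# Illustrative sketch only; names and topology are invented.

def aggregate(tree, values, root="root"):
    """Recursively reduce leaf values up a tree of nodes.

    tree   -- dict mapping a node to its list of children
    values -- dict mapping each leaf (compute node) to its local result
    """
    def reduce_node(node):
        children = tree.get(node, [])
        if not children:                      # leaf: contribute local value
            return values[node]
        # A "switch" combines its children's results in-network,
        # sending just one message upstream instead of one per leaf.
        return sum(reduce_node(c) for c in children)
    return reduce_node(root)

# Eight compute nodes hanging off two leaf switches under a root switch.
topology = {
    "root": ["sw0", "sw1"],
    "sw0": ["n0", "n1", "n2", "n3"],
    "sw1": ["n4", "n5", "n6", "n7"],
}
partials = {f"n{i}": i + 1 for i in range(8)}  # e.g. local gradient sums

print(aggregate(topology, partials))  # prints 36 (the sum 1+2+...+8)
```

The payoff modeled here is that the root handles two incoming messages instead of eight; in a real fabric that reduction in traffic and endpoint work is what frees the CPU or GPU for computation.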
Mellanox's HDR 200Gb/s InfiniBand solutions provide this in-network computing capability, along with native RDMA, GPUDirect and numerous other acceleration engines and storage offloads. The latest Scalable Hierarchical Aggregation and Reduction Protocol (SHARP) engines accelerate both deep learning applications and HPC workloads. In a technical preview in March 2019, Mellanox SHARP technology, in combination with NVIDIA's Collective Communications Library (NCCL), set new performance records, doubling deep learning operations performance. By eliminating the overhead of involving the processor in network communication, Mellanox's interconnect is critical to achieving scalable performance. Combined with an open-standards-based network and communication libraries and frameworks such as NCCL and MPI, coupled with TensorFlow, Caffe, Microsoft CNTK or any leading AI framework, Mellanox hardware under the hood of your data center gives you an exceptional advantage over the rest.

In an age of exponential data growth and the rapid convergence of high-performance computing (HPC) and AI, organizations are adapting to keep pace with technological developments.
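The collective operation that NCCL and MPI perform for deep learning, and that SHARP accelerates by reducing in the switches, is allreduce: every rank ends up holding the reduction of all ranks' inputs, which is how distributed training averages gradients. Purely as a semantic sketch (no real network; the function name is hypothetical), allreduce can be modeled as an elementwise reduction followed by a broadcast:

```python
# Semantic model of allreduce: after the call, every rank holds the
# elementwise reduction of all ranks' buffers. Real libraries (NCCL, MPI)
# do this over the network; SHARP moves the reduction step into the
# switches. Illustrative sketch only.

def allreduce(rank_buffers):
    """Return each rank's buffer after an elementwise sum-allreduce."""
    n_elems = len(rank_buffers[0])
    # Reduce: combine element i across every rank's buffer.
    reduced = [sum(buf[i] for buf in rank_buffers) for i in range(n_elems)]
    # Broadcast: every rank receives the same reduced buffer.
    return [list(reduced) for _ in rank_buffers]

# Four ranks each hold a local gradient vector of two elements.
grads = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
print(allreduce(grads))  # every rank sees [16.0, 20.0]
```

In a framework such as TensorFlow running over NCCL, this is the step taken once per training iteration on large gradient buffers, which is why offloading its reduction stage to the interconnect pays off at scale.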