InfiniBand is a powerful new architecture designed to support I/O connectivity for the Internet infrastructure. InfiniBand is supported by all the major OEM server vendors as a means to expand beyond legacy I/O and create the next-generation I/O interconnect standard in servers. For the first time, a high-volume, industry-standard I/O interconnect extends the role of traditional “in the box” buses. InfiniBand is unique in providing an “in the box” backplane solution, an external interconnect, and “Bandwidth Out of the box” in a single architecture; it thus provides connectivity in a way previously reserved for traditional networking interconnects. This unification of I/O and system area networking requires a new architecture that supports the needs of these two previously separate domains.
Underlying this major I/O transition is InfiniBand’s ability to support the Internet’s requirement for RAS: reliability, availability, and serviceability. This white paper discusses the features and capabilities that demonstrate InfiniBand’s superior ability to support RAS relative to the legacy PCI bus and to proprietary switch fabric and I/O solutions. Further, it provides an overview of how the InfiniBand architecture supports a comprehensive silicon, software, and system solution. The comprehensive nature of the architecture is illustrated by an overview of the major sections of the InfiniBand 1.1 specification. The scope of the 1.1 specification ranges from industry-standard electrical interfaces and mechanical connectors to well-defined software and management interfaces.
High-performance computing (HPC) encompasses advanced computation using parallel processing, enabling faster execution of highly compute-intensive tasks such as climate research, molecular modeling, physical simulations, cryptanalysis, geophysical modeling, automotive and aerospace design, financial modeling, data mining and more. High-performance simulations require the most efficient compute platforms. The execution time of a given simulation depends upon many factors, such as the number of CPU/GPU cores, their utilization factor, and the performance, efficiency, and scalability of the interconnect. Efficient high-performance computing systems require high-bandwidth, low-latency connections between thousands of multi-processor nodes, as well as high-speed storage systems.
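The dependence of execution time on core count, utilization, and interconnect performance can be illustrated with a rough sketch. The model below is an assumption made for illustration, not something this document defines: compute time scales inversely with the number of effectively used cores, and each message costs a fixed latency plus its size divided by the link bandwidth. The function name, its parameters, and the numbers in the usage example are all hypothetical.

```python
def estimate_runtime(total_work_s, cores, utilization,
                     messages, bytes_per_message,
                     latency_s, bandwidth_bytes_per_s):
    """Rough runtime estimate in seconds (illustrative model only).

    total_work_s: single-core compute time for the whole job (assumed known)
    cores, utilization: core count and the fraction of each core doing useful work
    messages, bytes_per_message: communication on the critical path (assumed)
    latency_s, bandwidth_bytes_per_s: per-message latency and link bandwidth
    """
    if cores <= 0 or not (0 < utilization <= 1):
        raise ValueError("cores must be positive and utilization in (0, 1]")
    if bandwidth_bytes_per_s <= 0:
        raise ValueError("bandwidth must be positive")

    # Compute phase: work is divided over the effectively used cores.
    compute_s = total_work_s / (cores * utilization)

    # Communication phase: each message pays latency plus transfer time.
    comm_s = messages * (latency_s + bytes_per_message / bandwidth_bytes_per_s)

    return compute_s + comm_s


# Hypothetical comparison: same job, two interconnects with made-up figures.
slow = estimate_runtime(1000.0, 100, 0.8, 1000, 1_000_000, 50e-6, 1e9)
fast = estimate_runtime(1000.0, 100, 0.8, 1000, 1_000_000, 1e-6, 1e10)
print(f"slow link: {slow:.3f} s, fast link: {fast:.3f} s")
```

The sketch ignores overlap of computation with communication, contention, and storage access, so it shows only the direction of the effect: lower latency and higher bandwidth shrink the communication term while the compute term is unchanged.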
This reference design describes how to design, build, and test a high-performance computing (HPC) cluster using the Mellanox® InfiniBand interconnect. It covers the installation and setup of the infrastructure, including:
• HPC cluster design
• Installation and configuration of the Mellanox Interconnect components
• Cluster configuration and performance testing
Reducing expenses associated with purchasing and operating a data center means being able to do more with less: fewer servers, fewer switches, fewer routers and fewer cables. Reducing the number of pieces of equipment in a data center leads to reductions in floor space and commensurate reductions in demand for power and cooling. In some environments, severe limitations on available floor space, power and cooling effectively throttle the data center’s ability to meet the needs of a growing enterprise, so overcoming these obstacles can be crucial to an enterprise’s growth.
Even in environments where this is not the case, each piece of equipment in a data center carries a certain amount of maintenance and operating cost, so reducing the number of pieces of equipment reduces operating expenses. Aggregated across an enterprise, these savings are substantial. For all these reasons, enterprise data center owners have a strong incentive to focus on the efficient use of IT resources. As we will see shortly, appropriate application of InfiniBand can advance each of these goals.