Google has a history of solving big problems using Ethernet and redesigning the transport layer to meet demanding workloads that require high burst bandwidth, high message rates and low latency. Workloads like storage have long required some of these attributes, but with new use cases like large-scale AI/ML training and high-performance computing (HPC), the need has increased significantly.
In the past, Google has openly shared its knowledge of traffic shaping, congestion control, load balancing, and more with the industry by submitting ideas to the Association for Computing Machinery and the Internet Engineering Task Force. These ideas have been implemented in software, and partly in hardware, for several years. But Google believes the industry as a whole will benefit more from implementing them as a package with dedicated and flexible hardware support.
To achieve this goal, Google developed Falcon to provide performance improvements over pure software transports. Today, at the OCP Global Summit, Google opens Falcon to the ecosystem through the Open Compute Project, the natural place for the community to benefit from Google’s production insights and help modernize Ethernet.
As a hardware-backed transport layer, Falcon is designed for reliability, performance, and low latency, leveraging production-proven technologies such as Carousel, Snap, Swift, PLB, and CSIG.
Falcon’s layers are shown in the following figure along with their associated functions. It features RDMA and NVM Express as upper layer protocols (ULPs), but Falcon can be extended to other ULPs as the ecosystem requires.
Falcon’s lower layers rely on three key ideas to achieve low latency in high-bandwidth but lossy Ethernet data center networks. Fine-grained hardware-assisted round-trip time (RTT) measurements, flexible hardware-assisted traffic shaping for each flow, and fast, accurate packet retransmissions are combined with Falcon connections that support multipath and PSP encryption. On this foundation, Falcon was designed from the ground up as a multi-protocol transport that can support ULPs with widely varying performance requirements and application semantics. The ULP mapping layer not only provides out-of-the-box compatibility with the InfiniBand Verbs RDMA and NVMe ULPs, but also includes additional innovations essential for warehouse-scale applications, such as flexible ordering semantics and graceful error handling. Finally, hardware and software are designed to work together to achieve the desired attributes of high message rate, low latency, and high bandwidth while maintaining the flexibility required for programmability and continuous innovation.
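To make the first of these ideas more concrete, here is a minimal Python sketch of how a per-flow RTT measurement could drive both a congestion window and a pacing schedule. It is only an illustration in the spirit of delay-based congestion control and timestamp-based shaping, not Falcon's actual algorithm: the names `FlowState`, `on_ack` and `schedule_packet`, and every constant (target RTT, increase and decrease factors), are assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class FlowState:
    # All defaults are hypothetical values picked for illustration.
    cwnd: float = 10.0            # congestion window, in packets
    next_send_time: float = 0.0   # earliest time the next packet may leave (s)


def on_ack(flow, rtt, target_rtt=50e-6, ai=1.0, beta=0.8, max_md=0.5,
           min_cwnd=0.01):
    """Update the window from one RTT sample (delay-based AIMD sketch)."""
    if rtt <= target_rtt:
        # Below target delay: grow by roughly `ai` packets per round trip.
        flow.cwnd += ai / flow.cwnd
    else:
        # Above target delay: shrink in proportion to the excess delay,
        # but never by more than `max_md` in a single step.
        excess = (rtt - target_rtt) / rtt
        flow.cwnd *= max(1.0 - beta * excess, 1.0 - max_md)
    flow.cwnd = max(flow.cwnd, min_cwnd)
    return flow.cwnd


def schedule_packet(flow, now, rtt):
    """Return the departure time of the next packet for this flow."""
    # Spread one window of packets evenly over one round trip.
    interval = rtt / flow.cwnd
    send_time = max(now, flow.next_send_time)
    flow.next_send_time = send_time + interval
    return send_time
```

In this sketch a low RTT sample lets the window grow slowly, a high sample shrinks it quickly, and the shaper turns the window into per-packet departure times instead of sending bursts. A hardware transport would take the RTT samples from NIC timestamps rather than from software clocks.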
Falcon reflects the central role that Ethernet continues to play in our industry. Falcon is designed to deliver high, predictable performance at warehouse scale, as well as to be flexible and expandable. Google looks forward to working with community and industry partners to modernize Ethernet to meet the networking needs of the AI-driven future. Google believes Falcon will be a valuable addition to other ongoing efforts in this area.
Industry Outlook
Industry partners are excited about the promise that Falcon represents for developing the next generation of Ethernet.
“We welcome Google Falcon’s contribution as it shares the Ultra Ethernet Consortium’s vision to make Ethernet the best data center fabric for AI and HPC, and we look forward to continuing industry innovation in this important area.”
– J Metz, President of the Ultra Ethernet Consortium (led by AMD, Arista, Broadcom, Cisco, Eviden, Hewlett Packard Enterprise, Intel, Meta, Microsoft and Oracle).
“Falcon is available for the first time in the Intel IPU E2000 product series. The value of these IPUs is even greater because they are the first instance of an Ethernet transport to offer low latency and comprehensive congestion management. Intel is a leading member of the Ultra Ethernet Consortium, working to advance Ethernet for high-performance AI and HPC workloads. We plan to use the improvements based on the resulting standards in future IPU and Ethernet products.”
– Sachin Katti, SVP & GM, Network and Edge Group, Intel
“We are excited to see a high-performance transport protocol for mission-critical workloads such as AI and HPC, running over standard Ethernet/IP networks and enabling massive application bandwidth at scale.”
– Hugh Holbrook, Group Vice President, Software Engineering, Arista Networks
“Cisco welcomes Falcon’s contribution to OCP. Cisco has long supported open standards and believes in comprehensive ecosystems. The pace and scale of modern data center networks, and especially AI/ML networks, are unprecedented and represent a challenge and an opportunity for the industry. Falcon addresses many of the challenges of these networks and enables efficient use of the network.”
– Ofer Iny, Cisco Fellow, Cisco
“Juniper is a strong supporter of open ecosystems, which is why we are excited to have Falcon join the OCP community. Falcon enables Ethernet as the data center network of choice for demanding workloads, offering high bandwidth, low latency and congestion mitigation. Today, Falcon offers the industry a proven solution for demanding AI and ML workloads.”
– Raj Yavatkar, Chief Technology Officer, Juniper
“Marvell strongly supports the open Ethernet ecosystem and is committed to evolving it to support new and demanding workloads such as AI. We welcome Falcon’s contribution to OCP and look forward to Google sharing its practical experience with industry.”
– Nick Kucharewski, SVP & GM Network Switching Group, Marvell
Source: Google