
NVMe and CXL: Enabling More Powerful M&E Workflows

By Tom Coughlin

Higher resolutions, higher frame rates, greater dynamic range and multi-camera projects are increasing the data rates required for rich media workflows. Although hard disk drive (HDD) storage can handle many video workloads, solid state drives (SSDs) are used for metadata storage and access in HDD arrays and are increasingly used as primary storage for many applications within media and entertainment.

Although SSDs are available with SATA and SAS interfaces, the highest performing and lowest latency SSDs use the NVMe interface — now the dominant SSD storage interface.

In this piece, we will look at some of the capabilities that NVMe SSDs provide that can benefit rich media applications, including various approaches to computational storage and storage pooling. We will also discuss upcoming changes in processing memory that the Compute Express Link (CXL) will enable and how these can help with various M&E activities.

NVMe SSDs
NVMe SSDs communicate over the PCIe bus that is commonly used in computer systems. PCIe bus performance roughly doubles with each generation, with new generations introduced about every two to three years. Current computer systems mostly use PCIe 4, providing 16GT/s per lane (GT/s is gigatransfers per second). Early PCIe 5 systems are becoming available, providing 32GT/s per lane, and the PCIe 6 specification, released in 2022, provides 64GT/s per lane. Multiple PCIe lanes are combined to achieve the desired overall NVMe interface performance.
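As a rough illustration of how per-lane transfer rates translate into link bandwidth, the sketch below converts GT/s into approximate GB/s for common lane counts. It assumes the 128b/130b line encoding used by PCIe 3 through PCIe 5 and ignores packet and protocol overhead, so real-world throughput will be somewhat lower. PCIe 6 is left out because it uses a different signaling and encoding scheme. The function and variable names are illustrative only.

```python
# Approximate one-direction PCIe link bandwidth from per-lane transfer rates.
# Assumption: 128b/130b encoding (PCIe 3 through 5); protocol overhead is ignored.

GT_PER_LANE = {"PCIe 4": 16, "PCIe 5": 32}  # gigatransfers per second per lane

ENCODING_EFFICIENCY = 128 / 130  # payload bits per bit on the wire


def link_bandwidth_gbytes(gt_per_s: float, lanes: int) -> float:
    """Return approximate one-direction bandwidth in GB/s for a PCIe link."""
    usable_gbits = gt_per_s * ENCODING_EFFICIENCY * lanes  # each transfer carries one bit per lane
    return usable_gbits / 8  # bits to bytes


def bandwidth_table() -> dict:
    """Build {(generation, lanes): GB/s} for a few common link widths."""
    return {
        (gen, lanes): link_bandwidth_gbytes(rate, lanes)
        for gen, rate in GT_PER_LANE.items()
        for lanes in (1, 4, 16)
    }


if __name__ == "__main__":
    for (gen, lanes), gbytes in bandwidth_table().items():
        print(f"{gen} x{lanes}: {gbytes:.2f} GB/s")
```

Under these assumptions, a four-lane PCIe 4 link (a common NVMe SSD configuration) comes out at a little under 8GB/s, and a four-lane PCIe 5 link at a little under 16GB/s.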

By running over the PCIe bus, NVMe SSDs can take advantage of the higher IO rates that each new generation of PCIe can provide. This will reduce any communication bottlenecks for NVMe-based storage systems. This also encourages NVMe SSDs and arrays of NVMe SSDs to offer higher write and read performance to avoid storage bottlenecks.

NVMe SSDs are available in a number of form factors — U.2, M.2, AIC and EDSFF. The graphic below from Western Digital shows these SSD form factors. These form factors enable more efficient use in various applications. For instance, M.2 is often used in notebook computers, Add-in Card (AIC) SSDs fit into PCIe slots in servers and computers, and the various EDSFF form factors are often used for storage arrays for enterprise and data center applications.

The latest NVMe specification is NVMe 2.0, released in June 2021. NVMe 2.0 is actually a library of eight specifications: the NVMe Base specification; three Command Set specifications (the NVM Command Set, the Zoned Namespaces (ZNS) Command Set and the Key Value (KV) Command Set); three Transport specifications (PCIe Transport, remote direct memory access (RDMA) Transport and TCP Transport); and the separate NVMe Management Interface specification. New capabilities in the 2.0 specifications include Zoned Namespaces, Key Value, rotational media (support for HDDs) and Endurance Group Management.

For larger installations, such as those of cloud computing service providers, NVMe over Fabrics (NVMe-oF), running over a fabric such as Ethernet, allows NVMe storage devices to be pooled. Such storage pools are part of a larger data center effort referred to as disaggregation and composability. The individual components of a server are taken out of the server and pooled together. The pooled resources can then be allocated by software to create virtual machines or containers that incorporate these components as needed for the application (this is called composability). These virtual machines or containers can be created and torn down as needed, allowing more efficient use of data center resources.
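The idea of composing and releasing pooled resources in software can be sketched with a toy model. This is not an NVMe-oF or orchestration API; the `ResourcePool`, `compose` and `decompose` names, the pool names and the capacities are all hypothetical and exist only to illustrate the allocate-and-return cycle described above.

```python
from dataclasses import dataclass, field


@dataclass
class ResourcePool:
    """A toy pool of one disaggregated resource type (e.g. NVMe capacity in TB)."""
    name: str
    capacity: int
    allocations: dict = field(default_factory=dict)

    @property
    def free(self) -> int:
        return self.capacity - sum(self.allocations.values())

    def allocate(self, owner: str, amount: int) -> None:
        if owner in self.allocations:
            raise ValueError(f"{self.name}: {owner} already holds an allocation")
        if amount > self.free:
            raise ValueError(f"{self.name}: requested {amount}, only {self.free} free")
        self.allocations[owner] = amount

    def release(self, owner: str) -> None:
        self.allocations.pop(owner, None)  # releasing an unknown owner is a no-op


def compose(pools: dict, owner: str, request: dict) -> None:
    """Allocate from every pool named in the request, rolling back on failure."""
    done = []
    try:
        for pool_name, amount in request.items():
            pools[pool_name].allocate(owner, amount)
            done.append(pool_name)
    except (KeyError, ValueError):
        for pool_name in done:  # undo only what this call allocated
            pools[pool_name].release(owner)
        raise


def decompose(pools: dict, owner: str) -> None:
    """Return everything the owner holds to the pools."""
    for pool in pools.values():
        pool.release(owner)


if __name__ == "__main__":
    pools = {"cpu_cores": ResourcePool("cpu_cores", 64),
             "nvme_tb": ResourcePool("nvme_tb", 100)}
    compose(pools, "edit-session-1", {"cpu_cores": 16, "nvme_tb": 40})
    print({name: pool.free for name, pool in pools.items()})
    decompose(pools, "edit-session-1")
    print({name: pool.free for name, pool in pools.items()})
```

The rollback in `compose` reflects one design choice worth noting: a composed machine is only useful if every requested resource is available, so a partial allocation is returned to the pools rather than left stranded.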

There is also much activity among SSD companies, several of which offer early products that support computation close to or in SSDs. This effort is called computational storage (CS), and the Storage Networking Industry Association (SNIA) is developing standards for CS. The computation is done using specialized processors and can provide data reduction with lower latency and less overhead for the CPU. CS also enables new capabilities by running supporting processes closer to where the data is stored. For M&E applications, computational storage could enable new and higher-performance applications, such as faster video rendering and multi-camera video stitching for 360-degree video.

There are several M&E applications in which NVMe SSDs provide the greatest value. One is to support video projects with 8K or higher resolution at high frame rates and high dynamic range. Editing content this large can require streaming performance of 10GB/s or higher at sub-16ms latencies. This can push HDD-based storage to its limits. NVMe-based storage has the performance to handle editing raw 8K-plus video projects.
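A back-of-the-envelope calculation shows where figures of this magnitude come from. The sketch below assumes an 8K frame of 7680 x 4320 pixels, uncompressed, with three color components per pixel and no chroma subsampling; the bit depths and the 60 frames per second rate are illustrative choices rather than figures from any particular camera or codec.

```python
def uncompressed_rate_bytes_per_s(width: int, height: int,
                                  bits_per_component: int,
                                  components: int, fps: float) -> float:
    """Return the data rate of an uncompressed video stream in bytes per second."""
    bits_per_frame = width * height * bits_per_component * components
    return bits_per_frame * fps / 8  # bits to bytes


def frame_period_ms(fps: float) -> float:
    """Return the time budget for delivering one frame, in milliseconds."""
    return 1000 / fps


if __name__ == "__main__":
    for depth in (12, 16):  # assumed bit depths per color component
        rate = uncompressed_rate_bytes_per_s(7680, 4320, depth, 3, 60)
        print(f"8K, {depth}-bit, 60 fps: {rate / 1e9:.2f} GB/s")
    print(f"Frame period at 60 fps: {frame_period_ms(60):.2f} ms")
```

With these assumptions, a single 12-bit stream needs about 8.96GB/s and a 16-bit stream about 11.94GB/s, while the frame period at 60 frames per second is about 16.67ms, which is in line with the streaming rates and latencies mentioned above.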

When we throw multiple camera video streams into the mix, the argument for NVMe SSDs is even more compelling. Because of the random IO performance of NVMe SSDs, they can provide low-latency, multi-gigabit per second data streams and can help stitch videos together for 360-degree composite videos.

NVMe SSDs can also benefit visual effects and rendering workflows, especially since these workflows often have intermittent data demand. For instance, in a render farm, all the render engines tend to get new data from storage at about the same time, making the peak bandwidth needed to efficiently service this demand very high. NVMe-based storage can service the high-bandwidth, intermittent demand for rendering 8K-plus resolution content more efficiently than an HDD-based storage system.
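The gap between peak and average demand in this kind of bursty workload can be illustrated with a simple model. The sketch assumes every render node fetches its input during the same short window and then computes without touching storage; the node count, data size per fetch and timings are hypothetical numbers chosen only to show the effect.

```python
def render_farm_demand(nodes: int, gbytes_per_fetch: float,
                       fetch_window_s: float, render_time_s: float):
    """Return (peak, average) aggregate read bandwidth in GB/s.

    Assumes all nodes fetch simultaneously within fetch_window_s and
    then render for render_time_s without further reads.
    """
    total_gbytes = nodes * gbytes_per_fetch
    peak = total_gbytes / fetch_window_s
    average = total_gbytes / (fetch_window_s + render_time_s)
    return peak, average


if __name__ == "__main__":
    # Hypothetical farm: 100 nodes, 2 GB per fetch, 5 s fetch window, 95 s render.
    peak, average = render_farm_demand(100, 2.0, 5.0, 95.0)
    print(f"Peak: {peak:.1f} GB/s, average: {average:.1f} GB/s")
```

In this hypothetical case the storage system must deliver 40GB/s during the fetch window even though the average demand is only 2GB/s, which is why storage for render farms is sized for the peak rather than the average.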

CXL Memory Applications
Working external memory for data processing, including processing of rich media, is typically DRAM. DRAM is often supplied to the processors in dual in-line memory modules (DIMMs) using the double data rate (DDR) or the low-power DDR (LPDDR) interface. Some specialized memory interfaces include high-bandwidth memory (HBM), which is used for processing on many GPUs, for example. DDR and LPDDR DIMMs are used in personal computers and servers as well as consumer applications to provide memory for individual processors.

Likewise, HBM is dedicated memory directly connected to a specialized processor, such as a GPU. Because these memories are directly connected to processors, they have the lowest latency, and we can call them “near” memory.

The Open Memory Interface (OMI) is another memory interface, developed by IBM and used in its Power processors. It provides a high-performance serial bus that supports sharing memory between processors and creating pools of shared memory with limited additional latency. OMI was recently incorporated into the Compute Express Link (CXL) effort. CXL is a cache-coherent interconnect for processors, memory expansion and accelerators that, like NVMe, is based upon the PCIe bus.

With the latest version 3.0 specification, CXL also supports pooling of memories with varying performance characteristics (DRAM and other memory technologies). CXL is supported by major industry participants. Servers and other computing hardware implementing CXL should be available in 2023. One of the first available implementations expands the memory on an individual CPU beyond what DDR memory can support. CXL-attached memory can also provide additional memory buffering, and it supports adding specialized processors (accelerators) near the memory to process data with lower latency and reduce the overhead for system CPUs.

Memory expansion, memory buffering extension and support for accelerators closer to the data are the first uses for CXL, but the capability that will have the biggest impact is memory pooling. Like NVMe-oF, CXL can serve as a fabric that pools memory of various types (including new non-volatile memories) together with associated accelerators located in or close to the memory pool. Like the NVMe storage pools, these memory pools allow data centers to compose memory into virtual machines or containers that can be created and stopped as needed, enabling more efficient use of data center resources.

The figure below shows how the CXL 3.0 specification permits expanded memory sharing and pooling. The CXL switches allow various hosts (or allocated pooled processors) to access various memory resources in the memory pool.

Summing Up
Storage and memory technologies are undergoing big changes, inspired by the introduction of solid-state non-volatile storage and memory technologies. These changes are being driven by NVMe and CXL storage and memory technology based upon the PCIe computer bus. These storage and memory interconnects support faster direct-connected storage and memory, and they enable new ways to create software-defined virtual machines and containers.

NVMe and CXL storage and memory will enable advanced media and entertainment workflows, providing high-performance, low-latency storage and memory that support working with multi-camera, high-resolution, high-frame-rate and higher-dynamic-range video. These technologies will enable advanced local storage and memory and will drive powerful virtual production and post-production workflows.


 

Tom Coughlin is a digital storage analyst and business and technology consultant. He has over 40 years in the data storage industry, with engineering and management positions at several companies. Coughlin Associates publishes the 2022 Digital Storage in Media and Entertainment Report.
