

Vu One All-in-One Studio for Mobile Virtual Production

Vu Technologies, which offers a network of virtual production studios, has introduced Vu One, a multi-purpose “all-in-one” studio that helps make virtual production accessible to all, regardless of budget and technical experience.

Vu One is designed to increase the efficiency and affordability of virtual production workflows for production studios, filmmakers, ad agencies, corporate marketing departments and content creators of any skill level. This new turnkey virtual production system combines integrated hardware – a scalable LED volume, PTZ camera, audio system and a Puget Systems media server – with virtual production software applications. These include virtual environment generation, project management, remote collaboration, access to libraries of pre-built virtual environments and full studio recording capabilities – all of which use Virtual Studio, a suite of tools including the Vu.ai generative AI content workflow.

Vu One is an easy-to-configure platform controlled by software from a hand-held mobile device. It operates seamlessly on a streamlined technology stack, regardless of screen size or setup, and comes integrated with five essential components: display, audio, tracking, media server and content management. Display options range from 16 feet by 9 feet up to an expansive 45 feet by 16 feet.

Designed with AMD Threadripper Pro processors, Puget Systems’ virtual production media servers for Vu One offer plenty of cores for fast light-baking and shader-compiling, while Nvidia’s RTX A-series graphics cards pair strong 3D performance with generous VRAM. The systems support dual GPUs along with optional NVLink and Quadro Sync add-ons.

Key specs of the server include:

  • Platform: AMD Threadripper Pro WRX80 EATX
  • Motherboard: Asus Pro WS WRX80E-Sage SE Wi-Fi (AMD WRX80 EATX)
  • CPU: AMD Ryzen Threadripper Pro 5975WX, 3.6GHz (up to 4.5GHz Turbo), 32 cores, 280W
  • RAM: 256GB DDR4-3200 REG ECC (8x32GB)
  • Video card: Nvidia GeForce RTX 4090 24GB Open Air
  • Hard drive: Samsung 870 EVO 2TB SATA3 2.5-inch SSD

This configuration is specially designed for Vu One and optimized for complex virtual production workflows and environments.

Vu One runs on Virtual Studio by Vu, a software suite that includes innovative tools such as Scene Forge, Remote VP and Vu.ai, along with industry-leading applications like Unreal Engine, Volinga, Storia AI and more. Virtual Studio also includes a marketplace of 3D and 2D assets, all “Certified for Virtual Production” and optimized to run on Vu One.

Vu One starts at $5,600 per month for the base model. A complete system can be purchased outright starting at $249,000 and comes with both 2D and 3D options, including a media server powered by Puget Systems. Vu One configuration options include:

  • Vu One: A complete cinematic solution for high-resolution images and video playback
  • Vu One + 3D: Upgraded render engine for 3D virtual environments (Unreal Engine) with camera tracking
  • Vu One Custom: Integration to an existing infrastructure and sizing of the LED wall to fit your space


Review: Nvidia GeForce RTX 4070 Founders Edition

By Brady Betzel

After my reviews of the Nvidia RTX 4090 Founders Edition and the Asus TUF Nvidia RTX 4070 Ti, some readers were left wanting a much cheaper alternative. If you found the $799 price tag for the 4070 Ti a little pricey, then the Nvidia RTX 4070 Founders Edition, at $599, just might be for you.

On the surface, Nvidia is selling the 4070 Founders Edition as a 1440p GPU at 100+fps with raytracing, DLSS 3 and AI-powered content creation. But for editors, colorists and graphics artists, the Nvidia 4070 Founders Edition may hit the sweet spot between form and function.

For Comparison

– RTX 4090 – CUDA cores: 16,384, Memory: 24GB ($1,599)

– RTX 4080 – CUDA cores: 9,728, Memory: 16GB ($1,199)

– RTX 4070 Ti – CUDA cores: 7,680, Memory: 12GB ($799)

– RTX 4070 – CUDA cores: 5,888, Memory: 12GB ($599)

Tech Specs

Nvidia CUDA cores: 5,888

Boost clock: 2.475GHz

Base clock: 1.92GHz

Memory size: 12GB

Memory type: GDDR6X

Memory interface width: 192-bit

Power connectors: 2x PCIe 8-pin cables (adapter in box) or a 300W or greater PCIe Gen 5 cable. (Certain manufacturer models may use one PCIe 8-pin power cable.)

If you are building a new Windows-based computer system, you will need a minimum power supply of 650 watts. The 4070 has three DisplayPort outputs and one HDMI port. The PCIe card itself is much smaller than the 4090 and 4070 Ti but still takes up two PCI slots.

Testing
The Nvidia 4070 Founders Edition is promoted as the most logical upgrade for users who own the RTX 3070 Ti. It’s the same retail price but with the upgraded Ada Lovelace AD104 architecture. There are many more technical upgrades, but most important are an AV1 encoder, 12GB of GDDR6X memory (versus 8GB), a 36MB L2 cache (versus 4MB) and, finally, lower overall power consumption. The average video playback power is 16W (versus 20W).

In addition to the upgraded third-gen raytracing cores and DLSS 3, improved AI-powered workflows are what’s keeping me an Nvidia fan. It seems that every day there is some new AI-powered workflow that can embrace the power of the Nvidia 40 Series GPUs, from Magic Mask in Blackmagic Resolve to ChatGPT to Stable Diffusion. I am not doom and gloom about AI, thinking that everyone will be out of a job because of it. I do think that embracing the fluidity and time-saving functions of AI is where the power users will soar.

Resolve
First up is testing inside Resolve 18.1.2, where I take clips from different cameras and do a basic color correction in a 3840×2160 timeline. I use these same sequences and effects in a lot of reviews. The clips include:

    • ARRI RAW: 3840×2160 24fps – 7 seconds, 12 frames
    • ARRI RAW: 4448×1856 24fps – 7 seconds, 12 frames
    • BMD RAW: 6144×3456 24fps – 15 seconds
    • Red RAW: 6144×3072 23.976fps – 7 seconds, 12 frames
    • Red RAW: 6144×3160 23.976fps – 7 seconds, 12 frames
    • Sony a7siii: 3840×2160 23.976fps – 15 seconds

I then add Blackmagic’s noise reduction, sharpening and grain. Finally, I replace the noise reduction with Neat Video’s noise reduction. From there I export multiple versions: DNxHR 444 10-bit OP1a .MXF file, DNxHR 444 10-bit .mov, H.264 MP4, H.265 MP4, AV1 MP4 (Nvidia GPUs only), and then an IMF package using the default settings.
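If you want to sanity-check relative encoder speeds on your own GPU without my exact project, a rough stand-in is to export a single graded master from Resolve and then time ffmpeg encodes of comparable delivery codecs. This is only a sketch: the source file name, bitrates and presets are placeholder assumptions, and av1_nvenc requires an RTX 40-series card plus a recent ffmpeg build.

```python
# Rough timing harness for comparable delivery codecs (not Resolve itself).
import subprocess, time

SOURCE = "graded_master.mov"  # hypothetical UHD master exported once from Resolve

ENCODES = {
    "dnxhr_444.mxf": ["-c:v", "dnxhd", "-profile:v", "dnxhr_444", "-pix_fmt", "yuv444p10le"],
    "h264.mp4": ["-c:v", "h264_nvenc", "-preset", "p5", "-b:v", "60M"],
    "h265.mp4": ["-c:v", "hevc_nvenc", "-preset", "p5", "-b:v", "40M"],
    "av1.mp4": ["-c:v", "av1_nvenc", "-preset", "p5", "-b:v", "30M"],  # RTX 40-series only
}

for outfile, codec_args in ENCODES.items():
    start = time.perf_counter()
    subprocess.run(["ffmpeg", "-y", "-i", SOURCE, *codec_args, outfile], check=True)
    print(f"{outfile}: {time.perf_counter() - start:.1f} seconds")
```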

Nvidia RTX 4070: Resolve 18 exports
Columns: DNxHR 444 10-bit MXF / DNxHR 444 10-bit MOV / H.264 MP4 / H.265 MP4 / AV1 MP4 / IMF

  • Color correction only: 00:28 / 00:26 / 00:23 / 00:23 / 00:24 / 00:49
  • CC + Resolve noise reduction: 01:52 / 01:51 / 01:50 / 01:50 / 01:50 / 01:54
  • CC, Resolve NR, sharpening, grain: 02:30 / 02:31 / 02:30 / 02:30 / 02:30 / 02:33
  • CC + Neat Video noise reduction: 03:18 / 03:18 / 03:15 / 03:15 / 03:18 / 03:21

Comparing the results between this Nvidia RTX 4070 Founders Edition and the Nvidia RTX 4070 Ti in Resolve is very interesting. Clips with only a color correction export about 30% faster on the 4070 Founders Edition. Once we add built-in Resolve noise reduction, sharpening and grain, the 4070 Ti wins out with a slight 10%-14% speed advantage.

Finally, the most interesting comparison is that in using a third-party OFX plugin like Neat Video, the 4070 and 4070 Ti are essentially tied. I spoke with Neat Video a while ago via email, and they were very excited about the big increase in L2 memory and what speed increases the added bandwidth allows for… and they weren’t wrong. It’s an exciting result to see for colorists and online editors who may not want to invest in a 4090 but still want speed when rendering and exporting clips with Neat Video applied.

Speaking of the Nvidia RTX 4090 Founders Edition, here are the results for the same tests in Resolve for comparison:

Nvidia RTX 4090: Resolve 18 exports
Columns: DNxHR 444 10-bit MXF / DNxHR 444 10-bit MOV / H.264 MP4 / H.265 MP4 / AV1 MP4 / IMF

  • Color correction only: 00:27 / 00:27 / 00:22 / 00:22 / 00:23 / 00:49
  • CC + Resolve noise reduction: 00:57 / 00:56 / 00:55 / 00:55 / 00:55 / 01:04
  • CC, Resolve NR, sharpening, grain: 01:14 / 01:14 / 01:12 / 01:12 / 01:12 / 01:19
  • CC + Neat Video noise reduction: 02:38 / 02:38 / 02:34 / 02:34 / 02:34 / 02:41

In Adobe Premiere Pro, I use the same sequences that I exported in Resolve but with Premiere’s built-in noise reduction, sharpening and grain. (I don’t have a Neat Video license for Adobe Premiere Pro.)

Here are the results from Premiere:

Nvidia RTX 4070: Adobe Premiere 2023 (individual exports in Media Encoder)
Columns: DNxHR 444 10-bit MXF / DNxHR 444 10-bit MOV / H.264 MP4 / H.265 MP4

  • Color correction only: 01:03 / 01:46 / 01:08 / 01:10
  • CC + NR, sharpening, grain: 15:24 / 35:49 / 35:40 / 33:51

Nvidia RTX 4070: Adobe Premiere 2023 (simultaneous exports in Media Encoder)
Columns: DNxHR 444 10-bit MXF / DNxHR 444 10-bit MOV / H.264 MP4 / H.265 MP4

  • Color correction only: 02:09 / 03:06 / 02:31 / 02:31
  • CC + NR, sharpening, grain: 17:27 / 38:28 / 17:27 / 17:27

And for comparison, here are the results from the Nvidia RTX 4090:

Nvidia RTX 4090: Adobe Premiere 2023 (individual exports in Media Encoder)
Columns: DNxHR 444 10-bit MXF / DNxHR 444 10-bit MOV / H.264 MP4 / H.265 MP4

  • Color correction only: 01:28 / 01:46 / 01:08 / 01:07
  • CC + NR, sharpening, grain: 13:07 / 34:52 / 34:12 / 33:54

Nvidia RTX 4090: Adobe Premiere 2023 (simultaneous exports in Media Encoder)
Columns: DNxHR 444 10-bit MXF / DNxHR 444 10-bit MOV / H.264 MP4 / H.265 MP4

  • Color correction only: 02:17 / 01:44 / 01:08 / 01:11
  • CC + NR, sharpening, grain: 13:47 / 34:13 / 15:54 / 15:54

Premiere is not very consistent in its export times, so I performed these exports a few times with each GPU to get a general sense of where the times should be. Unfortunately, individual exports sometimes varied by a minute or two. Either way, for the most part the Nvidia RTX 4070 was right around where it should be in comparison to the 4090 — about half with only color correction and various quicker timings when I added noise reduction, sharpening and grain.
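Given that run-to-run variability, it helps to average several passes of the same export rather than trusting a single number. Here is a minimal sketch (with made-up timings) of how repeated runs could be tabulated:

```python
# Average repeated export timings; the run values below are hypothetical.
from statistics import mean, stdev

def to_seconds(mmss: str) -> int:
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)

runs = ["15:24", "16:41", "14:58"]  # three passes of the same Media Encoder export
secs = [to_seconds(r) for r in runs]
print(f"mean {mean(secs):.0f}s, stdev {stdev(secs):.0f}s over {len(secs)} runs")
```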

Benchmarks
Blender Benchmark CPU samples per minute:

  1. Monster: 185.098690
  2. Junkshop: 128.839125
  3. Classroom: 88.104209

Blender Benchmark GPU samples per minute:

  1. Monster: 3138.191595
  2. Junkshop: 1534.448876
  3. Classroom: 1540.772698

Neat Video HD: GPU Only 64.5 frames/sec

Neat Video UHD: GPU Only 15.8 frames/sec

OctaneBench 2020.1.5: Total score – 653.37

PugetBench for Premiere 0.95.6, Premiere 23.3.0:

  • Extended overall score: 913
  • Standard overall score: 983
  • Extended export score: 89.6
  • Extended live playback score: 100.8
  • Standard export score: 89.7
  • Standard live playback score: 121.8
  • Effects score: 83.5
  • GPU score: 86.8

***Note: PugetBench for Resolve and After Effects errored out during testing.

Stable Diffusion using the Automatic1111 web UI:

Test Configuration
Stable Diffusion CheckPoint: v2-1_768-ema-pruned.ckpt

Sampling method: Euler_a

Sampling steps: 50

Batch count: 10

Batch size: 2

Image width: 768

Image height: 768

Prompt: beautiful render of a Tudor-style house near the water at sunset, fantasy forest. photorealistic,cinematic composition, cinematic high detail, ultra-realistic, cinematic lighting, depth of field, hyper-detailed, beautifully color-coded, 8K, many details, chiaroscuro lighting, ++dreamlike, vignette

Total: 20 images per minute
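For readers who want to approximate this test outside the Automatic1111 web UI, here is a minimal sketch using Hugging Face’s diffusers library. It assumes the diffusers-format release of the same Stable Diffusion 2.1 checkpoint; throughput and images won’t match the web UI exactly, but the sampler, step count, resolution and batch settings mirror the configuration above.

```python
# Approximate re-creation of the test configuration above with diffusers.
import torch
from diffusers import StableDiffusionPipeline, EulerAncestralDiscreteScheduler

pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1", torch_dtype=torch.float16
).to("cuda")
pipe.scheduler = EulerAncestralDiscreteScheduler.from_config(pipe.scheduler.config)  # "Euler a"

prompt = ("beautiful render of a Tudor-style house near the water at sunset, fantasy forest, "
          "photorealistic, cinematic lighting, depth of field, hyper-detailed")

for batch in range(10):  # batch count: 10
    images = pipe(prompt, num_images_per_prompt=2,  # batch size: 2
                  num_inference_steps=50, height=768, width=768).images
    for i, img in enumerate(images):
        img.save(f"tudor_{batch}_{i}.png")
```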

V-Ray – CPU: 19885 vsamples

V-Ray – GPU RTX: 2546 vrays

V-Ray – GPU CUDA: 1872 vpaths

Summing Up
The $599 price tag for the Nvidia RTX 4070 Founders Edition GPU is right on point. The 4070 is a good replacement for the RTX 3070 Series GPUs with added bells and whistles. Will it knock your socks off like the 4090 does? Not really, but in some of my testing, the results weren’t far off. And if you want a little bump in speed, the Nvidia RTX 4070 Ti might be the happy medium at $799.

Physically, the 4070 is about one-third the size of the 4090 and only requires two PCIe 8-pin cables instead of four like the 4090 does. And if you are worried about your carbon footprint, the 4070 draws far less power than the 4090. The 4070 would be a great assistant station GPU, where you could offload renders or exports that you don’t necessarily need in a hurry.


Brady Betzel is an Emmy-nominated online editor at Margarita Mix in Hollywood, working on shows like Life Below Zero and Uninterrupted: The Shop. He is also a member of the Producers Guild of America. You can email Brady at bradybetzel@gmail.com. Follow him on Twitter @allbetzroff.


Epic’s Unreal Engine 5.3 is Now Available

Epic Games‘ Unreal Engine 5.3 is now available. This release brings many improvements as the company continues to expand UE5’s functionality and potential for game developers and creators.

As well as enhancements to core rendering, developer iteration and virtual production toolsets, UE5.3 features experimental new rendering, animation and simulation features to give users the opportunity to test extended creative workflows inside UE5 — reducing the need to round-trip with external applications.

With this release, Epic has continued to refine all core UE5 rendering features, making it easier for developers to leverage them at higher quality in games running at 60fps on next-gen consoles. The improvements also offer higher-quality results and enhanced performance for linear content creators. Specifically, the Nanite virtualized geometry system has faster performance for masked materials, including foliage, and can represent a greater range of surfaces due to the new Explicit Tangents option, while Lumen with Hardware Ray Tracing has expanded capabilities that include multiple reflection bounces and offers faster performance on consoles.

Other areas with notable advancements include Virtual Shadow Maps (VSM) — which is now production-ready — Temporal Super Resolution (TSR), Hair Grooms, Path Tracing and Substrate.

Developers can now use additional CPU and memory resources when converting content from the internal UE format to a platform-specific format, significantly reducing the time it takes to get a cooked output from a build farm or on a local workstation.

Enabling Multi-Process Cook launches subprocesses that perform parts of the cooking work alongside the main process. Developers can select how many subprocesses they want to run on a single machine.
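As a rough illustration of how a build script might take advantage of this, the sketch below launches a cook through UAT’s BuildCookRun. The engine and project paths are placeholders, and the -CookProcessCount argument is an assumption about how the subprocess count is exposed in your engine version, so verify the exact flag or ini setting in the 5.3 documentation before relying on it.

```python
# Hypothetical sketch: kick off a cook with multiple cook subprocesses from a build script.
import subprocess

UAT = r"C:\UE_5.3\Engine\Build\BatchFiles\RunUAT.bat"  # placeholder engine path

subprocess.run([
    UAT, "BuildCookRun",
    "-project=C:/Projects/MyGame/MyGame.uproject",  # placeholder project path
    "-platform=Win64",
    "-build", "-cook", "-stage", "-pak",
    "-CookProcessCount=4",  # assumed flag name: number of cook subprocesses per machine
], check=True)
```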

Filmmakers can now emulate the workflow and results of traditional camera movement along tracks or on dollies, thanks to a new Cine Cam Rig Rail Actor. The Cine Cam Rig Rail provides more refined controls than the existing Rig Rail, including the ability to choreograph settings like camera rotation, focal length, focus distance and so on, at different control points along the path. It supports both in-editor and VCam workflows.

VCam enhancements to the system in this release include the ability to review takes directly on the iPad for faster iteration; to simultaneously stream different VCam output for different team members — for example, with camera controls for the camera operator, without for the director — facilitating collaborative VCam shoots; and to record at a slower frame rate and play back at normal speed for easier capture of fast-moving action.

Cinematic-Quality Volumetric Rendering
Two new features, Sparse Volume Textures (SVT) and Path Tracing of Heterogeneous Volumes, introduce a number of new capabilities for volumetric effects such as smoke and fire.

Sparse Volume Textures store baked simulation data representing volumetric media and can be simulated in Niagara or imported from OpenVDB (.vdb) files created in other 3D applications.

In addition, more complete support for rendering volumes is now available as Experimental in the Path Tracer. This offers the potential for high-quality volumetric rendering —including global illumination, shadows and scattering — for cinematics, films, episodic television and other forms of linear content creation directly in UE5.

Real-time use cases such as games and virtual production can also begin experimenting with SVTs for playback of volumetric elements, although performance is limited at this time and highly dependent on the content.

Orthographic Rendering
UE 5.3 is also introducing orthographic rendering, which is useful for visualizing architecture and manufacturing projects, as well as offering orthographic projections as a stylistic camera choice for games.

Multiple areas of the engine have received attention to achieve parity between perspective and orthographic projections. Most modern features of UE5 are expected to now work, including Lumen, Nanite, Shadows and Temporal Super Resolution. Orthographic rendering is also available at runtime in UE, enabling users to make updates in a live setting.

Skeletal Editor
A new Skeletal Editor provides animators with a variety of tools for working with Skeletal Meshes, including the ability to paint skin weights.

Whether for quick prototypes or final rigging, this enables animators to perform more character workflows entirely in UE without the need for round-tripping to DCC applications — enabling them to work in context and iterate faster.

Panel-Based Chaos Cloth with ML Simulation
Updates to Chaos Cloth enable creators to bring more of their workflows directly to UE. Epic has introduced a new Panel Cloth Editor and new skin weight transfer algorithms and added XPBD (extended position-based dynamics) constraints as a basis for UE’s future cloth generation. This provides for a non-destructive cloth simulation workflow in which creators can trade speed for precision. In addition, the use of panel-based cloth can result in better-looking simulations.

Cloth can also now be simulated and cached in UE using the new Panel Cloth Editor in conjunction with the ML Deformer Editor.

nDisplay Support for SMPTE ST 2110
In preparation for the next generation of LED production stages, Epic has added Experimental support to nDisplay for SMPTE ST 2110, using Nvidia hardware and the Rivermax SDK. This lays the groundwork for a range of hardware configurations that open new possibilities for LED stages. One configuration uses a dedicated machine for each camera frustum, maximizing the potential rendering resolution, increasing frame rate and allowing for more complex scene geometry and lighting than previously possible.

nDisplay support offers the ability to tackle challenges like wider angle lenses that require greater resolution and multi-camera shoots that stress current systems. It also implies lower latency in the system, due to simplification of the signal chain.


Nvidia’s GTC 2023 – New GPUs and AI Acceleration

By Mike McCarthy

This week, Nvidia held its GTC conference and made several interesting announcements. Most relevant in the M&E space are the new Ada Lovelace-based GPUs. To accompany the existing RTX 6000, there is now a new RTX 4000 small form factor and five new mobile GPUs offering various levels of performance and power usage.

New Mobile GPUs
The new mobile options all offer performance improvements that exceed the next higher tier in the previous generation. This means the new RTX 2000 Ada is as fast as the previous A3000, the new RTX 4000 Ada exceeds the previous top-end A5500, and the new mobile RTX 5000 Ada chip with 9,728 CUDA cores and 42 teraflops of single-precision compute performance should outperform the previous A6000 desktop card or the GeForce 3090 Ti. If true, that is pretty impressive, although there’s no word yet on battery life.

New Desktop GPU
The new RTX 4000 small-form-factor Ada takes the performance of the previous A4000 GPU, ups the memory buffer to 20GB and fits it into the form factor of the previous A2000 card, which is a low-profile, dual-slot PCIe card that only uses the 75 watts from the PCIe bus. This allows it to be installed in small-form-factor PCs or in 2U servers that don’t have the full-height slots or PCIe power connectors that most powerful GPUs require. Strangely, it is lower-performing, at least on paper, than the new mobile 4000, with 20% fewer cores and 40% lower peak performance (if the specs I was given are correct). This is possibly due to power limitations of the 75W PCIe bus slot.

The naming conventions across the various product lines continue to get more confusing and less informative, which I am never a fan of.  My recommendation is to call them the Ada 19 or Ada 42 based on the peak teraflops. That way it is easy to see how they compare, even over generations against the Turing 8 or the Ampere 24. This should work at least for the next four to five generations until we reach petaflops, when the numbering will need to be reset again.
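To put a number on that suggestion: peak FP32 throughput scales roughly as two floating-point operations per CUDA core per clock, so the teraflops figure can be estimated from core count and boost clock. The clocks below are approximate and only for illustration.

```python
# Back-of-the-envelope FP32 teraflops: 2 ops (fused multiply-add) per CUDA core per clock.
def peak_tflops(cuda_cores: int, boost_ghz: float) -> float:
    return 2 * cuda_cores * boost_ghz / 1000.0

print(f"Mobile RTX 5000 Ada: ~{peak_tflops(9728, 2.2):.0f} TFLOPS")  # roughly the quoted 42
print(f"GeForce RTX 4070: ~{peak_tflops(5888, 2.475):.0f} TFLOPS")   # using its 2.475GHz boost clock
```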

New Server Chips
There are also new announcements targeted at supercomputing and data centers. The Hopper GPU is focused on AI and large language model acceleration, usually installed in sets of 8 SXM modules in a DGX server. Also, Nvidia’s previously announced Grace CPU Superchip is in production as its new ARM-based CPU. Nvidia offers these chips as dual-CPU processing boards or combined as an integrated Grace-Hopper Superchip, with shared interface bus and memory between the CPU and GPU. The new Apple Silicon processors use the same unified memory approach.

There are also new PCIe-based accelerator cards, starting with the H100 NVL, which brings the Hopper architecture to a PCIe card offering 94GB of memory for transformer processing. “Transformer” is the “T” in ChatGPT, by the way. There are also Lovelace architecture-based options, including the single-slot L4 for AI video processing and the dual-slot L40 for generative AI content generation.

Four of these L40 cards are included in the new OVX-3 servers, designed for hosting and streaming Omniverse data and applications. These new servers from various vendors will have options for either Intel Sapphire Rapids- or AMD Genoa-based platforms and will include the new BlueField-3 DPU cards and ConnectX-7 NICs. They will also be available in a predesigned Superpod of 64 servers and a Spectrum-3 switch for companies that have a lot of 3D assets to deal with.

Omniverse Updates
On the software side, Omniverse has a variety of new applications that support its popular USD data format for easier interchange, and it now supports the real-time, raytraced, subsurface scattering shader (maybe, RTRTSSSS for short?) for more realistic surfaces. Nvidia is also partnering closely with Microsoft to bring Omniverse to Azure and to MS 365, which will allow Microsoft Teams users to collaboratively explore 3D worlds together during meetings.

Generative AI
Nvidia Picasso — which uses generative AI to convert text into images, videos or 3D objects — is now available to developers like Adobe. So in the very near future, we will reach a point where we can no longer trust the authenticity of any image or video that we see online. It is not difficult to see where that might lead us. One way or another, it will be much easier to add artificial elements to images, videos and 3D models. Maybe I will finally get into Omniverse myself when I can just tell it what I have in mind, and it creates a full-3D world for me. Or maybe if I need it to just add a helicopter into my footage for a VFX shot with the right speed and perspective. That would be helpful.

Some of the new AI developments are concerning from a certain perspective, but hopefully these new technologies can be harnessed to effectively improve our working experience and our final output. Nvidia’s products are definitely accelerating the development and implementation of AI across the board.


Mike McCarthy is a technology consultant with extensive experience in film post production. He started posting technology info and analysis at HD4PC in 2007. He broadened his focus with TechWithMikeFirst 15 years later.


Review: Nvidia’s Founders Edition RTX 4090 — An Editor’s Perspective

By Brady Betzel

Nvidia has released its much-anticipated RTX 4090 GPU. It’s big and power-hungry, and I’ll provide more details on that later in this piece. When the product was released, I initially held off filing this review to see if Nvidia or Blackmagic (who showed a prerelease version of Resolve with AV1 encoding technology that only works with the new 4090 series) would release any Easter eggs, but so far it hasn’t happened.

Whether they do or not, I plan on doing a more-in-depth review once I’ve settled in and found the RTX 4090 sweet spots that will help editors and colorists. But for now, there are still some gems in the RTX 4090 that are worth checking out. (For a tech guru perspective, check out Mike McCarthy’s review.)

Founder’s Edition
The Nvidia GeForce RTX 4090 GPU comes in a few different flavors and iterations. I was sent the Founders Edition, which features the new Ada Lovelace architecture, dual FP32 datapaths in its streaming multiprocessors, the DLSS 3 platform, 16,384 CUDA cores, 24GB of GDDR6X memory, a 2.23GHz base clock with up to a 2.52GHz boost clock, and much more. You can find more in-depth technical specs on the Nvidia site, where you can also compare previous versions of Nvidia GPUs.

In this review, I am focusing on features that directly relate to video editors and colorists. For the most part, the RTX 4090 performance is as expected, with a generational improvement over the RTX 3090. It’s faster and contains new updates, like DLSS 3 (an artificial intelligence-powered performance booster). Those features are typically gaming-focused and embrace technologies like optical flow to “create” higher resolutions and frames to increase frame rates. That doesn’t typically mean much for us post nerds, unless you also play games, but with artificial intelligence-adapted features becoming so prevalent, we are beginning to see speed increases in editing apps as well.

Resolve Prerelease
As editors, we need faster rendering, faster exporting and more efficient decoding of high-resolution media. We always hear about 8K or 4K, but you don’t always hear how much computing and GPU power you need to play these large resolutions back in real time, especially when you are editing with CPU/GPU-hogging codecs like Red R3D, H.264 and more.

Inside of DaVinci Resolve 18, I was able to play back all my standard testing files in real time without any effects on them. From UHD ProRes files to UHD Red R3D files, the RTX 4090 handled all of them. Even when I played back 8K UHD (7680×4320) ProRes files, I was pleasantly surprised at the smooth, real-time playback. All the files played back without using cache files, proxy files or pre-rendered media.


Keep in mind I was using a prerelease version of Blackmagic’s DaVinci Resolve (mentioned earlier) to harness the power of AV1 encoding. And AV1 is the real gem in the updated Nvidia RTX 4090 GPU architecture. This is why I mentioned “prerelease” in the last sentence. I’ve heard through the grapevine that a newly updated Resolve will be released sometime this fall and will include some of the features I’m about to go into. But for now, I’m sorry. That’s all I’ve got in terms of a release date.

So what is AV1? Think of the old tried-and-true H.264 and H.265 codecs but with roughly 30% smaller file sizes for equivalent quality. Without getting too far into the weeds, the AV1 codec came about when a group of big companies like Intel, Nvidia, Google and others wanted to create a royalty-free video codec with the same quality but better efficiency than existing codecs such as H.264 (AVC) and H.265 (HEVC). That is how the AV1 codec was born. AV1 is still on the ground floor, but with large companies like Nvidia adding new features such as AV1-capable dual encoders, and with nonlinear editing apps like Resolve adding AV1 encoding support, it will soon hit the mainstream.
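If you want to see the size advantage for yourself, one rough approach is to encode the same master at a matched quality target with HEVC and AV1 hardware encoders and compare the resulting file sizes. This is only a sketch: the source name, quality value and presets are placeholder assumptions, and av1_nvenc requires a 40-series GPU and a recent ffmpeg build.

```python
# Compare HEVC vs. AV1 file size at a matched constant-quality target (values are illustrative).
import os, subprocess

SOURCE = "master_uhd.mov"  # hypothetical graded UHD master

for codec, outfile in [("hevc_nvenc", "test_hevc.mp4"), ("av1_nvenc", "test_av1.mp4")]:
    subprocess.run(["ffmpeg", "-y", "-i", SOURCE, "-c:v", codec,
                    "-preset", "p5", "-cq", "30", "-b:v", "0", outfile], check=True)
    print(f"{codec}: {os.path.getsize(outfile) / 1e6:.1f} MB")
```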


Nvidia really took the bull by the horns on AV1, and Blackmagic followed along. In the prerelease version of Resolve that I used, I encoded the included 4K (UHD) 30fps ProRes 422 HQ clip provided by Nvidia, which has a run time of about 2 minutes and 7 seconds, to the new AV1 codec in about 17 seconds. And since no other card can export AV1 files yet, there is really no benchmark for me to compare to. However, I did export the same sequence to an H.265 encoded file using an Nvidia Quadro A6000 GPU, and that took about 39 seconds. I was kind of surprised given that the A6000 contains double the memory and costs over double the price, but when I looked deeper into it, it made sense.

The RTX 4090 is a much newer card with much newer technology, including over 6,000 more CUDA cores. But for a pro who needs the extended memory range; a compact, two-slot design; and half the power consumption, the Quadro A6000 will fit better (literally). The RTX 4090 is physically large and takes up three slots.


AI, Editing and Color
Remember a bit earlier when I mentioned AI technology and how it’s creeping its way into the tech that video editors and colorists use? While the RTX 4090 is more of a gamer’s card, there are a few very specific updates that video editors and colorists will like. One of them, inside Resolve 18’s prerelease, is Magic Mask. I’ve used it before, and it is very good, but it’s also time-consuming, especially if you don’t have a very fast CPU/GPU. Lucky for us, the RTX 4090 has dramatically improved Magic Mask processing speeds. Nvidia reports that the time difference in its testing was 29 seconds for the RTX 3090, 17 seconds using the RTX 4090 and 34 seconds using the Quadro A6000. Some other AI-improved features of Resolve 18 are Scene Detect, Super Scale and Optical Flow.

The Nvidia RTX 4090 has shown increased efficiency when compared to the Quadro A6000. Besides the increased memory size, Frame Lock and Genlock are the standout features of the A6000 that are going to matter to users looking to decide between the two GPUs. For media creators, the RTX 4090 is a phenomenal GPU that will dramatically decrease export times, media processing times, effects render times and much more, which directly correlates to the “time is money” adage.

Power Needs, Cost
The RTX 4090 is a power-hungry beast, straight up. It needs three PCI slots, three power inputs and a beefy power supply. The specs say that the RTX 4090 requires 450W of power versus 320W for the RTX 3090. And in overall system power, the RTX 4090 requires at least 850W, while the 3090 requires 750W. If you aren’t familiar with the RTX 3090 style of PCIe cards, both the 3090 and the 4090 require either three PCIe eight-pin cables or one 450W-or-greater PCIe Gen 5 cable. So you should probably aim for a power supply capable of producing at least 1,000 watts, keeping in mind that any other I/O cards you are supporting will also add to the power bill.
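For a rough sense of how that 1,000-watt recommendation falls out, here is a simple headroom calculation. All of the wattages are ballpark placeholders; check the actual ratings of your own components.

```python
# Simple PSU headroom estimate (all wattages are rough placeholders).
parts = {
    "RTX 4090": 450,
    "High-end desktop CPU": 250,
    "Motherboard, RAM, drives, fans": 100,
    "Capture/IO card": 30,
}
total = sum(parts.values())
recommended_psu = total / 0.8  # keep the PSU at roughly 80% load or less at peak
print(f"Estimated peak draw: {total}W -> look for a PSU of at least {recommended_psu:.0f}W")
```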

Retail pricing for the RTX 4090 starts at $1,599. It’s not cheap, so if you have an RTX 3090 and don’t care about the AV1 encoding feature (whether for Resolve or for streaming apps like OBS), then you might be able to hold off on purchasing one. But if you are like me and want the latest and greatest, the Nvidia RTX 4090 will be the GPU to get. And if you are thinking about getting into other avenues of media — say, Unreal Engine or 3D modeling in apps like Maxon Cinema 4D with Otoy’s OctaneRender — you’ll find the RTX 4090 embraces those apps and even adds special features, such as denoising optimizations.

Watch this space for an Nvidia RTX 4090 follow-up review.


Brady Betzel is an Emmy-nominated online editor at Margarita Mix in Hollywood, working on shows like Life Below Zero and Uninterrupted: The Shop. He is also a member of the Producers Guild of America. You can email Brady at bradybetzel@gmail.com. Follow him on Twitter @allbetzroff.



Nvidia Introduces Ada Lovelace GPU Architecture

By Mike McCarthy

Nvidia has announced its next-generation GPU architecture, named after Ada Lovelace, who was an interesting figure in early computer programming (check out her Wikipedia page). Nvidia’s newest Ada Lovelace chips have up to 18,432 CUDA cores and up to 76 billion transistors on a 4nm process. This will of course lead to increased processing performance at lower prices and power usage.


The new changes in the SDK and rendering level encompass DLSS 3.0 for super sampling; RTX Remix for adding new rendering features to mods on older games; shader execution reordering for improved performance; displaced micro-meshes for finer shadow detail; opacity micromaps for cleaner raytracing; and Nvidia Reflex for coordinating the CPU with the GPU. The biggest render function unique to the new generation is optical flow acceleration, which consists of AI-generated frames for higher frame rates.

Ada Lovelace chips have optical flow accelerators in the hardware, and Nvidia is training the AI models for this technology on its own supercomputers. Raytracing is now far more efficient with the newer-generation RT cores. RTX Racer will be a tech demo application released later this year for free and will leverage all these new technologies. RTX Remix can extract 3D objects right from the GPU draw calls and create USD assets from them. It also adds raytracing to older games by intercepting draw calls from DirectX. Users can further customize any RTX mod in real time by adjusting various render settings. As someone who usually plays older games, this is exciting, as I suspect it will lead to all sorts of improvements to older titles.

New GeForce Cards
The main products headlining this announcement are the new GeForce 4090 and 4080 cards, which should far outclass the previous generation released two years ago. Also, contrary to numerous rumors, they should consume less energy than the existing, power-hungry Ampere cards. The 4090 will have 24GB of memory like the previous generation, while the 4080 will come in 12GB and 16GB variants, with the 16GB version offering a more powerful chip and not just more RAM. Even the lower-tier 4080 outperforms the existing 3090 Ti in most cases.

The new cards will have a PCIe Gen5 power connector, which offers up to 600W of power, but the cards draw much less energy than that. They do not have PCIe Gen5 slot connectors, and this is because they have yet to saturate the bandwidth available in a PCIe Gen4 slot.

AV1 Encoding
One of the other significant new features in this generation of chips is the addition of hardware acceleration for AV1 encoding. AV1 decoding support is already included in the existing Ampere chips, but this is the first hardware encoder available outside of Intel’s hard-to-find discrete graphics cards. AV1 is a video codec that claims to offer 30% more efficient compression than HEVC while being open-source and royalty-free.


Netflix and a few other large tech companies have been offering AV1 as a streaming option for a while now, but it has not been much of an option for smaller content creators. That is about to change with new software coming, like a hardware-accelerated AV1 encoding plugin for Adobe Premiere Pro and Blackmagic DaVinci Resolve. Integrated AV1 streaming support that will use the new hardware acceleration is also coming to OBS and Discord.

Resolve now uses the GPU for RAW decode, AI analysis and encoding, making it a real GPU computing powerhouse. I imagine that YouTube will soon have a lot of AV1 content streaming through it. The new cards have dual encoders that work in parallel by dividing each frame between them, allowing up to 8Kp60 encoding in real time. I assume that in the future, lower-end cards will have a single encoder for 8Kp30, which should be good enough for most people.

Professional RTX
There is a new RTX 6000 professional GPU coming in December, not to be confused with the identically named Turing-based card from two generations ago. Nvidia’s product naming has really gone downhill since they dropped the Quadro branding. But regardless of what it is called, the new RTX 6000 should be a very powerful graphics card, with Nvidia claiming up to twice the performance of the current A6000. It has a similar underlying Ada Lovelace chip to the 4090 but with a lower 300W power envelope and a more manageable two-slot cooling solution.

So there is a whole new generation of hardware coming, and it will get here soon. Both Intel and AMD are releasing their next generation of CPUs, and we will have new graphics cards to go with them. Even if you can’t afford a new high-end Ada Lovelace GPU, hopefully this will drive down the prices for the previous-generation cards that have been so difficult to find up to this point due to the cryptocurrency craze. One way or another, faster GPUs are coming, and I am looking forward to all that they bring to the table.


Mike McCarthy is a technology consultant with extensive experience in film post production. He started posting technology info and analysis at HD4PC in 2007. He broadened his focus with TechWithMikeFirst 10 years later.


GTC 2021: New GPU Cards, Omniverse and More

By Mike McCarthy

Nvidia’s GPU Technology Conference (GTC) is taking place virtually this week, with a number of new hardware and software announcements.

On the hardware front, Nvidia has four new PCIe-based GPU cards, which are essentially scaled-back versions of the previously announced A6000 and A40. The new A5000 and A4000 are Ampere generations of the previous Turing-based Quadro RTX 5000 and 4000.


The new cards have 2.5 times as many CUDA cores as their predecessors and 8GB more RAM. The A5000 has 8K cores and 24GB RAM, while the A4000 has 6K cores and 16GB RAM. This makes them the professional equivalents of the GeForce 3080 and 3070, respectively, but with more RAM. The A4000 seems like the ideal professional card for video editors and others who need professional GPU processing but not the overpriced 3D performance of the A6000.

On the server and virtualization side, the A10 replaces the T4 and sits between the A5000 and A4000 performance-wise. The A16 replaces the existing M10 as a 4x GPU card, with 64 GB RAM in total, targeting remote desktop hosting.


The other major hardware announcement is the mobile line of professional GPUs, scaling from the A5000 down to the T600. The mobile A5000 has 6K cores, doubling the number of cores from the previous-generation RTX 5000 mobile GPUs. This scales down to 5K cores in the A4000, 4K cores in the A3000 and 2,560 cores in the new A2000, which matches the specs of the previous second-tier Quadro RTX 4000.

So we are seeing a huge increase over the previous generation, but compared to the desktop cards, we are no longer seeing parity between the PCIe product naming and the mobile solution labels. The new mobile A5000 is equal to the desktop A4000, which makes the entire naming convention really difficult to intuitively decode or understand. Suffice to say, the new products are much faster than the old products, but it is difficult to compare them to each other. Similar to dropping the Quadro branding, Nvidia seems to be taking steps to make it more difficult to keep track of or compare their different product offerings. I say this as a huge fan of Nvidia’s products: “This confusion is not helping your end users.”


Omniverse
On the software front, Nvidia’s Omniverse is the culmination of a number of technologies being combined to greatly enhance 3D workflows and collaboration between different apps and users. Based on Pixar’s USD (Universal Scene Description), it can link the 3D assets being worked on in applications from completely different vendors. On a single system, it appears to work like Adobe Dynamic Link, where changes made in one program show up immediately on assets in a separate application. But in other ways it is similar to the role of NDI for real-time workflows, sharing content between systems on a network or across the internet. The individual user version is a free tool, but the Nucleus server that allows sharing between multiple users will be an enterprise-level solution in the cloud. Nvidia has also partnered with Apple to allow direct support of viewing Omniverse XR content on iPads and iPhones. I don’t personally do much work in 3D, but I can see the benefits of what they have developed here, and I am sure it will make a lot of people’s creative 3D work much easier and more efficient.
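For anyone curious what the USD layer underneath all of this looks like, here is a tiny illustration using Pixar’s USD Python bindings (the pxr module): one application authors a stage, and any other USD-aware tool or Omniverse connector can open the same file and layer its own edits on top. The file path and prim names are arbitrary examples.

```python
# Author a minimal USD stage that other USD-aware applications can open and extend.
from pxr import Usd, UsdGeom

stage = Usd.Stage.CreateNew("shared_scene.usda")
world = UsdGeom.Xform.Define(stage, "/World")
sphere = UsdGeom.Sphere.Define(stage, "/World/HeroSphere")
sphere.GetRadiusAttr().Set(50.0)
stage.SetDefaultPrim(world.GetPrim())
stage.GetRootLayer().Save()
# Another DCC (or an Omniverse connector) can now reference shared_scene.usda
# and add its own edits in a separate layer without touching this file.
```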


Grace
Nvidia also announced its new line of “Grace” ARM processors. They are not yet available, but after their initial debut in data center servers, they may eventually make it into consumer systems. While this probably won’t totally replace Intel’s x86 CPUs anytime soon, Apple’s M1 architecture demonstrates that Nvidia isn’t the only company betting that x86 processing can be replaced in many new applications. So while Grace-based products aren’t going to impact your work immediately, they may be the first step toward a big change well into the future.

Sessions
Beyond the new products announced during the conference, there are also hundreds of sessions where attendees can learn about different technologies and their implementations from Nvidia’s staff and top users. The ones that stick out to me are Rob Legato’s session on virtual cinematography and a roundtable on in-camera VFX. Both are scheduled for Wednesday. Since the online version of GTC is free to attend, you are welcome to check them out along with hundreds of other sessions that are available to watch.


Mike McCarthy is a technology consultant with extensive experience in film post production. He started posting technology info and analysis at HD4PC in 2007. He broadened his focus with TechWithMikeFirst 10 years later.



GTC Conference: Advances in Virtual Production

By Mike McCarthy

I usually attend GTC in San Jose each spring, but that was interrupted this year due to COVID-19. Instead, I watched a couple of sessions online. This fall, Nvidia normally would have held its GTC Europe but instead made it a global online event. As a global event, the sessions are scheduled at all hours, depending on where in the world the presenters or target audience are — hence the tagline, “Innovation never sleeps.” Fortunately, the sessions that were scheduled at 2am or 3am were recorded, so I could watch them at more convenient times. The only downside was not being able to ask questions.

Nvidia is building a supercomputer for UK’s healthcare research using artificial intelligence.

While the “G” in GTC stands for GPU, much of the conference is focused on AI and supercomputing, with a strong emphasis on health care applications. Raytracing — the primary new feature of Nvidia’s hardware architecture — is graphics-focused, but it is limited to true 3D rendering applications and can’t be easily applied to other forms of image processing. Since I don’t do much 3D animation work, those topics are less relevant to me.

In-Camera VFX
The one graphics technology development that I am most interested in at the moment — the focus of most of the sessions I attended — is virtual production. Or more precisely, in-camera VFX. This first caught my attention at a previous GTC, in sessions about the workflow in use on the show The Mandalorian. I was intrigued by it at the time, and my exploration of those workflow possibilities only increased when one of my primary employers expressed an interest in moving toward that type of production.

Filmmaker Hasraf “HaZ” Dulull using his iPad on a virtual production.

There were a number of sessions at this GTC that touched on virtual production and related topics. I learned about developments in Epic’s Unreal Engine, which seems to be the most popular starting point due to its image quality and performance. There were sessions that touched on applications that build on that foundation — to add the functionality that various productions need — and on the software-composable infrastructure that you can run those applications on.

I saw a session with Hasraf “Haz” Dulull, a director who has made some shorter films in Unreal Engine. He is just getting started on a full-length feature film adaptation of the video game Mutant Year Zero, and it’s being created entirely in Unreal Engine as final pixel renders. While it is entirely animated, Haz uses his iPad for both facial performance capture and virtual camera work.

One of my favorite sessions was a well-designed presentation by Matt Workman, a DP who was demonstrating his previz application Cine Tracer, which runs on Unreal Engine. He basically went through the main steps of an entire virtual production workflow.

There are a number of different complex components that have to come together seamlessly for in-camera VFX, each presenting its own challenges. First you have to have a 3D world to operate in, possibly with pre-animated actions occurring in the background. Then you have to have a camera tracking system to sync your view with the 3D world, which is the basis for simpler virtual production workflows.

To incorporate real-world elements, your virtual camera has to be synced with a physical camera in order to record real objects or people, and you have to composite in the virtual background. Or, for true in-camera VFX, you have to display the virtual environment on an LED wall behind the action. This requires powerful graphics systems to drive imagery on those displays, compensating for their locations and angles. Then you have to be able to render the 3D world onto those displays from the tracked camera’s perspective. Lastly, you have to be able to view and record the camera output, as well as, presumably, a clean background plate to further refine the output in post.
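To make the “render the 3D world from the tracked camera’s perspective” step a little more concrete, the core math is an off-axis (generalized) perspective projection: the LED panel is treated as a window, and the view frustum is rebuilt every frame from the tracked camera position and the panel’s corner positions. Engines such as Unreal handle this for you through nDisplay, but here is a minimal numpy sketch of the projection itself, following Kooima’s well-known formulation; the wall dimensions and camera position are arbitrary examples.

```python
import numpy as np

def off_axis_projection(pa, pb, pc, pe, near, far):
    """Projection matrix for an eye at pe looking 'through' a screen defined by its
    lower-left (pa), lower-right (pb) and upper-left (pc) corners."""
    pa, pb, pc, pe = map(np.asarray, (pa, pb, pc, pe))
    vr = pb - pa; vr = vr / np.linalg.norm(vr)          # screen right axis
    vu = pc - pa; vu = vu / np.linalg.norm(vu)          # screen up axis
    vn = np.cross(vr, vu); vn = vn / np.linalg.norm(vn) # screen normal
    va, vb, vc = pa - pe, pb - pe, pc - pe              # eye-to-corner vectors
    d = -np.dot(va, vn)                                 # eye-to-screen distance
    l = np.dot(vr, va) * near / d
    r = np.dot(vr, vb) * near / d
    b = np.dot(vu, va) * near / d
    t = np.dot(vu, vc) * near / d
    P = np.array([[2*near/(r-l), 0, (r+l)/(r-l), 0],
                  [0, 2*near/(t-b), (t+b)/(t-b), 0],
                  [0, 0, -(far+near)/(far-near), -2*far*near/(far-near)],
                  [0, 0, -1, 0]])
    M = np.identity(4); M[:3, :3] = np.column_stack((vr, vu, vn)).T  # rotate into screen space
    T = np.identity(4); T[:3, 3] = -pe                               # move eye to origin
    return P @ M @ T

# Example: a 6m x 3m LED wall 4m in front of a camera tracked slightly off-center.
proj = off_axis_projection(pa=[-3, 0, -4], pb=[3, 0, -4], pc=[-3, 3, -4],
                           pe=[0.5, 1.5, 0], near=0.1, far=100.0)
```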

Each of these steps has a learning curve, leading to a very complex operation before all is said and done. My big takeaway from all of my sessions at the conference is that I need to start familiarizing myself with Unreal Engine. Matt Workman’s Cine Tracer application on Steam might be a good way to start learning the fundamentals of those first few steps if you aren’t familiar with working in 3D.


Lenovo P620 & GPUs
Separately, a number of sessions touched on Lenovo’s upcoming P620 workstation based on AMD’s Threadripper Pro architecture. That made sense, as that will be the only way in the immediate future to take advantage of Ampere GPUs’ PCIe 4.0 bus speeds for higher-bandwidth communication with the host system. I am hoping to do a full review of one of those systems in the near future.

I also attended a session that focused on using GPUs to accelerate various graphics tasks for sports broadcasting, including stitching 360 video at 8K and creating live volumetric renders of athletes with camera arrays. As someone who rarely watches broadcast television, I have been surprised to see how far live graphics have come with various XR effects and AI-assisted on-screen notations and labels to cue viewers to certain details on the fly. The GPU power is certainly available; it has just taken a while for the software to be streamlined enough to use it effectively.


Mike McCarthy is an online editor/workflow consultant with over 10 years of experience on feature films and commercials. He has been involved in pioneering new solutions for tapeless workflows, DSLR filmmaking and multi-screen and surround video experiences. Check out his site.


Review: Nvidia’s New Ampere-Based GeForce 3090 Series

By Mike McCarthy

Nvidia is releasing the next generation of GeForce video cards based on its new Ampere architecture. Nvidia is also sharing a number of new software developments it has been working on. Some are available now, while others are coming soon.

The first cards in the GeForce RTX 30 Series are the 3070, 3080 and 3090. They offer varying numbers of CUDA cores and amounts of video memory, and they strongly outperform the cards they are replacing. They all support the PCIe 4.0 bus standard for greater bandwidth to the host system. They also support HDMI 2.1 output to drive displays at 8Kp60 with a single cable and can encode and decode at 8K resolutions. I have had the opportunity to try out the new GeForce RTX 3090 in my system and am excited by the potential it brings to the table.

Nvidia originally announced its Ampere GPU architecture back at GTC in April, but it was initially available for supercomputing customers only. Ampere has second-generation RTX cores and third-generation Tensor cores, while having twice as many CUDA cores per streaming multiprocessor (SM), leading to some truly enormous core counts. These three new GeForce graphics cards improve on the previous Turing-generation products by a significant margin, due both to increased cores and video memory and to software developments that improve efficiency.

The biggest change to improve 3D render performance is the second generation of DLSS (Deep Learning Super Sampling). Basically, Nvidia discovered that it was more efficient to use AI to guess the value of a pixel than to actually calculate it from scratch. DLSS 2.0 runs on the Tensor cores, leveraging deep learning to increase the detail level of a rendered image by increasing its resolution via intelligent interpolation. It is basically the opposite of Dynamic Super Resolution (DSR), which rendered higher-resolution frames and scaled them down to fit a display’s native resolution, potentially improving image quality.

DLSS usually increases the resolution by a factor of four, for example, generating a UHD frame from an HD render. But now it supports running by a factor of nine, generating an 8K output image from a 2560×1440 render. While this can improve performance in games, especially at high display resolutions, DLSS can also be leveraged to improve the interactive experience in 3D modeling and animation applications.
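Those factors are ratios of output pixels to rendered pixels; a quick check of the two cases mentioned above:

```python
# DLSS scale factor = output pixels / rendered pixels.
def scale_factor(render, output):
    return (output[0] * output[1]) / (render[0] * render[1])

print(scale_factor((1920, 1080), (3840, 2160)))  # HD -> UHD: 4.0
print(scale_factor((2560, 1440), (7680, 4320)))  # 1440p -> 8K: 9.0
```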

Nvidia also announced details for another under-the-hood technology it calls RTX IO, which will allow the GPU to directly access data in storage, bypassing the CPU and system memory. It will use Microsoft’s upcoming DirectStorage API to do this at an OS level. This frees those CPU and host system resources for other tasks and should dramatically speed up scene or level loading times. Nvidia is talking about 3D scenes and game levels opening up five to 10 times faster when they decompress data from storage directly on the GPU. If this technology can be used to decompress video files directly from storage, that could help video editors as well. Nvidia also has RTXGI, which is accelerated global illumination running on RTX cores. The new second-generation RTX cores also have new support for rendering motion blur far more efficiently than in the past.

New Technology
Nvidia Reflex is a new technology for measuring and reducing gaming lag — in part by passing the mouse data through the display to compare a user’s movements to changes on screen. Omniverse Machinima is a toolset aimed at making it easier for users to use 3D gaming content to create cinematic stories. It includes tools for animating character face and body poses, as well as RTX-accelerated environmental editing and rendering. Nvidia Broadcast is a software tool that uses deep learning to improve your webcam stream, with AI-powered background noise reduction and AI subject tracking or automated background replacement, among other features. The software creates a virtual device for the improved stream, allowing it to be used by nearly any existing streaming application. Combining it with NDI Tools and/or OBS Studio could lead to some very interesting streaming workflow options, and it is available now if you have an RTX card.

The New GeForce Cards
The three new cards are available from Nvidia as Founders Edition products and from other GPU vendors with their own unique designs and variations. The Founders Edition cards have a consistent visual aesthetic, which is strongly driven by their thermal dissipation requirements. The GeForce RTX 3070 will launch next month for $500 and features a fairly traditional cooling design, with two fans blowing on the circuit board. The 3070 has 5,888 CUDA cores and 8GB of GDDR6 RAM, offering a huge performance increase over the previous generation’s RTX 2070 with 2,560 cores. With the help of DLSS and other developments, Nvidia claims that the 3070’s render performance in many tasks is on par with the RTX 2080 Ti, which costs over twice as much.

The new GeForce RTX 3080 is Nvidia’s flagship “gaming” card, with 8,704 CUDA cores and 10GB of GDDR6X memory for $700. It should offer 50% more processing than the upcoming RTX 3070 and double the previous RTX 2080’s performance. Its 320W TDP is high, requiring a new board design for better airflow and ventilation. The resulting PCB is half the size of the previous RTX 2080’s, allowing air to flow through the card for more effective cooling. But the resulting product is still huge — mostly heat pipes, fins and fans.

Lastly, the GeForce RTX 3090 occupies a unique spot in the lineup. At $1,500, it exceeds the Titan RTX in every way, with 10,496 CUDA cores and 24GB of GDDR6X memory for 60% of the cost. It is everything the 3080 is, but scaled up. The same compressed board shape allows for the unique airflow design in a three-slot package that is an inch taller than a full-height card. This allows it to dissipate up to 350W of power. Its size means you need to ensure that it can physically fit in your system case or design a new system around it.

It is primarily targeted at content creators, optimized for high-resolution image processing, be that gaming at 8K, encoding and streaming at 8K, editing 8K video or rendering massive 3D scenes. The 3090 is also the only one of the new GeForce cards that allows for NVLink to harness the power of two cards together, but that has been falling out of favor with many users anyway, as individual GPUs increase in performance so much with each new generation.

All of the cards offer three DisplayPort 1.4 outputs and a single HDMI 2.1 port. They offer hardware decode of the new AV1 video codec at up to 8K resolution and encode and decode H.264 and HEVC, including screen capture and live-streaming of content through ShadowPlay. All three use a new 12-pin PCIe power connector, which can be adapted from dual 8-pin connectors for the time being, but expect to see these new plugs on power supplies in the future.

My Initial Tests
I was excited to recently receive a new GeForce RTX 3090 to test out, primarily for 8K and HDR workflows. I also will be reviewing a new workstation in the near future, which will pair well with this top-end card, but have not received that system yet and wanted to get readers some immediate initial results. I will follow up with more extensive benchmarking comparing this card to the 2080Ti and P6000 next month once I have that system to test with.

So I set about installing the 3090 into my existing workstation, which was a bit of a challenge. I had to rearrange all of my cards and remove some to free up three adjacent PCIe slots. I also had to adapt some power cables to get the two 8-pin PCIe plugs I needed for the included adapter to connect to the new 12-pin mini plug. Everything I did works fine for the time being, but I can’t close the top of the 3U rack-mount case. (One more reason besides airflow to always opt for the 4U option.) Since case enclosures are important for proper cooling, I added a large fan over the whole open system.

I had already installed the newest Studio Driver, which worked with my existing Quadro cards as well as the new GeForce cards. That is one of the many benefits of the new unified Studio Drivers; I don’t have to change drivers when switching between GeForce and Quadro cards for comparison benchmarks. My first test was in Adobe Premiere Pro, where I discovered that new architectures make a significant difference when you access the hardware as deeply as Adobe does. So I had to use the newest public beta version of Premiere Pro to get playback working with CUDA enabled. The newest beta version of Premiere Pro also supports GPU-accelerated video decoding, which is great, but it makes it hard to tell which speed improvements are from Nvidia’s software optimizations and which are from this new hardware.

Playback performance can be hard to quantify compared to render times, but Ctrl+Alt+F11 enables some on-screen information that gives Premiere users some insight into how their system is behaving during playback, combined with the dropped frame indicator and third-party hardware monitoring tools. I got smooth Premiere playback of 4K HDR material with high-quality playback enabled, which is necessary to avoid banding. 8K sequences played back well at half-res, but full resolution was hit or miss, depending on how much I started and stopped playback. Playing back long clips worked well, but stopping and resuming playback quickly usually led to dropped frames and hiccupping playback. Full-res 8K playback is only useful if you have an 8K display, which I do. But 8K content can be edited at lower display resolution if necessary. Basically, my current system is sufficient for 4K HDR editing, but to really push this card to its full potential for smooth 8K editing, I need to put it in a system that isn’t seven years old.

Blackmagic DaVinci Resolve also benefits from the new hardware, especially the increased RAM, for things like image de-noising and motion blur at larger frame sizes, which fail without enough memory on lower-end cards. While technically previous-generation Quadro cards have offered 24GB RAM, they were at much higher price points than the $1,500 GeForce 3090, making it the budget-friendly option from this perspective. And the processing cores help as well, as Nvidia reports a 50% decrease in render times for certain complex effects, which is a better improvement than it claims for similar Premiere export processes.
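To give a rough sense of why frame size drives GPU memory use so quickly for operations like temporal noise reduction, here is a back-of-envelope sketch. The 32-bit float working precision, five-frame temporal window and 2x overhead factor are my own illustrative assumptions, not figures published by Blackmagic:

```python
# Rough VRAM estimate for temporal processing of large frames.
# Precision, frame window and overhead factor below are illustrative assumptions.

def frame_bytes(width, height, channels=4, bytes_per_channel=4):
    # Uncompressed frame size, assuming 32-bit float RGBA working buffers.
    return width * height * channels * bytes_per_channel

for label, (w, h) in {"UHD (3840x2160)": (3840, 2160), "8K (7680x4320)": (7680, 4320)}.items():
    per_frame_gb = frame_bytes(w, h) / 1e9
    window = 5                              # assumed temporal window held in VRAM
    working_gb = per_frame_gb * window * 2  # assume roughly 2x overhead for intermediates
    print(f"{label}: ~{per_frame_gb:.2f} GB per frame, ~{working_gb:.1f} GB working set")
```

Even with conservative assumptions, an 8K frame is roughly half a gigabyte of working buffer on its own, so multi-frame effects plus the rest of the timeline cache can quickly exhaust an 8GB card while fitting comfortably in 24GB.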

RedcineX Pro required a new beta update to use the new card’s CUDA acceleration. The existing version does support the RTX 3090 in OpenCL mode, but CUDA support in the new version should be faster. I was able to convert an 8K anamorphic R3D file to UHD2 ProRes HDR, which had repeatedly failed due to memory errors on my previous 8GB Quadro P4000, so the extra memory does help.  My 8K clip takes 30 minutes to transcode in CPU mode and just over a minute with the RTX 3090’s CUDA acceleration.

Blender is an open-source 3D program for animation and rendering, which benefits from CUDA, the OptiX SDK and the new RTX motion-blur functionality. Blender has its own benchmarking tool, and my results were much faster with OptiX enabled. I have not tested with Blender in the past, so I don’t have previous benchmarks to compare to, but it is pretty clear that having a powerful GPU makes a big difference here compared to CPU rendering.
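If you want to run a similar comparison yourself, Cycles can be pointed at the CPU or at OptiX from Blender’s command line. Here is a minimal timing sketch, assuming blender is on your PATH and "scene.blend" stands in for whatever test scene you use (OptiX requires a recent Blender build and an RTX-class GPU):

```python
import subprocess, time

def cycles_render_seconds(device):
    # Render frame 1 of the test scene headlessly with the given Cycles device.
    start = time.time()
    subprocess.run(
        ["blender", "-b", "scene.blend", "-E", "CYCLES", "-f", "1",
         "--", "--cycles-device", device],
        check=True, capture_output=True)
    return time.time() - start

for device in ("CPU", "OPTIX"):
    print(f"{device}: {cycles_render_seconds(device):.1f} s")
```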

Octane is a render engine that explicitly uses RTX to do raytraced rendering, and all three scenes I tried were two to five times faster with RTX enabled. I expect we will see these kinds of performance improvements in any other 3D applications that take the time to properly implement RTX raytracing acceleration.

I also tried a few of my games but realized that nothing I play is graphics-intensive enough to see much difference with this card. I mostly play classic games from many years ago that aren’t even taxing the GPU heavily at 8K resolution. I may test Fortnite for my follow-on benchmarking article, since that program will benefit from both RTX and DLSS in HDR, potentially at 8K.

From a video editor’s perspective, most of us are honestly not GPU-bound in our workflows, even at 8K, at least in terms of raw processing power. Where we will see significant improvements is in system bandwidth as we move to PCIe 4.0, and in the added memory on the cards for processing larger frames, in both resolution and bit depth. The content creators who will see the most significant improvements from the new GPUs are those working with true 3D content. Between OptiX, RTX, DLSS and a host of other new technologies, plus more processing cores and memory, they will have a dramatically more interactive experience when working with complex scenes, at higher levels of realtime detail.

Do You Need Ampere?
So who needs these, and which card should you get? Anyone who is currently limited by their GPU could benefit from one, but the benefits are greater for people processing actual 3D content.

Honestly, the 3070 should be more than enough for most editors’ needs, unless you are debayering 8K footage or doing heavy grading work on 4K assets. The step up to the 3080 is 40% more money for 50% more cores and 25% more memory. Therefore 3D artists who need the extra cores will find this to be easily worth the cost. The step up to the 3090 is twice the price for more than twice the memory but only 20% more processing cores.

This upgrade is all about larger projects and frame sizes, as it won’t improve 3D processing by nearly as much, unless your data would have exceeded the 3080’s 10GB memory limit. There are also the size considerations for the 3090: do you have the slots, physical space and power to install a 350W beast? Other vendors may create different configurations, but they are going to be imposing, regardless of the details.
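For readers who want to see where those step-up percentages come from, here is a quick sanity check. The prices and specs below are Nvidia’s launch MSRPs and published CUDA-core and memory counts, pulled from public spec sheets rather than from this review:

```python
# Step-up ratios between the Ampere cards, using launch MSRPs and published specs.
cards = {
    "RTX 3070": {"price": 499,  "cores": 5888,  "mem_gb": 8},
    "RTX 3080": {"price": 699,  "cores": 8704,  "mem_gb": 10},
    "RTX 3090": {"price": 1499, "cores": 10496, "mem_gb": 24},
}

def step(lower, higher):
    lo, hi = cards[lower], cards[higher]
    return {k: round(hi[k] / lo[k], 2) for k in ("price", "cores", "mem_gb")}

print("3070 -> 3080:", step("RTX 3070", "RTX 3080"))  # ~1.4x price, ~1.5x cores, 1.25x memory
print("3080 -> 3090:", step("RTX 3080", "RTX 3090"))  # ~2.1x price, ~1.2x cores, 2.4x memory
```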

So while I love the new cards, due to my fascination with high-end computing hardware, I personally have trouble taxing this beast’s capabilities enough with my current system to make it worth it. We will see if balancing it with a new workstation CPU will allow Premiere to really take advantage of the extra power it brings to the table. Many video editors will be well served by the mid-level options that are presumably coming soon. But for people doing 3D work that stretches their current graphics solutions to their limits, upgrading to the new Ampere GPUs will make a dramatic difference in how fast they can work.


Mike McCarthy is an online editor/workflow consultant with over 10 years of experience on feature films and commercials. He has been involved in pioneering new solutions for tapeless workflows, DSLR filmmaking and multi-screen and surround video experiences. Check out his site.

Otoy uses Nvidia A100 GPUs to speed rendering on Google Cloud

In an effort to speed visual effects workflows, Otoy is now offering the Rndr Enterprise Tier, featuring next-gen Nvidia A100 Tensor Core GPUs on Google Cloud with, according to the company, performance surpassing 8000 OctaneBench. Otoy reached this milestone on a Google Cloud Accelerator-Optimized VM (A2) instance with 16 Nvidia A100 GPUs. Each A100 GPU offers up to 20 times the compute performance compared to the previous-generation processor.

With Google Cloud’s Nvidia A100 instances on Otoy’s Rndr Enterprise Tier, artists can take advantage of OctaneRender’s unbiased, spectrally correct, GPU-accelerated rendering for advanced visual effects, ultra-high-resolution rendering and immersive location-based entertainment formats. Benchmarked using OctaneRender 2020.1.4, Google Cloud instances bring thousands of secure, production-ready Nvidia high-performance GPU clusters to the OctaneRender ecosystem – featuring both Nvidia V100 Tensor Core GPUs (with up to 3000 OctaneBench) and Nvidia A100 Tensor Core GPUs (with up to 8000 OctaneBench).

This fall, OctaneRender users will get early access to Google Cloud’s new Nvidia A100 GPU instances with 40GB VRAM, eight-way Nvidia NVLink support and 1.6TB/s memory bandwidth, offering memory capacity for what Otoy calls “blazingly-fast GPU render times” for the most demanding memory-intensive scenes.

According to Otoy founder/CEO Jules Urbach, “For nearly a decade we have been pushing the boundary of GPU rendering and cloud computing to get to the point where there are no longer constraints on artistic creativity. With Google Cloud’s Nvidia A100 instances featuring massive VRAM and the highest OctaneBench ever recorded, we have reached a first for GPU rendering – where artists no longer have to worry about scene complexity when realizing their creative visions.”

Introduced earlier this year along with OctaneRender 2020, the Rndr Network allows artists to choose between rendering jobs on secure enterprise-tier GPUs like Google Cloud’s Nvidia instances or using the massive processing power available on a network of decentralized GPUs. Artists processing renders on the Rndr Enterprise Tier can also use decentralized GPUs for overflow capacity, providing the flexibility to scale renders across thousands of peer-to-peer nodes when on a deadline or for ultra-high-resolution formats.

The Rndr Enterprise Tier on Google Cloud is available for users of all OctaneRender integrated plugins across 20 of the industry’s leading content creation tools, including Maxon Cinema 4D, Side Effects Houdini and Autodesk Maya. All current OctaneRender 2020 Studio and Enterprise subscribers and OctaneRender Box License holders with an Enterprise maintenance plan can access the Rndr Enterprise Tier.

Image Credit: “Captain Marvel” Main Title Sequence by Elastic. ©2020 Marvel.

Nvidia at SIGGRAPH with new RTX Studio laptops, more

By Mike McCarthy

Nvidia made a number of new announcements at the SIGGRAPH conference in LA this week. While the company didn’t have any new GPU releases, Nvidia was showing off new implementations of its technology, combining AI image analysis with raytracing acceleration for an Apollo 11-themed interactive AR experience. Nvidia also has a number of new 3D software partners supporting RTX raytracing through its OptiX raytracing engine. It allows programs like Blender Cycles, KeyShot, Substance and Flame to further implement GPU acceleration, using RT cores for raytracing and Tensor cores for AI de-noising.

Nvidia was also showing off a number of new RTX Studio laptop models from manufacturers like HP, Dell, Lenovo and Boxx. These laptops all support Nvidia’s new unified Studio Driver, which, now on its third release, offers full 10-bit color support for all cards, blurring the feature-set lines between the GeForce and Quadro products. Quadro variants still offer more frame buffer memory, but support for the Studio Driver makes the GeForce cards even more appealing to professionals on a tight budget.

Broader support for 10-bit color makes sense as we move toward more HDR content that requires the higher bit depth, even at the consumer level. And these new Studio Drivers also support both desktop and mobile GPUs, which will simplify eGPU solutions that utilize both on a single system. So if you are a professional with a modern Nvidia RTX GPU, you should definitely check out the new Studio Driver options.

Nvidia is also promoting GauGAN, its cloud-based AI image-generating program, which you can try for free online. It is a fun toy, and there are a few potential uses in the professional world, especially for previz backgrounds and concept art.


Mike McCarthy is an online editor/workflow consultant with over 10 years of experience on feature films and commercials. He has been involved in pioneering new solutions for tapeless workflows, DSLR filmmaking and multi-screen and surround video experiences. Check out his site.

Autodesk Arnold 5.3 with Arnold GPU in public beta

Autodesk has made Arnold 5.3 with Arnold GPU available as a public beta. The release provides artists with GPU rendering for a set of supported features, plus the flexibility to choose between rendering on the CPU or GPU without changing renderers.

From look development to lighting, support for GPU acceleration brings greater interactivity and speed to artist workflows, helping reduce iteration and review cycles. Arnold 5.3 also adds new functionality to help maximize performance and give artists more control over their rendering processes, including updates to adaptive sampling, a new version of the Randomwalk SSS mode and improved Operator UX.

Arnold GPU rendering makes it easier for artists and small studios to iterate quickly in a fast working environment and scale rendering capacity to accommodate project demands. From within the standard Arnold interface, users can switch between rendering on the CPU and GPU with a single click. Arnold GPU currently supports features such as arbitrary shading networks, SSS, hair, atmospherics, instancing, and procedurals. Arnold GPU is based on the Nvidia OptiX framework and is optimized to leverage Nvidia RTX technology.
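For pipeline scripting, the same CPU/GPU choice is exposed as the render_device option on Arnold’s options node. The following is only a minimal sketch using the Arnold SDK’s Python bindings; "scene.ass" is a hypothetical exported scene, and exact call names can vary between Arnold releases:

```python
# Hedged sketch: switching Arnold between CPU and GPU rendering from Python.
# Assumes the Arnold SDK's Python bindings are installed; "scene.ass" is hypothetical.
from arnold import *

AiBegin()
AiASSLoad("scene.ass", AI_NODE_ALL)            # load an exported Arnold scene
options = AiUniverseGetOptions()
AiNodeSetStr(options, "render_device", "GPU")  # or "CPU" to fall back
AiRender(AI_RENDER_MODE_CAMERA)
AiEnd()
```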

New feature summary:
— Major improvements to quality and performance for adaptive sampling, helping to reduce render times without jeopardizing final image quality
— Improved version of Randomwalk SSS mode for more realistic shading
— Enhanced usability for Standard Surface, giving users more control
— Improvements to the Operator framework
— Better sampling of Skydome lights, reducing direct illumination noise
— Updates to support for MaterialX, allowing users to save a shading network as a MaterialX look

Arnold 5.3 with Arnold GPU in public beta will be available March 20 as a standalone subscription or with a collection of end-to-end creative tools within the Autodesk Media & Entertainment Collection. You can also try Arnold GPU with a free 30-day trial of Arnold. Arnold GPU is available in all supported plug-ins for Autodesk Maya, Autodesk 3ds Max, Houdini, Cinema 4D and Katana.

Review: eGPUs and the Sonnet Breakaway Box

By Mike McCarthy

As a laptop user and fan of graphics performance, I have always had to weigh the balance between performance and portability when selecting a system. And this usually bounces back and forth, as neither option is totally satisfactory. Systems are always too heavy or not powerful enough.

My first laptop when I graduated high school was the 16-inch Sony Vaio GRX570, with the largest screen available at the time, running 1600×1200 pixels. After four years carrying that around, I was eager to move to the Dell XPS M1210, the smallest laptop with a discrete GPU. That was followed by a Quadro-based Dell Precision M4400 workstation, which was on the larger side. I then bounced to the lightweight Carbon Fiber 13-inch Sony Vaio Z1 in 2010, which my wife still uses. This was followed by my current Aorus X3 Plus, which has both power (GF870M) and a small form factor (13 inch), but at the expense of everything else.

Some More History
The Vaio Z1 was one of the first hybrid graphics solutions to allow users to switch between different GPUs. Its GeForce 330M was powerful enough to run Adobe’s Mercury CUDA Playback engine in CS5, but was at the limit of its performance. It didn’t support my 30-inch display, and while the SSD storage solution had the throughput for 2K DPX playback, the GPU processing couldn’t keep up.

Other users were upgrading the GPU with an ExpressCard-based ViDock external PCIe enclosure, but a single lane of PCIe 1.0 bandwidth (2Gb/s) wasn’t enough to make it worth the effort for video editing. (3D gaming requires less source bandwidth than video processing.) Sony’s follow-on Z2 model offered the first commercial eGPU, connected via LightPeak, the forerunner to Thunderbolt. It allowed the ultra-light Z series laptop to use an AMD Radeon 6650M GPU and Blu-ray drive in the proprietary Media Dock, presumably over a PCIe x4 1.0 (8Gb/s) connection.

Thunderbolt 3
Alienware also has a proprietary eGPU solution for its laptops, but Thunderbolt is really what makes eGPUs a marketable possibility, giving direct access to the PCIe bus at x4 speed over a standardized connection. The first generation offered a dedicated 10Gb connection, while Thunderbolt 2 increased that to a 20Gb shared connection. The biggest things holding back eGPUs at that point were the lack of PC adoption of the Intel-licensed technology that Apple had championed, along with OS X’s limitations on eGPUs.

Thunderbolt 3 changed all of that, increasing the total connection bandwidth to 40Gb, the same as first-generation PCIe x16 cards. And far more systems support Thunderbolt 3 than the previous iterations. Integrated OS support for GPU switching in Windows 10 and OS X (built on laptop GPU power saving technology) further paved the path to eGPU adoption.
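For a rough sense of scale, here are the raw link rates behind that history. These are nominal figures; usable bandwidth is lower once protocol overhead (such as PCIe 8b/10b encoding on the older generations) and shared traffic are accounted for:

```python
# Nominal link rates (Gb/s) for the external-GPU interfaces discussed above.
links_gbps = {
    "ExpressCard (PCIe 1.0 x1)": 2.5,
    "Thunderbolt 1": 10,
    "Thunderbolt 2 (shared)": 20,
    "Thunderbolt 3": 40,
    "PCIe 1.0 x16": 2.5 * 16,   # 40 Gb/s raw, the same ballpark as Thunderbolt 3
    "PCIe 3.0 x16": 8.0 * 16,   # what a desktop GPU slot provides today
}
for name, gbps in links_gbps.items():
    print(f"{name:28s} {gbps:5.0f} Gb/s raw")
```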

Why eGPUs Now?
Even with all of this in my favor, I didn’t take the step into eGPU solutions until very recently. I bought my personal system in 2014. This was just before Thunderbolt 3 hit the mainstream. The last two systems I reviewed had Thunderbolt 3, but didn’t need eGPUs with their mobile Quadro P4000 and P5000 internal GPUs. So I hadn’t had the opportunity to give it a go until I received an HP Zbook Studio x360 to review. Now, its integrated Quadro P1000 is nothing to scoff at, but there was significantly more room for performance gains from an external GPU.

Sonnet Breakaway Box
I have had the opportunity to review the 550W version of Sonnet’s Breakaway Box PCIe enclosure over the course of a few weeks, allowing me to test out a number of different cards, including four different GPUs, as well as my Red-Rocket-X and 10GbE cards. Sonnet has three different eGPU enclosure options, depending on the power requirements of your GPU.

They sent me the mid-level 550 model, which should support every card on the market aside from AMD’s power-guzzling Vega 64-based GPUs. The base 350 model should support GeForce 1080 or 2080 cards, but not overclocked Ti or Titan versions. The 550 model includes two PCIe power cables that can be used as 6- or 8-pin connectors. This should cover any existing GPU on the market, and I have cards requiring nearly every possible combo: 6-pin, 8-pin, both, and dual 8-pin. Sonnet has a very thorough compatibility list available for more specific details.

Installation
I installed my Quadro P6000 into the enclosure because it used the same drivers as my internal Quadro P1000 GPU and would give me the most significant performance boost. I plugged the Thunderbolt connector into the laptop while it was booted. It immediately recognized the device but only saw it as a “Microsoft Basic Display Adapter” until I re-installed my existing 411.63 Quadro drivers and rebooted. After that, it worked great: I was able to run my benchmarks and renders without issue, and I could see which GPU was carrying the processing load just by looking at the Task Manager performance tab.

Once I had finished my initial tests, safely removed the hardware in the OS and disconnected the enclosure, I swapped the installed card with my Quadro P4000 and plugged it back into the system without rebooting. It immediately detected it, and after a few seconds the new P4000 was recognized and accelerating my next set of renders. When I attempted to do the same procedure with my GeForce 2080TI, it did make me install the GeForce driver (416.16) and reboot before it would function at full capacity (subsequent transitions between Nvidia cards were seamless).

The next step was to try an AMD GPU, since I have a new RadeonPro WX8200 to test, which is a Pro version of the Vega 56 architecture. I was a bit more apprehensive about this configuration due to the integrated Nvidia card, and having experienced those drivers not co-existing well in the distant past. But I figured: “What’s the worst that could happen?”

Initially, plugging it in gave me the same Microsoft Basic Display Adapter device until I installed the RadeonPro drivers. Installing those drivers caused the system to crash and refuse to boot. Startup repair, system restore and OS revert all failed to run, let alone fix the issue. I was about to wipe the entire OS and let it reinstall from the recovery partition when I came across one more idea online. I was able to get to a command line in the pre-boot environment and run a Deployment Image Servicing and Management (DISM) command to see which drivers were installed — DISM /image:D:\ /Get-Drivers|more.

This allowed me to see that the last three drivers — oem172.inf through oem174.inf — were the only AMD-related ones on the system. I was able to remove them via the same tool — DISM /Image:D:\ /Remove-Driver /Driver:oem172.inf, repeated for each of the three — and when I restarted, the system booted up just fine.

I then pulled the card from the eGPU box, wiped all the AMD files from the system and vowed never to do something like that again. Lesson of the day: Don’t mix AMD and Nvidia cards and drivers. To AMD’s credit, the WX8200 does not officially support eGPU installations, but extraneous drivers shouldn’t cause that much of a problem.

Performance Results
I tested Adobe Media Encoder export times with a variety of different sources and settings. Certain tests were not dramatically accelerated by the eGPU, while other renders definitely were. The main place we see differences between the integrated P1000 and a more-powerful external GPU is when effects are applied to high-res footage. That is when the GPU is really put to work, so those are the tests that improve with more GPU power. I had a one-minute sequence of Red clips with lots of effects (Lumetri, selective blur and mosaic: all GPU FX) that took 14 minutes to render internally, but finished in under four minutes with the eGPU attached. Exporting the same sequence with the effects disabled took four minutes internally and three minutes with the GPU. So the effects cost 10 minutes of render time internally, but under one minute of render time (35 seconds to be precise) when a powerful GPU is attached.
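Pulling the effects cost out of those numbers makes the difference obvious. A quick calculation using the times quoted above:

```python
# Isolating the GPU-effects cost from the render times quoted above (in seconds).
runs = {
    "internal Quadro P1000": {"with_fx": 14 * 60, "no_fx": 4 * 60},
    "with eGPU attached":    {"with_fx": 3 * 60 + 35, "no_fx": 3 * 60},
}
for config, t in runs.items():
    fx_cost = t["with_fx"] - t["no_fx"]
    print(f"{config}: effects add {fx_cost} s on top of a {t['no_fx']} s export")
    # internal: effects add 600 s; eGPU: effects add 35 s
```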

So if you are trying to do basic cuts-only editorial, an eGPU may not improve your performance much, but if you are doing VFX or color work, it can make a noticeable difference.

VR Headset Support
The external cards, of course, do increase performance in a measurable way, especially since I am using such powerful cards. It’s not just a matter of increasing render speeds, but about enabling functionality that was previously unavailable on the system. I connected my Lenovo Explorer WMR headset to the RTX2080TI in the Breakaway Box and gave it a shot. I was able to edit 360 video in VR in Premiere Pro, which is not supported on the included Quadro P1000 card. I did experience some interesting ghosting on occasion, where if I didn’t move my head everything looked perfect, but movement caused a double image — as if one eye was a frame behind the other — but the double image was appearing in each eye, as if there was an excessive motion blur applied to the rendered frames.

I thought this might be a delay based on extra latency in the Thunderbolt bus, but other times the picture looked crisp regardless of how quickly I moved my head. So it can work great, but there may need to be a few adjustments made to smooth things out. Lots of other users online report it working just fine, so there is probably a solution available out there.

Full-Resolution 8K Tests
I was able to connect my 8K display to the card as well, and while the x360 happens to support that display already (DP 1.3 over Thunderbolt), most notebooks do not — and the eGPU increased the refresh rate from 30Hz to the full 60Hz. I was able to watch HEVC videos smoothly at 8K in Windows and play back 8K DNxHR files in Premiere at full res, as long as there were no edits or effects.

Just playing back footage at full 8K taxed the 2080TI at 80% compute utilization. But this is 8K we are talking about, playing back on a laptop, at full resolution. 4K anamorphic and 6K Venice X-OCN footage played back smoothly at half res in Premiere, and 8K Red footage played back at quarter res. This is not the optimal solution for editing 8K footage, but it should have no problem doing serious work at UHD and 4K.

Other Cards and Functionality
GPUs aren’t the only PCIe cards that can be installed in the Breakaway Box, so I can add a variety of other functionality to my laptop if desired. Thunderbolt array controllers minimize the need for SATA or SAS cards in enclosures, but that is a possibility. I installed an Intel X520-DA2 10GbE card into the box and was copying files from my network at 700MB/s within a minute, without even having to install any new drivers. But unless you need SFP+ ports, most people looking for 10GbE functionality would be better served by Sonnet’s Solo 10G, with its smaller form factor, lower power use and lower price. There are a variety of other Thunderbolt 3-to-10GbE options hitting the market as well.
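As a quick sanity check, 700MB/s is comfortably within what a 10GbE link can carry, which suggests the Thunderbolt-attached card was not the bottleneck (the source storage or network protocol more likely was):

```python
# Is 700 MB/s plausible over a 10GbE link?
observed_MBps = 700
line_rate_MBps = 10 * 1000 / 8   # 1250 MB/s at the nominal 10 Gb/s line rate
print(f"{observed_MBps / line_rate_MBps:.0%} of line rate")  # ~56%, so the NIC has headroom
```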

The Red-Rocket-X card has been a popular option for external PCIe enclosures over the last few years, primarily for on-set media transcoding. I installed mine in the Breakaway Box to give that functionality a shot as well.

I ran into two issues, both of which I was able to overcome, but they are worth noting. First, the 6-pin power connector is challenging to fit into the poorly designed Rocket power port, due to the retention mechanism being offset for 8-pin compatibility. But it can fit if you work at it a bit, although I prefer to keep a 6-pin extension cable plugged into my Rocket since I move it around so much. Once I had all of the hardware hooked up, it was recognized in the OS, but installing the drivers from Red resulted in a Code 52 error that is usually associated with USB devices. The recommended solution online was to disable Windows 10 driver-signature enforcement from the pre-boot environment, and that did the trick. (My theory is that my HP’s SureStart security functionality was hesitant to give direct memory access to an external device, since that is the level of access Thunderbolt devices get to your system, and the Red Rocket-X driver wasn’t signed for that level of security.)

Anyhow, the card worked fine after that, and I verified that it accelerated my renders in Premiere Pro and AME. I am looking forward to a day when CUDA acceleration allows me to get that functionality out of my underused GPU power instead of requiring a dedicated card.

I did experience an issue with the Quadro P4000, where the fans spun up to 100% when the laptop shut down, hibernated or went to sleep. None of the other cards had that issue; instead, they shut off when the host system did and turned back on automatically when I booted up the system. I have no idea why the P4000 acted differently than the architecturally very similar P6000. Manually turning off the Breakaway Box or disconnecting the Thunderbolt cable solves the problem with the P4000, but then you have to remember to reconnect it again when you are booting up.

In the process of troubleshooting the fan issue, I did a few other driver installs and learned a few tricks. First off, I already knew Quadro drivers can’t run GeForce cards (otherwise why pay for a Quadro), but GeForce drivers can run on Quadro cards. So it makes sense that you would want to install GeForce drivers when mixing both types of GPUs. But I didn’t realize that GeForce drivers apparently take precedence when they are installed. So when I had an issue with the internal Quadro card, reinstalling the Quadro drivers had no effect, since the GeForce drivers were running the hardware. Removing them (with DDU, just to be thorough) solved the issue and got everything operating seamlessly again. Sonnet’s support people were able to send me the solution to the problem on the first try. That was a bit of a hiccup, but once it was solved, I could again swap between different GPUs without even rebooting. And most users will always have the same card installed when they connect their eGPU, further simplifying the issue.

Do you need an eGPU?
I really like this unit, and I think eGPU functionality in general will totally change the high-end laptop market for the better. For people who only need high performance at their desk, there will be a class of top-end laptops with high-end CPU, RAM and storage but no discrete GPU, to save on space and weight (the CPU can’t be improved by an external box and needs to keep up with the GPU).

There will be another similar class with mid-level GPUs to support basic 3D work on the road but massive increases at home. I fall into the second category, as I can’t forgo all GPU acceleration when I am traveling or even walking around the office. But I don’t need to be carrying around an 8K rendering beast all the time either. I can limit my gaming, VR work and heavy renders to my desk. That is the configuration I have been able to use with this ZBook x360: enough power to edit untethered, while combining the internal 6-core CPU with a top-end external GPU gives great performance when attached to the Breakaway Box. As always, I still want to go smaller, and I plan to test with an even lighter laptop as soon as the opportunity arises.

Summing Up
The Breakaway Box is a simple solution to a significant issue. It has no bells and whistles, which I initially appreciated. But an eGPU box is inherently a docking station, so there is an argument to be made for adding other functionality. In my case, once I am set up at my next project, using a 10GbE adapter in the second TB3 port on my laptop will be a better solution for top performance and bandwidth anyway.

So I am excited about the possibilities that eGPUs bring to the table, now that they are fully supported by the OS and applications I use, and I don’t imagine buying a laptop setup without one anytime in the foreseeable future. The Sonnet Breakaway Box meets my needs and has performed very well for me over the last few weeks.


Mike McCarthy is an online editor/workflow consultant with over 10 years of experience on feature films and commercials. He has been involved in pioneering new solutions for tapeless workflows, DSLR filmmaking and multi-screen and surround video experiences. Check out his site.

Review: AMD’s Radeon Pro WX8200

By Mike McCarthy

AMD has released the WX8200 high-end professional GPU as part of their Radeon Pro line. It’s based on the Vega architecture, with 3,584 compute cores accessing 8GB of HBM2 memory at up to 512GB/sec. Its hardware specs are roughly equivalent to their $400 Vega 56 gaming card but with professional drivers tuned for optimized performance in a variety of high-end 3D applications. AMD is marketing the WX8200, which is priced at $999, as an option between Nvidia’s Quadro P4000 and P5000.

AMD GPUs
Some background: I really haven’t used an AMD GPU before, at least not since they bought ATI over 10 years ago. My first Vaio laptop (2002) had an ATI Radeon 7500 in it, and we used ATI Radeon cards in our Matrox AXIO LE systems at Bandito Brothers in 2006. That was right around the time ATI got acquired by AMD. My last AMD-based CPUs were Opterons inside HP XW9300 workstations around the same time period, but we were already headed towards Nvidia GPUs when Adobe released Premiere Pro CS5 in early 2010.

CS5’s CUDA-based, GPU-accelerated Mercury Playback Engine locked us in to Nvidia GPUs for years to come. Adobe eventually included support for OpenCL as an alternative acceleration for the Mercury Playback Engine, primarily due to the Mac hardware options available in 2014, but it was never as mature or reliable on Windows as the CUDA-based option. By that point we were already used to using it, so we continued on that trajectory.

I have a good relationship with Nvidia, and have reviewed many of their cards over the years. Starting back in 2008, their Quadro CX card was the first piece of hardware I was ever provided with for the explicit purpose of reviewing, instead of just writing about the products I was already using at work.

When I was approached about doing this AMD review, I had to pause for a moment. I wanted to make sure I could really do an honest and unbiased review of an AMD card. I asked myself, “What if they worked just as well as the Nvidia cards I was used to?” That would really open up my options when selecting a new laptop, as most of the lighter-weight options have had AMD GPUs for the last few years. Plus, it would be useful information and experience to have, since I was about to outfit a new edit facility, and more options are always good when finding ways to cut costs without sacrificing performance or stability.

So I agreed to review this new card and run it through the same tests I use for my Quadro reviews. Ideally, I would have a standard set of assets and timelines that I could use every time I needed to evaluate the performance of new hardware. Then I could just compare it to my existing records from previous tests. But the tests run in software that is changing as well, and Premiere Pro was on Version 11 when I tested the Pascal Quadros; it’s now on Version 13. Plus, I was testing 6K files then and have lots of 8K assets now, as well as a Dell UP3218K monitor to view them on. Just driving images to an 8K monitor smoothly is a decent test for a graphics card, so I ended up benchmarking not just the new AMD card, but all of the other (Nvidia) cards I have on hand for comparison, leading to quite a project.

The Hardware
The first step was to install the card in my Dell Precision 7910 workstation. Slot-wise, it just dropped into the location usually occupied by my Quadro P6000. It takes up two slots, with a single PCIe 3.0 x16 connector. It also requires both a six-pin and an eight-pin PCIe power connector, which I was able to provide with a bit of reconfiguration. Externally, it has four MiniDisplayPort connectors and nothing else. Dell has an ingenious system of shipping DP-to-mDP cables with monitors that have both ports, allowing either source port to be used by reversing the cable. But that didn’t apply to my dual full-sized-DisplayPort UP3218K monitor, which I didn’t realize until I needed mDP-to-DP cables. Fortunately, I already had some from my PNY PrevailPro review, for the same reason.

I prefer the full-sized connectors to ensure I don’t try to plug them in backwards, especially since AMD didn’t use the space savings to include any other ports on the card. (HDMI, USB-C, etc.) I also tried the card in an HP Z4 workstation a few days later to see if the Windows 10 drivers were any different. Those notes are included throughout.

The Drivers
Once I had my monitors hooked up, I booted the system to see what would happen. I was able to install the drivers and reboot for full functionality without issue. The driver install is a two-part process: first you install AMD’s display software, and then that software allows you to install a driver. I like this approach because it allows you to change your driver version without reinstalling all of the other supporting software. The fact that driver packages these days are over 500MB is a bit ridiculous, especially for those of us not fortunate enough to live in areas where fiber internet connections are available. Hopefully this approach can alleviate that issue a bit.

AMD advertises that this functionality also allows you to switch driver versions without rebooting, and their RadeonPro line fully supports their gaming drivers as well. This can be an advantage for a game developer who uses the professional feature set for their work but then wants to test the consumer experience without having a separate, dedicated system. Or maybe it’s just for people who want better gaming performance on their dual-use systems.

The other feature I liked in their software package is the screen-capture functionality called RadeonPro ReLive. It records the full screen or specific window selections, as well as application audio and, optionally, microphone audio. It saves the screen recordings to AVC or HEVC files generated by the VCE 4.0 hardware video compression engine on the card. When I tested it, it worked as expected, and the captured files looked good, including a separate audio file for my microphone voice, while the system audio was embedded in the video recording.

This is a great tool for making software tutorials or similar tasks, and I intend to use it in the near future for posting videos of my project workflows. Nvidia offers similar functionality in the form of ShadowPlay but doesn’t market it to professionals, since it’s part of the GeForce Experience software. I tested it for comparison, and it does work on Quadro cards but has fewer options and controls. Nvidia should take the cue from AMD and develop a more professional solution for users who need this functionality.

I used the card with both my main curved 34-inch monitor at 3440×1440 and my 8K monitor at 7680×4320. The main display worked perfectly the entire time, but I had issues with the 8K one. I went through lots of tests on both operating systems, with various cables and drivers, before discovering that a firmware update for the monitor solved the issues. So if you have a UP3218K, take the time to update the firmware for maximum GPU compatibility. My HDMI-based home theater system, on the other hand, worked perfectly and even gave me access to 5.1 speaker output in Premiere through the AMD HDMI audio drivers.

10-Bit Display Support
One of the main reasons to get a “professional” GPU over a much cheaper gaming card is 10-bit color support in professional applications, not just in the full-screen outputs (games and video playback) that consumer GPUs already handle at 10-bit. But when I enabled 10-bit mode in the RadeonPro Advanced panel, I ran into some serious issues. On Windows 7, it disabled the viewports in most of my professional apps, like Adobe’s Premiere, After Effects and Character Animator. When I enabled it on a Windows 10 system to see if it worked any better in a newer OS, the Adobe application interfaces looked even crazier, and there was still no video playback.

10-bit in Premiere

I was curious to see whether my Print Screen captures would still look this way once I disabled the 10-bit setting because, in theory, even though they looked wrong when I pasted them into a Photoshop doc, that could still have been a display distortion of an otherwise proper-looking screen capture. But no, the screen captures look exactly how the interface looked on my display.

AMD is aware of the problem and they are working on it. It is currently listed as a known issue in their newest driver release.

Render Performance
The time then came to analyze the card’s performance and see what it could do. While GPUs are actually designed to accelerate the calculations required to display 3D graphics, that processing capacity can be used in other ways. I don’t do much true 3D processing besides the occasional FPS game, so my GPU use is all for image processing in video editing and visual effects. This can be accelerated by AMD’s GPUs through the OpenCL (Open Computing Language) framework, as well as through Metal on the Mac side.

My main application is Adobe Premiere Pro 12, and it explicitly supports OpenCL acceleration, as do Adobe After Effects and Media Encoder. So I opened them up and started working. I didn’t see a huge difference in interface performance, even when pushing high-quality files around, but that is a fairly subjective and intermittent test. I might drop frames during playback one time but not the next. Render time is a much more easily quantifiable measure of computational performance, so I created a set of sequences to render in the different hardware configurations for repeatable tests.

I am pretty familiar with which operations are CPU-based and which run on the GPU, so I made a point of creating test projects that work the GPUs as much as possible. This is based on the clip resolution, codec and selection of accelerated effects to highlight the performance differences in that area. I rendered those sequences with OpenCL acceleration enabled on the WX8200 and with all GPU acceleration disabled in the Mercury software playback mode, then with a number of different Nvidia GPUs for comparison.

Trying to push the cards as hard as possible, I used 6K Venice files and 8K Red files with Lumetri grades and other GPU effects applied. I then exported them to H.265 files at 10-bit Rec.2020 in UHD and 8K. (I literally named the 8K sequence “Torture Test.”)
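For anyone building their own repeatable comparison, the core of such a test can be as simple as timing the same render command under each hardware configuration and logging the results. This is only a generic sketch, not the setup used for the renders in this review (those were run through Adobe applications directly); ffmpeg stands in here as a hypothetical command-line workload:

```python
# Generic render-timing harness: run the same job per GPU/driver configuration
# and append the elapsed time to a CSV log for later comparison.
import csv, subprocess, time
from datetime import datetime

def timed_render(label, cmd, log_path="render_times.csv"):
    start = time.time()
    subprocess.run(cmd, check=True)
    elapsed = time.time() - start
    with open(log_path, "a", newline="") as f:
        csv.writer(f).writerow([datetime.now().isoformat(), label, f"{elapsed:.1f}"])
    return elapsed

# Example with a hypothetical command-line export as the repeatable workload:
# timed_render("WX8200-OpenCL", ["ffmpeg", "-y", "-i", "source.mov", "out.mp4"])
```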

My initial tests favored Nvidia GPUs by a factor of at least three to one, which was startling, and I ran them repeatedly to verify, with the same result. Further tests and research revealed that AMD (and OpenCL) is usually about 25% slower than Adobe’s CUDA mode on similarly priced hardware, which a variety of other sources corroborate. But my results were made worse for two reasons: Red decoding is currently better optimized for acceleration on Nvidia cards, and rendering at 10-bit ground the AMD-accelerated OpenCL renders to a halt.

When exporting 10-bit HDR files at “Maximum Bit Depth,” it took up to eight times as long to finish rendering. Clearly this was a bug, but it took a lot of experimentation to narrow it down, and the Intel-based OpenCL acceleration doesn’t suffer from the same issue. Once I was able to test the newest Media Encoder 13 release on the Windows 10 system, the 10-bit performance hit while using the AMD card disappeared. When I removed the Red source footage and exported 8-bit HEVC files, the WX8200 was just as fast as any of my Nvidia cards (P4000 and P6000). When I was sourcing from Red footage, the AMD card took twice as long, but GPU-based effects seemed to have no effect on render time, so those are accelerated properly by the card.

So, basically, as long as you aren’t using Red source and you use Premiere Pro and Media Encoder 13 or newer, this card is comparable to the alternatives for most AME renders.

[Statement from Adobe: “Premiere Pro has a long history of supporting multiple GPU technologies. Adobe works closely with all hardware partners to ensure maximum performance across the wide range of systems that may exist in creative workflows. For example, Adobe has been at the forefront of GPU development across CUDA, OpenCL, Metal and now Vulkan. In the case of OpenCL we partnered with AMD and provided deep access to our code base to ensure that the maximum performance levels were being achieved. It has been, and will remain to be our policy to deeply collaborate with all vendors who create high performance hardware/ software layers for video and audio creatives.”]

Is the RadeonPro WX8200 Right for Your Workflow?
That depends on what type of work you do. Basically, I am a Windows-based Adobe editor, and Adobe has spent a lot of time optimizing its CUDA-accelerated Mercury Playback Engine for Premiere Pro on Windows. That is reflected in how well the Nvidia cards perform for my renders, especially with Version 12, which is the final release on Windows 7. Avid or Resolve may have different results, and even Premiere Pro on OS X may perform much better with AMD GPUs due to the Metal framework optimizations in that version of the program. It is not that the card is necessarily “slower”; it just isn’t being used as well by my software.

Nvidia has invested a lot of effort into making CUDA a framework that applies to tasks beyond 3D calculations. AMD has focused its efforts more directly on 3D rendering with tools like ProRender, which GPU-accelerates true 3D renders. If you are doing traditional 3D work, whether animation or CAD projects, this card will probably be much more suitable for you than it is for me.


Mike McCarthy is an online editor/workflow consultant with 10 years of experience on feature films and commercials. He has been involved in pioneering new solutions for tapeless workflows, DSLR filmmaking and multi-screen and surround video experiences. Check out his site.