
Review: Nvidia’s New Ampere-Based GeForce 3090 Series

By Mike McCarthy

Nvidia is releasing the next generation of GeForce video cards, based on its new Ampere architecture, along with a number of new software developments it has been working on. Some are available now, while others are coming soon.

The first cards in the GeForce RTX 30 Series are the 3070, 3080 and 3090. They offer varying numbers of CUDA cores and amounts of video memory, and they significantly outperform the cards they are replacing. All three support the PCIe 4.0 bus standard for greater bandwidth to the host system. They also support HDMI 2.1 output, which can drive displays at 8Kp60 over a single cable, and they can encode and decode at 8K resolutions. I have had the opportunity to try out the new GeForce RTX 3090 in my system and am excited by the potential it brings to the table.

Nvidia originally announced its Ampere GPU architecture at GTC in May, but it was initially available only to supercomputing customers. The GeForce Ampere cards have second-generation RT cores and third-generation Tensor cores, and they pack twice as many CUDA cores per streaming multiprocessor (SM), leading to some truly enormous core counts. These three new GeForce graphics cards improve on the previous Turing-generation products by a significant margin, due both to more cores and video memory and to software developments that improve efficiency.

The biggest change to improve 3D render performance is the second generation of DLSS (Deep Learning Super Sampling). Basically, Nvidia discovered that it is more efficient to use AI to guess the value of a pixel than to calculate it from scratch. DLSS 2.0 runs on the Tensor cores, using deep learning to increase the detail of a rendered image by raising its resolution through intelligent interpolation. It is essentially the opposite of Dynamic Super Resolution (DSR), which renders higher-resolution frames and scales them down to fit a display’s native resolution, potentially improving image quality.

DLSS usually increases the pixel count by a factor of four (twice the width and twice the height), for example, generating a UHD frame from an HD render. It now also supports a factor of nine (three times in each dimension), generating an 8K output image from a 2560×1440 render. While this can improve performance in games, especially at high display resolutions, DLSS can also be leveraged to improve the interactive experience in 3D modeling and animation applications.
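DLSS factors describe pixel counts rather than edge lengths, which is easy to miss. The short Python sketch below just does that arithmetic for the two cases mentioned (HD to UHD, and 2560×1440 to 8K). It says nothing about how DLSS reconstructs the missing detail, and the helper names are illustrative only.

```python
# Pixel-count arithmetic for the DLSS upscaling factors mentioned above.
# This only compares frame sizes; it does not model DLSS itself.

Resolution = tuple[int, int]  # (width, height)

HD: Resolution = (1920, 1080)
QHD: Resolution = (2560, 1440)
UHD: Resolution = (3840, 2160)
UHD_8K: Resolution = (7680, 4320)


def pixel_factor(render: Resolution, output: Resolution) -> float:
    """Output pixels generated per rendered pixel."""
    return (output[0] * output[1]) / (render[0] * render[1])


def axis_factor(render: Resolution, output: Resolution) -> float:
    """Scale applied to each dimension (assumes matching aspect ratios)."""
    return output[0] / render[0]


for render, output in [(HD, UHD), (QHD, UHD_8K)]:
    print(f"{render} -> {output}: {pixel_factor(render, output):.0f}x pixels, "
          f"{axis_factor(render, output):.0f}x per axis")
```

In the 4x case the GPU renders a quarter of the output pixels; in the 9x case it renders only a ninth, which is why the new mode matters most at 8K.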

Nvidia also announced details of another under-the-hood technology, which will allow the GPU to access data in storage directly, bypassing the CPU and system memory. It will use Microsoft’s upcoming DirectStorage API to do this at the OS level. This frees CPU and host system resources for other tasks and should dramatically speed up scene and level loading times. Nvidia is talking about 3D scenes and game levels opening five to 10 times faster when data is decompressed from storage directly on the GPU. If this technology can be used to decompress video files directly from storage, it could help video editors as well. Nvidia also has RTXGI, which is Global Illumination accelerated on the RT cores. The new second-generation RT cores also add support for rendering motion blur far more efficiently than in the past.

New Technology
Nvidia Reflex is a new technology for measuring and reducing gaming lag, in part by passing the mouse data through the display to compare the user’s movements with changes on screen.

Omniverse Machinima is a toolset aimed at making it easier to use 3D gaming content to create cinematic stories. It includes tools for animating characters’ facial and body poses, as well as RTX-accelerated environment editing and rendering.

Nvidia Broadcast is a software tool that uses deep learning to improve your webcam stream, with AI-powered background noise reduction, AI subject tracking and automated background replacement, among other features. The software creates a virtual device for the improved stream, so it can be used by nearly any existing streaming application. Combining it with NDI Tools and/or OBS Studio could lead to some very interesting streaming workflow options, and it is available now if you have an RTX card.

The New GeForce Cards
The three new cards are available from Nvidia as Founders Edition products and from other GPU vendors with their own designs and variations. The Founders Edition cards share a consistent visual aesthetic, which is strongly driven by their thermal dissipation requirements. The GeForce RTX 3070 will launch next month for $500 and features a fairly traditional cooling design, with two fans blowing on the circuit board. The 3070 has 5,888 CUDA cores and 8GB of GDDR6 RAM, a huge increase over the previous generation’s RTX 2070 with 2,304 cores. With the help of DLSS and other developments, Nvidia claims that the 3070’s render performance in many tasks is on par with the RTX 2080 Ti, which costs over twice as much.

The new GeForce RTX 3080 is Nvidia’s flagship “gaming” card, with 8,704 CUDA cores and 10GB of GDDR6X memory for $700. It should offer 50% more processing power than the upcoming RTX 3070 and double the performance of the previous RTX 2080. Its 320W TDP is high, requiring a new board design for better airflow and ventilation. The resulting PCB is half the size of the previous RTX 2080’s, allowing air to flow through the card for more effective cooling. But the finished product is still huge: mostly heat pipes, fins and fans.

Lastly, the GeForce RTX 3090 occupies a unique spot in the lineup. At $1,500, it exceeds the Titan RTX in every way, with 10,496 CUDA cores and 24GB of GDDR6X memory, for 60% of the cost. It is everything the 3080 is, but scaled up. The same compact board shape enables the distinctive airflow design in a three-slot package that is an inch taller than a full-height card, allowing it to dissipate up to 350W. Its size means you need to make sure it physically fits in your system case, or design a new system around it.

It is primarily targeted at content creators, optimized for high-resolution image processing, be that gaming at 8K, encoding and streaming at 8K, editing 8K video or rendering massive 3D scenes. The 3090 is also the only one of the new GeForce cards that allows for NVLink to harness the power of two cards together, but that has been falling out of favor with many users anyway, as individual GPUs increase in performance so much with each new generation.

All of the cards offer three DisplayPort 1.4 outputs and a single HDMI 2.1 port. They offer hardware decoding of the new AV1 video codec up to 8K resolution, and they encode and decode H.264 and HEVC, including screen capture and live-streaming through ShadowPlay. All three use a new 12-pin PCIe power connector, which can be adapted from dual 8-pin connectors for now, but expect to see these new plugs on power supplies in the future.
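The dual 8-pin adapter works out on paper if you add up the connector ratings from the PCIe specifications: 75W from an x16 slot plus 150W per 8-pin auxiliary cable. This Python sketch is a rough sanity check using those spec ratings and the board-power figures quoted for the 3080 and 3090. It does not measure actual draw, and the function names are hypothetical.

```python
# Spec-rating power budget for a card fed by an x16 slot plus 8-pin cables.
# Ratings follow the PCIe specifications; actual power draw is not modeled.

SLOT_WATTS = 75        # PCIe x16 slot
EIGHT_PIN_WATTS = 150  # each 8-pin PCIe auxiliary connector


def available_watts(num_eight_pin: int) -> int:
    """Total rated power from the slot plus the given number of 8-pin cables."""
    return SLOT_WATTS + num_eight_pin * EIGHT_PIN_WATTS


def headroom_watts(board_power: int, num_eight_pin: int = 2) -> int:
    """Rated supply minus the card's board power (negative means a shortfall)."""
    return available_watts(num_eight_pin) - board_power


for card, board_power in {"RTX 3080": 320, "RTX 3090": 350}.items():
    print(f"{card}: {board_power}W of {available_watts(2)}W rated, "
          f"{headroom_watts(board_power)}W headroom")
```

On paper, two 8-pin cables plus the slot provide 375W, leaving the 3090 only about 25W of rated headroom, so it is worth checking the power supply before installing one.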

My Initial Tests
I was excited to recently receive a GeForce RTX 3090 to test, primarily for 8K and HDR workflows. I will also be reviewing a new workstation in the near future, which will pair well with this top-end card, but I have not received that system yet and wanted to get readers some initial results right away. I will follow up next month with more extensive benchmarking comparing this card to the 2080 Ti and P6000, once I have that system to test with.

So I set about installing the 3090 in my existing workstation, which was a bit of a challenge. I had to rearrange all of my cards and remove some to free up three adjacent PCIe slots. I also had to adapt some power cables to get the two 8-pin PCIe plugs needed by the included adapter for the new 12-pin mini plug. Everything works fine for now, but I can’t close the top of the 3U rack-mount case. (One more reason besides airflow to always opt for the 4U option.) Since case enclosures are important for proper cooling, I added a large fan over the whole open system.

I had already installed the newest Studio Driver, which works with my existing Quadro cards as well as the new GeForce cards. That is one of the many benefits of the new unified Studio Drivers: I don’t have to change drivers when switching between GeForce and Quadro cards for comparison benchmarks. My first test was in Adobe Premiere Pro, where I discovered that new architectures make a significant difference when an application accesses the GPU as deeply as Adobe does. I had to use the newest public beta of Premiere Pro to get playback working with CUDA enabled. That beta also supports GPU-accelerated video decoding, which is great, but it makes it hard to tell which speed improvements come from Nvidia’s software optimizations and which come from the new hardware.

Playback performance is harder to quantify than render times, but Ctrl+Alt+F11 enables on-screen information that, combined with the dropped frame indicator and third-party hardware monitoring tools, gives Premiere users some insight into how their system behaves during playback. I got smooth Premiere playback of 4K HDR material with high-quality playback enabled, which is necessary to avoid banding. 8K sequences played back well at half-res, but full resolution was hit or miss, depending on how much I started and stopped playback. Playing back long clips worked well, but stopping and resuming playback quickly usually led to dropped frames and hiccups. Full-res 8K playback is only useful if you have an 8K display, which I do, but 8K content can be edited at a lower display resolution if necessary. Basically, my current system is sufficient for 4K HDR editing, but to really push this card to its full potential for smooth 8K editing, I need to put it in a system that isn’t seven years old.
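The half-res result lines up with simple pixel arithmetic. Assuming Premiere's 1/2 playback resolution halves both width and height, an 8K frame drops to a quarter of its pixels, exactly the size of a UHD frame. The Python sketch below only counts pixels; decode load, effects and the start/stop hiccups described above depend on much more than resolution.

```python
# Pixels per displayed frame at a given playback resolution divisor.
# Assumption: the divisor applies to both width and height.


def playback_pixels(width: int, height: int, divisor: int = 1) -> int:
    """Pixels rendered per frame when playing back at 1/divisor resolution."""
    return (width // divisor) * (height // divisor)


full_8k = playback_pixels(7680, 4320)
half_8k = playback_pixels(7680, 4320, divisor=2)
full_uhd = playback_pixels(3840, 2160)

print(f"8K full res: {full_8k / 1e6:.1f} MP per frame")
print(f"8K half res: {half_8k / 1e6:.1f} MP per frame")
print(f"UHD full res: {full_uhd / 1e6:.1f} MP per frame")
```

Under that assumption, 8K at half-res asks the system to display roughly as many pixels as the 4K HDR material that already played smoothly.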

Blackmagic DaVinci Resolve also benefits from the new hardware, especially the increased memory, for things like image denoising and motion blur at larger frame sizes, which fail without enough memory on lower-end cards. Previous-generation Quadro cards have offered 24GB of RAM, but at much higher price points than the $1,500 GeForce 3090, which makes the 3090 the budget-friendly option from this perspective. The extra processing cores help as well: Nvidia reports a 50% decrease in render times for certain complex effects, a bigger improvement than it claims for similar Premiere export processes.
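Some back-of-the-envelope arithmetic shows how quickly large frames consume video memory. The Python sketch below assumes each working buffer stores four 32-bit float channels (RGBA) and ignores everything else on the card. Those are illustrative assumptions, not documented Resolve internals, so the frame counts are ceilings rather than predictions of when an effect will fail.

```python
# Back-of-the-envelope VRAM math for float RGBA frame buffers.
# Assumption: 4 channels x 32-bit float per pixel; textures, caches and
# other allocations are ignored, so real headroom is lower.

BYTES_PER_PIXEL = 4 * 4  # RGBA, 4 bytes per 32-bit float channel
GIB = 1024**3


def frame_bytes(width: int, height: int) -> int:
    """Size of one float RGBA frame buffer in bytes."""
    return width * height * BYTES_PER_PIXEL


def frames_that_fit(vram_gib: int, width: int, height: int) -> int:
    """Whole frame buffers that fit in the given memory, ignoring other use."""
    return (vram_gib * GIB) // frame_bytes(width, height)


for label, (w, h) in {"UHD": (3840, 2160), "8K": (7680, 4320)}.items():
    print(f"{label}: {frame_bytes(w, h) / GIB:.2f} GiB per frame; "
          f"fits {frames_that_fit(8, w, h)} in 8GB, "
          f"{frames_that_fit(24, w, h)} in 24GB")
```

Tripling the memory triples the number of 8K buffers that can be held at once, which matters for temporal effects such as noise reduction that analyze several frames together.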

REDCINE-X PRO required a new beta update to use the new card’s CUDA acceleration. The existing version does support the RTX 3090 in OpenCL mode, but CUDA support in the new version should be faster. I was able to convert an 8K anamorphic R3D file to UHD2 ProRes HDR, a conversion that had repeatedly failed due to memory errors on my previous 8GB Quadro P4000, so the extra memory does help. My 8K clip takes 30 minutes to transcode in CPU mode and just over a minute with the RTX 3090’s CUDA acceleration.

Blender is an open-source 3D program for animation and rendering that benefits from CUDA, the OptiX SDK and the new RTX motion-blur functionality. Blender has its own benchmarking tool, and my results were much faster with OptiX enabled. I have not tested Blender in the past, so I don’t have previous benchmarks to compare against, but it is pretty clear that a powerful GPU makes a big difference here compared to CPU rendering.

Octane is a render engine that explicitly uses RTX to accelerate ray-traced rendering, and all three scenes I tried rendered two to five times faster with RTX enabled. I expect to see similar performance improvements in any other 3D application that takes the time to properly implement RTX ray-tracing acceleration.

I also tried a few of my games but realized that nothing I play is graphics-intensive enough to show much difference with this card. I mostly play classic games from many years ago that don’t tax the GPU heavily even at 8K resolution. I may test Fortnite for my follow-up benchmarking article, since that game will benefit from both RTX and DLSS in HDR, potentially at 8K.

From a video editor’s perspective, most of us are honestly not GPU-bound in our workflows, even at 8K, at least in terms of raw processing power. Where we will see significant improvements is in system bandwidth, as we move to PCIe 4.0, and in the added memory on the cards for processing larger frames, in both resolution and bit depth. The content creators who will see the biggest gains from the new GPUs are those working with true 3D content. Between OptiX, RTX, DLSS and a host of other new technologies, plus more processing cores and memory, they will have dramatically more interactive experiences when working with complex scenes, at higher levels of real-time detail.
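The bandwidth gain from PCIe 4.0 is easy to quantify: it doubles the per-lane transfer rate from 8 GT/s to 16 GT/s while keeping the same 128b/130b line encoding. The Python sketch below computes theoretical one-direction x16 bandwidth and the time to move one 8K frame. The 16-bit RGBA frame is a hypothetical payload, and packet overhead is ignored, so real transfers will be slower.

```python
# Theoretical one-direction PCIe x16 bandwidth after 128b/130b encoding
# (used by both PCIe 3.0 and 4.0). Protocol overhead is ignored.

TRANSFERS_PER_S = {"PCIe 3.0": 8e9, "PCIe 4.0": 16e9}  # per lane
ENCODING_EFFICIENCY = 128 / 130
LANES = 16

# Hypothetical payload: one 8K frame, RGBA at 16 bits (2 bytes) per channel.
FRAME_BYTES = 7680 * 4320 * 4 * 2


def link_bytes_per_s(gen: str, lanes: int = LANES) -> float:
    """Usable bytes per second for a link generation and lane count."""
    return TRANSFERS_PER_S[gen] * ENCODING_EFFICIENCY * lanes / 8


for gen in TRANSFERS_PER_S:
    bandwidth = link_bytes_per_s(gen)
    ms_per_frame = FRAME_BYTES / bandwidth * 1000
    print(f"{gen} x16: {bandwidth / 1e9:.2f} GB/s, "
          f"~{ms_per_frame:.1f} ms per 8K frame")
```

By this estimate, a frame that ties up a PCIe 3.0 x16 link for roughly 17 ms takes roughly 8 ms on PCIe 4.0, which leaves more of each frame interval for actual processing.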

Do You Need Ampere?
So who needs these, and which card should you get? Anyone who is currently limited by their GPU could benefit from one, but the benefits are greater for people processing actual 3D content.

Honestly, the 3070 should be more than enough for most editors’ needs, unless you are debayering 8K footage or doing heavy grading work on 4K assets. The step up to the 3080 costs 40% more for roughly 50% more cores and 25% more memory, so 3D artists who need the extra cores will find it easily worth the cost. The step up to the 3090 costs more than twice as much for 2.4 times the memory but only about 20% more processing cores.
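These step-up percentages can be checked with a few lines of Python. This sketch uses the launch prices, CUDA core counts and memory sizes of the three cards; the `Card` type and `step_up` helper are illustrative names, not part of any Nvidia tool.

```python
# Percentage step-ups in price, CUDA cores and memory between the three cards.
from dataclasses import dataclass


@dataclass(frozen=True)
class Card:
    name: str
    price_usd: int
    cuda_cores: int
    memory_gb: int


CARDS = [
    Card("RTX 3070", 500, 5888, 8),
    Card("RTX 3080", 700, 8704, 10),
    Card("RTX 3090", 1500, 10496, 24),
]


def pct_increase(old: float, new: float) -> float:
    """Percentage increase from old to new."""
    return (new / old - 1) * 100


def step_up(lower: Card, upper: Card) -> dict[str, float]:
    """Percentage increases in price, cores and memory from lower to upper."""
    return {
        "price": pct_increase(lower.price_usd, upper.price_usd),
        "cores": pct_increase(lower.cuda_cores, upper.cuda_cores),
        "memory": pct_increase(lower.memory_gb, upper.memory_gb),
    }


for lower, upper in zip(CARDS, CARDS[1:]):
    s = step_up(lower, upper)
    print(f"{lower.name} -> {upper.name}: price +{s['price']:.0f}%, "
          f"cores +{s['cores']:.0f}%, memory +{s['memory']:.0f}%")
```

It reports about +40% price, +48% cores and +25% memory for the 3070-to-3080 step, and about +114% price, +21% cores and +140% memory for the 3080-to-3090 step.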

This upgrade is all about larger projects and frame sizes, as it won’t improve 3D processing by nearly as much, unless your data would have exceeded the 3080’s 10GB memory limit. There are also the size considerations for the 3090: do you have the slots, physical space and power to install a 350W beast? Other vendors may create different configurations, but they are going to be imposing, regardless of the details.

So while I love the new cards, due to my fascination with high-end computing hardware, I personally have trouble taxing this beast’s capabilities enough with my current system to make it worthwhile. We will see whether balancing it with a new workstation CPU allows Premiere to really take advantage of the extra power it brings to the table. Many video editors will be well served by the mid-level options that are presumably coming soon. But for people whose 3D processing stretches their current graphics solutions to their limits, upgrading to the new Ampere GPUs will make a dramatic difference in how fast they can work.


Mike McCarthy is an online editor/workflow consultant with over 10 years of experience on feature films and commercials. He has been involved in pioneering new solutions for tapeless workflows, DSLR filmmaking and multi-screen and surround video experiences. Check out his site.

