

Review: Dell Precision 5480 Mobile Workstation

By Mike McCarthy

It has been a few years since I’ve tested and reviewed a laptop. Technology has progressed a lot since then, and systems are dramatically more powerful than they were just four years ago — and GPUs have improved more than CPUs by most measures.


I recently had the opportunity to test out the Dell Precision 5480. This is Dell’s highest-end small-form-factor laptop. It is a 14-inch system packed with a 14-core 13900H CPU, 64GB of DDR5 memory and an Nvidia RTX 3000 Ada-generation GPU. There are lots of laptop options out there with a 13900H CPU, which has six hyperthreaded performance cores and eight efficiency cores (for a total of 20 processing threads), but not very many of them come in a small, 14-inch frame. And the RTX 3000 Ada is even harder to come by. With 4,608 CUDA cores, 8GB of GDDR6 memory and nearly 20 teraflops of processing power, the RTX 3000 GPU is the physical equivalent of the GeForce RTX 4070 Mobile, but with professional-level drivers. This little laptop packs a punch.

The Display
Now there is no getting around the fact that 14 inches is a very small screen. Personally, I like huge screens, so even an 18-inch laptop screen would seem small to me, but much of my time using any laptop is likely to be spent with it connected to a larger display, whether in the office or at home. For the times when I am using it on the move, or at the kitchen table, the 2560×1600 WLED panel offers a good resolution for its 14-inch size. Eagle-eyed users who covet screen real estate can run it at 100% scaling, but most people will have a good experience at 150%.

The Dell Precision 5480 is advertised as supporting 500 nits, which can be helpful when using it outdoors, but it is a glossy screen. Windows reports that the display supports HDR video streaming, but there is no “Use HDR” option for the UI. I am still trying to figure out the logic behind Microsoft’s support for HDR monitoring. The screen also supports blue light filtering at a hardware level to reduce eye strain, which should be better than Windows’ night light software solution. It is also a touch screen, which can be a useful feature on occasion.

The Internals
I am always interested in fitting the maximum amount of useful computing power into the smallest possible package. Back in the day, I remember testing the PNY Prevail Pro, which, at 15 inches, was the smallest VR-capable system. Beyond that, I still have my 13-inch Sony Z1 with a quad-core, 3GHz CPU and GeForce 330M and dual SSDs. Back in 2010, it could run Adobe Premiere Pro CS5 with full CUDA acceleration in a 3-pound package. (The Dell Precision 5480 is actually very similar to that one in terms of size and weight, but, of course, the Dell is far more powerful.)

Any system smaller than 15 inches with a discrete GPU is usually hard to come by, which is why my HP ZBook X2 with a Quadro GPU and a 14-inch, 10-bit display was so unique. But that system is five years old, with no direct replacement available, so I was very excited to see that Dell was stepping up to the plate with a powerful 14-inch pro workstation in a 3.3-pound package under ¾ of an inch thick. And with a 13th Gen Intel CPU supporting 20 threads, paired with a new Ada-based RTX GPU offering 20 teraflops, the Dell Precision 5480 is not lacking in power.

The machine has four Thunderbolt 4 ports, which are all power-delivery-capable, plus an analog audio jack and a MicroSD reader. It comes with a small USB-C device that offers a USB-A port and an HDMI 2.0 output. The keyboard seems solid, with half-size up and down arrows and a fingerprint-enabled power button in the upper right corner, which will be natural for Mac users.

In my initial demo unit, the touchpad had a sticking issue with the click mechanism, but it turned out to have just been a defect. Once replaced, the touchpad worked great. This process did highlight to me just how important a touchpad is on a small laptop, even as a mouse user. Anytime I am using the laptop on the go (which is the point of a small laptop), the touchpad is the main pointing device, so I use it far more than I originally recognized.

The system comes with a USB-C-based power supply, rated for 130 watts, as well as the previously mentioned adapter for HDMI and USB-A ports. It comes packaged in a molded cardboard container inside a folded cardboard packing box for good product protection — and more ecofriendly than the older Styrofoam-based packaging.

A small laptop offers flexibility. In the office, you can use it with a full set of peripherals. When at home, you can plug in your monitor and accessories, and pick up exactly where you left off.

With virtual desktops, you can get a similar experience by working in the cloud on various systems at different locations, but that doesn’t allow you full access when you are in transit or when you are in places with limited internet access. The Dell Precision 5480 seems like an ideal system for anyone who needs editing power on the go and has monitors to plug in to in their primary work environments. (And they don’t need a larger laptop display on the unit itself.)

Battery Life
Admittedly, this particular configuration should be expected to have the worst possible battery life (the most powerful CPU and GPU available, plus a high-resolution screen), but it’s not as bad as you’d think. I used this system when I attended the Adobe Max conference, and I did not bring the charger with me during the day. The only time I regretted that was when I accidentally left Adobe Photoshop running in the background for a few hours. Otherwise, I was able to do basic tasks all day long with no issue.

For non-work-related activities such as gaming, I typically got about two hours of usage when playing a 3D game before needing to plug it in. Dell has done a great job of saving power when it is not needed. Power-hungry, performance-based tasks will drain the battery… which is to be expected. But when just doing simple browser-based tasks, I was able to use it all day without issue.

Software
The unit comes with Windows 11 Pro installed. Even after 18 months, I still have not “adapted” to Microsoft’s newest OS, and I prefer Windows 10. But, based on my performance tests, the thread director in Windows 11, which is aware of the difference between the performance cores and the efficiency cores on Intel’s newest chips, does make a difference. (Windows 10 assigns hard tasks to the efficiency cores, and it takes longer to finish them, decreasing overall performance.)

One way around this is to disable the E-Cores in the BIOS and stick with Windows 10, but especially on a laptop, that negates much of the power efficiency of the newer designs. So you are pretty stuck with Windows 11 on these newer systems. But besides that, the Dell Precision 5480 comes with very little bloatware — just drivers and utilities for the various hardware devices and some Dell performance and configuration optimization tools.

The Graphics Processor
The RTX 3000 GPU is the physical equivalent of the GeForce RTX 4070 Mobile, with 4,608 CUDA cores, 8GB of GDDR6 memory and nearly 20 teraflops of processing power. It benchmarks at about 25% of the performance of my giant GeForce RTX 4090 desktop card, which is to be expected based on the paper specs. This is actually fine in most cases since I rarely need to harness the full power of that GPU when doing regular editing tasks. And 20 teraflops is twice the performance of the top-end GeForce RTX 2080/Quadro RTX 5000 from two generations ago, and it’s now available in a 14-inch laptop.
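
For those who like to sanity-check marketing numbers, peak FP32 throughput is roughly the CUDA core count times two operations per clock (a fused multiply-add) times the boost clock. Here is a minimal sketch of that arithmetic; the boost clocks are assumptions for illustration, since real clocks vary with each laptop’s configured power limit.

```python
# Rough sanity check of the "nearly 20 teraflops" figure.
# Peak FP32 = CUDA cores x 2 ops per clock (fused multiply-add) x boost clock.
def peak_fp32_tflops(cuda_cores: int, boost_clock_ghz: float) -> float:
    return cuda_cores * 2 * boost_clock_ghz / 1000.0

# Assumed boost clocks, for illustration only.
print(peak_fp32_tflops(4608, 2.2))    # RTX 3000 Ada mobile: ~20.3 TFLOPS
print(peak_fp32_tflops(16384, 2.5))   # desktop GeForce RTX 4090: ~81.9 TFLOPS
```

Those two results also line up with the roughly 25% relative performance I measured.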

A key consideration for professional use of a system this size is external display support, so I also tested the Dell Precision 5480 with a number of external displays, up to and including the Dell UltraSharp UP3218K monitor, which was supported at its full 8K, 60fps resolution by using two USB-C-to-DisplayPort cables. The last HP mobile workstation I tested required a docking station for full support of that display, and my Razer is limited to 30fps unless I use an external GPU. It’s good to see that Dell fully supports its own display range on its own system, but I do recognize that’s really a function of the GPU and supported output ports. Nonetheless, you can use this system with an 8K monitor if you so desire.

Storage
The internal NVMe SSD reports 4.5GB/s write and 4.8GB/s read in AJA System Test, which isn’t the fastest PCIe 4.0 speed but is more than enough for 99% of power users. Dell offers SSDs in sizes from 256GB to 4TB, with self-encrypting models at 512GB and 1TB for users with those requirements.
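
To put that read speed in playback terms, here is a rough sketch of how many simultaneous streams it could feed. The per-stream data rates are approximations I am assuming for illustration; real rates vary with codec, resolution and frame rate.

```python
# How many playback streams could a 4.8GB/s read speed feed?
# Per-stream rates below are approximate and assumed for illustration.
drive_read_mb_s = 4800

stream_rates_mb_s = {
    "ProRes 422 HQ, UHD 30p (~90MB/s)": 90,
    "ProRes 4444, UHD 30p (~135MB/s)": 135,
    "Uncompressed 10-bit DPX, 4K 24p (~850MB/s)": 850,
}

for fmt, rate in stream_rates_mb_s.items():
    print(f"{fmt}: ~{drive_read_mb_s // rate} streams")
```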

Performance
CPUs are much harder to compare on paper, which is why tools like Maxon’s Cinebench are so valuable. Blender also has a benchmarking tool for comparing system performance. And performance is always a relative measure since we are comparing a specific system (this one) to other potential options.

Usually, reviewers compare systems to others that are very similar, but in this case, I took a different approach for two reasons. First, I don’t have similar current options to compare to. Second, there is value in comparing what you are sacrificing when you scale down to a small laptop. Which tasks can you do effectively on a mobile system, and which can wait until you are in front of (or remoting into) a powerful desktop workstation?

The 13900H, with six performance cores and eight efficiency cores, has 20 threads available to the OS. My desktop with a 12700K CPU also has 20 threads, coming from eight performance cores and four efficiency cores. In most synthetic render tests, this little laptop has about 70% of the CPU processing power of my consumer desktop tower.

In real-world tests, exporting cinema-quality files out of Premiere, my results were frustratingly inconsistent. This appears to stem from a combination of Intel’s new power-saving technology and Adobe’s software optimizations. I ran my entire suite of standard test exports multiple times and got widely varying results. I then reran them repeatedly on my 12700K-based desktop and also got less consistent results than I recall seeing in the past. Normally, I test repeatedly with slightly different settings rather than repeating the exact same test a number of times, so this variability hadn’t stood out to me before. It has really shifted my view on quantifying performance in Premiere.

The best tests would be a live-playback test and potentially a latency test to see how long it takes playback to begin after you press the space bar. But due to the playback optimizations within the program, this is no longer a good way to compare different systems. Puget Systems, which does a lot of work in benchmarking, details the challenges of quantifying performance in Premiere in a great article that dives even deeper into the topic than I have. Regardless of those limitations, here are the raw numbers from my Media Encoder benchmarks for you to evaluate against my other systems.

Summing Up
Suffice it to say, this machine can edit and play back nearly any sequence due to Premiere’s optimizations, and it can export high-quality output files with decent performance. But for longer renders and Red source footage, it might be best to render on your desktop workstation. This is totally reasonable for a portable laptop; no one should expect a 14-inch notebook to replace server-level hardware. But the Dell Precision 5480 can accomplish most editing tasks with ease.


Mike McCarthy is an online editor/workflow consultant with over 15 years of experience on feature films and commercials. He has been involved in pioneering new solutions for tapeless workflows, DSLR filmmaking and multi-screen and surround video experiences. Check out his site.

 

Cloud Storage – A Wide Variety of Options

By Mike McCarthy

Cloud storage has been around for at least a decade, but I have been slow to really embrace it. This is for two reasons: trust and bandwidth. But as both of those concerns get alleviated over time, and cloud-based solutions (even beyond storage) continue to mature, I am beginning to move toward more cloud-based functions and workflows. There are a lot of different options out there for cloud storage, and they are not very similar to each other, so there are a lot of data and variables to sort through when trying to find the best solution for your needs. This piece is intended as an introduction to some of the things to consider when weighing those disparate options.

First you have to trust that your data will be available when you need it, which is a legitimate concern, and outages do happen, even in the most redundant systems. But if you add cloud storage to an existing backup and archiving plan, it is nearly all benefit. It is one more copy of your data at a separate location on a separate system. If the cloud copy fails, you should still have your own local copies. And if all your local backups fail or are destroyed in some cataclysmic event, then you should still have access to your versions stored in the cloud. Admittedly, there is higher risk of a lapse in data security with an extra copy in a separate location you don’t control, but a data breach is of less concern in my world than data loss. (I work with media, not health care records or national security.)

Second, bandwidth can be a concern, depending on your level of internet connection. By some twist of fate, I have nearly always had terrible internet options, even when living in the middle of LA, and definitely more so living in a rural area for the past decade. But I got fiber this summer, and that has been a game-changer, allowing me to experiment with true cloud-based workflows. You do need a high-bandwidth connection for most serious cloud workflows. How much is enough will depend on your specific workflow, but I would recommend at least enough for one stream of your primary format, and more is always better. My 300Mb connection is fine for HD, but I would upgrade to 1Gb if I were doing 4K ProRes from the cloud.
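
As a rough illustration of that rule of thumb, you can compare your connection speed against the data rate of a single stream of your working codec. The ProRes bitrates below are approximate figures I am assuming for the example; check your own codec’s data rate against your own connection.

```python
# Rule-of-thumb check: can the connection carry one stream of the working format?
# Bitrates are approximate ProRes 422 HQ figures, assumed for illustration.
connections_mbps = {"300Mb fiber": 300, "1Gb fiber": 1000}
stream_mbps = {"ProRes 422 HQ, 1080p 24": 176, "ProRes 422 HQ, UHD 24": 566}

for conn, down in connections_mbps.items():
    for fmt, rate in stream_mbps.items():
        verdict = "enough" if down >= rate else "not enough"
        print(f"{conn} for one stream of {fmt}: {verdict}")
```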

So once those two concerns are addressed, we have to start looking at what we want to accomplish via the cloud. I have used Dropbox for many years, primarily for documents and project files and less so for media due to my bandwidth limits. Dropbox was originally focused on syncing files between different systems or locations, but now it comes with advanced collaboration functions, including file sharing and review and approval tools. This is not to be confused with Box, which focuses more on collaborative documents but can also be used for media files. Dropbox for business starts at $20 per month and is cheaper per terabyte than other options, but it costs more per user than many competing solutions.

Google Drive can be used for video files but is rarely the best option for those types of files unless absolutely necessary. Downloading large numbers of files through Google’s web app zips the contents, which is undesirable for frame sequences or other workflows with many files. But $10 per month for 2TB is a reasonable price as an intro to cloud storage. I have little experience with Microsoft’s OneDrive, but I would expect similar limitations and results.

Then there are options like Frame.io, which are very review- and approval-focused but can also be used for transferring and sharing files, especially with the camera-to-cloud functionality. It is much more optimized for large (>100GB) files if needed and costs as little as $15 per month for 2TB, but charges are multiplied per user.

Adobe’s other option is Creative Cloud storage, which is primarily designed for documents and images and is the backbone of Lightroom’s and Photoshop’s cloud functionality. Frame.io is usually a better option for videos, while Creative Cloud storage is better integrated with the other apps and is included with the commercial Creative Cloud software subscription.

LucidLink is a very video-centric cloud storage option. Similar to the Google Drive app, LucidLink makes the data available to the system as a standard drive letter (on PC), and it caches data locally that it expects to use. But it is much more intelligent than Google Drive.


There is also a Premiere Pro integration via a panel in the application that allows users to automatically download and cache the source files or frames for full sequences or selections of timelines. Of course, you need the right amount of bandwidth to download the files in the first place, but LucidLink is smart enough to cache assets in a given sequence to allow smooth playback and doesn’t delete any cached files until the cache is 80% full. So it could cache your entire project locally but maintain sync with other users in various locations. It costs as little as $20 per terabyte per month.
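
For readers curious about the general mechanism, here is a minimal sketch of that style of caching: recently used assets stay local, and nothing is evicted until usage crosses a threshold (80% in this illustration). This is only a toy model of the concept, not LucidLink’s actual implementation.

```python
from collections import OrderedDict

# Toy illustration of threshold-based local caching, NOT LucidLink's code:
# keep local copies of recently used assets and only start evicting
# (least recently used first) once the cache passes a threshold.
class LocalCache:
    def __init__(self, capacity_gb: float, evict_at: float = 0.8):
        self.capacity = capacity_gb
        self.evict_at = evict_at
        self.files = OrderedDict()  # filename -> size in GB, oldest first

    def used(self) -> float:
        return sum(self.files.values())

    def access(self, name: str, size_gb: float):
        if name in self.files:
            self.files.move_to_end(name)     # mark as most recently used
        else:
            self.files[name] = size_gb       # pretend we pulled it from the cloud
        while self.used() > self.capacity * self.evict_at:
            self.files.popitem(last=False)   # drop least recently used

cache = LocalCache(capacity_gb=100)
for clip in ["A001.mov", "A002.mov", "A001.mov"]:
    cache.access(clip, size_gb=12)
```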

Blackmagic Cloud is a locally hosted storage solution for which users buy the hardware. That hardware then uses Dropbox or Google Drive to sync the data between multiple locations. It is not a cloud solution in and of itself, but it can be used as a local extension of existing cloud solutions.

Amazon’s S3 and other similar object-based storage solutions from Wasabi, Backblaze and other vendors can be useful for video files in certain large-scale workflows. Wasabi is $7 per terabyte per month, and Backblaze is $6 per terabyte per month, while S3 Standard is $23 per terabyte per month plus egress charges. These object storage services usually are not limited by number of users but have fewer workflow integrations than the full-service solutions above. There are, of course, many other options out there; these are just the ones I have some level of familiarity with.

Many of the other options run on Amazon S3 or Wasabi under the hood, but many of the proprietary implementations are incompatible with each other in the way that they store and access data. For example, Frame.io just added support for Amazon S3 storage, but it won’t allow you to access your existing library of S3 media in the Frame.io application. LucidLink is similar in that the encrypted data stored in the cloud isn’t accessible by other cloud services.

There are large-scale media applications, like Cinnafilm’s PixelStrings, that can access files stored on S3 directly, which is where many large companies now store their media. These applications are clearly the path of the future. I envision a day when all of my source files are stored on LucidLink to be accessed and edited in my Adobe apps, and anything I place into a Frame.io folder (or bucket) would be accessible to that service. And if I wanted to convert a file in PixelStrings, that application can access any of my cloud files to process them. So solutions have matured over the past decade, but they still have a ways to go.


Mike McCarthy is a technology consultant with extensive experience in film post production. He started posting technology info and analysis at HD4PC in 2007. He broadened his focus with TechWithMikeFirst 10 years later.

 

Adobe Max 2023 Part 2: Tips, Sneaks and Inspiration

By Mike McCarthy

My first article reporting on Adobe Max covered new releases and future announcements. This one will focus on other parts of the conference and what it was like to attend in person once more. After the opening keynote, I spent the rest of that day in various sessions. The second day featured an Inspiration Keynote that was, as you can imagine, less technical in nature, but more on that later. Then came more sessions and Adobe Sneaks, where the company revealed new technology still in development.

In addition to all of these events, Adobe hosted the Creativity Park, a hall full of booths showcasing hardware solutions to extend the functionality of Adobe’s offerings.

Individual Sessions
With close to 200 different Max sessions and a finite amount of time, it can be a challenge to choose what you want to explore. I, of course, focused on video sessions and, within those, specifically the topics that would help me better use After Effects’ newest 3D object functionality. My top takeaway came from Robert Hranitzky in his session Get the Hollywood Look on a Budget Using 3D, where he explained that GLB files work better than the OBJ files I had been using because they embed the textures and other info directly into the file for After Effects to use. He also showed how to break a model into separate parts to animate directly in After Effects. It works the way I had envisioned, but I haven’t yet had a chance to try it in the beta releases.

Ian Robinson’s Move Into the Next Dimension With After Effects was a bit more beginner-focused, but it pointed out that one unique benefit of Draft3D mode is that the render is not cropped to the view window, giving you an overscan effect that lets you see what you might be missing in your render perspective. He also did a good job of covering how the Cinema 4D and Advanced 3D render modes allow you to bend and extrude layers and edit materials, while the Classic 3D render mode does not. I have done most of my AE work in Classic mode for the past two decades, but I may start using the new Advanced 3D renderer for adding actual 3D objects to my videos for postviz effects.

Nol Honig and Kyle Hamrick had a Leveling Up in AE session where they showed all sorts of shortcuts and unique ways to use Essential Properties to create multiple varying copies of a single subcomp. One of my favorite shortcuts was hitting “N” while creating a rectangle mask. It sets the mask mode to None, which allows you to see the layer you are masking while you are drawing the rectangle. (Honestly, it should default to None until you release the mouse button, in my opinion.) A couple of other favorites: Ctrl+Home will center objects in the comp, and, even more useful, Ctrl+Alt+Home will recenter the anchor point if it gets adjusted by accident. But they skipped “U,” which reveals all keyframed properties and, when pressed again (“UU”), reveals all adjusted properties. (I think they assumed everyone knew about the Uber-Key.)

I also went to Rich Harrington’s Work Faster in Premiere Pro session, and while I didn’t learn many new things about Premiere (besides the fact that copying the keyboard shortcuts to the clipboard results in a readable text list), I did learn some cool things in Photoshop that can be used in Premiere-based workflows.

Photoshop can export LUTs (look-up tables) that can be used to adjust the color of images in Premiere via the Lumetri color effects. It generates these lookup tables from adjustment layers applied to the image. While many of the same tools are available directly within Premiere, Photoshop has some further options that Premiere does not, and this is how you can use them for video:

First, export a still of a shot you want corrected and bring it into Photoshop as a background image. Then apply adjustment layers, in this case curves, which is a powerful tool that is not always intuitive to use. For one thing, Alt-clicking the “Auto” button gives you more detailed options in a separate window that I had never even seen. Then the top-left button in the Curves panel is the Targeted Adjustment Tool, which allows you to modify the selected curve by clicking on the area of the image that you want to change; the tool will then adjust the corresponding point on the curve. In this way, you can use Photoshop to make your still image look the way you want it, and then export a LUT for use in Premiere or anywhere else you can use LUTs. (Hey Adobe, I want this in Lumetri.)
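
For context on what Premiere actually receives, a LUT is just a table of input-to-output color values, most commonly saved as a .cube text file. The snippet below writes a tiny identity 3D LUT in that format purely as an illustration of the file layout; a real grading LUT would come out of Photoshop as described above.

```python
# Write a minimal identity 3D LUT in the .cube format that Lumetri can load.
# Purely illustrative; a real corrective LUT would be exported from Photoshop.
SIZE = 17  # common grid sizes are 17, 33 or 65 points per axis

lines = ['TITLE "identity"', f"LUT_3D_SIZE {SIZE}"]
for b in range(SIZE):
    for g in range(SIZE):
        for r in range(SIZE):
            # .cube 3D tables vary red fastest, then green, then blue
            lines.append(f"{r/(SIZE-1):.6f} {g/(SIZE-1):.6f} {b/(SIZE-1):.6f}")

with open("identity.cube", "w") as f:
    f.write("\n".join(lines) + "\n")
```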

Adobe Sneaks
In what is a staple event of the Max conference, Adobe Sneaks brings the  company’s engineers together to present technologies they are working on that have not yet made it into specific products. The technologies range from Project Primrose, a digital dress that can display patterns on black and white tiles, to Project Dub Dub Dub, which automatically dubs audio in multiple foreign languages via AI.  The event was hosted by comedian Adam Devine, who offered some less technical observations about the new functions.

Illustration isn’t really my thing, but it could be once Project Draw & Delight comes to market. It uses the power of AI to convert a crude sketch (my “artistic masterpiece”) into refined images, guided by the simple prompt “cat.” I am looking forward to how much better my storyboard sketches will soon look with this simple and accessible technology.

Adobe always has a lot going on with fonts, and Project Glyph Ease continues that tradition with complete AI-generated fonts based on a user’s drawing of two or three letters. This is a natural extension of the new type-editing features demonstrated in Illustrator the day before, whereby any font can be identified and matched from a couple of letters, even from vectorized outlines. But unlike the Illustrator feature, this tool can create whole new fonts instead of matching existing ones.

Project See Through was all about removing reflections from photographs, and the technology did a pretty good job on some complex scenes while preserving details.  But the part that was really impressive was when engineers showed how the computer could also generate a full image based on the image in the reflection.  A little scary when you think about the fact that the photographer taking the photo will frequently be the one in the reflection.  So much for the anonymity of being “behind the camera.”

Project Scene Change was a little rough in its initial presentation, but it’s a really powerful concept. It extracts a 3D representation of a scene from a piece of source footage and then uses that to create a new background for a different clip, with the background rendered to match the perspective of the foreground. The technology is not really limited to backgrounds; that is just the easiest way to explain it with words. As you could see from the character in the scene behind the coffee cup, the technology really is creating an entire environment, not just a background. It will be interesting to see how this gets fleshed out with user controls for higher-scale VFX processes.

Project Res Up appears to be capable of true AI-based generative resolution improvements in video. I have been waiting for this ever since Nvidia demonstrated live AI-generated upscaling of 3D-rendered images, which is what allows real-time raytracing to work, but I hadn’t seen it applied like this until now. If we can create something out of thin air with generative AI, it stands to reason that we should be able to improve something that already exists. But in another sense, I recognize that it is more challenging when you have a specific target to match. This is also why generative video is much harder to do than stills: Each generated frame has to smoothly match the ones before and after it, and any artifacts will be much more noticeable to humans when in motion.

This is why the most powerful demo, by far from my perspective, was the AI-based generative fill for video, called Project Fast Fill. This was something I expected to see, but I did not anticipate it being so powerful yet. It started off with a basic removal of distracting elements from the background. But it ended with adding a necktie to a strutting character walking through a doorway with complex lighting changes and camera motion… all based on a simple text command and a vector shape to point the AI at the right place. The results were stunning, and if seeing is believing, this will revolutionize VFX much sooner than I expected.

Creative Park
There was also a hall of booths hosting Adobe’s various hardware and software partners, some of whom had new announcements of their own. The hall was divided into sections, with a quarter of it devoted to video, which might be more than in previous years.

Samsung was showing off its ridiculously oversized wraparound LCD displays, in the form of a 57-inch double-wide UHD display and a 55-inch curved TV display that can be run in portrait mode for an overhead feel. I am still a strong proponent of the 21:9 aspect ratio, as that is the natural shape of human vision, and anything wider requires moving your head instead of your eyes.

Logitech showed its new Action Ring function for its MX line of productivity mice.  I have been using gaming mice for the past few years, and after talking with some of the reps in the booth, I believe I should be migrating back to the professional options.  The new Action Ring is similar to a feature in my Logitech Triathlon mouse, where you press a button to bring up a customizable context menu with various functions available.  It is still in beta, but it has potential.

LucidLink is a high-performance cloud storage provider that presents to the OS as a regular mounted hard drive. LucidLink demonstrated a new integration with Premiere Pro, a panel in the application that allows users to control which files maintain a local copy based on which projects and sequences they are used in. I have yet to try LucidLink myself, as my bandwidth was too low until this year, but I can envision it being a useful tool now that I have a fiber connection at home.

Inspiration Keynote
Getting back to the Inspiration Keynote, I usually don’t have much to report from the Day 2 keynote presentation, as it is rarely technical in detail, and mostly about soft skills that are hard to describe. But this year’s presentation stood out in a number of ways.

There were four different presenters with very different styles and messages. First off was Aaron James Draplin, a graphic designer from Portland with a unique style who appears to pride himself on not fitting the mold of corporate success. His big, loud and autobiographical presentation was entertaining, and its message was that if you work hard, you can achieve your own unique success.

Second was Karen X Cheng, a social media artist with some pretty innovative art, the technical aspects of which I am better able to appreciate. Her explicit mix of AI and real photography was powerful. She talked a lot about the algorithms that rule the social media space, and how they skew our perception of value. I thought her five defenses against the algorithm were important ideas:

Money and passion don’t always align – pursue both, separately if necessary
Be proud of your flops – likes and reshares aren’t the only measure of value
Seek respect, not attention, it lasts longer – this one is self-explanatory
Human+AI > AI – AI is a powerful tool, but even more so in the hands of a skilled user
Take a sabbath from screens – It helps keep in perspective that there is more to life

Up next was Walker Noble, an artist who was able to find financial success selling his art when the pandemic pushed him out of his day job… which he had previously been afraid to leave. He talked about taking risks and self-perception, asking, “Why not me?” He also talked about finding your motivation, in his case his family, although there are other possibilities. He also pointed out that he has found success without mastering “the algorithm,” in that he has few social media followers or influence in the online world. So, “Why not you?”

Last up was Oak Felder, a music producer who spoke about channeling emotions through media, specifically music. He made a case for the intrinsic emotional value within certain tones of music, as opposed to learned associations from movies and the like. The way he sees it, there are “kernels of emotion” within music that are then shaped by a skilled composer or artist. He said that the impact it has on others is the definition of truly making music. He ended his segment by showing a special-needs child being soothed by one of his songs during a medical procedure.

The entire combined presentation was much stronger than the celebrity-interview format they have previously hosted at Max.

That’s It!
That wraps up my coverage of Max, and hopefully gives readers a taste of what it would be like to attend in person, instead of just watching the event online for free, which is still an option.


Mike McCarthy is a technology consultant with extensive experience in film post production. He started posting technology info and analysis at HD4PC in 2007. He broadened his focus with TechWithMikeFirst 10 years later.

 

 

Adobe Max 2023: A Focus on Creativity and Tools, Part 1

By Mike McCarthy

Adobe held its annual Max conference at the LA Convention Center this week. It was my first time back since COVID, but Adobe hosted an in-person event last year as well. The Max conference is focused on creativity and is traditionally where Adobe announces and releases the newest updates to its Creative Cloud apps.

As a Premiere editor and Photoshop user, I am always interested in seeing what Adobe’s team has been doing to improve its products and improve my workflows. I have followed Premiere and After Effects pretty closely through Adobe’s beta programs for over a decade, but Max is where I find out about what new things I can do in Photoshop, Illustrator and various other apps. And via the various sessions, I also learn some old things I can do that I just didn’t know about before.

The main keynote is generally where Adobe announces new products and initiatives as well as new functions to existing applications. This year, as you can imagine, was very AI-focused, following up on the company’s successful Firefly generative AI imaging tool released earlier this year. The main feature that differentiates Adobe’s generative AI tools from various competing options is that the resulting outputs are guaranteed to be safe to use in commercial projects. That’s because Adobe owns the content that the models are trained on (presumably courtesy of Adobe Stock).

Adobe sees AI as useful in four ways: broadening exploration, accelerating productivity, increasing creative control and including community input. Adobe GenStudio will now be the hub for all things AI, integrating Creative Cloud, Firefly, Express, Frame.io, Analytics, AEM Assets and Workfront. It aims to “enable on-brand content creation at the speed of imagination,” Adobe says.

Firefly

Adobe has three new generative AI models: Firefly Image 2, Firefly Vector and Firefly Design. The company also announced that it is working on Firefly Audio, Video and 3D models, which should be available soon. I want to pair the 3D one with the new AE functionality. Firefly Image 2 has twice the resolution of the original and can ingest reference images to match the style of the output.

Firefly Vector is obviously for creating AI-generated vector images and art.

But the third one, Firefly Design, deserves further explanation. It generates a fully editable Adobe Express template document with a user-defined aspect ratio and text options. The remaining fine-tuning for a completed work can be done in Adobe Express.


For those of you who are unfamiliar, Adobe Express is a free cloud-based media creation and editing application, and that is where a lot of Adobe’s recent efforts and this event’s announcements have been focused. It is designed to streamline the workflow for getting content from the idea stage all the way to publishing on the internet, with direct integration with many social media outlets and a full scheduling system to manage entire social marketing campaigns. It can reformat content for different deliverables and even automatically translate it into 40 different languages.

As more and more of Photoshop and Illustrator’s functionality gets integrated into Express, Express will probably begin to replace them as the go-to for entry-level users. And as a cloud-based app accessed through a browser, it can even be used on Chromebooks and other non-Mac and Windows devices. And Adobe claims that via a partnership with Google, the Express browser extension will be included in all new Chromebooks moving forward.

Photoshop for Web is the next step beyond Express, integrating even more of the application’s functions into a cloud app that users can access from anywhere, once again, also on Chrome devices. Apparently, I’m an old-school guy who has not yet embraced the move to the cloud as much as I could have, but given my dissatisfaction with the direction the newest Microsoft and Mac OS systems are going, maybe browser-based applications are the future.

Similarly, as a finishing editor, I have real trouble posting content that is not polished and perfected, but that is not how social media operates. With much higher amounts of content being produced in narrow time frames, most of which would not meet the production standards I am used to, I have not embraced this new paradigm. That’s why I am writing an article about this event and not posting a video about it. I would have to spend far too much time reframing each shot, color-correcting and cleaning up any distractions in the audio.

Firefly Generative Fill

For desktop applications, within the full version of Photoshop, Firefly-powered generative fill has replaced content-aware fill. You can now use generative fill to create new overlay layers based on text prompts or remove things by overlaying AI-generated background extensions. AI can also add reflections and other image processing. It can “un-crop” images via Generative Expand. Separately, gradients are now fully editable, and there are now adjustment layer presets, including user-definable ones.

Illustrator can now identify fonts in rasterized and vectorized images and can even edit text that has already been converted to outlines. It can convert text to color palettes for existing artwork. It can also generate vector objects and scenes via AI that are all fully editable and scalable. It can even take in existing images as input to match stylistically. There is also a new cloud-based web version of Illustrator coming to public beta.

Text-based editing in Premiere

From the video perspective, the news was mostly familiar to existing public beta users or to those who followed the IBC announcements: text-based editing, pause and filler word removal, and dialog enhancement in Premiere. After Effects is getting true 3D object support, so my session schedule focused on learning more about the workflows for using that feature. You need to create and texture models and then save them as GLB files before you can use them in AE. And you need to set up the lighting environment in AE before they will look right in your scene. But I am looking forward to being able to use that functionality more effectively on my upcoming film postviz projects.

I will detail my experience at Day 2’s Inspiration keynote as well as the tips and tricks I learned in the various training sessions in a separate article. At the time of this writing, I still had one more day to go at the conference. So keep an eye out. The second half of my Max coverage is coming soon.


Mike McCarthy is a technology consultant with extensive experience in film post production. He started posting technology info and analysis at HD4PC in 2007. He broadened his focus with TechWithMikeFirst 10 years later.

 

IBC 2023: AJA, Sony, Adobe, Maxon Updates

By Mike McCarthy

At the IBC show in Amsterdam, there were a number of new product announcements that are relevant to editors and other post pros.

First off, AJA announced its newest PCIe I/O card, the Kona X. This card is a straightforward option, sitting below the existing Kona 5 but far above the T-Tap Pro, with 12G-SDI and HDMI 2.0 inputs and outputs and compatibility with any of the tools that already support the long-standing Kona line of products. What makes the card unique is its DMA support for lowering input-to-output latency for faster turnaround of AR graphics and AI-driven video overlays — or virtual production backgrounds — that the game engine generates on the fly.

The rest of the potential legacy I/O interfaces are included on a separate card, the Kona Xpand, which adds support for REF, AES and LTC plus RS232 and stereo analog audio. I like this idea, as many users don’t need these options (which add cost to the card), while others require them. And I would hope that the Xpand will work with other cards in the future, lowering its overall cost over time and allowing it to be retained when the main card is updated. It’s priced at over $3,500. I would still love to see a PCIe version of the T-Tap Pro, a single 12G-SDI and HDMI output, on a low-profile PCIe card. I figure there must be a decent market for a card like that.

Blackmagic
Blackmagic announced a new camera in its Pocket Cinema Camera 6K line. The new camera has a full-frame 6K sensor, which is much larger than the Super 35 sensor of the previous camera in the line, and an L-mount in place of the EF mount. It still records 12-bit Blackmagic RAW files, but now to CFexpress cards or to USB storage, with live H.264 proxies as well, if desired. It seems like a pretty solid option for entry-level professional work at $2,600 plus lenses. Blackmagic also released Blackmagic Camera, a free iPhone app for fully utilizing the phone’s camera features with maximum user control. It has an impressive number of features, from false color and zebra display modes to metadata controls. It can record 10-bit ProRes directly to Blackmagic Cloud so media files will be ready to edit in Resolve. I am an Android user; otherwise, I would have already downloaded and tried it.

Sony
Sony has also announced a new camera, the Burano, which is a smaller sibling to the Venice. New cameras are important to post production since we are the ones who have to deal with the media that these cameras record. But in this case, it looks like the supported formats (X-OCN and XAVC) are pretty similar to the existing options that have been coming from the Venice for a few years now.

Adobe
Adobe announced a variety of new video-related features coming to its Creative Cloud applications. Adobe is extending Premiere Pro’s well-received text-based editing so editors can easily remove pauses and filler words in recordings to quickly clean up audio and tighten up edits. The length of what will be considered a pause is user-definable, and it can, uh, find and, uh, automatically, uh, remove extra, uh, filler words in audio recordings, either leaving silence or ripple-deleting the time out entirely.

Premiere Pro can also clean up audio with a new AI-enhanced speech tool that has been popular as a cloud tool on Adobe Podcast but will now be available directly in the application. It resynthesizes voices to remove background noise and increase clarity, with all processing now done on the local system. Adobe also claims to have a solution to the long-standing QuickTime gamma problem, which has resulted in inconsistent MOV output for years. And you can now batch-select multiple markers to move or delete them, which is a small but extremely useful change in certain situations, especially with Frame.io’s implementation of markers for notes.

Speaking of Frame.io, it has new side-by-side comparison tools for not just stills but videos as well, and the sources no longer have to be in the same versions stack. The tools appear to have a clean and simple UI approach to making that functionality easy and intuitive to use. They will support third-party cloud storage for enterprise users, which will probably alleviate certain security concerns for potential customers. Frame.io also has new camera-to-cloud hardware options, including Accsoon Seemo, which I had never heard of before. It appears to be a fairly revolutionary camera attachment for output compression and streaming.

Lastly, After Effects is getting a number of new features. The application’s AI-powered rotoscoping tool, Roto Brush, is getting a big upgrade to allow it to better handle intersecting objects and other complex problems. But the thing I am most excited about is After Effects’ new support for importing 3D objects directly into the composition. This change is probably motivated in part by the development of apps like Adobe Dimension and by Adobe’s acquisition of the Substance 3D tools. It initially supports OBJ and GLB files, but more formats are said to be coming.

I am not much of a 3D guy because I just don’t have the patience that has historically been required. My brother has been modeling stuff for animation since high school and modeling real objects for manufacture since college, but I would prefer to skip that step. With the development of generative AI-created 3D models, I have been looking for a way to bridge the gap between that functionality and my work in Premiere and After Effects. This appears to be what I have needed. I no longer have to set up scenes in Maxon Cinema4D to use my 3D assets in After Effects. I do a fair bit of VFX previsualization, like the postviz mockups of shots I did in editorial on my last couple big films.

The tools are all there for keying together existing content, but when I have to add an object that will be fully synthetic, I either have to send the shot to an artist to create an overlay for me or pull from a prerendered spinning video of the model. The ability to import 3D objects directly into the composition will allow me to get a model from the VFX team and overlay it onto the shot however the director wants.

The new support for 3D objects in After Effects is thanks to a new GPU-accelerated Advanced 3D engine, and it can do all sorts of advanced rendering on 3D objects. But I am primarily interested in the basics. It will allow me to import and manipulate a 3D asset in an environment that I am very familiar with and easily composite it into a shot to preview how it will look once a real VFX artist takes the time to do it properly. This functionality is already available in the current public beta builds, and I would anticipate the full release to be at Adobe Max next month. I have tested it on my system, and it works with the OBJ models from my last film. It would have saved me a lot of time on that project, so I am looking forward to putting it into use on my next movie.

Maxon
Also related to 3D, Maxon has deepened its existing partnership with Adobe (After Effects plus Cinema4D) by packaging a one-year license of Adobe’s Substance 3D tools with new subscriptions of Maxon One for a limited time. Maxon One is Maxon’s subscription service that gives users access to all of Maxon’s tools, many of which have been updated for IBC as Maxon releases its 2024 versions.

Maxon also released Cinebench 2024 last week, which is the newest version of its free rendering benchmark utility. The new version adds support for GPU benchmarking, since it is now built on Maxon’s GPU-accelerated Redshift engine, and I have been using it for other upcoming reviews with great success.


Mike McCarthy is a technology consultant with extensive experience in film post production. He started posting technology info and analysis at HD4PC in 2007. He broadened his focus with TechWithMikeFirst 10 years later.

The Language of High-Bandwidth Fiber Networking

By Mike McCarthy

As networking bandwidth needs increase for users processing high-resolution media files, especially past 10GbE, more and more users are being pushed from copper twisted pairs (Cat 6, etc.) toward fiber connections. Fiber-optic cable offers a lot more possibilities, making it far more complicated than copper wiring. It requires a whole new vocabulary of acronyms to keep all of the options straight and various methods to multiply the data rates, which can be combined in numerous ways.

This article is intended to orient the reader to many of the concepts and terms used in fiber networking and lay the groundwork for understanding the various ways to move data around faster than 10Gb/s. While there is technically a standard for 25GbE over Cat 8 copper cabling, if you have to replace your wiring for that speed, then you might as well go to fiber. Plus, twisted-pair encoding adds latency and heat, so fiber is usually the best choice beyond 10GbE.

Fiber Cabling
There are two main types of fiber cabling: single-mode and multi-mode. Single-mode cable is much thinner but can transmit data over much longer distances. Multi-mode is thicker and much more forgiving, allowing cheaper connections and electronics at the cost of limited range. Multi-mode is like “fiber lite” for shorter runs within buildings instead of between cities. There are various grades of multi-mode fiber rated for different transmission speeds, with the most relevant options being the older, orange OM2 cables for 4Gb Fibre Channel. These were replaced by the aqua-colored OM3 cables for 8Gb fiber and 10GbE connections. The violet OM4 cables support 25GbE over longer distances, while the aqua OM3 cables are limited to 70 meters. The new lime-green OM5 cables apparently support CWDM to quadruple the data rate to 100Gb. (See below for an explanation of CWDM.)

Single-mode fiber is much simpler, without all the various levels and grades but with much more expensive handling and connections. It is usually identified by a yellow jacket. Either variant comes in single or paired strands (for bidirectional communication) or in multi-strand cables for running many channels in parallel.

Fiber Connectors
Either type of fiber cabling can use many different types of connectors on the end. Telecom uses ST and SC connectors for individual strands. LC is used for pairs, and MTP supports up to 12 strands, with variants that support 16 or 24 strands. Most users in the media and entertainment space will find themselves using LC connectors on their fiber and may not even be aware that there are other options. MTP is used for 40GbE and 100GbE connections, which are more likely to be used in a server than a user workstation.

Fiber Transceivers
There are many types of fiber transceivers used for different ranges and in different enclosures. SFP+ transceivers are the most common in workstations. Specifically, interface cards and switches with the newer SFP28 connectors are used for 25GbE networking. The transceiver adapts this electrical connection to an LC fiber via a laser for outgoing data and a light sensor for incoming data. Most users will see this connected to an aqua-colored multi-mode cable or a yellow single-mode fiber, either of which links them to a switch elsewhere in their building.

There are also wider QSFP ports that interface quad-channel data to fiber, usually using eight of the 12 strands of an MTP fiber connector (four in each direction). This approach is called Parallel Single Mode 4, or PSM4. There are also QSFP transceivers that can transmit all four channels over a single fiber pair (usually with a standard LC connector) by using four different frequencies (colors) at the same time. This technique is called coarse wavelength-division multiplexing (CWDM). It allows 40GbE or 100GbE connections over a single fiber pair (one fiber in each direction).
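
The difference between PSM4 and CWDM is easiest to see as arithmetic: both carry four lanes, but one spreads them across parallel fibers while the other stacks them as wavelengths on a single pair. A small sketch, using nominal per-lane rates and ignoring encoding overhead:

```python
# Compare the two 4-lane approaches: total bandwidth is the same, but the
# fiber count differs. Rates are nominal and ignore encoding overhead.
def qsfp_link(lanes: int, per_lane_gbps: int, wavelengths_per_fiber: int):
    fibers = 2 * (lanes // wavelengths_per_fiber)  # both directions
    return lanes * per_lane_gbps, fibers

# PSM4: 4 lanes on 4 parallel fiber pairs (8 strands of an MTP-12 cable)
print(qsfp_link(4, 25, wavelengths_per_fiber=1))   # (100 Gb/s, 8 fibers)
# CWDM: 4 wavelengths multiplexed onto a single LC fiber pair
print(qsfp_link(4, 25, wavelengths_per_fiber=4))   # (100 Gb/s, 2 fibers)
```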

QSFP ports can also be adapted to use single-channel SFP transceivers with a QSA adapter. And some SFP ports can be adapted to RJ-45 for twisted-pair Cat 6 connections, but there are some power limitations with that interface because it is less energy-efficient. Separately, there are also other transceiver sizes, like XFP and GBIC, but you are unlikely to see these in new network installations for end users.

Active Optical Cables
If this seems unnecessarily complicated, there are also active optical cables (AOC) that skip the LC or MTP connection to a transceiver and permanently connect the fiber to the ends that fit into SFP or QSFP ports. This simplifies the process of matching the correct components, especially for shorter runs, but the large SFP connectors on the ends may make it harder to run the wires through tight spaces and conduits.

The lengths of these cables are set when they are made in the factory, while LC connectors can be installed on-site with the right tools. Even so, using prefabricated AOC lines can save on deployment costs because AOCs can be mass-produced and tested much less expensively than individual transceivers, which must be interoperable with other products. The AOC is a closed system, engineered with known components for a particular length. That engineering can also save on energy. There are also similar direct-attached copper (DAC) cables with SFP or QSFP ends, which are far simpler and cheaper but much thicker and limited to very short runs, usually 1 to 3 meters. (While those are my preferred solution personally, this article is focused on fiber-based solutions.)

PAM4 Encoding
Besides CWDM, there are two other methods that have been developed to increase the bandwidth of fiber connections. The first is to vary the brightness of the light between four possible levels, allowing 2 bits of data to be transmitted with each clock cycle (00, 01, 10 or 11). This method effectively doubles the data transmission rate. This is called PAM4 (pulse amplitude modulation), as opposed to the normal NRZ (non-return-to-zero) transmission method of turning the light on or off. This approach requires newer SFP56 connectors on the electrical side to get the data to the transceiver at that rate before it encodes it onto the fiber. It offers 50GbE bandwidth over a single fiber pair, doubling the potential bandwidth of existing fiber cabling installations. One benefit of this approach is backward compatibility, in that the SFP56 ports can still be used with SFP28 or older SFP+ transceivers; the same goes for the QSFP56 ports. And I believe that a PAM4 transceiver will fall back to NRZ transmission on the other side if the fiber isn’t compatible with the newer approach. PAM4 is not limited to networking. It is expected to be the method used to double the data rate of PCIe in the next 6.0 standard, with similar backward-compatibility functions. There is also the PAM8 approach for encoding 3 bits of data in each clock cycle, but that is farther from commercial implementation.
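
The math behind that doubling is simply bits per symbol: NRZ carries one bit per clock, PAM4 carries two, and PAM8 would carry three. A quick sketch with nominal rates, ignoring line coding and FEC overhead:

```python
import math

# Per-lane data rate = symbol rate x bits per symbol (log2 of the signal levels).
# Nominal figures only; real links lose a little to line coding and FEC.
def lane_gbps(symbol_rate_gbaud: float, levels: int) -> float:
    return symbol_rate_gbaud * math.log2(levels)

print(lane_gbps(25, 2))      # NRZ:  1 bit/symbol  -> 25Gb/s per lane
print(lane_gbps(25, 4))      # PAM4: 2 bits/symbol -> 50Gb/s per lane
print(4 * lane_gbps(25, 4))  # four PAM4 lanes in a QSFP56 port -> 200Gb/s
```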

Double Density
The second approach to doubling bandwidth in a given space is double density. Instead of running at a higher clock frequency, an SFP-DD port has a second set of lanes. (Combining the two approaches quadruples the speed.) But this approach doesn’t save on fiber cable; it just offers twice the number of connections (and twice the total bandwidth) for a given SFP port or ports. With QSFP-DD connections, this results in eight channels, requiring 16 fibers (eight in each direction), usually in a new MTP-16 connector, which is wider than the older MTP-12 connector. This offers 200Gb of bandwidth in each direction with NRZ signaling (8x25Gb) and 400Gb of bandwidth with PAM4 signaling (8x50Gb).

Unlike PAM4, double-density ports are very much a switch-focused development, so it is possible to use a breakout cable to adapt a QSFP-DD port in a switch to 8x SFP connections or 4x SFP-DD connections, or even 2x QSFP ports. But similar to PAM4, they will be primarily used, initially at least, in the QSFP form factor. That’s because SFP already has a multi-channel version in the form of QSFP. So it is really about building up from there.

While there are already ways of exceeding 25GbE, these technologies are really focused on exceeding 100GbE for backbones and servers, so they will be used to create 200Gb and 400Gb backbones before they are widely deployed to create 50GbE links over single fiber pairs. The upcoming eight-channel OSFP (Octal SFP) standard is similar to QSFP-DD. It will be backward-compatible via an adapter and will offer more power and thermal flexibility.

Summing Up
PAM4-based networking will eventually come to the desktop in the form of 50GbE network adapters, probably around the same time that PCIe 6.0 makes it into your system…  but that won’t be for a while. In the meantime, as users upgrade to 10GbE and 25GbE, they will start seeing more LC fiber connections in their environments. And they will have to make sure they have the right transceivers in their workstations to be able to hook to the fiber network. But only people working with uncompressed 8K, or maybe uncompressed 4K at 60fps or higher, are going to need to exceed 25GbE anytime soon. And when they do, there are already options available for that… and more are coming in the near future.


Mike McCarthy is a technology consultant with extensive experience in film post production. He started posting technology info and analysis at HD4PC in 2007. He broadened his focus with TechWithMikeFirst 10 years later.

 

 

 

Nvidia’s GTC 2023 – New GPUs and AI Acceleration

By Mike McCarthy

This week, Nvidia held its GTC conference and made several interesting announcements. Most relevant in the M&E space are the new Ada Lovelace-based GPUs. To accompany the existing RTX 6000, there is now a new RTX 4000 small form factor and five new mobile GPUs offering various levels of performance and power usage.

New Mobile GPUs
The new mobile options all offer performance improvements that exceed the next higher tier in the previous generation. This means the new RTX 2000 Ada is as fast as the previous A3000, the new RTX 4000 Ada exceeds the previous top-end A5500, and the new mobile RTX 5000 Ada chip with 9,728 CUDA cores and 42 teraflops of single-precision compute performance should outperform the previous A6000 desktop card or the GeForce 3090 Ti. If true, that is pretty impressive, although there’s no word yet on battery life.

New Desktop GPU
The new RTX 4000 small-form-factor Ada takes the performance of the previous A4000 GPU, ups the memory buffer to 20GB and fits it into the form factor of the previous A2000 card, which is a low-profile, dual-slot PCIe card that only uses the 75 watts from the PCIe bus. This allows it to be installed in small-form-factor PCs or in 2U servers that don’t have the full-height slots or PCIe power connectors that most powerful GPUs require. Strangely, it is lower-performing, at least on paper, than the new mobile 4000, with 20% fewer cores and 40% lower peak performance (if the specs I was given are correct). This is possibly due to power limitations of the 75W PCIe bus slot.

The naming conventions across the various product lines continue to get more confusing and less informative, which I am never a fan of. My recommendation is to call them the Ada 19 or Ada 42, based on the peak teraflops. That way it is easy to see how they compare, even across generations, against the Turing 8 or the Ampere 24. This should work for at least the next four to five generations, until we reach petaflops and the numbering needs to be reset again.
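For what it's worth, that scheme is trivial to apply programmatically. This is just a toy sketch of my own suggestion, using approximate teraflop figures, not anything Nvidia actually uses:

```python
# Hypothetical "architecture + rounded teraflops" naming, per my suggestion above.
def simple_name(architecture, peak_tflops):
    return f"{architecture} {round(peak_tflops)}"

print(simple_name("Ada", 19.4))     # Ada 19 (roughly the RTX 3000 Ada mobile class)
print(simple_name("Ada", 42.0))     # Ada 42 (the new mobile RTX 5000 Ada)
print(simple_name("Ampere", 24.1))  # Ampere 24
print(simple_name("Turing", 8.1))   # Turing 8
```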

New Server Chips
There are also new announcements targeted at supercomputing and data centers. The Hopper GPU is focused on AI and large language model acceleration, usually installed in sets of eight SXM modules in a DGX server. Also, Nvidia's previously announced Grace CPU Superchip is in production as its new ARM-based CPU. Nvidia offers these chips as dual-CPU processing boards or combined as an integrated Grace-Hopper Superchip, with a shared interface bus and memory between the CPU and GPU. The new Apple Silicon processors use a similar unified memory approach.

There are also new PCIe-based accelerator cards, starting with the H100 NVL, which has Hopper architecture in a PCIe card offering 94GB of memory for transformer processing. ("Transformer" is the "T" in ChatGPT, by the way.) There are also Lovelace architecture-based options, including the single-slot L4 for AI video processing and the dual-slot L40 for generative AI content creation.

Four of these L40 cards are included in the new OVX-3 servers, designed for hosting and streaming Omniverse data and applications. These new servers from various vendors will have options for either Intel Sapphire Rapids- or AMD Genoa-based platforms and will include the new BlueField-3 DPU cards and ConnectX-7 NICs. They will also be available in a predesigned SuperPOD of 64 servers and a Spectrum-3 switch for companies that have a lot of 3D assets to deal with.

Omniverse Updates
On the software side, Omniverse has a variety of new applications that support its popular USD data format for easier interchange, and it now supports the real-time, raytraced, subsurface scattering shader (maybe, RTRTSSSS for short?) for more realistic surfaces. Nvidia is also partnering closely with Microsoft to bring Omniverse to Azure and to MS 365, which will allow Microsoft Teams users to collaboratively explore 3D worlds together during meetings.

Generative AI
Nvidia Picasso — which uses generative AI to convert text into images, videos or 3D objects — is now available to developers like Adobe. So in the very near future, we will reach a point where we can no longer trust the authenticity of any image or video that we see online. It is not difficult to see where that might lead us. One way or another, it will be much easier to add artificial elements to images, videos and 3D models. Maybe I will finally get into Omniverse myself when I can just tell it what I have in mind and it creates a full-3D world for me. Or maybe I could use it when I just need to add a helicopter into my footage for a VFX shot, with the right speed and perspective. That would be helpful.

Some of the new AI developments are concerning from a certain perspective, but hopefully these new technologies can be harnessed to effectively improve our working experience and our final output. Nvidia’s products are definitely accelerating the development and implementation of AI across the board.


Mike McCarthy is a technology consultant with extensive experience in film post-production. He started posting technology info and analysis at HD4PC in 2007. He broadened his focus with TechWithMikeFirst 10 years later.

Intel’s Sapphire Rapids CPU Tech Coming to Workstations

By Mike McCarthy

After many years without a significant update to most major workstation offerings, we are finally seeing new technology about to hit the market. Intel’s recently released Sapphire Rapids server CPU technology, built on the “Intel 7” process, forms the basis for its new Sapphire Rapids workstation platform. These processors come in two tiers, the Xeon W-2400 processors with quad-channel DDR5 memory and up to 24 cores, and the Xeon W-3400 processors with 8-channel DDR5 memory and up to 56 cores.

The W-2400 Series
In my opinion, the Xeon W-2400 tier of processors is the most significant news, as it replaces the Core X line of products on the old X299 platform, which was originally introduced in 2017.  It maintains the existing quad-channel interface of the HEDT-class systems, but with the newer DDR5 technology. And it upgrades the PCIe interface two steps, from 3.0 to 5.0, quadrupling the bandwidth per lane and increasing the lane count to 64 total.

With AMD retiring its standard Threadripper line last year, this gives Intel the decisive upper hand in the HEDT market for the foreseeable future. AMD’s closest competitors are the Ryzen CPUs, which have up to 16 cores but are limited to dual-channel memory and have much less PCIe bandwidth. While Intel’s 13th generation Raptor Lake consumer CPUs also have 24 cores, they only have eight “performance cores,” while the rest are smaller “efficiency cores.”

The 24 cores in the W7-2495X are all performance cores, giving it 3x as many as the 13900K for roughly triple the price. The prices throughout the new lineup scale fairly linearly at about $80-90 per core. A decade ago, it felt like it was cheaper to buy in bulk, in that a quad-core CPU didn't cost twice as much as a dual-core, but more recently the highest core counts have come at a steep premium. So I suppose linear scaling of price and core count seems fair.
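A quick back-of-the-envelope check on that pricing observation, treating the roughly $90-per-core figure as an approximation rather than an official list price:

```python
# Rough illustration of linear price-per-core scaling (approximate figures only).
PRICE_PER_CORE = 90  # dollars, roughly, per the estimate above

for cores in (6, 12, 24):
    print(f"{cores:2d} cores -> ~${cores * PRICE_PER_CORE:,}")
# 6 cores -> ~$540, 12 -> ~$1,080, 24 -> ~$2,160: doubling the core count
# roughly doubles the price rather than coming at a steep premium.
```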

The W-3400 Series
The Xeon W-3400 processors are also a welcome step forward, but it’s a smaller jump from existing offerings than the W-2400 series is. That’s because, with an intermediate W-3300 series released just over a year ago, this class of system already had more current options available.

I reviewed Boxx’s Apexx Matterhorn system just over a year ago, which is based on the Xeon W-3300 Ice Lake processors. The max core count with the new Xeon W-3400 chips jumps from 38 to 56. The DDR and PCIe bandwidths both bump up to 5th generation. And the PCIe lane count increases from 64 to 112.  These products directly compete with AMDs Threadripper Pro architecture, which supports eight channels of DDR4 memory and 128 lanes of PCIe 4.0 bandwidth.

We will have to wait until we have performance reviews before we will know how the new 3400 series chips compare to existing choices in the market. But Intel did provide some relative performance estimates compared to previous generations, and the biggest gains are anticipated in the area of 3D, as that is one of the most multithreaded and CPU-limited tasks. Editing is more I/O-based, so the gains for those users will be more about system bandwidth and component support than render time.

The W-3400 CPUs are priced at about $100 per core, slightly more expensive due to the multi-chip packaging and the extra memory and system bandwidth. While there is a 56-core model — the W9-3495X with a 350W power envelope — it will not be available via retail. DIY builders will be limited to 36 cores in the W9-3475X, rated at 300W. I recall there were performance issues in certain applications when exceeding 32 cores with Ice Lake (W-3300) systems, leading to better performance with the 32-core model than the 38-core option. It will be interesting to see how the CPUs that exceed 32 cores will fare in different applications this time around.

Both tiers are now divided into performance levels: W3, W5, W7 and W9. These roughly correspond to the existing "Core" lineups (i3, i5, i7 and i9), which many users will be familiar with on the consumer side of the market. W3 has single-digit core counts and DDR5-4400 memory. Moving to a W5-class chip jumps to double-digit cores and DDR5-4800 memory. W7 indicates core counts in the 20s, and W9 is for chips above that.
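Summarized as data, the rough tiering described above looks something like this. This is my own shorthand based on the figures in this article, not an official Intel table:

```python
# Approximate Xeon W-2400/3400 tier guide, as described above (unofficial).
XEON_W_TIERS = {
    "W3": {"cores": "single digits", "memory": "DDR5-4400"},
    "W5": {"cores": "10-19",         "memory": "DDR5-4800"},
    "W7": {"cores": "20s",           "memory": "DDR5-4800"},
    "W9": {"cores": "30+",           "memory": "DDR5-4800"},
}

for tier, spec in XEON_W_TIERS.items():
    print(f"{tier}: {spec['cores']} cores, {spec['memory']} memory")
```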

All of the new Sapphire Rapids CPUs will fit into the same LGA 4677 socket on new W790 chipset motherboards. The platform supports up to 16 DIMM slots and eight SATA ports for hard drives, as well as 2.5GbE and Wi-Fi 6E. While the 2400 CPUs are based on a monolithic die, the higher core counts on the 3400 CPUs are made possible by linking together multiple separate chiplets with Intel's new Embedded Multi-die Interconnect Bridge (EMIB). This process should allow scaling to even higher core counts in single large sockets in the future.

One feature that is missing from these new processors (and nearly all existing Xeons) is Intel's Quick Sync media accelerator. This is what allows hardware encode and decode of HEVC and other video codecs on consumer systems. I am told that these CPUs are fast enough to do that without dedicated acceleration. And workstations usually have discrete GPUs, which now offer similar hardware acceleration in many applications. So while that could have been a significant consideration for video editors two years ago, it shouldn't make much difference now, especially when paired with a modern GPU.

HP Z Workstations
Some of the first systems to use these new technologies are HP's soon-to-be-refreshed Z workstations. The Z4 Gen5 will be based on the 2400 series CPUs, offering up to 24 CPU cores, 512GB RAM and up to two full-sized GPUs. This is a serious step up from the existing peak of 18 cores and 44 PCIe 3.0 lanes. The Z6 line is getting a big change: instead of being based on the server-level Xeon Scalable CPUs with support for a second CPU socket, the Z6 is now based on the single-socket 3400 architecture. And in a move I predicted last year, the Z8 Fury has moved away from dual-socket systems and now has a similar single-socket architecture.

For those who are convinced they need dual-socket performance, there will also be a “traditional” Z8 G5 variant with dual sockets, presumably based on the Sapphire Rapids dual-socket server architecture. But I expect there to be a rapidly diminishing number of customers as culture catches up with technology. (Dual-socket workstations will no longer be a status symbol when there are 100-plus-core CPUs in single-socket systems.)

In this case, the Fury variant is clearly the top option: even with a single socket, it supports twice the RAM and GPU expansion. It also offers a unique dual power supply system that can be used redundantly for uninterrupted operation or in aggregate mode to support high-power-draw configurations, such as quad GPUs.

HP advertises that the systems are optimized for Windows 11 Pro for Workstations, but the underlying hardware from Intel should fully support Windows 10 as well. The new HP systems will all have support for Thunderbolt 4 and the option of adding dual-10GbE interfaces. They will also fully support HP's newest remote administration hardware, the HP Anywhere Remote System Controller. Available as either a PCIe card or an external box, it allows remote system control, out-of-band access, full control of system power, access to the BIOS and other deep-system access. This should help larger organizations better manage the workstations their remote workers are using, similar in some ways to how they manage their servers. The new hardware products will be compatible with older (and even non-HP) systems, but with a subset of functions because their access to the system won't be as deep.

Another new feature of HP’s workstations is the option for front-mounted, hot-swappable, M.2-based NVMe drives. With up to four lockable bays, it can hold a vast amount of high-speed media storage, assuming money is no object. I do find it amusing that “sneakernet” is still a viable approach in certain cases in the modern world — although HP also pointed out that the other use case for removable drives is locking up your data in a safe at night. Admittedly, there is the potential for transfer workflows getting data from a remote shooting set, but those cases are becoming less frequent as the world becomes more connected.

At this point, Starlink should allow real-time dailies transfer from nearly anywhere in the world. But terabytes of source data get generated on-set these days and will eventually need to make it back to media servers, so removable SSDs can offer a convenient way to do that.

Conclusion
For most users, the extra PCIe bandwidth will go primarily toward NVMe storage and GPU processing. These new systems support more GPUs than previous generations, when dual graphics cards were considered the upper limit for most users. So while consumer GPU usage is going the way of CPU sockets — consolidating to single, powerful products with the demise of Crossfire and SLI — pro users commonly need more than two GPUs. The new HP Z8 systems support up to four full-sized GPUs, but other vendors will surely offer solutions supporting even more GPUs than that for smaller market segments.

One or more of Nvidia’s new Ada-based RTX6000 cards will be the highest end GPU of choice for most power users, but a variety of more budget-friendly Ampere options are also available, as well as some AMD-based options. And users can also configure systems with Nvidia’s ConnectX-6 SmartNICs for improved networking and collaboration support over 25/50GbE.

I have been waiting for the successor to the X299 platform for a long time. I had to replace my own dual-Xeon system last summer and went with an Alder Lake solution because I couldn’t imagine stepping back to PCIe 3.0 SSDs after experiencing smooth, uncompressed 8K playback. Admittedly, that consumer system meets most of my current needs, but a W-2400-based system would have futureproofed me for a long time to come. So I expect that this new architecture will provide a welcome performance boost to many users who have been waiting for faster workstations to become available. I look forward to seeing them hit the market two months from now.


Mike McCarthy is a technology consultant with extensive experience in film post-production. He started posting technology info and analysis at HD4PC in 2007. He broadened his focus with TechWithMikeFirst 10 years later.

Review: Nvidia’s GeForce 4090

By Mike McCarthy

As the first product coming to market featuring Nvidia’s new Ada Lovelace architecture, the GeForce 4090 graphics card has a host of new features to test. With DLSS3 for gaming, AV1 encoding for video editors and streamers, and raytracing and AI rendering for 3D animators, there are new options available for a variety of different potential users. While the GeForce line of video cards has historically been geared toward computer gaming, Nvidia knows that the cards are also valuable tools for content creators, and a number of new features are designed especially for those users.

Like its predecessor, the Ampere-based 3090, the new GeForce 4090 Founders Edition card is a behemoth, taking up three PCIe slots and exceeding the full-height standard, so you will need a large case. While at first glance it looks like Nvidia just dropped the new card into the previous-generation shroud and cooling solution, upon closer inspection it becomes obvious that's not true. While Nvidia used the same overall design approach, the company adjusted and improved it in nearly every way.

Most significantly, the 4090 is ¼ inch shorter than the 3090, which is important because I have repeatedly found the 3090 card to be slightly too long for a number of cases — to the point of scratching the edge of my card while squeezing it into spots that were too tight. The shorter length allows Nvidia to use larger fans on the card (116mm instead of 110mm), which are now counter-rotating, with fewer blades and better fluid dynamic bearings. This is all designed to increase airflow and minimize fan noise, which I believe Nvidia has succeeded in doing.

Power Requirements
You will also need a large power supply to support the card’s energy consumption, which peaks at 450 watts. Nvidia recommends at least an 850W power supply, so my Fractal Design 860W unit just barely qualifies. All the new cards use a new 12-pin PCIe 5.0 power connector, which is similar to the connector on the Ampere cards but with slight differences, including additional signaling pins.

While the previous card came with an adapter to direct the current from two 8-pin plugs into the card, the new cards include an adapter to harness the power from up to four 8-pin PCIe power plugs, although only three are absolutely required. The lower-tier GeForce 4080 cards will have lower power requirements but use the same plug. Eventually, this new plug should become a standard feature on new high-end power supplies, simplifying the process of powering the card.

The one thing I don't like about the new plug on the cards is that because it sticks directly out of the top of the card, it further increases the minimum case size rather dramatically. I imagine Nvidia did this because having the plug come out the end of the card would interfere with hard drive bays and cooling systems in many cases (and the underlying printed circuit board doesn't extend past the current connector location anyway). The result is that you need a larger-volume system chassis, although the extra volume should at least help air-cooling efficiency. The card fits well in my case because I was prepared for it after the 3090 didn't fit in my main system two years ago, but I don't think the current power cable solution is very elegant or ideal.

The new cards still use a 16-lane PCIe 4.0 interface because they don't yet saturate the available bandwidth of that connection, which would be needed to justify the expense of using the emerging PCIe 5.0 standard that is available on new motherboards. The only time I could imagine that being an issue is when the connection is being shared between two GPUs in two 8-lane slots, but that multi-card approach to increasing performance is falling out of style in consumer systems. Part of the reason is the complexity of implementing SLI or Crossfire at the application level. But more significantly, individual GPU performance scales much higher than it used to.

Similar to current CPU options, the high-end options now scale to much greater performance than most users will ever be able to fully use. To that point, there has been no mention of using NVLink or similar technologies to harness the power of multiple 40 Series GPUs. This removes most of the need to harness the power of multiple separate chips or cards to increase performance, simplifying the end solution. And the new GPU options scale up very high, with the new Ada Lovelace-based chips inside this new card being up to twice as powerful as the previous Ampere chips.

The Ada Lovelace Chip
The GeForce 4090 is the current top product in the new lineup, with 16,384 CUDA cores at 4nm running at 2.5GHz (which greatly exceeds the previous-generation GeForce 3090's count of 10,496 cores maxing out at 1.7GHz). The 4090 has nearly three times as many transistors at 76 billion and 12 times the L2 cache of the previous version at 72MB. The memory configuration is similar to the 3090, with 24GB running at 1TB/s, but it now uses lower-power memory chips. The other big change since the last generation is the eighth-generation NVENC hardware video encoder, which now supports AV1 encoding acceleration. And now, with dual encoders operating in parallel, content up to 8Kp60 can be encoded in real time for high-resolution streaming.
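Those teraflop numbers follow from the usual rule of thumb of two FP32 operations (a fused multiply-add) per CUDA core per clock. A minimal check, assuming boost clocks of roughly 2.5GHz and 1.7GHz:

```python
def fp32_tflops(cuda_cores, boost_ghz):
    """Peak FP32 throughput: cores x clock x 2 ops per cycle, expressed in teraflops."""
    return cuda_cores * boost_ghz * 2 / 1000

print(round(fp32_tflops(16384, 2.52), 1))  # ~82.6 TFLOPS (GeForce 4090)
print(round(fp32_tflops(10496, 1.70), 1))  # ~35.7 TFLOPS (GeForce 3090)
```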

More details on the AV1 encoding support are below in the video editing section.

New Software and Tools
Users can unlock many of the newest functions available with the Ada hardware through the new software developments Nvidia has been making. The biggest one, most relevant to gamers, is DLSS 3, which stands for deep learning super sampling.

DLSS 2 used AI Super Resolution to decrease the number of pixels that needed to be rendered in 3D by intelligently upscaling the lower-resolution result. DLSS 3 takes this a step further, using AI-based Optical Multi-Frame Generation to generate entirely new frames displayed between the existing rendered ones. This process is hardware-accelerated by Ada’s fourth-generation Tensor cores and a dedicated Optical Flow Accelerator. DLSS 3 uses AI-generated interpolated frames to double the frame rate, even for CPU-bound games like MS Flight Sim. With both optimizations enabled, seven of every eight on-screen pixels were generated by the AI engine, not the 3D rendering engine. This does lead me to wonder: Does it scale the frame or double the frame rate first? I am going to guess it first doubles the frame rate so there is less total data to sort through for the interpolation process, but I don’t actually know for sure.
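The seven-of-eight figure is easy to reproduce: with 2x upscaling in each dimension, only a quarter of each frame's pixels are rendered, and with frame generation, only every other displayed frame is rendered at all. A small sketch assuming those two factors:

```python
def rendered_pixel_fraction(upscale_per_axis, framegen_factor):
    """Fraction of displayed pixels that came from the 3D renderer rather than the AI."""
    return 1 / (upscale_per_axis ** 2 * framegen_factor)

rendered = rendered_pixel_fraction(upscale_per_axis=2, framegen_factor=2)
print(rendered)       # 0.125 -> 1 in 8 pixels is rendered
print(1 - rendered)   # 0.875 -> 7 in 8 pixels are AI-generated
```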

Nvidia’s other software advances are applicable to more than just increasing computer game frame rates. Nvidia Canvas is a new, locally executed version of Nvidia’s previously cloud-hosted GauGAN application. Now there is a wider variety of controls and features, and everything is processed locally on your own GPU. It is available for free for any RTX user.

RTX Remix is Nvidia's toolset for the game-modding community. It uses some very innovative approaches and technology to add raytracing support to older titles, while other hardware-level approaches capture and export geometry and other 3D data to the USD format used by Nvidia's Omniverse, regardless of its source type. While I don't do this type of work, I do play older games, so I am looking forward to seeing if anyone brings these new technologies to bear on the titles I play.

In addition, Nvidia Broadcast was introduced two years ago to use AI and hardware acceleration to clean up and modify, in real time, the audio and video streams of online streamers and even teleconference participants. It uses GPU hardware to do background noise removal and processing on microphone streams, plus visual background replacement and motion tracking on webcam streams. Once again, via some creative virtual drivers, Nvidia can do it in a way that is automatically compatible with nearly every webcam application.

Real-World Performance
I have provided a lot of details about this new card and the chip inside of it, but that still leaves the question: How fast is this card? For gaming, it offers more than twice the frame rates in DLSS 3-supported applications, like the upcoming release of Microsoft Flight Simulator. I was able to play smoothly at 8K with the graphics settings at maximum and got over 100fps in 4K with AI frame generation enabled. The previous flagship 3090 allowed 8K at low-quality settings and about 50fps at 4K with maximum graphics settings, so this is a huge improvement.

For content creation, the biggest improvements will be felt by users working in true 3D. The Blender UI feels dramatically different with the 4090 than it does with the 3090, which gives a pixelated view when navigating freely with full render enabled for the viewport.

Both the Blender and Octane render benchmarks report double the render performance with the 4090 compared to the previous 3090. That is a massive increase in performance for users of 3D applications that can fully use the CUDA and OptiX acceleration.

For video editors, the results are a little less clear-cut. Blackmagic DaVinci Resolve has a lot of newer AI-powered features that run on the GPU, so many of these functions are about 30% faster with the newer hardware. This could be significant for users who frequently use tools like cut detection, auto framing, magic mask or AI speed processing. This performance increase is in addition to the new AV1 encoding acceleration in NVENC, which will significantly speed up exports to that format.

The improvements in Adobe Premiere Pro — where AV1 acceleration can be harnessed via the upcoming Voukoder encoding plugin — are much more subtle. But most Adobe users won’t see huge performance or functionality improvements with these new cards without further software updates, specifically updates that allow native import and export of AV1 files.

AV1
AV1 is a relatively new codec that is intended to improve upon and, in many cases, replace HEVC. It produces higher-quality video at lower bit rates, which is important to those of us with limited internet bandwidth. And it comes with no licensing fees, which should accelerate support and adoption. The only real downside to AV1 is the encode and decode complexity. That’s where hardware acceleration comes into play.

The 30 Series Ampere cards introduced support for accelerated AV1 decoding, which allows people to play back AV1 files from YouTube or Netflix smoothly. But the GeForce 4090 card is the first with the eighth-generation NVENC engine, which now supports hardware-accelerated encoding of AV1 files. This can be useful for streaming applications like OBS and Discord, for renders and exports from apps like Resolve and Premiere, and even for remote desktop tools like Parsec — all of which run on NVENC.

I for one am looking forward to the improved performance and possibilities that AV1 has to offer as it gets integrated into more products and tools. At higher bitrates, AV1 is not significantly better than HEVC, but at lower bitrates it makes a huge difference. This review was my first hands-on experimentation with the codec, and I found that a 5Mb/s data rate was sufficient to capture UHDp60 content, while 2Mb/s was usually enough for the 24fps UHD content I was rendering. I would usually recommend twice those data rates for HEVC encoding, so that is a significant reduction in bandwidth requirements.
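To put those bitrates in perspective, here is the storage arithmetic for an hour of UHD footage at the rates I used, versus the doubled rates I would normally recommend for HEVC. These are my own rough figures, not a formal codec comparison:

```python
def gigabytes_per_hour(mbps):
    """Convert a video bitrate in Mb/s to approximate storage consumed per hour, in GB."""
    return mbps * 3600 / 8 / 1000

for label, mbps in [("AV1 UHDp60", 5), ("HEVC UHDp60", 10),
                    ("AV1 UHDp24", 2), ("HEVC UHDp24", 4)]:
    print(f"{label}: ~{gigabytes_per_hour(mbps):.2f} GB/hour")
# Halving the bitrate halves the storage and upload bandwidth
# for what I found to be comparable quality.
```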

The future is here.

Summing Up
If you are doing true 3D animation and rendering work in an application that supports raytracing or AI denoising, then the additional processing power in this new chip will probably change your life. For most other people, especially video editors, it is probably overkill. A previous-generation card will do most of what you need at what is hopefully now a lower price. But if you need AV1 encoding or a few of the other new features, it will probably be worth it to spring for the newest generation card — just maybe not the top one in the lineup.

For those who want the absolute fastest GPU available in the world, there is no doubt that this is it. There are no downsides. It is only a matter of whether you can justify the price and whether it will fit in your system.  But it is screaming-fast and full of new features.


Mike McCarthy is a technology consultant with extensive experience in film post-production. He started posting technology info and analysis at HD4PC in 2007. He broadened his focus with TechWithMikeFirst 10 years later.

Nvidia Introduces Ada Lovelace GPU Architecture

By Mike McCarthy

Nvidia has announced its next-generation GPU architecture named after Ada Lovelace, who was an interesting figure in early computer programming (check out her Wikipedia page). Nvidia's newest Ada Lovelace chips have up to 18,432 CUDA cores and up to 76 billion transistors at 4nm sizes. This should of course lead to increased processing performance at lower prices and power usage.

The new changes at the SDK and rendering level encompass DLSS 3.0 for super sampling; RTX Remix for adding new rendering features to mods of older games; shader execution reordering for improved performance; displaced micro-meshes for finer shadow detail; opacity micromaps for cleaner raytracing; and Nvidia Reflex for coordinating the CPU with the GPU. The biggest render function unique to the new generation is optical flow acceleration, which enables AI-generated frames for higher frame rates.

Ada Lovelace chips have optical flow accelerators in the hardware, and Nvidia is training the AI models for this technology on its own supercomputers. Raytracing is now far more efficient with the newer-generation RT cores. RTX Racer will be a tech demo application released later this year for free and will leverage all these new technologies. RTX Remix can extract 3D objects right from the GPU draw calls and create USD assets from them. It also adds raytracing to older games by intercepting draw calls from DirectX. Users can further customize any RTX mod in real time by adjusting various render settings. As someone who usually plays older games, this is exciting, as I suspect it will lead to all sorts of improvements to older titles.

New GeForce Cards
The main products headlining this announcement are the new GeForce 4090 and 4080 cards, which should far outclass the previous generation released two years ago. Also, contrary to numerous rumors, they should consume less energy than the existing, power-hungry Ampere cards. The 4090 will have 24GB of memory like the previous generation, while the 4080 will come in 12GB and 16GB variants, with the 16GB version offering a more powerful chip and not just more RAM. Even the lower class of 4080 outperforms the existing 3090 Ti in most cases.

The new cards will have a PCIe Gen5 power connector, which offers up to 600W of power, but the cards draw much less energy than that. They do not have PCIe Gen5 slot connectors, and this is because they have yet to saturate the bandwidth available in a PCIe Gen4 slot.

AV1 Encoding
One of the other significant new features in this generation of chips is the addition of hardware acceleration for AV1 encoding. AV1 decoding support is already included in the existing Ampere chips, but this is the first hardware encoder available outside of Intel’s hard-to-find discrete graphics cards. AV1 is a video codec that claims to offer 30% more efficient compression than HEVC while being open-source and royalty-free.

Netflix and a few other large tech companies have been offering AV1 as a streaming option for a while now, but it has not been much of an option for smaller content creators. That is about to change with new software coming, like a hardware-accelerated AV1 encoding plugin for Adobe Premiere Pro and Blackmagic DaVinci Resolve. Integrated AV1 streaming support that will use the new hardware acceleration is also coming to OBS and Discord.

Resolve now uses the GPU for RAW decode, AI analysis and encoding, making it a real GPU computing powerhouse. I imagine that YouTube will soon have a lot of AV1 content streaming through it. The new cards have dual encoders that work in parallel by dividing each frame between them, allowing up to 8Kp60 encoding in real time. I assume that in the future, lower-end cards will have a single encoder for 8Kp30, which should be good enough for most people.

Professional RTX
There is a new RTX 6000 professional GPU coming in December, not to be confused with the identically named Turing-based card from two generations ago. Nvidia's product naming has really gone downhill since they dropped the Quadro branding. But regardless of what it is called, the new RTX 6000 should be a very powerful graphics card, with Nvidia claiming up to twice the performance of the current A6000. It has a similar underlying Ada Lovelace chip to the 4090 but with a lower 300W power envelope and a more manageable size, thanks to a two-slot cooling solution.

So there is a whole new generation of hardware coming, and it will get here soon. Both Intel and AMD are releasing their next generation of CPUs, and we will have new graphics cards to go with them. Even if you can’t afford a new high-end Ada Lovelace GPU, hopefully this will drive down the prices for the previous-generation cards that have been so difficult to find up to this point due to the cryptocurrency craze. One way or another, faster GPUs are coming, and I am looking forward to all that they bring to the table.


Mike McCarthy is a technology consultant with extensive experience in film post-production. He started posting technology info and analysis at HD4PC in 2007. He broadened his focus with TechWithMikeFirst 10 years later.

Review: Professional Ampere GPUs from Nvidia

By Mike McCarthy

Nvidia has quite a big selection of professional GPUs available based on its Ampere generation of chips. While this offers users finer gradations in pricing and performance, it can be more confusing than previous generations, especially since they have dropped the Quadro branding.

My understanding is that one of the main reasons there are so many options is not just because of the binning of chips but because of supply chain issues with the rest of the parts on the board. Unlike gaming cards, where a source part can be swapped and a new revision of the card can be produced without much issue, the professional cards have been certified by software vendors with very precise conditions, and Nvidia must maintain those exact specifications. So different versions are created using easier-to-source parts and then certified again, allowing both cards to be produced as separate options.

The main additions to the series are the A4500 and the A5500, which fit as expected between the existing A4000, A5000 and A6000 cards. The A4500, which I have been testing, sits nearly dead center between the A4000 and A5000 on all paper specs (cores, memory, teraflops, etc.), while the A5500 nearly matches the A6000 in processing power, but with the same memory limits as the A5000. While these new cards were announced in the spring, I am finally getting the chance to test one out now.

The A4500 is a step above my previous favorite option, the single-slot A4000, which I got to try out in the Boxx S3 last spring. The A4500 is more powerful, requiring two slots and an eight-pin PCIe power connector. It has 16% more CUDA cores and 25% more memory, at 7,168 cores and 20GB of GDDR6 RAM. This raises it to a total of 23.7 teraflops of single-precision compute power, which is nearly 50% more than the last generation's top Quadro RTX 6000/8000. So Nvidia has clearly been seeing some tremendous performance gains in the newest generation of cards. The fact that it requires two slots is less of an issue than it was in the past, as many motherboards are now designed for this, and the two-slot cooling solution should be quieter under load, which could benefit editors.

Besides the obligatory PCIe 4.0 x16 connector, the A4500 also adds an NVLink connector that the A4000 does not have, allowing two A4500 cards to be combined for 14,336 cores accessing 40GB of RAM — if your application can leverage the cards in parallel, which mine don't do very well. But in that case, a pair of A4500s should be able to exceed the performance of a single A6000 for a much lower cost. Speaking of cost, Google lists the MSRP at $2,500, but unlike most recent GPU price history, these cards are listed for sale at much lower prices than that. It is an interesting time to buy a GPU as markets adjust to shifts in cryptocurrency mining, but that means there should be some good deals available as that sorts itself out.
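The combined-card math is simple enough to sanity-check, assuming the application can actually scale across NVLink, and using 10,752 cores and 48GB as the comparison point for the A6000:

```python
# Rough paper comparison of an NVLinked A4500 pair against a single A6000.
a4500 = {"cores": 7168, "memory_gb": 20}
a6000 = {"cores": 10752, "memory_gb": 48}

pair_cores = 2 * a4500["cores"]        # 14,336 CUDA cores
pair_memory = 2 * a4500["memory_gb"]   # 40GB, pooled only if the application supports it
print(pair_cores, "cores,", pair_memory, "GB")
print("More cores than an A6000:", pair_cores > a6000["cores"])
```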

I am testing this card in my new 12700K-based test bed, which admittedly makes it harder to make direct comparisons to my earlier data. But for most video editors, any A-series GPU will be more than sufficient for their needs, and it really comes down to finding a good deal. Unless you are manipulating huge 3D models (in which case, GPU memory should dictate your selection), you are probably not going to see a huge performance difference in editing applications. My battery of benchmark tests in the Adobe applications returns very similar performance across all of Nvidia’s Ampere cards.

One other benefit over the GeForce cards is the option to use the RTX Desktop Manager and the new RTX Experience software. This replaces Quadro Experience as the professional version of Nvidia's popular GeForce Experience utility. While it can keep your drivers up to date, the feature that I think is most valuable is the option to do desktop screen captures at up to 8K resolution and — more significantly — with support for HDR. This is a big deal to me, as I make tutorials about HDR editing, and previous versions either didn't support capturing from certain applications or tagged the output files incorrectly, making it harder to edit them in Premiere.

Summing Up
There are a lot of options available in the professional series of cards of the Ampere generation, which should be a benefit to end users both in terms of maximizing value and availability — after years of that being a key issue in the GPU space, courtesy of cryptocurrency mining demand. And any of these cards could be highly recommended to any video editor or VFX artist. One benefit of stepping up to the dual-slot solution in the A4500 is that it should run quieter than the single-slot A4000 at full load, which might be a factor for some editors who have their workstations near their listening environments. But any of the A-series (formerly Quadro) cards will work great in the Adobe apps and in most other NLEs.


Mike McCarthy is a technology consultant with extensive experience in film post-production. He started posting technology info and analysis at HD4PC in 2007. He broadened his focus with TechWithMikeFirst 10 years later.

Review: Boxx’s Alder Lake Apexx S3 Workstation

By Mike McCarthy

I have spent the past few weeks using Boxx's newest iteration of its Apexx S3 workstation, and it is quite impressive. Based on Intel's "Alder Lake" CPUs with DDR5, it is technically a workstation built from gaming-class hardware, but this just makes it more power-efficient and budget-friendly. And with a 5GHz Core i9 CPU and 64GB of RAM, it doesn't lack in performance.

I have been interested in trying out a Core i9-based desktop system for quite a while, but the right opportunity hadn’t come along. My last consumer desktop was a Pentium 4 system I built in 2004, and I have used dual-socket Xeon workstations both at work and at home ever since. I have also reviewed two single-socket workstations recently, but these were still high-end professional architectures that didn’t have the economies of scale to allow them to compete bang-for-buck with more broadly available consumer systems.

In 2019, I reviewed an AMD Ryzen-based desktop, Boxx's A3 workstation, and didn't have comparison data from a similar Intel machine. I wanted to test the equivalent ninth-gen Intel 9900K at the time, but three generations later, I am now finally gathering that data thanks to a 12th-gen Intel CPU. A lot has changed in the interim, with more cores, higher bandwidth and new OS and application releases, but this allows me to see what professional content creation tasks can be accomplished effectively on a consumer-targeted architecture designed for gaming performance.

The basic question is: How far can a consumer-level architecture be pushed before you have to step up to a more expensive professional solution? You don't find this answer by testing the higher-end options, although that data is needed for comparison purposes. And to be fair, the system I am testing, the Boxx Apexx S3, is about as high-end as a consumer-architecture system can be. But it is still based on Intel's "consumer" chipset and CPU and should offer a reasonable look at what most other similar systems should be able to do in optimal conditions.

Similar to the A3 I reviewed two years ago, Boxx’s S3 workstation comes in a compact tower case with integrated liquid cooling for the CPU. The Core i9-12900K is a 16-core processor with support for both DDR5 memory and PCIe 5.0 interface. Although PCIe 5.0 isn’t supported widely enough to be functional yet, it will be useful in the future. It increases overall available bandwidth, which is an issue in consumer systems, especially when it comes to PCIe slots. On the topic of PCIe cards, Boxx sent me the machine equipped with an Nvidia A4000 GPU, which is the professional equivalent of the GeForce 3070 Ti.

I was also planning to test it with my existing GeForce 3090 installed for comparison purposes, but with the smaller-size case, it was too tight to install that behemoth of a GPU in it without modifying the chassis. It was only about 1mm short, so if it had been my permanent system, I definitely would have been able to make it work — but not without some permanent bending or cutting that I didn’t want to do to this review unit. But it didn’t really matter, as the A4000 was more than powerful enough for anything I wanted to throw at it and is as fast a GPU as any video editor should need. Only people working in true 3D should need anything faster. I usually recommend GeForce cards unless a user needs a specific pro feature, but with the inflated price of GPUs right now, the premium for the pro version is minimal, and the single-slot solution with professional drivers and support is much more attractive than usual.

There are three other PCIe slots available for other cards (two x8 and one x1 slot). Other cards that are relevant to video editors would be SAS or SATA RAID cards for large media storage volumes, a high-speed NIC for sharing data with other editors, and an I/O device for viewing that content on pro displays. The system comes with dual 2.5GbE ports, which should meet the needs of many users, but I also installed a 10GbE card for connecting to my 64TB storage server at higher speeds to test large projects on the system.

I have used AJA’s Kona 5 and other cards in the past but am not testing with one now. Nvidia’s HDMI output is sufficient for me at the moment since Adobe Premiere now supports HDR displays over that output, but SDI-based users will want a dedicated video I/O interface. If I were configuring this system for an individual editor, they would undoubtedly have a RAID card for storing their media. But users who are part of a team are likely using the network interface to connect to a shared storage solution, or both, to share their own local media volumes with others.

SSD
Speaking of local volumes, this system came configured with a 1TB PCIe 4 NVMe SSD, which I consider essential for any performance system. Boxx’s choice, the Samsung 980 Pro, seems widely regarded as the best SSD option available. Your OS doesn’t need the multi-GB/s bandwidth it offers, but it does benefit from the lower latency and higher I/O count of an NVMe disk. And while you might be able to get by with a 512GB system drive, applications continue to balloon in size, so 1TB is probably the sweet spot at the moment.

For those who want more SSD storage capacity, the three M.2 slots in the system could support up to 24TB of internal SSD storage now that 8TB M.2 cards are on the market, and that is before you look at PCIe add-in cards. But Boxx only configures it with 2TB cards, topping out at 6TB. Instead, they offer up to two 10TB SATA drives, which should be more cost-effective for most users. SSD drives in those capacities are very expensive, so there is still a place for good old high-capacity spinning disks when managing large amounts of media. The system can be configured with two internal hard disks, which could offer up to 20TB of storage if RAID’ed together. Anything larger than that will require an external solution connected via USB, SAS/SATA, Ethernet or Thunderbolt. (Or you could step up to Boxx’s larger S4 model, which offers more than twice as many drive bay options.)
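The capacity math for the internal options breaks down simply, assuming 8TB M.2 modules and striping the two SATA disks together:

```python
# Internal storage ceilings for this chassis (capacities as discussed above).
m2_slots, max_m2_tb = 3, 8
sata_bays, sata_tb = 2, 10

print(m2_slots * max_m2_tb, "TB of theoretical NVMe capacity")     # 24 TB
print(sata_bays * sata_tb, "TB of spinning disk in a RAID 0 set")  # 20 TB
```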

Memory
Boxx’s newest version of the S3 is based on ASRock’s Z690 Taichi, meaning that it has Intel’s highest-end Z690 chipset supporting PCIe 5.0 and DDR5. This system came with two 32GB sticks of DDR5-4800 memory for maximum performance at dual-channel settings. Using all four DIMM slots has a performance limitation on Intel’s newest architecture for reasons I haven’t really wrapped my head around. My only theory is that Intel is pushing the limit as far as it can, and two sticks is better.

As a Premiere editor, I would recommend users have at least 32GB of RAM, and ideally 64GB. I have 128GB on my dual-socket systems for large projects, but that might be overkill for most users. Adobe After Effects, on the other hand, can benefit from even more than that if you are processing larger frame sizes, higher bit depths or lots of layers. Unfortunately, DDR5 is currently much more expensive than DDR4 (to the tune of triple the price) for fairly minimal benefit that I am aware of. So the 64GB of DDR5 in this system is an optimal choice.

Thunderbolt
The motherboard also supports two Thunderbolt 4 ports, which I consider an essential feature for editors but less so for VFX artists and other positions. Thunderbolt is a fast interface for connecting large storage volumes, and enough Mac users are exchanging data on Thunderbolt drives that editors need the option of connecting those devices to their systems.

Thunderbolt is also the interface for I/O devices like the AJA T-Tap Pro for connecting to professional displays and other gear via SDI. Lack of Thunderbolt support would be a disqualifying limitation for many users, but this system has full support for it. It also has six USB 3.2 ports, dual 2.5GbE NICs, Wi-Fi, surround audio, seven SATA ports, four PCIe slots and three M.2 slots.

Windows 10 vs. 11
The system came with Windows 10 installed, which is what I would usually recommend for most users, but after my initial benchmarks, I upgraded to Windows 11 to see if it would make any noticeable difference in performance. One significant change in Windows 11 that is relevant to this processor architecture is new recognition of the difference between performance ("P") and efficiency ("E") cores. P-cores are hyperthreaded and more responsive for foreground applications, while E-cores are for offloading simpler background tasks. Windows 10 doesn't recognize the difference and therefore might assign complex high-priority tasks to the slower E-cores, not realizing that it will take those cores far longer to complete the task. One way around this is to disable the E-cores, either for a particular application or entirely, to force all of the processing tasks onto the faster P-cores.
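For anyone staying on Windows 10, one way to approximate that workaround in software is to pin a heavy process to the P-cores with the psutil library. This is a rough sketch, not an official Intel or Microsoft tool, and it assumes the 12900K's eight hyperthreaded P-cores show up as logical processors 0-15, which is how they are typically enumerated; verify the mapping on your own machine first:

```python
import psutil

# Assumption: on a 12900K, logical CPUs 0-15 are the hyperthreaded P-cores
# and 16-23 are the E-cores. Check your own system's enumeration before using this.
P_CORE_CPUS = list(range(16))

def pin_to_p_cores(pid):
    """Restrict a process (e.g., a render or encode job) to the performance cores."""
    proc = psutil.Process(pid)
    proc.cpu_affinity(P_CORE_CPUS)
    print(f"{proc.name()} limited to logical CPUs {P_CORE_CPUS}")

# Example: pin this script's own process.
pin_to_p_cores(psutil.Process().pid)
```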

With Windows 11, this shouldn't be an issue, and higher-priority tasks should automatically be assigned to the performance cores. I was curious to measure this effect in person, hence my request for the system with Windows 10, with the intention of retesting after upgrading. I did not expect it to make a big difference and was quite surprised to see that the upgrade to Windows 11 resulted in an average of a 20% increase in rendering performance. I also had some stability issues in Windows 10, with occasional lockups and blue screens, but after upgrading to Windows 11 and updating the BIOS, I never had another issue with the system. I ran into some obstacles when updating the BIOS, which would have been show-stoppers in the past, but the motherboard's out-of-band BIOS flashing functionality performed flawlessly, and I was up and running minutes later. So sometimes changes to technology can be great.

It is also worth noting that the system came with Windows 10 Pro for Workstations. The Workstation edition adds support for RDMA networking, ReFS drive format and NV-DIMM persistent memory, but none of those features are relevant to this use case.

Performance was impressive, especially in the Adobe applications, even in Windows 10. I have a series of standard tests I run in Premiere Pro and Media Encoder, and those tests are based on taking large RAW camera footage clips, processing them with effects and encoding them to HEVC. (RAW 8K to HEVC encoding was the best torture test I could dream up a few years ago.) But with Adobe's new version 22 releases, that workflow is fully GPU-accelerated, all the way to 10-bit HDR HEVC output. While this is great, it makes it harder to do an apples-to-apples comparison with previous systems I have tested. Regardless, this machine outperforms any system I have ever tested in Premiere Pro and Adobe Media Encoder.

The 5GHz processor frequency weighs heavily in its favor compared to the higher core counts of the more expensive workstations I have been reviewing recently. For applications (like Maxon's Cinebench) that can scale perfectly across more threads, those extra cores make a bigger difference. But for many editing tasks, raw single-threaded performance is a more significant factor.

We can see that upgrading to Windows 11 cut render times by between 5% and 30%, depending on the benchmark. This is presumably because Windows 10 loses performance by evenly distributing computational tasks between the performance and efficiency cores, which causes the efficiency cores to drag down overall performance. This is most clear in the Cinebench R15 single-thread test, which apparently got assigned to an E-core in Windows 10, because we see a 50% performance improvement after the upgrade. I retested to confirm it wasn't a glitch but couldn't retest in Windows 10 at that point. We also see that the more effects (CPU tasks) I add to a sequence, the bigger the difference the OS upgrade makes, because much of the rest of the processing is done on the GPU, which is less affected by the OS version.

So while I wouldn’t recommend most users upgrade to Windows 11 without good reason, Alder Lake system owners do have good reason to make the jump to the newer OS. But regardless of the OS, this is the fastest system I have ever tested, at least for Adobe applications, which is my primary use case. In synthetic benchmarks like Cinebench, and surely in many other real 3D applications, there will be benefits to having more cores on higher-end architectures, but Adobe users can select Alder Lake-based systems with zero compromise in regard to performance.

Graphics
The new 12th-gen processors also include Intel's UHD Graphics 770 hardware (except for the F variants, which are a few dollars cheaper). I have no intention of using the integrated graphics to drive my displays because I have powerful PCIe-based GPUs for my graphics processing, but you can use Intel's Quick Sync media engine in Adobe software independent of the display outputs. Intel Quick Sync includes hardware-accelerated encode and decode of H.264, H.265/HEVC and a variety of other formats. Most Nvidia GPUs offer similar capabilities, and Premiere now supports both vendors' hardware. But Intel's iteration is slightly more flexible, offering support for 4:2:2 color space processing, while Nvidia is limited to 4:2:0 and 4:4:4. 4:2:0 is used for media delivery and playback streaming, while 4:4:4 is used for high-end UI streaming, but 4:2:2 sits in the middle and is popular in broadcast environments.
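The difference between those subsampling schemes is easy to quantify: relative to 4:4:4, 4:2:2 keeps half of the chroma samples and 4:2:0 keeps a quarter. Here is a small sketch of the uncompressed data cost, which ignores codec compression entirely:

```python
# Average samples stored per pixel (luma plus chroma), before any compression.
SAMPLES_PER_PIXEL = {"4:4:4": 3.0, "4:2:2": 2.0, "4:2:0": 1.5}

def raw_rate_gbps(width, height, fps, bit_depth, subsampling):
    """Uncompressed video data rate in Gb/s for a given chroma subsampling scheme."""
    samples_per_second = width * height * SAMPLES_PER_PIXEL[subsampling] * fps
    return samples_per_second * bit_depth / 1e9

for scheme in SAMPLES_PER_PIXEL:
    print(scheme, round(raw_rate_gbps(3840, 2160, 60, 10, scheme), 2), "Gb/s")
# 4:2:2 carries a third less data than 4:4:4 while keeping more color
# detail than 4:2:0, which is why broadcast workflows favor it.
```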

4:2:2 HEVC acceleration is currently limited to Intel CPUs, and specifically this Alder Lake generation, but it is supported in Premiere, and I did see a noticeable difference in playback performance of those media files when it was enabled. (Smooth playback of multiple layers and much less CPU and power usage.) So as more cameras record in that format, this acceleration will become more useful to editors.

Power Usage
The system only used 120 watts at idle, but that jumped to 400 watts at full load. The idle power usage is much improved over some of the professional CPU architectures I have been testing recently, while still spooling up to an impressive level of peak performance. The system also supports sleep, allowing users to quickly resume their work where they left off after taking a break.

Summing Up
This system costs $6700 in the exact configuration that was sent to me, but money could be saved by opting for a lower-end GPU instead of the A4000. Money could also be saved with a lower-end CPU or less RAM, but I would definitely recommend against that, as it will affect your performance. I would consider the 12900K CPU to be the most budget-conscious solution for a system that could still be considered high-end. I would use this system to edit 8K video, full-size feature films and other such projects.

The main reasons I could imagine for needing to upgrade to a higher-level system would be more storage and connectivity options (more PCIe cards), or for 3D VFX artists who need more GPUs, more than 128GB of RAM or more cores. But for actual video editors, 16 cores are enough for most NLE apps, and 128GB is enough to run all my Adobe apps concurrently. So I would give tricked-out Alder Lake systems like this Boxx Apexx S3 my highest recommendation as the optimal system design for top-end video editors.


Mike McCarthy is a technology consultant with extensive experience in film post-production. He started posting technology info and analysis at HD4PC in 2007. He broadened his focus with TechWithMikeFirst 10 years later.

Review: Single-Socket Workstations from Boxx and Lenovo

By Mike McCarthy

I have had the opportunity to test the two most powerful single-socket workstations available on the market today. Last year's review of the Lenovo ThinkStation P620 looked at what is still the only new desktop workstation to be released by a major system vendor since 2017. Its Threadripper Pro 3995WX 64-core processor is the pinnacle of AMD's Threadripper CPU lineup, and the P620 is the only Threadripper-based system available from a major manufacturer. The Threadripper Pro lineup has since been refreshed to the 5000-series CPUs, and while they are slightly faster, they still use the same architecture, motherboards and chipsets.

I had a chance to test out Boxx’s Xeon W-3300-based Apexx Matterhorn workstation. With 38 cores, eight channels of memory and 64 PCIe lanes, the Xeon W-3375 processor is the single-socket version of Intel’s newest “Ice Lake” Xeon chips.

Each system design has its strengths, and each system has a few caveats I have discovered. Some of these features and issues are a result of Intel and AMD, and others are from the implementations by Lenovo and Boxx, which based its system on a Supermicro motherboard. I have had a year to find all of the peculiarities on the P620, while I have only had the Boxx system for about two months. Even so, I have compiled as much data as I can on each one to make as thorough of a comparison as possible.

I have both the GeForce 3090 and A6000 Nvidia GPUs to test with, as well as a few others, so I was able to swap between them to see whether the A6000 was any faster than the GeForce and how much the GPUs impacted the performance measurements. While there are surely cases where the A6000’s professional drivers would offer better performance in certain 3D applications, I did not see a major performance difference between them in any of my tests.

This means that a) I recommend the cheaper GeForce option for most users, and b) I didn't publish separate benchmarks for both GPUs since the results were within a couple percentage points of each other. The only notable difference I did find is that the boot problems I have with the AMD system when an 8K display is attached are solved when I use the A6000 instead of the GeForce card.

The Lenovo system was shipped to me with 32GB of RAM, only using two of the eight memory channels, which is likely capping performance in certain cases. It also has a PCIe 3.0 SSD since the 4.0 ones were harder to come by back then. The Boxx has 64GB of memory spread across all eight channels, which should help. It also has a PCIe 4.0 SSD, which has twice the peak transfer rate for high-resolution uncompressed playback tests. So those are some of the configuration-based limitations that could impact the results, which might not apply if you configure and buy a new system.

Benchmark | Boxx Apexx Matterhorn | Lenovo P620 Threadripper Pro
Cinebench R15 Multi-Core | 6215 | 9649
Cinebench R23 Multi-Core | 40040 | 60120
Puget AE Benchmark | 1773 | 1151
Adobe Media Encoder (HEVC-NVENC) | 10:03 | 11:02
Adobe Media Encoder (HEVC 10-bit Software) | 41:22 | 39:19

Both systems play back my 8K assets and various camera Raw files in Adobe Premiere Pro in real time without issue. The Xeon plays back 8K DPXs, but I am confident the Lenovo could do that too if it had a PCIe 4.0 SSD. Both have integrated 10GbE ports with support for NBase-T, although the Boxx system has an extra Gigabit Ethernet port. The Boxx system supports IPMI system management, but Lenovo has its own set of system management tools that I am less familiar with.

The AMD system has the option for Thunderbolt 3 support via an add-in card, but due to motherboard limitations, the Intel system does not. This is ironic considering Intel created Thunderbolt. There might be other motherboard choices that do support Thunderbolt, but I can’t find any online.

Here is a point-by-point comparison of some of the differences:

Boxx Apexx Matterhorn (Xeon W-3375) | Lenovo P620 (Threadripper Pro 3995WX)
38 cores from 2.5-4.0GHz (10nm) | 64 cores from 2.7-4.2GHz (7nm)
8 channels of DDR4-3200 ECC memory | 8 channels of DDR4-3200 ECC memory
64GB RAM (16 slots, 4TB max) | 32GB RAM (8 slots, 2TB max)
64 PCIe 4.0 lanes, 7 x16 slots, 4 M.2 slots | 128 PCIe 4.0 lanes, 7 x16 slots, 2 M.2 slots
Liquid cooling, 5 fans, 1,600W power supply | Air cooling, 5 fans, 1,000W power supply
Requires the High Performance power profile | Throttles up properly in the Balanced profile
160-240W at idle, 250-420W at load | 240W at idle, 400-500W at load
No sleep or hibernate options (Supermicro) | Sleeps and hibernates well
Runs either the A6000 or GeForce 3090 well | Has boot issues with an 8K display on some GPUs
Opens apps much faster | Takes over a minute to open Adobe apps
Runs smoothly in my experience | Has issues creating folders, zipping files, saving AI files
No Thunderbolt option (Supermicro) | Has a Thunderbolt add-in card available
Faster AE benchmarking | Faster Cinema4D benchmarking

Neither is a clear winner over the other, and different motherboard manufacturers might deliver different results for certain functionality. The AMD processor is better for animation rendering, but the Xeon performs better in Adobe After Effects. The difference is negligible in Premiere Pro, which is my primary application, so there is not much distinction there.


Although I don’t recommend updating to Windows 11 without a specific reason to make that change, I did investigate it for these systems. The Boxx system requires that you install a TPM 2.0 card to support Windows 11. This will, of course, be included once Boxx starts supporting Windows 11, but it didn’t come in my review system. The P620 does support upgrading to Windows 11, and the trick there was to avoid it until I was ready. I put off all of Microsoft’s promptings until I was finished with all my other tests in case anything went wrong. I usually don’t recommend upgrading an OS compared to a fresh install, but I figured I would give it a try. The operating system updated fine, but I did have some issues in Adobe Media Encoder and a few other apps that weren’t ready for the transition, so I recommend that most professional users stick with Windows 10 for now.

With the Threadripper system, I did have a lot of delays when making new folders, but that appears to have been helped by turning off “Show frequently used folders in Quick Access” in File Explorer Options. It still struggles with certain folders, and I haven’t found the cause. Other basic tasks have given me issues as well, like zipping files and saving Illustrator files intermittently hanging for 60 seconds. But then it goes and plays back 8K HDR footage flawlessly from the AJA Kona 5 card.

I also had lots of problems with the Premiere interface for a month or two, but it turns out they were caused by a bug in Premiere 15 that only kicks in when a control surface is connected. So plugging in the Loupedeck+ console was the cause, not the change to an AMD-based system. It is so much quieter and more efficient than my rack-server-based beast of a home-grown workstation, and unlike the server or the Boxx workstation, it can be put to sleep or hibernated.

If you want a system from a larger system vendor, the P620 was the only game in town, being the first workstation from HP, Dell or Lenovo to support PCIe 4.0, among other features. Dell has since released the similar Precision 7865. If you include smaller manufacturers in your search, that opens up options based on Intel’s Ice Lake Scalable Xeons and AMD Epyc CPUs, but the single-socket Xeon W chips featured in the Apexx Matterhorn are probably a better fit for most workstation users.


My Findings
If I had to buy a system in that class for myself today (presumably for PCIe-lane reasons), I would buy the Lenovo P620 with the 16-core option and add a GeForce 3070 if I couldn’t use my existing parts. That is because it is cheaper and scales down further than the Xeon option, and it sleeps and offers a nearly instant boot so you can get back to work quickly.

That said, many Adobe users would probably be better served by a system a few steps down from the ones covered here. The first step down would be the standard Threadripper processors, with half the channels of RAM and PCIe lanes of the Pro series. This actually isn’t a huge cost savings because Lenovo has put the Threadripper Pro into a well-designed and reasonably cost-effective package. Below Threadripper, there are Core X-based HEDT systems from Intel, which are placed firmly between these systems and consumer-level options. But the Core X299 platform was released in 2017 and lacks basic features like PCIe 4.0, among other things. I have set up HP Z4 systems that have worked well for clients who edit 4K projects, but that was years ago. Below that, we have consumer-level Core i9 and Ryzen systems that now feature up to 16 cores and 128GB RAM. These probably could meet the needs of more users than you would expect.

Looking upward, the Boxx solution supports four full, double-wide GPUs plus a sync card, which clearly exceeds the Lenovo if your application can use that much processing power effectively. So the Boxx clearly scales higher and can be much “faster” than the Lenovo if you add enough GPUs. That helps in maximizing potential performance before you need to jump to the next tier with a dual-socket system, which increases the price even more substantially, but the Lenovo design might appeal to a broader set of users. There is a much wider variety of workstation options available than there used to be, but you really need to understand your own processing needs, based on your target applications and workflows, to find the right solution for you. These systems are clearly the fastest single-socket systems available today, but they aren’t necessarily the best solution for every user.


Mike McCarthy is a technology consultant with extensive experience in film post production. He started posting technology info and analysis at HD4PC in 2007. He broadened his focus with TechWithMikeFirst 10 years later.

 

 

 

 


Hardware-Accelerated HEVC (H.265) in Premiere Pro

By Mike McCarthy

The High Efficiency Video Codec (HEVC), or H.265, is a processing-intensive codec for both encode and decode that delivers higher video quality at lower data rates. There have been both CPUs and GPUs available for years that have dedicated hardware within them to accelerate HEVC encoding and decoding, but this hardware acceleration requires specific support within software applications. And unlike with software encoders, there is a finite set of encoding options that can be accelerated, each of which has to be explicitly supported. The newest updates to Premiere Pro have increased the number of hardware-accelerated options for HEVC workflows, greatly improving performance with those types of files.

CPU-Based Codec Acceleration
Premiere Pro has had CUDA-based GPU acceleration for over a decade, since CS5, but it did not use Nvidia’s accelerated encode and decode hardware until recently. Adobe started with Intel’s hardware-based acceleration for H.264 and HEVC encoding in Version 13, which was limited to 4K at 8-bit on CPUs with Quick Sync video processing. This, from my perspective, was a feature aimed primarily at laptop chips, since high-end Xeon CPUs don’t support Quick Sync, including the newest W-3300 chips.


The quality was also inferior to software encodes in the initial release, but that was fixed shortly thereafter. The next step was hardware-accelerated decoding of H.264 and HEVC, which made editing with those codecs much more doable on less powerful systems, especially when it came to scrubbing through footage, which is usually rough with long GOP compression formats.

GPU-Based Codec Acceleration
Then in June of 2020, Adobe added GPU encoding acceleration to Premiere Pro 14.2, which gave support for hardware acceleration of H.264 and HEVC encoding with both Nvidia and AMD graphics cards, regardless of your CPU. This capability was much more applicable to high-end workstations, which don’t have Intel’s consumer-level Quick Sync feature but have top-end, discrete GPUs. This is when I started using hardware acceleration for more than just testing purposes. It supported up to 8K resolution on newer hardware, but it was still limited to 8-bit color.

The Limits of Hardware Acceleration
Eight-bit color was fine for most web deliverables, which was what many of those types of encodes were geared toward at the time. But that was also about the time we started seeing more HDR workflows being developed…and HDR definitely requires at least 10-bit color. All HDR exports were still using the slower software encoding and required more processing under the hood to render the extra color detail when Max Bit Depth was enabled.

Once accelerated encoding became mainstream, my standard system benchmarking process was to encode 8K Red to 8K HEVC with hardware encoding to 8-bit Rec. 709, and with software encoding to 10-bit HDR, which took considerably longer. Those benchmarks were not really affected when Adobe added GPU decoding support for H.264 and HEVC in Premiere 14.5, but that support really helped playback performance, especially when using multiple streams (like in a multicam timeline).

New, High-Quality 10-Bit 4:2:2 HEVC Recordings
What about newer, high-quality 10-bit 4:2:2 HEVC recordings? With the most recent release of its Version 22, Adobe added support for accelerated decode of 10-bit 4:2:2 HEVC files. This is specific to Intel Quick Sync because neither Nvidia nor AMD currently support 4:2:2 acceleration in their GPUs. Without hardware acceleration, these newer 4:2:2 HEVC files do not play back well at all on most systems. 4:2:2 refers to the amount of color data in a file, and it used to be much more frequently discussed when the industry was making the jump from SD to HD.

The human eye is more sensitive to brightness than chroma, so higher-resolution images could be encoded more efficiently by focusing on the luminance values over the chroma data. A 4:2:0 video file has basically half-res color detail in both dimensions, while a 4:4:4 file has full color data for every pixel. 4:2:2 sits between the two, with full-vertical but half-horizontal resolution for the color data. It is the default format for SDI connections.
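To put rough numbers on that, here is a small Python sketch of my own (the UHD resolution and 10-bit depth are just example values) showing how the per-frame sample count, and therefore the uncompressed data load, differs between the three schemes:

```python
# Rough per-frame sample counts for different chroma subsampling schemes,
# illustrating why 4:2:2 sits between 4:2:0 and 4:4:4 in data terms.

def frame_samples(width, height, scheme):
    luma = width * height                          # full-resolution luma plane
    if scheme == "4:4:4":
        chroma = 2 * width * height                # two full-resolution chroma planes
    elif scheme == "4:2:2":
        chroma = 2 * (width // 2) * height         # half horizontal, full vertical
    elif scheme == "4:2:0":
        chroma = 2 * (width // 2) * (height // 2)  # half resolution in both dimensions
    else:
        raise ValueError(scheme)
    return luma + chroma

for scheme in ("4:2:0", "4:2:2", "4:4:4"):
    samples = frame_samples(3840, 2160, scheme)
    # 10-bit samples, expressed as uncompressed megabytes per frame
    print(f"{scheme}: {samples * 10 / 8 / 1e6:.1f} MB per UHD frame before compression")
```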

Because H.264 and HEVC are designed to be delivery formats, they are targeted to carry the detail that is visible to the human eye and, for the sake of efficiency, drop anything that won’t be noticed. But now those codecs are being used in cameras for acquisition, and the lost color detail is becoming more noticeable during grading, when colorists highlight image detail that otherwise wouldn’t have been visible.

Because of this, camera manufacturers — who want the affordable efficiency of HEVC encoding but better-quality imagery — have started using HEVC encoding on 4:2:2 image data. Specifically, the Canon R5 and R6, Sony’s a7S III and other DSLRs use this new format, which is not as widely supported for hardware-accelerated playback. But users of Premiere Pro 22 who have Intel Quick Sync support on their newer CPUs (11th-gen graphics or higher) should now see much smoother playback of files from those cameras — on the order of 10 times the frame rate for real-time playback and three times faster processing for export or transcoding tasks.

HEVC 10-bit Encoding Acceleration
The most recent feature, which just appeared in the Premiere 22.1.1 beta, is 10-bit HEVC-accelerated encoding. This includes support for HDR output formats and runs on Intel CPUs or Nvidia GPUs. My initial tests showed my standard benchmarking encodes completing four times faster on my workstation (which can already encode pretty fast in software on the top-end CPU) and 16 times faster on my Razer laptop. Hardware acceleration usually makes a bigger difference on less powerful systems because the system has less spare processing power to throw at the software-encoding implementation.
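For anyone who wants to experiment with the same class of hardware encode outside of Premiere, FFmpeg exposes Nvidia’s NVENC encoder as well. This is only a sketch of the idea, not Adobe’s implementation; it assumes an FFmpeg build with NVENC support is on the PATH, and the file names and bitrate are placeholders:

```python
import subprocess

# Minimal sketch: drive a 10-bit hardware HEVC encode through FFmpeg's NVENC
# encoder from Python. "graded_master.mov" and the 40M bitrate are placeholders.
cmd = [
    "ffmpeg",
    "-i", "graded_master.mov",
    "-c:v", "hevc_nvenc",        # Nvidia hardware HEVC encoder
    "-profile:v", "main10",      # 10-bit profile, needed for HDR deliverables
    "-pix_fmt", "p010le",        # 10-bit 4:2:0 pixel format
    "-b:v", "40M",               # example target bitrate
    "-c:a", "copy",              # pass the audio through untouched
    "hdr_delivery.mp4",
]
subprocess.run(cmd, check=True)
```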

This new 10-bit encoding acceleration will be a big help to those working in HDR, especially if they are using an Intel laptop — and all the more if they don’t have a discrete GPU (which I wouldn’t usually recommend editing on). HEVC export is limited to 4:2:0 color space because no one should need to output 4:2:2 HEVC for delivery, and HEVC is not a good choice for intermediate exports, even at 4:2:2. But if you have a top-end DSLR shooting 4:2:2 HEVC files, and you want to edit on a laptop and post your work to YouTube in HDR, then the playback and export of your project is going to be a whole lot better with the newest version of Premiere than it would have been before.


Mike McCarthy is a technology consultant with extensive experience in film post production. He started posting technology info and analysis at HD4PC in 2007. He broadened his focus with TechWithMikeFirst 10 years later.

Review: Boxx’s Apexx Matterhorn With New Intel Xeon CPU

By Mike McCarthy

Boxx’s Apexx Matterhorn system is based on Intel’s Xeon W-3300 CPU architecture and C621A chipset. It represents the pinnacle of Intel’s single-socket system performance, with up to 38 CPU cores, eight channels of DDR4 memory and 64 lanes of PCIe 4.0 connectivity. The system I used for this review came configured with the top-end Xeon W-3375 CPU, sporting 38 cores, listed as running between 2.5GHz and 4.0 GHz, with 57MB of cache (1.5MB/core). It has 64GB RAM spread across all eight channels.

The CPU is liquid-cooled and has a 1600-watt power supply, allowing it to support all sorts of power-hungry, multi-GPU configurations with its 64 PCIe lanes spread across seven slots. It also has four M.2 slots, one of which came populated with a 1TB SSD. The system also shipped to me with an Nvidia A6000 card in it, which is the top pro visualization card in its formerly Quadro lineup.

Initial Impressions
The front of the system offers dual-USB 3 ports and dual-USB 2 ports, as well as 1/8-inch jacks for speakers and microphone, plus the power and reset buttons. This system also came with a full-sized DVD-RW drive in the only 5.25-inch bay. As a side note, it was labeled an M-Disc drive, which turns out to be a special, long-lasting archival optical format, a physical variation of DVD or Blu-ray that’s readable in most normal players. Although I had never heard of it, apparently it has been available for a decade and is one more random capability of the system.

The back side looks different than most systems in that the motherboard is mounted on the opposite side of the case from conventional systems, placing the I/O bracket at the bottom and the PCIe slots above that. The I/O bracket includes six USB-A ports (four of them blue USB 3.0 ports) and one USB-C port (USB 3.2 Gen2x2) that is not Thunderbolt-compatible. (There is apparently no option for adding Thunderbolt support to this system, which is surprising for an Intel system.) There is an optical port and five 1/8-inch audio jacks for 5.1 audio, as well as microphone and line input. There is a legacy serial port, a VGA port and three RJ-45 network jacks. The center network jack and the VGA port are for the IPMI baseboard management controller, which I will discuss later. The far jack at the bottom corner is an Intel i210-AT 1GbE controller, while the one closer to the PCIe slots is an Aquantia AQC113 10GbE controller with support for NBase-T. None of these network jacks is labeled or color-coded, which is an issue I have with other Supermicro motherboards as well.

Just above all of this is the 1600W power supply, which has a different connector than most users will be familiar with: a C19 connector instead of the usual C14, due to the potential for the system to draw more than 15 amps from a 120V circuit. Without four GPUs installed, there is no way my system will ever draw that much current, so I was grateful that the supplied power cable still fit into a regular 15-amp plug on my UPS.

The Expansion Options — Slots and Bays
The right side panel can be removed with two thumb screws, and it hinges open from the front. Inside the system, there are all sorts of expansion options to meet your specific configuration needs. Besides the previously mentioned seven PCIe slots and four M.2 slots, there are eight SATA ports, one of which is used by the optical drive. Those ports can be used to support storage devices in four 3.5-inch drive bays with tool-less trays for easy installation, plus a bracket that can hold up to two 2.5-inch drives, presumably SATA SSDs for most users. These bays are also located in spaces that are convenient for connecting to PCIe RAID cards if desired.

There is also a non-functional PCIe slot that I am told is for holding an Nvidia Sync card when all of the “real” PCIe slots are in use. The only PCIe card I added to my system is a Mellanox MCX354A network interface card for my 40GbE tests.

The system came with an A6000 GPU, which is a marvel of graphics engineering but not the best option for most video editors, who don’t need the extra $5,000 price tag. Boxx also sells the system with Nvidia GeForce or AMD GPU options. I usually recommend GeForce cards for most editors unless you need a Quadro-specific feature, and this is especially true now that Nvidia’s Studio drivers run on both GeForce and Quadro hardware. The GeForce 3070 is probably the sweet spot for most editors, but any of the Ampere-generation cards offer excellent graphics performance for editing tasks. Those doing more color and VFX work will want to invest more in GPU performance with higher-tier GeForce cards, but only people working in true 3D or using other visualization tools should need to opt for the more expensive professional-series cards. For example, my multi-screen cinematic experience workflow for the film 6 Below a few years ago was based entirely on Nvidia’s Mosaic display tiling technology, and in that case, the functionality that Quadro cards offered was worth every bit of the cost for that use case.

It is also worth noting that the A6000 has an EPS 12V plug instead of an eight-pin PCIe connector, which I had not realized, but it comes with an adapter that converts two standard eight-pin PCIe power connectors into an EPS 12V plug for the card. Ironically, I have had to go the other direction in my previous Supermicro builds, adapting EPS 12V to the nearly identical-looking eight-pin PCIe plug to support GPUs in my rackmount storage systems.

Boxx shipped the system with a 1TB Western Digital SN850 PCIe 4.0 NVMe SSD installed. While this is the third system I have tested that supports PCIe 4.0, this is the first time I have had the opportunity to test an SSD that fully uses that bandwidth. The drive did not disappoint, offering about 5GB/s read and write performance in AJA System Test and nearly 8GB/s reported read performance during playback of 8K DPXs in Premiere Pro. Uncompressed 8K DPXs require 3.5GB/s to play back, which is more than any of my previous SSDs could sustain. But because Premiere caches frames in front of the playback head, the potential read demand is much higher than that, hence the higher-than-strictly-necessary performance numbers.
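For anyone curious where that 3.5GB/s figure comes from, here is a back-of-envelope sketch. It assumes 10-bit RGB DPX frames, where each pixel’s three 10-bit samples are packed into 4 bytes, at a full-width 8K frame size; the exact resolution and frame rate of my test sequence may differ slightly:

```python
# Rough math behind uncompressed 8K DPX playback bandwidth.
WIDTH, HEIGHT = 8192, 4320          # assumed full-width 8K frame
BYTES_PER_PIXEL = 4                 # 3 x 10-bit RGB samples padded to 32 bits

frame_bytes = WIDTH * HEIGHT * BYTES_PER_PIXEL
print(f"Frame size: {frame_bytes / 1e6:.0f} MB")

for fps in (24, 25):
    rate = frame_bytes * fps / 1e9
    print(f"{fps} fps playback needs roughly {rate:.2f} GB/s of sustained reads")

# Premiere's read-ahead caching can demand well above this sustained figure,
# which is why the reported reads approached 8GB/s on the PCIe 4.0 SSD.
```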

I was able to play back my 30-second 8K DPX sequence to my 8K display at 10-bit color with a Lumetri grade and High Quality Playback enabled without dropping frames, which I had never been able to do before. EXRs did not fare as well in any of the many possible flavors, presumably due to the processing power required for decompression and conversion from linear to video gamma color space.

The CPU is water-cooled, with tubes connected to a radiator at the lower front of the case and large fans to cool the radiator with minimal acoustic noise. The space savings of not having a large CPU heatsink and blower allows the power supply to sit directly above the CPU and RAM, and the power supply is connected in such a way that it can swing up out of the way to access those components if needed.

Water-cooling is one of those features I have never attempted to set up on my own, and I have no desire to do so. The fact that Boxx has a solution completely engineered for this is one of the features that sets this system apart from its larger competitors. Water-cooling should offer improvements in both acoustic performance and thermal throttling, which are both important factors for editing systems.

Power
My initial performance tests, rendering in Adobe Media Encoder, did not return great results, but once I changed the Windows power profile from “Balanced” to “High Performance” and repeated the same renders, things improved dramatically. This resulted in the same encodes being completed in less than half the time, so the power profile makes a huge difference. That also bumped the idle power usage of the system from 160 watts to 240 watts, a significant 50% increase. (The “High Performance” power plan prevents the OS from clocking back the CPU when it is not needed.) I consider this to be a pretty big deal in regard to wasted energy, and I am hoping it can be fixed by a BIOS update or new energy management tools in Windows 11 or something like that. To be clear, this is not a Boxx issue but an Intel one; a quick search online reveals that other vendors’ Xeon W-3300 systems show the same behavior.
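To put a rough number on that wasted energy, here is a quick calculation; the idle hours and electricity rate are assumptions I picked purely for illustration:

```python
# Rough annual cost of the extra idle draw under the High Performance profile.
extra_idle_watts = 240 - 160        # difference between the two power profiles
idle_hours_per_day = 16             # assumed: powered on but not rendering
rate_per_kwh = 0.15                 # assumed electricity price in $/kWh

kwh_per_year = extra_idle_watts * idle_hours_per_day * 365 / 1000
print(f"{kwh_per_year:.0f} kWh per year, roughly ${kwh_per_year * rate_per_kwh:.0f} wasted")
```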

Similarly, there is no option to sleep or hibernate the system, so you must close all your apps and do a full shutdown if you want to cut power usage while the system is dormant. My understanding is that this is due to the underlying Supermicro X12SPA-TF motherboard, which is targeted more toward the server market, where systems are expected to be always available. My storage server-based workstation is the same way, and this limitation is just as annoying here as it is there. I usually sleep my system when I head to lunch or a meeting or whatever, which pauses everything I am doing and kills the power usage.

System Management
Being based on a Supermicro motherboard, the Boxx Apexx Matterhorn also has a full baseboard management controller in the form of Supermicro’s IPMI management system. This allows users to remotely log into the system administration interface to troubleshoot problems, even when the machine is not booted.

I recommend assigning a static IP to that port, which requires a separate physical connection to your network. Browsing to that address allows you to log in and monitor or administer the system. The login used to be a generic Admin/Admin by default, but that was a powerful tool to leave unsecured by default, so Supermicro set unique default passwords on every board, which can be found on a physical label inside the system. This new password system is well-documented, but there is no reference that I can find in any of that documentation that the login name remains ADMIN in all caps. So if you get this system, or any other one with a Supermicro motherboard, I recommend assigning the IPMI port a static IP in the BIOS and making sure you successfully log in over the network with ADMIN/[label password].
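Once that static IP is assigned, a quick way to confirm the BMC is actually answering before you attempt the ADMIN login is a simple TCP connection test from another machine on that network; the address below is a placeholder for whatever IP you chose:

```python
import socket

# Check that the IPMI/BMC web interface responds at its static IP before
# trying the ADMIN/[label password] login. The IP here is a placeholder.
BMC_IP = "192.168.1.250"

for port in (443, 80):
    try:
        with socket.create_connection((BMC_IP, port), timeout=3):
            print(f"BMC web interface is reachable on port {port}")
            break
    except OSError:
        print(f"No response on port {port}")
```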

The system UI does respond well in Windows, and my Adobe apps do open up much quicker than on my other systems. (My Threadripper Pro takes over a minute to open any version of Premiere, while this Boxx system only takes 20 seconds.) Once in Premiere, I immediately tried to play back 8K timelines, and the space bar response for both starting and stopping was fast enough that I couldn’t consciously detect any lag. Illustrator and Photoshop were very responsive as well. I was able to edit large PSDs (hundreds of layers at 5K resolution) on my 8K monitor without any screen-tearing, which I have experienced in the past, even on other high-end systems.

Benchmark Results
I created some 4K and 8K sequences a few years ago, and I always try to use the same content with the same effects and the same encoding settings for my comparison benchmarks.  But the software keeps changing as well, so I use the newest version of Premiere Pro, AME and After Effects for my render tests.  These are my results compared to a few other systems I have tested over the years.  For the most significant comparisons, I chose the Ryzen-based Boxx A3, the Threadripper Pro-based Lenovo P620, as well as my dual-Xeon server conversion and a few older options.  I also included my initial set of benchmark results from the Balanced power profile to show the difference that setting made.

After Effects has undergone some major improvements recently, and the current beta version enables multi-frame rendering, which is well-suited to a 38-core system. My 5K background comps for my series Grounds of Freedom rendered three times faster once multi-frame rendering was enabled. (A 20-second segment took 4 minutes instead of 12 minutes, which is a huge improvement, especially when it comes time to render the entire 5-minute composition.)

Conclusions
The configuration I was reviewing currently ships for about $17,500, which I thought was a lot of money for a computer. At least that was my opinion until I looked at Apple’s prices for similarly configured Mac Pro towers for $20K, which are based on the previous generation architecture (28-core Xeon W processors with six-channel RAM). And while $17,500 is still a lot of money, $5,000 of that price is for the A6000 card, which few people really need, and $4,000 of that was for the 38-core CPU, which not everyone needs either.

A different configuration of this Matterhorn could be ordered for half that price while still supporting all the same PCIe and RAM expansion options. While this machine is a comparable system to the Mac Pro, and exceeds it in nearly every metric, this might not be the best choice for the average editor. (Similarly, I don’t think the Mac Pro is the best choice for most OSX-based editors, but they have far fewer options available to them.) There are other systems, including other options from Boxx, with lower levels of peak performance for much less money. The lack of Thunderbolt support in this system is a factor as well, as that interface is important for a number of different media workflow options.

Consumer system architectures, like Ryzen and Core i9, offer much better performance per dollar, and the majority of video editors will be better served by those platforms. Lower-tier systems now support top-performing GPUs, eight or more CPU cores and at least 128GB RAM. This really raises the level you can reach before you are forced to step up to that next class of system. And once you are stepping up beyond that consumer tier, AMD’s Threadripper platform would be the next option to evaluate before you pull out the big bucks for this model at the top of the hierarchy.

Now that recommendation is more about the differences between Intel and AMD options than anything Boxx has done. Boxx offers systems from both Intel and AMD at each of these different performance classes. I have now tested their lowest-end AMD solution in the A3 and this top-end, single-socket Intel solution. While this system is amazingly powerful, for most editors, the lower-tier solution will offer far more bang for their buck. But those lower-tier options don’t leave nearly as much possibility for scaling up to higher performance with more RAM and GPUs.

The Apexx Matterhorn system is really designed for users who need the performance of four top-end GPUs and terabytes of RAM, so people at the top end of color grading and 3D VFX workflows or creating VR content might find this to be a compelling option. Applications like Foundry Nuke or Autodesk’s offerings might see greater performance increases from this hardware than I am seeing in Adobe applications. So if your workloads scale up with more cores and RAM, and you need greater expandability, then Boxx has put together quite the system, and this Apexx Matterhorn might be the ideal solution for you.


Mike McCarthy is a technology consultant with extensive experience in film post production. He started posting technology info and analysis at HD4PC in 2007. He broadened his focus with TechWithMikeFirst 10 years later.

 

Networking for Post: Beyond 10GigE

By Mike McCarthy

This article wraps up a series of articles that explores different networking options for exceeding the bandwidth limitations of Gigabit Ethernet. 10GbE has come down in price and has become more common for post work over the past few years, but it’s not quite fast enough for uncompressed 4K at higher frame rates, higher resolutions or multiple streams. There are technologies that offer bandwidths exceeding 10Gb/s, but there is minimal information and familiarity with these options outside the data center space. The original approach was 40GbE, which combines four channels of 10GbE in a single connection, while the newer 25GbE standard is based on increasing the signaling frequency of a single connection.


40GbE seems like it could be an affordable way to exceed 10GbE for some users in regard to cheap direct links between systems. It should offer 5GB/s of bandwidth, which is as much as a large disk array or NVMe SSD can serve up. 40GbE hardware was originally developed not long after 10GbE and was primarily used for aggregate links between switches. Then it was used to connect servers to the switches, replacing teamed NICs that had been used to increase bandwidth up to that point, but it was never widely used to connect individual client systems to a network. 40GbE uses a single Quad Small Form-factor Pluggable (QSFP) port that allows you to connect either via direct-attach cables with four pairs of Twinax copper or over multi-line fiber to MTP connectors in QSFP transceivers. QSFP ports can also be adapted to single-channel SFP ports via a QSA (QSFP-to-SFP) adapter.
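As a quick sanity check on those numbers, here is a small sketch converting nominal Ethernet line rates into approximate usable file-transfer throughput; the 5% overhead factor is a rough placeholder for protocol and encoding losses, not a measured value:

```python
# Convert nominal Ethernet line rates into approximate usable throughput.
OVERHEAD = 0.95  # assumed fraction of line rate left after protocol overhead

for name, gbit in [("10GbE", 10), ("25GbE", 25), ("40GbE", 40), ("100GbE", 100)]:
    usable_gbytes = gbit * OVERHEAD / 8
    print(f"{name}: roughly {usable_gbytes:.1f} GB/s usable")
```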

On the switch side, a single QSFP port can be adapted to support four separate 10GbE SFP connections, but NICs apparently do not support this approach. That would, among other things, require the OS to view it as four separate adapters with unique settings and IPs to configure. If there is a way to do that, I would like to know, because it would make for some interesting switchless 10GbE resource-sharing options.

Normally, when you divide network traffic between multiple channels, the load balancing directs all traffic from a single request over a single link. This isn’t an issue when your storage server is streaming data to a number of clients: no individual recipient will receive data faster than one of the individual links, but the aggregate total of those connections can still exceed an individual link.
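The sketch below illustrates the general idea (it is not any vendor’s exact algorithm): link aggregation typically hashes each flow’s addresses and ports to pin it to one member link, so a single copy task stays on one 10Gb lane while many separate clients naturally spread across all of them:

```python
import hashlib

# Flow-based load balancing on an aggregated link: each flow's 5-tuple-style
# key is hashed and pinned to one member link for the life of the connection.
def pick_link(src_ip, dst_ip, src_port, dst_port, num_links=4):
    key = f"{src_ip}:{src_port}-{dst_ip}:{dst_port}".encode()
    return int(hashlib.md5(key).hexdigest(), 16) % num_links

# One big transfer (one flow) always lands on the same link...
print(pick_link("192.168.40.10", "192.168.40.1", 51000, 445))

# ...while several clients end up distributed across the member links.
for client in range(4):
    print(pick_link(f"192.168.40.{20 + client}", "192.168.40.1", 51000, 445))
```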


So in a traditional use case, you have eight workstations connected to a switch at 10Gb, and the media server they are all pulling data from is connected with a 40Gb link. Each system can get up to 10Gb from the server — up to 40Gb total. But what if one of those workstations was connected at 40Gb, and everyone else was off for the weekend? Could a single transfer exceed 10Gb? Or would you have to initiate multiple transfer tasks to use the four-component links? I asked this question a few years ago at NAB, and none of the representatives from the networking companies had a solid answer for me.

25GbE
The next approach to exceeding the 1.2GB/s bandwidth limit of 10GbE was 25GbE, which is designed to eventually replace 10GbE all the way down to the client level when needed. It is much simpler than 40GbE because it just increases the signal frequency in the SFP connector, and that is basically it. The other simplifying factor is that there is no support for older connection types like CX4 or even RJ-45 over twisted-pair copper cable. All 25GbE connections come in the form of SFP28 ports (the 28 refers to the 28Gb/s lane signaling rate, which keeps the hardware compatible with InfiniBand protocols). Similar to other SFP+ ports, SFP28 ports can be connected with direct-attached Twinax cables for short distances or transceivers with fiber for longer runs. It is a single unified link, so there is no question about maximum speeds, and you should be able to transfer 3GB/s over 25GbE connections if your storage can support that rate.

Additionally, 25GbE links can be combined to create 50GbE and 100GbE connections using QSFP28 ports — 4x25GbE links for 100Gb of aggregate bandwidth, similar to 4x10GbE links for 40Gb of aggregate bandwidth. So there is lots of bandwidth available, and very few users need to (or even can) move more than 12GB/s. The only downside is price. While 40GbE gear can be found for cheap on eBay, this is because other people are moving from 40GbE- to 25GbE-based solutions, including 50GbE and 100GbE. And 25GbE gear hasn’t been available long enough to make it into the used market. So 25GbE-based hardware still commands a much higher price. And while that is where high-end post solutions are moving eventually, that doesn’t necessarily make it the best choice for everyone at the moment.

40GbE
I had known about 40GbE for quite a while before I really explored it because I was familiar with some of the limits of traditional Ethernet bonding. You need a mechanism to break up the request across the channels. My first Fibre Channel SAN used two channels of 4Gb fiber to share two separate storage volumes that were then combined as a dynamic RAID-0 in Windows Disk Management, and the RAID architecture divided the I/O between the two channels. This worked, but it was hard to manage, and that added complexity opens up all sorts of potential problems.

I was only interested in using 40GbE if it wouldn’t require any of these types of workarounds. And no one seemed to know for sure if it would because, similar to bonded Ethernet, 40GbE was normally only used in situations where data from many connections was aggregated, making it easy to divide between separate channels. I realized it was time to run some tests. Could 40GbE be used to connect high-bandwidth workstations at relatively low cost, or would the more expensive 25GbE gear be required?

While 40GbE-capable switches aren’t cheap, the 40GbE cards are. I bought two PCIe cards (Mellanox ConnectX-3 MCX354A) for this experiment for under $100 on eBay. I installed the cards and connected them with a QSFP Twinax cable. I then installed Mellanox’s drivers (https://www.mellanox.com/products/adapter-software/ethernet/windows/winof-2) and set the cards to 40GbE mode instead of 56Gb InfiniBand mode, which I haven’t explored yet. Beyond that, the steps are the same as for configuring a 10GbE direct link or a 100GbE one, which I detailed in the second article in this series.

It is a bit more work to establish a single-link network than to connect to an existing network switch and router. In this case, I set up my 40GbE direct link to run on the 192.168.40.x subnet. I also recommend setting the Jumbo Packet threshold to 9014 for best performance. Usually when establishing a high-speed direct connection, all of your systems are already connected to the same Gigabit network for internet access, and you’ll want to make sure the traffic between those systems is routed over the new link and not the existing Gigabit network. To do that, we use those unique IP addresses that are in a separate subnet. So in Windows, map a network drive with the path \\[OppositeSystemIP]\[ShareName], like \\192.168.40.101\RAID.


Then you can try copying files to or from that network drive. Assuming your storage volumes themselves on both systems are fast enough (RAIDs or SSDs), you should get around 800-1000MB/s if you have a 10GbE direct link. It would be nice to get 4-5GB/s over 40GbE, but apparently the Windows TCP/IP stack is only able to process about 2GB/s without using third-party tools to divide the copy over multiple threads. But anything exceeding 1.25GB/s is enough to prove that we aren’t limited to a single 10Gb link for a discrete copy task. Even if only one of the connected systems has storage that supports that data rate, the connection can be tested by playing back high-res uncompressed video across the network. (Try exporting some lower-res content to 4K or 6K DPXs.) If neither system has storage that supports that data rate, then you oversized your network. For testing purposes you can create RAM drives, share them over the network and copy between them to test maximum network performance. This approach removes the potential storage bottleneck for any network bandwidth test as long as you have the available RAM.
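As an alternative to copying between RAM drives, a small script can also measure raw link throughput directly by streaming zero-filled buffers over TCP. This is only a sketch: run the receive side on one machine and the send side on the other, and treat the IP, port and sizes as examples:

```python
import socket
import time

# Minimal raw-throughput test over a direct link (a crude stand-in for iperf).
PORT = 5201
CHUNK = 4 * 1024 * 1024          # 4MB send buffer
TOTAL = 20 * 1024 ** 3           # push 20GB total

def receive():
    # Run this on the machine being tested against.
    with socket.create_server(("", PORT)) as srv:
        conn, _ = srv.accept()
        with conn:
            while conn.recv(CHUNK):
                pass

def send(host="192.168.40.101"):
    # Run this on the other machine, pointed at the direct-link IP.
    data = bytes(CHUNK)
    start = time.time()
    with socket.create_connection((host, PORT)) as s:
        sent = 0
        while sent < TOTAL:
            s.sendall(data)
            sent += CHUNK
    print(f"Sent {sent / 1e9:.0f} GB at {sent / (time.time() - start) / 1e9:.2f} GB/s")
```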

My home systems both have drives that run over 1000MB/s, so they are optimal candidates for a 10GbE link. The 40GbE link I was testing here was overkill for them, but I was able to get over 2GB/s between RAM drives on the systems. These cards will be optimal for one of my client locations, where they have a number of systems with large RAIDs and SSDs that get 2-3GB/s. That is probably where they will end up being used long-term, connecting the main editor to the largest storage server at maximum speed.

When I installed one of the 40GbE cards in my Sonnet Thunderbolt PCIe Breakaway Box, I was able to get a 1.5GB/s connection to my laptop, but this is only marginally better than the 1GB/s I can get with Sonnet’s much easier-to-use Solo10G Thunderbolt NIC. Sonnet offers both SFP and RJ45 versions of the Solo10G adapter, which is the optimal high-bandwidth solution for most Thunderbolt laptops, while QNAP offers a USB-C adapter that provides 5Gb NBase-T support for systems without Thunderbolt.

Using PCIe expansion boxes allows bandwidths up to the 40Gb limit for Thunderbolt 3, but that is probably not a reasonable solution for most laptop users. Laptop users shouldn’t need 40GbE, but since I had the parts to test it, I gave it a try. I was also able to connect the 40GbE cards to the QSW-2104 switch I reviewed in my last article by using a QSA adapter and a direct-attached SFP cable, or by using an MTP to 4xLC breakout cable. This allows me to have a triangle of connections using the dual QSFP port cards, linking the systems together at 40Gb and giving them each a 10Gb connection to the rest of my network. This makes for an interesting application that combines all the technologies I have been exploring. But the main purpose of these dedicated high-bandwidth, direct-link connections is to allow a power user access to their files at maximum speed on a shared storage location, or to allow two users to share files that are stored locally on one of the systems.

My Findings
The result is, yes, 40GbE can be used to connect workstations and exceed 10Gb of bandwidth for individual transfers or playback. This will only be practical when all of the systems are in close proximity, but it could be scaled to a large number of systems via 40GbE switches that are relatively cheap on eBay. For any other use, or for better future compatibility, investing in 25GbE-based products will probably be worth the money if you need that level of performance and bandwidth. But delaying that upgrade until prices come down by using 10GbE or 40GbE in the interim (until you outgrow those limits) will probably save you lots of money.


Mike McCarthy is a technology consultant with extensive experience in film post production. He started posting technology info and analysis at HD4PC in 2007. He broadened his focus with TechWithMikeFirst 10 years later.