An enthusiast is claiming that his own homemade power efficiency algorithm can dramatically improve performance on AMD Ryzen CPUs, over and above the default algorithms AMD has itself distributed.
The idea of unlocking hidden performance with a quick software mod is intrinsically attractive. Moreover, the enthusiast side of the PC industry has a long history of uncovering useful tricks and tips like this. From third-party utilities like Nvidia Inspector that offer access to various GPU features, to the old utility Rain that enthusiasts could use to lower CPU temperatures and reduce power consumption, ordinary PC users have often created projects that extended or improved the capabilities of various products.
The 1usmus Power Plan
Ryzen enthusiast 1usmus has written an article at TechPowerUp claiming to improve the performance of AMD’s CPUs through a little manual adjustment to the company’s power plans.
When I reached out to AMD, however, the company representatives I spoke to were uncertain this type of manipulation could pay dividends. AMD’s perspective is similar to my own. I’ve taken a few cracks at trying to boost Ryzen performance through UEFI adjustments and have generally found the effort not to be worth it. Overclocking headroom on Ryzen is very limited and the CPU has generally seemed to perform best if left to its own devices.
But the entire point of a site like this is to explore what kinds of performance might be available at the cutting edge. I fired up the Ryzen 9 3900X on our MSI X570 Godlike with the 7C34v16 UEFI, a fresh, fully patched installation of Windows 10 1903, the latest AMD drivers (220.127.116.113), and a Thermaltake Floe Riing RGB 360 cooler. While this cooler is from 2017, it’s a large radiator design with space for three 120mm fans. We’ve relied on it for multiple generations of Threadripper and Intel HEDT testing, and it can stand up to a 12-core Ryzen 9 3900X with no problem. Maximum reported CPU temperature from the 12-core was 67C in Ryzen Master under sustained load during the Blender 1.0 Beta 2 benchmark.
The Windows 10 scheduler has some known shortcomings with regard to how it treats CPUs with multiple NUMA nodes (we’ve discussed some of these in relation to the 2990WX, which is severely affected in some workloads). 1usmus spends some time cataloging various issues with the Windows 10 scheduler, some of which he believes his own power plan can solve. He writes:
“My approach to solve this deficit in the Windows Scheduler is to use a customized power profile that provides better guidance to the scheduler on how to distribute loads among cores. This should put load on the best cores (which clock higher) and result in higher and smoother FPS because workloads will no longer bounce between cores.”
This should be a fairly straightforward set of improvements to test. He claims gains of up to 200MHz in highest observed CPU frequency. He claims an improvement of ~1.03X in Cinebench R15 multi-threaded based on actual benchmark results and writes:
“Whilst AMD tried to address deficiencies in Precision Boost behavior by dialing up maximum boost frequencies by roughly 50-100 MHz with AGESA 1.0.0.3 ABBA, this modification has managed to increase maximum observed core frequencies by 200 MHz on average, which should result in higher performance across the board, especially for less-parallelized workloads.”
1usmus writes that the following settings must be changed in order for his power plan to work properly:
- Global C-state Control = Enabled
- Power Supply Idle Control = Low Current Idle
- CPPC = Enabled
- CPPC Preferred Cores = Enabled
- AMD Cool’n’Quiet = Enabled
- PPC Adjustment = PState 0
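As a rough aid for anyone repeating the test, the prerequisite list above can be kept as a small checklist and compared against whatever the board is actually set to. This is only a sketch: the `find_mismatches` helper and the hand-entered `noted` values are hypothetical, and nothing here reads settings from the firmware.

```python
# Settings 1usmus lists as prerequisites, transcribed from the list above.
REQUIRED_UEFI_SETTINGS = {
    "Global C-state Control": "Enabled",
    "Power Supply Idle Control": "Low Current Idle",
    "CPPC": "Enabled",
    "CPPC Preferred Cores": "Enabled",
    "AMD Cool'n'Quiet": "Enabled",
    "PPC Adjustment": "PState 0",
}


def find_mismatches(observed):
    """Return {setting: (required, observed)} for every setting that differs.

    `observed` is a hypothetical dict filled in by hand from the UEFI
    setup screens; a setting that was not noted down is reported as None.
    """
    mismatches = {}
    for name, required in REQUIRED_UEFI_SETTINGS.items():
        actual = observed.get(name)
        if actual != required:
            mismatches[name] = (required, actual)
    return mismatches


if __name__ == "__main__":
    # Example: a board where CPPC Preferred Cores was left on Auto.
    noted = dict(REQUIRED_UEFI_SETTINGS)
    noted["CPPC Preferred Cores"] = "Auto"
    print(find_mismatches(noted))
```

An empty result means every listed setting matches; anything else names the setting that still needs to be changed before the power plan is tested.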
We performed these modifications and fired up the test platform. They definitely change the behavior of the CPU, but it’s not clear they do so in a helpful manner.
Under AMD’s default power plan, most CPU cores remain awake while the chip is running a workload like Cinebench, though they remain at low clocks.
Under 1usmus’ plan, these cores mostly sleep. There’s a definite trade-off here between system responsiveness and lowest-possible power state. Putting 10-11 of the other CPU cores to sleep during a workload does allow for full power to be diverted to a single CPU core. We do, in fact, see some evidence that this allows the CPU to hit very slightly higher boost clocks in single-threaded workloads.
In Cinebench R20, for example, Core 05 of our 3900X tends to run at ~4.46GHz when using AMD’s Ryzen Balanced Plan. Under 1usmus’ plan, that CPU core tends to run very slightly higher, at around 4.53GHz. This represents roughly 1.5 percent more clock speed than previously available.
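The clock uplift is simple to check from the two observed frequencies. A minimal sketch, where `percent_gain` is just an illustrative helper name:

```python
def percent_gain(before, after):
    """Relative increase from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# Core 05 clocks observed in Cinebench R20, in GHz.
ryzen_balanced_ghz = 4.46
custom_plan_ghz = 4.53

uplift = percent_gain(ryzen_balanced_ghz, custom_plan_ghz)
print(f"Clock uplift: {uplift:.2f}%")
```

The result lands a little under 1.6 percent, in line with the rough figure quoted above.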
1usmus also states that his power plan results in workloads being matched more frequently to the highest-overclocking core as identified by Ryzen Master. We saw no evidence of this. Windows 10 clearly prefers running workloads on our Ryzen’s C05 core (as identified in Ryzen Master). If anything, it preferred this core more strongly when running 1usmus’ power plan than the AMD default.
Cinebench R15 performance did not improve in multi-threaded and only increased by 1.5 percent at most in single-threaded. Maximum clock speed recorded increased from 4.49GHz to 4.6GHz, but the actual degree of performance improvement was a fraction of this. Single-threaded performance improved from a score of 209 to 212.
Cinebench R20 multi-threaded performance did not improve at all. Single-threaded performance improved by roughly 3 percent, from 512 to 526.
Finally, we ran the Barbershop_Interior scene from the Blender 1.0 Beta 2 benchmark. This is a sustained all-core test that gives the CPU more time to show the impact of any performance improvements or subtle clock changes. Barbershop_Interior render times improved from 13.35 minutes to 13.26 minutes, a gain of roughly 0.7 percent, which is indistinguishable from random error and noise.
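Because render time is a lower-is-better metric, the gain is the share of time saved relative to the original run. The sketch below works it out from the two render times quoted above, under the assumption that they are decimal minutes; `time_reduction_percent` is an illustrative helper name.

```python
def time_reduction_percent(before_min, after_min):
    """Share of render time saved, in percent (lower time is better)."""
    return (before_min - after_min) / before_min * 100.0


# Barbershop_Interior render times, assumed to be decimal minutes.
default_plan_min = 13.35
custom_plan_min = 13.26

saved = time_reduction_percent(default_plan_min, custom_plan_min)
print(f"Render time reduction: {saved:.2f}%")
```

Either way the times are read, the change is well under the margin of error of a typical benchmark run.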
1usmus’ power plan does not appear to increase performance in any way we can separate from a margin of error, though there may be a very small (1.5 percent – 3 percent) improvement in single-threaded tests. I typically assume a 2.5 – 3 percent margin of error for any benchmark, however, and the best ST gains we measured were in 3 percent territory. There’s no sign of any improved core parking behavior; C05 is the preferred core for single-threaded workloads over time, and the 1usmus plan does nothing to change this. It appears that whatever performance the power plan secured comes from shutting down the rest of the chip more aggressively. It takes longer to wake cores from deep sleep, so there’ll be trade-offs in any case.
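To make the margin-of-error argument concrete, here is a small sketch that compares measured gains against the assumed margin. The figures come from the results above; the helper names and the choice of 3 percent as the default threshold are illustrative assumptions.

```python
MARGIN_OF_ERROR_PCT = 3.0  # upper end of the 2.5 to 3 percent range assumed above


def percent_gain(before, after):
    """Relative increase from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


def exceeds_margin(before, after, margin_pct=MARGIN_OF_ERROR_PCT):
    """True only if the measured change is larger than the assumed margin."""
    return abs(percent_gain(before, after)) > margin_pct


# (before, after) pairs under the default plan and the 1usmus plan.
results = {
    "Cinebench R20 single-threaded score": (512, 526),
    "Core 05 clock in Cinebench R20 (GHz)": (4.46, 4.53),
}

for name, (before, after) in results.items():
    gain = percent_gain(before, after)
    verdict = "beyond margin" if exceeds_margin(before, after) else "within margin"
    print(f"{name}: {gain:+.2f}% ({verdict})")
```

The Cinebench R20 single-threaded gain of about 2.7 percent sits inside a 3 percent margin but would clear a 2.5 percent one, which is why it is best treated as borderline rather than as a proven improvement.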
The truth is, as power management and schedulers both become more complex, the chances that an end user will stumble on a combination of settings that magically boosts performance are low. The 1usmus plan does hit superficially faster clocks, but it doesn’t translate them into sustained performance improvements.
ExtremeTech’s guidance is to stick with AMD defaults for everything related to power management and CPU clocking. I’ve yet to test any combination of motherboard settings that delivered consistently higher performance without amounting to an overclock of the CPU, and even then, frankly, I haven’t seen OC results impressive enough to justify the effort on top-notch chips. AMD has gotten very good at squeezing all available performance out of its silicon.