AMD May Have Doubled Per-Core L3 Cache on 7nm Epyc CPUs


AMD may have revealed its upcoming 7nm Epyc CPUs at its New Horizons event, but it only briefly touched on the architectural enhancements and improvements to the core. We know the chip pairs a series of 7nm chiplets (each containing eight CPU cores) with a common I/O die, but fine details on cache organization and CCX design haven't been revealed yet. A new data point, provided courtesy of SiSoft Sandra, suggests that AMD has doubled the amount of L3 cache per CPU core, at least on Epyc.

The original entry in the SiSoft Sandra database has been removed, but not before Overclock3D.net screenshotted it. The screenshot shows an engineering sample with a 1.4GHz base clock, a 2GHz boost clock, and 128 threads; on a dual-socket platform with two 64-core chips, that implies SMT is not enabled at this stage. Neither the low clock nor the lack of SMT support is concerning; engineering samples often have features disabled, and Epyc isn't expected to launch until 2019 is well underway.


Image by Overclock3D

Doubling the total amount of L3 cache per core is an expected move for AMD and should help improve Epyc performance overall. AMD's existing CCX implementation allocates 8MB of L3 per CCX, with two CCXes per die. Ping times between logical cores are roughly 26ns within the same physical core, 42ns between cores in the same CCX, and 142ns between CCXes on the same physical die. That last figure isn't much better than the latency hit you take when you step out to main memory to retrieve data.
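Numbers like these come from ping-pong microbenchmarks: two threads bounce a signal back and forth and the average round-trip time is recorded. The Python sketch below shows the shape of that measurement only; it is not how the cited figures were produced. Serious tools pin each thread to a specific core and bounce a shared cache line, whereas CPython's `threading.Event` and GIL add scheduler overhead on top of any hardware latency, so treat the output as illustrative.

```python
import threading
import time

def pingpong(iters=10000):
    """Average round-trip signaling time between two threads.

    Illustrative sketch only: real core-to-core latency tools pin threads
    to specific cores and spin on a shared cache line. CPython's GIL and
    Event overhead dominate here, so only the method is representative.
    """
    ping, pong = threading.Event(), threading.Event()

    def responder():
        for _ in range(iters):
            ping.wait()    # wait for the "ping" from the main thread
            ping.clear()
            pong.set()     # reply with a "pong"

    t = threading.Thread(target=responder)
    t.start()
    start = time.perf_counter()
    for _ in range(iters):
        ping.set()         # send ping
        pong.wait()        # wait for pong
        pong.clear()
    elapsed = time.perf_counter() - start
    t.join()
    return elapsed / iters  # seconds per round trip

if __name__ == "__main__":
    print(f"{pingpong() * 1e9:.0f} ns per round trip")
```

On real hardware, running a benchmark like this with the two threads pinned to cores in the same CCX versus different CCXes is what produces the 42ns-versus-142ns spread described above.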

What this means, in aggregate, is that Epyc doesn't actually have a 64MB L3 in any meaningful sense. It has eight L3 caches of 8MB each. This works just fine for applications whose working sets fit into an 8MB cache slice, but it hampers Epyc in any application that doesn't fit this access model. As this memory latency benchmark from Anandtech shows, Epyc's memory latency in dual random reads is quite competitive below 8MB and significantly worse than Intel's above that point.
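The effect Anandtech measures comes from dependent ("pointer-chasing") loads over a growing working set: each access depends on the previous one, so prefetchers can't hide the latency, and average access time jumps once the working set outgrows a cache level. The sketch below illustrates the technique only; it is not Anandtech's tinymembench, and CPython's interpreter overhead and pointer-heavy list layout blunt the cache tiers, so only the trend is meaningful.

```python
import random
import time

def chase_latency(size_bytes, iters=200_000):
    """Average time per dependent access over a working set of ~size_bytes.

    Builds a random single-cycle permutation so every load depends on the
    previous one, defeating hardware prefetch. Toy illustration: a Python
    list slot is assumed to be ~8 bytes, and interpreter overhead adds a
    large constant to every access.
    """
    n = max(2, size_bytes // 8)
    order = list(range(n))
    random.shuffle(order)
    nxt = [0] * n
    for a, b in zip(order, order[1:] + order[:1]):
        nxt[a] = b  # follow the shuffled order as one big cycle
    i = 0
    start = time.perf_counter()
    for _ in range(iters):
        i = nxt[i]  # dependent load: next index comes from the last load
    return (time.perf_counter() - start) / iters

if __name__ == "__main__":
    for size in (4 << 20, 16 << 20, 64 << 20):
        print(f"{size >> 20:>3} MB: {chase_latency(size) * 1e9:.1f} ns/access")
```

Run natively in C over real arrays, this kind of loop is what produces the knee in the graph below: flat latency while the working set fits in an L3 slice, then a sharp climb toward DRAM latency beyond it.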

Graph by Anandtech

Doubling the amount of L3 cache per die will obviously improve performance in applications that fit into a 16MB access pool but not an 8MB slice. I want to caution, however, against concluding that this is the only change AMD has made to Epyc’s overall organization. The decision to organize Epyc as a set of 7nm chiplets that connect to a common I/O die is going to impact core-to-core communication. It’s not clear exactly how things will change with AMD’s Rome silicon because the company hasn’t released this information yet, but there are a lot of knobs and dials AMD could have tweaked. In addition to the physical changes that we know 7nm Epyc incorporates, there are potential changes to caching strategy, Infinity Fabric improvements, CCX design alterations, and even shifts in how AMD manages power consumption in its caches that could potentially impact memory latency. Knowing that the company likely doubled up on L3 cache does tell us something about Rome, but it isn’t the whole story.

How this change could impact desktop Ryzen is unclear. AMD could opt to keep the same L3 cache size per die, or it might fuse off some L3 to recover partially defective chips or to differentiate between Epyc and Ryzen parts. The company's original Ryzen launch reused the same silicon across all product families to the greatest extent possible, but some of its second-generation Ryzen 5 CPUs have smaller L3 caches (8MB on the Ryzen 5 2500X, versus 16MB on the Ryzen 5 1500X).

Now Read:
- AMD 7nm Epyc CPU Offers Core Enhancements, Huge Performance Gains
- Nvidia Tesla, AMD Epyc to Power New Berkeley Supercomputer
- AMD Reports Earnings Results, Significantly Improves Gross Margin
