ARM Goes for Broke With High-Performance Cortex-X1 CPU Core
ARM announced a pair of new CPU core designs on Tuesday and, in the process, launched a significant new strategy for competing in the market. The Cortex-A78 is a new core that prioritizes efficiency over raw performance. It’s an incremental improvement over the Cortex-A77, and it’ll undoubtedly show up in plenty of designs next year, either as the high-end core in a midrange or upper-midrange design or as the midrange core in a three-tier big.LITTLE-style design.
The ARM Cortex-X1, on the other hand, is something genuinely new and exciting. Up until now, there’ve effectively been two players in the ARM CPU market: Apple and everyone else. Apple has driven single-threaded ARM performance far above anything any other company has delivered, and it is the only company to offer an ARM SoC that could plausibly challenge the likes of AMD or Intel at the top of the performance stack (in single-threaded performance).
This slide shows the high-level differences between the two cores. The X1 doubles SIMD throughput, can dispatch 5 instructions or 8 macro-ops (Mops) per cycle, and offers up to 1MB of L2 and 8MB of L3.
Dispatch bandwidth has been increased by 33 percent, and the out-of-order window is larger (224 entries, up from 160) to help ARM extract more instruction-level parallelism (ILP). The integer pipelines appear identical to those of the Cortex-A78, but FPU resources have increased, with 2x the SIMD pipelines for NEON support. ARM continues to support 128-bit vector registers with no 256-bit or wider capability, but doubling up the 128-bit units does partially compensate for that.
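The arithmetic behind those figures can be sketched in a few lines of Python. This is a rough illustration rather than anything from ARM: the window sizes, the 8 Mops figure, and the 128-bit width come from the text above, while the implied baseline dispatch width and the baseline count of two NEON pipelines are assumptions, and the function names are made up for the example.

```python
# Back-of-the-envelope check of the figures quoted above.
# Values marked "assumption" are not stated in the article.

def percent_increase(old: float, new: float) -> float:
    """Relative increase from old to new, in percent."""
    return (new - old) / old * 100.0

def peak_simd_bits_per_cycle(pipelines: int, width_bits: int) -> int:
    """Peak vector bits processed per cycle if every pipeline is busy."""
    return pipelines * width_bits

# Out-of-order window sizes quoted in the article.
window_growth = percent_increase(160, 224)

# Assumption: a 33 percent gain that lands on 8 Mops per cycle implies
# a baseline of 6 Mops per cycle (8 / 6 is about 1.33).
implied_baseline_dispatch = 8 * 3 / 4
dispatch_growth = percent_increase(implied_baseline_dispatch, 8)

# Assumption: the baseline has 2 NEON pipelines, so "2x" means 4 on the X1.
baseline_pipelines = 2
x1_simd = peak_simd_bits_per_cycle(2 * baseline_pipelines, 128)
# Hypothetical alternative: the same 2 pipelines widened to 256 bits.
wide_simd = peak_simd_bits_per_cycle(baseline_pipelines, 256)

print(f"window growth: {window_growth:.0f}%")
print(f"dispatch growth: {dispatch_growth:.1f}%")
print(f"4 x 128-bit: {x1_simd} bits/cycle, 2 x 256-bit: {wide_simd} bits/cycle")
```

Under these assumptions, four 128-bit pipelines match the peak per-cycle throughput of two 256-bit pipelines. The compensation is still only partial, since wider vectors would cover the same data with fewer instructions, which eases pressure on the front end and the dispatch slots.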
Cache bandwidth is substantially higher, with doubled available bandwidth to both L1 and L2, as well as the already-mentioned doubling of L2 capacity. The L2 has been redesigned to improve its access latency and offers 10-cycle latency compared to 11 cycles on the Neoverse-N1. The L2 TLB is also 66 percent larger.
Two Chips to Rule Them All
ARM is separating the Cortex-A78 and the Cortex-X1 so the two families can play in somewhat different markets. The X1 is the performance-at-all-costs CPU core that’s unlikely to show up in clusters of 4-8 cores but could serve as the basis for a server play or a much higher performance ARM PC than anything we’ve seen to date. If you were serious about building an ARM-based Windows PC that could keep up with Intel or AMD, the X1 would be the easy choice. It may not be as power-efficient as the A78, but ARM needs to throw more silicon at x86 emulation to squeeze out better performance in the first place.
Overall, ARM is moving into position to challenge x86 more directly. I wouldn’t start drawing up title cards for an x86 versus ARM battle just yet. The long-foretold fight between the two architectures appeared poised to begin in the mid-2010s, just before Intel quit the tablet market. ARM hasn’t muscled into the desktop, laptop, and server markets yet, and until it does, we can’t declare that the two architectures have come to blows. Both AMD and Intel, however, ought to be looking nervously over their shoulders. They’ve got some potential competition on the horizon.