Tuesday, September 30, 2008

Nehalem - My Article in Pointer

Introduction:
Intel's next-generation microarchitecture (codenamed "Nehalem") represents the next step in processor energy efficiency, performance, and dynamic scalability. Designed from the ground up to take advantage of Intel's hafnium-based 45nm high-k metal gate silicon technology, Nehalem will also be the first to introduce Intel QuickPath technology.

"Atom" is the brand name for Intel's newly launched ultra-mobile processor line, but it could just as well be the name for Intel's next-generation 45nm microarchitecture. This new core microarchitecture, codenamed Nehalem, forms the basic building block from which Intel will assemble the brains for everything from high-end servers to notebooks. Nehalem represents a lot more than just a new processor; it's a significant shift for Intel at almost every level.

Increase in Bandwidth of the processor:
Moore's Law has given processor designers an embarrassment of transistor resources, and nowhere is that more apparent than in Intel's 45nm Nehalem processor. Debuting in four- and eight-core variants later this year, Nehalem packs a ton of hardware into a single processor socket. (Early numbers put the transistor count of a quad-core Nehalem at 781 million; no numbers for the eight-core model have appeared yet.) But trying to feed all of that hardware with the Intel platform's existing front side bus architecture would be folly. So, just as importantly, Nehalem also sounds the long-overdue death knell for Intel's aging front side bus architecture.

The radical change in system bandwidth that Intel's new QuickPath Interconnect (QPI) represents is perhaps the largest single factor that shaped Nehalem's design. Between QuickPath and Nehalem's integrated memory controller, a Nehalem processor will have access to an unprecedented amount of aggregate bandwidth, especially in two- and four-socket implementations.

What this means is that Intel no longer has to equip its processors with freakishly large unified caches designed to mitigate the effects of the bandwidth starvation with which Intel platforms currently struggle. The chipmaker is now free to use all of the transistors that Moore's Law affords more flexibly and intelligently, and this freedom has profound effects on every aspect of Nehalem.

Remixing the microprocessor:
In some ways, Nehalem is Intel's most significant processor since the Pentium 4, insofar as it signifies a major shift for the company's x86 strategy. The ill-fated Pentium 4 was a relatively radical design conceived with clock speed in mind. Nehalem, in contrast, is a more progressive evolution of Intel's existing, mobile-oriented Core 2 products; all of its changes are made with a view to exploiting the large amounts of parallelism that Moore's Law affords at the 45nm process node and to taking advantage of QPI's bandwidth.
Because of this emphasis on parallelism and bandwidth, "Nehalem," broadly conceived, is less of a "processor" in the classical sense than it is a set of building blocks that can be assembled in different configurations for different market segments.
Nehalem-derived processors will mix the following elements in different proportions, depending on the platform and product:
• Number of cores
• Number of memory channels on the integrated memory controller
• Type of memory supported (registered and unregistered DDR3 or FB-DIMM)
• Number of links in the QuickPath interface (for scaling QuickPath bandwidth)
• L3 cache size
• Power management features
• Integrated graphics
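One way to picture this mix-and-match approach is as a record with one field per element in the list above. The sketch below is purely illustrative: the class name, the field names, and the `threads_per_core` default are my own assumptions, not anything Intel has published. The sample values are the ones this article quotes for the four-core, eight-thread server part.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class NehalemConfig:
    """Hypothetical description of one Nehalem-derived product.

    Each field mirrors one bullet in the list above; this is an
    illustration, not an Intel data structure.
    """
    cores: int
    memory_channels: int
    memory_type: str                         # e.g. "DDR3" or "FB-DIMM"
    qpi_links: int
    l3_cache_mb: int
    power_management: Tuple[str, ...] = ()   # feature names, left empty here
    integrated_graphics: bool = False
    threads_per_core: int = 2                # assumed from "four-core, eight-thread"

    @property
    def hardware_threads(self) -> int:
        """Total hardware threads exposed by this configuration."""
        return self.cores * self.threads_per_core


# The four-core part, using the figures quoted in this article.
four_core_server = NehalemConfig(
    cores=4,
    memory_channels=3,
    memory_type="DDR3",
    qpi_links=4,
    l3_cache_mb=8,
)

print(four_core_server.hardware_threads)
```

A desktop or mobile variant would, under this picture, simply be another instance with fewer memory channels or QuickPath links, or with integrated graphics switched on.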
So far, Intel has said that Nehalem will scale from two to eight cores, but the company has talked about only the four-core, server-oriented part. All Nehalem configurations have a number of Nehalem cores—each with a 32KB, four-way set associative instruction cache, a 32KB, eight-way set associative data cache, and a private, low-latency 256KB L2 cache—all attached to an inclusive L3 cache that will be sized to fit the number of cores and target market.
The four-core part that Intel has detailed weighs in at 781 million transistors, many of which no doubt go to the very generous 8MB L3 cache. This part also includes an on-die, three-channel DDR3 memory controller and a QuickPath interface that supports four QuickPath links. As I noted above, the number of memory channels and QuickPath links in other Nehalem-based products can be expected to vary with the part and target market.
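To make the cache figures a little more concrete, here is a rough back-of-the-envelope sketch. It assumes a 64-byte cache line, which the figures above do not state, and the function names are my own. It works out how many sets each L1 cache has, and what share of an inclusive 8MB L3 would be taken up by copies of the four cores' private caches. That share is an upper bound, since it assumes nothing in a core's L1 is also held in its L2.

```python
KB = 1024
MB = 1024 * KB
LINE_BYTES = 64  # assumed cache line size; not given in the article


def num_sets(size_bytes, ways, line_bytes=LINE_BYTES):
    """Number of sets in a set-associative cache: size / (ways * line size)."""
    lines, remainder = divmod(size_bytes, line_bytes)
    if remainder or lines % ways:
        raise ValueError("cache size must be a whole number of sets")
    return lines // ways


def inclusive_share(cores, l3_bytes):
    """Upper bound on the fraction of an inclusive L3 that duplicates
    the private caches (32KB L1I + 32KB L1D + 256KB L2 per core)."""
    private_per_core = 32 * KB + 32 * KB + 256 * KB
    return cores * private_per_core / l3_bytes


l1i_sets = num_sets(32 * KB, ways=4)   # instruction cache, four-way
l1d_sets = num_sets(32 * KB, ways=8)   # data cache, eight-way
share = inclusive_share(cores=4, l3_bytes=8 * MB)

print(l1i_sets, l1d_sets, f"{share:.1%}")
```

Under these assumptions the instruction cache has 128 sets, the data cache has 64, and at most about 15.6 percent of the 8MB L3 holds data that also sits in a private cache, which leaves the bulk of it as genuinely shared capacity.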

Nehalem's core:
The basic building block of Intel's Nehalem family is a new version of the Core microarchitecture, which sports a number of major changes from its Core 2 Duo incarnation. In fact, Nehalem's core represents the biggest overhaul that this microarchitecture has undergone since the transition from Core to Core 2. Most areas of the processor have undergone major revisions in order to take advantage of the amount of bandwidth made available by QuickPath. The important exception here is the execution hardware, which, except for the addition of some floating-point and integer shuffle blocks on port 5, is substantially unchanged from Core 2.
At a high level, you can think of Nehalem as a design that takes the very wide, extremely robust execution engine from its predecessor, the Core 2 Duo, and focuses on keeping it as busy as possible by feeding it code and data at an unprecedented rate.

Conclusion:
With the advent of Nehalem, Intel makes the giant leap from what is fundamentally still its decades-old monolithic-processor-plus-FSB platform to a fully modern SoC (System on a Chip) and NUMA (Non-Uniform Memory Access) platform. This leap is long overdue, but when Intel makes it in the fourth quarter of this year, it will change everything about its broader platform picture. The increase in bandwidth alone will improve performance on multisocket servers, and even desktop and mobile platforms will benefit from the higher levels of integration and performance that the integrated memory controller brings with it.
In sum, Intel's entire processor product line will benefit from the large structural changes that have been outlined here, as well as from the smaller, core-specific improvements that Nehalem embodies. And from here on out, Nehalem's mix-and-match approach to products and platforms will be par for the course for Intel, as well as for rival AMD. AMD has its own plan for pairing a CPU with integrated graphics, which it refers to as Fusion, though initial versions will put the CPU and GPU in the same overall package rather than on the same physical piece of silicon.
With the launch of the first four-core, eight-thread Nehalem, the future of hardware will have arrived; and with all of that parallelism available, the performance ball will be squarely in the software industry's court.
