POET Technologies Inc.

Below is an article recently published on PCMag.com that highlights several of the current limitations Dr. Taylor discussed at the EC.

http://forwardthinking.pcmag.com/none/322231-chipmaking-challenges-face-moore-s-law

Chipmaking Challenges Face Moore's Law

Every few years there are stories about how Moore's Law – the concept that the number of transistors in a given area doubles every two years or so – is dying. Such stories have been around for decades, but we still continue to see new chips with more transistors every few years, pretty much on schedule.

For instance, in February Intel introduced a 4.3-billion-transistor chip called the Xeon E7 v2, or Ivytown, on a 541 square millimeter die using its 22nm process. A decade ago, Intel's high-end Xeon, known as Gallatin, was a 130nm chip with 82 million transistors on a 555 square millimeter die. That's roughly in line with a doubling every two years.
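As a quick back-of-the-envelope check (a sketch using only the figures above, and treating "a decade ago" as roughly 11 years):

```python
import math

# Transistor densities from the two chips cited above
gallatin = 82e6 / 555    # transistors per mm^2, 130nm Xeon (Gallatin)
ivytown = 4.3e9 / 541    # transistors per mm^2, 22nm Xeon E7 v2 (Ivytown)

years = 11               # roughly 2003 to 2014
doublings = math.log2(ivytown / gallatin)
print(f"Density grew {ivytown / gallatin:.0f}x, "
      f"one doubling every {years / doublings:.1f} years")
```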

Of course, that doesn't mean it will continue working forever, and indeed, chipmaking is going through some big changes that affect both the manufacturing and the design of chips, and all these will have lasting impacts on users.

Most obviously, it has been clear for a long time that clock speeds aren't getting faster. After all, Intel introduced Pentium chips in 2004 that ran at 3.6 GHz; today the company's top-end Core i7 runs at 3.5 GHz with a maximum turbo speed of 3.9 GHz. (Of course, there are some people who overclock, but that's always been the case.)

Instead, designers reacted by adding more cores to the chips and by increasing the efficiency of each individual core. Today, even the lowest-end chip you can get for a desktop or laptop is a dual-core chip, and quad-core versions are commonplace. Even in phones, we're now seeing a lot of quad-core and even octa-core parts.

That's great for running multiple applications at the same time (multi-tasking) or for applications that can really take advantage of multiple cores and threads, but most applications still don't do that. Developers – particularly those who create developer tools – have spent a lot of time making their applications work better with multiple cores, but there are still a lot of applications that depend mostly on single-threaded performance.
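The gap is easy to see in code: extra cores only help when the work is explicitly partitioned across them. Here is a minimal illustrative sketch (not from the article) using Python's multiprocessing:

```python
from multiprocessing import Pool

def work(chunk):
    # A CPU-bound kernel; only code that explicitly partitions
    # its input like this benefits from extra cores.
    return sum(x * x for x in chunk)

if __name__ == "__main__":
    data = list(range(1_000_000))
    chunks = [data[i::4] for i in range(4)]  # split the work four ways
    with Pool(4) as pool:
        total = sum(pool.map(work, chunks))
    print(total)
```

A single-threaded application, by contrast, runs the whole loop on one core and gains nothing from the other three.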

In addition, processor developers are putting a lot more graphics cores and other specialized cores (such as those that encode or decode video, or encrypt and decrypt data) within an application processor, in what much of the industry calls heterogeneous processing. AMD, Qualcomm, and MediaTek have all been pushing this concept, which makes a lot of sense for some things. It certainly helps with integration, making chips smaller and less power-hungry, and it seems a natural fit for mobile processors – as in the big.LITTLE approach ARM has taken, which pairs more powerful but more power-hungry cores with cores that draw very little power. For many of us, chips that use less power for the same performance – and therefore mobile devices that go longer on a battery charge – are a big deal.

The use of a tremendous number of cores – whether graphics cores or specialized x86 cores – is certainly having a huge impact on high-performance computing, where products such as Nvidia's Tesla boards and Intel's Xeon Phi (Knights Corner) dominate. Indeed, most of the top supercomputers today use one of these approaches. But it still only works for certain kinds of uses, primarily applications that use SIMD (single instruction, multiple data) commands. For other things, this approach doesn't work.
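SIMD simply means applying one instruction to many data elements at once. A small sketch of the pattern using NumPy, whose array operations dispatch to vectorized kernels under the hood:

```python
import numpy as np

a = np.arange(1_000_000, dtype=np.float32)
b = np.ones_like(a)

# One operation applied to a million elements at once -- the
# single-instruction, multiple-data pattern these chips exploit.
c = a + b

# The scalar equivalent touches one element per instruction,
# which is the kind of code SIMD hardware cannot help:
# for i in range(len(a)): c[i] = a[i] + b[i]
```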

And it's not just that the chips can't run faster. On the manufacturing side, there are other obstacles to putting more transistors on a die. Over the past decade, we've seen all sorts of new techniques for chipmaking, moving from the traditional mixture of silicon, oxygen, and aluminum toward new techniques such as "strained silicon" (where engineers stretch out the silicon atoms), replacing the gates with high-K/metal gate materials, and most recently moving from traditional planar gates toward 3D gates known as FinFETs, or "TriGate" in Intel parlance. The first two techniques are now used by all the advanced chipmakers, with the foundries planning to introduce FinFETs in the next year or so, following Intel's 2012 introduction.

One alternative is FD-SOI (fully depleted silicon-on-insulator), a technique that STMicroelectronics in particular has pushed, which uses a thin insulating layer between the silicon substrate and the channel to provide better electrical control of tiny transistors, in theory delivering better performance and lower power. But so far it doesn't have nearly the momentum among the big manufacturers that FinFETs do.

Lately, Intel has been making a big deal of how far ahead it is at chipmaking, and indeed it started shipping volume production of its Core microprocessors on its 22nm process with TriGate technology about two years ago and plans to ship 14nm products in the second half of this year. Meanwhile, the big chip foundries are planning on 20nm production in volume later this year using traditional planar transistors, with 14 or 16nm products with FinFETs slated for next year.

Intel has been showing off slides, such as one from its recent analyst day, arguing how far ahead it is on chip density.

But the foundries disagree; a slide from TSMC's most recent investor call says the company can close the gap next year.

Obviously, only time will tell.

In the meantime, achieving smaller feature sizes is getting harder with the traditional lithography tools used to etch lines into a silicon chip. Immersion lithography, which the industry has used for years, has reached its limit, so vendors are now turning to "double patterning" – or even more passes – to get finer dimensions. Though we have seen a bit of progress lately, the long-awaited move to extreme ultraviolet (EUV) lithography, which should offer finer control, remains years away.

Things like FinFETs and multiple patterning are helping make the next generation of chips, but at increasing costs. Indeed, a number of analysts are saying that the cost per transistor of production at 20nm may not be an improvement over the cost at 28nm, because of the need for double patterning. And new structures like FinFETs will likely also be more expensive, at least at the beginning.
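The arithmetic behind that concern is simple: if double patterning raises wafer cost by about as much as the shrink raises density, cost per transistor goes nowhere. A sketch with purely hypothetical numbers (not the analysts' actual figures):

```python
# Purely hypothetical, illustrative figures -- not actual foundry pricing.
wafer_cost_28nm = 1.00       # normalized wafer cost
density_gain = 1.9           # ~1.9x transistors per mm^2 at 20nm
litho_overhead = 1.9         # extra mask/litho passes for double patterning

wafer_cost_20nm = wafer_cost_28nm * litho_overhead
print(wafer_cost_20nm / density_gain)  # ~1.0: cost per transistor is flat
```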

As a result, many chipmakers are looking at even more exotic methods of improving density even if traditional Moore's Law techniques don't work.

NAND flash memory uses the most advanced process technology, so it is already running into serious issues with conventional horizontal scaling. The solution is to create vertical NAND strings. The individual memory cells won't get any smaller, but because you can stack so many on top of one another – all on the same substrate – you get much greater density in the same footprint. For example, a 16-layer 3D NAND chip manufactured on a 40nm process would be roughly equivalent to a conventional 2D NAND chip made on a 10nm process (the most advanced process in use now is 16nm). Samsung says it is already manufacturing its V-NAND (Vertical NAND), and Toshiba and SanDisk will follow with what they call p-BiCS. Micron and SK Hynix are also developing 3D NAND, but seem to be focused on standard 2D NAND for the next couple of years.
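That 16-layer/40nm versus 10nm equivalence is simple geometry, assuming a cell's planar area scales with the square of the feature size (a sketch, not a real density model):

```python
# Assumes planar cell area scales with (feature size)^2.
def relative_density(node_nm, layers=1):
    return layers / node_nm ** 2

print(relative_density(40, layers=16))   # 0.01
print(relative_density(10))              # 0.01 -- the same effective density
```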

Note that this is not the same thing as 3D chip stacking. DRAM is also hitting a scaling wall, but it has a different architecture, requiring one transistor and one capacitor in each cell. The solution here is to stack multiple fabricated DRAM dies on top of one another, drill holes through the substrates, and then connect them using a technology called through-silicon vias (TSVs). The end result is the same – higher density in a smaller footprint – but it is more of an advanced packaging process than a new fabrication process. The industry plans to use this same technique to stack memory on top of logic, not only to trim the footprint but also to improve performance and reduce power. One solution that has gotten a lot of attention is Micron's Hybrid Memory Cube. Eventually, 3D chip stacking could be used to create powerful mobile chips that combine CPUs, memory, sensors, and other components in a single package, but there are still many issues to resolve in the manufacturing, testing, and operation of these so-called heterogeneous 3D stacks.

But it's the next generation of techniques the chipmakers have talked about that seems much more exotic. At chip conferences, you hear a lot about directed self-assembly (DSA), in which new materials will actually assemble themselves into the basic transistor pattern – at least for one layer of a chip. It sounds a little like science fiction, but I know a number of researchers who believe this really isn't far off at all.

Meanwhile, some researchers are looking at using a class of new materials known as III-V semiconductors with more traditional styles of manufacturing, while others are looking at different semiconductor structures to supplement or replace FinFETs, such as nanowires.

Another method of reducing costs is to make transistors on a larger wafer. The industry has gone through such transitions before, moving from 200mm wafers to 300mm wafers (about 12 inches in diameter) about a decade ago. Now there is a lot of talk about moving to 450mm wafers, with most of the big manufacturers and tool suppliers creating a consortium to look at the necessary technologies. Such a transition should reduce manufacturing costs, but it will carry a high capital cost, as it will require new factories and a new generation of chipmaking tools. Intel has a plant in Arizona that would be capable of 450mm production, but it has delayed ordering the tools, and many of the tool vendors are delaying their offerings as well, making it likely that the first real production on 450mm wafers won't happen until 2019 or 2020 at the earliest.
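The attraction is basic geometry: a 450mm wafer has 2.25 times the area of a 300mm wafer, and therefore roughly 2.25 times the dies per wafer, ignoring edge losses:

```python
import math

def wafer_area(diameter_mm):
    return math.pi * (diameter_mm / 2) ** 2

# 2.25x the area, and therefore roughly 2.25x the dies per wafer
print(wafer_area(450) / wafer_area(300))
```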

It all seems to be getting harder, and more expensive. But that's been the case for semiconductor manufacturing since the beginning. The big question is always whether the improvements in performance and the extra density will be worth the extra cost in manufacturing.

ISSCC: Extending Moore's Law
How to extend Moore's Law was a major topic at last month's International Solid-State Circuits Conference (ISSCC). Mark Horowitz, a Stanford University professor and founder of Rambus, noted that the reason we have computing in everything today is that computing became cheap, thanks to Moore's Law and Dennard's rules on scaling. This has led to expectations that computing devices will become ever cheaper, smaller, and more powerful. (Stanford has plotted the performance of processors over time at cpudb.stanford.edu.)

But he noted that the clock frequency of microprocessors stopped scaling around 2005 because power density became a problem. Engineers hit a real power limit – they couldn't make the chips any hotter – so now all computing systems are power-limited. And, as he noted, the power supply voltage is scaling only very slowly.
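Horowitz's point follows from the standard dynamic-power relation, P ≈ C·V²·f. A sketch (with illustrative numbers, not figures from his talk) of why Dennard scaling once kept power density flat, and why it stopped when supply voltage stopped scaling:

```python
# Dynamic CMOS power: P ~ C * V^2 * f
def power_density(C, V, f, area):
    return C * V ** 2 * f / area

s = 0.7  # linear shrink per node generation

base = power_density(C=1.0, V=1.0, f=1.0, area=1.0)
# Classic Dennard scaling: voltage shrinks with the transistor,
# so power density stays flat even as frequency rises.
dennard = power_density(C=s, V=s, f=1 / s, area=s ** 2)
# Post-2005: voltage barely scales, so power density climbs.
stalled = power_density(C=s, V=1.0, f=1 / s, area=s ** 2)

print(base, dennard, stalled)  # 1.0, 1.0, ~2.0
```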

The industry's first inclination for solving this problem is to change the technology. "Unfortunately, I am not optimistic that we are going to find a technology to replace CMOS for computing," he said, for both technical and economic reasons. The only way to get operations per second to increase, therefore, is to decrease the energy per operation, he said, suggesting this is why everyone has multi-core processors today, even in their cell phones. But the problem is that you can't keep adding cores, because you quickly hit a point of diminishing returns in terms of performance, energy, and die area. CPU designers have known this for some time and have been optimizing CPUs accordingly.
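Those diminishing returns are captured by Amdahl's Law, a standard model (not one specific to Horowitz's talk): the speedup on n cores is limited by the fraction of work that can be parallelized:

```python
def amdahl_speedup(p, n):
    # p: fraction of the work that parallelizes; n: number of cores
    return 1 / ((1 - p) + p / n)

# Even with 90% of the work parallelizable, returns flatten fast:
for n in (2, 4, 8, 16, 64):
    print(n, round(amdahl_speedup(0.9, n), 2))
# 2 cores -> 1.82x ... 64 cores -> 8.77x, against a hard 10x ceiling
```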

Horowitz said we shouldn't forget about the energy used by memory. In his presentation, he showed the energy breakdown for a current, unidentified 8-core processor, in which the CPU cores used about 50 percent of the energy and the on-die memory (L1, L2, and L3 caches) used the other 50 percent. This doesn't even include the external DRAM system memory, which could end up being 25 percent or more of the total system energy usage.

Many people are talking about using specialized hardware (such as ASICs), which can be a thousand times better in terms of energy per operation than a general-purpose CPU. But as Horowitz noted, the efficiency here comes in part because such hardware is used for specific applications (such as modem processing, image processing, and video compression and decompression) that basically don't access memory very much. That is why it helps so much with energy – it's not so much about the hardware as about moving the algorithm to a much more restricted space.

The bad news is that this restricts the applications you can build. The good news is that you might be able to build a more general engine that could handle these sorts of "high-locality" applications – ones that don't need to access memory much. He refers to this as the Highly Local Computation Model and to the "stencil applications" that can run on it. This of course requires a new programming model. Stanford has developed a domain-specific language and a compiler that can build these stencil applications and run them on FPGAs and ASICs.
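To make "stencil" concrete: a stencil application updates each point from a small, fixed neighborhood, which is exactly the high locality Horowitz described. A minimal sketch of a three-point stencil in NumPy (illustrative only; not Stanford's DSL):

```python
import numpy as np

def smooth(u):
    # Three-point stencil: each output depends only on its immediate
    # neighbors, so the computation has very high locality.
    out = u.copy()
    out[1:-1] = (u[:-2] + u[1:-1] + u[2:]) / 3.0
    return out

u = np.random.rand(1024)
for _ in range(10):  # repeated local passes map well onto FPGAs/ASICs
    u = smooth(u)
```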

Also at ISSCC, Ming-Kai Tsai, chairman and CEO of MediaTek, said that people have been asking since the early 1990s how long Moore's Law will actually last. But as Gordon Moore said at ISSCC in 2003, "No exponential is forever. But we can delay it forever." The industry has done a great job of more or less sustaining Moore's Law, he said, and the transistor cost has continued its historic decline. For the cost of 100 grams of rice (about 10 cents), you could buy only 100 transistors in 1980, but by 2013 you could buy 5 million transistors.
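Taking the rice comparison at face value, the per-transistor arithmetic looks like this:

```python
price = 0.10                     # dollars, the cost of ~100 grams of rice
per_transistor_1980 = price / 100
per_transistor_2013 = price / 5_000_000
print(per_transistor_1980 / per_transistor_2013)  # 50000.0: a 50,000x drop
```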

Tsai said mobile devices have hit a ceiling because processors can't run efficiently at speeds beyond 3 GHz and because battery technology hasn't improved much. MediaTek has been working on this problem by using multicore CPUs and heterogeneous multiprocessing (HMP). He said the company introduced the first true 8-core HMP processor in 2013, and earlier this week, it announced a 4-core processor using its PTP (Performance, Thermal and Power) technology to further increase performance and reduce power. He also talked about the rapid progress in connectivity. Many mobile applications that were formerly impossible are now viable because of these improvements in WLAN and WWAN networks, he said.

MediaTek is working on different technologies for "Cloud 2.0," including wireless charging solutions, the "Aster" SoC for wearables (measuring only 5.4 x 6.6 millimeters), and heterogeneous systems as part of the HSA Foundation, he said. Cloud 2.0, according to Tsai, will be characterized by many more devices – particularly wearables – with many more radios: more than 100 radios per person by 2030.

The big challenges for Cloud 2.0 will be energy and bandwidth, Tsai said. The first will require innovative integrated systems, hardware and software solutions; better battery technology; and some form of energy harvesting. The second will require more efficient use of available spectrum, adaptive networks and more reliable connectivity.

Whatever happens with chip making, it's certain to lead to new applications and new decisions that chipmakers, product designers, and ultimately end users will face.
