Lean machines: Research aims to produce more efficient computer chips
There was a time when giving computers more “power,” that
is to say “performance,” was an unabashed mantra of the semiconductor
industry. No chip seemed fast enough. But a more ominous meaning for
power has taken over: Wattage and heat. The industry is now looking for
a long-term solution to a troublesome problem.
“The cost of energy for a server exceeds the purchase price of
that machine in a year and a half,” says Bill Dally, chair of the
computer science department and also a professor of electrical engineering. “It’s
also a significant amount of carbon that’s being released into
the atmosphere because people are powering servers.”
In fact, computer servers and the infrastructure required to cool them
now account for more than 1 percent of total electricity usage in the
United States, according to Jonathan Koomey, a consulting professor of
civil and environmental engineering. Meanwhile, people are depending
increasingly on mobile devices that can only last so long on a given
battery charge. More efficient cell phones and laptops would last longer.
Hardly oblivious to the problem, chip companies have taken important
steps to improve their power performance, but achieving more than a stopgap
or near-term solution, say Dally and electrical engineering and computer
science Professor Mark Horowitz, could require substantially rethinking
how chips are designed.
For example, chip companies have made chips more efficient by optimizing
them for just one application, such as handling video encoding and decoding,
or handling packets in a high-speed network. But these application-specific
integrated circuits (ASICs) are prohibitively expensive to make for all
but the most popular uses.
Eager to provide chipmakers with a better solution, each professor is
now pursuing innovative ideas to reduce the financial and environmental
footprint of future computers. Dally has made a very efficient
general-purpose processor, while Horowitz is investigating how new design methods
could slash the cost of creating ASICs.
Location, location, location
In describing his new processor, which is, on average, 32 times more
energy efficient than a standard chip with comparable function, Dally
readily makes analogies to real estate.
The reason is that one of the biggest electrical problems facing chips
today is the distance that signals must traverse to make them work. About
70 percent of a chip’s energy
is expended pushing bits from distant memory banks to the logic units
that must process them, and then hauling that output to its next destination.
A key innovation in his EEC (efficient embedded computing) processor
is, in a sense, the same kind of shift that environmentalists call
for when they advocate greater consumption of locally grown produce.
By putting smartly managed storage closer to the logic units, Dally is
greatly reducing data transport much in the same way that people buying
vegetables at a farmer’s market would greatly reduce the need for
long-haul boat, plane, or train transport of food.
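To get a feel for how much the roughly 70 percent figure cited above matters, here is a minimal back-of-the-envelope sketch. It is not Dally's model: it assumes, purely for illustration, that a chip's energy splits cleanly into a data-movement share and everything else, and that only the data-movement share shrinks. The function name and the reduction factors tried are hypothetical.

```python
def overall_energy_gain(movement_fraction, movement_reduction):
    """Overall efficiency gain when only data-movement energy shrinks.

    movement_fraction:  share of total energy spent moving data (0..1).
    movement_reduction: factor by which that share is cut (assumed value).
    Simplifying assumption: all other energy use stays exactly the same.
    """
    remaining = (1.0 - movement_fraction) + movement_fraction / movement_reduction
    return 1.0 / remaining


# Try a few hypothetical reduction factors for the 70 percent share.
for factor in (2, 10, 100):
    gain = overall_energy_gain(0.70, factor)
    print(f"movement energy cut {factor}x -> overall gain {gain:.2f}x")
```

Under this simplified model the overall gain can never exceed 1/0.3, or about 3.3 times, no matter how cheap data movement becomes, because the other 30 percent is untouched. That suggests shorter data paths alone would not account for a much larger improvement; the energy spent outside data movement would have to fall as well.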
To bring data closer to logic is helpful, but not if it’s the
wrong data. Dally therefore employs sophisticated optimization techniques
in his processor’s compiler to ensure that the most deserving data
is given the best proximity as often as possible. Here the best analogy
is to a neighborhood convenience store chain that employs demand forecasting
to ensure that the products in highest demand are always in stock.
To understand the difference optimization can make, consider how typical
processors “decide” what to keep in a cache. Say such a cache
can hold four units of information. It would simply keep the most recently
used four units. If the program the chip is executing, however, cycles
through a loop that uses five units, the cache will never have the data
that is needed next. It will always have to send away, at great energy
cost, for the missing unit.
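The four-slot cache and five-unit loop described above can be simulated in a few lines. This is only a sketch: the "keep the most recently used" rule is modeled as a plain least-recently-used (LRU) cache, and the second function is a hypothetical software-managed alternative invented for comparison, not the policy Dally's compiler actually uses.

```python
from collections import OrderedDict


def lru_misses(accesses, capacity):
    """Count misses for a cache that keeps the most recently used units."""
    cache = OrderedDict()
    misses = 0
    for unit in accesses:
        if unit in cache:
            cache.move_to_end(unit)        # mark as most recently used
        else:
            misses += 1                    # must fetch from distant memory
            if len(cache) >= capacity:
                cache.popitem(last=False)  # evict the least recently used
            cache[unit] = True
    return misses


def pinned_misses(accesses, pinned):
    """Hypothetical planned policy: units in `pinned` stay resident once
    loaded, and a single extra slot holds whichever other unit came last."""
    resident = set()
    scratch = None
    misses = 0
    for unit in accesses:
        if unit in pinned:
            if unit not in resident:
                misses += 1
                resident.add(unit)
        elif unit != scratch:
            misses += 1
            scratch = unit
    return misses


loop = ["A", "B", "C", "D", "E"]   # five units used on every pass
accesses = loop * 100              # 100 passes through the loop

print(lru_misses(accesses, capacity=4))                # 500: every access misses
print(pinned_misses(accesses, pinned={"A", "B", "C"})) # 203: three slots pinned, one shared
```

With the recency rule, the unit needed next is always the one that was just evicted, so all 500 accesses miss. Deciding in advance to hold three units in place and share the fourth slot cuts that to 203 misses using the same four slots, which is the kind of gain that knowing the access pattern ahead of time makes possible.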
Dally’s compiler looks ahead in the code of the programs written
for his chip and anticipates what data will likely be needed next. The
compiler also analyzes what the flow of data around the chip will look
like, and tries to plan the most efficient paths for that flow.
The results of this and other design advances are to be published
in an upcoming paper. They represent a major improvement over the energy
efficiency of a conventional embedded Reduced Instruction Set Computer
(RISC) processor. Dally’s group tested each chip’s power
usage while running standard software tasks, such as encryption, signal
processing, image encoding and mathematical tasks. On average the EEC
processor used 32 times less power (the median factor was a little more
than 20) than the RISC processor.
The technology embodied in the EEC processor promises both to reduce
the development time of demanding embedded applications and to enable
applications whose sales volume does not justify the
high development cost of an ASIC, such as scientific instrumentation
and aids to the handicapped, Dally says.
Virtual virtues?
“The best way I know to create an efficient design is to create
an application-optimized design,” says Horowitz, who will become
chair of the electrical engineering department this summer. “But
creating this design has got to be cheaper.”
The reason ASIC design is expensive is that modern chips are very complex, and it takes a lot of work to ensure that such a complex system really does what it is supposed to do. Horowitz’s approach is to make the result of this expensive design process more useful than a single chip. He wants designers to create a chip generator rather than a chip. In other words, rather than creating a flexible video processor, they would create a tool that can produce a family of video processors: in effect, a virtual, extremely flexible video chip. To create an optimized processor, application experts would configure or program the generator to run their application. The generator would then use this information to create an implementation optimized for that application’s energy and performance needs.
To some, the generator may seem a little magical, but Horowitz has thought intently about the challenges he’ll have to overcome to succeed.
“Can I build a generator that is flexible enough to be usable?” he asks. “Can I generate not only the design of the chips but also all the testing collateral that’s needed to validate that it works? And can I take logic design, optimize it and create an efficient silicon implementation?”
The research is young, but he is optimistic that it can succeed.
“I don’t know that I can build a chip generator, but at least right now I don’t know of any reason why I can’t,” he says. His group is working to demonstrate a prototype chip generator this year.
Stopgap measures and small careful steps, after all, are not going to be enough to change the basic nature of the power problem. That will more likely come about either by massive leaps in the efficiency of general processors or large reductions in the cost of designing ASICs. Or both.
March 2008