The following text is excerpted from Chapter 3 of the book Power Management in Mobile Devices by Findlay Shearer. Printed with permission from Newnes, a division of Elsevier, copyright 2008. For more information about this title, visit www.elsevierdirect.com.
The book, for technical managers, software developers, hardware designers, and manufacturing engineers in the mobile device industry, provides an overview of the challenges facing designers of feature-rich mobile devices that call for a smaller form factor and lower cost. It follows with in-depth coverage of energy-optimized software, batteries and displays, power management ICs (PMICs), and system-level approaches to energy conservation.
Low Power Design Techniques
Dynamic Process Temperature Compensation
A common engineering philosophy when designing a System-on-a-Chip (SoC) is to ensure that it performs under "worst-case" conditions. In semiconductor manufacturing, worst case refers to very high temperatures and to variations in the manufacturing process; transistor performance varies within a predefined range of parameters. Thus, at a given voltage, some SoCs from the same wafer lot can support higher operating frequencies (best case, fast process), while others support only the lower frequencies at the bottom of the predefined performance window (worst case, slow process) (Figure 3.1).
The dynamic process temperature compensation (DPTC) mechanism measures the frequency of a reference circuit that depends on the process speed and temperature. This reference circuit captures the product's speed dependency on the process technology and the current operating temperature, so that the voltage can be lowered to the minimum level needed to support the required operating frequency (Figure 3.2).
Without such compensation, a mobile device containing a fast-process SoC and operating in a moderate climate still runs at the voltage calculated for the worst case, which is higher than the required frequency actually needs. The resulting energy savings are less than optimal.
The DPTC concept allows the supply voltage to be adjusted to match the process corner and the SoC temperature. If the process corner is "best case," a lower supply voltage can be applied while still supporting the required performance of the SoC. Similarly, the temperature of the part can be used to adjust the supply voltage.
The available performance is monitored by different types of reference circuits composed of free-running ring oscillators. The outputs of these reference "sense" circuits are processed by internal control and compare logic and written to software-readable registers. If there is a significant (predefined) change in the reference circuit delay values, an interrupt is triggered. The relevant software interrupt routine calculates the new required voltage and reprograms the Power Management IC (PMIC) to supply the new voltage to the SoC. Once the new voltage is applied, the reference circuit delay values change, providing feedback and closing the loop of the DPTC mechanism. This ensures that the system stabilizes at the proper voltage level. Software control permits fast and simple changes.
DPTC can result in power savings of approximately 35%, significantly improving battery life.
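The closed loop described above can be sketched in a few lines of Python. This is only an illustrative model, not the mechanism of any particular SoC: the ring-oscillator formula, the voltage range and step, the target count, the hysteresis margin, and all function names are assumptions made for the example.

```python
# Hypothetical model of the DPTC feedback loop; every constant is an assumption.
V_MIN, V_MAX, V_STEP = 0.90, 1.30, 0.025   # assumed PMIC range and step, in volts
TARGET_COUNT = 1000                         # assumed count needed for the required frequency
MARGIN = 30                                 # assumed hysteresis band, in counts

def ring_osc_count(voltage, process_speed, temp_c):
    """Toy reference circuit: faster at higher voltage or on a fast process, slower when hot."""
    return int(850 * process_speed * voltage * (1 - 0.001 * (temp_c - 25)))

def dptc_interrupt(voltage, count):
    """Interrupt routine: return the next PMIC voltage for the measured count."""
    if count < TARGET_COUNT and voltage < V_MAX:
        return round(voltage + V_STEP, 3)   # too slow: raise the supply
    if count > TARGET_COUNT + MARGIN and voltage > V_MIN:
        return round(voltage - V_STEP, 3)   # excess speed: lower the supply
    return voltage                          # inside the band: leave it alone

def settle(process_speed, temp_c, voltage=V_MAX):
    """Re-measure after each voltage change until the voltage stops moving."""
    for _ in range(100):
        count = ring_osc_count(voltage, process_speed, temp_c)
        new_voltage = dptc_interrupt(voltage, count)
        if new_voltage == voltage:
            break
        voltage = new_voltage
    return voltage
```

In this toy model, a slow, hot part (`settle(1.0, 85)`) stays near the top of the voltage range, while a fast part at room temperature (`settle(1.25, 25)`) settles at a noticeably lower supply. The margin is chosen larger than the count change produced by one voltage step so that the loop does not oscillate.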
Static Process Compensation
Static process compensation (SPC) follows a similar path to DPTC but without the temperature compensation aspect. SoCs are designed at the worst-case process corner (see Figure 3.1). However, production wafer lots are typically manufactured close to
a typical point in the "box" and as a result can run at a lower voltage and still meet performance requirements.
SPC is a technique for identifying the minimum operating voltage of each SoC on the production line and programming fuses with that information. Software reads the fuses to set the operating voltage for the SoC.
The basic circuits required to support SPC are similar to those employed in DPTC and integrated into the SoC. They include a ring oscillator, support register, and fuses (Figure 3.3).
The frequency of the ring oscillator correlates with the process corner. The support register captures the oscillator frequency and the fuses are programmed to define the counter frequency and voltage for the SoC.
SPC does not compensate the voltage for temperature, and the operating voltage is not changed dynamically. In addition, the SoC manufacturer can test the SoC at the SPC-defined voltage. SPC is a subset of DPTC, and it has been demonstrated that the temperature compensation aspect of DPTC provides only marginal additional benefit to energy conservation.
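The two halves of SPC, fuse programming on the production line and fuse reading at boot, can be sketched as follows. The bin table, the fuse encoding, and the function names are hypothetical; a real part would use values from its own characterization data.

```python
# Hypothetical SPC flow; the bin table, fuse encoding and names are assumptions.
# Each entry: (minimum ring-oscillator frequency in MHz, operating voltage in volts).
VOLTAGE_BINS = [(520, 1.00), (480, 1.10), (440, 1.20)]  # fastest bin first
WORST_CASE_VOLTAGE = 1.30

def program_fuses(ring_osc_mhz):
    """Production line: pick the fastest bin the part qualifies for and return its fuse code."""
    for code, (min_mhz, _voltage) in enumerate(VOLTAGE_BINS, start=1):
        if ring_osc_mhz >= min_mhz:
            return code
    return 0  # code 0: slow corner or unprogrammed, fall back to worst case

def boot_voltage(fuse_code):
    """Boot software: translate the fuse code into the fixed operating voltage."""
    if 1 <= fuse_code <= len(VOLTAGE_BINS):
        return VOLTAGE_BINS[fuse_code - 1][1]
    return WORST_CASE_VOLTAGE
```

A part measured at 500 MHz would be fused with code 2 and boot at 1.10 V in this sketch. The voltage is then fixed for the life of the part, which is the difference from DPTC.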
Power Gating
Like voltage gating, power gating involves temporarily shutting down blocks in a design when the blocks are not in use. And, like voltage gating, the technique is complex. Power gating has to be planned at the SoC design phase, specifically at the Register Transfer Level (RTL). The engineer has to design a power controller that determines which blocks shut down at a particular time, and has to decide at what voltage to run the different blocks (Figure 3.4).
Traditionally, two methods for power gating are fine-grained and coarse-grained. In fine-grained power gating, designers place a switch transistor between ground and each gate. This approach allows designers to shut off the connection to ground whenever a
series of functions is not in use. This technique is done with every cell in the library.
Fine-grained power gating involves trade-offs. On the one hand, it is fairly easy to characterize the power of each cell. On the other hand, the area hit is very significant: cells become two to four times larger (Figure 3.5).
To keep the area overhead to a minimum, fine-grained power gates are implemented as NMOS footer switches to ground. The timing impact of the IR drop across the switch and the behavior of the clamp are easy to characterize. It is still possible to use a traditional design flow to deploy fine-grained power gating.
Designers can also mix and match cells, power gating some and not others. Cells with a high threshold voltage need not use power gating. For the most part, the power penalty is just too large, and many design groups are instead using coarse-grained power gating, in which designers create a power switch network. This is essentially a group of switch transistors that, in parallel, turn entire blocks on and off. The technique does not have the area hit of the fine-grained technique because, for a given block of logic, the switching activity will be less than 100%. In addition, due to the propagation delay through the cells, the switching activity will be distributed in time. However, coarse-grained power gating is harder to characterize on a cell-by-cell basis (Figure 3.6).
Unlike fine-grained power gating, coarse-grained power gating disconnects the power from all logic, including the registers, resulting in the loss of all state. If the state is to be preserved when the power is disconnected, it must be stored somewhere that is not power gated. Most commonly this is done locally to the registers by swapping in special "retention" registers, which have an extra storage node that is separately powered. There are a number of retention register designs that trade off performance against area. Some use the existing slave latch as the storage node, whilst others add an additional "balloon" latch storage node. However, they all require one or more extra control signals to save and restore the state.
The key advantage of retention registers is that they are simple to use and are very quick to save and restore state. This means that they have a relatively low energy cost of entering and leaving standby mode and so are often used to implement "light sleep."
However, in order to minimize the leakage power of these retention registers during standby, it is important that the storage node and associated control signal buffering are implemented using high threshold low leakage transistors.
If very low standby leakage is required then it is possible to store the state in main memory and cut the power to all logic including the retention registers. However, this technique is more complex to implement and also takes much longer to save and restore
state. This means that it has a higher energy cost of entering and leaving standby mode and so is more likely to be used to implement "deep sleep."
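The light-sleep versus deep-sleep trade-off can be made concrete with a simple break-even calculation: deep sleep only pays off if the idle period is long enough for its lower leakage to repay its higher entry and exit energy. The figures and names below are hypothetical and exist only to illustrate the arithmetic.

```python
# Hypothetical figures; real values come from characterization of the actual silicon.
LIGHT = {"transition_uj": 2.0, "leak_uw": 50.0}   # retention registers stay powered
DEEP = {"transition_uj": 200.0, "leak_uw": 5.0}   # state saved to main memory

def standby_energy_uj(mode, idle_s):
    """Energy to enter and leave the mode plus leakage over the idle period (uW * s = uJ)."""
    return mode["transition_uj"] + mode["leak_uw"] * idle_s

def break_even_s():
    """Idle time at which both modes cost the same energy."""
    extra_transition = DEEP["transition_uj"] - LIGHT["transition_uj"]
    leakage_saved = LIGHT["leak_uw"] - DEEP["leak_uw"]
    return extra_transition / leakage_saved

def choose_mode(idle_s):
    """Pick the standby mode with the lower total energy for the expected idle time."""
    if standby_energy_uj(DEEP, idle_s) < standby_energy_uj(LIGHT, idle_s):
        return "deep sleep"
    return "light sleep"
```

With these assumed numbers the break-even point is 4.4 seconds: a 1-second idle period favors light sleep, while a 10-second idle period favors deep sleep.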
A key challenge in power gating is managing the in-rush current when the power is reconnected. This in-rush current must be carefully controlled in order to avoid excessive IR drop in the power network as this could result in the collapse of the main power supply and loss of the retained state.
State-Retention Power Gating
The major motivation for this technique is to significantly reduce the leakage power of the SoC when it is in the inactive mode. State-retention power gating (SRPG) is a technique that allows the voltage supply to be reduced to zero for the majority of a block's logic gates while maintaining the supply for the state elements of that block. The state of the SoC is always saved in the sequential components; combinational elements only propagate the state of the flip-flops. Using the SRPG technique, when the SoC is in the inactive mode, power to the combinational logic is turned off and the sequential logic stays on. SRPG can thereby greatly reduce power consumption when the application is in stop mode, yet it still accommodates fast wake-up times.
Reducing the supply to zero in the stop mode allows both the dynamic and static power to be removed. Retaining the supply on the state elements allows a quick continuation of processing when exiting the stop mode.
Since the state of the digital logic is stored in the flip-flops, if the flip-flops are kept on a constantly powered voltage grid, the intermediate logic can be put onto a voltage grid that can be power gated. When the voltage is reapplied to the intermediate logic, the state of the flip-flops will be re-propagated through the logic and the system can start where it has left off as illustrated in Figure 3.7.
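A behavioral sketch can illustrate the idea. The toy 4-bit counter below keeps its register on an always-on grid while its incrementer sits on a gated grid; the class and its method names are invented for the example and do not model any real cell library.

```python
class SrpgCounter:
    """Toy 4-bit counter: register on the always-on grid, incrementer on the gated grid."""

    def __init__(self):
        self.state = 0         # flip-flops, always powered
        self.logic_on = True   # supply of the combinational logic

    def next_value(self):
        """Combinational output; undefined (None) while its supply is gated off."""
        return (self.state + 1) % 16 if self.logic_on else None

    def clock(self):
        """Apply one clock edge; no edges are applied while in stop mode."""
        if self.logic_on:
            self.state = self.next_value()

    def sleep(self):
        self.logic_on = False  # remove power from the intermediate logic only

    def wake(self):
        self.logic_on = True   # outputs re-propagate from the retained state
```

After five clock edges, a `sleep()` and a `wake()`, the counter continues from 5 rather than from reset, because the state never left the flip-flops.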
In a full SRPG implementation, the entire target platform enters state retention and all (100%) of the flip-flops retain their state during power down. Both the power-up and the power-down sequences are predefined and must be followed. Power up/down time depends on the number of flip-flops. Expected "wake-up" latency is less than 1 ns and "sleep" latency is less than 500 ns.
Partial SRPG
Partial SRPG reduces leakage further than a full SRPG implementation. In this case only a few flip-flops are made capable of state retention and all other flip-flops are turned off. After power up, the SRPG flip-flops are restored to their original state and the others are restored to their reset state. System software must understand that the non-saved registers are lost and should either re-program those registers or ensure that the reset state meets the software requirements.
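From the software side, this amounts to knowing which registers come back intact and which come back at reset. The sketch below uses an invented register map to show that bookkeeping; the register names, reset values, and retention flags are all hypothetical.

```python
# Hypothetical register map: name -> (reset value, has retention flip-flops).
REGISTER_MAP = {
    "pll_config": (0x00, True),
    "irq_mask":   (0xFF, True),
    "dma_length": (0x00, False),
    "uart_baud":  (0x10, False),
}

def power_cycle(values):
    """Return the register contents after a partial-SRPG power down and power up."""
    return {
        name: values[name] if retained else reset
        for name, (reset, retained) in REGISTER_MAP.items()
    }

def lost_registers(before, after):
    """Registers whose contents changed and that software must therefore re-program."""
    return sorted(name for name in before if before[name] != after[name])
```

If `uart_baud` was already at its reset value before power down, only `dma_length` needs to be re-programmed afterward. This corresponds to the case in which the reset state already meets the software requirements.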
Low Power Architectural and Subsystem Techniques
Clock Gating
A tried-and-true technique for reducing power is clock gating. One-third to one-half of an IC design's dynamic power is in the SoC's clock-distribution system. The concept is simple: if you do not need a clock running, shut it down...
Today, the two popular methods of clock gating are local and global. If you feed the old data at the output of a flip-flop back into its input through a multiplexer, you typically need not clock the flip-flop again. Therefore, you can replace each feedback multiplexer with a clock-gating cell that turns the clock off. You would then use the enable signal that controlled the multiplexer to control the clock-gating cell instead (Figure 3.9).
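The equivalence between the two structures can be checked with a small cycle-level model. Both functions below are illustrative sketches with invented names: the first models a flip-flop with a feedback multiplexer, and the second models the same flip-flop with the enable gating its clock.

```python
def run_with_mux(data, enable, q=0):
    """Flip-flop clocked every cycle; a multiplexer recirculates Q when enable is 0."""
    trace, clock_edges = [], 0
    for d, en in zip(data, enable):
        q = d if en else q   # mux selects new data or the old output
        clock_edges += 1     # the flip-flop sees every clock edge
        trace.append(q)
    return trace, clock_edges

def run_with_clock_gate(data, enable, q=0):
    """Same flip-flop without the mux; the enable gates the clock instead."""
    trace, clock_edges = [], 0
    for d, en in zip(data, enable):
        if en:               # the gated clock pulses only when enabled
            q = d
            clock_edges += 1
        trace.append(q)
    return trace, clock_edges
```

For the same `data` and `enable` sequences, both versions produce an identical output trace, but the gated version delivers clock edges only in the enabled cycles, which is where the dynamic power saving comes from.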
The other popular approach to clock gating, global clock gating, is to simply turn off the clock to the whole block, typically from a central clock-generator module. Unlike local clock gating, this method functionally shuts down the block, but it reduces dynamic power even further because it shuts down the entire clock tree (Figure 3.10).
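A central clock generator with one enable per block can be modeled as below. The class, its methods, and the block names are hypothetical; the point is only that a gated block receives no clock edges at all, so nothing in it, including its clock tree, toggles.

```python
class ClockGenerator:
    """Hypothetical central clock generator with one enable bit per block."""

    def __init__(self, blocks):
        self.enabled = {name: True for name in blocks}
        self.edges = {name: 0 for name in blocks}   # clock edges delivered per block

    def gate(self, block, on):
        """Turn the clock to a whole block on or off."""
        self.enabled[block] = on

    def tick(self, cycles=1):
        """Advance time; only enabled blocks receive clock edges."""
        for name, on in self.enabled.items():
            if on:
                self.edges[name] += cycles
```

For example, gating a `"video"` block after 10 cycles and running 5 more leaves it at 10 delivered edges while an ungated `"cpu"` block reaches 15.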