What is meant by drive strength of a standard cell

As we know that cell delay is a function of output load capacitance. The most simplistic equivalent circuit of a logic gate driving an output can be assumed as given in figure 1:


The purpose of logic gate is to propagate the effect of logic value available at its input to the output. Based upon whether '0' or '1' is to be propagated to the output. The corresponding is achieved by charging and discharging of the output load capacitance. Propagating a logic '0' will mean discharging of the load capacitance, and vice-versa. Drive strength of the logic gate is the its relative capability to charge/discharge the capacitance present at its output. Now, the time constant, and hence, delay of the circuit is "RC".
So, for a cell with higher drive strength, corresponding "R" is lesser than the one with lower drive strength. So that for same load capacitance "C", delay is lower for a cell with higher drive strength as it can charge the capacitance in lesser time.

How drive strength varies with size of a cell: Let us talk in terms of MOSFETs, although this is valid in terms of every device in general. We know that for a given technology standard cell library, length of all transistors is kept constant. For instance, 90 nm technology will have gate length of all transistors as ~90 nm. And channel resistance of the MOSFET is inversely proportional to "W/L" of the transistor. So, a simple way to decrease channel resistance is to increase "W" of the transistor. So, a transistor with more area will have lesser resistance. Or we can say that a logic gate with bigger transistors will have more drive strength.

What is unit drive strength: In a standard cell library, we generally see cells labelled as "1X", "2X" and so on. But what is meant by the number that you see with drive strength? In general, the lowest size logic gate is labelled as unit drive strength. The drive strength numbers of other cells are laelled relative to unit drive strength cell.

Read next: How delay of a cell changes with drive strength

Also read:

Why setup is checked on next edge and hold on same edge? Setup and hold – the state machines essentials

Hi friends, in the post State machines – a practical perspective, we learnt about state machines. We also discussed different aspects of a state machine with the help of an example and the need of setup and hold checks to be taken care of. In this post, we will be discussing the state machine with a pinch of setup and hold and try to build a better understanding regarding these. For recapitalization, see figure 1. Each clock edge in a digital state machine represents a state. At each clock edge, all the registers in the design update their value based upon the data available at their input which is based upon the value computed on the basis of values launched by some other registers at previous clock edge. In simplest of words, state 3 is dependent upon state 2, which, in turn, is dependent upon state 1 and so on.

Each state of a state machine can be represented by a clock pulse
Figure 1: State machine representation of a clock pulse

All digital systems are synchronous systems (state machines) that require all the elements of the state machine to be in harmony. For instance, let us say, a state machine has 100 registers. All these registers need to be updated at the same time and need synchronized inputs so as to have only valid states in the state machines. For digital synchronous state machines, all the registers are synchronized by a clock signal. This is ensured with the help of setup and hold checks. But why do we need to apply setup and hold checks, is the question still unanswered.

Why setup and hold? We often encounter people saying that meeting the setup and hold requirements of a design is critical for silicon functionality. But have you ever thought why it is so? As we now know from our previous post, for a design consisting of only positive edge-triggered registers, each positive clock edge corresponds to a state and the state of the machine is updated at every clock edge since all the flip-flops capture data. In other words, the state of the machine is a function of the values of the registers at a particular clock cycle. For proper state machine functioning, the values launched from one register at one clock edge should be available at the input of the capturing flop before next clock edge arrives and should be available only after the present clock edge has passed (will become clear later on). This is necessary in order that the next state of the state machine is a valid state. If this does not happen, the state of the machine will not be what is desired. It may also happen that the state machine goes altogether into an invalid state leading to undesired results. And this is the reason setup is checked on the next edge and hold on same edge as discussed below.
Let us assume a hypothetical state machine consisting of three registers and some logic gates as shown in figure 2 below. If we assume the initial outputs of REG1 and REG2 to be 1 and 0 respectively, then the possible states can only be 100 or 010.

The shown state machine consists of three registers
Figure 2: A state machine with three registers and some logic

The state transition table for this state machine is as shown in figure 2. As is shown, there are only two valid states (100 and 010) provided the initial state of REG1 and REG2 is different (10 or 01). The state of REG3 is supposed to remain always ‘0’ as the REG1 and REG2 are supposed to always have different values at a time (given initial condition is also this).
State diagram of state machine shown in figure 2
Figure 3: State diagram of state machine shown in figure 2

The states with respect to clock waveform are as shown in figure 4 below. As is expected, each clock edge corresponds to one of the two states.

Figure 4: Clock waveforms showing different states of state machine

Now, the state of register 2 at a particular state (clock edge) depends upon state of register 1 at the previous clock edge. Let us assume register 2 is getting delayed clock with respect to register 1. In other words, there is considerable positive skew between the two registers. In this case, as shown in figure 5 below, there is a good chance that the data launched from register 1 is captured at register 2 at the same edge and not the next clock edge. Due to this, both register 1 and 2 will have same value at a particular clock cycle and the machine will run into invalid state. The data getting captured at the same edge as the launch flop is termed as hold violation (unless it is architecturally intended, the discussion of this is outside the scope of this topic).
The hold violation results due to data being captured on the same edge as it is launched on
Figure 5: Hold violation resulting in state machine going to invalid state
Similarly, for the proper functioning of the state machine, the data launched at one edge should get captured at the next edge. This is what is termed as setup check. Thus, setup check is formed on next edge only. The failure in happening so is termed as a setup violation. Similarly, the data launched at one edge should not be captured on the same edge. Thus, hold check represents this situation and ensures that the data launched on one edge is not captured on the same edge.

Based upon development of our understanding in this post, setup and hold can be defined as:

Setup check: Setup check refers to the condition in which data launched at one clock edge should get captured at the next clock edge so that the state machine functionality is preserved and the state machine transitions smoothly from one state to the next. The failure in happening so is termed as setup violation and the state machine might transition to an invalid state.



Hold check: Hold check refers to the condition in which data launched at one clock edge should not get captured at the same clock edge so that present state of the state machine does not get corrupt. The failure in happening so is termed as hold violation and the state machine might transition to an invalid state.

Design problem: How can you convert an XOR gate into a buffer or an inverter?


Figure 1 above shows the truth table of a 2-input XOR gate. It has two inputs A, B and an output OUT. On looking closely, we observe that:

  • If one of the inputs (say B) is 0, OUT is equal to the other input. For instance, when B = 0,
    • OUT = 0 when A = 0
    • OUT = 1 when A = 1
  • Similarly, if one of the inputs is 1, OUT is equal to invert of the other input. For instance, when B = 1,
    • OUT = 1 when A = 0
    • OUT = 0 when A = 1
Using the above information, XOR gate can be easily converted to a buffer and an inverter.

Buffer using XOR gate: Simply connecting one of the inputs to logic '0' will convert XOR gate into a buffer. As shown in the truth table below, OUT is following the value of B.




Inverter design using XOR gate: Similarly, we can realize an inverter using an XOR gate by connecting one of the inputs to logic '1'. As shown in the truth table below, OUT is opposite to the value of B.


Minimum pulse width violation example

STA problem: Consider below figure, wherein minimum pulse width requirement of a flip-flop is 590 ps. It is getting clocked by a PLL of 500 MHz with a duty cycle variation of 60 ps. There are 30 buffers in clock path, each having a rise delay of 60 ps and fall delay of 48 ps. Will this setup be able to meet the duty cycle requirement of flip-flop? Find the slack available.


Solution:

Here, we must remember that pulse can be either high pulse or low pulse. So, we need to check for both. Let us start with high pulse:

Pulse width check for high pulse: Here, we are left with calculating the latest possible arrival of rising edge and earliest possible arrival of falling edge at the flip-flop. It is given that

Ideal clock period = 2000 ps (500 MHz frequency)
Ideal half cycle = 1000 ps
Duty cycle variation of clock source = 60 ps
So, if we assume that positive edge of the clock has arrived at 0 time, negative edge can arrive at any time between 940 ps (1000 - 60) and 1060 ps (1000 + 60). Taking the pessimistic case, we have to assume negative edge arrives at 940 ps thereby making the high pulse as 940 ps at clock source.

Now, there are 30 buffers with rise delay of 60 ps and fall delay of 48 ps.

Rising edge will reach flip-flop at time (0 + 30 * 60) = 1800 ps.
Falling edge will reach flip-flop at time (940 + 30 * 48) = 2380 ps
Effective pulse width visible at flip-flop = 2380 - 1800 = 580 ps

Now, the pulse width requirement = 590 ps
Slack = Actual pulse width = Required minimum pulse width = -10 ps

So, we are violating the minimum high pulse width requirement by 10 ps.

Pulse width requirement for low pulse: Similar to the earlier case, we have to find the difference in arrival of latest negative edge and earliest positive edge.

Ideal clock period = 2000 ps (500 MHz)
Ideal half cycle = 1000 ps
Duty cycle variation of clock source = 60 ps
If we assume that negative edge arrived at 0 ps, positive edge can arrive at any time between 940 ps and 1060 ps. Taking the pessimistic case, low pulse width = 940 ps at clock source.

Now, there are 30 buffers with rise delay of 60 ps and fall delay of 48 ps.

Falling edge will reach flip-flop at time (0 + 30 * 48) = 1440 ps
Rising edge will reach flip-flop at time (940 + 60 * 30) = 2740 ps
Effective pulse width visible at flip-flop = 2740 - 1440 = 1300 ps

Pulse width requirement = 590 ps
Slack = 1300 - 590 = 710 ps

So, we are meeting the low pulse width requirement by 710 ps.