Performance gain with latches

Because latches are transparent, they have a basic characteristic known as time borrowing: they can capture data over a period of time rather than at a single instant. Used intelligently, this property can give a performance advantage in specific design scenarios, especially in designs with asymmetric data paths in successive stages. Let us elaborate with the help of an example.
Consider a design with two pipeline stages whose combinational logic delays are 12 ns and 5 ns respectively, as shown in figure 1 below:

Figure 1: 2-stage pipelining

If we assume clock period to be 16 ns (half cycle being 8 ns), then each latch stage will borrow time from the subsequent stage as shown in figure below:






Now, since all the registers get the same clock signal, the minimum clock period is set by the larger of the combinational delays from REGA to REGB and from REGB to REGC.

Tclk > MAX (Tcomb(REGA->REGB), Tcomb(REGB->REGC))



Thus, taking each stage to be allotted half a clock cycle, this circuit cannot run with a half clock period of less than 12 ns, that is, a clock period of less than 24 ns.

This situation can be eased if we replace REGB with a negative level-sensitive latch. Let us have a look at figure 2 below. Although the number of stages remains the same, LATB can borrow time from the next stage without impacting any logic.

Figure 2: Latch replacing register in the 2-stage pipelining
The same is shown in figure 3 below with the help of a waveform. The clock has a half period of 9 ns. The 12 ns stage overruns this by 3 ns, so the latch borrows 3 ns from the next stage; the 5 ns stage then finishes 8 ns into its own 9 ns half cycle, still meeting setup by 1 ns. Thus, we have succeeded in reducing the half period from 12 ns to 9 ns (clock period from 24 ns to 18 ns), just by changing the register to a latch. This is how a latch can help gain performance.
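
The arithmetic of this example, with its 9 ns half period, can be checked with a minimal sketch. It assumes ideal elements (zero setup time, zero clock-to-Q and latch D-to-Q delay), and the function and variable names are illustrative only.

```python
def two_stage_latch_timing(half_period, delay_a_to_b, delay_b_to_c):
    """Return (borrowed_time, slack_at_regc) for REGA -> LATB -> REGC.

    Assumptions (not from any tool): data launches from REGA at t = 0,
    LATB is transparent during [half_period, 2 * half_period], and REGC
    captures at t = 2 * half_period. Setup times and cell delays are ignored.
    """
    arrival_at_latb = delay_a_to_b
    if arrival_at_latb > 2 * half_period:
        raise ValueError("data arrives after LATB has closed")
    # Borrowed time is how far into LATB's transparent phase the data arrives.
    borrowed = max(0, arrival_at_latb - half_period)
    # Data leaves LATB when it arrives or when the latch opens, whichever is later.
    departure_from_latb = max(arrival_at_latb, half_period)
    arrival_at_regc = departure_from_latb + delay_b_to_c
    slack = 2 * half_period - arrival_at_regc
    return borrowed, slack


borrowed, slack = two_stage_latch_timing(9, 12, 5)
print(borrowed, slack)  # 3 1
```

With the 12 ns and 5 ns stages and a 9 ns half period, the sketch reports 3 ns borrowed and 1 ns of slack at REGC.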

If there are multiple latch stages in series, each can borrow from the subsequent stage such that overall timing is met. For example, figure 3 shows 6 latches in series.
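
Borrowing along a chain of latches can be sketched in the same way. The stage delays below are hypothetical (the figure's actual values are not reproduced here), the 8 ns half cycle is the one mentioned earlier, and the model again assumes ideal latches that are transparent on alternating half cycles.

```python
def latch_chain_borrowing(half_period, stage_delays):
    """Return the time each latch borrows in a chain of alternating-phase latches.

    Stage k (1-based) ends at a latch assumed transparent during
    [k * half_period, (k + 1) * half_period]. Data launches at t = 0.
    Setup times and latch delays are ignored.
    """
    borrows = []
    departure = 0.0
    for k, delay in enumerate(stage_delays, start=1):
        opens, closes = k * half_period, (k + 1) * half_period
        arrival = departure + delay
        if arrival > closes:
            raise ValueError(f"stage {k} misses its latch by {arrival - closes} ns")
        # Any arrival after the latch opens eats into the next stage's time.
        borrows.append(max(0.0, arrival - opens))
        # Data that arrives early must wait for the latch to open.
        departure = max(arrival, opens)
    return borrows


hypothetical_delays = [10, 9, 7, 6, 8, 5]  # ns, illustrative values only
borrows = latch_chain_borrowing(8, hypothetical_delays)
print(borrows)  # [2.0, 3.0, 2.0, 0.0, 0.0, 0.0]
```

In this made-up chain the first three stages overrun their half cycles and borrow from their successors, while the shorter later stages absorb the accumulated overrun so that overall timing is still met.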


How delay of a standard cell changes with drive strength

A standard cell (let us say a buffer) can be represented as shown in figure 1 below, where 
R = Channel resistance 
Cds = Drain-to-source capacitance (internal capacitance of cell)
Cload = Load capacitance


So, the RC time constant can be represented as "R * (Cds + Cload)".

What happens on increasing the drive strength? In our post "what is meant by drive strength", we discussed that the drive strength of a standard cell increases when we increase the size of its transistors. So, basically, a cell with drive strength 2X will have twice the transistor width of one with 1X drive strength.
And we know that
Channel resistance decreases with "W".
Drain-to-source capacitance increases with "W".
So, upon increasing the drive strength, the cell's internal capacitance increases and its channel resistance decreases by the same factor. The same is depicted in figure 2 below.


Time constant of "1X" buffer = R * (Cds + Cload)
Time constant of "2X" buffer = (R/2) * (2Cds + Cload)
Now, let us consider the following scenarios:

Special case 1: Load capacitance is negligible.
In this scenario, we are left with only internal resistance and capacitance of the cell.

Time constant of "1X" buffer = R * Cds
Time constant of "2X" buffer = (R/2) * (2Cds) = R * Cds
So, in this case, increasing the drive strength of the standard cell should have no impact on its delay. Hence, when the load is negligible, we should not upsize the cell. Doing so may instead increase the overall path delay, since the upsized cell presents a larger load to the previous stage cell and thereby increases that stage's delay.

Special case 2: Load capacitance is very large as compared to internal capacitance.
In this scenario,
Time constant of "1X" buffer = R * Cload
Time constant of "2X" buffer = (R * Cload ) / 2 
So, the "2X" buffer will take approximately half the time to charge the load capacitance as compared to the "1X" buffer.

So, we see that the maximum possible benefit in delay from doubling the drive strength of a standard cell is a reduction by a factor of two. In the worst case, we may not see any benefit at all.
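
The two special cases can be reproduced numerically with a small sketch of the RC model above. The function names and the unit values for R and Cds are assumptions chosen purely for illustration.

```python
def time_constant(r, c_ds, c_load, strength=1.0):
    """RC time constant of a buffer whose width is scaled by `strength`.

    Follows the simple model in the text: channel resistance scales as
    1/strength and internal capacitance scales as strength.
    """
    return (r / strength) * (strength * c_ds + c_load)


def speedup_from_doubling(c_ds, c_load, r=1.0):
    """Ratio of the 1X time constant to the 2X time constant."""
    return time_constant(r, c_ds, c_load, 1) / time_constant(r, c_ds, c_load, 2)


# Illustrative, unitless values: Cds = 1, load swept from negligible to huge.
for c_load in (0.0, 1.0, 10.0, 1000.0):
    print(c_load, round(speedup_from_doubling(1.0, c_load), 3))
```

The ratio starts at 1 when the load is negligible and approaches, but never reaches, 2 as the load grows large compared to the internal capacitance.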

We can also look at the above equation by splitting the cell delay into two components:
  1. Cell delay due to its own intrinsic capacitance: It does not scale with drive strength and is constant for a given kind of standard cell.
  2. Cell delay due to external load capacitance: It is variable and decreases as we increase the drive strength of the standard cell.
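
This split follows directly from the time constant expression if we generalize it to a drive strength of k (the symbol k is introduced here for illustration; the text only uses 1X and 2X):

```latex
\tau_k = \frac{R}{k}\,\bigl(k\,C_{ds} + C_{load}\bigr)
       = \underbrace{R\,C_{ds}}_{\text{intrinsic, independent of } k}
       + \underbrace{\frac{R\,C_{load}}{k}}_{\text{load-dependent, falls as } k \text{ grows}}
```

Setting k = 1 and k = 2 recovers the "1X" and "2X" expressions given earlier.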