How to fix setup violations

In the post setup and hold time violations, we learnt about the setup time violations and hold time violations. In this post, we will learn the approaches to tackle setup time violations. Following strategies can be useful in reducing the magnitude of setup violation and bringing it closer towards a positive value:

1. Increase the drive strength of data-path logic gates: A cell with better drive strength can charge the load capacitance quickly, resulting in lesser propagation delay. Also, the output transition should improve resulting in better delay of proceeding stages.
We can view a logic gate as a certain ON-resistance, that will charge/discharge a load capacitor to toggle the output state. This will form an RC circuit with a certain RC time constant. A better drive-strength gate will have a lesser resistance, effectively lowering the RC time constant; hence, providing less delay. This is illustrated in figure 1 below. If an AND gate of drive strength 'X' has a pull down resistance equivalent to 'R', the one with drive strength '2X' will have R/2 resistance. Thus, a bigger AND gate with better drive strength will have less delay.


This strategy is going to give best results only if the load of the cell is dominated by external load capacitance. Generally, drive strength of a cell is proportional to the cell size. Thus, increasing the cell size halves its internal resistance, but doubles the internal node capacitance. Thus, as shown in figure 2, the zero load capacitance delay of a cell ideally remains same of doubling the size of the cell.



Thus, upon doubling the drive strength of the cell, (assuming D to be the original delay) the delay can be anything between D/2 to D depending upon the ratio of intrinsic and external load capacitance.

Moreover, the input pin capacitance is a by-product of the size of the cell. Thus, increasing the size of the cell results in increased load for the driver cell of its input pins. So, in some cases (very high drive strength cell with less load driven by a low drive strength cell), increasing the drive strength can result in increase in magnitude of setup violation.

Keeping aside timing, power dissipation (both leakage as well as dynamic power) are a function of cell drive strength. Also, area is a function of cell drive strength. So, increasing the drive strength to fix a setup violation results in both area and power increase (although very small in comparison to whole design).


2. Use the data-path cells with lesser threshold voltages: If you have multiple flavors of threshold voltages in your designs, the cell with lesser threshold voltage will certainly have less delay. So, this must be the first step to resolve setup violations.


3. Improve the setup time of capturing flip-flop: As we know, the setup time of a flip-flop is a function of the transition at its data pin and clock pin. Better the transition at data pin, less is setup time. And worse clock transition causes less setup time. Also, a flip-flop with higher drive strength and/or lower threshold voltage is more probable of having less setup time requirement. Also, increasing the drive strength of flip-flop might cause the transition at clock pin and data pin to get worse due to higher pin loads. This also plays a role in deciding the setup time.

4. Restructuring of the data-path: Based upon the placement of data path logic cells, you can decide either to combine simple logic gates into a complex gate, or split a multi-stage cell into simpler logic gates. A multi-stage gate is optimized in terms of area, power and timing. For example, a 2:1 mux will have less logic delay than 1 AND gate and 1 OR gate combined for same output load capacitance. But, if you need to traverse distance, then 2 stages of logic can help as a buffer will introduce additional delay.
Let us elaborate this with the help of an example wherein a data-path traverses a 3-input AND gate from FF1 to FF2 situated around 400 micron apart. Let us assume one logic cell can drive 200 micron and each logic cell has only one drive strength available for simplicity. The choice is between two 2-input AND gates and 1 3-input AND gate. In this case, 3-input AND gate should give less delay (may be 200 ps for two 2-input AND vs 150 ps for one 3-input AND) as it has been optimized for less area, timing and power as compared to two 2-input AND gates.



Now, consider another case where the FF1 and FF2 are at a distance of 600 micron. In this case, if we use two 2-input AND gates, we can place them spaced apart 200 micron and hence, can cover the distance. But, if we use one 3-input AND gate, we will need to add a repeater, which will have its own delay. In this case, using two 2-input AND gates should give better results in terms of overall data-path delay.
 

5. Routing topologies: Sometimes, when there are a lot of nets at a certain place in the design, the routing tool can detour the nets trying to get the place less congested. Thus, two logic cells might be placed very close, still the delay can seem to be high for both the cells ; for driver cell due to high net capacitance and for load cell due to poor transition at the input. Also, net delay can be a significant component in such scenarios. Below figure shows one such example of two AND gates situated a certain distance apart. Ideally, there could be a straight net route between the two gates. But, due to very high net density in the region, router tool chose to route the way as shown on the right to help ease the congestion (this is an exaggerated scenario to help understand better).

So, always give proper importance to net routing topology, at least for setup timing critical nets. A few tips to improve the timing you can try include:

  • Try the net to have as less detouring as possible
  • Vias increase the net resistance. So, try to have as less vias as possible
  • Higher metal layers have less resistance. So, long nets can be routed in higher layers to have less net delay

6. Add repeaters: Every logic cell has a limit upto which it can drive a load capacitance. After that, its delay starts increasing rapidly. Since, net capacitance is a function of net length, we should keep a limit on the length of net driven by a gate. Also, net delay itself is proportional to square of net length. Moreover, the transitions may be very bad in such cases. So, it is wise to add repeater buffers after a certain distance, in order to ensure that the signal is transferred reliably, and in time.

7. Play with clock skew: Positive skew helps improve the setup slack. So, to fix setup violation, we may either choose to increase the clock latency of capturing flip-flop, or decrease the clock latency of launching flip-flop. However, in doing so, we need to be careful regarding setup and hold slack of other timing paths that are being formed from/to these flip-flops.

8. Increase clock period: As a last resort, you may choose to time your design at reduced frequency. But, if you are targeting a particular performance, you need a minimum frequency. In that case, this option is not for you.

9. Improve the clk->q delay of launching flip-flop: A flip-flop with less clk->q delay will help meeting a violating setup timing path. This can be achieved by:
  • Improving transition at flip-flops clock pin
  • Choosing a flip-flop of high drive strength. However, if by doing so, clock transition degrades, delay can actually increase
  • Replacing the flip-flop with a flip-flop of same drive strength, but lower Vt
In this post, we learnt how to approach a setup violating timing path. Have you ever used a method that is not listed above? Please share your experience in comments. We will be happy to hear from you.

Also read:

Setup and hold violations

What is meant by setup and/or hold violations: The ultimate aim of timing analysis is to get the design work at required frequency and with reliability. For this to happen, it must be ensured in timing that all the state transitions are happening smoothly; i.e., the setup and hold requirements of all the timing paths in the design are met. If there are failing setup and/or hold paths, the design is said to have violations.

What if setup and/or hold violations occur in a design: As said earlier, setup and hold timings are to be met in order to ensure that data launched from one flop is captured properly at another and in accordance to the state machine designed. In other words, no timing violations means that the data launched by one flip-flop at one clock edge is getting captured by another flip-flop at the desired clock edge. If the setup check is violated, data will not be captured properly at the next clock edge. Similarly, if hold check is violated, data intended to get captured at the next edge will get captured at the same edge. Moreover, setup/hold violations can lead to data getting captured within the setup/hold window which can lead to metastability of the capturing flip-flop (as explained in our post metastability). So, it is very important to have setup and hold requirements met for all the registers in the design and there should not be any setup/hold violations.

Setup violations: As we know, setup checks are applied for timing paths to get the state machine to move to the next state. The timing equation for a setup check from positive edge-triggered flip-flop to positive edge-triggered flip-flop is given as below:
                       Tck->q + Tprop + Tsetup - Tskew < Tperiod
For a timing path to meet setup requirements, this equation needs to be satisfied. The difference between left and right sides is represented by a parameter known as setup slack.

Setup slack is the margin by which a timing path meets setup check requirement. It is given as the difference in R.H.S. and L.H.S. of setup timing equation. The equation for setup slack is given as:
                        Setup slack = Tperiod -  Tck->q - Tprop - Tsetup + Tskew
If setup slack is positive, it means the timing path meets setup requirement. On the other hand, a negative setup slack means setup violating timing path. If, by chance, a fabricated design is found to have a setup violation, you can still run the design at less frequency than specified and get the desired functionality as setup equation includes clock period as a variable.

If we analyze setup equation more closely, it involves four parameters:
  1. Data path delay: More the total delay of data path (flip-flop delay + combinational delay + Setup), less is setup slack
  2. Clock skew: More the clock skew (difference between arrival times of clock at capture and launch flip-flops), more is the setup slack
  3. Setup time requirement of capturing flip-flp: Less the setup time requirement, more will be setup slack
  4. Clock period: More is the clock period, more is the setup slack. However, if you are targetting a specific clock period, doing this is not an option. :-)
How to tackle setup violations: The ultimate goal of timing analysis is to get every timing path follow setup equation and get a positive setup slack number for every timing path in the design. If a timing path is violating setup timing (assuming we are targetting a certain clock frequency), we can try one or more of the following to bring the setup slack back to a positive value by:
  • Decreasing data path delay
  • Choosing a flip-flop with less setup time requirement
  • Increasing clock skew
How to fix setup violations discusses various ways to tackle setup violations.

Hold violations: As we know, hold checks are applied to ensure that the state machine remains in its present state until desired. The hold timing equation for a timing path from a positive edge-triggered flip-flop to another positive edge-triggered flip-flop is governed by the following equation:
               Tck->q + Tprop > Thold + Tskew
Similar to setup slack, the presence and magnitude of hold violation is governed by a parameter called as hold slack. The hold slack is defined as the amount by which L.H.S is greater than R.H.S. In other words, it is the margin by which timing path meets the hold timing check. The equation for hold slack is given as:
Hold slack = Tck->q + Tprop - Thold + Tskew
If hold slack is positive, it means there is still some margin available before it will start violating for hold. A negative hold slack means the path is violating hold timing check by the amount represented by hold slack. To get the path met, either data path delay should be increased, or clock skew/hold requirement of capturing flop should be decreased.

If we analyze hold timing equation more closely, it involves three parameters:
  1. Data path delay: More data path delay favours hold slack; hence, more data path delay, more is the margin
  2. Skew: Having a positive skew degrades hold slack
  3. Hold requirement of capturing flip-flop: Less the hold requirement, more will be hold slack
How to tackle hold violations: Similar to setup analysis, the ultimate aim of hold analysis is to get every timing path follow the hold timing equation and get a positive hold slack for each and every timing path in the design. If a timing path violates for hold, we can do either of the following:
  • Increase data path delay
  • Decrease clock skew
  • Choose a flip-flop with less hold requirement