Showing posts with label Resolve setup violations. Show all posts

How to fix setup violations

In the post 'Setup and hold time violations', we learnt about setup time violations and hold time violations. In this post, we will learn approaches for tackling setup time violations. The following strategies can be useful in reducing the magnitude of a setup violation, that is, in bringing the setup slack closer to a positive value:

1. Increase the drive strength of data-path logic gates: A cell with better drive strength can charge the load capacitance more quickly, resulting in less propagation delay. Also, the output transition should improve, resulting in less delay in the succeeding stages.
We can view a logic gate as a certain ON-resistance that charges/discharges a load capacitor to toggle the output state. This forms an RC circuit with a certain RC time constant. A gate with better drive strength has a lower resistance, which lowers the RC time constant and hence gives less delay. This is illustrated in figure 1 below. If an AND gate of drive strength 'X' has a pull-down resistance equivalent to 'R', the one with drive strength '2X' will have a resistance of R/2. Thus, a bigger AND gate with better drive strength will have less delay.


This strategy gives the best results only if the load of the cell is dominated by external load capacitance. Generally, the drive strength of a cell is proportional to the cell size. Doubling the cell size halves its internal resistance, but doubles the internal node capacitance. Thus, as shown in figure 2, the zero-load delay of a cell ideally remains the same on doubling the size of the cell.



Thus, upon doubling the drive strength of the cell (assuming D to be the original delay), the delay can be anywhere between D/2 and D, depending upon the ratio of intrinsic and external load capacitance.
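The D/2 to D range can be checked with a first-order RC sketch. The model (delay taken as ON-resistance times total capacitance) and every numeric value below are illustrative assumptions, not data from any cell library:

```python
def gate_delay(r_on, c_intrinsic, c_load):
    """First-order RC delay: ON-resistance times total output capacitance."""
    return r_on * (c_intrinsic + c_load)


def upsized_delay(r_on, c_intrinsic, c_load, factor=2.0):
    """Delay after scaling the cell size by `factor`.

    Assumption: ON-resistance scales as 1/factor, intrinsic (internal node)
    capacitance scales as factor, and the external load is unchanged.
    """
    return gate_delay(r_on / factor, c_intrinsic * factor, c_load)


# Illustrative values only (ohms and farads); not taken from a real library.
R_ON = 1000.0
C_INT = 2e-15

for c_load in (0.0, 2e-15, 20e-15, 200e-15):
    d = gate_delay(R_ON, C_INT, c_load)
    d2 = upsized_delay(R_ON, C_INT, c_load)
    print(f"load = {c_load * 1e15:6.1f} fF   D = {d * 1e12:7.2f} ps   "
          f"D(2X) = {d2 * 1e12:7.2f} ps   ratio = {d2 / d:.2f}")
```

With zero external load the ratio stays at 1 (no gain from upsizing), and it approaches 0.5 as the external load starts to dominate.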

Moreover, the input pin capacitance grows with the size of the cell. Thus, increasing the size of the cell results in increased load for the cell driving its input pins. So, in some cases (a very high drive strength cell with a small load, driven by a low drive strength cell), increasing the drive strength can increase the magnitude of the setup violation.
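A rough two-stage sketch of this effect, using the same first-order RC model. The weak driver, the unit-gate parameters and the loads are all hypothetical values chosen for illustration:

```python
def stage_delay_ps(r_ohm, c_intrinsic_ff, c_load_ff):
    """First-order RC delay in ps (ohm * fF = fs, hence the division by 1000)."""
    return r_ohm * (c_intrinsic_ff + c_load_ff) / 1000.0


# Hypothetical weak driver and unit-size gate (ohms, fF).
R_DRIVER, C_DRIVER_INT = 4000.0, 1.0
R_UNIT, C_UNIT_INT, C_UNIT_IN = 1000.0, 2.0, 2.0


def two_stage_delay_ps(size, c_ext_ff):
    """Delay of the weak driver plus the gate being sized.

    Assumption: upsizing the gate by `size` divides its resistance by `size`
    and multiplies both its intrinsic and its input pin capacitance by `size`.
    """
    driver = stage_delay_ps(R_DRIVER, C_DRIVER_INT, C_UNIT_IN * size)
    gate = stage_delay_ps(R_UNIT / size, C_UNIT_INT * size, c_ext_ff)
    return driver + gate


# Heavy external load: upsizing from 1X to 2X helps.
print(two_stage_delay_ps(1, 100.0), two_stage_delay_ps(2, 100.0))
# Light external load, already large gate: upsizing from 8X to 16X hurts.
print(two_stage_delay_ps(8, 2.0), two_stage_delay_ps(16, 2.0))
```

In the second case the gate's own delay barely changes, while the extra input pin capacitance slows the weak driver, so the total path delay goes up.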

Keeping timing aside, power dissipation (both leakage and dynamic power) is a function of cell drive strength, and so is area. So, increasing the drive strength to fix a setup violation increases both area and power (although the increase is very small in comparison to the whole design).


2. Use data-path cells with lower threshold voltages: If you have multiple flavors of threshold voltage in your design, the cell with the lower threshold voltage will have less delay for the same drive strength and load, at the cost of higher leakage power. So, this can be the first step to resolve setup violations.


3. Improve the setup time of the capturing flip-flop: As we know, the setup time of a flip-flop is a function of the transition at its data pin and clock pin. The better (faster) the transition at the data pin, the smaller the setup time, whereas a worse (slower) clock transition results in a smaller setup time. Also, a flip-flop with higher drive strength and/or lower threshold voltage is more likely to have a smaller setup time requirement. However, increasing the drive strength of the flip-flop might cause the transitions at its clock pin and data pin to get worse due to higher pin loads, which also plays a role in deciding the setup time.

4. Restructuring of the data-path: Based upon the placement of data-path logic cells, you can decide either to combine simple logic gates into a complex gate, or to split a multi-stage cell into simpler logic gates. A multi-stage gate is optimized in terms of area, power and timing. For example, a 2:1 mux will have less logic delay than one AND gate and one OR gate combined, for the same output load capacitance. But, if the path needs to traverse a distance, then two stages of logic can help, because the alternative of adding a buffer introduces additional delay.
Let us elaborate this with the help of an example wherein a data-path from FF1 to FF2, situated around 400 micron apart, needs to implement a 3-input AND function. For simplicity, let us assume that one logic cell can drive 200 micron and that each logic cell has only one drive strength available. The choice is between two 2-input AND gates and one 3-input AND gate. In this case, the 3-input AND gate should give less delay (maybe 150 ps for one 3-input AND vs 200 ps for two 2-input ANDs), as it has been optimized for area, timing and power as compared to two 2-input AND gates.



Now, consider another case where FF1 and FF2 are at a distance of 600 micron. In this case, if we use two 2-input AND gates, we can place them 200 micron apart and hence cover the distance. But, if we use one 3-input AND gate, we will need to add a repeater, which will have its own delay. In this case, using two 2-input AND gates should give better results in terms of overall data-path delay.
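The two cases can be compared with a small sketch. The 150 ps and 200 ps logic delays and the 200 micron reach come from the example above; the 80 ps repeater delay, the assumption that the launching flop also drives 200 micron, and the omission of wire delay are simplifications made here for illustration:

```python
import math

REACH_UM = 200          # distance one cell can drive (from the example)
BUFFER_DELAY_PS = 80    # hypothetical repeater delay

# option name -> (number of logic cells, total logic delay in ps)
OPTIONS = {
    "one 3-input AND": (1, 150),
    "two 2-input ANDs": (2, 200),
}


def data_path_delay_ps(distance_um, n_cells, logic_delay_ps):
    """Logic delay plus the delay of any repeaters needed to cover the distance.

    Assumption: the launching flop and every logic cell each cover REACH_UM,
    and wire delay itself is ignored.
    """
    drivers_needed = math.ceil(distance_um / REACH_UM)
    drivers_available = 1 + n_cells  # launching flop + logic cells
    repeaters = max(0, drivers_needed - drivers_available)
    return logic_delay_ps + repeaters * BUFFER_DELAY_PS


for distance in (400, 600):
    for name, (n_cells, logic_delay) in OPTIONS.items():
        delay = data_path_delay_ps(distance, n_cells, logic_delay)
        print(f"{distance} um, {name}: {delay} ps")
```

Under these assumptions the 3-input AND wins at 400 micron, while the two 2-input ANDs win at 600 micron because they avoid the repeater.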
 

5. Routing topologies: Sometimes, when there are a lot of nets in a certain region of the design, the routing tool can detour nets to make the region less congested. Thus, even though two logic cells are placed very close to each other, the delay can be high for both cells: for the driver cell due to high net capacitance, and for the load cell due to poor transition at its input. Also, net delay can be a significant component in such scenarios. The figure below shows one such example of two AND gates situated a certain distance apart. Ideally, there could be a straight net route between the two gates. But, due to very high net density in the region, the router chose the detoured route shown on the right to help ease the congestion (this is an exaggerated scenario to aid understanding).

So, always give proper importance to net routing topology, at least for setup-critical nets. A few tips you can try to improve the timing:

  • Keep the detouring of the net as small as possible
  • Vias increase the net resistance. So, try to have as few vias as possible
  • Higher metal layers have less resistance. So, long nets can be routed in higher layers to have less net delay

6. Add repeaters: Every logic cell has a limit up to which it can drive a load capacitance. Beyond that, its delay starts increasing rapidly. Since net capacitance is a function of net length, we should keep a limit on the length of net driven by a gate. Also, the net delay itself is proportional to the square of the net length. Moreover, the transitions may be very bad in such cases. So, it is wise to add repeater buffers after a certain distance, in order to ensure that the signal is transferred reliably and in time.
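The quadratic dependence is what makes repeaters pay off. A minimal sketch, assuming a distributed-RC (Elmore) wire delay of 0.5·r·c·L² and a fixed delay per repeater; the per-micron resistance and capacitance and the 30 ps repeater delay are made-up illustrative values:

```python
def wire_delay_ps(length_um, r_per_um=1.0, c_per_um_ff=0.2):
    """Elmore delay of a distributed RC wire: 0.5 * r * c * L^2 (ohm * fF = fs)."""
    return 0.5 * r_per_um * c_per_um_ff * length_um ** 2 / 1000.0


def buffered_delay_ps(length_um, n_segments, t_buf_ps=30.0):
    """Wire split into equal segments with a repeater between segments.

    Assumption: each repeater adds a fixed delay t_buf_ps; its drive
    resistance and input capacitance are ignored.
    """
    segment = wire_delay_ps(length_um / n_segments)
    return n_segments * segment + (n_segments - 1) * t_buf_ps


LENGTH_UM = 2000.0
for n in range(1, 7):
    print(f"{n} segment(s): {buffered_delay_ps(LENGTH_UM, n):7.2f} ps")

best = min(range(1, 11), key=lambda n: buffered_delay_ps(LENGTH_UM, n))
print("best number of segments:", best)
```

Splitting the net into N segments divides the wire term by N but adds N-1 repeater delays, so there is an optimum beyond which extra repeaters only add delay.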

7. Play with clock skew: Positive skew helps improve the setup slack. So, to fix setup violation, we may either choose to increase the clock latency of capturing flip-flop, or decrease the clock latency of launching flip-flop. However, in doing so, we need to be careful regarding setup and hold slack of other timing paths that are being formed from/to these flip-flops.
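The caution about neighbouring paths can be made concrete with a sketch of three flops A -> B -> C, where only the clock latency of B is changed. It uses the setup and hold check equations from the 'Setup time and hold time basics' post; the period, flop parameters and path delays (all in ps) are hypothetical:

```python
def setup_slack(period, clk_q, prop_max, setup, launch_latency, capture_latency):
    """Setup slack with skew = capture latency - launch latency."""
    skew = capture_latency - launch_latency
    return period - (clk_q + prop_max + setup - skew)


def hold_slack(clk_q, prop_min, hold, launch_latency, capture_latency):
    """Hold slack with skew = capture latency - launch latency."""
    skew = capture_latency - launch_latency
    return clk_q + prop_min - hold - skew


# Hypothetical numbers, all in ps.
PERIOD, CLK_Q, SETUP, HOLD = 1000, 100, 50, 30
# (launch, capture) -> (longest data-path delay, shortest data-path delay)
PATHS = {("A", "B"): (900, 200), ("B", "C"): (600, 50)}


def report(latency):
    """Return {(launch, capture): (setup slack, hold slack)} for every path."""
    result = {}
    for (launch, capture), (prop_max, prop_min) in PATHS.items():
        s = setup_slack(PERIOD, CLK_Q, prop_max, SETUP,
                        latency[launch], latency[capture])
        h = hold_slack(CLK_Q, prop_min, HOLD, latency[launch], latency[capture])
        result[(launch, capture)] = (s, h)
    return result


for extra in (0, 100, 300):
    print(f"clock of B delayed by {extra} ps:",
          report({"A": 0, "B": extra, "C": 0}))
```

Delaying B's clock by 100 ps fixes the A -> B setup violation while keeping every other check positive; delaying it by 300 ps breaks hold on A -> B and setup on B -> C.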

8. Increase clock period: As a last resort, you may choose to time your design at reduced frequency. But, if you are targeting a particular performance, you need a minimum frequency. In that case, this option is not for you.

9. Improve the clk->q delay of launching flip-flop: A flip-flop with less clk->q delay will help meeting a violating setup timing path. This can be achieved by:
  • Improving the transition at the flip-flop's clock pin
  • Choosing a flip-flop of high drive strength. However, if by doing so, clock transition degrades, delay can actually increase
  • Replacing the flip-flop with a flip-flop of same drive strength, but lower Vt
In this post, we learnt how to approach a setup violating timing path. Have you ever used a method that is not listed above? Please share your experience in comments. We will be happy to hear from you.


Setup time and hold time basics

In digital designs, each and every flip-flop places some restrictions on when the data can change with respect to the clock. There is always a region around the clock edge in which the data at the input of the flip-flop should not change. This is because, if the data changes within this window, we cannot guarantee the output. The output can be the result of the previous input, the new input, or metastability (as explained in our post 'metastability'). This window is marked by two boundaries, one pertaining to the setup time of the flop and the other to the hold time, as defined below.

Definition of Setup time: Setup time is defined as the minimum amount of time before the clock's active edge that the data must be stable for it to be latched correctly. In other words, each flip-flop (or any sequential element, in general) needs some time for the data to remain stable before the clock edge arrives, such that it can reliably capture the data. This duration is known as setup time.
The data that was launched at the previous clock edge should be stable at the input at least setup time before the clock edge. So, adherence to setup time ensures that the data launched at previous edge is captured properly at the current edge. In other words, we can also say that setup time adherence ensures that the system moves to next state smoothly. 
Definition of Hold time: Hold time is defined as the minimum amount of time after the clock's active edge during which data must be stable. Similar to setup time, each sequential element needs some time for data to remain stable after clock edge arrives to reliably capture data. This duration is known as hold time.
The data that was launched at the current edge should not travel to the capturing flop before hold time has passed after the clock edge. Adherence to hold time ensures that the data launched at current clock edge does not get captured at the same edge. In other words, hold time adherence ensures that system does not deviate from the current state and go into an invalid state.
As shown in figure 1 below, the data at the input of the flip-flop can change anywhere except within the setup time/hold time window.

Figure showing the setup and hold requirements forming a timing window during which data cannot toggle
Figure 1: Setup-hold window



A D-latch is composed of two inverters connected in a positive feedback loop, which is tristated when the input data path is enabled. On the other hand, when the data path is tristated, this loop is enabled
A D-type latch

Cause/origin of setup time and hold time: Setup time and hold time are said to be the backbone of timing analysis. Rightly so, for the chip to function properly, setup and hold timing constraints need to be met for each and every flip-flop in the design. If even a single flop does not meet the setup and hold requirements for timing paths starting from or ending at it, the design can fail and metastability may occur. It is very important to understand the origin of setup time and hold time, as the functionality of the whole design depends on them. Let us discuss the origin of setup time and hold time taking the example of a D flip-flop since, in VLSI designs, D-type flip-flops are almost always used. A D-type flip-flop is realized using two D-type latches; one of them is positive level-sensitive, the other is negative level-sensitive. A D-type latch, in turn, is realized using transmission gates and inverters. The figure captioned 'A D-type latch' shows a positive level-sensitive D-type latch. Just by inverting the transmission gates' clock, we get a negative level-sensitive D-type latch.

A complete D flip-flop using the above structure of D-type latch is shown in figure below:

 A D-type flip-flop consists of two latches connected back to back in master-slave format
A D-type flip-flop



Now, let us get into the details of the above figure. For data to be latched by 'latch 1' at the falling edge of the clock, it must be present at 'Node F' at that time. Since data has to travel 'Node A' -> 'Node B' -> 'Node C' -> 'Node D' -> 'Node E' -> 'Node F' to reach 'Node F', it should arrive at the flip-flop's input (Node A) some time earlier. This time taken by data to travel from 'Node A' to 'Node F' is termed the data setup time (assuming CLK and CLK' are present instantaneously; if that is not the case, it will be accounted for accordingly). Similarly, it is necessary to ensure a stable value at the input to ensure a stable value at 'Node C'. In other words, hold time can be termed as the delay taken by data from 'Node A' to 'Node C'.

Setup and hold checks in a design: Basically, setup and hold timing checks ensure that data launched from one flop is captured properly at another. Considering the way digital designs of today are built (as finite state machines), the next state is derived from the previous state. So, data launched at one edge should be captured at the next active clock edge. Also, the data launched from one flop should not be captured at the next flop at the same edge. These conditions are ensured by setup and hold checks. The setup check ensures that the data is stable before the setup requirement of the next active clock edge at the next flop, so that the next state is reached. Similarly, the hold check ensures that data is stable until the hold requirement of the next flop for the same clock edge has been met, so that the present state is not corrupted.

A timing path from rise edge-triggered flip-flop to rise edge-triggered flip-flop
A sample path in a design
Shown above is a flop-to-flop timing path. For simplicity, we have assumed that both the flops are rise edge triggered. The setup and hold timing relations for the data at input of second flop can be explained using the waveforms below:


waveform showing setup and hold requirement for the sample timing path shown above
Figure showing setup and hold checks being applied for the timing path shown above


As shown, data launched from launching flop is allowed to arrive at the input of the second flop only after a delay greater than its hold requirement so that it is properly captured. Similarly, it must not have a delay greater than (clock period – setup requirement of second flop). In other words, mathematically speaking, setup check equation is given as below (assuming zero skew between launch and capture clocks):
                                Tck->q + Tprop + Tsetup < Tperiod
 Similarly, hold check equation is given as:
                               Tck->q  + Tprop > Thold

If we take into account skews between the two clocks, the above equations are modified accordingly. If Tskew is the skew between launch and capture flops, (equal to latency of clock at capture flop minus latency of clock at launch flop so that skew is positive if capture flop has larger latency and vice-versa), above equations are modified as below:
                    
                      Tck->q + Tprop + Tsetup - Tskew < Tperiod
                      Tck->q  + Tprop > Thold + Tskew

The post 'Setup checks and hold checks for reg-to-reg paths' explains different cases covering setup and hold checks for flop-to-flop paths.

What if setup and/or hold violations occur in a design: As said earlier, setup and hold timings are to be met in order to ensure that data launched from one flop is captured properly at the next flop at the next clock edge, so as to move the state machine of the design to the next state. If the setup check is violated, the data will not be captured properly at the next clock edge. Similarly, if the hold check is violated, data intended to be captured at the next edge will get captured at the same edge. Setup/hold violations can also lead to data changing within the setup/hold window of the capturing flip-flop, which may lead to metastability failure in the design (as explained in our post 'metastability'). So, it is necessary to have setup and hold requirements met for all the flip-flops in the design, and there should not be any setup/hold violation.
What if you fabricate a design without taking care of setup/hold violations: If you fabricate a design having setup violations, you can still use it by lowering the frequency, since the setup check equation involves the clock period. The hold check equation, on the other hand, does not involve the clock period, so a design with a hold violation cannot be made to work by changing the frequency. So, if you fabricate a design with an accidental hold violation, you will have to simply throw away the chip (unless the hold path is half cycle as explained here). A design with only half cycle hold violations can still be used at lower frequencies.
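A small sketch of why lowering the frequency rescues a setup violation but not a (full cycle) hold violation, using the check equations with skew given above. All numbers are hypothetical and in ps:

```python
def setup_slack_ps(period, clk_q, prop_max, setup, skew=0):
    """Setup slack: Tperiod - (Tck->q + Tprop + Tsetup - Tskew)."""
    return period - (clk_q + prop_max + setup - skew)


def hold_slack_ps(clk_q, prop_min, hold, skew=0):
    """Hold slack: Tck->q + Tprop - Thold - Tskew (no clock period term)."""
    return clk_q + prop_min - hold - skew


def min_period_ps(clk_q, prop_max, setup, skew=0):
    """Period at which setup slack is exactly zero; the clock must be slower."""
    return clk_q + prop_max + setup - skew


# Hypothetical path, all values in ps.
CLK_Q, PROP_MAX, PROP_MIN, SETUP, HOLD, SKEW = 100, 1000, 10, 50, 30, 100

for period in (1000, 1100, 1200):
    print(f"period = {period} ps: "
          f"setup slack = {setup_slack_ps(period, CLK_Q, PROP_MAX, SETUP, SKEW)} ps, "
          f"hold slack = {hold_slack_ps(CLK_Q, PROP_MIN, HOLD, SKEW)} ps")

print("setup is met for periods above",
      min_period_ps(CLK_Q, PROP_MAX, SETUP, SKEW), "ps")
```

The setup slack turns positive once the period is stretched, while the hold slack stays at the same negative value for every period.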

Tackling setup time violation: As given above, the equation for setup timing check is given as:
            Tck->q + Tprop + Tsetup - Tskew < Tperiod

The parameter that indicates whether there is a setup time violation is setup slack. The setup slack can be defined as the amount by which the R.H.S exceeds the L.H.S. In other words, it is the margin available for the timing path to meet the setup check. The setup slack equation can be given as:
            Setup slack = Tperiod -  (Tck->q + Tprop + Tsetup - Tskew)
If setup slack is positive, it means there is still some margin available in the timing path. On the other hand, a negative slack means that the path violates the setup timing check by the amount of setup slack. To get the path met, either data delay should be decreased or clock period should be increased.
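The same slack can also be written as data required time minus data arrival time. The sketch below assumes that view, with Tprop taken as the sum of hypothetical stage delays and Tskew as capture latency minus launch latency; all values are illustrative and in ps:

```python
def setup_report(period, launch_latency, capture_latency, clk_q, stage_delays, setup):
    """Return (arrival time, required time, slack) for a flop-to-flop path."""
    arrival = launch_latency + clk_q + sum(stage_delays)
    required = period + capture_latency - setup
    return arrival, required, required - arrival


# Hypothetical path, all values in ps.
PERIOD, LAUNCH_LAT, CAPTURE_LAT = 1000, 200, 250
CLK_Q, SETUP = 100, 50
STAGES = [180, 220, 150, 260, 150]  # delays of the data-path cells and nets

arrival, required, slack = setup_report(
    PERIOD, LAUNCH_LAT, CAPTURE_LAT, CLK_Q, STAGES, SETUP)

# Cross-check against the slack equation given in the text.
skew = CAPTURE_LAT - LAUNCH_LAT
equation_slack = PERIOD - (CLK_Q + sum(STAGES) + SETUP - skew)

print(f"arrival = {arrival} ps, required = {required} ps, slack = {slack} ps")
print("matches the slack equation:", slack == equation_slack)
print("VIOLATED" if slack < 0 else "MET")
```

Here the path violates setup by 60 ps, so at least that much must be recovered through the data delay, the skew, the setup requirement or the clock period.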

Mitigating setup violation: Thus, we can meet the setup requirement, if violating, by 
1. Decreasing clk->q delay of launching flop 
2. Decreasing the propagation delay of the combinational cloud 
3. Reducing the setup time requirement of capturing flop 
4. Increasing the skew between capture and launch clocks
5. Increasing the clock period

Tackling hold time violation: Similarly, the equation for hold timing check is as below:
            Tck->q + Tprop > Thold + Tskew
The parameter that represents if there is a hold timing violation is hold slack. The hold slack is defined as the amount by which L.H.S is greater than R.H.S. In other words, it is the margin by which timing path meets the hold timing check. The equation for hold slack is given as:
            Hold slack = Tck->q + Tprop - Thold - Tskew
If hold slack is positive, it means there is still some margin available before it will start violating for hold. A negative hold slack means the path is violating hold timing check by the amount represented by hold slack. To get the path met, either data path delay should be increased, or clock skew/hold requirement of capturing flop should be decreased.
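As a rough illustration, the hold slack equation can be used to estimate how much data-path delay needs to be added. The 25 ps delay buffer and the 10 ps target margin are hypothetical, as are the path numbers (all in ps):

```python
import math


def hold_slack_ps(clk_q, prop_min, hold, skew):
    """Hold slack: Tck->q + Tprop - Thold - Tskew."""
    return clk_q + prop_min - hold - skew


def delay_buffers_needed(slack_ps, t_buf_ps=25, margin_ps=10):
    """Number of delay buffers needed to bring hold slack up to margin_ps.

    Assumption: every inserted buffer adds exactly t_buf_ps to the data path.
    """
    shortfall = margin_ps - slack_ps
    return max(0, math.ceil(shortfall / t_buf_ps))


# Hypothetical path: short data path and large positive skew.
CLK_Q, PROP_MIN, HOLD, SKEW = 100, 20, 40, 150

slack = hold_slack_ps(CLK_Q, PROP_MIN, HOLD, SKEW)
n_buf = delay_buffers_needed(slack)
new_slack = hold_slack_ps(CLK_Q, PROP_MIN + n_buf * 25, HOLD, SKEW)

print(f"hold slack = {slack} ps, buffers needed = {n_buf}, "
      f"slack after fix = {new_slack} ps")
```

Any delay added on a segment shared with a setup-critical path reduces that path's setup slack by the same amount, so both checks need to be re-verified after the fix.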

Mitigating hold violation: We can meet the hold requirement by:
  1. Increasing the clk->q delay of launching flop
  2. Increasing the propagation delay of the data path (for example, by inserting delay buffers)
  3. Decreasing the hold requirement of capturing flop
  4. Decreasing the clock skew between capturing and launching flip-flops