In the post on setup and hold time violations, we learnt about setup time violations and hold time violations. In this post, we will learn approaches to tackle setup time violations. The following strategies can be useful in reducing the magnitude of a setup violation and bringing the setup slack closer to a positive value:
1. Increase the drive strength of data-path logic gates: A cell with better drive strength can charge the load capacitance quickly, resulting in lower propagation delay. The output transition also improves, which in turn reduces the delay of the succeeding stages.
We can view a logic gate as a certain ON-resistance that charges/discharges a load capacitor to toggle the output state. This forms an RC circuit with a certain RC time constant. A gate with better drive strength has a lower resistance, which lowers the RC time constant and hence gives less delay. This is illustrated in figure 1 below. If an AND gate of drive strength 'X' has a pull-down resistance equivalent to 'R', the one with drive strength '2X' will have a resistance of R/2. Thus, a bigger AND gate with better drive strength will have less delay.
This strategy gives the best results only if the load of the cell is dominated by external load capacitance. Generally, the drive strength of a cell is proportional to the cell size. Doubling the cell size halves its internal resistance, but doubles the internal node capacitance. Thus, as shown in figure 2, the zero-load delay of a cell ideally remains the same on doubling the size of the cell.
Thus, upon doubling the drive strength of the cell (assuming D to be the original delay), the new delay can be anywhere between D/2 and D, depending upon the ratio of intrinsic to external load capacitance.
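The D/2 to D range can be checked with a small first-order RC sketch. This is only an illustration: the function name `gate_delay` and the normalized values of R and C are assumptions, and a real cell delay comes from the library characterization, not from this formula.

```python
def gate_delay(size, r_unit, c_int_unit, c_ext):
    """First-order RC delay of a cell scaled by `size` relative to the 'X' cell."""
    r_on = r_unit / size           # ON-resistance shrinks as the cell grows
    c_int = c_int_unit * size      # internal node capacitance grows with the cell
    return r_on * (c_int + c_ext)  # external load does not change with cell size


# Normalized, hypothetical values: unit cell has R = 1 and intrinsic C = 1.
R_UNIT, C_INT_UNIT = 1.0, 1.0

for c_ext in (0.0, 1.0, 10.0, 100.0):
    d_x = gate_delay(1, R_UNIT, C_INT_UNIT, c_ext)
    d_2x = gate_delay(2, R_UNIT, C_INT_UNIT, c_ext)
    print(f"C_ext = {c_ext:6.1f}   D(2X) / D(X) = {d_2x / d_x:.3f}")
```

With zero external load the ratio is 1 (no gain from upsizing), and it approaches 0.5 as the external load starts dominating the intrinsic capacitance.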
Moreover, the input pin capacitance is a by-product of the size of the cell. Increasing the size of the cell therefore increases the load seen by the cells driving its input pins. So, in some cases (for example, a very high drive strength cell with little output load, driven by a low drive strength cell), increasing the drive strength can actually increase the magnitude of the setup violation.
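A rough two-stage sketch shows how upsizing can backfire. The function `path_delay` and all its default values are hypothetical and normalized. The driver's own intrinsic delay is left out because it does not change when the next cell is resized.

```python
def path_delay(size, r_drv=4.0, r_unit=1.0, c_in_unit=1.0,
               c_int_unit=1.0, c_ext=2.0):
    """Delay of a driver stage followed by a cell resized by `size`.

    All values are normalized and hypothetical (first-order RC model).
    """
    # The driver sees the resized cell's input pin capacitance as its load.
    driver_delay = r_drv * (c_in_unit * size)
    # The resized cell drives its own internal capacitance plus the external load.
    cell_delay = (r_unit / size) * (c_int_unit * size + c_ext)
    return driver_delay + cell_delay


# Weak driver, light output load: upsizing makes the path slower.
print("weak driver  :", path_delay(1), "->", path_delay(2))

# Strong driver, heavy output load: upsizing helps.
print("strong driver:", path_delay(1, r_drv=0.25, c_ext=20.0),
      "->", path_delay(2, r_drv=0.25, c_ext=20.0))
```

In the first case the extra input pin load on the weak driver costs more than the resized cell gains, which is exactly the situation where a timing tool's upsizing move should be rejected.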
Keeping timing aside, power dissipation (both leakage and dynamic) is a function of cell drive strength, and so is area. So, increasing the drive strength to fix a setup violation increases both area and power (although the increase is very small in comparison to the whole design).
2. Use data-path cells with lower threshold voltage: If you have multiple threshold voltage (Vt) flavors in your design, a cell with lower threshold voltage will have less delay than its higher-Vt equivalent of the same drive strength. So, this should be the first step to resolve setup violations. Keep in mind that lower-Vt cells have higher leakage power.
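To get a feel for the trend, we can use the alpha-power law approximation, in which delay is proportional to Vdd / (Vdd - Vt)^alpha, along with an exponential subthreshold leakage model. The sketch below is only indicative: the supply voltage, the Vt values, alpha and the subthreshold swing are all assumed numbers, not data from any library.

```python
def relative_delay(vt, vdd=0.9, alpha=1.3):
    """Alpha-power law trend: delay is proportional to Vdd / (Vdd - Vt)^alpha."""
    return vdd / (vdd - vt) ** alpha


def relative_leakage(vt, swing_v_per_decade=0.09):
    """Subthreshold leakage trend: one decade per `swing_v_per_decade` volts of Vt."""
    return 10 ** (-vt / swing_v_per_decade)


# Hypothetical Vt flavors (in volts) for illustration only.
flavors = {"HVT": 0.45, "SVT": 0.35, "LVT": 0.25}
ref_delay = relative_delay(flavors["HVT"])
ref_leak = relative_leakage(flavors["HVT"])

for name, vt in flavors.items():
    print(f"{name}: delay x{relative_delay(vt) / ref_delay:.2f}, "
          f"leakage x{relative_leakage(vt) / ref_leak:.0f}")
```

The delay improves moderately as Vt goes down, while leakage grows much faster, so Vt swapping is best limited to the cells on violating paths.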
3. Improve the setup time of the capturing flip-flop: As we know, the setup time of a flip-flop is a function of the transitions at its data pin and clock pin. The better (faster) the transition at the data pin, the smaller the setup time. A worse (slower) clock transition also results in a smaller setup time. Also, a flip-flop with higher drive strength and/or lower threshold voltage is more likely to have a smaller setup time requirement. However, increasing the drive strength of the flip-flop might worsen the transitions at its clock and data pins due to higher pin loads, which also plays a role in deciding the setup time.
4. Restructure the data-path: Based upon the placement of the data-path logic cells, you can decide either to combine simple logic gates into a complex gate, or to split a multi-stage cell into simpler logic gates. A multi-stage gate is optimized in terms of area, power and timing. For example, a 2:1 mux will have less logic delay than one AND gate and one OR gate combined, for the same output load capacitance. But if the path needs to traverse some distance, two stages of logic can help, since the alternative of adding a buffer introduces additional delay.
Let us elaborate on this with an example in which a data-path implements a 3-input AND function between FF1 and FF2, which are situated around 400 micron apart. For simplicity, let us assume that one logic cell can drive 200 micron and that each logic cell has only one drive strength available. The choice is between two 2-input AND gates and one 3-input AND gate. In this case, the 3-input AND gate should give less delay (say, 150 ps for one 3-input AND vs 200 ps for two 2-input ANDs), as it has been optimized for area, timing and power as compared to two 2-input AND gates.
Now, consider another case where FF1 and FF2 are at a distance of 600 micron. In this case, if we use two 2-input AND gates, we can place them 200 micron apart and hence cover the distance. But if we use one 3-input AND gate, we will need to add a repeater, which will have its own delay. In this case, using two 2-input AND gates should give better results in terms of overall data-path delay.
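The two cases above can be worked out with a few lines of arithmetic. The 200 micron reach and the 150 ps / 200 ps logic delays come from the example; the 80 ps repeater delay and the function name `path_delay_ps` are assumptions made only for this sketch, and net delay is ignored.

```python
import math

REACH_UM = 200      # distance one cell can drive (from the example above)
BUF_DELAY_PS = 80   # hypothetical delay of one repeater


def path_delay_ps(distance_um, logic_delay_ps, n_logic_cells):
    """Return (total delay in ps, repeaters needed) for a FF-to-FF path.

    The launching flip-flop and every logic cell can each drive REACH_UM;
    any remaining distance has to be covered by repeaters.
    """
    drivers_needed = math.ceil(distance_um / REACH_UM)
    repeaters = max(0, drivers_needed - (1 + n_logic_cells))
    return logic_delay_ps + repeaters * BUF_DELAY_PS, repeaters


for distance in (400, 600):
    one_and3 = path_delay_ps(distance, logic_delay_ps=150, n_logic_cells=1)
    two_and2 = path_delay_ps(distance, logic_delay_ps=200, n_logic_cells=2)
    print(f"{distance} um: one AND3 = {one_and3}, two AND2 = {two_and2}")
```

At 400 micron the single 3-input AND wins; at 600 micron it needs one repeater and, with the assumed repeater delay, becomes slower than the two 2-input ANDs.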
5. Routing topologies: Sometimes, when there are a lot of nets in a certain region of the design, the routing tool may detour nets to relieve the congestion. Thus, two logic cells might be placed very close to each other and still show high delay: the driver cell due to high net capacitance, and the load cell due to poor transition at its input. Net delay can also be a significant component in such scenarios. The figure below shows one such example of two AND gates situated a certain distance apart. Ideally, there could be a straight net route between the two gates. But due to very high net density in the region, the router chose the route shown on the right to help ease the congestion (this is an exaggerated scenario to help understand better).
So, always pay attention to net routing topology, at least for setup-critical nets. A few tips you can try to improve the timing:
- Keep detours on the net to a minimum
- Vias increase the net resistance. So, try to have as few vias as possible
- Higher metal layers have less resistance. So, long nets can be routed in higher layers to have less net delay
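The effect of these three tips can be estimated with a first-order net RC calculation. This is a sketch under stated assumptions: the per-micron resistance and capacitance, the via resistance, the route lengths and the via counts are all made-up numbers, and the vias are assumed to sit near the driver so that the whole wire capacitance is downstream of them.

```python
def net_delay_ps(length_um, r_ohm_per_um, c_ff_per_um, n_vias, r_via_ohm=10.0):
    """First-order (Elmore-style) delay of a net, in ps.

    Assumes a uniformly distributed wire (R_wire / 2 term) with all vias
    lumped at the driver end. All parameter values are hypothetical.
    """
    r_wire = length_um * r_ohm_per_um          # ohm
    c_wire = length_um * c_ff_per_um           # fF
    r_vias = n_vias * r_via_ohm                # ohm
    return (r_vias + r_wire / 2) * c_wire * 1e-3   # ohm * fF = 1e-3 ps


# Hypothetical lower-layer and higher-layer wire parameters.
straight = net_delay_ps(100, r_ohm_per_um=2.0, c_ff_per_um=0.2, n_vias=2)
detour = net_delay_ps(300, r_ohm_per_um=2.0, c_ff_per_um=0.2, n_vias=6)
detour_upper = net_delay_ps(300, r_ohm_per_um=0.2, c_ff_per_um=0.2, n_vias=6)

print(f"straight route      : {straight:.1f} ps")
print(f"detoured route      : {detour:.1f} ps")
print(f"detour, higher layer: {detour_upper:.1f} ps")
```

Apart from the net delay itself, the detoured route also presents a larger capacitance to the driver cell, which is not captured in this number.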
6. Add repeaters: Every logic cell has a limit up to which it can drive a load capacitance. Beyond that, its delay starts increasing rapidly. Since net capacitance is a function of net length, we should keep a limit on the length of the net driven by a gate. Also, net delay itself is proportional to the square of the net length. Moreover, the transitions may be very bad in such cases. So, it is wise to add repeater buffers after a certain distance, in order to ensure that the signal is transferred reliably and in time.
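Because net delay grows with the square of the length, splitting a long net into N segments cuts the wire delay by roughly a factor of N, at the cost of N - 1 buffer delays. A minimal sketch follows, with assumed wire parameters and an assumed fixed 20 ps buffer delay; the buffer's own drive resistance and input capacitance are ignored.

```python
def wire_delay_ps(length_um, r_ohm_per_um=1.0, c_ff_per_um=0.2):
    """Distributed RC wire delay, 0.5 * R * C, in ps (grows as length squared)."""
    r_wire = r_ohm_per_um * length_um
    c_wire = c_ff_per_um * length_um
    return 0.5 * r_wire * c_wire * 1e-3   # ohm * fF = 1e-3 ps


def repeated_delay_ps(length_um, n_segments, buf_delay_ps=20.0):
    """Delay of the same net split into equal segments with repeaters in between."""
    segment = wire_delay_ps(length_um / n_segments)
    return n_segments * segment + (n_segments - 1) * buf_delay_ps


for length in (200, 2000):
    print(f"{length} um net:")
    for n in (1, 2, 4, 8):
        print(f"  {n} segment(s): {repeated_delay_ps(length, n):.1f} ps")
```

For the long net, a few repeaters reduce the total delay considerably, while too many start adding more buffer delay than they save. For the short net, a repeater only adds delay, which is why repeaters are inserted only beyond a certain length.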
7. Play with clock skew: Positive skew (the capture clock arriving later than the launch clock) helps improve the setup slack. So, to fix a setup violation, we may choose either to increase the clock latency of the capturing flip-flop, or to decrease the clock latency of the launching flip-flop. However, in doing so, we need to be careful about the setup and hold slack of the other timing paths that start from or end at these flip-flops.
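The trade-off can be seen from the standard single-cycle slack equations. In the sketch below, skew is taken as capture latency minus launch latency, and all the delay numbers (in ps) are hypothetical.

```python
def setup_slack(period, t_clk_q, t_comb_max, t_setup,
                launch_latency, capture_latency):
    """Setup slack of a single-cycle path; all times in ps."""
    skew = capture_latency - launch_latency
    return period + skew - (t_clk_q + t_comb_max + t_setup)


def hold_slack(t_clk_q, t_comb_min, t_hold, launch_latency, capture_latency):
    """Hold slack of a path between the same pair of clock pins; all times in ps."""
    skew = capture_latency - launch_latency
    return (t_clk_q + t_comb_min) - (t_hold + skew)


# Hypothetical setup-critical path, before and after delaying the capture clock.
print(setup_slack(1000, 100, 900, 50, launch_latency=200, capture_latency=200))
print(setup_slack(1000, 100, 900, 50, launch_latency=200, capture_latency=260))

# A short path ending at the same capturing flip-flop loses hold margin.
print(hold_slack(60, 20, 40, launch_latency=200, capture_latency=200))
print(hold_slack(60, 20, 40, launch_latency=200, capture_latency=260))
```

The 60 ps of added capture latency that fixes the setup path is the same 60 ps that pushes the short path into a hold violation, so both checks need to be re-run on every path touching the adjusted flip-flop.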
8. Increase clock period: As a last resort, you may choose to time your design at a reduced frequency. But if you are targeting a particular performance, you need a certain minimum frequency. In that case, this option is not for you.
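If relaxing the frequency is an option, the smallest clock period that a given path can meet follows directly from the setup equation. A minimal sketch, with hypothetical delay values in ps:

```python
def min_period_ps(t_clk_q, t_comb_max, t_setup, skew=0):
    """Smallest clock period (ps) for which the path has zero setup slack."""
    return t_clk_q + t_comb_max + t_setup - skew


def max_freq_mhz(period_ps):
    """Convert a clock period in ps to a frequency in MHz."""
    return 1e6 / period_ps


# Hypothetical worst path: 100 ps clk->q, 900 ps logic, 50 ps setup, no skew.
period = min_period_ps(100, 900, 50)
print(f"minimum period = {period} ps, "
      f"maximum frequency = {max_freq_mhz(period):.1f} MHz")
```

The worst path in the design sets this limit, so the achievable frequency can then be compared against the performance target.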
9. Improve the clk->q delay of the launching flip-flop: A flip-flop with less clk->q delay will help in meeting a violating setup timing path. This can be achieved by:
- Improving the transition at the flip-flop's clock pin
- Choosing a flip-flop of higher drive strength. However, if the clock transition degrades by doing so, the delay can actually increase
- Replacing the flip-flop with a flip-flop of the same drive strength, but lower Vt
In this post, we learnt how to approach a setup violating timing path. Have you ever used a method that is not listed above? Please share your experience in comments. We will be happy to hear from you.
I have been preparing for Silicon Design Engineering positions from your Blogs. It is so precise and to the point. It has almost everything a Design Engineer needs to know before going to the interview. Thank you!
Hi Swati
Thanks for the kind words. It would be very nice if you could share feedback on which other topics should be covered. :-)
Excellent work bro, Have some below listed queries
1. In step 1 (Increase the drive strength of data-path logic gates) you mentioned that cell size can be increased to increase drive strength. What does the cell size refer to here? Does it mean a change in process technology, like moving from one nanometer node to another (0.10nm to 0.28 nm)?
If that is the case, is it possible to move only specific cells (on the path having higher delay) to a different technology? Because mostly I have heard that the entire chip is made in a single technology (like a 0.28nm chip).
Correct my understanding if i'm wrong.
2. In step 6 ( Adding repeaters), But repeaters will in turn take some delay right? Do you mean that delay caused by repeater is very less when compared to net delay.
Hi
1. You are right that the entire chip needs to be made in a single technology. I would digress slightly from the topic, but technology refers to the minimum dimensions that can be drawn. For instance, 28 nm technology has a requirement for wire width to be, say, a minimum of 35 nm. But we can always draw wires with a larger width like 100 nm, which can be the minimum wire width of 90 nm technology. And 100 nm wide wires will have less resistance than 35 nm wires. So, technology is just a placeholder for minimum dimensions.
Coming back to query, increasing cell size means increasing "W" while keeping "L" constant for all the transistors in the cell. It also means using a cell with higher drive strength.
2. Yes, repeaters will take some delay. But a repeater can only handle load until a specific limit. I recommend you to first go through this post "https://vlsiuniverse.blogspot.com/2017/12/how-delay-of-standard-cell-changes-with.html". I guess the answer should be implicit after that. In case there are any queries, we can always discuss.
1. Thank u, which means we can able to increase wire width of specific cell's alone (worst case timing path cells). Is that possible ? Apart from that, increase in wire width by keeping L as constant will leads to increase in overall area of transistor right?
Hi
At cell level (assuming we are talking about SoC design flow), increasing wire width will not be possible because cells are picked from specific cell libraries. So, changing attributes of a cell is not possible. :-) But yes, you can change the width of the net connected to the cell. But there are pros and cons of doing this, because increasing the width decreases R, but at the same time, increases C. So, if the net is very small, we may not see any improvement in delay (or we may even see delay degradation).
Hi
Will increasing the width of the net increase the capacitance and decrease the resistance by the same amount? If this is so, it means that the time constant of the net will remain unchanged even after the new size.
Unless, the resistance and capacitance get modified differently upon resizing of the net's width?
Hi, on a first level, yes. there may be slight modification based upon higher order effects. Yes, time constant of the net will remain same if we consider lumped parasitics. But on decreasing net capacitance, the driver cell supplies better transition signal. So, in most of the cases, delay of the net should improve.
OK, got it clarified now. Thanks a lot
Hi.
I have some questions.
you say that 9. Choosing a flip-flop of high drive strength. However, if by doing so, clock transition degrades, delay can actually increase
but i learned that if drive strength increase, delay and transition is better(short,,?).
it is different flipflop drive strength and other drive strength?
What happens at the tapeout stage if hold violations are not fixed?
Amazing post, I work as a PD engineer, I have been referring to this page for the past 6 months. Keep it up!
Number "9. Improve (Reduce) the d->q delay..." can also be accomplished by reducing the fanout. Fanout can be reduced through register duplication.