Showing posts with label STA. Show all posts
Showing posts with label STA. Show all posts

Clock gating checks in case of mux select transition when both clocks are running

PROBLEM: In the following figure, it is desired to toggle the select of the mux from CLOCK_DIV to CLOCK and both the clocks are running. What are the architectural and STA considerations for the same?

SOLUTION:
This is a very good example to understand how clock gating checks work, although you may/may not find any practical application for the same. We have to toggle the select of the multiplexer such that there is no glitch at the output. Let us consider architectural considerations first:

Architectural considerations:

Launching flip-flop of 'select' signal: In the post clock gating checks at a multiplexer, we discussed that if there is a mux getting clock at its inputs and select as data, then, there are two possible scenarios:

  • If the other clock is at state "0", then AND type check is formed and select has to launch from negative edge-triggered flip-flop
  • If the other clock is at state "1", then OR type check is formed and select has to launch from positive edge-triggered flip-flop
Now, since both the clocks are running simultaneously, both with act as "other clock" for each other. Let us choose to keep both the clocks at state "0" when select toggles. The same discussion holds true for the other scenario as well, just that appropriate values will hold. Thus,

(i) Both clocks required to be at state '0' when clock toggles
(ii) There is AND-type clock gating check formed between 'select' and both clocks 
(iii) 'select' launches from negative edge-triggered flip-flop.


Valid negative edges when 'select' can toggle: Now, as mentioned above both the clocks should be zero when select toggles. Figure below shows the valid and invalid edges where 'select' can toggle. As it turns out, select can toggle only on edges labelled "VALID" as both "CLOCK" and "DIV_CLOCK" will be zero then.

 So, to ensure that "SEL" toggles only when DIV_CLOCK is "0", we can add logic to the input of the flip-flop launching "SEL" such that it allows to propagate "SEL" only when DIV_CLOCK is "0".


In the above diagram, flip-flop launching "SEL" will hold its value when DIV_CLOCK = 0. We have to keep in mind that this implementation is just a representation of what needs to be done. the actual implementation may be more complex than this depending upon the requirements.

Timing considerations: Now coming to the timing considerations, we need to ensure that the setup and hold conditions are met, which are as shown in the figure below:

Also read:




Intricacies in handling of half cycle timing paths

What is a half cycle path? A half cycle timing path is one in which launch and capture happen on different clock edges. A half cycle path can be in terms of both setup and hold. However, normally, in technical terms half cycle path is the one which has setup check getting formed as half cycle. For instance, following are some of the examples of half cycle timing paths:


  1. A timing path from positive edge-triggered flip-flop to a negative edge-triggered flip-flop and vice-verse. Here, hold check is also half cycle on the previous edge
  2. A timing path from a positive level-sensitive latch to a negative level-sensitive latch and vice-verse. Here, hold check is zero cycle
  3. A timing path from a negative edge-triggered flip-flop forming a clock gating check on AND gate (Here, hold check is zero cycle)
  4. A timing path from a positive edge-triggered flip-flop forming a clock gating check on OR gate (here, hold check is zero cycle)
There are also, some cases where hold check is half cycle and setup check is single/zero cycle. These are:
  1. A timing path from a negative edge-triggered flip-flop forming a clock gating check on OR gate (Here, setup check is single cycle check)
  2. A timing path from a positive edge-triggered flip-flop forming a clock gating check on AND gate (Here, setup check is single cycle check)
In addition, minimu pulse width checks should also be considered same as half cycle timing paths. But, in this case, start-point and end-point are the same register.

In this post, we will be considering only setup timings paths as example, although the complete discussion applies on all kinds of half cycle setup paths/checks. To start with, let us note down the most simple setup check equation for half cycle timing paths.

Tck->q + Tprop + Tsetup  < (Tperiod/2) + Tskew

Let us now discuss some of the intricacies that we should be aware of while dealing with half cycle timing paths:

Clock source duty cycle variation: There is always a variation in duty cycle of the clock source due to uncertainty in the relative timings of positive and negative edges. Duty cycle variation is always measured with respect to corresponding positive and negative edges. In other words, we can also say that duty cycle variation is the uncertainty in arrival of negative edge, given that positive edge has arrived at certain fixed point of time. Let us take an example. If we are given a clock with a period of 10 ns with ideal 50% duty cycle. Also, we are given that it has the clock has a duty cycle variation of +-5%. So, if we say that we saw positive edge of clock at 100 ns, we can expect to see negative edge of clock at any time between 14.5 ns and 15.5 ns. Following waveform illustrates this. You can read my earlier post duty cycle variation to have a more detailed elaboration.

So, the setup check equation modifies as:



Tck->q + Tprop + Tsetup  < (Tperiod/ 2- Tsdc) + Tskew
where Tsdc is the clock source duty cycle variation. Thus, the effective half clock period reduces by an amount equal to duty cycle variation.

Duty cycle degradation In addition to source duty cycle variation, there can be assymmetry in rise delay vs fall delay of clock elements. For instance, a buffer may have nominal rise (0 -> 1) delay of 50 ns whereas 48 ns for fall delay (1 -> 0). So, if a clock pulse passes through it, it will eat a portion of this clock pulse as shown in figure 1 below. For more clarity, we have exaggerated the scenario with a fall delay of 30 ns.

So, a half cycle may be larger of smaller than actual half cycle at the clock pin. In the above case, positive to negative edge setup check will be tighter by 20 ns and negative-> positive setup check will be relaxed by same amount (neglective OCVs as of now). So, the modified setup equation, now, becomes:
Tck->q + Tprop + Tsetup  < (Tperiod/2 - Tsdc) + (Tskew - Tdcd)
As discussed above also, Tdcd can be positive or negative depending upon if rise-fall variation of cells is helping or oppsing.

Can you think of some other scenario that is specific only to half cycle timing paths? Do share, if you do.

Is hold always checked on the same edge?

One of the guys asked me a question, "Why is hold always checked on the same edge?" Normally, it is taught in books/colleges that hold is frequency independent because it is checked on same edge. But, is it really true? It is true only for some of the many cases. Hold can be checked on the same edge, next edge or previous edge depending upon the scenario. In this post, we will discuss those cases one by one, and try to generalize if this statement holds true.

How to determine the edge on which hold check needs to be checked: For most of us, it seems quite confusing to arrive at the conclusion of how to determine the hold edge. Let us try to use a state machine perspective here. In state machine theory, we study that synchronous digital circuits can be considered as state machines moving from one state to another. This state transition happens on each clock edge as shown in figure 1 below.

In digital circuits, we can say that each clock edge (either positive or negative) corresponds to an independent state.
Figure 1: Each clock edge corresponds to a design state

If we look at each flip-flop, every positive edge-triggered flip-flop changes its state at positive clock edge and all negative edge-triggered flip-flops transition state at negative clock edge. Similarly, all negative edge-triggered flip-flops transition state at negative clock edge as shown in figure 2 below.


We can assume that all positive edge-triggered flip-flops transition their states at positive edges and all negative edge-triggered flip-flops transition their states at negative edges of clock
Figure 2: State transition for positive edge-triggered and negative edge-triggered flip-flops

Or, we can represent the states of positive edge-triggered and negative edge-triggered flops as separate as shown in figure 3 below.


Figure 3: States of positive and negative edge-triggered flops represented symbolically

Let us have a scenario of a timing path from a positive edge-triggered flop to a positive edge-triggered flop. In the figure 4 below, flip-flop "2" transitions to state (K+1) depending upon the value of flip-flop "1" at state (K).

Figure 4: A sample timing path from positive edge-triggered flip-flop to positive edge-triggered flip-flop

Here, the data launched from ff1 should help ff2 transition to state "K+1", meaning, it should be captured at the corresponding clock edge. This represents setup check. Also, it should not disturb state "K" of ff2, meaning it should not get captured at this edge. This represents hold check. So, in this case hold check is on the same edge as the present state of start and end flops is the same edge.

Figure 5: Setup and hold checks for positive-to-positive edge-triggered timing paths

Now, let us take a look on the scenario where-in hold check is not on the same edge. Let us take a timing path launching data from negative edge and capturing at positive edge. This scenario is shown in figure 6 below.

Figure 6: Timing path from negative-to-positive edge-triggered flop

Here, positive edge-triggered flip-flop transitions states on positive edge and negative edge-triggered flop transitions on negative edges. So, the data launched from negative edge-triggered flop corresponding to state "X" should get captured on positive edge-triggered flop on state "Y+1", which corresponds to setup check. And it should not get captured on state "Y", which corresponds to hold check.


Thus, we have looked upon different cases of hold capturing edge being same or different than the launch edge. For all the possible cases of setup and hold checks, you can follow below posts:


Setup time and hold time - origin

In our previous post, Setup and hold – the state machine perspective, we discussed how setup and hold can be defined in respect of state machines. Interestingly, there is another perspective of setup and hold – that in repect to devices, known as setup and hold time requirements. For a device, (for example a flip-flop, a latch or an SoC), setup and hold times are defined as:
Setup time: Setup time of a device is defined as the minimum time before the clock edge the data should be kept stable so that it is reliably sampled by the clock.
Hold time: Hold time of a device is defined as the minimum time after the clock edge the data should be kept stable so that it is reliably sampled by the clock.

In other words, every device has a setup and hold window surrounding the active clock edge within which data should be kept stable. As is shown in figure 1, brown line represents the active clock edge, blue line represents setup window and red line represents hold window. As is shown, data can toggle at any time except between setup and hold windows. Toggling of data between setup-hold window means flip-flop might go into metastable state and the output of flip-flop does not remain predictable.
Setup check for data path being launched from positive edge-triggered flip-flop is single cycle and hold check is zero cycle
Figure 1: Setup and hold checks
Origin of setup and hold timing requirements: Let us consider a positive edge-triggered flip-flop. Figure 2 shows a most simplistic circuit for a practical flip-flop. Inverters I1, I2 and Transmission gates G1, G2 constitute master latch and I3, I4, G3, G4 constitute slave latch.
A positive edge-triggered flip-flop consists of master and slave latches, each of which consists of two inverters connected in positive loop back mode and two transmission gates
Figure 2: A typical practical circuit for negative edge-triggered flip-flop
Figure 3 below shows the origin of setup time requirement. For data to get latched properly, it should complete the feedback loop of master latch before the closing edge of clock at transmission gate G4. So, setup time requirement of the flip-flop is:
Tsetup = TG3 + TI1 + TI2 + TG4
The setup check of a flip-flop consists of delay of input transmission gate and feedback transmission gate and the two inverters of master latch
Figure 3: Figure demonstrating delays constituting setup check
Similarly, figure 4 below shows the origin of hold timing requirement. For data to get latched properly, the next data should not cross inverter I1. So, hold timing requirement of the flip-flop is:
Thold = -(TG3 + TI1)

In other words, hold time is the minimum time required for the data to change after the clock edge has passed so that new data does not get captured at the present clock edge.
Hold check consists of input transmission gate delay and input inverter delay of master latch in flip-flop
Figure 4: Delays constituted in hold check


Thus, in this post, we have discussed the origin of setup and hold checks for a device.

What is the difference between a normal buffer and clock buffer?

A buffer is an element which produces an output signal, which is of the same value as the input signal. We can also refer a buffer as a repeater which repeats the signal it is receiving, just as there are repeaters in telephone signal transmission lines. You must have noticed that we have two kinds of buffers (or any logic gate) available in standard cell libraries as:

  • Clock buffer: The clock buffers are designed specifically to have specific properties that are supposed to be good for clock distribution networks (clock trees). The specific properties that are required in an ideal clock tree buffer are given as below. However, it is not possible to attain these ideal properties for every buffer at every technology node. It may be only possible to get close to these properties.
    • Equal rise and fall times
    • Less delays
    • Less delay variations with PVT and OCV
  • Normal buffer/data buffer: For a data buffer, the above properties are usually less desired
Usually, we can say that following differences may exist between a clock buffer and a normal buffer:
  • In SoCs, clock routing is done in higher metal layers as compared to signal routing. So, to provide easier access to clock pins from these layers, clock buffers may have pins in higher metal layers. That is, vias are provided in standard cell itself instead of necessitating on having in clock distribution network. For a data buffer, the pins are expected to be in lower layers only.
  • Clock buffers are balanced. In other words, rise and fall times of clock buffers are nearly equal. The reason behind this is that if the clock buffers are not balanced, there will be duty cycle distortion in the clock tree, which can lead to pulse width violations as discussed in minimum pulse width violation example. On the other hand, data buffers can compromise with either of rise/fall times. In other words, they dont need to have PMOS/NMOS size to be 2:1; and hence, can be of smaller size as compared to clock buffers.
  • Due to above reason, clock buffers consume more power as compared to normal buffers.
  • Generally, you will find clock buffers with higher drive strength as compared to normal buffers. So that a clock buffer can drive long nets and can have higher fanouts. This helps clock buffers, and hence, clock trees to have less overall delays.

What is meant by drive strength of a standard cell

As we know that cell delay is a function of output load capacitance. The most simplistic equivalent circuit of a logic gate driving an output can be assumed as given in figure 1:


The purpose of logic gate is to propagate the effect of logic value available at its input to the output. Based upon whether '0' or '1' is to be propagated to the output. The corresponding is achieved by charging and discharging of the output load capacitance. Propagating a logic '0' will mean discharging of the load capacitance, and vice-versa. Drive strength of the logic gate is the its relative capability to charge/discharge the capacitance present at its output. Now, the time constant, and hence, delay of the circuit is "RC".
So, for a cell with higher drive strength, corresponding "R" is lesser than the one with lower drive strength. So that for same load capacitance "C", delay is lower for a cell with higher drive strength as it can charge the capacitance in lesser time.

How drive strength varies with size of a cell: Let us talk in terms of MOSFETs, although this is valid in terms of every device in general. We know that for a given technology standard cell library, length of all transistors is kept constant. For instance, 90 nm technology will have gate length of all transistors as ~90 nm. And channel resistance of the MOSFET is inversely proportional to "W/L" of the transistor. So, a simple way to decrease channel resistance is to increase "W" of the transistor. So, a transistor with more area will have lesser resistance. Or we can say that a logic gate with bigger transistors will have more drive strength.

What is unit drive strength: In a standard cell library, we generally see cells labelled as "1X", "2X" and so on. But what is meant by the number that you see with drive strength? In general, the lowest size logic gate is labelled as unit drive strength. The drive strength numbers of other cells are laelled relative to unit drive strength cell.

Read next: How delay of a cell changes with drive strength

Also read:

How to fix min pulse width violation

In our previous posts, we discussed about the duty cycle, duty cycle variation and duty cycle degradation. Bad duty cycle impacts half cycle timing paths and has impact in meeting timing for minimum pulse width checks of flip-flops. However, there are certain techniques available that can help you in improving the duty cycle of the clock. We will discuss these techniques in this post as below:

1. Dual inversion in clock branch: A certain category of logic cells are more probable of having one of the rise or fall delays greater than the other. A chain of such cells will make either high pulse of clock shorter or low pulse of clock shorter. One can use an inverter in the middle of the chain as shown in figure below to tackle this. Doing this, what we are essentially doing is converting rise edges to fall and vice-versa. So, the shortening of pulse of first few elements is balanced with the rest of the elements. In the below figure, there are 20 buffers, each shortening the pulse by 10 ps. The output of 10th buffer will have a shorter pulse as compared to clock source. The inverter at the output of 10th buffer will feed an inverted clock to 11th buffer. This will have high pulse which is greater than low pulse. Rest of the chain will try to reduce this pulse. In the end, we get a pulse which is equal to what was available at the source.


One can also try an all-inverter clock tree. In an all-inverter clock tree, every element will change the sense of clock pulse; thereby minimizing the clock pulse distortion.

However, this kind of delay balancing will only work where there is inherent variation of delays in rise vs fall. It will not work in case of OCV variations. So, if the chain length is arbitrarily large, our second method will come to rescue.

2. Even division to tackle duty cycle degradation: Suppose there is source clock with very poor duty cycle (say 10%) and you divide down the clock by 2 with a flip-flop divider. What we observe is amazing. The resulting clock is having almost 50% duty cycle. So, whenever we need an output clock with perfect duty cycle, we can use a divider to divide down the clock. The only drawback of this method is that we need a source clock of frequency twice than what is required to be timed!!


There are a few things to be kept in mind for this method:

  • This method will improve the duty cycle of clock at the output of the flop. Degradation in duty cycle happening after the divider, if any, will be there.
  • Duty cycle of the input clock at flip-flop must be within the limits of what is required to be minimum pulse width at the flip-flop.

Duty cycle of clock

Duty cycle: Duty cycle of a clock is defined as the fraction of a period of clock during which the clock is in active state. Duty cycle of a clock is normally expressed as a percentage. For instance, figure below shows a clock having an active state of '1' stays low for 2 ns during its period of 10 ns. It is, therefore, said to have a duty cycle of 20%.


How duty cycle impacts timing: Duty cycle of clock plays a big role in timing closure of designs. We need to consider following factors related to duty cycle variation while timing:

  • Half cycle timing paths: If there are both positive and negative edge-triggered flip-flops in the design, duty cycle of the clock matters a lot. For instance, if we have a clock of 100 MHz with 20% duty cycle; For a timing path from positive edge-triggered flip-flop to negative edge-triggered flip-flop, we get only 2 ns for setup timing for positive-to-negative path and 8 ns for negative-to-positive path as compared to 10 ns for a full cycle path. However, if the same clock had duty cycle of 50%, we would have got 5 ns for the same half cycle timng path.

  • Minimum pulse width requirements: At high frequencies, duty cycle matters a lot. For instance, every sequential element has requirement of minimum pulse width that should reach it (read this). If the duty cycle of the clock is not close to 50%, we are limited in providing high frequency even if we are capable of meeting timing at even higher frequencies. Let us take an example. If the minimum pulse width requirement of a flip-flop is 500 ps, then with 50% duty cycle clock, we can use a clock of 1 GHz (1 ns clock period). But if we use a clock of duty cycle of 20%, we cannot use a clock greater than 400 MHz.
With the above things in mind, it makes sense to use a clock with duty cycle as close to 50%. However, in many scenarios, it may not be feasible to do so. So, one needs to decide the priorities; i.e., architecture complexities vs timing complexities. Generating a divided clock of 50% duty cycle is not always possible and there are a few complexities involved in architecture. For instance, clock waveform synchronization between the clocks if there are multiple dividers. Also, for odd division factors like divide_by_3 etc., we need more complex divider circuitry than what may be required for divide_by_2 or divide_by_4 etc.

Clock jitter

Clock jitter: By definition, clock jitter is the deviation of a clock edge from its ideal position in time. Simply speaking, it is the inability of a clock source to produce a clock with clean edges. As the clock edge can arrive within a range, the difference between two successive clock edges will determine the instantaneous period for that cycle. So, clock jitter is of importance while talking about timing analysis. There are many causes of jitter including PLL loop noise, power supply ripples, thermal noise, crosstalk between signals etc. Let us elaborate the concept of clock jitter with the help of an example:

A clock source (say PLL) is supposed to provide a clock of frequency 10 MHz, amounting to a clock period of 100 ns. If it was an ideal clock source, the successive rising edges would come at 0 ns, 100 ns, 200 ns, 300 ns and so on. However, since, the PLL is a non-ideal clock source, it will have some uncertainty in producing edges. It may produce edges at 0 ns, 99.9 ns, 201 ns etc. Or we can say that the clock edge may come at any time between (<ideal_time>+- jitter); i.e. 0, between 99-101 ns, between 199-201 ns etc (1 ns is jitter). However, counting over a number of cycles, average period will come out to be ~100 ns.

Figure 1 below shows the generic diagram for clock jitter:



Please note that the uncertainty in clock edge can be for both positive as well as negative edges (above example showed only for positive edges). So, there are both full cycle and half cycle jitters. By convention, clock jitter implies full cycle clock jitter.


Types of clock jitter: Clock jitter can be measured in many forms depending upon the type of application. Clock jitter can be categorized into cycle-to-cycle, period jitter and long term jitter.
  • Cycle to cycle jitter: By definition, cycle-to-cycle jitter signifies the change in clock period accross two consecutive cycles. For instance, it will be difference in periods for 1st and 2nd cycles, difference in periods for 10th and 11th cycles etc. It has nothing to do with frequency variation over time. For instance, in figure below, the clock has drifted in frequency (from period = 10 ns to period = 1 ns), still maintaining a cycle-to-cycle jitter of 0.1 ns. In other words, if t2 and t1 are successive clock periods, then cycle_to_cycle_jitter = (t2 - t1).

  • Period jitter: It is defined as the "deviation of any clock period with respect to its mean period". In other words, it is the difference between the ideal clock period and the actual clock period. Period jitter can be specified as either RMS period jitter or peak-to-peak period jitter.
    • Peak-to-peak period jitter: It is defined as the jitter value measuring the difference between two consecutive edges of clock. For instance, if the ideal period of the clock was 20 ns, then for clock shown above,
      • for first cycle, peak-to-peak period jitter = (20 - 20) = 0 ns
      • for second cycle, peak-to-peak period jitter = (20 - 19.9) = 0.1 ns
      • for last cyle, peak-to-peak period jitter = (20 - 1) = 19 ns
    • RMS period jitter: RMS period jitter is simply the root-mean-square of all the peak-to-peak period jitters available.

  • Long term jitter: Long term jitter is the deviation of the clock edge from its ideal position. For instance, for a clock with period 20 ns, ideally, clock edges should arrive at 20 ns, 40 ns and so on. So, if 10th edge comes at 201 ns, we will say that the long term jitter for 10th edge is 1 ns. Similarly, 1000th edge will have a long term jitter of 0.5 ns if it arrives at 20000.5 ns.

Let us try to understand the difference between all the three kinds of jitter with the help of an illustrative example waveform below:


Reference:
* Understanding SYSCLK jitter

Also read:

Can jitter in clock effect setup and hold violations?

First of all, we need to understand what is meant by jitter. In most simplistic language, jitter is the uncertainty of a clock source in production of clock edges. For example, if we say that there is a 100 MHz clock source. Ideally, it should produce a clock edge at 0 ns, 10 ns, 20 ns... So, if we say that there was a clock edge at time t = 30 ns, we should get the next clock edge at t = 40 ns. But this is hardly so; due to the uncertainty of getting a clock edge, we might get the next edge between 39.9 ns to 40.1 ns. So, we say that 0.1 ns is the jitter in the period of the clock. In reality, the definition of jitter is more complex. But, for our scope, this understanding is sufficient.

Let us consider a simple timing path from a positive edge-triggered flip-flop to a positive edge-triggered flip-flop.


Now, let us come to our discussion. First, let us discuss the effect of clock jitter on setup slack.

Effect of clock jitter on setup slack for single cycle paths: From our knowledge of STA basics, setup check formed, in this case, will be from edge 1 -> edge 3. Now, if we know that edge 1 arrived at 20 ns, then edge 3 may arrive at any time (20 ns + CLOCK_PERIOD + jitter) and (20 ns + CLOCK_PERIOD - jitter). So, to cover worst case timing scenario, we need to time as per (20 ns + CLOCK_PERIOD - jitter). So, effectively, we will get (CLOCK_PERIOD - jitter) as effective clock period.

In other words, jitter in clock period makes the setup timing more tight. Or it decreases setup slack for single cycle timing paths.


Effect of clock jitter on hold slack for single cycle paths: Going on the same grounds as setup slack, hold check will be from edge 1 -> edge 1 only. And we know with certainty that edge 1 will leave the source at 20 ns only. So, hold slack should not get bothered by the amount of jitter present at the clock source for single cycle timing paths.

Now, you understand the basics of  how jitter affects setup and hold slacks. We can state as below:

If the check being formed involves two different edges of same polarity (for instance, different rising edges), then, jitter in clock period will affect setup slack. Otherwise, it will not.

Now, can you guess the effect of jitter on setup and hold slacks for zero cycle timing paths?

Also, what will be the amount of pessimism needed to be taken into account for setup and hold slacks' calculations if the timing path is a multi-cycle path taking 2 cycles for setup and zero cycle for hold?

Also read:



Recovery and removal checks

Recovery and removal checks are associated with deassertion of asynchronous reset. The assertion of reset causes the output to get reset and deassertion transfers the control of output to clock signal; i.e., deassertion of reset does not change the output as we discussed in post synchronous and asynchronous resets. However, to ensure that the design comes out of reset in deterministic cycle and to avoid metastability, there must be a region around arrival of clock edge within which reset must not be deasserted. This is similar to setup and hold timing checks, the difference being that:
Setup and hold checks are associated with synchronous data signals for a flop and are applied to both rise and fall transitions of data. Recovery and removal checks, on the other hand, are for asynchronous reset transitioning from active state to inactive state only (deassertion of reset).
To properly understand what recovery and removal checks are, we need to understand what asynchronous reset assertion and deassertion does.

Asynchronous reset assertion: In a flip-flop, assertion of reset causes the output to go to its reset value (which is normally "0"). The assertion of reset is an asynchronous event and is not impacted by the state of clock. Figure 1 below depicts the assertion of reset. As it can be seen, Output asynchronously goes to "0" as an effect of reset going to its active state "1".


Asynchronous reset deassertion: The deassertion of asynchronous reset causes the output to get out of the impact of reset and behave like a normal flip-flop. When the reset gets deasserted, its output remains to be "0" until the clock edge. When the clock edge arrives, the value at the input of the flop propagates to the output. Figure 2 below depicts the same. However, the position of reset deassertion with respect to clock edge matters here as is the case with setup and hold checks. If the reset toggles in the vicinity of clock edge, the flip-flop may go metastable. This is avoided by defining recovery and removal checks for reset deassertion. For the sake of simplicity, we can say that recovery and removal checks are setup and hold checks for reset deassertion.



Reset recovery check: Recovery check ensures that the deasserted reset signal allows the clock signal to take control of the output at the desired clock edge. For this, reset signal must be stable at least "recovery time" before the active clock edge. Recovery time is the minimum time required between the deassertion of reset signal and arrival of clock edge. This can be modelled similarly as a setup check with the difference of it being a single sided synchronous check only.

Reset removal check: Removal check ensures that the deasserted reset signal does not get captured on the clock edge at which it is launched by reset synchronizer. For this, reset signal must be stable at lease "removal time" after the active clock edge. Removal time is the minimum time required after the arrival of clock edge for which reset must not be deasserted at the flop's reset pin. This can be modelled similarly as a hold check with the difference of it being a single sisded synchronous check only.

Figure below shows reset de-assertion as a complete picture and summarises what we have discussed in this post.



Next read: Reset basics

Also read:




Clock multiplexer for glitch-free clock switching

In the post clock switching and clock gating checks, we discussed how important it is to have a glitch free clock. Also, in clock gating checks at a multiplexer, we discussed the conditions wherein a normal multiplexer can be used to propagate a clock without any glitches. 


In this post, we will discuss about multiplexer circuit for clock switching which can safely switch clocks without the probability of any glitches under most of the scenarios, hence, also called glitch-less multiplexer.



Definition of clock multiplexer: Let us first define a clock multiplexer "A clock multiplexer is a circuit that can switch the system from one clock to another while the chip is running. The two frequencies may be related to each other, or may to totally unrelated". A clock multiplexer switches the clock without any glitches as the glitch in clock will be hazardous for the system. Hence, a clock multiplexer is also known as a glitchless multiplexer.

Clock multiplexer for switching between two synchronous clocks:



Clock multiplexer for switching between two asynchronous clocks:



Reference: A very detailed and good explanation is provided at below link. I recommend to go through this for complete understanding of the process.
http://www.eetimes.com/document.asp?doc_id=1202359


Also read:

STA problem: Finding setup and hold slack considering ideal clock

Problem: Figure 1 below shows a timing path from a positive edge-triggered flip-flop to a positive edge-triggered flip-flop. Considering ideal clocks, and clock frequency of 100 MHz, find the setup and hold slacks for this timing path.

Solution:
Figure 1: Timing path


Ideally, all the flip-flops in design should get clock at the same time. So, ideal clock means that launch as well as capture flip-flops get clock at zero time. In other words, we can assume that clock skew is zero between start and end points.

As the clock frequency is given as a 100 MHz, time period = 1/frequency = 10 ns.

Let us first calculate the setup slack. The setup timing equation is given as:
Tck->q + Tprop + Tsetup - Tskew < Tperiod
And equation for setup slack is given as:
SS = Tperiod - (Tck->q + Tprop + Tsetup - Tskew)  
Here,
Tck->q = 2ns, Tprop (max value of delay of combinational logic) = 4 ns+ Tsetup = 1 ns,   Tperiod = 10 ns
 Putting these values into equation for setup slack, we get setup slack for this timing path.

SS = 10 - (2 + 4 + 1 - 0) ns
SS = 3 ns

Now, hold slack can be found out from the hold timing equation. The hold timing equation is given as:
Tck->q  + Tprop > Thold + Tskew
Here,
Tck->q  =  2 ns, Tprop (min value of combinational propagation delay) = 4 ns, Thold = 1 ns
 And the equation for hold slack is given as:

HS = Tck->q  + Tprop  - (Thold + Tskew
HS = 2 + 4 - (1 + 0) = 5 ns 
So, for this timing path, setup slack value is 3 ns and hold slack value is 5 ns.

How to fix hold violations

In the post setup and hold time violations, we learnt about the setup time violations and hold time violations. In this post, we will learn the approaches to tackle hold time violations. Following strategies can be useful in reducing the magnitude of hold violation and bringing the hold slack towards a positive value:

1. Insert delay elements: This is the simplest we can do, if we are to decrease the magnitude of a hold time violation. The increase in data path delay can be increased if we insert delay elements in the data-path. Thus, the hold violating path's delay can be increased, and hence, slack can be made positive by inserting buffers in hold violating data-path.

2. Reduce the drive strength of data-path logic gates: Replacing a cell with a similar cell of less drive strength will certainly add delay to data-path. However, there is a slight chance of decrease in data-path delay if the cell load is dominated by intrinsic capacitance as we discussed in how delay of a standard cell changes with drive strength

3. Use data-path cells with higher threshold voltages: If you have multiple flavors of threshold voltages in your design, the cells with higher threshold voltage will certainly have higher delays. So, this must be the first option you must be looking for to resolve hold violations.

4. Improve hold time of capturing flip-flop: Using a capturing flip-flop with higher drive strength and/or lower threshold voltage will give a lower hold time requirement. Also, improving the transition at flip-flop's clock pin reduces its hold time requirement.

5. Detoured routing: Detoured routing can be adoped as an alternative to insertion of delay elements as it will add load to the driving cell as well as provide additional net delay thereby increasing the data-path delay.

6. Play with clock skew: A positive skew degrades hold timing and a negative skew aids hold timing. So, if a data-path is violating, we can either decrease the latency of capturing flip-flop or increase the clock latency of launching flip-flop. However, in doing so, we need to keep in mind the setup and hold slacks of other timing paths starting and/or ending at these flip-flops.

7. Increase the clk->q delay of launching flip-flop: A launching flip-flop with more clk->q delay will help ease the hold timing of the data-path. For this, either we can decrease the drive strength of the flip-flop or move it to higher threshold voltage.

Also read:

How to fix setup violations

In the post setup and hold time violations, we learnt about the setup time violations and hold time violations. In this post, we will learn the approaches to tackle setup time violations. Following strategies can be useful in reducing the magnitude of setup violation and bringing it closer towards a positive value:

1. Increase the drive strength of data-path logic gates: A cell with better drive strength can charge the load capacitance quickly, resulting in lesser propagation delay. Also, the output transition should improve resulting in better delay of proceeding stages.
We can view a logic gate as a certain ON-resistance, that will charge/discharge a load capacitor to toggle the output state. This will form an RC circuit with a certain RC time constant. A better drive-strength gate will have a lesser resistance, effectively lowering the RC time constant; hence, providing less delay. This is illustrated in figure 1 below. If an AND gate of drive strength 'X' has a pull down resistance equivalent to 'R', the one with drive strength '2X' will have R/2 resistance. Thus, a bigger AND gate with better drive strength will have less delay.


This strategy is going to give best results only if the load of the cell is dominated by external load capacitance. Generally, drive strength of a cell is proportional to the cell size. Thus, increasing the cell size halves its internal resistance, but doubles the internal node capacitance. Thus, as shown in figure 2, the zero load capacitance delay of a cell ideally remains same of doubling the size of the cell.



Thus, upon doubling the drive strength of the cell, (assuming D to be the original delay) the delay can be anything between D/2 to D depending upon the ratio of intrinsic and external load capacitance.

Moreover, the input pin capacitance is a by-product of the size of the cell. Thus, increasing the size of the cell results in increased load for the driver cell of its input pins. So, in some cases (very high drive strength cell with less load driven by a low drive strength cell), increasing the drive strength can result in increase in magnitude of setup violation.

Keeping aside timing, power dissipation (both leakage as well as dynamic power) are a function of cell drive strength. Also, area is a function of cell drive strength. So, increasing the drive strength to fix a setup violation results in both area and power increase (although very small in comparison to whole design).


2. Use the data-path cells with lesser threshold voltages: If you have multiple flavors of threshold voltages in your designs, the cell with lesser threshold voltage will certainly have less delay. So, this must be the first step to resolve setup violations.


3. Improve the setup time of capturing flip-flop: As we know, the setup time of a flip-flop is a function of the transition at its data pin and clock pin. Better the transition at data pin, less is setup time. And worse clock transition causes less setup time. Also, a flip-flop with higher drive strength and/or lower threshold voltage is more probable of having less setup time requirement. Also, increasing the drive strength of flip-flop might cause the transition at clock pin and data pin to get worse due to higher pin loads. This also plays a role in deciding the setup time.

4. Restructuring of the data-path: Based upon the placement of data path logic cells, you can decide either to combine simple logic gates into a complex gate, or split a multi-stage cell into simpler logic gates. A multi-stage gate is optimized in terms of area, power and timing. For example, a 2:1 mux will have less logic delay than 1 AND gate and 1 OR gate combined for same output load capacitance. But, if you need to traverse distance, then 2 stages of logic can help as a buffer will introduce additional delay.
Let us elaborate this with the help of an example wherein a data-path traverses a 3-input AND gate from FF1 to FF2 situated around 400 micron apart. Let us assume one logic cell can drive 200 micron and each logic cell has only one drive strength available for simplicity. The choice is between two 2-input AND gates and 1 3-input AND gate. In this case, 3-input AND gate should give less delay (may be 200 ps for two 2-input AND vs 150 ps for one 3-input AND) as it has been optimized for less area, timing and power as compared to two 2-input AND gates.



Now, consider another case where the FF1 and FF2 are at a distance of 600 micron. In this case, if we use two 2-input AND gates, we can place them spaced apart 200 micron and hence, can cover the distance. But, if we use one 3-input AND gate, we will need to add a repeater, which will have its own delay. In this case, using two 2-input AND gates should give better results in terms of overall data-path delay.
 

5. Routing topologies: Sometimes, when there are a lot of nets at a certain place in the design, the routing tool can detour the nets trying to get the place less congested. Thus, two logic cells might be placed very close, still the delay can seem to be high for both the cells ; for driver cell due to high net capacitance and for load cell due to poor transition at the input. Also, net delay can be a significant component in such scenarios. Below figure shows one such example of two AND gates situated a certain distance apart. Ideally, there could be a straight net route between the two gates. But, due to very high net density in the region, router tool chose to route the way as shown on the right to help ease the congestion (this is an exaggerated scenario to help understand better).

So, always give proper importance to net routing topology, at least for setup timing critical nets. A few tips to improve the timing you can try include:

  • Try the net to have as less detouring as possible
  • Vias increase the net resistance. So, try to have as less vias as possible
  • Higher metal layers have less resistance. So, long nets can be routed in higher layers to have less net delay

6. Add repeaters: Every logic cell has a limit upto which it can drive a load capacitance. After that, its delay starts increasing rapidly. Since, net capacitance is a function of net length, we should keep a limit on the length of net driven by a gate. Also, net delay itself is proportional to square of net length. Moreover, the transitions may be very bad in such cases. So, it is wise to add repeater buffers after a certain distance, in order to ensure that the signal is transferred reliably, and in time.

7. Play with clock skew: Positive skew helps improve the setup slack. So, to fix setup violation, we may either choose to increase the clock latency of capturing flip-flop, or decrease the clock latency of launching flip-flop. However, in doing so, we need to be careful regarding setup and hold slack of other timing paths that are being formed from/to these flip-flops.

8. Increase clock period: As a last resort, you may choose to time your design at reduced frequency. But, if you are targeting a particular performance, you need a minimum frequency. In that case, this option is not for you.

9. Improve the clk->q delay of launching flip-flop: A flip-flop with less clk->q delay will help meeting a violating setup timing path. This can be achieved by:
  • Improving transition at flip-flops clock pin
  • Choosing a flip-flop of high drive strength. However, if by doing so, clock transition degrades, delay can actually increase
  • Replacing the flip-flop with a flip-flop of same drive strength, but lower Vt
In this post, we learnt how to approach a setup violating timing path. Have you ever used a method that is not listed above? Please share your experience in comments. We will be happy to hear from you.

Also read: