
Why is setup checked on the next edge and hold on the same edge? Setup and hold – the state machine essentials

Hi friends, in the post State machines – a practical perspective, we learnt about state machines. We also discussed different aspects of a state machine with the help of an example and why setup and hold checks need to be taken care of. In this post, we will discuss the state machine with a pinch of setup and hold and try to build a better understanding of these. To recapitulate, see figure 1. Each clock edge in a digital state machine represents a state. At each clock edge, all the registers in the design update their values based upon the data available at their inputs, which in turn is computed from the values launched by other registers at the previous clock edge. In simplest of words, state 3 depends upon state 2, which, in turn, depends upon state 1 and so on.

Each state of a state machine can be represented by a clock pulse
Figure 1: State machine representation of a clock pulse

Most digital systems are synchronous systems (state machines) that require all the elements of the state machine to be in harmony. For instance, let us say a state machine has 100 registers. All these registers need to be updated at the same time and need synchronized inputs so as to have only valid states in the state machine. For digital synchronous state machines, all the registers are synchronized by a clock signal. This is ensured with the help of setup and hold checks. But why we need to apply setup and hold checks is a question still unanswered.

Why setup and hold? We often encounter people saying that meeting the setup and hold requirements of a design is critical for silicon functionality. But have you ever thought why it is so? As we now know from our previous post, for a design consisting of only positive edge-triggered registers, each positive clock edge corresponds to a state, and the state of the machine is updated at every clock edge since all the flip-flops capture data. In other words, the state of the machine is a function of the values of the registers at a particular clock cycle. For proper state machine functioning, the value launched from one register at one clock edge should be available at the input of the capturing flop before the next clock edge arrives, and should become available only after the present clock edge has passed (this will become clear later on). This is necessary so that the next state of the state machine is a valid state. If this does not happen, the state of the machine will not be what is desired. It may also happen that the state machine goes into an altogether invalid state, leading to undesired results. And this is the reason setup is checked on the next edge and hold on the same edge, as discussed below.
Let us assume a hypothetical state machine consisting of three registers and some logic gates as shown in figure 2 below. If we assume the initial outputs of REG1 and REG2 to be 1 and 0 respectively, then the possible states can only be 100 or 010.

The shown state machine consists of three registers
Figure 2: A state machine with three registers and some logic

The state transitions of this state machine are shown in figure 3 below. As shown, there are only two valid states (100 and 010), provided the initial states of REG1 and REG2 are different (10 or 01). The state of REG3 is supposed to remain always ‘0’, since REG1 and REG2 are supposed to always hold different values at any given time (the initial condition also ensures this).
State diagram of state machine shown in figure 2
Figure 3: State diagram of state machine shown in figure 2

The states with respect to clock waveform are as shown in figure 4 below. As is expected, each clock edge corresponds to one of the two states.

Figure 4: Clock waveforms showing different states of state machine
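To make the state behaviour concrete, here is a minimal Python sketch of the next-state function. Since the exact gates of figure 2 are not reproduced here, it assumes that REG1 and REG2 are cross-coupled and that REG3 captures the AND of their outputs; this interpretation is consistent with the valid states 100 and 010 but is still an assumption.

```python
# Minimal model of the hypothetical state machine of figure 2.
# Assumption: REG1 and REG2 are cross-coupled, REG3 captures REG1 AND REG2.

def next_state(state):
    reg1, reg2, reg3 = state
    return (reg2, reg1, reg1 & reg2)   # all registers update on the same clock edge

state = (1, 0, 0)                       # initial state: REG1=1, REG2=0, REG3=0
for edge in range(4):
    print(f"edge {edge}: state = {state}")
    state = next_state(state)
# The machine keeps toggling between the two valid states, 100 and 010.
```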

Now, the state of register 2 at a particular state (clock edge) depends upon the state of register 1 at the previous clock edge. Let us assume register 2 is getting a delayed clock with respect to register 1; in other words, there is considerable positive skew between the two registers. In this case, as shown in figure 5 below, there is a good chance that the data launched from register 1 is captured at register 2 at the same edge and not the next clock edge. Due to this, both register 1 and register 2 will have the same value in a particular clock cycle and the machine will run into an invalid state. Data getting captured at the same edge as it is launched is termed a hold violation (unless it is architecturally intended, the discussion of which is outside the scope of this topic).
The hold violation results due to data being captured on the same edge as it is launched on
Figure 5: Hold violation resulting in state machine going to invalid state
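The effect of such a hold violation can be mimicked in the same toy model by letting REG2 capture the value that REG1 launches on the very same edge, instead of REG1's old value. Again, the cross-coupled structure is an assumed reading of figure 2; the sketch is purely illustrative.

```python
# Same toy model as above, but REG2 now races: because of the skewed clock it
# captures the value REG1 launches on the *same* edge instead of REG1's old value.

def next_state_with_hold_violation(state):
    reg1, reg2, reg3 = state
    new_reg1 = reg2                     # REG1 still captures correctly
    new_reg2 = new_reg1                 # REG2 captures REG1's *new* value (hold violation)
    new_reg3 = reg1 & reg2
    return (new_reg1, new_reg2, new_reg3)

state = (1, 0, 0)
for edge in range(3):
    state = next_state_with_hold_violation(state)
    print(f"edge {edge + 1}: state = {state}")
# After the first edge the machine sits in 000, which is not a valid state.
```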
Similarly, for the proper functioning of the state machine, the data launched at one edge should get captured at the next edge. This is what the setup check ensures; thus, the setup check is formed on the next edge only, and a failure to meet it is termed a setup violation. Likewise, the data launched at one edge should not be captured on the same edge. The hold check represents this situation and ensures that the data launched on one edge is not captured on the same edge.

Based upon development of our understanding in this post, setup and hold can be defined as:

Setup check: Setup check refers to the condition that data launched at one clock edge should get captured at the next clock edge, so that the state machine functionality is preserved and the state machine transitions smoothly from one state to the next. A failure to meet this condition is termed a setup violation, and the state machine might transition to an invalid state.



Hold check: Hold check refers to the condition that data launched at one clock edge should not get captured at the same clock edge, so that the present state of the state machine does not get corrupted. A failure to meet this condition is termed a hold violation, and the state machine might transition to an invalid state.

Interesting problem – Latches in series


Problem: 100 latches (either all positive or all negative) are placed in series (figure 1). How many cycles of latency will they introduce?

This figure shows 100 negative level-sensitive latches connected together in a chain
Figure 1 : 100 negative level-sensitive latches in series
As we know, setup check between latches of same polarity (both positive or negative) is zero cycle with half cycle of time borrow allowed as shown in figure 2 below for negative level-sensitive latches:

Setup check between two latches of same polarity is zero cycle with half cycle of time borrow allowed.
Figure 2: Setup check between two negative level-sensitive latches

So, if there are a number of same-polarity latches in series, each will form a zero cycle setup check with the next latch, resulting in an overall zero cycle phase shift.

As is shown in figure 3, all the latches in series are borrowing time, but no actual phase shift happens. If we have a design with all latches, there cannot be a next-state calculation if all the latches are either positive level-sensitive or negative level-sensitive. In other words, for a state-machine implementation, there should not be latches of the same polarity in series.

Each latch will form a zero cycle setup check with the following latch, resulting in overall zero cycle phase shift.
Figure 3 : Timing for 100 latches in series
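A quick way to convince yourself of the zero-cycle answer is to propagate arrival times through the chain. The sketch below models a negative level-sensitive latch as transparent while the clock is low; the period and per-stage delays are made-up numbers, chosen only to show that as long as the data ripples through all 100 latches within the transparent window, no extra clock cycle is consumed.

```python
# Propagate an arrival time through 100 negative level-sensitive latches in series.
# Assumptions (illustrative only): clock is high in [k*T, k*T + T/2) and low in
# [k*T + T/2, (k+1)*T); each latch stage plus wiring adds 'stage_delay'.

T = 10.0            # clock period in ns (hypothetical)
stage_delay = 0.02  # delay per latch stage in ns (hypothetical)

def latch_output_time(arrival):
    """Time at which data appears at the latch output (transparent while clock is low)."""
    phase = arrival % T
    if phase >= T / 2:                   # latch is transparent: data flows through
        return arrival + stage_delay
    # latch is opaque: data waits for the next low phase (this is where a cycle
    # of latency would creep in if the combinational delays were too large)
    return (arrival - phase) + T / 2 + stage_delay

start = 0.6 * T                          # data arrives during the first low phase
t = start
for _ in range(100):
    t = latch_output_time(t)

extra_cycles = int(t // T) - int(start // T)
print(f"final arrival = {t:.2f} ns, extra clock cycles consumed = {extra_cycles}")
# With small stage delays, the data leaves the last latch within the same clock
# cycle it entered the first one: zero cycles of latency.
```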


Hope you’ve found this post useful. Let us know what you think in the comments.


Multicycle paths : The architectural perspective


Definition of multicycle paths: By definition, a multi-cycle path is one in which data launched from one flop is allowed (by the architecture definition) to take more than one clock cycle to reach the destination flop. This is ensured architecturally either by gating the data or the clock from reaching the destination flop. There can be many scenarios inside a System on Chip where we can apply multi-cycle paths, as discussed later. In this post, we discuss the architectural aspects of multicycle paths. For timing aspects like application, analysis etc., please refer to Multicycle paths handling in STA.

Why multi-cycle paths are introduced in designs: A typical System on Chip consists of many components working in tandem. Each of these works at a different frequency depending upon performance and other requirements. Ideally, the designer would want the maximum possible throughput from each component while paying proper respect to power, timing and area constraints. The designer may think of introducing multi-cycle paths in the design in one of the following scenarios:
      
       1)      Very large data-path limiting the frequency of the entire component: Let us take a hypothetical case in which one of the components is to be designed to work at 500 MHz; however, one of its data-paths is too large to work at this frequency. Let us say the minimum delay this data-path can achieve is 3 ns. Thus, if we assume all paths to be single cycle, the component cannot work at more than 333 MHz; however, if we ignore this path, the rest of the design can attain 500 MHz without much difficulty. So we can sacrifice only this path and let the rest of the component work at 500 MHz. In that case, we can make that particular path a multi-cycle path of two cycles, so that it effectively works at 250 MHz, sacrificing performance for that one path only (see the sketch after this list).
     
     2)      Paths starting from a slow clock and ending at a fast clock: For simplicity, let us suppose there is a data-path involving one start-point and one end-point, with the start-point receiving a clock that is half in frequency to that of the end-point. Now, the start-point can only send data at half the rate at which the end-point can receive it. Therefore, there is no gain in running the end-point at double the clock frequency. Also, since the data is launched only once every two cycles, we can modify the architecture such that the data is received after a gap of one cycle. In other words, instead of a single-cycle data-path, we can afford a two-cycle data-path in such a case. This will actually save power, as the data-path now has two cycles to traverse to the endpoint, so cells with lower drive strength, area and power can be used. Also, if the multi-cycle has been implemented through a clock enable (discussed later), clock power will also be saved.
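The arithmetic behind scenario 1 can be written out in a few lines of Python; the 3 ns path delay and 500 MHz target are the hypothetical numbers used above, not data from any real design.

```python
# Scenario 1: one slow data-path limiting the frequency of the whole component.

path_delay_ns = 3.0                       # minimum achievable delay of the slow path (hypothetical)
target_freq_mhz = 500.0                   # frequency the rest of the component can reach

single_cycle_fmax = 1000.0 / path_delay_ns            # ~333 MHz if the path stays single-cycle
period_ns = 1000.0 / target_freq_mhz                  # 2 ns at 500 MHz
cycles_needed = -(-path_delay_ns // period_ns)        # ceiling division -> 2 cycles
effective_update_rate = target_freq_mhz / cycles_needed

print(f"single-cycle limit : {single_cycle_fmax:.0f} MHz")
print(f"multicycle of {int(cycles_needed)} at {target_freq_mhz:.0f} MHz "
      f"-> this path effectively updates at {effective_update_rate:.0f} MHz")
```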

Implementation of multi-cycle paths in architecture: Let us discuss some of the ways of introducing multi-cycle paths in the design:

      1)      Through gating in the data-path: Refer to figure 1 below, wherein the ‘Enable’ signal gates the data-path towards the capturing flip-flop. Now, by controlling the waveform of the enable signal, we can make the path multi-cycle. As shown in the waveform, if the enable signal toggles once every three cycles, the data at the end-point toggles only once every three cycles. Hence, the data launched at edge ‘1’ can arrive at the capturing flop only at edge ‘4’. Thus, we have a multi-cycle of 3 in this case, giving a total of 3 cycles for data to traverse to the capture flop. In this case, the setup check is of 3 cycles and the hold check is 0 cycle (a behavioral sketch of this scheme is given after this list).
Figure 1: Introducing multicycle paths in design by gating data path



    Now let us extend this discussion to the case wherein the launch clock is half in frequency to the capture clock. Let us say Enable changes once every two cycles. Here, the intention is to make the data-path a multi-cycle of 2 relative to the faster clock (the capture clock here). As is evident from the figure below, it is important for the Enable signal to take the proper waveform, as shown on the right hand side of figure 2. In this case, the setup check will be two cycles of the capture clock and the hold check will be 0 cycle.
   
   
When the launch clock is half in frequency, it is better to make the path a multicycle of 2, because data will anyway be launched only once every two cycles.
Figure 2: Introducing multi-cycle path where launch clock is half in frequency to capture clock


        2) Through gating in the clock path: Similarly, we can make the capturing flop capture data once every few cycles by clipping the clock; in other words, by sending to the capturing flip-flop only those clock pulses at which you want the data to be captured. This can be done similarly to the data-path masking discussed in point 1, with the only difference being that the enable now masks the clock signal going to the capturing flop. This kind of gating is more advantageous in terms of power saving: since the capturing flip-flop does not receive the clock signal on the gated cycles, we save some clock power too.
    
Figure 3: Introducing multi cycle paths through gating the clock path
      Figure 3 above shows how multicycle paths can be achieved with the help of clock gating. The enable signal, in this case, launches from a negative edge-triggered register due to architectural reasons (read here). With the enable waveform as shown in figure 3, the flop will get a clock pulse once every four cycles. Thus, we have a multicycle path of 4 cycles from launch to capture. The setup and hold checks in this case are also shown in figure 3: the setup check will be a 4 cycle check, whereas the hold check will be a zero cycle check.
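As a companion to point 1 above (gating in the data-path), here is a small cycle-level Python sketch. The assumption that the same enable, asserted once every three cycles, gates both the launching and the capturing flop in a recirculating-mux style is mine, made so that the "launched at edge 1, captured at edge 4" behaviour described above falls out directly; it is not taken from the figures.

```python
# Cycle-level model of multicycle-by-enable: the same 'Enable', asserted once
# every three cycles, gates the data path of both the launching and the
# capturing flop (recirculating mux: a flop reloads its old value when Enable is 0).

launch_q = None       # value currently held/launched by the launching flop
capture_q = None      # value held by the capturing flop

for edge in range(1, 8):
    enable = (edge % 3 == 1)                       # asserted at edges 1, 4, 7, ...
    if enable:
        capture_q = launch_q                       # capture what was launched 3 edges ago
        launch_q = f"data@edge{edge}"              # launch a fresh value
    print(f"edge {edge}: capture flop holds {capture_q}")
# Data launched at edge 1 shows up at the capture flop at edge 4: a 3-cycle
# setup check with a zero-cycle hold check, as described above.
```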

Pipelining v/s introducing multi-cycle paths: Making a long data-path reach its destination in two cycles can alternatively be implemented by pipelining the logic. In most cases, this is a much simpler approach than making the path multi-cycle. Pipelining means splitting the data-path into two halves and putting a flop between them, essentially making the data-path two cycles. This approach also eases the timing at the cost of the latency of that data-path; however, looking at the component as a whole, we can then afford to run it at a higher frequency. But in some situations it is not economical to insert pipeline flops, as there may not be suitable points available. In such a scenario, we have to go with the approach of making the path multi-cycle.


Setup time and hold time basics

In digital designs, each and every flip-flop has some restrictions on the data with respect to the clock, in the form of windows in which the data can or cannot change. There is always a region around the active clock edge in which the data should not change at the input of the flip-flop. This is because, if the data changes within this window, we cannot guarantee the output. The output can then correspond to the previous input, the new input, or even a metastable state (as explained in our post 'metastability'). This window is marked by two boundary lines, one pertaining to the setup time of the flop, the other to the hold time, defined below.

Definition of Setup time: Setup time is defined as the minimum amount of time before the clock's active edge that the data must be stable for it to be latched correctly. In other words, each flip-flop (or any sequential element, in general) needs some time for the data to remain stable before the clock edge arrives, such that it can reliably capture the data. This duration is known as setup time.
The data that was launched at the previous clock edge should be stable at the input at least setup time before the clock edge. So, adherence to setup time ensures that the data launched at previous edge is captured properly at the current edge. In other words, we can also say that setup time adherence ensures that the system moves to next state smoothly. 
Definition of Hold time: Hold time is defined as the minimum amount of time after the clock's active edge during which data must be stable. Similar to setup time, each sequential element needs some time for data to remain stable after clock edge arrives to reliably capture data. This duration is known as hold time.
The data that was launched at the current edge should not travel to the capturing flop before hold time has passed after the clock edge. Adherence to hold time ensures that the data launched at current clock edge does not get captured at the same edge. In other words, hold time adherence ensures that system does not deviate from the current state and go into an invalid state.
As shown in figure 1 below, the data at the input of the flip-flop can change anywhere except within the setup-hold window.

Figure showing the setup and hold requirements forming a timing window during which data cannot toggle
Figure 1: Setup-hold window



A D-latch is composed of two inverters connected in a positive feedback loop, which is tristated when the input data path is enabled. On the other hand, when the data path is tristated, this loop is enabled.
A D-type latch

Cause/origin of setup time and hold time: Setup time and hold time are said to be the backbone of timing analysis. Rightly so: for the chip to function properly, setup and hold timing constraints need to be met for each and every flip-flop in the design. If even a single flop exists that does not meet the setup and hold requirements for the timing paths starting from or ending at it, the design may fail or become metastable. It is very important to understand the origin of setup time and hold time, as the whole design functionality depends on them. Let us discuss the origin of setup time and hold time taking the example of a D flip-flop, since in VLSI designs D-type flip-flops are almost always used. A D-type flip-flop is realized using two D-type latches; one of them is positive level-sensitive, the other is negative level-sensitive. A D-type latch, in turn, is realized using transmission gates and inverters. The figure above shows a positive level-sensitive D-type latch. Just by inverting the clock of the transmission gates, we get a negative level-sensitive D-type latch.

A complete D flip-flop using the above structure of D-type latch is shown in figure below:

 A D-type flip-flop consists of two latches connected back to back in master-slave format
A D-type flip-flop



Now, let us get into the details of the above figure. For data to be latched by ‘latch 1’ at the falling edge of the clock, it must be present at ‘Node F’ at that time. Since data has to travel ‘Node A’ -> ‘Node B’ -> ‘Node C’ -> ‘Node D’ -> ‘Node E’ -> ‘Node F’ to reach ‘Node F’, it should arrive at the flip-flop’s input (Node A) some time earlier. This time for data to reach from ‘Node A’ to ‘Node F’ is termed the setup time (assuming CLK and CLK' are available instantaneously; if that is not the case, it is accounted for accordingly). Similarly, for the feedback loop of latch 1 to settle to the newly captured value, the data needs to remain stable at the input until it has propagated up to ‘Node C’ after the clock edge. In other words, hold time can be viewed as the delay taken by data from ‘Node A’ to ‘Node C’.
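Using purely hypothetical node-to-node delays, the two quantities can be estimated by simply summing the stage delays described above. This is a back-of-the-envelope sketch, not a characterization method; real setup and hold values come from circuit-level (SPICE) characterization.

```python
# Hypothetical node-to-node delays (in ps) inside the D flip-flop of the figure.
# The values are illustrative only.
delay_ps = {
    ("A", "B"): 20,
    ("B", "C"): 20,
    ("C", "D"): 20,
    ("D", "E"): 20,
    ("E", "F"): 20,
}

def path_delay(nodes):
    """Sum the delays along consecutive node pairs."""
    return sum(delay_ps[(a, b)] for a, b in zip(nodes, nodes[1:]))

setup_ps = path_delay(["A", "B", "C", "D", "E", "F"])   # data must reach Node F before the edge
hold_ps = path_delay(["A", "B", "C"])                   # data must stay stable until Node C is set
print(f"estimated setup time ~ {setup_ps} ps, hold time ~ {hold_ps} ps")
```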

Setup and hold checks in a design: Basically, setup and hold timing checks ensure that data launched from one flop is captured properly at another. Considering the way today's digital designs are built (as finite state machines), the next state is derived from the previous state. So, data launched at one edge should be captured at the next active clock edge. Also, the data launched from one flop should not be captured at the next flop at the same edge. These conditions are ensured by setup and hold checks. The setup check ensures that the data is stable before the setup requirement of the next active clock edge at the next flop, so that the next state is reached. Similarly, the hold check ensures that the data is stable until the hold requirement of the next flop for the same clock edge has been met, so that the present state is not corrupted.

A timing path from rise edge-triggered flip-flop to rise edge-triggered flip-flop
A sample path in a design
Shown above is a flop-to-flop timing path. For simplicity, we have assumed that both the flops are rise edge triggered. The setup and hold timing relations for the data at input of second flop can be explained using the waveforms below:


waveform showing setup and hold requirement for the sample timing path shown above
Figure showing setup and hold checks being applied for the timing path shown above


As shown, data launched from launching flop is allowed to arrive at the input of the second flop only after a delay greater than its hold requirement so that it is properly captured. Similarly, it must not have a delay greater than (clock period – setup requirement of second flop). In other words, mathematically speaking, setup check equation is given as below (assuming zero skew between launch and capture clocks):
                                Tck->q + Tprop + Tsetup < Tperiod
 Similarly, hold check equation is given as:
                               Tck->q  + Tprop > Thold

If we take into account the skew between the two clocks, the above equations are modified accordingly. If Tskew is the skew between the launch and capture flops (equal to the clock latency at the capture flop minus the clock latency at the launch flop, so that the skew is positive if the capture flop has the larger latency and vice-versa), the above equations become:
                    
                      Tck->q + Tprop + Tsetup - Tskew < Tperiod
                      Tck->q  + Tprop > Thold + Tskew

Setup checks and hold checks for reg-to-reg paths explains different cases covering setup and hold checks for flop-to-flop paths.

What if setup and/or hold violations occur in a design: As said earlier, setup and hold timings have to be met to ensure that data launched from one flop is captured properly at the next flop at the next clock edge, so as to move the state machine of the design to the next state. If the setup check is violated, the data will not be captured properly at the next clock edge. Similarly, if the hold check is violated, data intended to be captured at the next edge will get captured at the same edge. A setup or hold violation also means the data may change within the setup/hold window of the capturing flip-flop, which can lead to a metastability failure in the design (as explained in our post 'metastability'). So, it is necessary to meet the setup and hold requirements for all the flip-flops in the design.
What if you fabricate a design without taking care of setup/hold violations: If you fabricate a design having setup violations, you can still use it by lowering the frequency, since the setup equation involves the clock period. On the other hand, a design with a hold violation cannot be made to run properly by changing the frequency. So, if you fabricate a design with an accidental hold violation, you will have to simply throw away the chip (unless the hold path is half cycle, as explained here). A design with only half-cycle hold violations can still be used at lower frequencies.

Tackling setup time violation: As given above, the equation for the setup timing check is:
            Tck->q + Tprop + Tsetup - Tskew < Tperiod

The parameter that indicates whether there is a setup time violation is the setup slack. The setup slack can be defined as the difference between the R.H.S and the L.H.S of the above equation; in other words, it is the margin by which the timing path meets the setup check. The setup slack equation can be given as:
            Setup slack = Tperiod -  (Tck->q + Tprop + Tsetup - Tskew)
If the setup slack is positive, it means there is still some margin available in the timing path. On the other hand, a negative slack means that the path violates the setup timing check by the amount of the setup slack. To get the path met, either the data-path delay should be decreased or the clock period should be increased.

Mitigating setup violation: Thus, we can meet the setup requirement, if violating, by 
1. Decreasing clk->q delay of launching flop 
2. Decreasing the propagation delay of the combinational cloud 
3. Reducing the setup time requirement of capturing flop 
4. Increasing the skew between capture and launch clocks
5. Increasing the clock period

Tackling hold time violation: Similarly, the equation for hold timing check is as below:
            Tck->q + Tprop > Thold + Tskew
The parameter that indicates whether there is a hold timing violation is the hold slack. The hold slack is defined as the amount by which the L.H.S is greater than the R.H.S; in other words, it is the margin by which the timing path meets the hold timing check. The equation for hold slack is given as:
            Hold slack = Tck->q + Tprop - Thold - Tskew
If the hold slack is positive, it means there is still some margin available before the path starts violating hold. A negative hold slack means the path violates the hold timing check by the amount represented by the hold slack. To get the path met, either the data-path delay should be increased, or the clock skew/hold requirement of the capturing flop should be decreased.
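The two slack equations translate directly into a few lines of Python. The delay numbers below are hypothetical and only serve to show how the sign of the slack flags a violation.

```python
# Setup and hold slack for a flop-to-flop path (all times in ns, hypothetical values).
t_ck_q   = 0.20    # clk->q delay of launching flop
t_prop   = 2.50    # combinational propagation delay
t_setup  = 0.15    # setup requirement of capturing flop
t_hold   = 0.10    # hold requirement of capturing flop
t_skew   = 0.05    # capture clock latency minus launch clock latency
t_period = 3.00    # clock period

setup_slack = t_period - (t_ck_q + t_prop + t_setup - t_skew)
hold_slack  = (t_ck_q + t_prop) - (t_hold + t_skew)

print(f"setup slack = {setup_slack:+.2f} ns ->", "met" if setup_slack >= 0 else "VIOLATED")
print(f"hold slack  = {hold_slack:+.2f} ns ->", "met" if hold_slack >= 0 else "VIOLATED")
```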

Mitigating hold violation: We can meet the hold requirement by:
  1. Increasing the clk->q delay of launching flop
  2. Decreasing the hold requirement of capturing flop
  3. Decreasing the clock skew between the capturing and launching flip-flops

All about clock signals

Today’s designs are dominated by digital devices, most of which are synchronous state machines built from flip-flops. The transition from one state to the next is synchronous and is governed by a signal known as the clock. That is why we have aptly termed the clock ‘the in-charge of synchronous designs’.
Definition of clock signal: We can define a clock signal as one which synchronizes the state transitions by keeping all the registers/state elements in synchronization. In common terminology, a clock signal is a signal that is used to trigger sequential devices (flip-flops in general). By this, we mean that on the active state/edge of the clock, data at the input of a flip-flop propagates to its output. This propagation is termed a state transition. As shown in figure 1, the 2-bit ring counter transitions from state ‘01’ to ‘10’ on the active clock edge.

An example of a two bit ring counter making transition from one state to another on rising (active) edge of the clock

Figure 1: Figure showing state transition on active edge of clock

Clock signals occupy a very important place throughout the chip design stages. Since the state transitions happen on clock transitions, verification, static timing analysis and gate-level simulations all revolve around these clock signals. If static timing analysis can be considered a body, then clock is its blood. Also, during physical implementation of the design, special care has to be given to the placement and routing of clock elements, otherwise the design is likely to fail. Clock elements are often responsible for a large share (close to half) of the dynamic power consumption in the design. That is why the clock has to be given prime importance.

Clock tree: A clock signal originates from a clock source. There may be designs with a single clock source, while other designs have multiple clock sources. The clock signal is distributed in the design in the form of a tree, with the leaves of the tree being the sequential devices triggered by the clock signal and the root being the clock source. That is why the distribution of the clock in the design is termed a clock tree. Normally (except when intentional skew is introduced to cater to some timing-critical paths), the clock tree is designed in such a way that it is balanced. By a balanced clock tree, we mean that the clock signal reaches each and every element of the design at almost the same time. Clock tree synthesis (placing and routing the clock tree elements) is an important step in the implementation process. Special cells and routing techniques are used to ensure a robust clock tree.

Clock domains: By clock domain, we mean the set of flops driven by a clock signal. For instance, the flops driven by the system clock constitute the system clock domain. Similarly, there may be other domains. There may be multiple clock domains in a design, and some of these may interact with each other. For interacting clock domains, there must be a known phase relationship between the clock signals, otherwise there is a chance of failure due to metastability. If a phase relationship is not possible to achieve, there should be clock domain synchronizers to reduce the probability of metastability failure.

Specifying a signal as clock: In EDA tools, the ‘create_clock’ command is used to specify a signal as a clock. We have to pass the period of the clock, the clock definition point, the duty cycle/waveform etc. as arguments to the command. A generated clock (discussed below) is defined with reference to its master clock.

Master and generated clocks: EDA tools have the concept of master and generated clocks. A generated clock is one that is derived from another clock, known as its master clock. The generated clock may be of the same or a different frequency than its master clock. In general, a generated clock is defined so as to distinguish it from its master in terms of frequency, phase or domain relationship.

Some terminology related to clock: There are different terms related to clock signals, described below:


  •      Leading and trailing clock edge: When clock signal transitions from ‘0’ to ‘1’, the clock edge is termed as leading edge. Similarly, when clock signal transitions from ‘1’ to ‘0’, the clock edge is termed as trailing edge.


Leading and trailing edges of the clock signal
  • Launch and capture edge: The launch edge is the clock edge at which data is launched by a flop. Similarly, the capture edge is the clock edge at which data is captured by a flop.
  • Clock skew: Clock skew is defined as the difference in the arrival times of the clock signal at different leaf pins. Considering a set of flops, skew is the difference between the minimum and maximum arrival times of the clock signal. Global skew is the clock skew for the whole design; on the other hand, the skew considering only a portion of the design is termed local skew.