
Lockup latches vs. lockup registers: what to choose

Both lockup latches and lockup registers are used to make scan chains robust against hold failures. Which one to use depends upon one's priorities and the situation. However, lockup latches seem more prevalent in today's designs. This might be due to the following reasons:
  1. Area: As we know, a latch occupies only about half the area of a register. So, using lockup latches instead of lockup registers gives us an area and power advantage; i.e., less overhead.
  2. Timing: Lockup elements – timing perspective gives an analysis of how timing critical lockup element (lockup latch and lockup register) paths can be. According to it, with a negative lockup latch, you don't have to meet timing at the functional (at-speed) frequency; in all other cases, you do. This might be another reason people prefer lockup latches.

Lockup latches, on the one hand, relax the hold check on one side only. So, you can afford to have skew only on one side, either on launch or on capture. Lockup registers, on the other hand, let you have skew on both sides. So, lockup latches are preferable where you can afford to tap the clock either from the launch flop or from the capture flop, whereas lockup registers can be used by tapping the clock from any point, as long as setup and hold timings are met.
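To make the tradeoff concrete, below is a minimal sketch (all numbers and the simple slack equation are illustrative assumptions, not from any library) comparing the hold slack of a scan path with and without the extra margin a lockup element provides:

```python
# Hold check on a scan path: data launched and captured on the same clock
# edge must satisfy  t_clk_to_q + t_data >= t_hold + t_skew  (zero cycle).
# A lockup latch effectively adds about half a clock period of margin on
# one side of the path; a lockup register adds margin on both sides.

def hold_slack(t_clk_to_q, t_data, t_hold, t_skew, lockup_margin=0.0):
    """All times in ns. lockup_margin models the extra margin contributed
    by a lockup element (about half a period for a lockup latch)."""
    return t_clk_to_q + t_data + lockup_margin - t_hold - t_skew

t_period = 10.0  # hypothetical 100 MHz scan clock
# Large skew between scan chain segments (e.g. different clock branches):
print(hold_slack(0.1, 0.2, 0.05, 2.0))                             # -1.75, violated
print(hold_slack(0.1, 0.2, 0.05, 2.0, lockup_margin=t_period / 2)) #  3.25, met
```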

Hope you’ve found this post useful. Let us know what you think in the comments.

Time borrowing in latches

What is time borrowing: Latches exhibit the property of being transparent when the clock is asserted to the required value. In sequential designs, using latches can enhance the performance of the design. This is possible due to the time borrowing property of latches. We can define time borrowing in latches as follows:
Time borrowing is the property of a latch by virtue of which a path ending at a latch can borrow time from the next path in the pipeline such that the overall time of the two paths remains the same. The time borrowed by the latch from the next stage in the pipeline is, then, subtracted from the next path's time.
The time borrowing property of latches is due to the fact that latches are level sensitive; hence, they can capture data over a range of times rather than at a single instant, namely the entire duration over which they are transparent. If a latch captures data while it is transparent, the same instant also launches the data for the next stage (of course, after the combinational delay from the data pin of the latch to its output pin).

Let us consider an example wherein a negative latch is placed between two positive edge-triggered registers for simplicity and ease of understanding. The schematic diagram for the same is shown in figure 1 below:


Figure 1: Negative level-sensitive latch between two positive edge-triggered registers. The path from RegA to LatB can borrow time from the path between LatB and RegC.

Figure 2 below shows the clock waveform for all three elements involved. We have labeled the clock edges for convenience. As shown, LatB is transparent during the low phase of the clock. RegA and RegC (positive edge-triggered registers) can capture/launch data only at the positive edge of the clock; i.e., at Edge1, Edge3 or Edge5. LatB, on the other hand, can capture and launch data at any instant of time between Edge2 and Edge3, or between Edge4 and Edge5.


Figure 2: Clock waveforms (positive edge-triggered register, negative level-sensitive latch, positive edge-triggered register)

The instant at which data is launched from LatB depends upon the time at which the data launched from RegA becomes stable at the input of LatB. If the data launched at Edge1 from RegA is stable before Edge2, it will get captured at Edge2 itself. However, if the data is not yet stable at Edge2, it will still get captured: since the latch is transparent, the data passes through as soon as it becomes stable. The latest instant at which this can happen is the latch closing edge (Edge3 here). One point to be noted is that no matter when the data launches from LatB, it has to be captured at RegC at Edge3. So, whatever extra time the latch takes to capture the data gets subtracted from the next path. The worst-case setup check at LatB is at Edge2; however, the latch can borrow time as needed, ideally up to Edge3 at maximum. Figure 3 below shows the setup and hold checks with and without time borrowing for this case:

Figure 3: Setup check with and without time borrow

The above example used a negative level-sensitive latch. A positive level-sensitive latch will similarly borrow time from the next stage; only the clock polarities will be reversed.
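The amount of borrowed time follows directly from the arrival time at the latch. Below is a minimal sketch of that calculation for the negative latch of figure 1 (the single-latch model and all names are mine; the latch setup time at the closing edge is ignored for simplicity):

```python
# Time borrowing at a negative level-sensitive latch clocked with period T.
# With the waveforms of figure 2, the latch opens at T/2 (Edge2) and closes
# at T (Edge3); a path whose data arrives after T/2 borrows the difference
# from the next stage.

def borrowed_time(arrival, period):
    open_edge = period / 2        # latch becomes transparent (Edge2)
    close_edge = period           # latch closes (Edge3)
    if arrival <= open_edge:
        return 0.0                # data stable before Edge2: nothing borrowed
    if arrival > close_edge:
        raise ValueError("setup violation: data arrives after Edge3")
    return arrival - open_edge    # subtracted from the next path's budget

period = 10.0                      # ns
print(borrowed_time(4.0, period))  # 0.0
print(borrowed_time(7.5, period))  # 2.5 -> next stage is left with 2.5 ns less
```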

STA

Static timing analysis (STA) is a vast domain involving many sub-fields. It involves computing the bounds on delays of elements in the circuit without actually simulating it. In this post, we have tried to list down the posts that an STA engineer cannot do without. Please add your feedback in the comments to make reading it a more meaningful experience.

  • Metastability - This post discusses the basics of metastability and how to avoid it.
  • Lockup latch - The basics of lockup latch, both from timing and DFT perspective have been discussed in this post.

  • Clock latency - Read this if you wish to get acquainted with the terminology related to clock latency

  • Data checks - Non-sequential setup and hold checks have been discussed, very useful for beginners

  • Synchronizers - Different types of synchronizers have been discussed in detail

  • On-chip variations - Describes on-chip variations and the methods undertaken to deal with these
  • Temperature inversion - Discusses the concept of temperature inversion and conductivity trends with temperature

  • Timing arcs - Discusses the basics of timing arcs, positive and negative unateness, cell arcs and net arcs etc.

  • Basics of latch timing - Definition of latch, setup time and hold timing of a latch, latch timing arcs are discussed

What is Static Timing Analysis?

Static timing analysis (STA) is a method of computing the max/min delay values of a complete circuit without actually simulating the full circuit. In STA, static delays such as gate delays and net delays are considered for each path. These delays are, then, compared against the required bounds on the delay values and/or the relationships between the delays of different gates. In STA, the circuit to be analyzed is broken down into timing paths consisting of gates, registers and the nets connecting these. Normally, timing paths start from and end at registers or the chip boundary. Based on the origin and termination of data, timing paths can be categorized into four categories:

        1.)    Input to register paths: These paths start at the chip boundary from input ports and end at registers
        2.)    Register to register paths: These paths start at a register output pin and terminate at a register input pin
        3.)    Register to output paths: These paths start at a register and end at chip boundary output ports
        4.)    Input to output paths: These paths start from the chip boundary at an input port and end at the chip boundary at an output port
Timing paths from each start-point to end-point are constrained to have maximum and minimum delays. For example, for register to register paths, each path can take a maximum of one clock cycle (minus input/output delay in case of input/output to register paths). The minimum delay of a path is governed by the hold timing requirement of the endpoint. Thus, the maximum delay taken by a timing path governs the maximum frequency of operation.
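As a quick illustration of how the register-to-register constraint bounds frequency, here is a small sketch (the numbers are hypothetical, typical-looking values):

```python
# Register-to-register constraint: data must traverse within one cycle,
#   T_clk >= t_clk_to_q + t_comb + t_setup - t_skew,
# so the slowest path sets the maximum operating frequency.

def max_frequency_mhz(t_clk_to_q, t_comb, t_setup, t_skew=0.0):
    t_min_period = t_clk_to_q + t_comb + t_setup - t_skew  # ns
    return 1000.0 / t_min_period                           # MHz

# Hypothetical worst path: 0.1 ns clk->q, 1.7 ns of logic, 0.2 ns setup
print(max_frequency_mhz(0.1, 1.7, 0.2))  # 500.0 MHz
```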
As stated before, static timing analysis does timing analysis without actually simulating the circuit. The delays of cells are picked from the respective technology libraries. The delays are available in the libraries in tabulated form on the basis of input transition and output load, having been characterized by simulating the cells over a range of boundary conditions. Net delays are calculated based upon R and C models.
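For instance, the delay tables are two-dimensional lookups indexed by input transition and output load, and the tool interpolates between characterized points. A minimal sketch of such a bilinear interpolation (the axis and table values below are made up):

```python
# Bilinear interpolation into a delay lookup table indexed by
# (input transition, output load), as found in library timing tables.

import bisect

def lut_delay(slews, loads, table, slew, load):
    """slews/loads: ascending axis values; table[i][j]: delay at
    (slews[i], loads[j]). Interpolates linearly along both axes."""
    i = max(1, min(bisect.bisect_left(slews, slew), len(slews) - 1))
    j = max(1, min(bisect.bisect_left(loads, load), len(loads) - 1))
    fs = (slew - slews[i - 1]) / (slews[i] - slews[i - 1])
    fl = (load - loads[j - 1]) / (loads[j] - loads[j - 1])
    d00, d01 = table[i - 1][j - 1], table[i - 1][j]
    d10, d11 = table[i][j - 1], table[i][j]
    return (d00 * (1 - fs) * (1 - fl) + d01 * (1 - fs) * fl
            + d10 * fs * (1 - fl) + d11 * fs * fl)

slews = [0.05, 0.10, 0.20]           # ns
loads = [0.001, 0.010, 0.100]        # pF
table = [[0.02, 0.05, 0.30],         # hypothetical cell delays (ns)
         [0.03, 0.06, 0.32],
         [0.05, 0.08, 0.35]]
print(lut_delay(slews, loads, table, 0.08, 0.005))
```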

One important characteristic of static timing analysis that must be discussed is that it checks the static delay requirements of the circuit without applying any vectors; hence, the delays calculated are the maximum and minimum bounds of the delays that will occur in real application scenarios with vectors applied. This enables static timing analysis to be fast and inclusive of all the boundary conditions. Dynamic timing analysis, on the contrary, applies input vectors, so it is very slow; however, it is necessary to certify the functionality of the design. Thus, static timing analysis guarantees the timing of the design, whereas dynamic timing analysis guarantees functionality for real application-specific input vectors.

I hope you’ve found this post useful. Let me know what you think in the comments. I’d love to hear from you all.

Noise margins



In this real world, nothing is ideal. A signal travelling along a wire/cable/transmission line is susceptible to noise from the surroundings. Also, there is degradation in the signal due to the parasitic elements of the line. Moreover, the output signal produced by the transmitter itself only approximates the ideal signal, thereby worsening the scenario. There are repeaters/buffers along the line to minimize the impact of noise, but there is a limit up to which degradation is allowed, beyond which the receiver is unable to sense the correct value of the signal. This allowed degradation is measured in terms of noise margins. One can find the topic discussed in all textbooks related to digital logic and system design, be it for CMOS, TTL or any other logic family.

Let us illustrate the concept of noise margins with the help of an example. Let us assume that a signal has to travel from a transmitter to a receiver through an interconnect element (commonly called a net) which will only degrade the signal, since there is no active element between the transmitter and receiver. The output signal produced by the transmitter (Tx) will deviate from the ideal voltage levels, as shown in figures 1 and 2 for logic level ‘1’. In addition, there will be signal degradation by the interconnect element as well as noise induced from the surroundings. As a result, the band of voltages that can be present at the receiver input for logic ‘1’ will widen further. Now, there are two cases:

  1. If the band of voltages recognized as logic ‘1’ by the receiver is a super-set of the band of voltages that can exist at the receiver input, as shown in figure 1, the receiver will recognize the transmitted logic ‘1’ in all cases. This is the desired scenario, as no transmitted logic ‘1’ will be missed by the receiver. In figure 1, the noise induced by the surroundings is such that the range of voltages present at the receiver does not violate the band of voltages recognized as logic '1' by the receiver. So, it will be recognized correctly as logic '1'.

Figure 1: Noise induced is less than the noise margin, so the signal is captured properly by the receiver


  2. If the band of voltages recognized as logic ‘1’ by the receiver is a sub-set of the band of voltages that can exist at the receiver input, as shown in figure 2, there will be some cases that are intended to be recognized as logic ‘1’ but are not. So, loss or incorrect transmission of information is possible in such cases. In figure 2, the noise induced by the surroundings makes the band of voltages at the receiver's input larger than what can be decoded correctly as logic '1' by the receiver. So, there is no guarantee that the signal will be perceived as logic '1' by the receiver.

Figure 2: Noise induced is greater than the noise margin, so the signal may not be correctly decoded by the receiver
Let us now label each of these regions to make the discussion more meaningful. The lowest voltage that will be produced as logic ‘1’ by the transmitter is termed VOH and, let us say, the highest is VDD (we are concerned here with the lower level only). So, the range of voltages produced by the transmitter is (VDD – VOH). And let the receiver accept voltages higher than VIH as logic ‘1’. So, the range of voltages accepted by the receiver will be (VDD – VIH). Thus, the maximum degradation that can happen over the communication channel is (VOH – VIH), which is nothing but the noise margin. If the degradation is less than this figure, the logic ‘1’ will be recognized correctly by the receiver; otherwise, it won’t. So, the noise margin equation for logic '1' can be given as below:


Noise margin for logic '1' (NMH) = VOH – VIH
where
VOH = lowest voltage level that can be produced as logic '1' by the transmitter
VIH = lowest voltage level that can be recognized as logic '1' by the receiver

Similarly, for logic ‘0’, the range of outputs that can be produced by the transmitter is (0 – VOL) and the range of input voltages that can be detected by the receiver is (0 – VIL), thereby giving the noise margin as:
Noise margin for logic '0' (NML) = VIL – VOL

Where

VIL = highest voltage level that can be recognized as logic ‘0’ by the receiver
VOL = highest voltage level that is produced as logic ‘0’ by the transmitter
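A quick computation with these two equations, using illustrative voltage levels (not from any particular datasheet):

```python
# Noise margins from the transmitter/receiver voltage levels defined above.
# The values below are illustrative only.

V_OH, V_IH = 4.4, 3.5   # volts: worst '1' produced vs. minimum '1' recognized
V_OL, V_IL = 0.5, 1.5   # volts: worst '0' produced vs. maximum '0' recognized

NM_high = V_OH - V_IH   # margin for logic '1'
NM_low  = V_IL - V_OL   # margin for logic '0'
print(NM_high, NM_low)  # about 0.9 V and 1.0 V of allowed degradation/noise
```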

Figure 3 shows all these levels for the example we had taken earlier to demonstrate the concept of noise margins.

Figure 3: Noise margin

From our preceding discussion, if the degradation over the communication channel is more than the noise margin, the signal will not be detected correctly by the receiver. So, it is imperative for the designer to design accordingly.


Definition of noise margin: Thus, we can conclude this post by defining noise margin as below:
"Noise margin is the difference between the worst-case signal voltage produced by the transmitter and the worst-case signal voltage that can be detected by the receiver."

Can hold check be frequency dependent?


We often encounter people arguing that the hold check is frequency independent. However, this is only partially true: it holds only for zero-cycle hold checks. By zero-cycle hold checks, we mean that the hold check is performed on the same edge at which data is launched. This is true for timing paths between same-polarity registers, e.g. between positive edge-triggered flops. Figure 1 below shows the timing checks for a data-path launched from a positive edge-triggered flip-flop and captured at a positive edge-triggered flip-flop. The hold timing, in this case, is checked at the same edge at which data is launched. Changing the clock frequency will not cause the hold check to change.

Figure 1: Setup and hold checks for a positive edge-triggered to positive edge-triggered flip-flop path (setup check is single cycle, hold check is zero cycle)
Most of the cases in today’s designs are of this type only, and the exceptions to the zero-cycle hold check are not too many. There are hold checks for the previous edge also; however, these are very relaxed as compared to the zero-cycle hold check, hence they are usually not mentioned. Also, hold checks on the next edge are impossible to meet considering cross-corner delay variations. So, seldom do we hear that the hold check is frequency dependent. Let us talk about the different scenarios of frequency-dependent hold checks:

  1. From positive edge-triggered flip-flop to negative edge-triggered flip-flop and vice-versa: Figure 2 below shows the setup and hold checks for a timing path from a positive edge-triggered flip-flop to a negative edge-triggered flip-flop. A change in frequency will change the distance between the two adjacent edges; hence, the hold check will change. The hold timing equation for this case is:

Tdata + Tclk/2 > Thold + Tskew
or
Hold slack = Tdata + Tclk/2 – Thold – Tskew

Thus, the clock period comes into the picture in the calculation of hold timing slack.
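The frequency dependence is easy to see numerically. A small sketch using the slack equation above (all numbers are hypothetical):

```python
# Hold slack for a positive-edge to negative-edge path:
#   slack = T_data + T_clk/2 - T_hold - T_skew
# Halving the period directly eats into the slack.

def hold_slack_half_cycle(t_clk, t_data, t_hold, t_skew):
    return t_data + t_clk / 2 - t_hold - t_skew   # all values in ns

for t_clk in (2.0, 1.0, 0.5):                     # 500 MHz, 1 GHz, 2 GHz
    print(t_clk, hold_slack_half_cycle(t_clk, 0.2, 0.05, 0.3))
# Slack shrinks from 0.85 ns to 0.10 ns; it goes negative only when the
# skew exceeds T_clk/2 + T_data - T_hold, a rare but frequency-dependent limit.
```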

Figure 2: Setup and hold checks for a timing path from a positive edge-triggered flip-flop to a negative edge-triggered flip-flop (both checks are half cycle; setup is checked on the next edge, hold on the previous edge)

Similarly, for timing paths launching from a negative edge-triggered flip-flop and being captured at a positive edge-triggered flip-flop, the clock period comes into the picture. However, this check is very relaxed most of the time. It is evident from the above equation that for the hold slack to be negative, the skew between launch and capture clocks has to be greater than half a clock cycle, which is a very rare scenario. Even at 2 GHz (Tclk = 500 ps), the skew has to be greater than 250 ps, which is still very rare.
Coming to latches, the hold check from a positive level-sensitive latch to a negative edge-triggered flip-flop is half cycle. Similarly, the hold check from a negative level-sensitive latch to a positive edge-triggered flip-flop is half cycle. Hence, the hold check in both of these cases is frequency dependent.

  2. Clock gating hold checks: When data launched from a negative edge-triggered flip-flop gates a clock at an OR gate, hold is checked on the positive edge following the edge at which data is launched, as shown in figure 3, which is frequency dependent.

Figure 3: Clock gating hold check between data launched from a negative edge-triggered flip-flop and the clock at an OR gate (setup check is single cycle; hold check is half cycle, checked on the next clock edge with respect to the launch clock edge)

Similarly, data launched from a positive edge-triggered flip-flop gating the clock at an AND gate forms a half-cycle hold check. However, this kind of check is not possible to meet under normal scenarios considering cross-corner variations.

  3. Non-default hold checks: Sometimes, due to architectural requirements (e.g. multi-cycle paths for hold), the hold check is non-zero cycle even for positive edge-triggered to positive edge-triggered paths, as shown in figure 4 below.
Figure 4: Non-default hold check with multi-cycle path of 1 cycle specified


Worst Slew Propagation


Worst slew propagation is a phenomenon in static timing analysis whereby the worst of the slews at the input pins of a gate is propagated to its output. As we know, the output slew of a logic cell is a function of its input slew and output load. For a multi-input logic gate, the output slew should be different for the timing paths through its different input pins. However, this is not the case, because to maintain a single timing graph, each node in the design can have only one slew. So, to cover the worst scenario for setup timing, the slew at the output pin is taken to be the one caused by the input pin having the worst slew, even if the timing path for which the output slew is being calculated is not through that input pin. Similarly, for hold timing analysis, the best of the slews caused by the input pins is propagated; we can refer to this as best slew propagation.

Let us illustrate with the help of a 2-input AND gate. As shown in the figure below, let the slews at the input pins be denoted as SLEW_A and SLEW_B, and that at the output pin as SLEW_OUT. Now, as we know:

SLEW_OUT = func(SLEW_A) if A toggles, causing OUT to toggle
SLEW_OUT = func(SLEW_B) if B toggles, causing OUT to toggle

However, even though the timing path shown is through pin A, the resultant slew at the output, SLEW_OUT, will be calculated as:

SLEW_OUT = func(SLEW_A) if func(SLEW_A) > func(SLEW_B)
SLEW_OUT = func(SLEW_B) if func(SLEW_B) > func(SLEW_A)



Figure 1: Worst slew propagation (the worst of the slews caused by each input pin is propagated)

One may view this as over-pessimism inserted by the timing analysis tool. Path-based timing analysis does not exhibit the worst slew propagation phenomenon, as it calculates an output slew for each timing path rather than one slew per node.

Similarly, for hold analysis, the best of the slews at the inputs is propagated to the output, as mentioned before.
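To summarize the behavior, here is a minimal sketch of how graph-based analysis could pick the propagated slew at a multi-input gate (func stands in for the library slew lookup; this is an illustration of the concept, not actual tool code):

```python
# Graph-based analysis keeps one slew per node: the worst (largest) of the
# output slews caused by each input is stored for setup analysis, and the
# best (smallest) for hold analysis, regardless of which input the path uses.

def output_slew(input_slews, func, mode="setup"):
    """input_slews: slew at each input pin; func(slew) models the
    cell's output slew for a transition arriving on that pin."""
    candidates = [func(s) for s in input_slews]
    return max(candidates) if mode == "setup" else min(candidates)

func = lambda s: 0.05 + 1.2 * s          # hypothetical slew transfer
slew_a, slew_b = 0.10, 0.30              # ns at pins A and B
print(output_slew([slew_a, slew_b], func, "setup"))  # 0.41 (worst, via B)
print(output_slew([slew_a, slew_b], func, "hold"))   # 0.17 (best, via A)
```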


Multicycle paths: The architectural perspective


Definition of multicycle paths: By definition, a multi-cycle path is one in which data launched from one flop is allowed (by architectural definition) to take more than one clock cycle to reach the destination flop, and this is architecturally ensured by gating either the data or the clock from reaching the destination flop. There can be many scenarios inside a System on Chip where multi-cycle paths apply, as discussed later. In this post, we discuss the architectural aspects of multicycle paths. For timing aspects such as application and analysis, please refer to Multicycle paths handling in STA.

Why multi-cycle paths are introduced in designs: A typical System on Chip consists of many components working in tandem, each working at a different frequency depending upon performance and other requirements. Ideally, the designer wants the maximum possible throughput from each component while respecting the power, timing and area constraints. The designer may think of introducing multi-cycle paths in the design in one of the following scenarios:
      
       1)      A very large data-path limiting the frequency of the entire component: Let us take a hypothetical case in which a component is to be designed to work at 500 MHz; however, one of its data-paths is too large to work at this frequency. Let us say the minimum delay this data-path can achieve is 3 ns. Thus, if we treat all paths as single cycle, the component cannot work at more than 333 MHz; however, if we ignore this path, the rest of the design can attain 500 MHz without much difficulty. So, we can make this particular path a multi-cycle path: it will effectively work at 250 MHz, sacrificing performance on that one path only, while the rest of the component works at 500 MHz.
     
     2)      Paths starting from a slow clock and ending at a fast clock: For simplicity, let us suppose there is a data-path involving one start-point and one end-point, with the start-point receiving a clock that is half the frequency of the end-point's. Now, the start-point can only send data at half the rate at which the end-point can receive it, so there is no gain in running the end-point at double the clock frequency for this path. Since the data is launched only once every two cycles, we can modify the architecture such that the data is expected after a gap of one cycle. In other words, instead of a single-cycle data-path, we can afford a two-cycle data-path in such a case. This will actually save power, as the data-path now has two cycles to traverse to the endpoint, so cells with lower drive strength, area and power can be used. Also, if the multi-cycle has been implemented through a clock enable (discussed later), clock power will also be saved.

Implementation of multi-cycle paths in architecture: Let us discuss some of the ways of introducing multi-cycle paths in the design:

      1)      Through gating in the data-path: Refer to figure 1 below, wherein an ‘Enable’ signal gates the data-path towards the capturing flip-flop. Now, by controlling the waveform on the enable signal, we can make the path multi-cycle. As shown in the waveform, if the enable signal toggles once every three cycles, the data at the end-point toggles only once every three cycles. Hence, the data launched at edge ‘1’ can be captured at the capturing flop only at edge ‘4’, giving a multi-cycle path of 3; i.e., a total of 3 cycles for the data to traverse to the capture flop. In this case, the setup check is 3 cycles and the hold check is 0 cycles.
Figure 1: Introducing multicycle paths in design by gating data path



    Now let us extend this discussion to the case wherein the launch clock is half the frequency of the capture clock. Let us say Enable changes once every two cycles of the capture clock. Here, the intention is to make the data-path a multi-cycle of 2 relative to the faster clock (the capture clock here). As is evident from the figure below, it is important for the Enable signal to take the proper waveform, as shown on the right hand side of figure 2. In this case, the setup check will be two cycles of the capture clock and the hold check will be 0 cycles.
   
   
Figure 2: Introducing a multi-cycle path where the launch clock is half the frequency of the capture clock (since data is launched only once every two capture cycles, the path can be made a multicycle of 2)


        2) Through gating in the clock path: Similarly, we can make the capturing flop capture data once every few cycles by clipping the clock; in other words, by sending to the capturing flip-flop only those clock pulses at which we want the data to be captured. This can be done similarly to the data-path masking discussed in point 1, with the only difference being that the enable now masks the clock signal going to the capturing flop. This kind of gating is more advantageous in terms of power, since the capturing flip-flop does not receive the clock signal during the masked cycles.
    
Figure 3: Introducing multi cycle paths through gating the clock path
      Figure 3 above shows how multicycle paths can be achieved with the help of clock gating. The enable signal, in this case, launches from a negative edge-triggered register due to architectural reasons (read here). With the enable waveform as shown in figure 3, the flop gets a clock pulse once every four cycles. Thus, we can have a multicycle path of 4 cycles from launch to capture. The setup and hold checks for this case are also shown in figure 3: the setup check will be a 4-cycle check, whereas the hold check will be a zero-cycle check.
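In STA terms, such architectures are constrained as multicycle paths. The resulting check edges can be sketched as below (a simple model assuming a common clock and a zero-cycle hold check, matching the waveforms above):

```python
# Setup and hold check edges for a multicycle path of n cycles where an
# enable guarantees capture only once every n cycles: setup moves out to
# the n-th edge while hold stays at the launch edge (zero cycle).

def mcp_check_edges(t_clk, n_cycles, launch_edge=0.0):
    setup_edge = launch_edge + n_cycles * t_clk  # data may take n cycles
    hold_edge = launch_edge                      # checked at the launch edge
    return setup_edge, hold_edge

t_clk = 2.0                                      # ns, 500 MHz
print(mcp_check_edges(t_clk, 3))                 # (6.0, 0.0) -> figure 1 case
print(mcp_check_edges(t_clk, 4))                 # (8.0, 0.0) -> figure 3 case
```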

Pipelining v/s introducing multi-cycle paths: Making a long data-path reach its destination in two cycles can alternatively be implemented by pipelining the logic, which is the simpler approach in most cases. Pipelining means splitting the data-path into two halves and putting a flop between them, essentially making the data-path two cycles long. This approach also eases the timing, at the cost of the latency of that data-path, while allowing the whole component to run at a higher frequency. But in some situations it is not economical to insert pipeline flops, as suitable points may not be available. In such a scenario, we have to go with the approach of making the path multi-cycle.


Metal ECO - the process



A metal-only ECO is carried out by changing only the metal interconnects in the design. Metal-only ECOs are very common in today’s semiconductor industry as they save a complete silicon re-spin. Sometimes there may be a need to make a change in the design, and a minor one, for various reasons, such as a bug in the design or a customer demand. A metal-only ECO enables the design to be re-fabricated by regenerating only a few masks: a complete silicon re-spin may require around 100 layer masks to be manufactured, whereas a metal-only ECO reuses the older masks for most of the layers. Only the layers with changes in them, usually 2 to 4 in the case of metal-only ECOs, need to be manufactured again.
The steps to carry out metal-only ECOs are explained below:
1.) A number of spare cells are sprinkled throughout the design before tape-out so as to facilitate metal-layer ECOs later on. The set of spare cells is chosen very carefully, keeping in mind the nature of the design and the probability of a metal ECO later on (which depends upon how mature the design's building blocks are).
2.) First, the intended changes are evaluated to see if they can be carried out by changing only metal layers. For this purpose, the spare cells in the vicinity of the ECO location are examined. If there are enough spare cells there, they can be used. On the other hand, if there are not enough spare cells to implement the logic change, the ECO cannot be carried out using only metal layers. It then has to be carried out using all the layers, as more cells will need to be added, resulting in a re-spin of the design.
3.) If enough spare cells are available, the appropriate spare cells to implement the design change are selected in the vicinity of the logic to be changed. The interconnects are then modified so as to realize the modified circuit.
4.) The resulting layout is checked for timing and DRC/LVS violations. If everything is fine, the design is sent for fabrication, where masks for the modified layers are manufactured and the older masks are reused for the unmodified layers.
5.) If there is any violation related to timing or DRC/LVS, steps 2, 3 and 4 are repeated until the design is clean with respect to these.
