April 2020 : VLSI n EDA

Timing path types

In the post - timing paths- we discussed about timing paths and common components of a timing path. We also discussed that the type of a timing path is perceived by its components, the elements encountered in reference path, the elements encountered in constrained path and the type of check between reference signal and constrained signal. We also discussed how these signal traversals are differentiated into different components of timing paths - startpoint, endpoint, launch clock path and capture clock path. Based on these, we can categorize the timing paths into broadly following categories. We will not talk about min/hold and max/setup paths, but each of below categories can further be differentiated into these based upon the type of check being formed. Also, it is to be noted that every timing path is, essentially, either of the type of a generic timing path or a modeling in some or the other form of a generic timing path as shown in figure 1.

Figure 1: Generic timing path

Reg-to-reg paths: The timing path where both "startpoint" and "endpoint" are sequential elements, e.g. a flip-flop, a latch or a memory element is termed as a reg-to-reg path in common terminology.

In-to-reg path: The timing path where "startpoint" is an input port and "endpoint" is a sequential element, is termed as in-to-reg path.

Reg-to-out path: Here, "startpoint" is a sequential element and "endpoint" is an output port.

In-to-out path: In this type of path, "startpoint" is an input port and "endpoint" is and output port.

Clock gating paths: In this type of path, "startpoint" can be any out of sequential element, input port or output port. The endpoint is usually input pin of either a combinational gate or an Integrated Clock Gating cell (ICG). The common scenario involved is to time the arrival of constrained signal (termed as enable in clock gating paths) such that complete pulses of clock as reference signal are transmitted and there is no glitch at the output of the "endpoint".

Min-pulse-width-check paths: Here, both reference and constrained path, both are clock paths and common right from source till "endpoint". This type of path compares the latest arrival of rise transition of the clock with respect to the earliest arrival of fall transition of clock and vice-versa. The nature of check is max check only.

Data check paths: In this type of paths, both reference signal and constrained signal are data launched by a clock signal.

Point-to-point paths: The paths with only constrained signal are called as point-to-point paths. "startpoint" as well "endpoint" can be any sequential or combinational pin or port.

Timing paths

The most important element of a design in Static Timing Analysis is a timing path. A design is broken down into a set of timing paths. Each timing path is analyzed by a set of timing equations for possible violations of timing. A timing path can be defined as flow of timing information (such as delay, transition etc.) through a set of elements which can be accumulated and verified against a specified set of rules.

A timing path can be supposed to be consisting of two sub-paths - a reference path through which reference signal traverses and a constrained path through which constrained signal traverses. Both of these essentially originate from same source (or have a definite relationship at their respective sources). At the terminal end of both, there is a relationship governing the arrival of constrained signal to the arrival of reference signal. Depending upon the type of reference signal and constrained signal, the type of elements encountered by these and the check that is formed between the two, we govern the type of path. For instance, in a reg-to-reg setup path, the reference signal is clock, constrained signal is data launched from a clock and traversing through a flip-flop and the check that is formed between the two signals is a setup check at a flip-flop as the endpoint.

Figure 1: Generic timing path in STA

Figure 1 above shows a generic timing path. The elements of the path are not shown individually. The path that is common among constrained signal and reference signal is termed as common path.

Based upon type of check being formed between constrained signal and reference signal, there are commonly two types of paths that are formed: max path/setup check path and hold check path/min path.

Max/setup check path: In this kind of path, the earliest arrival of reference signal and latest arrival of constrained signal is considered. The kind of check is known as setup check in most of the cases. And the type of path is called setup path/max path.

Min/hold check path: In this kind of paths, the earliest arrival of constrained signal and the latest arrival of reference signal is considered. The kind of check is known as hold check in most of the cases. And the type of path is called hold path/min path.

Let us move to the commonly perceived understanding of a timing path by taking an example of a reg-to-reg path. Figure 1 below shows an example of a timing path, which starts from a flip-flop and ends at a flip-flop.

Figure 2: Components of a reg-to-reg path

The above timing path (or any timing path, in general), has following components:

Startpoint: The element from which the data gets launched is known as startpoint. In general, it can be a sequential element (latch, flip-flop) or an input port. In case it is a flip-flop, the clock pin of the flip-flop is counted as the startpoint of timing path. For point-to-point paths, it can also be a combinational input or output pin.

Endpoint: The element at which timing path ends is called the endpoint. It can be data pin of flip-flop or an output port. For point-to-point paths, it can also be a combinational input or output pin.

Clock: Most of the timing paths are constrained by a clock signal, which clocks both startpoint and endpoint. The properties of the clock signal, such as clock period, jitter etc are defined in timing constraints.

Launch clock path: It refers to the path traversed by clock signal from clock source to the startpoint.

Capture clock path: It refers to the path traversed by clock signal from clock source to the endpoint.

Data path: It refers to the path traversed by data signal from starptoint to endpoint.

In the above example, launch clock path and data path together constitute constrained signal path and capture clock path constitutes reference signal path.

Timing requirements/constraints related to a reset synchronizer

In the post reset synchronizer, we discussed the functionality associated with a reset synchronizer. Here, we will discuss the timing associated with a reset synchronizer. Figure 1 below shows a reset synchronizer.

Figure 1: Reset synchronizer

As we can see, a reset synchronizer is expected to have 3 pins other than a clock pin. We will discuss the timing requirements of each of these one-by-one:

R0/D (Data input pin) is tied to 1, hence, no timing requirement related to this.
Reset deassertion timing is required for R0/Q -> R1/D at the clock frequency at which reset deassertion is happening
Similarly, reset deassertion timing is required for R1/Q -> functional_flops/Rbar pins
Timing at R0/Rbar pin is not required, since, it is put there to absorb metastability and come out of metastability before next clock edge
Timing at R1/Rbar pin is not required, since, when Rbar gets deasserted, R1/D and R1/Q are both at value "0".
Both R0 & R1 need at least a certain pulse width at Rbar pin in order to detect the reset. This requirement is generally given in the timing model of flip-flop

Points 4 & 5 assume that there is not much skew between R0/Rbar and R1/Rbar, which is true since either there is a custom cell made as a reset synchronizer or both the flip-flops are placed very close to each other, leaving almost no scope for a large skew.

Points 2 & 3 require that reset synchronizer and its fanout flip-flops are clocked on a related clock.

Thus, there are following constraints related to a reset synchronizer:

Reset synchronizer must be clocked on either same or related clock to its fanout flops
set_false_path -to R0/Rbar
set_false_path -to R1/Rbar
Min-pulse-width requirement at Rbar pins modelled in timing models

Min-pulse-width-check timing paths

All sequential elements require the clock pulse to have a pulse of at-least a certain width in order to function correctly. This is coded in their timing models in terms of a "minimum pulse width" requirement. And the timing path pertaining to check minimum pulse width of a signal is termed as "min-pulse-width-check" timing path. A min-pulse-width-check timing path essentially checks the arrival of one transition of the clock at the endpoint with respect to its fall transition. In other words, both constrained signal and reference signal are the same, just the opposite transitions.

For instance, min-pulse-width-check timing path for a high pulse will have fall transition as the reference signal and rise transition as constrained signal. The latest arrival of rise transition is checked against earliest arrival of fall transition Similarly, min-pulse-width-check timing path for a low pulse will have fall transition as constrained signal and rise transition as reference signal. The latest arrival of fall transition will be checked against earliest arrival of rise transition.

Constraining min-pulse-width-check timing paths: There are two kind of scenarios:

Min-pulse-width requirement for the pin is picked from timing model

User specifically specifies a "min-pulse-width" check at a specific pin using "set_min_pulse_width" command. This may be required in certain scenarios, such as a clock going out of the design through an output port, and we need to maintain a minimum duty cycle for the outgoing clock.

Regardless of this, we just need to define a clock of required period and we can report min-pulse-width-check timing path as desired.

In-to-out paths

In-to-out paths start from an input port and end at an output port. Figure 1 below shows an in-to-out path. As shown in figure 1, most of the times (not always), in-to-out paths are subset of a reg-to-reg path seen from a higher level of hierarchy. For instance, an in-to-out path at the level of a block-level design may be a reg-to-reg path as seen at SoC flat.

For in-to-out paths, the clock path is always assumed fully outside the design.

Constraining in-to-out paths: There are two ways that we can constrain in-to-out paths:

Constraining with respect to a virtual clock: We can consider in-to-out path as a sub-segment of a larger reg-to-reg path. And we can constrain these paths using a virtual clock. Using "set_input_delay" for input port and "set_output_delay" for output port with respect to same virtual clock, these paths can be constrained.

Create a clock VCLK without any source
"set_input_delay" at input_port with respect to VCLK
"set_output_delay" at output_port with respct to VCLK

Constraining as point-to-point paths: We can constrain in-to-out paths using "set_max_delay" command as point-to-point paths. However, using this approach, we may need to apply some extra constraints as well depending upon the behavior of the tool we are using.

Where does STA fit in backend design flow

Static Timing Analysis (STA) is an integral part of backend design cycle. At each stage in the design cycle, it is imperative to check for timing violations by carrying out static timing analysis and do course correction, if required. A typical physical design cycle involves logic synthesis followed by test insertion, placement, clock tree synthesis and data routing. At each of these stages, we need to ensure that the timing is under control by running STA. Figure 1 below shows, on a higher level, different steps involved in physical design cycle; and how STA is an integral part of each of the stages.

Figure 1: STA as an integral part of physical design cycle

The first step in physical design cycle is synthesis and test insertion followed by floorplanning and placement. Although, nowadays, some tools combine synthesis and placement to save on run time and efforts. At this level, STA is run with ideal clocks and estimated parasitic values either on the basis of fanout with a model known popularly as WLM (Wire Load model) or on the basis of actual placement of instances. Since, a lot of variables are not taken into account at this stage, it is preferred to meet setup timing with some margin so as to accommodate the degradation in setup timing due to those variables later in the design cycle. These variables include clock skew, uncommon path in clock network, data nets detouring during actual routing and crosstalk effects. When clock tree is built, we have actual clock skew numbers. So, STA needs to be run to check if the margins for clock skew need to be accommodated into some more optimization (area, power or timing) due to actual clock skews and uncommon clock paths being different than the margin assumed. Similarly, after data routing, we can run STA with actual parasitic values and crosstalk effects, thereby signing off timing using STA by running accross all possible corner scenarios.

Static Timing Analysis

What is STA: STA (Static Timing Analysis) is a method to validate the timing performance and hence, functionality of the designs. STA is based upon calculating the limits of minimum and maximum delay of logic elements through timing models. Using these calculated delays and based upon a set of equations, it is, then, determined if the design will pass or not.

An interesting thing to note about STA is that there is no importance given to actual functionality or state machine model of the design. The only thing of concern is how fast and accurately maximum and minimum delay bounds can be calculated.

Why is STA important: An SoC is supposed to run in a range of temperatures and voltages. Also, there are variations in process parameters while manufacturing chips. To guarantee performance and functionality across all combinations, it is important to analyze timing and check for any possible timing failures. STA is a very fast method to achieve the same as opposed to dynamic timing simulations (spice simulations). In other words, STA is one of the most important steps of chip design flow to check the design performance with respect to timing constraints.

How STA works: As stated earlier, STA works on calculating timing bounds and validating against a set of timing equations. One of the most important aspects of timing is the delay of individual elements and overall delay between sequential elements. Let us consider a flip-flop sending some signal to another flip-flop through a combinational logic as shown in figure 1 below.

Figure 1: A sample signal propagation between two sequential elements

For the above to work properly, signal that is launched from FLOP1 on a clock edge should reach FLOP2 only after hold time has passed after the clock edge (definition of hold check). Thus, the sum of minimum delay values of all the elements from FLOP1 to FLOP2 must be greater than hold time of FLOP2, thus giving below equation for minimum delay limit.

FLOP1_delay (CK_to_Q_min) + NET1_delay(min) + CELL1_delay(A_to_Z_min) + NET2_delay(min) + CELL2+delay(A_to_Z_min) + NET3_delay - FLOP2_hold > 0

Similarly, the signal launched from FLOP1 on a clock edge should reach FLOP2 setup time before the next clock edge (definition of setup check). Thus, the sum of maximum delay values of all the elements from FLOP1 to FLOP2 must be less than time period of clock received by both flops - setup time of FLOP2, thus giving below equation for maximum delay limit.

FLOP1_delay (CK_to_Q_max) + NET1_delay(max) + CELL1_delay(A_to_Z_max) + NET2_delay(max) + CELL2_delay(A_to_Z_max) + NET3_delay < CLK_period - FLOP2_setup

Of course, we can differentiate max delay as rise_max/fall_max and min delay as rise_min/fall_min. But for simplicity, we chose not to differentiate. Also, we considered ideal scenario wherein clock arrives at the same time on both the flip-flops, and no cross-talk effects.

Now the question arises how all the delays mentioned are calculated. If you observe carefully, there are three kinds of delays mentioned above: cell delays, net delays and setup/hold check values. For cell delays and setup/hold check values, there are cell timing models, in liberty format in most of the cases. Liberty format implements a lookup-table based delay model which is a set of values varying with transition and load values. These values are interpolated based upon the actual load and slew values to calculate cell delays. For net delays, tools implement a delay calculation engine based upon parasitic values of the nets. There is a different model of such values for each of the corner-case scenarios; and STA is run separately for each such scenario to provide a complete coverage of the design accross all use-case scenarios.

How is STA different than dynamic simulations: Dynamic timing analysis needs a set of input vectors to work. It works by propagating actual values and calculating actual differential equations as provided in spice models, which are quite effort intensive. Moreover, the set of input vectors for a design with 50 inputs itself will be so big that it is not possible to run dynamic simulations at all possible corner-case scenarios for all set of input vectors. On the other hand, static timing analysis works on delay bounds without the need of any input vectors; and hence, is pretty fast. That is why, static timing analysis is a more popular way of timing analysis. On the other hand, of course, dynamic analysis is more accurate. So, the paths passing with very-very small margins can be run through spice simulations as well in order to be extra cautious about the robustness of the design against failures. Thus, in all, the overall approach can be to have both static as well as dynamic analysis for timing, with static timing analysis providing a complete coverage and dynamic simulations being a confidence booster for design robustness against failures by checking for real application specific input vectors.

VLSI UNIVERSE