Reset Synchronizer


Need for reset synchronizer: The way most of the designs have been modelled needs asynchronous reset assertion and synchronous de-assertion. The requirement of most of the designs these days is:
  1. When reset is asserted, it propagates to all designs; brings them to reset state whether or not clock is toggling; i.e. assertion should be asynchronous
  2. When reset is deasserted, wait for a clock edge, and then, move the system to next state as per the FSM (Finite State Machine); i.e. deassertion should be synchronous
The top level reset sources are mostly asynchronous, both in assertion and during deassertion. The circuit that manipulates the asynchronous reset to have asynchronous assertion and synchronous deassertion is referred as reset synchronizer.

Definition of reset synchronizer: A reset synchronizer synchronizes the deassertion of reset with respect to the clock domain. In other words, a reset synchronizer manipulates the asynchronous reset to have synchronous deassetion.

Figure 1 below shows the schematic representation of how a reset synchronizer is built. It consists of two registers connected in series, the input of first register tied to VDD. The asynchronous reset signal is connected to the Rbar pin of both the register. The output of this circuit has synchronized de-assertion. This synchronised reset fans out to the design.

Figure 1: Reset synchronizer

How reset synchronizer works: When the reset is asserted, it first propagates to reset synchronizer flops. It resets both the flops of reset synchronizer asynchronously (without waiting for clock edge) thereby generating reset assertion for fanout registers. Figure 2 below shows the scenario of reset assertion, and also the timing waveforms associated with assertion of reset.

Figure 2: Reset assertion


Similarly, the de-assertion of reset first reaches the two flops of reset synchronizer. Now, the first flop in chain propagates 1 to intermediate output upon arrival of a clock edge. Upon next clock edge, this signal propagates to the output thereby reaching the fanout registers. The reset de-assertion timing (recovery and removal checks timing) should be met from second stage of reset synchronizer to all the domain registers' reset pins as the deassertion is synchronous.







Some facts about reset synchronizer:

  1. The reset synchronizer manipulates the originally asynchronous reset to have synchronous deassertion.
  2. The reset synchronizer must fanout to all the registers that need to be "OUT OF RESET" in a single cycle. And there must be a single synchronizer for all such flops, otherwise, some flops will be out of reset 2 cycles later, some 3 cycles later; thus, defeating the purpose of reset synchronization.
  3. There can, of course, be multiple reset synchronizers in the design, with the number equal to number of functional clock domains. Each reset synchronizer fans out to all the resettable flops of its own clock domain.
  4. The timing constraints related to a reset synchronizer are discussed here.
Also read:

Quiz: Modeling skew requirements with data-to-data setup and hold checks


Problem: Suppose there are 'N' signals which are to be skew matched within a window of 200 ps with respect to each other. Model this requirement with the help of data setup and hold checks.

As we discussed in data setup and data hold checks, data setup check of 200 ps means that constrained data should come at least 200 ps before the reference data. Similarly, data hold check of 200 ps constrains the constrained data to come at least 200 ps after the reference data. The same is shown pictorially in figure 1(a) and 1(b).


Data check of 200 ps ensures the constrained data to come at least 200 ps before the reference data
Figure 1(a): Data setup check of 200 ps constrains the constrained signal to toggle at-least 200 ps before reference signal toggles.

Data hold check of 200 ps ensures that the constrained data comes at least 200 ps after the reference data
Figure 1(b): Data hold check of 200 ps constrains the constrained signal to toggle at-least 200 ps after the reference signal has toggled.

Now, suppose you apply a data setup check of -200 ps instead of 200 ps. This would mean that the constrained signal can toggle upto 200 ps after the reference signal. Similarly, a data hold check of -200 ps would mean that the constrained signal can toggle from 200 ps before the reference signal. If we apply both the checks together, it would infer that constrained signal can toggle in a window that ranges from 200 ps before the toggling of reference signal to 200 ps after the toggling of reference signal. This is pictorially shown in figures 2(a) and 2(b).

Negative data setup and data hold checks together will ensure a skew check between reference and constrained signals
Figure 2(a): Negative data setup and hold checks of 200 ps
If we combine the two checks, it implies that the constrained data can toggle upto 200 ps after and from 200 ps before the reference signal. In other words, we have constrained the constrained signal to toggle in a window +- 200 ps within the reference signal.

Coming to the given problem, if there are a number of signals required to toggle within a window of 200 ps, we can consider one of these to act as reference signal and other signals as constrained signals. The other signals can then be constrained in both setup and hold with respect to reference signal such that all of these lie within +-100 ps of the reference signal. The same is shown in figure 3 below:

We can constrain the signals to toggle within a window by choosing one of these as reference signal and applying negative data setup and data hold checks for other signals with respect to reference signal
Figure 3: Data checks to maintain skew between N signals


Lockup latches vs. lockup registers: what to choose

Both lockup latches and lockup registers are used to make scan chain robust to hold failures. What one uses for the same depends upon his/her priorities and the situation. However, it seems lockup latches are more prevalent in designs of today. This might be due to following reasons:
  1. Area: As we know, a latch occupies only half the area as a register. So, using lockup latches instead of lockup registers gives us area and power advantage; i.e., less overhead.
  2. Timing: Lockup elements – timing perspective has given an analysis of how timing critical lockup elements (lockup latches and lockup registers) paths can be. According to it, using a negative lockup latch, you don’t have to meet timing at functional (at-speed) frequency. However, in all other cases, you need to meet timing. This might also be a reason people prefer lockup latches.

Lockup latches, on one hand relax only one side hold. So, you can afford to have skew only on one side, either on launch or on capture. Lockup registers, on the other hand, let you have skew on both the sides. So, lockup latches are preferable where you can afford to have tap on the clock either from launch flop or on capture flop. On the other hand, lockup flops can be used by tapping clock from any point as long as you meet setup and hold timings.

Hope you’ve found this post useful. Let us know what you think in the comments.

Time borrowing in latches

What is time borrowing: Latches exhibit the property of being transparent when clock is asserted to a required value. In sequential designs, using latches can enhance performace of the design. This is possible due to time borrowing property of latches. We can define time borrowing in latches as follows:
Time borrowing is the property of a latch by virtue of which a path ending at a latch can borrow time from the next path in pipeline such that the overall time of the two paths remains the same. The time borrowed by the latch from next stage in pipeline is, then, subtracted from the next path's time.
The time borrowing property of latches is due to the fact that latches are level sensitive; hence, they can capture data over a range of times than at a single time, the entire duration of time over which they are transparent. If they capture data when they are transparent, the same point of time can launch the data for the next stage (of course, there is combinational delay from data pin of latch to output pin of latch).

Let us consider an example wherein a negative latch is placed between two positive edge-triggered registers for simplicity and ease of understanding. The schematic diagram for the same is shown in figure 1 below:


A latch placed between two registers. Tha path from regA to latch can borrow time from the path between latch and regB
Figure 1: Negative level-sensitive latch between two positive edge-triggered registers

Figure 2 below shows the clock waveform for all the three elements involved. We have labeled the clock edges for convenience. As is shown, latB is transparent during low phase of the clock. RegA and RegC (positive edge-triggered registers) can capture/launch data only at positive edge of clock; i.e., at Edge1, Edge3 or Edge5. LatB, on other hand, can capture and launch data at any instant of time between Edge2 and Edge3 or Edge4 and Edge5.


Clock waveforms for positive register negative latch positive flip-flop case
Figure 2: Clock waveforms

The time instant at which data is launched from LatB depends upon the time at which data launched from RegA has become stable at the input of latB. If the data launched at Edge1 from RegA gets stable before Edge2, it will get captured at Edge2 itself.  However, if the data is not able to get stable, even then, it will get captured. This time, as soon as the data gets stable, it will get captured. The latest instant of time this can happen is the latch closing edge (Edge3 here). One point here to be noted is that at whatever point data launches from LatB, it has to get captured at RegC at edge3. The more time latch takes to capture the data, it gets subtracted from the next path. The worst case setup check at latB is at edge2. However, latch can borrow time as needed. The maximum time borrowed, ideally, can be upto Edge3. Figure 3 below shows the setup and hold checks with and without time borrow for this case:

A latch can borrow time from next stage timing path.
Figure 3: Setup check with and without time borrow

The above example consisted of a negative level-sensitive latch. Similarly, a positive level-sensitive latch will also borrow time from the next stage, just the polarities will be different.

Also read:

2-input gates using 2:1 mux

Definition of a multiplexer: A 2^n-input mux has n select lines. It can be used to implement logic functions by implementing LUT (Look-Up Table) for that function. A 2-input mux can implement any 2-input function, a 4-input mux can implement any 3-input, an 8-input mux can implement any 4-input function, and so on. This property of muxes makes FPGAs implement programmable hardware with the help of LUT muxes. In this post, we will be discussing the implementation of 2-input AND, OR, NAND, NOR, XOR and XNOR gates using a 2-input mux.


2-input AND gate implementation using 2:1 mux: Figure 1 below shows the truth table of a 2-input AND gate. If we observe carefully, OUT equals '0' when A is '0'. And OUT follows B when A is '1'. So, if we connect A to the select pin of a 2:1 mux, AND gate will be implemented if we connect D0 to '0' and D1 to 'B'.

A 2-input AND gate has output '0' when either or both inputs is '0'. And output is '1' when both the inputs are '1'.
Figure 1: Truth table of AND gate
Figure 2 below shows the implementation of 2-input AND gate using a 2:1 multiplexer.

An AND gate can be implemented using a 2-input multiplexer by connected D0 input to '0' and D1 to B, SEL being connected to A. AND gate using mux, AND gate using 2x1 mux, 2-input AND gate using mux
Figure 2: Implementation of AND gate using a 2:1 mux



2-input NAND gate using 2:1 mux: Figure 3 below shows the truth table of a 2-input NAND gate. If we observe carefully, OUT equals '1' when A is '0'. Similarly, when A is '1', OUT is B'. So, if we connect SEL pin of mux to A, D0 pin of mux to '1' and D1 to B', then it will act as a NAND gate.

In a 2-input NAND gate, output is '0' when both inputs are '1', otherwise output is '1'
Figure 3: Truth table of 2-input NAND gate

Figure 4 below shows the implementation of a 2-input NAND gate using 2:1 mux.


A NAND gate can be implemented using a 2-input multiplexer, if we connect the select pin of the multiplexer to A, D0 to VDD and D1 to B' inputs. NAND gate using mux, NAND gate using 2x1 mux
Figure 4: Implementation of 2-input NAND gate using 2:1 mux

2-input OR gate using 2x1 mux: Figure 5 below shows the truth table for a 2-input OR gate. If we observe carefully, OUT equals B when A is '0'. Similarly, OUT is '1' (or A), when A is '1'. So, we can make a 2:1 mux act like a 2-input OR gate, if we connect D0 pin to B and D1 pin to A, with select connected to A.

In a 2-input OR gate, output is '1' when either or both of the inputs are '1'. Otherwise, output is '0'.
Figure 5: Truth table of 2-input OR gate

Figure 6 below shows the implementation of 2-input OR gate using a 2:1 multilpexer:


A 2-inputs multiplexer can be converted to an OR gate, if we connect the select pin of mux to A-input, D0 to B-input and D1 to VDD. OR gate using mux, OR gate using 2x1 mux
Figure 6: Implementation of 2-input OR gate using 2:1 mux


2-input NOR gate using 2x1 mux: Figure 7 below shows the truth table of a 2-input NOR gate. If we observe carefully, OUT equal B' when A is '0'. Similarly, OUT equals '0' when A is '1'. So, we can make a 2-input mux act like a 2-input NOR gate, if we connect SEL of mux to A, D0 to B' and D1 to '0'.

In a 2-input NOR gate, output equals '0' when either or both the inputs is '1'. Otherwise, output is '0'.
Figure 7: Truth table of 2-input NOR gate
Figure 8 shows the implementation of 2-input NOR gate using 2:1 mux.


NOR gate using mux, 2-input NOR gate using 2:1 mux, NOR gate using 2x1 mux
Figure 8: Implementation of 2-input NOR gate using 2x1 mux


2-input XNOR gate using 2x1 mux: Figure 9 below shows the truth table of a 2-input XNOR gate. If we observe carefully, OUT equals B' when A is '0' and equals B when A is '1'. So, a 2-input XNOR gate can be implemented from a 2x1 mux, if we connect SEL pin to A, D0 to B' and D1 to B.

In a 2-input XNOR gate, output equals '0' when exactly one of the inputs is '1', otherwise output is '1'.
Figure 9: Truth table of 2-input XNOR gate
The implementation of 2-input XNOR gate using a 2x1 mux is as shown in figure 10.
A 2-input XNOR gate can be realized using a 2:1 mux provided we connect the select to A-input, D0 to B' and D1 to B. XNOR gate using mux, XNOR gate using 2x1 mux, 2-input XNOR gate using mux
Figure 10: Implementation of 2-input XNOR gate using 2x1 mux


2-input XOR gate using 2x1 mux: Figure 11 shows the truth table for a 2-input XOR gate. If we observe carefully, OUT equals B when A is '0' and B' when A is '1'. So, a 2:1 mux can be used to implement 2-input XOR gate if we connect SEL to A, D0 to B and D1 to B'.

In a 2-input XNOR gate, output equals '1' when exactly one of the inputs is '1', otherwise output is '0'.
Figure 11: Truth table of 2-input XOR gate
Figure 12 shows the implementation of 2-input XOR gate using 2x1 mux.
A 2-input XNOR gate can be realized using a 2:1 mux provided we connect the select to A-input, D0 to B and D1 to B'. XOR gate using mux, 2-input XNOR gate using mux, XNOR gate using 2:1 mux
Implementation of 2-input XOR gate using 2x1 mux

NOT gate using 2:1 mux: Figure 13 shows the truth table for a NOT gate. The only inverting path in a multiplexer is from select to output. To implement NOT gate with the help of a mux, we just need to enable this inverting path. This will happen if we connect D0 to '1' and D1 to '0'.
Truth table of NOT gate
Figure 13: Truth table of NOT gate
Figure 14 shows the implementation of NOT gate using 2x1 mux:
NOT gate using 2-input mux, NOT gate using mux, NOT gate using multiplexer
Figure 14: Implementation of NOT gate using 2x1 mux

Multiplexer

Definition of mux: A multiplexer is a combinational circuit that selects one out of multiple input signals depending upon the state of select line. A 2^N:1 multiplexer with ‘N’ select lines can select 1 out of 2^N inputs. In other words, the multiplexer connects the output to one of its inputs based upon the value held at the select lines. A multiplexer (or commonly called as MUX) is also termed as data selector. Common functions of a multiplexer include concentrating the multiple data lines onto one single line. It can also be used as a data selector or clock selector.

Multiplexers can be categorized based upon the number of inputs:
  • 2-input mux: A 2:1 mux has 2 data input lines and 1 select line. The state of select line decides which of the inputs propagates to the output. The truth table of 2x1 mux is given below. As it shows, when SEL is 1, OUT follows IN2 and when SEL is 0, OUT follows IN1.
2:1 mux truth table, 2x1 mux truth table, 2 to 1 mux truth table, 2 by 1 mux truth table, 2 by 1 multiplexer, 2 to 1 multiplexer, 2x1 multiplexer truth table
Figure 1: Truth table of 2x1 mux
The logic circuit and symbol of 2x1 mux is shown in figure 2. Let us assume logical area of a 2:1 mux to be A.
Mux symbol, a 2:1 mux can be built using 2 2-input AND gates and an inverter.
Figure 2(a): Logic diagram of 2x1 mux                                               Figure 2(b): Schematic symbol of 2x1 mux
  • 3-input mux: A 3:1 mux has 2 select lines and 3 inputs. As a mux with 2 select lines can represent at max 4 inputs, a 3:1 mux repeats some inputs for 2 combinations. The truth table for 3-input mux is given below. As can be seen, for SEL value "10" and "11", IN2 is selected at the output (this is one of the 3 possible scenarios, repetition of IN0 or IN1 is also possible).
A 3:1 multiplexer has 2 select lines and 3 inputs. One of these inputs propagates to the output depending upon the state of select lines.
Figure 3: Truth table for 3x1 mux
The schematic symbol and structural representation (in terms of 2x1 muxes) for 3:1 mux is shown in figure 4 below. As can be figured out, 1 3x1 mux can be built using 2 2x1 muxes.
3-input mux, 3:1 mux, 3x1 mux, 3 by 1 mux, 3-input multiplexer, 3:1 multiplexer, 3x1 multiplexer schematic
Figure 4(a): Schematic symbol for 3x1 mux Figure                           4(b): Structural symbol for 3:1 mux using 2x1 muxes

  • 4-input mux: A 4:1 mux has 2 select lines and 4 inputs. The truth table for 4x1 mux is shown below:
4-input mux, 4-input multiplexer, 4x1 mux, 4x1 multiplexer, 4 by 1 mux, 4 by 1  multiplexer, 4:1 mux, 4:1 multiplexer truth table
Figure 5: Truth table for 4:1 mux
Figure 6 below shows the schematic symbol and structural symbol of 4:1 mux using 2:1 muxes. As is evident, a 4:1 mux can be built from 3 2:1 muxes.

A 4:1 mux has 4 inputs and 2 select lines. Depending upon the state of select lines, output selects one of the inputs
Figure 6(a): 4x1 mux schematic symbol     Figure 6(b): 4:1 mux structural representation with 2x1 muxes

  • 8-input mux: An 8x1 mux has 3 select lines and 8 inputs. The truth table for 8x1 mux is shown below:
An 8:1 mux has 8 inputs and 3 select lines. Depending upon the state of select lines, an input is connected to the output
Figure 7: Truth table for 8:1 mux
The structural representation using 2x1 muxes, and schematic symbol for the same is as shown below in figure 8. An 8-input mux can be implemented using 7 2-input muxes.
8-input mux, 8-input multiplexer, 8:1 mux, 8:1 multiplexer, 8 by 1 mux, 8 by 1 mux schematic, 8x1 mux schematic, 8x1 multiplexer, 8:1 mux using 2:1 mux
Figure 8(a): Schematic symbol for 8x1 mux                    Figure 8(b): Structure of 8x1 mux  with 2x1 mux                                              

  • 16-input mux: A 16x1 mux can be implemented from 15 2:1 muxes. It has 4 select lines and 16 inputs. Output follows one of the inputs depending upon the state of the select lines.

Also read:


Multicycle paths handling in STA

In the post Multicycle paths - the architectural perspective, we discussed about the architectural aspects of multicycle paths. In this post, we will discuss how multicycle paths are handling in backend optimization and timing analysis:

How multi-cycle paths are handled in STA: By default, in STA, all the timing paths are considered to have default setup and hold timings; i.e., all the timing paths should be covered in either half cycle or single cycle depending upon the nature of path (see setup-hold checks part 1 and setup-hold checks part 2 for reference). However, it is possible to convey the information to STA engine regarding a path being multi-cycle. There is an  SDC command "set_multicycle_path" for the same. Let us elaborate it with the help of an example:

Path from ff1 to ff2 is a multicycle path
Figure 3: Path from ff1/Q to ff2/D is multicycle path

Let us assume a multi-cycle timing path (remember, it has to be ensured by architecture) wherein both launch and capture flops are positive edge-triggered as shown in figure 3.  The default setup and hold checks for this path will be as shown in red in figure 4. We can tell STA engine to time this path in 3 cycles instead of default one cycle with the help of set_multicycle_path SDC command:

                                    set_multicycle_path 3 -setup -from ff1/Q -to ff2/D

Above command will shift both setup and hold checks forward by two cycles. That is, setup check will now become 3 cycle check and hold will be 2 cycle check as shown in blue in figure 4. This is because, by default, STA engine considers hold check one active edge prior to setup check, which, in this case, is after 3 cycles.

When you apply multicycle path only for setup, hold also moves along as default hold check is one edge prior to setup check
Figure 4: Setup and hold checks before and after applying multicyle for setup-only

However, this is not the desired scenario in most of the cases. As we discussed earlier, multi-cycle paths are achieved by either gating the clock path or data path for required number of cycles. So, the required hold check in most cases is 0 cycle. This is done through same command with switch "-hold" telling the STA engine to pull hold back to zero cycle check.

                               set_multicycle_path -hold 2 -from ff1/Q -to ff2/D

The above command will bring back the hold check 2 cycles back to zero cycle. This is as shown in figure 5 in blue.

On applying multicycle path for hold, hold check comes back to where it was intended. It does not impact setup check
Figure 5: Setup and hold checks after applying multi-cycle exceptions for both setup and hold


We need to keep in mind the following statement:

Setting a multi-cycle path for setup affects the hold check by same number of cycles as setup check in the same direction. However, applying a multi-cycle path for hold check does not affect setup check.

So, in the above example, both the statements combined will give the desired setup and hold checks. Please note that there might be a case where only setup or hold multi-cycle is sufficient, but that is the need of the design and depends on how FSM has been modeled.


What if both clock periods are not equal: In the above example, for simplicity, we assumed that launch and capture clock periods are equal. However, this may not be true always. As discussed in multicycle path - the architectural perspective, it makes more sense to have multi-cycle paths where there is a difference in clock periods. The setup and hold checks for multicycle paths is not as simple in this case as it was when we considered both the clocks to be of same frequency. Let us consider a case where launch clock period is twice the capture clock period as shown in figure 6 below.

Setup and hold cheks in case of multicycle paths  for clocks differing in frequencies
Figure 6: Default setup and hold checks for case where capture clock period is half that of launch clock

Now, the question is, defining a multi-cycle path, what clock period will be added to the setup check, launch or capture? The answer depends upon the architecture and FSM of the design. Once you know it, the same can be modelled in timing constraints. There is a switch in the SDC command to provide for which of the clock periods is to be added. "set_multicycle_path -start" means that the path is a multi-cycle for that many cycles of launch clock. Similarly, "set_multicycle_path -end" means that the path is a multicycle for that many cycles of capture clock. Let the above given path be a multicycle of 2. Let us see below how it changes with -start and -end options.

      1. set_multicycle_path -start: This will cause a cycle of launch clock to be added in setup check. As expected, on applying a hold multicycle path of 1, the hold will return back to 0 cycle check. Figure 7 below shows the effect of below two commands on setup and hold checks. As is shown, setup check gets relaxed by one launch clock cycle.

                      set_multicycle_path 2 -setup -from ff1/Q -to ff2/D -start
                      set_multicycle_path 1 -hold   -from ff1/Q -to ff2/D -start

When provided with -start switch, shifts in setup and hold checks happen in multiples of launch clock period.
Figure 8: Setup and hold checks with -start option provided with set_multicycle_path

       2. set_multicycle_path -end: This will cause a cycle of capture clock to be added in setup check. As expected, on applying a hold multicycle path of 1, the hold will return back to 0 cycle check. Figure 8 below shows the effect of below two commands on setup and hold checks. As is shown, setup gets relaxed by one cycle of capture clock.
                      set_multicycle_path 2 -setup -from ff1/Q -to ff2/D -end
                      set_multicycle_path 1 -hold   -from ff1/Q -to ff2/D -end

When provided with -end option, shifts in setup and hold checks happen in multiples of capture clock period.
Figure 9: Setup and hold checks with -end option provided with set_multicycle_path

Why is it important to apply multi-cycle paths: To achieve optimum area, power and timing, all the timing paths must be timed at the desired frequencies. Optimization engine will know about a path being multicycle only when it is told through SDC commands in timing constraints. If we dont specify a multicycle path as multicycle, optimization engine will consider it as a single cycle path and will try to use bigger drive strength cells to meet timing. This will result in more area and power; hence, more cost. So, all multicycle paths must be correctly specified as multicycle paths during timing optimization and timing analysis.

Also read:

VLSI n EDA : Now on Facebook

Hello guys!! Check out our new facebook page and get connected for all the latest posts. Just "like" the page and find all the updates right on your wall.

You can find our page at the below link:

https://www.facebook.com/vlsiuniverse/

Please help us reach your friends and colleagues.