STA

Static timing analysis (STA) is a vast domain involving many sub-fields. It computes bounds on the delays of the elements in a circuit without actually simulating it. In this post, we have tried to list all the posts that an STA engineer cannot do without. Please add your feedback in the comments to make reading it a more meaningful experience.

  • Metastability - Discusses the basics of metastability and how to avoid it.
  • Lockup latch - Discusses the basics of the lockup latch, from both the timing and the DFT perspective.
  • Clock latency - Read this if you wish to get acquainted with the terminology related to clock latency.
  • Data checks - Discusses non-sequential setup and hold checks; very useful for beginners.
  • Synchronizers - Discusses the different types of synchronizers in detail.
  • On-chip variations - Describes on-chip variations and the methods used to deal with them.
  • Temperature inversion - Discusses the concept of temperature inversion and conductivity trends with temperature.
  • Timing arcs - Discusses the basics of timing arcs, positive and negative unateness, cell arcs, net arcs, etc.
  • Basics of latch timing - Discusses the definition of a latch, the setup and hold timing of a latch, and latch timing arcs.

XOR/XNOR gate using 2:1 MUX

2-input XOR gate using a 2:1 multiplexer: As we know, a 2:1 multiplexer selects between two inputs depending upon the value of its select input. The function of a 2:1 multiplexer can be given as:

OUT = IN0 when SEL = 0 ELSE IN1

Also, a 2-input XOR gate produces a ‘1’ at the output if its two inputs have different values, and a ‘0’ if they are the same. The truth table of an XOR gate is given as:

A    B    OUT
0    0    0
0    1    1
1    0    1
1    1    0
Truth table of XOR gate

In the truth table of the XOR gate, if we fix one input, say B, then

OUT = A WHEN B = 0 ELSE A’


The two equations become identical if we connect B to SEL, A to IN0 and the complement of A (A’) to IN1. This is how a 2:1 multiplexer implements an XOR gate. Figure 1 below shows the implementation of a 2-input XOR gate using a 2:1 multiplexer.

An XOR gate can be implemented from a mux simply by connecting the select to one of the inputs, and the inputs to A and Abar respectively.
Implementing a 2-input XOR gate using a 2:1 Multiplexer
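
The connection described above can be checked with a small behavioral model. This is only a sketch in Python, not a hardware description; the function names mux2 and xor_from_mux are made up for illustration.

```python
def mux2(in0: int, in1: int, sel: int) -> int:
    """2:1 multiplexer: returns in0 when sel is 0, otherwise in1."""
    return in1 if sel else in0


def xor_from_mux(a: int, b: int) -> int:
    """XOR built from the mux: SEL = B, IN0 = A, IN1 = NOT A."""
    return mux2(in0=a, in1=1 - a, sel=b)


# Walk the full truth table and check each row against Python's ^ operator.
xor_table = [(a, b, xor_from_mux(a, b)) for a in (0, 1) for b in (0, 1)]
for a, b, out in xor_table:
    assert out == (a ^ b)
    print(a, b, out)
```

The printed rows should match the truth table of the XOR gate given earlier.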


I hope you’ve found this post useful. Let me know what you think in the comments. I’d love to hear from you all.


2-input XNOR gate using a 2:1 multiplexer: Similarly, the truth table of XNOR gate can be written as:

A    B    OUT
0    0    1
0    1    0
1    0    0
1    1    1
Truth table of XNOR gate

In the truth table, if we fix one input, say B, then

OUT = A WHEN B = 1 ELSE A’


Thus, the XNOR gate is the complement of the XOR gate. It can be implemented by connecting B to SEL, A to IN1 and Abar to IN0.

An XNOR gate can be implemented from a mux simply by connecting the select to one of the inputs, and the inputs to A and Abar respectively.
2-input XNOR gate using 2:1 multiplexer
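
As with the XOR gate, the XNOR wiring can be sanity-checked with a short behavioral sketch in Python. The names mux2 and xnor_from_mux are illustrative only; the select is assumed to be driven by B, with A on IN1 and its complement on IN0.

```python
def mux2(in0: int, in1: int, sel: int) -> int:
    """2:1 multiplexer: returns in0 when sel is 0, otherwise in1."""
    return in1 if sel else in0


def xnor_from_mux(a: int, b: int) -> int:
    """XNOR built from the mux: SEL = B, IN1 = A, IN0 = NOT A."""
    return mux2(in0=1 - a, in1=a, sel=b)


# Every row must be the complement of XOR.
xnor_table = [(a, b, xnor_from_mux(a, b)) for a in (0, 1) for b in (0, 1)]
for a, b, out in xnor_table:
    assert out == 1 - (a ^ b)
    print(a, b, out)
```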



Latch using 2:1 MUX

As we know, a 2:1 multiplexer selects between two inputs depending upon the value of its select input. Also, a latch holds its previous value when its enable pin is in a particular state (‘0’ for positive level sensitive latch and ‘1’ for negative level sensitive latch).

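So, to build a positive level-sensitive latch from a multiplexer, connect the output back to the IN0 pin of the multiplexer, the data input to IN1 and the clock input to the SEL pin. When the clock is ‘1’, the output follows the data; when the clock is ‘0’, the multiplexer recirculates its own output and thus holds the previous value. A negative level-sensitive latch can be built similarly by swapping the connections to IN0 and IN1. Figure 1 below shows the diagram representation for the same.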

Build a latch using a multiplexer
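
The feedback connection can be mimicked in software by keeping the output as state and passing it back into the mux on every evaluation. The following is a minimal behavioral sketch in Python, assuming an ideal mux with no delay; MuxLatch and its step method are hypothetical names.

```python
def mux2(in0: int, in1: int, sel: int) -> int:
    """2:1 multiplexer: returns in0 when sel is 0, otherwise in1."""
    return in1 if sel else in0


class MuxLatch:
    """Positive level-sensitive latch: Q feeds IN0, D feeds IN1, clock feeds SEL."""

    def __init__(self, initial: int = 0) -> None:
        self.q = initial

    def step(self, d: int, clk: int) -> int:
        # clk = 1: transparent (Q follows D); clk = 0: Q is fed back and held.
        self.q = mux2(in0=self.q, in1=d, sel=clk)
        return self.q


latch = MuxLatch()
# (d, clk) pairs applied one after another
stimulus = [(1, 0), (1, 1), (0, 1), (1, 1), (0, 0), (0, 1)]
trace = [latch.step(d, clk) for d, clk in stimulus]
print(trace)
```

Swapping the in0 and in1 arguments inside step would model the negative level-sensitive variant.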




String class vs dynamically allocated array

The string class should be preferred over a dynamically allocated array because of the following limitations of dynamically allocated arrays:
  1. Whenever the user calls the new operator, it becomes her/his responsibility to call delete as well, to avoid memory leaks.
  2. The user must ensure that the correct form of delete is called. For a single-element allocation, delete should be used, and for an array allocation, delete[] should be used. Using the wrong form results in undefined behavior.
  3. The user has to make sure that there is exactly one delete for each allocation.
The string class provides the function c_str() for backward compatibility with C APIs that expect a const char* argument. Hence there is no reason not to use string in place of an array of char.

But in a multi-threaded environment, there can be performance issues with the string class because of the reference counting optimization used by some library implementations (mostly older, pre-C++11 ones). Basically, reference counting eliminates unnecessary memory allocations and copying of characters. But in a multi-threaded environment, the time saved by avoiding unnecessary allocations and copying is dwarfed by the time spent behind the scenes on concurrency control.

Hence, in a multi-threaded environment, the user has the following options:
  1. Check whether the library implementation of the string class allows you to disable the reference counting optimization.
  2. Check for an alternative implementation of the string class that does not use reference counting; this can be verified by looking at the copy constructor of the class.
  3. Consider using vector<char> instead of string. The member functions of the string class will not be available, but most of the functionality is available through STL algorithms.
Options 1 and 2 are not really solutions; they only amount to checking the string class or library implementation. Option 3 is a real solution.


References: Effective STL by Scott Meyers


What is Static Timing Analysis?

Static timing analysis (STA) is a method of computing the maximum and minimum delay values of a complete circuit without actually simulating the full circuit. In STA, static delays such as gate delays and net delays are considered along each path. These delays are then compared against the required bounds on the delay values and/or the required relationships between the delays of different gates. The circuit to be analyzed is broken down into timing paths consisting of gates, registers and the nets connecting them. Normally, timing paths start from and end at registers or the chip boundary. Based on the origin and termination of data, timing paths can be divided into four categories:

        1.)    Input to register paths: These paths start at the chip boundary from input ports and end at registers
        2.)    Register to register paths: These paths start at a register output pin and terminate at a register input pin
        3.)    Register to output paths: These paths start at a register and end at chip boundary output ports
        4.)    Input to output paths: These paths start from the chip boundary at an input port and end at the chip boundary at an output port
Timing paths from each start-point to each end-point are constrained to have maximum and minimum delays. For example, a register to register path can take at most one clock cycle; for input to register and register to output paths, the input/output delay is subtracted from this budget. The minimum delay of a path is governed by the hold timing requirement of its endpoint. Thus, the maximum delay taken by any timing path governs the maximum frequency of operation.
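
As a rough illustration of how such bounds can be obtained without simulation, the sketch below propagates the earliest and latest arrival times through a small, made-up timing graph and derives a clock period limit from the latest one. The node names, the delay values and the 0.25 ns setup time are all assumptions for illustration; clock skew and uncertainty are ignored, and a single start-point is assumed.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical timing arcs: (source, destination) -> delay in ns.
# Arcs leaving FF1/Q are assumed to already include the clock-to-Q delay.
EDGES = {
    ("FF1/Q", "U1"): 0.25,
    ("FF1/Q", "U2"): 0.5,
    ("U1", "U3"): 0.75,
    ("U2", "U3"): 1.0,
    ("U3", "FF2/D"): 0.25,
}
SETUP_NS = 0.25  # assumed setup time of the capturing register


def arrival_times(edges, start):
    """Return (earliest, latest) arrival time per node, measured from start."""
    preds = {}
    for (src, dst), delay in edges.items():
        preds.setdefault(src, [])
        preds.setdefault(dst, []).append((src, delay))
    # Visit every node only after all of its fan-in nodes.
    order = TopologicalSorter(
        {node: [src for src, _ in fanin] for node, fanin in preds.items()}
    ).static_order()
    earliest, latest = {start: 0.0}, {start: 0.0}
    for node in order:
        if node == start:
            continue
        earliest[node] = min(earliest[src] + d for src, d in preds[node])
        latest[node] = max(latest[src] + d for src, d in preds[node])
    return earliest, latest


earliest, latest = arrival_times(EDGES, "FF1/Q")
min_period_ns = latest["FF2/D"] + SETUP_NS  # setup-limited clock period
max_freq_mhz = 1000.0 / min_period_ns
print(earliest["FF2/D"], latest["FF2/D"], max_freq_mhz)
```

The latest arrival time feeds the setup (maximum delay) check, while the earliest arrival time is what a hold (minimum delay) check would be compared against.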
As stated before, static timing analysis performs timing analysis without actually simulating the circuit. The delays of cells are picked from the respective technology libraries. The delays are available in the libraries in tabulated form, indexed by input transition and output load, and have been calculated by simulating the cells for a range of boundary conditions. Net delays are calculated based upon R and C models.
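
A tabulated delay usually has to be evaluated at a transition and load that fall between the characterized points. The sketch below shows one plausible way to do that, using bilinear interpolation over a small table. The axis values, the delay numbers and the function name lookup_delay are invented for illustration, and the interpolation scheme used by an actual tool or library may differ.

```python
from bisect import bisect_right

# Hypothetical characterization data for one timing arc.
SLEWS_NS = [0.0, 0.5, 1.0]   # input transition axis
LOADS_FF = [1.0, 2.0, 4.0]   # output load axis
DELAY_PS = [                 # DELAY_PS[i][j] is the delay at SLEWS_NS[i], LOADS_FF[j]
    [10.0, 14.0, 22.0],
    [12.0, 16.0, 24.0],
    [16.0, 20.0, 28.0],
]


def _bracket(axis, x):
    """Return the lower index of the bracketing interval and the fraction within it."""
    i = min(max(bisect_right(axis, x) - 1, 0), len(axis) - 2)
    return i, (x - axis[i]) / (axis[i + 1] - axis[i])


def lookup_delay(table, slews, loads, slew, load):
    """Bilinear interpolation of a delay table (extrapolates linearly outside the axes)."""
    i, ts = _bracket(slews, slew)
    j, tl = _bracket(loads, load)
    return (
        table[i][j] * (1 - ts) * (1 - tl)
        + table[i][j + 1] * (1 - ts) * tl
        + table[i + 1][j] * ts * (1 - tl)
        + table[i + 1][j + 1] * ts * tl
    )


delay_ps = lookup_delay(DELAY_PS, SLEWS_NS, LOADS_FF, slew=0.25, load=3.0)
print(delay_ps)
```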

One important characteristic of static timing analysis is that it checks the static delay requirements of the circuit without applying any vectors. Hence, the delays it calculates are the maximum and minimum bounds of the delays that will occur in real application scenarios with vectors applied. This enables static timing analysis to be fast and inclusive of all the boundary conditions. Dynamic timing analysis, on the contrary, applies input vectors, so it is very slow. It is, however, necessary to certify the functionality of the design. Thus, static timing analysis guarantees the timing of the design, whereas dynamic timing analysis guarantees functionality for real, application-specific input vectors.


VLSI design interview questions

VLSI stands for Very Large Scale Integration. It enables the creation of integrated circuits that incorporate thousands, or even millions, of transistors on a single chip. Before VLSI, only small functionalities could be integrated onto a chip. Most ICs could perform only a small set of functions, such as those of an ALU, a counter, etc. With the help of VLSI technology, it has become possible to get a whole system designed on a single chip.

Getting into the field of VLSI demands knowledge of some basic concepts, be it systems design, timing analysis, RTL design, etc. We have tried to collate a few of the topics in the links below. Going through these should be helpful for you. We look forward to your feedback for further improvement.

Defining a clock signal in VHDL

A clock is the backbone of any synchronous design. In test-benches, a clock is the most commonly needed signal, as almost every design requires one. Going a bit deeper, a clock signal is a binary signal that changes state every few time units. So, defining a clock in VHDL is pretty simple, as shown in the following code:
                signal my_clock : std_logic;
                -- Free-running clock: low for 5 ns, then high for 5 ns
                process
                begin
                                my_clock <= '0';
                                wait for 5 ns;
                                my_clock <= '1';
                                wait for 5 ns;
                end process;
The above code defines a clock of period 10 ns with 5 ns high time and 5 ns low time, hence a 50% duty cycle. Since we are assigning a value to my_clock in the code, it can either be defined as a signal or as an output. Most often, clocks are defined in test-benches and hence are internal signals. The high time and the low time don’t always need to be the same. You can always define a clock that has different high and low times, as shown below:
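                signal my_clock : std_logic;
                -- Asymmetric clock: low for 8 ns, then high for 2 ns
                process
                begin
                                my_clock <= '0';
                                wait for 8 ns;
                                my_clock <= '1';
                                wait for 2 ns;
                end process;
As we can see, my_clock is now low for 8 ns and high for 2 ns of every 10 ns period, so it has a duty cycle of 20%; i.e. a high time of 20% and a low time of 80%.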
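
The relation between the two wait times and the resulting waveform can be checked with a small numerical model. This is only a sketch in Python (not VHDL) of a process that drives the clock low first and then high; the function names and the 3 ns / 7 ns example values are assumptions for illustration.

```python
def clock_value(t_ns: int, low_ns: int, high_ns: int) -> int:
    """Level of a clock that starts low for low_ns, then goes high for high_ns, repeating."""
    return 0 if (t_ns % (low_ns + high_ns)) < low_ns else 1


def duty_cycle(low_ns: int, high_ns: int) -> float:
    """Fraction of each period that the clock spends high."""
    return high_ns / (low_ns + high_ns)


# Symmetric clock: 5 ns low, 5 ns high.
symmetric = [clock_value(t, 5, 5) for t in range(0, 20, 5)]
# Hypothetical asymmetric clock: 3 ns low, 7 ns high.
asymmetric_duty = duty_cycle(3, 7)
print(symmetric, duty_cycle(5, 5), asymmetric_duty)
```

Note that the duty cycle is set by the wait that follows the assignment of '1', not by the first wait in the process.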

Defining a clock in this way is, obviously, not synthesizable, as we are using delays in the code and delays cannot be synthesized. Hence, this way of defining a clock can only be used in a test-bench to test a piece of code. If you need to write a synthesizable clock, then you have to use structural coding. The simplest of clock generation circuits is a ring oscillator (an odd number of inverters connected in a loop), but its clock frequency will vary because the delay of the inverters changes with operating conditions.
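
To see how strongly the frequency of such an inverter loop depends on inverter delay, a common first-order estimate can be used: a loop of N identical inverters (N odd) oscillates with a period of about 2 x N x t_d, where t_d is the delay of one inverter. The sketch below applies this estimate; the stage count and the delay values are made up, and the formula ignores wire delay and unequal rise and fall times.

```python
def ring_frequency_ghz(stages: int, inverter_delay_ps: float) -> float:
    """First-order oscillation frequency of an inverter ring: f = 1 / (2 * N * t_d)."""
    if stages < 3 or stages % 2 == 0:
        raise ValueError("an inverter ring needs an odd number of stages (3 or more)")
    period_ps = 2 * stages * inverter_delay_ps
    return 1000.0 / period_ps  # 1000 ps per ns, so 1/ps scaled to GHz


# Same 5-stage ring at two assumed operating conditions.
fast_corner = ring_frequency_ghz(5, 20)  # inverter delay of 20 ps
slow_corner = ring_frequency_ghz(5, 25)  # inverter delay of 25 ps
print(fast_corner, slow_corner)
```

With these assumed numbers, a 25% increase in inverter delay lowers the frequency from 5 GHz to 4 GHz, which is why such a clock is not stable across operating conditions.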