
What is meant by drive strength of a standard cell

As we know, cell delay is a function of output load capacitance. The simplest equivalent circuit of a logic gate driving an output load can be assumed as given in figure 1:


The purpose of a logic gate is to propagate the effect of the logic value available at its input to its output. This is achieved by charging or discharging the output load capacitance: propagating a logic '0' means discharging the load capacitance, and propagating a logic '1' means charging it. The drive strength of a logic gate is its relative capability to charge/discharge the capacitance present at its output. Now, the time constant, and hence the delay, of the circuit is "RC".
So, for a cell with higher drive strength, the corresponding "R" is smaller than that of a cell with lower drive strength. For the same load capacitance "C", the delay is therefore lower for the cell with higher drive strength, as it can charge/discharge the capacitance in less time.
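As a rough illustration (the numbers below are hypothetical and only meant to show the trend):

                                      delay ≈ R × C
                                      1X cell: R ≈ 2 kΩ, C = 10 fF  →  delay ≈ 20 ps
                                      2X cell: R ≈ 1 kΩ, C = 10 fF  →  delay ≈ 10 ps

So, doubling the drive strength roughly halves the effective resistance, and hence the delay, for the same load.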

How drive strength varies with size of a cell: Let us talk in terms of MOSFETs, although this is valid for every device in general. We know that for a given technology standard cell library, the length of all transistors is kept constant. For instance, a 90 nm technology will have the gate length of all transistors as ~90 nm. And the channel resistance of a MOSFET is inversely proportional to the "W/L" ratio of the transistor. So, a simple way to decrease channel resistance is to increase the "W" of the transistor. So, a transistor with more area (a wider transistor) will have less resistance. Or we can say that a logic gate with bigger transistors has more drive strength.

What is unit drive strength: In a standard cell library, we generally see cells labelled as "1X", "2X" and so on. But what is meant by the number that you see with the drive strength? In general, the smallest size of a logic gate is labelled as unit drive strength. The drive strength numbers of the other cells are labelled relative to the unit drive strength cell.

Read next: How delay of a cell changes with drive strength


Divide by 2 clock in VHDL

Clock dividers are ubiquitous circuits used in every digital design. A divide-by-N divider produces an output clock whose frequency is N times lower than that of the input clock. A flip-flop with its inverted output fed back to its input serves as a divide-by-2 circuit. Figure 1 shows the schematic representation for the same.

Figure 1: Divide-by-2 clock circuit
Following is the code for a divide-by-2 circuit.
-- This module is for a basic divide by 2 in VHDL.
library ieee;
use ieee.std_logic_1164.all;

entity div2 is
    port (
        reset   : in  std_logic;
        clk_in  : in  std_logic;
        clk_out : out std_logic
    );
end div2;

-- Architecture definition for divide by 2 circuit
architecture behavior of div2 is
    signal clk_state : std_logic;
begin
    process (clk_in, reset)
    begin
        if reset = '1' then
            clk_state <= '0';
        elsif clk_in'event and clk_in = '1' then
            clk_state <= not clk_state;
        end if;
    end process;

    clk_out <= clk_state;
end architecture;
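If you want to try the above code out, a minimal testbench sketch (the names and clock period are illustrative choices, not part of the original post) could look like this:

-- Minimal testbench sketch for the div2 entity above.
library ieee;
use ieee.std_logic_1164.all;

entity div2_tb is
end div2_tb;

architecture sim of div2_tb is
    signal reset   : std_logic := '1';
    signal clk_in  : std_logic := '0';
    signal clk_out : std_logic;
begin
    -- Instantiate the divide-by-2 circuit
    dut : entity work.div2
        port map (reset => reset, clk_in => clk_in, clk_out => clk_out);

    -- 10 ns period input clock
    clk_in <= not clk_in after 5 ns;

    -- Release reset after two input clock cycles
    reset <= '0' after 20 ns;
end architecture;

Run the simulation for around 100 ns and observe that clk_out toggles only at the rising edges of clk_in, i.e., it runs at half the input frequency.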

Hope you’ve found this post useful. Let us know what you think in the comments.

Interesting problem – Latches in series


Problem: 100 latches (either all positive or all negative level-sensitive) are placed in series (figure 1). How many cycles of latency will this chain introduce?

Figure 1: 100 negative level-sensitive latches in series
As we know, the setup check between latches of the same polarity (both positive or both negative) is a zero-cycle check with half a cycle of time borrow allowed, as shown in figure 2 below for negative level-sensitive latches:

Figure 2: Setup check between two negative level-sensitive latches

So, if there are a number of same-polarity latches in series, each will form a zero-cycle setup check with the next latch, resulting in an overall zero-cycle phase shift. In other words, the chain of 100 latches introduces zero cycles of latency.

As is shown in figure 3, all the latches in series are borrowing time, without allowing any actual phase shift to happen. Consequently, in a design built only of latches, there cannot be a next-state calculation if all the latches are either positive level-sensitive or negative level-sensitive. In other words, for a state-machine implementation, there should not be latches of the same polarity in series.

Figure 3: Timing for 100 latches in series
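As a quick way to convince yourself, this flush-through behaviour can be simulated with a simple chain of level-sensitive latches. Below is a minimal VHDL sketch (the generic, entity and signal names are my own); in simulation, a value applied at the input ripples through all the latches within the same low phase of the clock, confirming the zero-cycle phase shift:

-- Minimal sketch: a chain of N negative level-sensitive (transparent-low) latches.
-- When clk is low, every latch in the chain is transparent, so data flushes
-- straight through the whole chain without any cycle of latency.
library ieee;
use ieee.std_logic_1164.all;

entity latch_chain is
    generic (N : integer := 100);
    port (
        clk      : in  std_logic;
        data_in  : in  std_logic;
        data_out : out std_logic
    );
end latch_chain;

architecture behavior of latch_chain is
    signal stage : std_logic_vector(0 to N);
begin
    stage(0) <= data_in;

    gen_latches : for i in 1 to N generate
        process (clk, stage)
        begin
            if clk = '0' then              -- transparent when the clock is low
                stage(i) <= stage(i - 1);
            end if;                        -- holds its value when the clock is high
        end process;
    end generate;

    data_out <= stage(N);
end architecture;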


Hope you’ve found this post useful. Let us know what you think in the comments.


VLSI design interview questions

VLSI stands for Very Large Scale Integration, and it enables the creation of integrated circuits by incorporating thousands, and even millions, of transistors on a single chip. Before VLSI, only small functionalities could be integrated onto a chip; most ICs could perform only a small set of functions, such as ALUs, counters, etc. With the help of VLSI technology, it has become possible to get a whole system designed on a single chip.

Getting into the field of VLSI demands knowledge of some basic concepts, be it system design, timing analysis, RTL design etc. We have tried to collate a few of these topics in the links below. Going through them should be helpful for you. We look forward to your feedback for further improvement.

How propagation of ‘X’ happens through different logic gates


‘X’ refers to a signal attaining a value that is ‘unknown’. It can be either ‘0’ or ‘1’, but the exact value of the signal is not known. If a simulator is not able to decide whether a logic value should be logic ‘0’ or logic ‘1’, it will assign the value ‘X’ to that signal. An example of an ‘X’ source is a logic block that has not been initialized properly through reset. An ‘X’ value at a node can propagate to the logic lying in its fan-out, thereby increasing the uncertainty downstream.

How ‘X’ propagates: An ‘X’ value at the input of a logic gate may or may not propagate to its output depending upon the states at the other inputs of the logic gate. Given below is how different logic gates react to ‘X’ values:

1)  OR gate: An OR gate can absorb an ‘X’ if the other input has logic ‘1’. Otherwise, ‘X’ propagates through it. Please refer to figure 1 for explanation:

Figure 1: X-propagation through OR gate

2) AND gate: An AND gate can absorb an ‘X’ if the other input has logic ‘0’. Otherwise, ‘X’ propagates through it. Please refer to figure 2 for explanation:

Figure 2: X-propagation through AND gate


3) Buffer/inverter: Since buffers/inverters are single-input gates, an ‘X’ at the input means an ‘X’ at the output.

4) XOR gate: An ‘X’ at one of the inputs of an XOR gate produces ‘X’ at the output no matter what the other input's state is. Please refer to the truth table given in figure 3 for explanation:

Figure 3: X-propagation through XOR gate

5) Complex gates: For complex gates, whether ‘X’ will propagate to the output depends upon the function of the ‘X’ input with respect to the other inputs. E.g., suppose a gate with the function

Z = A + (B * C)

Then, if the B input goes to ‘X’, the output will go to ‘X’ if A = 0 and C = 1; for the other combinations of A and C, the ‘X’ is absorbed.
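For completeness, below is a minimal VHDL sketch (the entity and signal names are my own) that can be simulated to observe this behaviour; the 'and'/'or' operators of ieee.std_logic_1164 model exactly this kind of X-propagation:

-- Minimal sketch: observe X-propagation through the complex gate Z = A + (B * C)
library ieee;
use ieee.std_logic_1164.all;

entity x_prop_demo is
end x_prop_demo;

architecture sim of x_prop_demo is
    signal A, B, C, Z : std_logic;
begin
    Z <= A or (B and C);               -- complex gate Z = A + (B * C)

    stimulus : process
    begin
        A <= '0'; B <= 'X'; C <= '1';  -- 'X' propagates: Z = 'X'
        wait for 10 ns;
        A <= '1'; B <= 'X'; C <= '1';  -- 'X' absorbed by the OR: Z = '1'
        wait for 10 ns;
        A <= '0'; B <= 'X'; C <= '0';  -- 'X' absorbed by the AND: Z = '0'
        wait;
    end process;
end architecture;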



Can a net have negative propagation delay?


As we discussed in "Is it possible for a logic gate to have negative propagation delay", a logic cell can have negative propagation delay. However, the only condition we mentioned was that the transition at the output pin should be improved so drastically that the 50% level at the output is reached before the 50% level of the input waveform.

In other words, the only condition for negative delay is an improvement in slew. As we know, a net has only passive parasitics in the form of parasitic resistances and capacitances. Passive elements can only degrade the transition, as they cannot provide energy (assuming no crosstalk); they can only dissipate it. In other words, it is not possible for a net to have negative propagation delay.

However, we can have negative delay for a net if there is crosstalk, as crosstalk can improve the transition on a net. In other words, in the presence of crosstalk, the 50% level at the output can be reached before the 50% level at the input; hence, a negative propagation delay for the net.





C function that converts hexadecimal value to decimal value.

Hexadecimal to decimal conversion is something that is often needed in hardware-related programming. The functions below can be used for hexadecimal to decimal conversion in C:
#include <stdio.h>
#include <string.h>

/* Returns the decimal value of a single hex digit, or -1 for an invalid character */
int get_value(char a) {
    if (a >= '0' && a <= '9')
        return a - '0';
    else if (a >= 'A' && a <= 'F')
        return a - 'A' + 10;
    else if (a >= 'a' && a <= 'f')
        return a - 'a' + 10;
    else
        return -1;
}

/* Converts a hexadecimal string to its decimal value; returns -1 for invalid input */
int htoi(char a[]) {
    int len = strlen(a);
    int temp = 0;
    for (int i = 0; i < len; i++) {
        int digit = get_value(a[i]);
        if (digit == -1)
            return -1;
        temp = temp * 16 + digit;
    }
    return temp;
}

int main() {
    char a[] = "f0";
    int b = htoi(a);
    if (b == -1)
        printf("invalid input");
    else
        printf("decimal value is %d", b);
    return 0;
}

Interesting programming quiz : Array Bound Read Error

Problem: Can you figure out what is wrong with following piece of code?


#include <iostream>

int main() {
    int a[5] = {1, 2, 3, 4, 5};
    for (int i = 4; a[i] >= 0 && i >= 0; i--) {
        std::cout << "ith element of array is " << a[i] << std::endl;
    }
}

I would suggest you try it yourself before scrolling down to see the answer. It's quite interesting.
......
......
......
......
......
......
......
......
......
......
......
......
......
......
......
......
......
......
Answer : Here, as one can figure out, the intention is to print array elements from the end until we hit a negative number. At first look it may seem fine, but unfortunately it will end up in an ABR (Array Bound Read) error.

Explanation : After the completion of the 5th iteration, i.e. when i = 0, the program will decrement i, so i becomes "-1". It will then try to check the loop condition, which results in reading a[-1]. Since an array can only have indexes greater than or equal to 0, this is an out-of-bounds access. Reading array elements outside the allowed indexes is termed an Array Bound Read error. Such conditions should be avoided because they lead to undefined behaviour: the result can be random, the program can crash at any time, and if you are lucky it may even appear to run successfully. Instead, the loop should be

for (int i = 4; i >= 0 && a[i] >= 0; i--) {

i.e., first check the index value and then do the array access.

Here, with the above solution, one more interesting thing comes up to understand. In an AND (&&) operation, condition1 is evaluated first; if it is true, condition2 is then evaluated; otherwise false is returned right there, without condition2 being evaluated (short-circuit evaluation).

For example,
#include <iostream>
int main() {
    int i = 0;
    int j = 1;
    if ((i == 1) && (++j == 3)) {
        std::cout << "inside if" << std::endl;
    }
    std::cout << "i is " << i << " and j is " << j << std::endl;
}
Output :
i is 0 and j is 1


Here, as you can see, control does not go into the if branch, as the overall condition is not true. Since condition1 (i == 1) is false, condition2 is not even checked, i.e., the value of j is not incremented.


Internally, the compiler might be doing the following kind of transformation to evaluate the && operation:

bool cond = (i == 1);
if (cond) {
    cond = (++j == 3);
}
if (cond) {
    std::cout << "inside if" << std::endl;
}

Multicycle paths : The architectural perspective


Definition of multicycle paths: By definition, a multi-cycle path is one in which data launched from one flop is allowed (by architecture definition) to take more than one clock cycle to reach the destination flop. This is architecturally ensured by gating either the data or the clock from reaching the destination flop. There can be many such scenarios inside a System on Chip where we can apply multi-cycle paths, as discussed later. In this post, we discuss the architectural aspects of multicycle paths. For timing aspects like application, analysis etc., please refer to Multicycle paths handling in STA.

Why multi-cycle paths are introduced in designs: A typical System on Chip consists of many components working in tandem. Each of these works at a different frequency depending upon performance and other requirements. Ideally, the designer would want the maximum possible throughput from each component in the design while paying proper respect to power, timing and area constraints. The designer may think of introducing multi-cycle paths in the design in one of the following scenarios:
      
       1)      Very large data-path limiting the frequency of the entire component: Let us take a hypothetical case in which one of the components is to be designed to work at 500 MHz; however, one of its data-paths is too large to work at this frequency. Let us say the minimum delay this data-path can achieve is 3 ns. Thus, if we treat all the paths as single cycle, the component cannot work at more than 333 MHz; however, if we ignore this path, the rest of the design can attain 500 MHz without much difficulty. So we can sacrifice this path alone so that the rest of the component works at 500 MHz: we make that particular path a multi-cycle path, letting it effectively work at 250 MHz and sacrificing performance for that one path only.
     
     2)      Paths starting from a slow clock and ending at a fast clock: For simplicity, let us suppose there is a data-path involving one start-point and one end-point, with the start-point receiving a clock that is half the frequency of the end-point's clock. Now, the start-point can only send data at half the rate at which the end-point can receive it. Therefore, there is no gain in capturing a new value at every cycle of the faster clock. Also, since the data is launched only once every two cycles, we can modify the architecture so that the data is received after a gap of one cycle. In other words, instead of a single-cycle data-path, we can afford a two-cycle data-path in such a case. This actually saves power, as the data-path now has two cycles to traverse to the endpoint, so cells with lower drive strength, and hence lower area and power, can be used. Also, if the multi-cycle has been implemented through a clock enable (discussed later), clock power will also be saved.

Implementation of multi-cycle paths in architecture: Let us discuss some of the ways of introducing multi-cycle paths in the design:

      1)      Through gating in the data-path: Refer to figure 1 below, wherein the ‘Enable’ signal gates the data-path towards the capturing flip-flop. Now, by controlling the waveform of the enable signal, we can make the path multi-cycle. As is shown in the waveform, if the enable signal toggles once every three cycles, the data at the end-point also toggles once every three cycles. Hence, the data launched at edge ‘1’ can arrive at the capturing flop only at edge ‘4’. Thus, we have a multi-cycle path of 3 in this case, giving the data a total of 3 cycles to traverse to the capture flop. In this case, the setup check is a 3-cycle check and the hold check is a 0-cycle check. (A VHDL sketch of this data-path gating scheme is given after this list.)
Figure 1: Introducing multicycle paths in design by gating data path



    Now let us extend this discussion to the case wherein the launch clock is half the frequency of the capture clock. Let us say Enable changes once every two cycles. Here, the intention is to make the data-path a multi-cycle of 2 relative to the faster clock (the capture clock here). As is evident from the figure below, it is important that the Enable signal has the proper waveform, as shown on the right hand side of figure 2. In this case, the setup check will be two cycles of the capture clock and the hold check will be a 0-cycle check.
   
   
Figure 2: Introducing a multi-cycle path where the launch clock is half the frequency of the capture clock


        2) Through gating in the clock path: Similarly, we can make the capturing flop capture data once every few cycles by clipping the clock; in other words, by sending only those pulses of the clock to the capturing flip-flop at which we want the data to be captured. This can be done similarly to the data-path masking discussed in point 1, with the only difference being that the enable now masks the clock signal going to the capturing flop. This kind of gating is more advantageous in terms of power saving: since the capturing flip-flop does not get a clock pulse every cycle, we save clock power too.
    
Figure 3: Introducing multi cycle paths through gating the clock path
      Figure 3 above shows how multicycle paths can be achieved with the help of clock gating. The enable signal, in this case, launches from a negative edge-triggered register due to architectural reasons. With the enable waveform as shown in figure 3, the flop will get a clock pulse once every four cycles. Thus, we can have a multicycle path of 4 cycles from launch to capture. The setup and hold checks, in this case, are also shown in figure 3: the setup check will be a 4-cycle check, whereas the hold check will be a zero-cycle check.
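For illustration, below is a minimal VHDL sketch of the data-path gating scheme of point 1 (the entity and signal names are my own, not from the figure). The capture flop holds its old value until 'enable' is asserted, so the launched data is sampled only once every three cycles, making the path effectively a multicycle path of 3. In STA, such a path would still need to be constrained as a multicycle path, as discussed in the referenced post on STA handling.

-- Minimal sketch (illustrative): data-path gating for a multicycle path of 3.
-- The capture flop recirculates its old value until 'enable' is high, so the
-- launched data is sampled only on one clock edge out of every three.
library ieee;
use ieee.std_logic_1164.all;

entity mcp_capture is
    port (
        clk      : in  std_logic;
        enable   : in  std_logic;   -- asserted once every three cycles
        data_in  : in  std_logic;   -- output of the (slow) combinational data-path
        data_out : out std_logic
    );
end mcp_capture;

architecture rtl of mcp_capture is
    signal q : std_logic;
begin
    process (clk)
    begin
        if clk'event and clk = '1' then
            if enable = '1' then
                q <= data_in;       -- capture only on enabled edges
            end if;                 -- otherwise hold the previously captured value
        end if;
    end process;

    data_out <= q;
end architecture;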

Pipelining v/s introducing multi-cycle paths: Making a long data-path reach its destination in two cycles can alternatively be implemented by pipelining the logic, which in most cases is a much simpler approach than making the path multi-cycle. Pipelining means splitting the data-path into two halves and putting a flop between them, essentially making the data-path two cycles. This approach also eases the timing, at the cost of one extra cycle of latency in the data-path; however, looking at the whole component level, we can then afford to run the whole component at a higher frequency. But in some situations it is not economical to insert pipeline flops, as there may not be suitable points available to split the path. In such a scenario, we have to go with the approach of making the path multi-cycle.


Implement 3 and 4 variable function using 8:1 MUX

A three-variable function can be easily implemented using an 8:1 multiplexer: connect the 3 input variables to the select lines of the mux and connect the 8 data inputs of the mux to logic 0 or 1 according to the function output. For example, let us say the function is

                                      F(X,Y,Z) = Σ(0,1,3,6)

then X, Y, Z will be connected to the select lines of the mux; I0, I1, I3 and I6 will be connected to logic 1 (VDD) and the others will be connected to logic 0.


The output will select an input based upon the values provided at S0, S1 and S2


For a 4-variable function, there are 16 possible input combinations. To implement a 4-variable function using an 8:1 MUX, use 3 of the inputs as select lines of the MUX; the remaining 4th input, together with the function output, determines what each data input of the mux is connected to. Let us demonstrate it with an example:

                                  F(A,B,C,D) = Σ(1,5,7,9,10,11,12)

A   B   C   D   Decimal Equivalent   F
0   0   0   0           0            0
1   0   0   0           8            0
0   0   0   1           1            1
1   0   0   1           9            1
0   0   1   0           2            0
1   0   1   0          10            1
0   0   1   1           3            0
1   0   1   1          11            1
0   1   0   0           4            0
1   1   0   0          12            1
0   1   0   1           5            1
1   1   0   1          13            0
0   1   1   0           6            0
1   1   1   0          14            0
0   1   1   1           7            1
1   1   1   1          15            0


The 4-variable function represented using an 8:1 mux

                                  ABar = ~A (inverted A)

As shown in the figure, B, C, D are used as select lines and A is used to drive the data inputs of the mux. From the truth table, if B, C, D are all 0, then the output F is 0 irrespective of the status of A, so I0 = 0. For I5 (BCD = 101), the output depends upon A:

                                     for A = 0,  F  = 1
                                     for A = 1,  F =  0
                          Hence F = ~A (for BCD = 101)
                                    For I4 (BCD = 100), F = A
                                    For I1 (BCD = 001), F = 1 (irrespective of the status of A)

All other data inputs of the mux can be inferred in the same way. Thus, we can conclude that to implement an n-variable function, we need a 2^(n-1):1 MUX and an inverter: n-1 of the input variables are used as select lines and the remaining one drives the data inputs of the MUX.
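For completeness, here is a minimal VHDL sketch of this 4-variable example (the entity and signal names are my own). B, C and D drive the select lines, and each data input is tied to '0', '1', A or ABar as read off from the truth table above (I0 = 0, I1 = 1, I2 = I3 = I4 = A, I5 = ABar, I6 = 0, I7 = ABar):

-- Minimal sketch (illustrative): F(A,B,C,D) = Σ(1,5,7,9,10,11,12) realised in the
-- style of an 8:1 mux, with B, C, D as select lines and A / not A on the data inputs.
library ieee;
use ieee.std_logic_1164.all;

entity func_8to1_mux is
    port (
        A, B, C, D : in  std_logic;
        F          : out std_logic
    );
end func_8to1_mux;

architecture rtl of func_8to1_mux is
    signal sel : std_logic_vector(2 downto 0);
begin
    sel <= B & C & D;

    with sel select
        F <= '0'   when "000",   -- I0
             '1'   when "001",   -- I1
             A     when "010",   -- I2
             A     when "011",   -- I3
             A     when "100",   -- I4
             not A when "101",   -- I5 = ABar
             '0'   when "110",   -- I6
             not A when "111",   -- I7 = ABar
             'X'   when others;  -- std_logic also has metavalues; cover them
end architecture;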
