Delay line based Time to digital converter

A time-to-digital converter (TDC) is a circuit that digitizes time; i.e., it converts a time interval into a digital number. In other words, a time-to-digital converter measures the time interval between two events and represents that interval in the form of a digital number.

TDCs are used in places where the time interval between two events needs to be determined. These two events may, for example, be represented by rising edges of two signals. Some applications of TDCs include time-of-flight measurement circuits and All-Digital PLLs.

Delay line based time-to-digital converter: This is a very primitive TDC and involves a delay line which is used to delay the reference signal. The other signal is used to sample the state of the delay chain. Each stage of the delay chain feeds a flip-flop or a latch which is clocked by the sampling signal. Thus, the output of the TDC forms a thermometer code, as a stage shows a ‘1’ if the reference signal has passed it and a ‘0’ otherwise. The schematic diagram of a delay line based time-to-digital converter is shown in figure 1 below:

Figure 1: Delay line based Time-to-digital converter


The VHDL code for delay line based time-to-digital converter is given below:
-- This is the module definition of a delay line based time-to-digital converter.
library ieee;
use ieee.std_logic_1164.all;

entity tdc is
    generic (
        number_of_bits : integer := 64
    );
    port (
        retimed_clk  : in  std_logic;
        variable_clk : in  std_logic;
        tdc_out      : out std_logic_vector (number_of_bits-1 downto 0);
        reset        : in  std_logic
    );
end entity;

architecture behavior of tdc is
    -- buffd4 is the buffer cell used as the delay element of the chain
    component buffd4 is
        port (
            I : in  std_logic;
            Z : out std_logic
        );
    end component;
    signal buf_inst_out : std_logic_vector (number_of_bits downto 0);
begin
    -- Delay chain: the signal to be measured propagates through a chain of buffers
    buf_inst_out(0) <= variable_clk;
    tdc_loop : for i in 1 to number_of_bits generate
        buf_inst : buffd4 port map (
            I => buf_inst_out(i-1),
            Z => buf_inst_out(i)
        );
    end generate;

    -- Sampling: the state of the delay chain is captured on the rising edge of retimed_clk
    process (reset, retimed_clk)
    begin
        if reset = '1' then
            tdc_out <= (others => '0');
        elsif retimed_clk'event and retimed_clk = '1' then
            tdc_out <= buf_inst_out(number_of_bits downto 1);
        end if;
    end process;
end architecture;
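Once the flops have captured the state of the delay chain, the measured interval is simply the number of stages the signal has passed multiplied by the per-stage delay. Below is a minimal C++ sketch of this decoding; the 25 ps per-buffer delay is only an assumed placeholder, as the real value depends on the buffd4 cell and operating conditions.

#include <bitset>
#include <iostream>

// Assumed per-stage (buffd4) delay in picoseconds; the actual value is
// technology- and operating-condition-dependent and must be characterized.
constexpr double kBufferDelayPs = 25.0;

// Decode a thermometer-coded TDC sample into a time estimate:
// measured interval = (number of stages passed) * (per-stage delay).
double tdcToPicoseconds(const std::bitset<64>& thermometer_code) {
    return static_cast<double>(thermometer_code.count()) * kBufferDelayPs;
}

int main() {
    std::bitset<64> sample;                      // value captured on tdc_out
    for (int i = 0; i < 13; ++i) sample.set(i);  // suppose 13 stages were passed
    std::cout << "Measured interval ~ " << tdcToPicoseconds(sample) << " ps\n";  // 325 ps
}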


Hope you’ve found this post useful. Let us know what you think in the comments.

Integer to string conversion and vice-versa in c++

While programming, we often need to convert an integer into a string and vice-versa. Some example scenarios are as follows:
  • One may want to concatenate a string and an integer
  • Read input from a file that contains multiple integers and report their sum

There are multiple ways to do it. C provides functions such as atoi (ASCII to integer) and sprintf for this task, while C++11 adds std::stoi (string to integer) and std::to_string. C++ also provides an object-oriented approach for the same via the sstream library, which is used in the examples below.

Concatenation of a string and an integer: The following piece of code concatenates a string and an integer and returns the result as a string. It first declares an object of class stringstream and simply inserts the number into that object using the << operator; there is nothing special that we need to do for the same.
#include <sstream>
#include <string>
#include <iostream>
using namespace std;
string concatenate(const string &str,int num) {
    stringstream s; // stringstream class is defined in sstream
    s<<num;      // the integer num is inserted into the stream s
    string ret = str + s.str(); // s.str() returns the string contents of s
    return ret;
}

String to integer conversion: The following function takes two strings as input, converts them into integers by extracting from objects of istringstream type, and returns their sum.
int sum(const string &str1, const string &str2) {
    istringstream s1(str1); // istringstream is defined in sstream
    istringstream s2(str2);
    int in1, in2;
    s1 >> in1;
    s2 >> in2;
    return in1 + in2;
}
The following piece of code calls the above two functions to perform concatenation and calculations:
int main() {
    cout << concatenate("my birthday is on this month of ", 25) << endl;
    cout << "sum = " << sum("1", "2") << endl;
    cout << concatenate("sum is ", sum("1", "2"));
}
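For completeness, since C++11 the standard library also offers std::to_string and std::stoi, which make both conversions one-liners without sstream; a minimal sketch:

#include <iostream>
#include <string>

int main() {
    // Integer to string using std::to_string (C++11)
    std::string msg = "my birthday is on this month of " + std::to_string(25);

    // String to integer using std::stoi (C++11)
    int total = std::stoi("1") + std::stoi("2");

    std::cout << msg << "\n" << "sum = " << total << "\n";
}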


Thermometer code

What is thermometer code: Thermometer code resembles the reading produced by a thermometer. In thermometer code, the value representing a number ‘N’ has the lowermost ‘N’ bits as ‘1’ and the others as ‘0’. So, to move from N to ‘N+1’, just change the rightmost ‘0’ to ‘1’. Figure 1 below shows the thermometer codes for values from ‘0’ to ‘7’. As is evident, each value resembles a reading on a thermometer; this is how the thermometer code got its name. Flash ADCs and time-to-digital converters (TDCs) are some of the circuits that utilize thermometer code.

A thermometer code is a series of zeroes followed by a series of ones. An 8-symbol thermometer code needs 7 bits to represent all of its symbols.
Figure 1: Thermometer codes for values 0 to 7

Value    Thermometer code
0        0000000
1        0000001
2        0000011
3        0000111
4        0001111
5        0011111
6        0111111
7        1111111

Characteristics of thermometer code:
  • Each symbol in thermometer code is a sequence of 0s followed by a sequence of 1s
  • There cannot be 0s in-between two 1s. For example, a symbol 01011 is invalid in thermometer code
  • For an n-bit binary code, the corresponding thermometer code has 2^n symbols; hence, 2^n - 1 bits will be needed to represent the thermometer code for the same (a short software sketch of this mapping follows).
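In software terms, the mapping from a binary value N to its thermometer code is just "set the lowest N bits"; the following small C++ sketch (7-bit codes for a 3-bit input, matching the VHDL example below) illustrates it:

#include <bitset>
#include <cstdint>
#include <iostream>

// Thermometer code for value n: the lowest n bits are '1', the rest are '0'.
uint32_t binary_to_thermometer(uint32_t n) {
    return (1u << n) - 1u;   // e.g., n = 3 -> 0000111
}

int main() {
    // Print the 7-bit thermometer codes for a 3-bit binary input (0..7)
    for (uint32_t n = 0; n <= 7; ++n)
        std::cout << n << " -> " << std::bitset<7>(binary_to_thermometer(n)) << "\n";
}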

How to convert from binary to thermometer code: Given below is the VHDL code for a 3-bit binary to thermometer converter. A simple case statement can be utilized for the same.
                                  
library ieee;
use ieee.std_logic_1164.all;

entity bin2therm3bit is
    port (
        binary_input : in  std_logic_vector (2 downto 0);
        therm_output : out std_logic_vector (6 downto 0)
    );
end bin2therm3bit;

architecture Behavioral of bin2therm3bit is
begin
    process (binary_input)
    begin
        case binary_input is
            when "000"  => therm_output <= "0000000";
            when "001"  => therm_output <= "0000001";
            when "010"  => therm_output <= "0000011";
            when "011"  => therm_output <= "0000111";
            when "100"  => therm_output <= "0001111";
            when "101"  => therm_output <= "0011111";
            when "110"  => therm_output <= "0111111";
            when "111"  => therm_output <= "1111111";
            when others => therm_output <= "XXXXXXX";
        end case;
    end process;
end Behavioral;

Hope you’ve found this post useful. Let us know what you think in the comments.


Virtual clock - purpose and timing

What is a virtual clock: By definition, a virtual clock is a clock without any source. Stated more clearly, a virtual clock is a clock that has been defined, but has not been associated with any pin/port. A virtual clock is used as a reference to constrain interface pins: arrivals at input/output ports are related to it with the help of input and output delays.

How to define a virtual clock: The simplest SDC command to define a virtual clock is as follows:
                create_clock -name VCLK -period 10
The above SDC command will define a virtual clock “VCLK” with period 10 ns.

Purpose of defining a virtual clock: The advantage of defining a virtual clock is that we can specify the desired latency for the virtual clock. As mentioned above, a virtual clock is used to time interface paths. Figure 1 shows a scenario where it helps to define a virtual clock. Reg-A is a flop inside the block that sends data out through PORT. Since it is a synchronous signal, we can assume it is captured by a flop (Reg-B) sitting outside the block. Now, within the block, the path to PORT can be timed by specifying an output delay for this port with respect to a clock synchronous to clock_in (the real clock, referred to as R_CLK/RCLK below). We could specify the delay with respect to clock_in itself, but then there lies the difficulty of specifying the clock latency: if we specify the latency for clock_in, it will be applied to Reg-A also. Applying the output delay with respect to a real clock causes input ports to get relaxed and output ports to get tightened after the clock tree has been built. Let us elaborate on this in some detail below. Let us assume the clock period to be 10 ns and the budget allocated inside the block to be 3 ns; thus, we have a "set_output_delay" of 7 ns.



Figure 1: Figure to illustrate virtual clock

Case 1: Applying "set_output_delay" with respect to real clock (R_CLK)
Pre-CTS scenario: Here, if we apply any latency to the clock, it will be applied both to the launch as well as the capture register (the capture register is imaginary here). So, we ultimately get a full cycle to time the path. In other words, whether or not we apply a latency to the clock, the path is timed as intended.
Post-CTS scenario: Post-CTS, we need to apply "set_propagated_clock RCLK" in order for clock latencies to come into effect. Doing so, the launch register's actual clock latency comes into the picture. However, since the capture register is imaginary, there is no clock tree built to it and its latency will be zero. So, we get (clock_period - RCLK_latency) as the actual phase shift available to time the path. Thus, the timing path gets tightened by RCLK_latency.
Case 2:  Applying set_output_delay with respect to virtual clock (VCLK)
Pre-CTS scenario: In this case, in order to provide a full cycle for the path to be timed, if we have applied any latency to RCLK, we will have to apply the same latency to VCLK as well.
Post-CTS scenario: After CTS is done and clocks are propagated, the network latency of RCLK is overridden by its actual latency. But VCLK is not propagated, and its source + network latencies will still be reflected as applied in the constraints. If (VCLK_source_latency + VCLK_network_latency_user) is equal to (RCLK_source_latency + RCLK_network_latency_CTS), we will still see the same timing path as we saw pre-CTS.
Thus, the solution to the problem is to define a virtual clock and apply the output delay with respect to it. Keeping the latency of the virtual clock equal to the post-CTS latency of the real clock will solve the problem.
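As a rough sketch of the resulting constraints (PORT, the 10 ns period and the 7 ns output delay are the values assumed above; the latency is left as a placeholder that should be kept equal to the real clock's post-CTS latency):

                create_clock -name VCLK -period 10
                set_output_delay -clock VCLK -max 7 [get_ports PORT]
                set_clock_latency -source <RCLK_latency_estimate> [get_clocks VCLK]

With this, the output port is timed against VCLK, and propagating RCLK after CTS does not change the phase shift available to the path.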

Can you think of any other method that can serve the purpose of a virtual clock?

Interesting problem – Latches in series


Problem: 100 latches (either all positive or all negative) are placed in series (figure 1). How many cycles of latency will it introduce?

Figure 1 : 100 negative level-sensitive latches in series
As we know, the setup check between latches of the same polarity (both positive or both negative) is zero cycle, with half a cycle of time borrow allowed, as shown in figure 2 below for negative level-sensitive latches:

Figure 2: Setup check between two negative level-sensitive latches

So, if there are a number of same-polarity latches in series, each will form a zero-cycle setup check with the next latch, resulting in an overall zero-cycle phase shift.

As is shown in figure 3, all the latches in series are borrowing time, without any actual phase shift happening. If we have a design with all latches, there cannot be a next-state calculation if all the latches are either positive level-sensitive or negative level-sensitive. In other words, for a state-machine implementation, there should not be latches of the same polarity in series.

Figure 3 : Timing for 100 latches in series


Hope you’ve found this post useful. Let us know what you think in the comments.


STA

Static timing analysis (STA) is a vast domain involving many sub-fields. It involves computing the limits of delay of elements in the circuit without actually simulating it. In this post, we have tried to list down all the posts that an STA engineer cannot do without. Please add your feedback in comments to make reading it a more meaningful experience.

  • Metastability - This post discusses the basics of metastability and how to avoid it.
  • Lockup latch - The basics of lockup latch, both from timing and DFT perspective have been discussed in this post.
  • Clock latency - Read this if you wish to get acquainted with the terminology related to clock latency
  • Data checks - Non-sequential setup and hold checks have been discussed, very useful for beginners
  • Synchronizers - Different types of synchronizers have been discussed in detail
  • On-chip variations - Describes on-chip variations and the methods undertaken to deal with these
  • Temperature inversion - Discusses the concept of temperature inversion and conductivity trends with temperature
  • Timing arcs - Discusses the basics of timing arcs, positive and negative unateness, cell arcs and net arcs etc.
  • Basics of latch timing - Definition of latch, setup time and hold timing of a latch, latch timing arcs are discussed

XOR/XNOR gate using 2:1 MUX

2-input XOR gate using a 2:1 multiplexer: As we know, a 2:1 multiplexer selects between two inputs depending upon the value of its select input. The function of a 2:1 multiplexer can be given as:

OUT = IN0 when SEL = 0 ELSE IN1

Also, a 2-input XOR gate produces a ‘1’ at the output if the two inputs have different values, and a ‘0’ if the inputs are the same. The truth table of an XOR gate is given as:

A   B   OUT
0   0   0
0   1   1
1   0   1
1   1   0
Truth table of XOR gate

In the truth table of XOR gate, if we fix a value, say B, then

OUT = A WHEN B = 0 ELSE A’


The two equations above become identical if we connect the complement of IN0 to IN1 in the multiplexer (IN0 = A, IN1 = A’ and SEL = B). This is how a 2:1 multiplexer implements an XOR gate. Figure 1 below shows the implementation of a 2-input XOR gate using a 2:1 multiplexer.

Implementing a 2-input XOR gate using a 2:1 Multiplexer




2-input XNOR gate using a 2:1 multiplexer: Similarly, the truth table of XNOR gate can be written as:

A   B   OUT
0   0   1
0   1   0
1   0   0
1   1   1
Truth table of XNOR gate

In the truth table, if we fix, say A, then

OUT = B WHEN A = 1, ELSE B’


Thus, the XNOR gate is the complement of the XOR gate. It can be implemented by connecting B to the select input, A to IN1 and Abar to IN0.

2-input XNOR gate using 2:1 multiplexer
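As a quick sanity check, the following small C++ sketch models a 2:1 mux behaviorally and confirms over all input combinations that the two constructions above indeed behave as XOR and XNOR (the model and names are just for illustration):

#include <iostream>

// Behavioral model of a 2:1 multiplexer: OUT = IN0 when SEL = 0, else IN1
bool mux2(bool in0, bool in1, bool sel) {
    return sel ? in1 : in0;
}

int main() {
    for (int a = 0; a <= 1; ++a) {
        for (int b = 0; b <= 1; ++b) {
            bool xor_out  = mux2(a, !a, b);   // XOR : SEL = B, IN0 = A,  IN1 = A'
            bool xnor_out = mux2(!a, a, b);   // XNOR: SEL = B, IN0 = A', IN1 = A
            std::cout << "A=" << a << " B=" << b
                      << "  XOR=" << xor_out  << " (expected " << (a != b) << ")"
                      << "  XNOR=" << xnor_out << " (expected " << (a == b) << ")\n";
        }
    }
}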



Latch using 2:1 MUX

As we know, a 2:1 multiplexer selects between two inputs depending upon the value of its select input. Also, a latch holds its previous value when its enable pin is in a particular state (‘0’ for positive level sensitive latch and ‘1’ for negative level sensitive latch).

So, to build a positive level-sensitive latch from a multiplexer, feed the output back to the IN0 pin of the multiplexer, connect the data input to IN1 and the clock input to the SEL pin of the multiplexer. A negative level-sensitive latch can be built similarly, by feeding the output back to IN1 and the data input to IN0 instead. Figure 1 below shows the diagram representation for the same.

Figure 1: Latch built using a 2:1 multiplexer
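A minimal C++ behavioral sketch of the positive level-sensitive version (output fed back to IN0, data on IN1, clock on SEL; the names are illustrative only):

#include <iostream>

// 2:1 mux: OUT = IN0 when SEL = 0, else IN1
bool mux2(bool in0, bool in1, bool sel) {
    return sel ? in1 : in0;
}

// Positive level-sensitive latch built from the mux:
// IN0 = fed-back output (hold), IN1 = data input, SEL = clock.
bool latch_step(bool q_prev, bool d, bool clk) {
    return mux2(q_prev, d, clk);   // transparent when clk = '1', holds when clk = '0'
}

int main() {
    bool q = false;
    q = latch_step(q, true, true);    // clk high: latch is transparent, q follows d -> 1
    q = latch_step(q, false, false);  // clk low: latch holds its previous value -> still 1
    std::cout << "q = " << q << "\n";
}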


Hope you’ve found this post useful. Let us know what you think in the comments.


String class vs dynamically allocated array

String class should be preferred over a dynamically allocated array due to the following limitations of dynamically allocated arrays:
  1. Whenever the user calls the new operator, it becomes his/her responsibility to call delete as well to avoid memory leaks.
  2. The user must ensure that the correct form of delete is called: for a single-element allocation delete should be used, and for an array allocation delete[] should be used. Using the wrong version results in undefined behavior.
  3. The user has to make sure that there is exactly one delete for each allocation.
The string class provides the function c_str() for backward compatibility with C APIs that expect a const char* argument. Hence, there is hardly any reason not to use string in place of an array of char, as illustrated in the short sketch below.
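For example, a std::string can be handed directly to a C API such as fopen through c_str() (the file name below is just an illustration):

#include <cstdio>
#include <string>

int main() {
    std::string path = "config.txt";                // hypothetical file name
    std::FILE* f = std::fopen(path.c_str(), "r");   // the C API expects const char*
    if (f) {
        std::puts("file opened");
        std::fclose(f);
    } else {
        std::puts("file not found");
    }
}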

But, in a multi-threaded environment, there can be performance issues with the string class because of the reference-counting optimization. Basically, reference counting can eliminate unnecessary memory allocations and copying of characters. But in a multi-threaded environment, the time saved by avoiding unnecessary allocations and copying is dwarfed by the time spent behind the scenes on concurrency control.

Hence, in a multi-threaded environment, the user has the following options:
  1. Check whether the library implementation of the string class allows you to disable the reference-counting optimization.
  2. Check for an alternative implementation of the string class that does not have the reference-counting optimization (this can usually be verified in the copy constructor of the class).
  3. Consider using vector<char> instead of string. The string class' member functions will not be available, but most of the functionality is available through STL algorithms.
Options 1 and 2 are not really solutions; they merely involve checking the string class or library implementation. Option 3 is a real solution.

Hope you’ve found this post useful. Let us know what you think in the comments.

References: Effective STL by Scott Meyers


What is Static Timing Analysis?

Static timing analysis (STA) is an analysis method of computing the max/min delay values of a complete circuit without actually simulating the full circuit. In STA, static delays such as gate delays and net delays are considered for each path. These delays are then compared against the required bounds on the delay values and/or the relationship between the delays of different gates. In STA, the circuit to be analyzed is broken down into timing paths consisting of gates, registers and the nets connecting these. Normally, timing paths start from and end at registers or the chip boundary. Based on the origin and termination of data, timing paths can be categorized into four categories:

        1.) Input to register paths: These paths start at the chip boundary (input ports) and end at registers
        2.) Register to register paths: These paths start at a register output pin and terminate at a register input pin
        3.) Register to output paths: These paths start at a register and end at the chip boundary (output ports)
        4.) Input to output paths: These paths start at the chip boundary (input ports) and end at the chip boundary (output ports)
Timing paths from each start-point to end-point are constrained to have maximum and minimum delays. For example, for register-to-register paths, each path can take a maximum of one clock cycle (minus the input/output delay in the case of input/output-to-register paths). The minimum delay of a path is governed by the hold timing requirement of its endpoint. Thus, the maximum delay taken by a timing path governs the maximum frequency of operation.
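For a register-to-register path with clock period T_clk, these bounds take the standard form (ignoring clock skew and uncertainty for simplicity):

T_clk >= t_clk-to-q + t_comb(max) + t_setup        (setup / max-delay check)
t_clk-to-q + t_comb(min) >= t_hold                 (hold / min-delay check)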
As stated before, static timing analysis does timing analysis without actually simulating the circuit. The delays of cells are picked from the respective technology libraries. The delays are available in the libraries in tabulated form on the basis of input transition and output load, having been characterized by simulating the cells over a range of boundary conditions. Net delays are calculated based upon R and C models.

One important characteristic of static timing analysis that must be discussed is that it checks the static delay requirements of the circuit without applying any vectors; hence, the delays calculated are the maximum and minimum bounds of the delays that will occur in real application scenarios with vectors applied. This enables static timing analysis to be fast and inclusive of all the boundary conditions. Dynamic timing analysis, on the contrary, applies input vectors, so it is very slow; however, it is necessary to certify the functionality of the design. Thus, static timing analysis guarantees the timing of the design, whereas dynamic timing analysis guarantees functionality for real application-specific input vectors.

I hope you’ve found this post useful. Let me know what you think in the comments. I’d love to hear from you all.

VLSI design interview questions

VLSI stands for Very Large Scale Integration, and it enables the creation of integrated circuits by incorporating thousands, or even millions, of transistors on a single chip. Before VLSI, only small functionalities could be integrated onto a chip; most ICs could perform only a small set of functions such as an ALU, counters etc. With the help of VLSI technology, it has become possible to get a whole system designed on a single chip.

Getting into the field of VLSI demands knowledge of some of the basic concepts, be it systems design, timing analysis, RTL design etc. We have tried to collate a few of the topics in the links below. Going through these should be helpful for you. Looking for your feedback for further improvement.

Defining a clock signal in VHDL

Clock is the backbone of any synchronous design. For test benches, a clock is the most desired signal, as almost every design requires one. Going a bit deeper, a clock signal is a binary signal that changes state every few time units. So, defining a clock in VHDL is pretty simple, as shown in the following code:
    signal my_clock : std_logic;

    process
    begin
        my_clock <= '0';
        wait for 5 ns;
        my_clock <= '1';
        wait for 5 ns;
    end process;
The above code defines a clock of period 10 ns with 5 ns high time and 5 ns low time, hence a 50% duty cycle. Since we are assigning a value to my_clock in the code, it can either be defined as a signal or an output. Most commonly, such clocks are defined in test benches, hence they are internal signals. The high time and low time do not always need to be the same. You can always define a clock that has different high and low times, as shown below:
    signal my_clock : std_logic;

    process
    begin
        my_clock <= '0';
        wait for 2 ns;
        my_clock <= '1';
        wait for 8 ns;
    end process;
As we can see, my_clock now has a duty cycle of 80%; i.e., a high time of 80% (8 ns) and a low time of 20% (2 ns).

Defining a clock in this way, obviously, is not synthesizable, as we are using delays in the code, and delays cannot be synthesized. Hence, this way of defining a clock can only be used in a test bench to test a piece of code. If you need to write a synthesizable clock generator, you have to use structural coding. The simplest of clock generation circuits is a ring oscillator (an odd number of inverters connected in a loop), but it will produce a variable-frequency clock because the delay of the inverters changes with operating conditions.

Some good reads about processes

Today I was supposed to write a function that finds the user name or user ID of whoever is running the application. I was looking for C++ APIs that could fetch this information for me and found two APIs, getuid and geteuid. I could not understand the difference between the two, so I did some research around it and found one interesting paper and a nice example. Unfortunately, I could not read the whole paper, but it looks like cool stuff, so I thought of sharing it with you. You can check the links in the References at the end of this post.

Some facts about processes: Each process has a set of user IDs and group IDs that determine which system resources, like network ports and files, the process can access. Certain privileged user IDs or group IDs allow a process to access restricted system resources. For example, user ID 0 is reserved for the superuser (root) and allows a process to access all resources.

Each process has three user IDs:

Real user ID (ruid): identifies the owner of the process.
Effective user ID (euid): used in most access control decisions.
Saved user ID (suid): stores a previous user ID so that it can be restored later.

Similarly, a process has three group IDs: real group ID, effective group ID and saved group ID, which have the same meaning as the corresponding user IDs.

In Linux, a process also has an fsuid and an fsgid for access control to the filesystem. The fsuid usually follows the value of the euid unless it is explicitly set by setfsuid, and similarly the fsgid follows the effective group ID unless explicitly set by setfsgid.

Since access control is based on the effective user ID, a process gains privilege by assigning a privileged user ID to its effective ID and drops privilege by removing the privileged user ID from its effective user ID. Privilege can be dropped temporarily or permanently.

Gaining or dropping privilege temporarily: The process assigns the privileged user ID to its effective user ID and moves the original effective ID to the saved ID, so that the privilege change can be reverted later.

Gaining or dropping privilege permanently: The process assigns or removes the privileged ID from all three user IDs. As there is no way left to retrieve the previous ID, the privilege is gained or dropped permanently.
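As a small POSIX-only illustration (simplified; the exact ID juggling depends on the platform), a set-user-ID program could temporarily drop and then regain privilege like this:

#include <iostream>
#include <sys/types.h>
#include <unistd.h>   // getuid, geteuid, seteuid

int main() {
    uid_t real_uid = getuid();        // real user ID: the user who started the program
    uid_t effective_uid = geteuid();  // effective user ID: used for access control
    std::cout << "ruid = " << real_uid << ", euid = " << effective_uid << "\n";

    // Temporarily drop privilege: make the effective ID equal to the real ID.
    // The previous effective ID remains available in the saved ID.
    if (seteuid(real_uid) != 0) {
        std::cerr << "seteuid failed\n";
        return 1;
    }
    std::cout << "after drop: euid = " << geteuid() << "\n";

    // Regain privilege from the saved ID (works only if the program is set-uid).
    if (seteuid(effective_uid) != 0) {
        std::cerr << "could not restore privilege\n";
    }
    return 0;
}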

References :
http://www.cs.berkeley.edu/~daw/papers/setuid-usenix02.pdf
http://www.gnu.org/software/libc/manual/html_node/Setuid-Program-Example.html