C function that converts hexadecimal value to decimal value.

Hexadecimal to decimal conversion is something that is often needed in hardware programming. The functions below can be used for hexadecimal to decimal conversion in C:
#include <stdio.h>
#include <string.h>

/* Returns the decimal value of a single hex digit, or -1 for an invalid character */
int get_value(char a)
{
    if (a >= '0' && a <= '9')
        return a - '0';
    else if (a >= 'A' && a <= 'F')
        return a - 'A' + 10;
    else if (a >= 'a' && a <= 'f')
        return a - 'a' + 10;
    else
        return -1;
}

/* Converts a hexadecimal string to its decimal value; returns -1 for invalid input */
int htoi(char a[])
{
    int len = strlen(a);
    int temp = 0;
    for (int i = 0; i < len; i++) {
        int digit = get_value(a[i]);
        if (digit == -1)
            return -1;
        temp = temp * 16 + digit;
    }
    return temp;
}

int main()
{
    char a[] = "f0";
    int b = htoi(a);
    if (b == -1)
        printf("invalid input");
    else
        printf("decimal value is %d", b);
    return 0;
}

Interesting programming quiz : Array Bound Read Error

Problem: Can you figure out what is wrong with following piece of code?


#include <iostream>

int main() {
    int a[5] = {1, 2, 3, 4, 5};
    for (int i = 4; a[i] >= 0 && i >= 0; i--) {
        std::cout << "ith element of array is " << a[i] << std::endl;
    }
}

I would suggest you try it yourself before scrolling down to see the answer. It's quite interesting.
......
......
......
......
......
......
......
......
......
......
......
......
......
......
......
......
......
......
Answer: Here, as one can figure out, the intention is to print the array elements from the end until we hit a negative number. At first look it may seem fine, but unfortunately it will end up in an ABR (Array Bound Read).

Explanation: After the completion of the 5th iteration, i.e. when i = 0, the program decrements i, so i becomes -1. It then tries to check the loop condition, which results in reading a[-1]. Since an array can have indexes only greater than or equal to 0, this results in an error. Trying to read array elements outside the allowed indexes is termed an Array Bound Read error. One should avoid such conditions because the behavior is undefined: the program can produce garbage results, it can crash anytime, or, if you are lucky, it may even run successfully. Instead, it should be

for (int i = 4; i >= 0 && a[i] >= 0; i--) {

i.e. first check the index value and only then do the array access operation.

Here, with the above solution, one more interesting thing comes up to understand. In a logical AND (&&) operation, condition1 is evaluated first; only if it is true is condition2 evaluated; otherwise the whole expression is false without condition2 ever being evaluated. This is known as short-circuit evaluation.

For example,
#include <iostream>
int main() {
    int i = 0;
    int j = 1;
    if ((i == 1) && (++j == 3)) {
        std::cout << "inside if" << std::endl;
    }
    std::cout << "i is " << i << " and j is " << j << std::endl;
}
Output :
i is 0 and j is 1


Here, as you can see, control does not go into the if branch, as the condition is not true. Since condition1 (i == 1) is false, condition2 is not even checked, i.e. the value of j is not incremented.


Internally, the compiler might perform a transformation like the following to evaluate the && operation:

bool cond = (i == 1);
if (cond) {
    cond = (++j == 3);
}
if (cond) {
    std::cout << "inside if" << std::endl;
}

Function Overloading

Function overloading is a feature of many programming languages, including C++. It allows a user to write multiple functions with the same name but with different signatures. On calling the function, the version corresponding to the matching signature is invoked. A function signature includes the function's parameters/arguments, but it does not include the return type. Signatures may differ in the number of parameters or in the types of the parameters. Let us illustrate with the help of a few examples:

Example 1: The two functions below are overloaded, since the types of their arguments differ:
int func(int a, int b);
double func(double a, double b);
Example 2: The two functions below are overloaded because they differ in the number of arguments:
int func(int a, int b);
int func(int a, int b, int c);
Example 3: The two functions below are not overloaded because they differ only in their return type; the number and types of their arguments are the same. In fact, declaring both is a compilation error:
void func(int a, int b, int c);
int func(int a, int b, int c);


Please note that C does not support function overloading because there is no concept of name mangling in C. C++, on the other hand, does support function overloading, as name mangling is supported in C++. Name mangling is the encoding of a function's name and signature into a unique internal name, which the C++ compiler uses to refer to functions. For instance, in example 1 above, the mangled names of the functions would look something like:
func__int_int
func__double_double

This way, the C++ compiler can handle function overloading.


Note: the above are not the actual mangled names; compilers generate more complicated names. This is just for understanding.
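As a quick, self-contained illustration (the function bodies and values here are just for demonstration), the program below shows overload resolution in action; the compiler picks the version matching the argument types:

#include <iostream>

// Two overloads of func differing in parameter types
int func(int a, int b) {
    std::cout << "int version called" << std::endl;
    return a + b;
}

double func(double a, double b) {
    std::cout << "double version called" << std::endl;
    return a + b;
}

int main() {
    func(1, 2);      // resolves to func(int, int)
    func(1.5, 2.5);  // resolves to func(double, double)
}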



On-chip variations – the STA takeaway

Static timing analysis of a design is performed to estimate its working frequency after it has been fabricated. Nominal delays of the logic gates as per characterization are calculated, and some pessimism is applied on top of that to check whether there will be any setup and/or hold violation at the target frequency. However, not all the transistors manufactured are alike. Also, not all the transistors receive the same voltage and are at the same temperature. The characterized delay is just the delay with the maximum probability of occurrence. The delay variation of a typical sample of transistors on silicon follows the curve shown in figure 1. As shown, most of the transistors have nominal characteristics. Typically, timing signoff is carried out with some margin; by doing this, the designer tries to ensure that more transistors are covered. There is a direct relationship between margin and yield: the greater the margin taken, the larger the yield. However, after a certain point, there is not much increase in yield from increasing margins; past that point, the margin costs the designer more than it saves through increased yield. Therefore, margins should be applied so as to give maximum profit.

Most of the transistors have close to nominal delay. However, some transistors have delay variations. Theoretically, no bound exists for delay variations; however, the probability of a delay decreases as it gets farther from nominal.
Figure 1: Number of transistors vs. delay for a typical sample of transistors on silicon


We have discussed above how variations in the characteristics of transistors are taken care of in STA. These variations in transistors' characteristics as fabricated on silicon are known as OCV (On-Chip Variations). The reason for OCV, as discussed above, is that all transistors on-chip are not alike in geometry, in their surroundings, and in position with respect to the power supply. The variations are mainly caused by three factors:
  • Process variations: The process of fabrication includes diffusion, drawing out of metal wires, gate drawing etc. The diffusion density is not uniform throughout the wafer. Also, the width of a metal wire is not constant. Let us say the width is 1 um ± 20 nm. So, the metal delays are bound to lie within a range rather than at a single value. Similarly, the diffusion regions of different transistors will not have exactly the same diffusion concentrations. So, all transistors are expected to have somewhat different characteristics.
  • Voltage variation: Power is distributed to all transistors on the chip with the help of a power grid. The power grid has its own resistance and capacitance, so there is a voltage drop along the power grid. Transistors situated close to the power source (or those having less resistive paths from the power source) receive a larger voltage than other transistors. That is why delay variation is seen across transistors.
  • Temperature variation: Similarly, all the transistors on the same chip cannot be at the same temperature. So, there are variations in characteristics due to the variation in temperature across the chip.


How to take care of OCV: To tackle OCV, the STA of the design is closed with some margins. Various margining methodologies are available. One of these is applying a flat margin over the whole design. However, this is overly pessimistic, since some cells may be more prone to variations than others. Another approach is applying cell-based margins, based on silicon data showing which cells are more prone to variations. There also exist methodologies based on different theories, e.g. location-based margins and statistically calculated margins. As STA advances, more accurate and faster margining methodologies keep coming into existence.
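To make the flat-margin idea concrete, below is a minimal sketch (all numbers, including the derate value, are hypothetical; real STA tools apply characterized derates per cell, per path and per corner):

#include <iostream>
#include <vector>

int main() {
    // Nominal cell delays (ns) along a hypothetical data path
    std::vector<double> cell_delays = {0.4, 0.6, 0.5};
    double clock_period = 2.0;   // ns
    double ocv_derate   = 1.10;  // flat 10% late derate modeling on-chip variations

    // Derated path delay: each nominal delay is scaled by the derate factor
    double path_delay = 0.0;
    for (double d : cell_delays)
        path_delay += d * ocv_derate;

    // Setup slack with the flat margin applied
    double slack = clock_period - path_delay;
    std::cout << "Derated path delay: " << path_delay << " ns, setup slack: "
              << slack << " ns" << std::endl;
}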

Latency and throughput – the two measures of system performance

Performance of a system is one of the most stringent criteria for its success. While performance increases desirability among customers, cost is what makes the system affordable. This is why system designers aim for maximum performance within the available resources, respecting constraints such as power and area. There are two related parameters that determine the performance output of a system:

Throughput - Throughput is a measure of the productivity of the system. In electronic/communication systems, throughput refers to the rate at which output data is produced. The higher the throughput, the more productive the system. In most cases, it is measured as the time difference between two consecutive outputs (the nth and (n+1)th). Throughput also refers to the rate at which input data can be applied to the system.
Let us discuss with the help of an example:

throughput summary diagram


The figure above depicts the throughput of a 3-number adder. The result of the input set applied in the 1st clock cycle appears at the output in the 3rd clock cycle; the next input set is applied in the 4th clock cycle and its output comes in the 6th clock cycle. Hence, the throughput of the above design is ⅓ per clock cycle. As we can see from the diagram, the first input is applied in the first clock cycle and the 2nd input is applied in the 4th clock cycle. Hence, we can also say that throughput is the rate at which input data can be applied to the system.

Latency - Latency is the time taken by a system to produce the output after the input is applied. It is a measure of the delay response of a design. The higher the latency, the slower the system. In synchronous designs, it is measured in terms of number of clock cycles. In combinational designs, latency is basically the propagation delay of the circuit. In non-pipelined designs, latency improvement is a major area of concern. In more general terms, it is the time difference between the output and the input times.
Latency
Relationship between throughput and latency: Latency and throughput are inter-related. It is desired to have maximum throughput and minimum latency. Improving latency and/or throughput might make the system costlier. Let us take an example. Consider a park with 3 rides, where each ride takes 5 minutes. A child can take these rides sequentially, i.e. ride 1, ride 2 and then ride 3. First, let us assume that only one child is allowed in the park at a time; while one child is taking the rides, no one else is allowed to enter. Thus, the throughput of the park is one child per 15 minutes and the latency is 15 minutes. Now, let us assume that as soon as a child has finished taking ride 1, another child is allowed to enter the park. In this case, the throughput becomes one child per 5 minutes, whereas the latency is still 15 minutes. Thus, we have increased the throughput of the system without affecting latency, and at the same cost.
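Putting the park example into code, here is a minimal sketch using the numbers above:

#include <iostream>

int main() {
    const int num_rides = 3;
    const int ride_time = 5;  // minutes per ride

    // Latency: time for one child to finish all rides (same in both schemes)
    int latency = num_rides * ride_time;  // 15 minutes

    // Sequential admission: the next child enters only after the previous one
    // finishes, so one child comes out every 'latency' minutes.
    int sequential_interval = latency;    // 15 minutes per child

    // Pipelined admission: a child enters as soon as ride 1 is free, so one
    // child comes out every 'ride_time' minutes once the pipeline is full.
    int pipelined_interval = ride_time;   // 5 minutes per child

    std::cout << "Latency: " << latency << " minutes" << std::endl;
    std::cout << "Sequential throughput: one child per " << sequential_interval
              << " minutes" << std::endl;
    std::cout << "Pipelined throughput: one child per " << pipelined_interval
              << " minutes" << std::endl;
}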

What is Logic Built-in Self Test (LBIST)

LBIST stands for Logic Built-In Self Test. As VLSI marches into deep sub-micron technologies, LBIST is gaining importance due to the unique advantages it provides. LBIST refers to a self-test mechanism for testing random logic: the logic can be tested with no intervention from the outside world. In other words, a piece of hardware and/or software is built into an integrated circuit to test itself. By random logic is meant any form of hardware (logic gates, memories etc.) that can form a part or whole of the chip. A generic LBIST system is implemented using the STUMPS (Self-Test Using MISR and PRPG) architecture. A typical LBIST system is shown in the figure below:

A typical LBIST system consists of a PRPG, a CUT and a MISR controlled by an LBIST controller
Figure 1: A typical LBIST system


Components of an LBIST system: A typical LBIST system comprises the following:
  1. Logic to be tested, or, as it is called, Circuit Under Test (CUT): The logic to be tested through LBIST is referred to as the Circuit Under Test. Any random logic residing on the chip can be brought under LBIST following a certain procedure.
  2. PRPG (Pseudo-Random Pattern Generator): A PRPG generates the input patterns that are applied to the internal scan chains of the CUT for LBIST testing. In other words, the PRPG acts as a Test Pattern Generator (TPG) for LBIST. A PRPG can use either a counter or an LFSR for pattern generation (see the sketch after this list).
  3. MISR (Multi-Input Signature Register): The MISR compacts the responses of the device to the test patterns applied. An incorrect MISR output indicates a defect in the CUT. In classical language, the MISR acts as an ORA (Output Response Analyzer) for LBIST testing.
  4. A master (LBIST controller): The controller controls the functioning of the LBIST, i.e. clock propagation, initialization, and the flow of scan patterns in and out of the LBIST scan chains.
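As referenced in the list above, here is a minimal sketch of an LFSR-based PRPG feeding a MISR (the polynomial taps, the seed and the stand-in circuit under test are illustrative, not taken from any production LBIST implementation):

#include <cstdint>
#include <iostream>

// 8-bit Fibonacci LFSR acting as a PRPG.
// Taps 8,6,5,4 correspond to the maximal-length polynomial x^8 + x^6 + x^5 + x^4 + 1.
uint8_t prpg_next(uint8_t state) {
    uint8_t feedback = ((state >> 7) ^ (state >> 5) ^ (state >> 4) ^ (state >> 3)) & 1u;
    return (uint8_t)((state << 1) | feedback);
}

// MISR: shifts like an LFSR but also XORs in the response word each cycle,
// compressing the whole response stream into one signature.
uint8_t misr_next(uint8_t signature, uint8_t response) {
    uint8_t feedback = ((signature >> 7) ^ (signature >> 5) ^ (signature >> 4) ^ (signature >> 3)) & 1u;
    return (uint8_t)(((signature << 1) | feedback) ^ response);
}

// Stand-in for the circuit under test: any deterministic function works here.
uint8_t cut_response(uint8_t pattern) { return (uint8_t)(pattern ^ 0xA5); }

int main() {
    uint8_t pattern = 0x01;    // non-zero seed for the LFSR
    uint8_t signature = 0x00;
    for (int i = 0; i < 100; i++) {
        pattern = prpg_next(pattern);                              // generate pattern
        signature = misr_next(signature, cut_response(pattern));   // compress response
    }
    // In a real LBIST run, this signature is compared against a
    // pre-computed golden signature; a mismatch indicates a defect.
    std::cout << "Signature: 0x" << std::hex << (int)signature << std::endl;
}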

One of the most stringent requirements in LBIST testing is the prohibition of X-sources: there cannot be any source of ‘X’ during LBIST testing. By ‘X’ is meant a definite but unknown value: it is either ‘0’ or ‘1’, but it is not known which value is being propagated. In LBIST, all X-sources are masked and a known value is propagated instead.

Why ‘X’ is prohibited in LBIST: As stated above, there cannot be any ‘X’ propagating during LBIST testing. The reason is that LBIST relies on the MISR to compute the signature of the LBIST patterns. Since the resulting signature must match a unique expected value, any unknown value can corrupt the signature. So, there cannot be any ‘X’ in LBIST testing.

Advantages of LBIST: As stated above, LBIST has unique advantages that make it desirable, especially in safety-critical designs such as those used in automobiles and aeroplanes. LBIST offers the advantages listed below:
  • LBIST provides a self-test capability to the logic inside the chip; thus, the chip can test itself without any external control and interference.
  • It provides the ability to test at higher frequencies, reducing test time considerably.
  • LBIST can run while the chip is in the field operating functionally. Thus, it is very useful in safety-critical applications, wherein faults developed in the field can be detected at startup, before the chip goes into functional mode.

Overheads due to LBIST: Along with many advantages, there are some overheads due to LBIST as mentioned below:
(i) The LBIST implementation involves some on-chip hardware to control LBIST. So, there are area and power impacts; in other words, the cost of the chip increases.
(ii) Also, ‘X’-masking involves the addition of extra logic gates in already timing-critical functional signals, impacting timing as well.
(iii) Another disadvantage of LBIST is that even the on-chip test equipment may fail. This is not a problem when testing with external equipment having proven test circuitry.
References:
  1. Identification and reduction of safe-stating points in LBIST designs 
  2. Logic built-in self-test
  3. Challenges in LBIST verification of high reliability SoCs

Worst Slew Propagation


Worst slew propagation is a phenomenon in static timing analysis. According to it, the worst of the slews at the input pins of a gate is propagated to its output. As we know, the output slew of a logic cell is a function of its input slew and output load. For a multi-input logic gate, the output slew would, in principle, be different for the timing paths through its different input pins. However, this is not the case, because, to maintain a timing graph, each node in the design can have only one slew. So, to cover the worst scenario for setup timing, the slew at each output pin is taken to be the one caused by the input pin having the worst of the slews. The output slew is calculated on the basis of the worst input slew, even if the timing path for which the output slew is being calculated is not through the input pin with the worst slew. Similarly, for hold timing analysis, the best of the slews is propagated, based upon the effect of all the input pins. We can refer to this as best slew propagation.

Let us illustrate with the help of a 2-input AND gate. As shown in figure below, let the slews at the input pins be denoted as SLEW_A and SLEW_B and that at the output pin as SLEW_OUT. Now, as we know:

SLEW_OUT = func(SLEW_A) if A toggles, leading to OUT toggling
SLEW_OUT = func(SLEW_B) if B toggles, leading to OUT toggling

However, even though the timing path shown is through pin A, the resultant slew at the output, SLEW_OUT, will be calculated as:

SLEW_OUT = func(SLEW_A) if func(SLEW_A) > func(SLEW_B)
SLEW_OUT = func(SLEW_B) if func(SLEW_B) > func(SLEW_A)



Worst slew propagation is carried out through the worst of all the slews caused by each input pin
Figure 1: Figure showing worst slew propagation

One may see this as over-pessimism inserted by the timing analysis tool. Path-based timing analysis does not have the worst slew propagation phenomenon, as it calculates an output slew for each timing path rather than one slew per node.

Similarly, for performing timing analysis for hold violations, the best of the slews at the inputs is propagated to the output, as mentioned before.
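In code-like terms, the per-node slew selection done by a graph-based tool can be sketched as below (the slew transfer functions and numbers are illustrative; real values come from the cell's characterized lookup tables):

#include <algorithm>
#include <iostream>

// Illustrative slew transfer functions for arcs A->OUT and B->OUT.
double func_a(double slew_a) { return 0.8 * slew_a + 0.1; }
double func_b(double slew_b) { return 0.6 * slew_b + 0.2; }

int main() {
    double slew_a = 0.5, slew_b = 0.9;  // hypothetical input slews (ns)

    // Graph-based analysis keeps one slew per node: the worst (largest)
    // candidate is propagated for setup analysis...
    double slew_out_setup = std::max(func_a(slew_a), func_b(slew_b));
    // ...and the best (smallest) candidate for hold analysis.
    double slew_out_hold = std::min(func_a(slew_a), func_b(slew_b));

    std::cout << "Slew at OUT for setup: " << slew_out_setup << " ns" << std::endl;
    std::cout << "Slew at OUT for hold: " << slew_out_hold << " ns" << std::endl;
}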




Depletion MOSFET and negative logic - why is it not possible?


As we know, a depletion MOSFET conducts current even with gate and source at the same voltage level. To cut off the current in a depletion MOSFET, a voltage has to be applied at the gate so as to exhaust the already existing carriers inside the channel. On the other hand, an enhancement type MOSFET is cut off when gate and source are at the same voltage.
Taking the example of NMOS: for a depletion MOS with source and gate at the same level, there is still a channel available; hence, it conducts electric current. To bring it to cut-off, a negative potential has to be applied at the gate (considering the source at ‘X’ potential). Thus, with the source at ‘X’ potential and the gate at ‘X’ potential, the drain attains the potential of the source. Since, to cut off the device, the gate has to be given a voltage less than ‘X’, we can say “when gate is 1 and source is 1, then drain is 1”. On the other hand, when source is 1 and gate is 0, the drain attains high impedance. The reverse is true for PMOS.
Similarly, with the same logic, for an enhancement NMOS, “when gate is 1 and source is 0, the drain attains 0 potential”, whereas “when gate is 0 and source is 0, the drain is at high impedance (Z)”, as summarized in the table below. The reverse is true for PMOS.

Source voltage | Gate voltage | Drain voltage (enhancement NMOS) | Drain voltage (enhancement PMOS) | Drain voltage (depletion NMOS) | Drain voltage (depletion PMOS)
0 | 0 | Z | Z | 0 | 0
0 | 1 | 0 | Z | 0 | Z
1 | 0 | Z | 1 | Z | 1
1 | 1 | Z | Z | 1 | 1

Thus, we can say that it is due to the inherent properties of NMOS and PMOS that they cannot be used to create negative level logic.

Enhancement and depletion MOSFETs


A MOSFET (Metal Oxide Semiconductor Field Effect Transistor) is a 4-terminal device with source, drain, gate and body as its terminals. It is used for amplification or switching of electronic signals and is the most common transistor in both digital and analog integrated circuits. The generic structure of a MOSFET is shown in figure 1. The source and drain terminals are separated by a channel, whose conduction is determined by the carrier density in it, which in turn is a function of the voltage applied at the gate terminal. The body terminal is normally connected to the source so as to allow only minimal leakage current to flow.


A MOSFET has 4 terminals, source, drain, gate and body (bulk)
Figure 1: A MOSFET

MOSFETs are categorized into two categories based upon the nature of channel:
1) Enhancement mode MOSFETs: In an enhancement MOSFET, the channel is devoid of carriers. The channel has to be created by applying a suitable voltage difference between the gate and source terminals. With gate and source at the same potential, only minimal current flows. However, when a positive potential difference greater than the threshold voltage of the MOSFET is applied, a channel is created. Current will then flow between source and drain if there is a potential difference between them. Figure 2 below shows how a channel is formed on applying a voltage between the gate and source terminals.

Figure 2: Channel formation in Enhancement MOSFET


2) Depletion mode MOSFETs: In a depletion mode MOSFET, the channel is already present with the help of ion implantation. Even with gate and source at the same voltage, it conducts current. To cut the device off, the channel has to be depleted by applying a suitable potential.






Negative gate delay - is it possible?

As discussed in our post ‘Propagation delay’, the difference in time from the input reaching 50% of its final value to the output reaching 50% of its final value is termed the propagation delay. It seems a bit absurd to have a negative value of propagation delay, as it suggests the effect happening before the cause. Common sense says that the output should change only after the input. However, under certain special cases, it is possible to have negative delay. In most such cases, we have one or more of the following conditions:
i) A high drive strength transistor
ii) A slow transition at the input
iii) A small load at the output

Under the above-mentioned conditions, the output is expected to transition faster than the input signal, which can result in negative propagation delay. An example negative delay scenario is shown in the figure below. The output signal starts to change only after the input signal; however, the faster transition of the output signal causes it to attain the 50% level before the input signal, thus resulting in negative propagation delay. In other words, negative delay is a relative concept.
The negative propagation delay can result in certain scenarios as shown in the figure below
Figure 1: Input and output transitions showing negative propagation delay


Propagation Delay


What is propagation delay: The propagation delay of a logic gate is defined as the time it takes for the effect of a change in input to be visible at the output. In other words, propagation delay is the time required for the input change to be propagated to the output. Normally, it is measured as the difference between the time when the transitioning input reaches 50% of its final value and the time when the output reaches 50% of its final value showing the effect of the input change. Here, 50% is defined as the logic threshold at which the output (or, in general, any signal) is assumed to switch states.


2 input AND gate
Figure 1: 2-input AND gate

Propagation delay example: Let us consider a 2-input AND gate as shown in figure 1, with input ‘I2’ making a transition from logic ‘0’ to logic ‘1’ and ‘I1’ being stable at logic value ‘1’. In effect, this causes the output ‘O’ also to make a transition. The output does not show the effect immediately, but after a certain time interval. The timing diagram for the transitions is also shown. The propagation delay, in this case, is the time interval between I2 reaching 50% while rising and ‘O’ reaching the 50% mark while rising as a result of ‘I2’ making a transition. The propagation delay is labeled “TP” in figure 2.

The propagation delay is the time from 50 percent of transitioning input to 50% of transitioning output
Figure 2: Propagation delay


On what factors propagation delay depends: The propagation delay of a logic gate is not a constant value; it depends upon two factors:

  1. Transition time of the input causing the transition at the output: The larger the transition time at the input, the larger the propagation delay of the cell. For smaller propagation delays, the signals should switch faster.
  2. The output load being felt by the logic gate: The greater the capacitive load sitting at the output of the cell, the more time is taken to charge it, and hence the greater the propagation delay.
How the propagation delay of logic gates is calculated: In physical design tools, there can be the following sources for the calculation of propagation delay:

  • Liberty file: A liberty file contains a lookup table for each input-to-output path (also called a cell arc) of a logic gate as part of its .lib model. The table contains cell delay values for different input transition times and output loads. Depending upon the input transition and output load present in the design for the logic gate under consideration, physical design tools interpolate between these values to calculate the cell delay (a minimal interpolation sketch follows this list).
  • SDF file: SDF (Standard Delay Format) carries the extracted delay information of a design. The delay information, as calculated, can be dumped into an SDF file and later read back. When an SDF file is read, delays are not calculated afresh; the SDF delays take precedence.
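As referenced above, here is a minimal sketch of the interpolation step (the table values are hypothetical; real .lib tables have more index points, and tools also handle extrapolation and nonlinear models):

#include <iostream>

// Linear interpolation between (x0, y0) and (x1, y1) at point x
double lerp(double x0, double x1, double y0, double y1, double x) {
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0);
}

int main() {
    // Hypothetical .lib-style lookup table for one cell arc:
    // rows indexed by input transition (ns), columns by output load (pF).
    double trans[2] = {0.1, 0.5};
    double load[2]  = {0.01, 0.05};
    double delay[2][2] = {
        {0.20, 0.45},   // delays for trans = 0.1
        {0.30, 0.60}    // delays for trans = 0.5
    };

    // Conditions seen by this gate in the design
    double in_trans = 0.3, out_load = 0.03;

    // Bilinear interpolation: first along the load axis, then along transition.
    double d_t0 = lerp(load[0], load[1], delay[0][0], delay[0][1], out_load);
    double d_t1 = lerp(load[0], load[1], delay[1][0], delay[1][1], out_load);
    double cell_delay = lerp(trans[0], trans[1], d_t0, d_t1, in_trans);

    std::cout << "Interpolated cell delay: " << cell_delay << " ns" << std::endl;
}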

Output transition time: The output transition time is also governed by the same two factors as propagation delay. In other words, a larger input transition time and a larger output load increase the transition time of the signal at the output of the logic gate. So, for better transition times, both of these should be small.

Read next: Negative gate delay - is it possible?



Multicycle paths : The architectural perspective


Definition of multicycle paths: By definition, a multi-cycle path is one in which data launched from one flop is allowed (through architecture definition) to take more than one clock cycle to reach the destination flop. This is architecturally ensured by gating either the data or the clock from reaching the destination flop. There can be many scenarios inside a System on Chip where multi-cycle paths can be applied, as discussed later. In this post, we discuss the architectural aspects of multicycle paths. For timing aspects like application, analysis etc., please refer to Multicycle paths handling in STA.

Why multi-cycle paths are introduced in designs: A typical System on Chip consists of many components working in tandem. Each of these works at a different frequency depending upon performance and other requirements. Ideally, the designer would want the maximum throughput possible from each component in the design while paying proper respect to power, timing and area constraints. The designer may think of introducing multi-cycle paths in the design in one of the following scenarios:
      
1) Very large data-path limiting the frequency of the entire component: Let us take a hypothetical case in which a component is to be designed to work at 500 MHz; however, one of the data-paths is too large to work at this frequency. Let us say the minimum delay this data-path can achieve is 3 ns. Thus, if we assume all paths to be single cycle, the component cannot work at more than 333 MHz; however, if we ignore this path, the rest of the design can attain 500 MHz without much difficulty. Thus, we can sacrifice this one path so that the rest of the component works at 500 MHz. In that case, we can make that particular path a multi-cycle path; it will then effectively work at 250 MHz, sacrificing the performance of that one path only.
     
2) Paths starting from a slow clock and ending at a fast clock: For simplicity, let us suppose there is a data-path involving one start-point and one end-point, with the start-point receiving a clock that is half the frequency of the end point's. Now, the start-point can only send data at half the rate at which the end point can receive it. Therefore, there is no gain in running the end-point at double the clock frequency. Also, since the data is launched only once every two cycles, we can modify the architecture such that the data is received after a gap of one cycle. In other words, instead of a single cycle data-path, we can afford a two cycle data-path in such a case. This actually saves power, as the data-path now has two cycles to traverse to the endpoint; so cells with less drive strength, area and power can be used. Also, if the multi-cycle has been implemented through a clock enable (discussed later), clock power will also be saved.

Implementation of multi-cycle paths in architecture: Let us discuss some of the ways of introducing multi-cycle paths in the design:

1) Through gating in the data-path: Refer to figure 1 below, wherein the ‘Enable’ signal gates the data-path towards the capturing flip-flop. By controlling the waveform of the enable signal, we can make the path multi-cycle. As shown in the waveform, if the enable signal toggles once every three cycles, the data at the end-point toggles once every three cycles. Hence, data launched at edge ‘1’ can arrive at the capturing flop only at edge ‘4’. Thus, we get a multi-cycle of 3 in this case, giving a total of 3 cycles for the data to traverse to the capture flop. In this case, the setup check is of 3 cycles and the hold check is a 0 cycle check.
Figure 1: Introducing multicycle paths in design by gating data path



Now let us extend this discussion to the case wherein the launch clock is half the frequency of the capture clock. Let us say Enable changes once every two cycles. Here, the intention is to make the data-path a multi-cycle of 2 relative to the faster clock (the capture clock here). As is evident from the figure below, it is important to have the Enable signal take the proper waveform, as shown on the right hand side of figure 2. In this case, the setup check will be two cycles of the capture clock and the hold check will be a 0 cycle check.
   
   
When the launch clock is half the frequency, it is better to make the path a multicycle of 2, because data will anyway be launched only once every two cycles.
Figure 2: Introducing multi-cycle path where launch clock is half in  frequency to capture clock


2) Through gating in the clock path: Similarly, we can make the capturing flop capture data once every few cycles by clipping the clock; in other words, by sending only those pulses of the clock to the capturing flip-flop at which we want the data to be captured. This can be done similarly to the data-path masking discussed in point 1, with the only difference being that the enable now masks the clock signal going to the capturing flop. This kind of gating is more advantageous in terms of power saving: since the capturing flip-flop does not get the clock signal every cycle, we save some power too.
    
Figure 3: Introducing multi cycle paths through gating the clock path
Figure 3 above shows how multicycle paths can be achieved with the help of clock gating. The enable signal, in this case, launches from a negative edge-triggered register due to architectural reasons (read here). With the enable waveform as shown in figure 3, the flop gets a clock pulse once every four cycles. Thus, we have a multicycle path of 4 cycles from launch to capture. The setup and hold checks, in this case, are also shown in figure 3: the setup check is a 4 cycle check, whereas the hold check is a zero cycle check.

Pipelining v/s introducing multi-cycle paths: Making a long data-path reach its destination in two cycles can alternatively be implemented by pipelining the logic. In most cases, this is a much simpler approach than making the path multi-cycle. Pipelining means splitting the data-path into two halves and putting a flop between them, essentially making the data-path take two cycles. This approach also eases timing, at the cost of the latency of the data-path; however, looking at the whole component level, we can then afford to run the whole component at a higher frequency. But in some situations it is not economical to insert pipeline flops, as there may not be suitable points available. In such a scenario, we have to go with the approach of making the path multi-cycle.
