Showing posts with label multicycle path. Show all posts

Zero cycle paths

Zero cycle path: A zero cycle timing path represents a race condition between data and clock. It is a path in which data is launched and captured on the same edge of the clock. In other words, the setup check for a zero cycle path is zero cycle, i.e., it is performed on the same edge as the one launching the data. The hold check, then, is performed one cycle before the edge at which data is launched. Figure 1 below shows the setup check and hold check for a zero cycle timing path.


In a zero cycle path, the setup check is zero cycle. In other words, it is on the same edge as the launch clock edge.
Figure 1: Setup check and hold check for zero cycle paths


How to specify zero cycle path: By default, the setup check is single cycle, i.e., it is checked on the next edge with respect to the one on which data is launched. If the FSM requires a timing path to be zero cycle, it has to be specified using the SDC command "set_multicycle_path".

Default setup check for a timing path is single cycle, whereas hold check is zero cycle.
Figure 2: Default setup and hold checks for single cycle timing path
The default setup and hold checks for same edge timing paths are single cycle and zero cycle respectively, as shown in figure 2 above. To model the path as a zero cycle path (as in figure 1), we need to apply the following timing constraint:
set_multicycle_path 0 -setup -from <startpoint> -to <endpoint>
where <startpoint> is the flip-flop which launches the data and <endpoint> is the flip-flop which captures the data. In other words, from an application perspective, a zero cycle path is just a special case of a multi-cycle path. The above multicycle constraint modifies the setup check to be zero cycle. The hold check also shifts one edge back.
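The effect of the constraint on the check edges can be sketched in a few lines of Python. This is only an illustration, assuming launch and capture flops share the same clock and edge polarity, and assuming the usual convention that the hold check sits one edge before the setup capture edge. The names `check_edges`, `setup_mult` and `hold_mult` are made up for this sketch and are not part of SDC.

```python
def check_edges(setup_mult=1, hold_mult=0):
    """Return (setup_edge, hold_edge) in clock cycles relative to the launch edge (edge 0).

    Assumption: same clock and same edge polarity at launch and capture.
    """
    # Setup is checked setup_mult cycles after the launch edge.
    setup_edge = setup_mult
    # Hold is checked one edge before the setup edge, moved back further by hold_mult.
    hold_edge = setup_edge - 1 - hold_mult
    return setup_edge, hold_edge


# No multicycle constraint: setup on the next edge, hold on the launch edge.
default_checks = check_edges()
# set_multicycle_path 0 -setup: setup on the launch edge, hold one edge earlier.
zero_cycle_checks = check_edges(setup_mult=0)

print("default   :", default_checks)
print("zero cycle:", zero_cycle_checks)
```

With a setup multiplier of 0, the sketch gives a setup edge of 0 and a hold edge of -1, which matches the description of figure 1.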


Can hold check be frequency dependent?


We often hear people argue that hold check is frequency independent. However, this is only partially true: it holds only for zero-cycle hold checks. By zero cycle hold check, we mean that the hold check is performed on the same edge at which data is launched. This is the case for timing paths between registers of the same polarity, e.g. between positive edge-triggered flops. Figure 1 below shows timing checks for a data-path launched from a positive edge-triggered flip-flop and captured at a positive edge-triggered flip-flop. The hold timing, in this case, is checked at the same edge at which data is launched. Changing the clock frequency will not cause the hold check to change.

Setup check for positive edge-triggered flip-flop to positive edge-triggered flip-flop is single cycle and hold check is zero cycle
Figure 1: Setup and hold checks for positive edge-triggered to positive edge-triggered flip-flop
Most of the paths in today’s designs are of this type. The exceptions to zero cycle hold check are not too many. There are also hold checks on the previous edge; however, these are very relaxed compared to zero cycle hold checks and hence are rarely mentioned. Hold checks on the next edge, on the other hand, are practically impossible to meet considering cross-corner delay variations. So, we seldom hear that hold check is frequency dependent. Let us talk of different scenarios of frequency dependent hold checks:

  1.  From positive edge-triggered flip-flop to negative edge-triggered flip-flop and vice-versa: Figure 2 below shows the setup and hold checks for a timing path from a positive edge-triggered flip-flop to a negative edge-triggered flip-flop. A change in frequency changes the distance between the two adjacent edges; hence, the hold check changes. The hold timing equation for this case is:

Tdata + Tclk/2 > Tskew + Thold
or
Tslack =  Tclk/2 - Thold - Tskew + Tdata
          Thus, clock period comes into picture in calculation of hold timing slack.

Both setup and hold checks are half cycle. Setup is checked on next edge whereas hold is checked on previous edge
Figure 2: Setup and hold checks for timing path from positive edge-triggered flip-flop to negative edge-triggered flip-flop

Similarly, for timing paths launched from a negative edge-triggered flip-flop and captured at a positive edge-triggered flip-flop, clock period comes into picture. However, this check is very relaxed most of the time. It is evident from the above equation that for hold slack to be negative, the clock skew plus hold time must exceed the data-path delay by more than half a clock cycle. In practice, this means the skew between launch and capture clocks has to be roughly greater than half a clock cycle, which is a very rare scenario. Even at 2 GHz frequency (Tclk = 500 ps), skew has to be on the order of 250 ps or more, which is still very rare.
Coming to latches, hold check from a positive level-sensitive latch to a negative edge-triggered flip-flop is half cycle. Similarly, hold check from a negative level-sensitive latch to a positive edge-triggered flip-flop is half cycle. Hence, hold check in both of these cases is frequency dependent.

2. Clock gating hold checks: When data launched from a negative edge-triggered flip-flop gates a clock on an OR gate, hold is checked on the next positive edge after the edge at which data is launched, as shown in figure 3. This check is therefore frequency dependent.

Setup check is single cycle and hold check is half cycle and checked on next clock edge with respect to launch clock edge
Figure 3: Clock gating hold check between data launched from a negative edge-triggered flip-flop and clock at an OR gate

           Similarly, data launched from a positive edge-triggered flip-flop and gating a clock on an AND gate forms a half cycle hold check. However, this kind of check is not possible to meet under normal scenarios considering cross-corner variations.

3. Non-default hold checks: Sometimes, due to architectural requirements (e.g. multi-cycle paths for hold), hold check is non-zero cycle even for positive edge-triggered to positive edge-triggered paths as shown in figure 4 below.
Figure 4: Non-default hold check with multi-cycle path of 1 cycle specified
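The half cycle hold equation from scenario 1 can be checked numerically with a short Python sketch. The function name `half_cycle_hold_slack` and all delay values below are hypothetical numbers chosen only to show how the clock period enters the slack; they do not come from any real design.

```python
def half_cycle_hold_slack(t_clk, t_data, t_skew, t_hold):
    """Hold slack for a half cycle hold check, all values in ps.

    Implements Tslack = Tclk/2 - Thold - Tskew + Tdata from scenario 1.
    """
    return t_clk / 2 - t_hold - t_skew + t_data


# Hypothetical path: 50 ps data delay, 100 ps skew, 20 ps hold time.
slack_2ghz = half_cycle_hold_slack(t_clk=500, t_data=50, t_skew=100, t_hold=20)
slack_1ghz = half_cycle_hold_slack(t_clk=1000, t_data=50, t_skew=100, t_hold=20)

print("hold slack at 2 GHz:", slack_2ghz, "ps")
print("hold slack at 1 GHz:", slack_1ghz, "ps")
```

The same path has a different hold slack at the two frequencies, which is the frequency dependence discussed above. With these numbers the slack stays comfortably positive even at 2 GHz, in line with the remark that such checks are usually relaxed.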







Multicycle paths : The architectural perspective


Definition of multicycle paths: By definition, a multi-cycle path is one in which data launched from one flop is allowed (through architecture definition) to take more than one clock cycle to reach the destination flop. This is ensured architecturally by gating either the data or the clock from reaching the destination flops. There can be many such scenarios inside a System on Chip where we can apply multi-cycle paths, as discussed later. In this post, we discuss architectural aspects of multicycle paths. For timing aspects like application, analysis etc., please refer to Multicycle paths handling in STA.

Why multi-cycle paths are introduced in designs: A typical System on Chip consists of many components working in tandem. Each of these works at a different frequency depending upon performance and other requirements. Ideally, the designer would want the maximum throughput possible from each component in the design while respecting power, timing and area constraints. The designer may think of introducing multi-cycle paths in the design in one of the following scenarios:
      
       1)      Very large data-path limiting the frequency of entire component: Let us take a hypothetical case in which one of the components is to be designed to work at 500 MHz; however, one of the data-paths is too large to work at this frequency. Let us say the minimum delay the data-path under consideration can achieve is 3 ns. Thus, if we assume all the paths to be single cycle, the component cannot work at more than 333 MHz; however, if we ignore this path, the rest of the design can attain 500 MHz without much difficulty. Thus, we can sacrifice this path alone so that the rest of the component works at 500 MHz. In that case, we can make that particular path a multi-cycle path of 2 cycles (4 ns at 500 MHz), so that it effectively works at 250 MHz, sacrificing the performance of that one path only.
     
     2)      Paths starting from slow clock and ending at fast clock: For simplicity, let us suppose there is a data-path involving one start-point and one end-point, with the start-point receiving a clock that is half the frequency of that of the end-point. Now, the start-point can only send data at half the rate at which the end-point can receive it. Therefore, there is no gain in capturing at the end-point on every cycle of the faster clock. Also, since the data is launched only once every two cycles, we can modify the architecture such that the data is received after a gap of one cycle. In other words, instead of a single cycle data-path, we can afford a two cycle data-path in such a case. This will actually save power, as the data-path now has two cycles to traverse to the endpoint, so cells with lower drive strength, and hence less area and power, can be used. Also, if the multi-cycle has been implemented through clock enable (discussed later), clock power will also be saved.
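The arithmetic behind scenario 1 can be sketched in Python as below. This is a simplified illustration that ignores setup time, clock skew and uncertainty; the function name `multicycle_budget` is hypothetical.

```python
import math


def multicycle_budget(clock_mhz, path_delay_ns):
    """Return (cycles_needed, effective_path_rate_mhz) for a path at a given clock.

    Simplification: setup time, skew and uncertainty are ignored.
    """
    period_ns = 1000.0 / clock_mhz
    # Smallest whole number of clock cycles that covers the path delay.
    cycles = math.ceil(path_delay_ns / period_ns)
    # The path can accept new data only once every `cycles` clock cycles.
    effective_mhz = clock_mhz / cycles
    return cycles, effective_mhz


# Numbers from the example above: 500 MHz target, 3 ns data-path.
cycles, effective_mhz = multicycle_budget(clock_mhz=500, path_delay_ns=3.0)
# Frequency limit if the same path had to be single cycle.
single_cycle_limit_mhz = 1000.0 / 3.0

print("multicycle value:", cycles)
print("effective rate of the path:", effective_mhz, "MHz")
print("single cycle limit:", round(single_cycle_limit_mhz), "MHz")
```

For the 3 ns path at 500 MHz, the sketch yields a multicycle value of 2 and an effective rate of 250 MHz for that path, against a single cycle limit of about 333 MHz for the whole component.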

Implementation of multi-cycle paths in architecture: Let us discuss some of the ways of introducing multi-cycle paths in the design:

      1)      Through gating in data-path: Refer to figure 1 below, wherein the ‘Enable’ signal gates the data-path towards the capturing flip-flop. By controlling the waveform of the enable signal, we can make the path multi-cycle. As shown in the waveform, if the enable signal toggles once every three cycles, the data at the end-point toggles only once every three cycles. Hence, the data launched at edge ‘1’ can be captured at the capturing flop only at edge ‘4’. Thus, we can have a multi-cycle of 3 in this case, giving the data a total of 3 cycles to traverse to the capture flop. The setup check is therefore 3 cycles and the hold check is 0 cycle.
Figure 1: Introducing multicycle paths in design by gating data path



    Now let us extend this discussion to the case wherein the launch clock is half the frequency of the capture clock. Let us say Enable changes once every two cycles. Here, the intention is to make the data-path a multi-cycle of 2 relative to the faster clock (the capture clock here). As is evident from the figure below, it is important that the Enable signal takes the proper waveform, as shown on the right hand side of figure 2. In this case, the setup check will be two cycles of the capture clock and the hold check will be 0 cycle.
   
   
When the launch clock is half in frequency, it is better to make the path a multicycle of 2 because data will anyway be launched only once every two cycles of the capture clock.
Figure 2: Introducing multi-cycle path where launch clock is half in  frequency to capture clock


        2) Through gating in clock path: Similarly, we can make the capturing flop capture data once every few cycles by clipping the clock. In other words, we send to the capturing flip-flop only those clock pulses at which we want the data to be captured. This can be done similar to data-path masking as discussed in point 1, with the only difference being that the enable masks the clock signal going to the capturing flop. This kind of gating is more advantageous in terms of power: since the capturing flip-flop does not receive the clock on the masked cycles, clock power is saved as well.
    
Figure 3: Introducing multi cycle paths through gating the clock path
      Figure 3 above shows how multicycle paths can be achieved with the help of clock gating. The enable signal, in this case, launches from a negative edge-triggered register due to architectural reasons (read here). With the enable waveform as shown in figure 3, the flop gets a clock pulse once in every four cycles. Thus, we can have a multicycle path of 4 cycles from launch to capture. The setup check and hold check for this case are also shown in figure 3. The setup check will be a 4 cycle check, whereas the hold check will be a zero cycle check.
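The relation between the enable waveform and the multicycle value can be illustrated with a small Python sketch. It only models which clock edges reach (or are enabled at) the capturing flop, assuming the enable is active once every fixed number of cycles; the helper name `capture_edges` and the edge numbering starting at 1 are choices made for this sketch.

```python
def capture_edges(period_cycles, total_edges, first_edge=1):
    """Clock edges at which the capturing flop samples data.

    Assumption: the enable (on data or clock) is active once every
    `period_cycles` cycles, starting at `first_edge`.
    """
    return [
        edge
        for edge in range(first_edge, total_edges + 1)
        if (edge - first_edge) % period_cycles == 0
    ]


# Data-path gating example: enable active once every three cycles.
data_gated = capture_edges(period_cycles=3, total_edges=10)
# Clock gating example: flop receives one clock pulse every four cycles.
clock_gated = capture_edges(period_cycles=4, total_edges=10)

# Data launched at one active edge is captured at the next active edge.
setup_cycles_data = data_gated[1] - data_gated[0]
setup_cycles_clock = clock_gated[1] - clock_gated[0]

print("data gating capture edges :", data_gated, "-> setup cycles:", setup_cycles_data)
print("clock gating capture edges:", clock_gated, "-> setup cycles:", setup_cycles_clock)
```

The gap between consecutive capture edges is the setup multicycle value: 3 for the data gating example (edge ‘1’ to edge ‘4’) and 4 for the clock gating example.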

Pipelining vs. introducing multi-cycle paths: Making a long data-path reach its destination in two cycles can alternatively be implemented by pipelining the logic. This is a much simpler approach in most cases than making the path multi-cycle. Pipelining means splitting the data-path into two halves and putting a flop between them, essentially making the data-path two cycles. This approach also eases the timing at the cost of added latency on the data-path. However, looking at the component level, we can afford to run the whole component at a higher frequency. But in some situations, it is not economical to insert pipeline flops, as there may not be suitable points available. In such a scenario, we have to go with the approach of making the path multi-cycle.
