
Multicycle paths handling in STA

In the post Multicycle paths - the architectural perspective, we discussed the architectural aspects of multicycle paths. In this post, we will discuss how multicycle paths are handled in backend optimization and timing analysis:

How multi-cycle paths are handled in STA: By default, STA assumes that every timing path has the default setup and hold checks; i.e., every path must be covered in either half a cycle or a single cycle, depending upon the nature of the path (see setup-hold checks part 1 and setup-hold checks part 2 for reference). However, we can tell the STA engine that a path is multi-cycle through the SDC command "set_multicycle_path". Let us elaborate with the help of an example:

Figure 3: The path from ff1/Q to ff2/D is a multicycle path

Let us assume a multi-cycle timing path (remember, this has to be ensured by the architecture) in which both launch and capture flops are positive edge-triggered, as shown in figure 3. The default setup and hold checks for this path will be as shown in red in figure 4. We can tell the STA engine to time this path in 3 cycles instead of the default one cycle with the help of the set_multicycle_path SDC command:

                                    set_multicycle_path 3 -setup -from ff1/Q -to ff2/D

The above command shifts both setup and hold checks forward by two cycles. That is, the setup check now becomes a 3 cycle check and the hold check becomes a 2 cycle check, as shown in blue in figure 4. This is because, by default, the STA engine places the hold check one active edge before the setup check, and the setup check, in this case, is now at 3 cycles.

When you apply multicycle path only for setup, hold also moves along as default hold check is one edge prior to setup check
Figure 4: Setup and hold checks before and after applying multicycle for setup only

However, this is not the desired scenario in most cases. As we discussed earlier, multi-cycle paths are achieved by gating either the clock path or the data path for the required number of cycles. So, the required hold check in most cases is a 0 cycle check. This is done through the same command with the switch "-hold", which tells the STA engine how many cycles to pull the hold check back; here, 2 cycles brings it back to a zero cycle check.

                               set_multicycle_path -hold 2 -from ff1/Q -to ff2/D

The above command pulls the hold check back by 2 cycles, to a zero cycle check, as shown in blue in figure 5.

On applying multicycle path for hold, hold check comes back to where it was intended. It does not impact setup check
Figure 5: Setup and hold checks after applying multi-cycle exceptions for both setup and hold


We need to keep in mind the following statement:

Setting a multi-cycle path for setup moves the hold check by the same number of cycles, and in the same direction, as the setup check. However, applying a multi-cycle path for hold does not affect the setup check.

So, in the above example, both the statements combined will give the desired setup and hold checks. Please note that there might be a case where only setup or hold multi-cycle is sufficient, but that is the need of the design and depends on how FSM has been modeled.
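
The two exceptions above are normally written together in the constraints file. The fragment below is a minimal sketch under assumptions that are not part of the example: a single clock named clk with an illustrative 10 ns period, on a port also named clk, and ff1 and ff2 as top-level cell names. It names the launch flop as a cell because some STA tools may not accept a data output pin such as ff1/Q as a valid start point.

```tcl
# Assumed clock: the name, port and 10 ns period are illustrative only.
create_clock -name clk -period 10 [get_ports clk]

# Setup is checked at the 3rd capture edge after launch (30 ns with this clock).
# On its own, this also moves the hold check to the 2nd edge (20 ns).
set_multicycle_path 3 -setup -from [get_cells ff1] -to [get_pins ff2/D]

# Pull the hold check back by 2 cycles, to the launch edge (0 ns).
set_multicycle_path 2 -hold -from [get_cells ff1] -to [get_pins ff2/D]
```

For a path between flops on the same clock, a setup multiplier of N paired with a hold multiplier of N - 1 returns the hold check to the zero cycle position.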


What if both clock periods are not equal: In the above example, for simplicity, we assumed that the launch and capture clock periods are equal. However, this may not always be true. As discussed in multicycle path - the architectural perspective, it makes more sense to have multi-cycle paths where there is a difference in clock periods. The setup and hold checks for multicycle paths are not as simple in this case as they were when both clocks had the same frequency. Let us consider a case where the launch clock period is twice the capture clock period, as shown in figure 6 below.

Setup and hold checks in case of multicycle paths for clocks differing in frequencies
Figure 6: Default setup and hold checks for case where capture clock period is half that of launch clock

Now, the question is: when defining a multi-cycle path, which clock period will be added to the setup check, launch or capture? The answer depends upon the architecture and FSM of the design. Once you know it, the same can be modelled in timing constraints. The SDC command has a switch to select which of the clock periods is to be added. "set_multicycle_path -start" means that the path is a multi-cycle for that many cycles of the launch clock. Similarly, "set_multicycle_path -end" means that the path is a multi-cycle for that many cycles of the capture clock. Let the path given above be a multicycle of 2. Let us see below how it changes with the -start and -end options.

      1. set_multicycle_path -start: This causes a cycle of the launch clock to be added to the setup check. As expected, on applying a hold multicycle path of 1, the hold check returns to a 0 cycle check. Figure 8 below shows the effect of the two commands below on setup and hold checks. As shown, the setup check gets relaxed by one launch clock cycle.

                      set_multicycle_path 2 -setup -from ff1/Q -to ff2/D -start
                      set_multicycle_path 1 -hold   -from ff1/Q -to ff2/D -start

When provided with -start switch, shifts in setup and hold checks happen in multiples of launch clock period.
Figure 8: Setup and hold checks with -start option provided with set_multicycle_path

       2. set_multicycle_path -end: This causes a cycle of the capture clock to be added to the setup check. As expected, on applying a hold multicycle path of 1, the hold check returns to a 0 cycle check. Figure 9 below shows the effect of the two commands below on setup and hold checks. As shown, the setup check gets relaxed by one cycle of the capture clock.
                      set_multicycle_path 2 -setup -from ff1/Q -to ff2/D -end
                      set_multicycle_path 1 -hold   -from ff1/Q -to ff2/D -end

When provided with -end option, shifts in setup and hold checks happen in multiples of capture clock period.
Figure 9: Setup and hold checks with -end option provided with set_multicycle_path
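
To put numbers on the two options, here is a hedged sketch for the case in figure 6. The clock names, ports and periods (20 ns launch, 10 ns capture, with edges aligned at time 0) are assumptions chosen for illustration, as are the cell names; with these periods the default setup check is 10 ns.

```tcl
# Assumed synchronous clocks; names, ports and periods are illustrative only.
create_clock -name clk_launch  -period 20 [get_ports clk_launch]
create_clock -name clk_capture -period 10 [get_ports clk_capture]

# Option 1 (-start): multipliers count launch-clock cycles.
# Setup check grows from 10 ns to 10 + 20 = 30 ns; the hold exception
# returns the hold check to the zero cycle position.
set_multicycle_path 2 -setup -start -from [get_cells ff1] -to [get_pins ff2/D]
set_multicycle_path 1 -hold  -start -from [get_cells ff1] -to [get_pins ff2/D]

# Option 2 (-end): multipliers count capture-clock cycles.
# Use this pair instead of option 1, not together with it.
# Setup check grows from 10 ns to 10 + 10 = 20 ns.
# set_multicycle_path 2 -setup -end -from [get_cells ff1] -to [get_pins ff2/D]
# set_multicycle_path 1 -hold  -end -from [get_cells ff1] -to [get_pins ff2/D]
```

When neither switch is given, SDC-based tools typically default to -end for setup and -start for hold, so stating the switch on both lines avoids counting the two checks in different clocks.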

Why is it important to apply multi-cycle paths: To achieve optimum area, power and timing, all the timing paths must be timed at the desired frequencies. The optimization engine will know that a path is multicycle only when it is told so through SDC commands in the timing constraints. If we don't specify a multicycle path as multicycle, the optimization engine will treat it as a single cycle path and will try to use bigger drive strength cells to meet timing. This results in more area and power; hence, more cost. So, all multicycle paths must be correctly specified as multicycle paths during timing optimization and timing analysis.


Multicycle paths: The architectural perspective


Definition of multicycle paths: By definition, a multi-cycle path is one in which data launched from one flop is allowed (through architecture definition) to take more than one clock cycle to reach the destination flop. This is architecturally ensured by gating either the data or the clock from reaching the destination flops. There can be many such scenarios inside a System on Chip where we can apply multi-cycle paths, as discussed later. In this post, we discuss the architectural aspects of multicycle paths. For timing aspects like application, analysis etc., please refer to Multicycle paths handling in STA.

Why multi-cycle paths are introduced in designs: A typical System on Chip consists of many components working in tandem. Each of these works at a different frequency depending upon performance and other requirements. Ideally, the designer would want the maximum possible throughput from each component in the design while respecting power, timing and area constraints. The designer may consider introducing multi-cycle paths in the design in one of the following scenarios:
      
       1)      Very large data-path limiting the frequency of entire component: Let us take a hypothetical case in which one of the components is to be designed to work at 500 MHz (a 2 ns clock period); however, one of the data-paths is too large to work at this frequency. Let us say the minimum delay this data-path can achieve is 3 ns. Thus, if we treat all the paths as single cycle, the component cannot work at more than 333 MHz; however, if we ignore this path, the rest of the design can attain 500 MHz without much difficulty. Thus, we can sacrifice this path alone so that the rest of the component works at 500 MHz. In that case, we can make that particular path a multi-cycle path of 2, giving it two 2 ns cycles (4 ns) to complete, so that it effectively works at 250 MHz, sacrificing performance for that one path only.
     
     2)      Paths starting from slow clock and ending at fast clock: For simplicity, let us suppose there is a data-path involving one start-point and one end-point, with the start-point receiving a clock that is half the frequency of the end-point clock. Now, the start-point can only send data at half the rate at which the end-point can receive it. Therefore, for this path, there is no gain from the end-point running at double the clock frequency. Also, since the data is launched only once every two cycles, we can modify the architecture such that the data is received after a gap of one cycle. In other words, instead of a single cycle data-path, we can afford a two cycle data-path in such a case. This will actually save power, as the data-path now has two cycles to traverse to the endpoint. So, lower drive strength cells with less area and power can be used. Also, if the multi-cycle has been implemented through clock enable (discussed later), clock power will also be saved.
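
The arithmetic of scenario 1 can be written as constraints. This is only a sketch: the clock name clk_500, its port, and the register names long_src and long_dst are hypothetical, and the exception is valid only if the architecture really holds the data stable for two cycles.

```tcl
# 500 MHz component clock: period = 1 / 500 MHz = 2 ns (name and port assumed).
create_clock -name clk_500 -period 2.0 [get_ports clk_500]

# The 3 ns path cannot meet a 2 ns cycle, but fits in 2 cycles = 4 ns,
# i.e. an effective 250 MHz for this path only.
# long_src and long_dst are hypothetical register names.
set_multicycle_path 2 -setup -from [get_cells long_src] -to [get_cells long_dst]

# Keep the hold check at the launch edge (zero cycle check).
set_multicycle_path 1 -hold  -from [get_cells long_src] -to [get_cells long_dst]
```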

Implementation of multi-cycle paths in architecture: Let us discuss some of the ways of introducing multi-cycle paths in the design:

      1)      Through gating in data-path: Refer to figure 1 below, wherein the ‘Enable’ signal gates the data-path towards the capturing flip-flop. By controlling the waveform of the enable signal, we can make the path multi-cycle. As shown in the waveform, if the enable signal toggles once every three cycles, the data at the end-point toggles once every three cycles. Hence, the data launched at edge ‘1’ can arrive at the capturing flop only at edge ‘4’. Thus, we can have a multi-cycle of 3 in this case, giving the data a total of 3 cycles to traverse to the capture flop. Thus, in this case, the setup check is a 3 cycle check and the hold check is a 0 cycle check.
Figure 1: Introducing multicycle paths in design by gating data path



    Now let us extend this discussion to the case wherein the launch clock is half the frequency of the capture clock. Let us say Enable changes once every two cycles. Here, the intention is to make the data-path a multi-cycle of 2 relative to the faster clock (the capture clock here). As is evident from the figure below, it is important for the Enable signal to have the proper waveform, as shown on the right hand side of figure 2. In this case, the setup check will be two cycles of the capture clock and the hold check will be a 0 cycle check.
   
   
When the launch clock is half in frequency, it is better to make the path a multicycle of 2 because data will anyway be launched once every few cycles.
Figure 2: Introducing multi-cycle path where launch clock is half in frequency to capture clock


        2) Through gating in clock path: Similarly, we can make the capturing flop capture data once every few cycles by clipping the clock. In other words, send to the capturing flip-flop only those clock pulses at which you want the data to be captured. This can be done similarly to the data-path masking discussed in point 1, the only difference being that the enable masks the clock signal going to the capturing flop. This kind of gating is more advantageous in terms of power saving: since the capturing flip-flop does not receive the masked clock pulses, we save some clock power too.
    
Figure 3: Introducing multi cycle paths through gating the clock path
      Figure 3 above shows how multicycle paths can be achieved with the help of clock gating. The enable signal, in this case, is launched from a negative edge-triggered register due to architectural reasons. With the enable waveform as shown in figure 3, the flop gets a clock pulse once in every four cycles. Thus, we can have a multicycle path of 4 cycles from launch to capture. The setup and hold checks in this case are also shown in figure 3. The setup check will be a 4 cycle check, whereas the hold check will be a zero cycle check.
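
For reference, the checks described for the two implementations above could be expressed in timing constraints roughly as follows. This is a sketch only, assuming launch and capture flops on the same clock; the cell names launch_ff, capture_ff, launch_ff2 and gated_ff are hypothetical.

```tcl
# Data-path gating (figure 1): 3 cycle setup check, 0 cycle hold check.
set_multicycle_path 3 -setup -from [get_cells launch_ff] -to [get_cells capture_ff]
set_multicycle_path 2 -hold  -from [get_cells launch_ff] -to [get_cells capture_ff]

# Clock-path gating (figure 3): 4 cycle setup check, 0 cycle hold check.
set_multicycle_path 4 -setup -from [get_cells launch_ff2] -to [get_cells gated_ff]
set_multicycle_path 3 -hold  -from [get_cells launch_ff2] -to [get_cells gated_ff]
```

In both cases the hold multiplier is one less than the setup multiplier, which is what keeps the hold check at the launch edge.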

Pipelining vs. introducing multi-cycle paths: Letting a long data-path reach its destination in two cycles can alternatively be implemented by pipelining the logic. This is a much simpler approach in most cases than making the path multi-cycle. Pipelining means splitting the data-path into two halves and putting a flop between them, essentially making the data-path take two cycles. This approach also eases timing at the cost of the performance of that data-path. However, looking at the whole component level, we can afford to run the whole component at a higher frequency. But in some situations, it is not economical to insert pipeline flops, as there may not be suitable points available. In such a scenario, we have to go with the approach of making the path multi-cycle.
