On-chip variations – the STA takeaway

Static timing analysis (STA) of a design is performed to estimate the frequency at which it will work once it has been fabricated. The nominal delays of the logic gates, as characterized, are calculated, and some pessimism is applied on top of them to check whether there will be any setup and/or hold violation at the target frequency. However, not all manufactured transistors are alike. Also, not all transistors receive the same voltage or are at the same temperature. The characterized delay is simply the most probable delay. The delay variation of a typical sample of transistors on silicon follows the curve shown in figure 1. As shown, most of the transistors have nominal characteristics. Typically, timing signoff is carried out with some margin. By doing this, the designer tries to ensure that a larger number of transistors is covered. There is a direct relationship between margin and yield: the greater the margin, the larger the yield. However, beyond a certain point, increasing the margin brings little further increase in yield. At that point, the extra margin costs the designer more than the additional yield saves. Therefore, margins should be chosen so as to give maximum profit.

Most of the transistors have close to nominal delay. However, some transistors show delay variations. Theoretically, there is no bound on the delay variation. However, the probability of a given delay decreases as that delay gets farther from nominal.
Figure 1: Number of transistors v/s delay for a typical sample of silicon transistors
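The diminishing return of margin on yield can be illustrated numerically. The sketch below assumes that delays follow a normal (Gaussian) distribution around the nominal value, which is an assumption made here for illustration and not something the curve above guarantees. The function name yield_fraction and the margins expressed in multiples of the standard deviation (sigma) are likewise illustrative.

```python
import math

def yield_fraction(margin_sigma: float) -> float:
    """Fraction of samples whose delay is at most nominal + margin_sigma * sigma.

    Assumes normally distributed delays (an assumption of this sketch).
    """
    return 0.5 * (1.0 + math.erf(margin_sigma / math.sqrt(2.0)))

# Each extra sigma of margin buys less additional yield than the previous one.
previous = yield_fraction(0)
for k in range(1, 5):
    current = yield_fraction(k)
    print(f"margin = {k} sigma: yield = {current:.5f}, gain = {current - previous:.5f}")
    previous = current
```

Under this assumption, going from no margin to one sigma of margin covers roughly a third more of the samples, while going from three to four sigma adds only a tiny fraction, which is why margin beyond a certain point stops paying for itself.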


We have discussed above how variations in the characteristics of transistors are taken care of in STA. These variations in transistor characteristics, as fabricated on silicon, are known as OCV (On-Chip Variations). The reason for OCV, as discussed above, is that the transistors on a chip are not all alike in geometry, in their surroundings, or in their position with respect to the power supply. The variations are mainly caused by three factors:
  • Process variations: The fabrication process includes diffusion, drawing of metal wires, drawing of gates, etc. The diffusion density is not uniform throughout the wafer. Also, the width of a metal wire is not constant. Let us say the width is 1 µm ± 20 nm. The metal delays are then bound to lie within a range rather than take a single value. Similarly, the diffusion regions of all transistors will not have exactly the same diffusion concentration. So, all transistors are expected to have somewhat different characteristics.
  • Voltage variation: Power is distributed to all transistors on the chip through a power grid. The power grid has its own resistance and capacitance, so there is a voltage drop along it. Transistors situated close to the power source (or those having less resistive paths from the power source) receive a larger voltage than other transistors. That is why delay varies across transistors.
  • Temperature variation: Similarly, all the transistors on the same chip cannot be at the same temperature. So, there are variations in characteristics due to variation in temperature across the chip.
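As a rough illustration of the process-variation example above, the sketch below turns the 1 µm ± 20 nm width range into a resistance range. It assumes the simple model in which wire resistance equals sheet resistance times length divided by width, and it ignores the accompanying change in capacitance. The sheet resistance and wire length are hypothetical values chosen only to make the numbers concrete.

```python
def wire_resistance(sheet_res_ohm_per_sq: float, length_um: float, width_um: float) -> float:
    """Resistance of a rectangular wire: R = R_sheet * (length / width)."""
    return sheet_res_ohm_per_sq * length_um / width_um

NOMINAL_WIDTH_UM = 1.0   # from the example: 1 um
TOLERANCE_UM = 0.020     # from the example: +/- 20 nm
SHEET_RES = 0.1          # ohm per square, hypothetical
LENGTH_UM = 100.0        # hypothetical wire length

r_nom = wire_resistance(SHEET_RES, LENGTH_UM, NOMINAL_WIDTH_UM)
r_max = wire_resistance(SHEET_RES, LENGTH_UM, NOMINAL_WIDTH_UM - TOLERANCE_UM)  # narrowest wire
r_min = wire_resistance(SHEET_RES, LENGTH_UM, NOMINAL_WIDTH_UM + TOLERANCE_UM)  # widest wire

print(f"nominal: {r_nom:.3f} ohm, range: {r_min:.3f} to {r_max:.3f} ohm")
print(f"relative spread: {100 * (r_min / r_nom - 1):+.2f}% to {100 * (r_max / r_nom - 1):+.2f}%")
```

In this simplified model, a ±2% width tolerance gives a resistance spread of about ±2%, so any delay that depends on the wire resistance also becomes a range rather than a single value.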


How to take care of OCV: To tackle OCV, the STA for the design is closed with some margins. There are various margining methodologies available. One of these is applying a flat margin over the whole design. However, this is overly pessimistic, since some cells are more prone to variations than others. Another approach is to apply cell-based margins, derived from silicon data showing which cells are more prone to variations. There are also methodologies based on other ideas, e.g. location-based margins and statistically calculated margins. As STA advances, more accurate and faster methodologies keep coming into existence.
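The effect of a flat margin can be illustrated with a simplified setup check. The sketch below is not any tool's actual algorithm. It slows down the launch clock path and the data path by a "late" factor, speeds up the capture clock path by an "early" factor, and recomputes the slack. The function name setup_slack, the derate factors (1.1 and 0.9), and all delay values are hypothetical, and the common part of the clock path is not treated specially.

```python
def setup_slack(period, launch_clk, data, capture_clk, setup_time,
                late_derate=1.0, early_derate=1.0):
    """Setup slack in ns with flat derates applied (simplified model).

    Launch clock and data delays are multiplied by late_derate,
    the capture clock delay by early_derate.
    """
    arrival = (launch_clk + data) * late_derate
    required = period + capture_clk * early_derate - setup_time
    return required - arrival

# Hypothetical path, all values in ns.
path = dict(period=2.0, launch_clk=0.5, data=1.3, capture_clk=0.5, setup_time=0.1)

slack_nominal = setup_slack(**path)
slack_with_ocv = setup_slack(**path, late_derate=1.1, early_derate=0.9)

print(f"slack without margin:   {slack_nominal:.2f} ns")
print(f"slack with flat margin: {slack_with_ocv:.2f} ns")
```

Because the same factors are applied to every cell, a path made of cells that vary little is penalized as much as one made of cells that vary a lot. This is the pessimism that cell-based and statistical methods try to reduce.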

Latency and throughput – the two measures of system performance

The performance of a system is one of the most stringent criteria for its success. While performance increases its desirability among customers, cost is what makes it affordable. This is why system designers aim for maximum performance within the available resources, such as power and area. There are two related parameters that determine the performance of a system:

Throughput - Throughput is a measure of the productivity of a system. In electronic/communication systems, throughput refers to the rate at which output data is produced. The higher the throughput, the more productive the system. In most cases, it is derived from the time difference between two consecutive outputs (the nth and the (n+1)th); the throughput is the reciprocal of this interval. Throughput also refers to the rate at which input data can be applied to the system.
Let us discuss this with the help of an example:

throughput summary diagram


The above figure depicts the throughput of a 3-number adder. The result of the input set applied in the 1st clock cycle appears at the output in the 3rd clock cycle; the next input set is applied in the 4th clock cycle, and its output appears in the 6th clock cycle. Hence, the throughput of the above design is ⅓ per clock cycle, i.e. one result every three clock cycles. As we can see from the diagram, the first input is applied in the 1st clock cycle and the second input in the 4th clock cycle. Hence, we can also say that throughput is the rate at which input data can be applied to the system.
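The arithmetic of this example can be written down as a small sketch. The helper name throughput is illustrative; it takes the clock cycles at which consecutive events (outputs or inputs) occur and returns the average number of events per clock cycle.

```python
from fractions import Fraction

def throughput(event_cycles):
    """Events per clock cycle, from the cycles at which consecutive events occur."""
    intervals = len(event_cycles) - 1
    elapsed_cycles = event_cycles[-1] - event_cycles[0]
    return Fraction(intervals, elapsed_cycles)

output_cycles = [3, 6]  # outputs appear in the 3rd and 6th clock cycles
input_cycles = [1, 4]   # inputs are applied in the 1st and 4th clock cycles

print(throughput(output_cycles))  # 1/3 result per clock cycle
print(throughput(input_cycles))   # 1/3 input set per clock cycle
```

Both calls give the same value, which matches the observation that throughput can be read either from the output side or from the input side.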

Latency - Latency is the time taken by a system to produce an output after an input is applied. It is a measure of the response delay of a design. The higher the latency, the slower the system. In synchronous designs, it is measured as a number of clock cycles. In combinational designs, latency is basically the propagation delay of the circuit. In non-pipelined designs, improving latency is the major area of concern. In more general terms, it is the difference between the time at which the output appears and the time at which the input was applied.
Latency
Relationship between throughput and latency: Latency and throughput are inter-related. It is desirable to have maximum throughput and minimum latency. Reducing latency and/or increasing throughput might make the system costlier. Let us take an example. Consider a park with 3 rides, where each ride takes 5 minutes. A child takes these rides sequentially, i.e. ride 1, then ride 2, and then ride 3. First, let us assume that only one child is allowed in the park at a time. While he is taking the rides, no one else is allowed to enter the park. Thus, the throughput of the park is one child per 15 minutes and the latency is 15 minutes. Now, let us assume that as soon as a child has finished ride 1, another child is allowed to enter the park. In this case, the throughput is one child per 5 minutes, whereas the latency is still 15 minutes. Thus, we have increased the throughput of the system without affecting its latency, and at the same cost.
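The park example can be checked with a few lines of code. The sketch below is only an illustration of the numbers above. The function name park_metrics and the parameter entry_interval (the minutes between two consecutive children entering the park) are names introduced here.

```python
RIDES = 3
RIDE_MINUTES = 5

def park_metrics(num_children, entry_interval):
    """Return (latency, minutes between consecutive exits) for the park.

    Children enter every entry_interval minutes and take all rides back to back.
    """
    entries = [i * entry_interval for i in range(num_children)]
    exits = [t + RIDES * RIDE_MINUTES for t in entries]
    latency = exits[0] - entries[0]
    exit_gap = exits[1] - exits[0]
    return latency, exit_gap

# One child in the park at a time: the next child enters after 15 minutes.
print(park_metrics(num_children=4, entry_interval=RIDES * RIDE_MINUTES))  # (15, 15)

# A new child enters as soon as ride 1 is free: every 5 minutes.
print(park_metrics(num_children=4, entry_interval=RIDE_MINUTES))          # (15, 5)
```

In both cases each child spends 15 minutes in the park, but in the second case a child leaves every 5 minutes instead of every 15, which is the same idea as pipelining a design.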