
On-chip bus power reduction techniques



The process of data transmission on an on-chip bus leads to switching activity on the bus wires, which charges and discharges the capacitance associated with the wires and consequently leads to dynamic power dissipation.
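As a rough sketch of this relationship, the dynamic power of a single bus wire is commonly approximated by the formula below. The symbols are assumptions introduced here for illustration: \alpha is the probability per clock cycle that the wire makes a 0-to-1 transition, C_wire is the wire capacitance, V_dd is the supply voltage and f is the clock frequency.

```latex
P_{\mathrm{dyn}} \approx \alpha \, C_{\mathrm{wire}} \, V_{dd}^{2} \, f
```

Under this approximation, anything that lowers the switching activity \alpha lowers the dynamic power proportionally.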
Bus encoding is a widely used technique for reducing dynamic switching power. In any encoding scheme, an encoder at the sender encodes the signal and a decoder at the receiver recovers it by applying the inverse function. Power-reduction encoding techniques fall into two categories: a) self-switching power reduction and b) coupling power reduction.
Self-switching is the toggling of a bit between 0 and 1 on a wire over time, which charges and discharges that wire’s capacitance with respect to its metal layer. The following techniques address this power dissipation:
1.      Address bus encoding. It exploits the high regularity of address streams, which exhibit spatial and temporal locality.
a.       Gray code encoding. This scheme guarantees exactly one bit flip per access when addresses are accessed sequentially.
b.      T0 code. It adds an extra signal to the bus that indicates whether the current address is the sequential successor of the previous one. If it is, the address bus is not toggled, and the receiver is responsible for calculating the address from the previous one.
c.       T0-C code. Here the extra signal is eliminated; instead, sending a new address indicates that the sequential run has ended.
2.      Data bus encoding. Unlike the address bus, the data bus does not show any regularity and can be considered random. Therefore, spatial and temporal locality cannot be exploited effectively.
a.       Bus-invert code. It computes the Hamming distance (the number of differing bits) between the current value on the bus and the next value, and inverts the next value if the distance is greater than half of the bus width. An additional signal indicates that the value is inverted.
b.      Transition signaling. In this scheme a logical 1 is represented by a level transition (from 0 to 1 or from 1 to 0), while a logical 0 causes no transition. The number of transitions on the bus therefore equals the number of 1s, so the scheme is effective for data with fewer 1s than 0s.
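The two data bus schemes above can be modeled in a few lines. The following is a minimal behavioral sketch in Python rather than RTL. The function names, the assumption that the bus starts at all zeros, and the sample words are illustrative and do not come from a particular implementation.

```python
def popcount(x):
    """Number of 1 bits in x."""
    return bin(x).count("1")


def transitions(values, start=0):
    """Total number of wire toggles when `values` are driven one after another."""
    total, prev = 0, start
    for v in values:
        total += popcount(v ^ prev)
        prev = v
    return total


def bus_invert_encode(words, width):
    """Return (bus_value, invert_flag) pairs; the bus is assumed to start at 0."""
    mask = (1 << width) - 1
    prev, encoded = 0, []
    for w in words:
        w &= mask
        # Invert when more than half of the lines would otherwise toggle.
        if popcount(w ^ prev) > width / 2:
            sent, inv = ~w & mask, 1
        else:
            sent, inv = w, 0
        encoded.append((sent, inv))
        prev = sent
    return encoded


def bus_invert_decode(encoded, width):
    """Undo bus-invert: re-invert every word whose flag is set."""
    mask = (1 << width) - 1
    return [(~sent & mask) if inv else sent for sent, inv in encoded]


def transition_encode(words):
    """Transition signaling: a 1 bit toggles its wire, a 0 bit leaves it alone."""
    bus, out = 0, []
    for w in words:
        bus ^= w
        out.append(bus)
    return out


def transition_decode(bus_values):
    """Recover the data as the XOR of consecutive bus values."""
    prev, out = 0, []
    for b in bus_values:
        out.append(b ^ prev)
        prev = b
    return out


words = [0x00, 0xFF, 0x0F, 0xF0]
enc = bus_invert_encode(words, 8)
# Count toggles on the 8 data lines plus the invert line (modeled as bit 8).
raw_toggles = transitions(words)
enc_toggles = transitions([sent | (inv << 8) for sent, inv in enc])
print(raw_toggles, enc_toggles)  # 20 toggles unencoded, 7 with bus-invert
```

For this hand-picked sequence bus-invert removes most of the toggles, even after counting the extra invert line. On random data the gain is smaller.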
Coupling power is dissipated when crosstalk occurs between different wires of the bus. The following techniques address this power dissipation:
1.      Address bus encoding.
a.       Permutation of address bus lines. It is done at the physical design stage to reduce coupling, and can be achieved by an orthogonal layout of the wires or by routing them through different metal layers.
2.      Data bus encoding.
a.       CBI (coupling bus-invert). It is very similar to the bus-invert code explained above, but it decides whether to invert the data so as to reduce cross-coupling between wires rather than self-switching.
b.      Transition pattern coding (TPC). It adds a signal to the bus in order to encode the data into codeword patterns in which neighboring lines change in phase.
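To make the idea of coupling-driven inversion concrete, here is a simplified behavioral sketch in Python. The cost model is an assumption chosen for illustration: each pair of adjacent wires costs 0 when both hold or switch in the same direction, 1 when only one of them switches, and 2 when they switch in opposite directions. Published CBI variants may weight these cases differently, and the coupling of the invert line itself is ignored here. The function names are hypothetical.

```python
def coupling_cost(prev, curr, width):
    """Simplified coupling cost between adjacent wires (assumed model).

    Each wire gets d = +1 (rise), -1 (fall) or 0 (hold); every adjacent
    pair contributes |d_i - d_(i+1)|.
    """
    delta = [((curr >> i) & 1) - ((prev >> i) & 1) for i in range(width)]
    return sum(abs(delta[i] - delta[i + 1]) for i in range(width - 1))


def cbi_encode(words, width):
    """Send each word plain or inverted, whichever has the lower coupling cost."""
    mask = (1 << width) - 1
    prev, encoded = 0, []  # the bus is assumed to start at all zeros
    for w in words:
        plain, inverted = w & mask, ~w & mask
        if coupling_cost(prev, inverted, width) < coupling_cost(prev, plain, width):
            sent, inv = inverted, 1
        else:
            sent, inv = plain, 0
        encoded.append((sent, inv))
        prev = sent
    return encoded


# 0101 -> 1010 makes every neighbor pair switch in opposite directions,
# so the second word is sent inverted and the data lines do not move at all.
print(cbi_encode([0b0101, 0b1010], 4))
```

Note that all wires switching together (for example 0000 to 1111) has zero cost in this model, which is the "in phase" behavior that coupling-aware codes try to favor.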

For more power reduction schemes, refer to the book On-Chip Communication Architectures.

Courtesy www.shellbr.com.

Asynchronous FIFO


An asynchronous FIFO is a bus synchronization technique that does not depend on the frequency relationship between the two clocks, which makes it practically universal.

It is convenient to make the write and read pointers one bit wider than the FIFO size requires. The MSB then plays the role of a “sign”. The pointers (buses) are synchronized with the help of Gray encoding. Gray code encoding is a popular technique for synchronizing a bus because only one bit changes at a time. This ensures that we always sample either the old or the new value on the bus, and never an inconsistent one. “g2b” and “b2g” are the logic blocks that convert Gray code to binary and vice versa. Their design is outside the scope of this article.
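Although the RTL of the conversion blocks is not covered here, the conversions themselves are short. The following is a minimal behavioral sketch in Python, reusing the names b2g and g2b from the text. It models only the arithmetic, not the hardware.

```python
def b2g(b):
    """Binary to Gray: XOR each bit with the bit above it."""
    return b ^ (b >> 1)


def g2b(g):
    """Gray to binary: each binary bit is the XOR of all Gray bits at or above it."""
    b = 0
    while g:
        b ^= g
        g >>= 1
    return b


codes = [b2g(i) for i in range(8)]
print([format(c, "03b") for c in codes])
# ['000', '001', '011', '010', '110', '111', '101', '100']
```

Consecutive codes differ in exactly one bit, which is the property the pointer synchronization relies on.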



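// Pointers are AW+1 bits wide: AW address bits plus one wrap ("sign") bit.
localparam AW = $clog2(FIFO_SIZE);
//write pointer (source clock domain)
always @(posedge src_clk)
      if (!rst_n)
                  wr_ptr <= 'd0;
      else if (push)
                  wr_ptr <= wr_ptr + 1'b1;
//read pointer (destination clock domain)
always @(posedge dst_clk)
      if (!rst_n)
                  rd_ptr <= 'd0;
      else if (pop)
                  rd_ptr <= rd_ptr + 1'b1;
//full: wrap bits differ and address bits match (rd_ptr_synch is rd_ptr synchronized to src_clk)
assign full = (wr_ptr[AW] ^ rd_ptr_synch[AW]) &&
              (wr_ptr[AW-1:0] == rd_ptr_synch[AW-1:0]);
//empty: pointers are identical (wr_ptr_synch is wr_ptr synchronized to dst_clk)
assign empty = (wr_ptr_synch[AW:0] == rd_ptr[AW:0]);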
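The full and empty comparisons can be checked with a small behavioral model. This is a sketch in Python under simplifying assumptions: FIFO_SIZE is fixed at 4 for illustration, the helper names are hypothetical, and the synchronization delay between the two clock domains is ignored, so each side sees the other pointer immediately.

```python
FIFO_SIZE = 4                      # assumed depth, must be a power of two
AW = FIFO_SIZE.bit_length() - 1    # address bits, log2(FIFO_SIZE)
PTR_MASK = (1 << (AW + 1)) - 1     # pointers carry one extra wrap bit


def advance(ptr):
    """Increment a pointer, wrapping at 2 * FIFO_SIZE."""
    return (ptr + 1) & PTR_MASK


def is_full(wr_ptr, rd_ptr):
    """Full: wrap bits differ while the address bits are equal."""
    wrap_differs = ((wr_ptr ^ rd_ptr) >> AW) & 1
    same_slot = (wr_ptr & (FIFO_SIZE - 1)) == (rd_ptr & (FIFO_SIZE - 1))
    return bool(wrap_differs and same_slot)


def is_empty(wr_ptr, rd_ptr):
    """Empty: the pointers are identical, wrap bit included."""
    return wr_ptr == rd_ptr


wr = rd = 0
print(is_empty(wr, rd), is_full(wr, rd))   # True False
for _ in range(FIFO_SIZE):                 # push until the FIFO fills up
    wr = advance(wr)
print(is_empty(wr, rd), is_full(wr, rd))   # False True
```

The extra pointer bit is what lets the model tell a completely full FIFO from an empty one, since the address bits alone are equal in both cases.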

The important thing to remember is that the size of the FIFO has to be exactly a power of two. In any other case the pointer wrap-around causes multiple bit transitions even with Gray code encoding, which violates the requirement that only one bit of the synchronized bus changes at a time.
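The effect is easy to see by counting how many Gray-coded bits change when a plain binary counter wraps back to zero. The snippet below is a minimal Python sketch; wrap_bit_flips is a hypothetical helper name, and it assumes the counter is converted with the standard binary-to-Gray formula.

```python
def b2g(b):
    """Binary to Gray conversion."""
    return b ^ (b >> 1)


def wrap_bit_flips(count):
    """Gray-coded bits that change when a counter wraps from count-1 back to 0."""
    return bin(b2g(count - 1) ^ b2g(0)).count("1")


for count in (8, 16, 6, 12):
    print(count, wrap_bit_flips(count))
# 8 and 16 wrap with a single bit flip; 6 and 12 wrap with three
```

With a power-of-two count the wrap is just another single-bit step. With any other count several bits flip at once, and a synchronizer may sample a mixture of old and new bits.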


Half-handshake synchronization scheme

Synchronization questions are among the favorites of VLSI job interviewers. This is because they test not just the general intellectual abilities of the candidate but also very specific professional knowledge that is usually acquired only through experience. When it comes to synchronization, there are plenty of schemes. As the interview progresses, it often arrives at the "ultimate" solution: a synchronizer that tolerates any source-destination conditions (relative frequencies, signal durations, etc.). The expected answer is the well-known full-handshake scheme. It is definitely the "ultimate" solution, but its generality comes at the cost of a very long processing cycle (6 source + 6 destination cycles).

Less well known is the half-handshake synchronization scheme. It differs from the full-handshake scheme in that it uses signal toggling, rather than signal level, as the indication that carries synchronization information from one side to the other.



On both the source and destination sides, a toggle (a change from 0 to 1 or vice versa) of the synchronized valid or ack signal indicates that the synchronized output may be issued and/or the state changed. A toggle can be detected by comparing (XORing) the next and current values of the signal. The current value therefore needs to be latched on every processing cycle.
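The toggle detection can be sketched as a cycle-based behavioral model. The Python class below is an illustration only. It assumes a two-flop synchronizer followed by a third flop that holds the previous synchronized value; the names ToggleSync, ff1, ff2 and ff3 are hypothetical, and metastability is not modeled.

```python
class ToggleSync:
    """Two-flop synchronizer plus XOR edge detector for a toggling signal."""

    def __init__(self):
        self.ff1 = self.ff2 = self.ff3 = 0

    def clock(self, async_in):
        """Apply one receiving-side clock edge and return the pulse output."""
        # All flops update at once from their values before the edge.
        self.ff3, self.ff2, self.ff1 = self.ff2, self.ff1, async_in
        # ff2 is the synchronized value, ff3 is the value one cycle earlier.
        return self.ff2 ^ self.ff3


sync = ToggleSync()
incoming = [0, 1, 1, 1, 0, 0, 0, 0]   # the sender toggles twice
pulses = [sync.clock(v) for v in incoming]
print(pulses)  # [0, 0, 1, 0, 0, 1, 0, 0]
```

Each toggle of the incoming signal, in either direction, produces exactly one single-cycle pulse on the receiving side, so the sender never has to return the signal to an idle level.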

The half-handshake scheme has a processing cycle two times shorter than that of the full-handshake scheme, because it consists only of a synchronization-acknowledge cycle rather than synchronization, acknowledge, synchronization de-assertion, and acknowledge de-assertion.