KR100221315B1

KR100221315B1 - A pipelined adder

Info

Publication number: KR100221315B1
Application number: KR1019960072052A
Authority: KR
Inventors: 이수정
Original assignee: 전주범; 대우전자주식회사
Priority date: 1996-12-26
Filing date: 1996-12-26
Publication date: 1999-09-15
Also published as: KR19980053016A

Abstract

The present invention relates to a multi-operand addition circuit and, more particularly, to a pipeline adder in which the merge adder that performs the final addition of the two outputs of the last Wallace tree stage is built as a pipeline, reducing the operating cycle period. The pipeline structure comprises: a first adder (60) that, in a first clock period, adds a first bit group (A[0:k-1], B[0:k-1]) and outputs a first sum (S[0:k-1]) and a first carry (C[k]); a second adder (61) that, in a second clock period, adds the first carry (C[k]) and a second bit group (A[k:2k-1], B[k:2k-1]) and outputs a second sum (S[k:2k-1]) and a second carry (C[2k]); a third adder (62) that, in a third clock period, adds the second carry (C[2k]) and a third bit group (A[2k:3k-1], B[2k:3k-1]) and outputs a third sum (S[2k:3k-1]) and a third carry (C[3k]); and a fourth adder (63) that, in a fourth clock period, adds the third carry (C[3k]) and a fourth bit group (A[3k:n-1], B[3k:n-1]) and outputs a fourth sum (S[3k:n-1]) and a fourth carry (C[n]). Because the internal circuit of the merge adder, which previously had a large delay relative to the upper stages, is itself pipelined, the delays of all stages become nearly equal and the operating frequency of the entire pipeline increases compared with the prior art.

Description

Pipeline adder

The present invention relates to a multi-operand addition circuit and, more particularly, to a pipeline adder whose operating cycle period is reduced by building the merge adder, which performs the final addition of the two outputs of the last Wallace tree stage, as a pipeline.

In general, the Wallace tree is the most widely used multioperand addition circuit, that is, a circuit that adds multiple operands. Such circuits are essential in the arithmetic units of high-performance computers, in signal processing systems, and in various special-purpose chips.

Here, the Wallace tree is described with reference to C. S. Wallace's paper "A Suggestion for a Fast Multiplier," published in an IEEE journal in 1964.

FIG. 1A illustrates the structure of a Wallace tree.

As shown in FIG. 1A, the three adders in the first stage are three-input, two-output full adders. Their input lines receive nine rows of partial products; each adder passes its sum and carry on to the next adder, and the adder in the last stage drives the two output lines (CARRY, SUM).

FIG. 1B illustrates a modified Wallace tree.

As shown in FIG. 1B, the carry output (C) and the sum output (S) of a full adder have different delays, so outputs with the same delay are grouped and processed together. As a result, the modified Wallace tree has a smaller gate delay than the conventional Wallace tree, and the reduction in gate delay grows as the number of input bits to the tree increases.

In this way, a Wallace tree is built from half adders, full adders, or counters, and can add two or more operands in the shortest time.

Depending on the application, when there is a large amount of data to process, the Wallace tree shown above is sometimes pipelined to process data in parallel.

Pipeline processing is one way of speeding up data processing. To raise system efficiency, two or more processors handle different parts of the work in parallel, so that the result produced by one processor is used as the input of the next. In other words, data fed continuously into one end of the pipeline is processed in sequence by each stage, and a continuous stream of output is obtained at the other end. By subdividing the data processing as far as possible and assigning the pieces to the stages, a single physical pipeline executes as many operations in parallel as it has stages.

FIG. 2 illustrates general pipeline processing.

Accordingly, the pipeline clock period is determined by the stage with the longest delay, as given by the following equation.

[Equation 1]

Equation 1 expresses the clock period in terms of the delay of each stage and the delay of each interface latch: the sum of the longest stage delay and the interface latch delay determines the clock period of the pipeline, and each interface latch is synchronized to that clock.
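The relationship described for Equation 1 can be sketched numerically. The following is a minimal illustration, not the patent's own formula: the function name and the delay values (in nanoseconds) are hypothetical and are chosen only to show how one slow stage sets the clock for the whole pipeline.

```python
def pipeline_clock_period(stage_delays, latch_delay):
    """Clock period = longest stage delay + interface latch delay."""
    if not stage_delays:
        raise ValueError("at least one stage is required")
    return max(stage_delays) + latch_delay


# Hypothetical delays in nanoseconds (assumed values, not from the patent).
stage_delays_ns = [2.0, 2.0, 7.0]   # the last entry models a slow merge adder
latch_delay_ns = 0.5

period_ns = pipeline_clock_period(stage_delays_ns, latch_delay_ns)
# Index of the stage that limits the clock.
bottleneck = stage_delays_ns.index(max(stage_delays_ns))
```

With these assumed numbers the first two stages finish long before the clock edge, which is the idle time the description attributes to the upper stages.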

When a massive amount of data must be added, the pipelined Wallace tree described above is used.

As one example, there is an adder structure proposed by the present inventor, which performs high-speed addition through a pipeline using a Wallace tree in the addition required to obtain the absolute error value during motion estimation.

As shown in FIG. 1, the Wallace tree has two output lines, so obtaining the final sum requires adding the two output values with a general adder. This is precisely the problematic point in a pipeline that uses a Wallace tree.

To aid understanding of the characteristics of the Wallace tree, the operation of a Wallace tree that reduces four data words to two is described with reference to FIGS. 3 and 4.

FIG. 3 illustrates the characteristics of a Wallace tree that adds four data words and outputs two data words.

As shown in FIG. 3, the Wallace tree receives four data words, groups and adds the bits at the same position to generate a sum (SUM) and a carry (CARRY), and outputs two data words by treating the SUM bits as one word and the CARRY bits as another. In this way, no carry propagation occurs.

That is, the four data words A, B, C, and D each consist of 8 bits, and addition is performed between bits at the same position. First, the bits of A, B, and C are added in groups of three. The SUM of each group has the weight of the same bit position, and the CARRY has the weight of the next higher bit position. These are then grouped again in threes with the bits of the remaining word D that have the same weight, and added. Again, the SUM of each group has the weight of the same bit position and the CARRY has the weight of the next higher bit position, so the two resulting data words X and Y form the output pair of the Wallace tree.
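The four-to-two reduction described above can be sketched with word-wide bit operations. This is an illustrative model under stated assumptions, not the circuit of FIG. 4: the function names are hypothetical, Python integers are used in place of fixed 8-bit words, and each bitwise expression stands for one full adder per bit position.

```python
def carry_save_add(a, b, c):
    """Reduce three words to two with one full adder per bit position."""
    sum_word = a ^ b ^ c                              # SUM keeps the same bit weight
    carry_word = ((a & b) | (b & c) | (a & c)) << 1   # CARRY moves one position up
    return sum_word, carry_word


def wallace_4_to_2(a, b, c, d):
    """Reduce four words A, B, C, D to an output pair X, Y."""
    s1, c1 = carry_save_add(a, b, c)    # first level: A, B, C
    x, y = carry_save_add(s1, c1, d)    # second level: SUM, CARRY, and D
    return x, y


x, y = wallace_4_to_2(200, 100, 50, 25)
# No carry ripples inside the tree; the pair still represents the full total.
total = x + y
```

The final `x + y` is the one step that needs carry propagation, which is the job left to the merge adder.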

Here the two output words X and Y can extend to as many as 9 bits, but when the application guarantees that the sum is not very large, the most significant bit (the carry) can be ignored.

Referring to FIG. 4, the bits of equal weight from three of the four input data words shown in FIG. 3 (A, B, and C) are added by eight first full adders (41). The sums and carries output from the first full adders (41) and the bits of equal weight from the remaining input word are added by seven second full adders (42), producing the two data words X and Y.

The structure of a pipeline adder using a Wallace tree is shown in FIG. 3. When building such a pipeline, the most important factor is, as already explained, the delay of each stage.

However, the merge adder in the last stage incurs carry delay when adding the two output values, so it has the largest delay, and the operating frequency of the whole system is therefore determined by the delay of the merge adder.

In a pipeline adder using a Wallace tree, the final stage must add the two data words output by the Wallace tree of the preceding stage to obtain the final value.

No carry propagation time arises in the upper stages, that is, in the Wallace tree. In the last stage, however, carry propagation does occur, so its delay is considerably larger than that of the upper stages, and the clock frequency of the entire pipeline is determined by the delay of the last stage.

Consequently, each upper stage is left with idle time equal to the difference between the delay of the last stage and its own delay.

The present invention was devised to solve the above problem of the prior art. Its object is to provide a pipeline adder with improved performance: the delay of the merge adder, which incurs carry delay in the last stage of the pipeline, is divided into parts that are each less than or equal to the delay of the other stages, so that the merge adder itself performs pipeline processing with that many stages and the overall operating frequency increases.

To achieve this object, the pipeline adder of the present invention is a merge adder that, in pipeline processing built from Wallace trees and operating at a clock frequency f, performs the final addition of the two binary data words output by the last Wallace tree and outputs one data word, and is characterized by comprising:

a first adder that, in a first clock period, adds a first bit group (A[0:k-1], B[0:k-1]) consisting of a plurality of bits including the least significant bit of the input data (n-bit data) and outputs a first sum (S[0:k-1]) and a first carry (C[k]); a second adder that, in a second clock period, adds the first carry (C[k]) and a second bit group (A[k:2k-1], B[k:2k-1]) and outputs a second sum (S[k:2k-1]) and a second carry (C[2k]); a third adder that, in a third clock period, adds the second carry (C[2k]) and a third bit group (A[2k:3k-1], B[2k:3k-1]) and outputs a third sum (S[2k:3k-1]) and a third carry (C[3k]); and a fourth adder that, in a fourth clock period, adds the third carry (C[3k]) and a fourth bit group (A[3k:4k-1], B[3k:4k-1]) and outputs a fourth sum (S[3k:4k-1]) and a fourth carry (C[4k]).

Because the internal circuit of the merge adder, which previously had a large delay relative to the upper stages, is itself pipelined in this way, the delays of all stages become nearly equal. The operating frequency of the entire pipeline therefore increases compared with the prior art, and the greater the load to be processed in parallel, the greater the performance improvement that can be expected.

FIG. 1 illustrates the concept of a general Wallace tree:

FIG. 1A is a structural diagram of a basic Wallace tree;

FIG. 1B is a structural diagram of a modified Wallace tree.

FIG. 2 illustrates general pipeline processing.

FIG. 3 illustrates the operation of a Wallace tree that reduces four data words to two.

FIG. 4 is a block diagram of a four-input, two-output Wallace adder that implements the Wallace tree of FIG. 3 with full adders.

FIG. 5 shows block diagrams of adders that can be used as the merge adder in the final pipeline stage:

FIG. 5A is a block diagram of an 8-bit ripple adder;

FIG. 5B is a block diagram of an 8-bit carry select adder;

FIG. 5C is a block diagram of an 8-bit carry lookahead adder.

FIG. 6 is a block diagram of a merge adder given a pipeline structure by subdividing its operation according to the present invention.

* Explanation of reference numerals for the main parts of the drawings

60: first adder    61: second adder

62: third adder    63: fourth adder

L1 to L5: latch units

Hereinafter, preferred embodiments of the present invention are described in detail with reference to the accompanying drawings.

First, the general adders that can add the two data words output from the Wallace tree in the final stage of the pipeline are examined with reference to FIG. 5.

FIG. 5 shows block diagrams of adders that can be used as the merge adder in the final pipeline stage: FIG. 5A is a block diagram of an 8-bit ripple adder, FIG. 5B of an 8-bit carry select adder, and FIG. 5C of an 8-bit carry lookahead adder.

Referring to FIG. 5A, the 8-bit ripple adder uses eight binary parallel full adders (FA₀ to FA₇) to add all bits of the input signals X₀ to X₇ and Y₀ to Y₇ at the same time. The output carry (COUT = C₁) of the first full adder (FA₀) becomes the input carry (CIN = C₁) of the next higher full adder (FA₁), and each carry and sum is obtained from Equation 2.

[Equation 2]

The ripple adder has the drawback that the time taken to propagate a generated carry to the next higher full adder (the next higher bit), that is, the carry propagation delay, is very large.

That is, the input signals X₇ and Y₇ are ready for addition as soon as they are applied, whereas the carry C₇ cannot be determined until the carry C₆ has propagated. For the same reason C₆ waits for C₅ and C₅ waits for C₄, so the ripple carry adder outputs the correct, error-free sum S₀ to S₇ and carry C₈ only after C₇ has been obtained from C₀ by way of C₁, C₂, …, C₆.
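The carry chain described above can be made concrete with a bit-level model. The sketch below is an assumption-laden illustration: the patent's Equation 2 is not reproduced in this text, so the textbook full adder expressions (sum as a three-way XOR, carry as generate OR propagated carry-in) are used in its place, and the function names are hypothetical.

```python
def full_adder(x, y, c_in):
    """Textbook 1-bit full adder (assumed form; Equation 2 is not shown here)."""
    s = x ^ y ^ c_in
    c_out = (x & y) | (c_in & (x ^ y))
    return s, c_out


def ripple_carry_add(x, y, c0=0, width=8):
    """Add two width-bit words; each bit must wait for the carry below it."""
    carry = c0
    total = 0
    for i in range(width):              # FA0 first, then FA1, ... up to FA7
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        total |= s << i
    return total, carry                 # (S0..S7 packed as an int, C8)


# Worst case for the chain: a carry generated at bit 0 travels through every bit.
worst_sum, worst_carry = ripple_carry_add(0b11111111, 0b00000001)
```

The loop is sequential by construction, which mirrors the dependence of C₇ on C₆, C₆ on C₅, and so on.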

Referring to FIG. 5B, the 8-bit carry select adder comprises a first adder (10), a second adder (12), a third adder (14), a multiplexer (16), and a final carry output selector (18).

First, assume that the 8-bit carry select adder receives and adds the 8-bit input signals X₀ to X₇ and Y₀ to Y₇ and the initial external input signal C₀.

The first adder (10) receives and adds the lower 4-bit input signals X₀ to X₃ and Y₀ to Y₃ and the initial external input signal C₀, and outputs the sum S₀ to S₃ and the propagated carry C₄. The propagated carry C₄ is applied to the multiplexer (16) as its selection signal (SEL).

The second adder (12) outputs the sum S₄ to S₇ and the propagated carry (C₈)₀ for the case where the propagated carry C₄ is "0". That is, the second adder (12) receives and adds the upper 4-bit input signals X₄ to X₇ and Y₄ to Y₇ and a carry-in of "0", and outputs the sum S₄ to S₇ and the propagated carry (C₈)₀. The sum S₄ to S₇ is supplied to the multiplexer (16) as an input signal, and the propagated carry (C₈)₀ is supplied to the final carry output selector (18).

The third adder (14) outputs the sum S₄ to S₇ and the propagated carry (C₈)₁ for the case where the propagated carry C₄ is "1". That is, the third adder (14) receives and adds the upper 4-bit input signals X₄ to X₇ and Y₄ to Y₇ and a carry-in of "1", and outputs the sum S₄ to S₇ and the propagated carry (C₈)₁. The sum S₄ to S₇ is supplied to the multiplexer (16) as an input signal, and the propagated carry (C₈)₁ is supplied to the final carry output selector (18).

The multiplexer (16) selects, according to the selection signal C₄, either the sum S₄ to S₇ from the second adder (12) or the sum S₄ to S₇ from the third adder (14), and outputs it. The final carry output selector (18) receives the propagated carries C₄, (C₈)₀, and (C₈)₁, and selects and outputs one of (C₈)₀ and (C₈)₁.

That is, expanding the final carry C₈ as a logic expression gives Equation 3.

[Equation 3]

The carry select adder has the drawback of signal delay caused by the redundant adder used for carry selection, the multiplexer, and the final carry selector.
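The carry select scheme of FIG. 5B can be sketched behaviorally. This is a simplified model under stated assumptions: the 4-bit blocks are modeled with integer addition rather than gates, the final carry is chosen with the same select signal as the sums (the exact form of Equation 3 is not reproduced in this text), and all function names are hypothetical.

```python
def add_block(x, y, c_in, width=4):
    """Behavioral model of one 4-bit adder block: returns (sum bits, carry out)."""
    total = x + y + c_in
    return total & ((1 << width) - 1), total >> width


def carry_select_add8(x, y, c0=0):
    """8-bit carry select addition with two speculative upper blocks."""
    lo_sum, c4 = add_block(x & 0xF, y & 0xF, c0)       # first adder (10)
    hi_x, hi_y = (x >> 4) & 0xF, (y >> 4) & 0xF
    sum_if_0, c8_if_0 = add_block(hi_x, hi_y, 0)       # second adder (12), assumes C4 = 0
    sum_if_1, c8_if_1 = add_block(hi_x, hi_y, 1)       # third adder (14), assumes C4 = 1
    hi_sum = sum_if_1 if c4 else sum_if_0              # multiplexer (16), SEL = C4
    c8 = c8_if_1 if c4 else c8_if_0                    # final carry output selector (18)
    return (hi_sum << 4) | lo_sum, c8


result, carry_out = carry_select_add8(0x9C, 0x7A)
```

Both upper blocks run in parallel with the lower block, so the upper half only waits for the select signal, at the cost of the duplicated adder noted above.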

Referring to FIG. 5C, the 8-bit carry lookahead adder comprises a carry generate signal (G_i) generator (20), a carry propagate signal (P_i) generator (22), a carry generator (24), and an adder (26).

The carry generate signal generator (20) consists of AND gates; when the input signals X_i and Y_i are both "1", it sets the carry output C_{i+1} to "1" regardless of the carry input C_i.

The carry propagate signal generator (22) consists of exclusive-OR gates; when exactly one of the input signals X_i and Y_i is "1", the carry input C_i is passed through to the carry output C_{i+1}.

The carry generator (24) receives the carry generate signals G_i, the carry propagate signals P_i, and the initial external carry C₀, generates the lookahead carries C_{i+1}, and outputs them to the adder (26).

The adder (26) adds the carry propagate signal P_i and the lookahead carry C_i and outputs the sum S_i.

Expanding the sum S_i, defined by the carry propagate signal and the lookahead carry, as a logic expression gives Equation 4.

[Equation 4]

Likewise, expanding the carry generate signal G_i and the carry propagate signal P_i as logic expressions gives Equation 5.

[Equation 5]

Expanding the lookahead carry C_{i+1}, defined by the carry generate signal and the carry propagate signal, as a logic expression gives Equation 6.

[Equation 6]

Applying this lookahead carry expression to the lookahead carries C₁ to C₄ gives Equations 7 to 10.

[Equation 7]

[Equation 8]

[Equation 9]

[Equation 10]

As the above equations show, the lookahead carry output C_{i+1} can be generated in advance from the input signals X_i and Y_i and the initial external carry C₀, independently of the carry input C_i, so the carry lookahead adder reduces carry propagation delay and speeds up addition. However, although the lookahead carries C₁ to C₄ are expressed with two levels of NAND gates, a lookahead carry can no longer be expressed with two levels of NAND gates as its weight (C₀ = 2⁰, C₁ = 2¹, C₂ = 2², …, C_n = 2ⁿ) keeps increasing.

In other words, although carry propagation delay is eliminated most effectively, the complexity of the lookahead carry generation logic grows with the weight of the lookahead carry, so the hardware grows and the critical path to the final result becomes longer.
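The lookahead scheme can be sketched from the verbal definitions above: G_i from an AND gate, P_i from an exclusive-OR gate, and S_i from P_i and C_i. The patent's Equations 4 to 10 are not reproduced in this text, so the sketch assumes the standard recurrence in which a carry is either generated at a bit or propagated from below, expanded so that every carry depends only on G, P, and C₀. Function and variable names are hypothetical.

```python
def carry_lookahead_add(x, y, c0=0, width=8):
    """Carry lookahead addition with every carry expanded in terms of G, P, C0."""
    g = [((x >> i) & (y >> i)) & 1 for i in range(width)]   # generate: AND gate
    p = [((x >> i) ^ (y >> i)) & 1 for i in range(width)]   # propagate: XOR gate

    carries = [c0]
    for i in range(width):
        # C[i+1] = G[i] + P[i]G[i-1] + P[i]P[i-1]G[i-2] + ... + P[i]...P[0]C0
        c_next = g[i]
        prefix = p[i]
        for j in range(i - 1, -1, -1):
            c_next |= prefix & g[j]
            prefix &= p[j]
        c_next |= prefix & c0
        carries.append(c_next)

    total = 0
    for i in range(width):
        total |= (p[i] ^ carries[i]) << i                   # S[i] from P[i] and C[i]
    return total, carries[width]


def product_terms(i):
    """Number of AND terms in the expanded expression for C[i+1]."""
    return i + 2
```

The inner loop grows by one term per bit position, which reflects the growth in carry generation logic that the description identifies as the limit of this approach.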

Meanwhile, the critical path of the Wallace tree, as shown in FIG. 4, passes through the first full adder and the second full adder. The conventional general adders examined above, as shown in FIG. 5, add a carry generation unit, a carry propagation unit, or a carry lookahead unit, so their critical paths are considerably longer than that of the Wallace tree and their delays are correspondingly longer.

Therefore, the present invention subdivides the final stage, which has the longest delay, to balance the delays of the stages and raise the operating frequency.

FIG. 6 is a block diagram of an adder given a pipeline structure by subdividing the operation of the merge adder according to the present invention.

As shown in FIG. 6, the inputs of the pipeline adder are two 8-bit data words A[0:7] and B[0:7]. The input data are divided into 2-bit groups: data bits A[0:1], B[0:1] form the first bit group, A[2:3], B[2:3] the second bit group, A[4:5], B[4:5] the third bit group, and A[6:7], B[6:7] the fourth bit group.

The pipeline adder consists of first to fourth adders (60 to 63) and first to fifth latch units (L1 to L5).

The first latch unit (L1) latches all bit groups (A[0:7], B[0:7]) of the two input data words, and the first adder (60) adds the first bit group (A[0:1], B[0:1]) of the input data and outputs the first sum (S[0:1]) and the first carry (C[2]).

The second latch unit (L2) latches the first sum (S[0:1]) and the first carry (C[2]) output from the first adder (60), together with the second to fourth bit groups (A[2:7], B[2:7]).

The second adder (61) adds the first carry (C[2]) and the second bit group (A[2:3], B[2:3]) and outputs the second sum (S[2:3]) and the second carry (C[4]).

The third latch unit (L3) latches the first sum (S[0:1]) output from the first adder (60), the second sum (S[2:3]) and the second carry (C[4]) output from the second adder (61), and the third to fourth bit groups (A[4:7], B[4:7]).

The third adder (62) adds the second carry (C[4]) and the third bit group (A[4:5], B[4:5]) and outputs the third sum (S[4:5]) and the third carry (C[6]).

The fourth latch unit (L4) latches the first sum (S[0:1]) output from the first adder (60), the second sum (S[2:3]) output from the second adder (61), the third sum (S[4:5]) and the third carry (C[6]) output from the third adder (62), and the fourth bit group (A[6:7], B[6:7]).

The fourth adder (63) adds the third carry (C[6]) and the fourth bit group (A[6:7], B[6:7]) and outputs the fourth sum (S[6:7]) and the fourth carry (C[8]).

The fifth latch unit (L5) latches the first sum (S[0:1]) output from the first adder (60), the second sum (S[2:3]) output from the second adder (61), the third sum (S[4:5]) output from the third adder (62), and the fourth sum (S[6:7]) output from the fourth adder (63).

Next, the operation and effects of this embodiment are described in detail.

In adding the binary input data A[0:7] and B[0:7], the pipeline adder of the present invention computes the first sum S[0:1] and the first carry C[2] through the first adder (60) during the first clock, the second sum S[2:3] and the second carry C[4] through the second adder (61) during the second clock, the third sum S[4:5] and the third carry C[6] through the third adder (62) during the third clock, and the fourth sum S[6:7] and the fourth carry C[8] through the fourth adder (63) during the fourth clock. That is, the final sum S[0:7] of the two input data words is obtained during the fifth clock.
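The clock-by-clock behavior described above can be simulated in software. The sketch below is a behavioral model under stated assumptions, not the circuit of FIG. 6: each latch is modeled as a tuple holding the operands, the partial sum, and the carry, the carry C[8] is carried into the last latch for convenience, and all names are hypothetical. It shows several operand pairs in flight at once, each one advancing by one 2-bit group per clock.

```python
GROUP_BITS = 2                      # bits per group, as in the 8-bit embodiment
NUM_STAGES = 4                      # models adders 60 to 63
MASK = (1 << GROUP_BITS) - 1


def stage_add(state, stage):
    """Add one bit group plus the carry handed over by the previous stage."""
    a, b, partial_sum, carry = state
    shift = stage * GROUP_BITS
    total = ((a >> shift) & MASK) + ((b >> shift) & MASK) + carry
    partial_sum |= (total & MASK) << shift
    return (a, b, partial_sum, total >> GROUP_BITS)


def run_pipeline(operand_pairs):
    """Clock a stream of (A, B) pairs through five latches (models L1 to L5)."""
    latches = [None] * (NUM_STAGES + 1)
    pending = list(operand_pairs)
    results = []
    while pending or any(entry is not None for entry in latches):
        if latches[NUM_STAGES] is not None:           # last latch holds a finished result
            _, _, final_sum, final_carry = latches[NUM_STAGES]
            results.append((final_sum, final_carry))
        # Update from the back so each latch is read before it is overwritten.
        for stage in range(NUM_STAGES - 1, -1, -1):
            entry = latches[stage]
            latches[stage + 1] = None if entry is None else stage_add(entry, stage)
        latches[0] = (*pending.pop(0), 0, 0) if pending else None
    return results


pairs = [(200, 100), (15, 1), (255, 255)]
sums = run_pipeline(pairs)
```

Each stage only ever ripples a carry across two bits, so in hardware its delay would be a fraction of that of an undivided 8-bit merge adder.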

To summarize the present invention: when a pipeline is built with Wallace trees to process a large number of data words in parallel, the final stage must add the two output data words to obtain a single result. Because the final stage necessarily uses a merge adder in which carry propagation occurs, it has the longest delay of all stages, and this long delay has determined the clock frequency of the entire pipeline.

Therefore, by subdividing the long delay of the merge adder, that is, by dividing the internal circuit of the merge adder itself into several stages and pipelining it, the entire pipeline runs at a relatively higher clock frequency and the processing speed improves.

Of course, as the number of stages grows, the initial latency until the output for the first input appears also grows. However, the faster clock improves the processing speed, and higher efficiency can be expected when the amount of data is large.
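The trade-off between initial latency and throughput can be illustrated with the usual pipeline timing model, in which the pipeline fills once and then delivers one result per clock. This is a rough sketch with hypothetical delay values (arbitrary time units, not taken from the patent); the function names are also assumptions.

```python
def clock_period(stage_delays, latch_delay):
    """Longest stage delay plus latch delay."""
    return max(stage_delays) + latch_delay


def total_time(num_items, stage_delays, latch_delay):
    """Fill the pipeline once, then one result per clock."""
    cycles = len(stage_delays) + num_items - 1
    return cycles * clock_period(stage_delays, latch_delay)


LATCH = 0.5                                   # assumed latch delay
unsplit = [2.0, 2.0, 3.0]                     # two tree stages + one merge adder
split = [2.0, 2.0, 0.75, 0.75, 0.75, 0.75]    # merge adder divided into four stages

first_result_unsplit = total_time(1, unsplit, LATCH)
first_result_split = total_time(1, split, LATCH)
stream_unsplit = total_time(1000, unsplit, LATCH)
stream_split = total_time(1000, split, LATCH)
```

With these assumed numbers the first result arrives later in the split pipeline, while a stream of 1000 items finishes sooner, which matches the qualitative claim above.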

Claims

1. A merge adder for performing a final addition of two binary data words output from a last Wallace tree and outputting one data word, in pipeline processing built from Wallace trees and operating at a clock frequency f, the pipeline adder comprising:

a first adder (60) that, in a first clock period, adds a first bit group (A[0:k-1], B[0:k-1]) consisting of a plurality of bits including the least significant bit of the input data (n-bit data) and outputs a first sum (S[0:k-1]) and a first carry (C[k]);

a second adder (61) that, in a second clock period, adds the first carry (C[k]) and a second bit group (A[k:2k-1], B[k:2k-1]) and outputs a second sum (S[k:2k-1]) and a second carry (C[2k]);

a third adder (62) that, in a third clock period, adds the second carry (C[2k]) and a third bit group (A[2k:3k-1], B[2k:3k-1]) and outputs a third sum (S[2k:3k-1]) and a third carry (C[3k]); and

a fourth adder (63) that, in a fourth clock period, adds the third carry (C[3k]) and a fourth bit group (A[3k:n-1], B[3k:n-1]) and outputs a fourth sum (S[3k:n-1]) and a fourth carry (C[n]).

2. The pipeline adder according to claim 1, further comprising: a first latch unit (L1) for latching all bit groups (A[0:n], B[0:n]) of the two input data words;

a second latch unit (L2) for latching the first sum (S[0:k-1]) and the first carry (C[k]) output from the first adder (60), and the second to fourth bit groups (A[k:n], B[k:n]);

a third latch unit (L3) for latching the first sum (S[0:k-1]) output from the first adder (60), the second sum (S[k:2k-1]) and the second carry (C[2k]) output from the second adder (61), and the third to fourth bit groups (A[2k:3k-1], B[2k:3k-1]);

a fourth latch unit (L4) for latching the first sum (S[0:k-1]) output from the first adder (60), the second sum (S[k:2k-1]) output from the second adder (61), the third sum (S[2k:3k-1]) and the third carry (C[3k]) output from the third adder (62), and the fourth bit group (A[3k:n-1], B[3k:n-1]); and

a fifth latch unit (L5) for latching the first sum (S[0:k-1]) output from the first adder (60), the second sum (S[k:2k-1]) output from the second adder (61), the third sum (S[2k:3k-1]) output from the third adder (62), and the fourth sum (S[3k:n]) output from the fourth adder (63).

3. The pipeline adder according to claim 1, wherein each of the first to fourth adders (60 to 63) comprises full adders for adding one or more bits.