CN210129212U

CN210129212U - Wide-word high-speed segmented carry adder, counter and multiplier

Info

Publication number: CN210129212U
Application number: CN201920764408.4U
Authority: CN
Inventors: 何群
Original assignee: Individual
Current assignee: Individual
Priority date: 2019-05-16
Filing date: 2019-05-16
Publication date: 2020-03-06
Anticipated expiration: 2029-05-16

Abstract

A wide-word, high-speed segmented carry adder, counter and multiplier, belonging to the arithmetic unit of a computer. The structure combines carry look-ahead with node carry: a 4-bit carry look-ahead adder and a carry control element form one node, and all nodes are connected in series by a single chain. The link at each node can be switched on or cut off according to the condition generated by that node, so the chain is dynamically divided into a number of segments, and each segment holds the carry state required by the nodes within it to perform the summation. The design is the first to use the high-speed on/off characteristic of a network digital bus switch to achieve single-line serial expansion of the word width, and is characterized by few components, few connecting lines and high speed. Using components of the same class, the delay of a 128-bit addition can approach that of an 8-bit carry look-ahead addition; the counting frequency of a 128-bit counter built on this principle approaches that of a 16-bit synchronous counter, and the operation delay of a 64-bit multiplier built on this principle is less than that of a 32-bit Wallace multiplier.

Description

Wide-word high-speed segmented carry adder, counter and multiplier

Technical Field

Computer arithmetic units, in particular adders, counters and multipliers. The utility model belongs to the field of programmable integrated circuit design.

Background

The performance of a computer system depends primarily on the speed at which the arithmetic unit executes the instruction set, and arithmetic instructions take the longest time. The execution speed of arithmetic instructions in turn depends on the speed of the adder.

1. Adder

Tree adder structures, typified by carry look-ahead, are built by grouping and layering. Each group expresses its carry state through a carry transfer function (P) and a carry generation function (G); these are passed up to a transfer layer, and each layer evaluates them to give the carry value of each group. As the word width and the number of adder levels grow, this carry structure needs more components and more connecting lines, and the carry speed drops.

Single-chain carry adder structures, typified by the Manchester chain, raise the carry speed by bypassing: the P signal controls the bypass of the group's carry chain. Since each bypass node is implemented with logic gates, or with multiplexers and (analog) transmission gates, the total delay of the nodes grows as the word width increases and the speed drops.

These carry schemes are hard to extend in word width because their logic works upward from the carry state of each bit at the bottom layer to the higher layers; this is the main obstacle to wide-word computer arithmetic.

2. Counter

A general 16-bit synchronous parallel counter has a high counting frequency. If the count is larger than 16 bits, N chips are connected in series (4 bits are counted in parallel in each chip) to form an inter-chip asynchronous count. Due to the asynchronous carry time delay between chips, the counting frequency is obviously reduced along with the increase of N.

3. Multiplier

A Wallace-structure multiplier, which runs comparatively fast, first multiplies the multiplicand and the multiplier (word length N bits) bit by bit (the outputs of two-input AND gates), forming N bottom-layer variables with a word length of N bits. Every three variables of a layer are then added by a CSA (carry-save adder) according to the rules to give two variables of the next layer up. The number of variables per layer decreases by the factor [2N/3], the variable length grows by 4 bits per layer and the delay grows by 4td per layer; when only 2 variables remain, they are added with a 2N-bit adder to obtain the product. The problems of this structure are the large amount of bottom-layer data, the poor regularity of the 3-to-2 CSA compression, and complicated wiring.

The problems of the adder, the counter and the multiplier all reduce to a carry problem. The carry scheme of the adder is therefore improved first, to obtain a wide-word high-speed adder, and the wide-word counter and multiplier are then designed on the same principle.

Disclosure of Invention

For convenience of explanation, the utility model is described in terms of TTL components; the logic can equally be implemented with the corresponding CMOS components or by programmable integration.

According to the principle of binary addition, when two N-bit binary variables are added, the carry passed to the next higher bit is at most 1, so a single-chain carry structure can represent the carry state completely. In current single-chain designs, however, the multiplexers and (analog) transmission gates are often idealized (on-resistance 0, off-resistance infinite); in practice their on/off and propagation delays make high-speed operation difficult. With the development of network technology, the digital bus switch (SW), dedicated to bus switching and replacing analog transmission gates, multiplexers and the like, offers fast on/off switching, small propagation delay, fully digital operation and easy integration. The utility model introduces it for the first time for carry control, which effectively raises the carry propagation speed.

The controlled single chain runs from the lowest bit to the highest bit. Whether the link at each position is switched on or cut off is decided by the logic at that position, so the carry chain is dynamically divided into segments, each segment holds the carry state required by the bits within it, and the summation is finally carried out in parallel.

Let two binary variables A = (an, …, ai, …, a1) and B = (bn, …, bi, …, b1) add up to S = (sn+1, sn, …, si, …, s1), with the carry signals correspondingly denoted C = (cn, …, ci, …, c1).

Consider one bit position (ai, bi). Its carry logic has only two cases:

(1) when ai ≠ bi:

The position does not generate a carry state of its own and cannot absorb a carry arriving from the lower-order bit (the position is occupied);

(2) when ai = bi:

The position can absorb the carry state of the lower-order bit (the position is empty), and the carry it sends forward is determined by ci.

For case (1), the link at this position is switched on, so the state of the carry chain passes through unchanged.

For case (2), the link is cut off at this position, forming a segmentation point. This break point is the end of the lower-order carry segment and the start of the higher-order carry segment; the carry state ci of this position is sent into the higher-order segment as its initial carry state.

For this purpose each position is given one signal, K = (kn, …, ki, …, k1), which controls the digital switch that switches the link on or off. The truth table follows from the analysis above and gives the logic expressions:

$k_i = a_i \odot b_i$; $c_i = a_i b_i$ when $k_i = 1$; $c_i$ is high impedance when $k_i = 0$.

This gives the logic schematic of FIG. 1. A block diagram of a 16-bit adder built from 16 such units, with the schematic of FIG. 1 as one unit, is shown in FIG. 2.
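The two cases can be condensed into a single recurrence for the carry seen on the chain. This is a restatement of the logic described above rather than a formula taken from the source; $\odot$ denotes XNOR, $\oplus$ denotes XOR, and $c_0$ is the carry fed into the lowest position.

```latex
\[
k_i = a_i \odot b_i, \qquad
c_i =
\begin{cases}
a_i b_i, & k_i = 1 \quad \text{(link cut, TS drives the chain)} \\
c_{i-1}, & k_i = 0 \quad \text{(link on, carry passes through)}
\end{cases}
\qquad
s_i = a_i \oplus b_i \oplus c_{i-1}.
\]
```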

The operation of the 16-bit adder is described below with reference to fig. 1 and 2.

Let A = (1101110011010111), B = (0010010001001101) and c0 = 1; then

S = A + B + c0 = (10000000100100101).

the operation process is as follows:

i	17	16	15	14	13	12	11	10	9	8	7	6	5	4	3	2	1	0
ai	0	1	1	0	1	1	1	0	0	1	1	0	1	0	1	1	1
bi	0	0	0	1	0	0	1	0	0	0	1	0	0	1	1	0	1
ki	1	0	0	0	0	0	1	1	1	0	1	1	0	0	1	0	1
ci	0	*	*	*	*	*	1	0	0	*	1	0	*	*	1	*	1	1
si	1	0	0	0	0	0	0	0	1	0	0	1	0	0	1	0	1

From ki = ai ⊙ bi, the signals k1, k3, k6, k7, k9, k10, k11 and k17 are all 1. In each of these units ki turns off the digital switch SW (element 6 in FIG. 1), cutting the carry chain, and enables the tri-state gate TS (element 5 in FIG. 1), so that the carry state ci of that bit enters the higher-order link. For the other bits (k2, k4, k5, k8, k12, k13, k14, k15 and k16) ki is 0: the SW of the unit is switched on and the TS output is held at high impedance, which keeps ci off the link.

This divides the carry chain into 8 segments (marked by double lines in the original table). Where the table shows * for ci, the TS output is at high impedance and the carry state on the link is unchanged; the carry state in the 8 segments is determined by the states of the 8 starting points, namely c1, c3, c6, c7, c9, c10, c11 and c17. During summation, the actual value at a * position is the value of the starting point of its segment, i.e.

c16~12 = c11 = 1; c8 = c7 = 1; c5~4 = c3 = 1; c2 = c1 = 1.

In the most extreme case, let ki = 0 (i = 16, 15, …, 2, 1) and k17 = 1. The carry chain then has only one segment, and c0 = 1 is fed directly through to c16, which shows the shortcut characteristic of this carry scheme.
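The walk-through above can be replayed in a few lines of Python. This is a behavioral sketch only, not the circuit: the loop visits the positions in order, whereas the hardware settles all segments in parallel. The function name `segmented_add` and its return values are illustrative assumptions, not names from the source.

```python
def segmented_add(a, b, c0, n):
    """Bit-level sketch of the segmented carry chain for two n-bit operands.

    Positions are numbered 1..n+1 as in the table (position n+1 has a = b = 0).
    Returns (total, cuts, carry): the sum, the positions where the chain is cut,
    and a dict mapping each position to the carry state present on the chain there.
    """
    carry = {0: c0}                      # c0 is the carry fed into the chain
    cuts = []
    total = 0
    for i in range(1, n + 2):
        ai = (a >> (i - 1)) & 1
        bi = (b >> (i - 1)) & 1
        ki = 1 - (ai ^ bi)               # k_i = a_i XNOR b_i
        if ki:                           # SW off: chain cut, TS drives c_i = a_i AND b_i
            cuts.append(i)
            carry[i] = ai & bi
        else:                            # SW on: the segment keeps its starting carry
            carry[i] = carry[i - 1]
        total |= (ai ^ bi ^ carry[i - 1]) << (i - 1)
    return total, cuts, carry


a, b = 0b1101110011010111, 0b0010010001001101
total, cuts, carry = segmented_add(a, b, 1, 16)
print(bin(total), cuts)
```

Running it on the operands of the table reproduces the eight cut points and the sum row; with A all ones and B zero, the only cut is at position 17, which is the single-segment extreme case.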

1. 128-bit adder design

According to the segmentation principle, if each bit position of a 128-bit adder is taken as one node, the most extreme case requires the TS of the first node to fan out to 128 gates, whereas the maximum fan-out of a TS currently used for bus transmission is about 30 gates. Taking the on-resistance (Ron) of each SW into account, the optimal TS fan-out is set to 24 gates, so one TS must be inserted in series every 24 nodes; 128 nodes then incur 6 TS delays, which reduces the carry speed. If instead 4 bits form one node, the 16 nodes (64/4) of a 64-bit adder need only one TS to drive them. Moreover, the classic 4-bit parallel carry look-ahead adder is fast and fully parallel, and its intermediate variables P and G can serve as the control signals of the node, which effectively reduces the number of nodes. According to the P and G of each node, SW switches the link on or cuts it off and TS changes the carry state, so the carry chain is dynamically divided into segments with the node as the unit; each segment holds the carry state required by the nodes within it, and the summation is finally carried out in parallel. This structure, whose word width can be expanded linearly (by single-chain series connection), combines look-ahead parallel carry inside the nodes with single-chain carry between the nodes, using parallel addition logic, digital bus switches and tri-state drivers.

With 4 bit positions as one node, a 128-bit adder is built around a general-purpose chip (SN74S181). First the relevant characteristics of this 4-bit parallel adder chip are analyzed by expanding the logic expressions (negative logic) of its P and G:

$P = p_{i+3} \cdot p_{i+2} \cdot p_{i+1} \cdot p_i$;

$G = g_{i+3} + g_{i+2} \cdot p_{i+3} + g_{i+1} \cdot p_{i+3} \cdot p_{i+2} + g_i \cdot p_{i+3} \cdot p_{i+2} \cdot p_{i+1}$.

P indicates whether a carry state is generated within the 4 bits of the chip, and G is the generated carry state, so the SN74S181 can serve as the main body of one unit, with the P and G signals controlling the digital switch and the carry state.

When P = 0, the 4 bits in the chip generate no carry state; the link is switched on and G is prevented from entering the carry chain.

When P = 1, at least one of the 4 bits in the chip generates a carry state (G = 1: carry state 1; G = 0: carry state 0); the link is cut off, forming a segmentation point, and the carry state G of this node is placed on the higher-order segment.
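The expansion of P and G can be checked exhaustively for one 4-bit node. The sketch below assumes positive logic, as embodiment example 1 does for ease of analysis, so here P = 1 means the node passes the incoming carry (link on) and P = 0 means the link is cut and G is driven onto the chain; this is the opposite polarity from the negative-logic convention of the SN74S181 used in the paragraph above. The function names are illustrative, and the bit propagate is taken as XOR, which is an assumption of this sketch.

```python
def node_pg(u, v):
    """Positive-logic group signals (P, G) of one 4-bit node, expanded bit by bit."""
    p = [((u >> j) & 1) ^ ((v >> j) & 1) for j in range(4)]   # bit propagate
    g = [((u >> j) & 1) & ((v >> j) & 1) for j in range(4)]   # bit generate
    P = p[3] & p[2] & p[1] & p[0]
    G = g[3] | (g[2] & p[3]) | (g[1] & p[3] & p[2]) | (g[0] & p[3] & p[2] & p[1])
    return P, G


def node_carry_out(u, v, cin):
    """Carry leaving the node: passed through when P = 1, otherwise driven by G."""
    P, G = node_pg(u, v)
    return cin if P else G


print(node_pg(0b1001, 0b1000), node_pg(0b1011, 0b0100))
```

For every pair of 4-bit operands and either carry-in, `node_carry_out` agrees with the carry of ordinary binary addition, which is why a single P/G pair per node is enough to control the chain.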

The unit comprises one ALU (SN74S181), one digital switch (SW), one tri-state gate (TS) and one inverter; see FIG. 3. A 64-bit adder composed of 16 units is shown in FIG. 4, and a 128-bit adder, obtained by connecting two of the chains of FIG. 4 in series, is shown in FIG. 5. The implementation and operation are described in embodiment example 1.

Performance comparison 1

For comparison, in the following performance comparisons both the typical application (the selected reference design) and the embodiment examples (1, 2, 3) use components of the same class. Only the parts that differ are compared, in terms of number of components, number of connecting lines, operating speed and expandability. Table 1 gives the relevant component parameters (see the TI datasheets).

Table 1. Parameters of the relevant components

A typical 64-bit adder made of 16 74S181 chips and 5 74S182 chips (see Table 1) is compared with the 64-bit adder of embodiment example 1.

Typical

A look-ahead chain consisting of 5 74S182 chips.

Number of components: 15 gates per chip × 5 chips = 75 gates. Number of connecting lines: 9 incoming lines per chip × 5 chips = 45 lines.

Operating speed: at 64 bits the time required is 11 + 18 + 19 + 28 = 76 ns (see Table 1).

If the word width is increased to 128 bits, the delay grows by at least 35 ns according to Table 1, for a total delay of 76 + 35 = 111 ns.

Example 1

The carry-chain control parts of the 16 units (FIG. 3).

Number of components: 4 gates per unit × 16 = 64 gates. Number of connecting lines: 4 short lines per unit × 16 = 64 short lines.

Operating speed: each SN74S181 completes one addition in 11 ns (see Table 1). The maximum TS delay (tpLH of the 74F125DRG4 in Table 1) is 9 ns; the maximum SW on/off delay (ten or tdis of the 74CBT3215CPWR in Table 1) is 4.5 ns, which together with the 3 ns delay of the one-stage inverter gives 7.5 ns. The two paths operate concurrently, so the maximum delay is 9 ns. The SW propagation delay (tpd of the 74CBT3215CPWR in Table 1) is 0.15 ns per gate, i.e. 16 × 0.15 = 2.4 ns, giving 11 + 9 + 2.4 = 22.4 ns in total.

If the word width is increased to 128 bits, only two 64-bit adders need to be connected in series, and the added time is only 16 SW propagation delays tpd plus 8 ns for one TS stage: 0.15 × 16 + 8 = 10.4 ns. The total delay is given as 32.4 ns.

The comparison is as follows:

	Components	Connecting lines	64-bit delay	128-bit delay
Typical	75 gates	45 long	76 ns	111 ns
Example 1	65 gates	64 short	22.4 ns	32.4 ns

2. 128-bit synchronous counter design

The 128-bit synchronous counter is built from the single-chain segmentation principle and a counter chip (SN74S163).

First the relevant characteristics of the SN74S163 chip are analyzed by expanding the count and carry logic of the i-th chip. Here Dj is the input of register Qj (the four bits of the chip are j to j+3), Qj is the current value of that register, ~ denotes NOT, and ENTi, ENPi and RCOi are respectively the fixed count enable, the selective count enable and the dynamic carry output of the i-th chip. Then

$D_j = {\sim}Q_j \odot (\mathrm{ENT}_i \cdot \mathrm{ENP}_i)$;

$D_{j+1} = {\sim}Q_{j+1} \odot Q_j \cdot (\mathrm{ENT}_i \cdot \mathrm{ENP}_i)$;

$D_{j+2} = {\sim}Q_{j+2} \odot Q_j \cdot Q_{j+1} \cdot (\mathrm{ENT}_i \cdot \mathrm{ENP}_i)$;

$D_{j+3} = {\sim}Q_{j+3} \odot Q_j \cdot Q_{j+1} \cdot Q_{j+2} \cdot (\mathrm{ENT}_i \cdot \mathrm{ENP}_i)$;

$\mathrm{RCO}_i = Q_j \cdot Q_{j+1} \cdot Q_{j+2} \cdot Q_{j+3} \cdot \mathrm{ENT}_i$.

If ENPi carries the carry state passed from chip i-1 to chip i and ENTi = 1, then when ENPi = 1 the chip counts synchronously with CLK; when ENPi = 0, the bits keep their values on CLK and no counting takes place.

Since ENTi = 1, RCOi = Qj·Qj+1·Qj+2·Qj+3·1 reflects the state of the 4-bit count in the chip and is used to switch the carry chain on or off.

If RCOi = 1, the 4-bit count in the chip is full: the link is switched on, the carry state of the lower-order unit (i-1) carries on to the higher-order unit (i+1), and the RCOi state of this unit is kept off the carry chain.

If RCOi = 0, the link is cut off, forming a segmentation point, and the carry state RCOi of this unit is sent into the higher-order carry segment.
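The two cases can be written as one recurrence along the chain. This is a restatement of the description above, not a formula from the source; $\mathrm{Cin}_i$ and $\mathrm{Cout}_i$ denote the carry entering and leaving unit $i$, and $\mathrm{Cin}_1 = 1$ is assumed for the lowest unit.

```latex
\[
\mathrm{ENP}_i = \mathrm{Cin}_i, \qquad
\mathrm{Cout}_i =
\begin{cases}
\mathrm{Cin}_i, & \mathrm{RCO}_i = 1 \quad \text{(link on)} \\
0, & \mathrm{RCO}_i = 0 \quad \text{(link cut, TS drives } \mathrm{RCO}_i\text{)}
\end{cases}
\;=\; \mathrm{Cin}_i \cdot \mathrm{RCO}_i, \qquad
\mathrm{Cin}_{i+1} = \mathrm{Cout}_i .
\]
```

Under this reading, unit $i$ is enabled to count exactly when every lower-order unit is full.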

The unit comprises one 4-bit parallel counter (SN74S163), one SW, one TS and one inverter; see FIG. 6. A 64-bit synchronous counter formed by connecting 16 units in series is shown in FIG. 7, and a 128-bit synchronous counter is obtained by connecting two of the chains of FIG. 7 in series. See embodiment example 2 for the implementation steps.

Performance comparison 2

A typical 64-bit counter built by cascading 16 SN74S163 chips (see the N-BIT SYNCHRONOUS COUNTERS diagram in the SN74S163 datasheet) is compared with embodiment example 2.

Typical

The longest counting delay is (CLK to RCO tpLH) + (ENT to RCO tpLH) × (16 - 2) + (ENP tsu); see Table 1. Here CLK to RCO tpLH is the delay from the clock signal to the rising edge of the dynamic carry output, ENT to RCO tpLH is the delay from the fixed count enable to the rising edge of the dynamic carry output, and ENP tsu is the setup time of the selective count enable. This gives 25 + 20 × 14 + 20 = 325 ns.

Example 2

The longest counting delay is the sum of CLK to RCO tpLH and the longest delay of the carry link. CLK to RCO tpLH (see Table 1) is 25 ns. The longest link delay, from unit 1 to unit 16 (with RCOi = 1 in every unit), consists of the simultaneous turn-on time of the SWs and TS (see Table 1) plus the propagation delays of 15 SWs (tpd in Table 1): 9 ns + 15 × 0.15 ns = 11.25 ns. The total is 25 + 11.25 = 36.25 ns.
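The two delay estimates can be recomputed directly from the figures quoted in the text. The sketch below treats those figures as given; the function names are illustrative, and applying the formulas to unit counts other than 16 is an assumption of this sketch (for longer chains the text adds further TS stages, which are not modeled here).

```python
# Figures quoted in the text, in nanoseconds; treated here as given.
CLK_TO_RCO = 25.0   # CLK to RCO tpLH
ENT_TO_RCO = 20.0   # ENT to RCO tpLH
ENP_SETUP = 20.0    # ENP tsu
SW_TS_ON = 9.0      # simultaneous turn-on of the SWs and TS
SW_TPD = 0.15       # propagation delay of one conducting SW


def typical_delay_ns(chips):
    """Longest counting delay of the cascaded SN74S163 chain."""
    return CLK_TO_RCO + ENT_TO_RCO * (chips - 2) + ENP_SETUP


def segmented_delay_ns(units):
    """Longest counting delay of the segmented chain with a single TS stage."""
    return CLK_TO_RCO + SW_TS_ON + SW_TPD * (units - 1)


print(typical_delay_ns(16), round(segmented_delay_ns(16), 2))
```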

The comparison is as follows:

	Typical 64-bit asynchronous counter	Example 2 (64-bit counter)
Number of components	On-chip 4-bit parallel counting logic circuit 15	3 × 16 more gates
Number of connecting lines	16 × 5	3 × 16 more lines
Counting delay	325 ns	36.25 ns

Example 2 adds control lines to each node, but its longest counting delay is significantly lower than that of the typical design.

3. 64-bit multiplier design

The wide-word adder of the utility model can be combined with 4 × 4-bit multiplier chips (SN74284, SN74285) to build a high-speed wide-word multiplier. The principle is illustrated using a 32-bit multiplier as an example.

The 32-bit multiplicand A and multiplier B are divided into 4-bit groups Ai and Bi (i = 1-8), and the partial products of Ai and Bi are obtained from the 4 × 4 chips according to the multiplication rule (see FIG. 10). This yields 16 variables of length 32 bits (iS, i = 1-16), which form the bottom-layer data set.

Referring to embodiment example 3 and FIGS. 11-13, the 16 variables are added in pairs according to the multiplication expansion to give the 8 36-bit variables of the first layer (iT, i = 1-8); the iT are added in pairs to give the 4 40-bit variables of the second layer (iU, i = 1-4); the iU are added in pairs to give the 2 48-bit variables of the third layer (iV, i = 1-2); finally the two iV are added to give the product A × B at the top layer.

One bottom-layer unit comprises one SN74284 and one SN74285, as shown in FIG. 8; the bottom-layer multiplication array of 8 × 8 units is shown in FIG. 9, and the implementation of the first, second, third and top layers is described in embodiment example 3. For 64-bit multiplication, following the multiplication expansion, the bottom-layer data set is obtained by joining 4 arrays of FIG. 9 into a 16 × 16 array, and widening the adders of the corresponding layers gives the 64-bit multiplier.

Performance comparison 3

A 32-bit multiplier with a typical Wallace structure is compared with embodiment example 3.

Typical

The levels, numbers of variables and variable lengths of the 32-bit Wallace structure are as follows:

Level	Bottom	1	2	3	4	5	6	7	8	Top
Number of variables	32	22	16	12	8	6	4	3	2	1
Number of variable bits	32	36	40	44	48	52	56	60	64	64

Operating speed: each CSA layer needs a delay of 4td = 12 ns, so the 9 layers need 108 ns; with the 64-bit addition delay of 76 ns (see performance comparison 1) the total is 184 ns.

If extended to 64 bits, 3 more layers are needed (64 → 44 → 30), which adds a further 2-layer CSA delay of 24 ns; with the 128-bit addition delay of 111 ns (see performance comparison 1) the total delay is 108 ns + 24 ns + 111 ns = 243 ns.

Each CSA layer is wired in groups that are multiples of 3, so the connections are complex (see the table above) and hard to extend.

Example 3

The levels, numbers of variables and variable lengths are as follows:

Level	Bottom	I	II	III	Top
Number of variables	2	16	8	4	2
Number of variable bits	32	32	36	40	48

Operating speed: the bottom layer needs 22 ns (see the SN74284 and SN74285 datasheets) and each addition layer needs 22.4 ns (see performance comparison 1), so the 4 layers take 89.6 ns and the total is 111.6 ns.

If extended to 64 bits, one more layer is needed (64 → 32), which adds a 1-layer addition delay of 22.4 ns, and the 128-bit addition delay is 32.4 ns (see performance comparison 1); the total is given as 89.6 ns + 22.4 ns + 32.4 ns, namely 143.4 ns.

The multiplicand A and the multiplier B are connected directly to the 4 × 4-bit multipliers and each layer is wired in multiples of 2, so the connections are simple (see the table above) and easy to extend.

The comparison is as follows:

	Bottom-layer data generation	Wiring complexity	32-bit delay	64-bit delay
Wallace structure	32 × 32 AND gates	Multiples of 3, hard to wire	184 ns	243 ns
Example 3	A and B connected directly	Multiples of 2, easy to wire	111.6 ns	148.4 ns

Drawings

FIG. 1 Schematic diagram of a full adder unit

In the figure: 1. SN74F08; 2, 3. SN74S86; 4. SN74F04; 5. SN74F125; 6. SN74CBT3125; ai, bi: bit position; si: full sum; ci-1: carry from the lower-order bit; ci: carry to the higher-order bit.

FIG. 2 16-bit adder block diagram

In the figure: 1. SN74F125; I1-I17: the 17 full adder units of FIG. 1; ai, bi: the 16 bit positions; si: the full sum of the i-th bit; ci: the forward carry of the i-th bit.

FIG. 3 4-bit parallel adder unit

In the figure: 1. SN74S181; 2. SN74F04; 3. SN74F08; 4. SN74F125; 5. SN74CBT3125; ai+3~ai, bi+3~bi: the 4 bit positions; si+3~si: the sum of the 4 bits; Cin: carry from the lower-order unit; Cout: carry to the higher-order unit; P: carry transfer function of the 4 bit positions; G: carry generation function of the 4 bit positions.

FIG. 4 64-bit adder block diagram

In the figure: 1. SN74F125; II1-II17: 17 units of FIG. 3; ai, bi: the 64 bit positions; si: the sum of the 64 bits; CXin, CXout: carry-in and carry-out of the X-th bit.

FIG. 5 128-bit adder block diagram

In the figure: 1. SN74F125; II1-II33: 33 units of FIG. 3; ai, bi: the 128 bit positions; si: the sum of the 128 bits; CXin, CXout: carry-in and carry-out of the X-th bit.

FIG. 6 Counter unit

In the figure: 1. SN74F163; 2. SN74F04; 3. SN74F125; 4. SN74CBT3125; Qd-Qa: 4-bit count value; D-A: preset count value; RCO, ENT: count feature bits; ENP: carry from the lower-order unit; LD: preset load pulse; CK: count pulse; Clr: clear (set-to-0) pulse.

FIG. 7 64-bit counter block diagram

In the figure: 1-16: the 16 units of FIG. 6; 17-19: SN74F125.

FIG. 8 4-bit multiplier unit

In the figure: 1. SN74284; 2. SN74285; ai, bj: 4 bit positions each; Sj: 8-bit partial product.

FIG. 9 32-bit multiplier block diagram

In the figure, ApBr denotes the units of FIG. 8 that make up the 8 × 8 multiplication array; Ap is the block number p of the multiplicand A, and Br is the block number r of the multiplier B.

Wiring rule for generating the partial products kS from the block numbers: for ApBr, i = (p-1) × 4 + 1; j = (r-1) × 4 + 1; k = (r+1) when p + k is an even number; k = r when p + k is an odd number. This generates kSi (k = 1-16).

FIG. 10 Schematic of bottom-layer data generation for the 32-bit multiplier.

FIG. 11 Bottom-layer data generation of example 3.

FIG. 12 First-layer data operation of example 3.

FIG. 13 Second-layer, third-layer and top-layer data operations of example 3.

Detailed Description

Example 1: 64-bit adder

Chips are selected according to the design scheme. For each unit, digital switch (SW): SN74CBT3125CDWR; tri-state gate (TS): SN74F125DRG4; inverter: SN74F04; buffer gate: SN74F08; 4-bit adder: SN74S181. 16 units are used in total.

The connections are then determined. The carry Cin from the lower-order unit is connected to the input (in) of SW and to Cn of the SN74S181; the carry Cout to the higher-order unit is connected to the output (out) of SW and to the output of TS; the P signal generated by the ALU is connected to the enable terminal (En) of TS and the enable terminal (EN) of SW through a one-stage inverter; the input of TS is connected to the G signal generated by the ALU. Setting M, s3, s2, s1 and s0 of the ALU to 0, 1, 0, 0 and 1 selects addition. See FIG. 3.

Taking FIG. 3 as one unit, the 64-bit adder formed by connecting 16 units in series is shown in FIG. 4.

Example procedure (for ease of analysis, using positive logic representation)

Let U = (C4576D676D36B9E8)16, V = (3BA895539899481A)16 and c0 = 1;

then W = U + V + c0 = (1000002BB05D00203)16, and

i	17	16	15	14	13	12	11	10	9	8	7	6	5	4	3	2	1	c0
Ui	0	C	4	5	7	6	D	6	7	6	D	3	6	B	9	E	8
Vi	0	3	B	A	8	9	5	5	3	9	8	9	9	4	8	1	A
Pi	0	1	1	1	1	1	0	0	0	1	0	0	1	1	0	1	0
Gi	0	*	*	*	*	*	1	0	0	*	1	0	*	*	1	*	1	1
Wi	1	0	0	0	0	0	2	B	B	0	5	D	0	0	2	0	3

Pi = 0 divides the carry chain into 8 segments (marked by double lines in the original table). The carry state in each segment is determined by the carry value of its cut point, namely G1, G3, G6, G7, G9, G10, G11 and G17. Where the link is switched on (the TS output is at high impedance), the state held on the carry chain is shown as *; its actual value is that of the starting point of the segment, i.e.

G16~12 = G11 = 1; G8 = G7 = 1; G5~4 = G3 = 1; G2 = G1 = 1.

Example (node 3):

U3 = (1001)2 and V3 = (1000)2; the subscripts j+3, j+2, j+1, j index the bits of the two binary values. Then

$G_3 = g_{j+3} + g_{j+2} \cdot p_{j+3} + g_{j+1} \cdot p_{j+3} \cdot p_{j+2} + g_j \cdot p_{j+3} \cdot p_{j+2} \cdot p_{j+1}$

$= 1 \cdot 1 + (0 \cdot 0) \cdot 0 + (0 \cdot 0) \cdot 0 \cdot 0 + (1 \cdot 0) \cdot 0 \cdot 0 \cdot 0 = 1$

So the carry chain is cut at this node by P3, and G3 serves as the starting point of the higher-order carry segment.

The sum is S3 = U3 + V3 + G2 = (1001) + (1000) + G1 = (10010), i.e. (0010); the fifth bit is discarded automatically.

Example (node 4):

U4 = (1011)2; V4 = (0100)2.

Then

$P_4 = p_{j+3} \cdot p_{j+2} \cdot p_{j+1} \cdot p_j = 1 \cdot 1 \cdot 1 \cdot 1 = 1$

Therefore the carry chain is switched on by P4, which keeps G4 off the carry chain (shown as * in the table).

The sum is S4 = U4 + V4 + G3 = (1011) + (0100) + 1 = (10000), i.e. (0000); the high-order carry 1 is discarded automatically.
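The node-level procedure can be replayed in software using the Ui and Vi rows of the table as operands. This is a behavioral sketch in positive logic, not the circuit: nodes are visited in order, whereas the hardware settles all segments in parallel. The function name `add_by_nodes` and its return format are illustrative assumptions; as in the table, an extra top node with zero operands collects the final carry.

```python
def add_by_nodes(u, v, c0, nodes):
    """Add two (4 * nodes)-bit numbers node by node.

    Returns (total, cut_nodes, g_row): the sum, the 1-based node numbers where
    the chain is cut (P = 0), and the G value driven onto the chain at each cut.
    """
    cin, total = c0, 0
    cut_nodes, g_row = [], []
    for i in range(1, nodes + 2):            # node nodes+1 only absorbs the last carry
        un = (u >> (4 * (i - 1))) & 0xF
        vn = (v >> (4 * (i - 1))) & 0xF
        p = int((un ^ vn) == 0xF)            # every bit of the node propagates
        g = int(un + vn > 0xF)               # the node generates a carry by itself
        total |= ((un + vn + cin) & 0xF) << (4 * (i - 1))
        if p:                                # SW on: carry passes through, G kept off
            cout = cin
        else:                                # SW off: TS drives G onto the next segment
            cout = g
            cut_nodes.append(i)
            g_row.append(g)
        cin = cout
    return total, cut_nodes, g_row


U = 0xC4576D676D36B9E8
V = 0x3BA895539899481A
total, cut_nodes, g_row = add_by_nodes(U, V, 1, 16)
print(hex(total), cut_nodes, g_row)
```

The same function covers the 128-bit case by passing `nodes=32`, which mirrors the statement that the wider adder is simply a longer chain of identical units.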

A 128-bit adder can be implemented in the same way.

Example 2: 64-bit parallel counter

Chips are selected according to the design requirements. Counter chip: SN74F163, 16 in total; control parts of each unit: one each of SW, TS and inverter, see FIG. 6. For 16 units this gives 16 each of SW, TS and inverter, plus 3 TS drivers.

The connections are determined according to the design. Each unit is wired as in FIG. 6: Cin is connected to ENP and to the in terminal of SW; RCO is connected to the EN terminal of SW and to the input and EN terminal of TS; the output of TS is connected to the Cout terminal. The master control signals CLR, CK, LD and Vcc are connected to the CLR, CK, LD and ENT terminals of each unit respectively. The Cin of each unit is connected to the Cout of the lower-order unit, and 16 units connected in series give the 64-bit counter shown in FIG. 7.

Example procedure and analysis:

1. Let X64-1 = (F…FE) and load it into the counter Q64-1 with an LD pulse. At this moment RCO16-2 of units 16-2 is 1, which switches SW16-2 on; RCO1 of unit 1 is 0, which cuts the link, and the carry state RCO1 (= 0) is sent into the higher-order carry chain, so ENPi of units 16-2 is 0 and they do not count. CK can make only unit 1 count.

2. When the 1st CK pulse arrives, unit 1 counts up, RCO1 becomes 1 and its link is switched on. The SWs of all units 16-1 are now on and ENPi of every unit is 1 (supplied by Cin of unit 1). All 16 units are now enabled to count.

3. When the 2nd CK pulse arrives, the count value of each of units 16-1 becomes 0; since every count value is 0, RCOi = 0 and the links of all units are cut off simultaneously. ENPi of units 16-2 is then 0 and ENP1 is 1, so unit 1 counts on from 0.

This example is the case of longest delay, namely the carry path from the lowest carry signal RCO1 to the ENP16 input, and it demonstrates that the counter operates synchronously and in parallel.
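The three steps above can be reproduced with a small behavioral model of the chain. It is a sketch under stated assumptions: ENT is tied to 1, Cin of unit 1 is held at 1, each unit is reduced to a 4-bit value that increments when its ENP is 1, and the helper names (`clock_pulse`, `to_units`, `from_units`) are illustrative rather than taken from the source.

```python
def to_units(x, units=16):
    """Split an integer into 4-bit unit values; index 0 is unit 1 (lowest)."""
    return [(x >> (4 * i)) & 0xF for i in range(units)]


def from_units(q):
    """Reassemble the count value from the 4-bit unit values."""
    return sum(qi << (4 * i) for i, qi in enumerate(q))


def clock_pulse(q, cin1=1):
    """Apply one CK pulse. Returns (next_q, enp), enp being ENP per unit before the pulse."""
    enp, cin = [], cin1
    for qi in q:
        enp.append(cin)                 # ENP of this unit is the carry arriving on the chain
        rco = int(qi == 0xF)            # RCO with ENT tied to 1
        cin = cin if rco else rco       # SW on passes Cin; otherwise TS drives RCO (= 0)
    next_q = [(qi + e) & 0xF for qi, e in zip(q, enp)]
    return next_q, enp


q = to_units((1 << 64) - 2)             # preset F...FE
for pulse in range(2):
    q, enp = clock_pulse(q)
    print(pulse + 1, enp, hex(from_units(q)))
```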

A 128-bit counter can be implemented in the same way.

Example 3: 32-bit multiplier

Chips are selected according to the design requirements. 4 × 4 multiplication parts: one SN74284 and one SN74285 per group, 8 × 8 groups. Adder parts, taking FIG. 3 as the basic unit: in the first layer, 36 basic units are connected in series into one 36-bit adder unit, 8 units in total; in the second layer, 40 basic units are connected in series into one 40-bit adder unit, 4 units in total; in the third layer, 48 basic units are connected in series into one 48-bit adder unit, 2 units in total; in the top layer, 64 basic units are connected in series into one 64-bit adder unit.

The connections follow the design; see FIG. 8. Pins 13 and 14 of the SN74284 and SN74285 are grounded, and the multiplicand A (A8, A7, A6, A5, A4, A3, A2, A1) and the multiplier B (B8, B7, B6, B5, B4, B3, B2, B1) are wired according to the following rule (see FIG. 9):

For ApBr: i = (p-1) × 4 + 1; j = (r-1) × 4 + 1; k = (r+1) when p + r is an even number; k = r when p + r is an odd number. The outputs of the multiplier chips generate the 16 bottom-layer data kS (k = 1-16), each 32 bits long, in the bottom-layer format of the table below.

The data are fed to the adder units of each layer (see FIG. 3 and embodiment example 1) in the formats of layers one to three and the top layer, so that the summation of each layer is carried out and the product of A and B is obtained at the top layer.

Example analysis

Let A = (A8 A7 A6 A5 A4 A3 A2 A1) = (C4576D67)16 and

B = (B8 B7 B6 B5 B4 B3 B2 B1) = (6D36B9E8)16;

then W = A × B = (53C3329B0C049458)16.

The generation of the bottom-layer data and the operation of each layer are shown in FIGS. 11-13.
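The layered scheme (16 bottom-layer variables, then 8, 4, 2 and 1) can be sketched in software, taking A = C4576D67 and B = 6D36B9E8 (hexadecimal) as operands, the values for which the stated product holds. The way the 64 partial products are packed into the 16 bottom-layer variables is an assumption of this sketch, since the wiring rule is only partly legible in the source: for each multiplier group, the products of the even-indexed and odd-indexed multiplicand groups are packed separately so that the 8-bit products never overlap. The per-layer shifts (4, 4, 8, 16 bits) follow from that choice. Ordinary integer addition stands in for the segmented carry adders, group indices are 0-based in the code, and the function names are illustrative.

```python
def groups(x, count=8):
    """Split x into 4-bit groups; index 0 is the lowest group (A1 or B1 in the text)."""
    return [(x >> (4 * i)) & 0xF for i in range(count)]


def multiply_layers(a, b):
    """Return the list of layers for a 32 x 32-bit product: bottom, I, II, III, top."""
    an, bn = groups(a), groups(b)
    bottom = []
    for r in range(8):
        # Each 4 x 4 product is 8 bits wide, so products two groups apart do not overlap.
        even = sum((an[p] * bn[r]) << (4 * p) for p in (0, 2, 4, 6))
        odd = sum((an[p] * bn[r]) << (4 * (p - 1)) for p in (1, 3, 5, 7))
        bottom += [even, odd]
    layers = [bottom]
    for shift in (4, 4, 8, 16):           # relative weight of the second operand in each pair
        prev = layers[-1]
        layers.append([prev[i] + (prev[i + 1] << shift) for i in range(0, len(prev), 2)])
    return layers


A, B = 0xC4576D67, 0x6D36B9E8
layers = multiply_layers(A, B)
print([len(layer) for layer in layers], hex(layers[-1][0]))
```

With all-ones operands the widest value in each layer needs 32, 36, 40, 48 and 64 bits respectively, which matches the adder widths listed for the layers.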

Claims

1. A wide-word high-speed segmented carry adder, counter and multiplier, characterized in that a 128-bit adder, a 128-bit counter and a 64-bit multiplier are built on a single-chain segmented carry control component; the single-chain segmented carry control component comprises one digital bus switch SW, one tri-state gate TS and one inverter; a carry transfer function P and a carry generation function G are produced by the addition, counting or multiplication logic circuit; P controls the on/off state of SW to switch the link on or segment it, and controls the enable terminal of TS to determine whether G enters the link; SW and TS are mutually exclusive: if P is active, the link is switched on and TS is disabled, preventing G from entering the link; if P is inactive, the link is cut off and segmented, and TS is enabled so that G enters the higher-order segment of the link.

2. The wide-word high-speed segmented carry adder, counter and multiplier as claimed in claim 1, characterized in that in said 128-bit adder one single-chain control component and one 4-bit parallel adder component form one unit, the unit comprising: one single-chain segmented carry control component and a 4-bit parallel adder ALU, the P and G generated by the ALU being used to control the single-chain control component; 32 units are connected in series, with Cout of unit i connected to Cin of unit i+1, to form a 128-bit parallel adder; addend A (a128, a127, …, a2, a1), addend B (b128, b127, …, b2, b1) and c0 are input to the adder, and the adder outputs the sum S (s129, s128, …, s2, s1).

3. The wide-word high-speed segmented carry adder, counter and multiplier as claimed in claim 1, characterized in that in said 128-bit synchronous counter one single-chain control component and one 4-bit parallel counter component form one unit, the unit comprising: one single-chain segmented carry control component and a 4-bit parallel counter; each fixed enable signal ENTi is set to 1, so that each dynamic carry output signal RCOi = Qa·Qb·Qc·Qd·1; RCOi has the dual nature of P and G and is used to control the single-chain control component; RCOi = 1 indicates that the 4-bit count is full, the link is switched on, the carry state Cin from the lower-order unit carries on through the unit, and the RCOi state is prevented from entering the carry chain; when RCOi = 0, the link is cut off to form a segmentation point and the RCOi of the unit is sent into the higher-order carry segment; 32 units are connected in series, with Cout of unit i connected to Cin of unit i+1, and the CLR, CK, LD and ENT terminals of each unit are connected to the master control signals CLR, CK, LD and Vcc respectively, to form a 128-bit parallel synchronous counter, wherein CLR is the counter clear-to-0 pulse; CK is the count pulse; LD is the preset pulse, by which the preset number x128, x127, …, x2, x1 can be loaded into the counter; and Q128, Q127, …, Q2, Q1 are the current count values.

4. The wide-word high-speed segmented carry adder, counter and multiplier as claimed in claim 1, characterized in that said 64-bit multiplier obtains 16 bottom-layer data sets, each 32 bits long, from a multiplication array of 8 × 8 bottom-layer units, each bottom-layer unit comprising one SN74284 and one SN74285; the single-chain control component and the 4-bit parallel adder component of claim 2 form one addition unit, and the first-layer, second-layer, third-layer and top-layer additions are carried out according to the multiplication expansion rule to obtain the 128-bit multiplication product.