CN1707426A - Operand value distributing apparatus based on configurational multiplier array structure and distributing method thereof - Google Patents
- Publication number
- CN1707426A (application CN 200410025007)
- Authority
- CN
- China
- Prior art keywords
- multiplier
- input end
- multipliers
- row
- vector input
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Abstract
The invention discloses an operand distribution device and method supporting digital signal processing in a microprocessor. The operand distribution device includes two input units, a memory access unit, and an execution unit connected between the input units and the memory access unit. It is characterized in that the execution unit is a multiplier matrix composed of small bit-width multipliers, and in that it further includes two operand configuration connectors, each connected between one of the input units and the execution unit. In the operand distribution method, the configuration connectors connect the horizontal and vertical vector inputs of the input units to the horizontal and vertical vector inputs of the multipliers via different routes, and the memory access unit outputs the operation results of the different configurations. The invention greatly improves the digital signal processing performance of the microprocessor.
Description
Technical field
The present invention relates to an operand distribution device based on a configurable multiplier matrix structure and to its distribution method, used to support efficient digital signal processing in a microprocessor.
Background art
When a microprocessor performs high-performance digital signal processing, a wide (large bit-width) multiplier is generally combined with other units. As shown in Fig. 1, the arrangement comprises an execution unit (a wide multiplier) 11, two groups of input units 12 and 13 each connected to the multiplier, and a memory access unit 14. Because a wide multiplier cannot provide enough computation channel bandwidth, the resulting digital signal processing performance is unsatisfactory.
Summary of the invention
The object of the invention is to provide an operand distribution device and distribution method based on a configurable multiplier matrix structure, in order to solve the problem that existing microprocessors are inefficient at digital signal processing. With only a few added components, the structure and method realize a variety of operand configurations and thereby greatly improve digital signal processing performance.
The technical measures taken by the invention are:
An operand distribution device based on a configurable multiplier matrix structure comprises two groups of input units, a memory access unit, and an execution unit connected between the input units and the memory access unit; the execution unit is a multiplier whose horizontal vector inputs and vertical vector inputs are connected respectively to the corresponding horizontal vector inputs and vertical vector inputs of the input units;
It is characterized in that the execution unit is a multiplier matrix composed of a plurality of small bit-width multipliers, and in that the device further comprises two operand configuration connectors; each operand configuration connector is connected between its corresponding group of input units and the inputs of the execution unit, and the memory access unit outputs the operation results of the different configurations.
In the above operand distribution device based on a configurable multiplier matrix structure, each small bit-width multiplier is an 8-bit multiplier, which accepts two 8-bit source operands and produces a 16-bit result.
In the above operand distribution device based on a configurable multiplier matrix structure, the multiplier matrix consists of 4 × 4 = 16 8-bit multipliers arranged in an array.
An operand distribution method based on a configurable multiplier matrix structure is characterized by comprising the following steps:
A. build a multiplier matrix composed of a plurality of small bit-width multipliers;
B. decompose the input signal into two groups of 32-bit vectors, one group being the horizontal input vector and the other the vertical input vector; divide the inputs of each 8-bit multiplier into a horizontal vector input and a vertical vector input; and connect configuration connectors between the signal inputs and the inputs of the multiplier matrix;
C. the configuration connectors route the horizontal and vertical input vectors of the input units, via different routes, to the horizontal and vertical vector inputs of the small bit-width multipliers of the execution unit, and the memory access unit outputs the operation results of the different configurations, supporting different digital signal processing functions of the microprocessor.
In the above operand distribution method, the multiplier matrix consists of 4 × 4 = 16 8-bit multipliers arranged in an array; each 8-bit multiplier accepts two 8-bit source operands and produces a 16-bit result.
In the above operand distribution method, the digital signal processing functions of the microprocessor supported in step C are: 32 × 32-bit multiplication, the fast Fourier transform, and 8-bit complex vector dot-product operations.
In the above operand distribution method, the route by which the configuration connectors configure the operands to support 32 × 32-bit multiplication is:
A. each horizontal vector input of the horizontal vector input unit is connected to the horizontal vector inputs of the multipliers in one row of the multiplier matrix;
B. each vertical vector input of the vertical vector input unit is connected to the vertical inputs of the multipliers in one column of the multiplier matrix;
With the two configurations above, the results output at the memory access unit are of four kinds:
First: three outputs are shifted left by 16 bits and one output is not shifted;
Second: three outputs are shifted left by 32 bits and one output is shifted left by 48 bits;
Third: two outputs are shifted left by 40 bits and the other two outputs are shifted left by 8 bits;
Fourth: every output is shifted left by 24 bits.
In the above operand distribution method, the operands are configured to support the fast Fourier transform using two configurations;
The route of the first configuration is:
A. two alternate rows of multipliers in the multiplier matrix are selected, and each horizontal vector input of the horizontal vector input unit is connected to the horizontal vector inputs of the corresponding multiplier in each of the two selected rows;
B. two alternate columns of multipliers in the multiplier matrix are selected, and each vertical vector input of the vertical vector input unit is connected to the vertical inputs of the corresponding multiplier in each of the two selected columns.
The route of the second configuration is:
A. the other two alternate rows of multipliers in the multiplier matrix are selected, and each horizontal vector input of the horizontal vector input unit is connected to the horizontal vector inputs of the corresponding multiplier in each of the two selected rows;
B. the other two alternate columns of multipliers in the multiplier matrix are selected, and each vertical vector input of the vertical vector input unit is connected to the vertical inputs of the corresponding multiplier in each of the two selected columns;
With the two configurations above, the results output at the memory access unit are of four kinds:
First: one row of multipliers in the multiplier matrix is selected, and the outputs of the memory access unit are connected in pairs to the outputs of the multipliers in the selected row;
Second: one of the remaining three rows of multipliers is selected, and the outputs of the memory access unit are connected in pairs to the outputs of the multipliers in the selected row;
Third: one of the remaining two rows of multipliers is selected, and the outputs of the memory access unit are connected in pairs to the outputs of the multipliers in the selected row;
Fourth: the outputs of the memory access unit are connected in pairs to the outputs of the last remaining row of multipliers.
In the above operand distribution method, the operands are configured to support 8-bit dot-product operations, and the route of the configuration is:
A. each horizontal vector input of the horizontal vector input unit is connected to the horizontal vector inputs of two multipliers in one row of the multiplier matrix, wherein two of the horizontal vector inputs are connected respectively to the horizontal vector inputs of the first two multipliers of the first and second rows, and the other two horizontal vector inputs are connected respectively to the horizontal vector inputs of the last two multipliers of the third and fourth rows;
B. each vertical vector input of the vertical vector input unit is connected to the vertical inputs of two multipliers in one column of the multiplier matrix, wherein two of the vertical vector inputs are connected respectively to the vertical inputs of the first two multipliers of the first and second columns, and the other two vertical vector inputs are connected respectively to the vertical inputs of the last two multipliers of the third and fourth columns;
The output results are:
First: any two rows and two columns of the multiplier matrix are selected, and the outputs of the memory access unit are connected in pairs to the outputs of the multipliers at the intersections of the selected rows and columns;
Second: the outputs of the memory access unit are connected in pairs to the outputs of the multipliers at the intersections of the other two rows and two columns.
By adopting the above technical scheme, the invention achieves the following beneficial effects:
1. Digital signal processing performance is greatly improved while adding only a small amount of hardware.
2. Changes to the overall structure of the microprocessor are kept small, which avoids introducing additional errors.
Description of drawings
The specific structure and performance of the invention are further described by the following embodiments and the accompanying drawings.
Fig. 1 shows the operand distribution device that supports digital signal processing in an existing microprocessor.
Fig. 2 is a schematic diagram of the structure of the operand distribution device of the invention, based on a configurable multiplier matrix structure.
Fig. 3 is a schematic diagram of the structure of the multiplier matrix in Fig. 2.
Figs. 4a and 4b are route connection diagrams of the first embodiment of the invention for configuring the input operands.
Figs. 5a, 5b, 5c and 5d are schematic diagrams of the output distribution results of the first embodiment shown in Fig. 4.
Figs. 6a, 6b, 6c and 6d are route connection diagrams of the second embodiment of the invention for configuring the input operands.
Figs. 7a, 7b, 7c and 7d are schematic diagrams of the output distribution results of the second embodiment shown in Fig. 6.
Figs. 8a and 8b are route connection diagrams of the third embodiment of the invention for configuring the input operands.
Figs. 9a and 9b are schematic diagrams of the output distribution results of the third embodiment shown in Fig. 8.
Embodiment
As shown in Fig. 2, the operand distribution device that supports digital signal processing in the microprocessor comprises two groups of input units 21 and 22, an execution unit 23, two operand configuration connectors 24 and 25, and a memory access unit 26. Of the two groups of input units 21 and 22, one is the horizontal vector input unit 21 and the other is the vertical vector input unit 22. The horizontal vector input unit 21 is divided into four 8-bit horizontal vectors Y_0~Y_3, and the vertical vector input unit 22 is divided into four 8-bit vertical vectors X_0~X_3. The memory access unit 26 (shown as Z_0~Z_3 in the other figures) outputs the operation results of the different configurations.
Referring to Fig. 3, the execution unit 23 is a multiplier matrix composed of 4 × 4 = 16 8-bit (small bit-width) multipliers M. Each 8-bit multiplier accepts two 8-bit source operands A and B and produces one 16-bit result C.
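As a rough software model of the structure just described, the following Python sketch treats each multiplier M as a function of two 8-bit operands and evaluates a 4 × 4 matrix of them. The names `mul8` and `multiplier_matrix` are illustrative, and the operands are assumed to be unsigned, which the text does not specify.

```python
def mul8(a: int, b: int) -> int:
    """One 8-bit multiplier M: operands A and B in, 16-bit result C out.

    Assumption: operands are unsigned 8-bit values.
    """
    if not (0 <= a <= 0xFF and 0 <= b <= 0xFF):
        raise ValueError("operands must fit in 8 bits")
    return a * b  # at most 255 * 255 = 65025, which fits in 16 bits


def multiplier_matrix(a_in, b_in):
    """Evaluate a 4 x 4 matrix of 8-bit multipliers.

    a_in[i][j] and b_in[i][j] are the values routed to inputs A and B of
    the multiplier in row i, column j; the result holds each output C.
    """
    return [[mul8(a_in[i][j], b_in[i][j]) for j in range(4)] for i in range(4)]
```

In this model the operand configuration connectors correspond to whatever logic fills `a_in` and `b_in` from the input vectors.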
Of the two operand configuration connectors 24 and 25, the inputs of connector 24 are connected to the four 8-bit horizontal vectors Y_0~Y_3 of the horizontal vector input unit 21, and its outputs are connected by suitable routes to the horizontal inputs of the multipliers in the multiplier matrix. The inputs of the other connector 25 are connected to the four 8-bit vertical vectors X_0~X_3 of the vertical vector input unit 22, and its outputs are connected by suitable routes to the vertical inputs of the multipliers in the multiplier matrix. The memory access unit 26 is connected to the outputs C of the execution unit 23.
By suitable shifting and routing, the invention can perform operations such as 32 × 32-bit long multiplication, highly parallel 8-bit and 16-bit multiplication, and the FFT, thereby achieving high-performance digital signal processing in the microprocessor.
The method and advantages of the invention are further explained below through specific embodiments.
See Figs. 4a and 4b, which are schematic diagrams of the operand distribution of the first embodiment of the invention, supporting 32 × 32-bit multiplication.
The operand configuration method in the microprocessor of the invention comprises the following steps:
A. build a multiplier matrix composed of a plurality of small bit-width multipliers;
B. decompose the input signal into two groups of 32-bit vectors, one group being the horizontal input vector Y and the other the vertical input vector X; divide the inputs of each 8-bit multiplier into a horizontal vector input A and a vertical vector input B; and connect configuration connectors between the signal inputs and the inputs of the multiplier matrix;
C. the configuration connectors route the horizontal and vertical input vectors of the input units, via different routes, to the horizontal and vertical vector inputs of the small bit-width multipliers of the execution unit, and the memory access unit outputs the operation results of the different configurations, supporting different digital signal processing functions of the microprocessor.
Suppose Y and X are two 32-bit vectors and Z is the product of Y and X. Then:
Y is divided into the following four 8-bit vectors:
Y'_0 = Y_7 ... Y_0
Y'_1 = Y_15 ... Y_8
Y'_2 = Y_23 ... Y_16
Y'_3 = Y_31 ... Y_24
X is divided into the following four 8-bit vectors:
X'_0 = X_7 ... X_0
X'_1 = X_15 ... X_8
X'_2 = X_23 ... X_16
X'_3 = X_31 ... X_24
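The decomposition above amounts to taking the four bytes of each 32-bit word. A minimal Python sketch follows, assuming unsigned values; the helper name `split_bytes` is illustrative.

```python
def split_bytes(v: int) -> list:
    """Split a 32-bit value V into [V'_0, V'_1, V'_2, V'_3].

    V'_0 holds bits 7..0 and V'_3 holds bits 31..24.
    """
    if not 0 <= v < (1 << 32):
        raise ValueError("value must fit in 32 bits")
    return [(v >> (8 * i)) & 0xFF for i in range(4)]
```

For example, `split_bytes(0x12345678)` yields the bytes 0x78, 0x56, 0x34, 0x12, in that order.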
Each 8-bit multiplier has two 8-bit inputs: one is the horizontal vector input A and the other is the vertical vector input B.
The route by which the configuration connectors configure the operands to support 32 × 32-bit multiplication is:
A. each horizontal vector input of the horizontal vector input unit is connected to the horizontal vector inputs of the multipliers in one row of the multiplier matrix; as shown in Fig. 4a, the horizontal vector inputs Y_3, Y_2, Y_1, Y_0 of the horizontal vector input unit are connected respectively to the horizontal vector inputs A_33~A_30, A_23~A_20, A_13~A_10, A_03~A_00 of one row of multipliers in the multiplier matrix;
B. each vertical vector input of the vertical vector input unit is connected to the vertical inputs of the multipliers in one column of the multiplier matrix; as shown in Fig. 4b, the vertical vector inputs X_3, X_2, X_1, X_0 of the vertical vector input unit are connected respectively to the vertical vector inputs B_33~B_30, B_23~B_20, B_13~B_10, B_03~B_00 of one column of multipliers in the multiplier matrix.
Referring to Figs. 5a~5d, the above configuration outputs four kinds of results:
First: three outputs are shifted left by 16 bits and one output is not shifted;
Second: three outputs are shifted left by 32 bits and one output is shifted left by 48 bits;
Third: two outputs are shifted left by 40 bits and the other two outputs are shifted left by 8 bits;
Fourth: every output is shifted left by 24 bits.
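The shift amounts listed above are what one would expect from building a 32 × 32-bit product out of sixteen 8 × 8-bit partial products: the product of byte i of Y and byte j of X carries a weight of 2^(8(i+j)). The Python sketch below illustrates that arithmetic only. It assumes unsigned operands and does not model how the memory access unit groups the outputs; the function names are illustrative.

```python
from collections import Counter


def mul32_via_8x8(y: int, x: int) -> int:
    """Multiply two unsigned 32-bit values using only 8 x 8-bit products."""
    yb = [(y >> (8 * i)) & 0xFF for i in range(4)]  # Y'_0 .. Y'_3
    xb = [(x >> (8 * j)) & 0xFF for j in range(4)]  # X'_0 .. X'_3
    z = 0
    for i in range(4):
        for j in range(4):
            partial = yb[i] * xb[j]        # one 16-bit multiplier output
            z += partial << (8 * (i + j))  # weight of this partial product
    return z


def shift_histogram() -> dict:
    """Count how many of the 16 partial products use each left shift."""
    return dict(Counter(8 * (i + j) for i in range(4) for j in range(4)))
```

The histogram gives one product with no shift, two shifted by 8, three by 16, four by 24, three by 32, two by 40 and one by 48, which accounts for the sixteen outputs in the four result groups listed above.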
See Figs. 6a~6d, which show the second embodiment of the invention, used to implement a radix-4 FFT (fast Fourier transform) on 8-bit complex vectors.
Principle of the radix-4 FFT algorithm
Let the array f have length N and let W = e^(2πi/N). The classical radix-4 FFT algorithm is as follows:
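The following is a minimal Python sketch of a textbook recursive radix-4 decimation-in-time FFT, offered only to illustrate the kind of algorithm referred to here; it is not taken from the patent. It assumes the sign convention W = e^(2πi/N) given above and an input length that is a power of 4. The function name `fft_radix4` is illustrative.

```python
import cmath


def fft_radix4(f):
    """Return F[k] = sum over n of f[n] * W**(n*k), with W = exp(2*pi*i/N).

    Textbook radix-4 decimation in time; len(f) must be a power of 4.
    """
    n = len(f)
    if n == 1:
        return [complex(f[0])]
    if n < 1 or n % 4 != 0:
        raise ValueError("length must be a power of 4")
    quarter = n // 4
    # Four sub-transforms over the samples with index n = 4p + r.
    subs = [fft_radix4(f[r::4]) for r in range(4)]
    w = cmath.exp(2j * cmath.pi / n)
    out = [0j] * n
    for k in range(quarter):
        # Twiddle factors W**(r*k) applied to the sub-transform outputs.
        t = [subs[r][k] * w ** (r * k) for r in range(4)]
        for m in range(4):
            # Radix-4 butterfly: W**(r*m*N/4) reduces to i**(r*m).
            out[k + m * quarter] = sum(t[r] * 1j ** (r * m) for r in range(4))
    return out
```

Each butterfly combines four twiddled values using only multiplications by ±1 and ±i, so the general complex multiplications are those by the twiddle factors.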
The operand configuration method in the microprocessor of the invention comprises the following steps:
A. build a multiplier matrix composed of a plurality of small bit-width multipliers;
B. decompose the input signal into two groups of 32-bit vectors, one group being the horizontal input vector Y and the other the vertical input vector X; divide the inputs of each 8-bit multiplier into a horizontal vector input A and a vertical vector input B; and connect configuration connectors between the signal inputs and the inputs of the multiplier matrix;
C. the configuration connectors route the horizontal and vertical input vectors of the input units, via different routes, to the horizontal and vertical vector inputs of the small bit-width multipliers of the execution unit, and the memory access unit outputs the operation results of the different configurations, supporting different digital signal processing functions of the microprocessor.
The operands are configured to support the fast Fourier transform using two configurations;
The route of the first configuration is:
A. two alternate rows of multipliers in the multiplier matrix are selected, and each horizontal vector input of the horizontal vector input unit is connected to the horizontal vector inputs of the corresponding multiplier in each of the two selected rows. As shown in Fig. 6a, the multipliers of the 2nd and 4th rows of the multiplier matrix, A_23~A_20 and A_03~A_00, are used: terminal Y_3 of the horizontal vector input unit is connected to A_23 and A_03, terminal Y_2 to A_22 and A_02, terminal Y_1 to A_01 and A_21, and terminal Y_0 to A_20 and A_00.
B. two alternate columns of multipliers in the multiplier matrix are selected, and each vertical vector input of the vertical vector input unit is connected to the vertical inputs of the corresponding multiplier in each of the two selected columns. As shown in Fig. 6b, the multipliers of the 1st and 3rd columns of the multiplier matrix, B_32~B_02 and B_30~B_00, are used: terminal X_3 of the vertical vector input unit is connected to B_30 and B_32, terminal X_2 to B_20 and B_22, terminal X_1 to B_10 and B_12, and terminal X_0 to B_02 and B_00.
The route of the second configuration is:
A. the other two alternate rows of multipliers in the multiplier matrix are selected, and each horizontal vector input of the horizontal vector input unit is connected to the horizontal vector inputs of the corresponding multiplier in each of the two selected rows. As shown in Fig. 6c, the multipliers of the 1st and 3rd rows of the multiplier matrix, A_00~A_03 and A_30~A_33, are used: terminal Y_3 of the horizontal vector input unit is connected to A_33 and A_13, terminal Y_2 to A_32 and A_12, terminal Y_1 to A_11 and A_31, and terminal Y_0 to A_30 and A_10.
B. the other two alternate columns of multipliers in the multiplier matrix are selected, and each vertical vector input of the vertical vector input unit is connected to the vertical inputs of the corresponding multiplier in each of the two selected columns. As shown in Fig. 6d, the multipliers of the 2nd and 4th columns of the multiplier matrix, B_01~B_31 and B_03~B_33, are used: terminal X_3 of the vertical vector input unit is connected to B_33 and B_31, terminal X_2 to B_21 and B_23, terminal X_1 to B_11 and B_13, and terminal X_0 to B_01 and B_03.
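Taken together, the four connection lists for Figs. 6a to 6d can be checked mechanically. The Python sketch below transcribes them as written, reading each two-digit subscript as (row, column), which is an assumption about the notation; the table and function names are illustrative. Under that reading, the multiplier in row i, column j receives Y_j on input A and X_i on input B.

```python
# Connection lists transcribed from the text for Figs. 6a to 6d.
# Keys are input-vector indices; values are (row, col) subscripts of the
# multiplier inputs they feed, reading A_rc / B_rc as row r, column c.
A_ROUTES = {
    3: [(2, 3), (0, 3), (3, 3), (1, 3)],
    2: [(2, 2), (0, 2), (3, 2), (1, 2)],
    1: [(0, 1), (2, 1), (1, 1), (3, 1)],
    0: [(2, 0), (0, 0), (3, 0), (1, 0)],
}
B_ROUTES = {
    3: [(3, 0), (3, 2), (3, 3), (3, 1)],
    2: [(2, 0), (2, 2), (2, 1), (2, 3)],
    1: [(1, 0), (1, 2), (1, 1), (1, 3)],
    0: [(0, 2), (0, 0), (0, 1), (0, 3)],
}


def sources(routes):
    """Invert a routing table: (row, col) -> index of the vector feeding it."""
    table = {}
    for vec, targets in routes.items():
        for cell in targets:
            if cell in table:
                raise ValueError(f"input {cell} is driven twice")
            table[cell] = vec
    return table


a_src = sources(A_ROUTES)  # which Y feeds input A of each multiplier
b_src = sources(B_ROUTES)  # which X feeds input B of each multiplier
```

Each of the sixteen A inputs and sixteen B inputs is driven exactly once across the two configurations.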
Referring to Figs. 7a~7d, the two configurations above output four kinds of results:
First: one row of multipliers in the multiplier matrix is selected, and the outputs of the memory access unit are connected in pairs to the outputs of the multipliers in the selected row;
Second: one of the remaining three rows of multipliers is selected, and the outputs of the memory access unit are connected in pairs to the outputs of the multipliers in the selected row;
Third: one of the remaining two rows of multipliers is selected, and the outputs of the memory access unit are connected in pairs to the outputs of the multipliers in the selected row;
Fourth: the outputs of the memory access unit are connected in pairs to the outputs of the last remaining row of multipliers.
Explanation:
For a given k, the multiplier matrix can compute F_k in one cycle. The multiplier matrix is configured as follows:
Let M(i, j) be the multiplier in row i, column j. R(x) is the real part of the complex number x and I(x) is its imaginary part, with n = 0, 1, 2, 3:
Timing analysis of the complex radix-4 FFT
A complex radix-4 FFT of N points requires log_4 N levels of recursion in total, and each level requires N operations of type (1), for a total of N log_4 N operations.
For a complex radix-4 FFT of length N, let the loop unrolling factor be k. The load instructions and the FFT2 instructions then each require N log_4 N cycles, and the branch instructions require (N log_4 N)/k cycles. The total is therefore (2 + 1/k) N log_4 N cycles.
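As a worked instance of this estimate, the Python sketch below evaluates (2 + 1/k) N log_4 N for a given length and unrolling factor. It assumes N is a power of 4; the function names are illustrative.

```python
def log4_exact(n: int) -> int:
    """Return log base 4 of n, requiring n to be a power of 4."""
    if n < 1:
        raise ValueError("n must be positive")
    levels = 0
    while n > 1:
        if n % 4 != 0:
            raise ValueError("n must be a power of 4")
        n //= 4
        levels += 1
    return levels


def radix4_fft_cycles(n: int, k: int) -> float:
    """Cycle estimate (2 + 1/k) * N * log4(N) from the timing analysis."""
    ops = n * log4_exact(n)  # N log_4 N operations
    # Load cycles plus FFT2 cycles plus branch cycles.
    return ops + ops + ops / k
```

For N = 64 and k = 4, log_4 N = 3 and N log_4 N = 192, so the estimate is 192 + 192 + 48 = 432 cycles.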
Figs. 8a and 8b are route diagrams of the third embodiment of the invention, which implements 8-bit complex vector dot-product operations.
The operand configuration method in the microprocessor of the invention comprises the following steps:
A. build a multiplier matrix composed of a plurality of small bit-width multipliers;
B. decompose the input signal into two groups of 32-bit vectors, one group being the horizontal input vector and the other the vertical input vector; divide the inputs of each 8-bit multiplier into a horizontal vector input and a vertical vector input; and connect configuration connectors between the signal inputs and the inputs of the multiplier matrix;
C. the configuration connectors route the horizontal and vertical input vectors of the input units, via different routes, to the horizontal and vertical vector inputs of the small bit-width multipliers of the execution unit, and the memory access unit outputs the operation results of the different configurations, supporting different digital signal processing functions of the microprocessor.
The operands are configured to support 8-bit dot-product operations, and the route of the configuration is:
A. each horizontal vector input of the horizontal vector input unit is connected to the horizontal vector inputs of two multipliers in one row of the multiplier matrix, wherein two of the horizontal vector inputs are connected respectively to the horizontal vector inputs of the first two multipliers of the first and second rows, and the other two horizontal vector inputs are connected respectively to the horizontal vector inputs of the last two multipliers of the third and fourth rows. As shown in Fig. 8a, terminal Y_3 of the horizontal vector input unit is connected to A_33 and A_32 of the first row of multipliers, terminal Y_2 to A_23 and A_22 of the second row, terminal Y_1 to A_11 and A_10 of the third row, and terminal Y_0 to A_01 and A_00 of the fourth row.
B. each vertical vector input of the vertical vector input unit is connected to the vertical inputs of two multipliers in one column of the multiplier matrix, wherein two of the vertical vector inputs are connected respectively to the vertical inputs of the first two multipliers of the first and second columns, and the other two vertical vector inputs are connected respectively to the vertical inputs of the last two multipliers of the third and fourth columns. As shown in Fig. 8b, terminal X_3 of the vertical vector input unit is connected to B_33 and B_23 of the 4th column of multipliers, terminal X_2 to B_32 and B_22 of the 3rd column, terminal X_1 to B_11 and B_01, and terminal X_0 to B_10 and B_00.
There are two kinds of output results:
See Fig. 9a for the first kind: any two rows and two columns of the multiplier matrix are selected, and the outputs of the memory access unit are connected in pairs to the outputs of the multipliers at the intersections of the selected rows and columns;
See Fig. 9b for the second kind: the outputs of the memory access unit are connected in pairs to the outputs of the multipliers at the intersections of the other two rows and two columns.
Explanation:
Principle of the 8-bit complex vector dot-product operation:
Given two arrays:
A = <a_0, ..., a_n>
B = <b_0, ..., b_n>
where, for any complex number a_i, I(a_i) denotes its imaginary part and R(a_i) its real part.
The dot-product operation on the two arrays yields a result array:
C = <c_0, ..., c_n>
where the complex number c_i = a_i * b_i.
Configuration of the multiplier matrix
The imaginary part of c_i is computed as follows:
I(c_i) = I(a_i) * R(b_i) + I(b_i) * R(a_i)    (3)
The real part of c_i is computed as follows:
R(c_i) = R(a_i) * R(b_i) - I(b_i) * I(a_i)    (4)
Taking into account both the multiplier matrix and the dual-port access characteristic of the memory (two 32-bit words per cycle), a dot product of length 2 can be completed in each cycle. Completing a dot product of length N then requires N cycles.
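Formulas (3) and (4) can be illustrated with the short Python sketch below, which forms each element of the result array from four real multiplications. It models the arithmetic only, not the routing or the cycle timing; the function names and the representation of a complex number as a `(real, imaginary)` pair of integers are assumptions made for the example.

```python
def complex_mul_parts(a_re: int, a_im: int, b_re: int, b_im: int):
    """Return (R(c), I(c)) for c = a * b using four real multiplications."""
    c_im = a_im * b_re + b_im * a_re  # formula (3)
    c_re = a_re * b_re - b_im * a_im  # formula (4)
    return c_re, c_im


def elementwise_complex_product(a, b):
    """Compute C = <c_0, ..., c_n> with c_i = a_i * b_i.

    a and b are equal-length sequences of (real, imaginary) pairs.
    """
    if len(a) != len(b):
        raise ValueError("arrays must have the same length")
    return [complex_mul_parts(ar, ai, br, bi) for (ar, ai), (br, bi) in zip(a, b)]
```

For example, with a_i = 3 + 4i and b_i = 1 - 2i, formula (4) gives R(c_i) = 3 + 8 = 11 and formula (3) gives I(c_i) = 4 - 6 = -2.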
Claims (9)
1. An operand distribution device based on a configurable multiplier matrix structure, comprising two groups of input units, a memory access unit, and an execution unit connected between the input units and the memory access unit, the execution unit being a multiplier whose horizontal vector inputs and vertical vector inputs are connected respectively to the corresponding horizontal vector inputs and vertical vector inputs of the input units;
characterized in that the execution unit is a multiplier matrix composed of a plurality of small bit-width multipliers; and in that the device further comprises two operand configuration connectors, each operand configuration connector being connected between its corresponding group of input units and the inputs of the execution unit, the memory access unit outputting the operation results of the different configurations.
2. The operand distribution device based on a configurable multiplier matrix structure according to claim 1, characterized in that each small bit-width multiplier is an 8-bit multiplier, which accepts two 8-bit source operands and produces a 16-bit result.
3. The operand distribution device based on a configurable multiplier matrix structure according to claim 1, characterized in that the multiplier matrix consists of 4 × 4 = 16 8-bit multipliers arranged in an array.
4. An operand distribution method based on a configurable multiplier matrix structure, characterized by comprising the following steps:
A. building a multiplier matrix composed of a plurality of small bit-width multipliers;
B. decomposing the input signal into two groups of 32-bit vectors, one group being the horizontal input vector and the other the vertical input vector; dividing the inputs of each 8-bit multiplier into a horizontal vector input and a vertical vector input; and connecting configuration connectors between the signal inputs and the inputs of the multiplier matrix;
C. the configuration connectors routing the horizontal and vertical input vectors of the input units, via different routes, to the horizontal and vertical vector inputs of the small bit-width multipliers of the execution unit, and the memory access unit outputting the operation results of the different configurations, supporting different digital signal processing functions of the microprocessor.
5. The operand distribution method according to claim 4, characterized in that the multiplier matrix consists of 4 × 4 = 16 8-bit multipliers arranged in an array, and each 8-bit multiplier accepts two 8-bit source operands and produces a 16-bit result.
6. The operand distribution method according to claim 4, characterized in that the digital signal processing functions of the microprocessor supported in step C are: 32 × 32-bit multiplication, the fast Fourier transform, and 8-bit complex vector dot-product operations.
7. The operand distribution method according to claim 4 or 6, characterized in that the routes by which the configuration connector distributes the operands to support 32 × 32 multiplication are:
A. each horizontal vector input of the horizontal vector input unit is connected to the horizontal vector inputs of the multipliers in one row of the multiplier matrix;
B. each vertical vector input of the vertical vector input unit is connected to the vertical inputs of the multipliers in one column of the multiplier matrix;
The results of the above two configuration steps are output through the memory-access output connector in four forms:
First: three outputs are shifted left by 16 bits, and one output is not shifted;
Second: three outputs are shifted left by 32 bits, and one output is shifted left by 48 bits;
Third: two outputs are shifted left by 40 bits, and the other two outputs are shifted left by 8 bits;
Fourth: every output is shifted left by 24 bits.
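The routing in claim 7 can be illustrated with a short sketch that builds a 32 × 32 product from sixteen 8 × 8 partial products. It assumes unsigned operands, little-endian byte order, and that the partial product at row i, column j carries a weight of 8 × (i + j) bits; the function names are hypothetical. Under those assumptions the shift amounts occur 1, 2, 3, 4, 3, 2 and 1 times for 0, 8, 16, 24, 32, 40 and 48 bits, which appears consistent with the four output forms listed in the claim (16 × 3 with 0 × 1, 32 × 3 with 48 × 1, 40 × 2 with 8 × 2, and 24 × 4).

```python
from collections import Counter


def split_bytes(value: int) -> list:
    """Split a 32-bit unsigned value into four 8-bit operands, least significant first."""
    return [(value >> (8 * k)) & 0xFF for k in range(4)]


def mul32_via_8bit_products(a: int, b: int) -> int:
    """Sum sixteen 8 x 8 partial products, each shifted left by 8 * (row + column) bits."""
    rows = split_bytes(a)   # horizontal input vector (assumed mapping)
    cols = split_bytes(b)   # vertical input vector (assumed mapping)
    total = 0
    for i, h in enumerate(rows):
        for j, v in enumerate(cols):
            total += (h * v) << (8 * (i + j))
    return total


# How many partial products share each shift amount in the 4 x 4 grid.
SHIFT_COUNTS = Counter(8 * (i + j) for i in range(4) for j in range(4))
```

Grouping the sixteen partial products four at a time, as the four output forms do, would let a four-output memory access stage deliver the whole product in four passes; that reading is an interpretation of the claim, not something the claim states.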
8. The operand distribution method according to claim 4 or 6, characterized in that the operands are distributed to support the fast Fourier transform using two configurations;
The routes of the first configuration are:
A. select two non-adjacent rows of multipliers in the multiplier matrix, and connect each horizontal vector input of the horizontal vector input unit to the horizontal vector inputs of the two corresponding multipliers that lie in the same row among the two selected rows;
B. select two non-adjacent rows of multipliers in the multiplier matrix, and connect each vertical vector input of the vertical vector input unit to the vertical inputs of the two corresponding multipliers that lie in the same row among the two selected rows.
The routes of the second configuration are:
A. select the other two non-adjacent rows of multipliers in the multiplier matrix, and connect each horizontal vector input of the horizontal vector input unit to the horizontal vector inputs of the two corresponding multipliers that lie in the same row among the two selected rows;
B. select the two non-adjacent rows of multipliers in the multiplier matrix, and connect each vertical vector input of the vertical vector input unit to the vertical inputs of the two corresponding multipliers that lie in the same row among the two selected rows;
The results of the above two configurations are output through the memory-access output connector in four forms:
First: select one row of multipliers in the multiplier matrix, and connect the outputs of the memory access unit pairwise to the outputs of the multipliers in the selected row;
Second: select one of the remaining three rows of multipliers in the multiplier matrix, and connect the outputs of the memory access unit pairwise to the outputs of the multipliers in the selected row;
Third: select one of the remaining two rows of multipliers in the multiplier matrix, and connect the outputs of the memory access unit pairwise to the outputs of the multipliers in the selected row;
Fourth: connect the outputs of the memory access unit pairwise to the outputs of the remaining row of multipliers in the multiplier matrix.
9. The operand distribution method according to claim 4 or 6, characterized in that the operands are distributed to support the 8-bit dot product operation, and the routes of the configuration are:
A. each horizontal vector input of the horizontal vector input unit is connected to the horizontal vector inputs of two multipliers in one row of the multiplier matrix, wherein two of the horizontal vector inputs are connected respectively to the horizontal vector inputs of the first two multipliers of the first and second rows, and the other two horizontal vector inputs are connected respectively to the horizontal vector inputs of the last two multipliers of the third and fourth rows;
B. each vertical vector input of the vertical vector input unit is connected to the vertical inputs of two multipliers in one column of the multiplier matrix, wherein two of the vertical vector inputs are connected respectively to the vertical inputs of the first two multipliers of the first and second columns, and the other two vertical vector inputs are connected respectively to the vertical inputs of the last two multipliers of the third and fourth columns;
There are two kinds of output result:
First: select any two rows and two columns of the multiplier matrix, and connect the outputs of the memory access unit pairwise to the outputs of the multipliers at the intersections of the selected rows and columns;
Second: connect the outputs of the memory access unit pairwise to the outputs of the multipliers at the intersections of the other two rows and two columns.
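The 2 × 2 blocks at the row and column intersections in claim 9 line up naturally with complex multiplication, since (a + bi)(c + di) needs exactly the four real products ac, ad, bc and bd. The sketch below shows that correspondence for the complex vector dot product named in claim 6. It is an illustration under assumptions, not the patented routing: the components are taken to be signed 8-bit integers, no conjugation is applied, the real and imaginary parts of one operand are assumed to arrive on the horizontal inputs and those of the other on the vertical inputs, and the function name is hypothetical.

```python
def complex_dot_2x2_blocks(x, y):
    """Dot product of two complex vectors given as lists of (re, im) integer pairs.

    Each element pair is handled by one 2 x 2 block of real multiplications,
    mirroring the multipliers at the intersection of two rows and two columns.
    """
    acc_re = 0
    acc_im = 0
    for (a, b), (c, d) in zip(x, y):
        # Horizontal operands (a, b) times vertical operands (c, d).
        block = [[a * c, a * d],
                 [b * c, b * d]]
        acc_re += block[0][0] - block[1][1]   # ac - bd
        acc_im += block[0][1] + block[1][0]   # ad + bc
    return acc_re, acc_im


result = complex_dot_2x2_blocks([(1, 2), (3, -4)], [(5, 6), (-7, 8)])
```

With two such blocks in a 4 × 4 matrix, two element products could in principle be formed per pass, which would match the two output kinds in the claim; the claim itself does not spell out this arithmetic.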
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN 200410025007 CN1707426A (en) | 2004-06-09 | 2004-06-09 | Operand value distributing apparatus based on configurational multiplier array structure and distributing method thereof |
Publications (1)
Publication Number | Publication Date |
---|---|
CN1707426A (en) | 2005-12-14 |
Family
ID=35581367
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN 200410025007 Pending CN1707426A (en) | 2004-06-09 | 2004-06-09 | Operand value distributing apparatus based on configurational multiplier array structure and distributing method thereof |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN1707426A (en) |
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107636640A (en) * | 2016-01-30 | 2018-01-26 | Hewlett Packard Enterprise Development LP | Dot product engine with negation indicator |
CN107636640B (en) * | 2016-01-30 | 2021-11-23 | Hewlett Packard Enterprise Development LP | Dot product engine, memristor dot product engine and method for calculating dot product |
CN110337635A (en) * | 2017-03-20 | 2019-10-15 | Intel Corporation | System, method and apparatus for dot product operations |
CN110337635B (en) * | 2017-03-20 | 2023-09-19 | Intel Corporation | System, method and apparatus for dot product operation |
US11847452B2 (en) | 2017-03-20 | 2023-12-19 | Intel Corporation | Systems, methods, and apparatus for tile configuration |
CN108874744A (en) * | 2017-05-08 | 2018-11-23 | Nvidia Corporation | Generalized acceleration of matrix multiply accumulate operations |
CN108874744B (en) * | 2017-05-08 | 2022-06-10 | Nvidia Corporation | Processor, method and storage medium for performing matrix multiply-and-accumulate operations |
US11797303B2 (en) | 2017-05-08 | 2023-10-24 | Nvidia Corporation | Generalized acceleration of matrix multiply accumulate operations |
US11797301B2 (en) | 2017-05-08 | 2023-10-24 | Nvidia Corporation | Generalized acceleration of matrix multiply accumulate operations |
US11797302B2 (en) | 2017-05-08 | 2023-10-24 | Nvidia Corporation | Generalized acceleration of matrix multiply accumulate operations |
US11816482B2 (en) | 2017-05-08 | 2023-11-14 | Nvidia Corporation | Generalized acceleration of matrix multiply accumulate operations |
US11816481B2 (en) | 2017-05-08 | 2023-11-14 | Nvidia Corporation | Generalized acceleration of matrix multiply accumulate operations |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN1832344A (en) | Controller of graphic equalizer | |
CN1806274A (en) | Driving method of deplay device having main display and sub display | |
CN1924429A (en) | Method for manufacturing backlight and backlight | |
CN101055375A (en) | Backlight assembly and display device having the same | |
CN1877532A (en) | Compiler apparatus | |
CN1622180A (en) | Demultiplexer and display device using the same | |
CN1892636A (en) | Systems and methods of providing indexed load and store operations in a dual-mode computer processing environment | |
CN1207893C (en) | Image processing method and apparatus | |
CN1808571A (en) | Acoustical signal separation system and method | |
CN1551062A (en) | Data drive and electronic optical device | |
CN1707426A (en) | Operand value distributing apparatus based on configurational multiplier array structure and distributing method thereof | |
CN1165007C (en) | Data transmission device, display and data sender, receiver and transmission method | |
CN1227903C (en) | Image processing circuit | |
CN1784926A (en) | Array speaker system | |
CN1880984A (en) | Light guiding plate, backlight assembly having the same, and display device having the same | |
CN1967500A (en) | Resource using method in automatic testing process | |
CN1492313A (en) | Coordinate transformation method for digital scanning change-over device and processor | |
CN1900903A (en) | Using a graphics system to enable a multi-user computer system | |
CN1505866A (en) | Two-dimensional pyramid filter architecture | |
CN1627285A (en) | Method and system of interconnecting processors of a parallel computer to facilitate torus partitioning | |
CN1202467C (en) | Method of creating plurality of partitions on removable device | |
CN101051263A (en) | Processor, image processing system and processing method | |
CN1273936C (en) | Correspond rediation correction method for push-scanning satellite images CCD | |
CN100342643C (en) | Two-dimensional pyramid filter architecture | |
CN1034912C (en) | Process for production of gas containing large oxygen |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C02 | Deemed withdrawal of patent application after publication (patent law 2001) | ||
WD01 | Invention patent application deemed withdrawn after publication |