CN1707426A - Operand value distributing apparatus based on configurational multiplier array structure and distributing method thereof - Google Patents


Publication number
CN1707426A
Authority
CN
China
Prior art keywords
multiplier
input end
multipliers
row
vector input
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN 200410025007
Other languages
Chinese (zh)
Inventor
沈胜宇
李思昆
高树静
周军明
张谊
卢先兆
黄勇
曾亮
薛德贤
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Hua Bo Technology (group) Co Ltd
Original Assignee
Shanghai Hua Bo Technology (group) Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Hua Bo Technology (group) Co Ltd
Priority to CN 200410025007
Publication of CN1707426A

Abstract

The present invention discloses an operand distributing device and method that support digital signal processing in a microprocessor. The device includes two groups of input blocks, a memory access part, and an execution unit connected between the input blocks and the memory access part. The execution unit is a multiplier matrix built from small-bit-width multipliers, and two operand configuration connectors are each connected between one group of input blocks and the execution unit. In the distributing method, the configuration connectors route the horizontal and vertical vector inputs of the input blocks to the horizontal and vertical vector inputs of the multipliers over different routes, and the memory access part outputs the operation results of the different configurations. The invention greatly improves the digital signal processing performance of the microprocessor.

Description

Operand distributor based on a configurable multiplier matrix structure, and distribution method thereof
Technical field
The present invention relates to an operand distributor based on a configurable multiplier matrix structure and to its distribution method, used to support efficient digital signal processing in a microprocessor.
Background technology
When a microprocessor performs high-performance digital signal processing, it generally combines a large-bit-width multiplier with other units. As shown in Figure 1, the arrangement comprises an execution unit (the large-bit-width multiplier) 11, two groups of input blocks 12 and 13 connected to the multiplier, and a memory access part 14. Because a large-bit-width multiplier cannot provide enough computing channel bandwidth, the resulting digital signal processing performance is unsatisfactory.
Summary of the invention
The object of the present invention is to provide an operand distributor based on a configurable multiplier matrix structure, and a distribution method thereof, that address the low efficiency of existing microprocessors in digital signal processing. With only a few added parts, the structure and method realize multiple operand configurations and thereby greatly improve digital signal processing performance.
The technical measures taken by the present invention are:
An operand distributor based on a configurable multiplier matrix structure comprises two groups of input blocks, a memory access part, and an execution unit connected between the input blocks and the memory access part. The execution unit is a multiplier whose horizontal vector input ends and vertical vector input ends are connected to the corresponding horizontal vector input ends and vertical vector input ends of the input blocks.
It is characterized in that the execution unit is a multiplier matrix formed from a plurality of small-bit-width multipliers, and in that it further comprises two operand configuration connectors. Each operand configuration connector is connected between its corresponding group of input blocks and the input ends of the execution unit, and the memory access part outputs the operation results of the different configurations.
In the above operand distributor, each small-bit-width multiplier is an 8-bit multiplier that accepts two 8-bit source operands and produces a 16-bit result.
In the above operand distributor, the multiplier matrix consists of 4 × 4 = 16 8-bit multipliers arranged in an array.
An operand distribution method based on a configurable multiplier matrix structure is characterized by the following steps:
A. Build a multiplier matrix from a plurality of small-bit-width multipliers;
B. Decompose the input signal into two groups of 32-bit vectors, one group being the horizontal input vector and the other the vertical input vector; give each 8-bit multiplier one horizontal vector input end and one vertical vector input end; connect the configuration connectors between the signal input ends and the input ends of the multiplier matrix;
C. The configuration connectors connect the horizontal and vertical input vectors of the input blocks, over different routes, to the horizontal vector input end and vertical vector input end of each small-bit-width multiplier of the execution unit; the memory access part outputs the operation results of the different configurations, which support different digital signal processing functions of the microprocessor.
In the above operand distribution method, the multiplier matrix consists of 4 × 4 = 16 8-bit multipliers arranged in an array; each 8-bit multiplier accepts two 8-bit source operands and produces a 16-bit result.
In the above operand distribution method, the digital signal processing functions supported in step C are: 32 × 32-bit multiplication, the fast Fourier transform (FFT), and 8-bit complex vector dot product operations.
In the above operand distribution method, the configuration connectors route the operands to support 32 × 32-bit multiplication as follows:
A. Each horizontal vector input end of the horizontal vector input block is connected to the horizontal vector input ends of the multipliers in one row of the multiplier matrix;
B. Each vertical vector input end of the vertical vector input block is connected to the vertical input ends of the multipliers in one column of the multiplier matrix;
With the two routings above, the results output through the memory access output connector fall into four kinds:
First: three outputs are shifted left by 16 bits, and one output is not shifted;
Second: three outputs are shifted left by 32 bits, and one output is shifted left by 48 bits;
Third: two outputs are shifted left by 40 bits, and the other two are shifted left by 8 bits;
Fourth: all four outputs are shifted left by 24 bits.
In the above operand distribution method, the operands are configured to support the fast Fourier transform using two configuration passes.
The routes of the first configuration are:
A. Two alternate rows of multipliers in the multiplier matrix are selected, and each horizontal vector input end of the horizontal vector input block is connected to the horizontal vector input ends of the two multipliers in the corresponding column of the two selected rows;
B. Two alternate columns of multipliers in the multiplier matrix are selected, and each vertical vector input end of the vertical vector input block is connected to the vertical input ends of the two multipliers in the corresponding row of the two selected columns.
The routes of the second configuration are:
A. The other two alternate rows of multipliers in the multiplier matrix are selected, and each horizontal vector input end of the horizontal vector input block is connected to the horizontal vector input ends of the two multipliers in the corresponding column of the two selected rows;
B. The other two alternate columns of multipliers in the multiplier matrix are selected, and each vertical vector input end of the vertical vector input block is connected to the vertical input ends of the two multipliers in the corresponding row of the two selected columns;
The two configuration passes above produce four kinds of output through the memory access output connector:
First: one row of multipliers in the multiplier matrix is selected, and the output ends of the memory access part are connected pairwise to the output ends of the multipliers in that row;
Second: one of the remaining three rows of multipliers is selected, and the output ends of the memory access part are connected pairwise to the output ends of the multipliers in that row;
Third: one of the remaining two rows of multipliers is selected, and the output ends of the memory access part are connected pairwise to the output ends of the multipliers in that row;
Fourth: the output ends of the memory access part are connected pairwise to the output ends of the last remaining row of multipliers.
In the above operand distribution method, the operands are configured to support the 8-bit complex vector dot product operation with the following routes:
A. Each horizontal vector input end of the horizontal vector input block is connected to the horizontal vector input ends of two multipliers in one row of the multiplier matrix: two of the horizontal vector input ends are connected to the first two multipliers of the first and second rows respectively, and the other two are connected to the last two multipliers of the third and fourth rows respectively;
B. Each vertical vector input end of the vertical vector input block is connected to the vertical input ends of two multipliers in one column of the multiplier matrix: two of the vertical vector input ends are connected to the first two multipliers of the first and second columns respectively, and the other two are connected to the last two multipliers of the third and fourth columns respectively;
There are two kinds of output result:
First: any two rows and two columns of the multiplier matrix are selected, and the output ends of the memory access part are connected pairwise to the output ends of the multipliers at the intersections of the selected rows and columns;
Second: the output ends of the memory access part are connected pairwise to the output ends of the multipliers at the intersections of the other two rows and two columns.
With the above technical scheme, the present invention achieves the following beneficial effects:
1. Digital signal processing performance is greatly enhanced while only a small amount of hardware is added.
2. Changes to the overall structure of the microprocessor are kept small, which avoids introducing additional errors.
Description of drawings
The structure and performance of the present invention are further described through the following embodiments and their accompanying drawings.
Fig. 1 shows an operand distributor supporting digital signal processing in an existing microprocessor.
Fig. 2 is a schematic diagram of the operand distributor of the present invention, based on a configurable multiplier matrix structure.
Fig. 3 is a schematic diagram of the multiplier matrix in Fig. 2.
Fig. 4a and 4b are route connection diagrams for configuring the input operands in the first embodiment of the present invention.
Fig. 5a, 5b, 5c and 5d are schematic diagrams of the distribution output results of the first embodiment (Fig. 4).
Fig. 6a, 6b, 6c and 6d are route connection diagrams for configuring the input operands in the second embodiment of the present invention.
Fig. 7a, 7b, 7c and 7d are schematic diagrams of the distribution output results of the second embodiment (Fig. 6).
Fig. 8a and 8b are route connection diagrams for configuring the input operands in the third embodiment of the present invention.
Fig. 9a and 9b are schematic diagrams of the distribution output results of the third embodiment (Fig. 8).
Embodiment
As shown in Figure 2, the operand distributor supporting digital signal processing in the microprocessor comprises two groups of input blocks 21 and 22, an execution unit 23, two operand configuration connectors 24 and 25, and a memory access part 26. Of the two groups of input blocks, one is the horizontal vector input block 21 and the other is the vertical vector input block 22. The horizontal vector input block 21 is divided into four 8-bit horizontal vectors Y0 to Y3, and the vertical vector input block 22 is divided into four 8-bit vertical vectors X0 to X3. The memory access part 26 (shown as Z0 to Z3 in the other figures) outputs the operation results of the different configurations.
Referring to Fig. 3, the execution unit 23 is a multiplier matrix of 4 × 4 = 16 8-bit (small-bit-width) multipliers M. Each 8-bit multiplier accepts two 8-bit source operands A and B and produces one 16-bit result C.
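The multiplier matrix described above can be modelled in a few lines. The following is only a behavioural sketch: the function names `mul8` and `multiplier_matrix` are hypothetical, and the operands are assumed to be unsigned, which the text does not state.

```python
def mul8(a, b):
    """One 8-bit multiplier M: two 8-bit source operands A and B, one 16-bit result C.

    Assumption: operands are unsigned (0..255); the patent text does not say.
    """
    for v in (a, b):
        if not 0 <= v <= 0xFF:
            raise ValueError("operand must fit in 8 bits")
    return a * b  # at most 255 * 255 = 65025, which fits in 16 bits


def multiplier_matrix(a_in, b_in):
    """4 x 4 array of 8-bit multipliers.

    a_in[r][c] is the horizontal (A) input and b_in[r][c] the vertical (B)
    input that the configuration connectors deliver to multiplier (r, c).
    Returns the 16 products C[r][c].
    """
    return [[mul8(a_in[r][c], b_in[r][c]) for c in range(4)] for r in range(4)]
```

In this model the two configuration connectors are simply whatever fills `a_in` and `b_in`; the different routings in the embodiments correspond to different ways of filling those two 4 × 4 tables from Y0 to Y3 and X0 to X3.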
Of the two operand configuration connectors 24 and 25, the input end of connector 24 is connected to the four 8-bit horizontal vectors Y0 to Y3 of the horizontal vector input block 21, and its output end is connected through suitable routes to the horizontal input end of each multiplier in the multiplier matrix. The input end of connector 25 is connected to the four 8-bit vertical vectors X0 to X3 of the vertical vector input block 22, and its output end is connected through suitable routes to the vertical input end of each multiplier in the multiplier matrix. The memory access part 26 is connected to the output ends C of the execution unit 23.
Through suitable shifting and routing, the present invention can perform 32 × 32-bit long multiplication, highly parallel 8-bit and 16-bit multiplication, FFT and similar operations, and thus provides high-performance digital signal processing in the microprocessor.
The method and advantages of the present invention are further illustrated below through specific embodiments.
Referring to Fig. 4a and 4b, the first embodiment of the invention distributes the operands to support 32 × 32-bit multiplication.
The operand configuration method in the microprocessor of the present invention comprises the following steps:
A. Build a multiplier matrix from a plurality of small-bit-width multipliers;
B. Decompose the input signal into two groups of 32-bit vectors, one group being the horizontal input vector Y and the other the vertical input vector X; give each 8-bit multiplier one horizontal vector input end A and one vertical vector input end B; connect the configuration connectors between the signal input ends and the input ends of the multiplier matrix;
C. The configuration connectors connect the horizontal and vertical input vectors of the input blocks, over different routes, to the horizontal vector input end and vertical vector input end of each small-bit-width multiplier of the execution unit; the memory access part outputs the operation results of the different configurations, which support different digital signal processing functions of the microprocessor.
Suppose Y and X are two 32-bit vectors and Z is the product of Y and X. Then
$$Z = \sum_{i=0}^{32} A \cdot B_i$$
Y is divided into the following four 8-bit vectors:
$Y'_0 = Y_7 \ldots Y_0$
$Y'_1 = Y_{15} \ldots Y_8$
$Y'_2 = Y_{23} \ldots Y_{16}$
$Y'_3 = Y_{31} \ldots Y_{24}$
X is divided into the following four 8-bit vectors:
$X'_0 = X_7 \ldots X_0$
$X'_1 = X_{15} \ldots X_8$
$X'_2 = X_{23} \ldots X_{16}$
$X'_3 = X_{31} \ldots X_{24}$
Each 8-bit multiplier has two 8-bit input ends: one is the horizontal vector input end A, and the other is the vertical vector input end B.
The configuration connectors route the operands to support 32 × 32-bit multiplication as follows:
A. Each horizontal vector input end of the horizontal vector input block is connected to the horizontal vector input ends of the multipliers in one row of the multiplier matrix. As shown in Fig. 4a, the horizontal vector input ends Y3, Y2, Y1 and Y0 of the horizontal vector input block are connected to the horizontal vector input ends A33~A30, A23~A20, A13~A10 and A03~A00 of the corresponding rows of multipliers, respectively;
B. Each vertical vector input end of the vertical vector input block is connected to the vertical input ends of the multipliers in one column of the multiplier matrix. As shown in Fig. 4b, the vertical vector input ends X3, X2, X1 and X0 of the vertical vector input block are connected to the vertical vector input ends B33~B30, B23~B20, B13~B10 and B03~B00 of the corresponding columns of multipliers, respectively.
Referring to Fig. 5a to 5d, the above configuration produces four kinds of output:
First: three outputs are shifted left by 16 bits, and one output is not shifted;
Second: three outputs are shifted left by 32 bits, and one output is shifted left by 48 bits;
Third: two outputs are shifted left by 40 bits, and the other two are shifted left by 8 bits;
Fourth: all four outputs are shifted left by 24 bits.
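The shift amounts listed above are what one would expect when a 32 × 32-bit product is assembled from sixteen 8 × 8-bit partial products. The sketch below illustrates this under stated assumptions: operands are unsigned, row i of the matrix receives byte i of Y, column j receives byte j of X, and the partial product of multiplier (i, j) is weighted by a left shift of 8(i + j) bits. The function names are hypothetical and the accumulation is done in software rather than by the memory access part.

```python
from collections import Counter


def split_bytes(v):
    """Split a 32-bit value into four 8-bit fields, least significant first."""
    return [(v >> (8 * i)) & 0xFF for i in range(4)]


def mul32_via_8x8(y, x):
    """Assemble a 32 x 32-bit product from sixteen 8 x 8-bit partial products."""
    yb, xb = split_bytes(y), split_bytes(x)
    total = 0
    for i in range(4):          # row i gets byte i of Y on its horizontal inputs
        for j in range(4):      # column j gets byte j of X on its vertical inputs
            partial = yb[i] * xb[j]             # one 8-bit multiplier, 16-bit result
            total += partial << (8 * (i + j))   # weight of this partial product
    return total


def shift_histogram():
    """How many of the 16 partial products use each left-shift amount."""
    return Counter(8 * (i + j) for i in range(4) for j in range(4))
```

`shift_histogram()` gives one product with no shift, two shifted by 8, three by 16, four by 24, three by 32, two by 40 and one by 48. Those counts equal the totals over the four output kinds listed above (16 × 3 and 0 × 1; 32 × 3 and 48 × 1; 40 × 2 and 8 × 2; 24 × 4).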
Referring to Fig. 6a to 6d, the second embodiment of the invention supports the Radix-4 FFT (fast Fourier transform) of 8-bit complex vectors.
Principle of the Radix-4 FFT algorithm
Let the array f have length N, and let $W = e^{2\pi i/N}$. The classical Radix-4 FFT algorithm is:
$$F_k = \sum_{n=0}^{3} W^{nk} F_k^{4j+n} \qquad (1)$$
The operand configuration method in the microprocessor of the present invention comprises the following steps:
A. Build a multiplier matrix from a plurality of small-bit-width multipliers;
B. Decompose the input signal into two groups of 32-bit vectors, one group being the horizontal input vector Y and the other the vertical input vector X; give each 8-bit multiplier one horizontal vector input end A and one vertical vector input end B; connect the configuration connectors between the signal input ends and the input ends of the multiplier matrix;
C. The configuration connectors connect the horizontal and vertical input vectors of the input blocks, over different routes, to the horizontal vector input end and vertical vector input end of each small-bit-width multiplier of the execution unit; the memory access part outputs the operation results of the different configurations, which support different digital signal processing functions of the microprocessor.
The operands are configured to support the fast Fourier transform using two configuration passes.
The routes of the first configuration are:
A. Two alternate rows of multipliers in the multiplier matrix are selected, and each horizontal vector input end of the horizontal vector input block is connected to the horizontal vector input ends of the two multipliers in the corresponding column of the two selected rows. As shown in Fig. 6a, the 2nd and 4th rows of multipliers, A23~A20 and A03~A00, are used: Y3 of the horizontal vector input block is connected to A23 and A03, Y2 to A22 and A02, Y1 to A01 and A21, and Y0 to A20 and A00.
B. Two alternate columns of multipliers in the multiplier matrix are selected, and each vertical vector input end of the vertical vector input block is connected to the vertical input ends of the two multipliers in the corresponding row of the two selected columns. As shown in Fig. 6b, the 1st and 3rd columns of multipliers, B32~B02 and B30~B00, are used: X3 of the vertical vector input block is connected to B30 and B32, X2 to B20 and B22, X1 to B10 and B12, and X0 to B02 and B00.
The routes of the second configuration are:
A. The other two alternate rows of multipliers in the multiplier matrix are selected, and each horizontal vector input end of the horizontal vector input block is connected to the horizontal vector input ends of the two multipliers in the corresponding column of the two selected rows. As shown in Fig. 6c, the 1st and 3rd rows of multipliers, A00~A03 and A30~A33, are used: Y3 of the horizontal vector input block is connected to A33 and A13, Y2 to A32 and A12, Y1 to A11 and A31, and Y0 to A30 and A10.
B. The other two alternate columns of multipliers in the multiplier matrix are selected, and each vertical vector input end of the vertical vector input block is connected to the vertical input ends of the two multipliers in the corresponding row of the two selected columns. As shown in Fig. 6d, the 2nd and 4th columns of multipliers, B01~B31 and B03~B33, are used: X3 of the vertical vector input block is connected to B33 and B31, X2 to B21 and B23, X1 to B11 and B13, and X0 to B01 and B03.
Referring to Fig. 7a to 7d, the two configuration passes above produce four kinds of output:
First: one row of multipliers in the multiplier matrix is selected, and the output ends of the memory access part are connected pairwise to the output ends of the multipliers in that row;
Second: one of the remaining three rows of multipliers is selected, and the output ends of the memory access part are connected pairwise to the output ends of the multipliers in that row;
Third: one of the remaining two rows of multipliers is selected, and the output ends of the memory access part are connected pairwise to the output ends of the multipliers in that row;
Fourth: the output ends of the memory access part are connected pairwise to the output ends of the last remaining row of multipliers.
Illustration:
For a specific k, the multiplier matrix can compute $F_k$ in one cycle. The multiplier matrix is configured as follows:
Let M(i, j) be the multiplier in row i, column j. Let R(x) be the real part and I(x) the imaginary part of a complex number x, with n = 0, 1, 2, 3:
$$M(0,n) = R(W^{nk}) \cdot R(F_k^{4j+n})$$
$$M(1,n) = I(W^{nk}) \cdot R(F_k^{4j+n})$$
$$M(2,n) = R(W^{nk}) \cdot I(F_k^{4j+n})$$
$$M(3,n) = I(W^{nk}) \cdot I(F_k^{4j+n})$$
$\sum_{i=0}^{3} M(0,i) - \sum_{i=0}^{3} M(3,i)$ is the real part of $F_k$, and $\sum_{i=0}^{3} M(1,i) + \sum_{i=0}^{3} M(2,i)$ is the imaginary part of $F_k$. (2)
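The mapping of one Radix-4 combination step onto sixteen real multiplications can be checked with a short sketch. It assumes that row 3 of the matrix holds the imaginary-by-imaginary products, which is what the real-part sum in (2) requires. The function name `radix4_combine` and the `(re, im)` pair representation are hypothetical, and the 8-bit range limit of the hardware is not modelled.

```python
def radix4_combine(w, g):
    """Compute sum over n of w[n] * g[n] using a 4 x 4 table of real products.

    w[n] stands for the twiddle factor W^(nk) and g[n] for the n-th input term
    of equation (1); both are (re, im) pairs of real numbers.
    Returns (real part, imaginary part) of F_k as in formula (2).
    """
    m = [[0] * 4 for _ in range(4)]
    for n in range(4):
        wr, wi = w[n]
        gr, gi = g[n]
        m[0][n] = wr * gr   # R(W) * R(F)
        m[1][n] = wi * gr   # I(W) * R(F)
        m[2][n] = wr * gi   # R(W) * I(F)
        m[3][n] = wi * gi   # I(W) * I(F)
    real = sum(m[0]) - sum(m[3])
    imag = sum(m[1]) + sum(m[2])
    return real, imag
```

For example, with the twiddle factors 1, i, -1, -i and the inputs 1+2i, 3+4i, 5+6i, 7+8i, the function returns (0, -8), the same value as direct complex arithmetic.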
Timing analysis of the complex Radix-4 FFT
A complex Radix-4 FFT of N points needs $\log_4 N$ recursion levels, and each level performs operation (1) N times, for $N \log_4 N$ operations in total.
For a complex Radix-4 FFT of length N, let the loop unrolling factor be k. The load instructions and the FFT2 instructions then each need $N \log_4 N$ cycles, and the branch instructions need $(N \log_4 N)/k$ cycles, so $(2 + 1/k)\,N \log_4 N$ cycles are needed in total.
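The cycle estimate above is easy to evaluate for concrete sizes. The helper below is a sketch of that arithmetic only; the name `radix4_fft_cycles` is hypothetical, and the model is simply the (2 + 1/k) · N · log4 N expression from the text, with N assumed to be a power of 4.

```python
from fractions import Fraction


def radix4_fft_cycles(n_points, unroll):
    """Estimated cycles for a complex Radix-4 FFT: (2 + 1/k) * N * log4(N).

    n_points must be a power of 4; unroll is the loop unrolling factor k.
    """
    if n_points < 1 or unroll < 1:
        raise ValueError("n_points and unroll must be positive")
    levels = 0
    m = n_points
    while m > 1:
        if m % 4 != 0:
            raise ValueError("n_points must be a power of 4")
        m //= 4
        levels += 1
    ops = n_points * levels                 # N * log4(N) evaluations of (1)
    load_and_fft = 2 * ops                  # load and FFT2 instructions
    branches = Fraction(ops, unroll)        # one branch per k operations
    return load_and_fft + branches
```

For N = 64 and k = 4 there are 3 levels and 192 operations, so the estimate is 384 + 48 = 432 cycles.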
Fig. 8a and 8b are route diagrams of the third embodiment of the invention, which implements the 8-bit complex vector dot product operation.
The operand configuration method in the microprocessor of the present invention comprises the following steps:
A. Build a multiplier matrix from a plurality of small-bit-width multipliers;
B. Decompose the input signal into two groups of 32-bit vectors, one group being the horizontal input vector and the other the vertical input vector; give each 8-bit multiplier one horizontal vector input end and one vertical vector input end; connect the configuration connectors between the signal input ends and the input ends of the multiplier matrix;
C. The configuration connectors connect the horizontal and vertical input vectors of the input blocks, over different routes, to the horizontal vector input end and vertical vector input end of each small-bit-width multiplier of the execution unit; the memory access part outputs the operation results of the different configurations, which support different digital signal processing functions of the microprocessor.
The operands are configured to support the 8-bit complex vector dot product operation with the following routes:
A. Each horizontal vector input end of the horizontal vector input block is connected to the horizontal vector input ends of two multipliers in one row of the multiplier matrix: two of the horizontal vector input ends are connected to the first two multipliers of the first and second rows respectively, and the other two are connected to the last two multipliers of the third and fourth rows respectively. As shown in Fig. 8a, Y3 of the horizontal vector input block is connected to A33 and A32 in the first row of multipliers, Y2 to A23 and A22 in the second row, Y1 to A11 and A10 in the third row, and Y0 to A01 and A00 in the fourth row.
B. Each vertical vector input end of the vertical vector input block is connected to the vertical input ends of two multipliers in one column of the multiplier matrix: two of the vertical vector input ends are connected to the first two multipliers of the first and second columns respectively, and the other two are connected to the last two multipliers of the third and fourth columns respectively. As shown in Fig. 8b, X3 of the vertical vector input block is connected to B33 and B23 in the 4th column of multipliers, X2 to B32 and B22 in the 3rd column, X1 to B11 and B01, and X0 to B10 and B00.
There are two kinds of output result:
See Fig. 9a. First: any two rows and two columns of the multiplier matrix are selected, and the output ends of the memory access part are connected pairwise to the output ends of the multipliers at the intersections of the selected rows and columns;
See Fig. 9b. Second: the output ends of the memory access part are connected pairwise to the output ends of the multipliers at the intersections of the other two rows and two columns.
Illustration:
Principle of the 8-bit complex vector dot product operation:
Given two arrays:
$A = \langle a_0, \ldots, a_n \rangle$
$B = \langle b_0, \ldots, b_n \rangle$
where, for any complex number $a_i$, $I(a_i)$ is its imaginary part and $R(a_i)$ is its real part.
The dot product of the two arrays is then a result array:
$C = \langle c_0, \ldots, c_n \rangle$
where the complex number $c_i = a_i \cdot b_i$.
Configuration of the multiplier matrix
The imaginary part of $c_i$ is computed as:
$$I(c_i) = I(a_i) \cdot R(b_i) + I(b_i) \cdot R(a_i) \qquad (3)$$
The real part of $c_i$ is computed as:
$$R(c_i) = R(a_i) \cdot R(b_i) - I(b_i) \cdot I(a_i) \qquad (4)$$
Taking into account the multiplier matrix and the dual-port access characteristic of the memory (two 32-bit words per cycle), a dot product of length 2 can be completed in each cycle. Completing a dot product of length N then needs N cycles.
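Formulas (3) and (4) use four real multiplications per complex element, which is how the operation maps onto the 8-bit multipliers. The following sketch shows the element-wise computation under the assumption that each complex number is held as an `(re, im)` pair of integers. The function name is hypothetical, and the 8-bit operand range and the per-cycle scheduling are not modelled.

```python
def complex_elementwise_product(a, b):
    """Element-wise complex product c_i = a_i * b_i via formulas (3) and (4).

    a and b are equal-length lists of (re, im) integer pairs.
    Each element uses four real multiplications.
    """
    if len(a) != len(b):
        raise ValueError("arrays must have the same length")
    result = []
    for (ar, ai), (br, bi) in zip(a, b):
        re_re = ar * br     # R(a_i) * R(b_i)
        im_im = bi * ai     # I(b_i) * I(a_i)
        im_re = ai * br     # I(a_i) * R(b_i)
        re_im = bi * ar     # I(b_i) * R(a_i)
        result.append((re_re - im_im, im_re + re_im))   # (4) and (3)
    return result
```

For two elements per call, such as `[(1, 2), (3, -4)]` and `[(5, 6), (-7, 8)]`, the function performs eight real multiplications and returns `[(-7, 16), (11, 52)]`.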

Claims (9)

1. An operand distributor based on a configurable multiplier matrix structure, comprising two groups of input blocks, a memory access part, and an execution unit connected between the input blocks and the memory access part, the execution unit being a multiplier whose horizontal vector input ends and vertical vector input ends are connected to the corresponding horizontal vector input ends and vertical vector input ends of the input blocks;
characterized in that the execution unit is a multiplier matrix formed from a plurality of small-bit-width multipliers; and in that it further comprises two operand configuration connectors, each connected between its corresponding group of input blocks and the input ends of the execution unit, the memory access part outputting the operation results of the different configurations.
2. The operand distributor based on a configurable multiplier matrix structure according to claim 1, characterized in that each small-bit-width multiplier is an 8-bit multiplier that accepts two 8-bit source operands and produces a 16-bit result.
3. The operand distributor based on a configurable multiplier matrix structure according to claim 1, characterized in that the multiplier matrix consists of 4 × 4 = 16 8-bit multipliers arranged in an array.
4. An operand distribution method based on a configurable multiplier matrix structure, characterized by the following steps:
A. Build a multiplier matrix from a plurality of small-bit-width multipliers;
B. Decompose the input signal into two groups of 32-bit vectors, one group being the horizontal input vector and the other the vertical input vector; give each 8-bit multiplier one horizontal vector input end and one vertical vector input end; connect the configuration connectors between the signal input ends and the input ends of the multiplier matrix;
C. The configuration connectors connect the horizontal and vertical input vectors of the input blocks, over different routes, to the horizontal vector input end and vertical vector input end of each small-bit-width multiplier of the execution unit; the memory access part outputs the operation results of the different configurations, which support different digital signal processing functions of the microprocessor.
5. The operand distribution method according to claim 4, characterized in that the multiplier matrix consists of 4 × 4 = 16 8-bit multipliers arranged in an array; each 8-bit multiplier accepts two 8-bit source operands and produces a 16-bit result.
6. The operand distribution method according to claim 4, characterized in that the digital signal processing functions supported in step C are: 32 × 32-bit multiplication, the fast Fourier transform (FFT), and 8-bit complex vector dot product operations.
7. The operand distributing method according to claim 4 or 6, characterized in that the route by which the configuration connectors configure the operands to support 32 × 32 multiplication is:
A. each horizontal vector input of the horizontal vector input unit is connected to the horizontal vector inputs of the multipliers in one row of the multiplier matrix;
B. each vertical vector input of the vertical vector input unit is connected to the vertical inputs of the multipliers in one column of the multiplier matrix;
The results of the above two configurations are output at the memory access output connector in four ways:
First: three outputs are shifted left by 16 bits and one output is not shifted;
Second: three outputs are shifted left by 32 bits and one output is shifted left by 48 bits;
Third: two outputs are shifted left by 40 bits and the other two outputs are shifted left by 8 bits;
Fourth: every output is shifted left by 24 bits.
8. The operand distributing method according to claim 4 or 6, characterized in that the operands are configured to support the fast Fourier transform using two configurations;
The route of the first configuration is:
A. select two spaced-apart rows of multipliers in the multiplier matrix, and connect each horizontal vector input of the horizontal vector input unit to the horizontal vector inputs of the corresponding same-row multipliers in the two selected rows;
B. select two spaced-apart rows of multipliers in the multiplier matrix, and connect each vertical vector input of the vertical vector input unit to the vertical inputs of the two corresponding same-row multipliers in the two selected rows.
The route of the second configuration is:
A. select the other two spaced-apart rows of multipliers in the multiplier matrix, and connect each horizontal vector input of the horizontal vector input unit to the horizontal vector inputs of the corresponding same-row multipliers in the two selected rows;
B. select the two spaced-apart rows of multipliers in said multiplier matrix, and connect each vertical vector input of the vertical vector input unit to the vertical inputs of the two corresponding same-row multipliers in the two selected rows;
The results of the above two configurations are output at the memory access output connector in four ways:
First: select one row of multipliers in the multiplier matrix, and connect the outputs of the memory access unit pairwise to the outputs of the multipliers in the selected row;
Second: select one of the remaining three rows of multipliers in the multiplier matrix, and connect the outputs of the memory access unit pairwise to the outputs of the multipliers in the selected row;
Third: select one of the remaining two rows of multipliers in the multiplier matrix, and connect the outputs of the memory access unit pairwise to the outputs of the multipliers in the selected row;
Fourth: connect the outputs of the memory access unit pairwise to the outputs of the multipliers in the one remaining row of the multiplier matrix.
9. The operand distributing method according to claim 4 or 6, characterized in that the route by which the operands are configured to support the 8-bit dot product operation is:
A. each horizontal vector input of the horizontal vector input unit is connected to the horizontal vector inputs of two multipliers in one row of the multiplier matrix, wherein two of the horizontal vector inputs are connected respectively to the horizontal vector inputs of the first two multipliers of the first and second rows, and the other two horizontal vector inputs are connected respectively to the horizontal vector inputs of the last two multipliers of the third and fourth rows;
B. each vertical vector input of the vertical vector input unit is connected to the vertical inputs of two multipliers in one column of the multiplier matrix, wherein two of the vertical vector inputs are connected respectively to the vertical inputs of the first two multipliers of the first and second columns, and the other two vertical vector inputs are connected respectively to the vertical inputs of the last two multipliers of the third and fourth columns;
The results are output in two ways:
First: select any two rows and two columns of the multiplier matrix, and connect the outputs of the memory access unit pairwise to the outputs of the multipliers at the intersections of the selected rows and columns;
Second: connect the outputs of the memory access unit pairwise to the outputs of the multipliers at the intersections of the other two rows and two columns.
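The left-shift amounts recited in claim 7 can be checked with a short software model of the 4 × 4 array. The sketch below assumes unsigned operands and assumes that the multiplier at position (i, j) multiplies byte i of one operand by byte j of the other. Neither assumption, nor any of the function names, comes from the claims; they are chosen only to make the arithmetic concrete.

```python
from collections import Counter


def split_bytes(value):
    """Split an unsigned 32-bit value into four bytes, least significant first."""
    return [(value >> (8 * k)) & 0xFF for k in range(4)]


def partial_products(a, b):
    """Return (shift, product) for each of the 16 multipliers in the 4 x 4 array.

    Multiplier (i, j) multiplies byte i of a by byte j of b (assumed mapping);
    its 16-bit result carries a weight of 2**(8 * (i + j)).
    """
    a_bytes = split_bytes(a)
    b_bytes = split_bytes(b)
    return [(8 * (i + j), a_bytes[i] * b_bytes[j])
            for i in range(4) for j in range(4)]


def multiply32(a, b):
    """Unsigned 32 x 32 -> 64-bit product assembled from 8 x 8 products."""
    return sum(product << shift for shift, product in partial_products(a, b))


def shift_histogram():
    """Count how many of the 16 partial products need each left shift."""
    return Counter(shift for shift, _ in partial_products(0, 0))


# The four output groups recited in claim 7, written as lists of left shifts.
CLAIM_7_GROUPS = [
    [16, 16, 16, 0],
    [32, 32, 32, 48],
    [40, 40, 8, 8],
    [24, 24, 24, 24],
]
```

Taken together, the four groups contain one shift of 0, two of 8, three of 16, four of 24, three of 32, two of 40 and one of 48 bits. These are exactly the weights of the 16 byte-by-byte partial products, so summing the four groups reproduces the full 64-bit product.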
CN 200410025007 2004-06-09 2004-06-09 Operand value distributing apparatus based on configurational multiplier array structure and distributing method thereof Pending CN1707426A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 200410025007 CN1707426A (en) 2004-06-09 2004-06-09 Operand value distributing apparatus based on configurational multiplier array structure and distributing method thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN 200410025007 CN1707426A (en) 2004-06-09 2004-06-09 Operand value distributing apparatus based on configurational multiplier array structure and distributing method thereof

Publications (1)

Publication Number Publication Date
CN1707426A true CN1707426A (en) 2005-12-14

Family

ID=35581367

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 200410025007 Pending CN1707426A (en) 2004-06-09 2004-06-09 Operand value distributing apparatus based on configurational multiplier array structure and distributing method thereof

Country Status (1)

Country Link
CN (1) CN1707426A (en)

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107636640A (en) * 2016-01-30 2018-01-26 Hewlett Packard Enterprise Development LP Dot product engine with negation indicator
CN107636640B (en) * 2016-01-30 2021-11-23 Hewlett Packard Enterprise Development LP Dot product engine, memristor dot product engine and method for calculating dot product
CN110337635A (en) * 2017-03-20 2019-10-15 Intel Corporation System, method and apparatus for dot product operations
CN110337635B (en) * 2017-03-20 2023-09-19 Intel Corporation System, method and apparatus for dot product operation
US11847452B2 (en) 2017-03-20 2023-12-19 Intel Corporation Systems, methods, and apparatus for tile configuration
CN108874744A (en) * 2017-05-08 2018-11-23 Nvidia Corporation Generalized acceleration of matrix multiply accumulate operations
CN108874744B (en) * 2017-05-08 2022-06-10 Nvidia Corporation Processor, method and storage medium for performing matrix multiply-and-accumulate operations
US11797303B2 (en) 2017-05-08 2023-10-24 Nvidia Corporation Generalized acceleration of matrix multiply accumulate operations
US11797301B2 (en) 2017-05-08 2023-10-24 Nvidia Corporation Generalized acceleration of matrix multiply accumulate operations
US11797302B2 (en) 2017-05-08 2023-10-24 Nvidia Corporation Generalized acceleration of matrix multiply accumulate operations
US11816482B2 (en) 2017-05-08 2023-11-14 Nvidia Corporation Generalized acceleration of matrix multiply accumulate operations
US11816481B2 (en) 2017-05-08 2023-11-14 Nvidia Corporation Generalized acceleration of matrix multiply accumulate operations

Similar Documents

Publication Publication Date Title
CN1832344A (en) Controller of graphic equalizer
CN1806274A (en) Driving method of deplay device having main display and sub display
CN1924429A (en) Method for manufacturing backlight and backlight
CN101055375A (en) Backlight assembly and display device having the same
CN1877532A (en) Compiler apparatus
CN1622180A (en) Demultiplexer and display device using the same
CN1892636A (en) Systems and methods of providing indexed load and store operations in a dual-mode computer processing environment
CN1207893C (en) Image processing method and apparatus
CN1808571A (en) Acoustical signal separation system and method
CN1551062A (en) Data drive and electronic optical device
CN1707426A (en) Operand value distributing apparatus based on configurational multiplier array structure and distributing method thereof
CN1165007C (en) Data transmission device, display and data sender, receiver and transmission method
CN1227903C (en) Image processing circuit
CN1784926A (en) Array speaker system
CN1880984A (en) Light guiding plate, backlight assembly having the same, and display device having the same
CN1967500A (en) Resource using method in automatic testing process
CN1492313A (en) Coordinate transformation method for digital scanning change-over device and processor
CN1900903A (en) Using a graphics system to enable a multi-user computer system
CN1505866A (en) Two-dimensional pyramid filter architecture
CN1627285A (en) Method and system of interconnecting processors of a parallel computer to facilitate torus partitioning
CN1202467C (en) Method of creating plurality of partitions on removable device
CN101051263A (en) Processor, image processing system and processing method
CN1273936C (en) Correspond rediation correction method for push-scanning satellite images CCD
CN100342643C (en) Two-dimensional pyramid filter architecture
CN1034912C (en) Process for production of gas containing large oxygen

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication