CN1707426A

CN1707426A - Operand value distributing apparatus based on configurational multiplier array structure and distributing method thereof

Info

Publication number: CN1707426A
Application number: CN 200410025007
Authority: CN
Inventors: 沈胜宇; 李思昆; 高树静; 周军明; 张谊; 卢先兆; 黄勇; 曾亮; 薛德贤
Original assignee: Shanghai Hua Bo Technology (group) Co Ltd
Current assignee: Shanghai Hua Bo Technology (group) Co Ltd
Priority date: 2004-06-09
Filing date: 2004-06-09
Publication date: 2005-12-14

Abstract

The present invention discloses an operand distribution device and method that support digital signal processing in a microprocessor. The operand distribution device comprises two input units, a memory access unit, and an execution unit connected between the input units and the memory access unit. It is characterized in that the execution unit is a multiplier matrix composed of small-bit-width multipliers, and in that it further comprises two operand configuration connectors, each connected between one of the input units and the execution unit. In the operand distribution method, the configuration connectors connect the horizontal vector inputs and vertical vector inputs of the input units to the horizontal vector inputs and vertical vector inputs of the multipliers through different routes, and the memory access unit outputs the operation results of the different configurations. The invention greatly improves the digital signal processing performance of the microprocessor.

Description

Operand distributor and distribution method thereof based on configurable multiplier matrix structure

Technical field

The present invention relates to an operand distribution device based on a configurable multiplier matrix structure, and to a distribution method thereof, used to support efficient digital signal processing in a microprocessor.

Background technology

When a microprocessor is used for high-performance digital signal processing, a multiplier of large bit width is generally combined with other units. As shown in Fig. 1, such an arrangement comprises an execution unit (a large-bit-width multiplier) 11, two groups of input units 12 and 13, each connected to the multiplier, and a memory access unit 14. Because a large-bit-width multiplier cannot provide enough computing channel bandwidth, the resulting digital signal processing performance is unsatisfactory.

Summary of the invention

The object of the present invention is to provide an operand distribution device based on a configurable multiplier matrix structure, and a distribution method thereof, in order to solve the problem that existing microprocessors are inefficient at digital signal processing. With only a few added components, this structure and method realize multiple operand configurations and thereby greatly improve digital signal processing performance.

The technical measures adopted by the present invention are as follows:

An operand distribution device based on a configurable multiplier matrix structure comprises two groups of input units, a memory access unit, and an execution unit connected between the input units and the memory access unit. The execution unit is a multiplier whose horizontal vector inputs and vertical vector inputs are connected to the corresponding horizontal vector inputs and vertical vector inputs of the input units.

It is characterized in that the execution unit is a multiplier matrix formed from a plurality of small-bit-width multipliers, and in that the device further comprises two operand configuration connectors. Each operand configuration connector is connected between its corresponding group of input units and the corresponding inputs of the execution unit, and the memory access unit outputs the operation results of the different configurations.

In the above operand distribution device based on a configurable multiplier matrix structure, each small-bit-width multiplier is an 8-bit multiplier that accepts two 8-bit source operands and produces a 16-bit result.

In the above operand distribution device based on a configurable multiplier matrix structure, the multiplier matrix consists of 4 × 4 = 16 8-bit multipliers arranged in an array.

An operand distribution method based on a configurable multiplier matrix structure is characterized in that it comprises the following steps:

A. Build a multiplier matrix composed of a plurality of small-bit-width multipliers;

B. Decompose the input signal into two groups of 32-bit vectors, one group being the horizontal input vectors and the other the vertical input vectors; give each 8-bit multiplier a horizontal vector input and a vertical vector input; and connect the configuration connectors between the signal inputs and the inputs of the multiplier matrix;

C. Through different routes, the configuration connectors connect the horizontal input vectors and vertical input vectors of the input units to the horizontal vector inputs and vertical vector inputs of the small-bit-width multipliers of the execution unit, and the memory access unit outputs the operation results of the different configurations, which support the different digital signal processing functions of the microprocessor.

In the above operand distribution method, the multiplier matrix consists of 4 × 4 = 16 8-bit multipliers arranged in an array; each 8-bit multiplier accepts two 8-bit source operands and produces a 16-bit result.

In the above operand distribution method, the digital signal processing functions of the microprocessor supported in step C are: 32 × 32 multiplication, the fast Fourier transform, and the 8-bit complex vector dot-product operation.

In the above operand distribution method, the route by which the configuration connectors configure the operands to support 32 × 32 multiplication is:

A. Each horizontal vector input of the horizontal vector input unit is connected to the horizontal vector inputs of the multipliers in one row of the multiplier matrix;

B. Each vertical vector input of the vertical vector input unit is connected to the vertical inputs of the multipliers in one column of the multiplier matrix.

With these two configurations, there are four kinds of configuration result output at the memory access output connector:

First kind: three outputs are shifted left by 16 bits and one output is not shifted;

Second kind: three outputs are shifted left by 32 bits and one output is shifted left by 48 bits;

Third kind: two outputs are shifted left by 40 bits and the other two are shifted left by 8 bits;

Fourth kind: all outputs are shifted left by 24 bits.

In the above operand distribution method, the operands are configured to support the fast Fourier transform using two configuration passes.

The route of the first configuration is:

A. Select two alternating rows of multipliers in the multiplier matrix, and connect each horizontal vector input of the horizontal vector input unit to the horizontal vector inputs of the two corresponding multipliers in the selected rows;

B. Select two alternating columns of multipliers in the multiplier matrix, and connect each vertical vector input of the vertical vector input unit to the vertical inputs of the two corresponding multipliers in the selected columns.

The route of the second configuration is:

A. Select the other two alternating rows of multipliers in the multiplier matrix, and connect each horizontal vector input of the horizontal vector input unit to the horizontal vector inputs of the two corresponding multipliers in the selected rows;

B. Select the other two alternating columns of multipliers in the multiplier matrix, and connect each vertical vector input of the vertical vector input unit to the vertical inputs of the two corresponding multipliers in the selected columns.

The two configurations above yield four kinds of output:

First kind: select one row of multipliers in the multiplier matrix, and connect the outputs of the memory access unit pairwise to the outputs of the multipliers in the selected row;

Second kind: select one of the remaining three rows of multipliers in the multiplier matrix, and connect the outputs of the memory access unit pairwise to the outputs of the multipliers in the selected row;

Third kind: select one of the remaining two rows of multipliers in the multiplier matrix, and connect the outputs of the memory access unit pairwise to the outputs of the multipliers in the selected row;

Fourth kind: connect the outputs of the memory access unit pairwise to the outputs of the multipliers in the one remaining row of the multiplier matrix.

In the above operand distribution method, the operands are configured to support the 8-bit complex vector dot-product operation, and the route of the configuration is:

A. Each horizontal vector input of the horizontal vector input unit is connected to the horizontal vector inputs of two multipliers in one row of the multiplier matrix: two of the horizontal vector inputs are connected to the first two multipliers of the first and second rows respectively, and the other two are connected to the last two multipliers of the third and fourth rows respectively;

B. Each vertical vector input of the vertical vector input unit is connected to the vertical inputs of two multipliers in one column of the multiplier matrix: two of the vertical vector inputs are connected to the vertical inputs of the first two multipliers of the first and second columns respectively, and the other two are connected to the vertical inputs of the last two multipliers of the third and fourth columns respectively.

The output results are:

First kind: select any two rows and two columns of the multiplier matrix, and connect the outputs of the memory access unit pairwise to the outputs of the multipliers at the intersections of the selected rows and columns;

Second kind: connect the outputs of the memory access unit pairwise to the outputs of the multipliers at the intersections of the other two rows and two columns.

Because the present invention adopts the above technical scheme, it achieves the following beneficial effects:

1. It greatly enhances digital signal processing performance while adding only a small amount of hardware.

2. It reduces modifications to the overall structure of the microprocessor, which avoids introducing additional errors.

Description of drawings

The specific structure and performance of the present invention are further described by the following embodiments and accompanying drawings.

Fig. 1 shows an operand distribution device supporting digital signal processing in an existing microprocessor.

Fig. 2 is a schematic diagram of the structure of the operand distribution device of the present invention, based on a configurable multiplier matrix structure.

Fig. 3 is a schematic diagram of the structure of the multiplier matrix in Fig. 2.

Figs. 4a and 4b are routing connection diagrams of the first embodiment of the present invention for configuring the input operand signals.

Figs. 5a, 5b, 5c and 5d are schematic diagrams of the distribution output results of the first embodiment shown in Fig. 4.

Figs. 6a, 6b, 6c and 6d are routing connection diagrams of the second embodiment of the present invention for configuring the input operand signals.

Figs. 7a, 7b, 7c and 7d are schematic diagrams of the distribution output results of the second embodiment shown in Fig. 6.

Figs. 8a and 8b are routing connection diagrams of the third embodiment of the present invention for configuring the input operand signals.

Figs. 9a and 9b are schematic diagrams of the distribution output results of the third embodiment shown in Fig. 8.

Embodiment

As shown in Fig. 2, the complete operand distribution device supporting digital signal processing in a microprocessor comprises two groups of input units 21 and 22, an execution unit 23, two operand configuration connectors 24 and 25, and a memory access unit 26. Of the two groups of input units, one is the horizontal vector input unit 21 and the other is the vertical vector input unit 22. The horizontal vector input unit 21 is divided into four 8-bit horizontal vectors Y₀ to Y₃, and the vertical vector input unit 22 is divided into four 8-bit vertical vectors X₀ to X₃. The memory access unit 26 (shown as Z₀ to Z₃ in the other figures) outputs the operation results of the different configurations.

Referring to Fig. 3, the execution unit 23 is a multiplier matrix formed from 4 × 4 = 16 8-bit (small-bit-width) multipliers M. Each 8-bit multiplier accepts two 8-bit source operands A and B and produces one 16-bit result C.
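The behaviour of one multiplier M and of the 4 × 4 matrix can be modelled in a few lines. This is only an illustrative sketch: the function names `mul8` and `multiplier_matrix` are hypothetical, and the operands are assumed to be unsigned, which the text does not state.

```python
def mul8(a, b):
    """Model of one small multiplier M: two 8-bit operands A and B, one 16-bit result C.

    Assumption: operands are treated as unsigned 8-bit values.
    """
    if not (0 <= a <= 0xFF and 0 <= b <= 0xFF):
        raise ValueError("operands must fit in 8 bits")
    return a * b  # at most 255 * 255 = 65025, which fits in 16 bits


def multiplier_matrix(a_inputs, b_inputs):
    """Evaluate a 4 x 4 matrix of multipliers.

    a_inputs[r][c] and b_inputs[r][c] are the A and B operands routed to the
    multiplier in row r, column c; the result holds the 16 products C.
    """
    return [[mul8(a_inputs[r][c], b_inputs[r][c]) for c in range(4)]
            for r in range(4)]
```

The routing performed by the configuration connectors then amounts to choosing which 8-bit input slice appears in each cell of `a_inputs` and `b_inputs`.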

Of the two operand configuration connectors 24 and 25, the input of operand configuration connector 24 is connected to the four 8-bit horizontal vectors Y₀ to Y₃ of the horizontal vector input unit 21, and its output is connected through suitable routes to the horizontal input of each multiplier in the multiplier matrix. The input of the other operand configuration connector 25 is connected to the four 8-bit vertical vectors X₀ to X₃ of the vertical vector input unit 22, and its output is connected through suitable routes to the vertical input of each multiplier in the multiplier matrix. The memory access unit 26 is connected to the outputs C of the execution unit 23.

Through suitable shifting and routing, the present invention can perform operations such as 32 × 32 long multiplication, highly parallel 8-bit and 16-bit multiplication, and the FFT, thereby achieving high-performance digital signal processing in a microprocessor.

The method and advantages of the present invention are further illustrated below through specific embodiments.

Refer to Figs. 4a and 4b, which show the operand distribution of the first embodiment of the invention, supporting 32 × 32 multiplication.

The operand configuration method in the microprocessor of the present invention comprises the following steps:

B. Decompose the input signal into two groups of 32-bit vectors, one group being the horizontal input vector Y and the other the vertical input vector X; give each 8-bit multiplier a horizontal vector input A and a vertical vector input B; and connect the configuration connectors between the signal inputs and the inputs of the multiplier matrix.

Suppose Y and X are two 32-bit vectors and Z is the product of Y and X. Then:

Z = Σ_{i = 0}^{32} A * B_{i}

Y is divided into the following four 8-bit vectors:

Y'₀ = Y₇...Y₀

Y'₁ = Y₁₅...Y₈

Y'₂ = Y₂₃...Y₁₆

Y'₃ = Y₃₁...Y₂₄

X is divided into the following four 8-bit vectors:

X'₀ = X₇...X₀

X'₁ = X₁₅...X₈

X'₂ = X₂₃...X₁₆

X'₃ = X₃₁...X₂₄

Each 8-bit multiplier has two 8-bit inputs: one is the horizontal vector input A and the other is the vertical vector input B.
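The division of a 32-bit operand into four 8-bit vectors can be sketched as follows. The helper names `split32` and `join32` are hypothetical and the value is assumed to be an unsigned 32-bit integer; slice 0 holds bits 7 to 0, matching Y'₀ and X'₀ above.

```python
def split32(value):
    """Split an unsigned 32-bit value into four 8-bit slices.

    Index 0 is the least significant byte (bits 7..0), index 3 the most
    significant byte (bits 31..24).
    """
    if not 0 <= value < 2 ** 32:
        raise ValueError("value must fit in 32 bits")
    return [(value >> (8 * i)) & 0xFF for i in range(4)]


def join32(slices):
    """Inverse of split32: reassemble four 8-bit slices into one value."""
    return sum(s << (8 * i) for i, s in enumerate(slices))
```

For example, `split32(0x12345678)` gives the slices `0x78, 0x56, 0x34, 0x12`, and `join32` restores the original value.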

The route by which the configuration connectors configure the operands to support 32 × 32 multiplication is:

A. Each horizontal vector input of the horizontal vector input unit is connected to the horizontal vector inputs of the multipliers in one row of the multiplier matrix. As shown in Fig. 4a, the horizontal vector inputs Y₃, Y₂, Y₁ and Y₀ of the horizontal vector input unit are connected to the horizontal vector inputs A₃₃ to A₃₀, A₂₃ to A₂₀, A₁₃ to A₁₀ and A₀₃ to A₀₀ respectively, each group being one row of multipliers in the multiplier matrix;

B. Each vertical vector input of the vertical vector input unit is connected to the vertical inputs of the multipliers in one column of the multiplier matrix. As shown in Fig. 4b, the vertical vector inputs X₃, X₂, X₁ and X₀ of the vertical vector input unit are connected to the vertical vector inputs B₃₃ to B₃₀, B₂₃ to B₂₀, B₁₃ to B₁₀ and B₀₃ to B₀₀ respectively, each group being one column of multipliers in the multiplier matrix.

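See Figs. 5a to 5d. The above configuration yields four kinds of output:

First kind: three outputs are shifted left by 16 bits and one output is not shifted;

Second kind: three outputs are shifted left by 32 bits and one output is shifted left by 48 bits;

Third kind: two outputs are shifted left by 40 bits and the other two are shifted left by 8 bits;

Fourth kind: all outputs are shifted left by 24 bits.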
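The shift amounts in the four kinds of output are the ones that arise when a 32 × 32 product is assembled from sixteen 8 × 8 partial products: the product of slice i of Y and slice j of X carries a weight of 2^(8(i+j)). The following sketch checks this arithmetic. The function name `mul32_via_8bit` is hypothetical, the operands are assumed unsigned, and the code models only the arithmetic, not the hardware routing or the assignment of partial products to the outputs Z₀ to Z₃.

```python
def mul32_via_8bit(y, x):
    """Multiply two unsigned 32-bit values using only 8 x 8 -> 16-bit products.

    Returns the product and a dict counting how many partial products use
    each left-shift amount.
    """
    ys = [(y >> (8 * i)) & 0xFF for i in range(4)]  # Y'0 .. Y'3
    xs = [(x >> (8 * j)) & 0xFF for j in range(4)]  # X'0 .. X'3
    total = 0
    shift_counts = {}
    for i in range(4):
        for j in range(4):
            shift = 8 * (i + j)               # weight of Y'i * X'j
            total += (ys[i] * xs[j]) << shift
            shift_counts[shift] = shift_counts.get(shift, 0) + 1
    return total, shift_counts
```

The shift counts come out as one product at 0, two at 8, three at 16, four at 24, three at 32, two at 40 and one at 48 bits, which matches the grouping into four sets of four outputs: (16, 16, 16, 0), (32, 32, 32, 48), (40, 40, 8, 8) and (24, 24, 24, 24).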

Refer to Figs. 6a to 6d, which show the second embodiment of the invention, supporting an 8-bit complex vector radix-4 FFT (fast Fourier transform).

Principle of the radix-4 FFT algorithm

Let the array f have length N, and let W = e^{2πi/N}. The classical radix-4 FFT algorithm is:

$$F_k = \sum_{n=0}^{3} W^{nk} F_k^{4j+n} \qquad (1)$$

B. Decompose the input signal into two groups of 32-bit vectors, one group being the horizontal input vector Y and the other the vertical input vector X; give each 8-bit multiplier a horizontal vector input A and a vertical vector input B; and connect the configuration connectors between the signal inputs and the inputs of the multiplier matrix.

The operands are configured to support the fast Fourier transform using two configuration passes.

The route of the first configuration is:

A. Select two alternating rows of multipliers in the multiplier matrix, and connect each horizontal vector input of the horizontal vector input unit to the horizontal vector inputs of the two corresponding multipliers in the selected rows. As shown in Fig. 6a, the multipliers of the 2nd and 4th rows, A₂₃ to A₂₀ and A₀₃ to A₀₀, are used: Y₃ of the horizontal vector input unit is connected to A₂₃ and A₀₃, Y₂ to A₂₂ and A₀₂, Y₁ to A₀₁ and A₂₁, and Y₀ to A₂₀ and A₀₀.

B. Select two alternating columns of multipliers in the multiplier matrix, and connect each vertical vector input of the vertical vector input unit to the vertical inputs of the two corresponding multipliers in the selected columns. As shown in Fig. 6b, the multipliers of the 1st and 3rd columns, B₃₂ to B₀₂ and B₃₀ to B₀₀, are used: X₃ of the vertical vector input unit is connected to B₃₀ and B₃₂, X₂ to B₂₀ and B₂₂, X₁ to B₁₀ and B₁₂, and X₀ to B₀₂ and B₀₀.

The route of the second configuration is:

A. Select the other two alternating rows of multipliers in the multiplier matrix, and connect each horizontal vector input of the horizontal vector input unit to the horizontal vector inputs of the two corresponding multipliers in the selected rows. As shown in Fig. 6c, the multipliers of the 1st and 3rd rows, A₁₀ to A₁₃ and A₃₀ to A₃₃, are used: Y₃ of the horizontal vector input unit is connected to A₃₃ and A₁₃, Y₂ to A₃₂ and A₁₂, Y₁ to A₁₁ and A₃₁, and Y₀ to A₃₀ and A₁₀.

B. Select the other two alternating columns of multipliers in the multiplier matrix, and connect each vertical vector input of the vertical vector input unit to the vertical inputs of the two corresponding multipliers in the selected columns. As shown in Fig. 6d, the multipliers of the 2nd and 4th columns, B₀₁ to B₃₁ and B₀₃ to B₃₃, are used: X₃ of the vertical vector input unit is connected to B₃₃ and B₃₁, X₂ to B₂₁ and B₂₃, X₁ to B₁₁ and B₁₃, and X₀ to B₀₁ and B₀₃.
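The connections listed for Figs. 6a to 6d can be tabulated and checked mechanically. The sketch below records each listed connection as a (row, column) pair taken from the two subscript digits, which is an assumed reading of the notation; the table names and the helper functions are hypothetical.

```python
# Each entry maps an input to the subscripts of the multiplier inputs it
# drives, read as (first digit, second digit) = (row, column).
FIRST_H = {"Y3": [(2, 3), (0, 3)], "Y2": [(2, 2), (0, 2)],
           "Y1": [(0, 1), (2, 1)], "Y0": [(2, 0), (0, 0)]}   # Fig. 6a
FIRST_V = {"X3": [(3, 0), (3, 2)], "X2": [(2, 0), (2, 2)],
           "X1": [(1, 0), (1, 2)], "X0": [(0, 2), (0, 0)]}   # Fig. 6b
SECOND_H = {"Y3": [(3, 3), (1, 3)], "Y2": [(3, 2), (1, 2)],
            "Y1": [(1, 1), (3, 1)], "Y0": [(3, 0), (1, 0)]}  # Fig. 6c
SECOND_V = {"X3": [(3, 3), (3, 1)], "X2": [(2, 1), (2, 3)],
            "X1": [(1, 1), (1, 3)], "X0": [(0, 1), (0, 3)]}  # Fig. 6d

ALL_CELLS = {(r, c) for r in range(4) for c in range(4)}


def targets(route):
    """Set of multiplier positions driven by a routing table."""
    return {cell for cells in route.values() for cell in cells}


def is_partition(first, second):
    """True if the two passes drive disjoint cells that together cover all 16."""
    a, b = targets(first), targets(second)
    return a.isdisjoint(b) and (a | b) == ALL_CELLS
```

Under this reading, the first pass drives the A inputs of two alternating rows and the B inputs of two alternating columns, the second pass drives the remaining rows and columns, and the two passes together reach every multiplier exactly once on each input side.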

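See Figs. 7a to 7d. The two configurations above yield four kinds of output:

First kind: select one row of multipliers in the multiplier matrix, and connect the outputs of the memory access unit pairwise to the outputs of the multipliers in the selected row;

Second kind: select one of the remaining three rows of multipliers in the multiplier matrix, and connect the outputs of the memory access unit pairwise to the outputs of the multipliers in the selected row;

Third kind: select one of the remaining two rows of multipliers in the multiplier matrix, and connect the outputs of the memory access unit pairwise to the outputs of the multipliers in the selected row;

Fourth kind: connect the outputs of the memory access unit pairwise to the outputs of the multipliers in the one remaining row of the multiplier matrix.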

Explanation:

For a specific k, the multiplier matrix can compute F_k in one cycle. The multiplier matrix is configured as follows:

Let M(i, j) be the multiplier in row i, column j, let R(x) be the real part of the complex number x and I(x) its imaginary part, and let n = 0, 1, 2, 3:

$$M(0, n) = R(W^{nk}) \cdot R(F_k^{4j+n})$$

$$M(1, n) = I(W^{nk}) \cdot R(F_k^{4j+n})$$

$$M(2, n) = R(W^{nk}) \cdot I(F_k^{4j+n})$$

$$M(3, n) = I(W^{nk}) \cdot I(F_k^{4j+n})$$

Then

$$\sum_{i=0}^{3} M(0, i) - \sum_{i=0}^{3} M(3, i)$$

is the real part of F_k, and

$$\sum_{i=0}^{3} M(1, i) + \sum_{i=0}^{3} M(2, i) \qquad (2)$$

is the imaginary part of F_k.
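The row sums above can be checked against ordinary complex arithmetic. The sketch below assumes that the fourth row of the matrix holds the products of the two imaginary parts, which is what the real-part sum requires; the function name `fft_term` and the representation of complex values as (real, imaginary) pairs are choices made for illustration.

```python
def fft_term(w, f):
    """Compute sum over n of w[n] * f[n] using only real multiplications.

    w and f are lists of four (real, imag) pairs: the twiddle factors W^{nk}
    and the corresponding input terms. m[i][n] models multiplier M(i, n).
    """
    m = [[0] * 4 for _ in range(4)]
    for n in range(4):
        wr, wi = w[n]
        fr, fi = f[n]
        m[0][n] = wr * fr  # R(W) * R(F)
        m[1][n] = wi * fr  # I(W) * R(F)
        m[2][n] = wr * fi  # R(W) * I(F)
        m[3][n] = wi * fi  # I(W) * I(F), assumed content of the fourth row
    real = sum(m[0]) - sum(m[3])
    imag = sum(m[1]) + sum(m[2])
    return real, imag
```

With the twiddle factors 1, i, -1, -i and the inputs 1+2i, 3+4i, 5+6i, 7+8i, the function returns the same value as summing the four complex products directly.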

Timing analysis of the complex radix-4 FFT

An N-point complex radix-4 FFT requires log₄N levels of recursion in total, and each level performs operation (1) N times, for a total of N·log₄N operations.

For a complex radix-4 FFT of length N, let the loop unrolling factor be k. The load instruction and the FFT2 instruction then each require N·log₄N cycles, and the branch instruction requires (N·log₄N)/k cycles, so (2+1/k)·N·log₄N cycles are needed in total.
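The cycle estimate can be evaluated for concrete sizes with a short helper. This is a sketch of the formula only; the function name `radix4_fft_cycles` is hypothetical and N is assumed to be a power of 4.

```python
def radix4_fft_cycles(n, k):
    """Estimated cycles for an N-point complex radix-4 FFT: (2 + 1/k) * N * log4(N).

    n is the FFT length (assumed to be a power of 4) and k is the loop
    unrolling factor.
    """
    if n < 1 or k < 1:
        raise ValueError("n and k must be positive")
    stages = 0
    m = n
    while m > 1:
        if m % 4:
            raise ValueError("n must be a power of 4")
        m //= 4
        stages += 1
    ops = n * stages          # N * log4(N) occurrences of operation (1)
    # loads + FFT2 instructions take ops cycles each, branches take ops / k
    return 2 * ops + ops / k
```

For N = 64 and k = 4 this gives 3 recursion levels, 192 operations, and 384 + 48 = 432 cycles.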

Figs. 8a and 8b are routing diagrams of the third embodiment of the invention, which implements the 8-bit complex vector dot-product operation.

The operands are configured to support the 8-bit dot-product operation, and the route of the configuration is:

A. Each horizontal vector input of the horizontal vector input unit is connected to the horizontal vector inputs of two multipliers in one row of the multiplier matrix: two of the horizontal vector inputs are connected to the first two multipliers of the first and second rows respectively, and the other two are connected to the last two multipliers of the third and fourth rows respectively. As shown in Fig. 8a, Y₃ of the horizontal vector input unit is connected to A₃₃ and A₃₂ in the first row of multipliers, Y₂ to A₂₃ and A₂₂ in the second row, Y₁ to A₁₁ and A₁₀ in the third row, and Y₀ to A₀₁ and A₀₀ in the fourth row.

B. Each vertical vector input of the vertical vector input unit is connected to the vertical inputs of two multipliers in one column of the multiplier matrix: two of the vertical vector inputs are connected to the vertical inputs of the first two multipliers of the first and second columns respectively, and the other two are connected to the vertical inputs of the last two multipliers of the third and fourth columns respectively. As shown in Fig. 8b, X₃ of the vertical vector input unit is connected to B₃₃ and B₂₃ in the 4th column of multipliers, X₂ to B₃₂ and B₂₂ in the 3rd column, X₁ to B₁₁ and B₀₁, and X₀ to B₁₀ and B₀₀.

There are two kinds of output result:

See Fig. 9a. First kind: select any two rows and two columns of the multiplier matrix, and connect the outputs of the memory access unit pairwise to the outputs of the multipliers at the intersections of the selected rows and columns;

See Fig. 9b. Second kind: connect the outputs of the memory access unit pairwise to the outputs of the multipliers at the intersections of the other two rows and two columns.

Explanation:

Principle of the 8-bit complex vector dot-product operation:

Given two arrays:

A = <a₀, ..., a_n>

B = <b₀, ..., b_n>

where, for any complex number a_i, I(a_i) denotes its imaginary part and R(a_i) its real part.

The dot-product operation on the two arrays then yields a result array:

C = <c₀, ..., c_n>

where the complex number c_i = a_i * b_i.

Configuration method for the multiplier matrix

The imaginary part of c_i is computed as:

$$I(c_i) = I(a_i) \cdot R(b_i) + I(b_i) \cdot R(a_i) \qquad (3)$$

The real part of c_i is computed as:

$$R(c_i) = R(a_i) \cdot R(b_i) - I(b_i) \cdot I(a_i) \qquad (4)$$

Taking into account the multiplier matrix and the dual-port access characteristic of the memory (two 32-bit words per cycle), a dot product of length 2 can be completed in each cycle. Completing a dot product of length N then requires N cycles.
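Formulas (3) and (4) can be exercised directly. The sketch below computes the result array C element by element from real multiplications only; the function name `complex_dot` and the use of (real, imaginary) pairs are illustrative choices, and the code models the arithmetic rather than the per-cycle scheduling on the multiplier matrix.

```python
def complex_dot(a, b):
    """Element-wise complex product c_i = a_i * b_i using formulas (3) and (4).

    a and b are equal-length lists of (real, imag) pairs; each element needs
    four real multiplications.
    """
    if len(a) != len(b):
        raise ValueError("arrays must have the same length")
    result = []
    for (ar, ai), (br, bi) in zip(a, b):
        real = ar * br - bi * ai  # formula (4)
        imag = ai * br + bi * ar  # formula (3)
        result.append((real, imag))
    return result
```

For the inputs 1+2i, 3-i and 3+4i, 2+5i the function returns -5+10i and 11+13i, the same values as direct complex multiplication.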

Claims

1. An operand distribution device based on a configurable multiplier matrix structure, comprising two groups of input units, a memory access unit, and an execution unit connected between the input units and the memory access unit, the execution unit being a multiplier whose horizontal vector inputs and vertical vector inputs are connected to the corresponding horizontal vector inputs and vertical vector inputs of the input units;

characterized in that the execution unit is a multiplier matrix formed from a plurality of small-bit-width multipliers; and in that the device further comprises two operand configuration connectors, each connected between its corresponding group of input units and the corresponding inputs of the execution unit, the memory access unit outputting the operation results of the different configurations.

2. The operand distribution device based on a configurable multiplier matrix structure according to claim 1, characterized in that each small-bit-width multiplier is an 8-bit multiplier that accepts two 8-bit source operands and produces a 16-bit result.

3. The operand distribution device based on a configurable multiplier matrix structure according to claim 1, characterized in that the multiplier matrix consists of 4 × 4 = 16 8-bit multipliers arranged in an array.

4. An operand distribution method based on a configurable multiplier matrix structure, characterized in that it comprises the following steps:

5. The operand distribution method according to claim 4, characterized in that the multiplier matrix consists of 4 × 4 = 16 8-bit multipliers arranged in an array; each 8-bit multiplier accepts two 8-bit source operands and produces a 16-bit result.

6. The operand distribution method according to claim 4, characterized in that the digital signal processing functions of the microprocessor supported in step C are: 32 × 32 multiplication, the fast Fourier transform, and the 8-bit complex vector dot-product operation.

7. The operand distribution method according to claim 4 or 6, characterized in that the route by which the configuration connectors configure the operands to support 32 × 32 multiplication is:

Fourth kind: all outputs are shifted left by 24 bits.

8. The operand distribution method according to claim 4 or 6, characterized in that the operands are configured to support the fast Fourier transform using two configuration passes;

The route of the first configuration is:

The route of the second configuration is:

9. The operand distribution method according to claim 4 or 6, characterized in that the operands are configured to support the 8-bit dot-product operation, and the route of the configuration is:

There are two kinds of output result: