WO2019019196A1

WO2019019196A1 - Digital signal processing method and device and programmable logic device

Info

Publication number: WO2019019196A1
Application number: PCT/CN2017/095061
Authority: WO
Inventors: 杨伟国; 潘剑锋; 陈秀波
Original assignee: 华为技术有限公司
Priority date: 2017-07-28
Filing date: 2017-07-28
Publication date: 2019-01-31

Abstract

A digital signal processing method and device and a programmable logic device, relating to the field of digital circuits, and used for solving the existing problem in the prior art that the use of a fixed bit-width multiplier might result in low resource utilization and inferior operation performance. The invention comprises K multipliers (201), a shift control circuit (202), and a selection control circuit (203). The ith multiplier of the K multipliers (201) is used for performing multiplication with a bit width of Mi bits as a multiplicand and a bit width of Ni bits as a multiplier. The shift control circuit (202) is connected to the K multipliers (201), and is used for receiving K products outputted by the K multipliers (201), and for performing shift control processing on the K products to obtain K processed products and outputting the K processed products to the selection control circuit (203). The selection control circuit (203) is connected to the shift control circuit (202), and is used for receiving the K processed products sent by the shift control circuit (202) and outputting a result of accumulating the K processed products or outputting the K processed products.

Description

Digital signal processing method, device and programmable logic device

Technical field

The embodiments of the present invention relate to the field of digital circuits, and in particular, to a digital signal processing method, apparatus, and programmable logic device.

Background art

Programmable logic devices, such as a Field Programmable Gate Array (FPGA), include a Digital Signal Processor Hardcore (DSP Hardcore). The DSP Hardcore can be configured to perform arithmetic processing on signals, such as multiplication, addition, and multiply-accumulate operations. An FPGA can therefore provide the operations required for deep learning (for example, the multiplication and accumulation operations of a convolution), and the industry commonly uses FPGAs as deep learning processors.

Generally, the component that performs arithmetic processing in the DSP Hardcore of an FPGA is mainly a multiplier. FIG. 1 shows the internal structure of a DSP Hardcore provided in the prior art, in which the bit width of the multiplier 103 is generally fixed at M bits × N bits. The multiplication bit widths required for deep learning, however, vary widely. For example, the bit width required for deep learning may be smaller than the bit width of the multiplier itself. In this case, the prior art usually pads the inputs of the large bit width multiplier (for example, 19 bits × 18 bits) with zeros from the most significant bit downward, so that the large bit width multiplier performs the multiplication of a small bit width multiplier (for example, 8 bits × 8 bits). FIG. 2 is a schematic diagram of using a 19-bit × 18-bit multiplier as an 8-bit × 8-bit multiplier: one input of the 19-bit × 18-bit multiplier is padded with 11 zeros and the other input is padded with 10 zeros.
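By way of illustration only, the prior-art zero padding of FIG. 2 can be sketched in Python as follows. The sketch assumes unsigned operands, which the text does not state, and the function names and the `used_fraction` measure (used operand bits over available operand bits) are hypothetical aids for the reader rather than anything defined in the application.

```python
def zero_pad(value: int, from_bits: int, to_bits: int) -> int:
    """Zero-extend an unsigned value; only the bus width grows, not the value."""
    if not 0 <= value < (1 << from_bits):
        raise ValueError("value does not fit in from_bits")
    if to_bits < from_bits:
        raise ValueError("cannot pad to a narrower width")
    return value


def fixed_19x18_multiply(a: int, b: int) -> int:
    """Model of a fixed 19-bit x 18-bit multiplier (unsigned, by assumption)."""
    if not (0 <= a < (1 << 19) and 0 <= b < (1 << 18)):
        raise ValueError("operand exceeds the fixed input width")
    return a * b


def multiply_8x8_on_19x18(a8: int, b8: int) -> int:
    """Pad with 11 zeros (8 -> 19 bits) and 10 zeros (8 -> 18 bits), then multiply."""
    return fixed_19x18_multiply(zero_pad(a8, 8, 19), zero_pad(b8, 8, 18))


def used_fraction(m_used: int, n_used: int, m: int = 19, n: int = 18) -> float:
    """Rough, illustrative share of the M x N multiplier array doing useful work."""
    return (m_used * n_used) / (m * n)
```

Under these assumptions, `multiply_8x8_on_19x18(255, 255)` returns the ordinary 8-bit product, while `used_fraction(8, 8)` is 64/342, that is, less than one fifth of the array.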

However, in the prior art, when a large bit width multiplier is used as a small bit width multiplier, it can serve as only one small bit width multiplier. For example, a 19-bit × 18-bit multiplier can be used only as a single 8-bit × 8-bit multiplier. A large bit width multiplier cannot be split into multiple smaller bit width multipliers at the same time, and padding the high-order bits with zeros so that the large bit width multiplier acts as a small bit width multiplier wastes a large amount of computing power and computing resources.

Summary of the invention

Embodiments of the present invention provide a digital signal processing method, apparatus, and programmable logic device, to solve the prior-art problem that using a fixed bit width multiplier results in low resource utilization and wasted computing performance.

In a first aspect, an embodiment of the present application provides a digital signal processing apparatus, including K multipliers, a shift control circuit, and a selection control circuit. The i-th multiplier of the K multipliers is used to implement a multiplication operation in which the multiplicand bit width is M_i bits and the multiplier bit width is N_i bits, where M_i and N_i are both positive integers, i = 1, 2, ..., K, and K is an integer greater than or equal to 2. The shift control circuit is connected to the K multipliers and is used to receive the K products output by the K multipliers, perform shift control processing on the K products to obtain K processed products, and output the K processed products to the selection control circuit. The selection control circuit is connected to the shift control circuit and is used to receive the K processed products sent by the shift control circuit and to output either the accumulated result of the K processed products or the K processed products.

In the digital signal processing apparatus provided by the present application, the shift control circuit performs shift control processing on the K products output by the K multipliers to obtain K processed products, and the selection control circuit outputs the K processed products either directly or after accumulating them. In this way, the K multipliers can be used not only to implement K independent multiplication operations but also to implement a multiply-accumulate operation. For example, the K multipliers can together implement the function of a large bit width multiplier, or can complete the multiplications and accumulation required for a matrix convolution operation. This makes full use of the computing power of the digital signal processing apparatus, reduces the waste of computing resources, and allows the apparatus to be applied to different scenarios or devices according to different digital signal processing requirements. The digital signal processing apparatus provided by the present application can be embedded in a programmable logic device (such as a field programmable gate array, FPGA) as a hard core. The selection control circuit and the shift control circuit give the apparatus a degree of flexibility while maintaining a good operating frequency and providing high computational efficiency, so the apparatus is suitable for various computing scenarios. For example, it can serve as an efficient arithmetic unit for the matrix convolution operations in a deep learning process.
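The data path described above can be sketched behaviourally in Python. This is a minimal model under stated assumptions, not the circuit itself: operands are unsigned integers, every shift is a left shift by a non-negative number of bits, and the name `dsp_unit` and its parameters are hypothetical.

```python
from typing import List, Sequence, Tuple, Union


def dsp_unit(
    operands: Sequence[Tuple[int, int]],
    shifts: Sequence[int],
    accumulate: bool,
) -> Union[int, List[int]]:
    """Behavioural model: K multipliers -> shift control -> selection control."""
    if len(operands) < 2:
        raise ValueError("K must be an integer greater than or equal to 2")
    if len(shifts) != len(operands):
        raise ValueError("one shift amount is needed per product")

    # K multipliers: the i-th multiplier forms one product.
    products = [multiplicand * multiplier for multiplicand, multiplier in operands]

    # Shift control circuit: a shift of 0 bits models "not shifting".
    processed = [product << bits for product, bits in zip(products, shifts)]

    # Selection control circuit: output the accumulated result or the K products.
    return sum(processed) if accumulate else processed
```

With `operands = [(3, 4), (5, 6)]` and `shifts = [0, 0]`, the model returns `[12, 30]` when `accumulate` is false (K independent multiplications) and `42` when it is true (multiply-accumulate).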

With reference to the first aspect, in a first possible implementation of the first aspect, the shift control circuit includes K shift circuits. Each of the K shift circuits is connected to one of the K multipliers and is used to receive the product output by that multiplier and to perform shift control processing on it to obtain a processed product. Providing one shift circuit for each of the K multipliers allows the product output by each multiplier to be shifted more flexibly and accurately.

With reference to the first aspect or the first possible implementation of the first aspect, in a second possible implementation of the first aspect, the shift control processing includes shifting or not shifting. That is, any one of the K products may or may not be shifted, and the shift control processing applied to the K products may be the same or different. By selecting between the shift and non-shift operations, the apparatus can implement both the function of a large bit width multiplier and the function of each individual multiplier.

With reference to the first aspect or any one of the first to second possible implementations of the first aspect, in a third possible implementation of the first aspect, the selection control circuit includes an accumulation circuit, and the accumulation circuit is configured to accumulate the K processed products. By accumulating the K processed products, the apparatus provided by the present application can implement a multiply-accumulate operation or the function of a multiplier with a larger bit width.

With reference to the first aspect or any one of the first to third possible implementations of the first aspect, in a fourth possible implementation of the first aspect, the digital signal processing apparatus provided by the embodiment of the present application is used to implement at least one of the following functions: a multiply-accumulate operation function; the function of the K multipliers; the function of an M×N multiplier, where the M×N multiplier denotes a multiplier whose multiplicand bit width is M bits and whose multiplier bit width is N bits, and satisfy

With reference to the first aspect or any one of the first to fourth possible implementations of the first aspect, in a fifth possible implementation of the first aspect, when the digital signal processing apparatus provided by the embodiment of the present application is used to implement a multiply-accumulate operation, the shift control circuit outputs the K products received from the K multipliers directly to the selection control circuit without performing shift processing; wherein, for any one of the other K-1 products, the shift processing is determined by the bit position that the multiplicand generating that product occupies in the M-bit multiplicand and the bit position that the multiplier generating that product occupies in the N-bit multiplier; and the selection control circuit adds the received K products and outputs the sum. A multiply-accumulate operation can thus be implemented without changing the structure of the digital signal processing apparatus provided by the present application: the shift control circuit simply performs no shift operation on the received K products, and the selection control circuit adds the received K products and outputs the result.

With reference to the first aspect or any one of the first to fourth possible implementations of the first aspect, in a sixth possible implementation of the first aspect, when the digital signal processing apparatus provided by the embodiment of the present application is used to implement the function of the K multipliers, the shift control circuit outputs the K products received from the K multipliers to the selection control circuit without shift processing, and the selection control circuit outputs the received K products. The function of each individual multiplier can thus be implemented without changing the structure of the apparatus: the shift control circuit simply performs no shift processing on the received K products, and the selection control circuit directly outputs the K unshifted products received from the shift control circuit. This widens the range of applications of the apparatus provided by the present application.

With reference to the first aspect or any one of the first to fourth possible implementations of the first aspect, in a seventh possible implementation of the first aspect, when the digital signal processing apparatus provided by the embodiment of the present application is used to implement the function of the M×N multiplier, the shift control circuit performs no shift processing on one of the received K products and performs shift processing on the other K-1 products to obtain K processed products, and outputs the K processed products to the selection control circuit; the selection control circuit adds the K processed products and outputs the sum. By using the shift control circuit to shift the products output by the K multipliers by different numbers of bits, the present application implements the function of a large bit width multiplier.
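As a worked illustration of the wide-multiplier mode, the following Python sketch splits each operand into contiguous chunks, multiplies every pair of chunks on a small multiplier, shifts each partial product left by the sum of the two chunk offsets, and accumulates. The chunking scheme, the unsigned operands, and the names `split`, `partial_product_shifts`, and `wide_multiply` are assumptions made for the example; the text itself states only that the shift depends on the bit positions the sub-operands occupy in the M-bit multiplicand and the N-bit multiplier.

```python
from typing import List, Sequence, Tuple


def split(value: int, widths: Sequence[int]) -> List[Tuple[int, int]]:
    """Split an unsigned value into (chunk, bit_offset) pairs, low bits first."""
    chunks = []
    offset = 0
    for width in widths:
        chunks.append(((value >> offset) & ((1 << width) - 1), offset))
        offset += width
    return chunks


def partial_product_shifts(m_widths: Sequence[int], n_widths: Sequence[int]) -> List[int]:
    """Left-shift amount for each of the K = len(m_widths) * len(n_widths) products."""
    return [a_off + b_off
            for _, a_off in split(0, m_widths)
            for _, b_off in split(0, n_widths)]


def wide_multiply(a: int, b: int, m_widths: Sequence[int], n_widths: Sequence[int]) -> int:
    """Build an M x N product from small multipliers, shifts, and one accumulation."""
    total = 0
    for a_chunk, a_off in split(a, m_widths):
        for b_chunk, b_off in split(b, n_widths):
            product = a_chunk * b_chunk           # one small multiplier
            total += product << (a_off + b_off)   # shift, then accumulate
    return total
```

For a 16-bit × 16-bit multiplication built from four 8-bit × 8-bit multipliers, `partial_product_shifts([8, 8], [8, 8])` gives `[0, 8, 8, 16]`: one product is not shifted and the other K-1 products are, which matches the behaviour described for this mode.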

With reference to the first aspect or any one of the first to seventh possible implementations of the first aspect, in an eighth possible implementation of the first aspect, the digital signal processing apparatus further includes a configuration information receiving circuit configured to receive first configuration information, where the first configuration information is used to instruct the shift control circuit to perform shift control processing on each of the received K products. By configuring the first configuration information for the digital signal processing apparatus, the shift control circuit can, according to the first configuration information, either perform or not perform a shift operation on the product output by each multiplier.

With reference to the first aspect or the eighth possible implementation of the first aspect, in a ninth possible implementation of the first aspect, the first configuration information includes indication information configured for at least one shift circuit in the shift control circuit, where the indication information is used to instruct the at least one shift circuit to perform shift control processing on the product it receives. By configuring indication information for each shift circuit, each shift circuit can perform shift processing or no shift processing on the product it receives according to the indication information it receives.

With reference to the first aspect or any one of the first to ninth possible implementations of the first aspect, in a tenth possible implementation of the first aspect, the indication information is further used to indicate the shift direction and/or the number of shift bits used when the at least one shift circuit performs shift control processing on the product it receives.

With reference to the first aspect or any one of the first to tenth possible implementations of the first aspect, in an eleventh possible implementation of the first aspect, the digital signal processing apparatus further includes a configuration information receiving circuit configured to receive second configuration information, where the second configuration information is used to instruct the selection control circuit to output the accumulated result of the K processed products or to output the K processed products.
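One way to picture the two kinds of configuration information is as a small record: a per-shift-circuit indication (direction and number of bits) standing for the first configuration information, and a single flag standing for the second. The Python sketch below is purely illustrative; the application does not define an encoding, so the class names, field names, and the `apply_config` helper are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Sequence, Union


class ShiftDirection(Enum):
    NONE = 0   # no shift processing
    LEFT = 1
    RIGHT = 2


@dataclass(frozen=True)
class ShiftIndication:
    """Indication information for one shift circuit (hypothetical encoding)."""
    direction: ShiftDirection = ShiftDirection.NONE
    bits: int = 0


@dataclass(frozen=True)
class DspConfig:
    shift_indications: List[ShiftIndication]  # stands for the first configuration information
    accumulate: bool                          # stands for the second configuration information


def apply_config(products: Sequence[int], config: DspConfig) -> Union[int, List[int]]:
    """Apply the shift indications, then either accumulate or pass the products through."""
    if len(products) != len(config.shift_indications):
        raise ValueError("one indication is needed per product")
    processed = []
    for product, indication in zip(products, config.shift_indications):
        if indication.direction is ShiftDirection.LEFT:
            processed.append(product << indication.bits)
        elif indication.direction is ShiftDirection.RIGHT:
            processed.append(product >> indication.bits)
        else:
            processed.append(product)
    return sum(processed) if config.accumulate else processed
```

For instance, a configuration that leaves the first product unshifted, shifts the second left by 8 bits, and sets `accumulate` to true turns the products `[3, 2]` into `3 + (2 << 8)`.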

In conjunction with the first aspect to any one of the eleventh possible implementations of the first aspect, in a twelfth possible implementation of the first aspect, the digital signal processing apparatus is integrated in the programmable logic device.

In a second aspect, an embodiment of the present application provides a programmable logic device including the digital signal processing apparatus described in the first aspect or any one of the first to eleventh possible implementations of the first aspect. Optionally, the programmable logic device includes at least one of a Field Programmable Gate Array (FPGA), a Complex Programmable Logic Device (CPLD), and an Erasable Programmable Logic Device (EPLD).

In a third aspect, an embodiment of the present application provides a digital signal processing method. The method includes: performing K multiplication operations to obtain K products, and outputting the K products to a shift control circuit, where the i-th multiplication operation of the K multiplication operations is a multiplication operation in which the multiplicand bit width is M_i bits and the multiplier bit width is N_i bits, M_i and N_i are positive integers, i = 1, 2, ..., K, and K is an integer greater than or equal to 2; performing, by the shift control circuit, shift control processing on the K products to obtain K processed products, and outputting the K processed products to a selection control circuit; and outputting, by the selection control circuit, the accumulated result of the K processed products or the K processed products.

With reference to the third aspect, in a first possible implementation of the third aspect, the performing, by the shift control circuit, shift control processing on the K products to obtain K processed products includes: performing, by each of the K shift circuits included in the shift control circuit, shift control processing on one of the K products, to obtain the K processed products.

In conjunction with the third aspect or the first possible implementation of the third aspect, in a second possible implementation of the third aspect, the shift control process includes: shifting or not shifting.

With reference to the third aspect or any one of the first to second possible implementations of the third aspect, in a third possible implementation of the third aspect, the method provided by the embodiment of the present application is used to implement at least one of the following functions: a multiply-accumulate operation function; the function of K multipliers; the function of an M×N multiplier, where the M×N multiplier denotes a multiplier whose multiplicand bit width is M bits and whose multiplier bit width is N bits, and satisfy

With reference to the third aspect or any one of the first to third possible implementations of the third aspect, in a fourth possible implementation of the third aspect, when the method provided by the embodiment of the present application is used to implement a multiply-accumulate operation, the performing, by the shift control circuit, shift control processing on the K products to obtain K processed products and outputting the K processed products to the selection control circuit includes: performing, by the shift control circuit, no shift processing on the K products, to obtain K unshifted products, and outputting the K unshifted products to the selection control circuit; and the outputting, by the selection control circuit, the accumulated result of the K processed products or the K processed products includes: accumulating, by the selection control circuit, the received K unshifted products, and outputting the accumulated result of the K unshifted products.

With reference to the third aspect or any one of the first to fourth possible implementations of the third aspect, in a fifth possible implementation of the third aspect, when the method provided by the embodiment of the present application is used to implement the function of K multipliers, the performing, by the shift control circuit, shift control processing on the K products to obtain K processed products and outputting the K processed products to the selection control circuit includes: performing, by the shift control circuit, no shift processing on the K products, to obtain K unshifted products, and outputting the K unshifted products to the selection control circuit; wherein, for any one of the other K-1 products, the shift processing is determined by the bit position that the multiplicand generating that product occupies in the M-bit multiplicand and the bit position that the multiplier generating that product occupies in the N-bit multiplier; and the outputting, by the selection control circuit, the accumulated result of the K processed products or the K processed products includes: directly outputting, by the selection control circuit, the received K unshifted products.

With reference to the third aspect or any one of the first to fourth possible implementations of the third aspect, in a sixth possible implementation of the third aspect, when the method provided by the embodiment of the present application is used to implement the function of an M×N multiplier, the performing, by the shift control circuit, shift control processing on the K products to obtain K processed products and outputting the K processed products to the selection control circuit includes: performing, by the shift control circuit, no shift processing on one of the received K products and shift processing on the other K-1 products, to obtain K processed products, and outputting the K processed products to the selection control circuit; and the outputting, by the selection control circuit, the accumulated result of the K processed products or the K processed products includes: adding, by the selection control circuit, the received K processed products, and outputting the sum.

With reference to the third aspect or any one of the first to fourth possible implementations of the third aspect, in a seventh possible implementation of the third aspect, the method provided by the embodiment of the present application further includes: receiving first configuration information, where the first configuration information is used to instruct the shift control circuit to perform shift control processing on the received K products.

With reference to the third aspect or any one of the first to seventh possible implementations of the third aspect, in an eighth possible implementation of the third aspect, the first configuration information includes indication information configured for at least one shift circuit in the shift control circuit, where the indication information is used to instruct the at least one shift circuit to perform shift control processing on the product it receives.

With reference to the third aspect or any one of the first to eighth possible implementations of the third aspect, in a ninth possible implementation of the third aspect, the indication information is further used to indicate the shift direction and/or the number of shift bits used when the at least one shift circuit performs shift control processing on the product it receives.

With reference to any one of the third aspect to the ninth possible implementation manner of the third aspect, in a tenth possible implementation manner of the third aspect, the method provided by the embodiment of the present application further includes: receiving the second configuration information The second configuration information is used to instruct the selection control circuit to output an accumulated result of the K processed products or to output K processed products.

Brief description of the drawings

FIG. 1 shows an internal structure of a digital signal processor (DSP) provided in the prior art;

FIG. 2 is a schematic structural diagram of a small bit width multiplier implemented by using a large bit width multiplier in the prior art;

FIG. 3 is a schematic structural diagram 1 of a digital signal processing apparatus provided by the present application;

FIG. 4 is a schematic structural diagram 2 of a digital signal processing apparatus provided by the present application;

FIG. 5 is a schematic structural diagram 3 of a digital signal processing apparatus provided by the present application;

FIG. 6 is a schematic structural diagram 4 of a digital signal processing apparatus provided by the present application;

FIG. 7 is a schematic structural diagram 5 of a digital signal processing apparatus provided by the present application;

FIG. 8 is a schematic structural diagram 6 of a digital signal processing apparatus provided by the present application;

FIG. 9 is a schematic structural diagram 7 of a digital signal processing apparatus provided by the present application;

FIG. 10 is a schematic structural diagram 8 of a digital signal processing apparatus provided by the present application;

FIG. 11 is a schematic structural diagram 9 of a digital signal processing apparatus provided by the present application;

FIG. 12 is a schematic structural diagram 10 of a digital signal processing apparatus provided by the present application;

FIG. 13 is a schematic structural diagram 11 of a digital signal processing apparatus provided by the present application;

FIG. 14 is a schematic structural diagram 12 of a digital signal processing apparatus provided by the present application;

FIG. 15 is a schematic structural diagram 13 of a digital signal processing apparatus provided by the present application;

FIG. 16 is a schematic structural diagram 14 of a digital signal processing apparatus provided by the present application;

FIG. 17 is a schematic structural diagram 15 of a digital signal processing apparatus provided by the present application;

FIG. 18 is a schematic structural diagram 16 of a digital signal processing apparatus provided by the present application;

FIG. 19 is a schematic structural diagram 17 of a digital signal processing apparatus provided by the present application;

FIG. 20 is a schematic structural diagram 18 of a digital signal processing apparatus provided by the present application;

FIG. 21 is a schematic flowchart of a digital signal processing method provided by the present application;

FIG. 22 is a schematic structural diagram of an FPGA provided by the present application;

FIG. 23 is a schematic diagram 1 of a convolution operation;

FIG. 24 is a schematic diagram 2 of a convolution operation.

Detailed description

To describe the technical solutions of the embodiments of the present application clearly, the words "first", "second", and the like are used in the embodiments to distinguish between objects with similar names or functions. Those skilled in the art will understand that the words "first" and "second" do not limit quantity or order of execution.

It is to be noted that Q'b0 in the embodiments of the present application denotes Q bits of 0. For example, in FIG. 2, 11'b0 denotes 11 bits of 0, and 10'b0 denotes 10 bits of 0.

[M:0] denotes all or part of the bit sequence of input or output data, where that bit sequence includes M+1 bits (bit M down to bit 0). For example, [7:0] in FIG. 2 denotes a part of the bit sequence of the input data that includes 8 bits.

In addition, 2^P in the embodiment of the present invention indicates that P bits are shifted left, for example, 2^16 indicates that 16 bits are shifted left.
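The three notations above can be mirrored by small Python helpers, offered only as a reading aid. The helper names are hypothetical; the index convention follows the common hardware-description usage in which a range written [high:low] covers high - low + 1 bits.

```python
def zeros(q: int) -> str:
    """Q'b0: a run of Q zero bits, shown here as a bit string."""
    return "0" * q


def bit_slice(value: int, high: int, low: int = 0) -> int:
    """value[high:low]: the bits from position `high` down to `low`, inclusive."""
    width = high - low + 1
    return (value >> low) & ((1 << width) - 1)


def shift_left(value: int, p: int) -> int:
    """Multiplying by 2^P is the same as shifting left by P bits."""
    return value << p
```

For example, `zeros(11)` is the 11-bit padding written 11'b0, `bit_slice(x, 7, 0)` extracts the low byte of `x`, and `shift_left(3, 16)` equals `3 * 2**16`.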

FIG. 3 shows a digital signal processing apparatus 20 provided by an embodiment of the present application. As shown in FIG. 3, the apparatus includes K multipliers 201, a shift control circuit 202 connected to the K multipliers 201, and a selection control circuit 203 connected to the shift control circuit 202.

The K multipliers 201 are each used for a multiplication operation in which the multiplicand bit width is M_i bits and the multiplier bit width is N_i bits, where M_i and N_i are positive integers, i = 1, 2, ..., K, and K is an integer greater than or equal to 2. As shown in FIG. 3, the K multipliers may be an M_1 × N_1 multiplier 2011, an M_2 × N_2 multiplier 2012, ..., and an M_K × N_K multiplier 201K. The data widths of an M_i × N_i multiplier are expressed in bits.

Specifically, the i-th (i = 1, 2, ..., K) multiplier of the K multipliers 201 is configured to obtain a product from the received multiplier and multiplicand, where the multiplicand may be a first input sequence including M_i bits, and the multiplier may be a second input sequence including N_i bits.

The shift control circuit 202 is configured to receive the K products output by the K multipliers, perform shift control processing on the K products to obtain K processed products, and output the K processed products to the selection control circuit 203.

The selection control circuit 203 is configured to receive the K processed products transmitted by the shift control circuit 202, and output the accumulated result of the K processed products or output the K processed products.

Optionally, the K multipliers 201 may share one shift control circuit 202 and one selection control circuit 203. In this case, the one shift control circuit 202 controls whether a shift operation is performed on the product output by each of the K multipliers 201, to obtain K processed products, and the one selection control circuit 203 controls whether the K processed products are output directly or are accumulated and then output.

In a specific example, to shift the product output by each of the K multipliers flexibly and accurately, one shift circuit can be configured for each of the K multipliers. With reference to FIG. 3, as shown in FIG. 4, the shift control circuit 202 of the present application may include K shift circuits, for example, the shift circuit 2021, the shift circuit 2022, ..., and the shift circuit 202K shown in FIG. 4. Each of the K shift circuits is connected to one of the K multipliers 201 and is configured to receive the product output by that multiplier and to perform shift control processing on it to obtain a processed product.

In another specific example, two or more of the K multipliers may share one shift circuit. That is, the shift control circuit 202 may include at least one shift circuit for performing shift control processing on the products output by the K multipliers. Optionally, the shift control processing includes shifting or not shifting. That the shift control circuit 202 does not shift the product output by a multiplier can be understood as the shift control circuit 202 shifting that product by 0 bits, or as the shift control circuit 202 directly outputting the product output by the multiplier. Therefore, the K processed products output by the shift control circuit 202 include shifted products and/or unshifted products.

When the products output by different multipliers are shifted, the numbers of bits shifted may be the same or different. For example, the shift control circuit 202 may shift the product output by the multiplier 2011 to the left by 5 bits, shift the product output by the multiplier 2012 to the left by 10 bits, and leave the products output by the other K-2 multipliers unshifted.

For example, as shown in FIG. 4, among the K multipliers 201, the M_1 × N_1 multiplier 2011 is connected to the shift circuit 2021 in the shift control circuit 202, the M_2 × N_2 multiplier 2012 is connected to the shift circuit 2022 in the shift control circuit 202, and the M_K × N_K multiplier 201K is connected to the shift circuit 202K in the shift control circuit 202. For the connections between the remaining multipliers and shift circuits, see FIG. 4; details are not repeated here. The shift circuit 2021, being connected to the M_1 × N_1 multiplier 2011, is used to receive the product output by the M_1 × N_1 multiplier 2011 and to perform shift control processing on that product. Whether each of the K shift circuits performs a shift operation or a non-shift operation on the product it receives may be determined by the scenario in which the apparatus is applied or by configuration information, which is not limited in the embodiments of the present application.

Optionally, as shown in FIG. 4, the apparatus 20 provided by the present application further includes: a register connected to each multiplier for storing a multiplicand and a multiplier of the multipliers connected thereto.

Optionally, the shift circuit may include a shift register (shift) and a selector (MUX). The shift register is connected to one multiplier and is used to perform a shift operation on the product output by that multiplier. The selector is connected to the shift register and to the multiplier and is used to select and output either the shifted or the unshifted product. In a specific example, as shown in FIG. 5, the shift control circuit 202 includes K shift registers and K selectors (for example, the selector 2031, the selector 2032, ..., and the selector 203K shown in FIG. 5).

Exemplarily, as shown in FIG. 5, one end of the selector 2031 is connected to the M₁ × N₁ multiplier 2011 to receive the product output by the M₁ × N₁ multiplier 2011 (that is, the unshifted product), and the other end of the selector 2031 is connected to a shift register to receive the shifted product output by the shift register. One end of the selector 2032 is connected to the M₂ × N₂ multiplier 2012 to receive the product output by the M₂ × N₂ multiplier 2012 (that is, the unshifted product), and the other end of the selector 2032 is connected to a shift register to receive the shifted product output by the shift register. One end of the selector 203K is connected to the M_K × N_K multiplier to receive the product output by the M_K × N_K multiplier, and the other end of the selector 203K is connected to a shift register to receive the shifted product output by the shift register. The working principle and connection relationship of the other selectors are similar to those of the selector 203K, as shown in FIG. 5, and details are not described herein again.

Optionally, as shown in FIG. 6, the apparatus provided by the present application further includes a configuration information receiving circuit 30, configured to receive first configuration information, where the first configuration information is used to instruct the shift control circuit 202 to perform shift control processing on the received K products. For example, the first configuration information indicates, for each of the received K products, whether the shift control circuit needs to shift it or leave it unshifted.

The shift control processing that each shift circuit performs on the product it receives from its multiplier may differ. For example, some shift circuits need to shift the product they receive, while others need to leave it unshifted. To allow each shift circuit to perform shift control processing flexibly and accurately on the product output by its multiplier, optionally, the first configuration information includes first indication information configured for at least one shift circuit in the shift control circuit, where the first indication information is used to indicate the shift control processing that each of the at least one shift circuit performs on the product it receives. The first indication information may be a first indicator or a second indicator, where the first indicator indicates that the received product is to be shifted, and the second indicator indicates that the received product is not to be shifted. Exemplarily, the first indicator may be "0" and the second indicator may be "1". Optionally, the first configuration information may include first indication information configured separately for each shift circuit.

The number of bits by which each shift circuit shifts the product output by its multiplier may differ. Therefore, the first configuration information received by the configuration information receiving circuit 30 further includes second indication information configured for the at least one shift circuit. The second indication information is used to indicate the shift direction and/or the number of bits shifted when each of the at least one shift circuit performs shift control processing on the product it receives. By configuring the second indication information for a shift circuit, the shift circuit can accurately perform shift control processing on the product it receives. Optionally, the first configuration information may include second indication information configured separately for each shift circuit.
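The per-circuit configuration described above can be modeled in software. The following is a minimal behavioral sketch, not the circuit itself: `ShiftConfig`, `apply_shift`, and the field names are hypothetical, and the encoding assumes the example values given in the text, in which the first indication information "0" means shift and "1" means do not shift.

```python
from dataclasses import dataclass


@dataclass
class ShiftConfig:
    # Hypothetical encoding: "0" = shift, "1" = do not shift
    first_indication: str
    # Second indication information: direction and number of bits
    direction: str = "left"
    bits: int = 0


def apply_shift(product: int, cfg: ShiftConfig) -> int:
    """Return the processed product for one shift circuit."""
    if cfg.first_indication == "1":
        return product  # non-shift operation: pass the product through
    if cfg.direction == "left":
        return product << cfg.bits
    return product >> cfg.bits


# Three multiplier outputs: shift left by 5, shift left by 10, no shift
products = [3, 5, 7]
configs = [
    ShiftConfig("0", "left", 5),
    ShiftConfig("0", "left", 10),
    ShiftConfig("1"),
]
processed = [apply_shift(p, c) for p, c in zip(products, configs)]
```

Keeping the shift/no-shift flag separate from the direction and bit count mirrors the split between the first and second indication information.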

Specifically, the configuration information receiving circuit 30 can be a pin or a connecting line of the digital signal processing device 20. In a specific example, when the shift control circuit 202 includes a shift register, the configuration information receiving circuit 30 can be a pin of the shift register, through which the indication information (such as the first indication information and/or the second indication information described above) is input to the shift register.

Optionally, the configuration information receiving circuit 30 may be located in the DSP, or may be located in another controller in the FPGA, such as a central processing unit (CPU), which is not limited by the embodiment of the present application. Any means for inputting the first configuration information to the shift control circuit 202 provided in the embodiment of the present application can be used as the configuration information receiving circuit 30 of the present application.

Optionally, the configuration information receiving circuit 30 is further configured to receive third configuration information, where the third configuration information is used to instruct each of the K selectors to select one of the shifted product output by its shift register and the unshifted product output by its multiplier, and to output the selected product to the selection control circuit 203. Optionally, the third configuration information includes a first indication configured for each of the K shift circuits, where the first indication is used to instruct the selector to select one of the shifted product output by the shift register and the product output by the multiplier and to output it to the selection control circuit 203. Optionally, the first indication may be a letter or a number, which is not limited in this application. Exemplarily, the first indication may be a third indicator or a fourth indicator, where the third indicator is used to instruct the selector to output the shifted product output by the shift circuit to the selection control circuit 203, and the fourth indicator is used to instruct the selector to output the product output by the multiplier to the selection control circuit 203. Exemplarily, the third indicator may be “1” and the fourth indicator may be “0”, so that each selector performs the corresponding processing according to the content of its first indication.
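The selector behavior can be illustrated with a two-input multiplexer model. This is a sketch under stated assumptions: the function name `selector` is hypothetical, and the encoding follows the example in the text, where the third indicator "1" selects the shifted product and the fourth indicator "0" selects the unshifted product.

```python
def selector(unshifted: int, shifted: int, indication: str) -> int:
    """Two-input MUX: pick the shifted or the unshifted product."""
    if indication == "1":   # third indicator: output the shifted product
        return shifted
    if indication == "0":   # fourth indicator: output the multiplier's product
        return unshifted
    raise ValueError("indication must be '0' or '1'")


product = 6                  # product output by a multiplier
shifted_product = 6 << 8     # the same product after a left shift of 8 bits

selected_shifted = selector(product, shifted_product, "1")
selected_unshifted = selector(product, shifted_product, "0")
```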

Optionally, as shown in FIG. 7, the selection control circuit 203 provided by the present application includes an accumulation circuit for accumulating the K processed products. Specifically, the accumulation circuit may include a plurality of accumulators, for example, an accumulator 301, an accumulator 302, an accumulator 303, ..., and an accumulator 30(K-1) as shown in FIG. 7.

Optionally, in this embodiment, when the accumulation circuit includes a plurality of accumulators (for example, at least three accumulators), the accumulators are connected in cascade, and the number of accumulators in a later stage is one less than the number in the adjacent preceding stage (where, for two cascaded accumulators, the output of the accumulator in the preceding stage is an input of the accumulator in the later stage). Optionally, in the embodiments of this application, an accumulator connected to the shift control circuit 202 is referred to as a first-stage accumulator, and a first-stage accumulator is used to accumulate two processed products output by the shift control circuit 202. In a specific example, every two selectors are connected to one first-stage accumulator of the accumulation circuit; for example, the selector 2031 and the selector 2032 are connected to the first-stage accumulator 301 of the accumulation circuit.

Optionally, when the number of selectors is odd, one accumulator receives the product output by only one selector, and that product is then accumulated with the accumulation results of the other accumulators. Exemplarily, as shown in FIG. 8, the products output by the selector 2031 and the selector 2032 are accumulated by the accumulator 301, the product output by the selector 2033 is output to the accumulator 302, and finally the results output by the accumulator 301 and the accumulator 302 are accumulated and output by the accumulator 303. Optionally, when the number of accumulators in any stage is odd, accumulation may be performed in a similar way. As shown in FIG. 9, which takes the shift control circuit as an example, the products output by the selector 2031 and the selector 2032 are accumulated by the first-stage accumulator 301, the products output by the selector 2033 and the selector 2034 are accumulated by the first-stage accumulator 302, and the products output by the selector 2035 and the selector 2036 are accumulated by the first-stage accumulator 303. The accumulation results of the accumulator 301 and the accumulator 302 are accumulated by the accumulator 304, and finally the result output by the accumulator 304 and the result output by the accumulator 303 are accumulated by the accumulator 305.
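The cascaded accumulation, including the odd-count case, can be sketched as a pairwise reduction. This is an illustrative model only: `accumulate_tree` is a hypothetical name, and an unpaired value is simply carried forward to the next stage, which stands in for an accumulator that receives a single input.

```python
def accumulate_tree(values):
    """Pairwise, stage-by-stage accumulation.

    Returns (total, number_of_stages). When a stage holds an odd number
    of values, the unpaired one is carried to the next stage unchanged.
    """
    level = list(values)
    if not level:
        raise ValueError("at least one value is required")
    stages = 0
    while len(level) > 1:
        # Add neighbouring pairs within the current stage
        nxt = [level[i] + level[i + 1] for i in range(0, len(level) - 1, 2)]
        if len(level) % 2 == 1:
            nxt.append(level[-1])  # unpaired value passes through
        level = nxt
        stages += 1
    return level[0], stages


# Three selector outputs, as in the odd-count example
total_3, stages_3 = accumulate_tree([1, 2, 3])
# Six selector outputs feeding three first-stage accumulators
total_6, stages_6 = accumulate_tree([1, 2, 3, 4, 5, 6])
```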

Optionally, the configuration information receiving circuit 30 is further configured to receive second configuration information, where the second configuration information is used to instruct the selection control circuit 203 to output the accumulated result of the K processed products or to output the K processed products. Specifically, the second configuration information may be a second indication or a third indication, where the second indication is used to instruct the selection control circuit 203 to output the K processed products, and the third indication is used to instruct the selection control circuit 203 to output the accumulated result of the K processed products. When the selection control circuit 203 determines that the second configuration information is the second indication, it directly outputs the K processed products; when it determines that the second configuration information is the third indication, it accumulates the K processed products and then outputs the result.
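The two output modes of the selection control circuit can be modeled as follows. This is a minimal sketch: the function name `selection_control` and the string labels "products" and "accumulate" are hypothetical stand-ins for the second indication and the third indication, whose actual encoding the text does not specify.

```python
def selection_control(processed, indication):
    """Output either the K processed products or their accumulated sum."""
    if indication == "products":     # stands in for the second indication
        return list(processed)
    if indication == "accumulate":   # stands in for the third indication
        return sum(processed)
    raise ValueError("unknown indication")


processed_products = [10, 20, 30, 40]
direct_output = selection_control(processed_products, "products")
accumulated_output = selection_control(processed_products, "accumulate")
```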

It should be noted that the results output by the K multipliers of the digital signal processing apparatus provided by the present application may be selected or used as needed. That is, the apparatus may implement the function of any one or more of the K multipliers, may implement a multiply-accumulate function using the K multipliers, or may implement the function of an M×N multiplier using the K multipliers, where an M×N multiplier denotes a multiplier whose multiplicand bit width is M bits and whose multiplier bit width is N bits, and satisfy

The specific application of any of the digital signal processing devices 20 provided in the embodiments of the present application in different scenarios will be described below with reference to the accompanying drawings.

The apparatus provided by the present application can be used to implement a multiply-accumulate operation function.

As shown in FIG. 10, the shift control circuit 202 provided by the present application is specifically configured to output the K products output by the K multipliers 201 to the selection control circuit 203 without shift processing, and the selection control circuit 203 is configured to add the received K products and output the sum.

Optionally, the third configuration information received by each selector is used to instruct each selector to directly output the product of the respective received multiplier outputs. The second configuration information is used to instruct the selection control circuit to output an accumulated result of the K processed products.

In a specific example, the multiply and accumulate operations in this application may be a convolution operation between matrices.

The digital signal processing device 20 shown in FIG. 11 includes four 8-bit × 8-bit multipliers, which can be used to implement a 2 × 2 matrix convolution operation. When implementing the matrix convolution operation, the multiplicands are d1, d2, d3, and d4, and the multipliers are k1, k2, k3, and k4.

As shown in FIG. 11, the multiplicand sequence input to the 8-bit × 8-bit multiplier 3011 is in_X1[7:0] and the multiplier sequence is in_Y1[7:0]; the multiplicand sequence input to the 8-bit × 8-bit multiplier 3012 is in_X2[7:0] and the multiplier sequence is in_Y2[7:0]; the multiplicand sequence input to the 8-bit × 8-bit multiplier 3013 is in_X3[7:0] and the multiplier sequence is in_Y3[7:0]; and the multiplicand sequence input to the 8-bit × 8-bit multiplier 3014 is in_X4[7:0] and the multiplier sequence is in_Y4[7:0].

Here, in_X1[7:0] represents d1, in_X2[7:0] represents d2, in_X3[7:0] represents d3, and in_X4[7:0] represents d4; in_Y1[7:0] represents k1, in_Y2[7:0] represents k2, in_Y3[7:0] represents k3, and in_Y4[7:0] represents k4. Optionally, the multiplicand sequences and the multiplier sequences can be stored in registers, respectively.

The 8-bit × 8-bit multiplier 3011 multiplies in_X1[7:0] by in_Y1[7:0] to obtain the first product; the 8-bit × 8-bit multiplier 3012 multiplies in_X2[7:0] by in_Y2[7:0] to obtain the second product; the 8-bit × 8-bit multiplier 3013 multiplies in_X3[7:0] by in_Y3[7:0] to obtain the third product; and the 8-bit × 8-bit multiplier 3014 multiplies in_X4[7:0] by in_Y4[7:0] to obtain the fourth product.

When the device shown in FIG. 11 is used to implement a 2×2 convolution operation, the output result satisfies the following relation: Z=(d1×k1)+(d2×k2)+(d3×k3)+(d4×k4). The shift control circuit 202 does not perform a shift operation on the product output by any of the 8-bit × 8-bit multipliers. The shift circuit 4011 in the shift control circuit 202 passes the first product output by the 8-bit × 8-bit multiplier 3011 to the selection control circuit 203 as an input, the shift circuit 4012 passes the second product output by the 8-bit × 8-bit multiplier 3012 to the selection control circuit 203 as an input, the shift circuit 4013 passes the third product output by the 8-bit × 8-bit multiplier 3013 to the selection control circuit 203 as an input, and the shift circuit 4014 passes the fourth product output by the 8-bit × 8-bit multiplier 3014 to the selection control circuit 203 as an input. Specifically, the unshifted first product output by the shift circuit 4011 and the unshifted second product output by the shift circuit 4012 are added by the first accumulator 501 to obtain a first accumulation result, and the unshifted third product output by the shift circuit 4013 and the unshifted fourth product output by the shift circuit 4014 are added by the second accumulator 502 to obtain a second accumulation result. The first accumulator 501 and the second accumulator 502 output the first accumulation result and the second accumulation result, respectively, to the third accumulator 503, which produces the final result of the 2×2 convolution multiply-accumulate operation.
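The data flow just described can be traced with concrete numbers. The sketch below is a behavioral model with made-up sample values for d1..d4 and k1..k4, treated as unsigned for simplicity; `multiplier_8x8` is a hypothetical helper, and the variable names only echo the accumulator reference numerals.

```python
def multiplier_8x8(x: int, y: int) -> int:
    """Unsigned 8-bit by 8-bit multiply (operands assumed unsigned)."""
    if not (0 <= x < 256 and 0 <= y < 256):
        raise ValueError("operands must fit in 8 bits")
    return x * y


d = [1, 2, 3, 4]   # sample multiplicands d1..d4 (illustrative values)
k = [5, 6, 7, 8]   # sample multipliers  k1..k4 (illustrative values)

# Four products, passed through the shift circuits without shifting
products = [multiplier_8x8(di, ki) for di, ki in zip(d, k)]

first_result = products[0] + products[1]    # role of accumulator 501
second_result = products[2] + products[3]   # role of accumulator 502
z = first_result + second_result            # role of accumulator 503
```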

Optionally, as shown in FIG. 11, the digital signal processing apparatus 20 provided by the present application further includes a register connected to the selection control circuit 203 for storing the accumulation result output by the selection control circuit 203, or the K processed products from the shift control circuit that are output directly.

Illustratively, FIG. 12 differs from FIG. 11 in that the digital signal processing apparatus 20 shown in FIG. 12 includes nine 4-bit × 4-bit multipliers (the 4-bit × 4-bit multiplier 6011, the 4-bit × 4-bit multiplier 6012, the 4-bit × 4-bit multiplier 6013, ..., and the 4-bit × 4-bit multiplier 6019 shown in FIG. 12). Therefore, the digital signal processing device 20 can implement a 3 × 3 matrix convolution operation. When implementing the matrix convolution operation, the multiplicands are d1, d2, d3, d4, d5, d6, d7, d8, and d9, and the multipliers are k1, k2, k3, k4, k5, k6, k7, k8, and k9.

As shown in FIG. 12, the multiplicand sequence input to the 4-bit × 4-bit multiplier 6011 is in_X1[3:0] and the multiplier sequence is in_Y1[3:0]; the multiplicand sequence input to the 4-bit × 4-bit multiplier 6012 is in_X2[3:0] and the multiplier sequence is in_Y2[3:0]; the multiplicand sequence input to the 4-bit × 4-bit multiplier 6018 is in_X8[3:0] and the multiplier sequence is in_Y8[3:0]; and the multiplicand sequence input to the 4-bit × 4-bit multiplier 6019 is in_X9[3:0] and the multiplier sequence is in_Y9[3:0]. Here, in_X1[3:0] represents d1, in_X2[3:0] represents d2, in_X3[3:0] represents d3, and in_X9[3:0] represents d9; in_Y1[3:0] represents k1, in_Y2[3:0] represents k2, in_Y3[3:0] represents k3, and in_Y9[3:0] represents k9. It can be understood that each of the above multiplicands corresponds to the multiplicand input sequence of one 4-bit × 4-bit multiplier, and each of the above multipliers corresponds to the multiplier input sequence of one 4-bit × 4-bit multiplier. The remaining inputs follow the same pattern and are not described herein again.

Each 4-bit × 4-bit multiplier multiplies the input sequences it receives and produces an output result. Since the device shown in FIG. 12 is used to implement a 3×3 convolution operation, the output result should satisfy the following relation: Z = (d1 × k1) + (d2 × k2) + (d3 × k3) + (d4 × k4) + (d5 × k5) + (d6 × k6) + (d7 × k7) + (d8 × k8) + (d9 × k9). It follows that the shift control circuit 202 does not perform a shift operation on the output of any 4-bit × 4-bit multiplier. For example, the thick solid black arrows in FIG. 12 indicate that no shift operation is performed; that is, each selector shown in FIG. 12 is used to input the product output by its 4-bit × 4-bit multiplier to the selection control circuit 203. The selection control circuit 203 then accumulates the nine unshifted products through the accumulation circuit to obtain the final result Z of the 3×3 convolution operation. The process by which the selection control circuit 203 accumulates the nine received products is similar to that of FIG. 11; for details, refer to the description of FIG. 11.

The apparatus provided by the present application can also be used to implement the functions of K multipliers.

An explanation is given below with reference to FIGS. 13 and 14. As shown in the embodiment of FIG. 13, the digital signal processing device 20 can be used as four independent 8-bit × 8-bit multipliers. Each shift circuit in the shift control circuit 202 inputs the product it receives from its multiplier directly to the selection control circuit 203 (that is, the selector included in each shift circuit inputs the unshifted product to the selection control circuit 203), and the selection control circuit 203 directly outputs the unshifted product output by the selector of each shift circuit, yielding the outputs of four 8-bit × 8-bit multipliers, for example, the output results Z1, Z2, Z3, and Z4 shown in FIG. 13. Optionally, the first configuration information and/or the third configuration information received by the configuration information receiving circuit 30 indicates that the shift control circuit 202 does not perform a shift operation on the received K products, and the second configuration information received by the configuration information receiving circuit 30 indicates that the selection control circuit 203 directly outputs the received K processed results.

Exemplarily, as shown in FIG. 13, the inputs of the 8-bit × 8-bit multiplier 7011 are X1 (in_X1[7:0]) and Y1 (in_Y1[7:0]), the inputs of the 8-bit × 8-bit multiplier 7012 are X2 (in_X2[7:0]) and Y2 (in_Y2[7:0]), the inputs of the 8-bit × 8-bit multiplier 7013 are X3 (in_X3[7:0]) and Y3 (in_Y3[7:0]), and the inputs of the 8-bit × 8-bit multiplier 7014 are X4 (in_X4[7:0]) and Y4 (in_Y4[7:0]). In FIG. 13, no shift circuit in the shift control circuit 202 performs a shift operation on the product it receives from its 8-bit × 8-bit multiplier. Therefore, each shift circuit outputs the product it receives directly to the selection control circuit 203 (that is, each shift circuit outputs an unshifted product to the selection control circuit 203 through its selector), and the selection control circuit 203 directly outputs the unshifted product sent by each shift circuit, yielding the output of each 8-bit × 8-bit multiplier. That is, FIG. 13 finally outputs four results, which satisfy: Z1=(X1×Y1); Z2=(X2×Y2); Z3=(X3×Y3); Z4=(X4×Y4).

Optionally, the selection control circuit 203 in the apparatus provided in the present application may include an accumulation circuit. In a scenario in which an accumulation operation is required, the output of the selection control circuit may be configured to pass through the accumulation circuit; in a scenario in which no accumulation operation is required, the output of the selection control circuit is configured not to pass through the accumulation circuit. Of course, the selection control circuit 203 can also output both the result processed by the accumulation circuit and the result that has not passed through the accumulation circuit at the same time, and the module or device connected to the digital signal processing device provided by the embodiments of the present application can use either one as required. Optionally, the selection control circuit 203 can also output the output of any one or more accumulators according to the calculation requirement.

The digital signal processing device shown in FIG. 14 can be used as nine independent 4-bit × 4-bit multipliers, and the usage process is similar to that of FIG. 13; for details, refer to the process shown in FIG. 13. The device finally outputs nine results, which satisfy: Z1=(X1×Y1); Z2=(X2×Y2); Z3=(X3×Y3); Z4=(X4×Y4); ...; Z9=(X9×Y9).

The apparatus provided by the present application can also be used to implement the function of an M×N multiplier, where M > M_i, N > N_i, and i = 1, ..., K.

A description is given below with reference to FIGS. 15 and 16. The shift control circuit 202 does not perform shift processing on one of the received K products and performs shift processing on the other K-1 products, to obtain K processed products. The K processed products are output to the selection control circuit 203, and the selection control circuit 203 adds the K processed products and outputs the sum.

Optionally, the shift processing performed on any one of the other K-1 products may be carried out according to the bit position, within the M-bit multiplicand, occupied by the multiplicand that generated that product, and the bit position, within the N-bit multiplier, occupied by the multiplier that generated that product. Optionally, when performing a shift operation on the other K-1 products, the shift control circuit 202 does so according to the shift direction and/or the number of bits contained in the indication information received by the shift circuit corresponding to each of the other K-1 products. Optionally, the direction in which each shift circuit shifts the product it receives, and the specific number of bits, can be determined as follows:

The K multiplicand sequences received by the K multipliers are obtained by dividing a source multiplicand sequence, in order from high bits to low bits, according to the multiplicand bit widths of the multipliers, and the K multiplier sequences received by the K multipliers are obtained by dividing a source multiplier sequence, in order from high bits to low bits, according to the multiplier bit widths of the multipliers. In general, a shift circuit determines the number of bits by which to shift the product it receives based on the bit position of that multiplier's multiplicand within the source multiplicand and the bit position of its multiplier within the source multiplier.

Illustratively, the M-bit source multiplicand is X[8:0], which can be split into X[8:6], X[5:3], and X[2:0], and the N-bit source multiplier is Y[14:0], which can be split into Y[14:10], Y[9:5], and Y[4:0]. Assuming that the multiplicand received by a multiplier is X[8:6] and the multiplier is Y[14:10], the shift control circuit 202 controls the shift register connected to that multiplier to shift the product output by the multiplier to the left by 16 bits (the lowest bit positions of the two slices, 6 and 10, added together). Assuming that the multiplicand received by a multiplier is X[5:3] and the multiplier is Y[14:10], the shift control circuit 202 controls the shift register connected to that multiplier to shift the product output by the multiplier to the left by 13 bits (3 + 10).
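The shift amounts in this example follow from adding the lowest bit positions of the two slices. The sketch below checks that rule numerically using the slices listed in the text (X[8:6], X[5:3], X[2:0] and Y[14:10], Y[9:5], Y[4:0]); the helper names are hypothetical and the operands are assumed to be unsigned.

```python
def bit_slice(value: int, high: int, low: int) -> int:
    """Extract bits value[high:low], inclusive on both ends."""
    return (value >> low) & ((1 << (high - low + 1)) - 1)


def shift_amount(x_low: int, y_low: int) -> int:
    """Left shift for a partial product: sum of the slices' lowest bit positions."""
    return x_low + y_low


X_SLICES = [(8, 6), (5, 3), (2, 0)]       # slices of the source multiplicand
Y_SLICES = [(14, 10), (9, 5), (4, 0)]     # slices of the source multiplier


def product_from_slices(x: int, y: int) -> int:
    """Rebuild x * y from shifted partial products of the slices."""
    total = 0
    for x_high, x_low in X_SLICES:
        for y_high, y_low in Y_SLICES:
            partial = bit_slice(x, x_high, x_low) * bit_slice(y, y_high, y_low)
            total += partial << shift_amount(x_low, y_low)
    return total
```

For any 9-bit `x` and 15-bit `y`, `product_from_slices(x, y)` should equal `x * y`, which is the property the shift rule is meant to guarantee.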

Illustratively, as shown in FIG. 15, the digital signal processing apparatus includes four 8-bit × 8-bit multipliers, which are used here to implement a 16-bit × 16-bit multiplier as an example. The inputs of the 8-bit × 8-bit multiplier 8011 are in_X1[7:0] and in_Y1[7:0], the inputs of the 8-bit × 8-bit multiplier 8012 are in_X2[7:0] and in_Y2[7:0], the inputs of the 8-bit × 8-bit multiplier 8013 are in_X3[7:0] and in_Y3[7:0], and the inputs of the 8-bit × 8-bit multiplier 8014 are in_X4[7:0] and in_Y4[7:0]. Optionally, in practice, the 16-bit source multiplicand X[15:0] and the 16-bit source multiplier Y[15:0] may each be split, in bit order from high to low, into the multiplicands X[15:8] and X[7:0] and the multipliers Y[15:8] and Y[7:0], which are stored in the registers shown in FIG. 15, with X[7:0] as X1[7:0], X[15:8] as X2[7:0], Y[7:0] as Y1[7:0], and Y[15:8] as Y2[7:0]. Specifically, X1[7:0] can be used as the multiplicand input in_X1[7:0] of the 8-bit × 8-bit multiplier 8011 and the multiplicand input in_X2[7:0] of the 8-bit × 8-bit multiplier 8012; X2[7:0] can be used as the multiplicand input in_X3[7:0] of the 8-bit × 8-bit multiplier 8013 and the multiplicand input in_X4[7:0] of the 8-bit × 8-bit multiplier 8014; Y1[7:0] can be used as the multiplier input in_Y1[7:0] of the 8-bit × 8-bit multiplier 8011 and the multiplier input in_Y3[7:0] of the 8-bit × 8-bit multiplier 8013; and Y2[7:0] can be used as the multiplier input in_Y2[7:0] of the 8-bit × 8-bit multiplier 8012 and the multiplier input in_Y4[7:0] of the 8-bit × 8-bit multiplier 8014.

In order to make the output of four 8-bit × 8-bit multipliers the same as the output of a 16-bit × 16-bit multiplier, the following formula needs to be satisfied:

Z[31:0] = X[15:0] × Y[15:0]
= ((2^8)*X[15:8] + X[7:0]) × ((2^8)*Y[15:8] + Y[7:0])
= (2^16)*(X[15:8]×Y[15:8]) + (2^8)*((X[15:8]×Y[7:0]) + (X[7:0]×Y[15:8])) + (2^0)*(X[7:0]×Y[7:0])
= (2^16)*(in_X4[7:0]×in_Y4[7:0]) + (2^8)*((in_X3[7:0]×in_Y3[7:0]) + (in_X2[7:0]×in_Y2[7:0])) + (2^0)*(in_X1[7:0]×in_Y1[7:0]),

where 2^16 means a left shift of 16 bits, 2^8 means a left shift of 8 bits, and 2^0 means no shift.

Therefore, the shift circuit 9011 does not perform a shift operation on the received product, the shift circuit 9012 and the shift circuit 9013 each shift the received product to the left by 8 bits, and the shift circuit 9014 shifts the received product to the left by 16 bits. Then, the selector included in the shift circuit 9011 outputs the unshifted product to the selection control circuit 203, the selector included in the shift circuit 9012 outputs the product shifted left by 8 bits to the selection control circuit 203, the selector included in the shift circuit 9013 outputs the product shifted left by 8 bits to the selection control circuit 203, and the selector included in the shift circuit 9014 outputs the product shifted left by 16 bits to the selection control circuit 203. Optionally, the first configuration information may instruct each shift circuit to perform the corresponding shift or non-shift operation, and the third configuration information may instruct each selector to select the corresponding processed product and output it to the selection control circuit. The selection control circuit 203 shown in FIG. 15 accumulates the one unshifted product and the three shifted products output by the shift control circuit 202 and outputs the result. The accumulation procedure performed by the selection control circuit 203 is similar to that described in the foregoing embodiment for implementing the convolution operation, and the details are not described herein again.
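The composition of a 16-bit × 16-bit product from four 8-bit × 8-bit products can be checked numerically. The following is a sketch assuming unsigned operands; the function name is hypothetical, and the assignment of the two cross terms to the second and third multipliers is an assumption (either order gives the same sum because both are shifted by 8 bits).

```python
def mul16_from_four_8x8(x: int, y: int) -> int:
    """Rebuild a 16x16 unsigned product from four 8x8 partial products."""
    x_lo, x_hi = x & 0xFF, (x >> 8) & 0xFF
    y_lo, y_hi = y & 0xFF, (y >> 8) & 0xFF

    p1 = x_lo * y_lo   # not shifted (role of shift circuit 9011)
    p2 = x_lo * y_hi   # shifted left by 8 bits (role of shift circuit 9012)
    p3 = x_hi * y_lo   # shifted left by 8 bits (role of shift circuit 9013)
    p4 = x_hi * y_hi   # shifted left by 16 bits (role of shift circuit 9014)

    # The selection control circuit accumulates the four processed products
    return p1 + (p2 << 8) + (p3 << 8) + (p4 << 16)


result = mul16_from_four_8x8(0x1234, 0xABCD)
```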

FIG. 16 differs from FIG. 15 in that the digital signal processing apparatus shown in FIG. 16 includes nine 4-bit × 4-bit multipliers, which can be used to implement the function of a 12-bit × 12-bit multiplier. In FIG. 16, the source multiplicand A is X[11:0], which can be divided into a2=X[11:8], a1=X[7:4], and a0=X[3:0], and the source multiplier B is Y[11:0], which can be divided into b2=Y[11:8], b1=Y[7:4], and b0=Y[3:0]. Here, a0 is used as in_X1[3:0], in_X2[3:0], and in_X3[3:0]; a1 is used as in_X4[3:0], in_X5[3:0], and in_X6[3:0]; and a2 is used as in_X7[3:0], in_X8[3:0], and in_X9[3:0]. Likewise, b0 is used as in_Y1[3:0], in_Y4[3:0], and in_Y7[3:0]; b1 is used as in_Y2[3:0], in_Y5[3:0], and in_Y8[3:0]; and b2 is used as in_Y3[3:0], in_Y6[3:0], and in_Y9[3:0]. The output needs to satisfy:

Z[23:0] = X[11:0] × Y[11:0]
= ((2^8)*X[11:8] + (2^4)*X[7:4] + X[3:0]) × ((2^8)*Y[11:8] + (2^4)*Y[7:4] + Y[3:0])
= (2^16)*(X[11:8]×Y[11:8]) + (2^12)*(X[11:8]×Y[7:4]) + (2^8)*(X[11:8]×Y[3:0]) + (2^12)*(X[7:4]×Y[11:8]) + (2^8)*(X[7:4]×Y[7:4]) + (2^4)*(X[7:4]×Y[3:0]) + (2^8)*(X[3:0]×Y[11:8]) + (2^4)*(X[3:0]×Y[7:4]) + (X[3:0]×Y[3:0])
= ((2^8)*a2 + (2^4)*a1 + a0) × ((2^8)*b2 + (2^4)*b1 + b0).

Therefore, the shift circuit 1111 in the shift control circuit 202 does not perform a shift operation on the received product, the shift circuit 1112 shifts the received product to the left by 4 bits, the shift circuit 1113 shifts the received product to the left by 8 bits, the shift circuit 1114 shifts the received product to the left by 4 bits, the shift circuit 1115 shifts the received product to the left by 8 bits, the shift circuit 1116 shifts the received product to the left by 12 bits, the shift circuit 1117 shifts the received product to the left by 8 bits, the shift circuit 1118 shifts the received product to the left by 12 bits, and the shift circuit 1119 shifts the received product to the left by 16 bits. Specifically, the selector in the shift circuit 1111 in FIG. 16 is used to output the unshifted product to the selection control circuit 203, and the remaining selectors included in the shift control circuit 202, other than the selector in the shift circuit 1111, are used to output the products shifted by their respective shift registers to the selection control circuit 203. The function and processing procedure of the selection control circuit 203 in FIG. 16 are the same as those of the selection control circuit 203 in FIG. 15, and the details are not described herein again.
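The nine shift amounts listed for the shift circuits 1111 to 1119 can be reproduced from the slice positions. The sketch below assumes unsigned operands and one particular operand ordering (the multiplicand slice changes every three multipliers and the multiplier slice cycles b0, b1, b2); that ordering and the function names are assumptions chosen so that the computed shifts match the listed values.

```python
def nibbles(value: int):
    """Split a 12-bit value into [v0, v1, v2], lowest 4 bits first."""
    return [(value >> (4 * i)) & 0xF for i in range(3)]


def shift_table():
    """Left shifts for multipliers 1..9 in the assumed order a_i x b_j."""
    return [4 * i + 4 * j for i in range(3) for j in range(3)]


def mul12_from_nine_4x4(x: int, y: int) -> int:
    """Rebuild a 12x12 unsigned product from nine 4x4 partial products."""
    a, b = nibbles(x), nibbles(y)
    products = [a[i] * b[j] for i in range(3) for j in range(3)]
    return sum(p << s for p, s in zip(products, shift_table()))


shifts = shift_table()
check = mul12_from_nine_4x4(0xABC, 0x123)
```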

Embodiments in which the multiplier bit widths of the K multipliers are different from the multiplicand bit widths will be respectively described below with reference to FIGS. 17 to 20.

The digital signal processing apparatus shown in FIG. 17 includes a×b A-bit × B-bit multipliers, whose multiplicands are in_X1[A:0], in_X2[A:0], in_X3[A:0], in_X4[A:0], ..., in_Xab[A:0], and whose multipliers are in_Y1[B:0], in_Y2[B:0], in_Y3[B:0], in_Y4[B:0], ..., in_Yab[B:0]. The output of the apparatus shown in FIG. 17 may be the output result when each of the a×b A-bit × B-bit multipliers is used independently, the output result when the a×b A-bit × B-bit multipliers implement a y × y convolution operation, or the output result when the a×b A-bit × B-bit multipliers implement an M-bit × N-bit multiplier (satisfying A = M/a and B = N/b).

Illustratively, FIG. 18 to FIG. 21 are described by taking M=27, N=16, A=9, B=8, and K=a×b=6 as an example.

As shown in FIG. 18, the digital signal processing apparatus includes six 9-bit × 8-bit multipliers, which can be used to implement the function of a 27-bit × 16-bit multiplier. The source multiplicand A=X[26:0] can be divided into a2=X[26:18], a1=X[17:9], and a0=X[8:0]; the source multiplier B=Y[15:0] can be divided into b1=Y[15:8] and b0=Y[7:0]. Here, a0=X[8:0] is used as in_X1[8:0] and in_X2[8:0], which are input to the 9-bit × 8-bit multiplier 601 and the 9-bit × 8-bit multiplier 602, respectively; a1=X[17:9] is used as in_X3[8:0] and in_X4[8:0], which are input to the 9-bit × 8-bit multiplier 603 and the 9-bit × 8-bit multiplier 604, respectively; and a2=X[26:18] is used as in_X5[8:0] and in_X6[8:0], which are input to the 9-bit × 8-bit multiplier 605 and the 9-bit × 8-bit multiplier 606, respectively. b1=Y[15:8] is used as in_Y2[7:0], in_Y4[7:0], and in_Y6[7:0], which are input to the 9-bit × 8-bit multiplier 602, the multiplier 604, and the multiplier 606, respectively; b0=Y[7:0] is used as in_Y1[7:0], in_Y3[7:0], and in_Y5[7:0], which are input to the 9-bit × 8-bit multiplier 601, the 9-bit × 8-bit multiplier 603, and the 9-bit × 8-bit multiplier 605, respectively.

Because the output must satisfy

$$
\begin{aligned}
Z[42:0] &= X[26:0] \times Y[15:0] \\
&= \left(2^{18} X[26:18] + 2^{9} X[17:9] + 2^{0} X[8:0]\right) \times \left(2^{8} Y[15:8] + 2^{0} Y[7:0]\right) \\
&= 2^{26}\,(X[26:18] \times Y[15:8]) + 2^{18}\,(X[26:18] \times Y[7:0]) + 2^{17}\,(X[17:9] \times Y[15:8]) \\
&\quad + 2^{9}\,(X[17:9] \times Y[7:0]) + 2^{8}\,(X[8:0] \times Y[15:8]) + 2^{0}\,(X[8:0] \times Y[7:0]) \\
&= 2^{26}\,(\text{in\_X6}[8:0] \times \text{in\_Y6}[7:0]) + 2^{18}\,(\text{in\_X5}[8:0] \times \text{in\_Y5}[7:0]) + 2^{17}\,(\text{in\_X4}[8:0] \times \text{in\_Y4}[7:0]) \\
&\quad + 2^{9}\,(\text{in\_X3}[8:0] \times \text{in\_Y3}[7:0]) + 2^{8}\,(\text{in\_X2}[8:0] \times \text{in\_Y2}[7:0]) + 2^{0}\,(\text{in\_X1}[8:0] \times \text{in\_Y1}[7:0])
\end{aligned}
$$

the shift circuit 701 does not shift the received product; the shift circuit 702 shifts the received product to the left by 8 bits; the shift circuit 703 shifts the received product to the left by 9 bits; the shift circuit 704 shifts the received product to the left by 17 bits; the shift circuit 705 shifts the received product to the left by 18 bits; and the shift circuit 706 shifts the received product to the left by 26 bits. The selector in the shift circuit 701 outputs the unshifted product to the selection control circuit 203, and each of the remaining selectors of the shift control circuit 202 outputs the shifted product received from its shift register to the selection control circuit 203. The selection control circuit 203 accumulates the one unshifted product and the remaining shifted products output by the shift control circuit 202. For the specific procedure, refer to the description of the foregoing embodiments; details are not described herein again.
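The same decomposition can be written for general slice widths. The sketch below is illustrative only: it assumes unsigned operands, a multiplicand split into `a` slices of `A` bits and a multiplier split into `b` slices of `B` bits, and a left shift of i·A + j·B for the slice pair (a_i, b_j). The function names `tile_shifts` and `tiled_multiply` are hypothetical.

```python
def tile_shifts(A: int, B: int, a: int, b: int) -> list:
    """Left-shift amount for each slice pair (a_i, b_j), with i outer and j inner."""
    return [i * A + j * B for i in range(a) for j in range(b)]


def tiled_multiply(x: int, y: int, A: int, B: int, a: int, b: int) -> int:
    """Unsigned (a*A)-bit x (b*B)-bit product from a*b products of A x B bits."""
    if not (0 <= x < 1 << (a * A) and 0 <= y < 1 << (b * B)):
        raise ValueError("operand out of range for the given slice layout")
    mask_a = (1 << A) - 1
    mask_b = (1 << B) - 1
    total = 0
    for i in range(a):
        xi = (x >> (i * A)) & mask_a          # multiplicand slice a_i
        for j in range(b):
            yj = (y >> (j * B)) & mask_b      # multiplier slice b_j
            # One A x B multiplier, its shift circuit, then accumulation.
            total += (xi * yj) << (i * A + j * B)
    return total
```

For A=9, B=8, a=3, b=2, `tile_shifts` yields 0, 8, 9, 17, 18, 26, the shift amounts given for shift circuits 701 to 706.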

FIG. 19 shows an example in which a digital signal processing apparatus including six 9-bit × 8-bit multipliers implements the product of the matrix [d1 d2 d3] and a 3×2 matrix whose columns are [k1 k2 k3] and [j1 j2 j3].

Here, d1 is used as in_X1[8:0] and in_X4[8:0], which are input to the 9-bit × 8-bit multiplier 601 and the 9-bit × 8-bit multiplier 604, respectively, as multiplicands; d2 is used as in_X2[8:0] and in_X5[8:0], which are input to the 9-bit × 8-bit multiplier 602 and the 9-bit × 8-bit multiplier 605, respectively, as multiplicands; and d3 is used as in_X3[8:0] and in_X6[8:0], which are input to the 9-bit × 8-bit multiplier 603 and the 9-bit × 8-bit multiplier 606, respectively, as multiplicands. k1 is used as in_Y1[7:0] and input to the 9-bit × 8-bit multiplier 601 as the multiplier; k2 is used as in_Y2[7:0] and input to the 9-bit × 8-bit multiplier 602 as the multiplier; k3 is used as in_Y3[7:0] and input to the 9-bit × 8-bit multiplier 603 as the multiplier; j1 is used as in_Y4[7:0] and input to the 9-bit × 8-bit multiplier 604 as the multiplier; j2 is used as in_Y5[7:0] and input to the 9-bit × 8-bit multiplier 605 as the multiplier; and j3 is used as in_Y6[7:0] and input to the 9-bit × 8-bit multiplier 606 as the multiplier. Finally, the outputs of the apparatus shown in FIG. 19 are Z1=(d1×k1)+(d2×k2)+(d3×k3) and Z2=(d1×j1)+(d2×j2)+(d3×j3). It follows that none of the shift circuit 701, the shift circuit 702, ..., and the shift circuit 706 performs a shift operation.
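The FIG. 19 configuration can be sketched as follows. The input routing follows the text (d1, d2, d3 feed multipliers 601 to 603 and again 604 to 606; k1 to k3 and j1 to j3 feed the multiplier inputs). How the selection control circuit splits the six unshifted products into two sums is an assumption made here for illustration, as is the function name `row_times_matrix`.

```python
def row_times_matrix(d, k, j):
    """Compute Z1 = d.k and Z2 = d.j with six multipliers and no shifting."""
    if not (len(d) == len(k) == len(j) == 3):
        raise ValueError("d, k and j must each hold three values")
    xs = list(d) + list(d)   # in_X1..in_X6: d1, d2, d3, d1, d2, d3
    ys = list(k) + list(j)   # in_Y1..in_Y6: k1, k2, k3, j1, j2, j3
    products = [x * y for x, y in zip(xs, ys)]  # multipliers 601..606, unshifted
    # Assumed grouping: products 1..3 accumulate into Z1, products 4..6 into Z2.
    z1 = sum(products[:3])
    z2 = sum(products[3:])
    return z1, z2
```

For example, `row_times_matrix([1, 2, 3], [4, 5, 6], [7, 8, 9])` returns `(32, 50)`.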

FIG. 20 shows an example in which a digital signal processing apparatus including six 9-bit × 8-bit multipliers implements the functions of six independent 9-bit × 8-bit multipliers. The input of each 9-bit × 8-bit multiplier in FIG. 20 is the same as that of the corresponding 9-bit × 8-bit multiplier in FIG. 18. FIG. 20 differs from FIG. 18 in that the shift control circuit 202 in FIG. 20 does not shift the received K products; that is, none of the shift circuit 701, the shift circuit 702, ..., and the shift circuit 706 performs a shift operation on the product it receives. Therefore, in the apparatus shown in FIG. 20, the output results satisfy: Z1=(in_X1[8:0]×in_Y1[7:0]); Z2=(in_X2[8:0]×in_Y2[7:0]); Z3=(in_X3[8:0]×in_Y3[7:0]); Z4=(in_X4[8:0]×in_Y4[7:0]); Z5=(in_X5[8:0]×in_Y5[7:0]); and Z6=(in_X6[8:0]×in_Y6[7:0]).

Optionally, any of the digital signal processing devices provided by the embodiments of the present application may be integrated into the programmable logic device. In a specific example, the programmable logic device can be an FPGA.

FIG. 21 is a schematic flowchart of a digital signal processing method provided by the present application. The method can be applied to a digital signal processing apparatus, for example, any of the digital signal processing apparatuses shown in FIG. 3 to FIG. 20, and includes:

S101. K multipliers perform K multiplication operations to obtain K products and output the K products to the shift control circuit. The i-th multiplication operation of the K multiplication operations is a multiplication whose multiplicand bit width is M_i bits and whose multiplier bit width is N_i bits, where M_i and N_i are both positive integers, i=1, 2...K, and K is an integer greater than or equal to 2.

S102. The shift control circuit performs shift control processing on the K products, obtains K processed products, and outputs the K processed products to the selection control circuit.

S103. The selection control circuit outputs an accumulation result of the K processed products or outputs the K processed products.
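Steps S101 to S103 can be modeled in software as three small functions. This is a behavioral sketch under simplifying assumptions (unsigned integers, left shifts only, a single accumulation over all K products); the function names are hypothetical and the code is not a description of the hardware.

```python
from typing import List, Sequence, Union


def s101_multiply(xs: Sequence[int], ys: Sequence[int]) -> List[int]:
    """S101: K multipliers produce K products."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need K >= 2 multiplicand/multiplier pairs")
    return [x * y for x, y in zip(xs, ys)]


def s102_shift_control(products: Sequence[int], shifts: Sequence[int]) -> List[int]:
    """S102: each product is left-shifted by its configured amount (0 = no shift)."""
    if len(products) != len(shifts):
        raise ValueError("one shift amount is required per product")
    return [p << s for p, s in zip(products, shifts)]


def s103_selection_control(processed: Sequence[int], accumulate: bool) -> Union[int, List[int]]:
    """S103: output either the accumulation result or the K processed products."""
    return sum(processed) if accumulate else list(processed)


def process(xs, ys, shifts, accumulate):
    """Run S101, S102 and S103 in sequence."""
    products = s101_multiply(xs, ys)
    return s103_selection_control(s102_shift_control(products, shifts), accumulate)
```

With all shifts set to 0, `accumulate=True` gives the multiply-accumulate result and `accumulate=False` gives K independent products; nonzero shifts with `accumulate=True` give the wide-multiplier behavior.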

Optionally, step S102 is specifically implemented as follows: each of the K shift circuits included in the shift control circuit performs shift control processing on one of the K products to obtain K processed products, and each shift circuit outputs its processed product to the selection control circuit.

Optionally, the shift control process includes: shifting or not shifting. That is, any one of the K products may or may not be shifted, and the shift control processes of the K products may be the same or different.

As a possible implementation, the shift control circuit includes K shift circuits, and each of the K shift circuits includes a shift register and a selector. One end of the selector is connected to the shift register, and the other end of the selector is connected to a multiplier. The selector receives the outputs of the shift register and the multiplier and selects one of them as its own output. The shift register shifts the product output by the multiplier and then outputs the shifted product to the selector.
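A single shift circuit of this kind behaves like a two-input multiplexer fed by the raw product and its shifted copy. The following is a minimal behavioral sketch, assuming a left shift by a fixed number of bits; the function and parameter names are hypothetical.

```python
def shift_circuit(product: int, shift_bits: int, select_shifted: bool) -> int:
    """Model one shift circuit: a shift register path plus a selector."""
    if shift_bits < 0:
        raise ValueError("shift_bits must be non-negative")
    shifted = product << shift_bits   # shift register path
    # Selector: pass either the shifted product or the multiplier output as-is.
    return shifted if select_shifted else product
```

For instance, `shift_circuit(0b1011, 4, True)` returns `0b10110000`, while `shift_circuit(0b1011, 4, False)` returns the product unchanged.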

Optionally, the method provided by the application is used to implement at least one of the following functions: a multiply-accumulate operation function, the function of K multipliers, and the function of an M×N multiplier, where the M×N multiplier denotes a multiplier whose multiplicand bit width is M bits and whose multiplier bit width is N bits, and satisfies K = K_a · K_b.

Optionally, when the method provided by the present application is used to implement the multiply-accumulate operation, step S102 is specifically implemented as follows: the shift control circuit does not perform shift processing on the K products, obtains K products that are not subjected to shift processing, and outputs the K products that are not subjected to shift processing to the selection control circuit.

Specifically, none of the K shift circuits included in the shift control circuit performs shift processing on the product it receives, so that K products not subjected to shift processing are obtained. As a possible implementation, each of the K shift circuits takes the product output by its multiplier as a product that is not subjected to shift processing.

Step S103 can be specifically implemented as follows: the selection control circuit accumulates the received K products that are not subjected to shift processing, and outputs the accumulation result.

Optionally, when the method provided by the present application is used to implement the functions of the K multipliers, step S102 may be specifically implemented as follows: the shift control circuit does not perform shift processing on the K products, obtains K products that are not subjected to shift processing, and outputs the K products that are not subjected to shift processing to the selection control circuit.

Step S103 can be specifically implemented as follows: the selection control circuit directly outputs the received K products that are not subjected to shift processing.

Optionally, when the method provided by the present application is used to implement the function of the M×N multiplier, step S102 may be specifically implemented as follows: the shift control circuit does not perform shift processing on one of the received K products, performs shift processing on the other K-1 products of the received K products to obtain K processed products, and outputs the K processed products to the selection control circuit.

Step S103 can be specifically implemented as follows: the selection control circuit accumulates the received K processed products and outputs the accumulation result.

Optionally, the method provided by the application further includes: S104: Receive first configuration information, where the first configuration information is used to instruct the shift control circuit to perform a shift control process on the received K products.

Optionally, the first configuration information includes indication information configured for at least one shift circuit in the shift control circuit, where the indication information is used to instruct the at least one shift circuit to perform shift control processing on the product received by that shift circuit.

Optionally, the indication information is further used to indicate a shift direction and/or a shift bit number when the shift control process is performed by the at least one shift circuit on the product received by the at least one shift circuit.

Specifically, the first configuration information includes indication information configured for each shift circuit.

Optionally, the method provided by the application further includes: S105: Receive second configuration information, where the second configuration information is used to instruct the selection control circuit to output an accumulated result of the K processed products or output the K processed products.
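One way to picture the first and second configuration information is as a small record per shift circuit plus one output-mode flag. The sketch below is purely illustrative: the class names, field names, and the string encoding of the shift direction are assumptions, and the application does not specify any particular encoding.

```python
from dataclasses import dataclass
from typing import List, Sequence, Union


@dataclass(frozen=True)
class ShiftIndication:
    """Hypothetical per-shift-circuit indication (part of the first configuration)."""
    enabled: bool = False        # shift or do not shift
    direction: str = "left"      # assumed encoding: "left" or "right"
    bits: int = 0                # number of bits to shift


@dataclass(frozen=True)
class DeviceConfig:
    """Hypothetical container for the first and second configuration information."""
    first: List[ShiftIndication]  # one indication per shift circuit
    accumulate: bool              # second configuration: accumulate or pass through


def apply_config(products: Sequence[int], cfg: DeviceConfig) -> Union[int, List[int]]:
    """Apply the shift control and selection control stages as configured."""
    if len(products) != len(cfg.first):
        raise ValueError("one indication is required per product")
    processed = []
    for p, ind in zip(products, cfg.first):
        if not ind.enabled:
            processed.append(p)
        elif ind.direction == "left":
            processed.append(p << ind.bits)
        elif ind.direction == "right":
            processed.append(p >> ind.bits)
        else:
            raise ValueError("unknown shift direction: " + ind.direction)
    return sum(processed) if cfg.accumulate else processed
```

Keeping the two kinds of configuration separate mirrors the text: the per-circuit indications affect only the shift control stage, while the single flag affects only the selection control stage.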

As shown in FIG. 22, the present application provides a programmable logic device (PLD), the programmable logic device including at least the digital signal processing device as described in any one of embodiments of FIGS. 3 to 20. The programmable logic device can be applied to scenes that require digital signal processing (eg, multiply-accumulate calculation), such as radar, artificial intelligence, deep learning, image processing, video processing, wireless baseband/medium radio, satellite navigation, and the like.

Specifically, the programmable logic device can be a Field Programmable Gate Array (FPGA) as shown in FIG. 22, and the digital signal processing device may be the digital signal processor (DSP) shown in FIG. 22. The programmable logic device can also be a Complex Programmable Logic Device (CPLD) or an Erasable Programmable Logic Device (EPLD). Of course, the programmable logic device may also include Electrically Erasable Programmable Read-Only Memory (EEPROM), Programmable Array Logic (PAL), Generic Array Logic (GAL), and the like.

The FPGA provided in this application can be used in industrial control, consumer electronics, and other related fields, and provides a large multiplication and addition capability because of the large number of DSP hard core units it includes. Since the digital signal processing device provided by the present application includes a selection control circuit and a shift control circuit, by configuring the selection control circuit and the shift control circuit, the FPGA applying the digital signal processing device can implement a plurality of different arithmetic functions. Because the FPGA provided by the present application offers a large number of multiplication and addition functions through its DSP hard cores, it can serve as a deep learning processor for deep learning algorithms, the most important algorithms in artificial intelligence (especially convolutional neural networks (CNN)).

As can be seen from FIG. 22, current FPGAs are based on look-up table technology and integrate hard core (Hardcore, ASIC-type) modules for common functions (such as RAM, clock management, and DSP). Because look-up-table-based FPGAs have a very high degree of integration, with device densities ranging from tens of thousands to tens of millions of gates, they can implement extremely complex sequential and combinational logic functions and are therefore suitable for high-speed, high-density, high-end digital logic circuit design. The main components of an FPGA are: programmable input/output blocks (I/O Block, IOB), configurable logic blocks (Configurable Logic Block, CLB), complete clock management, embedded block random access memory (RAM), rich routing resources, DSP hard core units, and other embedded dedicated hardware modules.

The programmable logic resource unit in FIG. 22 can be programmed to implement a variety of circuits and functions, and includes programmable look-up tables (Look Up Table, LUT) and registers. The number of CLBs has reached the order of millions (K × K). In FIG. 22, the horizontal and vertical lines indicate routing resources, through which the input and output interconnections of each CLB can be completed by programming. The routing resources connect the various programmable resources in the FPGA, and the routing of the FPGA in FIG. 22 adopts a short-line interconnection architecture.

In FIG. 22, DSP denotes the digital signal processing hard core unit (DSP Hardcore) inside the FPGA, which can be configured to perform complex signal processing operations such as multiplication, addition, and multiply-accumulate operations, to meet users' signal and image processing needs such as video decoding and Fourier transforms. It has become the most important hard core unit for FPGA signal processing. The DSP in FIG. 22 can use any of the digital signal processing devices provided in the embodiments of the present application.

In a specific example, most of the operations in deep learning, in particular in convolutional neural network processing, are convolution operations, most typically convolutions with N×N matrices, such as a 2×2, 3×3, or 5×5 convolution operation, as shown in the accompanying drawings.

If the convolution is implemented with the DSP multipliers of an FPGA according to the conventional technical solution, N multipliers are needed at the same time to realize one convolution operation, and the N multipliers must be cascaded through logic circuits to form the required multiply-accumulate operation. That logic is programmable circuitry and occupies programmable logic resources other than the DSP. The cascading logic limits the frequency at which the programmable part can run, so although the DSP hard core can run at a higher frequency, overall computing performance is limited and computing power is wasted. For example, as shown in FIG. 24, the convolution operation multiplies each multiplicand by the corresponding multiplier value and accumulates the results. In the present application, completing a 2×2 convolution operation requires only one digital signal processing device provided in the embodiments of the present application.
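The 2×2 case can be illustrated with a small sketch in which one device with K=4 multipliers, configured for multiply-accumulate with no shifting, produces one output value per window. The helper names `mac4` and `conv2x2` are hypothetical, the kernel is applied without flipping (the convention commonly used in CNNs), and the window-sliding loop stands in for whatever logic feeds the device.

```python
def mac4(xs, ys):
    """One device in multiply-accumulate mode: four products, no shift, one sum."""
    if not (len(xs) == len(ys) == 4):
        raise ValueError("mac4 expects exactly four operand pairs")
    return sum(x * y for x, y in zip(xs, ys))


def conv2x2(image, kernel):
    """Slide a 2x2 kernel over a 2-D list of integers ('valid' positions only)."""
    k = [kernel[0][0], kernel[0][1], kernel[1][0], kernel[1][1]]  # multipliers
    out = []
    for r in range(len(image) - 1):
        row = []
        for c in range(len(image[0]) - 1):
            # The four multiplicands are the pixels under the kernel window.
            window = [image[r][c], image[r][c + 1],
                      image[r + 1][c], image[r + 1][c + 1]]
            row.append(mac4(window, k))
        out.append(row)
    return out
```

Each call to `mac4` corresponds to one pass through the device, so no additional programmable logic is modeled for cascading the four multipliers.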

In a possible implementation, when the apparatus in the present application is used in a deep learning algorithm, the multiplicand may be a parameter of some data (such as image, sound, and text) received by the device in the present application. The multiplier can be a fixed parameter located in the Kernel.

The digital signal processing device in the embodiment of the present application is embedded in the FPGA in a hard core manner. Since it is an ASIC-based hard core circuit, it can provide certain flexibility while ensuring an optimal operating frequency and providing the highest operational efficiency. Since the digital signal processing device is provided with a selection control circuit and a shift control circuit, by configuring the selection control circuit and the shift control circuit, the FPGA applying the digital signal processing device can implement a plurality of different arithmetic functions.

Finally, it should be noted that the above embodiments are only used to explain the technical solutions of the present application and are not intended to limit them. Although the present application is described in detail with reference to the foregoing embodiments, those skilled in the art should understand that they can still modify the technical solutions described in the foregoing embodiments or make equivalent replacements of some of the technical features, and such modifications and replacements do not cause the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application.

Claims

A digital signal processing device, comprising: K multipliers, a shift control circuit and a selection control circuit, wherein

The i-th multiplier of the K multipliers is used to implement a multiplication operation in which the multiplicand bit width is M_i bits and the multiplier bit width is N_i bits, where both M_i and N_i are positive integers, i=1, 2...K, and K is an integer greater than or equal to 2;

The shift control circuit is connected to the K multipliers, and is used for receiving K products output by the K multipliers, performing shift control processing on the K products to obtain K processed products, and outputting the K processed products to the selection control circuit;

The selection control circuit is connected to the shift control circuit, and is used for receiving the K processed products sent by the shift control circuit, and outputting an accumulation result of the K processed products or outputting the K processed products.
The apparatus according to claim 1, wherein said shift control circuit comprises K shift circuits, each of said K shift circuits is connected to one of said K multipliers, and each of said shift circuits is configured to receive the product output by the one multiplier and perform shift control processing on that product to obtain a processed product.
The apparatus according to claim 1 or 2, wherein the shift control processing comprises shifting or not shifting.
Apparatus according to any of claims 1-3, wherein said selection control circuit comprises an accumulation circuit for effecting accumulation of said K processed products.
Apparatus according to any one of claims 1 to 4, wherein said apparatus is operative to implement at least one of the following functions:

Multiply and accumulate functions;

The function of the K multipliers;

The function of an M×N multiplier, wherein the M×N multiplier represents a multiplier having a multiplicand bit width of M bits and a multiplier bit width of N bits, and satisfies K = K_a · K_b.
The apparatus according to claim 5, wherein said apparatus is configured to implement said multiply and accumulate operation,

The shift control circuit outputs the received K products of the K multipliers to the selection control circuit without performing a shift process;

The selection control circuit accumulates the received K products and outputs the accumulation result.
The apparatus according to claim 5, wherein said apparatus is configured to implement the functions of said K multipliers,

The shift control circuit outputs the received K products of the K multipliers to the selection control circuit without performing a shift process;

The selection control circuit outputs the received K products.
The apparatus according to claim 5, wherein when said apparatus is used to implement the function of an M×N multiplier,

The shift control circuit does not perform shift processing on one of the received K products, performs shift processing on the other K-1 products of the received K products to obtain the K processed products, and outputs the K processed products to the selection control circuit, wherein the shift processing performed on any one of the other K-1 products is performed according to the bit position occupied, in the M-bit multiplicand, by the multiplicand that generates that product and the bit position occupied, in the N-bit multiplier, by the multiplier that generates that product;

The selection control circuit accumulates the K processed products and outputs the accumulation result.
The apparatus according to any one of claims 1-8, wherein the digital signal processing apparatus further comprises configuration information receiving circuitry for receiving first configuration information, the first configuration information being used to instruct the shift control circuit to perform the shift control processing on the received K products.
The apparatus according to claim 9, wherein said first configuration information comprises indication information configured for at least one shift circuit in said shift control circuit, wherein said indication information is used to instruct said at least one shift circuit to perform shift control processing on the product received by the shift circuit.
The apparatus according to claim 10, wherein said indication information is further used to indicate a shift direction and/or a number of shift bits when said at least one shift circuit performs shift control processing on the product received by said at least one shift circuit.
The apparatus according to any one of claims 1 to 11, wherein the digital signal processing apparatus further comprises configuration information receiving circuitry for receiving second configuration information, the second configuration information being used to instruct the selection control circuit to output an accumulation result of the K processed products or output the K processed products.
Apparatus according to any one of claims 1 to 12 wherein said digital signal processing means is integrated in a programmable logic device.
A programmable logic device, characterized in that the programmable logic device comprises a digital signal processing device according to any of claims 1-12.
A digital signal processing method, the method comprising:

K multiplication operations are performed to obtain K products, and the K products are output to a shift control circuit, wherein the i-th multiplication operation of the K multiplication operations is a multiplication operation whose multiplicand bit width is M_i bits and whose multiplier bit width is N_i bits, where M_i and N_i are both positive integers, i=1, 2...K, and K is an integer greater than or equal to 2;

The shift control circuit performs shift control processing on the K products to obtain K processed products, and outputs the K processed products to a selection control circuit;

The selection control circuit outputs an accumulation result of the K processed products or outputs the K processed products.
The method according to claim 15, wherein said shift control circuit performing shift control processing on said K products to obtain K processed products and outputting said K processed products to the selection control circuit comprises:

Each of the K shift circuits included in the shift control circuit performs shift control processing on one of the K products to obtain K processed products;

Each of the shift circuits outputs the obtained processed product to the selection control circuit.
The method according to claim 15 or 16, wherein the shift control processing comprises shifting or not shifting.
A method according to any one of claims 15-17, wherein the method is for implementing at least one of the following functions:

Multiply and accumulate functions;

The function of the K multipliers;

The function of an M×N multiplier, wherein the M×N multiplier represents a multiplier having a multiplicand bit width of M bits and a multiplier bit width of N bits, and satisfies K = K_a · K_b.
The method according to claim 18, wherein when the method is used to implement the multiply and accumulate operation, the shift control circuit performing shift control processing on the K products to obtain K processed products and outputting the K processed products to the selection control circuit comprises:

The shift control circuit does not perform shift processing on the K products, obtains K products that are not subjected to shift processing, and outputs the K products that have not undergone shift processing to the selection control circuit;

The selection control circuit outputs an accumulated result of the K processed products or outputs the K processed products, including:

The selection control circuit performs an accumulation operation on the received K products that are not subjected to shift processing and outputs the accumulation result.
The method according to claim 16, wherein when the method is used to implement the functions of the K multipliers, the shift control circuit performing shift control processing on the K products to obtain K processed products and outputting the K processed products to the selection control circuit comprises:

The shift control circuit does not perform shift processing on the K products, obtains K products that are not subjected to shift processing, and outputs the K products that have not undergone shift processing to the selection control circuit;

The selection control circuit outputs an accumulated result of the K processed products or outputs the K processed products, including:

The selection control circuit directly outputs the received K products that are not subjected to shift processing.
The method according to claim 16, wherein when the method is used to implement the function of the M×N multiplier, the shift control circuit performing shift control processing on the K products to obtain K processed products and outputting the K processed products to the selection control circuit comprises: the shift control circuit does not perform shift processing on one of the received K products, performs shift processing on the other K-1 products of the K products to obtain the K processed products, and outputs the K processed products to the selection control circuit, wherein the shift processing performed on any one of the other K-1 products is performed according to the bit position occupied, in the M-bit multiplicand, by the multiplicand that generates that product and the bit position occupied, in the N-bit multiplier, by the multiplier that generates that product;

The selection control circuit outputs an accumulated result of the K processed products or outputs the K processed products, including:

The selection control circuit accumulates the received K processed products and outputs the accumulation result.
The method according to any one of claims 15 to 21, wherein the method further comprises:

Receiving first configuration information, the first configuration information being used to instruct the shift control circuit to perform the shift control processing on the received K products.
The method of claim 22, wherein the first configuration information comprises indication information configured for at least one shift circuit in the shift control circuit, wherein the indication information is used to instruct the at least one shift circuit to perform shift control processing on the product received by the shift circuit.
The method according to claim 23, wherein said indication information is further used to indicate a shift direction and/or a number of shift bits when said at least one shift circuit performs shift control processing on the product received by said at least one shift circuit.
The method according to any one of claims 15 to 24, further comprising: receiving second configuration information, the second configuration information being used to instruct the selection control circuit to output an accumulation result of the K processed products or output the K processed products.