WO2019019196A1 - Digital signal processing method and device and programmable logic device - Google Patents

Info

Publication number
WO2019019196A1
Authority
WO
WIPO (PCT)
Prior art keywords
shift
control circuit
products
multiplier
bit
Prior art date
Application number
PCT/CN2017/095061
Other languages
French (fr)
Chinese (zh)
Inventor
杨伟国 (Yang Weiguo)
潘剑锋 (Pan Jianfeng)
陈秀波 (Chen Xiubo)
Original Assignee
Huawei Technologies Co., Ltd. (华为技术有限公司)
Priority date
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd. (华为技术有限公司)
Priority to PCT/CN2017/095061
Publication of WO2019019196A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00 Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/38 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
    • G06F7/48 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
    • G06F7/52 Multiplying; Dividing
    • G06F7/523 Multiplying only

Definitions

  • The embodiments of the present invention relate to the field of digital circuits, and in particular, to a digital signal processing method and apparatus and a programmable logic device.
  • Programmable logic devices, such as the field programmable gate array (FPGA), include digital signal processor hard cores (DSP Hardcore).
  • A DSP Hardcore can be configured to perform arithmetic processing of signals such as multiplication, addition, and multiply-accumulate. An FPGA can therefore provide the operations required for deep learning (for example, the multiply-accumulate operations of convolution), and the industry generally uses FPGAs as deep learning processors.
  • FIG. 1 shows the internal structure of a DSP Hardcore in the prior art, in which the bit width of the multiplier 103 is generally fixed at M bits × N bits.
  • However, the bit width required for deep learning is smaller than the bit width of the multiplier itself.
  • In the prior art, a large-bit-width multiplier (for example, 19 bit × 18 bit) is usually used as a small-bit-width multiplier by filling the high-order bits of its inputs with X zeros, from high to low.
  • FIG. 2 is a schematic diagram of using a 19 bit × 18 bit multiplier as an 8 bit × 8 bit multiplier. As shown in FIG. 2, one input of the 19 bit × 18 bit multiplier is filled with 11 zeros and the other input with 10 zeros, so that the 19 bit × 18 bit multiplier is used as an 8 bit × 8 bit multiplier.
  • When a large-bit-width multiplier is used as a small-bit-width multiplier in this way, a 19 bit × 18 bit multiplier, for example, can only be used as a single 8 bit × 8 bit multiplier: it cannot be split into multiple smaller-bit-width multipliers working at the same time. Padding the high-order bits with 0 to use a large-bit-width multiplier as a small-bit-width multiplier therefore wastes a great deal of computing power and computing resources.
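A small behavioral sketch of the zero-padding approach described above, assuming unsigned operands. The names `zero_pad`, `multiply_8x8_on_19x18` and `UTILIZATION` are hypothetical, and treating the count of partial-product bits (M × N) as a rough measure of multiplier resources is a simplifying assumption rather than a figure from the source.

```python
def zero_pad(value: int, narrow_bits: int, wide_bits: int) -> str:
    """Return the wide input word as a bit string: high-order zero fill, then the value."""
    if not 0 <= value < (1 << narrow_bits):
        raise ValueError("value does not fit in the narrow bit width")
    return "0" * (wide_bits - narrow_bits) + format(value, f"0{narrow_bits}b")


def multiply_8x8_on_19x18(a: int, b: int) -> int:
    """Use a 19-bit x 18-bit unsigned multiplier as an 8-bit x 8-bit multiplier."""
    a_word = zero_pad(a, 8, 19)  # 11 zero bits followed by a[7:0]
    b_word = zero_pad(b, 8, 18)  # 10 zero bits followed by b[7:0]
    # The whole 19 x 18 array is exercised even though only 8 x 8 operand bits carry data.
    return int(a_word, 2) * int(b_word, 2)


# Rough utilization: useful partial-product bits over available partial-product bits.
UTILIZATION = (8 * 8) / (19 * 18)  # 64 / 342

print(zero_pad(5, 8, 19))               # 0000000000000000101
print(multiply_8x8_on_19x18(255, 255))  # 65025
print(f"{UTILIZATION:.3f}")             # 0.187
```

Under this rough measure, fewer than one fifth of the multiplier's partial-product bits do useful work in the padded configuration.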
  • The embodiments of the present invention provide a digital signal processing method and apparatus and a programmable logic device, which solve the problems of low resource utilization and wasted computing performance caused by the fixed-bit-width multipliers used in the prior art.
  • The shift control circuit is connected to K multipliers and is configured to receive the K products output by the K multipliers, perform shift control processing on the K products to obtain K processed products, and output the K processed products to the selection control circuit.
  • The selection control circuit is connected to the shift control circuit and is configured to receive the K processed products sent by the shift control circuit, and to output either the accumulated result of the K processed products or the K processed products themselves.
  • In the digital signal processing apparatus provided by the present application, the shift control circuit performs shift control processing on the K products output by the K multipliers to obtain K processed products, and the selection control circuit outputs the K processed products either directly or after an accumulation operation.
  • The K multipliers can therefore be used not only to perform K multiplication operations but also to perform multiply-accumulate operations.
  • The K multipliers can also be combined to implement the function of a large-bit-width multiplier, or used in matrix convolution operations to complete the multiplication and accumulation required by a convolution. This maximizes the computing power of the digital signal processing apparatus, reduces the waste of computing resources, and allows the apparatus to be applied to different scenarios or devices according to different digital signal processing requirements.
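The data path just described (K multipliers, then a shift control circuit, then a selection control circuit) can be modelled behaviorally as below. This is a minimal sketch, assuming unsigned integer operands and left shifts only; `dsp_block` and its parameters are hypothetical names, not part of the source.

```python
from typing import List, Sequence, Tuple, Union


def dsp_block(
    operand_pairs: Sequence[Tuple[int, int]],
    shifts: Sequence[int],
    accumulate: bool,
) -> Union[int, List[int]]:
    """Behavioral model: K multipliers -> shift control circuit -> selection control circuit.

    shifts[i] is the left-shift amount applied to product i (0 means "not shifted");
    accumulate selects between the accumulated result and the K processed products.
    """
    if len(operand_pairs) != len(shifts):
        raise ValueError("one shift amount is needed per multiplier")
    products = [a * b for a, b in operand_pairs]            # K multipliers
    processed = [p << s for p, s in zip(products, shifts)]  # shift control circuit
    return sum(processed) if accumulate else processed      # selection control circuit


pairs = [(3, 4), (5, 6)]
print(dsp_block(pairs, [0, 0], accumulate=False))  # K independent products: [12, 30]
print(dsp_block(pairs, [0, 0], accumulate=True))   # multiply-accumulate: 42
# A 16-bit x 8-bit product from two 8 x 8 products: low byte unshifted, high byte shifted by 8.
a, b = 0x1234, 0x56
print(dsp_block([(a & 0xFF, b), (a >> 8, b)], [0, 8], accumulate=True) == a * b)  # True
```

The same structure covers all three uses; only the shift amounts and the accumulate flag change.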
  • The digital signal processing apparatus provided by the present application can be embedded as a hard core in a programmable logic device (such as a field programmable gate array, FPGA).
  • The selection control circuit and the shift control circuit give the digital signal processing apparatus a degree of flexibility while maintaining a good operating frequency and high computational efficiency, so the apparatus suits various computing scenarios; for example, it can serve as an efficient combined arithmetic unit for the matrix convolution operations of deep learning.
  • The shift control circuit includes K shift circuits, each of which is connected to one of the K multipliers. Each shift circuit receives the product output by its multiplier and performs shift control processing on that product to obtain a processed product.
  • In this way, the product output by each multiplier can be shift-controlled more flexibly and accurately.
  • The shift control processing is either shifting or not shifting. That is, any one of the K products may or may not be shifted, and the shift control processing applied to the K products may be the same or different. By configuring each product to be shifted or not shifted, the apparatus can implement the function of a large-bit-width multiplier as well as the functions of the individual multipliers.
  • The selection control circuit includes an accumulation circuit, and the accumulation circuit is configured to accumulate the K processed products.
  • By accumulating the K processed products, the apparatus provided by the present application can implement a multiply-accumulate operation or the function of a multiplier with a larger bit width.
  • The digital signal processing apparatus provided by the embodiments of the present application is configured to implement at least one of the following functions: a multiply-accumulate operation; the functions of the K multipliers; and the function of an M × N multiplier, where an M × N multiplier is a multiplier whose multiplicand bit width is M bits and whose multiplier bit width is N bits, and that satisfies …
  • When the apparatus is used to implement the multiply-accumulate operation, the shift control circuit outputs the K products received from the K multipliers directly to the selection control circuit without shift processing, and the selection control circuit adds the received K products and outputs the sum. (In the M × N multiplier function, by contrast, the shift applied to each of the other K-1 products is determined by the bit position that the multiplicand of that product occupies in the M-bit multiplicand and the bit position that the multiplier of that product occupies in the N-bit multiplier.)
  • Thus the multiply-accumulate function can be achieved without changing the structure of the apparatus: the shift control circuit simply does not shift the received K products, and the selection control circuit adds the received K products and outputs the sum.
  • When the apparatus is used to implement the functions of K multipliers, the shift control circuit outputs the K products received from the K multipliers to the selection control circuit without shift processing, and the selection control circuit outputs the received K products.
  • This function is likewise achieved without changing the structure of the apparatus: the shift control circuit does not shift the received K products, and the selection control circuit directly outputs the K unshifted products. This broadens the range of applications of the apparatus provided by the present application.
  • When the apparatus is used to implement an M × N multiplier, the shift control circuit does not shift one of the received K products and shifts the other K-1 products, obtaining K processed products that it outputs to the selection control circuit; the selection control circuit adds the K processed products and outputs the sum.
  • By having the shift control circuit shift the product output by each of the K multipliers by a different number of bits, the present application implements the function of a large-bit-width multiplier.
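The M × N multiplier case above is the familiar shift-and-add decomposition into partial products. The sketch below assumes unsigned operands split into fixed-width chunks (m bits of the multiplicand, n bits of the multiplier), which is one possible assignment of operand bits to the K multipliers; the source only states that each shift follows from the bit positions the partial operands occupy. `split_operand` and `wide_multiply` are hypothetical names.

```python
from typing import List, Tuple


def split_operand(value: int, total_bits: int, chunk_bits: int) -> List[Tuple[int, int]]:
    """Split an unsigned total_bits-wide value into (chunk, bit_offset) pairs, low chunk first."""
    if not 0 <= value < (1 << total_bits):
        raise ValueError("value does not fit in total_bits")
    mask = (1 << chunk_bits) - 1
    return [((value >> off) & mask, off) for off in range(0, total_bits, chunk_bits)]


def wide_multiply(a: int, b: int, M: int, N: int, m: int, n: int) -> Tuple[int, List[int]]:
    """Compute an M x N product from K = ceil(M/m) * ceil(N/n) small m x n products.

    Each partial product is shifted left by the bit offset of its multiplicand chunk
    plus the bit offset of its multiplier chunk; the (0, 0) product is not shifted.
    Returns the accumulated result and the shift amount used for each product.
    """
    total, shifts = 0, []
    for a_chunk, a_off in split_operand(a, M, m):
        for b_chunk, b_off in split_operand(b, N, n):
            shift = a_off + b_off
            shifts.append(shift)
            total += (a_chunk * b_chunk) << shift  # one small multiplier plus its shift circuit
    return total, shifts


result, shifts = wide_multiply(0xABCD, 0x1234, M=16, N=16, m=8, n=8)
print(result == 0xABCD * 0x1234)  # True
print(shifts)                     # [0, 8, 8, 16]
```

Exactly one of the K partial products has a shift of 0, matching the description that one product is left unshifted and the other K-1 are shifted.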
  • The digital signal processing apparatus further includes a configuration information receiving circuit configured to receive first configuration information, where the first configuration information instructs the shift control circuit to perform shift control processing on each of the received K products.
  • According to the first configuration information, the shift control circuit can perform or skip the shift operation on the product output by each multiplier.
  • The first configuration information includes indication information configured for at least one shift circuit in the shift control circuit, where the indication information instructs the at least one shift circuit to perform shift control processing on the product it receives.
  • In this way, each shift circuit can shift or not shift the product it receives according to the indication information it receives.
  • The indication information further indicates the shift direction and/or the number of shift bits used when the at least one shift circuit performs shift control processing on the product it receives.
  • The configuration information receiving circuit of the digital signal processing apparatus is further configured to receive second configuration information, where the second configuration information instructs the selection control circuit to output either the accumulated result of the K processed products or the K processed products.
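One way to picture the first and second configuration information is as a small data structure consumed by the data path. This is a sketch under stated assumptions: the field layout (`shift`, `direction`, `bits`, `accumulate`) and the names `ShiftIndication`, `DspConfig`, `apply_indication` and `run` are hypothetical, and a right shift is assumed to discard low-order bits.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Sequence, Union


class Direction(Enum):
    LEFT = "left"
    RIGHT = "right"


@dataclass
class ShiftIndication:
    """Hypothetical per-shift-circuit entry of the first configuration information."""
    shift: bool                            # perform shift processing or not
    direction: Direction = Direction.LEFT  # shift direction
    bits: int = 0                          # number of shift bits


@dataclass
class DspConfig:
    first: List[ShiftIndication]  # first configuration information, one entry per shift circuit
    accumulate: bool              # second configuration information


def apply_indication(product: int, ind: ShiftIndication) -> int:
    if not ind.shift:
        return product  # not shifted: passed through unchanged
    if ind.direction is Direction.LEFT:
        return product << ind.bits
    return product >> ind.bits  # right shift discards low-order bits


def run(products: Sequence[int], cfg: DspConfig) -> Union[int, List[int]]:
    if len(products) != len(cfg.first):
        raise ValueError("one indication is needed per shift circuit")
    processed = [apply_indication(p, ind) for p, ind in zip(products, cfg.first)]
    return sum(processed) if cfg.accumulate else processed


cfg = DspConfig(first=[ShiftIndication(False), ShiftIndication(True, Direction.LEFT, 4)], accumulate=True)
print(run([12, 30], cfg))  # 12 + (30 << 4) = 492
```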
  • the digital signal processing apparatus is integrated in the programmable logic device.
  • An embodiment of the present application provides a programmable logic device including the digital signal processing apparatus described in the first aspect or any one of the first to eleventh possible implementations of the first aspect.
  • The programmable logic device includes at least one of a field programmable gate array (FPGA), a complex programmable logic device (CPLD), and an erasable programmable logic device (EPLD).
  • an embodiment of the present application provides a digital signal processing method.
  • The method provided by the embodiments of the present application includes: performing K multiplication operations to obtain K products, and outputting the K products to a shift control circuit; performing, by the shift control circuit, shift control processing on the K products to obtain K processed products, and outputting the K processed products to a selection control circuit; and outputting, by the selection control circuit, the accumulated result of the K processed products or the K processed products.
  • The shift control circuit performing shift control processing on the K products to obtain K processed products includes: each of the K shift circuits included in the shift control circuit performs shift control processing on one of the K products, yielding the K processed products.
  • the shift control process includes: shifting or not shifting.
  • the method provided by the embodiment of the present application is used to implement at least one of the following functions.
  • An M × N multiplier is a multiplier whose multiplicand bit width is M bits and whose multiplier bit width is N bits.
  • When the method provided by the embodiments of the present application is used to implement the multiply-accumulate operation, the step in which the shift control circuit performs shift control processing on the K products and outputs the K processed products to the selection control circuit includes: the shift control circuit does not shift the K products, obtaining K unshifted products, and outputs the K unshifted products to the selection control circuit.
  • The step in which the selection control circuit outputs the accumulated result of the K processed products or the K processed products includes: the selection control circuit accumulates the received K unshifted products and outputs the accumulated result.
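In the deep-learning use case mentioned earlier, each output element of a convolution needs K = kh × kw products that are summed with no shift, which is exactly the multiply-accumulate mode. A minimal sketch, assuming integer data and the "valid" (no padding) convolution used in deep learning, which does not flip the kernel; `conv2d_valid` is a hypothetical name.

```python
from typing import List, Sequence


def conv2d_valid(image: Sequence[Sequence[int]], kernel: Sequence[Sequence[int]]) -> List[List[int]]:
    """'Valid' 2-D convolution as used in deep learning (no kernel flip).

    Each output element needs K = kh * kw products; in multiply-accumulate mode they are
    summed without any shift.
    """
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for r in range(len(image) - kh + 1):
        row = []
        for c in range(len(image[0]) - kw + 1):
            products = [image[r + i][c + j] * kernel[i][j]  # K multipliers
                        for i in range(kh) for j in range(kw)]
            row.append(sum(products))                       # accumulate, all shifts 0
        out.append(row)
    return out


img = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(conv2d_valid(img, [[1, 0], [0, -1]]))  # [[-4, -4], [-4, -4]]
```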
  • When the method provided by the embodiments of the present application is used to implement K multipliers, the step in which the shift control circuit performs shift control processing on the K products and outputs the K processed products to the selection control circuit includes: the shift control circuit does not shift the K products, obtaining K unshifted products, and outputs them to the selection control circuit. (In the M × N multiplier function, by contrast, the shift applied to each of the other K-1 products is based on the bit position that the multiplicand of that product occupies in the M-bit multiplicand and the bit position that the multiplier of that product occupies in the N-bit multiplier.)
  • The step in which the selection control circuit outputs the accumulated result of the K processed products or the K processed products includes: the selection control circuit directly outputs the received K unshifted products.
  • When the method provided by the embodiments of the present application is used to implement an M × N multiplier, the step in which the shift control circuit performs shift control processing on the K products and outputs the K processed products to the selection control circuit includes: the shift control circuit does not shift one of the received K products and shifts the other K-1 products, obtaining K processed products, and outputs the K processed products to the selection control circuit.
  • The step in which the selection control circuit outputs the accumulated result of the K processed products or the K processed products includes: the selection control circuit adds the received K processed products and outputs the sum.
  • the method provided by the embodiment of the present application further includes: receiving the first configuration information
  • the first configuration information is used to instruct the shift control circuit to perform shift control processing on the received K products.
  • The first configuration information includes indication information configured for at least one shift circuit in the shift control circuit, where the indication information instructs the at least one shift circuit to perform shift control processing on the product it receives.
  • The indication information further indicates the shift direction and/or the number of shift bits used when the at least one shift circuit performs shift control processing on the product it receives.
  • the method provided by the embodiment of the present application further includes: receiving the second configuration information
  • the second configuration information is used to instruct the selection control circuit to output an accumulated result of the K processed products or to output K processed products.
  • FIG. 1 shows the internal structure of a digital signal processor (DSP) in the prior art.
  • FIG. 2 is a schematic structural diagram of a small-bit-width multiplier implemented by using a large-bit-width multiplier in the prior art.
  • FIG. 3 is schematic structural diagram 1 of a digital signal processing apparatus provided by the present application.
  • FIG. 4 is schematic structural diagram 2 of a digital signal processing apparatus provided by the present application.
  • FIG. 5 is schematic structural diagram 3 of a digital signal processing apparatus provided by the present application.
  • FIG. 6 is schematic structural diagram 4 of a digital signal processing apparatus provided by the present application.
  • FIG. 7 is schematic structural diagram 5 of a digital signal processing apparatus provided by the present application.
  • FIG. 8 is schematic structural diagram 6 of a digital signal processing apparatus provided by the present application.
  • FIG. 9 is schematic structural diagram 7 of a digital signal processing apparatus provided by the present application.
  • FIG. 10 is schematic structural diagram 8 of a digital signal processing apparatus provided by the present application.
  • FIG. 11 is schematic structural diagram 9 of a digital signal processing apparatus provided by the present application.
  • FIG. 12 is a schematic structural diagram of a digital signal processing apparatus provided by the present application.
  • FIG. 13 is schematic structural diagram 11 of a digital signal processing apparatus provided by the present application.
  • FIG. 14 is schematic structural diagram 12 of a digital signal processing apparatus provided by the present application.
  • FIG. 15 is a schematic structural diagram of a digital signal processing apparatus provided by the present application.
  • FIG. 16 is a schematic structural diagram of a digital signal processing apparatus provided by the present application.
  • FIG. 17 is a schematic structural diagram of a digital signal processing apparatus provided by the present application.
  • FIG. 18 is schematic structural diagram 16 of a digital signal processing apparatus provided by the present application.
  • FIG. 19 is a schematic structural diagram of a digital signal processing apparatus provided by the present application.
  • FIG. 20 is a schematic structural diagram of a digital signal processing apparatus provided by the present application.
  • FIG. 21 is a schematic flowchart of a digital signal processing method provided by the present application.
  • FIG. 22 is a schematic structural diagram of an FPGA provided by the present application.
  • FIG. 23 is schematic diagram 1 of a convolution operation.
  • FIG. 24 is schematic diagram 2 of a convolution operation.
  • In the embodiments of the present application, Q'b0 denotes Q bits of 0.
  • For example, 11'b0 denotes 11 bits of 0.
  • 10'b0 denotes 10 bits of 0.
  • [M:0] denotes all or part of the bit sequence of input/output data, namely bits M down to 0 (M + 1 bits); for example, [7:0] in FIG. 2 denotes the part of the input data's bit sequence consisting of 8 bits.
  • In the embodiments of the present invention, 2^P denotes a left shift by P bits (multiplication by 2^P); for example, 2^16 denotes a left shift by 16 bits.
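The notation above follows hardware-description conventions, and can be illustrated with three small helpers. This is a sketch only: `zeros`, `bit_slice` and `times_pow2` are hypothetical names, and rendering Q'b0 as a Python bit string is purely for display.

```python
def zeros(q: int) -> str:
    """Q'b0: a field of Q zero bits, rendered as a bit string."""
    return "0" * q


def bit_slice(value: int, hi: int, lo: int = 0) -> int:
    """value[hi:lo]: bits hi down to lo inclusive, i.e. hi - lo + 1 bits."""
    return (value >> lo) & ((1 << (hi - lo + 1)) - 1)


def times_pow2(value: int, p: int) -> int:
    """Multiplying by 2^P is the same as shifting left by P bits."""
    return value << p


# {11'b0, x[7:0]}: 11 zero bits concatenated with the low 8 bits of x
print(zeros(11) + format(bit_slice(0xABCD, 7), "08b"))
print(hex(bit_slice(0xABCD, 15, 8)))     # 0xab
print(times_pow2(3, 16) == 3 * 2 ** 16)  # True
```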
  • FIG. 3 shows a digital signal processing apparatus 20 provided by an embodiment of the present application.
  • The apparatus 20 includes K multipliers 201, a shift control circuit 202 connected to the K multipliers 201, and a selection control circuit 203 connected to the shift control circuit 202.
  • The data widths of each M_i × N_i multiplier are measured in bits.
  • The multiplicand may be a first input sequence that includes M_i bits.
  • The multiplier may be a second input sequence that includes N_i bits.
  • The shift control circuit 202 is configured to receive the K products output by the K multipliers, perform shift control processing on the K products to obtain K processed products, and output the K processed products to the selection control circuit 203.
  • the selection control circuit 203 is configured to receive the K processed products transmitted by the shift control circuit 202, and output the accumulated result of the K processed products or output the K processed products.
  • the K multipliers 201 may share a shift control circuit 202 and a selection control circuit 203.
  • That is, one shift control circuit 202 controls whether the product output by each of the K multipliers 201 is shifted, yielding K processed products, and one selection control circuit 203 controls whether the K processed products are output directly or accumulated before being output.
  • the shift control circuit 202 of the present application may include K shift circuits, for example, the shift circuit 2021, the shift circuit 2022, ..., and the shift circuit 202K shown in FIG.
  • Each of the K shift circuits is connected to one of the K multipliers 201 and is configured to receive the product output by that multiplier and perform shift control processing on it to obtain a processed product.
  • two or more multipliers of the K multipliers may share a shift circuit.
  • the shift control circuit 202 may include at least one shift circuit for performing shift control processing on the products of the K multiplier outputs, respectively.
  • the shift control process includes: shifting or not shifting.
  • That the shift control circuit 202 does not shift the product output by a multiplier can be understood as the shift control circuit 202 shifting that product by 0 bits, or outputting the product directly. Therefore, the K processed products output by the shift control circuit 202 include shifted products and/or unshifted products.
  • When several products are shifted, the numbers of bits shifted may be the same or different.
  • For example, the shift control circuit 202 may shift the product output by the multiplier 2011 left by 5 bits, shift the product output by the multiplier 2012 left by 10 bits, and leave the products output by the other K-2 multipliers unshifted.
  • The M_1 × N_1 multiplier 2011 in the K multipliers 201 is connected to the shift circuit 2021 in the shift control circuit 202, the M_2 × N_2 multiplier 2012 is connected to the shift circuit 2022 in the shift control circuit 202, and the M_K × N_K multiplier 201K is connected to the shift circuit 202K in the shift control circuit 202.
  • For the connections between the remaining multipliers and shift circuits, see FIG.; details are not repeated here.
  • The shift circuit 2021 is connected to the M_1 × N_1 multiplier 2011 and is configured to receive the product output by the M_1 × N_1 multiplier 2011 and perform shift control processing on it.
  • Whether a shift operation or a non-shift operation is performed may depend on the application scenario of the data processing apparatus or on configuration information.
  • This is not limited in the embodiments of the present application.
  • the apparatus 20 provided by the present application further includes: a register connected to each multiplier for storing a multiplicand and a multiplier of the multipliers connected thereto.
  • The shift circuit may include a shift register (shift) and a selector (MUX); the shift register is connected to the multiplier and is configured to shift the product output by the multiplier.
  • the selector is coupled to the shift register and the multiplier for selecting and outputting a shifted or unshifted product.
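Each shift circuit, then, is a shift register feeding one selector input, with the unshifted multiplier product on the other input. A minimal behavioral sketch, assuming left shifts only and a boolean select signal; `shift_circuit` and `select_shifted` are hypothetical names, and the 5-bit shift mirrors the example given for the multiplier 2011.

```python
from typing import Tuple


def shift_circuit(product: int, shift_bits: int, select_shifted: bool) -> Tuple[int, int, int]:
    """One shift circuit: a shift register feeding one selector (MUX) input,
    with the unshifted multiplier product on the other selector input.
    Returns (unshifted product, shift register output, selector output)."""
    shifted = product << shift_bits                    # shift register output
    selected = shifted if select_shifted else product  # selector (MUX) output
    return product, shifted, selected


print(shift_circuit(0b1011, 5, select_shifted=True))   # (11, 352, 352)
print(shift_circuit(0b1011, 5, select_shifted=False))  # (11, 352, 11)
```

Selecting the unshifted input gives the same result as a 0-bit shift, which is why "not shifting" needs no separate data path.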
  • The shift control circuit 202 includes K shift registers and K selectors (for example, the selector 2031, the selector 2032, ..., and the selector 203K shown in FIG. 5).
  • One input of the selector 2031 is connected to the M_1 × N_1 multiplier 2011 to receive the product output by the M_1 × N_1 multiplier 2011 (that is, the unshifted product), and the other input of the selector 2031 is connected to the shift register to receive the shifted product output by the shift register.
  • One input of the selector 2032 is connected to the M_2 × N_2 multiplier 2012 to receive the product output by the M_2 × N_2 multiplier 2012 (that is, the unshifted product), and the other input of the selector 2032 is connected to the shift register to receive the shifted product output by the shift register.
  • One input of the selector 203K is connected to the M_K × N_K multiplier to receive the product output by the M_K × N_K multiplier, and the other input of the selector 203K is connected to the shift register to receive the shifted product output by the shift register.
  • The working principle and connections of the other selectors are similar to those of the selector 203K, as shown in FIG. 5, and details are not repeated here.
  • The apparatus provided by the present application further includes a configuration information receiving circuit 30 configured to receive first configuration information, where the first configuration information instructs the shift control circuit 202 to perform shift control processing on the received K products.
  • The first configuration information indicates, for each of the received K products, whether the shift control circuit needs to perform shift processing or no shift processing.
  • The shift control processing that each shift circuit performs on the product of its multiplier may differ: for example, some shift circuits need to shift the received multiplier product, while others need to pass it through unshifted.
  • To allow each shift circuit to apply flexible and accurate shift control processing to the product of its multiplier, the first configuration information optionally includes first indication information configured for at least one shift circuit in the shift control circuit, where the first indication information indicates the shift control processing that each of the at least one shift circuit performs on the product output by its multiplier.
  • the first indication information may be a first indicator or a second indicator, where the first indicator indicates that the received K products are subjected to shift processing, and the second indicator indicates that the received K products are not Perform shift processing.
  • The first indicator may be "0" and the second indicator may be "1".
  • the first configuration information may include first indication information separately configured for each shift circuit.
  • The first configuration information received by the configuration information receiving circuit 30 further includes second indication information configured for the at least one shift circuit.
  • the second indication information is used to indicate a shift direction and/or a shift bit number when each shift circuit of the at least one shift circuit performs a shift control process on a product of the respective received multiplier outputs.
  • the shift circuit can accurately perform the shift control process on the product of the respective received multiplier outputs.
  • the first configuration information may include second indication information separately configured for each shift circuit.
  • the configuration information receiving circuit 30 can be a pin or a connecting line of the digital signal processing device 20.
  • The configuration information receiving circuit 30 can be a pin of the shift register, through which the indication information (such as the first indication information and/or the second indication information described above) is input to the shift register.
  • the configuration information receiving circuit 30 may be located in the DSP, or may be located in another controller in the FPGA, such as a central processing unit (CPU), which is not limited by the embodiment of the present application. Any means for inputting the first configuration information to the shift control circuit 202 provided in the embodiment of the present application can be used as the configuration information receiving circuit 30 of the present application.
  • The configuration information receiving circuit 30 is further configured to receive third configuration information, where the third configuration information instructs each of the K selectors to select one of the shifted product output by its shift register and the unshifted product output by its multiplier, and to output the selected product to the selection control circuit 203.
  • The third configuration information includes a first indication configured for each of the K shift circuits, where the first indication instructs the selector to select one of the shifted product output by the shift register and the product output by the multiplier, and to output it to the selection control circuit 203.
  • the first indication may be a letter or a number, which is not limited in this application.
  • the first indication may be a third indicator or a fourth indicator. The third indicator instructs the selector to output the shifted product output by the shift circuit to the selection control circuit 203, and the fourth indicator instructs the selector to output the product output by the multiplier to the selection control circuit 203.
  • the third indicator may be “1” and the fourth indicator may be “0”, so that each selector performs the corresponding processing according to the content of its own first indication.
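  • The selector behaviour described above can be modelled as a two-input multiplexer. The following is only a behavioural sketch. It assumes that the first indication is the integer 1 (third indicator: pass the shifted product) or 0 (fourth indicator: pass the unshifted product), and that the shift register shifts left. The names `shift_register` and `select_product` are hypothetical and are not taken from the patent.

```python
def shift_register(product: int, shift_bits: int) -> int:
    """Hypothetical model of a shift register that shifts a product left."""
    if shift_bits < 0:
        raise ValueError("shift_bits must be non-negative")
    return product << shift_bits


THIRD_INDICATOR = 1   # select the shifted product from the shift register
FOURTH_INDICATOR = 0  # select the unshifted product from the multiplier


def select_product(product: int, shift_bits: int, first_indication: int) -> int:
    """Behavioural model of the selector inside one shift circuit."""
    shifted = shift_register(product, shift_bits)
    if first_indication == THIRD_INDICATOR:
        return shifted
    if first_indication == FOURTH_INDICATOR:
        return product
    raise ValueError("first indication must be 1 or 0")


print(select_product(42, 8, THIRD_INDICATOR))   # 10752
print(select_product(42, 8, FOURTH_INDICATOR))  # 42
```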
  • the selection control circuit 203 includes: an accumulation circuit for implementing accumulation of K processed products.
  • the accumulation circuit may include a plurality of accumulators, for example, an accumulator 301, an accumulator 302, an accumulator 303, ..., and an accumulator 30 (K-1) as shown in FIG.
  • when the accumulation circuit includes a plurality of accumulators (for example, at least three accumulators), the accumulators are connected in cascade. The number of accumulators in a later stage is one less than in the adjacent earlier stage, and in each pair of cascaded accumulators the output of the earlier-stage accumulator is the input of the later-stage accumulator.
  • an accumulator connected to the shift control circuit 202 is referred to as a first-stage accumulator, and each first-stage accumulator accumulates two processed products output by the shift control circuit 202.
  • every two selectors are coupled to one first-stage accumulator of the accumulation circuit. For example, the selector 2031 and the selector 2032 are coupled to the first-stage accumulator 301.
  • when the number of selectors is odd, one accumulator receives the product output by only a single selector, and that product is then accumulated with the results of the other accumulators.
  • the product output by the selector 2031 and the selector 2032 is accumulated by the accumulator 301
  • the product output by the selector 2033 is output to the accumulator 302
  • the accumulated results output by the accumulator 301 and the accumulator 302 are accumulated by the accumulator 303 and then output.
  • accumulation may also be performed in a manner similar to the above, as shown in FIG.
  • FIG. 9 takes the shift control circuit as an example.
  • the products output by the selector 2031 and the selector 2032 are accumulated by the first-stage accumulator 301, and the products output by the selector 2033 and the selector 2034 are accumulated by the first-stage accumulator 302.
  • the product output by the selector 2033 and the selector 2034 is accumulated by the first-stage accumulator 303, and the accumulated results of the accumulator 301 and the accumulator 302 are accumulated by the accumulator 304.
  • finally, the accumulated result output by the accumulator 304 and the accumulated result output by the accumulator 303 are accumulated by the accumulator 305.
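  • The cascaded accumulation described above can be sketched as a pairwise reduction tree. This is an illustrative model only, based on our reading of the three- and five-input examples. Each stage pairs adjacent values, and when a stage has an odd count, the last accumulator receives a single value and passes it through unchanged. The function name `accumulate_tree` is hypothetical.

```python
from typing import List


def accumulate_tree(products: List[int]) -> List[List[int]]:
    """Return the values present at every stage of a cascaded accumulator tree.

    Stage 0 holds the selector outputs. Each later stage sums adjacent pairs
    with two-input accumulators; with an odd count, the last accumulator
    receives a single value and forwards it unchanged.
    """
    if not products:
        raise ValueError("need at least one product")
    stages = [list(products)]
    while len(stages[-1]) > 1:
        prev = stages[-1]
        stages.append([sum(prev[i:i + 2]) for i in range(0, len(prev), 2)])
    return stages


# Three selector outputs: 301 adds the first two, 302 holds the third, 303 adds both.
print(accumulate_tree([10, 20, 30]))     # [[10, 20, 30], [30, 30], [60]]
# Five selector outputs: first stage 301/302/303, then 304, then 305.
print(accumulate_tree([1, 2, 3, 4, 5]))  # [[1, 2, 3, 4, 5], [3, 7, 5], [10, 5], [15]]
```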
  • the configuration information receiving circuit 30 is further configured to receive second configuration information, which instructs the selection control circuit 203 to output either the accumulated result of the K processed products or the K processed products themselves.
  • the second configuration information may be a second indication or a third indication. The second indication instructs the selection control circuit 203 to output the K processed products, and the third indication instructs the selection control circuit 203 to output the accumulated result of the K processed products.
  • when the selection control circuit 203 determines that the second configuration information is the second indication, it directly outputs the K processed products. When it determines that the second configuration information is the third indication, it accumulates the K processed products and outputs the result.
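  • The two output modes of the selection control circuit can be sketched as follows. This is a behavioural model only. The enum `SecondConfig` and the function `selection_control` are hypothetical names, and the accumulation is modelled as a plain sum rather than as a specific accumulator tree.

```python
from enum import Enum
from typing import List, Union


class SecondConfig(Enum):
    SECOND_INDICATION = "output_products"     # output the K processed products
    THIRD_INDICATION = "output_accumulation"  # output their accumulated result


def selection_control(processed: List[int],
                      config: SecondConfig) -> Union[int, List[int]]:
    """Behavioural model of selection control circuit 203 (hypothetical names)."""
    if config is SecondConfig.SECOND_INDICATION:
        return list(processed)
    if config is SecondConfig.THIRD_INDICATION:
        return sum(processed)
    raise ValueError(f"unsupported configuration: {config!r}")


print(selection_control([3, 5, 7], SecondConfig.SECOND_INDICATION))  # [3, 5, 7]
print(selection_control([3, 5, 7], SecondConfig.THIRD_INDICATION))   # 15
```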
  • the results of the K multipliers output by the digital signal processing apparatus provided by the present application may be selected or used as needed. That is, the apparatus may implement the function of any one or more of the K multipliers.
  • the multiply-accumulate function can be implemented by the K multipliers, or the function of an M × N multiplier can be realized by the K multipliers. Here, an M × N multiplier denotes a multiplier whose multiplicand bit width is M bits and whose multiplier bit width is N bits, and which satisfies
  • the apparatus provided by the present application can be used to implement a multiply-accumulate operation function.
  • the shift control circuit 202 provided by the present application is specifically configured to output the K products produced by the K multipliers 201 to the selection control circuit 203 without shift processing. The selection control circuit 203 then adds the received K products and outputs the sum.
  • the third configuration information received by each selector is used to instruct each selector to directly output the product of the respective received multiplier outputs.
  • the second configuration information is used to instruct the selection control circuit to output an accumulated result of the K processed products.
  • the multiply and accumulate operations in this application may be a convolution operation between matrices.
  • the digital signal processing device 20 shown in Fig. 11 includes four 8 bit × 8 bit multipliers, which can be used to implement a 2 × 2 matrix convolution operation.
  • the multiplicands are d1, d2, d3, and d4, and the multipliers are k1, k2, k3, and k4.
  • the multiplicand sequence input to the 8-bit × 8-bit multiplier 3011 is in_X1[7:0], and the multiplier sequence is in_Y1[7:0].
  • the multiplicand sequence input to the 8-bit × 8-bit multiplier 3012 is in_X2[7:0], and the multiplier sequence is in_Y2[7:0].
  • the multiplicand sequence input to the 8-bit × 8-bit multiplier 3013 is in_X3[7:0], and the multiplier sequence is in_Y3[7:0].
  • the multiplicand sequence input to the 8-bit × 8-bit multiplier 3014 is in_X4[7:0], and the multiplier sequence is in_Y4[7:0].
  • in_X1[7:0] represents d1, in_X2[7:0] represents d2, in_X3[7:0] represents d3, and in_X4[7:0] represents d4. Likewise, in_Y1[7:0] represents k1, in_Y2[7:0] represents k2, in_Y3[7:0] represents k3, and in_Y4[7:0] represents k4.
  • the multiplicand sequence and the multiplier sequence can be stored in registers, respectively.
  • the 8-bit × 8-bit multiplier 3011 multiplies in_X1[7:0] by in_Y1[7:0] to obtain a first product. The 8-bit × 8-bit multiplier 3012 multiplies in_X2[7:0] by in_Y2[7:0] to obtain a second product, the 8-bit × 8-bit multiplier 3013 multiplies in_X3[7:0] by in_Y3[7:0] to obtain a third product, and the 8-bit × 8-bit multiplier 3014 multiplies in_X4[7:0] by in_Y4[7:0] to obtain a fourth product.
  • the shift control circuit 202 does not shift the product output by any 8-bit × 8-bit multiplier. The shift circuits 4011, 4012, 4013, and 4014 in the shift control circuit 202 pass the first, second, third, and fourth products output by the 8-bit × 8-bit multipliers 3011, 3012, 3013, and 3014, respectively, to the selection control circuit 203 as its inputs.
  • the unshifted first product output by the shift circuit 4011 and the unshifted second product output by the shift circuit 4012 are accumulated by the first accumulator 501 to obtain a first accumulated result. Similarly, the unshifted third product output by the shift circuit 4013 and the unshifted fourth product output by the shift circuit 4014 are accumulated by the second accumulator 502 to obtain a second accumulated result.
  • the first accumulator 501 and the second accumulator 502 output the first accumulated result and the second accumulated result, respectively, to the third accumulator 503, which produces the final result of the 2 × 2 convolution operation.
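  • The 2 × 2 convolution above can be sketched in software as four unshifted 8-bit products reduced by accumulators 501, 502 and 503. This sketch assumes unsigned operands, because this section does not state the signedness. The helper names `mul8x8` and `conv2x2` are hypothetical.

```python
from typing import Sequence


def mul8x8(x: int, y: int) -> int:
    """Hypothetical unsigned 8-bit x 8-bit multiplier (in_X[7:0], in_Y[7:0])."""
    if not (0 <= x <= 0xFF and 0 <= y <= 0xFF):
        raise ValueError("operands must fit in 8 unsigned bits")
    return x * y


def conv2x2(d: Sequence[int], k: Sequence[int]) -> int:
    """2 x 2 convolution: four unshifted products summed by accumulators 501-503."""
    if len(d) != 4 or len(k) != 4:
        raise ValueError("a 2 x 2 window has four elements")
    p1, p2, p3, p4 = (mul8x8(x, y) for x, y in zip(d, k))  # multipliers 3011-3014
    acc_501 = p1 + p2         # first accumulated result
    acc_502 = p3 + p4         # second accumulated result
    return acc_501 + acc_502  # accumulator 503: final convolution result


print(conv2x2([1, 2, 3, 4], [5, 6, 7, 8]))  # 70
```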
  • the digital signal processing apparatus 20 provided by the present application further includes a register connected to the selection control circuit 203. The register stores either the accumulation result output by the selection control circuit 203 or the K processed products of the shift control circuit 202 that the selection control circuit 203 outputs directly.
  • FIG. 12 differs from FIG. 11 in that the digital signal processing apparatus 20 shown in FIG. 12 includes nine 4 bit × 4 bit multipliers.
  • the multiplicands are d1, d2, d3, d4, d5, d6, d7, d8, and d9.
  • the multipliers are k1, k2, k3, k4, k5, k6, k7, k8, and k9.
  • for example, the multiplicand sequence input to the 4 bit × 4 bit multiplier 6011 is in_X1[3:0], and the multiplier sequence is in_Y1[3:0]. The multiplicand sequence input to the 4 bit × 4 bit multiplier 6012 is in_X2[3:0], and the multiplier sequence is in_Y2[3:0].
  • the multiplicand sequence input to the 4 bit × 4 bit multiplier 6018 is in_X8[3:0], and the multiplier sequence is in_Y8[3:0].
  • the multiplicand sequence input to the 4 bit × 4 bit multiplier 6019 is in_X9[3:0], and the multiplier sequence is in_Y9[3:0].
  • in_X1[3:0] represents d1, in_X2[3:0] represents d2, in_X3[3:0] represents d3, and in_X9[3:0] represents d9. Likewise, in_Y1[3:0] represents k1, in_Y2[3:0] represents k2, in_Y3[3:0] represents k3, and in_Y9[3:0] represents k9.
  • each of the above multiplicands corresponds to the multiplicand input sequence of one 4 bit × 4 bit multiplier, and each multiplier corresponds to the multiplier input sequence of one 4 bit × 4 bit multiplier. The remaining inputs follow the same pattern and are not described again here.
  • each 4 bit × 4 bit multiplier multiplies the input sequences it receives to obtain an output result. Because the device shown in FIG. 12 implements a 3 × 3 convolution operation, the output result is
  • Z = (d1 × k1) + (d2 × k2) + (d3 × k3) + (d4 × k4) + (d5 × k5) + (d6 × k6) + (d7 × k7) + (d8 × k8) + (d9 × k9). It follows that the shift control circuit 202 does not shift the output result of any 4 bit × 4 bit multiplier.
  • the thick black solid arrows in FIG. 12 indicate that no shift operation is performed. That is, each selector shown in FIG. 12 passes the product output by its 4 bit × 4 bit multiplier directly to the selection control circuit 203.
  • the selection control circuit 203 then accumulates the nine unshifted products through the accumulation circuit to obtain the result Z of the final 3 × 3 convolution operation. The process by which the selection control circuit 203 accumulates the nine received products is similar to that of FIG. 11 and is not repeated here.
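  • The 3 × 3 case can be sketched the same way. One point worth checking is how wide the accumulator must be. Assuming unsigned 4-bit operands (signedness is not stated here), each product is at most 15 × 15 = 225, and the sum of nine such products is at most 2025, which needs 11 bits. The helper names `conv3x3` and `accumulator_width` are hypothetical.

```python
from typing import Sequence


def conv3x3(d: Sequence[int], k: Sequence[int]) -> int:
    """Z = sum of d_i * k_i for i = 1..9, with unsigned 4-bit operands (assumed)."""
    if len(d) != 9 or len(k) != 9:
        raise ValueError("a 3 x 3 window has nine elements")
    if any(not 0 <= v <= 0xF for v in (*d, *k)):
        raise ValueError("operands must fit in 4 unsigned bits")
    products = [x * y for x, y in zip(d, k)]  # nine unshifted products
    return sum(products)                      # accumulation circuit


def accumulator_width(num_products: int, a_bits: int, b_bits: int) -> int:
    """Bits needed to hold the worst-case unsigned sum of num_products products."""
    max_product = ((1 << a_bits) - 1) * ((1 << b_bits) - 1)
    return (num_products * max_product).bit_length()


print(conv3x3([1] * 9, list(range(9))))  # 36
print(accumulator_width(9, 4, 4))        # 11  (9 * 15 * 15 = 2025)
```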
  • the apparatus provided by the present application can also be used to implement the functions of K multipliers.
  • the digital signal processing device 20 can be used as four independent 8-bit by 8-bit multipliers.
  • each shift circuit in the shift control circuit 202 passes the product output by its multiplier directly to the selection control circuit 203. That is, the selector included in each shift circuit outputs the unshifted product to the selection control circuit 203. The selection control circuit 203 then directly outputs the unshifted product from each shift circuit's selector, which gives the outputs of four 8-bit by 8-bit multipliers, for example, the output results Z1, Z2, Z3, and Z4 shown in FIG.
  • the first configuration information received by the configuration information receiving circuit 30 and/or the second configuration information indicates that the shift control circuit 202 does not perform a shift operation on the received K products, and the configuration information receiving circuit 30 receives The second configuration information indicates that the selection control circuit 203 directly outputs the received K processed results.
  • the inputs of the 8-bit × 8-bit multiplier 7011 are X1 (in_X1[7:0]) and Y1 (in_Y1[7:0]), and the inputs of the 8-bit × 8-bit multiplier 7012 are X2 (in_X2[7:0]) and Y2 (in_Y2[7:0]). The inputs of the 8-bit × 8-bit multiplier 7013 are X3 (in_X3[7:0]) and Y3 (in_Y3[7:0]), and the inputs of the 8-bit × 8-bit multiplier 7014 are X4 (in_X4[7:0]) and Y4 (in_Y4[7:0]).
  • none of the shift circuits in the shift control circuit 202 in Fig. 13 shifts the product it receives from its 8-bit × 8-bit multiplier. Each shift circuit therefore outputs that product directly to the selection control circuit 203 (that is, each shift circuit outputs an unshifted product to the selection control circuit 203 through its selector), and the selection control circuit 203 directly outputs the unshifted product received from each shift circuit.
  • the selection control circuit 203 in the apparatus provided in the present application may include an accumulation circuit. In a scenario in which no accumulation operation is required, the output of the selection control circuit may be configured to bypass the accumulation circuit, so that no accumulation is performed.
  • the selection control circuit 203 can simultaneously output the processing result of the accumulation circuit and the processing result without the accumulation circuit, and the module or device connected to the digital signal processing device provided by the embodiment of the present application can be selectively used according to requirements.
  • the selection control circuit 203 can also output the output of any one or more accumulators according to the calculation requirement.
  • the shift control circuit 202 does not perform shift processing on one of the received K products, and performs shift processing on the other K-1 products in the received K products to obtain K processed products.
  • the K processed products are output to the selection control circuit 203, and the selection control circuit 203 adds the K processed products and outputs the sum.
  • the shift processing performed on any one of the other K-1 products may be determined by two bit positions: the position that the multiplicand generating that product occupies in the M-bit multiplicand, and the position that the multiplier generating that product occupies in the N-bit multiplier.
  • the shift control circuit 202 shifts each of the other K-1 products according to the shift direction and/or the number of shift bits included in the indication information received by the shift circuit corresponding to that product.
  • the direction in which each shift circuit shifts its received product, and the specific number of bits shifted, can be determined as follows:
  • the K multiplicand sequences received by the K multipliers are obtained by splitting a source multiplicand sequence, from high bits to low bits, according to the bit widths of the multipliers. Likewise, the K multiplier sequences received by the K multipliers are obtained by splitting a source multiplier sequence, from high bits to low bits, according to the bit widths of the multipliers.
  • each shift circuit determines the number of bits by which to shift its received product from two bit positions: that of its multiplier's multiplicand within the source multiplicand, and that of its multiplier's multiplier input within the source multiplier.
  • for example, the M-bit source multiplicand is X[9:0], which can be split into X[8:6], X[5:3], and X[2:0], and the N-bit source multiplier input sequence is Y[15:0], which can be split into Y[14:10], Y[9:5], and Y[4:0]. Suppose the multiplicand received by a multiplier is X[8:6] and its multiplier is Y[14:10]. The shift control circuit 202 then controls the shift register connected to that multiplier to shift the multiplier's product left by 16 bits (6 + 10). If instead the multiplicand received by a multiplier is X[5:3] and its multiplier is Y[14:10], the shift control circuit 202 controls the shift register connected to that multiplier to shift the product left by 13 bits (3 + 10).
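  • In the examples above, the shift amount equals the lowest bit index of the multiplicand slice plus the lowest bit index of the multiplier slice. The sketch below checks this rule by rebuilding a full product from the slice products. It assumes unsigned operands that fit in X[8:0] and Y[14:0], because the split given above does not cover X[9] or Y[15]. The names `shift_amount`, `bits`, and `split_multiply` are hypothetical.

```python
from typing import Sequence, Tuple

Slice = Tuple[int, int]  # (msb, lsb), e.g. (8, 6) for X[8:6]


def shift_amount(x_slice: Slice, y_slice: Slice) -> int:
    """Left shift for the product of two slices: the sum of their LSB positions."""
    return x_slice[1] + y_slice[1]


def bits(value: int, s: Slice) -> int:
    """Extract value[msb:lsb] as an unsigned integer."""
    msb, lsb = s
    return (value >> lsb) & ((1 << (msb - lsb + 1)) - 1)


def split_multiply(x: int, y: int,
                   x_slices: Sequence[Slice], y_slices: Sequence[Slice]) -> int:
    """Rebuild x * y from small slice products, each shifted by shift_amount()."""
    total = 0
    for xs in x_slices:
        for ys in y_slices:
            total += (bits(x, xs) * bits(y, ys)) << shift_amount(xs, ys)
    return total


X_SLICES = [(8, 6), (5, 3), (2, 0)]
Y_SLICES = [(14, 10), (9, 5), (4, 0)]
print(shift_amount((8, 6), (14, 10)))  # 16
print(shift_amount((5, 3), (14, 10)))  # 13
print(split_multiply(363, 23130, X_SLICES, Y_SLICES) == 363 * 23130)  # True
```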
  • take as an example a digital signal processing apparatus that includes four 8-bit × 8-bit multipliers used to implement a 16-bit × 16-bit multiplier.
  • the inputs of the 8bit × 8bit multiplier 8011 are in_X1[7:0] and in_Y1[7:0], and the inputs of the 8bit × 8bit multiplier 8012 are in_X2[7:0] and in_Y2[7:0]. The inputs of the 8bit × 8bit multiplier 8013 are in_X3[7:0] and in_Y3[7:0], and the inputs of the 8bit × 8bit multiplier 8014 are in_X4[7:0] and in_Y4[7:0].
  • the 16-bit source multiplicand X[15:0] and the 16-bit source multiplier Y[15:0] may be split in bit order from high to low.
  • the multiplicands X[15:8] and X[7:0] and the multipliers Y[15:8] and Y[7:0] are stored in the registers shown in Figure 15, where X[7:0] serves as X1[7:0], X[15:8] as X2[7:0], Y[15:8] as Y1[7:0], and Y[7:0] as Y2[7:0]. Specifically, X1[7:0] can be used as the multiplicand input in_X1[7:0] of the 8bit × 8bit multiplier 8011 and as the multiplicand input in_X2[7:0] of the 8bit × 8bit multiplier 8012. X2[7:0] can be used as the multiplicand input in_X3[7:0] of the 8bit × 8bit multiplier 8013 and as the multiplicand input in_X4[7:0] of the 8bit × 8bit multiplier 8014; Y
  • the shift circuit 9011 does not perform the shift operation on the received product
  • the shift circuit 9012 and the shift circuit 9013 shift the received product to the left by 8 bits
  • the shift circuit 9014 shifts the received product to the left by 16 bits.
  • the selector included in the shift circuit 9011 outputs the unshifted product to the selection control circuit 203
  • the selector included in the shift circuit 9012 outputs the product of the left shift of 8 bits to the selection control circuit 203
  • the selector included in the shift circuit 9013 outputs the product shifted left by 8 bits to the selection control circuit 203
  • the selector included in the shift circuit 9014 outputs the product shifted left by 16 bits to the selection control circuit 203.
  • the first configuration information may correspondingly indicate that the shifting circuit performs a corresponding shift or no shift operation
  • the third configuration information may correspondingly instruct the selector to select a corresponding processed product output to the selection control circuit.
  • the selection control circuit 203 shown in FIG. 15 accumulates the one unshifted product and the three shifted products output by the shift control circuit 202 and outputs the result. The procedure by which the selection control circuit 203 performs this accumulation is similar to that described in the foregoing embodiment for the convolution operation, and the details are not described herein again.
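  • The 16-bit × 16-bit case can be checked with a short sketch. The truncated text above does not fully state which multiplier-input half goes to which 8-bit multiplier. The pairing below is inferred from the stated shifts (none for 9011, 8 bits for 9012 and 9013, 16 bits for 9014) together with the multiplicand assignment above. Unsigned operands are assumed, and `mul16x16` is a hypothetical name.

```python
def mul16x16(x: int, y: int) -> int:
    """16 x 16 product from four 8 x 8 products (pairing inferred from the shifts)."""
    if not (0 <= x <= 0xFFFF and 0 <= y <= 0xFFFF):
        raise ValueError("operands must fit in 16 unsigned bits")
    x_lo, x_hi = x & 0xFF, x >> 8  # X[7:0], X[15:8]
    y_lo, y_hi = y & 0xFF, y >> 8  # Y[7:0], Y[15:8]
    p_8011 = x_lo * y_lo           # shift circuit 9011: no shift
    p_8012 = x_lo * y_hi           # shift circuit 9012: left 8
    p_8013 = x_hi * y_lo           # shift circuit 9013: left 8
    p_8014 = x_hi * y_hi           # shift circuit 9014: left 16
    return p_8011 + (p_8012 << 8) + (p_8013 << 8) + (p_8014 << 16)


print(mul16x16(0xFFFF, 0xFFFF) == 0xFFFF * 0xFFFF)  # True
print(mul16x16(12345, 54321) == 12345 * 54321)      # True
```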
  • the digital signal processing apparatus shown in FIG. 16 includes nine 4 bit x 4 bit multipliers that can be used to implement the functions of a 12 bit x 12 bit multiplier.
  • the selector in the shift circuit 1111 outputs the unshifted product to the selection control circuit 203. Each of the remaining selectors in the shift control circuit 202 outputs the shifted product from the shift register connected to it to the selection control circuit 203.
  • the function and processing procedure of the selection control circuit 203 in FIG. 16 and the selection control circuit 203 in FIG. 15 are the same, and the details are not described herein again.
  • the digital signal processing apparatus shown in FIG. 17 includes ab A-bit × B-bit multipliers. Their multiplicands are in_X1[A:0], in_X2[A:0], in_X3[A:0], in_X4[A:0]...in_Xab[A:0], and their multipliers are in_Y1[B:0], in_Y2[B:0], in_Y3[B:0], in_Y4[B:0]...in_Yab[B:0].
  • the output may be the output result when each of the ab A-bit × B-bit multipliers is used independently, or multiplication by the ab A-bit × B-bit multipliers.
  • the digital signal processing apparatus includes six 9-bit x 8-bit multipliers, which can be used to implement the functions of a 27-bit x 16-bit multiplier.
  • A1[17:9] is used as in_X3[8:0] and in_X4[8:0], which are input to the 9-bit × 8-bit multiplier 603 and the 9-bit × 8-bit multiplier 604, respectively.
  • A2[26:18] is used as in_X5[8:0] and in_X6[8:0], which are input to the 9-bit × 8-bit multiplier 605 and the 9-bit × 8-bit multiplier 606, respectively.
  • the selector in the shift circuit 701 outputs the unshifted product to the selection control circuit 203. Each of the remaining selectors in the shift control circuit 202 outputs the shifted product received from its shift register to the selection control circuit 203. The selection control circuit 203 accumulates the one unshifted product and the remaining shifted products output by the shift control circuit 202.
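  • The 27-bit × 16-bit configuration can be sketched in the same way, as six slice products with shifts 9i + 8j. The three multiplicand slices [8:0], [17:9] and [26:18] come from the text above. The split of the 16-bit multiplier into [7:0] and [15:8] is an assumption, since that part of the description is not in this section. The sketch assumes unsigned operands, and `shift_config` and `mul27x16` are hypothetical names. Only one product receives a shift of 0, which matches the single unshifted path through shift circuit 701.

```python
from typing import List, Tuple

A_SLICES = [(8, 0), (17, 9), (26, 18)]  # multiplicand pieces named in the text
B_SLICES = [(7, 0), (15, 8)]            # assumed 8-bit pieces of the multiplier


def shift_config() -> List[Tuple[Tuple[int, int], Tuple[int, int], int]]:
    """(A slice, B slice, left shift) for each of the six 9 x 8 multipliers."""
    return [(a, b, a[1] + b[1]) for a in A_SLICES for b in B_SLICES]


def mul27x16(a: int, b: int) -> int:
    """Rebuild a 27 x 16 unsigned product from six shifted 9 x 8 products."""
    if not (0 <= a < (1 << 27) and 0 <= b < (1 << 16)):
        raise ValueError("operands must fit in 27 and 16 unsigned bits")
    total = 0
    for (a_msb, a_lsb), (b_msb, b_lsb), shift in shift_config():
        a_part = (a >> a_lsb) & ((1 << (a_msb - a_lsb + 1)) - 1)  # 9 bits
        b_part = (b >> b_lsb) & ((1 << (b_msb - b_lsb + 1)) - 1)  # 8 bits
        total += (a_part * b_part) << shift  # shift 0: passed through unshifted
    return total


print(sorted(s for _, _, s in shift_config()))  # [0, 8, 9, 17, 18, 26]
a, b = (1 << 27) - 1, (1 << 16) - 1
print(mul27x16(a, b) == a * b)  # True
```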
  • FIG. 19 takes as an example a digital signal processing apparatus that includes six 9-bit × 8-bit multipliers and computes the product of the matrix [d1 d2 d3] and a second matrix.
  • d1 is used as in_X1[8:0] and in_X4[8:0], which are input as multiplicands to the 9bit × 8bit multiplier 601 and the 9bit × 8bit multiplier 604, respectively;
  • d2 is used as in_X2[8:0] and in_X5[8:0], which are input as multiplicands to the 9bit × 8bit multiplier 602 and the 9bit × 8bit multiplier 605, respectively;
  • d3 is used as in_X3[8:0] and in_X6[8:0], which are input as multiplicands to the 9bit × 8bit multiplier 603 and the 9bit × 8bit multiplier 606, respectively;
  • k1 is used as in_Y1[7:0] and in_Y2[7:0], which are input to the 9-bit × 8-bit multiplier 601 and the 9-bit × 8-bit multiplier 602, respectively.
  • FIG. 20 is an example of realizing six independent 9-bit by 8-bit multiplier functions by a digital signal processing apparatus including six 9-bit by 8-bit multipliers.
  • the input of each 9-bit × 8-bit multiplier in Fig. 20 is the same as that of the corresponding 9 bit × 8 bit multiplier in Fig. 18.
  • the difference between Fig. 20 and Fig. 18 is that the shift control circuit 202 of Fig. 20 does not shift the received K products. That is, none of the shift circuit 701, the shift circuit 702, ..., and the shift circuit 706 shifts the product it receives. Therefore, in this apparatus:
  • Z1 = in_X1[8:0] × in_Y1[7:0];
  • Z2 = in_X2[8:0] × in_Y2[7:0];
  • Z3 = in_X3[8:0] × in_Y3[7:0];
  • Z4 = in_X4[8:0] × in_Y4[7:0];
  • any of the digital signal processing devices provided by the embodiments of the present application may be integrated into the programmable logic device.
  • the programmable logic device can be an FPGA.
  • FIG. 21 is a schematic flowchart of a digital signal processing method provided by the present application. The method can be applied to a digital signal processing apparatus, for example, any of the digital signal processing apparatuses shown in FIG. 3 to FIG., and includes:
  • the shift control circuit performs shift control processing on the K products, obtains K processed products, and outputs the K processed products to the selection control circuit.
  • the selection control circuit outputs an accumulation result of the K processed products or outputs the K processed products.
  • step S102 is specifically implemented as follows: each of the K shift circuits included in the shift control circuit performs shift control processing on one of the K products, so that K processed products are obtained, and each shift circuit outputs its processed product to the selection control circuit.
  • the shift control process includes: shifting or not shifting. That is, any one of the K products may or may not be shifted, and the shift control processes of the K products may be the same or different.
  • the shift control circuit includes K shift circuits, and any one of the K shift circuits includes a shift register and a selector, wherein one end of the selector and the shift register Connect, the other end of the selector is connected to a multiplier.
  • the selector receives the output of the shift register and multiplier and selects one of them as its own output.
  • the shift register is used to shift the product of the multiplier output and then output to the selector.
  • the method provided by the application is used to implement at least one of the following functions: a multiply-accumulate function, the function of K multipliers, and the function of an M × N multiplier. Here, an M × N multiplier denotes a multiplier whose multiplicand bit width is M bits and whose multiplier bit width is N bits, and which satisfies
  • step S102 is specifically implemented as follows: the shift control circuit does not shift the K products, obtains K unshifted products, and outputs the K unshifted products to the selection control circuit.
  • the K shift circuits included in the shift control circuit do not shift the products they respectively receive, so that K unshifted products are obtained.
  • each of the K shift circuits takes the product output by its multiplier as one of the K unshifted products.
  • step S103 can be specifically implemented as follows: the selection control circuit accumulates the received K unshifted products and outputs the accumulated result.
  • step S102 may also be specifically implemented as follows: the shift control circuit does not shift the K products, obtains K unshifted products, and outputs them to the selection control circuit.
  • step S103 can then be specifically implemented as follows: the selection control circuit directly outputs the received K unshifted products.
  • step S102 may also be specifically implemented as follows: the shift control circuit does not shift one of the received K products but shifts the other K-1 products, obtaining K processed products, and outputs the K processed products to the selection control circuit;
  • step S103 can then be specifically implemented as follows: the selection control circuit adds the received K processed products and outputs the sum.
  • the method provided by the application further includes: S104: Receive first configuration information, where the first configuration information is used to instruct the shift control circuit to perform a shift control process on the received K products.
  • the first configuration information includes indication information configured for at least one shift circuit in the shift control circuit. The indication information instructs that shift circuit to perform shift control processing on the product it receives.
  • the indication information is further used to indicate a shift direction and/or a shift bit number when the shift control process is performed by the at least one shift circuit on the product received by the at least one shift circuit.
  • the first configuration information includes indication information configured for each shift circuit.
  • the method provided by the application further includes: S105: Receive second configuration information, where the second configuration information is used to instruct the selection control circuit to output an accumulated result of the K processed products or output the K processed products.
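  • The method steps and the two kinds of configuration information can be tied together in one behavioural sketch. It makes several assumptions. The K multiplications are treated as the step before S102, because that step's text is not in this section. Each shift circuit's indication is modelled as a small record holding "shift or not", a direction, and a bit count. The second configuration information is reduced to a boolean. The names `ShiftIndication` and `dsp_method` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence, Union


@dataclass
class ShiftIndication:
    """Hypothetical per-shift-circuit entry of the first configuration information."""
    shift: bool = False
    direction: str = "left"  # "left" or "right"
    bits: int = 0


def dsp_method(xs: Sequence[int], ys: Sequence[int],
               first_config: Sequence[ShiftIndication],
               accumulate: bool) -> Union[int, List[int]]:
    if not (len(xs) == len(ys) == len(first_config)):
        raise ValueError("need one multiplicand, multiplier and indication per multiplier")
    # K multipliers (assumed to precede S102).
    products = [x * y for x, y in zip(xs, ys)]
    # S102: shift control according to the first configuration information (S104).
    processed = []
    for p, ind in zip(products, first_config):
        if not ind.shift:
            processed.append(p)
        elif ind.direction == "left":
            processed.append(p << ind.bits)
        elif ind.direction == "right":
            processed.append(p >> ind.bits)
        else:
            raise ValueError(f"unknown direction {ind.direction!r}")
    # S103: selection control according to the second configuration information (S105).
    return sum(processed) if accumulate else processed


# M x N mode: a 16 x 16 product from four 8 x 8 products.
x, y = 0xABCD, 0x1234
xs = [x & 0xFF, x & 0xFF, x >> 8, x >> 8]
ys = [y & 0xFF, y >> 8, y & 0xFF, y >> 8]
cfg = [ShiftIndication(), ShiftIndication(True, "left", 8),
       ShiftIndication(True, "left", 8), ShiftIndication(True, "left", 16)]
print(dsp_method(xs, ys, cfg, accumulate=True) == x * y)  # True
# K independent multipliers: no shifts, no accumulation.
print(dsp_method([2, 3, 4, 5], [6, 7, 8, 9], [ShiftIndication()] * 4, accumulate=False))  # [12, 21, 32, 45]
```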
  • the present application provides a programmable logic device (PLD), the programmable logic device including at least the digital signal processing device as described in any one of embodiments of FIGS. 3 to 20.
  • the programmable logic device can be applied to scenarios that require digital signal processing (for example, multiply-accumulate calculation), such as radar, artificial intelligence, deep learning, image processing, video processing, wireless baseband/intermediate-frequency radio, satellite navigation, and the like.
  • the programmable logic device can be a field programmable gate array (FPGA) as shown in FIG. Specifically, the digital signal processing device may be a digital signal processor (DSP) as shown in FIG.
  • the programmable logic device can also be a Complex Programmable Logic Device (CPLD) or an Erasable Programmable Logic Device (EPLD).
  • the programmable logic device may also include Electrically Erasable Programmable Read-Only Memory (EEPROM), Programmable Array Logic (PAL), Generic Array Logic (GAL), and the like.
  • the FPGA provided in this application can be used in industrial control, consumer electronics, and other related fields. Because it contains a large number of DSP hard core units, it provides substantial multiply-add capability. The digital signal processing device provided by the present application includes a selection control circuit and a shift control circuit, so configuring these two circuits allows an FPGA that uses the device to implement a variety of arithmetic functions.
  • through its DSP hard cores, the FPGA provided by the present application supplies a large amount of multiply-add capability. It can therefore be used as a deep learning processor for the most important deep learning algorithms in artificial intelligence, especially the Convolutional Neural Network (CNN).
  • current FPGAs are based on look-up table technology and integrate hard core (Hardcore, ASIC-type) modules for common functions such as RAM, clock management, and DSP; because a lookup-table-based FPGA has a very high
  • the main components of an FPGA are: programmable input/output blocks (I/O Block, IOB), configurable logic resource blocks (CLB), complete clock management, embedded block random access memory (RAM), rich routing resources, DSP hard core units, and other embedded dedicated hardware modules.
  • the programmable logic resource unit in Figure 22 can be programmed to perform a variety of circuits and functions, including programmable Look Up Tables (LUTs) and registers.
  • LUTs programmable Look Up Tables
  • the number of CLBs has reached the millions (K × K).
  • the horizontal and vertical lines represent routing resources; the input and output interconnections of each CLB are made by programming.
  • the routing resources connect the various programmable resources in the FPGA; the routing of the FPGA in FIG. 22 uses a short-line interconnect architecture.
  • the DSP represents the digital signal processing hard core unit (DSP Hardcore) inside the FPGA. It can be configured to perform complex signal processing operations such as multiplication, addition, and multiply-accumulate. These operations meet users' signal and image processing needs, such as video decoding and Fourier transforms, and the DSP has become the most important hard core unit for signal processing in an FPGA.
  • DSP Hardcore digital signal processing hard core unit
  • the DSP in FIG. 22 can be any of the digital signal processing devices provided in the embodiments of the present application.
  • most of the work in convolutional neural network processing consists of convolution operations, most typically convolutions of N×N matrices, such as the 2×2, 3×3, or 5×5 convolution operations shown in FIG.
  • performing one convolution operation in parallel requires N multipliers.
  • the N multipliers must be cascaded through logic circuits to form the required multiply-accumulate operation.
  • this logic circuitry is programmable and occupies programmable logic resources outside the DSP. The cascading logic keeps the programmable part from running at a high frequency. Although the DSP hard core can run at a higher frequency, overall computing performance is therefore limited and computing power is wasted.
  • a convolution operation multiplies each multiplicand by the corresponding multiplier value and then accumulates the products.
  • in the present application, a 2×2 convolution operation requires only one of the digital signal processing devices provided in the embodiments of the present application.
  • the multiplicand may be a parameter of data (such as images, sound, or text) received by the device of the present application.
  • the multiplicand may be a parameter of some data (such as image, sound, and text) received by the device in the present application.
  • the multiplier can be a fixed parameter located in the Kernel.
  • the digital signal processing device in the embodiments of the present application is embedded in the FPGA as a hard core. Because it is an ASIC-based hard core circuit, it provides a degree of flexibility while ensuring the best operating frequency and the highest operational efficiency. Because the device has a selection control circuit and a shift control circuit, an FPGA that uses it can implement a variety of arithmetic functions by configuring these two circuits.
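The 2×2 case described above can be sketched as a minimal behavioural model in Python. The model assumes one device with K = 4 multipliers. Its shift control circuit is configured not to shift, and its selection control circuit is configured to accumulate. The function names `conv2x2_one_device` and `conv2d_valid` are hypothetical. As is usual for CNNs, the kernel is not flipped.

```python
def conv2x2_one_device(window, kernel):
    """One 2x2 convolution output from one device with K = 4 multipliers (hypothetical model)."""
    multiplicands = [v for row in window for v in row]  # input data, e.g. pixel values
    multipliers = [w for row in kernel for w in row]    # fixed kernel parameters
    if len(multiplicands) != 4 or len(multipliers) != 4:
        raise ValueError("expected a 2x2 window and a 2x2 kernel")
    # The four multipliers work in parallel.
    products = [a * b for a, b in zip(multiplicands, multipliers)]
    # Shift control circuit configured as "no shift" for every product.
    processed = [p << 0 for p in products]
    # Selection control circuit configured to accumulate.
    return sum(processed)


def conv2d_valid(image, kernel):
    """Slide a 2x2 kernel over an image (stride 1, no padding), one device call per output."""
    rows, cols = len(image), len(image[0])
    return [
        [conv2x2_one_device([image[r][c:c + 2], image[r + 1][c:c + 2]], kernel)
         for c in range(cols - 1)]
        for r in range(rows - 1)
    ]


print(conv2x2_one_device([[1, 2], [4, 5]], [[1, 2], [3, 4]]))  # 37
```

In this sketch each output value uses exactly one device configuration, with no programmable-logic cascade between the multipliers.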

Abstract

A digital signal processing method and device and a programmable logic device, relating to the field of digital circuits. They are used to solve the prior-art problem that using a fixed bit-width multiplier can result in low resource utilization and poor operational performance. The invention comprises K multipliers (201), a shift control circuit (202), and a selection control circuit (203). The i-th multiplier of the K multipliers (201) performs a multiplication whose multiplicand is M_i bits wide and whose multiplier is N_i bits wide. The shift control circuit (202) is connected to the K multipliers (201). It receives the K products output by the K multipliers (201), performs shift control processing on them to obtain K processed products, and outputs the K processed products to the selection control circuit (203). The selection control circuit (203) is connected to the shift control circuit (202). It receives the K processed products sent by the shift control circuit (202) and outputs either the accumulated result of the K processed products or the K processed products.

Description

Digital signal processing method, device and programmable logic device
Technical field
The embodiments of the present application relate to the field of digital circuits, and in particular to a digital signal processing method, apparatus, and programmable logic device.
Background
Programmable logic devices, such as the Field Programmable Gate Array (FPGA), include a Digital Signal Processor Hardcore (DSP Hardcore). The DSP Hardcore can be configured for complex signal operations such as multiplication, addition, and multiply-accumulate. An FPGA can therefore provide the operations required for deep learning (for example, convolution multiplication and accumulation), and the industry commonly uses FPGAs as deep learning processors.
Generally, the component of an FPGA's DSP Hardcore that performs the arithmetic processing is mainly the multiplier. FIG. 1 shows an internal structure of a DSP Hardcore provided in the prior art, in which the bit width of the multiplier 103 is generally fixed at M bits × N bits. The multiplication bit widths required by deep learning vary; for example, the required bit width may be smaller than the multiplier's own bit width. In this case, the prior art usually fills X zeros into the bits of a large-bit-width multiplier (for example, 19 bit × 18 bit), starting from the high-order bits. The large-bit-width multiplier then performs the multiplication of a small-bit-width multiplier (for example, 8 bit × 8 bit). FIG. 2 shows a schematic diagram of using a 19 bit × 18 bit multiplier as an 8 bit × 8 bit multiplier. One input of the 19 bit × 18 bit multiplier is filled with 11 zeros and the other input with 10 zeros, so that the 19 bit × 18 bit multiplier works as an 8 bit × 8 bit multiplier.
However, in the prior art, a large-bit-width multiplier used as a small-bit-width multiplier can only serve as a single small-bit-width multiplier. For example, a 19 bit × 18 bit multiplier can only be used as one 8 bit × 8 bit multiplier. A large-bit-width multiplier cannot be split into multiple smaller-bit-width multipliers that are used at the same time. Filling the high-order bits with 0 to make a large-bit-width multiplier act as a small-bit-width one also wastes a large amount of computing power and computing resources.
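The zero-filling scheme can be illustrated with a small Python sketch. The sketch assumes unsigned operands; signed operands would be sign-extended rather than zero-filled. The function name `zero_pad_multiply` is hypothetical. The "useful fraction" is only a rough proxy for the waste: it is the share of the 19 × 18 partial-product array that can be non-zero when both operands are 8 bits wide.

```python
def zero_pad_multiply(a, b, data_a_bits=8, data_b_bits=8, port_a_bits=19, port_b_bits=18):
    """Run a small unsigned multiplication on a wide hard multiplier by zero-filling its high-order bits."""
    if not (0 <= a < (1 << data_a_bits) and 0 <= b < (1 << data_b_bits)):
        raise ValueError("operand does not fit its data width")
    pad_a = port_a_bits - data_a_bits  # zeros filled into the high-order bits of input A
    pad_b = port_b_bits - data_b_bits  # zeros filled into the high-order bits of input B
    product = a * b                    # the wide multiplier's result equals the small product
    # Rough proxy for wasted hardware: fraction of partial-product bits that can be non-zero.
    useful_fraction = (data_a_bits * data_b_bits) / (port_a_bits * port_b_bits)
    return product, pad_a, pad_b, useful_fraction


product, pad_a, pad_b, useful = zero_pad_multiply(200, 173)
print(product, pad_a, pad_b, round(useful, 3))  # 34600 11 10 0.187
```

The 11 and 10 filled zeros match the FIG. 2 example. Under this proxy, fewer than a fifth of the multiplier's partial-product bits do useful work.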
Summary of the invention
The embodiments of the present application provide a digital signal processing method, device, and programmable logic device. They solve the prior-art problems of low resource utilization and wasted computing performance caused by using multipliers of fixed bit width.
In a first aspect, an embodiment of the present application provides a digital signal processing apparatus, including K multipliers, a shift control circuit, and a selection control circuit. The i-th multiplier of the K multipliers implements a multiplication whose multiplicand is M_i bits wide and whose multiplier is N_i bits wide, where M_i and N_i are both positive integers, i = 1, 2, …, K, and K is an integer greater than or equal to 2. The shift control circuit is connected to the K multipliers. It is configured to receive the K products output by the K multipliers, perform shift control processing on the K products to obtain K processed products, and output the K processed products to the selection control circuit. The selection control circuit is connected to the shift control circuit. It is configured to receive the K processed products sent by the shift control circuit and to output either the accumulated result of the K processed products or the K processed products themselves.
The present application provides a digital signal processing apparatus in which a shift control circuit performs shift control processing on the K products output by the K multipliers to obtain K processed products. A selection control circuit then either outputs the K processed products directly or accumulates them and outputs the result. The K multipliers can therefore perform K separate multiplications and can also perform multiply-accumulate operations. For example, the K multipliers can together implement the function of a larger-bit-width multiplier, or perform the multiply-accumulate required by one convolution in a matrix convolution operation. This makes the fullest use of the computing capability of the apparatus and reduces wasted computing resources. It also allows the apparatus to be applied in different scenarios or devices according to different digital signal processing requirements. The apparatus can be embedded as a hard core in a programmable logic device, such as a field programmable gate array (FPGA). Because it has a selection control circuit and a shift control circuit, it offers a degree of flexibility while ensuring a better operating frequency and higher computational efficiency. This makes it suitable for a variety of computing scenarios; for example, in deep learning processing it can serve as an efficient combined arithmetic unit for matrix convolution operations.
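As a rough illustration of the first aspect, the following Python sketch models the apparatus behaviourally. It has K multipliers with per-multiplier widths M_i × N_i, one shift setting per product, and a selection step that either accumulates the products or passes them through. The class and parameter names (`DspDevice`, `ShiftSetting`, `accumulate`) are hypothetical. Operands are assumed unsigned, and the model says nothing about the actual circuit implementation.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple, Union


@dataclass(frozen=True)
class ShiftSetting:
    amount: int = 0    # number of bit positions; 0 means "do not shift"
    left: bool = True  # shift direction (hypothetical encoding)


class DspDevice:
    def __init__(self, widths: Sequence[Tuple[int, int]]):
        """widths[i] = (M_i, N_i): multiplicand and multiplier bit widths of multiplier i."""
        if len(widths) < 2:
            raise ValueError("K must be an integer >= 2")
        self.widths = list(widths)

    def run(self, multiplicands: Sequence[int], multipliers: Sequence[int],
            shifts: Sequence[ShiftSetting], accumulate: bool) -> Union[int, List[int]]:
        k = len(self.widths)
        if not (len(multiplicands) == len(multipliers) == len(shifts) == k):
            raise ValueError("need exactly K operands and K shift settings")
        # K multipliers: the i-th computes an M_i-bit x N_i-bit product.
        products = []
        for (m_bits, n_bits), a, b in zip(self.widths, multiplicands, multipliers):
            if not (0 <= a < (1 << m_bits) and 0 <= b < (1 << n_bits)):
                raise ValueError("operand exceeds its multiplier's bit width")
            products.append(a * b)
        # Shift control circuit: one shift setting per product.
        processed = [p << s.amount if s.left else p >> s.amount
                     for p, s in zip(products, shifts)]
        # Selection control circuit: accumulated result, or the K processed products.
        return sum(processed) if accumulate else processed


dev = DspDevice([(8, 8)] * 4)
no_shift = [ShiftSetting()] * 4
print(dev.run([1, 2, 3, 4], [5, 6, 7, 8], no_shift, accumulate=False))  # [5, 12, 21, 32]
print(dev.run([1, 2, 3, 4], [5, 6, 7, 8], no_shift, accumulate=True))   # 70
```

The same instance covers the K-multiplier mode (no shift, pass-through) and the multiply-accumulate mode (no shift, accumulate). Only the shift settings and the `accumulate` flag change between them, which mirrors the idea that the modes differ in configuration, not structure.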
With reference to the first aspect, in a first possible implementation of the first aspect, the shift control circuit includes K shift circuits, each connected to one of the K multipliers. Each shift circuit receives the product output by its multiplier and performs shift control processing on that product to obtain one processed product. Configuring one shift circuit for each of the K multipliers allows the product of each multiplier to be shifted more flexibly and precisely.
With reference to the first aspect or the first possible implementation of the first aspect, in a second possible implementation of the first aspect, the shift control processing includes shifting or not shifting. That is, any one of the K products may or may not be shifted, and the shift control processing applied to the K products may be the same or different. Configuring shift or no-shift operations lets the apparatus implement either the function of a large-bit-width multiplier or the individual function of each multiplier.
With reference to any one of the first aspect to the second possible implementation of the first aspect, in a third possible implementation of the first aspect, the selection control circuit includes an accumulation circuit configured to accumulate the K processed products. Accumulating the K processed products lets the apparatus provided by the present application perform a multiply-accumulate operation or implement a multiplier of larger bit width.
With reference to any one of the first aspect to the third possible implementation of the first aspect, in a fourth possible implementation of the first aspect, the digital signal processing apparatus provided by the embodiments of the present application is used to implement at least one of the following functions: a multiply-accumulate function; the function of the K multipliers; and the function of an M×N multiplier, where an M×N multiplier is a multiplier whose multiplicand is M bits wide and whose multiplier is N bits wide, and which satisfies
Figure PCTCN2017095061-appb-000001
With reference to any one of the first aspect to the fourth possible implementation of the first aspect, in a fifth possible implementation of the first aspect, the apparatus may be used to perform a multiply-accumulate operation. In that case, the shift control circuit outputs the K products received from the K multipliers directly to the selection control circuit without shift processing. The selection control circuit accumulates the K received products and outputs the result. (Shift processing may be applied to any one of the other K-1 products, as in the M×N multiplier function. It is then performed according to two bit positions: the position that the multiplicand generating that product occupies within the M-bit multiplicand, and the position that the multiplier generating that product occupies within the N-bit multiplier.) The multiply-accumulate operation can thus be implemented without changing the structure of the apparatus. The shift control circuit simply does not shift the K received products, and the selection control circuit accumulates and outputs them.
With reference to any one of the first aspect to the fourth possible implementation of the first aspect, in a sixth possible implementation of the first aspect, the apparatus may be used to implement the function of K multipliers. In that case, the shift control circuit outputs the K products received from the K multipliers to the selection control circuit without shift processing, and the selection control circuit outputs the K received products. The function of each individual multiplier can thus be implemented without changing the structure of the apparatus. The shift control circuit simply does not shift the K received products, and the selection control circuit outputs the unshifted products directly. This widens the range of applications of the apparatus.
With reference to any one of the first aspect to the fourth possible implementation of the first aspect, in a seventh possible implementation of the first aspect, the apparatus may be used to implement the function of an M×N multiplier. In that case, the shift control circuit does not shift one of the K received products and shifts the other K-1 products. It outputs the resulting K processed products to the selection control circuit, which accumulates them and outputs the result. In this way, the shift control circuit shifts the product of each of the K multipliers by a different number of bits, implementing the function of a large-bit-width multiplier.
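The seventh implementation can be made concrete with a worked example. The following Python sketch assumes unsigned operands and K = 4 multipliers of 8 × 8 bits combined into a 16 × 16 multiplier. Each operand is split into a low byte and a high byte. Each partial product is shifted by the sum of the bit positions of its multiplicand slice and its multiplier slice, so exactly one product (low × low) is left unshifted. The function name is hypothetical, and signed operands are not covered by this sketch.

```python
def mul16x16_from_four_8x8(a: int, b: int) -> int:
    """Unsigned 16 x 16 multiplication built from K = 4 unsigned 8 x 8 products, shifts, and one adder."""
    if not (0 <= a < 1 << 16 and 0 <= b < 1 << 16):
        raise ValueError("operands must be unsigned 16-bit values")
    a_lo, a_hi = a & 0xFF, a >> 8  # multiplicand slices at bit positions 0 and 8
    b_lo, b_hi = b & 0xFF, b >> 8  # multiplier slices at bit positions 0 and 8
    # (8 x 8 product, shift = multiplicand slice position + multiplier slice position)
    terms = [
        (a_lo * b_lo, 0),   # the one product that is not shifted
        (a_hi * b_lo, 8),
        (a_lo * b_hi, 8),
        (a_hi * b_hi, 16),
    ]
    # Shift control circuit, then the selection control circuit accumulates.
    return sum(p << s for p, s in terms)


assert mul16x16_from_four_8x8(0xBEEF, 0x1234) == 0xBEEF * 0x1234
```

The same pattern extends to other splits, for example a 16 × 8 multiplier from K = 2 products with shifts 0 and 8.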
With reference to any one of the first aspect to the seventh possible implementation of the first aspect, in an eighth possible implementation of the first aspect, the apparatus further includes a configuration information receiving circuit configured to receive first configuration information. The first configuration information instructs the shift control circuit to perform shift control processing on each of the K received products. With this configuration, the shift control circuit can perform or skip a shift operation on the product output by each multiplier.
With reference to any one of the first aspect to the eighth possible implementation of the first aspect, in a ninth possible implementation of the first aspect, the first configuration information includes indication information configured for at least one shift circuit in the shift control circuit. The indication information instructs the at least one shift circuit to perform shift control processing on the product it receives. Configuring indication information for each shift circuit allows each shift circuit to shift or not shift its received product according to the indication information it receives.
With reference to any one of the first aspect to the ninth possible implementation of the first aspect, in a tenth possible implementation of the first aspect, the indication information further indicates the shift direction and/or the number of shift bits that the at least one shift circuit uses when performing shift control processing on the product it receives.
With reference to any one of the first aspect to the tenth possible implementation of the first aspect, in an eleventh possible implementation of the first aspect, the apparatus further includes a configuration information receiving circuit configured to receive second configuration information. The second configuration information instructs the selection control circuit to output either the accumulated result of the K processed products or the K processed products.
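The first and second configuration information could, for example, be packed into configuration words. The layout below is a hypothetical assumption, not an encoding given in this application. Each shift circuit gets an enable bit, a direction bit, and a 5-bit shift amount. The second configuration is a single mode value that chooses between accumulation and pass-through.

```python
from dataclasses import dataclass
from typing import List, Sequence

AMOUNT_BITS = 5               # hypothetical: shift amounts 0..31
FIELD_BITS = 2 + AMOUNT_BITS  # enable bit + direction bit + amount
MODE_PASS_THROUGH, MODE_ACCUMULATE = 0, 1  # hypothetical second-configuration values


@dataclass(frozen=True)
class ShiftIndication:
    enable: bool  # shift or do not shift
    left: bool    # shift direction
    amount: int   # number of shift bits


def encode_first_config(indications: Sequence[ShiftIndication]) -> int:
    """Pack one indication field per shift circuit, shift circuit 0 in the lowest bits."""
    word = 0
    for i, ind in enumerate(indications):
        if not 0 <= ind.amount < (1 << AMOUNT_BITS):
            raise ValueError("shift amount does not fit the field")
        field = (int(ind.enable) << (AMOUNT_BITS + 1)) | (int(ind.left) << AMOUNT_BITS) | ind.amount
        word |= field << (i * FIELD_BITS)
    return word


def decode_first_config(word: int, k: int) -> List[ShiftIndication]:
    """Recover the K per-shift-circuit indications from a packed word."""
    mask = (1 << FIELD_BITS) - 1
    result = []
    for i in range(k):
        field = (word >> (i * FIELD_BITS)) & mask
        result.append(ShiftIndication(
            enable=bool(field >> (AMOUNT_BITS + 1)),
            left=bool((field >> AMOUNT_BITS) & 1),
            amount=field & ((1 << AMOUNT_BITS) - 1),
        ))
    return result


# Example: configuration for a 16 x 16 multiplier built from four 8 x 8 products.
mxn = [ShiftIndication(False, True, 0), ShiftIndication(True, True, 8),
       ShiftIndication(True, True, 8), ShiftIndication(True, True, 16)]
config = (encode_first_config(mxn), MODE_ACCUMULATE)
```

Keeping the two words separate mirrors the text. The first configuration goes to the shift circuits, and the second goes only to the selection control circuit.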
With reference to any one of the first aspect to the eleventh possible implementation of the first aspect, in a twelfth possible implementation of the first aspect, the digital signal processing apparatus is integrated in a programmable logic device.
In a second aspect, an embodiment of the present application provides a programmable logic device that includes the digital signal processing apparatus described in any one of the first aspect to the eleventh possible implementation of the first aspect. Optionally, the programmable logic device includes at least one of a Field Programmable Gate Array (FPGA), a Complex Programmable Logic Device (CPLD), and an Erasable Programmable Logic Device (EPLD).
In a third aspect, an embodiment of the present application provides a digital signal processing method, including the following steps. First, K multiplication operations are performed to obtain K products, which are output to a shift control circuit. The i-th of the K multiplication operations is a multiplication whose multiplicand is M_i bits wide and whose multiplier is N_i bits wide, where M_i and N_i are both positive integers, i = 1, 2, …, K, and K is an integer greater than or equal to 2. Next, the shift control circuit performs shift control processing on the K products to obtain K processed products and outputs them to a selection control circuit. Finally, the selection control circuit outputs either the accumulated result of the K processed products or the K processed products.
With reference to the third aspect, in a first possible implementation of the third aspect, the step in which the shift control circuit performs shift control processing on the K products to obtain K processed products includes: each of the K shift circuits included in the shift control circuit performs shift control processing on one of the K products, to obtain the K processed products.
With reference to the third aspect or the first possible implementation of the third aspect, in a second possible implementation of the third aspect, the shift control processing includes shifting or not shifting.
With reference to any one of the third aspect to the second possible implementation of the third aspect, in a third possible implementation of the third aspect, the method provided by the embodiments of the present application is used to implement at least one of the following functions: a multiply-accumulate function; the function of K multipliers; and the function of an M×N multiplier, where an M×N multiplier is a multiplier whose multiplicand is M bits wide and whose multiplier is N bits wide, and which satisfies
Figure PCTCN2017095061-appb-000002
Figure PCTCN2017095061-appb-000003
With reference to any one of the third aspect to the third possible implementation of the third aspect, in a fourth possible implementation of the third aspect, the method may be used to perform a multiply-accumulate operation. In that case, the step in which the shift control circuit performs shift control processing on the K products and outputs the K processed products to the selection control circuit includes: the shift control circuit does not shift the K products and outputs the K unshifted products to the selection control circuit. The step in which the selection control circuit outputs the accumulated result of the K processed products or the K processed products includes: the selection control circuit accumulates the K received unshifted products and outputs the accumulated result.
With reference to any one of the third aspect to the fourth possible implementation of the third aspect, in a fifth possible implementation of the third aspect, the method may be used to implement the function of K multipliers. In that case, the step in which the shift control circuit performs shift control processing on the K products and outputs the K processed products to the selection control circuit includes: the shift control circuit does not shift the K products and outputs the K unshifted products to the selection control circuit. (Shift processing may be applied to any one of the other K-1 products, as in the M×N multiplier function. It is then performed according to two bit positions: the position that the multiplicand generating that product occupies within the M-bit multiplicand, and the position that the multiplier generating that product occupies within the N-bit multiplier.) The step in which the selection control circuit outputs the accumulated result of the K processed products or the K processed products includes: the selection control circuit directly outputs the K received unshifted products.
With reference to any one of the third aspect to the fourth possible implementation of the third aspect, in a sixth possible implementation of the third aspect, the method may be used to implement the function of an M×N multiplier. In that case, the step in which the shift control circuit performs shift control processing on the K products and outputs the K processed products to the selection control circuit includes: the shift control circuit does not shift one of the K received products and shifts the other K-1 products, obtaining K processed products that it outputs to the selection control circuit. The step in which the selection control circuit outputs the accumulated result of the K processed products or the K processed products includes: the selection control circuit accumulates the K received processed products and outputs the result.
With reference to any one of the third aspect to the fourth possible implementation of the third aspect, in a seventh possible implementation of the third aspect, the method provided by the embodiments of the present application further includes: receiving first configuration information, which instructs the shift control circuit to perform shift control processing on the K received products.
With reference to any one of the third aspect to the seventh possible implementation of the third aspect, in an eighth possible implementation of the third aspect, the first configuration information includes indication information configured for at least one shift circuit in the shift control circuit. The indication information instructs the at least one shift circuit to perform shift control processing on the product it receives.
With reference to any one of the third aspect to the eighth possible implementation of the third aspect, in a ninth possible implementation of the third aspect, the indication information further indicates the shift direction and/or the number of shift bits used by the at least one shift circuit when performing shift control processing on the product it receives.
With reference to any one of the third aspect to the ninth possible implementation of the third aspect, in a tenth possible implementation of the third aspect, the method provided by the embodiments of the present application further includes: receiving second configuration information, which instructs the selection control circuit to output the accumulated result of the K processed products or to output the K processed products.
附图说明DRAWINGS
FIG. 1 shows the internal structure of a digital signal processor (DSP) in the prior art;
FIG. 2 is a schematic structural diagram of a small-bit-width multiplier implemented with a large-bit-width multiplier in the prior art;
FIG. 3 is a first schematic structural diagram of a digital signal processing apparatus according to the present application;
FIG. 4 is a second schematic structural diagram of a digital signal processing apparatus according to the present application;
FIG. 5 is a third schematic structural diagram of a digital signal processing apparatus according to the present application;
FIG. 6 is a fourth schematic structural diagram of a digital signal processing apparatus according to the present application;
FIG. 7 is a fifth schematic structural diagram of a digital signal processing apparatus according to the present application;
FIG. 8 is a sixth schematic structural diagram of a digital signal processing apparatus according to the present application;
FIG. 9 is a seventh schematic structural diagram of a digital signal processing apparatus according to the present application;
FIG. 10 is an eighth schematic structural diagram of a digital signal processing apparatus according to the present application;
FIG. 11 is a ninth schematic structural diagram of a digital signal processing apparatus according to the present application;
FIG. 12 is a tenth schematic structural diagram of a digital signal processing apparatus according to the present application;
FIG. 13 is an eleventh schematic structural diagram of a digital signal processing apparatus according to the present application;
FIG. 14 is a twelfth schematic structural diagram of a digital signal processing apparatus according to the present application;
FIG. 15 is a thirteenth schematic structural diagram of a digital signal processing apparatus according to the present application;
FIG. 16 is a fourteenth schematic structural diagram of a digital signal processing apparatus according to the present application;
FIG. 17 is a fifteenth schematic structural diagram of a digital signal processing apparatus according to the present application;
FIG. 18 is a sixteenth schematic structural diagram of a digital signal processing apparatus according to the present application;
FIG. 19 is a seventeenth schematic structural diagram of a digital signal processing apparatus according to the present application;
FIG. 20 is an eighteenth schematic structural diagram of a digital signal processing apparatus according to the present application;
FIG. 21 is a schematic flowchart of a digital signal processing method according to the present application;
FIG. 22 is a schematic structural diagram of an FPGA according to the present application;
FIG. 23 is a first schematic diagram of a convolution operation;
FIG. 24 is a second schematic diagram of a convolution operation.
DETAILED DESCRIPTION OF EMBODIMENTS
To describe the technical solutions of the embodiments of the present application clearly, the terms "first", "second", and the like are used in the embodiments to distinguish objects whose names, functions, or effects are similar. A person skilled in the art will understand that these terms do not limit quantity or order of execution.
It should be noted that Q'b0 in the embodiments of the present application denotes Q zero bits. For example, in FIG. 2, 11'b0 denotes 11 zero bits and 10'b0 denotes 10 zero bits.
[M:0] denotes all or part of the bit sequence of input/output data, namely bits M down to 0, which is M+1 bits. For example, [7:0] in FIG. 2 denotes an 8-bit portion of the bit sequence of the input data.
In addition, 2^P in the embodiments of the present invention denotes a left shift by P bits; for example, 2^16 denotes a left shift by 16 bits.
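The three notations above map directly onto ordinary integer operations. The following Python sketch is illustrative only: the helper names `zeros`, `bit_slice`, and `shift_left` are hypothetical and do not come from the patent, and values are treated as unsigned.

```python
def zeros(q: int) -> str:
    """Q'b0: a run of Q zero bits, returned as a bit string (MSB first)."""
    return "0" * q


def bit_slice(value: int, m: int) -> int:
    """[M:0]: keep bits M down to 0 of an unsigned value, i.e. M + 1 bits."""
    return value & ((1 << (m + 1)) - 1)


def shift_left(value: int, p: int) -> int:
    """2^P: a left shift by P bit positions, equal to multiplying by 2**P."""
    return value << p
```

For instance, `bit_slice(0x1FF, 7)` returns `0xFF` (255, eight bits), and `shift_left(3, 16)` equals `3 * 2**16`, i.e. 196608.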
FIG. 3 shows a digital signal processing apparatus 20 according to an embodiment of the present application. As shown in FIG. 3, the apparatus includes K multipliers 201, a shift control circuit 202 connected to the K multipliers 201, and a selection control circuit 203 connected to the shift control circuit 202.
The K multipliers 201 each perform a multiplication in which the multiplicand is M_i bits wide and the multiplier is N_i bits wide, where M_i and N_i are positive integers, i = 1, 2, ..., K, and K is an integer greater than or equal to 2. As shown in FIG. 3, the K multipliers may be an M1×N1 multiplier 2011, an M2×N2 multiplier 2012, ..., and an MK×NK multiplier 201K. The data widths of an M_i×N_i multiplier are expressed in bits.
Specifically, the i-th multiplier (i = 1, 2, ..., K) of the K multipliers 201 obtains a product from the multiplicand and multiplier it receives. The multiplicand may be a first input sequence consisting of M_i bits, and the multiplier may be a second input sequence consisting of N_i bits.
The shift control circuit 202 receives the K products output by the K multipliers, performs shift control processing on the K products to obtain K processed products, and outputs the K processed products to the selection control circuit 203.
The selection control circuit 203 receives the K processed products sent by the shift control circuit 202 and outputs either the accumulated result of the K processed products or the K processed products themselves.
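A behavioral model can make the data flow of apparatus 20 concrete. The sketch below is a hypothetical Python model, not the patent's circuit: it assumes unsigned operands, models every shift as a left shift (an amount of 0 meaning no shift), and reduces the selection control circuit to a flag that chooses between the sum and the list of processed products. The function name `dsp_model` and its parameters are illustrative assumptions.

```python
from typing import List, Sequence, Tuple, Union


def dsp_model(
    operands: Sequence[Tuple[int, int]],  # (multiplicand, multiplier) for multiplier i
    widths: Sequence[Tuple[int, int]],    # (M_i, N_i) for multiplier i
    shifts: Sequence[int],                # left-shift amount per product, 0 = no shift
    accumulate: bool,                     # True: output the sum; False: the K products
) -> Union[int, List[int]]:
    k = len(operands)
    if k < 2 or not (len(widths) == len(shifts) == k):
        raise ValueError("need K >= 2 multipliers with matching widths and shifts")

    products = []
    for (x, y), (m, n) in zip(operands, widths):
        # Each multiplier only accepts an M_i-bit multiplicand and an N_i-bit multiplier.
        if not (0 <= x < (1 << m) and 0 <= y < (1 << n)):
            raise ValueError(f"operand out of range for a {m}x{n} multiplier")
        products.append(x * y)

    # Shift control circuit: shift each product, or pass it through unchanged.
    processed = [p << s for p, s in zip(products, shifts)]

    # Selection control circuit: accumulate, or output the K processed products.
    return sum(processed) if accumulate else processed
```

For instance, `dsp_model([(3, 5), (2, 7), (1, 1)], [(4, 4)] * 3, [5, 10, 0], accumulate=False)` returns `[480, 14336, 1]`, shifting the first two products left by 5 and 10 bits, and the same call with `accumulate=True` returns their sum, 14817.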
Optionally, the K multipliers 201 may share one shift control circuit 202 and one selection control circuit 203. In this case, the single shift control circuit 202 controls whether the product output by each of the K multipliers 201 is shifted, yielding K processed products, and the single selection control circuit 203 controls whether the K processed products are output directly or accumulated before being output.
In one specific example, to shift the product output by each of the K multipliers flexibly and accurately, one shift circuit may be configured for each of the K multipliers. As shown in FIG. 4 (with reference to FIG. 3), the shift control circuit 202 of the present application may include K shift circuits, for example shift circuit 2021, shift circuit 2022, ..., and shift circuit 202K in FIG. 4. Each of the K shift circuits is connected to one of the K multipliers 201, receives the product output by that multiplier, and performs shift control processing on it to obtain one processed product.
In another specific example, two or more of the K multipliers may share one shift circuit. That is, the shift control circuit 202 may include at least one shift circuit that performs shift control processing on each of the products output by the K multipliers. Optionally, shift control processing means either shifting or not shifting. When the shift control circuit 202 does not shift the product output by a multiplier, this can be understood as shifting that product by 0 bits, or as outputting the product directly. Therefore, the K processed products output by the shift control circuit 202 include shifted products and/or unshifted products.
The products output by different multipliers may be shifted by the same or different numbers of bits. For example, the shift control circuit 202 may shift the product output by multiplier 2011 left by 5 bits and the product output by multiplier 2012 left by 10 bits, while leaving the products output by the other K-2 multipliers unshifted.
For example, as shown in FIG. 4, the M1×N1 multiplier 2011 of the K multipliers 201 is connected to shift circuit 2021 of the shift control circuit 202, the M2×N2 multiplier 2012 is connected to shift circuit 2022, and the MK×NK multiplier 201K is connected to shift circuit 202K; the connections between the remaining multipliers and shift circuits are shown in FIG. 4 and are not described again here. Because shift circuit 2021 is connected to the M1×N1 multiplier 2011, for instance, shift circuit 2021 receives the product output by the M1×N1 multiplier 2011 and performs shift control processing on it. Whether each of the K shift circuits shifts or does not shift the product it receives may be determined by the scenario in which the data processing apparatus is used or by configuration information; this is not limited in the embodiments of the present application.
Optionally, as shown in FIG. 4, the apparatus 20 provided by the present application further includes a register connected to each multiplier, which stores the multiplicand and multiplier of the multiplier it is connected to.
Optionally, a shift circuit may include a shift register (shift) and a selector (MUX). The shift register is connected to the multiplier and shifts the product output by that multiplier; the selector is connected to both the shift register and the multiplier, and selects and outputs either the shifted or the unshifted product. In one specific example, as shown in FIG. 5, the shift control circuit 202 includes K shift registers and K selectors (for example, selector 2031, selector 2032, ..., and selector 203K in FIG. 5).
For example, as shown in FIG. 5, one input of selector 2031 is connected to the M1×N1 multiplier 2011 to receive the product it outputs (that is, the unshifted product), and the other input of selector 2031 is connected to a shift register to receive the shifted product output by the shift register. Likewise, one input of selector 2032 is connected to the M2×N2 multiplier 2012 to receive its unshifted product, and the other input is connected to a shift register to receive the shifted product; one input of selector 203K is connected to the MK×NK multiplier to receive its product, and the other input is connected to a shift register to receive the shifted product. The other selectors work and are connected in the same way as selector 203K, as shown in FIG. 5, and are not described again here.
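The shift-register-plus-selector arrangement can be sketched in Python as follows. This is an illustrative model under stated assumptions: the function names are hypothetical, the shift direction is made configurable, and the select polarity (1 forwards the shifted product, 0 the unshifted one) borrows the example values this description gives for the third and fourth indicators.

```python
def shift_register(product: int, bits: int, left: bool = True) -> int:
    """Shift path: move the product by `bits` positions in the chosen direction."""
    return product << bits if left else product >> bits


def mux(unshifted: int, shifted: int, select: int) -> int:
    """2:1 selector: select == 1 forwards the shifted product, 0 the unshifted one."""
    if select not in (0, 1):
        raise ValueError("select must be 0 or 1")
    return shifted if select == 1 else unshifted


def shift_circuit(product: int, bits: int, select: int, left: bool = True) -> int:
    # Both paths are present at the selector inputs; the select signal picks one.
    return mux(product, shift_register(product, bits, left), select)
```

Because both the direct and the shifted product are always computed, changing the select signal alone switches a shift circuit between "shift" and "no shift" without touching the shift register's setting.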
Optionally, as shown in FIG. 6, the apparatus provided by the present application further includes a configuration information receiving circuit 30 that receives first configuration information, which instructs the shift control circuit 202 how to perform shift control processing on the K received products. For example, the first configuration information indicates, for each of the K received products, whether the shift control circuit shifts it or leaves it unshifted.
The shift control processing that each shift circuit performs on the product it receives may differ: some shift circuits need to shift the product they receive, while others need to leave it unshifted. To let each shift circuit apply flexible and accurate shift control processing to its own product, the first configuration information optionally includes first indication information configured for at least one shift circuit in the shift control circuit, which instructs each of the at least one shift circuit how to perform shift control processing on the product it receives. The first indication information may be a first indicator or a second indicator: the first indicator indicates that the received product is to be shifted, and the second indicator indicates that it is not to be shifted. For example, the first indicator may be "0" and the second indicator may be "1". Optionally, the first configuration information may include first indication information configured separately for each shift circuit.
Because each shift circuit may shift its product by a different number of bits, the first configuration information received by the configuration information receiving circuit 30 further includes second indication information configured for the at least one shift circuit. The second indication information indicates the shift direction and/or the number of bits by which each of the at least one shift circuit shifts the product it receives. Configuring the second indication information lets each shift circuit perform shift control processing on its product precisely. Optionally, the first configuration information may include second indication information configured separately for each shift circuit.
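To illustrate how first and second indication information might be carried for one shift circuit, the sketch below packs them into a small configuration word. The 7-bit layout (bit 0 as the shift indicator, bit 1 as direction, bits 6:2 as shift amount) and all names are purely hypothetical; only the indicator polarity (0 = shift, 1 = no shift) follows the example values given in this description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ShiftConfig:
    shift: bool  # first indication information: shift or not
    left: bool   # second indication information: direction
    bits: int    # second indication information: number of bits


def encode(shift: bool, left: bool, bits: int) -> int:
    # Hypothetical layout: [0] indicator (0 = shift, 1 = no shift),
    # [1] direction (0 = left, 1 = right), [6:2] shift amount (0..31).
    if not 0 <= bits < 32:
        raise ValueError("shift amount must fit in 5 bits")
    return (0 if shift else 1) | ((0 if left else 1) << 1) | (bits << 2)


def decode(word: int) -> ShiftConfig:
    return ShiftConfig(
        shift=(word & 0b1) == 0,
        left=((word >> 1) & 0b1) == 0,
        bits=(word >> 2) & 0b11111,
    )


def apply(product: int, cfg: ShiftConfig) -> int:
    """Perform shift control processing on one product according to cfg."""
    if not cfg.shift:
        return product
    return product << cfg.bits if cfg.left else product >> cfg.bits
```

With this layout, `encode(True, True, 5)` gives 20, and `apply(15, decode(20))` shifts 15 left by 5 bits to 480; one such word per shift circuit would correspond to configuring the indication information separately for each shift circuit.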
Specifically, the configuration information receiving circuit 30 may be a pin or a connection line of the digital signal processing apparatus 20. In one specific example, when the shift control circuit 202 includes a shift register, the configuration information receiving circuit 30 may be a pin of that shift register, through which indication information (such as the first indication information and/or second indication information described above) is input to the shift register.
Optionally, the configuration information receiving circuit 30 may be located in the DSP, or in another controller in the FPGA, such as a central processing unit (CPU); this is not limited in the embodiments of the present application. Any apparatus that can input the first configuration information to the shift control circuit 202 provided in the embodiments of the present application can serve as the configuration information receiving circuit 30 of the present application.
Optionally, the configuration information receiving circuit 30 is further configured to receive third configuration information, which instructs each of the K selectors to choose either the shifted product from its shift register or the unshifted product from its multiplier and output it to the selection control circuit 203. Optionally, the third configuration information includes a first indication configured for each of the K shift circuits, which instructs the selector to choose between the shifted product output by the shift register and the product output by the multiplier, and to output the chosen product to the selection control circuit 203. Optionally, the first indication may be a letter or a number; this is not limited in the present application. For example, the first indication may be a third indicator or a fourth indicator: the third indicator instructs the selector to output the shifted product from the shift circuit to the selection control circuit 203, and the fourth indicator instructs the selector to output the product from the multiplier to the selection control circuit 203. For example, the third indicator may be "1" and the fourth indicator may be "0", so that each selector performs the corresponding processing according to the content of its own first indication.
Optionally, as shown in FIG. 7, the selection control circuit 203 provided by the present application includes an accumulation circuit that accumulates the K processed products. Specifically, the accumulation circuit may include multiple accumulators, for example accumulator 301, accumulator 302, accumulator 303, ..., and accumulator 30(K-1) shown in FIG. 7.
Optionally, in this embodiment, when the accumulation circuit includes multiple accumulators (for example, at least three), the accumulators are cascaded, and the number of accumulators in a preceding stage is one fewer than the number in the adjacent following stage (of two cascaded accumulators, the output of the preceding-stage accumulator is the input of the following-stage accumulator). Optionally, in the embodiments of the present application, an accumulator connected to the shift control circuit 202 is called a first-stage accumulator; one first-stage accumulator accumulates two processed products output by the shift control circuit 202. In one specific example, every two selectors are connected to one first-stage accumulator of the accumulation circuit; for example, selector 2031 and selector 2032 are connected to first-stage accumulator 301.
Optionally, when the number of selectors is odd, one accumulator receives the product output by only one selector and adds it to the result accumulated by the other accumulators. For example, as shown in FIG. 8, the products output by selector 2031 and selector 2032 are accumulated by accumulator 301, the product output by selector 2033 is output to accumulator 302, and the accumulated results output by accumulator 301 and accumulator 302 are finally accumulated and output by accumulator 303. Optionally, when the number of accumulators at any stage is odd, accumulation may be performed in a similar way. FIG. 9 takes a shift control circuit with six selectors as an example: the products output by selector 2031 and selector 2032 are accumulated by first-stage accumulator 301, the products output by selector 2033 and selector 2034 are accumulated by first-stage accumulator 302, and the products output by selector 2035 and selector 2036 are accumulated by first-stage accumulator 303; the accumulated results of accumulator 301 and accumulator 302 are accumulated by accumulator 304, and finally the accumulated result of accumulator 304 and that of accumulator 303 are accumulated by accumulator 305.
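The pairing rule of FIG. 8 and FIG. 9 (add adjacent values at each stage and carry an odd one forward) can be sketched as follows. This is an assumed software model, not the patent's netlist, and `accumulate_tree` is a hypothetical name. It also counts the two-input additions, which always comes to K-1 for K products; a pass-through accumulator such as 302 in FIG. 8 is not counted as an addition.

```python
from typing import List, Tuple


def accumulate_tree(values: List[int]) -> Tuple[int, int]:
    """Return (sum, number of two-input additions) for a cascaded accumulator tree."""
    if not values:
        raise ValueError("need at least one value")
    level = list(values)
    adds = 0
    while len(level) > 1:
        nxt = []
        # Pair adjacent values: each pair feeds one accumulator of this stage.
        for i in range(0, len(level) - 1, 2):
            nxt.append(level[i] + level[i + 1])
            adds += 1
        # An odd value left over is carried to the next stage unchanged.
        if len(level) % 2:
            nxt.append(level[-1])
        level = nxt
    return level[0], adds
```

With six inputs the stages hold 6, 3, 2 and 1 values, mirroring accumulators 301 to 303, then 304, then 305 in FIG. 9.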
Optionally, the configuration information receiving circuit 30 is further configured to receive second configuration information, which instructs the selection control circuit 203 to output either the accumulated result of the K processed products or the K processed products. Specifically, the second configuration information may be a second indication or a third indication: the second indication instructs the selection control circuit 203 to output the K processed products, and the third indication instructs it to output their accumulated result. When the selection control circuit 203 determines that the second configuration information is the second indication, it outputs the K processed products directly; when it determines that the second configuration information is the third indication, it accumulates the K processed products and then outputs the result.
It should be noted that the results of the K multipliers in the digital signal processing apparatus provided by the present application can be used selectively or individually as needed. That is, the apparatus can implement the function of any one or more of the K multipliers, can use the K multipliers to implement a multiply-accumulate function, or can use the K multipliers to implement an M×N multiplier, where an M×N multiplier is one whose multiplicand is M bits wide and whose multiplier is N bits wide, and where M and N satisfy the relation shown in formula image PCTCN2017095061-appb-000004.
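The exact relation between M, N and the M_i, N_i is given only in the formula image. As one well-known illustration of combining small multipliers with shifts into a larger one, the sketch below builds an unsigned 8×8 product from four 4×4 partial products that are shifted left by 8, 4, 4 and 0 bits and then accumulated. This split is an assumption chosen for illustration; signed operands would need extra sign handling, and the patent's own configuration may differ. The function name is hypothetical.

```python
def mul8x8_from_4x4(x: int, y: int) -> int:
    """Unsigned 8x8 multiply composed from four 4x4 multiplies plus shifts."""
    if not (0 <= x < 256 and 0 <= y < 256):
        raise ValueError("unsigned 8-bit operands expected")
    # Split each operand into high and low 4-bit halves: x = xh * 2^4 + xl.
    xh, xl = x >> 4, x & 0xF
    yh, yl = y >> 4, y & 0xF
    # Four 4x4 partial products, each paired with the left shift it needs.
    partials = [(xh * yh, 8), (xh * yl, 4), (xl * yh, 4), (xl * yl, 0)]
    # Shift control, then accumulation.
    return sum(p << s for p, s in partials)
```

This follows from x·y = xh·yh·2^8 + (xh·yl + xl·yh)·2^4 + xl·yl, which is exactly the "shift some products, then accumulate" pattern described above.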
The following describes, with reference to the accompanying drawings, specific applications in different scenarios of the digital signal processing apparatus 20 provided in the embodiments of the present application.
The apparatus provided by the present application can be used to implement a multiply-accumulate function.
As shown in FIG. 10, the shift control circuit 202 outputs the K products received from the K multipliers 201 to the selection control circuit 203 without shifting them, and the selection control circuit 203 accumulates the K received products and outputs the result.
Optionally, the third configuration information received by each selector instructs the selector to output the product from its multiplier directly, and the second configuration information instructs the selection control circuit to output the accumulated result of the K processed products.
In one specific example, the multiply-accumulate operation in the present application may be a convolution operation between matrices.
The digital signal processing apparatus 20 shown in FIG. 11 includes four 8bit×8bit multipliers and can be used to implement a 2×2 matrix convolution. In the matrix convolution, the multiplicands are d1, d2, d3, and d4, and the multipliers are k1, k2, k3, and k4.
As shown in FIG. 11, the multiplicand sequence input to 8bit×8bit multiplier 3011 is in_X1[7:0] and the multiplier sequence is in_Y1[7:0]; the multiplicand sequence input to 8bit×8bit multiplier 3012 is in_X2[7:0] and the multiplier sequence is in_Y2[7:0]; the multiplicand sequence input to 8bit×8bit multiplier 3013 is in_X3[7:0] and the multiplier sequence is in_Y3[7:0]; and the multiplicand sequence input to 8bit×8bit multiplier 3014 is in_X4[7:0] and the multiplier sequence is in_Y4[7:0].
Here in_X1[7:0] represents d1, in_X2[7:0] represents d2, in_X3[7:0] represents d3, and in_X4[7:0] represents d4; in_Y1[7:0] represents k1, in_Y2[7:0] represents k2, in_Y3[7:0] represents k3, and in_Y4[7:0] represents k4. Optionally, the multiplicand sequences and multiplier sequences may be stored in registers.
8bit×8bit multiplier 3011 multiplies in_X1[7:0] by in_Y1[7:0] to obtain a first product; 8bit×8bit multiplier 3012 multiplies in_X2[7:0] by in_Y2[7:0] to obtain a second product; 8bit×8bit multiplier 3013 multiplies in_X3[7:0] by in_Y3[7:0] to obtain a third product; and 8bit×8bit multiplier 3014 multiplies in_X4[7:0] by in_Y4[7:0] to obtain a fourth product.
When the apparatus shown in FIG. 11 implements a 2×2 convolution, its output satisfies Z = (d1×k1) + (d2×k2) + (d3×k3) + (d4×k4). The shift control circuit 202 does not shift the product output by any of the 8bit×8bit multipliers: shift circuit 4011 in the shift control circuit 202 passes the first product output by 8bit×8bit multiplier 3011 to the selection control circuit 203, shift circuit 4012 passes the second product output by 8bit×8bit multiplier 3012, shift circuit 4013 passes the third product output by 8bit×8bit multiplier 3013, and shift circuit 4014 passes the fourth product output by 8bit×8bit multiplier 3014. Specifically, the unshifted first product from shift circuit 4011 and the unshifted second product from shift circuit 4012 are added by first accumulator 501 to obtain a first accumulated result; the unshifted third product from shift circuit 4013 and the unshifted fourth product from shift circuit 4014 are added by second accumulator 502 to obtain a second accumulated result; and first accumulator 501 and second accumulator 502 output the first and second accumulated results to third accumulator 503, which produces the final result of the 2×2 convolution.
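The 2×2 case can be checked numerically. The sketch below mirrors FIG. 11 under the assumption of unsigned 8-bit operands: four unshifted products, two first-stage accumulators (501, 502) and a final accumulator (503). The function name `conv2x2` is hypothetical. With all operands at 255 the result is 4 × 255 × 255 = 260100, which needs 18 bits, so under this assumption the final accumulator would have to be at least that wide.

```python
from typing import Sequence


def conv2x2(d: Sequence[int], k: Sequence[int]) -> int:
    """Z = d1*k1 + d2*k2 + d3*k3 + d4*k4 using the FIG. 11 accumulator layout."""
    if len(d) != 4 or len(k) != 4:
        raise ValueError("expected four data values and four kernel weights")
    for v in (*d, *k):
        if not 0 <= v <= 0xFF:
            raise ValueError("operands must fit in [7:0] (unsigned assumed)")
    p = [di * ki for di, ki in zip(d, k)]  # multipliers 3011-3014, no shift applied
    acc501 = p[0] + p[1]                   # first accumulator
    acc502 = p[2] + p[3]                   # second accumulator
    return acc501 + acc502                 # third accumulator 503 -> Z
```

For example, `conv2x2([1, 2, 3, 4], [5, 6, 7, 8])` evaluates 5 + 12 + 21 + 32 = 70.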
Optionally, as shown in FIG. 11, the digital signal processing apparatus 20 provided in this application further includes a register connected to the selection control circuit 203. The register stores one of two things: the accumulation result output by the selection control circuit 203, or the K processed products from the shift control circuit when they are output directly.
Illustratively, FIG. 12 differs from FIG. 11 in that the digital signal processing apparatus 20 shown in FIG. 12 contains nine 4bit×4bit multipliers: the 4bit×4bit multipliers 6011, 6012, 6013, ..., and 6019 shown in FIG. 12. The digital signal processing apparatus 20 can therefore implement a 3×3 matrix convolution. In this convolution, the multiplicands are d1, d2, d3, d4, d5, d6, d7, d8 and d9, and the multipliers are k1, k2, k3, k4, k5, k6, k7, k8 and k9.
As shown in FIG. 12, the inputs to the 4bit×4bit multipliers are as follows:

- Multiplier 6011: multiplicand sequence in_X1[3:0], multiplier sequence in_Y1[3:0].
- Multiplier 6012: multiplicand sequence in_X2[3:0], multiplier sequence in_Y2[3:0].
- Multiplier 6018: multiplicand sequence in_X8[3:0], multiplier sequence in_Y8[3:0].
- Multiplier 6019: multiplicand sequence in_X9[3:0], multiplier sequence in_Y9[3:0].

Here in_X1[3:0] represents d1, in_X2[3:0] represents d2, in_X3[3:0] represents d3, ..., and in_X9[3:0] represents d9. Likewise, in_Y1[3:0] represents k1, in_Y2[3:0] represents k2, in_Y3[3:0] represents k3, ..., and in_Y9[3:0] represents k9. Each multiplicand corresponds to the multiplicand input sequence of one 4bit×4bit multiplier, and each multiplier corresponds to the multiplier input sequence of one 4bit×4bit multiplier. The remaining inputs follow the same pattern and are not described again here.
Each 4bit×4bit multiplier multiplies its two input sequences and produces one output. Because the apparatus shown in FIG. 12 implements a 3×3 convolution, its output must satisfy Z=(d1×k1)+(d2×k2)+(d3×k3)+(d4×k4)+(d5×k5)+(d6×k6)+(d7×k7)+(d8×k8)+(d9×k9). It follows that the shift control circuit 202 does not shift the output of any 4bit×4bit multiplier.

In FIG. 12, the thick solid black arrows mark this no-shift path. Each selector shown in FIG. 12 passes the product of its 4bit×4bit multiplier to the selection control circuit 203. The selection control circuit 203 then adds the nine unshifted products through the accumulation circuit to obtain the final 3×3 convolution result Z. This accumulation of the nine products works the same way as in FIG. 11; see the description of FIG. 11 for details, which are not repeated here.
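The FIG. 12 mode reduces to a dot product of nine 4-bit operand pairs with no shifts. The sketch below assumes unsigned operands and a row-major window. The function name and the sample values are hypothetical and serve only as illustration.

```python
def conv3x3_window(d, k, width=4):
    """Sketch of the FIG. 12 mode: nine width-bit products, no shifts, one sum.

    d, k: nine multiplicands (d1..d9) and nine multipliers (k1..k9),
    assumed unsigned and at most `width` bits wide.
    """
    if len(d) != 9 or len(k) != 9:
        raise ValueError("a 3x3 window needs nine operand pairs")
    limit = 1 << width
    if any(not 0 <= v < limit for v in (*d, *k)):
        raise ValueError(f"operand does not fit in {width} unsigned bits")

    # Multipliers 6011..6019 each produce one product; every shift is 0.
    products = [di * ki for di, ki in zip(d, k)]
    # Selection control circuit 203 routes the products through the accumulator.
    return sum(products)


# A row-major 3x3 window and kernel (illustrative values only).
window = [1, 2, 3, 4, 5, 6, 7, 8, 9]
kernel = [1, 0, 1, 0, 1, 0, 1, 0, 1]
print(conv3x3_window(window, kernel))  # 1 + 3 + 5 + 7 + 9 = 25
```

With 4-bit unsigned inputs, the largest possible sum is 9 × 15 × 15 = 2025. That fits in 11 bits, which gives a rough idea of the accumulator width this mode would need under the stated assumption.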
The apparatus provided in this application can also implement the functions of K multipliers.
This is described below with reference to FIG. 13 and FIG. 14. In the embodiment shown in FIG. 13, the digital signal processing apparatus 20 can be used as four independent 8bit×8bit multipliers.

Each shift circuit in the shift control circuit 202 passes the product of its multiplier directly to the selection control circuit 203. In other words, the selector in each shift circuit forwards the unshifted product. The selection control circuit 203 then outputs each of these unshifted products directly. The result is the outputs of four 8bit×8bit multipliers, for example the outputs Z1, Z2, Z3 and Z4 shown in FIG. 13.

Optionally, the configuration information receiving circuit 30 receives configuration information that sets up this mode. The first configuration information and/or the second configuration information instructs the shift control circuit 202 not to shift any of the K received products. The second configuration information also instructs the selection control circuit 203 to output the K processed results directly.
Illustratively, as shown in FIG. 13, the inputs of the 8bit×8bit multipliers are:

- Multiplier 7011: X1 (in_X1[7:0]) and Y1 (in_Y1[7:0]).
- Multiplier 7012: X2 (in_X2[7:0]) and Y2 (in_Y2[7:0]).
- Multiplier 7013: X3 (in_X3[7:0]) and Y3 (in_Y3[7:0]).
- Multiplier 7014: X4 (in_X4[7:0]) and Y4 (in_Y4[7:0]).

In FIG. 13, no shift circuit in the shift control circuit 202 shifts the product of its 8bit×8bit multiplier. Each shift circuit therefore sends that product directly to the selection control circuit 203 through its own selector. The selection control circuit 203 outputs each of these products unchanged, giving the result of each 8bit×8bit multiplier. FIG. 13 therefore produces four outputs, which satisfy Z1=(X1×Y1); Z2=(X2×Y2); Z3=(X3×Y3); Z4=(X4×Y4).
Optionally, the selection control circuit 203 in the apparatus provided in this application may include an accumulation circuit. When a scenario requires accumulation, the output of the selection control circuit can be configured to pass through the accumulation circuit. When it does not, the output can be configured to bypass the accumulation circuit. The selection control circuit 203 may also output both the accumulated result and the bypassed result at the same time. A module or device connected to the digital signal processing apparatus of the embodiments of this application can then use whichever result it needs. Optionally, the selection control circuit 203 can also output the result of any one or more of its accumulators, depending on the computation.
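The output options of the selection control circuit can be sketched as a function with two switches: accumulate, and bypass. The text allows either path, or both at once. The names `select_control`, `accumulate` and `bypass` are hypothetical stand-ins for the configuration information. The usage lines reproduce the FIG. 13 mode of four independent 8bit×8bit multipliers, assuming unsigned operands.

```python
def select_control(processed, accumulate=True, bypass=False):
    """Hypothetical model of selection control circuit 203.

    accumulate: route the processed products through the accumulation circuit.
    bypass:     also (or instead) output the processed products unchanged.
    Returns a dict so that both outputs can be offered at once.
    """
    if not (accumulate or bypass):
        raise ValueError("at least one output path must be enabled")
    outputs = {}
    if accumulate:
        outputs["sum"] = sum(processed)
    if bypass:
        outputs["products"] = list(processed)
    return outputs


# FIG. 13 style: four independent 8bit x 8bit multipliers, no shift, bypass only.
xs = [3, 200, 17, 255]
ys = [5, 2, 17, 255]
z = select_control([x * y for x, y in zip(xs, ys)], accumulate=False, bypass=True)
print(z["products"])  # [15, 400, 289, 65025]
```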
The digital signal processing apparatus shown in FIG. 14 can be used as nine independent 4bit×4bit multipliers. It is used in the same way as the apparatus in FIG. 13; see the process shown in FIG. 13 for details. The apparatus shown in FIG. 14 finally outputs nine results, which satisfy Z1=(X1×Y1); Z2=(X2×Y2); Z3=(X3×Y3); Z4=(X4×Y4); ...; Z9=(X9×Y9).
The apparatus provided in this application can also implement the function of one M×N multiplier, where M>Mi and N>Ni for i=1, ..., K.
This is described below with reference to FIG. 15 and FIG. 16. The shift control circuit 202 leaves one of the K received products unshifted and shifts the other K-1 products. This yields K processed products, which it sends to the selection control circuit 203. The selection control circuit 203 adds the K processed products and outputs the sum.
Optionally, the shift applied to any one of the other K-1 products can be determined from two bit positions:

- the position that the multiplicand generating that product occupies in the M-bit multiplicand, and
- the position that the multiplier generating that product occupies in the N-bit multiplier.

Optionally, when the shift control circuit 202 shifts the other K-1 products, it uses the indication information received by the shift circuit for each product. That information contains the shift direction and/or the number of bits to shift. Optionally, the direction and the exact number of bits for each shift circuit can be determined as follows:
The K multiplicand sequences received by the K multipliers are obtained by dividing one source multiplicand sequence according to the multiplier bit widths, working from the most significant bits down to the least significant bits. The K multiplier sequences are obtained the same way from one source multiplier sequence. In general, the number of bits by which a shift circuit moves its product depends on two bit positions: where that multiplier's multiplicand bits sit in the source multiplicand, and where its multiplier bits sit in the source multiplier.
Illustratively, suppose the M-bit source multiplicand is X[9:0], which can be split into X[8:6], X[5:3] and X[2:0]. Suppose the N-bit source multiplier input sequence is Y[15:0], which can be split into Y[14:10], Y[9:5] and Y[4:0].

- If a multiplier receives the multiplicand X[8:6] and the multiplier Y[14:10], the shift control circuit 202 has the shift register connected to that multiplier shift the product left by 16 bits. This is 6+10, the sum of the lowest bit indices of the two slices.
- If a multiplier receives the multiplicand X[5:3] and the multiplier Y[14:10], the shift control circuit 202 has the connected shift register shift the product left by 13 bits (3+10).
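The rule in this example can be checked numerically. The product of x[msb:lsb] and y[msb:lsb] carries a weight of 2^(x_lsb + y_lsb), so the left shift is the sum of the two lowest bit indices. The helper names below are hypothetical. The example slices cover only X[8:0] and Y[14:0], so the recombined sum equals the product of those two fields, not of the full X[9:0] and Y[15:0].

```python
def bits(value, msb, lsb):
    """Return value[msb:lsb] as an unsigned integer (Verilog-style part-select)."""
    return (value >> lsb) & ((1 << (msb - lsb + 1)) - 1)


def shift_amount(x_slice, y_slice):
    """Left shift for the product of x[msb:lsb] and y[msb:lsb].

    The product's least significant bit has weight 2**(x_lsb + y_lsb).
    """
    return x_slice[1] + y_slice[1]


X_SLICES = [(8, 6), (5, 3), (2, 0)]    # slices of the source multiplicand
Y_SLICES = [(14, 10), (9, 5), (4, 0)]  # slices of the source multiplier


def recombine(x, y):
    """Add every slice product after shifting it by shift_amount()."""
    total = 0
    for xs in X_SLICES:
        for ys in Y_SLICES:
            total += (bits(x, *xs) * bits(y, *ys)) << shift_amount(xs, ys)
    return total


print(shift_amount((8, 6), (14, 10)))  # 16
print(shift_amount((5, 3), (14, 10)))  # 13
```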
Illustratively, as shown in FIG. 15, consider a digital signal processing apparatus that contains four 8bit×8bit multipliers and implements one 16bit×16bit multiplier. The inputs of the multipliers are:

- Multiplier 8011: in_X1[7:0] and in_Y1[7:0].
- Multiplier 8012: in_X2[7:0] and in_Y2[7:0].
- Multiplier 8013: in_X3[7:0] and in_Y3[7:0].
- Multiplier 8014: in_X4[7:0] and in_Y4[7:0].

Optionally, in practice the 16-bit source multiplicand X[15:0] and source multiplier Y[15:0] can be split from high bits to low bits. This gives the multiplicands X[15:8] and X[7:0] and the multipliers Y[15:8] and Y[7:0], which are stored in the registers shown in FIG. 15. X[7:0] serves as X1[7:0], X[15:8] as X2[7:0], Y[7:0] as Y1[7:0], and Y[15:8] as Y2[7:0]. These four values are routed to the multipliers as follows:

- X1[7:0] is the multiplicand input in_X1[7:0] of multiplier 8011 and the multiplicand input in_X2[7:0] of multiplier 8012.
- X2[7:0] is the multiplicand input in_X3[7:0] of multiplier 8013 and the multiplicand input in_X4[7:0] of multiplier 8014.
- Y1[7:0] is the multiplier input in_Y1[7:0] of multiplier 8011 and the multiplier input in_Y3[7:0] of multiplier 8013.
- Y2[7:0] is the multiplier input in_Y2[7:0] of multiplier 8012 and the multiplier input in_Y4[7:0] of multiplier 8014.
For the four 8bit×8bit multipliers to produce the same result as one 16bit×16bit multiplier, the following must hold:
$$
\begin{aligned}
Z[31:0] &= X[15:0]\times Y[15:0] \\
&= \left(2^{8}\,X[15:8] + X[7:0]\right)\times\left(2^{8}\,Y[15:8] + Y[7:0]\right) \\
&= 2^{16}\,(X[15:8]\times Y[15:8]) + 2^{8}\,\big((X[15:8]\times Y[7:0]) + (X[7:0]\times Y[15:8])\big) + 2^{0}\,(X[7:0]\times Y[7:0]) \\
&= 2^{16}\,(\mathrm{in\_X4}[7:0]\times \mathrm{in\_Y4}[7:0]) + 2^{8}\,\big((\mathrm{in\_X3}[7:0]\times \mathrm{in\_Y3}[7:0]) + (\mathrm{in\_X2}[7:0]\times \mathrm{in\_Y2}[7:0])\big) + 2^{0}\,(\mathrm{in\_X1}[7:0]\times \mathrm{in\_Y1}[7:0])
\end{aligned}
$$

where $2^{16}$ denotes a left shift by 16 bits, $2^{8}$ a left shift by 8 bits, and $2^{0}$ no shift.
Therefore, the shift circuits handle their products as follows:

- The shift circuit 9011 does not shift its product.
- The shift circuits 9012 and 9013 shift their products left by 8 bits.
- The shift circuit 9014 shifts its product left by 16 bits.

Accordingly, the selector in the shift circuit 9011 sends the unshifted product to the selection control circuit 203. The selectors in the shift circuits 9012 and 9013 send the products shifted left by 8 bits, and the selector in the shift circuit 9014 sends the product shifted left by 16 bits.

Optionally, the first configuration information can tell each shift circuit whether to shift, and the third configuration information can tell each selector which processed product to forward to the selection control circuit. The selection control circuit 203 shown in FIG. 15 adds the one unshifted product and the three shifted products from the shift control circuit 202 and outputs the sum. This accumulation works the same way as in the convolution embodiments described above and is not repeated here.
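The FIG. 15 decomposition can be checked with a short model of the four lanes. The model assumes unsigned 16-bit operands and the input assignment given above: X[7:0] feeds in_X1/in_X2, X[15:8] feeds in_X3/in_X4, Y[7:0] feeds in_Y1/in_Y3, and Y[15:8] feeds in_Y2/in_Y4. The function name is hypothetical. A real DSP block would also have to handle signed operands, which this sketch ignores.

```python
def mul16x16_from_8x8(x, y):
    """Sketch of FIG. 15: one 16x16 product built from four 8x8 products.

    Operands are assumed unsigned; signed handling is out of scope here.
    """
    if not (0 <= x < 1 << 16 and 0 <= y < 1 << 16):
        raise ValueError("operands must fit in 16 unsigned bits")
    x_lo, x_hi = x & 0xFF, x >> 8
    y_lo, y_hi = y & 0xFF, y >> 8
    # (in_X, in_Y, left shift) for multipliers 8011..8014 / shift circuits 9011..9014
    lanes = [
        (x_lo, y_lo, 0),    # 8011 / 9011: no shift
        (x_lo, y_hi, 8),    # 8012 / 9012: left shift by 8
        (x_hi, y_lo, 8),    # 8013 / 9013: left shift by 8
        (x_hi, y_hi, 16),   # 8014 / 9014: left shift by 16
    ]
    # Selection control circuit 203 accumulates the four processed products.
    return sum((a * b) << s for a, b, s in lanes)


print(mul16x16_from_8x8(0x1234, 0x5678) == 0x1234 * 0x5678)  # True
```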
FIG. 16 differs from FIG. 15 in that the digital signal processing apparatus shown in FIG. 16 contains nine 4bit×4bit multipliers and can implement the function of a 12bit×12bit multiplier.

In FIG. 16, the source multiplicand A=X[11:0] is divided into a2=X[11:8], a1=X[7:4] and a0=X[3:0]. The source multiplier B=Y[11:0] is divided into b2=Y[11:8], b1=Y[7:4] and b0=Y[3:0]. These parts are routed to the multiplier inputs as follows:

- a0 is used as in_X1[3:0], in_X2[3:0] and in_X3[3:0].
- a1 is used as in_X4[3:0], in_X5[3:0] and in_X6[3:0].
- a2 is used as in_X7[3:0], in_X8[3:0] and in_X9[3:0].
- b0 is used as in_Y1[3:0], in_Y4[3:0] and in_Y7[3:0].
- b1 is used as in_Y2[3:0], in_Y5[3:0] and in_Y8[3:0].
- b2 is used as in_Y3[3:0], in_Y6[3:0] and in_Y9[3:0].

The result must satisfy:

$$
\begin{aligned}
Z[23:0] &= X[11:0]\times Y[11:0] \\
&= \left(2^{8}\,X[11:8] + 2^{4}\,X[7:4] + X[3:0]\right)\times\left(2^{8}\,Y[11:8] + 2^{4}\,Y[7:4] + Y[3:0]\right) \\
&= 2^{16}\,(X[11:8]\times Y[11:8]) + 2^{12}\,(X[11:8]\times Y[7:4]) + 2^{8}\,(X[11:8]\times Y[3:0]) \\
&\quad + 2^{12}\,(X[7:4]\times Y[11:8]) + 2^{8}\,(X[7:4]\times Y[7:4]) + 2^{4}\,(X[7:4]\times Y[3:0]) \\
&\quad + 2^{8}\,(X[3:0]\times Y[11:8]) + 2^{4}\,(X[3:0]\times Y[7:4]) + 2^{0}\,(X[3:0]\times Y[3:0]) \\
&= 2^{0}\,(\mathrm{in\_X1}\times\mathrm{in\_Y1}) + 2^{4}\,(\mathrm{in\_X2}\times\mathrm{in\_Y2}) + 2^{8}\,(\mathrm{in\_X3}\times\mathrm{in\_Y3}) \\
&\quad + 2^{4}\,(\mathrm{in\_X4}\times\mathrm{in\_Y4}) + 2^{8}\,(\mathrm{in\_X5}\times\mathrm{in\_Y5}) + 2^{12}\,(\mathrm{in\_X6}\times\mathrm{in\_Y6}) \\
&\quad + 2^{8}\,(\mathrm{in\_X7}\times\mathrm{in\_Y7}) + 2^{12}\,(\mathrm{in\_X8}\times\mathrm{in\_Y8}) + 2^{16}\,(\mathrm{in\_X9}\times\mathrm{in\_Y9})
\end{aligned}
$$

where every in_Xi and in_Yi is the 4-bit field [3:0].

Therefore, the shift circuits in the shift control circuit 202 apply these left shifts to their products:

- 1111: no shift
- 1112: 4 bits
- 1113: 8 bits
- 1114: 4 bits
- 1115: 8 bits
- 1116: 12 bits
- 1117: 8 bits
- 1118: 12 bits
- 1119: 16 bits

In FIG. 16, the selector in the shift circuit 1111 sends the unshifted product to the selection control circuit 203. Every other selector in the shift control circuit 202 sends the product shifted by its connected shift register. The selection control circuit 203 in FIG. 16 has the same function and processing as the one in FIG. 15, which is not repeated here.
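In the FIG. 16 arrangement, multiplier i = 3p + q + 1 receives a_p and b_q, and its product is shifted left by 4(p + q) bits. The sketch below assumes unsigned 12-bit operands and uses hypothetical function names. It lists the nine lanes and checks that their shifted sum equals the full 12×12 product.

```python
def nibble_lanes(x, y, parts=3):
    """Return (in_X, in_Y, shift) for the nine 4x4 multipliers of FIG. 16.

    Multiplier i = 3*p + q + 1 receives a_p = x[4p+3:4p] and b_q = y[4q+3:4q];
    its product is shifted left by 4*(p + q) bits.
    """
    lanes = []
    for p in range(parts):
        a_p = (x >> (4 * p)) & 0xF
        for q in range(parts):
            b_q = (y >> (4 * q)) & 0xF
            lanes.append((a_p, b_q, 4 * (p + q)))
    return lanes


def mul12x12_from_4x4(x, y):
    """Shift each of the nine 4x4 products and add them (unsigned operands assumed)."""
    if not (0 <= x < 1 << 12 and 0 <= y < 1 << 12):
        raise ValueError("operands must fit in 12 unsigned bits")
    return sum((a * b) << s for a, b, s in nibble_lanes(x, y))


# Shift amounts for shift circuits 1111..1119, in order.
print([s for _, _, s in nibble_lanes(0, 0)])  # [0, 4, 8, 4, 8, 12, 8, 12, 16]
```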
The following embodiments, described with reference to FIG. 17 to FIG. 20, cover the case where the multiplier bit width of the K multipliers differs from their multiplicand bit width.
The digital signal processing apparatus shown in FIG. 17 contains ab multipliers of size Abit×Bbit. Their multiplicands are in_X1[A-1:0], in_X2[A-1:0], in_X3[A-1:0], in_X4[A-1:0], ..., in_Xab[A-1:0], and their multipliers are in_Y1[B-1:0], in_Y2[B-1:0], in_Y3[B-1:0], in_Y4[B-1:0], ..., in_Yab[B-1:0]. The apparatus in FIG. 17 can produce any of the following outputs:

- the result of each of the ab Abit×Bbit multipliers used independently;
- the result of a y×y convolution implemented with the ab Abit×Bbit multipliers;
- the result of one Mbit×Nbit multiplier implemented with the ab Abit×Bbit multipliers, where A=M/a and B=N/b.
Illustratively, FIG. 18 to FIG. 20 use the example M=27, N=16, A=9 and B=8, so that a=3, b=2 and K=a×b=6.
As shown in FIG. 18, the digital signal processing apparatus contains six 9bit×8bit multipliers and can implement the function of a 27bit×16bit multiplier. The source multiplicand A=X[26:0] is divided into a2=X[26:18], a1=X[17:9] and a0=X[8:0]. The source multiplier B=Y[15:0] is divided into b1=Y[15:8] and b0=Y[7:0]. These parts are routed as follows:

- a0=X[8:0] is used as in_X1[8:0] and in_X2[8:0], input to the 9bit×8bit multipliers 601 and 602 respectively.
- a1=X[17:9] is used as in_X3[8:0] and in_X4[8:0], input to the 9bit×8bit multipliers 603 and 604 respectively.
- a2=X[26:18] is used as in_X5[8:0] and in_X6[8:0], input to the 9bit×8bit multipliers 605 and 606 respectively.
- b1=Y[15:8] is used as in_Y2[7:0], in_Y4[7:0] and in_Y6[7:0], input to the multipliers 602, 604 and 606 respectively.
- b0=Y[7:0] is used as in_Y1[7:0], in_Y3[7:0] and in_Y5[7:0], input to the multipliers 601, 603 and 605 respectively.
The result must satisfy:

$$
\begin{aligned}
Z[42:0] &= X[26:0]\times Y[15:0] \\
&= \left(2^{18}\,X[26:18] + 2^{9}\,X[17:9] + 2^{0}\,X[8:0]\right)\times\left(2^{8}\,Y[15:8] + 2^{0}\,Y[7:0]\right) \\
&= 2^{26}\,(X[26:18]\times Y[15:8]) + 2^{18}\,(X[26:18]\times Y[7:0]) + 2^{17}\,(X[17:9]\times Y[15:8]) \\
&\quad + 2^{9}\,(X[17:9]\times Y[7:0]) + 2^{8}\,(X[8:0]\times Y[15:8]) + 2^{0}\,(X[8:0]\times Y[7:0]) \\
&= 2^{26}\,(\mathrm{in\_X6}[8:0]\times \mathrm{in\_Y6}[7:0]) + 2^{18}\,(\mathrm{in\_X5}[8:0]\times \mathrm{in\_Y5}[7:0]) + 2^{17}\,(\mathrm{in\_X4}[8:0]\times \mathrm{in\_Y4}[7:0]) \\
&\quad + 2^{9}\,(\mathrm{in\_X3}[8:0]\times \mathrm{in\_Y3}[7:0]) + 2^{8}\,(\mathrm{in\_X2}[8:0]\times \mathrm{in\_Y2}[7:0]) + 2^{0}\,(\mathrm{in\_X1}[8:0]\times \mathrm{in\_Y1}[7:0])
\end{aligned}
$$

Therefore, the shift circuits apply these left shifts to their products:

- 701: no shift
- 702: 8 bits
- 703: 9 bits
- 704: 17 bits
- 705: 18 bits
- 706: 26 bits

The selector in the shift circuit 701 sends the unshifted product to the selection control circuit 203. The other selectors in the shift control circuit 202 send the products shifted by their respective shift registers. The selection control circuit 203 adds the one unshifted product and the remaining shifted products from the shift control circuit 202. For the specific procedure, see the description of accumulation in the embodiments above, which is not repeated here.
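The same idea generalizes to any a×b grid of A-bit×B-bit multipliers. The product of a_p and b_q is shifted left by A·p + B·q bits. The sketch below assumes unsigned operands and uses hypothetical helper names. Its defaults reproduce FIG. 18: a 27×16 multiply built from six 9×8 multipliers, in the lane order 601 to 606 given above.

```python
def split_lanes(x, y, a_bits=9, a_parts=3, b_bits=8, b_parts=2):
    """(in_X, in_Y, shift) per multiplier for an M x N multiply built from
    a_parts * b_parts multipliers of a_bits x b_bits, where
    M = a_bits * a_parts and N = b_bits * b_parts (unsigned operands assumed).
    Defaults match FIG. 18: 27x16 from six 9x8 multipliers.
    """
    a_mask, b_mask = (1 << a_bits) - 1, (1 << b_bits) - 1
    lanes = []
    for p in range(a_parts):           # a0, a1, a2
        a_p = (x >> (a_bits * p)) & a_mask
        for q in range(b_parts):       # b0, b1
            b_q = (y >> (b_bits * q)) & b_mask
            lanes.append((a_p, b_q, a_bits * p + b_bits * q))
    return lanes


def accumulate(lanes):
    """Shift each product and add them, as shift circuits 701..706 and circuit 203 do."""
    return sum((a * b) << s for a, b, s in lanes)


x, y = (1 << 27) - 1, (1 << 16) - 1    # largest unsigned 27-bit and 16-bit values
print([s for _, _, s in split_lanes(x, y)])    # [0, 8, 9, 17, 18, 26]
print(accumulate(split_lanes(x, y)) == x * y)  # True
```

With the largest inputs, the result fits in 43 bits, which matches the width of Z[42:0].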
As shown in FIG. 19, FIG. 19 takes as an example a digital signal processing apparatus with six 9bit×8bit multipliers that computes the product of the row vector [d1 d2 d3] and the 3×2 matrix $\begin{bmatrix} k1 & j1 \\ k2 & j2 \\ k3 & j3 \end{bmatrix}$.
The inputs are routed as follows:

- d1 is used as in_X1[8:0] and in_X4[8:0], the multiplicand of the 9bit×8bit multipliers 601 and 604.
- d2 is used as in_X2[8:0] and in_X5[8:0], the multiplicand of the 9bit×8bit multipliers 602 and 605.
- d3 is used as in_X3[8:0] and in_X6[8:0], the multiplicand of the 9bit×8bit multipliers 603 and 606.
- k1 is used as in_Y1[7:0], the multiplier of the 9bit×8bit multiplier 601.
- k2 is used as in_Y2[7:0], the multiplier of the 9bit×8bit multiplier 602.
- k3 is used as in_Y3[7:0], the multiplier of the 9bit×8bit multiplier 603.
- j1 is used as in_Y4[7:0], the multiplier of the 9bit×8bit multiplier 604.
- j2 is used as in_Y5[7:0], the multiplier of the 9bit×8bit multiplier 605.
- j3 is used as in_Y6[7:0], the multiplier of the 9bit×8bit multiplier 606.

The apparatus shown in FIG. 19 finally outputs Z1=(d1×k1)+(d2×k2)+(d3×k3) and Z2=(d1×j1)+(d2×j2)+(d3×j3). It follows that the indication information for the shift circuits 701, 702, ..., and 706 specifies no shift.
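The FIG. 19 mode is two dot products that share the multiplicands d1 to d3, with no shifts. How the selection control circuit groups the six products into Z1 and Z2 is not detailed in the text. The sketch below assumes two separate sums over multipliers 601 to 603 and 604 to 606. It also assumes unsigned 9-bit d values and 8-bit k and j values. The function name is hypothetical.

```python
def vec_times_3x2(d, k, j):
    """Sketch of FIG. 19: [d1 d2 d3] times the 3x2 matrix with columns k and j.

    Multipliers 601..603 compute d_i*k_i and 604..606 compute d_i*j_i.
    No shifts are applied; the two groups of three products are summed
    separately (grouping and unsigned operand widths are assumptions).
    """
    if not (len(d) == len(k) == len(j) == 3):
        raise ValueError("expected three entries in d, k and j")
    if any(not 0 <= v < 1 << 9 for v in d):
        raise ValueError("d values must fit in 9 unsigned bits")
    if any(not 0 <= v < 1 << 8 for v in (*k, *j)):
        raise ValueError("k and j values must fit in 8 unsigned bits")
    lanes = list(zip(d, k)) + list(zip(d, j))   # multipliers 601..606
    products = [a * b for a, b in lanes]        # shift circuits 701..706: no shift
    z1 = sum(products[0:3])
    z2 = sum(products[3:6])
    return z1, z2


print(vec_times_3x2([1, 2, 3], [4, 5, 6], [7, 8, 9]))  # (32, 50)
```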
As shown in FIG. 20, FIG. 20 takes as an example a digital signal processing apparatus with six 9bit×8bit multipliers that implements six independent 9bit×8bit multipliers. The inputs of each 9bit×8bit multiplier in FIG. 20 are the same as in FIG. 18. The difference is that in FIG. 20 the shift control circuit 202 does not shift any of the K received products. In other words, none of the shift circuits 701, 702, ..., and 706 shifts the product it receives. The outputs of the apparatus shown in FIG. 20 therefore satisfy: Z1=(in_X1[8:0]×in_Y1[7:0]); Z2=(in_X2[8:0]×in_Y2[7:0]); Z3=(in_X3[8:0]×in_Y3[7:0]); Z4=(in_X4[8:0]×in_Y4[7:0]); Z5=(in_X5[8:0]×in_Y5[7:0]); and Z6=(in_X6[8:0]×in_Y6[7:0]).
Optionally, any digital signal processing apparatus provided in the embodiments of this application can be integrated into a programmable logic device. In a specific example, the programmable logic device may be an FPGA.
FIG. 21 is a schematic flowchart of a digital signal processing method provided in this application. The method can be applied to a digital signal processing apparatus, for example any of the apparatuses shown in FIG. 3 to FIG. 20, and includes the following steps:
S101. K multipliers perform K multiplication operations to obtain K products and output the K products to the shift control circuit. The i-th of the K multiplication operations multiplies an Mi-bit multiplicand by an Ni-bit multiplier. Mi and Ni are positive integers, i=1, 2, ..., K, and K is an integer greater than or equal to 2.
S102、移位控制电路对K个乘积进行移位控制处理,得到K个处理后的乘积,并将所述K个处理后的乘积输出至选择控制电路。S102. The shift control circuit performs shift control processing on the K products, obtains K processed products, and outputs the K processed products to the selection control circuit.
S103、选择控制电路输出K个处理后的乘积的累加结果或者输出所述K个处理后的乘积。S103. The selection control circuit outputs an accumulation result of the K processed products or outputs the K processed products.
Optionally, step S102 may be implemented as follows: each of the K shift circuits in the shift control circuit performs shift control processing on one of the K products, so that K processed products are obtained; each shift circuit then outputs its processed product to the selection control circuit.
Optionally, the shift control processing is either shifting or not shifting. That is, any one of the K products may or may not be shifted, and the shift control processing applied to the K products may be the same or different.
In one possible implementation, the shift control circuit includes K shift circuits, and each of the K shift circuits includes a shift register and a selector. One input of the selector is connected to the shift register, and the other input is connected to a multiplier. The shift register shifts the product output by the multiplier and passes the result to the selector; the selector receives both the output of the shift register and the output of the multiplier and selects one of them as its own output.
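One such shift circuit can be sketched as a shift-register path and a pass-through path feeding a 2:1 selector. The class name, its parameters, and the fixed left-shift direction are illustrative assumptions; register width and clock timing are not modeled.

```python
class ShiftCircuit:
    """Hypothetical model of one shift circuit: shift register + 2:1 selector."""

    def __init__(self, shift_bits: int, select_shifted: bool) -> None:
        if shift_bits < 0:
            raise ValueError("shift_bits must be non-negative")
        self.shift_bits = shift_bits          # amount applied by the shift register
        self.select_shifted = select_shifted  # selector control input

    def __call__(self, product: int) -> int:
        shifted = product << self.shift_bits  # path through the shift register
        # The selector chooses the shifted product or the raw multiplier output.
        return shifted if self.select_shifted else product


print(ShiftCircuit(8, select_shifted=True)(3))   # 768
print(ShiftCircuit(8, select_shifted=False)(3))  # 3
```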
Optionally, the method provided by this application is used to implement at least one of the following functions: a multiply-accumulate function; the function of K multipliers; and the function of an M×N multiplier, where an M×N multiplier is a multiplier whose multiplicand is M bits wide and whose multiplier is N bits wide, and which satisfies:
[Formula image PCTCN2017095061-appb-000006 not reproduced]
Optionally, when the method provided by this application is used to implement the multiply-accumulate operation, step S102 is implemented as follows: the shift control circuit does not shift the K products, obtains K unshifted products, and outputs the K unshifted products to the selection control circuit.
Specifically, none of the K shift circuits in the shift control circuit shifts the product it receives, so K unshifted products are obtained. In one possible implementation, the selector in each of the K shift circuits selects the product received directly from its multiplier, and these selections form the K unshifted products.
Step S103 may be implemented as follows: the selection control circuit accumulates the K received unshifted products and outputs their accumulated sum.
Optionally, when the method provided by this application is used to implement the function of K multipliers, step S102 may be implemented as follows: the shift control circuit does not shift the K products, obtains K unshifted products, and outputs the K unshifted products to the selection control circuit.
Step S103 may be implemented as follows: the selection control circuit directly outputs the K received unshifted products.
Optionally, when the method provided by this application is used to implement the function of an M×N multiplier, step S102 may be implemented as follows: the shift control circuit leaves one of the K received products unshifted, shifts the other K-1 products, obtains K processed products, and outputs the K processed products to the selection control circuit. The shift applied to each of the other K-1 products is determined by the bit position that the multiplicand producing that product occupies in the M-bit multiplicand and the bit position that the multiplier producing that product occupies in the N-bit multiplier.
Step S103 may be implemented as follows: the selection control circuit accumulates the K received processed products and outputs the sum.
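The M×N mode can be illustrated with the standard decomposition of a wide unsigned multiplication into partial products. This is a minimal sketch under stated assumptions: operands are unsigned, the M-bit multiplicand is split into K_a equal chunks and the N-bit multiplier into K_b equal chunks (so K = K_a·K_b), and the helper names are hypothetical. Signed operands would need extra handling that is not shown. Following the bit-position rule in claims 8 and 21, the product of the two lowest chunks stays unshifted, and every other product is shifted left by the sum of its two chunks' bit offsets.

```python
from typing import List


def split(value: int, chunk_bits: int, count: int) -> List[int]:
    """Split an unsigned value into `count` chunks, least significant first."""
    mask = (1 << chunk_bits) - 1
    return [(value >> (i * chunk_bits)) & mask for i in range(count)]


def wide_multiply(x: int, y: int, m_chunk: int, ka: int, n_chunk: int, kb: int) -> int:
    """(ka*m_chunk)-bit x (kb*n_chunk)-bit unsigned multiply from K = ka*kb small products."""
    if not (0 <= x < 1 << (ka * m_chunk) and 0 <= y < 1 << (kb * n_chunk)):
        raise ValueError("operand out of range for the configured widths")
    total = 0
    for i, x_part in enumerate(split(x, m_chunk, ka)):
        for j, y_part in enumerate(split(y, n_chunk, kb)):
            product = x_part * y_part           # one small multiplier
            shift = i * m_chunk + j * n_chunk   # bit offsets of both operand chunks
            total += product << shift           # (i, j) == (0, 0) stays unshifted
    return total


# 18-bit x 16-bit from four 9-bit x 8-bit products (K_a = K_b = 2).
x, y = 0x2ABCD, 0xBEEF
assert wide_multiply(x, y, m_chunk=9, ka=2, n_chunk=8, kb=2) == x * y
```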
Optionally, the method provided by this application further includes: S104. Receiving first configuration information, where the first configuration information instructs the shift control circuit to perform shift control processing on the K received products.
Optionally, the first configuration information includes indication information configured for at least one shift circuit in the shift control circuit, where the indication information instructs the at least one shift circuit to perform shift control processing on the product it receives.
Optionally, the indication information further indicates the shift direction and/or the number of bits by which the at least one shift circuit shifts the product it receives.
Specifically, the first configuration information includes indication information configured for each shift circuit.
Optionally, the method provided by this application further includes: S105. Receiving second configuration information, where the second configuration information instructs the selection control circuit to output either the accumulated sum of the K processed products or the K processed products themselves.
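The two kinds of configuration information can be pictured as a small configuration record. The field names, the `Direction` enum, and the encoding below are hypothetical; the source specifies only that the first configuration information may carry per-shift-circuit indications (including shift direction and/or number of bits) and that the second configuration information selects accumulated or separate outputs.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Union


class Direction(Enum):
    LEFT = "left"
    RIGHT = "right"


@dataclass
class ShiftIndication:
    """Per-shift-circuit indication (part of the first configuration information)."""
    shift: bool = False
    direction: Direction = Direction.LEFT
    bits: int = 0


@dataclass
class DspConfig:
    shifts: List[ShiftIndication]  # first configuration information (S104)
    accumulate: bool               # second configuration information (S105)


def apply_shift(product: int, ind: ShiftIndication) -> int:
    """Apply one shift circuit's indication to the product it receives."""
    if not ind.shift:
        return product
    return product << ind.bits if ind.direction is Direction.LEFT else product >> ind.bits


def run(products: List[int], cfg: DspConfig) -> Union[int, List[int]]:
    if len(products) != len(cfg.shifts):
        raise ValueError("one indication per shift circuit is required")
    processed = [apply_shift(p, ind) for p, ind in zip(products, cfg.shifts)]
    return sum(processed) if cfg.accumulate else processed


cfg = DspConfig([ShiftIndication(), ShiftIndication(True, Direction.LEFT, 9)], accumulate=True)
print(run([5, 3], cfg))  # 1541
```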
As shown in FIG. 22, this application provides a programmable logic device (PLD) that includes at least the digital signal processing apparatus described in any one of the embodiments of FIG. 3 to FIG. 20. The programmable logic device can be applied in scenarios that require digital signal processing (for example, multiply-accumulate computation), such as radar, artificial intelligence, deep learning, image processing, video processing, wireless baseband and IF/RF processing, and satellite navigation.
Specifically, the programmable logic device may be the field programmable gate array (FPGA) shown in FIG. 22, and the digital signal processing apparatus may be the digital signal processor (DSP) shown in FIG. 22. The programmable logic device may also be a complex programmable logic device (CPLD) or an erasable programmable logic device (EPLD). The programmable logic device may also include electrically erasable programmable read-only memory (EEPROM), programmable array logic (PAL), generic array logic (GAL), and the like.
The FPGA provided in this application can be used in fields such as industrial control and consumer electronics, and because it contains a large number of DSP hard-core units, it provides substantial multiplication and addition capability. Because the digital signal processing apparatus provided in this application includes a selection control circuit and a shift control circuit, configuring these two circuits allows an FPGA that uses the apparatus to implement a variety of arithmetic functions. With the multiplication and addition capability of its DSP hard cores, the FPGA can serve as a processor for deep learning, the most important class of algorithms in artificial intelligence, and in particular for convolutional neural networks (CNNs).
As can be seen from FIG. 22, current FPGAs are mainly based on look-up table technology and integrate hard-core (Hardcore, ASIC-type) modules for common functions such as RAM, clock management, and DSP. Because look-up-table-based FPGAs are highly integrated, with device densities ranging from tens of thousands to tens of millions of gates, they can implement extremely complex sequential and combinational logic, and are therefore suited to high-speed, high-density, high-end digital logic design. The main components of an FPGA are: programmable input/output units (I/O Blocks, IOBs), programmable logic resource units (Configurable Logic Resource Blocks, CLBs), complete clock management, embedded block random access memory (RAM), abundant routing resources, DSP hard-core units, and other embedded dedicated hardware modules.
The programmable logic resource units (CLBs) in FIG. 22 can be programmed to implement various circuits and functions and include programmable look-up tables (LUTs) and registers; the number of CLBs has reached the millions (K×K). In FIG. 22, the horizontal and vertical lines represent routing resources, which can be programmed to interconnect the inputs and outputs of the CLBs and which connect the various programmable resources in the FPGA. The routing of the FPGA in FIG. 22 uses a short-line interconnect architecture.
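To make the look-up-table idea concrete: a k-input LUT stores a 2^k-entry truth table, and its inputs act as the address into that table. The Python below is a generic illustration with hypothetical names, not a model of any particular vendor's CLB.

```python
from typing import Callable, List


def make_lut(k: int, func: Callable[..., int]) -> List[int]:
    """'Program' a k-input LUT by storing func's truth table (2**k one-bit entries)."""
    table = []
    for index in range(2 ** k):
        inputs = [(index >> bit) & 1 for bit in range(k)]  # input 0 is the address LSB
        table.append(func(*inputs) & 1)
    return table


def lut_read(table: List[int], *inputs: int) -> int:
    """Evaluate the LUT: the input bits form the address into the truth table."""
    index = sum((b & 1) << bit for bit, b in enumerate(inputs))
    return table[index]


xor3 = make_lut(3, lambda a, b, c: a ^ b ^ c)  # e.g. the sum bit of a full adder
print(lut_read(xor3, 1, 0, 1))  # 0
```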
In FIG. 22, DSP denotes the digital signal processing hard-core unit (DSP Hardcore) inside the FPGA. It can be configured to perform multiplication and addition as well as complex signal processing operations such as multiply-accumulate, meeting user requirements for signal and image processing such as video decoding and Fourier transforms, and it has become the most important hard-core unit for signal processing in an FPGA. The DSP in FIG. 22 may use any of the digital signal processing apparatuses provided in the embodiments of this application.
In a specific example, most of the operations in deep learning, and in convolutional neural network processing in particular, are convolutions, most typically convolutions with N×N matrices such as 2×2, 3×3, or 5×5, as shown in FIG. 23.
If the conventional approach of implementing this with the DSP multipliers in an FPGA is used, N multipliers are needed to perform one convolution at a time. The N multipliers must be cascaded through logic circuits to build the required multiply-accumulate operation; this logic is programmable circuitry and occupies programmable logic resources other than the DSPs. The cascading logic prevents the programmable part from running at very high frequencies, so although the DSP hard cores can run at higher frequencies, overall computing performance is limited and much of the computing capability is wasted. For example, as shown in FIG. 24, a convolution multiplies each multiplicand by its corresponding multiplier value and accumulates the results. In this application, by contrast, a 2×2 convolution can be completed with only one digital signal processing apparatus provided by the embodiments of this application.
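A 2×2 convolution maps naturally onto one apparatus configured for multiply-accumulate with K = 4. The sketch below is illustrative: `dsp_mac4` is a hypothetical stand-in for one apparatus (four products, no shifts, accumulated by the selection control circuit), the kernel values play the role of the fixed multipliers and the image pixels the multiplicands, and the window slides without flipping the kernel, as is usual in CNNs.

```python
from typing import List, Sequence


def dsp_mac4(multiplicands: Sequence[int], multipliers: Sequence[int]) -> int:
    """Hypothetical model of one apparatus in MAC mode with K = 4."""
    if len(multiplicands) != 4 or len(multipliers) != 4:
        raise ValueError("K = 4 operand pairs expected")
    return sum(a * b for a, b in zip(multiplicands, multipliers))


def conv2x2(image: List[List[int]], kernel: List[List[int]]) -> List[List[int]]:
    """'Valid' 2x2 convolution: one dsp_mac4 call per output element."""
    k = [kernel[0][0], kernel[0][1], kernel[1][0], kernel[1][1]]  # fixed multipliers
    out = []
    for r in range(len(image) - 1):
        row = []
        for c in range(len(image[0]) - 1):
            window = [image[r][c], image[r][c + 1], image[r + 1][c], image[r + 1][c + 1]]
            row.append(dsp_mac4(window, k))
        out.append(row)
    return out


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
kernel = [[1, 2],
          [3, 4]]
print(conv2x2(image, kernel))  # [[37, 47], [67, 77]]
```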
In one possible implementation, when the apparatus of this application is used in a deep learning algorithm, the multiplicands may be parameters of data received by the apparatus (for example, images, sound, and text), and the multipliers may be fixed parameters in the kernel.
The digital signal processing apparatus in the embodiments of this application is embedded in the FPGA as a hard core. Because it is an ASIC hard-core circuit, it provides a degree of flexibility while ensuring the best operating frequency and the highest computational efficiency. Because the apparatus includes a selection control circuit and a shift control circuit, configuring these two circuits allows an FPGA that uses the apparatus to implement a variety of arithmetic functions.
Finally, it should be noted that the above embodiments are merely intended to describe the technical solutions of this application, not to limit them. Although this application has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be replaced with equivalents, and such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of this application.

Claims (25)

  1. A digital signal processing apparatus, comprising: K multipliers, a shift control circuit, and a selection control circuit, wherein
    the i-th multiplier of the K multipliers is configured to perform a multiplication in which the multiplicand is M_i bits wide and the multiplier is N_i bits wide, wherein M_i and N_i are positive integers, i = 1, 2, ..., K, and K is an integer greater than or equal to 2;
    the shift control circuit is connected to the K multipliers and is configured to receive the K products output by the K multipliers, perform shift control processing on the K products to obtain K processed products, and output the K processed products to the selection control circuit; and
    the selection control circuit is connected to the shift control circuit and is configured to receive the K processed products sent by the shift control circuit, and to output the accumulated result of the K processed products or output the K processed products.
  2. The apparatus according to claim 1, wherein the shift control circuit comprises K shift circuits, each of the K shift circuits is connected to one of the K multipliers, and each shift circuit is configured to receive the product output by that multiplier and perform shift control processing on it to obtain one processed product.
  3. The apparatus according to claim 1 or 2, wherein the shift control processing comprises shifting or not shifting.
  4. The apparatus according to any one of claims 1 to 3, wherein the selection control circuit comprises an accumulation circuit configured to accumulate the K processed products.
  5. The apparatus according to any one of claims 1 to 4, wherein the apparatus is configured to implement at least one of the following functions:
    a multiply-accumulate function;
    the function of the K multipliers; and
    the function of an M×N multiplier, wherein the M×N multiplier is a multiplier whose multiplicand is M bits wide and whose multiplier is N bits wide, and which satisfies:
    [Formula image PCTCN2017095061-appb-100001 not reproduced]
    K = K_a · K_b.
  6. The apparatus according to claim 5, wherein, when the apparatus is configured to implement the multiply-accumulate operation,
    the shift control circuit outputs the K products received from the K multipliers to the selection control circuit without shifting them; and
    the selection control circuit accumulates the K received products and outputs the result.
  7. The apparatus according to claim 5, wherein, when the apparatus is configured to implement the function of the K multipliers,
    the shift control circuit outputs the K products received from the K multipliers to the selection control circuit without shifting them; and
    the selection control circuit outputs the K received products.
  8. The apparatus according to claim 5, wherein, when the apparatus is configured to implement the function of the M×N multiplier,
    the shift control circuit leaves one of the K received products unshifted and shifts the other K-1 received products, to obtain the K processed products, and outputs the K processed products to the selection control circuit, wherein the shift applied to any one of the other K-1 products is determined by the bit position that the multiplicand producing that product occupies in the M-bit multiplicand and the bit position that the multiplier producing that product occupies in the N-bit multiplier; and
    the selection control circuit accumulates the K processed products and outputs the result.
  9. The apparatus according to any one of claims 1 to 8, wherein the digital signal processing apparatus further comprises a configuration information receiving circuit configured to receive first configuration information, the first configuration information instructing the shift control circuit to perform the shift control processing on the K received products.
  10. The apparatus according to claim 9, wherein the first configuration information comprises indication information configured for at least one shift circuit in the shift control circuit, the indication information instructing the at least one shift circuit to perform shift control processing on the product received by that shift circuit.
  11. The apparatus according to claim 10, wherein the indication information further indicates the shift direction and/or the number of shift bits used when the at least one shift circuit performs shift control processing on the product received by the at least one shift circuit.
  12. The apparatus according to any one of claims 1 to 11, wherein the digital signal processing apparatus further comprises a configuration information receiving circuit configured to receive second configuration information, the second configuration information instructing the selection control circuit to output the accumulated result of the K processed products or to output the K processed products.
  13. The apparatus according to any one of claims 1 to 12, wherein the digital signal processing apparatus is integrated in a programmable logic device.
  14. A programmable logic device, comprising the digital signal processing apparatus according to any one of claims 1 to 12.
  15. A digital signal processing method, comprising:
    performing K multiplication operations to obtain K products, and outputting the K products to a shift control circuit, wherein the i-th of the K multiplication operations multiplies an M_i-bit multiplicand by an N_i-bit multiplier, M_i and N_i are positive integers, i = 1, 2, ..., K, and K is an integer greater than or equal to 2;
    performing, by the shift control circuit, shift control processing on the K products to obtain K processed products, and outputting the K processed products to a selection control circuit; and
    outputting, by the selection control circuit, the accumulated result of the K processed products or the K processed products.
  16. The method according to claim 15, wherein the performing, by the shift control circuit, shift control processing on the K products to obtain K processed products, and outputting the K processed products to a selection control circuit comprises:
    performing, by each of K shift circuits comprised in the shift control circuit, shift control processing on one of the K products, to obtain the K processed products; and
    outputting, by each shift circuit, the obtained processed product to the selection control circuit.
  17. The method according to claim 15 or 16, wherein the shift control processing comprises shifting or not shifting.
  18. The method according to any one of claims 15 to 17, wherein the method is used to implement at least one of the following functions:
    a multiply-accumulate function;
    the function of the K multipliers; and
    the function of an M×N multiplier, wherein the M×N multiplier is a multiplier whose multiplicand is M bits wide and whose multiplier is N bits wide, and which satisfies:
    [Formula image PCTCN2017095061-appb-100002 not reproduced]
    K = K_a · K_b.
  19. The method according to claim 18, wherein, when the method is used to implement the multiply-accumulate operation, the performing, by the shift control circuit, shift control processing on the K products to obtain K processed products, and outputting the K processed products to a selection control circuit comprises:
    performing, by the shift control circuit, no shift processing on the K products, to obtain K unshifted products, and outputting the K unshifted products to the selection control circuit; and
    the outputting, by the selection control circuit, the accumulated result of the K processed products or the K processed products comprises:
    accumulating, by the selection control circuit, the K received unshifted products and outputting the accumulated result.
  20. The method according to claim 16, wherein, when the method is used to implement the function of the K multipliers, the performing, by the shift control circuit, shift control processing on the K products to obtain K processed products, and outputting the K processed products to a selection control circuit comprises:
    performing, by the shift control circuit, no shift processing on the K products, to obtain K unshifted products, and outputting the K unshifted products to the selection control circuit; and
    the outputting, by the selection control circuit, the accumulated result of the K processed products or the K processed products comprises:
    directly outputting, by the selection control circuit, the K received unshifted products.
  21. The method according to claim 16, wherein, when the method is used to implement the function of the M×N multiplier, the performing, by the shift control circuit, shift control processing on the K products to obtain K processed products, and outputting the K processed products to a selection control circuit comprises: leaving, by the shift control circuit, one of the K received products unshifted and shifting the other K-1 received products, to obtain the K processed products, and outputting the K processed products to the selection control circuit, wherein the shift applied to any one of the other K-1 products is determined by the bit position that the multiplicand producing that product occupies in the M-bit multiplicand and the bit position that the multiplier producing that product occupies in the N-bit multiplier; and
    the outputting, by the selection control circuit, the accumulated result of the K processed products or the K processed products comprises:
    accumulating, by the selection control circuit, the K received processed products and outputting the result.
  22. The method according to any one of claims 15 to 21, further comprising:
    receiving first configuration information, the first configuration information instructing the shift control circuit to perform the shift control processing on the K received products.
  23. The method according to claim 22, wherein the first configuration information comprises indication information configured for at least one shift circuit in the shift control circuit, the indication information instructing the at least one shift circuit to perform shift control processing on the product received by that shift circuit.
  24. The method according to claim 23, wherein the indication information further indicates the shift direction and/or the number of shift bits used when the at least one shift circuit performs shift control processing on the product received by the at least one shift circuit.
  25. The method according to any one of claims 15 to 24, further comprising: receiving second configuration information, the second configuration information instructing the selection control circuit to output the accumulated result of the K processed products or to output the K processed products.
PCT/CN2017/095061 2017-07-28 2017-07-28 Digital signal processing method and device and programmable logic device WO2019019196A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2017/095061 WO2019019196A1 (en) 2017-07-28 2017-07-28 Digital signal processing method and device and programmable logic device

Publications (1)

Publication Number Publication Date
WO2019019196A1 true WO2019019196A1 (en) 2019-01-31

Family

ID=65039300

Country Status (1)

Country Link
WO (1) WO2019019196A1 (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
US4872128A * | 1987-06-30 | 1989-10-03 | Mitsubishi Denki Kabushiki Kaisha | High speed data processing unit using a shift operation
CN104133655A * | 2014-07-11 | 2014-11-05 | PLA Information Engineering University (中国人民解放军信息工程大学) | Design method of anti-radiation multiplier based on satellite-borne MIMO detection
CN106484366A * | 2016-10-17 | 2017-03-08 | Southeast University (东南大学) | Modular multiplication device with variable bit width over binary fields
CN106528046A * | 2016-11-02 | 2017-03-22 | Shanghai Integrated Circuit Research and Development Center Co., Ltd. (上海集成电路研发中心有限公司) | Long bit width time sequence accumulation multiplying unit

Similar Documents

Publication | Title
CN109063825B | Convolutional neural network accelerator
US11907719B2 | FPGA specialist processing block for machine learning
US10275219B2 | Bit-serial multiplier for FPGA applications
CN111008003B | Data processor, method, chip and electronic equipment
US11809798B2 | Implementing large multipliers in tensor arrays
CN110362293B | Multiplier, data processing method, chip and electronic equipment
US4528641A | Variable radix processor
CN111047034A | On-site programmable neural network array based on multiplier-adder unit
CN109325590B | Device for realizing neural network processor with variable calculation precision
TW202020654A | Digital circuit with compressed carry
CN111931441A | Method, device and medium for establishing FPGA rapid carry chain time sequence model
CN111258544B | Multiplier, data processing method, chip and electronic equipment
WO2019019196A1 | Digital signal processing method and device and programmable logic device
CN113504892A | Method, system, equipment and medium for designing multiplier lookup table
CN110647307B | Data processor, method, chip and electronic equipment
CN209879493U | Multiplier and method for generating a digital signal
CN210006031U | Multiplier and method for generating a digital signal
JP2001051826A | Information processing system, method for generating circuit information of programmable logic circuit, and method for reconstituting programmable logic circuit
JP2004220377A | Reconfigurable circuit, and integrated circuit device and data conversion device capable of using it
JP2021501406A | Methods, devices, and systems for task processing
US20230259581A1 | Method and apparatus for floating-point data type matrix multiplication based on outer product
CN111610955B | Data saturation and packaging processing component, chip and equipment
JP4413052B2 | Data flow graph processing apparatus and processing apparatus
CN220208247U | Division operation circuit
CN113031909B (en) Data processor, method, device and chip

Legal Events

Code | Title | Description
121 | EP: The EPO has been informed by WIPO that EP was designated in this application | Ref document number: 17918917; Country of ref document: EP; Kind code of ref document: A1
NENP | Non-entry into the national phase | Ref country code: DE
122 | EP: PCT application non-entry in European phase | Ref document number: 17918917; Country of ref document: EP; Kind code of ref document: A1