CN209879493U - Multiplier and method for generating a digital signal - Google Patents

Multiplier and method for generating a digital signal

Info

Publication number
CN209879493U
Authority
CN
China
Prior art keywords
bit
data
circuit
multiplier
partial product
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201921433536.7U
Other languages
Chinese (zh)
Inventor
不公告发明人
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Cambricon Information Technology Co Ltd
Original Assignee
Shanghai Cambricon Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date

Landscapes

  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The present application provides a multiplier comprising a judgment circuit, a data expansion circuit, a regular signed number coding circuit and a compression circuit. The output end of the judgment circuit is connected with the input end of the data expansion circuit and with the first input end of the regular signed number coding circuit; the output end of the data expansion circuit is connected with the second input end of the regular signed number coding circuit; and the output end of the regular signed number coding circuit is connected with the input end of the compression circuit. The multiplier performs regular signed number coding on received data through the regular signed number coding circuit, so that the number of effective partial products obtained is small and the complexity of the multiplier in implementing multiplication is reduced.

Description

Multiplier and method for generating a digital signal
Technical Field
The present application relates to the field of computer technologies, and in particular, to a multiplier.
Background
With the continuous development of digital electronic technology, the rapid development of various Artificial Intelligence (AI) chips places ever higher demands on high-performance digital multipliers. Neural network algorithms are among the algorithms most widely used on intelligent chips, and multiplication performed by a multiplier is a common operation in them.
At present, a multiplier encodes every three bits of the multiplier operand as one code, obtains partial products according to the multiplicand, and compresses all the partial products by using a Wallace tree to obtain the multiplication result. However, in this conventional technique, the codes contain many non-zero values and correspondingly many partial products are generated, so the complexity of the multiplier in implementing multiplication is high.
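As a point of comparison, encoding every three bits of the multiplier as one code matches what is commonly called radix-4 Booth encoding. The following Python sketch assumes that reading of the conventional technique; the function name `booth_radix4_digits` and the 8-bit example are illustrative and are not taken from this application.

```python
def booth_radix4_digits(multiplier: int, width: int) -> list[int]:
    """Return radix-4 Booth digits, least significant first.

    Assumption: `multiplier` is a signed two's-complement value that fits
    in `width` bits, and `width` is even.
    """
    assert width % 2 == 0
    bits = multiplier & ((1 << width) - 1)   # two's-complement bit pattern
    padded = bits << 1                       # implicit 0 below bit 0
    digits = []
    for i in range(0, width, 2):
        group = (padded >> i) & 0b111        # bits i+1, i, i-1 of the operand
        below = group & 1
        mid = (group >> 1) & 1
        above = (group >> 2) & 1
        digits.append(below + mid - 2 * above)   # digit in {-2, -1, 0, 1, 2}
    return digits


if __name__ == "__main__":
    digits = booth_radix4_digits(0b01101110, 8)
    print(digits, "non-zero digits:", sum(d != 0 for d in digits))
```

For the operand 01101110, this sketch produces the digits -2, 0, -1, 2 (least significant first), so three of the four codes are non-zero and each non-zero code generates a partial product.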
Summary of the Utility Model
In view of the above, it is necessary to provide a multiplier to solve the above technical problems.
An embodiment of the present application provides a multiplier, where the multiplier includes a judging circuit, a data expansion circuit, a regular signed number coding circuit and a compression circuit; the output end of the judging circuit is connected with the input end of the data expansion circuit and the first input end of the regular signed number coding circuit, the output end of the data expansion circuit is connected with the second input end of the regular signed number coding circuit, the output end of the regular signed number coding circuit is connected with the first input end of the compression circuit, and the regular signed number coding circuit comprises a third input end for receiving a function selection mode signal; the compression circuit comprises a second input end for receiving the function selection mode signal;
the judging circuit is used for judging whether the received data needs to be processed through the data expansion circuit connected with the output end of the judging circuit, the data expansion circuit is used for carrying out expansion processing on the received data, the regular signed number coding circuit is used for carrying out regular signed number coding processing on the received data to obtain a partial product of a target code, and the compression circuit is used for carrying out accumulation processing on the partial product of the target code.
In one embodiment, the determining circuit includes: a first data input port and a first data output port; the first data input port is used for receiving data for multiplication processing, and the first data output port is used for outputting the received data.
In one embodiment, the data expansion circuit includes: a second data input port, an extended mode selection signal input port, a function selection mode signal output port, and a second data output port; the second data input port is configured to receive the data output by the determining circuit, the extension mode selection signal input port is configured to receive a data extension mode selection signal corresponding to extension processing performed on the received data, the function selection mode signal output port is configured to output the function selection mode signal determined according to a mode in which the data extension circuit performs extension processing on the received data, and the second data output port is configured to output the data after the extension processing.
In one embodiment, the regular signed number encoding circuit comprises: a regular signed number coding sub-circuit and a partial product obtaining sub-circuit, wherein the output end of the regular signed number coding sub-circuit is connected with the first input end of the partial product obtaining sub-circuit;
the regular signed number coding sub-circuit is used for carrying out regular signed number coding processing on the received data to obtain a target code, and the partial product obtaining sub-circuit is used for obtaining a partial product of the target code according to the target code.
In one embodiment, the regular signed number encoding sub-circuit comprises: a third data input port and an encoding output port; the third data input port is used for receiving first data to be subjected to regular signed number coding processing, and the encoding output port is used for outputting the target code obtained after the received first data is subjected to the regular signed number coding processing.
In one embodiment, the partial product acquisition sub-circuit comprises: a low bit partial product obtaining unit, a low bit selector set unit, a high bit partial product obtaining unit and a high bit selector set unit; a first output end of the regular signed number coding sub-circuit is connected with a first input end of the low-order partial product acquisition unit, an output end of the low-order selector set unit is connected with a second input end of the low-order partial product acquisition unit, a second output end of the regular signed number coding sub-circuit is connected with a first input end of the high-order partial product acquisition unit, and an output end of the high-order selector set unit is connected with a second input end of the high-order partial product acquisition unit;
the low bit partial product obtaining unit is configured to obtain a low bit partial product after sign bit extension according to a low bit target code in the received target code and second data, and obtain a low bit partial product of the target code according to the low bit partial product after sign bit extension, the low bit selector set unit is configured to gate a numerical value in the low bit partial product after sign bit extension, the high bit partial product obtaining unit is configured to obtain a high bit partial product after sign bit extension according to a high bit target code in the received target code and the second data, and obtain a high bit partial product of the target code according to the high bit partial product after sign bit extension, and the high bit selector set unit is configured to gate a numerical value in the high bit partial product after sign bit extension.
In one embodiment, the compression circuit comprises: a Wallace tree group sub-circuit and an accumulation sub-circuit; the output end of the Wallace tree group sub-circuit is connected with the input end of the accumulation sub-circuit; the Wallace tree group sub-circuit is used for accumulating the partial product of the target code to obtain an accumulation operation result, and the accumulation sub-circuit is used for accumulating the accumulation operation result to obtain the target operation result.
In one embodiment, the Wallace tree group sub-circuit comprises: a low-order Wallace tree unit, a selector and a high-order Wallace tree unit, wherein the output end of the low-order Wallace tree unit is connected with the input end of the selector, and the output end of the selector is connected with the input end of the high-order Wallace tree unit; the low-order Wallace tree unit is used for performing accumulation operation on each column value in the partial product of the target code, the selector is used for gating a carry input signal received by the high-order Wallace tree unit, and the high-order Wallace tree unit is used for performing accumulation operation on each column value in the partial product of the target code.
In one embodiment, the accumulation sub-circuit comprises: an adder for adding the accumulation operation result.
In one embodiment, the adder comprises: a carry signal input port, a sum signal input port and an operation result output port; the carry signal input port is used for receiving a carry signal, the sum signal input port is used for receiving a sum signal, and the operation result output port is used for outputting a target operation result obtained by accumulating the carry signal and the sum signal.
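The division of labour between the Wallace tree group sub-circuit (which reduces the partial products to a sum signal and a carry signal) and the adder (which adds those two signals) can be illustrated with a word-level carry-save model. The following Python sketch is a simplified behavioural model under stated assumptions: it reduces whole rows with 3:2 compressors rather than reproducing the column-wise low-order and high-order Wallace tree units and the selector, and the names `compress_3_to_2`, `wallace_reduce` and `final_add` are hypothetical.

```python
def compress_3_to_2(a: int, b: int, c: int, mask: int) -> tuple[int, int]:
    """Full-adder row: three operands in, a sum word and a carry word out."""
    sum_word = a ^ b ^ c
    carry_word = ((a & b) | (a & c) | (b & c)) << 1
    return sum_word & mask, carry_word & mask


def wallace_reduce(rows: list[int], width: int) -> tuple[int, int]:
    """Reduce any number of rows to two rows (sum, carry), modulo 2**width."""
    mask = (1 << width) - 1
    rows = [r & mask for r in rows]
    while len(rows) > 2:
        reduced = []
        full_groups = len(rows) // 3
        for g in range(full_groups):
            a, b, c = rows[3 * g: 3 * g + 3]
            reduced.extend(compress_3_to_2(a, b, c, mask))
        reduced.extend(rows[3 * full_groups:])   # rows left over this level
        rows = reduced
    rows += [0] * (2 - len(rows))                # pad if fewer than two rows
    return rows[0], rows[1]


def final_add(sum_word: int, carry_word: int, width: int) -> int:
    """Models the adder that accumulates the sum and carry signals."""
    return (sum_word + carry_word) & ((1 << width) - 1)


if __name__ == "__main__":
    partial_products = [0x00F3, 0xFF10, 0x0A0A, 0x1234, 0x8001]
    s, c = wallace_reduce(partial_products, 16)
    print(hex(final_add(s, c, 16)))
```

The carry-save form defers carry propagation to the single final addition, which is the usual reason for placing a compression tree in front of one adder.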
In the multiplier provided by the embodiment, the regular signed number encoding can be performed on the received data through the regular signed number encoding circuit, the number of the obtained effective partial products is small, and therefore the complexity of the multiplier for realizing multiplication operation is reduced.
The machine learning arithmetic device provided by the embodiment of the application comprises one or more multipliers described in the first aspect; the machine learning arithmetic device is used for acquiring data to be operated and control information from other processing devices, executing specified machine learning arithmetic and transmitting an execution result to other processing devices through an I/O interface;
when the machine learning arithmetic device comprises a plurality of multipliers, the plurality of multipliers are connected through a specific structure and transmit data;
the multipliers are interconnected through a PCIE bus and transmit data so as to support larger-scale machine learning operation; a plurality of multipliers share the same control system or own respective control systems; a plurality of multipliers share a memory or own respective memories; the interconnection mode of a plurality of multipliers is any interconnection topology.
The combined processing device provided by the embodiment of the application comprises the machine learning arithmetic device, a universal interconnection interface and other processing devices; the machine learning arithmetic device interacts with the other processing devices to jointly complete the operation designated by the user; the combined processing device may further include a storage device, which is connected to the machine learning arithmetic device and the other processing devices, respectively, and is configured to store data of the machine learning arithmetic device and the other processing devices.
The neural network chip provided by the embodiment of the application comprises the multiplier, the machine learning arithmetic device or the combined processing device.
The neural network chip packaging structure provided by the embodiment of the application comprises the neural network chip.
The board card provided by the embodiment of the application comprises the neural network chip packaging structure.
The embodiment of the application provides an electronic device, which comprises the neural network chip or the board card.
The chip provided by the embodiment of the application comprises at least one multiplier as described in any one of the above.
An electronic device provided by the embodiment of the application comprises the chip.
Drawings
FIG. 1 is a schematic structural diagram of a multiplier according to an embodiment;
FIG. 2 is a schematic diagram of another multiplier according to another embodiment;
FIG. 3 is a circuit diagram of an embodiment of a multiplier;
FIG. 4 is a schematic diagram illustrating a distribution rule of partial products obtained by 8-bit data multiplication according to an embodiment;
FIG. 5 is a schematic diagram illustrating a distribution rule of partial products obtained by 16-bit data multiplication according to an embodiment;
FIG. 6 is a specific circuit diagram of a compression circuit for 8-bit data operation according to another embodiment;
FIG. 7 is a circuit diagram of another embodiment of a multiplier;
FIG. 8 is a flowchart illustrating a data processing method according to an embodiment;
FIG. 9 is a flow chart illustrating another data processing method according to an embodiment;
FIG. 10 is a block diagram of a combined processing device according to an embodiment;
FIG. 11 is a block diagram of another integrated processing device according to an embodiment;
FIG. 12 is a schematic structural diagram of a board card according to an embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
The multiplier provided by the application can be applied to an AI chip, a Field-Programmable Gate Array (FPGA) chip or other hardware circuit devices for multiplication processing, and the specific structural schematic diagram of the multiplier is shown in fig. 1 and 2.
FIG. 1 is a structural diagram of a multiplier according to an embodiment. As shown in FIG. 1, the multiplier includes: a modified regular signed number encoding circuit 11 and a modified compression circuit 12; the output end of the modified regular signed number encoding circuit 11 is connected with the input end of the modified compression circuit 12; the modified regular signed number encoding circuit 11 includes a first input end for receiving a function selection mode signal; the modified compression circuit 12 includes a first input end for receiving the function selection mode signal. Optionally, the function selection mode signal is used to determine the data bit width that can be processed by the multiplier.
Optionally, the modified regular signed number encoding circuit 11 includes an encoding processing branch 111 and a partial product obtaining branch 112, where the encoding processing branch 111 is configured to perform regular signed number encoding processing on received data to obtain a target code, the partial product obtaining branch 112 is configured to obtain a partial product after sign bit extension according to the target code, and obtain a partial product of the target code according to the partial product after sign bit extension, and the modified compression circuit 12 is configured to perform accumulation processing on the partial product of the target code to obtain a target operation result.
Specifically, the modified regular signed number encoding circuit 11 may include two data processing branches with different functions, namely the encoding processing branch 111 and the partial product obtaining branch 112. The data received by the modified regular signed number encoding circuit 11 may be fixed-point numbers, and may serve as the multiplier or as the multiplicand in a multiplication operation. Optionally, the encoding processing branch 111 may include a unit having a regular signed number encoding processing function, and the partial product obtaining branch 112 may include a plurality of data processing units with different functions. Optionally, the modified regular signed number encoding circuit 11 may receive data with a plurality of different bit widths, that is, the multiplier provided in this embodiment may process multiplication of data with a plurality of different bit widths. However, in the same multiplication, the multiplier and the multiplicand received by the modified regular signed number encoding circuit 11 may have the same bit width. For example, the multiplier provided in this embodiment may process 8-bit by 8-bit, 16-bit by 16-bit, 32-bit by 32-bit, and 64-bit by 64-bit data multiplication, which is not limited in this embodiment.
In this embodiment, the modified regular signed number encoding circuit 11 may perform regular signed number encoding processing on the received multiplier to obtain a target code, and obtain sign-extended partial products according to the received multiplicand and the target code, where the bit width of each sign-extended partial product may be equal to 2 times the bit width of the data currently processed by the multiplier. Optionally, the regular signed number encoding processing may be characterized as a data processing procedure that encodes data using the values 0, -1 and 1. Illustratively, suppose the modified regular signed number encoding circuit 11 receives data with a bit width of 16 bits. If the multiplier is currently processing 8-bit data multiplication, the modified regular signed number encoding circuit 11 may divide the 16-bit data into a high 8-bit group and a low 8-bit group and perform regular signed number encoding processing on each group; in this case, the bit width of each sign-extended partial product may be 16 bits, 9 sign-extended high-order partial products may be obtained for the high 8-bit data, and 9 sign-extended low-order partial products may be obtained for the low 8-bit data. If the multiplier is currently processing 16-bit data multiplication, the modified regular signed number encoding circuit 11 may operate on the entire 16-bit data; in this case, the bit width of each sign-extended partial product may be 32 bits, and the number of sign-extended partial products may be equal to the bit width of the data currently processed by the multiplier plus 2.
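The relationship between a code made of the values 0, -1 and 1 and the sign-extended partial products can be sketched as follows. This is a minimal Python model under stated assumptions: the 9-digit code for the 8-bit multiplier 01101110 (decimal 110) was derived by hand for illustration, the multiplicand is treated as a signed 8-bit value, and the names `sign_extended_partial_products` and `to_signed` are hypothetical.

```python
def sign_extended_partial_products(digits_msb_first, multiplicand, n_bits):
    """One row per code digit, each kept as a 2*n_bits two's-complement word.

    A digit of 1 selects X, -1 selects -X and 0 selects 0; the row is then
    shifted left by the weight of the digit.
    """
    width = 2 * n_bits
    mask = (1 << width) - 1
    rows = []
    for shift, digit in enumerate(reversed(digits_msb_first)):
        rows.append(((digit * multiplicand) << shift) & mask)
    return rows


def to_signed(word, width):
    """Interpret a `width`-bit pattern as a two's-complement integer."""
    return word - (1 << width) if (word >> (width - 1)) & 1 else word


if __name__ == "__main__":
    # Hand-derived code for 110 = 128 - 16 - 2, with one leading 0 digit.
    code = [0, 1, 0, 0, -1, 0, 0, -1, 0]
    rows = sign_extended_partial_products(code, -37, 8)
    effective = [r for r in rows if r != 0]
    product = to_signed(sum(rows) & 0xFFFF, 16)
    print(len(rows), len(effective), product)
```

In this example there are 9 rows of 16 bits each, matching the 8-bit case described above, but only the rows produced by non-zero digits contribute to the sum.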
It will be appreciated that there may be several function selection mode signals, each corresponding to multiplication of data of a different bit width that the multiplier can currently process. Optionally, in the same multiplication, the function selection mode signals received by the modified regular signed number encoding circuit 11 and the modified compression circuit 12 may be the same.
For example, the modified regular signed number encoding circuit 11 and the modified compression circuit 12 can receive a plurality of function selection mode signals. Taking three function selection mode signals as an example, with mode values 00, 01 and 10 respectively: mode 00 may indicate that the multiplier can process 16-bit data, mode 01 may indicate that the multiplier can process 32-bit data, and mode 10 may indicate that the multiplier can process 64-bit data; alternatively, mode 00 may indicate that the multiplier can process 64-bit data, mode 01 may indicate that the multiplier can process 16-bit data, and mode 10 may indicate that the multiplier can process 32-bit data.
In this embodiment, the modified regular signed number encoding circuit 11 may receive a multiplier in the multiplication operation, and perform regular signed number encoding processing on the multiplier to obtain the target code. It should be noted that the regular signed number encoding processing can be characterized as follows: for an N-bit multiplier, the values are processed from the low-order bits to the high-order bits; if there is a run of l (l ≥ 2) consecutive bits of value 1, the l consecutive 1s can be converted into the data "1(0)_{l-1}(-1)", that is, a 1 followed by (l-1) zeros and then a -1, and the remaining (N-l) bit values are combined with the converted (l+1) values to obtain new data; the new data is then used as the initial data of the next stage of conversion processing, until no run of l (l ≥ 2) consecutive 1s remains in the new data obtained after the conversion processing. After the N-bit multiplier is subjected to regular signed number encoding processing, the bit width of the obtained target code can be equal to (N+1). Further, in the regular signed number encoding processing, the data 11 can be written as (100 - 001), that is, the data 11 can be equivalently converted into 10(-1); the data 111 can be written as (1000 - 0001), that is, the data 111 can be equivalently converted into 100(-1); and the conversion of other runs of l (l ≥ 2) consecutive 1s is similar.
For example, suppose the multiplier received by the modified regular signed number encoding circuit 11 is "001010101101110". The first-stage conversion of the multiplier yields the first new data "0010101011100(-1)0"; the second-stage conversion of the first new data yields the second new data "0010101100(-1)00(-1)0"; the third-stage conversion of the second new data yields the third new data "0010110(-1)00(-1)00(-1)0"; the fourth-stage conversion of the third new data yields the fourth new data "00110(-1)0(-1)00(-1)00(-1)0"; and the fifth-stage conversion of the fourth new data yields the fifth new data "010(-1)0(-1)0(-1)00(-1)00(-1)0". Since the fifth new data contains no run of l (l ≥ 2) consecutive 1s, the fifth new data may be called the initial code; after one bit is prepended to the initial code, the regular signed number encoding processing is complete and an intermediate code is obtained, where the bit width of the initial code may be equal to the bit width of the multiplier. Optionally, after the modified regular signed number encoding circuit 11 performs the regular signed number encoding processing on the multiplier to obtain new data (i.e., the initial code), if the highest-order value and the second-highest-order value of the new data are "10" or "01", the modified regular signed number encoding circuit 11 may prepend one bit of value 0 above the highest-order value of the new data, so that the three highest-order values of the corresponding intermediate code are "010" or "001", respectively. Optionally, the bit width of the intermediate code may be equal to the bit width of the data currently processed by the multiplier plus 1.
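The staged conversion can be traced mechanically. The following Python sketch applies the rule of replacing the lowest run of two or more consecutive 1s with a 1, zeros and a -1, one run per stage, and records the data after each stage. It is an illustrative software model only: the function names are hypothetical, the input is treated as a plain bit string, and letting the result grow by one digit when a run reaches the highest-order bit is an assumption made here, not something the text specifies.

```python
def lowest_run_of_ones(digits):
    """Return (start, length) of the lowest run of >= 2 consecutive 1s, or None.

    `digits` is least significant digit first.
    """
    i = 0
    while i < len(digits):
        if digits[i] == 1:
            j = i
            while j < len(digits) and digits[j] == 1:
                j += 1
            if j - i >= 2:
                return i, j - i
            i = j
        else:
            i += 1
    return None


def rsn_encode_stages(bits):
    """Apply the staged conversion to a bit string; return each stage's digits."""
    digits = [int(b) for b in reversed(bits)]     # least significant first
    stages = []
    while (run := lowest_run_of_ones(digits)) is not None:
        start, length = run
        digits[start] = -1                        # lowest 1 of the run becomes -1
        for k in range(start + 1, start + length):
            digits[k] = 0                         # the other 1s become 0
        if start + length == len(digits):
            digits.append(0)                      # assumption: grow by one digit
        digits[start + length] = 1                # 1 placed just above the run
        stages.append(list(digits))
    return stages


def format_digits(digits):
    """Render digits most significant first, writing -1 as "(-1)"."""
    return "".join("(-1)" if d == -1 else str(d) for d in reversed(digits))


if __name__ == "__main__":
    for number, stage in enumerate(rsn_encode_stages("001010101101110"), 1):
        print(number, format_digits(stage))
```

Each stage preserves the numeric value of the data, which can be checked by summing digit times power of two over all positions.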
In addition, if the bit width of the data received by the multiplier is 2N and the multiplier is currently operating on N-bit data, the modified regular signed number encoding circuit 11 in the multiplier can divide the 2N-bit data into two groups of N-bit data for data operation, and the two resulting (N+1)-bit intermediate codes are combined to form the target code; if the multiplier is currently operating on 2N-bit data, the modified regular signed number encoding circuit 11 in the multiplier may prepend one bit of value 0 (i.e., bit-padding processing) above the highest-order value of the obtained (2N+1)-bit intermediate code, and then use the padded (2N+2)-bit data as the target code.
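In both cases the target code ends up (2N+2) values wide, which a short model makes concrete. The following Python sketch is written under assumptions: operands are treated as unsigned bit strings, the intermediate code is modelled as the encoded data with one extra high-order position that also absorbs any carry out of the top run of 1s, and the names `intermediate_code`, `target_code` and `value` are hypothetical.

```python
def intermediate_code(bits):
    """Encode an N-bit string into N+1 values from {-1, 0, 1}, MSB first."""
    n = len(bits)
    digits = [int(b) for b in reversed(bits)] + [0]   # LSB first, one spare digit
    i = 0
    while i < n:
        if digits[i] == 1 and digits[i + 1] == 1:     # run of at least two 1s
            digits[i] = -1
            j = i + 1
            while digits[j] == 1:
                digits[j] = 0
                j += 1
            digits[j] = 1                             # 1 just above the run
            i = j                                     # it may start a new run
        else:
            i += 1
    return digits[::-1]


def target_code(bits_2n, split):
    """Build the (2N+2)-value target code for a 2N-bit operand.

    split=True models N-bit operation: high and low halves encoded separately.
    split=False models 2N-bit operation: one extra 0 is prepended.
    """
    if split:
        half = len(bits_2n) // 2
        return intermediate_code(bits_2n[:half]) + intermediate_code(bits_2n[half:])
    return [0] + intermediate_code(bits_2n)


def value(digits_msb_first):
    """Numeric value of a signed-digit string."""
    total = 0
    for d in digits_msb_first:
        total = 2 * total + d
    return total


if __name__ == "__main__":
    data = "0110111001011011"
    print(target_code(data, split=True))
    print(target_code(data, split=False))
```

Keeping the target code the same width in both modes is what would let the same downstream partial product and compression hardware serve either data width.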
In the multiplier provided by this embodiment, the modified regular signed number encoding circuit performs regular signed number encoding processing on the received data to obtain sign-extended partial products and obtains the partial products of the target code from the sign-extended partial products, and the modified compression circuit accumulates the partial products of the target code to obtain the target operation result of the multiplication. Because the multiplier uses the modified regular signed number encoding circuit to encode the received data, the number of effective partial products obtained in the multiplication process is reduced, which reduces the complexity of the multiplier in implementing multiplication, improves the operation efficiency of the multiplication, and effectively reduces the power consumption of the multiplier.
Fig. 2 is a schematic circuit diagram of a multiplier according to another embodiment. As shown in fig. 2, the multiplier includes: a judgment circuit 21, a data expansion circuit 22, a regular signed number encoding circuit 23, and a compression circuit 24; the output end of the judging circuit 21 is connected to the input end of the data expanding circuit 22, the output end of the judging circuit 21 is connected to the first input end of the regular signed number encoding circuit 23, the output end of the data expanding circuit 22 is connected to the second input end of the regular signed number encoding circuit 23, and the output end of the regular signed number encoding circuit 23 is connected to the input end of the compressing circuit 24. The judging circuit 21 is configured to judge whether received data needs to be processed by the data expanding circuit 22 connected to the output end of the judging circuit 21, the data expanding circuit 22 is configured to expand the received data, the regular signed number encoding circuit 23 is configured to perform regular signed number encoding on the received data to obtain a partial product of a target code, and the compressing circuit 24 is configured to perform accumulation processing on the partial product of the target code.
Specifically, the judging circuit 21 may be a circuit that automatically compares the bit width of the received data with the bit width of the data that can be processed by the multiplier, namely 2N. Optionally, the regular signed number encoding circuit 23 may include a plurality of data processing units with different functions, and the data received by the regular signed number encoding circuit 23 may serve as the multiplier or as the multiplicand in a multiplication operation. The data received by the regular signed number encoding circuit 23 may be the two data output by the judging circuit 21, or may be the data obtained after the data expansion circuit 22 performs expansion processing on the two received data. Optionally, the data processing units with different functions may be data processing units with a regular signed number encoding function. Optionally, the multiplier and the multiplicand may be fixed-point numbers of multi-bit width. Optionally, the compression circuit 24 may perform accumulation processing on the partial products of the target code obtained by the regular signed number encoding circuit 23 to obtain the target operation result of the multiplication operation.
It should be noted that the multiplier may perform multiplication operation on data with a fixed 2N bit width, and it is also understood that the regular signed number encoding circuit 23 and the compressing circuit 24 in the multiplier may perform multiplication operation on data with a 2N bit width. However, in the same multiplication, the multiplier and the multiplicand received by the regular signed number encoding circuit 23 are data of the same bit width. For example, the multiplier provided in this embodiment may process a data multiplication operation of 8 bits by 8 bits, a data multiplication operation of 16 bits by 16 bits, a data multiplication operation of 32 bits by 32 bits, and a data multiplication operation of 64 bits by 64 bits, which is not limited in this embodiment. Optionally, there may be one input port of the data processing unit with different functions, the function of each input port of each data processing unit may be the same, there may also be one output port, the function of each output port of each data processing unit may be different, and the circuit structures of the data processing units with different functions may be different.
Optionally, the regular signed number encoding circuit 23 includes a third input end, configured to receive a function selection mode signal; the compression circuit 24 includes a second input terminal for receiving the function selection mode signal.
In the multiplier provided by this embodiment, the judging circuit determines whether the received data needs to be processed by the data expansion circuit that follows it. If the received data does not need to be processed by the data expansion circuit, the judging circuit inputs the received data directly to the regular signed number encoding circuit, which performs regular signed number encoding processing to obtain the partial products of the target code. Otherwise, the received data is input to the data expansion circuit for expansion processing, and the expanded data is then input to the regular signed number encoding circuit, which performs regular signed number encoding processing to obtain the partial products of the target code. The compression circuit accumulates the partial products of the target code to obtain the target operation result of the multiplication operation. The multiplier can thus expand received low-bit-width data so that the expanded data meets the bit width requirement of the data that the multiplier can process, while the target operation result is still the result of multiplying the original-bit-width data; the multiplier can therefore process operations on low-bit-width data, which effectively reduces the area of the AI chip occupied by the multiplier. Meanwhile, because the multiplier uses the regular signed number encoding circuit to perform regular signed number encoding processing on the received data, the number of effective partial products obtained in the multiplication process is reduced, which reduces the complexity of the multiplier in implementing multiplication, improves the operation efficiency of the multiplication, and effectively reduces the power consumption of the multiplier.
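The judge-then-expand flow can be sketched in software. The following Python model assumes that expansion means sign extension of a two's-complement operand to the multiplier's fixed bit width; the text does not state which expansion method is used, so this is only one plausible choice, and the names `needs_expansion`, `sign_extend`, `prepare_operand` and `to_signed` are hypothetical.

```python
def needs_expansion(data_width: int, multiplier_width: int) -> bool:
    """Models the judging step: narrower data must be expanded first."""
    return data_width < multiplier_width


def sign_extend(pattern: int, from_width: int, to_width: int) -> int:
    """Assumed expansion method: replicate the sign bit into the new high bits."""
    pattern &= (1 << from_width) - 1
    if (pattern >> (from_width - 1)) & 1:
        pattern |= ((1 << to_width) - 1) & ~((1 << from_width) - 1)
    return pattern


def to_signed(pattern: int, width: int) -> int:
    """Interpret a `width`-bit pattern as a two's-complement integer."""
    return pattern - (1 << width) if (pattern >> (width - 1)) & 1 else pattern


def prepare_operand(pattern: int, data_width: int, multiplier_width: int) -> int:
    """Route data either directly onward or through the expansion step."""
    if needs_expansion(data_width, multiplier_width):
        return sign_extend(pattern, data_width, multiplier_width)
    return pattern & ((1 << multiplier_width) - 1)


if __name__ == "__main__":
    a = prepare_operand(0xFB, 8, 16)   # 8-bit pattern for -5
    b = prepare_operand(0x07, 8, 16)   # 8-bit pattern for 7
    print(hex(a), hex(b), to_signed(a, 16) * to_signed(b, 16))
```

Under this assumption the expanded operands represent the same values as the originals, so the product computed at the wider width equals the product of the original low-bit-width data.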
Fig. 3 is a schematic diagram of a specific structure of a multiplier according to another embodiment, in which the encoding processing branch 111 of the multiplier includes a modified regular signed number encoding unit 1111, and the partial product obtaining branch 112 includes a low-order partial product obtaining unit 1121, a low-order selector group unit 1122, a high-order partial product obtaining unit 1123, and a high-order selector group unit 1124. A first output terminal of the modified regular signed number encoding unit 1111 is connected to a first input terminal of the low-order partial product obtaining unit 1121, and an output terminal of the low-order selector group unit 1122 is connected to a second input terminal of the low-order partial product obtaining unit 1121. A second output terminal of the modified regular signed number encoding unit 1111 is connected to a first input terminal of the high-order partial product obtaining unit 1123, and an output terminal of the high-order selector group unit 1124 is connected to a second input terminal of the high-order partial product obtaining unit 1123.
The modified regular signed number encoding unit 1111 is configured to perform regular signed number encoding on the received first data, determine from the received function selection mode signal the bit width of the data that the multiplier can process, and obtain a target code according to that bit width. The low-order partial product obtaining unit 1121 is configured to obtain a sign-extended low-order partial product according to the low-order target code in the received target code and the second data, and to obtain a low-order partial product of the target code according to the sign-extended low-order partial product. The low-order selector group unit 1122 is configured to gate values in the sign-extended low-order partial product. The high-order partial product obtaining unit 1123 is configured to obtain a sign-extended high-order partial product according to the high-order target code in the received target code and the second data, and to obtain a high-order partial product of the target code according to the sign-extended high-order partial product. The high-order selector group unit 1124 is configured to gate values in the sign-extended high-order partial product.
Specifically, the modified regular signed number encoding unit 1111 may receive first data and perform regular signed number encoding on it to obtain a target code; the first data may be the multiplier in the multiplication operation. Optionally, the low-order partial product obtaining unit 1121 may obtain the low-order partial product of the target code according to the low-order target code obtained by the modified regular signed number encoding unit 1111 and the received second data; the high-order partial product obtaining unit 1123 may obtain the high-order partial product of the target code according to the high-order target code obtained by the modified regular signed number encoding unit 1111 and the received second data. The second data may be the multiplicand in the multiplication operation. Optionally, if the bit width of the data received by the modified regular signed number encoding unit 1111 is 2N and the bit width of the data currently processed by the multiplier is N bits, the modified regular signed number encoding unit 1111 may automatically split the received 2N-bit data into high N-bit data and low N-bit data and perform regular signed number encoding on each. The number of values in the high-order target code obtained may be equal to (N+1), and the number of values in the low-order target code obtained may also be equal to (N+1); correspondingly, the number of high-order partial products of the target code obtained from the high-order target code may be equal to (N+1), and the number of low-order partial products of the target code obtained from the low-order target code may be equal to (N+1). If the bit width of the data received by the modified regular signed number encoding unit 1111 is 2N and the bit width of the data currently processable by the multiplier is also 2N bits, the modified regular signed number encoding unit 1111 may perform regular signed number encoding on the received 2N-bit data to obtain a (2N+1)-bit intermediate code, pad a value 0 above the highest bit of the intermediate code, and use the resulting (2N+2)-bit code as the target code. That is, the highest bit value in the target code is 0, and all values in the partial product of the target code corresponding to this 0 are 0. The high (N+1) bit values in the (2N+2)-bit target code may be called the high-order target code, and the low (N+1) bit values may be called the low-order target code.
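The two encoding cases above can be illustrated with a minimal Python sketch. It assumes that the regular signed number encoding behaves like the standard canonical signed digit (non-adjacent form) recoding of a two's-complement value, which this passage does not spell out. The names `csd_digits`, `to_signed`, and `target_code`, and the string values used for `mode`, are hypothetical and chosen only for illustration.

```python
def csd_digits(value, count):
    """Recode a signed integer into `count` digits from {-1, 0, 1}, lowest digit first.

    Assumption: standard non-adjacent-form recoding stands in for the
    regular signed number encoding described in the text.
    """
    digits = []
    v = value
    for _ in range(count):
        if v & 1:
            d = 2 - (v & 3)   # v mod 4 == 1 gives +1, v mod 4 == 3 gives -1
            v -= d
        else:
            d = 0
        digits.append(d)
        v >>= 1
    if v != 0:
        raise ValueError("value does not fit in the requested number of digits")
    return digits


def to_signed(bits, width):
    """Interpret the low `width` bits of `bits` as a two's-complement value."""
    bits &= (1 << width) - 1
    return bits - (1 << width) if bits >> (width - 1) else bits


def target_code(data, n, mode):
    """Return (low_code, high_code), each with n + 1 digits, lowest digit first.

    data: 2N-bit input pattern.
    mode: "N" or "2N" (hypothetical stand-ins for the function selection mode signal).
    """
    if mode == "N":
        low = to_signed(data, n)          # low N-bit operand
        high = to_signed(data >> n, n)    # high N-bit operand
        return csd_digits(low, n + 1), csd_digits(high, n + 1)
    # 2N mode: (2N + 1)-digit intermediate code, then pad a 0 above the top digit.
    intermediate = csd_digits(to_signed(data, 2 * n), 2 * n + 1)
    code = intermediate + [0]             # (2N + 2)-digit target code
    return code[: n + 1], code[n + 1 :]
```

In both modes the sketch returns (N+1) low-order and (N+1) high-order digits, so the same number of partial product obtaining units can serve either bit width.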
It should be noted that the low-order selector group unit 1122 may, according to the received function selection mode signal, gate some of the bit values in the sign-extended low-order partial product, that is, select whether each such bit value is the value in the sign-extended partial product obtained for N-bit multiplication or the value in the sign-extended partial product obtained for 2N-bit multiplication. Similarly, the high-order selector group unit 1124 may, according to the received function selection mode signal, gate some of the bit values in the sign-extended high-order partial product, selecting between the value in the sign-extended partial product obtained for N-bit multiplication and the value in the sign-extended partial product obtained for 2N-bit multiplication.
It can be understood that, if the bit width of the data received by the multiplier is 2N and the multiplier is currently processing an N-bit multiplication, the low-order partial product obtaining unit 1121 may obtain, according to each value in the low-order target code, the sign-extended partial product corresponding to the low N-bit data; the low-order selector group unit 1122 may gate values in the sign-extended low-order partial product; and the sign-extended partial product is then combined with the gated values to obtain the sign-extended low-order partial product. Optionally, the high-order partial product obtaining unit 1123 may obtain, according to each value in the high-order target code, the sign-extended partial product corresponding to the high N-bit data; the high-order selector group unit 1124 may gate values in the sign-extended high-order partial product; and the sign-extended partial product is then combined with the gated values to obtain the sign-extended high-order partial product. Optionally, in the regular signed number encoding process, the number of low-order target codes obtained may be equal to the number of high-order target codes obtained, and may also be equal to the number of sign-extended low-order partial products corresponding to the low N-bit data or the number of sign-extended high-order partial products corresponding to the high N-bit data. Optionally, the modified regular signed number encoding circuit 11 may include (N+1) low-order partial product obtaining units 1121 and may further include (N+1) high-order partial product obtaining units 1123.
Optionally, each low-order partial product obtaining unit 1121 and each high-order partial product obtaining unit 1123 may include 2N value generation subunits, each of which may obtain one bit value of the sign-extended partial product. The low-order partial product obtaining unit 1121 may determine the corresponding low-order partial product of the target code according to the obtained sign-extended low-order partial product, and the high-order partial product obtaining unit 1123 may determine the corresponding high-order partial product of the target code according to the obtained sign-extended high-order partial product.
In the multiplier provided by this embodiment, the modified regular signed number encoding unit in the modified regular signed number encoding circuit performs regular signed number encoding on the received data to obtain a target code. The low-order partial product obtaining unit and the high-order partial product obtaining unit obtain sign-extended partial products according to the low-order target code and the high-order target code in the target code, and obtain the corresponding partial products of the target code according to the sign-extended partial products. The compression circuit then accumulates the partial products of the target code to obtain the target operation result of the multiplication. Because the multiplier uses the modified regular signed number encoding unit to encode the received data, fewer effective partial products are obtained during multiplication, which reduces the complexity of the multiplication, improves its operation efficiency, and effectively reduces the power consumption of the multiplier.
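The flow summarized above (target code, partial products, accumulation) can be sketched in Python as follows. This is only an illustration under stated assumptions: the compression circuit is modelled as a plain modular sum over 2N columns rather than as any particular circuit structure, the target code is passed in as a ready-made list of values, and the function name `accumulate` is hypothetical.

```python
def accumulate(digits, x, n):
    """Sum the partial products selected by `digits` and return a signed result.

    digits: target-code values from {-1, 0, 1}, lowest digit first.
    x: signed N-bit multiplicand.
    The running sum is kept to 2N columns, as a stand-in for the
    compression circuit (assumption: plain modular addition).
    """
    mask = (1 << (2 * n)) - 1
    total = 0
    for position, digit in enumerate(digits):
        if digit == 0:
            continue  # a zero digit yields an all-zero row, so nothing is added
        # Row for this digit: -X or X, shifted to its column, higher columns dropped.
        row = ((digit * x) << position) & mask
        total = (total + row) & mask
    # Reinterpret the 2N-bit pattern as a two's-complement value.
    return total - (1 << (2 * n)) if total >> (2 * n - 1) else total
```

For example, the digits `[-1, 0, 0, 1, 0]` represent 7 as 8 - 1, so only two nonzero rows are summed, whereas the plain binary form 0111 would contribute three.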
In one embodiment, the multiplier includes a modified regular signed number encoding unit 1111, and the modified regular signed number encoding unit 1111 includes: a first data input port 1111a, a first mode selection signal input port 1111b, a low-order target code output port 1111c, and a high-order target code output port 1111d. The first data input port 1111a is configured to receive the first data, the first mode selection signal input port 1111b is configured to receive the function selection mode signal, the low-order target code output port 1111c is configured to output the low-order target code obtained by performing regular signed number encoding on the first data, and the high-order target code output port 1111d is configured to output the high-order target code obtained by performing regular signed number encoding on the first data.
Specifically, in the multiplication process, the modified regular signed number encoding unit 1111 may receive the first data through the first data input port 1111a, receive the function selection mode signal through the first mode selection signal input port 1111b, perform regular signed number encoding on the first data to obtain the intermediate code, determine according to the received function selection mode signal whether the intermediate code needs to be padded in order to obtain the target code, output the low-order target code in the target code through the low-order target code output port 1111c, and output the high-order target code in the target code through the high-order target code output port 1111d. The padding may consist of adding a value 0 above the highest bit of the intermediate code.
In the multiplier provided by the embodiment, the regular signed number encoding processing can be performed on the received data by adopting the modified regular signed number encoding unit, so that the number of effective partial products obtained in the multiplication process is reduced, the complexity of the multiplier for realizing multiplication is reduced, the operation efficiency of the multiplication is improved, and the power consumption of the multiplier is effectively reduced; meanwhile, the multiplier can carry out multiplication operation on data with various bit widths, and the area of an AI chip occupied by the multiplier is effectively reduced.
As one embodiment, the low-order partial product obtaining unit 1121 includes: a low-order target code input port 1121a, a first gated value input port 1121b, a second mode selection signal input port 1121c, a second data input port 1121d, and a low-order partial product output port 1121e. The low-order target code input port 1121a is configured to receive the low-order target code, the first gated value input port 1121b is configured to receive the values in the sign-extended low-order partial product that are output after being gated by the low-order selector group unit, the second mode selection signal input port 1121c is configured to receive the function selection mode signal, the second data input port 1121d is configured to receive the second data, and the low-order partial product output port 1121e is configured to output the low-order partial product of the target code.
Specifically, the low-order partial product obtaining unit 1121 may receive, through the low-order target code input port 1121a, the low-order target code output by the modified regular signed number encoding unit 1111, receive the multiplicand of the multiplication operation through the second data input port 1121d, and obtain the sign-extended partial product corresponding to the low-order target code according to the low-order target code and the multiplicand. Optionally, if the function selection mode signal received through the second mode selection signal input port 1121c indicates that the multiplier is to process an N-bit data operation, the bit width of the sign-extended partial product may be equal to 2N. For example, if the multiplier processes an N-bit data operation and the low-order partial product obtaining unit 1121 receives a multiplicand X with a bit width of N bits, the low-order partial product obtaining unit 1121 may directly obtain the corresponding 2N-bit sign-extended partial product according to the multiplicand X and the three values -1, 1, and 0 that the low-order target code may contain. The low (N+1) bit values of the sign-extended partial product may be equal to all values of the original partial product obtained directly from the low-order target code, and the high (N-1) bit values of the sign-extended partial product may be equal to the sign bit value of the original partial product, that is, the highest bit value of the original partial product. The original partial product may be -X when the value in the low-order target code is -1, X when the value in the low-order target code is 1, and 0 when the value in the low-order target code is 0.
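The sign extension in the example above can be written out as a short Python sketch. It follows the stated layout, with the (N+1)-bit original partial product in the low bits and (N-1) copies of its sign bit above; the function name `sign_extended_partial_product` is hypothetical, and the multiplicand is assumed to be a signed N-bit value.

```python
def sign_extended_partial_product(x, digit, n):
    """Return the 2N-bit sign-extended partial product as a bit pattern.

    x: signed N-bit multiplicand (assumed two's complement).
    digit: one value of the target code, -1, 0 or 1.
    """
    if digit not in (-1, 0, 1):
        raise ValueError("digit must be -1, 0 or 1")
    original = digit * x                       # -X, 0 or X
    low = original & ((1 << (n + 1)) - 1)      # original partial product, N + 1 bits
    sign = (low >> n) & 1                      # highest bit of the original partial product
    high = ((1 << (n - 1)) - 1) * sign         # N - 1 copies of the sign bit
    return (high << (n + 1)) | low
```

The result is the same bit pattern as the 2N-bit two's-complement form of `digit * x`, which is why the extended rows can later be added column by column.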
It should be noted that the low-order partial product obtaining unit 1121 may receive, through the first gated value input port 1121b, the corresponding bit values in the sign-extended low-order partial product that the low-order selector group unit 1122 gates for data operations of different bit widths, and may combine the sign-extended partial product corresponding to the low-order target code currently obtained by the multiplier with the gated bit values to obtain the sign-extended low-order partial product.
Further, the low-order partial product obtaining unit 1121 may obtain the corresponding low-order partial products of the target code according to all of the sign-extended low-order partial products, and output the low-order partial products of the target code through the low-order partial product output port 1121e. Optionally, the low-order partial products of all target codes may be arranged as follows. The low-order partial product of the first target code may be equal to the first sign-extended low-order partial product, that is, the sign-extended low-order partial product corresponding to the lowest bit value in the low-order target code. Starting from the low-order partial product of the second target code, the highest bit value of the low-order partial product of each target code is in the same column as the highest bit value of the low-order partial product of the first target code; the low-order partial product of each target code may be equal to the corresponding sign-extended low-order partial product, whose lowest bit value is in the same column as the second-lowest bit value of the low-order partial product of the previous target code. In other words, the values of each sign-extended low-order partial product that extend beyond the highest column of the low-order partial product of the first target code do not participate in subsequent operations.
In the multiplier provided by this embodiment, the multiplier may obtain, through the low-order-portion-product obtaining unit, the low-order-portion product after the sign bit expansion according to each bit value and the second data included in the low-order-target code, obtain the low-order-portion product of the target code according to the low-order-portion product after the sign bit expansion, determine the high-order-portion product of the target code according to the high-order-portion product after the sign bit expansion obtained by the high-order-portion-product obtaining unit, and further perform accumulation processing on the low-order-portion product of the target code and the high-order-portion product of the target code through the correction compression circuit to obtain the target operation result, where the number of effective partial products that can be obtained by the multiplier is small, thereby reducing the complexity of the multiplier in implementing multiplication, improving the operation efficiency of multiplication, and effectively reducing the power consumption of the multiplier; meanwhile, the multiplier can carry out multiplication operation on data with various bit widths, and the area of an AI chip occupied by the multiplier is effectively reduced.
In one embodiment, the multiplier includes the low-order selector group unit 1122, which includes low-order selectors 1122a; a plurality of the low-order selectors 1122a are used to gate values in the sign-extended low-order partial product.
Specifically, the number of low-order selectors 1122a in the low-order selector group unit 1122 may be equal to 3N(N+1), where 2N may represent the bit width of the data currently processed by the multiplier, and the internal circuit structure of each low-order selector 1122a in the low-order selector group unit 1122 may be the same. Optionally, in the multiplication operation, each of the (N+1) low-order partial product obtaining units 1121 connected to the modified regular signed number encoding unit 1111 may include 4N value generation subunits, of which 2N may be connected to 2N low-order selectors 1122a, with each of these value generation subunits connected to one low-order selector 1122a. Optionally, these 2N value generation subunits may be the ones corresponding to the high 2N bit values in the low-order partial product of the target code, and each of the 2N low-order selectors 1122a has two other external input ports besides the function selection mode signal input port (mode). Optionally, if the multiplier can process data operations of different bit widths and the bit width of the data received by the multiplier is 2N, the signals received by the two other input ports of such a low-order selector 1122a may be, respectively, the value 0 and the sign bit value in the corresponding sign-extended partial product obtained by the low-order partial product obtaining unit 1121 when the multiplier performs a data operation with a bit width of 2N bits.
The (N +1) lower partial product obtaining units 1121 may be connected to (N +1) groups of 2N lower selectors 1122a, and sign bit values received by the 2N lower selectors 1122a of each group may be the same or different, but sign bit values received by the 2N lower selectors 1122a of the same group are the same, and the sign bit value may be obtained according to the sign bit value in the partial product after sign bit expansion obtained by the lower partial product obtaining unit 1121 connected to each group of 2N lower selectors 1122 a.
In addition, among the 4N value generation subunits included in each low-order partial product obtaining unit 1121, N value generation subunits may not be connected to any low-order selector 1122a. In this case, the values obtained by these N value generation subunits may be the corresponding bit values in the sign-extended low-order partial product that is obtained from the values in the low-order target code when the multiplier processes data of the different bit widths. It may also be understood that, counting from the lowest bit (i.e., the 1st bit) toward the highest bit, the values obtained by these N value generation subunits may be the 1st through Nth bit values of the corresponding sign-extended low-order partial product.
It should be noted that, among the 4N value generation subunits included in each low-order partial product obtaining unit 1121, the remaining N value generation subunits may be connected to N low-order selectors 1122a, with each value generation subunit connected to one low-order selector 1122a. Each of these N low-order selectors 1122a may have two other external input ports besides the function selection mode signal input port (mode). The signals received by the two other input ports may be, respectively, the sign bit value in the corresponding sign-extended partial product obtained when the multiplier performs a 2N-bit data operation, and the corresponding bit value in the corresponding sign-extended low-order partial product obtained when the multiplier performs a 2N-bit data operation. It may also be understood that, counting from the lowest bit (i.e., the 1st bit) toward the highest bit, the values obtained by these N value generation subunits may be the (N+1)th through 2Nth bit values of the corresponding sign-extended low-order partial product. The (N+1) low-order partial product obtaining units 1121 may be connected to (N+1) groups of N low-order selectors 1122a. The sign bit values received by different groups may be the same or different, but the sign bit values received by the N low-order selectors 1122a within one group are the same, and that sign bit value may be obtained from the sign bit value in the sign-extended partial product obtained by the low-order partial product obtaining unit 1121 connected to the group.
The corresponding bit values in the sign-extended low-order partial product received by the N low-order selectors 1122a of each group may be determined by the corresponding bit values in the sign-extended low-order partial product obtained by the low-order partial product obtaining unit 1121 connected to that group, and the bit values received by the individual low-order selectors 1122a within a group may be the same or different. The positions of the 4N value generation subunits in each low-order partial product obtaining unit 1121 may be shifted left by one value generation subunit relative to the positions of the 4N value generation subunits in the previous low-order partial product obtaining unit 1121. Optionally, among the low-order partial products of all target codes participating in subsequent operations, only the low-order partial product of the first target code may have a bit width equal to 4N; the bit width of each remaining low-order partial product is one bit less than that of the low-order partial product of the previous target code, and the bit width of the low-order partial product of the last target code may be equal to (2N-1).
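The selector count of 3N(N+1) given for the low-order selector group unit 1122 is consistent with the per-unit wiring described above: of the 4N value generation subunits in each low-order partial product obtaining unit 1121, 2N and a further N are each connected to one low-order selector 1122a, N are connected to none, and there are (N+1) such units. As a worked check, with N = 8 used purely as an illustrative value:

```latex
\[
(N+1)\,(2N + N) = 3N(N+1), \qquad N = 8:\quad 9 \times 24 = 216 .
\]
```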
In the multiplier provided by this embodiment, the low bit selector set unit in the multiplier may gate the value in the low bit product after sign bit extension to obtain the low bit product after sign bit extension, obtain the low bit product of the target code according to the low bit product after sign bit extension, and further perform accumulation processing on the low bit product and the high bit product of the target code through the correction compression circuit to obtain the target operation result, the number of effective partial products that can be obtained by the multiplier is small, thereby reducing the complexity of the multiplier for realizing multiplication, improving the operation efficiency of multiplication, and effectively reducing the power consumption of the multiplier; meanwhile, the multiplier can carry out multiplication operation on data with various bit widths, and the area of an AI chip occupied by the multiplier is effectively reduced.
In one embodiment, the multiplier includes a high-order partial product obtaining unit 1123, and the high-order partial product obtaining unit 1123 includes: a high-order target code input port 1123a, a second gated value input port 1123b, a third mode selection signal input port 1123c, a second data input port 1123d, and a high-order partial product output port 1123e. The high-order target code input port 1123a is configured to receive the high-order target code, the second gated value input port 1123b is configured to receive the values in the sign-extended high-order partial product that are output after being gated by the high-order selector group unit, the third mode selection signal input port 1123c is configured to receive the function selection mode signal, the second data input port 1123d is configured to receive the second data, and the high-order partial product output port 1123e is configured to output the high-order partial product of the target code.
Specifically, the high-order partial product obtaining unit 1123 may receive, through the high-order target code input port 1123a, the high-order target code output by the modified regular signed number encoding unit 1111, receive the multiplicand of the multiplication operation through the second data input port 1123d, and obtain the sign-extended high-order partial product corresponding to the high-order target code according to the high-order target code and the multiplicand. Optionally, if the function selection mode signal received through the third mode selection signal input port 1123c indicates that the multiplier is to process an N-bit data operation, the bit width of the sign-extended high-order partial product obtained by the high-order partial product obtaining unit 1123 may be equal to 2N. For example, if the multiplier processes an N-bit data operation and the high-order partial product obtaining unit 1123 receives a multiplicand X with a bit width of N bits, the high-order partial product obtaining unit 1123 may directly obtain the corresponding 2N-bit sign-extended partial product according to the multiplicand X and the three values -1, 1, and 0 that the high-order target code may contain. The low N bit values of the sign-extended partial product may be equal to all values of the original partial product obtained directly from the high-order target code, and the high N bit values of the sign-extended partial product may be equal to the sign bit value of the original partial product, that is, the highest bit value of the original partial product. The original partial product may be -X when the value in the high-order target code is -1, X when the value in the high-order target code is 1, and 0 when the value in the high-order target code is 0.
It should be noted that the high-order partial product obtaining unit 1123 may receive, through the second gated value input port 1123b, the corresponding bit values in the sign-extended high-order partial product that the high-order selector group unit 1124 gates for data operations of different bit widths, and may combine the sign-extended partial product corresponding to the high-order target code currently obtained by the multiplier with the gated bit values to obtain the sign-extended high-order partial product.
Further, the high-order partial product obtaining unit 1123 may obtain the corresponding high-order partial products of the target code according to all of the sign-extended high-order partial products, and output the high-order partial products of the target code through the high-order partial product output port 1123e. Optionally, the high-order partial products of all target codes may be arranged as follows. The high-order partial product of the first target code, that is, the partial product of the target code corresponding to the lowest bit value in the high-order target code, may be located immediately after the low-order partial product of the last target code, and its bit width may be equal to the bit width of the low-order partial product of the last target code minus 1. In other words, the high-order partial product of the first target code may be equal to the first sign-extended high-order partial product, whose lowest bit value is in the same column as the second-lowest bit value of the low-order partial product of the last target code; this is equivalent to saying that the values of the first sign-extended high-order partial product that extend beyond the highest column of the low-order partial product of the last target code do not participate in subsequent operations. Starting from the high-order partial product of the second target code, the highest bit value of the high-order partial product of each target code is in the same column as the highest bit value of the high-order partial product of the first target code; the high-order partial product of each target code may be equal to the corresponding sign-extended high-order partial product, whose lowest bit value is in the same column as the second-lowest bit value of the high-order partial product of the previous target code. That is, the values of each sign-extended high-order partial product that extend beyond the highest column of the high-order partial product of the first target code do not participate in subsequent operations.
In the multiplier provided by this embodiment, the multiplier can obtain the high-order partial product of the sign bit after being expanded according to each bit value and the second data included in the high-order target code through the high-order partial product obtaining unit, obtain the high-order partial product of the target code according to the high-order partial product of the sign bit after being expanded, and perform accumulation processing on the high-order partial product and the low-order partial product of the target code through the correction compression circuit to obtain the target operation result, the number of effective partial products that can be obtained by the multiplier is small, so that the complexity of realizing multiplication by the multiplier is reduced, the operation efficiency of multiplication is improved, and the power consumption of the multiplier is effectively reduced; meanwhile, the multiplier can carry out multiplication operation on data with various bit widths, and the area of an AI chip occupied by the multiplier is effectively reduced.
In one embodiment, the multiplier includes a high-order selector group unit 1124, which includes high-order selectors 1124a; a plurality of the high-order selectors 1124a are used to gate values in the sign-extended high-order partial product.
Specifically, the number of high selectors 1124a in the high selector bank unit 1124 may be equal to 3N(N+1), where 2N may represent the bit width of the data currently processed by the multiplier, and the internal circuit structure of each high selector 1124a in the high selector bank unit 1124 may be the same. Optionally, during the multiplication operation, the modified regular signed number encoding unit 111 may be connected to (N+1) high-order partial product obtaining units 1123, and each high-order partial product obtaining unit 1123 may include 4N value generating subunits, of which 2N value generating subunits may be connected to 2N high selectors 1124a, with each value generating subunit connected to one high selector 1124a. Optionally, the 2N value generating subunits corresponding to the 2N high selectors 1124a may be the generating subunits corresponding to the lower 2N bit values in the high-order partial product of the target code, and each of the 2N high selectors 1124a has two other external input ports besides the function selection mode signal input port (mode). Optionally, if the multiplier can process N data operations with different bit widths, and the bit width of the data received by the multiplier is 2N, the signals received by the two other input ports of a high selector 1124a may be, respectively, the value 0 and the corresponding bit value in the sign-extended partial product that the high-order partial product obtaining unit 1123 obtains when the multiplier performs a 2N-bit data operation. The (N+1) high-order partial product obtaining units 1123 may be connected to (N+1) groups of 2N high selectors 1124a, and the corresponding bit values received by the 2N high selectors 1124a of each group may be the same or different.
In addition, in the 4N number of value generation subunits included in each high-order partial product obtaining unit 1123, the N number of high-order selectors 1124a may be connected to the corresponding N number of value generation subunits, each value generation subunit may be connected to 1 number of high-order selectors 1124a, the internal circuit structures of the N number of high-order selectors 1124a and the selector 113 may be the same, and an external input port of the N number of high-order selectors 1124a may have two other input ports in addition to the function selection mode signal input port (mode), and signals respectively received by the two other input ports may be subjected to 2N-bit data operation for the multiplier, so as to obtain a sign bit value in the partial product after the sign bit is extended, and the multiplier performs 2N-bit data operation, so as to obtain a corresponding bit value in the partial product after the sign bit is extended. The (N +1) upper partial product obtaining units 1123 may be connected to (N +1) groups of N upper selectors 1124a, sign bit values received by the N upper selectors 1124a of each group may be the same or different, but sign bit values received by the N upper selectors 1124a of the same group are the same, and the sign bit value may be obtained according to each group of N upper selectors 1124a, corresponding to the sign bit value in the partial product obtained by the sign bit expansion obtained by the connected upper partial product obtaining unit 1123. In addition, the corresponding bit value in the sign bit extended partial product received by the N upper selectors 1124a of each group may be determined by the sign bit value in the sign bit extended partial product obtained by the upper partial product obtaining unit 1123 connected to the group of upper selectors 1124a, and the corresponding bit value received by each of the N upper selectors 1124a of each group may be the same or different.
It should be noted that, of the 4N number of value generation sub-units included in each high-order partial product obtaining unit 1123, the remaining N number of value generation sub-units may not be connected to the high-order selector 1124a, and at this time, the value obtained by the N number of value generation sub-units may be a corresponding bit value in a partial product after corresponding sign bit expansion obtained from a value in a high-order target code, which is obtained by processing data with different bit widths currently by the multiplier, or it may be understood that the value obtained by the N number of value generation sub-units may be all values between (2N +1) th bit and 3N number of bit values, from the lowest bit (i.e., 1 st bit) to the highest bit, in the high-order partial product after corresponding sign bit expansion. The distribution rule of the positions of the 4N number of value generation sub-units in each high-order partial product obtaining unit 1123 may be shifted to the left by one number of value generation sub-unit based on the positions of the 4N number of value generation sub-units in the last high-order partial product obtaining unit 1123. Optionally, of the upper bit products of all target codes participating in the subsequent operation, only the bit width of the upper bit product of the first target code may be equal to 4N, the bit widths of the upper bit products of the remaining target codes are less than one bit based on the upper bit product of the previous target code, and the bit width of the upper bit product of the last target code may be equal to (2N-1).
In the multiplier provided by this embodiment, the high-order selector bank unit in the multiplier may gate the value in the high-order partial product to obtain the high-order partial product after sign bit extension, obtain the high-order partial product of the target code according to the high-order partial product after sign bit extension, and further perform accumulation processing on the high-order partial product and the low-order partial product of the target code through the correction compression circuit to obtain the target operation result, and the number of effective partial products that can be obtained by the multiplier is small, thereby reducing the complexity of the multiplier in realizing multiplication, improving the operation efficiency of multiplication, and effectively reducing the power consumption of the multiplier; meanwhile, the multiplier can carry out multiplication operation on data with various bit widths, and the area of an AI chip occupied by the multiplier is effectively reduced.
Fig. 3 is a schematic diagram of a specific structure of a multiplier according to another embodiment, where the multiplier includes the modified compression circuit 12, and the modified compression circuit 12 includes: a modified Wallace tree group circuit 121 and an accumulation circuit 122, wherein the output end of the modified Wallace tree group circuit 121 is connected with the input end of the accumulation circuit 122; the modified wallace tree group circuit 121 is configured to perform accumulation processing on each column number value in the partial product of all target codes obtained when data with different bit widths are calculated, so as to obtain an accumulation operation result, and the accumulation circuit 122 is configured to perform accumulation processing on the accumulation operation result.
Specifically, the modified wallace tree group circuit 121 may perform accumulation processing on each column number value in the partial product of the target code obtained by the modified regular signed number encoding circuit 11, and perform accumulation processing on two operation results obtained by the modified wallace tree group circuit 121 through the accumulation circuit 122 to obtain a target operation result of multiplication.
It should be noted that each of the partial products of all target codes may be equal to the sign-bit extended partial product, and may also be equal to the value of a partial bit in the sign-bit extended partial product, where the first partial product of a target code may be equal to the first corresponding partial product of the sign-bit extended partial product. Optionally, the lowest order value in the partial product of each target code may be located in the same column as the next-lowest order value in the partial product of the previous target code, which is equivalent to each order value in the partial product of each sign bit after being expanded, the lowest order value in the partial product of each target code is shifted to the left by one column on the basis of the column where each order value in the partial product of the previous sign bit after being expanded is located, and the highest order value in the partial product of each target code and the highest order value in the partial product of the first target code are located in the same column, wherein all values exceeding the column corresponding to the highest order value in the partial product of the first target code may not be accumulated. Alternatively, the column number of the partial products of all target codes may be equal to 2 times the bit width of the data currently processed by the multiplier.
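The column alignment described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the circuit itself: the function name `accumulate_partial_products`, the little-endian list of signed digits (+1, 0, -1) standing in for the target code, and the use of integer arithmetic in place of per-column hardware are all illustrative choices. Each partial product is shifted left by one column relative to the previous one, and any value beyond column 2W (W being the bit width currently processed) is dropped.

```python
def accumulate_partial_products(digits, multiplicand, width):
    """Sum shifted partial products, keeping only the lowest 2*width columns.

    digits: hypothetical little-endian target code, each digit in {-1, 0, 1}.
    multiplicand: Python int (may be negative).
    width: bit width W of the data currently processed.
    """
    columns = 2 * width                 # total number of columns kept
    mask = (1 << columns) - 1
    total = 0
    for i, digit in enumerate(digits):
        # Partial product i starts one column to the left of partial product i-1.
        partial = (digit * multiplicand) << i
        # Values beyond the highest kept column do not take part in the sum.
        total += partial & mask
    return total & mask
```

For example, with digits `[-1, 0, 0, 1]` (the value 7), multiplicand 5 and width 4, the function returns 35.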
Illustratively, if the two data received by the multiplier are 16 bits wide and the multiplier currently processes 8-bit data multiplication, the multiplier can process two groups of 8-bit by 8-bit data multiplication. The distribution of the low-order partial products of the 9 target codes and the high-order partial products of the 9 target codes obtained by the multiplier through the modified regular signed number encoding circuit 11 is shown in fig. 4, where the upper right corner shows the distribution of the low-order partial products of the 9 target codes and the lower left corner shows the distribution of the high-order partial products of the 9 target codes; "∘" represents each bit value in the low-order partial products of the target codes, a separate marker in the figure represents each bit value in the high-order partial products of the target codes, and "●" represents the sign-extended bit values of either the low-order or the high-order partial products of the target codes. If the multiplier currently processes 16-bit by 16-bit data multiplication, the distribution of the low-order partial products of the 9 target codes and the high-order partial products of the 9 target codes obtained by the multiplier through the modified regular signed number encoding circuit 11 is shown in fig. 5, where the markers have the same meaning as in fig. 4.
In the multiplier provided by this embodiment, the multiplier can accumulate the low-order part and the high-order part of the target code by modifying the wallace tree group circuit, and accumulate the accumulated result again by the accumulation circuit to obtain the target operation result of the multiplication operation, and the process can multiply data with various bit widths, thereby effectively reducing the area of the AI chip occupied by the multiplier; meanwhile, the number of effective partial products which can be obtained by the multiplier is small, so that the complexity of the multiplier for realizing multiplication is reduced, the operation efficiency of the multiplication is improved, and the power consumption of the multiplier is effectively reduced.
In one embodiment, continuing with the detailed structural diagram of the multiplier shown in fig. 3, the multiplier includes the modified Wallace tree group circuit 121, and the modified Wallace tree group circuit 121 includes: a low-order Wallace tree sub-circuit 1211, a selector 1212 and a high-order Wallace tree sub-circuit 1213, where an output terminal of the low-order Wallace tree sub-circuit 1211 is connected with an input terminal of the selector 1212, and an output terminal of the selector 1212 is connected with an input terminal of the high-order Wallace tree sub-circuit 1213; the low-order Wallace tree sub-circuits 1211 are configured to accumulate each column value of the partial products of the target codes, the selector 1212 is configured to gate the carry input signal received by the high-order Wallace tree sub-circuit 1213, and the high-order Wallace tree sub-circuits 1213 are configured to accumulate each column value of the partial products of the target codes.
Specifically, the circuit structure of each low-order Wallace tree sub-circuit 1211 can be implemented by a combination of full adders and half adders, or by a combination of 4-2 compressors, where a 4-2 compressor can be composed of a plurality of full adders; likewise, the circuit structure of each high-order Wallace tree sub-circuit 1213 can be implemented by a combination of full adders and half adders, or by a combination of 4-2 compressors. In addition, the low-order Wallace tree sub-circuit 1211 and the high-order Wallace tree sub-circuit 1213 may each be understood as a circuit that processes a multi-bit input signal and adds the input bits to obtain a two-bit output signal. Optionally, the number of high-order Wallace tree sub-circuits 1213 in the modified Wallace tree group circuit 121 may be equal to the data bit width N currently processed by the multiplier, and may also be equal to the number of low-order Wallace tree sub-circuits 1211; the low-order Wallace tree sub-circuits 1211 may be connected in series, and the high-order Wallace tree sub-circuits 1213 may be connected in series. Optionally, the output terminal of the last low-order Wallace tree sub-circuit 1211 is connected to the input terminal of the selector 1212, and the output terminal of the selector 1212 is connected to the input terminal of the first high-order Wallace tree sub-circuit 1213. Optionally, each low-order Wallace tree sub-circuit 1211 of the modified Wallace tree group circuit 121 may add each column value of the partial products of all target codes; each low-order Wallace tree sub-circuit 1211 may output two signals, a carry signal $Carry_i$ and a sum signal $Sum_i$, where i may represent the number corresponding to each low-order Wallace tree sub-circuit 1211, and the number of the first low-order Wallace tree sub-circuit 1211 is 0.
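As an illustration of how a 4-2 compressor may be composed of full adders, the following Python sketch cascades two full adders. The function names `full_adder` and `compressor_4_2` are hypothetical, and the two-adder arrangement is the common textbook structure, assumed here rather than taken from this description.

```python
def full_adder(a, b, c):
    """Return (sum, carry) for three input bits."""
    s = a ^ b ^ c
    carry = (a & b) | (a & c) | (b & c)
    return s, carry


def compressor_4_2(x1, x2, x3, x4, cin):
    """4-2 compressor modelled as two cascaded full adders.

    Returns (sum, carry, cout). The outputs carry and cout both have
    twice the weight of sum, so that
    x1 + x2 + x3 + x4 + cin == sum + 2 * (carry + cout).
    """
    s1, cout = full_adder(x1, x2, x3)    # first stage compresses three inputs
    s, carry = full_adder(s1, x4, cin)   # second stage adds the fourth input and cin
    return s, carry, cout
```

Because `cout` depends only on `x1`, `x2` and `x3`, it can feed the `cin` of the neighbouring column without creating a ripple path through the whole row.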
Alternatively, the number of input signals received by each low-order Wallace tree sub-circuit 1211 may be equal to the number of target codes, or the number of partial products of the target codes. The sum of the numbers of high-order Wallace tree sub-circuits 1213 and low-order Wallace tree sub-circuits 1211 in the modified Wallace tree group circuit 121 may be equal to 2N; the total number of columns from the lowest column to the highest column of the partial products of all target codes may be equal to 2N, the N low-order Wallace tree sub-circuits 1211 may accumulate the values of each column in the lower N columns of the partial products of all target codes, and the N high-order Wallace tree sub-circuits 1213 may accumulate the values of each column in the upper N columns of the partial products of all target codes.
Illustratively, if the bit width of the data received by the multiplier is N and the multiplier currently processes N-bit data multiplication, the selector 1212 in the multiplier may gate the carry output signal $Cout_{N-1}$ output by the last low-order Wallace tree sub-circuit 1211 in the modified Wallace tree group circuit 121 as the carry input signal $Cin_N$ received by the first high-order Wallace tree sub-circuit 1213 in the modified Wallace tree group circuit 121; it can also be understood that the multiplier currently operates on the received N-bit data as a whole. When the multiplier currently processes N/2-bit data multiplication, the selector 1212 in the multiplier may gate 0 as the carry input signal $Cin_N$ received by the first high-order Wallace tree sub-circuit 1213 in the modified Wallace tree group circuit 121; it can also be understood that the multiplier currently divides the received N-bit data into high N/2-bit and low N/2-bit data and multiplies them separately. The numbers i from the first low-order Wallace tree sub-circuit 1211 to the last low-order Wallace tree sub-circuit 1211 may be 0, 1, 2, …, N-1, respectively; the numbers i from the first high-order Wallace tree sub-circuit 1213 to the last high-order Wallace tree sub-circuit 1213 may be N, N+1, …, 2N-1, respectively.
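The effect of the selector on the carry chain can be sketched as follows. This is a simplified behavioural model under stated assumptions: the names `to_columns` and `column_accumulate` are hypothetical, each column is summed with ordinary integer arithmetic, and the several carry wires of a real Wallace tree sub-circuit are collapsed into one integer carry. When the full-width mode is off, the carry entering column `split` is replaced by 0, so the lower and upper halves are accumulated independently.

```python
def to_columns(values, n_columns):
    """Arrange the bits of several integers into columns (column 0 is the lowest)."""
    return [[(v >> i) & 1 for v in values] for i in range(n_columns)]


def column_accumulate(columns, split, full_width_mode):
    """Accumulate bit columns, optionally breaking the carry chain at `split`.

    columns: list of bit lists, one list per column.
    split: index of the first high-order column (N in the text).
    full_width_mode: True  -> the carry of column split-1 is passed on,
                     False -> the selector gates 0 instead.
    """
    result = 0
    carry = 0
    for i, bits in enumerate(columns):
        if i == split and not full_width_mode:
            carry = 0                      # selector gates 0 as the carry input
        total = sum(bits) + carry
        result |= (total & 1) << i         # sum bit of this column
        carry = total >> 1                 # carry passed to the next column
    return result
```

With the values 0x3F and 0x21 in 8 columns and `split=4`, the full-width mode gives 0x60, while the split mode gives 0x50 because the carry out of the low nibble is discarded.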
It should be noted that each low-order Wallace tree sub-circuit 1211 and each high-order Wallace tree sub-circuit 1213 in the modified Wallace tree group circuit 121 may receive a carry input signal $Cin_i$ and a partial product value input signal, and output a carry output signal $Cout_i$. Optionally, the partial product value input signal received by each low-order Wallace tree sub-circuit 1211 and each high-order Wallace tree sub-circuit 1213 may be the values of the corresponding column in the partial products of all target codes; the number of carry output signals $Cout_i$ output by each low-order Wallace tree sub-circuit 1211 and each high-order Wallace tree sub-circuit 1213 may be equal to $N_{Cout} = \mathrm{floor}((N_I + N_{Cin})/2) - 1$, where $N_I$ may represent the number of partial product value input signals of the Wallace tree sub-circuit, $N_{Cin}$ may represent the number of carry input signals of the Wallace tree sub-circuit, $N_{Cout}$ may represent the minimum number of carry output signals of the Wallace tree sub-circuit, and floor(·) may represent the round-down function. Optionally, the carry input signal received by each low-order Wallace tree sub-circuit 1211 and each high-order Wallace tree sub-circuit 1213 in the modified Wallace tree group circuit 121 may be the carry output signal output by the previous low-order Wallace tree sub-circuit 1211 or high-order Wallace tree sub-circuit 1213, and the carry input signal received by the first low-order Wallace tree sub-circuit 1211 is the value 0. The carry input signal received by the first high-order Wallace tree sub-circuit 1213 may be determined by the bit width of the data currently processed by the multiplier and the bit width of the data received by the multiplier.
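The carry-count relation above can be evaluated with a short Python sketch. The function names are hypothetical, and two assumptions are made for illustration only: the first sub-circuit has no carry input signals, and every later sub-circuit has as many carry inputs as the previous one has carry outputs. The value of 9 inputs per column is borrowed from the 9-target-code example in the text.

```python
def min_carry_outputs(n_inputs, n_carry_in):
    """N_Cout = floor((N_I + N_Cin) / 2) - 1."""
    return (n_inputs + n_carry_in) // 2 - 1


def carry_counts_along_chain(n_inputs, n_subcircuits):
    """Carry-output counts of consecutive sub-circuits connected in series.

    Assumes (for illustration) that the first sub-circuit has zero carry
    inputs and that each carry output feeds the next sub-circuit.
    """
    counts = []
    n_carry_in = 0
    for _ in range(n_subcircuits):
        n_carry_out = min_carry_outputs(n_inputs, n_carry_in)
        counts.append(n_carry_out)
        n_carry_in = n_carry_out
    return counts
```

Under these assumptions, `carry_counts_along_chain(9, 4)` evaluates to `[3, 5, 6, 6]`, so the count settles after a few columns.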
In the multiplier provided by the embodiment, the multiplier can accumulate partial products of target codes by correcting the Wallace tree group circuit to obtain two paths of output signals, and accumulate the two paths of output signals again by the accumulation circuit to obtain a multiplication result, and the process can multiply data with various bit widths, so that the area of an AI chip occupied by the multiplier is effectively reduced; meanwhile, the multiplier can also accumulate fewer effective partial products so as to reduce the complexity of multiplication operation.
Another embodiment provides a multiplier, where the multiplier includes the accumulation circuit 122, and the accumulation circuit 122 includes an adder 1221, where the adder 1221 is configured to perform an addition operation on the accumulation operation result.
Specifically, the adder 1221 may be a carry adder with different bit widths. Optionally, the adder 1221 may receive the two paths of signals output by the modified wallace tree group circuit 121, perform addition operation on the two paths of output signals, and output a target operation result of the multiplication operation. Alternatively, the adder 1221 may be a carry look ahead adder.
In the multiplier provided by the embodiment, the multiplier can perform accumulation processing on two paths of signals output by the modified wallace tree group circuit through the accumulation circuit, and output a target operation result of multiplication; the process can carry out multiplication operation on data with various bit widths, and effectively reduces the area of an AI chip occupied by the multiplier.
In one embodiment, the adder 1221 includes: a carry signal input port 1221a, a sum bit signal input port 1221b, and an operation result output port 1221c; the carry signal input port 1221a is configured to receive a carry signal, the sum bit signal input port 1221b is configured to receive a sum bit signal, and the operation result output port 1221c is configured to output the target operation result obtained by performing accumulation processing on the carry signal and the sum bit signal.
Specifically, the adder 1221 may receive the carry signal Carry output by the modified Wallace tree group circuit 121 through the carry signal input port 1221a, receive the sum bit signal Sum output by the modified Wallace tree group circuit 121 through the sum bit signal input port 1221b, add the carry signal Carry and the sum bit signal Sum, and output the result through the operation result output port 1221c.
It should be noted that, during multiplication, the multiplier may adopt adders 1221 with different bit widths to add the carry output signal Carry and the sum bit output signal Sum output by the modified Wallace tree group circuit 121, where the bit width of the data that the adder 1221 can process may be equal to 2 times the bit width N of the data currently processed by the multiplier. Optionally, each low-order Wallace tree sub-circuit 1211 and each high-order Wallace tree sub-circuit 1213 in the modified Wallace tree group circuit 121 may output a carry output signal $Carry_i$ and a sum bit output signal $Sum_i$ (i = 1, …, 2N, where i is the number of each low-order or high-order Wallace tree sub-circuit, starting from 1). Optionally, the carry output signal received by the adder 1221 may be $Carry = \{[Carry_1 : Carry_{2N-1}], 0\}$, that is, the bit width of the carry output signal Carry received by the adder 1221 is 2N, the first 2N-1 bit values of Carry correspond to the carry output signals of the first 2N-1 low-order and high-order Wallace tree sub-circuits in the modified Wallace tree group circuit 121, and the last bit value of Carry may be replaced by the value 0. Optionally, the bit width of the sum bit output signal Sum received by the adder 1221 may be 2N, and the value of Sum may be equal to the sum bit output signals of each low-order or high-order Wallace tree sub-circuit in the modified Wallace tree group circuit 121.
Illustratively, if the multiplier is currently processing 8-bit by 8-bit fixed point multiplication, the adder 1221 may be a 16-bit Carry adder, as shown in fig. 6, the modified wallace tree group circuit 121 may output a Sum output signal Sum and a Carry output signal Carry of 16 lower and upper wallace tree sub-circuits, however, the Sum output signal received by the 16-bit Carry adder may be a complete Sum signal Sum output by the modified wallace tree group circuit 121, and the received Carry output signal may be a Carry output signal Carry of the modified wallace tree group circuit 121 excluding all Carry output signals output by the last upper wallace tree sub-circuit 1213, combined with a value 0.
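The way the adder combines the two signals can be sketched in Python. This is an illustrative model, not the adder circuit: the function name `final_add` and the little-endian bit lists are assumptions, with list index 0 standing for sub-circuit number 1. The carry of each column is given the weight of the next column, the carry of the last sub-circuit is dropped, and a 0 fills the lowest position, which matches the composition of Carry described above.

```python
def final_add(sum_bits, carry_bits):
    """Add the Sum word and the re-aligned Carry word modulo 2**width.

    sum_bits, carry_bits: little-endian bit lists of equal length 2N,
    one entry per Wallace tree sub-circuit (hypothetical representation).
    """
    width = len(sum_bits)
    sum_word = sum(bit << i for i, bit in enumerate(sum_bits))
    # Carry of column i belongs to column i+1; the last carry is excluded
    # and the lowest position of the carry word is the value 0.
    carry_word = sum(bit << (i + 1) for i, bit in enumerate(carry_bits[:-1]))
    return (sum_word + carry_word) & ((1 << width) - 1)
```

If the sum and carry bits come from a bitwise 3:2 compression of three words a, b and c, the function returns (a + b + c) modulo 2 to the power of the width.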
In the multiplier provided by the embodiment, the two paths of signals output by the modified wallace tree group circuit can be accumulated by the multiplier through the accumulation circuit, and the target operation result of the multiplication operation is output.
Fig. 7 is a schematic diagram of a specific structure of a multiplier according to another embodiment, where the multiplier includes the determining circuit 21, and the determining circuit 21 includes: a first data input port 211 and a first data output port 212; the first data input port 211 is configured to receive data to be subjected to multiplication, and the first data output port 212 is configured to output the received data.
Specifically, the judgment circuit 21 receives two data to be multiplied through the first data input port 211. Optionally, the data received by the determining circuit 21 may be a multiplier and a multiplicand in a multiplication operation, and bit widths of the multiplier and the multiplicand may be the same. Alternatively, the judgment circuit 21 may output the received two data through the first data output port 212 and input the two data to the data expansion circuit 22 at the same time, or input the two data to the regular signed number encoding circuit 23 at the same time.
It should be noted that, if the determining circuit 21 determines that the bit width of the two received data is N and is smaller than the bit width 2N of the data that can be processed by the multiplier, at this time, the determining circuit 21 needs to input the two received data with the bit width of N bits to the data expanding circuit 22 for expansion processing, so as to obtain two data with the bit width of 2N bits; if the judging circuit 21 judges that the bit width of the two received data is 2N and is equal to the bit width 2N of the data that can be processed by the multiplier, at this time, the judging circuit 21 may directly input the two received data with the bit width of 2N bits to the regular signed number encoding circuit 23 for the regular signed number encoding processing.
In the multiplier provided by this embodiment, the multiplier determines, by using the determining circuit, whether the received data needs to be processed by the next data expansion circuit, if the data expansion circuit does not need to be processed, the determining circuit directly inputs the received data to the regular signed number encoding circuit to perform the regular signed number encoding process to obtain the partial product of the target code, otherwise, the received data is input to the data expansion circuit to perform the expansion process, the expanded data is input to the regular signed number encoding circuit to perform the regular signed number encoding process to obtain the partial product of the target code, and the partial product of the target code is accumulated by the compression circuit to obtain the target operation result of the multiplication operation, in this process, the expansion process can be performed on the received low-bit-width data, and the expanded data meets the bit-width requirement that the multiplier can process, the target operation result is still the result of multiplication operation of the original bit width data, so that the operation of the low bit width data can be processed by the multiplier, and the area of an AI chip occupied by the multiplier is effectively reduced; meanwhile, the multiplier can adopt a regular signed number coding circuit to carry out regular signed number coding processing on received data, and the number of effective partial products obtained in the multiplication process is reduced, so that the complexity of the multiplier for realizing multiplication is reduced, the operation efficiency of the multiplication is improved, and the power consumption of the multiplier is effectively reduced.
Fig. 7 is a schematic diagram of a specific structure of a multiplier according to another embodiment, where the multiplier includes the data expansion circuit 22, and the data expansion circuit 22 includes: a second data input port 221, an extended mode selection signal input port 222, a function selection mode signal output port 223, and a second data output port 224; the second data input port 221 is configured to receive the data output by the determining circuit, the extension mode selection signal input port 222 is configured to receive a data extension mode selection signal corresponding to extension processing performed on the received data, the function selection mode signal output port 223 is configured to output the function selection mode signal determined according to a mode in which the data extension circuit performs extension processing on the received data, and the second data output port 224 is configured to output data after extension processing.
Specifically, there may be three data expansion mode selection signals received by the expansion mode selection signal input port 222, which are respectively denoted as 00, 01, and 10, where the signal 00 denotes that the data expansion circuit 22 can expand the received N-bit data into 2N-bit data, the upper N-bit value in the 2N-bit data may be equal to the value of the received N-bit data, and the lower N-bit value may be equal to the expanded N-bit value 0, at this time, the function selection mode signal output port 223 may output the function selection mode signal 00, and in the target operation result with a 4N-bit width obtained by the multiplier, the upper 2N-bit value may be the target operation result of the multiplication operation; signal 01 indicates that the data expansion circuit 22 may expand the received N-bit data into 2N-bit data, the lower N-bit value of the 2N-bit data may be equal to the value of the received N-bit data, and the upper N-bit values may be equal to the expanded N-bit value 0, at this time, the function selection mode signal output port 223 may output a function selection mode signal 01, and in the target operation result with a 4N-bit width obtained by the multiplier, the lower 2N-bit value may be the target operation result of the multiplication operation; the signal 10 indicates that the data expansion circuit 22 may expand the received N-bit data into 2N-bit data, the lower N-bit value of the 2N-bit data may be equal to the value of the received N-bit data, and the upper N-bit value may be equal to the sign bit value of the data received by the data expansion circuit 22, at this time, the function selection mode signal output port 223 may output the function selection mode signal 10, and the lower 2N-bit value of the target operation result of 4N-bit width obtained by the multiplier may be the target operation result of the multiplication operation.
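The three expansion modes and the position of the result can be sketched in Python. The names `expand`, `extract_result`, `to_signed` and `multiply_low_width` are hypothetical, and the sketch assumes that the 2N-bit operands are multiplied as two's complement signed numbers; the text does not state this, so it is an illustrative assumption only.

```python
def expand(data, n, mode):
    """Expand an N-bit pattern to 2N bits according to the mode signal."""
    data &= (1 << n) - 1
    if mode == "00":                      # data in the upper N bits, lower N bits are 0
        return data << n
    if mode == "01":                      # data in the lower N bits, upper N bits are 0
        return data
    if mode == "10":                      # data in the lower N bits, upper N bits copy the sign bit
        sign = (data >> (n - 1)) & 1
        return data | ((((1 << n) - 1) << n) if sign else 0)
    raise ValueError("unknown expansion mode: " + mode)


def extract_result(product_4n, n, mode):
    """Pick the 2N bits of the 4N-bit product that hold the target result."""
    mask = (1 << (2 * n)) - 1
    if mode == "00":
        return (product_4n >> (2 * n)) & mask   # upper 2N bits
    return product_4n & mask                    # lower 2N bits for modes 01 and 10


def to_signed(pattern, bits):
    """Interpret a bit pattern as a two's complement number (assumption)."""
    return pattern - (1 << bits) if pattern >> (bits - 1) else pattern


def multiply_low_width(a, b, n, mode):
    """Multiply two N-bit patterns on a 2N-bit multiplier model."""
    x = to_signed(expand(a, n, mode), 2 * n)
    y = to_signed(expand(b, n, mode), 2 * n)
    product_4n = (x * y) & ((1 << (4 * n)) - 1)
    return extract_result(product_4n, n, mode)
```

With N = 8, a = 0xFD (that is, -3 as a signed byte) and b = 0x05, modes 00 and 10 both yield 0xFFF1 (-15 in 16 bits), while mode 01 treats the operands as unsigned and yields 0x04F1.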
It should be noted that, if the bit width of the two data received by the multiplier is 2N and is equal to the bit width 2N of the data that can be processed by the multiplier, the determining circuit 21 may directly input the two received data into the regular signed number encoding circuit 23 for regular signed number encoding processing; if the bit width of the two data received by the multiplier is N, which is smaller than the bit width 2N of the data that can be processed by the multiplier, and the data expansion mode selection signal received by the data expansion circuit 22 is 10, the judgment circuit 21 may input the two received data to the data expansion circuit 22 for expansion processing, and input the expanded data to the regular signed number encoding circuit 23 for regular signed number encoding processing.
In the multiplier provided by this embodiment, the multiplier may perform expansion processing on received data through the data expansion circuit, and input the expanded data into the regular signed number encoding circuit, perform regular signed number encoding processing to obtain a partial product of a target code, and perform accumulation processing on the partial product of the target code through the compression circuit to obtain a target operation result of multiplication operation, where the process may perform expansion processing on received low-bit-width data, and the expanded data meets a bit-width requirement of the data that can be processed by the multiplier, so that the target operation result is still a result of multiplication operation performed on the original bit-width data, thereby ensuring that the multiplier can process operation on the low-bit-width data and effectively reducing an area of an AI chip occupied by the multiplier; meanwhile, the multiplier can adopt the correction regular signed number coding circuit to carry out regular signed number coding processing on the received data, and the number of effective partial products obtained in the multiplication process is reduced, so that the complexity of the multiplier for realizing multiplication is reduced, the operation efficiency of the multiplication is improved, and the power consumption of the multiplier is effectively reduced.
Fig. 7 is a schematic structural diagram of a multiplier provided in another embodiment, where the multiplier includes the regular signed number encoding circuit 23, and the regular signed number encoding circuit 23 includes: a regular signed number coding sub-circuit 231 and a partial product obtaining sub-circuit 232, wherein the output end of the regular signed number coding sub-circuit 231 is connected with the first input end of the partial product obtaining sub-circuit 232. The regular signed number coding sub-circuit 231 is configured to perform regular signed number coding processing on the received data to obtain a target code, and the partial product obtaining sub-circuit 232 is configured to obtain a partial product of the target code according to the target code.
Specifically, the data received by the regular signed number coding sub-circuit 231 may be input by the judgment circuit 21, or may be input by the data expansion circuit 22, and the received data may be a multiplier in multiplication operation, and the multiplier is subjected to regular signed number coding processing to obtain the target code.
It should be noted that the regular signed number encoding process can be characterized as follows. For an N-bit multiplier, processing proceeds from the low-order bits to the high-order bits. If there are l (l >= 2) consecutive bit values 1, these l consecutive bit values 1 can be converted into the data "1 (0)^{l-1} (-1)", and the remaining (N-l) bit values are combined with the converted (l+1) bit values to obtain new data. The new data then serves as the initial data of the next stage of conversion, until the new data obtained after conversion no longer contains l (l >= 2) consecutive bit values 1. After the N-bit multiplier is subjected to regular signed number encoding, the bit width of the obtained target code can be equal to (N+1). Further, in the regular signed number encoding process, the data 11 can be converted into (100-001), that is, the data 11 is equivalent to 10(-1); the data 111 can be converted into (1000-0001), that is, the data 111 is equivalent to 100(-1); and the conversion of any other run of l (l >= 2) consecutive bit values 1 is similar.
For example, suppose the multiplier received by the regular signed number encoding sub-circuit 231 is "001010101101110". The first-stage conversion of the multiplier yields the first new data "0010101011100(-1)0"; the second-stage conversion of the first new data yields the second new data "0010101100(-1)00(-1)0"; the third-stage conversion of the second new data yields the third new data "0010110(-1)00(-1)00(-1)0"; the fourth-stage conversion of the third new data yields the fourth new data "00110(-1)0(-1)00(-1)00(-1)0"; and the fifth-stage conversion of the fourth new data yields the fifth new data "010(-1)0(-1)0(-1)00(-1)00(-1)0". Since the fifth new data contains no l (l >= 2) consecutive bit values 1, it may be called the initial code; after one bit-complementing step is applied to the initial code, the regular signed number encoding is complete and the intermediate code is obtained. The bit width of the initial code may be equal to the bit width of the multiplier. Optionally, after the regular signed number coding sub-circuit 231 performs regular signed number encoding on the multiplier and obtains the new data (i.e. the initial code), if the highest bit value and the second highest bit value of the new data are "10" or "01", the regular signed number coding sub-circuit 231 may append a bit value 0 above the most significant bit of the new data, so that the three highest bit values of the corresponding intermediate code are "010" or "001", respectively. Optionally, the bit width of the intermediate code may be equal to the bit width of the data currently processed by the multiplier plus 1.
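The staged conversion described above can be sketched in software as follows. This is only an illustrative model, not the circuit itself: the function names csd_encode, render and digits_value are hypothetical, the multiplier is treated as an unsigned bit string, and the returned digit list corresponds to the intermediate code (the initial code with one bit value 0 appended above its most significant bit).

```python
def csd_encode(bits: str) -> list[int]:
    """Convert a binary string (MSB first) into signed digits (MSB first).

    Every run of two or more consecutive 1s, scanned from the low-order end,
    is replaced by 1 0...0 (-1). The result has len(bits) + 1 digits.
    """
    # Work LSB first and reserve one extra high-order position (value 0).
    digits = [int(b) for b in reversed(bits)] + [0]
    i = 0
    while i < len(digits) - 1:
        if digits[i] == 1 and digits[i + 1] == 1:
            # Locate the end of the run of 1s that starts at position i.
            j = i
            while j < len(digits) and digits[j] == 1:
                j += 1
            # 2^i + ... + 2^(j-1) equals 2^j - 2^i.
            digits[i] = -1
            for k in range(i + 1, j):
                digits[k] = 0
            digits[j] = 1
            i = j  # the new 1 at position j may start the next run
        else:
            i += 1
    return digits[::-1]


def render(digits: list[int]) -> str:
    """Print digits in the notation used in the text, e.g. 10(-1)."""
    return "".join("(-1)" if d == -1 else str(d) for d in digits)


def digits_value(digits: list[int]) -> int:
    """Numeric value of an MSB-first signed-digit list."""
    value = 0
    for d in digits:
        value = value * 2 + d
    return value


code = csd_encode("001010101101110")
print(render(code))  # 0010(-1)0(-1)0(-1)00(-1)00(-1)0
```

The printed string is the fifth new data of the example with one leading 0, and digits_value(code) equals the value of the original binary multiplier, so the conversion changes the representation but not the number.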
In addition, if the bit width of the data received by the multiplier is 2N and the data operation can be currently performed on N-bit data, the regular signed number coding sub-circuit 231 in the multiplier can divide the 2N-bit data into two groups of N-bit data for data operation, and at this time, the two groups of (N +1) -bit intermediate codes obtained are combined to be used as target codes; if the multiplier can currently process 2N-bit data operation, the regular signed number encoding sub-circuit 231 in the multiplier can complement a bit value 0 (i.e. complement processing) at a position higher than the highest bit value of the obtained (2N +1) -bit intermediate code, and then take the (2N +2) -bit data after complement processing as the target code.
Optionally, the regular signed number encoding sub-circuit 231 includes: a third data input port 2311 and a coding output port 2312, where the third data input port 2311 is configured to receive first data subjected to regular signed number coding processing, and the coding output port 2312 is configured to output the target code obtained after the received first data is subjected to regular signed number coding processing.
It is understood that if the third data input port 2311 receives the first data, which may be a multiplier in a multiplication operation, the regular signed number coding sub-circuit 231 may perform regular signed number coding processing on the first data to obtain the target code, and output the target code through the coding output port 2312. Optionally, the regular signed number encoding sub-circuit 231 may receive a multiplier in the multiplication operation through the third data input port 2311, and the regular signed number encoding sub-circuit 231 may perform the regular signed number encoding process on the multiplier.
For example, if the multiplier receives 2N bits of data and can currently process N bits of data operation, at this time, the number of target codes obtained by the regular signed number coding sub-circuit 231 may be equal to (N +1), which is equivalent to performing the regular signed number coding process on the data, and the obtained (N +1) bits of intermediate codes may be directly used as the target codes; if the multiplier can currently process 2N-bit data operation, at this time, the number of target codes obtained by the regular signed number coding sub-circuit 231 may be equal to (2N +2), that is, the data is subjected to regular signed number coding processing, and the obtained (2N +1) -bit intermediate code needs to be further subjected to complement processing to obtain a (2N +2) -bit target code, where the complement processing may be characterized by complementing a bit value 0 at a position higher than a most significant bit value of the intermediate code.
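As a hedged illustration of how the target code may be formed in the two modes, the sketch below builds a (2N+2)-digit target code either from one (2N+1)-digit intermediate code plus a complemented 0, or from two (N+1)-digit intermediate codes placed side by side. The names csd_digits, target_code and value_of are hypothetical, digits are stored LSB first for convenience, and the data is assumed to be an unsigned bit pattern.

```python
def csd_digits(value: int, width: int) -> list[int]:
    """Intermediate code of an unsigned `width`-bit value: width + 1 digits, LSB first."""
    digits = [(value >> i) & 1 for i in range(width)] + [0]
    i = 0
    while i < width:
        if digits[i] == 1 and digits[i + 1] == 1:
            j = i
            while digits[j] == 1:  # the spare top digit guarantees termination
                j += 1
            digits[i] = -1
            digits[i + 1:j] = [0] * (j - i - 1)
            digits[j] = 1
            i = j
        else:
            i += 1
    return digits


def target_code(value: int, n: int, full_width: bool) -> list[int]:
    """Return the (2N+2)-digit target code (LSB first) for a 2N-bit value."""
    if full_width:
        # 2N-bit mode: (2N+1)-digit intermediate code, then complement one 0 on top.
        return csd_digits(value, 2 * n) + [0]
    # N-bit mode: encode the low and high N bits independently and combine them.
    low = value & ((1 << n) - 1)
    high = value >> n
    return csd_digits(low, n) + csd_digits(high, n)


def value_of(digits: list[int]) -> int:
    """Numeric value of an LSB-first signed-digit list."""
    return sum(d << i for i, d in enumerate(digits))
```

With N = 4 and the data 0x76, target_code(0x76, 4, True) has ten digits whose value is 0x76, while target_code(0x76, 4, False) has ten digits whose low five digits encode 6 and whose high five digits encode 7.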
In the multiplier provided by the embodiment, the multiplier can perform regular signed number encoding processing on received data through a regular signed number encoding circuit to obtain a partial product of a target code, and performs accumulation processing on the partial product of the target code through a compression circuit to obtain a target operation result of multiplication operation, the process can perform expansion processing on received low-bit-width data, and the expanded data meets the bit width requirement of the data which can be processed by the multiplier, so that the target operation result is still the result of multiplication operation performed on the original bit-width data, thereby ensuring that the multiplier can process the operation of the low-bit-width data and effectively reducing the area of an AI chip occupied by the multiplier; meanwhile, the multiplier can adopt a regular signed number coding sub-circuit to carry out regular signed number coding processing on the received data to obtain the target code, so that the number of effective partial products of the target code obtained by the partial product obtaining sub-circuit according to the target code is less, the complexity of realizing multiplication operation by the multiplier is reduced, the operation efficiency of the multiplication operation is improved, and the power consumption of the multiplier is effectively reduced.
Another embodiment provides a multiplier, wherein the multiplier comprises the partial product obtaining sub-circuit 232, and the partial product obtaining sub-circuit 232 comprises: a low bit partial product obtaining unit 2321, a low bit selector bank unit 2322, a high bit partial product obtaining unit 2323 and a high bit selector bank unit 2324; a first output terminal of the regular signed number coding sub-circuit 231 is connected to a first input terminal of the low bit product obtaining unit 2321, an output terminal of the low bit selector bank unit 2322 is connected to a second input terminal of the low bit product obtaining unit 2321, a second output terminal of the regular signed number coding sub-circuit 231 is connected to a first input terminal of the high bit product obtaining unit 2323, and an output terminal of the high bit selector bank unit 2324 is connected to a second input terminal of the high bit product obtaining unit 2323.
The lower bit partial product obtaining unit 2321 is configured to obtain a sign bit extended lower bit partial product according to a received lower bit target code in the target code and second data, and obtain a target code lower bit partial product according to the sign bit extended lower bit partial product, the lower bit selector group unit 2322 is configured to gate a value in the sign bit extended lower bit partial product, the upper bit partial product obtaining unit 2323 is configured to obtain a sign bit extended upper bit partial product according to a received higher bit target code in the target code and the second data, and obtain a target code upper bit partial product according to the sign bit extended upper bit partial product, and the upper bit selector group unit 2324 is configured to gate a value in the sign bit extended upper bit partial product.
Specifically, the low-order partial product obtaining unit 2321 and the high-order partial product obtaining unit 2323 may each obtain partial products of the target code according to the target code obtained by the regular signed number coding sub-circuit 231 and the received second data, where the second data may be the multiplicand in the multiplication operation. Optionally, if the bit width of the data received by the regular signed number coding sub-circuit 231 is 2N and the bit width of the data that the multiplier can currently process is N, the regular signed number coding sub-circuit 231 may automatically split the received 2N-bit data into high N-bit data and low N-bit data and perform regular signed number encoding on each of them, so that the number of high-order target codes obtained is equal to (N+1) and the number of low-order target codes obtained is also equal to (N+1); correspondingly, the number of high-order partial products of the target code obtained from the high-order target code may be equal to (N+1), and the number of low-order partial products of the target code obtained from the low-order target code may be equal to (N+1). If the bit width of the data received by the regular signed number coding sub-circuit 231 is 2N and the bit width of the data that the multiplier can currently process is also 2N, the regular signed number coding sub-circuit 231 may perform regular signed number encoding on the received 2N-bit data to obtain a (2N+1)-bit intermediate code, perform bit-complementing on the intermediate code, and take the resulting (2N+2)-bit code as the target code, where the bit-complementing may be characterized as appending a value 0 above the most significant bit of the data. That is, the most significant bit value of the target code is 0, and all values contained in the partial product of the target code corresponding to this 0 are 0. The high (N+1) bit values of the (2N+2)-bit target code may be called the high-order target code, and the low (N+1) bit values may be called the low-order target code.
It should be noted that the low-level selector set unit 2322 may gate, according to the received function selection mode signal, the low-level partial bit value in the low-level partial product after sign bit extension, which is the value in the partial product after sign bit extension obtained by N-bit multiplication or the value in the partial product after sign bit extension obtained by 2N-bit multiplication; similarly, the upper selector set unit 2324 may gate the partial bit value in the upper partial product after sign bit extension according to the received function selection mode signal, and the partial bit value is the value in the partial product after sign bit extension obtained by N-bit multiplication or the value in the partial product after sign bit extension obtained by 2N-bit multiplication.
It can be understood that, if the bit width of the data received by the multiplier is 2N and the multiplier can currently process N-bit multiplication, the low-order partial product obtaining unit 2321 in the multiplier may obtain, according to each bit value in the low-order target code, the sign-extended partial product corresponding to the low N-bit data; the low-order selector group unit 2322 may gate the values in the sign-extended low-order partial product; and the sign-extended partial product is then combined with the gated values to obtain the sign-extended low-order partial product. Optionally, the high-order partial product obtaining unit 2323 in the multiplier may obtain, according to each bit value in the high-order target code, the sign-extended partial product corresponding to the high N-bit data; the high-order selector group unit 2324 may gate the values in the sign-extended high-order partial product; and the sign-extended partial product is then combined with the gated values to obtain the sign-extended high-order partial product. Optionally, in the regular signed number encoding process, the number of low-order target codes obtained may be equal to the number of high-order target codes obtained, and may also be equal to the number of sign-extended low-order partial products corresponding to the low N-bit data or the number of sign-extended high-order partial products corresponding to the high N-bit data. Optionally, the regular signed number encoding circuit 23 may include (N+1) low-order partial product obtaining units 2321, and may further include (N+1) high-order partial product obtaining units 2323.
Optionally, each of the low-order partial product obtaining unit 2321 and each of the high-order partial product obtaining units 2323 may include 2N number of value generating sub-units, and each of the value generating sub-units may obtain a bit value in the partial product after sign bit extension. Meanwhile, the low-order partial product obtaining unit 2321 may determine the low-order partial product of the corresponding target code according to the obtained low-order partial product after sign bit extension; the high-order partial product obtaining unit 2323 may determine the high-order partial product of the corresponding target code according to the obtained high-order partial product after sign bit extension.
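The relationship between the target code and its partial products can be sketched as follows. This is a behavioral illustration under stated assumptions rather than the circuit of the embodiment: each digit of the target code selects the negated, zero, or unchanged multiplicand, the shift by the digit position is folded into the partial product, and sign extension is modelled by truncating to a two's-complement word of a chosen width. The names partial_products and code_118 are hypothetical, and the digit list is the hand-derived code of 118 = 128 - 8 - 2, LSB first.

```python
def partial_products(code_lsb_first: list[int], multiplicand: int, width: int) -> list[int]:
    """One sign-extended partial product per target-code digit, as `width`-bit words."""
    mask = (1 << width) - 1
    products = []
    for i, digit in enumerate(code_lsb_first):
        # digit is -1, 0 or 1; masking a negative int yields its two's complement.
        products.append((digit * (multiplicand << i)) & mask)
    return products


# Target code of the multiplier 118 (binary 01110110), LSB first, 2N + 2 = 10 digits.
code_118 = [0, -1, 0, -1, 0, 0, 0, 1, 0, 0]
pps = partial_products(code_118, multiplicand=13, width=16)

effective = sum(1 for p in pps if p != 0)  # non-zero (effective) partial products
product = sum(pps) & 0xFFFF                # accumulate modulo 2^16
print(effective, product)  # 3 1534
```

The plain binary form of 118 contains five 1s and would give five non-zero partial products, whereas the signed-digit form gives three, which is the reduction in effective partial products that the embodiment refers to.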
In the present embodiment, the internal circuit structure and the output port function of the low-order partial product obtaining unit 2321 are the same as those of the low-order partial product obtaining unit 1121, and the specific structure of the low-order partial product obtaining unit 2321 will not be described here. Optionally, the internal circuit structure and the output port function of the low level selector set unit 2322 are the same as those of the low level selector set unit 1122, and the specific structure of the low level selector set unit 2322 is not described again in this embodiment. Optionally, the internal circuit structure and the output port function of the high-order partial product obtaining unit 2323 are the same as those of the high-order partial product obtaining unit 1123, and a detailed structure of the high-order partial product obtaining unit 2323 will not be described in this embodiment. Optionally, the internal circuit structure and the output port function of the high level selector bank unit 2324 are the same as those of the high level selector bank unit 1124, and the specific structure of the high level selector bank unit 2324 will not be described again in this embodiment.
In the multiplier provided by this embodiment, the multiplier can obtain the partial product of the corresponding target code according to each value in the target code through the partial product obtaining sub-circuit, and can perform accumulation processing on the partial products of all the target codes through the compression circuit to obtain the target operation result of the multiplication operation, the multiplier can perform expansion processing on the received low-bit-width data, and the expanded data meets the bit width requirement of the data that can be processed by the multiplier, so that the target operation result is still the result of the multiplication operation performed on the data with the original bit width, thereby ensuring that the multiplier can process the operation on the low-bit-width data, and effectively reducing the area of the AI chip occupied by the multiplier; meanwhile, the multiplier can adopt a regular signed number coding sub-circuit to carry out regular signed number coding processing on the received data to obtain the target code, so that the number of effective partial products of the target code obtained by the partial product obtaining sub-circuit according to the target code is less, the complexity of realizing multiplication operation by the multiplier is reduced, the operation efficiency of the multiplication operation is improved, and the power consumption of the multiplier is effectively reduced.
Fig. 7 is a schematic diagram of a specific structure of a multiplier according to another embodiment, where the multiplier includes the compression circuit 24, and the compression circuit 24 includes: a Wallace tree group sub-circuit 241 and an accumulation sub-circuit 242, wherein the output terminal of the Wallace tree group sub-circuit 241 is connected with the input terminal of the accumulation sub-circuit 242. The Wallace tree group sub-circuit 241 is configured to perform accumulation processing on the partial products of the target code to obtain an accumulation operation result, and the accumulation sub-circuit 242 is configured to perform accumulation processing on the accumulation operation result to obtain the target operation result.
Specifically, the Wallace tree group sub-circuit 241 may accumulate the values in each column of the partial products of all the target codes obtained by the regular signed number encoding circuit 23 to obtain two output results, and the accumulation sub-circuit 242 may accumulate these two output results to obtain the target operation result of the multiplication operation.
In the multiplier provided by this embodiment, the multiplier can perform accumulation operation processing on partial products of target codes through the wallace tree group sub-circuit, and perform accumulation operation processing on the accumulation operation result again through the accumulation sub-circuit to obtain a target operation result of multiplication operation, the multiplier can perform expansion processing on received low-bit-width data, and the expanded data meets the bit width requirement of data that can be processed by the multiplier, so that the target operation result is still the result of multiplication operation on the original bit-width data, thereby ensuring that the multiplier can process the operation on the low-bit-width data, and effectively reducing the area occupied by the multiplier on the AI chip; meanwhile, the number of effective partial products of the target code obtained by the multiplier is small, so that the complexity of the multiplier for realizing multiplication is reduced, the operation efficiency of the multiplication is improved, and the power consumption of the multiplier is effectively reduced.
In one embodiment, continuing with the detailed structural diagram of the multiplier shown in fig. 7, the multiplier includes the Wallace tree group sub-circuit 241, and the Wallace tree group sub-circuit 241 includes: low-order Wallace tree units 2411, a selector 2412 and high-order Wallace tree units 2413, wherein the output end of the low-order Wallace tree units 2411 is connected with the input end of the selector 2412, and the output end of the selector 2412 is connected with the input end of the high-order Wallace tree units 2413. The low-order Wallace tree units 2411 are configured to accumulate each column of values in the partial products of the target code, the selector 2412 is configured to gate the carry input signal received by the high-order Wallace tree unit 2413, and the high-order Wallace tree units 2413 are configured to accumulate each column of values in the partial products of the target code.
Specifically, the circuit structure of each low-order Wallace tree unit 2411 may be implemented by a combination of full adders and half adders, or by a combination of 4-2 compressors, where a 4-2 compressor may be composed of multiple full adders; the same applies to the circuit structure of each high-order Wallace tree unit 2413. In addition, both the low-order Wallace tree unit 2411 and the high-order Wallace tree unit 2413 can be understood as circuits that take a multi-bit input signal and add its bits to obtain a two-bit output signal. Optionally, the number of high-order Wallace tree units 2413 in the Wallace tree group sub-circuit 241 may be equal to the data bit width N currently processed by the multiplier, and may also be equal to the number of low-order Wallace tree units 2411; the low-order Wallace tree units 2411 may be connected in series, and the high-order Wallace tree units 2413 may be connected in series. Optionally, the output of the last low-order Wallace tree unit 2411 is connected to the input of the selector 2412, and the output of the selector 2412 is connected to the input of the first high-order Wallace tree unit 2413. Optionally, each low-order Wallace tree unit 2411 in the Wallace tree group sub-circuit 241 may add the values of the corresponding column in the partial products of all the target codes; each low-order Wallace tree unit 2411 may output two signals, namely a carry signal Carry_i and a sum signal Sum_i, where i denotes the number of each low-order Wallace tree unit 2411 and the first low-order Wallace tree unit 2411 is numbered 0.
Optionally, the number of input signals received by each low-order Wallace tree unit 2411 may be equal to the number of target codes, or to the number of partial products of the target codes. The sum of the numbers of high-order Wallace tree units 2413 and low-order Wallace tree units 2411 in the Wallace tree group sub-circuit 241 may be equal to 2N. In the partial products of all the target codes, the total number of columns from the lowest column to the highest column may be equal to 2N; the N low-order Wallace tree units 2411 may accumulate the values of each column in the low N columns of the partial products of all the target codes, and the N high-order Wallace tree units 2413 may accumulate the values of each column in the high N columns of the partial products of all the target codes.
Illustratively, if the bit width of the data received by the multiplier is N and the multiplier can currently process N-bit multiplication, the selector 2412 may gate the carry output signal Cout_{N-1} output by the last low-order Wallace tree unit 2411 in the Wallace tree group sub-circuit 241 as the carry input signal Cin_N received by the first high-order Wallace tree unit 2413 in the Wallace tree group sub-circuit 241; in other words, the multiplier operates on the received N-bit data as a whole. If the multiplier can currently process N/2-bit multiplication, the selector 2412 may gate 0 as the carry input signal Cin_N received by the first high-order Wallace tree unit 2413 in the Wallace tree group sub-circuit 241; in other words, the multiplier divides the received N-bit data into high N/2-bit data and low N/2-bit data and multiplies them separately. The numbers i of the first through the last low-order Wallace tree units 2411 are 0, 1, 2, …, N-1, respectively, and the numbers i of the first through the last high-order Wallace tree units 2413 are N, N+1, …, 2N-1, respectively.
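The role of the selector can be illustrated with a simple behavioral model. The model below is an assumption made for illustration and is not taken from the embodiment: each Wallace tree unit counts the 1s among its column values and carry inputs, emits the parity as Sum_i, sends at most one carry of the next weight to the adder as Carry_i, and forwards the remaining carries to the next unit as Cout_i. The function name wallace_group and its arguments are hypothetical.

```python
def wallace_group(columns: list[list[int]], n: int, full_width: bool):
    """Column-wise accumulation over 2N columns (columns[0] is the lowest column).

    Returns (sums, carries, cin_seen), where cin_seen[i] is the number of
    carry-in bits that unit i actually received.
    """
    sums, carries, cin_seen = [], [], []
    cin = 0  # the first low-order unit receives a carry input of 0
    for i, column in enumerate(columns):
        if i == n and not full_width:
            cin = 0  # selector gates 0 instead of Cout_{N-1}
        cin_seen.append(cin)
        total = sum(column) + cin
        sums.append(total & 1)      # Sum_i
        higher = total >> 1         # signals of the next higher weight
        carry = 1 if higher else 0  # Carry_i, passed on to the adder
        carries.append(carry)
        cin = higher - carry        # Cout_i, passed on to unit i + 1
    return sums, carries, cin_seen


columns = [[1, 1, 1, 1], [1, 1, 1, 1], [0], [0]]  # N = 2, four columns
sums, carries, cin_full = wallace_group(columns, n=2, full_width=True)
_, _, cin_split = wallace_group(columns, n=2, full_width=False)
print(cin_full[2], cin_split[2])  # 1 0
```

In the full-width case the first high-order unit receives the carry produced by the low half, while in the split case it receives 0, so the two halves are accumulated independently.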
It should be noted that, for each low-order Wallace tree unit 2411 and each high-order Wallace tree unit 2413 in the Wallace tree group sub-circuit 241, the signals may include a carry input signal Cin_i, partial product value input signals, and a carry output signal Cout_i. Optionally, the partial product value input signals received by each low-order Wallace tree unit 2411 and each high-order Wallace tree unit 2413 may be the values of the corresponding column in the partial products of all the target codes, and the number of carry output signals Cout_i output by each low-order Wallace tree unit 2411 and each high-order Wallace tree unit 2413 may be N_Cout = floor((N_I + N_Cin) / 2) - 1, where N_I may represent the number of data input bits of the Wallace tree unit, N_Cin may represent the number of carry input bits of the Wallace tree unit, N_Cout may represent the minimum number of carry output bits of the Wallace tree unit, and floor(·) may represent the round-down function. Optionally, the carry input signal received by each low-order Wallace tree unit 2411 or high-order Wallace tree unit 2413 in the Wallace tree group sub-circuit 241 may be the carry output signal output by the preceding low-order Wallace tree unit 2411 or high-order Wallace tree unit 2413, and the carry input signal received by the first low-order Wallace tree unit 2411 is 0. The carry input signal received by the first high-order Wallace tree unit 2413 may be determined by the data bit width currently processed by the multiplier and the data bit width received by the multiplier.
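The carry-count relation above can be evaluated along a chain of Wallace tree units as in the following sketch. The function names min_carry_outputs and carry_chain are hypothetical, and clamping the result at 0 for very short columns is an assumption added here so that the count never becomes negative.

```python
def min_carry_outputs(n_inputs: int, n_carry_in: int) -> int:
    """N_Cout = floor((N_I + N_Cin) / 2) - 1, clamped at 0 (clamp is an assumption)."""
    return max(0, (n_inputs + n_carry_in) // 2 - 1)


def carry_chain(column_heights: list[int]) -> list[int]:
    """Carry-output count of each unit when every unit feeds the next one."""
    counts = []
    n_cin = 0  # the first low-order unit has no carry input
    for n_i in column_heights:
        n_cout = min_carry_outputs(n_i, n_cin)
        counts.append(n_cout)
        n_cin = n_cout  # carry outputs of unit i are carry inputs of unit i + 1
    return counts


print(carry_chain([9, 9, 9]))  # [3, 5, 6]
```

One way to read the relation: a unit that receives N_I + N_Cin bits of the same weight produces floor((N_I + N_Cin) / 2) signals of the next weight, one of which leaves as the Carry output to the adder, and the rest are passed to the next unit.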
According to the multiplier provided by the embodiment, partial products of target codes can be accumulated through the Wallace tree group sub-circuit, the accumulated result is accumulated again through the accumulation sub-circuit, and a target operation result of multiplication is obtained.
In one embodiment, the accumulation sub-circuit 242 comprises: an adder 2421, wherein the adder 2421 is configured to add the accumulated result.
Specifically, the adder 2421 can be an adder with different bit widths. Optionally, the adder 2421 may receive the two signals output by the wallace tree group sub-circuit 241, perform addition operation on the two output signals, and output a target operation result of the multiplication operation. Alternatively, the adder 2421 may be a carry look ahead adder.
Optionally, the adder 2421 includes: a carry signal input port 2421a, a sum bit signal input port 2421b and an operation result output port 2421c; the carry signal input port 2421a is configured to receive a carry signal, the sum bit signal input port 2421b is configured to receive a sum bit signal, and the operation result output port 2421c is configured to output a target operation result obtained by performing accumulation processing on the carry signal and the sum bit signal.
Optionally, the adder 2421 may receive the carry signal Carry output by the Wallace tree group sub-circuit 241 through the carry signal input port 2421a, receive the sum bit signal Sum output by the Wallace tree group sub-circuit 241 through the sum bit signal input port 2421b, add the carry signal Carry and the sum bit signal Sum, and output the result through the operation result output port 2421c.
It should be noted that, during multiplication, the multiplier may use adders 2421 of different bit widths to add the carry output signal Carry and the sum bit output signal Sum output by the Wallace tree group sub-circuit 241, where the bit width of the data that the adder 2421 can process may be equal to 2 times the bit width N of the data currently processed by the multiplier. Optionally, each Wallace tree unit in the Wallace tree group sub-circuit 241 may output a carry output signal Carry_i and a sum bit output signal Sum_i (i = 0, …, 2N-1, where i is the number of each Wallace tree unit, starting from 0). Optionally, the adder 2421 receives Carry = {[Carry_0 : Carry_{2N-2}], 0}; that is, the bit width of the carry output signal Carry received by the adder 2421 is 2N, the first 2N-1 bit values of Carry correspond to the carry output signals of the first 2N-1 Wallace tree units in the Wallace tree group sub-circuit 241, and the last bit value of Carry may be replaced by the value 0. Optionally, the bit width of the sum bit output signal Sum received by the adder 2421 is 2N, and the values in Sum may be equal to the sum bit output signals of the individual Wallace tree units in the Wallace tree group sub-circuit 241.
For example, if the multiplier is currently processing an 8 × 8 multiplication operation, the adder 2421 may be a 16-bit carry look-ahead adder. As shown in fig. 6, the Wallace tree group sub-circuit 241 may output the sum bit output signals Sum and the carry output signals Carry of 16 Wallace tree units. The sum bit signal received by the 16-bit carry look-ahead adder may be the complete sum bit signal Sum output by the Wallace tree group sub-circuit 241, whereas the carry signal it receives may be formed by combining all the carry output signals of the Wallace tree group sub-circuit 241, except the one output by the last Wallace tree unit, with a value 0. In fig. 6, Wallace_i represents a Wallace tree unit, and i is the number of the Wallace tree unit, starting from 0; a solid line between two Wallace tree units indicates that the Wallace tree unit with the higher number has a carry output signal, a dotted line indicates that the Wallace tree unit with the higher number does not have a carry output signal, and the trapezoidal symbol represents a two-way selector.
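The way the adder combines the two signals can be sketched with a carry-save example. The sketch is illustrative only: a single layer of full adders stands in for the Wallace tree group sub-circuit, the names carry_save and final_add are hypothetical, and ordinary integer addition stands in for the carry look-ahead adder. Bit i of carry_bits plays the role of Carry_i, so shifting it left by one position and dropping the top bit corresponds to Carry = {[Carry_0 : Carry_{2N-2}], 0}.

```python
def carry_save(a: int, b: int, c: int, width: int) -> tuple[int, int]:
    """Reduce three words to a sum word and a word of per-column carry bits."""
    mask = (1 << width) - 1
    sum_word = (a ^ b ^ c) & mask                      # bit i is Sum_i
    carry_bits = ((a & b) | (a & c) | (b & c)) & mask  # bit i is Carry_i
    return sum_word, carry_bits


def final_add(sum_word: int, carry_bits: int, width: int) -> int:
    """Add Sum and Carry = {[Carry_0 : Carry_{width-2}], 0} modulo 2^width."""
    mask = (1 << width) - 1
    carry_word = (carry_bits << 1) & mask  # drop Carry_{width-1}, append 0 at bit 0
    return (sum_word + carry_word) & mask


s, c = carry_save(0x1234, 0x0FF0, 0x00FF, 16)
print(final_add(s, c, 16))  # 8995
```

Because each carry bit has twice the weight of the column that produced it, the carry word must be offset by one position before the final addition, which is why the lowest bit of Carry is the constant 0.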
In the multiplier provided by this embodiment, the multiplier can perform accumulation operation on two paths of signals output by the wallace tree group sub-circuit through the accumulation sub-circuit, and output a target operation result of the multiplication operation, and the multiplier can perform expansion processing on received low-bit-width data, and the data after the expansion processing meets the bit-width requirement of data that can be processed by the multiplier, so that the target operation result is still the result of the multiplication operation on the original bit-width data, thereby ensuring that the multiplier can process the operation on the low-bit-width data, and effectively reducing the area of an AI chip occupied by the multiplier.
Fig. 8 is a flowchart illustrating a data processing method according to an embodiment, where the method may be processed by the multipliers shown in fig. 1 and fig. 3, and this embodiment relates to a process of performing a multiplication operation on data with different bit widths. As shown in fig. 8, the method includes:
S101, receiving data to be processed and a function selection mode signal, where the function selection mode signal indicates the data bit width that the multiplier can currently process.
Specifically, the multiplier may receive the data to be processed, which may be the multiplier and the multiplicand of a multiplication operation, through the modified regular signed number encoding circuit. During each multiplication, the modified regular signed number encoding circuit and the modified compression circuit in the multiplier may receive the same function selection mode signal. Optionally, the data to be processed may be fixed-point numbers. Different function selection mode signals indicate that the multiplier can process data operations of different bit widths; the correspondence between the function selection mode signals and the bit widths processed by the multiplier can be set flexibly, and this embodiment places no limitation on it. For example, the modified regular signed number encoding circuit and the modified compression circuit may receive a plurality of function selection mode signals. Taking three function selection mode signals as an example, they may be mode 00, mode 01, and mode 10: mode 00 may indicate that the multiplier can process 16-bit data, mode 01 32-bit data, and mode 10 64-bit data; alternatively, mode 00 may indicate that the multiplier can process 64-bit data, mode 01 16-bit data, and mode 10 32-bit data.
Optionally, the bit width of the multiplier and the multiplicand in the multiplication operation received by the correction regular signed number encoding circuit may be 8 bits, 16 bits, 32 bits, or 64 bits, which is not limited in this embodiment. Wherein, the bit width of the multiplier in the multiplication operation can be equal to the bit width of the multiplicand in the multiplication operation.
S102, determining whether the data to be processed needs to be split according to the function selection mode signal.
Specifically, the multiplier may determine a bit width of data that can be processed by the current multiplier according to the received function selection mode signal, so as to determine whether to split the data to be processed. The splitting process may be characterized as dividing the data to be processed into a plurality of groups of data with the same bit width.
Optionally, the step of determining whether the to-be-processed data needs to be split according to the function selection mode signal in the step S102 may include: and judging whether the bit width of the data to be processed is equal to the bit width of the data which can be processed by the multiplier or not according to the function selection mode signal.
It should be noted that determining whether the data to be processed needs to be split according to the function selection mode signal can be understood as determining, according to the function selection mode signal, whether the bit width of the data to be processed is equal to the bit width of the data that the multiplier can process. If the two are equal, the data to be processed does not need to be split; otherwise, it needs to be split.
S103, if the data to be processed needs to be split, splitting the data to be processed to obtain split data.
Optionally, after the step of determining, in the S102, whether the data to be processed needs to be split according to the function selection mode signal, the method further includes: and if the data to be processed does not need to be split, continuing to perform regular signed number coding processing on the data to be processed to obtain the target code.
Specifically, if the bit width of the multiplier and the multiplicand received by the modified regular signed number encoding circuit is not equal to the processable data bit width corresponding to the function selection mode signal received by the multiplier, the multiplier may automatically divide the received data to be processed into a plurality of groups of data, each equal in bit width to the data that the multiplier can currently process, and process the groups in parallel; in this case, the bit width of the data to be processed received by the modified regular signed number encoding circuit may be greater than the bit width of the data that the multiplier can currently process. Optionally, parallel processing may be characterized as processing all divided groups of data at the same time. If the bit width of the data to be processed received by the modified regular signed number encoding circuit is equal to the processable data bit width corresponding to the function selection mode signal received by the multiplier, the multiplier directly performs the subsequent processing on the complete data to be processed, without splitting it first.
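The split decision of S102 and the splitting of S103 can be sketched in software as follows. The mode-to-width table uses one of the example mappings mentioned in this embodiment (00 for 16 bits, 01 for 32 bits, 10 for 64 bits); the table, the function name, and the choice to return the lowest group first are assumptions made for illustration only.

```python
# Assumed example mapping; the embodiment states the mapping can be set freely.
MODE_TO_WIDTH = {"00": 16, "01": 32, "10": 64}


def split_if_needed(value, data_width, mode):
    """Return a list of groups, lowest group first.

    If the data width equals the width selected by the mode signal, the data
    is passed through unsplit as a single group.
    """
    lane = MODE_TO_WIDTH[mode]
    if data_width == lane:
        return [value]
    if data_width < lane or data_width % lane != 0:
        raise ValueError("data width must be a multiple of the selected width")
    mask = (1 << lane) - 1
    return [(value >> shift) & mask for shift in range(0, data_width, lane)]


if __name__ == "__main__":
    print([hex(g) for g in split_if_needed(0x12345678, 32, "00")])
    print([hex(g) for g in split_if_needed(0xBEEF, 16, "00")])
```

In hardware the groups would be processed at the same time; the list here only shows which bits end up in which group.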
It should be noted that, if the bit width of the data to be processed received by the multiplier is 2N, and the bit width of the data that can be processed currently is 2N, the regular signed number coding sub-circuit in the multiplier may perform regular signed number coding processing on the complete 2N-bit data to obtain the corresponding target code. Wherein the regular signed number encoding process described above can be characterized as a data processing procedure by encoding by the values 0, -1 and 1.
Meanwhile, if the regular signed number coding processing is applied directly to the complete 2N-bit data, the obtained target code may have (2N+2) bits, of which the high (N+1) bits may be called high-order data and the low (N+1) bits may be called low-order data.
S104, performing regular signed number coding processing on the split data to obtain the target code.
Optionally, the step of performing regular signed number coding processing on the split data in S104 to obtain the target code may include: and converting continuous l-bit numerical values 1 in the split data into (l +1) bits with the highest numerical value of 1, the lowest numerical value of-1 and the rest of bits of 0 to obtain the target code, wherein l is more than or equal to 2.
Specifically, if the bit width of the data to be processed received by the multiplier is 2N and the bit width of the data that can currently be processed is N, the regular signed number coding sub-circuit in the multiplier can automatically split the 2N-bit data into high N-bit data and low N-bit data, and perform regular signed number coding processing on the high N-bit data and the low N-bit data at the same time to obtain the corresponding high-order target code and low-order target code. Optionally, after being split, the data to be processed may include high N-bit data to be processed and low N-bit data to be processed. If the bit width of the data to be processed is 2N, the upper N bits may be referred to as the upper data to be processed, and the lower N bits may be referred to as the lower data to be processed.
S105, obtaining the partial products of the target code according to the target code and the split data.
Specifically, the number of digits in the target code may be equal to the bit width of the data subjected to the regular signed number coding processing plus 1, and the number of partial products of the target code may be equal to the number of digits in the target code.
Optionally, after continuing to perform the step of performing regular signed number encoding processing on the data to be processed to obtain the target code, the method further includes: and obtaining a partial product of the target code according to the target code and the data to be processed.
It should be noted that, in the multiplication process, if the data to be processed does not need to be split and the modified regular signed number encoding circuit directly performs regular signed number coding on the data to be processed to obtain the target code, the modified regular signed number encoding circuit may obtain the partial products of the target code according to the target code and the multiplicand in the data to be processed. Optionally, each bit value in the target code may have a corresponding partial product.
S106, accumulating the partial products of the target code to obtain a target operation result.
Specifically, the multiplier may accumulate the values in each column of the partial products of all target codes to obtain the target operation result. Optionally, the bit width of the target operation result may be equal to 2 times the bit width of the data currently processed by the multiplier.
The data processing method provided by this embodiment receives data to be processed and a function selection mode signal, determines whether the data to be processed needs to be split according to the function selection mode signal, if the data to be processed needs to be split, the data to be processed is split to obtain split data, the split data is subjected to regular signed number coding processing to obtain a target code, a partial product of the target code is obtained according to the target code and the split data, and the partial product of the target code is accumulated to obtain a target operation result, so that the method can perform multiplication operation on data with various bit widths according to the function selection mode signal received by a multiplier, thereby effectively reducing the area of an AI chip occupied by the multiplier; meanwhile, the method can carry out regular signed number coding processing on the data to be processed, and reduce the number of effective partial products obtained in the multiplication process, thereby reducing the complexity of multiplication and improving the operation efficiency of the multiplication.
As an embodiment, the step of performing regular signed number coding processing on the split data in S104 to obtain the target code specifically includes:
S1041, performing regular signed number coding processing on the split data to obtain an intermediate code.
Specifically, the split data subjected to the regular signed number encoding processing may be a multiplier in a multiplication operation.
S1042, obtaining the target code according to the intermediate code and the function selection mode signal.
Specifically, the regular signed number encoding process may be characterized as follows: for an N-bit multiplier, processing proceeds from the low-order values to the high-order values; if there are l (l ≥ 2) consecutive bits of value 1, these l consecutive bits may be converted into the data "1(0)^{l-1}(-1)", that is, one bit of value 1, followed by (l-1) bits of value 0, followed by one bit of value -1, and the remaining (N-l) bit values are combined with the converted (l+1) bit values to obtain new data. The new data is then used as the initial data of the next stage of conversion, until the new data obtained after conversion contains no l (l ≥ 2) consecutive bits of value 1. After the N-bit multiplier is processed by regular signed number encoding, the bit width of the obtained target code may be equal to (N+1). Further, in the regular signed number encoding process, the data 11 can be converted into (100 - 001), that is, the data 11 is equivalent to 10(-1); the data 111 can be converted into (1000 - 0001), that is, the data 111 is equivalent to 100(-1); other runs of l (l ≥ 2) consecutive bits of value 1 are converted in a similar way.
For example, suppose the multiplier received by the regular signed number coding sub-circuit in the multiplier is "001010101101110". The first-stage conversion of the multiplier yields the first new data "0010101011100(-1)0"; the second-stage conversion of the first new data yields the second new data "0010101100(-1)00(-1)0"; the third-stage conversion of the second new data yields the third new data "0010110(-1)00(-1)00(-1)0"; the fourth-stage conversion of the third new data yields the fourth new data "00110(-1)0(-1)00(-1)00(-1)0"; and the fifth-stage conversion of the fourth new data yields the fifth new data "010(-1)0(-1)0(-1)00(-1)00(-1)0". Since the fifth new data contains no l (l ≥ 2) consecutive bits of value 1, it may be called the initial code; after one bit of padding is applied to the initial code, the regular signed number coding process is complete and the intermediate code is obtained, where the bit width of the initial code may be equal to the bit width of the multiplier. Optionally, after the regular signed number coding sub-circuit performs regular signed number coding on the multiplier to obtain the new data (i.e., the initial code), if the highest and second-highest bit values of the new data are "10" or "01", the sub-circuit may pad one bit of value 0 above the highest bit of the new data, so that the upper three bit values of the corresponding intermediate code are "010" or "001", respectively. Optionally, the bit width of the intermediate code may be equal to the bit width of the data currently processed by the multiplier plus 1.
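The staged conversion described above can be reproduced with a short software model. This is a sketch, not the circuit itself: it treats the multiplier as an unsigned bit pattern, repeatedly converts the lowest run of two or more consecutive 1s into a 1, zeros, and a -1, and records the data after each stage. All function names are hypothetical, and padding the result with 0 up to (N+1) digits is an assumed reading of the one-bit padding step.

```python
def find_lowest_run(digits):
    """Return (lo, hi) of the lowest run of two or more 1s, else None.

    digits is a list of -1/0/1 values, least significant digit first.
    """
    k = 0
    while k < len(digits):
        if digits[k] == 1:
            m = k
            while m + 1 < len(digits) and digits[m + 1] == 1:
                m += 1
            if m > k:
                return k, m
            k = m + 1
        else:
            k += 1
    return None


def show(digits):
    """Format digits most significant first, writing -1 as (-1)."""
    return "".join("(-1)" if d == -1 else str(d) for d in reversed(digits))


def csd_stages(bits):
    """Return (list of per-stage strings, final digits LSB first)."""
    digits = [int(ch) for ch in reversed(bits)]
    stages = []
    while True:
        run = find_lowest_run(digits)
        if run is None:
            break
        lo, hi = run
        if hi + 1 == len(digits):      # run reaches the top: grow by one digit
            digits.append(0)
        digits[lo] = -1                # lowest bit of the run becomes -1
        for k in range(lo + 1, hi + 1):
            digits[k] = 0              # the rest of the run becomes 0
        digits[hi + 1] = 1             # a 1 is placed just above the run
        stages.append(show(digits))
    return stages, digits


def intermediate_code(bits):
    """Pad the final code with 0 up to len(bits) + 1 digits (assumed padding rule)."""
    _, digits = csd_stages(bits)
    digits = digits + [0] * (len(bits) + 1 - len(digits))
    return show(digits)


if __name__ == "__main__":
    stages, _ = csd_stages("001010101101110")
    for number, text in enumerate(stages, start=1):
        print(number, text)
    print(intermediate_code("001010101101110"))
```

Each stage preserves the numeric value of the data, which is why the final code can replace the original multiplier when partial products are formed.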
In addition, if the bit width of the data received by the multiplier is 2N and the data operation can be currently processed by N-bit data, the regular signed number coding sub-circuit in the multiplier can divide the 2N-bit data into two groups of N-bit data for data operation respectively, and at this time, the obtained two groups of (N +1) -bit intermediate codes are combined to be used as target codes; if the multiplier can currently process 2N-bit data operation, the regular signed number encoding sub-circuit in the multiplier can complement a bit value 0 (i.e. complement processing) at the higher bit of the highest bit value of the obtained (2N +1) -bit intermediate code, and then take the (2N +2) -bit data after complement processing as the target code.
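The two ways of forming the (2N+2)-digit target code can be sketched as below. The representation (lists of digits, most significant first), the function name, and the ordering of the high group before the low group in the combined code are assumptions for illustration; the text above only states that the two intermediate codes are combined.

```python
def build_target_code(intermediate_codes, split_mode):
    """Assemble a target code from intermediate codes (digit lists, MSB first).

    split_mode=True : two (N+1)-digit codes [high, low] are concatenated.
    split_mode=False: one (2N+1)-digit code gets a 0 padded above its top digit.
    """
    if split_mode:
        high, low = intermediate_codes
        if len(high) != len(low):
            raise ValueError("both intermediate codes must have N+1 digits")
        return list(high) + list(low)
    (code,) = intermediate_codes
    return [0] + list(code)


if __name__ == "__main__":
    # N = 4: two 5-digit intermediate codes, or one 9-digit intermediate code.
    print(build_target_code([[0, 1, 0, 0, -1], [0, 0, 1, 0, 1]], split_mode=True))
    print(build_target_code([[0, 1, 0, -1, 0, 0, 1, 0, -1]], split_mode=False))
```

Either way the target code has the same length, which lets the downstream partial product logic keep a fixed shape across modes.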
In the data processing method provided by this embodiment, the split data is subjected to regular signed number coding processing to obtain an intermediate code, and the target code is obtained according to the intermediate code and the function selection mode signal, so that the method can perform multiplication operation on multiple data with different bit widths, thereby effectively reducing the area of an AI chip occupied by a multiplier; meanwhile, the method can carry out regular signed number coding processing on the data, and reduce the number of effective partial products obtained in the multiplication process, thereby reducing the complexity of multiplication and improving the operation efficiency of the multiplication.
As an embodiment, the step of obtaining a partial product of the target code according to the target code and the split data in S105 may include: obtaining a low-order partial product of the target code according to the low-order target code and the split data; and obtaining the high-order partial product of the target code according to the high-order target code and the split data.
Specifically, the multiplier obtains original low-order partial products according to the low-order target code and the split data, performs sign bit extension on each original low-order partial product to obtain a sign-extended partial product, and further obtains the sign-extended low-order partial products from all the sign-extended partial products. Optionally, an original low-order partial product may be a low-order partial product without sign bit extension, that is, a partial product obtained from the corresponding low-order data without sign bit extension. Optionally, the bit width of a sign-extended partial product may be equal to 2 times the bit width N of the data currently processable by the multiplier, and the bit width of an original low-order partial product may be equal to N. Optionally, a sign-extended partial product may consist of the N bit values of the original low-order partial product and N consecutive copies of the sign bit value of that original low-order partial product.
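Sign bit extension from N bits to 2N bits can be illustrated as follows. This is a minimal sketch assuming the original partial product is held as an N-bit two's-complement bit pattern; the function name is hypothetical.

```python
def sign_extend(partial, n):
    """Extend an N-bit two's-complement pattern to 2N bits.

    The upper N bits are N copies of the sign bit of the original pattern.
    """
    partial &= (1 << n) - 1
    sign = (partial >> (n - 1)) & 1
    upper = (1 << n) - 1 if sign else 0
    return (upper << n) | partial


if __name__ == "__main__":
    print(format(sign_extend(0b10010110, 8), "016b"))
    print(format(sign_extend(0b00010110, 8), "016b"))
```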
It should be noted that, if the low-order partial product obtaining unit receives an 8-bit multiplicand x_7x_6x_5x_4x_3x_2x_1x_0 (i.e., X), it may directly obtain the corresponding original low-order partial products from the multiplicand X and the three values -1, 1, and 0 contained in the low-order target code: when a value in the low-order target code is -1, the original low-order partial product may be -X; when the value is 1, the original low-order partial product may be X; and when the value is 0, the original low-order partial product may be 0.
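The selection of -X, X, or 0 for each code digit, and the effect on the number of nonzero partial products, can be checked with a small arithmetic model. This sketch assumes that the digit at position i carries the weight 2^i, so each partial product is shifted by its position before accumulation; the function names are hypothetical, and Python integers stand in for the fixed-width hardware values.

```python
def partial_products(x, digits_lsb_first):
    """Pick -X, X, or 0 for each code digit (-1, 1, or 0)."""
    table = {-1: -x, 0: 0, 1: x}
    return [table[d] for d in digits_lsb_first]


def accumulate(partials):
    """Add the partial products, each shifted by its digit position."""
    return sum(p << i for i, p in enumerate(partials))


if __name__ == "__main__":
    # Code digits of the multiplier 001010101101110 (decimal 5486), LSB first.
    digits = [0, -1, 0, 0, -1, 0, 0, -1, 0, -1, 0, -1, 0, 1, 0]
    partials = partial_products(37, digits)
    print(accumulate(partials), 37 * 5486)
    print(sum(1 for p in partials if p != 0), bin(5486).count("1"))
```

For this multiplier the signed-digit code yields six nonzero partial products, compared with eight for the plain binary representation.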
It will be appreciated that each of the low selectors in the low selector bank unit may gate the corresponding bit value in the sign bit extended low bit partial product according to the different function selection mode signal received. Optionally, the low-order partial product obtaining unit may obtain the low-order partial product after the target sign bit extension corresponding to the bit width data currently processed by the multiplier according to the value in the low-order partial product after the sign bit extension obtained after the low-order selector bank unit is gated and the partial bit value in the partial product after the sign bit extension obtained by the multiplier currently processing the corresponding bit width data.
Further, the multiplier may obtain the lower bit products of the corresponding target codes according to the lower bit products after all sign bit extensions, and the distribution rule of the lower bit products of all target codes may be characterized in that, the lower bit product of the first target code may be equal to the lower bit product after the first sign bit extension, that is, the lower bit product after sign bit extension corresponding to the lowest bit value in the lower bit target codes, starting from the lower bit product of the second target code, the highest bit value in the lower bit product of each target code is in the same column as the highest bit value in the lower bit product of the first target code, the lower bit product of each target code may be equal to the lower bit product after the corresponding sign bit extension, and the lowest bit value of the lower bit product after sign bit extension is in the same column as the next highest bit value of the lower bit product of the last target code, that is, a plurality of values whose corresponding sign bit extended lower bit partial product exceeds the highest column value in the lower bit partial product of the first target code do not participate in the subsequent operation.
In addition, the multiplier obtains, according to the received high-order target code and the split data, the original high-order partial products corresponding to the bit width of the data currently processed by the multiplier, and performs sign bit extension on each original high-order partial product to obtain a sign-extended partial product. Optionally, an original high-order partial product may be a high-order partial product without sign bit extension, that is, a partial product obtained from the corresponding high-order data without sign bit extension. Optionally, the bit width of a sign-extended partial product may be equal to 2 times the data bit width N that the multiplier can process, and the bit width of an original high-order partial product may be equal to N. Optionally, a sign-extended partial product may consist of the N bit values of the original high-order partial product and N consecutive copies of the sign bit value of that original high-order partial product.
It should be noted that each high-order selector in the high-order selector bank unit may gate the corresponding bit value in the sign-extended high-order partial product according to the different function selection mode signals received. Optionally, the high-order partial product obtaining unit may obtain the target sign-extended high-order partial product corresponding to the bit width of the data currently processed by the multiplier, according to the values in the sign-extended high-order partial product gated by the high-order selector bank unit and some of the bit values of the sign-extended partial product obtained while the multiplier processes data of the corresponding bit width.
Further, the multiplier may obtain the upper bit products of the corresponding target codes according to the upper bit products after all sign bit extensions, and the distribution rule of the upper bit products of all target codes may be characterized in that the upper bit product of the first target code may be located at the partial product of the target code next to the lower bit product of the last target code, that is, the partial product of the target code corresponding to the lowest bit value in the upper bit target code, the bit width of the upper bit product of the first target code may be equal to the bit width of the lower bit product of the last target code minus 1, that is, the upper bit product of the first target code may be equal to the upper bit product after the first sign bit extension, and the lowest bit value of the upper bit product after the sign bit extension is located in the same column as the next highest bit value of the lower bit product of the last target code, that is, the values of the upper product after the first sign bit expansion exceeding the highest column value in the lower product of the last target code do not participate in the subsequent operation, and starting from the upper product of the second target code, the highest value in the upper product of each target code and the highest value in the upper product of the first target code are in the same column, and the upper product of each target code may be equal to the upper product after the sign bit expansion, and the lowest value of the upper product after the sign bit expansion is in the same column as the next higher value of the upper product of the last target code, that is, the values of the upper product after the sign bit expansion exceeding the highest column value in the upper product of the first target code do not participate in the subsequent operation.
The data processing method provided by the embodiment can acquire fewer effective partial products of the target code, thereby reducing the complexity of multiplication.
As an embodiment, the step of performing accumulation processing on the partial product of the target code in S106 to obtain the target operation result may include:
S1061, accumulating the low-order partial products of the target code and the high-order partial products of the target code through a modified Wallace tree group circuit to obtain an intermediate operation result.
For example, when the lowest bit value to the highest bit value in the lower target code (bit width is (N +1)), the lowest bit value is numbered 1, the highest bit value is numbered (N +1), the numbers of the corresponding lower bit products of the target codes are similar, and when the lowest bit value to the highest bit value in the upper target code (bit width is (N +1)), the numbers of the lowest bit values are 1, the highest bit value is numbered (N +1), the numbers of the corresponding upper bit products of the target codes are similar, the distribution rule of the lower bit products of all the target codes and the upper bit products of all the target codes can be characterized as the lowest bit value of the upper bit product of the target code numbered 1, and the next lower bit value of the lower bit product of the target code numbered (N +1) is located in the same column, on the basis of the upper bit product of the first target code, the next lower value of the higher-order partial product of the other target codes is in the same column as the lowest value of the higher-order partial product of the next target code, and the next lower value of the lower-order partial product of the other target codes is in the same column as the lowest value of the lower-order partial product of the next target code based on the lower-order partial product of the first target code.
It should be noted that the modified wallace tree set circuit may perform an accumulation process on each column number in the lower bit partial product of all target codes and the upper bit partial product of all target codes.
S1062, accumulating the intermediate operation result through an accumulation circuit to obtain the target operation result.
Optionally, the step of performing accumulation processing on the intermediate operation result through an accumulation circuit in S1062 to obtain the target operation result may specifically include: accumulating the column number values in the partial products of all target codes through a low-order improved Wallace tree sub-circuit to obtain an accumulated operation result; gating the accumulation operation result through a selector to obtain a carry gating signal; and performing accumulation processing through a high-order improved Wallace tree sub-circuit according to the carry gating signal and the column number in the partial product of the target code to obtain the target operation result.
Specifically, according to the distribution rule of the low-order partial products of all target codes and the high-order partial products of all target codes, the total column number of the values corresponding to the partial products of all target codes is 2N (N is the bit width of the data currently processed by the multiplier), and the number corresponding to each column of values from the lowest order value may be 0, …, 2N-1, where the numbers 0 to N-1 may be referred to as low-N column values. Alternatively, the accumulation operation result may be a carry output signal Cout output by the last modified wallace tree sub-circuit in the lower modified wallace tree sub-circuit.
It should be noted that the N improved Wallace tree sub-circuits included in the low-order improved Wallace tree sub-circuit may accumulate the low N columns of values in numbering order to obtain the accumulation operation result. Optionally, the accumulation operation result may include the carry output signal Carry and the sum output signal Sum of each improved Wallace tree sub-circuit, and the output signal Cout of the last improved Wallace tree sub-circuit in the low-order improved Wallace tree sub-circuit.
It is understood that the selector in the modified wallace tree group circuit may gate the output signal Cout or the value 0 of the last modified wallace tree sub-circuit in the lower modified wallace tree sub-circuits to obtain the carry gate signal according to the received function selection mode signal.
In this embodiment, according to the distribution rule of the partial products of all target codes, the total number of columns of the corresponding values of the partial products of all target codes is 2N (N is the bit width of the data currently processed by the multiplier), and the number corresponding to each column of values from the lowest bit value may be 0, …, 2N-1, where the numbers N to 2N-1 may be referred to as high N columns of values.
It should be noted that the N improved wallace tree sub-circuits included in the high-order improved wallace tree sub-circuit may perform the accumulation operation on the high N column numbers according to the numbering order, and output the accumulation operation result. The carry input signal received by the first high-order modified Wallace tree sub-circuit in the high-order modified Wallace tree sub-circuits may be a carry strobe signal output by the selector.
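The role of the selector between the low-order and high-order halves can be sketched with a simplified model in which plain adders stand in for the improved Wallace tree sub-circuits. The sketch assumes that the selector passes Cout when the full 2N-bit width is processed as one operation and passes 0 when the two halves are processed as independent N-bit operations; which function selection mode signal corresponds to which choice is not fixed here, and the function name is hypothetical.

```python
def add_with_gated_carry(a, b, n, pass_carry):
    """Add two 2N-bit values as a low half and a high half.

    pass_carry=True : Cout of the low half feeds the high half (one 2N-bit add).
    pass_carry=False: the selector outputs 0 (two independent N-bit adds).
    """
    mask = (1 << n) - 1
    low = (a & mask) + (b & mask)
    cout = low >> n                        # carry out of the low half
    carry_gate = cout if pass_carry else 0 # selector output
    high = ((a >> n) & mask) + ((b >> n) & mask) + carry_gate
    return ((high & mask) << n) | (low & mask)


if __name__ == "__main__":
    print(hex(add_with_gated_carry(0x01FF, 0x0001, 8, pass_carry=True)))
    print(hex(add_with_gated_carry(0x01FF, 0x0001, 8, pass_carry=False)))
```

Gating the carry is what allows the same column hardware to serve either one wide operation or two narrower ones without the low half disturbing the high half.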
In the data processing method provided by this embodiment, the modified wallace tree group circuit is used to accumulate the low-order part product of the target code and the high-order part product of the target code to obtain an intermediate operation result, and the accumulation circuit is used to accumulate the intermediate operation result to obtain a target operation result, so that the method can multiply data with different bit widths according to the function selection mode signal received by the multiplier, thereby effectively reducing the area of the AI chip occupied by the multiplier; meanwhile, the number of effective partial products which can be obtained by the method is small, so that the complexity of multiplication is reduced, and the operation efficiency of the multiplication is improved.
Fig. 9 is a flowchart illustrating a data processing method according to another embodiment, which can be processed by the multipliers shown in fig. 2 and fig. 7, where the embodiment relates to a process of performing a multiplication operation on data with different bit widths. As shown in fig. 9, the method includes:
S201, receiving data to be processed.
Specifically, the judgment circuit in the multiplier may receive two pieces of data to be processed, namely the multiplier and the multiplicand of a multiplication operation; within the same operation, the bit widths of the multiplier and the multiplicand received by the multiplier may be the same. In addition, the regular signed number encoding circuit and the compression circuit in the multiplier may also receive function selection mode signals, and different function selection mode signals determine that the multiplier can currently process data of different bit widths. For example, the regular signed number encoding circuit and the compression circuit may receive a plurality of function selection mode signals. Taking three function selection mode signals as an example, they may be mode 00, mode 01, and mode 10: mode 00 may indicate that the multiplier can process 16-bit data, mode 01 32-bit data, and mode 10 64-bit data; alternatively, mode 00 may indicate that the multiplier can process 64-bit data, mode 01 16-bit data, and mode 10 32-bit data.
S202, judging whether the bit width of the data to be processed is equal to the bit width of the data which can be processed by the multiplier.
Specifically, the multiplier can automatically determine, through the determination circuit, whether the bit widths of the two received data to be processed are equal to the bit width of the data that can be currently processed by the multiplier. In this embodiment, if the bit width of the data that can be processed by the multiplier is 2N bits, the bit width of the data to be processed received by the determining circuit may be N bits or may also be 2N bits.
S203, if the bit widths are not equal, performing data expansion processing on the data to be processed to obtain expanded data.
Specifically, if the bit width of the data to be processed received by the determining circuit is not equal to the data bit width 2N that can be processed by the multiplier, the multiplier may perform data expansion processing on the data to be processed through the data expansion circuit, and expand the data to be processed into data with a bit width of 2N. Optionally, the data expansion processing may be characterized by complementing the small bit width data with a value 0 or other values, and converting the small bit width data into large bit width data.
Optionally, the step of performing data expansion processing on the data to be processed in S203 to obtain expanded data may specifically include: and performing data expansion processing on the data to be processed through a numerical value 0 or a sign bit numerical value of the data to be processed to obtain the expanded data, wherein the bit width of the expanded data is equal to the bit width of the data currently processed by the multiplier.
It should be noted that, the data expansion circuit in the multiplier may receive three data expansion mode selection signals, which are respectively denoted as 00, 01, and 10, where the signal 00 denotes that the data expansion circuit may expand the received N-bit data to be processed into 2N-bit data, the upper N-bit data in the 2N-bit data may be equal to the received N-bit data, and the values in the lower N-bit data may all be equal to the expanded value 0, at this time, the data expansion circuit may output the function selection mode signal 00, and in the operation result of 4N-bit wide obtained by the multiplier, the upper 2N-bit data may be a target operation result of multiplication operation; signal 01 indicates that the data expansion circuit can expand the received N-bit data into 2N-bit data, the lower N-bit data in the 2N-bit data can be equal to the received N-bit data, and the numerical values in the upper N-bit data can all be equal to the expanded numerical value 0, at this time, the data expansion circuit can output a function selection mode signal 00, and in the operation result with a 4N-bit wide obtained by the multiplier, the lower 2N-bit data can be the target operation result of the multiplication operation; the signal 10 indicates that the data expansion circuit can expand the received N-bit data into 2N-bit data, the lower N-bit data of the 2N-bit data can be equal to the received N-bit data, and the values of the upper N-bit data can be equal to the sign bit value of the data received by the data expansion circuit, at this time, the data expansion circuit can output the function selection mode signal 01, and of the operation result with 4N-bit width obtained by the multiplier, the lower 2N-bit data can be the target operation result of the multiplication operation.
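The three data expansion modes described above can be sketched in software as follows. This is only a minimal illustration of the bit layouts, not the circuit itself; the function name `expand` and the use of strings for the mode selection signal are assumptions made for the example.

```python
def expand(data, n, mode):
    """Expand an N-bit pattern to 2N bits according to the expansion mode.

    mode "00": received data in the upper N bits, lower N bits filled with 0
    mode "01": received data in the lower N bits, upper N bits filled with 0
    mode "10": received data in the lower N bits, upper N bits filled with
               the sign bit of the received data
    """
    data &= (1 << n) - 1                 # keep only the N received bits
    if mode == "00":
        return data << n
    if mode == "01":
        return data
    if mode == "10":
        sign = data >> (n - 1)           # most significant received bit
        upper = (1 << n) - 1 if sign else 0
        return (upper << n) | data
    raise ValueError("unknown data expansion mode: " + mode)


print(hex(expand(0xA5, 8, "00")))  # 0xa500
print(hex(expand(0xA5, 8, "01")))  # 0xa5
print(hex(expand(0xA5, 8, "10")))  # 0xffa5
```

In this sketch, modes 00 and 01 differ only in where the zero padding is placed, which is why the target operation result is later taken from the upper or the lower 2N bits of the 4N-bit product, respectively.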
S204, performing regular signed number coding processing on the expanded data to obtain a partial product of the target code.
Specifically, the multiplier may perform regular signed number encoding processing on the expanded data through a regular signed number encoding circuit, and obtain a partial product of the target encoding according to the received multiplicand to be processed and the result of the regular signed number encoding. Optionally, the number of partial products of the target code may be equal to the bit width 2N plus 2 of the data currently processed by the multiplier, or may be equal to the bit width N plus 1 of the data currently processed by the multiplier.
S205, accumulating the partial product of the target code to obtain a target operation result.
Specifically, the multiplier may accumulate the partial product of the target code through the compression circuit, and obtain the target operation result.
For example, a multiplier may currently process data with a bit width of 16 bits and receive two pieces of data with a bit width of 8 bits. The multiplier may expand the two received 8-bit data into two 16-bit data through the data expansion circuit, and after performing the multiplication operation on the expanded data, may obtain one piece of data with a bit width of 32 bits. If the data expansion circuit expands each 8-bit data so that the lower 8 bits are all 0 and the upper 8 bits are the received 8-bit data, then the data expansion mode selection signal received by the data expansion circuit is 00, the output function selection mode signal is also 00, and the multiplier can intercept the upper 16 bits of the 32-bit data as the target operation result of the multiplication operation. If the data expansion circuit expands each 8-bit data so that the upper 8 bits are all 0 and the lower 8 bits are the received 8-bit data, then the data expansion mode selection signal received by the data expansion circuit is 01, the output function selection mode signal is 00, and the multiplier can intercept the lower 16 bits of the 32-bit data as the target operation result of the multiplication operation. If the data expansion circuit expands each 8-bit data so that the upper 8 bits are all equal to the sign bit value of the received 8-bit data and the lower 8 bits are the received 8-bit data, then the data expansion mode selection signal received by the data expansion circuit is 10, the output function selection mode signal is 01, and the multiplier can intercept the lower 16 bits of the 32-bit data as the target operation result of the multiplication operation.
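The 8-bit example above can be checked numerically with a short behavioral sketch. It models only the arithmetic (expansion, full-width product, and selection of the upper or lower 2N bits), not the encoding or compression circuits; the function name `multiply_narrow` is hypothetical.

```python
def multiply_narrow(a, b, n, mode):
    """Multiply two N-bit patterns on a behavioral model of a 2N-bit multiplier."""
    low_mask = (1 << n) - 1
    wide_mask = (1 << (2 * n)) - 1
    a &= low_mask
    b &= low_mask
    if mode == "00":
        # Operands occupy the upper N bits; the lower N bits are 0.
        product = (a << n) * (b << n)              # up to 4N bits wide
        return (product >> (2 * n)) & wide_mask    # keep the upper 2N bits
    if mode == "01":
        # Zero-extended operands: unsigned product, keep the lower 2N bits.
        return (a * b) & wide_mask
    if mode == "10":
        # Sign-extended operands: interpret both as two's complement values.
        sa = a - (1 << n) if a >> (n - 1) else a
        sb = b - (1 << n) if b >> (n - 1) else b
        return (sa * sb) & wide_mask               # keep the lower 2N bits
    raise ValueError("unknown data expansion mode: " + mode)


print(multiply_narrow(200, 100, 8, "00"))        # 20000
print(multiply_narrow(200, 100, 8, "01"))        # 20000
print(hex(multiply_narrow(0xFD, 0x06, 8, "10"))) # 0xffee, i.e. -3 * 6 = -18
```

In all three modes the selected 16 bits equal the product of the original 8-bit operands, which is the property the example relies on.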
The data processing method provided by this embodiment receives data to be processed; determines whether the bit width of the data to be processed is equal to the bit width of the data that can be processed by the multiplier; if not, performs data expansion processing on the data to be processed to obtain expanded data; performs regular signed number coding processing on the expanded data to obtain a partial product of the target code; and performs accumulation processing on the partial product of the target code to obtain a target operation result. The method can perform expansion processing on the received low-bit-width data so that the expanded data meets the bit width requirement of the data that can be processed by the multiplier, and the target operation result is still the result of performing the multiplication operation on the data with the original bit width, thereby ensuring that the multiplier can process operations on low-bit-width data and effectively reducing the area of an AI chip occupied by the multiplier; meanwhile, the method can carry out regular signed number coding processing on the data to be processed, and reduce the number of effective partial products obtained in the multiplication process, thereby reducing the complexity of multiplication and improving the operation efficiency of the multiplication.
In the multiplication method provided in another embodiment, after the step of determining whether the bit width of the data to be processed is equal to the bit width of the data that can be processed by the multiplier, the method may further include: and if so, continuing to perform regular signed number coding processing on the data to be processed to obtain the partial product of the target code.
Specifically, if the bit width of the data to be processed received by the multiplier is equal to the bit width 2N of the data that can be currently processed by the multiplier, the judgment circuit in the multiplier may input the received data to be processed to the regular signed number coding circuit, and directly perform regular signed number coding processing on the data to be processed by the regular signed number coding circuit to obtain the partial product of the target code. In this case, the multiplier does not need to perform data expansion processing on the data to be processed.
Optionally, after continuing to perform regular signed number encoding processing on the data to be processed to obtain the partial product of the target code, the method further includes: carrying out regular signed number coding processing on data to be processed to obtain target codes; and obtaining a partial product of the target code according to the data to be processed and the target code.
It should be noted that, if the bit width of the data to be processed received by the multiplier is equal to the bit width 2N of the data that can be currently processed by the multiplier, at this time, the multiplier does not need to perform data expansion processing on the data to be processed, and can directly perform regular signed number encoding processing on the received data to be processed, so as to perform subsequent processing.
In the data processing method provided by this embodiment, if the bit width of the to-be-processed data received by the multiplier is equal to the bit width of the data that can be currently processed by the multiplier, the regular signed number coding circuit may directly perform regular signed number coding processing on the to-be-processed data to obtain the partial product of the target code, and perform accumulation processing on the partial product of the target code to obtain the target operation result, the method can perform expansion processing on the received low-bit-width data, and the expanded data meets the bit width requirement of the data that can be processed by the multiplier, so that the target operation result is still the result of performing multiplication operation on the original bit-width data, thereby ensuring that the multiplier can process the operation on the low-bit-width data, and effectively reducing the area of an AI chip occupied by the multiplier; meanwhile, the method can carry out regular signed number coding processing on the data to be processed, and reduce the number of effective partial products obtained in the multiplication process, thereby reducing the complexity of multiplication and improving the operation efficiency of the multiplication.
In another embodiment of the multiplication method, the step of performing regular signed number coding processing on the extended data in the step S204 to obtain a partial product of the target code includes:
S2041, performing regular signed number coding processing on the expanded data to obtain target codes.
Specifically, the multiplier may perform regular signed number encoding processing on the expanded multiplier to be processed through a regular signed number encoding sub-circuit, so as to obtain the target code.
Optionally, the step of performing regular signed number coding processing on the extended data in S2041 to obtain the target code may include: converting l consecutive bits of value 1 in the expanded data into (l+1) bits in which the highest bit value is 1, the lowest bit value is -1, and the remaining bits are 0, to obtain the target code, where l is greater than or equal to 2.
Specifically, the regular signed number encoding process may be characterized as follows: for an N-bit multiplier, the bits are processed from the low-order end to the high-order end; if there is a run of l (l ≥ 2) consecutive bits of value 1, the run of l consecutive 1s can be converted into the (l+1)-bit data "1(0)...(0)(-1)", that is, a 1 followed by (l-1) zeros and then a -1, and the remaining (N-l) bit values are combined with the converted (l+1) bit values to obtain new data; the new data is then used as the initial data of the next stage of conversion processing, until no run of l (l ≥ 2) consecutive bits of value 1 exists in the new data obtained after the conversion processing. After the N-bit multiplier is subjected to regular signed number encoding processing, the bit width of the obtained target code can be equal to (N+1). Further, in the regular signed number encoding process, the data 11 can be converted into (100 - 001), that is, the data 11 can be equivalently converted into 10(-1); the data 111 can be converted into (1000 - 0001), that is, the data 111 can be equivalently converted into 100(-1); and so on, the conversion of other runs of l (l ≥ 2) consecutive bits of value 1 is similar.
For example, the multiplier received by the regular signed number coding sub-circuit in the multiplier is "001010101101110". The first new data obtained by performing the first-stage conversion processing on the multiplier is "0010101011100(-1)0"; the second new data obtained by continuing with the second-stage conversion processing on the first new data is "0010101100(-1)00(-1)0"; the third new data obtained by continuing with the third-stage conversion processing on the second new data is "0010110(-1)00(-1)00(-1)0"; the fourth new data obtained by continuing with the fourth-stage conversion processing on the third new data is "00110(-1)0(-1)00(-1)00(-1)0"; and the fifth new data obtained by continuing with the fifth-stage conversion processing on the fourth new data is "010(-1)0(-1)0(-1)00(-1)00(-1)0". Since no run of l (l ≥ 2) consecutive bits of value 1 exists in the fifth new data, the fifth new data may be called the initial code; after one bit-complementing (padding) operation is performed on the initial code, the regular signed number coding process is complete and the intermediate code is obtained, where the bit width of the initial code may be equal to the bit width of the multiplier. Optionally, after the regular signed number encoding sub-circuit performs regular signed number encoding processing on the multiplier to obtain new data (i.e., the initial code), if the highest bit value and the second-highest bit value of the new data are "10" or "01", the regular signed number encoding sub-circuit may pad one bit of value 0 above the highest bit of the new data, so that the upper three bit values of the corresponding intermediate code are "010" or "001", respectively. Optionally, the bit width of the intermediate code may be equal to the bit width of the data currently processed by the multiplier plus 1.
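The stage-by-stage conversion in the example above can be reproduced with a short software sketch. It is an illustration under stated assumptions rather than the sub-circuit itself: the input is treated as an unsigned bit string, the function names `csd_encode` and `digits_value` are hypothetical, and the result already includes the one padded high-order digit, so it corresponds to the (N+1)-digit intermediate code.

```python
def csd_encode(bits):
    """Recode an unsigned bit string into signed digits in {-1, 0, 1}.

    Returns the digits most significant first, with one extra high digit,
    so the result has len(bits) + 1 entries.
    """
    n = len(bits)
    # d[i] carries weight 2**i; the last entry is the spare high position.
    d = [int(b) for b in reversed(bits)] + [0]
    i = 0
    while i < n:
        if d[i] == 1 and d[i + 1] == 1:
            # Find the end of the run of consecutive 1s that starts at i.
            j = i
            while d[j] == 1:
                j += 1
            # Replace the run: -1 at the bottom, 0s inside, 1 just above it.
            d[i] = -1
            for k in range(i + 1, j):
                d[k] = 0
            d[j] = 1
            i = j  # the new 1 may itself start another run
        else:
            i += 1
    return d[::-1]


def digits_value(digits):
    """Numeric value of a most-significant-first signed-digit list."""
    value = 0
    for digit in digits:
        value = 2 * value + digit
    return value


code = csd_encode("001010101101110")
print(code)
print(digits_value(code) == int("001010101101110", 2))  # True
```

For the multiplier "001010101101110" the sketch yields a leading 0 followed by the fifth new data of the example, and only 6 of the 16 digits are non-zero, compared with 8 ones in the original multiplier.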
In addition, if the bit width of the data received by the multiplier is 2N and the data operation can be currently processed by N-bit data, the regular signed number coding sub-circuit in the multiplier can divide the 2N-bit data into two groups of N-bit data for data operation respectively, and at this time, the obtained two groups of (N +1) -bit intermediate codes are combined to be used as target codes; if the multiplier can currently process 2N-bit data operation, the regular signed number encoding sub-circuit in the multiplier can complement a bit value 0 (i.e. complement processing) at the higher bit of the highest bit value of the obtained (2N +1) -bit intermediate code, and then take the (2N +2) -bit data after complement processing as the target code.
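The two ways of forming the (2N+2)-digit target code described above can be sketched as follows. The helper `signed_digits` uses the standard non-adjacent-form recurrence as a stand-in for the regular signed number encoding sub-circuit, and both function names, as well as the unsigned treatment of the data, are assumptions made for illustration.

```python
def signed_digits(value, width):
    """Signed digits in {-1, 0, 1} of an unsigned value, most significant
    first and zero-padded on the high side to `width` digits."""
    digits = []
    while value:
        if value & 1:
            digit = 2 - (value & 3)   # 1 if value % 4 == 1, -1 if value % 4 == 3
            value -= digit
        else:
            digit = 0
        digits.append(digit)
        value >>= 1
    digits += [0] * (width - len(digits))
    return digits[::-1]


def target_code(data, n, process_2n):
    """Form a (2N+2)-digit target code from 2N-bit data."""
    if process_2n:
        # One (2N+1)-digit intermediate code plus one padded high-order 0.
        return [0] + signed_digits(data, 2 * n + 1)
    # Two N-bit groups, each giving an (N+1)-digit intermediate code.
    high = data >> n
    low = data & ((1 << n) - 1)
    return signed_digits(high, n + 1) + signed_digits(low, n + 1)


print(target_code(0x6E, 4, True))   # 8-bit data encoded as a whole
print(target_code(0x6E, 4, False))  # two 4-bit groups encoded separately
```

Both variants produce the same number of digits, which is consistent with the same partial product array being usable in either mode.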
S2042, obtaining a partial product of the target code according to the expanded data and the target code.
Specifically, the partial product obtaining sub-circuit may obtain a partial product of the target code according to the expanded multiplicand to be processed and the target code. It should be noted that, if the bit width of the to-be-processed data received by the multiplier is N and the bit width of the currently processable data is 2N, the multiplier needs to perform expansion processing on the to-be-processed N-bit data to obtain expanded 2N-bit data, and then perform regular signed number coding processing on the 2N-bit data to obtain a corresponding target code, where the number of the target codes may be equal to (2N +2), and the number of the obtained partial products corresponding to the target code may also be equal to (2N + 2).
The data processing method provided by this embodiment performs regular signed number coding processing on expanded data to obtain a code, obtains a partial product of a target code according to data to be processed and the target code, and performs accumulation processing on the partial product of the target code to obtain a target operation result, and can perform expansion processing on received low-bit-width data, where the expanded data meets a data bit width requirement processable by a multiplier, so that the target operation result is still a result of multiplication operation performed on original bit-width data, thereby ensuring that the multiplier can process operation on the low-bit-width data, and effectively reducing the area of an AI chip occupied by the multiplier; meanwhile, the method can carry out regular signed number coding processing on the data to be processed, and reduce the number of effective partial products obtained in the multiplication process, thereby reducing the complexity of multiplication and improving the operation efficiency of the multiplication.
In one embodiment, the step of obtaining the partial product of the target code according to the extended data and the target code in S2042 may specifically include:
S2042a, obtaining an original partial product according to the expanded data and the target code.
In particular, the number of original partial products may be equal to the number of target codes. Alternatively, the original partial product may be a partial product without sign bit extension. Optionally, the expanded data may be a multiplicand in a multiplication operation.
Illustratively, if the partial product obtaining sub-circuit receives an 8-bit multiplicand x7x6x5x4x3x2x1x0 (i.e., X), then the partial product obtaining sub-circuit may, according to the multiplicand X, directly obtain the original partial products corresponding to the three values -1, 1, and 0 contained in the target code: when a value in the target code is -1, the original partial product may be -X; when a value in the target code is 1, the original partial product may be X; and when a value in the target code is 0, the original partial product may be 0.
S2042b, sign bit expansion processing is carried out on the original partial product, and a partial product after sign bit expansion is obtained.
Specifically, the partial product obtaining sub-circuit may perform sign bit extension processing on the original partial product according to the sign bit value of the original partial product, so as to obtain the partial product after sign bit extension. Optionally, the bit width of the original partial product may be equal to the bit width N of the data currently processed by the multiplier, and the bit width of the partial product after sign bit extension may be equal to 2N. Optionally, the lower N-bit value in the partial product after the sign bit extension is the N-bit value in the original partial product, and the upper N-bit value in the partial product after the sign bit extension is the sign bit value in the original partial product.
S2042c, shifting the partial product after the sign bit expansion to obtain the partial product of the target code.
Specifically, each partial product of the target code may be equal to the corresponding sign-bit-extended partial product, or may be equal to some of the bit values of the corresponding sign-bit-extended partial product. The first partial product of the target code may be equal to the first sign-bit-extended partial product. Starting from the second partial product of the target code, the lowest bit value of each partial product of the target code may be located in the same column as the second-lowest bit value of the previous partial product of the target code; that is, each bit value of each sign-bit-extended partial product is shifted left by one column relative to the column of the corresponding bit value of the previous sign-bit-extended partial product. The highest bit value of each partial product of the target code is located in the same column as the highest bit value of the first partial product of the target code, and bit values that fall in columns higher than that of the highest bit value of the first partial product of the target code do not participate in the accumulation operation. Optionally, the number of columns of all the partial products of the target code may be equal to 2 times the bit width of the data currently processed by the multiplier.
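Steps S2042a to S2042c can be sketched behaviorally as follows. The sketch assumes an N-bit two's complement multiplicand and a target code given as a least-significant-first list of digits in {-1, 0, 1}; the function names are hypothetical, and integer arithmetic stands in for the selection, sign extension, and shifting hardware.

```python
def to_signed(pattern, width):
    """Interpret a `width`-bit pattern as a two's complement value."""
    pattern &= (1 << width) - 1
    return pattern - (1 << width) if pattern >> (width - 1) else pattern


def target_code_partial_products(x_pattern, digits_lsb_first, n):
    """Return one 2N-column partial product row per target code digit."""
    mask = (1 << (2 * n)) - 1
    x = to_signed(x_pattern, n)
    rows = []
    for shift, digit in enumerate(digits_lsb_first):
        original = digit * x                  # S2042a: -X, 0 or X
        extended = original & mask            # S2042b: sign extension to 2N bits
        rows.append((extended << shift) & mask)  # S2042c: shift left, drop columns above 2N
    return rows


# Multiplicand X = -3 (0xFD as 8 bits); multiplier 6 = 8 - 2 has digits 1 0 (-1) 0.
rows = target_code_partial_products(0xFD, [0, -1, 0, 1], 8)
total = sum(rows) & 0xFFFF
print([hex(r) for r in rows])  # ['0x0', '0x6', '0x0', '0xffe8']
print(to_signed(total, 16))    # -18
```

Only two of the four rows are non-zero here, which illustrates how the signed-digit code reduces the number of effective partial products that must be accumulated.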
According to the data processing method provided by the embodiment, the original partial product is obtained according to the split data and the target code, sign bit expansion processing is carried out on the original partial product to obtain the partial product after sign bit expansion, the partial product of the target code is obtained according to the partial product after sign bit expansion, and then accumulation processing is carried out on the partial products of all the target codes to obtain the target operation result; meanwhile, the number of effective partial products which can be obtained by the method is small, so that the complexity of multiplication is reduced, and the operation efficiency of the multiplication is improved.
In another embodiment of the data processing method, the step of performing accumulation processing on the partial product after sign bit extension in the step S205 to obtain a target operation result may include:
S2051, accumulating the partial product of the target code through the Wallace tree group subcircuit to obtain an intermediate operation result.
Specifically, the multiplier may accumulate all partial products after sign bit expansion by the wallace tree group sub-circuit according to a distribution rule to obtain an intermediate operation result. Optionally, the intermediate operation result may include a Sum bit output signal Sum and a Carry output signal Carry, where bit widths of the Sum bit output signal Sum and the Carry output signal Carry may be the same.
S2052, accumulating the intermediate operation result through an accumulation sub-circuit to obtain the target operation result.
Specifically, the multiplier may add the Carry output signal Carry and the Sum output signal Sum output by the wallace tree group sub-circuit by an adder in the accumulation sub-circuit, and output an addition result.
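The two-stage accumulation in S2051 and S2052 can be illustrated with a generic carry-save reduction. This is a sketch of the general Wallace-style technique (3:2 compression down to a Sum word and a Carry word, followed by one ordinary addition), not the specific Wallace tree group sub-circuit of this application; the function names are hypothetical.

```python
def carry_save_add(a, b, c, mask):
    """3:2 compressor applied to whole words: returns (sum word, carry word)."""
    sum_word = (a ^ b ^ c) & mask
    carry_word = (((a & b) | (a & c) | (b & c)) << 1) & mask
    return sum_word, carry_word


def reduce_to_sum_and_carry(rows, width):
    """Compress partial product rows until only two words remain."""
    mask = (1 << width) - 1
    rows = [r & mask for r in rows]
    while len(rows) > 2:
        full = len(rows) - len(rows) % 3
        reduced = []
        for k in range(0, full, 3):
            reduced.extend(carry_save_add(rows[k], rows[k + 1], rows[k + 2], mask))
        reduced.extend(rows[full:])       # rows left over pass to the next level
        rows = reduced
    rows += [0] * (2 - len(rows))
    return rows[0], rows[1]


def accumulate(rows, width):
    """Final stage: add the Sum and Carry words with an ordinary adder."""
    sum_word, carry_word = reduce_to_sum_and_carry(rows, width)
    return (sum_word + carry_word) & ((1 << width) - 1)


print(hex(accumulate([0x0, 0x6, 0x0, 0xFFE8], 16)))  # 0xffee
```

The Sum and Carry words have the same width, and only the last step needs a carry-propagating adder.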
Optionally, the step of performing accumulation processing on the intermediate operation result through an accumulation sub-circuit in S2052 to obtain the target operation result may specifically include: accumulating the column number values in the partial products of all target codes through a Wallace tree unit to obtain an accumulation operation result; gating the accumulation operation result through a selector to obtain a carry gating signal; and performing accumulation processing through a high-order improved Wallace tree sub-circuit according to the carry gating signal and the column number in the partial product of the target code to obtain the target operation result.
According to the data processing method provided by the embodiment, the partial product of the target code is accumulated by the Wallace tree group sub-circuit to obtain an intermediate operation result, and the intermediate operation result is accumulated by the accumulation sub-circuit to obtain a target operation result, the method can be used for expanding the received low-bit-width data, and the expanded data meets the bit width requirement of the data which can be processed by the multiplier, so that the target operation result is still the result of multiplication of the original bit-width data, the multiplier can be used for processing the operation of the low-bit-width data, and the area of an AI chip occupied by the multiplier is effectively reduced; meanwhile, the number of effective partial products which can be obtained by the method is small, so that the complexity of multiplication is reduced, and the operation efficiency of the multiplication is improved.
The embodiment of the application also provides a machine learning operation device, which comprises one or more multipliers mentioned in the application, and is used for acquiring data to be operated on and control information from other processing devices, executing a specified machine learning operation, and transmitting the execution result to peripheral equipment through an I/O interface. The peripheral equipment includes, for example, cameras, displays, mice, keyboards, network cards, wifi interfaces, and servers. When more than one multiplier is included, the multipliers can be linked and transmit data through a specific structure, for example, interconnected and transmitting data through a PCIE bus, to support larger-scale machine learning operations. In this case, the multipliers may share the same control system or have separate control systems; they may share memory, or each accelerator may have its own memory. In addition, the interconnection mode can be any interconnection topology.
The machine learning arithmetic device has high compatibility and can be connected with various types of servers through PCIE interfaces.
The embodiment of the application also provides a combined processing device which comprises the machine learning arithmetic device, the universal interconnection interface and other processing devices. The machine learning arithmetic device interacts with other processing devices to jointly complete the operation designated by the user. Fig. 10 is a schematic view of the combined processing device.
Other processing devices include one or more types of general-purpose/special-purpose processors, such as central processing units (CPUs), graphics processing units (GPUs), and neural network processors. The number of processors included in the other processing devices is not limited. The other processing devices serve as the interface between the machine learning arithmetic device and external data and control, performing data transfer and basic control of the machine learning arithmetic device such as starting and stopping; the other processing devices can also cooperate with the machine learning arithmetic device to complete computation tasks.
And the universal interconnection interface is used for transmitting data and control instructions between the machine learning arithmetic device and other processing devices. The machine learning arithmetic device obtains the required input data from other processing devices and writes the input data into a storage device on the machine learning arithmetic device; control instructions can be obtained from other processing devices and written into a control cache on a machine learning arithmetic device chip; the data in the storage module of the machine learning arithmetic device can also be read and transmitted to other processing devices.
Alternatively, as shown in fig. 11, the configuration may further include a storage device, and the storage device is connected to the machine learning arithmetic device and the other processing device, respectively. The storage device is used for storing data in the machine learning arithmetic device and the other processing device, and is particularly suitable for data which is required to be calculated and cannot be stored in the internal storage of the machine learning arithmetic device or the other processing device.
The combined processing device can be used as an SOC (system on chip) of equipment such as a mobile phone, a robot, an unmanned aerial vehicle, or video monitoring equipment, which effectively reduces the core area of the control part, increases the processing speed, and reduces the overall power consumption. In this case, the universal interconnection interface of the combined processing device is connected to certain components of the equipment, such as a camera, a display, a mouse, a keyboard, a network card, or a wifi interface.
In some embodiments, a chip is also claimed, which includes the above machine learning arithmetic device or the combined processing device.
In some embodiments, a chip package structure is provided, which includes the above chip.
In some embodiments, a board card is provided, which includes the above chip package structure. As shown in fig. 12, the board card may include other supporting components in addition to the chip 389, including but not limited to: a memory device 390, a receiving device 391, and a control device 392.
The memory device 390 is connected to the chip in the chip package structure through a bus and is used for storing data. The memory device may include a plurality of groups of memory cells 393. Each group of memory cells is connected with the chip through a bus. It is understood that each group of memory cells may be a DDR SDRAM (Double Data Rate SDRAM).
DDR can double the speed of SDRAM without increasing the clock frequency, because it allows data to be read out on both the rising and falling edges of the clock pulse; DDR is therefore twice as fast as standard SDRAM. In one embodiment, the memory device may include 4 groups of memory cells. Each group of memory cells may include a plurality of DDR4 chips. In one embodiment, the chip may internally include four 72-bit DDR4 controllers, where 64 bits of each 72-bit DDR4 controller are used for data transmission and 8 bits are used for ECC checking. It can be understood that when DDR4-3200 chips are adopted in each group of memory cells, the theoretical bandwidth of data transmission can reach 25600 MB/s.
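The 25600 MB/s figure can be reproduced with a short calculation, assuming it refers to one group of memory cells behind one controller, that DDR4-3200 performs 3200 million transfers per second, and that only the 64 data bits of the 72-bit interface carry payload. The function name is hypothetical.

```python
def ddr_bandwidth_mb_per_s(mega_transfers_per_s, data_bits):
    """Theoretical peak bandwidth in MB/s for one memory channel."""
    return mega_transfers_per_s * data_bits // 8


print(ddr_bandwidth_mb_per_s(3200, 64))  # 25600
```

The 8 ECC bits are excluded from the calculation because they carry check bits rather than payload data.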
In one embodiment, each group of the memory cells includes a plurality of double rate synchronous dynamic random access memories arranged in parallel. DDR can transfer data twice in one clock cycle. And a controller for controlling DDR is arranged in the chip and is used for controlling data transmission and data storage of each memory unit.
The receiving device is electrically connected with the chip in the chip packaging structure. The receiving device is used for realizing data transmission between the chip and an external device (such as a server or a computer). For example, in one embodiment, the receiving device may be a standard PCIE interface, and the data to be processed is transmitted from the server to the chip through the standard PCIE interface, so as to implement data transfer. Preferably, when a PCIE 3.0 x16 interface is adopted for transmission, the theoretical bandwidth can reach 16000 MB/s. In another embodiment, the receiving device may also be another interface; the present application does not limit the specific form of the other interface, as long as the interface unit can implement the transfer function. In addition, the calculation result of the chip is transmitted back to the external device (e.g., a server) by the receiving device.
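As a rough check of the 16000 MB/s figure, the raw per-direction rate of a PCIE 3.0 x16 link can be computed from 8 GT/s per lane over 16 lanes. The sketch below also shows the slightly lower payload rate once the 128b/130b line encoding of PCIE 3.0 is taken into account; the function name and parameters are chosen for illustration only.

```python
def pcie_bandwidth_mb_per_s(mega_transfers_per_s, lanes, payload_bits=1, coded_bits=1):
    """Per-direction bandwidth in MB/s; the ratio models line-encoding overhead."""
    raw = mega_transfers_per_s * lanes / 8      # one bit per transfer per lane
    return raw * payload_bits / coded_bits


print(pcie_bandwidth_mb_per_s(8000, 16))                  # 16000.0 (raw)
print(round(pcie_bandwidth_mb_per_s(8000, 16, 128, 130))) # 15754 (after 128b/130b)
```

The figure quoted in the text therefore corresponds to the raw signalling rate, with protocol overhead reducing the usable bandwidth further in practice.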
The control device is electrically connected with the chip. The control device is used for monitoring the state of the chip. Specifically, the chip and the control device may be electrically connected through an SPI interface. The control device may include a microcontroller unit (MCU). The chip may include a plurality of processing chips, a plurality of processing cores, or a plurality of processing circuits, and may drive a plurality of loads. Therefore, the chip can be in different working states such as heavy load and light load. The control device can regulate the working states of the plurality of processing chips, the plurality of processing cores, and/or the plurality of processing circuits in the chip.
In some embodiments, an electronic device is provided that includes the above board card.
The electronic device may be a multiplier, a robot, a computer, a printer, a scanner, a tablet, a smart terminal, a cell phone, a driving recorder, a navigator, a sensor, a webcam, a server, a cloud server, a camera, a camcorder, a projector, a watch, a headset, a mobile storage, a wearable device, a vehicle, a household appliance, and/or a medical device.
The vehicle comprises an airplane, a ship and/or a vehicle; the household appliances comprise a television, an air conditioner, a microwave oven, a refrigerator, an electric cooker, a humidifier, a washing machine, an electric lamp, a gas stove and a range hood; the medical equipment comprises a nuclear magnetic resonance apparatus, a B-ultrasonic apparatus and/or an electrocardiograph.
It should be noted that, for simplicity of description, the foregoing method embodiments are described as a series of circuit combinations, but those skilled in the art should understand that the present application is not limited by the described circuit combinations, because some circuits may be implemented in other ways or structures according to the present application. Further, those skilled in the art should also appreciate that the embodiments described in the specification are all alternative embodiments, and that the devices and modules referred to are not necessarily required for this application.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
The above-mentioned embodiments express only several embodiments of the present application. Their description is relatively specific and detailed, but should not be construed as limiting the scope of the present application. It should be noted that a person skilled in the art can make several variations and modifications without departing from the concept of the present application, all of which fall within the scope of protection of the present application. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims (10)

1. A multiplier, characterized in that it comprises: a judging circuit, a data expansion circuit, a regular signed number coding circuit, and a compression circuit; the output end of the judging circuit is connected with the input end of the data expansion circuit and the first input end of the regular signed number coding circuit, the output end of the data expansion circuit is connected with the second input end of the regular signed number coding circuit, and the output end of the regular signed number coding circuit is connected with the first input end of the compression circuit; the regular signed number coding circuit comprises a third input end for receiving a function selection mode signal; the compression circuit comprises a second input end for receiving the function selection mode signal;
the judging circuit is used for judging whether the received data needs to be processed through the data expansion circuit connected with the output end of the judging circuit, the data expansion circuit is used for carrying out expansion processing on the received data, the regular signed number coding circuit is used for carrying out regular signed number coding processing on the received data to obtain a partial product of a target code, and the compression circuit is used for carrying out accumulation processing on the partial product of the target code.
2. The multiplier of claim 1, wherein the judging circuit comprises: a first data input port and a first data output port; the first data input port is used for receiving data for multiplication processing, and the first data output port is used for outputting the received data.
3. The multiplier of claim 1 or 2, wherein the data expansion circuit comprises: a second data input port, an expansion mode selection signal input port, a function selection mode signal output port, and a second data output port; the second data input port is configured to receive the data output by the judging circuit, the expansion mode selection signal input port is configured to receive a data expansion mode selection signal corresponding to the expansion processing performed on the received data, the function selection mode signal output port is configured to output the function selection mode signal determined according to the mode in which the data expansion circuit performs expansion processing on the received data, and the second data output port is configured to output the data after the expansion processing.
4. The multiplier of claim 1, wherein the regular signed number coding circuit comprises: a regular signed number coding sub-circuit and a partial product obtaining sub-circuit, wherein the output end of the regular signed number coding sub-circuit is connected with the first input end of the partial product obtaining sub-circuit;
the regular signed number coding sub-circuit is used for carrying out regular signed number coding processing on the received data to obtain a target code, and the partial product obtaining sub-circuit is used for obtaining a partial product of the target code according to the target code.
5. The multiplier of claim 4, wherein the regular signed number coding sub-circuit comprises: a third data input port and a coding output port; the third data input port is used for receiving first data to be subjected to regular signed number coding processing, and the coding output port is used for outputting the target code obtained after the received first data is subjected to the regular signed number coding processing.
6. The multiplier of claim 4 or 5, wherein the partial product obtaining sub-circuit comprises: a low-order partial product obtaining unit, a low-order selector set unit, a high-order partial product obtaining unit, and a high-order selector set unit; a first output end of the regular signed number coding sub-circuit is connected with a first input end of the low-order partial product obtaining unit, an output end of the low-order selector set unit is connected with a second input end of the low-order partial product obtaining unit, a second output end of the regular signed number coding sub-circuit is connected with a first input end of the high-order partial product obtaining unit, and an output end of the high-order selector set unit is connected with a second input end of the high-order partial product obtaining unit;
the low-order partial product obtaining unit is configured to obtain a sign-extended low-order partial product according to a low-order target code in the received target code and second data, and to obtain a low-order partial product of the target code according to the sign-extended low-order partial product; the low-order selector set unit is configured to gate the values in the sign-extended low-order partial product; the high-order partial product obtaining unit is configured to obtain a sign-extended high-order partial product according to a high-order target code in the received target code and the second data, and to obtain a high-order partial product of the target code according to the sign-extended high-order partial product; and the high-order selector set unit is configured to gate the values in the sign-extended high-order partial product.
7. The multiplier of claim 1, wherein the compression circuit comprises: a Wallace tree group sub-circuit and an accumulation sub-circuit; the output end of the Wallace tree group sub-circuit is connected with the input end of the accumulation sub-circuit; the Wallace tree group sub-circuit is used for accumulating the partial products of the target codes to obtain an accumulation operation result, and the accumulation sub-circuit is used for accumulating the accumulation operation result to obtain a target operation result.
8. The multiplier of claim 7, wherein the Wallace tree group sub-circuit comprises: a low-order Wallace tree unit, a selector, and a high-order Wallace tree unit, wherein the output end of the low-order Wallace tree unit is connected with the input end of the selector, and the output end of the selector is connected with the input end of the high-order Wallace tree unit; the low-order Wallace tree unit is used for performing accumulation operation on each column value in the partial product of the target code, the selector is used for gating a carry input signal received by the high-order Wallace tree unit, and the high-order Wallace tree unit is used for performing accumulation operation on each column value in the partial product of the target code.
9. The multiplier of claim 7 or 8, wherein the accumulation sub-circuit comprises: an adder for adding the accumulation operation result.
10. The multiplier of claim 9, wherein the adder comprises: a carry signal input port, a sum signal input port and an operation result output port; the carry signal input port is used for receiving a carry signal, the sum signal input port is used for receiving a sum signal, and the operation result output port is used for outputting a target operation result obtained by accumulating the carry signal and the sum signal.
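The data path recited in claims 4 to 10 can be sketched in software as follows. This is only an illustrative model under stated assumptions: it takes "regular signed number coding" to mean canonical signed digit (non-adjacent form) recoding, which the claims themselves do not spell out, and it models the Wallace tree as row-wise 3:2 carry-save compression rather than the low-order and high-order trees with a carry selector of claim 8. All function names (`csd_encode`, `partial_products`, `compress_3_2`, `wallace_reduce`, `multiply`) are hypothetical.

```python
def csd_encode(value):
    """Recode an integer into digits in {-1, 0, 1}, least significant first.

    Assumption: non-adjacent form, so no two neighbouring digits are non-zero.
    """
    digits = []
    n = value
    while n != 0:
        if n & 1:
            d = 2 - (n & 3)  # n mod 4 == 1 gives +1, n mod 4 == 3 gives -1
            n -= d
        else:
            d = 0
        digits.append(d)
        n >>= 1
    return digits


def partial_products(multiplicand, multiplier, out_bits):
    """One sign-extended (two's complement) row per non-zero code digit."""
    mask = (1 << out_bits) - 1
    return [((d * multiplicand) << i) & mask
            for i, d in enumerate(csd_encode(multiplier)) if d != 0]


def compress_3_2(a, b, c, mask):
    """Bitwise full adders: three rows in, a sum row and a carry row out."""
    s = a ^ b ^ c
    carry = (((a & b) | (a & c) | (b & c)) << 1) & mask
    return s, carry


def wallace_reduce(rows, out_bits):
    """Compress the rows until only a sum row and a carry row remain."""
    mask = (1 << out_bits) - 1
    rows = list(rows)
    while len(rows) > 2:
        full = len(rows) - len(rows) % 3
        nxt = []
        for k in range(0, full, 3):
            nxt.extend(compress_3_2(rows[k], rows[k + 1], rows[k + 2], mask))
        nxt.extend(rows[full:])  # leftover rows pass through unchanged
        rows = nxt
    rows += [0] * (2 - len(rows))
    return rows[0], rows[1]


def multiply(a, b, n_bits=16):
    """Signed n_bits x n_bits multiply; the final adder adds sum and carry."""
    out_bits = 2 * n_bits
    s, c = wallace_reduce(partial_products(a, b, out_bits), out_bits)
    result = (s + c) & ((1 << out_bits) - 1)
    if result >> (out_bits - 1):  # reinterpret the top bit as the sign
        result -= 1 << out_bits
    return result
```

For a multiplier value of 7, `csd_encode(7)` yields the digits -1, 0, 0, 1 (that is, 8 - 1), so only two non-zero partial products are accumulated instead of the three required by the plain binary form 0111. This is the sense in which the recoding keeps the number of effective partial products small.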
CN201921433536.7U 2019-08-30 2019-08-30 Multiplier and method for generating a digital signal Active CN209879493U (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201921433536.7U CN209879493U (en) 2019-08-30 2019-08-30 Multiplier and method for generating a digital signal

Publications (1)

Publication Number Publication Date
CN209879493U true CN209879493U (en) 2019-12-31

Family

ID=68949528

Country Status (1)

Country Link
CN (1) CN209879493U (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110362293A (en) * 2019-08-30 2019-10-22 上海寒武纪信息科技有限公司 Multiplier, data processing method, chip and electronic equipment
CN110362293B (en) * 2019-08-30 2023-12-19 上海寒武纪信息科技有限公司 Multiplier, data processing method, chip and electronic equipment

Legal Events

Date Code Title Description
GR01 Patent grant