CN209879492U - Multiplier, machine learning arithmetic device and combination processing device - Google Patents


Info

Publication number
CN209879492U
Authority
CN
China
Prior art keywords
partial product
circuit
bit
multiplier
target code
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201921434164.XU
Other languages
Chinese (zh)
Inventor
Inventor not disclosed
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Cambricon Information Technology Co Ltd
Original Assignee
Shanghai Cambricon Information Technology Co Ltd
Application filed by Shanghai Cambricon Information Technology Co Ltd
Priority to CN201921434164.XU
Application granted
Publication of CN209879492U
Legal status: Active
Anticipated expiration


Landscapes

  • Complex Calculations (AREA)

Abstract

The application provides a multiplier, a machine learning arithmetic device and a combined processing device. The multiplier includes a regular signed number encoding circuit that performs regular signed number encoding on the received data. Because this encoding yields few effective partial products, the complexity of the multiplication performed by the multiplier is reduced.

Description

Multiplier, machine learning arithmetic device and combination processing device
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a multiplier, a machine learning arithmetic device, and a combination processing device.
Background
With the continuous development of digital electronic technology, the rapid growth of Artificial Intelligence (AI) chips places ever higher demands on high-performance digital multipliers. Neural network algorithms are among the algorithms most widely used on intelligent chips, and multiplication performed by a multiplier is one of their most common operations.
At present, a multiplier encodes the multiplier operand in groups of three bits, obtains partial products from the multiplicand according to these codes, and compresses all the partial products with a Wallace tree to obtain the multiplication result. In this conventional technique, however, the codes contain many non-zero values, so many partial products are generated and the multiplier is complex to implement.
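The conventional scheme described above, which reads the multiplier three bits at a time, looks like radix-4 (modified) Booth recoding. The sketch below assumes that reading and is a minimal software model, not the patent's circuit. The helper names `booth_radix4_digits` and `to_signed` are hypothetical, and the multiplier is assumed to be an even-width two's complement value. Each non-zero digit corresponds to one partial product that the Wallace tree must compress.

```python
def booth_radix4_digits(value, n_bits):
    """Radix-4 (modified) Booth recoding of an n_bits-wide two's complement value.

    Digit i is read from the overlapping window (b[2i+1], b[2i], b[2i-1]),
    with b[-1] = 0, and equals -2*b[2i+1] + b[2i] + b[2i-1]. It therefore lies
    in {-2, -1, 0, 1, 2} and carries weight 4**i.
    """
    if n_bits % 2:
        raise ValueError("this sketch assumes an even bit width")
    bits = [(value >> k) & 1 for k in range(n_bits)]  # LSB first
    digits = []
    for i in range(n_bits // 2):
        prev = bits[2 * i - 1] if i > 0 else 0
        digits.append(-2 * bits[2 * i + 1] + bits[2 * i] + prev)
    return digits


def to_signed(value, n_bits):
    """Interpret the low n_bits of value as a two's complement integer."""
    value &= (1 << n_bits) - 1
    return value - (1 << n_bits) if value >> (n_bits - 1) else value


x = 0b01101110  # hypothetical 8-bit multiplier
d = booth_radix4_digits(x, 8)
print("digits (LSB first):", d)
print("non-zero digits:", sum(1 for v in d if v != 0))
print("decoded:", sum(v * 4**i for i, v in enumerate(d)), "expected:", to_signed(x, 8))
```

Every non-zero Booth digit still costs a partial product, which is the cost the utility model sets out to reduce.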
SUMMARY OF THE UTILITY MODEL
In view of the above, it is necessary to provide a multiplier that reduces the number of effective partial products obtained during multiplication, thereby reducing the complexity of the multiplication performed by the multiplier.
An embodiment of the present application provides a multiplier that includes a regular signed number encoding circuit, a correction partial product acquisition circuit, a Wallace tree group circuit and an accumulation circuit. The output of the regular signed number encoding circuit is connected to the input of the correction partial product acquisition circuit, the output of the correction partial product acquisition circuit is connected to the input of the Wallace tree group circuit, and the output of the Wallace tree group circuit is connected to the input of the accumulation circuit. The correction partial product acquisition circuit includes a partial product acquisition branch and a target-code partial product acquisition branch.
The regular signed number encoding circuit performs regular signed number encoding on the received data to obtain a target code. The correction partial product acquisition circuit obtains sign-extended partial products and correction values from the target code, and its target-code partial product acquisition branch obtains the partial products of the target code from the sign-extended partial products. The Wallace tree group circuit accumulates the partial products of the target code together with the correction values, and the accumulation circuit performs an addition on the accumulation result.
In one embodiment, the regular signed number encoding circuit includes a data input port and a target code output port. The data input port receives the first data to be encoded, and the target code output port outputs the target code obtained by regular signed number encoding of that first data.
In one embodiment, the correction partial product acquisition circuit includes a target code input port, a data input port and a partial product output port. The target code input port receives the target code, and the data input port receives second data. The partial product output port outputs the partial products of the target code obtained from the second data and the target code.
In one embodiment, the Wallace tree group circuit includes a correction value input port, a partial product input port, a sum signal output port and a carry signal output port. The correction value input port receives the correction values obtained by the correction partial product acquisition circuit, and the partial product input port receives the sign-extended partial products obtained by the same circuit. The sum signal output port and the carry signal output port output the sum signal and the carry signal produced by the Wallace tree group circuit.
In one embodiment, the Wallace tree group circuit includes Wallace tree sub-circuits, which accumulate the values in each column of the partial products of all target codes to obtain an accumulation result.
In one embodiment, the accumulation circuit includes an adder. The adder adds the two output signals of the Wallace tree group circuit together with the correction value to obtain the target operation result.
In one embodiment, the adder includes a carry signal input port, a sum signal input port, a carry-in port and a result output port. The carry signal input port receives the carry signal, and the sum signal input port receives the sum signal. The carry-in port receives the correction value obtained by the correction partial product acquisition circuit and places it in the lowest bit of the carry input signal. The result output port outputs the result of adding the carry signal, the sum signal and the correction value.
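The sum/carry interface above can be modeled behaviorally with 3:2 carry-save compressors, the building block of a Wallace tree. The sketch below illustrates the idea under several assumptions and is not the patent's circuit:

- It reduces rows one compressor at a time rather than in parallel layers.
- The names `csa`, `reduce_to_sum_and_carry` and `final_add` are hypothetical.

The sketch shows why a 1-bit correction value can be placed in the lowest bit of the carry input signal. The carry row is shifted left by one before the final addition, so its bit 0 is always free.

```python
def csa(a, b, c, mask):
    """One layer of full adders (a 3:2 compressor) applied bitwise to three rows.

    Returns (sum_bits, carry_bits). carry_bits is NOT shifted yet: bit k of it
    has weight 2**(k+1). By the full-adder identity, a + b + c == s + 2*carry.
    """
    s = (a ^ b ^ c) & mask
    carry = ((a & b) | (a & c) | (b & c)) & mask
    return s, carry


def reduce_to_sum_and_carry(rows, width):
    """Compress any number of rows into a sum row and an unshifted carry row."""
    mask = (1 << width) - 1
    rows = [r & mask for r in rows]
    while len(rows) < 3:
        rows.append(0)  # pad with empty rows so one final compressor always runs
    while len(rows) > 3:
        s, cy = csa(rows.pop(), rows.pop(), rows.pop(), mask)
        rows += [s, (cy << 1) & mask]  # intermediate carries are shifted into place
    return csa(rows[0], rows[1], rows[2], mask)


def final_add(sum_row, carry_row, carry_in, width):
    """Carry-propagate adder. Shifting the carry row left frees bit 0, which
    takes the 1-bit carry_in (here modeling a correction bit of weight 1)."""
    mask = (1 << width) - 1
    carry_vector = ((carry_row << 1) | (carry_in & 1)) & mask
    return (sum_row + carry_vector) & mask


rows = [0x3C, 0x0F, 0xA5, 0x11, 0x7E]  # hypothetical partial-product rows
s, c = reduce_to_sum_and_carry(rows, 16)
print(final_add(s, c, 1, 16), (sum(rows) + 1) & 0xFFFF)  # the two values agree
```

All arithmetic is modulo 2**width, matching a fixed-width datapath.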
The multiplier provided by this embodiment includes a regular signed number encoding circuit, a correction partial product acquisition circuit, a Wallace tree group circuit and an accumulation circuit. These are connected in that order, with the output of each feeding the input of the next. Because the regular signed number encoding circuit encodes the received data in regular signed number form, few effective partial products are obtained, which reduces the complexity of the multiplication performed by the multiplier.
The machine learning arithmetic device provided by an embodiment of the application includes one or more of the above multipliers. It acquires the data to be operated on and control information from other processing devices, performs the specified machine learning operations, and transmits the execution results to other processing devices through an I/O interface;
when the machine learning arithmetic device includes a plurality of multipliers, the multipliers are connected through a preset specific structure and transmit data between them;
the multipliers are interconnected through a PCIe bus and transmit data to support larger-scale machine learning operations. They may share one control system or each have their own, and may share a memory or each have their own. They may be interconnected in any topology.
The combined processing device provided by an embodiment of the application includes the above machine learning arithmetic device, a universal interconnection interface and other processing devices. The machine learning arithmetic device interacts with the other processing devices to jointly complete the operation specified by the user. The combined processing device may further include a storage device, connected to both the machine learning arithmetic device and the other processing devices, that stores their data.
The neural network chip provided by the embodiment of the application comprises the multiplier, the machine learning arithmetic device or the combined processing device.
The neural network chip packaging structure provided by the embodiment of the application comprises the neural network chip.
The board card provided by the embodiment of the application comprises the neural network chip packaging structure.
The embodiment of the application provides an electronic device, which comprises the neural network chip or the board card.
The chip provided by the embodiment of the application comprises at least one multiplier as described in any one of the above.
An electronic device provided by the embodiment of the application comprises the chip.
Drawings
Fig. 1 is a schematic structural diagram of a multiplier according to an embodiment;
Fig. 2 is a schematic diagram of another multiplier according to another embodiment;
Fig. 3 is a schematic diagram illustrating the distribution rule of the partial products of all target codes obtained in an 8-bit data multiplication according to another embodiment;
Fig. 4 is a schematic diagram of a specific structure of a multiplier according to an embodiment;
Fig. 5 is a schematic diagram illustrating the distribution rule of the partial products of all target codes obtained in an 8-bit data multiplication according to another embodiment;
Fig. 6 is a schematic diagram of the connection structure of a Wallace tree sub-circuit for an 8-bit data multiplication according to another embodiment;
Fig. 7 is a schematic diagram of another specific structure of a multiplier according to another embodiment;
Fig. 8 is a schematic diagram illustrating the distribution rule of the partial products of all target codes obtained in another 8-bit data multiplication according to another embodiment;
Fig. 9 is a schematic diagram of a modified Wallace tree sub-circuit for an 8-bit data multiplication according to another embodiment;
Fig. 10 is a schematic diagram of another modified Wallace tree sub-circuit for an 8-bit data multiplication according to another embodiment;
Fig. 11 is a flowchart of a data processing method according to an embodiment;
Fig. 12 is a flowchart of another data processing method according to another embodiment;
Fig. 13 is a block diagram of a combined processing device according to an embodiment;
Fig. 14 is a block diagram of another combined processing device according to an embodiment;
Fig. 15 is a schematic structural diagram of a board card according to an embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
The multiplier provided by the application can be applied to an AI chip, a Field-Programmable Gate Array (FPGA) chip or other hardware circuit devices that perform multiplication. Its structure is shown schematically in Fig. 1 and Fig. 2.
Fig. 1 is a schematic diagram of a specific structure of a multiplier according to an embodiment. As shown in Fig. 1, the multiplier includes a regular signed number encoding circuit 11, a correction partial product acquisition circuit 12, a Wallace tree group circuit 13 and an accumulation circuit 14. The output of the regular signed number encoding circuit 11 is connected to the input of the correction partial product acquisition circuit 12, whose output is connected to the input of the Wallace tree group circuit 13, whose output is in turn connected to the input of the accumulation circuit 14. The correction partial product acquisition circuit 12 includes a partial product acquisition branch 12a and a target-code partial product acquisition branch 12b.
The regular signed number encoding circuit 11 performs regular signed number encoding on the received data to obtain a target code. The partial product acquisition branch 12a obtains the sign-extended partial products and the correction values from the target code, and the target-code partial product acquisition branch 12b obtains the partial products of the target code from the sign-extended partial products. The Wallace tree group circuit 13 accumulates the partial products of the target code together with the correction values, and the accumulation circuit 14 adds the accumulation result and the correction values.
Specifically, the regular signed number encoding circuit 11 may receive the multiplier of the multiplication operation and perform regular signed number encoding on it to obtain the target code. Regular signed number encoding represents data using the digit values 0, -1 and 1. Optionally, the multiplier may be a fixed-point number.
The regular signed number encoding can be carried out as follows:

- The N-bit multiplier is processed from the low-order bits to the high-order bits.
- If it contains a run of l consecutive 1 bits (l ≥ 2), those l bits are converted into the (l+1)-digit string "1 0…0 (-1)", that is, a 1 followed by l-1 zeros and a -1.
- The converted l+1 digits are combined with the remaining N-l bits to form new data, which becomes the input of the next conversion stage.
- This repeats until the new data contains no run of l ≥ 2 consecutive 1 bits.

The target code obtained from an N-bit multiplier may be N+1 digits wide. The conversion relies on identities such as 11 = 100 - 001, so that 11 = 10(-1), and 111 = 1000 - 0001, so that 111 = 100(-1). Runs of other lengths l ≥ 2 are converted in the same way.
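The stage-by-stage conversion can be sketched in Python as follows. This is a software model under stated assumptions, not the hardware encoder:

- The input is an unsigned binary string.
- Digits are stored least significant first.
- One 0 is appended on top so the result has N+1 digits.
- The function names are hypothetical.

The loop always rewrites the lowest qualifying run. The finished code has no two adjacent non-zero digits, which is the property usually associated with canonical signed digit (CSD), or non-adjacent form, recoding.

```python
def convert_lowest_run(digits):
    """Perform one conversion stage in place on a digit list stored LSB first.

    Finds the lowest run of l >= 2 consecutive 1s at positions a..b and rewrites
    it as -1 at a, 0 at a+1..b and +1 at b+1, using
    2**a + ... + 2**b == 2**(b+1) - 2**a. Returns False if no such run exists.
    """
    n = len(digits)
    i = 0
    while i < n:
        if digits[i] != 1:
            i += 1
            continue
        j = i
        while j + 1 < n and digits[j + 1] == 1:
            j += 1
        if j > i:  # run length l = j - i + 1 >= 2
            if j + 1 >= n:
                raise ValueError("run reaches the top digit; pad with a leading 0")
            digits[i] = -1
            for k in range(i + 1, j + 1):
                digits[k] = 0
            digits[j + 1] += 1  # adds 2**(b+1); this position was 0
            return True
        i = j + 1  # isolated 1: keep scanning above it
    return False


def regular_signed_encode(bit_string):
    """Encode an unsigned binary string (MSB first) into N+1 digits, LSB first."""
    digits = [int(ch) for ch in reversed(bit_string)]
    digits.append(0)  # one extra position so the code is N+1 digits wide
    while convert_lowest_run(digits):
        pass
    return digits


def to_text(digits):
    """Render LSB-first digits MSB first, writing -1 as (-1) like the text."""
    return "".join("(-1)" if d == -1 else str(d) for d in reversed(digits))


code = regular_signed_encode("001010101101110")
print(to_text(code))  # 0010(-1)0(-1)0(-1)00(-1)00(-1)0
print(sum(d * 2**k for k, d in enumerate(code)), int("001010101101110", 2))
```

The second line printed shows that the encoded digits and the original binary string have the same value.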
For example, suppose the multiplier received by the regular signed number encoding circuit 11 is 001010101101110. The conversion stages produce:

1. 0010101011100(-1)0
2. 0010101100(-1)00(-1)0
3. 0010110(-1)00(-1)00(-1)0
4. 00110(-1)0(-1)00(-1)00(-1)0
5. 010(-1)0(-1)0(-1)00(-1)00(-1)0

The fifth result may be called the intermediate code. Padding it with one bit completes the regular signed number encoding. Optionally, if the highest and second-highest digits of the intermediate code are "10" or "01", circuit 11 places a 0 above the highest digit, so that the three highest digits of the target code are "010" or "001", respectively. Optionally, the bit width of the intermediate code may equal the bit width of the target code minus 1.
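To check the worked example, the sketch below applies one conversion stage at a time to 001010101101110 and prints each intermediate result. The five stages should match the strings listed above. This is a hypothetical Python model, and the helper names are invented here. The final bit-padding step simply prepends one 0, which is the case the text describes for an intermediate code whose top two digits are "01".

```python
def convert_lowest_run(digits):
    """One stage: rewrite the lowest run of >= 2 ones (LSB-first) as 1 0..0 (-1)."""
    n = len(digits)
    for i in range(n):
        j = i
        while digits[i] == 1 and j + 1 < n and digits[j + 1] == 1:
            j += 1
        if j > i:
            if j + 1 >= n:
                raise ValueError("run reaches the top digit; pad with a leading 0")
            digits[i:j + 1] = [-1] + [0] * (j - i)  # -2**a, then zeros
            digits[j + 1] += 1                      # +2**(b+1)
            return True
    return False


def to_text(digits):
    """Render LSB-first digits MSB first, writing -1 as (-1)."""
    return "".join("(-1)" if d == -1 else str(d) for d in reversed(digits))


def trace_stages(bit_string):
    """Run one conversion stage at a time, recording each result (same width as input)."""
    digits = [int(ch) for ch in reversed(bit_string)]
    stages = []
    while convert_lowest_run(digits):
        stages.append(to_text(digits))
    return stages


stages = trace_stages("001010101101110")
for k, s in enumerate(stages, start=1):
    print(f"stage {k}: {s}")
intermediate = stages[-1]      # no run of two or more 1s remains
target = "0" + intermediate    # bit padding: one 0 above the top digit
print("target code:", target)
```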
In addition, the data received by the multiplier may be 2N bits wide. If the multiplier currently operates on N-bit data, the regular signed number encoding circuit 11 can split the 2N-bit data into two groups of N-bit data and encode each group separately. The two resulting (N+1)-digit intermediate codes are then combined to form the target code. If the multiplier currently processes 2N-bit data, circuit 11 may place a 0 above the highest digit of the resulting (2N+1)-digit intermediate code (bit padding) and use the padded (2N+2)-digit data as the target code.
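The two operating modes can be modeled as follows. The sketch makes two hypothetical assumptions:

- In the N-bit mode, the two (N+1)-digit codes are simply concatenated with the low half first.
- In the 2N-bit mode, the (2N+1)-digit code is padded with a single 0 on top.

`encode_packed` and `encode_full` are invented names, not the patent's terminology.

```python
def convert_lowest_run(digits):
    """One stage: rewrite the lowest run of >= 2 ones (LSB-first) as 1 0..0 (-1)."""
    n = len(digits)
    for i in range(n):
        j = i
        while digits[i] == 1 and j + 1 < n and digits[j + 1] == 1:
            j += 1
        if j > i:
            if j + 1 >= n:
                raise ValueError("run reaches the top digit; pad with a leading 0")
            digits[i:j + 1] = [-1] + [0] * (j - i)
            digits[j + 1] += 1
            return True
    return False


def encode(value, n):
    """Unsigned n-bit value -> n+1 signed digits, LSB first."""
    digits = [(value >> k) & 1 for k in range(n)] + [0]
    while convert_lowest_run(digits):
        pass
    return digits


def decode(digits):
    """Value of an LSB-first signed-digit list."""
    return sum(d * (1 << k) for k, d in enumerate(digits))


def encode_packed(value, n):
    """N-bit mode: split a 2N-bit word into two N-bit multipliers and encode each."""
    mask = (1 << n) - 1
    low, high = value & mask, (value >> n) & mask
    return encode(low, n) + encode(high, n)  # 2N+2 digits, low half first


def encode_full(value, n):
    """2N-bit mode: one (2N+1)-digit code, padded with a single 0 on top."""
    return encode(value, 2 * n) + [0]


word = 0b10110110  # hypothetical 8-bit input, N = 4
print(encode_packed(word, 4))
print(encode_full(word, 4))
```

Both modes yield 2N+2 digits, so a shared datapath width is plausible, though the patent does not spell out the packing order.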
The bit width of the target code may equal the bit width N of the received multiplier plus 1, which is also the number of original partial products. The correction partial product acquisition circuit 12 obtains a sign-extended partial product and a corresponding correction value for each digit of the target code. Optionally, the bit width of a sign-extended partial product may be twice the bit width of the data currently processed by the multiplier. Optionally, the partial products of all target codes may include the bit values of the sign-extended partial products.
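One common way to realize a sign-extended partial product together with a correction value is the two's complement identity -x = ~x + 1. The sketch below assumes that reading. For a digit -1 at column i, it emits the bitwise inverse of the sign-extended multiplicand shifted by i, plus a correction of a single 1 at column i. As stated above, the multiplicand is treated as an N-bit two's complement number and rows are 2N bits wide. The function names are hypothetical, and how this maps onto the exact correction value used in the patent is an assumption.

```python
def sign_extend(value, n, width):
    """Treat the low n bits of value as two's complement and widen to width bits."""
    value &= (1 << n) - 1
    if value >> (n - 1):  # sign bit set
        value -= 1 << n
    return value & ((1 << width) - 1)


def partial_product_with_correction(digit, multiplicand, n, i):
    """Return (sign-extended partial product row, correction) for a digit at column i.

    Rows are 2n bits wide. For digit -1 the row is the bitwise inverse of the
    sign-extended multiplicand, shifted by i. Because ~x == -x - 1, adding a
    single 1 at column i (the correction) completes the negation -x * 2**i.
    """
    width = 2 * n
    mask = (1 << width) - 1
    b = sign_extend(multiplicand, n, width)
    if digit == 0:
        return 0, 0
    if digit == 1:
        return (b << i) & mask, 0
    if digit == -1:
        return ((~b & mask) << i) & mask, 1 << i
    raise ValueError("target-code digits must be -1, 0 or 1")


row, corr = partial_product_with_correction(-1, 0b00000101, 8, 3)  # B = 5, digit -1 at column 3
print(format(row, "016b"), corr)
```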
The data flow through the multiplier is as follows:

- The regular signed number encoding circuit 11 encodes the received multiplier into the target code.
- The correction partial product acquisition circuit 12 obtains the sign-extended partial products and the correction values from the target code and the received multiplicand. It then obtains the partial products of the target code from the sign-extended partial products.
- The Wallace tree group circuit 13 accumulates the values in each column of those partial products to obtain an accumulation result.
- The accumulation circuit 14 adds up that result to obtain the target result of the multiplication.

During the multiplication, the correction values can serve as carry-in signals of the Wallace tree group circuit 13 and of the accumulation circuit 14. Optionally, a digit of the target code may be a positive signal (value 1 or 0) or a negative signal (value -1). If the digit is -1, the correction value is -1; otherwise it is 0. Optionally, the multiplier and the multiplicand may be fixed-point numbers of the same bit width, so the multiplier of this embodiment performs fixed-width multiplication.
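Putting the pieces together gives a behavioral model of the whole multiplication. In the sketch below, a plain integer sum stands in for the Wallace tree group circuit and the accumulation circuit. The multiplier is assumed to be an unsigned N-bit value and the multiplicand an N-bit two's complement value, so the 2N-bit result is exact. The names are hypothetical. The correction is modeled as a +1 at the digit's column, completing the two's complement negation, which is an assumption about the circuit's correction value.

```python
def convert_lowest_run(digits):
    """One stage: rewrite the lowest run of >= 2 ones (LSB-first) as 1 0..0 (-1)."""
    n = len(digits)
    for i in range(n):
        j = i
        while digits[i] == 1 and j + 1 < n and digits[j + 1] == 1:
            j += 1
        if j > i:
            if j + 1 >= n:
                raise ValueError("run reaches the top digit; pad with a leading 0")
            digits[i:j + 1] = [-1] + [0] * (j - i)
            digits[j + 1] += 1
            return True
    return False


def encode(a, n):
    """Unsigned n-bit multiplier -> n+1 target-code digits, LSB first."""
    digits = [(a >> k) & 1 for k in range(n)] + [0]
    while convert_lowest_run(digits):
        pass
    return digits


def multiply(a, b, n):
    """a: unsigned n-bit multiplier; b: n-bit two's complement multiplicand.

    Returns (signed 2n-bit product, number of non-zero partial-product rows).
    """
    width = 2 * n
    mask = (1 << width) - 1
    b_signed = b - (1 << n) if (b >> (n - 1)) & 1 else b
    b_ext = b_signed & mask                   # sign extension to 2n bits
    rows, correction = [], 0
    for i, d in enumerate(encode(a, n)):
        if d == 1:
            rows.append((b_ext << i) & mask)
        elif d == -1:
            rows.append(((~b_ext & mask) << i) & mask)
            correction += 1 << i              # +1 at column i completes -b * 2**i
    total = (sum(rows) + correction) & mask   # stands in for Wallace tree + adder
    product = total - (1 << width) if total >> (width - 1) else total
    return product, len(rows)


print(multiply(255, 3, 8))  # 255 encodes as 1 0000000 (-1): only two rows
```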
In the multiplier provided in this embodiment:

- The regular signed number encoding circuit encodes the received data to obtain the target code.
- The correction partial product acquisition circuit obtains the sign-extended partial products and the correction values from the target code and the received data, and derives the partial products of the target code from the sign-extended partial products.
- The Wallace tree group circuit accumulates the partial products of the target code and the correction values.
- The accumulation circuit accumulates that result again to obtain the operation result.

Regular signed number encoding reduces the number of effective partial products produced during multiplication. This lowers the complexity of the multiplier, improves the efficiency of the multiplication, and effectively reduces the power consumption of the multiplier.
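The claim about fewer effective partial products can be checked numerically for small widths. For every unsigned 8-bit multiplier, the sketch below compares two counts:

- the 1 bits of the plain binary form, each needing one partial product in a simple shift-and-add scheme;
- the non-zero digits of the encoded form.

It prints averages without asserting particular figures. The helper names are hypothetical. Comparing against plain binary, rather than against the three-bit scheme of the background section, is a simplification.

```python
def convert_lowest_run(digits):
    """One stage: rewrite the lowest run of >= 2 ones (LSB-first) as 1 0..0 (-1)."""
    n = len(digits)
    for i in range(n):
        j = i
        while digits[i] == 1 and j + 1 < n and digits[j + 1] == 1:
            j += 1
        if j > i:
            if j + 1 >= n:
                raise ValueError("run reaches the top digit; pad with a leading 0")
            digits[i:j + 1] = [-1] + [0] * (j - i)
            digits[j + 1] += 1
            return True
    return False


def encode(a, n):
    """Unsigned n-bit value -> n+1 signed digits, LSB first."""
    digits = [(a >> k) & 1 for k in range(n)] + [0]
    while convert_lowest_run(digits):
        pass
    return digits


def nonzero_counts(n):
    """(1 bits in binary, non-zero digits after encoding) for each unsigned n-bit value."""
    return [
        (bin(a).count("1"), sum(1 for d in encode(a, n) if d != 0))
        for a in range(1 << n)
    ]


counts = nonzero_counts(8)
print("average non-zero digits, binary :", sum(b for b, _ in counts) / len(counts))
print("average non-zero digits, encoded:", sum(c for _, c in counts) / len(counts))
print("largest count, binary vs encoded:", max(b for b, _ in counts), max(c for _, c in counts))
```

The encoded form never needs more non-zero digits than plain binary. Long runs of 1s benefit most: 11111111 needs only two.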
Fig. 2 is a schematic diagram of a specific structure of a multiplier according to an embodiment. As shown in Fig. 2, the multiplier includes a regular signed number encoding circuit 21, a correction partial product acquisition circuit 22, a modified Wallace tree group circuit 23 and an accumulation circuit 24. These are connected in that order: the output of circuit 21 feeds the input of circuit 22, the output of circuit 22 feeds the input of circuit 23, and the output of circuit 23 feeds the input of circuit 24. The correction partial product acquisition circuit 22 includes a partial product acquisition branch 221 and a target-code partial product acquisition branch 222.

The components work as follows:

- The regular signed number encoding circuit 21 encodes the received data to obtain a target code.
- Branch 221 obtains the sign-extended partial products and the correction values from the target code.
- Branch 222 obtains the partial products of the target code from the sign-extended partial products and the correction values.
- The modified Wallace tree group circuit 23 accumulates the partial products of the target code and the correction values to obtain an accumulation result.
- The accumulation circuit 24 performs an addition on that result.

Specifically, the regular signed number encoding circuit 21 may receive the multiplier of the multiplication operation and encode it to obtain the target code. The encoding proceeds as follows:

- The N-bit multiplier is processed from the low-order bits to the high-order bits.
- If it contains a run of l consecutive 1 bits (l ≥ 2), those l bits are converted into the (l+1)-digit string "1 0…0 (-1)", that is, a 1 followed by l-1 zeros and a -1.
- The remaining N-l bits are combined with the converted l+1 digits to form new data, which is used as the input of the next conversion stage.
- This repeats until the new data contains no run of l ≥ 2 consecutive 1 bits.

The target code obtained from an N-bit multiplier may be N+1 digits wide. For example, 11 can be rewritten as 100 - 001, so 11 is equivalent to 10(-1). Likewise, 111 can be rewritten as 1000 - 0001, so 111 is equivalent to 100(-1). Runs of other lengths l ≥ 2 are converted in the same way.

For example, suppose circuit 21 receives the multiplier 001010101101110. Each conversion stage is applied to the result of the previous one:

1. 0010101011100(-1)0
2. 0010101100(-1)00(-1)0
3. 0010110(-1)00(-1)00(-1)0
4. 00110(-1)0(-1)00(-1)00(-1)0
5. 010(-1)0(-1)0(-1)00(-1)00(-1)0

Since the fifth result contains no run of l ≥ 2 consecutive 1s, it may be called the intermediate code. Padding it with one bit completes the regular signed number encoding. Here, the bit width of the intermediate code may equal the bit width of the multiplier. Optionally, if the highest and second-highest digits of the intermediate code are "10" or "01", circuit 21 may place a 0 above the highest digit, so that the three highest digits of the target code are "010" or "001", respectively. Optionally, the bit width of the intermediate code may equal the bit width of the target code minus 1.

Optionally, the bit width of the target code may equal the bit width N of the received multiplier plus 1, which is also the number of original partial products. The correction partial product acquisition circuit 22 obtains a sign-extended partial product and a corresponding correction value for each digit of the target code. It then obtains the partial products of the target code from all the sign-extended partial products and all the correction values. Optionally, the bit width of a sign-extended partial product may be twice the bit width of the data currently processed by the multiplier. Optionally, the modified Wallace tree group circuit 23 accumulates the values in each column of the partial products of all target codes obtained by the correction partial product acquisition circuit 22. The accumulation circuit 24 then accumulates the result of circuit 23 to obtain the target result of the multiplication. Optionally, the partial products of all target codes may include all the sign-extended partial products and all the correction values.

It should be noted that the multiplier receives the multiplicand of the multiplication through the correction partial product acquisition circuit 22. This circuit obtains the sign-extended partial products from the received multiplicand and the target code produced by the regular signed number encoding circuit 21, and it also obtains the correction values from that target code.

Optionally, the regular signed number encoding circuit 21 includes a data input port 211 and a target code output port 212. The data input port 211 receives the first data to be encoded, and the target code output port 212 outputs the target code obtained by regular signed number encoding of that first data.

Optionally, the internal structure and port functions of the regular signed number encoding circuit 21 may be identical to those of the regular signed number encoding circuit 11.

In the multiplier provided by this embodiment:

- The regular signed number encoding circuit encodes the received data to obtain the target code.
- The correction partial product acquisition circuit obtains the sign-extended partial products and the correction values from the target code and the received data, and derives the partial products of the target code from the sign-extended partial products and the correction values.
- The modified Wallace tree group circuit accumulates the partial products of the target code.
- The accumulation circuit accumulates that result.

Regular signed number encoding of the received data reduces the number of effective partial products produced during multiplication. This lowers the complexity of the multiplier, improves the efficiency of the multiplication, and effectively reduces the power consumption of the multiplier.
In one embodiment, the multiplier includes the regular signed number encoding circuit 11, and the regular signed number encoding circuit 11 includes: the data input port 111 is configured to receive first data subjected to the regular signed number encoding processing, and the target encoding output port 112 is configured to output the target encoding obtained by performing the regular signed number encoding processing on the received first data.
Specifically, if the regular signed number encoding circuit 11 receives a first data through the data input port 111, the regular signed number encoding circuit 11 may perform a regular signed number encoding process on the first data to obtain a target code, and output the target code through the target code output port 112, where the first data may be a multiplier in a multiplication operation. Optionally, the values included in the target code obtained after the regular signed number coding process may be-1, 0, and 1.
In the multiplier provided by this embodiment, the regular signed number coding circuit may perform regular signed number coding on received first data to obtain a target code, then the modified partial product obtaining circuit may obtain a partial product after sign bit extension and a modified value according to a value in the target code and the received data, obtain a partial product of the target code through the partial product after sign bit extension and the modified value, obtain an accumulation result by performing accumulation processing on the partial product of the target code through the wallace tree group circuit, and finally perform accumulation processing on the received accumulation result through the accumulation circuit, where the number of effective partial products that can be obtained by the multiplier is small, thereby reducing the complexity of the multiplier in realizing multiplication; meanwhile, the multiplier can improve the operation efficiency of multiplication operation and effectively reduce the power consumption of the multiplier.
In one embodiment, the modified partial product obtaining circuit 12 includes a target code input port 121, a data input port 122 and a partial product output port 123. The target code input port 121 is configured to receive the target code, the data input port 122 is configured to receive second data, and the partial product output port 123 is configured to output the partial products of the target code obtained from the second data and the target code.
Specifically, the modified partial product obtaining circuit 12 may receive, through the target code input port 121, each digit of the target code output by the regular signed number encoding circuit 11; each digit takes one of the three values -1, 0 and 1. From each digit and the received second data, circuit 12 may obtain an original partial product, apply sign bit extension to it to obtain a sign-extended partial product, obtain the corresponding modified value from the digit, and finally obtain the partial product of the target code from the sign-extended partial product. Optionally, the data input port 122 may receive the second data of the multiplication, which may be the multiplicand. Optionally, each bit of the original partial product and of the sign-extended partial product is 0 or 1, where 0 may represent a low-level signal and 1 a high-level signal.
Optionally, in the distribution rule of the partial products of all target codes, each partial product of the target code may be equal to a sign-extended partial product, or to some of the bits of a sign-extended partial product. The first partial product of the target code may be equal to the first sign-extended partial product. From the second partial product onward, the lowest bit of each partial product lies in the same column as the second-lowest bit of the previous partial product; equivalently, each sign-extended partial product is shifted left by one column relative to the previous one. The highest bit of every partial product of the target code lies in the same column as the highest bit of the first partial product of the target code, and bits that would fall in columns above the highest bit of the first sign-extended partial product do not take part in the accumulation. Optionally, the partial products of all target codes may span a number of columns equal to twice the bit width of the data currently processed by the multiplier, and the number of partial products of the target code may be equal to the bit width of the original partial product plus 1.
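To visualise this distribution rule, the sketch below prints one row per partial product of the target code for an N-bit multiplication. It assumes rows are indexed from 0, that row k is the 2N-bit sign-extended partial product shifted left by k columns, and that anything at or beyond column 2N is discarded. "●" marks a sign bit copy and "○" any other bit; "." for an empty position and the function name `partial_product_layout` are illustrative choices.

```python
def partial_product_layout(n: int) -> list[str]:
    """Return N + 1 rows of 2N characters, most significant column first."""
    rows = []
    for k in range(n + 1):
        cells = []
        for col in range(2 * n - 1, -1, -1):
            offset = col - k  # bit position inside the k-th sign-extended product
            if offset < 0:
                cells.append(".")  # empty position below the row's lowest bit
            elif offset < n:
                cells.append("○")  # bits p_0 .. p_(N-1)
            else:
                cells.append("●")  # copies of the sign bit p_N
        rows.append("".join(cells))
    return rows


for row in partial_product_layout(8):
    print(row)
```

Because the shift stops at column 2N - 1, later rows keep fewer sign bit copies, and the last row (k = N) keeps none.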
It should be noted that the modified partial product obtaining circuit 12 may obtain an original partial product directly from the target code and apply sign bit extension to it to obtain a sign-extended partial product. Optionally, the bit width of the original partial product may be N and that of the sign-extended partial product 2N, where N is the bit width of the data currently processed by the multiplier. Optionally, the lower N bits of the sign-extended partial product may equal the corresponding N bits of the original partial product, and the upper N bits are copies of the sign bit of the original partial product, which may be its highest bit. In other words, the upper N bits of the sign-extended partial product are all the same, and the lower N bits equal the lower N bits of the original partial product. The sign-extended partial products taking part in the operation follow the same distribution rule as the partial products of the target code: from the second sign-extended partial product onward, the lowest bit lies in the same column as the second-lowest bit of the previous sign-extended partial product, and the highest bit of each sign-extended partial product lies in the same column as the highest bit of the first one.
Illustratively, if the multiplier currently performs 8-bit by 8-bit fixed-point multiplication and an original partial product obtained by the modified partial product obtaining circuit 12 is $p_8p_7p_6p_5p_4p_3p_2p_1p_0$, the partial product obtained after sign bit extension can be represented as $p_8p_8p_8p_8p_8p_8p_8p_8p_7p_6p_5p_4p_3p_2p_1p_0$.
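The extension in this example can be checked in a few lines. The sketch assumes, as the example shows, an original partial product of N + 1 bits (p8 down to p0 for N = 8) and a 2N-bit result; `sign_extend` is a hypothetical name.

```python
def sign_extend(pp: int, n: int) -> int:
    """Extend an (N+1)-bit original partial product p_N..p_0 to 2N bits.

    The lower N bits are copied unchanged and the upper N bits are all set
    to the sign bit p_N, as in the p8 p8 ... p8 p7 ... p0 example.
    """
    sign = (pp >> n) & 1
    low = pp & ((1 << n) - 1)
    high = (((1 << n) - 1) << n) if sign else 0
    return high | low


print(hex(sign_extend(0b1_0000_0001, 8)))  # 0xff01
```

Since p8 is the sign bit of the nine-bit pattern, this is ordinary two's complement sign extension: the 16-bit result represents the same signed value.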
In addition, the target code received by the modified partial product obtaining circuit 12 contains the three values -1, 0 and 1. When a digit of the target code is -1, the corresponding modified value may be 1; when the digit is 1 or 0, the corresponding modified value may be 0.
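This rule is consistent with the two's complement identity -x = ~x + 1: for a -1 digit the circuit can output the bitwise inverse of the multiplicand as the original partial product and defer the "+1" as the modified value. A sketch under that assumption, with an N-bit signed multiplicand and an (N+1)-bit original partial product; the function name is hypothetical.

```python
def original_partial_product(digit: int, x: int, n: int) -> tuple[int, int]:
    """Return (original partial product, modified value) for one target-code digit.

    The partial product is an (N+1)-bit two's complement pattern; the extra bit
    is needed because negating -2**(N-1) does not fit in N bits.
    """
    mask = (1 << (n + 1)) - 1
    if digit == 1:
        return x & mask, 0
    if digit == -1:
        return ~x & mask, 1  # -x == ~x + 1; the +1 becomes the modified value
    if digit == 0:
        return 0, 0
    raise ValueError("target-code digits must be -1, 0 or 1")
```

For example, with x = 5 and digit -1 the pattern is 506 (signed value -6) and the modified value is 1, which together give -5.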
Illustratively, continuing the above example, the distribution rule of the partial products of all target codes for an 8-bit by 8-bit multiplication is shown in fig. 3, where "●" represents a sign bit value in a sign-extended partial product and "○" represents any other bit value of a sign-extended partial product.
In this embodiment, the modified partial product obtaining circuit 12 and the modified partial product obtaining circuit 22 may have identical internal circuit structures and functions.
In the multiplier provided by this embodiment, the modified partial product obtaining circuit obtains the sign-extended partial products and the modified values from the digits of the target code and the received data, and obtains the partial products of the target code from the sign-extended partial products. The Wallace tree group circuit accumulates the partial products of the target code together with the modified values to obtain an accumulation result, and the accumulation circuit finally accumulates the received accumulation result.
In one embodiment, the multiplier includes the Wallace tree group circuit 13, which includes a modified value input port 131, a partial product input port 132, a sum bit signal output port 133 and a carry signal output port 134. The modified value input port 131 receives the modified values obtained by the modified partial product obtaining circuit 12; the partial product input port 132 receives the sign-extended partial products obtained by circuit 12; the sum bit signal output port 133 outputs the sum bit signal obtained by the Wallace tree group circuit 13; and the carry signal output port 134 outputs the carry signal obtained by the Wallace tree group circuit 13.
Optionally, as shown in fig. 4, the Wallace tree group circuit 13 includes Wallace tree sub-circuits 1311 to 131n, which accumulate the values in each column of the partial products of all target codes to obtain the accumulation result.
Specifically, each of the Wallace tree sub-circuits 1311 to 131n may be built from full adders and/or half adders; each is a circuit that adds a multi-bit input and reduces it to two output signals. Optionally, the number n of Wallace tree sub-circuits in the Wallace tree group circuit 13 may be equal to twice the bit width of the data currently processed by the multiplier; the n sub-circuits process the partial products of the target code in parallel, although they may be connected in series. Optionally, each Wallace tree sub-circuit in the Wallace tree group circuit 13 may add the values of one column of the partial products of all target codes and output two signals, a carry signal $Carry_i$ and a sum signal $Sum_i$, where i is the number of the sub-circuit and the first sub-circuit is numbered 0. Optionally, the number of input signals received by each Wallace tree sub-circuit may be equal to the data bit width received by the multiplier plus 1; the inputs may be the values of one column of the partial products of all target codes, possibly padded with 0. Each sub-circuit receives its carry input signals on ports separate from its partial product input ports, and the number of partial product input ports is fixed: if a column of the partial products of all target codes contains fewer values than there are partial product input ports, the unused ports receive 0.
In addition, the modified value input port 131 and the partial product input port 132 may be input ports of the first Wallace tree sub-circuit in the Wallace tree group circuit 13, and the sum bit signal output port 133 and the carry signal output port 134 may be output ports of the last Wallace tree sub-circuit in the Wallace tree group circuit 13.
It should be noted that the carry output signals of each Wallace tree sub-circuit can serve as the carry input signals of the next Wallace tree sub-circuit. Optionally, if the target codes are numbered 0, 1, …, n and the corresponding modified values are also numbered 0, 1, …, n, the carry input signals received by the first Wallace tree sub-circuit in the Wallace tree group circuit 13 may be the modified values of the n-1 target codes numbered 0 to n-2. Optionally, the number of carry output bits of each Wallace tree sub-circuit may be $N_{Cout} = \lfloor (N_I + N_{Cin})/2 \rfloor - 1$, where $N_I$ is the number of partial product input signals of the sub-circuit, $N_{Cin}$ is the number of its carry input signals, $\lfloor \cdot \rfloor$ is the floor function, and $N_{Cout}$ is the minimum number of carry output signals.
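The carry-out formula is consistent with a sub-circuit that reduces its N_I + N_Cin inputs with full adders (and at most one half adder) until three bits remain, then uses one last full adder to produce Sum_i and Carry_i, while the carry of every other adder goes to the next sub-circuit. The behavioural sketch below assumes that structure. It reduces the bits sequentially rather than in balanced layers, so it models the adder count and the arithmetic, not the delay of a real Wallace tree; all names are hypothetical.

```python
def full_adder(a: int, b: int, c: int) -> tuple[int, int]:
    """3:2 compressor: returns (sum bit, carry bit)."""
    return a ^ b ^ c, (a & b) | (a & c) | (b & c)


def half_adder(a: int, b: int) -> tuple[int, int]:
    """2:2 compressor: returns (sum bit, carry bit)."""
    return a ^ b, a & b


def wallace_subcircuit(bits: list[int]) -> tuple[int, int, list[int]]:
    """Reduce one column (partial product bits plus carry inputs).

    Returns (Sum_i, Carry_i, carry outputs for the next sub-circuit).
    Requires at least two input bits.
    """
    if len(bits) < 2:
        raise ValueError("need at least two input bits")
    pending = list(bits)
    carry_outs = []
    while len(pending) > 3:
        if len(pending) == 4:
            s, c = half_adder(pending.pop(), pending.pop())
        else:
            s, c = full_adder(pending.pop(), pending.pop(), pending.pop())
        pending.append(s)     # sum bit stays in this column
        carry_outs.append(c)  # carry bit moves to the next column
    if len(pending) == 3:
        s, c = full_adder(*pending)
    else:
        s, c = half_adder(*pending)
    return s, c, carry_outs


def carry_out_count(n_i: int, n_cin: int) -> int:
    """N_Cout = floor((N_I + N_Cin) / 2) - 1."""
    return (n_i + n_cin) // 2 - 1


print(wallace_subcircuit([1, 1, 1, 1, 1]))  # (1, 1, [1])
```

Each call preserves the column total: sum(bits) = Sum_i + 2 * Carry_i + 2 * (number of carry outputs set). Under the reading above, an 8-bit example column with 9 partial product inputs and 7 carry inputs would give carry_out_count(9, 7) = 7.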
In addition, because the first Wallace tree sub-circuit in the Wallace tree group circuit 13 may receive, as carry input signals, the modified values of the N-1 target codes numbered 0 to N-2, the partial products of all target codes can be viewed as an (N+2)×2N array of values that are actually operated on: apart from the bits of the sign-extended partial products, the blank positions are filled with 0 or 1. Optionally, the partial product of each target code corresponds to one digit of the target code. If the digit is -1, the blank positions of the corresponding partial product are filled with 1 in the actual operation; if the digit is 0 or 1, they are filled with 0. Continuing the previous example, an 8-bit by 8-bit fixed-point multiplication yields 9 partial products of the target code and N is 8, so the distribution rule of the partial products of all target codes may be as shown in fig. 5, where "●" represents a sign bit value in a sign-extended partial product, "○" represents any other bit value of a sign-extended partial product, and the remaining symbol represents a blank position in the partial products of all target codes, whose value is determined by the target code. Optionally, in the actual accumulation, the Wallace tree group circuit 13 replaces the filled values at the blank positions in the first column of the partial products of all target codes with the modified values of the n-1 target codes numbered 0 to n-2.
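The sketch below checks the arithmetic behind this filling rule, assuming signed N-bit operands and CSD digits for the target code. For a -1 digit at position k, filling the k blank low columns with 1 turns the row into the bitwise complement of x·2^k, so the only correction still needed is +1 at column 0. That is why all modified values can enter at the lowest column (as Wallace tree carry inputs or adder inputs). `csd_encode`, `layout_rows` and `to_signed` are hypothetical names.

```python
def csd_encode(value: int, width: int) -> list[int]:
    """CSD digits in {-1, 0, 1}, least significant first, padded to width + 1."""
    digits, v = [], value
    while v != 0:
        d = 2 - (v & 3) if v & 1 else 0
        digits.append(d)
        v = (v - d) >> 1
    return digits + [0] * (width + 1 - len(digits))


def layout_rows(a: int, b: int, n: int) -> tuple[list[int], list[int]]:
    """Build the N + 1 rows of 2N columns and the modified values for a * b."""
    mask = (1 << (2 * n)) - 1
    rows, mods = [], []
    for k, d in enumerate(csd_encode(a, n)):
        if d == 1:
            row = (b << k) & mask   # sign-extended b, shifted by k columns
        elif d == -1:
            row = (~b << k) & mask  # sign-extended ~b, shifted by k columns
            row |= (1 << k) - 1     # blanks below the row filled with 1
        else:
            row = 0                 # digit 0: row and blanks are all 0
        rows.append(row)
        mods.append(1 if d == -1 else 0)
    return rows, mods


def to_signed(v: int, bits: int) -> int:
    """Interpret a bits-wide pattern as a two's complement integer."""
    return v - (1 << bits) if v >> (bits - 1) else v


rows, mods = layout_rows(-128, -128, 8)
print(to_signed((sum(rows) + sum(mods)) & 0xFFFF, 16))  # 16384
```

Summing all rows plus all modified values modulo 2^(2N) reproduces the signed product, which is what the Wallace tree group circuit and the final adder compute between them.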
For example, if the multiplier currently performs 8-bit by 8-bit multiplication, the distribution rule of the partial products of all target codes obtained by the modified partial product obtaining circuit 12 is again that of fig. 5, and each Wallace tree sub-circuit may receive all values of its column in the partial products of all target codes. The connections of the 16 Wallace tree sub-circuits in the Wallace tree group circuit 13 are shown in fig. 6, where Wallace_i denotes a Wallace tree sub-circuit numbered i from 0; a solid line between two Wallace tree sub-circuits indicates that the higher-numbered one has a carry output signal, and a dotted line indicates that it has none.
In the multiplier provided by this embodiment, the Wallace tree group circuit accumulates the partial products of the target code to obtain the accumulation result, and the accumulation circuit then accumulates that result.
Another embodiment provides a specific structure of the multiplier, in which the accumulation circuit 14 includes an adder 141 configured to accumulate the two output signals according to the modified value to obtain the target operation result.
Specifically, the carry input signal of the adder 141 may be the modified value of the last-numbered target code. Optionally, the adder 141 may shift the carry output signal Carry of the Wallace tree group circuit 13 to obtain a shifted carry signal Carry', then add Carry', the sum output signal Sum of the Wallace tree group circuit and the carry input signal received by the accumulation circuit, and output the operation result. Optionally, the shift may be a left shift by one bit. After Carry is shifted left by one bit, its lowest bit position is empty, and the adder 141 may fill that position with the received modified value of the second-to-last-numbered target code to obtain Carry'.
In this embodiment, the accumulation circuit 14 and the accumulation circuit 24 may have identical internal circuit structures and functions.
In the multiplier provided by this embodiment, the accumulation circuit adds the two signals output by the Wallace tree group circuit and the received carry input signal to output the multiplication result. The multiplier can reduce the complexity of multiplication, improve its efficiency and reduce power consumption.
In one embodiment, the adder 141 includes a carry signal input port 1411, a sum bit signal input port 1412, a carry input port 1413 and a result output port 1414. The carry signal input port 1411 receives the carry signal; the sum bit signal input port 1412 receives the sum bit signal; the carry input port 1413 receives the modified value obtained by the modified partial product obtaining circuit 12 and fills it into the lowest bit of the carry signal; and the result output port 1414 outputs the result of accumulating the carry signal, the sum bit signal and the modified value.
Specifically, the adder 141 may receive the carry signal Carry output by the Wallace tree group circuit 13 through the carry signal input port 1411, and the sum bit signal Sum output by the Wallace tree group circuit 13 through the sum bit signal input port 1412. Through the carry input port 1413 it may receive the modified value of the last-numbered target code obtained by the modified partial product obtaining circuit 12, and also the modified value of the second-to-last-numbered target code. Here $Carry = \{Carry_1, Carry_2, \ldots, Carry_i\}$ and $Sum = \{Sum_1, Sum_2, \ldots, Sum_i\}$, where the subscripts number the Wallace tree sub-circuits of the Wallace tree group circuit 13 starting from the first one, and i may equal the number of Wallace tree sub-circuits. The modified value of the last-numbered target code may serve as the carry input signal of the adder 141.
Before accumulating the received modified values, carry signal and sum signal, the adder 141 may shift the original carry signal $Carry = \{Carry_1, Carry_2, \ldots, Carry_i\}$ left by one bit, leaving an (i-1)-bit carry signal $\{Carry_2, Carry_3, \ldots, Carry_i\}$. The adder 141 may then fill the received modified value of the second-to-last-numbered target code into the lowest bit position of the carry signal to obtain an i-bit carry signal again. Finally, the adder 141 may accumulate this i-bit carry signal, the received sum signal Sum and the received modified value of the last-numbered target code, and output the result through the result output port 1414.
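Reading the second-to-last modified value as the bit that fills the vacated lowest position of the shifted carry and the last modified value as the adder's carry input, the work of adder 141 reduces to one modular addition. A minimal sketch; `final_add`, its parameters and the bit ordering (bit i comes from sub-circuit i) are illustrative assumptions.

```python
def final_add(sum_v: int, carry_v: int, mod_second_last: int,
              mod_last: int, width: int) -> int:
    """Model of adder 141: Sum + (Carry << 1, empty LSB filled) + carry-in.

    sum_v and carry_v hold one bit per Wallace tree sub-circuit (bit i from
    sub-circuit i). The left shift drops the carry of the last sub-circuit,
    which would fall outside the width-bit result.
    """
    mask = (1 << width) - 1
    shifted_carry = ((carry_v << 1) & mask) | mod_second_last  # LSB was 0
    return (sum_v + shifted_carry + mod_last) & mask


# Carry-save form of 1 + 2 + 3: sum bits 0, carry bits 3 -> 0 + 2 * 3 = 6
print(final_add(1 ^ 2 ^ 3, (1 & 2) | (1 & 3) | (2 & 3), 1, 1, 16))  # 8
```

Since the vacated lowest bit of the shifted carry is always 0, filling it with a modified value is the same as adding that value at weight 1, so the adder absorbs two of the corrections without any extra hardware row.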
Illustratively, if the multiplier currently performs 8-bit by 8-bit fixed-point multiplication, the adder 141 may be a 16-bit carry-lookahead adder. As shown in fig. 6, the Wallace tree group circuit 13 outputs the sum signals Sum and carry signals Carry of its 16 Wallace tree sub-circuits. The 16-bit carry-lookahead adder receives the complete sum signal Sum, while the carry signal it receives consists of all carry output signals of the Wallace tree group circuit 13 except the one from the last Wallace tree sub-circuit, combined with 0.
In the multiplier provided by this embodiment, the accumulation circuit accumulates the two signals output by the Wallace tree group circuit together with the received carry input signal and outputs the multiplication result.
In one embodiment, the multiplier includes the modified partial product obtaining circuit 22, which includes a target code input port 221, a data input port 222 and a partial product output port 223. The target code input port 221 receives the target code, the data input port 222 receives second data, and the partial product output port 223 outputs the partial products of the target code obtained from the second data and the target code.
Specifically, the modified partial product obtaining circuit 22 may receive, through the target code input port 221, the target code output by the regular signed number encoding circuit 21, whose digits take the values -1, 0 and 1. From each digit and the received second data, circuit 22 may obtain an original partial product, apply sign bit extension to it to obtain a sign-extended partial product, obtain the corresponding modified value from the digit, and finally obtain the partial product of the target code from the sign-extended partial product and the modified value. Optionally, the data input port 222 may receive the second data of the multiplication, which may be the multiplicand. Optionally, each bit of the original partial product and of the sign-extended partial product is 0 or 1, where 0 may represent a low-level signal and 1 a high-level signal.
In addition, the modified partial product obtaining circuit 22 may combine each sign-extended partial product with a modified value to obtain the partial products of the target code. Optionally, a partial product of the target code may be equal to a sign-extended partial product, or to some of the bits of a sign-extended partial product combined with the modified value corresponding to the previous sign-extended partial product; within each such partial product, the modified value sits in the column immediately below the lowest bit of the sign-extended partial product. Optionally, the partial products of all target codes may span a number of columns equal to twice the bit width of the data currently processed by the multiplier, and the number of partial products of the target code may be equal to the number of sign-extended partial products plus 1.
It should be noted that the modified partial product obtaining circuit 22 may obtain the original partial product directly from the target code and apply sign bit extension to it to obtain the sign-extended partial product. Optionally, the bit width of the original partial product may be N and that of the sign-extended partial product 2N, where N is the bit width of the data currently processed by the multiplier. Optionally, the lower N bits of the sign-extended partial product may equal the corresponding N bits of the original partial product, and the upper N bits are copies of the sign bit of the original partial product, which may be its highest bit. In other words, the upper N bits of the sign-extended partial product are all the same, and the lower N bits equal the lower N bits of the original partial product.
Illustratively, if the multiplier currently performs 8-bit by 8-bit fixed-point multiplication and an original partial product obtained by the modified partial product obtaining circuit 22 is $p_8p_7p_6p_5p_4p_3p_2p_1p_0$, the partial product obtained after sign bit extension can be represented as $p_8p_8p_8p_8p_8p_8p_8p_8p_7p_6p_5p_4p_3p_2p_1p_0$.
In addition, the target code received by the modified partial product obtaining circuit 22 contains the three values -1, 0 and 1. When a digit of the target code is -1, the corresponding modified value may be 1; when the digit is 1 or 0, the corresponding modified value may be 0.
Optionally, in the distribution rule of the partial products of all target codes, the first partial product of the target code may be equal to the first sign-extended partial product. From the second partial product of the target code onward, each partial product may be equal to the corresponding sign-extended partial product combined with the modified value of the previous sign-extended partial product. That modified value lies in the same column as the lowest bit of the previous partial product of the target code, with no empty position between it and the lowest bit of the corresponding sign-extended partial product. The last partial product of the target code, however, may be equal to the modified value of the last sign-extended partial product alone, since no further sign-extended partial product remains to combine it with. Meanwhile, the column of the highest bit of the first partial product of the target code serves as the reference: the highest bits of all other partial products of the target code lie in that same column, and values in columns above it are not accumulated.
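A sketch of this folded layout, assuming a CSD target code and signed N-bit operands: each modified value moves into the next row, in the empty column just below that row's lowest bit, and the last one forms a row of its own, giving N + 2 rows (10 for N = 8). Because no blank positions need filling with 1, every correction keeps its natural weight 2^k. `csd_encode`, `folded_rows` and `to_signed` are hypothetical names.

```python
def csd_encode(value: int, width: int) -> list[int]:
    """CSD digits in {-1, 0, 1}, least significant first, padded to width + 1."""
    digits, v = [], value
    while v != 0:
        d = 2 - (v & 3) if v & 1 else 0
        digits.append(d)
        v = (v - d) >> 1
    return digits + [0] * (width + 1 - len(digits))


def folded_rows(a: int, b: int, n: int) -> list[int]:
    """N + 2 rows of 2N columns; each -1 correction is folded into the next row."""
    mask = (1 << (2 * n)) - 1
    rows, prev_mod = [], 0
    for k, d in enumerate(csd_encode(a, n)):
        if d == 1:
            row = (b << k) & mask   # sign-extended b, shifted by k columns
        elif d == -1:
            row = (~b << k) & mask  # sign-extended ~b; its +1 is deferred
        else:
            row = 0
        if k > 0:
            # previous digit's modified value, in the free column just below
            # this row's lowest bit (column k - 1, weight 2**(k - 1))
            row |= prev_mod << (k - 1)
        rows.append(row)
        prev_mod = 1 if d == -1 else 0
    rows.append(prev_mod << n)      # last row holds only the last modified value
    return rows


def to_signed(v: int, bits: int) -> int:
    """Interpret a bits-wide pattern as a two's complement integer."""
    return v - (1 << bits) if v >> (bits - 1) else v


rows = folded_rows(-128, 127, 8)
print(len(rows), to_signed(sum(rows) & 0xFFFF, 16))  # 10 -16256
```

Compared with filling blanks with 1, this keeps every row free of extra bits below its lowest column, at the cost of one extra row.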
Illustratively, continuing the above example, the distribution rule of the partial products of all target codes for an 8-bit by 8-bit multiplication is shown in fig. 8, where a dedicated symbol represents a modified value, "●" indicates a sign bit value in a sign-extended partial product, and "○" indicates any other bit value of a sign-extended partial product.
In the multiplier provided by this embodiment, the modified partial product obtaining circuit obtains the sign-extended partial products and the modified values from the target code and the received data and uses them to obtain the partial products of the target code; the modified Wallace tree group circuit accumulates those partial products; and the accumulation circuit accumulates its received input data to obtain the operation result. The multiplier reduces the number of effective partial products produced during multiplication, which lowers the complexity of the multiplication; it can also improve the efficiency of the multiplication and reduce power consumption.
In one embodiment, the modified Wallace tree group circuit 23 includes a partial product input port 231, a carry signal input port 232, a sum bit signal output port 233 and a carry signal output port 234. The partial product input port 231 receives the partial products of the target code obtained by the modified partial product obtaining circuit 22; the carry signal input port 232 receives the carry input signal; the sum bit signal output port 233 outputs the sum bit signal obtained by the modified Wallace tree group circuit 23; and the carry signal output port 234 outputs the carry signal obtained by the modified Wallace tree group circuit 23.
Optionally, the modified Wallace tree group circuit 23 includes modified Wallace tree sub-circuits 2311 to 231n, which accumulate the values in each column of the partial products of all target codes to obtain the accumulation result.
Specifically, the number n of modified Wallace tree sub-circuits in the modified Wallace tree group circuit 23 may be equal to twice the bit width of the data currently processed by the multiplier; the n modified Wallace tree sub-circuits process the partial products of all target codes in parallel, although they may be connected in series. Optionally, each modified Wallace tree sub-circuit in the modified Wallace tree group circuit 23 may add the values of one column of the partial products of all target codes and output two signals, a carry signal $Carry_i$ and a sum signal $Sum_i$, where i is the number of the sub-circuit; the first modified Wallace tree sub-circuit is numbered 0, and the carry signal it receives may be 0. Optionally, the number of partial product input signals received by each modified Wallace tree sub-circuit may be equal to the total number of values in a column of the partial products of the target code, or to the number of sign-extended partial products plus 1; the inputs may be the values of one column of the partial products of all target codes, padded with 0. Each modified Wallace tree sub-circuit receives its carry input signals on ports separate from its partial product input ports, and the number of partial product input ports is fixed: if a column of the partial products of all target codes contains fewer values than there are partial product input ports, the unused ports receive 0.
In addition, each modified Wallace tree sub-circuit in the modified Wallace tree group circuit 23 may receive a number of partial product input signals equal to the number of partial products of the target code minus 1, except for one modified Wallace tree sub-circuit, which receives a number equal to the number of partial products of the target code: the one that processes the column containing the modified value in the last partial product of the target code. Here the last partial product of the target code may be equal to the modified value of the last target code.
For example, if the multiplier currently performs 8-bit by 8-bit multiplication, the distribution rule of the partial products of all target codes obtained by the modified partial product obtaining circuit 22 is again that of fig. 8, giving 10 partial products of the target code, and each modified Wallace tree sub-circuit may receive all values of its column in the partial products of all target codes. The connections of the 18 modified Wallace tree sub-circuits in the modified Wallace tree group circuit 23 are shown in fig. 9, where Wallace_i denotes a modified Wallace tree sub-circuit numbered i from 0; a solid line between two modified Wallace tree sub-circuits indicates that the higher-numbered one has a carry output signal, and a dotted line indicates that it has none. In this circuit, each modified Wallace tree sub-circuit has 10 partial product input ports in addition to its carry input ports. Alternatively, another modified Wallace tree group circuit 23 may perform the accumulation for this example; the connections of its 16 modified Wallace tree sub-circuits are shown in fig. 10. There, each modified Wallace tree sub-circuit has 9 partial product input ports in addition to its carry input ports, except for one sub-circuit with 10, which processes the column containing the last partial product of the target code in fig. 8.
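Counting entries per column under the folded layout described for fig. 8 is consistent with the fig. 10 variant: only column N holds N + 2 entries (10 for N = 8), so only one modified Wallace tree sub-circuit needs a tenth partial product input, while every column above N holds N + 1 = 9. The function name and zero-based column numbering are assumptions for illustration.

```python
def column_heights(n: int) -> list[int]:
    """Entries per column (column 0 first) of the folded partial product layout.

    Assumes N + 1 sign-extended partial products, row k starting at column k
    and truncated at column 2N - 1, with the modified value of row k placed in
    column k (inside row k + 1, or in a separate last row when k == N).
    """
    heights = [0] * (2 * n)
    for k in range(n + 1):
        for col in range(k, 2 * n):
            heights[col] += 1  # bits of the k-th sign-extended partial product
        heights[k] += 1        # its modified value sits at weight 2**k
    return heights


print(column_heights(8))
# [2, 3, 4, 5, 6, 7, 8, 9, 10, 9, 9, 9, 9, 9, 9, 9]
```

The counts include positions whose modified value happens to be 0, since the hardware port exists regardless of the data.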
In the multiplier provided by this embodiment, the modified Wallace tree group circuit accumulates the partial products of the target codes to obtain an accumulation result, and the accumulation circuit adds this result to obtain the operation result. The number of effective partial products produced during multiplication is thereby reduced, which lowers the complexity of implementing multiplication in the multiplier; the multiplier can also improve the efficiency of the multiplication operation and reduce its power consumption.
In one embodiment, continuing with the specific structural diagram of the multiplier shown in fig. 7, the accumulation circuit 24 of the multiplier includes an adder 241, which is configured to add the accumulation result.
Specifically, the adder 241 may be an adder of any of several bit widths, for example a carry-lookahead adder. Optionally, the adder 241 may receive the two signals output by the modified Wallace tree group circuit 23, add them, and output the multiplication result.
In the multiplier provided by this embodiment, the accumulation circuit adds the two signals output by the modified Wallace tree group circuit 23 and outputs the multiplication result. The multiplier reduces the number of effective partial products produced during multiplication, which lowers the complexity of implementing multiplication; it can also improve the efficiency of the multiplication operation and reduce the power consumption of the multiplier.
In one embodiment, the adder 241 of the multiplier includes a carry signal input port 2411, a sum signal input port 2412 and a result output port 2413. The carry signal input port 2411 receives a carry signal, the sum signal input port 2412 receives a sum signal, and the result output port 2413 outputs the result of adding the carry signal and the sum signal.
Specifically, the adder 241 may receive the carry signal Carry output by the modified Wallace tree group circuit 23 through the carry signal input port 2411 and the sum signal Sum output by the modified Wallace tree group circuit 23 through the sum signal input port 2412, add Carry and Sum, and output the result through the result output port 2413.
It should be noted that, during multiplication, the multiplier may use adders 241 of different bit widths to add the carry output signal Carry and the sum output signal Sum output by the modified Wallace tree group circuit 23, where the bit width of the data that the adder 241 can process may be equal to twice the bit width N of the data currently processed by the multiplier. Optionally, each modified Wallace tree sub-circuit in the modified Wallace tree group circuit 23 may output a carry output signal $\mathrm{Carry}_i$ and a sum output signal $\mathrm{Sum}_i$ ($i = 0, \ldots, 2N-1$, where i is the index of the modified Wallace tree sub-circuit, starting from 0). Optionally, the carry signal received by the adder 241 is $\mathrm{Carry} = \{[\mathrm{Carry}_0 : \mathrm{Carry}_{2N-2}],\ 0\}$; that is, Carry is 2N bits wide, its first 2N-1 bits are the carry output signals of the first 2N-1 modified Wallace tree sub-circuits in the modified Wallace tree group circuit 23, and its last bit is replaced by 0. Optionally, the sum output signal Sum received by the adder 241 is also 2N bits wide, and its bits are the sum output signals of the individual modified Wallace tree sub-circuits in the modified Wallace tree group circuit 23.
Illustratively, if the multiplier is currently processing an 8-bit by 8-bit fixed-point multiplication, the adder 241 may be a 16-bit carry-lookahead adder. As shown in fig. 9, the modified Wallace tree group circuit 23 may output the sum output signals and carry output signals of 16 modified Wallace tree sub-circuits. The 16-bit carry-lookahead adder receives the complete sum output signal Sum of the modified Wallace tree group circuit 23, while the carry signal Carry it receives is formed by concatenating the carry output signals of all sub-circuits except the last one with a 0.
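To make the two-signal hand-off concrete, here is a simplified Python model; it is not the gate-level circuit of fig. 9 or fig. 10. Each bit column is reduced with one-bit full adders until a single Sum_i and Carry_i remain, carries produced inside column i are passed to column i+1 (standing in for the carry links between adjacent sub-circuits), and the final addition uses Carry = {[Carry_0 : Carry_{2N-2}], 0}, so Carry_i is weighted 2^(i+1) and the topmost carry is discarded. The names `full_adder`, `reduce_columns` and `final_add` are hypothetical, and Python's integer addition stands in for the carry-lookahead adder.

```python
def full_adder(a: int, b: int, c: int) -> tuple[int, int]:
    """One-bit full adder: returns (sum bit, carry bit)."""
    total = a + b + c
    return total & 1, total >> 1


def reduce_columns(rows: list[int], width: int) -> tuple[list[int], list[int]]:
    """Reduce every bit column of `rows` to one Sum_i bit and one Carry_i bit.

    Carries generated while compressing column i (other than Carry_i)
    become extra inputs of column i + 1.
    """
    sums: list[int] = []
    carries: list[int] = []
    carry_in: list[int] = []
    for i in range(width):
        column = [(row >> i) & 1 for row in rows] + carry_in
        carry_in = []
        while len(column) > 3:  # 3:2 compression until three bits remain
            s, c = full_adder(column.pop(), column.pop(), column.pop())
            column.append(s)
            carry_in.append(c)
        column += [0] * (3 - len(column))  # pad to exactly three bits
        s, c = full_adder(*column)
        sums.append(s)
        carries.append(c)
    return sums, carries


def final_add(sums: list[int], carries: list[int], width: int) -> int:
    """Add Sum and Carry = {[Carry_0 : Carry_{2N-2}], 0} modulo 2**width."""
    sum_word = sum(bit << i for i, bit in enumerate(sums))
    # Carry_i lands one column higher; Carry_{width-1} falls off the top.
    carry_word = sum(bit << (i + 1) for i, bit in enumerate(carries[:-1]))
    return (sum_word + carry_word) & ((1 << width) - 1)


# Usage: ten 16-bit rows, as in the 8-bit by 8-bit example.
rows = [(0x9E37 * k) & 0xFFFF for k in range(1, 11)]
s, c = reduce_columns(rows, 16)
print(final_add(s, c, 16) == (sum(rows) & 0xFFFF))  # True
```

The model only checks the arithmetic identity behind the design: the column reduction never changes the total, so adding the two outputs with the one-column carry offset reproduces the sum of all rows modulo 2^(2N).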
According to the multiplier provided by this embodiment, the accumulation circuit adds the two signals output by the modified Wallace tree group circuit and outputs the multiplication result. The multiplier reduces the number of effective partial products produced during multiplication, which lowers the complexity of implementing multiplication; it can also improve the efficiency of the multiplication operation and reduce the power consumption of the multiplier.
Fig. 11 is a flowchart of a data processing method according to an embodiment, which may be executed by the multipliers shown in fig. 1 and fig. 3; this embodiment relates to the data multiplication process. As shown in fig. 11, the method includes the following steps:
S101, receiving data to be processed.
Specifically, the multiplier may receive data to be processed through the regular signed number encoding circuit, in which case the data is the multiplier operand, and through the modified partial product obtaining circuit, in which case the data may be the multiplicand. Optionally, the multiplier operand and the multiplicand may be 8, 16, 32 or 64 bits wide; this embodiment does not limit the bit width. Both the multiplier operand and the multiplicand may be fixed-point numbers.
S102, performing regular signed number encoding on the data to be processed to obtain the target code.
Specifically, the multiplier can use the regular signed number encoding circuit to perform regular signed number encoding on the received data to be processed, namely the multiplier operand, to obtain the target code. The bit width of the target code may be equal to the bit width N of the multiplier operand plus 1.
Optionally, performing regular signed number encoding on the data to be processed in S102 to obtain the target code may include: converting each run of l consecutive 1 bits in the data to be processed into l+1 digits whose highest digit is 1, whose lowest digit is -1 and whose remaining digits are 0, where l ≥ 2, to obtain the target code.
It should be noted that the regular signed number encoding process can be characterized as follows. For an N-bit multiplier operand, the bits are processed from low order to high order. If there are l consecutive 1 bits ($l \ge 2$), they are converted into the digit string "$1\,(0)^{l-1}\,(-1)$", and the remaining N-l bits are combined with the converted l+1 digits to form new data. The new data is then used as the input of the next round of conversion, until it contains no run of l ($l \ge 2$) consecutive 1 bits. After regular signed number encoding, the target code of an N-bit multiplier operand may be N+1 bits wide.
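The run-replacement procedure can be sketched in Python as follows. This is an illustrative model under stated assumptions: the multiplier operand is treated as an unsigned N-bit value, digits are returned least significant first, and the names `csd_encode` and `digits_value` are hypothetical. Each replacement relies on the identity that ones in bit positions k through e-1 sum to 2^e - 2^k, so the encoded value never changes; scanning resumes at the newly created 1 so that it can merge with a run of ones directly above it.

```python
def csd_encode(value: int, n: int) -> list[int]:
    """Recode an unsigned n-bit value into n + 1 digits in {-1, 0, 1}.

    Digits are listed least significant first. Every run of l >= 2
    consecutive 1 digits is replaced by 1 (0)^(l-1) (-1), scanning from low
    to high order, and passes are repeated until no such run is left.
    """
    if not 0 <= value < (1 << n):
        raise ValueError("value must fit in n unsigned bits")
    digits = [(value >> k) & 1 for k in range(n)] + [0]  # extra top digit
    changed = True
    while changed:
        changed = False
        k = 0
        while k < len(digits):
            if digits[k] != 1:
                k += 1
                continue
            end = k
            while end < len(digits) and digits[end] == 1:
                end += 1
            if end - k >= 2:
                # Ones in positions k .. end-1 sum to 2**end - 2**k.
                digits[k] = -1
                for m in range(k + 1, end):
                    digits[m] = 0
                digits[end] += 1  # never out of range for an n-bit input
                changed = True
            k = end  # the new 1 at `end` may join a run of ones above it
    return digits


def digits_value(digits: list[int]) -> int:
    """Value represented by least-significant-first signed digits."""
    return sum(d * 2**k for k, d in enumerate(digits))


print(csd_encode(0b0111, 4))                # [-1, 0, 0, 1, 0]
print(digits_value(csd_encode(0b0111, 4)))  # 7
print(csd_encode(0b11111111, 8))            # [-1, 0, 0, 0, 0, 0, 0, 0, 1]
```

For 255 the eight 1 bits collapse into two non-zero digits, which illustrates why the encoding reduces the number of effective partial products.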
S103, obtaining a partial product and a correction value of the target code according to the data to be processed and the target code.
Specifically, the modified partial product obtaining circuit may obtain the correction value corresponding to the received target code according to a correction rule. The rule may be characterized as follows: the target code received by the modified partial product obtaining circuit may contain three values, -1, 0 and 1; when a value in the target code is -1, the corresponding correction value may be 1, and when it is 1 or 0, the corresponding correction value may be 0. Optionally, each target code has a corresponding partial product and a corresponding correction value.
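One common way to read the correction (modified) value rule, stated here as an assumption rather than taken from the text, is that a -1 digit selects the bitwise inverse of the multiplicand and the correction value of 1 supplies the "+1" that completes two's-complement negation, since -X = ~X + 1. The hypothetical helpers below encode the rule and check that identity on (N+1)-bit patterns.

```python
def correction_value(digit: int) -> int:
    """Correction rule from the text: 1 for a -1 digit, 0 for 0 or 1."""
    if digit not in (-1, 0, 1):
        raise ValueError("target-code digits must be -1, 0 or 1")
    return 1 if digit == -1 else 0


def negate_with_correction(pattern: int, width: int) -> int:
    """Assumed use of the correction: -X == ~X + correction (mod 2**width)."""
    mask = (1 << width) - 1
    return ((~pattern & mask) + correction_value(-1)) & mask


print([correction_value(d) for d in (-1, 0, 1)])       # [1, 0, 0]
print(negate_with_correction(5, 9) == ((-5) & 0x1FF))  # True
```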
S104, accumulating the partial products of the target code according to the correction values to obtain the target operation result.
Specifically, the Wallace tree group circuit in the multiplier may accumulate the values in each column of the partial products of the target code according to the correction values, obtaining and outputting the target operation result of the multiplication. The correction values can be used as carry input signals of the Wallace tree group circuit.
In the data processing method provided by this embodiment, data to be processed is received; regular signed number encoding is applied to it to obtain a target code; the data to be processed and the target code are converted into partial products of the target code and correction values; and the partial products are accumulated according to the correction values to obtain the target operation result. Because the received data is encoded by the regular signed number encoding circuit, the number of effective partial products is small, which reduces the complexity of implementing multiplication in the multiplier; the method can also improve the efficiency of the multiplication operation and reduce the power consumption of the multiplier.
Another embodiment provides a data processing method in which obtaining the partial products and the correction values of the target code from the data to be processed and the target code in step S103 includes:
S1031, obtaining sign-extended partial products according to the data to be processed and the target code.
Specifically, the modified partial product obtaining circuit may obtain sign-extended partial products from the received data to be processed, which may be the multiplicand, and the target code produced by the regular signed number encoding circuit.
S1032, obtaining the correction values according to the target code, and obtaining the partial products of the target code according to the sign-extended partial products.
Specifically, the modified partial product obtaining circuit may obtain the correction value corresponding to the received target code according to a correction rule.
For example, if the target code received by the modified partial product obtaining circuit contains the three values -1, 0 and 1, and the received multiplicand is X, the circuit can determine the correction value for each value in the target code according to the correction rule. Optionally, the rule may be that when a value in the target code is -1, the corresponding correction value is 1, and when it is 0 or 1, the corresponding correction value is 0.
It should be noted that each partial product of the target code may be assigned a number starting from 1, the partial product obtained from the lowest-order digit of the target code being numbered 1. Optionally, the partial product of the first target code may be equal to the first sign-extended partial product. From the partial product of the second target code onward, the i-th partial product of the target code may be obtained by keeping the low (2N-i+1) bits of the i-th sign-extended partial product and shifting them left by (i-1) bits; this is equivalent to excluding the high (i-1) bits of the i-th sign-extended partial product from the accumulation. Here i denotes the number of the sign-extended partial product.
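Reading (2N-i+1) as the number of low-order bits of the i-th sign-extended partial product that are kept before it is shifted left by (i-1) positions (an interpretation of the translated text), the operation is the same as shifting and then truncating to 2N bits. The hypothetical helper below does it explicitly so the equivalence can be checked.

```python
def target_partial_product(extended: int, i: int, n: int) -> int:
    """Partial product of the i-th target code (i counted from 1).

    Keeps the low (2n - i + 1) bits of the 2n-bit sign-extended partial
    product and shifts them left by (i - 1), so the high (i - 1) bits never
    enter the accumulation.
    """
    if not 1 <= i <= n + 1:
        raise ValueError("i must be between 1 and n + 1")
    kept_bits = 2 * n - i + 1
    kept = extended & ((1 << kept_bits) - 1)
    return kept << (i - 1)


ext = 0xFFFB  # -5 sign-extended to 16 bits
print(hex(target_partial_product(ext, 3, 8)))  # 0xffec, i.e. -20 in 16 bits
```

Dropping the high bits loses nothing, because any bit shifted past position 2N-1 would fall outside the 2N-bit product anyway.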
According to the data processing method provided by this embodiment, sign-extended partial products are obtained from the data to be processed and the target code; the sign-extended partial products and the correction values are converted into partial products of the target code; and these partial products are accumulated according to the correction values to obtain the target operation result. The method produces few effective partial products, which reduces the complexity of implementing multiplication in the multiplier; it can also improve the efficiency of the multiplication operation and reduce the power consumption of the multiplier.
In one embodiment, obtaining sign-extended partial products according to the data to be processed and the target code in S1031 includes:
S1031a, obtaining original partial products according to the data to be processed and the target code.
Specifically, the number of original partial products may be equal to the number of target codes. Illustratively, if the modified partial product obtaining circuit receives an 8-bit multiplicand $x_7x_6x_5x_4x_3x_2x_1x_0$ (i.e., X), it can directly obtain, from X, the original partial product corresponding to each of the three values that may appear in the target code: when the value is -1, the original partial product may be -X; when the value is 1, it may be X; and when the value is 0, it may be 0, that is, all 9 bits of the original partial product are 0.
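A minimal sketch of this selection, assuming the multiplicand X is an N-bit two's-complement value and each original partial product is an (N+1)-bit pattern, consistent with the 9 bits mentioned for the 8-bit case. One extra bit is needed because -X can leave the N-bit range, for example -(-128) = 128. The function name is hypothetical.

```python
def original_partial_product(x: int, digit: int, n: int) -> int:
    """(n+1)-bit two's-complement pattern of digit * x for digit in {-1, 0, 1}."""
    if not -(1 << (n - 1)) <= x < (1 << (n - 1)):
        raise ValueError("x must be an n-bit signed value")
    mask = (1 << (n + 1)) - 1
    if digit == 1:
        return x & mask       # X
    if digit == -1:
        return (-x) & mask    # -X
    if digit == 0:
        return 0              # all n + 1 bits are 0
    raise ValueError("digit must be -1, 0 or 1")


for d in (-1, 0, 1):
    print(d, format(original_partial_product(-128, d, 8), "09b"))
# -1 010000000
# 0 000000000
# 1 110000000
```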
S1031b, performing sign extension on the original partial products to obtain the sign-extended partial products.
Specifically, the bit width of a sign-extended partial product may be equal to twice the bit width N of the data currently processed by the multiplier. The original partial product may be N+1 bits wide (9 bits for an 8-bit multiplicand, since -X can need one more bit than X), in which case the number of sign extension bits is N-1. Optionally, sign extension can be understood as filling the sign extension bits with the sign bit of the original partial product, that is, its highest-order bit, which yields a 2N-bit sign-extended partial product. Optionally, when all sign-extended partial products are arranged, their highest-order bits may be aligned in one column, their lowest-order bits in one column, and the remaining bits of equal order likewise in common columns.
Illustratively, if the multiplier is currently handling an 8-bit by 8-bit fixed-point multiplication and an original partial product obtained by the modified partial product obtaining circuit is $p_8p_7p_6p_5p_4p_3p_2p_1p_0$, the sign-extended partial product can be expressed as $p_8p_8p_8p_8p_8p_8p_8p_8p_7p_6p_5p_4p_3p_2p_1p_0$.
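The replication of $p_8$ can be written as a small bit-manipulation helper. This is a sketch with a hypothetical name that extends an (N+1)-bit pattern to 2N bits, as in the 8-bit example (9 bits to 16 bits).

```python
def sign_extend(pattern: int, from_bits: int, to_bits: int) -> int:
    """Copy the top bit of a from_bits-wide pattern into bits from_bits..to_bits-1."""
    sign = (pattern >> (from_bits - 1)) & 1
    if sign:
        high = ((1 << to_bits) - 1) ^ ((1 << from_bits) - 1)
        pattern |= high
    return pattern


p = 0b1_0000_0101  # p8 = 1, nine-bit original partial product
print(format(sign_extend(p, 9, 16), "016b"))  # 1111111100000101
```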
According to the data processing method provided by this embodiment, original partial products are obtained from the data to be processed and the target code, sign extension is applied to them to obtain sign-extended partial products, and the sign-extended partial products are accumulated according to the correction values to obtain the operation result; the method can also improve the efficiency of the multiplication operation and reduce the power consumption of the multiplier.
Another embodiment provides a data processing method in which accumulating the partial products of the target code according to the correction values in step S104 to obtain the target operation result includes:
S1041, accumulating the partial products of the target code according to the correction values to obtain two output signals.
Specifically, the multiplier may use the Wallace tree sub-circuits in the Wallace tree group circuit to accumulate the values in each column of all partial products of the target code, obtaining two output signals: the carry output signal $\mathrm{Carry} = \{\mathrm{Carry}_0, \mathrm{Carry}_1, \ldots, \mathrm{Carry}_i\}$ and the sum output signal $\mathrm{Sum} = \{\mathrm{Sum}_0, \mathrm{Sum}_1, \ldots, \mathrm{Sum}_i\}$, where i is the index of the Wallace tree sub-circuit, starting from 0. Optionally, the number of input signals received by each Wallace tree sub-circuit may be equal to the number of sign-extended partial products, and from the second Wallace tree sub-circuit onward, the number of carry input signals received by each sub-circuit may be equal to the number of carry output signals of the preceding sub-circuit. Optionally, if the regular signed number encoding circuit produces n+1 target codes numbered 1, …, n+1, the carry input signals received by the first Wallace tree sub-circuit may be the n-1 correction values obtained from the target codes numbered 1 to n-1, where n is the bit width of the data received by the multiplier. In addition, each Wallace tree sub-circuit outputs two signals, the carry output signal $\mathrm{Carry}_i$ and the sum output signal $\mathrm{Sum}_i$. Optionally, the number of input ports of each Wallace tree sub-circuit may be equal to the total number of partial-product values and correction values in its column.
It should be noted that, if the Wallace tree sub-circuits in the Wallace tree group circuit are numbered 0, 1, …, i, and the original partial products received by the Wallace tree group circuit are likewise numbered 0, 1, …, i, the carry input signals received by the first Wallace tree sub-circuit may be the i-1 correction values corresponding to the original partial products numbered 0, 1, …, i-2.
S1042, accumulating the two output signals according to the correction values to obtain the target operation result.
Specifically, the carry input signal of the accumulation circuit may be the correction value corresponding to the last-numbered target code. Optionally, the accumulation circuit may shift the carry output signal Carry of the Wallace tree group circuit to obtain a shifted carry signal Carry', then add Carry', the sum output signal Sum of the Wallace tree group circuit and its own carry input signal, and output the operation result. Optionally, the shift may be a left shift by one bit. After the carry output signal Carry is shifted left by one bit, its lowest bit position is empty, and the accumulation circuit may fill that position with the correction value corresponding to the second-to-last-numbered target code to obtain the shifted carry signal Carry'.
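To check that the pieces of this method add up to the correct product, here is an end-to-end Python sketch for N = 8. It rests on several assumptions that are not in the text: the multiplicand is signed two's complement and the multiplier operand is unsigned; the signed digits come from the standard non-adjacent-form recurrence (swapped in for brevity instead of the run-replacement procedure described earlier); a -1 digit selects the bitwise inverse of X together with a correction value of 1; and each correction value is simply added at the column of the digit that produced it, rather than routed through the specific carry inputs described above. The names `naf_digits` and `multiply` are hypothetical.

```python
def naf_digits(y: int, n: int) -> list[int]:
    """Signed digits of unsigned y (non-adjacent form), LSB first, n + 1 long."""
    digits = []
    while y:
        if y & 1:
            d = 2 - (y & 3)  # 1 if y % 4 == 1, -1 if y % 4 == 3
            y -= d
        else:
            d = 0
        digits.append(d)
        y >>= 1
    return digits + [0] * (n + 1 - len(digits))


def multiply(x: int, y: int, n: int = 8) -> int:
    """Signed n-bit x times unsigned n-bit y via signed-digit partial products."""
    width = 2 * n
    pp_mask = (1 << (n + 1)) - 1     # (n+1)-bit original partial product
    out_mask = (1 << width) - 1
    total = 0
    for k, d in enumerate(naf_digits(y, n)):
        correction = 1 if d == -1 else 0
        if d == 1:
            pp = x & pp_mask          # X
        elif d == -1:
            pp = ~x & pp_mask         # inverted X; the +1 comes from `correction`
        else:
            pp = 0
        if pp >> n:                   # sign-extend n+1 bits to 2n bits
            pp |= out_mask ^ pp_mask
        total += ((pp << k) & out_mask) + (correction << k)
    total &= out_mask
    return total - (1 << width) if total >> (width - 1) else total


print(multiply(-128, 255), -128 * 255)  # -32640 -32640
```

In hardware the sum inside the loop would be carried out by the Wallace tree group circuit and the final adder; here plain integer addition replaces both so that only the encoding, correction and sign-extension logic is being exercised.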
In the data processing method provided by this embodiment, the partial products of the target code are accumulated according to the correction values to obtain two output signals, and the two output signals are accumulated according to the correction values to obtain the target operation result. The method produces few effective partial products, which reduces the complexity of implementing multiplication in the multiplier; it can also improve the efficiency of the multiplication operation and reduce the power consumption of the multiplier.
Fig. 12 is a flowchart of a data processing method according to an embodiment, which may be executed by the multipliers shown in fig. 2 and fig. 7; this embodiment relates to the data multiplication process. As shown in fig. 12, the method includes the following steps:
S201, receiving data to be processed.
Specifically, the multiplier may receive data to be processed through the regular signed number encoding circuit, in which case the data is the multiplier operand, and through the modified partial product obtaining circuit, in which case the data may be the multiplicand. Optionally, the multiplier operand and the multiplicand may be 8, 16, 32 or 64 bits wide; this embodiment does not limit the bit width. Both the multiplier operand and the multiplicand may be fixed-point numbers.
S202, performing regular signed number encoding on the data to be processed to obtain the target code.
Specifically, the multiplier can use the regular signed number encoding circuit to perform regular signed number encoding on the received data to be processed, namely the multiplier operand, to obtain the target code. The bit width of the target code may be equal to the bit width N of the multiplier operand plus 1.
Optionally, performing regular signed number encoding on the data to be processed in S202 to obtain the target code may include: converting each run of l consecutive 1 bits in the data to be processed into l+1 digits whose highest digit is 1, whose lowest digit is -1 and whose remaining digits are 0, where l ≥ 2, to obtain the target code.
It should be noted that the regular signed number encoding process can be characterized as follows. For an N-bit multiplier operand, the bits are processed from low order to high order. If there are l consecutive 1 bits ($l \ge 2$), they are converted into the digit string "$1\,(0)^{l-1}\,(-1)$", and the remaining N-l bits are combined with the converted l+1 digits to form new data. The new data is then used as the input of the next round of conversion, until it contains no run of l ($l \ge 2$) consecutive 1 bits. After regular signed number encoding, the target code of an N-bit multiplier operand may be N+1 bits wide.
S203, obtaining the correction values according to the target code.
Specifically, the modified partial product obtaining circuit may obtain the correction value corresponding to the received target code according to a correction rule. The rule may be characterized as follows: the target code received by the modified partial product obtaining circuit may contain three values, -1, 0 and 1; when a value in the target code is -1, the corresponding correction value may be 1, and when it is 1 or 0, the corresponding correction value may be 0. Optionally, each target code has a corresponding partial product and a corresponding correction value.
S204, converting the data to be processed and the target code to obtain a partial product of the target code.
Specifically, the conversion may be characterized as converting the target code into partial products of the target code based on the data to be processed, which is the multiplicand.
Optionally, this embodiment does not limit the order in which steps S203 and S204 are performed.
S205, accumulating the partial products of the target code according to the correction values to obtain the target operation result.
Specifically, the Wallace tree group circuit in the multiplier may accumulate the values in each column of the partial products of the target code according to the correction values, obtaining and outputting the target operation result of the multiplication. The correction values can be used as carry input signals of the Wallace tree group circuit.
In the data processing method provided by this embodiment, data to be processed is received; regular signed number encoding is applied to it to obtain a target code; the data to be processed and the target code are converted into partial products of the target code and correction values; and the partial products are accumulated according to the correction values to obtain the target operation result. Because the received data is encoded by the regular signed number encoding circuit, the number of effective partial products is small, which reduces the complexity of implementing multiplication in the multiplier; the method can also improve the efficiency of the multiplication operation and reduce the power consumption of the multiplier.
Another embodiment provides a data processing method in which converting the data to be processed and the target code to obtain the partial products of the target code in step S204 includes:
S2041, obtaining sign-extended partial products according to the data to be processed and the target code.
Specifically, the modified partial product obtaining circuit may obtain sign-extended partial products from the received multiplicand and the target code produced by the regular signed number encoding circuit.
S2042, converting the sign-extended partial products and the correction values to obtain the partial products of the target code.
Specifically, the modified partial product obtaining circuit may obtain the correction value corresponding to the received target code according to a correction rule.
For example, if the target code received by the modified partial product obtaining circuit contains the three values -1, 0 and 1, and the received multiplicand is X, the circuit can determine the correction value for each value in the target code according to the correction rule. Optionally, the rule may be that when a value in the target code is -1, the corresponding correction value is 1, and when it is 0 or 1, the corresponding correction value is 0.
It should be noted that each partial product of the target code may be assigned a number starting from 1, the partial product obtained from the lowest-order digit of the target code being numbered 1. Optionally, the partial product of the first target code may be equal to the first sign-extended partial product. From the partial product of the second target code onward, the i-th partial product of the target code may be obtained by keeping the low (2N-i+1) bits of the i-th sign-extended partial product and shifting them left by (i-1) bits; this is equivalent to excluding the high (i-1) bits of the i-th sign-extended partial product from the accumulation. Here i denotes the number of the sign-extended partial product.
According to the data processing method provided by this embodiment, sign-extended partial products are obtained from the data to be processed and the target code; the sign-extended partial products and the correction values are converted into partial products of the target code; and these partial products are accumulated according to the correction values to obtain the target operation result. The method produces few effective partial products, which reduces the complexity of implementing multiplication in the multiplier; it can also improve the efficiency of the multiplication operation and reduce the power consumption of the multiplier.
In one embodiment, obtaining sign-extended partial products according to the data to be processed and the target code in S2041 includes:
S2041a, obtaining original partial products according to the data to be processed and the target code.
Specifically, the number of original partial products may be equal to the number of target codes. Illustratively, if the modified partial product obtaining circuit receives an 8-bit multiplicand $x_7x_6x_5x_4x_3x_2x_1x_0$ (i.e., X), it can directly obtain, from X, the original partial product corresponding to each of the three values that may appear in the target code: when the value is -1, the original partial product may be -X; when the value is 1, it may be X; and when the value is 0, it may be 0, that is, all 9 bits of the original partial product are 0.
S2041b, performing sign extension on the original partial products to obtain the sign-extended partial products.
Specifically, the bit width of a sign-extended partial product may be equal to twice the bit width N of the data currently processed by the multiplier. The original partial product may be N+1 bits wide (9 bits for an 8-bit multiplicand, since -X can need one more bit than X), in which case the number of sign extension bits is N-1. Optionally, sign extension can be understood as filling the sign extension bits with the sign bit of the original partial product, that is, its highest-order bit, which yields a 2N-bit sign-extended partial product. Optionally, when all sign-extended partial products are arranged, their highest-order bits may be aligned in one column, their lowest-order bits in one column, and the remaining bits of equal order likewise in common columns.
Illustratively, if the multiplier is currently handling an 8-bit by 8-bit fixed-point multiplication and an original partial product obtained by the modified partial product obtaining circuit is $p_8p_7p_6p_5p_4p_3p_2p_1p_0$, the sign-extended partial product can be expressed as $p_8p_8p_8p_8p_8p_8p_8p_8p_7p_6p_5p_4p_3p_2p_1p_0$.
According to the data processing method provided by this embodiment, original partial products are obtained from the data to be processed and the target code, sign extension is applied to them to obtain sign-extended partial products, and the sign-extended partial products are accumulated according to the correction values to obtain the operation result; the method can also improve the efficiency of the multiplication operation and reduce the power consumption of the multiplier.
Another embodiment provides a data processing method, in which the step S205 of accumulating the partial product of the target code according to the correction value to obtain a target operation result includes:
S2051: Accumulate the partial products of the target codes to obtain two output signals.
Specifically, the multiplier can use the Wallace tree sub-circuits in the Wallace tree group circuit to accumulate the values in each column of the partial products of all target codes. This produces two output signals: the carry output signal $\mathrm{Carry} = \{\mathrm{Carry}_0, \mathrm{Carry}_1, \ldots, \mathrm{Carry}_i\}$ and the sum output signal $\mathrm{Sum} = \{\mathrm{Sum}_0, \mathrm{Sum}_1, \ldots, \mathrm{Sum}_i\}$, where $i$ is the index of the Wallace tree sub-circuit, counted from 0. Optionally, the number of input signals received by each Wallace tree sub-circuit may equal the number of sign-extended partial products. Starting from the second Wallace tree sub-circuit, the number of carry input signals received by each sub-circuit may equal the number of carry output signals produced by the preceding sub-circuit. Optionally, the regular signed number encoding circuit may produce n+1 target codes, numbered 1, …, n+1, where n is the bit width of the data received by the multiplier. In that case, the carry input signals received by the first Wallace tree sub-circuit may be the n-1 correction values corresponding to the target codes numbered 1 to n-1. Each Wallace tree sub-circuit outputs two signals: the carry output signal $\mathrm{Carry}_i$ and the sum output signal $\mathrm{Sum}_i$. Optionally, the number of input ports of each Wallace tree sub-circuit may equal the total number of values taken from the corresponding column of the target-code partial products and from the correction values.
It should be noted that the Wallace tree sub-circuits in the Wallace tree group circuit may be numbered 0, 1, …, i, and the original partial products received by the Wallace tree group circuit may likewise be numbered 0, 1, …, i. In that case, the carry input signals received by the first Wallace tree sub-circuit may be the i-1 correction values corresponding to the original partial products numbered 0, 1, …, i-2.
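A minimal Python sketch can model the column-wise accumulation described above. Each column is reduced with 3:2 compressors (full adders) until one sum bit and one carry bit remain. Carries produced inside column i are passed to column i+1, and the correction values enter the first sub-circuit as extra carry inputs. This is only a sequential functional model under these assumptions. A real Wallace tree performs the reductions in parallel levels, and the helper names (`to_columns`, `wallace_columns`, `bits_to_int`) are hypothetical.

```python
from typing import Sequence


def to_columns(rows: list[int], width: int) -> list[list[int]]:
    """Split aligned, sign-extended partial products into bit columns (column 0 = LSB)."""
    return [[(row >> i) & 1 for row in rows] for i in range(width)]


def bits_to_int(bits: list[int]) -> int:
    """Interpret a list of bits, LSB first, as an unsigned integer."""
    return sum(bit << i for i, bit in enumerate(bits))


def wallace_columns(columns: list[list[int]],
                    correction_bits: Sequence[int] = ()) -> tuple[list[int], list[int]]:
    """Reduce each column to one Sum_i bit (weight 2**i) and one Carry_i bit (weight 2**(i+1)).

    Carries leaving the last column are dropped, so the result is exact
    modulo 2**len(columns).
    """
    width = len(columns)
    # carry_in[i]: carries handed from sub-circuit i-1 to sub-circuit i;
    # correction bits are extra carry inputs of the first sub-circuit
    carry_in: list[list[int]] = [list(correction_bits)] + [[] for _ in range(width)]
    carry_out: list[int] = []
    sum_out: list[int] = []
    for i in range(width):
        bits = list(columns[i]) + carry_in[i]
        while len(bits) > 3:                              # 3:2 compression
            a, b, c = bits.pop(), bits.pop(), bits.pop()
            bits.append(a ^ b ^ c)                        # sum bit stays in column i
            carry_in[i + 1].append((a & b) | (a & c) | (b & c))  # carry moves to i+1
        bits += [0] * (3 - len(bits))                     # pad to exactly three bits
        a, b, c = bits
        sum_out.append(a ^ b ^ c)                         # Sum_i
        carry_out.append((a & b) | (a & c) | (b & c))     # Carry_i
    return carry_out, sum_out


rows = [0x1234, 0xFFFF, 0x0F0F, 0x8001, 0x7777]
carry, s = wallace_columns(to_columns(rows, 16), correction_bits=[1, 1])
total = (bits_to_int(s) + (bits_to_int(carry) << 1)) % (1 << 16)
print(total == (sum(rows) + 2) % (1 << 16))   # True
```

Every full adder preserves the weighted sum of its inputs. As a result, the Sum vector plus the Carry vector shifted left by one bit always equals the sum of all partial products and correction bits, modulo $2^{\text{width}}$.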
S2052: Accumulate the correction value and the two output signals to obtain the target operation result.
Specifically, the carry input signal of the accumulation circuit may be the correction value corresponding to the last-numbered target code. Optionally, the multiplier can use the accumulation circuit to shift the carry output signal Carry from the Wallace tree group circuit, which gives a shifted carry output signal Carry'. The accumulation circuit then adds Carry', the sum output signal Sum from the Wallace tree group circuit, and its own carry input signal, and outputs the operation result. Optionally, the shift may be a left shift by one bit. After Carry is shifted left by one bit, its lowest bit position is empty. The accumulation circuit can fill that position with the correction value corresponding to the second-to-last target code, which yields the shifted carry output signal Carry'.
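The accumulation step can be summarized in a short, hedged Python sketch. Carry is shifted left by one bit, the vacated LSB is filled with one correction value, and a second correction value is used as the adder's carry input. The function name and parameter names are hypothetical. The sketch also assumes the correction values are single bits, presumably the deferred "+1" terms of two's-complement negation for negative target-code digits, although this section does not state that explicitly.

```python
def final_accumulate(carry: int, sum_: int, fill_bit: int, carry_in: int, width: int) -> int:
    """Model of the accumulation circuit (hypothetical helper).

    carry:    Carry vector from the Wallace tree group circuit (bit i = Carry_i)
    sum_:     Sum vector from the Wallace tree group circuit (bit i = Sum_i)
    fill_bit: correction value written into the LSB vacated by the shift
              (second-to-last target code)
    carry_in: correction value used as the adder's carry input
              (last target code)
    """
    mask = (1 << width) - 1
    shifted = ((carry << 1) | (fill_bit & 1)) & mask   # Carry'
    return (shifted + sum_ + (carry_in & 1)) & mask    # result truncated to width bits


print(final_accumulate(0b0101, 0b0011, fill_bit=1, carry_in=1, width=8))   # 15
```

In the example, Carry' = 0b1011 = 11, so the result is 11 + 3 + 1 = 15. Filling the vacated LSB costs nothing in hardware, which is presumably why one correction value is placed there rather than being added as another row.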
In the data processing method provided by this embodiment, the partial products of the target code are accumulated according to the correction value to obtain two output signals, and the two output signals are accumulated according to the correction value to obtain a target operation result, so that the number of effective partial products obtained by the above method is small, thereby reducing the complexity of the multiplier for realizing multiplication; meanwhile, the method can improve the operation efficiency of multiplication operation and effectively reduce the power consumption of the multiplier.
An embodiment of the present application further provides a machine learning operation device, which includes one or more of the multipliers described in the present application. The device is configured to obtain data to be operated on and control information from other processing devices, perform a specified machine learning operation, and transmit the execution result to peripheral devices through an I/O interface. Peripheral devices include, for example, cameras, displays, mice, keyboards, network cards, Wi-Fi interfaces, and servers. When more than one multiplier is included, the multipliers can be linked through a specific structure to transmit data, for example by interconnecting over a PCIe bus, to support larger-scale machine learning operations. In this case, the multipliers may share one control system or have separate control systems, and they may share memory or each have its own memory. In addition, the multipliers may be interconnected in any topology.
The machine learning arithmetic device has high compatibility and can be connected with various types of servers through PCIE interfaces.
An embodiment of the present application further provides a combined processing device, which includes the machine learning arithmetic device, a universal interconnection interface, and other processing devices. The machine learning arithmetic device interacts with the other processing devices to jointly complete the operation specified by the user. Fig. 13 is a schematic view of the combined processing device.
The other processing devices include one or more general-purpose or special-purpose processors, such as central processing units (CPUs), graphics processing units (GPUs), and neural network processors. The number of processors in the other processing devices is not limited. The other processing devices serve as the interface between the machine learning arithmetic device and external data and control. They handle data transfer and perform basic control of the machine learning arithmetic device, such as starting and stopping it. The other processing devices can also cooperate with the machine learning arithmetic device to complete computing tasks.
The universal interconnection interface transmits data and control instructions between the machine learning arithmetic device and the other processing devices. The machine learning arithmetic device obtains the required input data from the other processing devices and writes it into the on-chip storage device of the machine learning arithmetic device. It can obtain control instructions from the other processing devices and write them into the on-chip control cache. It can also read data from its storage module and transmit the data to the other processing devices.
Optionally, as shown in Fig. 14, the structure may further include a storage device connected to both the machine learning arithmetic device and the other processing devices. The storage device stores data of the machine learning arithmetic device and the other processing devices. It is particularly suitable for data that must be computed but cannot be held entirely in the internal storage of the machine learning arithmetic device or the other processing devices.
The combined processing device can serve as the SoC (system on chip) of equipment such as mobile phones, robots, unmanned aerial vehicles, and video surveillance equipment. This effectively reduces the core area of the control portion, increases processing speed, and reduces overall power consumption. In this case, the universal interconnection interface of the combined processing device is connected to certain components of the equipment, such as a camera, display, mouse, keyboard, network card, or Wi-Fi interface.
In some embodiments, a chip is also claimed, which includes the above machine learning arithmetic device or the combined processing device.
In some embodiments, a chip package structure is provided, which includes the above chip.
In some embodiments, a board card is provided, which includes the above chip package structure. As shown in Fig. 15, the board card may include other supporting components in addition to the chip 389, including but not limited to: a memory device 390, a receiving device 391, and a control device 392;
the memory device 390 is connected to the chip in the chip package structure through a bus and is used to store data. The memory device may include a plurality of groups of memory cells 393, and each group is connected to the chip through a bus. It can be understood that each group of memory cells may be DDR SDRAM (Double Data Rate SDRAM).
DDR doubles the data rate of SDRAM without increasing the clock frequency, because it transfers data on both the rising and falling edges of the clock. DDR is therefore twice as fast as standard SDRAM. In one embodiment, the memory device may include 4 groups of memory cells, and each group may include a plurality of DDR4 chips. In one embodiment, the chip may internally include four 72-bit DDR4 controllers. Of the 72 bits in each controller, 64 are used for data transfer and 8 for ECC checking. It can be understood that when DDR4-3200 chips are used in each group of memory cells, the theoretical data-transfer bandwidth can reach 25600 MB/s.
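The 25600 MB/s figure follows from simple arithmetic. DDR4-3200 runs a 1600 MHz I/O clock and transfers on both edges, which gives 3200 MT/s. Only the 64 data bits of the 72-bit interface carry payload, because the 8 ECC bits do not. The Python sketch below is illustrative only, and the function name is hypothetical.

```python
def ddr_peak_bandwidth_mb_s(clock_mhz: float, data_bits: int) -> float:
    """Theoretical peak bandwidth of one DDR channel in MB/s.

    DDR transfers on both clock edges, so the transfer rate in MT/s is twice
    the I/O clock in MHz; each transfer moves data_bits / 8 bytes.
    ECC bits are excluded because they carry no payload.
    """
    mega_transfers_per_s = 2 * clock_mhz          # 3200 MT/s for DDR4-3200
    return mega_transfers_per_s * data_bits / 8


# 72-bit DDR4 controller: 64 data bits + 8 ECC bits
print(ddr_peak_bandwidth_mb_s(1600, 64))   # 25600.0
```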
In one embodiment, each group of the memory cells includes a plurality of double rate synchronous dynamic random access memories arranged in parallel. DDR can transfer data twice in one clock cycle. And a controller for controlling DDR is arranged in the chip and is used for controlling data transmission and data storage of each memory unit.
The receiving device is electrically connected to the chip in the chip package structure and transfers data between the chip and an external device, such as a server or computer. For example, in one embodiment, the receiving device may be a standard PCIe interface, and the server transfers the data to be processed to the chip through it. Preferably, when a PCIe 3.0 x16 interface is used, the theoretical bandwidth can reach 16000 MB/s. In another embodiment, the receiving device may be another interface. The present application does not limit the specific form of that interface, provided the interface unit can relay the data. In addition, the calculation results of the chip are transmitted back to the external device (e.g., a server) by the receiving device.
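The quoted 16000 MB/s matches the raw signalling rate of PCIe 3.0 x16 in one direction: 8 GT/s per lane × 16 lanes ÷ 8 bits per byte. PCIe 3.0 uses 128b/130b line encoding, so the usable rate is slightly lower, at about 15754 MB/s. The Python helper below shows both figures. Its name is hypothetical, and it ignores packet and protocol overhead.

```python
def pcie_bandwidth_mb_s(gt_per_s: float, lanes: int, efficiency: float = 1.0) -> float:
    """Theoretical one-direction PCIe bandwidth in MB/s.

    gt_per_s:   per-lane signalling rate (8 GT/s for PCIe 3.0)
    efficiency: line-code efficiency; 1.0 gives the raw figure,
                128/130 accounts for PCIe 3.0's 128b/130b encoding
    """
    return gt_per_s * 1000 * lanes * efficiency / 8


print(pcie_bandwidth_mb_s(8, 16))                         # 16000.0 (raw)
print(round(pcie_bandwidth_mb_s(8, 16, 128 / 130), 1))    # 15753.8
```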
The control device is electrically connected to the chip and monitors the state of the chip. Specifically, the chip and the control device may be electrically connected through an SPI interface. The control device may include a microcontroller unit (MCU). The chip may include multiple processing chips, multiple processing cores, or multiple processing circuits, and it may drive multiple loads. Therefore, the chip can operate in different states, such as heavy load and light load. The control device can regulate the operating states of the multiple processing chips, multiple processing cores, and/or multiple processing circuits in the chip.
In some embodiments, an electronic device is provided that includes the above board card.
The electronic device may be a multiplier, a robot, a computer, a printer, a scanner, a tablet, a smart terminal, a cell phone, a tachograph, a navigator, a sensor, a camera, a server, a cloud server, a camera, a camcorder, a projector, a watch, a headset, a mobile storage, a wearable device, a vehicle, a household appliance, and/or a medical device.
The vehicles include airplanes, ships, and/or automobiles. The household appliances include televisions, air conditioners, microwave ovens, refrigerators, rice cookers, humidifiers, washing machines, electric lamps, gas stoves, and range hoods. The medical equipment includes nuclear magnetic resonance instruments, B-mode ultrasound scanners, and/or electrocardiographs.
It should be noted that, for simplicity of description, the foregoing method embodiments are described as a series of circuit combinations, but those skilled in the art should understand that the present application is not limited by the described circuit combinations, because some circuits may be implemented in other ways or structures according to the present application. Further, those skilled in the art should also appreciate that the embodiments described in the specification are all alternative embodiments, and that the devices and modules referred to are not necessarily required for this application.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
The above embodiments express only several implementations of the present application. Although their description is specific and detailed, it should not be construed as limiting the scope of the present application. It should be noted that a person skilled in the art can make several variations and modifications without departing from the concept of the present application, and all of these fall within the scope of protection of the present application. Therefore, the scope of protection of this patent shall be subject to the appended claims.

Claims (10)

1. A multiplier, characterized by comprising: a regular signed number coding circuit, a correction partial product acquisition circuit, a Wallace tree group circuit and an accumulation circuit, wherein the output end of the regular signed number coding circuit is connected to the input end of the correction partial product acquisition circuit, the output end of the correction partial product acquisition circuit is connected to the input end of the Wallace tree group circuit, the output end of the Wallace tree group circuit is connected to the input end of the accumulation circuit, and the correction partial product acquisition circuit comprises a partial product acquisition branch and a target-code partial product acquisition branch;
the regular signed number coding circuit is used for carrying out regular signed number coding processing on received data to obtain a target code, the correction partial product acquisition circuit is used for obtaining a partial product after sign bit expansion and a correction numerical value according to the target code, the partial product acquisition branch of the target code is used for obtaining a partial product of the target code according to the partial product after sign bit expansion, the Wallace tree group circuit is used for carrying out accumulation processing according to the partial product of the target code and the correction numerical value, and the accumulation circuit is used for carrying out addition operation on an accumulation operation result and the correction numerical value.
2. The multiplier of claim 1, wherein the regular signed number encoding circuit comprises a data input port and a target code output port, wherein the data input port is used to receive first data to be subjected to the regular signed number coding processing, and the target code output port is used to output the target code obtained after the received first data is subjected to the regular signed number coding processing.
3. The multiplier of claim 1 or 2, wherein the modified partial product acquisition circuit comprises a target code input port, a data input port and a partial product output port, wherein the target code input port is used to receive the target code, the data input port is used to receive second data, and the partial product output port is used to output the partial products of the target code obtained from the second data and the target code.
4. The multiplier of claim 1, wherein the Wallace tree group circuit comprises a correction value input port, a partial product input port, a sum signal output port and a carry signal output port, wherein the correction value input port is used to receive the correction value obtained by the modified partial product acquisition circuit, the partial product input port is used to receive the sign-extended partial product obtained by the modified partial product acquisition circuit, the sum signal output port is used to output the sum signal obtained by the Wallace tree group circuit, and the carry signal output port is used to output the carry signal obtained by the Wallace tree group circuit.
5. The multiplier of claim 4, wherein the Wallace tree group circuit comprises Wallace tree sub-circuits, which are used to accumulate the values in each column of the partial products of all target codes to obtain an accumulation result.
6. The multiplier of claim 1, wherein the accumulation circuit comprises an adder, which is used to accumulate the accumulation result according to the correction value to obtain a target operation result.
7. The multiplier of claim 6, wherein the adder comprises a carry signal input port, a sum signal input port, a carry input port and a result output port, wherein the carry signal input port is used to receive the carry signal, the sum signal input port is used to receive the sum signal, the carry input port is used to receive the correction value obtained by the correction partial product acquisition circuit and fill the correction value into the lowest bit of the carry input signal, and the result output port is used to output the result of accumulating the carry signal, the sum signal and the correction value.
8. A machine learning operation device, wherein the machine learning operation device comprises one or more multipliers according to any one of claims 1 to 7, and is configured to obtain input data and control information to be operated from other processing devices except the multipliers in the machine learning operation device, execute a specified machine learning operation, and transmit an execution result to other processing devices except the multipliers in the machine learning operation device through an I/O interface;
when the machine learning arithmetic device comprises a plurality of multipliers, the multipliers are connected through a preset structure and transmit data;
the multipliers are interconnected through a PCIE bus and transmit data so as to support larger-scale machine learning operation; a plurality of multipliers share the same control system or own respective control systems; a plurality of multipliers share a memory or own respective memories; the interconnection mode of a plurality of multipliers is any interconnection topology.
9. A combined processing apparatus, characterized in that the combined processing apparatus comprises the machine learning arithmetic apparatus according to claim 8, a common interconnection interface, and processing means other than the machine learning arithmetic apparatus in the combined processing apparatus;
and the machine learning arithmetic device interacts with other processing devices except the machine learning arithmetic device in the combined processing device to jointly complete the calculation operation designated by the user.
10. The combined processing device according to claim 9, further comprising a storage device connected respectively to the machine learning arithmetic device and to the processing devices other than the machine learning arithmetic device in the combined processing device, and used to store data of the machine learning arithmetic device and of the other processing devices.
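The claims depend on a regular signed number coding circuit whose target code reduces the number of effective partial products. The Python sketch below assumes that "regular signed number coding" corresponds to canonical signed digit (CSD) recoding, also known as non-adjacent form. In CSD, every digit is −1, 0, or +1, and no two adjacent digits are non-zero. Under that assumption, each non-zero digit contributes one shifted partial product, +y or −y, and zero digits contribute none. The function names are hypothetical. The sketch also omits sign extension, correction values, and the Wallace tree.

```python
def csd_digits(x: int) -> list[int]:
    """Canonical signed digit (non-adjacent form) recoding of x.

    Returns digits LSB first, each in {-1, 0, 1}, with no two adjacent
    non-zero digits. Also works for negative x.
    """
    digits = []
    while x != 0:
        if x & 1:
            d = 2 - (x & 3)   # x % 4 == 1 -> +1, x % 4 == 3 -> -1
            x -= d            # makes x divisible by 4
        else:
            d = 0
        digits.append(d)
        x >>= 1               # exact halving, since x is even here
    return digits


def csd_multiply(x: int, y: int) -> tuple[int, int]:
    """Multiply via CSD: one partial product (+y or -y, shifted) per non-zero digit.

    Returns (product, number_of_effective_partial_products).
    """
    partials = [d * (y << i) for i, d in enumerate(csd_digits(x)) if d != 0]
    return sum(partials), len(partials)


print(csd_digits(119))        # [-1, 0, 0, -1, 0, 0, 0, 1], i.e. 128 - 8 - 1
print(csd_multiply(119, 57))  # (6783, 3)
```

In plain binary, 119 = 0b1110111 has six 1 bits and would need six partial products, whereas its CSD form needs only three. For an unsigned n-bit value, the CSD form has at most n+1 digits, which would be consistent with the n+1 target codes mentioned in the description. This correspondence is an interpretation and is not stated in the claims.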
CN201921434164.XU 2019-08-30 2019-08-30 Multiplier, machine learning arithmetic device and combination processing device Active CN209879492U (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201921434164.XU CN209879492U (en) 2019-08-30 2019-08-30 Multiplier, machine learning arithmetic device and combination processing device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201921434164.XU CN209879492U (en) 2019-08-30 2019-08-30 Multiplier, machine learning arithmetic device and combination processing device

Publications (1)

Publication Number Publication Date
CN209879492U true CN209879492U (en) 2019-12-31

Family

ID=68949510

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201921434164.XU Active CN209879492U (en) 2019-08-30 2019-08-30 Multiplier, machine learning arithmetic device and combination processing device

Country Status (1)

Country Link
CN (1) CN209879492U (en)

Legal Events

Date Code Title Description
GR01 Patent grant