CN110378478B

CN110378478B - Multiplier, data processing method, chip and electronic equipment

Info

Publication number: CN110378478B
Application number: CN201910819000.7A
Authority: CN
Inventors: Name withheld at the inventor's request
Original assignee: Shanghai Cambricon Information Technology Co Ltd
Current assignee: Shanghai Cambricon Information Technology Co Ltd
Priority date: 2019-08-30
Filing date: 2019-08-30
Publication date: 2023-09-08
Anticipated expiration: 2039-08-30
Also published as: CN110378478A

Abstract

The application provides a multiplier, a data processing method, a chip and an electronic device. The multiplier comprises a canonical signed-digit (CSD) encoding circuit and a correction accumulation circuit, the output of the CSD encoding circuit being connected to the input of the correction accumulation circuit. The multiplier applies CSD encoding to the received data through the CSD encoding circuit to obtain original partial products, and performs an addition and a decision on the two highest-order bits of each original partial product to eliminate sign extension, thereby obtaining partial products with sign extension eliminated. Finally, the correction accumulation circuit accumulates and corrects these partial products to obtain the target operation result.

Description

Multiplier, data processing method, chip and electronic equipment

Technical Field

The present application relates to the field of computer technologies, and in particular, to a multiplier, a data processing method, a chip, and an electronic device.

Background

With the continuous development of digital electronics and the rapid growth of artificial intelligence (AI) chips, the demand for high-performance digital multipliers keeps increasing. Neural network algorithms are among the algorithms most widely used on intelligent chips, and multiplication performed by a multiplier is one of the most common operations in them.

At present, a multiplier typically encodes the multiplier operand in groups of three bits, generates a partial product from the multiplicand for each code, and compresses all the partial products with a Wallace tree to obtain the result of the multiplication. In this conventional technique, however, the codes contain many non-zero digits and therefore produce many partial products, so the multiplier is complex to implement.

Disclosure of Invention

In view of the above, it is necessary to provide a multiplier, a data processing method, a chip and an electronic device that reduce the number of effective partial products generated during multiplication and thereby reduce the complexity of the multiplier.

An embodiment of the present application provides a multiplier comprising a canonical signed-digit (CSD) encoding circuit and a correction accumulation circuit, where the output of the CSD encoding circuit is connected to the input of the correction accumulation circuit. The CSD encoding circuit is configured to apply CSD encoding to the received data to obtain partial products with sign extension eliminated, and the correction accumulation circuit is configured to accumulate and correct those partial products.

In one embodiment, the CSD encoding circuit includes a CSD encoding processing unit and a partial product acquisition unit, where the output of the CSD encoding processing unit is connected to the input of the partial product acquisition unit;

the CSD encoding processing unit is configured to apply CSD encoding to received first data to obtain a target code, and the partial product acquisition unit is configured to obtain original partial products according to the target code and to perform logic operations on the original partial products.

In one embodiment, the partial product acquisition unit is specifically configured to obtain an original partial product according to the target code and to perform a binary addition on the highest-order bit of the original partial product, so as to obtain the partial product with sign extension eliminated.

In one embodiment, the partial product acquisition unit includes a first full adder.

In one embodiment, the partial product acquisition unit includes a target code input port, a data input port and a partial product output port; the target code input port receives the target code, the data input port receives second data, and the partial product output port outputs the partial product with sign extension eliminated, obtained from the target code and the received second data.

In one embodiment, the correction accumulation circuit includes a full adder configured to accumulate and correct the received partial products with sign extension eliminated.
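The text states only that the correction accumulation circuit includes a full adder. One common way to use full adders for accumulation is carry-save (3:2) compression, and the following Python sketch assumes that scheme: partial-product rows, already in sign-extension-free non-negative form, are reduced three at a time with a bitwise full adder, a correction constant enters as one extra row, and the last two rows are added modulo 2**width. The function names and the `width` parameter are hypothetical.

```python
from typing import List, Tuple


def full_adder_rows(a: int, b: int, c: int, mask: int) -> Tuple[int, int]:
    """Bitwise full adder applied to whole rows (a 3:2 compressor).

    a + b + c == sum_row + carry_row holds modulo 2**width.
    """
    sum_row = a ^ b ^ c
    carry_row = ((a & b) | (a & c) | (b & c)) << 1  # majority bit moves one column left
    return sum_row & mask, carry_row & mask


def accumulate_with_correction(rows: List[int], correction: int, width: int) -> int:
    """Carry-save accumulation of partial-product rows plus one correction row."""
    mask = (1 << width) - 1
    pending = [row & mask for row in rows] + [correction & mask]
    while len(pending) > 2:  # reduce three rows to two
        a, b, c = pending.pop(), pending.pop(), pending.pop()
        pending.extend(full_adder_rows(a, b, c, mask))
    return sum(pending) & mask  # final carry-propagate addition
```

A hardware version would compress individual bit columns with a tree of full adders (for example, a Wallace tree) rather than whole integers; working on whole words only keeps the sketch short.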

In the multiplier provided by this embodiment, the CSD encoding circuit applies CSD encoding to the received data to obtain original partial products, an addition and a decision are performed on the two highest-order bits of each original partial product to eliminate sign extension, and the correction accumulation circuit finally accumulates and corrects the resulting partial products to obtain the target operation result.

An embodiment of the application provides a data processing method comprising the following steps:

receiving data to be processed;

applying CSD encoding to the data to be processed to obtain a target code;

obtaining partial products with sign extension eliminated according to the data to be processed and the target code; and

accumulating and correcting the partial products with sign extension eliminated to obtain a target operation result.
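As a rough integer-level model of these four steps, the following Python sketch encodes the multiplier operand, forms one partial product per digit and accumulates them. It rests on assumptions the text does not make: both operands are unsigned integers, the digits are produced with the standard non-adjacent-form (NAF) recurrence, which yields the canonical signed-digit form but is not the stage-by-stage procedure of the embodiments, and sign extension is not modelled because Python integers are unbounded. The names `naf_digits` and `multiply` are hypothetical.

```python
from typing import List, Tuple


def naf_digits(y: int) -> List[int]:
    """Canonical signed-digit (NAF) digits of y >= 0, least significant first."""
    digits = []
    while y > 0:
        if y & 1:
            d = 2 - (y & 3)  # y % 4 == 1 -> +1, y % 4 == 3 -> -1
            y -= d
        else:
            d = 0
        digits.append(d)
        y >>= 1
    return digits


def multiply(x: int, y: int) -> Tuple[int, int]:
    """Return (x * y, number of non-zero partial products)."""
    digits = naf_digits(y)                                  # target code
    partials = [d * x << i for i, d in enumerate(digits)]   # one partial product per digit
    nonzero = sum(1 for d in digits if d != 0)
    return sum(partials), nonzero                           # accumulation
```

For the operand 5486 (binary 1010101101110, eight 1 bits), `naf_digits` gives six non-zero digits, so six partial products are non-zero instead of the eight a plain shift-and-add would form.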

In one embodiment, applying CSD encoding to the data to be processed to obtain a target code includes: converting each run of l consecutive 1 bits (l ≥ 2) in the data to be processed into an (l+1)-digit value whose highest digit is 1, whose lowest digit is -1 and whose remaining digits are 0, thereby obtaining the target code.
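The run-conversion rule can be sketched in Python as follows. This illustrates the rule under stated assumptions rather than describing the circuit: the operand is given as an MSB-first bit string and treated as an unsigned pattern, each stage rewrites the lowest remaining run of 1 bits (as the worked example in the detailed description does), and one extra 0 digit is reserved at the top so that the result always has N+1 digits. The names `csd_encode`, `lowest_run_of_ones` and `signed_digit_value` are hypothetical.

```python
from typing import List, Optional, Tuple


def lowest_run_of_ones(d: List[int]) -> Optional[Tuple[int, int]]:
    """Return (lo, hi) of the lowest run of >= 2 consecutive +1 digits (LSB-first list)."""
    i = 0
    while i < len(d):
        if d[i] == 1:
            j = i
            while j + 1 < len(d) and d[j + 1] == 1:
                j += 1
            if j > i:
                return i, j
            i = j + 1
        else:
            i += 1
    return None


def csd_encode(bits: str) -> List[int]:
    """Rewrite runs of l >= 2 ones as 1(0)_{l-1}(-1), lowest run first.

    bits: MSB-first binary string of the N-bit operand (treated as unsigned).
    Returns N + 1 digits in {-1, 0, 1}, MSB first.
    """
    d = [int(b) for b in reversed(bits)] + [0]  # LSB first, plus a reserved top digit
    while (run := lowest_run_of_ones(d)) is not None:
        lo, hi = run
        # 2**lo + ... + 2**hi == 2**(hi + 1) - 2**lo
        d[lo] = -1
        for k in range(lo + 1, hi + 1):
            d[k] = 0
        d[hi + 1] = 1  # was 0 (the run is maximal); the reserved top digit keeps hi + 1 in range
    return d[::-1]


def signed_digit_value(digits: List[int]) -> int:
    """Evaluate an MSB-first signed-digit list."""
    value = 0
    for digit in digits:
        value = 2 * value + digit
    return value
```

For the operand 001010101101110 used in the detailed description, `csd_encode` returns 0 0 1 0 -1 0 -1 0 -1 0 0 -1 0 0 -1 0, that is, the 15-digit intermediate code with one 0 prepended.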

In one embodiment, obtaining the partial products with sign extension eliminated according to the data to be processed and the target code includes:

obtaining original partial products according to the data to be processed and the target code; and

performing an addition on each original partial product to obtain the partial product with sign extension eliminated.

In one embodiment, performing an addition on the original partial product to obtain the partial product with sign extension eliminated includes: performing an AND logic operation on the highest-order bit of the original partial product to obtain the partial product with sign extension eliminated.

In the data processing method provided by this embodiment, the multiplier receives data to be processed, encodes it, obtains partial products with sign extension eliminated according to the data and the code, and accumulates and corrects those partial products to obtain the target operation result.

The machine learning operation device provided by an embodiment of the application comprises one or more of the above multipliers; it is configured to obtain data to be operated on and control information from other processing devices, to execute a specified machine learning operation, and to transmit the execution result to the other processing devices through an I/O interface;

when the machine learning operation device comprises a plurality of multipliers, the multipliers are connected and transmit data through a preset specific structure;

the multipliers are interconnected through a PCIe bus and transmit data to support larger-scale machine learning operations; the multipliers may share one control system or each have its own control system, may share memory or each have its own memory, and may be interconnected in any interconnection topology.

An embodiment of the application provides a combined processing device comprising the above machine learning operation device, a universal interconnection interface and other processing devices; the machine learning operation device interacts with the other processing devices to jointly complete the operation specified by the user. The combined processing device may further include a storage device connected to both the machine learning operation device and the other processing devices to store their data.

The neural network chip provided by an embodiment of the application comprises the above multiplier, machine learning operation device or combined processing device.

An embodiment of the application provides a neural network chip packaging structure comprising the above neural network chip.

The board card provided by an embodiment of the application comprises the above neural network chip packaging structure.

An embodiment of the application provides an electronic device comprising the above neural network chip or board card.

The chip provided by an embodiment of the application comprises at least one of the above multipliers.

The electronic device provided by an embodiment of the application comprises the above chip.

Drawings

FIG. 1 is a schematic diagram of a multiplier according to an embodiment;

FIG. 2 is a schematic diagram of another multiplier according to another embodiment;

FIG. 3 is a schematic diagram of a multiplier according to an embodiment;

FIG. 4 is a schematic diagram of another multiplier according to another embodiment;

FIG. 5 is a schematic diagram of the distribution pattern of 9 partial products with sign extension eliminated according to another embodiment;

FIG. 6 is a schematic diagram of a correction accumulation circuit for 8-bit data operations according to another embodiment;

FIG. 7 is a flow chart of a method for processing data according to an embodiment;

FIG. 8 is a flowchart of another method for processing data according to an embodiment;

FIG. 9 is a block diagram of a combination processing apparatus according to an embodiment;

FIG. 10 is a block diagram of another combination processing apparatus according to one embodiment;

FIG. 11 is a schematic structural diagram of a board card according to an embodiment.

Detailed Description

To make the objects, technical solutions and advantages of the present application clearer, the application is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described herein merely illustrate the application and are not intended to limit its scope.

The multiplier provided by the application can be applied in AI chips, field-programmable gate array (FPGA) chips or other hardware circuits that perform multiplication; its structure is shown schematically in FIG. 1 and FIG. 2.

As shown in FIG. 1, which is a block diagram of a multiplier according to an embodiment, the multiplier includes a canonical signed-digit (CSD) encoding circuit 11 and a correction accumulation circuit 12, where the output of the CSD encoding circuit 11 is connected to the input of the correction accumulation circuit 12. The CSD encoding circuit 11 is configured to apply CSD encoding to the received data to obtain partial products with sign extension eliminated, and the correction accumulation circuit 12 is configured to accumulate and correct those partial products.

Specifically, the CSD encoding circuit 11 may include several data processing units with different functions, and the data it receives may be either the multiplier operand or the multiplicand of the multiplication. Optionally, these data processing units include a unit that performs CSD encoding, which can be characterized as encoding data with the digit values 0, -1 and 1. Optionally, the multiplier operand and the multiplicand may be multi-bit fixed-point numbers. Optionally, the correction accumulation circuit 12 may apply a correction while accumulating the partial products with sign extension eliminated that the CSD encoding circuit 11 produces, so as to obtain the target operation result of the multiplication.

It should be noted that the multiplier provided in this embodiment may process multiplications of fixed-bit-width data, where the fixed bit width may be 8, 16, 32 or 64 bits; this embodiment imposes no limitation on it. In the same multiplication, however, the multiplier operand and the multiplicand received by the CSD encoding circuit 11 have the same bit width. Optionally, each data processing unit with a different function may have one input port, and the input ports of all the units may have the same function; each unit may also have one output port, the output ports of different units may have different functions, and units with different functions may have different circuit structures.

In the multiplier provided by this embodiment, the CSD encoding circuit applies CSD encoding to the received data to obtain partial products with sign extension eliminated, and the correction accumulation circuit accumulates and corrects them to obtain the target operation result. Because the received data is CSD-encoded, fewer effective partial products are generated during multiplication, which reduces the complexity of the multiplier; at the same time, the multiplier can improve the efficiency of multiplication and effectively reduce its power consumption.

FIG. 2 is a block diagram of a multiplier according to an embodiment. As shown in FIG. 2, the multiplier includes a CSD encoding circuit 21, a partial product acquisition circuit 22 and a correction accumulation circuit 23; the output of the CSD encoding circuit 21 is connected to the input of the partial product acquisition circuit 22, and the output of the partial product acquisition circuit 22 is connected to the input of the correction accumulation circuit 23. The CSD encoding circuit 21 is configured to apply CSD encoding to the received data to obtain a target code; the partial product acquisition circuit 22 is configured to obtain original partial products according to the target code and to perform logic operations on them to obtain partial products with sign extension eliminated; and the correction accumulation circuit 23 is configured to accumulate and correct the partial products with sign extension eliminated.

Optionally, the CSD encoding circuit 21 includes a data input port 211 and a target code output port 212; the data input port 211 is configured to receive the first data to be CSD-encoded, and the target code output port 212 is configured to output the target code obtained by CSD encoding the received first data.

Optionally, the partial product acquisition circuit 22 includes an original partial product acquisition unit 221 and a logic gate unit 222. The original partial product acquisition unit 221 is configured to obtain original partial products according to the target code, and the logic gate unit 222 is configured to perform a logic operation on the highest-order bit of each original partial product to obtain a partial product with sign extension eliminated. Optionally, the partial product acquisition circuit 22 includes an AND gate circuit.

Specifically, the CSD encoding circuit 21 may receive first data and apply CSD encoding to it to obtain a target code; the first data may be the multiplier operand of a multiplication. The CSD encoding process can be characterized as follows. For an N-bit multiplier operand, the bits are processed from low order to high order. If there is a run of l consecutive 1 bits (l ≥ 2), those l bits are converted into the (l+1)-digit string 1(0)_{l-1}(-1), i.e. a 1 followed by l-1 zeros and then a -1, and the remaining (N-l) bits are combined with the converted (l+1) digits to form new data. The new data is then used as the input of the next conversion stage, until no run of l (l ≥ 2) consecutive 1 bits remains. The target code obtained by CSD encoding an N-bit multiplier operand may have a bit width of (N+1). For example, the data 11 may be converted into (100 - 001), i.e. 11 is equivalent to 10(-1); the data 111 may be converted into (1000 - 0001), i.e. 111 is equivalent to 100(-1); runs of other lengths l (l ≥ 2) are converted in the same way.

For example, suppose the multiplier operand received by the CSD encoding circuit 21 is "001010101101110". The first conversion stage yields the first new data "0010101011100(-1)0"; the second stage, applied to the first new data, yields "0010101100(-1)00(-1)0"; the third stage yields "0010110(-1)00(-1)00(-1)0"; the fourth stage yields "00110(-1)0(-1)00(-1)00(-1)0"; and the fifth stage yields "010(-1)0(-1)0(-1)00(-1)00(-1)0". The fifth new data contains no run of l (l ≥ 2) consecutive 1 bits, so it is called the intermediate code; padding the intermediate code with one extra bit completes the CSD encoding. The bit width of the intermediate code may be equal to that of the multiplier operand. Optionally, in the new data obtained after CSD encoding by the CSD encoding circuit 21 (i.e. the intermediate code), if the highest and next-highest digits are "10" or "01", the CSD encoding circuit 21 may prepend one 0 above the highest digit of the intermediate code, so that the three highest digits of the corresponding target code are "010" or "001", respectively. Optionally, the bit width of the intermediate code may be equal to the bit width of the target code minus 1.
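Every conversion stage of this example must leave the numeric value unchanged, which gives a mechanical check on the digit strings. The following Python sketch parses strings in the notation of the example, where "(-1)" stands for a single digit of value -1, and prints the digit count and value of each stage; all stages should have 15 digits and evaluate to 5486, the value of the original operand. The names `STAGES`, `parse_signed_digits` and `value` are hypothetical.

```python
import re
from typing import List

STAGES = [
    "001010101101110",                 # original multiplier operand
    "0010101011100(-1)0",              # first stage
    "0010101100(-1)00(-1)0",           # second stage
    "0010110(-1)00(-1)00(-1)0",        # third stage
    "00110(-1)0(-1)00(-1)00(-1)0",     # fourth stage
    "010(-1)0(-1)0(-1)00(-1)00(-1)0",  # fifth stage (intermediate code)
]


def parse_signed_digits(text: str) -> List[int]:
    """Parse e.g. '10(-1)' into [1, 0, -1] (MSB first); spaces are ignored."""
    tokens = re.findall(r"\(-1\)|[01]", text.replace(" ", ""))
    return [-1 if t == "(-1)" else int(t) for t in tokens]


def value(digits: List[int]) -> int:
    """Evaluate an MSB-first signed-digit list."""
    v = 0
    for d in digits:
        v = 2 * v + d
    return v


for stage in STAGES:
    digits = parse_signed_digits(stage)
    print(len(digits), value(digits), stage)
```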

Optionally, the bit width of the target code may be equal to the bit width N of the multiplier operand received by the multiplier plus 1, and the bit width of the target code may be equal to the number of original partial products. The original partial product acquisition unit 221 in the partial product acquisition circuit 22 may obtain a corresponding original partial product for each digit of the target code, and the logic gate unit 222 performs a logic operation on the highest-order bit of each original partial product, directly eliminating the sign extension bits to obtain a partial product with sign extension eliminated. Optionally, the original partial product may be a partial product without sign extension. At the same time, the logic gate unit 222 determines, from the highest-order bit of the original partial product, one additional bit of the partial product with sign extension eliminated, which may be denoted by Q. Optionally, the logic gate unit 222 may include an AND gate circuit.

It should be noted that, if the highest-order bit of the original partial product is denoted by a, the partial product acquisition circuit 22 may combine this bit with the signal 1 through the AND gate circuit. The bit a' at the corresponding position of the partial product with sign extension eliminated is then the sum signal of a and the signal 1, and the additional bit Q of that partial product may be equal to the carry signal of a and the signal 1. The relationship between the highest-order bit a of the original partial product and the resulting bits a' and Q after the logic operation is shown in Table 1.
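The following Python sketch models this step under an assumption the text does not spell out: the original partial product is a W-bit two's-complement word whose top bit a is its sign. Adding the constant 1 to a gives the sum bit a' = NOT a and the carry Q = a, and the sketch checks the resulting identity: the signed value of the partial product equals the word with a replaced by a', read as an unsigned number, minus the constant 2**(W-1). The width W = 9 and all function names are hypothetical.

```python
from typing import Tuple


def half_add_one(a: int) -> Tuple[int, int]:
    """Add the constant 1 to bit a: returns (sum bit a', carry bit Q)."""
    return a ^ 1, a & 1


def strip_sign_extension(word: int, width: int) -> Tuple[int, int]:
    """Replace the sign bit a of a width-bit word by a' = NOT a; also return Q."""
    top = 1 << (width - 1)
    a = (word >> (width - 1)) & 1
    a_prime, q = half_add_one(a)
    new_word = (word & ~top) | (a_prime << (width - 1))
    return new_word, q


def to_signed(word: int, width: int) -> int:
    """Interpret a width-bit pattern as two's complement."""
    return word - (1 << width) if word >> (width - 1) else word


W = 9
for pp in range(-(1 << (W - 1)), 1 << (W - 1)):
    word = pp & ((1 << W) - 1)  # W-bit two's-complement pattern of the partial product
    new_word, q = strip_sign_extension(word, W)
    # Signed value == top-bit-inverted word (unsigned) minus the constant 2**(W-1).
    assert to_signed(word, W) == new_word - (1 << (W - 1))
    # The carry Q equals the original sign bit a.
    assert q == (word >> (W - 1)) & 1
```

Because each row then differs from its signed value only by a known constant, a multiplier following this idea can drop the sign-extension bits of every row and add one precomputed correction term during accumulation, which is one plausible role for the correction accumulation circuit.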

TABLE 1

In the multiplier provided by this embodiment, the CSD encoding circuit applies CSD encoding to the received first data to obtain a target code; the partial product acquisition circuit then obtains an original partial product for each digit of the target code and, through the logic gate unit, performs a logic operation on the high-order bits of each original partial product to eliminate sign extension; finally, the correction accumulation circuit accumulates and corrects the resulting partial products. CSD encoding of the received data reduces the number of effective partial products generated during multiplication and thus the complexity of the multiplier; at the same time, the multiplier can improve the efficiency of multiplication and effectively reduce its power consumption.

FIG. 3 is a schematic diagram of a specific structure of a multiplier according to an embodiment. As shown in FIG. 3, the multiplier includes the CSD encoding circuit 11, which includes a CSD encoding processing unit 111 and a partial product acquisition unit 112; the output of the CSD encoding processing unit 111 is connected to the input of the partial product acquisition unit 112. The CSD encoding processing unit 111 is configured to apply CSD encoding to the received first data to obtain a target code, and the partial product acquisition unit 112 is configured to obtain original partial products according to the target code and to perform logic operations on them.

Optionally, the partial product acquisition unit 112 is specifically configured to obtain an original partial product according to the target code and to perform a binary addition on the highest-order bit of the original partial product, so as to obtain the partial product with sign extension eliminated. Optionally, the partial product acquisition unit 112 includes first full adders 112a and 112b.

Specifically, the CSD encoding processing unit 111 may receive first data and apply CSD encoding to it to obtain a target code; the first data may be the multiplier operand of a multiplication. The CSD encoding process can be characterized as follows. For an N-bit multiplier operand, the bits are processed from low order to high order. If there is a run of l consecutive 1 bits (l ≥ 2), those l bits are converted into the (l+1)-digit string 1(0)_{l-1}(-1), i.e. a 1 followed by l-1 zeros and then a -1, and the remaining (N-l) bits are combined with the converted (l+1) digits to form new data. The new data is then used as the input of the next conversion stage, until no run of l (l ≥ 2) consecutive 1 bits remains. The target code obtained by CSD encoding an N-bit multiplier operand may have a bit width of (N+1). For example, the data 11 may be converted into (100 - 001), i.e. 11 is equivalent to 10(-1); the data 111 may be converted into (1000 - 0001), i.e. 111 is equivalent to 100(-1); runs of other lengths l (l ≥ 2) are converted in the same way.

For example, suppose the multiplier operand received by the CSD encoding processing unit 111 is "001010101101110". The first conversion stage yields the first new data "0010101011100(-1)0"; the second stage, applied to the first new data, yields "0010101100(-1)00(-1)0"; the third stage yields "0010110(-1)00(-1)00(-1)0"; the fourth stage yields "00110(-1)0(-1)00(-1)00(-1)0"; and the fifth stage yields "010(-1)0(-1)0(-1)00(-1)00(-1)0". The fifth new data contains no run of l (l ≥ 2) consecutive 1 bits, so it is called the intermediate code; padding the intermediate code with one extra bit completes the CSD encoding, and the bit width of the intermediate code may be equal to that of the multiplier operand. Optionally, in the new data obtained after the CSD encoding processing unit 111 encodes the multiplier operand (i.e. the intermediate code), if the highest and next-highest digits are "10" or "01", the CSD encoding processing unit 111 may prepend one 0 above the highest digit of the intermediate code, so that the three highest digits of the corresponding target code are "010" or "001", respectively.
Optionally, the bit width of the intermediate code may be equal to the bit width of the target code minus 1.

The bit width of the target code may be equal to the bit width N of the multiplier operand received by the multiplier plus 1, and the bit width of the target code may be equal to the number of original partial products. The partial product acquisition unit 112 may obtain a corresponding original partial product for each digit of the target code and perform an AND logic operation on the highest-order bit of each original partial product through the two first full adders 112a and 112b that it includes. Optionally, the bit width of the original partial product may be equal to the bit width N of the multiplier operand. Optionally, as in the above example, the target code contains three digit values, -1, 0 and 1: the partial product acquisition unit 112 may obtain the original partial product -X from the digit -1 and the multiplicand X, the original partial product X from the digit 1 and the multiplicand X, and the original partial product 0 from the digit 0 and the multiplicand X.
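Selecting an original partial product from one target-code digit can be sketched in Python as below. The sketch assumes that the multiplicand X is an unsigned N-bit integer and holds each original partial product as an (N+1)-bit two's-complement word, one bit wider than the N bits mentioned above, so that -X cannot overflow; -X is formed by inverting X and adding 1. The function name `select_partial_product` is hypothetical.

```python
def select_partial_product(digit: int, x: int, n: int) -> int:
    """Return the (n+1)-bit two's-complement pattern of digit * x, digit in {-1, 0, 1}."""
    width = n + 1
    mask = (1 << width) - 1
    if digit == 0:
        return 0
    if digit == 1:
        return x & mask
    if digit == -1:
        return (~x + 1) & mask  # two's-complement negation: invert and add one
    raise ValueError("target-code digits must be -1, 0 or 1")
```

Since every digit of the target code is -1, 0 or 1, selection reduces to choosing between X, its two's-complement negation and zero, so no digit-dependent multiples of X (such as 2X) are needed.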

It should be noted that, if the highest-order bit of the original partial product is denoted by a, a logic operation on a yields one additional bit of the partial product with sign extension eliminated, which may be denoted by Q. Optionally, Q may be determined from the result of the AND logic operation on the highest-order bit a of the original partial product and the signal 1: Q may be equal to the carry signal of that operation, and the next-highest bit of the partial product with sign extension eliminated may be equal to the sum signal of a and the signal 1.

In the multiplier provided by this embodiment, the CSD encoding processing unit applies CSD encoding to the received first data to obtain a target code; the partial product acquisition unit then performs an AND logic operation on the highest-order bit of the original partial product obtained for each digit of the target code to eliminate sign extension; finally, the correction accumulation circuit accumulates and corrects the resulting partial products. CSD encoding of the received data reduces the number of effective partial products generated during multiplication and thus the complexity of the multiplier; at the same time, the multiplier can improve the efficiency of multiplication and effectively reduce its power consumption.

In one embodiment, the multiplier includes the canonical signed number coding processing unit 111, which includes a data input port 1111 and a target code output port 1112. The data input port 1111 is configured to receive the first data to be encoded, and the target code output port 1112 is configured to output the target code obtained by performing regular signed number encoding on the received first data.

Specifically, when the data input port 1111 receives the first data, the regular signed number coding processing unit 111 may perform regular signed number coding on it to obtain the target code, and output the target code through the target code output port 1112. The first data may be the multiplier in the multiplication operation. The regular signed number coding circuit 11 shown in Fig. 3 has the same internal circuit structure, external ports and functions as the regular signed number coding processing unit 111. Optionally, the digits of the target code obtained when the regular signed number coding processing unit 111 encodes the multiplier may take the values -1, 0 and 1.

The multiplier provided by this embodiment performs regular signed number coding on the received first data to obtain the target code; the partial product acquisition unit then obtains the corresponding partial products with sign-bit extension eliminated from the digits of the target code, and the correction accumulation circuit performs accumulation correction on them to obtain the target result of the multiplication. Encoding the received data with the regular signed number coding processing unit reduces the number of effective partial products generated during multiplication and hence the complexity of the multiplier; at the same time, the multiplier can improve the efficiency of the multiplication and effectively reduce its power consumption.

In one embodiment, the multiplier includes the partial product acquisition unit 112, which includes a target code input port 1121, a data input port 1122 and a partial product output port 1123. The target code input port 1121 is configured to receive the target code, the data input port 1122 is configured to receive second data, and the partial product output port 1123 is configured to output the partial products with sign-bit extension eliminated, obtained from the target code and the received second data.

Specifically, the partial product acquisition unit 112 may receive, through the target code input port 1121, the target code output by the regular signed number coding processing unit 111, and may receive second data through the data input port 1122; the second data may be the multiplicand in the multiplication operation. From each digit of the target code and the second data, the unit obtains an original partial product and then applies the logic operation to it to obtain the corresponding partial product with sign-bit extension eliminated. Optionally, the bit width of the partial product with sign-bit extension eliminated may be equal to the bit width of the original partial product.

In the multiplier provided by this embodiment, the partial product acquisition unit obtains the partial products with sign-bit extension eliminated from the digits of the target code, and the correction accumulation circuit performs accumulation correction on them to obtain the target result of the multiplication. This keeps the number of effective partial products small and reduces the complexity of the multiplication; at the same time, the multiplier can improve the efficiency of the multiplication and effectively reduce its power consumption.

In one embodiment, the specific structure of the multiplier shown in Fig. 3 is further illustrated. The multiplier includes the correction accumulation circuit 12, which includes full adders 121 to 12n; the full adders 121 to 12n are configured to perform accumulation correction on the received partial products with sign-bit extension eliminated.

Specifically, the full adders 121 to 12n are combinational circuits built from logic gates that perform binary addition; each can be understood as a circuit that adds several one-bit input signals and produces a two-bit output (a sum bit and a carry bit). Optionally, the number n of full adders in the correction accumulation circuit 12 may be determined by the bit width N of the partial products with sign-bit extension eliminated and by the number of target code digits, where N may be the number of digits in the target code obtained by the regular signed number coding processing unit 111 minus 1, so that the target code has N+1 digits. Optionally, the full adders in the correction accumulation circuit 12 may be arranged layer by layer, with one layer of full adders for each partial product with sign-bit extension eliminated obtained by the partial product acquisition unit 112. The number of layers may therefore equal the number of these partial products; each layer other than the last may contain N full adders, while the last layer may contain more. In addition, when all the partial products with sign-bit extension eliminated are accumulated, the lowest bit of each partial product lies one bit position to the right of the lowest bit of the following partial product.
Optionally, when the full adders 121 to 12n finish the accumulation correction, the operation result is obtained; it may be the sum-bit signals output by the last full adders. The full adders 121 to 12n may have the same internal circuit structure and the same function as the first full adders 112a and 1122b.

It should be noted that each full adder in the correction accumulation circuit 12 may add two or more input signals to obtain two output signals: a carry signal Carry and a sum-bit signal Sum. Optionally, in this embodiment, each full adder in the correction accumulation circuit 12 may receive three input signals, each of which may be a bit of a partial product with sign-bit extension eliminated, the carry output Carry or sum-bit output Sum of a lower-level full adder, or a constant binary signal. Optionally, while the correction accumulation circuit 12 accumulates the partial products with sign-bit extension eliminated, full adders in the circuit may correct two of the partial products obtained by the partial product acquisition unit 112; this correction corresponds to adding 1.
Optionally, the first layer of full adders in the correction accumulation circuit 12 may add the corresponding bits of the first and second partial products with sign-bit extension eliminated obtained by the partial product acquisition unit 112; the second layer may add the third such partial product to the result of the previous layer; and so on. The last layer may add the result of the previous layer, the carry or sum-bit signals output by earlier layers that have not yet been consumed, and the last partial product with sign-bit extension eliminated, yielding the target result of the multiplication. In this process, the inputs of each layer other than the first may include not only the corresponding bits of a partial product with sign-bit extension eliminated, but also the corresponding bit values output by the full adders of the previous layer.
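The layer-by-layer accumulation above behaves like a carry-save array: each layer of full adders reduces three words (running sum, running carries, next partial product) to two. The sketch below is a functional model under that assumption; the function names are hypothetical, the partial products are assumed to be unsigned words that are already shifted into place and free of sign-extension bits, and the final carry resolution performed by the last layer is written as an ordinary addition.

```python
def full_adder(a: int, b: int, cin: int) -> tuple[int, int]:
    """One-bit full adder: returns (sum, carry)."""
    s = a ^ b ^ cin
    carry = (a & b) | (a & cin) | (b & cin)
    return s, carry


def adder_layer(x: int, y: int, z: int, width: int) -> tuple[int, int]:
    """One layer of `width` full adders applied column by column to three words.

    Returns a sum word and a carry word (carries already shifted one place
    left); bits at or above `width` are dropped as overflow.
    """
    sum_word = carry_word = 0
    for i in range(width):
        s, c = full_adder((x >> i) & 1, (y >> i) & 1, (z >> i) & 1)
        sum_word |= s << i
        carry_word |= c << (i + 1)
    return sum_word, carry_word & ((1 << width) - 1)


def accumulate_layers(partial_products: list[int], width: int) -> int:
    """Fold in one partial product per layer, then resolve the carries."""
    mask = (1 << width) - 1
    s, c = partial_products[0] & mask, 0
    for pp in partial_products[1:]:
        s, c = adder_layer(s, c, pp & mask, width)
    return (s + c) & mask   # stands in for the carry resolution of the last layer


print(accumulate_layers([3, 5, 7, 9], width=8))   # 24
```

The +1 corrections mentioned above could be modeled in this sketch by passing one extra word whose 1 bits sit in the corrected positions.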

Optionally, the correction accumulation circuit 12 may apply correction twice while accumulating the partial products with sign-bit extension eliminated: once through a full adder in the first layer and once through a full adder in the last layer. If the full adders in each layer are numbered by bit position, the correcting full adder in the first layer may be the one at the next-highest position, and the correcting full adder in the last layer may be the one at the highest position. In addition, the carry input of the lowest-position full adder in the last layer may be 0.

In the multiplier provided by this embodiment, the correction accumulation circuit performs accumulation correction on the relatively few partial products with sign-bit extension eliminated obtained by the partial product acquisition unit, obtaining the target result of the multiplication; this reduces the complexity of the multiplication and effectively reduces the power consumption of the multiplier.

Fig. 4 is a schematic diagram of a specific structure of a multiplier according to another embodiment. The multiplier includes the correction accumulation circuit 23, which includes a modified Wallace tree group sub-circuit 231 and an accumulation sub-circuit 232; the output of the modified Wallace tree group sub-circuit 231 is connected to the input of the accumulation sub-circuit 232. The modified Wallace tree group sub-circuit 231 is configured to perform accumulation correction on the partial products with sign-bit extension eliminated, and the accumulation sub-circuit 232 is configured to accumulate the result of that accumulation correction.

Specifically, the modified Wallace tree group sub-circuit 231 may perform accumulation correction on the values of the partial products with sign-bit extension eliminated obtained via the regular signed number encoding circuit 211, and the accumulation sub-circuit 232 may accumulate the result output by the modified Wallace tree group sub-circuit 231 to obtain the target result of the multiplication.

In one embodiment, the specific structure of the multiplier shown in Fig. 4 is further illustrated. The multiplier includes the modified Wallace tree group sub-circuit 231, which includes Wallace tree units 2311 to 231n; the Wallace tree units 2311 to 231n are used to perform accumulation correction on each column of values of the partial products with sign-bit extension eliminated.

Specifically, the Wallace tree units 2311 to 231n may be built from a combination of full adders and half adders; each can be understood as a circuit that adds multiple one-bit input signals and produces a two-bit output. Optionally, the number n of Wallace tree units in the modified Wallace tree group sub-circuit 231 may equal 2N, where N may represent the number of digits in the target code obtained by the regular signed number coding circuit 21 minus 1. The n Wallace tree units may process the partial products of the target code in parallel but are connected in series, where the partial products of the target code are all the partial products with sign-bit extension eliminated obtained by the partial product acquisition circuit 22. Optionally, each Wallace tree unit in the modified Wallace tree group sub-circuit 231 may add all the values in one column of all the partial products with sign-bit extension eliminated, and each unit may output two signals, a carry signal Carry_i and a sum-bit signal Sum_i, where i is the index of the unit and the first unit has index 0. Optionally, the number of input signals received by each Wallace tree unit may equal the number of digits in the target code, i.e., the total number of partial products with sign-bit extension eliminated, or that number plus 1.

In addition, while the column values of the partial products with sign-bit extension eliminated are being added, the multiplier corrects two columns of these partial products through two of the Wallace tree units in the modified Wallace tree group sub-circuit 231: each of the two Wallace tree units handling those columns receives one more input signal than the units handling the other columns, and that extra input signal is 1.

In addition, the signals of each Wallace tree unit in the modified Wallace tree group sub-circuit 231 may include a carry input signal Cin_i, partial product input signals and a carry output signal Cout_i. Optionally, the partial product input signals received by each Wallace tree unit may be the values in one column of all the partial products with sign-bit extension eliminated, and the number of bits of the carry output signal Cout_i of each unit may be N_Cout = floor((N_I + N_Cin)/2) - 1, where N_I is the number of partial-product input signals of the unit, N_Cin is the number of its carry input signals, N_Cout is the minimum number of its carry output signals, and floor(·) is the round-down function. Optionally, the carry input signals received by each Wallace tree unit in the modified Wallace tree group sub-circuit 231 may be the carry output signals of the previous Wallace tree unit, and the carry input signal of the first unit is 0; the first unit may nevertheless have the same number of carry input ports as the other units.
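The carry-count formula can be checked against a small functional model of one Wallace tree unit. In the sketch below (an assumption-level model with hypothetical names), a column is reduced with full adders, each turning three bits into a sum bit that stays in the column and a carry bit that moves to the next column, and a final half adder if two bits remain; one carry is designated Carry_i for the final adder and the rest are forwarded as Cout_i. A real Wallace tree arranges these adders for minimum delay; only values and counts are modeled here.

```python
def wallace_unit(bits: list[int]) -> tuple[int, int, list[int]]:
    """Compress one column; returns (Sum_i, Carry_i, Cout_i bits).

    Carry_i and every Cout bit carry weight 2**(i+1) relative to Sum_i.
    """
    bits = list(bits)
    carries = []
    while len(bits) >= 3:
        a, b, c = bits.pop(), bits.pop(), bits.pop()
        bits.append(a ^ b ^ c)                       # full-adder sum stays
        carries.append((a & b) | (a & c) | (b & c))  # full-adder carry moves left
    if len(bits) == 2:
        a, b = bits
        bits = [a ^ b]                               # half adder
        carries.append(a & b)
    sum_bit = bits[0] if bits else 0
    carry_bit = carries.pop() if carries else 0      # routed to the final adder
    return sum_bit, carry_bit, carries               # the rest go to the next unit


def n_cout(n_i: int, n_cin: int) -> int:
    """N_Cout = floor((N_I + N_Cin) / 2) - 1, clamped at zero."""
    return max((n_i + n_cin) // 2 - 1, 0)


print(wallace_unit([1, 1, 1, 1]))   # (0, 1, [1])
print(n_cout(3, 1))                 # 1
```

For any column of k input bits this model produces floor(k/2) - 1 forwarded carries (for k >= 1), matching the formula above.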

In this embodiment, if the n serially connected Wallace tree units in the modified Wallace tree group sub-circuit 231 are numbered 1, 2, …, i, …, n, the modified Wallace tree group sub-circuit 231 may correct the two corresponding columns of the partial products with sign-bit extension eliminated through the i-th and the n-th Wallace tree units. If the bits of the first partial product with sign-bit extension eliminated, obtained via the coding circuit 21, are numbered 1, 2, …, m-2, m-1, m from the lowest bit to the highest, where m corresponds to the Q bit and 1 to the lowest bit, then i may equal N. In other words, the modified Wallace tree group sub-circuit 231 may perform the correction through the N-th Wallace tree unit and the last Wallace tree unit, where N is the bit width of the multiplier operand received by the multiplier.

For example, if the multiplier currently handles an 8-bit by 8-bit fixed-point multiplication, the partial products with sign-bit extension eliminated obtained by the partial product acquisition circuit 22 are "p_i8 p_i7 p_i6 p_i5 p_i4 p_i3 p_i2 p_i1 p_i0" (i = 1, …, n, with n = 9), where i denotes the i-th partial product with sign-bit extension eliminated. The arrangement of these 9 partial products is shown in Fig. 5, where each dot represents one value of a partial product. The figure shows 17 columns from the rightmost to the leftmost; in actual operation the leftmost column, which holds only a value of the last partial product, overflows and does not take part in the subsequent accumulation. A total of 16 Wallace tree units are therefore needed to perform accumulation correction on the 9 partial products, and the modified Wallace tree group sub-circuit 231 may perform the correction through the 8th Wallace tree unit and the last Wallace tree unit. Fig. 6 shows the connections among the 16 Wallace tree units, including the two units that perform the correction, where wallace_i denotes a Wallace tree unit numbered from 1. A solid line between two Wallace tree units indicates that a carry output signal is present on that connection, and a dotted line indicates that none is present.
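For the 8-bit by 8-bit example, the column heights of the dot diagram and the per-unit carry counts can be tallied with the short script below. It assumes, as described above, 9 partial products of 9 bits each, each shifted one column left of the previous one, and it applies the N_Cout formula unit by unit with the first unit receiving no carries. The two extra +1 correction inputs are deliberately left out, so the counts for the two correcting units may differ from the figure; the function names are hypothetical.

```python
def column_heights(num_pp: int, pp_width: int) -> list[int]:
    """Number of dots per column when partial product k is shifted k places left."""
    heights = [0] * (num_pp + pp_width - 1)
    for k in range(num_pp):
        for bit in range(pp_width):
            heights[k + bit] += 1
    return heights


def cout_chain(heights: list[int], units: int) -> list[int]:
    """Apply N_Cout = floor((N_I + N_Cin) / 2) - 1 (clamped at 0) unit by unit."""
    counts = []
    n_cin = 0                                  # the first unit receives no carries
    for col in range(units):
        n_cout = max((heights[col] + n_cin) // 2 - 1, 0)
        counts.append(n_cout)
        n_cin = n_cout                         # serial link to the next unit
    return counts


heights = column_heights(num_pp=9, pp_width=9)   # 17 columns in the 8x8 example
couts = cout_chain(heights, units=16)            # the 17th column overflows
for i, (h, c) in enumerate(zip(heights, couts), start=1):
    print(f"wallace_{i}: {h} partial-product inputs, {c} carry-out bit(s)")
```

In this model a unit with zero carry-out bits has no carry connection to the next unit.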

In the multiplier provided by this embodiment, the modified Wallace tree group sub-circuit performs accumulation correction on the relatively few partial products with sign-bit extension eliminated obtained by the partial product acquisition unit, obtaining the target result of the multiplication; this reduces the complexity of the multiplication and effectively reduces the power consumption of the multiplier.

In one embodiment, the specific structure of the multiplier shown in Fig. 4 is further illustrated. The multiplier includes the accumulation sub-circuit 232, which includes an adder 2321 configured to add the results of the accumulation correction.

Specifically, the adder 2321 may be an adder of a suitable bit width, for example a carry-lookahead adder. Optionally, the adder 2321 may receive the two signals output by the modified Wallace tree group sub-circuit 231 and add them to obtain the target result of the multiplication.
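The text specifies the adder 2321 only as a carry-lookahead adder. One common carry-lookahead structure is the Kogge-Stone parallel-prefix adder, sketched below on Python integers; it is offered as an assumption, not as the embodiment's specific design. Generate and propagate signals are merged in about log2(width) prefix steps, so every carry is known without rippling bit by bit.

```python
def kogge_stone_add(a: int, b: int, width: int) -> int:
    """Parallel-prefix (Kogge-Stone) carry-lookahead addition, mod 2**width."""
    mask = (1 << width) - 1
    a &= mask
    b &= mask
    p0 = a ^ b            # bit-level propagate
    g = a & b             # bit-level generate
    p = p0
    span = 1
    while span < width:
        g = g | (p & (g << span))   # G[i] |= P[i] & G[i - span]
        p = p & (p << span)         # P[i] &= P[i - span]
        span <<= 1
    carries = (g << 1) & mask       # carry into bit i = group generate of bits below i
    return p0 ^ carries


print(kogge_stone_add(200, 100, 16))   # 300
```

A 16-bit instance would correspond to the 8-bit by 8-bit example, where the adder operates on 2N = 16 bits.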

In the multiplier provided by this embodiment, the accumulation sub-circuit adds the two signals output by the modified Wallace tree group sub-circuit to obtain the target result of the multiplication, which reduces the complexity of the multiplication and effectively reduces the power consumption of the multiplier.

In one embodiment, the multiplier includes the adder 2321, which includes a carry signal input port 2321a, a sum-bit signal input port 2321b and a result output port 2321c. The carry signal input port 2321a is configured to receive a carry signal, the sum-bit signal input port 2321b is configured to receive a sum-bit signal, and the result output port 2321c is configured to output the target result obtained by adding the carry signal and the sum-bit signal.

Specifically, the adder 2321 may receive the carry signal Carry output by the modified Wallace tree group sub-circuit 231 through the carry signal input port 2321a, receive the sum-bit signal Sum output by the modified Wallace tree group sub-circuit 231 through the sum-bit signal input port 2321b, and output the result of adding Carry and Sum through the result output port 2321c.

It should be noted that, during multiplication, the multiplier may use an adder 2321 of a suitable bit width to add the carry output signal Carry and the sum-bit output signal Sum from the modified Wallace tree group sub-circuit 231; the bit width of data the adder 2321 can process may be 2N, twice the data bit width N currently handled by the multiplier. Optionally, each Wallace tree unit in the modified Wallace tree group sub-circuit 231 may output a carry output signal Carry_i and a sum-bit output signal Sum_i (i = 0, …, 2N-1, where i is the index of the unit, starting from 0). Optionally, the carry signal received by the adder 2321 is Carry = {[Carry_0 : Carry_{2N-2}], 0}; that is, it is 2N bits wide, its first 2N-1 bits are the carry outputs of the first 2N-1 Wallace tree units in the modified Wallace tree group sub-circuit 231, and its last bit is 0. Optionally, the sum-bit signal Sum received by the adder 2321 may be 2N bits wide, with each bit equal to the sum-bit output of the corresponding Wallace tree unit in the modified Wallace tree group sub-circuit 231.
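The way the adder 2321 assembles its two operands from the per-unit outputs can be sketched as follows. The helper `final_add` is hypothetical; bits are listed with unit 0 first, and the concatenation {[Carry_0 : Carry_{2N-2}], 0} is read so that Carry_i lands at weight 2^(i+1), which is what column addition requires. The last unit's Carry_{2N-1} would have weight 2^(2N) and is discarded as overflow.

```python
def final_add(sum_bits: list[int], carry_bits: list[int]) -> int:
    """Add the Sum_i / Carry_i outputs of 2N Wallace tree units (index 0 = LSB).

    Sum is used unchanged; Carry_i is shifted to bit i+1 with a 0 in bit 0,
    and the last unit's Carry is dropped. The result is taken mod 2**(2N).
    """
    width = len(sum_bits)
    if len(carry_bits) != width:
        raise ValueError("expected one Sum bit and one Carry bit per unit")
    sum_word = sum(bit << i for i, bit in enumerate(sum_bits))
    carry_word = sum(bit << (i + 1) for i, bit in enumerate(carry_bits[:-1]))
    return (sum_word + carry_word) % (1 << width)


print(final_add([1, 0, 0, 0], [1, 0, 0, 0]))   # 3: Sum = 1, Carry_0 has weight 2
print(final_add([0, 0, 0, 1], [0, 0, 0, 1]))   # 8: the last unit's Carry is dropped
```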

For example, if the multiplier currently processes an 8-bit by 8-bit fixed-point multiplication, the adder 2321 may be a 16-bit carry-lookahead adder. As shown in Fig. 6, the modified Wallace tree group sub-circuit 231 outputs the sum-bit signals and carry signals of its 16 Wallace tree units. The 16-bit carry-lookahead adder receives the complete sum-bit signal Sum output by the modified Wallace tree group sub-circuit 231, and receives as its carry signal Carry the carry outputs of all units except the last one, combined with 0.

In the multiplier provided by this embodiment, the accumulation sub-circuit adds the two signals output by the modified Wallace tree group sub-circuit to obtain the target result of the multiplication, which reduces the complexity of the multiplication and effectively reduces the power consumption of the multiplier.

Fig. 7 is a flow chart of a data processing method provided in an embodiment. The method can be performed by the multiplier shown in Fig. 1 and relates to the multiplication of data. As shown in Fig. 7, the method includes:

S101, receiving data to be processed.

Specifically, the multiplier may receive the data to be processed, namely the multiplier and the multiplicand of the multiplication, through the canonical signed number encoding circuit. The bit width of the multiplier may be equal to the bit width of the multiplicand.

S102, carrying out regular signed number coding processing on the data to be processed to obtain target codes.

Specifically, the multiplier may perform regular signed number coding on the received multiplier to be processed through the regular signed number coding circuit to obtain the target code, whose bit width may be equal to the bit width N of the multiplier to be processed plus 1.

Optionally, the step in S102 of performing regular signed number coding on the data to be processed to obtain the target code may include: converting every run of l consecutive 1 bits in the data to be processed, where l ≥ 2, into an (l+1)-digit value whose highest digit is 1, whose lowest digit is -1 and whose remaining digits are 0, thereby obtaining the target code.

It should be noted that the above regular signed number coding may be characterized as follows: for an N-bit multiplier, the bits are processed from the lowest to the highest; whenever a run of l (l ≥ 2) consecutive 1 bits is found, the run is converted into the digits "1 (0)_{l-1} (-1)", and the remaining (N-l) bits are combined with the converted (l+1) digits to obtain new data. The new data is then used as the starting data for the next conversion, until it contains no run of l (l ≥ 2) consecutive 1 bits. After regular signed number coding of the N-bit multiplier, the target code may have bit width N+1.
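The conversion rule above can be sketched in Python. This illustrates the arithmetic rather than the circuit: the name `csd_encode` is hypothetical, the input is assumed to be an unsigned N-bit value, and digits are stored least-significant first, so the rule's "1 (0)_{l-1} (-1)" appears reversed in the list. Instead of restarting from the lowest bit after each conversion, the scan resumes at the newly created 1, which gives the same result because no run of 1s can remain below that point.

```python
def csd_encode(x: int, n: int) -> list[int]:
    """Canonical signed-digit code of an unsigned n-bit value (LSB first).

    Each run of l >= 2 consecutive 1s becomes -1, then l-1 zeros, then a 1
    in the next higher position; returns n+1 digits in {-1, 0, 1}.
    """
    if not 0 <= x < (1 << n):
        raise ValueError("x must fit in n unsigned bits")
    d = [(x >> k) & 1 for k in range(n)] + [0]   # one spare top digit
    k = 0
    while k < len(d):
        if d[k] != 1:
            k += 1
            continue
        j = k
        while j < len(d) and d[j] == 1:          # find the end of the run
            j += 1
        if j - k >= 2:
            assert j < len(d), "the spare top digit always absorbs the carry"
            d[k] = -1                            # lowest digit of the run -> -1
            for t in range(k + 1, j):
                d[t] = 0                         # middle digits -> 0
            d[j] = 1                             # digit above the run -> 1
        k = j                                    # a new 1 may start another run
    return d


print(csd_encode(7, 4))    # [-1, 0, 0, 1, 0]
print(csd_encode(27, 5))   # [-1, 0, -1, 0, 0, 1]
```

For 7 (binary 0111) the code has two nonzero digits instead of three, which is the reduction in effective partial products that the text attributes to this encoding.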

S103, obtaining a partial product after eliminating the sign bit expansion according to the data to be processed and the target code.

It should be noted that the regular signed number coding circuit may obtain the partial products with sign-bit extension eliminated from the multiplicand of the multiplication and the target code produced by regular signed number coding, and the number of these partial products may be equal to the bit width of the target code.

S104, performing accumulation correction processing on the partial product after the symbol bit expansion is eliminated, and obtaining a target operation result.

Specifically, the multiplier may perform accumulation correction on the partial products with sign-bit extension eliminated through the successive layers of full adders in the correction accumulation circuit, until the last layer completes its operation, yielding the target result of the multiplication. The accumulation correction can be characterized as correction performed during the accumulation of these partial products; it is carried out by two full adders of the correction accumulation circuit, one in the first layer and one in the last layer. The target result is the result obtained after sign-bit extension elimination and correction accumulation. If the full adders in each layer are numbered by bit position, the correcting full adder in the first layer may be the one at the next-highest position, and the correcting full adder in the last layer may be the one at the highest position.

In addition, the multiplier may instead accumulate each column of the partial products with sign-bit extension eliminated through the modified Wallace tree group sub-circuit in the correction accumulation circuit, performing the correction through two Wallace tree units of that sub-circuit during accumulation. After correction, the modified Wallace tree group sub-circuit outputs a carry output signal and a sum-bit output signal, and the accumulation sub-circuit finally adds the sum-bit output signal to the carry output signal whose last bit has been replaced by 0, outputting the target result.

It should be noted that if the multiplier currently processes N-bit data, the modified Wallace tree group sub-circuit contains 2N serially connected Wallace tree units, and the units are numbered starting from 0, then the modified Wallace tree group sub-circuit may perform the correction through the N-th and the 2N-th Wallace tree units.

In the data processing method provided by this embodiment, data to be processed is received and encoded by regular signed number coding to obtain the target code, the partial products with sign-bit extension eliminated are obtained from the data to be processed and the target code, and accumulation correction is performed on them to obtain the target result. The method can thereby improve the efficiency of the multiplication and effectively reduce the power consumption of the multiplier.

In another embodiment, step S103 of the data processing method, obtaining the partial products with sign-bit extension eliminated from the data to be processed and the target code, includes:

S1031, obtaining original partial products from the data to be processed and the target code.

It should be noted that the number of the original partial products may be equal to the bit width of the target code.

For example, if the partial product acquisition unit receives an 8-bit multiplicand "x_7 x_6 x_5 x_4 x_3 x_2 x_1 x_0" (i.e., X), it may directly obtain the corresponding original partial products from X and the three digit values -1, 0 and 1 contained in the target code: when a digit of the target code is -1, the original partial product may be -X; when it is 0, the original partial product may be 0; and when it is 1, the original partial product may be X.
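The digit-to-partial-product selection above can be written down in a few lines. The sketch below uses hypothetical helper names and plain Python integers, so it shows only the arithmetic (partial product k is weighted by 2^k), not the sign-bit handling of the hardware.

```python
def original_partial_products(x: int, digits: list[int]) -> list[int]:
    """Select -X, 0 or X for each target-code digit (index 0 = lowest digit)."""
    choice = {-1: -x, 0: 0, 1: x}
    return [choice[d] for d in digits]


def sum_partial_products(pps: list[int]) -> int:
    """Weight partial product k by 2**k and add (plain integer reference)."""
    return sum(pp << k for k, pp in enumerate(pps))


digits = [-1, 0, 0, 1, 0]             # 7 written as -1 + 8, lowest digit first
pps = original_partial_products(13, digits)
print(pps)                            # [-13, 0, 0, 13, 0]
print(sum_partial_products(pps))      # 91 = 13 * 7
print(sum(1 for d in digits if d))    # 2 nonzero partial products (binary 0111 needs 3)
```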

S1032: perform addition processing on the original partial product to obtain a partial product with the sign bit extension eliminated.

Optionally, performing addition processing on the original partial product in S1032 to obtain the partial product with the sign bit extension eliminated includes: performing an AND logic operation on the highest-bit value of the original partial product to obtain the partial product with the sign bit extension eliminated.

Specifically, through the partial product acquisition unit, the multiplier may perform an AND logic operation on the highest-bit value of each original partial product to obtain an additional one-bit value Q and the next-highest bit value of the partial product with the sign bit extension eliminated, thereby obtaining that partial product. Optionally, the additional one-bit value Q may be the carry signal obtained by operating on the highest-bit value of the original partial product together with the signal 1, and the next-highest bit value of the partial product with the sign bit extension eliminated may be the sum signal of that same operation.
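The exact gate-level wiring of Q in this embodiment cannot be fully recovered from the text, so the Python sketch below instead illustrates the common textbook way of removing sign extension; it is an assumption, not the circuit described above. The sign bit of each W-bit two's-complement partial product is inverted so that no extension bits are needed, and the -2^(W-1) this leaves behind in every row is folded into one constant added once during accumulation, presumably the kind of fixed offset a correction step has to absorb, though the text does not say so. The names and the row width W = N + 1 (enough to hold -X) are hypothetical.

```python
def drop_sign_extension(partial_products: list[int], width: int) -> tuple[list[int], int]:
    """Turn signed partial products into short unsigned rows plus one correction constant.

    Each partial product must fit in `width`-bit two's complement. Row i is its
    bit pattern with the sign bit inverted, which over-counts its true value by
    2**(width - 1); the returned constant cancels all of those over-counts,
    taking into account that row i is later weighted by 2**i.
    """
    sign_bit = 1 << (width - 1)
    low, high = -sign_bit, sign_bit - 1
    rows = []
    for pp in partial_products:
        if not low <= pp <= high:
            raise ValueError(f"{pp} does not fit in {width}-bit two's complement")
        pattern = pp & ((1 << width) - 1)   # two's-complement bit pattern of pp
        rows.append(pattern ^ sign_bit)     # invert the sign bit; no extension bits kept
    correction = -sum(sign_bit << i for i in range(len(partial_products)))
    return rows, correction


def accumulate_rows(rows: list[int], correction: int, total_width: int) -> int:
    """Add the shifted rows and the constant modulo 2**total_width, read back as signed."""
    mask = (1 << total_width) - 1
    acc = (sum(row << i for i, row in enumerate(rows)) + correction) & mask
    if acc >> (total_width - 1):            # top bit set: the result is negative
        acc -= 1 << total_width
    return acc


if __name__ == "__main__":
    rows, constant = drop_sign_extension([5, 0, 0, -5], 9)
    print(rows, constant)                          # [261, 256, 256, 251] -3840
    print(accumulate_rows(rows, constant, 17))     # -35
```

For the partial products [5, 0, 0, -5] the sketch gives the same -35 as full sign extension, while every row stays exactly W bits wide.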

In the data processing method provided by this embodiment, an original partial product is obtained according to the data to be processed and the target code, an AND logic operation is performed on the highest-bit value of the original partial product to obtain the partial product with the sign bit extension eliminated, and accumulation correction is then performed on that partial product to obtain the target result of the multiplication. The method can thereby improve the efficiency of multiplication and effectively reduce the power consumption of the multiplier.

Fig. 8 is a flowchart of a data processing method according to an embodiment. The method may be executed by the multiplier shown in Fig. 2 and relates to the data multiplication process. As shown in Fig. 8, the method includes:

S201: receive data to be processed.

S202: perform regular signed number coding on the data to be processed to obtain an original partial product.

Specifically, through the regular signed number coding circuit, the multiplier applies regular signed number coding to the multiplier operand of the multiplication, and the partial product acquisition circuit can obtain the original partial products from the result of this coding.

S203: perform logic operations on the original partial product to eliminate the sign extension bits and obtain a partial product with the sign bit extension eliminated.

Specifically, through a logic gate unit in the partial product acquisition circuit, the multiplier can perform logic operations on the original partial product and directly eliminate the values of the sign extension bits, obtaining the partial product with the sign bit extension eliminated.

S204: perform accumulation correction on the partial product with the sign bit extension eliminated to obtain a target operation result.

Specifically, the multiplier can perform accumulation correction on the partial product with the sign bit extension eliminated through the layers of full adders in the correction accumulation circuit, layer by layer, until the last layer of full adders completes the operation, so as to obtain the operation result. Optionally, the accumulation correction may be understood as correction performed while the partial products with the sign bit extension eliminated are being accumulated, and the correction may be performed by two full adders of the correction accumulation circuit: one in the first layer and one in the last layer of full adders. Optionally, the operation result is the result obtained after eliminating the sign bit extension and performing the correction accumulation. In addition, during accumulation correction, the correction accumulation circuit can correct values in the partial product with the sign bit extension eliminated through these two full adders; if each full adder in a layer is assigned a number, the correcting full adder in the first layer may be the one with the next-highest number, and the correcting full adder in the last layer may be the one with the highest number.

In addition, the multiplier can also accumulate each column of values of the partial product with the sign bit extension eliminated through a correction Wallace tree group sub-circuit in the correction accumulation circuit, perform correction through two Wallace tree units of that sub-circuit during the accumulation, and output the corrected carry output signals and sum bit output signals from the correction Wallace tree group sub-circuit. Finally, the accumulation sub-circuit accumulates all carry output signals Carry_i of the correction Wallace tree group sub-circuit together with all sum bit signals, with the last sum bit signal Sum_2N replaced by 0, and outputs the operation result. It should be noted that, if the multiplier currently processes N-bit data, 2N Wallace tree units are connected in series in the correction Wallace tree group sub-circuit, and the Wallace tree units are numbered starting from 0, then the correction Wallace tree group sub-circuit may perform the correction processing through the N-th Wallace tree unit and the 2N-th Wallace tree unit.
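As a rough illustration of layer-by-layer full-adder accumulation, the Python sketch below treats each partial-product row as an integer bit vector and applies full adders (3:2 compressors) to all columns at once: every group of three rows becomes a sum row plus a carry row shifted one column left, layer after layer, until two rows remain for a final carry-propagate addition. This is a generic Wallace-style reduction written under that assumption; it does not model the specific correction performed by the N-th and 2N-th Wallace tree units or the Sum_2N replacement, and the function names are hypothetical.

```python
def full_adder_layer(rows: list[int], mask: int) -> list[int]:
    """One layer of full adders: each group of three rows becomes a sum row and a carry row."""
    next_rows = []
    i = 0
    while i + 3 <= len(rows):
        a, b, c = rows[i], rows[i + 1], rows[i + 2]
        sum_row = a ^ b ^ c                                # per-column sum bits
        carry_row = ((a & b) | (a & c) | (b & c)) << 1     # per-column carries, moved one column left
        next_rows.extend([sum_row & mask, carry_row & mask])
        i += 3
    next_rows.extend(rows[i:])                             # 0-2 leftover rows pass through unchanged
    return next_rows


def wallace_accumulate(rows: list[int], width: int) -> tuple[int, int]:
    """Reduce rows to two with full-adder layers, then add them.

    Returns (sum modulo 2**width, number of full-adder layers used).
    """
    mask = (1 << width) - 1
    rows = [row & mask for row in rows]
    layers = 0
    while len(rows) > 2:
        rows = full_adder_layer(rows, mask)
        layers += 1
    return sum(rows) & mask, layers                        # final carry-propagate addition


if __name__ == "__main__":
    print(wallace_accumulate([3, 5, 6, 7], 8))  # (21, 2)
```

Each layer only uses XOR, AND and OR per column, so no carry ripples across a row until the single addition at the end; this is the usual reason for accumulating partial products in such a tree.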

In the data processing method provided by this embodiment, the data to be processed is received and subjected to regular signed number coding to obtain original partial products; logic operations are performed on the original partial products to obtain partial products with the sign bit extension eliminated; and accumulation correction is performed on them to obtain the target operation result. Because the received data is regular signed number coded, the number of effective partial products in the multiplication is reduced, which lowers the complexity of the operation; the method can also improve the efficiency of multiplication and effectively reduce the power consumption of the multiplier.
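The reduction in effective partial products can be checked exhaustively for 8-bit multipliers. The Python sketch below is an illustration under stated assumptions: it recodes an unsigned multiplier Y with the standard non-adjacent-form (NAF) recurrence, a common way of computing canonical signed-digit codes used here as a stand-in for the coding circuit; builds one partial product per nonzero digit; and compares the number of nonzero digits with the number of 1 bits in plain binary. Treating Y as unsigned and X as signed, and all function names, are assumptions.

```python
def naf_digits(y: int) -> list[int]:
    """Non-adjacent-form signed-digit code of y >= 0, least-significant digit first."""
    if y < 0:
        raise ValueError("this sketch only recodes non-negative multipliers")
    digits = []
    while y:
        if y & 1:
            digit = 2 - (y & 3)   # y % 4 == 1 -> +1, y % 4 == 3 -> -1
            y -= digit
        else:
            digit = 0
        digits.append(digit)
        y >>= 1
    return digits


def multiply(x: int, y: int) -> tuple[int, int]:
    """Return (x * y, number of nonzero partial products) using the signed-digit code of y."""
    digits = naf_digits(y)
    partial_products = [(d * x) << i for i, d in enumerate(digits) if d != 0]
    return sum(partial_products), len(partial_products)


def compare_counts(n: int) -> tuple[int, int]:
    """Worst-case number of nonzero partial products over all n-bit multipliers: (binary, signed-digit)."""
    worst_binary = max(bin(y).count("1") for y in range(1 << n))
    worst_signed = max(multiply(1, y)[1] for y in range(1 << n))
    return worst_binary, worst_signed


if __name__ == "__main__":
    print(multiply(-5, 7))    # (-35, 2)
    print(compare_counts(8))  # (8, 5)
```

In the worst case an 8-bit multiplier needs eight nonzero partial products in plain binary but at most five after recoding, because no two adjacent digits of the recoded multiplier are nonzero.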

In another embodiment, step S202 of the data processing method, performing regular signed number coding on the data to be processed to obtain an original partial product, includes:

S2021: perform regular signed number coding on the data to be processed to obtain a target code.

Specifically, through the regular signed number coding circuit, the multiplier can apply regular signed number coding to the multiplier operand of the multiplication to obtain the target code. Optionally, each digit of the resulting target code takes one of three values: -1, 0 and 1.

Optionally, performing regular signed number coding on the data to be processed in S2021 to obtain the target code may include: converting each run of l consecutive 1 bits in the data to be processed into an (l+1)-bit value whose highest bit is 1, whose lowest bit is -1 and whose remaining bits are 0, thereby obtaining the target code, where l ≥ 2.

It should be noted that the regular signed number coding described above may be characterized as follows. For an N-bit multiplier, the bits are processed from low order to high order: if there is a run of l (l ≥ 2) consecutive 1 bits, these l bits can be converted into the (l+1)-bit data "1(0)_(l-1)(-1)", that is, a 1, followed by l-1 zeros, followed by -1, and the remaining (N-l) bits are combined with the converted (l+1) bits to obtain new data. The new data is then used as the initial data of the next conversion, and the process repeats until no run of l (l ≥ 2) consecutive 1 bits remains. After an N-bit multiplier is coded in this way, the bit width of the resulting target code may be equal to N+1.
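A direct Python transcription of this run-replacement rule, offered as a sketch: the multiplier is assumed to be an unsigned N-bit value, digits are kept least-significant first, and the function names are hypothetical. The rule is what is commonly called canonical signed-digit (CSD) recoding. Because the loop always rewrites the lowest remaining run, the digit just above a run is still 0 when it receives the new 1, so the numeric value is preserved at every step.

```python
from typing import Optional


def find_lowest_run(digits: list[int]) -> Optional[tuple[int, int]]:
    """Return (start, length) of the lowest run of at least two consecutive 1 digits, or None."""
    i = 0
    while i < len(digits):
        if digits[i] != 1:
            i += 1
            continue
        j = i
        while j < len(digits) and digits[j] == 1:
            j += 1
        if j - i >= 2:
            return i, j - i
        i = j
    return None


def regular_signed_encode(y: int, n: int) -> list[int]:
    """Recode an unsigned n-bit multiplier y into n + 1 digits in {-1, 0, 1}, LSB first."""
    if not 0 <= y < (1 << n):
        raise ValueError("y must be an unsigned n-bit value")
    digits = [(y >> i) & 1 for i in range(n)] + [0]   # one extra top digit: width n + 1
    while True:
        run = find_lowest_run(digits)
        if run is None:                               # no run of l >= 2 ones is left
            return digits
        start, length = run
        # l ones -> l + 1 digits: -1 at the bottom, l - 1 zeros, 1 on top
        digits[start] = -1
        for k in range(start + 1, start + length):
            digits[k] = 0
        digits[start + length] = 1


if __name__ == "__main__":
    print(regular_signed_encode(27, 5))  # [-1, 0, -1, 0, 0, 1]
```

For 27 (binary 11011) the first pass rewrites the low run 11, which creates the run 111 above it; the second pass rewrites that run, giving 32 - 4 - 1 = 27 in N + 1 = 6 digits.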

S2022: obtain the original partial product according to the data to be processed and the target code.

It should be noted that the number of original partial products may be equal to the bit width of the target code.

For example, if the original partial product acquisition unit receives an 8-bit multiplicand "x₇x₆x₅x₄x₃x₂x₁x₀" (i.e., X), it may directly obtain each original partial product from X and the corresponding digit of the target code, which takes one of the three values -1, 0 and 1: when a digit of the target code is -1, the original partial product may be -X; when it is 0, the original partial product may be 0; and when it is 1, the original partial product may be X.

In the data processing method provided by this embodiment, regular signed number coding is performed on the data to be processed to obtain a target code, the original partial product is obtained according to the data to be processed and the target code, the sign bit extension of the original partial product is then eliminated, and accumulation correction is performed on the resulting partial product to obtain the target result of the multiplication. The method can thereby improve the efficiency of multiplication and effectively reduce the power consumption of the multiplier.

In another embodiment, in step S203 of the data processing method, performing logic operations on the original partial product to eliminate the sign extension bits and obtain a partial product with the sign bit extension eliminated includes: performing an AND logic operation on the highest-bit value of the original partial product to eliminate the sign extension bits, obtaining the partial product with the sign bit extension eliminated.

Specifically, through a logic gate unit in the partial product acquisition circuit, the multiplier may perform an AND logic operation on the highest-bit value of the original partial product to obtain the next-highest and highest bit values of the partial product with the sign bit extension eliminated; further, through a logic gate unit in the partial product acquisition circuit, it may perform an AND logic operation on the highest-bit value of the original partial product and the signal 1 to obtain the additional one-bit value Q of the partial product with the sign bit extension eliminated and its next-highest bit value (i.e., the bit immediately below Q).
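Read as a half adder whose second input is the constant signal 1 (an interpretation of the translated text, not a confirmed gate netlist), this step yields the extra bit Q as the carry, which is simply an AND of the highest bit with 1, and the bit immediately below Q as the sum, which is the inverted highest bit. The short Python check below, with a hypothetical function name, enumerates both input values.

```python
def add_signal_one(sign_bit: int) -> tuple[int, int]:
    """Half-add the highest bit of an original partial product with the constant 1.

    Returns (q, next_highest): q is the carry (an AND gate with one input tied
    to 1) and next_highest is the sum bit (an XOR with 1, i.e. the inverted bit).
    """
    if sign_bit not in (0, 1):
        raise ValueError("sign_bit must be 0 or 1")
    q = sign_bit & 1
    next_highest = sign_bit ^ 1
    return q, next_highest


if __name__ == "__main__":
    for s in (0, 1):
        q, nxt = add_signal_one(s)
        assert 2 * q + nxt == s + 1      # the two output bits together encode s + 1
        print(s, "->", q, nxt)
```

The two output bits together always represent s + 1, which is what a half adder with one input tied to 1 computes.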

In the data processing method provided by this embodiment, the original partial product is obtained by processing the data to be processed, and an AND logic operation is performed on its highest-bit value to eliminate the sign extension bits, yielding the partial product with the sign bit extension eliminated; this can effectively reduce the power consumption of the multiplier.

An embodiment of the application also provides a machine learning operation device comprising one or more multipliers, which are used to acquire the data to be operated on and control information from other processing devices, execute specified machine learning operations, and transmit the execution results to peripheral devices through an I/O interface. Peripheral devices include, for example, cameras, displays, mice, keyboards, network cards, Wi-Fi interfaces and servers. When more than one multiplier is included, the multipliers may be linked and transfer data through a specific structure, for example interconnecting and transferring data through a PCIE bus, to support larger-scale machine learning operations. In this case, they may share one control system or have independent control systems, and they may share memory or each accelerator may have its own memory. In addition, any interconnection topology may be used.

The machine learning operation device has high compatibility and can be connected to various types of servers through a PCIE interface.

The embodiment of the application also provides a combined processing device which comprises the machine learning operation device, a general interconnection interface and other processing devices. The machine learning operation device interacts with other processing devices to jointly complete the operation designated by the user. Fig. 9 is a schematic diagram of a combination processing apparatus.

The other processing devices include one or more types of general-purpose or special-purpose processors, such as central processing units (CPUs), graphics processing units (GPUs) and neural network processors; the number of processors they include is not limited. The other processing devices serve as the interface between the machine learning operation device and external data and control, performing data transfer and basic control such as starting and stopping the machine learning operation device; they may also cooperate with the machine learning operation device to complete computing tasks.

The universal interconnection interface is used to transmit data and control instructions between the machine learning operation device and the other processing devices. The machine learning operation device acquires the required input data from the other processing devices and writes it into the storage device on the machine learning operation device chip; it can obtain control instructions from the other processing devices and write them into the control cache on the chip; and it can read the data in its storage module and transmit them to the other processing devices.

Optionally, as shown in Fig. 10, the structure may further include a storage device connected to both the machine learning operation device and the other processing devices. The storage device stores data of the machine learning operation device and the other processing devices, and is particularly suitable for data that cannot be entirely held in the internal storage of the machine learning operation device or the other processing devices.

The combined processing device can serve as the SOC (system on chip) of equipment such as mobile phones, robots, unmanned aerial vehicles and video surveillance equipment, effectively reducing the core area of the control portion, increasing processing speed and reducing overall power consumption. In this case, the universal interconnection interface of the combined processing device is connected to certain components of the equipment, such as cameras, displays, mice, keyboards, network cards and Wi-Fi interfaces.

In some embodiments, a chip is also disclosed, which includes the machine learning computing device or the combination processing device.

In some embodiments, a chip package structure is disclosed, which includes the chip.

In some embodiments, a board card is provided that includes the chip package structure described above. As shown in Fig. 11, in addition to the chip 389, the board card may include other supporting components, including but not limited to a storage device 390, a receiving device 391 and a control device 392.

The storage device 390 is connected to the chip in the chip package structure through a bus and is used for storing data. The storage device may include multiple groups of storage units 393, each group connected to the chip through a bus. It is understood that each group of storage units may be DDR SDRAM (Double Data Rate Synchronous Dynamic Random Access Memory).

DDR doubles the data rate of SDRAM without increasing the clock frequency, because it transfers data on both the rising and falling edges of the clock pulse; DDR is therefore twice as fast as standard SDRAM. In one embodiment, the storage device may include four groups of storage units, and each group may include a plurality of DDR4 chips. In one embodiment, the chip may internally include four 72-bit DDR4 controllers, in each of which 64 bits are used for data transfer and 8 bits for ECC checking. It is understood that when DDR4-3200 chips are used in each group of storage units, the theoretical data transfer bandwidth can reach 25600 MB/s.
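The 25600 MB/s figure follows from the transfer rate and the data width alone. A small Python sketch of the arithmetic, assuming 1 MB = 10^6 bytes and counting only the 64 data bits of each 72-bit controller (the function name is hypothetical): DDR4-3200 performs 3200 million transfers per second, a rate that already reflects transferring on both clock edges of a 1600 MHz I/O clock, and each transfer moves 64 bits, i.e. 8 bytes.

```python
def ddr_peak_bandwidth_mb_s(data_rate_mt_s: int, data_bits: int) -> float:
    """Theoretical peak bandwidth of one memory channel in MB/s (1 MB = 10**6 bytes).

    data_rate_mt_s: mega-transfers per second (DDR4-3200 -> 3200, both clock edges
    already counted); data_bits: width of the payload path, excluding ECC bits.
    """
    return data_rate_mt_s * data_bits / 8


if __name__ == "__main__":
    # 72-bit controller: 64 data bits + 8 ECC bits, so only 64 bits carry payload
    print(ddr_peak_bandwidth_mb_s(3200, 64))  # 25600.0
```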

In one embodiment, each group of storage units includes a plurality of double data rate synchronous dynamic random access memories arranged in parallel; DDR can transfer data twice in one clock cycle. A controller for controlling the DDR is provided in the chip and controls data transmission and data storage of each storage unit.

The receiving device is electrically connected to the chip in the chip package structure and is used to implement data transmission between the chip and an external device (such as a server or a computer). For example, in one embodiment, the receiving device may be a standard PCIE interface, through which the data to be processed is transferred from the server to the chip. Preferably, when a PCIE 3.0 x16 interface is used for transmission, the theoretical bandwidth can reach 16000 MB/s. In another embodiment, the receiving device may be another interface; the present application does not limit the specific form of that interface, as long as the interface unit can implement the transfer function. In addition, the calculation results of the chip are transmitted back to the external device (e.g., a server) by the receiving device.
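Assuming a x16 link, the lane count that matches the quoted 16000 MB/s, that figure is the raw PCIe 3.0 signaling rate of 8 GT/s per lane; the 128b/130b line encoding of PCIe 3.0 lowers the usable rate per direction to roughly 15754 MB/s. A Python sketch of both numbers (hypothetical function name, 1 MB = 10^6 bytes, protocol overhead above the line encoding ignored):

```python
def pcie3_bandwidth_mb_s(lanes: int, include_line_encoding: bool = True) -> float:
    """Per-direction PCIe 3.0 bandwidth in MB/s (1 MB = 10**6 bytes)."""
    transfers_per_lane = 8_000                 # 8 GT/s = 8000 mega-transfers/s, 1 bit each
    mb_s = transfers_per_lane * lanes / 8      # bits -> bytes
    if include_line_encoding:
        mb_s *= 128 / 130                      # 128b/130b encoding overhead
    return mb_s


if __name__ == "__main__":
    print(pcie3_bandwidth_mb_s(16, include_line_encoding=False))  # 16000.0
    print(round(pcie3_bandwidth_mb_s(16)))                        # 15754
```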

The control device is electrically connected to the chip and is used to monitor the state of the chip. Specifically, the chip and the control device may be electrically connected through an SPI interface. The control device may include a microcontroller unit (MCU). The chip may include a plurality of processing chips, a plurality of processing cores or a plurality of processing circuits, and may drive a plurality of loads, so the chip can be in different working states such as heavy load and light load. The control device can regulate the working states of the plurality of processing chips, processing cores and/or processing circuits in the chip.

In some embodiments, an electronic device is provided that includes the above board card.

The electronic device may be a multiplier, robot, computer, printer, scanner, tablet, smart terminal, cell phone, automobile data recorder, navigator, sensor, webcam, server, cloud server, camera, video camera, projector, watch, headset, mobile storage, wearable device, vehicle, household appliance, and/or medical device.

The vehicle includes aircraft, ships and/or land vehicles; the household appliances include televisions, air conditioners, microwave ovens, refrigerators, electric rice cookers, humidifiers, washing machines, electric lamps, gas stoves and range hoods; the medical devices include nuclear magnetic resonance instruments, B-mode ultrasound scanners and/or electrocardiographs.

It should be noted that, for simplicity of description, the foregoing method embodiments are all described as a series of circuit combinations, but those skilled in the art should appreciate that the present application is not limited by the described circuit combinations, as some circuits may be implemented in other manners or structures according to the present application. Further, it should be understood by those skilled in the art that the embodiments described in the specification are all alternative embodiments, and the devices and modules involved are not necessarily required for the present application.

In the foregoing embodiments, each embodiment is described with its own emphasis; for parts not described in detail in one embodiment, reference may be made to the related descriptions of other embodiments.

The foregoing examples illustrate only a few embodiments of the application and are described in detail herein without thereby limiting the scope of the application. It should be noted that it will be apparent to those skilled in the art that several variations and modifications can be made without departing from the spirit of the application, which are all within the scope of the application. Accordingly, the scope of protection of the present application is to be determined by the appended claims.

Claims

1. A multiplier, the multiplier comprising a regular signed number coding circuit and a correction accumulation circuit, wherein the regular signed number coding circuit comprises a regular signed number coding processing unit and a partial product acquisition unit, the partial product acquisition unit comprises a first full adder, the correction accumulation circuit comprises a full adder, and the partial product acquisition unit comprises a target code input port, a data input port and a partial product output port; the output end of the regular signed number coding circuit is connected to the input end of the correction accumulation circuit; and the output end of the regular signed number coding processing unit is connected to the input end of the partial product acquisition unit;

the regular signed number coding circuit is used to perform regular signed number coding on received data to obtain a partial product with the sign bit extension eliminated, and the correction accumulation circuit is used to perform accumulation correction on the partial product with the sign bit extension eliminated; the regular signed number coding processing unit is used to perform regular signed number coding on received first data to obtain a target code; the partial product acquisition unit is used to obtain an original partial product according to the target code and to perform logic operations according to the original partial product; the full adder is used to perform accumulation correction on the partial product with the sign bit extension eliminated; the target code input port is used to receive the target code, the data input port is used to receive second data, and the partial product output port is used to output the partial product with the sign bit extension eliminated, obtained according to the target code and the received second data.

2. The multiplier of claim 1, wherein the regular signed number coding processing unit comprises a data input port and a target code output port; the data input port is used to receive the first data to be subjected to regular signed number coding, and the target code output port is used to output the target code obtained after regular signed number coding of the received first data.

3. The multiplier according to claim 2, wherein the partial product acquisition unit is specifically configured to obtain an original partial product according to the target code and to perform binary addition according to the highest-bit value of the original partial product, to obtain the partial product with the sign bit extension eliminated.

4. A machine learning computing device, characterized in that it comprises one or more multipliers according to any one of claims 1-3, for acquiring input data and control information to be computed from other processing devices, and executing a specified machine learning operation, and transmitting the execution result to other processing devices through an I/O interface;

when the machine learning operation device comprises a plurality of multipliers, the multipliers are connected through a preset structure and transmit data.

5. A combination processing device, comprising the machine learning computing device of claim 4, a universal interconnect interface, and other processing devices;

the machine learning operation device interacts with the other processing devices to jointly complete the calculation operation designated by the user.

6. The combination processing device of claim 5, further comprising a storage device connected to the machine learning operation device and the other processing devices, respectively, for storing data of the machine learning operation device and the other processing devices.

7. A neural network chip, characterized in that the neural network chip includes the machine learning arithmetic device according to claim 4 or the combination processing device according to claim 5 or the combination processing device according to claim 6.

8. An electronic device comprising the neural network chip of claim 7.

9. A board card, characterized in that the board card comprises a storage device, a receiving device, a control device and the neural network chip of claim 7;

the neural network chip is respectively connected with the storage device, the control device and the receiving device;

the storage device is used for storing data;

the receiving device is used for realizing data transmission between the neural network chip and external equipment;

the control device is used for monitoring the state of the neural network chip.

10. The board card of claim 9, wherein the board card comprises,

the storage device comprises a plurality of groups of storage units, each group being connected to the neural network chip through a bus, and the storage units are DDR SDRAM;

the chip comprises a DDR controller for controlling data transmission and data storage of each storage unit;

the receiving device is a standard PCIE interface.