CN110515586B

CN110515586B - Multiplier, data processing method, chip and electronic equipment

Info

Publication number: CN110515586B
Application number: CN201910817905.0A
Authority: CN
Inventors: Name withheld at the inventor's request
Original assignee: Shanghai Cambricon Information Technology Co Ltd
Current assignee: Shanghai Cambricon Information Technology Co Ltd
Priority date: 2019-08-30
Filing date: 2019-08-30
Publication date: 2024-04-09
Anticipated expiration: 2039-08-30
Also published as: CN110515586A

Abstract

The application provides a multiplier, a data processing method, a chip and electronic equipment. The multiplier comprises a regular signed number coding circuit, a partial product acquisition circuit and a correction accumulation circuit; the output end of the regular signed number coding circuit is connected with the input end of the partial product acquisition circuit, and the output end of the partial product acquisition circuit is connected with the input end of the correction accumulation circuit. The multiplier can perform regular signed number coding on received data through the regular signed number coding circuit, so that fewer effective partial products are obtained and the complexity of the multiplication operation is reduced.

Description

Multiplier, data processing method, chip and electronic equipment

Technical Field

The present disclosure relates to the field of computer technologies, and in particular, to a multiplier, a data processing method, a chip, and an electronic device.

Background

With the continuous development of digital electronic technology and the rapid development of various artificial intelligence (AI) chips, the demand for high-performance digital multipliers keeps increasing. Neural network algorithms are among the algorithms most widely used by intelligent chips, and multiplication performed by a multiplier is a common operation in such algorithms.

At present, a multiplier typically takes each three-bit value in the multiplier operand as one code, obtains partial products from the multiplicand according to that code, and compresses all the partial products with a Wallace tree to obtain the target operation result of the multiplication. However, in this conventional technology the codes contain many non-zero values, so the number of corresponding partial products is large and the complexity of implementing the multiplication is high.

Disclosure of Invention

Accordingly, in order to solve the above-mentioned problems, it is necessary to provide a multiplier, a data processing method, a chip and an electronic device capable of reducing the number of partial products obtained during the multiplication process, so as to reduce the complexity of the multiplication process of the multiplier.

An embodiment of the application provides a multiplier comprising a regular signed number coding circuit, a partial product acquisition circuit and a correction accumulation circuit, wherein the output end of the regular signed number coding circuit is connected with the input end of the partial product acquisition circuit, and the output end of the partial product acquisition circuit is connected with the input end of the correction accumulation circuit;

the regular signed number coding circuit is used for carrying out regular signed number coding on received data to obtain target coding, the partial product obtaining circuit is used for obtaining an original partial product according to the target coding, carrying out logic operation processing according to the original partial product to obtain a partial product after eliminating the sign bit expansion, and the correction accumulation circuit is used for carrying out accumulation correction processing on the partial product after eliminating the sign bit expansion.

In one embodiment, the regular signed number coding circuit includes: a data input port and a target code output port; the data input port is used for receiving first data to be subjected to regular signed number coding, and the target code output port is used for outputting the target code obtained after the received first data is subjected to regular signed number coding.

In one embodiment, the partial product acquisition circuit includes an original partial product acquisition unit, and a logic gate unit, where the original partial product acquisition unit is configured to obtain an original partial product according to target encoding, and the logic gate unit is configured to perform logic operation processing on a high two-bit numerical value of the original partial product, so as to obtain a partial product with sign bit expansion eliminated.

In one embodiment, the partial product acquisition circuit includes an AND gate circuit.

In one embodiment, the correction accumulation circuit includes: a modified Wallace tree group sub-circuit and an accumulation sub-circuit, wherein the output end of the modified Wallace tree group sub-circuit is connected with the input end of the accumulation sub-circuit;

the modified Wallace tree group sub-circuit is used for carrying out accumulation correction processing on the partial products after sign bit expansion is eliminated, and the accumulation sub-circuit is used for carrying out accumulation processing on the accumulation correction operation result.

In one embodiment, the modified Wallace tree group sub-circuit comprises: a Wallace tree unit, which is used for carrying out accumulation correction processing on each column of values of the partial products after sign bit expansion is eliminated.

In one embodiment, the accumulation sub-circuit includes: an adder, which is used for carrying out an addition operation on the accumulation correction operation result.

In one embodiment, the adder includes: a carry signal input port, a sum-bit signal input port and a result output port; the carry signal input port is used for receiving a carry signal, the sum-bit signal input port is used for receiving a sum-bit signal, and the result output port is used for outputting the target operation result obtained by accumulating the carry signal and the sum-bit signal.
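The split between a Wallace tree stage that emits a carry signal and a sum-bit signal, and a final adder that combines them, can be illustrated with a generic carry-save sketch in Python. This is only an assumption-laden illustration: the function names are hypothetical, the operands are assumed to be non-negative integers, the rows are reduced sequentially rather than in a tree, and the correction step of the modified Wallace tree described in this application is not modeled.

```python
def compress_3_to_2(a: int, b: int, c: int) -> tuple[int, int]:
    """Generic 3:2 carry-save compression applied to whole integers."""
    sum_bits = a ^ b ^ c                              # per-column sum bit
    carry_bits = ((a & b) | (a & c) | (b & c)) << 1   # per-column carry, shifted left
    return sum_bits, carry_bits


def reduce_to_sum_and_carry(rows: list[int]) -> tuple[int, int]:
    """Reduce any number of rows to one sum word and one carry word."""
    pending = list(rows)
    while len(pending) > 2:
        s, c = compress_3_to_2(pending[0], pending[1], pending[2])
        pending = pending[3:] + [s, c]                # three rows in, two rows out
    while len(pending) < 2:
        pending.append(0)                             # pad short inputs with zero rows
    return pending[0], pending[1]


# Illustrative (made-up) rows; the final adder receives the two words.
sum_signal, carry_signal = reduce_to_sum_and_carry([5, 9, 14, 3, 7])
result = sum_signal + carry_signal
```

Only the last line performs a carry-propagating addition, which is the role the adder plays in the accumulation sub-circuit.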

According to the multiplier provided by the embodiment, the regular signed number coding circuit can be used for carrying out regular signed number coding on received data to obtain target codes, the partial product acquisition circuit is used for carrying out logic operation processing on the original partial product according to each bit number value in the target codes to obtain a corresponding partial product after eliminating the sign bit expansion, and finally the correction accumulation circuit is used for carrying out accumulation correction processing on the partial product after eliminating the sign bit expansion.

The embodiment of the application provides a data processing method, which comprises the following steps:

receiving data to be processed;

carrying out regular signed number coding treatment on the data to be treated to obtain an original partial product;

performing logic operation processing according to the original partial product, and eliminating the sign extension bit to obtain a partial product with the sign extension eliminated;

and performing accumulation correction processing on the partial product after the symbol bit expansion is eliminated, so as to obtain a target operation result.

In one embodiment, the performing regular signed number encoding on the data to be processed to obtain an original partial product includes:

carrying out regular signed number coding treatment on the data to be processed to obtain a target code;

and obtaining the original partial product according to the data to be processed and the target code.

In one embodiment, performing regular signed number coding on the data to be processed to obtain a target code includes: converting each run of l consecutive bits of value 1 in the data to be processed into an (l+1)-digit value whose highest digit is 1, whose lowest digit is -1 and whose remaining digits are 0, to obtain the target code, where l >= 2.
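The conversion above preserves the numerical value because of a standard geometric-series identity. As a short worked check (the symbol k, the position of the lowest bit of the run, is introduced here only for illustration):

```latex
\sum_{i=k}^{k+l-1} 2^{i} \;=\; 2^{k+l} - 2^{k}, \qquad l \ge 2 .
% Example with k = 0, l = 2: binary 11 = 3 = 4 - 1, i.e. the digits 1, 0, -1.
% Example with k = 0, l = 3: binary 111 = 7 = 8 - 1, i.e. the digits 1, 0, 0, -1.
```

The right-hand side has two non-zero digits regardless of l, whereas the left-hand side has l of them.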

In one embodiment, the performing logic operation according to the original partial product, and removing the sign extension bit to obtain a partial product after removing the sign bit extension includes: and performing AND logic operation on the highest bit numerical value of the original partial product, and eliminating the sign expansion bit to obtain the partial product after eliminating the sign bit expansion.

According to the data processing method provided by the embodiment, the multiplier receives data to be processed, regular signed number coding is carried out on the data to be processed to obtain an original partial product, logic operation processing is carried out on the original partial product, the symbol expansion bits are eliminated to obtain a partial product after the symbol bit expansion is eliminated, and accumulation correction processing is carried out on the partial product after the symbol bit expansion is eliminated to obtain a target operation result.

The embodiment of the application provides a machine learning operation device, which comprises one or more multipliers; the machine learning operation device is used for acquiring data to be operated and control information from other processing devices, executing specified machine learning operation and transmitting an execution result to the other processing devices through an I/O interface;

when the machine learning operation device comprises a plurality of multipliers, a plurality of calculation devices are connected through a preset specific structure and transmit data;

the multipliers are interconnected through the PCIE bus and transmit data so as to support larger-scale machine learning operation; a plurality of multipliers share the same control system or have respective control systems; the multipliers share the memory or have the memory of each; the interconnection mode of a plurality of multipliers is any interconnection topology.

The embodiment of the application provides a combined processing device, which comprises the machine learning processing device, a general interconnection interface and other processing devices; the machine learning operation device interacts with the other processing devices to jointly complete the operation appointed by the user; the combination processing device may further include a storage device connected to the machine learning operation device and the other processing device, respectively, for storing data of the machine learning operation device and the other processing device.

The embodiment of the application provides a neural network chip, which includes the multiplier, the machine learning computing device or the combination processing device.

The embodiment of the application provides a neural network chip packaging structure, which comprises the neural network chip.

The embodiment of the application provides a board card, which comprises the neural network chip packaging structure.

The embodiment of the application provides an electronic device, which comprises the neural network chip or the board card.

The chip provided by the embodiment of the application comprises at least one multiplier as described in any one of the above.

An electronic device provided in an embodiment of the present application includes a chip as described

Drawings

FIG. 1 is a schematic diagram of a multiplier according to an embodiment;

FIG. 2 is a schematic diagram of another multiplier according to another embodiment;

FIG. 3 is a schematic diagram of a multiplier according to an embodiment;

FIG. 4 is a schematic diagram of another multiplier according to another embodiment;

FIG. 5 is a schematic diagram of a distribution rule of 9 partial products after symbol bit expansion cancellation according to another embodiment;

FIG. 6 is a diagram showing another embodiment of a correction accumulation circuit for 8-bit data operation according to the present invention;

FIG. 7 is a flow chart of a method for processing data according to an embodiment;

FIG. 8 is a flowchart of another method for processing data according to an embodiment;

FIG. 9 is a block diagram of a combination processing apparatus according to an embodiment;

FIG. 10 is a block diagram of another combination processing apparatus according to one embodiment;

FIG. 11 is a schematic structural diagram of a board card according to an embodiment.

Detailed Description

In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be further described in detail with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the present application.

The multiplier provided by the application can be applied to an AI chip, a field-programmable gate array (FPGA) chip or other hardware circuit devices that perform multiplication; schematic diagrams of its specific structure are shown in fig. 1 and fig. 2.

Fig. 1 is a block diagram of a multiplier according to an embodiment. As shown in fig. 1, the multiplier includes: a regular signed number coding circuit 11 and a correction accumulation circuit 12, wherein the output end of the regular signed number coding circuit 11 is connected with the input end of the correction accumulation circuit 12; the regular signed number coding circuit 11 is configured to perform regular signed number coding on the received data to obtain partial products after sign bit expansion is eliminated, and the correction accumulation circuit 12 is configured to perform accumulation correction processing on the partial products after sign bit expansion is eliminated.

Specifically, the regular signed number coding circuit 11 may include a plurality of data processing units with different functions, and the data received by the regular signed number coding circuit 11 may serve as the multiplier or as the multiplicand in a multiplication operation. Optionally, the data processing units with different functions may include a data processing unit that performs regular signed number coding, which may be characterized as coding the data with the values 0, -1 and 1. Optionally, the multiplier and the multiplicand may be multi-bit fixed-point numbers. Optionally, the correction accumulation circuit 12 may perform correction processing while accumulating the partial products, obtained by the regular signed number coding circuit 11, from which sign bit expansion has been eliminated, so as to obtain the target operation result of the multiplication operation.

It should be noted that the multiplier provided in this embodiment may process multiplication of data with a fixed bit width, where the fixed bit width may be 8 bits, 16 bits, 32 bits or 64 bits; this embodiment does not limit it. However, within the same multiplication, the multiplier and the multiplicand received by the regular signed number coding circuit 11 have the same bit width. Optionally, each data processing unit with a different function may have one input port, and the input ports of the data processing units may have the same function; each data processing unit may also have one output port, the output ports of the data processing units may have different functions, and the circuit structures of the data processing units with different functions may differ.

The multiplier provided by this embodiment performs regular signed number coding processing on the received data through the regular signed number coding circuit to obtain partial products after sign bit expansion is eliminated, and the correction accumulation circuit performs accumulation correction processing on those partial products to obtain the target operation result. Because the multiplier uses the regular signed number coding circuit to code the received data, fewer effective partial products are obtained during the multiplication, which reduces the complexity of implementing the multiplication operation; the multiplier can also improve the efficiency of the multiplication operation and effectively reduce its own power consumption.

Fig. 2 is a block diagram of a multiplier according to an embodiment. As shown in fig. 2, the multiplier includes: a regular signed number coding circuit 21, a partial product acquisition circuit 22, and a correction accumulation circuit 23; the output end of the regular signed number coding circuit 21 is connected with the input end of the partial product acquisition circuit 22, and the output end of the partial product acquisition circuit 22 is connected with the input end of the correction accumulation circuit 23. The regular signed number coding circuit 21 is configured to perform regular signed number coding on the received data to obtain a target code, the partial product acquisition circuit 22 is configured to obtain an original partial product according to the target code and perform logic operation processing according to the original partial product to obtain a partial product after sign bit expansion is eliminated, and the correction accumulation circuit 23 is configured to perform accumulation correction processing on the partial product after sign bit expansion is eliminated.

Optionally, the regular signed number coding circuit 21 includes: a data input port 211 and a target code output port 212; the data input port 211 is configured to receive first data to be subjected to regular signed number coding, and the target code output port 212 is configured to output the target code obtained after the received first data is subjected to regular signed number coding.

Optionally, the partial product acquisition circuit 22 includes an original partial product acquisition unit 221 and a logic gate unit 222, where the original partial product acquisition unit 221 is configured to obtain an original partial product according to the target code, and the logic gate unit 222 is configured to perform logic operation processing on the highest-bit value of the original partial product to obtain a partial product after sign bit expansion is eliminated. Optionally, the partial product acquisition circuit 22 includes an AND gate circuit.

Specifically, the regular signed number coding circuit 21 may receive the first data and perform regular signed number coding processing on the first data to obtain a target code; the first data may be the multiplier in a multiplication operation. It should be noted that the regular signed number coding processing may be characterized as follows: for an N-bit multiplier, processing proceeds from the low-order values to the high-order values; if there are l (l >= 2) consecutive bits of value 1, the l consecutive 1s can be converted into the data "1 (0)_{l-1} (-1)", that is, a 1 followed by (l-1) zeros and a final -1, and the remaining corresponding (N-l) bit values are combined with the converted (l+1) digits to obtain new data; the new data is then used as the initial data of the next conversion step, until no run of l (l >= 2) consecutive 1s remains. After the N-bit multiplier is subjected to regular signed number coding, the bit width of the resulting target code may be equal to (N+1). Further, in the regular signed number coding processing, data 11 may be converted to (100 - 001), i.e., data 11 is equivalent to 10(-1); data 111 may be converted to (1000 - 0001), i.e., data 111 is equivalent to 100(-1); other runs of l (l >= 2) consecutive 1s are converted in a similar way.

For example, suppose the multiplier received by the regular signed number coding circuit 21 is "001010101101110". The first conversion step applied to the multiplier yields the first new data "0010101011100(-1)0"; the second conversion step, applied to the first new data, yields the second new data "0010101100(-1)00(-1)0"; the third conversion step, applied to the second new data, yields the third new data "0010110(-1)00(-1)00(-1)0"; the fourth conversion step, applied to the third new data, yields the fourth new data "00110(-1)0(-1)00(-1)00(-1)0"; and the fifth conversion step, applied to the fourth new data, yields the fifth new data "010(-1)0(-1)0(-1)00(-1)00(-1)0". The fifth new data contains no run of l (l >= 2) consecutive 1s, so it can be called the intermediate code; after one bit-supplementing step is applied to the intermediate code, the regular signed number coding processing is complete. The bit width of the intermediate code may be equal to the bit width of the multiplier. Optionally, in the new data (i.e. the intermediate code) obtained after the regular signed number coding circuit 21 processes the multiplier, if the highest-bit value and the next-highest-bit value are "10" or "01", the regular signed number coding circuit 21 may supplement one bit of value 0 above the highest bit of the intermediate code, so that the highest three values of the corresponding target code are "010" or "001", respectively. Optionally, the bit width of the intermediate code may be equal to the bit width of the target code minus 1.
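The conversion steps in this example can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the circuit itself: the function names `encode_regular_signed` and `digits_value` are hypothetical, the input is assumed to be a non-negative binary string written highest bit first, and the stage-by-stage conversion is collapsed into a single scan from the low-order end.

```python
def encode_regular_signed(bits: str) -> list[int]:
    """Return N+1 digits from {-1, 0, 1} (highest digit first) for an N-bit string."""
    digits = [0] + [int(ch) for ch in bits]   # supplement one 0 above the highest bit
    i = len(digits) - 1                       # start at the lowest-order digit
    while i > 0:
        if digits[i] == 1 and digits[i - 1] == 1:
            j = i
            while j > 0 and digits[j - 1] == 1:
                j -= 1                        # j ends at the top of the run of 1s
            digits[j - 1] = 1                 # new highest digit of the converted run
            for k in range(j, i):
                digits[k] = 0                 # the (l-1) zeros in the middle
            digits[i] = -1                    # lowest digit of the run becomes -1
            i = j - 1                         # the new 1 may extend a higher run
        else:
            i -= 1
    return digits


def digits_value(digits: list[int]) -> int:
    """Numerical value of a signed-digit list written highest digit first."""
    value = 0
    for d in digits:
        value = 2 * value + d
    return value


target_code = encode_regular_signed("001010101101110")
```

For the 15-bit multiplier of the example, `target_code` has 16 digits: a supplemented 0 followed by the digits of the fifth new data, and `digits_value(target_code)` equals the value of the original multiplier.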

Optionally, the bit width of the target code may be equal to N + 1, where N is the bit width of the multiplier operand received by the multiplier, and the bit width of the target code may be equal to the number of original partial products. The original partial product acquisition unit 221 in the partial product acquisition circuit 22 may obtain a corresponding original partial product for each digit of the target code; the logic gate unit 222 then performs logic operation processing on the highest-bit value of each original partial product and directly eliminates the sign extension bits to obtain the partial product after sign bit expansion is eliminated. Optionally, the original partial product may be a partial product without sign bit expansion. In addition, the logic gate unit 222 determines from the highest-bit value of the original partial product an additional one-bit value in the partial product after sign bit expansion is eliminated, which may be denoted by Q. Optionally, the logic gate unit 222 may include an AND gate circuit.

It should be noted that, if the highest-bit value of the original partial product is denoted by a, the partial product acquisition circuit 22 may perform an AND logic operation on a and the signal 1 through the AND gate circuit to obtain a', the value of the corresponding bit in the partial product after sign bit expansion is eliminated; that is, a' is the sum-bit signal of a and the signal 1. The additional one-bit value Q in the partial product after sign bit expansion is eliminated may be equal to the carry signal of a and the signal 1. The relationship between the highest-bit value a of the original partial product and, after the logic operation processing, the corresponding value a' and the additional one-bit value Q in the partial product after sign bit expansion is eliminated can be seen in Table 1.

TABLE 1

In the multiplier provided by this embodiment, the regular signed number coding circuit performs regular signed number coding processing on the received first data to obtain the target code. The partial product acquisition circuit then obtains an original partial product for each digit of the target code and, through the logic gate unit, performs logic operation processing on the high-order data of the original partial product to eliminate sign bit expansion, obtaining the partial product after sign bit expansion is eliminated. Finally, the correction accumulation circuit performs accumulation correction processing on the partial products after sign bit expansion is eliminated. Because the received data is coded by the regular signed number coding circuit, fewer effective partial products are obtained during the multiplication, which reduces the complexity of implementing the multiplication operation; the multiplier can also improve the efficiency of the multiplication operation and effectively reduce its own power consumption.

Fig. 3 is a schematic diagram of a specific structure of a multiplier provided in one embodiment. As shown in fig. 3, the multiplier includes the regular signed number coding circuit 11, and the regular signed number coding circuit 11 includes: a regular signed number coding processing unit 111 and a partial product acquisition unit 112; the output end of the regular signed number coding processing unit 111 is connected to the input end of the partial product acquisition unit 112. The regular signed number coding processing unit 111 is configured to perform regular signed number coding processing on the received first data to obtain a target code, and the partial product acquisition unit 112 is configured to obtain an original partial product according to the target code and perform logic operation processing according to the original partial product.

Optionally, the partial product acquisition unit 112 is specifically configured to obtain an original partial product according to the target code and perform a binary addition operation on the highest-bit value of the original partial product, so as to obtain the partial product after sign bit expansion is eliminated. Optionally, the partial product acquisition unit 112 includes first full adders 112a and 1122b.

Specifically, the regular signed number coding processing unit 111 may receive the first data and perform regular signed number coding processing on the first data to obtain a target code; the first data may be the multiplier in a multiplication operation. It should be noted that the regular signed number coding processing may be characterized as follows: for an N-bit multiplier, processing proceeds from the low-order values to the high-order values; if there are l (l >= 2) consecutive bits of value 1, the l consecutive 1s can be converted into the data "1 (0)_{l-1} (-1)", that is, a 1 followed by (l-1) zeros and a final -1, and the remaining corresponding (N-l) bit values are combined with the converted (l+1) digits to obtain new data; the new data is then used as the initial data of the next conversion step, until no run of l (l >= 2) consecutive 1s remains. After the N-bit multiplier is subjected to regular signed number coding, the bit width of the resulting target code may be equal to (N+1). Further, in the regular signed number coding processing, data 11 may be converted to (100 - 001), i.e., data 11 is equivalent to 10(-1); data 111 may be converted to (1000 - 0001), i.e., data 111 is equivalent to 100(-1); other runs of l (l >= 2) consecutive 1s are converted in a similar way.

For example, suppose the multiplier received by the regular signed number coding processing unit 111 is "001010101101110". The first conversion step applied to the multiplier yields the first new data "0010101011100(-1)0"; the second conversion step, applied to the first new data, yields the second new data "0010101100(-1)00(-1)0"; the third conversion step, applied to the second new data, yields the third new data "0010110(-1)00(-1)00(-1)0"; the fourth conversion step, applied to the third new data, yields the fourth new data "00110(-1)0(-1)00(-1)00(-1)0"; and the fifth conversion step, applied to the fourth new data, yields the fifth new data "010(-1)0(-1)0(-1)00(-1)00(-1)0". The fifth new data contains no run of l (l >= 2) consecutive 1s, so it can be called the intermediate code; after one bit-supplementing step is applied to the intermediate code, the regular signed number coding processing is complete. The bit width of the intermediate code may be equal to the bit width of the multiplier. Optionally, in the new data (i.e. the intermediate code) obtained after the regular signed number coding processing unit 111 processes the multiplier, if the highest-bit value and the next-highest-bit value are "10" or "01", the regular signed number coding processing unit 111 may supplement one bit of value 0 above the highest bit of the intermediate code, so that the highest three values of the corresponding target code are "010" or "001", respectively.
Optionally, the bit width of the intermediate code may be equal to the bit width of the target code minus 1.

The bit width of the target code may be equal to N + 1, where N is the bit width of the multiplier operand received by the multiplier, and the bit width of the target code may be equal to the number of original partial products. The partial product acquisition unit 112 may obtain a corresponding original partial product for each digit of the target code, and perform an AND logic operation on the highest-bit value of each original partial product through the two first full adders 112a and 1122b included in the partial product acquisition unit 112. Optionally, the bit width of the original partial product may be equal to the bit width N of the multiplier operand. Optionally, as shown in the above example, the target code includes three values, namely -1, 0 and 1: the partial product acquisition unit 112 may obtain the original partial product -X from the received value -1 and the multiplicand X, the original partial product X from the received value 1 and the multiplicand X, and the original partial product 0 from the received value 0 and the multiplicand X.
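The mapping from target-code digits to original partial products can be sketched in Python. This is an arithmetic illustration only, under stated assumptions: the function names are hypothetical, the multiplicand X = 23 is an arbitrary made-up value, the target code is the 16-digit code of the worked example (highest digit first), and sign bit expansion elimination and the correction accumulation circuit are not modeled; the partial products are simply weighted and summed.

```python
def original_partial_products(x: int, target_code: list[int]) -> list[int]:
    """One partial product per target-code digit: -X for -1, 0 for 0, X for 1."""
    return [digit * x for digit in target_code]


def accumulate(partials: list[int]) -> int:
    """Sum the partial products, each weighted by the position of its digit."""
    n = len(partials)
    # The digit at index k (highest digit first) carries weight 2**(n - 1 - k).
    return sum(p * (1 << (n - 1 - k)) for k, p in enumerate(partials))


# Target code of the multiplier "001010101101110" from the worked example.
TARGET_CODE = [0, 0, 1, 0, -1, 0, -1, 0, -1, 0, 0, -1, 0, 0, -1, 0]
X = 23  # illustrative multiplicand (assumption)

partials = original_partial_products(X, TARGET_CODE)
product = accumulate(partials)
nonzero_count = sum(1 for p in partials if p != 0)
```

In this example only 6 of the 16 partial products are non-zero, while the original binary multiplier contains 8 bits of value 1.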

It should be noted that, if the highest-bit value of the original partial product is denoted by a, performing a logic operation on a yields an additional one-bit value in the partial product after sign bit expansion is eliminated, which may be denoted by Q. Optionally, Q may be determined from the result of the AND logic operation performed on the highest-bit value a of the original partial product and the signal 1: Q may be equal to the carry signal of that operation, and the next-highest-bit value in the partial product after sign bit expansion is eliminated may be equal to its sum signal.
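Because the paragraph above speaks of a sum signal and a carry signal, one possible reading is that a and the signal 1 are combined by single-bit binary addition. The Python sketch below illustrates that reading only; it is an assumption, the function name is hypothetical, and it is not a statement of the actual gate-level circuit or of the contents of Table 1.

```python
def add_highest_bit_to_one(a: int) -> tuple[int, int]:
    """Single-bit addition of a and the constant 1 (assumed interpretation).

    Returns (sum_bit, carry_bit): under this reading the sum bit would be the
    next-highest-bit value and the carry bit would be the additional value Q.
    """
    if a not in (0, 1):
        raise ValueError("a must be a single bit (0 or 1)")
    sum_bit = a ^ 1     # sum of a + 1, modulo 2
    carry_bit = a & 1   # carry out of a + 1
    return sum_bit, carry_bit


for a in (0, 1):
    sum_bit, q = add_highest_bit_to_one(a)
    print(f"a={a} -> sum bit={sum_bit}, Q={q}")
```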

In the multiplier provided by this embodiment, the canonical signed number coding processing unit performs canonical signed number encoding on the received first data to obtain the target code. The partial product acquisition unit then performs an AND logic operation based on each bit value in the target code and the highest bit value of the original partial product, which eliminates the sign bit extension and yields the partial products after the sign bit extension is eliminated. Finally, the correction accumulation circuit performs accumulation correction processing on those partial products. Because the received data is encoded as a canonical signed number, the number of effective partial products obtained during the multiplication is reduced, which reduces the complexity of the multiplication; the multiplier can also improve the operation efficiency of multiplication and effectively reduce its power consumption.

In one embodiment, the multiplier includes the canonical signed number coding processing unit 111, and the canonical signed number coding processing unit 111 includes: a data input port 1111 and a target code output port 1112; the data input port 1111 is configured to receive the first data to be subjected to canonical signed number encoding, and the target code output port 1112 is configured to output the target code obtained by performing canonical signed number encoding on the received first data.

Specifically, if the data input port 1111 receives the first data, the canonical signed number coding processing unit 111 may perform canonical signed number encoding on the received first data to obtain a target code, and output the target code through the target code output port 1112. Optionally, the first data received through the data input port 1111 may be the multiplier in the multiplication operation. The canonical signed number coding circuit 11 shown in fig. 3 has the same internal circuit configuration, external output ports, and function as the canonical signed number coding processing unit 111. Optionally, the values in the target code obtained by encoding the multiplier may be -1, 0, and 1.

The multiplier provided by the embodiment can perform regular signed number coding processing on the received first data to obtain target codes, then the partial product obtaining unit can obtain corresponding partial products after eliminating the sign bit expansion according to each bit number value in the target codes, and can perform accumulation correction processing on the partial products after eliminating the sign bit expansion through the correction accumulation circuit to obtain target operation results in multiplication operation, so that the multiplier can perform regular signed number coding processing on the received data through the regular signed number coding processing unit, the number of effective partial products obtained in the multiplication operation process is reduced, and the complexity of the multiplier for realizing multiplication operation is reduced; meanwhile, the multiplier can improve the operation efficiency of multiplication operation and effectively reduce the power consumption of the multiplier.

In one embodiment, the multiplier includes the partial product acquisition unit 112, and the partial product acquisition unit 112 includes: a target code input port 1121, a data input port 1122, and a partial product output port 1123; the target code input port 1121 is configured to receive the target code, the data input port 1122 is configured to receive second data, and the partial product output port 1123 is configured to output the partial product, with the sign bit extension eliminated, obtained from the target code and the received second data.

Specifically, the partial product obtaining unit 112 may receive, through the target code input port 1121, the target code output by the canonical signed number coding processing unit 111, and may obtain an original partial product from each bit value of that target code and the second data received through the data input port 1122, where the second data may be the multiplicand in the multiplication operation. The unit then performs AND logic operation processing on the original partial product to obtain the corresponding partial product after the sign bit extension is eliminated. Optionally, the bit width of the partial product after the sign bit extension is eliminated may be equal to the bit width of the original partial product.

According to the multiplier provided by the embodiment, the multiplier can obtain the corresponding partial product after the symbol bit expansion is eliminated according to each bit value in the target code through the partial product obtaining unit, and can carry out accumulation correction processing on the partial product after the symbol bit expansion is eliminated through the correction accumulation circuit, so that the target operation result in the multiplication operation is obtained, the reduction of the number of the effective partial products obtained by the multiplier is ensured, and the complexity of the multiplier for realizing the multiplication operation is reduced; meanwhile, the multiplier can improve the operation efficiency of multiplication operation and effectively reduce the power consumption of the multiplier.

In one embodiment, the specific structure of the multiplier shown in fig. 3 is further illustrated, where the multiplier includes the correction accumulation circuit 12, and the correction accumulation circuit 12 includes: full adders 121 to 12n, wherein the full adders 121 to 12n are configured to perform accumulation correction processing on the received partial products after the sign bit extension is eliminated.

Specifically, the full adders 121 to 12n may implement a binary addition and summation combination circuit by using a gate circuit, and may also be understood as a circuit for processing a multi-bit input signal and adding the multi-bit input signal to obtain a two-bit output signal. Alternatively, the number N of full adders included in the correction accumulation circuit 12 may be equal to the product of the bit width N of the partial product after the sign bit expansion is eliminated and (n+1) and N, where N may represent the number of values included in the target code obtained by the regular signed number code processing unit 111 minus 1, that is, the number of target codes is equal to n+1. Alternatively, the distribution rule of the n full adders in the correction accumulation circuit 12 may be a layer-by-layer distribution, and each partial product obtained by the partial product obtaining unit 112 after the symbol bit expansion is eliminated may correspond to one layer of full adder. The number of layers of full adders may be equal to the number of partial products after the symbol bit expansion is eliminated, the number of full adders of the last layer may be equal to the sum of the bit width N of the partial products after the symbol bit expansion is eliminated and 1 and N, and the number of full adders of each other layer may be equal to the bit width N of the partial products after the symbol bit expansion is eliminated. In addition, when the accumulation processing is performed on the partial products after the expansion of all the elimination sign bits, the position of the lowest numerical value of the partial product after the expansion of each elimination sign bit is staggered by one numerical value to the right than the position of the lowest numerical value of the partial product after the expansion of the next elimination sign bit. 
Alternatively, after the full adders 121 to 12n end the accumulation correction process, an operation result may be obtained, and the operation result may be a sum bit signal output by the last full adder. The internal circuit configuration of the full adders 121 to 12n may be the same as the internal circuit configuration of the first full adders 112a and 1122b, and the functions may be the same.

It should be noted that each full adder in the correction accumulation circuit 12 may add two or more input signals to obtain two output signals: a carry signal Carry and a result bit signal Sum. Optionally, in this embodiment, each full adder in the correction accumulation circuit 12 may receive three input signals, each of which may be a value of a partial product after the sign bit extension is eliminated, the carry output signal Carry of the lower-order full adder, a result bit signal Sum, or a binary signal. Optionally, while the correction accumulation circuit 12 performs accumulation correction processing on the partial products after the sign bit extension is eliminated, full adders in the circuit may apply correction processing to two of the partial products obtained by the partial product obtaining unit 112, where the correction processing corresponds to adding 1.
Optionally, the first full adder in the correction accumulation circuit 12 may accumulate the corresponding bits of the first and second partial products (after the sign bit extension is eliminated) obtained by the partial product obtaining unit 112; the second full adder may accumulate the third such partial product with the result of the previous full adder; and so on. The last full adder may accumulate the result of the previous full adder, any carry signals or sum bit signals output by earlier full adders that have not yet been processed, and the last partial product obtained by the partial product obtaining unit 112, to obtain the target operation result of the multiplication. During this process, the input signals received by each full adder of the other layers may include not only the corresponding bit value of a partial product after the sign bit extension is eliminated, but also the value of the corresponding bit output by the previous full adder.
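The behaviour of a single full adder as described above, three one-bit inputs producing a Carry and a Sum, can be sketched in Python. This is a minimal illustrative model rather than the circuit of the embodiment: the names `full_adder` and `ripple_add` are hypothetical, and the chained row only models the addition of two bit vectors, without the correction step.

```python
def full_adder(a, b, cin):
    """Add three one-bit inputs and return (carry, sum)."""
    s = a ^ b ^ cin
    carry = (a & b) | (a & cin) | (b & cin)
    return carry, s


def ripple_add(x_bits, y_bits):
    """Add two equal-length bit lists (least significant bit first).

    Each position uses one full adder; the carry into the lowest
    position is 0 and the final carry becomes the extra top bit.
    """
    carry = 0
    out = []
    for a, b in zip(x_bits, y_bits):
        carry, s = full_adder(a, b, carry)
        out.append(s)
    out.append(carry)
    return out
```

For instance, `ripple_add([1, 1, 0], [1, 0, 1])` adds 3 and 5 and yields the bits of 8, least significant bit first.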

Alternatively, the correction accumulation circuit 12 may perform correction processing twice in the process of accumulating the partial product after the symbol bit expansion is eliminated, where the correction accumulation circuit 12 may perform correction processing on the values in the partial product after the symbol bit expansion through two full adders in the first layer and the last layer full adder, where if each full adder corresponds to a number, the full adder performing correction processing in the first layer full adder may be the full adder of the next highest number, and the full adder performing correction processing in the last layer full adder may be the full adder of the highest number. In addition, the carry input signal received by the full adder of the lowest bit number of the last layer full adder may be equal to 0.

According to the multiplier provided by the embodiment, the correction accumulation circuit in the multiplier can carry out accumulation correction processing on the partial product obtained by the partial product obtaining unit after less symbol bit expansion is eliminated, so that a target operation result in multiplication operation is obtained, the complexity of the multiplier in realizing the multiplication operation is reduced, and the power consumption of the multiplier is effectively reduced.

Fig. 4 is a schematic diagram of a specific structure of a multiplier according to another embodiment, where the multiplier includes the modified accumulation circuit 23, and the modified accumulation circuit 23 includes: a modified Wallace tree group sub-circuit 231 and an accumulation sub-circuit 232; wherein, the output end of the modified Wallace tree group sub-circuit 231 is connected with the input end of the accumulation sub-circuit 232; the modified wallace tree group sub-circuit 231 is configured to perform an accumulation modification process on the partial product after the symbol bit expansion is eliminated, and the accumulation sub-circuit 232 is configured to perform an accumulation process on the accumulation modification operation result.

Specifically, the modified Wallace tree group sub-circuit 231 may perform accumulation correction processing on the values in the partial products, obtained by the regular signed number encoding circuit 211, after the sign bit extension is eliminated, and the accumulation sub-circuit 232 may perform accumulation processing on the accumulation correction operation result obtained by the modified Wallace tree group sub-circuit 231, so as to obtain the target operation result of the multiplication operation.

In one embodiment, the specific structure of the multiplier shown in fig. 4 is further illustrated, wherein the multiplier includes the modified wallace tree group sub-circuit 231, and the modified wallace tree group sub-circuit 231 includes: wallace tree units 2311 to 231n, wherein the Wallace tree units 2311 to 231n are used for performing accumulation correction processing on each column number of the partial product after the symbol bit expansion is eliminated.

Specifically, the Wallace tree units 2311 to 231n may each be implemented as a combination of full adders and half adders; a Wallace tree unit may also be understood as a circuit that processes a multi-bit input signal and adds its bits to obtain a two-bit output signal. Optionally, the number n of Wallace tree units included in the modified Wallace tree group sub-circuit 231 may be equal to 2 times the bit width N of the partial product after the sign bit extension is eliminated, where N may represent the number of values included in the target code obtained by the canonical signed number coding circuit 21 minus 1. The n Wallace tree units may process the partial products of the target code in parallel, although they are connected in series; here the partial products of the target code are all of the partial products after the sign bit extension is eliminated that are obtained by the partial product obtaining circuit 22. Optionally, each Wallace tree unit in the modified Wallace tree group sub-circuit 231 may add all the values in one column of all the partial products after the sign bit extension is eliminated, and may output two signals, namely a carry signal Carry_i and a sum bit signal Sum_i, where i may represent the number of the Wallace tree unit and the first Wallace tree unit is numbered 0. Optionally, the number of input signals received by each Wallace tree unit may be equal to the number of values contained in the target code (that is, the total number of partial products after the sign bit extension is eliminated), or to that number plus 1.

In addition, while adding the values of each column of the partial products after the sign bit extension is eliminated, the multiplier corrects two columns of data through two Wallace tree units in the modified Wallace tree group sub-circuit 231. That is, each of the two Wallace tree units corresponding to those two columns receives one more input signal than the Wallace tree units corresponding to the other columns, and that additional input signal is 1.

In addition, the signals of each Wallace tree unit in the modified Wallace tree group sub-circuit 231 may include a carry input signal Cin_i, a partial product input signal, and a carry output signal Cout_i. Optionally, the partial product input signal received by each Wallace tree unit may be the values of one column of all the partial products after the sign bit extension is eliminated, and the number of bits of the carry output signal Cout_i of each Wallace tree unit may be N_Cout = floor((N_I + N_Cin) / 2) - 1, where N_I may represent the number of partial product value input signals of the Wallace tree unit, N_Cin may represent the number of carry input signals of the Wallace tree unit, N_Cout may represent the minimum number of carry output signals of the Wallace tree unit, and floor(·) may represent the round-down function. Optionally, the carry input signal received by each Wallace tree unit in the modified Wallace tree group sub-circuit 231 may be the carry output signal of the previous Wallace tree unit, and the carry input signal received by the first Wallace tree unit is 0; the number of carry signal input ports of the first Wallace tree unit may be the same as that of the other Wallace tree units.
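The carry-count relation above can be evaluated directly. The following Python sketch only transcribes the stated formula; the function name `carry_out_count` and the sample input counts are illustrative assumptions and are not taken from the embodiment.

```python
def carry_out_count(n_partial_inputs, n_carry_inputs):
    """Return N_Cout = floor((N_I + N_Cin) / 2) - 1 for one Wallace tree unit.

    n_partial_inputs: N_I, the number of partial product value inputs.
    n_carry_inputs:   N_Cin, the number of carry inputs.
    Integer floor division implements floor(.) for non-negative counts.
    """
    return (n_partial_inputs + n_carry_inputs) // 2 - 1
```

Under this formula, a unit with 9 partial product inputs and no carry inputs would have a carry output width of 3, and the same unit with 3 carry inputs would have a width of 5.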

In this embodiment, if the n Wallace tree units connected in series in the modified Wallace tree group sub-circuit 231 are numbered 1, 2, …, i, …, n, the modified Wallace tree group sub-circuit 231 may perform correction processing on two columns of data of the partial products after the sign bit extension is eliminated through the i-th Wallace tree unit and the n-th Wallace tree unit. If the bits of the first partial product after the sign bit extension is eliminated are numbered 1, 2, …, m-2, m-1, m from the lowest bit to the highest bit, where m is the number of the Q bit and 1 is the number of the lowest bit, then i may be equal to N. In other words, the modified Wallace tree group sub-circuit 231 may perform the correction processing through the N-th Wallace tree unit and the last Wallace tree unit, where N may represent the bit width of the multiplier operand received by the multiplier.

For example, if the multiplier currently handles an 8-bit by 8-bit fixed-point multiplication, each partial product after the sign bit extension is eliminated, as obtained by the partial product acquisition circuit 22, is "p_i8 p_i7 p_i6 p_i5 p_i4 p_i3 p_i2 p_i1 p_i0" (i = 1, …, n, with n = 9), where i denotes the i-th partial product after the sign bit extension is eliminated. During accumulation correction processing, the 9 partial products may be arranged as shown in fig. 5, where each dot represents one bit value of a partial product. The figure shows 17 columns of values from the rightmost column to the leftmost column; in actual operation the value in the last column overflows, that is, the highest bit value of the last partial product after the sign bit extension is eliminated does not participate in the subsequent accumulation. In total, 16 Wallace tree units are required to perform accumulation correction processing on the 9 partial products, and the modified Wallace tree group sub-circuit 231 may perform the correction processing through the 8th Wallace tree unit and the last Wallace tree unit. The connection of the 16 Wallace tree units, including the two Wallace tree units that implement the correction processing, is shown in fig. 6, where Wallace_i denotes the Wallace tree unit numbered i, and the connections drawn between Wallace tree units in fig. 6 indicate the carry signals passed from one unit to the next higher-numbered unit.

According to the multiplier provided by the embodiment, the modified Wallace tree group sub-circuit in the multiplier can perform accumulation modification processing on the partial product obtained by the partial product obtaining unit after the less elimination of the sign bit expansion, so that a target operation result in multiplication operation is obtained, the complexity of the multiplier in realizing the multiplication operation is reduced, and the power consumption of the multiplier is effectively reduced.

In one embodiment, the specific structure of the multiplier shown in fig. 4 is further illustrated, where the multiplier includes the accumulation sub-circuit 232, and the accumulation sub-circuit 232 includes: and an adder 2321, where the adder 2321 is configured to perform an addition operation on the accumulation correction operation result.

Specifically, the adder 2321 may be an adder of any of various bit widths, and may be a carry-lookahead adder. Optionally, the adder 2321 may receive the two signals output by the modified Wallace tree group sub-circuit 231 and add them to obtain the target operation result of the multiplication operation.

According to the multiplier provided by the embodiment, the multiplier can carry out accumulation processing on two paths of signals output by the modified Wallace tree group subcircuit through the accumulation subcircuit to obtain a target operation result of multiplication operation, and the process can reduce the complexity of the multiplier in realizing the multiplication operation and effectively reduce the power consumption of the multiplier.

In one embodiment, the multiplier includes the adder 2321, and the adder 2321 includes: a carry signal input port 2321a, a sum bit signal input port 2321b, and a result output port 2321c; the carry signal input port 2321a is configured to receive a carry signal, the sum bit signal input port 2321b is configured to receive a sum bit signal, and the result output port 2321c is configured to output the target operation result obtained by performing accumulation processing on the carry signal and the sum bit signal.

Specifically, the adder 2321 may receive the carry signal Carry output by the modified Wallace tree group sub-circuit 231 through the carry signal input port 2321a, receive the sum bit signal Sum output by the modified Wallace tree group sub-circuit 231 through the sum bit signal input port 2321b, and output the result of accumulating the carry signal Carry and the sum bit signal Sum through the result output port 2321c.

It should be noted that, during multiplication, the multiplier may use adders 2321 of different bit widths to add the carry output signal Carry and the sum bit output signal Sum output by the modified Wallace tree group sub-circuit 231, where the bit width of the data that the adder 2321 can process may be equal to 2 times the bit width N of the data currently processed by the multiplier. Optionally, each Wallace tree unit in the modified Wallace tree group sub-circuit 231 may output a carry output signal Carry_i and a sum bit output signal Sum_i (i = 0, …, 2N-1, where i is the number of each Wallace tree unit, starting from 0). Optionally, the carry output signal received by the adder 2321 is Carry = {[Carry_0 : Carry_(2N-2)], 0}; that is, the bit width of the carry output signal Carry received by the adder 2321 is 2N, the first 2N-1 bits of Carry correspond to the carry output signals of the first 2N-1 Wallace tree units in the modified Wallace tree group sub-circuit 231, and the last bit of Carry may be replaced with 0. Optionally, the sum bit output signal Sum received by the adder 2321 may have a bit width of 2N, and each bit of Sum may be equal to the sum bit output signal of the corresponding Wallace tree unit in the modified Wallace tree group sub-circuit 231.
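The way the two vectors Sum and Carry are combined by the final adder can be illustrated with a small carry-save model in Python. This is a sketch under stated assumptions: it reads the notation {[Carry_0 : Carry_(2N-2)], 0} as the usual carry-save alignment, in which each carry bit weighs one position more than the sum bit of the same column, so the carry word is shifted up by one position, a 0 fills the lowest position, and the carry of the last unit is dropped. The 3:2 compression used to produce the two words is a generic stand-in for the Wallace tree units, and the names `compress_3_to_2` and `final_add` are hypothetical.

```python
def compress_3_to_2(a, b, c):
    """Compress three integers into a sum word and a carry word, column by column."""
    sum_word = a ^ b ^ c                      # per-column sum bits
    carry_word = (a & b) | (a & c) | (b & c)  # per-column carry bits
    return sum_word, carry_word


def final_add(sum_word, carry_word, width):
    """Model the final adder on `width`-bit inputs.

    The carry word is moved up one position, which places a 0 in the
    lowest bit and discards the carry of the highest column, and the
    two words are then added modulo 2**width.
    """
    mask = (1 << width) - 1
    aligned_carry = (carry_word << 1) & mask
    return (sum_word + aligned_carry) & mask
```

With `width = 16`, `final_add(*compress_3_to_2(1234, 567, 89), 16)` reproduces the plain sum of the three operands, which is the property the final carry-lookahead addition relies on.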

For example, if the multiplier currently processes an 8-bit by 8-bit fixed-point multiplication, the adder 2321 may be a 16-bit carry-lookahead adder. As shown in fig. 6, the modified Wallace tree group sub-circuit 231 may output the sum bit output signal Sum and the carry output signal Carry of the 16 Wallace tree units. The sum bit output signal received by the 16-bit carry-lookahead adder may be the complete sum bit signal Sum output by the modified Wallace tree group sub-circuit 231, while the received carry output signal may be the signal Carry formed by combining the carry output signals of all Wallace tree units except the last one with 0.

According to the multiplier provided by the embodiment, the accumulation sub-circuit can be used for carrying out accumulation processing on two paths of signals output by the modified Wallace tree group sub-circuit to obtain the target operation result of multiplication operation, and the process can reduce the complexity of the multiplier in realizing the multiplication operation and effectively reduce the power consumption of the multiplier.

Fig. 7 is a flow chart of a data processing method provided in an embodiment, which can be processed by the multiplier shown in fig. 1, and the embodiment relates to a data multiplication operation process. As shown in fig. 7, the method includes:

S101, receiving data to be processed.

Specifically, the multiplier may receive data to be processed, which may be a multiplier and a multiplicand in a multiplication operation, through a canonical signed number encoding circuit. Wherein the bit width of the multiplier may be equal to the bit width of the multiplicand.

S102, carrying out regular signed number coding processing on the data to be processed to obtain target codes.

Specifically, the multiplier can perform canonical signed number encoding on the received multiplier operand to be processed through the canonical signed number coding circuit to obtain the target code, where the bit width of the target code can be equal to the bit width N of the multiplier operand to be processed plus 1.

Optionally, the step in S102 of performing canonical signed number encoding on the data to be processed to obtain the target code may include: converting each run of l consecutive bits of value 1 in the data to be processed into an (l+1)-bit value whose highest bit is 1, whose lowest bit is -1, and whose remaining bits are 0, to obtain the target code, where l is greater than or equal to 2.

It should be noted that the canonical signed number encoding described above may be characterized as follows: for an N-bit multiplier operand, processing proceeds from the low-order bits to the high-order bits. If there is a run of l (l >= 2) consecutive bits of value 1, the l consecutive 1s are converted into the data "1(0)_(l-1)(-1)", that is, a 1 followed by l-1 zeros and then -1, and the remaining (N-l) bits are combined with the converted (l+1) bits to obtain new data. The new data is then used as the initial data for the next conversion, until no run of l (l >= 2) consecutive bits of value 1 remains. After canonical signed number encoding of the N-bit multiplier operand, the bit width of the resulting target code may be equal to (N+1).
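The run-conversion rule described above can be sketched in Python as follows. This is a minimal model, assuming the multiplier operand is treated as an unsigned N-bit integer; the function name `csd_encode` and the least-significant-digit-first list layout are illustrative choices, not part of the embodiment.

```python
def csd_encode(value, n):
    """Encode an unsigned n-bit integer as n + 1 digits from {-1, 0, 1}.

    Digits are returned least significant first. Every run of two or
    more consecutive 1s is rewritten as 1, 0, ..., 0, -1 (high to low),
    scanning from the low-order end until no such run remains.
    """
    digits = [(value >> i) & 1 for i in range(n)] + [0]
    i = 0
    while i < n:
        if digits[i] == 1 and digits[i + 1] == 1:
            # Locate the first position j above the run of 1s starting at i.
            j = i
            while digits[j] == 1:
                j += 1
            digits[i] = -1            # lowest digit of the run becomes -1
            for k in range(i + 1, j):
                digits[k] = 0         # the rest of the run becomes 0
            digits[j] = 1             # a 1 is placed just above the run
            i = j                     # the new 1 may start another run
        else:
            i += 1
    return digits
```

For example, `csd_encode(27, 5)` turns the binary value 11011 into the digits 1, 0, 0, -1, 0, -1 (high to low), since 32 - 4 - 1 = 27; only three digits are non-zero, compared with four 1s in the binary form.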

S103, obtaining a partial product after eliminating the sign bit expansion according to the data to be processed and the target code.

It should be noted that the regular signed number coding circuit may obtain a partial product after eliminating the sign bit expansion according to the multiplicand in the multiplication operation and the target code obtained by the regular signed number coding, and the number of the partial products after eliminating the sign bit expansion may be equal to the bit width of the target code.

S104, performing accumulation correction processing on the partial product after the symbol bit expansion is eliminated, and obtaining a target operation result.

Specifically, the multiplier can perform accumulation correction processing on the partial product after the symbol bit expansion is eliminated through a layer-by-layer full adder in the correction accumulation circuit until the operation is finished by the last layer of full adder, so as to obtain a target operation result in the multiplication operation. Alternatively, the above-mentioned accumulation correction process may be characterized as a correction process performed in the process of accumulating the partial products after the symbol bit expansion is canceled, and the correction process may be performed by correcting the two full adders in the first layer full adder and the last layer full adder in the accumulation circuit. Optionally, the target operation result may be an operation result obtained by eliminating the sign bit expansion and performing correction accumulation processing. In addition, in the process of accumulation correction, the correction accumulation circuit may perform correction processing on the numerical value in the partial product after the symbol bit expansion is eliminated through two full adders in the first layer full adder and the last layer full adder, where if each full adder corresponds to a number, the full adder performing correction processing in the first layer full adder may be a full adder with a next highest number, and the full adder performing correction processing in the last layer full adder may be a full adder with a highest number.

In addition, the multiplier can also accumulate the values of each column of the partial products after the sign bit extension is eliminated through the modified Wallace tree group sub-circuit in the correction accumulation circuit, and can perform correction processing through two Wallace tree units in the modified Wallace tree group sub-circuit during the accumulation. The modified Wallace tree group sub-circuit outputs a carry output signal and a sum bit output signal after the correction processing. Finally, the accumulation sub-circuit accumulates the sum bit output signal of the modified Wallace tree group sub-circuit and the carry output signal in which the last carry bit has been replaced by 0, and outputs the target operation result.

It should be noted that, if the multiplier currently processes an N-bit data operation and 2N Wallace tree units are connected in series in the modified Wallace tree group sub-circuit, with the number of each Wallace tree unit starting from 0, the modified Wallace tree group sub-circuit may perform correction processing through the N-th Wallace tree unit and the 2N-th Wallace tree unit.

According to the data processing method provided by the embodiment, data to be processed is received, regular signed number coding processing is carried out on the data to be processed, target codes are obtained, partial products after symbol bit expansion are eliminated are obtained according to the data to be processed and the target codes, and accumulation correction processing is carried out on the partial products after symbol bit expansion is eliminated, so that target operation results are obtained. Meanwhile, the method can improve the operation efficiency of multiplication operation and effectively reduce the power consumption of the multiplier.

In another embodiment, in the data processing method, obtaining in S103 the partial product with sign bit extension eliminated according to the data to be processed and the target code includes:

S1031, obtaining an original partial product according to the data to be processed and the target code.

It should be noted that the number of the original partial products may be equal to the bit width of the target code.

For example, if the partial product acquisition unit receives an 8-bit multiplicand "x₇x₆x₅x₄x₃x₂x₁x₀" (i.e., X), it may obtain the corresponding original partial products directly from the multiplicand X and the three values -1, 0 and 1 contained in the target code: when a bit of the target code is -1, the original partial product may be -X; when the bit is 0, the original partial product may be 0; and when the bit is 1, the original partial product may be X.

S1032, carrying out addition operation processing on the original partial product to obtain a partial product with the sign bit expansion eliminated.

Optionally, performing the addition operation processing on the original partial product in S1032 to obtain the partial product with sign bit extension eliminated includes: performing an AND logic operation on the highest-bit value of the original partial product to obtain the partial product with sign bit extension eliminated.

Specifically, the multiplier may perform an AND logic operation on the highest-bit value of each original partial product through the partial product acquisition unit, thereby obtaining an extra one-bit value Q and the next-highest-bit value of the partial product with sign bit extension eliminated, and hence the partial product with sign bit extension eliminated itself. Optionally, the extra one-bit value Q may be the carry signal produced by the AND logic operation on the highest-bit value of the original partial product and the signal 1, and the next-highest-bit value of the partial product with sign bit extension eliminated may be the sum signal produced by that operation.
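To illustrate why sign bit extension can be removed without changing the product, the sketch below uses the textbook technique of inverting the sign bit of each partial product and subtracting a fixed constant afterwards. This is a swapped-in illustration under stated assumptions, not the gate-level scheme with the extra bit Q described above: the function names are hypothetical, each original partial product is assumed to fit in a 9-bit two's complement field (enough for -X of an 8-bit X), and row i is assumed to carry weight 2^i.

```python
def eliminate_sign_extension(p, width):
    """Return a non-negative row for signed p, with the sign bit inverted."""
    if not -(1 << (width - 1)) <= p < (1 << (width - 1)):
        raise ValueError("p does not fit in the given width")
    raw = p & ((1 << width) - 1)       # width-bit two's complement pattern
    return raw ^ (1 << (width - 1))    # invert the sign bit


def accumulate_with_correction(rows, width):
    """Sum the shifted rows, then remove the constant added by inversion."""
    total = sum(row << i for i, row in enumerate(rows))
    correction = sum(1 << (width - 1 + i) for i in range(len(rows)))
    return total - correction


products = [-13, 0, 0, 13, 0]          # hypothetical original partial products
rows = [eliminate_sign_extension(p, 9) for p in products]
result = accumulate_with_correction(rows, 9)
```

Because the correction depends only on the width and the number of rows, it is a constant that can be folded into the accumulation stage, which is one plausible reading of why the accumulation circuit also performs a correction.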

According to the data processing method provided by this embodiment, an original partial product is obtained according to the data to be processed and the target code, an AND logic operation is performed on the highest-bit value of the original partial product to obtain the partial product with sign bit extension eliminated, and accumulation correction processing is then performed on that partial product to obtain the target operation result of the multiplication. The method can also improve the efficiency of the multiplication operation and effectively reduce the power consumption of the multiplier.

Fig. 8 is a flowchart of a data processing method provided in an embodiment. The method may be performed by the multiplier shown in Fig. 2, and this embodiment relates to the process of multiplying data. As shown in Fig. 8, the method includes:

S201, receiving data to be processed.

S202, carrying out regular signed number coding processing on the data to be processed to obtain an original partial product.

Specifically, the multiplier circuit performs regular signed number encoding on the multiplier operand of the multiplication through the regular signed number encoding circuit, and the partial product acquisition circuit may obtain the original partial products according to the result of that encoding.

S203, performing logic operation processing according to the original partial product, and eliminating the sign extension bit to obtain a partial product with the sign extension eliminated.

Specifically, the multiplier can perform logic operation processing on the original partial product through a logic gate unit in the partial product acquisition circuit, and directly eliminates the numerical value of the sign extension bit to obtain the partial product with the sign bit extension eliminated.

S204, performing accumulation correction processing on the partial product with sign bit extension eliminated to obtain a target operation result.

Specifically, the multiplier may accumulate and correct the partial products with sign bit extension eliminated through successive layers of full adders in the correction accumulation circuit, until the last layer of full adders completes the operation and yields the operation result. Optionally, the accumulation correction processing may be characterized as correction performed while the partial products with sign bit extension eliminated are being accumulated, and the operation result may be the result obtained after sign bit extension elimination and correction accumulation. During accumulation correction, the correction accumulation circuit may correct values in the partial products with sign bit extension eliminated through two full adders, one in the first layer and one in the last layer. If each full adder is assigned a number, the correcting full adder in the first layer may be the one with the next-highest number, and the correcting full adder in the last layer may be the one with the highest number.

In addition, the multiplier may accumulate the values in each column of the partial products with sign bit extension eliminated through a modified Wallace tree group sub-circuit in the correction accumulation circuit. During accumulation, correction may be performed by two Wallace tree units in the modified Wallace tree group sub-circuit, which then outputs the corrected carry output signals and sum bit output signals. Finally, the accumulation sub-circuit accumulates all carry output signals Carry_i of the modified Wallace tree group sub-circuit together with all sum bit signals, in which the last sum bit signal Sum_2N has been replaced with 0, and outputs the operation result. It should be noted that, if the multiplier is currently processing an N-bit data operation, 2N Wallace tree units are connected in series in the modified Wallace tree group sub-circuit, and the Wallace tree units are numbered starting from 0, the modified Wallace tree group sub-circuit may perform the correction through the N-th Wallace tree unit and the 2N-th Wallace tree unit.

According to the data processing method provided by this embodiment, data to be processed is received, regular signed number encoding is performed on it to obtain original partial products, logic operation processing is performed on the original partial products to obtain partial products with sign bit extension eliminated, and accumulation correction processing is performed on those partial products to obtain the target operation result. Because the received data is encoded as a regular signed number, the number of effective (non-zero) partial products in the multiplication is reduced, which lowers the complexity of the multiplication. The method can also improve the efficiency of the multiplication operation and effectively reduce the power consumption of the multiplier.
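The reduction in effective partial products can be checked numerically. The following is a minimal sketch that assumes an unsigned 8-bit multiplier operand and uses the standard non-adjacent-form recurrence as a stand-in for the encoding circuit; `signed_digits` and `effective_partial_products` are hypothetical names introduced only for this illustration.

```python
def signed_digits(value, width):
    """Non-adjacent signed-digit form of an unsigned value, LSB first."""
    digits = []
    for _ in range(width + 1):
        digit = 0
        if value & 1:
            digit = 2 - (value & 3)    # +1 if value % 4 == 1, else -1
            value -= digit
        digits.append(digit)
        value >>= 1
    return digits


def effective_partial_products(digits):
    """Count the digits that select a non-zero partial product."""
    return sum(1 for d in digits if d != 0)


multiplier = 0b01110111                # 119 = 128 - 8 - 1
binary_digits = [(multiplier >> i) & 1 for i in range(8)]
binary_count = effective_partial_products(binary_digits)
encoded_count = effective_partial_products(signed_digits(multiplier, 8))
```

For this operand the plain binary form selects six non-zero partial products while the signed-digit form selects three, since each run of ones collapses to a single +1 and a single -1.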

In another embodiment, in the data processing method, performing regular signed number encoding on the data to be processed in S202 to obtain an original partial product includes:

S2021, performing regular signed number encoding on the data to be processed to obtain a target code.

Specifically, the multiplier circuit may perform regular signed number encoding on the multiplier operand of the multiplication through the regular signed number encoding circuit to obtain the target code. Optionally, the target code obtained after regular signed number encoding contains three values: -1, 0 and 1.

Optionally, performing regular signed number encoding on the data to be processed in S2021 to obtain the target code may include: converting each run of l consecutive bits of value 1 in the data to be processed into an (l+1)-bit value whose highest bit is 1, whose lowest bit is -1 and whose remaining bits are 0, thereby obtaining the target code, where l ≥ 2.
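The run-replacement rule of S2021 can be sketched directly. The code below is a minimal illustration assuming an unsigned operand of `width` bits and digits stored least significant first; the name `regular_signed_encode` is hypothetical. The rule is applied from the low end upward, so that a 1 produced at the top of one run can merge with a run that follows it, and the result has `width + 1` digits.

```python
def regular_signed_encode(value, width):
    """Replace each run of l >= 2 ones by (+1, 0, ..., 0, -1); LSB first."""
    digits = [(value >> i) & 1 for i in range(width)] + [0]
    i = 0
    while i < width:
        if digits[i] == 1 and digits[i + 1] == 1:
            j = i
            while j <= width and digits[j] == 1:
                digits[j] = 0          # clear the whole run of ones
                j += 1
            digits[i] = -1             # lowest bit of the run becomes -1
            digits[j] = 1              # one position above the run becomes 1
            i = j                      # the new 1 may start another run
        else:
            i += 1
    return digits


code_for_7 = regular_signed_encode(0b0111, 4)     # 7 = 8 - 1
code_for_27 = regular_signed_encode(0b11011, 5)   # 27 = 32 - 4 - 1
```

An isolated 1 is left unchanged, which matches the condition l ≥ 2 in the rule.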

S2022, obtaining the original partial product according to the data to be processed and the target code.

It should be noted that the number of original partial products may be equal to the bit width of the target code.

For example, if the original partial product acquisition unit receives an 8-bit multiplicand "x₇x₆x₅x₄x₃x₂x₁x₀" (i.e., X), it may obtain the corresponding original partial products directly from the multiplicand X and the three values -1, 0 and 1 contained in the target code: when a bit of the target code is -1, the original partial product may be -X; when the bit is 0, the original partial product may be 0; and when the bit is 1, the original partial product may be X.
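The selection of -X, 0 or X per target-code digit can be written out as a short sketch. It assumes the target code is given as a list of digits with the least significant digit first; `original_partial_products` and the sample values are hypothetical and serve only to show that one partial product is produced per digit.

```python
def original_partial_products(x, target_code):
    """Map each target-code digit (LSB first) to -x, 0 or x."""
    products = []
    for digit in target_code:
        if digit not in (-1, 0, 1):
            raise ValueError("target-code digits must be -1, 0 or 1")
        products.append(digit * x)
    return products


x = 13
target_code = [-1, 0, 0, 1, 0]          # hypothetical code for 7 = 8 - 1
products = original_partial_products(x, target_code)
# Weight each partial product by its digit position and accumulate.
total = sum(p << i for i, p in enumerate(products))
```

Here the five digits yield five original partial products, of which only two are non-zero, and their weighted sum equals 13 × 7.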

According to the data processing method provided by this embodiment, regular signed number encoding is performed on the data to be processed to obtain a target code, the original partial products are obtained according to the data to be processed and the target code, sign bit extension elimination is then performed on the original partial products, and accumulation correction processing is performed on the partial products with sign bit extension eliminated to obtain the target operation result of the multiplication. The method can also improve the efficiency of the multiplication operation and effectively reduce the power consumption of the multiplier.

In another embodiment, in the data processing method, performing logic operation processing on the original partial product in S203 and eliminating the sign extension bits to obtain the partial product with sign bit extension eliminated includes: performing an AND logic operation on the highest-bit value of the original partial product and eliminating the sign extension bits to obtain the partial product with sign bit extension eliminated.

Specifically, the multiplier may perform an AND logic operation on the highest-bit value of the original partial product through a logic gate unit in the partial product acquisition circuit to obtain the next-highest-bit value and the highest-bit value of the partial product with sign bit extension eliminated. It may further perform an AND logic operation on the highest-bit value of the original partial product and the signal 1 through a logic gate unit in the partial product acquisition circuit to obtain the extra one-bit value Q of the partial product with sign bit extension eliminated and its next-highest-bit value (i.e., the bit immediately below Q).

According to the data processing method provided by this embodiment, the original partial product is obtained after the data to be processed is processed, and an AND logic operation is performed on the highest-bit value of the original partial product, so that the sign extension bits are eliminated and the partial product with sign bit extension eliminated is obtained, which can effectively reduce the power consumption of the multiplier.

The embodiment of the application also provides a machine learning operation device, which includes one or more of the multipliers and is used for acquiring data to be operated on and control information from other processing devices, executing a specified machine learning operation, and transmitting the execution result to peripheral devices through an I/O interface. The peripheral devices are, for example, cameras, displays, mice, keyboards, network cards, Wi-Fi interfaces and servers. When more than one multiplier is included, the multipliers may be linked and transfer data through a specific structure, for example interconnected through a PCIE bus, to support larger-scale machine learning operations. In this case, the multipliers may share the same control system or have independent control systems, and may share memory or each have its own memory. In addition, the interconnection may use any interconnection topology.

The machine learning operation device has higher compatibility and can be connected with various types of servers through PCIE interfaces.

The embodiment of the application also provides a combined processing device which comprises the machine learning operation device, a general interconnection interface and other processing devices. The machine learning operation device interacts with other processing devices to jointly complete the operation designated by the user. Fig. 9 is a schematic diagram of a combination processing apparatus.

The other processing devices include one or more types of general-purpose or special-purpose processors, such as central processing units (CPU), graphics processing units (GPU) and neural network processors. The number of processors included in the other processing devices is not limited. The other processing devices serve as the interface between the machine learning operation device and external data and control, including data transfer, and perform basic control of the machine learning operation device such as starting and stopping it; the other processing devices may also cooperate with the machine learning operation device to complete an operation task.

The universal interconnection interface is used for transmitting data and control instructions between the machine learning operation device and the other processing devices. The machine learning operation device acquires the required input data from the other processing devices and writes it into an on-chip storage device of the machine learning operation device; it may obtain control instructions from the other processing devices and write them into an on-chip control cache of the machine learning operation device; and it may read the data in the storage module of the machine learning operation device and transmit it to the other processing devices.

Optionally, as shown in Fig. 10, the structure may further include a storage device connected to the machine learning operation device and to the other processing devices, respectively. The storage device is used for storing data of the machine learning operation device and the other processing devices, and is particularly suitable for data that cannot be held entirely in the internal storage of the machine learning operation device or the other processing devices.

The combined processing device can be used as the SOC (system on chip) of equipment such as mobile phones, robots, unmanned aerial vehicles and video monitoring equipment, which effectively reduces the core area of the control part, improves the processing speed and reduces the overall power consumption. In this case, the universal interconnection interface of the combined processing device is connected to certain components of the equipment, for example cameras, displays, mice, keyboards, network cards and Wi-Fi interfaces.

In some embodiments, a chip is also disclosed, which includes the machine learning computing device or the combination processing device.

In some embodiments, a chip package structure is disclosed, which includes the chip.

In some embodiments, a board card is provided that includes the chip package structure described above. As shown in Fig. 11, in addition to the chip 389, the board card may include other supporting components, including but not limited to: a storage device 390, a receiving device 391 and a control device 392;

The storage device 390 is connected to the chip in the chip package structure through a bus and is used for storing data. The storage device may include multiple groups of storage units 393. Each group of storage units is connected to the chip through a bus. It is understood that each group of storage units may be DDR SDRAM (Double Data Rate SDRAM, double data rate synchronous dynamic random access memory).

DDR can double the speed of SDRAM without increasing the clock frequency, because it allows data to be read on both the rising and falling edges of the clock pulse. DDR is therefore twice as fast as standard SDRAM. In one embodiment, the storage device may include 4 groups of storage units, and each group may include a plurality of DDR4 chips. In one embodiment, the chip may include four 72-bit DDR4 controllers, in each of which 64 bits are used for data transfer and 8 bits for ECC checking. It is understood that when DDR4-3200 chips are used in each group of storage units, the theoretical data transfer bandwidth can reach 25600 MB/s.
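The 25600 MB/s figure follows from simple arithmetic, shown here as a short sketch. It assumes that DDR4-3200 denotes 3200 mega-transfers per second and that only the 64 data bits of each 72-bit controller count toward bandwidth; the function name is hypothetical.

```python
def ddr_peak_bandwidth_mb_s(mega_transfers_per_s, data_bits):
    """Peak bandwidth in MB/s: transfers per second times bytes per transfer."""
    return mega_transfers_per_s * data_bits // 8


# 3200 MT/s on a 64-bit data path; the 8 ECC bits carry no payload.
bandwidth = ddr_peak_bandwidth_mb_s(3200, 64)
```

The result is the per-controller peak, so it applies to each group of storage units separately.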

In one embodiment, each set of memory cells includes a plurality of double rate synchronous dynamic random access memories arranged in parallel. DDR can transfer data twice in one clock cycle. And a controller for controlling DDR is arranged in the chip and is used for controlling data transmission and data storage of each storage unit.

The receiving device is electrically connected with the chip in the chip package structure. The receiving device is used for realizing data transmission between the chip and an external device (such as a server or a computer). For example, in one embodiment, the receiving device may be a standard PCIE interface, and the data to be processed is transferred from the server to the chip through the standard PCIE interface. Preferably, when a PCIE 3.0 x16 interface is used for transmission, the theoretical bandwidth can reach 16000 MB/s. In another embodiment, the receiving device may be another interface; the application does not limit the specific form of the other interface, as long as the interface unit can implement the transfer function. In addition, the calculation result of the chip is transmitted back to the external device (e.g., a server) by the receiving device.
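The quoted PCIE bandwidth can be reproduced with a short calculation. This sketch assumes a 16-lane PCIE 3.0 link, which is the lane count that yields 16000 MB/s, together with the generation's 8 GT/s per-lane rate and 128b/130b line encoding; the function name and the `encoded` flag are hypothetical.

```python
def pcie3_bandwidth_mb_s(lanes, gt_per_s=8.0, encoded=True):
    """One-direction PCIE 3.0 bandwidth in MB/s for the given lane count."""
    raw = gt_per_s * 1000 * lanes / 8      # one bit per transfer per lane
    return raw * 128 / 130 if encoded else raw


raw_x16 = pcie3_bandwidth_mb_s(16, encoded=False)   # figure quoted in the text
usable_x16 = pcie3_bandwidth_mb_s(16)               # after 128b/130b overhead
```

The raw figure matches the theoretical 16000 MB/s, while the payload rate after line encoding is slightly lower, at roughly 15754 MB/s.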

The control device is electrically connected with the chip and is used for monitoring the state of the chip. Specifically, the chip and the control device may be electrically connected through an SPI interface. The control device may include a microcontroller unit (MCU). The chip may include a plurality of processing chips, a plurality of processing cores or a plurality of processing circuits, and may drive a plurality of loads. The chip can therefore be in different working states, such as multi-load and light-load. The control device can regulate the working states of the plurality of processing chips, processing cores and/or processing circuits in the chip.

In some embodiments, an electronic device is provided that includes the above board card.

The electronic device may be a multiplier, robot, computer, printer, scanner, tablet, smart terminal, cell phone, automobile data recorder, navigator, sensor, webcam, server, cloud server, camera, video camera, projector, watch, headset, mobile storage, wearable device, vehicle, household appliance, and/or medical device.

The vehicle includes an aircraft, a ship and/or a car; the household appliances include televisions, air conditioners, microwave ovens, refrigerators, electric cookers, humidifiers, washing machines, electric lamps, gas cookers and range hoods; the medical device includes a nuclear magnetic resonance apparatus, a B-mode ultrasonic apparatus, and/or an electrocardiograph apparatus.

It should be noted that, for simplicity of description, the foregoing method embodiments are all described as a series of circuit combinations, but those skilled in the art should appreciate that the present application is not limited by the circuit combinations described, as some circuits may be implemented in other manners or structures according to the present application. Further, those skilled in the art will also appreciate that the embodiments described in the specification are all alternative embodiments, and that the devices and modules referred to are not necessarily required in the present application.

In the foregoing embodiments, the descriptions of the embodiments are emphasized, and for parts of one embodiment that are not described in detail, reference may be made to the related descriptions of other embodiments.

The above examples only represent a few embodiments of the present application, which are described in more detail and are not to be construed as limiting the scope of the present application. It should be noted that it would be apparent to those skilled in the art that various modifications and improvements could be made without departing from the spirit of the present application, which would be within the scope of the present application. Accordingly, the scope of protection of the present application is to be determined by the claims appended hereto.

Claims

1. A multiplier, the multiplier comprising: the device comprises a regular signed number coding circuit, a partial product acquisition circuit and a correction accumulation circuit, wherein the output end of the regular signed number coding circuit is connected with the input end of the partial product acquisition circuit, and the output end of the partial product acquisition circuit is connected with the input end of the correction accumulation circuit;

the regular signed number coding circuit is used for performing regular signed number coding on received first data to obtain a target code; the partial product acquisition circuit is used for obtaining corresponding original partial products according to each bit value in the target code and second data, and performing logic operation processing on the highest bit value of each original partial product to obtain partial products with sign bit extension eliminated; and the correction accumulation circuit is used for performing accumulation correction processing on the partial products with sign bit extension eliminated; the bit width of the target code is equal to the bit width of the data plus 1.

2. The multiplier of claim 1, wherein the regular signed number coding circuit comprises: a data input port and a target code output port; the data input port is used for receiving the first data to be subjected to regular signed number coding, and the target code output port is used for outputting the target code obtained after the received first data is subjected to regular signed number coding.

3. The multiplier according to claim 1 or 2, wherein said partial product acquisition circuit comprises an original partial product obtaining unit for obtaining each of said original partial products from each bit value in said target code and said second data, and a logic gate unit for performing logic operation processing on the highest bit value of each of said original partial products to obtain each of said partial products with sign bit extension eliminated.

4. A multiplier as claimed in any one of claims 1 to 3, in which the partial product acquisition circuit comprises an AND circuit.

5. A multiplier as claimed in any one of claims 1 to 4, in which the correction accumulation circuit comprises: a modified Wallace tree group sub-circuit and an accumulation sub-circuit, wherein the output end of the modified Wallace tree group sub-circuit is connected with the input end of the accumulation sub-circuit;

the modified Wallace tree group sub-circuit is used for performing accumulation correction processing on the partial products with sign bit extension eliminated, and the accumulation sub-circuit is used for performing accumulation processing on the result of the accumulation correction operation.

6. The multiplier of claim 5, wherein said modified Wallace tree group sub-circuit comprises: a Wallace tree unit, the Wallace tree unit being used for performing accumulation correction processing on the values in each column of the partial products with sign bit extension eliminated.

7. A multiplier as claimed in claim 5 or 6, in which the accumulation sub-circuit comprises: an adder, the adder being used for performing an addition operation on the result of the accumulation correction operation.

8. The multiplier of claim 7, wherein the adder comprises: a carry signal input port, a sum bit signal input port and a result output port; the carry signal input port is used for receiving a carry signal, the sum bit signal input port is used for receiving a sum bit signal, and the result output port is used for outputting a target operation result obtained by accumulating the carry signal and the sum bit signal.

9. A method of data processing, the method comprising:

receiving data to be processed;

performing regular signed number coding processing on the data to be processed to obtain a target code, and obtaining a plurality of original partial products according to the data to be processed and each bit value in the target code; the bit width of the target code is equal to the bit width of the data plus 1;

performing logic operation processing on the highest bit value of each original partial product, and eliminating the sign extension bits to obtain partial products with sign bit extension eliminated;

and performing accumulation correction processing on the partial products with sign bit extension eliminated to obtain a target operation result.

10. The method of claim 9, wherein the performing regular signed number coding processing on the data to be processed to obtain the target code comprises: converting each run of l consecutive bits of value 1 in the data to be processed into an (l+1)-bit value whose highest bit is 1, whose lowest bit is -1 and whose remaining bits are 0, thereby obtaining the target code, where l ≥ 2.

11. The method according to claim 9 or 10, wherein said performing logic operation processing on the highest bit value of each of said original partial products and eliminating the sign extension bits to obtain the partial products with sign bit extension eliminated comprises: performing an AND logic operation on the highest bit value of each original partial product, and eliminating the sign extension bits to obtain the partial products with sign bit extension eliminated.

12. A machine learning computing device, characterized in that the machine learning computing device comprises one or more multipliers according to any one of claims 1-8 for acquiring input data and control information to be computed from other processing devices, executing specified machine learning computation, and transmitting the execution result to other processing devices through I/O interfaces;

when the machine learning operation device comprises a plurality of multipliers, the multipliers are connected and transmit data through a preset specific structure.

13. A combination processing device, comprising the machine learning computing device of claim 12, a universal interconnect interface, and other processing devices;

the machine learning operation device interacts with the other processing devices to jointly complete the calculation operation designated by the user.

14. The combination processing device of claim 13, further comprising: and a storage device connected to the machine learning operation device and the other processing device, respectively, for storing data of the machine learning operation device and the other processing device.

15. A neural network chip, characterized in that the neural network chip comprises the machine learning arithmetic device according to claim 12 or the combination processing device according to claim 13 or the combination processing device according to claim 14.

16. An electronic device comprising the neural network chip of claim 15.

17. A board card, characterized in that the board card includes: a storage device, a receiving device, a control device, and the neural network chip of claim 15;

the neural network chip is respectively connected with the storage device, the control device and the receiving device;

the storage device is used for storing data;

the receiving device is used for realizing data transmission between the chip and external equipment;

the control device is used for monitoring the state of the chip.

18. The board card of claim 17, wherein

the storage device includes: a plurality of groups of storage units, each group of storage units being connected with the chip through a bus, the storage units being DDR SDRAM;

the chip includes: a DDR controller used for controlling data transmission and data storage of each storage unit; and the receiving device is a standard PCIE interface.