CN110515589B

CN110515589B - Multiplier, data processing method, chip and electronic equipment

Info

Publication number: CN110515589B
Application number: CN201910819020.4A
Authority: CN
Inventors: name withheld at request
Original assignee: Shanghai Cambricon Information Technology Co Ltd
Current assignee: Shanghai Cambricon Information Technology Co Ltd
Priority date: 2019-08-30
Filing date: 2019-08-30
Publication date: 2024-04-09
Anticipated expiration: 2039-08-30
Also published as: CN110515589A

Abstract

The application provides a multiplier, a data processing method, a chip and electronic equipment. The multiplier comprises a multiplication circuit, a register control circuit, a register circuit, a state control circuit and a selection circuit. The multiplication circuit comprises a regular signed number coding sub-circuit and an accumulation sub-circuit; the output end of the regular signed number coding sub-circuit is connected with the input end of the accumulation sub-circuit, the output end of the accumulation sub-circuit is connected with the first input end of the register control circuit, the output end of the register control circuit is connected with the input end of the register circuit, the output end of the register circuit is connected with the first input end of the selection circuit, the first output end of the state control circuit is connected with the second input end of the register control circuit, and the second output end of the state control circuit is connected with the second input end of the selection circuit. The multiplier can perform regular signed number coding on received data, so that fewer effective partial products are obtained and the complexity of the multiplier in realizing multiplication is reduced.

Description

Multiplier, data processing method, chip and electronic equipment

Technical Field

The present disclosure relates to the field of computer technologies, and in particular, to a multiplier, a data processing method, a chip, and an electronic device.

Background

With the continuous development of digital electronic technology, the rapid development of various artificial intelligence (AI) chips has placed increasingly high demands on high-performance digital multipliers. Neural network algorithms are among the algorithms most widely used by intelligent chips, and multiplication performed by a multiplier is a common operation in neural network algorithms.

At present, a multiplier typically takes each group of three bits of the multiplier operand as one code, obtains partial products according to the codes and the multiplicand, and compresses all the partial products with a Wallace tree to obtain a target operation result. However, in this conventional technology the codes contain many non-zero values, so the number of corresponding partial products is large and the complexity of the multiplier in realizing multiplication is high.

Disclosure of Invention

In view of the foregoing, it is desirable to provide a multiplier, a data processing method, a chip, and an electronic device that can reduce the number of effective partial products obtained during a multiplication process, so as to reduce the complexity of the multiplication of the multiplier.

An embodiment of the present application provides a multiplier, including: a multiplication circuit, a register control circuit, a register circuit, a state control circuit and a selection circuit, wherein the multiplication circuit comprises a regular signed number coding sub-circuit and an accumulation sub-circuit, the output end of the regular signed number coding sub-circuit is connected with the input end of the accumulation sub-circuit, the output end of the accumulation sub-circuit is connected with the first input end of the register control circuit, the output end of the register control circuit is connected with the input end of the register circuit, the output end of the register circuit is connected with the first input end of the selection circuit, the first output end of the state control circuit is connected with the second input end of the register control circuit, and the second output end of the state control circuit is connected with the second input end of the selection circuit.

In one embodiment, the regular signed number coding sub-circuit includes a regular signed number coding unit and a partial product obtaining unit, where the regular signed number coding unit is configured to receive first data and perform the regular signed number coding processing on the first data to obtain the target code, the partial product obtaining unit is configured to receive second data, obtain an original partial product according to the target code and the second data, and obtain a partial product of the target code according to the original partial product, the accumulating sub-circuit is configured to perform an accumulation processing on the partial product of the target code to obtain a multiplication result, the state control circuit is configured to obtain a storage instruction signal and a read instruction signal, the register control circuit is configured to determine the register circuit storing the multiplication result according to the storage instruction signal input by the state control circuit, the register circuit is configured to store the multiplication result, and the selection circuit is configured to read the data stored in the register circuit as the target multiplication result according to the received read instruction signal.

In one embodiment, the regular signed number coding unit may include: a data input port and a target code output port; the data input port is used for receiving the first data to be subjected to regular signed number coding processing, and the target code output port is used for outputting the target code obtained after the first data is subjected to regular signed number coding processing.

In one embodiment, the partial product obtaining unit is specifically configured to perform conversion processing on the target code to obtain an original partial product, perform sign bit expansion processing on the original partial product to obtain a partial product after sign bit expansion, and obtain the partial product of the target code according to the partial product after sign bit expansion.

In one embodiment, the partial product acquisition unit includes: a target encoding input port, a second data input port, and a partial product output port; the target code input port is used for receiving the target code, the second data input port is used for receiving the second data, and the partial product output port is used for outputting a partial product of the target code.

In one embodiment, the accumulation sub-circuit includes: the Wallace tree group unit and the accumulation unit; the output end of the Wallace tree group unit is connected with the input end of the accumulation unit; the Wallace tree group unit is used for carrying out accumulation processing on the partial product of the target code to obtain an accumulation operation result, and the accumulation unit is used for carrying out accumulation processing on the accumulation operation result.

In one embodiment, the Wallace tree group unit comprises: Wallace tree subunits, configured to accumulate the values of each column in the partial products of all target codes.

In one embodiment, the accumulating unit includes: an adder, configured to perform an addition operation on the received accumulated correction result.

In one embodiment, the adder includes: a carry signal input port, a sum bit signal input port and a result output port; the carry signal input port is used for receiving a carry signal, the sum bit signal input port is used for receiving a sum bit signal, and the result output port is used for outputting the result of accumulating the carry signal and the sum bit signal.
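The division of labour between the Wallace tree group unit (which leaves a sum bit signal and a carry signal) and the adder (which adds the two) can be illustrated with a minimal Python sketch. It assumes the tree is built from 3:2 carry-save compressors, which is a common construction but is not stated in the text above; the function names `compress_3_to_2`, `wallace_reduce` and `final_add` are hypothetical.

```python
def compress_3_to_2(a: int, b: int, c: int, mask: int) -> tuple[int, int]:
    """Bitwise 3:2 compressor: a + b + c == sum_bits + carry_bits (mod mask + 1)."""
    sum_bits = a ^ b ^ c
    carry_bits = (((a & b) | (a & c) | (b & c)) << 1) & mask
    return sum_bits, carry_bits


def wallace_reduce(rows: list[int], width: int) -> tuple[int, int]:
    """Reduce any number of partial-product rows to a (sum, carry) pair."""
    mask = (1 << width) - 1
    rows = [r & mask for r in rows]
    while len(rows) > 2:
        groups = len(rows) // 3
        next_rows = []
        for g in range(groups):
            s, c = compress_3_to_2(rows[3 * g], rows[3 * g + 1], rows[3 * g + 2], mask)
            next_rows += [s, c]
        next_rows += rows[3 * groups:]  # rows that did not fill a group of three
        rows = next_rows
    while len(rows) < 2:
        rows.append(0)
    return rows[0], rows[1]


def final_add(sum_bits: int, carry_bits: int, width: int) -> int:
    """The adder stage: add the sum bit signal and the carry signal."""
    return (sum_bits + carry_bits) & ((1 << width) - 1)


s, c = wallace_reduce([3, 5, 7, 11, 13], 16)
print(final_add(s, c, 16))
```

Only the last step performs a carry-propagating addition; every earlier level works column by column without carry propagation, which is the usual motivation for this structure.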

In one embodiment, the register circuit includes: register sub-circuits, configured to store the multiplication operation results corresponding to different storage indication signals.

An embodiment of the present application provides a multiplier, including: a multiplication circuit and a number conversion circuit, wherein the multiplication circuit comprises a regular signed number coding sub-circuit and an accumulation sub-circuit, the output end of the regular signed number coding sub-circuit is connected with the input end of the accumulation sub-circuit, the output end of the accumulation sub-circuit is connected with the input end of the number conversion circuit, and the number conversion circuit comprises a first conversion sub-circuit and a second conversion sub-circuit;

The regular signed number coding sub-circuit is used for carrying out regular signed number coding processing on received data to obtain target codes, and obtaining partial products of the target codes according to the target codes, the accumulation sub-circuit is used for carrying out correction accumulation processing on the partial products of the target codes to obtain multiplication results, and the first conversion sub-circuit and the second conversion sub-circuit are respectively used for carrying out number conversion processing on the multiplication results to obtain target operation results.

In one embodiment, the number conversion circuit comprises an input port for receiving a data conversion signal; the data conversion signal is used for determining the type of data conversion performed by the number conversion circuit.

In one embodiment, the first conversion sub-circuit is specifically configured to convert the multiplication result into the target operation result of a floating point type, and the second conversion sub-circuit is specifically configured to convert the multiplication result into the target operation result of a fixed point type.

According to the multiplier provided by the embodiment, the multiplier can perform regular signed number coding on the received data through the regular signed number coding sub-circuit, and the number of obtained effective partial products is small, so that the complexity of the multiplier in realizing multiplication operation is reduced.

The embodiment of the application provides a data processing method, which comprises the following steps:

receiving data to be processed;

carrying out regular signed number coding processing on the data to be processed to obtain a partial product of the target code;

accumulating the partial products of the target codes to obtain multiplication results;

acquiring a storage indication signal and a reading indication signal;

storing a plurality of multiplication operation results into different register sub-circuits according to the storage indication signals;

and according to the reading indication signals, reading partial data corresponding to the multiplication operation result stored in different register sub-circuits to obtain a target operation result.

In one embodiment, the performing regular signed number encoding on the data to be processed to obtain a partial product of target encoding includes:

carrying out regular signed number coding processing on the data to be processed to obtain an original partial product;

and performing sign bit expansion processing on the original partial product to obtain the target encoded partial product.

In one embodiment, the performing regular signed number encoding on the data to be processed to obtain an original partial product includes:

carrying out regular signed number coding processing on the data to be processed to obtain a target code;

and carrying out conversion processing according to the data to be processed and the target code to obtain the original partial product.

In one embodiment, the performing sign bit expansion processing on the original partial product to obtain a partial product of the target code includes: and performing bit filling processing on the original partial product to obtain the target encoded partial product.

In one embodiment, the storing the multiplication results according to the storage indication signal into different register sub-circuits includes:

storing a first multiplication result corresponding to the first storage indication signal into a first register sub-circuit;

and storing a second multiplication operation result corresponding to the second storage indication signal into a second register sub-circuit.

In one embodiment, the reading the partial data stored in the different register sub-circuits and corresponding to the multiplication result according to the reading instruction signal to obtain a target operation result includes:

according to a first reading instruction signal, reading a first part of data in a first multiplication result stored in the first register sub-circuit to obtain a first operation result;

reading a second part of data in the first multiplication result stored in the first register sub-circuit according to a second reading instruction signal to obtain a second operation result;

according to a third reading instruction signal, reading the first part of data in the second multiplication result stored in the second register sub-circuit to obtain a third operation result;

and reading the second part of data in the second multiplication result stored in the second register sub-circuit according to a fourth reading indication signal to obtain a fourth operation result.
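The storing and reading steps above can be sketched in Python as follows. This is only an illustrative model under stated assumptions: each register sub-circuit holds one 2N-bit multiplication result, the "first part" is taken to be the high N bits and the "second part" the low N bits, and read indication signals 1 to 4 map to the two register sub-circuits in order. The class name `RegisterCircuitModel` and its methods are hypothetical.

```python
class RegisterCircuitModel:
    """Toy model of register sub-circuits addressed by storage/read signals."""

    def __init__(self, n: int) -> None:
        self.n = n                      # bit width of one read (assumed N)
        self.sub_circuits: dict[int, int] = {}

    def store(self, storage_signal: int, result: int) -> None:
        # Storage indication signal k selects the k-th register sub-circuit.
        self.sub_circuits[storage_signal] = result & ((1 << (2 * self.n)) - 1)

    def read(self, read_signal: int) -> int:
        # Assumed mapping: signals 1, 2 -> sub-circuit 1; signals 3, 4 -> sub-circuit 2.
        index = (read_signal + 1) // 2
        value = self.sub_circuits[index]
        mask = (1 << self.n) - 1
        if read_signal % 2 == 1:
            return (value >> self.n) & mask   # first part: high N bits (assumption)
        return value & mask                   # second part: low N bits (assumption)


regs = RegisterCircuitModel(n=8)
regs.store(1, 0x1234)   # first multiplication result
regs.store(2, 0xABCD)   # second multiplication result
print([hex(regs.read(k)) for k in (1, 2, 3, 4)])
```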

In one embodiment, the method includes:

receiving a data conversion signal and data to be processed;

and carrying out number conversion processing on the multiplication operation result according to the data conversion signal to obtain a target operation result, wherein the data conversion signal is used for instructing the multiplier to convert the target operation result into a required data type.

According to the data processing method provided by the embodiment, the received data to be processed can be subjected to regular signed number coding, and the number of effective partial products in multiplication operation is reduced, so that the complexity of the multiplication operation is reduced.

The embodiment of the application provides a machine learning operation device, which comprises one or more multipliers; the machine learning operation device is used for acquiring data to be operated and control information from other processing devices, executing specified machine learning operation and transmitting an execution result to the other processing devices through an I/O interface;

when the machine learning operation device comprises a plurality of multipliers, the multipliers can be connected through a specific structure and transmit data among themselves;

the multipliers are interconnected through a PCIe bus and transmit data so as to support larger-scale machine learning operations; the multipliers share the same control system or have their own control systems; the multipliers share memory or have their own memories; and the interconnection mode of the multipliers is any interconnection topology.

The embodiment of the application provides a combined processing device, which comprises the machine learning operation device, a universal interconnection interface and other processing devices; the machine learning operation device interacts with the other processing devices to jointly complete the operation specified by the user; the combined processing device may further include a storage device, which is connected to the machine learning operation device and the other processing devices respectively and is used for storing data of the machine learning operation device and the other processing devices.

The embodiment of the application provides a neural network chip, which includes the multiplier, the machine learning computing device or the combination processing device.

The embodiment of the application provides a neural network chip packaging structure, which comprises the neural network chip.

The embodiment of the application provides a board card, which comprises the neural network chip packaging structure.

The embodiment of the application provides an electronic device, which comprises the neural network chip or the board card.

The chip provided by the embodiment of the application comprises at least one multiplier as described in any one of the above.

The electronic equipment provided by the embodiment of the application comprises the chip.

Drawings

FIG. 1 is a schematic diagram of a multiplier according to an embodiment;

FIG. 2 is a schematic diagram of a multiplier according to another embodiment;

FIG. 3 is a schematic diagram of a distribution rule of partial products of 9 target codes according to another embodiment;

FIG. 4 is a schematic diagram of an accumulation circuit for 8-bit data operation according to another embodiment;

FIG. 5 is a flow chart illustrating a data processing method according to an embodiment;

FIG. 6 is a flowchart of another data processing method according to another embodiment;

FIG. 7 is a block diagram of a combination processing apparatus according to an embodiment;

FIG. 8 is a block diagram of another combination processing apparatus according to an embodiment;

FIG. 9 is a schematic structural diagram of a board card according to an embodiment.

Detailed Description

In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be further described in detail with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the present application.

The multiplier provided by the application can be applied to an AI chip, a field-programmable gate array (FPGA) chip, or other hardware circuit devices for multiplication operation processing; schematic diagrams of its specific structure are shown in FIG. 1 and FIG. 2.

FIG. 1 is a schematic diagram of a multiplier according to an embodiment. The multiplier comprises a multiplication circuit 11, a register control circuit 12, a register circuit 13, a state control circuit 14 and a selection circuit 15. The multiplication circuit 11 comprises a regular signed number coding sub-circuit 111 and an accumulation sub-circuit 112, wherein the output end of the regular signed number coding sub-circuit 111 is connected with the input end of the accumulation sub-circuit 112, the output end of the accumulation sub-circuit 112 is connected with the first input end of the register control circuit 12, the output end of the register control circuit 12 is connected with the input end of the register circuit 13, the output end of the register circuit 13 is connected with the first input end of the selection circuit 15, the first output end of the state control circuit 14 is connected with the second input end of the register control circuit 12, and the second output end of the state control circuit 14 is connected with the second input end of the selection circuit 15.

The regular signed number coding sub-circuit 111 includes a regular signed number coding unit 1111 and a partial product obtaining unit 1112, where the regular signed number coding unit 1111 is configured to receive first data and perform the regular signed number coding processing on the first data to obtain the target code, the partial product obtaining unit 1112 is configured to receive second data, obtain an original partial product according to the target code and the second data, and obtain a partial product of the target code according to the original partial product, and the accumulating sub-circuit 112 is configured to perform the accumulating processing on the partial product of the target code to obtain a multiplication result; the state control circuit 14 is configured to obtain a storage indication signal and a read indication signal; the register control circuit 12 is configured to determine the register circuit 13 storing the multiplication result according to the storage instruction signal input by the state control circuit 14, the register circuit 13 is configured to store the multiplication result, and the selection circuit 15 is configured to read data in the multiplication result stored in the register circuit 13 as a target operation result according to the received read instruction signal.

Specifically, the regular signed number coding sub-circuit 111 may use the regular signed number coding unit 1111 to perform regular signed number coding processing on the received first data to obtain the target code, where the first data may be the multiplier in the multiplication operation. Optionally, the partial product obtaining unit 1112 may obtain an original partial product according to the received second data and the target code, and obtain a partial product of the target code according to the original partial product, where the second data may be the multiplicand in the multiplication operation. The multiplier and the multiplicand may be fixed-point numbers with the same bit width. Optionally, the register circuit 13 may include a plurality of storage units. Optionally, the bit width of the multiplication result may be equal to 2 times the bit width of the data received by the regular signed number coding sub-circuit 111. Optionally, the regular signed number coding sub-circuit 111 may process data with a fixed bit width, and the bit width of the data it receives may be equal to the bit width of the input port of the multiplier; in this embodiment, the bit width of the output port of the multiplier may be less than 2 times the bit width of the input port. Optionally, the selection circuit 15 may have a plurality of input ports, each of which may have a different function, and one output port. Optionally, the bit width of the target operation result may be equal to 1/2 of the bit width of the multiplication result, which is not limited in this embodiment. In this embodiment, it is further understood that the bit width of the target operation result may be smaller than 2 times the bit width of the multiplication result.

Optionally, the number of target codes may be equal to the number of partial products of the target codes, and the target codes may take three values, namely -1, 0 and 1.
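As an illustration of how coding with the values -1, 0 and 1 can reduce the number of non-zero digits, the following minimal Python sketch encodes a signed fixed-point multiplier into signed digits. It assumes that the regular signed number code behaves like the standard non-adjacent form, in which no two neighbouring digits are both non-zero; the text here does not spell out the exact coding rule, and the function name `signed_digit_encode` is hypothetical.

```python
def signed_digit_encode(value: int, width: int) -> list[int]:
    """Encode a `width`-bit two's-complement value as digits in {-1, 0, 1}.

    Digits are returned least significant first, so that
    sum(d * 2**i for i, d in enumerate(digits)) == value.
    """
    if not -(1 << (width - 1)) <= value < (1 << (width - 1)):
        raise ValueError("value does not fit in the given width")
    digits = []
    x = value
    while x != 0:
        if x & 1:
            d = 2 - (x & 3)   # x mod 4 == 1 gives +1, x mod 4 == 3 gives -1
            x -= d            # makes x divisible by 4, so the next digit is 0
        else:
            d = 0
        digits.append(d)
        x >>= 1
    if len(digits) > width:
        raise AssertionError("unexpected digit count")
    return digits + [0] * (width - len(digits))


digits = signed_digit_encode(127, 8)
print(digits, sum(1 for d in digits if d != 0))
```

Under this assumption, the 8-bit value 127 (binary 01111111, seven non-zero bits) is coded as 2^7 - 2^0, so only two of its digits are non-zero and only two effective partial products would have to be accumulated.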

It should be noted that the state control circuit 14 may automatically obtain the storage indication signal corresponding to each multiplication result obtained by the accumulation sub-circuit 112. For example, when the accumulation sub-circuit 112 obtains the first multiplication result, the storage indication signal obtained by the state control circuit 14 may be 1; when the accumulation sub-circuit 112 obtains the second multiplication result, the storage indication signal may be 2; and so on. That is, each time the accumulation sub-circuit 112 obtains a multiplication result, the value of the storage indication signal obtained by the state control circuit 14 may be the value corresponding to the previous multiplication result plus 1. Optionally, when a multiplication result exists in the register circuit 13, the state control circuit 14 may also automatically obtain a read indication signal corresponding to the current clock cycle number; the state control circuit 14 may obtain the current clock cycle number automatically, or may receive the clock cycle number transmitted by an external device.

For example, if the first multiplication result is stored in the register circuit 13 in the first clock cycle, the corresponding read indication signal obtained by the state control circuit 14 may be 1, and the selection circuit 15 may then read part of the data in the first multiplication result stored in the register circuit 13. In the second clock cycle, the corresponding read indication signal obtained by the state control circuit 14 may be 2, and the selection circuit 15 may then read the rest of the data in the first multiplication result stored in the register circuit 13. It can thus be understood that the multiplier may take two clock cycles to output one multiplication result. However, if the second multiplication result is obtained five clock cycles after the first multiplication result, the register circuit 13 may store the second multiplication result in the sixth clock cycle, and the corresponding read indication signal obtained by the state control circuit 14 may then be 3. The value of the read indication signal may be determined according to the amount of data stored in the register circuit 13.

In addition, the multiplication result obtained by the accumulation sub-circuit 112 is not itself the target operation result output by the multiplier. The target operation result of each multiplication may be obtained by splicing the operation result that the selection circuit 15 outputs the first time with the operation result that it outputs the second time. The multiplication circuit 11 may output one target operation result over a plurality of clock cycles.

It should be noted that the multiplier may receive, through the register control circuit 12, the multiplication result output by the accumulation sub-circuit 112 for each multiplication, and determine the storage unit that stores each multiplication result according to the received storage indication signal. Optionally, the selection circuit 15 may determine which data to read from the multiplication result stored in the corresponding register circuit 13 according to the different read indication signals it receives. Optionally, if the bit width of the input port of the multiplier is N and the bit width of the received data is also N, the bit width M of the output port of the multiplier may be M = 2N/t + δ, where (2N/t + δ) < 2N and δ (δ ≥ 0) is a constant. Here, in general, the multiplication circuit 11 may complete one multiplication in t (t > 1) clock cycles to obtain one multiplication result, and the multiplication result obtained by the accumulation sub-circuit 112 in the multiplication circuit 11 is stored into the register circuit 13. In addition, with small probability, the multiplier may complete one multiplication in m (m < t and m ≤ 1) clock cycles to obtain one multiplication result, and store the multiplication result obtained by the accumulation sub-circuit 112 in the multiplication circuit 11 into the register circuit 13.

Optionally, the selection circuit 15 may read the data in the multiplication result stored in the register circuit 13 in two reads, where the bit width of the multiplication result may be equal to 2N and the bit width of the data read each time may be equal to N. The selection circuit 15 may read the high N bits and the low N bits of the same multiplication result in the two reads as two operation results, and the two operation results are spliced to obtain the target operation result of the multiplication.
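A small numeric sketch may help make the port-width relation and the two-step read concrete. It assumes that 2N is divisible by t and uses hypothetical function names (`output_port_width`, `read_high_low`, `splice`); it only restates the arithmetic described above and is not taken from the patent.

```python
def output_port_width(n: int, t: int, delta: int = 0) -> int:
    """M = 2N/t + delta, required to stay below 2N (assumes t divides 2N)."""
    if t <= 1 or delta < 0 or (2 * n) % t != 0:
        raise ValueError("expects t > 1, delta >= 0 and 2N divisible by t")
    m = (2 * n) // t + delta
    if m >= 2 * n:
        raise ValueError("output port width must be less than 2N")
    return m


def read_high_low(result: int, n: int) -> tuple[int, int]:
    """Two reads of an N-bit port: high N bits, then low N bits."""
    mask = (1 << n) - 1
    return (result >> n) & mask, result & mask


def splice(high: int, low: int, n: int) -> int:
    """Concatenate the two N-bit operation results into the 2N-bit result."""
    return (high << n) | low


print(output_port_width(8, 2))            # N = 8, t = 2, delta = 0
high, low = read_high_low(0xBEEF, 8)
print(hex(high), hex(low), hex(splice(high, low, 8)))
```

With N = 8, t = 2 and δ = 0 the output port is 8 bits wide, so a 16-bit multiplication result leaves the multiplier in two 8-bit reads.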

In addition, in the present embodiment, it is understood that the partial product obtaining unit 1112 may obtain a sign-extended partial product from the original partial product, and obtain a partial product of the target code from the sign-extended partial product. Optionally, the bit width of the sign-extended partial product may be 2 times the data bit width N received by the multiplier, and the bit width of the original partial product may be equal to the data bit width N received by the multiplier. Optionally, each of the upper N bits of the sign-extended partial product may be equal to the highest bit of the original partial product, i.e., the sign bit of the original partial product. In other words, the upper N+1 bits of the sign-extended partial product are all equal to the sign bit, and its lower N-1 bits are equal to the lower N-1 bits of the original partial product.

For example, if the multiplier currently handles 8-bit by 8-bit fixed-point multiplication and an original partial product obtained by the partial product obtaining unit 1112 is "p₇p₆p₅p₄p₃p₂p₁p₀", then sign bit expansion processing is performed on the original partial product, and the resulting sign-extended partial product can be expressed as "p₇p₇p₇p₇p₇p₇p₇p₇p₇p₆p₅p₄p₃p₂p₁p₀".
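The sign bit expansion in this example can be reproduced with a few lines of Python. The sketch below treats the original partial product as an N-bit two's-complement bit pattern; the function name `sign_extend` is hypothetical.

```python
def sign_extend(partial: int, n: int) -> int:
    """Extend an n-bit two's-complement bit pattern to 2n bits."""
    mask_n = (1 << n) - 1
    partial &= mask_n
    sign = (partial >> (n - 1)) & 1          # p_(n-1), the sign bit
    upper = mask_n if sign else 0            # n copies of the sign bit
    return (upper << n) | partial


# p7..p0 = 1001 0110: the upper nine bits of the result all equal p7 = 1.
print(format(sign_extend(0b10010110, 8), "016b"))
# p7..p0 = 0001 0110: the upper nine bits of the result all equal p7 = 0.
print(format(sign_extend(0b00010110, 8), "016b"))
```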

It may be further understood that, in the distribution rule of the partial products of all target codes, the partial product of each target code corresponds to one sign-extended partial product. The partial product of the first target code may be the first sign-extended partial product itself. Starting from the partial product of the second target code, each corresponding sign-extended partial product may be shifted left by one bit position relative to the partial product of the previous target code, while the highest bit of the partial product of every target code stays in the same column as the highest bit of the partial product of the first target code. Equivalently, starting from the partial product of the second target code, the high-order bits that move beyond that column when a sign-extended partial product is shifted left do not take part in the addition.
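The distribution rule can be checked numerically with the following Python sketch. It assumes the target code is given as a list of digits in {-1, 0, 1}, least significant first, that digit i selects +multiplicand, 0 or -multiplicand, and that each 2N-bit sign-extended row is shifted left by i positions with overflowing high bits discarded. The function names are hypothetical, and plain addition stands in for the Wallace tree.

```python
def accumulate_partial_products(digits: list[int], multiplicand: int, n: int) -> int:
    """Sum the shifted, sign-extended partial products modulo 2**(2n)."""
    mask = (1 << (2 * n)) - 1
    total = 0
    for i, d in enumerate(digits):
        extended = (d * multiplicand) & mask     # 2n-bit sign-extended row
        shifted = (extended << i) & mask         # bits past column 2n-1 are dropped
        total = (total + shifted) & mask
    return total


def to_signed(value: int, bits: int) -> int:
    """Interpret a `bits`-wide pattern as a two's-complement integer."""
    return value - (1 << bits) if value >> (bits - 1) else value


# Digits for 127 = 2**7 - 2**0, least significant digit first.
digits_127 = [-1, 0, 0, 0, 0, 0, 0, 1]
product = accumulate_partial_products(digits_127, -3, 8)
print(to_signed(product, 16))
```

Dropping the bits that shift past the top column does not change the result, because the true product of two N-bit signed numbers always fits in 2N bits.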

According to the multiplier provided by this embodiment, the regular signed number coding sub-circuit performs regular signed number coding processing on the received data to obtain target codes and obtains the partial products of the target codes according to the target codes, and the accumulation sub-circuit accumulates the sign-extended partial products to obtain a multiplication result. The state control circuit obtains a storage indication signal and a read indication signal; the register control circuit determines, according to the storage indication signal, the register circuit that stores the multiplication result; the register circuit stores the multiplication result; and the selection circuit reads data in the multiplication result stored in the register circuit according to the read indication signal to obtain the target operation result. Because the multiplier performs regular signed number coding processing on the received data through the regular signed number coding sub-circuit, fewer effective partial products are obtained during multiplication, which reduces the complexity of the multiplier in realizing multiplication. The multiplier can also improve the operation efficiency of multiplication and effectively reduce its power consumption.

FIG. 2 is a schematic diagram of a specific structure of a multiplier according to another embodiment. The multiplier comprises a multiplication circuit 21 and a number conversion circuit 22, wherein the multiplication circuit 21 comprises a regular signed number coding sub-circuit 211 and an accumulation sub-circuit 212, the output end of the regular signed number coding sub-circuit 211 is connected with the input end of the accumulation sub-circuit 212, the output end of the accumulation sub-circuit 212 is connected with the input end of the number conversion circuit 22, and the number conversion circuit 22 comprises a first conversion sub-circuit 221 and a second conversion sub-circuit 222; the regular signed number coding sub-circuit 211 is configured to perform regular signed number coding on the received data to obtain a target code, and obtain a partial product of the target code according to the target code, the accumulation sub-circuit 212 is configured to accumulate the partial products of the target code to obtain a multiplication result, and the first conversion sub-circuit 221 and the second conversion sub-circuit 222 are respectively configured to perform number conversion processing on the multiplication result to obtain a target operation result.

Optionally, the regular signed number coding sub-circuit 211 includes a regular signed number coding unit 2111 and a partial product obtaining unit 2112, where the regular signed number coding unit 2111 is configured to receive first data, perform the regular signed number coding processing on the first data to obtain the target code, and the partial product obtaining unit 2112 is configured to receive second data, obtain an original partial product according to the target code and the second data, and obtain a partial product of the target code according to the original partial product.

Specifically, the regular signed number coding sub-circuit 211 may perform regular signed number coding processing on the received data, where the data may be a multiplier and a multiplicand in the multiplication operation, and the multiplier and the multiplicand may be fixed point numbers of the same bit width. Optionally, the regular signed number coding sub-circuit 211 may include a plurality of data processing sub-circuits with different functions; each of these sub-circuits may have one or more input ports, the function of each input port may differ, the output ports may differ, and the circuit structures of the sub-circuits with different functions may differ. Optionally, the revolution number circuit 22 may convert the multiplication result output by the accumulation sub-circuit 212 into data in a target format, that is, the target operation result, where the multiplication result may be a fixed point number and the data in the target format may be a fixed point number or a floating point number; in addition, the bit width of the data in the target format may be less than 2 times the bit width of the multiplication result. Optionally, the target operation result may be part of the data in the multiplication result. Optionally, the bit width of the target operation result may be equal to 1/2 of the bit width of the multiplication result, or equal to 1/4 of the bit width of the multiplication result, which is not limited in this embodiment. In this embodiment, it is also understood that the bit width of the target operation result is smaller than 2 times the bit width of the multiplication result. In addition, the multiplication result obtained by the accumulation sub-circuit 212 is not the target operation result obtained by the multiplier performing the multiplication operation, but only a part of the data in the target operation result.
Optionally, the number of target codes may be equal to the number of partial products of the target codes, and the target codes may include three values, namely -1, 0 and 1.

It should be noted that, the regular signed number coding sub-circuit 211 may perform multiplication processing on data with a fixed bit width, and the bit width of the data received by the regular signed number coding sub-circuit 211 may be equal to the bit width of the input port of the multiplier, and in this embodiment, the bit width of the output port of the multiplier may be less than 2 times the bit width of the input port.

Optionally, the revolution number circuit 22 includes an input port therein for receiving a data conversion signal. Optionally, the data conversion signal is used to determine the type of data conversion that the revolution number circuit 22 processes.

Optionally, there may be a plurality of data conversion signals, and the revolution number circuit 22 may convert the received data into data in the target format corresponding to each data conversion signal. Optionally, the data conversion types may include fixed point number to fixed point number and fixed point number to floating point number. For example, if the bit widths of the input port and the output port of the multiplier are both N, the multiplier may obtain a multiplication result with a bit width of 2N, and the revolution number circuit 22 may convert the 2N-bit multiplication result into an N-bit target operation result, where the target operation result may be a floating point number; in addition, the revolution number circuit 22 may convert the 2N-bit multiplication result into an N-bit fixed point number, which is then the target operation result. In this embodiment, the circuit structure and the function of the regular signed number coding sub-circuit 211 are the same as those of the regular signed number coding sub-circuit 111, and the specific structure of the regular signed number coding sub-circuit 211 is not repeated in this embodiment.

According to the multiplier provided by the embodiment, the regular signed number coding sub-circuit can be adopted for carrying out regular signed number coding processing on received data, so that the number of effective partial products obtained in the multiplication process is reduced, and the complexity of the multiplier for realizing multiplication is reduced; meanwhile, the multiplier can improve the operation efficiency of multiplication operation and effectively reduce the power consumption of the multiplier.

As one embodiment, the regular signed number coding unit 1111 may include a first data input port 1111a and a target code output port 1111b; the first data input port 1111a is configured to receive the first data to be subjected to regular signed number coding processing, and the target code output port 1111b is configured to output the target code obtained by performing regular signed number coding processing on the first data.

Specifically, the first data received by the first data input port 1111a in the regular signed number coding unit 1111 may be a multiplier in the multiplication operation, and the multiplier may be a fixed point number. Optionally, the second data received by the partial product acquiring unit 1112 may be a multiplicand in the multiplication operation, the multiplicand may be a fixed point number, and the multiplier and the multiplicand may be data of the same bit width. Optionally, the number of target codes may be equal to the number of original partial products and to the number of partial products of the target codes.

It should be noted that the regular signed number coding processing described above may be characterized as follows: for an N-bit multiplier, processing proceeds from the low-order values to the high-order values; if there are l (l >= 2) consecutive values 1, the l consecutive values 1 can be converted into the data "1 (0)_(l-1) (-1)", that is, a 1 followed by (l-1) zeros and a -1, and the remaining (N-l) values are combined with the converted (l+1) values to obtain new data; the new data is then used as the initial data of the next stage of conversion processing, until the new data contains no l (l >= 2) consecutive values 1. When an N-bit multiplier is subjected to regular signed number coding processing, the bit width of the obtained target code may be equal to (N+1). Further, in the regular signed number coding processing, data 11 may be converted to (100-001), that is, data 11 may be equivalently converted to 10(-1); data 111 may be converted to (1000-0001), that is, data 111 may be equivalently converted to 100(-1); by analogy, the conversion of other runs of l (l >= 2) consecutive values 1 is similar.

For example, the multiplier received by the regular signed number coding unit 1111 is "001010101101110". The first new data obtained by performing the first-stage conversion processing on the multiplier is "0010101011100(-1)0"; the second new data obtained by performing the second-stage conversion processing on the first new data is "0010101100(-1)00(-1)0"; the third new data obtained by performing the third-stage conversion processing on the second new data is "0010110(-1)00(-1)00(-1)0"; the fourth new data obtained by performing the fourth-stage conversion processing on the third new data is "00110(-1)0(-1)00(-1)00(-1)0"; and the fifth new data obtained by performing the fifth-stage conversion processing on the fourth new data is "010(-1)0(-1)0(-1)00(-1)00(-1)0". The fifth new data contains no l (l >= 2) consecutive values 1; at this point the fifth new data may be called the intermediate code, and the regular signed number coding processing is complete once the intermediate code has been subjected to one bit-supplementing operation, where the bit width of the intermediate code may be equal to the bit width of the multiplier. Optionally, after the regular signed number coding unit 1111 performs the regular signed number coding processing on the multiplier, if the highest-order value and the next-highest-order value of the new data obtained (i.e., the intermediate code) are "10" or "01", the regular signed number coding unit 1111 may supplement one value 0 above the highest-order value of the intermediate code, so that the three highest-order values of the corresponding target code are "010" or "001", respectively. Optionally, the bit width of the intermediate code may be equal to the bit width of the target code minus 1.
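By way of illustration only, the staged conversion in the example above can be sketched in Python. The function name `regular_signed_encode` is hypothetical, and the sketch makes two assumptions that the description does not state: the multiplier is treated as an unsigned bit pattern, and the supplemented 0 is always added, so the result always has N+1 digits. The procedure resembles what is commonly called canonical signed digit recoding, although the description does not use that name.

```python
def regular_signed_encode(bits):
    """Recode a binary string (highest-order bit first) into digits -1, 0, 1.

    Scanning from the low-order end, every run of l >= 2 consecutive 1s is
    replaced by a 1, (l - 1) zeros and a -1, which occupies l + 1 positions.
    Assumption: a 0 is always supplemented above the highest-order bit, so
    the returned list has len(bits) + 1 digits (highest-order digit first).
    """
    digits = [int(b) for b in reversed(bits)]  # index 0 = lowest-order bit
    digits.append(0)                           # supplemented high-order 0
    i = 0
    while i < len(digits) - 1:
        if digits[i] == 1 and digits[i + 1] == 1:
            j = i
            while j < len(digits) and digits[j] == 1:
                j += 1                         # j = first position above the run
            digits[i] = -1                     # lowest position of the run
            for k in range(i + 1, j):
                digits[k] = 0                  # the (l - 1) zeros
            digits[j] = 1                      # new 1 above the run
            i = j                              # the new 1 may start another run
        else:
            i += 1
    return digits[::-1]


def signed_digit_value(digits):
    """Value of a digit list given highest-order digit first."""
    value = 0
    for d in digits:
        value = 2 * value + d
    return value


code = regular_signed_encode("001010101101110")
print(code)
print(signed_digit_value(code) == int("001010101101110", 2))
```

For the multiplier in the example, the sketch yields the fifth new data with one 0 supplemented at the high-order end, and the recoded digits represent the same value as the original bit pattern.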

It should be noted that the regular signed number coding unit 1111 may output the target code through the target code output port 1111b. Optionally, the bit width of the target code may be equal to the bit width of the data received by the regular signed number coding unit 1111, and the target code may include three values, namely -1, 0 and 1; it is also understood that the number of values included in the target code may be equal to the bit width of the target code.

According to the multiplier provided by this embodiment, the regular signed number coding unit in the multiplication circuit can perform regular signed number coding processing on the received data to obtain target codes; the partial product obtaining unit obtains an original partial product according to each target code and obtains the partial products of the target codes according to the original partial products; and the accumulation sub-circuit accumulates the partial products of the target codes to obtain a multiplication result. The state control circuit obtains a storage indication signal and a reading indication signal; the register control circuit determines, according to the storage indication signal, the register circuit that stores the multiplication result, and the register circuit stores the multiplication result; meanwhile, the selection circuit reads the data in the register circuit according to the reading indication signal to obtain the target operation result. Because the received data is subjected to regular signed number coding processing, the number of effective partial products obtained in the multiplication process is reduced, and the complexity of the multiplier in realizing the multiplication operation is therefore reduced; meanwhile, the multiplier can improve the operation efficiency of the multiplication operation and effectively reduce the power consumption of the multiplier.

As one embodiment, the partial product obtaining unit 1112 is specifically configured to perform a conversion process on the target code to obtain an original partial product, perform a sign bit expansion process on the original partial product to obtain a sign bit expanded partial product, and obtain the partial product of the target code according to the sign bit expanded partial product.

Specifically, the conversion process described above may be characterized as converting the values in the target code into original partial products based on the multiplicand (i.e., X) in the multiplication operation. Optionally, each value in the target code has a corresponding original partial product: if the value in the target code is -1, the corresponding original partial product may be -X; if the value in the target code is 1, the corresponding original partial product may be X; and if the value in the target code is 0, the corresponding original partial product may be 0. Optionally, the original partial product may be a partial product that has not been subjected to sign bit expansion, and the bit width of the original partial product may be the same as the bit width of the data currently processed by the multiplication circuit 11. Optionally, the bit width of the sign-bit-extended partial product may be equal to 2 times the bit width N of the data processed by the multiplier, in which case the bit width of the original partial product may be equal to N. Optionally, the lower N values of the sign-bit-extended partial product may be equal to the N values contained in the original partial product, and each of the upper N values of the sign-bit-extended partial product may be equal to the highest-order value of the original partial product, i.e., the sign bit value of the original partial product.
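As a minimal sketch of the conversion and sign bit expansion just described, the following Python function builds the 2N-bit pattern for a single target-code value. The function name is hypothetical, and the sketch assumes that the original partial product is representable in N bits of two's complement, which excludes the case of negating the most negative N-bit multiplicand; the description does not say how that case is handled.

```python
def sign_extended_partial_product(digit, x, n):
    """Return the 2n-bit pattern (as a non-negative int) for one target-code value.

    digit is -1, 0 or 1 and x is the signed n-bit multiplicand X.
    Assumption: digit * x fits in n bits of two's complement.
    """
    if digit not in (-1, 0, 1):
        raise ValueError("target-code values are -1, 0 or 1")
    product = digit * x                        # -X, 0 or X
    if not -(1 << (n - 1)) <= product < (1 << (n - 1)):
        raise ValueError("original partial product does not fit in n bits")
    original = product & ((1 << n) - 1)        # n-bit original partial product
    sign = (original >> (n - 1)) & 1           # highest-order value = sign bit
    upper = ((1 << n) - 1) if sign else 0      # upper n values all equal the sign bit
    return (upper << n) | original


print(hex(sign_extended_partial_product(-1, 5, 8)))  # 0xfffb
print(hex(sign_extended_partial_product(1, 5, 8)))   # 0x5
```

With N = 8 and X = 5, a target-code value of -1 gives the pattern 0xFFFB: the lower 8 bits are the original partial product 0xFB and the upper 8 bits repeat its sign bit.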

In addition, the partial product obtaining unit 1112 may obtain the partial products of the target codes from all the sign-bit-extended partial products. In the distribution rule of the partial products of all the target codes, the partial product of the first target code may be equal to the first sign-bit-extended partial product; starting from the partial product of the second target code, the highest-order value of the partial product of each target code may be in the same column as the highest-order value of the partial product of the first target code, and the bit width of the partial product of each target code may be equal to the bit width of the partial product of the previous target code minus 1, that is, equal to the bit width 2N of the corresponding sign-bit-extended partial product minus (i-1), where i represents the number of the partial product of the target code, starting from 1. A distribution diagram of the 9 partial products of the target codes obtained in this way may be as shown in fig. 3.
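The distribution rule above can be read as a left shift of each sign-bit-extended partial product by (i-1) positions within a fixed 2N-bit frame, with the values shifted past the highest column discarded. The following Python sketch illustrates that reading; the function name and the choice to pass the target code lowest-order value first are assumptions made for the example.

```python
def target_code_partial_products(digits_low_first, x, n):
    """Lay out the partial products of the target codes in a 2n-bit frame.

    digits_low_first: target-code values (-1, 0, 1), lowest-order value first.
    x: signed multiplicand. Returns a list of (bit_pattern, kept_width) pairs,
    one per target-code value, where kept_width = 2n - (i - 1).
    """
    width = 2 * n
    mask = (1 << width) - 1
    rows = []
    for i, d in enumerate(digits_low_first, start=1):
        extended = (d * x) & mask                # sign-bit-extended partial product
        shifted = (extended << (i - 1)) & mask   # values beyond column 2n-1 are dropped
        rows.append((shifted, width - (i - 1)))
    return rows


# Target code of the 8-bit multiplier 01101110 (decimal 110), lowest-order value first.
digits = [0, -1, 0, 0, -1, 0, 0, 1, 0]
rows = target_code_partial_products(digits, -77, 8)
total = sum(pattern for pattern, _ in rows) & 0xFFFF
print(total == (-77 * 110) & 0xFFFF)
print([w for _, w in rows])
```

Under these assumptions the 9 rows have widths 16 down to 8, and their sum modulo 2^16 equals the two's-complement product of the multiplicand and the multiplier.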

Optionally, the partial product acquiring unit 1112 includes: a target encoding input port 1112a, a second data input port 1112b, and a partial product output port 1112c; the target code input port 1112a is configured to receive the target code, the second data input port 1112b is configured to receive the second data, and the partial product output port 1112c is configured to output a partial product of the target code.

In this embodiment, the partial product acquiring unit 1112 may receive the target code obtained by the regular signed number coding unit 1111 through the target code input port 1112a, receive the second data through the second data input port 1112b, perform conversion processing according to the target code and the second data, perform shift processing to obtain the partial product of the target code, and output the partial product of the target code through the partial product output port 1112c.

The number of the effective partial products which can be obtained by the multiplier is small, so that the complexity of the multiplier in realizing multiplication operation is reduced; meanwhile, the multiplier can improve the operation efficiency of multiplication operation and effectively reduce the power consumption of the multiplier.

Another embodiment provides a multiplier, wherein the multiplier includes the accumulation sub-circuit 112, and the accumulation sub-circuit 112 includes a Wallace tree group unit 1121 and an accumulation unit 1122, wherein the output end of the Wallace tree group unit 1121 is connected with the input end of the accumulation unit 1122; the Wallace tree group unit 1121 is configured to accumulate the partial products of the target codes to obtain an accumulation operation result, and the accumulation unit 1122 is configured to perform accumulation processing on the accumulation operation result.

Specifically, the Wallace tree group unit 1121 may accumulate the values in the partial products of all the target codes obtained by the partial product obtaining unit 1112 to obtain an accumulation operation result, and the accumulation unit 1122 may perform accumulation processing on the accumulation operation result obtained by the Wallace tree group unit 1121 to obtain the target operation result.

According to the multiplier provided by the embodiment, the Wallace tree group unit can be used for accumulating the partial products of the target codes, the accumulation unit is used for accumulating the accumulation results to obtain the multiplication results, and the target operation results are obtained according to the multiplication results, so that the number of effective partial products obtained by the multiplier is ensured to be small, and the complexity of the multiplier in realizing multiplication is reduced; meanwhile, the multiplier can improve the operation efficiency of multiplication operation and effectively reduce the power consumption of the multiplier.

Another embodiment provides a multiplier in which the Wallace tree group unit 1121 comprises Wallace tree subunits 1121_1 to 1121_n, each of which is configured to accumulate one column of values in the partial products of all the target codes.

Specifically, the circuit structure of the Wallace tree subunits 1121_1 to 1121_n may be realized by a combination of full adders and half adders; in addition, it may be understood that the Wallace tree subunits 1121_1 to 1121_n are circuits capable of processing a multi-bit input signal and adding the multi-bit input signal to obtain a two-bit output signal. Optionally, the number n of Wallace tree subunits contained in the Wallace tree group unit 1121 may be equal to 2 times the bit width of the data currently processed by the multiplication circuit 11, and the n Wallace tree subunits may process the partial products of the target codes in parallel while being connected to one another in series. Optionally, each Wallace tree subunit in the Wallace tree group unit 1121 may perform addition processing on one column of values in the partial products of all the target codes, and each Wallace tree subunit may output two signals, namely a carry signal Carry_i and a sum bit signal Sum_i, where i may represent the number corresponding to each Wallace tree subunit, and the number of the first Wallace tree subunit is 1. Optionally, the number of input signals received by each Wallace tree subunit may be equal to the number of target codes or the number of sign-bit-extended partial products.

In addition, the signals of each Wallace tree subunit in the Wallace tree group unit 1121 may include a carry input signal Cin_i, a partial product input signal, and a carry output signal Cout_i. Optionally, the partial product input signal received by each Wallace tree subunit may be one column of values in the partial products of all the target codes, and the number of carry output signals Cout_i output by each Wallace tree subunit may be equal to N_Cout = floor((N_I + N_Cin)/2) - 1, where N_I may represent the number of partial product value input signals of the Wallace tree subunit, N_Cin may represent the number of carry input signals of the Wallace tree subunit, N_Cout may represent the minimum number of carry output signals of the Wallace tree subunit, and floor(·) may represent the round-down function. Optionally, the carry input signal received by each Wallace tree subunit in the Wallace tree group unit 1121 may be the carry output signal output by the previous Wallace tree subunit, and the carry input signal received by the first Wallace tree subunit may be 0; meanwhile, the number of carry signal input ports of the first Wallace tree subunit may be the same as the number of carry signal input ports of the other Wallace tree subunits.
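The carry-count relation above can be checked against a behavioural model of a single Wallace tree subunit built from full adders and a half adder. The Python sketch below is illustrative only: the names `n_cout` and `wallace_subunit` are hypothetical, the adders are chained in a simple order rather than a depth-optimised tree, and at least two input signals are assumed.

```python
def n_cout(n_i, n_cin):
    """Number of carry output signals: floor((N_I + N_Cin) / 2) - 1."""
    return (n_i + n_cin) // 2 - 1


def wallace_subunit(column_bits, carry_in_bits):
    """Compress one column of bits into (sum_bit, carry_bit, carry_out_bits).

    All inputs have the same weight; sum_bit keeps that weight, while
    carry_bit and every entry of carry_out_bits have twice that weight.
    """
    bits = list(column_bits) + list(carry_in_bits)
    carries = []
    while len(bits) >= 3:                      # full adder: 3 bits -> sum + carry
        a, b, c = bits.pop(), bits.pop(), bits.pop()
        bits.append(a ^ b ^ c)
        carries.append((a & b) | (a & c) | (b & c))
    if len(bits) == 2:                         # half adder: 2 bits -> sum + carry
        a, b = bits.pop(), bits.pop()
        bits.append(a ^ b)
        carries.append(a & b)
    sum_bit = bits[0] if bits else 0
    carry_bit = carries.pop() if carries else 0  # Carry_i, sent to the final adder
    return sum_bit, carry_bit, carries           # remaining carries go to the next subunit


s, c, cout = wallace_subunit([1, 1, 0, 1], [1])
print(s, c, cout, len(cout) == n_cout(4, 1))
```

With four partial product inputs and one carry input, the model produces exactly one carry output signal, which agrees with floor((4 + 1)/2) - 1 = 1, and the weighted sum of its outputs equals the number of 1s at its inputs.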

For example, if the multiplication circuit 11 currently processes an 8-bit by 8-bit multiplication, each sign-bit-extended partial product obtained by the partial product obtaining unit 1112 is "p_i9 p_i9 p_i9 p_i9 p_i9 p_i9 p_i9 p_i9 p_i8 p_i7 p_i6 p_i5 p_i4 p_i3 p_i2 p_i1" (i = 1, …, n; n = 9), where i may represent the i-th sign-bit-extended partial product; 9 partial products of the target codes are obtained from the 9 sign-bit-extended partial products and are then accumulated. Optionally, as shown in fig. 3, each dot may represent one bit value in a sign-bit-extended partial product, and the partial product of the first target code may be the first sign-bit-extended partial product. In the distribution rule of the partial products of the 9 target codes, the partial product of each target code has a corresponding sign-bit-extended partial product; starting from the partial product of the second target code, the corresponding sign-bit-extended partial product may be shifted left by one bit relative to the partial product of the previous target code, while the highest-order value of the partial product of each target code remains in the same column as the highest-order value of the partial product of the first target code. This is equivalent to shifting each sign-bit-extended partial product to the left and excluding from the addition the higher-order values produced by the left shift.
Optionally, among the partial products of the 9 target codes, the partial product of the first target code may be the first sign-bit-extended partial product, and starting from the partial product of the second target code, the highest-order value of the partial product of each target code is located in the same column as the highest-order value of the partial product of the first target code; from the rightmost column to the leftmost column, 16 Wallace tree subunits are required to accumulate the partial products of the 9 target codes. A connection circuit diagram of the 16 Wallace tree subunits is shown in fig. 4, where Wallace_i represents a Wallace tree subunit and i is the number of the Wallace tree subunit, starting from 1; a solid line connecting two Wallace tree subunits indicates that the Wallace tree subunit with the higher number has a carry output signal, and a dotted line indicates that the Wallace tree subunit with the higher number has no carry output signal.

The multiplier provided by the embodiment has the advantages that the number of the effective partial products obtained by the multiplier is small, and the complexity of the multiplier in realizing multiplication operation is reduced; meanwhile, the multiplier can improve the operation efficiency of multiplication operation and effectively reduce the power consumption of the multiplier.

As one embodiment, the accumulation unit 1122 in the multiplier includes an adder, and the adder is configured to perform an addition operation on the received accumulation operation result.

Specifically, the adder may be an adder of any of several bit widths and may be a carry-lookahead adder. Optionally, the adder may receive the two signals output by the modified Wallace tree group unit 1121, add the two signals, and output the multiplication result.

Optionally, the adder includes a carry signal input port, a sum bit signal input port and a result output port; the carry signal input port is used for receiving a carry signal, the sum bit signal input port is used for receiving a sum bit signal, and the result output port is used for outputting the result of accumulating the carry signal and the sum bit signal.

Specifically, the adder may receive the carry signal Carry output by the modified Wallace tree group unit 1121 through the carry signal input port, receive the sum bit signal Sum output by the modified Wallace tree group unit 1121 through the sum bit signal input port, and output the result of accumulating the carry signal Carry and the sum bit signal Sum through the result output port.

In addition, during multiplication, the multiplication circuit 11 may use adders of different bit widths to add the carry output signal Carry and the sum bit output signal Sum output by the modified Wallace tree group unit 1121, where the bit width of the data that the adder can process may be equal to 2 times the bit width N of the data currently processed by the multiplier. Optionally, each Wallace tree subunit of the modified Wallace tree group unit 1121 may output a carry output signal Carry_i and a sum bit output signal Sum_i (i = 0, …, 2N-1, where i is the number corresponding to each Wallace tree subunit, starting from 0). Optionally, the carry signal received by the adder is Carry = {[Carry_0 : Carry_(2N-2)], 0}; that is, the bit width of the carry output signal Carry received by the adder is 2N, the first 2N-1 values in the carry output signal Carry correspond to the carry output signals of the first 2N-1 Wallace tree subunits in the modified Wallace tree group unit 1121, and the last value in the carry output signal Carry may be replaced by the value 0. Optionally, the bit width of the sum bit output signal Sum received by the adder may be 2N, and the values in the sum bit output signal Sum may be equal to the sum bit output signals of the Wallace tree subunits in the modified Wallace tree group unit 1121.

For example, if the multiplication circuit 11 currently processes an 8-bit by 8-bit multiplication, the adder may be a 16-bit carry-lookahead adder. Continuing with fig. 4, the modified Wallace tree group unit 1121 may output the sum bit output signals Sum and the carry output signals Carry of the 16 Wallace tree subunits; the sum bit signal received by the 16-bit carry-lookahead adder may be the complete sum bit signal Sum output by the modified Wallace tree group unit 1121, and the carry signal received may be the carry signal Carry formed by combining, with the value 0, all the carry output signals of the modified Wallace tree group unit 1121 except the carry output signal of the last Wallace tree subunit.
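For illustration, the way the adder combines the two signals can be sketched in Python. The sketch assumes that the carry output of Wallace tree subunit i has the weight of column i+1, so that the value 0 fills the lowest position of the Carry word and the carry output of the last subunit is not used; this is one reading of the Carry expression above, and the function name `final_add` is hypothetical.

```python
def final_add(sum_bits, carry_bits):
    """Add the Sum and Carry words produced by 2N Wallace tree subunits.

    sum_bits[i] and carry_bits[i] are the outputs of subunit i (i = 0 .. 2N-1).
    Assumption: Carry_i is added one column higher than Sum_i.
    """
    width = len(sum_bits)
    if len(carry_bits) != width:
        raise ValueError("Sum and Carry must have the same number of bits")
    sum_word = sum(bit << i for i, bit in enumerate(sum_bits))
    # A 0 occupies the lowest position; the last subunit's carry is dropped.
    carry_word = sum(bit << (i + 1) for i, bit in enumerate(carry_bits[:-1]))
    return (sum_word + carry_word) & ((1 << width) - 1)


print(final_add([1, 0, 1, 0], [1, 0, 0, 1]))  # 5 + 2 = 7
```

In this small 4-bit case the Sum word is 5, the Carry word after alignment is 2, and the carry of the last subunit does not contribute to the result.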

According to the multiplier provided by the embodiment, the number of the effective partial products obtained by the multiplier is small, the complexity of multiplication operation is reduced, the operation efficiency of the multiplication operation is improved, and the power consumption of the multiplier is effectively reduced.

In one embodiment, the multiplier comprises the register circuit 13, and the register circuit 13 comprises a register sub-circuit 131, where the register sub-circuit 131 is configured to store the multiplication results corresponding to different storage indication signals.

Specifically, the register circuit 13 may include two or more register sub-circuits 131; it is further understood that the number of register sub-circuits 131 in the register circuit 13 may be equal to 2N_in/N_out, where N_in represents the bit width of the data received by the multiplier and N_out (N_out < 2N_in) represents the bit width of the data output by the multiplier. Optionally, the data bit width stored by the register sub-circuit 131 may be equal to 2 times the bit width of the multiplier input port. Optionally, the bit width of the data received by the multiplier may be equal to the bit width of the input port of the multiplier, and the bit width of the data output by the multiplier may be equal to the bit width of the input port of the multiplier, or may be less than 2 times the bit width of the input port of the multiplier. For example, if the bit width of the input port and the bit width of the output port of the multiplier are both N bits, the register circuit 13 needs to be formed of two register sub-circuits 131; if the bit width of the input port of the multiplier is N bits and the bit width of the output port is N/2 bits, the register circuit 13 needs to be formed of four register sub-circuits 131. Optionally, the multiplier may store the multiplication result obtained by each multiplication in the corresponding one of the 2N_in/N_out register sub-circuits 131 according to the storage indication signal, and different storage indication signals correspond to different register sub-circuits 131 for storing multiplication results. Optionally, each multiplication result obtained by the multiplier can only be stored in the register sub-circuit 131 corresponding to the storage indication signal, and cannot be stored in another register sub-circuit 131 that does not correspond to the storage indication signal.

For example, if the register circuit 13 has n register sub-circuits 131 numbered 1, 2, 3, ..., n, the first multiplication result obtained by the multiplier may be stored in the register sub-circuit 131 numbered 1, in which case the value of the storage indication signal may be 1, and the second multiplication result obtained by the multiplier may be stored in the register sub-circuit 131 numbered 2, in which case the value of the storage indication signal may be 2. It may be understood that when the value of the storage indication signal is odd, the number of the register sub-circuit 131 storing the multiplication result is also odd, and when the value of the storage indication signal is even, the number of the register sub-circuit 131 storing the multiplication result is also even, where the value of the storage indication signal may be equal to the number of the register sub-circuit 131 storing the multiplication result.
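The storage rule can be modelled in a few lines of Python. This is a behavioural sketch under stated assumptions, not a description of the actual circuit: the class name `RegisterCircuitModel` and its attributes are hypothetical, 2N_in is assumed to be an exact multiple of N_out, and the value of the storage indication signal is taken to be the number of the register sub-circuit, as in the example above.

```python
class RegisterCircuitModel:
    """Behavioural model of a register circuit with 2*N_in/N_out sub-circuits."""

    def __init__(self, n_in, n_out):
        if n_out <= 0 or n_out >= 2 * n_in or (2 * n_in) % n_out != 0:
            raise ValueError("expected 0 < N_out < 2*N_in with 2*N_in divisible by N_out")
        self.count = 2 * n_in // n_out
        self.width = 2 * n_in                  # each sub-circuit holds a 2*N_in-bit result
        self.slots = {k: None for k in range(1, self.count + 1)}

    def store(self, storage_signal, result):
        """Store a multiplication result in the sub-circuit selected by the signal."""
        if storage_signal not in self.slots:
            raise ValueError("no register sub-circuit corresponds to this signal")
        self.slots[storage_signal] = result & ((1 << self.width) - 1)


regs = RegisterCircuitModel(n_in=8, n_out=4)
regs.store(1, 0x1234)
regs.store(2, 0xBEEF)
print(regs.count, regs.slots[1], regs.slots[2])
```

With an 8-bit input port and a 4-bit output port, the model contains four sub-circuits, and each result is held only in the sub-circuit whose number matches the storage indication signal.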

According to the multiplier provided by this embodiment, the register circuit stores the multiplication result obtained by each multiplication operation in a different register sub-circuit according to the storage indication signal, and data in the multiplication result stored by the corresponding register sub-circuit is then output according to the reading indication signal, so that the target operation result can be output by a multiplier whose output port bit width is not 2 times its input port bit width; meanwhile, the number of effective partial products obtained by the multiplier is small, and the complexity of the multiplier in realizing the multiplication operation is reduced.

Another embodiment provides a multiplier, wherein the multiplier includes the accumulation sub-circuit 212, and the accumulation sub-circuit 212 includes: a waling tree group unit 2121 and an accumulation unit 2122; wherein, the output end of the Wallace tree group unit 2121 is connected with the input end of the accumulating unit 2122; the wallace tree group unit 2121 is configured to perform accumulation processing on the partial product of the target code to obtain an accumulation operation result, and the accumulation unit 2122 is configured to perform accumulation processing on the accumulation operation result to obtain the target operation result.

Specifically, the wallace tree group unit 2121 may perform accumulation processing on the values in the partial products of all the target codes obtained by the partial product obtaining unit 2112 to obtain an accumulation operation result, and perform accumulation processing on the accumulation operation result obtained by the wallace tree group unit 2121 by the accumulation unit 2122 to obtain the target operation result.

Optionally, a multiplier includes the wallace tree group unit 2121, and the wallace tree group unit 2121 includes: the Wallace tree subunits 2121_1 to 2121—n are configured to accumulate each column value in the partial product of all target codes.

In this embodiment, the circuit structure and function of the Wallace tree group unit 2121 may be the same as those of the Wallace tree group unit 1121, and the specific structure of the Wallace tree group unit 2121 is not described again in this embodiment.

According to the multiplier provided by the embodiment, the Wallace tree group unit can be used for accumulating the partial products of the target codes, the accumulation unit is used for accumulating the results to obtain multiplication results, and the target operation results are obtained according to the multiplication results, so that the number of effective partial products obtained by the multiplier is ensured to be small, and the complexity of the multiplier in realizing multiplication is reduced; meanwhile, the multiplier can improve the operation efficiency of multiplication operation and effectively reduce the power consumption of the multiplier.

As one embodiment, the multiplier includes the accumulation unit 2122, and the accumulation unit 2122 includes an adder configured to perform an addition operation on the accumulation operation result.

Specifically, the adder may be an adder of any of several bit widths and may be a carry-lookahead adder. Optionally, the adder may receive the two signals output by the Wallace tree group unit 2121, perform an addition operation on the two signals, and output a multiplication result.

According to the multiplier provided by the embodiment, the accumulation unit can be used for carrying out accumulation processing on two paths of signals output by the Wallace tree group unit, outputting a multiplication result and obtaining a target operation result according to the multiplication result, so that the number of effective partial products obtained by the multiplier is ensured to be small, and the complexity of the multiplier for realizing multiplication is reduced; meanwhile, the multiplier can improve the operation efficiency of multiplication operation and effectively reduce the power consumption of the multiplier.

In one embodiment, the multiplier includes the adder, and the adder includes a carry signal input port, a sum bit signal input port and a result output port. The carry signal input port is used for receiving a carry signal, the sum bit signal input port is used for receiving a sum bit signal, and the result output port is used for outputting the multiplication result obtained by accumulating the carry signal and the sum bit signal.

Specifically, the adder may receive the carry signal Carry output by the Wallace tree group unit 2121 through the carry signal input port, receive the sum bit signal Sum output by the Wallace tree group unit 2121 through the sum bit signal input port, accumulate the carry signal Carry and the sum bit signal Sum to obtain a multiplication result, and output the multiplication result through the result output port.

It should be noted that, during multiplication, the multiplication circuit 21 may add the carry output signal Carry and the sum bit output signal Sum output by the Wallace tree group unit 2121 by using adders of different bit widths, where the bit width of the data that the adder can process may be equal to 2 times the bit width N of the data currently processed by the multiplier. Optionally, each Wallace tree subunit in the Wallace tree group unit 2121 may output a carry output signal Carry_i and a sum bit output signal Sum_i (i = 0, …, 2N-1, where i is the number of the corresponding Wallace tree subunit, numbered from 0). Optionally, the carry signal received by the adder may be Carry = {[Carry_0 : Carry_(2N-2)], 0}; that is, the bit width of the carry signal Carry received by the adder is 2N, the first 2N-1 bit values of Carry correspond to the carry output signals of the first 2N-1 Wallace tree subunits in the Wallace tree group unit 2121, and the last bit value of Carry may be replaced with 0. Optionally, the bit width of the sum bit output signal Sum received by the adder may be 2N, and the bit values of Sum may be equal to the sum bit output signals of the respective Wallace tree subunits in the Wallace tree group unit 2121.

For example, if the multiplication circuit 11 currently processes an 8-bit multiplication, the adder may be a 16-bit carry-lookahead adder. As further shown in fig. 4, the Wallace tree group unit 2121 may output the sum bit output signals Sum and the carry output signals Carry of the 16 Wallace tree subunits; the sum bit signal received by the 16-bit carry-lookahead adder may be the complete sum bit signal Sum output by the Wallace tree group unit 2121, and the received carry signal may be the carry signal Carry obtained by combining all the carry output signals of the Wallace tree group unit 2121, except the carry output signal of the last Wallace tree subunit, with the value 0.
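The relationship between the Sum and Carry signals and the final adder described above can be illustrated with a word-level sketch. This is only a simplified model under stated assumptions: the function names `compress_3_2`, `wallace_reduce` and `final_adder` are hypothetical, each integer stands for a whole row of column bits, and the per-column Wallace tree subunits with their internal carry chaining are not modelled individually.

```python
def compress_3_2(a, b, c):
    """Bitwise 3:2 compression: per-column sum bits and per-column carry bits."""
    return a ^ b ^ c, (a & b) | (a & c) | (b & c)


def wallace_reduce(rows, width):
    """Reduce several rows to a (Sum, Carry) pair; Carry bit i comes from column i."""
    mask = (1 << width) - 1
    rows = [r & mask for r in rows] + [0] * max(0, 3 - len(rows))
    while len(rows) > 3:
        s, cy = compress_3_2(*rows[:3])
        # A carry produced in column i is consumed by column i+1.
        rows = rows[3:] + [s, (cy << 1) & mask]
    return compress_3_2(*rows)


def final_adder(sum_word, carry_word, width):
    """Add Sum and {Carry_0..Carry_(width-2), 0}: the top carry is dropped, bit 0 is 0."""
    mask = (1 << width) - 1
    shifted_carry = (carry_word << 1) & mask
    return (sum_word + shifted_carry) & mask


rows = [13, 250, 77, 1000, 5]
total = final_adder(*wallace_reduce(rows, 16), 16)
```

For a 16-bit width, `total` equals the sum of the rows modulo 2^16, which is what a 16-bit adder fed with the complete Sum signal and the shifted Carry signal would produce.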

According to the multiplier provided by the embodiment, the accumulation unit can perform accumulation operation on two paths of signals output by the Wallace tree group unit, a multiplication result is output, and a target operation result is obtained according to the multiplication result, so that the number of effective partial products acquired by the multiplier is ensured to be small, and the complexity of the multiplier in realizing multiplication operation is reduced; meanwhile, the multiplier can improve the operation efficiency of multiplication operation and effectively reduce the power consumption of the multiplier.

Another embodiment provides a multiplier, which includes the first conversion sub-circuit 221 and the second conversion sub-circuit 222, where the first conversion sub-circuit 221 is specifically configured to convert the multiplication result into the target operation result of a floating point type, and the second conversion sub-circuit 222 is specifically configured to convert the multiplication result into the target operation result of a fixed point type.

Specifically, the bit width of the multiplication result may be 2 times the bit width of the data received by the multiplier, the bit width of the floating-point operation result and the bit width of the fixed-point operation result may each be equal to the bit width of the output port of the multiplier, and, in the number conversion circuit 22, the bit width of the floating-point operation result may be equal to the bit width of the fixed-point operation result.

In the number conversion circuit 22, the first conversion sub-circuit 221 and the second conversion sub-circuit 222 are not connected to each other and are independent of each other. For each multiplication operation, the number conversion circuit 22 only needs to perform data conversion processing with either the first conversion sub-circuit 221 or the second conversion sub-circuit 222 to obtain the target operation result. Optionally, the number conversion circuit 22 may determine, according to the received data conversion signal, whether the current multiplication operation requires data conversion processing by the first conversion sub-circuit 221 or by the second conversion sub-circuit 222.

Optionally, the data conversion signal may take two values, which may be represented as the binary values 00 and 01. A data conversion signal of 00 may indicate that the data received by the number conversion circuit 22 is a fixed-point number of 2N bit width that needs to be converted into a fixed-point number of N bit width, together with the position of the decimal point of the converted fixed-point number; the position of the decimal point of the 2N-bit fixed-point number before conversion may be determined. A data conversion signal of 01 may indicate that the multiplication result received by the number conversion circuit 22 is a fixed-point number of 2N bit width that needs to be converted into a floating-point number of N bit width. Optionally, the number conversion circuit 22 may perform different conversion processing on the received multiplication result through the first conversion sub-circuit 221 or the second conversion sub-circuit 222 according to which of the two data conversion signals is received, and the specific implementation is as follows:

(1) If the data conversion signal received by the number conversion circuit 22 is 00, the number conversion circuit 22 may convert the fixed-point number of 2N bit width into a fixed-point number of N bit width. In this case, the number conversion circuit 22 may convert the received 2N-bit fixed-point number through the second conversion sub-circuit 222. Specifically, during the conversion, the decimal point of the target N-bit fixed-point number is aligned with the decimal point of the 2N-bit fixed-point number before conversion, and a total of N bit values before and after the decimal point of the 2N-bit fixed-point number are then intercepted to obtain the converted N-bit fixed-point number. The interception may be divided into three cases:

In case a, when the N bit values to be intercepted are all contained in the 2N-bit fixed-point number before conversion, the second conversion sub-circuit 222 may directly intercept the N bit values before and after the decimal point of the 2N-bit fixed-point number before conversion;

In case b, when only a part of the N bit values to be intercepted is contained in the 2N-bit fixed-point number before conversion, and the high-order part of the N bit values to be intercepted has no corresponding bit values in the 2N-bit fixed-point number before conversion, the second conversion sub-circuit 222 may fill each missing bit with the sign bit of the 2N-bit fixed-point number before conversion, and then intercept the N bit values from the padded fixed-point number;

In case c, when only a part of the N bit values to be intercepted is contained in the 2N-bit fixed-point number before conversion, and the low-order part of the N bit values to be intercepted has no corresponding bit values in the 2N-bit fixed-point number before conversion, the second conversion sub-circuit 222 may fill each missing bit according to the sign of the 2N-bit fixed-point number before conversion: if the 2N-bit fixed-point number is positive, each missing bit may be filled with the value 0, otherwise with the value 1; the N bit values are then intercepted from the padded fixed-point number;

(2) If the data conversion signal received by the number conversion circuit 22 is 01, the number conversion circuit 22 may convert the fixed-point number of 2N bit width into a floating-point number of N bit width. In this case, the number conversion circuit 22 may convert the received 2N-bit fixed-point number through the first conversion sub-circuit 221. Specifically, during the conversion, the highest bit value (i.e., the sign bit) of the fixed-point number may be used as the sign bit value of the converted floating-point number. In addition, if the 2N-bit fixed-point number before conversion is positive, the sign bit at the highest position is removed and the remaining (2N-1) bit values are searched from the highest bit until the value 1 is found; if m bit values remain after the value 1 that is found, the exponent value after conversion may be equal to m plus the exponent offset value i, minus the position of the decimal point of the 2N-bit fixed-point number before conversion. If the 2N-bit fixed-point number before conversion is negative, the sign bit is likewise removed and the remaining (2N-1) bit values are searched from the highest bit. The mantissa of the converted floating-point number may then be taken from the bit values that follow the located bit, n bit values in total, with the low-order bits filled with 0 when fewer than n bit values are available.

By way of example, if a fixed-point number of 2N bit width needs to be converted into a floating-point number of 16 bit width, the exponent offset value i may be equal to 16 and the mantissa bit width n may be equal to 10; if a fixed-point number of 2N bit width needs to be converted into a floating-point number of 32 bit width, i may be equal to 127 and n may be equal to 23; if a fixed-point number of 2N bit width needs to be converted into a floating-point number of 64 bit width, i may be equal to 1023 and n may be equal to 52.
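The fixed-point to fixed-point conversion of case (1) above (data conversion signal 00) can be sketched in software as follows. This is a minimal illustration rather than the circuit itself: the function name `truncate_fixed` and the parameters `frac_in` and `frac_out` (the number of bits after the decimal point before and after conversion) are hypothetical, and the input is assumed to be a 2N-bit two's-complement pattern.

```python
def truncate_fixed(raw, n, frac_in, frac_out):
    """Intercept an N-bit fixed-point number from a 2N-bit one after aligning decimal points."""
    width_in = 2 * n
    raw &= (1 << width_in) - 1
    sign = (raw >> (width_in - 1)) & 1
    lowest_src = frac_in - frac_out      # source position of the result's lowest bit
    out = 0
    for k in range(n):                   # k-th bit of the result, lowest bit first
        src = lowest_src + k
        if src >= width_in:
            bit = sign                   # case b: missing high-order bit, use the sign bit
        elif src < 0:
            bit = sign                   # case c: missing low-order bit, 0 if positive else 1
        else:
            bit = (raw >> src) & 1       # case a: bit is present in the 2N-bit number
        out |= bit << k
    return out
```

For instance, with N = 8, the 16-bit pattern 0x0123 with 8 fractional bits converted to a result with 4 fractional bits falls entirely under case a and yields the bits 11 down to 4 of the input.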

According to the multiplier provided by this embodiment, the number conversion circuit of the multiplier can convert the multiplication result into data whose bit width is equal to that of the output port of the multiplier, and the target operation result is then output, so that the bit width of the obtained target operation result can be smaller than 2 times the bit width of the data input to the multiplier, which effectively reduces the requirement of the multiplier on the bit width of the input and output ports; meanwhile, the number of effective partial products obtained by the multiplier is small, and the complexity of the multiplier in realizing the multiplication operation is reduced.

Fig. 5 is a flow chart of a data processing method according to an embodiment. The method can be performed by the multiplier shown in fig. 1, and this embodiment relates to a process of performing a multiplication operation on data. As shown in fig. 5, the method includes:

S101, receiving data to be processed.

Specifically, the regular signed number encoding sub-circuit in the multiplier may receive two pieces of data to be processed. Optionally, the regular signed number encoding sub-circuit may process two pieces of data of a fixed bit width, and the fixed bit width may be equal to the bit width of the multiplier input port. Optionally, the data to be processed received by the regular signed number encoding sub-circuit may be fixed-point numbers, and the bit width of the fixed-point numbers may be equal to the bit width of the input port of the multiplier.

S102, carrying out regular signed number coding processing on the data to be processed to obtain a partial product of target coding.

Specifically, the regular signed number encoding method may be characterized as follows: for an N-bit multiplier, processing proceeds from the low-order bit values to the high-order bit values; if there are l (l >= 2) consecutive bit values of 1, the l consecutive bit values of 1 may be converted into the data "1(0)_(l-1)(-1)", that is, a 1 followed by (l-1) zeros and a -1, and the remaining (N-l) bit values are combined with the converted (l+1) bit values to obtain new data; the new data is then used as the initial data of the next conversion, until no l (l >= 2) consecutive bit values of 1 remain. After regular signed number encoding of the N-bit multiplier, the bit width of the obtained target code may be equal to (N+1). It should be noted that the number of partial products of the target code may be equal to the bit width N of the data received by the multiplier plus 1.
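The run-replacement rule described above can be sketched in software as follows. This is an illustrative model only: the function name `csd_encode` is hypothetical, the multiplier is assumed to be given as an unsigned N-bit integer, and the digits are stored lowest position first.

```python
def csd_encode(value, n):
    """Return N+1 signed digits (lowest position first), each -1, 0 or 1."""
    digits = [(value >> k) & 1 for k in range(n)] + [0]
    k = 0
    while k < n:
        if digits[k] == 1 and digits[k + 1] == 1:
            j = k
            while digits[j] == 1:        # find the end of the run of 1s
                j += 1
            # A run of l = j - k ones becomes: 1, (l-1) zeros, -1.
            digits[k] = -1
            for t in range(k + 1, j):
                digits[t] = 0
            digits[j] = 1
            k = j                        # the new 1 may start another run
        else:
            k += 1
    return digits


code = csd_encode(0b01110111, 8)
```

Here `code` has N+1 = 9 digits, and only three of them are non-zero, which is the reduction in effective partial products that the encoding aims for.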

S103, performing accumulation processing on the partial product of the target code to obtain a multiplication result.

Specifically, the accumulation sub-circuit may perform an accumulation operation on the values of each column in the partial products of all the target codes to obtain a multiplication result. Optionally, the bit width of the multiplication result may be 2 times the bit width of the data received by the multiplier, that is, 2 times the bit width of the input port of the multiplier.

S104, acquiring a storage indication signal and a reading indication signal.

Specifically, the multiplier can automatically acquire the storage indication signal and the reading indication signal through the state control circuit.

S105, storing a plurality of multiplication operation results into different register sub-circuits according to the storage indication signals.

Specifically, the state control circuit in the multiplier inputs the acquired storage instruction signal to the register control circuit, and the register control circuit determines a multiplication result obtained by the current multiplication according to the received storage instruction signal and stores the multiplication result in the corresponding register sub-circuit.

It should be noted that each register sub-circuit can store at most one multiplication result, and some of the plurality of register sub-circuits may be in an idle state.

S106, according to the reading instruction signals, partial data stored in different register sub-circuits and corresponding to the multiplication result are read, and a target operation result is obtained.

Specifically, the selection circuit in the multiplier may read, according to the received read instruction signal, part of the data in the multiplication result stored in the corresponding register sub-circuit as an operation result. Optionally, a single operation result read in this way need not be the complete target operation result: the target operation result of the multiplication operation may be formed by splicing the operation results read two times, or by splicing the operation results read more than two times. It can be understood that the bit width of the part of the data read from the multiplication result may be equal to 1/2 of the bit width of the multiplication result, or may be less than 1/2 of the bit width of the multiplication result. Optionally, the bit width of the target operation result may be equal to or less than the bit width of the multiplier input port.
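Reading a multiplication result in parts and splicing the parts back together can be sketched as follows. The helper names `split_result` and `splice` are hypothetical, and the sketch assumes that the parts are of equal width and are read from the high-order end first.

```python
def split_result(product, n, parts=2):
    """Split a 2N-bit multiplication result into equal chunks, high-order chunk first."""
    chunk_bits = (2 * n) // parts
    mask = (1 << chunk_bits) - 1
    return [(product >> (chunk_bits * i)) & mask for i in reversed(range(parts))]


def splice(chunks, chunk_bits):
    """Concatenate chunks (high-order chunk first) into one value."""
    value = 0
    for chunk in chunks:
        value = (value << chunk_bits) | chunk
    return value


halves = split_result(0xABCD, 8)              # two reads of N = 8 bits each
quarters = split_result(0xABCD, 8, parts=4)   # four reads of 4 bits each
```

With two reads each part is 1/2 of the multiplication result; with more reads each part is narrower, which corresponds to the case where the part read is less than 1/2 of the bit width of the multiplication result.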

The data processing method provided by the embodiment can perform regular signed number coding processing on received data to obtain a partial product of target coding, and perform accumulation processing on the partial product of target coding to obtain a multiplication result, and respectively read high-order data and low-order data in the multiplication result as the target operation result, so that the bit width of the obtained target operation result can be smaller than 2 times of the bit width of data input by a multiplier, thereby effectively reducing the requirement of the multiplier on the bit width of an input/output port; meanwhile, the method can adopt the regular signed number coding circuit to carry out regular signed number coding processing on the received data, and reduce the number of effective partial products obtained in the multiplication process, thereby reducing the complexity of the multiplication operation; meanwhile, the method can improve the operation efficiency of multiplication operation.

As one embodiment, the step of performing regular signed number encoding processing on the data to be processed in S102 to obtain the partial product of the target encoding may include:

S1021, carrying out regular signed number coding processing on the data to be processed to obtain an original partial product.

Optionally, the step of performing regular signed number encoding processing on the data to be processed in S1021 to obtain an original partial product may include:

S1021a, carrying out regular signed number coding processing on the data to be processed to obtain a target code.

Specifically, the multiplier can perform regular signed number coding processing on the received multiplier to be processed through the regular signed number coding unit to obtain the target code. The bit width of the target code can be equal to the bit width N of the multiplier to be processed plus 1.

Optionally, the step of performing regular signed number encoding processing on the data to be processed in S1021a to obtain the target code may include: converting l consecutive bit values of 1 in the data to be processed into (l+1) bit values in which the highest bit is 1, the lowest bit is -1 and the remaining bits are 0, so as to obtain the target code, where l is greater than or equal to 2.

It should be noted that the regular signed number encoding method described above may be characterized as follows: for an N-bit multiplier, processing proceeds from the low-order bit values to the high-order bit values; if there are l (l >= 2) consecutive bit values of 1, the l consecutive bit values of 1 may be converted into the data "1(0)_(l-1)(-1)", and the remaining (N-l) bit values are combined with the converted (l+1) bit values to obtain new data; the new data is then used as the initial data of the next conversion, until no l (l >= 2) consecutive bit values of 1 remain. After regular signed number encoding of the N-bit multiplier, the bit width of the obtained target code may be equal to (N+1).

S1021b, performing conversion processing according to the data to be processed and the target code to obtain the original partial product.

It should be noted that the number of the original partial products may be equal to the bit width of the target code.

For example, if the partial product acquisition unit receives an 8-bit multiplicand "x₇x₆x₅x₄x₃x₂x₁x₀" (i.e., X), the partial product acquisition unit may directly obtain the corresponding original partial products from the multiplicand X and the three values -1, 0 and 1 contained in the target code: when a bit value in the target code is -1, the original partial product may be -X; when a bit value in the target code is 0, the original partial product may be 0; and when a bit value in the target code is 1, the original partial product may be X. Optionally, the conversion processing described above may be characterized as converting the values in the target code into the original partial products based on the multiplicand in the multiplication operation.

S1022, performing sign bit expansion processing on the original partial product to obtain the target encoded partial product.

Optionally, the step of performing sign bit expansion processing on the original partial product in S1022 to obtain the target encoded partial product may specifically include: performing bit filling processing on the original partial product to obtain the target encoded partial product.

Specifically, the bit width of a partial product after sign bit expansion may be equal to 2 times the bit width N of the data currently processed by the multiplier, while the bit width of the original partial product may be equal to N, so the number of sign expansion bits may be equal to N. Optionally, the sign bit expansion processing may be understood as filling every expansion bit with the sign bit value of the original partial product, where the sign bit value is the highest bit value of the original partial product, so as to obtain a sign-expanded partial product of 2N bit width. Optionally, in the arrangement of all the sign-expanded partial products, the highest bit values of all the sign-expanded partial products may be located in the same column, the lowest bit values may also be located in the same column, and the other corresponding bit values may likewise be located in the same columns.
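The chain from target code to original partial products, sign bit expansion and accumulation can be checked with a short software model. This sketch rests on several assumptions that are not stated in the text: the multiplier y is treated as an unsigned N-bit value, the multiplicand x as a signed integer, each partial product is shifted left by the position of its digit before being placed in the 2N-bit row, and all function names are hypothetical.

```python
def csd_digits(y, n):
    """Signed digits (lowest position first, values -1/0/1) of an unsigned n-bit multiplier."""
    d = [(y >> k) & 1 for k in range(n)] + [0]
    k = 0
    while k < n:
        if d[k] == 1 and d[k + 1] == 1:
            j = k
            while d[j] == 1:
                j += 1
            d[k], d[j] = -1, 1
            d[k + 1:j] = [0] * (j - k - 1)
            k = j
        else:
            k += 1
    return d


def partial_products(x, y, n):
    """One 2N-bit row per digit of the target code: -X, 0 or X, sign-expanded."""
    mask = (1 << (2 * n)) - 1
    rows = []
    for position, digit in enumerate(csd_digits(y, n)):
        original = digit * x                         # -X, 0 or X
        # Masking a Python int yields its two's-complement pattern, which
        # fills the upper bits with the sign bit (sign bit expansion).
        rows.append((original << position) & mask)
    return rows


def multiply(x, y, n):
    """Accumulate all rows modulo 2^(2N) to obtain the 2N-bit multiplication result."""
    return sum(partial_products(x, y, n)) & ((1 << (2 * n)) - 1)


rows = partial_products(7, 0b01110111, 8)
```

In this example there are N+1 = 9 rows, of which only three are non-zero, and accumulating them reproduces the 16-bit product of 7 and 119.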

According to the data processing method provided by the embodiment, regular signed number coding processing can be carried out on the data to be processed to obtain an original partial product, sign bit expansion processing is carried out on the original partial product to obtain a target coded partial product, accumulation processing is carried out on the target coded partial product to obtain a multiplication result, and then high-order data and low-order data in the multiplication result are respectively read and used as the target operation result, so that the bit width of the obtained target operation result can be smaller than 2 times of the bit width of data input by a multiplier, and the requirement of the multiplier on the bit width of an input port and an output port is effectively reduced; meanwhile, the number of effective partial products which can be obtained by the method is small, so that the complexity of multiplication operation is reduced; meanwhile, the method can improve the operation efficiency of multiplication operation.

In another embodiment, the step of storing the multiplication results in different register sub-circuits according to the storage instruction signal in S105 may specifically include:

S1051, storing a first multiplication result corresponding to the first storage instruction signal into a first register sub-circuit.

Specifically, the number of storage indication signals may be equal to the number of times the multiplier performs multiplication, the multiplier performs multiplication once, a multiplication result may be obtained, and the state control circuit may obtain a corresponding storage indication signal. If the multiplier performs the first multiplication operation to obtain a first multiplication operation result, the state control circuit automatically acquires a first storage indication signal, and the register control circuit determines a first register sub-circuit for storing the first multiplication operation result according to the first storage indication signal input by the state control circuit and inputs the first multiplication operation result to the first register sub-circuit for storage.

S1052, storing the second multiplication result corresponding to the second storage instruction signal in the second register sub-circuit.

If the multiplier performs the second multiplication operation to obtain the second multiplication result, the state control circuit automatically obtains the second storage instruction signal, and the register control circuit determines, according to the second storage instruction signal input by the state control circuit, the second register sub-circuit for storing the second multiplication result, and inputs the second multiplication result to the second register sub-circuit for storage. By analogy, the multiplier can store the multiplication result obtained by each multiplication operation into a different register sub-circuit, and the multiplication results are stored in the order of the serial numbers of the register sub-circuits; that is, the results of two consecutive multiplication operations can be stored in two adjacent register sub-circuits.

According to the data processing method provided by this embodiment, the first multiplication result corresponding to the first storage indication signal is stored in the first register sub-circuit, and the second multiplication result corresponding to the second storage indication signal is stored in the second register sub-circuit, so that one multiplication result is not overwritten by another; in addition, the method can make the bit width of the obtained target operation result smaller than 2 times the bit width of the data input to the multiplier, which effectively reduces the requirement of the multiplier on the bit width of the input/output port; meanwhile, the method obtains fewer effective partial products and reduces the complexity of the multiplication operation.

As one embodiment, the step of reading the partial data stored in the different register sub-circuits and corresponding to the multiplication result according to the read instruction signal in S106 to obtain the target operation result may be specifically implemented by:

S1061, according to a first reading instruction signal, reading a first part of data in the first multiplication result stored in the first register sub-circuit to obtain a first operation result.

S1062, according to a second reading instruction signal, reading a second part of data in the first multiplication result stored in the first register sub-circuit to obtain a second operation result.

Specifically, the number of read instruction signals acquired by the state control circuit in the multiplier may be equal to the number of times the multiplier reads an operation result, which may be equal to 2 times the number of multiplication results. Optionally, the multiplication result may include two parts of data, i.e., a first part of data and a second part of data. For example, if the bit width of the multiplication result is equal to 2N, the multiplication result may be divided into two parts, i.e., high-order N-bit data and low-order N-bit data; if the first part of data is the high-order N-bit data, the second part of data is the low-order N-bit data, and vice versa.

S1063, according to the third reading instruction signal, reading the first part of data in the second multiplication result stored in the second register sub-circuit to obtain a third operation result.

Optionally, each of the read indication signals may correspond to the first part of data or the second part of data in a multiplication result.

S1064, according to the fourth reading instruction signal, reading the second part of data in the second multiplication result stored in the second register sub-circuit to obtain a fourth operation result.

Specifically, the multiplier may multiply the multiple sets of data to be processed to obtain multiple multiplication results, so after the multiplier reads the fourth operation result, the multiplier may read part of the data in the next multiplication result according to the next read instruction signal.

For example, if the input port bit width of the multiplier is 32 bits and the output port bit width is (64/t+delta) bits (in general, the multiplier completes one multiplication operation and obtains one multiplication result every t clock cycles, t > 1, delta >= 0), the bit width of the data received by the multiplier is also 32 bits, and the multiplier needs to multiply multiple sets of data to be processed, then the register circuit 13 includes 64/(64/t+delta) register sub-circuits 131 (i.e., register sub-circuits A₁, A₂, ..., A_i, where i may be equal to 64/(64/t+delta)), and the target operation results may be obtained as follows:

if the multiplier passes throught (t may be greater than or equal to 0) clock cycles to obtain a first multiplication result M_0, the register control circuit may store M_0 (64-bit wide) to the register sub-circuit A according to the first storage indication signal ₁ At this time, the selection circuit may select the first read instruction signal from the register sub-circuit A ₁ The high-order 32-bit data of M_0 is read as a first operation result obtained by the first multiplication operation;

meanwhile, when the multiplier reaches the t+1th clock period, the selection circuit can select the register sub-circuit A according to the second read instruction signal ₁ The low-32-bit data of M_0 is read as a second operation result obtained by the first multiplication operation, and in the embodiment, the multiplier splices the first operation result with the second operation result to obtain a target operation result of the data to be processed;

If the multiplier can obtain the second multiplication result M_1 from the 2t clock period, the register control circuit can store M_1 into the register sub-circuit A according to the second storage indication signal ₂ In this case, the selection circuit may be configured to select the second read instruction signal from the register sub-circuit A ₂ The high-order 32-bit data of M_1 is read as a third operation result obtained by the second multiplication operation;

meanwhile, when the multiplier reaches the (2t+1)th clock cycle, the selection circuit may, according to the fourth read indication signal, read the low-order 32 bits of M_1 from register sub-circuit A_2 as the fourth operation result of the second multiplication operation. In this embodiment, the data comparator combines the third operation result and the fourth operation result to obtain the target operation result of the data to be processed;

and by analogy, the obtained multiplication result can be stored in corresponding different register sub-circuits according to different storage instruction signals, and partial data in the stored multiplication result is read in the different register sub-circuits according to different reading instruction signals to obtain a target operation result.
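The store-then-read-in-two-halves sequence described above can be illustrated with a minimal software sketch. It is not the circuit itself: the class `RegisterCircuit`, its method names and the use of a plain list index in place of the storage and read indication signals are all assumptions made for illustration, and the operands are treated as unsigned 32-bit values.

```python
MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1


class RegisterCircuit:
    """Hypothetical software model of register sub-circuits A_1..A_i."""

    def __init__(self, count):
        self.sub_circuits = [0] * count

    def store(self, index, product):
        # The storage indication signal is modeled as a plain list index.
        self.sub_circuits[index] = product & MASK64

    def read_high(self, index):
        # First read: high-order 32 bits of the stored 64-bit product.
        return (self.sub_circuits[index] >> 32) & MASK32

    def read_low(self, index):
        # Second read, one clock cycle later: low-order 32 bits.
        return self.sub_circuits[index] & MASK32


def splice(high, low):
    """Concatenate the two 32-bit reads back into the 64-bit result."""
    return (high << 32) | low


regs = RegisterCircuit(count=2)
m0 = 0xFFFFFFFF * 0xFFFFFFFF        # first multiplication result M_0
regs.store(0, m0)                   # index 0 stands for sub-circuit A_1
first = regs.read_high(0)           # first operation result
second = regs.read_low(0)           # second operation result
target = splice(first, second)      # target operation result
```

Because each read returns only 32 bits, the output port in this sketch never needs to carry the full 64-bit product in one cycle, which is the port-width saving the embodiment describes.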

In addition, if one of the multiple groups of data to be processed contains a zero value, the multiplier can obtain the multiplication result for that group in m (m < t) clock cycles and store it into the corresponding register sub-circuit according to a storage indication signal. In the current clock cycle, the multiplier can read part of the data of the multiplication results stored in the different register sub-circuits according to a read indication signal, and it can output the remaining data of the multiplication result in the next clock cycle. If the next group of data to be processed also contains a zero value, so that one multiplication operation completes in 1 clock cycle, the multiplier can store the resulting multiplication result into the adjacent next register sub-circuit.

According to the data processing method provided by the embodiment, the multiplier reads part of data in corresponding multiplication results stored in different register sub-circuits according to the reading instruction signals to obtain target operation results, and the method can respectively read high-order data and low-order data in the multiplication results as the target operation results, so that the bit width of the obtained target operation results can be smaller than 2 times of the bit width of data input by the multiplier, and the requirement of the multiplier on the bit width of an input/output port is effectively reduced; meanwhile, the number of effective partial products which can be obtained by the method is small, and the complexity of multiplication operation is reduced.

Fig. 6 is a flow chart of a data processing method provided in an embodiment, which can be processed by the multiplier shown in fig. 2, and the embodiment relates to a process of multiplying data. As shown in fig. 6, the method includes:

S201, receiving a data conversion signal and data to be processed.

Specifically, the multiplication circuit in the multiplier may receive two pieces of data to be processed and a data conversion signal. Optionally, the bit width of the data to be processed may be equal to the bit width of the multiplier input port. Optionally, when the revolution number circuit receives different data conversion signals, it may convert the received data into data of the format corresponding to each data conversion signal.

S202, carrying out regular signed number coding processing on the data to be processed to obtain a partial product of target coding.

Specifically, the principle of the regular signed number encoding process described above can be characterized as follows: an N-bit multiplier is processed from the low-order bits to the high-order bits; if there are l (l>=2) consecutive bits of 1, these l consecutive bits of 1 can be converted into the data "1 (0)_(l-1) (-1)", that is, a 1, followed by l-1 zeros, followed by -1. The remaining N-l bit values are combined with the converted l+1 bit values to obtain new data, and the new data is used as the initial data of the next conversion pass, until the new data obtained after conversion contains no l (l>=2) consecutive bits of 1. The bit width of the target code obtained by performing regular signed number encoding on the N-bit multiplier may be equal to N+1. It should be noted that the number of partial products of the target code may be equal to the data bit width N received by the multiplier plus 1.
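The run-replacement rule above can be sketched in software as follows. This is only an illustration under stated assumptions: the function names `csd_encode` and `csd_decode` are hypothetical, the operand is treated as an unsigned N-bit value (the handling of a two's-complement sign bit is not modeled), and digits are held least-significant first in a list of length N+1.

```python
def csd_encode(value, n_bits):
    """Encode an unsigned n_bits-wide integer as n_bits+1 digits in {-1, 0, 1}.

    Digits are returned least-significant first. Each run of l >= 2
    consecutive 1s is replaced by "1 (0)_(l-1) (-1)", scanning low to high.
    """
    if not 0 <= value < (1 << n_bits):
        raise ValueError("value does not fit in n_bits")
    digits = [(value >> i) & 1 for i in range(n_bits)] + [0]
    i = 0
    while i < n_bits:
        if digits[i] == 1 and digits[i + 1] == 1:
            # Find the end of the run of 1s that starts at position i.
            j = i
            while j < len(digits) and digits[j] == 1:
                j += 1
            digits[i] = -1                 # lowest bit of the run becomes -1
            for k in range(i + 1, j):
                digits[k] = 0              # the l-1 bits above it become 0
            digits[j] = 1                  # carry into the bit above the run
            i = j                          # the new 1 may start another run
        else:
            i += 1
    return digits


def csd_decode(digits):
    """Recover the integer value from least-significant-first signed digits."""
    return sum(d << i for i, d in enumerate(digits))


code = csd_encode(0b0110111, 7)   # 55 has five 1 bits in plain binary
```

For the 7-bit input 0110111 the sketch yields digits that, read from high to low, are 0 1 0 0 -1 0 0 -1: only three nonzero digits instead of five, which is why fewer effective partial products need to be accumulated.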

S203, accumulating the partial products of the target codes to obtain multiplication results.

Specifically, the accumulation sub-circuit may perform an accumulation operation on the values in each column of the partial products of all the target codes to obtain the multiplication result. Optionally, the bit width of the multiplication result may be 2 times the bit width of the data to be processed received by the multiplier, and may likewise be 2 times the bit width of the multiplier input port.
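As a rough illustration of this accumulation step, the sketch below forms one shifted partial product per nonzero signed digit and sums them. The function names are hypothetical, a plain integer sum stands in for the column-wise accumulation hardware, and the digit list is assumed to be a signed-digit code held least-significant first.

```python
def partial_products(digits, multiplicand):
    """Return one shifted partial product per nonzero signed digit.

    digits: signed digits in {-1, 0, 1}, least-significant first.
    A digit of -1 contributes the negated, shifted multiplicand.
    """
    return [d * (multiplicand << i) for i, d in enumerate(digits) if d != 0]


def accumulate(products):
    """Stand-in for the accumulation sub-circuit: add all partial products."""
    return sum(products)


digits_55 = [-1, 0, 0, -1, 0, 0, 1, 0]      # signed-digit code of 55
products = partial_products(digits_55, 13)  # three effective partial products
result = accumulate(products)               # equals 55 * 13
```

Zero digits contribute nothing, so only the three nonzero digits of this code generate terms that have to be added.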

S204, carrying out revolution processing on the multiplication operation result according to the data conversion signal to obtain a target operation result, wherein the data conversion signal is used for indicating a multiplier to convert the target operation result into a required data type.

Specifically, the revolution number circuit can, according to the received data conversion signal, convert the multiplication result into a fixed-point operation result or a floating-point operation result. For example, suppose the revolution number circuit can receive two data conversion signals, denoted 00 and 01, and the input port and output port of the multiplier are both N bits wide; then 00 indicates that the revolution number circuit converts the received 2N-bit multiplication result into an N-bit fixed-point operation result, and 01 indicates that it converts the received 2N-bit multiplication result into an N-bit floating-point operation result. The function that the revolution number circuit performs for each data conversion signal can be set flexibly. Optionally, each data conversion signal may represent the data type into which the multiplier needs to convert the multiplication result.
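The selection between the two conversions can be pictured as a simple dispatch on the data conversion signal. The sketch below is illustrative only: the description does not specify how the 2N-bit result is narrowed, so saturating to a signed N-bit integer for signal 00 and rounding to an IEEE 754 single-precision bit pattern for signal 01 (with N = 32) are assumptions, as are all function names.

```python
import struct

N = 32  # assumed port width in bits


def to_fixed(product, n=N):
    """Assumed behavior: saturate a signed 2N-bit product to signed N bits."""
    lowest, highest = -(1 << (n - 1)), (1 << (n - 1)) - 1
    return max(lowest, min(highest, product))


def to_float_bits(product):
    """Assumed behavior: round to IEEE 754 binary32, return its 32-bit pattern."""
    return struct.unpack(">I", struct.pack(">f", float(product)))[0]


# Mapping from data conversion signal to conversion; the description notes
# that this assignment can be set flexibly.
CONVERTERS = {0b00: to_fixed, 0b01: to_float_bits}


def convert(product, signal):
    """Apply the conversion selected by the data conversion signal."""
    return CONVERTERS[signal](product)
```

Under these assumptions, `convert(6, 0b00)` keeps the integer 6, while `convert(6, 0b01)` returns the bit pattern 0x40C00000, which is 6.0 in single precision.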

According to the data processing method provided by the embodiment, the data conversion signal and the data to be processed are received, multiplication operation is carried out on the data to be processed to obtain a multiplication operation result, and the multiplication operation result is subjected to revolution processing according to the data conversion signal to obtain a target operation result; meanwhile, the number of effective partial products which can be obtained by the method is small, and the complexity of multiplication operation is reduced.

The embodiment of the application also provides a machine learning operation device, which comprises one or more multipliers. The multipliers are used for acquiring data to be operated on and control information from other processing devices, executing the specified machine learning operation, and transmitting the execution result to peripheral devices through an I/O interface. Peripheral devices include, for example, cameras, displays, mice, keyboards, network cards, Wi-Fi interfaces and servers. When more than one multiplier is included, the multipliers may be linked and may transfer data through a specific structure, for example a PCIE (Peripheral Component Interconnect Express) bus, to support larger-scale machine learning operations. In this case, the multipliers may share the same control system or have independent control systems; they may share memory, or each accelerator may have its own memory. In addition, the interconnection may use any interconnection topology.

The machine learning operation device has high compatibility and can be connected with various types of servers through a PCIE interface.

The embodiment of the application also provides a combined processing device which comprises the machine learning operation device, a general interconnection interface and other processing devices. The machine learning operation device interacts with other processing devices to jointly complete the operation designated by the user. Fig. 7 is a schematic diagram of a combination processing apparatus.

The other processing devices include one or more types of general-purpose or special-purpose processors, such as central processing units (CPU), graphics processing units (GPU) and neural network processors. The number of processors included in the other processing devices is not limited. The other processing devices serve as the interface between the machine learning operation device and external data and control, including data transfer, and perform basic control of the machine learning operation device such as starting and stopping; the other processing devices may also cooperate with the machine learning operation device to complete the operation task.

The universal interconnection interface is used for transmitting data and control instructions between the machine learning operation device and the other processing devices. The machine learning operation device acquires the required input data from the other processing devices and writes it into the on-chip storage device of the machine learning operation device; it may acquire control instructions from the other processing devices and write them into the on-chip control cache of the machine learning operation device; it may also read the data in the memory module of the machine learning operation device and transmit it to the other processing devices.

Optionally, as shown in fig. 8, the structure may further include a storage device connected to the machine learning operation device and the other processing devices, respectively. The storage device is used for storing data of the machine learning operation device and the other processing devices, and is particularly suitable for data to be operated on that cannot be held entirely in the internal storage of the machine learning operation device or the other processing devices.

The combined processing device can be used as an SOC (system on chip) of equipment such as a mobile phone, a robot, an unmanned aerial vehicle or video monitoring equipment, which effectively reduces the core area of the control part, improves the processing speed and reduces the overall power consumption. In this case, the universal interconnection interface of the combined processing device is connected to certain components of the equipment, such as cameras, displays, mice, keyboards, network cards and Wi-Fi interfaces.

In some embodiments, a chip is also disclosed, which includes the machine learning computing device or the combination processing device.

In some embodiments, a chip package structure is disclosed, which includes the chip.

In some embodiments, a board card is provided that includes the chip package structure described above. As shown in fig. 9, the board card may include, in addition to the chip 389, other supporting components, including but not limited to: a storage device 390, a receiving device 391 and a control device 392;

the storage device 390 is connected to the chip in the chip package structure through a bus and is used for storing data. The storage device may include multiple groups of storage units 393. Each group of storage units is connected with the chip through a bus. It can be understood that each group of storage units may be DDR SDRAM (Double Data Rate SDRAM, double data rate synchronous dynamic random access memory).

DDR can double the speed of SDRAM without increasing the clock frequency. DDR allows data to be read out on both the rising and falling edges of the clock pulse, so DDR is twice as fast as standard SDRAM. In one embodiment, the storage device may include 4 groups of the storage units. Each group of storage units may include a plurality of DDR4 chips. In one embodiment, the chip may include four 72-bit DDR4 controllers, in each of which 64 bits are used for data transfer and 8 bits are used for ECC checking. It can be understood that the theoretical data transfer bandwidth can reach 25600 MB/s when DDR4-3200 chips are used in each group of storage units.
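The 25600 MB/s figure follows directly from the transfer rate and the data width: DDR4-3200 performs 3200 million transfers per second, and each transfer carries the 64 data bits (8 bytes) of the controller, the 8 ECC bits not counting as payload. A minimal sketch of this arithmetic, with a hypothetical helper name:

```python
def ddr_bandwidth_mb_per_s(mega_transfers_per_s, data_bits):
    """Theoretical peak bandwidth in MB/s: transfers per second times bytes per transfer."""
    return mega_transfers_per_s * data_bits // 8


# DDR4-3200 with a 64-bit data path (the 8 ECC bits carry no payload).
peak = ddr_bandwidth_mb_per_s(3200, 64)
```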

In one embodiment, each group of storage units includes a plurality of double data rate synchronous dynamic random access memories arranged in parallel. DDR can transfer data twice in one clock cycle. A controller for controlling the DDR is arranged in the chip and is used for controlling the data transmission and data storage of each storage unit.

The receiving device is electrically connected with the chip in the chip package structure. The receiving device is used for data transmission between the chip and an external device (such as a server or a computer). For example, in one embodiment, the receiving device may be a standard PCIE interface, and the data to be processed is transferred from the server to the chip through the standard PCIE interface. Preferably, when a PCIE 3.0 x16 interface is used for transmission, the theoretical bandwidth can reach 16000 MB/s. In another embodiment, the receiving device may be another interface; the application does not limit the specific form of such other interfaces, as long as the interface unit can implement the transfer function. In addition, the calculation result of the chip is likewise transmitted back to the external device (e.g., a server) by the receiving device.

The control device is electrically connected with the chip. The control device is used for monitoring the state of the chip. Specifically, the chip and the control device may be electrically connected through an SPI interface. The control device may comprise a microcontroller unit (MCU). The chip may include a plurality of processing chips, a plurality of processing cores, or a plurality of processing circuits, and may drive a plurality of loads. Therefore, the chip can be in different working states such as multi-load and light-load. The control device can regulate the working states of the plurality of processing chips, processing cores and/or processing circuits in the chip.

In some embodiments, an electronic device is provided that includes the above board card.

The electronic device may be a multiplier, robot, computer, printer, scanner, tablet, smart terminal, cell phone, automobile data recorder, navigator, sensor, camera, server, cloud server, video camera, projector, watch, headset, mobile storage, wearable device, vehicle, household appliance, and/or medical device.

The vehicle includes an aircraft, a ship and/or a road vehicle; the household appliances include televisions, air conditioners, microwave ovens, refrigerators, electric cookers, humidifiers, washing machines, electric lamps, gas cookers and range hoods; the medical device includes a nuclear magnetic resonance apparatus, a B-mode ultrasonic apparatus, and/or an electrocardiograph apparatus.

It should be noted that, for simplicity of description, the foregoing method embodiments are all described as a series of circuit combinations, but those skilled in the art should appreciate that the present application is not limited by the circuit combinations described, as some circuits may be implemented in other manners or structures according to the present application. Further, those skilled in the art will also appreciate that the embodiments described in the specification are all alternative embodiments, and that the devices and modules referred to are not necessarily required in the present application.

In the foregoing embodiments, the descriptions of the embodiments are emphasized, and for parts of one embodiment that are not described in detail, reference may be made to the related descriptions of other embodiments.

The above examples represent only a few embodiments of the present application; they are described in relative detail, but are not to be construed as limiting the scope of the present application. It should be noted that those skilled in the art could make various modifications and improvements without departing from the spirit of the present application, and these would fall within the scope of the present application. Accordingly, the scope of protection of the present application is to be determined by the appended claims.

Claims

1. A multiplier, comprising: a multiplication circuit, a register control circuit, a register circuit, a state control circuit and a selection circuit, wherein the multiplication circuit comprises a regular signed number coding sub-circuit and an accumulation sub-circuit, the output end of the regular signed number coding sub-circuit is connected with the input end of the accumulation sub-circuit, the output end of the accumulation sub-circuit is connected with the first input end of the register control circuit, the output end of the register control circuit is connected with the input end of the register circuit, the output end of the register circuit is connected with the first input end of the selection circuit, the first output end of the state control circuit is connected with the second input end of the register control circuit, and the second output end of the state control circuit is connected with the second input end of the selection circuit; the regular signed number coding sub-circuit comprises a regular signed number coding unit and a partial product acquisition unit;

the regular signed number coding unit comprises a data input port and a target coding output port; the data input port is used for receiving first data subjected to regular signed number coding processing, and the target coding output port is used for outputting target codes obtained after the first data is subjected to regular signed number coding processing.

2. The multiplier according to claim 1, wherein said partial product obtaining unit is configured to receive second data, obtain an original partial product from said target code and said second data, and obtain a partial product of said target code from said original partial product; the accumulation sub-circuit is used for carrying out accumulation processing on the partial product of the target code to obtain a multiplication result; the state control circuit is used for acquiring a storage indication signal and a reading indication signal; the register control circuit is used for determining the register circuit for storing the multiplication result according to the storage indication signal input by the state control circuit; the register circuit is used for storing the multiplication result; the selection circuit is used for reading data in the multiplication operation result stored in the register circuit according to the received reading indication signal to serve as a target operation result; the second data and the first data are fixed point numbers, and the data bit widths of the second data and the first data are equal.

3. The multiplier according to claim 2, wherein said partial product obtaining unit is specifically configured to convert said target code to obtain an original partial product, and to perform a sign bit extension process on said original partial product to obtain a sign bit extended partial product, and to obtain said target code partial product according to said sign bit extended partial product.

4. A multiplier as claimed in claim 3, characterized in that the partial product acquisition unit comprises: a target encoding input port, a second data input port, and a partial product output port; the target code input port is used for receiving the target code, the second data input port is used for receiving the second data, and the partial product output port is used for outputting a partial product of the target code.

5. A multiplier as claimed in any one of claims 2 to 4, in which the accumulation sub-circuit comprises: the Wallace tree group unit and the accumulation unit; the output end of the Wallace tree group unit is connected with the input end of the accumulation unit; the Wallace tree group unit is used for carrying out accumulation processing on the partial product of the target code to obtain an accumulation operation result, and the accumulation unit is used for carrying out accumulation processing on the accumulation operation result.

6. The multiplier of claim 5, wherein said Wallace tree group unit comprises: a Wallace tree subunit, wherein the Wallace tree subunit is used for accumulating the values in each column of the partial products of all target codes.

7. The multiplier of claim 5, wherein the accumulation unit comprises: an adder, wherein the adder is used for adding the received accumulated correction result.

8. The multiplier of claim 7, wherein the adder comprises: a carry signal input port, a sum bit signal input port and a result output port; the carry signal input port is used for receiving a carry signal, the sum bit signal input port is used for receiving a sum bit signal, and the result output port is used for outputting a result of accumulation processing of the carry signal and the sum bit signal.

9. A multiplier according to any one of claims 2 to 4, in which the register circuit comprises: a register sub-circuit, wherein the register sub-circuit is used for storing multiplication operation results corresponding to different storage indication signals.

10. A multiplier, comprising: a multiplication circuit and a revolution number circuit, wherein the multiplication circuit comprises a regular signed number coding sub-circuit and an accumulation sub-circuit, the output end of the regular signed number coding sub-circuit is connected with the input end of the accumulation sub-circuit, the output end of the accumulation sub-circuit is connected with the input end of the revolution number circuit, and the revolution number circuit comprises a first conversion sub-circuit and a second conversion sub-circuit;

The regular signed number coding sub-circuit is used for carrying out regular signed number coding processing on received first data to obtain target codes, determining an original partial product according to the target codes and the second data, and obtaining a partial product of the target codes according to the original partial product; the accumulation sub-circuit is used for carrying out accumulation processing on the partial product of the target code to obtain a multiplication result; the first conversion sub-circuit and the second conversion sub-circuit are respectively used for carrying out revolution processing on the multiplication operation result to obtain a target operation result; wherein the data bit width of the target operation result is smaller than 2 times of the data bit width of the multiplication operation result.

11. The multiplier of claim 10, wherein the revolution number circuit includes an input port therein for receiving a data conversion signal; the data conversion signal is used for determining the data conversion type processed by the revolution number circuit.

12. The multiplier according to claim 10 or 11, characterized in that said first conversion sub-circuit is specifically configured to convert said multiplication result into said target operation result of floating point type, and said second conversion sub-circuit is specifically configured to convert said multiplication result into said target operation result of fixed point type.

13. A data processing method, applied to a multiplier according to any one of claims 1 to 9, the method comprising:

receiving data to be processed;

acquiring a storage indication signal and a reading indication signal;

14. The method of claim 13, wherein the performing regular signed number encoding on the data to be processed to obtain a partial product of the target code comprises:

15. The method of claim 14, wherein said performing a canonical signed number encoding process on the data to be processed results in an original partial product, comprising:

16. The method according to claim 14 or 15, wherein said performing a sign bit extension process on said original partial product to obtain a partial product of said target code comprises: and performing bit filling processing on the original partial product to obtain the target encoded partial product.

17. The method according to any one of claims 13 to 15, wherein storing a plurality of multiplication results into different register sub-circuits according to the storage indication signal comprises:

18. The method according to any one of claims 13 to 15, wherein the reading, according to the read instruction signal, a part of data stored in different register sub-circuits corresponding to the multiplication result to obtain a target operation result includes:

According to the first reading instruction signal, reading a first part of data in a first multiplication result stored in a first register sub-circuit to obtain a first operation result;

according to the third reading instruction signal, reading the first part of data in the second multiplication result stored in the second register sub-circuit to obtain a third operation result;

19. A data processing method, applied to a multiplier according to any one of claims 10 to 12, the method comprising:

receiving a data conversion signal and data to be processed;

20. A machine learning computing device, comprising one or more multipliers according to any one of claims 1-12, configured to obtain input data and control information to be computed from other processing devices, perform a specified machine learning operation, and transmit the execution result to other processing devices through an I/O interface;

when the machine learning operation device comprises a plurality of multipliers, the multipliers are connected through a preset specific structure and transmit data;

21. A combination processing device, comprising the machine learning computing device of claim 20, a universal interconnect interface, and other processing devices;

the machine learning operation device interacts with the other processing devices to jointly complete the calculation operation designated by the user.

22. The combination processing device of claim 21, further comprising: and a storage device connected to the machine learning operation device and the other processing device, respectively, for storing data of the machine learning operation device and the other processing device.

23. A neural network chip, characterized in that the neural network chip comprises the machine learning arithmetic device of claim 20 or the combination processing device of claim 21 or the combination processing device of claim 22.

24. An electronic device comprising the neural network chip of claim 23.

25. A board card, characterized in that the board card includes: a storage device, a receiving device, a control device and the neural network chip of claim 23;

the neural network chip is respectively connected with the storage device, the control device and the receiving device;

the storage device is used for storing data;

the receiving device is used for realizing data transmission between the neural network chip and external equipment;

the control device is used for monitoring the state of the neural network chip.

26. The board card of claim 25, wherein the board card is configured to,

the storage device includes: a plurality of groups of storage units, wherein each group of storage units is connected with the neural network chip through a bus, and the storage units are DDR SDRAM;

the neural network chip includes: a DDR controller, used for controlling data transmission and data storage of each storage unit;

the receiving device is a standard PCIE interface.