CN210006082U - Multiplier, device, neural network chip and electronic equipment

Info

Publication number: CN210006082U
Application number: CN201921433488.1U
Authority: CN
Inventors: Not disclosed
Original assignee: Shanghai Cambricon Information Technology Co Ltd
Current assignee: Shanghai Cambricon Information Technology Co Ltd
Priority date: 2019-08-30
Filing date: 2019-08-30
Publication date: 2020-01-31
Anticipated expiration: 2029-08-30

Abstract

The application provides a multiplier, a device, a neural network chip and electronic equipment. The multiplier comprises a regular signed number coding circuit, a partial product acquisition circuit and a correction accumulation circuit, wherein the output end of the regular signed number coding circuit is connected with the input end of the partial product acquisition circuit, and the output end of the partial product acquisition circuit is connected with the input end of the correction accumulation circuit. The multiplier can carry out regular signed number coding on received data through the regular signed number coding circuit, so that the number of effective partial products obtained is small, which reduces the complexity of the multiplication operation performed by the multiplier.

Description

Multiplier, device, neural network chip and electronic equipment

Technical Field

The present application relates to the field of computer technologies, and in particular to a multiplier, a device, a neural network chip, and an electronic device.

Background

With the continuous development of digital electronic technology, the rapid development of various Artificial Intelligence (AI) chips places ever higher demands on high-performance digital multipliers. Neural network algorithms are one class of algorithm widely used in intelligent chips, and multiplication performed by a multiplier is a common operation in neural network algorithms.

At present, a multiplier circuit takes every three bits of the multiplier operand as one code, obtains partial products from the multiplicand according to these codes, and compresses all the partial products with a Wallace tree to obtain the target operation result of the multiplication.

Summary of the Utility Model

In view of the above, it is desirable to provide a multiplier, a chip and an electronic device capable of reducing the number of partial products obtained during multiplication, thereby reducing the complexity of the multiplication performed by the multiplier.

The embodiment of the application provides a multiplier, which comprises a regular signed number coding circuit, a partial product acquisition circuit and a correction accumulation circuit, wherein the output end of the regular signed number coding circuit is connected with the input end of the partial product acquisition circuit, the output end of the partial product acquisition circuit is connected with the input end of the correction accumulation circuit, the partial product acquisition circuit comprises an original partial product acquisition unit and a logic unit, and the correction accumulation circuit comprises a correction Wallace tree group sub-circuit and an accumulation sub-circuit;

the regular signed number coding circuit is used for carrying out regular signed number coding on received data to obtain target codes, the original partial product obtaining unit is used for obtaining original partial products according to the target codes, the logic unit is used for carrying out logic operation processing on highest-order numerical values of the original partial products to obtain partial products with sign bit expansion eliminated, the corrected Wallace tree group sub-circuit is used for carrying out accumulation correction processing on the partial products with sign bit expansion eliminated, and the accumulation sub-circuit is used for carrying out accumulation processing on accumulated correction operation results.

In one embodiment, the regular signed number coding circuit includes a data input port for receiving the first data to be subjected to regular signed number coding, and a target coding output port for outputting the target code obtained by performing regular signed number coding on the first data.

In one embodiment, the partial product acquisition circuit includes an AND circuit.

In one embodiment, the correction Wallace tree group sub-circuit includes Wallace tree units for performing accumulation correction processing on each column of values of the partial products after sign bit extension elimination.

In one embodiment, the accumulation sub-circuit comprises an adder for adding the results of the accumulation correction operation.

In one embodiment, the adder includes a carry signal input port for receiving a carry signal, a sum signal input port for receiving a sum signal, and a result output port for outputting the target operation result obtained by accumulating the carry signal and the sum signal.
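As an illustration of accumulating partial products into a carry signal and a sum signal that a final adder then combines, the following Python sketch works at word level on non-negative integers. It is a simplified model under stated assumptions, not the circuit of this application: the names compress_3_to_2 and accumulate are hypothetical, the column-by-column Wallace tree units are replaced by bitwise operations on whole words, and the correction step is not modelled.

```python
def compress_3_to_2(x, y, z):
    """Bitwise 3:2 compression of three non-negative integers.

    Returns (sum_word, carry_word) with x + y + z == sum_word + carry_word.
    """
    sum_word = x ^ y ^ z                               # per-bit sum without carries
    carry_word = ((x & y) | (x & z) | (y & z)) << 1    # per-bit carries, moved up one position
    return sum_word, carry_word


def accumulate(rows):
    """Compress rows three at a time until two remain, then add them once."""
    rows = list(rows)
    while len(rows) > 2:
        s, c = compress_3_to_2(rows[0], rows[1], rows[2])
        rows = rows[3:] + [s, c]
    return sum(rows)                                   # the single final addition


# Hypothetical partial products, chosen only for illustration.
total = accumulate([104, 8, 3, 21])
```

Only the last step performs a carry-propagating addition; every earlier step produces a sum word and a carry word without propagating carries between bit positions.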

The multiplier provided by this embodiment can perform regular signed number encoding on received data through the regular signed number encoding circuit to obtain a target code, then obtain original partial products through the partial product obtaining circuit according to each digit value in the target code, perform logic operation processing on the original partial products through the logic unit to obtain the corresponding partial products after sign bit extension elimination, and finally perform accumulation correction processing on those partial products through the correction accumulation circuit.

The machine learning arithmetic device provided by the embodiment of the application comprises one or more of the multipliers described above. The machine learning arithmetic device is used for acquiring data to be operated on and control information from other processing devices, executing a specified machine learning operation, and transmitting the execution result to other processing devices through an I/O interface;

when the machine learning arithmetic device comprises a plurality of multipliers, the multipliers are connected through a specific structure and transmit data to one another;

the multipliers are interconnected through a PCIE bus and transmit data so as to support larger-scale machine learning operations; the multipliers share one control system or have their own control systems; the multipliers share a memory or have their own memories; and the multipliers may be interconnected in any interconnection topology.

The combined processing device provided by the embodiment of the application comprises the machine learning arithmetic device described above, a universal interconnection interface and other processing devices. The machine learning arithmetic device interacts with the other processing devices to jointly complete the operation designated by the user. The combined processing device can also comprise a storage device, which is connected to the machine learning arithmetic device and to the other processing devices respectively, and is used for storing data of the machine learning arithmetic device and the other processing devices.

The neural network chip provided by the embodiment of the present application includes the multiplier, the machine learning arithmetic device, or the combined processing device.

The neural network chip packaging structure provided by the embodiment of the application comprises the neural network chip.

The board card provided by the embodiment of the application comprises the neural network chip packaging structure.

The embodiment of the application provides an electronic device, which includes the neural network chip or the board card.

The chip provided by the embodiment of the present application includes at least one multiplier as described in any of the above embodiments.

The electronic device provided by the embodiment of the application comprises the chip.

Drawings

FIG. 1 is a schematic structural diagram of a multiplier provided in an embodiment;

FIG. 2 is a schematic structural diagram of another multiplier provided in another embodiment;

FIG. 3 is a schematic diagram of a specific structure of a multiplier provided in an embodiment;

FIG. 4 is a schematic structural diagram of another multiplier provided in another embodiment;

FIG. 5 is a schematic diagram illustrating the distribution rule of 9 partial products after sign bit extension elimination, provided in another embodiment;

FIG. 6 is a diagram illustrating a specific circuit structure of a correction accumulation circuit for 8-bit data operation, provided in another embodiment;

FIG. 7 is a flowchart of a data processing method provided in an embodiment;

FIG. 8 is a flowchart of another data processing method provided in an embodiment;

FIG. 9 is a block diagram of a combined processing device provided in an embodiment;

FIG. 10 is a block diagram of another combined processing device provided in an embodiment;

FIG. 11 is a schematic structural diagram of a board card provided in an embodiment.

Detailed Description

To make the objects, technical solutions and advantages of the present application clearer, the application is described in further detail below with reference to the drawings and the embodiments.

The multiplier provided by the application can be applied to an AI chip, a Field-Programmable Gate Array (FPGA) chip, or other hardware circuit devices that perform multiplication. Schematic diagrams of the specific structure of the multiplier are shown in fig. 1 and fig. 2.

Fig. 1 is a structural diagram of a multiplier provided by one embodiment. As shown in fig. 1, the multiplier includes a regular signed number encoding circuit 11 and a correction accumulation circuit 12, and an output end of the regular signed number encoding circuit 11 is connected to an input end of the correction accumulation circuit 12. The regular signed number encoding circuit 11 is configured to perform regular signed number encoding processing on received data to obtain partial products after sign bit extension elimination, and the correction accumulation circuit 12 is configured to perform accumulation correction processing on the partial products after sign bit extension elimination.

Specifically, the regular signed number encoding circuit 11 may include a plurality of data processing units having different functions, and the data received by the regular signed number encoding circuit 11 may serve as the multiplier or as the multiplicand in a multiplication operation. Optionally, these data processing units may include a unit with a regular signed number encoding function, where regular signed number encoding can be characterized as a data processing procedure that encodes data using the values 0, -1 and 1. Optionally, the multiplier and the multiplicand may be multi-bit fixed-point numbers. Optionally, the correction accumulation circuit 12 may apply correction while accumulating the partial products, obtained by the regular signed number encoding circuit 11, from which sign bit extension has been eliminated, so as to obtain the target operation result of the multiplication.

It should be noted that, the multiplier provided in this embodiment may process a multiplication operation of fixed-bit-width data, where the fixed bit width may be 8 bits, 16 bits, 32 bits, or may also be 64 bits, and this embodiment is not limited in any way.

The multiplier provided by this embodiment performs regular signed number encoding on received data through the regular signed number encoding circuit to obtain partial products after sign bit extension elimination, and the correction accumulation circuit performs accumulation correction on those partial products to obtain the target operation result.

Fig. 2 is a structural diagram of a multiplier provided by another embodiment. As shown in fig. 2, the multiplier includes a regular signed number encoding circuit 21, a partial product obtaining circuit 22, and a correction accumulation circuit 23, where an output end of the regular signed number encoding circuit 21 is connected to an input end of the partial product obtaining circuit 22, and an output end of the partial product obtaining circuit 22 is connected to an input end of the correction accumulation circuit 23. The regular signed number encoding circuit 21 is configured to perform regular signed number encoding processing on received data to obtain a target code; the partial product obtaining circuit 22 is configured to obtain original partial products according to the target code and to perform logical operation processing on the original partial products to obtain partial products after sign bit extension elimination; and the correction accumulation circuit 23 is configured to perform accumulation correction processing on the partial products after sign bit extension elimination.

Optionally, the regular signed number encoding circuit 21 includes a data input port 211 and a target encoding output port 212, where the data input port 211 is configured to receive the first data to be subjected to regular signed number encoding processing, and the target encoding output port 212 is configured to output the target code obtained after the received first data is subjected to regular signed number encoding processing.

Optionally, the partial product obtaining circuit 22 includes an original partial product obtaining unit 221 and a logic unit 222, where the original partial product obtaining unit 221 is configured to obtain an original partial product according to a target code, and the logic unit 222 is configured to perform a logical operation on a highest-order digit value of the original partial product to obtain a partial product with sign bit expansion removed.

Specifically, the regular signed number encoding circuit 21 may receive the first data and perform regular signed number encoding on it to obtain a target code; the first data may be the multiplier in a multiplication operation. The regular signed number encoding method can be characterized as follows. An N-bit multiplier is processed from the low-order bits to the high-order bits. If the multiplier contains l (l ≥ 2) consecutive bits of value 1, those l consecutive 1 bits are converted into the (l+1)-digit data "1(0)_(l-1)(-1)", that is, a 1 followed by (l-1) zeros and then -1. The remaining (N-l) bit values are combined with the converted (l+1) digits to obtain new data, and the new data serves as the initial data of the next-level conversion. This continues until the new data obtained after conversion contains no run of l (l ≥ 2) consecutive 1 values. After an N-bit multiplier has undergone regular signed number encoding, the bit width of the resulting target code can be equal to (N+1). Further, in regular signed number encoding, the data 11 can be converted into (100 - 001), that is, 11 is equivalent to 10(-1); the data 111 can be converted into (1000 - 0001), that is, 111 is equivalent to 100(-1); and runs of l (l ≥ 2) consecutive 1 values elsewhere are converted in the same way.

For example, suppose the multiplier received by the regular signed number coding circuit 21 is "001010101101110". The first new data, obtained by performing the first-stage conversion on the multiplier, is "0010101011100(-1)0"; the second new data, obtained by performing the second-stage conversion on the first new data, is "0010101100(-1)00(-1)0"; the third new data, obtained by performing the third-stage conversion on the second new data, is "0010110(-1)00(-1)00(-1)0"; the fourth new data, obtained by performing the fourth-stage conversion on the third new data, is "00110(-1)0(-1)00(-1)00(-1)0"; and the fifth new data, obtained by performing the fifth-stage conversion on the fourth new data, is "010(-1)0(-1)0(-1)00(-1)00(-1)0". The fifth new data contains no run of l (l ≥ 2) consecutive 1 values; at this point the fifth new data is referred to as the intermediate code, and the target code is obtained from the intermediate code [...].
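The stage-by-stage conversion can be sketched in Python as follows. This is a minimal software model, not the encoding circuit itself: the function names encode_stages, show and value are hypothetical, digits are held most significant first, and the sketch stops once no run of two or more consecutive 1 values remains, so it does not model how the (N+1)-digit target code is formed from that result. The procedure is similar in spirit to what is commonly called canonical signed-digit recoding.

```python
def encode_stages(bits):
    """Convert a binary string into digits in {-1, 0, 1}, one run per stage.

    Returns the list of stages (each a list of digits, most significant
    first), starting with the unconverted input.
    """
    digits = [int(b) for b in bits]
    stages = [digits[:]]
    while True:
        # Scan from the lowest digit upward for a run of l >= 2 consecutive 1s.
        i = len(digits) - 1
        run = None
        while i >= 0:
            if digits[i] == 1:
                j = i
                while j - 1 >= 0 and digits[j - 1] == 1:
                    j -= 1
                if i - j + 1 >= 2:
                    run = (j, i)          # run occupies indices j..i
                    break
                i = j - 1                 # isolated 1: keep scanning upward
            else:
                i -= 1
        if run is None:
            return stages
        hi, lo = run
        if hi == 0:
            digits.insert(0, 0)           # make room above the top digit
            hi, lo = hi + 1, lo + 1
        digits[hi - 1] = 1                # leading 1, one position above the run
        for k in range(hi, lo):
            digits[k] = 0                 # l - 1 zeros
        digits[lo] = -1                   # trailing -1
        stages.append(digits[:])


def show(digits):
    """Render digits in the notation of the text, writing -1 as (-1)."""
    return "".join("(-1)" if d == -1 else str(d) for d in digits)


def value(digits):
    """Numeric value of a most-significant-first signed-digit list."""
    v = 0
    for d in digits:
        v = 2 * v + d
    return v


stages = encode_stages("001010101101110")
for s in stages:
    print(show(s), value(s))
```

For the multiplier "001010101101110" the sketch produces five conversion stages, ending in "010(-1)0(-1)0(-1)00(-1)00(-1)0", and every stage keeps the value 5486. The input has eight digits equal to 1, while the final stage has six non-zero digits.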

Optionally, the bit width of the target code may be equal to the bit width N of the multiplier received by the multiplier circuit plus 1, and the number of original partial products may be equal to the bit width of the target code. The original partial product obtaining unit 221 in the partial product obtaining circuit 22 may obtain a corresponding original partial product according to each digit value in the target code, and the logic unit 222 may perform a logical operation on the highest bit value of each original partial product to remove the sign extension bits directly, obtaining the partial products after sign bit extension elimination.
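To illustrate how one original partial product can correspond to each digit of the target code, the following Python sketch uses the conversion of 111 into 100(-1) stated earlier. The function name original_partial_products and the multiplicand value 13 are hypothetical, and the sketch treats each partial product as a plain signed integer that already includes its positional weight; sign bit extension elimination and the actual bit-level layout are not modelled.

```python
def original_partial_products(target_code, multiplicand):
    """Return one partial product per target-code digit (digits MSB first).

    A digit d at weight 2**i contributes d * multiplicand * 2**i, so a digit
    of 0 gives a zero partial product that adds nothing to the result.
    """
    width = len(target_code)
    return [(d * multiplicand) << (width - 1 - k)
            for k, d in enumerate(target_code)]


# The 3-bit multiplier 111 (decimal 7) encoded as the 4-digit code 100(-1).
target_code = [1, 0, 0, -1]
products = original_partial_products(target_code, 13)
effective = [p for p in products if p != 0]
```

Here the code has N + 1 = 4 digits and therefore four original partial products, of which only two are non-zero; their sum, 104 - 13 = 91, equals 13 × 7. The plain binary form 111 would have needed three non-zero partial products.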

It should be noted that, if the most significant bit value of an original partial product is denoted A, the partial product obtaining circuit 22 may perform an AND logic operation on A and the signal 1 through an AND circuit to obtain the value A' of the corresponding bit in the partial product after sign bit extension elimination; that is, A' is the sum signal of A and the signal 1. The additional one-bit value Q in the partial product after sign bit extension elimination may be equal to the carry signal of A and the signal 1. The relationship between the most significant bit value A of the original partial product and the corresponding bit A' and additional one-bit value Q obtained after the logic operation is given in Table 1.
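The text describes A' and Q as a sum signal and a carry signal of A and the signal 1. One possible reading, which is an assumption and is not confirmed by the contents of Table 1, is a one-bit binary addition of A and the constant 1. The Python sketch below shows what that reading would give; the function name half_add is hypothetical.

```python
def half_add(a, b):
    """Return (sum, carry) for the binary addition of two one-bit values."""
    return a ^ b, a & b


# Under the assumed reading, A' is the sum and Q is the carry of A + 1.
for a in (0, 1):
    a_prime, q = half_add(a, 1)
    print("A =", a, " A' =", a_prime, " Q =", q)
```

Under this assumption A' is the complement of A and Q equals A, since A + 1 = 2Q + A'.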

TABLE 1

In the multiplier provided by this embodiment, the regular signed number encoding circuit performs regular signed number encoding processing on the received first data to obtain a target code; the partial product obtaining circuit then obtains original partial products according to each digit value in the target code, and the logic unit performs logic operation processing on one bit of data of each original partial product to eliminate sign bit extension, yielding the partial products after sign bit extension elimination; finally, the correction accumulation circuit performs accumulation correction processing on those partial products. Because the received data is regular signed number encoded, the number of effective partial products obtained during multiplication is reduced, which reduces the complexity of the multiplication; at the same time, the multiplier can improve the efficiency of the multiplication operation and effectively reduce its power consumption.

Fig. 3 is a schematic diagram of a specific structure of a multiplier provided by one embodiment. As shown in fig. 3, the multiplier includes the regular signed number encoding circuit 11, and the regular signed number encoding circuit 11 includes a regular signed number encoding processing unit 111 and a partial product obtaining unit 112, where an output end of the regular signed number encoding processing unit 111 is connected to an input end of the partial product obtaining unit 112. The regular signed number encoding processing unit 111 is configured to perform regular signed number encoding processing on the received first data to obtain a target code, and the partial product obtaining unit 112 is configured to obtain original partial products according to the target code and perform logical operation processing on the original partial products.

Optionally, the partial product obtaining unit 112 is specifically configured to obtain an original partial product according to a target code, and perform binary addition processing according to a highest-order digit value of the original partial product to obtain the partial product after sign bit removal expansion.

Specifically, the regular signed number encoding processing unit 111 may receive the first data and perform regular signed number encoding processing on it to obtain the target code; the first data may be the multiplier in the multiplication operation. The regular signed number encoding method can be characterized as follows. An N-bit multiplier is processed from the low-order bits to the high-order bits. If the multiplier contains l (l ≥ 2) consecutive bits of value 1, those l consecutive 1 bits are converted into the (l+1)-digit data "1(0)_(l-1)(-1)", that is, a 1 followed by (l-1) zeros and then -1. The remaining (N-l) bit values are combined with the converted (l+1) digits to obtain new data, and the new data serves as the initial data of the next-level conversion. This continues until the new data obtained after conversion contains no run of l (l ≥ 2) consecutive 1 values. After an N-bit multiplier has undergone regular signed number encoding, the bit width of the resulting target code can be equal to (N+1). Further, in regular signed number encoding, the data 11 can be converted into (100 - 001), that is, 11 is equivalent to 10(-1); the data 111 can be converted into (1000 - 0001), that is, 111 is equivalent to 100(-1); and runs of l (l ≥ 2) consecutive 1 values elsewhere are converted in the same way.

For example, suppose the multiplier received by the regular signed number coding processing unit 111 is "001010101101110". The first new data, obtained by performing the first-stage conversion on the multiplier, is "0010101011100(-1)0"; the second new data, obtained by performing the second-stage conversion on the first new data, is "0010101100(-1)00(-1)0"; the third new data, obtained by performing the third-stage conversion on the second new data, is "0010110(-1)00(-1)00(-1)0"; the fourth new data, obtained by performing the fourth-stage conversion on the third new data, is "00110(-1)0(-1)00(-1)00(-1)0"; and the fifth new data, obtained by performing the fifth-stage conversion on the fourth new data, is "010(-1)0(-1)0(-1)00(-1)00(-1)0". The fifth new data contains no run of l (l ≥ 2) consecutive 1 values; at this point the fifth new data is referred to as the intermediate code, and the target code is obtained from the intermediate code [...].

The bit width of the target code may be equal to the bit width N of the multiplier received by the multiplier circuit plus 1, and the number of original partial products may be equal to the bit width of the target code. The partial product obtaining unit 112 may obtain a corresponding original partial product according to each digit value in the target code, and perform an AND logic operation on the highest bit value of each original partial product through the two full adders 112a and 1122b included in the partial product obtaining unit 112.

Optionally, the additional one-bit value Q in the partial product after sign bit extension elimination may be determined from the result of the AND logic operation on the highest bit value A of the original partial product and the signal 1. The value Q may be equal to the carry signal of that operation on A and the signal 1, and the next-highest bit value in the partial product after sign bit extension elimination may be equal to the sum signal of that operation on A and the signal 1.

In the multiplier provided by this embodiment, the regular signed number encoding processing unit performs regular signed number encoding processing on the received first data to obtain a target code; the partial product obtaining unit then obtains original partial products according to each digit value in the target code and performs a logical operation on the highest-order bit value of each original partial product to eliminate sign bit extension, yielding the partial products after sign bit extension elimination; finally, the correction accumulation circuit performs accumulation correction processing on those partial products. Because the received data is regular signed number encoded, the number of effective partial products obtained during multiplication is reduced, which reduces the complexity of the multiplication; at the same time, the multiplier can improve the efficiency of the multiplication operation and effectively reduce its power consumption.

In one embodiment, the multiplier includes the regular signed number encoding processing unit 111, and the regular signed number encoding processing unit 111 includes a data input port 1111 and a target encoding output port 1112, where the data input port 1111 is configured to receive the first data to be subjected to regular signed number encoding processing, and the target encoding output port 1112 is configured to output the target code obtained by performing regular signed number encoding processing on the received first data.

Specifically, if the data input port 1111 receives the first data, the regular signed number encoding processing unit 111 may perform regular signed number encoding processing on the first data to obtain a target code, and output the target code through the target code output port 1112. Optionally, the regular signed number encoding processing unit 111 may receive the first data through the data input port 1111, and the first data may be the multiplier in a multiplication operation. It should be noted that the regular signed number encoding circuit 11 and the regular signed number encoding processing unit 111 shown in fig. 3 have the same internal circuit structure, external output ports and functions. Optionally, the values included in the target code obtained by the regular signed number encoding processing unit 111 from the multiplier may be -1, 0, and 1.

In the multiplier provided by this embodiment, the regular signed number encoding processing unit performs regular signed number encoding processing on the received first data to obtain a target code; the partial product obtaining unit then obtains the corresponding partial products after sign bit extension elimination according to each digit value in the target code; and the correction accumulation circuit performs accumulation correction processing on those partial products to obtain the target operation result of the multiplication. Because the received data is regular signed number encoded, the number of effective partial products obtained during multiplication is reduced, which reduces the complexity of the multiplication; at the same time, the multiplier can improve the efficiency of the multiplication operation and effectively reduce its power consumption.

In one embodiment, the multiplier comprises the partial product obtaining unit 112, and the partial product obtaining unit 112 comprises a target code input port 1121, a data input port 1122, and a partial product output port 1123, wherein the target code input port 1121 is configured to receive the target code, the data input port 1122 is configured to receive the second data, and the partial product output port 1123 is configured to output the partial products after sign bit extension elimination, obtained according to the target code and the received second data.

Specifically, the partial product obtaining unit 112 may receive, through the target code input port 1121, the target code output by the regular signed number encoding processing unit 111, and receive the second data through the data input port 1122, where the second data may be the multiplicand in the multiplication operation. The partial product obtaining unit 112 obtains original partial products according to each digit value in the received target code and the second data, and performs an AND logical operation on the original partial products to obtain the corresponding partial products after sign bit extension elimination.

In the multiplier provided by this embodiment, the partial product obtaining unit can obtain the corresponding partial products after sign bit extension elimination according to each digit value in the target code, and the correction accumulation circuit can perform accumulation correction processing on those partial products to obtain the target operation result of the multiplication. The number of effective partial products obtained by the multiplier is thereby reduced, which reduces the complexity of the multiplication; at the same time, the multiplier can improve the efficiency of the multiplication operation and effectively reduce its power consumption.

In one embodiment, continuing with the specific structural diagram of the multiplier shown in fig. 3, the multiplier includes the correction accumulation circuit 12, and the correction accumulation circuit 12 includes full adders 121 to 12n, where the full adders 121 to 12n are used for performing accumulation correction processing on the received partial products after sign bit extension elimination.

Specifically, each of the full adders 121 to 12n may be a combinational circuit that implements binary addition, and may also be understood as a circuit that adds a multi-bit input signal to obtain a two-bit output signal. Optionally, the number n of full adders included in the correction accumulation circuit 12 may be equal to the bit width N of a partial product after sign bit extension elimination plus 1, multiplied by (N+1), and then summed with N, where N may represent the number of values included in the target code obtained by the regular signed number encoding processing unit 111 minus 1; that is, the number of values in the target code is equal to N+1. Optionally, the n full adders in the correction accumulation circuit 12 may be distributed layer by layer, and each partial product after sign bit extension elimination obtained by the partial product obtaining unit 112 may correspond to one layer of full adders [...].

It should be noted that each full adder in the correction accumulation circuit 12 may add two or more input signals to obtain a two-bit output signal, which may include a carry signal Carry and a result bit signal Sum. Optionally, in this embodiment, each full adder in the correction accumulation circuit 12 may receive three input signals, which may be any three of the following: a one-bit value in a partial product with sign bit extension eliminated, the carry output signal Carry obtained by a lower full adder, a result bit signal Sum, and a binary signal. Optionally, when the correction accumulation circuit 12 accumulates and corrects the partial products with sign bit extension eliminated, two of the partial products obtained by the partial product obtaining unit 112 may be added by one layer of full adders, and the result may then be added to a further partial product by the following layer of full adders, and so on, layer by layer, until the last layer of full adders outputs the result.
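The three-input, two-output behavior of a single full adder described above can be sketched as follows. This is a minimal behavioral model in Python rather than the circuit itself, and the function name `full_adder` is hypothetical.

```python
def full_adder(a: int, b: int, c: int) -> tuple[int, int]:
    """Add three one-bit inputs; return (carry, sum) as one-bit values."""
    s = a ^ b ^ c                        # result bit signal Sum
    carry = (a & b) | (a & c) | (b & c)  # Carry is 1 when at least two inputs are 1
    return carry, s


# Every input combination satisfies a + b + c == 2 * Carry + Sum.
for a in (0, 1):
    for b in (0, 1):
        for c in (0, 1):
            carry, s = full_adder(a, b, c)
            assert a + b + c == 2 * carry + s
```

The identity checked in the loop is what allows a layer of such adders to replace three bits of equal weight with one bit of that weight and one bit of the next higher weight.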

Optionally, the correction accumulation circuit 12 may perform correction processing twice on the partial products with sign bit extension eliminated. The correction accumulation circuit 12 may correct the values in the partial products through two full adders, one in the first layer and one in the last layer. If each full adder has a corresponding number, the full adder performing the correction in the first layer may be the full adder with the next-highest number, and the full adder performing the correction in the last layer may be the full adder with the highest number.

In the multiplier provided by this embodiment, the correction accumulation circuit can perform accumulation correction on the smaller number of partial products with sign bit extension eliminated that are obtained by the partial product obtaining unit, so as to obtain the target operation result of the multiplication, thereby reducing the complexity of the multiplier in realizing the multiplication and effectively reducing the power consumption of the multiplier.

Fig. 4 is a schematic diagram of a specific structure of a multiplier provided in another embodiment. The multiplier includes the correction accumulation circuit 23, and the correction accumulation circuit 23 includes a correction Wallace tree group sub-circuit 231 and an accumulation sub-circuit 232, where an output terminal of the correction Wallace tree group sub-circuit 231 is connected to an input terminal of the accumulation sub-circuit 232. The correction Wallace tree group sub-circuit 231 is configured to perform accumulation correction processing on the partial products with sign bit extension eliminated, and the accumulation sub-circuit 232 is configured to perform accumulation processing on the accumulation correction operation result.

Specifically, the correction Wallace tree group sub-circuit 231 may perform accumulation correction on the values in the partial products with sign bit extension eliminated, and the accumulation sub-circuit 232 may then accumulate the accumulation correction result obtained by the correction Wallace tree group sub-circuit 231 to obtain the target operation result of the multiplication.

In one embodiment, continuing with the detailed structural diagram of the multiplier shown in fig. 4, the multiplier includes the correction Wallace tree group sub-circuit 231, and the correction Wallace tree group sub-circuit 231 includes Wallace tree units 2311 to 231n, which are used for performing accumulation correction processing on the values in each column of the partial products with sign bit extension eliminated.

Specifically, the circuit structure of the Wallace tree units 2311 to 231n may be implemented by a combination of full adders and half adders, and a Wallace tree unit may be understood as a circuit capable of processing multi-bit input signals and adding them to obtain a two-bit output signal. Optionally, the number n of Wallace tree units included in the correction Wallace tree group sub-circuit 231 may be equal to 2N, where N may represent the number of values included in the target code obtained by the regular signed number encoding circuit 21 minus 1. At the same time, the n Wallace tree units may process the partial products of the target code in parallel, although they may be connected in series, where the partial products of the target code may be all the partial products with sign bit extension eliminated obtained by the partial product obtaining circuit 22. Optionally, each Wallace tree unit in the correction Wallace tree group sub-circuit 231 may add all the values in one column of all the partial products with sign bit extension eliminated, and each Wallace tree unit may output two signals, namely a carry signal Carry_i and a sum bit signal Sum_i, where i may represent the number of the Wallace tree unit, and the first Wallace tree unit is numbered 0. Optionally, the number of input signals received by each Wallace tree unit may be equal to the number of values contained in the target code, that is, the total number of partial products with sign bit extension eliminated, or may be equal to that number plus 1.

It should be noted that, in the process in which the multiplier adds the values in each column of all the partial products with sign bit extension eliminated, two columns of data in those partial products are corrected by two Wallace tree units in the correction Wallace tree group sub-circuit 231. That is, each of the two Wallace tree units corresponding to those two columns receives one more input signal than each Wallace tree unit corresponding to the other columns, and that additional input signal is 1.

In addition, the signals of each Wallace tree unit in the correction Wallace tree group sub-circuit 231 may include a carry input signal Cin_i, partial product value input signals, and a carry output signal Cout_i. Optionally, the partial product value input signals received by each Wallace tree unit may be the values in the corresponding column of all the partial products with sign bit extension eliminated, and the number of carry output signals Cout_i output by each Wallace tree unit may be equal to N_Cout = floor((N_I + N_Cin) / 2) - 1, where N_I may represent the number of partial product value input signals of the Wallace tree unit, N_Cin may represent the number of carry input signals of the Wallace tree unit, N_Cout may represent the number of carry output signals of the Wallace tree unit, and floor(·) denotes rounding down. Optionally, the carry input signals received by each Wallace tree unit in the correction Wallace tree group sub-circuit 231 may be the carry output signals output by the previous Wallace tree unit, while the carry input signals received by the first Wallace tree unit are 0. At the same time, the number of carry signal input ports of the first Wallace tree unit may be the same as that of the other Wallace tree units.
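The column compression performed by one Wallace tree unit can be illustrated with a small behavioral model. The sketch below is in Python, and the internal arrangement of the adders is an assumption (the text only states that a unit combines full adders and half adders); the names `wallace_unit`, `column_bits` and `carry_ins`, and the input sizes 9 and 7, are hypothetical. For two or more inputs, the number of carry outputs the model passes to the next unit agrees with floor((N_I + N_Cin) / 2) - 1.

```python
def full_adder(a, b, c):
    """Add three one-bit inputs; return (carry, sum)."""
    return (a & b) | (a & c) | (b & c), a ^ b ^ c


def wallace_unit(column_bits, carry_ins):
    """Compress one column; return (carry_outs, carry, sum_bit).

    carry_outs feed the next unit; carry and sum_bit model Carry_i and Sum_i.
    """
    pool = list(column_bits) + list(carry_ins)
    carry_outs = []
    # Apply 3:2 compression until at most three bits remain.
    while len(pool) > 3:
        a, b, c = pool.pop(), pool.pop(), pool.pop()
        cout, s = full_adder(a, b, c)
        carry_outs.append(cout)    # weight 2: handed to the next column
        pool.insert(0, s)          # weight 1: stays in this column
    pool += [0] * (3 - len(pool))  # pad so the last stage is one full adder
    carry, sum_bit = full_adder(*pool)
    return carry_outs, carry, sum_bit


bits, cins = [1] * 9, [1, 0, 1, 0, 1, 0, 1]  # illustrative: N_I = 9, N_Cin = 7
couts, carry, sum_bit = wallace_unit(bits, cins)
# The column total is preserved: bits of weight 1 in, weights 1 and 2 out.
assert sum(bits) + sum(cins) == sum_bit + 2 * (carry + sum(couts))
assert len(couts) == (len(bits) + len(cins)) // 2 - 1
```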

In this embodiment, if the n Wallace tree units connected in series in the correction Wallace tree group sub-circuit 231 are numbered 1, 2, …, i, …, n, the correction Wallace tree group sub-circuit 231 may perform the correction processing on the two corresponding columns of data in the partial products with sign bit extension eliminated through the i-th Wallace tree unit and the n-th Wallace tree unit. If the bits of the first partial product with sign bit extension eliminated obtained by the regular signed number encoding circuit 21 are numbered 1, 2, …, m-2, m-1, m from the least significant bit to the most significant bit, where m is the number of the Q bit and 1 is the number of the least significant bit of the first partial product with sign bit extension eliminated, then i may be equal to N. In other words, the correction Wallace tree group sub-circuit 231 may perform the correction processing through the N-th Wallace tree unit and the last Wallace tree unit, where N may represent the bit width of the data received by the multiplier.

Illustratively, if the multiplier currently processes an 8-bit by 8-bit fixed-point multiplication, and each partial product with sign bit extension eliminated obtained by the partial product obtaining circuit 22 is "p_i8 p_i7 p_i6 p_i5 p_i4 p_i3 p_i2 p_i1 p_i0" (i = 1, …, n, with n = 9), where i denotes the i-th partial product with sign bit extension eliminated, then the distribution of the 9 partial products with sign bit extension eliminated may be as shown in fig. 5, in which each dot represents one bit value in a partial product with sign bit extension eliminated. Counting from the rightmost column to the leftmost column, 17 columns of partial product values are shown in the figure; in actual operation, the value in the last of these columns overflows, that is, the highest bit value of the last partial product with sign bit extension eliminated overflows and does not participate in the subsequent accumulation. A total of 16 Wallace tree units are therefore required to perform the accumulation correction processing on the 9 partial products with sign bit extension eliminated, and the correction Wallace tree group sub-circuit 231 may perform the correction processing through the 8th Wallace tree unit and the last Wallace tree unit. A circuit diagram may show the 16 Wallace tree units, including the two Wallace tree units that implement the correction processing, with each Wallace tree unit identified by its number.

In the multiplier provided by this embodiment, the correction Wallace tree group sub-circuit can perform accumulation correction on the smaller number of partial products with sign bit extension eliminated that are obtained by the partial product obtaining unit, so as to obtain the target operation result of the multiplication, thereby reducing the complexity of the multiplier in realizing multiplication and effectively reducing the power consumption of the multiplier.

In one embodiment, continuing with the detailed structural diagram of the multiplier shown in fig. 4, the multiplier includes the accumulation sub-circuit 232, and the accumulation sub-circuit 232 includes an adder 2321, which is configured to add the accumulation correction operation result.

Specifically, the adder 2321 may be an adder of any of several bit widths, and may be a carry-lookahead adder. Optionally, the adder 2321 may receive the two signals output by the correction Wallace tree group sub-circuit 231 and add them to obtain the target operation result of the multiplication.

In the multiplier provided by this embodiment, the accumulation sub-circuit can accumulate the two signals output by the correction Wallace tree group sub-circuit to obtain the target operation result of the multiplication; this process can reduce the complexity of the multiplier in realizing multiplication and effectively reduce the power consumption of the multiplier.

In one embodiment, the multiplier includes the adder 2321, and the adder 2321 includes a carry signal input port 2321a, a sum bit signal input port 2321b, and a result output port 2321c. The carry signal input port 2321a is configured to receive a carry signal, the sum bit signal input port 2321b is configured to receive a sum bit signal, and the result output port 2321c is configured to output the target operation result obtained by accumulating the carry signal and the sum bit signal.

Specifically, the adder 2321 may receive the carry signal Carry output by the correction Wallace tree group sub-circuit 231 through the carry signal input port 2321a, receive the sum bit signal Sum output by the correction Wallace tree group sub-circuit 231 through the sum bit signal input port 2321b, add the carry signal Carry and the sum bit signal Sum, and output the result through the result output port 2321c.

It should be noted that, during multiplication, the multiplier may use adders 2321 of different bit widths to add the carry output signal Carry and the sum bit output signal Sum output by the correction Wallace tree group sub-circuit 231, where the bit width of the data that the adder 2321 can process may be equal to 2 times the bit width N of the data currently processed by the multiplier. Optionally, each Wallace tree unit in the correction Wallace tree group sub-circuit 231 may output a carry output signal Carry_i and a sum bit output signal Sum_i (i = 0, …, 2N-1, where i is the number of the Wallace tree unit, starting from 0). Optionally, the carry output signal received by the adder 2321 may be Carry = {[Carry_0 : Carry_2N-2], 0}; that is, the bit width of the carry output signal Carry received by the adder 2321 is 2N, the first 2N-1 bit values of Carry correspond to the carry output signals of the first 2N-1 Wallace tree units in the correction Wallace tree group sub-circuit 231, and the last bit value of Carry may be replaced by 0. Optionally, the bit width of the sum bit output signal Sum received by the adder 2321 is 2N, and the values in Sum may be equal to the sum bit output signals of the Wallace tree units in the correction Wallace tree group sub-circuit 231.
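How the two 2N-bit words might be assembled and added can be sketched as below. This Python model rests on one assumed reading of the notation {[Carry_0 : Carry_2N-2], 0}: each Carry_i carries weight 2^(i+1), the carry of the last Wallace tree unit is discarded, and the 0 fills the lowest position. The function name `final_add` and the sample bit lists are hypothetical.

```python
def final_add(sum_bits, carry_bits):
    """Combine per-unit Sum_i and Carry_i (index 0 first) into a 2N-bit result."""
    width = len(sum_bits)                       # 2N
    assert len(carry_bits) == width
    sum_word = sum(bit << i for i, bit in enumerate(sum_bits))
    # Carry_i has weight 2**(i + 1); the last unit's carry is dropped and a 0
    # occupies the lowest position (assumed interpretation, see lead-in).
    carry_word = sum(bit << (i + 1) for i, bit in enumerate(carry_bits[:-1]))
    return (sum_word + carry_word) & ((1 << width) - 1)  # keep 2N bits


# Sum word 0b0101 (5) plus carry word 0b0110 (6) gives 11.
assert final_add([1, 0, 1, 0], [1, 1, 0, 1]) == 11
```

In hardware this final addition would be carried out by the carry-lookahead adder; the integer addition here only models its result.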

For example, if the multiplier currently processes an 8-bit by 8-bit fixed-point multiplication, the adder 2321 may be a 16-bit carry-lookahead adder. As shown in fig. 6, the correction Wallace tree group sub-circuit 231 may output the sum bit output signals Sum and the carry output signals Carry of the 16 Wallace tree units. The sum bit output signal received by the 16-bit carry-lookahead adder may be the complete sum bit output signal Sum output by the correction Wallace tree group sub-circuit 231, whereas the carry output signal it receives may be the signal formed by combining all the carry output signals of the correction Wallace tree group sub-circuit 231, except the one output by the last Wallace tree unit, with 0.

The multiplier provided by this embodiment can accumulate, through the accumulation sub-circuit, the two signals output by the correction Wallace tree group sub-circuit to obtain the target operation result of the multiplication; this process can reduce the complexity of the multiplier in realizing multiplication and effectively reduce the power consumption of the multiplier.

Fig. 7 is a flow chart of a data processing method provided by one embodiment. The method may be performed by the multiplier shown in fig. 1, and this embodiment relates to a process of data multiplication. As shown in fig. 7, the method includes:

S101, receiving data to be processed.

Specifically, the multiplier may receive data to be processed, which may be a multiplier and a multiplicand in a multiplication operation, through a regular signed number encoding circuit. The bit width of the multiplier may be equal to the bit width of the multiplicand.

S102, performing regular signed number coding processing on the data to be processed to obtain a target code.

Specifically, the multiplier may perform regular signed number encoding processing on the received multiplier to be processed through a regular signed number encoding circuit, so as to obtain the target code. The bit width of the target code may be equal to the bit width N of the multiplier to be processed plus 1.

Optionally, the step of performing regular signed number coding processing on the data to be processed in S102 to obtain the target code may include: converting each run of l consecutive bits of value 1 in the data to be processed into (l + 1) bits whose highest bit is 1, whose lowest bit is -1, and whose remaining bits are 0, to obtain the target code, where l is greater than or equal to 2.

It should be noted that the regular signed number encoding processing may be characterized as follows: for an N-bit multiplier, the values are processed from the low-order bits to the high-order bits. If there are l (l ≥ 2) consecutive bits of value 1, these l consecutive 1s may be converted into the data "1 (0)_(l-1) (-1)", that is, a 1, followed by (l - 1) zeros, followed by -1, and the remaining (N - l) bit values are combined with the converted (l + 1) bit values to obtain new data. The new data then serves as the initial data of the next level of conversion, until the new data obtained after conversion contains no l (l ≥ 2) consecutive bits of value 1. After an N-bit multiplier undergoes the regular signed number encoding processing, the bit width of the resulting target code may be equal to (N + 1).
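The conversion rule above can be sketched in software as follows. This is a minimal Python model, assuming the multiplier is treated as an unsigned N-bit pattern (any handling of a signed multiplier is outside this sketch); the function name `rsd_encode` and the choice to return digits least significant first are hypothetical.

```python
def rsd_encode(value: int, width: int) -> list[int]:
    """Return width + 1 digits in {-1, 0, 1}, least significant digit first."""
    digits = [(value >> i) & 1 for i in range(width)] + [0]
    i = 0
    while i < width:
        if digits[i] == 1 and digits[i + 1] == 1:
            j = i
            while j <= width and digits[j] == 1:  # find the end of the run of 1s
                j += 1
            digits[i] = -1                        # lowest bit of the run becomes -1
            for k in range(i + 1, j):
                digits[k] = 0                     # remaining bits of the run become 0
            digits[j] = 1                         # a 1 appears one position above the run
            i = j                                 # the new 1 may begin another run
        else:
            i += 1
    return digits


# 0b0110 (6) contains a run of two 1s and becomes 8 - 2.
assert rsd_encode(0b0110, 4) == [0, -1, 0, 1, 0]
```

Because the scan resumes at the newly created 1, a run that forms only after an earlier conversion is handled in the same pass, which corresponds to the repeated levels of conversion described in the text.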

S103, obtaining a partial product after sign bit expansion elimination according to the data to be processed and the target code.

It should be noted that the regular signed number encoding circuit may obtain partial products after sign bit extension is eliminated according to a multiplicand in the multiplication operation and a target code obtained by regular signed number encoding, and the number of the partial products after sign bit extension is eliminated may be equal to the bit width of the target code.

S104, performing accumulation correction processing on the partial product after eliminating sign bit expansion to obtain a target operation result.

Specifically, the multiplier may perform accumulation correction processing on the partial products with sign bit extension eliminated through the layer-by-layer full adders in the correction accumulation circuit, until the last layer of full adders completes its operation, to obtain the target operation result of the multiplication. The accumulation correction processing may be characterized as performing correction processing during accumulation of the partial products with sign bit extension eliminated; the correction processing may be performed by two full adders, one in the first layer of full adders in the correction accumulation circuit and one in the last layer. Optionally, the target operation result may be the operation result obtained after sign bit extension elimination and correction accumulation processing. If each full adder has a corresponding number, the full adder performing the correction in the first layer may be the full adder with the next-highest number, and the full adder performing the correction in the last layer may be the full adder with the highest number.

In addition, the multiplier may also accumulate the values in each column of the partial products with sign bit extension eliminated through the correction Wallace tree group sub-circuit in the correction accumulation circuit, performing correction processing through two Wallace tree units in the correction Wallace tree group sub-circuit during the accumulation. The correction Wallace tree group sub-circuit outputs the corrected carry output signals and sum bit output signals, and the accumulation sub-circuit accumulates the carry output signals of the correction Wallace tree group sub-circuit and the signals obtained by replacing the last sum bit signal with 0, and outputs the target operation result.

It should be noted that, if the multiplier currently processes an N-bit data operation, 2N Wallace tree units are connected in series in the correction Wallace tree group sub-circuit, and the Wallace tree units are numbered starting from 0, then the correction Wallace tree group sub-circuit may perform the correction processing through the N-th Wallace tree unit and the 2N-th Wallace tree unit.

The data processing method provided by this embodiment receives data to be processed, performs regular signed number coding processing on the data to be processed to obtain a target code, obtains a partial product with sign bit extension eliminated according to the data to be processed and the target code, and performs accumulation correction processing on the partial product with sign bit extension eliminated to obtain a target operation result.

In another embodiment of the data processing method, obtaining the partial product with sign bit extension eliminated according to the data to be processed and the target code in S103 includes:

S1031, obtaining an original partial product according to the data to be processed and the target code.

It should be noted that the number of the original partial products may be equal to the bit width of the target code.

Illustratively, if the partial product fetch unit receives an 8-bit multiplicand "x₇x₆x₅x₄x₃x₂x₁x₀" (i.e., X), then the partial product fetch unit may directly obtain the corresponding original partial products from the multiplicand X and the three values -1, 0, and 1 contained in the target code: the original partial product may be -X when the corresponding bit value in the target code is -1, 0 when that bit value is 0, and X when that bit value is 1.
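The selection of original partial products can be illustrated with a short sketch. It is written in Python and works on plain integers rather than fixed-width bit vectors, which is a simplification; the function name `original_partial_products` and the sample values are hypothetical, and the target code is assumed to be given least significant digit first.

```python
def original_partial_products(x: int, target_code: list[int]) -> list[int]:
    """Map each target-code digit (least significant first) to -X, 0 or X."""
    select = {-1: -x, 0: 0, 1: x}
    return [select[d] for d in target_code]


x = 13
code = [0, -1, 0, 1, 0]  # digits for the value 6 (= 8 - 2), least significant first
products = original_partial_products(x, code)
assert products == [0, -13, 0, 13, 0]
# Weighting partial product i by 2**i and summing reproduces x * 6.
assert sum(p * 2**i for i, p in enumerate(products)) == x * 6
```

Only the nonzero digits contribute nonzero partial products, which is why a target code with few nonzero digits leaves fewer effective partial products to accumulate.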

S1032, the original partial product is subjected to addition operation processing, and the partial product with sign bit expansion eliminated is obtained.

Optionally, performing addition operation processing on the original partial product in S1032 to obtain the partial product with sign bit extension eliminated includes: performing an AND logic operation on the highest-order value of the original partial product to obtain the partial product with sign bit extension eliminated.

Optionally, the extra one-bit value Q in the partial product with sign bit extension eliminated may be the carry signal of the AND logic operation performed on the highest-order value of the original partial product and the signal 1, and the next-highest-order value in the partial product with sign bit extension eliminated may be the sum signal of that operation.

According to the data processing method provided by this embodiment, an original partial product is obtained according to the data to be processed and the target code, an AND logic operation is performed on the highest-order value of the original partial product to obtain the partial product with sign bit extension eliminated, and the partial product with sign bit extension eliminated then undergoes accumulation correction processing to obtain the target operation result of the multiplication.

Fig. 8 is a flow chart of a data processing method provided by one embodiment. The method may be performed by the multiplier shown in fig. 2, and this embodiment relates to a process of data multiplication. As shown in fig. 8, the method includes:

S201, receiving data to be processed.

Specifically, the multiplier may receive data to be processed, which may be a multiplier and a multiplicand in a multiplication operation, through a regular signed number encoding circuit. Wherein the bit width of the multiplier may be equal to the bit width of the multiplicand.

S202, performing regular signed number coding processing on the data to be processed to obtain an original partial product.

Specifically, the multiplier performs regular signed number encoding processing on a multiplier in multiplication operation through a regular signed number encoding circuit, and the partial product acquisition circuit can obtain an original partial product according to a result of the regular signed number encoding processing.

S203, carrying out logic operation processing according to the original partial product, and eliminating sign extension bits to obtain the partial product after sign bit extension is eliminated.

Specifically, the multiplier may perform logical operation processing on the original partial product through a logic unit in the partial product obtaining circuit, and directly eliminate the value of the sign extension bit to obtain the partial product after eliminating the sign bit extension.

S204, performing accumulation correction processing on the partial product after eliminating sign bit expansion to obtain a target operation result.

The multiplier may perform accumulation correction processing on the partial products with sign bit extension eliminated through the layer-by-layer full adders in the correction accumulation circuit, until the last layer of full adders completes its operation, to obtain the operation result. Optionally, the accumulation correction processing may be characterized as performing correction processing during accumulation of the partial products with sign bit extension eliminated, and the correction processing may be performed by two full adders, one in the first layer of full adders in the correction accumulation circuit and one in the last layer.

In addition, the multiplier may also accumulate the values in each column of the partial products with sign bit extension eliminated through the correction Wallace tree group sub-circuit in the correction accumulation circuit, performing correction processing through two Wallace tree units in the correction Wallace tree group sub-circuit during the accumulation. The correction Wallace tree group sub-circuit outputs the corrected carry output signals and sum bit output signals, and finally the accumulation sub-circuit accumulates all the carry output signals Carry_i of the correction Wallace tree group sub-circuit and the signals obtained by replacing the last sum bit signal Sum_2N with 0, and outputs the target operation result. It should be noted that, if the multiplier currently processes an N-bit data operation, 2N Wallace tree units are connected in series in the correction Wallace tree group sub-circuit, and the Wallace tree units are numbered starting from 0, then the correction Wallace tree group sub-circuit may perform the correction processing through the N-th Wallace tree unit and the 2N-th Wallace tree unit.

The data processing method provided by this embodiment receives data to be processed, performs regular signed number encoding processing on the data to be processed to obtain an original partial product, performs logic operation processing according to the original partial product to obtain a partial product with sign bit extension eliminated, and performs accumulation correction processing on the partial product with sign bit extension eliminated to obtain a target operation result.

In another embodiment of the data processing method, the performing regular signed number coding processing on the data to be processed in S202 to obtain an original partial product includes:

S2021, performing regular signed number coding processing on the data to be processed to obtain a target code.

Specifically, the multiplier may perform regular signed number encoding processing on a multiplier in the multiplication operation through a regular signed number encoding circuit to obtain the target code. Optionally, after the regular signed number coding processing, the obtained target code includes three values, which are-1, 0 and 1, respectively.

Optionally, the step of performing regular signed number coding processing on the data to be processed in the above S2021 to obtain the target code may include: converting each run of l consecutive bits of value 1 in the data to be processed into (l + 1) bits whose highest bit is 1, whose lowest bit is -1, and whose remaining bits are 0, to obtain the target code, where l is greater than or equal to 2.

S2022, obtaining the original partial product according to the data to be processed and the target code.

It should be noted that the number of original partial products may be equal to the bit width of the target code.

Illustratively, if the original partial product fetch unit receives an 8-bit multiplicand "x₇x₆x₅x₄x₃x₂x₁x₀" (i.e., X), the original partial product fetch unit may directly obtain the corresponding original partial products from the multiplicand X and the three values -1, 0, and 1 contained in the target code: the original partial product may be -X when the corresponding bit value in the target code is -1, 0 when that bit value is 0, and X when that bit value is 1.

The data processing method provided by this embodiment performs regular signed number encoding on the data to be processed to obtain a target code, obtains the original partial product according to the data to be processed and the target code, then performs sign bit extension elimination processing on the original partial product, and performs accumulation correction processing on the partial product with sign bit extension eliminated to obtain the target operation result of the multiplication.

In another embodiment of the data processing method, performing logic operation processing according to the original partial product and eliminating the sign extension bits to obtain the partial product with sign bit extension eliminated includes: performing an AND logic operation on the highest-order value of the original partial product and eliminating the sign extension bits to obtain the partial product with sign bit extension eliminated.

Specifically, the multiplier may perform an and logic operation on the highest order value in the original partial product by a logic unit in the partial product obtaining circuit to obtain the next highest order value in the partial product after sign bit expansion elimination and the highest order value, and may perform an and logic operation on the highest order value in the original partial product by a logic unit in the partial product obtaining circuit and the signal 1 to obtain the extra -bit value Q in the partial product after sign bit expansion elimination and the next highest order value in the partial product after sign bit expansion elimination (i.e., the lower -bit value of the Q bit).

According to the data processing methods provided by this embodiment, after processing data to be processed, an original partial product is obtained, and the most significant digit value of the original partial product is subjected to and logic operation, and the sign extension bit is eliminated to obtain a partial product after sign bit extension is eliminated, so that the power consumption of the multiplier can be effectively reduced.

The embodiment of the application also provides a machine learning arithmetic device, which comprises one or more of the multipliers described in this application and is used for acquiring data to be operated on and control information from other processing devices, executing a specified machine learning operation, and transmitting the execution result to peripheral equipment through an I/O interface.

The machine learning arithmetic device has high compatibility and can be connected with various types of servers through PCIE interfaces.

The embodiment of the present application further provides a combined processing device, which includes the above machine learning arithmetic device, a universal interconnection interface, and other processing devices. The machine learning arithmetic device interacts with the other processing devices to jointly complete the operation designated by the user. Fig. 9 is a schematic diagram of the combined processing device.

The other processing devices serve as the interface between the machine learning arithmetic device and external data and control, handling data transfer and basic control such as starting and stopping the machine learning arithmetic device; the other processing devices can also cooperate with the machine learning arithmetic device to complete arithmetic tasks.

The universal interconnection interface is used for transmitting data and control instructions between the machine learning arithmetic device and the other processing devices. The machine learning arithmetic device obtains the required input data from the other processing devices and writes it into a storage device on the machine learning arithmetic device; it can obtain control instructions from the other processing devices and write them into a control cache on the machine learning arithmetic device chip; it can also read the data in the storage module of the machine learning arithmetic device and transmit it to the other processing devices.

Optionally, as shown in fig. 10, the structure may further include a storage device, which is connected to the machine learning arithmetic device and to the other processing devices, respectively. The storage device is used for storing data of the machine learning arithmetic device and the other processing devices, and is particularly suitable for data that needs to be computed but cannot be held entirely in the internal storage of the machine learning arithmetic device or the other processing devices.

The combined processing device can be used as the SOC (system on chip) of equipment such as a mobile phone, a robot, an unmanned aerial vehicle, or video monitoring equipment, which effectively reduces the core area of the control part, increases the processing speed, and reduces the overall power consumption. In this case, the universal interconnection interface of the combined processing device is connected to certain components of the equipment, such as a camera, a display, a mouse, a keyboard, a network card, or a Wi-Fi interface.

In one embodiment, a chip including the above machine learning arithmetic device or combined processing device is also disclosed.

In one embodiment, a chip package structure including the above chip is disclosed.

In one embodiment, a board card including the above chip package structure is provided. As shown in fig. 11, the board card may include, in addition to the chip 389, other supporting components, including but not limited to a memory device 390, a receiving device 391, and a control device 392.

The memory device 390 is connected by a bus to the chip in the chip package structure and is used for storing data. The memory device may include multiple groups of memory cells 393. Each group of memory cells is connected to the chip by a bus. It will be appreciated that each group of memory cells may be DDR SDRAM (Double Data Rate SDRAM).

DDR doubles the speed of SDRAM without increasing the clock frequency: it allows data to be read on both the rising edge and the falling edge of the clock pulse, so DDR is twice as fast as standard SDRAM. In one embodiment, the memory device may include 4 groups of memory cells, and each group of memory cells may include a plurality of DDR4 chips (particles). In one embodiment, the chip may internally include four 72-bit DDR4 controllers; in each 72-bit DDR4 controller, 64 bits are used for data transfer and 8 bits are used for ECC checking. It can be understood that when DDR4-3200 chips are used in each group of memory cells, the theoretical data transfer bandwidth can reach 25600 MB/s.
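The 25600 MB/s figure follows from the DDR4-3200 transfer rate and the 64-bit data path of one controller. The small sketch below reproduces that arithmetic; the function name `ddr_bandwidth_mb_per_s` is hypothetical, and the calculation assumes that only the 64 data bits count toward bandwidth while the 8 ECC bits are excluded.

```python
def ddr_bandwidth_mb_per_s(mega_transfers_per_s, data_bits):
    """Theoretical peak bandwidth in MB/s for one memory channel.

    mega_transfers_per_s: transfer rate in MT/s (3200 for DDR4-3200)
    data_bits: width of the data path in bits, excluding ECC bits
    """
    return mega_transfers_per_s * data_bits // 8


if __name__ == "__main__":
    # One 72-bit controller: 64 data bits + 8 ECC bits
    print(ddr_bandwidth_mb_per_s(3200, 64))
```

The value applies to a single group of memory cells behind one controller; it is a theoretical peak, and sustained bandwidth in practice is lower.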

In one embodiment, each group of memory cells comprises a plurality of double data rate synchronous dynamic random access memories (DDR SDRAMs) arranged in parallel. DDR can transmit data twice in one clock cycle. A controller for controlling the DDR is arranged in the chip and is used for controlling the data transmission and data storage of each group of memory cells.

Preferably, when a PCIE 3.0 x16 interface is adopted for transmission, the theoretical bandwidth can reach 16000 MB/s. In another embodiment, the receiving device may also be another type of interface; the application does not limit the specific form of such other interfaces, as long as the interface unit can realize the switching function. In addition, the calculation result of the chip is still transmitted back to the external device (such as a server) by the receiving device.
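The 16000 MB/s figure can be reproduced from the PCIe 3.0 signaling rate of 8 GT/s per lane across 16 lanes. The sketch below is illustrative only; the function name and its parameters are hypothetical. It also shows the slightly lower figure obtained when the 128b/130b line encoding used by PCIe 3.0 is taken into account, which the text above does not do.

```python
def pcie_bandwidth_mb_per_s(gt_per_s, lanes, payload_bits=1, line_bits=1):
    """Theoretical one-direction PCIe bandwidth in MB/s.

    gt_per_s: per-lane signaling rate in GT/s (8 for PCIe 3.0)
    lanes: number of lanes (16 for an x16 link)
    payload_bits / line_bits: line-encoding efficiency, e.g. 128 / 130;
    the default of 1 / 1 ignores encoding overhead.
    """
    raw = gt_per_s * 1000 * lanes / 8  # one bit per transfer per lane
    return raw * payload_bits / line_bits


if __name__ == "__main__":
    print(pcie_bandwidth_mb_per_s(8, 16))            # raw figure
    print(pcie_bandwidth_mb_per_s(8, 16, 128, 130))  # after 128b/130b encoding
```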

The control device is electrically connected with the chip and is used for monitoring the state of the chip. Specifically, the chip and the control device may be electrically connected through an SPI interface. The control device may include a microcontroller unit (MCU). The chip may include a plurality of processing chips, a plurality of processing cores, or a plurality of processing circuits, and may drive a plurality of loads. Therefore, the chip can be in different working states such as heavy load and light load. The control device can regulate the working states of the plurality of processing chips, the plurality of processing cores, and/or the plurality of processing circuits in the chip.

In one embodiment, an electronic device including the above board card is provided.

The electronic device may be a multiplier, a robot, a computer, a printer, a scanner, a tablet, a smart terminal, a cell phone, a tachograph, a navigator, a sensor, a camera, a server, a cloud server, a camcorder, a projector, a watch, a headset, a mobile storage device, a wearable device, a vehicle, a household appliance, and/or a medical device.

The vehicle includes an airplane, a ship and/or a motor vehicle; the household appliance includes a television, an air conditioner, a microwave oven, a refrigerator, an electric cooker, a humidifier, a washing machine, an electric lamp, a gas stove and/or a range hood; the medical device includes a nuclear magnetic resonance apparatus, a B-mode ultrasound apparatus and/or an electrocardiograph.

It should be noted that, for the sake of simplicity, the foregoing method embodiments are all described as a series of circuit combinations, but those skilled in the art should understand that the present application is not limited by the described circuit combinations, because some circuits may be implemented in other ways or structures according to the present application.

In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.

The above-mentioned embodiments express only several embodiments of the present application, and their description is relatively specific and detailed, but they should not be construed as limiting the scope of the present application. It should be noted that a person skilled in the art can make several variations and modifications without departing from the concept of the present application, all of which fall within the scope of protection of the present application. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims

1. A multiplier, characterized in that the multiplier comprises a regular signed number coding circuit, a partial product acquisition circuit and a correction accumulation circuit, wherein the output end of the regular signed number coding circuit is connected with the input end of the partial product acquisition circuit, the output end of the partial product acquisition circuit is connected with the input end of the correction accumulation circuit, the partial product acquisition circuit comprises an original partial product acquisition unit and a logic unit, and the correction accumulation circuit comprises a correction Wallace tree group sub-circuit and an accumulation sub-circuit;

the regular signed number coding circuit is used for carrying out regular signed number coding on received data to obtain a target code, the original partial product acquisition unit is used for obtaining original partial products according to the target code, the logic unit is used for carrying out logic operation processing on the highest-order values of the original partial products to obtain partial products with sign bit extension eliminated, the correction Wallace tree group sub-circuit is used for carrying out accumulation correction processing on the partial products with sign bit extension eliminated, and the accumulation sub-circuit is used for carrying out accumulation processing on the accumulation correction operation results.
2. The multiplier of claim 1, wherein the regular signed number coding circuit comprises a data input port for receiving first data to be subjected to regular signed number coding and a target code output port for outputting a target code obtained by performing regular signed number coding on the first data.
3. The multiplier of claim 1 or 2, wherein the partial product acquisition circuit comprises an AND logic circuit.
4. The multiplier of claim 1, wherein the correction Wallace tree group sub-circuit comprises Wallace tree units configured to perform accumulation correction processing on each column of values of the partial products with sign bit extension eliminated.
5. The multiplier of claim 1, wherein the accumulation sub-circuit comprises an adder, the adder being used for performing addition on the accumulation correction operation results.
6. A machine learning arithmetic device, characterized in that the machine learning arithmetic device comprises one or more multipliers as claimed in any one of claims 1-5, and is used for obtaining input data to be operated on and control information from other processing devices in the machine learning arithmetic device other than the multipliers, executing the designated machine learning operation, and transmitting the execution result through an I/O interface to the other processing devices in the machine learning arithmetic device other than the multipliers;

when the machine learning arithmetic device comprises a plurality of multipliers, the multipliers are connected through a preset structure and transmit data;

the multipliers are interconnected through a PCIE bus and transmit data to support larger-scale machine learning operations; the multipliers share the same control system or have their own control systems; the multipliers share a memory or have their own memories; and the interconnection mode of the multipliers is any interconnection topology.
7. A combined processing device, characterized in that the combined processing device comprises the machine learning arithmetic device of claim 6, a universal interconnection interface, and other processing devices in the combined processing device other than the machine learning arithmetic device;

and the machine learning arithmetic device interacts with other processing devices except the machine learning arithmetic device in the combined processing device to jointly complete the calculation operation designated by the user.
8. The combined processing device according to claim 7, further comprising a storage device, the storage device being connected respectively to the machine learning arithmetic device and to the other processing devices in the combined processing device other than the machine learning arithmetic device and the storage device, and being used for storing data of the machine learning arithmetic device and of those other processing devices.
9. A neural network chip, characterized in that the neural network chip comprises the machine learning arithmetic device of claim 6, the combined processing device of claim 7, or the combined processing device of claim 8.
10. An electronic device, characterized in that the electronic device comprises the neural network chip according to claim 9.