CN209895329U - Multiplier and method for generating a digital signal - Google Patents

Publication number
CN209895329U
Authority
CN
China
Prior art keywords
circuit
multiplier
data
sub
partial product
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201921433513.6U
Other languages
Chinese (zh)
Inventor
Inventor not disclosed
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Cambricon Information Technology Co Ltd
Original Assignee
Shanghai Cambricon Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Cambricon Information Technology Co Ltd
Priority to CN201921433513.6U
Application granted
Publication of CN209895329U

Landscapes

  • Complex Calculations (AREA)

Abstract

The present application provides a multiplier comprising a multiplication operation circuit, a register control circuit, a register circuit, a state control circuit and a selection circuit. The multiplication operation circuit comprises a regular signed number coding sub-circuit and an accumulation sub-circuit. The output end of the regular signed number coding sub-circuit is connected with the input end of the accumulation sub-circuit, the output end of the accumulation sub-circuit is connected with the first input end of the register control circuit, the output end of the register control circuit is connected with the input end of the register circuit, the output end of the register circuit is connected with the first input end of the selection circuit, the first output end of the state control circuit is connected with the second input end of the register control circuit, and the second output end of the state control circuit is connected with the second input end of the selection circuit. The multiplier can perform regular signed number coding on received data, so that fewer effective partial products are obtained and the complexity of the multiplier in realizing multiplication is reduced.

Description

Multiplier and method for generating a digital signal
Technical Field
The present application relates to the field of computer technologies, and in particular, to a multiplier.
Background
With the continuous development of digital electronic technology, the rapid development of various Artificial Intelligence (AI) chips places ever higher requirements on high-performance digital multipliers. Neural network algorithms are among the algorithms most widely used by intelligent chips, and multiplication performed by a multiplier is a common operation in them.
At present, a multiplier takes every three bits of the multiplier operand as one code, obtains partial products from the codes and the multiplicand, and compresses all the partial products with a Wallace tree to obtain a target operation result. However, in this conventional technique the codes contain many non-zero values, so many corresponding partial products are generated, and the complexity of the multiplier in realizing multiplication is high.
Summary of the Utility Model
In view of the above, there is a need to provide a multiplier that can reduce the number of effective partial products obtained during multiplication to reduce the complexity of multiplication of the multiplier.
An embodiment of the present application provides a multiplier, including: a multiplication operation circuit, a register control circuit, a register circuit, a state control circuit and a selection circuit. The multiplication operation circuit comprises a regular signed number coding sub-circuit and an accumulation sub-circuit, the output end of the regular signed number coding sub-circuit is connected with the input end of the accumulation sub-circuit, the output end of the accumulation sub-circuit is connected with the first input end of the register control circuit, the output end of the register control circuit is connected with the input end of the register circuit, the output end of the register circuit is connected with the first input end of the selection circuit, the first output end of the state control circuit is connected with the second input end of the register control circuit, and the second output end of the state control circuit is connected with the second input end of the selection circuit.
In one embodiment, the regular signed number coding sub-circuit includes a regular signed number coding unit and a partial product obtaining unit. The regular signed number coding unit is configured to receive first data and perform regular signed number coding processing on the first data to obtain a target code. The partial product obtaining unit is configured to receive second data, obtain an original partial product according to the target code and the second data, and obtain a partial product of the target code according to the original partial product. The accumulation sub-circuit is configured to perform accumulation processing on the partial products of the target code to obtain a multiplication result. The state control circuit is configured to obtain a storage indication signal and a read indication signal. The register control circuit is configured to determine, according to the storage indication signal input by the state control circuit, the register circuit in which the multiplication result is stored; the register circuit is configured to store the multiplication result; and the selection circuit is configured to read, according to the received read indication signal, data in the multiplication result stored in the register circuit as a target operation result.
In one embodiment, the regular signed number coding unit may include: a data input port and a target code output port; the data input port is used for receiving the first data to be subjected to regular signed number coding processing, and the target code output port is used for outputting the target code obtained after the first data is subjected to regular signed number coding processing.
In one embodiment, the partial product obtaining unit is specifically configured to perform conversion processing on the target code to obtain an original partial product, perform sign bit extension processing on the original partial product to obtain a sign bit extended partial product, and obtain the partial product of the target code according to the sign bit extended partial product.
In one embodiment, the partial product obtaining unit includes: a target code input port, a second data input port, and a partial product output port; the target code input port is configured to receive the target code, the second data input port is configured to receive the second data, and the partial product output port is configured to output a partial product of the target code.
In one embodiment, the accumulation sub-circuit comprises: a Wallace tree group unit and an accumulation unit; the output end of the Wallace tree group unit is connected with the input end of the accumulation unit; the Wallace tree group unit is used for accumulating the partial product of the target code to obtain an accumulation operation result, and the accumulation unit is used for accumulating the accumulation operation result.
In one embodiment, the wallace tree group unit includes: a Wallace tree subunit to accumulate each column of values in the partial products of all target codes.
In one embodiment, the accumulation unit includes: an adder, which is used for adding the received accumulation operation result.
In one embodiment, the adder comprises: a carry signal input port, a sum signal input port and a result output port; the carry signal input port is used for receiving a carry signal, the sum signal input port is used for receiving a sum signal, and the result output port is used for outputting a result of accumulation processing of the carry signal and the sum signal.
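As an illustrative sketch only, and not the circuit disclosed here, the accumulation sub-circuit can be modeled in Python as a carry-save reduction followed by one final addition of a sum signal and a carry signal. The function names `carry_save_add`, `wallace_reduce` and `accumulate` are hypothetical, and the model compresses whole words instead of using one Wallace tree subunit per column, which is a simplifying assumption.

```python
def carry_save_add(a: int, b: int, c: int, mask: int) -> tuple[int, int]:
    """3:2 compressor on whole words: returns (sum word, carry word)."""
    sum_word = a ^ b ^ c
    carry_word = (((a & b) | (a & c) | (b & c)) << 1) & mask
    return sum_word, carry_word


def wallace_reduce(rows: list[int], width: int) -> tuple[int, int]:
    """Compress any number of rows down to a sum signal and a carry signal."""
    mask = (1 << width) - 1
    rows = [r & mask for r in rows]
    while len(rows) > 2:
        next_rows = []
        # Compress every full group of three rows into two rows.
        for i in range(0, len(rows) - 2, 3):
            next_rows.extend(carry_save_add(rows[i], rows[i + 1], rows[i + 2], mask))
        # Rows that did not fit into a group of three pass through unchanged.
        next_rows.extend(rows[len(rows) - len(rows) % 3:])
        rows = next_rows
    while len(rows) < 2:
        rows.append(0)
    return rows[0], rows[1]


def accumulate(rows: list[int], width: int) -> int:
    """Final adder: adds the sum signal and the carry signal (modulo 2**width)."""
    sum_signal, carry_signal = wallace_reduce(rows, width)
    return (sum_signal + carry_signal) & ((1 << width) - 1)
```

The result of `accumulate` equals the ordinary sum of the rows modulo 2**width, so only the last step needs a carry-propagating adder.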
In one embodiment, the register circuit comprises: a register sub-circuit, which is used for storing the multiplication results corresponding to different storage indication signals.
An embodiment of the present application provides a multiplier, including: a multiplication operation circuit and a revolution circuit (that is, a circuit that converts the number format of a result), wherein the multiplication operation circuit comprises a regular signed number coding sub-circuit and an accumulation sub-circuit, the output end of the regular signed number coding sub-circuit is connected with the input end of the accumulation sub-circuit, the output end of the accumulation sub-circuit is connected with the input end of the revolution circuit, and the revolution circuit comprises a first conversion sub-circuit and a second conversion sub-circuit;
the regular signed number coding sub-circuit is used for performing regular signed number coding processing on received data to obtain a target code and obtaining a partial product of the target code according to the target code, the accumulation sub-circuit is used for performing accumulation processing on the partial product of the target code to obtain a multiplication result, and the first conversion sub-circuit and the second conversion sub-circuit are respectively used for performing revolution processing on the multiplication result to obtain a target operation result.
In one embodiment, the revolution circuit comprises an input port for receiving a data conversion signal; the data conversion signal is used for determining the data conversion type processed by the revolution circuit.
In one embodiment, the first conversion sub-circuit is specifically configured to convert the result of the multiplication operation into the target operation result of a floating-point type, and the second conversion sub-circuit is specifically configured to convert the result of the multiplication operation into the target operation result of a fixed-point type.
In the multiplier provided by this embodiment, the regular signed number encoding of the received data can be performed by the regular signed number encoding sub-circuit, and the number of the obtained effective partial products is small, thereby reducing the complexity of the multiplier in realizing multiplication.
The machine learning arithmetic device provided by the embodiment of the application comprises one or more multipliers; the machine learning arithmetic device is used for acquiring data to be operated and control information from other processing devices, executing specified machine learning arithmetic and transmitting an execution result to other processing devices through an I/O interface;
when the machine learning arithmetic device comprises a plurality of multipliers, the plurality of multipliers are connected through a preset specific structure and transmit data;
the multipliers are interconnected through a PCIE bus and transmit data so as to support larger-scale machine learning operation; a plurality of multipliers share the same control system or own respective control systems; a plurality of multipliers share a memory or own respective memories; the interconnection mode of a plurality of multipliers is any interconnection topology.
The combined processing device provided by the embodiment of the application comprises the machine learning arithmetic device, a universal interconnection interface and other processing devices; the machine learning arithmetic device interacts with the other processing devices to jointly complete the operation designated by the user; the combined processing device may further include a storage device, which is connected to the machine learning arithmetic device and the other processing devices, respectively, and is configured to store data of the machine learning arithmetic device and the other processing devices.
The neural network chip provided by the embodiment of the application comprises the multiplier, the machine learning arithmetic device or the combined processing device.
The neural network chip packaging structure provided by the embodiment of the application comprises the neural network chip.
The board card provided by the embodiment of the application comprises the neural network chip packaging structure.
The embodiment of the application provides an electronic device, which comprises the neural network chip or the board card.
The chip provided by the embodiment of the application comprises at least one multiplier as described in any one of the above.
An electronic device provided by the embodiment of the application comprises the chip.
Drawings
FIG. 1 is a schematic structural diagram of a multiplier according to an embodiment;
FIG. 2 is a schematic diagram of a multiplier according to another embodiment;
FIG. 3 is a schematic diagram illustrating a distribution rule of partial products of 9 target codes according to another embodiment;
FIG. 4 is a diagram of a specific circuit structure of an accumulation circuit for 8-bit data operation according to another embodiment;
FIG. 5 is a flowchart illustrating a data processing method according to an embodiment;
FIG. 6 is a flow chart illustrating another data processing method according to another embodiment;
FIG. 7 is a block diagram of a combined processing device according to an embodiment;
FIG. 8 is a block diagram of another combined processing device according to an embodiment;
FIG. 9 is a schematic structural diagram of a board card according to an embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
The multiplier provided by the application can be applied to an AI chip, a Field-Programmable Gate Array (FPGA) chip, or other hardware circuit devices for multiplication operation processing, and its specific structural schematic diagrams are shown in FIG. 1 and FIG. 2.
FIG. 1 is a schematic structural diagram of a multiplier according to an embodiment. The multiplier includes: a multiplication operation circuit 11, a register control circuit 12, a register circuit 13, a state control circuit 14, and a selection circuit 15. The multiplication operation circuit 11 comprises a regular signed number coding sub-circuit 111 and an accumulation sub-circuit 112. The output of the regular signed number coding sub-circuit 111 is connected to the input of the accumulation sub-circuit 112, the output of the accumulation sub-circuit 112 is connected to a first input of the register control circuit 12, the output of the register control circuit 12 is connected to the input of the register circuit 13, the output of the register circuit 13 is connected to a first input of the selection circuit 15, a first output of the state control circuit 14 is connected to a second input of the register control circuit 12, and a second output of the state control circuit 14 is connected to a second input of the selection circuit 15.
The regular signed number coding sub-circuit 111 includes a regular signed number coding unit 1111 and a partial product obtaining unit 1112. The regular signed number coding unit 1111 is configured to receive first data and perform regular signed number coding processing on the first data to obtain a target code. The partial product obtaining unit 1112 is configured to receive second data, obtain an original partial product according to the target code and the second data, and obtain a partial product of the target code according to the original partial product. The accumulation sub-circuit 112 is configured to perform accumulation processing on the partial products of the target code to obtain a multiplication result. The state control circuit 14 is configured to obtain a storage indication signal and a read indication signal. The register control circuit 12 is configured to determine, according to the storage indication signal input by the state control circuit 14, the register circuit 13 in which the multiplication result is stored; the register circuit 13 is configured to store the multiplication result; and the selection circuit 15 is configured to read, according to the received read indication signal, data in the multiplication result stored in the register circuit 13 as a target operation result.
Specifically, the regular signed number coding sub-circuit 111 may perform regular signed number coding on the received first data through the regular signed number coding unit 1111 to obtain the target code, where the first data may be the multiplier in the multiplication. Optionally, the partial product obtaining unit 1112 may obtain an original partial product according to the received second data and the target code, and obtain a partial product of the target code according to the original partial product, where the second data may be the multiplicand in the multiplication. The multiplier and the multiplicand may be fixed-point numbers with the same bit width. Optionally, the register circuit 13 may include a plurality of storage units. Optionally, the bit width of the multiplication result may be equal to 2 times the bit width of the data received by the regular signed number coding sub-circuit 111. Optionally, the regular signed number coding sub-circuit 111 may process data with a fixed bit width, and the bit width of the data it receives may be equal to the bit width of an input port of the multiplier; in addition, in this embodiment, the bit width of an output port of the multiplier may be less than 2 times the bit width of the input port. Optionally, the selection circuit 15 may have a plurality of input ports, each of which may have a different function, and one output port. Optionally, the bit width of the target operation result may be equal to 1/2 of the bit width of the multiplication result, which is not limited in this embodiment. In this embodiment, it can be further understood that the bit width of the target operation result may be less than 2 times the bit width of the multiplication result.
Optionally, the number of the target codes may be equal to the number of partial products of the target codes, and each target code may take one of three values: -1, 0, and 1.
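The following Python sketch illustrates one way to obtain target codes with the values -1, 0 and 1. It assumes that "regular signed number coding" corresponds to canonical signed digit (non-adjacent form) recoding, which this passage does not spell out; the function name `csd_encode` and the choice of width + 1 digits are likewise assumptions made for illustration.

```python
def csd_encode(value: int, width: int) -> list[int]:
    """Recode a signed `width`-bit integer into width + 1 digits in {-1, 0, 1}.

    Digits are returned least significant first, and no two adjacent
    digits are both non-zero (non-adjacent form).
    """
    low, high = -(1 << (width - 1)), (1 << (width - 1)) - 1
    if not low <= value <= high:
        raise ValueError("value does not fit in the given signed width")
    digits = []
    remaining = value
    while remaining != 0:
        if remaining & 1:
            # remaining mod 4 == 1 gives +1, remaining mod 4 == 3 gives -1.
            digit = 2 - (remaining & 3)
            remaining -= digit
        else:
            digit = 0
        digits.append(digit)
        remaining >>= 1
    # Pad with zeros so every input yields the same number of target codes.
    digits += [0] * (width + 1 - len(digits))
    return digits


def nonzero_count(digits: list[int]) -> int:
    """Number of target codes that produce a non-zero (effective) partial product."""
    return sum(1 for d in digits if d != 0)
```

For example, the value 7 (binary 0111, three non-zero bits) is recoded as 8 - 1, which needs only two non-zero digits and therefore only two effective partial products.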
It should be noted that the state control circuit 14 may automatically obtain the corresponding storage indication signal each time the accumulation sub-circuit 112 obtains a multiplication result. For example, when the accumulation sub-circuit 112 obtains the first multiplication result, the storage indication signal obtained by the state control circuit 14 may be 1; when the accumulation sub-circuit 112 obtains the second multiplication result, the storage indication signal may be 2; and so on, so that for each new multiplication result the value of the storage indication signal is the value corresponding to the previous multiplication result plus 1. Optionally, when a multiplication result exists in the register circuit 13, the state control circuit 14 may also automatically obtain a read indication signal corresponding to the current clock cycle number, where the state control circuit 14 may obtain the current clock cycle number automatically or may receive the clock cycle number transmitted by an external device.
For example, in the first clock cycle after the register circuit 13 stores the first multiplication result, the read indication signal obtained by the state control circuit 14 may be 1, and the selection circuit 15 may read a part of the data of the first multiplication result stored in the register circuit 13. In the second clock cycle, the read indication signal may be 2, and the selection circuit 15 may read the remaining part of the first multiplication result. In other words, the multiplier may output one multiplication result over two clock cycles. However, if the second multiplication result is only obtained five clock cycles after the first, the register circuit 13 can store the second multiplication result only in the sixth clock cycle, at which point the read indication signal obtained by the state control circuit 14 may be 3. The value of the read indication signal may be determined according to the number of data items stored in the register circuit 13.
In addition, the multiplication result obtained by the accumulation sub-circuit 112 is not output directly as the target operation result of the multiplier. Instead, the target operation result is obtained by splicing the two operation results that the multiplier outputs in two successive reads: the operation result output by the selection circuit 15 the first time is spliced with the operation result it outputs the second time, which gives the target operation result of each multiplication. The multiplication circuit 11 may thus output one target operation result over a plurality of clock cycles.
It should be noted that the multiplier may receive, through the register control circuit 12, the multiplication result output by the accumulation sub-circuit 112 for each multiplication, and determine the storage unit that stores each multiplication result according to the received storage indication signal. Optionally, the selection circuit 15 may determine which data to read from the multiplication result stored in the corresponding register circuit 13 according to the different read indication signals received. Optionally, if the bit width of the input port of the multiplier is N and the bit width of the received data is also N, the bit width M of the output port of the multiplier may be equal to 2N/t + deta, with (2N/t + deta) < 2N, where deta (deta ≥ 0) is a constant. In the normal case, the multiplication circuit 11 may complete one multiplication over t (t > 1) clock cycles to obtain a multiplication result, and the multiplication result obtained by the accumulation sub-circuit 112 in the multiplication circuit 11 is stored into the register circuit 13. In addition, with a small probability, the multiplier may complete one multiplication over m (m < t, and m ≤ 1) clock cycles to obtain a multiplication result, and store the multiplication result obtained by the accumulation sub-circuit 112 in the multiplication circuit 11 into the register circuit 13.
Optionally, the selection circuit 15 may read data in the multiplication result stored in the register circuit 13 twice, where the bit width of the multiplication result may be equal to 2N and the bit width of the data read each time may be equal to N. The selection circuit 15 may read the high N bits and the low N bits of the same multiplication result in two reads as the two operation results, and concatenate the two operation results to obtain the target operation result of the multiplication performed by the multiplier.
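The storing and two-step reading described above can be sketched behaviorally in Python. This is a simplified model under stated assumptions, not the disclosed circuit: the class `ResultBuffer` and the function `splice` are hypothetical names, and the mapping "odd read indication signal selects the high N bits, even selects the low N bits" is an assumption, since the text does not fix which half is read first.

```python
class ResultBuffer:
    """Toy model of the register control, register and selection circuits."""

    def __init__(self, n: int):
        self.n = n                  # bit width N of the multiplier inputs
        self.units = {}             # storage units keyed by storage indication signal
        self.store_signal = 0       # incremented by 1 for every multiplication result

    def store(self, result: int) -> int:
        """Store a 2N-bit multiplication result; return its storage indication signal."""
        self.store_signal += 1
        self.units[self.store_signal] = result & ((1 << (2 * self.n)) - 1)
        return self.store_signal

    def read(self, read_signal: int) -> int:
        """Return N bits of a stored result selected by the read indication signal."""
        unit = (read_signal + 1) // 2          # signals 1, 2 -> unit 1; 3, 4 -> unit 2
        word = self.units[unit]
        if read_signal % 2 == 1:               # assumed: odd signal reads the high half
            return word >> self.n
        return word & ((1 << self.n) - 1)      # assumed: even signal reads the low half


def splice(high: int, low: int, n: int) -> int:
    """Concatenate two N-bit reads into the 2N-bit target operation result."""
    return (high << n) | low
```

With N = 8 the output port in this model is 8 bits wide, which matches M = 2N/t + deta for t = 2 and deta = 0.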
In addition, in this embodiment, it is understood that the partial product obtaining unit 1112 may obtain a sign-bit-extended partial product according to the original partial product, and obtain the partial product of the target code according to the sign-bit-extended partial product. Optionally, the bit width of the sign-bit-extended partial product may be equal to 2 times the bit width N of the data received by the multiplier, and the bit width of the original partial product may be equal to the bit width N of the data received by the multiplier. Optionally, the upper N bits of the sign-bit-extended partial product may all be equal to the highest bit of the original partial product, that is, its sign bit. In other words, the upper N + 1 bits of the sign-bit-extended partial product are all equal to the sign bit of the original partial product, and the lower N - 1 bits are equal to the lower N - 1 bits of the original partial product.
For example, if the multiplier currently processes an 8-bit by 8-bit fixed-point multiplication and an original partial product obtained by the partial product obtaining unit 1112 is "p7 p6 p5 p4 p3 p2 p1 p0", then sign bit extension processing is carried out on the original partial product, and the resulting sign-bit-extended partial product can be expressed as "p7 p7 p7 p7 p7 p7 p7 p7 p7 p6 p5 p4 p3 p2 p1 p0".
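The 8-bit example can be checked with a short Python sketch. The function name `sign_extend` is hypothetical; the sketch simply copies the sign bit of an N-bit original partial product into the upper N bits of a 2N-bit word.

```python
def sign_extend(partial: int, n: int) -> int:
    """Extend an n-bit two's-complement partial product to 2n bits."""
    mask = (1 << n) - 1
    partial &= mask
    sign = (partial >> (n - 1)) & 1        # highest bit of the original partial product
    upper = mask if sign else 0            # n copies of the sign bit
    return (upper << n) | partial
```

For N = 8 the result has nine leading copies of p7 (eight added bits plus the original sign bit) followed by p6 through p0, as in the example above.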
It can also be understood that, in the distribution rule of the partial products of all the target codes, each target code has a corresponding sign-bit-extended partial product. The partial product of the first target code is the first sign-bit-extended partial product. Starting from the second target code, each sign-bit-extended partial product is shifted left by one bit relative to the partial product of the previous target code, while the highest bit of every partial product stays in the same column as the highest bit of the partial product of the first target code. Equivalently, the high-order bits that the left shift pushes beyond that column take no part in the addition.
In the multiplier provided by this embodiment, the regular signed number coding sub-circuit performs regular signed number coding processing on received data to obtain a target code and obtains the partial products of the target code from it. The accumulation sub-circuit accumulates the sign-bit-extended partial products to obtain a multiplication result. The state control circuit obtains a storage indication signal and a read indication signal, the register control circuit determines, according to the storage indication signal, the register circuit that stores the multiplication result, and the selection circuit reads data in the stored multiplication result from the register circuit according to the read indication signal to obtain a target operation result. Because the received data is coded as regular signed numbers, the number of effective partial products obtained in the multiplication process is reduced, which reduces the complexity of the multiplier in realizing multiplication; meanwhile, the multiplier can improve the operation efficiency of multiplication and effectively reduce its power consumption.
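The distribution rule can be sketched end to end in Python under several assumptions: the target codes are taken to be canonical signed digits (an assumption about what "regular signed number coding" means), N + 1 target codes are generated per operand, and each partial product is formed directly at 2N bits so that negating the most negative multiplicand causes no overflow. The names `csd_digits`, `partial_products` and `multiply` are hypothetical.

```python
def csd_digits(value: int, count: int) -> list[int]:
    """Return `count` signed digits in {-1, 0, 1}, least significant first."""
    digits = []
    for _ in range(count):
        if value & 1:
            digit = 2 - (value & 3)    # +1 if value mod 4 == 1, -1 if value mod 4 == 3
            value -= digit
        else:
            digit = 0
        digits.append(digit)
        value >>= 1
    return digits


def partial_products(multiplier: int, multiplicand: int, n: int) -> list[int]:
    """One 2n-bit partial product per target code.

    Row i is the sign-extended partial product shifted left by i bits;
    bits shifted beyond column 2n - 1 are dropped by the mask.
    """
    mask = (1 << (2 * n)) - 1
    codes = csd_digits(multiplier, n + 1)
    return [((code * multiplicand) << i) & mask for i, code in enumerate(codes)]


def multiply(multiplier: int, multiplicand: int, n: int) -> int:
    """Accumulate the partial products and interpret the 2n-bit sum as signed."""
    width = 2 * n
    total = sum(partial_products(multiplier, multiplicand, n)) & ((1 << width) - 1)
    if total >> (width - 1):
        total -= 1 << width
    return total
```

Rows whose target code is 0 are all-zero, so only the non-zero codes contribute effective partial products to the accumulation.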
FIG. 2 is a schematic diagram of a specific structure of a multiplier according to an embodiment. The multiplier includes: a multiplication circuit 21 and a revolution circuit 22, wherein the multiplication circuit 21 comprises a regular signed number coding sub-circuit 211 and an accumulation sub-circuit 212, an output end of the regular signed number coding sub-circuit 211 is connected with an input end of the accumulation sub-circuit 212, an output end of the accumulation sub-circuit 212 is connected with an input end of the revolution circuit 22, and the revolution circuit 22 comprises a first conversion sub-circuit 221 and a second conversion sub-circuit 222. The regular signed number coding sub-circuit 211 is configured to perform regular signed number coding processing on received data to obtain a target code and obtain a partial product of the target code according to the target code, the accumulation sub-circuit 212 is configured to perform accumulation processing on the partial products of the target code to obtain a multiplication result, and the first conversion sub-circuit 221 and the second conversion sub-circuit 222 are respectively configured to perform revolution processing on the multiplication result to obtain a target operation result.
Optionally, the regular signed number encoding sub-circuit 211 includes a regular signed number encoding unit 2111 and a partial product obtaining unit 2112, where the regular signed number encoding unit 2111 is configured to receive first data and perform the regular signed number encoding processing on the first data to obtain the target code, and the partial product obtaining unit 2112 is configured to receive second data, obtain an original partial product according to the target code and the second data, and obtain a partial product of the target code according to the original partial product.
Specifically, the regular signed number encoding sub-circuit 211 may perform regular signed number encoding processing on received data, where the data may be a multiplier and a multiplicand in a multiplication operation, and the multiplier and the multiplicand may be fixed-point numbers with the same bit width. Optionally, the regular signed number encoding sub-circuit 211 may include a plurality of data processing sub-circuits with different functions, one or more input ports of the plurality of data processing sub-circuits with different functions may be provided, the function of each input port in each data processing sub-circuit may be different, one output port may be provided, the function of each output port in each data processing sub-circuit may be different, and the circuit structures of the data processing sub-circuits with different functions may be different. Optionally, the revolution circuit 22 may convert the multiplication result output by the accumulation sub-circuit 212 into data in a target format, that is, a target calculation result, where the multiplication result may be a fixed-point number, and the data in the target format may be the fixed-point number or a floating-point number, and in addition, a bit width of the data in the target format may be less than 2 times a bit width of the multiplication result. Alternatively, the target operation result may be a part of data in the multiplication result. Optionally, the bit width of the target operation result may be equal to 1/2 of the bit width of the multiplication result, and may also be equal to 1/4 of the bit width of the multiplication result, which is not limited in this embodiment. In this embodiment, it can also be understood that the bit width of the target operation result is less than 2 times the bit width of the multiplication result. 
The multiplication result obtained by the accumulation sub-circuit 212 is not the target calculation result obtained by the multiplier performing multiplication, but only part of the data in the target calculation result. Optionally, the number of the target codes may be equal to the number of partial products of the target codes, and each target code may take one of three values: -1, 0, and 1.
It should be noted that, the regular signed number encoding sub-circuit 211 may perform multiplication processing on data with a fixed bit width, and a bit width of the data received by the regular signed number encoding sub-circuit 211 may be equal to a bit width of an input port of the multiplier, and in addition, in this embodiment, a bit width of an output port of the multiplier may be less than 2 times of a bit width of the input port.
Optionally, the number conversion circuit 22 includes an input port for receiving a data conversion signal. Optionally, the data conversion signal is used to determine the type of data conversion performed by the number conversion circuit 22.
Optionally, the data conversion signal may take various forms, and the number conversion circuit 22 may convert the received data into data in the target format according to the data conversion signal. Optionally, the data conversion types may include fixed-point number to fixed-point number, and fixed-point number to floating-point number. For example, if the bit widths of the input port and the output port of the multiplier are both N, the multiplier may obtain a multiplication result with a bit width of 2N, and the multiplier may convert the multiplication result with a bit width of 2N into a target operation result with a bit width of N through the number conversion circuit 22, where the target operation result may be a floating-point number. In addition, the multiplier may convert the multiplication result with a bit width of 2N into a fixed-point number with a bit width of N through the number conversion circuit 22, that is, the target operation result. In this embodiment, the circuit structure and the function of the regular signed number encoding sub-circuit 211 are the same as those of the regular signed number encoding sub-circuit 111, and the detailed structure of the regular signed number encoding sub-circuit 211 is not repeated herein.
In the multiplier provided by this embodiment, the regular signed number coding sub-circuit can be used to perform regular signed number coding processing on received data, so as to reduce the number of effective partial products obtained in the multiplication process, thereby reducing the complexity of the multiplier in realizing multiplication; meanwhile, the multiplier can improve the operation efficiency of multiplication operation and effectively reduce the power consumption of the multiplier.
As one embodiment, the regular signed number encoding unit 1111 may include: a first data input port 1111a and a target encoding output port 1111b; the first data input port 1111a is configured to receive the first data to be subjected to regular signed number encoding, and the target encoding output port 1111b is configured to output the target code obtained by performing regular signed number encoding on the first data.
Specifically, the first data received by the first data input port 1111a of the regular signed number encoding unit 1111 may be a multiplier in a multiplication operation, and the multiplier may be a fixed-point number. Optionally, the second data received by the partial product obtaining unit 1112 may be a multiplicand in a multiplication operation, the multiplicand may be a fixed-point number, and the multiplier and the multiplicand may be data with the same bit width. Optionally, the number of target codes may be equal to the number of original partial products and to the number of partial products of the target codes.
It should be noted that the regular signed number encoding process can be characterized as follows: for an N-bit multiplier, the bit values are processed from the lower bits to the higher bits. If there is a run of $l$ ($l \ge 2$) consecutive bit values 1, the $l$ consecutive bit values 1 can be converted into the data "$1(0)^{l-1}(-1)$", and the remaining $(N-l)$ bit values are combined with the converted $(l+1)$ bit values to obtain new data. The new data is then used as the initial data of the next stage of conversion processing, until no run of $l$ ($l \ge 2$) consecutive bit values 1 remains in the new data obtained after the conversion processing. After the N-bit multiplier is subjected to regular signed number encoding processing, the bit width of the obtained target code can be equal to $(N+1)$. Further, in the regular signed number encoding process, the data 11 can be converted into (100 - 001), that is, the data 11 can be equivalently converted into 10(-1); the data 111 can be converted into (1000 - 0001), that is, the data 111 can be equivalently converted into 100(-1); and so on, the conversion of any other run of $l$ ($l \ge 2$) consecutive bit values 1 being similar.
For example, the multiplier received by the regular signed number encoding unit 1111 is "001010101101110". The first new data, obtained by performing the first-stage conversion processing on the multiplier, is "0010101011100(-1)0"; the second new data, obtained by performing the second-stage conversion processing on the first new data, is "0010101100(-1)00(-1)0"; the third new data, obtained by performing the third-stage conversion processing on the second new data, is "0010110(-1)00(-1)00(-1)0"; the fourth new data, obtained by performing the fourth-stage conversion processing on the third new data, is "00110(-1)0(-1)00(-1)00(-1)0"; and the fifth new data, obtained by performing the fifth-stage conversion processing on the fourth new data, is "010(-1)0(-1)0(-1)00(-1)00(-1)0". Since the fifth new data contains no run of l (l >= 2) consecutive bit values 1, the fifth new data may be called the intermediate code, and the regular signed number encoding process is completed after one bit-complementing process is performed on the intermediate code, where the bit width of the intermediate code may be equal to the bit width of the multiplier. Optionally, after the regular signed number encoding unit 1111 performs the regular signed number encoding processing on the multiplier, if the highest bit value and the second-highest bit value in the obtained new data (that is, the intermediate code) are "10" or "01", the regular signed number encoding unit 1111 may complement a bit value 0 above the highest bit of the intermediate code, so that the highest three bit values of the corresponding target code are "010" or "001", respectively. Optionally, the bit width of the intermediate code may be equal to the bit width of the target code minus 1.
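The stage-by-stage conversion described above can be sketched in Python as follows. This is only an illustrative behavioural model, not the circuit of the embodiment: the function names `rsd_encode`, `find_lowest_run`, and `digits_value` are hypothetical, the multiplier is assumed to be read as an unsigned bit string, and the complemented high bit is modelled by reserving one spare high digit from the start, so every stage is reported with N + 1 digits.

```python
def find_lowest_run(d):
    """Return (lo, hi) of the lowest run of at least two 1s in the
    LSB-first digit list d, or None if no such run exists."""
    lo = 0
    while lo < len(d):
        if d[lo] != 1:
            lo += 1
            continue
        hi = lo
        while hi + 1 < len(d) and d[hi + 1] == 1:
            hi += 1
        if hi > lo:
            return lo, hi
        lo = hi + 1
    return None


def rsd_encode(bits):
    """Encode an MSB-first '0'/'1' string (assumed unsigned) into digits
    over {-1, 0, 1}. Returns (target_code, stages), both MSB-first."""
    # LSB-first working copy plus one spare high digit (the complemented bit).
    d = [int(c) for c in reversed(bits)] + [0]
    stages = []
    while True:
        run = find_lowest_run(d)
        if run is None:
            break
        lo, hi = run
        d[lo] = -1                       # lowest 1 of the run becomes -1
        for k in range(lo + 1, hi + 1):  # remaining 1s of the run become 0
            d[k] = 0
        d[hi + 1] = 1                    # a 1 is placed just above the run
        stages.append(d[::-1])
    return d[::-1], stages


def digits_value(digits):
    """Numeric value of an MSB-first signed-digit list."""
    value = 0
    for x in digits:
        value = 2 * value + x
    return value


code, stages = rsd_encode("001010101101110")
print(len(stages), code, digits_value(code))
```

For the 15-bit multiplier of the example, the sketch performs five conversion stages and yields a 16-digit target code whose value equals that of the original multiplier.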
It should be noted that the regular signed number encoding unit 1111 may output the target code through the target code output port 1111b. Optionally, the bit width of the target code may be equal to the bit width of the data received by the regular signed number encoding unit 1111, and the target code may include three values, namely -1, 0, and 1.
In the multiplier provided by this embodiment, the regular signed number encoding unit in the multiplication circuit may perform regular signed number encoding on received data to obtain target codes. The partial product obtaining unit obtains an original partial product according to each target code and obtains the partial product of each target code according to the original partial product, and the accumulation sub-circuit finally accumulates the partial products of the target codes to obtain the multiplication result. The state control circuit obtains a storage indication signal and a read indication signal; the register control circuit determines, according to the storage indication signal, the register circuit that stores the multiplication result; the register circuit stores the multiplication result; and the selection circuit reads data in the multiplication result stored in the register circuit according to the read indication signal to obtain the target operation result. The multiplier can thus use the regular signed number encoding unit to perform regular signed number encoding processing on received data and reduce the number of effective partial products obtained in the multiplication process, thereby reducing the complexity of realizing multiplication by the multiplier; meanwhile, the multiplier can improve the operation efficiency of the multiplication operation and effectively reduce the power consumption of the multiplier.
As one embodiment, the partial product obtaining unit 1112 is specifically configured to perform conversion processing on the target code to obtain an original partial product, perform sign bit extension processing on the original partial product to obtain a sign bit extended partial product, and obtain the partial product of the target code according to the sign bit extended partial product.
Specifically, the above conversion process may be characterized as converting the value in the target code into the original partial product based on the multiplicand (i.e., X) in the multiplication operation. Optionally, each bit value in the target code has a corresponding original partial product; if the value in the target code is -1, the corresponding original partial product may be -X; if the value in the target code is 1, the corresponding original partial product may be X; and if the value in the target code is 0, the corresponding original partial product may be 0. Optionally, the original partial product may be a partial product without sign bit extension, and the bit width of the original partial product may be the same as the bit width of the data currently processed by the multiplication circuit 11. Optionally, the bit width of the sign-bit-extended partial product may be equal to 2 times the bit width N of the data processed by the multiplier, in which case the bit width of the original partial product may be equal to N. Optionally, the lower N bit values of the sign-bit-extended partial product may be equal to the N bit values of the original partial product, and the upper N bit values of the sign-bit-extended partial product may each be equal to the highest bit value of the original partial product, that is, the sign bit value of the original partial product.
In addition, the partial product obtaining unit 1112 may obtain the partial products of the target codes according to all the sign-bit-extended partial products. In the distribution rule of the partial products of all target codes, the partial product of the first target code may be equal to the first sign-bit-extended partial product. Starting from the partial product of the second target code, the highest bit value of the partial product of each target code may be located in the same column as the highest bit value of the partial product of the first target code, and the bit width of the partial product of each target code may be equal to the bit width of the partial product of the previous target code minus 1, that is, equal to the bit width 2N of the corresponding sign-bit-extended partial product minus (i - 1), where i represents the number of the partial product of the target code, starting from 1. The resulting distribution diagram of the partial products of the 9 target codes can be seen in fig. 3.
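As a hedged illustration of the conversion, sign bit extension, and shifted distribution described above, the following Python sketch builds the partial products for one multiplication and accumulates them. The names `partial_products` and `accumulate` are hypothetical; values are modelled as Python integers in 2N-bit two's complement, and the target code in the usage lines is hand-derived for the multiplier 14 (that is, 16 - 2), not taken from the embodiment.

```python
def partial_products(target_code_lsb_first, x, n):
    """Build the 2N-bit partial products for multiplicand x.

    target_code_lsb_first: digits over {-1, 0, 1}, lowest digit first.
    x: signed multiplicand, n: operand bit width N.
    """
    mask = (1 << (2 * n)) - 1
    products = []
    for i, digit in enumerate(target_code_lsb_first):
        original = digit * x        # original partial product: -X, 0 or X
        extended = original & mask  # sign extension to 2N bits (two's complement)
        # Shift left by i columns; bits pushed past column 2N are dropped,
        # so every partial product ends in the same highest column.
        products.append((extended << i) & mask)
    return products


def accumulate(products, n):
    """Sum all partial products modulo 2**(2N)."""
    return sum(products) & ((1 << (2 * n)) - 1)


# 8-bit example: multiplier 14 = 16 - 2, written as nine LSB-first digits.
target_code = [0, -1, 0, 0, 1, 0, 0, 0, 0]
pps = partial_products(target_code, -37, 8)
print(hex(accumulate(pps, 8)), hex((-37 * 14) & 0xFFFF))
```

Only two of the nine partial products are non-zero in this example, which reflects the reduction in effective partial products that the embodiment attributes to regular signed number encoding.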
Optionally, the partial product obtaining unit 1112 includes: a target code input port 1112a, a second data input port 1112b, and a partial product output port 1112c; the target code input port 1112a is for receiving the target code, the second data input port 1112b is for receiving the second data, and the partial product output port 1112c is for outputting the partial product of the target code.
In this embodiment, the partial product obtaining unit 1112 may receive the target code obtained by the regular signed number encoding unit 1111 through the target code input port 1112a, receive the second data through the second data input port 1112b, perform conversion processing according to the target code and the second data, perform shift processing to obtain the partial product of the target code, and output the partial product of the target code through the partial product output port 1112c.
In the multiplier provided by the embodiment, the number of effective partial products which can be obtained by the multiplier is small, so that the complexity of realizing multiplication operation by the multiplier is reduced; meanwhile, the multiplier can improve the operation efficiency of multiplication operation and effectively reduce the power consumption of the multiplier.
Another embodiment provides a multiplier, wherein the multiplier comprises the accumulation sub-circuit 112, and the accumulation sub-circuit 112 comprises: a wallace tree group unit 1121 and an accumulation unit 1122; wherein, the output end of the wallace tree group unit 1121 is connected with the input end of the accumulation unit 1122; the wallace tree group unit 1121 is configured to perform accumulation processing on the partial product of the target code to obtain an accumulation operation result, and the accumulation unit 1122 is configured to perform accumulation processing on the accumulation operation result.
Specifically, the wallace tree group unit 1121 may perform accumulation processing on the numerical values in the partial products of all the target codes obtained by the partial product obtaining unit 1112 to obtain an accumulation operation result, and perform accumulation processing on the accumulation operation result obtained by the wallace tree group unit 1121 through the accumulation unit 1122 to obtain the target operation result.
According to the multiplier provided by the embodiment, the Wallace tree group unit can accumulate partial products of target codes, the accumulation unit accumulates the accumulated results to obtain multiplication results, and the target operation result is obtained according to the multiplication results, so that the number of effective partial products obtained by the multiplier is small, and the complexity of realizing multiplication by the multiplier is reduced; meanwhile, the multiplier can improve the operation efficiency of multiplication operation and effectively reduce the power consumption of the multiplier.
Another embodiment provides a multiplier in which the Wallace tree group unit 1121 comprises: Wallace tree subunits 1121_1 to 1121_n, where the Wallace tree subunits 1121_1 to 1121_n are used for accumulating the values in each column of the partial products of all target codes.
Specifically, the circuit structures of the Wallace tree subunits 1121_1 to 1121_n may be implemented by a combination of full adders and half adders. In addition, it can be understood that each of the Wallace tree subunits 1121_1 to 1121_n is a circuit capable of processing a multi-bit input signal and adding the bits of the multi-bit input signal to obtain a two-bit output signal. Optionally, the number n of Wallace tree subunits included in the Wallace tree group unit 1121 may be equal to 2 times the bit width of the data currently processed by the multiplication circuit 11, and the n Wallace tree subunits may process the partial products of the target codes in parallel, although they may be connected in series. Optionally, each Wallace tree subunit in the Wallace tree group unit 1121 may add the values of one column in the partial products of all target codes, and each Wallace tree subunit may output two signals, namely a carry signal $Carry_i$ and a sum signal $Sum_i$, where i may represent the number corresponding to each Wallace tree subunit, and the number of the first Wallace tree subunit is 1. Optionally, the number of input signals received by each Wallace tree subunit may be equal to the number of target codes or the number of sign-bit-extended partial products.
In addition, the signals of each Wallace tree subunit in the Wallace tree group unit 1121 may include a carry input signal $Cin_i$, a partial product input signal, and a carry output signal $Cout_i$. Optionally, the partial product input signal received by each Wallace tree subunit may be the values of one column of the partial products of all target codes, and the number of carry output signals $Cout_i$ output by each Wallace tree subunit may be equal to $N_{Cout} = \lfloor (N_I + N_{Cin})/2 \rfloor - 1$, where $N_I$ may represent the number of partial product input signals of the Wallace tree subunit, $N_{Cin}$ may represent the number of carry input signals of the Wallace tree subunit, $N_{Cout}$ may represent the minimum number of carry output signals of the Wallace tree subunit, and $\lfloor \cdot \rfloor$ may represent the floor function. Optionally, the carry input signal received by each Wallace tree subunit in the Wallace tree group unit 1121 may be the carry output signal output by the previous Wallace tree subunit, and the carry input signal received by the first Wallace tree subunit may be 0; meanwhile, the number of carry signal input ports of the first Wallace tree subunit may be the same as the number of carry signal input ports of the other Wallace tree subunits.
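The carry-count relation above can be checked with a small behavioural model of one column. The sketch below is an assumption for illustration only: `full_adder` and `compress_column` are hypothetical names, the column is reduced by chaining full adders (with a half adder, modelled as a full adder with a 0 input, when only two bits remain), and the last adder's carry is taken as the subunit's Carry output while the other carries play the role of the Cout signals passed to the next subunit.

```python
def full_adder(a, b, c):
    """One-bit full adder: returns (sum, carry)."""
    s = a ^ b ^ c
    carry = (a & b) | (a & c) | (b & c)
    return s, carry


def compress_column(bits):
    """Reduce all bits of one column (partial product bits plus carry inputs).

    Returns (sum_bit, carry_bit, couts): the column's Sum output, its Carry
    output, and the list of carries handed to the next subunit.
    """
    pending = list(bits)
    carries = []
    while len(pending) > 1:
        a = pending.pop(0)
        b = pending.pop(0)
        c = pending.pop(0) if pending else 0  # half adder when two bits remain
        s, carry = full_adder(a, b, c)
        pending.append(s)
        carries.append(carry)
    sum_bit = pending[0] if pending else 0
    carry_bit = carries.pop() if carries else 0
    return sum_bit, carry_bit, carries


# Nine partial product bits and no carry inputs: floor(9 / 2) - 1 = 3 couts.
s, c, couts = compress_column([1, 0, 1, 1, 0, 1, 1, 1, 0])
print(s, c, len(couts))
```

In this model the arithmetic value of the column is preserved, since the sum bit plus twice the total of all carries equals the number of 1 bits fed into the column.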
Illustratively, if the multiplication circuit 11 currently processes an 8-bit by 8-bit multiplication, the sign-bit-extended partial products obtained by the partial product obtaining unit 1112 are "$p_{i9}p_{i9}p_{i9}p_{i9}p_{i9}p_{i9}p_{i9}p_{i9}p_{i8}p_{i7}p_{i6}p_{i5}p_{i4}p_{i3}p_{i2}p_{i1}$" ($i = 1, \ldots, n = 9$), where i may represent the i-th sign-bit-extended partial product. The partial products of the 9 target codes are obtained according to the 9 sign-bit-extended partial products, and the partial products of the 9 target codes are then accumulated. Optionally, the distribution rule of the partial products of the 9 target codes may be as shown in fig. 3, where each dot may represent one bit value in a sign-bit-extended partial product, and the partial product of the first target code may be the first sign-bit-extended partial product. In the distribution rule of the partial products of the 9 target codes, the partial product of each target code has a corresponding sign-bit-extended partial product. Starting from the partial product of the second target code, the corresponding sign-bit-extended partial product may be shifted left by one bit relative to the partial product of the previous target code, and the highest bit value of the partial product of each target code is located in the same column as the highest bit value of the partial product of the first target code. This is equivalent to saying that, starting from the partial product of the second target code, after each sign-bit-extended partial product is shifted left, the higher bit values shifted out beyond that column do not participate in the addition operation.
Optionally, among the partial products of the 9 target codes, the partial product of the first target code may be the first sign-bit-extended partial product, and starting from the partial product of the second target code, the highest bit value of the partial product of each target code is located in the same column as the highest bit value of the partial product of the first target code. From the rightmost column to the leftmost column, 16 Wallace tree subunits are needed to accumulate the partial products of the 9 target codes. A connection circuit diagram of the 16 Wallace tree subunits is shown in fig. 4, where Wallace_i in fig. 4 represents a Wallace tree subunit and i is the number of the Wallace tree subunit, starting from 1. A solid line connecting two Wallace tree subunits indicates that the Wallace tree subunit with the higher number has a carry output signal, and a dotted line indicates that the Wallace tree subunit with the higher number does not have a carry output signal.
In the multiplier provided by the embodiment, the number of effective partial products obtained by the multiplier is small, and the complexity of realizing multiplication operation by the multiplier is reduced; meanwhile, the multiplier can improve the operation efficiency of multiplication operation and effectively reduce the power consumption of the multiplier.
As an embodiment, the accumulation unit 1122 in the multiplier includes: an adder, where the adder is used for performing an addition operation on the received accumulated correction result.
Specifically, the adder may be an adder with different bit widths, and the adder may be a carry-look-ahead adder. Optionally, the adder may receive the two paths of signals output by the modified wallace tree group unit 1121, perform addition operation on the two paths of output signals, and output a multiplication result.
Optionally, the adder includes: a carry signal input port, a sum signal input port and a result output port; the carry signal input port is used for receiving a carry signal, the sum signal input port is used for receiving a sum signal, and the result output port is used for outputting a result of accumulation processing of the carry signal and the sum signal.
Specifically, the adder may receive the Carry signal Carry output by the modified wallace tree group unit 1121 through the Carry signal input port, receive the Sum bit signal Sum output by the modified wallace tree group unit 1121 through the Sum bit signal input port, add the result of the Sum bit signal Sum and the Carry signal Carry, and output the result through the result output port.
It should be noted that, during the multiplication, the multiplication circuit 11 may use adders with different bit widths to perform an addition operation on the carry output signal Carry and the sum output signal Sum output by the modified Wallace tree group unit 1121, where the bit width of the data that can be processed by the adder may be equal to 2 times the bit width N of the data currently processed by the multiplier. Optionally, each Wallace tree subunit in the modified Wallace tree group unit 1121 may output a carry output signal $Carry_i$ and a sum output signal $Sum_i$ ($i = 0, \ldots, 2N-1$, where i is the number corresponding to each Wallace tree subunit, starting from 0). Optionally, the carry output signal received by the adder is $Carry = \{[Carry_0 : Carry_{2N-2}], 0\}$, that is, the bit width of the carry output signal Carry received by the adder is 2N, the first 2N-1 bit values in the carry output signal Carry correspond to the carry output signals of the first 2N-1 Wallace tree subunits in the modified Wallace tree group unit 1121, and the last bit value in the carry output signal Carry may be replaced by the value 0. Optionally, the sum output signal Sum received by the adder has a bit width of 2N, and the value of the sum output signal Sum may be equal to the sum output signals of all the Wallace tree subunits in the modified Wallace tree group unit 1121.
Illustratively, if the multiplication circuit 11 is currently processing an 8-bit by 8-bit multiplication, the adder may be a 16-bit carry-look-ahead adder. As shown in fig. 4, the modified Wallace tree group unit 1121 may output the sum output signals Sum and the carry output signals Carry of the 16 Wallace tree subunits. However, the sum output signal received by the 16-bit carry-look-ahead adder may be the complete sum output signal Sum output by the modified Wallace tree group unit 1121, whereas the received carry output signal may be the carry output signal Carry formed by combining all carry output signals of the modified Wallace tree group unit 1121, except the carry output signal of the last Wallace tree subunit, with the value 0.
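The alignment of Carry and Sum at the final adder can be sketched as follows. The sketch is an assumption-laden illustration rather than the adder of the embodiment: `carry_save` and `final_add` are hypothetical names, words are modelled as Python integers with bit i holding the signal of column i, and a single carry-save step over three operands stands in for the Wallace tree group unit purely to produce a Sum word and a Carry word.

```python
def carry_save(a, b, c, width):
    """One carry-save step: per-column sum bits and carry bits of a + b + c."""
    mask = (1 << width) - 1
    sum_word = (a ^ b ^ c) & mask
    carry_word = ((a & b) | (a & c) | (b & c)) & mask
    return sum_word, carry_word


def final_add(sum_word, carry_word, n):
    """Add Sum and the realigned Carry word in a 2N-bit adder.

    The carry of column i has the weight of column i + 1, so the Carry word
    is shifted left by one: the carry of the last column is dropped and a 0
    fills the lowest position.
    """
    width = 2 * n
    mask = (1 << width) - 1
    aligned_carry = (carry_word << 1) & mask
    return (sum_word + aligned_carry) & mask


n = 8
a, b, c = 0x1234, 0xFEDC, 0x0F0F
s, cy = carry_save(a, b, c, 2 * n)
print(hex(final_add(s, cy, n)), hex((a + b + c) & 0xFFFF))
```

Dropping the carry of the last column is equivalent to computing the result modulo $2^{2N}$, which matches a 2N-bit result output port.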
According to the multiplier provided by the embodiment, the number of the effective partial products obtained by the multiplier is small, the complexity of multiplication is reduced, the operation efficiency of the multiplication is improved, and the power consumption of the multiplier is effectively reduced.
In one embodiment, the multiplier comprises said register circuit 13, the register circuit 13 comprising: a register sub-circuit 131, where the register sub-circuit 131 is used to store the multiplication result corresponding to different storage indication signals.
Specifically, the register circuit 13 may include two or more register sub-circuits 131. It is understood that the number of register sub-circuits 131 in the register circuit 13 may be equal to $2N_{in}/N_{out}$, where $N_{in}$ indicates the bit width of the data received by the multiplier and $N_{out}$ ($N_{out} < 2N_{in}$) indicates the bit width of the data output by the multiplier. Optionally, the bit width of the data stored in the register sub-circuit 131 may be equal to 2 times the bit width of the input port of the multiplier. Optionally, the bit width of the data received by the multiplier may be equal to the bit width of the input port of the multiplier, and the bit width of the data output by the multiplier may be equal to the bit width of the input port of the multiplier, or may be less than 2 times the bit width of the input port of the multiplier. Illustratively, if the bit width of the input port and the bit width of the output port of the multiplier are both N bits, the register circuit 13 needs to be formed by combining two register sub-circuits 131; if the bit width of the input port of the multiplier is N bits and the bit width of the output port is N/2 bits, the register circuit 13 needs to be formed by combining four register sub-circuits 131. Optionally, the multiplier may store the multiplication result obtained by each multiplication operation into the corresponding one of the $2N_{in}/N_{out}$ register sub-circuits 131 according to the storage indication signal, with different register sub-circuits 131 storing the multiplication results corresponding to different storage indication signals. Optionally, each multiplication result obtained by the multiplier can be stored only in the register sub-circuit 131 corresponding to the storage indication signal, and cannot be stored in another register sub-circuit 131 that does not correspond to the storage indication signal.
Illustratively, if there are n register sub-circuits 131 in the register circuit 13, numbered 1, 2, 3, ..., n, the result of the first multiplication operation performed by the multiplier can be stored in the register sub-circuit 131 numbered 1, and at this time the value of the storage indication signal may be 1; the result of the second multiplication operation performed by the multiplier may be stored in the register sub-circuit 131 numbered 2, and at this time the value of the storage indication signal may be 2. It may also be understood that, when the value of the storage indication signal is odd, the register sub-circuit 131 storing the multiplication result has an odd number, and when the value of the storage indication signal is even, the register sub-circuit 131 storing the multiplication result has an even number, where the value of the storage indication signal may be equal to the number of the register sub-circuit 131 storing the multiplication result.
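The relation between port bit widths, the number of register sub-circuits, and the storage indication signal can be modelled with a short sketch. The class `RegisterCircuit` and its attribute and method names are hypothetical, and the model assumes that $2N_{in}$ is an exact multiple of $N_{out}$ and that the storage indication signal is the 1-based number of the selected register sub-circuit, as in the illustration above.

```python
class RegisterCircuit:
    """Behavioural model of a register circuit built from sub-circuits."""

    def __init__(self, n_in, n_out):
        # Assumption: 2 * N_in is an exact multiple of N_out.
        if n_out <= 0 or (2 * n_in) % n_out != 0:
            raise ValueError("2 * n_in must be a positive multiple of n_out")
        self.count = (2 * n_in) // n_out      # number of register sub-circuits
        self.sub_circuits = [None] * self.count

    def store(self, store_signal, result):
        """Store a multiplication result in the sub-circuit whose number
        equals the storage indication signal (numbering starts at 1)."""
        if not 1 <= store_signal <= self.count:
            raise ValueError("no register sub-circuit for this signal")
        self.sub_circuits[store_signal - 1] = result


regs = RegisterCircuit(n_in=8, n_out=4)   # N-bit input, N/2-bit output
regs.store(1, 0x1234)                     # first multiplication result
regs.store(2, 0xBEEF)                     # second multiplication result
print(regs.count, regs.sub_circuits)
```

With an 8-bit input port, an 8-bit output port gives two sub-circuits and a 4-bit output port gives four, in line with the two cases named in the text.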
In the multiplier provided by this embodiment, the register sub-circuit in the multiplier stores the multiplication result obtained by each multiplication operation into different register sub-circuits according to different storage indication signals, and further outputs data in the multiplication result stored in the corresponding register sub-circuit according to the read indication signal, so that a target operation result is output through a subsequent multiplier whose output port bit width does not match 2 times the input port bit width.
Another embodiment provides a multiplier, wherein the multiplier comprises the accumulation sub-circuit 212, and the accumulation sub-circuit 212 comprises: a wallace tree group unit 2121 and an accumulation unit 2122; wherein, the output end of the wallace tree group unit 2121 is connected with the input end of the accumulation unit 2122; the wallace tree group unit 2121 is configured to perform accumulation processing on the partial product of the target code to obtain an accumulation operation result, and the accumulation unit 2122 is configured to perform accumulation processing on the accumulation operation result to obtain the target operation result.
Specifically, the wallace tree group unit 2121 may accumulate the values in the partial products of all target codes obtained by the partial product obtaining unit 2112 to obtain an accumulation operation result, and accumulate the accumulation operation result obtained by the wallace tree group unit 2121 by the accumulation unit 2122 to obtain the target operation result.
Optionally, the multiplier comprises the Wallace tree group unit 2121, and the Wallace tree group unit 2121 comprises: a plurality of Wallace tree subunits 2121_1 to 2121_n, which are used for accumulating the values in each column of the partial products of all target codes.
In this embodiment, the circuit structure and the function of the wallace tree group unit 2121 may be the same as those of the wallace tree group unit 1121, and the detailed structure of the wallace tree group unit 2121 is not described herein again.
According to the multiplier provided by the embodiment, the Wallace tree group unit can accumulate partial products of target codes, the accumulation unit can accumulate results to obtain multiplication results, and the target operation results are obtained according to the multiplication results, so that the number of effective partial products obtained by the multiplier is small, and the complexity of the multiplier for realizing multiplication is reduced; meanwhile, the multiplier can improve the operation efficiency of multiplication operation and effectively reduce the power consumption of the multiplier.
As an embodiment, wherein the multiplier includes the accumulation unit 2122, the accumulation unit 2122 includes: an adder for adding the result of the addition operation.
Specifically, the adder may be an adder with different bit widths, and the adder may be a carry-look-ahead adder. Optionally, the adder may receive the two signals output by the wallace tree group unit 2121, perform addition operation on the two output signals, and output a multiplication result.
In the multiplier provided by the embodiment, the accumulation unit can accumulate two paths of signals output by the wallace tree group unit, output the multiplication result, and obtain the target operation result according to the multiplication result, so that the number of effective partial products obtained by the multiplier is small, and the complexity of realizing multiplication by the multiplier is reduced; meanwhile, the multiplier can improve the operation efficiency of multiplication operation and effectively reduce the power consumption of the multiplier.
In one embodiment, wherein the multiplier comprises the adder, the adder comprises: a carry signal input port, a sum signal input port and a result output port; the carry signal input port is used for receiving a carry signal, the sum signal input port is used for receiving a sum signal, and the result output port is used for outputting a multiplication operation result obtained by accumulating the carry signal and the sum signal.
Specifically, the adder may receive the Carry signal Carry output by the wallace tree group unit 2121 through the Carry signal input port, receive the Sum signal Sum output by the wallace tree group unit 2121 through the Sum signal input port, add the Carry signal Carry and the Sum signal Sum to obtain a multiplication result, and output the multiplication result through the result output port.
It should be noted that, during the multiplication, the multiplication circuit 21 may use adders with different bit widths to perform an addition operation on the carry output signal Carry and the sum output signal Sum output by the Wallace tree group unit 2121, where the bit width of the data that can be processed by the adder may be equal to 2 times the bit width N of the data currently processed by the multiplier. Optionally, each Wallace tree subunit in the Wallace tree group unit 2121 may output a carry output signal $Carry_i$ and a sum output signal $Sum_i$ ($i = 0, \ldots, 2N-1$, where i is the number corresponding to each Wallace tree subunit, starting from 0). Optionally, the carry output signal received by the adder is $Carry = \{[Carry_0 : Carry_{2N-2}], 0\}$, that is, the bit width of the carry output signal Carry received by the adder is 2N, the first 2N-1 bit values in the carry output signal Carry correspond to the carry output signals of the first 2N-1 Wallace tree subunits in the Wallace tree group unit 2121, and the last bit value in the carry output signal Carry may be replaced by the value 0. Optionally, the sum output signal Sum received by the adder has a bit width of 2N, and the value of the sum output signal Sum may be equal to the sum output signals of all the Wallace tree subunits in the Wallace tree group unit 2121.
Illustratively, if the multiplication circuit 11 is currently processing an 8-bit by 8-bit multiplication, the adder may be a 16-bit carry-lookahead adder. As shown in fig. 4, the Wallace tree group unit 2121 may output the sum output signal Sum and the carry output signal Carry of 16 Wallace tree subunits. The sum output signal received by the 16-bit carry-lookahead adder may be the complete sum output signal Sum output by the Wallace tree group unit 2121, whereas the carry output signal it receives may be formed by combining the carry output signals of all Wallace tree subunits except the last one with the value 0.
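As a minimal sketch, the final addition described above may be modeled in Python at the word level. The function names `carry_save_reduce` and `final_add` are hypothetical, and the 3:2 compression used here is a simplified stand-in for the Wallace tree subunits, not the circuit of this embodiment. It shows how the carry word is shifted up by one position, so that its lowest bit is 0 and the carry of the last column is dropped, before a single 2N-bit addition produces the multiplication result.

```python
def carry_save_reduce(rows, width):
    """Reduce rows to a (sum_word, carry_word) pair using 3:2 compression."""
    mask = (1 << width) - 1
    rows = [r & mask for r in rows]
    while len(rows) > 2:
        a, b, c = rows[0], rows[1], rows[2]
        sum_bits = a ^ b ^ c                         # Sum_i for every column i
        carry_bits = (a & b) | (a & c) | (b & c)     # Carry_i for every column i
        # Carry = {[Carry_0 : Carry_(2N-2)], 0}: shift up, drop the top carry.
        carry_word = (carry_bits << 1) & mask
        rows = rows[3:] + [sum_bits, carry_word]
    rows += [0] * (2 - len(rows))
    return rows[0], rows[1]


def final_add(sum_word, carry_word, width):
    """Model of the 2N-bit adder that produces the multiplication result."""
    return (sum_word + carry_word) & ((1 << width) - 1)


N = 8
WIDTH = 2 * N
MASK = (1 << WIDTH) - 1
# Assumed example: 13 * 11, with 11 encoded as 16 - 4 - 1.
partial_products = [(-13) & MASK, (-13 << 2) & MASK, (13 << 4) & MASK]
sum_word, carry_word = carry_save_reduce(partial_products, WIDTH)
product = final_add(sum_word, carry_word, WIDTH)
```

Because the reduction keeps the total unchanged modulo 2^(2N), discarding the carry of the last column does not affect the 2N-bit result.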
In the multiplier provided by the embodiment, the accumulation unit can perform accumulation operation on two paths of signals output by the Wallace tree group unit, output the multiplication result, and obtain the target operation result according to the multiplication result, so that the number of effective partial products obtained by the multiplier is small, and the complexity of the multiplier for realizing the multiplication operation is reduced; meanwhile, the multiplier can improve the operation efficiency of multiplication operation and effectively reduce the power consumption of the multiplier.
Another embodiment provides a multiplier comprising the first converting sub-circuit 221 and the second converting sub-circuit 222, wherein the first converting sub-circuit 221 is specifically configured to convert the multiplication operation result into the target operation result of a floating-point type, and the second converting sub-circuit 222 is specifically configured to convert the multiplication operation result into the target operation result of a fixed-point type.
Specifically, the bit width of the multiplication result may be equal to 2 times the bit width of the data received by the multiplier, the bit width of the floating-point type operation result and the bit width of the fixed-point type operation result may each be equal to the bit width of the output port of the multiplier, and, in the number conversion circuit 22, the bit width of the floating-point type operation result may be equal to the bit width of the fixed-point type operation result.
In the number conversion circuit 22, the first conversion sub-circuit 221 and the second conversion sub-circuit 222 are not connected to each other and operate independently. During each multiplication, the number conversion circuit 22 needs only one of the first conversion sub-circuit 221 and the second conversion sub-circuit 222 to perform number conversion processing in order to obtain the target operation result. Optionally, the number conversion circuit 22 may determine, according to the received data conversion signal, whether the current multiplication requires number conversion processing through the first conversion sub-circuit 221 or through the second conversion sub-circuit 222.
Optionally, the data conversion signal may take two values, which may be represented in binary as 00 and 01. The data conversion signal 00 may indicate that the fixed-point number with a 2N-bit width received by the number conversion circuit 22 needs to be converted into a fixed-point number with an N-bit width, and may also indicate the position of the decimal point of the fixed-point number after the conversion; the position of the decimal point of the 2N-bit fixed-point number before the conversion may be determined. The data conversion signal 01 may indicate that the fixed-point number with a 2N-bit width received by the number conversion circuit 22 as the multiplication result needs to be converted into a floating-point number with an N-bit width. Optionally, the number conversion circuit 22 may apply different number conversion processing to the received multiplication result through the first conversion sub-circuit 221 or the second conversion sub-circuit 222 according to the two data conversion signals, as follows:
(1) if the data conversion signal received by the number conversion circuit 22 is 00, the number conversion circuit 22 may convert the fixed-point number with a 2N-bit width into a fixed-point number with an N-bit width. In this case, the number conversion circuit 22 may convert the received 2N-bit fixed-point number through the second conversion sub-circuit 222. Specifically, during the conversion, the position of the decimal point of the target N-bit fixed-point number is aligned with the position of the decimal point of the 2N-bit fixed-point number before conversion, and a total of N bits before and after the decimal point of the 2N-bit fixed-point number are then intercepted to obtain the converted N-bit fixed-point number. The interception may be divided into three cases:
in case a, when all of the N bits to be intercepted are contained in the 2N-bit fixed-point number before conversion, the second conversion sub-circuit 222 may directly intercept the N bits in total before and after the decimal point of the 2N-bit fixed-point number before conversion;
in case b, when only part of the N bits to be intercepted is contained in the 2N-bit fixed-point number before conversion, and the upper part of the N bits to be intercepted has no corresponding bits in the 2N-bit fixed-point number before conversion, the second conversion sub-circuit 222 may pad each bit of that part with the sign bit of the 2N-bit fixed-point number before conversion, and then intercept the N bits from the padded fixed-point number;
in case c, when only part of the N bits to be intercepted is contained in the 2N-bit fixed-point number before conversion, and the lower part of the N bits to be intercepted has no corresponding bits in the 2N-bit fixed-point number before conversion, the second conversion sub-circuit 222 may pad each bit of that part according to the sign of the 2N-bit fixed-point number before conversion: if that number is positive, each bit of the part may be padded with the value 0; otherwise, it is padded with the value 1. The N bits are then intercepted from the padded fixed-point number;
(2) if the data conversion signal received by the number conversion circuit 22 is 01, the number conversion circuit 22 may convert the fixed-point number with a 2N-bit width into a floating-point number with an N-bit width. In this case, the number conversion circuit 22 may convert the received 2N-bit fixed-point number through the first conversion sub-circuit 221. Specifically, during the conversion, the highest bit (i.e., the sign bit) of the fixed-point number may be used as the sign bit of the converted floating-point number. If the 2N-bit fixed-point number before conversion is positive, the sign bit is removed and the remaining (2N-1)-bit number is searched from its highest bit toward its lowest bit; when the first value 1 is found, the number of bits remaining after that 1 is counted as m, and the exponent of the converted floating-point number may be equal to m plus i. If the 2N-bit fixed-point number before conversion is negative, the sign bit is removed and the remaining (2N-1)-bit number is searched from its highest bit toward its lowest bit; when the first value 0 is found, the number of bits remaining after that 0 is counted as m. In addition, the upper N bits of the m bits need to be intercepted as the mantissa of the converted floating-point number: if m ≥ N, the N bits may be intercepted directly as the mantissa; if m < N, (N-m) bits equal to the highest bit (i.e., the sign bit) may be padded after the 2N-bit fixed-point number.
For example, if a fixed-point number with a 2N-bit width needs to be converted into a floating-point number with a 16-bit width, i may be equal to 16 and N may be equal to 10; if a fixed-point number with a 2N-bit width needs to be converted into a floating-point number with a 32-bit width, i may be equal to 127 and N may be equal to 23; if a fixed-point number with a 2N-bit width needs to be converted into a floating-point number with a 64-bit width, i may be equal to 1023 and N may be equal to 52.
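The two conversions in items (1) and (2) may be sketched in Python as follows. This is an illustrative model under stated assumptions, not the circuit of this embodiment: the function names and the parameters `frac_in` and `frac_out` (the number of bits after the decimal point before and after conversion) are hypothetical; the floating-point sketch covers only the positive-number path, treats the fixed-point number as an integer, takes the exponent to be m plus the offset i, and uses the 32-bit values i = 127 and a 23-bit mantissa given in the example above.

```python
import struct

BIAS = 127       # offset i for a 32-bit floating-point result, per the text
MANT_BITS = 23   # mantissa width for a 32-bit floating-point result, per the text


def fixed_to_fixed(value, n, frac_in, frac_out):
    """Intercept N bits of a 2N-bit fixed-point number with aligned decimal points."""
    width_in = 2 * n
    value &= (1 << width_in) - 1
    sign = value >> (width_in - 1)
    shift = frac_in - frac_out          # source bit index of output bit 0
    out = 0
    for k in range(n):
        src = k + shift
        if 0 <= src < width_in:
            bit = (value >> src) & 1    # case a: the bit exists in the source
        else:
            bit = sign                  # cases b and c: pad 0 if positive, 1 if negative
        out |= bit << k
    return out


def fixed_to_float32_positive(value, n):
    """Convert a non-negative 2N-bit number (read as an integer) to a 32-bit float."""
    width_in = 2 * n
    value &= (1 << width_in) - 1
    if value >> (width_in - 1):
        raise ValueError("negative inputs are not modeled in this sketch")
    if value == 0:
        return 0.0
    m = value.bit_length() - 1          # bits remaining after the leading 1
    rest = value & ((1 << m) - 1)
    if m >= MANT_BITS:
        mantissa = rest >> (m - MANT_BITS)   # keep the upper bits of the m bits
    else:
        mantissa = rest << (MANT_BITS - m)   # pad with the sign value 0 at the low end
    bits = ((m + BIAS) << MANT_BITS) | mantissa
    return struct.unpack(">f", struct.pack(">I", bits))[0]
```

For instance, under these assumptions `fixed_to_fixed(0x0150, 8, 8, 4)` keeps bits 4 to 11 of the 16-bit input, and `fixed_to_float32_positive(143, 8)` yields the floating-point value 143.0.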
In the multiplier provided by this embodiment, the multiplier converts the multiplication result, through the number conversion circuit, into data whose bit width is equal to the bit width of the output port of the multiplier, and then outputs the target operation result, so that the bit width of the obtained target operation result can be less than 2 times the bit width of the data input to the multiplier, thereby effectively reducing the requirement of the multiplier on the bit width of the input/output port.
Fig. 5 is a flowchart illustrating a data processing method according to an embodiment. The method can be performed by the multiplier shown in fig. 1, and this embodiment relates to a process of performing a multiplication operation on data. As shown in fig. 5, the method includes:
S101, receiving data to be processed.
In particular, the regular signed number encoding subcircuit in the multiplier may receive two data to be processed. Optionally, the regular signed number encoding sub-circuit may process two fixed-bit-width data, and the fixed bit-width may be equal to the bit-width of the input port of the multiplier. Optionally, the data to be processed received by the regular signed number encoding sub-circuit may be a fixed-point number, and a bit width of the fixed-point number may be equal to a bit width of an input port of the multiplier.
S102, performing regular signed number coding processing on the data to be processed to obtain a partial product of target coding.
Specifically, the regular signed number encoding process may be characterized as follows: for an N-bit multiplier, the bits are processed from the lower bits to the higher bits. If there are l (l ≥ 2) consecutive bits of value 1, the l consecutive 1s may be converted into the data "1 (0)_(l-1) (-1)", that is, a 1 followed by (l-1) zeros and a -1, and the remaining (N-l) bits are combined with the converted (l+1) bits to obtain new data. The new data is then used as the initial data of the next stage of conversion, until the new data obtained after conversion contains no l (l ≥ 2) consecutive bits of value 1. After the N-bit multiplier undergoes regular signed number encoding processing, the bit width of the resulting target code may be equal to (N+1). It should be noted that the number of partial products of the target code may be equal to the bit width N of the data received by the multiplier plus 1.
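The encoding rule described above may be illustrated with a short Python sketch. The function name `regular_signed_encode` and the list representation (digits ordered from the lowest bit to the highest bit) are assumptions made for illustration, and the multiplier is treated here as an unsigned N-bit value.

```python
def regular_signed_encode(multiplier, n):
    """Return N+1 digits in {-1, 0, 1}, lowest digit first."""
    digits = [(multiplier >> k) & 1 for k in range(n)] + [0]
    k = 0
    while k < n:
        # Count the run of consecutive 1s starting at position k.
        run = 0
        while k + run <= n and digits[k + run] == 1:
            run += 1
        if run >= 2:
            # Replace the l ones by "1 (0)_(l-1) (-1)" occupying l+1 positions.
            digits[k] = -1
            for j in range(k + 1, k + run):
                digits[j] = 0
            digits[k + run] = 1
            k += run        # the new 1 may start the next run
        else:
            k += 1
    return digits


def digits_value(digits):
    """Numeric value represented by the signed digits."""
    return sum(d * (1 << k) for k, d in enumerate(digits))


code = regular_signed_encode(0b01101110, 8)
```

In this example the value 110 is encoded as 128 - 16 - 2, so only three digits of the nine-digit target code are nonzero, which is the source of the reduced number of effective partial products.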
S103, accumulating the partial product of the target code to obtain a multiplication result.
Specifically, the accumulation sub-circuit may perform accumulation operation on each column number in the partial products of all target codes to obtain a multiplication result. Optionally, the bit width of the multiplication result may be equal to 2 times of the bit width of the data received by the multiplier, and may also be equal to 2 times of the bit width of the input port of the multiplier.
S104, acquiring a storage indication signal and a reading indication signal.
Specifically, the multiplier can automatically acquire the storage indication signal and the reading indication signal through the state control circuit.
S105, storing the multiple multiplication operation results into different register sub-circuits according to the storage indication signal.
Specifically, the state control circuit in the multiplier may input the acquired storage instruction signal to the register control circuit, and the register control circuit determines the multiplication result obtained by the current multiplication according to the received storage instruction signal, and may store the multiplication result in the corresponding register sub-circuit.
It should be noted that one register sub-circuit can store only one multiplication result at most, and some register sub-circuits in the plurality of register sub-circuits may be in an idle state.
S106, reading partial data stored in different register sub-circuits and corresponding to the multiplication result according to the reading indication signal to obtain a target operation result.
Specifically, the selection circuit in the multiplier may read, according to the received read indication signal, part of the data in the multiplication result stored in the corresponding register sub-circuit as an operation result. Optionally, a single operation result read in this way need not be the complete target operation result: the target operation result of the multiplication may be formed by splicing the operation results of two reads, or of more than two reads. It may be understood that the bit width of the partial data in the multiplication result may be equal to 1/2 of the bit width of the multiplication result, or may be smaller than 1/2 of the bit width of the multiplication result. Optionally, the bit width of the target operation result may be less than or equal to the bit width of the input port of the multiplier.
The data processing method provided by this embodiment can perform regular signed number coding processing on received data to obtain a partial product of a target code, perform accumulation processing on the partial product of the target code to obtain a multiplication result, and respectively read high-bit data and low-bit data in the multiplication result as the target operation result, so that the bit width of the obtained target operation result can be less than 2 times of the bit width of data input by a multiplier, thereby effectively reducing the requirement of the multiplier on the bit width of an input/output port; meanwhile, the method can adopt a regular signed number coding circuit to carry out regular signed number coding processing on the received data, and reduce the number of effective partial products obtained in the multiplication process, thereby reducing the complexity of multiplication; meanwhile, the method can improve the operation efficiency of multiplication operation.
As an embodiment, the step of performing regular signed number encoding processing on the data to be processed in S102 to obtain a partial product of target encoding may include:
S1021, performing regular signed number coding processing on the data to be processed to obtain an original partial product.
Optionally, the step of performing regular signed number encoding processing on the data to be processed in the above S1021 to obtain an original partial product may include:
S1021a, performing regular signed number coding processing on the data to be processed to obtain target codes.
Specifically, the multiplier may perform regular signed number encoding processing on the received multiplier to be processed through the regular signed number encoding unit, so as to obtain the target code. The bit width of the target code may be equal to the bit width N of the multiplier to be processed plus 1.
Optionally, the step of performing regular signed number coding processing on the data to be processed in S1021a to obtain the target code may include: converting l consecutive bits of value 1 in the data to be processed into (l+1) bits in which the highest bit is 1, the lowest bit is -1, and the remaining bits are 0, to obtain the target code, where l is greater than or equal to 2.
It should be noted that the regular signed number encoding process may be characterized as follows: for an N-bit multiplier, the bits are processed from the lower bits to the higher bits. If there are l (l ≥ 2) consecutive bits of value 1, the l consecutive 1s may be converted into the data "1 (0)_(l-1) (-1)", that is, a 1 followed by (l-1) zeros and a -1, and the remaining (N-l) bits are combined with the converted (l+1) bits to obtain new data. The new data is then used as the initial data of the next stage of conversion, until the new data obtained after conversion contains no l (l ≥ 2) consecutive bits of value 1. After the N-bit multiplier undergoes regular signed number encoding processing, the bit width of the resulting target code may be equal to (N+1).
S1021b, converting the data to be processed and the target code to obtain the original partial product.
It should be noted that the number of the original partial products may be equal to the bit width of the target code.
Illustratively, if the partial product acquisition unit receives an 8-bit multiplicand "x7x6x5x4x3x2x1x0" (i.e., X), the partial product acquisition unit may directly obtain the corresponding original partial products from the multiplicand X and the three values -1, 0, and 1 contained in the target code: when a bit of the target code is -1, the original partial product may be -X; when a bit of the target code is 0, the original partial product may be 0; and when a bit of the target code is 1, the original partial product may be X. Alternatively, the conversion process may be characterized as converting each value in the target code into an original partial product based on the multiplicand in the multiplication operation.
S1022, performing sign bit extension processing on the original partial product to obtain the partial product of the target code.
Optionally, the step of performing sign bit extension processing on the original partial product in S1022 to obtain the partial product of the target code may specifically include: performing bit-padding processing on the original partial product to obtain the partial product of the target code.
Specifically, the bit width of the partial product after sign bit extension may be equal to 2 times of the bit width N of the data currently processed by the multiplier, the bit width of the original partial product may be equal to N, and the bit number of the sign bit extension bit may be equal to N. Optionally, the sign bit extension processing may be understood as that the value of the sign bit extension bit is complemented by the value of the sign bit in the original partial product, that is, the complement value may be the sign bit value in the original partial product, and the sign bit value may be the highest bit value in the original partial product, so as to obtain a partial product after sign bit extension with a 2N-bit width. Optionally, the number of complementary bits may be equal to N. Optionally, in the distribution rule of the partial products after all sign bit extensions, the highest-order numerical value in the partial products after all sign bit extensions may be located in the same column, the lowest-order numerical value may be located in the same column, and other corresponding numerical values may also correspond to the same column.
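As a hedged illustration of steps S1021b and S1022, the following Python sketch derives the original partial products -X, 0, and X from a target code and sign-extends each of them from N bits to 2N bits. The function names are hypothetical, the multiplicand is assumed to be a signed N-bit value whose negation also fits in N bits, and weighting the k-th partial product by 2^k before accumulation is an assumption of this sketch.

```python
def sign_extend(value, from_bits, to_bits):
    """Pad the upper bits with the sign bit of a from_bits-wide value."""
    value &= (1 << from_bits) - 1
    if value >> (from_bits - 1):
        value |= ((1 << to_bits) - 1) ^ ((1 << from_bits) - 1)
    return value


def extended_partial_products(x, digits, n):
    """Return one 2N-bit row per digit of the target code (lowest digit first)."""
    mask_2n = (1 << (2 * n)) - 1
    rows = []
    for k, d in enumerate(digits):
        original = (d * x) & ((1 << n) - 1)        # N-bit pattern of -X, 0, or X
        extended = sign_extend(original, n, 2 * n)
        rows.append((extended << k) & mask_2n)     # assumed weight 2^k of digit k
    return rows


N = 8
# Assumed target code of the multiplier 11 (16 - 4 - 1), lowest digit first.
target_code = [-1, 0, -1, 0, 1, 0, 0, 0, 0]
rows = extended_partial_products(13, target_code, N)
result = sum(rows) & ((1 << (2 * N)) - 1)
```

Only three of the nine rows are nonzero in this example, and their sum modulo 2^(2N) is the multiplication result 13 × 11.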
The data processing method provided by this embodiment can perform regular signed number coding processing on the data to be processed to obtain an original partial product, perform sign bit extension processing on the original partial product to obtain a partial product of the target code, and perform accumulation processing on the partial product of the target code to obtain a multiplication result, and further read high-bit data and low-bit data in the multiplication result respectively as a target operation result, so that the bit width of the obtained target operation result can be less than 2 times of the bit width of data input by the multiplier, thereby effectively reducing the requirement of the multiplier on the bit width of the input/output port; meanwhile, the number of effective partial products which can be obtained by the method is small, so that the complexity of multiplication operation is reduced; meanwhile, the method can improve the operation efficiency of multiplication operation.
Another embodiment provides the data processing method, wherein the step of storing the multiple multiplication results into different register sub-circuits according to the storage indication signal in S105 specifically includes:
S1051, storing the first multiplication result corresponding to the first storage indication signal into the first register sub-circuit.
Specifically, the number of the storage indication signals may be equal to the number of times that the multiplier performs multiplication, the multiplier performs one multiplication, a multiplication result may be obtained, and the state control circuit may obtain a corresponding storage indication signal. If the multiplier carries out the first multiplication operation to obtain a first multiplication operation result, the state control circuit automatically obtains a first storage indication signal, and the register control circuit determines a first register sub-circuit for storing the first multiplication operation result according to the first storage indication signal input by the state control circuit and inputs the first multiplication operation result to the first register sub-circuit for storage.
S1052, storing a second multiplication operation result corresponding to the second storage indication signal into the second register sub-circuit.
It should be noted that, if the multiplier performs the second multiplication to obtain the second multiplication result, the state control circuit automatically obtains the second storage indication signal, and the register control circuit determines the second register sub-circuit storing the second multiplication result according to the second storage indication signal input by the state control circuit, and inputs the second multiplication result to the second register sub-circuit for storage. By analogy, the multiplier can store the multiplication result obtained by each multiplication operation into different register sub-circuits, and store the corresponding multiplication results according to the serial number sequence of the register sub-circuits, that is, the multiplication results of two consecutive times can be stored into two adjacent register sub-circuits.
In the data processing method provided by this embodiment, a first multiplication result corresponding to a first storage indication signal is stored in a first register sub-circuit, and a second multiplication result corresponding to a second storage indication signal is stored in a second register sub-circuit, so that the problem of multiplication result coverage is avoided; in addition, the method can also ensure that the bit width of the obtained target operation result can be less than 2 times of the bit width of data input by the multiplier, effectively reduces the requirement of the multiplier on the bit width of the input/output port, and simultaneously, the method can obtain fewer effective partial products and reduce the complexity of multiplication operation.
As an embodiment, the step of reading, in the step S106, partial data stored in different register sub-circuits and corresponding to the multiplication result according to the read indication signal to obtain the target operation result may specifically be implemented by:
S1061, reading a first part of data in the first multiplication result stored in the first register sub-circuit according to the first reading indication signal, to obtain a first operation result.
S1062, reading a second part of data in the first multiplication result stored in the first register sub-circuit according to a second reading instruction signal, to obtain a second operation result.
Specifically, the number of read instruction signals acquired by the state control circuit in the multiplier may be equal to the number of times the multiplier reads the operation result, which is 2 times the number of the multiplication results. Optionally, the multiplication result may include two parts of data, namely, a first part of data and a second part of data. For example, if the bit width of the multiplication result is equal to 2N, the multiplication result may be divided into two parts of data, i.e., upper N-bit data and lower N-bit data, where the first part of data may be the upper N-bit data or the lower N-bit data, and the second part of data may be the lower N-bit data or the upper N-bit data.
S1063, reading a first part of data in the second multiplication result stored in the second register sub-circuit according to a third read instruction signal, to obtain a third operation result.
Alternatively, each read indication signal may correspond to the first part of data or the second part of data in the multiplication result.
S1064, reading a second part of data in the second multiplication result stored in the second register sub-circuit according to a fourth reading indication signal, to obtain a fourth operation result.
Specifically, the multiplier may perform multiplication on multiple sets of data to be processed to obtain multiple multiplication results, and thus, after the multiplier reads the fourth operation result, part of the data in the next multiplication result may be read according to the next read instruction signal.
Illustratively, suppose that the input port bit width of the multiplier is 32 bits and the output port bit width is (64/t + deta) bits (in general, the multiplier may complete one multiplication operation in t clock cycles to obtain a multiplication result, with t > 1 and deta > 0), that the bit width of the data received by the multiplier is 32 bits, and that the multiplier needs to multiply multiple sets of data to be processed. In this case, the register circuit 13 includes 64/(64/t + deta) register sub-circuits 131 (i.e., register sub-circuits A_1, A_2, ..., A_i, where i may be equal to 64/(64/t + deta)), and the target operation result may be obtained as follows:
if the multiplier obtains the first multiplication result M_0 after t clock cycles (t may be greater than or equal to 0), the register control circuit may store M_0 (64-bit width) into the register sub-circuit A_1 according to the first storage indication signal; at this time, the selection circuit may read the upper 32 bits of M_0 from the register sub-circuit A_1 according to the first read indication signal, as the first operation result obtained by the first multiplication operation;
meanwhile, when the multiplier reaches the (t+1)-th clock cycle, the selection circuit may read the lower 32 bits of M_0 from the register sub-circuit A_1 according to the second read indication signal, as the second operation result obtained by the first multiplication operation; in this embodiment, the multiplier splices the first operation result and the second operation result to obtain the target operation result of the data to be processed;
if the multiplier obtains the second multiplication result M_1 by the 2t-th clock cycle, the register control circuit may store M_1 into the register sub-circuit A_2 according to the second storage indication signal; at this time, the selection circuit may read the upper 32 bits of M_1 from the register sub-circuit A_2 according to the third read indication signal, as the third operation result obtained by the second multiplication operation;
meanwhile, when the multiplier reaches the (2t+1)-th clock cycle, the selection circuit may read the lower 32 bits of M_1 from the register sub-circuit A_2 according to the fourth read indication signal, as the fourth operation result obtained by the second multiplication operation; in this embodiment, the multiplier splices the third operation result and the fourth operation result to obtain the target operation result of the data to be processed;
and by analogy, the obtained multiplication result can be stored into corresponding different register sub-circuits according to different storage indication signals, and partial data in the stored multiplication result in different register sub-circuits can be read according to different reading indication signals to obtain a target operation result.
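The store-and-read sequence above may be modeled by a small Python sketch. The class name `RegisterBank`, its methods, and the "high"/"low" part selector are hypothetical stand-ins for the register control circuit, the register sub-circuits, and the selection circuit; clock-cycle timing is not modeled.

```python
class RegisterBank:
    """Holds one 2N-bit multiplication result per register sub-circuit."""

    def __init__(self, count, n):
        self.n = n
        self.slots = [None] * count     # None marks an idle register sub-circuit

    def store(self, index, result):
        """Store a result in the sub-circuit chosen by the storage indication signal."""
        self.slots[index] = result & ((1 << (2 * self.n)) - 1)

    def read(self, index, part):
        """Read the upper or lower N bits chosen by the read indication signal."""
        value = self.slots[index]
        if part == "high":
            return value >> self.n
        return value & ((1 << self.n) - 1)


N = 32
bank = RegisterBank(count=2, n=N)
m_0 = 0x12345678 * 0x9ABCDEF0           # first multiplication result (64-bit width)
m_1 = 0x0000FFFF * 0x00010001           # second multiplication result
bank.store(0, m_0)                       # first storage indication signal -> A_1
first = bank.read(0, "high")             # first read indication signal
second = bank.read(0, "low")             # second read indication signal
bank.store(1, m_1)                       # second storage indication signal -> A_2
third = bank.read(1, "high")             # third read indication signal
fourth = bank.read(1, "low")             # fourth read indication signal
target_0 = (first << N) | second         # splice the two halves of M_0
target_1 = (third << N) | fourth         # splice the two halves of M_1
```

Each read returns at most N bits, so an output port narrower than 2N bits suffices, at the cost of two reads per multiplication result.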
In addition, if a set of data to be processed in the multiple sets of data to be processed has a zero value, at this time, the multiplier may obtain a multiplication result corresponding to the set of data to be processed through m (m < t) clock cycles, the multiplier may store the multiplication result into a corresponding register sub-circuit according to the storage indication signal, in the current clock cycle, the multiplier may read a part of data in the multiplication results stored in different register sub-circuits according to the reading indication signal, and the multiplier in the next clock cycle may output the remaining part of data in the multiplication result; if the next group of data to be processed also has a zero value, and 1 clock cycle is required to complete one multiplication operation, so as to obtain a multiplication operation result, at this time, the multiplier can store the multiplication operation result into the next adjacent register sub-circuit.
In the data processing method provided by this embodiment, the multiplier reads part of data in the corresponding multiplication result stored in different register sub-circuits according to the read indication signal to obtain a target operation result, and the method can read high-order data and low-order data in the multiplication result respectively as the target operation result, so that the bit width of the obtained target operation result can be less than 2 times of the bit width of the data input by the multiplier, thereby effectively reducing the requirement of the multiplier on the bit width of the input/output port; meanwhile, the number of effective partial products obtained by the method is small, and the complexity of multiplication operation is reduced.
Fig. 6 is a flowchart illustrating a data processing method according to an embodiment, where the method can be processed by the multiplier shown in fig. 2, and this embodiment relates to a process of multiplying data. As shown in fig. 6, the method includes:
S201, receiving a data conversion signal and data to be processed.
Specifically, the multiplication circuit in the multiplier can receive two data to be processed and a data conversion signal. Optionally, the bit width of the data to be processed may be equal to the bit width of the input port of the multiplier. Optionally, when the number conversion circuit receives different data conversion signals, it may convert the received data into data in the format corresponding to each data conversion signal.
S202, performing regular signed number coding processing on the data to be processed to obtain a partial product of target coding.
Specifically, the principle of the regular signed number encoding process may be characterized as follows: for an N-bit multiplier, the bits are processed from the lower bits to the higher bits. If there are l (l ≥ 2) consecutive bits of value 1, the l consecutive 1s may be converted into the data "1 (0)_(l-1) (-1)", that is, a 1 followed by (l-1) zeros and a -1, and the remaining (N-l) bits are combined with the converted (l+1) bits to obtain new data. The new data is then used as the initial data of the next stage of conversion, until the new data obtained after conversion contains no l (l ≥ 2) consecutive bits of value 1. The bit width of the target code obtained by performing regular signed number encoding processing on the N-bit multiplier may be equal to (N+1). It should be noted that the number of partial products of the target code may be equal to the bit width N of the data received by the multiplier plus 1.
S203, accumulating the partial products of the target code to obtain a multiplication result.
Specifically, the accumulation sub-circuit may accumulate the values in each column of the partial products of all target codes to obtain the multiplication result. Optionally, the bit width of the multiplication result may be equal to 2 times the bit width of the data received by the multiplier, and may also be equal to 2 times the bit width of the input port of the multiplier.
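As a rough software illustration of the accumulation step, the sketch below forms one sign-extended partial product per signed digit and sums them in a 2N-bit accumulator. It does not model the column-wise hardware structure; the function name `accumulate_partial_products` is hypothetical, and the sketch assumes a signed N-bit multiplicand and signed digits (least significant first) derived from an unsigned N-bit multiplier.

```python
def accumulate_partial_products(digits: list[int], multiplicand: int, n_bits: int) -> int:
    """Sum the partial products selected by signed digits in a 2N-bit accumulator.

    digits: signed digits (-1, 0 or 1), least significant first.
    multiplicand: signed value assumed to fit in n_bits.
    """
    width = 2 * n_bits
    mask = (1 << width) - 1
    total = 0
    for position, digit in enumerate(digits):
        if digit == 0:
            continue  # a zero digit yields no effective partial product
        # Partial product: +multiplicand or -multiplicand, shifted to its column.
        row = (digit * multiplicand) << position
        # Masking to 2N bits gives the sign-extended two's-complement row.
        total = (total + (row & mask)) & mask
    # Reinterpret the 2N-bit pattern as a signed result.
    if total >= 1 << (width - 1):
        total -= 1 << width
    return total


# Signed digits for the multiplier 7 (8 - 1), least significant first.
digits_for_7 = [-1, 0, 0, 1, 0]
print(accumulate_partial_products(digits_for_7, 5, 4))
print(accumulate_partial_products(digits_for_7, -3, 4))
```

Only two rows are added for the multiplier 7, compared with three rows for its plain binary form 0111.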
S204, performing number conversion processing on the multiplication result according to the data conversion signal to obtain a target operation result, wherein the data conversion signal indicates the data type required for the target operation result.
Specifically, the number conversion circuit may convert the multiplication result into a fixed-point operation result or a floating-point operation result, as determined by the received data conversion signal. For example, suppose the number conversion circuit can receive two data conversion signals, denoted 00 and 01, and the input port and output port of the multiplier are both N bits wide. Then 00 indicates that the number conversion circuit converts the received 2N-bit multiplication result into an N-bit fixed-point operation result, and 01 indicates that it converts the received 2N-bit multiplication result into an N-bit floating-point operation result. The function assigned to each data conversion signal can be set flexibly. Optionally, each data conversion signal may represent the data type into which the multiplier needs to convert the multiplication result.
According to the data processing method provided by this embodiment, the data conversion signal and the data to be processed are received, the data to be processed is multiplied to obtain a multiplication result, and number conversion processing is performed on the multiplication result according to the data conversion signal to obtain a target operation result. The bit width of the target operation result can therefore be less than 2 times the bit width of the data input to the multiplier, which effectively reduces the bit width required of the multiplier's input/output ports. Meanwhile, the number of effective partial products obtained by the method is small, which reduces the complexity of the multiplication operation.
The embodiment of the application also provides a machine learning operation device, which includes one or more of the multipliers described in this application and is used for acquiring data to be operated on and control information from other processing devices, executing the specified machine learning operation, and transmitting the execution result to peripheral devices through an I/O interface. The peripheral devices include, for example, cameras, displays, mice, keyboards, network cards, WiFi interfaces, and servers. When more than one multiplier is included, the multipliers can be linked and can transmit data through a specific structure, for example, a PCIe (Peripheral Component Interconnect Express) bus, so as to support larger-scale machine learning operations. In this case, the multipliers may share the same control system or have separate control systems; they may share memory, or each accelerator may have its own memory. In addition, the interconnection mode can be any interconnection topology.
The machine learning operation device has high compatibility and can be connected to various types of servers through a PCIe interface.
The embodiment of the application also provides a combined processing device, which includes the machine learning operation device described above, a universal interconnection interface, and other processing devices. The machine learning operation device interacts with the other processing devices to jointly complete the operation specified by the user. Fig. 7 is a schematic diagram of the combined processing device.
The other processing devices include one or more types of general-purpose or special-purpose processors, such as central processing units (CPUs), graphics processing units (GPUs), and neural network processors. The number of processors included in the other processing devices is not limited. The other processing devices serve as the interface between the machine learning operation device and external data and control, which includes transferring data and performing basic control of the machine learning operation device such as starting and stopping it; the other processing devices can also cooperate with the machine learning operation device to complete computation tasks.
The universal interconnection interface is used for transmitting data and control instructions between the machine learning operation device and the other processing devices. The machine learning operation device obtains the required input data from the other processing devices and writes it into an on-chip storage device of the machine learning operation device; it can obtain control instructions from the other processing devices and write them into an on-chip control cache of the machine learning operation device; it can also read the data in the storage module of the machine learning operation device and transmit it to the other processing devices.
Optionally, as shown in Fig. 8, the combined processing device may further include a storage device, which is connected to the machine learning operation device and to the other processing devices, respectively. The storage device is used for storing data of the machine learning operation device and the other processing devices, and is particularly suitable for data to be computed that cannot be held entirely in the internal storage of the machine learning operation device or the other processing devices.
The combined processing device can be used as a system on chip (SoC) for equipment such as mobile phones, robots, unmanned aerial vehicles, and video monitoring equipment, which effectively reduces the core area of the control part, increases the processing speed, and reduces the overall power consumption. In this case, the universal interconnection interface of the combined processing device is connected to certain components of the equipment, such as a camera, a display, a mouse, a keyboard, a network card, or a WiFi interface.
In some embodiments, a chip is also claimed, which includes the above machine learning operation device or the combined processing device.
In some embodiments, a chip package structure is provided, which includes the above chip.
In some embodiments, a board card is provided, which includes the above chip package structure. As shown in Fig. 9, in addition to the chip 389, the board card may include other supporting components, including but not limited to a memory device 390, a receiving device 391, and a control device 392.
The memory device 390 is connected to the chip in the chip package structure through a bus and is used for storing data. The memory device may include a plurality of groups of storage units 393. Each group of storage units is connected to the chip through a bus. It is understood that each group of storage units may be DDR SDRAM (Double Data Rate SDRAM).
DDR can double the speed of SDRAM without increasing the clock frequency, because it allows data to be read out on both the rising and falling edges of the clock pulse. In one embodiment, the memory device may include 4 groups of storage units. Each group of storage units may include a plurality of DDR4 chips. In one embodiment, the chip may internally include four 72-bit DDR4 controllers, in each of which 64 bits are used for data transmission and 8 bits are used for ECC checking. It can be understood that when DDR4-3200 chips are used in each group of storage units, the theoretical data transmission bandwidth can reach 25600 MB/s.
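The 25600 MB/s figure follows from the transfer rate and the data width. The short calculation below is a sketch of that arithmetic, assuming the figure refers to a single controller with a 64-bit data path running at 3200 mega-transfers per second; the function name is hypothetical.

```python
def ddr_peak_bandwidth_mb_s(mega_transfers_per_s: int, data_bits: int) -> float:
    """Theoretical peak bandwidth in MB/s for one DDR data channel."""
    bytes_per_transfer = data_bits / 8
    return mega_transfers_per_s * bytes_per_transfer


# DDR4-3200: 3200 MT/s; 64 of the controller's 72 bits carry data (8 are ECC).
print(ddr_peak_bandwidth_mb_s(3200, 64))
```

The 8 ECC bits are excluded because they carry check information rather than payload data.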
In one embodiment, each group of storage units includes a plurality of double data rate synchronous dynamic random access memories arranged in parallel. DDR can transfer data twice in one clock cycle. A controller for controlling the DDR is provided in the chip and is used for controlling the data transmission and data storage of each storage unit.
The receiving device is electrically connected to the chip in the chip package structure. The receiving device is used for implementing data transmission between the chip and an external device (such as a server or a computer). For example, in one embodiment, the receiving device may be a standard PCIe interface: the data to be processed is transmitted from the server to the chip through the standard PCIe interface, thereby implementing the data transfer. Preferably, when a PCIe 3.0 x16 interface is used for transmission, the theoretical bandwidth can reach 16000 MB/s. In another embodiment, the receiving device may also be another interface; the present application does not limit the specific form of the other interface, provided that the interface unit can implement the transfer function. In addition, the calculation result of the chip is likewise transmitted back to the external device (e.g., a server) by the receiving device.
The control device is electrically connected to the chip. The control device is used for monitoring the state of the chip. Specifically, the chip and the control device may be electrically connected through an SPI interface. The control device may include a microcontroller unit (MCU). The chip may include a plurality of processing chips, a plurality of processing cores, or a plurality of processing circuits, and may drive a plurality of loads. Therefore, the chip can be in different working states such as heavy load and light load. The control device can regulate the working states of the plurality of processing chips, the plurality of processing cores, and/or the plurality of processing circuits in the chip.
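The 16000 MB/s figure corresponds to the raw signalling rate of a PCIe 3.0 x16 link. The sketch below reproduces that arithmetic and, for comparison, also applies the 128b/130b line encoding that PCIe 3.0 uses, which lowers the usable rate slightly. The function name is hypothetical, and protocol overhead above the encoding layer is ignored.

```python
def pcie_bandwidth_mb_s(giga_transfers_per_s: float, lanes: int,
                        payload_bits: int = 128, encoded_bits: int = 130) -> tuple[float, float]:
    """Return (raw, after line encoding) one-direction bandwidth in MB/s."""
    # Each transfer carries one bit per lane; divide by 8 to convert to bytes.
    raw = giga_transfers_per_s * 1000 * lanes / 8
    after_encoding = raw * payload_bits / encoded_bits
    return raw, after_encoding


# PCIe 3.0 signals at 8 GT/s per lane; x16 means 16 lanes.
raw, after_encoding = pcie_bandwidth_mb_s(8, 16)
print(raw, round(after_encoding, 1))
```

The raw value matches the theoretical figure quoted in the text; the encoded value is the upper bound on payload throughput in one direction.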
In some embodiments, an electronic device is provided that includes the above board card.
The electronic device may be a multiplier, a robot, a computer, a printer, a scanner, a tablet computer, a smart terminal, a mobile phone, a driving recorder, a navigator, a sensor, a camera, a server, a cloud server, a video camera, a projector, a watch, a headset, a mobile storage device, a wearable device, a vehicle, a household appliance, and/or a medical device.
The vehicle may be an airplane, a ship, and/or a road vehicle; the household appliance may be a television, an air conditioner, a microwave oven, a refrigerator, an electric rice cooker, a humidifier, a washing machine, an electric lamp, a gas stove, or a range hood; the medical device may be a nuclear magnetic resonance apparatus, a B-mode ultrasound scanner, and/or an electrocardiograph.
It should be noted that, for simplicity of description, the foregoing method embodiments are described as a series of circuit combinations, but those skilled in the art should understand that the present application is not limited by the described circuit combinations, because some circuits may be implemented in other ways or structures according to the present application. Further, those skilled in the art should also appreciate that the embodiments described in the specification are all alternative embodiments, and that the devices and modules referred to are not necessarily required for this application.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
The above embodiments express only several implementations of the present application. Their description is relatively specific and detailed, but should not be construed as limiting the scope of the present application. It should be noted that a person skilled in the art can make several variations and modifications without departing from the concept of the present application, all of which fall within the scope of protection of the present application. Therefore, the scope of protection of this patent shall be subject to the appended claims.

Claims (10)

1. A multiplier, characterized in that it comprises: a multiplication operation circuit, a register control circuit, a register circuit, a state control circuit, and a selection circuit; the multiplication operation circuit comprises a regular signed number coding sub-circuit and an accumulation sub-circuit, the output end of the regular signed number coding sub-circuit is connected with the input end of the accumulation sub-circuit, the output end of the accumulation sub-circuit is connected with the first input end of the register control circuit, the output end of the register control circuit is connected with the input end of the register circuit, the output end of the register circuit is connected with the first input end of the selection circuit, the first output end of the state control circuit is connected with the second input end of the register control circuit, and the second output end of the state control circuit is connected with the second input end of the selection circuit.
2. The multiplier of claim 1, wherein the regular signed number coding sub-circuit comprises a regular signed number coding unit and a partial product obtaining unit, the regular signed number coding unit is configured to receive first data and perform the regular signed number coding processing on the first data to obtain a target code, the partial product obtaining unit is configured to receive second data, obtain an original partial product according to the target code and the second data, and obtain a partial product of the target code according to the original partial product, the accumulation sub-circuit is configured to perform accumulation processing on the partial product of the target code to obtain a multiplication result, the state control circuit is configured to obtain a storage indication signal and a read indication signal, the register control circuit is configured to obtain the storage indication signal according to the storage indication signal input by the state control circuit, and the selection circuit is used for reading data in the multiplication result stored in the register circuit according to the received reading indication signal to be used as a target operation result.
3. The multiplier according to claim 2, wherein the partial product obtaining unit is specifically configured to perform conversion processing on the target code to obtain an original partial product, perform sign bit extension processing on the original partial product to obtain a sign bit extended partial product, and obtain the partial product of the target code according to the sign bit extended partial product.
4. The multiplier of any of claims 2 to 3, wherein the accumulation sub-circuit comprises: a Wallace tree group unit and an accumulation unit; the output end of the Wallace tree group unit is connected with the input end of the accumulation unit; the Wallace tree group unit is used for accumulating the partial product of the target code to obtain an accumulation operation result, and the accumulation unit is used for accumulating the accumulation operation result.
5. The multiplier of claim 4, wherein the Wallace tree group unit comprises: a Wallace tree subunit for accumulating each column of values in the partial products of all target codes.
6. The multiplier of claim 4, wherein the accumulation unit comprises: an adder for adding the received accumulated result.
7. The multiplier of claim 1, wherein the register circuit comprises: a register sub-circuit for storing the multiplication results corresponding to different storage indication signals.
8. A multiplier, characterized in that it comprises: a multiplication operation circuit and a number conversion circuit, wherein the multiplication operation circuit comprises a regular signed number coding sub-circuit and an accumulation sub-circuit, the output end of the regular signed number coding sub-circuit is connected with the input end of the accumulation sub-circuit, the output end of the accumulation sub-circuit is connected with the input end of the number conversion circuit, and the number conversion circuit comprises a first conversion sub-circuit and a second conversion sub-circuit;
the regular signed number coding sub-circuit is used for performing regular signed number coding processing on received data to obtain a target code and obtaining a partial product of the target code according to the target code, the accumulation sub-circuit is used for performing accumulation processing on the partial product of the target code to obtain a multiplication result, and the first conversion sub-circuit and the second conversion sub-circuit are respectively used for performing number conversion processing on the multiplication result to obtain a target operation result.
9. The multiplier of claim 8, wherein the number conversion circuit includes an input port for receiving a data conversion signal; the data conversion signal is used for determining the type of data conversion performed by the number conversion circuit.
10. The multiplier of claim 8 or 9, wherein the first conversion sub-circuit is configured to convert the multiplication result into a floating-point target operation result, and wherein the second conversion sub-circuit is configured to convert the multiplication result into a fixed-point target operation result.
CN201921433513.6U 2019-08-30 2019-08-30 Multiplier and method for generating a digital signal Active CN209895329U (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201921433513.6U CN209895329U (en) 2019-08-30 2019-08-30 Multiplier and method for generating a digital signal

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201921433513.6U CN209895329U (en) 2019-08-30 2019-08-30 Multiplier and method for generating a digital signal

Publications (1)

Publication Number Publication Date
CN209895329U true CN209895329U (en) 2020-01-03

Family

ID=69022051

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201921433513.6U Active CN209895329U (en) 2019-08-30 2019-08-30 Multiplier and method for generating a digital signal

Country Status (1)

Country Link
CN (1) CN209895329U (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110515589A (en) * 2019-08-30 2019-11-29 上海寒武纪信息科技有限公司 Multiplier, data processing method, chip and electronic equipment
CN110515589B (en) * 2019-08-30 2024-04-09 上海寒武纪信息科技有限公司 Multiplier, data processing method, chip and electronic equipment

Legal Events

Date Code Title Description
GR01 Patent grant