CN110515589A

CN110515589A - Multiplier, data processing method, chip and electronic equipment

Info

Publication number: CN110515589A
Application number: CN201910819020.4A
Authority: CN
Inventors: Not disclosed (inventor requested non-publication)
Original assignee: Shanghai Cambricon Information Technology Co Ltd
Current assignee: Shanghai Cambricon Information Technology Co Ltd
Priority date: 2019-08-30
Filing date: 2019-08-30
Publication date: 2019-11-29
Anticipated expiration: 2039-08-30
Also published as: CN110515589B

Abstract

The application provides a multiplier, a data processing method, a chip, and electronic equipment. The multiplier includes a multiplying operational circuit, a deposit control circuit, a register circuit, a state control circuit, and a selection circuit. The multiplying operational circuit includes a canonical signed number coding sub-circuit and a cumulative sub-circuit. The output end of the canonical signed number coding sub-circuit is connected to the input end of the cumulative sub-circuit; the output end of the cumulative sub-circuit is connected to the first input end of the deposit control circuit; the output end of the deposit control circuit is connected to the input end of the register circuit; the output end of the register circuit is connected to the first input end of the selection circuit; the first output end of the state control circuit is connected to the second input end of the deposit control circuit; and the second output end of the state control circuit is connected to the second input end of the selection circuit. The multiplier performs canonical signed number coding on the received data, which yields fewer effective partial products and thereby reduces the complexity of the multiplication.

Description

Multiplier, data processing method, chip and electronic equipment

Technical field

This application relates to the field of computer technology, and in particular to a multiplier, a data processing method, a chip, and electronic equipment.

Background art

With the continuous development of digital electronic technology, the rapid development of all kinds of artificial intelligence (AI) chips places ever higher demands on high-performance digital multipliers. Neural network algorithms are among the algorithms most widely used on intelligent chips, and multiplication performed by a multiplier is a common operation in them.

Currently, a multiplier encodes every three bits of the multiplier operand as one code, obtains partial products from the multiplicand, and compresses all partial products with a Wallace tree to obtain the target operation result. In this conventional technique, however, the codes contain many non-zero values, so many partial products are generated, which makes the multiplication more complex for the multiplier to implement.

Summary of the invention

In view of the above technical problem, it is necessary to provide a multiplier, a data processing method, a chip, and electronic equipment that reduce the number of effective partial products obtained during multiplication and thereby reduce the complexity of the multiplication performed by the multiplier.

An embodiment of the present application provides a multiplier, comprising: a multiplying operational circuit, a deposit control circuit, a register circuit, a state control circuit, and a selection circuit. The multiplying operational circuit includes a canonical signed number coding sub-circuit and a cumulative sub-circuit. The output end of the canonical signed number coding sub-circuit is connected to the input end of the cumulative sub-circuit; the output end of the cumulative sub-circuit is connected to the first input end of the deposit control circuit; the output end of the deposit control circuit is connected to the input end of the register circuit; the output end of the register circuit is connected to the first input end of the selection circuit; the first output end of the state control circuit is connected to the second input end of the deposit control circuit; and the second output end of the state control circuit is connected to the second input end of the selection circuit.

In one of the embodiments, the canonical signed number coding sub-circuit includes a canonical signed number coding unit and a partial product acquiring unit. The canonical signed number coding unit is used to receive first data and perform canonical signed number coding on the first data to obtain a target code. The partial product acquiring unit is used to receive second data, obtain initial partial products according to the target code and the second data, and obtain the partial products of the target code according to the initial partial products. The cumulative sub-circuit is used to accumulate the partial products of the target code to obtain a multiplication result. The state control circuit is used to obtain a storage indication signal and a reading indication signal. The deposit control circuit is used to determine, according to the storage indication signal input by the state control circuit, the register circuit that stores the multiplication result. The register circuit is used to store the multiplication result. The selection circuit is used to read, according to the received reading indication signal, data in the multiplication result stored in the register circuit as the target operation result.

In one of the embodiments, the canonical signed number coding unit may include a data input port and a target code output port. The data input port is used to receive the first data on which canonical signed number coding is to be performed; the target code output port is used to output the target code obtained after canonical signed number coding of the first data.

In one of the embodiments, the partial product acquiring unit is specifically used to perform conversion processing according to the target code to obtain the initial partial product, perform sign bit extension on the initial partial product to obtain a sign-extended partial product, and obtain the partial product of the target code according to the sign-extended partial product.

In one of the embodiments, the partial product acquiring unit includes a target code input port, a second data input port, and a partial product output port. The target code input port is used to receive the target code; the second data input port is used to receive the second data; the partial product output port is used to output the partial products of the target code.

In one of the embodiments, the cumulative sub-circuit includes a Wallace tree group unit and a summing unit, where the output end of the Wallace tree group unit is connected to the input end of the summing unit. The Wallace tree group unit is used to accumulate the partial products of the target code to obtain accumulation operation results; the summing unit is used to accumulate the accumulation operation results.

In one of the embodiments, the Wallace tree group unit includes a Wallace tree subunit, and the Wallace tree subunit is used to accumulate the values of each column in the partial products of all target codes.

In one of the embodiments, the summing unit includes an adder, and the adder is used to add the received accumulation correction results.

In one of the embodiments, the adder includes a carry signal input port, a sum signal input port, and a result output port. The carry signal input port is used to receive a carry signal; the sum signal input port is used to receive a sum signal; the result output port is used to output the result of accumulating the carry signal and the sum signal.
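
The division of labor between a Wallace tree stage that produces a sum signal and a carry signal, and an adder that combines them, can be illustrated with a word-level carry-save sketch in Python. This is only an illustration and not the circuit of the application: the names `carry_save_add`, `wallace_reduce`, and `final_add` are hypothetical, and the column-wise subunits described above are modeled here by bitwise operations over whole words.

```python
def carry_save_add(a: int, b: int, c: int, width: int) -> tuple[int, int]:
    """3:2 compression: three rows in, one sum row and one carry row out."""
    mask = (1 << width) - 1
    sum_row = (a ^ b ^ c) & mask
    # A carry is generated in a column when at least two inputs are 1;
    # it is moved one column to the left.
    carry_row = (((a & b) | (a & c) | (b & c)) << 1) & mask
    return sum_row, carry_row


def wallace_reduce(rows: list[int], width: int) -> tuple[int, int]:
    """Compress any number of rows down to two rows (sum and carry signals)."""
    mask = (1 << width) - 1
    rows = [r & mask for r in rows]
    while len(rows) > 2:
        groups = len(rows) // 3
        reduced = []
        for g in range(groups):
            a, b, c = rows[3 * g: 3 * g + 3]
            reduced.extend(carry_save_add(a, b, c, width))
        reduced.extend(rows[3 * groups:])  # rows left over at this level
        rows = reduced
    rows += [0] * (2 - len(rows))
    return rows[0], rows[1]


def final_add(sum_signal: int, carry_signal: int, width: int) -> int:
    """The adder stage: add the two remaining rows, modulo 2**width."""
    return (sum_signal + carry_signal) & ((1 << width) - 1)
```

Only the last step needs carry propagation; every earlier level works on each column independently, which is the usual reason for placing a compression tree in front of a single adder.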

In one of the embodiments, the register circuit includes deposit sub-circuits, and the deposit sub-circuits are used to store the multiplication results corresponding to different storage indication signals.

An embodiment of the present application provides a multiplier that includes a multiplying operational circuit and a revolution circuit (a circuit that converts the number format of the result). The multiplying operational circuit includes a canonical signed number coding sub-circuit and a cumulative sub-circuit. The output end of the canonical signed number coding sub-circuit is connected to the input end of the cumulative sub-circuit, and the output end of the cumulative sub-circuit is connected to the input end of the revolution circuit. The revolution circuit includes a first conversion sub-circuit and a second conversion sub-circuit;

The canonical signed number coding sub-circuit is used to perform canonical signed number coding on the received data to obtain a target code, and to obtain the partial products of the target code according to the target code. The cumulative sub-circuit is used to perform corrected accumulation on the partial products of the target code to obtain a multiplication result. The first conversion sub-circuit and the second conversion sub-circuit are each used to perform conversion processing on the multiplication result to obtain a target operation result.

In one of the embodiments, the revolution circuit includes an input port for receiving a data conversion signal; the data conversion signal is used to determine the data conversion type handled by the revolution circuit.

In one of the embodiments, the first conversion sub-circuit is specifically used to convert the multiplication result into a target operation result of floating-point type, and the second conversion sub-circuit is specifically used to convert the multiplication result into a target operation result of fixed-point type.

In the multiplier provided by this embodiment, the canonical signed number coding sub-circuit performs canonical signed number coding on the received data, so fewer effective partial products are obtained, which reduces the complexity of the multiplication implemented by the multiplier.

An embodiment of the present application provides a data processing method, the method comprising:

receiving data to be processed;

performing canonical signed number coding on the data to be processed to obtain the partial products of a target code;

accumulating the partial products of the target code to obtain a multiplication result;

obtaining a storage indication signal and a reading indication signal;

storing multiple multiplication results into different deposit sub-circuits according to the storage indication signal;

reading, according to the reading indication signal, part of the data in the corresponding multiplication results stored in the different deposit sub-circuits, to obtain a target operation result.

In one of the embodiments, performing canonical signed number coding on the data to be processed to obtain the partial products of the target code comprises:

performing canonical signed number coding on the data to be processed to obtain initial partial products;

performing sign bit extension on the initial partial products to obtain the partial products of the target code.

In one of the embodiments, performing canonical signed number coding on the data to be processed to obtain the initial partial products comprises:

performing canonical signed number coding on the data to be processed to obtain a target code;

performing conversion processing according to the data to be processed and the target code to obtain the initial partial products.

In one of the embodiments, performing sign bit extension on the initial partial products to obtain the partial products of the target code comprises: performing bit-padding on the initial partial products to obtain the partial products of the target code.

In one of the embodiments, storing multiple multiplication results into different deposit sub-circuits according to the storage indication signal comprises:

storing the first multiplication result, which corresponds to the first storage indication signal, into the first deposit sub-circuit;

storing the second multiplication result, which corresponds to the second storage indication signal, into the second deposit sub-circuit.

In one of the embodiments, reading, according to the reading indication signal, part of the data in the corresponding multiplication results stored in the different deposit sub-circuits to obtain the target operation result comprises:

reading, according to the first reading indication signal, the first part of the data in the first multiplication result stored in the first deposit sub-circuit, to obtain a first operation result;

reading, according to the second reading indication signal, the second part of the data in the first multiplication result stored in the first deposit sub-circuit, to obtain a second operation result;

reading, according to the third reading indication signal, the first part of the data in the second multiplication result stored in the second deposit sub-circuit, to obtain a third operation result;

reading, according to the fourth reading indication signal, the second part of the data in the second multiplication result stored in the second deposit sub-circuit, to obtain a fourth operation result.
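
The storing and reading steps above can be sketched as a small behavioral model in Python. The class name `DepositSubCircuits` and its methods are hypothetical. The mapping of reading indication signals 1 to 4 onto two deposit sub-circuits follows the text, whereas treating the "first part" as the high N bits and the "second part" as the low N bits is an assumption made only for this example.

```python
class DepositSubCircuits:
    """Toy model: storage indication signal k selects deposit sub-circuit k."""

    def __init__(self, n: int):
        self.n = n          # operand bit width N; each result is 2N bits wide
        self.stored = {}    # deposit sub-circuit index -> stored result

    def store(self, storage_signal: int, result: int) -> None:
        """Store a 2N-bit multiplication result in the selected sub-circuit."""
        self.stored[storage_signal] = result & ((1 << (2 * self.n)) - 1)

    def read(self, read_signal: int) -> int:
        """Reading signals 1, 2 address sub-circuit 1; signals 3, 4 address sub-circuit 2."""
        sub_circuit = (read_signal + 1) // 2
        first_part = read_signal % 2 == 1
        result = self.stored[sub_circuit]
        mask = (1 << self.n) - 1
        # Assumption: first part = high N bits, second part = low N bits.
        return (result >> self.n) & mask if first_part else result & mask


regs = DepositSubCircuits(n=8)
regs.store(1, 0x1234)   # first multiplication result
regs.store(2, 0xABCD)   # second multiplication result
parts = [regs.read(signal) for signal in (1, 2, 3, 4)]
```

With these inputs the four reads return the four N-bit operation results in order, two per stored multiplication result.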

receiving a data conversion signal and data to be processed;

performing conversion processing on the multiplication result according to the data conversion signal to obtain a target operation result, where the data conversion signal indicates the data type required for the target operation result, into which the multiplier needs to convert the multiplication result.

In the data processing method provided by this embodiment, canonical signed number coding is performed on the received data to be processed, which reduces the number of effective partial products in the multiplication and thereby the complexity of the multiplication.

An embodiment of the present application provides a machine learning arithmetic unit that includes one or more of the multipliers. The machine learning arithmetic unit is used to obtain data to be operated on and control information from other processing units, execute the specified machine learning operation, and pass the execution result to other processing units through an I/O interface;

When the machine learning arithmetic unit includes multiple multipliers, the multipliers can be connected to one another through a specific structure and transmit data among themselves;

The multiple multipliers are interconnected and transmit data through a PCIE bus to support larger-scale machine learning operations; the multiple multipliers share the same control system or have their own control systems; the multiple multipliers share memory or have their own memories; the multiple multipliers can be interconnected in any interconnection topology.

An embodiment of the present application provides a combined processing device that includes the machine learning arithmetic unit described above, a general interconnection interface, and other processing units. The machine learning arithmetic unit interacts with the other processing units to jointly complete the operation specified by the user. The combined processing device can also include a storage device, which is connected to the machine learning arithmetic unit and to the other processing units and is used to save the data of the machine learning arithmetic unit and of the other processing units.

An embodiment of the present application provides a neural network chip that includes the multiplier described above, the machine learning arithmetic unit described above, or the combined processing device described above.

An embodiment of the present application provides a neural network chip package structure that includes the neural network chip described above.

An embodiment of the present application provides a board that includes the neural network chip package structure described above.

An embodiment of the present application provides an electronic device that includes the neural network chip described above or the board described above.

An embodiment of the present application provides a chip that includes at least one multiplier as described in any one of the above embodiments.

An embodiment of the present application provides electronic equipment that includes the chip described above.

Brief description of the drawings

Fig. 1 is a schematic structural diagram of a multiplier provided by an embodiment;

Fig. 2 is a schematic structural diagram of a multiplier provided by another embodiment;

Fig. 3 is a schematic diagram of the distribution pattern of the partial products of 9 target codes provided by another embodiment;

Fig. 4 is a specific circuit structure diagram of the accumulation circuit for 8-bit data operation provided by another embodiment;

Fig. 5 is a schematic flowchart of a data processing method provided by an embodiment;

Fig. 6 is a schematic flowchart of another data processing method provided by another embodiment;

Fig. 7 is a structural diagram of a combined processing device provided by an embodiment;

Fig. 8 is a structural diagram of another combined processing device provided by an embodiment;

Fig. 9 is a schematic structural diagram of a board provided by an embodiment.

Detailed description of the embodiments

To make the objectives, technical solutions, and advantages of the present application clearer, the application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only intended to explain the application, not to limit it.

The multiplier provided by the present application can be applied in an AI chip, a field-programmable gate array (FPGA) chip, or other hardware circuit devices to perform operation processing; its specific structure is shown schematically in Fig. 1 and Fig. 2.

Fig. 1 is a schematic structural diagram of a multiplier provided by an embodiment. The multiplier includes: a multiplying operational circuit 11, a deposit control circuit 12, a register circuit 13, a state control circuit 14, and a selection circuit 15. The multiplying operational circuit 11 includes a canonical signed number coding sub-circuit 111 and a cumulative sub-circuit 112. The output end of the canonical signed number coding sub-circuit 111 is connected to the input end of the cumulative sub-circuit 112; the output end of the cumulative sub-circuit 112 is connected to the first input end of the deposit control circuit 12; the output end of the deposit control circuit 12 is connected to the input end of the register circuit 13; the output end of the register circuit 13 is connected to the first input end of the selection circuit 15; the first output end of the state control circuit 14 is connected to the second input end of the deposit control circuit 12; and the second output end of the state control circuit 14 is connected to the second input end of the selection circuit 15.

The canonical signed number coding sub-circuit 111 includes a canonical signed number coding unit 1111 and a partial product acquiring unit 1112. The canonical signed number coding unit 1111 is used to receive first data and perform canonical signed number coding on the first data to obtain the target code. The partial product acquiring unit 1112 is used to receive second data, obtain initial partial products according to the target code and the second data, and obtain the partial products of the target code according to the initial partial products. The cumulative sub-circuit 112 is used to accumulate the partial products of the target code to obtain a multiplication result. The state control circuit 14 is used to obtain a storage indication signal and a reading indication signal. The deposit control circuit 12 is used to determine, according to the storage indication signal input by the state control circuit 14, the register circuit 13 that stores the multiplication result. The register circuit 13 is used to store the multiplication result. The selection circuit 15 is used to read, according to the received reading indication signal, data in the multiplication result stored in the register circuit 13 as the target operation result.

Specifically, the canonical signed number coding sub-circuit 111 can, through the canonical signed number coding unit 1111, perform canonical signed number coding on the received first data to obtain the target code; the first data can be the multiplier in the multiplication. Optionally, the partial product acquiring unit 1112 can obtain the initial partial products according to the received second data and the target code, and obtain the partial products of the target code according to the initial partial products; the second data can be the multiplicand in the multiplication. The multiplier and the multiplicand can be fixed-point numbers of the same bit width. Optionally, the register circuit 13 may include multiple storage units. Optionally, the bit width of the multiplication result can equal 2 times the data bit width received by the canonical signed number coding sub-circuit 111. Optionally, the canonical signed number coding sub-circuit 111 can process data of a fixed bit width, and the data bit width it receives can equal the bit width of the multiplier's input port; in addition, in this embodiment, the bit width of the multiplier's output port can be less than 2 times the bit width of the input port. Optionally, the selection circuit 15 can have multiple input ports, whose functions can differ, and one output port. Optionally, the bit width of the target operation result can equal 1/2 of the bit width of the multiplication result; this embodiment places no restriction on this. In this embodiment, it can also be understood that the bit width of the target operation result can be less than 2 times the bit width of the multiplication result. Optionally, the number of target codes can equal the number of partial products of the target code, and a target code can take three values: -1, 0, and 1.
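
To make the three-valued target code concrete, the following Python sketch recodes a signed integer into digits from {-1, 0, 1}. It uses one common definition of canonical signed number coding (often called canonical signed digit recoding, or non-adjacent form), in which no two neighboring digits are both non-zero. The text does not specify the recoding rule of the coding unit, so the function name `csd_encode` and the rule itself are assumptions for illustration.

```python
def csd_encode(value: int, width: int) -> list[int]:
    """Recode a signed `width`-bit integer into digits in {-1, 0, 1}.

    Digits are returned least significant first, padded to `width` entries,
    and satisfy value == sum(d * 2**i for i, d in enumerate(digits)).
    """
    low, high = -(1 << (width - 1)), (1 << (width - 1)) - 1
    if not low <= value <= high:
        raise ValueError("value does not fit in the given width")
    digits = []
    n = value
    while n != 0:
        if n & 1:
            # Choose +1 when n mod 4 == 1 and -1 when n mod 4 == 3,
            # so that the next digit is guaranteed to be 0.
            d = 2 - (n & 3)
            n -= d
        else:
            d = 0
        digits.append(d)
        n >>= 1
    digits += [0] * (width - len(digits))
    return digits


digits_of_7 = csd_encode(7, 8)   # 7 = 8 - 1
nonzero_csd = sum(1 for d in digits_of_7 if d != 0)
nonzero_binary = bin(7).count("1")
```

For the value 7 the plain binary form has three non-zero bits while the recoded form has two non-zero digits, which is the kind of reduction in effective partial products the text refers to.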

It should be noted that the state control circuit 14 can automatically obtain the storage indication signal corresponding to each multiplication result obtained by the cumulative sub-circuit 112. For example, when the cumulative sub-circuit 112 obtains the first multiplication result, the storage indication signal obtained by the state control circuit 14 can be 1; when the cumulative sub-circuit 112 obtains the second multiplication result, the storage indication signal can be 2; and so on. Each time the cumulative sub-circuit 112 obtains a multiplication result, the value of the storage indication signal obtained by the state control circuit 14 can be the value for the previous multiplication result plus 1. Optionally, the state control circuit 14 can also automatically obtain the reading indication signal corresponding to the current clock cycle number when a multiplication result is present in the register circuit 13; the state control circuit 14 can obtain the current clock cycle number automatically or receive it from an external device. For example, if the first multiplication result is stored in the register circuit 13 in the first clock cycle, the corresponding reading indication signal obtained by the state control circuit 14 can be 1, and the selection circuit 15 can read part of the data stored in the register circuit 13; in the second clock cycle, the corresponding reading indication signal can be 2, and the selection circuit 15 can read the remaining data of the first multiplication result stored in the register circuit 13. In other words, the multiplier can output one multiplication result every two clock cycles. However, if the second multiplication result only becomes available five clock cycles after the first, the register circuit 13 stores the second multiplication result in the sixth clock cycle, and the corresponding reading indication signal obtained by the state control circuit 14 can then be 3; that is, the value of the reading indication signal can be determined by the number of data items stored in the register circuit 13.

In addition, the multiplication result obtained by the cumulative sub-circuit 112 is not itself the target operation result of the multiplier. The target operation result can be obtained by splicing the two operation results that the multiplier outputs in two passes: splicing the operation result output by the selection circuit 15 the first time with the operation result output the second time yields the target operation result of the multiplier. By analogy, for each multiplication, splicing the two operation results output by the selection circuit 15 yields the target operation result of that multiplication. In addition, the multiplying operational circuit 11 can also output one target operation result over multiple clock cycles.

It should be noted that the multiplier can, through the deposit control circuit 12, receive the multiplication result output by the cumulative sub-circuit 112 for each multiplication and, according to the received storage indication signal, determine the storage unit that stores each multiplication result. Optionally, the selection circuit 15 can determine, according to the different reading indication signals received, which data to read from the multiplication result stored in the register circuit 13. Optionally, if the bit width of the multiplier's input port is N and the received data bit width is also N, the bit width M of the multiplier's output port can equal 2N/t + deta, with (2N/t + deta) < 2N, where deta (deta ≥ 0) is a constant. Under normal conditions, the multiplying operational circuit 11 can complete one multiplication in t (t > 1) clock cycles, obtain one multiplication result, and store the multiplication result obtained by the cumulative sub-circuit 112 into the register circuit 13. There is also a low-probability case in which the multiplier can complete one multiplication in m (m < t, and m ≤ 1) clock cycles, obtain one multiplication result, and store the multiplication result obtained by the cumulative sub-circuit 112 into the register circuit 13. Optionally, the selection circuit 15 can read the data in the multiplication result stored in the register circuit 13 in two passes. The bit width of the multiplication result can equal 2N, and the bit width of the data read each time can equal N: in the two passes the selection circuit 15 can read the high N bits and the low N bits of the same multiplication result as two operation results and splice the two operation results to obtain the target operation result of the multiplication.
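
The relationship between the port widths and the two-pass readout can be sketched in Python as follows. The function names `output_port_width`, `read_half`, and `splice` are hypothetical, and requiring 2N to be divisible by t is an assumption added so that the width is an integer; the text does not state it.

```python
def output_port_width(n: int, t: int, deta: int = 0) -> int:
    """Output port width M = 2N/t + deta, which must stay below 2N."""
    # Assumption for this sketch: 2N is divisible by t.
    if t <= 1 or deta < 0 or (2 * n) % t != 0:
        raise ValueError("expects t > 1, deta >= 0 and 2N divisible by t")
    m = 2 * n // t + deta
    if m >= 2 * n:
        raise ValueError("M must be smaller than 2N")
    return m


def read_half(result: int, n: int, high: bool) -> int:
    """Read the high N bits or the low N bits of a 2N-bit multiplication result."""
    mask = (1 << n) - 1
    return (result >> n) & mask if high else result & mask


def splice(high_part: int, low_part: int, n: int) -> int:
    """Splice two N-bit operation results back into one 2N-bit value."""
    return (high_part << n) | low_part


result = 0xABCD                               # a 16-bit multiplication result, N = 8
high_bits = read_half(result, 8, high=True)   # first pass
low_bits = read_half(result, 8, high=False)   # second pass
target = splice(high_bits, low_bits, 8)
```

With N = 8, t = 2, and deta = 0 the output port is 8 bits wide, so a 16-bit result leaves the multiplier as two 8-bit reads.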

In addition, in this embodiment, the partial product acquiring unit 1112 can obtain the sign-extended partial product according to the initial partial product, and obtain the partial product of the target code according to the sign-extended partial product. Optionally, the bit width of the sign-extended partial product can equal 2 times the data bit width N received by the multiplier, and the bit width of the initial partial product can equal the data bit width N received by the multiplier. Optionally, the high N bits of the sign-extended partial product can all equal the most significant bit of the initial partial product, that is, its sign bit. In other words, the high N+1 bits of the sign-extended partial product equal the sign bit of the initial partial product, and the low N-1 bits equal the low N-1 bits of the initial partial product.

Illustratively, if the multiplier is currently processing an 8-bit × 8-bit fixed-point multiplication and the initial partial product obtained by the partial product acquiring unit 1112 is "p₇p₆p₅p₄p₃p₂p₁p₀", then sign bit extension of the initial partial product yields a sign-extended partial product that can be expressed as "p₇p₇p₇p₇p₇p₇p₇p₇p₇p₆p₅p₄p₃p₂p₁p₀".
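
The same sign bit extension can be written as a short Python sketch. The function name `sign_extend` is hypothetical; the sketch treats the initial partial product as an N-bit two's-complement pattern and copies its most significant bit into the upper N bits.

```python
def sign_extend(partial_product: int, n: int) -> int:
    """Extend an n-bit two's-complement pattern to 2n bits by copying its sign bit."""
    low_mask = (1 << n) - 1
    pattern = partial_product & low_mask
    sign_bit = (pattern >> (n - 1)) & 1
    if sign_bit:
        pattern |= low_mask << n   # fill the upper n bits with ones
    return pattern


negative = format(sign_extend(0b10110011, 8), "016b")
positive = format(sign_extend(0b00110011, 8), "016b")
```

Here `negative` is the 16-bit string whose upper nine bits all repeat the sign bit 1, matching the nine copies of p₇ in the example above, and `positive` keeps its upper bits at 0.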

It can also be understood that, in the distribution pattern of the partial products of all target codes, the partial product of each target code has a corresponding sign-extended partial product. The partial product of the first target code can be the first sign-extended partial product. Starting from the partial product of the second target code, each sign-extended partial product is shifted left by one bit relative to the partial product of the previous target code, and the most significant bit of the partial product of each target code is in the same column as the most significant bit of the partial product of the first target code. Equivalently, starting from the partial product of the second target code, after each sign-extended partial product is shifted left, the higher bits shifted beyond that column do not take part in the addition.
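
The distribution rule just described, in which every partial product is sign-extended to 2N bits, shifted left by its digit position, and cut off at the top column, can be checked with a small behavioral model in Python. The names `csd_digits` and `csd_multiply` are hypothetical, the recoding rule (non-adjacent form) is an assumption, and the accumulation is done with ordinary integer addition rather than with the Wallace tree of the circuit.

```python
def csd_digits(value: int) -> list[int]:
    """Signed digits in {-1, 0, 1}, least significant first (non-adjacent form)."""
    digits = []
    while value != 0:
        d = 2 - (value & 3) if value & 1 else 0
        digits.append(d)
        value = (value - d) >> 1
    return digits


def csd_multiply(multiplier: int, multiplicand: int, n: int) -> int:
    """Multiply two signed n-bit integers via shifted 2n-bit partial products."""
    mask = (1 << (2 * n)) - 1
    total = 0
    for position, digit in enumerate(csd_digits(multiplier)):
        if digit == 0:
            continue  # a zero digit yields no effective partial product
        # -X or +X as a 2n-bit two's-complement pattern (sign-extended).
        extended = (digit * multiplicand) & mask
        # Shift left by the digit position; bits beyond column 2n-1 are dropped.
        shifted = (extended << position) & mask
        total = (total + shifted) & mask
    # Interpret the 2n-bit pattern as a signed number.
    if total >> (2 * n - 1):
        total -= 1 << (2 * n)
    return total


product = csd_multiply(-7, 13, 8)
```

Dropping the bits shifted past the top column does not change the result because the exact product of two signed N-bit operands always fits in 2N bits.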

In the multiplier provided by this embodiment, the canonical signed number coding sub-circuit performs canonical signed number coding on the received data to obtain the target code and obtains the partial products of the target code according to the target code; the cumulative sub-circuit accumulates the sign-extended partial products to obtain the multiplication result; the state control circuit obtains the storage indication signal and the reading indication signal; the deposit control circuit determines, according to the storage indication signal, the register circuit that stores the multiplication result; the register circuit stores the multiplication result; and the selection circuit reads, according to the reading indication signal, data in the multiplication result stored in the register circuit to obtain the target operation result. By using the canonical signed number coding sub-circuit to code the received data, the multiplier reduces the number of effective partial products obtained during multiplication and thereby the complexity of the multiplication. The multiplier can also improve the operation efficiency of multiplication and effectively reduce its own power consumption.

A schematic diagram of a specific structure of a multiplier provided by an embodiment is given. The multiplier includes a multiplication circuit 21 and a conversion circuit 22. The multiplication circuit 21 includes a canonical signed digit encoding sub-circuit 211 and an accumulation sub-circuit 212; the output of the canonical signed digit encoding sub-circuit 211 is connected to the input of the accumulation sub-circuit 212, and the output of the accumulation sub-circuit 212 is connected to the input of the conversion circuit 22. The conversion circuit 22 includes a first conversion sub-circuit 221 and a second conversion sub-circuit 222. The canonical signed digit encoding sub-circuit 211 is configured to perform canonical signed digit encoding on the received data to obtain a target code and to obtain the partial products of the target code according to the target code; the accumulation sub-circuit 212 is configured to accumulate the partial products of the target code to obtain a multiplication result; the first conversion sub-circuit 221 and the second conversion sub-circuit 222 are each configured to perform format conversion on the multiplication result to obtain the target operation result.

Optionally, the canonical signed digit encoding sub-circuit 211 includes a canonical signed digit encoding unit 2111 and a partial product acquisition unit 2112. The canonical signed digit encoding unit 2111 is configured to receive first data and perform the canonical signed digit encoding on the first data to obtain the target code. The partial product acquisition unit 2112 is configured to receive second data, obtain original partial products according to the target code and the second data, and obtain the partial products of the target code according to the original partial products.

Specifically, the canonical signed digit encoding sub-circuit 211 can perform canonical signed digit encoding on the received data. The data may be the multiplier and the multiplicand of a multiplication, and the multiplier and the multiplicand may be fixed-point numbers of the same bit width. Optionally, the canonical signed digit encoding sub-circuit 211 may include multiple data processing sub-circuits with different functions. Each data processing sub-circuit may have one or more input ports, and the functions of these input ports may differ; each may also have an output port, and the functions of the output ports may differ; the circuit structures of data processing sub-circuits with different functions may also differ. Optionally, the conversion circuit 22 can convert the multiplication result output by the accumulation sub-circuit 212 into data in a target format, i.e. the target operation result. The multiplication result may be a fixed-point number, and the data in the target format may be a fixed-point number or a floating-point number. In addition, the bit width of the data in the target format may be less than 2 times the bit width of the multiplication result. Optionally, the target operation result may be part of the data in the multiplication result. Optionally, the bit width of the target operation result may equal 1/2 or 1/4 of the bit width of the multiplication result; this embodiment places no restriction on this. In this embodiment it may also be understood that the bit width of the target operation result is less than 2 times the bit width of the multiplication result. In addition, the multiplication result obtained by the accumulation sub-circuit 212 is not itself the target operation result of the multiplication; the target operation result is only part of the data in the multiplication result. Optionally, the number of digits in the target code may equal the number of partial products of the target code, and the target code may contain three values: -1, 0 and 1.

It should be noted that the canonical signed digit encoding sub-circuit 211 can perform multiplication on data of a fixed bit width, and the bit width of the data it receives may equal the bit width of the multiplier input port. In addition, in this embodiment, the bit width of the multiplier output port may be less than 2 times the bit width of the input port.

Optionally, the conversion circuit 22 includes an input port for receiving a data conversion signal. Optionally, the data conversion signal is used to determine the type of data conversion performed by the conversion circuit 22.

Optionally, there may be multiple data conversion signals, and different data conversion signals cause the conversion circuit 22 to convert the received data into data of the corresponding target format. Optionally, the data conversion types may include fixed-point to fixed-point and fixed-point to floating-point. For example, if the input port and the output port of the multiplier are both N bits wide, the multiplier obtains a multiplication result that is 2N bits wide, and the conversion circuit 22 can convert this 2N-bit multiplication result into an N-bit target operation result, which may be a floating-point number; alternatively, the conversion circuit 22 can convert the 2N-bit multiplication result into an N-bit fixed-point number as the target operation result. In this embodiment, the circuit structure and function of the canonical signed digit encoding sub-circuit 211 are the same as those of the canonical signed digit encoding sub-circuit 111, so the specific structure of the canonical signed digit encoding sub-circuit 211 is not repeated here.

In the multiplier provided in this embodiment, the canonical signed digit encoding sub-circuit performs canonical signed digit encoding on the received data, which reduces the number of valid partial products obtained during multiplication and thus lowers the complexity of implementing the multiplication; meanwhile, the multiplier can improve the efficiency of the multiplication and effectively reduce its power consumption.

In one embodiment, the canonical signed digit encoding unit 1111 may include a first data input port 1111a and a target code output port 1111b. The first data input port 1111a is configured to receive the first data on which canonical signed digit encoding is to be performed, and the target code output port 1111b is configured to output the target code obtained by performing canonical signed digit encoding on the first data.

Specifically, the first data received at the first data input port 1111a of the canonical signed digit encoding unit 1111 may be the multiplier of the multiplication, and this multiplier may be a fixed-point number. Optionally, the second data received by the partial product acquisition unit 1112 may be the multiplicand of the multiplication, and this multiplicand may be a fixed-point number; the multiplier and the multiplicand may have the same bit width. Optionally, the number of digits in the target code may equal both the number of original partial products and the number of partial products of the target code.

It should be noted that the canonical signed digit encoding can be described as follows. For an N-bit multiplier, processing proceeds from the low-order bits to the high-order bits. If a run of l (l ≥ 2) consecutive 1s exists, the l consecutive 1s are converted into the (l+1)-digit data $1\,(0)_{l-1}\,(-1)$, that is, a 1 followed by l-1 zeros and then a -1, and the remaining (N-l) bits are combined with the (l+1) converted digits to form new data. The new data then serves as the initial data of the next conversion stage, and this continues until the new data obtained after conversion contains no run of l (l ≥ 2) consecutive 1s. The target code obtained by encoding an N-bit multiplier may have a bit width of (N+1). Further, during the encoding, the data 11 can be rewritten as (100 - 001), so 11 is equivalent to 10(-1); the data 111 can be rewritten as (1000 - 0001), so 111 is equivalent to 100(-1); runs of any other length l (l ≥ 2) are converted in the same way.

For example, suppose the multiplier received by the canonical signed digit encoding unit 1111 is "001010101101110". The first conversion stage yields the first new data "0010101011100(-1)0"; the second stage, applied to the first new data, yields the second new data "0010101100(-1)00(-1)0"; the third stage yields the third new data "0010110(-1)00(-1)00(-1)0"; the fourth stage yields the fourth new data "00110(-1)0(-1)00(-1)00(-1)0"; and the fifth stage yields the fifth new data "010(-1)0(-1)0(-1)00(-1)00(-1)0". The fifth new data contains no run of l (l ≥ 2) consecutive 1s, so it may be called the intermediate code; once the intermediate code has been padded with one bit, the canonical signed digit encoding is complete. The bit width of the intermediate code may equal the bit width of the multiplier. Optionally, if the highest and second-highest digits of the new data (i.e. the intermediate code) obtained by the canonical signed digit encoding unit 1111 are "10" or "01", the unit pads a 0 one position above the highest digit of the intermediate code, so that the top three digits of the corresponding target code are "010" or "001" respectively. Optionally, the bit width of the intermediate code may equal the bit width of the target code minus 1.
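The run-replacement procedure described above can be sketched in Python. This is only an illustrative software model, not the circuit itself: the function names `csd_encode` and `csd_value` are hypothetical, the multiplier is assumed to be given as an unsigned bit string, and the staged conversion is collapsed into a single low-to-high pass that performs the same replacements.

```python
def csd_encode(bits: str) -> list:
    """Encode a bit string (MSB first) into digits -1/0/1 (MSB first).

    Assumption: `bits` is treated as an unsigned value. The result has
    len(bits) + 1 digits because one 0 is padded above the top bit.
    """
    digits = [int(b) for b in reversed(bits)] + [0]  # LSB first, plus pad digit
    i = 0
    while i < len(digits) - 1:
        if digits[i] == 1 and digits[i + 1] == 1:
            j = i
            while digits[j] == 1:       # find the end of the run of 1s
                j += 1
            digits[i] = -1              # lowest bit of the run becomes -1
            for k in range(i + 1, j):   # the rest of the run becomes 0
                digits[k] = 0
            digits[j] = 1               # a 1 moves into the next higher position
            i = j                       # that 1 may start a new run
        else:
            i += 1
    return digits[::-1]


def csd_value(digits) -> int:
    """Evaluate a signed-digit string (MSB first) as an integer."""
    value = 0
    for d in digits:
        value = 2 * value + d
    return value


print(csd_encode("001010101101110"))
```

For the 15-bit example in the text, the model returns the fifth new data with one 0 padded on top, and `csd_value` confirms that the encoded digits still represent the same number as the original bit string.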

It should be noted that the canonical signed digit encoding unit 1111 can output the target code through the target code output port 1111b. Optionally, the bit width of the target code may equal the bit width of the data received by the canonical signed digit encoding unit 1111, and the target code may contain three values: -1, 0 and 1. It may also be understood that the number of digits contained in the target code equals the bit width of the target code.

In the multiplier provided in this embodiment, the canonical signed digit encoding unit in the multiplication circuit performs canonical signed digit encoding on the received data to obtain the target code; the partial product acquisition unit obtains an original partial product for each digit of the target code and obtains the partial products of the target code from the original partial products; the accumulation sub-circuit accumulates the partial products of the target code to obtain the multiplication result; the state control circuit obtains a storage indication signal and a read indication signal; the register control circuit determines, according to the storage indication signal, the register circuit that stores the multiplication result, and the register circuit stores the multiplication result; meanwhile, the selection circuit reads, according to the read indication signal, data in the multiplication result stored in the register circuit to obtain the target operation result. Because the multiplier applies canonical signed digit encoding to the received data, the number of valid partial products obtained during multiplication is reduced, which lowers the complexity of implementing the multiplication; meanwhile, the multiplier can improve the efficiency of the multiplication and effectively reduce its power consumption.

In one embodiment, the partial product acquisition unit 1112 is specifically configured to convert the target code to obtain original partial products, perform sign extension on the original partial products to obtain sign-extended partial products, and obtain the partial products of the target code from the sign-extended partial products.

Specifically, the conversion can be described as converting each digit of the target code into an original partial product based on the multiplicand of the multiplication (denoted X). Optionally, each digit of the target code has a corresponding original partial product: if the digit is -1, the corresponding original partial product may be -X; if the digit is 1, it may be X; and if the digit is 0, it may be 0. Optionally, an original partial product is a partial product that has not been sign-extended, and its bit width may be the same as the bit width of the data currently processed by the multiplication circuit 11. Optionally, the bit width of a sign-extended partial product may equal 2 times the bit width N of the data processed by the multiplier, in which case the bit width of the original partial product may equal N. Optionally, the low N bits of a sign-extended partial product may equal the N bits of the original partial product, and each of the high N bits may equal the highest bit of the original partial product, i.e. its sign bit.
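The digit-to-partial-product mapping and the sign extension can be illustrated with a short Python sketch. It is a minimal model under stated assumptions: the function names are hypothetical, X is taken to be a two's complement value of N bits, and X is restricted so that -X also fits in N bits (the text does not say how the most negative value is handled).

```python
def original_partial_product(digit: int, x: int, n: int) -> int:
    """Return the n-bit two's complement pattern of digit * x.

    digit is one target code digit (-1, 0 or 1); x is the multiplicand.
    Assumption: x lies in the range where -x is still representable in n bits.
    """
    assert digit in (-1, 0, 1)
    assert -(1 << (n - 1)) < x < (1 << (n - 1))
    return (digit * x) & ((1 << n) - 1)


def sign_extend(pattern: int, n: int) -> int:
    """Extend an n-bit pattern to 2n bits by copying its sign bit upward."""
    sign = (pattern >> (n - 1)) & 1
    high = ((1 << n) - 1) if sign else 0   # high n bits all equal the sign bit
    return (high << n) | pattern           # low n bits are the original pattern


# Example with N = 8 and X = 5: digit -1 selects -X, digit 1 selects X.
for digit in (-1, 0, 1):
    pp = original_partial_product(digit, 5, 8)
    print(digit, format(pp, "08b"), format(sign_extend(pp, 8), "016b"))
```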

In addition, the partial product acquisition unit 1112 can obtain the partial products of the target code from all of the sign-extended partial products. In the distribution of the partial products of all target codes, the partial product of the first target code may equal the first sign-extended partial product. Starting from the partial product of the second target code, the most significant bit of each partial product lies in the same column as the most significant bit of the partial product of the first target code. The bit width of the partial product of each target code may equal the bit width of the partial product of the previous target code minus 1, or equivalently the bit width 2N of the corresponding sign-extended partial product minus (i-1), where i is the index of the partial product of the target code, counted from 1. The resulting distribution of the partial products of 9 target codes may be as shown in Fig. 3.

Optionally, the partial product acquisition unit 1112 includes a target code input port 1112a, a second data input port 1112b and a partial product output port 1112c. The target code input port 1112a is configured to receive the target code, the second data input port 1112b is configured to receive the second data, and the partial product output port 1112c is configured to output the partial products of the target code.

In this embodiment, the partial product acquisition unit 1112 receives, through the target code input port 1112a, the target code obtained by the canonical signed digit encoding unit 1111 and receives the second data through the second data input port 1112b. It then performs conversion and shifting according to the target code and the second data to obtain the partial products of the target code, and outputs them through the partial product output port 1112c.
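To make the conversion and shifting steps concrete, the following Python sketch sums the shifted, sign-extended partial products of a target code and checks that the result is the ordinary product. It is a behavioral model only; the function name and the LSB-first digit list are assumptions made for the illustration, and the target code digits are supplied by hand rather than produced by an encoder.

```python
def multiply_with_target_code(digits_lsb_first, x: int, n: int) -> int:
    """Accumulate shifted partial products and return the signed 2n-bit product.

    digits_lsb_first: target code digits (-1, 0 or 1), lowest digit first.
    x: multiplicand, assumed to be an n-bit two's complement value.
    """
    width = 2 * n
    mask = (1 << width) - 1
    total = 0
    for i, d in enumerate(digits_lsb_first):
        partial = (d * x) & mask        # sign-extended partial product (-x, 0 or x)
        total += (partial << i) & mask  # shift left by i; bits past column 2n are dropped
    total &= mask
    # Interpret the 2n-bit pattern as a signed value.
    return total - (1 << width) if total >> (width - 1) else total


# 6 = 0110 in binary, which becomes 0 1 0 (-1) 0 as a target code (8 - 2 = 6).
target_code_for_6 = [0, -1, 0, 1, 0]    # lowest digit first
print(multiply_with_target_code(target_code_for_6, -7, 4))  # -42
```

Only two of the five digits are non-zero here, so only two partial products contribute to the sum, which is the saving the encoding is meant to provide.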

In the multiplier provided in this embodiment, the number of valid partial products obtained is small, which lowers the complexity of implementing the multiplication; meanwhile, the multiplier can improve the efficiency of the multiplication and effectively reduce its power consumption.

In a multiplier provided by another embodiment, the multiplier includes the accumulation sub-circuit 112, which includes a Wallace tree group unit 1121 and an accumulation unit 1122. The output of the Wallace tree group unit 1121 is connected to the input of the accumulation unit 1122. The Wallace tree group unit 1121 is configured to accumulate the partial products of the target code to obtain an accumulation result, and the accumulation unit 1122 is configured to accumulate that accumulation result.

Specifically, the Wallace tree group unit 1121 accumulates the values in the partial products of all target codes obtained by the partial product acquisition unit 1112 to obtain an accumulation result, and the accumulation unit 1122 accumulates the accumulation result obtained by the Wallace tree group unit 1121 to obtain the target operation result.

In the multiplier provided in this embodiment, the Wallace tree group unit accumulates the partial products of the target code, the accumulation unit accumulates the accumulation result to obtain the multiplication result, and the target operation result is obtained from the multiplication result. This keeps the number of valid partial products small and lowers the complexity of implementing the multiplication; meanwhile, the multiplier can improve the efficiency of the multiplication and effectively reduce its power consumption.

In a multiplier provided by another embodiment, the Wallace tree group unit 1121 includes Wallace tree subunits 1121_1 to 1121_n, and the multiple Wallace tree subunits 1121_1 to 1121_n are configured to accumulate the values in each column of the partial products of all target codes.

Specifically, the circuit structure of the Wallace tree subunits 1121_1 to 1121_n can be implemented by a combination of full adders and half adders. A Wallace tree subunit can also be understood as a circuit that processes a multi-bit input signal by adding the input bits together to produce two output signals. Optionally, the number n of Wallace tree subunits in the Wallace tree group unit 1121 may equal 2 times the bit width of the data currently processed by the multiplication circuit 11; the n Wallace tree subunits can process the partial products of the target code in parallel, although they are connected in series. Optionally, each Wallace tree subunit in the Wallace tree group unit 1121 adds the values in one column of the partial products of all target codes and outputs two signals, namely a carry signal $Carry_i$ and a sum-bit signal $Sum_i$, where i is the number of the Wallace tree subunit and the first Wallace tree subunit is numbered 1. Optionally, the number of input signals received by each Wallace tree subunit may equal the number of digits in the target code, or equivalently the number of sign-extended partial products.

In addition, the signals of each Wallace tree subunit in the Wallace tree group unit 1121 may include a carry input signal $Cin_i$, partial product input signals and a carry output signal $Cout_i$. Optionally, the partial product input signals received by each Wallace tree subunit may be the values in one column of the partial products of all target codes, and the number of bits of the carry output signal $Cout_i$ of each Wallace tree subunit may be $N_{Cout} = \lfloor (N_I + N_{Cin})/2 \rfloor - 1$, where $N_I$ is the number of partial product input signals of the Wallace tree subunit, $N_{Cin}$ is the number of its carry input signals, $N_{Cout}$ is the minimum number of its carry output signals, and $\lfloor \cdot \rfloor$ denotes rounding down. Optionally, the carry input signals received by each Wallace tree subunit in the Wallace tree group unit 1121 may be the carry output signals of the preceding Wallace tree subunit, and the carry input signals received by the first Wallace tree subunit may be 0. The first Wallace tree subunit may nevertheless have the same number of carry signal input ports as the other Wallace tree subunits.
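The carry-count formula can be checked against a simple functional model of one column compressor built from full adders and a half adder. This is a hedged sketch, not the patented subunit: the function names are hypothetical, the order in which bits are fed to the adders is an arbitrary choice, and no attempt is made to model gate delay or the fixed number of carry ports mentioned in the text.

```python
def full_adder(a: int, b: int, c: int):
    """Return (sum bit, carry bit) of three input bits."""
    return a ^ b ^ c, (a & b) | (b & c) | (a & c)


def wallace_column(pp_bits, cin_bits):
    """Compress one column of bits.

    Returns (sum_bit, carry_bit, cout_bits): sum_bit stays in this column,
    while carry_bit and every bit in cout_bits have the weight of the next column.
    """
    bits = list(pp_bits) + list(cin_bits)
    carries = []
    while len(bits) > 1:
        a, b = bits.pop(), bits.pop()
        c = bits.pop() if bits else 0   # with two bits left this acts as a half adder
        s, carry = full_adder(a, b, c)
        bits.append(s)
        carries.append(carry)
    sum_bit = bits[0] if bits else 0
    carry_bit = carries.pop() if carries else 0   # the subunit's Carry output
    return sum_bit, carry_bit, carries            # remaining carries go to the next subunit


pp = [1, 0, 1, 1, 0, 1, 1, 1, 0]   # 9 partial product bits in one column
cin = [1, 1, 0, 1, 0, 0, 1]        # 7 carry inputs from the previous column
s, c, couts = wallace_column(pp, cin)
print(s, c, len(couts))            # len(couts) equals floor((9 + 7) / 2) - 1
```

In this model every adder produces exactly one carry, so a column with $N_I + N_{Cin}$ input bits produces $\lfloor (N_I + N_{Cin})/2 \rfloor$ carries; one of them is the Carry output and the rest are the carry outputs passed to the next column, which matches the formula above.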

For example, if the multiplication circuit 11 is currently performing an 8 × 8 multiplication, the sign-extended partial products obtained by the partial product acquisition unit 1112 are $p_{i9}\,p_{i9}\,p_{i9}\,p_{i9}\,p_{i9}\,p_{i9}\,p_{i9}\,p_{i9}\,p_{i8}\,p_{i7}\,p_{i6}\,p_{i5}\,p_{i4}\,p_{i3}\,p_{i2}\,p_{i1}$ ($i = 1, \dots, n$, with $n = 9$), where i denotes the i-th sign-extended partial product. The partial products of 9 target codes are obtained from the 9 sign-extended partial products, and these 9 partial products are accumulated. Optionally, the distribution of the partial products of the 9 target codes may be as shown in Fig. 3, where each dot represents one bit of a sign-extended partial product. In this distribution, the partial product of each target code has a corresponding sign-extended partial product, and the partial product of the first target code may be the first sign-extended partial product. Starting from the partial product of the second target code, the corresponding sign-extended partial product is shifted left by one bit relative to the partial product of the previous target code, and the most significant bit of the partial product of each target code lies in the same column as the most significant bit of the partial product of the first target code. Equivalently, starting from the partial product of the second target code, after each sign-extended partial product is shifted left, the higher-order bits shifted beyond that column do not take part in the addition. Counting from the rightmost column to the leftmost column, 16 Wallace tree subunits are needed in total to accumulate the partial products of the 9 target codes. The connection circuit of the 16 Wallace tree subunits is shown in Fig. 4, where Wallace_i denotes a Wallace tree subunit and i is its number, counted from 1. A solid line between two adjacent Wallace tree subunits indicates that the Wallace tree subunit with the higher number has a carry output signal, and a dotted line indicates that it has no carry output signal.

In the multiplier provided in this embodiment, the number of valid partial products obtained is small, which lowers the complexity of implementing the multiplication; meanwhile, the multiplier can improve the efficiency of the multiplication and effectively reduce its power consumption.

In one embodiment, the accumulation unit 1122 in the multiplier includes an adder, and the adder is configured to add the corrected accumulation result that it receives.

Specifically, the adder may be an adder of any of several bit widths, and it may be a carry-lookahead adder. Optionally, the adder receives the two signals output by the modified Wallace tree group unit 1121, adds these two signals, and outputs the multiplication result.

Optionally, the adder includes a carry signal input port, a sum-bit signal input port and a result output port. The carry signal input port is configured to receive the carry signal, the sum-bit signal input port is configured to receive the sum-bit signal, and the result output port is configured to output the result of adding the carry signal and the sum-bit signal.

Specifically, the adder receives, through the carry signal input port, the carry signal Carry output by the modified Wallace tree group unit 1121 and receives, through the sum-bit signal input port, the sum-bit signal Sum output by the modified Wallace tree group unit 1121. It adds the carry signal Carry and the sum-bit signal Sum and outputs the result through the result output port.

It should be noted that, during multiplication, the multiplication circuit 11 can use adders of different bit widths to add the carry output signal Carry and the sum-bit output signal Sum output by the modified Wallace tree group unit 1121, where the bit width of the data the adder can process may equal 2 times the bit width N of the data currently processed by the multiplier. Optionally, each Wallace tree subunit in the modified Wallace tree group unit 1121 outputs one carry output signal $Carry_i$ and one sum-bit output signal $Sum_i$ ($i = 0, \dots, 2N-1$, where i is the number of each Wallace tree subunit, counted from 0). Optionally, the carry signal received by the adder is $Carry = \{[Carry_0 : Carry_{2N-2}],\ 0\}$. That is, the carry output signal Carry received by the adder is 2N bits wide; its first 2N-1 bits correspond to the carry output signals of the first 2N-1 Wallace tree subunits in the modified Wallace tree group unit 1121, and its last bit is filled with the value 0. Optionally, the sum-bit output signal Sum received by the adder is 2N bits wide, and its bits equal the sum-bit output signals of the Wallace tree subunits in the modified Wallace tree group unit 1121.

For example, if the multiplication circuit 11 is currently performing an 8 × 8 multiplication, the adder may be a 16-bit carry-lookahead adder. Referring again to Fig. 4, the modified Wallace tree group unit 1121 outputs the sum-bit output signals Sum and the carry output signals Carry of the 16 Wallace tree subunits. The sum-bit signal received by the 16-bit carry-lookahead adder is the complete sum-bit signal Sum output by the modified Wallace tree group unit 1121, whereas the carry signal it receives is the carry signal Carry formed by combining the value 0 with all carry output signals of the modified Wallace tree group unit 1121 except the one output by the last Wallace tree subunit.
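The final addition of Sum and Carry can be sketched as follows. This is an illustrative model under two assumptions that the text does not state outright: the 0 that fills Carry is taken to sit in the least significant position, so that each $Carry_i$ is added one column above $Sum_i$, and the per-column signals are produced here by a plain three-operand carry-save step rather than by the Wallace tree group unit. All function names are hypothetical.

```python
def carry_save(a: int, b: int, c: int, width: int):
    """Compress three words into per-column sum and carry bits (LSB first)."""
    majority = (a & b) | (b & c) | (a & c)
    sum_bits = [((a ^ b ^ c) >> i) & 1 for i in range(width)]
    carry_bits = [(majority >> i) & 1 for i in range(width)]
    return sum_bits, carry_bits


def final_add(sum_bits, carry_bits) -> int:
    """Add Sum and the shifted Carry, keeping only `width` result bits.

    The carry of the last column is dropped and a 0 enters at the low end,
    mirroring Carry = {[Carry_0 : Carry_(2N-2)], 0}.
    """
    width = len(sum_bits)
    assert len(carry_bits) == width
    sum_word = sum(bit << i for i, bit in enumerate(sum_bits))
    carry_word = sum(bit << (i + 1) for i, bit in enumerate(carry_bits[:-1]))
    return (sum_word + carry_word) & ((1 << width) - 1)


# 16-bit example, matching the adder width of the 8 x 8 case.
s_bits, c_bits = carry_save(40000, 30000, 1234, 16)
print(final_add(s_bits, c_bits) == (40000 + 30000 + 1234) % (1 << 16))  # True
```

Dropping the carry of the last column is harmless in this model because its weight lies outside the 2N-bit result.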

In the multiplier provided in this embodiment, the number of valid partial products obtained is small, which lowers the complexity of the multiplication, improves its efficiency, and effectively reduces the power consumption of the multiplier.

In one embodiment, the multiplier includes the register circuit 13, which includes register sub-circuits 131; the register sub-circuits 131 are configured to store the multiplication results corresponding to different storage indication signals.

Specifically, the register circuit 13 may include two or more register sub-circuits 131. It may also be understood that the number of register sub-circuits 131 in the register circuit 13 may equal $2N_{in}/N_{out}$, where $N_{in}$ is the bit width of the data received by the multiplier and $N_{out}$ ($N_{out} < 2N_{in}$) is the bit width of the data output by the multiplier. Optionally, the bit width of the data stored in a register sub-circuit 131 may equal 2 times the bit width of the multiplier input port. Optionally, the bit width of the data received by the multiplier may equal the bit width of the multiplier input port, and the bit width of the data output by the multiplier may equal the bit width of the multiplier input port or may otherwise be less than 2 times the bit width of the multiplier input port. For example, if the input port and the output port of the multiplier are both N bits wide, the register circuit 13 is composed of two register sub-circuits 131; if the input port is N bits wide and the output port is N/2 bits wide, the register circuit 13 is composed of four register sub-circuits 131. Optionally, according to the storage indication signal, the multiplier stores the multiplication result of each multiplication in the corresponding one of the $2N_{in}/N_{out}$ register sub-circuits 131, where different storage indication signals correspond to different register sub-circuits 131. Optionally, each multiplication result obtained by the multiplier can be stored only in the register sub-circuit 131 that corresponds to the storage indication signal, and cannot be stored in other register sub-circuits 131 that do not correspond to the storage indication signal.

For example, if the register circuit 13 contains n register sub-circuits 131 numbered 1, 2, 3, ..., n, the first multiplication result obtained by the multiplier may be stored in register sub-circuit 131 No. 1, in which case the value of the storage indication signal may be 1, and the second multiplication result may be stored in register sub-circuit 131 No. 2, in which case the value of the storage indication signal may be 2. It may also be understood that when the value of the storage indication signal is odd, the number of the register sub-circuit 131 that stores the multiplication result is also odd, and when the value is even, the number is also even; the value of the storage indication signal may equal the number of the register sub-circuit 131 that stores the corresponding multiplication result.
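The relationship between port widths, the number of register sub-circuits and the storage indication signal can be modeled in a few lines of Python. This is a rough behavioral sketch, not the circuit: the class and function names are hypothetical, the sub-circuits are modeled as list slots, and it is assumed here that $2N_{in}$ is an exact multiple of $N_{out}$.

```python
def num_register_subcircuits(n_in: int, n_out: int) -> int:
    """Number of register sub-circuits, 2 * N_in / N_out."""
    assert n_out < 2 * n_in, "output width must be less than twice the input width"
    assert (2 * n_in) % n_out == 0, "assumed: 2 * N_in is a multiple of N_out"
    return (2 * n_in) // n_out


class RegisterCircuit:
    """Stores each multiplication result in the slot named by the indication signal."""

    def __init__(self, n_in: int, n_out: int):
        self.slots = [None] * num_register_subcircuits(n_in, n_out)

    def store(self, indication: int, result: int) -> None:
        # Sub-circuits are numbered from 1, so signal k selects slot k - 1.
        self.slots[indication - 1] = result


regs = RegisterCircuit(n_in=8, n_out=4)   # four sub-circuits
regs.store(1, 0x1234)                     # first result goes to sub-circuit No. 1
regs.store(2, 0x00FF)                     # second result goes to sub-circuit No. 2
print(regs.slots)
```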

In the multiplier provided in this embodiment, the multiplication result of each multiplication is stored in a different register sub-circuit according to the storage indication signal, and data in the multiplication result stored in the corresponding register sub-circuit is then output according to the read indication signal. In this way, a multiplier whose output port bit width is not 2 times its input port bit width can output the target operation result. Meanwhile, the number of valid partial products obtained by the multiplier is small, which lowers the complexity of implementing the multiplication.

In a multiplier provided by another embodiment, the multiplier includes the cumulative sub-circuit 212, and the cumulative sub-circuit 212 includes a Wallace tree group unit 2121 and summing elements 2122. The output end of the Wallace tree group unit 2121 is connected to the input end of the summing elements 2122. The Wallace tree group unit 2121 is used to accumulate the partial products of the target code to obtain an accumulation result, and the summing elements 2122 are used to accumulate the accumulation result to obtain the target operation result.

Specifically, the Wallace tree group unit 2121 may accumulate the values in all partial products of the target code obtained by the partial product acquiring unit 2112 to obtain an accumulation result, and the summing elements 2122 accumulate the accumulation result obtained by the Wallace tree group unit 2121 to obtain the target operation result.

Optionally, the multiplier includes the Wallace tree group unit 2121, and the Wallace tree group unit 2121 includes Wallace tree subunits 2121_1 to 2121_n. The Wallace tree subunits 2121_1 to 2121_n are used to accumulate each column of values in all partial products of the target code.

In this embodiment, the circuit structure and function of the Wallace tree group unit 2121 may be the same as those of the Wallace tree group unit 1121, so the specific structure of the Wallace tree group unit 2121 is not repeated here.

The multiplier provided in this embodiment can accumulate the partial products of the target code through the Wallace tree group unit, accumulate the result through the summing elements to obtain the multiplication result, and obtain the target operation result from the multiplication result. This keeps the number of valid partial products obtained by the multiplier small and reduces the complexity of implementing the multiplication operation. The multiplier can also improve the efficiency of the multiplication operation and effectively reduce its own power consumption.

In one embodiment, the multiplier includes the summing elements 2122, and the summing elements 2122 include an adder. The adder is used to perform an addition operation on the accumulation result.

Specifically, the adder may be an adder of any of several bit widths, and it may be a carry lookahead adder. Optionally, the adder may receive the two signals output by the Wallace tree group unit 2121, add the two signals, and output the multiplication result.

The multiplier provided in this embodiment can accumulate the two signals output by the Wallace tree group unit through the summing elements, output the multiplication result, and obtain the target operation result from the multiplication result. This keeps the number of valid partial products obtained by the multiplier small and reduces the complexity of implementing the multiplication operation. The multiplier can also improve the efficiency of the multiplication operation and effectively reduce its own power consumption.

In one embodiment, the multiplier includes the adder, and the adder includes a carry signal input port, a sum-bit signal input port and a result output port. The carry signal input port is used to receive the carry signal, the sum-bit signal input port is used to receive the sum-bit signal, and the result output port is used to output the multiplication result obtained by accumulating the carry signal and the sum-bit signal.

Specifically, the adder may receive the carry signal Carry output by the Wallace tree group unit 2121 through the carry signal input port, receive the sum-bit signal Sum output by the Wallace tree group unit 2121 through the sum-bit signal input port, and output the multiplication result obtained by adding the carry signal Carry and the sum-bit signal Sum through the result output port.

It should be noted that, during a multiplication operation, the multiplying operational circuit 21 may use adders of different bit widths to add the carry output signal Carry and the sum-bit output signal Sum output by the Wallace tree group unit 2121, where the bit width of the data the adder can process may equal 2 times the data bit width N currently processed by the multiplier. Optionally, each Wallace tree subunit in the Wallace tree group unit 2121 may output one carry output signal Carry_i and one sum-bit output signal Sum_i (i = 0, ..., 2N-1, where i is the number of the Wallace tree subunit, counted from 0). Optionally, the signal received by the adder is Carry = {[Carry_0 : Carry_(2N-2)], 0}. That is, the carry output signal Carry received by the adder is 2N bits wide; its first 2N-1 bits correspond to the carry output signals of the first 2N-1 Wallace tree subunits in the Wallace tree group unit 2121, and its last bit may be replaced with the value 0. Optionally, the sum-bit output signal Sum received by the adder is 2N bits wide, and the values in Sum may equal the sum-bit output signals of the individual Wallace tree subunits in the Wallace tree group unit 2121.

Illustratively, if the multiplying operational circuit 11 is currently processing an 8-bit by 8-bit multiplication, the adder may be a 16-bit carry lookahead adder. Referring again to Fig. 4, the Wallace tree group unit 2121 may output the sum-bit output signal Sum and the carry output signal Carry of 16 Wallace tree subunits. The sum-bit signal received by the 16-bit carry lookahead adder may be the complete sum-bit signal Sum output by the Wallace tree group unit 2121, whereas the carry signal it receives may be the signal Carry formed by combining a 0 with all the carry output signals of the Wallace tree group unit 2121 except the one output by the last Wallace tree subunit.
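The way the adder combines Sum with the shifted Carry can be sketched as follows. This is an illustration under stated assumptions, not the patented circuit: a single layer of full adders over three operands stands in for the Wallace tree group unit, and the names `carry_save_3to2` and `final_add` are hypothetical.

```python
def carry_save_3to2(a: int, b: int, c: int, width: int):
    """One full adder per column: returns (Sum, Carry), one bit per column."""
    mask = (1 << width) - 1
    sum_bits = (a ^ b ^ c) & mask                        # Sum_i of column i
    carry_bits = ((a & b) | (a & c) | (b & c)) & mask    # Carry_i of column i
    return sum_bits, carry_bits


def final_add(sum_bits: int, carry_bits: int, width: int) -> int:
    """Add Sum to {Carry_0 .. Carry_(width-2), 0}, as the adder is described."""
    mask = (1 << width) - 1
    # Shifting left by one drops the carry of the last column and places 0
    # in the lowest bit, because a column's carry weighs twice its own column.
    shifted_carry = (carry_bits << 1) & mask
    return (sum_bits + shifted_carry) & mask
```

For a 16-bit adder, `final_add(*carry_save_3to2(a, b, c, 16), 16)` equals `(a + b + c)` modulo 2^16, which is why the carry of the last column can be discarded.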

The multiplier provided in this embodiment can accumulate the two signals output by the Wallace tree group unit through the summing elements, output the multiplication result, and obtain the target operation result from the multiplication result. This keeps the number of valid partial products obtained by the multiplier small and reduces the complexity of implementing the multiplication operation. The multiplier can also improve the efficiency of the multiplication operation and effectively reduce its own power consumption.

A kind of multiplier that another embodiment provides, the multiplier include first conversion sub-circuit 221 and described Second conversion sub-circuit 222, first conversion sub-circuit 221 are specifically used for the multiplication result being converted into floating-point class The target operation result of type, it is fixed that second conversion sub-circuit 222 is specifically used for for the multiplication result being converted into The target operation result of vertex type.

Specifically, the bit wide of above-mentioned multiplication result can be equal to 2 times of the data bit width that multiplier receives, floating-point The bit wide of type operation result and the bit wide of fixed point type operation result can be equal to the bit wide of multiplier outputs mouth, and In revolution circuit 22, the bit wide of the operation result of floating point type can be equal to the bit wide of the operation result of fixed point type.

It should be noted that first conversion sub-circuit 221 and the second conversion sub-circuit 222 do not have in revolution circuit 22 Any connection relationship, the two is mutually indepedent, and each time when multiplying, revolution circuit 22 only needs to use the first conversion sub-circuit 221 or second conversion sub-circuit 222 carry out the processing of data revolution, obtain target operation result.Optionally, revolution circuit 22 It can determine that this multiplying is needed through the first conversion sub-circuit 221 or according to the data conversion signal received Two conversion sub-circuits 222 carry out the processing of data revolution.

Optionally, data conversion signal may include two kinds of signals, can be expressed as 00,01 with binary numeral respectively, Wherein, it may include the data that receive of revolution circuit 22 is determining for 2N bit bit wide that data conversion signal, which is the signal of 00 characterization, The fixed-point number of the 2N bit bit wide is needed to be converted into the fixed-point number of N-bit bit wide, and conversion postfixed point number decimal point by points Position, wherein the position of the fixed-point number decimal point of 2N bit bit wide can be determining before converting；Data conversion signal is 01 The signal of characterization may include the fixed-point number that the multiplication result that receives of revolution circuit 22 is 2N bit bit wide, by the 2N ratio The fixed-point number of special bit wide needs to be converted into the floating number of N-bit bit wide.Optionally, revolution circuit 22 can be according to two received The different data conversion signal of kind is transported the multiplication received by the first conversion sub-circuit 221 or the second conversion sub-circuit 222 It calculates result and carries out different revolution processing, specific implementation is accomplished in that

(1) If the data conversion signal received by the revolution circuit 22 is 00, the revolution circuit 22 may convert the fixed-point number of 2N-bit width into a fixed-point number of N-bit width. In this case, the revolution circuit 22 may convert the received 2N-bit fixed-point number through the second conversion sub-circuit 222. Specifically, during conversion, the position of the decimal point of the target N-bit fixed-point number is aligned with the position of the decimal point of the 2N-bit fixed-point number before conversion, and a total of N bits around the decimal point of the 2N-bit fixed-point number are then extracted to obtain the converted N-bit fixed-point number. The extraction falls into three cases:

Case a: when the N bits to be extracted are all contained in the 2N-bit fixed-point number before conversion, the second conversion sub-circuit 222 may directly extract the N bits around the decimal point from the 2N-bit fixed-point number before conversion;

Case b: when only part of the N bits to be extracted is contained in the 2N-bit fixed-point number before conversion, and the high-order part of the N bits to be extracted has no corresponding bits in the 2N-bit fixed-point number, the second conversion sub-circuit 222 may pad these bits with the sign bit of the 2N-bit fixed-point number before conversion, and then extract the N bits from the padded fixed-point number;

Case c: when only part of the N bits to be extracted is contained in the 2N-bit fixed-point number before conversion, and the low-order part of the N bits to be extracted has no corresponding bits in the 2N-bit fixed-point number, the second conversion sub-circuit 222 may pad these bits according to the sign of the 2N-bit fixed-point number before conversion: if that number is positive, the bits may be padded with the value 0, otherwise with the value 1. The N bits are then extracted from the padded fixed-point number;

(2) If the data conversion signal received by the revolution circuit 22 is 01, the revolution circuit 22 may convert the fixed-point number of 2N-bit width into a floating-point number of N-bit width. In this case, the revolution circuit 22 may convert the received 2N-bit fixed-point number through the first conversion sub-circuit 221. Specifically, during conversion, the highest bit of the fixed-point number (i.e. the sign bit) may be used as the sign bit of the converted floating-point number. In addition, if the 2N-bit fixed-point number before conversion is positive, the highest bit (the sign bit) is removed and the remaining 2N-1 bits are searched from the highest bit toward the lowest bit; when a value 1 is found, the number m of bits following that 1 is counted, and the exponent of the converted floating-point number may equal m plus the exponent offset i, minus the position of the decimal point of the 2N-bit fixed-point number before conversion. If the 2N-bit fixed-point number before conversion is negative, the highest bit (the sign bit) is removed and the remaining 2N-1 bits are searched from the highest bit toward the lowest bit; when a value 0 is found, the number m of bits following that 0 is counted. Furthermore, the high n bits of the m bits are taken as the mantissa of the converted floating-point number: if m >= n, n bits can be taken directly as the mantissa; if m < n, n-m bits equal to the highest bit (i.e. the sign bit) may be appended after the 2N-bit fixed-point number before conversion.

Illustratively, if the fixed-point number of 2N-bit width needs to be converted into a floating-point number of 16-bit width, i may equal 16 and n may equal 10; if it needs to be converted into a floating-point number of 32-bit width, i may equal 127 and n may equal 23; if it needs to be converted into a floating-point number of 64-bit width, i may equal 1023 and n may equal 52.
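The fixed-point to floating-point conversion of item (2) can be sketched as below, using the 32-bit parameters from the example (i = 127, n = 23). Several points are assumptions of this sketch rather than statements of the text: negative inputs are handled by taking the magnitude first instead of searching for the leading 0, the mantissa is truncated rather than rounded, the exponent field is assumed to be 8 bits wide, and the function names are hypothetical.

```python
import struct


def fixed_to_float_bits(raw: int, total_bits: int, frac_bits: int,
                        bias: int = 127, n: int = 23, exp_bits: int = 8) -> int:
    """Pack a two's-complement fixed-point value into a float bit pattern."""
    mask = (1 << total_bits) - 1
    raw &= mask
    sign = raw >> (total_bits - 1)               # highest bit is the sign bit
    # Assumption: use the magnitude for negative inputs (a simplification).
    magnitude = ((-raw) & mask) if sign else raw
    if magnitude == 0:
        return 0
    m = magnitude.bit_length() - 1               # bits after the leading 1
    exponent = m + bias - frac_bits              # m + offset i - point position
    trailing = magnitude & ((1 << m) - 1)        # the m bits after the leading 1
    if m >= n:
        mantissa = trailing >> (m - n)           # keep the high n bits
    else:
        mantissa = trailing << (n - m)           # pad n - m low bits with 0
    # Exponent overflow or underflow is not checked in this sketch.
    return (sign << (exp_bits + n)) | (exponent << n) | mantissa


def bits_to_float32(bits: int) -> float:
    """Reinterpret a 32-bit pattern as a single-precision float for checking."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]
```

For example, the 16-bit pattern 0x0180 with 8 fractional bits represents 1.5, and `bits_to_float32(fixed_to_float_bits(0x0180, 16, 8))` decodes back to 1.5.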

The multiplier provided in this embodiment can use the revolution circuit to convert the multiplication result into data whose bit width equals the bit width of the multiplier output port and then output the target operation result. The bit width of the target operation result can therefore be less than 2 times the bit width of the data input to the multiplier, which effectively reduces the bit width the multiplier requires of its input and output ports. In addition, the multiplier obtains fewer valid partial products, which reduces the complexity of implementing the multiplication operation.
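The three extraction cases a, b and c of the fixed-point to fixed-point conversion can be sketched as a single bit-selection loop. The function name and the convention of describing the decimal point position as a count of fractional bits are assumptions for illustration; the padding values follow the text (sign bit for missing high bits, 0 or 1 by sign for missing low bits).

```python
def narrow_fixed(raw: int, src_bits: int, src_frac: int,
                 dst_bits: int, dst_frac: int) -> int:
    """Extract dst_bits around the decimal point of a src_bits fixed-point value."""
    raw &= (1 << src_bits) - 1
    sign = (raw >> (src_bits - 1)) & 1
    low = src_frac - dst_frac        # source bit index of the lowest target bit
    out = 0
    for k in range(dst_bits):
        idx = low + k
        if idx >= src_bits:
            bit = sign               # case b: missing high bits, pad with sign
        elif idx < 0:
            bit = sign               # case c: pad with 0 if positive, else 1
        else:
            bit = (raw >> idx) & 1   # case a: the bit exists in the source
        out |= bit << k
    return out
```

For instance, `narrow_fixed(0x0180, 16, 8, 8, 4)` falls under case a and returns 0x18, which is 1.5 with 4 fractional bits.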

Fig. 5 is a flow diagram of the data processing method provided by an embodiment. The method may be performed by the multiplier shown in Fig. 1, and this embodiment relates to the process of operating on data. As shown in Fig. 5, the method includes:

S101, receiving data to be processed.

Specifically, the canonical signed number coding sub-circuit in the multiplier may receive two pieces of data to be processed. Optionally, the canonical signed number coding sub-circuit may process two pieces of data of a fixed bit width, and the fixed bit width may equal the bit width of the multiplier input port. Optionally, the data to be processed received by the canonical signed number coding sub-circuit may be fixed-point numbers, and the bit width of each fixed-point number may equal the bit width of the multiplier input port.

S102, canonical signed number coded treatment is carried out to the pending data, obtains the partial product of target code.

Specifically, the canonical signed number coding may be characterized as follows. An N-bit multiplier is processed from its low-order bits to its high-order bits. If there is a run of l (l >= 2) consecutive bits of value 1, the l consecutive 1s are converted to the (l+1)-digit data "1 (0)_(l-1) (-1)", that is, a 1 followed by l-1 zeros followed by -1, and the remaining (N-l) bits are combined with the converted (l+1) digits to obtain new data. The new data is then used as the initial data of the next conversion stage, and this continues until the new data obtained after conversion contains no run of l (l >= 2) consecutive 1s. When canonical signed number coding is applied to an N-bit multiplier, the bit width of the resulting target code may equal (N+1). It should be noted that the number of partial products of the target code may equal the bit width N of the data received by the multiplier plus 1.
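The run-replacement rule described above can be sketched as a single low-to-high scan. This is an illustration, assuming the N-bit multiplier is treated as an unsigned bit pattern and the digits are stored least significant first; the function name is hypothetical.

```python
def csd_encode(multiplier: int, n: int) -> list:
    """Return N+1 digits in {-1, 0, 1}, least significant digit first."""
    # One extra digit on top receives the 1 produced by a run ending at bit N-1.
    digits = [(multiplier >> k) & 1 for k in range(n)] + [0]
    i = 0
    while i < n:
        if digits[i] == 1 and digits[i + 1] == 1:
            j = i
            while digits[j] == 1:        # find the end of the run of 1s
                j += 1
            digits[i] = -1               # lowest digit of the run becomes -1
            for k in range(i + 1, j):
                digits[k] = 0            # the l-1 digits in between become 0
            digits[j] = 1                # the digit above the run becomes 1
            i = j                        # the new 1 may start another run
        else:
            i += 1
    return digits
```

For the 8-bit multiplier 0b01110111 (decimal 119), the sketch returns `[-1, 0, 0, -1, 0, 0, 0, 1, 0]`, that is 128 - 8 - 1, so only three of the nine digits are non-zero.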

S103, accumulation process is carried out to the partial product of the target code, obtains multiplication result.

Specifically, the cumulative sub-circuit may accumulate each column of values in all partial products of the target code to obtain the multiplication result. Optionally, the bit width of the multiplication result may equal 2 times the bit width of the data received by the multiplier, or 2 times the bit width of the multiplier input port.
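The arithmetic effect of the accumulation step can be sketched as follows. The sketch sums the partial products row by row rather than column by column, which gives the same total; weighting the partial product of digit k by 2^k is an assumption of this illustration, as are the function name and the digit order (least significant first).

```python
def accumulate_partial_products(x: int, digits: list, n: int) -> int:
    """Sum the partial products -X, 0 or X, keeping a 2N-bit result."""
    mask = (1 << (2 * n)) - 1
    total = 0
    for k, digit in enumerate(digits):
        partial = digit * x                  # -X, 0 or X as a signed integer
        # Masking a shifted negative value yields its two's-complement form.
        total = (total + ((partial << k) & mask)) & mask
    return total
```

With the digits `[-1, 0, 0, -1, 0, 0, 0, 1, 0]` (the value 119) and X = 13, the function returns 1547, which equals 13 times 119.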

S104, it obtains storage indication signal and reads indication signal.

Specifically, the multiplier may automatically obtain the storage indication signal and the reading indication signal through the state control circuit.

S105, multiple multiplication results are stored to different deposit sub-circuits according to the storage indication signal.

Specifically, the state control circuit in the multiplier may input the obtained storage indication signal to the deposit control circuit, and the deposit control circuit determines, according to the received storage indication signal, the corresponding deposit sub-circuit into which the multiplication result of the current multiplication operation is to be stored.

It should be noted that one deposit sub-circuit can store at most one multiplication result, and some of the deposit sub-circuits may be idle.

S106, according to the reading indication signal, reading partial data in the corresponding multiplication results stored in different deposit sub-circuits to obtain the target operation result.

Specifically, the selection circuit in the multiplier may read, according to the received reading indication signal, partial data in the multiplication result stored in the corresponding deposit sub-circuit as the target operation result. Optionally, a single read result need not be the whole target operation result of the multiplication operation; the target operation result may be obtained by splicing the results of two reads, or of more than two reads. In other words, the bit width of the partial data may equal 1/2 of the bit width of the multiplication result, or may be less than 1/2 of it. Optionally, the bit width of the target operation result may be less than or equal to the bit width of the multiplier input port.
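Reading a stored multiplication result in parts and splicing the parts back together can be sketched as below. The function names, the high-part-first read order and the requirement that the part width divide the result width are assumptions for illustration.

```python
def read_parts(product: int, total_bits: int, part_bits: int) -> list:
    """Split a stored result into equal-width parts, highest part first."""
    if total_bits % part_bits:
        raise ValueError("this sketch assumes part_bits divides total_bits")
    count = total_bits // part_bits
    mask = (1 << part_bits) - 1
    return [(product >> (part_bits * (count - 1 - i))) & mask
            for i in range(count)]


def splice(parts: list, part_bits: int) -> int:
    """Concatenate the parts, highest first, into the full result."""
    value = 0
    for part in parts:
        value = (value << part_bits) | part
    return value
```

Reading the 16-bit result 0xABCD in halves gives `[0xAB, 0xCD]`, and reading it in quarters gives four 4-bit parts; in both cases `splice` restores 0xABCD.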

In the data processing method provided in this embodiment, canonical signed number coding is applied to the received data to obtain the partial products of the target code, the partial products of the target code are accumulated to obtain the multiplication result, and the high-order data and the low-order data of the multiplication result are read separately as the target operation result. The bit width of the target operation result can therefore be less than 2 times the bit width of the data input to the multiplier, which effectively reduces the bit width the multiplier requires of its input and output ports. The method can also apply canonical signed number coding to the received data using the canonical signed number coding circuit, which reduces the number of valid partial products obtained during multiplication and thus the complexity of the multiplication operation, and it can improve the efficiency of the multiplication operation.

In one embodiment, the step in S102 of performing canonical signed number coding on the data to be processed to obtain the partial products of the target code may include:

S1021, performing canonical signed number coding on the data to be processed to obtain original partial products.

Optionally, the step in S1021 of performing canonical signed number coding on the data to be processed to obtain the original partial products may include:

S1021a, canonical signed number coded treatment is carried out to the pending data, obtains target code.

Specifically, the multiplier may apply canonical signed number coding to the received multiplier to be processed through the canonical signed number coding unit to obtain the target code. The bit width of the target code may equal the bit width N of the multiplier to be processed plus 1.

Optionally, the step in S1021a of performing canonical signed number coding on the data to be processed to obtain the target code may include: converting each run of l consecutive bits of value 1 in the data to be processed into (l+1) digits whose highest digit is 1, whose lowest digit is -1 and whose remaining digits are 0, thereby obtaining the target code, where l is greater than or equal to 2.

It should be noted that the canonical signed number coding may be characterized as follows. An N-bit multiplier is processed from its low-order bits to its high-order bits. If there is a run of l (l >= 2) consecutive bits of value 1, the l consecutive 1s are converted to the (l+1)-digit data "1 (0)_(l-1) (-1)", and the remaining (N-l) bits are combined with the converted (l+1) digits to obtain new data. The new data is then used as the initial data of the next conversion stage, and this continues until the new data obtained after conversion contains no run of l (l >= 2) consecutive 1s. When canonical signed number coding is applied to an N-bit multiplier, the bit width of the resulting target code may equal (N+1).

S1021b, performing conversion according to the data to be processed and the target code to obtain the original partial products.

It should be noted that the number of original partial products may equal the bit width of the target code.

Illustratively, if the partial product acquiring unit receives an 8-bit multiplicand "x₇x₆x₅x₄x₃x₂x₁x₀" (i.e. X), the partial product acquiring unit may directly obtain the corresponding original partial products from the multiplicand X and the three values -1, 0 and 1 that the target code can contain: when a digit of the target code is -1, the original partial product may be -X; when a digit of the target code is 0, the original partial product may be 0; and when a digit of the target code is 1, the original partial product may be X. Optionally, the conversion may be characterized as converting the values in the target code into original partial products on the basis of the multiplicand of the multiplication operation.

S1022, performing sign bit extension on the original partial products to obtain the partial products of the target code.

Optionally, the step in S1022 of performing sign bit extension on the original partial products to obtain the partial products of the target code may specifically include: padding the original partial products to obtain the partial products of the target code.

Specifically, the bit width of a partial product after sign bit extension may equal 2 times the bit width N of the data currently processed by the multiplier, the bit width of an original partial product may equal N, and the number of sign extension bits may equal N. Optionally, sign bit extension can be understood as filling the sign extension bits with the value of the sign bit of the original partial product: the padding value may be the sign bit of the original partial product, which may be its highest bit, and the result is a sign-extended partial product of 2N-bit width. Optionally, the number of padded bits may equal N. Optionally, in the arrangement of all sign-extended partial products, the highest bits of all sign-extended partial products may lie in the same column, the lowest bits may also lie in the same column, and the other corresponding bits may likewise lie in the same columns.
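Selecting the original partial products and sign-extending them to 2N bits can be sketched as follows. The function names are hypothetical, and the sketch assumes an N-bit two's-complement multiplicand; note that -X wraps around when X is the most negative N-bit value, a corner case this illustration does not treat specially.

```python
def original_partial_products(x: int, digits: list, n: int) -> list:
    """Map each target-code digit to -X, 0 or X as an N-bit pattern."""
    mask = (1 << n) - 1
    x &= mask
    table = {-1: (-x) & mask, 0: 0, 1: x}    # two's-complement negation for -X
    return [table[d] for d in digits]


def sign_extend(value: int, n: int) -> int:
    """Extend an N-bit pattern to 2N bits by repeating its highest bit N times."""
    sign = (value >> (n - 1)) & 1
    extension = ((1 << n) - 1) if sign else 0
    return (extension << n) | value
```

With X = 5, N = 8 and digits `[-1, 0, 1]`, the original partial products are 0xFB, 0x00 and 0x05, and sign extension turns 0xFB into 0xFFFB while 0x05 becomes 0x0005.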

In the data processing method provided in this embodiment, canonical signed number coding is applied to the data to be processed to obtain the original partial products, sign bit extension is applied to the original partial products to obtain the partial products of the target code, the partial products of the target code are accumulated to obtain the multiplication result, and the high-order data and the low-order data of the multiplication result are then read separately as the target operation result. The bit width of the target operation result can therefore be less than 2 times the bit width of the data input to the multiplier, which effectively reduces the bit width the multiplier requires of its input and output ports. The method also obtains fewer valid partial products, which reduces the complexity of the multiplication operation, and it can improve the efficiency of the multiplication operation.

In the data processing method provided by another embodiment, the step in S105 of storing multiple multiplication results to different deposit sub-circuits according to the storage indication signal may specifically include:

S1051, corresponding first multiplication result of the first storage indication signal is stored into the first deposit sub-circuit.

Specifically, the number of storage indication signals may equal the number of multiplication operations performed by the multiplier: each time the multiplier performs one multiplication operation, one multiplication result is obtained and the state control circuit obtains one corresponding storage indication signal. When the multiplier performs the first multiplication operation and obtains the first multiplication result, the state control circuit automatically obtains the first storage indication signal; the deposit control circuit determines, according to the first storage indication signal input by the state control circuit, the first deposit sub-circuit that is to store the first multiplication result, and inputs the first multiplication result to the first deposit sub-circuit for storage.

S1052, corresponding second multiplication result of the second storage indication signal is stored into the second deposit sub-circuit.

It should be noted that when the multiplier performs the second multiplication operation and obtains the second multiplication result, the state control circuit automatically obtains the second storage indication signal; the deposit control circuit determines, according to the second storage indication signal input by the state control circuit, the second deposit sub-circuit that is to store the second multiplication result, and inputs the second multiplication result to the second deposit sub-circuit for storage. By analogy, the multiplier may store the multiplication result of each multiplication operation into a different deposit sub-circuit, in the order in which the deposit sub-circuits are numbered; that is, the results of two consecutive multiplication operations may be stored in two adjacent deposit sub-circuits.

In the data processing method provided in this embodiment, the first multiplication result corresponding to the first storage indication signal is stored in the first deposit sub-circuit, and the second multiplication result corresponding to the second storage indication signal is stored in the second deposit sub-circuit, which prevents one multiplication result from overwriting another. In addition, the method allows the bit width of the target operation result to be less than 2 times the bit width of the data input to the multiplier, which effectively reduces the bit width the multiplier requires of its input and output ports, and the method obtains fewer valid partial products, which reduces the complexity of the multiplication operation.

In one embodiment, the step in S106 of reading, according to the reading indication signal, partial data in the corresponding multiplication results stored in different deposit sub-circuits to obtain the target operation result may specifically be implemented as follows:

S1061, indication signal is read according to first, reads the first multiplying stored in the first deposit sub-circuit As a result first part's data in, obtain the first operation result.

S1062, indication signal is read according to second, reads first multiplication stored in the first deposit sub-circuit Second part data in operation result, obtain the second operation result.

Specifically, the number of reading indication signals obtained by the state control circuit in the multiplier may equal the number of times the multiplier reads an operation result, which is equivalent to 2 times the number of multiplication results. Optionally, a multiplication result may include two parts of data, namely first part data and second part data. Illustratively, if the bit width of the multiplication result equals 2N, the multiplication result can be divided into two parts, the high N bits and the low N bits, where the first part data may be either the high N bits or the low N bits, and the second part data is the other of the two.

S1063, indication signal is read according to third, reads the second multiplying stored in the second deposit sub-circuit As a result first part's data in, obtain third operation result.

Optionally, each reading indication signal can correspond to first part's data in multiplication result or second Divided data.

S1064, indication signal is read according to the 4th, reads second multiplication stored in the second deposit sub-circuit Second part data in operation result, obtain the 4th operation result.

Specifically, multiplier can carry out multiplying to multiple groups pending data, multiple multiplication results are obtained, because This can read in next multiplication result after multiplier reads the 4th operation result according to next reading indication signal Partial data.

Illustratively, if the input port bit wide of multiplier is 32 bits, output port bit wide is 64/t+deta bit (general, multiplier can complete multiplication operation by t clock cycle, obtain a multiplication result, t > 1, deta >=0), the data bit width that multiplier receives also is 32 bits, and the multiplier needs to multiply multiple groups pending data Method operation, in this case, including (64/ (64/t+deta)) a 131 (i.e. deposit electricity of deposit sub-circuit in register circuit 13 Road A₁, A₂..., A_i, i can be equal to (64/ (64/t+deta))), then the realization process for obtaining target operation result can be with are as follows:

If the multiplier obtains the first multiplication result M_0 after t clock cycles (t may be greater than or equal to 0), the register control circuit can store M_0 (64 bits wide) in register sub-circuit A₁ according to the first storage indication signal. At this point, the selection circuit can read the high 32-bit data of M_0 from register sub-circuit A₁ according to the first read indication signal, as the first operation result obtained from the first multiplication;

Meanwhile when multiplier is to t+1 clock cycle, then selection circuit can read indication signal according to second, From deposit sub-circuit A₁Middle low 32 data for reading M_0, as the second operation result that first time multiplying obtains, at this In embodiment, multiplier splices the first operation result and the second operation result, the target operation of available pending data As a result；

When the multiplier reaches the 2t-th clock cycle, the second multiplication result M_1 is available, and the register control circuit can store M_1 in register sub-circuit A₂ according to the second storage indication signal. At this point, the selection circuit can read the high 32-bit data of M_1 from register sub-circuit A₂ according to the third read indication signal, as the third operation result obtained from the second multiplication;

Meanwhile when operation of the multiplier to the 2t+1 clock cycle, then selection circuit can read according to the 4th and refer to Show signal, from deposit sub-circuit A₂Middle low 32 data for reading M_1, the 4th operation knot obtained as second of multiplying Fruit, in the present embodiment, data comparator merge third operation result with the 4th operation result, available pending data Target operation result；

And so on: according to different storage indication signals, the multiplication results obtained can be stored in different corresponding register sub-circuits, and according to different read indication signals, part of the data in the multiplication results stored in the different register sub-circuits can be read to obtain the target operation result.
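The store-then-read sequence above can be modelled with a short Python sketch. The class `RegisterBank`, the numbering of the indication signals (storage signal k selects sub-circuit A_(k+1); read signals 2k and 2k+1 select its high and low halves), and the use of unsigned 32-bit operands are all assumptions made for illustration. Clock-cycle timing is not modelled.

```python
from typing import Dict, List, Sequence, Tuple

WIDTH = 64          # bit width of one multiplication result (32-bit operands)
HALF = WIDTH // 2   # bit width of each part that is read out


class RegisterBank:
    """Hypothetical model of a register circuit made of register sub-circuits."""

    def __init__(self) -> None:
        self.slots: Dict[int, int] = {}

    def store(self, storage_signal: int, result: int) -> None:
        # The storage indication signal selects which sub-circuit receives the result.
        self.slots[storage_signal] = result & ((1 << WIDTH) - 1)

    def read(self, read_signal: int) -> int:
        # Even read signals return the high half, odd read signals the low half.
        slot, part = divmod(read_signal, 2)
        value = self.slots[slot]
        return value >> HALF if part == 0 else value & ((1 << HALF) - 1)


def run(operand_pairs: Sequence[Tuple[int, int]]) -> List[int]:
    bank = RegisterBank()
    outputs: List[int] = []
    for k, (a, b) in enumerate(operand_pairs):
        bank.store(k, a * b)                  # M_k goes to sub-circuit A_(k+1)
        outputs.append(bank.read(2 * k))      # high 32 bits of M_k
        outputs.append(bank.read(2 * k + 1))  # low 32 bits of M_k
    return outputs


outputs = run([(0xFFFFFFFF, 0xFFFFFFFF), (3, 5)])
```

In this sketch `outputs` holds the first to fourth operation results in the order in which they would be read.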

In addition, if a zero is present in one group of the multiple groups of pending data, the multiplier can obtain the multiplication result for that group after m (m < t) clock cycles. The multiplier can store this multiplication result in the corresponding register sub-circuit according to the storage indication signal. In the current clock cycle, the multiplier can read part of the data in the multiplication result stored in that register sub-circuit according to the read indication signal, and in the following clock cycle the multiplier can output the remaining data of the multiplication result. If a zero is also present in the next group of pending data, so that the multiplication can be completed and the multiplication result obtained in 1 clock cycle, the multiplier can store that multiplication result in the adjacent next register sub-circuit.

In the data processing method provided by this embodiment, the multiplier reads, according to the read indication signals, part of the data in the corresponding multiplication results stored in different register sub-circuits to obtain the target operation result. The method can read the high-order data and the low-order data of a multiplication result separately as target operation results, so that the bit width of the target operation result can be less than twice the bit width of the data input to the multiplier, which effectively reduces the multiplier's requirement on input/output port bit width. In addition, the method yields fewer valid partial products, which reduces the complexity of the multiplication.

Fig. 6 is a flow diagram of the data processing method provided by one embodiment. The method can be performed by the multiplier shown in Fig. 2, and this embodiment concerns the process of performing multiplication on data. As shown in Fig. 6, the method includes:

S201, receive a data conversion signal and pending data.

Specifically, the multiplication operation circuit in the multiplier can receive two pieces of pending data and the data conversion signal. Optionally, the bit width of the pending data may equal the bit width of the multiplier input port. Optionally, if the number conversion circuit receives different data conversion signals, it can convert the received data into data of the format corresponding to each data conversion signal.

S202, perform canonical signed number encoding on the pending data to obtain the partial products of the target code.

Specifically, the principle of the canonical signed number encoding can be described as follows. For an N-bit multiplier operand, the value is processed from the low-order bits to the high-order bits. If there is a run of l consecutive 1s (l >= 2), the l-bit run of 1s can be converted into the data "1 (0)_{l-1} (-1)", that is, a 1 followed by l-1 zeros and then a -1. The remaining N-l bits are combined with the converted (l+1)-bit value to form new data, which serves as the initial data for the next stage of conversion. The conversion repeats until the new data obtained contains no run of l (l >= 2) consecutive 1s. The bit width of the target code obtained by applying canonical signed number encoding to an N-bit multiplier operand may equal N+1. It should be noted that the number of partial products of the target code may equal the bit width N of the data received by the multiplier plus 1.
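Illustratively, the encoding can be sketched in Python. The sketch uses the standard digit-by-digit recurrence for what is commonly called canonical signed digit (CSD) or non-adjacent form, which is assumed here to yield the same digits as the run-by-run replacement described above. The function name `csd_encode` is hypothetical, and the operand is assumed to be an unsigned N-bit value.

```python
from typing import List


def csd_encode(value: int, n: int) -> List[int]:
    """Encode an unsigned N-bit value as N+1 digits in {-1, 0, 1}, LSB first."""
    if not 0 <= value < (1 << n):
        raise ValueError("value must fit in n unsigned bits")
    digits: List[int] = []
    x = value
    for _ in range(n + 1):
        if x & 1:
            # Low two bits 01 -> digit 1; low two bits 11 (start of a run) -> digit -1.
            d = 2 - (x & 3)
            x -= d  # subtracting -1 carries into the next higher bit
        else:
            d = 0
        digits.append(d)
        x >>= 1
    return digits


# 0111 (a run of three 1s) becomes 1 0 0 -1, written here LSB first with N+1 = 5 digits.
code = csd_encode(0b0111, 4)
```

No two adjacent digits of the result are both nonzero, which is what keeps the number of nonzero partial products low.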

S203, accumulate the partial products of the target code to obtain a multiplication result.

Specifically, the accumulation sub-circuit can accumulate the values in each column of all the partial products of the target code to obtain the multiplication result. Optionally, the bit width of the multiplication result may equal twice the bit width of the data received by the multiplier, or twice the bit width of the multiplier input port.
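As a minimal sketch of how the partial products of the target code add up to the product, the following Python model forms one shifted partial product per code digit and sums them. The helper names `csd_digits` and `csd_multiply` are hypothetical, the multiplier operand is assumed to be unsigned, and a plain sum stands in for the column-wise accumulation hardware.

```python
from typing import List, Tuple


def csd_digits(value: int, n: int) -> List[int]:
    """N+1 signed digits in {-1, 0, 1}, LSB first, for an unsigned N-bit value."""
    digits: List[int] = []
    x = value
    for _ in range(n + 1):
        d = (2 - (x & 3)) if x & 1 else 0
        x = (x - d) >> 1
        digits.append(d)
    return digits


def csd_multiply(multiplicand: int, multiplier: int, n: int) -> Tuple[int, int]:
    """Return (product, number of nonzero partial products)."""
    # One partial product per code digit: -X, 0 or +X, shifted to the digit position.
    partial_products = [
        (d * multiplicand) << i for i, d in enumerate(csd_digits(multiplier, n))
    ]
    nonzero = [p for p in partial_products if p != 0]
    return sum(partial_products), len(nonzero)


# 127 = 0b01111111 has seven 1 bits, but its code is 128 - 1: two nonzero digits.
product, count = csd_multiply(25, 0b01111111, 8)
```

In this example only two of the N+1 partial products are nonzero, compared with seven for plain binary partial products.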

S204, convert the multiplication result according to the data conversion signal to obtain a target operation result, wherein the data conversion signal indicates the data type into which the multiplier needs the target operation result to be converted.

Specifically, the number conversion circuit determines, according to the received data conversion signal, whether to convert the multiplication result into an operation result of fixed-point type or an operation result of floating-point type. Illustratively, suppose the number conversion circuit can receive two data conversion signals, denoted 00 and 01, and the input port and output port of the multiplier are both N bits wide. Then 00 indicates that the number conversion circuit converts the received 2N-bit multiplication result into an N-bit fixed-point operation result, and 01 indicates that it converts the received 2N-bit multiplication result into an N-bit floating-point operation result. The function that each data conversion signal selects in the number conversion circuit can be set flexibly. Optionally, each data conversion signal may represent one data type into which the multiplier needs to convert the multiplication result.
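Illustratively, the selection between the two conversions can be sketched in Python for N = 32. The text does not specify how the 2N-bit result is reduced to N bits, so both conversion policies below are assumptions made only for illustration: signal 00 keeps the high N bits as the fixed-point result, and signal 01 rounds the integer result to an IEEE 754 single-precision bit pattern. The function name `convert_result` is hypothetical.

```python
import struct


def convert_result(product: int, signal: str, n: int = 32) -> int:
    """Return an N-bit pattern for an unsigned 2N-bit product, chosen by `signal`."""
    if signal == "00":
        # Assumed fixed-point policy: drop the N low-order bits, keep the high N bits.
        return (product >> n) & ((1 << n) - 1)
    if signal == "01":
        if n != 32:
            raise ValueError("this sketch only models 32-bit floating point")
        # Assumed floating-point policy: round the integer to IEEE 754 binary32.
        return struct.unpack("<I", struct.pack("<f", float(product)))[0]
    raise ValueError("unknown data conversion signal")


fixed_bits = convert_result(1 << 32, "00")  # high half of the 64-bit result
float_bits = convert_result(3, "01")        # bit pattern of 3.0 in binary32
```

Either way the output fits an N-bit port, which matches the stated aim of keeping the output narrower than the 2N-bit multiplication result.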

In the data processing method provided by this embodiment, a data conversion signal and pending data are received, the pending data is multiplied to obtain a multiplication result, and the multiplication result is converted according to the data conversion signal to obtain the target operation result. The method allows the bit width of the target operation result to be less than twice the bit width of the data input to the multiplier, which effectively reduces the multiplier's requirement on input/output port bit width. In addition, the method yields fewer valid partial products, which reduces the complexity of the multiplication.

The embodiment of the present application also provides a machine learning operation device, which includes one or more of the multipliers described in this application, and which obtains data to be operated on and control information from other processing devices, performs the specified machine learning operation, and passes the execution result to peripheral devices through an I/O interface. Peripheral devices include, for example, cameras, displays, mice, keyboards, network cards, wifi interfaces, and servers. When more than one multiplier is included, the multipliers can be linked and can transfer data through a specific structure, for example interconnected through a PCIE bus, to support larger-scale machine learning operations. In this case, the multipliers may share one control system or have independent control systems; they may share memory, or each accelerator may have its own memory. In addition, the interconnection may use any interconnect topology.

The machine learning operation device has high compatibility and can be connected to various types of servers through a PCIE interface.

The embodiment of the present application also provides a combined processing device, which includes the above machine learning operation device, a general interconnection interface, and other processing devices. The machine learning operation device interacts with the other processing devices to jointly complete the operation specified by the user. Fig. 7 is a schematic diagram of the combined processing device.

The other processing devices include one or more processor types among general-purpose and special-purpose processors such as a central processing unit (CPU), a graphics processing unit (GPU), and a neural network processor. The number of processors included in the other processing devices is not limited. The other processing devices serve as the interface between the machine learning operation device and external data and control, including data transfer, and perform basic control of the machine learning operation device such as starting and stopping. The other processing devices can also cooperate with the machine learning operation device to complete operation tasks.

The general interconnection interface is used to transmit data and control instructions between the machine learning operation device and the other processing devices. The machine learning operation device obtains the required input data from the other processing devices and writes it to the on-chip storage of the machine learning operation device. It can obtain control instructions from the other processing devices and write them to the on-chip control cache of the machine learning operation device. It can also read the data in the storage module of the machine learning operation device and transmit it to the other processing devices.

Optionally, as shown in Fig. 8, the structure may further include a storage device, which is connected to the machine learning operation device and to the other processing devices respectively. The storage device stores data of the machine learning operation device and the other processing devices, and is particularly suitable for data required for an operation that cannot be held entirely in the internal storage of the machine learning operation device or the other processing devices.

The combined processing device can serve as the system on chip (SoC) of devices such as mobile phones, robots, drones, and video surveillance equipment, effectively reducing the die area of the control portion, increasing processing speed, and lowering overall power consumption. In this case, the general interconnection interface of the combined processing device is connected to certain components of the device, for example cameras, displays, mice, keyboards, network cards, and wifi interfaces.

In some embodiments, a chip is also claimed, which includes the above machine learning operation device or combined processing device.

In some embodiments, a chip package structure is claimed, which includes the above chip.

In some embodiments, a board is claimed, which includes the above chip package structure. Fig. 9 provides such a board. As shown in Fig. 9, in addition to the above chip 389, the board may include other supporting components, including but not limited to: a memory device 390, a reception device 391, and a control device 392;

The memory device 390 is connected to the chip in the chip package structure through a bus and is used for storing data. The memory device may include multiple groups of storage units 393. Each group of storage units is connected to the chip through a bus. It can be understood that each group of storage units may be DDR SDRAM (Double Data Rate SDRAM, double data rate synchronous dynamic random access memory).

DDR can double the speed of SDRAM without raising the clock frequency. DDR allows data to be read on both the rising edge and the falling edge of the clock pulse, so DDR is twice as fast as standard SDRAM. In one embodiment, the memory device may include 4 groups of storage units. Each group of storage units may include multiple DDR4 chips. In one embodiment, the chip may internally include four 72-bit DDR4 controllers, in which 64 bits are used for data transmission and 8 bits are used for ECC checking. It can be understood that when DDR4-3200 chips are used in each group of storage units, the theoretical data transmission bandwidth can reach 25600 MB/s.

In one embodiment, each group of storage units includes multiple double data rate synchronous dynamic random access memories arranged in parallel. DDR can transfer data twice within one clock cycle. A controller for controlling the DDR is provided in the chip, for controlling the data transmission and data storage of each storage unit.
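The 25600 MB/s figure quoted for DDR4-3200 follows from the transfer rate and the width of the data path, and can be checked with a short Python calculation. The function name `ddr_bandwidth_mb_s` is hypothetical; the calculation assumes 3200 million transfers per second (the meaning of the "3200" in DDR4-3200) and counts only the 64 data bits of each 72-bit controller, since the 8 ECC bits carry no payload.

```python
def ddr_bandwidth_mb_s(mega_transfers_per_s: int, data_bits: int) -> float:
    """Theoretical peak bandwidth in MB/s for one memory channel."""
    bytes_per_transfer = data_bits / 8
    return mega_transfers_per_s * bytes_per_transfer


# DDR4-3200 with a 64-bit data path per controller.
per_controller = ddr_bandwidth_mb_s(3200, 64)
```

Here `per_controller` evaluates to 25600.0, the theoretical bandwidth for one group of storage units.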

The reception device is electrically connected to the chip in the chip package structure. The reception device is used to implement data transmission between the chip and an external device (such as a server or a computer). For example, in one embodiment, the reception device may be a standard PCIE interface: the pending data is transferred from the server to the chip through the standard PCIE interface, implementing the data transfer. Preferably, when a PCIE 3.0 x16 interface is used for transmission, the theoretical bandwidth can reach 16000 MB/s. In another embodiment, the reception device may also be another interface. The application does not limit the specific form of such other interfaces, as long as the interface unit can implement the transfer function. In addition, the calculation result of the chip is likewise sent back to the external device (such as a server) by the reception device.
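The 16000 MB/s figure for the 3.0 x16 interface named above, commonly written PCIe 3.0 x16, can be reproduced with a short Python calculation. The function name `pcie_bandwidth_mb_s` is hypothetical. The calculation assumes 8 GT/s per lane and the 128b/130b line encoding used by PCIe 3.0, and ignores protocol overhead such as packet headers.

```python
from typing import Tuple


def pcie_bandwidth_mb_s(
    giga_transfers_per_s: float,
    lanes: int,
    payload_bits: int = 128,
    line_bits: int = 130,
) -> Tuple[float, float]:
    """Return (raw, after line encoding) one-direction bandwidth in MB/s."""
    raw = giga_transfers_per_s * 1000 * lanes / 8  # one bit per transfer per lane
    return raw, raw * payload_bits / line_bits


raw_mb_s, encoded_mb_s = pcie_bandwidth_mb_s(8, 16)
```

The raw rate is 16000 MB/s, matching the quoted theoretical bandwidth. After 128b/130b encoding the usable rate is roughly 15754 MB/s.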

The control device is electrically connected to the chip. The control device is used to monitor the state of the chip. Specifically, the chip and the control device may be electrically connected through an SPI interface. The control device may include a microcontroller unit (Micro Controller Unit, MCU). The chip may include multiple processing chips, multiple processing cores, or multiple processing circuits, and can drive multiple loads. Therefore, the chip can be in different working states such as heavy load and light load. The control device can regulate the working states of the multiple processing chips, multiple processing cores, and/or multiple processing circuits in the chip.

In some embodiments, an electronic device is claimed, which includes the above board.

The electronic device may be a multiplier, robot, computer, printer, scanner, tablet computer, intelligent terminal, mobile phone, dashboard camera, navigator, sensor, webcam, server, cloud server, camera, video camera, projector, watch, earphone, mobile storage device, wearable device, vehicle, household appliance, and/or medical device.

The vehicles include aircraft, ships, and/or cars; the household appliances include televisions, air conditioners, microwave ovens, refrigerators, rice cookers, humidifiers, washing machines, electric lights, gas stoves, and range hoods; the medical devices include nuclear magnetic resonance instruments, B-mode ultrasound instruments, and/or electrocardiographs.

It should be noted that, for simplicity of description, each of the foregoing embodiments is described as a series of circuit combinations, but those skilled in the art should understand that the application is not limited by the described circuit combinations, because according to the application certain circuits can be implemented in other ways or with other structures. Furthermore, those skilled in the art should also understand that the embodiments described in this specification are all optional embodiments, and the devices and modules involved are not necessarily required by the application.

In the above embodiments, the description of each embodiment has its own emphasis. For parts not described in detail in one embodiment, refer to the related descriptions of other embodiments.

The above embodiments express only several implementations of the application. Their description is relatively specific and detailed, but should not therefore be construed as limiting the scope of the patent. It should be noted that those of ordinary skill in the art can make various modifications and improvements without departing from the concept of the application, all of which fall within the protection scope of the application. Therefore, the protection scope of this patent shall be subject to the appended claims.

Claims

1. A multiplier, characterized in that the multiplier comprises: a multiplication operation circuit, a register control circuit, a register circuit, a state control circuit, and a selection circuit, wherein the multiplication operation circuit comprises a canonical signed number encoding sub-circuit and an accumulation sub-circuit; an output end of the canonical signed number encoding sub-circuit is connected to an input end of the accumulation sub-circuit; an output end of the accumulation sub-circuit is connected to a first input end of the register control circuit; an output end of the register control circuit is connected to an input end of the register circuit; an output end of the register circuit is connected to a first input end of the selection circuit; a first output end of the state control circuit is connected to a second input end of the register control circuit; and a second output end of the state control circuit is connected to a second input end of the selection circuit.

2. The multiplier according to claim 1, characterized in that the canonical signed number encoding sub-circuit comprises a canonical signed number encoding unit and a partial product acquiring unit; the canonical signed number encoding unit is configured to receive first data and perform the canonical signed number encoding on the first data to obtain the target code; the partial product acquiring unit is configured to receive second data, obtain initial partial products according to the target code and the second data, and obtain the partial products of the target code according to the initial partial products; the accumulation sub-circuit is configured to accumulate the partial products of the target code to obtain a multiplication result; the state control circuit is configured to obtain a storage indication signal and a read indication signal; the register control circuit is configured to determine, according to the storage indication signal input by the state control circuit, the register circuit in which the multiplication result is stored; the register circuit is configured to store the multiplication result; and the selection circuit is configured to read, according to the received read indication signal, data in the multiplication result stored in the register circuit as a target operation result.

3. The multiplier according to claim 2, characterized in that the canonical signed number encoding unit comprises: a data input port and a target code output port; the data input port is configured to receive the first data on which canonical signed number encoding is to be performed, and the target code output port is configured to output the target code obtained after canonical signed number encoding is performed on the first data.

4. The multiplier according to claim 2 or 3, characterized in that the partial product acquiring unit is specifically configured to convert the target code to obtain initial partial products, perform sign bit extension on the initial partial products to obtain sign-extended partial products, and obtain the partial products of the target code according to the sign-extended partial products.

5. The multiplier according to any one of claims 2 to 4, characterized in that the partial product acquiring unit comprises: a target code input port, a second data input port, and a partial product output port; the target code input port is configured to receive the target code, the second data input port is configured to receive the second data, and the partial product output port is configured to output the partial products of the target code.

6. The multiplier according to any one of claims 1 to 5, characterized in that the accumulation sub-circuit comprises: a Wallace tree group unit and an accumulation unit, wherein an output end of the Wallace tree group unit is connected to an input end of the accumulation unit; the Wallace tree group unit is configured to accumulate the partial products of the target code to obtain an accumulation operation result, and the accumulation unit is configured to accumulate the accumulation operation result.

7. The multiplier according to claim 6, characterized in that the Wallace tree group unit comprises: Wallace tree sub-units, the Wallace tree sub-units being configured to accumulate the values in each column of all the partial products of the target code.

8. The multiplier according to claim 6 or 7, characterized in that the accumulation unit comprises: an adder, the adder being configured to perform an addition operation on the received accumulation operation result.

9. The multiplier according to claim 8, characterized in that the adder comprises: a carry signal input port, a sum bit signal input port, and a result output port; the carry signal input port is configured to receive a carry signal, the sum bit signal input port is configured to receive a sum bit signal, and the result output port is configured to output the result of accumulating the carry signal and the sum bit signal.

10. The multiplier according to any one of claims 1 to 9, characterized in that the register circuit comprises: register sub-circuits, the register sub-circuits being configured to store the multiplication results corresponding to different storage indication signals.

11. A multiplier, characterized in that the multiplier comprises: a multiplication operation circuit and a number conversion circuit, wherein the multiplication operation circuit comprises a canonical signed number encoding sub-circuit and an accumulation sub-circuit; an output end of the canonical signed number encoding sub-circuit is connected to an input end of the accumulation sub-circuit; an output end of the accumulation sub-circuit is connected to an input end of the number conversion circuit; and the number conversion circuit comprises a first conversion sub-circuit and a second conversion sub-circuit;

wherein the canonical signed number encoding sub-circuit is configured to perform canonical signed number encoding on the received data to obtain a target code and to obtain the partial products of the target code according to the target code; the accumulation sub-circuit is configured to accumulate the partial products of the target code to obtain a multiplication result; and the first conversion sub-circuit and the second conversion sub-circuit are respectively configured to convert the multiplication result to obtain a target operation result.

12. The multiplier according to claim 11, characterized in that the number conversion circuit comprises an input port for receiving a data conversion signal, the data conversion signal being used to determine the type of data conversion performed by the number conversion circuit.

13. The multiplier according to claim 11 or 12, characterized in that the first conversion sub-circuit is specifically configured to convert the multiplication result into the target operation result of floating-point type, and the second conversion sub-circuit is specifically configured to convert the multiplication result into the target operation result of fixed-point type.

14. A data processing method, characterized in that the method comprises:

receiving pending data;

obtaining a storage indication signal and a read indication signal;

reading, according to the read indication signal, part of the data in the corresponding multiplication results stored in different register sub-circuits to obtain a target operation result.

15. The method according to claim 14, characterized in that performing canonical signed number encoding on the pending data to obtain the partial products of the target code comprises:

16. The method according to claim 15, characterized in that performing canonical signed number encoding on the pending data to obtain initial partial products comprises:

17. The method according to claim 15 or 16, characterized in that performing sign bit extension on the initial partial products to obtain the partial products of the target code comprises: performing bit padding on the initial partial products to obtain the partial products of the target code.

18. The method according to any one of claims 14 to 17, characterized in that storing multiple multiplication results in different register sub-circuits according to the storage indication signal comprises:

19. The method according to any one of claims 14 to 18, characterized in that reading, according to the read indication signal, part of the data in the corresponding multiplication results stored in different register sub-circuits to obtain a target operation result comprises:

reading, according to a first read indication signal, first part data of a first multiplication result stored in a first register sub-circuit to obtain a first operation result;

reading, according to a second read indication signal, second part data of the first multiplication result stored in the first register sub-circuit to obtain a second operation result;

reading, according to a third read indication signal, first part data of a second multiplication result stored in a second register sub-circuit to obtain a third operation result;

reading, according to a fourth read indication signal, second part data of the second multiplication result stored in the second register sub-circuit to obtain a fourth operation result.

20. A data processing method, characterized in that the method comprises:

receiving a data conversion signal and pending data;

converting the multiplication result according to the data conversion signal to obtain a target operation result, wherein the data conversion signal indicates the data type into which the multiplier needs the target operation result to be converted.

21. A machine learning operation device, characterized in that the machine learning operation device comprises one or more multipliers according to any one of claims 1 to 13, and is configured to obtain input data to be operated on and control information from other processing devices, perform a specified machine learning operation, and pass the execution result to other processing devices through an I/O interface;

when the machine learning operation device comprises multiple multipliers, the multiple multipliers are connected and transfer data through a preset specific structure;

wherein the multiple multipliers are interconnected and transfer data through a PCIE bus to support larger-scale machine learning operations; the multiple multipliers share one control system or have respective control systems; the multiple multipliers share memory or have respective memories; and the interconnection of the multiple multipliers is any interconnect topology.

22. A combined processing device, characterized in that the combined processing device comprises the machine learning operation device according to claim 21, a general interconnection interface, and other processing devices;

the machine learning operation device interacts with the other processing devices to jointly complete the computing operation specified by the user.

23. The combined processing device according to claim 22, characterized by further comprising: a storage device, the storage device being connected to the machine learning operation device and to the other processing devices respectively, and configured to store data of the machine learning operation device and the other processing devices.

24. A neural network chip, characterized in that the neural network chip comprises the machine learning operation device according to claim 21, or the combined processing device according to claim 22, or the combined processing device according to claim 23.

25. An electronic device, characterized in that the electronic device comprises the chip according to claim 24.

26. A board, characterized in that the board comprises: a memory device, a reception device, a control device, and the neural network chip according to claim 24;

wherein the neural network chip is connected to the memory device, the control device, and the reception device respectively;

the memory device is configured to store data;

the reception device is configured to implement data transmission between the chip and an external device;

the control device is configured to monitor the state of the chip.

27. The board according to claim 26, characterized in that:

the memory device comprises: multiple groups of storage units, each group of storage units being connected to the chip through a bus, and the storage units being DDR SDRAM;

the chip comprises: a DDR controller for controlling the data transmission and data storage of each storage unit;

the reception device is a standard PCIE interface.