CN110413254A - Data processor, method, chip and electronic equipment - Google Patents

Info

Publication number
CN110413254A
Authority
CN
China
Prior art keywords
data
partial product
product
symbol bits
target code
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910902610.3A
Other languages
Chinese (zh)
Other versions
CN110413254B (en
Inventor
不公告发明人
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Cambricon Information Technology Co Ltd
Original Assignee
Shanghai Cambricon Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Cambricon Information Technology Co Ltd filed Critical Shanghai Cambricon Information Technology Co Ltd
Priority to CN201910902610.3A priority Critical patent/CN110413254B/en
Priority to CN201911349822.XA priority patent/CN111008003B/en
Publication of CN110413254A publication Critical patent/CN110413254A/en
Application granted granted Critical
Publication of CN110413254B publication Critical patent/CN110413254B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00 Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/38 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
    • G06F7/48 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
    • G06F7/50 Adding; Subtracting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00 Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/38 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
    • G06F7/48 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
    • G06F7/52 Multiplying; Dividing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30003 Arrangements for executing specific machine instructions
    • G06F9/30007 Arrangements for executing specific machine instructions to perform operations on data operands
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

This application provides a data processor, a method, a chip and electronic equipment. The data processor includes a first multiplication circuit, a second multiplication circuit and a partial product exchange circuit. The first multiplication circuit includes a first modified encoding sub-circuit and a first modified compression sub-circuit, and the second multiplication circuit includes a second modified encoding sub-circuit and a second modified compression sub-circuit. The data processor applies canonical signed digit (CSD) encoding to the data it receives, which reduces the number of effective partial products obtained and therefore the complexity of the multiplication or multiply-accumulate operation.

Description

Data processor, method, chip and electronic equipment
Technical field
This application relates to the field of computer technology, and in particular to a data processor, a method, a chip and electronic equipment.
Background
With the continuous development of digital electronics, the rapid progress of all kinds of artificial intelligence (AI) chips places ever higher demands on high-performance data processors, where a data processor is a multiplier, an adder or a multiply-accumulator. Neural network algorithms are among the algorithms most widely used on intelligent chips, and the multiply-accumulate operation performed by a multiply-accumulator is a common operation in them.
Currently, a data processor encodes every three bits of the multiplier as one code, obtains partial products from the multiplicand according to those codes, and compresses all partial products with a Wallace tree to obtain the multiplication or multiply-accumulate result. In this traditional approach, however, the codes contain many non-zero digits, so many effective partial products are generated, and the multiplication or multiply-accumulate operation is correspondingly complex.
Summary of the invention
In view of the above technical problem, it is necessary to provide a data processor, a method, a chip and electronic equipment that reduce the number of effective partial products obtained and thus the computational complexity.
A data processor includes a first multiplication circuit, a second multiplication circuit and a partial product exchange circuit. The first multiplication circuit includes a first modified encoding sub-circuit and a first modified compression sub-circuit, and the second multiplication circuit includes a second modified encoding sub-circuit and a second modified compression sub-circuit. The first modified encoding sub-circuit includes a first encoding branch and a first selection branch, and the second modified encoding sub-circuit includes a second encoding branch and a second selection branch. The first output of the first modified encoding sub-circuit is connected to the first input of the partial product exchange circuit; the second output of the first modified encoding sub-circuit is connected to the input of the first modified compression sub-circuit; the first output of the partial product exchange circuit is connected to the input of the first modified encoding sub-circuit; the second output of the partial product exchange circuit is connected to the input of the second modified encoding sub-circuit; the first output of the second modified encoding sub-circuit is connected to the second input of the partial product exchange circuit; and the second output of the second modified encoding sub-circuit is connected to the input of the second modified compression sub-circuit;
The first encoding branch performs canonical signed digit (CSD) encoding on the received first data to obtain sign-extended first partial products. The first selection branch selects the first partial products of the target encoding from the sign-extended first partial products. The first modified compression sub-circuit compresses the first partial products of the target encoding to obtain a first target operation result. The second encoding branch performs CSD encoding on the received second data to obtain sign-extended second partial products. The second selection branch selects the second partial products of the target encoding from the sign-extended second partial products. The second modified compression sub-circuit compresses the second partial products of the target encoding to obtain a second target operation result. The partial product exchange circuit exchanges the sign-extended first partial products with the sign-extended second partial products.
In one embodiment, the first multiplication circuit and the second multiplication circuit each include a first input for receiving a function mode selection signal, and the partial product exchange circuit includes a third input for receiving the same signal. The function mode selection signal determines which mode of data operation the data processor currently handles.
In one embodiment, the first modified encoding sub-circuit includes a first modified encoding processing branch and a first partial product selection branch, and the output of the first modified encoding processing branch is connected to the input of the first partial product selection branch;
The first modified encoding processing branch performs CSD encoding on the received first data to obtain the first target encoding. The first partial product selection branch obtains the sign-extended first partial products according to the first target encoding, selects among them, receives the sign-extended second partial products output by the partial product exchange circuit, and takes the received sign-extended second partial products together with the sign-extended first partial products that remain after selection as the first partial products of the target encoding.
In one embodiment, the first modified encoding processing branch includes a first modified encoding unit, a low-order partial product acquisition unit, a low-order selector group unit, a high-order partial product acquisition unit and a high-order selector group unit. The first output of the first modified encoding unit is connected to the first input of the low-order partial product acquisition unit, and the output of the low-order selector group unit is connected to the second input of the low-order partial product acquisition unit. The second output of the first modified encoding unit is connected to the first input of the high-order partial product acquisition unit, and the output of the high-order selector group unit is connected to the second input of the high-order partial product acquisition unit;
The first modified encoding unit performs CSD encoding on the received first data, determines from the received function mode selection signal the bit width of data that the data processor can handle, and obtains the first target encoding according to that bit width. The low-order partial product acquisition unit obtains the sign-extended first low-order partial products from the first low-order target encoding within the received first target encoding and from the first data. The low-order selector group unit gates the values in the sign-extended first low-order partial products. The high-order partial product acquisition unit obtains the sign-extended first high-order partial products from the first high-order target encoding within the received first target encoding and from the first data. The high-order selector group unit gates the values in the sign-extended first high-order partial products.
In one embodiment, the first modified encoding unit includes a first data input port, a first mode selection signal input port, a low-order target encoding output port and a high-order target encoding output port. The first data input port receives the first data. The first mode selection signal input port receives the function mode selection signal. The low-order target encoding output port outputs the first low-order target encoding obtained by CSD encoding of the first data, and the high-order target encoding output port outputs the first high-order target encoding obtained by CSD encoding of the first data.
In one embodiment, the low-order partial product acquisition unit includes a low-order target encoding input port, a gated value input port, a first data input port and a low-order partial product output port. The low-order target encoding input port receives the first low-order target encoding output by the first modified encoding unit. The gated value input port receives the values in the sign-extended first low-order partial products obtained after gating by the low-order selector group unit. The first data input port receives the first data. The low-order partial product output port outputs the sign-extended first low-order partial products.
In one embodiment, the high-order partial product acquisition unit includes a high-order target encoding input port, a gated value input port, a first data input port and a high-order partial product output port. The high-order target encoding input port receives the first high-order target encoding output by the first modified encoding unit. The gated value input port receives the values in the sign-extended first high-order partial products output after gating by the high-order selector group unit. The first data input port receives the first data. The high-order partial product output port outputs the sign-extended first high-order partial products.
In one embodiment, the low-order selector group unit includes low-order selectors, which gate the values in the sign-extended first low-order partial products.
In one embodiment, the high-order selector group unit includes high-order selectors, which gate the values in the sign-extended first high-order partial products.
In one embodiment, the first partial product selection branch includes a function mode selection signal input port, a first partial product input port, a second partial product input port, a first partial product output port and a gated partial product output port. The function mode selection signal input port receives the function mode selection signal. The first partial product input port receives the sign-extended first partial products output by the first modified encoding unit. The second partial product input port receives the sign-extended second partial products exchanged by the partial product exchange circuit. The first partial product output port outputs the sign-extended first partial products that the partial product exchange circuit needs to exchange. The gated partial product output port outputs the sign-extended first partial products after gating together with the received sign-extended second partial products.
In one embodiment, the first modified compression sub-circuit includes a modified Wallace tree group unit and an accumulation unit, and the output of the modified Wallace tree group unit is connected to the input of the accumulation unit. When data operations of different modes are handled, the modified Wallace tree group unit accumulates each column of values in the obtained first partial products of the target encoding to obtain an accumulation result, and the accumulation unit performs addition on the accumulation result.
In one embodiment, the modified Wallace tree group unit includes low-order Wallace tree subunits, selectors and high-order Wallace tree subunits. The output of a low-order Wallace tree subunit is connected to the input of a selector, and the output of the selector is connected to the input of a high-order Wallace tree subunit. The low-order Wallace tree subunits accumulate each column of values in the first partial products of the target encoding to obtain the accumulation result, the selectors gate the carry input signals received by the high-order Wallace tree subunits, and the high-order Wallace tree subunits accumulate each column of values in the first partial products of the target encoding to obtain the accumulation result.
In one embodiment, the accumulation unit includes an adder, which performs addition on the accumulation result.
In one embodiment, the second modified encoding sub-circuit includes a second modified encoding processing branch and a second partial product selection branch, and the output of the second modified encoding processing branch is connected to the input of the second partial product selection branch;
The second modified encoding processing branch performs CSD encoding on the received second data to obtain the second target encoding. The second partial product selection branch obtains the sign-extended second partial products according to the second target encoding, selects among them, receives the sign-extended first partial products output by the partial product exchange circuit, and takes the received sign-extended first partial products together with the sign-extended second partial products that remain after selection as the second partial products of the target encoding.
In one embodiment, the second partial product selection branch includes a function mode selection signal input port, a second partial product input port, a first partial product input port, a second partial product output port and a gated partial product output port. The function mode selection signal input port receives the function mode selection signal. The second partial product input port receives the sign-extended second partial products output by the second modified encoding processing branch. The first partial product input port receives the sign-extended first partial products obtained after exchange by the partial product exchange circuit. The second partial product output port outputs the sign-extended second partial products that the partial product exchange circuit needs to exchange. The gated partial product output port outputs the sign-extended second partial products after gating together with the received sign-extended first partial products.
In one embodiment, the partial product exchange circuit includes a function mode selection signal input port, a first partial product input port, a first partial product output port, a second partial product input port and a second partial product output port. The function mode selection signal input port receives the function mode selection signal. The first partial product input port receives the sign-extended first partial products to be exchanged that are output by the first partial product selection branch, and the first partial product output port outputs the sign-extended first partial products. The second partial product input port receives the sign-extended second partial products to be exchanged that are output by the second partial product selection branch, and the second partial product output port outputs the sign-extended second partial products.
A data processing method, the method comprising:
Receiving data to be processed and a function mode selection signal, where the function mode selection signal indicates which mode of data operation the data processor currently handles;
Determining, according to the function mode selection signal, whether the data to be processed needs to be split;
If the data to be processed needs to be split, splitting the data to be processed to obtain split data;
Performing canonical signed digit (CSD) encoding on the split data to obtain a target encoding;
Performing conversion according to the target encoding and the split data to obtain sign-extended partial products;
Determining, according to the function mode selection signal, whether the sign-extended partial products need to be exchanged;
If the sign-extended partial products do not need to be exchanged, taking the sign-extended partial products as the partial products of the target encoding;
Compressing the partial products of the target encoding to obtain a target operation result.
In one embodiment, determining according to the function mode selection signal whether the data to be processed needs to be split includes: determining, according to the function mode selection signal, whether the bit width of the data to be processed equals the data bit width that the data processor can handle in the current mode.
In one embodiment, after determining according to the function mode selection signal whether the bit width of the data to be processed equals the data bit width that the data processor can handle in the current mode, the method further includes: if the data to be processed does not need to be split, proceeding to perform CSD encoding on the data to be processed to obtain the target encoding.
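The split decision above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the text does not say how the data is split, so the sketch assumes equal-width chunks returned low-order chunk first, and the names `needs_split`, `split_operand`, `data_width` and `mode_width` are hypothetical.

```python
def needs_split(data_width: int, mode_width: int) -> bool:
    """Split only when the operand width differs from the width of the current mode."""
    return data_width != mode_width


def split_operand(value: int, data_width: int, mode_width: int) -> list[int]:
    """Return the operand as mode_width-bit chunks, low-order chunk first.

    Assumption (not from the source): chunks are equal-width bit slices.
    """
    if not needs_split(data_width, mode_width):
        return [value & ((1 << data_width) - 1)]
    if data_width % mode_width != 0:
        raise ValueError("data_width must be a multiple of mode_width")
    mask = (1 << mode_width) - 1
    return [(value >> shift) & mask for shift in range(0, data_width, mode_width)]


# A 16-bit operand handled in an 8-bit mode becomes two 8-bit chunks.
chunks = split_operand(0xABCD, data_width=16, mode_width=8)
```

When the widths match, the operand passes through unchanged and goes straight to encoding, which mirrors the "no split needed" branch described above.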
In one embodiment, performing CSD encoding on the split data to obtain the target encoding includes: converting each run of l consecutive bits of value 1 in the split data into (l + 1) digits in which the highest digit is 1, the lowest digit is -1 and the remaining digits are 0, to obtain the target encoding, where l is greater than or equal to 2.
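As a rough illustration of this encoding, the sketch below computes the non-adjacent form of an integer, which is what "canonical signed digit" usually denotes. It is not taken from the patent: the function names are hypothetical, and the patent's hardware may apply the run-replacement rule differently. For an isolated run of l ones the result matches the rule in the text, because 2^(p+l) - 2^p has exactly the digits 1 at position p + l and -1 at position p.

```python
def csd_digits(value: int) -> list[int]:
    """Return signed digits in {-1, 0, 1}, least significant first,
    with no two adjacent non-zero digits (non-adjacent form)."""
    digits = []
    while value != 0:
        if value & 1:
            # value % 4 == 1 -> digit +1, value % 4 == 3 -> digit -1
            digit = 2 - (value & 3)
            value -= digit
        else:
            digit = 0
        digits.append(digit)
        value >>= 1
    return digits


def from_digits(digits: list[int]) -> int:
    """Rebuild the integer from its signed digits."""
    return sum(d << i for i, d in enumerate(digits))


print(csd_digits(0b0111))  # [-1, 0, 0, 1], i.e. 8 - 1
```

The run 111 (three non-zero bits) becomes two non-zero digits, so a multiplier built on this encoding would need two partial products here instead of three.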
In one embodiment, performing CSD encoding on the split data to obtain the target encoding includes:
Performing CSD encoding on the split data to obtain an intermediate encoding;
Obtaining the target encoding according to the intermediate encoding and the function mode selection signal.
In one embodiment, performing conversion according to the target encoding and the split data to obtain the sign-extended partial products includes:
Performing conversion according to the target encoding and the split data to obtain initial partial products;
Performing sign extension on the initial partial products to obtain the sign-extended partial products.
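The two steps above (initial partial product, then sign extension) can be sketched as follows. This is a minimal software model, not the circuit: it assumes signed digits in {-1, 0, 1}, a multiplicand that fits in `width` bits, and a product of `2 * width` bits, and all function names are hypothetical. Masking a negative Python integer yields its two's-complement bit pattern, which plays the role of sign extension here.

```python
def partial_product(digit: int, multiplicand: int, position: int, width: int) -> int:
    """Return the sign-extended partial product as a 2*width-bit pattern."""
    out_mask = (1 << (2 * width)) - 1
    initial = digit * multiplicand            # initial partial product: -x, 0 or +x
    return (initial << position) & out_mask   # shift into place, extend to 2*width bits


def to_signed(pattern: int, bits: int) -> int:
    """Interpret a bit pattern of the given width as a two's-complement value."""
    return pattern - (1 << bits) if pattern >> (bits - 1) else pattern


def multiply_from_digits(digits: list[int], multiplicand: int, width: int) -> int:
    """Sum the sign-extended partial products selected by the signed digits."""
    rows = [partial_product(d, multiplicand, i, width) for i, d in enumerate(digits)]
    total = sum(rows) & ((1 << (2 * width)) - 1)
    return to_signed(total, 2 * width)


# Multiplier 7 written as signed digits 8 - 1, multiplicand -5, 8-bit operands.
result = multiply_from_digits([-1, 0, 0, 1], -5, width=8)
```

Only the two non-zero digits contribute non-zero rows, which is the saving the encoding is meant to provide.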
In one embodiment, determining according to the function mode selection signal whether the sign-extended partial products need to be exchanged includes: determining, according to the function mode selection signal, whether the data bit widths currently handled by the data processor are the same.
In one embodiment, after determining according to the function mode selection signal whether the sign-extended partial products need to be exchanged, the method further includes: if the sign-extended partial products need to be exchanged, exchanging the high-order partial products or the low-order partial products among the sign-extended partial products.
In one embodiment, compressing the partial products of the target encoding to obtain the target operation result includes:
Accumulating the partial products of the target encoding to obtain an intermediate operation result;
Accumulating the intermediate operation result to obtain the target operation result.
In one embodiment, accumulating the intermediate operation result to obtain the target operation result includes:
A low-order Wallace tree subunit accumulates the column values in all partial products of the target encoding to obtain an accumulation result;
A selector gates the accumulation result according to the function mode selection signal to obtain a carry gating signal;
A high-order Wallace tree subunit accumulates according to the carry gating signal and the column values in the partial products of the target encoding to obtain the target operation result.
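The interplay of low-order columns, carry gating and high-order columns can be modelled with a carry-save reduction built from 3:2 compressors, the building block of a Wallace tree. The sketch below is an assumption-laden software model rather than the patented circuit: it reduces rows sequentially instead of in a tree, treats the word as two lanes of `lane_bits` each, and uses a hypothetical `split_lanes` flag in place of the function mode selection signal. When `split_lanes` is set, the carry that would cross from the low lane into the high lane is blocked, so the two lanes produce independent sums.

```python
def carry_save_step(a: int, b: int, c: int, carry_allowed: int) -> tuple[int, int]:
    """Apply a 3:2 compressor to every bit column of three rows at once."""
    column_sum = a ^ b ^ c
    carry = ((a & b) | (a & c) | (b & c)) << 1
    return column_sum, carry & carry_allowed   # gate carries at blocked columns


def compress(rows: list[int], lane_bits: int, split_lanes: bool) -> int:
    """Reduce rows to two with carry-save steps, then perform the final addition."""
    word_mask = (1 << (2 * lane_bits)) - 1
    lane_mask = (1 << lane_bits) - 1
    carry_allowed = word_mask
    if split_lanes:
        carry_allowed &= ~(1 << lane_bits)     # block the carry into the high lane
    rows = [r & word_mask for r in rows] + [0, 0]
    while len(rows) > 2:
        s, c = carry_save_step(rows[0], rows[1], rows[2], carry_allowed)
        rows = rows[3:] + [s, c]
    x, y = rows
    if split_lanes:
        low = ((x & lane_mask) + (y & lane_mask)) & lane_mask
        high = ((x >> lane_bits) + (y >> lane_bits)) & lane_mask
        return (high << lane_bits) | low
    return (x + y) & word_mask


rows = [0x00FF, 0x0001, 0x1234, 0xFFFF, 0x8001]
full = compress(rows, lane_bits=8, split_lanes=False)    # one 16-bit sum
lanes = compress(rows, lane_bits=8, split_lanes=True)    # two independent 8-bit sums
```

With `split_lanes=False` the result equals the ordinary sum of the rows modulo 2^16; with `split_lanes=True` each 8-bit lane holds the sum of the corresponding lane of every row modulo 2^8.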
The data processor and method provided by this embodiment perform CSD encoding on the received data in the first modified encoding sub-circuit and the second modified encoding sub-circuit respectively, yielding sign-extended first partial products and sign-extended second partial products. The received function mode selection signal then determines whether the partial product exchange circuit needs to exchange the sign-extended first partial products with the sign-extended second partial products. If an exchange is needed, then after it completes each of the two modified encoding sub-circuits takes the sign-extended partial products it currently holds as its partial products of the target encoding, giving the first and second partial products of the target encoding. Finally, the first modified compression sub-circuit and the second modified compression sub-circuit compress the first and second partial products of the target encoding respectively to obtain the target operation result. Because both modified encoding sub-circuits apply CSD encoding to the received data, fewer effective partial products are obtained, which reduces the complexity of the multiplication or multiply-accumulate operation.
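The claim that fewer effective partial products are obtained can be checked numerically with a small sketch. It rests on an assumption: the traditional scheme in the background section ("every three bits of the multiplier as one code") is read here as radix-4 Booth recoding, which the source does not name. All function names are hypothetical, and the comparison counts non-zero digits only, ignoring any hardware cost differences.

```python
def booth_radix4_digits(pattern: int, width: int) -> list[int]:
    """Recode a width-bit two's-complement pattern into digits in {-2..2},
    one digit per overlapping group of three bits, least significant first."""
    digits, prev = [], 0          # prev is the implicit bit right of bit 0
    for i in range(0, width, 2):
        b0 = (pattern >> i) & 1
        b1 = (pattern >> (i + 1)) & 1
        digits.append(b0 + prev - 2 * b1)
        prev = b1
    return digits


def csd_digits(value: int) -> list[int]:
    """Non-adjacent form: digits in {-1, 0, 1}, least significant first."""
    digits = []
    while value != 0:
        digit = 2 - (value & 3) if value & 1 else 0
        value -= digit
        digits.append(digit)
        value >>= 1
    return digits


def nonzero(digits: list[int]) -> int:
    """Count digits that would produce an effective partial product."""
    return sum(1 for d in digits if d != 0)


def compare(width: int = 8) -> tuple[int, int]:
    """Total non-zero digits over all signed width-bit multipliers."""
    half, mask = 1 << (width - 1), (1 << width) - 1
    booth_total = sum(nonzero(booth_radix4_digits(v & mask, width)) for v in range(-half, half))
    csd_total = sum(nonzero(csd_digits(v)) for v in range(-half, half))
    return booth_total, csd_total


booth_total, csd_total = compare(8)
```

For every multiplier the CSD digit count is at most the Booth digit count, since a radix-4 Booth code can be rewritten as a signed binary representation with the same number of non-zero digits and the non-adjacent form has the fewest non-zero digits of any such representation.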
An embodiment of this application provides a machine learning computing device that includes one or more of the data processors described above. The machine learning computing device obtains the data to be operated on and control information from other processing devices, executes the specified machine learning operation, and passes the execution result to other processing devices through an I/O interface;
When the machine learning computing device includes multiple data processors, the data processors may be connected to one another and transfer data through a specific structure;
The multiple data processors are interconnected and transfer data through a PCIE bus to support larger-scale machine learning operations; the multiple data processors share one control system or have their own control systems; the multiple data processors share memory or have their own memories; and the multiple data processors may be interconnected in any interconnection topology.
An embodiment of this application provides a combined processing device that includes the machine learning computing device described above, a universal interconnection interface and other processing devices. The machine learning computing device interacts with the other processing devices to jointly complete the operation specified by the user. The combined processing device may also include a storage device, which is connected to the machine learning computing device and to the other processing devices and stores the data of the machine learning computing device and the other processing devices.
An embodiment of this application provides a neural network chip that includes the data processor described above, the machine learning computing device described above or the combined processing device described above.
An embodiment of this application provides a neural network chip package structure that includes the neural network chip described above.
An embodiment of this application provides a board card that includes the neural network chip package structure described above.
An embodiment of this application provides an electronic device that includes the neural network chip described above or the board card described above.
An embodiment of this application provides a chip that includes at least one data processor as described in any of the above.
An embodiment of this application provides electronic equipment that includes the chip described above.
Brief description of the drawings
Fig. 1 is a schematic circuit diagram of a data processor provided by one embodiment.
Fig. 2 is a schematic circuit diagram of another data processor provided by another embodiment.
Fig. 3 is a specific circuit structure diagram of a data processor provided by one embodiment.
Fig. 4a is a schematic diagram of the distribution pattern of the partial products obtained in a 16-bit data multiplication, provided by one embodiment.
Fig. 4b is a schematic diagram of the distribution pattern of the partial products obtained in a 16 * 8 data multiply-accumulate operation, provided by one embodiment.
Fig. 5 is a specific circuit structure diagram of a data processor provided by another embodiment.
Fig. 6 is a flow diagram of a data processing method provided by one embodiment.
Fig. 7 is a specific circuit structure diagram of the compression circuit for 8-bit data operations, provided by another embodiment.
Fig. 8 is a flow diagram of another data processing method provided by one embodiment.
Fig. 9 is a structural diagram of a combined processing device provided by one embodiment.
Fig. 10 is a structural diagram of another combined processing device provided by one embodiment.
Fig. 11 is a structural schematic diagram of a board card provided by one embodiment.
Detailed description of the embodiments
In order to make the objects, technical solutions and advantages of the present application clearer, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are only used to explain the present application and are not intended to limit it.
The data processor provided by the present application can be applied in an AI chip, a field-programmable gate array (FPGA) chip, or other hardware circuit devices to perform multiplication or multiply-accumulate operations. Schematic structural diagrams of the data processor are shown in Fig. 1 and Fig. 2.
Fig. 1 is a structural diagram of a data processor according to an embodiment. As shown in Fig. 1, the data processor includes a first multiplication circuit 11, a second multiplication circuit 12 and a partial product exchange circuit 13. The first multiplication circuit 11 includes a first correction encoding sub-circuit 111 and a first correction compression sub-circuit 112, and the second multiplication circuit 12 includes a second correction encoding sub-circuit 121 and a second correction compression sub-circuit 122. The first correction encoding sub-circuit 111 includes a first encoding branch 111a and a first selection branch 111b, and the second correction encoding sub-circuit 121 includes a second encoding branch 121a and a second selection branch 121b. The first output terminal of the first correction encoding sub-circuit 111 is connected to the first input terminal of the partial product exchange circuit 13, and the second output terminal of the first correction encoding sub-circuit 111 is connected to the input terminal of the first correction compression sub-circuit 112. The first output terminal of the partial product exchange circuit 13 is connected to the input terminal of the first correction encoding sub-circuit 111, and the second output terminal of the partial product exchange circuit 13 is connected to the input terminal of the second correction encoding sub-circuit 121. The first output terminal of the second correction encoding sub-circuit 121 is connected to the second input terminal of the partial product exchange circuit 13, and the second output terminal of the second correction encoding sub-circuit 121 is connected to the input terminal of the second correction compression sub-circuit 122.
The first encoding branch 111a is used to perform canonical signed number encoding on the received first data to obtain sign-extended first partial products. The first selection branch 111b is used to select the first partial products of the target code from the sign-extended first partial products. The first correction compression sub-circuit 112 is used to compress the first partial products of the target code to obtain a first target operation result. The second encoding branch 121a is used to perform canonical signed number encoding on the received second data to obtain sign-extended second partial products. The second selection branch 121b is used to select the second partial products of the target code from the sign-extended second partial products. The second correction compression sub-circuit 122 is used to compress the second partial products of the target code to obtain a second target operation result. The partial product exchange circuit 13 is used to exchange the sign-extended first partial products with the sign-extended second partial products.
Specifically, the data processor described above can implement both data multiplication and data multiply-accumulate operations. Optionally, the first correction encoding sub-circuit 111 may receive the first data and the second correction encoding sub-circuit 121 may receive the second data. The first data and the second data may each include two subdata, which may be identical subdata of the same bit width or different subdata of the same bit width. A subdata may serve as the multiplicand or as the multiplier in a multiplication or multiply-accumulate operation. Optionally, the two subdata in the first data may be spliced into a whole and input to the first correction encoding sub-circuit 111, or may be input to the first correction encoding sub-circuit 111 separately and simultaneously; likewise, the two subdata in the second data may be spliced into a whole and input to the second correction encoding sub-circuit 121, or may be input to it separately and simultaneously. The subdata may be fixed-point numbers with a bit width of 2N, in which case the data obtained by splicing two subdata has a bit width of 4N. Optionally, the first correction encoding sub-circuit 111 may include a plurality of data processing units with different functions; these may be units with a canonical signed number encoding function or units with other conversion functions, and this embodiment places no limitation on this. In a single data operation, one of the subdata received by the first correction encoding sub-circuit 111 may serve as the multiplicand and the other as the multiplier; similarly, one of the subdata received by the second correction encoding sub-circuit 121 may serve as the multiplicand and the other as the multiplier.
It can also be understood that the bit width of the sign-extended first partial products and of the sign-extended second partial products may be equal to twice the bit width of the multiplicand in the multiplication or multiply-accumulate operation currently being processed by the data processor. The number of sign-extended first partial products may be equal to the number of first partial products of the target code, and the number of sign-extended second partial products may be equal to the number of second partial products of the target code. The sign-extended first partial products may include sign-extended first low-order partial products and sign-extended first high-order partial products; the sign-extended second partial products may include sign-extended second low-order partial products and sign-extended second high-order partial products. The first partial products of the target code may include first low-order partial products of the target code and first high-order partial products of the target code; the second partial products of the target code may include second low-order partial products of the target code and second high-order partial products of the target code.
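The sign extension of a partial product to twice the multiplicand bit width can be illustrated with a minimal sketch. It assumes partial products are held as two's-complement bit patterns, which the text does not state explicitly; the function name `sign_extend` and the width `WIDTH` are hypothetical and chosen only for illustration.

```python
def sign_extend(value: int, from_bits: int, to_bits: int) -> int:
    """Sign-extend a from_bits-wide two's-complement pattern to to_bits bits."""
    low_mask = (1 << from_bits) - 1
    value &= low_mask
    if value >> (from_bits - 1):                    # sign bit of the narrow pattern is set
        value |= ((1 << to_bits) - 1) & ~low_mask   # fill the new upper bits with 1s
    return value


WIDTH = 4                      # assumed multiplicand bit width for this example
narrow = 0b1011                # 4-bit pattern of -5
wide = sign_extend(narrow, WIDTH, 2 * WIDTH)
print(format(wide, "08b"))     # the same value held in twice the width
```

Here the 4-bit pattern of -5 becomes the 8-bit pattern of -5, so partial products of different signs can be added in a common width.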
In this embodiment, the first correction encoding sub-circuit 111 may receive the multiplier of the operation and perform canonical signed number encoding on it to obtain the target code. It should be noted that the canonical signed number encoding method can be characterized as follows. An N-bit multiplier is processed from its low-order bits to its high-order bits. If there is a run of l (l >= 2) consecutive bits of value 1, the run is converted into the data "1(0)^(l-1)(-1)", that is, a 1 followed by (l - 1) zeros and a -1, and the remaining (N - l) bits are combined with the (l + 1) converted digits to form a new data. The new data is then used as the initial data of the next conversion stage, and the process is repeated until the new data contains no run of l (l >= 2) consecutive bits of value 1. The target code obtained by canonical signed number encoding of an N-bit multiplier may have a bit width of (N + 1). Further, in canonical signed number encoding, the data 11 can be converted into (100 - 001), that is, 11 is equivalent to 10(-1); the data 111 can be converted into (1000 - 0001), that is, 111 is equivalent to 100(-1); and other runs of l (l >= 2) consecutive bits of value 1 are converted in a similar way.
For example, suppose the multiplier received by the first correction encoding sub-circuit 111 is "001010101101110". The first conversion stage yields the first new data "0010101011100(-1)0"; the second conversion stage applied to the first new data yields the second new data "0010101100(-1)00(-1)0"; the third conversion stage yields the third new data "0010110(-1)00(-1)00(-1)0"; the fourth conversion stage yields the fourth new data "00110(-1)0(-1)00(-1)00(-1)0"; and the fifth conversion stage yields the fifth new data "010(-1)0(-1)0(-1)00(-1)00(-1)0". The fifth new data contains no run of l (l >= 2) consecutive bits of value 1, so it may be called the initial code. The intermediate code is obtained by applying one bit-padding step to the initial code, which marks the completion of the canonical signed number encoding. The bit width of the initial code may be equal to the bit width of the multiplier. Optionally, after the first correction encoding sub-circuit 111 performs canonical signed number encoding on the multiplier to obtain the new data (i.e. the initial code), if the highest and second-highest digits of the new data are "10" or "01", the first correction encoding sub-circuit 111 may pad one digit of value 0 at the position one bit above the highest digit, so that the three highest digits of the corresponding intermediate code are "010" or "001" respectively. Optionally, the bit width of the intermediate code may be equal to the bit width of the data currently being processed by the data processor plus 1.
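The run-conversion procedure and the worked example above can be sketched in software as follows. This is only an illustrative model, not the circuit itself: it assumes the multiplier is given as an unsigned bit string, and the names `csd_encode`, `csd_value` and `csd_to_text` are hypothetical. The scheme is commonly known as canonical signed digit (CSD) recoding.

```python
def csd_encode(bits: str) -> list[int]:
    """Recode an unsigned bit string into signed digits, most significant first.

    The result has len(bits) + 1 digits: one padded 0 above the highest bit
    absorbs a possible carry out of the top run of 1s.
    """
    d = [int(b) for b in reversed(bits)] + [0]    # least significant digit first
    i = 0
    while i < len(d) - 1:
        if d[i] == 1 and d[i + 1] == 1:           # start of a run of l >= 2 ones
            j = i
            while d[j] == 1:                      # d[i:j] is the whole run
                j += 1
            d[i] = -1                             # lowest digit of the run becomes -1
            d[i + 1:j] = [0] * (j - i - 1)        # the other l - 1 digits become 0
            d[j] = 1                              # a 1 is carried above the run
            i = j                                 # the carried 1 may start a new run
        else:
            i += 1
    return d[::-1]


def csd_value(digits: list[int]) -> int:
    """Evaluate a signed-digit string (most significant first) as an integer."""
    value = 0
    for digit in digits:
        value = 2 * value + digit
    return value


def csd_to_text(digits: list[int]) -> str:
    """Render digits in the notation used in the text, e.g. 10(-1)."""
    return "".join("(-1)" if x == -1 else str(x) for x in digits)


multiplier = "001010101101110"
code = csd_encode(multiplier)
print(csd_to_text(code))
print(csd_value(code) == int(multiplier, 2))
```

For the multiplier of the example, the sketch reproduces the fifth new data with one padded 0 in front, and the recoded digits evaluate to the same integer as the original bit string. Each pass of the outer loop that finds a run corresponds to one conversion stage in the text.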
In addition, if the data received by the data processor has a bit width of 2N and the data processor currently handles N-bit data operations, the first correction encoding sub-circuit 111 may split the 2N-bit data into two groups of N-bit data and operate on them separately; in this case, the two resulting (N + 1)-bit intermediate codes may be combined to serve as the target code. If the data processor currently handles 2N-bit data operations, the first correction encoding sub-circuit 111 may pad one digit of value 0 at the position one bit above the highest digit of the (2N + 1)-bit intermediate code (i.e. number-padding processing), and use the resulting (2N + 2)-bit data as the target code. In this embodiment, the step that the data processor applies to the initial code is called bit-padding processing, and the step it applies to the intermediate code is called number-padding processing.
Optionally, the first multiplication circuit 11 and the second multiplication circuit 12 each include a first input terminal for receiving a function selection mode signal, and the partial product exchange circuit 13 includes a third input terminal for receiving the function selection mode signal. Optionally, the function selection mode signal is used to determine which of the different modes of data operation the data processor currently handles.
In this embodiment, each data processing unit included in the first multiplication circuit 11 may receive the function selection mode signal, and each data processing unit included in the second multiplication circuit 12 may receive the function selection mode signal. It should be noted that, in a single data operation, the function selection mode signals received by the first multiplication circuit 11, the second multiplication circuit 12 and the partial product exchange circuit 13 may be the same. Optionally, the function selection mode signal may take four different values, which correspond to four different modes of data operation that the data processor can handle: multiplication of N-bit * N-bit data, multiply-accumulate of N-bit * N-bit data, multiplication of 2N-bit * 2N-bit data, and multiply-accumulate of 2N-bit * N-bit data. For example, if the first data includes two 2N-bit subdata and the second data includes two 2N-bit subdata, the data processor can determine the specific mode of data operation it currently handles according to the function selection mode signal received. The four function selection mode signals may be expressed as the binary values 00, 01, 10 and 11, or in other representations: mode=00 may indicate that the data processor currently handles multiplication of N-bit * N-bit data, mode=01 multiply-accumulate of N-bit * N-bit data, mode=10 multiplication of 2N-bit * 2N-bit data, and mode=11 multiply-accumulate of 2N-bit * N-bit data. It can also be understood that the four function selection mode signals and the four modes of data operation may be placed in any one-to-one correspondence, and this embodiment places no limitation on this.
In addition, when the data processor handles a multiply-accumulate operation on 2N-bit * N-bit data, the partial product exchange circuit 13 may, as actually required, exchange the sign-extended first low-order partial products or the sign-extended first high-order partial products obtained by the first correction encoding sub-circuit 111 with the sign-extended second low-order partial products or the sign-extended second high-order partial products obtained by the second correction encoding sub-circuit 121. It can also be understood that, when the data processor handles the other three modes of data operation, the partial product exchange circuit 13 is idle, and the sign-extended low-order partial products and the sign-extended high-order partial products are not exchanged. Meanwhile, the two subdata included in the first data each have a bit width of 2N, and the two subdata included in the second data also each have a bit width of 2N. If the data processor currently handles one multiplication of N-bit * N-bit data, then, as actually required, one of the first data and the second data is 0, and in the two subdata of the other data either the high-order values are 0 or the low-order values are 0; in this case the first data and the second data can be operated on as the original data. If the data processor currently handles one multiplication of 2N-bit * 2N-bit data, then, as actually required, one of the first data and the second data is 0, and in the two subdata of the other data both the high-order values and the low-order values are non-zero. If the data processor currently handles two multiplications of 2N-bit * 2N-bit data, then, as actually required, neither the first data nor the second data is 0.
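The mode encoding and the exchange rule described above can be summarized in a small sketch. It uses the example binary assignment given in the text (which the text notes is only one of many possible one-to-one mappings); the enum `Mode` and the helper `exchange_active` are hypothetical names introduced for illustration.

```python
from enum import Enum


class Mode(Enum):
    """Function selection mode signal, using the example encoding from the text."""
    MUL_N_N = 0b00      # multiplication of N-bit * N-bit data
    MAC_N_N = 0b01      # multiply-accumulate of N-bit * N-bit data
    MUL_2N_2N = 0b10    # multiplication of 2N-bit * 2N-bit data
    MAC_2N_N = 0b11     # multiply-accumulate of 2N-bit * N-bit data


def exchange_active(mode: Mode) -> bool:
    """True when the partial product exchange circuit is in use.

    Per the text, it exchanges partial products only for the 2N-bit * N-bit
    multiply-accumulate mode and is idle in the other three modes.
    """
    return mode is Mode.MAC_2N_N


for signal in (0b00, 0b01, 0b10, 0b11):
    mode = Mode(signal)
    print(format(signal, "02b"), mode.name, exchange_active(mode))
```

Keeping the mapping in one place makes it easy to swap in a different one-to-one assignment without touching the logic that depends on the mode.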
In the data processor provided by this embodiment, the first correction encoding sub-circuit and the second correction encoding sub-circuit perform canonical signed number encoding on the data they receive, obtaining the sign-extended first partial products and the sign-extended second partial products respectively. According to the function selection mode signal received, the data processor determines whether the partial product exchange circuit needs to exchange the sign-extended first partial products with the sign-extended second partial products. If an exchange is needed, then after the exchange the first correction encoding sub-circuit and the second correction encoding sub-circuit each take the sign-extended partial products they currently hold as the partial products of the target code, yielding the first partial products of the target code and the second partial products of the target code. Finally, the first correction compression sub-circuit and the second correction compression sub-circuit compress the first partial products of the target code and the second partial products of the target code respectively to obtain the target operation results. The data processor can implement not only multiplication but also multiply-accumulate operations, which improves its versatility. In addition, the data processor does not need to perform a separate accumulation on the multiplication result to complete a multiply-accumulate operation; a multiply-accumulate or multiplication operation is carried out directly in a single operation pass, which reduces the power consumption of the data processor. Furthermore, because the data processor performs canonical signed number encoding on the received data, the number of effective partial products obtained is smaller, which reduces the complexity of implementing the multiplication or multiply-accumulate operation.
Fig. 2 is a schematic structural diagram of a data processor according to another embodiment. As shown in Fig. 2, the data processor includes a canonical signed number encoding circuit 21, a first partial product acquisition circuit 22, a second partial product acquisition circuit 23, a first compression circuit 24 and a second compression circuit 25. The canonical signed number encoding circuit 21 includes a canonical signed number encoding processing unit 211. The output terminal of the canonical signed number encoding processing unit 211 is connected to the first input terminal of the first partial product acquisition circuit 22 and to the first input terminal of the second partial product acquisition circuit 23. The output terminal of the first partial product acquisition circuit 22 is connected to the first input terminal of the first compression circuit 24, and the output terminal of the second partial product acquisition circuit 23 is connected to the first input terminal of the second compression circuit 25.
The canonical signed number encoding processing unit 211 is used to perform canonical signed number encoding on the received first data to obtain the target code. The first partial product acquisition circuit 22 is used to receive the second data and to obtain the first partial products of the target code from the target code and the second data. The second partial product acquisition circuit 23 is used to receive the second data and to obtain the second partial products of the target code from the target code and the second data. The first compression circuit 24 is used to accumulate the first partial products of the target code, and the second compression circuit 25 is used to accumulate the second partial products of the target code.
Specifically, the first data and the second data may each include two subdata. The two subdata in the first data may serve as the multipliers in a multiplication or multiply-accumulate operation, and the two subdata in the second data may serve as the multiplicands. Optionally, the bit width of a subdata may be 2N. The two subdata in the first data may be spliced into a whole and input to the canonical signed number encoding processing unit 211, or may be input to it separately and simultaneously. The two subdata in the second data may be spliced into a whole and input to the first partial product acquisition circuit 22 and the second partial product acquisition circuit 23, or may be input separately and simultaneously to the first partial product acquisition circuit 22 and to the second partial product acquisition circuit 23. Optionally, canonical signed number encoding of the two subdata in the first data yields a first target code and a second target code respectively, which are collectively referred to as the target code. Optionally, the bit width of the first target code may be equal to the bit width of the second target code, and may also be equal to the bit width of the multiplier currently being processed by the data processor plus 1. The number of first partial products of the target code may be equal to the bit width of the first target code, and the number of second partial products of the target code may be equal to the bit width of the second target code. Optionally, the first target code may include a first low-order target code and a first high-order target code, and the second target code may include a second low-order target code and a second high-order target code.
For example, suppose the first data includes data A and data B, and the second data includes data C and data D. If the data processor needs to multiply data A by data C and data B by data D, the canonical signed number encoding processing unit 211 may perform canonical signed number encoding on data A to obtain the first target code and on data B to obtain the second target code. The first target code (and/or the second target code) and data C (or the second data) may then be input to the first partial product acquisition circuit 22, and the second target code (and/or the first target code) and data D (or the second data) may be input to the second partial product acquisition circuit 23. Alternatively, the first target code (and/or the second target code) and data C (or the second data) may be input to the second partial product acquisition circuit 23, and the second target code (and/or the first target code) and data D (or the second data) may be input to the first partial product acquisition circuit 22. Meanwhile, if the first partial product acquisition circuit 22 and the second partial product acquisition circuit 23 receive the second data as two spliced subdata, each circuit may split the second data (i.e. the multiplicands) to obtain the subdata that needs to be multiplied and, as actually required, obtain the partial products from that subdata and the first target code or the second target code. The actual requirement here can be understood as the correspondence between the multiplicand currently to be processed by the data processor and its target code.
In addition, if the bit width of the first target code is equal to 2N, the first high-order target code may be the high-order N digits of the first target code, and the first low-order target code may be the low-order N digits of the first target code.
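The splicing and splitting of subdata, and the division of a target code into high-order and low-order halves, can be sketched as follows. The text does not say which subdata occupies the upper half of a spliced word, so placing the first argument in the upper half is an assumption; `splice`, `split`, `N` and `SUB_BITS` are hypothetical names used only for illustration.

```python
N = 4
SUB_BITS = 2 * N        # each subdata is 2N bits wide, so a spliced word is 4N bits


def splice(upper: int, lower: int, bits: int = SUB_BITS) -> int:
    """Concatenate two bits-wide subdata into one word (upper half first)."""
    mask = (1 << bits) - 1
    return ((upper & mask) << bits) | (lower & mask)


def split(word: int, bits: int = SUB_BITS) -> tuple[int, int]:
    """Recover the two bits-wide subdata from a spliced word."""
    mask = (1 << bits) - 1
    return (word >> bits) & mask, word & mask


second_data = splice(0xAB, 0xCD)          # stand-ins for data C and data D
data_c, data_d = split(second_data)
print(hex(second_data), hex(data_c), hex(data_d))

# A 2N-digit target code (most significant digit first) split into halves.
first_target_code = [0, 1, 0, -1, 0, 0, 1, 0]
high_code, low_code = first_target_code[:N], first_target_code[N:]
print(high_code, low_code)
```

Splitting recovers exactly the two subdata that were spliced, which is what lets a partial product acquisition circuit pick out the multiplicand it needs.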
It should be noted that first partial product, which obtains circuit 22, can receive canonical signed number volume in data processor The first object coding and multiplicand that code processing unit 211 inputs, obtain the first partial product of target code;Second partial product Obtaining circuit 23 can receive the second target code and multiplicand of the input of canonical signed number coding processing unit 211, obtain To the second partial product of target code.Optionally, the first partial product of above-mentioned target code may include the first of target code The first high-order portion product of low portion product and target code;The second partial product of above-mentioned target code may include that target is compiled The second low portion product of code and the second high-order portion product of target code.Optionally, the first low portion of target code Product can be the corresponding partial product of the first low level target code, and the first high-order portion product of target code can be the first high-order mesh Mark encodes corresponding partial product;Second low portion product of target code can be the corresponding part of the second low level target code Second high-order portion product of product, target code can be the corresponding partial product of the second high position target code.
Further, the first compressor circuit 24 in data processor can obtain circuit 22 to first partial product, obtain Target code first partial product (i.e. the first low portion of target code is long-pending and the first high-order portion product of target code) Carry out accumulation process;The second compressor circuit 25 in data processor can obtain circuit 23, obtained mesh to second partial product The second partial product (i.e. long-pending the second high-order portion product with target code of the second low portion of target code) for marking coding carries out Accumulation process, to obtain target operation result.In addition, in the present embodiment, the first data that data processor receives with And second in data, the bit wide for the subdata for including is 2N
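How partial products follow from a target code and a multiplicand, and how accumulating them yields the product, can be sketched as below. The rule that each digit selects 0, +X or -X shifted to the digit's position is the usual signed-digit multiplication scheme and is assumed here rather than taken from the text; `partial_products` is a hypothetical name, and plain Python integers stand in for the fixed-width hardware values.

```python
def partial_products(digits: list[int], multiplicand: int) -> list[int]:
    """One partial product per target-code digit (digits most significant first).

    Assumed rule: digit 0 gives 0, digit 1 gives +multiplicand, digit -1 gives
    -multiplicand, each shifted left by the digit's position.
    """
    products = []
    for position, digit in enumerate(reversed(digits)):
        products.append((digit * multiplicand) << position)
    return products


target_code = [1, 0, 0, -1]        # signed-digit form of 7, i.e. 8 - 1
multiplicand = 5
products = partial_products(target_code, multiplicand)
effective = [p for p in products if p != 0]

print(products)                    # one entry per digit of the target code
print(sum(products))               # accumulation gives 7 * 5
print(len(effective))              # non-zero partial products
```

The plain binary form of 7 is 0111, which would give three non-zero partial products; the signed-digit form gives two, illustrating why this encoding reduces the number of effective partial products that the compression circuit has to accumulate.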
Optionally, the canonical signed number encoding processing unit 211 includes a first input terminal for receiving a function selection mode signal; the first partial product acquisition circuit 22 and the second partial product acquisition circuit 23 each include a second input terminal for receiving the function selection mode signal; and the first compression circuit 24 and the second compression circuit 25 each include a second input terminal for receiving the function selection mode signal. Optionally, the function selection mode signal is used to determine which of the different modes of data operation the data processor currently handles.
It can be understood that the function selection mode signal (mode) may take four different values, which correspond to four different modes of data operation that the data processor can handle. Optionally, in a single data operation, the function selection mode signals (mode) received by the canonical signed number encoding processing unit 211, the first partial product acquisition circuit 22, the second partial product acquisition circuit 23, the first compression circuit 24 and the second compression circuit 25 may be the same. The four function selection mode signals (mode) may be expressed in binary as mode=00, mode=01, mode=10 and mode=11, and the four modes of data operation may include multiplication of N-bit * N-bit data, multiply-accumulate of N-bit * N-bit data, multiplication of 2N-bit * 2N-bit data, and multiply-accumulate of 2N-bit * N-bit data. According to the function selection mode signal received, the first partial product acquisition circuit 22 and the second partial product acquisition circuit 23 may control whether they receive the first target code, the second target code, or both the first target code and the second target code from the canonical signed number encoding processing unit 211 for subsequent operation.
In this embodiment, the canonical signed number encoding processing unit 211 may receive the multiplier of the operation and perform canonical signed number encoding on it to obtain the target code. It should be noted that the canonical signed number encoding method can be characterized as follows. An N-bit multiplier is processed from its low-order bits to its high-order bits. If there is a run of l (l >= 2) consecutive bits of value 1, the run is converted into the data "1(0)^(l-1)(-1)", that is, a 1 followed by (l - 1) zeros and a -1, and the remaining (N - l) bits are combined with the (l + 1) converted digits to form a new data. The new data is then used as the initial data of the next conversion stage, and the process is repeated until the new data contains no run of l (l >= 2) consecutive bits of value 1. The target code obtained by canonical signed number encoding of an N-bit multiplier may have a bit width of (N + 1). Further, in canonical signed number encoding, the data 11 can be converted into (100 - 001), that is, 11 is equivalent to 10(-1); the data 111 can be converted into (1000 - 0001), that is, 111 is equivalent to 100(-1); and other runs of l (l >= 2) consecutive bits of value 1 are converted in a similar way.
For example, suppose the multiplier received by the canonical signed number encoding processing unit 211 is "001010101101110". The first conversion stage yields the first new data "0010101011100(-1)0"; the second conversion stage applied to the first new data yields the second new data "0010101100(-1)00(-1)0"; the third conversion stage yields the third new data "0010110(-1)00(-1)00(-1)0"; the fourth conversion stage yields the fourth new data "00110(-1)0(-1)00(-1)00(-1)0"; and the fifth conversion stage yields the fifth new data "010(-1)0(-1)0(-1)00(-1)00(-1)0". The fifth new data contains no run of l (l >= 2) consecutive bits of value 1, so it may be called the initial code. The intermediate code is obtained by applying bit-padding processing to the initial code, which marks the completion of the canonical signed number encoding. The bit width of the initial code may be equal to the bit width of the multiplier. Optionally, after the canonical signed number encoding processing unit 211 performs canonical signed number encoding on the multiplier to obtain the new data (i.e. the initial code), if the highest and second-highest digits of the new data are "10" or "01", the canonical signed number encoding processing unit 211 may pad one digit of value 0 at the position one bit above the highest digit, so that the three highest digits of the corresponding intermediate code are "010" or "001" respectively. Optionally, the bit width of the intermediate code may be equal to the bit width of the data currently being processed by the data processor plus 1.
In addition, if the data received by the data processor is 2N bits wide and the data processor currently handles N-bit data operations, the CSD encoding processing unit 211 can split the 2N-bit data into two groups of N-bit data and process each group separately; the two resulting (N+1)-digit intermediate codes are then combined and used as the target code. If the data processor currently handles 2N-bit data operations, the CSD encoding processing unit 211 can pad a 0 one position above the most significant digit of the resulting (2N+1)-digit intermediate code (the padding process) and use the padded (2N+2)-digit data as the target code.
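The two cases above both end with a (2N+2)-digit target code, which the following Python sketch makes explicit. It is an illustration under stated assumptions: the names `signed_digits`, `target_code` and `digits_value` and the boolean `wide_mode` flag are hypothetical, the data is treated as unsigned, and the signed digits are computed arithmetically (non-adjacent form) rather than by run replacement.

```python
def signed_digits(value: int, width: int) -> list[int]:
    """Non-adjacent signed-digit form of an unsigned value, MSB first,
    padded with zeros to exactly `width` digits."""
    digits = []  # LSB first while building
    while value > 0:
        if value & 1:
            d = 2 - (value & 3)  # +1 if value % 4 == 1, -1 if value % 4 == 3
            value -= d
        else:
            d = 0
        digits.append(d)
        value >>= 1
    if len(digits) > width:
        raise ValueError("value does not fit in the requested width")
    digits += [0] * (width - len(digits))
    return digits[::-1]


def target_code(data: int, n: int, wide_mode: bool) -> list[int]:
    """Assemble a (2N+2)-digit target code from unsigned 2N-bit data.

    wide_mode=True : one 2N-bit operand -> (2N+1)-digit intermediate code
                     plus one padded 0 on top.
    wide_mode=False: two N-bit operands -> two (N+1)-digit intermediate
                     codes, high half first.
    """
    if wide_mode:
        return [0] + signed_digits(data, 2 * n + 1)
    high = data >> n
    low = data & ((1 << n) - 1)
    return signed_digits(high, n + 1) + signed_digits(low, n + 1)


def digits_value(digits: list[int]) -> int:
    """Evaluate MSB-first signed digits as an integer."""
    total = 0
    for d in digits:
        total = 2 * total + d
    return total


narrow = target_code(0xABCD, 8, wide_mode=False)
wide = target_code(0xABCD, 8, wide_mode=True)
print(len(narrow), len(wide))
```

In the split case, the high (N+1) digits and the low (N+1) digits each encode one N-bit operand; in the wide case, all digits together encode the single 2N-bit operand.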
In the data processor provided by this embodiment, the CSD encoding processing unit performs CSD encoding on the received first data to obtain the target code. The first partial product obtaining circuit obtains the first partial products of the target code according to the received second data and the target code, the second partial product obtaining circuit obtains the second partial products of the target code according to the received second data and the target code, and the first compression circuit and the second compression circuit each accumulate their partial products to obtain the target operation result. Because the data processor performs CSD encoding on the received data, fewer effective (nonzero) partial products are produced, which reduces the complexity of the multiplication or multiply-accumulate operation. The data processor can perform both multiplication and multiply-accumulate operations, which improves its versatility. In addition, the data processor does not need a separate accumulation pass over the multiplication result to complete a multiply-accumulate operation: a single operation pass directly performs either the multiply-accumulate or the multiplication, which reduces the power consumption of the data processor.
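The claim that signed-digit encoding yields fewer effective partial products can be checked with a small Python sketch. It is illustrative only: the names `csd_digits` and `multiply_via_csd` are hypothetical, operands are assumed to be non-negative integers, and plain integer addition stands in for the compression circuits.

```python
def csd_digits(value: int) -> list[int]:
    """Signed digits (LSB first) of a non-negative integer in non-adjacent form."""
    digits = []
    while value > 0:
        if value & 1:
            d = 2 - (value & 3)  # +1 if value % 4 == 1, -1 if value % 4 == 3
            value -= d
        else:
            d = 0
        digits.append(d)
        value >>= 1
    return digits


def multiply_via_csd(multiplicand: int, multiplier: int) -> tuple[int, int]:
    """Return (product, number of nonzero partial products)."""
    partial_products = [
        (d * multiplicand) << i          # digit -1 -> -X, 1 -> X, shifted to weight 2**i
        for i, d in enumerate(csd_digits(multiplier))
        if d != 0                        # zero digits contribute nothing
    ]
    return sum(partial_products), len(partial_products)


product, count = multiply_via_csd(1234, 0b001010101101110)
print(product == 1234 * 0b001010101101110, count, bin(0b001010101101110).count("1"))
```

For the multiplier used in the earlier example, the signed-digit form has 6 nonzero digits whereas the plain binary form has 8 ones, so two fewer partial products need to be accumulated.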
Fig. 3 is a schematic diagram of the specific structure of a data processor provided by another embodiment. The first correction encoding sub-circuit 111 in the data processor includes a first correction encoding processing branch 1111 and a first partial product selection branch 1112; the output of the first correction encoding processing branch 1111 is connected to the input of the first partial product selection branch 1112.
The first correction encoding processing branch 1111 performs CSD encoding on the received first data to obtain the first target code. The first partial product selection branch 1112 obtains the sign-extended first partial products according to the first target code, selects among the sign-extended first partial products, and receives the sign-extended second partial products output by the partial product exchange circuit 13; the received sign-extended second partial products, together with the selected sign-extended first partial products, serve as the first partial products of the target code.
Specifically, the first correction encoding sub-circuit 111 can perform CSD encoding on the multiplier in the received first data to obtain the first target code, and can obtain the sign-extended first partial products according to the multiplicand in the first data and the first target code. Optionally, the width of the first target code can equal the width of the multiplier plus 1, and the width of each sign-extended first partial product can equal twice the width of the multiplicand currently processed by the data processor. Optionally, the number of sign-extended first partial products can equal the number of first partial products of the target code, and can also equal the width of the first target code.
For example, suppose the data processor receives two 16-bit data. If the data processor currently handles 8-bit * 8-bit multiplication, the first correction encoding sub-circuit 111 can split each 16-bit datum into a high 8-bit group and a low 8-bit group and process the two groups separately. In this case the width of each sign-extended first partial product can equal 16; processing the high 8 bits yields 9 sign-extended first high-order partial products, and processing the low 8 bits yields 9 sign-extended first low-order partial products. If the data processor currently handles 16-bit * 16-bit multiplication, the first correction encoding sub-circuit 111 can process the two complete 16-bit data. In this case the width of each sign-extended first partial product can equal 32, and 18 sign-extended first partial products are obtained. The sign-extended partial products corresponding to the high 9 digits of the first target code can be called the sign-extended first high-order partial products; the sign-extended partial products corresponding to the low 9 digits of the first target code can be called the sign-extended first low-order partial products.
Optionally, the second correction encoding sub-circuit 121 includes a second correction encoding processing branch 1211 and a second partial product selection branch 1212; the output of the second correction encoding processing branch 1211 is connected to the input of the second partial product selection branch 1212. The second correction encoding processing branch 1211 performs CSD encoding on the received second data to obtain the second target code. The second partial product selection branch 1212 obtains the sign-extended second partial products according to the second target code, selects among the sign-extended second partial products, and receives the sign-extended first partial products output by the partial product exchange circuit 13; the received sign-extended first partial products, together with the selected sign-extended second partial products, serve as the second partial products of the target code.
It should be noted that when the data processor performs a multiply-accumulate operation on 2N-bit * N-bit data, the partial product exchange circuit 13 in the data processor can, as required, exchange the sign-extended first low-order partial products or the sign-extended first high-order partial products obtained by the first correction encoding processing branch 1111 with the sign-extended second low-order partial products or the sign-extended second high-order partial products obtained by the second correction encoding sub-circuit 121. Optionally, after the partial product exchange circuit 13 completes the exchange, the first correction encoding processing branch 1111 can combine its sign-extended first partial products that were not exchanged with the received sign-extended second partial products, and use the combination as the first partial products of the target code; the second correction encoding processing branch 1211 can combine its sign-extended second partial products that were not exchanged with the received sign-extended first partial products, and use the combination as the second partial products of the target code.
In this embodiment, the method by which the first correction encoding processing branch 1111 processes data is essentially the same as that of the second correction encoding processing branch 1211; the method by which the second correction encoding processing branch 1211 processes data is therefore not described again.
In the data processor provided by this embodiment, the first correction encoding processing branch performs CSD encoding on the received first data to obtain the sign-extended first partial products; according to the data mode currently handled by the data processor, the first partial product selection branch selects among the sign-extended first partial products to obtain the first partial products of the target code; and the first correction compression sub-circuit accumulates the first partial products of the target code to obtain the target operation result. The data processor does not need a separate accumulation pass over the multiplication result to complete a multiply-accumulate operation: a single operation pass directly performs either the multiply-accumulate or the multiplication, which reduces the power consumption of the data processor. Moreover, because the data processor performs CSD encoding on the received data, fewer effective partial products are produced, which reduces the complexity of the multiplication or multiply-accumulate operation.
In one embodiment, the first correction encoding processing branch 1111 in the data processor includes a first correction encoding unit 1111a, a low-order partial product obtaining unit 1111b, a low-order selector group unit 1111c, a high-order partial product obtaining unit 1111d and a high-order selector group unit 1111e. The first output of the first correction encoding unit 1111a is connected to the first input of the low-order partial product obtaining unit 1111b; the output of the low-order selector group unit 1111c is connected to the second input of the low-order partial product obtaining unit 1111b; the second output of the first correction encoding unit 1111a is connected to the first input of the high-order partial product obtaining unit 1111d; and the output of the high-order selector group unit 1111e is connected to the second input of the high-order partial product obtaining unit 1111d.
The first correction encoding unit 1111a performs CSD encoding on the received first data, determines from the received function selection mode signal the data width that the data processor can handle, and obtains the first target code according to that width. The low-order partial product obtaining unit 1111b obtains the sign-extended first low-order partial products according to the received first low-order target code of the first target code and the first data. The low-order selector group unit 1111c gates values in the sign-extended first low-order partial products. The high-order partial product obtaining unit 1111d obtains the sign-extended first high-order partial products according to the received first high-order target code of the first target code and the first data. The high-order selector group unit 1111e gates values in the sign-extended first high-order partial products.
Specifically, the first correction encoding processing branch 1111 can receive the multiplier in the first data and perform CSD encoding on it to obtain the first target code. The low-order partial product obtaining unit 1111b can obtain the sign-extended low-order partial products according to the multiplicand in the received first data and the first target code obtained by the first correction encoding unit 1111a; the high-order partial product obtaining unit 1111d can obtain the sign-extended high-order partial products according to the multiplicand in the received first data and the first target code obtained by the first correction encoding unit 1111a. The first data can include the multiplier and the multiplicand of a multiplication or multiply-accumulate operation. If the data processor currently handles N-bit data and the two data received by the first correction encoding unit 1111a are each 2N bits wide, the first correction encoding unit 1111a can automatically split the received 2N-bit data into high N-bit data and low N-bit data, and then perform CSD encoding on the high N-bit data and the low N-bit data separately. The width of the resulting first high-order target code equals N plus 1, and the width of the resulting first low-order target code also equals N plus 1; the number of first high-order partial products and the number of first low-order partial products of the target code can each equal (N+1). If the data processor currently handles 2N-bit data and the two data received by the first correction encoding processing branch 1111 are each 2N bits wide, the first correction encoding processing branch 1111 can perform CSD encoding on the received 2N-bit data to obtain a (2N+1)-digit intermediate code, and pad the intermediate code to obtain (2N+2)-digit data, which is used as the first target code. Here the padding process means padding a 0 one position above the most significant digit of the data. In this case, the high (N+1) digits of the first target code can be called the first high-order target code, and the low (N+1) digits of the first target code can be called the first low-order target code. Optionally, the most significant digit of the first target code is the 0 obtained by padding, and all values in the corresponding partial product of the target code can be 0.
It should be noted that, according to the received function selection mode signal, the low-order selector group unit 1111c can gate whether certain bits of the sign-extended first low-order partial product take the values of the sign-extended first low-order partial product obtained for an N-bit multiplication or those obtained for a 2N-bit multiplication. Similarly, according to the received function selection mode signal, the high-order selector group unit 1111e can gate whether certain bits of the sign-extended first high-order partial product take the values of the sign-extended first high-order partial product obtained for an N-bit multiplication or those obtained for a 2N-bit multiplication.
It can be understood that if the data received by the data processor is 2N bits wide and the data processor currently handles 2N-bit data operations, the low-order partial product obtaining unit 1111b can obtain the corresponding sign-extended low-order partial product from each digit of the first low-order target code; the low-order selector group unit 1111c can gate values in the sign-extended first low-order partial product; the sign-extended low-order partial product is then combined with the gated values to obtain the sign-extended first low-order partial product. Optionally, the high-order partial product obtaining unit 1111d can obtain the corresponding sign-extended high-order partial product from each digit of the first high-order target code; the high-order selector group unit 1111e can gate values in the sign-extended first high-order partial product; the sign-extended high-order partial product is then combined with the gated values to obtain the sign-extended first high-order partial product. Optionally, in the CSD encoding process, the width of the first low-order target code can equal the width of the first high-order target code, and can also equal the number of sign-extended first low-order partial products corresponding to the low N-bit data, or the number of sign-extended first high-order partial products corresponding to the high N-bit data. Optionally, the first correction encoding processing branch 1111 can include (N+1) low-order partial product obtaining units 1111b and (N+1) high-order partial product obtaining units 1111d. Optionally, each low-order partial product obtaining unit 1111b can include 4N value generation subunits, each high-order partial product obtaining unit 1111d can also include 4N value generation subunits, and each value generation subunit can produce one bit of the sign-extended first low-order partial product. The low-order partial product obtaining unit 1111b can determine the first low-order partial product of the target code from the obtained sign-extended first low-order partial product, and the high-order partial product obtaining unit 1111d can determine the first high-order partial product of the target code from the obtained sign-extended first high-order partial product.
In addition, the second correction encoding processing branch 1211 performs CSD encoding in the same way as the first correction encoding processing branch 1111, and the two branches have the same internal structure and the same external port functions. The method and structure by which the second correction encoding processing branch 1211 processes data are therefore not described again in this embodiment.
A kind of data processor provided in this embodiment, data processor pass through the in the first amendment coded treatment branch One amendment coding unit carries out canonical signed number coded treatment to the data that receive, obtains the first low level target code and the One high-order target code, and low portion product acquiring unit obtained according to the first low level target code it is low after symbol Bits Expanding Bit position product, high-order portion product acquiring unit obtain the high-order portion product after symbol Bits Expanding according to the first high-order target code, And then it determines the need for handing over the low portion product after symbol Bits Expanding and the high-order portion product after symbol Bits Expanding Processing is changed, to obtain the partial product of target code, and accumulation process is carried out to the partial product of target code, obtains target operation knot Fruit;The data processor can not only realize multiplying, additionally it is possible to which realization multiplies accumulating operation, to improve data processor Versatility;Meanwhile the data processor can also carry out canonical signed number coded treatment to the data received, obtain The number of live part product is less, realizes multiplying to reduce data processor or multiplies accumulating the complexity of operation.
In one embodiment, the first correction encoding unit 1111a in the data processor includes a first data input port 1111aa, a first mode selection signal input port 1111ab, a low-order target code output port 1111ac and a high-order target code output port 1111ad. The first data input port 1111aa receives the first data. The first mode selection signal input port 1111ab receives the function selection mode signal. The low-order target code output port 1111ac outputs the first low-order target code obtained by performing CSD encoding on the first data. The high-order target code output port 1111ad outputs the first high-order target code obtained by performing CSD encoding on the first data.
Specifically, during a multiplication, the first correction encoding unit 1111a can receive the first data through the first data input port 1111aa and the function selection mode signal through the first mode selection signal input port 1111ab. It performs CSD encoding on the multiplier in the first data to obtain the intermediate code, determines from the received function selection mode signal whether the intermediate code needs to be padded, and thereby obtains the first target code. It then outputs the first low-order target code of the first target code through the low-order target code output port 1111ac, and the first high-order target code of the first target code through the high-order target code output port 1111ad.
The data processor provided by this embodiment can perform CSD encoding on the received data and thus reduce the number of effective partial products produced during multiplication. This reduces the complexity of the multiplication, improves its efficiency, and effectively reduces the power consumption of the data processor.
In one embodiment, the low-order partial product obtaining unit 1111b in the data processor includes a low-order target code input port 1111ba, a gated value input port 1111bb, a first data input port 1111bc and a low-order partial product output port 1111bd. The low-order target code input port 1111ba receives the first low-order target code output by the first correction encoding unit 1111a. The gated value input port 1111bb receives the values of the sign-extended first low-order partial product obtained after gating by the low-order selector group unit 1111c. The first data input port 1111bc receives the first data. The low-order partial product output port 1111bd outputs the sign-extended first low-order partial product.
Specifically, the low-order partial product obtaining unit 1111b can receive, through the low-order target code input port 1111ba, the first low-order target code output by the first correction encoding unit 1111a, and can receive the multiplicand in the first data through the first data input port 1111bc. Optionally, the low-order partial product obtaining unit 1111b can obtain the corresponding sign-extended first low-order partial products according to the received first low-order target code and the received multiplicand of the multiplication or multiply-accumulate operation. Optionally, if the multiplicand received at the first data input port 1111bc is N bits wide, the width of each sign-extended first low-order partial product obtained by the low-order partial product obtaining unit 1111b can equal 2N. For example, if the low-order partial product obtaining unit 1111b receives an N-bit multiplicand X, it can obtain the corresponding original partial product from the multiplicand X and the three values that can occur in the first low-order target code, namely -1, 1 and 0, and then obtain the sign-extended low-order partial product from the original partial product. The low (N+1) bits of the sign-extended low-order partial product can equal all the bits of the original partial product, and each of its high (N-1) bits can equal the sign bit (the most significant bit) of the original partial product. When the digit of the first low-order target code is -1, the original partial product can be -X; when the digit is 1, the original partial product can be X; when the digit is 0, the original partial product can be 0.
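The mapping from a target code digit to a sign-extended partial product can be sketched as follows. This is a hedged illustration, not the circuit of unit 1111b: the name `sign_extended_partial_product` is hypothetical, and the multiplicand X is assumed to be a signed N-bit two's complement value given as a Python integer.

```python
def sign_extended_partial_product(x: int, digit: int, n: int) -> int:
    """Return the 2N-bit pattern of the partial product for one code digit.

    x     : signed N-bit multiplicand (assumed two's complement range)
    digit : one digit of the target code, from {-1, 0, 1}
    n     : operand width N
    """
    if digit not in (-1, 0, 1):
        raise ValueError("digit must be -1, 0 or 1")
    if not -(1 << (n - 1)) <= x < (1 << (n - 1)):
        raise ValueError("x does not fit in N signed bits")
    # Original partial product: -X, X or 0, held in N+1 bits so that -X
    # is representable even for the most negative X.
    original = (digit * x) & ((1 << (n + 1)) - 1)
    sign = (original >> n) & 1
    # The high (N-1) bits all repeat the sign bit of the original partial product.
    high = ((1 << (n - 1)) - 1) if sign else 0
    return (high << (n + 1)) | original


print(hex(sign_extended_partial_product(5, -1, 8)))
print(hex(sign_extended_partial_product(-128, -1, 8)))
```

With N = 8, X = 5 and digit -1, the low 9 bits hold -5 and the high 7 bits are all 1, giving the 16-bit pattern of -5. The (N+1)-bit original partial product is what makes the case X = -128, digit -1 representable.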
It should be noted that the low-order partial product obtaining unit 1111b can receive, through the gated value input port 1111bb, the corresponding bit values of the sign-extended first low-order partial product that the low-order selector group unit 1111c gates for the data operations of the different modes. The sign-extended low-order partial product currently obtained by the low-order partial product obtaining unit 1111b is then combined with the gated bit values to obtain the sign-extended first low-order partial product.
Optionally, the high-order partial product obtaining unit 1111d in the data processor includes a high-order target code input port 1111da, a gated value input port 1111db, a data input port 1111dc and a high-order partial product output port 1111dd. The high-order target code input port 1111da receives the first high-order target code output by the first correction encoding unit 1111a. The gated value input port 1111db receives the values of the sign-extended first high-order partial product output after gating by the high-order selector group unit 1111e. The data input port 1111dc receives the first data. The high-order partial product output port 1111dd outputs the sign-extended first high-order partial product.
It can be understood that the high-order partial product obtaining unit 1111d obtains the sign-extended first high-order partial product in the same way that the low-order partial product obtaining unit 1111b obtains the sign-extended first low-order partial product, so the method is not described again in this embodiment. In addition, the low-order partial product obtaining unit 1111b and the high-order partial product obtaining unit 1111d can have the same internal circuit structure and the same external port functions, so the specific structure of the high-order partial product obtaining unit 1111d is not described again in this embodiment.
In the data processor provided by this embodiment, the low-order partial product obtaining unit can obtain the sign-extended low-order partial product according to the first low-order target code and combine it with the values gated by the low-order selector group unit to obtain the sign-extended first low-order partial product. The data processor then determines whether the sign-extended first low-order partial products and the sign-extended first high-order partial products need to be exchanged, obtains the partial products of the target code, and accumulates them to obtain the operation results of the different modes. The data processor can thus handle data operations of different modes, which improves its versatility. Moreover, after the data processor performs CSD encoding on the received data, fewer effective partial products are produced, which reduces the complexity of the multiplication.
In one embodiment, the low-order selector group unit 1111c in the data processor includes multiple low-order selectors 1111ca, which gate the values in the sign-extended first low-order partial products.
Specifically, the number of low-order selectors 1111ca in the low-order selector group unit 1111c can equal 3N*(N+1), where 2N can denote the width of the data currently processed by the data processor, and every low-order selector 1111ca in the low-order selector group unit 1111c can have the same internal circuit structure. Optionally, in a multiplication or multiply-accumulate operation, the first correction encoding unit 1111a is connected to the corresponding (N+1) low-order partial product obtaining units 1111b, and each low-order partial product obtaining unit 1111b can include 4N value generation subunits. Of these, 2N value generation subunits can be connected to 2N low-order selectors 1111ca, each of the 2N value generation subunits being connected to one low-order selector 1111ca. Optionally, the 2N value generation subunits connected to these 2N low-order selectors 1111ca can be the value generation subunits corresponding to the high 2N bits of the sign-extended first low-order partial product; besides the function selection mode signal input port (mode), each of these 2N low-order selectors 1111ca has two other input ports. Optionally, if the data processor can handle data operations of four different modes and the data it receives is 2N bits wide, the signals received by the two other input ports of a low-order selector 1111ca can be, respectively, the value 0 and the sign bit of the corresponding sign-extended first low-order partial product obtained by the low-order partial product obtaining unit 1111b when the data processor performs a 2N-bit data operation. The (N+1) low-order partial product obtaining units 1111b can be connected to (N+1) groups of 2N low-order selectors 1111ca. The sign bit values received by different groups of 2N low-order selectors 1111ca can be the same or different, but the 2N low-order selectors 1111ca of one group receive the same sign bit value, which can be obtained from the sign bit of the sign-extended first low-order partial product produced by the low-order partial product obtaining unit 1111b connected to that group.
In addition, of the 4N value generation subunits included in each low-order partial product obtaining unit 1111b, N value generation subunits need not be connected to a low-order selector 1111ca. The values produced by these N value generation subunits can be the corresponding bit values of the sign-extended first low-order partial product obtained from the digits of the first low-order target code for the data width currently processed by the data processor. Equivalently, counting from the least significant bit (bit 1) toward the most significant bit of the corresponding sign-extended first low-order partial product, the values produced by these N value generation subunits can correspond to all the bits from bit 1 to bit N.
It should be noted that the remaining N value generation subunits among the 4N value generation subunits in each low-order partial product acquisition unit 1111b may also be connected to N low-order selectors 1111ca, with each value generation subunit connected to one low-order selector 1111ca. Besides the function selection mode signal input port (mode), each of these N low-order selectors 1111ca has two other external input ports. The signals these two ports can receive are, respectively, the sign bit value in the sign-extended first low-order partial product obtained when the data processor performs a 2N-bit data operation, and the corresponding bit value in the sign-extended low-order partial product obtained when the data processor performs a 2N-bit data operation. Here, the (N+1) low-order partial product acquisition units 1111b may be connected to (N+1) groups of N low-order selectors 1111ca. The sign bit values received by different groups of N low-order selectors 1111ca may be the same or different, but the N low-order selectors 1111ca within one group all receive the same sign bit value, and that value can be obtained from the sign bit value in the sign-extended first low-order partial product produced by the low-order partial product acquisition unit 1111b connected to that group.
In addition, the corresponding bit values in the sign-extended first low-order partial product that each group of N low-order selectors 1111ca receives can be determined from the corresponding bit values in the sign-extended first low-order partial product obtained by the low-order partial product acquisition unit 1111b connected to that group; within each group of N low-order selectors 1111ca, the corresponding bit values received by the individual selectors may be the same or different. The positions of the 4N value generation subunits in each low-order partial product acquisition unit 1111b may be those of the 4N value generation subunits in the previous low-order partial product acquisition unit 1111b, shifted left by one value generation subunit. Optionally, among the first low-order partial products of all target codes that take part in the subsequent operation, only the first low-order partial product of the first target code has a bit width equal to 4N, the bit width of the first sign-extended first low-order partial product; the bit width of the first low-order partial product of each remaining target code may be one less than that of the first partial product of the preceding target code, and the bit width of the first high-order partial product of the last target code may equal (2N-1).
Optionally, the high-order selector group unit 1111e includes high-order selectors 1111ea; the multiple high-order selectors 1111ea are used to gate the values in the sign-extended first high-order partial product.
It should be noted that the way the high-order selectors 1111ea gate values can be described as follows.
Optionally, the number of high-order selectors 1111ea in the high-order selector group unit 1111e may equal 3N*(N+1), where 2N may denote the bit width of the data currently handled by the data processor; every high-order selector 1111ea in the high-order selector group unit 1111e may have the same internal circuit structure. Optionally, for a multiplication or multiply-accumulate operation, the first modified encoding unit 1111a may be connected to (N+1) high-order partial product acquisition units. Each high-order partial product acquisition unit may include 4N value generation subunits, of which 2N value generation subunits may be connected to 2N high-order selectors 1111ea, with each value generation subunit connected to one high-order selector 1111ea. Optionally, the 2N value generation subunits that correspond to these 2N high-order selectors 1111ea may be the subunits for the low 2N bits of the high-order partial product of the target code; besides the function selection mode signal input port (mode), each of these 2N high-order selectors 1111ea has two other external input ports. Optionally, if the data processor can handle data operations in four different modes and the bit width of the data it receives is 2N, the signals received by the two other input ports of a high-order selector 1111ea may be, respectively, the value 0 and the corresponding bit value in the sign-extended partial product obtained by the high-order partial product acquisition unit when the data processor performs a 2N-bit-wide data operation. Here, the (N+1) high-order partial product acquisition units may be connected to (N+1) groups of 2N high-order selectors 1111ea, and the corresponding bit values received by different groups of 2N high-order selectors 1111ea may be the same or different.
In addition, of the 4N value generation subunits in each high-order partial product acquisition unit, N corresponding value generation subunits may be connected to N high-order selectors 1111ea, with each value generation subunit connected to one high-order selector 1111ea. These N high-order selectors 1111ea may have the same internal circuit structure as the selectors described above, and besides the function selection mode signal input port (mode), each of them has two other external input ports. The signals received by these two ports may be, respectively, the sign bit value in the sign-extended partial product obtained when the data processor performs a 2N-bit data operation, and the sign bit value in the sign-extended partial product obtained when the data processor performs a 2N-bit data operation. Here, the (N+1) high-order partial product acquisition units may be connected to (N+1) groups of N high-order selectors 1111ea. The sign bit values received by different groups of N high-order selectors 1111ea may be the same or different, but the N high-order selectors 1111ea within one group all receive the same sign bit value, and that value can be obtained from the sign bit value in the sign-extended partial product produced by the high-order partial product acquisition unit connected to that group. In addition, the corresponding bit values in the sign-extended partial product that each group of N high-order selectors 1111ea receives can be determined from the sign bit value in the sign-extended partial product obtained by the high-order partial product acquisition unit connected to that group; within each group of N high-order selectors 1111ea, the corresponding bit values received by the individual selectors may be the same or different.
It should be noted that the remaining N value generation subunits among the 4N value generation subunits in each high-order partial product acquisition unit may be left unconnected to any high-order selector 1111ea. In that case, the values obtained by these N value generation subunits may be the corresponding bit values in the sign-extended partial product, obtained from the values in the high-order target code for the data of different bit widths currently handled by the data processor. Put differently, the values obtained by these N value generation subunits may correspond to bits (2N+1) through 3N of the sign-extended high-order partial product, counting from the lowest bit (bit 1) toward the highest bit. The positions of the 4N value generation subunits in each high-order partial product acquisition unit may be those of the 4N value generation subunits in the previous high-order partial product acquisition unit, shifted left by one value generation subunit. Optionally, among the high-order partial products of all target codes that take part in the subsequent operation, only the high-order partial product of the first target code has a bit width equal to 4N; the bit width of the high-order partial product of each remaining target code may be one less than that of the preceding target code, and the bit width of the high-order partial product of the last target code may equal (2N-1).
In the data processor provided by this embodiment, the low-order selector group unit can gate the values in the low-order partial product to obtain the sign-extended first low-order partial product. The first partial product of the target code is then obtained from the sign-extended first low-order partial product, and the compression circuit accumulates the first partial products of the target codes to obtain the target operation result for each mode. The data processor can therefore handle data operations in different modes, which improves its versatility.
In one embodiment, the data processor includes a first partial product selection branch 1112. The first partial product selection branch 1112 includes a function selection mode signal input port (mode) 1112a, a first partial product input port 1112b, a second partial product input port 1112c, a first partial product output port 1112d and a gated partial product output port 1112e. The function selection mode signal input port (mode) 1112a receives the function selection mode signal; the first partial product input port 1112b receives the sign-extended first partial product input by the first modified encoding sub-circuit 111; the second partial product input port 1112c receives the sign-extended second partial product exchanged by the partial product exchange circuit 13; the first partial product output port 1112d outputs the sign-extended first partial product that the partial product exchange circuit 13 needs to exchange; and the gated partial product output port 1112e outputs the gated sign-extended first partial product together with the received sign-extended second partial product.
Specifically, if the data processor currently handles a multiply-accumulate operation on 2N-bit * N-bit data, the partial product exchange circuit 13 in the data processor can exchange the sign-extended second low-order partial product with the sign-extended first low-order partial product, or it can exchange the sign-extended second high-order partial product with the sign-extended first high-order partial product. In that case, the first partial product selection branch 1112 can receive, through the second partial product input port 1112c, the sign-extended second partial product exchanged by the partial product exchange circuit 13, and it outputs the sign-extended first partial product that needs to be exchanged to the partial product exchange circuit 13 through the first partial product output port 1112d. The gated partial product output port 1112e of the first partial product selection branch 1112 can output the sign-extended first partial product that does not need to be exchanged, as well as the received sign-extended second partial product. The first partial product selection branch 1112 then takes the sign-extended first partial product that does not need to be exchanged, and/or the received sign-extended second partial product, as the first partial product of the target code and inputs it to the first modified compression sub-circuit 112 for compression.
In the data processor provided by this embodiment, the first partial product selection branch can select among the sign-extended first partial products to obtain the first partial product of the target code. The data processor can therefore perform not only multiplication and multiply-accumulate operations on data of the same bit width, but also multiply-accumulate operations on data of different bit widths, which improves its versatility.
In one embodiment, the data processor includes a first modified compression sub-circuit 112. The first modified compression sub-circuit 112 includes a modified Wallace tree group unit 1121 and an accumulation unit 1122, and the output of the modified Wallace tree group unit 1121 is connected to the input of the accumulation unit 1122. During data operations in different modes, the modified Wallace tree group unit 1121 accumulates the values in each column of the obtained first partial products of the target codes to produce an accumulation result; the accumulation unit 1122 performs an addition on that accumulation result.
Specifically, the modified Wallace tree group unit 1121 can accumulate the values in each column of the first partial products of the target codes obtained by the first modified encoding sub-circuit 111, and the accumulation unit 1122 adds the two operation results produced by the modified Wallace tree group unit 1121 to obtain the target operation result. When the modified Wallace tree group unit 1121 performs the accumulation, the distribution of the first partial products of all target codes can be characterized as follows: the lowest-order value of the first partial product of the target code in each row lies one position to the right of the lowest-order value of the first partial product of the target code in the next row, while the highest-order value of the first partial product of every target code lies in the same column as the highest-order value of the first partial product of the first target code. Optionally, the modified Wallace tree group unit 1121 can accumulate the values in each column of the first partial products of all target codes according to this distribution. Optionally, the two operation results produced by the modified Wallace tree group unit 1121 may include a sum output signal Sum and a carry output signal Carry.
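Illustratively, the column-wise accumulation into a Sum word and a Carry word, followed by one final addition, can be sketched in Python as below. This is a minimal behavioral sketch under stated assumptions, not the circuit of this embodiment: the partial product rows are plain unsigned shift-and-add rows rather than rows derived from the target code, the reduction uses generic 3:2 (full adder) compression, and the names `carry_save_reduce` and `multiply_via_rows` are hypothetical.

```python
def carry_save_reduce(rows, width):
    """Compress a list of partial-product rows into (sum_word, carry_word).

    Each pass replaces three rows with two: the per-column sum bits and the
    per-column carries moved one column to the left (3:2 compression).
    """
    mask = (1 << width) - 1
    rows = [r & mask for r in rows]
    while len(rows) > 2:
        a, b, c = rows[:3]
        s = a ^ b ^ c                              # sum bit of each column
        cy = ((a & b) | (a & c) | (b & c)) << 1    # carry of each column, shifted left
        rows = rows[3:] + [s & mask, cy & mask]
    while len(rows) < 2:
        rows.append(0)
    return rows[0], rows[1]


def multiply_via_rows(x, y, n_bits):
    """Unsigned n_bits x n_bits multiply: build rows, compress, then add once."""
    width = 2 * n_bits
    # Assumption: simple shift-and-add rows stand in for the encoded partial products.
    rows = [x << i for i in range(n_bits) if (y >> i) & 1]
    sum_word, carry_word = carry_save_reduce(rows, width)
    # The final addition plays the role of the accumulation unit.
    return (sum_word + carry_word) & ((1 << width) - 1)
```

In this sketch the loop stands in for the Wallace tree stage and the last line of `multiply_via_rows` stands in for the final adder; the point is only that all rows are reduced to two words before a single carry-propagating addition takes place.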
Illustratively, if the data processor currently handles a 16-bit * 16-bit fixed-point multiplication, the distribution of the first partial products of the 9 target codes obtained through the first partial product selection branch 1112 is as shown in Fig. 4a, where hollow circles denote the individual bit values in a partial product and solid circles denote the sign extension bit values in a partial product.
If the data processor has the circuit structure shown in Fig. 3 and currently handles a 16-bit * 8-bit fixed-point multiply-accumulate operation, the distribution of the first partial products of the target codes received by the first modified compression sub-circuit 112 or the second modified compression sub-circuit 122 is as shown in Fig. 4b. Here, hollow circles denote the partial products obtained by the first partial product selection branch 1112 or the second partial product selection branch 1212. Crossed hollow circles denote either the sign-extended second partial product that the first partial product selection branch 1112 obtains, through the partial product exchange circuit 13, from the second partial product selection branch 1212, or the sign-extended first partial product that the second partial product selection branch 1212 obtains, through the partial product exchange circuit 13, from the first partial product selection branch 1112.
In addition, the second modified compression sub-circuit 122 processes data in the same way as the first modified compression sub-circuit 112, and the two sub-circuits have the same internal structure and the same external output port functions; this embodiment therefore does not repeat the method and structure by which the second modified compression sub-circuit 122 processes data.
In the data processor provided by this embodiment, the first modified compression sub-circuit accumulates the first partial products of the target codes, and the accumulation unit adds the accumulation results to obtain the target operation result. The data processor can therefore handle data operations in different modes, which improves its versatility and effectively reduces the area it occupies on an AI chip.
In one embodiment, the data processor includes a modified Wallace tree group unit 1121. The modified Wallace tree group unit 1121 includes low-order Wallace tree subunits 1121a, a selector 1121b and high-order Wallace tree subunits 1121c; the output of the low-order Wallace tree subunit 1121a is connected to the input of the selector 1121b, and the output of the selector 1121b is connected to the input of the high-order Wallace tree subunit 1121c. The multiple low-order Wallace tree subunits 1121a accumulate the values in each column of the first partial products of the target codes to obtain the accumulation result; the selector 1121b gates the carry input signal received by the high-order Wallace tree subunit 1121c; and the multiple high-order Wallace tree subunits 1121c accumulate the values in each column of the first partial products of the target codes to obtain the accumulation result.
Specifically, each low-order Wallace tree subunit 1121a can be built from a combination of full adders and half adders, or from a combination of 4-2 compressors; each high-order Wallace tree subunit 1121c can likewise be built from a combination of full adders and half adders, or from a combination of 4-2 compressors. The low-order Wallace tree subunits 1121a and high-order Wallace tree subunits 1121c can each be understood as a circuit that takes a multi-bit input and adds its bits to produce two output signals. Optionally, the number of high-order Wallace tree subunits 1121c in the modified Wallace tree group unit 1121 may equal N, the bit width of the multiplicand in the multiplication or multiply-accumulate operation the data processor can currently handle, and may also equal the number of low-order Wallace tree subunits 1121a. Adjacent low-order Wallace tree subunits 1121a may be connected in series, and adjacent high-order Wallace tree subunits 1121c may also be connected in series. Optionally, the output of the last low-order Wallace tree subunit 1121a is connected to the input of the selector 1121b, and the output of the selector 1121b is connected to the input of the first high-order Wallace tree subunit 1121c. Optionally, in the modified Wallace tree group unit 1121, each low-order Wallace tree subunit 1121a can add the values in the corresponding column of the first partial products of all target codes, and each low-order Wallace tree subunit 1121a can output two signals: a carry signal Carry_i and a sum signal Sum_i, where i may denote the index of each low-order Wallace tree subunit 1121a and the first low-order Wallace tree subunit 1121a has index 1. Optionally, the number of input signals received by each low-order Wallace tree subunit 1121a may equal the number of first partial products of the target codes. In the modified Wallace tree group unit 1121, the number of high-order Wallace tree subunits 1121c plus the number of low-order Wallace tree subunits 1121a may equal 2N. In the first partial products of all target codes, the total number of columns from the lowest column to the highest column may equal 2N; the N low-order Wallace tree subunits 1121a can accumulate the values in each of the low N columns of the first partial products of all target codes, and the N high-order Wallace tree subunits 1121c can accumulate the values in each of the high N columns of the first partial products of all target codes.
Illustratively, if the data processor currently needs to handle a multiplication of 2N-bit * 2N-bit data, the selector 1121b in the data processor can gate the carry output signal Cout_N of the last low-order Wallace tree subunit 1121a in the modified Wallace tree group unit 1121 as the carry input signal Cin_(N+1) received by the first high-order Wallace tree subunit 1121c in the modified Wallace tree group unit 1121. If the data processor currently needs to handle a multiplication of N-bit * N-bit data, the selector 1121b in the data processor can gate the value 0 as the carry input signal Cin_(N+1) received by the first high-order Wallace tree subunit 1121c in the modified Wallace tree group unit 1121; this can also be understood as the data processor splitting the 2N-bit data it currently receives into high N-bit data and low N-bit data and multiplying each separately. In the modified Wallace tree group unit 1121, the indices i from the first low-order Wallace tree subunit 1121a to the last low-order Wallace tree subunit 1121a can be written as 1, 2, ..., N, and the indices i from the first high-order Wallace tree subunit 1121c to the last high-order Wallace tree subunit 1121c can be written as N+1, N+2, ..., 2N.
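Illustratively, the effect of gating either Cout_N or the value 0 into the first high-order column can be sketched in Python as below. This is a simplified behavioral model under stated assumptions: each column is modeled as one adder with a single integer carry, whereas a real Wallace tree subunit passes several carry signals, and the function name `column_accumulate` and the flag `full_width_mode` are hypothetical.

```python
def column_accumulate(rows, n, full_width_mode):
    """Add 2n-bit rows column by column, lowest column first.

    full_width_mode=True  models the 2N-bit case: the carry out of column N
                          is passed on to column N+1.
    full_width_mode=False models the N-bit case: the selector feeds 0 into
                          column N+1, so the low N columns and the high N
                          columns are accumulated independently.
    """
    result = 0
    carry = 0
    for col in range(2 * n):
        if col == n and not full_width_mode:
            carry = 0  # selector gates the constant 0 instead of Cout_N
        total = carry + sum((r >> col) & 1 for r in rows)
        result |= (total & 1) << col
        carry = total >> 1
    return result
```

For example, with n = 4 and rows 0x3C and 0x27, the full-width mode yields the 8-bit sum 0x63, while the split mode yields 0x53: the low nibbles (0xC + 0x7) and the high nibbles (0x3 + 0x2) are added separately and the carry out of the low nibble is discarded.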
It should be noted that, for each low-order Wallace tree subunit 1121a and high-order Wallace tree subunit 1121c in the modified Wallace tree group unit 1121, the signals involved may include a carry input signal Cin_i, partial product value input signals and a carry output signal Cout_i. Optionally, the partial product value input signals received by each low-order Wallace tree subunit 1121a and high-order Wallace tree subunit 1121c may be the values in the corresponding column of the first partial products of all target codes. The number of bits of the carry signal Cout_i output by each low-order Wallace tree subunit 1121a and high-order Wallace tree subunit 1121c may equal N_Cout = floor((N_I + N_Cin) / 2) - 1, where N_I may denote the number of partial product value input signals of the low-order Wallace tree subunit 1121a or high-order Wallace tree subunit 1121c, N_Cin may denote the number of carry input signals of that subunit, N_Cout may denote the minimum number of carry output signals of that subunit, and floor may denote the round-down function. Optionally, in the modified Wallace tree group unit 1121, the carry input signal received by each low-order Wallace tree subunit 1121a or high-order Wallace tree subunit 1121c may be the carry output signal of the preceding low-order Wallace tree subunit 1121a or high-order Wallace tree subunit 1121c, and the carry input signal received by the first low-order Wallace tree subunit 1121a is the value 0. The carry input signal received by the first high-order Wallace tree subunit 1121c can be determined from the data bit width of the mode the data processor currently handles and from the bit width of the multiplicand in the multiplication or multiply-accumulate operation the data processor currently handles.
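Illustratively, the carry-count relation N_Cout = floor((N_I + N_Cin) / 2) - 1 can be evaluated along a chain of subunits with the short Python sketch below. It only evaluates the formula as stated; the assumptions that every subunit receives the same number N_I of partial product inputs and that each subunit's carry outputs all feed the next subunit are simplifications, and the function names are hypothetical.

```python
def carry_out_count(n_inputs, n_carry_in):
    """N_Cout = floor((N_I + N_Cin) / 2) - 1."""
    return (n_inputs + n_carry_in) // 2 - 1


def carry_chain(n_inputs, n_columns):
    """Carry-output counts of successive subunits, starting from zero carry-in."""
    counts = []
    n_cin = 0  # the first low-order subunit receives no carry input
    for _ in range(n_columns):
        n_cout = carry_out_count(n_inputs, n_cin)
        counts.append(n_cout)
        n_cin = n_cout  # assumption: all carry outputs feed the next subunit
    return counts
```

With N_I = 9, matching the 9 target codes of the 16-bit * 16-bit example, the first four subunits give carry counts of 3, 5, 6 and 6, so under these assumptions the count settles after a few columns.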
In the data processor provided by this embodiment, the modified Wallace tree group unit accumulates the partial products of the target codes to obtain two output signals, and the accumulation unit adds these two output signals to obtain the data operation result for each mode. The data processor can therefore handle data operations in different modes, which improves its versatility and effectively reduces the area it occupies on an AI chip. In addition, the data processor does not need a separate accumulation pass over the multiplication result to complete a multiply-accumulate operation; a single operation pass directly performs the multiplication or multiply-accumulate operation, which reduces the power consumption of the data processor.
In one embodiment, the data processor includes an accumulation unit 1122. The accumulation unit 1122 includes an adder 1122a, which performs an addition on the accumulation result.
Specifically, the adder 1122a may be an adder of various bit widths. Optionally, the adder 1122a can receive the two signals output by the modified Wallace tree group unit 1121, add them, and output the data operation result for the mode the data processor currently handles. Optionally, the adder 1122a may be a carry lookahead adder whose operand bit width may equal the bit width of the operation result output by the modified Wallace tree group unit 1121.
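Illustratively, the generate/propagate formulation that underlies a carry lookahead adder can be sketched in Python as below. This is a generic textbook model, not the specific adder 1122a: the carries are computed here by the recurrence c[i+1] = g[i] OR (p[i] AND c[i]), which hardware would expand into parallel lookahead logic, and the function name `carry_lookahead_add` is hypothetical.

```python
def carry_lookahead_add(a, b, width, cin=0):
    """Add two width-bit operands using generate/propagate signals.

    Returns (sum, carry_out), with the sum truncated to `width` bits.
    """
    g = [(a >> i) & (b >> i) & 1 for i in range(width)]    # generate: both bits set
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(width)]  # propagate: exactly one bit set
    carries = [cin]
    for i in range(width):
        # Hardware unrolls this recurrence so all carries are available in parallel.
        carries.append(g[i] | (p[i] & carries[i]))
    total = 0
    for i in range(width):
        total |= (p[i] ^ carries[i]) << i
    return total, carries[width]
```

In the context above, `a` and `b` would be the Sum and Carry words produced by the Wallace tree stage, and `width` would be the bit width of that stage's output.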
In the data processor provided by this embodiment, the accumulation unit adds the two signals output by the modified Wallace tree group unit and outputs the data operation result for each mode. The data processor does not need a separate accumulation pass over the multiplication result to complete a multiply-accumulate operation; a single operation pass directly performs the multiplication or multiply-accumulate operation, which reduces the power consumption of the data processor.
In one embodiment, the second partial product selection branch 1212 in the data processor includes a function selection mode signal input port (mode) 1212a, a second partial product input port 1212b, a first partial product input port 1212c, a second partial product output port 1212d and a gated partial product output port 1212e. The function selection mode signal input port (mode) 1212a receives the function selection mode signal; the second partial product input port 1212b receives the sign-extended second partial product input by the second modified encoding sub-circuit 121; the first partial product input port 1212c receives the sign-extended first partial product obtained after the exchange by the partial product exchange circuit 13; the second partial product output port 1212d outputs the sign-extended second partial product that the partial product exchange circuit 13 needs to exchange; and the gated partial product output port 1212e outputs the gated sign-extended second partial product together with the received sign-extended first partial product.
Specifically, if the data processor currently handles a multiply-accumulate operation on 2N-bit * N-bit data, the partial product exchange circuit 13 in the data processor can exchange the sign-extended second partial product with the sign-extended first partial product. The second partial product selection branch 1212 in the data processor can receive, through the first partial product input port 1212c, the sign-extended first partial product exchanged by the partial product exchange circuit 13, and it outputs the sign-extended second partial product that needs to be exchanged to the partial product exchange circuit 13 through the second partial product output port 1212d. The gated partial product output port 1212e can output the sign-extended second partial product that does not need to be exchanged, as well as the received sign-extended first partial product. The second partial product selection branch 1212 then takes the sign-extended second partial product that does not need to be exchanged, and/or the received sign-extended first partial product, as the second partial product of the target code and inputs it to the second modified compression sub-circuit 122 for compression.
In the data processor provided by this embodiment, the second partial product selection branch can select among the sign-extended partial products to obtain the partial product of the target code. The data processor can therefore perform not only multiplication and multiply-accumulate operations on data of the same bit width, but also multiply-accumulate operations on data of different bit widths, which improves its versatility.
In one embodiment, the partial product exchange circuit 13 in the data processor includes a function selection mode signal input port (mode) 131, a first partial product input port 132, a first partial product output port 133, a second partial product input port 134 and a second partial product output port 135. The function selection mode signal input port (mode) 131 receives the function selection mode signal; the first partial product input port 132 receives the sign-extended first partial product that needs to be exchanged, input by the first modified encoding sub-circuit 111; the first partial product output port 133 outputs the sign-extended first partial product; the second partial product input port 134 receives the sign-extended second partial product that needs to be exchanged, input by the second modified encoding sub-circuit 121; and the second partial product output port 135 outputs the sign-extended second partial product.
Specifically, it can be understood that the partial product exchange circuit 13 determines, from the function selection mode signal received at the function selection mode signal input port (mode) 131, whether the sign-extended first partial product needs to be exchanged with the sign-extended second partial product. The partial product exchange circuit 13 can exchange the sign-extended first low-order partial product with the sign-extended second low-order partial product, or it can exchange the sign-extended first high-order partial product with the sign-extended second high-order partial product. In this embodiment, however, the partial product exchange circuit 13 needs to exchange sign-extended partial products only when the data processor handles a multiply-accumulate operation on 2N-bit * N-bit data; for data operations in the other three modes, the partial product exchange circuit 13 does not need to perform any exchange.
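Illustratively, the mode-dependent exchange can be sketched in Python as below. This is a behavioral sketch only: the mode names, the dictionary representation of a branch's low-order and high-order partial products, and the function name `exchange` are all assumptions introduced for illustration, and which part is exchanged in a real circuit is fixed by the wiring rather than by a parameter.

```python
from enum import Enum


class Mode(Enum):
    # Hypothetical labels; the source names only the 2N-bit * N-bit
    # multiply-accumulate mode and refers to "the other three modes".
    MAC_2N_BY_N = 0
    MODE_B = 1
    MODE_C = 2
    MODE_D = 3


def exchange(mode, first, second, part="low"):
    """Swap one sign-extended partial product between the two branches.

    `first` and `second` are dicts with keys "low" and "high" holding the
    sign-extended partial products of each branch. The swap happens only in
    the 2N-bit * N-bit multiply-accumulate mode; otherwise both pass through.
    """
    first, second = dict(first), dict(second)  # leave the caller's data untouched
    if mode is Mode.MAC_2N_BY_N:
        first[part], second[part] = second[part], first[part]
    return first, second
```

Calling `exchange(Mode.MAC_2N_BY_N, {"low": "P1_low", "high": "P1_high"}, {"low": "P2_low", "high": "P2_high"})` returns branches whose low-order entries are swapped, while any of the other three modes returns both branches unchanged.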
In the data processor provided by this embodiment, the partial product exchange circuit can exchange the sign-extended first partial product obtained by the first modified encoding sub-circuit with the sign-extended second partial product obtained by the second modified encoding sub-circuit, thereby implementing the multiply-accumulate operation on 2N-bit * N-bit data. The data processor can therefore implement not only multiplication and multiply-accumulate operations on data of the same bit width, but also multiply-accumulate operations on data of different bit widths, which improves the versatility of the data processor.
In a data processor provided by another embodiment, the canonical signed number encoding processing unit 211 in the data processor includes: a first data input port 2111, a function selection mode signal input port 2112, and a target code output port 2113. The first data input port 2111 is used to receive the first data on which canonical signed number encoding is to be performed; the function selection mode signal input port 2112 is used to receive the function selection mode signal; and the target code output port 2113 is used to output the target code obtained by performing canonical signed number encoding on the first data.
Specifically, the canonical signed number encoding processing unit 211 can determine, according to the received function selection mode signal, whether the data bit width that the data processor can currently process is N or 2N. If the data bit width that the canonical signed number encoding processing unit 211 can currently process is N, the unit can automatically divide each of the two received 2N-bit sub-data into high N-bit data (the high-order data) and low N-bit data (the low-order data), and perform canonical signed number encoding on the high-order data and the low-order data separately. If the data bit width that the unit can currently process is 2N, the unit can treat each of the two 2N-bit sub-data as a whole and perform canonical signed number encoding on the two sub-data separately.
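The width-dependent splitting can be sketched as follows. The function name `split_operand` and its parameter `mode_width` (the bit width selected by the mode signal) are assumptions made for illustration; the text does not specify how the mode signal itself is encoded.

```python
from typing import List


def split_operand(value: int, n: int, mode_width: int) -> List[int]:
    """Return the words that would each be encoded separately.

    value is one 2N-bit sub-data, given as an unsigned bit pattern.
    mode_width is N or 2N, as selected by the mode signal (assumed encoding).
    """
    value &= (1 << (2 * n)) - 1           # keep only the 2N-bit pattern
    if mode_width == n:
        low = value & ((1 << n) - 1)      # low N-bit data
        high = value >> n                 # high N-bit data
        return [low, high]
    if mode_width == 2 * n:
        return [value]                    # treated as a whole
    raise ValueError("mode_width must be N or 2N")
```

For N = 8, the 16-bit pattern 0xABCD is split into 0xCD and 0xAB in N-bit mode and kept whole in 2N-bit mode.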
It should be noted that the first data may include two 2N-bit sub-data. If the canonical signed number encoding processing unit 211 currently needs to perform canonical signed number encoding on 2N-bit data, the low-order data in the first data may include the two low-order data corresponding to the two 2N-bit sub-data. If the unit currently needs to process N-bit data, it can divide each of the two 2N-bit sub-data into two N-bit sub-data, i.e., four N-bit sub-data in total; in that case the low-order data in the first data may include the four low-order data corresponding to the two 2N-bit sub-data. In addition, during canonical signed number encoding, the number of low-order target codes obtained by the canonical signed number encoding processing unit 211 may be equal to the number of high-order target codes obtained; it may also be equal to the number of first low-order partial products of the target code corresponding to the low-order data, or to the number of first high-order partial products of the target code corresponding to the high-order data. If the data processor is currently processing one N-bit * N-bit multiplication, one sub-data in the first data and in the second data is 0; that is, the high N-bit data or the low N-bit data in the first data and the second data are all 0. Likewise, if the data processor is currently processing one 2N-bit * 2N-bit multiplication, one sub-data in the first data and in the second data is 0, and the other sub-data is a 2N-bit non-zero value.
In the data processor provided by this embodiment, the canonical signed number encoding processing unit performs canonical signed number encoding on the received first data to obtain the target code; the partial products of the target code are then obtained from the target code and accumulated to obtain the target operation result, so that data operations in several different modes can be handled. Because the received data are encoded by the canonical signed number encoding processing unit, fewer valid partial products are produced, which reduces the complexity of the multiplication or multiply-accumulate operation performed by the data processor. At the same time, the data processor can handle data operations in several different modes, which improves its versatility and effectively reduces the area it occupies on an AI chip.
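To illustrate why such an encoding reduces the number of valid partial products, the sketch below uses the standard canonical signed digit recoding (also known as the non-adjacent form), in which every digit is -1, 0 or 1 and no two adjacent digits are non-zero. Whether the bit-level rules of unit 211 coincide with this textbook recoding is an assumption; the function name `csd_encode` is hypothetical.

```python
from typing import List


def csd_encode(x: int) -> List[int]:
    """Recode an integer into signed digits in {-1, 0, 1}, least significant first.

    This is the textbook non-adjacent form; it is shown as an illustration
    and is not taken from the circuit description.
    """
    digits = []
    while x != 0:
        if x & 1:
            d = 2 - (x & 3)   # +1 when x mod 4 == 1, -1 when x mod 4 == 3
            x -= d            # makes x divisible by 4, so the next digit is 0
        else:
            d = 0
        digits.append(d)
        x >>= 1
    return digits
```

For example, 7 (binary 111) is recoded as 8 - 1, so only two non-zero digits, and hence only two non-zero partial products, remain instead of three.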
In one embodiment, the first partial product acquisition circuit 22 in the data processor includes: a low-order partial product acquisition unit 221, a low-order selector group unit 222, a high-order partial product acquisition unit 223, and a high-order selector group unit 224. The first input of the low-order partial product acquisition unit 221 and the first input of the high-order partial product acquisition unit 223 are both connected to the output of the canonical signed number encoding processing unit 211; the second input of the low-order partial product acquisition unit 221 is connected to the output of the low-order selector group unit 222; and the second input of the high-order partial product acquisition unit 223 is connected to the output of the high-order selector group unit 224.
The low-order partial product acquisition unit 221 is used to obtain the sign-extended first low-order partial product according to the low-order target code in the target code and the second data, and to obtain the first low-order partial product of the target code from the sign-extended first low-order partial product. The low-order selector group unit 222 is used to gate the values in the sign-extended first low-order partial product according to the received function selection mode signal. The high-order partial product acquisition unit 223 is used to obtain the sign-extended first high-order partial product according to the high-order target code in the target code and the second data, and to obtain the first high-order partial product of the target code from the sign-extended first high-order partial product. The high-order selector group unit 224 is used to gate the values in the sign-extended first high-order partial product according to the received function selection mode signal.
Specifically, the low-order partial product acquisition unit 221 can obtain the corresponding sign-extended low-order partial product according to each bit value in the low-order target code input by the canonical signed number encoding unit 211, and the low-order selector group unit 222 can gate the values in the sign-extended first low-order partial product. The sign-extended low-order partial product is then combined with the gated values to obtain the sign-extended first low-order partial product, from which the first low-order partial product of the target code is obtained. Similarly, the high-order partial product acquisition unit 223 can obtain the sign-extended high-order partial product corresponding to the high-order data in the first data according to each bit value in the high-order target code input by the canonical signed number encoding unit 211, and the high-order selector group unit 224 can gate the values in the sign-extended first high-order partial product. The sign-extended high-order partial product is then combined with the gated values to obtain the sign-extended first high-order partial product, from which the first high-order partial product of the target code is obtained.
In this embodiment, the first partial product of the target code can be obtained from the first low-order partial product of the target code and the first high-order partial product of the target code. If the bit width of the first target code is equal to 2N and the values in the first low-order target code are numbered 1, ..., N starting from the least significant bit value, then the corresponding sign-extended first low-order partial products may also be numbered 1, ..., N, and the first low-order partial products of the target code are numbered in the same way as the sign-extended first low-order partial products. Likewise, if the values in the first high-order target code are numbered N+1, ..., 2N starting from the least significant bit value, then the corresponding sign-extended first high-order partial products may also be numbered N+1, ..., 2N, and the first high-order partial products of the target code are numbered in the same way as the sign-extended first high-order partial products. The distribution of the first partial products of all target codes can then be characterized as follows. The first low-order partial product of the first target code may be equal to the first sign-extended first low-order partial product, i.e., the first partial product of the first target code. Starting from the first low-order partial product of the second target code, the most significant bit value of the first low-order partial product of each target code is in the same column as the most significant bit value of the first partial product of the first target code; equivalently, the least significant bit value of the first low-order partial product of each target code is offset one bit to the left of the least significant bit value of the first low-order partial product of the previous target code. The first partial product of the target code that follows the first low-order partial product of the last target code may be the first high-order partial product of the first target code. The bit width of the first high-order partial product of the first target code may be equal to N; that is, taking the columns of the first sign-extended first low-order partial product as the reference, the N bit values by which the first sign-extended first high-order partial product is shifted to the left are not values in the first partial product of the target code. The first high-order partial products of the other target codes are distributed in a similar manner.
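The row layout described above, in which each row starts one bit further to the left while all rows share the same most significant column, can be checked numerically. The sketch below is a simplified model under stated assumptions: it treats a single run of signed digits rather than separate low-order and high-order groups, and the name `accumulate_rows` and its parameters are hypothetical.

```python
from typing import List


def accumulate_rows(digits: List[int], multiplicand: int, width: int) -> int:
    """Sum staggered partial-product rows modulo 2**width.

    digits are signed digits in {-1, 0, 1}, least significant first.
    Row i is truncated to (width - i) bits and placed i columns to the left,
    so every row ends in the same most significant column.
    """
    mask = (1 << width) - 1
    total = 0
    for i, d in enumerate(digits):
        if i >= width:
            break                                   # row would lie entirely outside the window
        row_width = width - i                       # each row is one bit narrower than the last
        row = (d * multiplicand) & ((1 << row_width) - 1)
        total = (total + (row << i)) & mask
    return total
```

Because the bits discarded from each row lie at or above column `width`, truncating the rows does not change the result modulo 2**width, which is why the narrower rows suffice.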
It should be noted that if the data processor can currently handle 2N-bit * 2N-bit multiplication, the first partial product acquisition circuit 22 in the data processor may include (N+1) low-order partial product acquisition units 221 and (N+1) high-order partial product acquisition units 223; in this case each low-order partial product acquisition unit 221 may include 4N value generation subunits, and each high-order partial product acquisition unit 223 may also include 4N value generation subunits. If the data processor currently needs to process N-bit data, the first partial product acquisition circuit 22 may include (N+1)/2 low-order partial product acquisition units 221 and (N+1)/2 high-order partial product acquisition units 223; in this case each low-order partial product acquisition unit 221 may include 2N value generation subunits, and each high-order partial product acquisition unit 223 may include 2N value generation subunits. Each value generation subunit can obtain one value in the sign-extended first partial product.
Optionally, the second partial product acquisition circuit 23 includes: a low-order partial product acquisition unit 231, a low-order selector group unit 232, a high-order partial product acquisition unit 233, and a high-order selector group unit 234. The first input of the low-order partial product acquisition unit 231 and the first input of the high-order partial product acquisition unit 233 are both connected to the output of the canonical signed number encoding processing unit 211; the second input of the low-order partial product acquisition unit 231 is connected to the output of the low-order selector group unit 232; and the second input of the high-order partial product acquisition unit 233 is connected to the output of the high-order selector group unit 234.
The low-order partial product acquisition unit 231 is used to obtain the sign-extended first low-order partial product according to the low-order target code in the target code and the second data, and to obtain the first low-order partial product of the target code from the sign-extended first low-order partial product. The low-order selector group unit 232 is used to gate the values in the sign-extended first low-order partial product according to the received function selection mode signal. The high-order partial product acquisition unit 233 is used to obtain the sign-extended first high-order partial product according to the high-order target code in the target code and the second data, and to obtain the first high-order partial product of the target code from the sign-extended first high-order partial product. The high-order selector group unit 234 is used to gate the values in the sign-extended first high-order partial product according to the received function selection mode signal.
In addition, the method by which the first partial product acquisition circuit 22 obtains the sign-extended first partial product is the same as the method by which the second partial product acquisition circuit 23 obtains the sign-extended second partial product, so the method by which the second partial product acquisition circuit 23 obtains partial products is not repeated in this embodiment. The internal circuit structures of the first partial product acquisition circuit 22 and the second partial product acquisition circuit 23 may also be identical, as may the functions of their external output ports, so the specific structure of the second partial product acquisition circuit 23 is not repeated either.
In the data processor provided by this embodiment, the low-order partial product acquisition unit, the high-order partial product acquisition unit, and the selector group units obtain the sign-extended first partial product according to the low-order target code and the high-order target code, and obtain the first partial product of the target code from the sign-extended first partial product; the first partial products of the target code are then accumulated to obtain the target operation result. The data processor obtains fewer valid partial products, which reduces the complexity of the multiplication or multiply-accumulate operation. At the same time, the data processor does not need to perform a separate accumulation on the multiplication result in order to complete a multiply-accumulate operation; a single calculation pass directly implements the multiplication or multiply-accumulate operation, which reduces the power consumption of the data processor. In addition, the data processor can handle data operations in different modes, which improves its versatility.
In one embodiment, the low-order partial product acquisition unit 221 in the data processor includes: a low-order target code input port 2211, a gated value input port 2212, a second data input port 2213, and a low-order partial product output port 2214. The low-order target code input port 2211 is used to receive the first low-order target code input by the canonical signed number encoding processing unit 211; the gated value input port 2212 is used to receive the values in the sign-extended first low-order partial product obtained after gating by the low-order selector group unit 222; the second data input port 2213 is used to receive the second data; and the low-order partial product output port 2214 is used to output the first low-order partial product of the target code.
Specifically, the low-order partial product acquisition unit 221 in the data processor can receive, through the low-order target code input port 2211, the low-order target code in the target code output by the canonical signed number encoding unit 211, and can receive, through the second data input port 2213, the two sub-data (i.e., the multiplicand) in the second data. Optionally, the low-order partial product acquisition unit 221 can obtain the sign-extended low-order partial product corresponding to the low-order data according to the received low-order target code and the received multiplicand of the multiplication or multiply-accumulate operation, and can obtain the first low-order partial product of the target code from the sign-extended low-order partial product. Optionally, if the multiplicand received at the second data input port 2213 of the low-order partial product acquisition unit 221 is N bits wide, the sign-extended first low-order partial product obtained by the low-order partial product acquisition unit 221 may be 2N bits wide.
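How one signed digit and an N-bit multiplicand could yield a 2N-bit sign-extended partial product is sketched below. It assumes two's complement operands and signed digits in {-1, 0, 1}; the helper names `sign_extend` and `partial_product` are hypothetical and the bit-level rules of unit 221 are not specified in this passage.

```python
def sign_extend(value: int, from_bits: int, to_bits: int) -> int:
    """Extend a two's complement pattern from from_bits to to_bits."""
    value &= (1 << from_bits) - 1
    if value >> (from_bits - 1):          # sign bit set: the value is negative
        value -= 1 << from_bits
    return value & ((1 << to_bits) - 1)


def partial_product(digit: int, multiplicand: int, n: int) -> int:
    """Return the 2N-bit sign-extended partial product digit * multiplicand.

    digit is assumed to be -1, 0 or 1; multiplicand is an N-bit pattern.
    """
    if digit not in (-1, 0, 1):
        raise ValueError("digit must be -1, 0 or 1")
    mask = (1 << (2 * n)) - 1
    extended = sign_extend(multiplicand, n, 2 * n)
    if digit == 0:
        return 0
    if digit == 1:
        return extended
    return (-extended) & mask             # two's complement negation
```

Extending to 2N bits before negating matters for the most negative multiplicand: with N = 8, the pattern 0x80 (-128) negates to +128, which needs more than 8 bits to represent.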
It should be noted that, through the gated value input port 2212, the low-order partial product acquisition unit 221 can receive the corresponding bit values in the sign-extended low-order partial product that the low-order selector group unit 222 gates for data operations in the different modes. The sign-extended low-order partial product currently obtained by the low-order partial product acquisition unit 221 is then combined with the gated bit values to obtain the sign-extended first low-order partial product.
Optionally, the data processor includes the high-order partial product acquisition unit 223, which includes: a high-order target code input port 2231, a gated value input port 2232, a second data input port 2233, and a high-order partial product output port 2234. The high-order target code input port 2231 is used to receive the high-order target code output by the canonical signed number encoding unit 211; the gated value input port 2232 is used to receive the values in the sign-extended first high-order partial product output after gating by the high-order selector group unit 224; the second data input port 2233 is used to receive the second data; and the high-order partial product output port 2234 is used to output the first high-order partial product of the target code.
It can be understood that the method by which the low-order partial product acquisition unit 221 obtains the first low-order partial product of the target code is the same as the method by which the high-order partial product acquisition unit 223 obtains the first high-order partial product of the target code, so the method by which the high-order partial product acquisition unit 223 obtains partial products is not repeated in this embodiment. In addition, the internal circuit structures of the low-order partial product acquisition unit 221 and the high-order partial product acquisition unit 223 may be identical and the functions of their external output ports may be similar, so the specific structure of the high-order partial product acquisition unit 223 is not repeated either.
In the data processor provided by this embodiment, the low-order partial product acquisition unit can obtain the sign-extended low-order partial product according to each bit value in the low-order target code, and then combine the sign-extended low-order partial product with the values gated by the low-order selector group unit to obtain the sign-extended first low-order partial product, from which the first low-order partial product of the target code is obtained. The first low-order partial products and the high-order partial products of the target code are then accumulated to obtain the data operation results of the different modes. The data processor obtains fewer valid partial products, which reduces the complexity of the multiplication or multiply-accumulate operation; at the same time, the data processor can handle data operations in different modes, which improves its versatility.
In one embodiment, the data processor includes the low-order selector group unit 222, which includes low-order selectors 2221; the multiple low-order selectors 2221 are used to gate the values in the sign-extended first low-order partial product.
Specifically, the number of low-order selectors 2221 included in the low-order selector group unit 222 may be equal to 3N*(N+1), where 2N denotes the bit width of the data currently processed by the data processor; the internal circuit structure of every low-order selector 2221 in the low-order selector group unit 222 may be identical. Optionally, if the data processor can currently handle 2N-bit * 2N-bit multiplication, each of the (N+1) low-order partial product acquisition units 221 connected to the canonical signed number encoding unit 211 may include 4N value generation subunits, of which 2N value generation subunits may be connected to 2N low-order selectors 2221, one low-order selector 2221 per value generation subunit. Optionally, the 2N value generation subunits corresponding to these 2N low-order selectors 2221 may be the value generation subunits corresponding to the high 2N bits of the sign-extended first low-order partial product. The internal circuit structure of these 2N low-order selectors 2221 may be exactly the same as that of the selector 212, and, besides the function selection mode signal input port (mode), each of these 2N low-order selectors 2221 has two other external input ports. Optionally, if the data processor can handle data operations in four different modes and the multiplicand received by the data processor is 2N bits wide, the signals received at the two other input ports of the low-order selector 2221 may be, respectively, the value 0 and the sign bit value in the corresponding sign-extended first low-order partial product obtained by the low-order partial product acquisition unit 221 when the data processor performs 2N-bit * 2N-bit multiplication. The (N+1) low-order partial product acquisition units 221 may be connected to (N+1) groups of 2N low-order selectors 2221. The sign bit values received by different groups of 2N low-order selectors 2221 may be the same or different, but the sign bit values received by the 2N low-order selectors 2221 within the same group are identical, and that sign bit value can be obtained from the sign bit value in the sign-extended first low-order partial product obtained by the low-order partial product acquisition unit 221 to which that group of 2N low-order selectors 2221 is connected.
In addition, among the 4N value generation subunits included in each low-order partial product acquisition unit 221, the corresponding N value generation subunits need not be connected to a low-order selector 2221. The values obtained by these N value generation subunits may be the corresponding bit values in the sign-extended first low-order partial product that results from the first low-order target code when the data processor handles multiplications on data of different bit widths. It can also be understood that the values obtained by these N value generation subunits correspond to all values from bit 1 to bit N of the corresponding sign-extended first low-order partial product, counting from the least significant bit (bit 1) toward the most significant bit.
It should be noted that, among the 4N value generation subunits included in each low-order partial product acquisition unit 221, the remaining N value generation subunits may also be connected to N low-order selectors 2221, one low-order selector 2221 per value generation subunit. The internal circuit structure of these N low-order selectors 2221 may be the same as that of the selector 212, and, besides the function selection mode signal input port (mode), each of these N low-order selectors 2221 has two other external input ports. The signals received at these two other input ports are, respectively, the sign bit value in the corresponding sign-extended first low-order partial product obtained when the data processor performs N-bit * N-bit multiplication, and the corresponding bit value in the corresponding sign-extended first low-order partial product obtained when the data processor performs 2N-bit * 2N-bit multiplication. The (N+1) low-order partial product acquisition units 221 may be connected to (N+1) groups of N low-order selectors 2221. The sign bit values received by different groups of N low-order selectors 2221 may be the same or different, but the sign bit values received by the N low-order selectors 2221 within the same group are identical, and that sign bit value can be obtained from the sign bit value in the sign-extended first low-order partial product obtained by the low-order partial product acquisition unit 221 to which that group of N low-order selectors 2221 is connected.
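One way to read the selector arrangement described above is as a per-bit multiplexer over a 4N-bit row: the lowest N bits come directly from the value generation subunits, the next N bits are selected between the N-bit-mode sign bit and the actual bit, and the highest 2N bits are selected between the value 0 and the 2N-bit-mode sign bit. The sketch below models that reading. It is an interpretation only; in particular, the assumption that the value 0 is the input selected in N-bit mode, and the names `mux` and `gated_row`, are not taken from the text.

```python
def mux(wide_mode: bool, narrow_input: int, wide_input: int) -> int:
    """Two-input selector controlled by the (assumed) decoded mode signal."""
    return wide_input if wide_mode else narrow_input


def gated_row(value: int, n: int, wide_mode: bool) -> int:
    """Build one 4N-bit sign-extended row from a raw partial product pattern.

    value carries N meaningful bits in N-bit mode and 2N bits in 2N-bit mode.
    """
    bits = [(value >> i) & 1 for i in range(2 * n)]
    sign_narrow = bits[n - 1]          # sign bit of an N-bit value
    sign_wide = bits[2 * n - 1]        # sign bit of a 2N-bit value
    row = []
    for i in range(4 * n):
        if i < n:
            row.append(bits[i])                                # no selector attached
        elif i < 2 * n:
            row.append(mux(wide_mode, sign_narrow, bits[i]))   # group of N selectors
        else:
            row.append(mux(wide_mode, 0, sign_wide))           # group of 2N selectors
    return sum(b << i for i, b in enumerate(row))
```

Under this reading, each row uses N + 2N = 3N selectors, which is consistent with the stated total of 3N*(N+1) selectors for (N+1) rows.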
In addition, the corresponding bit values in the sign-extended first low-order partial product that each group of N low-order selectors 2221 receives can be determined from the corresponding bit values in the sign-extended first low-order partial product obtained by the low-order partial product acquisition unit 221 to which that group of low-order selectors 2221 is connected; the corresponding bit values received by the individual low-order selectors 2221 within a group of N low-order selectors 2221 may be the same or different. As for position, the 4N value generation subunits in each low-order partial product acquisition unit 221 may be shifted left by one value generation subunit relative to the positions of the 4N value generation subunits in the previous low-order partial product acquisition unit 221. Optionally, among the first low-order partial products of all target codes that take part in subsequent operations, only the first low-order partial product of the first target code may have a bit width equal to the bit width 4N of the first sign-extended first low-order partial product; the bit width of the first low-order partial product of each remaining target code may be one less than that of the first low-order partial product of the previous target code, and the bit width of the first high-order partial product of the last target code may be equal to (2N-1).
Optionally, the high-order selector group unit 224 includes high-order selectors 2241; the multiple high-order selectors 2241 are used to gate the values in the sign-extended first high-order partial product.
It should be noted that the method by which the high-order selector 2241 gates values is the same as the method by which the high-order selector 1111ea gates values, so the method by which the high-order selector 2241 gates values is not repeated in this embodiment.
In the data processor provided by this embodiment, the low-order selector group unit can gate the values in the low-order partial product to obtain the sign-extended first low-order partial product; the first partial product of the target code is then obtained from the sign-extended first low-order partial product, and the compression circuit accumulates the first partial products of the target code to obtain the target operation results of the different modes. The data processor can handle data operations in different modes, which improves its versatility.
Fig. 5 is a schematic diagram of the specific structure of a data processor provided by another embodiment, in which the data processor includes a first compression circuit 24. The first compression circuit 24 includes a modified Wallace tree group unit 241 and an accumulation unit 242; the output of the modified Wallace tree group unit 241 is connected to the input of the accumulation unit 242. The modified Wallace tree group unit 241 is used to accumulate each column of values in the first partial products of all target codes obtained when handling data operations in the different modes, yielding an accumulation result; the accumulation unit 242 is used to perform addition on the accumulation result.
Specifically, the modified Wallace tree group unit 241 can accumulate each column of values in the first low-order partial products and the first high-order partial products of the target code obtained by the first partial product acquisition circuit 22, and the accumulation unit 242 accumulates the two operation results produced by the modified Wallace tree group unit 241 to obtain the target operation result. When the modified Wallace tree group unit 241 performs the accumulation, the distribution of the first partial products of all target codes can be characterized as follows: the position of the least significant bit value in each row's first partial product of the target code is offset one bit to the right of the position of the least significant bit value in the next row's first partial product of the target code, whereas the most significant bit value in each row's first partial product of the target code is in the same column as the most significant bit value in the first partial product of the first target code. Optionally, the modified Wallace tree group unit 241 can accumulate each column of values in the first partial products of all target codes according to this distribution. Optionally, the two operation results produced by the modified Wallace tree group unit 241 may include a sum output signal Sum and a carry output signal Carry.
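The division of labor between the tree and the final adder can be illustrated with a generic carry-save reduction: rows are compressed three at a time into a sum word and a carry word until only two words remain, and one ordinary addition then produces the result. The sketch is a plain 3:2 compressor reduction and does not model the modified structure of unit 241; the names `compress_3_to_2` and `wallace_reduce` are hypothetical.

```python
from typing import List, Tuple


def compress_3_to_2(a: int, b: int, c: int, mask: int) -> Tuple[int, int]:
    """Bitwise full adders applied to every column of three rows at once."""
    sum_word = a ^ b ^ c
    carry_word = ((a & b) | (a & c) | (b & c)) << 1   # carries move one column left
    return sum_word & mask, carry_word & mask


def wallace_reduce(rows: List[int], width: int) -> Tuple[int, int]:
    """Reduce any number of rows to a (Sum, Carry) pair modulo 2**width."""
    mask = (1 << width) - 1
    rows = [r & mask for r in rows]
    while len(rows) > 2:
        reduced = []
        for i in range(0, len(rows) - 2, 3):
            reduced.extend(compress_3_to_2(rows[i], rows[i + 1], rows[i + 2], mask))
        reduced.extend(rows[len(rows) - len(rows) % 3:])  # rows left over this round
        rows = reduced
    while len(rows) < 2:
        rows.append(0)
    return rows[0], rows[1]
```

The final step, corresponding to the role the text assigns to the accumulation unit 242, is a single addition of the two returned words, `(sum_word + carry_word) & mask`.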
Optionally, the second compression circuit 25 includes a modified Wallace tree group unit 251 and an accumulation unit 252; the output of the modified Wallace tree group unit 251 is connected to the input of the accumulation unit 252. The modified Wallace tree group unit 251 is used to accumulate each column of values in the second partial products of all target codes obtained when handling data operations in the different modes, yielding an accumulation result; the accumulation unit 252 is used to perform addition on the accumulation result.
It should be noted that the method by which the first compression circuit 24 compresses the first partial products of the target code is the same as the method by which the second compression circuit 25 compresses the second partial products of the target code, so the compression method of the second compression circuit 25 is not repeated in this embodiment. In addition, the internal structures of the first compression circuit 24 and the second compression circuit 25, as well as the functions of their external ports, are identical, so the specific structure of the second compression circuit 25 is not repeated either.
A kind of data processor provided in this embodiment, data processor can be to mesh by amendment Wallace tree group unit First low portion of mark coding is long-pending and high-order portion product carries out accumulation process and obtains accumulating operation as a result, and passing through summing elements Accumulation process is carried out to accumulating operation result, obtains target operation result, which may be implemented the number of different mode According to calculation process, to improve the versatility of data processor, the area that data processor occupies AI chip is effectively reduced.
In one embodiment, continuing with the detailed structural diagram of the data processor shown in Fig. 5, the data processor includes the modified Wallace tree group unit 241, which includes low-order Wallace tree subunits 2411, a selector 2412, and high-order Wallace tree subunits 2413. The output of the low-order Wallace tree subunits 2411 is connected to the input of the selector 2412, and the output of the selector 2412 is connected to the input of the high-order Wallace tree subunits 2413. The multiple low-order Wallace tree subunits 2411 accumulate the values in each column of the first partial products of the target code, the selector 2412 gates the carry input signal received by the high-order Wallace tree subunits 2413, and the multiple high-order Wallace tree subunits 2413 accumulate the values in each column of the first partial products of the target code to obtain the accumulation result.
Specifically, each low-order Wallace tree subunit 2411 may be built from a combination of full adders and half adders, or from a combination of 4-2 compressors; each high-order Wallace tree subunit 2413 may likewise be built from full adders and half adders, or from 4-2 compressors. In addition, the low-order Wallace tree subunit 2411 and the high-order Wallace tree subunit 2413 can each be understood as a circuit that takes a multi-bit input signal and adds its bits to produce two output signals. Optionally, the number of high-order Wallace tree subunits 2413 in the modified Wallace tree group unit 241 may equal the bit width N of the multiplicand in the multiplication or multiply-accumulate operation currently handled by the data processor, and may also equal the number of low-order Wallace tree subunits 2411. The low-order Wallace tree subunits 2411 may be connected in series, and the high-order Wallace tree subunits 2413 may also be connected in series. Optionally, the output of the last low-order Wallace tree subunit 2411 is connected to the input of the selector 2412, and the output of the selector 2412 is connected to the input of the first high-order Wallace tree subunit 2413. Optionally, in the modified Wallace tree group unit 241, each low-order Wallace tree subunit 2411 may add the values in the corresponding column of the partial products of all target codes, and may output two signals: a carry signal Carry_i and a sum signal Sum_i, where i is the index of the low-order Wallace tree subunit 2411 and the first low-order Wallace tree subunit 2411 has index 0. Optionally, the number of input signals received by each low-order Wallace tree subunit 2411 may equal the number of first partial products of the target code. The total number of high-order Wallace tree subunits 2413 and low-order Wallace tree subunits 2411 in the modified Wallace tree group unit 241 may equal 2N. In the first partial products of all target codes, the total number of columns from the lowest column to the highest column may equal 2N; the N low-order Wallace tree subunits 2411 may accumulate each of the low N columns of the first partial products of all target codes, and the N high-order Wallace tree subunits 2413 may accumulate each of the high N columns.
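The column-wise compression described above can be illustrated with a short behavioural sketch. It is not the circuit of this embodiment: it uses a plain chain of 3:2 (full-adder style) compressions on whole words rather than per-column subunits, it does not model the selector 2412, and the function names `carry_save_add`, `compress_partial_products` and `final_add` are hypothetical. It only shows how several partial products can be reduced to a Sum word and a Carry word that a final adder then combines.

```python
def carry_save_add(a, b, c):
    """3:2 compression: for non-negative integers, a + b + c == sum_bits + carry_bits."""
    sum_bits = a ^ b ^ c                               # per-column sum without carries
    carry_bits = ((a & b) | (a & c) | (b & c)) << 1    # carries move one column up
    return sum_bits, carry_bits


def compress_partial_products(rows, width):
    """Reduce partial products to two words (Sum, Carry), modulo 2**width."""
    mask = (1 << width) - 1
    rows = [r & mask for r in rows]
    while len(rows) > 2:
        s, c = carry_save_add(rows[0], rows[1], rows[2])
        rows = rows[3:] + [s & mask, c & mask]         # three rows in, two rows out
    while len(rows) < 2:
        rows.append(0)
    return rows[0], rows[1]


def final_add(sum_word, carry_word, width):
    """Final carry-propagating addition of the Sum and Carry words (assumed width)."""
    return (sum_word + carry_word) & ((1 << width) - 1)
```

For example, `final_add(*compress_partial_products([3, 5, 7, 9], 8), 8)` gives 24. Keeping the intermediate result as two words defers the slow carry propagation to a single final addition, which is the usual motivation for Wallace-style compression.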
Optionally, the modified Wallace tree group unit 251 in the second compression circuit 25 includes low-order Wallace tree subunits 2511, a selector 2512, and high-order Wallace tree subunits 2513. The output of the low-order Wallace tree subunits 2511 is connected to the input of the selector 2512, and the output of the selector 2512 is connected to the input of the high-order Wallace tree subunits 2513. The multiple low-order Wallace tree subunits 2511 accumulate the values in each column of the second partial products of the target code, the selector 2512 gates the carry input signal received by the high-order Wallace tree subunits 2513, and the multiple high-order Wallace tree subunits 2513 accumulate the values in each column of the second partial products of the target code to obtain the accumulation result.
It should be noted that the circuit structure and function of the modified Wallace tree group unit 241 in the first compression circuit 24 are the same as those of the modified Wallace tree group unit 251 in the second compression circuit 25, so this embodiment does not repeat the specific structure of the modified Wallace tree group unit 251.
In the data processor provided by this embodiment, the modified Wallace tree group unit accumulates the partial products of the target code to obtain two output signals, and these two output signals are then accumulated to obtain the data operation results of the different modes. The data processor can thus handle data operations of different modes, which improves its versatility and effectively reduces the area it occupies on an AI chip. In addition, the data processor does not need a separate accumulation pass over the multiplication result to complete a multiply-accumulate operation; a single operation pass directly yields the multiplication or multiply-accumulate result, which reduces the power consumption of the data processor.
In a data processor provided by another embodiment, the data processor includes the accumulation unit 242, which includes an adder 2421 used to add the accumulation results.
Specifically, the adder 2421 may be an adder of various bit widths. Optionally, the adder 2421 may receive the two signals output by the modified Wallace tree group unit 241, add them, and output the data operation result for the mode currently handled by the data processor. Optionally, the adder 2421 may be a carry-lookahead adder.
In the data processor provided by this embodiment, the accumulation unit accumulates the two signals output by the modified Wallace tree group unit and outputs the data operation results of the different modes. The data processor does not need a separate accumulation pass over the multiplication result to complete a multiply-accumulate operation; a single operation pass directly yields the multiplication or multiply-accumulate result, which reduces the power consumption of the data processor.
In one embodiment, the data processor includes the adder 2421, which includes a carry signal input port 2421a, a sum signal input port 2421b, and an operation result output port 2421c. The carry signal input port 2421a receives the carry signal, the sum signal input port 2421b receives the sum signal, and the operation result output port 2421c outputs the result of accumulating the carry signal and the sum signal.
Specifically, the adder 2421 may receive the carry signal Carry output by the modified Wallace tree group unit 241 through the carry signal input port 2421a, receive the sum signal Sum output by the modified Wallace tree group unit 241 through the sum signal input port 2421b, accumulate the carry signal Carry and the sum signal Sum, and output the result through the operation result output port 2421c.
It should be noted that, during operation, the data processor may use adders 2421 of different bit widths to add the carry output signal Carry and the sum output signal Sum output by the modified Wallace tree group unit 241. The bit width of the data the adder 2421 can handle may equal 2 times the bit width of the multiplicand in the multiplication or multiply-accumulate operation to be performed by the data processor.
In the data processor provided by this embodiment, the accumulation unit accumulates the two signals output by the modified Wallace tree group unit and outputs the data operation results of the different modes. The data processor does not need a separate accumulation pass over the multiplication result to complete a multiply-accumulate operation; a single operation pass directly yields the multiplication or multiply-accumulate result, which reduces the power consumption of the data processor.
Fig. 6 is a flow diagram of the data processing method provided by one embodiment. The method may be performed by the data processor shown in Fig. 1 and Fig. 3, and this embodiment concerns the process of implementing data operations in four different modes. As shown in Fig. 6, the method includes:
S101, receiving data to be processed and a function selection mode signal, where the function selection mode signal indicates the mode of data operation that the data processor can currently handle.
Specifically, the data to be processed may include the multiplier and the multiplicand of a multiplication or multiply-accumulate operation. Optionally, the data processor may receive one item of data to be processed through each of the first modified encoding sub-circuit and the second modified encoding sub-circuit. Each item of data to be processed may include two sub-data items, which may have the same bit width or different bit widths. Optionally, the two sub-data items may be spliced into a whole before being input to the first modified encoding sub-circuit and the second modified encoding sub-circuit, or may be input to them separately and simultaneously. The sub-data items may be fixed-point numbers with a bit width of 2N, in which case the data obtained by splicing two sub-data items has a bit width of 4N.
It should be noted that the first multiplication circuit and the second multiplication circuit may receive the same function selection mode signal. The function selection mode signal may take four different values, corresponding to the four modes of data operation that the data processor can handle: N-bit × N-bit multiplication, N-bit × N-bit multiply-accumulate, 2N-bit × 2N-bit multiplication, and 2N-bit × N-bit multiply-accumulate. Based on the function selection mode signal it receives, the data processor determines which specific mode of data operation it can currently handle. In addition, one sub-data item in an item of data to be processed may serve as the multiplier, and the other sub-data item may serve as the multiplicand, when the data processor performs the multiplication or multiply-accumulate operation.
S102, judging, according to the function selection mode signal, whether the data to be processed needs to be split.
Specifically, the data processor may determine, from the function selection mode signal it receives, the data bit width it can currently handle, and thereby judge whether the data to be processed needs to be split. Splitting here means dividing the data to be processed into multiple groups of data with the same bit width.
Optionally, the step in S102 of judging, according to the function selection mode signal, whether the data to be processed needs to be split may include: judging, according to the function selection mode signal, whether the bit width of the data to be processed equals the data bit width of the operation mode the data processor can currently handle.
Optionally, after the step in S102 of judging whether the data to be processed needs to be split, the method may further include: if the data to be processed does not need to be split, proceeding to perform canonical signed digit encoding on the data to be processed to obtain the target code.
It should be noted that judging, according to the function selection mode signal, whether the data to be processed needs to be split can be understood as judging whether the bit width of the data to be processed equals the data bit width of the operation mode the data processor can currently handle. If they are equal, the data to be processed does not need to be split; otherwise it does. For example, if the two data items received by the first modified encoding sub-circuit and the second modified encoding sub-circuit each have a bit width of N bits, and the data processor can currently handle N-bit × N-bit multiplication, then the bit width of the data to be processed equals the data bit width of the current operation mode. Canonical signed digit encoding here refers to a data processing procedure that encodes data using the values 0, -1 and 1. Optionally, the bit width of the target code may equal the bit width of the data currently handled by the data processor plus 1.
S103, if the data to be processed needs to be split, splitting the data to be processed to obtain the split data.
For example, if the two data items received by the first modified encoding sub-circuit and the second modified encoding sub-circuit each have a bit width of 2N bits, and the data processor can currently handle N-bit × N-bit multiplication, then the first modified encoding sub-circuit and the second modified encoding sub-circuit may each automatically split the received data into high N-bit data and low N-bit data, so as to match the data bit width of the operation mode the data processor can currently handle.
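The split decision in S102 and the split itself in S103 amount to a width comparison followed by bit slicing. A minimal sketch, assuming unsigned bit patterns held in Python integers; the names `needs_split` and `split_high_low` are hypothetical:

```python
def needs_split(data_width, mode_width):
    """Split only when the received width differs from the width the current mode handles."""
    return data_width != mode_width


def split_high_low(value, n):
    """Split a 2N-bit pattern into its (high N bits, low N bits)."""
    mask = (1 << n) - 1
    return (value >> n) & mask, value & mask
```

With N = 4, `split_high_low(0xAB, 4)` returns `(0xA, 0xB)`.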
S104, performing canonical signed digit encoding on the split data to obtain the target code.
Optionally, the step in S104 of performing canonical signed digit encoding on the split data to obtain the target code may include: converting each run of l consecutive 1 values in the split data into (l+1) digits in which the highest digit is 1, the lowest digit is -1, and the remaining digits are 0, thereby obtaining the target code, where l is greater than or equal to 2.
Specifically, if the bit width of the data to be processed received by the data processor is 2N and the data bit width the data processor can currently handle is N, then the first modified encoding sub-circuit and the second modified encoding sub-circuit may automatically split the 2N-bit data into high N-bit data and low N-bit data, and perform canonical signed digit encoding on the high N-bit data and the low N-bit data separately to obtain the corresponding high-order target code and low-order target code. Optionally, the data to be processed, after being split, may include high N-bit data to be processed and low N-bit data to be processed. If the bit width of the data to be processed is 2N, the high N-bit data may be called the high-order data to be processed, and the low N-bit data may be called the low-order data to be processed.
S105, performing conversion according to the target code and the split data to obtain the sign-extended partial products.
Specifically, the conversion can be understood as converting each value in the target code, based on the multiplicand of the multiplication, into a sign-extended partial product. Optionally, the bit width of a sign-extended partial product may equal 2 times the data bit width currently handled by the data processor.
S106, judging, according to the function selection mode signal, whether the sign-extended partial products need to be exchanged.
Optionally, the step in S106 of judging, according to the function selection mode signal, whether the sign-extended partial products need to be exchanged may include: judging, according to the function selection mode signal, whether the data bit widths currently handled by the data processor are the same.
Specifically, only when the data processor handles a 2N-bit × N-bit multiply-accumulate operation does the partial product exchange circuit, as required, exchange the sign-extended first low-order partial product or first high-order partial product obtained by the first modified encoding sub-circuit with the sign-extended second low-order partial product or second high-order partial product obtained by the second modified encoding sub-circuit. In other words, when the data processor handles data operations in the other three modes, the partial product exchange circuit is idle, and the sign-extended low-order and high-order partial products are not exchanged. Meanwhile, the two sub-data items in each of the first data and the second data have a bit width of 2N. If the data processor currently handles one N-bit × N-bit multiplication, then, as required, one of the first data and the second data is 0, and in the two sub-data items of the other data either the high-order values are 0 or the low-order values are 0; the first data and the second data may then be computed from the original data as required. If the data processor currently handles one 2N-bit × 2N-bit multiplication, then one of the first data and the second data is 0, and both the high-order values and the low-order values in the two sub-data items of the other data are non-zero. If the data processor currently handles two 2N-bit × 2N-bit multiplications, then neither the first data nor the second data is 0.
It should be noted that judging whether the data bit widths currently handled by the data processor are the same can be understood as judging whether the bit width of the multiplicand and the bit width of the multiplier currently handled by the data processor are equal.
Optionally, after the step in S106 of judging, according to the function selection mode signal, whether the sign-extended partial products need to be exchanged, the method may further include: if the sign-extended partial products need to be exchanged, exchanging the high-order partial products or the low-order partial products among the sign-extended partial products.
S107, if the sign-extended partial products do not need to be exchanged, taking the sign-extended partial products as the partial products of the target code.
Specifically, if the sign-extended partial products do not need to be exchanged, the first modified encoding circuit may take the sign-extended first partial products it obtained as the first partial products of the target code, and the second modified encoding circuit may take the sign-extended second partial products it obtained as the second partial products of the target code.
S108, compressing the partial products of the target code to obtain the target operation result.
Specifically, the data processor may accumulate the column values in the partial products of all target codes to obtain the target operation result. Optionally, the bit width of the target operation result may equal 2 times the data bit width currently handled by the data processor.
The data processing method provided by this embodiment receives data to be processed and a function selection mode signal; judges, according to the function selection mode signal, whether the data to be processed needs to be split; if so, splits the data to be processed to obtain the split data; performs canonical signed digit encoding on the split data to obtain the target code; performs conversion according to the target code and the split data to obtain the sign-extended partial products; judges, according to the function selection mode signal, whether the sign-extended partial products need to be exchanged; if not, takes the sign-extended partial products as the partial products of the target code; and compresses the partial products of the target code to obtain the target operation result. With this method the data processor can perform not only multiplication but also multiply-accumulate operations, which improves its versatility. In addition, the method does not need a separate accumulation pass over the multiplication result to complete a multiply-accumulate operation; a single operation pass directly yields the multiply-accumulate or multiplication result, which reduces the power consumption of the data processor. Furthermore, because the method applies canonical signed digit encoding to the received data, fewer effective partial products are produced, which reduces the complexity of the multiplication or multiply-accumulate operation.
In one embodiment, the step in S104 of performing canonical signed digit encoding on the split data to obtain the target code may include:
S1041, performing canonical signed digit encoding on the split data to obtain an intermediate code.
Specifically, the split data on which canonical signed digit encoding is performed may be the multiplier of the multiplication or multiply-accumulate operation.
S1042, obtaining the target code according to the intermediate code and the function selection mode signal.
Specifically, canonical signed digit encoding can be described as follows. An N-bit multiplier is processed from its low-order values toward its high-order values. If there is a run of l (l >= 2) consecutive 1 values, the run is converted into the digits "1 (0)...(0) (-1)", that is, a 1, followed by (l-1) zeros, followed by -1, and the remaining (N-l) values are combined with the converted (l+1) digits to obtain new data. The new data then serves as the initial data for the next stage of conversion, and this continues until the new data obtained after conversion contains no run of l (l >= 2) consecutive 1 values. The bit width of the target code obtained by applying canonical signed digit encoding to an N-bit multiplier may equal (N+1). Further, in canonical signed digit encoding, the data 11 can be written as (100 - 001), so 11 is equivalent to 10(-1); the data 111 can be written as (1000 - 0001), so 111 is equivalent to 100(-1); and so on for any other run of l (l >= 2) consecutive 1 values.
For example, suppose the multiplier received by the first modified encoding sub-circuit or the second modified encoding sub-circuit is "001010101101110". The first new data obtained after the first stage of conversion is "0010101011100(-1)0". The second new data obtained after the second stage of conversion applied to the first new data is "0010101100(-1)00(-1)0". The third new data obtained after the third stage of conversion applied to the second new data is "0010110(-1)00(-1)00(-1)0". The fourth new data obtained after the fourth stage of conversion applied to the third new data is "00110(-1)0(-1)00(-1)00(-1)0". The fifth new data obtained after the fifth stage of conversion applied to the fourth new data is "010(-1)0(-1)0(-1)00(-1)00(-1)0". The fifth new data contains no run of l (l >= 2) consecutive 1 values, so it may be called the initial code; after one padding step is applied to the initial code, canonical signed digit encoding is complete and the intermediate code is obtained. The bit width of the initial code may equal the bit width of the multiplier. Optionally, after the first modified encoding sub-circuit or the second modified encoding sub-circuit has applied canonical signed digit encoding to the multiplier, if the highest value and the second-highest value of the resulting new data (the initial code) are "10" or "01", the sub-circuit may pad one value 0 at the position one higher than the highest value of the new data, so that the three highest values of the corresponding intermediate code are "010" or "001" respectively. Optionally, the bit width of the intermediate code may equal the bit width of the data currently handled by the data processor plus 1.
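The run-conversion rule and the worked example above can be reproduced with a short sketch. It makes two assumptions that are not in the text: the padding 0 is prepended before conversion rather than after it, and all runs are converted in a single low-to-high scan instead of stage by stage; both give the same digits as the staged procedure for this example. The names `csd_encode` and `csd_value` are hypothetical.

```python
def csd_encode(bits):
    """Encode a binary string (MSB first) into digits from {-1, 0, 1}, MSB first.

    The result has len(bits) + 1 digits: one padding 0 is placed above the
    highest bit so that a converted run always has room for its leading 1.
    """
    d = [int(b) for b in bits][::-1] + [0]   # LSB first, padding digit on top
    i = 0
    while i < len(d):
        if d[i] == 1:
            j = i
            while j < len(d) and d[j] == 1:  # find the end of the run of 1s
                j += 1
            if j - i >= 2:                   # run of l >= 2 ones
                d[i] = -1                    # lowest digit of the run becomes -1
                for k in range(i + 1, j):
                    d[k] = 0                 # middle digits become 0
                d[j] = 1                     # digit above the run becomes 1
                i = j                        # the new 1 may start another run
                continue
        i += 1
    return d[::-1]


def csd_value(digits):
    """Numeric value of an MSB-first signed-digit list."""
    value = 0
    for digit in digits:
        value = 2 * value + digit
    return value
```

Calling `csd_encode("001010101101110")` yields the fifth new data from the example with one extra 0 in front, and `csd_value` of that list equals `int("001010101101110", 2)`, which confirms that the encoding preserves the multiplier's value while leaving only six non-zero digits instead of eight.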
In addition, if the data received by the data processor has a bit width of 2N and the data processor can currently handle N-bit data operations, then the first modified encoding sub-circuit or the second modified encoding sub-circuit may split the 2N-bit data into two groups of N-bit data and operate on them separately; in this case, the two resulting groups of (N+1)-bit intermediate codes are combined and used as the target code. If the data processor can currently handle 2N-bit data operations, then the first modified encoding sub-circuit or the second modified encoding sub-circuit may pad one value 0 at the position one higher than the highest value of the obtained (2N+1)-bit intermediate code (that is, a padding step), and use the resulting (2N+2)-bit data as the target code.
The data processing method provided by this embodiment performs canonical signed digit encoding on the split data to obtain an intermediate code, and obtains the target code according to the intermediate code and the function selection mode signal. The method can perform multiplication and multiply-accumulate operations on data of several different bit widths, which effectively reduces the area the data processor occupies on an AI chip. Meanwhile, because the method applies canonical signed digit encoding to the data, fewer effective partial products are produced during the operation, which reduces the complexity of the multiplication or multiply-accumulate operation and improves operation efficiency.
In one embodiment, the step in S105 of performing conversion according to the target code and the split data to obtain the sign-extended partial products may include:
S1051, performing conversion according to the target code and the split data to obtain initial partial products.
Specifically, if a value in the target code is -1 and the split data is X, the initial partial product may be -X; if the value in the target code is 1, the initial partial product may be X; and if the value in the target code is 0, the initial partial product may be 0.
S1052, performing sign extension on the initial partial products to obtain the sign-extended partial products.
Specifically, the bit width of an initial partial product may equal the bit width N of the data currently handled by the data processor, and the bit width of a sign-extended partial product may equal 2 times that bit width N. The N values of the initial partial product may form the low N values of the sign-extended partial product, and each of the high N values of the sign-extended partial product may be the highest value of the initial partial product, that is, its sign bit.
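Steps S1051 and S1052 can be sketched as a digit-to-partial-product mapping followed by sign extension. The sketch assumes that X is an N-bit two's-complement bit pattern and that -X is formed by two's-complement negation truncated to N bits (so the most negative value would wrap, a corner case the text does not discuss); the names `initial_partial_product` and `sign_extend` are hypothetical.

```python
def initial_partial_product(digit, x, n):
    """Map one target-code digit (-1, 0 or 1) and N-bit pattern x to an N-bit pattern."""
    mask = (1 << n) - 1
    if digit == 1:
        return x & mask
    if digit == -1:
        return (-x) & mask          # two's-complement negation, truncated to N bits
    if digit == 0:
        return 0
    raise ValueError("digit must be -1, 0 or 1")


def sign_extend(p, n):
    """Extend an N-bit pattern to 2N bits by copying its sign bit into the high N bits."""
    sign = (p >> (n - 1)) & 1
    high = ((1 << n) - 1) if sign else 0
    return (high << n) | p
```

With N = 4 and X = 0b0011, the digit -1 gives the initial partial product 0b1101, which sign-extends to 0b11111101; the digit 1 gives 0b00000011.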
With the data processing method provided by this embodiment, fewer effective partial products are obtained, which reduces the complexity of the multiplication or multiply-accumulate operation.
In one embodiment, the step in S108 of compressing the partial products of the target code to obtain the target operation result may include:
S1081, accumulating the partial products of the target code to obtain an intermediate operation result.
For example, number the values of the low-order target code (bit width N+1) from the lowest-order value to the highest-order value, so that the lowest-order value has number 1 and the highest-order value has number N+1; the low-order partial products of the corresponding target code are numbered in the same way. Likewise, number the values of the high-order target code (bit width M+1) from the lowest-order value to the highest-order value, so that the lowest-order value has number 1 and the highest-order value has number M+1; the high-order partial products of the corresponding target code are numbered in the same way. The layout of the low-order partial products and the high-order partial products of all target codes can then be described as follows: the lowest-order value of the high-order partial product of the target code numbered 1 lies in the same column as the second-lowest-order value of the low-order partial product of the target code numbered N+1; taking the high-order partial product of the first target code as the reference, the second-lowest-order value of each other high-order partial product lies in the same column as the lowest-order value of the high-order partial product of the next target code; and taking the low-order partial product of the first target code as the reference, the second-lowest-order value of each other low-order partial product lies in the same column as the lowest-order value of the low-order partial product of the next target code.
It should be noted that the modified Wallace tree group unit may accumulate the values in each column of the partial products of all target codes.
S1062, accumulating the intermediate operation result by the accumulation unit to obtain the target operation result.
Optionally, the step in S1062 above of accumulating the intermediate operation result by the accumulation unit to obtain the target operation result may specifically include: the low-order Wallace tree subunits accumulate the column values in the partial products of all target codes to obtain an accumulation result; the selector gates the accumulation result according to the function selection mode signal to obtain a carry gating signal; and the high-order Wallace tree subunits perform accumulation according to the carry gating signal and the column values in the partial products of the target code to obtain the target operation result.
Specifically, from the distribution of the low-order partial products and the high-order partial products of all target codes, the partial products of all target codes span 2N columns in total (N being the bit width of the data currently processed by the data processor). Numbered from the lowest bit, the columns are 0, ..., 2N-1, where columns 0 to N-1 may be called the low N columns. Optionally, the accumulation result may be the carry output signal Cout output by the last high-order Wallace tree subunit.
It should be noted that the N low-order Wallace tree subunits may accumulate the low N columns in number order to obtain the accumulation result. Optionally, the accumulation result may include the carry output signal Carry and the sum output signal Sum of each Wallace tree subunit, as well as the output signal Cout of the last high-order Wallace tree subunit.
It can be understood that the selector in the modified Wallace tree group unit may, according to the received function selection mode signal, gate either the output signal Cout of the last low-order Wallace tree subunit or the value 0 to obtain the carry gating signal.
In this embodiment, from the distribution of the partial products of all target codes, the partial products of all target codes span 2N columns in total (N being the bit width of the data currently processed by the data processor). Numbered from the lowest bit, the columns are 0, ..., 2N-1, where columns N to 2N-1 may be called the high N columns.
It should be noted that the N high-order Wallace tree subunits may accumulate the high N columns in number order and output the accumulation result. The carry input signal received by the first high-order Wallace tree subunit may be the carry gating signal output by the selector. If the data processor is currently processing an 8-bit data operation, the circuit structure of the corresponding modified compression sub-circuit may be as shown in Fig. 7.
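The low/high column split with a gated carry can be illustrated by a rough behavioural sketch in Python. This is not the gate-level circuit of Fig. 7: the names `gated_accumulate`, `rows` and `propagate_carry` are hypothetical, `propagate_carry` merely stands in for the function selection mode signal, and which modes select Cout rather than 0 is an assumption of the sketch.

```python
def gated_accumulate(rows, n, propagate_carry):
    """Behavioural model: add 2n-bit rows with a selectable carry at column n.

    rows            -- list of non-negative ints, each a 2n-bit row (hypothetical input)
    n               -- width of each half (low n columns, high n columns)
    propagate_carry -- stand-in for the function selection mode signal:
                       True gates Cout into the high half, False gates 0
    """
    mask = (1 << n) - 1

    # Low n columns: accumulate and keep whatever overflows as Cout.
    low_sum = sum(r & mask for r in rows)
    cout = low_sum >> n  # may exceed one bit in this arithmetic model

    # Selector: choose Cout or the value 0 as the carry gating signal.
    carry_in = cout if propagate_carry else 0

    # High n columns: accumulate together with the gated carry.
    high_sum = sum((r >> n) & mask for r in rows) + carry_in

    return ((high_sum & mask) << n) | (low_sum & mask)


# With the carry propagated, the two halves behave as one 16-bit adder;
# with the carry blocked, they behave as two independent 8-bit adders.
print(hex(gated_accumulate([0x00FF, 0x0001], 8, True)))
print(hex(gated_accumulate([0x00FF, 0x0001], 8, False)))
```

Blocking the carry at column N is what would let one array of subunits serve either a single wide operation or two narrow ones, which appears to be the purpose of the selector in the text.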
In the data processing method provided in this embodiment, the modified Wallace tree group unit accumulates the partial products of the target code to obtain an intermediate operation result, and the accumulation circuit accumulates the intermediate operation result to obtain the target operation result. The method can multiply data of several different bit widths according to the function selection mode signal received by the data processor, which effectively reduces the area of the AI chip occupied by the data processor. The method also obtains fewer effective partial products, which reduces the complexity of the multiplication or multiply-accumulate operation and improves operation efficiency. In addition, the method does not need a further accumulation pass over the multiplication result to complete a multiply-accumulate operation; a single operation pass directly implements the multiplication or multiply-accumulate operation, which effectively reduces the power consumption of the data processor.
Fig. 8 is a schematic flowchart of a data processing method provided by one embodiment. The method may be performed by the data processor shown in Fig. 2 and Fig. 5, and this embodiment concerns the process of implementing data operations in four different modes. As shown in Fig. 8, the method includes:
S201, receiving data to be processed and a function selection mode signal, where the function selection mode signal indicates the data operation mode that the data processor can currently process.
Specifically, the data processor may receive one piece of data to be processed through the canonical signed-digit encoding circuit, and receive another piece of data to be processed through the first partial product acquisition circuit and the second partial product acquisition circuit respectively. The canonical signed-digit encoding circuit, the first partial product acquisition circuit and the second partial product acquisition circuit may receive the same function selection mode signal at the same time. Optionally, a piece of data to be processed may include two pieces of sub-data to be processed, which may have the same bit width or different bit widths. Optionally, the two pieces of sub-data in one piece of data to be processed may be spliced and input to the canonical signed-digit encoding circuit as a whole, or may be input to it separately at the same time. The two pieces of sub-data in the other piece of data to be processed may be spliced and input as a whole to the first partial product acquisition circuit and the second partial product acquisition circuit at the same time, or may be input to these two circuits separately at the same time. The sub-data to be processed may be fixed-point numbers with a bit width of 2N, and the data obtained by splicing two pieces of sub-data to be processed may have a bit width of 4N.
It should be noted that there may be four function selection mode signals, corresponding to the four modes of data operation that the data processor can process. The four modes may include N-bit * N-bit multiplication, N-bit * N-bit multiply-accumulate, 2N-bit * 2N-bit multiplication, and 2N-bit * N-bit multiply-accumulate. In addition, one piece of sub-data in a piece of data to be processed may serve as the multiplier when the data processor performs the multiplication or multiply-accumulate operation, and the other piece of sub-data may serve as the multiplicand.
S202, performing canonical signed-digit encoding on the data to be processed according to the function selection mode signal to obtain the target code.
Optionally, the step in S202 above of performing canonical signed-digit encoding on the data to be processed according to the function selection mode signal to obtain the target code includes: according to the function selection mode signal, converting each run of l consecutive 1s in the data to be processed into (l+1) bits in which the highest bit is 1, the lowest bit is -1 and the remaining bits are 0, thereby obtaining the target code, where l is greater than or equal to 2.
Specifically, if the bit width of the data to be processed received by the data processor is 2N and the data bit width that the data processor can currently process is N, the canonical signed-digit encoding circuit in the data processor may automatically split the 2N-bit data into high N-bit data and low N-bit data, and perform canonical signed-digit encoding on the high N-bit data and the low N-bit data separately to obtain the corresponding high-order target code and low-order target code.
Further, the canonical signed-digit encoding can be characterized as follows. An N-bit multiplier is processed from the low-order bits to the high-order bits. If there is a run of l (l >= 2) consecutive 1s, the run may be converted to the (l+1)-digit data "1(0)_{l-1}(-1)", that is, a 1 followed by l-1 zeros followed by -1, and the remaining (N-l) bits are combined with the converted (l+1) digits to obtain new data. The new data is then used as the initial data of the next stage of conversion, until the new data obtained after conversion contains no run of l (l >= 2) consecutive 1s. Performing canonical signed-digit encoding on an N-bit multiplier may yield a target code with a bit width of (N+1). In this encoding, the data 11 may be converted to (100 - 001), that is, 11 is equivalent to 10(-1); the data 111 may be converted to (1000 - 0001), that is, 111 is equivalent to 100(-1); and runs of other lengths l (l >= 2) are converted in the same way.
For example, suppose the multiplier received by the canonical signed-digit encoding circuit is "001010101101110". The first-stage conversion yields the first new data "0010101011100(-1)0"; the second-stage conversion of the first new data yields the second new data "0010101100(-1)00(-1)0"; the third-stage conversion of the second new data yields the third new data "0010110(-1)00(-1)00(-1)0"; the fourth-stage conversion of the third new data yields the fourth new data "00110(-1)0(-1)00(-1)00(-1)0"; and the fifth-stage conversion of the fourth new data yields the fifth new data "010(-1)0(-1)0(-1)00(-1)00(-1)0". The fifth new data contains no run of l (l >= 2) consecutive 1s, so it may be called the initial code. After bit padding is applied to the initial code, the canonical signed-digit encoding is complete and the intermediate code is obtained. The bit width of the initial code may equal the bit width of the multiplier. Optionally, after the canonical signed-digit encoding circuit has encoded the multiplier and obtained the new data (that is, the initial code), if the highest and second-highest bits of the new data are "10" or "01", the circuit may pad one bit of value 0 above the highest bit of the new data, so that the three highest bits of the corresponding intermediate code are "010" or "001" respectively. Optionally, the bit width of the intermediate code may equal the bit width of the data currently processed by the data processor plus 1.
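The staged conversion described above can be sketched in Python as follows. This is a minimal illustration rather than the encoding circuit itself: the function name `csd_encode` is hypothetical, and for simplicity the sketch always pads one 0 digit above the highest bit (giving N+1 digits) instead of applying the conditional padding rule in the text.

```python
def csd_encode(bits: str) -> list[int]:
    """Convert a binary string into signed digits {-1, 0, 1}, MSB first.

    Each run of l >= 2 consecutive 1s is replaced, scanning from the low
    bit upward, by 1 followed by l-1 zeros followed by -1.
    """
    # Work LSB-first; the extra 0 is the padded high digit (assumption).
    d = [int(b) for b in reversed(bits)] + [0]
    i = 0
    while i < len(d) - 1:
        if d[i] == 1 and d[i + 1] == 1:
            # Find the first position above the run of 1s.
            j = i
            while j < len(d) and d[j] == 1:
                j += 1
            d[i] = -1                  # lowest digit of the run becomes -1
            for k in range(i + 1, j):  # the rest of the run becomes 0
                d[k] = 0
            d[j] = 1                   # carry into the position above the run
            i = j                      # the new 1 may start another run
        else:
            i += 1
    return d[::-1]


def signed_digits_value(digits: list[int]) -> int:
    """Evaluate MSB-first signed digits as an integer."""
    value = 0
    for digit in digits:
        value = 2 * value + digit
    return value


code = csd_encode("001010101101110")
print(code)
print(signed_digits_value(code) == int("001010101101110", 2))
```

For the multiplier in the example, the result has 6 non-zero digits where the plain binary form has 8 ones, which is why fewer effective partial products are needed.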
In addition, if the bit width of the data received by the data processor is 2N and the data processor can currently process N-bit data operations, the canonical signed-digit encoding circuit in the data processor may split the 2N-bit data into two groups of N-bit data and process them separately. In this case, the two resulting (N+1)-bit intermediate codes may be combined and used as the target code. If the data processor can currently process 2N-bit data operations, the canonical signed-digit encoding circuit may pad one bit of value 0 above the highest bit of the obtained (2N+1)-bit intermediate code (that is, bit padding), and use the padded (2N+2)-bit data as the target code.
S203, obtaining the first partial products of the target code and the second partial products of the target code according to the target code and the data to be processed.
Specifically, according to the actual operation requirement, the data processor may obtain the first partial products and the second partial products of the target code from the target code derived from one piece of sub-data to be processed (the multiplier in the multiplication or multiply-accumulate operation) and the corresponding piece of sub-data to be processed (the multiplicand in the multiplication or multiply-accumulate operation). The data processor may obtain the first partial products of the target code through the first partial product acquisition circuit, and obtain the second partial products of the target code through the second partial product acquisition circuit.
S204, compressing the first partial products of the target code according to the function selection mode signal to obtain a first target operation result.
Optionally, the step in S204 above of compressing the first partial products of the target code according to the function selection mode signal to obtain the first target operation result includes: the low-order Wallace tree subunits accumulate the column values in the first partial products of all target codes to obtain a first accumulation result; the selector gates the first accumulation result according to the function selection mode signal to obtain a first carry gating signal; and the high-order Wallace tree subunits perform accumulation according to the first carry gating signal and the column values in the first partial products of the target code to obtain the first target operation result.
Specifically, the data processor may accumulate the first partial products of the target code through the modified Wallace tree group unit in the first compression circuit to obtain the first accumulation result, determine the first carry gating signal to gate according to the data operation mode corresponding to the received function selection mode signal, and use the first carry gating signal as the carry input signal of the next addition, which is added to the column values in the first partial products of the target code to obtain the first target operation result. Optionally, the first accumulation result may include the sum output signal Sum and the carry output signal Carry obtained by the accumulation in the modified Wallace tree group unit, where Sum and Carry may have the same bit width. In addition, the accumulation unit in effect adds the sum output signal Sum and the carry output signal Carry. Optionally, the first target operation result may be 0 or non-zero data.
It should be noted that the data processor may use the adder in the accumulation unit to add the carry output signal Carry and the sum output signal Sum output by the modified Wallace tree group unit, and output the addition result. Optionally, each Wallace tree subunit in the modified Wallace tree group unit may output one carry output signal Carry_i and one sum output signal Sum_i (i = 0, ..., 2N-1, where i is the number of each Wallace tree subunit, starting from 0). Optionally, the adder receives Carry = {[Carry_0 : Carry_{2N-2}], 0}; that is, the carry output signal Carry received by the adder has a bit width of N, its first 2N-1 bits correspond to the carry output signals of the first 2N-1 Wallace tree subunits in the modified Wallace tree group unit, and its last bit may be replaced by the value 0. Optionally, the sum output signal Sum received by the adder has a bit width of N, and its bits may equal the sum output signals of the Wallace tree subunits in the modified Wallace tree group unit.
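The relationship between Sum, Carry and the final adder can be illustrated with a small carry-save sketch in Python. It is an assumption-laden simplification: each column is modelled as a single 3:2 compressor (full adder) over three rows, whereas the Wallace tree subunits in the text may compress more rows per column. The names `carry_save_reduce` and `final_add` are hypothetical.

```python
def carry_save_reduce(a: int, b: int, c: int, width: int):
    """Compress three rows into per-column Sum and Carry bit vectors."""
    mask = (1 << width) - 1
    sum_bits = (a ^ b ^ c) & mask                       # Sum_i for each column i
    carry_bits = ((a & b) | (a & c) | (b & c)) & mask   # Carry_i for each column i
    return sum_bits, carry_bits


def final_add(sum_bits: int, carry_bits: int, width: int) -> int:
    """Add Sum to Carry shifted up by one column.

    The shift drops the carry of the top column and appends a 0 as the
    lowest bit, mirroring Carry = {[Carry_0 : Carry_{2N-2}], 0}.
    """
    mask = (1 << width) - 1
    shifted_carry = (carry_bits << 1) & mask
    return (sum_bits + shifted_carry) & mask


s, c = carry_save_reduce(3, 5, 6, 8)
print(final_add(s, c, 8))  # equals (3 + 5 + 6) modulo 2**8
```

The carry of column i has the weight of column i+1, which is why the adder sees the carry vector shifted by one position with a 0 filled in at the lowest bit.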
In this embodiment, from the distribution of the low-order partial products and the high-order partial products of all target codes, the partial products of all target codes span 2N columns in total (N being the bit width of the data currently processed by the data processor). Numbered from the lowest bit, the columns are 0, ..., 2N-1, where columns 0 to N-1 may be called the low N columns. Optionally, the accumulation result may be the carry output signal Cout output by the last high-order Wallace tree subunit.
It should be noted that the N low-order Wallace tree subunits may accumulate the low N columns in number order to obtain the accumulation result. Optionally, the accumulation result may include the carry output signal Carry and the sum output signal Sum of each Wallace tree subunit, as well as the output signal Cout of the last high-order Wallace tree subunit.
It can be understood that the selector in the modified Wallace tree group unit may, according to the received function selection mode signal, gate either the output signal Cout of the last low-order Wallace tree subunit or the value 0 to obtain the carry gating signal.
In this embodiment, from the distribution of the partial products of all target codes, the partial products of all target codes span 2N columns in total (N being the bit width of the data currently processed by the data processor). Numbered from the lowest bit, the columns are 0, ..., 2N-1, where columns N to 2N-1 may be called the high N columns.
It should be noted that the N high-order Wallace tree subunits may accumulate the high N columns in number order and output the accumulation result. The carry input signal received by the first high-order Wallace tree subunit may be the first carry gating signal output by the selector.
S205, compressing the second partial products of the target code according to the function selection mode signal to obtain a second target operation result.
Optionally, the step in S205 above of compressing the second partial products of the target code according to the function selection mode signal to obtain the second target operation result includes: the low-order Wallace tree subunits accumulate the column values in the second partial products of all target codes to obtain a second accumulation result; the selector gates the second accumulation result according to the function selection mode signal to obtain a second carry gating signal; and the high-order Wallace tree subunits perform accumulation according to the second carry gating signal and the column values in the second partial products of the target code to obtain the second target operation result.
Further, the data processor may accumulate the second partial products of the target code through the modified Wallace tree group unit in the second compression circuit to obtain the second accumulation result, gate the second carry gating signal according to the function selection mode signal and the second accumulation result, and then accumulate the second accumulation result according to the second carry gating signal to obtain the second target operation result. Optionally, the second target operation result may be 0 or non-zero data.
In this embodiment, the data processor may perform step S204 and step S205 in parallel; this embodiment places no restriction on the order of the two steps.
The data processing method provided in this embodiment can determine, from the received function selection mode signal, which mode of data operation can currently be processed. It can implement not only multiplication but also multiply-accumulate operations, which improves the versatility of the data processor. In addition, the method does not need a further accumulation pass over the multiplication result to complete a multiply-accumulate operation; a single operation pass directly implements the multiplication or multiply-accumulate operation, which also effectively reduces the power consumption of the data processor. Furthermore, the method can perform canonical signed-digit encoding on the received data to be processed so that fewer effective partial products are obtained, which reduces the complexity of the multiplication or multiply-accumulate operation and improves operation efficiency.
In one embodiment, the step in S203 above of obtaining the first partial products of the target code and the second partial products of the target code according to the target code and the data to be processed includes:
S2031, performing conversion according to the first target code and the data to be processed to obtain a first original partial product.
Specifically, with the data to be processed denoted X, if a value in the first target code is -1, the first original partial product may be -X; if the value is 1, the first original partial product may be X; and if the value is 0, the first original partial product may be 0.
S2032, performing sign extension according to the first original partial product and the data to be processed to obtain the first partial product of the target code.
Specifically, the bit width of the first original partial product may equal the bit width N of the data currently processed by the data processor, and the bit width of the first partial product after sign extension may equal 2 times N. The N bits of the first original partial product may form the low N bits of the sign-extended first partial product, and each of the high N bits of the sign-extended first partial product may equal the highest bit of the first original partial product, that is, its sign bit.
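Steps S2031 and S2032 can be sketched in Python as follows, assuming two's-complement fixed-point data. The function names `original_partial_product` and `sign_extend` are hypothetical, and the sketch ignores the corner case in which -X does not fit in N bits (X equal to the most negative N-bit value).

```python
def original_partial_product(digit: int, x: int, n: int) -> int:
    """Return digit * x as an n-bit two's-complement pattern.

    digit -- one value of the target code, in {-1, 0, 1}
    x     -- the multiplicand X as a signed Python int
    """
    mask = (1 << n) - 1
    return (digit * x) & mask


def sign_extend(pattern: int, n: int) -> int:
    """Extend an n-bit pattern to 2n bits by replicating its sign bit."""
    sign = (pattern >> (n - 1)) & 1
    high = ((1 << n) - 1) if sign else 0  # n copies of the sign bit
    return (high << n) | pattern


for digit in (-1, 0, 1):
    pp = original_partial_product(digit, 5, 8)
    print(digit, hex(pp), hex(sign_extend(pp, 8)))
```

With X = 5 and N = 8, the digit -1 gives the 8-bit pattern 0xfb, which sign-extends to 0xfffb, while the digits 0 and 1 give 0x0 and 0x5 with all-zero high bits.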
S2033, performing the conversion according to the second target code and the data to be processed to obtain a second original partial product.
S2034, performing sign extension according to the second original partial product and the data to be processed to obtain the second partial product of the target code.
Optionally, the data processor may perform steps S2031 and S2032 in parallel with steps S2033 and S2034, and no restriction is placed on the processing order.
With the data processing method provided in this embodiment, fewer effective partial products are obtained, which reduces the complexity of the multiplication or multiply-accumulate operation.
An embodiment of the present application further provides a machine learning operation device, which includes one or more of the data processors described in this application and is configured to obtain data to be operated on and control information from other processing devices, execute the specified machine learning operation, and pass the execution result to peripheral devices through an I/O interface. Peripheral devices include, for example, cameras, displays, mice, keyboards, network cards, WiFi interfaces and servers. When more than one data processor is included, the data processors may be linked and transfer data through a specific structure, for example interconnected through a PCIE bus, to support larger-scale machine learning operations. In this case, the data processors may share one control system or have independent control systems; they may share memory, or each accelerator may have its own memory. In addition, they may be interconnected in any interconnection topology.
The machine learning operation device has high compatibility and can be connected to various types of servers through a PCIE interface.
An embodiment of the present application further provides a combined processing device, which includes the above machine learning operation device, a general interconnection interface and other processing devices. The machine learning operation device interacts with the other processing devices to jointly complete the operation specified by the user. Fig. 9 is a schematic diagram of the combined processing device.
The other processing devices include one or more types of general-purpose or special-purpose processors such as a central processing unit (CPU), a graphics processing unit (GPU) and a neural network processor. The number of processors included in the other processing devices is not limited. The other processing devices serve as the interface between the machine learning operation device and external data and control, including data transfer, and perform basic control of the machine learning operation device such as starting and stopping it; the other processing devices may also cooperate with the machine learning operation device to complete operation tasks.
The general interconnection interface is used to transmit data and control instructions between the machine learning operation device and the other processing devices. The machine learning operation device obtains the required input data from the other processing devices and writes it to the on-chip storage of the machine learning operation device; it may obtain control instructions from the other processing devices and write them to the on-chip control cache of the machine learning operation device; it may also read the data in the storage module of the machine learning operation device and transmit it to the other processing devices.
Optionally, as shown in Fig. 10, the structure may further include a storage device, which is connected to the machine learning operation device and to the other processing devices respectively. The storage device is used to store data of the machine learning operation device and of the other processing devices, and is particularly suitable for data required by an operation that cannot be held entirely in the internal storage of the machine learning operation device or of the other processing devices.
The combined processing device can serve as the SoC (system on chip) of devices such as mobile phones, robots, drones and video surveillance equipment, effectively reducing the die area of the control part, increasing the processing speed and reducing the overall power consumption. In this case, the general interconnection interface of the combined processing device is connected to certain components of the device, for example a camera, display, mouse, keyboard, network card or WiFi interface.
In some embodiments, a chip is also provided, which includes the above machine learning operation device or combined processing device.
In some embodiments, a chip package structure is provided, which includes the above chip.
In some embodiments, a board card is provided, which includes the above chip package structure. As shown in Fig. 11, in addition to the chip 389, the board card may include other supporting components, including but not limited to: a memory device 390, a receiving device 391 and a control device 392;
The memory device 390 is connected to the chip in the chip package structure through a bus and is used to store data. The memory device may include multiple groups of storage units 393. Each group of storage units is connected to the chip through a bus. It can be understood that each group of storage units may be DDR SDRAM (Double Data Rate SDRAM, double data rate synchronous dynamic random access memory).
DDR can double the speed of SDRAM without increasing the clock frequency, because it allows data to be read on both the rising edge and the falling edge of the clock pulse. The speed of DDR is therefore twice that of standard SDRAM. In one embodiment, the memory device may include 4 groups of storage units. Each group of storage units may include multiple DDR4 chips. In one embodiment, the chip may contain four 72-bit DDR4 controllers, in each of which 64 bits are used for data transmission and 8 bits are used for ECC checking. It can be understood that when DDR4-3200 chips are used in each group of storage units, the theoretical data transmission bandwidth can reach 25600 MB/s.
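The 25600 MB/s figure follows from the transfer rate and the data width, as the short Python check below illustrates. The function name `ddr_peak_bandwidth_mb_s` is hypothetical; the calculation assumes that DDR4-3200 denotes 3200 mega-transfers per second and that only the 64 data bits (not the 8 ECC bits) carry payload.

```python
def ddr_peak_bandwidth_mb_s(mega_transfers_per_s: int, data_bits: int) -> int:
    """Theoretical peak bandwidth in MB/s: transfers per second times bytes per transfer."""
    return mega_transfers_per_s * data_bits // 8


# DDR4-3200 with a 64-bit data path: 3200 MT/s * 8 bytes per transfer.
print(ddr_peak_bandwidth_mb_s(3200, 64))
```

This is a per-controller theoretical peak; sustained bandwidth in practice is lower.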
In one embodiment, each group of storage units includes multiple double data rate synchronous dynamic random access memories arranged in parallel. DDR can transmit data twice within one clock cycle. A controller for controlling the DDR is provided in the chip to control the data transmission and data storage of each storage unit.
The receiving device is electrically connected to the chip in the chip package structure. The receiving device is used to implement data transmission between the chip and an external device (such as a server or a computer). For example, in one embodiment, the receiving device may be a standard PCIE interface: the server transfers the data to be processed to the chip through the standard PCIE interface. Preferably, when a PCIE 3.0 x16 interface is used for transmission, the theoretical bandwidth can reach 16000 MB/s. In another embodiment, the receiving device may be another interface; the present application does not limit the specific form of such an interface, as long as the interface unit can implement the transfer function. In addition, the calculation result of the chip is sent back to the external device (such as a server) by the receiving device.
The control device is electrically connected to the chip and is used to monitor the state of the chip. Specifically, the chip and the control device may be electrically connected through an SPI interface. The control device may include a microcontroller unit (MCU). The chip may include multiple processing chips, multiple processing cores or multiple processing circuits, and can drive multiple loads. The chip may therefore be in different working states such as multi-load and light-load. The control device can regulate the working states of the multiple processing chips, multiple processing cores and/or multiple processing circuits in the chip.
In some embodiments, an electronic device is provided, which includes the above board card.
The electronic device may be a data processor, robot, computer, printer, scanner, tablet computer, smart terminal, mobile phone, dashboard camera, navigator, sensor, webcam, server, cloud server, camera, video camera, projector, watch, earphone, mobile storage device, wearable device, vehicle, household appliance and/or medical device.
The vehicle includes an aircraft, a ship and/or a car; the household appliance includes a television, air conditioner, microwave oven, refrigerator, rice cooker, humidifier, washing machine, electric lamp, gas stove or range hood; the medical device includes a nuclear magnetic resonance instrument, a B-mode ultrasound scanner and/or an electrocardiograph.
It should be noted that, for brevity, the foregoing embodiments are each described as a series of circuit combinations, but those skilled in the art should understand that the present application is not limited by the described circuit combinations, because according to the present application some circuits may be implemented in other ways or with other structures. In addition, those skilled in the art should also understand that the embodiments described in the specification are optional embodiments, and the devices and modules involved are not necessarily required by the present application.
In the above embodiments, the description of each embodiment has its own emphasis. For parts not described in detail in one embodiment, reference may be made to the related descriptions of other embodiments.
The above embodiments express only several implementations of the present application. Their description is relatively specific and detailed, but they should not therefore be construed as limiting the scope of the patent. It should be pointed out that those of ordinary skill in the art may make various modifications and improvements without departing from the concept of the present application, and these all fall within the protection scope of the present application. Therefore, the protection scope of this patent shall be subject to the appended claims.

Claims (33)

1. A data processor, characterized in that the data processor comprises: a first multiplication circuit, a second multiplication circuit, and a partial product exchange circuit; the first multiplication circuit includes a first modified encoding sub-circuit and a first modified compression sub-circuit, and the second multiplication circuit includes a second modified encoding sub-circuit and a second modified compression sub-circuit, wherein the first modified encoding sub-circuit includes a first encoding branch and a first selection branch, and the second modified encoding sub-circuit includes a second encoding branch and a second selection branch; the first output of the first modified encoding sub-circuit is connected to the first input of the partial product exchange circuit; the second output of the first modified encoding sub-circuit is connected to the input of the first modified compression sub-circuit; the first output of the partial product exchange circuit is connected to the input of the first modified encoding sub-circuit; the second output of the partial product exchange circuit is connected to the input of the second modified encoding sub-circuit; the first output of the second modified encoding sub-circuit is connected to the second input of the partial product exchange circuit; and the second output of the second modified encoding sub-circuit is connected to the input of the second modified compression sub-circuit;
wherein the first encoding branch is configured to perform canonical signed digit (CSD) encoding on received first data to obtain sign-extended first partial products; the first selection branch is configured to select target-encoded first partial products from the sign-extended first partial products; the first modified compression sub-circuit is configured to compress the target-encoded first partial products to obtain a first target operation result; the second encoding branch is configured to perform CSD encoding on received second data to obtain sign-extended second partial products; the second selection branch is configured to select target-encoded second partial products from the sign-extended second partial products; the second modified compression sub-circuit is configured to compress the target-encoded second partial products to obtain a second target operation result; and the partial product exchange circuit is configured to exchange the sign-extended first partial products and the sign-extended second partial products.
2. The data processor according to claim 1, characterized in that the first multiplication circuit and the second multiplication circuit each include a first input for receiving a function selection mode signal; the partial product exchange circuit includes a third input for receiving the function selection mode signal; and the function selection mode signal is used to determine which of the different modes of data operation the data processor can currently process.
3. The data processor according to claim 2, characterized in that the first modified encoding sub-circuit comprises: a first modified encoding processing branch and a first partial product selection branch, the output of the first modified encoding processing branch being connected to the input of the first partial product selection branch;
wherein the first modified encoding processing branch is configured to perform CSD encoding on the received first data to obtain a first target code; and the first partial product selection branch is configured to obtain the sign-extended first partial products according to the first target code, select among the sign-extended first partial products, receive the sign-extended second partial products output by the partial product exchange circuit, and take the received sign-extended second partial products together with the selected sign-extended first partial products as the target-encoded first partial products.
4. The data processor according to claim 3, characterized in that the first modified encoding processing branch comprises: a first modified encoding unit, a low-order partial product acquisition unit, a low-order selector group unit, a high-order partial product acquisition unit, and a high-order selector group unit; the first output of the first modified encoding unit is connected to the first input of the low-order partial product acquisition unit; the output of the low-order selector group unit is connected to the second input of the low-order partial product acquisition unit; the second output of the first modified encoding unit is connected to the first input of the high-order partial product acquisition unit; and the output of the high-order selector group unit is connected to the second input of the high-order partial product acquisition unit;
wherein the first modified encoding unit is configured to perform CSD encoding on the received first data, determine from the received function selection mode signal the bit width of data that the data processor can process, and obtain the first target code according to that bit width; the low-order partial product acquisition unit is configured to obtain sign-extended first low-order partial products according to the first data and a first low-order target code in the received first target code; the low-order selector group unit is configured to gate values in the sign-extended first low-order partial products; the high-order partial product acquisition unit is configured to obtain sign-extended first high-order partial products according to the first data and a first high-order target code in the received first target code; and the high-order selector group unit is configured to gate values in the sign-extended first high-order partial products.
5. The data processor according to claim 4, characterized in that the first modified encoding unit comprises: a first data input port, a first mode selection signal input port, a low-order target code output port, and a high-order target code output port; the first data input port is for receiving the first data; the first mode selection signal input port is for receiving the function selection mode signal; the low-order target code output port is for outputting the first low-order target code obtained by performing CSD encoding on the first data; and the high-order target code output port is for outputting the first high-order target code obtained by performing CSD encoding on the first data.
6. The data processor according to claim 4 or 5, characterized in that the low-order partial product acquisition unit comprises: a low-order target code input port, a gated value input port, a first data input port, and a low-order partial product output port; the low-order target code input port is for receiving the first low-order target code output by the first modified encoding unit; the gated value input port is for receiving the values in the sign-extended first low-order partial products obtained after gating by the low-order selector group unit; the first data input port is for receiving the first data; and the low-order partial product output port is for outputting the sign-extended first low-order partial products.
7. The data processor according to claim 4, characterized in that the high-order partial product acquisition unit comprises: a high-order target code input port, a gated value input port, a first data input port, and a high-order partial product output port; the high-order target code input port is for receiving the first high-order target code output by the first modified encoding unit; the gated value input port is for receiving the values in the sign-extended first high-order partial products output after gating by the high-order selector group unit; the first data input port is for receiving the first data; and the high-order partial product output port is for outputting the sign-extended first high-order partial products.
8. The data processor according to claim 4, characterized in that the low-order selector group unit comprises: a low-order selector, the low-order selector being configured to gate values in the sign-extended first low-order partial products.
9. The data processor according to claim 4, characterized in that the high-order selector group unit comprises: a high-order selector, the high-order selector being configured to gate values in the sign-extended first high-order partial products.
10. The data processor according to claim 3, characterized in that the first partial product selection branch comprises: a function selection mode signal input port, a first partial product input port, a second partial product input port, a first partial product output port, and a gated partial product output port; the function selection mode signal input port is for receiving the function selection mode signal; the first partial product input port is for receiving the sign-extended first partial products output by the first modified encoding unit; the second partial product input port is for receiving the sign-extended second partial products exchanged by the partial product exchange circuit; the first partial product output port is for outputting the sign-extended first partial products that the partial product exchange circuit needs to exchange; and the gated partial product output port is for outputting the gated sign-extended first partial products and the received sign-extended second partial products.
11. The data processor according to claim 1, characterized in that the first modified compression sub-circuit comprises: a modified Wallace tree group unit and an accumulation unit, the output of the modified Wallace tree group unit being connected to the input of the accumulation unit; the modified Wallace tree group unit is configured, when data operations of different modes are processed, to accumulate the values in each column of the obtained target-encoded first partial products to obtain an accumulation result; and the accumulation unit is configured to perform an addition operation on the accumulation result.
12. The data processor according to claim 11, characterized in that the modified Wallace tree group unit comprises: a low-order Wallace tree sub-unit, a selector, and a high-order Wallace tree sub-unit; the output of the low-order Wallace tree sub-unit is connected to the input of the selector, and the output of the selector is connected to the input of the high-order Wallace tree sub-unit; wherein the low-order Wallace tree sub-unit is configured to accumulate the values in each column of the target-encoded first partial products to obtain the accumulation result; the selector is configured to gate the carry input signal received by the high-order Wallace tree sub-unit; and the high-order Wallace tree sub-unit is configured to accumulate the values in each column of the target-encoded first partial products to obtain the accumulation result.
13. The data processor according to claim 11, characterized in that the accumulation unit comprises: an adder, the adder being configured to perform an addition operation on the accumulation result.
14. The data processor according to claim 1, characterized in that the second modified encoding sub-circuit comprises: a second modified encoding processing branch and a second partial product selection branch, the output of the second modified encoding processing branch being connected to the input of the second partial product selection branch;
wherein the second modified encoding processing branch is configured to perform CSD encoding on the received second data to obtain a second target code; and the second partial product selection branch is configured to obtain the sign-extended second partial products according to the second target code, select among the sign-extended second partial products, receive the sign-extended first partial products output by the partial product exchange circuit, and take the received sign-extended first partial products together with the selected sign-extended second partial products as the target-encoded second partial products.
15. The data processor according to claim 14, characterized in that the second partial product selection branch comprises: a function selection mode signal input port, a second partial product input port, a first partial product input port, a second partial product output port, and a gated partial product output port; the function selection mode signal input port is for receiving the function selection mode signal; the second partial product input port is for receiving the sign-extended second partial products output by the second modified encoding processing branch; the first partial product input port is for receiving the sign-extended first partial products obtained after exchange by the partial product exchange circuit; the second partial product output port is for outputting the sign-extended second partial products that the partial product exchange circuit needs to exchange; and the gated partial product output port is for outputting the gated sign-extended second partial products and the received sign-extended first partial products.
16. The data processor according to claim 1, characterized in that the partial product exchange circuit comprises: a function selection mode signal input port, a first partial product input port, a first partial product output port, a second partial product input port, and a second partial product output port; the function selection mode signal input port is for receiving the function selection mode signal; the first partial product input port is for receiving the sign-extended first partial products to be exchanged that are output by the first partial product selection branch; the first partial product output port is for outputting the sign-extended first partial products; the second partial product input port is for receiving the sign-extended second partial products to be exchanged that are output by the second partial product selection branch; and the second partial product output port is for outputting the sign-extended second partial products.
17. A data processing method, characterized in that the method comprises:
receiving data to be processed and a function selection mode signal, wherein the function selection mode signal indicates which of the different modes of data operation the data processor can currently process;
determining, according to the function selection mode signal, whether the data to be processed needs to be split;
if the data to be processed needs to be split, splitting the data to be processed to obtain split data;
performing canonical signed digit (CSD) encoding on the split data to obtain a target code;
performing conversion according to the target code and the split data to obtain sign-extended partial products;
determining, according to the function selection mode signal, whether the sign-extended partial products need to be exchanged;
if the sign-extended partial products do not need to be exchanged, taking the sign-extended partial products as target-encoded partial products;
compressing the target-encoded partial products to obtain a target operation result.
18. The method according to claim 17, characterized in that determining, according to the function selection mode signal, whether the data to be processed needs to be split comprises: determining, according to the function selection mode signal, whether the bit width of the data to be processed is equal to the data bit width of the corresponding mode of operation that the data processor can currently process.
19. The method according to claim 18, characterized in that, after determining, according to the function selection mode signal, whether the bit width of the data to be processed is equal to the data bit width of the corresponding mode of operation that the data processor can currently process, the method further comprises: if the data to be processed does not need to be split, proceeding to perform CSD encoding on the data to be processed to obtain the target code.
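The split decision in claims 18 and 19 can be illustrated with a minimal sketch. It assumes that the operand width is an integer multiple of the width the processor handles in the selected mode, and that lanes are returned low-order first. The function name `split_operand` and both width parameters are hypothetical, and the way the function selection mode signal maps to a width is not stated in the claims.

```python
def split_operand(value, data_width, lane_width):
    """Split `value` into lane_width-bit pieces, low-order piece first.

    Assumption: no split is needed when the two widths are equal
    (the case claim 19 describes); otherwise data_width must be a
    multiple of lane_width.
    """
    if data_width == lane_width:
        # Widths match: encode the operand directly, no split.
        return [value & ((1 << data_width) - 1)]
    if data_width % lane_width != 0:
        raise ValueError("data_width must be a multiple of lane_width")
    lane_mask = (1 << lane_width) - 1
    lanes = data_width // lane_width
    return [(value >> (i * lane_width)) & lane_mask for i in range(lanes)]


if __name__ == "__main__":
    print([hex(p) for p in split_operand(0xABCD, 16, 8)])
```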
20. The method according to any one of claims 17 to 19, characterized in that performing CSD encoding on the split data to obtain the target code comprises: converting each run of l consecutive bits of value 1 in the split data into (l+1) digits in which the highest digit is 1, the lowest digit is -1, and the remaining digits are 0, to obtain the target code, where l is greater than or equal to 2.
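The run-replacement rule in claim 20 can be sketched in software. The example below uses the standard non-adjacent-form recurrence, which is one common way to compute a canonical signed digit representation; it is an illustration under that assumption, not the encoder circuit of the patent, and the names `csd_digits` and `csd_value` are hypothetical.

```python
def csd_digits(n):
    """Return the signed digits of n, least significant first.

    Each digit is -1, 0, or 1, and no two adjacent digits are nonzero.
    A run of l >= 2 ones becomes a 1 followed by zeros and a final -1,
    e.g. 0111 -> 1, 0, 0, -1 (8 - 1 = 7).
    """
    digits = []
    while n != 0:
        if n & 1:
            # Low bits 01 -> emit +1; low bits 11 (a run of ones) -> emit -1.
            d = 2 - (n & 3)
            n -= d
        else:
            d = 0
        digits.append(d)
        n >>= 1
    return digits


def csd_value(digits):
    """Rebuild the integer from signed digits (least significant first)."""
    return sum(d << i for i, d in enumerate(digits))


if __name__ == "__main__":
    print(csd_digits(7))  # digits listed from the least significant position
```

Because every nonzero digit later becomes one partial product, fewer nonzero digits mean fewer rows to compress.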
21. The method according to claim 17, characterized in that performing CSD encoding on the split data to obtain the target code comprises:
performing CSD encoding on the split data to obtain an intermediate code;
obtaining the target code according to the intermediate code and the function selection mode signal.
22. The method according to claim 17, characterized in that performing conversion according to the target code and the split data to obtain the sign-extended partial products comprises:
performing conversion according to the target code and the split data to obtain initial partial products;
performing sign extension on the initial partial products to obtain the sign-extended partial products.
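The two steps of claim 22 can be sketched as follows, assuming signed-digit codes with digits in {-1, 0, 1} (least significant first) and a result field twice the operand width. The function name `partial_products` and its parameters are hypothetical. Masking a Python integer to `2 * width` bits yields its two's-complement pattern, which stands in for hardware sign extension here.

```python
def partial_products(x, digits, width):
    """Build one sign-extended partial product per code digit.

    x      : multiplicand (a signed Python int that fits in `width` bits)
    digits : signed digits of the multiplier, least significant first
    width  : operand width; each row is held in 2 * width bits
    """
    mask = (1 << (2 * width)) - 1
    rows = []
    for i, d in enumerate(digits):
        initial = (d * x) << i       # initial partial product: -x, 0, or +x, shifted
        rows.append(initial & mask)  # two's-complement pattern, sign-extended
    return rows


if __name__ == "__main__":
    # 7 encoded as 8 - 1, i.e. digits -1, 0, 0, 1 from the low end.
    rows = partial_products(-5, [-1, 0, 0, 1], 8)
    print([hex(r) for r in rows])
```

Summing the rows modulo 2^(2 * width) gives the two's-complement product, which is what the compression stage computes.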
23. The method according to claim 17, characterized in that determining, according to the function selection mode signal, whether the sign-extended partial products need to be exchanged comprises: determining, according to the function selection mode signal, whether the bit widths of the data currently processed by the data processor are the same.
24. The method according to claim 23, characterized in that, after determining, according to the function selection mode signal, whether the sign-extended partial products need to be exchanged, the method further comprises: if the sign-extended partial products need to be exchanged, exchanging the high-order partial products or the low-order partial products among the sign-extended partial products.
25. The method according to claim 17, characterized in that compressing the target-encoded partial products to obtain the target operation result comprises:
accumulating the target-encoded partial products to obtain an intermediate operation result;
accumulating the intermediate operation result to obtain the target operation result.
26. The method according to claim 25, characterized in that accumulating the intermediate operation result to obtain the target operation result comprises:
accumulating, by a low-order Wallace tree sub-unit, the column values in all the target-encoded partial products to obtain an accumulation result;
gating, by a selector, the accumulation result according to the function selection mode signal to obtain a carry gating signal;
accumulating, by a high-order Wallace tree sub-unit, according to the carry gating signal and the column values in the target-encoded partial products, to obtain the target operation result.
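The carry gating between the low-order and high-order Wallace tree sub-units can be modeled in software. This is a hedged sketch, not the patented circuit: it assumes 3:2 carry-save compressors, two equal-width halves, and a boolean `split` flag standing in for the function selection mode signal. The function name `compress` is hypothetical.

```python
def compress(rows, width, split):
    """Reduce partial-product rows to a single `width`-bit result.

    When `split` is True the low and high halves act as independent
    lanes: the carry leaving the low half is gated off, so it never
    reaches the high half.
    """
    half = width // 2
    mask = (1 << width) - 1
    # After the left shift, bit `half` of a carry word is the carry out
    # of the low half; clearing it models the selector blocking it.
    carry_mask = mask & ~(1 << half) if split else mask

    def csa(a, b, c):
        # 3:2 compressor: three rows in, a sum row and a carry row out.
        s = a ^ b ^ c
        carry = (((a & b) | (a & c) | (b & c)) << 1) & carry_mask
        return s, carry

    rows = [r & mask for r in rows]
    while len(rows) > 2:
        full = len(rows) - len(rows) % 3
        nxt = []
        for i in range(0, full, 3):
            nxt.extend(csa(*rows[i:i + 3]))
        nxt.extend(rows[full:])  # rows left over pass to the next level
        rows = nxt
    a, b = (rows + [0, 0])[:2]
    if not split:
        return (a + b) & mask
    # Final addition per lane, again without a carry between lanes.
    lane = (1 << half) - 1
    lo = ((a & lane) + (b & lane)) & lane
    hi = ((a >> half) + (b >> half)) & lane
    return (hi << half) | lo


if __name__ == "__main__":
    rows = [0x00FF, 0x0001, 0x0101]
    print(hex(compress(rows, 16, split=False)), hex(compress(rows, 16, split=True)))
```

With the carry path open, the structure behaves as one wide accumulator. With it gated, the same hardware model yields two narrower, independent sums, which is the apparent purpose of the selector between the two sub-units.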
27. A machine learning operation device, characterized in that the machine learning operation device includes one or more data processors according to any one of claims 1-16, and is used to obtain input data to be operated on and control information from other processing devices, execute specified machine learning operations, and pass the execution results to other processing devices through an I/O interface;
when the machine learning operation device includes multiple data processors, the multiple data processors can be connected and transmit data through a preset specific structure;
wherein the multiple data processors are interconnected and transmit data through a PCIE bus to support larger-scale machine learning operations; the multiple data processors share one control system or have their own control systems; the multiple data processors share memory or have their own memories; and the interconnection of the multiple data processors may be any interconnection topology.
28. A combined processing device, characterized in that the combined processing device includes the machine learning operation device according to claim 27, a universal interconnection interface, and other processing devices;
the machine learning operation device interacts with the other processing devices to jointly complete computing operations specified by a user.
29. The combined processing device according to claim 28, characterized in that it further comprises: a storage device, the storage device being connected to the machine learning operation device and to the other processing devices respectively, for saving data of the machine learning operation device and the other processing devices.
30. A neural network chip, characterized in that the chip includes the machine learning operation device according to claim 27, or the combined processing device according to claim 28, or the combined processing device according to claim 29.
31. An electronic device, characterized in that the electronic device includes the chip according to claim 30.
32. A board card, characterized in that the board card includes: a memory device, a receiving device, a control device, and the neural network chip according to claim 30;
wherein the neural network chip is connected to the memory device, the control device, and the receiving device respectively;
the memory device is used for storing data;
the receiving device is used for implementing data transmission between the chip and external equipment;
the control device is used for monitoring the state of the chip.
33. The board card according to claim 32, characterized in that:
the memory device includes multiple groups of storage units, each group of storage units being connected to the chip by a bus, the storage units being DDR SDRAM;
the chip includes a DDR controller for controlling data transmission to and data storage in each storage unit;
the receiving device is a standard PCIE interface.
CN201910902610.3A 2019-09-24 2019-09-24 Data processor, method, chip and electronic equipment Active CN110413254B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201910902610.3A CN110413254B (en) 2019-09-24 2019-09-24 Data processor, method, chip and electronic equipment
CN201911349822.XA CN111008003B (en) 2019-09-24 2019-09-24 Data processor, method, chip and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910902610.3A CN110413254B (en) 2019-09-24 2019-09-24 Data processor, method, chip and electronic equipment

Related Child Applications (1)

Application Number Title Priority Date Filing Date
CN201911349822.XA Division CN111008003B (en) 2019-09-24 2019-09-24 Data processor, method, chip and electronic equipment

Publications (2)

Publication Number Publication Date
CN110413254A true CN110413254A (en) 2019-11-05
CN110413254B CN110413254B (en) 2020-01-10

Family

ID=68370615

Family Applications (2)

Application Number Title Priority Date Filing Date
CN201911349822.XA Active CN111008003B (en) 2019-09-24 2019-09-24 Data processor, method, chip and electronic equipment
CN201910902610.3A Active CN110413254B (en) 2019-09-24 2019-09-24 Data processor, method, chip and electronic equipment

Country Status (1)

Country Link
CN (2) CN111008003B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1454347A (en) * 2000-10-16 2003-11-05 诺基亚公司 Multiplier and shift device using signed digit representation
US20070180015A1 (en) * 2005-12-09 2007-08-02 Sang-In Cho High speed low power fixed-point multiplier and method thereof
CN101923459A (en) * 2009-06-17 2010-12-22 复旦大学 Reconfigurable multiplication/addition arithmetic unit for digital signal processing
CN103955585A (en) * 2014-05-13 2014-07-30 复旦大学 FIR (finite impulse response) filter structure for low-power fault-tolerant circuit
CN104011665A (en) * 2011-12-23 2014-08-27 英特尔公司 Super Multiply Add (Super MADD) Instruction
CN105183424A (en) * 2015-08-21 2015-12-23 电子科技大学 Fixed-bit-width multiplier with high accuracy and low energy consumption properties
CN110190843A (en) * 2018-04-10 2019-08-30 北京中科寒武纪科技有限公司 Compressor circuit, Wallace tree circuit, multiplier circuit, chip and equipment

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5784305A (en) * 1995-05-01 1998-07-21 Nec Corporation Multiply-adder unit
CN1324456C (en) * 2004-01-09 2007-07-04 上海交通大学 Digital signal processor using mixed compression two stage flow multiplicaton addition unit
CN100356315C (en) * 2004-09-02 2007-12-19 中国人民解放军国防科学技术大学 Design method of number mixed multipler for supporting single-instruction multiple-operated
CN100552620C (en) * 2007-09-21 2009-10-21 清华大学 Large number multiplication device based on quadratic B ooth coding
CN101625634A (en) * 2008-07-09 2010-01-13 中国科学院半导体研究所 Reconfigurable multiplier
CN102591615A (en) * 2012-01-16 2012-07-18 中国人民解放军国防科学技术大学 Structured mixed bit-width multiplying method and structured mixed bit-width multiplying device
CN107977191B (en) * 2016-10-21 2021-07-27 中国科学院微电子研究所 Low-power-consumption parallel multiplier
CN108459840B (en) * 2018-02-14 2021-07-09 中国科学院电子学研究所 SIMD structure floating point fusion point multiplication operation unit

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
HENG QUAN et al.: "A Novel Vector/SIMD Multiply-Accumulate Unit based on Reconfigurable Booth Array", 2010 10th IEEE International Conference on Solid-State and Integrated Circuit Technology *

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113031911A (en) * 2019-12-24 2021-06-25 上海寒武纪信息科技有限公司 Multiplier, data processing method, device and chip
CN113031918A (en) * 2019-12-24 2021-06-25 上海寒武纪信息科技有限公司 Data processor, method, device and chip
CN113033799A (en) * 2019-12-24 2021-06-25 上海寒武纪信息科技有限公司 Data processor, method, device and chip
CN113033788A (en) * 2019-12-24 2021-06-25 上海寒武纪信息科技有限公司 Data processor, method, device and chip
CN113033788B (en) * 2019-12-24 2023-08-18 上海寒武纪信息科技有限公司 Data processor, method, device and chip
CN113033799B (en) * 2019-12-24 2023-09-08 上海寒武纪信息科技有限公司 Data processor, method, device and chip
CN111767025A (en) * 2020-08-04 2020-10-13 腾讯科技(深圳)有限公司 Chip comprising multiply-accumulator, terminal and control method of floating-point operation
CN111767025B (en) * 2020-08-04 2023-11-21 腾讯科技(深圳)有限公司 Chip comprising multiply accumulator, terminal and floating point operation control method
CN112558920A (en) * 2020-12-21 2021-03-26 清华大学 Signed/unsigned multiply-accumulate device and method
CN112558920B (en) * 2020-12-21 2022-09-09 清华大学 Signed/unsigned multiply-accumulate device and method

Also Published As

Publication number Publication date
CN110413254B (en) 2020-01-10
CN111008003B (en) 2023-10-13
CN111008003A (en) 2020-04-14

Similar Documents

Publication Publication Date Title
CN110413254A (en) Data processor, method, chip and electronic equipment
CN110362293A (en) Multiplier, data processing method, chip and electronic equipment
CN109086076A (en) Neural network processing device and its method for executing dot product instruction
CN110515589A (en) Multiplier, data processing method, chip and electronic equipment
CN110163358A (en) Computing device and method
CN107957976A (en) Computation method and related product
CN110531954A (en) Multiplier, data processing method, chip and electronic equipment
CN110515587A (en) Multiplier, data processing method, chip and electronic equipment
CN105913118A (en) Artificial neural network hardware implementation device based on probability calculation
CN110515590A (en) Multiplier, data processing method, chip and electronic equipment
CN107957977A (en) Computation method and related product
CN109711540A (en) Computing device and board
CN111258544B (en) Multiplier, data processing method, chip and electronic equipment
CN110688087B (en) Data processor, method, chip and electronic equipment
CN110647307B (en) Data processor, method, chip and electronic equipment
CN110515588A (en) Multiplier, data processing method, chip and electronic equipment
CN111258541A (en) Multiplier, data processing method, chip and electronic equipment
CN110515586A (en) Multiplier, data processing method, chip and electronic equipment
CN210006029U (en) Data processor
CN210006030U (en) Data processor
CN110554854B (en) Data processor, method, chip and electronic equipment
CN209895329U (en) Multiplier and method for generating a digital signal
CN111260070B (en) Operation method, device and related product
CN111381875B (en) Data comparator, data processing method, chip and electronic equipment
CN210109789U (en) Data processor

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant