CN110515587A - Multiplier, data processing method, chip and electronic equipment - Google Patents

Info

Publication number
CN110515587A
Authority
CN
China
Prior art keywords
product
data
target code
circuit
multiplier
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910817971.8A
Other languages
Chinese (zh)
Other versions
CN110515587B (en)
Inventor
Inventor name withheld from publication
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Cambricon Information Technology Co Ltd
Original Assignee
Shanghai Cambricon Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Cambricon Information Technology Co Ltd filed Critical Shanghai Cambricon Information Technology Co Ltd
Priority to CN201910817971.8A priority Critical patent/CN110515587B/en
Publication of CN110515587A publication Critical patent/CN110515587A/en
Application granted granted Critical
Publication of CN110515587B publication Critical patent/CN110515587B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00 Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/38 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
    • G06F7/48 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
    • G06F7/52 Multiplying; Dividing
    • G06F7/523 Multiplying only
    • G06F7/53 Multiplying only in parallel-parallel fashion, i.e. both operands being entered in parallel
    • G06F7/5318 Multiplying only in parallel-parallel fashion, i.e. both operands being entered in parallel with column wise addition of partial products, e.g. using Wallace tree, Dadda counters
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means

Abstract

The application provides a multiplier, a data processing method, a chip, and electronic equipment. The multiplier includes an improved canonical signed number encoding circuit, an improved Wallace tree group circuit, and a summation circuit. The output of the improved canonical signed number encoding circuit is connected to the input of the improved Wallace tree group circuit, and the output of the improved Wallace tree group circuit is connected to the input of the summation circuit. The multiplier applies canonical signed number encoding to the received data through the encoding circuit, which yields fewer valid partial products and thereby reduces the complexity of the multiplication.

Description

Multiplier, data processing method, chip and electronic equipment
Technical field
This application relates to the field of computer technology, and in particular to a multiplier, a data processing method, a chip, and electronic equipment.
Background
With the continuous development of digital electronic technology, artificial intelligence (AI) chips of all kinds have become a focus of the technology industry and of public attention. The multiplier circuit is one of the main circuits of an AI chip, so its performance is particularly important.
Currently, a multiplier encodes every three bits of the multiplier operand as one group, obtains partial products from the multiplicand, and compresses all partial products with a Wallace tree to obtain the multiplication result. In the conventional technique, however, the encoding contains many non-zero digits, so many corresponding partial products are generated, which makes the multiplication more complex to implement.
Summary of the invention
In view of the above technical problem, it is necessary to provide a multiplier, a data processing method, a chip, and electronic equipment that reduce the number of valid partial products obtained during multiplication and thereby reduce the complexity of the multiplication.
An embodiment of the application provides a multiplier. The multiplier includes an improved canonical signed number encoding circuit, an improved Wallace tree group circuit, and a summation circuit. The improved Wallace tree group circuit includes a 4-2 compressor, and the 4-2 compressor includes a selection circuit and a full adder. The output of the improved canonical signed number encoding circuit is connected to the input of the improved Wallace tree group circuit, and the output of the improved Wallace tree group circuit is connected to the input of the summation circuit;
The improved canonical signed number encoding circuit performs canonical signed number encoding on the received data, obtains sign-extended partial products, and obtains the partial products of the target code from the sign-extended partial products. The improved Wallace tree group circuit accumulates the partial products of the target code to obtain an accumulation result, and the summation circuit accumulates that accumulation result.
In one embodiment, the improved canonical signed number encoding circuit includes a first input for receiving a function selection mode signal, and the improved Wallace tree group circuit includes a second input for receiving the function selection mode signal. The function selection mode signal determines the data bit width that the multiplier can process.
In one embodiment, the improved canonical signed number encoding circuit includes an improved canonical signed number encoding unit, a low-order partial product acquisition unit, a low-order selector group unit, a high-order partial product acquisition unit, and a high-order selector group unit. The first output of the improved canonical signed number encoding unit is connected to the first input of the low-order partial product acquisition unit, and the output of the low-order selector group unit is connected to the second input of the low-order partial product acquisition unit. The second output of the improved canonical signed number encoding unit is connected to the first input of the high-order partial product acquisition unit, and the output of the high-order selector group unit is connected to the second input of the high-order partial product acquisition unit;
The improved canonical signed number encoding unit performs canonical signed number encoding on the received first data, determines from the received function selection mode signal the bit width of the data that the multiplier can process, and obtains the target code according to that bit width. The low-order partial product acquisition unit obtains sign-extended low-order partial products from the low-order target code in the received target code and the second data, and obtains the low-order partial products of the target code from the sign-extended low-order partial products. The low-order selector group unit gates the values in the sign-extended low-order partial products. The high-order partial product acquisition unit obtains sign-extended high-order partial products from the high-order target code in the received target code and the second data, and obtains the high-order partial products of the target code from the sign-extended high-order partial products. The high-order selector group unit gates the values in the sign-extended high-order partial products.
In one embodiment, the improved canonical signed number encoding unit includes a first data input port, a first mode selection signal input port, a low-order target code output port, and a high-order target code output port. The first data input port receives the first data, and the first mode selection signal input port receives the function selection mode signal. The low-order target code output port outputs the low-order target code obtained by applying canonical signed number encoding to the first data, and the high-order target code output port outputs the high-order target code obtained by applying canonical signed number encoding to the first data.
In one embodiment, the low-order partial product acquisition unit includes a low-order target code input port, a first gated value input port, a second mode selection signal input port, a second data input port, and a low-order partial product output port. The low-order target code input port receives the low-order target code. The first gated value input port receives the values of the low-order partial products that the low-order selector group unit outputs after gating. The second mode selection signal input port receives the function selection mode signal, the second data input port receives the second data, and the low-order partial product output port outputs the low-order partial products of the target code.
In one embodiment, the low-order selector group unit includes a low-order selector, which gates the values in the sign-extended low-order partial products.
In one embodiment, the high-order partial product acquisition unit includes a high-order target code input port, a second gated value input port, a third mode selection signal input port, a second data input port, and a high-order partial product output port. The high-order target code input port receives the high-order target code. The second gated value input port receives the values of the high-order partial products that the high-order selector group unit outputs after gating. The third mode selection signal input port receives the function selection mode signal, the second data input port receives the second data, and the high-order partial product output port outputs the high-order partial products of the target code.
In one embodiment, the high-order selector group unit includes a high-order selector, which gates the values in the sign-extended high-order partial products.
In one embodiment, the improved Wallace tree group circuit includes improved Wallace tree sub-circuits. The improved Wallace tree sub-circuits accumulate the values in each column of the partial products of the target code to obtain the accumulation result.
In one embodiment, the improved Wallace tree group circuit includes low-order improved Wallace tree sub-circuits, a selector, and high-order improved Wallace tree sub-circuits. The output of the low-order improved Wallace tree sub-circuit is connected to the input of the selector, and the output of the selector is connected to the input of the high-order improved Wallace tree sub-circuit. The multiple low-order improved Wallace tree sub-circuits accumulate the values in each column of the partial products of all target codes, the selector gates the carry input signal received by the high-order improved Wallace tree sub-circuit, and the multiple high-order improved Wallace tree sub-circuits accumulate the values in each column of the partial products of all target codes.
In one embodiment, the low-order improved Wallace tree sub-circuit and the high-order improved Wallace tree sub-circuit each include the 4-2 compressor and a mode selection unit, and the output of the mode selection unit is connected to the input of the 4-2 compressor. The 4-2 compressor accumulates the values in each column of the partial products of all target codes, and the mode selection unit gates the values of the partial products of the target code that the 4-2 compressor receives. The mode selection unit includes a first input for receiving the function selection mode signal.
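The column compression performed by a 4-2 compressor can be illustrated with a small bit-level model. The sketch below is a minimal illustration, assuming the common textbook construction from two chained full adders; the names `full_adder`, `compressor_4_2`, and `gated` are hypothetical, and `gated` is only a stand-in for the mode selection unit, whose actual circuit is not specified at this point in the text.

```python
def full_adder(a, b, c):
    """Return (sum, carry) for three input bits."""
    return a ^ b ^ c, (a & b) | (a & c) | (b & c)


def compressor_4_2(x1, x2, x3, x4, cin):
    """Textbook 4-2 compressor built from two chained full adders.

    Returns (sum, carry, cout) such that
    x1 + x2 + x3 + x4 + cin == sum + 2 * (carry + cout).
    This construction is an assumption, not the patent's stated circuit.
    """
    s1, cout = full_adder(x1, x2, x3)   # cout does not depend on cin
    s, carry = full_adder(s1, x4, cin)
    return s, carry, cout


def gated(bit, enable):
    """Hypothetical mode-selection gate: pass the bit or force it to 0."""
    return bit & enable
```

The defining property is that the five input bits and the three output bits represent the same count: x1 + x2 + x3 + x4 + cin = sum + 2 * (carry + cout), where carry and cout both feed the next higher column.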
In one embodiment, the summation circuit includes an adder, which accumulates the accumulation result.
In one embodiment, the adder includes a carry signal input port, a sum signal input port, and an operation result output port. The carry signal input port receives the carry signal, the sum signal input port receives the sum signal, and the operation result output port outputs the target operation result obtained by accumulating the carry signal and the sum signal.
The multiplier provided by this embodiment can multiply data of several different bit widths, which effectively reduces the area the multiplier occupies on an AI chip. In addition, because the multiplier accumulates the partial products of the target code through the improved Wallace tree group circuit, the power consumption of the multiplier can also be reduced effectively. In this multiplier, the canonical signed number encoding circuit performs canonical signed number encoding on the received data to obtain the partial products of the target code, the improved Wallace tree group circuit accumulates the partial products of the target code, and the summation circuit accumulates once more the accumulation result obtained by the improved Wallace tree group circuit to obtain the target operation result of the multiplication. Because the received data are encoded with canonical signed number encoding, fewer valid partial products are obtained, which reduces the complexity of the multiplication.
An embodiment of the application provides a data processing method, the method comprising:
Receiving data to be processed and a function selection mode signal, where the function selection mode signal indicates the bit width of the data that can currently be processed;
Performing canonical signed number encoding on the data to be processed to obtain an intermediate code;
Performing bit-padding processing on the intermediate code according to the function selection mode signal to obtain a target code;
Performing conversion processing on the data to be processed and the target code to obtain the partial products of the target code;
Accumulating the column values in the partial products of the target code through the improved Wallace tree group circuit to obtain a target operation result.
In one embodiment, performing canonical signed number encoding on the data to be processed to obtain the intermediate code includes: converting each run of l consecutive bits of value 1 in the data to be processed into (l+1) digits in which the highest digit is 1, the lowest digit is -1, and the remaining digits are 0, thereby obtaining the intermediate code, where l is greater than or equal to 2.
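The conversion leaves the numerical value unchanged. Writing k for the position of the lowest bit of the run (a symbol introduced here only for illustration), a run of l ones satisfies the geometric-sum identity:

```latex
\sum_{i=k}^{k+l-1} 2^{i} \;=\; 2^{k+l} - 2^{k}, \qquad l \ge 2
```

For l = 2 and k = 0 this gives 3 = 4 - 1, that is, binary 11 becomes the digits 1, 0, -1.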
In one embodiment, performing bit-padding processing on the intermediate code according to the function selection mode signal to obtain the target code includes:
Determining, according to the function selection mode signal, whether bit-padding processing needs to be performed on the intermediate code;
If bit-padding processing is needed, padding a value 0 at the position one bit above the highest digit of the intermediate code to obtain the target code.
In one embodiment, the method further includes: if bit-padding processing is not needed, using the intermediate code as the target code.
In one embodiment, performing conversion processing on the data to be processed and the target code to obtain the partial products of the target code includes:
Performing conversion processing on the low-order target code in the target code and the data to be processed to obtain the low-order partial products of the target code;
Performing conversion processing on the high-order target code in the target code and the data to be processed to obtain the high-order partial products of the target code.
In one embodiment, performing conversion processing on the low-order target code in the target code and the data to be processed to obtain the low-order partial products of the target code includes:
Obtaining sign-extended partial products from the low-order target code and the data to be processed;
Gating the values in the sign-extended low-order partial products through the low-order selector group unit;
Performing conversion processing on the gated values of the sign-extended low-order partial products and the values in the sign-extended partial products to obtain the sign-extended low-order partial products;
Obtaining the low-order partial products of the target code from the sign-extended low-order partial products.
In one embodiment, performing conversion processing on the high-order target code in the target code and the data to be processed to obtain the high-order partial products of the target code includes:
Obtaining sign-extended partial products from the high-order target code and the data to be processed;
Gating the values in the sign-extended high-order partial products through the high-order selector group unit;
Obtaining the sign-extended high-order partial products from the gated values of the sign-extended high-order partial products and the values in the sign-extended partial products;
Obtaining the high-order partial products of the target code from the sign-extended high-order partial products.
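How a signed-digit target code and a second operand give rise to sign-extended partial products can be sketched behaviorally. The example below is a minimal sketch under stated assumptions: operands are treated as unsigned, each digit of the code produces one row, and the selector group units are not modeled. The function names `partial_products` and `accumulate` are hypothetical.

```python
def partial_products(digits, multiplicand, width):
    """Build one sign-extended partial product per code digit.

    digits: signed digits in {-1, 0, 1}, most significant first.
    multiplicand: non-negative integer (unsigned operand assumed).
    width: bit width of each extended row, e.g. twice the operand width.
    Each row is returned as an unsigned `width`-bit two's-complement pattern.
    """
    mask = (1 << width) - 1
    rows = []
    for shift, digit in enumerate(reversed(digits)):
        if digit not in (-1, 0, 1):
            raise ValueError("code digits must be -1, 0 or 1")
        # digit * multiplicand is -X, 0 or +X; masking a negative Python int
        # yields its two's-complement pattern, i.e. the sign-extended row.
        rows.append(((digit * multiplicand) << shift) & mask)
    return rows


def accumulate(rows, width):
    """Sum the rows modulo 2**width (behavioral stand-in for the tree)."""
    return sum(rows) & ((1 << width) - 1)


# 3 is encoded as the digits 1, 0, -1; multiply it by 5 using 8-bit rows.
rows = partial_products([1, 0, -1], 5, 8)
product = accumulate(rows, 8)
```

A digit of 0 contributes an all-zero row, so a code with fewer non-zero digits leaves fewer rows that actually need to be added, which is the reduction in valid partial products that the text describes.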
In one embodiment, accumulating the column values in the partial products of the target code through the improved Wallace tree group circuit to obtain the target operation result includes:
Accumulating the column values in the partial products of all target codes through the low-order improved Wallace tree sub-circuits to obtain an intermediate operation result;
Gating the intermediate operation result through the selector to obtain a carry gating signal;
Accumulating, through the high-order improved Wallace tree sub-circuits, according to the carry gating signal and the column values in the partial products of the target code, to obtain the target operation result.
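The role of the selector between the low-order and high-order column groups can be illustrated with a behavioral model. The sketch below is an assumption-laden illustration, not the circuit itself: rows are plain non-negative integers, the two column groups are of equal width, and the name `accumulate_with_carry_gate` and its parameters are hypothetical.

```python
def accumulate_with_carry_gate(rows, half_width, pass_carry):
    """Accumulate partial-product rows in a low and a high column group.

    rows: non-negative integers, each 2 * half_width bits wide.
    pass_carry: True lets the low group's carry-out reach the high group
                (one wide product); False blocks it (two independent
                narrower results packed side by side).
    """
    mask = (1 << half_width) - 1
    low_sum = sum(row & mask for row in rows)      # low-order columns
    carry_out = low_sum >> half_width              # intermediate result
    carry_in = carry_out if pass_carry else 0      # selector gating
    high_sum = sum((row >> half_width) & mask for row in rows) + carry_in
    return ((high_sum & mask) << half_width) | (low_sum & mask)


rows = [0x0000FFFF, 0x00000001]
blocked = accumulate_with_carry_gate(rows, 16, pass_carry=False)
passed = accumulate_with_carry_gate(rows, 16, pass_carry=True)
```

With the carry blocked, overflow out of the low group is discarded, which is what keeps two narrower multiplications from disturbing each other; with the carry passed, the two groups behave as one wide accumulator.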
The data processing method provided by this embodiment receives data to be processed and a function selection mode signal, performs canonical signed number encoding on the data to be processed to obtain an intermediate code, performs bit-padding processing on the intermediate code according to the function selection mode signal to obtain a target code, performs conversion processing on the data to be processed and the target code to obtain the partial products of the target code, and accumulates the column values in the partial products of the target code through the improved Wallace tree group circuit to obtain a target operation result. Because the method applies canonical signed number encoding to the data to be processed, obtains the target code according to the received function selection mode signal, and derives the partial products from that target code, it reduces the number of valid partial products of the target code in the multiplication and thereby reduces the complexity of the multiplication.
An embodiment of the application provides a machine learning arithmetic device that includes one or more of the multipliers. The machine learning arithmetic device obtains data to be operated on and control information from other processing devices, executes the specified machine learning operation, and passes the execution result to the other processing devices through an I/O interface;
When the machine learning arithmetic device includes multiple multipliers, the multiple computing devices may be connected to one another through a specific structure to transmit data;
The multiple multipliers are interconnected and transmit data through a PCIe bus to support larger-scale machine learning operations. The multiple multipliers share one control system or each have their own control system, and they share memory or each have their own memory. The multipliers may be interconnected in any interconnection topology.
An embodiment of the application provides a combined processing device that includes the machine learning arithmetic device described above, a general interconnection interface, and other processing devices. The machine learning arithmetic device interacts with the other processing devices to jointly complete the operation specified by the user. The combined processing device may also include a storage device, which is connected to the machine learning arithmetic device and to the other processing devices and stores the data of the machine learning arithmetic device and the other processing devices.
An embodiment of the application provides a neural network chip that includes the multiplier described above, the machine learning arithmetic device described above, or the combined processing device described above.
An embodiment of the application provides a neural network chip package structure that includes the neural network chip described above.
An embodiment of the application provides a board that includes the neural network chip package structure described above.
An embodiment of the application provides an electronic device that includes the neural network chip described above or the board described above.
An embodiment of the application provides a chip that includes at least one multiplier as described in any of the above.
An embodiment of the application provides electronic equipment that includes the chip described above.
Brief description of the drawings
Fig. 1 is a structural schematic diagram of a multiplier provided by an embodiment;
Fig. 2 is a specific circuit structure diagram of the multiplier provided by an embodiment;
Fig. 3a is a schematic diagram of the distribution of the partial products of the target code obtained from the multiplication of two groups of 8-bit data, provided by an embodiment;
Fig. 3b is a schematic diagram of the distribution of the partial products of the target code obtained from a 16-bit data multiplication, provided by an embodiment;
Fig. 4 is a circuit structure diagram of a low-order or high-order improved Wallace tree sub-circuit provided by another embodiment;
Fig. 5 is a schematic diagram of the connection structure of the improved Wallace tree sub-circuits for 8-bit data multiplication, provided by another embodiment;
Fig. 6 is a flow chart of a data processing method provided by an embodiment;
Fig. 7 is a structure diagram of a combined processing device provided by an embodiment;
Fig. 8 is a structure diagram of another combined processing device provided by an embodiment;
Fig. 9 is a structural schematic diagram of a board provided by an embodiment.
Detailed description
To make the objects, technical solutions, and advantages of the application clearer, the application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here only explain the application and do not limit it.
The multiplier provided by the application can be used for multiplication in AI chips, field-programmable gate array (FPGA) chips, or other hardware circuit devices. Its specific structure is shown in Fig. 1.
Fig. 1 shows the structure of a multiplier provided by one embodiment. As shown in Fig. 1, the multiplier includes an improved canonical signed number encoding circuit 11, an improved Wallace tree group circuit 12, and a summation circuit 13. The improved Wallace tree group circuit 12 includes a 4-2 compressor, and the 4-2 compressor includes a selection circuit and a full adder. The output of the improved canonical signed number encoding circuit 11 is connected to the input of the improved Wallace tree group circuit 12, and the output of the improved Wallace tree group circuit 12 is connected to the input of the summation circuit 13. The improved canonical signed number encoding circuit 11 performs canonical signed number encoding on the received data, obtains sign-extended partial products, and obtains the partial products of the target code from the sign-extended partial products. The improved Wallace tree group circuit 12 accumulates the partial products of the target code to obtain an accumulation result, and the summation circuit 13 accumulates that accumulation result.
Specifically, the improved canonical signed number encoding circuit 11 may include multiple data processing units with different functions, and it can receive two pieces of data, which serve as the multiplier operand and the multiplicand of the multiplication respectively. Optionally, the data may be fixed-point numbers. Optionally, the improved canonical signed number encoding circuit 11 can receive data of several different bit widths; that is, the multiplier provided by this embodiment can handle multiplication of data of several different bit widths. Within a single multiplication, however, the multiplier operand and the multiplicand received by the improved canonical signed number encoding circuit 11 may have the same bit width. For example, the multiplier provided by this embodiment can handle 8-bit by 8-bit, 16-bit by 16-bit, 32-bit by 32-bit, and 64-bit by 64-bit data multiplication; this embodiment places no limitation on this.
It should be noted that the improved canonical signed number encoding circuit 11 can perform canonical signed number encoding on the received data, that is, on the received multiplier operand, and obtain sign-extended partial products according to the received multiplicand. The bit width of a sign-extended partial product may be equal to 2 times the data bit width that the multiplier is currently processing. For example, suppose a multiplier receives 16-bit data and is currently processing 8-bit data multiplication. The improved canonical signed number encoding circuit 11 in the multiplier then needs to split the 16-bit data into two pieces of data, the high 8 bits and the low 8 bits, and perform canonical signed number encoding on each. In this case, the bit width of each sign-extended partial product may be equal to 2 times the data bit width currently being processed, and the number of sign-extended partial products obtained for each of the high 8 bits and the low 8 bits may be equal to the data bit width currently being processed plus 1. If the multiplier is currently processing 16-bit data multiplication, the improved canonical signed number encoding circuit 11 needs to operate on the whole 16-bit data. In this case, the bit width of each sign-extended partial product may be equal to 2 times the data bit width currently being processed, and the number of sign-extended partial products may be equal to the data bit width currently being processed plus 2.
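The width and count rules in this paragraph amount to simple arithmetic, which the following sketch spells out. It is only an illustration of the two cases the text describes (full-width and half-width processing); the function name `partial_product_shape` and its parameters are hypothetical.

```python
def partial_product_shape(received_width, processed_width):
    """Return (row_width, rows_per_operand) following the rule in the text.

    received_width: bit width of the data arriving at the multiplier.
    processed_width: bit width the multiplier is currently set to handle.
    """
    row_width = 2 * processed_width
    if processed_width == received_width:
        # Whole operand encoded at once: processed width plus 2 rows.
        rows = processed_width + 2
    elif 2 * processed_width == received_width:
        # Operand split into high and low halves: width plus 1 rows each.
        rows = processed_width + 1
    else:
        raise ValueError("only full-width or half-width modes are sketched")
    return row_width, rows
```

For 16-bit received data this gives 16-bit rows and 9 rows per half in 8-bit mode, and 32-bit rows and 18 rows in 16-bit mode, so both modes produce 18 rows in total.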
It should be noted that the canonical signed number encoding can be characterized as follows. For an N-bit multiplier operand, processing proceeds from the low-order bits to the high-order bits. If there is a run of l (l >= 2) consecutive bits of value 1, the run is converted into the (l+1)-digit data "1 (0)^(l-1) (-1)", that is, a 1, followed by l-1 zeros, followed by -1, and the remaining (N-l) bits are combined with the (l+1) converted digits to form new data. The new data then serves as the initial data of the next conversion stage, and this continues until the new data obtained after conversion contains no run of l (l >= 2) consecutive bits of value 1. When canonical signed number encoding is applied to an N-bit multiplier operand, the bit width of the resulting target code may be equal to (N+1). For example, in canonical signed number encoding, the data 11 can be converted to (100 - 001), so 11 is equivalent to 10(-1); the data 111 can be converted to (1000 - 0001), so 111 is equivalent to 100(-1); other runs of l (l >= 2) consecutive bits of value 1 are converted in the same way.
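The run-replacement procedure can be written as a short routine. The sketch below is a minimal software illustration, assuming an unsigned (non-negative) operand and a single low-to-high scan in which each newly created 1 may join the run above it; the function name `csd_encode` is hypothetical and the hardware circuit is not modeled.

```python
def csd_encode(value, width):
    """Encode a non-negative `width`-bit integer by run replacement.

    Returns width + 1 signed digits in {-1, 0, 1}, most significant first.
    """
    if not 0 <= value < (1 << width):
        raise ValueError("value does not fit in the given width")
    # Little-endian digits with one spare top position for the last carry.
    digits = [(value >> i) & 1 for i in range(width)] + [0]
    i = 0
    while i < width:
        if digits[i] == 1 and digits[i + 1] == 1:
            start = i
            while digits[i] == 1:       # stop just above the run of ones
                i += 1
            digits[start] = -1          # lowest digit of the run becomes -1
            for k in range(start + 1, i):
                digits[k] = 0           # the rest of the run becomes 0
            digits[i] += 1              # a 1 appears one position higher
            # Do not advance i: the new 1 may start another run.
        else:
            i += 1
    return digits[::-1]


def digits_value(digits):
    """Evaluate most-significant-first signed digits as an integer."""
    value = 0
    for digit in digits:
        value = 2 * value + digit
    return value
```

For instance, `csd_encode(3, 2)` yields the digits 1, 0, -1 and `csd_encode(7, 3)` yields 1, 0, 0, -1, matching the two conversions given in the text.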
For example, suppose the modified CSD encoding circuit 11 receives the multiplier operand "001010101101110". The first conversion stage yields the first new data "0010101011100(-1)0". The second stage, applied to the first new data, yields the second new data "0010101100(-1)00(-1)0". The third stage yields "0010110(-1)00(-1)00(-1)0", the fourth stage yields "00110(-1)0(-1)00(-1)00(-1)0", and the fifth stage yields "010(-1)0(-1)0(-1)00(-1)00(-1)0". The fifth new data contains no run of l (l ≥ 2) consecutive 1s, so it can be called the initial code; bit-padding the initial code completes the CSD encoding and gives the intermediate code. The bit width of the initial code may equal the bit width of the multiplier operand. Optionally, after the modified CSD encoding circuit 11 has CSD-encoded the multiplier operand to obtain the new data (i.e., the initial code), if the highest and second-highest digits of the new data are "10" or "01", the circuit may pad one 0 at the position one above the highest digit, so that the top three digits of the resulting intermediate code are "010" or "001" respectively. Optionally, the bit width of the intermediate code may equal the bit width of the data currently processed by the multiplier plus 1.
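The staged conversion in this example can be reproduced with a short sketch. The function names `csd_stage`, `csd_stages` and `fmt` are hypothetical, the operand is treated as an unsigned bit string, and growing the digit list by one when a run touches the top bit is an assumption made only for illustration; the source does not give an implementation.

```python
def csd_stage(digits):
    """Run one conversion stage on an LSB-first list of digits in {-1, 0, 1}.

    The lowest run of l >= 2 consecutive 1s is rewritten as 1 (0)^(l-1) (-1).
    Returns the new digit list, or None when no such run exists.
    """
    n = len(digits)
    i = 0
    while i < n:
        if digits[i] != 1:
            i += 1
            continue
        j = i
        while j < n and digits[j] == 1:
            j += 1                      # j ends one past the top of the run
        if j - i >= 2:
            out = list(digits)
            out[i] = -1                 # lowest 1 of the run becomes -1
            for k in range(i + 1, j):
                out[k] = 0              # the rest of the run becomes 0
            if j == n:
                out.append(1)           # assumption: grow by one digit at the top
            else:
                out[j] += 1             # carry into the digit above the run
            return out
        i = j
    return None


def fmt(digits):
    """Render LSB-first digits as an MSB-first string, writing -1 as (-1)."""
    return "".join("(-1)" if d == -1 else str(d) for d in reversed(digits))


def csd_stages(bits):
    """Return the formatted result of every conversion stage for a bit string."""
    digits = [int(b) for b in reversed(bits)]
    stages = []
    while True:
        nxt = csd_stage(digits)
        if nxt is None:
            return stages
        digits = nxt
        stages.append(fmt(digits))


stages = csd_stages("001010101101110")
print(len(stages), stages[-1])
```

For the operand "001010101101110" the sketch goes through five stages, and each intermediate string matches the corresponding new data listed in the example.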
In addition, if the data received by the multiplier is 2N bits wide and the multiplier can currently process N-bit operations, the modified CSD encoding circuit 11 may split the 2N-bit data into two groups of N-bit data and process each group separately; in this case, the two resulting (N+1)-bit intermediate codes, once combined, serve as the target code. If the multiplier can currently process 2N-bit operations, the modified CSD encoding circuit 11 may pad one 0 at the position one above the highest digit of the (2N+1)-bit intermediate code (i.e., bit-padding) and use the resulting (2N+2)-bit data as the target code.
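The two cases can be illustrated with a minimal sketch. The names `csd_digits` and `target_code` are hypothetical, the data is treated as unsigned, and the encoder uses the textbook non-adjacent-form recurrence as a compact stand-in for the staged conversion; none of this is taken from the source.

```python
def csd_digits(value, width):
    """Signed digits (LSB first) of an unsigned value, padded to width + 1 digits.

    Stand-in encoder (assumption): standard non-adjacent-form recurrence.
    """
    digits = []
    while value:
        if value & 1:
            d = 2 - (value & 3)   # +1 when the low bits are 01, -1 when they are 11
            value -= d
        else:
            d = 0
        digits.append(d)
        value >>= 1
    return digits + [0] * (width + 1 - len(digits))


def target_code(data, n, mode_bits):
    """Build the target code (LSB first) for 2N-bit data.

    mode_bits == n     : two (N+1)-digit intermediate codes, low group first.
    mode_bits == 2 * n : one (2N+1)-digit intermediate code plus one padded 0.
    """
    if mode_bits == n:
        low = csd_digits(data & ((1 << n) - 1), n)
        high = csd_digits(data >> n, n)
        return low + high
    return csd_digits(data, 2 * n) + [0]


def value_of(digits):
    """Numeric value of an LSB-first signed-digit list."""
    return sum(d * (1 << i) for i, d in enumerate(digits))


narrow = target_code(0xEB, 4, 4)   # N = 4, two 4-bit groups: 0xE and 0xB
wide = target_code(0xEB, 4, 8)     # N = 4, one 8-bit operand
print(len(narrow), len(wide))
```

In both modes the target code has (2N+2) digits, which is consistent with the same (N+1) low-order and (N+1) high-order digit positions serving either bit width.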
Optionally, the modified CSD encoding circuit 11 includes a first input terminal for receiving a function selection mode signal, and the modified Wallace tree group circuit 12 includes a second input terminal for receiving the function selection mode signal. Optionally, the function selection mode signal is used to determine the data bit width that the multiplier can process.
It can be understood that there may be several function selection mode signals, and different function selection mode signals correspond to multiplications of different data bit widths that the multiplier can currently process. Optionally, within the same multiplication operation, the function selection mode signals received by the modified CSD encoding circuit 11 and the modified Wallace tree group circuit 12 may be identical.
For example, suppose the modified CSD encoding circuit 11 and the modified Wallace tree group circuit 12 can receive several function selection mode signals. Taking four function selection mode signals as an example, namely mode=00, mode=01, mode=10 and mode=11: mode=00 may indicate that the multiplier processes 8-bit data, mode=01 that it processes 16-bit data, mode=10 that it processes 32-bit data, and mode=11 that it processes 64-bit data. Alternatively, mode=00 may indicate 32-bit data, mode=01 64-bit data, mode=10 8-bit data, and mode=11 16-bit data. The assignment can be set flexibly and is not limited in this embodiment.
It can also be understood that the bit width of a sign-extended partial product obtained by the modified CSD encoding circuit 11 may equal twice the data bit width M currently processed by the multiplier. The high M bits of the sign-extended partial product may all be equal, and the low M bits may equal the bits of the original partial product obtained from the target code and the multiplicand. That is, if an M-bit original partial product is obtained from the target code and the M-bit multiplicand, each of the high M bits of the sign-extended partial product may equal the highest bit of the original partial product, and the low M bits of the sign-extended partial product may equal the M bits of the original partial product. Optionally, the target code may contain three values, -1, 0 and 1: when the value is -1 the corresponding original partial product may be -X, when the value is 0 it may be 0, and when the value is 1 it may be X, where X denotes the multiplicand received by the modified CSD encoding circuit 11. In other words, each digit of the target code yields one corresponding original partial product. In this embodiment, the modified Wallace tree group circuit 12 may be built from 4-2 compressors. Optionally, one 4-2 compressor may include multiple full adders and a selection circuit; under certain conditions, the selection circuit in the 4-2 compressor may switch the full adders off, which improves the efficiency of the modified Wallace tree sub-circuit and reduces its delay. Optionally, the selection circuit may consist of multiple selectors, and each selector may be a two-way selector.
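As a point of reference for the 4-2 compressor mentioned here, the following sketch models the textbook construction from two chained full adders. The wiring and the names `full_adder` and `compressor_4_2` are assumptions; the source does not describe the compressor's internal structure, and the selection circuit that switches the full adders off is not modelled.

```python
from itertools import product


def full_adder(a, b, c):
    """Return (sum, carry) of three input bits."""
    s = a ^ b ^ c
    carry = (a & b) | (a & c) | (b & c)
    return s, carry


def compressor_4_2(x1, x2, x3, x4, cin):
    """Textbook 4-2 compressor: two full adders in series.

    Returns (sum, carry, cout) with
    x1 + x2 + x3 + x4 + cin == sum + 2 * (carry + cout).
    """
    s1, cout = full_adder(x1, x2, x3)     # cout does not depend on cin
    total, carry = full_adder(s1, x4, cin)
    return total, carry, cout


# Exhaustive check of the defining identity over all 32 input combinations.
identity_holds = all(
    sum(bits) == s + 2 * (c + co)
    for bits in product((0, 1), repeat=5)
    for s, c, co in [compressor_4_2(*bits)]
)
print(identity_holds)
```

Because `cout` is independent of `cin`, a row of such compressors has no carry ripple along the row, which is the usual reason for using them in a Wallace-style reduction tree.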
In the multiplier provided by this embodiment, the modified CSD encoding circuit CSD-encodes the received data to obtain sign-extended partial products and obtains the partial products of the target code from them; the modified Wallace tree group circuit then accumulates the partial products of the target code to obtain the target operation result. Because the CSD encoding reduces the number of effective partial products produced during multiplication, the multiplier is less complex to implement, the multiplication is more efficient, and the power consumption of the multiplier is effectively reduced. In addition, the multiplier can multiply data of several different bit widths, which effectively reduces both the power consumption of the multiplier and the area it occupies on the AI chip.
Fig. 2 is a detailed structural diagram of the multiplier provided by another embodiment. The multiplier includes the modified CSD encoding circuit 11, which includes: a modified CSD encoding unit 111, a low-order partial product acquisition unit 112, a low-order selector group unit 113, a high-order partial product acquisition unit 114 and a high-order selector group unit 115. The first output terminal of the modified CSD encoding unit 111 is connected to the first input terminal of the low-order partial product acquisition unit 112, and the output terminal of the low-order selector group unit 113 is connected to the second input terminal of the low-order partial product acquisition unit 112. The second output terminal of the modified CSD encoding unit 111 is connected to the first input terminal of the high-order partial product acquisition unit 114, and the output terminal of the high-order selector group unit 115 is connected to the second input terminal of the high-order partial product acquisition unit 114.
The modified CSD encoding unit 111 is configured to CSD-encode the received first data, to determine from the received function selection mode signal the data bit width that the multiplier can process, and to obtain the target code according to that bit width. The low-order partial product acquisition unit 112 is configured to obtain sign-extended low-order partial products from the low-order target code in the received target code and the second data, and to obtain the low-order partial products of the target code from the sign-extended low-order partial products. The low-order selector group unit 113 is configured to gate values in the sign-extended low-order partial products. The high-order partial product acquisition unit 114 is configured to obtain sign-extended high-order partial products from the high-order target code in the received target code and the second data, and to obtain the high-order partial products of the target code from the sign-extended high-order partial products. The high-order selector group unit 115 is configured to gate values in the sign-extended high-order partial products.
Specifically, the modified CSD encoding unit 111 can receive the first data and CSD-encode it to obtain the target code; the first data may be the multiplier operand of the multiplication. Optionally, the low-order partial product acquisition unit 112 can obtain the sign-extended low-order partial products from the received second data and the target code obtained by the modified CSD encoding unit 111, and the high-order partial product acquisition unit 114 can obtain the sign-extended high-order partial products from the received second data and the same target code. Optionally, if the multiplier can currently process N-bit data and the data received by the modified CSD encoding unit 111 is 2N bits wide, the unit can automatically split the received 2N-bit data into high N-bit data and low N-bit data and CSD-encode each part separately. The resulting high-order target code and low-order target code each have a bit width of N plus 1; the number of sign-extended high-order partial products obtained from the high-order target code may equal (N+1), and the number of sign-extended low-order partial products obtained from the low-order target code may likewise equal (N+1). If the multiplier can currently process 2N-bit data and the data received by the modified CSD encoding unit 111 is 2N bits wide, the unit can CSD-encode the received 2N-bit data to obtain a (2N+1)-bit intermediate code, bit-pad the intermediate code to obtain (2N+2)-bit data, and use this (2N+2)-bit data as the target code, where bit-padding means padding a 0 at the position one above the highest digit of the data. In this case the highest digit of the target code is 0, and all values in the corresponding sign-extended partial product are 0. In the (2N+2)-bit target code, the high (N+1) digits can be called the high-order target code and the low (N+1) digits can be called the low-order target code.
It should be noted that the low-order selector group unit 113 can, according to the received function selection mode signal, gate some of the bits of a sign-extended low-order partial product, selecting either the values of the sign-extended partial product obtained for an N-bit multiplication or the values of the sign-extended partial product obtained for a 2N-bit multiplication. Similarly, the high-order selector group unit 115 can, according to the received function selection mode signal, gate some of the bits of a sign-extended high-order partial product, selecting either the values obtained for an N-bit multiplication or the values obtained for a 2N-bit multiplication.
It can be understood that, if the data received by the multiplier is 2N bits wide and the multiplier currently processes N-bit multiplications, the low-order partial product acquisition unit 112 can obtain the sign-extended partial products corresponding to the low N-bit data from each digit of the low-order target code; the low-order selector group unit 113 can gate values in the sign-extended low-order partial product; and the sign-extended partial product is then combined with the gated values to obtain the sign-extended low-order partial product. Optionally, the high-order partial product acquisition unit 114 can obtain the sign-extended partial products corresponding to the high N-bit data from each digit of the high-order target code; the high-order selector group unit 115 can gate values in the sign-extended high-order partial product; and the sign-extended partial product is then combined with the gated values to obtain the sign-extended high-order partial product. Optionally, during CSD encoding, the bit width of the low-order target code may equal the bit width of the high-order target code, and may also equal the number of sign-extended low-order partial products corresponding to the low N-bit data or the number of sign-extended high-order partial products corresponding to the high N-bit data. Optionally, the modified CSD encoding circuit 11 may include (N+1) low-order partial product acquisition units 112 and (N+1) high-order partial product acquisition units 114. Optionally, each low-order partial product acquisition unit 112 may include 4N value generation subunits, and each high-order partial product acquisition unit 114 may also include 4N value generation subunits; each value generation subunit produces one bit of a sign-extended partial product. The low-order partial product acquisition unit 112 can determine the low-order partial products of the target code from the sign-extended low-order partial products, and the high-order partial product acquisition unit 114 can determine the high-order partial products of the target code from the sign-extended high-order partial products.
In the multiplier provided by this embodiment, the modified CSD encoding unit in the modified CSD encoding circuit CSD-encodes the received data to obtain the low-order target code and the high-order target code. The low-order partial product acquisition unit and the high-order partial product acquisition unit obtain sign-extended partial products from the low-order and high-order target codes and derive the partial products of the target code from them; the modified Wallace tree group circuit then accumulates the partial products of the target code to obtain the target operation result. Because the CSD encoding reduces the number of effective partial products produced during multiplication, the multiplier is less complex to implement, the multiplication is more efficient, and the power consumption of the multiplier is effectively reduced. In addition, the multiplier can multiply data of several different bit widths, which effectively reduces the area it occupies on the AI chip.
In one embodiment, the multiplier includes the modified CSD encoding unit 111, which includes: a first data input port 1111, a first mode selection signal input port 1112, a low-order target code output port 1113 and a high-order target code output port 1114. The first data input port 1111 is configured to receive the first data; the first mode selection signal input port 1112 is configured to receive the function selection mode signal; the low-order target code output port 1113 is configured to output the low-order target code obtained by CSD-encoding the first data; and the high-order target code output port 1114 is configured to output the high-order target code obtained by CSD-encoding the first data.
Specifically, during a multiplication, the modified CSD encoding unit 111 in the multiplier can receive the first data through the first data input port 1111 and the function selection mode signal through the first mode selection signal input port 1112. It CSD-encodes the first data to obtain the intermediate code, determines from the received function selection mode signal whether the intermediate code needs bit-padding, and thereby obtains the target code. It then outputs the low-order target code of the target code through the low-order target code output port 1113 and the high-order target code of the target code through the high-order target code output port 1114. It should be noted that bit-padding here may consist of padding a 0 at the position one above the highest digit of the intermediate code.
In the multiplier provided by this embodiment, the modified CSD encoding unit CSD-encodes the received data, which reduces the number of effective partial products produced during multiplication. The multiplier is therefore less complex to implement, the multiplication is more efficient, and the power consumption of the multiplier is effectively reduced. In addition, the multiplier can multiply data of several different bit widths, which effectively reduces the area it occupies on the AI chip.
In one embodiment, the low-order partial product acquisition unit 112 includes: a low-order target code input port 1121, a first gated value input port 1122, a second mode selection signal input port 1123, a second data input port 1124 and a low-order partial product output port 1125. The low-order target code input port 1121 is configured to receive the low-order target code; the first gated value input port 1122 is configured to receive the values of the low-order partial product that the low-order selector group unit outputs after gating; the second mode selection signal input port 1123 is configured to receive the function selection mode signal; the second data input port 1124 is configured to receive the second data; and the low-order partial product output port 1125 is configured to output the low-order partial products of the target code.
Specifically, the low-order partial product acquisition unit 112 in the multiplier can receive, through the low-order target code input port 1121, the low-order target code output by the modified CSD encoding unit 111, and can receive, through the second data input port 1124, the multiplicand of the multiplication. From the low-order target code and the multiplicand it obtains the sign-extended low-order partial products corresponding to the low-order target code. Optionally, if the function selection mode signal received at the second mode selection signal input port 1123 indicates that the multiplier processes N-bit operations, the bit width of a sign-extended low-order partial product may equal 2N. For example, if the multiplier processes N-bit operations and the low-order partial product acquisition unit 112 receives an N-bit multiplicand X, the unit can obtain the corresponding 2N-bit sign-extended partial product from X and the three values -1, 1 and 0 contained in the low-order target code. The low N bits of the sign-extended partial product may equal all the bits of the original partial product obtained directly from the low-order target code, and each of the high N bits may equal the sign bit of the original partial product, that is, its highest bit. When a digit of the low-order target code is -1, the original partial product may be -X; when the digit is 1, the original partial product may be X; and when the digit is 0, the original partial product may be 0.
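The sign extension described above can be sketched as follows. The function name `sign_extended_partial_product` is hypothetical, X is modelled as a signed N-bit two's-complement value, and the sketch assumes that -X also fits in N bits (X is not the most negative N-bit value); the source does not say how that corner case is handled.

```python
def sign_extended_partial_product(digit, x, n):
    """Return the 2N-bit sign-extended partial product as an unsigned bit pattern.

    digit: target-code digit in {-1, 0, 1}
    x:     signed multiplicand, assumed to satisfy -(2**(n-1)) < x < 2**(n-1)
    n:     data bit width N
    """
    mask = (1 << n) - 1
    original = (digit * x) & mask          # N-bit original partial product: -X, 0 or X
    sign = (original >> (n - 1)) & 1       # its highest bit is the sign bit
    high = mask if sign else 0             # high N bits all copy the sign bit
    return (high << n) | original


pp_pos = sign_extended_partial_product(1, 5, 8)    # X
pp_neg = sign_extended_partial_product(-1, 5, 8)   # -X
pp_zero = sign_extended_partial_product(0, 5, 8)   # 0
print(f"{pp_pos:04x} {pp_neg:04x} {pp_zero:04x}")
```

With N = 8 and X = 5, the digit 1 gives 0x0005, the digit -1 gives 0xfffb, and the digit 0 gives 0x0000; the low byte is the original partial product and the high byte repeats its sign bit.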
It should be noted that the low-order partial product acquisition unit 112 in the multiplier can receive, through the first gated value input port 1122, the corresponding bits of the sign-extended low-order partial product that the low-order selector group unit 113 gates for operations on data of different bit widths. The unit then combines the sign-extended partial product it has obtained for the current low-order target code with the gated bits to obtain the sign-extended low-order partial product.
Further, the low-order partial product acquisition unit 112 in the multiplier can obtain the low-order partial products of the target code from all the sign-extended low-order partial products and output them through the low-order partial product output port 1125. Optionally, the low-order partial products of the target code are arranged as follows. The first low-order partial product of the target code may equal the first sign-extended low-order partial product, that is, the one corresponding to the lowest digit of the low-order target code. From the second low-order partial product onward, the highest bit of each low-order partial product of the target code lies in the same column as the highest bit of the first one. Each low-order partial product of the target code may equal its corresponding sign-extended low-order partial product, whose lowest bit lies in the same column as the second-lowest bit of the previous low-order partial product of the target code. In other words, the bits of a sign-extended low-order partial product that extend beyond the highest column of the first low-order partial product do not take part in subsequent operations.
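The arrangement rule can be checked numerically with a behavioural sketch: each partial product is shifted one column further left than the previous one, bits beyond the top column of the first row are dropped, and the rows are summed. The names `csd_digits` and `multiply_via_rows` are hypothetical, the encoder is the textbook non-adjacent-form recurrence used as a stand-in, the multiplier operand is treated as unsigned, and plain addition replaces the Wallace tree.

```python
def csd_digits(value, width):
    """Signed digits (LSB first) of an unsigned value, padded to width + 1 digits.

    Stand-in encoder (assumption): standard non-adjacent-form recurrence.
    """
    digits = []
    while value:
        if value & 1:
            d = 2 - (value & 3)   # +1 when the low bits are 01, -1 when they are 11
            value -= d
        else:
            d = 0
        digits.append(d)
        value >>= 1
    return digits + [0] * (width + 1 - len(digits))


def multiply_via_rows(x, y, m):
    """Multiply signed M-bit x by unsigned M-bit y using aligned, truncated rows."""
    w = 2 * m                                   # width of the first (widest) row
    total = 0
    for i, d in enumerate(csd_digits(y, m)):
        ext = (d * x) & ((1 << w) - 1)          # sign-extended partial product, 2M bits
        kept = ext & ((1 << (w - i)) - 1)       # row i keeps only (2M - i) bits
        total += kept << i                      # row i starts one column further left
    total &= (1 << w) - 1                       # carries past the top column are dropped
    return total - (1 << w) if total >> (w - 1) else total


product_value = multiply_via_rows(-37, 0b01101110, 8)
print(product_value)
```

Dropping the bits beyond the top column does not change the result because the final sum is only needed modulo 2^(2M), which is why those bits need not be generated at all.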
In the multiplier provided by this embodiment, the low-order partial product acquisition unit obtains sign-extended partial products from each digit of the low-order target code and the second data, combines them with the values gated by the low-order selector group unit to obtain the sign-extended low-order partial products, and obtains the low-order partial products of the target code from them. The high-order partial products of the target code are determined from the sign-extended high-order partial products obtained by the high-order partial product acquisition unit. The modified Wallace tree group circuit then accumulates the low-order and high-order partial products of the target code to obtain the target operation result. Because the multiplier produces fewer effective partial products, it is less complex to implement, the multiplication is more efficient, and the power consumption of the multiplier is effectively reduced. In addition, the multiplier can multiply data of several different bit widths, which effectively reduces the area it occupies on the AI chip.
In one embodiment, the multiplier includes the low-order selector group unit 113, which includes low-order selectors 1131; the multiple low-order selectors 1131 are configured to gate the values contained in the sign-extended low-order partial products.
Specifically, the number of low-order selectors 1131 in the low-order selector group unit 113 may equal 3N*(N+1), where 2N denotes the bit width of the data currently processed by the multiplier, and every low-order selector 1131 in the low-order selector group unit 113 may have the same internal circuit structure. Optionally, during a multiplication, among the (N+1) low-order partial product acquisition units 112 connected to the modified CSD encoding unit 111, each low-order partial product acquisition unit 112 may include 4N value generation subunits, of which 2N value generation subunits may be connected to 2N low-order selectors 1131, one low-order selector 1131 per value generation subunit. Optionally, the 2N value generation subunits connected to these 2N low-order selectors 1131 may be the ones that produce the high 2N bits of the low-order partial product of the target code. Besides the function selection mode signal input port (mode), each of these 2N low-order selectors 1131 has two other external input ports. Optionally, if the multiplier can process data operations of n different bit widths and the data received by the multiplier is 2N bits wide, the signals received at these two other input ports may be, respectively, 0 and the sign bit of the sign-extended partial product that the low-order partial product acquisition unit 112 obtains when the multiplier performs a 2N-bit operation. The (N+1) low-order partial product acquisition units 112 may be connected to (N+1) groups of 2N low-order selectors 1131. The sign bits received by different groups may be the same or different, but the sign bits received by the 2N low-order selectors 1131 within one group are identical, and that sign bit may be taken from the sign-extended partial product obtained by the low-order partial product acquisition unit 112 to which the group is connected.
In addition, of the 4N value generation subunits in each low-order partial product acquisition unit 112, N value generation subunits may not be connected to any low-order selector 1131. The values produced by these N value generation subunits may be the corresponding bits of the sign-extended low-order partial product obtained from the digits of the low-order target code, for whichever data bit width the multiplier is currently processing. In other words, they are the bits of the sign-extended low-order partial product from bit 1 to bit N, counting from the lowest bit (bit 1) toward the highest bit.
It should be noted that, of the 4N value generation subunits in each low-order partial product acquisition unit 112, the remaining N value generation subunits may also be connected to N low-order selectors 1131, one low-order selector 1131 per value generation subunit. Besides the function selection mode signal input port (mode), each of these N low-order selectors 1131 has two other external input ports. The signals received at these two other input ports are, respectively, the sign bit of the sign-extended partial product obtained when the multiplier performs a 2N-bit operation, and the corresponding bit of the sign-extended low-order partial product obtained when the multiplier performs a 2N-bit operation. In other words, the values produced by these N value generation subunits may be the bits of the sign-extended low-order partial product from bit (N+1) to bit 2N, counting from the lowest bit (bit 1) toward the highest bit. The (N+1) low-order partial product acquisition units 112 may be connected to (N+1) groups of N low-order selectors 1131. The sign bits received by different groups may be the same or different, but the sign bits received by the N low-order selectors 1131 within one group are identical, and that sign bit may be taken from the sign-extended partial product obtained by the low-order partial product acquisition unit 112 to which the group is connected.
In addition, the corresponding bits of the sign-extended partial product received by each group of N low-order selectors 1131 may be determined from the corresponding bits of the sign-extended partial product obtained by the low-order partial product acquisition unit 112 to which that group is connected; within each group, the corresponding bits received by the individual low-order selectors 1131 may be the same or different. The positions of the 4N value generation subunits in each low-order partial product acquisition unit 112 may be shifted left by one value generation subunit relative to the positions of the 4N value generation subunits in the previous low-order partial product acquisition unit 112. Optionally, among all the low-order partial products of the target code that take part in subsequent operations, only the first has a bit width equal to the bit width 4N of the first sign-extended low-order partial product; each remaining low-order partial product of the target code may be one bit narrower than the previous one, and the bit width of the last one may equal (2N-1).
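The selector arrangement for one 4N-bit row can be modelled roughly as below. This is an interpretation, not the circuit from the source: the name `gated_row` is hypothetical, and the sketch assumes that in N-bit mode positions N+1 to 2N carry the sign bit of the N-bit original partial product and positions 2N+1 to 4N carry 0, while in 2N-bit mode positions N+1 to 2N carry the partial product's own bits and positions 2N+1 to 4N carry its sign bit.

```python
def gated_row(original, n, wide_mode):
    """Build one 4N-bit row and count the selectors it needs.

    original:  unsigned bit pattern of the original partial product
               (N bits when wide_mode is False, 2N bits when it is True)
    wide_mode: False for N-bit operation, True for 2N-bit operation
    Returns (row_value, selector_count).
    """
    width = 2 * n if wide_mode else n
    sign = (original >> (width - 1)) & 1
    row = 0
    selectors = 0
    for pos in range(4 * n):                   # pos 0 is bit 1 in the text
        if pos < n:
            bit = (original >> pos) & 1        # bits 1..N: no selector
        elif pos < 2 * n:
            selectors += 1                     # bits N+1..2N: one selector each
            bit = (original >> pos) & 1 if wide_mode else sign
        else:
            selectors += 1                     # bits 2N+1..4N: one selector each
            bit = sign if wide_mode else 0
        row |= bit << pos
    return row, selectors


n = 4
narrow_row, per_row = gated_row((-5) & 0xF, n, wide_mode=False)
wide_row, _ = gated_row((-5) & 0xFF, n, wide_mode=True)
total_selectors = per_row * (n + 1)            # (N+1) rows of the low-order unit
print(f"{narrow_row:04x} {wide_row:04x} {total_selectors}")
```

Under these assumptions each row needs N + 2N = 3N selectors, so (N+1) rows need 3N*(N+1) selectors, which agrees with the selector count stated for the low-order selector group unit 113.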
In the multiplier provided by this embodiment, the low-order selector group unit gates the values in the low-order partial products so that the low-order partial products of the target code are obtained; the modified Wallace tree group circuit then accumulates the low-order and high-order partial products of the target code to obtain the target operation result. Because the multiplier produces fewer effective partial products, it is less complex to implement, the multiplication is more efficient, and the power consumption of the multiplier is effectively reduced. In addition, the multiplier can multiply data of several different bit widths, which effectively reduces the area it occupies on the AI chip.
In one of the embodiments, the multiplier includes the high-order partial product acquiring unit 114, which includes: a high-order target code input port 1141, a second gating value input port 1142, a third mode selection signal input port 1143, a second data input port 1144, and a high-order partial product output port 1145. The high-order target code input port 1141 is configured to receive the high-order target code; the second gating value input port 1142 is configured to receive the values contained in the high-order partial product that are output after gating by the high-order selector group unit; the third mode selection signal input port 1143 is configured to receive the function selection mode signal; the second data input port 1144 is configured to receive the second data; and the high-order partial product output port 1145 is configured to output the high-order partial product of the target code.
Specifically, the high-order partial product acquiring unit 114 may receive, through the high-order target code input port 1141, the high-order target code output by the improved canonical signed digit encoding unit 111, and may receive the multiplicand of the multiplication through the second data input port 1144; from the high-order target code and the multiplicand it obtains the sign-extended high-order partial product corresponding to the high-order target code. Optionally, if the function selection mode signal received at the third mode selection signal input port 1143 indicates that the multiplier is processing N-bit data, the sign-extended high-order partial product obtained by the high-order partial product acquiring unit 114 may have a bit width of 2N. For example, if the multiplier processes N-bit data and the high-order partial product acquiring unit 114 receives an N-bit multiplicand X, the unit may directly obtain the corresponding 2N-bit sign-extended high-order partial product from X and the three values -1, 1 and 0 that can appear in the high-order target code. The low N bits of the sign-extended partial product may be equal to all the bits of the original partial product obtained directly from the high-order target code, and each of the high N bits may be equal to the sign bit of the original partial product, that is, its most significant bit. When a value in the high-order target code is -1, the original partial product may be -X; when the value is 1, the original partial product may be X; and when the value is 0, the original partial product may be 0.
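The sign-extension rule described above can be sketched in a few lines of Python. This is only an illustrative model, not the circuit in the patent: the function names `sign_extend_partial_product` and `to_signed` are hypothetical, and the multiplicand is assumed to be given as an ordinary signed Python integer that fits in N bits.

```python
def sign_extend_partial_product(digit: int, x: int, n: int) -> int:
    """Return the 2n-bit pattern of the sign-extended partial product.

    digit is one value of the target code (-1, 0 or 1); x is the n-bit
    multiplicand given as a signed Python integer (assumption).
    Note: x = -2**(n-1) with digit = -1 does not fit in n bits and is
    not handled by this simplified model.
    """
    if digit not in (-1, 0, 1):
        raise ValueError("digit must be -1, 0 or 1")
    mask_n = (1 << n) - 1
    original = (digit * x) & mask_n        # original partial product: -X, 0 or X
    sign = (original >> (n - 1)) & 1       # its most significant bit
    high = mask_n if sign else 0           # high n bits replicate the sign bit
    return (high << n) | original          # low n bits are the original product


def to_signed(value: int, width: int) -> int:
    """Interpret a width-bit pattern as a two's-complement integer."""
    return value - (1 << width) if (value >> (width - 1)) & 1 else value
```

For instance, with N = 8, `sign_extend_partial_product(-1, 5, 8)` yields the 16-bit pattern `0xFFFB`, which `to_signed` reads back as -5.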
It should be noted that the high-order partial product acquiring unit 114 may receive, through the second gating value input port 1142, the corresponding bit values of the sign-extended partial products that the high-order selector group unit 115 gates for data operations of different bit widths. The high-order partial product acquiring unit 114 then combines the sign-extended partial product it has currently obtained for the high-order target code with the gated corresponding bit values to obtain the sign-extended high-order partial product.
Further, the high-order partial product acquiring unit 114 may obtain the high-order partial products of the corresponding target codes from all the sign-extended high-order partial products, and output them through the high-order partial product output port 1145. Optionally, the distribution of the high-order partial products of all target codes may be characterized as follows. The high-order partial product of the first target code, that is, the partial product of the target code corresponding to the lowest-order value in the high-order target code, may be placed as the partial product that follows the low-order partial product of the last target code. Its bit width may be equal to the bit width of the low-order partial product of the last target code minus 1. In other words, the high-order partial product of the first target code may be equal to the first sign-extended high-order partial product, with its lowest bit in the same column as the second-highest bit of the low-order partial product of the last target code; equivalently, the bits of the first sign-extended high-order partial product that lie beyond the highest column of the low-order partial product of the last target code do not take part in subsequent operations. Starting from the high-order partial product of the second target code, the highest bit of the high-order partial product of each target code is in the same column as the highest bit of the high-order partial product of the first target code. The high-order partial product of each target code may be equal to the corresponding sign-extended high-order partial product, with its lowest bit in the same column as the second-highest bit of the high-order partial product of the preceding target code; that is, the bits of the corresponding sign-extended high-order partial product that lie beyond the highest column of the high-order partial product of the first target code do not take part in subsequent operations.
In the multiplier provided in this embodiment, the high-order partial product acquiring unit obtains the high-order partial products of the target codes from each value contained in the high-order target code and from the second data, and the improved Wallace tree group circuit accumulates the high-order and low-order partial products of the target codes to obtain the target operation result. The multiplier produces fewer valid partial products, which reduces the complexity of the multiplication, improves its operation efficiency, and effectively lowers the power consumption of the multiplier. In addition, the multiplier can multiply data of several different bit widths, which effectively reduces the area the multiplier occupies on the AI chip.
In one of the embodiments, the multiplier includes the high-order selector group unit 115, which includes high-order selectors 1151; the plurality of high-order selectors 1151 are configured to gate the values contained in the sign-extended high-order partial products.
Specifically, the number of high-order selectors 1151 in the high-order selector group unit 115 may be equal to 3N*(N+1), where 2N may denote the bit width of the data the multiplier currently processes; the internal circuit structure of every high-order selector 1151 in the high-order selector group unit 115 may be identical. Optionally, during multiplication, the improved canonical signed digit encoding unit 111 may be connected to (N+1) high-order partial product acquiring units 114, each of which may contain 4N value generation subunits; 2N of these value generation subunits may be connected to 2N high-order selectors 1151, one selector per subunit. Optionally, these 2N value generation subunits may be the ones that correspond to the low 2N bits of the high-order partial product of the target code. Besides the function selection mode signal input port (mode), each of these 2N high-order selectors 1151 has two other external input ports. Optionally, if the multiplier can process data of n different bit widths and the data it receives is 2N bits wide, the signals received at the two other input ports may be, respectively, 0 and the corresponding bit value of the sign-extended partial product that the high-order partial product acquiring unit 114 obtains when the multiplier performs a 2N-bit operation. The (N+1) high-order partial product acquiring units 114 may be connected to (N+1) groups of 2N high-order selectors 1151, and the corresponding bit values received by the different groups may be identical or different.
In addition, among the 4N value generation subunits in each high-order partial product acquiring unit 114, a corresponding N value generation subunits may be connected to N high-order selectors 1151, one selector per subunit. The internal circuit structure of these N high-order selectors 1151 may be identical to that of the selector 113. Besides the function selection mode signal input port (mode), each of these N high-order selectors 1151 has two other external input ports, which may receive, respectively, the sign bit of the corresponding sign-extended partial product obtained when the multiplier performs an N-bit operation and the sign bit of the corresponding sign-extended partial product obtained when the multiplier performs a 2N-bit operation. The (N+1) high-order partial product acquiring units 114 may be connected to (N+1) groups of N high-order selectors 1151. The sign bit values received by different groups may be identical or different, but the N high-order selectors 1151 within one group receive the same sign bit value, which may be obtained from the sign bit of the sign-extended partial product produced by the high-order partial product acquiring unit 114 connected to that group. In addition, the corresponding bit values of the sign-extended partial product received by each group of N high-order selectors 1151 may be determined from the sign bit of the sign-extended partial product obtained by the high-order partial product acquiring unit 114 connected to that group; within each group of N high-order selectors 1151, the corresponding bit values received by the individual selectors may be identical or different.
It should be noted that, of the 4N value generation subunits in each high-order partial product acquiring unit 114, the remaining N value generation subunits need not be connected to high-order selectors 1151. The values these N subunits produce may be the corresponding bit values of the sign-extended partial product obtained from the values in the high-order target code for the data bit width the multiplier is currently processing; put differently, they may be all the bits from the (2N+1)-th to the 3N-th of the corresponding sign-extended partial product, counting from the lowest bit (bit 1) towards the highest. The 4N value generation subunits in each high-order partial product acquiring unit 114 may be positioned one value generation subunit to the left of the 4N value generation subunits in the preceding high-order partial product acquiring unit 114. Optionally, among the high-order partial products of all target codes that take part in subsequent operations, only the high-order partial product of the first target code may have a bit width equal to 4N; the high-order partial product of each remaining target code may be one bit narrower than that of the preceding target code, and the bit width of the high-order partial product of the last target code may be equal to (2N-1).
In the multiplier provided in this embodiment, the high-order selector group unit gates the values in the high-order partial products to obtain the high-order partial products of the target codes, and the improved Wallace tree group circuit then accumulates the high-order and low-order partial products of the target codes to obtain the target operation result. The multiplier produces fewer valid partial products, which reduces the complexity of the multiplication, improves its operation efficiency, and effectively lowers the power consumption of the multiplier. In addition, the multiplier can multiply data of several different bit widths, which effectively reduces the area the multiplier occupies on the AI chip.
In one of the embodiments, the multiplier includes the improved Wallace tree group circuit 12, which includes improved Wallace tree sub-circuits 121~12n; the plurality of improved Wallace tree sub-circuits 121~12n are configured to accumulate the values in each column of the partial products of the target codes to obtain an accumulation result.
The improved Wallace tree group circuit 12 includes: low-order improved Wallace tree sub-circuits 1211, a selector 1212, and high-order improved Wallace tree sub-circuits 1213. The output of the low-order improved Wallace tree sub-circuit 1211 is connected to the input of the selector 1212, and the output of the selector 1212 is connected to the input of the high-order improved Wallace tree sub-circuit 1213. The plurality of low-order improved Wallace tree sub-circuits 1211 are configured to accumulate the values in each column of the partial products of all target codes; the selector 1212 is configured to gate the carry input signal received by the high-order improved Wallace tree sub-circuit; and the plurality of high-order improved Wallace tree sub-circuits 1213 are configured to accumulate the values in each column of the partial products of all target codes.
Optionally, the low-order improved Wallace tree sub-circuit 1211 and the high-order improved Wallace tree sub-circuit 1213 each include 4-2 compressors and a mode selection circuit, the output of the mode selection circuit being connected to an input of the 4-2 compressor. The 4-2 compressor is configured to accumulate the values of each column in the partial products of all target codes, and the mode selection circuit is configured to gate the values of the partial products of the target codes that the 4-2 compressor receives. The mode selection circuit includes a first input for receiving the function selection mode signal.
Specifically, the number n of improved Wallace tree sub-circuits in the improved Wallace tree group circuit 12 may be equal to twice the bit width of the data the multiplier currently processes. The n improved Wallace tree sub-circuits may process the high-order partial products and the low-order partial products of all target codes in parallel, although they may be connected in series. In the distribution of the partial products of all target codes, the lowest bit of the high-order partial product of the target code corresponding to the lowest-order value in the high-order target code may be in the same column as the second-lowest bit of the low-order partial product of the target code corresponding to the highest-order value in the low-order target code. Optionally, the improved Wallace tree group circuit 12 may accumulate the values in each column of the partial products of the target codes and produce two output signals.
It should be noted that the partial product of each target code may be equal to the sign-extended partial product, or may be equal to part of the bits of the sign-extended partial product; the partial product of the first target code may be equal to the first corresponding sign-extended partial product. Optionally, the lowest bit of the partial product of each target code may be in the same column as the second-lowest bit of the partial product of the preceding target code; equivalently, every bit of each sign-extended partial product is shifted one column to the left relative to the column of the corresponding bit of the preceding sign-extended partial product. The highest bit of the partial product of each target code is in the same column as the highest bit of the partial product of the first target code, and values in columns higher than that column need not be accumulated. Optionally, the number of columns of the partial products of all target codes may be equal to twice the bit width of the data the multiplier currently processes.
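The column layout just described (one column of left shift per target code, with everything above the 2N-th column discarded) can be illustrated with a small Python model. It is a sketch under stated assumptions: the standard non-adjacent form (NAF) is used here as a stand-in signed-digit encoding, since the patent's improved canonical signed digit encoding is not reproduced in this passage, and the names `naf_digits` and `multiply_by_columns` are hypothetical.

```python
def naf_digits(k: int) -> list[int]:
    """Signed digits (-1, 0, 1) of k, least significant first.

    Standard non-adjacent form, used only as a stand-in encoding.
    """
    digits = []
    while k != 0:
        if k & 1:
            d = 2 - (k % 4)   # 1 when k % 4 == 1, -1 when k % 4 == 3
            k -= d
        else:
            d = 0
        digits.append(d)
        k //= 2
    return digits


def multiply_by_columns(x: int, y: int, n: int) -> int:
    """Accumulate shifted partial products of x * y in 2n columns."""
    mask = (1 << (2 * n)) - 1
    acc = 0
    for i, d in enumerate(naf_digits(y)):
        partial = (d * x) << i        # each digit's product moves one column left
        acc = (acc + partial) & mask  # columns above the 2n-th are not accumulated
    return acc                        # 2n-bit two's-complement pattern of x * y
```

With N = 8, `multiply_by_columns(-7, 13, 8)` returns the 16-bit two's-complement pattern of -91. Dropping the columns above 2N is harmless because the product of two N-bit values always fits in 2N bits.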
For example, if the two data items received by the multiplier are 16 bits wide and the multiplier is currently set to multiply 8-bit data, the multiplier can process two groups of 8-bit * 8-bit multiplications. The distribution of the low-order partial products of the 9 target codes and the high-order partial products of the 9 target codes obtained by the improved canonical signed digit encoding circuit 11 is shown in Fig. 3a, where the upper right corner shows the distribution of the low-order partial products of the 9 target codes and the lower left corner shows the distribution of the high-order partial products of the 9 target codes; "○" denotes each bit value in the low-order partial products of the target codes, a second symbol (not reproduced in this text) denotes each bit value in the high-order partial products of the target codes, and "●" denotes the sign-extension bit values of the low-order or high-order partial products of the target codes. If the multiplier is currently set to process 16-bit * 16-bit multiplication, the distribution of the low-order partial products of the 9 target codes and the high-order partial products of the 9 target codes obtained by the improved canonical signed digit encoding circuit 11 is shown in Fig. 3b, with the same symbols as in Fig. 3a.
It can be understood that each low-order improved Wallace tree sub-circuit 1211 may be built from multiple 4-2 compressors and one mode selection circuit, and each high-order improved Wallace tree sub-circuit 1213 may likewise be built from multiple 4-2 compressors and one mode selection circuit. Put differently, the multiple 4-2 compressors in each improved Wallace tree sub-circuit form a circuit that takes a multi-bit input signal and adds its bits to produce two output signals; each 4-2 compressor may in turn be built from two full adders. Optionally, the improved Wallace tree sub-circuits 121~12n may include multiple low-order improved Wallace tree sub-circuits 1211 and multiple high-order improved Wallace tree sub-circuits 1213. The number of high-order improved Wallace tree sub-circuits 1213 may be equal to the bit width N of the data the multiplier currently receives, and may also be equal to the number of low-order improved Wallace tree sub-circuits 1211. The low-order improved Wallace tree sub-circuits 1211 may be connected in series with one another, as may the high-order improved Wallace tree sub-circuits 1213. Optionally, the output of the last low-order improved Wallace tree sub-circuit 1211 is connected to the input of the selector 1212, and the output of the selector 1212 is connected to the input of the first high-order improved Wallace tree sub-circuit 1213.
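As a rough illustration of how a 4-2 compressor can be assembled from two full adders, the following Python sketch models one compressor at the bit level. The wiring shown is the common textbook arrangement and is an assumption; the patent's own compressor wiring (Fig. 4) is not reproduced here, and the function names are hypothetical.

```python
def full_adder(a: int, b: int, c: int) -> tuple[int, int]:
    """Add three bits; return (sum bit, carry bit)."""
    s = a ^ b ^ c
    carry = (a & b) | (a & c) | (b & c)
    return s, carry


def compressor_4_2(i0: int, i1: int, i2: int, i3: int, cin: int) -> tuple[int, int, int]:
    """Textbook 4-2 compressor made of two chained full adders.

    Returns (sum, carry, cout). The sum bit stays in the current column;
    carry and cout both carry weight 2, i.e. belong to the next column.
    """
    s1, cout = full_adder(i0, i1, i2)    # first full adder
    s, carry = full_adder(s1, i3, cin)   # second full adder
    return s, carry, cout
```

The defining property is that `i0 + i1 + i2 + i3 + cin == s + 2 * (carry + cout)` for every combination of input bits, so five bits of one column are reduced to one sum bit plus two bits destined for the next column.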
In this embodiment, each low-order improved Wallace tree sub-circuit 1211 in the improved Wallace tree group circuit 12 may add the values in one column of the partial products of all target codes, and may output two signals: a carry signal Carry_i and a sum signal Sum_i, where i denotes the index of the low-order improved Wallace tree sub-circuit 1211 and the first low-order improved Wallace tree sub-circuit 1211 has index 0. Optionally, the number of input signals each low-order improved Wallace tree sub-circuit 1211 receives may be equal to the number of partial products of the target codes. The total number of high-order improved Wallace tree sub-circuits 1213 and low-order improved Wallace tree sub-circuits 1211 in the improved Wallace tree group circuit 12 may be equal to 2N, and the total number of columns of the partial products of all target codes, from the lowest column to the highest, may also be equal to 2N. The N low-order improved Wallace tree sub-circuits 1211 may accumulate the values in each of the low N columns of the partial products of all target codes, and the N high-order improved Wallace tree sub-circuits 1213 may accumulate the values in each of the high N columns.
For example, suppose the data received by the multiplier is 2N bits wide. If the multiplier currently performs a 2N-bit multiplication, the selector 1212 may gate the carry output signal Cout_{2N-1} of the last low-order improved Wallace tree sub-circuit 1211 in the improved Wallace tree group circuit 12 as the carry input signal Cin_{2N} received by the first high-order improved Wallace tree sub-circuit 1213; in other words, the multiplier operates on the received 2N-bit data as a whole. If the multiplier currently performs N-bit multiplications, the selector 1212 may gate 0 as the carry input signal Cin_{2N} received by the first high-order improved Wallace tree sub-circuit 1213; in other words, the multiplier splits the received 2N-bit data into its high N bits and low N bits and multiplies them separately. The indices i from the first to the last low-order improved Wallace tree sub-circuit 1211 are 0, 1, 2, ..., N-1, and the indices i from the first to the last high-order improved Wallace tree sub-circuit 1213 are N, N+1, ..., 2N-1.
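The effect of gating the carry between the low and high halves can be seen in a deliberately simplified Python model. It replaces the chain of Wallace tree sub-circuits with plain addition of two 2N-bit words, which is an assumption made purely for illustration; only the role of the selector is modelled, and `add_with_carry_gate` is a hypothetical name.

```python
def add_with_carry_gate(a: int, b: int, n: int, full_width_mode: bool) -> int:
    """Add two 2n-bit words with a gated carry between the halves.

    full_width_mode=True  : the low half's carry is passed on (one 2n-bit lane).
    full_width_mode=False : 0 is gated instead (two independent n-bit lanes).
    """
    mask = (1 << n) - 1
    low = (a & mask) + (b & mask)
    carry_from_low = low >> n
    # Simplified stand-in for the selector that feeds the first high-order stage.
    carry_into_high = carry_from_low if full_width_mode else 0
    high = (a >> n) + (b >> n) + carry_into_high
    return ((high & mask) << n) | (low & mask)
```

With n = 8, adding `0x01FF` and `0x0001` gives `0x0200` when the carry is passed on, but `0x0100` when 0 is gated, because the low lane then wraps around without disturbing the high lane.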
It should be noted that the signals of each low-order improved Wallace tree sub-circuit 1211 and high-order improved Wallace tree sub-circuit 1213 in the improved Wallace tree group circuit 12 may include a carry input signal Cin_i, partial product value input signals, and a carry output signal Cout_i. Optionally, the partial product value input signals received by each low-order improved Wallace tree sub-circuit 1211 and high-order improved Wallace tree sub-circuit 1213 may be the values of the corresponding column in the partial products of all target codes, and the number of carry output signals Cout_i output by each of these sub-circuits may be N_Cout = floor((N_I + N_Cin)/2) - 1, where N_I denotes the number of partial product value input signals of the improved Wallace tree sub-circuit, N_Cin denotes the number of its carry input signals, N_Cout denotes the minimum number of its carry output signals, and floor() denotes rounding down. Optionally, the carry input signals received by each low-order improved Wallace tree sub-circuit 1211 or high-order improved Wallace tree sub-circuit 1213 may be the carry output signals of the preceding low-order or high-order improved Wallace tree sub-circuit, and the carry input signal received by the first low-order improved Wallace tree sub-circuit 1211 is 0. The carry input signal received by the first high-order improved Wallace tree sub-circuit 1213 may be determined by the bit width of the data the multiplier currently processes and the bit width of the data the multiplier receives. Optionally, in the improved Wallace tree group circuit 12, the carry output port of the low-order improved Wallace tree sub-circuit 1211 is connected to the input port of the selector 1212, and the carry input port of the high-order improved Wallace tree sub-circuit 1213 is connected to the output port of the selector 1212.
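The carry-count formula above is easy to evaluate directly. The short Python helper below is only a convenience sketch; the name `carry_out_count` is hypothetical, and the sample figure of 18 inputs per column is borrowed from the 18 partial products mentioned elsewhere in this description.

```python
def carry_out_count(n_inputs: int, n_carry_in: int) -> int:
    """N_Cout = floor((N_I + N_Cin) / 2) - 1, as stated in the text."""
    return (n_inputs + n_carry_in) // 2 - 1


# First column: 18 partial product inputs and no incoming carries.
first = carry_out_count(18, 0)
# Next column, assuming it also has 18 inputs and receives the carries above.
second = carry_out_count(18, first)
```

Under these assumptions the first column would emit 8 carry signals and the second 12.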
In addition, each low-order improved Wallace tree sub-circuit 1211 and each high-order improved Wallace tree sub-circuit 1213 may use the mode selection circuit in its circuit structure to gate one of the values of the corresponding column in the partial products of all target codes and feed that value to a full adder in a 4-2 compressor. In this way a low-level signal is gated, so that the input signal of that full adder is a low-level signal, which is equivalent to switching the full adder off.
For example, in neural network operations a relatively large share of the data is zero or close to zero, and the share can grow further after sparsification and/or compression. After a large amount of neural network data has been converted into binary data, one multiplier may operate on the converted binary data at two different bit widths, namely 8-bit data (mode=00) and 16-bit data (mode=11), where the multiplier and the multiplicand received are both 8 bits wide. The circuit structure of the low-order and high-order improved Wallace tree sub-circuits in this multiplier is shown in Fig. 4, where ~mode denotes the inverse of the mode signal: if mode is a high-level signal, ~mode is a low-level signal, and if mode is a low-level signal, ~mode is a high-level signal. Whether the multiplier performs 8-bit or 16-bit operations, it can obtain the partial products of 18 target codes, and the values in one column of the partial products of all target codes are I0, I1, I2, I3, I4, I5, I6, I7, I8, I9, I10, I11, I12, I13, I14, I15, I16 and I17. In this example, from the mode signal it receives, the multiplier can determine whether the partial product value received by the first mode selection circuit in the high-order and low-order improved Wallace trees is I7 or I13, so that the signal the first mode selection circuit feeds to the 6th full adder is a low-level signal; likewise, it can determine whether the partial product value received by the second mode selection circuit is I6 or I12, so that the signal the second mode selection circuit feeds to the 8th full adder is a low-level signal. Specifically, the first mode selection circuit can receive the gating signals I7 or I13: if mode=00, it may gate I13 as the input of the 6th full adder and I7 as the input of the 5th full adder; if mode=11, it may gate I7 as the input of the 6th full adder and I13 as the input of the 5th full adder. Similarly, the second mode selection circuit can receive the gating signals I6 or I12: if mode=00, it may gate I12 as the input of the 8th full adder and I6 as the input of the 7th full adder; if mode=11, it may gate I6 as the input of the 8th full adder and I12 as the input of the 7th full adder.
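The routing performed by the two mode selection circuits in this example can be written down as a small lookup. The Python sketch below models only which value is steered to which full adder for the two mode values named in the text; it does not model Fig. 4 itself, and `route_by_mode` is a hypothetical helper.

```python
def route_by_mode(mode: int, first: int, second: int) -> tuple[int, int]:
    """Return (input of the lower-numbered adder, input of the higher-numbered adder).

    For the first mode selection circuit call it with (I7, I13): the result is
    (input of 5th full adder, input of 6th full adder). For the second circuit
    call it with (I6, I12): (input of 7th full adder, input of 8th full adder).
    """
    if mode == 0b00:   # 8-bit operation
        return first, second
    if mode == 0b11:   # 16-bit operation
        return second, first
    raise ValueError("only mode 00 and mode 11 are described in the text")
```

For mode=00 the call `route_by_mode(0b00, i7, i13)` sends I7 to the 5th full adder and I13 to the 6th, and mode=11 swaps the two, matching the description above.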
In addition, the low-order improved Wallace tree sub-circuit 1211 and the high-order improved Wallace tree sub-circuit 1213 may each be composed of multiple 4-2 compressors.
In the multiplier provided in this embodiment, the improved Wallace tree group circuit accumulates the low-order and high-order partial products of the target codes, and the accumulation circuit accumulates the accumulation result once more to obtain the result of the multiplication. The multiplier produces fewer valid partial products, which reduces the complexity of the multiplication, improves its operation efficiency, and effectively lowers the power consumption of the multiplier. In addition, the multiplier can multiply data of several different bit widths, which effectively reduces the area the multiplier occupies on the AI chip.
With continued reference to Fig. 2, which is a schematic diagram of the specific structure of the multiplier provided by another embodiment, the multiplier includes the accumulation circuit 13. The accumulation circuit 13 includes an adder 131, which is configured to accumulate the accumulation result.
Specifically, the adder 131 may be a carry adder whose bit width differs according to the data being processed. Optionally, the adder 131 may receive the two signals output by the improved Wallace tree group circuit 12 and add them to obtain the target operation result of the multiplication. Optionally, the adder 131 may be a carry-lookahead adder.
Optionally, the adder 131 includes: a carry signal input port 131a, a sum signal input port 131b, and an operation result output port 131c. The carry signal input port 131a is configured to receive the carry signal, the sum signal input port 131b is configured to receive the sum signal, and the operation result output port 131c is configured to output the target operation result obtained by accumulating the carry signal and the sum signal.
In this embodiment, the adder 131 may receive, through the carry signal input port 131a, the carry signal Carry output by the improved Wallace tree group circuit 12, and receive, through the sum signal input port 131b, the sum signal Sum output by the improved Wallace tree group circuit 12; it then outputs the result of accumulating the carry signal Carry and the sum signal Sum through the operation result output port 131c.
It should be noted that, during a multiplication, the multiplier may use adders 131 of different bit widths to add the carry output signal Carry and the sum-bit output signal Sum output by the improved Wallace tree group circuit 12, where the bit width of the data processed by adder 131 may equal 2 times the bit width N of the data currently processed by the multiplier. Optionally, each low-order improved Wallace tree sub-circuit 1211 and each high-order improved Wallace tree sub-circuit 1213 in the improved Wallace tree group circuit 12 may output one carry output signal Carry_i and one sum-bit output signal Sum_i (i = 1, ..., 2N, where i is the number of each low-order or high-order improved Wallace tree sub-circuit, counted from 1). Optionally, the Carry received by adder 131 is Carry = {[Carry_1 : Carry_(2N-1)], 0}. That is, the carry output signal Carry received by adder 131 has a bit width of 2N; its first 2N-1 bit values correspond to the carry output signals of the first 2N-1 low-order and high-order improved Wallace tree sub-circuits in the improved Wallace tree group circuit 12, and its last bit value may be replaced by the value 0. Optionally, the sum-bit output signal Sum received by adder 131 has a bit width of 2N (one bit from each of the 2N sub-circuits), and its values may equal the sum-bit output signals of the individual low-order and high-order improved Wallace tree sub-circuits in the improved Wallace tree group circuit 12.
Illustratively, if the multiplier is currently processing an 8 × 8 fixed-point multiplication, adder 131 may be a 16-bit carry-lookahead adder. As shown in Fig. 5, the improved Wallace tree group circuit 12 may output the 16-bit sum-bit output signal Sum and the carry output signal Carry obtained by the low-order and high-order improved Wallace tree sub-circuits. However, the sum-bit output signal received by the 16-bit carry-lookahead adder may be the complete sum-bit signal Sum output by the improved Wallace tree group circuit 12, while the carry output signal it receives may be the carry signal Carry formed by combining the value 0 with all carry output signals of the improved Wallace tree group circuit 12 except the one output by the last high-order improved Wallace tree sub-circuit 1213. In Fig. 5, Wallace_i denotes a low-order or high-order improved Wallace tree sub-circuit, where i is the number of the sub-circuit counted from 0. A solid line connecting two improved Wallace tree sub-circuits indicates that the sub-circuit with the higher number has a carry output signal, and a dashed line indicates that the sub-circuit with the higher number has no carry output signal. The trapezoid represents a two-way selector, which may serve as the selector 1212 in the multiplier.
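The final addition described above can be sketched in software. The following is a minimal behavioural model, not the circuit itself: the function names `final_add` and `carry_save` are hypothetical, column 0 is taken to be the lowest column, and the concatenation {[Carry_1 : Carry_(2N-1)], 0} is read in the Verilog style, so that the value 0 fills the lowest bit and the carry of the last column is dropped.

```python
def final_add(sum_bits, carry_bits):
    """Model of adder 131: combine per-column Sum and Carry outputs.

    sum_bits[i] and carry_bits[i] are the outputs of column i, with
    column 0 the lowest column (an assumption of this sketch).
    """
    width = len(sum_bits)
    assert len(carry_bits) == width
    mask = (1 << width) - 1
    sum_word = sum(bit << i for i, bit in enumerate(sum_bits))
    # Carry word: the carry of column i feeds column i + 1, the lowest
    # bit is 0, and the carry of the last column is discarded.
    carry_word = sum(bit << (i + 1) for i, bit in enumerate(carry_bits[:-1]))
    return (sum_word + carry_word) & mask


def carry_save(a, b, c, width):
    """Helper for the example: one 3-to-2 carry-save step per column."""
    sum_word = a ^ b ^ c
    carry_word = (a & b) | (a & c) | (b & c)
    sum_bits = [(sum_word >> i) & 1 for i in range(width)]
    carry_bits = [(carry_word >> i) & 1 for i in range(width)]
    return sum_bits, carry_bits


s_bits, c_bits = carry_save(0x1234, 0x0FF0, 0x00FF, 16)
result = final_add(s_bits, c_bits)
```

Here `result` equals the sum of the three inputs truncated to 16 bits, which mirrors the 16-bit adder in the 8 × 8 example.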
The multiplier provided in this embodiment can use the accumulation circuit to accumulate the two signals output by the improved Wallace tree group circuit and obtain the target operation result. The multiplier can multiply data of multiple different bit widths, which effectively reduces the area the multiplier occupies on an AI chip and reduces the power consumption of the multiplier.
Fig. 6 is a schematic flowchart of the data processing method provided by one embodiment. The method may be performed by the multiplier shown in Fig. 1 and Fig. 2, and this embodiment relates to the process of multiplying data of different bit widths. As shown in Fig. 6, the method includes:
S101: receive the data to be processed and a function selection mode signal, where the function selection mode signal indicates the bit width of the data that can currently be processed.
Specifically, the multiplier may receive the data to be processed through the improved canonical signed-number encoding circuit; the data to be processed may be the multiplier operand and the multiplicand of the multiplication. For each multiplication, the multiplier may also receive a function selection mode signal through both the improved canonical signed-number encoding circuit and the improved Wallace tree group circuit. The signal may differ from one operation to the next, but within the same operation the signals received by the two circuits may be identical. When the multiplier receives different function selection mode signals, it can process operations on data of different bit widths. The correspondence between the function selection mode signals and the bit widths the multiplier can process may be set flexibly, and this embodiment places no limitation on it.
It should be noted that, if the bit width of the multiplier operand and multiplicand to be processed received by the improved canonical signed-number encoding circuit is not equal to the processable data bit width corresponding to the function selection mode signal received by the multiplier, the multiplier may, according to its currently processable data bit width, divide the received data into multiple groups of data whose bit width equals that width and process the groups in parallel. In this case the bit width of the received data may be greater than the currently processable data bit width. Optionally, parallel processing here means that every group of data obtained by the division is processed at the same time. If the bit width of the received data equals the processable data bit width corresponding to the function selection mode signal, the multiplier processes the received data directly. Optionally, the data to be processed may include high-order data to be processed and low-order data to be processed: if the bit width of the data to be processed is 2N, the high N bits are the high-order data to be processed and the low N bits are the low-order data to be processed.
Optionally, the bit width of the multiplier operand and multiplicand to be processed received by the improved canonical signed-number encoding circuit may be 8 bits, 16 bits, 32 bits, or 64 bits, and this embodiment places no limitation on it. The bit width of the multiplier operand to be processed may equal that of the multiplicand to be processed.
Illustratively, if the improved canonical signed-number encoding circuit and the improved Wallace tree group circuit can receive multiple function selection mode signals, take four function selection mode signals as an example: mode=00, mode=01, mode=10, and mode=11. Then mode=00 may indicate that the multiplier can process 8-bit data, mode=01 that it can process 16-bit data, mode=10 that it can process 32-bit data, and mode=11 that it can process 64-bit data. Alternatively, mode=00 may indicate 32-bit data, mode=01 64-bit data, mode=10 8-bit data, and mode=11 16-bit data. This embodiment allows the assignment to be set flexibly.
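Step S101 can be illustrated with a small sketch that maps a mode signal to a lane width and divides wider data into groups. The table `MODE_TO_WIDTH` uses the first example assignment given above; the function name `split_operand` and the decision to reject data narrower than the lane width are assumptions of this sketch, since the text only covers the equal and the wider case.

```python
# Example assignment of mode signals to processable bit widths
# (the text states that this assignment can be chosen freely).
MODE_TO_WIDTH = {0b00: 8, 0b01: 16, 0b10: 32, 0b11: 64}


def split_operand(data, data_width, mode):
    """Divide `data` into groups of the currently processable width.

    Returns the groups with the lowest-order group first. If the data
    width equals the lane width, a single group is returned unchanged.
    """
    lane = MODE_TO_WIDTH[mode]
    if data_width < lane or data_width % lane != 0:
        # Not covered by the text; rejected here as a sketch assumption.
        raise ValueError("data width must be a multiple of the lane width")
    mask = (1 << lane) - 1
    return [(data >> shift) & mask for shift in range(0, data_width, lane)]


groups = split_operand(0xABCD, 16, 0b00)   # low 8 bits, then high 8 bits
whole = split_operand(0xABCD, 16, 0b01)    # processed directly
```

In hardware the groups would be handled at the same time; the list returned here only shows which bits belong to which group.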
S102: perform canonical signed-number encoding on the data to be processed to obtain an intermediate code.
Specifically, the multiplier may use the improved canonical signed-number encoding unit to perform canonical signed-number encoding on the multiplier operand of the received multiplication, obtaining the intermediate code. The bit width of the intermediate code may equal the bit width N of the processed multiplier operand plus 1.
Optionally, performing canonical signed-number encoding on the data to be processed in S102 to obtain the intermediate code includes: converting each run of l consecutive bits of value 1 in the data to be processed into (l+1) digits whose highest digit is 1, whose lowest digit is -1, and whose remaining digits are 0, thereby obtaining the intermediate code, where l is greater than or equal to 2.
It should be noted that the canonical signed-number encoding may be characterized as follows. For an N-bit multiplier operand, the values are processed from the low-order bits to the high-order bits. If there is a run of l (l ≥ 2) consecutive bits of value 1, the run is converted into the data "1 (0)^(l-1) (-1)", and the remaining (N-l) bit values are combined with the (l+1) digits obtained by the conversion to form new data. The new data then serves as the initial data for the next stage of conversion, and this continues until the new data obtained after conversion contains no run of l (l ≥ 2) consecutive bits of value 1, which yields the initial code. A value 0 is then padded one bit above the highest digit of the initial code to obtain the (N+1)-bit intermediate code. For a 2N-bit multiplier operand, the multiplier performs canonical signed-number encoding on the complete 2N-bit multiplier operand to obtain a 2N-bit initial code, and obtains the (2N+1)-bit intermediate code from the initial code.
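The run-conversion rule above can be sketched as follows. This is a software illustration under stated assumptions, not the encoding circuit: the function name `csd_encode` is hypothetical, digits are stored lowest digit first, and the operand is treated as an unsigned bit pattern. The sketch keeps N+1 digit positions, so a run that reaches the top bit places its leading 1 in the extra position; otherwise that position holds the padded 0. How the hardware treats a run that reaches the sign bit of a signed operand is not spelled out in this passage.

```python
def csd_encode(value, n):
    """Encode an n-bit pattern into n + 1 signed digits (-1, 0, 1).

    Digits are returned lowest digit first. Each run of l >= 2
    consecutive 1s is replaced by "1 (0)^(l-1) (-1)", scanning from
    the low-order end and repeating until no such run remains.
    """
    digits = [(value >> i) & 1 for i in range(n)] + [0]  # extra high digit
    i = 0
    while i < n:
        if digits[i] == 1 and digits[i + 1] == 1:
            j = i
            while j < n and digits[j] == 1:
                j += 1                      # run occupies positions i .. j-1
            digits[i] = -1                  # lowest digit of the run
            for k in range(i + 1, j):
                digits[k] = 0               # middle digits
            digits[j] = 1                   # digit just above the run
            i = j                           # the new 1 may start another run
        else:
            i += 1
    return digits


code = csd_encode(0b0110111, 7)
```

For the 7-bit pattern 0110111, the two runs are converted in turn and the resulting digits, read from the high end, are 0 1 0 0 -1 0 0 -1, which still represents the value 55.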
S103: perform bit-padding processing on the intermediate code according to the function selection mode signal to obtain the target code.
Specifically, the bit width of the target code may equal the bit width N of the multiplier operand in the multiplication plus 1. Optionally, the function selection mode signal may be defined in several different forms, each of which corresponds to a bit width of data that the multiplier can process.
In this embodiment, the target code may include a low-order target code and a high-order target code. If the data to be processed received by the multiplier is 2N bits wide and the function selection mode signal indicates that the currently processable data width is N bits, the 2N-bit data may be divided into high N-bit data and low N-bit data; the high N-bit data yields the corresponding (N+1)-bit high-order target code, and the low N-bit data yields the corresponding (N+1)-bit low-order target code. If the function selection mode signal indicates that the currently processable data width is 2N bits, the multiplier processes the complete 2N-bit data to obtain a (2N+1)-bit intermediate code and performs bit-padding processing on that intermediate code to obtain a (2N+2)-bit target code, in which the high (N+1) digits may serve as the high-order target code and the low (N+1) digits as the low-order target code. Optionally, the bit-padding processing may pad a value 0 one bit above the highest digit of the intermediate code.
S104: perform conversion processing on the data to be processed and the target code to obtain the partial products of the target code.
Specifically, the low-order partial product acquisition unit and the high-order partial product acquisition unit in the multiplier may obtain the partial products of the target code according to the received multiplicand to be processed and the target code obtained by the improved canonical signed-number encoding unit. Optionally, the partial product of the target code may be the sign-extended partial product obtained by the multiplier, or may equal part of the bit values of the sign-extended partial product. The number of sign-extended partial products may equal the bit width of the target code, or the bit width of the target code plus 1. Optionally, the conversion processing may be characterized as converting the target code into the partial products of the target code on the basis of the data to be processed, where that data is the multiplicand.
S105: accumulate the column values in the partial products of the target code through the improved Wallace tree group circuit to obtain the target operation result.
Specifically, the improved Wallace tree group circuit may be composed of 4-2 compressors.
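For reference, a 4-2 compressor can be modelled with two chained full adders. The sketch below shows the common textbook construction, which is an assumption here: this passage does not show the internal wiring of the compressors used in the improved Wallace tree group circuit, and the function names are hypothetical.

```python
def full_adder(a, b, c):
    """Add three 1-bit inputs; return (sum bit, carry bit)."""
    return a ^ b ^ c, (a & b) | (a & c) | (b & c)


def compressor_4_2(x1, x2, x3, x4, cin):
    """Reduce four column bits plus a carry-in to sum, carry, and cout.

    Invariant: x1 + x2 + x3 + x4 + cin == s + 2 * (carry + cout).
    cout depends only on x1..x3, so it does not ripple through cin.
    """
    s1, cout = full_adder(x1, x2, x3)
    s, carry = full_adder(s1, x4, cin)
    return s, carry, cout


s, carry, cout = compressor_4_2(1, 1, 0, 1, 1)
```

Both `carry` and `cout` have the weight of the next-higher column, which is why one of them can be passed sideways to the neighbouring compressor while the other goes to the final adder.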
The data processing method provided in this embodiment performs canonical signed-number encoding on the data to be processed, obtains the target code according to the received function selection mode signal, and obtains the partial products of the target code from the target code. This reduces the number of effective partial products of the target code in the multiplication and thus the complexity of the multiplication. Meanwhile, the method can multiply data of multiple different bit widths according to the function selection mode signal received by the multiplier, which effectively reduces the area the multiplier occupies on an AI chip. In addition, the mode selection circuit in the improved Wallace tree sub-circuit gates the values in the partial products of the target code so that the signals received by a full adder in the improved Wallace tree sub-circuit are low-level signals, which ensures that the full adder is shut off and thereby reduces power consumption.
In the data processing method provided by another embodiment, the step in S103 of performing bit-padding processing on the intermediate code according to the function selection mode signal to obtain the target code may specifically include: judging, according to the function selection mode signal, whether bit-padding processing needs to be performed on the intermediate code; and, if bit-padding processing is needed, padding a value 0 one bit above the highest digit of the intermediate code to obtain the target code.
Specifically, the multiplier may receive different function selection mode signals, and different function selection mode signals may correspond to different bit widths of data that the multiplier can currently process. Optionally, different function selection mode signals may let the multiplier process N-bit data operations or 2N-bit data operations. Illustratively, if the data to be processed received by the multiplier is 2N bits wide and the currently processable data width is N bits, the 2N-bit data is divided into high N-bit data and low N-bit data. The high N-bit data and the low N-bit data are processed separately to obtain N-bit initial codes, a value 0 is padded one bit above the highest digit of each N-bit initial code to obtain an (N+1)-bit intermediate code, and this (N+1)-bit intermediate code is used as the target code. If the currently processable data width is 2N bits, the multiplier may process the complete 2N-bit data to obtain a 2N-bit initial code and pad a value 0 one bit above its highest digit to obtain the (2N+1)-bit intermediate code; in this case the multiplier needs to perform bit-padding processing on this (2N+1)-bit intermediate code to obtain the target code. The bit-padding processing may be characterized as padding a value 0 one bit above the highest digit of the intermediate code.
Optionally, after the step of judging, according to the function selection mode signal, whether bit-padding processing needs to be performed on the intermediate code, the method may further include: if bit-padding processing is not needed, using the intermediate code as the target code.
It should be noted that, continuing the previous example, when the multiplier can currently process N-bit data, no bit-padding processing of the intermediate code is needed, and the (N+1)-bit intermediate code may be used directly as the target code.
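The mode-dependent padding decision can be summarized in a short sketch. It takes already computed intermediate codes as input (lists of digits, lowest digit first); the function names and the list representation are assumptions made for illustration.

```python
def target_codes_narrow(low_intermediate, high_intermediate):
    """N-bit mode: each (N+1)-digit intermediate code is used as is."""
    return low_intermediate, high_intermediate


def target_codes_wide(intermediate, n):
    """2N-bit mode: pad one 0 above the (2N+1)-digit intermediate code,
    then split the (2N+2)-digit target code into two (N+1)-digit halves.

    Returns (low-order target code, high-order target code).
    """
    assert len(intermediate) == 2 * n + 1
    target = intermediate + [0]          # pad a 0 at the high end
    return target[: n + 1], target[n + 1:]


# Example with N = 2: a 5-digit intermediate code, lowest digit first.
low, high = target_codes_wide([-1, 0, 1, 0, 0], 2)
```

Either way, both halves end up with N+1 digits, so the same partial product logic can consume them in both modes.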
The data processing method provided in this embodiment obtains fewer partial products of the target code, which reduces the complexity of the multiplication.
As one embodiment, the step in S104 of performing conversion processing on the data to be processed and the target code to obtain the partial products of the target code may include:
S1041: perform conversion processing on the low-order target code in the target code and the data to be processed to obtain the low-order partial products of the target code.
Optionally, the step in S1041 of performing conversion processing on the low-order target code in the target code and the data to be processed to obtain the low-order partial products of the target code may specifically include: obtaining sign-extended partial products according to the low-order target code and the data to be processed; gating, through the low-order selector group unit, values in the sign-extended low-order partial products; performing conversion processing on the gated values in the sign-extended low-order partial products and the values in the sign-extended partial products to obtain the sign-extended low-order partial products; and obtaining the low-order partial products of the target code from the sign-extended low-order partial products.
Specifically, according to the received function selection mode signal, the low-order target code, and the data to be processed, the multiplier obtains the original low-order partial products corresponding to the bit width of the data it is currently processing, and performs sign extension on the original low-order partial products to obtain the sign-extended partial products. Optionally, an original low-order partial product may be a low-order partial product that has not been sign-extended, which can also be understood as the partial product, without sign extension, obtained for the low-order data. Optionally, the bit width of a sign-extended partial product may equal 2 times the bit width N of the data the multiplier can currently process, and the bit width of an original low-order partial product may equal (N+1). Optionally, a sign-extended partial product may consist of the (N+1) bit values of the original low-order partial product together with (N-1) consecutive copies of the sign bit value of the original low-order partial product.
It should be noted that, if the low-order partial product acquisition unit receives an 8-bit multiplicand x7x6x5x4x3x2x1x0 (denoted X), it may directly obtain the corresponding original low-order partial product from the multiplicand X and the three values -1, 1, and 0 contained in the low-order target code: when the value in the low-order target code is -1, the original low-order partial product may be -X; when the value is 1, it may be X; and when the value is 0, it may be 0.
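The selection of -X, X, or 0 and the subsequent sign extension can be sketched as below. The sketch assumes that the multiplicand is an N-bit two's-complement number, which is suggested by the use of sign extension but not stated in this passage; the function names are hypothetical.

```python
def to_signed(value, width):
    """Interpret the low `width` bits of value as two's complement."""
    value &= (1 << width) - 1
    return value - (1 << width) if value >> (width - 1) else value


def original_partial_product(digit, x_bits, n):
    """(N+1)-bit pattern of digit * X for digit in (-1, 0, 1)."""
    x = to_signed(x_bits, n)
    return (digit * x) & ((1 << (n + 1)) - 1)


def sign_extend(pp, n):
    """Extend an (N+1)-bit pattern to 2N bits by prepending N-1 copies
    of its sign bit."""
    sign = (pp >> n) & 1
    extension = (((1 << (n - 1)) - 1) << (n + 1)) if sign else 0
    return pp | extension


# 8-bit multiplicand X = 5 and target-code digit -1 give -X.
pp = sign_extend(original_partial_product(-1, 0x05, 8), 8)
```

The extra bit in the (N+1)-bit original partial product is what allows -X to be represented when X is the most negative N-bit value.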
It can be understood that each low-order selector in the low-order selector group unit may, according to the function selection mode signal received, gate the corresponding bit value in the sign-extended low-order partial product. Optionally, the low-order partial product acquisition unit may obtain the sign-extended low-order partial product corresponding to the bit width of the data the multiplier is currently processing from two sources: the values in the sign-extended low-order partial product gated by the low-order selector group unit, and part of the bit values of the sign-extended partial product obtained for data of the currently processable bit width.
Further, the multiplier may obtain the low-order partial products of the corresponding target code from all sign-extended low-order partial products. The distribution rule of the low-order partial products of all target codes may be characterized as follows. The low-order partial product of the first target code may equal the first sign-extended low-order partial product, that is, the sign-extended low-order partial product corresponding to the lowest digit of the low-order target code. Starting from the low-order partial product of the second target code, the highest bit value of the low-order partial product of each target code lies in the same column as the highest bit value of the low-order partial product of the first target code. The low-order partial product of each target code may equal the corresponding sign-extended low-order partial product, and the lowest bit value of that sign-extended low-order partial product lies in the same column as the second-lowest bit value of the low-order partial product of the previous target code. In other words, each partial product is offset by one column from the previous one, and the values of a sign-extended low-order partial product that extend beyond the highest column of the low-order partial product of the first target code do not participate in subsequent operations.
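The distribution rule can be illustrated by arranging one row per target-code digit, shifting each row by one more column than the previous one, and discarding everything beyond the highest column. The sketch below is a simplified single-lane model under the assumption of a signed multiplicand and 2N columns in total; the function names are hypothetical and the selector gating is not modelled.

```python
def to_signed(value, width):
    """Interpret the low `width` bits of value as two's complement."""
    value &= (1 << width) - 1
    return value - (1 << width) if value >> (width - 1) else value


def arrange_partial_products(digits, x, n):
    """Build one 2N-column row per target-code digit.

    digits: target-code digits (-1, 0, 1), lowest digit first.
    x:      signed multiplicand.
    Row k holds digit_k * x shifted up by k columns; bits beyond
    column 2N - 1 are dropped and take no part in later steps.
    """
    mask = (1 << (2 * n)) - 1
    return [((d * x) << k) & mask for k, d in enumerate(digits)]


# Digits for the 8-bit operand 55, lowest digit first: 64 - 8 - 1.
digits = [-1, 0, 0, -1, 0, 0, 1, 0, 0]
rows = arrange_partial_products(digits, -3, 8)
product = to_signed(sum(rows), 16)
```

Only three of the nine rows are non-zero here, which is the reduction in effective partial products that the encoding is meant to achieve.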
S1042: perform conversion processing on the high-order target code in the target code and the data to be processed to obtain the high-order partial products of the target code.
Optionally, the step in S1042 of performing conversion processing on the high-order target code in the target code and the data to be processed to obtain the high-order partial products of the target code may specifically include: obtaining sign-extended partial products according to the high-order target code and the data to be processed; gating, through the high-order selector group unit, values in the sign-extended high-order partial products; obtaining the sign-extended high-order partial products from the gated values in the sign-extended high-order partial products and the values in the sign-extended partial products; and obtaining the high-order partial products of the target code from the sign-extended high-order partial products.
Specifically, according to the received function selection mode signal, the high-order target code, and the data to be processed, the multiplier obtains the original high-order partial products corresponding to the bit width of the data it is currently processing, and performs sign extension on the original high-order partial products to obtain the sign-extended partial products. Optionally, an original high-order partial product may be a high-order partial product that has not been sign-extended, which can also be understood as the partial product, without sign extension, obtained for the high-order data. Optionally, the bit width of a sign-extended partial product may equal 2 times the bit width N of the data the multiplier can process, and the bit width of an original high-order partial product may equal (N+1). Optionally, a sign-extended partial product may consist of the (N+1) bit values of the original high-order partial product together with (N-1) copies of the sign bit value of the original high-order partial product.
It should be noted that each high-order selector in the high-order selector group unit may, according to the function selection mode signal received, gate the corresponding bit value in the sign-extended high-order partial product. Optionally, the high-order partial product acquisition unit may obtain the sign-extended high-order partial product corresponding to the bit width of the data the multiplier is currently processing from two sources: the values in the sign-extended high-order partial product gated by the high-order selector group unit, and part of the bit values of the sign-extended partial product obtained for data of the currently processable bit width.
Further, the multiplier may obtain the high-order partial products of the corresponding target code from all sign-extended high-order partial products. The distribution rule of the high-order partial products of all target codes may be characterized as follows. The high-order partial product of the first target code may be placed as the partial product of the target code immediately following the low-order partial product of the last target code; it is the partial product of the target code corresponding to the lowest digit of the high-order target code. The bit width of the high-order partial product of the first target code may equal the bit width of the low-order partial product of the last target code minus 1. That is, the high-order partial product of the first target code may equal the first sign-extended high-order partial product, whose lowest bit value lies in the same column as the second-lowest bit value of the low-order partial product of the last target code; equivalently, the values of the first sign-extended high-order partial product that extend beyond the highest column of the low-order partial product of the last target code do not participate in subsequent operations. Starting from the high-order partial product of the second target code, the highest bit value of the high-order partial product of each target code lies in the same column as the highest bit value of the high-order partial product of the first target code. The high-order partial product of each target code may equal the corresponding sign-extended high-order partial product, whose lowest bit value lies in the same column as the second-lowest bit value of the high-order partial product of the previous target code. In other words, the values of a sign-extended high-order partial product that extend beyond the highest column of the high-order partial product of the first target code do not participate in subsequent operations.
The execution order of steps S1041 and S1042 is not limited in this embodiment.
The data processing method provided in this embodiment obtains fewer partial products of the target code, which reduces the complexity of the multiplication.
In the multiplication method provided by another embodiment, the step in S105 of accumulating the column values in the partial products of the target code through the improved Wallace tree group circuit to obtain the target operation result may specifically include:
S1051: accumulate the column values in the partial products of all target codes through the low-order improved Wallace tree sub-circuits to obtain an intermediate operation result.
Specifically, according to the distribution rules of the low-order partial products and the high-order partial products of all target codes, the partial products of all target codes span a total of 2N columns of values (N being the bit width of the data the multiplier is currently processing). Starting from the lowest bit value, the columns may be numbered 0, ..., 2N-1, where the columns numbered 0 to N-1 may be called the low N columns. Optionally, the intermediate operation result may be the carry output signal Cout output by the last of the low-order improved Wallace tree sub-circuits.
It should be noted that the N low-order improved Wallace tree sub-circuits may accumulate the low N columns of values in numbering order to obtain the intermediate operation result. Optionally, the intermediate operation result may include the carry output signal Carry and the sum-bit output signal Sum of each low-order improved Wallace tree sub-circuit, as well as the output signal Cout of the last low-order improved Wallace tree sub-circuit.
S1052: gate the intermediate operation result through the selector to obtain a carry gating signal.
Specifically, the selector in the improved Wallace tree group circuit may, according to the received function selection mode signal, gate either the output signal Cout of the last low-order improved Wallace tree sub-circuit or the value 0 to obtain the carry gating signal.
S1053: accumulate, through the high-order improved Wallace tree sub-circuits, the column values in the partial products of the target code according to the carry gating signal to obtain the target operation result.
Specifically, according to the distribution rule of the partial products of all target codes, the partial products of all target codes span a total of 2N columns of values (N being the bit width of the data the multiplier is currently processing). Starting from the lowest bit value, the columns may be numbered 0, ..., 2N-1, where the columns numbered N to 2N-1 may be called the high N columns.
It should be noted that the N high-order improved Wallace tree sub-circuits may accumulate the high N columns of values in numbering order and output the accumulation result. The carry input signal received by the first high-order improved Wallace tree sub-circuit may be the carry gating signal output by the selector.
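Steps S1051 to S1053 can be modelled behaviourally as a column-by-column accumulation in which the carry between column N-1 and column N passes through a selector. The sketch below is a deliberate simplification: each column is reduced with a plain integer sum instead of a network of 4-2 compressors, and the function name and the `link_halves` flag (standing in for the function selection mode signal) are hypothetical.

```python
def accumulate_columns(rows, n, link_halves):
    """Accumulate 2N columns of partial product bits.

    rows:        partial products as 2N-bit integers.
    link_halves: True  -> the selector passes Cout of the low half on;
                 False -> the selector outputs 0, so the low N columns
                          and the high N columns stay independent.
    """
    result = 0
    carry = 0
    for col in range(2 * n):
        if col == n and not link_halves:
            carry = 0                       # selector gates the value 0
        total = carry + sum((row >> col) & 1 for row in rows)
        result |= (total & 1) << col        # bit kept in this column
        carry = total >> 1                  # carried into the next column
    return result


rows = [0x12F0, 0x3420]
linked = accumulate_columns(rows, 8, True)
split = accumulate_columns(rows, 8, False)
```

With the halves linked, the rows are summed as 16-bit values; with the carry gated to 0, the low and high bytes are summed separately, which is the behaviour that lets one set of columns serve two narrower operations.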
The data processing method provided in this embodiment performs canonical signed-number encoding on the data to be processed, obtains the target code according to the received function selection mode signal, and obtains the partial products of the target code from the target code. This reduces the number of effective partial products of the target code in the multiplication and thus the complexity of the multiplication. Meanwhile, the method can multiply data of multiple different bit widths according to the function selection mode signal received by the multiplier, which effectively reduces the area the multiplier occupies on an AI chip.
An embodiment of the present application further provides a machine learning operation device, which includes one or more of the multipliers mentioned in this application. The device is used to obtain data to be operated on and control information from other processing devices, execute the specified machine learning operation, and pass the execution result to peripheral devices through an I/O interface. Peripheral devices include, for example, cameras, displays, mice, keyboards, network cards, Wi-Fi interfaces, and servers. When more than one multiplier is included, the multipliers may be linked and transmit data through a specific structure, for example interconnected through a PCIe (fast peripheral component interconnect) bus, to support larger-scale machine learning operations. In this case they may share the same control system or have independent control systems; they may share memory, or each accelerator may have its own memory. In addition, their interconnection may use any interconnection topology.
The machine learning operation device has high compatibility and can be connected to various types of servers through a PCIe interface.
The embodiment of the present application also provides a combined treatment devices comprising above-mentioned machine learning arithmetic unit leads to With interconnecting interface and other processing units.Machine learning arithmetic unit is interacted with other processing units, completes user jointly Specified operation.Fig. 7 is the schematic diagram of combined treatment device.
Other processing units, including central processor CPU, graphics processor GPU, neural network processor etc. are general/special With one of processor or above processor type.Processor number included by other processing units is with no restrictions.Its Interface of its processing unit as machine learning arithmetic unit and external data and control, including data are carried, and are completed to the machine Device learns the basic control such as unlatching, stopping of arithmetic unit;Other processing units can also cooperate with machine learning arithmetic unit It is common to complete processor active task.
General interconnecting interface, for transmitting data and control between the machine learning arithmetic unit and other processing units Instruction.The machine learning arithmetic unit obtains required input data, write-in machine learning operation dress from other processing units Set the storage device of on piece;Control instruction can be obtained from other processing units, write-in machine learning arithmetic unit on piece Control caching;It can also learn the data in the memory module of arithmetic unit with read machine and be transferred to other processing units.
Optionally, the structure as shown in figure 8, can also include storage device, storage device respectively with the machine learning Arithmetic unit is connected with other processing units.Storage device for be stored in the machine learning arithmetic unit and it is described its The data of the data of its processing unit, operation required for being particularly suitable for learn arithmetic unit or other processing units in machine Storage inside in the data that can not all save.
The combined treatment device can be used as the SOC on piece of the equipment such as mobile phone, robot, unmanned plane, video monitoring equipment The die area of control section is effectively reduced in system, improves processing speed, reduces overall power.When this situation, the combined treatment The general interconnecting interface of device is connected with certain components of equipment.Certain components for example camera, display, mouse, keyboard, Network interface card, wifi interface.
In some embodiments, a chip is also claimed, which includes the above machine learning operation device or combined processing device.
In some embodiments, a chip package structure is claimed, which includes the above chip.
In some embodiments, a board is claimed, which includes the above chip package structure. As shown in Fig. 9, in addition to the chip 389, the board may include other supporting components, including but not limited to a memory device 390, a receiving device 391, and a control device 392.
The memory device 390 is connected by a bus to the chip in the chip package structure and is used to store data. The memory device may include multiple groups of storage units 393. Each group of storage units is connected to the chip by a bus. Each group of storage units may be DDR SDRAM (Double Data Rate Synchronous Dynamic Random Access Memory).
DDR doubles the speed of SDRAM without raising the clock frequency, because it allows data to be read on both the rising edge and the falling edge of the clock pulse. DDR is therefore twice as fast as standard SDRAM. In one embodiment, the memory device may include four groups of storage units. Each group of storage units may include multiple DDR4 chips. In one embodiment, the chip may contain four 72-bit DDR4 controllers; in each 72-bit DDR4 controller, 64 bits are used for data transfer and 8 bits are used for ECC checking. When DDR4-3200 chips are used in each group of storage units, the theoretical data transfer bandwidth can reach 25600 MB/s.
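The 25600 MB/s figure follows from the transfer rate and the data width. A minimal Python sketch of the arithmetic is given below; the function name `ddr_bandwidth_mb_s` is hypothetical, and the calculation assumes that only the 64 data bits of each 72-bit controller contribute to bandwidth, with the 8 ECC bits excluded.

```python
def ddr_bandwidth_mb_s(mega_transfers_per_s: int, data_bits: int) -> float:
    """Theoretical bandwidth in MB/s: transfers per second times bytes per transfer."""
    return mega_transfers_per_s * data_bits / 8


# DDR4-3200 performs 3200 million transfers per second (two per clock cycle
# at a 1600 MHz clock); 64 of the 72 controller bits carry data.
per_controller = ddr_bandwidth_mb_s(3200, 64)
print(per_controller)  # 25600.0
```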
In one embodiment, each group of storage units includes multiple double data rate synchronous dynamic random access memories arranged in parallel. DDR can transfer data twice in one clock cycle. A controller for the DDR is provided in the chip to control the data transfer and data storage of each storage unit.
The receiving device is electrically connected to the chip in the chip package structure and is used to transfer data between the chip and an external device (such as a server or a computer). For example, in one embodiment, the receiving device may be a standard PCIe interface: the data to be processed is transferred from the server to the chip through the standard PCIe interface. Preferably, when a PCIe 3.0 x16 interface is used, the theoretical bandwidth can reach 16000 MB/s. In another embodiment, the receiving device may be another interface; the present application does not limit the specific form of such other interfaces, provided that the interface unit can perform the transfer function. In addition, the calculation results of the chip are sent back to the external device (such as a server) by the receiving device.
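The 16000 MB/s figure corresponds to the raw signalling rate of a PCIe 3.0 x16 link. The Python sketch below reproduces that arithmetic and, for comparison, the slightly lower figure once the 128b/130b line encoding of PCIe 3.0 is taken into account. The function name `pcie_bandwidth_mb_s` is hypothetical, and protocol overhead above the line encoding is ignored.

```python
def pcie_bandwidth_mb_s(gt_per_s: float, lanes: int,
                        payload_bits: int = 1, total_bits: int = 1) -> float:
    """Per-direction bandwidth in MB/s.

    gt_per_s is the per-lane rate in gigatransfers per second (one bit per
    transfer); payload_bits/total_bits is the line-encoding efficiency.
    """
    return gt_per_s * 1000 * lanes * payload_bits / total_bits / 8


raw = pcie_bandwidth_mb_s(8.0, 16)                  # 16000.0, raw rate
effective = pcie_bandwidth_mb_s(8.0, 16, 128, 130)  # about 15753.8 with 128b/130b
```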
The control device is electrically connected to the chip and is used to monitor the state of the chip. Specifically, the chip and the control device may be electrically connected through an SPI interface. The control device may include a microcontroller unit (MCU). The chip may include multiple processing chips, multiple processing cores, or multiple processing circuits, and can drive multiple loads. The chip can therefore be in different working states such as heavy load and light load. The control device can regulate the working states of the multiple processing chips, multiple processing cores, and/or multiple processing circuits in the chip.
In some embodiments, an electronic device is claimed, which includes the above board.
The electronic device may be a multiplier, robot, computer, printer, scanner, tablet computer, smart terminal, mobile phone, dashboard camera, navigator, sensor, webcam, server, cloud server, camera, video camera, projector, watch, earphone, mobile storage device, wearable device, vehicle, household appliance, and/or medical device.
The vehicles include aircraft, ships, and/or cars; the household appliances include televisions, air conditioners, microwave ovens, refrigerators, rice cookers, humidifiers, washing machines, electric lights, gas stoves, and range hoods; the medical devices include nuclear magnetic resonance instruments, B-mode ultrasound scanners, and/or electrocardiographs.
It should be noted that, for brevity, the foregoing method embodiments are described as a series of circuit combinations, but those skilled in the art should understand that the present application is not limited by the described circuit combinations, because according to the present application some circuits may be implemented in other ways or structures. Those skilled in the art should also understand that the embodiments described in this specification are optional embodiments, and the devices and modules involved are not necessarily required by the present application.
In the above embodiments, the description of each embodiment has its own emphasis. For parts not described in detail in one embodiment, refer to the related descriptions of other embodiments.
The above embodiments express only several implementations of the present application. Their description is relatively specific and detailed, but it should not be construed as limiting the scope of the patent. It should be noted that those of ordinary skill in the art can make various modifications and improvements without departing from the concept of the present application, and these all fall within the protection scope of the present application. Therefore, the protection scope of this patent shall be subject to the appended claims.

Claims (28)

1. A multiplier, characterized in that the multiplier comprises: an improved canonical signed digit (CSD) encoding circuit, an improved Wallace tree group circuit, and an accumulation circuit, wherein the improved Wallace tree group circuit comprises a 4-2 compressor, the 4-2 compressor comprises a selection circuit and a full adder, the output of the improved CSD encoding circuit is connected to the input of the improved Wallace tree group circuit, and the output of the improved Wallace tree group circuit is connected to the input of the accumulation circuit;
wherein the improved CSD encoding circuit is configured to perform CSD encoding on the received data to obtain sign-extended partial products, and to obtain the partial products of the target code from the sign-extended partial products; the improved Wallace tree group circuit is configured to accumulate the partial products of the target code to obtain an accumulation result; and the accumulation circuit is configured to accumulate the accumulation result.
2. The multiplier according to claim 1, characterized in that the improved CSD encoding circuit comprises a first input for receiving a function selection mode signal; the improved Wallace tree group circuit comprises a second input for receiving the function selection mode signal; and the function selection mode signal is used to determine the bit width of the data that the multiplier can process.
3. The multiplier according to claim 1 or 2, characterized in that the improved CSD encoding circuit comprises: an improved CSD encoding unit, a low-order partial product acquisition unit, a low-order selector group unit, a high-order partial product acquisition unit, and a high-order selector group unit, wherein a first output of the improved CSD encoding unit is connected to a first input of the low-order partial product acquisition unit, the output of the low-order selector group unit is connected to a second input of the low-order partial product acquisition unit, a second output of the improved CSD encoding unit is connected to a first input of the high-order partial product acquisition unit, and the output of the high-order selector group unit is connected to a second input of the high-order partial product acquisition unit;
wherein the improved CSD encoding unit is configured to perform CSD encoding on the received first data, to determine, according to the received function selection mode signal, the bit width of the data that the multiplier can process, and to obtain the target code according to that bit width; the low-order partial product acquisition unit is configured to obtain sign-extended low-order partial products according to the low-order target code in the received target code and the second data, and to obtain the low-order partial products of the target code from the sign-extended low-order partial products; the low-order selector group unit is configured to gate the values in the sign-extended low-order partial products; the high-order partial product acquisition unit is configured to obtain sign-extended high-order partial products according to the high-order target code in the received target code and the second data, and to obtain the high-order partial products of the target code from the sign-extended high-order partial products; and the high-order selector group unit is configured to gate the values in the sign-extended high-order partial products.
4. The multiplier according to claim 3, characterized in that the improved CSD encoding unit comprises: a first data input port, a first mode selection signal input port, a low-order target code output port, and a high-order target code output port; the first data input port is used to receive the first data, the first mode selection signal input port is used to receive the function selection mode signal, the low-order target code output port is used to output the low-order target code obtained by performing CSD encoding on the first data, and the high-order target code output port is used to output the high-order target code obtained by performing CSD encoding on the first data.
5. The multiplier according to claim 3 or 4, characterized in that the low-order partial product acquisition unit comprises: a low-order target code input port, a first gated value input port, a second mode selection signal input port, a second data input port, and a low-order partial product output port; the low-order target code input port is used to receive the low-order target code, the first gated value input port is used to receive the values of the low-order partial products output after gating by the low-order selector group unit, the second mode selection signal input port is used to receive the function selection mode signal, the second data input port is used to receive the second data, and the low-order partial product output port is used to output the low-order partial products of the target code.
6. The multiplier according to any one of claims 3 to 5, characterized in that the low-order selector group unit comprises a low-order selector, and the low-order selector is configured to gate the values contained in the sign-extended low-order partial products.
7. The multiplier according to any one of claims 3 to 6, characterized in that the high-order partial product acquisition unit comprises: a high-order target code input port, a second gated value input port, a third mode selection signal input port, a second data input port, and a high-order partial product output port; the high-order target code input port is used to receive the high-order target code, the second gated value input port is used to receive the values of the high-order partial products output after gating by the high-order selector group unit, the third mode selection signal input port is used to receive the function selection mode signal, the second data input port is used to receive the second data, and the high-order partial product output port is used to output the high-order partial products of the target code.
8. The multiplier according to any one of claims 3 to 7, characterized in that the high-order selector group unit comprises a high-order selector, and the high-order selector is configured to gate the values contained in the sign-extended high-order partial products.
9. The multiplier according to any one of claims 1 to 8, characterized in that the improved Wallace tree group circuit comprises an improved Wallace tree sub-circuit, and the improved Wallace tree sub-circuit is configured to accumulate the values of each column of the partial products of the target code to obtain the accumulation result.
10. The multiplier according to any one of claims 1 to 8, characterized in that the improved Wallace tree group circuit comprises: low-order improved Wallace tree sub-circuits, a selector, and high-order improved Wallace tree sub-circuits, wherein the output of the low-order improved Wallace tree sub-circuits is connected to the input of the selector, and the output of the selector is connected to the input of the high-order improved Wallace tree sub-circuits; wherein the multiple low-order improved Wallace tree sub-circuits are configured to accumulate the values of each column of the partial products of all target codes, the selector is configured to gate the carry input signal received by the high-order improved Wallace tree sub-circuits, and the multiple high-order improved Wallace tree sub-circuits are configured to accumulate the values of each column of the partial products of all target codes.
11. The multiplier according to claim 10, characterized in that the low-order improved Wallace tree sub-circuits and the high-order improved Wallace tree sub-circuits each comprise the 4-2 compressor and a mode selection unit, and the output of the mode selection unit is connected to the input of the 4-2 compressor; the 4-2 compressor is configured to accumulate the values of each column of the partial products of all target codes, and the mode selection unit is configured to gate the values of the partial products of the target code received by the 4-2 compressor; wherein the mode selection unit comprises a first input for receiving the function selection mode signal.
12. The multiplier according to any one of claims 1 to 11, characterized in that the accumulation circuit comprises an adder, and the adder is configured to accumulate the accumulation result.
13. The multiplier according to claim 12, characterized in that the adder comprises: a carry signal input port, a sum signal input port, and an operation result output port; the carry signal input port is used to receive a carry signal, the sum signal input port is used to receive a sum signal, and the operation result output port is used to output the target operation result obtained by accumulating the carry signal and the sum signal.
14. A data processing method, characterized in that the method comprises:
receiving data to be processed and a function selection mode signal, wherein the function selection mode signal indicates the bit width of the data that can currently be processed;
performing canonical signed digit (CSD) encoding on the data to be processed to obtain an intermediate code;
performing padding on the intermediate code according to the function selection mode signal to obtain a target code;
performing conversion processing on the data to be processed and the target code to obtain the partial products of the target code;
accumulating, by an improved Wallace tree group circuit, the column values of the partial products of the target code to obtain a target operation result.
15. The method according to claim 14, characterized in that performing CSD encoding on the data to be processed to obtain the intermediate code comprises: converting each run of l consecutive 1 bits in the data to be processed into (l+1) digits whose most significant digit is 1, whose least significant digit is -1, and whose remaining digits are 0, to obtain the intermediate code, where l is greater than or equal to 2.
16. The method according to claim 14 or 15, characterized in that performing padding on the intermediate code according to the function selection mode signal to obtain the target code comprises:
determining, according to the function selection mode signal, whether padding needs to be performed on the intermediate code;
if padding is needed, padding a value 0 at the position one digit above the most significant digit of the intermediate code to obtain the target code.
17. The method according to claim 16, characterized in that the method further comprises: if padding is not needed, using the intermediate code as the target code.
18. The method according to claim 17, characterized in that performing conversion processing on the data to be processed and the target code to obtain the partial products of the target code comprises:
performing conversion processing on the low-order target code in the target code and the data to be processed to obtain the low-order partial products of the target code;
performing conversion processing on the high-order target code in the target code and the data to be processed to obtain the high-order partial products of the target code.
19. The method according to claim 18, characterized in that performing conversion processing on the low-order target code in the target code and the data to be processed to obtain the low-order partial products of the target code comprises:
obtaining sign-extended partial products according to the low-order target code and the data to be processed;
gating, by a low-order selector group unit, the values in the sign-extended low-order partial products;
performing conversion processing on the gated values of the sign-extended low-order partial products and the values of the sign-extended partial products to obtain the sign-extended low-order partial products;
obtaining the low-order partial products of the target code from the sign-extended low-order partial products.
20. The method according to claim 18 or 19, characterized in that performing conversion processing on the high-order target code in the target code and the data to be processed to obtain the high-order partial products of the target code comprises:
obtaining sign-extended partial products according to the high-order target code and the data to be processed;
gating, by a high-order selector group unit, the values in the sign-extended high-order partial products;
obtaining the sign-extended high-order partial products according to the gated values of the sign-extended high-order partial products and the values of the sign-extended partial products;
obtaining the high-order partial products of the target code from the sign-extended high-order partial products.
21. The method according to any one of claims 14 to 20, characterized in that accumulating, by the improved Wallace tree group circuit, the column values of the partial products of the target code to obtain the target operation result comprises:
accumulating, by low-order improved Wallace tree sub-circuits, the column values of the partial products of all target codes to obtain an intermediate operation result;
gating the intermediate operation result by a selector to obtain a carry gating signal;
accumulating, by high-order improved Wallace tree sub-circuits, according to the carry gating signal and the column values of the partial products of the target code, to obtain the target operation result.
22. A machine learning operation device, characterized in that the machine learning operation device comprises one or more multipliers according to any one of claims 1 to 13, and is configured to obtain input data to be operated on and control information from other processing devices, perform a specified machine learning operation, and pass the execution result to the other processing devices through an I/O interface;
when the machine learning operation device comprises multiple multipliers, the multiple computing devices can be connected and transfer data through a specific structure;
wherein the multiple multipliers are interconnected and transfer data through a PCIe bus to support larger-scale machine learning operations; the multiple multipliers share one control system or have their own control systems; the multiple multipliers share memory or have their own memories; and the multiple multipliers are interconnected in any interconnect topology.
23. A combined processing device, characterized in that the combined processing device comprises the machine learning operation device according to claim 22, a universal interconnect interface, and other processing devices;
the machine learning operation device interacts with the other processing devices to jointly complete a computing operation specified by the user.
24. The combined processing device according to claim 23, characterized by further comprising: a storage device, which is connected to the machine learning operation device and to the other processing devices and is used to store data of the machine learning operation device and the other processing devices.
25. A neural network chip, characterized in that the machine learning chip comprises the machine learning operation device according to claim 22, or the combined processing device according to claim 23, or the combined processing device according to claim 24.
26. An electronic device, characterized in that the electronic device comprises the chip according to claim 25.
27. A board, characterized in that the board comprises: a memory device, a receiving device, a control device, and the neural network chip according to claim 25;
wherein the neural network chip is connected to the memory device, the control device, and the receiving device respectively;
the memory device is used to store data;
the receiving device is used to transfer data between the chip and an external device;
the control device is used to monitor the state of the chip.
28. The board according to claim 27, characterized in that
the memory device comprises multiple groups of storage units, each group of storage units is connected to the chip by a bus, and the storage units are DDR SDRAM;
the chip comprises a DDR controller for controlling the data transfer and data storage of each storage unit;
the receiving device is a standard PCIe interface.
CN201910817971.8A 2019-08-30 2019-08-30 Multiplier, data processing method, chip and electronic equipment Active CN110515587B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910817971.8A CN110515587B (en) 2019-08-30 2019-08-30 Multiplier, data processing method, chip and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910817971.8A CN110515587B (en) 2019-08-30 2019-08-30 Multiplier, data processing method, chip and electronic equipment

Publications (2)

Publication Number Publication Date
CN110515587A true CN110515587A (en) 2019-11-29
CN110515587B CN110515587B (en) 2024-01-19

Family

ID=68628671

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910817971.8A Active CN110515587B (en) 2019-08-30 2019-08-30 Multiplier, data processing method, chip and electronic equipment

Country Status (1)

Country Link
CN (1) CN110515587B (en)

Also Published As

Publication number Publication date
CN110515587B (en) 2024-01-19

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant