CN110096309A - Operation method, device, computer equipment and storage medium - Google Patents
- Publication number
- CN110096309A (application number CN201910548268.1A)
- Authority
- CN
- China
- Prior art keywords
- average
- instruction
- machine learning
- pooling
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30003—Arrangements for executing specific machine instructions
- G06F9/30007—Arrangements for executing specific machine instructions to perform operations on data operands
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/06—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
- G06N3/063—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
Abstract
This disclosure relates to an operation method, device, computer equipment and storage medium. The combined processing device described herein includes a machine learning operation device, a universal interconnection interface and other processing devices. The machine learning operation device interacts with the other processing devices to jointly complete the computing operation specified by the user. The combined processing device further includes a storage device, which is connected to the machine learning operation device and the other processing devices respectively and is used to save the data of the machine learning operation device and the other processing devices. The operation method, device, computer equipment and storage medium provided by the embodiments of the present disclosure have a wide range of applications, and perform operations with high processing efficiency and high speed.
Description
Technical field
This disclosure relates to the field of computer technology, and in particular to an average pooling instruction processing method, device, computer equipment and storage medium.
Background technique
With the continuous development of science and technology, machine learning, and especially neural network algorithms, are used more and more widely, and have achieved good results in fields such as image recognition, speech recognition and natural language processing. However, as the complexity of neural network algorithms grows, the types and volume of the data operations involved keep increasing. In the related art, performing the average pooling operation (average pooling) on data is inefficient and slow.
Summary of the invention
In view of this, the present disclosure proposes an average pooling instruction processing method, device, computer equipment and storage medium, to improve the efficiency and speed of performing the average pooling operation on data.
According to a first aspect of the disclosure, an average pooling instruction processing device is provided. The device includes:
a control module, configured to compile an obtained average pooling instruction to obtain a compiled average pooling instruction, parse the compiled average pooling instruction to obtain the operation code and operation domain of the average pooling instruction, and obtain, according to the operation code and the operation domain, the data to be operated on, the pooling kernel and the destination address required to execute the average pooling instruction; and
a computing module, configured to perform the average pooling operation on the data to be operated on according to the pooling kernel to obtain an operation result, and to store the operation result at the destination address,
wherein the operation code indicates that the operation performed on the data by the average pooling instruction is the average pooling operation, and the operation domain includes the address of the data to be operated on, the address of the pooling kernel and the destination address.
According to a second aspect of the disclosure, a machine learning operation device is provided. The device includes one or more of the average pooling instruction processing devices of the first aspect above, configured to obtain data to be operated on and control information from other processing devices, execute specified machine learning operations, and transfer execution results to the other processing devices through an I/O interface.
When the machine learning operation device includes a plurality of the average pooling instruction processing devices, the plurality of average pooling instruction processing devices may be connected through a specific structure and transmit data between one another;
wherein the plurality of average pooling instruction processing devices interconnect and transmit data through a PCIE (Peripheral Component Interconnect Express) bus to support larger-scale machine learning operations; the plurality of average pooling instruction processing devices share the same control system or have their own control systems; the plurality of average pooling instruction processing devices share a memory or have their own memories; and the interconnection manner of the plurality of average pooling instruction processing devices may be any interconnection topology.
According to a third aspect of the disclosure, a combined processing device is provided. The device includes the machine learning operation device of the second aspect above, a universal interconnection interface and other processing devices. The machine learning operation device interacts with the other processing devices to jointly complete the computing operation specified by the user.
According to a fourth aspect of the disclosure, a machine learning chip is provided. The machine learning chip includes the machine learning operation device of the second aspect above or the combined processing device of the third aspect above.
According to a fifth aspect of the disclosure, a machine learning chip package structure is provided. The machine learning chip package structure includes the machine learning chip of the fourth aspect above.
According to a sixth aspect of the disclosure, a board card is provided. The board card includes the machine learning chip package structure of the fifth aspect above.
According to a seventh aspect of the disclosure, an electronic device is provided. The electronic device includes the machine learning chip of the fourth aspect above or the board card of the sixth aspect above.
According to an eighth aspect of the disclosure, an average pooling instruction processing method is provided. The method is applied to an average pooling instruction processing device and includes:
compiling an obtained average pooling instruction to obtain a compiled average pooling instruction, parsing the compiled average pooling instruction to obtain the operation code and operation domain of the average pooling instruction, and obtaining, according to the operation code and the operation domain, the data to be operated on, the pooling kernel and the destination address required to execute the average pooling instruction; and
performing the average pooling operation on the data to be operated on according to the pooling kernel to obtain an operation result, and storing the operation result at the destination address,
wherein the operation code indicates that the operation performed on the data by the average pooling instruction is the average pooling operation, and the operation domain includes the address of the data to be operated on, the address of the pooling kernel and the destination address.
According to a ninth aspect of the disclosure, a non-volatile computer-readable storage medium is provided, on which computer program instructions are stored. When the computer program instructions are executed by a processor, the average pooling instruction processing method described above is implemented.
In some embodiments, the electronic device includes a data processing device, a robot, a computer, a printer, a scanner, a tablet computer, an intelligent terminal, a mobile phone, a driving recorder, a navigator, a sensor, a camera, a server, a cloud server, a webcam, a video camera, a projector, a watch, earphones, mobile storage, a wearable device, a vehicle, a household appliance, and/or a medical device.
In some embodiments, the vehicle includes an airplane, a ship and/or a car; the household appliance includes a television, an air conditioner, a microwave oven, a refrigerator, a rice cooker, a humidifier, a washing machine, an electric lamp, a gas stove and a range hood; and the medical device includes a nuclear magnetic resonance instrument, a B-mode ultrasound instrument and/or an electrocardiograph.
With the average pooling instruction processing method, device, computer equipment and storage medium provided by the embodiments of the present disclosure, the device includes a control module and a computing module. The control module is configured to compile an obtained average pooling instruction to obtain a compiled average pooling instruction, parse the compiled average pooling instruction to obtain the operation code and operation domain of the average pooling instruction, and obtain, according to the operation code and the operation domain, the data to be operated on, the pooling kernel and the destination address required to execute the average pooling instruction. The computing module is configured to perform the average pooling operation on the data to be operated on according to the pooling kernel to obtain an operation result, and to store the operation result at the destination address. The average pooling instruction processing method, device, computer equipment and storage medium provided by the embodiments of the present disclosure have a wide range of applications, process the average pooling instruction with high efficiency and speed, and perform the average pooling operation with high efficiency and speed.
Other features and aspects of the present disclosure will become apparent from the following detailed description of exemplary embodiments with reference to the accompanying drawings.
Brief description of the drawings
The accompanying drawings, which are included in and constitute a part of the specification, illustrate exemplary embodiments, features and aspects of the present disclosure together with the specification, and serve to explain the principles of the disclosure.
Fig. 1 shows a block diagram of an average pooling instruction processing device according to an embodiment of the disclosure.
Fig. 2a to Fig. 2f show block diagrams of average pooling instruction processing devices according to embodiments of the disclosure.
Fig. 3 shows a schematic diagram of an application scenario of an average pooling instruction processing device according to an embodiment of the disclosure.
Fig. 4a and Fig. 4b show block diagrams of a combined processing device according to an embodiment of the disclosure.
Fig. 5 shows a structural schematic diagram of a board card according to an embodiment of the disclosure.
Fig. 6 shows a flowchart of an average pooling instruction processing method according to an embodiment of the disclosure.
Specific embodiment
The technical solutions in the embodiments of the present disclosure will be described clearly and completely below with reference to the accompanying drawings in the embodiments of the present disclosure. Obviously, the described embodiments are only a part of, rather than all of, the embodiments of the present disclosure. Based on the embodiments in the present disclosure, all other embodiments obtained by those skilled in the art without creative work fall within the scope of protection of the present disclosure.
It should be understood that the terms "zeroth", "first", "second" and the like in the claims, the specification and the drawings of the present disclosure are used to distinguish different objects rather than to describe a particular order. The terms "include" and "comprise" used in the specification and claims of the present disclosure indicate the presence of the described features, wholes, steps, operations, elements and/or components, but do not exclude the presence or addition of one or more other features, wholes, steps, operations, elements, components and/or sets thereof.
It should also be understood that the terms used in the specification of the present disclosure are for the purpose of describing particular embodiments only and are not intended to limit the present disclosure. As used in the specification and claims of the present disclosure, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should be further understood that the term "and/or" used in the specification and claims of the present disclosure refers to any combination and all possible combinations of one or more of the associated listed items, and includes these combinations.
As used in the specification and claims, the term "if" may be interpreted, depending on the context, as "when" or "once" or "in response to determining" or "in response to detecting". Similarly, the phrase "if it is determined" or "if [the described condition or event] is detected" may be interpreted, depending on the context, as "once it is determined" or "in response to determining" or "once [the described condition or event] is detected" or "in response to detecting [the described condition or event]".
With the wide application of neural network algorithms and the continuous improvement of computer hardware computing capability, the types and volume of the data operations involved in practical applications keep increasing. The average pooling operation (average pooling) is an operation that obtains the average value of all data in a local region. Because of the wide variety of programming languages, there is at this stage no average pooling instruction that can be widely applied across programming languages to realize the average pooling operation in different language environments. In the related art, technical staff therefore need to customize a plurality of instructions corresponding to their programming language environment to realize the average pooling operation, which makes the average pooling operation inefficient and slow. The present disclosure provides an average pooling instruction processing method, device, computer equipment and storage medium that can realize the average pooling operation with only one instruction, which can significantly improve the efficiency and speed of performing the average pooling operation.
Fig. 1 shows a block diagram of an average pooling instruction processing device according to an embodiment of the disclosure. As shown in Fig. 1, the device includes a control module 11 and a computing module 12.
The control module 11 is configured to compile an obtained average pooling instruction to obtain a compiled average pooling instruction, parse the compiled average pooling instruction to obtain the operation code and operation domain of the average pooling instruction, and obtain, according to the operation code and the operation domain, the data to be operated on, the pooling kernel and the destination address required to execute the average pooling instruction. The operation code indicates that the operation performed on the data by the average pooling instruction is the average pooling operation, and the operation domain includes the address of the data to be operated on, the address of the pooling kernel and the destination address.
The computing module 12 is configured to perform the average pooling operation on the data to be operated on according to the pooling kernel to obtain an operation result, and to store the operation result at the destination address.
In this embodiment, the average pooling instruction obtained by the control module is an uncompiled software instruction that cannot be directly executed by the hardware, so the control module first needs to compile the (uncompiled) average pooling instruction. Only after the compiled average pooling instruction is obtained can it be parsed. The compiled average pooling instruction is a hardware instruction that can be directly executed by the hardware. The control module can obtain the data to be operated on and the pooling kernel from the address of the data to be operated on and the address of the pooling kernel respectively. The control module can obtain instructions and data through a data input/output unit, which may be one or more data I/O interfaces or I/O pins.
In this embodiment, the operation code may be the part of an instruction or a field (usually represented by a code) that specifies the operation to be performed by the instruction; it is the instruction sequence number that informs the device executing the instruction which instruction specifically needs to be executed. The operation domain may be the source of all data required for executing the corresponding instruction; all the data required for executing the corresponding instruction include parameters such as the data to be operated on and the pooling kernel, as well as the corresponding operation method, and so on. An average pooling instruction must include an operation code and an operation domain, where the operation domain includes at least the address of the data to be operated on, the address of the pooling kernel and the destination address.
It should be understood that those skilled in the art can configure the instruction format of the average pooling instruction and the operation code and operation domain it includes as needed; the present disclosure does not limit this.
In this embodiment, the device may include one or more control modules and one or more computing modules, and the numbers of control modules and computing modules can be configured according to actual needs; the present disclosure does not limit this. When the device includes one control module, the control module can receive the average pooling instruction and control one or more computing modules to perform the average pooling operation. When the device includes a plurality of control modules, the plurality of control modules can each receive average pooling instructions and control the corresponding one or more computing modules to perform the average pooling operation.
In the average pooling instruction processing device provided by the embodiments of the present disclosure, the device includes a control module and a computing module. The control module is configured to compile an obtained average pooling instruction to obtain a compiled average pooling instruction, parse the compiled average pooling instruction to obtain the operation code and operation domain of the average pooling instruction, and obtain, according to the operation code and the operation domain, the data to be operated on, the pooling kernel and the destination address required to execute the average pooling instruction. The computing module is configured to perform the average pooling operation on the data to be operated on according to the pooling kernel to obtain an operation result, and to store the operation result at the destination address. The average pooling instruction processing device provided by the embodiments of the present disclosure has a wide range of applications, processes the average pooling instruction with high efficiency and speed, and performs the average pooling operation with high efficiency and speed.
Fig. 2 a shows the block diagram of the average pond instruction processing unit according to one embodiment of the disclosure.In a kind of possible reality
In existing mode, as shown in Figure 2 a, computing module 12 may include multiple adders 120 and multiple dividers 120 '.Multiple additions
Device 120 is used to execute the add operation in average pond operation.Multiple dividers 120 ' are for executing in average pond operation
Division arithmetic.
In this implementation, computing module also may include an adder and a divider, or including one
Adder, multiple dividers, then including multiple adders, a divider.It can be according to the average pond of required progress
The size of the data volume of operation requires the quantity to adder and divider to processing speed, efficiency of average pond operation etc.
Be configured, the disclosure to this with no restriction.
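As an illustration of the arithmetic the adders and dividers carry out for a single pooling window, the following is a minimal software sketch; it is not the device's hardware implementation, and the function name and plain-Python arithmetic are assumptions made for illustration only.

```python
def average_pool_window(window):
    """Average one pooling window: repeated additions followed by one division.

    `window` is a flat list of the values covered by the pooling kernel,
    mirroring the work split between the adders (the summation) and a
    divider (the final division by the element count).
    """
    total = 0
    for value in window:            # work done by the adders
        total = total + value
    return total / len(window)      # work done by a divider


# Example: one 2x2 window
print(average_pool_window([1.0, 2.0, 3.0, 4.0]))  # 2.5
```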
Fig. 2 b shows the block diagram of the average pond instruction processing unit according to one embodiment of the disclosure.In a kind of possible reality
In existing mode, as shown in Figure 2 b, computing module 12 may include main operation submodule 121 and multiple from operation submodule 122.It is main
Operation submodule 121 may include multiple adders and multiple dividers.
Main operation submodule 121, for being carried out in average pond operation respectively using multiple adders and multiple dividers
Add operation and division arithmetic, obtain operation result, and operation result is stored in destination address.
In a possible implementation, the control module 11 is further configured to parse an obtained computation instruction to obtain the operation domain and operation code of the computation instruction, and to obtain the data to be operated on required for executing the computation instruction according to the operation domain and the operation code. The computing module 12 is further configured to perform operations on the data to be operated on according to the computation instruction to obtain the calculation result of the computation instruction. The computing module may include a plurality of arithmetic units for performing operations corresponding to the operation type of the computation instruction.
In this implementation, the computation instruction may be another instruction that performs arithmetic operations, logical operations or other operations on data such as scalars, vectors, matrices and tensors. Those skilled in the art can configure the computation instruction according to actual needs; the present disclosure does not limit this.
In this implementation, the arithmetic units may include adders, dividers, multipliers, comparators and other arithmetic units capable of performing arithmetic operations, logical operations or other operations on data. The types and numbers of arithmetic units can be configured according to the size of the data volume of the operation to be performed, the operation type, and the requirements on the processing speed and efficiency of the operation; the present disclosure does not limit this.
In a possible implementation, the control module 11 is further configured to parse the computation instruction to obtain a plurality of operation instructions, and to send the data to be operated on and the plurality of operation instructions to the master operation submodule 121.
The master operation submodule 121 is configured to perform preamble processing on the data to be operated on, and to transmit data and operation instructions with the plurality of slave operation submodules 122.
The slave operation submodules 122 are configured to perform intermediate operations in parallel according to the data and operation instructions transmitted from the master operation submodule 121 to obtain a plurality of intermediate results, and to transfer the plurality of intermediate results to the master operation submodule 121.
The master operation submodule 121 is further configured to perform subsequent processing on the plurality of intermediate results to obtain the calculation result of the computation instruction, and to store the calculation result at the corresponding address.
In this implementation, when the computation instruction performs operations on scalar or vector data, the device can control the master operation submodule to perform the operation corresponding to the computation instruction using the arithmetic units therein. When the computation instruction performs operations on data whose dimensionality is greater than or equal to 2, such as matrices and tensors, the device can control the slave operation submodules to perform the operation corresponding to the computation instruction using the arithmetic units therein.
It should be noted that those skilled in the art can configure the connection manner between the master operation submodule and the plurality of slave operation submodules according to actual needs, so as to set the architecture of the computing module; for example, the architecture of the computing module may be an "H"-shaped architecture, an array architecture, a tree architecture, and so on. The present disclosure does not limit this.
Fig. 2 c shows the block diagram of the average pond instruction processing unit according to one embodiment of the disclosure.In a kind of possible reality
In existing mode, as shown in Figure 2 c, computing module 12 can also include one or more branch operations submodules 123, branch fortune
Operator module 123 is for forwarding main operation submodule 121 and from the data and/or operational order between operation submodule 122.Its
In, main operation submodule 121 is connect with one or more branch operations submodules 123.In this way, the main operator in computing module
Module, branch operations submodule and between operation submodule use " H " type frame structure connect, forwarded by branch operations submodule
Data and/or operational order save the resource occupation to main operation submodule, and then improve the processing speed of instruction.
Fig. 2 d shows the block diagram of the average pond instruction processing unit according to one embodiment of the disclosure.In a kind of possible reality
In existing mode, as shown in Figure 2 d, it is multiple from operation submodule 122 be in array distribution.
It is each connect from operation submodule 122 with other adjacent from operation submodule 122, main operation submodule 121 connects
Multiple the k from operation submodule 122 are connect from operation submodule 122, k from operation submodule 122 are as follows: n of the 1st row from
Operation submodule 122, the n m arranged from operation submodule 122 and the 1st of m row are a from operation submodule 122.
Wherein, as shown in Figure 2 d, the k n for only including the 1st row from operation submodule are a from operation submodule, the n of m row
For a m arranged from operation submodule and the 1st from operation submodule, i.e. the k are multiple from operation submodule from operation submodule
Directly connect with main operation submodule in block from operation submodule.Wherein, k is a from operation submodule, in main operator
The forwarding of module and multiple data and instruction between operation submodule.In this way, it is multiple from operation submodule be in array
Distribution can be improved main operation submodule to from operation submodule and send data and/or operational order speed, and then improves instruction
Processing speed.
Fig. 2 e shows the block diagram of the average pond instruction processing unit according to one embodiment of the disclosure.In a kind of possible reality
In existing mode, as shown in Figure 2 e, computing module can also include tree-shaped submodule 124.The tree-shaped submodule 124 includes a root
Port 401 and multiple ports 402.Root port 401 is connect with main operation submodule 121, multiple ports 402 and multiple from fortune
Operator module 122 is separately connected.Wherein, tree-shaped submodule 124 has transmission-receiving function, for forwarding main 121 He of operation submodule
From the data and/or operational order between operation submodule 122.In this way, by the effect of tree-shaped submodule so that computing module
It is connected in tree-shaped framework, and using the forwarding capability of tree-shaped submodule, main operation submodule can be improved to from operation submodule
Data and/or operational order speed are sent, and then improves the processing speed of instruction.
In one possible implementation, tree-shaped submodule 124 can be the optional as a result, it may include of the device
At least one layer of node.Node is the cable architecture with forwarding capability, and node itself does not have calculation function.Undermost node with
From operation submodule connect, with forward main operation submodule 121 and between operation submodule 122 data and/or operation refer to
It enables.Distinguishingly, as tree-shaped submodule has zero layer node, which is then not necessarily to tree-shaped submodule.
In a possible implementation, the tree submodule 124 may include a plurality of nodes of an n-ary tree structure, and the plurality of nodes of the n-ary tree structure may have a plurality of layers.
For example, Fig. 2f shows a block diagram of an average pooling instruction processing device according to an embodiment of the disclosure. As shown in Fig. 2f, the n-ary tree structure may be a binary tree structure, and the tree submodule includes 2 layers of nodes 01. The nodes 01 of the lowest layer are connected to the slave operation submodules 122 to forward data and/or operation instructions between the master operation submodule 121 and the slave operation submodules 122.
In this implementation, the n-ary tree structure may also be a ternary tree structure or the like, where n is a positive integer greater than or equal to 2. Those skilled in the art can configure n in the n-ary tree structure and the number of layers of nodes in the n-ary tree structure as needed; the present disclosure does not limit this.
In a possible implementation, the operation domain may further include an input height and an input width.
The control module is further configured to obtain, from the address of the data to be operated on, the data to be operated on corresponding to the input width and the input height.
In this implementation, the input height and the input width can limit the volume and size of the data to be operated on that is obtained. The input height and input width included in the operation domain may be specific numerical values, or may be storage addresses where the input height and input width are stored. When the operation domain directly includes the specific values of the input height and input width, those specific values are determined as the corresponding input height and input width. When the operation domain includes the storage addresses of the input height and input width, the input height and input width can be obtained from those storage addresses respectively.
In a possible implementation, when the operation domain does not include the input height and/or the input width, the data to be operated on can be obtained according to a preset default input height and a preset default input width.
In the above manner, the volume and size of the data to be operated on can be limited, which ensures the accuracy of the operation result and ensures that the device can execute the average pooling instruction.
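A minimal sketch of how such an operation-domain field, which may be absent, hold a specific value, or hold a storage address, could be resolved; the field representation, helper name and dictionary memory are assumptions made for illustration, not the device's actual encoding.

```python
def resolve_field(field, memory, default):
    """Resolve an operation-domain field such as the input height or width.

    field   -- None (field absent), an int (a specific value), or the string
               "addr:<n>" (a storage address where the value is kept)
    memory  -- a dict standing in for the storage module
    default -- preset default used when the field is absent
    """
    if field is None:
        return default
    if isinstance(field, str) and field.startswith("addr:"):
        return memory[int(field.split(":")[1])]
    return field


memory = {300: 64}
print(resolve_field(None, memory, 32))        # 32 (default input width)
print(resolve_field(64, memory, 32))          # 64 (specific value)
print(resolve_field("addr:300", memory, 32))  # 64 (value read from address 300)
```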
In a possible implementation, the operation domain may further include a pooling kernel height and a pooling kernel width.
The control module 11 is further configured to obtain the pooling kernel from the address of the pooling kernel according to the pooling kernel height and the pooling kernel width.
In a possible implementation, the operation domain may further include a first stride. The computing module 12 may be further configured to move the pooling kernel in the x direction according to the first stride.
In a possible implementation, the operation domain may further include a second stride. The computing module 12 may be further configured to move the pooling kernel in the y direction according to the second stride.
In this implementation, the stride of the average pooling operation is the amplitude by which the pooling kernel is moved each time during the average pooling operation. The first stride may be the amplitude by which the pooling kernel is moved in the x direction, and the second stride may be the amplitude by which the pooling kernel is moved in the y direction.
It should be noted that the present disclosure describes the pooling kernel height and width, the first stride, the second stride and other parameters required for the average pooling operation only by taking a two-dimensional pooling kernel as an example. If the pooling kernel is multidimensional, the corresponding pooling kernel parameters then include the size and stride of each of its dimensions.
In a possible implementation, when the first stride and the second stride are not provided in the operation domain of the average pooling instruction, the computing module can use the height and width of the pooling kernel as the strides of the corresponding dimensions, which ensures that the average pooling operation is carried out normally. For example, the computing module 12 may be further configured to move the pooling kernel over the data to be operated on without overlap and to average the data to be operated on in the region corresponding to the pooling kernel, so as to obtain the operation result.
In a possible implementation, when the operation domain does not include the pooling kernel height and the pooling kernel width, a preset default pooling kernel height and a preset default pooling kernel width can be obtained, so that the control module and the computing module can execute the average pooling instruction.
In a possible implementation, the operation domain may further include a pooling kernel number. The computing module 12 is further configured to perform the average pooling operation on the data to be operated on with a number of pooling kernels equal to the pooling kernel number.
In this implementation, the pooling kernel number corresponds to the data to be operated on. For example, when the pooling kernel number is 5, it can be determined that the data to be operated on is divided into five parts, and 5 pooling kernels are needed to perform the average pooling operation on the five parts of the data to be operated on respectively.
In this implementation, when the operation domain does not include the pooling kernel number, it can be determined that only one pooling kernel is needed for the data to be operated on to realize the average pooling operation.
In a possible implementation, the computing module 12 is further configured to, when the size of the data to be operated on is not an integral multiple of the size of the pooling kernel, perform the average pooling operation on the data in the data to be operated on whose size is an integral multiple of the size of the pooling kernel. Here, the size of the data to be operated on being a non-integral multiple of the size of the pooling kernel may include at least one of the following: the input width of the data to be operated on is a non-integral multiple of the width of the pooling kernel, and the input height of the data to be operated on is a non-integral multiple of the height of the pooling kernel.
In this implementation, the residual data in the data to be operated on beyond an integral multiple of the size of the pooling kernel may not undergo the average pooling operation, as sketched below.
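Putting the kernel size, the strides and the non-integral-multiple rule together, the following is a minimal software sketch of the average pooling the computing module performs: strides default to the kernel size when absent, and residual rows and columns that do not fill a whole kernel are skipped. The plain-Python data layout and function name are assumptions made for illustration, not the device's implementation.

```python
def average_pooling(data, kernel_h, kernel_w, sx=None, sy=None):
    """2-D average pooling over `data` (a list of rows).

    When sx/sy (the first and second strides) are not given, the kernel
    width and height are used, i.e. the kernel moves without overlap.
    Positions where the kernel would run past the input are skipped, so
    residual data beyond an integral multiple of the kernel size is ignored.
    """
    if sx is None:
        sx = kernel_w   # first stride (x direction) defaults to the kernel width
    if sy is None:
        sy = kernel_h   # second stride (y direction) defaults to the kernel height
    in_h, in_w = len(data), len(data[0])
    result = []
    for top in range(0, in_h - kernel_h + 1, sy):
        row = []
        for left in range(0, in_w - kernel_w + 1, sx):
            window = [data[top + i][left + j]
                      for i in range(kernel_h) for j in range(kernel_w)]
            row.append(sum(window) / (kernel_h * kernel_w))
        result.append(row)
    return result


# 4x5 input, 2x2 kernel, default (non-overlapping) strides:
data = [[1, 2, 3, 4, 5],
        [5, 6, 7, 8, 9],
        [9, 8, 7, 6, 5],
        [5, 4, 3, 2, 1]]
print(average_pooling(data, 2, 2))  # [[3.5, 5.5], [6.5, 4.5]]; the 5th column is ignored
```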
In a possible implementation, as shown in Fig. 2a to Fig. 2f, the device may further include a storage module 13. The storage module 13 is used to store the data to be operated on and the pooling kernel.
In this implementation, the storage module may include one or more of a cache and a register. The cache may include a high-speed temporary cache, and may also include at least one NRAM (Neuron Random Access Memory). The cache can be used to store the data to be operated on and the pooling kernel, and the register can be used to store the scalar data in the data to be operated on.
In a possible implementation, the cache may include a neuron cache. The neuron cache, namely the above-mentioned neuron random access memory, can be used to store the neuron data in the data to be operated on, and the neuron data may include neuron vector data.
In a possible implementation, the device may further include a direct memory access module for reading data from or storing data to the storage module.
In a possible implementation, as shown in Fig. 2a to Fig. 2f, the control module 11 may include an instruction storage submodule 111, an instruction processing submodule 112 and a queue storage submodule 113.
The instruction storage submodule 111 is used to store the compiled average pooling instruction.
The instruction processing submodule 112 is used to parse the compiled average pooling instruction to obtain the operation code and operation domain of the average pooling instruction.
The queue storage submodule 113 is used to store an instruction queue. The instruction queue includes a plurality of instructions to be executed arranged in sequence according to their execution order, and the plurality of instructions to be executed may include the compiled average pooling instruction.
In this implementation, the execution order of the plurality of instructions to be executed can be arranged according to their receiving time, priority level and the like to obtain the instruction queue, so that the plurality of instructions to be executed are executed in sequence according to the instruction queue; a minimal sketch of such ordering follows.
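The sketch below orders pending instructions by priority level and then by receiving time; the tuple layout and the convention that a lower priority value is more urgent are assumptions made for illustration only.

```python
# Each pending instruction: (name, receiving_time, priority_level).
pending = [
    ("avgpool", 3, 1),
    ("load",    1, 0),
    ("store",   2, 1),
]

# Arrange the execution order by priority level first, then by receiving time.
instruction_queue = sorted(pending, key=lambda ins: (ins[2], ins[1]))
print([name for name, _, _ in instruction_queue])  # ['load', 'store', 'avgpool']
```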
In a possible implementation, as shown in Fig. 2a to Fig. 2f, the control module 11 may further include a dependency handling submodule 114.
The dependency handling submodule 114 is configured to, when it is determined that there is an association between a first instruction to be executed among the plurality of instructions to be executed and a zeroth instruction to be executed preceding the first instruction to be executed, cache the first instruction to be executed in the instruction storage submodule 111, and after the zeroth instruction to be executed has finished executing, extract the first instruction to be executed from the instruction storage submodule 111 and send it to the computing module 12.
The association between the first instruction to be executed and the zeroth instruction to be executed preceding it includes: a first storage address range storing the data required by the first instruction to be executed overlaps with a zeroth storage address range storing the data required by the zeroth instruction to be executed. Conversely, the absence of an association between the first instruction to be executed and the zeroth instruction to be executed preceding it may mean that the first storage address range and the zeroth storage address range have no overlapping region.
In this way, according to the dependency between the first instruction to be executed and the zeroth instruction to be executed preceding it, the later first instruction to be executed is executed only after the zeroth instruction to be executed has finished executing, which guarantees the accuracy of the operation result.
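A minimal sketch of the association check the dependency handling submodule performs: two instructions are associated exactly when their required storage address ranges overlap. The half-open range representation and the concrete addresses are assumptions made for illustration.

```python
def ranges_overlap(first_range, zeroth_range):
    """Return True when two storage address ranges [start, end) overlap."""
    first_start, first_end = first_range
    zeroth_start, zeroth_end = zeroth_range
    return first_start < zeroth_end and zeroth_start < first_end


# The zeroth instruction uses addresses [500, 564); the first instruction
# uses [520, 540), so it must wait until the zeroth instruction finishes.
print(ranges_overlap((520, 540), (500, 564)))  # True  -> associated, buffer it
print(ranges_overlap((600, 620), (500, 564)))  # False -> no association
```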
In a possible implementation, the control module 11 may be further configured to generate an assembly file according to the average pooling instruction and to translate the assembly file into a binary file, where the binary file is the compiled average pooling instruction.
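A minimal sketch of translating a parsed average pooling instruction into a binary encoding; the field widths, the opcode value and the packing order are assumptions made for illustration only, not the actual encoding of the compiled instruction.

```python
import struct

AVGPOOL_OPCODE = 0x21  # assumed opcode value, for illustration only

def encode_avgpool(dst, src0, src1, srcChannel, srcHeigh, srcWidth,
                   kernelHeight, kernelWidth, sx, sy):
    """Pack the operation code and operation domain into a little-endian
    binary sequence: one opcode byte followed by ten 32-bit operand fields."""
    return struct.pack("<B10I", AVGPOOL_OPCODE, dst, src0, src1, srcChannel,
                       srcHeigh, srcWidth, kernelHeight, kernelWidth, sx, sy)


binary = encode_avgpool(500, 100, 200, 5, 64, 32, 2, 2, 2, 1)
print(len(binary))  # 41 bytes: 1 opcode byte + 10 x 4-byte operand fields
```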
In a possible implementation, the instruction format of the average pooling instruction may be:
avgpool dst src0 src1 srcChannel srcHeigh srcWidth kernelHeight kernelWidth sx sy
where avgpool is the operation code of the average pooling instruction, and dst, src0, src1, srcChannel, srcHeigh, srcWidth, kernelHeight, kernelWidth, sx and sy form the operation domain of the average pooling instruction. Here, dst is the destination address, src0 is the address of the data to be operated on, src1 is the address of the pooling kernel, srcChannel is the pooling kernel number, srcHeigh is the input height, srcWidth is the input width, kernelHeight is the pooling kernel height, kernelWidth is the pooling kernel width, sx is the first stride by which the pooling kernel moves in the x direction, and sy is the second stride by which the pooling kernel moves in the y direction.
It should be understood that those skilled in the art can configure the operation code of the average pooling instruction and the positions of the operation code and the operation domain in the instruction format as needed; the present disclosure does not limit this.
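A minimal sketch of parsing an average pooling instruction written in this format into its operation code and operation domain; the textual form of the instruction and the dictionary layout are assumptions made for illustration.

```python
FIELD_NAMES = ["dst", "src0", "src1", "srcChannel", "srcHeigh", "srcWidth",
               "kernelHeight", "kernelWidth", "sx", "sy"]

def parse_avgpool(text):
    """Split an instruction such as 'avgpool 500 100 200 5 64 32 2 2 2 1'
    into its operation code and a named operation domain."""
    parts = text.split()
    opcode, operands = parts[0], [int(p) for p in parts[1:]]
    assert opcode == "avgpool" and len(operands) == len(FIELD_NAMES)
    return opcode, dict(zip(FIELD_NAMES, operands))


opcode, domain = parse_avgpool("avgpool 500 100 200 5 64 32 2 2 2 1")
print(opcode, domain["dst"], domain["kernelHeight"], domain["sx"])  # avgpool 500 2 2
```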
In a possible implementation, the device may be provided in one or more of a graphics processing unit (GPU), a central processing unit (CPU) and an embedded neural-network processing unit (NPU).
It should be noted that, although the average pooling instruction processing device is described above by taking the above embodiments as examples, those skilled in the art can understand that the present disclosure should not be limited thereto. In fact, users can flexibly set each module according to personal preference and/or practical application scenarios, as long as the technical solutions of the present disclosure are followed.
Application example
An application example according to the embodiments of the present disclosure is given below, taking "performing the average pooling operation using the average pooling instruction processing device" as an exemplary application scenario, in order to facilitate understanding of the processing flow of the average pooling instruction processing device. Those skilled in the art should understand that the following application example is only for the purpose of facilitating understanding of the embodiments of the present disclosure and should not be regarded as a limitation of the embodiments of the present disclosure.
Fig. 3 shows a schematic diagram of an application scenario of an average pooling instruction processing device according to an embodiment of the disclosure. As shown in Fig. 3, the process by which the average pooling instruction processing device handles an average pooling instruction is as follows:
The control module 11 compiles the obtained average pooling instruction 1 to obtain the compiled average pooling instruction 1 (for example, average pooling instruction 1 is avgpool 500 100 200 5 64 32 2 2 2 1), and parses the compiled average pooling instruction to obtain the operation code and operation domain of average pooling instruction 1. The operation code of average pooling instruction 1 is avgpool, the destination address is 500, the address of the data to be operated on is 100, the address of the pooling kernel is 200, the pooling kernel number is 5, the input height is 64, the input width is 32, the pooling kernel height is 2, the pooling kernel width is 2, the first stride is 2 and the second stride is 1. The control module 11 obtains the 64x32 data to be operated on from address 100 of the data to be operated on, and obtains the 2x2 pooling kernel from pooling kernel address 200.
The computing module 12 performs the average pooling operation on the data to be operated on using the 5 pooling kernels to obtain the operation result, and stores the operation result at the destination address 500.
The working process of each module above can refer to the related description above.
In this way, the average pooling instruction can be processed efficiently and quickly, and the efficiency and speed of performing the average pooling operation are also significantly improved.
The present disclosure provides a machine learning operation device, which may include one or more of the above average pooling instruction processing devices, configured to obtain data to be operated on and control information from other processing devices and to execute specified machine learning operations. The machine learning operation device can obtain average pooling instructions from other machine learning operation devices or non-machine-learning operation devices, and transfer execution results to peripheral devices (which may also be called other processing devices) through an I/O interface. Peripheral devices include, for example, a camera, a display, a mouse, a keyboard, a network card, a Wi-Fi interface and a server. When more than one average pooling instruction processing device is included, the average pooling instruction processing devices can be linked and transmit data through a specific structure, for example, interconnected and transmitting data through a PCIE bus, to support larger-scale neural network operations. In this case, they may share the same control system or have their own control systems; they may share a memory, or each accelerator may have its own memory. In addition, their interconnection manner can be any interconnection topology.
The machine learning operation device has high compatibility and can be connected to various types of servers through a PCIE interface.
Fig. 4 a shows the block diagram of the combined treatment device according to one embodiment of the disclosure.As shown in fig. 4 a, the combined treatment
Device includes above-mentioned machine learning arithmetic unit, general interconnecting interface and other processing units.Machine learning arithmetic unit and its
He interacts processing unit, the common operation completing user and specifying.
Other processing units, including central processor CPU, graphics processor GPU, neural network processor etc. are general/special
With one of processor or above processor type.Processor quantity included by other processing units is with no restrictions.Its
His interface of the processing unit as machine learning arithmetic unit and external data and control, including data are carried, and are completed to the machine
Device learns the basic control such as unlatching, stopping of arithmetic unit;Other processing units can also cooperate with machine learning arithmetic unit
It is common to complete processor active task.
General interconnecting interface refers to for transmitting data and control between machine learning arithmetic unit and other processing units
It enables.The machine learning arithmetic unit obtains required input data from other processing units, and machine learning arithmetic unit is written
The storage device of on piece;Control instruction can be obtained from other processing units, and the control of machine learning arithmetic unit on piece is written
System caching;It can also learn the data in the memory module of arithmetic unit with read machine and be transferred to other processing units.
Fig. 4 b shows the block diagram of the combined treatment device according to one embodiment of the disclosure.In a kind of possible implementation
In, as shown in Figure 4 b, the combined treatment device can also include storage device, storage device respectively with machine learning arithmetic unit
It is connected with other described processing units.Storage device is used to be stored in machine learning arithmetic unit and other processing units
Data, the data of operation required for being particularly suitable for are in the storage inside that machine learns arithmetic unit or other processing units
The data that can not all save.
The combined treatment device can be used as the SOC on piece of the equipment such as mobile phone, robot, unmanned plane, video monitoring equipment
The die area of control section is effectively reduced in system, improves processing speed, reduces overall power.When this situation, the combined treatment
The general interconnecting interface of device is connected with certain components of equipment.Certain components for example camera, display, mouse, keyboard,
Network interface card, wifi interface.
The present disclosure provides a machine learning chip, which includes the above machine learning operation device or combined processing device.
The present disclosure provides a machine learning chip package structure, which includes the above machine learning chip.
The present disclosure provides a board card. Fig. 5 shows a structural schematic diagram of a board card according to an embodiment of the disclosure. As shown in Fig. 5, the board card includes the above machine learning chip package structure or the above machine learning chip. In addition to the machine learning chip 389, the board card may also include other supporting components, including but not limited to a memory device 390, an interface device 391 and a control device 392.
The memory device 390 is connected to the machine learning chip 389 (or the machine learning chip in the machine learning chip package structure) through a bus and is used to store data. The memory device 390 may include a plurality of groups of memory cells 393, and each group of memory cells 393 is connected to the machine learning chip 389 through a bus. It can be understood that each group of memory cells 393 may be DDR SDRAM (Double Data Rate Synchronous Dynamic Random Access Memory).
DDR can double the speed of SDRAM without raising the clock frequency. DDR allows data to be read on both the rising edge and the falling edge of the clock pulse. The speed of DDR is twice that of standard SDRAM.
In one embodiment, the memory device 390 may include 4 groups of memory cells 393, and each group of memory cells 393 may include a plurality of DDR4 particles (chips). In one embodiment, the machine learning chip 389 may include four 72-bit DDR4 controllers inside; in each of the 72-bit DDR4 controllers, 64 bits are used for data transmission and 8 bits are used for ECC checking. It can be understood that when DDR4-3200 particles are used in each group of memory cells 393, the theoretical bandwidth of data transmission can reach 25600 MB/s.
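As a rough check of the 25600 MB/s figure, assuming DDR4-3200 means 3200 mega-transfers per second over the 64-bit data path of one controller:

```python
transfers_per_second = 3200 * 10**6   # DDR4-3200: 3200 mega-transfers per second
bytes_per_transfer = 64 // 8          # 64 data bits per transfer (the other 8 bits of the 72-bit controller carry ECC)
print(transfers_per_second * bytes_per_transfer / 10**6, "MB/s")  # 25600.0 MB/s
```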
In one embodiment, each group of memory cells 393 includes a plurality of double data rate synchronous dynamic random access memories arranged in parallel. DDR can transmit data twice within one clock cycle. A controller for controlling DDR is provided in the machine learning chip 389 to control the data transmission and data storage of each memory cell 393.
The interface device 391 is electrically connected to the machine learning chip 389 (or the machine learning chip in the machine learning chip package structure). The interface device 391 is used to realize data transmission between the machine learning chip 389 and external devices (such as a server or a computer). For example, in one embodiment, the interface device 391 may be a standard PCIE interface; for example, the data to be processed is transferred from the server to the machine learning chip 389 through the standard PCIE interface to realize data transfer. Preferably, when a PCIE 3.0 x16 interface is used for transmission, the theoretical bandwidth can reach 16000 MB/s. In another embodiment, the interface device 391 may also be another interface; the present disclosure does not limit the specific form of the other interfaces, as long as the interface device can realize the transfer function. In addition, the calculation results of the machine learning chip are still transmitted back to external devices (such as a server) by the interface device.
The control device 392 is electrically connected to the machine learning chip 389 and is used to monitor the state of the machine learning chip 389. Specifically, the machine learning chip 389 can be electrically connected to the control device 392 through an SPI interface. The control device 392 may include a microcontroller unit (MCU). The machine learning chip 389 may include a plurality of processing chips, a plurality of processing cores or a plurality of processing circuits, and can drive a plurality of loads. Therefore, the machine learning chip 389 can be in different working states such as multi-load and light-load. The control device can regulate the working states of the plurality of processing chips, the plurality of processing cores and/or the plurality of processing circuits in the machine learning chip.
The present disclosure provides an electronic device, which includes the above machine learning chip or board card.
The electronic device may include a data processing device, a computer device, a robot, a computer, a printer, a scanner, a tablet computer, an intelligent terminal, a mobile phone, a driving recorder, a navigator, a sensor, a camera, a server, a cloud server, a webcam, a video camera, a projector, a watch, earphones, mobile storage, a wearable device, a vehicle, a household appliance, and/or a medical device.
The vehicle may include an airplane, a ship and/or a car. The household appliance may include a television, an air conditioner, a microwave oven, a refrigerator, a rice cooker, a humidifier, a washing machine, an electric lamp, a gas stove and a range hood. The medical device may include a nuclear magnetic resonance instrument, a B-mode ultrasound instrument and/or an electrocardiograph.
Fig. 6 shows a flowchart of an average pooling instruction processing method according to an embodiment of the disclosure. The method can be applied to a computer device that includes a memory and a processor, where the memory is used to store the data used in executing the method, and the processor is used to perform the related processing and operation steps, for example the following step S51 and step S52. As shown in Fig. 6, the method is applied to the above average pooling instruction processing device and includes step S51 and step S52.
In step S51, the obtained average pooling instruction is compiled using the control module to obtain a compiled average pooling instruction; the compiled average pooling instruction is parsed to obtain the operation code and operation domain of the average pooling instruction; and the data to be operated on, the pooling kernel and the destination address required to execute the average pooling instruction are obtained according to the operation code and the operation domain. The operation code indicates that the operation performed on the data by the average pooling instruction is the average pooling operation, and the operation domain includes the address of the data to be operated on, the address of the pooling kernel and the destination address.
In step S52, the average pooling operation is performed on the data to be operated on according to the pooling kernel using the computing module to obtain an operation result, and the operation result is stored at the destination address.
In a possible implementation, performing the average pooling operation on the data to be operated on according to the pooling kernel to obtain the operation result may include: performing the addition operations in the average pooling operation using the plurality of adders in the computing module, and performing the division operations in the average pooling operation using the plurality of dividers in the computing module.
In a possible implementation, the computing module includes a master operation submodule and a plurality of slave operation submodules, and the master operation submodule includes a plurality of adders and a plurality of dividers. Step S52 may include: performing the addition operations and the division operations in the average pooling operation using the plurality of adders and the plurality of dividers in the master operation submodule respectively to obtain the operation result, and storing the operation result at the destination address.
In a possible implementation, the operation domain may further include an input height and an input width. Obtaining, according to the operation code and the operation domain, the data to be operated on, the pooling kernel and the destination address required to execute the average pooling instruction may include: obtaining, from the address of the data to be operated on, the data to be operated on corresponding to the input width and the input height.
In a possible implementation, the operation domain may further include a pooling kernel height and a pooling kernel width. Obtaining, according to the operation code and the operation domain, the data to be operated on, the pooling kernel and the destination address required to execute the average pooling instruction may include: obtaining the pooling kernel from the address of the pooling kernel according to the pooling kernel height and the pooling kernel width.
In one possible implementation, the operation domain may also include a first stride. Performing the average pooling operation on the data to be operated on according to the pooling kernel may include: moving the pooling kernel in the x direction according to the first stride.
In one possible implementation, the operation domain may also include a second stride. Performing the average pooling operation on the data to be operated on according to the pooling kernel may include: moving the pooling kernel in the y direction according to the second stride.
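The first and second strides control how far the kernel window advances along x and y. The sketch below assumes the data to be operated on is a 2-D list of shape (input height, input width); the function and variable names are illustrative only.

```python
def avg_pool_2d(data, kernel_h, kernel_w, stride_x, stride_y):
    """Slide a kernel_h x kernel_w window over `data`, advancing by stride_x
    along x (width) and stride_y along y (height), averaging each window."""
    in_h, in_w = len(data), len(data[0])
    out = []
    for top in range(0, in_h - kernel_h + 1, stride_y):
        row = []
        for left in range(0, in_w - kernel_w + 1, stride_x):
            window = [data[top + i][left + j]
                      for i in range(kernel_h) for j in range(kernel_w)]
            row.append(sum(window) / (kernel_h * kernel_w))
        out.append(row)
    return out
```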
In one possible implementation, performing the average pooling operation on the data to be operated on according to the pooling kernel and obtaining the operation result may include: moving the pooling kernel over the data to be operated on without overlap, and averaging the multiple data elements to be operated on within the region covered by the pooling kernel to obtain the operation result.
In one possible implementation, performing the average pooling operation on the data to be operated on according to the pooling kernel and obtaining the operation result may include: when the size of the data to be operated on is not an integer multiple of the size of the pooling kernel, performing the average pooling operation on the portion of the data to be operated on whose size is an integer multiple of the pooling kernel size. The size of the data to be operated on not being an integer multiple of the pooling kernel size includes at least one of the following: the input width of the data to be operated on is not an integer multiple of the pooling kernel width; the input height of the data to be operated on is not an integer multiple of the pooling kernel height.
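A minimal sketch of this behaviour, assuming non-overlapping windows (stride equal to the kernel size): the input is cropped to the largest whole multiple of the kernel dimensions and the trailing rows and columns are ignored. The function name and the cropping strategy are illustrative assumptions, not the only way to realise the step.

```python
def avg_pool_truncating(data, kernel_h, kernel_w):
    """Non-overlapping average pooling that ignores trailing rows/columns
    which do not fill a complete kernel window."""
    usable_h = (len(data) // kernel_h) * kernel_h      # largest multiple of kernel_h
    usable_w = (len(data[0]) // kernel_w) * kernel_w   # largest multiple of kernel_w
    out = []
    for top in range(0, usable_h, kernel_h):
        row = []
        for left in range(0, usable_w, kernel_w):
            window = [data[top + i][left + j]
                      for i in range(kernel_h) for j in range(kernel_w)]
            row.append(sum(window) / (kernel_h * kernel_w))
        out.append(row)
    return out
```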
In one possible implementation, the operation domain may also include a pooling kernel count. Performing the average pooling operation on the data to be operated on according to the pooling kernel and obtaining the operation result may include: performing the average pooling operation on the data to be operated on with a number of pooling kernels equal to the pooling kernel count.
In one possible implementation, the method may also include: storing the data to be operated on and the pooling kernel with the storage module of the device. The storage module may include at least one of a register and a cache; the cache is used to store the data to be operated on and the pooling kernel, and may include at least one neuron cache NRAM; the register is used to store scalar data within the data to be operated on; and the neuron cache is used to store neuron data within the data to be operated on, the neuron data possibly including neuron vector data.
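A rough software model of this storage module, assuming one scalar register file and one neuron cache (NRAM) holding vector data; the class and attribute names are illustrative, not the actual hardware organisation.

```python
class StorageModule:
    """Toy model of the storage module: scalar registers plus a neuron cache (NRAM)."""
    def __init__(self):
        self.registers = {}   # scalar operands, keyed by register name
        self.nram = {}        # neuron (vector) data and pooling kernels, keyed by address

    def store_scalar(self, reg, value):
        self.registers[reg] = value

    def store_vector(self, addr, values):
        self.nram[addr] = list(values)

    def load_vector(self, addr):
        return self.nram[addr]
```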
In one possible implementation, parsing the obtained average pooling instruction to obtain the opcode and operation domain of the average pooling instruction may include:
storing the compiled average pooling instruction;
parsing the compiled average pooling instruction to obtain the opcode and operation domain of the average pooling instruction; and
storing an instruction queue, the instruction queue including multiple instructions to be executed arranged in execution order, the multiple instructions to be executed possibly including the compiled average pooling instruction.
In one possible implementation, the method may also include: when it is determined that a first instruction to be executed among the multiple instructions to be executed has a dependency on a zeroth instruction to be executed that precedes it, caching the first instruction to be executed, and executing the first instruction to be executed after the zeroth instruction to be executed has finished. The first instruction to be executed having a dependency on the preceding zeroth instruction to be executed includes: the first storage address range that stores the data required by the first instruction to be executed overlaps with the zeroth storage address range that stores the data required by the zeroth instruction to be executed.
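This dependency rule can be read as a simple interval-overlap test between the storage address ranges used by two instructions. A sketch with hypothetical half-open (start, end) address tuples:

```python
def has_dependency(first_range, zeroth_range):
    """Return True when the storage address ranges of the first and the zeroth
    instruction to be executed overlap, so the first must wait for the zeroth."""
    first_start, first_end = first_range
    zeroth_start, zeroth_end = zeroth_range
    return first_start < zeroth_end and zeroth_start < first_end

# Example: ranges [100, 200) and [150, 250) overlap, so the later
# instruction is buffered until the earlier one finishes.
print(has_dependency((100, 200), (150, 250)))  # -> True
```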
In one possible implementation, compiling the obtained average pooling instruction to obtain the compiled average pooling instruction may include: generating an assembly file according to the average pooling instruction, and translating the assembly file into a binary file, the binary file being the compiled average pooling instruction.
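A minimal sketch of this compile step, assuming a made-up one-line assembly syntax and a fixed little-endian binary layout; the mnemonic, operand order, opcode value and packing are illustrative assumptions only.

```python
import struct

def compile_avg_pool(data_addr, kernel_addr, dest_addr):
    """Emit a hypothetical assembly line for the average-pooling instruction,
    then translate it into a packed binary form."""
    asm = f"AVGPOOL {data_addr}, {kernel_addr}, {dest_addr}"
    # Hypothetical encoding: 1-byte opcode followed by three 32-bit addresses.
    binary = struct.pack("<BIII", 0x2A, data_addr, kernel_addr, dest_addr)
    return asm, binary

asm_line, machine_code = compile_avg_pool(0x1000, 0x2000, 0x3000)
```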
It should be noted that, although the average pooling instruction processing method has been described by way of the above embodiments, those skilled in the art will understand that the disclosure is not limited thereto. In fact, each step can be set flexibly according to personal preference and/or the actual application scenario, as long as it conforms to the technical solution of the disclosure.
The average pooling instruction processing method provided by the embodiments of the disclosure has a wide application range: it processes the average pooling instruction with high efficiency and speed, and performs the average pooling operation with high efficiency and speed.
The disclosure also provides a non-volatile computer-readable storage medium on which computer program instructions are stored, wherein the computer program instructions, when executed by a processor, implement the above average pooling instruction processing method.
It should be noted that, for simplicity of description, each of the above method embodiments is expressed as a series of combined actions, but those skilled in the art should understand that the present disclosure is not limited by the described order of actions, because according to the present disclosure some steps may be performed in other orders or simultaneously. In addition, those skilled in the art should also understand that the embodiments described in this specification are optional embodiments, and the actions and modules involved are not necessarily required by the present disclosure.
It should be further noted that, although the steps in the flowchart of Fig. 6 are shown in the sequence indicated by the arrows, these steps are not necessarily executed in that order. Unless explicitly stated herein, there is no strict ordering constraint on the execution of these steps, and they may be executed in other orders. Moreover, at least some of the steps in Fig. 6 may include multiple sub-steps or stages that are not necessarily completed at the same moment but may be executed at different times; their execution order is not necessarily sequential, and they may be executed in turn or alternately with at least part of the sub-steps or stages of other steps.
It should be understood that the above device embodiments are only illustrative, and the device of the present disclosure may also be implemented in other ways. For example, the division of units/modules described in the above embodiments is only a division by logical function, and other divisions are possible in actual implementation. For example, multiple units, modules or components may be combined or integrated into another system, or some features may be ignored or not executed.
In addition, unless otherwise noted, the functional units/modules in the embodiments of the present disclosure may be integrated into one unit/module, may exist physically as separate units/modules, or two or more units/modules may be integrated together. The integrated units/modules may be implemented either in hardware or in the form of software program modules.
If the integrated units/modules are implemented in hardware, the hardware may be digital circuits, analog circuits, and so on. Physical implementations of the hardware structure include but are not limited to transistors, memristors, and the like. Unless otherwise noted, the above storage module may be any suitable magnetic or magneto-optical storage medium, for example resistive random access memory RRAM (Resistive Random Access Memory), dynamic random access memory DRAM (Dynamic Random Access Memory), static random access memory SRAM (Static Random-Access Memory), enhanced dynamic random access memory EDRAM (Enhanced Dynamic Random Access Memory), high bandwidth memory HBM (High-Bandwidth Memory), hybrid memory cube HMC (Hybrid Memory Cube), and so on.
If the integrated units/modules are implemented in the form of software program modules and sold or used as an independent product, they may be stored in a computer-readable memory. Based on this understanding, the technical solution of the present disclosure, in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a memory and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the methods described in the embodiments of the present disclosure. The aforementioned memory includes various media capable of storing program code, such as a USB flash drive, read-only memory (ROM, Read-Only Memory), random access memory (RAM, Random Access Memory), removable hard disk, magnetic disk, or optical disc.
In the above embodiments, the description of each embodiment has its own emphasis; for parts not described in detail in one embodiment, reference may be made to the related descriptions of other embodiments. The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, as long as such combinations are not contradictory, they should be regarded as falling within the scope of this specification.
The foregoing may be better understood according to the following clauses:
Clause A1. An average pooling instruction processing device, the device comprising:
a control module, configured to compile an obtained average pooling instruction to obtain a compiled average pooling instruction, parse the compiled average pooling instruction to obtain an opcode and an operation domain of the average pooling instruction, and obtain, according to the opcode and the operation domain, the data to be operated on, the pooling kernel and the destination address required to execute the average pooling instruction; and
a computing module, configured to perform an average pooling operation on the data to be operated on according to the pooling kernel to obtain an operation result, and store the operation result at the destination address,
wherein the opcode indicates that the operation performed on the data by the average pooling instruction is an average pooling operation, and the operation domain includes the address of the data to be operated on, the pooling kernel address and the destination address.
Clause A2. The device according to clause A1, the computing module comprising:
multiple adders, configured to execute the additions in the average pooling operation; and
multiple dividers, configured to execute the divisions in the average pooling operation.
Clause A3. The device according to clause A2, wherein the computing module includes a main operation submodule and multiple slave operation submodules, the main operation submodule including the multiple adders and the multiple dividers,
the main operation submodule being configured to use the multiple adders and the multiple dividers to perform, respectively, the additions and divisions in the average pooling operation to obtain the operation result, and to store the operation result at the destination address.
Clause A4. The device according to clause A1, wherein the operation domain further includes an input height and an input width,
and the control module is further configured to obtain, from the address of the data to be operated on, the data to be operated on that corresponds to the input width and the input height.
Clause A5. The device according to clause A1, wherein the operation domain further includes a pooling kernel height and a pooling kernel width,
and the control module is further configured to obtain the pooling kernel from the pooling kernel address according to the pooling kernel height and the pooling kernel width.
Clause A6. The device according to clause A1, wherein the operation domain further includes a first stride,
and the computing module is further configured to move the pooling kernel in the x direction according to the first stride.
Clause A7. The device according to clause A1, wherein the operation domain further includes a second stride,
and the computing module is further configured to move the pooling kernel in the y direction according to the second stride.
Clause A8. The device according to clause A1,
wherein the computing module is further configured to move the pooling kernel over the data to be operated on without overlap, and to average the multiple data elements to be operated on within the region covered by the pooling kernel to obtain the operation result.
Clause A9. The device according to clause A1,
wherein the computing module is further configured to, when the size of the data to be operated on is not an integer multiple of the size of the pooling kernel, perform the average pooling operation on the portion of the data to be operated on whose size is an integer multiple of the pooling kernel size,
wherein the size of the data to be operated on not being an integer multiple of the pooling kernel size includes at least one of the following: the input width of the data to be operated on is not an integer multiple of the pooling kernel width; the input height of the data to be operated on is not an integer multiple of the pooling kernel height.
Clause A10. The device according to clause A1, wherein the operation domain further includes a pooling kernel count,
and the computing module is further configured to perform the average pooling operation on the data to be operated on with a number of pooling kernels equal to the pooling kernel count.
Clause A11. The device according to clause A1, the device further comprising:
a storage module, configured to store the data to be operated on and the pooling kernel,
wherein the storage module includes at least one of a register and a cache,
the cache is configured to store the data to be operated on and the pooling kernel, and includes at least one neuron cache NRAM;
the register is configured to store scalar data within the data to be operated on; and
the neuron cache is configured to store neuron data within the data to be operated on, the neuron data including neuron vector data.
Clause A12. The device according to clause A1, the control module comprising:
an instruction storage submodule, configured to store the compiled average pooling instruction;
an instruction processing submodule, configured to parse the compiled average pooling instruction to obtain the opcode and the operation domain of the average pooling instruction; and
a queue storage submodule, configured to store an instruction queue, the instruction queue including multiple instructions to be executed arranged in execution order, the multiple instructions to be executed including the compiled average pooling instruction.
Clause A13. The device according to clause A12, the control module further comprising:
a dependency processing submodule, configured to, when it is determined that a first instruction to be executed among the multiple instructions to be executed has a dependency on a zeroth instruction to be executed that precedes it, buffer the first instruction to be executed in the instruction storage submodule, and, after the zeroth instruction to be executed has finished, fetch the first instruction to be executed from the instruction storage submodule and send it to the computing module,
wherein the first instruction to be executed having a dependency on the preceding zeroth instruction to be executed includes:
a first storage address range that stores the data required by the first instruction to be executed overlapping with a zeroth storage address range that stores the data required by the zeroth instruction to be executed.
Clause A14. The device according to clause A1,
wherein the control module is further configured to generate an assembly file according to the average pooling instruction and to translate the assembly file into a binary file,
the binary file being the compiled average pooling instruction.
Clause A15. A machine learning operation device, the device comprising:
one or more average pooling instruction processing devices according to any one of clauses A1 to A14, configured to obtain the data to be operated on and control information from other processing devices, execute the specified machine learning operation, and pass the execution result to the other processing devices through an I/O interface;
when the machine learning operation device includes multiple average pooling instruction processing devices, the multiple average pooling instruction processing devices may be connected and transmit data through a specific structure;
wherein the multiple average pooling instruction processing devices interconnect and transmit data through a PCIe (peripheral component interconnect express) bus to support larger-scale machine learning operations; the multiple average pooling instruction processing devices share the same control system or have their own control systems; the multiple average pooling instruction processing devices share memory or have their own memories; and the interconnection topology of the multiple average pooling instruction processing devices is arbitrary.
Clause A16. A combined processing device, the combined processing device comprising:
the machine learning operation device according to clause A15, a general interconnection interface, and other processing devices;
the machine learning operation device interacting with the other processing devices to jointly complete a calculation operation specified by a user,
wherein the combined processing device further includes a storage device, connected to the machine learning operation device and the other processing devices respectively, and configured to save the data of the machine learning operation device and the other processing devices.
Clause A17. A machine learning chip, the machine learning chip comprising:
the machine learning operation device according to clause A15 or the combined processing device according to clause A16.
Clause A18. An electronic device, the electronic device comprising:
the machine learning chip according to clause A17.
Clause A19. A board, the board comprising: a memory device, an interface device, a control device, and the machine learning chip according to clause A17;
wherein the machine learning chip is connected to the memory device, the control device and the interface device respectively;
the memory device is configured to store data;
the interface device is configured to implement data transmission between the machine learning chip and external equipment; and
the control device is configured to monitor the state of the machine learning chip.
Clause A20. An average pooling instruction processing method, the method being applied to an average pooling instruction processing device that includes a control module and a computing module, the method comprising:
compiling, with the control module, an obtained average pooling instruction to obtain a compiled average pooling instruction, parsing the compiled average pooling instruction to obtain an opcode and an operation domain of the average pooling instruction, and obtaining, according to the opcode and the operation domain, the data to be operated on, the pooling kernel and the destination address required to execute the average pooling instruction; and
performing, with the computing module, an average pooling operation on the data to be operated on according to the pooling kernel to obtain an operation result, and storing the operation result at the destination address,
wherein the opcode indicates that the operation performed on the data by the average pooling instruction is an average pooling operation, and the operation domain includes the address of the data to be operated on, the pooling kernel address and the destination address.
Clause A21. The method according to clause A20, wherein performing the average pooling operation on the data to be operated on according to the pooling kernel and obtaining the operation result comprises:
executing the additions in the average pooling operation with multiple adders in the computing module, and executing the divisions in the average pooling operation with multiple dividers in the computing module.
Clause A22. The method according to clause A21, wherein the computing module includes a main operation submodule and multiple slave operation submodules, the main operation submodule including multiple adders and multiple dividers,
and wherein performing the average pooling operation on the data to be operated on according to the pooling kernel, obtaining the operation result and storing the operation result at the destination address comprises:
using the multiple adders and the multiple dividers in the main operation submodule to perform, respectively, the additions and divisions in the average pooling operation to obtain the operation result, and storing the operation result at the destination address.
Clause A23. The method according to clause A20, wherein the operation domain further includes an input height and an input width,
and obtaining, according to the opcode and the operation domain, the data to be operated on, the pooling kernel and the destination address required to execute the average pooling instruction comprises:
obtaining, from the address of the data to be operated on, the data to be operated on that corresponds to the input width and the input height.
Clause A24. The method according to clause A20, wherein the operation domain further includes a pooling kernel height and a pooling kernel width,
and obtaining, according to the opcode and the operation domain, the data to be operated on, the pooling kernel and the destination address required to execute the average pooling instruction comprises:
obtaining the pooling kernel from the pooling kernel address according to the pooling kernel height and the pooling kernel width.
Clause A25. The method according to clause A20, wherein the operation domain further includes a first stride,
and performing the average pooling operation on the data to be operated on according to the pooling kernel comprises:
moving the pooling kernel in the x direction according to the first stride.
Clause A26. The method according to clause A20, wherein the operation domain further includes a second stride,
and performing the average pooling operation on the data to be operated on according to the pooling kernel comprises:
moving the pooling kernel in the y direction according to the second stride.
Clause A27. The method according to clause A20, wherein performing the average pooling operation on the data to be operated on according to the pooling kernel and obtaining the operation result comprises:
moving the pooling kernel over the data to be operated on without overlap, and averaging the multiple data elements to be operated on within the region covered by the pooling kernel to obtain the operation result.
Clause A28. The method according to clause A20, wherein performing the average pooling operation on the data to be operated on according to the pooling kernel and obtaining the operation result comprises:
when the size of the data to be operated on is not an integer multiple of the size of the pooling kernel, performing the average pooling operation on the portion of the data to be operated on whose size is an integer multiple of the pooling kernel size,
wherein the size of the data to be operated on not being an integer multiple of the pooling kernel size includes at least one of the following: the input width of the data to be operated on is not an integer multiple of the pooling kernel width; the input height of the data to be operated on is not an integer multiple of the pooling kernel height.
Clause A29. The method according to clause A20, wherein the operation domain further includes a pooling kernel count,
and performing the average pooling operation on the data to be operated on according to the pooling kernel and obtaining the operation result comprises:
performing the average pooling operation on the data to be operated on with a number of pooling kernels equal to the pooling kernel count.
Clause A30. The method according to clause A20, the method further comprising:
storing the data to be operated on and the pooling kernel with a storage module of the device,
wherein the storage module includes at least one of a register and a cache,
the cache is configured to store the data to be operated on and the pooling kernel, and includes at least one neuron cache NRAM;
the register is configured to store scalar data within the data to be operated on; and
the neuron cache is configured to store neuron data within the data to be operated on, the neuron data including neuron vector data.
Clause A31. The method according to clause A20, wherein parsing the obtained average pooling instruction to obtain the opcode and the operation domain of the average pooling instruction comprises:
storing the compiled average pooling instruction;
parsing the compiled average pooling instruction to obtain the opcode and the operation domain of the average pooling instruction; and
storing an instruction queue, the instruction queue including multiple instructions to be executed arranged in execution order, the multiple instructions to be executed including the compiled average pooling instruction.
Clause A32. The method according to clause A31, the method further comprising:
when it is determined that a first instruction to be executed among the multiple instructions to be executed has a dependency on a zeroth instruction to be executed that precedes it, buffering the first instruction to be executed, and, after determining that the zeroth instruction to be executed has finished, controlling the execution of the first instruction to be executed,
wherein the first instruction to be executed having a dependency on the preceding zeroth instruction to be executed includes:
a first storage address range that stores the data required by the first instruction to be executed overlapping with a zeroth storage address range that stores the data required by the zeroth instruction to be executed.
Clause A33. The method according to clause A20, wherein compiling the obtained average pooling instruction to obtain the compiled average pooling instruction comprises:
generating an assembly file according to the average pooling instruction, and translating the assembly file into a binary file,
the binary file being the compiled average pooling instruction.
Clause A34. A non-volatile computer-readable storage medium on which computer program instructions are stored, wherein the computer program instructions, when executed by a processor, implement the method according to any one of clauses A20 to A33.
The embodiments of the present application have been described in detail above, and specific examples have been used herein to explain the principles and implementations of the present application; the description of the above embodiments is only intended to help understand the method of the present application and its core ideas. At the same time, those skilled in the art may make changes to the specific implementations and application scope according to the ideas of the present application. In summary, the contents of this specification should not be construed as limiting the present application.
Claims (10)
1. An average pooling instruction processing device, characterized in that the device comprises:
a control module, configured to compile an obtained average pooling instruction to obtain a compiled average pooling instruction, parse the compiled average pooling instruction to obtain an opcode and an operation domain of the average pooling instruction, and obtain, according to the opcode and the operation domain, the data to be operated on, the pooling kernel and the destination address required to execute the average pooling instruction; and
a computing module, configured to perform an average pooling operation on the data to be operated on according to the pooling kernel to obtain an operation result, and store the operation result at the destination address,
wherein the opcode indicates that the operation performed on the data by the average pooling instruction is an average pooling operation, and the operation domain includes the address of the data to be operated on, the pooling kernel address and the destination address.
2. The device according to claim 1, characterized in that the computing module comprises:
multiple adders, configured to execute the additions in the average pooling operation; and
multiple dividers, configured to execute the divisions in the average pooling operation.
3. The device according to claim 2, characterized in that the computing module includes a main operation submodule and multiple slave operation submodules, the main operation submodule including the multiple adders and the multiple dividers,
the main operation submodule being configured to use the multiple adders and the multiple dividers to perform, respectively, the additions and divisions in the average pooling operation to obtain the operation result, and to store the operation result at the destination address.
4. A machine learning operation device, characterized in that the device comprises:
one or more average pooling instruction processing devices according to any one of claims 1 to 3, configured to obtain the data to be operated on and control information from other processing devices, execute the specified machine learning operation, and pass the execution result to the other processing devices through an I/O interface;
when the machine learning operation device includes multiple average pooling instruction processing devices, the multiple average pooling instruction processing devices may be connected and transmit data through a specific structure;
wherein the multiple average pooling instruction processing devices interconnect and transmit data through a PCIe (peripheral component interconnect express) bus to support larger-scale machine learning operations; the multiple average pooling instruction processing devices share the same control system or have their own control systems; the multiple average pooling instruction processing devices share memory or have their own memories; and the interconnection topology of the multiple average pooling instruction processing devices is arbitrary.
5. A combined processing device, characterized in that the combined processing device comprises:
the machine learning operation device according to claim 4, a general interconnection interface, and other processing devices;
the machine learning operation device interacting with the other processing devices to jointly complete a calculation operation specified by a user,
wherein the combined processing device further includes a storage device, connected to the machine learning operation device and the other processing devices respectively, and configured to save the data of the machine learning operation device and the other processing devices.
6. A machine learning chip, characterized in that the machine learning chip comprises:
the machine learning operation device according to claim 4 or the combined processing device according to claim 5.
7. An electronic device, characterized in that the electronic device comprises:
the machine learning chip according to claim 6.
8. A board, characterized in that the board comprises: a memory device, an interface device, a control device, and the machine learning chip according to claim 6;
wherein the machine learning chip is connected to the memory device, the control device and the interface device respectively;
the memory device is configured to store data;
the interface device is configured to implement data transmission between the machine learning chip and external equipment; and
the control device is configured to monitor the state of the machine learning chip.
9. An average pooling instruction processing method, characterized in that the method is applied to an average pooling instruction processing device that includes a control module and a computing module, the method comprising:
compiling, with the control module, an obtained average pooling instruction to obtain a compiled average pooling instruction, parsing the compiled average pooling instruction to obtain an opcode and an operation domain of the average pooling instruction, and obtaining, according to the opcode and the operation domain, the data to be operated on, the pooling kernel and the destination address required to execute the average pooling instruction; and
performing, with the computing module, an average pooling operation on the data to be operated on according to the pooling kernel to obtain an operation result, and storing the operation result at the destination address,
wherein the opcode indicates that the operation performed on the data by the average pooling instruction is an average pooling operation, and the operation domain includes the address of the data to be operated on, the pooling kernel address and the destination address.
10. A non-volatile computer-readable storage medium, characterized in that computer program instructions are stored thereon, and the computer program instructions, when executed by a processor, implement the method according to claim 9.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2019/110146 WO2020073923A1 (en) | 2018-10-09 | 2019-10-09 | Operation method and device, computer equipment, and storage medium |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811363029 | 2018-11-14 | ||
CN2018113630290 | 2018-11-14 |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110096309A true CN110096309A (en) | 2019-08-06 |
CN110096309B CN110096309B (en) | 2020-04-14 |
Family
ID=67451175
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910548268.1A Active CN110096309B (en) | 2018-10-09 | 2019-06-24 | Operation method, operation device, computer equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110096309B (en) |
Cited By (21)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112346705A (en) * | 2019-08-07 | 2021-02-09 | 上海寒武纪信息科技有限公司 | Instruction processing method and device and related product |
CN112346784A (en) * | 2019-08-07 | 2021-02-09 | 上海寒武纪信息科技有限公司 | Instruction processing method and device and related product |
CN112346707A (en) * | 2019-08-07 | 2021-02-09 | 上海寒武纪信息科技有限公司 | Instruction processing method and device and related product |
CN112396169A (en) * | 2019-08-13 | 2021-02-23 | 上海寒武纪信息科技有限公司 | Operation method, operation device, computer equipment and storage medium |
CN112394991A (en) * | 2019-08-13 | 2021-02-23 | 上海寒武纪信息科技有限公司 | Floating point to half precision floating point instruction processing device and method and related products |
CN112395008A (en) * | 2019-08-13 | 2021-02-23 | 上海寒武纪信息科技有限公司 | Operation method, operation device, computer equipment and storage medium |
CN112394988A (en) * | 2019-08-13 | 2021-02-23 | 上海寒武纪信息科技有限公司 | Unsigned to half-precision floating point instruction processing device, method and related product |
CN112394902A (en) * | 2019-08-13 | 2021-02-23 | 上海寒武纪信息科技有限公司 | Device and method for processing half-precision floating point to floating point instruction and related products |
CN112394995A (en) * | 2019-08-13 | 2021-02-23 | 上海寒武纪信息科技有限公司 | Half-precision floating point to short shaping instruction processing device and method and related product |
CN112394903A (en) * | 2019-08-13 | 2021-02-23 | 上海寒武纪信息科技有限公司 | Short shaping to half precision floating point instruction processing device, method and related product |
CN112394985A (en) * | 2019-08-12 | 2021-02-23 | 上海寒武纪信息科技有限公司 | Execution method, device and related product |
CN112395006A (en) * | 2019-08-13 | 2021-02-23 | 上海寒武纪信息科技有限公司 | Operation method, operation device, computer equipment and storage medium |
CN112394997A (en) * | 2019-08-13 | 2021-02-23 | 上海寒武纪信息科技有限公司 | Eight-bit shaping to half-precision floating point instruction processing device and method and related product |
CN112396170A (en) * | 2019-08-14 | 2021-02-23 | 上海寒武纪信息科技有限公司 | Operation method, operation device, computer equipment and storage medium |
CN112394999A (en) * | 2019-08-13 | 2021-02-23 | 上海寒武纪信息科技有限公司 | Operation method, device and related product |
CN112394998A (en) * | 2019-08-13 | 2021-02-23 | 上海寒武纪信息科技有限公司 | Operation method, device and related product |
CN112394993A (en) * | 2019-08-13 | 2021-02-23 | 上海寒武纪信息科技有限公司 | Half-precision floating point to short shaping instruction processing device and method and related product |
WO2021082747A1 (en) * | 2019-11-01 | 2021-05-06 | 中科寒武纪科技股份有限公司 | Operational apparatus and related product |
CN113033789A (en) * | 2019-12-24 | 2021-06-25 | 中科寒武纪科技股份有限公司 | Bus system for order preservation, integrated circuit device, board card and order preservation method |
CN113435591A (en) * | 2019-08-14 | 2021-09-24 | 中科寒武纪科技股份有限公司 | Data processing method, data processing device, computer equipment and storage medium |
CN113704687A (en) * | 2020-05-21 | 2021-11-26 | 杭州海康威视数字技术股份有限公司 | Tensor calculation operation method and device and operation system |
Citations (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106779060A (en) * | 2017-02-09 | 2017-05-31 | 武汉魅瞳科技有限公司 | A kind of computational methods of the depth convolutional neural networks for being suitable to hardware design realization |
CN106843993A (en) * | 2016-12-26 | 2017-06-13 | 中国科学院计算技术研究所 | A kind of method and system of resolving inversely GPU instructions |
CN106991473A (en) * | 2017-03-30 | 2017-07-28 | 中国人民解放军国防科学技术大学 | The average value value pond method for parallel processing based on SIMD of vector processor-oriented |
CN107301453A (en) * | 2016-04-15 | 2017-10-27 | 北京中科寒武纪科技有限公司 | The artificial neural network forward operation apparatus and method for supporting discrete data to represent |
CN107704922A (en) * | 2017-04-19 | 2018-02-16 | 北京深鉴科技有限公司 | Artificial neural network processing unit |
CN107729990A (en) * | 2017-07-20 | 2018-02-23 | 上海寒武纪信息科技有限公司 | Support the device and method for being used to perform artificial neural network forward operation that discrete data represents |
CN107832804A (en) * | 2017-10-30 | 2018-03-23 | 上海寒武纪信息科技有限公司 | A kind of information processing method and Related product |
CN108197705A (en) * | 2017-12-29 | 2018-06-22 | 国民技术股份有限公司 | Convolutional neural networks hardware accelerator and convolutional calculation method and storage medium |
CN108205703A (en) * | 2017-12-29 | 2018-06-26 | 中国人民解放军国防科技大学 | Multi-input multi-output matrix average value pooling vectorization implementation method |
US20180217962A1 (en) * | 2017-02-02 | 2018-08-02 | Fujitsu Limited | Operation processing apparatus and operation processing method |
CN108615072A (en) * | 2016-12-13 | 2018-10-02 | 谷歌公司 | Performing average pooling in hardware |
2019-06-24 CN CN201910548268.1A patent/CN110096309B/en active Active
Patent Citations (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107301453A (en) * | 2016-04-15 | 2017-10-27 | 北京中科寒武纪科技有限公司 | The artificial neural network forward operation apparatus and method for supporting discrete data to represent |
CN108615072A (en) * | 2016-12-13 | 2018-10-02 | 谷歌公司 | Performing average pooling in hardware |
CN106843993A (en) * | 2016-12-26 | 2017-06-13 | 中国科学院计算技术研究所 | A kind of method and system of resolving inversely GPU instructions |
US20180217962A1 (en) * | 2017-02-02 | 2018-08-02 | Fujitsu Limited | Operation processing apparatus and operation processing method |
CN106779060A (en) * | 2017-02-09 | 2017-05-31 | 武汉魅瞳科技有限公司 | A kind of computational methods of the depth convolutional neural networks for being suitable to hardware design realization |
CN106991473A (en) * | 2017-03-30 | 2017-07-28 | 中国人民解放军国防科学技术大学 | The average value value pond method for parallel processing based on SIMD of vector processor-oriented |
CN107704922A (en) * | 2017-04-19 | 2018-02-16 | 北京深鉴科技有限公司 | Artificial neural network processing unit |
CN107807819A (en) * | 2017-07-20 | 2018-03-16 | 上海寒武纪信息科技有限公司 | A kind of device and method for being used to perform artificial neural network forward operation for supporting that discrete data represents |
CN107992329A (en) * | 2017-07-20 | 2018-05-04 | 上海寒武纪信息科技有限公司 | A kind of computational methods and Related product |
CN107729990A (en) * | 2017-07-20 | 2018-02-23 | 上海寒武纪信息科技有限公司 | Support the device and method for being used to perform artificial neural network forward operation that discrete data represents |
CN107832804A (en) * | 2017-10-30 | 2018-03-23 | 上海寒武纪信息科技有限公司 | A kind of information processing method and Related product |
CN108197705A (en) * | 2017-12-29 | 2018-06-22 | 国民技术股份有限公司 | Convolutional neural networks hardware accelerator and convolutional calculation method and storage medium |
CN108205703A (en) * | 2017-12-29 | 2018-06-26 | 中国人民解放军国防科技大学 | Multi-input multi-output matrix average value pooling vectorization implementation method |
Cited By (27)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112346784A (en) * | 2019-08-07 | 2021-02-09 | 上海寒武纪信息科技有限公司 | Instruction processing method and device and related product |
CN112346707A (en) * | 2019-08-07 | 2021-02-09 | 上海寒武纪信息科技有限公司 | Instruction processing method and device and related product |
CN112346705A (en) * | 2019-08-07 | 2021-02-09 | 上海寒武纪信息科技有限公司 | Instruction processing method and device and related product |
CN112394985A (en) * | 2019-08-12 | 2021-02-23 | 上海寒武纪信息科技有限公司 | Execution method, device and related product |
CN112394997A (en) * | 2019-08-13 | 2021-02-23 | 上海寒武纪信息科技有限公司 | Eight-bit shaping to half-precision floating point instruction processing device and method and related product |
CN112394998A (en) * | 2019-08-13 | 2021-02-23 | 上海寒武纪信息科技有限公司 | Operation method, device and related product |
CN112394988A (en) * | 2019-08-13 | 2021-02-23 | 上海寒武纪信息科技有限公司 | Unsigned to half-precision floating point instruction processing device, method and related product |
CN112394902A (en) * | 2019-08-13 | 2021-02-23 | 上海寒武纪信息科技有限公司 | Device and method for processing half-precision floating point to floating point instruction and related products |
CN112394995A (en) * | 2019-08-13 | 2021-02-23 | 上海寒武纪信息科技有限公司 | Half-precision floating point to short shaping instruction processing device and method and related product |
CN112394903A (en) * | 2019-08-13 | 2021-02-23 | 上海寒武纪信息科技有限公司 | Short shaping to half precision floating point instruction processing device, method and related product |
CN112394991A (en) * | 2019-08-13 | 2021-02-23 | 上海寒武纪信息科技有限公司 | Floating point to half precision floating point instruction processing device and method and related products |
CN112395006A (en) * | 2019-08-13 | 2021-02-23 | 上海寒武纪信息科技有限公司 | Operation method, operation device, computer equipment and storage medium |
CN112396169A (en) * | 2019-08-13 | 2021-02-23 | 上海寒武纪信息科技有限公司 | Operation method, operation device, computer equipment and storage medium |
CN112395006B (en) * | 2019-08-13 | 2024-07-26 | 上海寒武纪信息科技有限公司 | Operation method, device, computer equipment and storage medium |
CN112394999A (en) * | 2019-08-13 | 2021-02-23 | 上海寒武纪信息科技有限公司 | Operation method, device and related product |
CN112395008A (en) * | 2019-08-13 | 2021-02-23 | 上海寒武纪信息科技有限公司 | Operation method, operation device, computer equipment and storage medium |
CN112394993A (en) * | 2019-08-13 | 2021-02-23 | 上海寒武纪信息科技有限公司 | Half-precision floating point to short shaping instruction processing device and method and related product |
CN112396169B (en) * | 2019-08-13 | 2024-04-02 | 上海寒武纪信息科技有限公司 | Operation method, device, computer equipment and storage medium |
CN113435591A (en) * | 2019-08-14 | 2021-09-24 | 中科寒武纪科技股份有限公司 | Data processing method, data processing device, computer equipment and storage medium |
CN112396170B (en) * | 2019-08-14 | 2024-04-02 | 上海寒武纪信息科技有限公司 | Operation method, device, computer equipment and storage medium |
CN113435591B (en) * | 2019-08-14 | 2024-04-05 | 中科寒武纪科技股份有限公司 | Data processing method, device, computer equipment and storage medium |
CN112396170A (en) * | 2019-08-14 | 2021-02-23 | 上海寒武纪信息科技有限公司 | Operation method, operation device, computer equipment and storage medium |
WO2021082747A1 (en) * | 2019-11-01 | 2021-05-06 | 中科寒武纪科技股份有限公司 | Operational apparatus and related product |
CN113033789A (en) * | 2019-12-24 | 2021-06-25 | 中科寒武纪科技股份有限公司 | Bus system for order preservation, integrated circuit device, board card and order preservation method |
CN113033789B (en) * | 2019-12-24 | 2024-03-26 | 中科寒武纪科技股份有限公司 | Bus system, integrated circuit device, board card and order preserving method for order preserving |
CN113704687A (en) * | 2020-05-21 | 2021-11-26 | 杭州海康威视数字技术股份有限公司 | Tensor calculation operation method and device and operation system |
CN113704687B (en) * | 2020-05-21 | 2024-04-05 | 杭州海康威视数字技术股份有限公司 | Tensor calculation operation method, device and operation system |
Also Published As
Publication number | Publication date |
---|---|
CN110096309B (en) | 2020-04-14 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110096309A (en) | Operation method, device, computer equipment and storage medium | |
CN110096283A (en) | Operation method, device, computer equipment and storage medium | |
CN110096310A (en) | Operation method, device, computer equipment and storage medium | |
CN110119807A (en) | Operation method, device, computer equipment and storage medium | |
CN110163362A (en) | A kind of computing device and method | |
CN110059809A (en) | A kind of computing device and Related product | |
CN111047005A (en) | Operation method, operation device, computer equipment and storage medium | |
CN111061507A (en) | Operation method, operation device, computer equipment and storage medium | |
US20230259737A1 (en) | Integrated computing apparatus, chip, board card, device and computing method | |
CN111966399B (en) | Instruction processing method and device and related products | |
WO2021082721A1 (en) | Winograd convolution operation method, apparatus, and device, and storage medium | |
CN111047030A (en) | Operation method, operation device, computer equipment and storage medium | |
CN111949318A (en) | Instruction processing method and device and related product | |
CN110704040A (en) | Information processing method and device, computer equipment and readable storage medium | |
CN111949317A (en) | Instruction processing method and device and related product | |
CN112395008A (en) | Operation method, operation device, computer equipment and storage medium | |
CN112395006B (en) | Operation method, device, computer equipment and storage medium | |
CN112396170B (en) | Operation method, device, computer equipment and storage medium | |
CN112396169B (en) | Operation method, device, computer equipment and storage medium | |
CN111124497B (en) | Operation method, operation device, computer equipment and storage medium | |
CN111026440B (en) | Operation method, operation device, computer equipment and storage medium | |
CN111338694B (en) | Operation method, device, computer equipment and storage medium | |
WO2022001496A1 (en) | Computing apparatus, integrated circuit chip, board card, electronic device, and computing method | |
CN111290788B (en) | Operation method, operation device, computer equipment and storage medium | |
CN112395001A (en) | Operation method, operation device, computer equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||