CN110096309A - Operation method, device, computer equipment and storage medium - Google Patents

Operation method, device, computer equipment and storage medium

Info

Publication number
CN110096309A
CN110096309A
Authority
CN
China
Prior art keywords
average
instruction
machine learning
pooling
pooling kernel
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910548268.1A
Other languages
Chinese (zh)
Other versions
CN110096309B (en)
Inventor
Inventor not disclosed
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Cambricon Information Technology Co Ltd
Original Assignee
Shanghai Cambricon Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Cambricon Information Technology Co Ltd
Publication of CN110096309A
Priority: PCT/CN2019/110146 (WO2020073923A1)
Application granted
Publication of CN110096309B
Current legal status: Active (granted)
Anticipated expiration

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 - Arrangements for program control, e.g. control units
    • G06F9/06 - Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 - Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30003 - Arrangements for executing specific machine instructions
    • G06F9/30007 - Arrangements for executing specific machine instructions to perform operations on data operands
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/06 - Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063 - Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Neurology (AREA)
  • Devices For Executing Special Programs (AREA)
  • Advance Control (AREA)

Abstract

The present disclosure relates to an operation method, a device, computer equipment and a storage medium. The combination processing device described therein includes a machine learning operation device, a universal interconnection interface and other processing devices. The machine learning operation device interacts with the other processing devices to jointly complete computing operations specified by a user. The combination processing device further includes a storage device, which is connected to the machine learning operation device and the other processing devices respectively, and is used for storing data of the machine learning operation device and the other processing devices. The operation method, device, computer equipment and storage medium provided by the embodiments of the present disclosure have a wide range of application, high processing efficiency and fast processing speed.

Description

Operation method, device, computer equipment and storage medium
Technical field
The present disclosure relates to the field of computer technology, and in particular to an average pooling instruction processing method and device, computer equipment and a storage medium.
Background
With the continuous development of science and technology, machine learning, and in particular neural network algorithms, are used more and more widely, and have achieved good results in fields such as image recognition, speech recognition and natural language processing. However, as the complexity of neural network algorithms keeps increasing, the types and amount of data operations involved keep growing. In the related art, the efficiency of performing the average pooling operation (average pooling) on data is low and the speed is slow.
Summary of the invention
In view of this, the present disclosure proposes an average pooling instruction processing method and device, computer equipment and a storage medium, so as to improve the efficiency and speed of performing the average pooling operation on data.
According to a first aspect of the present disclosure, an average pooling instruction processing device is provided. The device includes:
a control module, configured to compile an acquired average pooling instruction to obtain a compiled average pooling instruction, parse the compiled average pooling instruction to obtain an operation code and an operation field of the average pooling instruction, and obtain, according to the operation code and the operation field, the data to be operated on, the pooling kernel and the destination address required for executing the average pooling instruction; and
an operation module, configured to perform the average pooling operation on the data to be operated on according to the pooling kernel, obtain an operation result, and store the operation result at the destination address,
wherein the operation code indicates that the operation performed on the data by the average pooling instruction is the average pooling operation, and the operation field includes the address of the data to be operated on, the address of the pooling kernel and the destination address.
According to a second aspect of the present disclosure, a machine learning operation device is provided. The device includes:
one or more average pooling instruction processing devices according to the first aspect, configured to obtain the data to be operated on and control information from other processing devices, execute a specified machine learning operation, and transfer the execution result to the other processing devices through an I/O interface;
when the machine learning operation device includes a plurality of the average pooling instruction processing devices, the plurality of average pooling instruction processing devices may be connected to one another through a specific structure and transfer data;
wherein the plurality of average pooling instruction processing devices are interconnected and transfer data through a PCIE (peripheral component interface express) bus to support larger-scale machine learning operations; the plurality of average pooling instruction processing devices share one control system or have their own control systems; the plurality of average pooling instruction processing devices share one memory or have their own memories; and the interconnection manner of the plurality of average pooling instruction processing devices may be any interconnection topology.
According to a third aspect of the present disclosure, a combination processing device is provided. The device includes:
the machine learning operation device according to the second aspect, a universal interconnection interface and other processing devices;
the machine learning operation device interacts with the other processing devices to jointly complete computing operations specified by a user.
According to a fourth aspect of the present disclosure, a machine learning chip is provided. The machine learning chip includes the machine learning operation device according to the second aspect or the combination processing device according to the third aspect.
According to a fifth aspect of the present disclosure, a machine learning chip package structure is provided. The machine learning chip package structure includes the machine learning chip according to the fourth aspect.
According to a sixth aspect of the present disclosure, a board card is provided. The board card includes the machine learning chip package structure according to the fifth aspect.
According to a seventh aspect of the present disclosure, an electronic device is provided. The electronic device includes the machine learning chip according to the fourth aspect or the board card according to the sixth aspect.
According to an eighth aspect of the present disclosure, an average pooling instruction processing method is provided. The method is applied to an average pooling instruction processing device and includes:
compiling an acquired average pooling instruction to obtain a compiled average pooling instruction, parsing the compiled average pooling instruction to obtain an operation code and an operation field of the average pooling instruction, and obtaining, according to the operation code and the operation field, the data to be operated on, the pooling kernel and the destination address required for executing the average pooling instruction; and
performing the average pooling operation on the data to be operated on according to the pooling kernel, obtaining an operation result, and storing the operation result at the destination address,
wherein the operation code indicates that the operation performed on the data by the average pooling instruction is the average pooling operation, and the operation field includes the address of the data to be operated on, the address of the pooling kernel and the destination address.
According to a ninth aspect of the present disclosure, a non-volatile computer readable storage medium is provided, on which computer program instructions are stored. When the computer program instructions are executed by a processor, the above average pooling instruction processing method is implemented.
In some embodiments, the electronic device includes a data processing device, a robot, a computer, a printer, a scanner, a tablet computer, an intelligent terminal, a mobile phone, a drive recorder, a navigator, a sensor, a webcam, a server, a cloud server, a camera, a video camera, a projector, a watch, an earphone, a mobile storage device, a wearable device, a vehicle, a household appliance, and/or a medical device.
In some embodiments, the vehicle includes an airplane, a ship and/or a car; the household appliance includes a television, an air conditioner, a microwave oven, a refrigerator, an electric rice cooker, a humidifier, a washing machine, an electric lamp, a gas stove and a range hood; and the medical device includes a nuclear magnetic resonance instrument, a B-ultrasound scanner and/or an electrocardiograph.
The average pooling instruction processing method and device, computer equipment and storage medium provided by the embodiments of the present disclosure include a control module and an operation module. The control module is configured to compile an acquired average pooling instruction to obtain a compiled average pooling instruction, parse the compiled average pooling instruction to obtain the operation code and operation field of the average pooling instruction, and obtain, according to the operation code and the operation field, the data to be operated on, the pooling kernel and the destination address required for executing the average pooling instruction. The operation module is configured to perform the average pooling operation on the data to be operated on according to the pooling kernel, obtain an operation result, and store the operation result at the destination address. The average pooling instruction processing method and device, computer equipment and storage medium provided by the embodiments of the present disclosure have a wide range of application; the processing efficiency and speed for the average pooling instruction are high, and the processing efficiency and speed of performing the average pooling operation are also high.
Other features and aspects of the present disclosure will become clear from the following detailed description of exemplary embodiments with reference to the accompanying drawings.
Brief description of the drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate exemplary embodiments, features and aspects of the present disclosure together with the specification, and serve to explain the principles of the present disclosure.
Fig. 1 shows a block diagram of an average pooling instruction processing device according to an embodiment of the present disclosure.
Fig. 2a to Fig. 2f show block diagrams of average pooling instruction processing devices according to embodiments of the present disclosure.
Fig. 3 shows a schematic diagram of an application scenario of an average pooling instruction processing device according to an embodiment of the present disclosure.
Fig. 4a and Fig. 4b show block diagrams of combination processing devices according to embodiments of the present disclosure.
Fig. 5 shows a structural schematic diagram of a board card according to an embodiment of the present disclosure.
Fig. 6 shows a flowchart of an average pooling instruction processing method according to an embodiment of the present disclosure.
Specific embodiments
The technical solutions in the embodiments of the present disclosure will be described below clearly and completely with reference to the accompanying drawings in the embodiments of the present disclosure. Obviously, the described embodiments are only a part rather than all of the embodiments of the present disclosure. All other embodiments obtained by those skilled in the art based on the embodiments of the present disclosure without creative work shall fall within the protection scope of the present disclosure.
It should be understood that the terms "0th", "first", "second" and the like in the claims, the specification and the accompanying drawings of the present disclosure are used for distinguishing different objects rather than describing a particular order. The terms "include" and "comprise" used in the specification and claims of the present disclosure indicate the presence of the described features, wholes, steps, operations, elements and/or components, but do not exclude the presence or addition of one or more other features, wholes, steps, operations, elements, components and/or collections thereof.
It should also be understood that the terms used in the specification of the present disclosure are merely for the purpose of describing specific embodiments and are not intended to limit the present disclosure. As used in the specification and claims of the present disclosure, the singular forms "a", "an" and "the" are intended to include the plural forms unless the context clearly indicates otherwise. It should be further understood that the term "and/or" used in the specification and claims of the present disclosure refers to any combination and all possible combinations of one or more of the associated listed items, and includes these combinations.
As used in the specification and claims, the term "if" may be interpreted as "when", "once", "in response to determining" or "in response to detecting" depending on the context. Similarly, the phrase "if it is determined" or "if [the described condition or event] is detected" may be interpreted as "once it is determined", "in response to determining", "once [the described condition or event] is detected" or "in response to detecting [the described condition or event]" depending on the context.
With the wide application of neural network algorithms and the continuous improvement of computer hardware computing power, the types and amount of data operations involved in practical applications keep increasing. The average pooling operation (average pooling) is an operation that obtains the average value of all the data within a local region. Because of the wide variety of programming languages, the average pooling operation has to be implemented separately in different language environments; in the related art there is, at this stage, no average pooling instruction that can be widely applied to all kinds of programming languages, and technical personnel need to write a plurality of customized instructions corresponding to their programming language environment to implement the average pooling operation, which makes the average pooling operation inefficient and slow. The present disclosure provides an average pooling instruction processing method and device, computer equipment and a storage medium, with which the average pooling operation can be implemented with only one instruction, so that the efficiency and speed of performing the average pooling operation can be significantly improved.
Fig. 1 shows a block diagram of an average pooling instruction processing device according to an embodiment of the present disclosure. As shown in Fig. 1, the device includes a control module 11 and an operation module 12.
The control module 11 is configured to compile an acquired average pooling instruction to obtain a compiled average pooling instruction, parse the compiled average pooling instruction to obtain the operation code and operation field of the average pooling instruction, and obtain, according to the operation code and the operation field, the data to be operated on, the pooling kernel and the destination address required for executing the average pooling instruction. The operation code indicates that the operation performed on the data by the average pooling instruction is the average pooling operation, and the operation field includes the address of the data to be operated on, the address of the pooling kernel and the destination address.
The operation module 12 is configured to perform the average pooling operation on the data to be operated on according to the pooling kernel, obtain an operation result, and store the operation result at the destination address.
In this embodiment, the average pooling instruction acquired by the control module is an uncompiled software instruction that cannot be directly executed by hardware, and the control module needs to first compile the (uncompiled) average pooling instruction. Only after the compiled average pooling instruction is obtained can the compiled average pooling instruction be parsed. The compiled average pooling instruction is a hardware instruction that can be directly executed by hardware. The control module may obtain the data to be operated on and the pooling kernel from the address of the data to be operated on and the address of the pooling kernel respectively. The control module may obtain instructions and data through a data input/output unit, which may be one or more data I/O interfaces or I/O pins.
In this embodiment, the operation code may be the part of an instruction or a field (usually represented by a code) specified in a computer program to perform an operation; it is the sequence number of the instruction and is used to inform the device executing the instruction which instruction specifically needs to be executed. The operation field may be the source of all the data required for executing the corresponding instruction, where the data required for executing the corresponding instruction includes parameters such as the data to be operated on and the pooling kernel, as well as the corresponding operation method and so on. An average pooling instruction must include an operation code and an operation field, wherein the operation field includes at least the address of the data to be operated on, the address of the pooling kernel and the destination address.
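As a rough illustration only (not part of the claimed device), a decoded average pooling instruction can be pictured as an operation code plus an operation field holding the three addresses described above; the class and field names below are assumptions made for the sketch:

```python
from dataclasses import dataclass

@dataclass
class AvgPoolInstruction:
    """Illustrative decoded form of an average pooling instruction.

    opcode      -- identifies the operation (average pooling)
    src_addr    -- address of the data to be operated on
    kernel_addr -- address of the pooling kernel
    dst_addr    -- destination address for the operation result
    """
    opcode: str
    src_addr: int
    kernel_addr: int
    dst_addr: int

# the control module would hand something like this to the operation module
decoded = AvgPoolInstruction(opcode="avgpool", src_addr=100, kernel_addr=200, dst_addr=500)
```

The concrete addresses 100, 200 and 500 are taken from the application example given later in this description.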
It should be understood that those skilled in the art may configure the instruction format of the average pooling instruction and the operation code and operation field it includes as needed, which is not limited by the present disclosure.
In this embodiment, the device may include one or more control modules and one or more operation modules, and the numbers of control modules and operation modules may be configured according to actual needs, which is not limited by the present disclosure. When the device includes one control module, the control module may receive the average pooling instruction and control one or more operation modules to perform the average pooling operation. When the device includes a plurality of control modules, the plurality of control modules may respectively receive average pooling instructions and control their corresponding one or more operation modules to perform the average pooling operation.
The average pooling instruction processing device provided by the embodiments of the present disclosure includes a control module and an operation module. The control module is configured to compile an acquired average pooling instruction to obtain a compiled average pooling instruction, parse the compiled average pooling instruction to obtain the operation code and operation field of the average pooling instruction, and obtain, according to the operation code and the operation field, the data to be operated on, the pooling kernel and the destination address required for executing the average pooling instruction; the operation module is configured to perform the average pooling operation on the data to be operated on according to the pooling kernel, obtain an operation result, and store the operation result at the destination address. The average pooling instruction processing device provided by the embodiments of the present disclosure has a wide range of application; the processing efficiency and speed for the average pooling instruction are high, and the processing efficiency and speed of performing the average pooling operation are also high.
Fig. 2 a shows the block diagram of the average pond instruction processing unit according to one embodiment of the disclosure.In a kind of possible reality In existing mode, as shown in Figure 2 a, computing module 12 may include multiple adders 120 and multiple dividers 120 '.Multiple additions Device 120 is used to execute the add operation in average pond operation.Multiple dividers 120 ' are for executing in average pond operation Division arithmetic.
In this implementation, computing module also may include an adder and a divider, or including one Adder, multiple dividers, then including multiple adders, a divider.It can be according to the average pond of required progress The size of the data volume of operation requires the quantity to adder and divider to processing speed, efficiency of average pond operation etc. Be configured, the disclosure to this with no restriction.
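For one pooling window, the work done by the adders and dividers amounts to summing the covered elements and dividing by their count. The following plain-Python sketch (names illustrative, no hardware mapping implied) shows only this arithmetic:

```python
def average_of_window(window):
    """Average one pooling window: repeated additions followed by one division."""
    total = 0.0
    for value in window:        # work performed by the adders
        total += value
    return total / len(window)  # work performed by a divider

# e.g. a 2x2 window flattened into a list
assert average_of_window([1.0, 2.0, 3.0, 4.0]) == 2.5
```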
Fig. 2 b shows the block diagram of the average pond instruction processing unit according to one embodiment of the disclosure.In a kind of possible reality In existing mode, as shown in Figure 2 b, computing module 12 may include main operation submodule 121 and multiple from operation submodule 122.It is main Operation submodule 121 may include multiple adders and multiple dividers.
Main operation submodule 121, for being carried out in average pond operation respectively using multiple adders and multiple dividers Add operation and division arithmetic, obtain operation result, and operation result is stored in destination address.
In a possible implementation, the control module 11 is further configured to parse an acquired computing instruction to obtain the operation field and operation code of the computing instruction, and obtain the data to be operated on required for executing the computing instruction according to the operation field and the operation code. The operation module 12 is further configured to perform an operation on the data to be operated on according to the computing instruction to obtain the computing result of the computing instruction. The operation module may include a plurality of operators configured to perform operations corresponding to the operation type of the computing instruction.
In this implementation, the computing instruction may be another instruction that performs operations such as arithmetic operations and logical operations on data such as scalars, vectors, matrices and tensors. Those skilled in the art may configure the computing instruction according to actual needs, which is not limited by the present disclosure.
In this implementation, the operators may include adders, dividers, multipliers, comparators and other operators capable of performing arithmetic operations, logical operations and other operations on data. The types and numbers of operators may be configured according to the size of the amount of data to be operated on, the operation type, and the requirements on the processing speed and efficiency of the operation, which is not limited by the present disclosure.
In a possible implementation, the control module 11 is further configured to parse the computing instruction to obtain a plurality of operation instructions, and send the data to be operated on and the plurality of operation instructions to the master operation submodule 121.
The master operation submodule 121 is configured to perform preamble processing on the data to be operated on, and to transfer data and operation instructions with the plurality of slave operation submodules 122.
The slave operation submodules 122 are configured to perform intermediate operations in parallel according to the data and operation instructions transferred from the master operation submodule 121 to obtain a plurality of intermediate results, and to transfer the plurality of intermediate results to the master operation submodule 121.
The master operation submodule 121 is further configured to perform subsequent processing on the plurality of intermediate results to obtain the computing result of the computing instruction, and to store the computing result at the corresponding address.
In this implementation, when the computing instruction performs an operation on scalars or vector data, the device may control the master operation submodule to use the operators therein to perform the operation corresponding to the computing instruction. When the computing instruction performs an operation on data whose dimension is greater than or equal to 2, such as matrices and tensors, the device may control the slave operation submodules to use the operators therein to perform the operation corresponding to the computing instruction.
It should be noted that those skilled in the art may configure the connection manner between the master operation submodule and the plurality of slave operation submodules according to actual needs so as to arrange the architecture of the operation module; for example, the architecture of the operation module may be an "H"-shaped architecture, an array architecture, a tree architecture and the like, which is not limited by the present disclosure.
Fig. 2 c shows the block diagram of the average pond instruction processing unit according to one embodiment of the disclosure.In a kind of possible reality In existing mode, as shown in Figure 2 c, computing module 12 can also include one or more branch operations submodules 123, branch fortune Operator module 123 is for forwarding main operation submodule 121 and from the data and/or operational order between operation submodule 122.Its In, main operation submodule 121 is connect with one or more branch operations submodules 123.In this way, the main operator in computing module Module, branch operations submodule and between operation submodule use " H " type frame structure connect, forwarded by branch operations submodule Data and/or operational order save the resource occupation to main operation submodule, and then improve the processing speed of instruction.
Fig. 2 d shows the block diagram of the average pond instruction processing unit according to one embodiment of the disclosure.In a kind of possible reality In existing mode, as shown in Figure 2 d, it is multiple from operation submodule 122 be in array distribution.
It is each connect from operation submodule 122 with other adjacent from operation submodule 122, main operation submodule 121 connects Multiple the k from operation submodule 122 are connect from operation submodule 122, k from operation submodule 122 are as follows: n of the 1st row from Operation submodule 122, the n m arranged from operation submodule 122 and the 1st of m row are a from operation submodule 122.
Wherein, as shown in Figure 2 d, the k n for only including the 1st row from operation submodule are a from operation submodule, the n of m row For a m arranged from operation submodule and the 1st from operation submodule, i.e. the k are multiple from operation submodule from operation submodule Directly connect with main operation submodule in block from operation submodule.Wherein, k is a from operation submodule, in main operator The forwarding of module and multiple data and instruction between operation submodule.In this way, it is multiple from operation submodule be in array Distribution can be improved main operation submodule to from operation submodule and send data and/or operational order speed, and then improves instruction Processing speed.
Fig. 2 e shows the block diagram of the average pond instruction processing unit according to one embodiment of the disclosure.In a kind of possible reality In existing mode, as shown in Figure 2 e, computing module can also include tree-shaped submodule 124.The tree-shaped submodule 124 includes a root Port 401 and multiple ports 402.Root port 401 is connect with main operation submodule 121, multiple ports 402 and multiple from fortune Operator module 122 is separately connected.Wherein, tree-shaped submodule 124 has transmission-receiving function, for forwarding main 121 He of operation submodule From the data and/or operational order between operation submodule 122.In this way, by the effect of tree-shaped submodule so that computing module It is connected in tree-shaped framework, and using the forwarding capability of tree-shaped submodule, main operation submodule can be improved to from operation submodule Data and/or operational order speed are sent, and then improves the processing speed of instruction.
In one possible implementation, tree-shaped submodule 124 can be the optional as a result, it may include of the device At least one layer of node.Node is the cable architecture with forwarding capability, and node itself does not have calculation function.Undermost node with From operation submodule connect, with forward main operation submodule 121 and between operation submodule 122 data and/or operation refer to It enables.Distinguishingly, as tree-shaped submodule has zero layer node, which is then not necessarily to tree-shaped submodule.
In one possible implementation, tree-shaped submodule 124 may include multiple nodes of n fork tree construction, n fork tree Multiple nodes of structure can have multiple layers.
For example, Fig. 2 f shows the block diagram of the average pond instruction processing unit according to one embodiment of the disclosure.Such as figure Shown in 2f, n fork tree construction can be binary tree structure, and tree-shaped submodule includes 2 node layers 01.Lowest level node 01 with from operation Submodule 122 connects, to forward main operation submodule 121 and from the data and/or operational order between operation submodule 122.
In this implementation, n, which pitches tree construction, can also be that trident tree construction etc., n are the positive integer more than or equal to 2. Those skilled in the art, which can according to need, is configured the number of plies of n and n fork tree construction interior joint in n fork tree construction, The disclosure to this with no restriction.
In a possible implementation, the operation field may further include an input height and an input width.
The control module is further configured to obtain, from the address of the data to be operated on, the data to be operated on corresponding to the input width and the input height.
In this implementation, the input height and the input width may limit the amount and size of the data obtained as the data to be operated on. The input height and input width included in the operation field may be specific numerical values, or may be storage addresses at which the input height and the input width are stored. When the operation field directly includes the specific values of the input height and the input width, these specific values are determined as the corresponding input height and input width. When the operation field includes the storage addresses of the input height and the input width, the input height and the input width may be obtained from those storage addresses respectively.
In a possible implementation, when the operation field does not include the input height and/or the input width, the data to be operated on may be obtained according to a preset default input height and a preset default input width.
In the above manner, the amount and size of the data to be operated on can be limited, the accuracy of the operation result can be guaranteed, and it can be guaranteed that the device is able to execute the average pooling instruction.
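One rough way to picture this operand handling is sketched below; the helper name, the memory-read representation and the default values are assumptions made for illustration and are not taken from the disclosure:

```python
DEFAULT_INPUT_HEIGHT = 1   # assumed placeholder defaults
DEFAULT_INPUT_WIDTH = 1

def resolve_dimension(field, memory, default):
    """Resolve an input-height or input-width operand.

    `field` may be an immediate value, a storage address given as ("addr", a),
    or None when the operation field omits the operand entirely.
    """
    if field is None:                              # operand missing: use the preset default
        return default
    if isinstance(field, tuple) and field[0] == "addr":
        return memory[field[1]]                    # operand given as a storage address
    return field                                   # operand given as an immediate value

# usage sketch
memory = {300: 64}
height = resolve_dimension(("addr", 300), memory, DEFAULT_INPUT_HEIGHT)  # -> 64
width = resolve_dimension(32, memory, DEFAULT_INPUT_WIDTH)               # -> 32
```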
In a possible implementation, the operation field may further include a pooling kernel height and a pooling kernel width.
The control module 11 is further configured to obtain the pooling kernel from the address of the pooling kernel according to the pooling kernel height and the pooling kernel width.
In a possible implementation, the operation field may further include a first stride. The operation module 12 may be further configured to move the pooling kernel in the x direction according to the first stride.
In a possible implementation, the operation field may further include a second stride. The operation module 12 may be further configured to move the pooling kernel in the y direction according to the second stride.
In this implementation, the stride of the average pooling operation is the amplitude by which the pooling kernel is moved each time during the average pooling operation. The first stride may be the amplitude by which the pooling kernel is moved in the x direction, and the second stride may be the amplitude by which the pooling kernel is moved in the y direction.
It should be noted that the present disclosure only takes a two-dimensional pooling kernel as an example when describing the parameters required for the average pooling operation, such as the height and width of the pooling kernel, the first stride and the second stride. If the pooling kernel is multidimensional, the parameters of the pooling kernel correspondingly include the size and stride of each of its dimensions.
In a possible implementation, when the first stride and the second stride are not given in the operation field of the average pooling instruction, the operation module may use the height and the width of the pooling kernel as the strides of the corresponding dimensions respectively, so that the average pooling operation proceeds normally. For example, the operation module 12 may be further configured to move the pooling kernel over the data to be operated on without overlap, and obtain the operation result from the plurality of data to be operated on within the region corresponding to the pooling kernel.
In a possible implementation, when the operation field does not include the pooling kernel height and the pooling kernel width, a preset default pooling kernel height and a preset default pooling kernel width may be obtained, so that the control module and the operation module can execute the average pooling instruction.
In a possible implementation, the operation field may further include a pooling kernel quantity. The operation module 12 is further configured to perform the average pooling operation on the data to be operated on with a number of pooling kernels equal to the pooling kernel quantity.
In this implementation, the pooling kernel quantity corresponds to the data to be operated on. For example, when the pooling kernel quantity is 5, it may be determined that the data to be operated on is divided into five parts, and 5 pooling kernels are needed to perform the average pooling operation on the five parts of the data to be operated on respectively.
In this implementation, when the operation field does not include the pooling kernel quantity, it may be determined that only one pooling kernel is needed for the data to be operated on to implement the average pooling operation.
In a possible implementation, the operation module 12 is further configured to, when the size of the data to be operated on is not an integral multiple of the size of the pooling kernel, perform the average pooling operation on the data in the data to be operated on that is an integral multiple of the size of the pooling kernel. The size of the data to be operated on being not an integral multiple of the size of the pooling kernel may include at least one of the following: the input width of the data to be operated on is not an integral multiple of the width of the pooling kernel; the input height of the data to be operated on is not an integral multiple of the height of the pooling kernel.
In this implementation, the residual data in the data to be operated on that exceeds an integral multiple of the size of the pooling kernel may not undergo the average pooling operation.
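Putting the kernel size, the strides and the remainder handling together, a plain-Python sketch of the pooling loop might look as follows. It is illustrative only: it assumes a 2-D list as input, defaults the strides to the kernel height/width (non-overlapping windows), and simply skips any border region that cannot fill a whole window, as described above:

```python
def average_pool(data, kernel_h, kernel_w, stride_y=None, stride_x=None):
    """2-D average pooling over a list of rows.

    When strides are not given, the kernel height/width are used so that
    windows do not overlap; partial windows at the borders are skipped.
    """
    stride_y = kernel_h if stride_y is None else stride_y
    stride_x = kernel_w if stride_x is None else stride_x
    in_h, in_w = len(data), len(data[0])
    out = []
    for top in range(0, in_h - kernel_h + 1, stride_y):
        row = []
        for left in range(0, in_w - kernel_w + 1, stride_x):
            window = [data[top + i][left + j]
                      for i in range(kernel_h) for j in range(kernel_w)]
            row.append(sum(window) / len(window))
        out.append(row)
    return out

# 4x4 input, 2x2 kernel, default (non-overlapping) strides -> 2x2 output
data = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
print(average_pool(data, 2, 2))   # [[3.5, 5.5], [11.5, 13.5]]
```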
In a possible implementation, as shown in Fig. 2a to Fig. 2f, the device may further include a storage module 13. The storage module 13 is configured to store the data to be operated on and the pooling kernel.
In this implementation, the storage module may include one or more of a cache and a register. The cache may include a high-speed temporary cache, and may further include at least one NRAM (Neuron Random Access Memory). The cache may be configured to store the data to be operated on and the pooling kernel, and the register may be configured to store the scalar data in the data to be operated on.
In a possible implementation, the cache may include a neuron cache. The neuron cache, namely the above neuron random access memory, may be configured to store the neuron data in the data to be operated on, and the neuron data may include neuron vector data.
In a possible implementation, the device may further include a direct memory access module configured to read data from or store data into the storage module.
In a possible implementation, as shown in Fig. 2a to Fig. 2f, the control module 11 may include an instruction storage submodule 111, an instruction processing submodule 112 and a queue storage submodule 113.
The instruction storage submodule 111 is configured to store the compiled average pooling instruction.
The instruction processing submodule 112 is configured to parse the compiled average pooling instruction to obtain the operation code and operation field of the average pooling instruction.
The queue storage submodule 113 is configured to store an instruction queue. The instruction queue includes a plurality of instructions to be executed that are arranged in sequence according to their execution order, and the plurality of instructions to be executed may include the compiled average pooling instruction.
In this implementation, the execution order of the plurality of instructions to be executed may be arranged according to their receiving time, priority level and the like to obtain the instruction queue, so that the plurality of instructions to be executed are executed in sequence according to the instruction queue.
In a possible implementation, as shown in Fig. 2a to Fig. 2f, the control module 11 may further include a dependency relation processing submodule 114.
The dependency relation processing submodule 114 is configured to, when it is determined that a first instruction to be executed among the plurality of instructions to be executed has a dependency relation with a 0th instruction to be executed that precedes it, cache the first instruction to be executed in the instruction storage submodule 111, and, after the 0th instruction to be executed has finished executing, extract the first instruction to be executed from the instruction storage submodule 111 and send it to the operation module 12.
The first instruction to be executed having a dependency relation with the preceding 0th instruction to be executed includes: the first storage address interval storing the data required by the first instruction to be executed and the 0th storage address interval storing the data required by the 0th instruction to be executed have an overlapping region. Conversely, the first instruction to be executed having no dependency relation with the 0th instruction to be executed may mean that the first storage address interval and the 0th storage address interval have no overlapping region.
In this way, according to the dependency relation between the first instruction to be executed and the preceding 0th instruction to be executed, the later first instruction to be executed is executed only after the 0th instruction to be executed has finished executing, which guarantees the accuracy of the operation result.
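The dependency test described above reduces to an interval-overlap check on the two instructions' storage address ranges. A minimal sketch (the helper name and half-open interval convention are assumptions for illustration):

```python
def has_dependency(first_interval, zeroth_interval):
    """Return True when the two [start, end) storage address intervals overlap,
    i.e. the first instruction must wait for the 0th instruction to finish."""
    first_start, first_end = first_interval
    zeroth_start, zeroth_end = zeroth_interval
    return first_start < zeroth_end and zeroth_start < first_end

assert has_dependency((100, 200), (150, 300))       # overlapping -> dependent
assert not has_dependency((100, 200), (200, 300))   # disjoint -> independent
```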
In a possible implementation, the control module 11 may be further configured to generate an assembly file according to the average pooling instruction, and translate the assembly file into a binary file, where the binary file is the compiled average pooling instruction.
In a possible implementation, the instruction format of the average pooling instruction may be:
avgpool dst src0 src1 srcChannel srcHeigh srcWidth kernelHeight kernelWidth sx sy
where avgpool is the operation code of the average pooling instruction, and dst, src0, src1, srcChannel, srcHeigh, srcWidth, kernelHeight, kernelWidth, sx and sy form the operation field of the average pooling instruction. dst is the destination address, src0 is the address of the data to be operated on, src1 is the address of the pooling kernel, srcChannel is the pooling kernel quantity, srcHeigh is the input height, srcWidth is the input width, kernelHeight is the pooling kernel height, kernelWidth is the pooling kernel width, sx is the first stride by which the pooling kernel is moved in the x direction, and sy is the second stride by which the pooling kernel is moved in the y direction.
It should be understood that those skilled in the art may configure the operation code of the average pooling instruction and the positions of the operation code and the operation field in the instruction format as needed, which is not limited by the present disclosure.
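For illustration, a throwaway parser for the textual form above could map the space-separated tokens onto named fields. This is a sketch only; the actual compiled encoding is binary and is not specified in the disclosure, and the function name is an assumption:

```python
FIELD_NAMES = ["dst", "src0", "src1", "srcChannel", "srcHeigh",
               "srcWidth", "kernelHeight", "kernelWidth", "sx", "sy"]

def parse_avgpool_text(text):
    """Split 'avgpool <10 operands>' into an opcode and a dict of named operands."""
    tokens = text.split()
    opcode, operands = tokens[0], tokens[1:]
    if opcode != "avgpool" or len(operands) != len(FIELD_NAMES):
        raise ValueError("not a well-formed average pooling instruction")
    return opcode, dict(zip(FIELD_NAMES, (int(t) for t in operands)))

# the operand values are those of the application example given below
opcode, fields = parse_avgpool_text("avgpool 500 100 200 5 64 32 2 2 2 1")
print(fields["dst"], fields["kernelHeight"], fields["sx"])   # 500 2 2
```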
In a possible implementation, the device may be provided in one or more of a graphics processing unit (GPU), a central processing unit (CPU) and an embedded neural-network processing unit (NPU).
It should be noted that although the average pooling instruction processing device is described above by taking the above embodiments as examples, those skilled in the art can understand that the present disclosure should not be limited thereto. In fact, the user can flexibly set each module according to personal preference and/or the actual application scenario, as long as the technical solution of the present disclosure is satisfied.
Application example
An application example according to the embodiments of the present disclosure is given below with reference to an exemplary application scenario of "performing the average pooling operation with the average pooling instruction processing device", so as to facilitate understanding of the flow of the average pooling instruction processing device. Those skilled in the art should understand that the following application example is merely for the purpose of facilitating understanding of the embodiments of the present disclosure and should not be regarded as a limitation to the embodiments of the present disclosure.
Fig. 3 shows a schematic diagram of an application scenario of an average pooling instruction processing device according to an embodiment of the present disclosure. As shown in Fig. 3, the process in which the average pooling instruction processing device processes an average pooling instruction is as follows:
The control module 11 compiles the acquired average pooling instruction 1 to obtain the compiled average pooling instruction 1 (for example, the average pooling instruction 1 is avgpool 500 100 200 5 64 32 2 2 2 1), parses the compiled average pooling instruction, and obtains the operation code and operation field of the average pooling instruction 1. The operation code of the average pooling instruction 1 is avgpool, the destination address is 500, the address of the data to be operated on is 100, the address of the pooling kernel is 200, the pooling kernel quantity is 5, the input height is 64, the input width is 32, the pooling kernel height is 2, the pooling kernel width is 2, the first stride is 2, and the second stride is 1. The control module 11 obtains the 64×32 data to be operated on from the data address 100, and obtains the 2×2 pooling kernel from the pooling kernel address 200.
The operation module 12 performs the average pooling operation on the data to be operated on with the 5 pooling kernels to obtain the operation result, and stores the operation result at the destination address 500.
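As a quick numeric check of this example (assuming the usual floor formula for the number of window positions and assuming that x corresponds to the width of 32 and y to the height of 64, neither of which the disclosure states explicitly):

```python
# number of pooling-window positions along each dimension of the example
in_h, in_w = 64, 32          # input height / width from the example instruction
k_h, k_w = 2, 2              # pooling kernel height / width
sy, sx = 1, 2                # second stride (y) and first stride (x)

out_h = (in_h - k_h) // sy + 1   # 63 window positions along the height
out_w = (in_w - k_w) // sx + 1   # 16 window positions along the width
print(out_h, out_w)              # 63 16
```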
For the working process of each of the above modules, reference may be made to the related description above.
In this way, the average pooling instruction can be processed efficiently and quickly, and the efficiency and speed of performing the average pooling operation are also significantly improved.
The present disclosure provides a machine learning operation device, which may include one or more of the above average pooling instruction processing devices and is configured to obtain the data to be operated on and control information from other processing devices and execute a specified machine learning operation. The machine learning operation device may obtain average pooling instructions from other machine learning operation devices or non-machine-learning operation devices, and transfer the execution result to peripheral devices (which may also be called other processing devices) through an I/O interface. The peripheral devices include, for example, a camera, a display, a mouse, a keyboard, a network card, a WIFI interface and a server. When more than one average pooling instruction processing device is included, the average pooling instruction processing devices may be linked through a specific structure and transfer data; for example, they may be interconnected and transfer data through a PCIE bus to support larger-scale neural network operations. In this case, they may share one control system or have independent control systems; they may share one memory, or each accelerator may have its own memory. In addition, their interconnection manner may be any interconnection topology.
The machine learning operation device has high compatibility and can be connected with various types of servers through a PCIE interface.
Fig. 4 a shows the block diagram of the combined treatment device according to one embodiment of the disclosure.As shown in fig. 4 a, the combined treatment Device includes above-mentioned machine learning arithmetic unit, general interconnecting interface and other processing units.Machine learning arithmetic unit and its He interacts processing unit, the common operation completing user and specifying.
Other processing units, including central processor CPU, graphics processor GPU, neural network processor etc. are general/special With one of processor or above processor type.Processor quantity included by other processing units is with no restrictions.Its His interface of the processing unit as machine learning arithmetic unit and external data and control, including data are carried, and are completed to the machine Device learns the basic control such as unlatching, stopping of arithmetic unit;Other processing units can also cooperate with machine learning arithmetic unit It is common to complete processor active task.
General interconnecting interface refers to for transmitting data and control between machine learning arithmetic unit and other processing units It enables.The machine learning arithmetic unit obtains required input data from other processing units, and machine learning arithmetic unit is written The storage device of on piece;Control instruction can be obtained from other processing units, and the control of machine learning arithmetic unit on piece is written System caching;It can also learn the data in the memory module of arithmetic unit with read machine and be transferred to other processing units.
Fig. 4 b shows the block diagram of the combined treatment device according to one embodiment of the disclosure.In a kind of possible implementation In, as shown in Figure 4 b, the combined treatment device can also include storage device, storage device respectively with machine learning arithmetic unit It is connected with other described processing units.Storage device is used to be stored in machine learning arithmetic unit and other processing units Data, the data of operation required for being particularly suitable for are in the storage inside that machine learns arithmetic unit or other processing units The data that can not all save.
The combined treatment device can be used as the SOC on piece of the equipment such as mobile phone, robot, unmanned plane, video monitoring equipment The die area of control section is effectively reduced in system, improves processing speed, reduces overall power.When this situation, the combined treatment The general interconnecting interface of device is connected with certain components of equipment.Certain components for example camera, display, mouse, keyboard, Network interface card, wifi interface.
The present disclosure provides a machine learning chip, which includes the above machine learning operation device or combination processing device.
The present disclosure provides a machine learning chip package structure, which includes the above machine learning chip.
The present disclosure provides a board card. Fig. 5 shows a structural schematic diagram of a board card according to an embodiment of the present disclosure. As shown in Fig. 5, the board card includes the above machine learning chip package structure or the above machine learning chip. In addition to the machine learning chip 389, the board card may further include other supporting components, which include, but are not limited to, a storage device 390, an interface device 391 and a control device 392.
The storage device 390 is connected with the machine learning chip 389 (or the machine learning chip in the machine learning chip package structure) through a bus and is configured to store data. The storage device 390 may include a plurality of groups of storage units 393. Each group of storage units 393 is connected with the machine learning chip 389 through a bus. It can be understood that each group of storage units 393 may be a DDR SDRAM (double data rate synchronous dynamic random access memory).
DDR can double the speed of SDRAM without increasing the clock frequency. DDR allows data to be read on the rising edge and the falling edge of the clock pulse. The speed of DDR is twice that of standard SDRAM.
In an embodiment, the storage device 390 may include 4 groups of storage units 393. Each group of storage units 393 may include a plurality of DDR4 particles (chips). In an embodiment, the machine learning chip 389 may internally include four 72-bit DDR4 controllers; in each of the above 72-bit DDR4 controllers, 64 bits are used for data transfer and 8 bits are used for ECC check. It can be understood that when DDR4-3200 particles are used in each group of storage units 393, the theoretical bandwidth of data transfer can reach 25600 MB/s.
In an embodiment, each group of storage units 393 includes a plurality of double data rate synchronous dynamic random access memories arranged in parallel. DDR can transfer data twice within one clock cycle. A controller for controlling the DDR is provided in the machine learning chip 389 and is used for controlling the data transfer and data storage of each storage unit 393.
The interface device 391 is electrically connected with the machine learning chip 389 (or the machine learning chip in the machine learning chip package structure). The interface device 391 is configured to implement data transfer between the machine learning chip 389 and an external device (such as a server or a computer). For example, in an embodiment, the interface device 391 may be a standard PCIE interface; for instance, the data to be processed is transferred from the server to the machine learning chip 389 through the standard PCIE interface to implement the data transfer. Preferably, when a PCIE 3.0 X16 interface is used for the transfer, the theoretical bandwidth can reach 16000 MB/s. In another embodiment, the interface device 391 may also be another interface; the present disclosure does not limit the specific form of such other interfaces, as long as the interface device can implement the transfer function. In addition, the computing result of the machine learning chip is still sent back to the external device (such as the server) by the interface device.
The control device 392 is electrically connected with the machine learning chip 389 and is configured to monitor the state of the machine learning chip 389. Specifically, the machine learning chip 389 and the control device 392 may be electrically connected through an SPI interface. The control device 392 may include a micro controller unit (MCU). The machine learning chip 389 may include a plurality of processing chips, a plurality of processing cores or a plurality of processing circuits, and may drive a plurality of loads. Therefore, the machine learning chip 389 may be in different working states such as multi-load and light-load. The control device can regulate the working states of the plurality of processing chips, the plurality of processing cores and/or the plurality of processing circuits in the machine learning chip.
The present disclosure provides an electronic device, which includes the above machine learning chip or board card.
The electronic device may include a data processing device, computer equipment, a robot, a computer, a printer, a scanner, a tablet computer, an intelligent terminal, a mobile phone, a drive recorder, a navigator, a sensor, a webcam, a server, a cloud server, a camera, a video camera, a projector, a watch, an earphone, a mobile storage device, a wearable device, a vehicle, a household appliance and/or a medical device.
The vehicle may include an airplane, a ship and/or a car. The household appliance may include a television, an air conditioner, a microwave oven, a refrigerator, an electric rice cooker, a humidifier, a washing machine, an electric lamp, a gas stove and a range hood. The medical device may include a nuclear magnetic resonance instrument, a B-ultrasound scanner and/or an electrocardiograph.
Fig. 6 shows a flowchart of an average pooling instruction processing method according to an embodiment of the present disclosure. The method may be applied to computer equipment that includes a memory and a processor, where the memory is configured to store the data used in executing the method, and the processor is configured to perform the relevant processing and computing steps, for example to perform the following step S51 and step S52. As shown in Fig. 6, the method is applied to the above average pooling instruction processing device and includes step S51 and step S52.
In step S51, the control module is used to compile the acquired average pooling instruction to obtain the compiled average pooling instruction, parse the compiled average pooling instruction to obtain the operation code and operation field of the average pooling instruction, and obtain, according to the operation code and the operation field, the data to be operated on, the pooling kernel and the destination address required for executing the average pooling instruction. The operation code indicates that the operation performed on the data by the average pooling instruction is the average pooling operation, and the operation field includes the address of the data to be operated on, the address of the pooling kernel and the destination address.
In step S52, the operation module is used to perform the average pooling operation on the data to be operated on according to the pooling kernel, obtain the operation result, and store the operation result at the destination address.
In one possible implementation, performing the average pooling operation on the data to be operated on according to the pooling kernel and obtaining the operation result may include: executing the addition operations in the average pooling operation using multiple adders in the computing module, and executing the division operations in the average pooling operation using multiple dividers in the computing module.
In one possible implementation, the computing module includes a master operation submodule and multiple slave operation submodules, and the master operation submodule includes the multiple adders and the multiple dividers. In this case, step S52 may include:
using the multiple adders and multiple dividers in the master operation submodule to perform, respectively, the addition operations and division operations in the average pooling operation, obtaining the operation result, and storing the operation result at the destination address.
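To mirror the split between adders and dividers described above, one pooled output can be sketched as a chain of additions followed by a single division; this is an illustrative model, not the circuit itself.

```python
def pool_one_window(window_values):
    """Average one pooling window: repeated additions (the adders), then one division (the dividers)."""
    acc = 0.0
    for v in window_values:              # addition operations in the average pooling
        acc += v
    return acc / len(window_values)      # division operation in the average pooling
```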
In one possible implementation, the operation domain may further include an input height and an input width. In this case, obtaining, according to the operation code and the operation domain, the data to be operated on, the pooling kernel, and the destination address required for executing the average pooling instruction may include:
obtaining, from the address of the data to be operated on, the data to be operated on corresponding to the input width and the input height.
In one possible implementation, the operation domain may further include a pooling kernel height and a pooling kernel width. In this case, obtaining, according to the operation code and the operation domain, the data to be operated on, the pooling kernel, and the destination address required for executing the average pooling instruction may include:
obtaining the pooling kernel from the pooling kernel address according to the pooling kernel height and the pooling kernel width.
In one possible implementation, the operation domain may further include a first stride. In this case, performing the average pooling operation on the data to be operated on according to the pooling kernel may include: moving the pooling kernel in the x direction according to the first stride.
In one possible implementation, the operation domain may further include a second stride. In this case, performing the average pooling operation on the data to be operated on according to the pooling kernel may include: moving the pooling kernel in the y direction according to the second stride.
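Using the `average_pool_2d` sketch above, the first and second strides simply control how far the pooling kernel moves in the x and y directions between windows; the concrete sizes below are hypothetical example values.

```python
import numpy as np

data = np.arange(36, dtype=np.float64).reshape(6, 6)   # example 6x6 input
# Move a 3x3 pooling kernel by 2 in the x direction (first stride)
# and by 2 in the y direction (second stride).
pooled = average_pool_2d(data, kernel_h=3, kernel_w=3, stride_y=2, stride_x=2)
print(pooled.shape)   # (2, 2)
```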
In one possible implementation, performing the average pooling operation on the data to be operated on according to the pooling kernel and obtaining the operation result may include:
moving the pooling kernel over the data to be operated on without overlap, and performing the average pooling operation on the multiple pieces of data to be operated on within the region corresponding to the pooling kernel, to obtain the operation result.
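Non-overlapping movement corresponds to strides equal to the kernel size, so each input element falls into exactly one window; a quick sketch using the same helper and the example input above:

```python
# Non-overlapping pooling: the stride equals the kernel size in each direction.
pooled = average_pool_2d(data, kernel_h=2, kernel_w=2, stride_y=2, stride_x=2)
print(pooled.shape)   # (3, 3) for the 6x6 example input
```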
In one possible implementation, performing the average pooling operation on the data to be operated on according to the pooling kernel and obtaining the operation result may include:
when the size of the data to be operated on is not an integer multiple of the size of the pooling kernel, performing the average pooling operation on the portion of the data to be operated on that is an integer multiple of the size of the pooling kernel,
where the size of the data to be operated on not being an integer multiple of the size of the pooling kernel includes at least one of the following: the input width of the data to be operated on is not an integer multiple of the width of the pooling kernel, and the input height of the data to be operated on is not an integer multiple of the height of the pooling kernel.
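When the input is not an integer multiple of the kernel size, only the complete windows are pooled; the floor division in the sketch below drops the leftover rows and columns, which matches the behavior described above (a sketch, not the hardware's exact handling).

```python
def pooled_output_size(in_h, in_w, kernel_h, kernel_w):
    """Number of complete, non-overlapping windows in each direction."""
    return in_h // kernel_h, in_w // kernel_w

# A 7x7 input with a 3x3 kernel: only a 6x6 sub-region (2x2 windows) is pooled;
# the remaining row and column are left out of the average pooling operation.
print(pooled_output_size(7, 7, 3, 3))   # (2, 2)
```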
In one possible implementation, the operation domain may further include a pooling kernel quantity. In this case, performing the average pooling operation on the data to be operated on according to the pooling kernel and obtaining the operation result may include:
performing the average pooling operation on the data to be operated on with multiple pooling kernels whose number is the pooling kernel quantity.
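With a pooling kernel quantity greater than one, the same operation is repeated once per kernel, for example one kernel per input channel; the channel layout below is an assumption made only for illustration and reuses the `average_pool_2d` sketch above.

```python
import numpy as np

def average_pool_multi(data_chw, kernels):
    """Apply one pooling kernel per channel of a (channels, H, W) input."""
    outputs = []
    for channel, (kernel_h, kernel_w) in zip(data_chw, kernels):
        outputs.append(average_pool_2d(channel, kernel_h, kernel_w,
                                       stride_y=kernel_h, stride_x=kernel_w))
    return outputs

data_chw = np.ones((4, 6, 6))     # 4 channels of 6x6 data to be operated on
kernels = [(2, 2)] * 4            # pooling kernel quantity = 4
results = average_pool_multi(data_chw, kernels)
```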
In one possible implementation, the method may further include: storing the data to be operated on and the pooling kernel using a memory module of the apparatus. The memory module may include at least one of a register and a cache. The cache is used to store the data to be operated on and the pooling kernel, and may include at least one neuron cache (NRAM). The register is used to store scalar data in the data to be operated on. The neuron cache is used to store neuron data in the data to be operated on, and the neuron data may include neuron vector data.
In one possible implementation, parsing the received average pooling instruction to obtain the operation code and operation domain of the average pooling instruction may include:
storing the compiled average pooling instruction;
parsing the compiled average pooling instruction to obtain the operation code and operation domain of the average pooling instruction;
storing an instruction queue, where the instruction queue includes multiple instructions to be executed arranged in execution order, and the multiple instructions to be executed may include the compiled average pooling instruction.
In one possible implementation, the method may further include: when it is determined that a first instruction to be executed among the multiple instructions to be executed has a dependency on a zeroth instruction to be executed preceding the first instruction to be executed, caching the first instruction to be executed, and executing the first instruction to be executed after the zeroth instruction to be executed has finished executing,
where the first instruction to be executed having a dependency on the zeroth instruction to be executed preceding it includes:
a first storage address interval storing data required by the first instruction to be executed overlaps with a zeroth storage address interval storing data required by the zeroth instruction to be executed.
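The dependency test described above reduces to an overlap check between two storage address intervals; a minimal sketch, assuming half-open [start, end) intervals:

```python
def has_dependency(first_interval, zeroth_interval):
    """True if the two [start, end) storage address intervals overlap."""
    f_start, f_end = first_interval
    z_start, z_end = zeroth_interval
    return f_start < z_end and z_start < f_end

# The first instruction to be executed must wait for the zeroth one:
print(has_dependency((0x1000, 0x1100), (0x10F0, 0x1200)))   # True
```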
In one possible implementation, compiling the received average pooling instruction to obtain the compiled average pooling instruction may include:
generating an assembly file according to the average pooling instruction, and translating the assembly file into a binary file, where the binary file is the compiled average pooling instruction.
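As a rough illustration of this compile step, a textual average pooling instruction could first be rendered as an assembly-like line and then packed into a binary form; the mnemonic, the opcode value, and the field packing below are hypothetical, not the actual toolchain of this disclosure.

```python
import struct

def assemble(data_addr, kernel_addr, dest_addr):
    """Generate an assembly-like line for the average pooling instruction."""
    return f"AVG_POOL {data_addr:#x}, {kernel_addr:#x}, {dest_addr:#x}"

def to_binary(asm_line):
    """Translate the assembly line into a binary payload (opcode 0x2A is assumed)."""
    _, operands = asm_line.split(" ", 1)
    addrs = [int(tok.strip(), 16) for tok in operands.split(",")]
    return struct.pack("<BIII", 0x2A, *addrs)   # opcode + three 32-bit addresses

binary = to_binary(assemble(0x1000, 0x2000, 0x3000))
```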
It should be noted that although the average pooling instruction processing method has been described above using the foregoing embodiments as examples, those skilled in the art will understand that the present disclosure is not limited thereto. In fact, users may flexibly set each step according to personal preference and/or the actual application scenario, as long as the technical solution of the present disclosure is satisfied.
The average pooling instruction processing method provided by the embodiments of the present disclosure has a wide range of application, processes average pooling instructions with high efficiency and high speed, and performs the average pooling operation with high efficiency and high speed.
The present disclosure also provides a non-volatile computer-readable storage medium on which computer program instructions are stored, where the computer program instructions, when executed by a processor, implement the above average pooling instruction processing method.
It should be noted that, for the sake of brevity, the foregoing method embodiments are described as a series of action combinations, but those skilled in the art should understand that the present disclosure is not limited by the described order of actions, because according to the present disclosure, some steps may be performed in other orders or simultaneously. Furthermore, those skilled in the art should also understand that the embodiments described in this specification are optional embodiments, and the actions and modules involved are not necessarily required by the present disclosure.
It should be further noted that although the steps in the flowchart of Fig. 6 are shown in sequence as indicated by the arrows, these steps are not necessarily executed in the order indicated by the arrows. Unless explicitly stated otherwise herein, the execution of these steps is not strictly ordered, and they may be executed in other orders. Moreover, at least some of the steps in Fig. 6 may include multiple substeps or stages, and these substeps or stages are not necessarily completed at the same moment; they may be executed at different times, and their execution order is not necessarily sequential, but may be carried out in turn or in alternation with at least part of the substeps or stages of other steps.
It should be understood that the foregoing apparatus embodiments are merely illustrative, and the apparatus of the present disclosure may also be implemented in other ways. For example, the division of the units/modules described in the above embodiments is only a logical functional division, and there may be other division manners in actual implementation. For example, multiple units, modules, or components may be combined or integrated into another system, or some features may be omitted or not executed.
In addition, unless otherwise noted, the functional units/modules in the embodiments of the present disclosure may be integrated into one unit/module, may exist physically separately, or two or more units/modules may be integrated together. The above integrated units/modules may be implemented in the form of hardware or in the form of software program modules.
If the integrated units/modules are implemented in the form of hardware, the hardware may be a digital circuit, an analog circuit, and so on. Physical implementations of the hardware structure include but are not limited to transistors, memristors, and so on. Unless otherwise specified, the above memory module may be any suitable magnetic or magneto-optical storage medium, for example, resistive random access memory (RRAM), dynamic random access memory (DRAM), static random-access memory (SRAM), enhanced dynamic random access memory (EDRAM), high-bandwidth memory (HBM), hybrid memory cube (HMC), and so on.
If the integrated units/modules are implemented in the form of software program modules and sold or used as independent products, they may be stored in a computer-readable memory. Based on this understanding, the technical solution of the present disclosure, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product, which is stored in a memory and includes several instructions to cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of the present disclosure. The aforementioned memory includes various media capable of storing program code, such as a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, or an optical disc.
In the above embodiments, the description of each embodiment has its own emphasis; for parts not described in detail in one embodiment, reference may be made to the related descriptions of other embodiments. The technical features of the above embodiments may be combined arbitrarily; for brevity, not all possible combinations of the technical features in the above embodiments are described, but as long as there is no contradiction in a combination of these technical features, it should be considered as within the scope of this specification.
The foregoing may be better understood according to the following clauses:
Clause A1. An average pooling instruction processing apparatus, the apparatus comprising:
a control module, configured to compile a received average pooling instruction to obtain a compiled average pooling instruction, parse the compiled average pooling instruction to obtain an operation code and an operation domain of the average pooling instruction, and obtain, according to the operation code and the operation domain, data to be operated on, a pooling kernel, and a destination address required for executing the average pooling instruction;
a computing module, configured to perform an average pooling operation on the data to be operated on according to the pooling kernel, obtain an operation result, and store the operation result at the destination address,
wherein the operation code indicates that the operation the average pooling instruction performs on the data is an average pooling operation, and the operation domain includes the address of the data to be operated on, the pooling kernel address, and the destination address.
Clause A2. The apparatus according to clause A1, wherein the computing module comprises:
multiple adders, configured to execute the addition operations in the average pooling operation;
multiple dividers, configured to execute the division operations in the average pooling operation.
Clause A3. The apparatus according to clause A2, wherein the computing module includes a master operation submodule and multiple slave operation submodules, the master operation submodule including the multiple adders and the multiple dividers,
and the master operation submodule is configured to perform, respectively, the addition operations and division operations in the average pooling operation using the multiple adders and the multiple dividers, obtain the operation result, and store the operation result at the destination address.
Clause A4. The apparatus according to clause A1, wherein the operation domain further includes an input height and an input width,
and the control module is further configured to obtain, from the address of the data to be operated on, the data to be operated on corresponding to the input width and the input height.
Clause A5. The apparatus according to clause A1, wherein the operation domain further includes a pooling kernel height and a pooling kernel width,
and the control module is further configured to obtain the pooling kernel from the pooling kernel address according to the pooling kernel height and the pooling kernel width.
Clause A6. The apparatus according to clause A1, wherein the operation domain further includes a first stride,
and the computing module is further configured to move the pooling kernel in the x direction according to the first stride.
Clause A7. The apparatus according to clause A1, wherein the operation domain further includes a second stride,
and the computing module is further configured to move the pooling kernel in the y direction according to the second stride.
Clause A8. The apparatus according to clause A1,
wherein the computing module is further configured to move the pooling kernel over the data to be operated on without overlap, and perform the average pooling operation on the multiple pieces of data to be operated on within the region corresponding to the pooling kernel to obtain the operation result.
Clause A9. The apparatus according to clause A1,
wherein the computing module is further configured to, when the size of the data to be operated on is not an integer multiple of the size of the pooling kernel, perform the average pooling operation on the portion of the data to be operated on that is an integer multiple of the size of the pooling kernel,
where the size of the data to be operated on not being an integer multiple of the size of the pooling kernel includes at least one of the following: the input width of the data to be operated on is not an integer multiple of the width of the pooling kernel, and the input height of the data to be operated on is not an integer multiple of the height of the pooling kernel.
Clause A10. The apparatus according to clause A1, wherein the operation domain further includes a pooling kernel quantity,
and the computing module is further configured to perform the average pooling operation on the data to be operated on with multiple pooling kernels whose number is the pooling kernel quantity.
Clause A11. The apparatus according to clause A1, further comprising:
a memory module, configured to store the data to be operated on and the pooling kernel,
wherein the memory module includes at least one of a register and a cache,
the cache is configured to store the data to be operated on and the pooling kernel, and includes at least one neuron cache (NRAM);
the register is configured to store scalar data in the data to be operated on;
the neuron cache is configured to store neuron data in the data to be operated on, the neuron data including neuron vector data.
Clause A12. The apparatus according to clause A1, wherein the control module comprises:
an instruction storage submodule, configured to store the compiled average pooling instruction;
an instruction processing submodule, configured to parse the compiled average pooling instruction to obtain the operation code and operation domain of the average pooling instruction;
a queue storage submodule, configured to store an instruction queue, the instruction queue including multiple instructions to be executed arranged in execution order, the multiple instructions to be executed including the compiled average pooling instruction.
Clause A13. The apparatus according to clause A12, wherein the control module further comprises:
a dependency handling submodule, configured to, when it is determined that a first instruction to be executed among the multiple instructions to be executed has a dependency on a zeroth instruction to be executed preceding the first instruction to be executed, cache the first instruction to be executed in the instruction storage submodule, and after the zeroth instruction to be executed has finished executing, fetch the first instruction to be executed from the instruction storage submodule and send it to the computing module,
wherein the first instruction to be executed having a dependency on the zeroth instruction to be executed preceding it includes:
a first storage address interval storing data required by the first instruction to be executed overlaps with a zeroth storage address interval storing data required by the zeroth instruction to be executed.
Clause A14. The apparatus according to clause A1,
wherein the control module is further configured to generate an assembly file according to the average pooling instruction and translate the assembly file into a binary file,
where the binary file is the compiled average pooling instruction.
Clause A15. A machine learning operation apparatus, comprising:
one or more average pooling instruction processing apparatuses according to any one of clauses A1 to A14, configured to obtain data to be operated on and control information from other processing apparatuses, execute a specified machine learning operation, and pass the execution result to the other processing apparatuses through an I/O interface;
when the machine learning operation apparatus includes multiple average pooling instruction processing apparatuses, the multiple average pooling instruction processing apparatuses may be connected and transmit data through a specific structure;
wherein the multiple average pooling instruction processing apparatuses interconnect and transmit data through a PCIE (Peripheral Component Interconnect Express) bus to support larger-scale machine learning operations; the multiple average pooling instruction processing apparatuses share the same control system or have their own control systems; the multiple average pooling instruction processing apparatuses share memory or have their own memories; and the interconnection topology of the multiple average pooling instruction processing apparatuses is arbitrary.
Clause A16. A combined processing apparatus, comprising:
the machine learning operation apparatus according to clause A15, a universal interconnect interface, and other processing apparatuses;
wherein the machine learning operation apparatus interacts with the other processing apparatuses to jointly complete a computing operation specified by the user,
and the combined processing apparatus further comprises: a storage apparatus, connected to the machine learning operation apparatus and the other processing apparatuses respectively, and configured to save data of the machine learning operation apparatus and the other processing apparatuses.
Clause A17. A machine learning chip, comprising:
the machine learning operation apparatus according to clause A15 or the combined processing apparatus according to clause A16.
Clause A18. An electronic device, comprising:
the machine learning chip according to clause A17.
Clause A19. A board, comprising: a memory device, an interface apparatus, a control device, and the machine learning chip according to clause A17;
wherein the machine learning chip is connected to the memory device, the control device, and the interface apparatus respectively;
the memory device is configured to store data;
the interface apparatus is configured to implement data transmission between the machine learning chip and an external device;
the control device is configured to monitor the state of the machine learning chip.
Clause A20. An average pooling instruction processing method, applied to an average pooling instruction processing apparatus, the apparatus including a control module and a computing module, the method comprising:
compiling a received average pooling instruction using the control module to obtain a compiled average pooling instruction, parsing the compiled average pooling instruction to obtain an operation code and an operation domain of the average pooling instruction, and obtaining, according to the operation code and the operation domain, data to be operated on, a pooling kernel, and a destination address required for executing the average pooling instruction;
performing an average pooling operation on the data to be operated on according to the pooling kernel using the computing module, obtaining an operation result, and storing the operation result at the destination address,
wherein the operation code indicates that the operation the average pooling instruction performs on the data is an average pooling operation, and the operation domain includes the address of the data to be operated on, the pooling kernel address, and the destination address.
Clause A21. The method according to clause A20, wherein performing the average pooling operation on the data to be operated on according to the pooling kernel and obtaining the operation result comprises:
executing the addition operations in the average pooling operation using multiple adders in the computing module, and executing the division operations in the average pooling operation using multiple dividers in the computing module.
Clause A22. The method according to clause A21, wherein the computing module includes a master operation submodule and multiple slave operation submodules, the master operation submodule including multiple adders and multiple dividers,
and performing the average pooling operation on the data to be operated on according to the pooling kernel, obtaining the operation result, and storing the operation result at the destination address comprises:
performing, respectively, the addition operations and division operations in the average pooling operation using the multiple adders and the multiple dividers in the master operation submodule, obtaining the operation result, and storing the operation result at the destination address.
Clause A23. The method according to clause A20, wherein the operation domain further includes an input height and an input width,
and obtaining, according to the operation code and the operation domain, the data to be operated on, the pooling kernel, and the destination address required for executing the average pooling instruction comprises:
obtaining, from the address of the data to be operated on, the data to be operated on corresponding to the input width and the input height.
Clause A24. The method according to clause A20, wherein the operation domain further includes a pooling kernel height and a pooling kernel width,
and obtaining, according to the operation code and the operation domain, the data to be operated on, the pooling kernel, and the destination address required for executing the average pooling instruction comprises:
obtaining the pooling kernel from the pooling kernel address according to the pooling kernel height and the pooling kernel width.
Clause A25. The method according to clause A20, wherein the operation domain further includes a first stride,
and performing the average pooling operation on the data to be operated on according to the pooling kernel comprises:
moving the pooling kernel in the x direction according to the first stride.
Clause A26. The method according to clause A20, wherein the operation domain further includes a second stride,
and performing the average pooling operation on the data to be operated on according to the pooling kernel comprises:
moving the pooling kernel in the y direction according to the second stride.
Clause A27. The method according to clause A20, wherein performing the average pooling operation on the data to be operated on according to the pooling kernel and obtaining the operation result comprises:
moving the pooling kernel over the data to be operated on without overlap, and performing the average pooling operation on the multiple pieces of data to be operated on within the region corresponding to the pooling kernel to obtain the operation result.
Clause A28. The method according to clause A20, wherein performing the average pooling operation on the data to be operated on according to the pooling kernel and obtaining the operation result comprises:
when the size of the data to be operated on is not an integer multiple of the size of the pooling kernel, performing the average pooling operation on the portion of the data to be operated on that is an integer multiple of the size of the pooling kernel,
where the size of the data to be operated on not being an integer multiple of the size of the pooling kernel includes at least one of the following: the input width of the data to be operated on is not an integer multiple of the width of the pooling kernel, and the input height of the data to be operated on is not an integer multiple of the height of the pooling kernel.
Clause A29. The method according to clause A20, wherein the operation domain further includes a pooling kernel quantity,
and performing the average pooling operation on the data to be operated on according to the pooling kernel and obtaining the operation result comprises:
performing the average pooling operation on the data to be operated on with multiple pooling kernels whose number is the pooling kernel quantity.
Clause A30. The method according to clause A20, further comprising:
storing the data to be operated on and the pooling kernel using a memory module of the apparatus,
wherein the memory module includes at least one of a register and a cache,
the cache is configured to store the data to be operated on and the pooling kernel, and includes at least one neuron cache (NRAM);
the register is configured to store scalar data in the data to be operated on;
the neuron cache is configured to store neuron data in the data to be operated on, the neuron data including neuron vector data.
Clause A31. The method according to clause A20, wherein parsing the received average pooling instruction to obtain the operation code and operation domain of the average pooling instruction comprises:
storing the compiled average pooling instruction;
parsing the compiled average pooling instruction to obtain the operation code and operation domain of the average pooling instruction;
storing an instruction queue, the instruction queue including multiple instructions to be executed arranged in execution order, the multiple instructions to be executed including the compiled average pooling instruction.
Clause A32. The method according to clause A31, further comprising:
when it is determined that a first instruction to be executed among the multiple instructions to be executed has a dependency on a zeroth instruction to be executed preceding the first instruction to be executed, caching the first instruction to be executed, and after it is determined that the zeroth instruction to be executed has finished executing, controlling execution of the first instruction to be executed,
wherein the first instruction to be executed having a dependency on the zeroth instruction to be executed preceding it includes:
a first storage address interval storing data required by the first instruction to be executed overlaps with a zeroth storage address interval storing data required by the zeroth instruction to be executed.
Clause A33. The method according to clause A20, wherein compiling the received average pooling instruction to obtain the compiled average pooling instruction comprises:
generating an assembly file according to the average pooling instruction, and translating the assembly file into a binary file,
where the binary file is the compiled average pooling instruction.
Clause A34. A non-volatile computer-readable storage medium on which computer program instructions are stored, wherein the computer program instructions, when executed by a processor, implement the method according to any one of clauses A20 to A33.
The embodiments of the present application have been described in detail above, and specific examples have been used herein to explain the principles and implementations of the present application; the description of the above embodiments is only intended to help understand the method of the present application and its core idea. At the same time, those skilled in the art may make changes to the specific implementations and the application scope according to the idea of the present application. In summary, the contents of this specification should not be construed as limiting the present application.

Claims (10)

1. An average pooling instruction processing apparatus, characterized in that the apparatus comprises:
a control module, configured to compile a received average pooling instruction to obtain a compiled average pooling instruction, parse the compiled average pooling instruction to obtain an operation code and an operation domain of the average pooling instruction, and obtain, according to the operation code and the operation domain, data to be operated on, a pooling kernel, and a destination address required for executing the average pooling instruction;
a computing module, configured to perform an average pooling operation on the data to be operated on according to the pooling kernel, obtain an operation result, and store the operation result at the destination address,
wherein the operation code indicates that the operation the average pooling instruction performs on the data is an average pooling operation, and the operation domain includes the address of the data to be operated on, the pooling kernel address, and the destination address.
2. The apparatus according to claim 1, characterized in that the computing module comprises:
multiple adders, configured to execute the addition operations in the average pooling operation;
multiple dividers, configured to execute the division operations in the average pooling operation.
3. The apparatus according to claim 2, characterized in that the computing module includes a master operation submodule and multiple slave operation submodules, the master operation submodule including the multiple adders and the multiple dividers,
and the master operation submodule is configured to perform, respectively, the addition operations and division operations in the average pooling operation using the multiple adders and the multiple dividers, obtain the operation result, and store the operation result at the destination address.
4. A machine learning operation apparatus, characterized in that the apparatus comprises:
one or more average pooling instruction processing apparatuses according to any one of claims 1 to 3, configured to obtain data to be operated on and control information from other processing apparatuses, execute a specified machine learning operation, and pass the execution result to the other processing apparatuses through an I/O interface;
when the machine learning operation apparatus includes multiple average pooling instruction processing apparatuses, the multiple average pooling instruction processing apparatuses may be connected and transmit data through a specific structure;
wherein the multiple average pooling instruction processing apparatuses interconnect and transmit data through a PCIE (Peripheral Component Interconnect Express) bus to support larger-scale machine learning operations; the multiple average pooling instruction processing apparatuses share the same control system or have their own control systems; the multiple average pooling instruction processing apparatuses share memory or have their own memories; and the interconnection topology of the multiple average pooling instruction processing apparatuses is arbitrary.
5. A combined processing apparatus, characterized in that the combined processing apparatus comprises:
the machine learning operation apparatus according to claim 4, a universal interconnect interface, and other processing apparatuses;
wherein the machine learning operation apparatus interacts with the other processing apparatuses to jointly complete a computing operation specified by the user,
and the combined processing apparatus further comprises: a storage apparatus, connected to the machine learning operation apparatus and the other processing apparatuses respectively, and configured to save data of the machine learning operation apparatus and the other processing apparatuses.
6. A machine learning chip, characterized in that the machine learning chip comprises:
the machine learning operation apparatus according to claim 4 or the combined processing apparatus according to claim 5.
7. An electronic device, characterized in that the electronic device comprises:
the machine learning chip according to claim 6.
8. A board, characterized in that the board comprises: a memory device, an interface apparatus, a control device, and the machine learning chip according to claim 6;
wherein the machine learning chip is connected to the memory device, the control device, and the interface apparatus respectively;
the memory device is configured to store data;
the interface apparatus is configured to implement data transmission between the machine learning chip and an external device;
the control device is configured to monitor the state of the machine learning chip.
9. An average pooling instruction processing method, characterized in that the method is applied to an average pooling instruction processing apparatus, the apparatus including a control module and a computing module, the method comprising:
compiling a received average pooling instruction using the control module to obtain a compiled average pooling instruction, parsing the compiled average pooling instruction to obtain an operation code and an operation domain of the average pooling instruction, and obtaining, according to the operation code and the operation domain, data to be operated on, a pooling kernel, and a destination address required for executing the average pooling instruction;
performing an average pooling operation on the data to be operated on according to the pooling kernel using the computing module, obtaining an operation result, and storing the operation result at the destination address,
wherein the operation code indicates that the operation the average pooling instruction performs on the data is an average pooling operation, and the operation domain includes the address of the data to be operated on, the pooling kernel address, and the destination address.
10. A non-volatile computer-readable storage medium, characterized in that computer program instructions are stored thereon, and the computer program instructions, when executed by a processor, implement the method according to claim 9.
CN201910548268.1A 2018-10-09 2019-06-24 Operation method, operation device, computer equipment and storage medium Active CN110096309B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2019/110146 WO2020073923A1 (en) 2018-10-09 2019-10-09 Operation method and device, computer equipment, and storage medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201811363029 2018-11-14
CN2018113630290 2018-11-14

Publications (2)

Publication Number Publication Date
CN110096309A true CN110096309A (en) 2019-08-06
CN110096309B CN110096309B (en) 2020-04-14

Family

ID=67451175

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910548268.1A Active CN110096309B (en) 2018-10-09 2019-06-24 Operation method, operation device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN110096309B (en)

Cited By (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112346705A (en) * 2019-08-07 2021-02-09 上海寒武纪信息科技有限公司 Instruction processing method and device and related product
CN112346784A (en) * 2019-08-07 2021-02-09 上海寒武纪信息科技有限公司 Instruction processing method and device and related product
CN112346707A (en) * 2019-08-07 2021-02-09 上海寒武纪信息科技有限公司 Instruction processing method and device and related product
CN112396169A (en) * 2019-08-13 2021-02-23 上海寒武纪信息科技有限公司 Operation method, operation device, computer equipment and storage medium
CN112394991A (en) * 2019-08-13 2021-02-23 上海寒武纪信息科技有限公司 Floating point to half precision floating point instruction processing device and method and related products
CN112395008A (en) * 2019-08-13 2021-02-23 上海寒武纪信息科技有限公司 Operation method, operation device, computer equipment and storage medium
CN112394988A (en) * 2019-08-13 2021-02-23 上海寒武纪信息科技有限公司 Unsigned to half-precision floating point instruction processing device, method and related product
CN112394902A (en) * 2019-08-13 2021-02-23 上海寒武纪信息科技有限公司 Device and method for processing half-precision floating point to floating point instruction and related products
CN112394995A (en) * 2019-08-13 2021-02-23 上海寒武纪信息科技有限公司 Half-precision floating point to short shaping instruction processing device and method and related product
CN112394903A (en) * 2019-08-13 2021-02-23 上海寒武纪信息科技有限公司 Short shaping to half precision floating point instruction processing device, method and related product
CN112394985A (en) * 2019-08-12 2021-02-23 上海寒武纪信息科技有限公司 Execution method, device and related product
CN112395006A (en) * 2019-08-13 2021-02-23 上海寒武纪信息科技有限公司 Operation method, operation device, computer equipment and storage medium
CN112394997A (en) * 2019-08-13 2021-02-23 上海寒武纪信息科技有限公司 Eight-bit shaping to half-precision floating point instruction processing device and method and related product
CN112396170A (en) * 2019-08-14 2021-02-23 上海寒武纪信息科技有限公司 Operation method, operation device, computer equipment and storage medium
CN112394999A (en) * 2019-08-13 2021-02-23 上海寒武纪信息科技有限公司 Operation method, device and related product
CN112394998A (en) * 2019-08-13 2021-02-23 上海寒武纪信息科技有限公司 Operation method, device and related product
CN112394993A (en) * 2019-08-13 2021-02-23 上海寒武纪信息科技有限公司 Half-precision floating point to short shaping instruction processing device and method and related product
WO2021082747A1 (en) * 2019-11-01 2021-05-06 中科寒武纪科技股份有限公司 Operational apparatus and related product
CN113033789A (en) * 2019-12-24 2021-06-25 中科寒武纪科技股份有限公司 Bus system for order preservation, integrated circuit device, board card and order preservation method
CN113435591A (en) * 2019-08-14 2021-09-24 中科寒武纪科技股份有限公司 Data processing method, data processing device, computer equipment and storage medium
CN113704687A (en) * 2020-05-21 2021-11-26 杭州海康威视数字技术股份有限公司 Tensor calculation operation method and device and operation system

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106779060A (en) * 2017-02-09 2017-05-31 武汉魅瞳科技有限公司 A kind of computational methods of the depth convolutional neural networks for being suitable to hardware design realization
CN106843993A (en) * 2016-12-26 2017-06-13 中国科学院计算技术研究所 A kind of method and system of resolving inversely GPU instructions
CN106991473A (en) * 2017-03-30 2017-07-28 中国人民解放军国防科学技术大学 The average value value pond method for parallel processing based on SIMD of vector processor-oriented
CN107301453A (en) * 2016-04-15 2017-10-27 北京中科寒武纪科技有限公司 The artificial neural network forward operation apparatus and method for supporting discrete data to represent
CN107704922A (en) * 2017-04-19 2018-02-16 北京深鉴科技有限公司 Artificial neural network processing unit
CN107729990A (en) * 2017-07-20 2018-02-23 上海寒武纪信息科技有限公司 Support the device and method for being used to perform artificial neural network forward operation that discrete data represents
CN107832804A (en) * 2017-10-30 2018-03-23 上海寒武纪信息科技有限公司 A kind of information processing method and Related product
CN108197705A (en) * 2017-12-29 2018-06-22 国民技术股份有限公司 Convolutional neural networks hardware accelerator and convolutional calculation method and storage medium
CN108205703A (en) * 2017-12-29 2018-06-26 中国人民解放军国防科技大学 Multi-input multi-output matrix average value pooling vectorization implementation method
US20180217962A1 (en) * 2017-02-02 2018-08-02 Fujitsu Limited Operation processing apparatus and operation processing method
CN108615072A (en) * 2016-12-13 2018-10-02 谷歌公司 Average pond is executed within hardware

Patent Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107301453A (en) * 2016-04-15 2017-10-27 北京中科寒武纪科技有限公司 The artificial neural network forward operation apparatus and method for supporting discrete data to represent
CN108615072A (en) * 2016-12-13 2018-10-02 谷歌公司 Average pond is executed within hardware
CN106843993A (en) * 2016-12-26 2017-06-13 中国科学院计算技术研究所 A kind of method and system of resolving inversely GPU instructions
US20180217962A1 (en) * 2017-02-02 2018-08-02 Fujitsu Limited Operation processing apparatus and operation processing method
CN106779060A (en) * 2017-02-09 2017-05-31 武汉魅瞳科技有限公司 A kind of computational methods of the depth convolutional neural networks for being suitable to hardware design realization
CN106991473A (en) * 2017-03-30 2017-07-28 中国人民解放军国防科学技术大学 The average value value pond method for parallel processing based on SIMD of vector processor-oriented
CN107704922A (en) * 2017-04-19 2018-02-16 北京深鉴科技有限公司 Artificial neural network processing unit
CN107807819A (en) * 2017-07-20 2018-03-16 上海寒武纪信息科技有限公司 A kind of device and method for being used to perform artificial neural network forward operation for supporting that discrete data represents
CN107992329A (en) * 2017-07-20 2018-05-04 上海寒武纪信息科技有限公司 A kind of computational methods and Related product
CN107729990A (en) * 2017-07-20 2018-02-23 上海寒武纪信息科技有限公司 Support the device and method for being used to perform artificial neural network forward operation that discrete data represents
CN107832804A (en) * 2017-10-30 2018-03-23 上海寒武纪信息科技有限公司 A kind of information processing method and Related product
CN108197705A (en) * 2017-12-29 2018-06-22 国民技术股份有限公司 Convolutional neural networks hardware accelerator and convolutional calculation method and storage medium
CN108205703A (en) * 2017-12-29 2018-06-26 中国人民解放军国防科技大学 Multi-input multi-output matrix average value pooling vectorization implementation method

Cited By (27)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112346784A (en) * 2019-08-07 2021-02-09 上海寒武纪信息科技有限公司 Instruction processing method and device and related product
CN112346707A (en) * 2019-08-07 2021-02-09 上海寒武纪信息科技有限公司 Instruction processing method and device and related product
CN112346705A (en) * 2019-08-07 2021-02-09 上海寒武纪信息科技有限公司 Instruction processing method and device and related product
CN112394985A (en) * 2019-08-12 2021-02-23 上海寒武纪信息科技有限公司 Execution method, device and related product
CN112394997A (en) * 2019-08-13 2021-02-23 上海寒武纪信息科技有限公司 Eight-bit shaping to half-precision floating point instruction processing device and method and related product
CN112394998A (en) * 2019-08-13 2021-02-23 上海寒武纪信息科技有限公司 Operation method, device and related product
CN112394988A (en) * 2019-08-13 2021-02-23 上海寒武纪信息科技有限公司 Unsigned to half-precision floating point instruction processing device, method and related product
CN112394902A (en) * 2019-08-13 2021-02-23 上海寒武纪信息科技有限公司 Device and method for processing half-precision floating point to floating point instruction and related products
CN112394995A (en) * 2019-08-13 2021-02-23 上海寒武纪信息科技有限公司 Half-precision floating point to short shaping instruction processing device and method and related product
CN112394903A (en) * 2019-08-13 2021-02-23 上海寒武纪信息科技有限公司 Short shaping to half precision floating point instruction processing device, method and related product
CN112394991A (en) * 2019-08-13 2021-02-23 上海寒武纪信息科技有限公司 Floating point to half precision floating point instruction processing device and method and related products
CN112395006A (en) * 2019-08-13 2021-02-23 上海寒武纪信息科技有限公司 Operation method, operation device, computer equipment and storage medium
CN112396169A (en) * 2019-08-13 2021-02-23 上海寒武纪信息科技有限公司 Operation method, operation device, computer equipment and storage medium
CN112395006B (en) * 2019-08-13 2024-07-26 上海寒武纪信息科技有限公司 Operation method, device, computer equipment and storage medium
CN112394999A (en) * 2019-08-13 2021-02-23 上海寒武纪信息科技有限公司 Operation method, device and related product
CN112395008A (en) * 2019-08-13 2021-02-23 上海寒武纪信息科技有限公司 Operation method, operation device, computer equipment and storage medium
CN112394993A (en) * 2019-08-13 2021-02-23 上海寒武纪信息科技有限公司 Half-precision floating point to short shaping instruction processing device and method and related product
CN112396169B (en) * 2019-08-13 2024-04-02 上海寒武纪信息科技有限公司 Operation method, device, computer equipment and storage medium
CN113435591A (en) * 2019-08-14 2021-09-24 中科寒武纪科技股份有限公司 Data processing method, data processing device, computer equipment and storage medium
CN112396170B (en) * 2019-08-14 2024-04-02 上海寒武纪信息科技有限公司 Operation method, device, computer equipment and storage medium
CN113435591B (en) * 2019-08-14 2024-04-05 中科寒武纪科技股份有限公司 Data processing method, device, computer equipment and storage medium
CN112396170A (en) * 2019-08-14 2021-02-23 上海寒武纪信息科技有限公司 Operation method, operation device, computer equipment and storage medium
WO2021082747A1 (en) * 2019-11-01 2021-05-06 中科寒武纪科技股份有限公司 Operational apparatus and related product
CN113033789A (en) * 2019-12-24 2021-06-25 中科寒武纪科技股份有限公司 Bus system for order preservation, integrated circuit device, board card and order preservation method
CN113033789B (en) * 2019-12-24 2024-03-26 中科寒武纪科技股份有限公司 Bus system, integrated circuit device, board card and order preserving method for order preserving
CN113704687A (en) * 2020-05-21 2021-11-26 杭州海康威视数字技术股份有限公司 Tensor calculation operation method and device and operation system
CN113704687B (en) * 2020-05-21 2024-04-05 杭州海康威视数字技术股份有限公司 Tensor calculation operation method, device and operation system

Also Published As

Publication number Publication date
CN110096309B (en) 2020-04-14

Similar Documents

Publication Publication Date Title
CN110096309A (en) Operation method, device, computer equipment and storage medium
CN110096283A (en) Operation method, device, computer equipment and storage medium
CN110096310A (en) Operation method, device, computer equipment and storage medium
CN110119807A (en) Operation method, device, computer equipment and storage medium
CN110163362A (en) A kind of computing device and method
CN110059809A (en) A kind of computing device and Related product
CN111047005A (en) Operation method, operation device, computer equipment and storage medium
CN111061507A (en) Operation method, operation device, computer equipment and storage medium
US20230259737A1 (en) Integrated computing apparatus, chip, board card, device and computing method
CN111966399B (en) Instruction processing method and device and related products
WO2021082721A1 (en) Winograd convolution operation method, apparatus, and device, and storage medium
CN111047030A (en) Operation method, operation device, computer equipment and storage medium
CN111949318A (en) Instruction processing method and device and related product
CN110704040A (en) Information processing method and device, computer equipment and readable storage medium
CN111949317A (en) Instruction processing method and device and related product
CN112395008A (en) Operation method, operation device, computer equipment and storage medium
CN112395006B (en) Operation method, device, computer equipment and storage medium
CN112396170B (en) Operation method, device, computer equipment and storage medium
CN112396169B (en) Operation method, device, computer equipment and storage medium
CN111124497B (en) Operation method, operation device, computer equipment and storage medium
CN111026440B (en) Operation method, operation device, computer equipment and storage medium
CN111338694B (en) Operation method, device, computer equipment and storage medium
WO2022001496A1 (en) Computing apparatus, integrated circuit chip, board card, electronic device, and computing method
CN111290788B (en) Operation method, operation device, computer equipment and storage medium
CN112395001A (en) Operation method, operation device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant