CN109726800A - Operation method, device and Related product - Google Patents
Abstract
This disclosure relates to an operation method, a device, and a related product. The product includes a control module comprising an instruction cache unit, an instruction processing unit, and a storage queue unit. The instruction cache unit stores computation instructions associated with an artificial neural network operation; the instruction processing unit parses a computation instruction to obtain a plurality of operation instructions; and the storage queue unit stores an instruction queue containing the operation instructions or computation instructions to be executed in the order of the queue. By the above method, the disclosure can improve the operation efficiency of the related product when running a neural network model.
Description
Technical field
This disclosure relates to the field of artificial intelligence, and in particular to an operation method, a device, and a related product.
Background art
Fusion mode and offline mode are two methods of running a neural network that differ from the common layer-by-layer mode. In fusion mode, the data-copy work of a group of fused network layers is no longer handled by the CPU; instead, tasks such as data copying and data computation are completed directly on the MLU board. This way of running, in which the computation of multiple merged network layers is completed directly on the MLU without passing through the CPU, is the fusion mode. Building on the fusion mode, the model can further be detached from its network framework to become a network model independent of that framework, i.e. an offline model; running such an offline model is the offline mode.
Because the fused network-layer data no longer flows through the CPU in fusion mode and offline mode, a user cannot obtain the operation results of those layers. Consequently, when a user needs to obtain the result data of a particular network layer directly, e.g. for debugging or for other specific reasons, a rather complicated processing flow is required. The processing approaches commonly adopted are: splitting the network at the position where output is needed, so that the network layer whose result data must be output is detached from fusion mode; or duplicating the network layer whose result data must be output, so that the copy serves as an additional output.
In both approaches, the network layer detached from fusion mode has to run on the CPU, which markedly lowers its operation speed and causes a serious performance loss; and the duplication approach has a more complicated processing flow, which likewise slows processing and degrades performance.
Summary of the invention
In view of this, the present disclosure proposes an operation method in which an additional output parameter is defined in the configuration file of a Caffe model to obtain an adjusted Caffe configuration file. The additional output parameter indicates that an intermediate result of the Caffe model is to be added to the output results of the Caffe model, so that when the Caffe model is executed according to the adjusted configuration file, the output results of the model, including the intermediate result, can be obtained. Compared with traditional ways of obtaining intermediate results, this effectively improves operation speed and avoids loss of model performance.
According to an aspect of the disclosure, an operation method is provided, comprising:
defining an additional output parameter in the configuration file of a Caffe model to obtain an adjusted Caffe configuration file, wherein the additional output parameter indicates that an intermediate result of the Caffe model is to be added to the output results of the Caffe model, and the intermediate result includes the operation result of at least one non-output layer in the Caffe model; and
executing the Caffe model according to the adjusted Caffe configuration file to obtain the output results of the Caffe model including the intermediate result;
wherein the Caffe model is a fusion model.
In one possible implementation, the method further includes:
obtaining a Caffe offline model according to the adjusted Caffe configuration file, so that when the Caffe offline model is executed, it outputs the output results of the Caffe model including the intermediate result.
In one possible implementation, the operation method is applicable to a heterogeneous computing architecture, the heterogeneous computing architecture including a general-purpose processor and an artificial intelligence processor;
the non-output layer includes a non-output layer in a fusion sub-network running on the artificial intelligence processor; and
the fusion sub-network is the network obtained by performing operator fusion on all or some of the network layers of the Caffe model.
In one possible implementation, the value of the additional output parameter is either an output flag or a non-output flag, and the additional output parameter indicating that an intermediate result of the Caffe model is to be added to the output results of the Caffe model comprises:
the value of the additional output parameter being the output flag, indicating that the intermediate result of the Caffe model is added to the output results of the Caffe model.
In one possible implementation, in the output results of the Caffe model, the output results of the non-output layers in the intermediate result are arranged according to the names of the network layers.
According to another aspect of the disclosure, an operation device is further provided, comprising:
a parameter definition module for defining an additional output parameter in the configuration file of a Caffe model to obtain an adjusted Caffe configuration file, the additional output parameter indicating that an intermediate result of the Caffe model is to be added to the output results of the Caffe model, the intermediate result including the operation result of at least one non-output layer in the Caffe model; and
a model execution module for executing the Caffe fusion model according to the adjusted Caffe configuration file to obtain the output results of the Caffe model including the intermediate result.
In one possible implementation, the device further includes:
an offline model obtaining module for obtaining a Caffe offline model according to the adjusted Caffe configuration file, so that when the Caffe offline model is executed, it outputs the output results of the Caffe model including the intermediate result.
In one possible implementation, the operation device is applicable to a heterogeneous computing architecture, the heterogeneous computing architecture including a general-purpose processor and an artificial intelligence processor;
the non-output layer includes a non-output layer in a fusion sub-network running on the artificial intelligence processor; and
the fusion sub-network is the network obtained by performing operator fusion on all or some of the network layers of the Caffe model.
In one possible implementation, the value of the additional output parameter is either an output flag or a non-output flag, and the additional output parameter indicating that an intermediate result of the Caffe model is to be added to the output results of the Caffe model comprises:
the value of the additional output parameter being the output flag, indicating that the intermediate result of the Caffe model is added to the output results of the Caffe model.
In one possible implementation, in the output results of the Caffe model, the output results of the non-output layers in the intermediate result are arranged according to the names of the network layers.
According to an aspect of the disclosure, a computer apparatus is provided, including a memory and a processor, the memory storing a computer program that can run on the processor, wherein the processor, when executing the computer program, implements the steps of any of the methods above.
According to an aspect of the disclosure, a readable storage medium is further provided, on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of any of the methods above.
According to another aspect of the disclosure, a machine learning operation device is further provided, including one or more of any of the operation devices above, for obtaining input data and control information to be operated on from other processing devices, executing the specified machine learning operation, and passing the execution result to the other processing devices through an I/O interface.
When the machine learning operation device includes multiple operation devices, the operation devices can be connected through a specific structure and transmit data between each other;
wherein the multiple operation devices are interconnected through a PCIE bus and transmit data, so as to support larger-scale machine learning operations;
the multiple operation devices share the same control system or have their own control systems;
the multiple operation devices share a memory or have their own memories; and
the interconnection of the multiple operation devices can be any interconnection topology.
According to an aspect of the disclosure, a combined processing device is provided, including the machine learning operation device described above, a universal interconnection interface, and other processing devices.
The machine learning operation device interacts with the other processing devices to jointly complete the computing operation specified by the user.
In one possible implementation, the combined processing device further includes a storage device;
the storage device is connected to the machine learning operation device and the other processing devices respectively, for saving data of the machine learning operation device and the other processing devices.
According to an aspect of the disclosure, a neural network chip is provided, including the machine learning operation device described above or the combined processing device described above.
According to another aspect of the disclosure, an electronic device is further provided, including the neural network chip described above.
According to an aspect of the disclosure, a board card is provided, including a storage device, an interface device, a control device, and the neural network chip described above;
wherein the neural network chip is connected to the storage device, the control device, and the interface device respectively;
the storage device is used for storing data;
the interface device is used for implementing data transmission between the neural network chip and an external device; and
the control device is used for monitoring the state of the neural network chip.
In one possible implementation, the storage device includes multiple groups of storage units, each group of storage units being connected to the neural network chip through a bus, the storage units being DDR SDRAM;
the chip includes a DDR controller for controlling the data transmission to and data storage of each storage unit; and
the interface device is a standard PCIE interface.
In the above operation method, an additional output parameter is defined in the configuration file of a Caffe model to obtain an adjusted Caffe configuration file; the additional output parameter indicates that an intermediate result of the Caffe model is added to the output results of the Caffe model, so that when the Caffe model is executed according to the adjusted Caffe configuration file, the output results of the Caffe model including the intermediate result can be obtained. Compared with traditional ways of obtaining intermediate results, this effectively improves operation speed and avoids loss of model performance.
Other features and aspects of the disclosure will become clear from the following detailed description of exemplary embodiments with reference to the accompanying drawings.
Brief description of the drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate exemplary embodiments, features, and aspects of the disclosure together with the specification, and serve to explain the principles of the disclosure.
Fig. 1 shows a flowchart of the operation method according to an embodiment of the disclosure;
Fig. 2 shows a structural schematic diagram of a neural network of the related art;
Fig. 3 shows a schematic diagram of the operation logic of a neural network in layer-by-layer mode;
Fig. 4 shows a schematic diagram of the operation logic of a neural network in fusion mode;
Fig. 5 shows a schematic diagram of the operation logic of a neural network according to an embodiment of the disclosure;
Fig. 6 shows a block diagram of the operation device according to an embodiment of the disclosure;
Fig. 7 shows a block diagram of a combined processing device according to an embodiment of the disclosure;
Fig. 8 shows a block diagram of another combined processing device according to an embodiment of the disclosure;
Fig. 9 shows a block diagram of the board card according to an embodiment of the disclosure.
Detailed description
Various exemplary embodiments, features, and aspects of the disclosure are described in detail below with reference to the accompanying drawings. Identical reference numerals in the drawings denote elements with identical or similar functions. Although various aspects of the embodiments are shown in the drawings, the drawings are not necessarily drawn to scale unless specifically noted.
The word "exemplary" here means "serving as an example, embodiment, or illustration." Any embodiment described here as "exemplary" should not be construed as preferred over or advantageous compared with other embodiments.
In addition, numerous specific details are given in the following detailed description to better illustrate the disclosure. Those skilled in the art will understand that the disclosure can equally be implemented without certain of these details. In some instances, methods, means, elements, and circuits well known to those skilled in the art are not described in detail, in order to highlight the gist of the disclosure.
First, it should be noted that generating a Caffe model generally involves two files: one is the structure file (prototxt, abbreviated pt), i.e. the configuration file described below; the other is the weight file (caffemodel). The object adjusted in the above operation method can be the configuration file (pt) stored on disk, or the configuration file that has been loaded into memory, the adjustment in the latter case being made after the configuration file and the weight file are loaded into memory.
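As an illustrative sketch only (the helper name, the layer text format, and the exact keyword are assumptions for illustration, not taken from the disclosure), adjusting a prototxt-style configuration file on disk can amount to inserting the additional parameter into the targeted layer definition:

```python
def add_external_output(prototxt_text: str, layer_name: str) -> str:
    """Insert an `external_output: true` line into the named layer's
    definition in a prototxt-style configuration (hypothetical sketch)."""
    out = []
    in_target = False
    for line in prototxt_text.splitlines():
        out.append(line)
        if line.strip() == f'name: "{layer_name}"':
            in_target = True
        elif in_target and line.strip().startswith("type:"):
            # Place the additional output parameter right after the type field.
            indent = line[: len(line) - len(line.lstrip())]
            out.append(f"{indent}external_output: true")
            in_target = False
    return "\n".join(out)


config = """layer {
  name: "conv1"
  type: "Convolution"
}"""
adjusted = add_external_output(config, "conv1")
```

A real implementation would more likely edit the parsed protobuf message rather than raw text; the sketch only shows that the adjustment is a small, local change to the configuration file.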
Fig. 1 shows a flowchart of the operation method according to an embodiment of the disclosure. Referring to Fig. 1, the operation method of the embodiment of the disclosure comprises:
Step S100: defining an additional output parameter in the configuration file of a Caffe model to obtain an adjusted Caffe configuration file. It should be noted that the additional output parameter indicates that an intermediate result of the Caffe model is to be added to the output results of the Caffe model. The intermediate result is the operation result of at least one non-output layer in the Caffe model.
Step S200: executing the Caffe model according to the adjusted Caffe configuration file to obtain the output results of the Caffe model including the intermediate result. The Caffe model can be a fusion model.
It should be pointed out that the operation method of the disclosure can be carried out based on the fusion mode of the convolutional neural network framework Caffe (Convolutional Architecture for Fast Feature Embedding), and can also be based on other neural network structures; the embodiments disclosed below take a Caffe model as the example. Those skilled in the art will understand that the operation method of the disclosure can also be applied to other neural networks with the same or similar principles, which are not repeated here one by one.
In addition, to illustrate the operation method of the disclosure more clearly, neural networks and the ways of running them are explained in more detail below, so that the technical solution of the operation method of the disclosure can be understood more clearly.
Referring to Fig. 2, Fig. 2 shows a structural schematic diagram of a neural network of the related art. The neural network structure includes one input (input) and two outputs (output1 and output2). Each layer between the input and the outputs is a network layer of the network. Currently, the ways of running a neural network generally include layer-by-layer mode, fusion mode, and offline mode.
Referring to Fig. 3, Fig. 3 shows the operation logic of a neural network in layer-by-layer mode. In layer-by-layer mode, the operation of each network layer (layer) in the neural network is completed by the MLU (an artificial intelligence chip), while the data transfer between network layers is completed by the CPU. Since the interaction between layers is realized on the CPU, the output of every layer of the network can be obtained directly in this mode.
To speed up network operation and improve network performance, the fusion mode and offline mode of running a neural network were proposed. Referring to Fig. 4, Fig. 4 shows the operation logic of a neural network in fusion mode. In fusion mode, the operation of each network layer and the interaction between layers are completed on the MLU; the CPU is only involved in the input and output processes. That is, a fusion model is one in which the data-copy work of the fused network layers is no longer handled by the CPU; instead, tasks such as data copying and data computation are completed directly on the MLU board. The way of running a fusion model is the fusion mode. In fusion mode, the operation of the whole network is transparent to the CPU, so at this point the user cannot directly obtain the operation results of the intermediate network layers.
In offline mode, on the basis of the fusion mode, the model is detached from the framework (that is, the Caffe model is detached from the Caffe framework) and becomes a network model independent of the framework (an offline model). Similarly, in offline mode the operation of the network is transparent to the CPU, and the user cannot directly obtain the operation results of the intermediate network layers.
Using an embodiment of the operation method of the disclosure described above, an additional output parameter is defined in the configuration file of the network model (Caffe model); the defined additional output parameter indicates that an intermediate result of the network model is added to the output results of the network model, so that when the Caffe model is executed according to the adjusted Caffe configuration file, the intermediate result can be obtained directly in the output results of the Caffe model.
With the operation method disclosed above, only the configuration file of the Caffe model needs to be adjusted: defining the additional output parameter in the configuration file achieves the aim of directly obtaining the operation result (intermediate result) of an intermediate network layer on the MLU. The operation is simple and easy to implement. Moreover, compared with the traditional way of splitting the network at the position where output is needed, the operation method of the disclosure has a simple processing flow and does not affect the performance of the network.
As a possible implementation, Fig. 5 shows a schematic diagram of the operation method according to an embodiment of the disclosure. Referring to Fig. 5, the additional output parameter defined in the configuration file of the Caffe model can be external_output. That is, by adding an external_output parameter to at least one non-output layer of the Caffe model (a network layer other than the output layers), the operation result of the corresponding network layer can be output during the running of the Caffe model according to external_output. This is simple to operate and easy to implement.
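For illustration only, an adjusted configuration file flagging a non-output layer could look like the following fragment (the layer names and field placement are assumed for the example; only the external_output keyword comes from the disclosure):

```protobuf
# Hypothetical fragment of an adjusted Caffe prototxt: the "conv1"
# layer is a non-output layer whose operation result is requested as
# an additional output of the fused model.
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  external_output: true   # the additional output parameter
}
```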
As a possible implementation, the method further includes: obtaining a Caffe offline model according to the adjusted Caffe configuration file, so that when the Caffe offline model is executed, it outputs the output results of the Caffe model including the intermediate result. That is, by obtaining a Caffe offline model according to the adjusted Caffe configuration file, the Caffe fusion model is further converted into an offline model, so that in offline mode the intermediate result can likewise be obtained by any of the preceding operation methods.
It should be pointed out that the operation method of the disclosure is applicable to a heterogeneous computing architecture. The heterogeneous computing architecture includes a general-purpose processor (CPU) and an artificial intelligence processor. The artificial intelligence processor can be an intelligence processing unit (IPU) for executing artificial intelligence operations, which may include machine learning operations, brain-inspired operations, and the like; machine learning operations include neural network operations, k-means operations, support vector machine operations, and the like. The artificial intelligence processor can include, for example, one or a combination of a GPU (Graphics Processing Unit), an NPU (Neural-Network Processing Unit), a DSP (Digital Signal Processing unit), and an FPGA (Field-Programmable Gate Array) chip. The non-output layer then includes a non-output layer in a fusion sub-network running on the artificial intelligence processor. Meanwhile, in accordance with the fusion model described above, those skilled in the art will understand that a fusion sub-network is the network obtained by performing operator fusion on all or some of the network layers of the Caffe model.
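As a sketch of what "operator fusion on all or some of the network layers" can mean in practice, the following partitions a layer list into fused sub-networks (runs of layers the AI processor supports) and layers left on the CPU. The supported-operator set and layer names are assumptions for illustration, not from the disclosure:

```python
# Illustrative sketch: the operator types supported by the artificial
# intelligence processor are an assumption, not part of the disclosure.
MLU_SUPPORTED = {"Convolution", "Pooling", "ReLU", "InnerProduct"}

def split_fusion_subnetwork(layers):
    """Greedily group consecutive layers whose operators can be fused
    on the AI processor; unsupported layers stay on the CPU."""
    fused, cpu, current = [], [], []
    for name, op_type in layers:
        if op_type in MLU_SUPPORTED:
            current.append(name)
        else:
            if current:
                fused.append(current)   # close the current fused run
                current = []
            cpu.append(name)
    if current:
        fused.append(current)
    return fused, cpu

net = [("conv1", "Convolution"), ("relu1", "ReLU"),
       ("custom1", "Python"), ("fc1", "InnerProduct")]
subnets, cpu_layers = split_fusion_subnetwork(net)
```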
In addition, as a possible implementation of the operation method of the disclosure, the value of the additional output parameter may include an output flag (e.g. true) or a non-output flag (e.g. false). Accordingly, when defining the additional output parameter, the value defined for it can indicate that the intermediate result of the Caffe model is added to the output results of the Caffe model.
When indicating, through the value defined for the additional output parameter, that the intermediate result is added to the output results of the Caffe model, the value of the additional output parameter of a layer to be output (that is, an intermediate network layer whose operation result needs to be output) can be set to true (that is, external_output: true). When the additional output parameter of a network layer is true, the output of that network layer is output as part of the output results of the fused Caffe model.
That is, the operation result of a network layer for which external_output is set to true can be regarded as a normal output result of the fused network; in effect it is handled in the same way as a terminal result of the Caffe model. Meanwhile, the operation result still participates in the subsequent operation of the fused network trunk, so the network operation is not affected and there is no noticeable performance degradation.
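The flag semantics above can be sketched with a toy forward pass: flagged layers have their results appended to the model's outputs without altering the data flow. The layer names and functions are invented for illustration, and this simulates the behavior rather than the actual MLU runtime:

```python
def run_fused_model(layers, x):
    """Toy forward pass: every layer transforms the value; a layer whose
    external_output flag is true also has its result exposed as a model
    output, while still feeding the next layer (sketch, not the real
    fused-mode runtime)."""
    outputs = {}
    for name, fn, external_output in layers[:-1]:
        x = fn(x)              # intermediate result keeps feeding the network
        if external_output:
            outputs[name] = x  # additionally exposed as a model output
    last_name, last_fn, _ = layers[-1]
    outputs[last_name] = last_fn(x)   # the ordinary terminal output
    return outputs

layers = [
    ("conv1", lambda v: v * 2, True),    # flagged: result is exposed
    ("relu1", lambda v: max(v, 0), False),
    ("fc1",   lambda v: v + 1, False),   # terminal output layer
]
result = run_fused_model(layers, 3)      # {"conv1": 6, "fc1": 7}
```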
As a possible implementation, in the output results of the Caffe model, the output results of the non-output layers in the intermediate result are arranged according to the names of the network layers. That is, when the intermediate result is added to the output results of the Caffe model, the intermediate result may include the operation results of multiple network layers. In that case, the operation results of the multiple network layers (that is, multiple intermediate results) can be arranged in the output results of the Caffe model in the order of the network layers' names.
By arranging the multiple intermediate results in the output result list of the Caffe model in the order of the network layers' names, the network layer corresponding to each output result is made clear.
In the above operation method, an additional output parameter is defined in the configuration file of a Caffe model to obtain an adjusted Caffe configuration file; the additional output parameter indicates that an intermediate result of the Caffe model is added to the output results of the Caffe model, so that when the Caffe model is executed according to the adjusted Caffe configuration file, the output results of the Caffe model including the intermediate result can be obtained. Compared with traditional ways of obtaining intermediate results, this effectively improves operation speed and avoids loss of model performance.
According to an aspect of the disclosure, an operation device 100 is further provided. Fig. 6 shows a block diagram of an embodiment of the operation device 100 of the disclosure. Referring to Fig. 6, the operation device 100 includes:
a parameter definition module 110 for defining an additional output parameter in the configuration file of a Caffe model to obtain an adjusted Caffe configuration file, the additional output parameter indicating that an intermediate result of the Caffe model is to be added to the output results of the Caffe model, the intermediate result including the operation result of at least one non-output layer in the Caffe model; and
a model execution module 120 for executing the Caffe fusion model according to the adjusted Caffe configuration file to obtain the output results of the Caffe model including the intermediate result.
As a possible implementation, the device further includes:
an offline model obtaining module for obtaining a Caffe offline model according to the adjusted Caffe configuration file, so that when the Caffe offline model is executed, it outputs the output results of the Caffe model including the intermediate result.
As a possible implementation, the operation device 100 is applicable to a heterogeneous computing architecture, the heterogeneous computing architecture including a general-purpose processor and an artificial intelligence processor;
the non-output layer includes a non-output layer in a fusion sub-network running on the artificial intelligence processor; and
the fusion sub-network is the network obtained by performing operator fusion on all or some of the network layers of the Caffe model.
As a possible implementation, the value of the additional output parameter is either an output flag or a non-output flag, and the additional output parameter indicating that an intermediate result of the Caffe model is to be added to the output results of the Caffe model comprises:
the value of the additional output parameter being the output flag, indicating that the intermediate result of the Caffe model is added to the output results of the Caffe model.
As a possible implementation, in the output results of the Caffe model, the output results of the non-output layers in the intermediate result are arranged according to the names of the network layers.
According to another aspect of the disclosure, a computer apparatus is provided, including a memory and a processor, the memory storing a computer program that can run on the processor, wherein the processor, when executing the computer program, implements the steps of any of the operation methods above.
According to another aspect of the disclosure, a readable storage medium is further provided, on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of any of the operation methods above.
According to an aspect of the disclosure, a machine learning operation device is provided. The machine learning operation device includes one or more of any of the operation devices above, and is used for obtaining input data and control information to be operated on from other processing devices, executing the specified machine learning operation, and passing the execution result to the other processing devices through an I/O interface. The other processing devices are, for example, a camera, a display, a mouse, a keyboard, a network card, a Wi-Fi interface, or a server. When more than one operation device is included, the operation devices can be linked through a specific structure and transmit data, for example interconnected through a PCIE bus and transmitting data, so as to support larger-scale machine learning operations. In that case, the devices can share the same control system or each have an independent control system; they can share a memory, or each accelerator can have its own memory. In addition, their interconnection can be any interconnection topology.
The machine learning operation device has high compatibility and can be connected to various types of servers through a PCIE interface.
Fig. 7 shows a block diagram of a combined processing device 200a according to an embodiment of the disclosure. Referring to Fig. 7, the disclosure also provides a combined processing device 200a, which includes the machine learning operation device above (neural network operation device 210), a universal interconnection interface 220, and other processing devices 230. The machine learning operation device 210 interacts with the other processing devices 230 to jointly complete the operation specified by the user.
The other processing devices 230 include one or more processor types among general-purpose/special-purpose processors such as a central processing unit (CPU), a graphics processing unit (GPU), and a neural network processor. The number of processors included in the other processing devices 230 is not limited. The other processing devices 230 serve as the interface between the machine learning operation device and external data and control, performing data transport and completing basic control of the machine learning operation device such as starting and stopping; the other processing devices can also cooperate with the machine learning operation device to complete operation tasks together.
The general interconnect interface 220 is used to transmit data and control instructions between the machine learning arithmetic unit 210 and the other processing units 230. The machine learning arithmetic unit 210 obtains the required input data from the other processing units 230 and writes it to the on-chip storage device of the machine learning arithmetic unit; it may obtain control instructions from the other processing units 230 and write them to the on-chip control cache of the machine learning arithmetic unit; it may also read the data in the memory module of the machine learning arithmetic unit and transmit it to the other processing units.
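The three transfer paths just described (input data into on-chip storage, control instructions into the on-chip control cache, results read back out of the memory module) can be sketched as a minimal software model. All class and method names below are invented for illustration; a real device driver would issue bus transactions rather than manipulate Python objects.

```python
# Minimal software model of the general interconnect interface's three
# transfer paths. Names (MachineLearningUnit, write_input_data, ...) are
# hypothetical and do not correspond to any real driver API.
class MachineLearningUnit:
    def __init__(self):
        self.storage = {}        # on-chip storage device (input data)
        self.control_cache = []  # on-chip control cache (instructions)
        self.memory = {}         # memory module holding results

class GeneralInterconnectInterface:
    def __init__(self, unit):
        self.unit = unit

    def write_input_data(self, name, data):
        # Path 1: required input data -> on-chip storage device
        self.unit.storage[name] = data

    def write_control_instruction(self, instr):
        # Path 2: control instruction -> on-chip control cache
        self.unit.control_cache.append(instr)

    def read_result(self, name):
        # Path 3: data in the memory module -> other processing units
        return self.unit.memory.get(name)

unit = MachineLearningUnit()
iface = GeneralInterconnectInterface(unit)
iface.write_input_data("x", [1, 2, 3])
iface.write_control_instruction("CONV_FORWARD")
unit.memory["y"] = [v * 2 for v in unit.storage["x"]]  # stand-in compute
print(iface.read_result("y"))  # [2, 4, 6]
```

The point of the sketch is only the direction of each path: the host pushes data and instructions in, and pulls results out, all through the single interconnect interface.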
Fig. 8 shows a block diagram of the combined processing device 200b according to another embodiment of the disclosure. Referring to Fig. 8, the combined processing device 200b of the disclosure may also include a storage device 240, which is connected to the machine learning arithmetic unit 210 and the other processing units 230 respectively. The storage device 240 is used to store the data of the machine learning arithmetic unit 210 and the other processing units 230, and is particularly suitable for data whose required operations cannot be fully held in the internal storage of the machine learning arithmetic unit or the other processing units.
The combined processing device 200b can serve as the SoC (system on chip) of devices such as mobile phones, robots, drones, and video monitoring equipment, effectively reducing the die area of the control portion, increasing processing speed, and reducing overall power consumption. In this case, the general interconnect interface of the combined processing device is connected to certain components of the equipment, for example a camera, display, mouse, keyboard, network card, or Wi-Fi interface.
In some embodiments, a chip is also disclosed, which includes the above machine learning arithmetic unit or combined processing device.
In some embodiments, a chip packaging structure is disclosed, which includes the above chip.
In some embodiments, a board is disclosed, which includes the above chip packaging structure. Referring to Fig. 9, Fig. 9 provides a board; in addition to the above chip 389, the board may also include other supporting components, including but not limited to: a memory device 390, an interface device 391, and a control device 392.
The memory device 390 is connected to the chip in the chip packaging structure through a bus and is used for storing data. The memory device may include multiple groups of storage units 393, each group of which is connected to the chip through a bus. It can be understood that each group of storage units may be DDR SDRAM (Double Data Rate Synchronous Dynamic Random Access Memory).
DDR can double the speed of SDRAM without raising the clock frequency: it allows data to be read on both the rising edge and the falling edge of the clock pulse, so the speed of DDR is twice that of standard SDRAM. In one embodiment, the storage device may include 4 groups of storage units, and each group may include multiple DDR4 particles (chips). In one embodiment, the chip may internally include four 72-bit DDR4 controllers; of the 72 bits, 64 bits are used for data transmission and 8 bits are used for ECC checking. It can be understood that when DDR4-3200 particles are used in each group of storage units, the theoretical data transmission bandwidth can reach 25600 MB/s.
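The 25600 MB/s figure follows directly from the DDR4-3200 transfer rate and the 64-bit payload width (the 8 ECC bits carry no data). A quick arithmetic check:

```python
# Theoretical bandwidth of one 64-bit DDR4-3200 channel:
# 3200 MT/s (mega-transfers per second) x 64 data bits / 8 bits-per-byte.
transfers_per_sec_mt = 3200   # DDR4-3200 line rate
data_bits = 64                # 64 of the controller's 72 bits carry data
bandwidth_mb_s = transfers_per_sec_mt * data_bits // 8
print(bandwidth_mb_s)  # 25600
```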
In one embodiment, each group of storage units includes multiple double-data-rate synchronous dynamic random access memories arranged in parallel. DDR can transmit data twice within one clock cycle. A controller for controlling the DDR is provided in the chip and is used to control the data transmission and data storage of each storage unit.
The interface device is electrically connected to the chip in the chip packaging structure. The interface device is used to implement data transmission between the chip and an external device (such as a server or a computer). For example, in one embodiment, the interface device may be a standard PCIe interface: data to be processed is transmitted by the server to the chip through the standard PCIe interface, implementing the data transfer. Preferably, when a PCIe 3.0 x16 interface is used for transmission, the theoretical bandwidth can reach 16000 MB/s. In another embodiment, the interface device may also be another interface; this application does not limit the specific form of such other interfaces, as long as the interface unit can implement the transfer function. In addition, the calculation results of the chip are still sent back by the interface device to the external device (such as a server).
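The 16000 MB/s figure for PCIe 3.0 x16 is the usual nominal value of roughly 1 GB/s per lane; accounting for the 128b/130b line encoding gives a slightly lower exact number:

```python
# PCIe 3.0 runs at 8 GT/s per lane with 128b/130b encoding, i.e. roughly
# 1 GB/s of usable bandwidth per lane; x16 gives the nominal 16000 MB/s
# quoted above. Exact per-lane figure: 8 GT/s * 128/130 / 8 bits-per-byte.
lanes = 16
gt_per_s = 8                                   # PCIe 3.0 line rate per lane
per_lane_mb_s = gt_per_s * 1000 * 128 / 130 / 8
total_mb_s = per_lane_mb_s * lanes
print(round(total_mb_s))  # 15754  (~16000 MB/s nominal)
```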
The control device is electrically connected to the chip and is used to monitor the state of the chip. Specifically, the chip may be electrically connected to the control device through an SPI interface. The control device may include a microcontroller (Micro Controller Unit, MCU). The chip may include multiple processing chips, multiple processing cores, or multiple processing circuits, and can drive multiple loads; therefore, the chip may be in different working states such as multi-load and light-load. Through the control device, the working states of the multiple processing chips, multiple processing cores, and/or multiple processing circuits in the chip can be regulated.
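The load-dependent state regulation described above can be modeled abstractly. The state names and the core-count threshold below are invented for illustration; a real control device would read these values from the chip over the SPI link rather than from function arguments.

```python
# Toy model of the control device classifying the chip's working state
# from the number of active processing cores. The threshold (50%) and
# the state names are hypothetical, for illustration only.
def classify_state(active_cores, total_cores, heavy_fraction=0.5):
    """Return the working state the control device would report."""
    if active_cores == 0:
        return "idle"
    if active_cores / total_cores >= heavy_fraction:
        return "multi-load"
    return "light-load"

print(classify_state(0, 8))   # idle
print(classify_state(2, 8))   # light-load
print(classify_state(6, 8))   # multi-load
```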
In some embodiments, an electronic device is also claimed, which includes the above board.
The electronic device includes a data processing device, robot, computer, printer, scanner, tablet computer, intelligent terminal, mobile phone, driving recorder, navigator, sensor, camera, server, cloud server, webcam, video camera, projector, watch, earphone, mobile storage, wearable device, vehicle, household appliance, and/or medical device.
The vehicle includes an aircraft, ship, and/or car; the household appliance includes a television, air conditioner, microwave oven, refrigerator, rice cooker, humidifier, washing machine, electric lamp, gas stove, and range hood; the medical device includes a nuclear magnetic resonance instrument, B-ultrasound instrument, and/or electrocardiograph.
The embodiments of the present disclosure have been described above. The above description is exemplary, not exhaustive, and is not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the illustrated embodiments. The terms used herein are chosen to best explain the principles of the embodiments, the practical application, or the technical improvement over technologies in the market, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.
Claims (19)
1. An operation method, characterized by comprising:
defining an additional output parameter in the configuration file of a Caffe model to obtain an adjusted Caffe configuration file, the additional output parameter being used to indicate that an intermediate result of the Caffe model is added to the output result of the Caffe model, and the intermediate result including the operation result of at least one non-output layer in the Caffe model;
executing the Caffe model according to the adjusted Caffe configuration file to obtain the output result of the Caffe model including the intermediate result;
wherein the Caffe model is a fusion model.
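As a rough functional analogue of claim 1 (plain Python rather than an actual Caffe prototxt; the layer names and the `additional_output` flag are hypothetical stand-ins for the claimed extra output parameter), the effect of flagging non-output layers can be sketched as follows. The sketch also reflects claim 5's arrangement of intermediate results by layer name:

```python
# A toy stand-in for a fused Caffe network: each "layer" is a function
# applied in sequence. The names and the `additional_output` flag are
# hypothetical illustrations of the claimed configuration-file parameter.
def run_model(x, layers):
    """Run the layer chain; return (final_output, intermediates).

    `layers` is a list of dicts: {"name": str, "fn": callable,
    "additional_output": bool}. Layers flagged additional_output=True
    have their operation result added to the model's output, mirroring
    the claimed extra output parameter for non-output layers.
    """
    intermediates = {}
    for layer in layers:
        x = layer["fn"](x)
        if layer.get("additional_output"):   # output flag set in the config
            intermediates[layer["name"]] = x
    # Per claim 5, intermediate results are arranged by layer name order.
    ordered = dict(sorted(intermediates.items()))
    return x, ordered

layers = [
    {"name": "conv1", "fn": lambda v: v * 2, "additional_output": True},
    {"name": "relu1", "fn": lambda v: max(v, 0), "additional_output": False},
    {"name": "fc1",   "fn": lambda v: v + 1, "additional_output": True},
]
final, extra = run_model(3, layers)
print(final)   # 7
print(extra)   # {'conv1': 6, 'fc1': 7}
```

In the real system the flag would live next to each layer definition in the prototxt and the fused runtime would emit the flagged blobs alongside the network's normal outputs; the sketch only captures that selection-and-ordering behavior.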
2. The method according to claim 1, characterized by further comprising:
obtaining a Caffe offline model according to the adjusted Caffe configuration file, so that when the Caffe offline model is executed, it outputs the output result of the Caffe model including the intermediate result.
3. The method according to claim 1, characterized in that the operation method is applicable to a heterogeneous computing architecture, the heterogeneous computing architecture including a general-purpose processor and an artificial intelligence processor;
the non-output layer includes: a non-output layer in a fusion sub-network running on the artificial intelligence processor; the fusion sub-network being: a network obtained by performing operator fusion on all or part of the network layers in the Caffe model.
4. The method according to claim 1, characterized in that the value of the additional output parameter includes an output identifier or a non-output identifier;
the additional output parameter being used to indicate that the intermediate result of the Caffe model is added to the output result of the Caffe model comprises:
the value of the additional output parameter being the output identifier indicates that the intermediate result of the Caffe model is added to the output result of the Caffe model.
5. The method according to any one of claims 1 to 4, characterized in that, in the output result of the Caffe model, the output results of the non-output layers in the intermediate result are arranged according to the name order of the network layers.
6. An arithmetic device, characterized by comprising:
a parameter definition module, configured to define an additional output parameter in the configuration file of a Caffe model to obtain an adjusted Caffe configuration file, the additional output parameter being used to indicate that an intermediate result of the Caffe model is added to the output result of the Caffe model, and the intermediate result including the operation result of at least one non-output layer in the Caffe model;
a model execution module, configured to execute the Caffe fusion model according to the adjusted Caffe configuration file to obtain the output result of the Caffe model including the intermediate result.
7. The device according to claim 6, characterized by further comprising:
an offline model obtaining module, configured to obtain a Caffe offline model according to the adjusted Caffe configuration file, so that when the Caffe offline model is executed, it outputs the output result of the Caffe model including the intermediate result.
8. The device according to claim 6, characterized in that the arithmetic device is applicable to a heterogeneous computing architecture, the heterogeneous computing architecture including a general-purpose processor and an artificial intelligence processor;
the non-output layer includes: a non-output layer in a fusion sub-network running on the artificial intelligence processor; the fusion sub-network being: a network obtained by performing operator fusion on all or part of the network layers in the Caffe model.
9. The device according to claim 6, characterized in that the value of the additional output parameter includes an output identifier or a non-output identifier;
the additional output parameter being used to indicate that the intermediate result of the Caffe model is added to the output result of the Caffe model comprises:
the value of the additional output parameter being the output identifier indicates that the intermediate result of the Caffe model is added to the output result of the Caffe model.
10. The device according to any one of claims 6 to 9, characterized in that, in the output result of the Caffe model, the output results of the non-output layers in the intermediate result are arranged according to the name order of the network layers.
11. A computer device, comprising a memory, a processor, and a computer program stored on the memory and runnable on the processor, characterized in that the processor implements the steps of the method according to any one of claims 1 to 5 when executing the computer program.
12. A readable storage medium on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 5.
13. A machine learning arithmetic unit, characterized in that the machine learning arithmetic unit includes one or more arithmetic devices according to any one of claims 6 to 10, configured to obtain input data and control information to be operated on from other processing devices, execute the specified machine learning operations, and pass the execution results to the other processing devices through an I/O interface;
when the machine learning arithmetic unit includes multiple arithmetic devices, the multiple arithmetic devices can be connected through a specific structure and transmit data;
wherein the multiple arithmetic devices are interconnected through a PCIe bus and transmit data, to support larger-scale machine learning operations;
the multiple arithmetic devices share the same control system or have their own control systems;
the multiple arithmetic devices share memory or have their own memories;
and the interconnection manner of the multiple arithmetic devices is any interconnection topology.
14. A combined processing device, characterized in that the combined processing device includes the machine learning arithmetic unit according to claim 13, a general interconnect interface, and other processing units;
the machine learning arithmetic unit interacts with the other processing units to jointly complete the computing operation specified by the user.
15. The combined processing device according to claim 14, characterized by further comprising: a storage device;
the storage device is connected to the machine learning arithmetic unit and the other processing units respectively, and is used to save the data of the machine learning arithmetic unit and the other processing units.
16. A neural network chip, characterized in that the chip includes the machine learning arithmetic unit according to claim 13, or the combined processing device according to claim 14 or 15.
17. An electronic device, characterized in that the electronic device includes the neural network chip according to claim 16.
18. A board, characterized in that the board includes: a memory device, an interface device, a control device, and the neural network chip according to claim 17;
wherein the neural network chip is connected to the memory device, the control device, and the interface device respectively;
the memory device is used for storing data;
the interface device is used to implement data transmission between the neural network chip and an external device;
the control device is used to monitor the state of the neural network chip.
19. The board according to claim 18, characterized in that:
the memory device includes multiple groups of storage units, each group of storage units is connected to the neural network chip through a bus, and the storage units are DDR SDRAM;
the chip includes a DDR controller for controlling the data transmission and data storage of each storage unit;
the interface device is a standard PCIe interface.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811639690.XA CN109726800B (en) | 2018-12-29 | 2018-12-29 | Operation method, device and related product |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109726800A true CN109726800A (en) | 2019-05-07 |
CN109726800B CN109726800B (en) | 2019-12-24 |
Family
ID=66297971
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811639690.XA Active CN109726800B (en) | 2018-12-29 | 2018-12-29 | Operation method, device and related product |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109726800B (en) |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20050125792A1 (en) * | 2003-12-08 | 2005-06-09 | Che-An Chang | Software materialization platform and an artificial neuron computer system |
US20080244251A1 (en) * | 2007-03-29 | 2008-10-02 | Khipu Systems Limited | Predictive model implementation system and methodology |
CN105760932A (en) * | 2016-02-17 | 2016-07-13 | 北京物思创想科技有限公司 | Data exchange method, data exchange device and calculating device |
CN106845631A (en) * | 2016-12-26 | 2017-06-13 | 上海寒武纪信息科技有限公司 | One kind stream performs method and device |
CN107808098A (en) * | 2017-09-07 | 2018-03-16 | 阿里巴巴集团控股有限公司 | A kind of model safety detection method, device and electronic equipment |
CN109086244A (en) * | 2018-07-11 | 2018-12-25 | 中国人民解放军国防科技大学 | Matrix convolution vectorization implementation method based on vector processor |
CN109086877A (en) * | 2016-04-29 | 2018-12-25 | 北京中科寒武纪科技有限公司 | A kind of device and method for executing convolutional neural networks forward operation |
CN109102074A (en) * | 2017-06-21 | 2018-12-28 | 上海寒武纪信息科技有限公司 | A kind of training device |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112052040A (en) * | 2019-06-06 | 2020-12-08 | 中科寒武纪科技股份有限公司 | Processing method, processing device, computer equipment and storage medium |
CN110490309A (en) * | 2019-08-14 | 2019-11-22 | 北京中科寒武纪科技有限公司 | A kind of Operator Fusion method and its Related product for neural network |
CN110490309B (en) * | 2019-08-14 | 2022-06-07 | 中科寒武纪科技股份有限公司 | Operator fusion method for neural network and related product thereof |
CN110990060A (en) * | 2019-12-06 | 2020-04-10 | 北京瀚诺半导体科技有限公司 | Embedded processor, instruction set and data processing method of storage and computation integrated chip |
Also Published As
Publication number | Publication date |
---|---|
CN109726800B (en) | 2019-12-24 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
TWI803663B (en) | A computing device and computing method | |
CN109657782A (en) | Operation method, device and Related product | |
CN109685201A (en) | Operation method, device and Related product | |
CN109543832A (en) | A kind of computing device and board | |
CN109522052A (en) | A kind of computing device and board | |
CN109726800A (en) | Operation method, device and Related product | |
CN109284815A (en) | Neural network model algorithm Compilation Method, device and Related product | |
CN109543825A (en) | Neural network model algorithm Compilation Method, device and Related product | |
CN109740751A (en) | The framework fusion method and relevant apparatus of neural network model | |
CN109739703A (en) | Adjust wrong method and Related product | |
CN109753319A (en) | A kind of device and Related product of release dynamics chained library | |
CN109670581A (en) | A kind of computing device and board | |
CN110119807A (en) | Operation method, device, computer equipment and storage medium | |
CN110147249A (en) | A kind of calculation method and device of network model | |
CN109711540A (en) | A kind of computing device and board | |
CN110163349A (en) | A kind of calculation method and device of network model | |
CN110059809A (en) | A kind of computing device and Related product | |
CN109740729A (en) | Operation method, device and Related product | |
CN109740746A (en) | Operation method, device and Related product | |
CN109711367A (en) | Operation method, device and Related product | |
CN109740730A (en) | Operation method, device and Related product | |
CN109739514A (en) | Parameter processing method and Related product | |
CN110472734A (en) | A kind of computing device and Related product | |
US20230169031A1 (en) | Method and device for constructing communication topology structure on basis of multiple processing nodes | |
CN110020720B (en) | Operator splicing method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
CP01 | Change in the name or title of a patent holder | ||
Address after: Room 644, Comprehensive Research Building, No. 6 South Road, Haidian District Academy of Sciences, Beijing 100190
Patentee after: Zhongke Cambrian Technology Co., Ltd
Address before: Room 644, Comprehensive Research Building, No. 6 South Road, Haidian District Academy of Sciences, Beijing 100190
Patentee before: Beijing Zhongke Cambrian Technology Co., Ltd.