CN109726800A - Operation method, device and Related product - Google Patents

Operation method, device and related products

Info

Publication number
CN109726800A
CN109726800A (application CN201811639690.XA; granted as CN109726800B)
Authority
CN
China
Prior art keywords
caffe
output
model
result
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201811639690.XA
Other languages
Chinese (zh)
Other versions
CN109726800B (en)
Inventor
Inventor not disclosed (不公告发明人)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Cambricon Technologies Corp Ltd
Original Assignee
Beijing Zhongke Cambrian Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Zhongke Cambrian Technology Co Ltd filed Critical Beijing Zhongke Cambrian Technology Co Ltd
Priority to CN201811639690.XA priority Critical patent/CN109726800B/en
Publication of CN109726800A publication Critical patent/CN109726800A/en
Application granted granted Critical
Publication of CN109726800B publication Critical patent/CN109726800B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Programmable Controllers (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

This disclosure relates to an operation method, a device, and related products. The product includes a control module comprising an instruction cache unit, an instruction processing unit, and a storage queue unit. The instruction cache unit stores the computation instructions associated with an artificial neural network operation; the instruction processing unit parses a computation instruction to obtain multiple operation instructions; and the storage queue unit stores an instruction queue containing the multiple operation instructions or computation instructions to be executed in the order of the queue. With this method, the disclosure can improve the operation efficiency of the related products when performing neural network model operations.

Description

Operation method, device and related products
Technical field
This disclosure relates to the field of artificial intelligence, and in particular to an operation method, a device, and related products.
Background technique
Fusion mode and offline mode are two execution modes specific to neural networks, distinct from the common layer-by-layer mode. In fusion mode, the data-copy work of the fused network layers in the neural network is no longer handled by the CPU; instead, tasks such as data copying and data computation are completed directly on the MLU board. This mode, in which the computation processes of multiple network layers are all merged and completed directly on the MLU without going through the CPU, is called fusion mode. On the basis of fusion mode, the model can be detached from the network framework to become a network model independent of the framework, called an offline model; the mode of running an offline model is called offline mode.
Because, under fusion mode and offline mode, the data of the fused network layers no longer flows through the CPU, the user cannot obtain the operation results of some of those layers. Therefore, when the user needs to directly obtain the result data of a certain network layer, for debugging or other specific reasons, a rather complicated processing flow is required. The commonly used approaches are: split the network at the position where output is needed, so that the network layer whose result data must be output is detached from fusion mode; or duplicate the network layer whose result data must be output and use the copy as an additional output.
Both approaches hurt performance. A network layer detached from fusion mode must run on the CPU, which noticeably lowers its operation speed and causes a serious performance loss; duplicating a network layer makes the processing flow more complicated, which likewise affects processing speed and causes a performance loss.
Summary of the invention
In view of this, the present disclosure proposes an operation method. By defining an additional output parameter in the configuration file of a Caffe model, an adjusted Caffe configuration file is obtained; the additional output parameter indicates that the intermediate results of the Caffe model are to be added to the output results of the Caffe model. Thus, when the Caffe model is executed according to the adjusted Caffe configuration file, the output results of the Caffe model, including the intermediate results, can be obtained. Compared with traditional ways of obtaining intermediate results, this effectively improves operation speed and avoids loss of model performance.
According to one aspect of the disclosure, an operation method is provided, comprising:
defining an additional output parameter in the configuration file of a Caffe model to obtain an adjusted Caffe configuration file, where the additional output parameter indicates that the intermediate results of the Caffe model are to be added to the output results of the Caffe model, and the intermediate results include the operation result of at least one non-output layer in the Caffe model; and
executing the Caffe model according to the adjusted Caffe configuration file to obtain the output results of the Caffe model, including the intermediate results;
wherein the Caffe model is a fusion model.
In a possible implementation, the method further comprises:
obtaining a Caffe offline model according to the adjusted Caffe configuration file, so that when the Caffe offline model is executed, it outputs the output results of the Caffe model, including the intermediate results.
In a possible implementation, the operation method is applicable to a heterogeneous computing architecture that includes a general-purpose processor and an artificial intelligence processor;
the non-output layer includes a non-output layer of a fusion sub-network running on the artificial intelligence processor, where the fusion sub-network is the network obtained after operator fusion of all or part of the network layers in the Caffe model.
In a possible implementation, the value of the additional output parameter is an output flag or a non-output flag;
the additional output parameter indicating that the intermediate results of the Caffe model are to be added to the output results of the Caffe model comprises:
when the value of the additional output parameter is the output flag, indicating that the intermediate results of the Caffe model are to be added to the output results of the Caffe model.
In a possible implementation, in the output results of the Caffe model, the output results of the non-output layers in the intermediate results are arranged in the name order of the network layers.
According to another aspect of the disclosure, an operation device is further provided, comprising:
a parameter definition module, configured to define an additional output parameter in the configuration file of a Caffe model to obtain an adjusted Caffe configuration file, where the additional output parameter indicates that the intermediate results of the Caffe model are to be added to the output results of the Caffe model, and the intermediate results include the operation result of at least one non-output layer in the Caffe model; and
a model execution module, configured to execute the Caffe fusion model according to the adjusted Caffe configuration file to obtain the output results of the Caffe model, including the intermediate results.
In a possible implementation, the device further comprises:
an offline model obtaining module, configured to obtain a Caffe offline model according to the adjusted Caffe configuration file, so that when the Caffe offline model is executed, it outputs the output results of the Caffe model, including the intermediate results.
In a possible implementation, the operation device is applicable to a heterogeneous computing architecture that includes a general-purpose processor and an artificial intelligence processor;
the non-output layer includes a non-output layer of a fusion sub-network running on the artificial intelligence processor, where the fusion sub-network is the network obtained after operator fusion of all or part of the network layers in the Caffe model.
In a possible implementation, the value of the additional output parameter is an output flag or a non-output flag;
the additional output parameter indicating that the intermediate results of the Caffe model are to be added to the output results of the Caffe model comprises:
when the value of the additional output parameter is the output flag, indicating that the intermediate results of the Caffe model are to be added to the output results of the Caffe model.
In a possible implementation, in the output results of the Caffe model, the output results of the non-output layers in the intermediate results are arranged in the name order of the network layers.
According to one aspect of the disclosure, a computer device is provided, comprising a memory and a processor, the memory storing a computer program runnable on the processor, where the processor, when executing the computer program, implements the steps of any of the methods above.
According to one aspect of the disclosure, a readable storage medium is further provided, on which a computer program is stored, where the computer program, when executed by a processor, implements the steps of any of the methods above.
According to another aspect of the disclosure, a machine learning operation device is further provided. The machine learning operation device includes one or more of any of the operation devices above, and is configured to obtain input data and control information to be operated on from other processing devices, perform the specified machine learning operation, and pass the execution result to the other processing devices through an I/O interface;
when the machine learning operation device includes multiple operation devices, the multiple operation devices can be connected through a specific structure and transmit data between each other;
wherein the multiple operation devices are interconnected and transmit data through a PCIE bus, to support larger-scale machine learning operations;
the multiple operation devices share the same control system or have their own control systems;
the multiple operation devices share memory or have their own memories; and
the interconnection of the multiple operation devices is any interconnection topology.
According to one aspect of the disclosure, a combined processing device is provided, comprising the machine learning operation device described above, a universal interconnection interface, and other processing devices;
the machine learning operation device interacts with the other processing devices to jointly complete the computing operation specified by the user.
In a possible implementation, the combined processing device further includes a storage device;
the storage device is connected respectively to the machine learning operation device and the other processing devices, and is configured to store data of the machine learning operation device described above and of the other processing devices.
According to one aspect of the disclosure, a neural network chip is provided, the chip comprising the machine learning operation device described above or the combined processing device described above.
According to another aspect of the disclosure, an electronic device is further provided, the electronic device comprising the neural network chip described above.
According to one aspect of the disclosure, a board is provided, comprising a storage device, an interface device, a control device, and the neural network chip described above;
wherein the neural network chip is connected to the storage device, the control device, and the interface device respectively;
the storage device is configured to store data;
the interface device is configured to implement data transmission between the neural network chip and an external device; and
the control device is configured to monitor the state of the neural network chip.
In a possible implementation, the storage device includes multiple groups of storage units, each group connected to the neural network chip through a bus, the storage units being DDR SDRAM;
the chip includes a DDR controller, configured to control the data transmission to and data storage in each storage unit; and
the interface device is a standard PCIE interface.
With the operation method above, an additional output parameter is defined in the configuration file of the Caffe model to obtain an adjusted Caffe configuration file; the additional output parameter indicates that the intermediate results of the Caffe model are to be added to the output results of the Caffe model, so that when the Caffe model is executed according to the adjusted Caffe configuration file, the output results of the Caffe model, including the intermediate results, can be obtained. Compared with traditional ways of obtaining intermediate results, this effectively improves operation speed and avoids loss of model performance.
Other features and aspects of the disclosure will become clear from the following detailed description of exemplary embodiments with reference to the accompanying drawings.
Detailed description of the invention
The accompanying drawings, which are incorporated in and constitute a part of the specification, illustrate exemplary embodiments, features, and aspects of the disclosure together with the specification, and serve to explain the principles of the disclosure.
Fig. 1 shows a flowchart of an operation method according to an embodiment of the disclosure;
Fig. 2 shows a structural schematic diagram of a neural network of the related art;
Fig. 3 shows a schematic diagram of the operation logic of a neural network in layer-by-layer mode;
Fig. 4 shows a schematic diagram of the operation logic of a neural network in fusion mode;
Fig. 5 shows a schematic diagram of the operation logic of a neural network according to an embodiment of the disclosure;
Fig. 6 shows a block diagram of an operation device according to an embodiment of the disclosure;
Fig. 7 shows a block diagram of a combined processing device according to an embodiment of the disclosure;
Fig. 8 shows a block diagram of another combined processing device according to an embodiment of the disclosure;
Fig. 9 shows a block diagram of a board according to an embodiment of the disclosure.
Specific embodiment
Various exemplary embodiments, features, and aspects of the disclosure are described in detail below with reference to the accompanying drawings. The same reference numerals in the drawings denote elements with the same or similar functions. Although various aspects of the embodiments are shown in the drawings, the drawings are not necessarily drawn to scale unless otherwise noted.
The word "exemplary" herein means "serving as an example, embodiment, or illustration". Any embodiment described herein as "exemplary" should not be construed as preferred or advantageous over other embodiments.
In addition, numerous specific details are given in the following detailed description in order to better illustrate the disclosure. Those skilled in the art will understand that the disclosure can be practiced without certain of these details. In some instances, methods, means, elements, and circuits well known to those skilled in the art are not described in detail, in order to highlight the gist of the disclosure.
First, it should be noted that the files for generating a Caffe model generally include two: one is the structure file (prototxt, abbreviated pt), that is, the configuration file described below; the other is the weight file (caffemodel). The object adjusted in the above operation method can be the configuration file (pt) stored on disk, or the configuration file loaded into memory, in which case the adjustment is performed on the configuration file after it and the weight file have been loaded into memory.
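For orientation, a standard Caffe structure file (prototxt) describes the network layer by layer in protobuf text format; the network and layer names below are illustrative only:

```protobuf
name: "ExampleNet"
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"    # input blob
  top: "conv1"      # output blob
  convolution_param { num_output: 16 kernel_size: 3 }
}
```

The weight file (caffemodel) stores the learned parameters for these layers in binary form; only the structure file is touched by the adjustment described here.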
Fig. 1 shows the flow chart of the operation method of the embodiment according to the disclosure.Refering to fig. 1, the embodiment of the present disclosure Operation method, comprising:
Step S100: define an additional output parameter in the configuration file of the Caffe model to obtain an adjusted Caffe configuration file. It should be noted that the additional output parameter indicates that the intermediate results of the Caffe model are to be added to the output results of the Caffe model. An intermediate result refers to the operation result of at least one non-output layer in the Caffe model.
Step S200 executes Caffe model according to Caffe configuration file adjusted, obtains including intermediate result The output result of Caffe model.Wherein, Caffe model can be Fusion Model.
It should be pointed out that the operation method of the disclosure can be carried out based on the fusion mode of the convolutional neural network framework Caffe (Convolutional Architecture for Fast Feature Embedding), or based on other neural network structures; the embodiments disclosed below are illustrated by taking a Caffe model as an example. Those skilled in the art will understand that the operation method of the disclosure can also be applied to other neural networks on the same or similar principles, which are not repeated here one by one.
In addition, to explain the operation method of the disclosure more clearly, neural networks and their operation modes are described in more detail below, so that the technical solution of the operation method of the disclosure can be understood more clearly.
Referring to Fig. 2, Fig. 2 shows a structural schematic diagram of a neural network of the related art. The neural network structure includes one input (input) and two outputs (output1 and output2). Each layer between the input and the outputs is a network layer of the network. Currently, the operation modes of a neural network generally include layer-by-layer mode, fusion mode, and offline mode.
Referring to Fig. 3, Fig. 3 shows the operation logic of a neural network in layer-by-layer mode. In layer-by-layer mode, the operation of each network layer (layer) in the neural network is completed by the MLU (artificial intelligence chip), and the data transmission between the network layers is completed by the CPU. Since the interaction between layers is implemented on the CPU, the output of each network layer can be obtained directly in this mode.
To speed up network operation and improve network performance, the fusion mode and offline mode of running neural networks were proposed. Referring to Fig. 4, Fig. 4 shows the operation logic of a neural network in fusion mode. In fusion mode, the operation of each network layer and the interaction between the layers are completed in the MLU, and the CPU is only involved in the input and output processes. That is, a fusion model is one in which the data-copy work of the fused network layers in the neural network is no longer handled by the CPU, and tasks such as data copying and data computation are completed directly on the MLU board. The mode of running a fusion model is fusion mode. In fusion mode, the operation of the whole network is transparent to the CPU, so the user cannot directly obtain the operation results of the intermediate network layers.
Offline mode then detaches the model from the framework on the basis of fusion mode (that is, detaches the Caffe model from the Caffe framework), turning it into a network model independent of the framework (an offline model) and running it in that form. Similarly, in offline mode, the operation of the network is transparent to the CPU, and the user cannot directly obtain the operation results of the intermediate network layers.
With an embodiment of the operation method of the disclosure described above, an additional output parameter can be defined in the configuration file of the network model (Caffe model); the defined additional output parameter indicates that the intermediate results of the network model are to be added to the output results of the network model, so that when the Caffe model is executed according to the adjusted Caffe configuration file, the intermediate results can be obtained directly from the output results of the Caffe model.
With the operation method disclosed above, it is only necessary to adjust the configuration file of the Caffe model, defining the additional output parameter in the configuration file, to achieve the purpose of directly obtaining the operation results (intermediate results) of intermediate network layers on the MLU; it is simple to operate and easy to implement. Moreover, compared with the traditional approach of splitting the network at the position where output is needed, the operation method of the disclosure has a simple processing flow and does not affect the performance of the network.
As a possible implementation, Fig. 5 shows a schematic diagram of the operation method according to an embodiment of the disclosure. Referring to Fig. 5, the additional output parameter defined in the configuration file of the Caffe model can be external_output. That is, by adding the external_output parameter to at least one non-output layer in the Caffe model (a network layer other than the output layers), the operation result of the corresponding network layer can be output, according to external_output, while the Caffe model is running. It is simple to operate and easy to implement.
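As a hedged illustration, the adjusted structure file might attach the parameter to a non-output layer as follows (the layer names and the exact placement of the field inside the layer block are assumptions for illustration; external_output is the disclosure's parameter, not a field of standard Caffe):

```protobuf
layer {
  name: "conv1"            # a non-output layer whose result is wanted
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  external_output: true    # add this layer's operation result to the model outputs
}
```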
As a possible implementation, the method further includes: obtaining a Caffe offline model according to the adjusted Caffe configuration file, so that when the Caffe offline model is executed, it outputs the output results of the Caffe model, including the intermediate results. That is, by obtaining a Caffe offline model according to the adjusted Caffe configuration file, the Caffe fusion model is further converted into an offline model, so that in offline mode the intermediate results can likewise be obtained with any of the operation methods above.
It should be pointed out that the operation method of the disclosure is applicable to a heterogeneous computing architecture. The heterogeneous computing architecture includes a general-purpose processor (CPU) and an artificial intelligence processor. The artificial intelligence processor can be an intelligence processing unit (IPU) for performing artificial intelligence operations; artificial intelligence operations may include machine learning operations, brain-like operations, and the like, where machine learning operations include neural network operations, k-means operations, support vector machine operations, and so on. The artificial intelligence processor can include, for example, one of, or a combination of, a GPU (Graphics Processing Unit), an NPU (Neural-Network Processing Unit), a DSP (Digital Signal Processing unit), and a field-programmable gate array (FPGA) chip. The non-output layer then includes a non-output layer of the fusion sub-network running on the artificial intelligence processor. Meanwhile, from the fusion model described above, those skilled in the art will understand that the fusion sub-network refers to the network obtained after operator fusion of all or part of the network layers in the Caffe model.
In addition, as a possible implementation of the operation method of the disclosure, the value of the additional output parameter may be an output flag (for example, true) or a non-output flag (for example, false). Accordingly, when the additional output parameter is defined, its value can be defined to indicate that the intermediate results of the Caffe model are to be added to the output results of the Caffe model.
When indicating, by defining the value of the additional output parameter, that the intermediate results are to be added to the output results of the Caffe model, it can be done as follows:
set the value of the additional output parameter of the layer to be output (that is, an intermediate network layer whose operation result needs to be output) to true (that is, external_output: true); when the additional output parameter of a network layer is true, the output of that network layer is output as part of the output results of the fused Caffe model.
That is, the operation result of a network layer for which external_output is set to true can be regarded as a normal output result of the fused network; in practice it is handled in the same way as an end-point result of the Caffe model. Meanwhile, the operation result still participates in the subsequent operations of the fused network trunk, so the network operation is not affected and there is no obvious performance drop.
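The adjustment described above amounts to a plain-text edit of the structure file and needs no Caffe installation. The sketch below appends the parameter after each selected layer's name line; the field name external_output and its placement inside the layer block are assumptions taken from the disclosure, not standard Caffe:

```python
import re

def mark_external_output(prototxt: str, layer_names) -> str:
    """Append 'external_output: true' after the name line of each selected
    layer block (a text-level edit of the prototxt)."""
    out = []
    for line in prototxt.splitlines():
        out.append(line)
        m = re.match(r'\s*name:\s*"([^"]+)"', line)
        if m and m.group(1) in layer_names:
            out.append('  external_output: true')
    return "\n".join(out)

# Example: mark conv1 as an additional output, leave fc1 unchanged.
src = 'layer {\n  name: "conv1"\n  type: "Convolution"\n}\n' \
      'layer {\n  name: "fc1"\n  type: "InnerProduct"\n}'
adjusted = mark_external_output(src, {"conv1"})
```

A text-level edit like this leaves the weight file (caffemodel) untouched, matching the disclosure's point that only the configuration file needs adjusting.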
As a possible implementation, in the output results of the Caffe model, the output results of the non-output layers in the intermediate results are arranged in the name order of the network layers. That is, when the intermediate results are added to the output results of the Caffe model, the intermediate results may include the operation results of multiple network layers. In that case, the operation results of the multiple network layers (that is, multiple intermediate results) can be arranged in the output results of the Caffe model in the name order of the network layers for output.
By arranging the multiple intermediate results in the output results of the Caffe model in the name order of the network layers, the network layer corresponding to each output result is made clear.
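Interpreting "name order" as lexicographic order of the layer names (an assumption; the disclosure does not define the ordering further), the arrangement of multiple intermediate results can be sketched as:

```python
def ordered_outputs(results: dict) -> list:
    """Arrange intermediate results (keyed by layer name) in the name
    order of the network layers, assumed lexicographic here."""
    return [(name, results[name]) for name in sorted(results)]

# Example: results produced by three marked layers, in arrival order.
collected = {"pool1": [3.0], "conv1": [1.0], "conv2": [2.0]}
arranged = ordered_outputs(collected)
```

With this arrangement, a user reading the output list can map each result back to its layer by position, which is the clarity benefit the paragraph above describes.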
With the operation method above, an additional output parameter is defined in the configuration file of the Caffe model to obtain an adjusted Caffe configuration file; the additional output parameter indicates that the intermediate results of the Caffe model are to be added to the output results of the Caffe model, so that when the Caffe model is executed according to the adjusted Caffe configuration file, the output results of the Caffe model, including the intermediate results, can be obtained. Compared with traditional ways of obtaining intermediate results, this effectively improves operation speed and avoids loss of model performance.
According to one aspect of the disclosure, an operation device 100 is further provided. Fig. 6 shows a block diagram of an embodiment of the operation device 100 of the disclosure. Referring to Fig. 6, the operation device 100 comprises:
a parameter definition module 110, configured to define an additional output parameter in the configuration file of a Caffe model to obtain an adjusted Caffe configuration file, where the additional output parameter indicates that the intermediate results of the Caffe model are to be added to the output results of the Caffe model, and the intermediate results include the operation result of at least one non-output layer in the Caffe model; and
a model execution module 120, configured to execute the Caffe fusion model according to the adjusted Caffe configuration file to obtain the output results of the Caffe model, including the intermediate results.
As a possible implementation, the device further comprises:
an offline model obtaining module, configured to obtain a Caffe offline model according to the adjusted Caffe configuration file, so that when the Caffe offline model is executed, it outputs the output results of the Caffe model, including the intermediate results.
As a possible implementation, the operation device 100 is applicable to a heterogeneous computing architecture that includes a general-purpose processor and an artificial intelligence processor;
the non-output layer includes a non-output layer of the fusion sub-network running on the artificial intelligence processor, where the fusion sub-network is the network obtained after operator fusion of all or part of the network layers in the Caffe model.
As a possible implementation, the value of the additional output parameter is an output flag or a non-output flag;
the additional output parameter indicating that the intermediate results of the Caffe model are to be added to the output results of the Caffe model comprises:
when the value of the additional output parameter is the output flag, indicating that the intermediate results of the Caffe model are to be added to the output results of the Caffe model.
As a possible implementation, in the output results of the Caffe model, the output results of the non-output layers in the intermediate results are arranged in the name order of the network layers.
According to another aspect of the disclosure, a computer device is provided, comprising a memory and a processor, the memory storing a computer program runnable on the processor, where the processor, when executing the computer program, implements the steps of any of the operation methods above.
According to another aspect of the disclosure, a readable storage medium is further provided, on which a computer program is stored, where the computer program, when executed by a processor, implements the steps of any of the operation methods above.
According to one aspect of the disclosure, a machine learning operation device is provided. The machine learning operation device includes one or more of any of the operation devices above, and is configured to obtain input data and control information to be operated on from other processing devices, perform the specified machine learning operation, and pass the execution result to the other processing devices through an I/O interface. The other processing devices are, for example, cameras, displays, mice, keyboards, network cards, wifi interfaces, and servers. When more than one operation device is included, the operation devices can be linked through a specific structure and transmit data, for example interconnected and transmitting data through a PCIE bus, to support larger-scale machine learning operations. In that case, they can share the same control system or have their own independent control systems; they can share memory, or each accelerator can have its own memory. In addition, their interconnection can be any interconnection topology.
The machine learning operation device has high compatibility and can be connected to various types of servers through a PCIE interface.
Fig. 7 shows a block diagram of a combined processing device 200a according to an embodiment of the disclosure. Referring to Fig. 7, the disclosure further provides a combined processing device 200a, which includes the machine learning operation device above (neural network operation device 210), a universal interconnection interface 220, and other processing devices 230. The machine learning operation device 210 interacts with the other processing devices 230 to jointly complete the operation specified by the user.
The other processing devices 230 include one or more types of general-purpose/dedicated processors such as a central processing unit (CPU), a graphics processing unit (GPU), and a neural network processor. The number of processors included in the other processing devices 230 is not limited. The other processing devices 230 serve as the interface between the machine learning operation device and external data and control, performing basic control such as carrying data and starting and stopping the machine learning operation device; the other processing devices can also cooperate with the machine learning operation device to jointly complete operation tasks.
The general interconnection interface 220 is configured to transmit data and control instructions between the machine learning operation device 210 and the other processing devices 230. The machine learning operation device 210 obtains required input data from the other processing devices 230 and writes it into an on-chip storage device of the machine learning operation device; it can obtain control instructions from the other processing devices 230 and write them into an on-chip control cache of the machine learning operation device; it can also read data from a storage module of the machine learning operation device and transmit it to the other processing devices.
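The three data paths described above (input data into on-chip storage, control instructions into the control cache, results read back to the host) can be sketched as a minimal Python model. All class and method names here are hypothetical illustrations, not an actual driver API from the disclosure:

```python
# Minimal sketch of the data/instruction flow through a general
# interconnection interface. All names are hypothetical.

class MachineLearningDevice:
    def __init__(self):
        self.on_chip_storage = {}   # on-chip storage for input data
        self.control_cache = []     # on-chip cache for control instructions

    def load_input(self, name, data):
        # Input data obtained from another processing device is
        # written into the on-chip storage.
        self.on_chip_storage[name] = data

    def push_instruction(self, instr):
        # Control instructions are written into the on-chip control cache.
        self.control_cache.append(instr)

    def run_and_read_back(self):
        # Execute the cached instructions on the stored data and
        # return the result for transmission back to the host.
        result = sum(self.on_chip_storage.values())
        if "double" in self.control_cache:
            result *= 2
        return result

device = MachineLearningDevice()
device.load_input("a", 3)          # host -> device: input data
device.load_input("b", 4)
device.push_instruction("double")  # host -> device: control instruction
print(device.run_and_read_back())  # device -> host: result, prints 14
```

The point of the sketch is only the separation of the three channels: data, control, and result each cross the interconnection interface independently.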
Fig. 8 shows a block diagram of a combined processing device 200b according to another embodiment of the present disclosure. Referring to Fig. 8, the combined processing device 200b of the present disclosure may further include a storage device 240, which is connected to the machine learning operation device 210 and the other processing devices 230 respectively. The storage device 240 is configured to store data of the machine learning operation device 210 and the other processing devices 230, and is particularly suitable for data to be operated on that cannot be entirely saved in the internal storage of the machine learning operation device or the other processing devices.
The combined processing device 200b can be used as a system-on-chip (SoC) for devices such as mobile phones, robots, drones, and video surveillance equipment, effectively reducing the die area of the control portion, increasing the processing speed, and reducing the overall power consumption. In this case, the general interconnection interface of the combined processing device is connected to certain components of the device, such as a camera, a display, a mouse, a keyboard, a network card, or a Wi-Fi interface.
In some embodiments, a chip is also disclosed, which includes the machine learning operation device or the combined processing device described above.
In some embodiments, a chip packaging structure is disclosed, which includes the chip described above.
In some embodiments, a board card is disclosed, which includes the chip packaging structure described above. Referring to Fig. 9, Fig. 9 provides a board card; in addition to the chip 389 described above, the board card may further include other supporting components, including but not limited to: a memory device 390, an interface device 391, and a control device 392.
The memory device 390 is connected to the chip in the chip packaging structure through a bus, and is configured to store data. The memory device may include multiple groups of storage units 393. Each group of storage units is connected to the chip through a bus. It can be understood that each group of storage units may be DDR SDRAM (Double Data Rate Synchronous Dynamic Random Access Memory).
DDR can double the speed of SDRAM without increasing the clock frequency: DDR allows data to be read on both the rising edge and the falling edge of the clock pulse, so its speed is twice that of standard SDRAM. In one embodiment, the memory device may include 4 groups of the storage units, and each group of storage units may include a plurality of DDR4 particles (chips). In one embodiment, the chip may internally include four 72-bit DDR4 controllers; of the 72 bits, 64 bits are used for data transmission and 8 bits are used for ECC checking. It can be understood that when DDR4-3200 particles are used in each group of storage units, the theoretical data transmission bandwidth can reach 25600 MB/s.
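The 25600 MB/s figure quoted above follows directly from the DDR4-3200 transfer rate and the 64-bit data width of each controller; the arithmetic can be checked as follows (an illustrative sketch, not part of the disclosure):

```python
# Theoretical bandwidth of one group of DDR4-3200 storage units on a
# 64-bit data bus: 3200 mega-transfers per second x 8 bytes per transfer.
transfers_per_second = 3200  # DDR4-3200: 3200 MT/s
data_width_bits = 64         # 64 of the 72 controller bits carry data
bytes_per_transfer = data_width_bits // 8

bandwidth_mb_s = transfers_per_second * bytes_per_transfer
print(bandwidth_mb_s)  # prints 25600
```

The remaining 8 bits of each 72-bit controller carry ECC and do not contribute to the data bandwidth.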
In one embodiment, each group of storage units includes a plurality of double data rate synchronous dynamic random access memories arranged in parallel. DDR can transmit data twice within one clock cycle. A controller for controlling the DDR is provided in the chip, for controlling the data transmission and data storage of each storage unit.
The interface device is electrically connected to the chip in the chip packaging structure. The interface device is used to implement data transmission between the chip and an external device (such as a server or a computer). For example, in one embodiment, the interface device may be a standard PCIE interface: the data to be processed is transferred by the server to the chip through the standard PCIE interface, realizing the data transfer. Preferably, when a PCIE 3.0 X16 interface is used for transmission, the theoretical bandwidth can reach 16000 MB/s. In another embodiment, the interface device may also be another interface; the present application does not limit the specific form of that interface, as long as the interface unit can realize the transfer function. In addition, the calculation result of the chip is sent back by the interface device to the external device (such as a server).
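The 16000 MB/s figure for PCIe 3.0 x16 is the raw line rate (8 GT/s per lane, 16 lanes, 8 bits per byte); a quick check of the arithmetic, including the 128b/130b encoding overhead that the round figure ignores (an illustrative sketch, not part of the disclosure):

```python
# Raw line rate of PCIe 3.0 x16: 8 GT/s per lane x 16 lanes / 8 bits per byte.
gigatransfers_per_lane = 8   # PCIe 3.0: 8 GT/s per lane
lanes = 16
raw_mb_s = gigatransfers_per_lane * 1000 * lanes // 8
print(raw_mb_s)  # prints 16000

# With PCIe 3.0's 128b/130b encoding, the effective payload rate is
# slightly lower (about 1.5% overhead):
effective_mb_s = raw_mb_s * 128 // 130
print(effective_mb_s)  # prints 15753
```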
The control device is electrically connected to the chip. The control device is used to monitor the state of the chip. Specifically, the chip may be electrically connected to the control device through an SPI interface. The control device may include a micro controller unit (MCU). Since the chip may include multiple processing chips, multiple processing cores, or multiple processing circuits, it can drive multiple loads; the chip may therefore be in different working states such as multi-load and light-load. Through the control device, the working states of the multiple processing chips, multiple processing cores, and/or multiple processing circuits in the chip can be regulated.
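The monitoring role described above can be sketched as a simple load classifier run by the control device. The function name, the thresholds, and the state labels are hypothetical illustrations, not taken from the disclosure:

```python
# Illustrative sketch: an MCU-style control device classifies the
# chip's working state from the number of active loads. Thresholds
# and state names are hypothetical.
def working_state(active_loads, max_loads):
    """Classify the chip's working state for the control device."""
    if active_loads == 0:
        return "idle"
    if active_loads < max_loads // 2:
        return "light-load"
    return "multi-load"

print(working_state(0, 8))   # prints idle
print(working_state(2, 8))   # prints light-load
print(working_state(7, 8))   # prints multi-load
```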
In some embodiments, an electronic device is claimed, which includes the board card described above.
The electronic device includes a data processing device, a robot, a computer, a printer, a scanner, a tablet computer, a smart terminal, a mobile phone, a driving recorder, a navigator, a sensor, a webcam, a server, a cloud server, a camera, a video camera, a projector, a watch, an earphone, a mobile storage, a wearable device, a vehicle, a household appliance, and/or a medical device.
The vehicle includes an airplane, a ship, and/or a car; the household appliance includes a television, an air conditioner, a microwave oven, a refrigerator, an electric rice cooker, a humidifier, a washing machine, an electric lamp, a gas stove, and a range hood; the medical device includes a nuclear magnetic resonance instrument, a B-ultrasound scanner, and/or an electrocardiograph.
The embodiments of the present disclosure have been described above. The above description is exemplary, not exhaustive, and is not limited to the disclosed embodiments. Many modifications and changes will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the illustrated embodiments. The terms used herein are chosen to best explain the principles of the embodiments, their practical application, or the technical improvement over technologies in the market, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims (19)

1. An operation method, comprising:
defining an additional output parameter in a configuration file of a Caffe model to obtain an adjusted Caffe configuration file, the additional output parameter being used to indicate that an intermediate result of the Caffe model is to be added to an output result of the Caffe model, the intermediate result comprising an operation result of at least one non-output layer in the Caffe model; and
executing the Caffe model according to the adjusted Caffe configuration file to obtain an output result of the Caffe model that includes the intermediate result;
wherein the Caffe model is a fusion model.
2. The method according to claim 1, further comprising:
obtaining a Caffe offline model according to the adjusted Caffe configuration file, so that when the Caffe offline model is executed, it outputs the output result of the Caffe model including the intermediate result.
3. The method according to claim 1, wherein the operation method is applicable to a heterogeneous computing architecture, the heterogeneous computing architecture comprising a general-purpose processor and an artificial intelligence processor;
the non-output layer comprises a non-output layer running in a fusion sub-network on the artificial intelligence processor, the fusion sub-network being a network obtained by performing operator fusion on all or part of the network layers in the Caffe model.
4. The method according to claim 1, wherein the value of the additional output parameter comprises an output identifier or a non-output identifier;
and the additional output parameter being used to indicate that the intermediate result of the Caffe model is to be added to the output result of the Caffe model comprises:
when the value of the additional output parameter is the output identifier, indicating that the intermediate result of the Caffe model is to be added to the output result of the Caffe model.
5. The method according to any one of claims 1 to 4, wherein in the output result of the Caffe model, the operation results of the non-output layers in the intermediate result are arranged according to the name order of the network layers.
6. An operation device, comprising:
a parameter definition module configured to define an additional output parameter in a configuration file of a Caffe model to obtain an adjusted Caffe configuration file, the additional output parameter being used to indicate that an intermediate result of the Caffe model is to be added to an output result of the Caffe model, the intermediate result comprising an operation result of at least one non-output layer in the Caffe model; and
a model execution module configured to execute the Caffe fusion model according to the adjusted Caffe configuration file to obtain an output result of the Caffe model that includes the intermediate result.
7. The device according to claim 6, further comprising:
an offline model obtaining module configured to obtain a Caffe offline model according to the adjusted Caffe configuration file, so that when the Caffe offline model is executed, it outputs the output result of the Caffe model including the intermediate result.
8. The device according to claim 6, wherein the operation device is applicable to a heterogeneous computing architecture, the heterogeneous computing architecture comprising a general-purpose processor and an artificial intelligence processor;
the non-output layer comprises a non-output layer running in a fusion sub-network on the artificial intelligence processor, the fusion sub-network being a network obtained by performing operator fusion on all or part of the network layers in the Caffe model.
9. The device according to claim 6, wherein the value of the additional output parameter comprises an output identifier or a non-output identifier;
and the additional output parameter being used to indicate that the intermediate result of the Caffe model is to be added to the output result of the Caffe model comprises:
when the value of the additional output parameter is the output identifier, indicating that the intermediate result of the Caffe model is to be added to the output result of the Caffe model.
10. The device according to any one of claims 6 to 9, wherein in the output result of the Caffe model, the operation results of the non-output layers in the intermediate result are arranged according to the name order of the network layers.
11. A computer device, comprising a memory and a processor, the memory storing a computer program executable on the processor, wherein the processor implements the steps of the method according to any one of claims 1 to 5 when executing the computer program.
12. A readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 5.
13. A machine learning operation device, comprising one or more operation devices according to any one of claims 6 to 10, configured to obtain input data to be operated on and control information from other processing devices, execute a specified machine learning operation, and pass the execution result to the other processing devices through an I/O interface;
when the machine learning operation device comprises a plurality of the operation devices, the plurality of operation devices can be connected through a specific structure and transmit data;
wherein the plurality of operation devices are interconnected through a PCIE bus and transmit data, to support larger-scale machine learning operations;
the plurality of operation devices share a same control system or have their own control systems;
the plurality of operation devices share a memory or have their own memories; and
the interconnection manner of the plurality of operation devices is any interconnection topology.
14. A combined processing device, comprising the machine learning operation device according to claim 13, a general interconnection interface, and other processing devices;
wherein the machine learning operation device interacts with the other processing devices to jointly complete a computing operation specified by a user.
15. The combined processing device according to claim 14, further comprising: a storage device;
wherein the storage device is connected to the machine learning operation device and the other processing devices respectively, and is configured to store data of the machine learning operation device and the other processing devices.
16. A neural network chip, wherein the chip comprises the machine learning operation device according to claim 13, or the combined processing device according to claim 14 or 15.
17. An electronic device, wherein the electronic device comprises the neural network chip according to claim 16.
18. A board card, comprising: a memory device, an interface device, a control device, and the neural network chip according to claim 17;
wherein the neural network chip is connected to the memory device, the control device, and the interface device respectively;
the memory device is configured to store data;
the interface device is configured to implement data transmission between the neural network chip and an external device; and
the control device is configured to monitor the state of the neural network chip.
19. The board card according to claim 18, wherein
the memory device comprises multiple groups of storage units, each group of storage units being connected to the neural network chip through a bus, the storage units being DDR SDRAM;
the chip comprises a DDR controller for controlling the data transmission and data storage of each storage unit; and
the interface device is a standard PCIE interface.
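The mechanism of claims 1 to 5 can be illustrated with a minimal Python simulation: an extra per-layer flag in the model configuration marks non-output layers whose intermediate results are appended to the model output, ordered by layer name as in claim 5. The parameter name `extra_output`, the layer names, and the toy layer computations are all hypothetical; real Caffe prototxt syntax is not reproduced here.

```python
# Minimal simulation of claims 1-5: an additional per-layer output
# parameter ("extra_output", a hypothetical name) marks non-output
# layers whose intermediate results are added to the model output.

# Adjusted configuration: each layer carries the additional output
# parameter; True = output identifier, False = non-output identifier.
config = [
    {"name": "conv1", "extra_output": True},
    {"name": "pool1", "extra_output": False},
    {"name": "conv2", "extra_output": True},
    {"name": "fc1",   "extra_output": False},  # final (output) layer
]

def run_model(x, config):
    """Run a toy layer chain; collect flagged intermediate results."""
    intermediates = {}
    for layer in config[:-1]:          # all non-output layers
        x = x + 1                      # stand-in for the layer's operation
        if layer["extra_output"]:      # output identifier set
            intermediates[layer["name"]] = x
    final = x * 2                      # stand-in for the output layer
    # Claim 5: intermediate results arranged by network-layer name order.
    ordered = [intermediates[k] for k in sorted(intermediates)]
    return [final] + ordered

print(run_model(0, config))  # prints [6, 1, 3]
```

With both flags cleared, only the final result would be returned; setting a flag on a non-output layer exposes that layer's operation result in the output without changing the computation itself, which is the point of the adjusted configuration file.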
CN201811639690.XA 2018-12-29 2018-12-29 Operation method, device and related product Active CN109726800B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811639690.XA CN109726800B (en) 2018-12-29 2018-12-29 Operation method, device and related product


Publications (2)

Publication Number Publication Date
CN109726800A true CN109726800A (en) 2019-05-07
CN109726800B CN109726800B (en) 2019-12-24

Family

ID=66297971


Country Status (1)

Country Link
CN (1) CN109726800B (en)


Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050125792A1 (en) * 2003-12-08 2005-06-09 Che-An Chang Software materialization platform and an artificial neuron computer system
US20080244251A1 (en) * 2007-03-29 2008-10-02 Khipu Systems Limited Predictive model implementation system and methodology
CN105760932A (en) * 2016-02-17 2016-07-13 北京物思创想科技有限公司 Data exchange method, data exchange device and calculating device
CN106845631A (en) * 2016-12-26 2017-06-13 上海寒武纪信息科技有限公司 One kind stream performs method and device
CN107808098A (en) * 2017-09-07 2018-03-16 阿里巴巴集团控股有限公司 A kind of model safety detection method, device and electronic equipment
CN109086244A (en) * 2018-07-11 2018-12-25 中国人民解放军国防科技大学 Matrix convolution vectorization implementation method based on vector processor
CN109086877A (en) * 2016-04-29 2018-12-25 北京中科寒武纪科技有限公司 A kind of device and method for executing convolutional neural networks forward operation
CN109102074A (en) * 2017-06-21 2018-12-28 上海寒武纪信息科技有限公司 A kind of training device


Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112052040A (en) * 2019-06-06 2020-12-08 中科寒武纪科技股份有限公司 Processing method, processing device, computer equipment and storage medium
CN110490309A (en) * 2019-08-14 2019-11-22 北京中科寒武纪科技有限公司 A kind of Operator Fusion method and its Related product for neural network
CN110490309B (en) * 2019-08-14 2022-06-07 中科寒武纪科技股份有限公司 Operator fusion method for neural network and related product thereof
CN110990060A (en) * 2019-12-06 2020-04-10 北京瀚诺半导体科技有限公司 Embedded processor, instruction set and data processing method of storage and computation integrated chip

Also Published As

Publication number Publication date
CN109726800B (en) 2019-12-24


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP01 Change in the name or title of a patent holder

Address after: 100190 room 644, comprehensive research building, No. 6 South Road, Haidian District Academy of Sciences, Beijing
Patentee after: Zhongke Cambrian Technology Co., Ltd
Address before: 100190 room 644, comprehensive research building, No. 6 South Road, Haidian District Academy of Sciences, Beijing
Patentee before: Beijing Zhongke Cambrian Technology Co., Ltd.