CN109919315A - Forward inference method, apparatus, device, and storage medium for a neural network
Forward inference method, apparatus, device, and storage medium for a neural network

- Publication number: CN109919315A (application CN201910188467.6A)
- Authority: CN (China)
- Prior art keywords: network; target neural network; sub-network; inference
- Legal status: Granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Abstract
This application provides a forward inference method, apparatus, device, and storage medium for a neural network. The method includes: dividing a target neural network into multiple sub-networks, where each sub-network contains at least one hidden layer of the target neural network; creating, on the hardware devices of an inference platform, an inference instance and an inference engine for each of the sub-networks; and performing forward inference on the target neural network based on these inference instances and inference engines. Because each inference engine is responsible for only a portion of the hidden layers, multiple input data items can be processed in parallel by different inference engines at the same moment. The forward inference method provided by this application therefore achieves higher inference efficiency and data throughput, and makes full use of the hardware resources of the inference platform.
Description
Technical field
This application relates to the field of parallel computing, and more specifically to a forward inference method, apparatus, device, and storage medium for a neural network.
Background technique
Forward inference of a neural network refers to creating, on an inference platform, an inference instance and an inference engine for the neural network to be inferred; the inference engine then performs the computation of each layer of the neural network based on the input data of the input layer and the inference instance.
The current inference scheme is as follows: a single inference instance is created for the neural network, and a single inference engine is created within that instance. The inference engine receives input data and, based on the instance, computes each layer of the entire network in order. That is, the operations of one input data item are strictly sequential across layers, and different inputs are also strictly sequential: the next input data item can be processed only after the output of the previous one has been obtained.
It follows from the above that, as neural networks grow deeper, the time from input to output for a single data item grows longer and longer, and overall throughput shrinks. Meanwhile, with the continuous development of chip technology, the computing power of hardware devices suited to neural networks has improved greatly, yet the existing inference scheme utilizes these devices very poorly, seriously wasting hardware resources.
Summary of the invention
In view of this, this application provides a forward inference method, apparatus, device, and readable storage medium for a neural network, to address the long latency, low efficiency, and low hardware utilization of existing inference schemes. The technical solution is as follows:
A forward inference method for a neural network, comprising:

dividing a target neural network into multiple sub-networks, wherein each sub-network includes at least one hidden layer of the target neural network;

creating, on the hardware devices of an inference platform, an inference instance and an inference engine corresponding to each of the multiple sub-networks;

performing forward inference on the target neural network based on the inference instances and inference engines corresponding to the multiple sub-networks.
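The patent text contains no code, but the three claimed steps can be sketched as a minimal Python illustration; all names here are hypothetical and the "layers" are toy functions:

```python
def split_network(layers, num_subnets):
    """Divide a list of hidden layers into contiguous sub-networks."""
    size = -(-len(layers) // num_subnets)   # ceiling division
    return [layers[i:i + size] for i in range(0, len(layers), size)]

class InferenceEngine:
    """An engine responsible for only the hidden layers of its sub-network."""
    def __init__(self, sublayers):
        self.sublayers = sublayers

    def run(self, x):
        for layer in self.sublayers:
            x = layer(x)
        return x

def forward_inference(engines, x):
    # Engines depend on one another in order: each consumes the previous output.
    for engine in engines:
        x = engine.run(x)
    return x

# Toy "hidden layers": each adds a constant to its input.
layers = [lambda v, k=k: v + k for k in range(6)]
engines = [InferenceEngine(s) for s in split_network(layers, 3)]
print(forward_inference(engines, 0))   # 0+0+1+2+3+4+5 = 15
```

With several inputs in flight, each engine can work on a different input at the same moment, which is the source of the claimed throughput gain.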
Optionally, dividing the target neural network into multiple sub-networks comprises:

obtaining the hardware device information of the inference platform and the computation amount and required storage space of the target neural network;

dividing the target neural network into multiple sub-networks based on the hardware device information of the inference platform and the computation amount and required storage space of the target neural network.

Wherein the hardware device information of the inference platform includes one or more of the following: the number of hardware devices, the computing power of the hardware devices, the storage capacity of the hardware devices, and the transmission bandwidth between hardware devices.
Optionally, obtaining the computation amount and required storage space of the target neural network comprises:

constructing a computation graph of the target neural network according to its network parameters;

determining, from the computation graph, the computation amount and required storage space of each layer of the target neural network;

determining the computation amount and required storage space of the entire target neural network from the per-layer computation amounts and required storage spaces.
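As an illustration of the per-layer accounting described above, the sketch below derives per-layer and whole-network costs from a toy "computation graph". The FLOP and byte formulas are invented placeholders, not the patent's actual accounting:

```python
graph = [  # each node: (layer name, output elements, FLOPs per output element)
    ("conv1", 64 * 32 * 32, 9 * 3),
    ("conv2", 128 * 16 * 16, 9 * 64),
    ("fc", 10, 128 * 16 * 16),
]

def layer_costs(graph, bytes_per_elem=4):
    """Per-layer computation amount and required storage (illustrative)."""
    costs = []
    for name, out_elems, flops_per_elem in graph:
        costs.append({
            "name": name,
            "flops": out_elems * flops_per_elem,   # computation amount
            "bytes": out_elems * bytes_per_elem,   # storage for layer outputs
        })
    return costs

def network_totals(costs):
    """Whole-network totals from the per-layer values."""
    return (sum(c["flops"] for c in costs), sum(c["bytes"] for c in costs))

costs = layer_costs(graph)
total_flops, total_bytes = network_totals(costs)
print(total_flops, total_bytes)
```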
Optionally, dividing the target neural network into multiple sub-networks based on the hardware device information of the inference platform and the computation amount and required storage space of the target neural network comprises:

determining a parallel mode suited to the target neural network based on the hardware device information of the inference platform, the computation amount and required storage space of the target neural network, and a user-configured parallel mode, wherein the parallel modes include a single-device parallel mode and a multi-device parallel mode; in the single-device parallel mode, forward inference of the target neural network is performed on a single device, and in the multi-device parallel mode it is performed on multiple devices;

dividing the target neural network into multiple sub-networks based on the parallel mode suited to it.
Optionally, determining the parallel mode suited to the target neural network based on the hardware device information of the inference platform, the computation amount and required storage space of the target neural network, and the user-configured parallel mode comprises:

if the computation amount of the entire target neural network exceeds the computing power of a single device, and/or the storage space required by the entire target neural network exceeds the storage capacity of a single device, determining that the parallel mode suited to the target neural network is the multi-device parallel mode;

if the computation amount of the entire target neural network is less than or equal to the computing power of a single device, and the storage space it requires is less than or equal to the storage capacity of a single device, determining the parallel mode suited to the target neural network based on the user-configured parallel mode.
Optionally, determining the parallel mode suited to the target neural network based on the user-configured parallel mode comprises:

when the user-configured parallel mode is the single-device parallel mode, determining that the parallel mode suited to the target neural network is the single-device parallel mode;

when the user-configured parallel mode is the multi-device parallel mode: if the inter-device transmission time is greater than the preset maximum execution time of a sub-network, determining that the parallel mode suited to the target neural network is the single-device parallel mode; if the inter-device transmission time is less than or equal to the preset maximum execution time of a sub-network, determining that the parallel mode suited to the target neural network is the multi-device parallel mode.
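The two-stage mode-selection rule above can be condensed into a small decision function. This is an illustrative sketch only; the function name, argument names, and units are hypothetical:

```python
SINGLE, MULTI = "single-device", "multi-device"

def choose_parallel_mode(net_flops, net_bytes, dev_flops, dev_bytes,
                         user_mode, transfer_time, max_subnet_time):
    # Rule 1: if one device cannot compute or hold the whole network,
    # multi-device mode is mandatory regardless of user configuration.
    if net_flops > dev_flops or net_bytes > dev_bytes:
        return MULTI
    # Rule 2: otherwise follow the user's configuration, but fall back to
    # single-device mode when inter-device transfer would dominate the
    # preset maximum sub-network execution time.
    if user_mode == SINGLE:
        return SINGLE
    return MULTI if transfer_time <= max_subnet_time else SINGLE

# Network too large for one device -> multi-device, whatever the user asked.
print(choose_parallel_mode(20, 1, 12, 24, SINGLE, 1, 2))   # multi-device
# Fits on one device, user wants multi, but the link is too slow -> single.
print(choose_parallel_mode(5, 1, 12, 24, MULTI, 3, 2))     # single-device
```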
Optionally, dividing the target neural network into multiple sub-networks based on the parallel mode suited to it comprises:

if the parallel mode suited to the target neural network is the multi-device parallel mode, obtaining the number of sub-network divisions based on the number of hardware devices, and dividing the target neural network based on that number;

if the parallel mode suited to the target neural network is the single-device parallel mode, dividing the target neural network based on a preset number of sub-network divisions.
Optionally, dividing the target neural network based on the number of sub-network divisions comprises:

dividing the target neural network using, as division criteria, the number of sub-network divisions, the theoretical computation amount a single device is responsible for, and the maximum amount of data transmitted between devices;

wherein the theoretical computation amount per device is determined from the computation amount of the entire target neural network and the number of sub-network divisions, and the maximum inter-device data amount is determined from the preset maximum execution time of a sub-network and the inter-device transmission bandwidth.
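Under the definitions above, the two division criteria can be computed directly. The concrete figures below are invented for illustration:

```python
# Hypothetical numbers: a 24-TFLOP network split across 4 devices connected by
# a 100 GB/s link, with a 2 ms preset maximum sub-network execution time.
total_flops = 24e12        # computation amount of the entire target network
num_subnets = 4            # division number (one sub-network per device)
bandwidth = 100e9          # inter-device transmission bandwidth, bytes/s
max_exec_time = 2e-3       # preset maximum sub-network execution time, s

per_device_flops = total_flops / num_subnets     # theoretical amount per device
max_transfer_bytes = bandwidth * max_exec_time   # max inter-device data amount

print(per_device_flops)    # 6e12 FLOPs per device
print(max_transfer_bytes)  # 2e8 bytes (200 MB) may cross the link per hop
```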
Optionally, dividing the target neural network using, as division criteria, the number of sub-network divisions, the theoretical computation amount a single device is responsible for, and the maximum inter-device data amount comprises:

traversing backward from the input layer of the target neural network: accumulating the computation amount of each hidden layer in turn, and, when the accumulated computation amount approaches the theoretical amount a single device is responsible for, taking the sub-network formed by the accumulated adjacent hidden layers as a candidate sub-network;

if the output data amount of the candidate sub-network is less than or equal to the maximum inter-device data amount, taking the candidate sub-network as one of the divided sub-networks; if the output data amount of the candidate sub-network is greater than the maximum inter-device data amount, removing hidden layers one by one from the end of the candidate sub-network until the output data amount of the remaining sub-network is less than or equal to the maximum inter-device data amount, and taking the remaining sub-network as one of the divided sub-networks;

continuing the traversal until all sub-networks are obtained, wherein after each sub-network is obtained, the accumulation restarts with the computation amounts of the hidden layers that follow it.
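A hedged sketch of this greedy division procedure follows. The layer costs and output sizes are invented, and "approaches the theoretical amount" is approximated here as "would exceed it on the next layer" — one of several possible readings of the claim:

```python
def divide(layers, per_device_flops, max_out_bytes):
    """layers: list of (flops, output_bytes) per hidden layer, input->output.
    Returns a list of sub-networks, each a list of layer indices."""
    subnets, start = [], 0
    while start < len(layers):
        acc, end = 0, start
        # Accumulate consecutive layers until the per-device compute budget
        # would be exceeded (candidate sub-network).
        while end < len(layers) and acc + layers[end][0] <= per_device_flops:
            acc += layers[end][0]
            end += 1
        end = max(end, start + 1)            # candidate has at least one layer
        # Shrink from the end until the boundary output fits on the link.
        while end - start > 1 and layers[end - 1][1] > max_out_bytes:
            end -= 1
        subnets.append(list(range(start, end)))
        start = end                          # restart accumulation after it
    return subnets

# Five layers of 4 FLOPs each; layer 1 has an oversized 50-byte output, so the
# first sub-network is cut short to avoid placing the boundary there.
layers = [(4, 10), (4, 50), (4, 10), (4, 10), (4, 10)]
print(divide(layers, per_device_flops=8, max_out_bytes=20))
# -> [[0], [1, 2], [3, 4]]
```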
Optionally, performing forward inference on the target neural network based on the inference instances and inference engines corresponding to the multiple sub-networks comprises:

determining the dependencies between the inference engines corresponding to the multiple sub-networks according to the dependencies between the sub-networks;

feeding input data to the inference engines in order, so that each inference engine performs the computation of its corresponding sub-network based on its input data and its corresponding inference instance.
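One way to realize the dependent-engine execution described above is to chain engines with queues so that, at any moment, different inputs are in flight in different engines. This is an illustrative sketch, not the patent's implementation:

```python
import queue
import threading

def make_engine(fn, inbox, outbox):
    """One worker thread per engine; fn stands in for its sub-network."""
    def worker():
        while True:
            item = inbox.get()
            if item is None:            # poison pill: shut down and forward it
                outbox.put(None)
                return
            outbox.put(fn(item))        # run this sub-network's layers
    return threading.Thread(target=worker)

stages = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3]  # toy sub-networks
queues = [queue.Queue() for _ in range(len(stages) + 1)]
threads = [make_engine(f, queues[i], queues[i + 1]) for i, f in enumerate(stages)]
for t in threads:
    t.start()

for x in range(5):                      # feed inputs to the first engine in order
    queues[0].put(x)
queues[0].put(None)

results = []
while True:
    y = queues[-1].get()
    if y is None:
        break
    results.append(y)
print(results)   # [(x + 1) * 2 - 3 for x in range(5)] -> [-1, 1, 3, 5, 7]
```

Because each stage has a single worker reading a FIFO queue, output order matches input order even though up to three inputs are being processed simultaneously.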
A forward inference apparatus for a neural network, comprising: a network processing module, an instance-and-engine creation module, and an inference module;

the network processing module is configured to divide the target neural network into multiple sub-networks, wherein each sub-network includes at least one hidden layer of the target neural network;

the instance-and-engine creation module is configured to create, on the hardware devices of the inference platform, an inference instance and an inference engine corresponding to each of the multiple sub-networks;

the inference module is configured to perform forward inference on the target neural network based on the inference instances and inference engines corresponding to the multiple sub-networks.
Optionally, the network processing module includes an information acquisition module and a sub-network division module;

the information acquisition module is configured to obtain the hardware device information of the inference platform and the computation amount and required storage space of the target neural network;

the sub-network division module is configured to divide the target neural network into multiple sub-networks based on the hardware device information of the inference platform and the computation amount and required storage space of the target neural network.

Wherein the hardware device information of the inference platform includes one or more of the following: the number of hardware devices, the computing power of the hardware devices, the storage capacity of the hardware devices, and the transmission bandwidth between hardware devices.
Optionally, the information acquisition module includes a computation-graph construction submodule and a computation-and-storage determination submodule;

the computation-graph construction submodule is configured to construct the computation graph of the target neural network according to its network parameters;

the computation-and-storage determination submodule is configured to determine, from the computation graph, the computation amount and required storage space of each layer of the target neural network, and to determine the computation amount and required storage space of the entire target neural network from the per-layer values.
Optionally, the sub-network division module includes a parallel-mode determination submodule and a sub-network division submodule;

the parallel-mode determination submodule is configured to determine the parallel mode suited to the target neural network based on the hardware device information of the inference platform, the computation amount and required storage space of the target neural network, and the user-configured parallel mode, wherein the parallel modes include a single-device parallel mode and a multi-device parallel mode; in the single-device parallel mode, forward inference of the target neural network is performed on a single device, and in the multi-device parallel mode it is performed on multiple devices;

the sub-network division submodule is configured to divide the target neural network into multiple sub-networks based on the parallel mode suited to it.
Optionally, the parallel-mode determination submodule includes a first determination submodule and a second determination submodule;

the first determination submodule is configured to determine that the parallel mode suited to the target neural network is the multi-device parallel mode when the computation amount of the entire target neural network exceeds the computing power of a single device and/or the storage space required by the entire target neural network exceeds the storage capacity of a single device;

the second determination submodule is configured to determine the parallel mode suited to the target neural network based on the user-configured parallel mode when the computation amount of the entire target neural network is less than or equal to the computing power of a single device and the storage space it requires is less than or equal to the storage capacity of a single device.
Optionally, the second determination submodule is specifically configured to: when the user-configured parallel mode is the single-device parallel mode, determine that the parallel mode suited to the target neural network is the single-device parallel mode; when the user-configured parallel mode is the multi-device parallel mode, determine that the parallel mode suited to the target neural network is the single-device parallel mode if the inter-device transmission time is greater than the preset maximum execution time of a sub-network, and the multi-device parallel mode if the inter-device transmission time is less than or equal to that preset maximum execution time.
Optionally, the sub-network division submodule includes a first division submodule and a second division submodule;

the first division submodule is configured to, when the parallel mode suited to the target neural network is the multi-device parallel mode, obtain the number of sub-network divisions based on the number of hardware devices and divide the target neural network based on that number;

the second division submodule is configured to, when the parallel mode suited to the target neural network is the single-device parallel mode, divide the target neural network based on a preset number of sub-network divisions.
Optionally, the first division submodule is specifically configured to divide the target neural network using, as division criteria, the number of sub-network divisions, the theoretical computation amount a single device is responsible for, and the maximum inter-device data amount;

wherein the theoretical computation amount per device is determined from the computation amount of the entire target neural network and the number of sub-network divisions, and the maximum inter-device data amount is determined from the preset maximum execution time of a sub-network and the inter-device transmission bandwidth.
Optionally, the first division submodule is specifically configured to: traverse backward from the input layer of the target neural network, accumulating the computation amount of each hidden layer in turn; when the accumulated computation amount approaches the theoretical amount a single device is responsible for, take the sub-network formed by the accumulated adjacent hidden layers as a candidate sub-network; if the output data amount of the candidate sub-network is less than or equal to the maximum inter-device data amount, take the candidate sub-network as one of the divided sub-networks; if it is greater, remove hidden layers one by one from the end of the candidate sub-network until the output data amount of the remaining sub-network is less than or equal to the maximum inter-device data amount, and take the remaining sub-network as one of the divided sub-networks; continue the traversal until all sub-networks are obtained, wherein after each sub-network is obtained, the accumulation restarts with the computation amounts of the hidden layers that follow it.
Optionally, the inference module is configured to determine the dependencies between the inference engines corresponding to the multiple sub-networks according to the dependencies between the sub-networks, and to feed input data to the inference engines in order, so that each inference engine performs the computation of its corresponding sub-network based on its input data and its corresponding inference instance.
A forward inference device for a neural network, comprising a memory and a processor;

the memory is configured to store a program;

the processor is configured to execute the program to implement each step of the forward inference method for a neural network described above.

A readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements each step of the forward inference method for a neural network described above.
It can be seen from the above technical solutions that the forward inference method for a neural network provided by this application first divides the target neural network into multiple sub-networks, then creates an inference instance and an inference engine for each sub-network on the inference platform, and finally performs forward inference on the target neural network based on those instances and engines. Because there are multiple inference engines and each is responsible for only a portion of the target network's hidden layers, multiple data items can, at the same moment, be executing the operations of their respective sub-networks in parallel in different inference engines. Compared with the existing scheme, multiple inference engines compute on multiple inputs simultaneously, so hardware resources are fully used and hardware utilization is improved; at the same time, inference efficiency and data throughput are improved, and storage space is saved while storage resources remain unchanged.
Brief description of the drawings
In order to explain the technical solutions in the embodiments of this application or in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. The drawings described below are only embodiments of this application; those of ordinary skill in the art may derive other drawings from them without creative effort.
Fig. 1 is a schematic diagram of a forward inference process with a single inference instance;

Fig. 2 is a schematic diagram of a forward inference process with multiple inference instances, provided by an embodiment of this application;

Fig. 3 is a flow diagram of the forward inference method for a neural network provided by an embodiment of this application;

Fig. 4 is a flow diagram of obtaining the computation amount and required storage space of the target neural network, provided by an embodiment of this application;

Fig. 5 is a flow diagram of dividing the target neural network into multiple sub-networks based on the hardware device information of the inference platform and the computation amount and required storage space of the target neural network, provided by an embodiment of this application;

Fig. 6 is a flow diagram of an optional implementation of determining the parallel mode suited to the target neural network based on the hardware device information of the inference platform, the computation amount and required storage space of the target neural network, and the user-configured parallel mode, provided by an embodiment of this application;

Fig. 7 is a schematic diagram of an example of dividing a neural network into sub-networks, provided by an embodiment of this application;

Fig. 8 is a schematic diagram of creating inference engines in the multi-device parallel mode, provided by an embodiment of this application;

Fig. 9 is a schematic diagram of an example of the inference process of a neural network, provided by an embodiment of this application;

Fig. 10 is a structural diagram of the forward inference apparatus for a neural network provided by an embodiment of this application;

Fig. 11 is a structural diagram of the forward inference device for a neural network provided by an embodiment of this application.
Specific embodiment
The technical solutions in the embodiments of this application are described below clearly and completely in conjunction with the accompanying drawings. The described embodiments are obviously only a part of the embodiments of this application, not all of them. All other embodiments obtained by those of ordinary skill in the art based on the embodiments in this application without creative effort fall within the scope of protection of this application.
In the course of making the invention, the inventors found that the existing inference scheme creates a single inference instance and a single inference engine for the entire neural network, that is, it performs inference on the entire network through one instance and one engine. Specifically:

First, a computation graph is constructed from the parameters of the neural network to be inferred, and the overall execution order of the network is determined from the computation graph. Then, following that overall execution order, the execution functions of the different hidden layers are placed in order into the inference engine's execution queue via the inference instance. When input data arrives, the inference engine performs the operations of each hidden layer according to the order of the execution functions in its internal queue. The different hidden layers of the entire network thus execute serially, and the engine can accept the next input only after it has finished the operations of all hidden layers for the current input. That is to say, in the existing scheme, the forward inference of a single input is strictly sequential within the engine's execution queue, and different input data are also strictly sequential.
As shown in Fig. 1, for a neural network with N hidden layers, a single inference instance is created on the inference platform. After input data X enters the network, it passes through the operations of the N hidden layers in order and finally yields output Y. The execution time of the whole network is T = T1 + T2 + … + TN, where Ti denotes the execution time of the i-th hidden layer. For a deep neural network with many hidden layers, a single forward inference takes a long time, and at any moment only one hidden layer is executing while all other layers are idle, so forward inference based on a single instance has limited computational throughput. Here, throughput refers to the number of input data items processed per unit time.
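The throughput penalty can be made concrete with a back-of-the-envelope calculation; the per-layer times below are invented for illustration:

```python
# Invented per-layer execution times T1..TN, in milliseconds.
layer_times = [2.0, 3.0, 1.5, 3.5]
T = sum(layer_times)                   # one full forward pass: 10 ms

# Serial scheme: one input finishes every T ms.
serial_throughput = 1000 / T           # inputs per second

# Ideal pipeline over the same layers (here, one stage per layer): once the
# pipeline is full, one input finishes every max-stage-time ms.
pipeline_throughput = 1000 / max(layer_times)

print(serial_throughput, pipeline_throughput)   # 100.0 vs ~285.7 inputs/s
```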
With the continuous development of chip technology, the computing power of hardware devices suited to deep learning has improved enormously. Taking NVIDIA GPUs (Graphics Processing Units) as an example, the M40 reaches a single-precision computing power of 7 Tflops (7×10^12 floating-point operations per second), the P40 reaches 12 Tflops, the V100 reaches 15 Tflops, and its newly added Tensor Cores reach a theoretical peak of up to 120 Tflops. In the existing forward inference scheme, a single inference engine performs the operations of only a single hidden layer at any moment, and the computation of a single hidden layer can hardly keep a GPU fully loaded. The existing forward inference scheme therefore leaves the utilization of hardware devices very low and seriously wastes hardware resources.
To improve the speed of forward inference and the utilization of hardware devices, the inventors conducted in-depth research:

The initial idea was: to make full use of hardware computing resources, create multiple inference instances, each responsible for one input data item and performing the operations of the whole network on it; during forward inference, multiple instances run simultaneously, as shown in Fig. 2, where four inference instances are opened at once.

The inventors found, however, that although this idea can exploit the powerful computing capability of hardware devices to some extent, execution inside each instance is still serial (as shown in Fig. 2, instances 0-4 are internally serial: after input data X0 enters the network in instance 0, it passes through the N hidden layers in order to produce output Y0, and only after Y0 is obtained can input data X4 enter that instance for computation), so the serial bottleneck is not removed. Moreover, each inference instance needs its own storage resources; when the network has many layers, the storage demand of the whole network can reach gigabytes, and the increased storage demand raises hardware prices and hence the cost of forward inference.
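A rough comparison of storage demand under the two approaches illustrates the point; all figures below are invented:

```python
# Invented figures: a network whose weights/activations need 4 GB in total,
# with 4-way parallelism in both schemes.
full_network_gb = 4.0
num_parallel = 4

# Multi-instance scheme: every instance replicates the whole network's storage.
multi_instance_gb = num_parallel * full_network_gb   # 16 GB

# Pipelined sub-network scheme: each engine stores only its own sub-network,
# so the total stays at roughly one copy of the network.
pipelined_gb = full_network_gb                       # 4 GB

print(multi_instance_gb, pipelined_gb)
```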
In view of these problems, the inventors carried out further research and finally proposed a forward inference scheme with better results. The forward inference method for a neural network provided by this application is introduced through the following embodiments.

Referring to Fig. 3, which shows a flow diagram of the forward inference method for a neural network provided by an embodiment of this application, the method may include:
Step S301: Divide the target neural network into multiple sub-networks.

Here, the target neural network is the neural network to be inferred. It should be understood that a target neural network generally contains multiple hidden layers that execute operations in order. Of the multiple sub-networks obtained by division, each may contain a single hidden layer or multiple consecutive adjacent hidden layers, and there are sequential dependencies between the sub-networks.
Specifically, may include: by the process that target nerve network is divided into multiple sub-networks
Step S3011, the hardware equipment information and the calculation amount of target nerve network and required of Inference Platform are obtained
Memory space.
Wherein, Inference Platform can with but be not limited to GPU server, TPU (Tensor Processing Unit, tensor
Processing unit) server etc., the hardware device of Inference Platform can be the equipment with storage capacity and computing capability, such as aobvious
Card.
The hardware device information may include one or more of: the number of hardware devices, the computing capability of each hardware device, the storage capacity of each hardware device, and the transmission bandwidth between hardware devices; preferably, it includes all four. In one possible implementation, an intrinsic function may be called to obtain the hardware device information of the inference platform when the inference framework starts.
Illustratively, suppose the inference platform is a GPU (Graphics Processing Unit) server with four P40 graphics cards. The CUDA function interface may be called to obtain the following hardware device information: the number of hardware devices is 4; the compute capability of each device is 6.2, and a table lookup shows a single-precision throughput of 12 TFLOPS (i.e., 12 trillion floating-point operations per second); the storage capacity of each device is 24 GB; the bandwidth of the PCIe interface between devices is 10 GB/s, while NVLink bandwidth reaches 100 GB/s.
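For illustration, the hardware device information from this example can be collected into a simple structure (a sketch under the assumption that these four fields are all the division logic needs; a real framework would fill them by querying the CUDA device API at startup):

```python
from dataclasses import dataclass

@dataclass
class HardwareInfo:
    device_count: int      # number of hardware devices on the platform
    flops: float           # single-precision throughput per device, FLOPS
    memory_bytes: float    # storage capacity per device, bytes
    bandwidth_bytes: float # inter-device transmission bandwidth, bytes/s

# Example values for the 4-card P40 GPU server described above.
p40_platform = HardwareInfo(
    device_count=4,
    flops=12e12,           # 12 TFLOPS single precision
    memory_bytes=24e9,     # 24 GB per card
    bandwidth_bytes=10e9,  # 10 GB/s over PCIe (NVLink: ~100 GB/s)
)
```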
The computation amount of the target neural network refers to the computation amount of the entire target neural network, which can be determined from the computation amount of each hidden layer; the storage space required by the target neural network refers to the total storage space needed to perform the operations of all hidden layers of the entire neural network, which can likewise be determined from the storage space required by each hidden layer. The detailed process of obtaining the computation amount and required storage space of the target neural network is described in subsequent embodiments.
Step S3012: dividing the target neural network into multiple sub-networks based on the hardware device information of the inference platform and the computation amount and required storage space of the target neural network.
It should be noted that the hardware device information of the inference platform and the computation amount and required storage space of the target neural network determine the number of sub-networks, and the number of sub-networks in turn determines which hidden layers each sub-network contains. On this basis, the present embodiment uses the hardware device information of the inference platform and the computation amount and required storage space of the target neural network as the criteria for dividing the target neural network into sub-networks.
Step S302: creating an inference instance and an inference engine for each of the multiple sub-networks on the hardware devices of the inference platform.
Specifically, after the target neural network is divided into multiple sub-networks, one inference instance and one inference engine need to be created for each sub-network. The inference instance is responsible for the operations of the hidden layers in its corresponding sub-network, and the inference engine is responsible for receiving input data and completing the operations of the corresponding sub-network based on the input data and the corresponding inference instance.
Step S303: performing forward inference on the target neural network based on the inference instances and inference engines corresponding to the multiple sub-networks.
Since the target neural network is divided into multiple sub-networks, each with its own inference engine and inference instance, an inference engine is responsible for only one sub-network (i.e., part of the hidden layers). This allows multiple input data to be fed to multiple different inference engines at the same moment; that is, at any given moment multiple inference engines perform operations in parallel based on their input data and corresponding inference instances.
In the forward inference method for a neural network provided by the embodiments of the present application, the target neural network is first divided into multiple sub-networks; then an inference instance and an inference engine are created for each sub-network; finally, forward inference is performed on the target neural network based on the inference instances and inference engines corresponding to the sub-networks. Since there are multiple inference engines and each is responsible for only part of the hidden layers of the target neural network, multiple data inputs can execute the operations of their respective sub-networks in parallel in different inference engines at the same moment. Compared with existing inference schemes, because multiple inference engines operate on multiple input data simultaneously, the hardware resources are fully utilized, i.e., hardware utilization is improved; at the same time, inference efficiency and data throughput are improved, and storage space is saved while storage resources remain unchanged.
The process of obtaining the computation amount and required storage space of the target neural network in step S3011 above is explained below.
Referring to Fig. 4, a flow diagram of obtaining the computation amount and required storage space of the target neural network is shown, which may include:
Step S401: constructing a computation graph of the target neural network according to its network parameters.
The target neural network includes an input layer, multiple hidden layers, and an output layer. Input data enters through the input layer and passes through the operations of the hidden layers in turn (the output of one hidden layer is the input of the next); the final operation result is output through the output layer. In the present embodiment, the network parameters of the target neural network may include the number of hidden layers, the number of neurons in each hidden layer, the connection relationships between hidden layers, the serial numbers of input and output nodes, etc. These network parameters reflect the complexity of the target neural network and are related to its computation amount and required storage space.
Optionally, the present embodiment may create the computation graph based on the network parameters of the target neural network and a preset depth-first traversal algorithm. The computation graph of the target neural network is a graph that reflects the computation process of the target neural network; it includes nodes and edges, where an edge represents the operation function executed by a hidden layer and a node represents the input of an operation function.
Step S402: determining the computation amount and required storage space of each layer of the target neural network according to its computation graph.
After the computation graph of the target neural network is obtained, it is traversed to obtain the computation amount and required storage space of each hidden layer. To this end, a computation-amount calculation function and a required-storage-space calculation function may be set in advance for each hidden layer and associated or bound with that hidden layer. It should be noted that hidden layers come in many types, such as convolutional layers, pooling layers, and fully connected layers, and different calculation functions need to be set for different types of hidden layers.
Optionally, the present embodiment may use the number of multiply-add operations required to complete a hidden layer to represent its computation amount. Illustratively, for a fully connected layer with input dimensions r*k and n neurons, the computation amount is r*k*n*2 and the required storage space is r*k+k*n+r*n; for a convolutional layer with input dimensions v*c*h*w, convolution kernel kh*kw, stride sh*sw, and f output channels, the computation amount is approximately (v*c*h*w*kh*kw*f*2)/(sh*sw) and the required storage space is approximately v*c*h*w+f*c*kh*kw.
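A minimal sketch of these two cost formulas as Python functions (variable names mirror the symbols in the text; the function names are illustrative, not from the described framework):

```python
def fc_layer_cost(r, k, n):
    """Multiply-add count and storage for a fully connected layer with
    input dimensions r*k and n neurons, per the formulas in the text."""
    flops = r * k * n * 2
    storage = r * k + k * n + r * n  # input + weights + output
    return flops, storage

def conv_layer_cost(v, c, h, w, kh, kw, sh, sw, f):
    """Approximate multiply-add count and storage for a convolutional
    layer with input v*c*h*w, kernel kh*kw, stride sh*sw, f output channels."""
    flops = (v * c * h * w * kh * kw * f * 2) // (sh * sw)
    storage = v * c * h * w + f * c * kh * kw  # input + kernel weights
    return flops, storage
```

Functions like these would be the per-layer-type "calculation functions" bound to each hidden layer during the computation-graph traversal.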
Step S403: determining the computation amount and required storage space of the entire target neural network from the computation amount and required storage space of each layer.
After the computation amount and required storage space of each hidden layer of the target neural network are obtained, the computation amounts of all hidden layers are accumulated to obtain the computation amount of the entire target neural network; likewise, the storage space required by all hidden layers is accumulated to obtain the storage space required by the entire target neural network.
After the hardware device information of the inference platform and the computation amount and required storage space of the entire target neural network are obtained, the target neural network is divided into sub-networks based on this information.
The implementation of the above step S3012 — dividing the target neural network into multiple sub-networks based on the hardware device information of the inference platform and the computation amount and required storage space of the target neural network — is introduced below. Referring to Fig. 5, a flow diagram of this implementation is shown, which may include:
Step S501: determining a parallel mode suitable for the target neural network based on the hardware device information of the inference platform, the computation amount and required storage space of the target neural network, and the parallel mode configured by the user.
The parallel modes include a single-device parallel mode and a multi-device parallel mode. In the single-device parallel mode, the forward inference process of the entire target neural network is carried out on a single device; in the multi-device parallel mode, it is carried out on multiple devices (which may be all of the hardware devices on the inference platform, or only some of them).
It should be noted that the parallel mode configured by the user may or may not be suitable for the target neural network; for example, the user-configured parallel mode may be unable to support the computation amount required by the target neural network, or the storage space it requires. In view of this, it is necessary to determine a parallel mode that is truly suitable for the target neural network. When doing so, the computing capability and storage capacity of the hardware devices, the operation requirements of the target neural network, and the user-configured parallel mode should all be considered.
Step S502: dividing the target neural network into multiple sub-networks based on the parallel mode suitable for it.
After the parallel mode suitable for the target neural network is determined, the number of sub-networks can be determined based on this parallel mode, and the target neural network is then divided according to the determined number of sub-networks.
The implementation of the above step S501 — determining a parallel mode suitable for the target neural network based on the hardware device information of the inference platform, the computation amount and required storage space of the target neural network, and the user-configured parallel mode — is introduced first.
The process of determining a suitable parallel mode may include: if the computation amount of the entire target neural network is greater than the computing capability of a single device, and/or the storage space required by the entire target neural network is greater than the storage capacity of a single device, it is determined that the parallel mode suitable for the target neural network is the multi-device parallel mode; if the computation amount of the entire target neural network is less than or equal to the computing capability of a single device and the required storage space is less than or equal to the storage capacity of a single device, the suitable parallel mode is determined based on the user-configured parallel mode.
It should be noted that if the computation amount of the entire target neural network is greater than the computing capability of a single device, or the required storage space is greater than the storage capacity of a single device, a single device cannot satisfy the operation requirements of the target neural network. In that case, no matter which parallel mode the user has configured, the final parallel mode must be the multi-device parallel mode; that is, if the user-configured mode is the single-device parallel mode, it needs to be adjusted to the multi-device parallel mode, and if it is already the multi-device parallel mode, it remains unchanged.
It should also be noted that if the computation amount of the entire target neural network is less than or equal to the computing capability of a single device and the required storage space is less than or equal to the storage capacity of a single device, a single device can satisfy the operation requirements of the target neural network. In this case both the single-device and multi-device parallel modes can meet the operation requirements, and the suitable parallel mode can be determined based on the user-configured parallel mode.
Further, the process of determining the suitable parallel mode based on the user-configured parallel mode includes: when the user-configured mode is the single-device parallel mode, the single-device parallel mode can be taken directly as the parallel mode suitable for the target neural network; when the user-configured mode is the multi-device parallel mode, one optional implementation is to take the multi-device parallel mode directly as the suitable parallel mode. However, inter-device data transmission exists in the multi-device parallel mode, and if the inter-device transmission time is too long it will inevitably affect the inference rate of the target neural network; in that situation, using the multi-device parallel mode is not a preferred scheme.
For example, for P40 cards connected via PCIe, the transmission bandwidth is only 10 GB/s, while the single-precision compute throughput can reach 12 TFLOPS (i.e., 12 trillion floating-point operations per second), 1200 times the transmission rate. Take a fully connected layer with input dimensions m*k and n neurons: the computation amount of this layer is m*n*k*2 and its output data volume is m*n. When k is small, the inter-device transmission time is greater than the device's own computation time, so the device has to suspend and wait for the data to arrive, wasting computing resources and increasing the total inference time; in this case, inference in the multi-card parallel mode performs poorly.
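This trade-off can be checked with a quick calculation (a sketch assuming 4-byte single-precision values and the P40 figures above; `fc_times` is an illustrative helper, not part of the described framework):

```python
COMPUTE_FLOPS = 12e12   # single-precision throughput of one P40, FLOPS
PCIE_BANDWIDTH = 10e9   # inter-device bandwidth over PCIe, bytes/s

def fc_times(m, k, n, bytes_per_value=4):
    """Compute time vs. transfer time for a fully connected layer with
    input m*k and n neurons, whose m*n output must cross the PCIe link."""
    compute_s = (m * n * k * 2) / COMPUTE_FLOPS
    transfer_s = (m * n * bytes_per_value) / PCIE_BANDWIDTH
    return compute_s, transfer_s

# With small k, transfer dominates and the downstream device sits idle:
c, t = fc_times(m=32, k=128, n=1024)
print(t > c)  # True: moving the output takes longer than computing it
```

Under these assumptions the two times break even around k = 2400; below that, splitting this layer's output across PCIe-connected devices costs more time than it saves.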
In view of this, in a preferred implementation the suitable parallel mode may be determined based on the inter-device transmission time and a preset maximum execution time of a sub-network. Specifically, if the inter-device transmission time is greater than the preset maximum execution time of a sub-network, it is determined that the parallel mode suitable for the target neural network is the single-device parallel mode; if the inter-device transmission time is less than or equal to the preset maximum execution time of a sub-network, it is determined that the suitable parallel mode is the multi-device parallel mode.
Referring to Fig. 6, a flow diagram of one optional specific implementation of determining a parallel mode suitable for the target neural network — based on the hardware device information of the inference platform, the computation amount and required storage space of the target neural network, and the user-configured parallel mode — is shown, which may include:
Step S601: judging whether the computation amount of the target neural network is greater than the computing capability of a single device; if not, executing step S602; if so, executing step S603.
Step S602: judging whether the storage space required by the target neural network is greater than the storage capacity of a single device; if so, executing step S603; if not, executing step S604.
It should be noted that the present embodiment does not restrict the execution order of steps S601 and S602 to the above order; for example, step S602 may be executed first and then step S601, or the two steps may be executed in parallel. Whichever order is used, step S603 is executed when either judgment result is yes, and step S604 is executed when both judgment results are no.
Step S603: determining that the parallel mode suitable for the target neural network is the multi-device parallel mode.
Step S604: judging whether the user-configured parallel mode is the multi-device parallel mode; if not, executing step S605; if so, executing step S606.
Step S605: determining that the parallel mode suitable for the target neural network is the single-device parallel mode.
Step S606: judging whether the inter-device transmission time is greater than the preset maximum execution time of a sub-network; if so, executing step S605; if not, executing step S603.
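The decision flow of steps S601–S606 can be sketched as a single function (an illustrative rendering; the names `user_mode`, `transfer_time`, and `max_subnet_time` are assumptions standing in for the user-configured mode, the inter-device transmission time, and the preset maximum sub-network execution time):

```python
def choose_parallel_mode(net_flops, net_storage,
                         device_flops, device_storage,
                         user_mode, transfer_time, max_subnet_time):
    """Return 'multi' or 'single' following steps S601-S606 above."""
    # S601/S602: a single device cannot compute or hold the whole network.
    if net_flops > device_flops or net_storage > device_storage:
        return 'multi'                                   # S603
    # S604: otherwise defer to the user-configured mode.
    if user_mode != 'multi':
        return 'single'                                  # S605
    # S606: multi-device only pays off if transfers fit the time budget.
    return 'single' if transfer_time > max_subnet_time else 'multi'
```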
The determination of a parallel mode suitable for the target neural network — based on the hardware device information of the inference platform, the computation amount and required storage space of the target neural network, and the user-configured parallel mode — is further illustrated below with a specific example:
The hardware devices of the inference platform are P40 graphics cards. The actual storage capacity of a single P40 card is 24 GB; after removing some system space and reserved space, the usable storage capacity of a single P40 card is 22 GB. The single-precision peak computing capability of a P40 card is 12 TFLOPS; since this theoretical peak is hard to reach in practice, and considering the influence of computation scale, read/write latency, etc., 8 TFLOPS is taken as the average computing capability of a single P40 card. The parallel modes of the inference platform include a single-card parallel mode and a multi-card parallel mode; the computation amount of the target neural network is S and its required storage space is M. The suitable parallel mode is determined as follows:
If M > 22 GB or S/(8*10^12) > T1max (i.e., the storage demand of the entire target neural network is greater than the available video memory of a single card, or its computation amount exceeds the average computing capability of a single card), a single card cannot complete the forward inference task of the entire target neural network; in this case, the multi-card parallel mode is determined to be the parallel mode suitable for the target neural network. Here T1max is the user-set maximum execution time for a single card to complete one input operation. It should be noted that when M > 22 GB or S/(8*10^12) > T1max, the multi-card parallel mode is determined to be the suitable parallel mode no matter which mode the user has configured.
If M ≤ 22 GB and S/(8*10^12) ≤ T1max (i.e., the storage demand of the entire target neural network is less than or equal to the available video memory of a single card, and its computation amount is less than or equal to the average computing capability of a single card), a single card can complete the forward inference task of the entire target neural network; in this case, the suitable parallel mode is determined based on the user-configured parallel mode. Specifically, if the user-configured mode is the single-card parallel mode, it is determined that the parallel mode suitable for the target neural network is the single-card parallel mode; if the user-configured mode is the multi-card parallel mode, the suitable mode is further determined based on the inter-card transmission time Tt and the preset maximum sub-network execution time T2max: if Tt > T2max, the suitable parallel mode is the single-device parallel mode; if Tt ≤ T2max, it is the multi-device parallel mode. Here the inter-card transmission time Tt = m/B, where m is the amount of data exchanged between sub-networks and B is the inter-card transmission bandwidth.
After the parallel mode suitable for the target neural network is determined, the target neural network can be divided into multiple sub-networks based on this parallel mode. Dividing the target neural network into multiple sub-networks based on the suitable parallel mode is introduced below.
The implementation may include: if the parallel mode suitable for the target neural network is the multi-device parallel mode, obtaining the number of sub-networks based on the number of hardware devices and dividing the target neural network accordingly; if the suitable parallel mode is the single-device parallel mode, dividing the target neural network according to a preset number of sub-networks.
It should be noted that in the multi-device parallel mode, if there are more than two devices, the number of hardware devices actually used can be determined based on the information of each hardware device and the operation requirements of the target neural network; for example, if there are 5 hardware devices on the inference platform, only 3 of them may be used, or, of course, all hardware devices on the platform may be used directly. That is, when the suitable parallel mode is the multi-device parallel mode, P (2 ≤ P ≤ M, where M is the number of hardware devices on the inference platform) may be taken as the number of sub-networks: the target neural network is divided into P sub-networks and each device is responsible for a computation amount of S/P. Preferably, M — the number of hardware devices on the inference platform — is taken as the number of sub-networks, so that the target neural network is divided into M sub-networks and each device is responsible for a computation amount of S/M, where S is the computation amount of the entire target neural network.
The process of dividing the target neural network according to the determined number of sub-networks, when the suitable parallel mode is the multi-device parallel mode, is introduced below.
In one possible implementation, the process of dividing the target neural network according to the number of sub-networks may include: based on the number of sub-networks, dividing the target neural network using the theoretical computation amount each single device is responsible for and the maximum amount of data that can be transmitted between devices as the division criteria.
The theoretical computation amount a single device is responsible for is determined by the computation amount of the entire target neural network and the number of sub-networks: specifically, if the computation amount of the entire target neural network is S and the number of sub-networks is M, the theoretical computation amount per device is S/M. The maximum amount of data transmitted between devices is determined by the preset maximum execution time of a sub-network and the inter-device transmission bandwidth: specifically, if the preset maximum execution time of a sub-network is T2max and the inter-device transmission bandwidth is B, the maximum amount of data transmitted between devices is mmax = T2max*B.
Further, the process of dividing the target neural network using the per-device theoretical computation amount and the maximum inter-device data amount as division criteria may include traversing backward from the input layer of the target neural network: the computation amounts of the hidden layers are accumulated in turn, and when the currently accumulated computation amount approaches the theoretical computation amount a single device is responsible for (e.g., S/M), the sub-network composed of the accumulated adjacent hidden layers is taken as a candidate sub-network. If the output data volume of the candidate sub-network (i.e., the data volume output by its last hidden layer) is less than or equal to the maximum inter-device data amount mmax, the candidate sub-network is taken as a sub-network obtained by the division; if the output data volume of the candidate sub-network is greater than mmax, hidden layers are removed one by one from the end of the candidate sub-network until the output data volume of the remaining sub-network is less than or equal to mmax, and the sub-network after removal is taken as a sub-network obtained by the division. The traversal then continues backward until all sub-networks are obtained; after each sub-network is obtained, the accumulation restarts from the hidden layer following it.
Illustratively, as shown in Fig. 7, the target neural network includes Q hidden layers, denoted in order Layer1, Layer2, ..., LayerQ. Traversing backward from the input layer, the computation amounts of the hidden layers are accumulated in turn to obtain Ssum(i): for example, when the first hidden layer is traversed, Ssum(1) is the computation amount of the first hidden layer; when the second hidden layer is traversed, Ssum(2) is the sum of the computation amounts of the first and second hidden layers; and so on. When Ssum(K) approaches or equals the theoretical computation amount a single device is responsible for, Layer1~LayerK are taken as a candidate sub-network, and the output data volume of the candidate sub-network is further compared with the maximum inter-device data amount. If the output data volume of the candidate sub-network is less than or equal to the maximum inter-device data amount, the candidate sub-network is taken as the first sub-network obtained by the division; if it is greater, hidden layers are removed one by one from the end of the candidate sub-network until the output data volume of the remaining sub-network is less than or equal to the maximum inter-device data amount. For example, if after removing LayerK and LayerK-1 the output data volume of the remaining sub-network is less than or equal to the maximum inter-device data amount, then Layer1~LayerK-2 is taken as the first sub-network obtained by the division. The subsequent sub-networks are then obtained one by one according to the same strategy. It should be noted that after each sub-network is obtained, the accumulation restarts from the first hidden layer following that sub-network.
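The traversal of Fig. 7 can be sketched as a greedy partitioner (illustrative only; `layer_flops[i]` and `layer_out[i]` stand for the computation amount and the output data volume of hidden layer i, which the text derives from the computation graph):

```python
def partition(layer_flops, layer_out, target_flops, m_max):
    """Group consecutive hidden layers into sub-networks whose accumulated
    computation approaches target_flops (e.g. S/M) and whose boundary
    output does not exceed the inter-device limit m_max. Returns a list
    of (first_layer, last_layer) index pairs."""
    subnets, start = [], 0
    i, acc = 0, 0
    n = len(layer_flops)
    while i < n:
        acc += layer_flops[i]
        if acc >= target_flops or i == n - 1:
            end = i
            # Trim layers from the end until the boundary output fits m_max.
            while end > start and layer_out[end] > m_max:
                end -= 1
            subnets.append((start, end))
            start = end + 1
            i = end  # resume accumulation right after the trimmed boundary
            acc = 0
        i += 1
    return subnets
```

With uniform layer costs and a permissive m_max this reduces to even slices; a large boundary output forces the cut to move earlier, exactly as in the LayerK/LayerK-1 example above.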
The process of dividing the target neural network according to a preset number of sub-networks, when the suitable parallel mode is the single-device parallel mode, is introduced below.
When the single-device parallel mode is used, the target neural network can be divided according to a preset number of sub-networks. It should be noted that the number of sub-networks should be set appropriately and should not be too large: if the number of sub-networks is too large, the computation amount of each sub-network becomes very small, the data synchronization time between sub-networks grows relative to the computation time of a sub-network, and the throughput of the sub-networks is dragged down. In the single-device parallel mode, the target neural network can be divided based on the average computation load (the computation amount of the entire target neural network divided by the preset number of sub-networks). Illustratively, if the preset number of sub-networks is 8 and the computation amount of the entire target neural network is S, the average computation load is S/8. The target neural network is divided by accumulating the computation amounts of the hidden layers in turn: when the currently accumulated computation amount approaches or equals S/8, the sub-network composed of the accumulated adjacent hidden layers is taken as one sub-network obtained by the division; the accumulation then restarts from the first hidden layer following that sub-network, and when the accumulated computation amount again approaches or equals S/8, the sub-network composed of the adjacent hidden layers accumulated in this round is taken as another sub-network obtained by the division. Proceeding in this manner, all sub-networks are obtained, each with a computation amount close or equal to S/8.
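The accumulation against the average load can be sketched as follows (an illustrative helper under the assumption that the per-layer computation amounts are already known; it greedily closes a group whenever the running sum reaches S divided by the preset number of sub-networks):

```python
def split_by_average_load(layer_flops, num_subnets):
    """Divide hidden layers into num_subnets consecutive groups whose
    accumulated computation each approaches the average load."""
    total = sum(layer_flops)
    target = total / num_subnets  # the average load, e.g. S/8
    groups, start, acc = [], 0, 0
    for i, flops in enumerate(layer_flops):
        acc += flops
        if acc >= target and len(groups) < num_subnets - 1:
            groups.append((start, i))  # close this sub-network
            start, acc = i + 1, 0      # restart from the next hidden layer
    groups.append((start, len(layer_flops) - 1))
    return groups
```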
It should be noted that in the single-device parallel mode the division into sub-networks is performed over the entire target neural network; for a single device in the multi-card parallel mode, the sub-network obtained by the division can be further divided, and the specific manner of division can refer to the manner in which the entire target neural network is divided in the single-device parallel mode.
After the target neural network is divided into multiple sub-networks, an inference instance and an inference engine need to be created for each sub-network on the hardware devices of the inference platform. Specifically, in the multi-device parallel mode, an inference instance and an inference engine need to be created for each sub-network on its corresponding hardware device. Referring to Fig. 8, a schematic diagram of creating inference engines in the multi-device parallel mode is shown: the neural network in Fig. 8 is divided into 4 sub-networks, each sub-network corresponds to one hardware device, and when the inference engines are created, one inference engine is created on each of the 4 hardware devices. In the single-device parallel mode, a corresponding inference instance and inference engine need to be created for each sub-network on one device.
After the inference instances and inference engines corresponding to the multiple sub-networks have been created, forward inference is performed on the entire target neural network based on those inference instances and inference engines. Since there are sequential dependencies among the multiple sub-networks, when performing forward inference on the entire target neural network based on the corresponding inference instances and inference engines, the sequential dependencies among the corresponding inference engines are first established from the sequential dependencies among the sub-networks. Specifically, read-write flags can be established between the inference engines corresponding to the sub-networks based on those dependencies (as shown in Fig. 8, read-write flags are established between the inference engine created on device 1 and the inference engine created on device 2, between the inference engine created on device 2 and the inference engine created on device 3, and between the inference engine created on device 3 and the inference engine created on device 4). Then, data is input in order to the inference engines corresponding to the sub-networks, so that each inference engine performs the operations of its corresponding sub-network based on its input data and its corresponding inference instance.
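One minimal way to realize the read-write flags between adjacent engines is a bounded queue per device boundary, so an engine only reads once its upstream neighbor has written; the thread-and-queue structure below is an illustrative sketch under that assumption, not the patent's concrete implementation:

```python
import queue
import threading

def make_engine(stage_id, fn, inbox, outbox):
    """Each engine repeatedly reads from its upstream queue, runs its
    sub-network's operations, and writes to its downstream queue."""
    def run():
        while True:
            data = inbox.get()       # blocks until upstream has written
            if data is None:         # sentinel: shut the pipeline down
                outbox.put(None)
                return
            outbox.put(fn(data))     # this stage's sub-network computation
    return threading.Thread(target=run, name=f"engine{stage_id}")

# Four illustrative sub-networks; each stage just adds its own offset.
stages = [lambda x, k=k: x + k for k in range(4)]
queues = [queue.Queue(maxsize=1) for _ in range(5)]
threads = [make_engine(k, stages[k], queues[k], queues[k + 1]) for k in range(4)]
for t in threads:
    t.start()
for item in [10, 20, 30, None]:      # feed inputs in order, then the sentinel
    queues[0].put(item)
results = []
while True:
    out = queues[-1].get()
    if out is None:
        break
    results.append(out)
print(results)                        # → [16, 26, 36]
```

While engine 2 processes data 1's intermediate result, engine 1 already processes data 2, which is exactly the overlapped execution the description attributes to the read-write flags.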
The inference process of the neural network in Fig. 8 is as follows: data 1 is input sequentially to device 1 (which completes the operations of sub-network 1), device 2 (which completes the operations of sub-network 2), device 3 (which completes the operations of sub-network 3), and device 4 (which completes the operations of sub-network 4), thereby completing the forward inference for data 1. It should be emphasized that while device 1's output for data 1 is being sent to device 2, data 2 is input to device 1. It can thus be seen that, at the same moment, multiple input data are executed in parallel in different inference engines.
The inference process of the neural network is further illustrated below with reference to Fig. 9. Fig. 9 contains N inference engines, denoted Engine1, Engine2, ..., EngineN in order. At a moment T1, a new input data dataN is fed into inference engine Engine1 for computation; while dataN is being processed by Engine1, dataN-1 (strictly speaking, Engine1's operation result for dataN-1) is fed into inference engine Engine2 for computation, dataN-2 (strictly speaking, Engine2's operation result for dataN-2) is fed into inference engine Engine3 for computation, and so on, with data1 (strictly speaking, EngineN-1's operation result for data1) fed into inference engine EngineN for computation. It can be seen that at the same moment T1, inference engines Engine1 through EngineN all perform computation simultaneously.
Comparing the inference method in the prior art (where only one inference engine performs computation at any moment, and a single inference engine performs the computation for the whole network) with the inference method provided by the present application: assuming there are x input data, the inference time required by the prior-art method is x*t, where t is the inference time required for one input data, while the inference time required by the method provided by the present application is t/N*(2*x-1), where N is the number of inference engines. When x is large, the throughput of the entire target neural network approaches N/2 times that of the existing inference scheme, which substantially improves inference efficiency.
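The claimed ratio can be checked numerically from the two formulas; this is a plain arithmetic sketch of x*t divided by t/N*(2*x-1), with t normalized to 1:

```python
def speedup(x, n, t=1.0):
    """Ratio of the prior-art inference time (x*t) to the pipelined
    time t/N*(2*x-1) stated in the description."""
    prior = x * t
    pipelined = t / n * (2 * x - 1)
    return prior / pipelined

# With N = 8 engines, the ratio tends to N/2 = 4 as x grows.
print(round(speedup(10, 8), 3))      # → 4.211
print(round(speedup(10_000, 8), 3))  # → 4.0
```

Algebraically the ratio is x*N/(2*x-1), whose limit for large x is N/2, matching the throughput claim in the description.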
In the forward inference method of a neural network provided by the embodiments of the present application, since the neural network is divided into multiple sub-networks and each sub-network corresponds to one inference engine, each inference engine is responsible for only a part of the hidden layers of the target neural network. This allows multiple data to be input to different inference engines for computation at the same moment, and the concurrent computation of multiple inference engines at the same moment makes full use of the hardware resources of the inference platform, significantly improving inference efficiency and greatly increasing data throughput.
The embodiments of the present application also provide a forward inference device of a neural network, which is described below; the forward inference device of a neural network described below and the forward inference method of a neural network described above may be referred to in correspondence with each other.
Referring to Fig. 10, which shows a schematic structural diagram of a forward inference device of a neural network provided by the embodiments of the present application, as shown in Fig. 10, the device may include: a network processing module 1001, an instance and engine creation module 1002, and an inference module 1003.
The network processing module 1001 is configured to divide a target neural network into multiple sub-networks, wherein any sub-network includes at least one hidden layer of the target neural network.
The instance and engine creation module 1002 is configured to create, on the hardware devices of the inference platform, the inference instances and inference engines respectively corresponding to the multiple sub-networks.
The inference module 1003 is configured to perform forward inference on the target neural network based on the inference instances and inference engines respectively corresponding to the multiple sub-networks.
The forward inference device of a neural network provided by the embodiments of the present application can divide the target neural network into multiple sub-networks, then create an inference instance and an inference engine for each of the multiple sub-networks, and then perform forward inference on the target neural network based on the inference instances and inference engines corresponding to the multiple sub-networks. Since there are multiple inference engines, and one inference engine is responsible for only a part of the hidden layers of the target neural network, multiple data can be input to different inference engines for concurrent computation at the same moment. Compared with existing inference schemes, because multiple inference engines compute concurrently at the same moment, the hardware resources are fully utilized, that is, the utilization rate of the hardware resources is improved; at the same time, inference efficiency is improved, data throughput is increased, and, with the storage resources unchanged, storage space is saved.
In one possible implementation, in the forward inference device of a neural network provided by the above embodiment, the network processing module 1001 may include: an information obtaining module and a sub-network division module.
The information obtaining module is configured to obtain the hardware device information of the inference platform and the computation amount and required storage space of the target neural network.
The sub-network division module is configured to divide the target neural network into multiple sub-networks based on the hardware device information of the inference platform and the computation amount and required storage space of the target neural network.
In the forward inference device of a neural network provided by the above embodiment, the information obtaining module may include a hardware information obtaining submodule.
The hardware information obtaining submodule is configured to obtain one or more of the following pieces of information: the number of hardware devices, the computing capability of the hardware devices, the storage capacity of the hardware devices, and the transmission bandwidth between hardware devices.
In one possible implementation, the information obtaining module further includes: a computation graph construction submodule and a computation amount and storage space determination submodule.
The computation graph construction submodule is configured to construct the computation graph of the target neural network according to the network parameters of the target neural network.
The computation amount and storage space determination submodule is configured to determine, according to the computation graph of the target neural network, the computation amount and required storage space of each layer of the target neural network, and to determine the computation amount and required storage space of the entire target neural network from the computation amount and required storage space of each layer of the target neural network.
In one possible implementation, the sub-network division module in the forward inference device of a neural network provided by the above embodiment may include: a parallel mode determination submodule and a sub-network division submodule.
The parallel mode determination submodule is configured to determine a parallel mode suitable for the target neural network based on the hardware device information of the inference platform, the computation amount and required storage space of the target neural network, and a user-configured parallel mode, wherein the parallel mode includes a single-device parallel mode and a multi-device parallel mode; in the single-device parallel mode, the forward inference of the target neural network is realized on a single device, and in the multi-device parallel mode, the forward inference of the target neural network is realized on multiple devices.
The sub-network division submodule is configured to divide the target neural network into multiple sub-networks based on the parallel mode suitable for the target neural network.
In one possible implementation, the parallel mode determination submodule includes: a first determination submodule and a second determination submodule.
The first determination submodule is configured to determine that the parallel mode suitable for the target neural network is the multi-device parallel mode when the computation amount of the entire target neural network is greater than the computing capability of a single device, and/or the storage space required by the entire target neural network is greater than the storage capacity of a single device.
The second determination submodule is configured to determine the parallel mode suitable for the target neural network based on the user-configured parallel mode when the computation amount of the entire target neural network is less than or equal to the computing capability of the single device and the storage space required by the entire target neural network is less than or equal to the storage capacity of the single device.
In one possible implementation, the second determination submodule is specifically configured to: when the user-configured parallel mode is the single-device parallel mode, determine that the parallel mode suitable for the target neural network is the single-device parallel mode; and, when the user-configured parallel mode is the multi-device parallel mode, determine that the parallel mode suitable for the target neural network is the single-device parallel mode if the inter-device transmission time is greater than the preset maximum execution time of a sub-network, and determine that the parallel mode suitable for the target neural network is the multi-device parallel mode if the inter-device transmission time is less than or equal to the preset maximum execution time of the sub-network.
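The logic of the two determination submodules can be condensed into a single decision function; the argument names and the string mode labels below are illustrative assumptions rather than terms from the patent:

```python
def choose_parallel_mode(compute, storage, dev_compute, dev_storage,
                         user_mode, transfer_time, max_exec_time):
    """Pick 'multi' or 'single' following the two determination rules:
    hardware limits are checked first, then the user's configured mode."""
    # First determination: a single device cannot compute or hold the network.
    if compute > dev_compute or storage > dev_storage:
        return "multi"
    # Second determination: honor the user's choice, unless inter-device
    # transfer time exceeds the preset maximum sub-network execution time.
    if user_mode == "single":
        return "single"
    return "single" if transfer_time > max_exec_time else "multi"

print(choose_parallel_mode(100, 10, 50, 20, "single", 1, 5))  # exceeds compute → multi
print(choose_parallel_mode(40, 10, 50, 20, "multi", 8, 5))    # transfer too slow → single
```

The second call illustrates the override: even though the user asked for the multi-device mode, transmission between devices would dominate each sub-network's execution time, so the single-device mode is chosen.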
In one possible implementation, the sub-network division submodule includes: a first division submodule and a second division submodule.
The first division submodule is configured to, when the parallel mode suitable for the target neural network is the multi-device parallel mode, obtain the division number of sub-networks based on the number of hardware devices, and divide the target neural network based on the division number of sub-networks.
The second division submodule is configured to, when the parallel mode suitable for the target neural network is the single-device parallel mode, divide the target neural network based on a preset sub-network division number.
In one possible implementation, the first division submodule is specifically configured to divide the target neural network based on the division number of sub-networks, using the theoretical computation amount for which a single device is responsible and the maximum data volume of inter-device transmission as division criteria.
The theoretical computation amount for which the single device is responsible is determined from the computation amount of the entire target neural network and the division number of sub-networks, and the maximum data volume of inter-device transmission is determined from the preset maximum execution time of a sub-network and the inter-device transmission bandwidth.
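Under the two definitions above, the division criteria reduce to two one-line formulas; the units (FLOPs, seconds, bytes per second) in the example call are illustrative assumptions:

```python
def division_criteria(total_flops, num_subnets, max_exec_time, bandwidth):
    """Return (per-device theoretical computation amount, max inter-device
    data volume) as defined in the description."""
    per_device_flops = total_flops / num_subnets      # S / division number
    max_transfer_bytes = max_exec_time * bandwidth    # exec time * bandwidth
    return per_device_flops, max_transfer_bytes

# e.g. S = 8e9 FLOPs over 8 devices, a 2 ms budget on a 10 GB/s link
print(division_criteria(8e9, 8, 0.002, 10e9))
```

The first value bounds how much computation a single sub-network should carry; the second bounds how large a sub-network's output may be before transmission to the next device becomes the bottleneck.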
In one possible implementation, the first division submodule is specifically configured to traverse backward, layer by layer, starting from the input layer of the target neural network: the computation amounts of the hidden layers are accumulated in order, and when the currently accumulated computation amount is close to the theoretical computation amount for which the single device is responsible, the sub-network composed of the accumulated adjacent hidden layers is obtained as a candidate sub-network; if the output data volume of the candidate sub-network is less than or equal to the maximum data volume of inter-device transmission, the candidate sub-network is taken as a sub-network obtained by the division; if the output data volume of the candidate sub-network is greater than the maximum data volume of inter-device transmission, hidden layers are removed from the candidate sub-network one by one, starting from its last hidden layer, until the output data volume of the resulting sub-network is less than or equal to the maximum data volume of inter-device transmission, and the sub-network after the removal of hidden layers is taken as a sub-network obtained by the division. The traversal then continues backward until all sub-networks are obtained, wherein, after each sub-network is obtained, the accumulation restarts with the computation amounts of the hidden layers after that sub-network.
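The traversal with output-volume trimming can be sketched as follows; representing the network as two plain lists of per-layer computation amounts and per-layer output sizes is an illustrative assumption:

```python
def divide_with_trim(flops, out_sizes, per_device_flops, max_transfer):
    """Accumulate adjacent layers up to the per-device budget, then trim
    trailing layers until the cut point's output volume fits the link."""
    subnets, start, acc = [], 0, 0.0
    i, n = 0, len(flops)
    while i < n:
        acc += flops[i]
        if acc >= per_device_flops or i == n - 1:
            end = i
            # trim from the back while the boundary's output is too large
            while end > start and out_sizes[end] > max_transfer:
                end -= 1
            subnets.append(list(range(start, end + 1)))
            start, acc, i = end + 1, 0.0, end
        i += 1
    return subnets

layer_flops = [2, 2, 2, 2, 2, 2]
layer_out = [1, 9, 1, 9, 1, 1]   # layer 1's large output forces its cut back to layer 0
print(divide_with_trim(layer_flops, layer_out, 4, 5))
# → [[0], [1, 2], [3, 4], [5]]
```

Note how the first candidate {0, 1} is trimmed to {0} because layer 1's output (9) exceeds the transmission limit (5); the trimmed layer is then picked up by the next round of accumulation, exactly as the description specifies.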
In one possible implementation, the inference module 1003 is specifically configured to determine the dependencies between the inference engines respectively corresponding to the multiple sub-networks according to the dependencies between the multiple sub-networks, and to input data in order to the inference engines respectively corresponding to the multiple sub-networks, so that each inference engine performs the operations of its corresponding sub-network based on its input data and its corresponding inference instance.
The embodiments of the present application also provide a forward inference apparatus of a neural network. Referring to Fig. 11, which shows a schematic structural diagram of the forward inference apparatus, the apparatus may include: at least one processor 1101, at least one communication interface 1102, at least one memory 1103, and at least one communication bus 1104.
In the embodiments of the present application, the number of each of the processor 1101, the communication interface 1102, the memory 1103, and the communication bus 1104 is at least one, and the processor 1101, the communication interface 1102, and the memory 1103 communicate with each other through the communication bus 1104.
The processor 1101 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), one or more integrated circuits configured to implement the embodiments of the present invention, or the like.
The memory 1103 may include a high-speed RAM memory, and may further include a non-volatile memory, for example at least one disk memory.
The memory stores a program, and the processor may invoke the program stored in the memory, the program being configured to:
divide a target neural network into multiple sub-networks, wherein any sub-network includes at least one hidden layer of the target neural network;
create, on the hardware devices of an inference platform, the inference instances and inference engines respectively corresponding to the multiple sub-networks; and
perform forward inference on the target neural network based on the inference instances and inference engines respectively corresponding to the multiple sub-networks.
Optionally, for the refined and extended functions of the program, reference may be made to the description above.
The embodiments of the present application also provide a readable storage medium, which may store a program suitable for execution by a processor, the program being configured to:
divide a target neural network into multiple sub-networks, wherein any sub-network includes at least one hidden layer of the target neural network;
create, on the hardware devices of an inference platform, the inference instances and inference engines respectively corresponding to the multiple sub-networks; and
perform forward inference on the target neural network based on the inference instances and inference engines respectively corresponding to the multiple sub-networks.
Optionally, for the refined and extended functions of the program, reference may be made to the description above.
Finally, it should be noted that, herein, relational terms such as first and second are used merely to distinguish one entity or operation from another entity or operation, without necessarily requiring or implying any actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise", or any other variant thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or apparatus that includes a series of elements includes not only those elements but also other elements not explicitly listed, or further includes elements inherent to such a process, method, article, or apparatus. In the absence of further limitation, an element defined by the phrase "including a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that includes the element.
The embodiments in this specification are described in a progressive manner, each embodiment focusing on its differences from the other embodiments; for the same or similar parts among the embodiments, reference may be made to one another.
The foregoing description of the disclosed embodiments enables those skilled in the art to implement or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the present application. Therefore, the present application is not intended to be limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
Claims (24)
1. A forward inference method of a neural network, characterized by comprising:
dividing a target neural network into multiple sub-networks, wherein any sub-network includes at least one hidden layer of the target neural network;
creating, on hardware devices of an inference platform, inference instances and inference engines respectively corresponding to the multiple sub-networks; and
performing forward inference on the target neural network based on the inference instances and inference engines respectively corresponding to the multiple sub-networks.
2. The forward inference method of a neural network according to claim 1, characterized in that the dividing the target neural network into multiple sub-networks comprises:
obtaining hardware device information of the inference platform and a computation amount and required storage space of the target neural network; and
dividing the target neural network into multiple sub-networks based on the hardware device information of the inference platform and the computation amount and required storage space of the target neural network.
3. The forward inference method of a neural network according to claim 2, characterized in that the hardware device information of the inference platform includes one or more of the following pieces of information:
the number of hardware devices, the computing capability of the hardware devices, the storage capacity of the hardware devices, and the transmission bandwidth between hardware devices.
4. The forward inference method of a neural network according to claim 2 or 3, characterized in that obtaining the computation amount and required storage space of the target neural network comprises:
constructing a computation graph of the target neural network according to network parameters of the target neural network;
determining, according to the computation graph of the target neural network, a computation amount and required storage space of each layer of the target neural network; and
determining the computation amount and required storage space of the entire target neural network from the computation amount and required storage space of each layer of the target neural network.
5. The forward inference method of a neural network according to claim 2 or 3, characterized in that dividing the target neural network into multiple sub-networks based on the hardware device information of the inference platform and the computation amount and required storage space of the target neural network comprises:
determining a parallel mode suitable for the target neural network based on the hardware device information of the inference platform, the computation amount and required storage space of the target neural network, and a user-configured parallel mode, wherein the parallel mode includes a single-device parallel mode and a multi-device parallel mode, the forward inference of the target neural network being realized on a single device in the single-device parallel mode and on multiple devices in the multi-device parallel mode; and
dividing the target neural network into multiple sub-networks based on the parallel mode suitable for the target neural network.
6. The forward inference method of a neural network according to claim 5, characterized in that determining the parallel mode suitable for the target neural network based on the hardware device information of the inference platform, the computation amount and required storage space of the target neural network, and the user-configured parallel mode comprises:
if the computation amount of the entire target neural network is greater than the computing capability of a single device, and/or the storage space required by the entire target neural network is greater than the storage capacity of a single device, determining that the parallel mode suitable for the target neural network is the multi-device parallel mode; and
if the computation amount of the entire target neural network is less than or equal to the computing capability of the single device and the storage space required by the entire target neural network is less than or equal to the storage capacity of the single device, determining the parallel mode suitable for the target neural network based on the user-configured parallel mode.
7. The forward inference method of a neural network according to claim 6, characterized in that determining the parallel mode suitable for the target neural network based on the user-configured parallel mode comprises:
when the user-configured parallel mode is the single-device parallel mode, determining that the parallel mode suitable for the target neural network is the single-device parallel mode; and
when the user-configured parallel mode is the multi-device parallel mode, determining that the parallel mode suitable for the target neural network is the single-device parallel mode if the inter-device transmission time is greater than a preset maximum execution time of a sub-network, and determining that the parallel mode suitable for the target neural network is the multi-device parallel mode if the inter-device transmission time is less than or equal to the preset maximum execution time of the sub-network.
8. The forward inference method of a neural network according to claim 5, characterized in that dividing the target neural network into multiple sub-networks based on the parallel mode suitable for the target neural network comprises:
if the parallel mode suitable for the target neural network is the multi-device parallel mode, obtaining a division number of sub-networks based on the number of hardware devices, and dividing the target neural network based on the division number of sub-networks; and
if the parallel mode suitable for the target neural network is the single-device parallel mode, dividing the target neural network based on a preset sub-network division number.
9. The forward inference method of a neural network according to claim 6, characterized in that dividing the target neural network based on the division number of sub-networks comprises:
dividing the target neural network based on the division number of sub-networks, using a theoretical computation amount for which a single device is responsible and a maximum data volume of inter-device transmission as division criteria;
wherein the theoretical computation amount for which the single device is responsible is determined from the computation amount of the entire target neural network and the division number of sub-networks, and the maximum data volume of inter-device transmission is determined from a preset maximum execution time of a sub-network and the inter-device transmission bandwidth.
10. The forward inference method of a neural network according to claim 7, characterized in that dividing the target neural network based on the division number of sub-networks, using the theoretical computation amount for which a single device is responsible and the maximum data volume of inter-device transmission as division criteria, comprises:
traversing backward, layer by layer, from the input layer of the target neural network: accumulating the computation amounts of the hidden layers in order, and, when the currently accumulated computation amount is close to the theoretical computation amount for which the single device is responsible, obtaining the sub-network composed of the accumulated adjacent hidden layers as a candidate sub-network;
if the output data volume of the candidate sub-network is less than or equal to the maximum data volume of inter-device transmission, taking the candidate sub-network as a sub-network obtained by the division; if the output data volume of the candidate sub-network is greater than the maximum data volume of inter-device transmission, removing hidden layers from the candidate sub-network one by one, starting from its last hidden layer, until the output data volume of the resulting sub-network is less than or equal to the maximum data volume of inter-device transmission, and taking the sub-network after the removal of hidden layers as a sub-network obtained by the division; and
continuing the traversal backward until all sub-networks are obtained, wherein, after each sub-network is obtained, the accumulation restarts with the computation amounts of the hidden layers after that sub-network.
11. The forward inference method of a neural network according to claim 1, characterized in that performing forward inference on the target neural network based on the inference instances and inference engines respectively corresponding to the multiple sub-networks comprises:
determining the dependencies between the inference engines respectively corresponding to the multiple sub-networks according to the dependencies between the multiple sub-networks; and
inputting data in order to the inference engines respectively corresponding to the multiple sub-networks, so that each inference engine performs the operations of its corresponding sub-network based on its input data and its corresponding inference instance.
12. A forward inference device of a neural network, characterized by comprising: a network processing module, an instance and engine creation module, and an inference module;
the network processing module being configured to divide a target neural network into multiple sub-networks, wherein any sub-network includes at least one hidden layer of the target neural network;
the instance and engine creation module being configured to create, on hardware devices of an inference platform, inference instances and inference engines respectively corresponding to the multiple sub-networks; and
the inference module being configured to perform forward inference on the target neural network based on the inference instances and inference engines respectively corresponding to the multiple sub-networks.
13. The forward inference device according to claim 12, characterized in that the network processing module includes: an information obtaining module and a sub-network division module;
the information obtaining module being configured to obtain hardware device information of the inference platform and a computation amount and required storage space of the target neural network; and
the sub-network division module being configured to divide the target neural network into multiple sub-networks based on the hardware device information of the inference platform and the computation amount and required storage space of the target neural network.
14. The forward inference device of a neural network according to claim 13, characterized in that the hardware device information of the inference platform includes one or more of the following pieces of information:
the number of hardware devices, the computing capability of the hardware devices, the storage capacity of the hardware devices, and the transmission bandwidth between hardware devices.
15. The forward inference device of a neural network according to claim 13 or 14, characterized in that the information obtaining module includes: a computation graph construction submodule and a computation amount and storage space determination submodule;
the computation graph construction submodule being configured to construct a computation graph of the target neural network according to network parameters of the target neural network; and
the computation amount and storage space determination submodule being configured to determine, according to the computation graph of the target neural network, a computation amount and required storage space of each layer of the target neural network, and to determine the computation amount and required storage space of the entire target neural network from the computation amount and required storage space of each layer of the target neural network.
16. The forward inference device of a neural network according to claim 13 or 14, characterized in that the sub-network division module includes: a parallel mode determination submodule and a sub-network division submodule;
the parallel mode determination submodule being configured to determine a parallel mode suitable for the target neural network based on the hardware device information of the inference platform, the computation amount and required storage space of the target neural network, and a user-configured parallel mode, wherein the parallel mode includes a single-device parallel mode and a multi-device parallel mode, the forward inference of the target neural network being realized on a single device in the single-device parallel mode and on multiple devices in the multi-device parallel mode; and
the sub-network division submodule being configured to divide the target neural network into multiple sub-networks based on the parallel mode suitable for the target neural network.
17. The forward inference device for a neural network according to claim 16, wherein the parallel mode determination submodule includes a first determination submodule and a second determination submodule;
the first determination submodule is configured to determine that the parallel mode suited to the target neural network is the multi-device parallel mode when the computation amount of the entire target neural network is greater than the computing capability of a single device, and/or the memory required by the entire target neural network is greater than the memory capacity of a single device;
the second determination submodule is configured to determine the parallel mode suited to the target neural network based on the user-configured parallel mode when the computation amount of the entire target neural network is less than or equal to the computing capability of the single device and the memory required by the entire target neural network is less than or equal to the memory capacity of the single device.
18. The forward inference device for a neural network according to claim 17, wherein the second determination submodule is specifically configured to: when the user-configured parallel mode is the single-device parallel mode, determine that the parallel mode suited to the target neural network is the single-device parallel mode; when the user-configured parallel mode is the multi-device parallel mode, determine that the parallel mode suited to the target neural network is the single-device parallel mode if the inter-device transmission time is greater than the preset maximum execution time of a sub-network, and determine that it is the multi-device parallel mode if the inter-device transmission time is less than or equal to the preset maximum execution time of a sub-network.
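For illustration only (not part of the claims): a hedged sketch of the decision logic in claims 17 and 18. All names and units are invented for this example.

```python
def choose_parallel_mode(total_flops, total_mem, dev_flops, dev_mem,
                         user_mode, transfer_time, max_subnet_time):
    """Sketch of claims 17-18; argument names and units are illustrative.
    Returns "multi" (multi-device mode) or "single" (single-device mode)."""
    # Claim 17: resource pressure on one device forces multi-device mode.
    if total_flops > dev_flops or total_mem > dev_mem:
        return "multi"
    # Claim 18: otherwise honor the user's configuration, unless inter-device
    # transfer time exceeds the preset maximum sub-network execution time.
    if user_mode == "single":
        return "single"
    return "single" if transfer_time > max_subnet_time else "multi"
```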
19. The forward inference device for a neural network according to claim 16, wherein the sub-network division submodule includes a first division submodule and a second division submodule;
the first division submodule is configured to, when the parallel mode suited to the target neural network is the multi-device parallel mode, obtain the number of sub-network divisions based on the number of hardware devices, and divide the target neural network based on that number;
the second division submodule is configured to, when the parallel mode suited to the target neural network is the single-device parallel mode, divide the target neural network based on a preset number of sub-network divisions.
20. The forward inference device for a neural network according to claim 19, wherein the first division submodule is specifically configured to divide the target neural network based on the number of sub-network divisions, using as partitioning criteria the theoretical computation amount each single device is responsible for and the maximum inter-device transmission data amount;
wherein the theoretical computation amount a single device is responsible for is determined from the computation amount of the entire target neural network and the number of sub-network divisions, and the maximum inter-device transmission data amount is determined from the preset maximum execution time of a sub-network and the inter-device transmission bandwidth.
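For illustration only (not part of the claims): claim 20's two partitioning criteria reduce to two simple quantities. Names and units are this editor's assumptions.

```python
def partition_bounds(total_flops, num_subnets, max_exec_time, bandwidth):
    """Sketch of claim 20's partitioning criteria (names illustrative):
    the theoretical per-device load, and the largest amount of data that
    can be shipped between devices within the preset maximum execution
    time of a sub-network."""
    per_device_flops = total_flops / num_subnets
    max_transfer_bytes = max_exec_time * bandwidth
    return per_device_flops, max_transfer_bytes
```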
21. The forward inference device for a neural network according to claim 20, wherein the first division submodule is specifically configured to traverse backward from the input layer of the target neural network as follows: accumulate the computation amounts of successive hidden layers, and when the currently accumulated computation amount approaches the theoretical computation amount a single device is responsible for, take the sub-network formed by the accumulated adjacent hidden layers as a candidate sub-network; if the output data amount of the candidate sub-network is less than or equal to the maximum inter-device transmission data amount, take the candidate sub-network as a divided sub-network; if the output data amount of the candidate sub-network is greater than the maximum inter-device transmission data amount, remove hidden layers from the candidate sub-network one by one, starting from its last hidden layer, until the output data amount of the remaining sub-network is less than or equal to the maximum inter-device transmission data amount, and take the remaining sub-network as a divided sub-network; continue traversing backward until all sub-networks are obtained, wherein after each sub-network is obtained, accumulation restarts from the hidden layers following that sub-network.
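For illustration only (not part of the claims): a greedy-partition sketch of the traversal in claim 21, under the assumption that each layer is described by a dict `{"flops", "out_bytes"}` (an invented representation). Layers trimmed off an oversized boundary are handed back to the traversal, mirroring the claim's rule that accumulation restarts after each sub-network is obtained.

```python
def partition(layers, num_parts, max_transfer):
    """Greedy division sketch of claim 21 (representation assumed, see above)."""
    budget = sum(l["flops"] for l in layers) / num_parts  # claim 20 criterion
    subnets, cur, acc, i = [], [], 0, 0
    while i < len(layers):
        cur.append(layers[i])
        acc += layers[i]["flops"]
        i += 1
        if acc >= budget or i == len(layers):
            # Shrink until the cut edge fits the inter-device transfer limit.
            while len(cur) > 1 and cur[-1]["out_bytes"] > max_transfer:
                i -= 1                       # hand the layer back to the stream
                acc -= cur.pop()["flops"]
            subnets.append(cur)
            cur, acc = [], 0
    if cur:
        subnets.append(cur)
    return subnets
```

On the toy network below, the second layer's large output pushes it out of the first sub-network and into the next one, yielding a 1-2-1 split.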
22. The forward inference device for a neural network according to claim 12, wherein the inference module is specifically configured to determine the dependencies among the inference engines corresponding to the multiple sub-networks according to the dependencies among the multiple sub-networks;
and to input data to the inference engines corresponding to the multiple sub-networks in dependency order, so that each inference engine performs the operations of its corresponding sub-network based on the input data and its corresponding inference instance.
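For illustration only (not part of the claims): in claim 22, data flows through the inference engines in the dependency order of their sub-networks. Real engines would run concurrently on different devices, which is what lets several batches be in flight at once; the sketch below simulates the dataflow serially, with each "engine" reduced to a plain callable.

```python
def run_pipeline(engines, batches):
    """Sketch of claim 22's dependency-ordered dataflow (serial simulation;
    a real deployment would overlap engines on separate devices)."""
    results = []
    for x in batches:
        for engine in engines:   # engine k consumes engine k-1's output
            x = engine(x)
        results.append(x)
    return results
```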
23. A forward inference apparatus for a neural network, characterized by comprising a memory and a processor;
the memory is configured to store a program;
the processor is configured to execute the program to implement the steps of the forward inference method for a neural network according to any one of claims 1 to 11.
24. A readable storage medium having a computer program stored thereon, wherein when the computer program is executed by a processor, the steps of the forward inference method for a neural network according to any one of claims 1 to 11 are implemented.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910188467.6A CN109919315B (en) | 2019-03-13 | 2019-03-13 | Forward reasoning method, device, equipment and storage medium of neural network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109919315A true CN109919315A (en) | 2019-06-21 |
CN109919315B CN109919315B (en) | 2021-10-01 |
Family
ID=66964550
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910188467.6A Active CN109919315B (en) | 2019-03-13 | 2019-03-13 | Forward reasoning method, device, equipment and storage medium of neural network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109919315B (en) |
Citations (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP0513976A2 (en) * | 1991-03-15 | 1992-11-19 | Sharp Kabushiki Kaisha | A video camera having an adaptive automatic iris control circuit |
CN1659589A (en) * | 2002-04-19 | 2005-08-24 | 电脑联合想象公司 | System and method for providing inferencing services |
CN102004486A (en) * | 2010-09-26 | 2011-04-06 | 中国石油化工股份有限公司 | Hybrid fault diagnosis method based on qualitative signed directed graph in petrochemical process |
CN107203807A (en) * | 2016-03-16 | 2017-09-26 | 中国科学院计算技术研究所 | The computational methods of neutral net, system and its apparatus |
CN107451653A (en) * | 2017-07-05 | 2017-12-08 | 深圳市自行科技有限公司 | Computational methods, device and the readable storage medium storing program for executing of deep neural network |
CN107659609A (en) * | 2017-07-26 | 2018-02-02 | 北京天云融创软件技术有限公司 | A kind of deep learning support platform and deep learning training method based on cloud computing |
CN107886167A (en) * | 2016-09-29 | 2018-04-06 | 北京中科寒武纪科技有限公司 | Neural network computing device and method |
CN107945053A (en) * | 2017-12-29 | 2018-04-20 | 广州思泰信息技术有限公司 | A kind of multiple source power distribution network data convergence analysis platform and its control method |
CN107977706A (en) * | 2017-08-09 | 2018-05-01 | 小蚁科技(香港)有限公司 | Modularized distribution type artificial neural network |
CN108292241A (en) * | 2015-10-28 | 2018-07-17 | 谷歌有限责任公司 | Processing calculates figure |
CN109299283A (en) * | 2018-08-29 | 2019-02-01 | 阿里巴巴集团控股有限公司 | A kind of data reasoning method, apparatus, server and the medium of knowledge based map |
2019-03-13: application CN201910188467.6A filed; patent CN109919315B granted, status Active.
Cited By (23)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110298437B (en) * | 2019-06-28 | 2021-06-01 | Oppo广东移动通信有限公司 | Neural network segmentation calculation method and device, storage medium and mobile terminal |
CN110298437A (en) * | 2019-06-28 | 2019-10-01 | Oppo广东移动通信有限公司 | Separation calculation method, apparatus, storage medium and the mobile terminal of neural network |
CN110796242A (en) * | 2019-11-01 | 2020-02-14 | 广东三维家信息科技有限公司 | Neural network model reasoning method and device, electronic equipment and readable medium |
CN110837419B (en) * | 2019-11-08 | 2023-05-19 | 上海交通大学 | Reasoning engine system and method based on elastic batch processing and electronic equipment |
CN110837419A (en) * | 2019-11-08 | 2020-02-25 | 上海交通大学 | Inference engine system and method based on elastic batch processing and electronic equipment |
WO2021134231A1 (en) * | 2019-12-30 | 2021-07-08 | 深圳元戎启行科技有限公司 | Computing resource allocation method and apparatus based on inference engine, and computer device |
CN113412493A (en) * | 2019-12-30 | 2021-09-17 | 深圳元戎启行科技有限公司 | Inference engine-based computing resource allocation method and device and computer equipment |
WO2021143883A1 (en) * | 2020-01-15 | 2021-07-22 | 华为技术有限公司 | Adaptive search method and apparatus for neural network |
CN111753950A (en) * | 2020-01-19 | 2020-10-09 | 杭州海康威视数字技术股份有限公司 | Method, device and equipment for determining forward time consumption |
CN111753950B (en) * | 2020-01-19 | 2024-02-27 | 杭州海康威视数字技术股份有限公司 | Forward time consumption determination method, device and equipment |
CN111372084A (en) * | 2020-02-18 | 2020-07-03 | 北京大学 | Parallel reasoning method and system for neural network coding and decoding tool |
CN111372084B (en) * | 2020-02-18 | 2021-07-20 | 北京大学 | Parallel reasoning method and system for neural network coding and decoding tool |
CN113469360A (en) * | 2020-03-31 | 2021-10-01 | 杭州海康威视数字技术股份有限公司 | Inference method and device |
CN113469360B (en) * | 2020-03-31 | 2023-10-20 | 杭州海康威视数字技术股份有限公司 | Reasoning method and device |
WO2022035058A1 (en) * | 2020-08-13 | 2022-02-17 | Samsung Electronics Co., Ltd. | Method and system of dnn modularization for optimal loading |
CN114501353A (en) * | 2020-10-23 | 2022-05-13 | 维沃移动通信有限公司 | Method for sending and receiving communication information and communication equipment |
CN114501353B (en) * | 2020-10-23 | 2024-01-05 | 维沃移动通信有限公司 | Communication information sending and receiving method and communication equipment |
WO2022217419A1 (en) * | 2021-04-12 | 2022-10-20 | 深圳元戎启行科技有限公司 | Neural network model inference method and apparatus, computer device, and storage medium |
CN115186821A (en) * | 2022-09-13 | 2022-10-14 | 之江实验室 | Core particle-oriented neural network inference overhead estimation method and device and electronic equipment |
CN115186821B (en) * | 2022-09-13 | 2023-01-06 | 之江实验室 | Core particle-oriented neural network inference overhead estimation method and device and electronic equipment |
CN116739090A (en) * | 2023-05-12 | 2023-09-12 | 北京大学 | Deep neural network reasoning measurement method and device based on Web browser |
CN116739090B (en) * | 2023-05-12 | 2023-11-28 | 北京大学 | Deep neural network reasoning measurement method and device based on Web browser |
CN116629308A (en) * | 2023-07-24 | 2023-08-22 | 科大讯飞股份有限公司 | Neural network model reasoning method, device, equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN109919315B (en) | 2021-10-01 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109919315A (en) | A kind of forward inference method, apparatus, equipment and the storage medium of neural network | |
US20190079975A1 (en) | Scheduling method and system based on hybrid variable neighborhood search and gravitational search algorithm | |
US20190080271A1 (en) | Coordinated Production and Transportation Scheduling Method and System Based on Improved Tabu Search Algorithm | |
CN110348571A (en) | A kind of neural network model training method, device, chip and system | |
CN108122027A (en) | A kind of training method of neural network model, device and chip | |
CN108776897A (en) | Data processing method, device, server and computer readable storage medium | |
CN108491255B (en) | Self-service MapReduce data optimal distribution method and system | |
CN107077390A (en) | A kind of task processing method and network interface card | |
CN110187965A (en) | The running optimizatin and data processing method of neural network, equipment and storage medium | |
CN108780524A (en) | Arithmetic unit, circuit and correlation technique for neural network | |
CN103914556A (en) | Large-scale graph data processing method | |
CN107357630A (en) | A kind of method, apparatus and storage medium for realizing that virtual machine is synchronous | |
CN103873380B (en) | A kind of method of adjustment of data distribution strategy, apparatus and system | |
CN105227601A (en) | Data processing method in stream processing system, device and system | |
CN106502918A (en) | A kind of scheduling memory method and device | |
CN108491924B (en) | Neural network data serial flow processing device for artificial intelligence calculation | |
CN109933430A (en) | The method and apparatus for distributing graphics processor | |
CN105051689A (en) | Method, apparatus and system for scheduling resource pool in multi-core system | |
CN113094180B (en) | Wireless federal learning scheduling optimization method and device | |
CN112862083B (en) | Deep neural network inference method and device in edge environment | |
CN103842955B (en) | A kind of job flow control method, device and system | |
CN105335135B (en) | Data processing method and central node | |
CN105740249A (en) | Processing method and system during big data operation parallel scheduling process | |
CN107844924A (en) | A kind of execution method, apparatus and medium for controlling workflow | |
CN116911366A (en) | Computing system neural network optimization method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||