CN109919315A - Forward inference method, apparatus, device, and storage medium for a neural network - Google Patents

Forward inference method, apparatus, device, and storage medium for a neural network

Info

Publication number
CN109919315A
Authority
CN
China
Prior art keywords
network
target neural network
sub-network
inference
Prior art date
Legal status
Granted
Application number
CN201910188467.6A
Other languages
Chinese (zh)
Other versions
CN109919315B (en)
Inventor
刘凯
吕亚飞
张致江
李必然
刘远东
Current Assignee
iFlytek Co Ltd
Original Assignee
iFlytek Co Ltd
Priority date
Filing date
Publication date
Application filed by iFlytek Co Ltd
Priority to CN201910188467.6A
Publication of CN109919315A
Application granted
Publication of CN109919315B
Legal status: Active
Anticipated expiration


Abstract

This application provides a forward inference method, apparatus, device, and storage medium for a neural network. The method includes: dividing a target neural network into multiple sub-networks, where each sub-network contains at least one hidden layer of the target neural network; creating, on the hardware devices of an inference platform, an inference instance and an inference engine corresponding to each of the sub-networks; and performing forward inference on the target neural network based on the inference instances and inference engines corresponding to the sub-networks. Because each inference engine is responsible for only a part of the hidden layers of the neural network, multiple data inputs can be processed in parallel by different inference engines at the same moment. The forward inference method provided by this application therefore achieves higher inference efficiency and data throughput, and makes full use of the hardware resources of the inference platform.

Description

Forward inference method, apparatus, device, and storage medium for a neural network
Technical field
This application relates to the field of parallel computing, and more specifically to a forward inference method, apparatus, device, and storage medium for a neural network.
Background technique
Forward inference of a neural network refers to creating, on an inference platform, an inference instance and an inference engine for the neural network to be inferred, with the inference engine performing the computation of each layer of the neural network based on the input data of the input layer and the inference instance.
Current inference schemes are as follows: a single inference instance is created for the neural network to be inferred, and a single inference engine is created within that instance; the inference engine receives input data and computes every layer of the entire neural network in order based on the inference instance. That is, the computation of one input datum across the different layers is strictly sequential, and different inputs are also strictly sequential: the next input datum can only be computed after the output of the previous input datum has been obtained.
It can be seen from the above existing inference schemes that, as neural networks become deeper, the computation time from one data input to its output grows longer and longer, and the overall throughput becomes smaller and smaller. Meanwhile, with the continuous development of chip technology, the computing capability of various hardware devices suitable for neural networks has been greatly improved, yet existing inference schemes leave the utilization of these devices very low, seriously wasting hardware resources.
Summary of the invention
In view of this, this application provides a forward inference method, apparatus, device, and readable storage medium for a neural network, to solve the problems of existing inference schemes: long inference time, low efficiency, and low hardware resource utilization. The technical solution is as follows:
A forward inference method for a neural network, comprising:
dividing a target neural network into multiple sub-networks, wherein each sub-network includes at least one hidden layer of the target neural network;
creating, on hardware devices of an inference platform, an inference instance and an inference engine corresponding to each of the multiple sub-networks;
performing forward inference on the target neural network based on the inference instances and inference engines corresponding to the multiple sub-networks.
Optionally, dividing the target neural network into multiple sub-networks comprises:
obtaining hardware device information of the inference platform and the computation amount and required storage space of the target neural network;
dividing the target neural network into multiple sub-networks based on the hardware device information of the inference platform and the computation amount and required storage space of the target neural network.
Wherein, the hardware device information of the inference platform includes one or more of the following:
the number of hardware devices, the computing capability of each hardware device, the storage capacity of each hardware device, and the transmission bandwidth between hardware devices.
Optionally, obtaining the computation amount and required storage space of the target neural network comprises:
constructing a computation graph of the target neural network according to the network parameters of the target neural network;
determining the computation amount and required storage space of each layer of the target neural network according to the computation graph of the target neural network;
determining the computation amount and required storage space of the entire target neural network from the computation amount and required storage space of each layer of the target neural network.
Optionally, dividing the target neural network into multiple sub-networks based on the hardware device information of the inference platform and the computation amount and required storage space of the target neural network comprises:
determining a parallel mode suited to the target neural network based on the hardware device information of the inference platform, the computation amount and required storage space of the target neural network, and a user-configured parallel mode, wherein the parallel mode includes a single-device parallel mode and a multi-device parallel mode; in the single-device parallel mode, the forward inference of the target neural network is performed on a single device, and in the multi-device parallel mode, the forward inference of the target neural network is performed on multiple devices;
dividing the target neural network into multiple sub-networks based on the parallel mode suited to the target neural network.
Optionally, determining the parallel mode suited to the target neural network based on the hardware device information of the inference platform, the computation amount and required storage space of the target neural network, and the user-configured parallel mode comprises:
if the computation amount of the entire target neural network is greater than the computing capability of a single device, and/or the storage space required by the entire target neural network is greater than the storage capacity of a single device, determining that the parallel mode suited to the target neural network is the multi-device parallel mode;
if the computation amount of the entire target neural network is less than or equal to the computing capability of the single device, and the storage space required by the entire target neural network is less than or equal to the storage capacity of the single device, determining the parallel mode suited to the target neural network based on the user-configured parallel mode.
Optionally, determining the parallel mode suited to the target neural network based on the user-configured parallel mode comprises:
when the user-configured parallel mode is the single-device parallel mode, determining that the parallel mode suited to the target neural network is the single-device parallel mode;
when the user-configured parallel mode is the multi-device parallel mode, determining that the parallel mode suited to the target neural network is the single-device parallel mode if the inter-device transmission time is greater than a preset maximum execution time of a sub-network, and determining that the parallel mode suited to the target neural network is the multi-device parallel mode if the inter-device transmission time is less than or equal to the preset maximum execution time of a sub-network.
Optionally, dividing the target neural network into multiple sub-networks based on the parallel mode suited to the target neural network comprises:
if the parallel mode suited to the target neural network is the multi-device parallel mode, obtaining the number of sub-network partitions based on the number of hardware devices, and dividing the target neural network based on the number of sub-network partitions;
if the parallel mode suited to the target neural network is the single-device parallel mode, dividing the target neural network based on a preset number of sub-network partitions.
Optionally, dividing the target neural network based on the number of sub-network partitions comprises:
dividing the target neural network using, as partitioning criteria, the number of sub-network partitions, the theoretical computation amount for which a single device is responsible, and the maximum data amount for inter-device transmission;
wherein the theoretical computation amount for which a single device is responsible is determined from the computation amount of the entire target neural network and the number of sub-network partitions, and the maximum data amount for inter-device transmission is determined from the preset maximum execution time of a sub-network and the inter-device transmission bandwidth.
Optionally, dividing the target neural network using, as partitioning criteria, the number of sub-network partitions, the theoretical computation amount for which a single device is responsible, and the maximum data amount for inter-device transmission comprises:
starting from the input layer of the target neural network and traversing the hidden layers in order: accumulating the computation amount of each hidden layer in turn, and when the currently accumulated computation amount approaches the theoretical computation amount for which a single device is responsible, taking the sub-network formed by the accumulated adjacent hidden layers as a candidate sub-network;
if the output data amount of the candidate sub-network is less than or equal to the maximum data amount for inter-device transmission, taking the candidate sub-network as one resulting sub-network; if the output data amount of the candidate sub-network is greater than the maximum data amount for inter-device transmission, removing hidden layers one by one from the end of the candidate sub-network until the output data amount of the remaining sub-network is less than or equal to the maximum data amount for inter-device transmission, and taking the sub-network remaining after removal as one resulting sub-network;
continuing the traversal until all sub-networks are obtained, wherein after each sub-network is obtained, the accumulation of computation amounts restarts from the hidden layers following that sub-network.
Optionally, performing forward inference on the target neural network based on the inference instances and inference engines corresponding to the multiple sub-networks comprises:
determining the dependencies between the inference engines corresponding to the multiple sub-networks according to the dependencies between the multiple sub-networks;
feeding input data to the inference engines corresponding to the multiple sub-networks in order, so that each inference engine computes its corresponding sub-network based on the input data and the corresponding inference instance.
A forward inference apparatus for a neural network, comprising: a network processing module, an instance and engine creation module, and an inference module;
the network processing module is configured to divide a target neural network into multiple sub-networks, wherein each sub-network includes at least one hidden layer of the target neural network;
the instance and engine creation module is configured to create, on hardware devices of an inference platform, an inference instance and an inference engine corresponding to each of the multiple sub-networks;
the inference module is configured to perform forward inference on the target neural network based on the inference instances and inference engines corresponding to the multiple sub-networks.
Optionally, the network processing module includes an information obtaining module and a sub-network division module;
the information obtaining module is configured to obtain hardware device information of the inference platform and the computation amount and required storage space of the target neural network;
the sub-network division module is configured to divide the target neural network into multiple sub-networks based on the hardware device information of the inference platform and the computation amount and required storage space of the target neural network.
Wherein, the hardware device information of the inference platform includes one or more of the following:
the number of hardware devices, the computing capability of each hardware device, the storage capacity of each hardware device, and the transmission bandwidth between hardware devices.
Optionally, the information obtaining module includes a computation graph construction sub-module and a computation amount and storage space determination sub-module;
the computation graph construction sub-module is configured to construct a computation graph of the target neural network according to the network parameters of the target neural network;
the computation amount and storage space determination sub-module is configured to determine the computation amount and required storage space of each layer of the target neural network according to the computation graph, and to determine the computation amount and required storage space of the entire target neural network from the computation amount and required storage space of each layer.
Optionally, the sub-network division module includes a parallel mode determination sub-module and a sub-network division sub-module;
the parallel mode determination sub-module is configured to determine a parallel mode suited to the target neural network based on the hardware device information of the inference platform, the computation amount and required storage space of the target neural network, and a user-configured parallel mode, wherein the parallel mode includes a single-device parallel mode and a multi-device parallel mode; in the single-device parallel mode, the forward inference of the target neural network is performed on a single device, and in the multi-device parallel mode, the forward inference of the target neural network is performed on multiple devices;
the sub-network division sub-module is configured to divide the target neural network into multiple sub-networks based on the parallel mode suited to the target neural network.
Optionally, the parallel mode determination sub-module includes a first determination sub-module and a second determination sub-module;
the first determination sub-module is configured to determine that the parallel mode suited to the target neural network is the multi-device parallel mode when the computation amount of the entire target neural network is greater than the computing capability of a single device, and/or the storage space required by the entire target neural network is greater than the storage capacity of a single device;
the second determination sub-module is configured to determine the parallel mode suited to the target neural network based on the user-configured parallel mode when the computation amount of the entire target neural network is less than or equal to the computing capability of the single device and the storage space required by the entire target neural network is less than or equal to the storage capacity of the single device.
Optionally, the second determination sub-module is specifically configured to: when the user-configured parallel mode is the single-device parallel mode, determine that the parallel mode suited to the target neural network is the single-device parallel mode; when the user-configured parallel mode is the multi-device parallel mode, determine that the parallel mode suited to the target neural network is the single-device parallel mode if the inter-device transmission time is greater than the preset maximum execution time of a sub-network, and determine that the parallel mode suited to the target neural network is the multi-device parallel mode if the inter-device transmission time is less than or equal to the preset maximum execution time of a sub-network.
Optionally, the sub-network division sub-module includes a first division sub-module and a second division sub-module;
the first division sub-module is configured to, when the parallel mode suited to the target neural network is the multi-device parallel mode, obtain the number of sub-network partitions based on the number of hardware devices and divide the target neural network based on the number of sub-network partitions;
the second division sub-module is configured to, when the parallel mode suited to the target neural network is the single-device parallel mode, divide the target neural network based on a preset number of sub-network partitions.
Optionally, the first division sub-module is specifically configured to divide the target neural network using, as partitioning criteria, the number of sub-network partitions, the theoretical computation amount for which a single device is responsible, and the maximum data amount for inter-device transmission;
wherein the theoretical computation amount for which a single device is responsible is determined from the computation amount of the entire target neural network and the number of sub-network partitions, and the maximum data amount for inter-device transmission is determined from the preset maximum execution time of a sub-network and the inter-device transmission bandwidth.
Optionally, the first division sub-module is specifically configured to: start from the input layer of the target neural network and traverse the hidden layers in order, accumulating the computation amount of each hidden layer in turn, and when the currently accumulated computation amount approaches the theoretical computation amount for which a single device is responsible, take the sub-network formed by the accumulated adjacent hidden layers as a candidate sub-network; if the output data amount of the candidate sub-network is less than or equal to the maximum data amount for inter-device transmission, take the candidate sub-network as one resulting sub-network; if the output data amount of the candidate sub-network is greater than the maximum data amount for inter-device transmission, remove hidden layers one by one from the end of the candidate sub-network until the output data amount of the remaining sub-network is less than or equal to the maximum data amount for inter-device transmission, and take the sub-network remaining after removal as one resulting sub-network; continue the traversal until all sub-networks are obtained, wherein after each sub-network is obtained, the accumulation of computation amounts restarts from the hidden layers following that sub-network.
Optionally, the inference module is specifically configured to determine the dependencies between the inference engines corresponding to the multiple sub-networks according to the dependencies between the multiple sub-networks, and to feed input data to the inference engines corresponding to the multiple sub-networks in order, so that each inference engine computes its corresponding sub-network based on the input data and the corresponding inference instance.
A forward inference device for a neural network, comprising: a memory and a processor;
the memory is configured to store a program;
the processor is configured to execute the program to implement the steps of the forward inference method for a neural network described above.
A readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the forward inference method for a neural network described above.
It can be seen from the above technical solution that the forward inference method for a neural network provided by this application first divides the target neural network into multiple sub-networks, then creates an inference instance and an inference engine for each sub-network on the inference platform, and finally performs forward inference on the target neural network based on the inference instances and inference engines corresponding to the sub-networks. Because there are multiple inference engines and each inference engine is responsible for only a part of the hidden layers of the target neural network, multiple data inputs can be fed to different inference engines at the same moment, each executing the computation of its corresponding sub-network in parallel. Compared with existing inference schemes, since at the same moment multiple inference engines compute on multiple inputs simultaneously, the hardware resources are fully utilized, that is, hardware resource utilization is improved; at the same time, inference efficiency and data throughput are improved, and, with the storage resources unchanged, storage space is saved.
Brief description of the drawings
To explain the technical solutions in the embodiments of this application or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only embodiments of this application; for those of ordinary skill in the art, other drawings can be obtained from the provided drawings without creative effort.
Fig. 1 is a schematic diagram of a forward inference process with a single inference instance;
Fig. 2 is a schematic diagram of a forward inference process with multiple inference instances provided by an embodiment of this application;
Fig. 3 is a flow diagram of the forward inference method for a neural network provided by an embodiment of this application;
Fig. 4 is a flow diagram of obtaining the computation amount and required storage space of the target neural network provided by an embodiment of this application;
Fig. 5 is a flow diagram of dividing the target neural network into multiple sub-networks based on the hardware device information of the inference platform and the computation amount and required storage space of the target neural network, provided by an embodiment of this application;
Fig. 6 is a flow diagram of an optional implementation of determining the parallel mode suited to the target neural network based on the hardware device information of the inference platform, the computation amount and required storage space of the target neural network, and the user-configured parallel mode, provided by an embodiment of this application;
Fig. 7 is a schematic diagram of an example of dividing a neural network into sub-networks provided by an embodiment of this application;
Fig. 8 is a schematic diagram of creating inference engines in the multi-device parallel mode provided by an embodiment of this application;
Fig. 9 is a schematic diagram of an example of the inference process of a neural network provided by an embodiment of this application;
Fig. 10 is a structural diagram of the forward inference apparatus for a neural network provided by an embodiment of this application;
Fig. 11 is a structural diagram of the forward inference device for a neural network provided by an embodiment of this application.
Detailed description of embodiments
The technical solutions in the embodiments of this application are described below clearly and completely with reference to the drawings in the embodiments of this application. Obviously, the described embodiments are only a part of the embodiments of this application, not all of them. Based on the embodiments of this application, all other embodiments obtained by those of ordinary skill in the art without creative effort fall within the protection scope of this application.
The inventors found, in the course of making the invention, that existing inference schemes create one inference instance and one inference engine for the entire neural network; that is, existing inference schemes realize the inference of the entire neural network based on a single inference instance and a single inference engine. Specifically:
First, a computation graph is constructed according to the parameters of the neural network to be inferred, and the overall execution order of the neural network is determined from the computation graph; then, based on that overall execution order, the execution functions of the different hidden layers are sent in sequence by the inference instance to the execution queue of the inference engine to await execution. When input data arrives, the inference engine computes each hidden layer according to the order of the execution functions in its internal queue; that is, the different hidden layers of the entire neural network execute serially, and the inference engine can only accept the next input datum after it has finished the computation of every hidden layer for the current input datum. In other words, in existing inference schemes, the forward inference of a single input is strictly sequential within the inference engine's execution queue, and different input data are also processed strictly sequentially.
As shown in Fig. 1, for a neural network with N hidden layers, one inference instance is created on the inference platform. After input data X enters the network, it passes through the computation of the N hidden layers in order and finally produces the output Y. The execution time of the whole network is T = T1 + T2 + ... + TN, where Ti denotes the execution time of the i-th hidden layer. For a deep neural network, the more hidden layers there are, the longer a single forward inference takes, and at any moment only one hidden layer is executing while all other layers are idle, which limits the computation throughput when forward inference is performed on a single instance. Here, throughput refers to the number of input data items processed per unit time.
With the continuous development of chip technology, the computing capability of the various hardware devices suitable for deep learning has been greatly improved. Taking NVIDIA GPUs (Graphics Processing Units) as an example, the single-precision computing capability of the M40 reaches 7 TFLOPS (7×10^12 floating-point operations per second), the P40 reaches 12 TFLOPS, and the V100 reaches 15 TFLOPS, with the newly added Tensor Cores reaching a theoretical peak of up to 120 TFLOPS. In existing forward inference schemes, a single inference engine computes only a single hidden layer at any moment, and the computation of a single hidden layer can hardly keep a GPU fully loaded; that is, existing forward inference schemes leave the utilization of hardware devices very low, seriously wasting hardware resources.
To improve the speed of forward inference and the utilization of hardware devices, the inventors conducted in-depth research:
The initial idea was: to make full use of the hardware computing resources, create multiple inference instances, with each inference instance responsible for one input datum and computing the entire network for that input; when performing forward inference, multiple inference instances run inference simultaneously, as shown in Fig. 2, where 4 inference instances run inference at the same time.
The inventors found through research that although the above idea can exploit the powerful computing capability of the hardware devices to a certain extent, the computation within each instance is still serial (as shown in Fig. 2, the inside of each of instances 0 to 4 is a serial relationship: after input datum X0 enters the network in instance 0, it passes through the N hidden layers in order and finally produces the output Y0, and only after Y0 is obtained can the next input datum X4 enter that instance for computation). This in itself is no improvement, and each inference instance needs to allocate its own storage resources; when the neural network has many layers, the storage demand of the whole network may reach the order of GB, and the increase in storage demand raises the hardware price, which in turn raises the cost of forward inference.
In view of the above problems, the inventors conducted further in-depth research and finally proposed a forward inference scheme with better results. The forward inference method for a neural network provided by this application is introduced through the following embodiments.
Referring to Fig. 3, which shows a flow diagram of the forward inference method for a neural network provided by an embodiment of this application, the method may include:
Step S301: dividing a target neural network into multiple sub-networks.
Here, the target neural network is the neural network to be inferred. It should be understood that the target neural network generally contains multiple hidden layers whose computations are executed in order. Of the multiple sub-networks obtained by the division, each sub-network may contain a single hidden layer or multiple consecutive adjacent hidden layers, and there are sequential dependencies between the sub-networks.
Specifically, the process of dividing the target neural network into multiple sub-networks may include:
Step S3011: obtaining hardware device information of the inference platform and the computation amount and required storage space of the target neural network.
Here, the inference platform may be, but is not limited to, a GPU server, a TPU (Tensor Processing Unit) server, and so on; a hardware device of the inference platform may be any device with storage capability and computing capability, such as a graphics card.
Here, the hardware device information may include one or more of the number of hardware devices, the computing capability of each hardware device, the storage capacity of each hardware device, and the transmission bandwidth between hardware devices, and preferably includes all four. In one possible implementation, built-in functions can be called when the inference framework starts to obtain the hardware device information of the inference platform.
For example, if the inference platform is a GPU (Graphics Processing Unit) server with four P40 graphics cards, CUDA function interfaces can be called to obtain the following hardware device information: the number of hardware devices is 4; the compute capability of each hardware device is 6.2, and a lookup gives a single-precision throughput of 12 TFLOPS (12×10^12 floating-point operations per second); the storage capacity of each hardware device is 24 GB; the transmission bandwidth between devices is 10 GB/s over the PCIe interface, while the NVLink bandwidth reaches 100 GB/s.
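As an illustration only (not part of the patent), the following is a minimal sketch of collecting such device information with PyTorch's CUDA utilities; the inter-device bandwidth is passed in as an assumed constant because this API does not report it, and all names here are chosen for the sketch:

```python
import torch

def query_inference_platform(inter_device_bandwidth_gbs: float = 10.0):
    """Collect the four pieces of hardware device information used for partitioning.

    inter_device_bandwidth_gbs is an assumed, user-supplied constant:
    inter-device transmission bandwidth is not reported by torch.cuda.
    """
    devices = []
    for i in range(torch.cuda.device_count()):
        props = torch.cuda.get_device_properties(i)
        devices.append({
            "name": props.name,
            "compute_capability": f"{props.major}.{props.minor}",
            "memory_bytes": props.total_memory,           # storage capacity
            "multiprocessors": props.multi_processor_count,
        })
    return {
        "device_count": len(devices),                      # number of hardware devices
        "devices": devices,
        "inter_device_bandwidth_gbs": inter_device_bandwidth_gbs,
    }

if __name__ == "__main__":
    print(query_inference_platform())
```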
Here, the computation amount of the target neural network refers to the computation amount of the entire target neural network, which can be determined from the computation amount of each hidden layer of the target neural network; the storage space required by the target neural network refers to the total storage space needed to compute each hidden layer of the entire network, which can be determined from the storage space required by each hidden layer. The detailed process of obtaining the computation amount and required storage space of the target neural network is described in the subsequent embodiments.
Step S3012: dividing the target neural network into multiple sub-networks based on the hardware device information of the inference platform and the computation amount and required storage space of the target neural network.
It should be noted that the hardware device information of the inference platform and the computation amount and required storage space of the target neural network determine both the number of sub-network partitions and which hidden layers each sub-network contains when the target neural network is divided according to that number. Accordingly, this embodiment uses the hardware device information of the inference platform and the computation amount and required storage space of the target neural network as the criteria for dividing the target neural network into sub-networks.
Step S302: creating, on the hardware devices of the inference platform, an inference instance and an inference engine corresponding to each of the multiple sub-networks.
Specifically, after the target neural network is divided into multiple sub-networks, one inference instance and one inference engine need to be created for each sub-network, where the inference instance is responsible for the computation of each hidden layer in its corresponding sub-network, and the inference engine is responsible for receiving input data and completing the computation of the corresponding sub-network based on the input data and the corresponding inference instance.
Step S303: performing forward inference on the target neural network based on the inference instances and inference engines corresponding to the multiple sub-networks.
Because the target neural network is divided into multiple sub-networks, each with its own inference engine and inference instance, an inference engine is responsible for only one sub-network (i.e., a part of the hidden layers). This allows multiple input data items to be fed to multiple different inference engines at the same moment; that is, at the same moment multiple inference engines perform computation concurrently, each based on its input data and corresponding inference instance.
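To make the pipelined execution concrete, here is a minimal, framework-agnostic sketch (not taken from the patent): each inference engine runs in its own thread, pulls inputs from a queue, computes its sub-network, and pushes the result to the next engine's queue; the `sub_network` callables stand in for whatever computation an inference instance would perform:

```python
import queue
import threading

class InferenceEngine:
    """One engine per sub-network; engines are chained into a pipeline."""

    def __init__(self, name, sub_network, out_queue):
        self.name = name
        self.sub_network = sub_network      # callable: computes this engine's hidden layers
        self.in_queue = queue.Queue()
        self.out_queue = out_queue          # next engine's in_queue, or the final output queue
        self.thread = threading.Thread(target=self._run, daemon=True)
        self.thread.start()

    def _run(self):
        while True:
            item = self.in_queue.get()
            if item is None:                # shutdown signal, propagated downstream
                self.out_queue.put(None)
                break
            self.out_queue.put(self.sub_network(item))

# Example: three sub-networks; at any moment up to three inputs are in flight.
final_out = queue.Queue()
e3 = InferenceEngine("sub3", lambda x: x + 3, final_out)
e2 = InferenceEngine("sub2", lambda x: x * 2, e3.in_queue)
e1 = InferenceEngine("sub1", lambda x: x - 1, e2.in_queue)

for x in [10, 20, 30]:
    e1.in_queue.put(x)                      # inputs fed in order; engines overlap on different inputs
e1.in_queue.put(None)
while (y := final_out.get()) is not None:
    print(y)                                # 21, 41, 61
```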
The forward inference method for a neural network provided by the embodiments of this application first divides the target neural network into multiple sub-networks, then creates an inference instance and an inference engine for each sub-network, and finally performs forward inference on the target neural network based on the inference instances and inference engines corresponding to the sub-networks. Because there are multiple inference engines and each inference engine is responsible for only a part of the hidden layers of the target neural network, multiple data inputs can be fed to different inference engines at the same moment, each executing the computation of its corresponding sub-network in parallel. Compared with existing inference schemes, since at the same moment multiple inference engines compute on multiple inputs simultaneously, the hardware resources are fully utilized, that is, hardware resource utilization is improved; at the same time, inference efficiency and data throughput are improved, and, with the storage resources unchanged, storage space is saved.
The process of obtaining the computation amount and required storage space of the target neural network in step S3011 above is explained below.
Referring to Fig. 4, which shows a flow diagram of obtaining the computation amount and required storage space of the target neural network, the process may include:
Step S401: constructing a computation graph of the target neural network according to the network parameters of the target neural network.
Here, the target neural network includes an input layer, multiple hidden layers, and an output layer. Input data enters through the input layer and passes through the computation of each hidden layer in turn (the output of the previous hidden layer is the input of the next), and the final computation result of the hidden layers is emitted by the output layer. In this embodiment, the network parameters of the target neural network may include the number of hidden layers of the target neural network, the number of neurons in each hidden layer, the connection relationships between hidden layers, the indices of the input and output nodes, and so on. These network parameters reflect the complexity of the target neural network and are related to its computation amount and required storage space.
Optionally, this embodiment can build the computation graph based on the network parameters of the target neural network and a preset depth-first search algorithm. The computation graph of the target neural network is a graph that reflects the computation process of the target neural network; it includes nodes and edges, where an edge represents the operation (execution function) performed by a hidden layer and a node represents the input of an execution function.
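A minimal sketch of such a computation graph and a depth-first traversal that yields an execution order is given below; the data layout (layer name plus list of inputs) is an assumption made for illustration, and the traversal is only a valid ordering for simple chain-structured networks like the one shown:

```python
from collections import defaultdict

def build_graph(layer_params):
    """layer_params: list of (layer_name, inputs) pairs describing connections.
    Returns adjacency lists: producing node -> layers consuming it."""
    graph = defaultdict(list)
    for name, inputs in layer_params:
        for src in inputs:
            graph[src].append(name)
    return graph

def dfs_order(graph, start="input"):
    """Depth-first traversal from the input node, giving one execution order of the layers."""
    order, seen, stack = [], set(), [start]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        if node != start:
            order.append(node)
        stack.extend(reversed(graph.get(node, [])))
    return order

# Hypothetical 4-layer chain.
params = [("conv1", ["input"]), ("conv2", ["conv1"]), ("fc1", ["conv2"]), ("fc2", ["fc1"])]
print(dfs_order(build_graph(params)))   # ['conv1', 'conv2', 'fc1', 'fc2']
```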
Step S402: determining the computation amount and required storage space of each layer of the target neural network according to the computation graph of the target neural network.
After the computation graph of the target neural network is obtained, it is traversed to obtain the computation amount and required storage space of each hidden layer. To obtain these, a computation-amount function and a required-storage-space function can be set in advance for each hidden layer and associated or bound with the layer. It should be noted that there are many types of hidden layers, such as convolutional layers, pooling layers, and fully connected layers, and different calculation functions need to be set for different types of hidden layers.
Optionally, this embodiment can use the number of multiply-add operations required to complete a hidden layer to represent the computation amount of that layer. For example, a fully connected layer with input dimension r*k and n neurons has a computation amount of r*k*n*2 and requires storage space of r*k + k*n + r*n; a convolutional layer with input dimension v*c*h*w, convolution kernel kh*kw, stride sh*sw, and f output channels has a computation amount of about (v*c*h*w*kh*kw*f*2)/(sh*sw) and requires storage space of about v*c*h*w + f*c*kh*kw.
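The two per-layer estimates above can be transcribed directly into code; the following sketch (with variable names chosen here, not taken from the patent) also shows how the per-network totals of step S403 are obtained by summation:

```python
def fc_cost(r, k, n):
    """Fully connected layer: input r x k, n neurons."""
    flops = r * k * n * 2                     # each multiply-add counted as 2 operations
    memory = r * k + k * n + r * n            # input + weights + output elements
    return flops, memory

def conv_cost(v, c, h, w, kh, kw, sh, sw, f):
    """Convolutional layer: input v x c x h x w, kernel kh x kw, stride sh x sw, f output channels."""
    flops = (v * c * h * w * kh * kw * f * 2) / (sh * sw)
    memory = v * c * h * w + f * c * kh * kw  # input + weights (approximate)
    return flops, memory

# Per-network totals (step S403) are simply the sums over all hidden layers.
layers = [conv_cost(1, 3, 224, 224, 3, 3, 1, 1, 64), fc_cost(1, 4096, 1000)]
total_flops = sum(fl for fl, _ in layers)
total_memory = sum(m for _, m in layers)
```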
Step S403: determining the computation amount and required storage space of the entire target neural network from the computation amount and required storage space of each layer of the target neural network.
After the computation amount and required storage space of each hidden layer of the target neural network are obtained, the computation amounts of the hidden layers are summed to obtain the computation amount of the entire target neural network; likewise, the required storage spaces of the hidden layers are summed to obtain the storage space required by the entire target neural network.
After the hardware device information of the inference platform and the computation amount and required storage space of the entire target neural network are obtained, this information serves as the basis for dividing the target neural network into sub-networks.
The realization process of "step S3012: dividing the target neural network into multiple sub-networks based on the hardware device information of the inference platform and the computation amount and required storage space of the target neural network" in the above embodiment is introduced below. Referring to Fig. 5, which shows a flow diagram of this process, it may include:
Step S501: determining a parallel mode suited to the target neural network based on the hardware device information of the inference platform, the computation amount and required storage space of the target neural network, and the user-configured parallel mode.
Here, the parallel mode includes a single-device parallel mode and a multi-device parallel mode. In the single-device parallel mode, the forward inference process of the entire target neural network is performed on a single device; in the multi-device parallel mode, the forward inference process of the entire target neural network is performed on multiple devices (the multiple devices may be all hardware devices on the inference platform, or a subset of them).
It should be noted that the user-configured parallel mode may or may not be suited to the target neural network; for example, the user-configured parallel mode may not be able to support the computation required by the target neural network, or the storage space it requires. In view of this, the parallel mode that is truly suited to the target neural network must be determined. When determining the parallel mode suited to the target neural network, the computing capability and storage capacity of the hardware devices, the computation requirements of the target neural network, and the user-configured parallel mode should all be taken into account.
Step S502: dividing the target neural network into multiple sub-networks based on the parallel mode suited to the target neural network.
After the parallel mode suited to the target neural network is determined, the number of sub-network partitions can be determined based on that parallel mode, and the target neural network is divided based on the determined number of sub-network partitions.
The realization process of "step S501: determining a parallel mode suited to the target neural network based on the hardware device information of the inference platform, the computation amount and required storage space of the target neural network, and the user-configured parallel mode" is introduced first.
The process of determining the parallel mode suited to the target neural network based on the hardware device information of the inference platform, the computation amount and required storage space of the target neural network, and the user-configured parallel mode may include: if the computation amount of the entire target neural network is greater than the computing capability of a single device, and/or the storage space required by the entire target neural network is greater than the storage capacity of a single device, determining that the parallel mode suited to the target neural network is the multi-device parallel mode; if the computation amount of the entire target neural network is less than or equal to the computing capability of the single device, and the storage space required by the entire target neural network is less than or equal to the storage capacity of the single device, determining the parallel mode suited to the target neural network based on the user-configured parallel mode.
It should be noted that if the computation amount of the entire target neural network is greater than the computing capability of a single device, or the storage space required by the entire target neural network is greater than the storage capacity of a single device, a single device cannot meet the computation requirements of the target neural network. In that case, regardless of which parallel mode the user configured, the final parallel mode must be the multi-device parallel mode; that is, if the user set the single-device parallel mode, it needs to be adjusted to the multi-device parallel mode, and if the user set the multi-device parallel mode, the multi-device parallel mode is kept unchanged.
It should also be noted that if the computation amount of the entire target neural network is less than or equal to the computing capability of a single device, and the storage space required by the entire target neural network is less than or equal to the storage capacity of a single device, a single device can meet the computation requirements of the target neural network. In that case, both the single-device parallel mode and the multi-device parallel mode can satisfy the computation requirements of the target neural network, and the parallel mode suited to the target neural network can be determined based on the user-configured parallel mode.
Further, determining the parallel mode suited to the target neural network based on the user-configured parallel mode works as follows: when the user-configured parallel mode is the single-device parallel mode, the single-device parallel mode can be taken directly as the parallel mode suited to the target neural network; when the user-configured parallel mode is the multi-device parallel mode, one optional implementation is to take the multi-device parallel mode directly as the parallel mode suited to the target neural network. However, considering that in the multi-device parallel mode there is necessarily data transmission between devices, an overly long inter-device data transmission time will affect the inference speed of the target neural network, and in that case using the multi-device parallel mode is not the preferred scheme.
For example, for P40 cards connected via PCIe, the transmission bandwidth is only 10 GB/s, while the single-precision compute throughput can reach 12 TFLOPS (12×10^12 floating-point operations per second), about 1200 times the transmission rate. For a fully connected layer with input dimension m*k and n neurons, the computation amount of this layer is m*n*k*2 and its output data amount is m*n. When k is not large, the transmission time between devices exceeds the computation time of the device itself, so the device has to suspend and wait for data to arrive, wasting computing resources and increasing the total inference time; in this case, inference in the multi-card parallel mode is not advantageous.
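Written out explicitly (a back-of-the-envelope comparison using the figures above, not a formula from the patent): with compute rate F and inter-device bandwidth B in the same per-element units, t_compute = 2*m*n*k/F and t_transfer = m*n/B, so t_transfer/t_compute = F/(2*k*B). With F/B ≈ 1200 as cited above, transmission dominates computation whenever k < F/(2*B) ≈ 600, which is why small inner dimensions make the multi-card mode unattractive.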
In view of this, in a preferred implementation, the parallel mode suited to the target neural network can be determined based on the inter-device transmission time and the preset maximum execution time of a sub-network. Specifically, if the inter-device transmission time is greater than the preset maximum execution time of a sub-network, the parallel mode suited to the target neural network is determined to be the single-device parallel mode; if the inter-device transmission time is less than or equal to the preset maximum execution time of a sub-network, the parallel mode suited to the target neural network is determined to be the multi-device parallel mode.
Referring to Fig. 6, which shows a flow diagram of an optional implementation of determining the parallel mode suited to the target neural network based on the hardware device information of the inference platform, the computation amount and required storage space of the target neural network, and the user-configured parallel mode, the process may include:
Step S601: judging whether the computation amount of the target neural network is greater than the computing capability of a single device; if not, executing step S602; if so, executing step S603.
Step S602: judging whether the storage space required by the target neural network is greater than the storage capacity of a single device; if so, executing step S603; if not, executing step S604.
It should be noted that this embodiment does not restrict the execution order of steps S601 and S602 to the above order; for example, step S602 may be executed first and then step S601, or steps S601 and S602 may be executed in parallel. Whichever order is used, step S603 is executed when either judgment result is yes, and step S604 is executed when both judgment results are no.
Step S603: determining that the parallel mode suited to the target neural network is the multi-device parallel mode.
Step S604: judging whether the user-configured parallel mode is the multi-device parallel mode; if not, executing step S605; if so, executing step S606.
Step S605: determining that the parallel mode suited to the target neural network is the single-device parallel mode.
Step S606: judging whether the inter-device transmission time is greater than the preset maximum execution time of a sub-network; if so, executing step S605; if not, executing step S603.
The determination of the parallel mode suited to the target neural network, based on the hardware device information of the inference platform, the computation amount and required storage space of the target neural network, and the user-configured parallel mode, is further explained below with a specific example:
The hardware devices of the inference platform are P40 graphics cards. The actual storage capacity of a single P40 card is 24 GB; after removing some system space and reserved space, the usable storage capacity of a single P40 card is 22 GB. The single-precision peak computing capability of a P40 card is 12 TFLOPS; since this theoretical peak is very hard to reach in practice, and considering the influence of computation scale, read/write latency, and so on, 8 TFLOPS is used as the average computing capability of a single P40 card. The parallel modes of the inference platform include a single-card parallel mode and a multi-card parallel mode. The computation amount of the target neural network is S and its required storage space is M. The process of determining the parallel mode suited to the target neural network is as follows:
If M > 22 GB or S/(8*10^12) > T1max (i.e., the storage demand of the entire target neural network is greater than the available memory of a single card, or the computation amount of the entire target neural network exceeds the average computing capability of a single card), a single card cannot complete the forward inference task of the entire target neural network, and the multi-card parallel mode is determined to be the parallel mode suited to the target neural network. Here, T1max is the user-set maximum execution time that a single card may spend computing one input. Note that when M > 22 GB or S/(8*10^12) > T1max, the multi-card parallel mode is determined to be the parallel mode suited to the target neural network regardless of which mode the user configured.
If M ≤ 22 GB and S/(8*10^12) ≤ T1max (i.e., the storage demand of the entire target neural network is less than or equal to the available memory of a single card, and the computation amount of the entire target neural network is within the average computing capability of a single card), a single card can complete the forward inference task of the entire target neural network, and the parallel mode suited to the target neural network is determined based on the user-configured parallel mode. Specifically, if the user-configured parallel mode is the single-card parallel mode, the parallel mode suited to the target neural network is determined to be the single-card parallel mode; if the user-configured parallel mode is the multi-card parallel mode, the parallel mode suited to the target neural network is further determined based on the inter-card transmission time Tt and the preset maximum execution time T2max of a sub-network: if Tt > T2max, the parallel mode suited to the target neural network is determined to be the single-device parallel mode, and if Tt ≤ T2max, it is determined to be the multi-device parallel mode. Here, the inter-card transmission time Tt = m/B, where m is the amount of data exchanged between sub-networks and B is the inter-card transmission bandwidth.
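A minimal sketch of this decision logic follows; the default thresholds mirror the illustrative P40 figures above (22 GB usable memory, 8 TFLOPS average compute, 10 GB/s bandwidth), and the time limits T1_max and T2_max are assumed example values, not values fixed by the method:

```python
def choose_parallel_mode(S, M, user_mode, m_exchange,
                         capacity_gb=22.0,        # usable memory of one card (example value)
                         avg_flops=8e12,          # average compute of one card (example value)
                         bandwidth=10e9,          # inter-card bandwidth in bytes/s (example value)
                         T1_max=0.05, T2_max=0.01):
    """S: total computation of the network (flops); M: required storage (GB);
    m_exchange: data exchanged between sub-networks (bytes);
    T1_max / T2_max: user-set max execution times for one input / one sub-network."""
    if M > capacity_gb or S / avg_flops > T1_max:
        return "multi-device"                     # a single card cannot handle the network
    if user_mode == "single-device":
        return "single-device"
    # User asked for multi-device: accept it only if transfers fit the sub-network time budget.
    transfer_time = m_exchange / bandwidth        # Tt = m / B
    return "single-device" if transfer_time > T2_max else "multi-device"
```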
After the parallel mode suited to the target neural network is determined, the target neural network can be divided into multiple sub-networks based on that parallel mode. Dividing the target neural network into multiple sub-networks based on the parallel mode suited to it is introduced below.
The realization process of dividing the target neural network into multiple sub-networks based on the parallel mode suited to the target neural network may include: if the parallel mode suited to the target neural network is the multi-device parallel mode, obtaining the number of sub-network partitions based on the number of hardware devices and dividing the target neural network based on that number; if the parallel mode suited to the target neural network is the single-device parallel mode, dividing the target neural network based on a preset number of sub-network partitions.
It should be noted that in the multi-device parallel mode, if there are more than two devices, the number of hardware devices actually used can be determined based on the information of each hardware device and the computation requirements of the target neural network. For example, if there are 5 hardware devices on the inference platform, only 3 of them may be used; of course, all hardware devices on the inference platform may also be used directly. That is, when the parallel mode suited to the target neural network is the multi-device parallel mode, P (2 ≤ P ≤ M, where M is the number of hardware devices on the inference platform) can be taken as the number of sub-network partitions, the target neural network is divided into P sub-networks, and each device is responsible for a computation amount of S/P. Preferably, M, i.e., the number of hardware devices on the inference platform, is taken as the number of sub-network partitions, the target neural network is divided into M sub-networks, and each device is responsible for a computation amount of S/M, where S is the computation amount of the entire target neural network.
The process of dividing the target neural network based on the determined number of sub-network partitions, when the parallel mode suited to the target neural network is the multi-device parallel mode, is introduced below.
In one possible implementation, the process of dividing the target neural network based on the number of sub-network partitions may include: dividing the target neural network using, as partitioning criteria, the number of sub-network partitions, the theoretical computation amount for which a single device is responsible, and the maximum data amount for inter-device transmission.
Here, the theoretical computation amount for which a single device is responsible is determined from the computation amount of the entire target neural network and the number of sub-network partitions; specifically, if the computation amount of the entire target neural network is S and the number of sub-network partitions is M, the theoretical computation amount for which a single device is responsible is S/M. The maximum data amount for inter-device transmission is determined from the preset maximum execution time of a sub-network and the inter-device transmission bandwidth; specifically, if the preset maximum execution time of a sub-network is T2max and the inter-device transmission bandwidth is B, the maximum data amount for inter-device transmission is mmax = T2max * B.
Further, the division number based on sub-network, the theoretical amount being responsible for single device and equipment room transmission Maximum amount of data is partitioning standards, and the process divided to target nerve network may include: from the defeated of target nerve network Enter layer to start successively to traverse backward: being sequentially overlapped the calculation amount of each hidden layer, and is currently superimposed obtained calculation amount close to setting up When standby theoretical amount (such as S/M) being responsible for, the sub-network for the multiple adjacent hidden layers compositions being overlapped is obtained as candidate Sub-network;If the output data quantity (i.e. the data volume of the last one hidden layer output of the candidate sub networks network) of candidate sub networks network is less than Or the maximum amount of data m equal to equipment room transmissionmax, then using the candidate sub networks network as the obtained sub-network of division;If should The number of output of candidate sub networks network is greater than the maximum amount of data m of equipment room transmissionmax, then from the candidate sub networks network from the front to the back Hidden layer is removed one by one, until the output data quantity of the sub-network after removing is less than or equal to the maximum amount of data of equipment room transmission mmax, sub-network after removing hidden layer is as dividing an obtained sub-network;Continuation traverses backward, until obtaining all sons Network, wherein after one sub-network of every acquisition, the calculation amount of the hidden layer after the sub-network is overlapped again.
As an example, as shown in Fig. 7, the target neural network contains Q hidden layers, denoted in order Layer1, Layer2, ..., LayerQ. Traversing backward from the input layer, the computation of each hidden layer is accumulated to obtain Ssum(i); for instance, when the first hidden layer is reached, Ssum(1) is the computation of the first hidden layer, and when the second hidden layer is reached, Ssum(2) is the sum of the computations of the first and second hidden layers, and so on. When Ssum(K) approaches or equals the per-device theoretical computation, Layer1 to LayerK are taken as a candidate sub-network, and the candidate's output data amount is compared with the maximum inter-device data amount. If the output data amount of the candidate sub-network is less than or equal to the maximum inter-device data amount, the candidate sub-network becomes the first sub-network of the division. If it is greater, hidden layers are removed from the candidate one by one starting from the last layer, until the output data amount of the remaining sub-network is less than or equal to the maximum inter-device data amount; for example, if removing LayerK and LayerK-1 makes the output data amount of the remaining sub-network less than or equal to the maximum inter-device data amount, then Layer1 to LayerK-2 becomes the first sub-network of the division. The subsequent sub-networks are then obtained one after another by the same strategy. It should be noted that, after each sub-network is obtained, accumulation restarts from the first hidden layer following that sub-network.
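The greedy traversal described above and illustrated in Fig. 7 can be sketched as follows. This is a sketch under assumptions: Layer, flops and output_bytes are hypothetical fields standing in for each hidden layer's computation and output data amount, and partition is not a name used by the patent.

```python
from dataclasses import dataclass

@dataclass
class Layer:
    name: str
    flops: float         # computation of this hidden layer
    output_bytes: float  # data amount output by this hidden layer

def partition(layers, per_device_flops, max_transfer_bytes):
    """Greedy split of hidden layers (in forward order) into sub-networks."""
    subnets, i = [], 0
    while i < len(layers):
        acc, j = 0.0, i
        # Accumulate layer computation up to the per-device budget (e.g. S/M).
        while j < len(layers) and acc + layers[j].flops <= per_device_flops:
            acc += layers[j].flops
            j += 1
        if j == i:                     # a single layer already exceeds the budget
            j = i + 1
        candidate = list(layers[i:j])
        # Trim layers from the back until the candidate's output fits the
        # maximum inter-device data amount m_max = T2_max * B.
        while len(candidate) > 1 and candidate[-1].output_bytes > max_transfer_bytes:
            candidate.pop()
        subnets.append(candidate)
        i += len(candidate)            # trimmed layers start the next sub-network
    return subnets
```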
The following describes how the target neural network is divided based on a preset sub-network division count when the parallel mode suitable for the target neural network is the single-device parallel mode.
Under the single-device parallel mode, the target neural network can be divided according to a preset sub-network division count. The division count should be chosen appropriately and should not be too large: an excessive division count makes the computation of each sub-network too small, so that the data synchronization time between sub-networks grows relative to the computation time of a sub-network and drags down the throughput. Under the single-device parallel mode, the division can be based on the average computational load, that is, the computation of the entire target neural network divided by the preset sub-network division count. For example, if the preset division count is 8 and the computation of the entire target neural network is S, the average computational load is S/8. The target neural network is then divided by accumulating the computation of each hidden layer in turn: when the accumulated computation approaches or equals S/8, the accumulated adjacent hidden layers form one sub-network of the division; accumulation then restarts from the first hidden layer after that sub-network, and when the newly accumulated computation again approaches or equals S/8, the adjacent hidden layers of this round form another sub-network; and so on until all sub-networks are obtained, each with a computation close or equal to S/8. A usage sketch is given below.
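Assuming the Layer list and the partition routine from the earlier sketch (both of which are illustrative assumptions, not part of the patent), the single-device mode can reuse the same routine with the budget set to the average computational load; the division count of 8 merely mirrors the example above.

```python
# Usage sketch for the single-device mode: budget = S / preset division count.
total_flops = sum(layer.flops for layer in layers)    # S
preset_count = 8                                      # example value from the text
single_device_subnets = partition(
    layers,
    per_device_flops=total_flops / preset_count,
    max_transfer_bytes=float("inf"))  # no inter-device transfer limit on one device
```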
It should be noted that in the single-device parallel mode the division is applied to the entire target neural network, whereas under the multi-card parallel mode each individual device may further divide the sub-network assigned to it, using the same division method as that applied to the entire target neural network under the single-device parallel mode.
After the target neural network has been divided into multiple sub-networks, an inference instance and an inference engine need to be created for each sub-network on the hardware devices of the inference platform. Specifically, for the multi-device parallel mode, an inference instance and an inference engine are created for each sub-network on its corresponding hardware device. Referring to Fig. 8, which shows a schematic diagram of creating inference engines under the multi-device parallel mode, the neural network in Fig. 8 is divided into 4 sub-networks, each corresponding to one hardware device, and one inference engine is created on each of the 4 hardware devices. For the single-device parallel mode, an inference instance and an inference engine are created for each sub-network on the same device.
After the inference instances and inference engines corresponding to the multiple sub-networks have been created, forward inference is performed on the entire target neural network based on them. Because the sub-networks depend on one another in order, when performing forward inference on the entire target neural network the dependency between the inference engines corresponding to the multiple sub-networks is first established according to the dependency between the sub-networks. Specifically, read-write flags can be established between the inference engines of adjacent sub-networks according to their order (as shown in Fig. 8, read-write flags are established between the inference engine created on device 1 and the inference engine created on device 2, between the engines created on device 2 and device 3, and between the engines created on device 3 and device 4). Data is then fed to the inference engines corresponding to the sub-networks in order, so that each inference engine performs computation on its corresponding sub-network based on its input data and its corresponding inference instance.
The inference process of the neural network in Fig. 8 is as follows: data 1 is fed in turn through device 1 (which completes the computation of sub-network 1), device 2 (sub-network 2), device 3 (sub-network 3) and device 4 (sub-network 4), which completes the forward inference for data 1. It should be emphasized that, while the output of device 1 for data 1 is being fed into device 2, data 2 enters device 1. It can be seen that, at the same moment, multiple input data items are executed in parallel in different inference engines.
The inference process of the neural network is further illustrated below with reference to Fig. 9:
Fig. 9 contains N inference engines, denoted Engine1, Engine2, ..., EngineN. At time T1, a new input dataN is fed into Engine1 for computation; at the same time, dataN-1 (more precisely, the result of Engine1 for dataN-1) is fed into Engine2 for computation, dataN-2 (the result of Engine2 for dataN-2) is fed into Engine3 for computation, and so on, with data1 (the result of EngineN-1 for data1) fed into EngineN for computation. It can be seen that, at the same moment T1, Engine1 through EngineN all perform computation simultaneously.
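The pipelined execution of Fig. 9 can be sketched as follows. Engine.run is a hypothetical stand-in for the real inference engines created on the devices, and the sketch advances all stages one step at a time rather than truly in parallel; it is an illustration of the scheduling order only.

```python
class Engine:
    """Hypothetical stand-in for an inference engine bound to one sub-network."""
    def __init__(self, fn):
        self.fn = fn
    def run(self, x):
        return self.fn(x)

def pipelined_inference(engines, inputs):
    """Simulate Fig. 9: at each step, engine k consumes what engine k-1
    produced one step earlier, so all N engines work on different inputs."""
    n = len(engines)
    in_flight = [None] * n
    outputs, feed = [], iter(inputs)
    for _ in range(len(inputs) + n - 1):       # steps until the pipeline drains
        nxt = next(feed, None)
        new_flight = [None] * n
        new_flight[0] = engines[0].run(nxt) if nxt is not None else None
        for k in range(1, n):
            prev = in_flight[k - 1]
            new_flight[k] = engines[k].run(prev) if prev is not None else None
        in_flight = new_flight
        if in_flight[-1] is not None:          # result of the last sub-network
            outputs.append(in_flight[-1])
    return outputs

# Example: 4 engines, each standing in for one sub-network's computation.
engines = [Engine(lambda x, i=i: x + i) for i in range(4)]
print(pipelined_inference(engines, [10, 20, 30]))   # -> [16, 26, 36]
```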
Comparing the inference method in the prior art (where only one inference engine computes at any moment, and a single engine handles the whole network) with the inference method provided by the present application: assuming x input data items, the inference time required by the prior-art method is x·t, where t is the inference time required for one input data item, while the inference time required by the method provided by the present application is (t/N)·(2x − 1), where N is the number of inference engines. When the number of inputs x is large, the throughput of the entire target neural network approaches N/2 times that of the existing inference scheme, which greatly improves inference efficiency.
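As a concrete illustration of the formulas above, with numbers assumed for the example only: for x = 100 inputs, a whole-network inference time t = 40 ms and N = 4 inference engines, the prior-art scheme needs 100 × 40 ms = 4000 ms, whereas the pipelined scheme needs (40 ms / 4) × (2×100 − 1) = 1990 ms, a speedup of roughly 2, that is, about N/2.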
With the forward inference method for a neural network provided by the embodiments of the present application, the neural network is divided into multiple sub-networks, each corresponding to one inference engine, so each inference engine is responsible only for a part of the hidden layers of the target neural network. This allows multiple data items to be fed into different inference engines for computation at the same moment; with multiple inference engines computing in parallel at the same time, the hardware resources of the inference platform are fully utilized, inference efficiency is significantly improved, and data throughput is greatly increased.
The embodiment of the present application further provides a forward inference apparatus for a neural network. The apparatus is described below; the forward inference apparatus described below and the forward inference method described above may be referred to correspondingly.
Referring to Fig. 10, which shows a schematic structural diagram of a forward inference apparatus for a neural network provided by an embodiment of the present application, as shown in Fig. 10 the apparatus may include: a network processing module 1001, an instance and engine creation module 1002 and an inference module 1003.
The network processing module 1001 is configured to divide a target neural network into multiple sub-networks, where any sub-network includes at least one hidden layer of the target neural network.
The instance and engine creation module 1002 is configured to create, on the hardware devices of the inference platform, the inference instances and inference engines corresponding to the multiple sub-networks.
The inference module 1003 is configured to perform forward inference on the target neural network based on the inference instances and inference engines corresponding to the multiple sub-networks.
The forward inference apparatus for a neural network provided by the embodiments of the present application can divide the target neural network into multiple sub-networks, create an inference instance and an inference engine for each sub-network, and then perform forward inference on the target neural network based on them. Because there are multiple inference engines and each engine is responsible for only a part of the hidden layers of the target neural network, multiple data items can be fed into different inference engines for parallel computation at the same moment. Compared with the existing inference scheme, the parallel computation of multiple inference engines at the same moment allows the hardware resources to be fully utilized, that is, the utilization rate of hardware resources is improved; at the same time, inference efficiency is improved, data throughput is increased, and, with the storage resources unchanged, memory space is saved.
In one possible implementation, in the forward inference apparatus provided by the above embodiment, the network processing module 1001 may include an information obtaining module and a sub-network division module.
The information obtaining module is configured to obtain the hardware device information of the inference platform and the computation and required memory space of the target neural network.
The sub-network division module is configured to divide the target neural network into multiple sub-networks based on the hardware device information of the inference platform and the computation and required memory space of the target neural network.
In the forward inference apparatus provided by the above embodiment, the information obtaining module may include a hardware information obtaining sub-module.
The hardware information obtaining sub-module is configured to obtain one or more of the following: the number of hardware devices, the computing capability of the hardware devices, the memory capacity of the hardware devices, and the transmission bandwidth between hardware devices.
In one possible implementation, the information obtaining module further includes a computation graph construction sub-module and a computation and memory space determination sub-module.
The computation graph construction sub-module is configured to construct the computation graph of the target neural network according to the network parameters of the target neural network.
The computation and memory space determination sub-module is configured to determine, according to the computation graph of the target neural network, the computation and required memory space of each layer of the target neural network, and to determine the computation and required memory space of the entire target neural network from the computation and required memory space of each layer.
In one possible implementation, the sub-network division module in the forward inference apparatus provided by the above embodiment may include a parallel mode determination sub-module and a sub-network division sub-module.
The parallel mode determination sub-module is configured to determine the parallel mode suitable for the target neural network based on the hardware device information of the inference platform, the computation and required memory space of the target neural network, and the user-configured parallel mode, where the parallel mode includes a single-device parallel mode and a multi-device parallel mode; under the single-device parallel mode the forward inference of the target neural network is realized on a single device, and under the multi-device parallel mode it is realized on multiple devices.
The sub-network division sub-module is configured to divide the target neural network into multiple sub-networks based on the parallel mode suitable for the target neural network.
In one possible implementation, the parallel mode determination sub-module includes a first determination sub-module and a second determination sub-module.
The first determination sub-module is configured to determine that the parallel mode suitable for the target neural network is the multi-device parallel mode when the computation of the entire target neural network is greater than the computing capability of a single device and/or the memory space required by the entire target neural network is greater than the memory capacity of a single device.
The second determination sub-module is configured to determine the parallel mode suitable for the target neural network based on the user-configured parallel mode when the computation of the entire target neural network is less than or equal to the computing capability of the single device and the memory space required by the entire target neural network is less than or equal to the memory capacity of the single device.
In one possible implementation, the second determination sub-module is specifically configured to: when the user-configured parallel mode is the single-device parallel mode, determine that the parallel mode suitable for the target neural network is the single-device parallel mode; when the user-configured parallel mode is the multi-device parallel mode, determine that the parallel mode suitable for the target neural network is the single-device parallel mode if the inter-device transmission time is greater than the preset maximum sub-network execution time, and the multi-device parallel mode if the inter-device transmission time is less than or equal to the preset maximum sub-network execution time.
In one possible implementation, the sub-network division sub-module includes a first division sub-module and a second division sub-module.
The first division sub-module is configured to, when the parallel mode suitable for the target neural network is the multi-device parallel mode, obtain the sub-network division count from the number of hardware devices and divide the target neural network according to that division count.
The second division sub-module is configured to, when the parallel mode suitable for the target neural network is the single-device parallel mode, divide the target neural network according to a preset sub-network division count.
In one possible implementation, the first division sub-module is specifically configured to divide the target neural network, based on the sub-network division count, using the theoretical computation each device is responsible for and the maximum amount of data transmitted between devices as the partitioning criteria.
The theoretical computation the single device is responsible for is determined by the computation of the entire target neural network and the sub-network division count, and the maximum amount of data transmitted between devices is determined by the preset maximum sub-network execution time and the inter-device transmission bandwidth.
In one possible implementation, the first division sub-module is specifically configured to traverse backward layer by layer from the input layer of the target neural network: accumulate the computation of each hidden layer in turn, and when the currently accumulated computation approaches the theoretical computation the single device is responsible for, take the accumulated adjacent hidden layers as a candidate sub-network; if the output data amount of the candidate sub-network is less than or equal to the maximum inter-device data amount, take the candidate sub-network as a sub-network obtained by the division; if the output data amount of the candidate sub-network is greater than the maximum inter-device data amount, remove hidden layers one by one from the back of the candidate sub-network until the output data amount of the remaining sub-network is less than or equal to the maximum inter-device data amount, and take the trimmed sub-network as a sub-network obtained by the division; continue the traversal until all sub-networks are obtained, where, after each sub-network is obtained, the accumulation restarts from the hidden layer following that sub-network.
In one possible implementation, the inference module 1003 is specifically configured to determine the dependency between the inference engines corresponding to the multiple sub-networks according to the dependency between the sub-networks, and to feed input data to the inference engines corresponding to the multiple sub-networks in order, so that each inference engine performs computation on its corresponding sub-network based on the input data and the corresponding inference instance.
The embodiment of the present application further provides a forward inference device for a neural network. Referring to Fig. 11, which shows a schematic structural diagram of the forward inference device, the device may include: at least one processor 1101, at least one communication interface 1102, at least one memory 1103 and at least one communication bus 1104;
In the embodiment of the present application, the number of processors 1101, communication interfaces 1102, memories 1103 and communication buses 1104 is at least one, and the processor 1101, the communication interface 1102 and the memory 1103 communicate with one another through the communication bus 1104;
The processor 1101 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of the present invention;
The memory 1103 may include a high-speed RAM memory, and may also include a non-volatile memory, for example at least one disk memory;
The memory stores a program, and the processor may call the program stored in the memory, the program being used for:
dividing a target neural network into multiple sub-networks, where any sub-network includes at least one hidden layer of the target neural network;
creating, on the hardware devices of an inference platform, the inference instances and inference engines corresponding to the multiple sub-networks;
performing forward inference on the target neural network based on the inference instances and inference engines corresponding to the multiple sub-networks.
Optionally, for the detailed and extended functions of the program, reference may be made to the description above.
The embodiment of the present application further provides a readable storage medium, which may store a program suitable for execution by a processor, the program being used for:
dividing a target neural network into multiple sub-networks, where any sub-network includes at least one hidden layer of the target neural network;
creating, on the hardware devices of an inference platform, the inference instances and inference engines corresponding to the multiple sub-networks;
performing forward inference on the target neural network based on the inference instances and inference engines corresponding to the multiple sub-networks.
Optionally, for the detailed and extended functions of the program, reference may be made to the description above.
Finally, it should be noted that, in this document, relational terms such as first and second are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise" or any other variant thereof are intended to cover a non-exclusive inclusion, so that a process, method, article or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of other identical elements in the process, method, article or device that includes that element.
The embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and the same or similar parts of the embodiments may be referred to one another.
The above description of the disclosed embodiments enables those skilled in the art to implement or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the present application. Therefore, the present application is not intended to be limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (24)

1. A forward inference method for a neural network, comprising:
dividing a target neural network into multiple sub-networks, wherein any sub-network comprises at least one hidden layer of the target neural network;
creating, on hardware devices of an inference platform, inference instances and inference engines corresponding to the multiple sub-networks;
performing forward inference on the target neural network based on the inference instances and inference engines corresponding to the multiple sub-networks.
2. The forward inference method for a neural network according to claim 1, wherein dividing the target neural network into multiple sub-networks comprises:
obtaining hardware device information of the inference platform and the computation and required memory space of the target neural network;
dividing the target neural network into multiple sub-networks based on the hardware device information of the inference platform and the computation and required memory space of the target neural network.
3. The forward inference method for a neural network according to claim 2, wherein the hardware device information of the inference platform comprises one or more of the following:
the number of hardware devices, the computing capability of the hardware devices, the memory capacity of the hardware devices, and the transmission bandwidth between hardware devices.
4. The forward inference method for a neural network according to claim 2 or 3, wherein obtaining the computation and required memory space of the target neural network comprises:
constructing a computation graph of the target neural network according to network parameters of the target neural network;
determining the computation and required memory space of each layer of the target neural network according to the computation graph of the target neural network;
determining the computation and required memory space of the entire target neural network from the computation and required memory space of each layer of the target neural network.
5. The forward inference method for a neural network according to claim 2 or 3, wherein dividing the target neural network into multiple sub-networks based on the hardware device information of the inference platform and the computation and required memory space of the target neural network comprises:
determining a parallel mode suitable for the target neural network based on the hardware device information of the inference platform, the computation and required memory space of the target neural network, and a user-configured parallel mode, wherein the parallel mode comprises a single-device parallel mode and a multi-device parallel mode, the forward inference of the target neural network being realized on a single device under the single-device parallel mode and on multiple devices under the multi-device parallel mode;
dividing the target neural network into multiple sub-networks based on the parallel mode suitable for the target neural network.
6. The forward inference method for a neural network according to claim 5, wherein determining the parallel mode suitable for the target neural network based on the hardware device information of the inference platform, the computation and required memory space of the target neural network, and the user-configured parallel mode comprises:
if the computation of the entire target neural network is greater than the computing capability of a single device, and/or the memory space required by the entire target neural network is greater than the memory capacity of a single device, determining that the parallel mode suitable for the target neural network is the multi-device parallel mode;
if the computation of the entire target neural network is less than or equal to the computing capability of the single device, and the memory space required by the entire target neural network is less than or equal to the memory capacity of the single device, determining the parallel mode suitable for the target neural network based on the user-configured parallel mode.
7. The forward inference method for a neural network according to claim 6, wherein determining the parallel mode suitable for the target neural network based on the user-configured parallel mode comprises:
when the user-configured parallel mode is the single-device parallel mode, determining that the parallel mode suitable for the target neural network is the single-device parallel mode;
when the user-configured parallel mode is the multi-device parallel mode, determining that the parallel mode suitable for the target neural network is the single-device parallel mode if the inter-device transmission time is greater than a preset maximum sub-network execution time, and determining that the parallel mode suitable for the target neural network is the multi-device parallel mode if the inter-device transmission time is less than or equal to the preset maximum sub-network execution time.
8. The forward inference method for a neural network according to claim 5, wherein dividing the target neural network into multiple sub-networks based on the parallel mode suitable for the target neural network comprises:
if the parallel mode suitable for the target neural network is the multi-device parallel mode, obtaining a sub-network division count based on the number of hardware devices, and dividing the target neural network based on the sub-network division count;
if the parallel mode suitable for the target neural network is the single-device parallel mode, dividing the target neural network based on a preset sub-network division count.
9. The forward inference method for a neural network according to claim 6, wherein dividing the target neural network based on the sub-network division count comprises:
dividing the target neural network based on the sub-network division count, using the theoretical computation a single device is responsible for and the maximum amount of data transmitted between devices as partitioning criteria;
wherein the theoretical computation the single device is responsible for is determined by the computation of the entire target neural network and the sub-network division count, and the maximum amount of data transmitted between devices is determined by a preset maximum sub-network execution time and the inter-device transmission bandwidth.
10. The forward inference method for a neural network according to claim 7, wherein dividing the target neural network based on the sub-network division count, using the theoretical computation a single device is responsible for and the maximum amount of data transmitted between devices as partitioning criteria, comprises:
traversing backward layer by layer from the input layer of the target neural network: accumulating the computation of each hidden layer in turn, and, when the currently accumulated computation approaches the theoretical computation the single device is responsible for, taking the accumulated adjacent hidden layers as a candidate sub-network;
if the output data amount of the candidate sub-network is less than or equal to the maximum amount of data transmitted between devices, taking the candidate sub-network as a sub-network obtained by the division; if the output data amount of the candidate sub-network is greater than the maximum amount of data transmitted between devices, removing hidden layers one by one from the back of the candidate sub-network until the output data amount of the remaining sub-network is less than or equal to the maximum amount of data transmitted between devices, and taking the trimmed sub-network as a sub-network obtained by the division;
continuing the traversal until all sub-networks are obtained, wherein, after each sub-network is obtained, the accumulation restarts from the computation of the hidden layer following that sub-network.
11. The forward inference method for a neural network according to claim 1, wherein performing forward inference on the target neural network based on the inference instances and inference engines corresponding to the multiple sub-networks comprises:
determining the dependency between the inference engines corresponding to the multiple sub-networks according to the dependency between the multiple sub-networks;
feeding input data to the inference engines corresponding to the multiple sub-networks in order, so that each inference engine performs computation on its corresponding sub-network based on the input data and the corresponding inference instance.
12. A forward inference apparatus for a neural network, comprising: a network processing module, an instance and engine creation module and an inference module;
the network processing module is configured to divide a target neural network into multiple sub-networks, wherein any sub-network comprises at least one hidden layer of the target neural network;
the instance and engine creation module is configured to create, on hardware devices of an inference platform, inference instances and inference engines corresponding to the multiple sub-networks;
the inference module is configured to perform forward inference on the target neural network based on the inference instances and inference engines corresponding to the multiple sub-networks.
13. The forward inference apparatus according to claim 12, wherein the network processing module comprises an information obtaining module and a sub-network division module;
the information obtaining module is configured to obtain the hardware device information of the inference platform and the computation and required memory space of the target neural network;
the sub-network division module is configured to divide the target neural network into multiple sub-networks based on the hardware device information of the inference platform and the computation and required memory space of the target neural network.
14. The forward inference apparatus for a neural network according to claim 13, wherein the hardware device information of the inference platform comprises one or more of the following:
the number of hardware devices, the computing capability of the hardware devices, the memory capacity of the hardware devices, and the transmission bandwidth between hardware devices.
15. The forward inference apparatus for a neural network according to claim 13 or 14, wherein the information obtaining module comprises a computation graph construction sub-module and a computation and memory space determination sub-module;
the computation graph construction sub-module is configured to construct the computation graph of the target neural network according to the network parameters of the target neural network;
the computation and memory space determination sub-module is configured to determine, according to the computation graph of the target neural network, the computation and required memory space of each layer of the target neural network, and to determine the computation and required memory space of the entire target neural network from the computation and required memory space of each layer.
16. The forward inference apparatus for a neural network according to claim 13 or 14, wherein the sub-network division module comprises a parallel mode determination sub-module and a sub-network division sub-module;
the parallel mode determination sub-module is configured to determine the parallel mode suitable for the target neural network based on the hardware device information of the inference platform, the computation and required memory space of the target neural network, and a user-configured parallel mode, wherein the parallel mode comprises a single-device parallel mode and a multi-device parallel mode, the forward inference of the target neural network being realized on a single device under the single-device parallel mode and on multiple devices under the multi-device parallel mode;
the sub-network division sub-module is configured to divide the target neural network into multiple sub-networks based on the parallel mode suitable for the target neural network.
17. The forward inference apparatus for a neural network according to claim 16, wherein the parallel mode determination sub-module comprises a first determination sub-module and a second determination sub-module;
the first determination sub-module is configured to determine that the parallel mode suitable for the target neural network is the multi-device parallel mode when the computation of the entire target neural network is greater than the computing capability of a single device and/or the memory space required by the entire target neural network is greater than the memory capacity of a single device;
the second determination sub-module is configured to determine the parallel mode suitable for the target neural network based on the user-configured parallel mode when the computation of the entire target neural network is less than or equal to the computing capability of the single device and the memory space required by the entire target neural network is less than or equal to the memory capacity of the single device.
18. The forward inference apparatus for a neural network according to claim 17, wherein the second determination sub-module is specifically configured to: when the user-configured parallel mode is the single-device parallel mode, determine that the parallel mode suitable for the target neural network is the single-device parallel mode; when the user-configured parallel mode is the multi-device parallel mode, determine that the parallel mode suitable for the target neural network is the single-device parallel mode if the inter-device transmission time is greater than a preset maximum sub-network execution time, and that it is the multi-device parallel mode if the inter-device transmission time is less than or equal to the preset maximum sub-network execution time.
19. The forward inference apparatus for a neural network according to claim 16, wherein the sub-network division sub-module comprises a first division sub-module and a second division sub-module;
the first division sub-module is configured to, when the parallel mode suitable for the target neural network is the multi-device parallel mode, obtain a sub-network division count based on the number of hardware devices and divide the target neural network based on the sub-network division count;
the second division sub-module is configured to, when the parallel mode suitable for the target neural network is the single-device parallel mode, divide the target neural network based on a preset sub-network division count.
20. The forward inference apparatus for a neural network according to claim 19, wherein the first division sub-module is specifically configured to divide the target neural network based on the sub-network division count, using the theoretical computation a single device is responsible for and the maximum amount of data transmitted between devices as partitioning criteria;
wherein the theoretical computation the single device is responsible for is determined by the computation of the entire target neural network and the sub-network division count, and the maximum amount of data transmitted between devices is determined by a preset maximum sub-network execution time and the inter-device transmission bandwidth.
21. The forward inference apparatus for a neural network according to claim 20, wherein the first division sub-module is specifically configured to traverse backward layer by layer from the input layer of the target neural network: accumulate the computation of each hidden layer in turn, and, when the currently accumulated computation approaches the theoretical computation the single device is responsible for, take the accumulated adjacent hidden layers as a candidate sub-network; if the output data amount of the candidate sub-network is less than or equal to the maximum amount of data transmitted between devices, take the candidate sub-network as a sub-network obtained by the division; if the output data amount of the candidate sub-network is greater than the maximum amount of data transmitted between devices, remove hidden layers one by one from the back of the candidate sub-network until the output data amount of the remaining sub-network is less than or equal to the maximum amount of data transmitted between devices, and take the trimmed sub-network as a sub-network obtained by the division; continue the traversal until all sub-networks are obtained, wherein, after each sub-network is obtained, the accumulation restarts from the computation of the hidden layer following that sub-network.
22. The forward inference apparatus for a neural network according to claim 12, wherein the inference module is specifically configured to determine the dependency between the inference engines corresponding to the multiple sub-networks according to the dependency between the multiple sub-networks;
and to feed input data to the inference engines corresponding to the multiple sub-networks in order, so that each inference engine performs computation on its corresponding sub-network based on the input data and the corresponding inference instance.
23. A forward inference device for a neural network, comprising: a memory and a processor;
the memory being configured to store a program;
the processor being configured to execute the program to implement the steps of the forward inference method for a neural network according to any one of claims 1 to 11.
24. A readable storage medium on which a computer program is stored, wherein, when the computer program is executed by a processor, the steps of the forward inference method for a neural network according to any one of claims 1 to 11 are implemented.
CN201910188467.6A 2019-03-13 2019-03-13 Forward reasoning method, device, equipment and storage medium of neural network Active CN109919315B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910188467.6A CN109919315B (en) 2019-03-13 2019-03-13 Forward reasoning method, device, equipment and storage medium of neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910188467.6A CN109919315B (en) 2019-03-13 2019-03-13 Forward reasoning method, device, equipment and storage medium of neural network

Publications (2)

Publication Number Publication Date
CN109919315A true CN109919315A (en) 2019-06-21
CN109919315B CN109919315B (en) 2021-10-01

Family

ID=66964550

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910188467.6A Active CN109919315B (en) 2019-03-13 2019-03-13 Forward reasoning method, device, equipment and storage medium of neural network

Country Status (1)

Country Link
CN (1) CN109919315B (en)

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110298437A (en) * 2019-06-28 2019-10-01 Oppo广东移动通信有限公司 Separation calculation method, apparatus, storage medium and the mobile terminal of neural network
CN110796242A (en) * 2019-11-01 2020-02-14 广东三维家信息科技有限公司 Neural network model reasoning method and device, electronic equipment and readable medium
CN110837419A (en) * 2019-11-08 2020-02-25 上海交通大学 Inference engine system and method based on elastic batch processing and electronic equipment
CN111372084A (en) * 2020-02-18 2020-07-03 北京大学 Parallel reasoning method and system for neural network coding and decoding tool
CN111753950A (en) * 2020-01-19 2020-10-09 杭州海康威视数字技术股份有限公司 Method, device and equipment for determining forward time consumption
WO2021134231A1 (en) * 2019-12-30 2021-07-08 深圳元戎启行科技有限公司 Computing resource allocation method and apparatus based on inference engine, and computer device
WO2021143883A1 (en) * 2020-01-15 2021-07-22 华为技术有限公司 Adaptive search method and apparatus for neural network
CN113469360A (en) * 2020-03-31 2021-10-01 杭州海康威视数字技术股份有限公司 Inference method and device
WO2022035058A1 (en) * 2020-08-13 2022-02-17 Samsung Electronics Co., Ltd. Method and system of dnn modularization for optimal loading
CN114501353A (en) * 2020-10-23 2022-05-13 维沃移动通信有限公司 Method for sending and receiving communication information and communication equipment
CN115186821A (en) * 2022-09-13 2022-10-14 之江实验室 Core particle-oriented neural network inference overhead estimation method and device and electronic equipment
WO2022217419A1 (en) * 2021-04-12 2022-10-20 深圳元戎启行科技有限公司 Neural network model inference method and apparatus, computer device, and storage medium
CN116629308A (en) * 2023-07-24 2023-08-22 科大讯飞股份有限公司 Neural network model reasoning method, device, equipment and storage medium
CN116739090A (en) * 2023-05-12 2023-09-12 北京大学 Deep neural network reasoning measurement method and device based on Web browser

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0513976A2 (en) * 1991-03-15 1992-11-19 Sharp Kabushiki Kaisha A video camera having an adaptive automatic iris control circuit
CN1659589A (en) * 2002-04-19 2005-08-24 电脑联合想象公司 System and method for providing inferencing services
CN102004486A (en) * 2010-09-26 2011-04-06 中国石油化工股份有限公司 Hybrid fault diagnosis method based on qualitative signed directed graph in petrochemical process
CN107203807A (en) * 2016-03-16 2017-09-26 中国科学院计算技术研究所 The computational methods of neutral net, system and its apparatus
CN107451653A (en) * 2017-07-05 2017-12-08 深圳市自行科技有限公司 Computational methods, device and the readable storage medium storing program for executing of deep neural network
CN107659609A (en) * 2017-07-26 2018-02-02 北京天云融创软件技术有限公司 A kind of deep learning support platform and deep learning training method based on cloud computing
CN107886167A (en) * 2016-09-29 2018-04-06 北京中科寒武纪科技有限公司 Neural network computing device and method
CN107945053A (en) * 2017-12-29 2018-04-20 广州思泰信息技术有限公司 A kind of multiple source power distribution network data convergence analysis platform and its control method
CN107977706A (en) * 2017-08-09 2018-05-01 小蚁科技(香港)有限公司 Modularized distribution type artificial neural network
CN108292241A (en) * 2015-10-28 2018-07-17 谷歌有限责任公司 Processing calculates figure
CN109299283A (en) * 2018-08-29 2019-02-01 阿里巴巴集团控股有限公司 A kind of data reasoning method, apparatus, server and the medium of knowledge based map


Cited By (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110298437B (en) * 2019-06-28 2021-06-01 Oppo广东移动通信有限公司 Neural network segmentation calculation method and device, storage medium and mobile terminal
CN110298437A (en) * 2019-06-28 2019-10-01 Oppo广东移动通信有限公司 Separation calculation method, apparatus, storage medium and the mobile terminal of neural network
CN110796242A (en) * 2019-11-01 2020-02-14 广东三维家信息科技有限公司 Neural network model reasoning method and device, electronic equipment and readable medium
CN110837419B (en) * 2019-11-08 2023-05-19 上海交通大学 Reasoning engine system and method based on elastic batch processing and electronic equipment
CN110837419A (en) * 2019-11-08 2020-02-25 上海交通大学 Inference engine system and method based on elastic batch processing and electronic equipment
WO2021134231A1 (en) * 2019-12-30 2021-07-08 深圳元戎启行科技有限公司 Computing resource allocation method and apparatus based on inference engine, and computer device
CN113412493A (en) * 2019-12-30 2021-09-17 深圳元戎启行科技有限公司 Inference engine-based computing resource allocation method and device and computer equipment
WO2021143883A1 (en) * 2020-01-15 2021-07-22 华为技术有限公司 Adaptive search method and apparatus for neural network
CN111753950A (en) * 2020-01-19 2020-10-09 杭州海康威视数字技术股份有限公司 Method, device and equipment for determining forward time consumption
CN111753950B (en) * 2020-01-19 2024-02-27 杭州海康威视数字技术股份有限公司 Forward time consumption determination method, device and equipment
CN111372084A (en) * 2020-02-18 2020-07-03 北京大学 Parallel reasoning method and system for neural network coding and decoding tool
CN111372084B (en) * 2020-02-18 2021-07-20 北京大学 Parallel reasoning method and system for neural network coding and decoding tool
CN113469360A (en) * 2020-03-31 2021-10-01 杭州海康威视数字技术股份有限公司 Inference method and device
CN113469360B (en) * 2020-03-31 2023-10-20 杭州海康威视数字技术股份有限公司 Reasoning method and device
WO2022035058A1 (en) * 2020-08-13 2022-02-17 Samsung Electronics Co., Ltd. Method and system of dnn modularization for optimal loading
CN114501353A (en) * 2020-10-23 2022-05-13 维沃移动通信有限公司 Method for sending and receiving communication information and communication equipment
CN114501353B (en) * 2020-10-23 2024-01-05 维沃移动通信有限公司 Communication information sending and receiving method and communication equipment
WO2022217419A1 (en) * 2021-04-12 2022-10-20 深圳元戎启行科技有限公司 Neural network model inference method and apparatus, computer device, and storage medium
CN115186821A (en) * 2022-09-13 2022-10-14 之江实验室 Core particle-oriented neural network inference overhead estimation method and device and electronic equipment
CN115186821B (en) * 2022-09-13 2023-01-06 之江实验室 Core particle-oriented neural network inference overhead estimation method and device and electronic equipment
CN116739090A (en) * 2023-05-12 2023-09-12 北京大学 Deep neural network reasoning measurement method and device based on Web browser
CN116739090B (en) * 2023-05-12 2023-11-28 北京大学 Deep neural network reasoning measurement method and device based on Web browser
CN116629308A (en) * 2023-07-24 2023-08-22 科大讯飞股份有限公司 Neural network model reasoning method, device, equipment and storage medium

Also Published As

Publication number Publication date
CN109919315B (en) 2021-10-01

Similar Documents

Publication Publication Date Title
CN109919315A (en) A kind of forward inference method, apparatus, equipment and the storage medium of neural network
US20190079975A1 (en) Scheduling method and system based on hybrid variable neighborhood search and gravitational search algorithm
US20190080271A1 (en) Coordinated Production and Transportation Scheduling Method and System Based on Improved Tabu Search Algorithm
CN110348571A (en) A kind of neural network model training method, device, chip and system
CN108122027A (en) A kind of training method of neural network model, device and chip
CN108776897A (en) Data processing method, device, server and computer readable storage medium
CN108491255B (en) Self-service MapReduce data optimal distribution method and system
CN107077390A (en) A kind of task processing method and network interface card
CN110187965A (en) The running optimizatin and data processing method of neural network, equipment and storage medium
CN108780524A (en) Arithmetic unit, circuit and correlation technique for neural network
CN103914556A (en) Large-scale graph data processing method
CN107357630A (en) A kind of method, apparatus and storage medium for realizing that virtual machine is synchronous
CN103873380B (en) A kind of method of adjustment of data distribution strategy, apparatus and system
CN105227601A (en) Data processing method in stream processing system, device and system
CN106502918A (en) A kind of scheduling memory method and device
CN108491924B (en) Neural network data serial flow processing device for artificial intelligence calculation
CN109933430A (en) The method and apparatus for distributing graphics processor
CN105051689A (en) Method, apparatus and system for scheduling resource pool in multi-core system
CN113094180B (en) Wireless federal learning scheduling optimization method and device
CN112862083B (en) Deep neural network inference method and device in edge environment
CN103842955B (en) A kind of job flow control method, device and system
CN105335135B (en) Data processing method and central node
CN105740249A (en) Processing method and system during big data operation parallel scheduling process
CN107844924A (en) A kind of execution method, apparatus and medium for controlling workflow
CN116911366A (en) Computing system neural network optimization method and device

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant