CN109284820A - Structure search method and device for a deep neural network - Google Patents

Structure search method and device for a deep neural network

Info

Publication number
CN109284820A
Authority
CN
China
Prior art keywords
neural network
computing unit
module
search
information flow
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811259033.2A
Other languages
Chinese (zh)
Inventor
Huang Zehao (黄泽昊)
Zhang Xinbang (张新邦)
Wang Naiyan (王乃岩)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Tusimple Future Technology Co Ltd
Original Assignee
Beijing Tusimple Future Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Tusimple Future Technology Co Ltd filed Critical Beijing Tusimple Future Technology Co Ltd
Priority to CN201811259033.2A priority Critical patent/CN109284820A/en
Publication of CN109284820A publication Critical patent/CN109284820A/en
Priority to PCT/CN2019/077049 priority patent/WO2020082663A1/en
Priority to CN201911007284.6A priority patent/CN110717586A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/082 Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The present invention provides a structure search method and device for a deep neural network, relating to the field of artificial intelligence. The method includes: obtaining, in a preset search space, the computing unit structure of every layer in each module connected sequentially in series in the deep neural network; connecting the computing units in each module in a preset connection manner to obtain the information flows in each module; obtaining an initial neural network according to the modules and the connections of the computing units in each module; attaching a sparse scaling operator to each information flow in the initial neural network, where the sparse scaling operator is used to scale the information flow; training the weights of the initial neural network and the sparse scaling operators of the information flows using preset training sample data to obtain an intermediate neural network; and deleting the information flows whose sparse scaling operators are zero from the intermediate neural network to obtain the search result neural network in the search space. The present invention can save time in network structure search.

Description

Structure search method and device for a deep neural network
Technical field
The present invention relates to the field of artificial intelligence, and more particularly to a structure search method and device for a deep neural network.
Background art
In recent years, deep neural networks have achieved great success in many fields, such as computer vision and natural language processing. With their powerful representation ability, deep neural networks have turned the traditional hand-design of features into end-to-end learning. However, current deep neural networks have complex structures with numerous computing unit nodes such as convolution and pooling, so searching among these nodes for a model structure that is compact, fast, and accurate has become a difficult problem.
The current prior art generally first defines a search space and then searches for the optimal network structure within it. Under normal circumstances, network structure search is performed either heuristically with a controller-based method or with an evolutionary algorithm. However, in the prior art a controller must be trained, or an evolutionary algorithm must be run, and the candidate sub-networks in the full set must be trained to convergence before they can be evaluated, so the time and computation cost of network structure search are very large; for larger datasets, the process of searching for the optimal network structure with these methods is cumbersome and slow.
Summary of the invention
Embodiments of the present invention provide a structure search method and device for a deep neural network, to solve the prior-art problems that network structure search requires very large amounts of time and computation and that, for larger datasets, the process of searching for the optimal network structure is cumbersome and slow.
To achieve the above objectives, the present invention adopts the following technical solutions:
In one aspect, the present invention provides a structure search method for a deep neural network, comprising:
obtaining, in a preset search space, the computing unit structure of every layer in each module connected sequentially in series in the deep neural network, where every layer's computing unit structure includes at least one computing unit;
connecting the computing units in each module in a preset connection manner to obtain the information flows in each module, where the computing units within the same layer's computing unit structure are not connected to each other, and each computing unit can be connected to the computing units in different layers of its module and to the input and output of its module;
obtaining an initial neural network according to the modules and the connections of the computing units in each module;
attaching a sparse scaling operator to each information flow in the initial neural network, where the sparse scaling operator is used to scale the information flow;
training the weights of the initial neural network and the sparse scaling operators of the information flows using preset training sample data to obtain an intermediate neural network; and
deleting the information flows whose sparse scaling operators are zero from the intermediate neural network to obtain the search result neural network in the search space.
In another aspect, the present invention provides a target detection method, comprising:
obtaining sample data for target detection, inputting it into the search result neural network obtained by the above structure search method for a deep neural network, and taking the output of the search result neural network as the target detection result.
In another aspect, the present invention provides a semantic segmentation method, comprising:
obtaining sample data for semantic segmentation, inputting it into the search result neural network obtained by the above structure search method for a deep neural network, and taking the output of the search result neural network as the semantic segmentation result.
In yet another aspect, the present invention provides a structure search device for a deep neural network, comprising:
a computing unit structure obtaining unit, configured to obtain, in a preset search space, the computing unit structure of every layer in each module connected sequentially in series in the deep neural network, where every layer's computing unit structure includes at least one computing unit;
an information flow obtaining unit, configured to connect the computing units in each module in a preset connection manner to obtain the information flows in each module, where the computing units within the same layer's computing unit structure are not connected to each other, and each computing unit can be connected to the computing units in different layers of its module and to the input and output of its module;
an initial neural network obtaining unit, configured to obtain an initial neural network according to the modules and the connections of the computing units in each module;
a sparse scaling operator setting unit, configured to attach a sparse scaling operator to each information flow in the initial neural network, where the sparse scaling operator is used to scale the information flow;
a weight and operator training unit, configured to train the weights of the initial neural network and the sparse scaling operators of the information flows using preset training sample data to obtain an intermediate neural network; and
a search result obtaining unit, configured to delete the information flows whose sparse scaling operators are zero from the intermediate neural network to obtain the search result neural network in the search space.
In still another aspect, the present invention provides a computer-readable storage medium on which a computer program is stored, where the program, when executed by a processor, implements the above structure search method for a deep neural network.
In still another aspect, the present invention provides a computer device, including a memory, a processor, and a computer program stored in the memory and runnable on the processor, where the processor, when executing the program, implements the above structure search method for a deep neural network.
With the structure search method and device for a deep neural network provided by the embodiments of the present invention, first, the computing unit structure of every layer in each module connected sequentially in series in the deep neural network is obtained in a preset search space, where every layer's computing unit structure includes at least one computing unit; then the computing units in each module are connected in a preset connection manner to obtain the information flows in each module, where the computing units within the same layer's computing unit structure are not connected to each other, and each computing unit can be connected to the computing units in different layers of its module and to the input and output of its module; then an initial neural network is obtained according to the modules and the connections of the computing units in each module; a sparse scaling operator used to scale the information flow is attached to each information flow in the initial neural network; the weights of the initial neural network and the sparse scaling operators of the information flows are trained using preset training sample data to obtain an intermediate neural network; and finally the information flows whose sparse scaling operators are zero are deleted from the intermediate neural network to obtain the search result neural network in the search space. Unlike the prior art, which searches for important network structures directly in the search space, the present invention realizes network structure search by deleting unimportant information flows through the sparse scaling operators. During the structure search, the present invention requires no training of a controller, no complicated evolutionary algorithm, and no lengthy training of sub-networks: the search result is obtained merely by training the weights and the sparse scaling operators, so the time of network structure search is greatly reduced, especially for network structure search on large-scale datasets.
Brief description of the drawings
To explain the technical solutions in the embodiments of the present invention or in the prior art more clearly, the accompanying drawings needed in the description of the embodiments or the prior art are briefly introduced below. Apparently, the accompanying drawings in the following description are only some embodiments of the present invention, and those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flowchart of a structure search method for a deep neural network provided by an embodiment of the present invention;
Fig. 2 is a schematic diagram of the network structure of the search space of a deep neural network involved in an embodiment of the present invention;
Fig. 3 is a schematic example of an embodiment of the present invention applied to the network search of a two-layer structure;
Fig. 4 is a schematic structural diagram of a structure search device for a deep neural network provided by an embodiment of the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention will be described below clearly and completely with reference to the accompanying drawings. Apparently, the described embodiments are only part of the embodiments of the present invention, not all of them. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.
To facilitate the understanding of the present invention, the technical terms involved are explained below:
DNN: Deep Neural Network.
Computing unit: a unit node in a neural network used to perform calculations such as convolution and pooling.
Network structure search: the process of searching for the optimal network structure of a neural network.
In the course of implementing the embodiments of the present invention, the inventors found that the prior art generally uses a controller-based heuristic for network structure search, namely:
some network structures to be searched are built according to prior knowledge and the deep neural network structure (specific structures such as neurons, network layers, groups, and modules); a controller is then set up for the network structures to be searched, and the search is solved in a distributed manner, i.e., for each controller, multiple candidate network structures are computed in parallel, and the accuracy of each network structure is used to run gradient descent on the controller so as to obtain the optimal network structure. As can be seen, the controller-based heuristic for network structure search requires training a large number of controllers and distributed solving, and the process is relatively cumbersome and slow.
To solve the above problems in the prior art, as shown in Fig. 1, an embodiment of the present invention provides a structure search method for a deep neural network, comprising:
Step 101: obtain, in a preset search space, the computing unit structure of every layer in each module connected sequentially in series in the deep neural network.
Here, every layer's computing unit structure includes at least one computing unit.
Step 102: connect the computing units in each module in a preset connection manner to obtain the information flows in each module.
Here, the computing units within the same layer's computing unit structure are not connected to each other; each computing unit can be connected to the computing units in different layers of its module and to the input and output of its module.
Step 103: obtain an initial neural network according to the modules and the connections of the computing units in each module.
Step 104: attach a sparse scaling operator to each information flow in the initial neural network, where the sparse scaling operator is used to scale the information flow.
Step 105: train the weights of the initial neural network and the sparse scaling operators of the information flows using preset training sample data to obtain an intermediate neural network.
Step 106: delete the information flows whose sparse scaling operators are zero from the intermediate neural network to obtain the search result neural network in the search space.
It is worth noting that the preset search space can be as shown in Fig. 2. The deep neural network may include multiple modules 21, and the modules 21 are connected sequentially in series, i.e., the output of one module is the input of the next. Each module 21 (which can be viewed as a directed acyclic graph) may include multiple layers of computing unit structures 22, and every layer of computing unit structure 22 includes at least one computing unit 23 (each computing unit can be viewed as a node of the directed acyclic graph). The computing units 23 in every layer of computing unit structure 22 generally include at least one of a convolution computing unit and a pooling computing unit; the convolution computing unit may also be a dilated convolution computing unit, a group convolution computing unit, or the like.
Preferably, the above step 102 can be implemented in the following way:
In each module 21, the computing units 23 are fully connected, i.e., as shown in Fig. 2, each computing unit 23 is connected to the computing units 23 in different layers of its module 21 and to the input and output of its module 21. In this way, the information flows from the module input to every layer's computing unit structure 22, from every layer's computing unit structure 22 to the module output, and between the computing units 23 are obtained (an information flow can be viewed as an edge between nodes of the directed acyclic graph). The search space thus contains the complete set of network structures (any network structure in the search space can be viewed as a subgraph of the above directed acyclic graph). For example, in a module 21, the output h^{(i)} of the i-th computing unit F^{(i)} equals the sum of the outputs h^{(j)} of all preceding computing units passed through the computation F^{(i)}, which can be formulated as:

h^{(i)} = F^{(i)}\left( \sum_{j=0}^{i-1} h^{(j)} \right)   Formula (1)

where h^{(0)} denotes the module input.
In this way, according to the structure shown in Fig. 2, the initial neural network can be obtained in the above step 103.
Further, after the above step 103, the weights of the initial neural network can be configured so as to initialize them. Or, preferably, after the above step 103, the weights of the initial neural network can be pre-trained using preset pre-training sample data to obtain a pre-trained initial neural network; after such pre-training, the weights of the obtained initial neural network are of better quality. Configuring or pre-training the weights here provides initial weight values for the initial neural network, facilitating the subsequent setting and training of the sparse scaling operators.
Later, in the above step 104, a sparse scaling operator needs to be attached to each information flow in the initial neural network: for example, a sparse scaling operator \lambda^{(i,j)} is placed on the output h^{(j)} of each preceding computing unit, representing the sparse scaling operator on the information flow from the j-th computing unit to the i-th computing unit. After the sparse scaling operators are added, the above formula (1) should be expressed as:

h^{(i)} = F^{(i)}\left( \sum_{j=0}^{i-1} \lambda^{(i,j)} h^{(j)} \right)   Formula (2)

Here, the value of each sparse scaling operator is greater than or equal to 0. For example, after the weights of the initial neural network are configured to initialize them, the sparse scaling operators can take values in the interval [0, 1] and need not equal 1; after the weights of the initial neural network are pre-trained with the preset pre-training sample data, the sparse scaling operators are generally set to 1.
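Formula (2) maps directly onto a modern deep learning framework. The following is a minimal PyTorch-style sketch, not the patent's implementation: the class name SearchableModule is illustrative, and the sketch flattens the layered units into one topological order, which preserves the rule that units of the same layer never feed each other.

```python
import torch
import torch.nn as nn

class SearchableModule(nn.Module):
    """One module of the initial network: unit i consumes the
    lambda-scaled sum of the module input and all earlier unit
    outputs (formula (2)); intra-layer connections are absent
    because only strictly earlier outputs enter the sum."""

    def __init__(self, units):
        # units: computing units F^(1)..F^(I) in topological order,
        # each mapping a tensor to a same-shaped tensor
        super().__init__()
        self.units = nn.ModuleList(units)
        # one sparse scaling operator per information flow (i, j),
        # initialized to 1 as after weight pre-training
        self.lambdas = nn.ParameterDict({
            f"{i}_{j}": nn.Parameter(torch.ones(1))
            for i in range(1, len(units) + 1)
            for j in range(i)
        })

    def forward(self, x):
        outputs = [x]  # h^(0): the module input
        for i, unit in enumerate(self.units, start=1):
            # h^(i) = F^(i)( sum_{j<i} lambda^(i,j) * h^(j) )
            agg = sum(self.lambdas[f"{i}_{j}"] * outputs[j] for j in range(i))
            outputs.append(unit(agg))
        return outputs[-1]
```

Each lambda is an ordinary trainable parameter, so the sparse regularization and proximal updates described later in this section apply to self.lambdas directly.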
The following illustrates this with the structure search of a convolutional neural network, in which the computing units are convolution computing units and pooling computing units and the information flows are the feature maps in the network. The convolutional neural network structure contains several modules, each module includes several layers of computing unit structures, and each layer of computing unit structure in turn includes several different computing units (for example, 1×1 convolution, 3×3 convolution, 5×5 convolution, pooling, etc., and not limited to these). The modules are connected sequentially in series, i.e., the output of one module is the input of the next, and each computing unit is connected to the computing units in different layers of its module and to the input and output of its module. The output of each computing unit can then be expressed; for example, the output of the j-th computing unit in the i-th layer of the b-th module can be written as:

h(b,i,j) = F^{(b,i,j)}\left( \sum_{m=0}^{i-1} \sum_{n} \lambda_{(m,n)}^{(b,i,j)}\, h(b,m,n) \right)   Formula (3)

where F^{(b,i,j)}(x) denotes the computation of the j-th computing unit in the i-th layer of the b-th module; N denotes the number of computing units contained in one layer's computing unit structure; \lambda_{(m,n)}^{(b,i,j)} denotes the sparse scaling operator on the information flow from the n-th computing unit in the m-th layer of the b-th module to the j-th computing unit in the i-th layer of the b-th module; h(b,m,n) denotes the output of the n-th computing unit in the m-th layer of the b-th module; O(b-1) denotes the output of the (b-1)-th module, i.e., the input of the b-th module; and \lambda_{(0,0)}^{(b,i,j)} denotes the sparse scaling operator on the information flow from the module input O(b-1) to the j-th computing unit in the i-th layer of the b-th module. Here, let h(b,0,0) = O(b-1) serve as the input of the b-th module, and let h(b,M+1,0) = O(b) serve as the output of the b-th module, where M denotes the number of layers contained in the b-th module. It follows that a computing unit located in the m-th layer has (m-1)N+1 inputs; for instance, with N = 2 units per layer, a unit in the third layer has 2×2+1 = 5 inputs.
It should be noted here that, in the embodiments of the present invention, the connection from each computing unit to the output of its module is also trainable. For example, in the above convolutional neural network, the output O(b) of the b-th module can be obtained by concatenating the outputs of all computing units in the module and then applying a convolution with kernel size 1 to reduce the number of channels of the feature map so that the channel count stays constant, as in the following formula:

O(b) = R\left( \lambda_{(1,1)}^{(b,O)} h(b,1,1),\ \ldots,\ \lambda_{(M,N)}^{(b,O)} h(b,M,N),\ \lambda_{(0,0)}^{(b,O)} O(b-1) \right)   Formula (4)

where h(b,m,n) denotes the output of the n-th computing unit in the m-th layer of the b-th module, \lambda_{(m,n)}^{(b,O)} denotes the scaling operator on the information flow connecting the n-th computing unit in the m-th layer to the output of the b-th module, and O(b-1) denotes the output of the (b-1)-th module, i.e., the input of the b-th module. R(x) denotes the concatenation of the feature maps followed by a convolution with kernel size 1, used to fuse the feature maps and guarantee that the number of channels of the module output stays constant.
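The trainable module output of formula (4) can be sketched the same way. This is an illustration under assumed conventions (2D feature maps, every unit output with the same channel count and spatial size; the class name ModuleOutput is not from the patent):

```python
class ModuleOutput(nn.Module):
    """R(x) of formula (4): scale every unit output (and the scaled
    module input) by its own sparse scaling operator, concatenate
    along the channel axis, and fuse with a 1x1 convolution so the
    number of output channels of the module stays constant."""

    def __init__(self, num_flows, channels):
        super().__init__()
        self.out_lambdas = nn.Parameter(torch.ones(num_flows))
        self.fuse = nn.Conv2d(num_flows * channels, channels, kernel_size=1)

    def forward(self, feature_maps):
        # feature_maps: [h(b,1,1), ..., h(b,M,N), O(b-1)]
        scaled = [lam * f for lam, f in zip(self.out_lambdas, feature_maps)]
        return self.fuse(torch.cat(scaled, dim=1))
```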
The above step 105 can be realized in the following way:
Step S1: construct the objective function corresponding to the initial neural network; the objective function includes a loss function, a weight regular function, and a sparse regular function, and can for example be the formula:

\min_{W,\lambda}\ \frac{1}{K} \sum_{i=1}^{K} L\big(y_i, \mathrm{Net}(x_i, W, \lambda)\big) + \delta\, R(W) + \gamma\, \|\lambda\|_1   Formula (5)

where W is the weight, \lambda is the vector of sparse scaling operators, K is the number of sample data, L(y_i, Net(x_i, W, \lambda)) is the loss of the neural network on sample data x_i, y_i is the sample label, Net(x_i, W, \lambda) is the output of the neural network, R(W) is the weight regular function, \delta is the parameter decay weight of the weight W, and \gamma\|\lambda\|_1 is the sparse regular function, denoted Rs(\lambda). In addition, the sparse regular function \gamma\|\lambda\|_1 here can also be replaced by a more complicated sparse constraint, such as a non-convex sparse constraint.
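For concreteness, formula (5) can be written as a loss routine. The sketch below assumes a classification loss and takes a squared-L2 regularizer for R(W), which formula (5) leaves generic; the function name objective is illustrative.

```python
import torch.nn.functional as F

def objective(net, x, y, w_params, lam_params, delta, gamma):
    """(1/K) sum_i L(y_i, Net(x_i, W, lambda)) + delta*R(W) + gamma*||lambda||_1."""
    loss = F.cross_entropy(net(x), y)                  # mean loss over the batch
    r_w = sum((w ** 2).sum() for w in w_params)        # R(W), here squared L2
    r_s = sum(lam.abs().sum() for lam in lam_params)   # Rs(lambda) = ||lambda||_1
    return loss + delta * r_w + gamma * r_s
```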
Step S2: iteratively train the initial neural network using the training sample data.
Step S3: when the number of training iterations reaches a threshold or the objective function meets a preset convergence condition, obtain the intermediate neural network.
Preferably, the above step S2 can be implemented by performing multiple rounds of the following iterative training on the initial neural network. Taking one iteration that is neither the first nor the last (hereinafter referred to as the current iteration) as an example, one round of iterative training includes the following steps C1 to C3:
Step C1: take the sparse scaling operators obtained in the previous iteration as constants of the objective function and the weights as variables of the objective function, optimize the objective function using the first optimization algorithm, and obtain the weights of the current iteration;
Step C2: take the weights of the current iteration as constants of the objective function and the sparse scaling operators as variables of the objective function, optimize the objective function using the second optimization algorithm, and obtain the sparse scaling operators of the current iteration;
Step C3: perform the next iteration based on the weights and sparse scaling operators of the current iteration.
In addition, the first iteration proceeds as follows: take the initial sparse scaling operators as constants of the objective function and the weights as variables, optimize the objective function using the first optimization algorithm, and obtain the weights of this iteration; take the weights of this iteration as constants and the sparse scaling operators as variables, optimize the objective function using the second optimization algorithm, and obtain the sparse scaling operators of this iteration; then perform the second iteration based on the weights and sparse scaling operators of this iteration.
In addition, the last iteration proceeds as follows: take the sparse scaling operators obtained in the previous iteration as constants of the objective function and the weights as variables, optimize the objective function using the first optimization algorithm, and obtain the weights of this iteration; take the weights of this iteration as constants and the sparse scaling operators as variables, optimize the objective function using the second optimization algorithm, and obtain the sparse scaling operators of this iteration; then take the neural network containing the sparse scaling operators and weights obtained in this iteration as the intermediate neural network.
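Putting steps C1 to C3 together, one hedged sketch of the alternating loop follows, reusing the objective routine and imports above. Per the algorithms named in the next two paragraphs, SGD updates the weights and a plain proximal gradient step updates lambda; the names train_supernet, lr_w, and lr_lam are illustrative.

```python
def soft_threshold(z, alpha):
    # S_alpha(z)_i = sign(z_i) * max(|z_i| - alpha, 0)
    return torch.sign(z) * torch.clamp(z.abs() - alpha, min=0.0)

def train_supernet(net, loader, epochs, delta, gamma, lr_w, lr_lam):
    w_params = [p for n, p in net.named_parameters() if "lambdas" not in n]
    lam_params = [p for n, p in net.named_parameters() if "lambdas" in n]
    opt_w = torch.optim.SGD(w_params, lr=lr_w, momentum=0.9)

    for _ in range(epochs):
        for x, y in loader:
            # C1: lambda held constant, optimize the weights W
            opt_w.zero_grad()
            objective(net, x, y, w_params, lam_params, delta, gamma).backward()
            opt_w.step()

            # C2: W held constant, proximal gradient step on lambda;
            # only the smooth part g(lambda) is differentiated, the
            # L1 term is handled by the soft-threshold proximal map
            g = torch.autograd.grad(F.cross_entropy(net(x), y), lam_params)
            with torch.no_grad():
                for lam, grad in zip(lam_params, g):
                    lam.copy_(soft_threshold(lam - lr_lam * grad, lr_lam * gamma))
            # C3: the next iteration reuses the just-updated W and lambda
    return net
```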
Here, in the embodiments of the present invention, the first optimization algorithm may be, but is not limited to, either of the following: the stochastic gradient descent algorithm, or its variants that introduce momentum.
Here, in the embodiments of the present invention, the second optimization algorithm may be, but is not limited to, any of the following: the accelerated proximal gradient descent algorithm, the proximal gradient descent algorithm, or the alternating direction method of multipliers.
To further detail how W and \lambda are solved from the objective function in the embodiments of the present invention, one training iteration that optimizes the objective function to obtain W and \lambda is described below, taking the objective function of formula (5) as an example. Denote \frac{1}{K} \sum_{i=1}^{K} L(y_i, \mathrm{Net}(x_i, W, \lambda)) by g(\lambda), and denote Rs(\lambda) by H(\lambda).
With \lambda as a constant and W as the variable, the objective function becomes \min_W \frac{1}{K} \sum_{i=1}^{K} L(y_i, \mathrm{Net}(x_i, W, \lambda)) + \delta R(W). The value of W can be solved using the stochastic gradient descent algorithm; the detailed process is not elaborated here.
With W as a constant and \lambda as the variable, the objective function becomes \min_\lambda g(\lambda) + H(\lambda). The value of \lambda is solved using the accelerated proximal gradient descent algorithm, and can specifically be obtained by, but is not limited to, the following modes:
Mode 1 obtains \lambda using formulas (6) to (8):

d_t = \lambda_{t-1} + \mu_{t-1} \left( \lambda_{t-1} - \lambda_{t-2} \right)   Formula (6)
z_t = d_t - \eta_t \nabla g(d_t)   Formula (7)
\lambda_t = \mathrm{prox}_{\eta_t H}(z_t)   Formula (8)

where \eta_t denotes the gradient descent step size in the t-th training iteration and \mu_{t-1} denotes the momentum coefficient. Since H(\lambda) = \gamma\|\lambda\|_1, \mathrm{prox}_{\eta_t H} = S_{\eta_t \gamma} is the soft-threshold operator, defined componentwise as S_\alpha(z)_i = \mathrm{sign}(z_i)\,(|z_i| - \alpha)_+.
Mode 2: solving \lambda by the foregoing mode 1 requires an additional forward-backward computation to obtain \nabla g(d_t), which is difficult to implement directly with existing deep learning frameworks. Mode 2 therefore rewrites the formulas of mode 1 into formulas (9) to (11), and \lambda is calculated according to them:

z_t = \lambda_{t-1} + \mu_{t-1} v_{t-1}   Formula (9)
v_t = S_{\eta_t \gamma}\big( z_t - \eta_t \nabla g(z_t) \big) - \lambda_{t-1}   Formula (10)
\lambda_t = \lambda_{t-1} + v_t   Formula (11)
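A sketch of one mode-2 update for \lambda, formulas (9) to (11); grad_g is assumed to be a callable returning \nabla g at a given point, and soft_threshold is the helper from the training-loop sketch above:

```python
def apg_update(lam, v, grad_g, eta, mu, gamma):
    """Momentum-buffer form of the accelerated proximal gradient step."""
    z = lam + mu * v                                                # formula (9)
    v_new = soft_threshold(z - eta * grad_g(z), eta * gamma) - lam  # formula (10)
    return lam + v_new, v_new                                       # formula (11)
```

This form maintains only the momentum buffer v, matching how deep learning frameworks implement NAG-style updates, which is what makes it easier to realize than mode 1.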
Mode 3: the present invention can also use a variable substitution method, i.e., calculate \lambda using the following formulas (12) to (14):

z_t = \lambda'_{t-1} - \eta_t \nabla g(\lambda'_{t-1})   Formula (12)
v_t = S_{\eta_t \gamma}(z_t) - \lambda'_{t-1} + \mu_{t-1} v_{t-1}   Formula (13)
\lambda'_t = S_{\eta_t \gamma}(z_t) + \mu_t v_t   Formula (14)

where \lambda'_{t-1} = \lambda_{t-1} + \mu_{t-1} v_{t-1} and \mu is a preset fixed value, and W and \lambda are updated in the form of mini-batch stochastic gradient descent.
Later, in the above step 106, the information flows whose sparse scaling operators are zero can be deleted from the intermediate neural network to obtain the search result neural network in the search space. Moreover, after all the information flows corresponding to the connections of a computing unit have been deleted, that computing unit has no effect on subsequent computation and can itself be deleted.
For example, as shown in Fig. 3, an embodiment of the present invention is applied to an image classification task. Suppose the basic network contains a two-layer structure, Level1 and Level2, each layer has two different computing units, OP1 and OP2, and the connections between the computing units are as shown on the leftmost side of Fig. 3. After the above steps 101 to 105, training yields sparse scaling operators of value 0 on the connections shown as dotted lines in the middle of Fig. 3. Then, as shown on the rightmost side of Fig. 3, after these dotted lines are deleted, the computing unit OP1 of layer Level1 is confirmed to have no corresponding information flow connected and is also deleted, finally obtaining the search result neural network.
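Pruning in step 106 amounts to keeping the flows with nonzero \lambda and then discarding units that no longer feed anything. The sketch below operates on the SearchableModule sketch from earlier; the name prune_flows and the tolerance eps (standing in for exact zero) are illustrative:

```python
def prune_flows(module, eps=1e-8):
    """Step 106: keep flows whose sparse scaling operator is nonzero,
    then mark units dead when no surviving flow consumes their output
    (they no longer affect later computation, so they can be removed)."""
    kept = {key for key, lam in module.lambdas.items()
            if lam.abs().item() > eps}
    num_units = len(module.units)
    dead = set()
    # the final unit feeds the module output directly and is always kept
    for j in range(1, num_units):
        consumers = {f"{i}_{j}" for i in range(j + 1, num_units + 1)}
        if not consumers & kept:   # no surviving flow reads unit j
            dead.add(j)
    return kept, dead
```

Removing a dead unit can in turn orphan the flows feeding it, so the pass can be repeated until the kept set stabilizes.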
It is worth noting that the example of Fig. 3 is only one concrete application of the embodiments of the present invention, not all of them. Besides the structure search of a single module, in the present invention the sparse scaling operators located in different modules of the network can be updated independently, so that different modules can be trained and searched to obtain more flexible network structures.
In addition, an embodiment of the present invention also provides a target detection method, comprising:
obtaining sample data for target detection, inputting it into the search result neural network obtained by the structure search method for a deep neural network corresponding to Fig. 1, and taking the output of the search result neural network as the target detection result.
In addition, an embodiment of the present invention also provides a semantic segmentation method, comprising:
obtaining sample data for semantic segmentation, inputting it into the search result neural network obtained by the structure search method for a deep neural network corresponding to Fig. 1, and taking the output of the search result neural network as the semantic segmentation result.
The structure search method for a deep neural network corresponding to Fig. 1 is not limited to target detection and semantic segmentation tasks; it can also be used in other tasks, which are not enumerated here.
In addition, as shown in Fig. 4, an embodiment of the present invention also provides a structure search device for a deep neural network, comprising:
a computing unit structure obtaining unit 31, configured to obtain, in a preset search space, the computing unit structure of every layer in each module connected sequentially in series in the deep neural network, where every layer's computing unit structure includes at least one computing unit;
an information flow obtaining unit 32, configured to connect the computing units in each module in a preset connection manner to obtain the information flows in each module, where the computing units within the same layer's computing unit structure are not connected to each other, and each computing unit can be connected to the computing units in different layers of its module and to the input and output of its module;
an initial neural network obtaining unit 33, configured to obtain an initial neural network according to the modules and the connections of the computing units in each module;
a sparse scaling operator setting unit 34, configured to attach a sparse scaling operator to each information flow in the initial neural network, where the sparse scaling operator is used to scale the information flow;
a weight and operator training unit 35, configured to train the weights of the initial neural network and the sparse scaling operators of the information flows using preset training sample data to obtain an intermediate neural network;
a search result obtaining unit 36, configured to delete the information flows whose sparse scaling operators are zero from the intermediate neural network to obtain the search result neural network in the search space.
In addition, an embodiment of the present invention also provides a computer-readable storage medium on which a computer program is stored; when the program is executed by a processor, it implements the structure search method for a deep neural network corresponding to Fig. 1.
In addition, an embodiment of the present invention also provides a computer device, including a memory, a processor, and a computer program stored in the memory and runnable on the processor; when executing the program, the processor implements the structure search method for a deep neural network corresponding to Fig. 1.
In conclusion a kind of search structure method and device of deep neural network provided in an embodiment of the present invention, firstly, Every layer of computing unit knot in each module being sequentially connected in series in deep neural network is obtained in pre-set search space Structure;Every layer of computing unit structure includes at least one computing unit;It later, will be each using default connection type in each module Computing unit is attached, and obtains the information flow in each module;Wherein, the computing unit in same layer computing unit structure Between without connection, each computing unit can with its institute's different layers in the module computing unit and its place Outputting and inputting for module is attached;Then, it according to the connection of the computing unit in module and each module, obtains just Beginning neural network;Sparse scaling operator is arranged to the information flow in initial neural network, wherein sparse scaling operator is used for letter Breath stream zooms in and out;Using preset training sample data to the weight of initial neural network and the sparse scaling operator of information flow It is trained, obtains intermediate nerve network;In turn, the information flow that scaling operator sparse in intermediate nerve network is zero is deleted, Obtain the search result neural network in search space.The present invention and important net is directly searched for from search space in the prior art Network structure is different, and the present invention can delete unessential information flow by sparse scaling operator to realize the search of network structure.This Invention is in the search process of network structure, without being trained to controller, without complicated evolution algorithm is used, is not required to Sub-network is trained for a long time, search result only can be obtained by the training to weight and sparse scaling operator, So that the time of network structure search greatly reduces, searches for especially for the network structure on large-scale dataset, more save Save the time of network structure search.
Those skilled in the art should understand that embodiments of the present invention may be provided as a method, a system, or a computer program product. Therefore, the present invention may take the form of a complete hardware embodiment, a complete software embodiment, or an embodiment combining software and hardware. Moreover, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.
The present invention is described with reference to flowcharts and/or block diagrams of the method, device (system), and computer program product according to the embodiments of the present invention. It should be understood that every flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be realized by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce a device for realizing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of guiding a computer or another programmable data processing device to work in a specific manner, so that the instructions stored in the computer-readable memory produce a manufactured article including an instruction device that realizes the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or another programmable data processing device, so that a series of operation steps are executed on the computer or other programmable device to produce computer-implemented processing, and the instructions executed on the computer or other programmable device provide steps for realizing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Specific embodiments have been applied in the present invention to explain its principles and implementations; the descriptions of the above embodiments are only intended to help understand the method of the present invention and its core idea. Meanwhile, for those of ordinary skill in the art, there will be changes in the specific implementation and application scope according to the idea of the present invention. In conclusion, the contents of this specification should not be construed as limiting the present invention.

Claims (15)

1. A structure search method for a deep neural network, characterized by comprising:
obtaining, in a preset search space, the computing unit structure of every layer in each module connected sequentially in series in the deep neural network, where every layer's computing unit structure includes at least one computing unit;
connecting the computing units in each module in a preset connection manner to obtain the information flows in each module, where the computing units within the same layer's computing unit structure are not connected to each other, and each computing unit can be connected to the computing units in different layers of its module and to the input and output of its module;
obtaining an initial neural network according to the modules and the connections of the computing units in each module;
attaching a sparse scaling operator to each information flow in the initial neural network, where the sparse scaling operator is used to scale the information flow;
training the weights of the initial neural network and the sparse scaling operators of the information flows using preset training sample data to obtain an intermediate neural network; and
deleting the information flows whose sparse scaling operators are zero from the intermediate neural network to obtain the search result neural network in the search space.
2. The structure search method for a deep neural network according to claim 1, characterized in that the computing units of every layer's computing unit structure include at least one of a convolution computing unit and a pooling computing unit.
3. The structure search method for a deep neural network according to claim 1, characterized in that connecting the computing units in each module in a preset connection manner to obtain the information flows in each module comprises:
in each module, connecting each computing unit to the computing units in different layers of its module and to the input and output of its module, and obtaining the information flows from the module input to every layer's computing unit structure, from every layer's computing unit structure to the module output, and between the computing units.
4. The structure search method for a deep neural network according to claim 1, characterized by further comprising, after obtaining the initial neural network according to the modules and the connections of the computing units in each module:
configuring the weights of the initial neural network to initialize the weights of the initial neural network.
5. The structure search method for a deep neural network according to claim 1, characterized by further comprising, after obtaining the initial neural network according to the modules and the connections of the computing units in each module:
pre-training the weights of the initial neural network using preset pre-training sample data to obtain a pre-trained initial neural network.
6. The structure search method for a deep neural network according to claim 1, characterized by further comprising, after deleting the information flows whose sparse scaling operators are zero from the intermediate neural network:
deleting a computing unit after all the information flows corresponding to its connections have been deleted.
7. The structure search method for a deep neural network according to claim 1, characterized in that training the weights of the initial neural network and the sparse scaling operators of the information flows using preset training sample data to obtain an intermediate neural network comprises:
constructing the objective function corresponding to the initial neural network, the objective function including a loss function, a weight regular function, and a sparse regular function;
iteratively training the initial neural network using the training sample data; and
obtaining the intermediate neural network when the number of training iterations reaches a threshold or the objective function meets a preset convergence condition.
8. The structure search method for a deep neural network according to claim 7, characterized in that iteratively training the initial neural network using the training sample data specifically includes performing multiple rounds of the following iterative training on the initial neural network:
taking the sparse scaling operators obtained in the previous iteration as constants of the objective function and the weights as variables of the objective function, optimizing the objective function using a first optimization algorithm, and obtaining the weights of the current iteration;
taking the weights of the current iteration as constants of the objective function and the sparse scaling operators as variables of the objective function, optimizing the objective function using a second optimization algorithm, and obtaining the sparse scaling operators of the current iteration; and
performing the next iteration based on the weights and sparse scaling operators of the current iteration.
9. The structure search method for a deep neural network according to claim 8, characterized in that the second optimization algorithm is the accelerated proximal gradient descent algorithm, the proximal gradient descent algorithm, or the alternating direction method of multipliers.
10. The method according to claim 7, characterized in that the objective function is:

\min_{W,\lambda}\ \frac{1}{K} \sum_{i=1}^{K} L\big(y_i, \mathrm{Net}(x_i, W, \lambda)\big) + \delta\, R(W) + \gamma\, \|\lambda\|_1

where W is the weight, \lambda is the vector of sparse scaling operators, K is the number of sample data, L(y_i, Net(x_i, W, \lambda)) is the loss of the neural network on sample data x_i, y_i is the sample label, Net(x_i, W, \lambda) is the output of the neural network, R(W) is the weight regular function, \delta is the parameter decay weight of the weight W, and \gamma\|\lambda\|_1 is the sparse regular function.
11. A target detection method, characterized by comprising:
obtaining sample data for target detection, inputting it into the search result neural network obtained by the structure search method for a deep neural network according to any one of claims 1 to 10, and taking the output of the search result neural network as the target detection result.
12. A semantic segmentation method, characterized by comprising:
obtaining sample data for semantic segmentation, inputting it into the search result neural network obtained by the structure search method for a deep neural network according to any one of claims 1 to 10, and taking the output of the search result neural network as the semantic segmentation result.
13. A structure search device for a deep neural network, characterized by comprising:
a computing unit structure obtaining unit, configured to obtain, in a preset search space, the computing unit structure of every layer in each module connected sequentially in series in the deep neural network, where every layer's computing unit structure includes at least one computing unit;
an information flow obtaining unit, configured to connect the computing units in each module in a preset connection manner to obtain the information flows in each module, where the computing units within the same layer's computing unit structure are not connected to each other, and each computing unit can be connected to the computing units in different layers of its module and to the input and output of its module;
an initial neural network obtaining unit, configured to obtain an initial neural network according to the modules and the connections of the computing units in each module;
a sparse scaling operator setting unit, configured to attach a sparse scaling operator to each information flow in the initial neural network, where the sparse scaling operator is used to scale the information flow;
a weight and operator training unit, configured to train the weights of the initial neural network and the sparse scaling operators of the information flows using preset training sample data to obtain an intermediate neural network; and
a search result obtaining unit, configured to delete the information flows whose sparse scaling operators are zero from the intermediate neural network to obtain the search result neural network in the search space.
14. A computer-readable storage medium on which a computer program is stored, characterized in that the program, when executed by a processor, implements the method according to any one of claims 1 to 10.
15. A computer device, including a memory, a processor, and a computer program stored in the memory and runnable on the processor, characterized in that the processor, when executing the program, implements the method according to any one of claims 1 to 10.
CN201811259033.2A 2018-10-26 2018-10-26 Structure search method and device for a deep neural network Pending CN109284820A (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
CN201811259033.2A CN109284820A (en) Structure search method and device for a deep neural network
PCT/CN2019/077049 WO2020082663A1 (en) 2018-10-26 2019-03-05 Structural search method and apparatus for deep neural network
CN201911007284.6A CN110717586A (en) 2018-10-26 2019-10-22 Structure search method and device for deep neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811259033.2A CN109284820A (en) Structure search method and device for a deep neural network

Publications (1)

Publication Number Publication Date
CN109284820A true CN109284820A (en) 2019-01-29

Family

ID=65177420

Family Applications (2)

Application Number Title Priority Date Filing Date
CN201811259033.2A Pending CN109284820A (en) Structure search method and device for a deep neural network
CN201911007284.6A Pending CN110717586A (en) 2018-10-26 2019-10-22 Structure search method and device for deep neural network

Family Applications After (1)

Application Number Title Priority Date Filing Date
CN201911007284.6A Pending CN110717586A (en) 2018-10-26 2019-10-22 Structure search method and device for deep neural network

Country Status (2)

Country Link
CN (2) CN109284820A (en)
WO (1) WO2020082663A1 (en)

Cited By (27)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109919304A (en) * 2019-03-04 2019-06-21 腾讯科技(深圳)有限公司 Neural network searching method, device, readable storage medium storing program for executing and computer equipment
CN109948795A (en) * 2019-03-11 2019-06-28 驭势科技(北京)有限公司 A kind of method and apparatus of determining network structure precision and delay Optimization point
CN109978142A (en) * 2019-03-29 2019-07-05 腾讯科技(深圳)有限公司 The compression method and device of neural network model
CN110197258A (en) * 2019-05-29 2019-09-03 北京市商汤科技开发有限公司 Neural network searching method, image processing method and device, equipment and medium
CN110276442A (en) * 2019-05-24 2019-09-24 西安电子科技大学 A kind of searching method and device of neural network framework
CN110428046A (en) * 2019-08-28 2019-11-08 腾讯科技(深圳)有限公司 Acquisition methods and device, the storage medium of neural network structure
CN110473195A (en) * 2019-08-13 2019-11-19 中山大学 It is a kind of can automatic customization medicine lesion detection framework and method
CN110490323A (en) * 2019-08-20 2019-11-22 腾讯科技(深圳)有限公司 Network model compression method, device, storage medium and computer equipment
CN110751267A (en) * 2019-09-30 2020-02-04 京东城市(北京)数字科技有限公司 Neural network structure searching method, training method, device and storage medium
CN110826696A (en) * 2019-10-30 2020-02-21 北京百度网讯科技有限公司 Search space construction method and device of hyper network and electronic equipment
WO2020082663A1 (en) * 2018-10-26 2020-04-30 北京图森未来科技有限公司 Structural search method and apparatus for deep neural network
CN111090673A (en) * 2019-12-20 2020-05-01 北京百度网讯科技有限公司 Cache unit searching method and related equipment
CN111160515A (en) * 2019-12-09 2020-05-15 中山大学 Running time prediction method, model search method and system
CN111191785A (en) * 2019-12-20 2020-05-22 沈阳雅译网络技术有限公司 Structure searching method based on expanded search space
CN111401516A (en) * 2020-02-21 2020-07-10 华为技术有限公司 Neural network channel parameter searching method and related equipment
CN111667057A (en) * 2020-06-05 2020-09-15 北京百度网讯科技有限公司 Method and apparatus for searching model structure
CN111684472A (en) * 2019-05-31 2020-09-18 深圳市大疆创新科技有限公司 Method and apparatus for network structure search, computer storage medium, and computer program product
CN111684471A (en) * 2019-05-31 2020-09-18 深圳市大疆创新科技有限公司 Method and apparatus for network structure search, computer storage medium, and computer program product
CN111950710A (en) * 2020-08-12 2020-11-17 深圳市商汤科技有限公司 Neural network optimization method and device, electronic equipment and storage medium
CN112052865A (en) * 2019-06-05 2020-12-08 北京京东尚科信息技术有限公司 Method and apparatus for generating neural network model
WO2021008206A1 (en) * 2019-07-12 2021-01-21 华为技术有限公司 Neural architecture search method, and image processing method and device
CN112464930A (en) * 2019-09-09 2021-03-09 华为技术有限公司 Target detection network construction method, target detection method, device and storage medium
WO2021057690A1 (en) * 2019-09-24 2021-04-01 华为技术有限公司 Neural network building method and device, and image processing method and device
CN113379034A (en) * 2021-06-15 2021-09-10 南京大学 Neural network structure optimization method based on network structure search technology
WO2021253671A1 (en) * 2020-06-18 2021-12-23 中国科学院深圳先进技术研究院 Magnetic resonance cine imaging method and apparatus, and imaging device and storage medium
WO2021253938A1 (en) * 2020-06-19 2021-12-23 深圳市商汤科技有限公司 Neural network training method and apparatus, and video recognition method and apparatus
CN112308200B (en) * 2019-07-30 2024-04-26 华为技术有限公司 Searching method and device for neural network

Families Citing this family (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113361680B (en) * 2020-03-05 2024-04-12 Huawei Cloud Computing Technologies Co Ltd Neural network architecture search method, device, equipment and medium
CN111753953A (en) * 2020-05-13 2020-10-09 Beijing Megvii Technology Co Ltd Method and device for acquiring a neural network architecture
CN111797983A (en) * 2020-05-25 2020-10-20 Huawei Technologies Co Ltd Neural network construction method and device
CN113743168B (en) * 2020-05-29 2023-10-13 Beijing Institute of Mechanical Equipment Urban flying object recognition method based on micro deep neural network search
CN111738418A (en) * 2020-06-19 2020-10-02 Beijing Baidu Netcom Science and Technology Co Ltd Training method and device for a hypernetwork
CN113902088A (en) * 2020-06-22 2022-01-07 Huawei Technologies Co Ltd Method, device and system for neural network structure search
CN111753964A (en) * 2020-06-29 2020-10-09 Beijing Baidu Netcom Science and Technology Co Ltd Neural network training method and device
CN112100466A (en) * 2020-09-25 2020-12-18 Beijing Baidu Netcom Science and Technology Co Ltd Method, device, equipment and storage medium for generating a search space
CN112528123A (en) * 2020-12-18 2021-03-19 Beijing Baidu Netcom Science and Technology Co Ltd Model search method and apparatus, electronic device, storage medium, and program product
CN112560985B (en) * 2020-12-25 2024-01-12 Beijing Baidu Netcom Science and Technology Co Ltd Neural network search method and device, and electronic equipment
CN112668702B (en) * 2021-01-15 2023-09-19 Beijing Deep Glint Information Technology Co Ltd Fixed-point parameter optimization method, system, terminal and storage medium
CN112966812A (en) * 2021-02-25 2021-06-15 Space Engineering University of the PLA Strategic Support Force Automatic neural network structure search method for communication signal modulation recognition
CN113762026A (en) * 2021-03-04 2021-12-07 Beijing Wodong Tianjun Information Technology Co Ltd Combined processing method, equipment and product for human body keypoint detection and position analysis
CN113326922B (en) * 2021-05-31 2023-06-13 Beijing SenseTime Technology Development Co Ltd Neural network generation method and device, electronic equipment and storage medium
CN113469010B (en) * 2021-06-25 2024-04-02 University of Science and Technology of China Real-time NOx concentration estimation method based on diesel vehicle black smoke images, and storage medium

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106372721B (en) * 2016-08-29 2018-08-21 Communication University of China 3D visualization method for large-scale neural networks
CN111178520B (en) * 2017-06-15 2024-06-07 Beijing Tusen Zhitu Technology Co Ltd Method and device for constructing a neural network
CN107316079A (en) * 2017-08-08 2017-11-03 Zhuhai Xiyue Information Technology Co Ltd Processing method, device, storage medium and processor for terminal convolutional neural networks
CN107480774A (en) * 2017-08-11 2017-12-15 Shandong Normal University Dynamic neural network model training method and device based on ensemble learning
CN109284820A (en) * 2018-10-26 2019-01-29 Beijing Tusimple Future Technology Co Ltd Structure search method and device for a deep neural network

Cited By (45)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020082663A1 (en) * 2018-10-26 2020-04-30 Beijing Tusimple Future Technology Co Ltd Structural search method and apparatus for deep neural network
CN109919304B (en) * 2019-03-04 2021-07-02 Tencent Technology (Shenzhen) Co Ltd Image processing method and device, readable storage medium and computer equipment
CN109919304A (en) * 2019-03-04 2019-06-21 Tencent Technology (Shenzhen) Co Ltd Neural network search method and device, readable storage medium and computer equipment
CN109948795A (en) * 2019-03-11 2019-06-28 UISEE Technology (Beijing) Co Ltd Method and device for determining network structure accuracy and latency optimization points
CN109948795B (en) * 2019-03-11 2021-12-14 UISEE Technology (Beijing) Co Ltd Method and device for determining network structure accuracy and latency optimization points
CN109978142A (en) * 2019-03-29 2019-07-05 Tencent Technology (Shenzhen) Co Ltd Neural network model compression method and device
CN109978142B (en) * 2019-03-29 2022-11-29 Tencent Technology (Shenzhen) Co Ltd Neural network model compression method and device
CN110276442A (en) * 2019-05-24 2019-09-24 Xidian University Neural network architecture search method and device
CN110276442B (en) * 2019-05-24 2022-05-17 Xidian University Neural network architecture search method and device
CN110197258B (en) * 2019-05-29 2021-10-29 Beijing SenseTime Technology Development Co Ltd Neural network search method, image processing method and device, equipment and medium
CN110197258A (en) * 2019-05-29 2019-09-03 Beijing SenseTime Technology Development Co Ltd Neural network search method, image processing method and device, equipment and medium
CN111684472A (en) * 2019-05-31 2020-09-18 SZ DJI Technology Co Ltd Method and apparatus for network structure search, computer storage medium, and computer program product
WO2020237687A1 (en) * 2019-05-31 2020-12-03 SZ DJI Technology Co Ltd Network architecture search method and apparatus, computer storage medium and computer program product
WO2020237688A1 (en) * 2019-05-31 2020-12-03 SZ DJI Technology Co Ltd Method and device for network structure search, computer storage medium and computer program product
CN111684471A (en) * 2019-05-31 2020-09-18 SZ DJI Technology Co Ltd Method and apparatus for network structure search, computer storage medium, and computer program product
CN112052865A (en) * 2019-06-05 2020-12-08 Beijing Jingdong Shangke Information Technology Co Ltd Method and apparatus for generating a neural network model
WO2021008206A1 (en) * 2019-07-12 2021-01-21 Huawei Technologies Co Ltd Neural architecture search method, and image processing method and device
CN112215332B (en) * 2019-07-12 2024-05-14 Huawei Technologies Co Ltd Neural network structure search method, image processing method and device
US12026938B2 (en) 2019-07-12 2024-07-02 Huawei Technologies Co., Ltd. Neural architecture search method and image processing method and apparatus
CN112308200B (en) * 2019-07-30 2024-04-26 Huawei Technologies Co Ltd Neural network search method and device
CN110473195B (en) * 2019-08-13 2023-04-18 Sun Yat-sen University Automatically customizable medical lesion detection framework and method
CN110473195A (en) * 2019-08-13 2019-11-19 Sun Yat-sen University Automatically customizable medical lesion detection framework and method
CN110490323A (en) * 2019-08-20 2019-11-22 Tencent Technology (Shenzhen) Co Ltd Network model compression method, device, storage medium and computer equipment
CN110428046A (en) * 2019-08-28 2019-11-08 Tencent Technology (Shenzhen) Co Ltd Method and device for acquiring a neural network structure, and storage medium
CN110428046B (en) * 2019-08-28 2023-12-15 Tencent Technology (Shenzhen) Co Ltd Method and device for acquiring a neural network structure, and storage medium
CN112464930A (en) * 2019-09-09 2021-03-09 Huawei Technologies Co Ltd Target detection network construction method, target detection method, device and storage medium
WO2021057690A1 (en) * 2019-09-24 2021-04-01 Huawei Technologies Co Ltd Neural network building method and device, and image processing method and device
CN110751267B (en) * 2019-09-30 2021-03-30 JD City (Beijing) Digital Technology Co Ltd Neural network structure search method, training method, device and storage medium
CN110751267A (en) * 2019-09-30 2020-02-04 JD City (Beijing) Digital Technology Co Ltd Neural network structure search method, training method, device and storage medium
CN110826696B (en) * 2019-10-30 2023-06-27 Beijing Baidu Netcom Science and Technology Co Ltd Supernetwork search space construction method and device, and electronic equipment
CN110826696A (en) * 2019-10-30 2020-02-21 Beijing Baidu Netcom Science and Technology Co Ltd Supernetwork search space construction method and device, and electronic equipment
CN111160515A (en) * 2019-12-09 2020-05-15 Sun Yat-sen University Runtime prediction method, model search method and system
CN111160515B (en) * 2019-12-09 2023-03-21 Sun Yat-sen University Runtime prediction method, model search method and system
CN111191785A (en) * 2019-12-20 2020-05-22 Shenyang Yayi Network Technology Co Ltd Structure search method based on an expanded search space
CN111090673B (en) * 2019-12-20 2023-04-18 Beijing Baidu Netcom Science and Technology Co Ltd Cache unit search method and related equipment
CN111090673A (en) * 2019-12-20 2020-05-01 Beijing Baidu Netcom Science and Technology Co Ltd Cache unit search method and related equipment
CN111401516A (en) * 2020-02-21 2020-07-10 Huawei Technologies Co Ltd Neural network channel parameter search method and related equipment
CN111401516B (en) * 2020-02-21 2024-04-26 Huawei Cloud Computing Technologies Co Ltd Neural network channel parameter search method and related equipment
CN111667057A (en) * 2020-06-05 2020-09-15 Beijing Baidu Netcom Science and Technology Co Ltd Method and apparatus for searching model structures
CN111667057B (en) * 2020-06-05 2023-10-20 Beijing Baidu Netcom Science and Technology Co Ltd Method and apparatus for searching model structures
WO2021253671A1 (en) * 2020-06-18 2021-12-23 Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences Magnetic resonance cine imaging method and apparatus, imaging device and storage medium
WO2021253938A1 (en) * 2020-06-19 2021-12-23 Shenzhen SenseTime Technology Co Ltd Neural network training method and apparatus, and video recognition method and apparatus
CN111950710A (en) * 2020-08-12 2020-11-17 Shenzhen SenseTime Technology Co Ltd Neural network optimization method and device, electronic equipment and storage medium
CN113379034B (en) * 2021-06-15 2023-10-20 Nanjing University Neural network structure optimization method based on network structure search technology
CN113379034A (en) * 2021-06-15 2021-09-10 Nanjing University Neural network structure optimization method based on network structure search technology

Also Published As

Publication number Publication date
CN110717586A (en) 2020-01-21
WO2020082663A1 (en) 2020-04-30

Similar Documents

Publication Publication Date Title
CN109284820A (en) Structure search method and device for a deep neural network
CN108122032B (en) Neural network model training method, device, chip and system
CN109347697B (en) Opportunistic network link prediction method, apparatus and readable storage medium
Prusinkiewicz et al. Animation of plant development
Xiao et al. Fast deep learning training through intelligently freezing layers
CN106650725A (en) Candidate text box generation and text detection method based on a fully convolutional neural network
CN110717627B (en) Full traffic prediction method based on dual graph framework
CN109598332A (en) Neural network generation method and device, electronic equipment and storage medium
CN107247991A (en) Method and device for building a neural network
CN109962688A (en) Fast prediction and inverse geometry design method for the transfer characteristics of all-dielectric metamaterial filters based on a deep learning neural network
CN114418085B (en) Personalized collaborative learning method and device based on neural network model pruning
CN106022531A (en) Search method for the shortest path passing through required vertices
CN111737535A (en) Network representation learning method based on meta-structure and graph neural network
CN107169031A (en) Picture material recommendation method based on deep representation
CN109409261A (en) Crop classification method and system
CN108923983A (en) Opportunistic network link prediction method, device and readable storage medium
Addanki et al. Placeto: Efficient progressive device placement optimization
CN114172820A (en) Cross-domain SFC dynamic deployment method, device, computer equipment and storage medium
CN111737826B (en) Rail transit automatic simulation modeling method and device based on reinforcement learning
Xing et al. Solve traveling salesman problem by Monte Carlo tree search and deep neural network
CN107945534A (en) Special bus line passenger flow prediction method based on GMDH neural networks
CN113065443A (en) Training method, recognition method, system, device and medium for an image recognition model
Gupta et al. Differential evolution-driven traffic light scheduling for vehicle-pedestrian mixed-flow networks
Wei et al. Multi-agent deep reinforcement learning for traffic signal control with Nash Equilibrium
Zhang et al. Solving multi-class traffic assignment problem with genetic algorithm

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication (Application publication date: 20190129)