WO2020082663A1 - Structural search method and apparatus for deep neural network - Google Patents

Structural search method and apparatus for deep neural network

Info

Publication number
WO2020082663A1
WO2020082663A1 (PCT/CN2019/077049)
Authority
WO
WIPO (PCT)
Prior art keywords
neural network
module
information flow
computing unit
scaling operator
Prior art date
Application number
PCT/CN2019/077049
Other languages
French (fr)
Chinese (zh)
Inventor
黄泽昊
张新邦
王乃岩
Original Assignee
北京图森未来科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京图森未来科技有限公司
Publication of WO2020082663A1 publication Critical patent/WO2020082663A1/en

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G06N3/082 - Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections

Definitions

  • The present application relates to the field of artificial intelligence technology, and in particular to a deep neural network structure search method and device.
  • Deep neural networks have achieved great success in many fields, such as computer vision and natural language processing. Through their powerful representation capabilities, deep neural networks replace traditional hand-designed features with end-to-end learning.
  • However, current deep neural networks have complex structures, with many computing unit nodes such as convolution and pooling, so searching among these many computing unit nodes for a compact, fast, and effective model structure has become difficult.
  • In the prior art, a search space is generally defined first, and then an optimal network structure is searched for in the search space.
  • Typically, heuristic controller-based network structure search methods or evolutionary algorithms are used for the network structure search.
  • However, a controller then needs to be trained, or an evolutionary algorithm needs to be used, to search the network structure.
  • During the search, the sub-networks in the full set need to be trained to convergence in order to evaluate them, so the time and computation required for the network structure search are enormous; for larger data sets, the process of finding the optimal network structure with such methods is cumbersome and slow.
  • The embodiments of the present application provide a deep neural network structure search method and device to solve the prior art problems that the time and computation required for network structure search are enormous and that, for larger data sets, the process of finding the optimal network structure is cumbersome and slow.
  • This application provides a deep neural network structure search method, including:
  • obtaining, in a preset search space, each layer of computing unit structure in each module connected in series in the deep neural network, where each layer of computing unit structure includes at least one computing unit;
  • connecting the computing units in each module in a preset connection manner to obtain the information flows in each module, where computing units in the same layer of computing unit structure are not connected to each other, and each computing unit can be connected to computing units at different layers of the module where it is located and to the input and output of that module;
  • obtaining an initial neural network according to the modules and the connections of the computing units in each module;
  • setting a sparse scaling operator on the information flows in the initial neural network, where the sparse scaling operator is used to scale the information flows;
  • training the weights of the initial neural network and the sparse scaling operators of the information flows using preset training sample data to obtain an intermediate neural network; and
  • deleting the information flows whose sparse scaling operators are zero from the intermediate neural network to obtain a search result neural network in the search space.
  • this application provides a target detection method, including:
  • the sample data to be subjected to target detection is obtained and input into the search result neural network obtained by using the above-mentioned deep neural network structure search method, and the output of the search result neural network is used as the target detection result.
  • this application provides a semantic segmentation method, including:
  • the sample data to be semantically segmented is obtained and input into the search result neural network obtained by using the structure search method of the deep neural network described above, and the output of the search result neural network is used as the semantic segmentation result.
  • the present application provides a deep neural network structure search device, including:
  • a computing unit structure obtaining unit configured to obtain each layer of computing unit structures in each module connected in series in the deep neural network in a preset search space; each layer of computing unit structure includes at least one computing unit;
  • the information flow obtaining unit is used to connect the computing units in each module in a preset connection manner to obtain the information flows in each module; wherein computing units in the same layer of computing unit structure are not connected to each other, and each computing unit can be connected to computing units at different layers of the module where it is located, and to the input and output of the module where it is located;
  • the initial neural network obtaining unit is used to obtain the initial neural network according to the connection of the modules and the computing units in each module;
  • a sparse scaling operator setting unit configured to set a sparse scaling operator on the information flow in the initial neural network, wherein the sparse scaling operator is used to scale the information flow;
  • a weight and operator training unit used to train the weight of the initial neural network and the sparse scaling operator of the information flow using preset training sample data to obtain an intermediate neural network
  • a search result obtaining unit is used to delete the information flows whose sparse scaling operators are zero from the intermediate neural network to obtain a search result neural network in the search space.
  • the present application provides a computer-readable storage medium on which a computer program is stored, which is characterized in that, when the program is executed by a processor, the above deep neural network structure search method is implemented.
  • The present application provides a computer device, including a memory, a processor, and a computer program stored in the memory and executable on the processor.
  • When the processor executes the program, the deep neural network structure search method described above is implemented.
  • An embodiment of the present application provides a method and device for searching the structure of a deep neural network.
  • First, each layer of computing unit structure in each module connected in series in the deep neural network is obtained in a preset search space, where each layer of computing unit structure includes at least one computing unit. Then, the computing units in each module are connected in a preset connection manner to obtain the information flows in each module, where computing units in the same layer of computing unit structure are not connected to each other, and each computing unit can be connected to computing units at different layers of the module where it is located and to the input and output of that module. Next, an initial neural network is obtained according to the modules and the connections of the computing units in each module; a sparse scaling operator is set on each information flow in the initial neural network, where the sparse scaling operator is used to scale the information flow; the weights of the initial neural network and the sparse scaling operators of the information flows are trained using preset training sample data to obtain an intermediate neural network; and, further, the information flows whose sparse scaling operators are zero are deleted from the intermediate neural network to obtain the search result neural network in the search space.
  • Unlike the prior art, which searches for important network structures directly in the search space, this application uses sparse scaling operators to delete unimportant information flows and thereby implements the network structure search.
  • During the network structure search, this application does not need to train a controller, does not need to use complex evolutionary algorithms, and does not need to train sub-networks for a long time; the search result can be obtained merely by training the weights and the sparse scaling operators.
  • This greatly reduces the network structure search time and, especially for network structure searches on large-scale data sets, saves substantial search time.
  • FIG. 1 is a first flowchart of a deep neural network structure search method provided by an embodiment of the present application;
  • FIG. 2 is a schematic diagram of the network structure in the search space of the deep neural network involved in an embodiment of this application;
  • FIG. 3 is a schematic diagram of an example in which an embodiment of the present application is applied to the search of a two-layer network;
  • FIG. 4 is a schematic structural diagram of a deep neural network structure search device provided by an embodiment of the present application.
  • DNN: Deep Neural Network.
  • Computing unit: a unit node in a neural network that performs calculations such as convolution and pooling.
  • Network structure search: the process of searching for the optimal network structure in a neural network.
  • an embodiment of the present application provides a structure search method for a deep neural network, including:
  • Step 101: Obtain, in a preset search space, each layer of computing unit structure in each module connected in series in the deep neural network.
  • Each layer of computing unit structure includes at least one computing unit.
  • Step 102: Connect the computing units in each module in a preset connection manner to obtain the information flows in each module.
  • Here, computing units in the same layer of computing unit structure are not connected to each other, and each computing unit can be connected to computing units at different layers of the module where it is located and to the input and output of that module.
  • Step 103: Obtain an initial neural network according to the modules and the connections of the computing units in each module.
  • Step 104: Set a sparse scaling operator on the information flows in the initial neural network, where the sparse scaling operator is used to scale the information flows.
  • Step 105: Use the preset training sample data to train the weights of the initial neural network and the sparse scaling operators of the information flows to obtain an intermediate neural network.
  • Step 106: Delete the information flows whose sparse scaling operators are zero from the intermediate neural network to obtain a search result neural network in the search space.
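  • As a concrete illustration of steps 101 and 102, the following is a minimal, self-contained Python sketch of how the fully connected candidate structure inside each module could be enumerated. The module, layer, and unit counts and the tuple-based representation of nodes and flows are illustrative assumptions, not part of the embodiment.

```python
from itertools import product

# Illustrative sizes (assumptions, not from the embodiment): 2 modules, M=2 layers, N=2 units per layer.
NUM_MODULES, M, N = 2, 2, 2

def build_module_flows(num_layers, units_per_layer):
    """Enumerate the candidate information flows inside one module.

    Nodes: ("in",) is the module input, ("out",) the module output, and
    (layer, unit) a computing unit.  Units in the same layer are not
    connected; every unit may receive the module input and the outputs of
    units in earlier layers, and may feed the module output.
    """
    flows = []
    for layer, unit in product(range(1, num_layers + 1), range(1, units_per_layer + 1)):
        flows.append((("in",), (layer, unit)))                    # module input -> unit
        for prev in product(range(1, layer), range(1, units_per_layer + 1)):
            flows.append((prev, (layer, unit)))                   # earlier-layer unit -> unit
        flows.append(((layer, unit), ("out",)))                   # unit -> module output
    return flows

if __name__ == "__main__":
    for b in range(NUM_MODULES):                                  # modules are chained in series
        print(f"module {b}: {len(build_module_flows(M, N))} candidate information flows")
```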
  • In the deep neural network, the preset search space can be as shown in Figure 2 and can include multiple modules 21 connected in series, that is, the output of the previous module is the input of the next module. Each module 21 (which can be regarded as a directed acyclic graph) can include a multi-layer computing unit structure 22, and each layer of computing unit structure 22 includes at least one computing unit 23 (each computing unit can be regarded as a node in the directed acyclic graph). The computing units 23 in each layer of computing unit structure 22 can generally include at least one of a convolution calculation unit and a pooling calculation unit.
  • The convolution calculation unit can also be a dilated convolution calculation unit, a group convolution calculation unit, or the like.
  • Preferably, the above step 102 can be implemented as follows:
  • In each module 21, the computing units 23 are connected in a fully connected manner; that is, as shown in FIG. 2, each computing unit 23 is connected to the computing units 23 at different layers of the module 21 where it is located, and to the input and output of that module 21. In this way, the information flows from the input of the module 21 to each layer of computing unit structure 22, from each layer of computing unit structure 22 to the output of the module 21, and between the computing units 23 are obtained (each flow can be regarded as an edge between nodes in a directed acyclic graph), and the full set of network structures in the search space is obtained (any network structure in the search space can be regarded as a subgraph of the directed acyclic graph).
  • For example, in a module 21, the output h(i) of the i-th computing unit F(i)(x) equals the result of applying F(i)(x) to the sum of the outputs h(j) of all preceding computing units, which can be expressed as:
    h(i) = F(i)( Σ_{j=1}^{i-1} h(j) )        (1)
  • Further, after step 103, the weights of the initial neural network may be configured so as to initialize them.
  • Alternatively, preset pre-training sample data may be used to pre-train the weights of the initial neural network to obtain a pre-trained initial neural network, so that better initial weights are obtained after pre-training.
  • In either case, the weights are configured or pre-trained in order to obtain initial weight values for the initial neural network, which facilitates the subsequent setting and training of the sparse scaling operators.
  • In step 104, a sparse scaling operator needs to be set for each information flow in the initial neural network; for example, a sparse scaling operator λ^(i,j), representing the information flow from the j-th computing unit to the i-th computing unit, is added at the output h(j) of each preceding computing unit.
  • After the sparse scaling operators are added, the above formula (1) is expressed as:
    h(i) = F(i)( Σ_{j=1}^{i-1} λ^(i,j) h(j) )        (2)
  • The value of each sparse scaling operator is greater than or equal to 0.
  • For example, after the weights of the initial neural network are simply configured (initialized), the value interval of the sparse scaling operator may be [0, 1], and the sparse scaling operator is not necessarily equal to 1.
  • After the weights of the initial neural network are pre-trained with preset pre-training sample data, the value of the sparse scaling operator is generally taken to be 1.
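  • As an illustration of formula (2), the following minimal numpy sketch computes one computing unit's output from the λ-scaled sum of the preceding outputs. The concrete operation F (a fixed linear map followed by a ReLU) and the vector sizes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def unit_output(F, prev_outputs, lambdas):
    """Formula (2): h(i) = F(i)( sum_j lambda^(i,j) * h(j) ), with every lambda >= 0."""
    assert all(l >= 0 for l in lambdas)
    aggregated = sum(l * h for l, h in zip(lambdas, prev_outputs))
    return F(aggregated)

# Illustrative computing unit F: a fixed linear map followed by a ReLU (assumption).
W = rng.normal(size=(8, 8))
F = lambda x: np.maximum(W @ x, 0.0)

prev = [rng.normal(size=8) for _ in range(3)]   # outputs h(j) of three preceding units
lams = np.ones(3)                               # operators initialised to 1 after pre-training
print(unit_output(F, prev, lams).shape)         # (8,)
```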
  • The following describes the search of a convolutional neural network structure as an example. In a convolutional neural network structure, the computing units are the convolution calculation units and pooling calculation units, and the information flows are the feature maps in the network.
  • The convolutional neural network structure contains several modules; each module contains several layers of computing unit structure, and each layer of computing unit structure includes several different computing units (for example, 1×1 convolution, 3×3 convolution, 5×5 convolution, pooling, etc., and is not limited to these).
  • The modules are connected in series, that is, the output of the previous module is the input of the next module, and each computing unit is connected to the computing units at different layers of the module where it is located, and to the input and output of that module.
  • In this way, the output of each computing unit can be expressed.
  • For example, the output of the j-th computing unit in the i-th layer of the b-th module can be expressed as:
    h(b,i,j) = F(b,i,j)( Σ_{m=1}^{i-1} Σ_{n=1}^{N} λ^(b,i,j)_(m,n) h(b,m,n) + λ^(b,i,j)_(0,0) O(b-1) )        (3)
  • where F(b,i,j)(x) denotes the computation of the j-th computing unit in the i-th layer of the b-th module; N denotes the total number of computing units contained in one layer of computing unit structure; λ^(b,i,j)_(m,n) denotes the sparse scaling operator of the information flow from the n-th computing unit in the m-th layer of the b-th module to the j-th computing unit in the i-th layer of the b-th module; h(b,m,n) denotes the output of the n-th computing unit in the m-th layer of the b-th module; O(b-1) denotes the output of the (b-1)-th module, that is, the input of the b-th module; and λ^(b,i,j)_(0,0) denotes the sparse scaling operator of the information flow from the module input O(b-1) to the j-th computing unit in the i-th layer of the b-th module. Here, h(b,0,0) = O(b-1) is taken as the input of the b-th module and h(b,M+1,0) = O(b) as its output, where M denotes the total number of layers contained in the b-th module; a computing unit located in the m-th layer therefore has (m-1)N+1 inputs.
  • It should be noted that, in the embodiments of the present application, the connection between each computing unit and the output of the module where it is located can also be trained and learned.
  • For example, in the above convolutional neural network, the output O(b) of the b-th module can be obtained by concatenating the outputs of all computing units in the module and then using a convolution with kernel size 1 to reduce the number of channels of the feature map so as to keep the number of channels unchanged, as shown in the following formula:
    O(b) = R( λ_out^(b,1,1) h(b,1,1), ..., λ_out^(b,M,N) h(b,M,N), O(b-1) )        (4)
  • where h(b,m,n) denotes the output of the n-th computing unit in the m-th layer of the b-th module; λ_out^(b,m,n) denotes the scaling operator of the information flow connecting the n-th computing unit in the m-th layer of the b-th module to the output of that module; O(b-1) denotes the output of the (b-1)-th module, that is, the input of the b-th module; and R(x) denotes the concatenation of the feature maps followed by a convolution with kernel size 1, which is used to fuse the feature maps and keep the number of channels output by the module unchanged.
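  • The module output described above can be illustrated with the following minimal numpy sketch: the λ-scaled outputs of the computing units are concatenated along the channel axis and fused by a convolution with kernel size 1 that restores the channel count. The random 1×1 kernel standing in for the learned fusion R(x), and the tensor sizes, are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def module_output(unit_outputs, out_lambdas, kernel_1x1):
    """Concatenate lambda-scaled unit outputs (N,C,H,W each) and apply a 1x1 convolution."""
    scaled = [l * h for l, h in zip(out_lambdas, unit_outputs)]
    stacked = np.concatenate(scaled, axis=1)              # concatenate along the channel axis
    n, c_in, hgt, wid = stacked.shape
    flat = stacked.transpose(0, 2, 3, 1).reshape(-1, c_in)
    fused = flat @ kernel_1x1                              # 1x1 conv == per-pixel linear map over channels
    return fused.reshape(n, hgt, wid, -1).transpose(0, 3, 1, 2)

units = [rng.normal(size=(1, 16, 8, 8)) for _ in range(4)]   # 4 unit outputs, 16 channels each
lams = np.array([1.0, 0.0, 0.7, 1.0])                         # a zero operator suppresses that unit's flow
kernel = rng.normal(size=(16 * 4, 16))                        # 1x1 conv kernel: 64 -> 16 channels
print(module_output(units, lams, kernel).shape)               # (1, 16, 8, 8)
```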
  • For the above step 105, it can be implemented as follows:
  • Step S1. Construct an objective function corresponding to the initial neural network.
  • The objective function includes a loss function, a weight regularization function, and a sparse regularization function, and can be written as:
    min_{W,λ} (1/K) Σ_{i=1}^{K} L(y_i, Net(x_i, W, λ)) + R(W) + γ||λ||_1        (5)
  • where W is the weight; λ is the sparse scaling operator vector; K is the number of sample data; L(y_i, Net(x_i, W, λ)) is the loss of the neural network on the sample data x_i, and y_i is the sample label; Net(x_i, W, λ) is the output of the neural network; R(W) is the weight regularization function, and δ is the parameter decay weight of the weights W; γ||λ||_1 is the sparse regularization function, denoted Rs(λ).
  • The sparse regularization function γ||λ||_1 can also be replaced by more complex sparse constraints, such as non-convex sparse constraints.
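  • A minimal numpy sketch of evaluating an objective of this form on a toy network is given below: the average loss over K samples plus a weight regularization term and the L1 sparse regularization γ||λ||_1. The tiny two-branch network Net and the squared-error loss L are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def objective(W, lam, xs, ys, delta=1e-4, gamma=1e-2):
    """(1/K) * sum_i L(y_i, Net(x_i, W, lam)) + delta*||W||^2 + gamma*||lam||_1."""
    def net(x):                        # toy Net: two lambda-scaled branches, then a linear map
        branches = lam[0] * np.tanh(x) + lam[1] * np.maximum(x, 0.0)
        return W @ branches
    losses = [np.mean((net(x) - y) ** 2) for x, y in zip(xs, ys)]   # squared-error loss L
    return np.mean(losses) + delta * np.sum(W ** 2) + gamma * np.sum(np.abs(lam))

W = rng.normal(size=(4, 4))
lam = np.array([1.0, 1.0])             # one sparse scaling operator per information flow
xs = [rng.normal(size=4) for _ in range(8)]
ys = [rng.normal(size=4) for _ in range(8)]
print(objective(W, lam, xs, ys))
```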
  • Step S2. Perform iterative training on the initial neural network using the training sample data.
  • Step S3. When the number of training iterations reaches a threshold, or the objective function meets a preset convergence condition, the intermediate neural network is obtained.
  • The foregoing step S2 may be implemented by performing the following iterative training on the initial neural network multiple times. Taking one iteration that is neither the first nor the last (hereinafter referred to as the current iteration) as an example, one iteration of training includes the following steps C1 to C3:
  • Step C1. Use the sparse scaling operators obtained in the previous iteration as constants of the objective function and the weights as variables of the objective function, and optimize the objective function with a first optimization algorithm to obtain the weights of the current iteration;
  • Step C2. Use the weights of the current iteration as constants of the objective function and the sparse scaling operators as variables of the objective function, and optimize the objective function with a second optimization algorithm to obtain the sparse scaling operators of the current iteration;
  • Step C3. Perform the next iteration of training based on the weights and sparse scaling operators of the current iteration.
  • The first iteration proceeds as follows: the initial sparse scaling operators are used as constants of the objective function and the weights as variables, and the objective function is optimized with the first optimization algorithm to obtain the weights of this iteration; then the weights of this iteration are used as constants and the sparse scaling operators as variables, and the objective function is optimized with the second optimization algorithm to obtain the sparse scaling operators of this iteration; the second iteration is then performed based on these weights and sparse scaling operators.
  • The last iteration proceeds as follows: the sparse scaling operators obtained in the previous iteration are used as constants of the objective function and the weights as variables, and the objective function is optimized with the first optimization algorithm to obtain the weights of this iteration; then the weights of this iteration are used as constants and the sparse scaling operators as variables, and the objective function is optimized with the second optimization algorithm to obtain the sparse scaling operators of this iteration; the neural network containing the sparse scaling operators and weights obtained in this last iteration is taken as the intermediate neural network.
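  • The following minimal numpy sketch illustrates the alternating scheme of steps C1 to C3 on a toy objective of the same form: the weight step uses plain gradient descent with numerically estimated gradients, and the sparse scaling operator step uses one proximal gradient update with soft-thresholding. The step sizes, the finite-difference gradients, and the tiny network are illustrative assumptions, not the embodiment's exact algorithms.

```python
import numpy as np

rng = np.random.default_rng(0)

def smooth_loss(W, lam, xs, ys, delta=1e-4):
    net = lambda x: W @ (lam[0] * np.tanh(x) + lam[1] * np.maximum(x, 0.0))
    data = np.mean([np.mean((net(x) - y) ** 2) for x, y in zip(xs, ys)])
    return data + delta * np.sum(W ** 2)            # loss plus weight regularizer (no L1 term)

def num_grad(f, v, eps=1e-5):                       # finite-difference gradient (illustrative only)
    g = np.zeros_like(v)
    for idx in np.ndindex(v.shape):
        d = np.zeros_like(v); d[idx] = eps
        g[idx] = (f(v + d) - f(v - d)) / (2 * eps)
    return g

def soft_threshold(z, thr):                         # S_eta(z)_i = sign(z_i) * max(|z_i| - eta, 0)
    return np.sign(z) * np.maximum(np.abs(z) - thr, 0.0)

W = rng.normal(size=(4, 4))
lam = np.ones(2)
xs = [rng.normal(size=4) for _ in range(8)]
ys = [rng.normal(size=4) for _ in range(8)]
gamma, lr_w, lr_lam = 1e-2, 1e-2, 1e-1

for it in range(50):
    # C1: fix lambda, update the weights W on the smooth part of the objective
    W -= lr_w * num_grad(lambda w: smooth_loss(w, lam, xs, ys), W)
    # C2: fix W, take one proximal gradient step on lambda for the L1-regularised part
    g = num_grad(lambda l: smooth_loss(W, l, xs, ys), lam)
    lam = np.maximum(soft_threshold(lam - lr_lam * g, lr_lam * gamma), 0.0)  # keep lambda >= 0
    # C3: the next iteration reuses the updated W and lambda

print("lambda after training:", lam)   # entries at exactly 0 would mark prunable flows
```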
  • The first optimization algorithm may be, but is not limited to, any one of the following: a stochastic gradient descent algorithm, or a variant of stochastic gradient descent that introduces momentum.
  • The second optimization algorithm may be, but is not limited to, any one of the following: an accelerated proximal gradient descent algorithm, a proximal gradient descent algorithm, or an alternating direction method of multipliers algorithm.
  • When the sparse scaling operators λ are fixed, the objective function reduces to min_W (1/K) Σ_{i=1}^{K} L(y_i, Net(x_i, W, λ)) + R(W), and the value of W can be obtained using the stochastic gradient descent algorithm; the specific process is not described in detail here.
  • When the weights W are fixed, the objective function reduces to min_λ G(λ) + γ||λ||_1, where G(λ) = (1/K) Σ_{i=1}^{K} L(y_i, Net(x_i, W, λ)), and the accelerated proximal gradient descent algorithm is used to solve for λ, which can be obtained by, but is not limited to, the following modes.
  • Mode 1 calculates λ according to formulas (6) to (8), where η_t denotes the gradient descent step size in the t-th training iteration and S_η(·) is the soft-thresholding operator defined componentwise as S_η(z)_i = sign(z_i)(|z_i| - η)_+, with (·)_+ denoting the positive part.
  • Mode 2 updates the formulas of the foregoing Mode 1 to obtain formulas (9) to (11), and calculates λ according to formulas (9) to (11).
  • Mode 3 uses a variable substitution method, that is, formulas (12) to (14) are used to calculate λ, based on the substitution λ'_{t-1} = λ_{t-1} + μ_{t-1} v_{t-1}.
  • In this way, W and λ are updated in the form of batch stochastic gradient descent.
  • In the above step 106, the information flows whose sparse scaling operators are zero can be deleted from the intermediate neural network to obtain the search result neural network in the search space.
  • If all information flows connected to a computing unit are deleted, that computing unit has no effect on subsequent calculations, and the computing unit may also be deleted.
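  • A minimal Python sketch of this pruning step is given below: information flows whose trained sparse scaling operator is zero are removed first, and any computing unit left with no connected flow is removed as well. The graph representation is an illustrative assumption.

```python
def prune(flows, lambdas, units):
    """Drop flows with a zero sparse scaling operator, then drop dangling computing units.

    flows   : list of (src, dst) pairs; computing units appear as (layer, unit) tuples
    lambdas : dict mapping each flow to its trained sparse scaling operator
    units   : set of (layer, unit) computing units
    """
    kept_flows = [f for f in flows if lambdas[f] > 0.0]
    while True:
        connected = {node for f in kept_flows for node in f}
        dangling = {u for u in units if u not in connected}
        if not dangling:
            break
        units = units - dangling
        kept_flows = [f for f in kept_flows if all(node not in dangling for node in f)]
    return kept_flows, units

# Two levels with one unit each (cf. the FIG. 3 style of example): both flows touching unit (1, 1)
# have lambda 0, so those flows and then unit (1, 1) itself disappear from the result.
units = {(1, 1), (2, 1)}
flows = [(("in",), (1, 1)), (("in",), (2, 1)), ((1, 1), (2, 1)), ((2, 1), ("out",))]
lams = {flows[0]: 0.0, flows[1]: 0.8, flows[2]: 0.0, flows[3]: 1.0}
print(prune(flows, lams, units))
```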
  • For example, the embodiment of the present application is applied to an image classification task. It is assumed that the basic network contains two layers, Level 1 and Level 2, and that each layer contains two different computing units, OP 1 and OP 2.
  • The connections between the computing units are shown on the leftmost side of FIG. 3. After the above steps 101 to 105, training yields sparse scaling operators equal to 0 for the information flows shown as dashed lines in the middle of FIG. 3. Then, as shown on the far right side of FIG. 3, after these dashed-line flows are deleted, the corresponding computing unit OP at Level 1 is found to have no remaining connected information flow and is also deleted, finally yielding the search result neural network.
  • the example listed in FIG. 3 is only a specific application of the embodiments of the present application, and not all applications.
  • the sparse scaling operators located in different modules of the network in this application can also be independently updated, so that different modules can search and train to obtain a more flexible network structure.
  • embodiments of the present application also provide a target detection method, including:
  • the sample data to be subjected to target detection is obtained and input into the search result neural network obtained by the structure search method of the deep neural network corresponding to FIG. 1 above, and the output of the search result neural network is used as the target detection result.
  • embodiments of the present application also provide a semantic segmentation method, including:
  • the sample data to be semantically segmented is obtained and input into the search result neural network obtained by the structure search method of the deep neural network corresponding to FIG. 1 above, and the output of the search result neural network is used as the semantic segmentation result.
  • the structure search method of the deep neural network corresponding to FIG. 1 is not limited to the application in target detection and semantic segmentation tasks, but can also be used in other different tasks, which will not be listed here one by one.
  • an embodiment of the present application also provides a deep neural network structure search device, which is characterized by including:
  • the calculation unit structure obtaining unit 31 is configured to obtain each layer of the calculation unit structure in each module connected in series in the deep neural network in a preset search space; each layer of the calculation unit structure includes at least one calculation unit.
  • the information flow obtaining unit 32 is used to connect the computing units in each module in a preset connection manner to obtain the information flows in each module; wherein computing units in the same layer of computing unit structure are not connected to each other, and each computing unit can be connected to computing units at different layers of the module in which it resides, and to the input and output of the module in which it resides.
  • the initial neural network obtaining unit 33 is used to obtain the initial neural network according to the connection of the modules and the calculation units in each module.
  • the sparse scaling operator setting unit 34 is configured to set a sparse scaling operator on the information flow in the initial neural network, wherein the sparse scaling operator is used to scale the information flow.
  • the weight and operator training unit 35 is configured to train the weights of the initial neural network and the sparse scaling operator of the information flow using preset training sample data to obtain an intermediate neural network.
  • the search result obtaining unit 36 is configured to delete the information stream whose sparse scaling operator is zero in the intermediate neural network to obtain a search result neural network in the search space.
  • embodiments of the present application also provide a computer-readable storage medium on which a computer program is stored, which is characterized in that, when the program is executed by a processor, the structure search method of the deep neural network corresponding to FIG. 1 described above is implemented.
  • In addition, the embodiments of the present application also provide a computer device, including a memory, a processor, and a computer program stored in the memory and executable on the processor; when the processor executes the program, the deep neural network structure search method corresponding to FIG. 1 described above is implemented.
  • In the deep neural network structure search method and device provided in the embodiments of the present application, first, each layer of computing unit structure in each module connected in series in the deep neural network is obtained in a preset search space, where each layer of computing unit structure includes at least one computing unit; then, the computing units in each module are connected in a preset connection manner to obtain the information flows in each module, where computing units in the same layer of computing unit structure are not connected to each other, and each computing unit can be connected to computing units at different layers of the module where it is located and to the input and output of that module; next, an initial neural network is obtained according to the modules and the connections of the computing units in each module; a sparse scaling operator is set on the information flows in the initial neural network, where the sparse scaling operator is used to scale the information flows; the weights of the initial neural network and the sparse scaling operators of the information flows are trained using the preset training sample data to obtain an intermediate neural network; and, finally, the information flows whose sparse scaling operators are zero are deleted from the intermediate neural network to obtain the search result neural network in the search space.
  • Unlike the prior art, which searches for important network structures directly in the search space, this application uses sparse scaling operators to delete unimportant information flows and thereby implements the network structure search.
  • During the network structure search, this application does not need to train a controller, does not need to use complex evolutionary algorithms, and does not need to train sub-networks for a long time; the search result can be obtained merely by training the weights and the sparse scaling operators.
  • This greatly reduces the network structure search time and, especially for network structure searches on large-scale data sets, saves substantial search time.
  • the functional units in the embodiments of the present application may be integrated into one processing module, or each unit may exist alone physically, or two or more units may be integrated into one module.
  • the above integrated modules can be implemented in the form of hardware or software function modules. If the integrated module is implemented in the form of a software functional module and sold or used as an independent product, it may also be stored in a computer-readable storage medium.
  • the embodiments of the present application may be provided as methods, systems, or computer program products. Therefore, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Moreover, the present application may take the form of a computer program product implemented on one or more computer usable storage media (including but not limited to disk storage and optical storage, etc.) containing computer usable program code.
  • These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing device to work in a specific manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction device, and the instruction device implements the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
  • These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operating steps are performed on the computer or other programmable device to produce computer-implemented processing, and the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

A structural search method and apparatus for a deep neural network, relating to the field of artificial intelligence technology. The method comprises: obtaining, in a preset search space, the structure of each layer of computing units in each module connected in series in a deep neural network (101); connecting the computing units in each module in a preset connection manner to obtain information flows in each module (102); obtaining an initial neural network according to the modules and a connection situation of the computing units in each module (103); configuring a sparse scaling operator for the information flows in the initial neural network, the sparse scaling operator being used for scaling the information flows (104); using preset training sample data to train the weight of the initial neural network and the sparse scaling operator for the information flows, to obtain an intermediate neural network (105); and deleting, from the intermediate neural network, the information flows with the sparse scaling operator being zero, to obtain a search result neural network within the search space (106). The solution saves the time of a network structure search.

Description

Structural search method and apparatus for deep neural network
This application claims priority to the Chinese patent application filed with the Chinese Patent Office on October 26, 2018, with application number 201811259033.2 and entitled "Structural search method and apparatus for deep neural network", the entire contents of which are incorporated herein by reference.
Technical field
The present application relates to the field of artificial intelligence technology, and in particular to a deep neural network structure search method and apparatus.
Background
In recent years, deep neural networks have achieved great success in many fields, such as computer vision and natural language processing. Through their powerful representation capabilities, deep neural networks replace traditional hand-designed features with end-to-end learning. However, current deep neural networks have complex structures, with many computing unit nodes such as convolution and pooling, so searching among these many computing unit nodes for a compact, fast, and effective model structure has become difficult.
In the prior art, a search space is generally defined first, and then an optimal network structure is searched for in the search space. Typically, heuristic controller-based network structure search methods or evolutionary algorithms are used for the search. However, the prior art requires a controller to be trained, or an evolutionary algorithm to be used, to search the network structure; during the search, the sub-networks in the full set need to be trained to convergence to evaluate them, so the time and computation required for the network structure search are enormous. For larger data sets, the process of finding the optimal network structure with such methods is cumbersome and slow.
Summary of the invention
The embodiments of the present application provide a deep neural network structure search method and apparatus, to solve the prior art problems that the time and computation required for network structure search are enormous and that, for larger data sets, the process of finding the optimal network structure is cumbersome and slow.
To achieve the above purpose, this application adopts the following technical solutions:
In one aspect, this application provides a deep neural network structure search method, including:
obtaining, in a preset search space, each layer of computing unit structure in each module connected in series in the deep neural network, where each layer of computing unit structure includes at least one computing unit;
connecting the computing units in each module in a preset connection manner to obtain the information flows in each module, where computing units in the same layer of computing unit structure are not connected to each other, and each computing unit can be connected to computing units at different layers of the module where it is located and to the input and output of that module;
obtaining an initial neural network according to the modules and the connections of the computing units in each module;
setting a sparse scaling operator on the information flows in the initial neural network, where the sparse scaling operator is used to scale the information flows;
training the weights of the initial neural network and the sparse scaling operators of the information flows using preset training sample data to obtain an intermediate neural network; and
deleting the information flows whose sparse scaling operators are zero from the intermediate neural network to obtain a search result neural network in the search space.
In another aspect, this application provides a target detection method, including:
obtaining sample data on which target detection is to be performed, inputting it into the search result neural network obtained by the above deep neural network structure search method, and taking the output of the search result neural network as the target detection result.
In another aspect, this application provides a semantic segmentation method, including:
obtaining sample data to be semantically segmented, inputting it into the search result neural network obtained by the above deep neural network structure search method, and taking the output of the search result neural network as the semantic segmentation result.
In yet another aspect, this application provides a deep neural network structure search apparatus, including:
a computing unit structure obtaining unit, configured to obtain, in a preset search space, each layer of computing unit structure in each module connected in series in the deep neural network, where each layer of computing unit structure includes at least one computing unit;
an information flow obtaining unit, configured to connect the computing units in each module in a preset connection manner to obtain the information flows in each module, where computing units in the same layer of computing unit structure are not connected to each other, and each computing unit can be connected to computing units at different layers of the module where it is located and to the input and output of that module;
an initial neural network obtaining unit, configured to obtain an initial neural network according to the modules and the connections of the computing units in each module;
a sparse scaling operator setting unit, configured to set a sparse scaling operator on the information flows in the initial neural network, where the sparse scaling operator is used to scale the information flows;
a weight and operator training unit, configured to train the weights of the initial neural network and the sparse scaling operators of the information flows using preset training sample data to obtain an intermediate neural network; and
a search result obtaining unit, configured to delete the information flows whose sparse scaling operators are zero from the intermediate neural network to obtain a search result neural network in the search space.
In still another aspect, this application provides a computer-readable storage medium on which a computer program is stored, where the program, when executed by a processor, implements the above deep neural network structure search method.
In still another aspect, this application provides a computer device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor, when executing the program, implements the above deep neural network structure search method.
In the deep neural network structure search method and apparatus provided by the embodiments of the present application, first, each layer of computing unit structure in each module connected in series in the deep neural network is obtained in a preset search space, where each layer of computing unit structure includes at least one computing unit; then the computing units in each module are connected in a preset connection manner to obtain the information flows in each module, where computing units in the same layer of computing unit structure are not connected to each other, and each computing unit can be connected to computing units at different layers of the module where it is located and to the input and output of that module; an initial neural network is then obtained according to the modules and the connections of the computing units in each module; a sparse scaling operator is set on the information flows in the initial neural network, where the sparse scaling operator is used to scale the information flows; the weights of the initial neural network and the sparse scaling operators of the information flows are trained using preset training sample data to obtain an intermediate neural network; and, further, the information flows whose sparse scaling operators are zero are deleted from the intermediate neural network to obtain the search result neural network in the search space. Unlike the prior art, which searches for important network structures directly in the search space, this application uses the sparse scaling operators to delete unimportant information flows and thereby implements the network structure search. During the network structure search, this application does not need to train a controller, does not need to use complex evolutionary algorithms, and does not need to train sub-networks for a long time; the search result can be obtained merely by training the weights and the sparse scaling operators, which greatly reduces the network structure search time, especially for network structure searches on large-scale data sets.
Other features and advantages of the present application will be set forth in the following description, and will in part become apparent from the description or be understood by implementing the present application. The purpose and other advantages of the present application can be realized and obtained by the structures particularly pointed out in the written description, the claims, and the drawings.
The technical solutions of the present application are described in further detail below through the accompanying drawings and embodiments.
Brief description of the drawings
The accompanying drawings are used to provide a further understanding of the present application and constitute a part of the specification; together with the embodiments of the present application, they serve to explain the present application and do not constitute a limitation of the present application. Obviously, the drawings in the following description are only some embodiments of the present application, and those of ordinary skill in the art can obtain other drawings from them without creative effort. In the drawings:
FIG. 1 is a first flowchart of a deep neural network structure search method provided by an embodiment of the present application;
FIG. 2 is a schematic diagram of the network structure in the search space of the deep neural network involved in an embodiment of the present application;
FIG. 3 is a schematic diagram of an example in which an embodiment of the present application is applied to the search of a two-layer network;
FIG. 4 is a schematic structural diagram of a deep neural network structure search apparatus provided by an embodiment of the present application.
Detailed description
In order to enable those skilled in the art to better understand the technical solutions in the present application, the technical solutions in the embodiments of the present application are described clearly and completely below with reference to the drawings in the embodiments of the present application. Obviously, the described embodiments are only some of the embodiments of the present application, not all of them. Based on the embodiments in this application, all other embodiments obtained by those of ordinary skill in the art without creative effort shall fall within the scope of protection of this application.
To facilitate understanding of this application, the technical terms involved in this application are explained below:
DNN: Deep Neural Network.
Computing unit: a unit node in a neural network that performs calculations such as convolution and pooling.
Network structure search: the process of searching for the optimal network structure in a neural network.
In the process of implementing the embodiments of the present application, the applicant found that the prior art generally uses a heuristic, controller-based network structure search method, namely:
some network structures to be searched are constructed according to prior knowledge and the deep neural network structure (specific structures such as neurons, network layers, groups, and modules); controllers are then set for the network structures to be searched, and a distributed solving approach is adopted, that is, for each controller, multiple network structures to be searched are computed in parallel, and the accuracy of each network structure is used to perform gradient descent on the controller, thereby obtaining the optimal network structure. It can be seen that the heuristic, controller-based network structure search method requires training a large number of controllers and distributed solving, and the process is cumbersome and slow.
In order to solve the above problems in the prior art, as shown in FIG. 1, an embodiment of the present application provides a deep neural network structure search method, including:
Step 101: Obtain, in a preset search space, each layer of computing unit structure in each module connected in series in the deep neural network.
Here, each layer of computing unit structure includes at least one computing unit.
Step 102: Connect the computing units in each module in a preset connection manner to obtain the information flows in each module.
Here, computing units in the same layer of computing unit structure are not connected to each other, and each computing unit can be connected to computing units at different layers of the module where it is located and to the input and output of that module.
Step 103: Obtain an initial neural network according to the modules and the connections of the computing units in each module.
Step 104: Set a sparse scaling operator on the information flows in the initial neural network, where the sparse scaling operator is used to scale the information flows.
Step 105: Use the preset training sample data to train the weights of the initial neural network and the sparse scaling operators of the information flows to obtain an intermediate neural network.
Step 106: Delete the information flows whose sparse scaling operators are zero from the intermediate neural network to obtain a search result neural network in the search space.
It is worth noting that, in the deep neural network, the preset search space can be as shown in FIG. 2 and can include multiple modules 21 connected in series, that is, the output of the previous module is the input of the next module. Each module 21 (which can be regarded as a directed acyclic graph) can include a multi-layer computing unit structure 22, and each layer of computing unit structure 22 includes at least one computing unit 23 (each computing unit can be regarded as a node in the directed acyclic graph). The computing units 23 in each layer of computing unit structure 22 can generally include at least one of a convolution computing unit and a pooling computing unit, and the convolution computing unit can also be a dilated convolution computing unit, a group convolution computing unit, or the like.
Preferably, the above step 102 can be implemented as follows:
In each module 21, the computing units 23 are connected in a fully connected manner; that is, as shown in FIG. 2, each computing unit 23 is connected to the computing units 23 at different layers of the module 21 where it is located and to the input and output of that module 21. In this way, the information flows from the input of the module 21 to each layer of computing unit structure 22, from each layer of computing unit structure 22 to the output of the module 21, and between the computing units 23 are obtained (each flow can be regarded as an edge between nodes in the directed acyclic graph), and the full set of network structures in the search space is obtained (any network structure in the search space can be regarded as a subgraph of the directed acyclic graph). For example, in a module 21, the output h(i) of the i-th computing unit F(i)(x) equals the result of applying F(i)(x) to the sum of the outputs h(j) of all preceding computing units, which can be expressed as:

h(i) = F(i)( Σ_{j=1}^{i-1} h(j) )        (1)

In this way, in the above step 103, the initial neural network can be obtained according to the structure shown in FIG. 2.
Further, after the above step 103, the weights of the initial neural network may be configured so as to initialize them; alternatively, preset pre-training sample data may be used to pre-train the weights of the initial neural network to obtain a pre-trained initial neural network with better weights. The weights are configured or pre-trained here in order to obtain initial weight values of the initial neural network, which facilitates the subsequent setting and training of the sparse scaling operators.
Then, in the above step 104, a sparse scaling operator needs to be set for the information flows in the initial neural network; for example, a sparse scaling operator λ^(i,j), representing the information flow from the j-th computing unit to the i-th computing unit, is added at the output h(j) of each preceding computing unit. After the sparse scaling operators are added, the above formula (1) is expressed as:

h(i) = F(i)( Σ_{j=1}^{i-1} λ^(i,j) h(j) )        (2)

Here, the value of each sparse scaling operator is greater than or equal to 0. For example, after the weights of the initial neural network are configured so as to initialize them, the value interval of the sparse scaling operator may be [0, 1], and the sparse scaling operator is not necessarily equal to 1; after the weights of the initial neural network are pre-trained with the preset pre-training sample data, the value of the sparse scaling operator is generally taken to be 1.
以下以一卷积神经网络结构的搜索进行说明，在卷积神经网络结构中，计算单元即为卷积计算单元和池化计算单元，信息流即为网络中的特征图。在该卷积神经网络结构中，包含了若干模块，每个模块包含若干层计算单元结构，每一层的计算单元结构又包括若干不同的计算单元（例如，1×1的卷积计算、3×3的卷积计算、5×5的卷积计算、池化计算等，不仅局限于上述这几种）。各个模块依次串接，即上一个模块的输出是下一个模块的输入，每个计算单元与和其所在模块中的不同层的计算单元，以及其所在模块的输入和输出进行连接。这样，每个计算单元的输出可以被表示出来，例如，在卷积神经网络结构中，第b个模块的第i层的第j个计算单元的输出可以表示为：The following describes the search of a convolutional neural network structure. In a convolutional neural network structure, the computing units are convolution computing units and pooling computing units, and the information flows are the feature maps in the network. The convolutional neural network structure contains several modules, each module contains several layers of computing unit structures, and each layer of computing unit structure in turn includes several different computing units (for example, 1×1 convolution, 3×3 convolution, 5×5 convolution, pooling, etc., and is not limited to these). The modules are connected in series, that is, the output of the previous module is the input of the next module, and each computing unit is connected to the computing units of the other layers in its module as well as to the input and output of its module. In this way, the output of each computing unit can be expressed; for example, in the convolutional neural network structure, the output of the j-th computing unit of the i-th layer of the b-th module can be expressed as:
h^{(b,i,j)} = F^{(b,i,j)}( Σ_{m=1}^{i-1} Σ_{n=1}^{N} λ^{(b,i,j)}_{(m,n)} · h^{(b,m,n)} + λ^{(b,i,j)}_{(0,0)} · O^{(b-1)} )　　　　公式(3) Formula (3)
其中，F^{(b,i,j)}(x)表示第b个模块的第i层的第j个计算单元的计算；N表示一层计算单元结构所包含的计算单元总数；λ^{(b,i,j)}_{(m,n)}表示第b个模块的第m层的第n个计算单元到第b个模块的第i层的第j个计算单元之间信息流的稀疏缩放算子；h^{(b,m,n)}表示第b个模块的第m层的第n个计算单元的输出；O^{(b-1)}表示第b-1个模块的输出，即第b个模块的输入；λ^{(b,i,j)}_{(0,0)}表示第b个模块的输入O^{(b-1)}到第b个模块的第i层的第j个计算单元之间信息流的稀疏缩放算子。此处，设h^{(b,0,0)}=O^{(b-1)}作为第b个模块的输入，设h^{(b,M+1,0)}=O^{(b)}作为第b个模块的输出，其中M表示第b个模块所包含的层总数。这样可以确定位于第m层的计算单元共有(m-1)N+1个输入。Here, F^{(b,i,j)}(x) denotes the computation of the j-th computing unit of the i-th layer of the b-th module; N denotes the total number of computing units contained in one layer of the computing unit structure; λ^{(b,i,j)}_{(m,n)} denotes the sparse scaling operator of the information flow from the n-th computing unit of the m-th layer of the b-th module to the j-th computing unit of the i-th layer of the b-th module; h^{(b,m,n)} denotes the output of the n-th computing unit of the m-th layer of the b-th module; O^{(b-1)} denotes the output of the (b-1)-th module, i.e., the input of the b-th module; and λ^{(b,i,j)}_{(0,0)} denotes the sparse scaling operator of the information flow from the module input O^{(b-1)} to the j-th computing unit of the i-th layer of the b-th module. Here, h^{(b,0,0)}=O^{(b-1)} is taken as the input of the b-th module, and h^{(b,M+1,0)}=O^{(b)} as its output, where M denotes the total number of layers contained in the b-th module. It follows that a computing unit located at the m-th layer has (m-1)N+1 inputs.
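作为示意（非限定实现），下面给出公式(3)的一个简化代码草图。A minimal sketch of formula (3), provided for illustration only: the flattened feature vectors, the `ops` mapping from unit indices to operations, and all helper names are assumptions introduced here and are not part of the original application.

```python
import numpy as np

# Illustrative sketch of formula (3): output of computing unit (b, i, j).
# Assumption (not from the application): feature maps are flattened to 1-D
# vectors of length C, and each unit's computation F is a simple callable.
def unit_output(i, j, ops, lam, h_prev, module_input):
    """h_prev[(m, n)]: outputs of units in layers 1..i-1 of this module.
    lam[((m, n), (i, j))]: sparse scaling operator on the flow (m, n) -> (i, j).
    module_input: O^(b-1), treated as h^(b,0,0)."""
    total = lam[((0, 0), (i, j))] * module_input        # scaled module input
    for (m, n), h in h_prev.items():                    # scaled previous unit outputs
        total = total + lam[((m, n), (i, j))] * h
    return ops[(i, j)](total)                           # F^(b,i,j)( . )

# Toy usage: a unit in layer 3 with N = 1 unit per layer, so it has
# (3-1)*1 + 1 = 3 inputs, matching the (m-1)N+1 count above.
C = 4
ops = {(3, 1): np.tanh}
h_prev = {(1, 1): np.ones(C), (2, 1): 2 * np.ones(C)}
lam = {((0, 0), (3, 1)): 1.0, ((1, 1), (3, 1)): 0.5, ((2, 1), (3, 1)): 0.0}
print(unit_output(3, 1, ops, lam, h_prev, np.zeros(C)))
```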
此处，需要说明的是，在本申请实施例中，各个计算单元到其所在模块输出间的连接也是可以训练学习的。例如，上述卷积神经网络中，第b个模块的输出O^{(b)}可以通过对该模块中所有计算单元的输出进行拼接，再使用卷积核大小为1的卷积降低特征图的通道数来保持通道数不变，如下公式所示：Here, it should be noted that, in the embodiments of the present application, the connection between each computing unit and the output of its module can also be learned through training. For example, in the above convolutional neural network, the output O^{(b)} of the b-th module can be obtained by concatenating the outputs of all computing units in the module and then applying a convolution with kernel size 1 to reduce the number of channels of the feature map, so that the number of channels remains unchanged, as shown in the following formula:
O^{(b)} = R( [ λ^{(b,out)}_{(1,1)} · h^{(b,1,1)}, …, λ^{(b,out)}_{(M,N)} · h^{(b,M,N)}, O^{(b-1)} ] )　　　　公式(4) Formula (4)
其中，h^{(b,m,n)}表示在第b个模块中，位于第m层中第n个计算单元的输出，λ^{(b,out)}_{(m,n)}表示第b个模块中，位于第m层中第n个计算单元与该第b个模块输出连接的信息流的缩放算子，O^{(b-1)}表示第b-1个模块的输出，即第b个模块的输入。R(x)表示特征图的拼接与卷积核大小为1的卷积计算，用来融合特征图并保证模块输出的通道数不变。Here, h^{(b,m,n)} denotes the output of the n-th computing unit of the m-th layer in the b-th module; λ^{(b,out)}_{(m,n)} denotes the scaling operator of the information flow connecting the n-th computing unit of the m-th layer of the b-th module to the output of that module; and O^{(b-1)} denotes the output of the (b-1)-th module, i.e., the input of the b-th module. R(x) denotes the concatenation of the feature maps followed by a convolution with kernel size 1, which fuses the feature maps and keeps the number of output channels of the module unchanged.
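作为示意（非限定实现），下面给出公式(4)的一个简化代码草图。An illustrative sketch of formula (4): feature maps are represented as NumPy arrays of shape (C, H, W), and the 1×1 convolution is realized as a channel-mixing matrix; the weight matrix `w_1x1` and all shapes are assumptions introduced for this example only.

```python
import numpy as np

def module_output(unit_outputs, lam_out, module_input, w_1x1):
    """Sketch of formula (4): concatenate the scaled unit outputs and the
    module input along the channel axis, then fuse them with a 1x1 convolution
    (a per-pixel linear map over channels) so the channel count is preserved."""
    scaled = [lam_out[key] * h for key, h in unit_outputs.items()]
    concat = np.concatenate(scaled + [module_input], axis=0)     # (C_total, H, W)
    # A 1x1 convolution is a matrix multiplication over the channel dimension.
    return np.einsum('oc,chw->ohw', w_1x1, concat)

# Toy usage: 2 units with 4 channels each plus a 4-channel module input.
C, H, W = 4, 8, 8
unit_outputs = {(1, 1): np.random.rand(C, H, W), (2, 1): np.random.rand(C, H, W)}
lam_out = {(1, 1): 1.0, (2, 1): 0.0}                  # second output flow pruned
module_input = np.random.rand(C, H, W)
w_1x1 = np.random.rand(C, 3 * C)                      # fuses 3C channels back to C
print(module_output(unit_outputs, lam_out, module_input, w_1x1).shape)  # (4, 8, 8)
```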
对于上述步骤105,可以采用如下方式实现:For the above step 105, it can be implemented as follows:
步骤S1、构建初始神经网络对应的目标函数，所述目标函数包含损失函数、权重正则函数和稀疏正则函数。该目标函数可以如下式所示：Step S1: Construct an objective function corresponding to the initial neural network, the objective function including a loss function, a weight regularization term, and a sparsity regularization term. The objective function may take the following form:
min_{W,λ} (1/K) Σ_{i=1}^{K} L( y_i, Net(x_i, W, λ) ) + δ·||W||_2^2 + γ·||λ||_1　　　　公式(5) Formula (5)
其中，W为权重，λ为稀疏缩放算子向量，K为样本数据的数量，L(y_i, Net(x_i, W, λ))为神经网络在样本数据x_i上的损失，y_i为样本标签，Net(x_i, W, λ)为神经网络的输出，δ·||W||_2^2为权重正则函数，记为R(W)，δ为权重W的参数衰减权重，γ·||λ||_1为稀疏正则函数，记为Rs(λ)。另外，此处的稀疏正则函数γ·||λ||_1还可以由更复杂的稀疏约束替代，例如非凸的稀疏约束。Here, W denotes the weights, λ the vector of sparse scaling operators, and K the number of sample data; L(y_i, Net(x_i, W, λ)) is the loss of the neural network on sample data x_i, y_i is the sample label, and Net(x_i, W, λ) is the output of the neural network. δ·||W||_2^2 is the weight regularization term, denoted R(W), where δ is the weight-decay coefficient of the weights W, and γ·||λ||_1 is the sparsity regularization term, denoted Rs(λ). In addition, the sparsity regularization term γ·||λ||_1 may also be replaced by a more complex sparsity constraint, for example a non-convex sparsity constraint.
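作为示意（非限定实现），下面给出公式(5)目标函数的一个简化代码草图。A sketch of the objective of formula (5) in a toy setting: the linear "network", the squared loss, and all variable names are assumptions introduced for illustration only and are not part of the original application.

```python
import numpy as np

def objective(W, lam, X, Y, delta, gamma):
    """Formula (5) on a toy linear 'network': data loss + weight decay
    + L1 sparsity penalty on the scaling operators lam."""
    preds = (X * lam) @ W                         # Net(x_i, W, lam): features scaled by lam
    data_loss = np.mean((preds - Y) ** 2)         # (1/K) * sum of per-sample losses
    weight_reg = delta * np.sum(W ** 2)           # delta * ||W||_2^2
    sparse_reg = gamma * np.sum(np.abs(lam))      # gamma * ||lam||_1
    return data_loss + weight_reg + sparse_reg

K, D = 32, 6
X, Y = np.random.rand(K, D), np.random.rand(K)
W, lam = np.random.rand(D), np.ones(D)
print(objective(W, lam, X, Y, delta=1e-4, gamma=1e-2))
```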
步骤S2、采用所述训练样本数据对所述初始神经网络进行迭代训练。Step S2: Perform iterative training on the initial neural network using the training sample data.
步骤S3、当迭代训练次数达到阈值或者所述目标函数满足预置的收敛条件时，得到所述中间神经网络。Step S3: When the number of training iterations reaches a threshold or the objective function meets a preset convergence condition, the intermediate neural network is obtained.
在一些实施例中，前述步骤S2的实现可通过对初始神经网络进行多次以下的迭代训练，以一次非首次迭代和非尾次迭代的迭代过程（以下称为本次迭代训练）为例进行描述，一次迭代训练包括以下步骤C1~步骤C3：In some embodiments, the foregoing step S2 may be implemented by performing the following iterative training on the initial neural network multiple times. Taking an iteration that is neither the first nor the last iteration (hereinafter referred to as the current iteration) as an example, one iteration of training includes the following steps C1 to C3:
步骤C1、将前一次迭代训练得到的稀疏缩放算子作为所述目标函数的常量，将所述权重作为所述目标函数的变量，采用第一优化算法对所述目标函数进行优化，得到本次迭代训练的权重；Step C1: Take the sparse scaling operators obtained in the previous iteration of training as constants of the objective function and the weights as its variables, and optimize the objective function with the first optimization algorithm to obtain the weights of the current iteration of training.
步骤C2、将本次迭代训练的权重作为所述目标函数的常量，将稀疏缩放算子作为所述目标函数的变量，采用第二优化算法对所述目标函数进行优化，得到本次迭代训练的稀疏缩放算子；Step C2: Take the weights of the current iteration of training as constants of the objective function and the sparse scaling operators as its variables, and optimize the objective function with the second optimization algorithm to obtain the sparse scaling operators of the current iteration of training.
步骤C3、基于本次迭代训练的权重和稀疏缩放算子进行下一次迭代训练。Step C3. Perform the next iterative training based on the weights and sparse scaling operator of this iterative training.
另外，首次迭代训练过程如下：将初始稀疏缩放算子作为所述目标函数的常量，将所述权重作为所述目标函数的变量，采用第一优化算法对所述目标函数进行优化，得到本次迭代训练的权重；将本次迭代训练的权重作为所述目标函数的常量，将稀疏缩放算子作为所述目标函数的变量，采用第二优化算法对所述目标函数进行优化，得到本次迭代训练的稀疏缩放算子；基于本次迭代训练的权重和稀疏缩放算子进行第二次迭代训练。In addition, the first iteration of training proceeds as follows: take the initial sparse scaling operators as constants of the objective function and the weights as its variables, and optimize the objective function with the first optimization algorithm to obtain the weights of this iteration; then take these weights as constants and the sparse scaling operators as variables, and optimize the objective function with the second optimization algorithm to obtain the sparse scaling operators of this iteration; the second iteration of training is then performed based on the weights and sparse scaling operators of this iteration.
另外，尾次迭代训练过程如下：将前一次迭代训练得到的稀疏缩放算子作为所述目标函数的常量，将所述权重作为所述目标函数的变量，采用第一优化算法对所述目标函数进行优化，得到本次迭代训练的权重；将本次迭代训练的权重作为所述目标函数的常量，将稀疏缩放算子作为所述目标函数的变量，采用第二优化算法对所述目标函数进行优化，得到本次迭代训练的稀疏缩放算子；将包含本次迭代训练得到的稀疏缩放算子和权重的神经网络作为中间神经网络。In addition, the last iteration of training proceeds as follows: take the sparse scaling operators obtained in the previous iteration as constants of the objective function and the weights as its variables, and optimize the objective function with the first optimization algorithm to obtain the weights of this iteration; then take these weights as constants and the sparse scaling operators as variables, and optimize the objective function with the second optimization algorithm to obtain the sparse scaling operators of this iteration; the neural network containing the sparse scaling operators and weights obtained in this iteration is taken as the intermediate neural network.
此处,在本申请实施例中,该第一优化算法可以但不限于为以下任意一种算法:随机梯度下降算法、引入动量的变种算法。Here, in the embodiment of the present application, the first optimization algorithm may be, but not limited to, any one of the following algorithms: a stochastic gradient descent algorithm, a variant algorithm that introduces momentum.
此处，在本申请实施例中，该第二优化算法可以但不限于为以下任意一种算法：加速近端梯度下降算法、近端梯度下降算法或者交替方向乘子算法。Here, in the embodiments of the present application, the second optimization algorithm may be, but is not limited to, any one of the following: the accelerated proximal gradient descent algorithm, the proximal gradient descent algorithm, or the alternating direction method of multipliers.
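作为示意（非限定实现），下面给出步骤C1~C3交替优化的一个简化代码草图。A sketch of the alternating scheme of steps C1–C3 on a toy linear model: W is updated by a plain gradient step (standing in for the first optimization algorithm) and λ by a proximal/soft-threshold step (standing in for the second); the analytic gradients, hyper-parameters, and model are assumptions made only for this example.

```python
import numpy as np

def soft_threshold(z, alpha):
    """Soft-threshold operator: sign(z_i) * max(|z_i| - alpha, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - alpha, 0.0)

# Toy model: prediction = (X * lam) @ W with squared loss.
K, D = 64, 8
X, Y = np.random.rand(K, D), np.random.rand(K)
W = np.random.rand(D)
lam = np.ones(D)                      # scaling operators initialized to 1 (cf. above)
delta, gamma, lr_w, lr_lam = 1e-4, 5e-2, 1e-1, 1e-1

for _ in range(200):
    # Step C1: lam fixed, update W with a gradient step (first optimization algorithm).
    residual = (X * lam) @ W - Y
    grad_w = 2.0 * (X * lam).T @ residual / K + 2.0 * delta * W
    W = W - lr_w * grad_w
    # Step C2: W fixed, update lam with a proximal gradient step (second algorithm).
    residual = (X * lam) @ W - Y
    grad_lam = 2.0 * (X * W).T @ residual / K
    lam = soft_threshold(lam - lr_lam * grad_lam, lr_lam * gamma)
    lam = np.maximum(lam, 0.0)        # keep the scaling operators non-negative
    # Step C3: continue to the next iteration with the updated W and lam.

print("nonzero scaling operators:", np.count_nonzero(lam), "of", D)
```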
为进一步对本申请实施例中如何求解出目标函数中的W和λ进行详细的描述，下面以目标函数为上述公式(5)为例，对一次迭代训练优化目标函数求解得到W和λ进行描述。将(1/K) Σ_{i=1}^{K} L( y_i, Net(x_i, W, λ) )记为g(λ)，Rs(λ)记为H(λ)。To further describe in detail how W and λ are solved for in the objective function in the embodiments of the present application, the following takes the objective function of formula (5) as an example and describes how W and λ are obtained by optimizing the objective function in one iteration of training. Denote (1/K) Σ_{i=1}^{K} L( y_i, Net(x_i, W, λ) ) as g(λ), and Rs(λ) as H(λ).
将λ作为常量，将W作为变量，则目标函数转换为
min_W (1/K) Σ_{i=1}^{K} L( y_i, Net(x_i, W, λ) ) + δ·||W||_2^2
采用随机梯度下降算法即可求解得到W的取值，具体过程不再详细描述。With λ as a constant and W as the variable, the objective function becomes the above minimization over W, and the value of W can be solved by the stochastic gradient descent algorithm; the specific process is not described in detail here.
将W作为常量，将λ作为变量，则目标函数转换为
min_λ g(λ) + H(λ) = min_λ (1/K) Σ_{i=1}^{K} L( y_i, Net(x_i, W, λ) ) + γ·||λ||_1
采用加速近端梯度下降算法求解λ的取值，具体可通过但不仅限于以下几种方式得到：With W as a constant and λ as the variable, the objective function becomes the above minimization over λ, and the value of λ is solved by the accelerated proximal gradient descent algorithm, which can be carried out in, but is not limited to, the following ways:
方式1,采用公式(6)~公式(8)得到λ:Method 1: Use formula (6) to formula (8) to obtain λ:
d_t = λ_{t-1} + ((t-2)/(t+1)) · (λ_{t-1} - λ_{t-2})　　　　公式(6) Formula (6)
z_t = d_t - η_t · ∇g(d_t)　　　　公式(7) Formula (7)
λ_t = Prox_{η_t·H}(z_t)　　　　公式(8) Formula (8)
其中η_t表示在第t次迭代训练时梯度下降的步长，Prox_{η_t·H}(z_t) = S_{η_t·γ}(z_t)，S_α(·)为软阈值算子，定义如下：S_α(z)_i = sign(z_i)·(|z_i| - α)_+。Here, η_t denotes the step size of gradient descent at the t-th training iteration, Prox_{η_t·H}(z_t) = S_{η_t·γ}(z_t), and S_α(·) is the soft-threshold operator, defined as S_α(z)_i = sign(z_i)·(|z_i| - α)_+.
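作为示意（非限定实现），下面给出方式1一次加速近端梯度更新的简化代码草图。A sketch of one accelerated proximal gradient step of the kind described in Method 1: the gradient oracle `grad_g`, the quadratic toy function, the step size, and the extrapolation coefficient are assumptions introduced for illustration only.

```python
import numpy as np

def soft_threshold(z, alpha):
    """S_alpha(z)_i = sign(z_i) * max(|z_i| - alpha, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - alpha, 0.0)

def apg_update(lam_prev, lam_prev2, t, eta, gamma, grad_g):
    """One accelerated proximal gradient step: extrapolate, take a gradient
    step on g, then apply Prox_{eta*H} = soft-thresholding for H(lam) = gamma*||lam||_1."""
    d = lam_prev + (t - 2.0) / (t + 1.0) * (lam_prev - lam_prev2)   # extrapolation
    z = d - eta * grad_g(d)                                         # gradient step
    return soft_threshold(z, eta * gamma)                           # proximal step

# Toy usage: g(lam) = 0.5 * ||lam - target||^2, so grad_g(lam) = lam - target.
target = np.array([0.8, 0.05, 0.0, 1.2])
grad_g = lambda lam: lam - target
lam_prev2 = lam_prev = np.ones_like(target)
for t in range(2, 100):
    lam_new = apg_update(lam_prev, lam_prev2, t, eta=0.5, gamma=0.2, grad_g=grad_g)
    lam_prev2, lam_prev = lam_prev, lam_new
print(lam_prev)   # entries below the threshold are driven exactly to zero
```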
方式2、由于前述方式1求解λ需要额外的前向后向计算来得到∇g(d_t)，将该算法直接应用到现有深度学习框架有一定难度。因此，方式2对前述方式1的公式进行更新，得到公式(9)~公式(11)，根据公式(9)~公式(11)计算得到λ：Method 2: Since Method 1 requires an additional forward-backward computation to obtain ∇g(d_t) when solving for λ, it is somewhat difficult to apply that algorithm directly within existing deep learning frameworks. Therefore, Method 2 updates the formulas of Method 1 to obtain formulas (9) to (11), and λ is computed according to formulas (9) to (11):
Figure PCTCN2019077049-appb-000019（公式(9) / Formula (9)）
Figure PCTCN2019077049-appb-000020（公式(10) / Formula (10)）
λ_t = λ_{t-1} + v_t　　　　公式(11) Formula (11)
方式3、本申请还可以采用变量替代的方法,即采用下式(12)~(14)计算得到λ:Method 3: In this application, a variable substitution method can also be used, that is, the following formulas (12) to (14) are used to calculate λ:
Figure PCTCN2019077049-appb-000021（公式(12) / Formula (12)）
Figure PCTCN2019077049-appb-000022（公式(13) / Formula (13)）
Figure PCTCN2019077049-appb-000023（公式(14) / Formula (14)）
其中λ′_{t-1} = λ_{t-1} + μ_{t-1}·v_{t-1}，μ为预设的固定值，并采用批量随机梯度下降的形式来更新W和λ。Here, λ′_{t-1} = λ_{t-1} + μ_{t-1}·v_{t-1}, μ is a preset fixed value, and W and λ are updated in the form of batch stochastic gradient descent.
之后,在上述步骤106中,即可将中间神经网络中稀疏缩放算子为零的信息流删除,得到搜索空间内的搜索结果神经网络。并且,在与一个计算单元的连接对应的信息流均被删除后,则该计算单元对后续的计算已经无作用,则可将该计算单元删除。Then, in the above step 106, the information stream with the sparse scaling operator being zero in the intermediate neural network can be deleted to obtain the search result neural network in the search space. In addition, after the information streams corresponding to the connection of one computing unit are all deleted, the computing unit has no effect on subsequent calculations, and the computing unit may be deleted.
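作为示意（非限定实现），下面给出步骤106剪枝过程的一个简化代码草图。A sketch of the pruning of step 106: information flows whose sparse scaling operator is zero are removed, and a computing unit is removed once all of its connected information flows have been removed; the graph representation and the flow values used here are assumptions introduced for illustration only.

```python
# Sketch of step 106. Each information flow is an edge (src, dst) carrying a
# trained sparse scaling operator; units are node identifiers.
def prune(units, flows):
    """flows: dict mapping (src_unit, dst_unit) -> trained scaling operator."""
    kept_flows = {edge: s for edge, s in flows.items() if s != 0.0}
    # A computing unit whose connected flows have all been removed no longer
    # contributes to later computation and can itself be removed.
    connected = {u for edge in kept_flows for u in edge}
    kept_units = [u for u in units if u in connected]
    return kept_units, kept_flows

# Toy usage loosely mirroring the two-level example of FIG. 3.
units = ["input", "L1_OP1", "L1_OP2", "L2_OP1", "L2_OP2", "output"]
flows = {
    ("input", "L1_OP1"): 0.0, ("input", "L1_OP2"): 0.7,
    ("input", "L2_OP1"): 0.4, ("L1_OP1", "L2_OP2"): 0.0,
    ("L1_OP2", "L2_OP2"): 0.9, ("L2_OP1", "output"): 0.3,
    ("L2_OP2", "output"): 0.6, ("L1_OP1", "output"): 0.0,
}
print(prune(units, flows))   # L1_OP1 disappears together with its zero-valued flows
```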
例如，如图3所示，将本申请实施例应用于图片分类任务中。设定基础网络中含有两层结构Level 1和Level 2，每层中还有两个不同的计算单元OP 1和OP 2，计算单元间的连接如图3的最左侧所示。经过上述步骤101至步骤105之后，可以训练得到图3中的中间所示的虚线的稀疏缩放算子为0。进而如图3的最右侧所示，将这些虚线删除后，确认Level 1层的计算单元OP 1已经无连接对应的信息流，则也被删除，最终得到搜索结果神经网络。For example, as shown in FIG. 3, the embodiment of the present application is applied to an image classification task. Suppose the base network contains two layers, Level 1 and Level 2, each of which contains two different computing units OP 1 and OP 2; the connections between the computing units are shown on the far left of FIG. 3. After the above steps 101 to 105, the sparse scaling operators of the connections shown as dashed lines in the middle of FIG. 3 are trained to 0. Then, as shown on the far right of FIG. 3, after these dashed connections are deleted, the computing unit OP 1 of Level 1 is found to have no remaining information flow corresponding to its connections and is therefore also deleted, and the search result neural network is finally obtained.
值得说明的是,图3所列举的例子仅为本申请实施例的一个具体应用,并非全部的应用。本申请实施例除了应用在单个模块结构搜索外,本申请中位于网络不同模块的稀疏缩放算子还可以独立更新,使得不同模块能够搜索训练得到更加灵活的网络结构。It is worth noting that the example listed in FIG. 3 is only a specific application of the embodiments of the present application, and not all applications. In addition to being applied to the search of a single module structure in this embodiment of the present application, the sparse scaling operators located in different modules of the network in this application can also be independently updated, so that different modules can search and train to obtain a more flexible network structure.
另外,本申请实施例还提供一种目标检测方法,包括:In addition, embodiments of the present application also provide a target detection method, including:
获得待进行目标检测的样本数据,输入到上述图1对应的深度神经网络的结构搜索方法得到的搜索结果神经网络中,以所述搜索结果神经网络的输出作为目标检测结果。The sample data to be subjected to target detection is obtained and input into the search result neural network obtained by the structure search method of the deep neural network corresponding to FIG. 1 above, and the output of the search result neural network is used as the target detection result.
另外,本申请实施例还提供一种语义分割方法,包括:In addition, embodiments of the present application also provide a semantic segmentation method, including:
获得待进行语义分割的样本数据,输入到上述图1对应的深度神经网络的结构搜索方法得到的搜索结果神经网络中,以所述搜索结果神经网络的输出作为语义分割结果。The sample data to be semantically segmented is obtained and input into the search result neural network obtained by the structure search method of the deep neural network corresponding to FIG. 1 above, and the output of the search result neural network is used as the semantic segmentation result.
图1对应的深度神经网络的结构搜索方法不仅仅局限于应用在目标检测和语义分割任务中,还可以用于到其他不同的任务中,此处不再一一列举。The structure search method of the deep neural network corresponding to FIG. 1 is not limited to the application in target detection and semantic segmentation tasks, but can also be used in other different tasks, which will not be listed here one by one.
另外,如图4所示,本申请实施例还提供一种深度神经网络的结构搜索装置,其特征在于,包括:In addition, as shown in FIG. 4, an embodiment of the present application also provides a deep neural network structure search device, which is characterized by including:
计算单元结构获得单元31,用于在预先设置的搜索空间中获得深度神经网络中依次串接的每个模块中的每层计算单元结构;所述每层计算单元结构包括至少一个计算单元。The calculation unit structure obtaining unit 31 is configured to obtain each layer of the calculation unit structure in each module connected in series in the deep neural network in a preset search space; each layer of the calculation unit structure includes at least one calculation unit.
信息流获得单元32，用于在每个模块中采用预设连接方式将各计算单元进行连接，得到每个模块中的信息流；其中，处于同一层计算单元结构的计算单元之间不进行连接，每个计算单元能够与和其所在模块中的不同层的计算单元，以及其所在模块的输入和输出进行连接。The information flow obtaining unit 32 is configured to connect the computing units in each module in a preset connection manner to obtain the information flows in each module; computing units within the same layer of the computing unit structure are not connected to each other, and each computing unit can be connected to the computing units of the other layers in its module as well as to the input and output of its module.
初始神经网络获得单元33，用于根据模块及每个模块中的计算单元的连接情况，得到初始神经网络。The initial neural network obtaining unit 33 is configured to obtain the initial neural network according to the connections of the modules and of the computing units in each module.
稀疏缩放算子设置单元34,用于对所述初始神经网络中的信息流设置稀疏缩放算子,其中所述稀疏缩放算子用于对所述信息流进行缩放。The sparse scaling operator setting unit 34 is configured to set a sparse scaling operator on the information flow in the initial neural network, wherein the sparse scaling operator is used to scale the information flow.
权重和算子训练单元35,用于采用预置的训练样本数据对所述初始神经网络的权重和信息流的稀疏缩放算子进行训练,得到中间神经网络。The weight and operator training unit 35 is configured to train the weights of the initial neural network and the sparse scaling operator of the information flow using preset training sample data to obtain an intermediate neural network.
搜索结果获得单元36,用于将所述中间神经网络中稀疏缩放算子为零的信息流删除,得到搜索空间内的搜索结果神经网络。The search result obtaining unit 36 is configured to delete the information stream whose sparse scaling operator is zero in the intermediate neural network to obtain a search result neural network in the search space.
此外,本申请实施例还提供一种计算机可读存储介质,其上存储有计算机程序,其特征在于,该程序被处理器执行时实现上述图1对应的深度神经网络的结构搜索方法。In addition, embodiments of the present application also provide a computer-readable storage medium on which a computer program is stored, which is characterized in that, when the program is executed by a processor, the structure search method of the deep neural network corresponding to FIG. 1 described above is implemented.
此外，本申请实施例还提供一种计算机设备，包括存储器、处理器及存储在存储器上并可在处理器上运行的计算机程序，所述处理器执行所述程序时实现上述图1对应的深度神经网络的结构搜索方法。In addition, the embodiments of the present application further provide a computer device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the structure search method of the deep neural network corresponding to FIG. 1 described above.
综上所述，本申请实施例提供的一种深度神经网络的结构搜索方法及装置，首先，在预先设置的搜索空间中获得深度神经网络中依次串接的每个模块中的每层计算单元结构；每层计算单元结构包括至少一个计算单元；之后，在每个模块中采用预设连接方式将各计算单元进行连接，得到每个模块中的信息流；其中，处于同一层计算单元结构的计算单元之间不进行连接，每个计算单元能够与和其所在模块中的不同层的计算单元，以及其所在模块的输入和输出进行连接；然后，根据模块及每个模块中的计算单元的连接情况，得到初始神经网络；对初始神经网络中的信息流设置稀疏缩放算子，其中稀疏缩放算子用于对信息流进行缩放；采用预置的训练样本数据对初始神经网络的权重和信息流的稀疏缩放算子进行训练，得到中间神经网络；进而，将中间神经网络中稀疏缩放算子为零的信息流删除，得到搜索空间内的搜索结果神经网络。本申请与现有技术中直接从搜索空间搜索重要的网络结构不同，本申请通过稀疏缩放算子，可删除不重要的信息流来实现网络结构的搜索。本申请在网络结构的搜索过程中，无需对控制器进行训练，也无需使用复杂的进化算法，不需要对子网络进行长时间的训练，仅通过对权重和稀疏缩放算子的训练即可得到搜索结果，使得网络结构搜索的时间大大减小，特别是对于大规模数据集上的网络结构搜索，更为节省网络结构搜索的时间。In summary, the embodiments of the present application provide a structure search method and apparatus for a deep neural network. First, the computing unit structure of each layer in each of the serially connected modules of the deep neural network is obtained in a preset search space, each layer of the computing unit structure including at least one computing unit. Then, within each module, the computing units are connected in a preset connection manner to obtain the information flows in each module, where computing units within the same layer of the computing unit structure are not connected to each other, and each computing unit can be connected to the computing units of the other layers in its module as well as to the input and output of its module. Next, an initial neural network is obtained according to the connections of the modules and of the computing units in each module; sparse scaling operators are set on the information flows in the initial neural network, where the sparse scaling operators are used to scale the information flows; and the weights of the initial neural network and the sparse scaling operators of the information flows are trained with preset training sample data to obtain an intermediate neural network. Finally, the information flows whose sparse scaling operators are zero are deleted from the intermediate neural network to obtain the search result neural network within the search space. Unlike the prior art, which searches for important network structures directly in the search space, the present application realizes the network structure search by deleting unimportant information flows through the sparse scaling operators. During the network structure search, there is no need to train a controller, to use complex evolutionary algorithms, or to train sub-networks for a long time; the search result is obtained merely by training the weights and the sparse scaling operators, which greatly reduces the time of the network structure search, especially for network structure search on large-scale data sets.
以上结合具体实施例描述了本申请的基本原理，但是，需要指出的是，对本领域普通技术人员而言，能够理解本申请的方法和装置的全部或者任何步骤或者部件可以在任何计算装置（包括处理器、存储介质等）或者计算装置的网络中，以硬件、固件、软件或者它们的组合加以实现，这是本领域普通技术人员在阅读了本申请的说明的情况下运用它们的基本编程技能就能实现的。The basic principles of the present application have been described above in conjunction with specific embodiments. It should be pointed out, however, that a person of ordinary skill in the art can understand that all or any of the steps or components of the methods and apparatuses of the present application can be implemented in hardware, firmware, software, or a combination thereof, in any computing device (including processors, storage media, etc.) or in a network of computing devices, and that this can be achieved by a person of ordinary skill in the art using basic programming skills after reading the description of the present application.
本领域普通技术人员可以理解实现上述实施例方法携带的全部或部分步骤是可以通过程序来指令相关的硬件完成，所述的程序可以存储于一种计算机可读存储介质中，该程序在执行时，包括方法实施例的步骤之一或其组合。A person of ordinary skill in the art can understand that all or part of the steps of the methods of the above embodiments can be completed by instructing relevant hardware through a program, and the program may be stored in a computer-readable storage medium; when executed, the program performs one of the steps of the method embodiments or a combination thereof.
另外,在本申请各个实施例中的各功能单元可以集成在一个处理模块中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个模块中。上述集成的模块既可以采用硬件的形式实现,也可以采用软件功能模块的形式实现。所述集成的模块如果以软件功能模块的形式实现并作为独立的产品销售或使用时,也可以存储在一个计算机可读取存储介质中。In addition, the functional units in the embodiments of the present application may be integrated into one processing module, or each unit may exist alone physically, or two or more units may be integrated into one module. The above integrated modules can be implemented in the form of hardware or software function modules. If the integrated module is implemented in the form of a software functional module and sold or used as an independent product, it may also be stored in a computer-readable storage medium.
本领域内的技术人员应明白,本申请的实施例可提供为方法、系统、或计算机程序产品。因此,本申请可采用完全硬件实施例、完全软件实施例、或结合软件和硬件方面的实施例的形式。而且,本申请可采用在一个或多个其中包含有计算机可用程序代码的计算机可用存储介质(包括但不限于磁盘存储器和光学存储器等)上实施的计算机程序产品的形式。Those skilled in the art should understand that the embodiments of the present application may be provided as methods, systems, or computer program products. Therefore, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Moreover, the present application may take the form of a computer program product implemented on one or more computer usable storage media (including but not limited to disk storage and optical storage, etc.) containing computer usable program code.
本申请是参照根据本申请实施例的方法、设备（系统）、和计算机程序产品的流程图和/或方框图来描述的。应理解可由计算机程序指令实现流程图和/或方框图中的每一流程和/或方框、以及流程图和/或方框图中的流程和/或方框的结合。可提供这些计算机程序指令到通用计算机、专用计算机、嵌入式处理机或其他可编程数据处理设备的处理器以产生一个机器，使得通过计算机或其他可编程数据处理设备的处理器执行的指令产生用于实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能的装置。The present application is described with reference to flowcharts and/or block diagrams of the methods, devices (systems), and computer program products according to the embodiments of the present application. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks therein, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor, or other programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce an apparatus for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
这些计算机程序指令也可存储在能引导计算机或其他可编程数据处理设备以特定方式工作的计算机可读存储器中，使得存储在该计算机可读存储器中的指令产生包括指令装置的制造品，该指令装置实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能。These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a specific manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus, which implements the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
这些计算机程序指令也可装载到计算机或其他可编程数据处理设备上，使得在计算机或其他可编程设备上执行一系列操作步骤以产生计算机实现的处理，从而在计算机或其他可编程设备上执行的指令提供用于实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能的步骤。These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operation steps are performed on the computer or other programmable device to produce computer-implemented processing, and the instructions executed on the computer or other programmable device thereby provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
尽管已描述了本申请的上述实施例,但本领域内的技术人员一旦得知了基本创造性概念,则可对这些实施例做出另外的变更和修改。所以,所附权利要求意欲解释为包括上述实施例以及落入本申请范围的所有变更和修改。Although the above embodiments of the present application have been described, those skilled in the art can make additional changes and modifications to these embodiments once they learn the basic inventive concept. Therefore, the appended claims are intended to be construed as including the above embodiments and all changes and modifications falling within the scope of the present application.
显然,本领域的技术人员可以对本申请进行各种改动和变型而不脱离本申请的精神和范围。这样,倘若本申请的这些修改和变型属于本申请权利要求及其等同技术的范围之内,则本申请也意图包含这些改动和变型在内。Obviously, those skilled in the art can make various modifications and variations to this application without departing from the spirit and scope of this application. In this way, if these modifications and variations of the present application fall within the scope of the claims of the present application and equivalent technologies thereof, the present application is also intended to include these modifications and variations.

Claims (15)

  1. 一种深度神经网络的结构搜索方法,其特征在于,包括:A deep neural network structure search method, which is characterized by:
    在预先设置的搜索空间中获得深度神经网络中依次串接的每个模块中的每层计算单元结构;所述每层计算单元结构包括至少一个计算单元;Obtaining a structure of each layer of computing units in each module connected in series in the deep neural network in a preset search space; the structure of each layer of computing units includes at least one computing unit;
    在每个模块中采用预设连接方式将各计算单元进行连接，得到每个模块中的信息流；其中，处于同一层计算单元结构的计算单元之间不进行连接，每个计算单元能够与和其所在模块中的不同层的计算单元，以及其所在模块的输入和输出进行连接；connecting the computing units in each module in a preset connection manner to obtain information flows in each module, wherein computing units within the same layer of the computing unit structure are not connected to each other, and each computing unit can be connected to the computing units of the other layers in its module as well as to the input and output of its module;
    根据模块及每个模块中的计算单元的连接情况,得到初始神经网络;According to the module and the connection of the computing unit in each module, the initial neural network is obtained;
    对所述初始神经网络中的信息流设置稀疏缩放算子,其中所述稀疏缩放算子用于对所述信息流进行缩放;Setting a sparse scaling operator on the information flow in the initial neural network, wherein the sparse scaling operator is used to scale the information flow;
    采用预置的训练样本数据对所述初始神经网络的权重和信息流的稀疏缩放算子进行训练,得到中间神经网络;Using preset training sample data to train the weight of the initial neural network and the sparse scaling operator of the information flow to obtain an intermediate neural network;
    将所述中间神经网络中稀疏缩放算子为零的信息流删除,得到搜索空间内的搜索结果神经网络。The information flow in which the sparse scaling operator is zero in the intermediate neural network is deleted to obtain a search result neural network in the search space.
  2. 根据权利要求1所述的方法,其特征在于,所述每层计算单元结构的计算单元包括卷积计算单元和池化计算单元中的至少一种。The method according to claim 1, wherein the calculation unit of the calculation unit structure of each layer comprises at least one of a convolution calculation unit and a pooling calculation unit.
  3. 根据权利要求1所述的方法,其特征在于,在每个模块中采用预设连接方式将各计算单元进行连接,得到每个模块中的信息流,包括:The method according to claim 1, wherein each computing unit is connected in a preset connection manner in each module to obtain the information flow in each module, including:
    在每个模块中，将每个计算单元与和其所在模块中的不同层的计算单元，以及其所在模块的输入和输出进行连接；得到从模块的输入到每层计算单元结构、从每层计算单元结构到模块的输出，以及各计算单元之间的信息流。in each module, connecting each computing unit to the computing units of the other layers in its module as well as to the input and output of its module, so as to obtain the information flows from the module input to each layer of the computing unit structure, from each layer of the computing unit structure to the module output, and between the computing units.
  4. 根据权利要求1所述的方法,其特征在于,在根据模块及每个模块中的计算单元的连接情况,得到初始神经网络之后,还包括:The method according to claim 1, characterized in that after obtaining the initial neural network according to the connection status of the modules and the calculation units in each module, the method further comprises:
    对初始神经网络的权重进行配置,以初始化初始神经网络的权重。Configure the initial neural network weights to initialize the initial neural network weights.
  5. 根据权利要求1所述的方法,其特征在于,在根据模块及每个模块中的计算单元的连接情况,得到初始神经网络之后,还包括:The method according to claim 1, characterized in that after obtaining the initial neural network according to the connection status of the modules and the calculation units in each module, the method further comprises:
    采用预置的预训练样本数据对所述初始神经网络的权重进行预训练,得到预训练后的初始神经网络。The pre-trained sample data is used to pre-train the weights of the initial neural network to obtain the pre-trained initial neural network.
  6. 根据权利要求1所述的方法,其特征在于,在将所述中间神经网络中稀疏缩放算子为零的信息流删除之后,还包括:The method according to claim 1, characterized in that, after deleting the information stream whose sparse scaling operator is zero in the intermediate neural network, further comprising:
    在与一个计算单元的连接对应的信息流均被删除后,将该计算单元删除。After all the information streams corresponding to the connection of a computing unit are deleted, the computing unit is deleted.
  7. 根据权利要求1所述的方法,其特征在于,所述采用预置的训练样本数据对所述初始 神经网络的权重和信息流的稀疏缩放算子进行训练,得到中间神经网络,包括:The method according to claim 1, characterized in that, the preset training sample data is used to train the weights of the initial neural network and the sparse scaling operator of the information flow to obtain an intermediate neural network, including:
    构建初始神经网络对应的目标函数,所述目标函数包含损失函数、权重正则函数和稀疏正则函数;Construct an objective function corresponding to the initial neural network, the objective function including a loss function, a weight regular function and a sparse regular function;
    采用所述训练样本数据对所述初始神经网络进行迭代训练;Performing iterative training on the initial neural network using the training sample data;
    当迭代训练次数达到阈值或者所述目标函数满足预置的收敛条件时,得到所述中间神经网络。When the number of iteration training times reaches a threshold or the target function meets a preset convergence condition, the intermediate neural network is obtained.
  8. 根据权利要求7所述的方法,其特征在于,所述采用所述训练样本数据对所述初始神经网络进行迭代训练,具体包括:The method according to claim 7, wherein the iterative training of the initial neural network using the training sample data specifically includes:
    对所述初始神经网络进行多次以下的迭代训练:Perform the following iterative training for the initial neural network multiple times:
    将前一次迭代训练得到的稀疏缩放算子作为所述目标函数的常量，将所述权重作为所述目标函数的变量，采用第一优化算法对所述目标函数进行优化，得到本次迭代训练的权重；taking the sparse scaling operators obtained in the previous iteration of training as constants of the objective function and the weights as variables of the objective function, and optimizing the objective function with a first optimization algorithm to obtain the weights of the current iteration of training;
    将本次迭代训练的权重作为所述目标函数的常量，将稀疏缩放算子作为所述目标函数的变量，采用第二优化算法对所述目标函数进行优化，得到本次迭代训练的稀疏缩放算子；taking the weights of the current iteration of training as constants of the objective function and the sparse scaling operators as variables of the objective function, and optimizing the objective function with a second optimization algorithm to obtain the sparse scaling operators of the current iteration of training;
    基于本次迭代训练的权重和稀疏缩放算子进行下一次迭代训练。Based on the weights of this iterative training and the sparse scaling operator for the next iterative training.
  9. 根据权利要求8所述的方法，其特征在于，所述第二优化算法为加速近端梯度下降算法、近端梯度下降算法或者交替方向乘子算法。The method according to claim 8, wherein the second optimization algorithm is an accelerated proximal gradient descent algorithm, a proximal gradient descent algorithm, or an alternating direction method of multipliers.
  10. 根据权利要求7所述的方法，其特征在于，所述目标函数为：The method according to claim 7, wherein the objective function is:
    min_{W,λ} (1/K) Σ_{i=1}^{K} L( y_i, Net(x_i, W, λ) ) + δ·||W||_2^2 + γ·||λ||_1
    其中，W为权重，λ为稀疏缩放算子向量，K为样本数据的数量，L(y_i, Net(x_i, W, λ))为神经网络在样本数据x_i上的损失，y_i为样本标签，Net(x_i, W, λ)为神经网络的输出，δ·||W||_2^2为权重正则函数，δ为权重W的参数衰减权重，γ·||λ||_1为稀疏正则函数。wherein W denotes the weights, λ denotes the vector of sparse scaling operators, K denotes the number of sample data, L(y_i, Net(x_i, W, λ)) denotes the loss of the neural network on sample data x_i, y_i denotes the sample label, Net(x_i, W, λ) denotes the output of the neural network, δ·||W||_2^2 is the weight regularization term, δ is the weight-decay coefficient of the weights W, and γ·||λ||_1 is the sparsity regularization term.
  11. 一种目标检测方法,其特征在于,包括:A target detection method, characterized in that it includes:
    获得待进行目标检测的样本数据，输入到采用深度神经网络的结构搜索方法得到的搜索结果神经网络中，以所述搜索结果神经网络的输出作为目标检测结果；其中，所述深度神经网络的结构搜索方法包括：obtaining sample data on which target detection is to be performed, inputting the sample data into a search result neural network obtained by a structure search method for a deep neural network, and taking the output of the search result neural network as the target detection result, wherein the structure search method for the deep neural network comprises:
    在预先设置的搜索空间中获得深度神经网络中依次串接的每个模块中的每层计算单元结构;所述每层计算单元结构包括至少一个计算单元;Obtaining a structure of each layer of computing units in each module connected in series in the deep neural network in a preset search space; the structure of each layer of computing units includes at least one computing unit;
    在每个模块中采用预设连接方式将各计算单元进行连接，得到每个模块中的信息流；其中，处于同一层计算单元结构的计算单元之间不进行连接，每个计算单元能够与和其所在模块中的不同层的计算单元，以及其所在模块的输入和输出进行连接；connecting the computing units in each module in a preset connection manner to obtain information flows in each module, wherein computing units within the same layer of the computing unit structure are not connected to each other, and each computing unit can be connected to the computing units of the other layers in its module as well as to the input and output of its module;
    根据模块及每个模块中的计算单元的连接情况,得到初始神经网络;According to the module and the connection of the computing unit in each module, the initial neural network is obtained;
    对所述初始神经网络中的信息流设置稀疏缩放算子,其中所述稀疏缩放算子用于对所述信息流进行缩放;Setting a sparse scaling operator on the information flow in the initial neural network, wherein the sparse scaling operator is used to scale the information flow;
    采用预置的训练样本数据对所述初始神经网络的权重和信息流的稀疏缩放算子进行训练,得到中间神经网络;Using preset training sample data to train the weight of the initial neural network and the sparse scaling operator of the information flow to obtain an intermediate neural network;
    将所述中间神经网络中稀疏缩放算子为零的信息流删除,得到搜索空间内的搜索结果神经网络。The information flow in which the sparse scaling operator is zero in the intermediate neural network is deleted to obtain a search result neural network in the search space.
  12. 一种语义分割方法,其特征在于,包括:A semantic segmentation method, which includes:
    获得待进行语义分割的样本数据,输入到采用深度神经网络的结构搜索方法得到的搜索结果神经网络中,以所述搜索结果神经网络的输出作为语义分割结果;其中,所述深度神经网络的结构搜索方法包括:Obtain the sample data to be semantically segmented, and input it into the search result neural network obtained by the structural search method of the deep neural network, and use the output of the search result neural network as the semantic segmentation result; wherein, the structure of the deep neural network Search methods include:
    在预先设置的搜索空间中获得深度神经网络中依次串接的每个模块中的每层计算单元结构;所述每层计算单元结构包括至少一个计算单元;Obtaining a structure of each layer of computing units in each module connected in series in the deep neural network in a preset search space; the structure of each layer of computing units includes at least one computing unit;
    在每个模块中采用预设连接方式将各计算单元进行连接，得到每个模块中的信息流；其中，处于同一层计算单元结构的计算单元之间不进行连接，每个计算单元能够与和其所在模块中的不同层的计算单元，以及其所在模块的输入和输出进行连接；connecting the computing units in each module in a preset connection manner to obtain information flows in each module, wherein computing units within the same layer of the computing unit structure are not connected to each other, and each computing unit can be connected to the computing units of the other layers in its module as well as to the input and output of its module;
    根据模块及每个模块中的计算单元的连接情况,得到初始神经网络;According to the module and the connection of the computing unit in each module, the initial neural network is obtained;
    对所述初始神经网络中的信息流设置稀疏缩放算子,其中所述稀疏缩放算子用于对所述信息流进行缩放;Setting a sparse scaling operator on the information flow in the initial neural network, wherein the sparse scaling operator is used to scale the information flow;
    采用预置的训练样本数据对所述初始神经网络的权重和信息流的稀疏缩放算子进行训练,得到中间神经网络;Using preset training sample data to train the weight of the initial neural network and the sparse scaling operator of the information flow to obtain an intermediate neural network;
    将所述中间神经网络中稀疏缩放算子为零的信息流删除,得到搜索空间内的搜索结果神经网络。The information flow in which the sparse scaling operator is zero in the intermediate neural network is deleted to obtain a search result neural network in the search space.
  13. 一种深度神经网络的结构搜索装置,其特征在于,包括:A structure search device for deep neural network, which is characterized by comprising:
    计算单元结构获得单元,用于在预先设置的搜索空间中获得深度神经网络中依次串接的每个模块中的每层计算单元结构;所述每层计算单元结构包括至少一个计算单元;A computing unit structure obtaining unit, configured to obtain each layer of computing unit structures in each module connected in series in the deep neural network in a preset search space; each layer of computing unit structure includes at least one computing unit;
    信息流获得单元，用于在每个模块中采用预设连接方式将各计算单元进行连接，得到每个模块中的信息流；其中，处于同一层计算单元结构的计算单元之间不进行连接，每个计算单元能够与和其所在模块中的不同层的计算单元，以及其所在模块的输入和输出进行连接；an information flow obtaining unit, configured to connect the computing units in each module in a preset connection manner to obtain information flows in each module, wherein computing units within the same layer of the computing unit structure are not connected to each other, and each computing unit can be connected to the computing units of the other layers in its module as well as to the input and output of its module;
    初始神经网络获得单元,用于根据模块及每个模块中的计算单元的连接情况,得到初始神经网络;The initial neural network obtaining unit is used to obtain the initial neural network according to the connection of the modules and the computing units in each module;
    稀疏缩放算子设置单元,用于对所述初始神经网络中的信息流设置稀疏缩放算子,其中 所述稀疏缩放算子用于对所述信息流进行缩放;A sparse scaling operator setting unit, configured to set a sparse scaling operator on the information flow in the initial neural network, wherein the sparse scaling operator is used to scale the information flow;
    权重和算子训练单元,用于采用预置的训练样本数据对所述初始神经网络的权重和信息流的稀疏缩放算子进行训练,得到中间神经网络;A weight and operator training unit, used to train the weight of the initial neural network and the sparse scaling operator of the information flow using preset training sample data to obtain an intermediate neural network;
    搜索结果获得单元,用于将所述中间神经网络中稀疏缩放算子为零的信息流删除,得到搜索空间内的搜索结果神经网络。A search result obtaining unit is used to delete the information stream whose sparse scaling operator is zero in the intermediate neural network to obtain a search result neural network in the search space.
  14. 一种计算机可读存储介质,其上存储有计算机程序,其特征在于,该程序被处理器执行时实现深度神经网络的结构搜索方法,所述方法包括:A computer-readable storage medium on which a computer program is stored, characterized in that, when the program is executed by a processor, a structure search method for a deep neural network is implemented, the method includes:
    在预先设置的搜索空间中获得深度神经网络中依次串接的每个模块中的每层计算单元结构;所述每层计算单元结构包括至少一个计算单元;Obtaining a structure of each layer of computing units in each module connected in series in the deep neural network in a preset search space; the structure of each layer of computing units includes at least one computing unit;
    在每个模块中采用预设连接方式将各计算单元进行连接，得到每个模块中的信息流；其中，处于同一层计算单元结构的计算单元之间不进行连接，每个计算单元能够与和其所在模块中的不同层的计算单元，以及其所在模块的输入和输出进行连接；connecting the computing units in each module in a preset connection manner to obtain information flows in each module, wherein computing units within the same layer of the computing unit structure are not connected to each other, and each computing unit can be connected to the computing units of the other layers in its module as well as to the input and output of its module;
    根据模块及每个模块中的计算单元的连接情况,得到初始神经网络;According to the module and the connection of the computing unit in each module, the initial neural network is obtained;
    对所述初始神经网络中的信息流设置稀疏缩放算子,其中所述稀疏缩放算子用于对所述信息流进行缩放;Setting a sparse scaling operator on the information flow in the initial neural network, wherein the sparse scaling operator is used to scale the information flow;
    采用预置的训练样本数据对所述初始神经网络的权重和信息流的稀疏缩放算子进行训练,得到中间神经网络;Using preset training sample data to train the weight of the initial neural network and the sparse scaling operator of the information flow to obtain an intermediate neural network;
    将所述中间神经网络中稀疏缩放算子为零的信息流删除,得到搜索空间内的搜索结果神经网络。The information flow in which the sparse scaling operator is zero in the intermediate neural network is deleted to obtain a search result neural network in the search space.
  15. 一种计算机设备，包括存储器、处理器及存储在存储器上并可在处理器上运行的计算机程序，其特征在于，所述处理器执行所述程序时实现深度神经网络的结构搜索方法，所述方法包括：A computer device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements a structure search method for a deep neural network, the method comprising:
    在预先设置的搜索空间中获得深度神经网络中依次串接的每个模块中的每层计算单元结构;所述每层计算单元结构包括至少一个计算单元;Obtaining a structure of each layer of computing units in each module connected in series in the deep neural network in a preset search space; the structure of each layer of computing units includes at least one computing unit;
    在每个模块中采用预设连接方式将各计算单元进行连接，得到每个模块中的信息流；其中，处于同一层计算单元结构的计算单元之间不进行连接，每个计算单元能够与和其所在模块中的不同层的计算单元，以及其所在模块的输入和输出进行连接；connecting the computing units in each module in a preset connection manner to obtain information flows in each module, wherein computing units within the same layer of the computing unit structure are not connected to each other, and each computing unit can be connected to the computing units of the other layers in its module as well as to the input and output of its module;
    根据模块及每个模块中的计算单元的连接情况,得到初始神经网络;According to the module and the connection of the computing unit in each module, the initial neural network is obtained;
    对所述初始神经网络中的信息流设置稀疏缩放算子,其中所述稀疏缩放算子用于对所述信息流进行缩放;Setting a sparse scaling operator on the information flow in the initial neural network, wherein the sparse scaling operator is used to scale the information flow;
    采用预置的训练样本数据对所述初始神经网络的权重和信息流的稀疏缩放算子进行训练,得到中间神经网络;Using preset training sample data to train the weight of the initial neural network and the sparse scaling operator of the information flow to obtain an intermediate neural network;
    将所述中间神经网络中稀疏缩放算子为零的信息流删除,得到搜索空间内的搜索结果神经网络。The information flow in which the sparse scaling operator is zero in the intermediate neural network is deleted to obtain a search result neural network in the search space.
PCT/CN2019/077049 2018-10-26 2019-03-05 Structural search method and apparatus for deep neural network WO2020082663A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201811259033.2A CN109284820A (en) 2018-10-26 2018-10-26 A kind of search structure method and device of deep neural network
CN201811259033.2 2018-10-26

Publications (1)

Publication Number Publication Date
WO2020082663A1 true WO2020082663A1 (en) 2020-04-30

Family

ID=65177420

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/077049 WO2020082663A1 (en) 2018-10-26 2019-03-05 Structural search method and apparatus for deep neural network

Country Status (2)

Country Link
CN (2) CN109284820A (en)
WO (1) WO2020082663A1 (en)

Families Citing this family (26)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109284820A (en) * 2018-10-26 2019-01-29 北京图森未来科技有限公司 A kind of search structure method and device of deep neural network
CN109919304B (en) * 2019-03-04 2021-07-02 腾讯科技(深圳)有限公司 Image processing method, image processing device, readable storage medium and computer equipment
CN109948795B (en) * 2019-03-11 2021-12-14 驭势科技(北京)有限公司 Method and device for determining network structure precision and delay optimization point
CN109978142B (en) * 2019-03-29 2022-11-29 腾讯科技(深圳)有限公司 Neural network model compression method and device
CN110276442B (en) * 2019-05-24 2022-05-17 西安电子科技大学 Searching method and device of neural network architecture
CN110197258B (en) * 2019-05-29 2021-10-29 北京市商汤科技开发有限公司 Neural network searching method, image processing device, neural network searching apparatus, image processing apparatus, and recording medium
WO2020237688A1 (en) * 2019-05-31 2020-12-03 深圳市大疆创新科技有限公司 Method and device for searching network structure, computer storage medium and computer program product
WO2020237687A1 (en) * 2019-05-31 2020-12-03 深圳市大疆创新科技有限公司 Network architecture search method and apparatus, computer storage medium and computer program product
CN112215332B (en) * 2019-07-12 2024-05-14 华为技术有限公司 Searching method, image processing method and device for neural network structure
CN112308200B (en) * 2019-07-30 2024-04-26 华为技术有限公司 Searching method and device for neural network
CN110473195B (en) * 2019-08-13 2023-04-18 中山大学 Medical focus detection framework and method capable of being customized automatically
CN110490323A (en) * 2019-08-20 2019-11-22 腾讯科技(深圳)有限公司 Network model compression method, device, storage medium and computer equipment
CN110428046B (en) * 2019-08-28 2023-12-15 腾讯科技(深圳)有限公司 Method and device for acquiring neural network structure and storage medium
WO2021057690A1 (en) * 2019-09-24 2021-04-01 华为技术有限公司 Neural network building method and device, and image processing method and device
CN110751267B (en) * 2019-09-30 2021-03-30 京东城市(北京)数字科技有限公司 Neural network structure searching method, training method, device and storage medium
CN110826696B (en) * 2019-10-30 2023-06-27 北京百度网讯科技有限公司 Super-network search space construction method and device and electronic equipment
CN111160515B (en) * 2019-12-09 2023-03-21 中山大学 Running time prediction method, model search method and system
CN111191785B (en) * 2019-12-20 2023-06-23 沈阳雅译网络技术有限公司 Structure searching method based on expansion search space for named entity recognition
CN111090673B (en) * 2019-12-20 2023-04-18 北京百度网讯科技有限公司 Cache unit searching method and related equipment
CN111401516B (en) * 2020-02-21 2024-04-26 华为云计算技术有限公司 Searching method for neural network channel parameters and related equipment
CN113361680B (en) * 2020-03-05 2024-04-12 华为云计算技术有限公司 Neural network architecture searching method, device, equipment and medium
CN111797983A (en) * 2020-05-25 2020-10-20 华为技术有限公司 Neural network construction method and device
CN111667057B (en) * 2020-06-05 2023-10-20 北京百度网讯科技有限公司 Method and apparatus for searching model structures
CN111714124B (en) * 2020-06-18 2023-11-03 中国科学院深圳先进技术研究院 Magnetic resonance film imaging method, device, imaging equipment and storage medium
CN111767985B (en) * 2020-06-19 2022-07-22 深圳市商汤科技有限公司 Neural network training method, video identification method and device
CN113379034B (en) * 2021-06-15 2023-10-20 南京大学 Neural network structure optimization method based on network structure search technology

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106372721A (en) * 2016-08-29 2017-02-01 中国传媒大学 Large-scale nerve network 3D visualization method
CN107247991A (en) * 2017-06-15 2017-10-13 北京图森未来科技有限公司 A kind of method and device for building neutral net
CN107316079A (en) * 2017-08-08 2017-11-03 珠海习悦信息技术有限公司 Processing method, device, storage medium and the processor of terminal convolutional neural networks
CN107480774A (en) * 2017-08-11 2017-12-15 山东师范大学 Dynamic neural network model training method and device based on integrated study
CN109284820A (en) * 2018-10-26 2019-01-29 北京图森未来科技有限公司 A kind of search structure method and device of deep neural network

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113743168A (en) * 2020-05-29 2021-12-03 北京机械设备研究所 Urban flyer identification method based on micro-depth neural network search
CN113743168B (en) * 2020-05-29 2023-10-13 北京机械设备研究所 Urban flyer identification method based on micro-depth neural network search
CN111738418A (en) * 2020-06-19 2020-10-02 北京百度网讯科技有限公司 Training method and device for hyper network
CN111753964A (en) * 2020-06-29 2020-10-09 北京百度网讯科技有限公司 Neural network training method and device
CN112100466A (en) * 2020-09-25 2020-12-18 北京百度网讯科技有限公司 Method, device and equipment for generating search space and storage medium
CN112528123A (en) * 2020-12-18 2021-03-19 北京百度网讯科技有限公司 Model searching method, model searching apparatus, electronic device, storage medium, and program product
CN112560985A (en) * 2020-12-25 2021-03-26 北京百度网讯科技有限公司 Neural network searching method and device and electronic equipment
CN112560985B (en) * 2020-12-25 2024-01-12 北京百度网讯科技有限公司 Neural network searching method and device and electronic equipment
CN112668702A (en) * 2021-01-15 2021-04-16 北京格灵深瞳信息技术股份有限公司 Fixed-point parameter optimization method, system, terminal and storage medium
CN112668702B (en) * 2021-01-15 2023-09-19 北京格灵深瞳信息技术股份有限公司 Fixed-point parameter optimization method, system, terminal and storage medium
CN112966812A (en) * 2021-02-25 2021-06-15 中国人民解放军战略支援部队航天工程大学 Automatic neural network structure searching method for communication signal modulation recognition
CN113326922A (en) * 2021-05-31 2021-08-31 北京市商汤科技开发有限公司 Neural network generation method and device, electronic equipment and storage medium
CN113326922B (en) * 2021-05-31 2023-06-13 北京市商汤科技开发有限公司 Neural network generation method and device, electronic equipment and storage medium
CN113469010A (en) * 2021-06-25 2021-10-01 中国科学技术大学 NOx concentration real-time estimation method based on diesel vehicle black smoke image and storage medium
CN113469010B (en) * 2021-06-25 2024-04-02 中国科学技术大学 NOx concentration real-time estimation method based on black smoke image of diesel vehicle and storage medium

Also Published As

Publication number Publication date
CN109284820A (en) 2019-01-29
CN110717586A (en) 2020-01-21

Similar Documents

Publication Publication Date Title
WO2020082663A1 (en) Structural search method and apparatus for deep neural network
WO2018227800A1 (en) Neural network training method and device
Goodfellow et al. Deep feedforward networks
US11966837B2 (en) Compression of deep neural networks
CN104346629B (en) A kind of model parameter training method, apparatus and system
US20180322383A1 (en) Storage controller accelaration for neural network training and inference
CN112231489B (en) Knowledge learning and transferring method and system for epidemic prevention robot
CN109120462A (en) Prediction technique, device and the readable storage medium storing program for executing of opportunistic network link
US11416743B2 (en) Swarm fair deep reinforcement learning
WO2018227801A1 (en) Method and device for building neural network
CN111602148A (en) Regularized neural network architecture search
CN106953862A (en) The cognitive method and device and sensor model training method and device of network safety situation
JP2017211799A (en) Information processing device and information processing method
CN106326346A (en) Text classification method and terminal device
Ettaouil et al. Architecture optimization model for the multilayer perceptron and clustering.
US20230222325A1 (en) Binary neural network model training method and system, and image processing method and system
US11853896B2 (en) Neural network model, method, electronic device, and readable medium
JP2020027399A (en) Computer system
CN113077237B (en) Course arrangement method and system for self-adaptive hybrid algorithm
CN115809340A (en) Entity updating method and system of knowledge graph
Cheng et al. Swiftnet: Using graph propagation as meta-knowledge to search highly representative neural architectures
US11488007B2 (en) Building of custom convolution filter for a neural network using an automated evolutionary process
Duggal et al. High Performance SqueezeNext for CIFAR-10
JP6993250B2 (en) Content feature extractor, method, and program
Zhan et al. Relationship explainable multi-objective reinforcement learning with semantic explainability generation

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19876020

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 160821)

122 Ep: pct application non-entry in european phase

Ref document number: 19876020

Country of ref document: EP

Kind code of ref document: A1