CN111062465A - Image recognition model and method with neural network structure self-adjusting function - Google Patents

Image recognition model and method with neural network structure self-adjusting function

Info

Publication number
CN111062465A
CN111062465A
Authority
CN
China
Prior art keywords
neural network
model
network structure
training
search space
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911259716.2A
Other languages
Chinese (zh)
Inventor
陈荣聪
林倞
王广润
王广聪
张吉祺
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sun Yat Sen University
National Sun Yat Sen University
Original Assignee
National Sun Yat Sen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by National Sun Yat Sen University filed Critical National Sun Yat Sen University
Priority to CN201911259716.2A priority Critical patent/CN111062465A/en
Publication of CN111062465A publication Critical patent/CN111062465A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Abstract

The invention discloses an image recognition model and method with a neural network structure self-adjusting function. The model comprises: a pre-training model generating unit for constructing a neural network model structure and pre-training it on a source domain based on standard transfer learning to obtain a pre-trained model; a search space design unit for designing a search space of the neural network model structure such that the selected neural network model structure can be regarded as one instance in the search space; a joint fine-tuning unit for jointly fine-tuning the network parameters and the network structure in the search space, starting from the obtained pre-trained model, and obtaining a target neural network structure after optimization; and a network parameter fine-tuning unit for further fine-tuning the network parameters of the target neural network structure obtained by the joint fine-tuning unit.

Description

Image recognition model and method with neural network structure self-adjusting function
Technical Field
The invention relates to the technical field of computer vision based on deep learning, in particular to an image recognition model and method with a neural network structure self-adjusting function.
Background
There is a large body of evidence that, in deep learning, pre-trained features can be transferred across tasks, i.e., transfer learning. In the 1980s, Hinton introduced transfer learning into deep learning, especially unsupervised learning. In 2012, this technology began to attract the attention of academia when ImageNet was first introduced to the computer vision community. In tasks such as image recognition, object detection, semantic segmentation, video recognition, pedestrian re-identification and pedestrian attribute recognition, better results can be obtained by performing transfer learning from an ImageNet pre-trained model. Transferring to other data domains after ImageNet pre-training can not only improve the performance of the target task, but also accelerate the learning process and shorten the training time. Standard transfer learning is applied not only in computer vision, but also in other fields, such as natural language processing (NLP).
On the other hand, automating the human process of designing machine learning algorithms has attracted increasing interest. In particular, neural network structure adjustment is expected to reduce the time spent by human experts on neural network architecture design. However, there is an unresolved problem in neural network structure adjustment, namely how to search such a model space efficiently. The most accurate and reliable solution is to train every candidate architecture in the search space and compare their performance, taking the best-performing neural network structure as the final one. However, this approach is very time-consuming because the search space is typically large (e.g., more than 1e20 candidates). To address this problem, many researchers have explored training candidate architectures with reinforcement learning (RL) or evolutionary learning to guide the search direction. For example, in RL-based neural network structure adjustment, only the most promising candidate neural network structures with the greatest reward are trained, because the target neural network structure is assumed to be among them. These neural network structure adjustment algorithms achieve significant performance; however, they still require a large amount of computation. For example, to obtain a state-of-the-art neural network structure on CIFAR-10, reinforcement learning requires 1800 GPU days, while evolutionary learning requires 3150 GPU days. This indicates that training the candidate neural network structures even in a search subspace (e.g., 1 million neural network structures) is still impractical, because training just one neural network structure often takes a long time (e.g., training ResNet on ImageNet takes more than 10 GPU days).
At present, most standard image recognition systems follow this framework: (a) pre-train a neural network on a large-scale dataset (e.g., ImageNet); (b) fine-tune the network parameters on a smaller, task-specific dataset. This transfer learning process is intended to transfer the recognition capability of the network from one data domain to another through parameter adaptation, but it is based on the assumption that a fixed neural network structure is suitable for all data domains. However, data domains with different recognition targets may require different feature hierarchies, in which some neurons may become redundant while others are reactivated to form new network structures.
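By way of illustration only, the standard pre-train-then-fine-tune flow described above may be sketched in PyTorch as follows; the number of target classes and the optimizer hyperparameters are assumptions introduced for this sketch and are not values prescribed by the invention.

```python
import torch
import torch.nn as nn
from torchvision import models

# Step (a): start from a network pre-trained on a large-scale source dataset (ImageNet).
model = models.resnet50(pretrained=True)

# Step (b): adapt the classifier head to the target task and fine-tune the parameters.
# The network structure itself stays fixed; only the weights W are adjusted.
num_target_classes = 10  # illustrative: depends on the target dataset
model.fc = nn.Linear(model.fc.in_features, num_target_classes)

optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9, weight_decay=1e-4)
criterion = nn.CrossEntropyLoss()

def finetune_step(images, labels):
    """One fine-tuning step on the target domain (weights only, structure fixed)."""
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```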
Disclosure of Invention
In order to overcome the above defects in the prior art, the invention aims to provide an image recognition model and method with a self-adjusting neural network structure, which combine the transfer learning technology with the neural network structure adjustment technology, utilize the features learned by pre-training in transfer learning, adaptively adjust the neural network structure for different tasks and data, and jointly optimize the neural network structure and the neural network parameters so as to improve the image recognition performance of the model.
To achieve the above object, the present invention provides an image recognition model with a neural network structure self-adjusting, comprising:
the pre-training model generating unit is used for constructing a neural network model structure and pre-training on a source domain based on standard transfer learning to obtain a pre-training model;
a search space design unit for designing a search space of the neural network model structure such that the selected neural network model structure can be regarded as one instance in the search space;
the combined fine tuning unit is used for combining and fine tuning network parameters and a network structure in a search space from the obtained pre-training model, and obtaining a target neural network structure after optimization;
and the network parameter fine tuning unit is used for further fine tuning the network parameters of the target neural network structure obtained by the combined fine tuning unit.
Preferably, in the pre-training model generation unit, after the neural network model structure is constructed, a network model α_0 is given and trained with the source data set ImageNet to obtain the model parameters W_0 of the pre-trained network.
Preferably, the neural network model structure adopts a ResNet50 neural network structure.
Preferably, the pre-training model is expressed as:
Φ_{W_0,α_0}(X) = C_K^{W_K} ∘ C_{K-1}^{W_{K-1}} ∘ … ∘ C_1^{W_1}(X)
where Φ(·) represents the nonlinear function of the neural network, X is the input of the neural network, W = {W_1, W_2, …, W_i, …, W_{K-1}, W_K} denotes the parameters of the neural network, K denotes the depth of the neural network, α_0 is the given network model, C_i^{W_i} denotes the convolution operation with W_i as its convolution kernel, and ∘ is the operation composition symbol.
Preferably, in the pre-training model generation unit, standard transfer learning is formulated as:
W* = arg min_W L(Φ_{W,α_0}(X_t), Y_t) + λ‖W‖²
where W* represents the optimal network parameters for the given network structure α_0, and X_t and Y_t denote the target-domain images and labels.
Preferably, the search space design unit expands the neural network structure selected in the pre-training model generation unit to a larger neural network space.
Preferably, the search space is represented as:
A = {O_1, O_2, …, O_i, …, O_{K-1}, O_K}
where O_i (1 ≤ i ≤ K) represents the set of candidate operations of the i-th layer.
Preferably, the joint fine-tuning unit obtains the discrete target neural network structure α* by using a soft selection method on the basis of the joint fine-tuning.
Preferably, the image recognition model is trained on an ImageNet data set, and the obtained trained image recognition model realizes the classification of the input images.
In order to achieve the above object, the present invention further provides an image recognition method for self-adjusting a neural network structure, comprising the following steps:
step S1, constructing a neural network model structure, and pre-training on a source domain based on standard transfer learning to obtain a pre-training model;
step S2, designing a search space of the neural network structure, so that the selected neural network structure can be regarded as an example in the search space;
step S3, starting from the obtained pre-training model, combining and fine-tuning network parameters and a network structure in a search space, and obtaining a target neural network structure after optimization;
step S4 is to perform fine adjustment of network parameters in the target domain for the target neural network structure obtained in step S3.
Compared with the prior art, the image recognition model and method with neural network structure self-adjustment of the present invention combine the transfer learning technology with the neural network structure adjustment technology, utilize the representations learned by pre-training in transfer learning, adaptively adjust the neural network structure for different tasks and data, and jointly optimize the neural network structure and the neural network parameters to improve the image recognition performance of the model. The invention thereby realizes an image recognition framework with neural network structure self-adjustment based on transfer learning, and obtains better performance by simultaneously exploiting the capabilities of standard transfer learning and network structure adjustment.
Drawings
FIG. 1 is a system architecture diagram of an image recognition model with a self-adjusting neural network architecture according to the present invention;
FIG. 2 is a diagram illustrating the relationship among the initial network structure α_0, the target network structure α* and the search space A according to an embodiment of the present invention;
FIG. 3 is a graph illustrating the comparison of the results of the present invention with standard transfer learning;
fig. 4 is a system architecture diagram of an image recognition method with a self-adjusting neural network structure according to the present invention.
Detailed Description
Other advantages and capabilities of the present invention will be readily apparent to those skilled in the art from the present disclosure by describing the embodiments of the present invention with specific embodiments thereof in conjunction with the accompanying drawings. The invention is capable of other and different embodiments and its several details are capable of modification in various other respects, all without departing from the spirit and scope of the present invention.
FIG. 1 is a system architecture diagram of an image recognition model with a self-adjusting neural network structure according to the present invention. As shown in fig. 1, the image recognition model with a neural network structure self-adjusting according to the present invention includes:
and the pre-training model generating unit 101 is configured to construct a neural network model structure, such as ResNet50, and pre-train the neural network model structure on the source domain based on standard transfer learning to obtain a pre-training model.
In an embodiment of the present invention, the mathematical representation of the constructed neural network model is as follows:
Φ_W(X) = ψ(W_K ⊗ ψ(W_{K-1} ⊗ … ψ(W_1 ⊗ X)))   (1)
It can be seen that the neural network is a complex nonlinear function Φ(·), X is the input of the neural network, i.e. the preprocessed input image, W = {W_1, W_2, …, W_i, …, W_{K-1}, W_K} represents the parameters of the neural network, K represents the depth of the neural network, ⊗ represents a convolution operation (e.g. ordinary convolution or dilated convolution), and ψ(·) represents a nonlinear activation function (e.g. ReLU). Equation (1) can also be expressed as:
Φ_{W,α}(X) = C_K^{W_K} ∘ C_{K-1}^{W_{K-1}} ∘ … ∘ C_1^{W_1}(X)   (2)
where C_i^{W_i} denotes the convolution operation with W_i as its convolution kernel, ∘ is the operation composition symbol, and α denotes the network structure. Abbreviating C_i^{W_i} as C_i, the network structure can be simplified as:
α = C_K ∘ C_{K-1} ∘ … ∘ C_1   (3)
The deep learning problem can then be expressed as the following optimization problem:
min_W L(Φ_{W,α}(X), Y) + λ‖W‖²   (4)
where L(·,·) represents the loss function and λ‖W‖² is the regularization term, with the hyperparameter λ typically set to 1e-4. To simplify the expressions, commonly used components of the network structure (e.g. batch normalization) are not written out in the formulas; by default, batch normalization is used for these operations.
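As an illustrative sketch of this formulation (and not of the actual ResNet50 used in the embodiments), a network of the form Φ_{W,α}(X) = C_K ∘ … ∘ C_1(X) with the regularized objective (4) might be written in PyTorch as follows, with λ‖W‖² realized as weight decay; the layer choices and class name ComposedNet are assumptions of this sketch.

```python
import torch
import torch.nn as nn

class ComposedNet(nn.Module):
    """Phi_{W,alpha}(X) = C_K o ... o C_1(X): the structure alpha is the list of operations."""
    def __init__(self, operations):
        super().__init__()
        self.ops = nn.ModuleList(operations)  # [C_1, C_2, ..., C_K]

    def forward(self, x):
        for op in self.ops:  # C_1 is applied first, C_K last
            x = op(x)
        return x

# Illustrative 3-layer instance of alpha (not the actual structure of the patent).
alpha = [
    nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.BatchNorm2d(16), nn.ReLU()),
    nn.Sequential(nn.Conv2d(16, 32, 3, padding=1), nn.BatchNorm2d(32), nn.ReLU()),
    nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 10)),
]
net = ComposedNet(alpha)

# min_W L(Phi_{W,alpha}(X), Y) + lambda * ||W||^2, with lambda = 1e-4 as weight decay.
optimizer = torch.optim.SGD(net.parameters(), lr=0.1, weight_decay=1e-4)
loss_fn = nn.CrossEntropyLoss()
```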
In the pre-training model generation unit 101, after the neural network model structure (e.g., ResNet50) is constructed, a network model α_0 is given (if ResNet50 is used, ResNet50 is α_0). Pre-training obtains the model parameters W_0 of the pre-trained network through training on the source data set ImageNet, so the pre-trained model can be expressed as:
Φ_{W_0,α_0}(X) = C_K^{W_K} ∘ C_{K-1}^{W_{K-1}} ∘ … ∘ C_1^{W_1}(X)   (5)
In computer vision, ImageNet has been proven to be useful for many other tasks and is considered the most commonly used source data set; therefore, the present invention adopts ImageNet as the source data set for pre-training.
In standard transfer learning, the network parameters are adjustable when migrating to the target task, while the network structure is fixed; the goal is to find the optimal network parameters W* for the given network structure α_0:
W* = arg min_W L(Φ_{W,α_0}(X_t), Y_t) + λ‖W‖²   (6)
where X_t and Y_t denote the images and labels of the target domain. Standard transfer learning can therefore be formalized as:
(α_0, W_0) → (α_0, W*)   (7)
where the network structure α_0 is unchanged before and after the optimization, i.e. the network structure does not change.
A search space design unit 102 for designing a search space of the neural network structure such that the selected neural network structure can be regarded as one instance in the search space.
In the embodiment of the present invention, the search space of the neural network structure is designed such that the selected neural network structure can be regarded as an example in the search space by expanding the neural network structure selected by the pre-training model generating unit 101 to a larger neural network space.
The invention aims, first, to utilize the transferable property of the initial network structure α_0 and, second, to explore network structures in the neighborhood of α_0. To this end, the invention designs a search space A of network structures such that the initial network structure α_0 can be regarded as one instance in the search space A (i.e., α_0 ∈ A). FIG. 2 shows the relationship among the initial network structure α_0, the target network structure α* and the search space A. According to the definition of the search space, the transfer-learning-based image recognition model with neural network structure self-adjustment of the invention can be further expressed as:
(W*, α*) = arg min_{W, α∈A} L(Φ_{W,α}(X_t), Y_t) + λ‖W‖²   (8)
where O_i (1 ≤ i ≤ K) denotes the set of candidate operations of the i-th layer. The search for the network structure can then be simplified to choosing an appropriate operation for each layer, and the search space can be expressed as:
A = {O_1, O_2, …, O_i, …, O_{K-1}, O_K}   (9)
To ensure α_0 ∈ A, it is required that
C_i^{W_i} ∈ O_i, 1 ≤ i ≤ K.
in particular, three types of candidate operations are exemplified below:
convolution: convolution with 5 × 5,3 × 3,1 × 1, respectively
Figure BDA0002311282750000072
To represent
Pooling: 3X 3 maximum pooling, 3X 3 average pooling, Global average pooling, individually
Figure BDA0002311282750000073
To represent
And others: equivalent transformation, noise disturbance operation, respectively
Figure BDA0002311282750000074
To represent
Wherein the noise perturbation operation is to add Gaussian noise to the input of the operation, i represents the ith layer,
It should be noted that the above eight candidate operations are only specific examples; more operations, such as a 7×7 convolution or a dilated convolution, can be specified in the design of the search space, which can be changed flexibly. A possible rendering of one layer's candidate operation set is sketched after this paragraph.
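The sketch below is a possible PyTorch rendering of one layer's candidate operation set O_i; the concrete module choices, the helper names candidate_ops and GaussianNoise, and the omission of global average pooling (which would change the spatial size) are assumptions made so that all candidates remain interchangeable in shape.

```python
import torch
import torch.nn as nn

class GaussianNoise(nn.Module):
    """Noise perturbation: add Gaussian noise to the input of the operation."""
    def __init__(self, std=0.1):
        super().__init__()
        self.std = std

    def forward(self, x):
        return x + self.std * torch.randn_like(x)

def candidate_ops(channels):
    """Candidate operation set O_i for one layer, following the examples above.
    All operations keep the spatial size and channel count so they are interchangeable.
    Global average pooling is omitted here only for shape compatibility of the sketch."""
    return nn.ModuleDict({
        "conv5x5": nn.Conv2d(channels, channels, 5, padding=2),
        "conv3x3": nn.Conv2d(channels, channels, 3, padding=1),
        "conv1x1": nn.Conv2d(channels, channels, 1),
        "maxpool3x3": nn.MaxPool2d(3, stride=1, padding=1),
        "avgpool3x3": nn.AvgPool2d(3, stride=1, padding=1),
        "identity": nn.Identity(),
        "noise": GaussianNoise(),
    })
```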
And the joint fine-tuning unit 103 is used for joint fine-tuning the network parameters and the network structure in the search space from the obtained pre-training model, and obtaining the target neural network structure after optimization.
In the present invention, starting from the obtained pre-trained model, the network parameters and the network structure are jointly fine-tuned in the search space. Unlike standard transfer learning, the invention adjusts the network parameters and the network structure in the search space at the same time, which can be expressed as:
(W*, α*) = arg min_{W, α} L(Φ_{W,α}(X_t), Y_t) + λ‖W‖²
where α ∈ A and the network parameters W are initialized with the pre-trained parameters W_0. In view of (9), the problem can be further simplified to choosing, for each layer, an operation C_i from its candidate set O_i:
α = C_K ∘ C_{K-1} ∘ … ∘ C_1, C_i ∈ O_i
The process of adjusting from the initial network structure α_0 to the target neural network structure α* can then be expressed as:
(α_0, W_0) → (α*, W*)
preferably, the joint fine tuning unit 103 further obtains the discrete target neural network structure α by using a soft selection method based on the joint fine tuning*
According to the above equation (8), finding the optimal neural network structure is simplified to selecting an appropriate operation from the candidate operation set of each layer. However, selecting an operation from a candidate set is a discrete choice and is not differentiable, so it cannot be optimized directly with the gradient-based methods used for DNNs. To address this problem, the present invention relaxes the hard selection problem into a soft selection problem. Specifically, each candidate operation in O_i is associated with a confidence value P ∈ [0,1], where P = 1 indicates that the corresponding operation is definitely taken. Assuming that P can be learned in a data-driven manner, for example, for the i-th layer (1 ≤ i ≤ K), the probability of selecting the 3×3 convolution is defined as:
p_i^{3×3} = exp(θ_i^{3×3}) / Σ_{o∈O_i} exp(θ_i^{o})
where θ_i^{3×3} is a learnable parameter that measures the probability that the i-th layer selects the 3×3 convolution; the probabilities of selecting the other candidate operations are defined in the same way. The neural network during the search can then be expressed as:
Φ_{W,θ}(X) = Ĉ_K ∘ Ĉ_{K-1} ∘ … ∘ Ĉ_1(X)
where Ĉ_i represents the weighted sum of the candidate operations of the i-th layer, namely:
Ĉ_i(X) = Σ_{o∈O_i} p_i^{o} · C_i^{o}(X)
Since the operation C_i^{W_i} of the initial network structure has already been pre-trained, it has a greater capacity at the start of the search than the other candidate operations; therefore the θ corresponding to C_i^{W_i} is initialized to 1 and the θ of the other operations is initialized to 0. The search then becomes:
(W*, θ*) = arg min_{W,θ} L(Φ_{W,θ}(X_t), Y_t) + λ‖W‖²
where Φ_{W,θ}(X) is:
Φ_{W,θ}(X) = Ĉ_K^{W_K,θ_K} ∘ Ĉ_{K-1}^{W_{K-1},θ_{K-1}} ∘ … ∘ Ĉ_1^{W_1,θ_1}(X)
where Ĉ_i^{W_i,θ_i} denotes the weighted-sum operation Ĉ_i with parameters W_i and θ_i.
The invention also adds an operation regularization term R(θ), which penalizes the probability p_i^{W_i} assigned to the operation C_i^{W_i} of the initial network structure, so as to suppress the operations of the initial network structure and encourage new network operations; in addition, the identity transformation and the noise perturbation operation can reduce the network complexity. Finally, the image recognition framework with neural network structure self-adjustment based on transfer learning can be expressed as:
(W*, θ*) = arg min_{W,θ} L(Φ_{W,θ}(X_t), Y_t) + λ‖W‖² + γR(θ)
where γ is the weight of the operation regularization term.
in a specific embodiment of the present invention, a random gradient descent method may be used to solve this problem.
In the present invention, obtaining the target network structure α* is reduced to selecting an operation C_i from the candidate operation set O_i of each layer. Through the search, θ is obtained, which measures the probability that each operation of the i-th layer is selected. Therefore, the optimal operation is the candidate operation with the highest probability:
C_i* = arg max_{o∈O_i} p_i^{o}
after the optimal operation is selected, the target neural network structure α is obtained*
Fig. 3 is a schematic diagram comparing the present invention with standard transfer learning. Part (a) represents standard transfer learning: after pre-training on the source domain, the model is transferred to the target domains target1, target2 and target3; only the weights of the network are changed, the network structure is not changed, and the network structure of the source domain and of each target domain is the same. Part (b) represents the present invention: after pre-training on the source domain, when transferring to the target domains target1, target2 and target3, the invention changes not only the weights but also the network structure (including the connection mode and the operations).
The network parameter fine-tuning unit 104 is used for performing fine-tuning of the network parameters in the target domain on the target neural network structure α* obtained by the joint fine-tuning unit 103, i.e. the weights W are adjusted on the basis of the network structure and the weights obtained in the previous step, and training is continued.
After the image recognition model with the self-adjusting neural network structure is trained, an image to be recognized can be input into the image recognition model for image processing to obtain an image recognition result; for example, if the image recognition model is trained on the ImageNet data set, it can classify the input images.
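For completeness, a minimal inference sketch is given below; the preprocessing uses the standard ImageNet statistics, which is an assumption of this sketch rather than a requirement of the invention.

```python
import torch
from torchvision import transforms
from PIL import Image

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def recognize(model, image_path):
    """Classify a single image with the trained self-adjusted recognition model."""
    model.eval()
    x = preprocess(Image.open(image_path).convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        logits = model(x)
    return int(logits.argmax(dim=1))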
Fig. 4 is a system architecture diagram of an image recognition method with a self-adjusting neural network structure according to the present invention. As shown in fig. 4, the image recognition method for neural network structure self-adjustment of the present invention includes the following steps:
and step S1, constructing a neural network model structure such as ResNet50, and pre-training on a source domain based on standard transfer learning to obtain a pre-training model.
In an embodiment of the present invention, the mathematical representation of the constructed neural network model is as follows:
Φ_W(X) = ψ(W_K ⊗ ψ(W_{K-1} ⊗ … ψ(W_1 ⊗ X)))   (1)
It can be seen that the neural network is a complex nonlinear function Φ(·), X is the input of the neural network, W = {W_1, W_2, …, W_i, …, W_{K-1}, W_K} represents the parameters of the neural network, K represents the depth of the neural network, ⊗ represents a convolution operation (e.g. ordinary convolution or dilated convolution), and ψ(·) represents a nonlinear activation function (e.g. ReLU). Equation (1) can also be expressed as:
Φ_{W,α}(X) = C_K^{W_K} ∘ C_{K-1}^{W_{K-1}} ∘ … ∘ C_1^{W_1}(X)   (2)
where C_i^{W_i} denotes the convolution operation with W_i as its convolution kernel, ∘ is the operation composition symbol, and α denotes the network structure. Abbreviating C_i^{W_i} as C_i, the network structure can be simplified as:
α = C_K ∘ C_{K-1} ∘ … ∘ C_1   (3)
The deep learning problem can then be expressed as the following optimization problem:
min_W L(Φ_{W,α}(X), Y) + λ‖W‖²   (4)
where L(·,·) represents the loss function and λ‖W‖² is the regularization term; λ is a hyperparameter, typically set to 1e-4. Batch normalization is hidden in the formulas for a simplified representation.
In step S1, after the neural network model structure (e.g., ResNet50) is constructed, a network model α_0 is given. Pre-training obtains the model parameters W_0 of the pre-trained network through training on the source data set ImageNet, so the pre-trained model can be expressed as:
Φ_{W_0,α_0}(X) = C_K^{W_K} ∘ C_{K-1}^{W_{K-1}} ∘ … ∘ C_1^{W_1}(X)   (5)
In computer vision, ImageNet has been proven to be useful for many other tasks and is considered the most commonly used source data set; therefore, the present invention adopts ImageNet as the source data set for pre-training.
In standard transfer learning, the network parameters are adjustable when migrating to the target task, while the network structure is fixed; the goal is to find the optimal network parameters W* for the given network structure α_0:
W* = arg min_W L(Φ_{W,α_0}(X_t), Y_t) + λ‖W‖²   (6)
where X_t and Y_t denote the images and labels of the target domain. Standard transfer learning can therefore be formalized as:
(α_0, W_0) → (α_0, W*)   (7)
where the network structure α_0 is unchanged before and after the optimization, i.e. the network structure does not change.
Step S2, a search space of neural network structures is designed such that the selected neural network structure can be considered as one instance in the search space.
In the embodiment of the present invention, the search space of the neural network structure is designed such that the selected neural network structure can be regarded as one instance in the search space, by expanding the neural network structure selected in step S1 to a larger neural network space.
The invention aims, first, to utilize the transferable property of the initial network structure α_0 and, second, to explore network structures in the neighborhood of α_0. To this end, the invention designs a search space A of network structures such that the initial network structure α_0 can be regarded as one instance in the search space A (i.e., α_0 ∈ A). FIG. 2 shows the relationship among the initial network structure α_0, the target network structure α* and the search space A. According to the definition of the search space, the transfer-learning-based image recognition model with neural network structure self-adjustment of the invention can be further expressed as:
(W*, α*) = arg min_{W, α∈A} L(Φ_{W,α}(X_t), Y_t) + λ‖W‖²   (8)
where O_i (1 ≤ i ≤ K) denotes the set of candidate operations of the i-th layer. The search for the network structure can then be simplified to choosing an appropriate operation for each layer, and the search space can be expressed as:
A = {O_1, O_2, …, O_i, …, O_{K-1}, O_K}   (9)
To ensure α_0 ∈ A, it is required that
C_i^{W_i} ∈ O_i, 1 ≤ i ≤ K.
convolution: convolution with 5 × 5,3 × 3,1 × 1, respectively
Figure BDA0002311282750000113
To represent
Pooling: 3X 3 maximum pooling, 3X 3 average pooling, Global average pooling, individually
Figure BDA0002311282750000114
To represent
And others: equivalent transformation, noise disturbance operation, respectively
Figure BDA0002311282750000115
To represent
Wherein the noise perturbation operation is to add Gaussian noise to the input of the operation, i represents the ith layer,
and step S3, combining and fine-tuning network parameters and a network structure in a search space from the obtained pre-training model, and optimizing to obtain a target neural network structure.
Unlike standard transfer learning, the present invention adjusts both the network parameters and the network structure in the search space, which can be expressed as:
(W*, α*) = arg min_{W, α} L(Φ_{W,α}(X_t), Y_t) + λ‖W‖²
where α ∈ A and the network parameters W are initialized with the pre-trained parameters W_0. In view of (9), the problem can be further simplified to choosing, for each layer, an operation C_i from its candidate set O_i:
α = C_K ∘ C_{K-1} ∘ … ∘ C_1, C_i ∈ O_i
The process of adjusting from the initial network structure α_0 to the target neural network structure α* can then be expressed as:
(α_0, W_0) → (α*, W*)
Preferably, the present invention further obtains the discrete target neural network structure α* by using a soft selection method on the basis of the joint fine-tuning.
According to the above equation (8), finding the optimal neural network structure is simplified to selecting an appropriate operation from the candidate operation set of each layer. However, selecting an operation from a candidate set is a discrete choice and is not differentiable, so it cannot be optimized directly with the gradient-based methods used for DNNs. To address this problem, the present invention relaxes the hard selection problem into a soft selection problem. Specifically, each candidate operation in O_i is associated with a confidence value P ∈ [0,1], where P = 1 indicates that the corresponding operation is definitely taken. Assuming that P can be learned in a data-driven manner, for example, for the i-th layer (1 ≤ i ≤ K), the probability of selecting the 3×3 convolution is defined as:
p_i^{3×3} = exp(θ_i^{3×3}) / Σ_{o∈O_i} exp(θ_i^{o})
where θ_i^{3×3} is a learnable parameter that measures the probability that the i-th layer selects the 3×3 convolution; the probabilities of selecting the other candidate operations are defined in the same way. The neural network during the search can then be expressed as:
Φ_{W,θ}(X) = Ĉ_K ∘ Ĉ_{K-1} ∘ … ∘ Ĉ_1(X)
where Ĉ_i represents the weighted sum of the candidate operations of the i-th layer, namely:
Ĉ_i(X) = Σ_{o∈O_i} p_i^{o} · C_i^{o}(X)
Since the operation C_i^{W_i} of the initial network structure has already been pre-trained, it has a greater capacity at the start of the search than the other candidate operations; therefore the θ corresponding to C_i^{W_i} is initialized to 1 and the θ of the other operations is initialized to 0. The search then becomes:
(W*, θ*) = arg min_{W,θ} L(Φ_{W,θ}(X_t), Y_t) + λ‖W‖²
where Φ_{W,θ}(X) is:
Φ_{W,θ}(X) = Ĉ_K^{W_K,θ_K} ∘ Ĉ_{K-1}^{W_{K-1},θ_{K-1}} ∘ … ∘ Ĉ_1^{W_1,θ_1}(X)
where Ĉ_i^{W_i,θ_i} denotes the weighted-sum operation Ĉ_i with parameters W_i and θ_i.
The invention also adds an operation regularization term R(θ), which penalizes the probability p_i^{W_i} assigned to the operation C_i^{W_i} of the initial network structure, so as to suppress the operations of the initial network structure and encourage new network operations; in addition, the identity transformation and the noise perturbation operation can reduce the network complexity. Finally, the image recognition framework with neural network structure self-adjustment based on transfer learning can be expressed as:
(W*, θ*) = arg min_{W,θ} L(Φ_{W,θ}(X_t), Y_t) + λ‖W‖² + γR(θ)
where γ is the weight of the operation regularization term.
in a specific embodiment of the present invention, a random gradient descent method may be used to solve this problem.
In the present invention, obtaining the target network structure α* is reduced to selecting an operation C_i from the candidate operation set O_i of each layer. Through the search, θ is obtained, which measures the probability that each operation of the i-th layer is selected. Therefore, the optimal operation is the candidate operation with the highest probability:
C_i* = arg max_{o∈O_i} p_i^{o}
after the optimal operation is selected, the target neural network structure α is obtained*
In step S4, fine-tuning of the network parameters in the target domain is performed on the target neural network structure α* obtained in step S3, i.e. the weights are adjusted on the basis of the network structure and the weights obtained in the previous step, and training is continued.
Preferably, after step S4, the method further includes the following steps:
and inputting the image to be recognized into the trained image recognition model for image processing to obtain an image recognition result. If the image recognition model is trained on ImageNet datasets, it can classify the images.
In summary, the image recognition method and system with neural network structure self-adjustment of the present invention combine the transfer learning technology with the neural network structure adjustment technology, utilize the representations learned by pre-training in transfer learning, adaptively adjust the neural network structure for different tasks and data, and jointly optimize the neural network structure and the neural network parameters to improve the image recognition performance of the model.
The foregoing embodiments are merely illustrative of the principles and utilities of the present invention and are not intended to limit the invention. Modifications and variations can be made to the above-described embodiments by those skilled in the art without departing from the spirit and scope of the present invention. Therefore, the scope of the invention should be determined from the following claims.

Claims (10)

1. An image recognition model with a neural network structure self-adjusting, comprising:
the pre-training model generating unit is used for constructing a neural network model structure and pre-training on a source domain based on standard transfer learning to obtain a pre-training model;
a search space design unit for designing a search space of the neural network model structure such that the selected neural network model structure can be regarded as one instance in the search space;
the combined fine tuning unit is used for combining and fine tuning network parameters and a network structure in a search space from the obtained pre-training model, and obtaining a target neural network structure after optimization;
and the network parameter fine tuning unit is used for further fine tuning the network parameters of the target neural network structure obtained by the combined fine tuning unit.
2. The model of claim 1, wherein the pre-training model generation unit is configured, after constructing the neural network model structure, to take a given network model α_0 and train it with the source data set ImageNet to obtain the model parameters W_0 of the pre-trained network.
3. The self-adjusting image recognition model of claim 2, wherein the neural network model structure adopts a ResNet50 neural network structure.
4. The model for image recognition with self-adjustment of neural network structure as claimed in claim 2, wherein the pre-training model is expressed as:
Φ_{W_0,α_0}(X) = C_K^{W_K} ∘ C_{K-1}^{W_{K-1}} ∘ … ∘ C_1^{W_1}(X)
where Φ(·) represents the nonlinear function of the neural network, X is the input of the neural network, W = {W_1, W_2, …, W_i, …, W_{K-1}, W_K} denotes the parameters of the neural network, K denotes the depth of the neural network, α_0 is the given network model, C_i^{W_i} denotes the convolution operation with W_i as its convolution kernel, and ∘ is the operation composition symbol.
5. The self-adjusting image recognition model of neural network structure as claimed in claim 4, wherein, in the pre-training model generation unit, standard transfer learning is formulated as:
W* = arg min_W L(Φ_{W,α_0}(X_t), Y_t) + λ‖W‖²
where W* represents the optimal network parameters for the given network structure α_0, and X_t and Y_t denote the target-domain images and labels.
6. The model of claim 2, wherein the search space design unit expands the neural network structure selected by the pre-training model generation unit to a larger neural network space.
7. The model of claim 6, wherein the search space is represented by:
A = {O_1, O_2, …, O_i, …, O_{K-1}, O_K}
where O_i (1 ≤ i ≤ K) represents the set of candidate operations of the i-th layer.
8. The model for image recognition of self-adjustment of neural network structure as claimed in claim 1, wherein the joint fine-tuning unit obtains the discrete target neural network structure α by using soft-selection method based on the joint fine-tuning*
9. A self-adjusting image recognition model of neural network architecture as claimed in claim 1, wherein the image recognition model is trained on an ImageNet data set, and the trained image recognition model realizes the classification of the input images.
10. An image recognition method for self-adjusting a neural network structure comprises the following steps:
step S1, constructing a neural network model structure, and pre-training on a source domain based on standard transfer learning to obtain a pre-training model;
step S2, designing a search space of the neural network structure, so that the selected neural network structure can be regarded as an example in the search space;
step S3, starting from the obtained pre-training model, combining and fine-tuning network parameters and a network structure in a search space, and obtaining a target neural network structure after optimization;
step S4 is to perform fine adjustment of network parameters in the target domain for the target neural network structure obtained in step S3.
CN201911259716.2A 2019-12-10 2019-12-10 Image recognition model and method with neural network structure self-adjusting function Pending CN111062465A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911259716.2A CN111062465A (en) 2019-12-10 2019-12-10 Image recognition model and method with neural network structure self-adjusting function

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911259716.2A CN111062465A (en) 2019-12-10 2019-12-10 Image recognition model and method with neural network structure self-adjusting function

Publications (1)

Publication Number Publication Date
CN111062465A true CN111062465A (en) 2020-04-24

Family

ID=70300431

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911259716.2A Pending CN111062465A (en) 2019-12-10 2019-12-10 Image recognition model and method with neural network structure self-adjusting function

Country Status (1)

Country Link
CN (1) CN111062465A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111931854A (en) * 2020-08-12 2020-11-13 北京建筑大学 Method for improving portability of image recognition model
CN111931854B (en) * 2020-08-12 2021-03-23 北京建筑大学 Method for improving portability of image recognition model
CN113344932A (en) * 2021-06-01 2021-09-03 电子科技大学 Semi-supervised single-target video segmentation method
CN113887546A (en) * 2021-12-08 2022-01-04 军事科学院系统工程研究院网络信息研究所 Method and system for improving image recognition accuracy
CN113887546B (en) * 2021-12-08 2022-03-11 军事科学院系统工程研究院网络信息研究所 Method and system for improving image recognition accuracy

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20200424)