CN110674326A - Neural network structure retrieval method based on polynomial distribution learning - Google Patents

Neural network structure retrieval method based on polynomial distribution learning

Info

Publication number
CN110674326A
Authority
CN
China
Prior art keywords
sampling
nodes
neural network
sampled
cells
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910722978.1A
Other languages
Chinese (zh)
Inventor
纪荣嵘 (Rongrong Ji)
郑侠武 (Xiawu Zheng)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xiamen University
Original Assignee
Xiamen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xiamen University filed Critical Xiamen University
Priority to CN201910722978.1A priority Critical patent/CN110674326A/en
Publication of CN110674326A publication Critical patent/CN110674326A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/53 Querying
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/55 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/583 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Library & Information Science (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A neural network structure retrieval method based on polynomial distribution learning, relating to neural architecture search. 1) A calibrated image-label pair set is provided, divided into a training sample set, a test sample set, and a validation sample set, and the possible search space of the neural network to be searched is defined; 2) possible network structures are sampled in the search space; 3) after the sampling in step 2), the sampled neural network structure is trained with the image-label pairs from step 1); 4) the number of times each operation is sampled and the accuracy of each operation on the validation set are recorded; 5) the differences in sampling counts and in accuracies between operations are calculated; 6) the probabilities defined in step 2) are updated with the differences calculated in step 5); 7) steps 3) to 6) are repeated until a fixed number of training iterations is reached. The method is suitable for relatively large datasets, and is efficient and accurate.

Description

Neural network structure retrieval method based on polynomial distribution learning
Technical Field
The invention relates to neural architecture search, in particular to a neural network structure retrieval method based on polynomial distribution learning.
Background
In recent years, with the development of artificial intelligence and deep learning, demand for customized deep learning network structures has grown exponentially. Users increasingly expect deep learning to be applied to their current tasks with customized network structures and parameters generated automatically, which has motivated the creation of neural network structure retrieval systems. Given a dataset, Neural Architecture Search (NAS) aims to find high-performance convolutional architectures in a huge search space through search algorithms. NAS has enjoyed great success in automated architecture search for various deep learning tasks, such as image classification, language modeling, and semantic segmentation. As described in [1] (T. DeVries and G. W. Taylor. Improved regularization of convolutional neural networks with cutout. arXiv preprint arXiv:1708.04552, 2017.), a neural architecture search method consists of three parts: the search space, the search strategy, and performance evaluation.
Conventional NAS algorithms sample a particular convolutional architecture through a search strategy and estimate its performance; that performance, in turn, serves as the objective signal for updating the search strategy. Despite significant advances, conventional neural network structure search methods are still limited by intensive computation and memory costs. For example, the reinforcement-learning-based method [2] (B. Zoph, V. Vasudevan, J. Shlens, and Q. V. Le. Learning transferable architectures for scalable image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 8697-8710, 2018.) needs to train and evaluate more than 20,000 neural networks on 500 GPUs over 4 days. Recent work has improved scalability by making the search differentiable: the search space is relaxed to a continuous space, so that the architecture can be optimized by gradient descent on validation-set performance. However, differentiable neural network structure search still suffers from high GPU memory consumption, which grows linearly with the size of the candidate search set.
Disclosure of Invention
The invention aims to provide a neural network structure retrieval method based on polynomial distribution learning.
The invention comprises the following steps:
1) providing a calibrated image-label pair set, dividing it into a training sample set, a test sample set, and a validation sample set, and defining the possible search space of the neural network to be searched;
2) sampling possible network structures in the search space and defining a sampling probability for each operation, the network structure being divided into networks, cells, and nodes according to scale;
3) after the sampling in step 2), training the sampled neural network structure with the image-label pairs from step 1);
4) after training, recording the number of times each operation is sampled and the accuracy of each operation on the validation set;
5) calculating the differences in sampling counts and in accuracies between operations from the counts and validation accuracies obtained in step 4);
6) updating the sampling probabilities defined in step 2) using the differences calculated in step 5);
7) repeating steps 3) to 6) until a fixed number of training iterations is reached.
In step 2), the network structure refers to the entire network topology. Different numbers of cells are stacked linearly to form different network structures, where the cells fall into two kinds: down-sampling cells and normal cells. A normal cell keeps the width, height, and depth of its input and output consistent, while a down-sampling cell halves the width and height and doubles the depth. Cells are composed of nodes, and the nodes within each cell form an ordered, acyclic, fully connected topology. Nodes are divided into input nodes, output nodes, and intermediate nodes; each node stores an intermediate feature map of the neural network, and each connection between nodes is a specific operation. Neural architecture search mainly determines which operation to select between two nodes. Between any two nodes i, j, the sampling probability of each operation is defined as

$p_{i,j}^{k} = \frac{1}{N}, \quad k = 1, 2, \dots, N,$

where N is the number of operations; that is, each operation is initially sampled uniformly.
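For illustration only, a minimal sketch of this uniform initialization and of drawing one operation for an edge might look as follows; the operation names, the function name, and the use of NumPy are assumptions of this sketch, not part of the invention:

import numpy as np

# Hypothetical candidate operation set for the edge between nodes (i, j).
OPS = ["3x3_conv", "5x5_conv", "3x3_sep_conv", "max_pool", "avg_pool", "identity"]
N = len(OPS)

# Uniform multinomial distribution: p[k] = 1/N for every operation k.
p = np.full(N, 1.0 / N)

def sample_operation(p, rng=np.random.default_rng()):
    # Draw one operation index for this edge from the multinomial distribution p.
    return int(rng.choice(len(p), p=p))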
In step 4), the recording is as follows: for the operation space between two nodes i, j, assume the space contains N possible operations. The number of times each operation has been sampled,

$\mathbf{H}_{i,j} = [H_{i,j}^{1}, H_{i,j}^{2}, \dots, H_{i,j}^{N}],$

and the accuracy of each operation on the validation set,

$\mathbf{A}_{i,j} = [A_{i,j}^{1}, A_{i,j}^{2}, \dots, A_{i,j}^{N}],$

are each N-dimensional vectors.
In step 5), the difference in sampling counts between operations, $\Delta \mathbf{H}_{i,j}$, is computed as

$\Delta H_{i,j}^{k,l} = H_{i,j}^{k} - H_{i,j}^{l}, \quad k, l = 1, 2, \dots, N,$

and the difference in accuracies, $\Delta \mathbf{A}_{i,j}$, as

$\Delta A_{i,j}^{k,l} = A_{i,j}^{k} - A_{i,j}^{l}, \quad k, l = 1, 2, \dots, N,$

where N is the number of operations.
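As a sketch of this computation, under the same assumptions as the sketch above, with the counts H and accuracies A stored as NumPy arrays, the two difference matrices can be formed by broadcasting:

import numpy as np

def pairwise_differences(H, A):
    # H[k]: number of times operation k was sampled; A[k]: its validation accuracy.
    H = np.asarray(H, dtype=float)
    A = np.asarray(A, dtype=float)
    dH = H[:, None] - H[None, :]   # dH[k, l] = H_k - H_l
    dA = A[:, None] - A[None, :]   # dA[k, l] = A_k - A_l
    return dH, dA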
In step 6), the specific update method may be: when two operations are compared, if one operation has been sampled fewer times yet achieves higher accuracy, the probability of that operation being sampled is increased; conversely, if one operation has been sampled more times yet achieves lower accuracy, its sampling probability is decreased. Expressed as a formula:

$p_{i,j}^{k} \leftarrow p_{i,j}^{k} + \alpha \sum_{l=1}^{N} \left[ \mathbb{1}\left(\Delta H_{i,j}^{k,l} < 0 \wedge \Delta A_{i,j}^{k,l} > 0\right) - \mathbb{1}\left(\Delta H_{i,j}^{k,l} > 0 \wedge \Delta A_{i,j}^{k,l} < 0\right) \right],$

where $\mathbb{1}(\cdot)$ is the indicator function, returning 1 when its input is true and 0 otherwise, and $\alpha$ is a step size.
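A minimal sketch of this update rule follows; the step size alpha, the clipping, and the renormalization that keeps p a valid probability distribution are assumptions of this sketch:

import numpy as np

def update_probabilities(p, dH, dA, alpha=0.01):
    # An operation sampled less often (dH < 0) yet more accurate (dA > 0) gains
    # probability; one sampled more often (dH > 0) yet less accurate (dA < 0) loses it.
    gain = ((dH < 0) & (dA > 0)).sum(axis=1)
    loss = ((dH > 0) & (dA < 0)).sum(axis=1)
    p = p + alpha * (gain - loss)
    p = np.clip(p, 1e-8, None)     # keep probabilities positive (assumption)
    return p / p.sum()             # renormalize to a valid distribution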
The method provided by the invention is mainly a fast neural network structure search method based on distribution learning. First, a completely new network architecture search framework is proposed. Second, for better training, a distribution-learning-based algorithm is provided, which achieves optimal training speed and accuracy. Moreover, the two methods reinforce each other.
Compared with the prior art, the method has the following outstanding advantages:
Firstly, the invention explicitly introduces the idea of distribution learning, which alleviates, to a certain extent, the difficulty of training in neural network structure retrieval.
Secondly, the invention provides a brand-new neural network search framework, enabling efficient and accurate neural network structure retrieval.
Thirdly, the invention can be applied to relatively large datasets, with optimizations for both speed and accuracy.
Drawings
FIG. 1 is a flow chart of the method of the present invention.
Detailed Description
The invention discloses a neural network structure retrieval method based on polynomial distribution learning. The invention is further described below with reference to the accompanying drawings.
Referring to fig. 1, the steps of the embodiment of the present invention are as follows:
Step 1: a labeled image-label pair set is given and divided into a training sample set, a test sample set, and a validation sample set, and the possible search space of the neural network to be searched is defined.
Step 2: sample possible network structures in the search space. The network structure can be divided into networks, cells, and nodes according to scale. A network refers to the entire network topology. Different numbers of cells are stacked linearly to form different network structures, where the cells fall into down-sampling cells and normal cells. A normal cell keeps the width, height, and depth of its input and output consistent, while a down-sampling cell halves the width and height and doubles the depth. Each cell is composed of nodes that form an ordered, acyclic, fully connected topology. Nodes are divided into input nodes, output nodes, and intermediate nodes; each node stores an intermediate feature map, and each connection between nodes is a specific operation. Neural architecture search mainly determines which operation to select between two nodes. We assume that between any two nodes i, j, the sampling probability of each operation is

$p_{i,j}^{k} = \frac{1}{N}, \quad k = 1, 2, \dots, N,$

where N is the number of operations; that is, each operation is initially sampled uniformly.
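To make this bookkeeping concrete, the following sketch represents one cell as a directed acyclic graph whose edges each carry their own multinomial distribution over the N candidate operations; all names are illustrative assumptions, not the patent's notation:

import numpy as np

def init_cell_distributions(num_nodes, num_ops):
    # One uniform distribution per ordered node pair (i, j) with i < j.
    return {(i, j): np.full(num_ops, 1.0 / num_ops)
            for i in range(num_nodes) for j in range(i + 1, num_nodes)}

def sample_cell(dists, rng=np.random.default_rng()):
    # Sample one concrete cell: pick an operation index for every edge.
    return {edge: int(rng.choice(len(p), p=p)) for edge, p in dists.items()}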
Step 3), after the step 2) is carried out sampling, training the sampled neural network structure by using the image label pair in the step 1);
Step 4: after the neural network has been trained in step 3, record the accuracy on the validation set. Essentially two pieces of information are recorded: the number of times each operation has been sampled,

$\mathbf{H}_{i,j} = [H_{i,j}^{1}, H_{i,j}^{2}, \dots, H_{i,j}^{N}],$

and the accuracy of each operation on the validation set,

$\mathbf{A}_{i,j} = [A_{i,j}^{1}, A_{i,j}^{2}, \dots, A_{i,j}^{N}].$

For the operation space between two nodes, it is assumed that the space contains N possible operations, so $\mathbf{H}_{i,j}$ and $\mathbf{A}_{i,j}$ are each N-dimensional vectors.
Step 5) according to the sampling times and the precision of each operation obtained in step 4), calculating the difference of the sampling times among the operations
Figure BDA0002157894020000046
And difference between precisions
Figure BDA0002157894020000048
Figure BDA0002157894020000049
Wherein N is defined in step 4) as the number of operations.
Step 6 the difference calculated in step 5) can be used to update the probability defined in step 2), and the following update is performed, when two operations are compared, one of the operations has a smaller number of times of being sampled and has higher precision, so as to raise the probability of being sampled, and conversely, when one of the operations has a larger number of times of being sampled and has lower precision, the probability of being sampled is lowered, and the formula is expressed as:
Figure BDA00021578940200000410
wherein,
Figure BDA00021578940200000411
to indicate a function, when the input is true, 1 is returned, and the rest are returned to 0.
Step 7: repeat steps 3 through 6 until a fixed number of training epochs is reached; a minimal end-to-end sketch of this loop is given below.
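The following sketch ties steps 3 through 6 into one loop, reusing pairwise_differences, update_probabilities, and sample_cell from the earlier sketches; train_and_evaluate is a placeholder for ordinary supervised training of the sampled architecture plus evaluation on the validation set, not an API from the patent, and the per-edge accuracy bookkeeping is an assumption of this sketch:

import numpy as np

def search(dists, train_and_evaluate, epochs=100, alpha=0.01):
    num_ops = len(next(iter(dists.values())))
    H = {e: np.zeros(num_ops) for e in dists}   # sampling counts per edge
    A = {e: np.zeros(num_ops) for e in dists}   # validation accuracies per edge
    for _ in range(epochs):
        arch = sample_cell(dists)               # step 3: sample a structure
        acc = train_and_evaluate(arch)          # step 3: train; step 4: evaluate
        for e, k in arch.items():               # step 4: record counts and accuracy
            H[e][k] += 1
            A[e][k] = acc
        for e in dists:                         # steps 5 and 6: update distributions
            dH, dA = pairwise_differences(H[e], A[e])
            dists[e] = update_probabilities(dists[e], dH, dA, alpha)
    return dists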
The method provided by the invention is mainly a fast neural network structure search method based on distribution learning. First, a completely new network architecture search framework is proposed. Second, for better training, a distribution-learning-based algorithm is provided, which achieves optimal training speed and accuracy. More subtly, the two algorithms reinforce each other.
The effects of the present invention are further illustrated by the following simulation experiments.
1. Simulation conditions
The invention was developed on the PyCharm platform, with PyTorch as the deep learning framework. The main language used is Python, and OpenCV is used to implement the traditional vision algorithms employed in the invention.
2. Simulation content
Simulations were performed on the CIFAR-10 and ILSVRC2012 datasets. The CIFAR-10 dataset consists of 60,000 32x32 color pictures in 10 classes, 6,000 pictures per class, of which 50,000 pictures form the training set and 10,000 the test set. The dataset is divided into 5 training batches and 1 test batch, each containing 10,000 pictures. The test batch consists of 1,000 randomly selected pictures from each class; the training batches contain the remaining 50,000 pictures in random order, so an individual training batch may hold more pictures from one class than another, but together the training batches contain exactly 5,000 pictures from each class. The ImageNet project is a large visual database for visual object recognition research: more than 14 million image URLs have been manually annotated by ImageNet to indicate the objects in the pictures, bounding boxes are provided in at least one million of the images, and ImageNet contains more than 20,000 categories. The comparison between the invention and the best existing methods is shown in Table 1, which shows that the invention achieves higher accuracy and higher speed than the other methods.
TABLE 1
The invention provides a fast neural network structure search method based on distribution learning. A new network architecture search algorithm is introduced that is applicable to a variety of large-scale datasets, because its memory and computational costs are similar to those of ordinary neural network training. Furthermore, a performance ranking hypothesis is proposed that can be incorporated into existing NAS algorithms to speed up their search. The proposed method achieves a significant improvement in search efficiency: for example, using a single GTX 1080Ti GPU, the network structure found within 4 hours achieves a test error of only 2.4% on the relevant dataset (6.0 times faster than the most advanced algorithm). This is attributed to the distribution learning used by the invention, which is completely different from previous reinforcement-learning-based and differentiable methods.

Claims (5)

1. A neural network structure retrieval method based on polynomial distribution learning is characterized by comprising the following steps:
1) providing a calibrated image-label pair set, dividing it into a training sample set, a test sample set, and a validation sample set, and defining the possible search space of the neural network to be searched;
2) sampling possible network structures in the search space and defining a sampling probability for each operation, the network structure being divided into networks, cells, and nodes according to scale;
3) after the sampling in step 2), training the sampled neural network structure with the image-label pairs from step 1);
4) after training, recording the number of times each operation is sampled and the accuracy of each operation on the validation set;
5) calculating the differences in sampling counts and in accuracies between operations from the counts and validation accuracies obtained in step 4);
6) updating the sampling probabilities defined in step 2) using the differences calculated in step 5);
7) repeating steps 3) to 6) until a fixed number of training iterations is reached.
2. The neural network structure retrieval method based on polynomial distribution learning according to claim 1, wherein in step 2), the network structure refers to the entire network topology; different numbers of cells are stacked linearly to form different network structures, where the cells fall into down-sampling cells and normal cells; a normal cell keeps the width, height, and depth of its input and output consistent, while a down-sampling cell halves the width and height and doubles the depth; cells are composed of nodes, and the nodes within each cell form an ordered, acyclic, fully connected topology; nodes are divided into input nodes, output nodes, and intermediate nodes, each node stores an intermediate feature map of the neural network, and each connection between nodes is a specific operation; the neural architecture search mainly determines which operation to select between two nodes; between any two nodes i, j, the sampling probability of each operation is defined as

$p_{i,j}^{k} = \frac{1}{N}, \quad k = 1, 2, \dots, N,$

where N is the number of operations; that is, each operation is initially sampled uniformly.
3. The neural network structure retrieval method based on polynomial distribution learning according to claim 1, wherein in step 4), the recording is as follows: for the operation space between two nodes, assume the space contains N possible operations; the number of times each operation has been sampled, $\mathbf{H}_{i,j}$, and the accuracy of each operation on the validation set, $\mathbf{A}_{i,j}$, are each N-dimensional vectors.
4. The neural network structure retrieval method based on polynomial distribution learning according to claim 1, wherein in step 5), the difference in sampling counts between operations is computed as

$\Delta H_{i,j}^{k,l} = H_{i,j}^{k} - H_{i,j}^{l}, \quad k, l = 1, 2, \dots, N,$

and the difference in accuracies as

$\Delta A_{i,j}^{k,l} = A_{i,j}^{k} - A_{i,j}^{l}, \quad k, l = 1, 2, \dots, N,$

where N is the number of operations.
5. The neural network structure retrieval method based on polynomial distribution learning according to claim 1, wherein in step 6), the specific update method is: when two operations are compared, if one operation has been sampled fewer times yet achieves higher accuracy, the probability of that operation being sampled is increased; conversely, if one operation has been sampled more times yet achieves lower accuracy, its sampling probability is decreased. Expressed as a formula:

$p_{i,j}^{k} \leftarrow p_{i,j}^{k} + \alpha \sum_{l=1}^{N} \left[ \mathbb{1}\left(\Delta H_{i,j}^{k,l} < 0 \wedge \Delta A_{i,j}^{k,l} > 0\right) - \mathbb{1}\left(\Delta H_{i,j}^{k,l} > 0 \wedge \Delta A_{i,j}^{k,l} < 0\right) \right],$

where $\mathbb{1}(\cdot)$ is the indicator function, returning 1 when its input is true and 0 otherwise, and $\alpha$ is a step size.
CN201910722978.1A 2019-08-06 2019-08-06 Neural network structure retrieval method based on polynomial distribution learning Pending CN110674326A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910722978.1A CN110674326A (en) 2019-08-06 2019-08-06 Neural network structure retrieval method based on polynomial distribution learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910722978.1A CN110674326A (en) 2019-08-06 2019-08-06 Neural network structure retrieval method based on polynomial distribution learning

Publications (1)

Publication Number Publication Date
CN110674326A true CN110674326A (en) 2020-01-10

Family

ID=69068705

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910722978.1A Pending CN110674326A (en) 2019-08-06 2019-08-06 Neural network structure retrieval method based on polynomial distribution learning

Country Status (1)

Country Link
CN (1) CN110674326A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111325328A (en) * 2020-03-06 2020-06-23 上海商汤临港智能科技有限公司 Neural network generation method, data processing method and device
CN111667056A (en) * 2020-06-05 2020-09-15 北京百度网讯科技有限公司 Method and apparatus for searching model structure
CN111967569A (en) * 2020-06-29 2020-11-20 北京百度网讯科技有限公司 Neural network structure generation method and device, storage medium and electronic equipment
CN112183742A (en) * 2020-09-03 2021-01-05 南强智视(厦门)科技有限公司 Neural network hybrid quantization method based on progressive quantization and Hessian information
CN114896436A (en) * 2022-06-14 2022-08-12 厦门大学 Network structure searching method based on representation mutual information

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105373829A (en) * 2014-09-02 2016-03-02 北京大学 Full-connection neural network structure
US20190026639A1 (en) * 2017-07-21 2019-01-24 Google Llc Neural architecture search for convolutional neural networks
CN109871995A (en) * 2019-02-02 2019-06-11 浙江工业大学 The quantum optimization parameter adjustment method of distributed deep learning under Spark frame
CN109948029A (en) * 2019-01-25 2019-06-28 南京邮电大学 Based on the adaptive depth hashing image searching method of neural network

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105373829A (en) * 2014-09-02 2016-03-02 北京大学 Full-connection neural network structure
US20190026639A1 (en) * 2017-07-21 2019-01-24 Google Llc Neural architecture search for convolutional neural networks
CN109948029A (en) * 2019-01-25 2019-06-28 南京邮电大学 Based on the adaptive depth hashing image searching method of neural network
CN109871995A (en) * 2019-02-02 2019-06-11 浙江工业大学 The quantum optimization parameter adjustment method of distributed deep learning under Spark frame

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
XIAWU ZHENG et al.: "Multinomial Distribution Learning for Effective Neural Architecture Search", arXiv *

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111325328A (en) * 2020-03-06 2020-06-23 上海商汤临港智能科技有限公司 Neural network generation method, data processing method and device
CN111325328B (en) * 2020-03-06 2023-10-24 上海商汤临港智能科技有限公司 Neural network generation method, data processing method and device
CN111667056A (en) * 2020-06-05 2020-09-15 北京百度网讯科技有限公司 Method and apparatus for searching model structure
CN111667056B (en) * 2020-06-05 2023-09-26 北京百度网讯科技有限公司 Method and apparatus for searching model structures
CN111967569A (en) * 2020-06-29 2020-11-20 北京百度网讯科技有限公司 Neural network structure generation method and device, storage medium and electronic equipment
CN111967569B (en) * 2020-06-29 2024-02-13 北京百度网讯科技有限公司 Neural network structure generation method and device, storage medium and electronic equipment
CN112183742A (en) * 2020-09-03 2021-01-05 南强智视(厦门)科技有限公司 Neural network hybrid quantization method based on progressive quantization and Hessian information
CN112183742B (en) * 2020-09-03 2023-05-12 南强智视(厦门)科技有限公司 Neural network hybrid quantization method based on progressive quantization and Hessian information
CN114896436A (en) * 2022-06-14 2022-08-12 厦门大学 Network structure searching method based on representation mutual information
CN114896436B (en) * 2022-06-14 2024-04-30 厦门大学 Network structure searching method based on characterization mutual information

Similar Documents

Publication Publication Date Title
CN110866140B (en) Image feature extraction model training method, image searching method and computer equipment
CN110597735B (en) Software defect prediction method for open-source software defect feature deep learning
CN110674326A (en) Neural network structure retrieval method based on polynomial distribution learning
CN110851645B (en) Image retrieval method based on similarity maintenance under deep metric learning
CN111858954A (en) Task-oriented text-generated image network model
CN111737535B (en) Network characterization learning method based on element structure and graph neural network
CN103116766B (en) A kind of image classification method of encoding based on Increment Artificial Neural Network and subgraph
CN108446312B (en) Optical remote sensing image retrieval method based on deep convolution semantic net
CN115019123B (en) Self-distillation contrast learning method for remote sensing image scene classification
CN109063112A (en) A kind of fast image retrieval method based on multi-task learning deep semantic Hash, model and model building method
CN110598022B (en) Image retrieval system and method based on robust deep hash network
CN113032613B (en) Three-dimensional model retrieval method based on interactive attention convolution neural network
CN113255892B (en) Decoupled network structure searching method, device and readable storage medium
CN109472282B (en) Depth image hashing method based on few training samples
Sood et al. Neunets: An automated synthesis engine for neural network design
CN112307914B (en) Open domain image content identification method based on text information guidance
CN113887698A (en) Overall knowledge distillation method and system based on graph neural network
CN111079840B (en) Complete image semantic annotation method based on convolutional neural network and concept lattice
CN114972959B (en) Remote sensing image retrieval method for sample generation and in-class sequencing loss in deep learning
CN114896436B (en) Network structure searching method based on characterization mutual information
CN111507472A (en) Precision estimation parameter searching method based on importance pruning
CN116797830A (en) Image risk classification method and device based on YOLOv7
Jing et al. NASABN: A neural architecture search framework for attention-based networks
CN116011564A (en) Entity relationship completion method, system and application for power equipment
CN112905820B (en) Multi-graph retrieval method based on logic learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20200110