CN114429197B - Neural network architecture searching method, system, equipment and readable storage medium - Google Patents

Neural network architecture searching method, system, equipment and readable storage medium

Info

Publication number
CN114429197B
CN114429197B CN202210085746.1A
Authority
CN
China
Prior art keywords
network
calculating
darts
architecture
significance
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210085746.1A
Other languages
Chinese (zh)
Other versions
CN114429197A (en)
Inventor
徐亦飞
王正洋
朱利
尉萍萍
王超勇
张越皖
张扬
徐明杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xian Jiaotong University
Original Assignee
Xian Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xian Jiaotong University filed Critical Xian Jiaotong University
Priority to CN202210085746.1A priority Critical patent/CN114429197B/en
Publication of CN114429197A publication Critical patent/CN114429197A/en
Application granted granted Critical
Publication of CN114429197B publication Critical patent/CN114429197B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/082Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a neural network architecture search method, system, device, and readable storage medium. Relevant parameters of a DARTS network are initialized, an image training set is input into the initialized DARTS network, a loss value is calculated according to an objective function, the change in network loss is calculated from gradient information using a second-order Taylor expansion, and operation saliency is calculated using a scoring index based on synaptic saliency. A connection-sensitivity index is applied to neural network architecture search to indicate the importance of operations, differentiable architecture search is formulated as network pruning at initialization, and a measure called operation saliency is adopted for pruning at initialization. Experimental results show that the framework is a promising and reliable differentiable neural architecture search solution with good performance on different benchmark datasets and DARTS search spaces. The method of the invention is highly efficient and can complete the architecture search within a few seconds.

Description

Neural network architecture searching method, system, equipment and readable storage medium
Technical Field
The invention belongs to the technical field of artificial intelligence and relates to a neural network architecture search method, system, device, and readable storage medium.
Background
The success of deep learning in computer vision is largely due to the deep prior knowledge of human experts; however, such manual design is costly and becomes more difficult as networks grow larger and more complex. Neural architecture search (NAS) automates the neural network design process and has therefore attracted great interest. However, this approach demands very high computational power, and early NAS methods required thousands of GPUs to find efficient network architectures. To improve efficiency, many recent studies have turned to reducing the search cost, and one of the most popular paradigms is the differentiable architecture search (DARTS) framework. DARTS uses continuous relaxation to convert the operation-selection problem into the optimization of continuous magnitudes over a set of candidate operations, which is then cast as a bilevel optimization problem that alternately optimizes architecture parameters and model weights by gradient descent.
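For reference, the continuous relaxation and bilevel objective used by DARTS (as described in the original DARTS literature; the notation below is not taken from this patent) can be written as:

$$\bar{o}^{(i,j)}(x)=\sum_{o\in\mathcal{O}}\frac{\exp\big(\alpha_o^{(i,j)}\big)}{\sum_{o'\in\mathcal{O}}\exp\big(\alpha_{o'}^{(i,j)}\big)}\,o(x)$$

$$\min_{\alpha}\ \mathcal{L}_{\mathrm{val}}\big(w^{*}(\alpha),\alpha\big)\quad\text{s.t.}\quad w^{*}(\alpha)=\arg\min_{w}\ \mathcal{L}_{\mathrm{train}}(w,\alpha)$$

Here O is the set of candidate operations on edge (i, j), alpha denotes the architecture parameters, and w the model weights; the searched architecture is then read off from the magnitudes of alpha, which is precisely the step the present invention argues is unreliable.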
Although differentiable architecture search (DARTS) has become the dominant paradigm for neural architecture search (NAS) due to its simplicity and efficiency, recent research has found that as DARTS optimization progresses the performance of the searched architectures barely improves, because the parameter values of the corresponding architectures are simply used as the indicator for architecture selection. As a result, the network architecture selected from the search space typically falls into a suboptimal state, which indicates that the final values of the network architecture parameters obtained by DARTS hardly indicate the importance of operations. These observations suggest that the supervisory signal in DARTS may be a poor or unreliable indicator for network architecture search, and a recent work by Wang et al. shows that the magnitudes of the architecture parameters obtained after super-network training by DARTS are fundamentally flawed and hardly indicate operation importance. More interestingly, several studies use a simple early-stopping strategy to interrupt super-network training during the search, which can significantly improve DARTS performance. These empirical observations indicate that super-network training degrades performance as the search progresses.
Network pruning is an efficient method for compressing over-parameterized neural networks that removes parameters while minimizing the degradation of network performance. The final discretization stage of DARTS, i.e., selecting a discrete architecture from the over-parameterized super network according to operation magnitudes, can be regarded as operation-level network pruning. Based on this motivation, "Architecture search, anneal and prune" (International Conference on Artificial Intelligence and Statistics, pages 493-503, PMLR, 2020) proposes a scalable search space that progressively prunes inferior operations during the search; the search is also accelerated because the number of candidate operations decreases as the search proceeds. Similarly, "Progressive differentiable architecture search: Bridging the depth gap between search and evaluation" (Proceedings of the IEEE International Conference on Computer Vision) prunes candidate operations in cells during the search and gradually increases the network depth to alleviate the depth gap between architecture search and evaluation. These methods accelerate the network search process to some extent, but the still-excessive search cost limits the application scenarios of network architecture search.
Disclosure of Invention
The invention aims to provide a neural network architecture search method, system, device, and readable storage medium that solve the problems that the search cost of the existing DARTS algorithm is excessive and its search indicator fails to reflect the importance of operations.
A neural network architecture search method, comprising the steps of:
S1, initializing related parameters of a DARTS network;
S2, inputting an image training set into the initialized DARTS network, calculating a loss value according to an objective function, calculating the change in network loss from gradient information using a second-order Taylor expansion, and calculating operation saliency using a scoring index based on synaptic saliency;
and S3, analyzing and calculating an optimal Cell network structure according to the operation saliency index, the network loss change, and the loss value, and stacking the obtained optimal Cell network structures to form the searched model structure.
Further, the initialized relevant parameters include weight parameters, architecture parameters, learning rate and batch size.
Further, R is adopted as the change in network loss caused by removing an operation:
R = L(D, W, α, S_p) − L(D, W | (1 − α_k^T) S_p)
where D, W, α, S_p, and k are the dataset, network parameters, architecture parameters, search space, and the operation to be removed, respectively.
Further, the scoring index based on synaptic saliency is shown in formula (1), where α is the architecture parameter.
Further, the network loss change R is designed using a second-order Taylor series expansion.
Further, the CIFAR-10 dataset is employed as the training set.
Further, the DARTS network structure includes a Normal Cell structure and a Reduction Cell structure.
A neural network architecture search system comprises an initialization module, an optimization training module and a search module;
The initialization module is used for initializing related parameters of the DARTS network;
The optimization training module is used for processing the image training set with the initialized DARTS network, calculating a loss value according to an objective function, calculating the change in network loss from gradient information using a second-order Taylor expansion, calculating operation saliency using a scoring index based on synaptic saliency, and calculating an optimal Cell network structure according to the operation saliency index, the network loss change, and the loss value;
And the search module stacks the obtained optimal Cell network structures to form the searched model structure.
A terminal device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, the processor implementing the steps of the neural network architecture search method when executing the computer program.
A computer readable storage medium storing a computer program which, when executed by a processor, implements the steps of the neural network architecture search method.
Compared with the prior art, the invention has the following beneficial technical effects:
The invention relates to a neural network architecture search method in which relevant parameters of a DARTS network are initialized, an image training set is input into the initialized DARTS network, a loss value is calculated according to an objective function, the change in network loss is calculated from gradient information using a second-order Taylor expansion, and operation saliency is calculated using a scoring index based on synaptic saliency. A connection-sensitivity index is applied to neural network architecture search to indicate the importance of operations, differentiable architecture search is formulated as network pruning at initialization, and a measure called operation saliency is adopted for pruning at initialization. Experimental results show that the framework is a promising and reliable differentiable neural architecture search solution with good performance on different benchmark datasets and DARTS search spaces. The method of the invention is highly efficient and can complete the architecture search within a few seconds.
Furthermore, owing to its memory and computational efficiency, the method can be applied to a more flexible search space in which the network depth during architecture search and evaluation can be the same, and it can perform architecture search directly on a large dataset and then be evaluated on that dataset, which demonstrates the flexibility of the method.
Drawings
Fig. 1 is a schematic diagram of an internal structure of a DARTS network Cell according to an embodiment of the invention.
FIG. 2 is a schematic diagram of a DARTS model network according to an embodiment of the present invention.
FIG. 3 is a flow chart of an embodiment of the present invention.
Detailed Description
In order that those skilled in the art may better understand the present invention, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. It is apparent that the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without inventive effort shall fall within the scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and the claims of the present invention and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the invention described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Let the size of the input image X be H×W×C, where H, W, and C are the height, width, and number of channels of the input image, respectively; the DARTS network is S(X), the number of classification classes of the classification network is N, the architecture parameter of the DARTS network is α, and the weight parameter is W.
The invention discloses a saliency-based neural network architecture search method, which comprises the following steps:
S1, collecting input images and dividing them into a training set T_train and a test set T_test;
S2, initializing the weight parameters W, the architecture parameters α, the learning rate, and the batch size in the DARTS network S(X);
S3, inputting image data X from the training set T_train into the initialized DARTS network S(X), calculating a loss value according to the objective function, calculating the saliency index M using the scoring index based on synaptic saliency, and calculating the network loss change R from the gradient information using a second-order Taylor expansion;
The scoring index based on synaptic saliency is shown in formula (1).
The synaptic-saliency scoring index is used to prune the network at initialization. The present application does not use a saliency index to score the weight parameters of the super network; instead, the operations in the super network are scored, treating architecture selection as pruning of the network architecture and applying the synaptic-saliency scoring index accordingly. Because the synaptic-saliency scoring index requires no training, the super network can be pruned at initialization, i.e., operations can be removed, without any training.
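As a minimal sketch of this scoring step, the fragment below assumes a PyTorch-style super network exposing its architecture parameters through a hypothetical arch_parameters() method, and assumes the score takes the |α · ∂L/∂α| form common in the synaptic-saliency pruning literature, since formula (1) itself is not reproduced in this text:

import torch
import torch.nn.functional as F

def operation_saliency(supernet, data_loader, device="cuda"):
    # Score every candidate operation at initialization, without any training.
    supernet.to(device).train()
    images, labels = next(iter(data_loader))           # a single mini-batch is enough
    images, labels = images.to(device), labels.to(device)
    loss = F.cross_entropy(supernet(images), labels)   # loss value from the objective function
    loss.backward()                                    # one backward pass yields dL/dalpha
    # Assumed synaptic-saliency-style score: |alpha * dL/dalpha| per candidate operation.
    return [(alpha.grad * alpha).abs() for alpha in supernet.arch_parameters()]

Operations with the lowest scores would then be the candidates for removal from the search space.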
R is used as the change in network loss caused by removing an operation:
R = L(D, W, α, S_p) − L(D, W | (1 − α_k^T) S_p)   (2)
where D, W, α, S_p, and k are the dataset, network parameters, architecture parameters, search space, and the operation to be removed, respectively. The application uses R as the criterion for optimal architecture selection. Specifically, network architecture saliency is defined as the change in network loss caused by removing the architecture from the neural network search space; it reflects the contribution of the candidate architecture to network performance and effectively eliminates the bias in architecture selection.
The network loss change R is approximated using a second-order Taylor series expansion (formula (3)), in which the first-order term can be computed with one back-propagation pass over the validation dataset and the second-order term with two back-propagation passes.
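Formula (3) is not reproduced in this text; a plausible form of the expansion, consistent with the statement that the first-order term needs one back-propagation pass and the second-order term two (via a Hessian-vector product), is:

$$R \;\approx\; g^{\mathsf T}\alpha_k \;+\; \tfrac{1}{2}\,\alpha_k^{\mathsf T} H\,\alpha_k, \qquad g=\frac{\partial L}{\partial \alpha},\quad H=\frac{\partial^{2} L}{\partial \alpha^{2}},$$

where α_k denotes the architecture-parameter vector with only the entries of the removed operation k retained. The Hessian-vector product Hα_k can be obtained with a second back-propagation pass, so the full Hessian never needs to be formed explicitly.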
S4, analyzing and calculating the optimal Cell network structure according to the operation saliency index M and the network loss change R, and stacking the obtained optimal Cell network structures to form the searched model structure.
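A minimal sketch of this discretization step is given below. How M and R are combined is not spelled out in the text, so the simple sum is only an illustrative assumption, and the per-edge score dictionaries are hypothetical structures:

import torch

def select_cell(edge_scores_M, edge_scores_R):
    # edge_scores_*: dict mapping an edge (input_node, output_node) to a tensor of
    # per-operation scores. Keep the highest-scoring operation on every edge.
    genotype = {}
    for edge, m in edge_scores_M.items():
        combined = m + edge_scores_R[edge]      # illustrative combination of M and R
        genotype[edge] = int(torch.argmax(combined))
    return genotype

The resulting genotype describes one cell; the same procedure is applied to the Normal Cell and the Reduction Cell before stacking.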
In one embodiment of the present invention, a terminal device is provided, including a processor and a memory, the memory being used to store a computer program comprising program instructions and the processor being used to execute the program instructions stored in the computer storage medium. The processor may be a central processing unit (CPU), or another general-purpose processor, digital signal processor (DSP), application-specific integrated circuit (ASIC), field-programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic device, discrete hardware component, or the like; it is the computation and control core of the terminal and is adapted to load and execute one or more instructions so as to implement the corresponding method flow or function. The processor provided by the embodiment of the invention can be used to perform the operations of the neural network architecture search method.
A neural network architecture search system, comprising:
the system comprises an initialization module, an optimization training module and a search module;
The initialization module is used for initializing related parameters of the DARTS network;
The optimization training module is used for processing the image training set with the initialized DARTS network, calculating a loss value according to an objective function, calculating the change in network loss from gradient information using a second-order Taylor expansion, calculating operation saliency using a scoring index based on synaptic saliency, and calculating an optimal Cell network structure according to the operation saliency index, the network loss change, and the loss value;
The search module stacks the obtained optimal Cell network structures to form the searched model structure.
In still another embodiment of the present invention, a storage medium, specifically a computer-readable storage medium (memory), is a memory device in a terminal device for storing programs and data. The computer-readable storage medium includes a built-in storage medium in the terminal device, which provides storage space and stores the operating system of the terminal, and may also include an extended storage medium supported by the terminal device. The storage space also stores one or more instructions, which may be one or more computer programs (including program code), adapted to be loaded and executed by the processor. The computer-readable storage medium may be a high-speed RAM or a non-volatile memory, such as at least one magnetic disk memory. The one or more instructions stored in the computer-readable storage medium may be loaded and executed by the processor to implement the corresponding steps of the neural network architecture search method in the above embodiments.
As shown in the implementation flowchart of FIG. 3, the present invention proposes a saliency-based neural network architecture search method; the invention is described in detail below with reference to the accompanying drawings.
Description of the data:
We use CIFAR-10 and NAS-Bench-201 for training and evaluation, with the DARTS and NAS-Bench-201 search spaces. The CIFAR-10 dataset consists of 60000 32×32 color images in 10 classes, with 6000 images per class. There are 50000 training images and 10000 test images. The dataset is divided into five training batches and one test batch, each containing 10000 images. The test batch contains exactly 1000 randomly selected images from each class; the training batches contain the remaining images in random order. Overall, the five training batches together contain exactly 5000 images from each class.
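For illustration, the CIFAR-10 training and test sets can be loaded with torchvision as follows; the normalization statistics and batch size are assumptions, since the patent does not specify the preprocessing:

import torch
import torchvision
import torchvision.transforms as transforms

transform = transforms.Compose([
    transforms.ToTensor(),
    # commonly used CIFAR-10 channel statistics (assumed, not taken from the patent)
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2470, 0.2435, 0.2616)),
])
train_set = torchvision.datasets.CIFAR10(root="./data", train=True, download=True, transform=transform)
test_set = torchvision.datasets.CIFAR10(root="./data", train=False, download=True, transform=transform)
train_loader = torch.utils.data.DataLoader(train_set, batch_size=64, shuffle=True)
test_loader = torch.utils.data.DataLoader(test_set, batch_size=64, shuffle=False)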
The search space defined in NAS-Bench-201 includes all possible cell structures generated by 4 nodes and 5 candidate operations, yielding a total of 5^6 = 15625 cell candidates. Training logs and performance obtained with the same settings are provided for each candidate structure on three datasets (CIFAR-10, CIFAR-100, and ImageNet downsampled to 16×16 with 120 classes selected).
Training a network:
The DARTS network model adopted in this example is shown in FIG. 2; the network is composed of two types of cells, Normal Cells and Reduction Cells, 20 cells in total.
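A simplified sketch of the stacking pattern is given below. Placing the two Reduction Cells at one third and two thirds of the depth follows the common DARTS convention and is an assumption here; PlaceholderCell merely stands in for a real cell built from the searched structure, and the sequential layout omits the two-input cell connectivity of FIG. 1:

import torch.nn as nn

class PlaceholderCell(nn.Module):
    # Stand-in for a searched Normal/Reduction Cell; a Reduction Cell halves the
    # spatial resolution and doubles the channel count.
    def __init__(self, channels, reduction):
        super().__init__()
        stride = 2 if reduction else 1
        out_channels = channels * 2 if reduction else channels
        self.op = nn.Conv2d(channels, out_channels, kernel_size=3, stride=stride, padding=1)
    def forward(self, x):
        return self.op(x)

def build_network(num_cells=20, channels=36):
    cells, c = [], channels
    for i in range(num_cells):
        reduction = i in (num_cells // 3, 2 * num_cells // 3)
        cells.append(PlaceholderCell(c, reduction))
        if reduction:
            c *= 2
    return nn.Sequential(*cells)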
Evaluation of the searched model:
The trained model is evaluated on the CIFAR-10 dataset. As shown in formula (4), the cross-entropy loss is an important evaluation index in the field of image classification; the lower the index, the better the effect. By computing the prediction scores of 500 samples over 600 epochs, the cross-entropy loss is reduced to 0.12. Compared with other DARTS methods, the present invention dramatically reduces the search time.
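Formula (4) is not reproduced in this text; the standard cross-entropy loss for N-class classification, which it presumably denotes, is:

$$\mathcal{L}_{\mathrm{CE}}=-\frac{1}{|\mathcal{D}|}\sum_{x\in\mathcal{D}}\sum_{i=1}^{N} y_i(x)\,\log p_i(x),$$

where y(x) is the one-hot label of sample x and p(x) is the predicted class-probability vector.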
The foregoing is merely a preferred embodiment of the present invention, and it should be noted that modifications and adaptations to those skilled in the art may be made without departing from the concept of the present invention, and such modifications and adaptations are intended to be comprehended within the scope of the present invention.

Claims (5)

1. A neural network architecture search method, comprising the steps of:
S1, initializing relevant parameters of a DARTS network, the initialized relevant parameters comprising weight parameters, architecture parameters, learning rate, and batch size;
S2, inputting an image training set into the initialized DARTS network, calculating a loss value according to an objective function, calculating the change in network loss from gradient information using a second-order Taylor expansion, and calculating operation saliency using a scoring index based on synaptic saliency;
S3, analyzing and calculating an optimal Cell network structure according to the operation saliency index, the network loss change, and the loss value, and stacking the obtained optimal Cell network structures to form a searched model structure;
wherein R is used as the change in network loss caused by removing an operation:
R = L(D, W, α, S_p) − L(D, W | (1 − α_k^T) S_p)
where D, W, α, S_p, and k are the dataset, network parameters, architecture parameters, search space, and the operation to be removed, respectively; the scoring index based on synaptic saliency is shown in formula (1),
where α is the architecture parameter;
the network loss change R is designed using a second-order Taylor series expansion;
and the CIFAR-10 dataset is used as the training set.
2. The neural network architecture search method of claim 1, wherein the DARTS network structure comprises a Normal Cell structure and a Reduction Cell structure.
3. A neural network architecture search system for use in the method of claim 1, comprising an initialization module, an optimization training module, and a search module;
The initialization module is used for initializing related parameters of the DARTS network;
The optimization training module is used for processing the image training set with the initialized DARTS network, calculating a loss value according to an objective function, calculating the change in network loss from gradient information using a second-order Taylor expansion, calculating operation saliency using a scoring index based on synaptic saliency, and calculating an optimal Cell network structure according to the operation saliency index, the network loss change, and the loss value;
And the search module stacks the obtained optimal Cell network structures to form the searched model structure.
4. A terminal device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the method according to any one of claims 1 to 2 when executing the computer program.
5. A computer readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the steps of the method of any one of claims 1 to 2.
CN202210085746.1A 2022-01-25 2022-01-25 Neural network architecture searching method, system, equipment and readable storage medium Active CN114429197B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210085746.1A CN114429197B (en) 2022-01-25 2022-01-25 Neural network architecture searching method, system, equipment and readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210085746.1A CN114429197B (en) 2022-01-25 2022-01-25 Neural network architecture searching method, system, equipment and readable storage medium

Publications (2)

Publication Number Publication Date
CN114429197A CN114429197A (en) 2022-05-03
CN114429197B true CN114429197B (en) 2024-05-28

Family

ID=81313355

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210085746.1A Active CN114429197B (en) 2022-01-25 2022-01-25 Neural network architecture searching method, system, equipment and readable storage medium

Country Status (1)

Country Link
CN (1) CN114429197B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114998648A (en) * 2022-05-16 2022-09-02 电子科技大学 Performance prediction compression method based on gradient architecture search

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112396179A (en) * 2020-11-20 2021-02-23 浙江工业大学 Flexible deep learning network model compression method based on channel gradient pruning
WO2021057056A1 (en) * 2019-09-25 2021-04-01 华为技术有限公司 Neural architecture search method, image processing method and device, and storage medium
CN113344174A (en) * 2021-04-20 2021-09-03 湖南大学 Efficient neural network structure searching method based on probability distribution

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210089832A1 (en) * 2019-09-19 2021-03-25 Cognizant Technology Solutions U.S. Corporation Loss Function Optimization Using Taylor Series Expansion

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021057056A1 (en) * 2019-09-25 2021-04-01 华为技术有限公司 Neural architecture search method, image processing method and device, and storage medium
CN112396179A (en) * 2020-11-20 2021-02-23 浙江工业大学 Flexible deep learning network model compression method based on channel gradient pruning
CN113344174A (en) * 2021-04-20 2021-09-03 湖南大学 Efficient neural network structure searching method based on probability distribution

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
赖叶静; 郝珊锋; 黄定江. Methods and progress in deep neural network model compression. Journal of East China Normal University (Natural Science Edition). 2020, (Issue 05), full text. *
闵锐. Survey of efficient deep neural networks. Telecommunications Science. 2020, (Issue 04), full text. *

Also Published As

Publication number Publication date
CN114429197A (en) 2022-05-03

Similar Documents

Publication Publication Date Title
CN108985335B (en) Integrated learning prediction method for irradiation swelling of nuclear reactor cladding material
CN113076938B (en) Deep learning target detection method combining embedded hardware information
CN108051660A (en) A kind of transformer fault combined diagnosis method for establishing model and diagnostic method
CN108491226B (en) Spark configuration parameter automatic tuning method based on cluster scaling
CN106203534A (en) A kind of cost-sensitive Software Defects Predict Methods based on Boosting
CN106126589B (en) Resume search method and device
CN111445008A (en) Knowledge distillation-based neural network searching method and system
Nugroho et al. Hyper-parameter tuning based on random search for densenet optimization
CN106681305A (en) Online fault diagnosing method for Fast RVM (relevance vector machine) sewage treatment
CN112287656B (en) Text comparison method, device, equipment and storage medium
CN114429197B (en) Neural network architecture searching method, system, equipment and readable storage medium
CN114609994A (en) Fault diagnosis method and device based on multi-granularity regularization rebalance incremental learning
CN112699957B (en) Image classification optimization method based on DARTS
CN114021425A (en) Power system operation data modeling and feature selection method and device, electronic equipment and storage medium
CN111400964B (en) Fault occurrence time prediction method and device
CN112651499A (en) Structural model pruning method based on ant colony optimization algorithm and interlayer information
Li et al. Pruner to predictor: An efficient pruning method for neural networks compression
Mo et al. Simulated annealing for neural architecture search
CN112434729B (en) Intelligent fault diagnosis method based on layer regeneration network under unbalanced sample
CN111026661B (en) Comprehensive testing method and system for software usability
CN112527996A (en) Sample screening method and system, electronic equipment and storage medium
Zhang et al. Hardware-aware one-shot neural architecture search in coordinate ascent framework
Zhang et al. Design automation for fast, lightweight, and effective deep learning models: A survey
Nong Construction and Simulation of Financial Risk Prediction Model Based on LSTM
Sarah et al. LLaMA-NAS: Efficient Neural Architecture Search for Large Language Models

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant