CN113408721A - Neural network structure searching method, apparatus, computer device and storage medium


Info

Publication number
CN113408721A
Authority
CN
China
Prior art keywords: neural network, network, target, training, decoding
Prior art date
Legal status
Pending
Application number
CN202011567991.3A
Other languages
Chinese (zh)
Inventor
李健
刘勇
王流斌
杨毅果
王巨宏
Current Assignee
Tencent Technology Shenzhen Co Ltd
Institute of Information Engineering of CAS
Original Assignee
Tencent Technology Shenzhen Co Ltd
Institute of Information Engineering of CAS
Priority date
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd and Institute of Information Engineering of CAS
Priority to CN202011567991.3A
Publication of CN113408721A
Status: Pending (current)


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/08 - Learning methods
    • G06N 3/084 - Backpropagation, e.g. using gradient descent
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/24 - Classification techniques
    • G06F 18/241 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computational Linguistics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The application relates to a neural network structure searching method, apparatus, computer device, and storage medium. The method includes: inputting a training neural network structure into a graph neural network to obtain corresponding discrete structural features; inputting the discrete structural features into an encoding network, which encodes them into continuous structural features; decoding the continuous structural features through a decoding network to obtain a reconstructed neural network structure; training the graph neural network, the encoding network, and the decoding network based on the reconstruction loss between the training neural network structure and the reconstructed neural network structure until a training stop condition is met, yielding a target encoding network and a target decoding network, and determining the hidden space corresponding to the target encoding network as the target search space; and searching the target search space according to a target search strategy to obtain target structural features, which are decoded by the target decoding network into a target neural network structure. The method can improve the efficiency of neural network structure search.

Description

Neural network structure searching method, apparatus, computer device and storage medium
Technical Field
The present application relates to the field of artificial intelligence technologies, and in particular, to a neural network structure search method, apparatus, computer device, and storage medium.
Background
With the development of artificial intelligence technology, the design of neural network structures is shifting from manual design to automatic design by machines. Through Neural Network Architecture Search (NAS), developers can automatically search for an optimal neural network architecture.
In conventional approaches, neural network structure search is usually carried out over a discrete space. Searching a discrete space is a black-box optimization problem whose convergence is slow, so the efficiency of neural network structure search is low.
Disclosure of Invention
In view of the above, it is necessary to provide a neural network structure searching method, apparatus, computer device, and storage medium capable of improving neural network structure search efficiency.
A neural network structure searching method, the method comprising:
acquiring a training neural network structure;
inputting the training neural network structure into a graph neural network to obtain discrete structural features corresponding to the training neural network structure;
inputting the discrete structural features into an encoding network, and encoding the discrete structural features into corresponding continuous structural features through the encoding network;
decoding the continuous structural features through a decoding network to obtain a reconstructed neural network structure;
training the graph neural network, the encoding network, and the decoding network based on a reconstruction loss between the training neural network structure and the reconstructed neural network structure until a training stop condition is met, obtaining a target encoding network and a target decoding network, and determining a hidden space corresponding to the target encoding network as a target search space;
and searching the target search space according to a target search strategy to obtain target structural features, and decoding the target structural features through the target decoding network to obtain a target neural network structure.
An apparatus for searching a neural network structure, the apparatus comprising:
a training data acquisition module, configured to acquire a training neural network structure;
a discrete encoding module, configured to input the training neural network structure into a graph neural network to obtain discrete structural features corresponding to the training neural network structure;
a continuous encoding module, configured to input the discrete structural features into an encoding network and encode the discrete structural features into corresponding continuous structural features through the encoding network;
a decoding module, configured to decode the continuous structural features through a decoding network to obtain a reconstructed neural network structure;
a training module, configured to train the graph neural network, the encoding network, and the decoding network based on a reconstruction loss between the training neural network structure and the reconstructed neural network structure until a training stop condition is met, obtain a target encoding network and a target decoding network, and determine a hidden space corresponding to the target encoding network as a target search space;
and a searching module, configured to search the target search space according to a target search strategy to obtain target structural features, and decode the target structural features through the target decoding network to obtain a target neural network structure.
A computer device comprising a memory and a processor, the memory storing a computer program, the processor implementing the following steps when executing the computer program:
acquiring a training neural network structure;
inputting the training neural network structure into a graph neural network to obtain discrete structural features corresponding to the training neural network structure;
inputting the discrete structural features into an encoding network, and encoding the discrete structural features into corresponding continuous structural features through the encoding network;
decoding the continuous structural features through a decoding network to obtain a reconstructed neural network structure;
training the graph neural network, the encoding network, and the decoding network based on a reconstruction loss between the training neural network structure and the reconstructed neural network structure until a training stop condition is met, obtaining a target encoding network and a target decoding network, and determining a hidden space corresponding to the target encoding network as a target search space;
and searching the target search space according to a target search strategy to obtain target structural features, and decoding the target structural features through the target decoding network to obtain a target neural network structure.
A computer-readable storage medium, on which a computer program is stored which, when executed by a processor, implements the steps of:
acquiring a training neural network structure;
inputting the training neural network structure into a graph neural network to obtain discrete structural features corresponding to the training neural network structure;
inputting the discrete structural features into an encoding network, and encoding the discrete structural features into corresponding continuous structural features through the encoding network;
decoding the continuous structural features through a decoding network to obtain a reconstructed neural network structure;
training the graph neural network, the encoding network, and the decoding network based on a reconstruction loss between the training neural network structure and the reconstructed neural network structure until a training stop condition is met, obtaining a target encoding network and a target decoding network, and determining a hidden space corresponding to the target encoding network as a target search space;
and searching the target search space according to a target search strategy to obtain target structural features, and decoding the target structural features through the target decoding network to obtain a target neural network structure.
With the above neural network structure searching method, apparatus, computer device, and storage medium, the training neural network structure is input into the graph neural network, which effectively extracts the structural information of the training neural network and yields the discrete structural features corresponding to the training neural network structure. The discrete structural features are input into the encoding network, which encodes them into continuous structural features, thereby constructing a continuous search space. Decoding and reconstruction are then performed from the continuous structural features through the decoding network to obtain a reconstructed neural network structure. Finally, the graph neural network, the encoding network, and the decoding network are trained based on the reconstruction loss between the training neural network structure and the reconstructed neural network structure until the training stop condition is met, yielding the target encoding network and the target decoding network. Because the target encoding network and the target decoding network are obtained through training on the reconstruction loss, the target encoding network learns the structural features of the training neural networks well, and the target decoding network can accurately decode and reconstruct structural features drawn from the target search space into a target neural network structure; the hidden space of the target encoding network can therefore be determined as the target search space. Since the search space is continuous, gradient-based optimization can be used, which improves the efficiency of neural network structure search.
Drawings
FIG. 1 is a flow diagram illustrating a neural network structure search method according to one embodiment;
FIG. 2 is a flowchart illustrating a neural network structure searching method according to another embodiment;
FIG. 2A is a schematic diagram of a training process in one embodiment;
FIG. 2B is a diagram illustrating the structure of a trained neural network in one embodiment;
FIG. 2C is a schematic diagram of a reconstructed neural network structure obtained by reconstructing the trained neural network structure of FIG. 2B according to an embodiment;
FIG. 3 is a schematic diagram illustrating a process for searching a target search space to obtain target structural features according to an embodiment;
FIG. 4 is an architecture diagram of a neural network structure search methodology in a particular embodiment;
FIG. 5 is a block diagram showing the structure of a neural network structure search device according to an embodiment;
FIG. 6 is a diagram illustrating an internal structure of a computer device according to an embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
Artificial Intelligence (AI) is the theory, method, technology, and application system that uses digital computers, or machines controlled by digital computers, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain optimal results. In other words, artificial intelligence is a comprehensive branch of computer science that attempts to understand the essence of intelligence and to produce new intelligent machines that can react in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that machines can perceive, reason, and make decisions.
Machine Learning (ML) is a multi-disciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithmic complexity theory, and other disciplines. It specializes in studying how computers simulate or implement human learning behavior to acquire new knowledge or skills, and how they reorganize existing knowledge structures to continuously improve their own performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent, and it is applied across all fields of artificial intelligence. Machine learning and deep learning generally include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and learning from demonstrations.
Computer Vision (CV) is the science of how to make machines "see"; more specifically, it uses cameras and computers in place of human eyes to recognize, track, and measure targets, and further processes the resulting images so that they become more suitable for human observation or for transmission to instruments for detection. As a scientific discipline, computer vision studies theories and techniques that attempt to build artificial intelligence systems capable of capturing information from images or multidimensional data. Computer vision technologies generally include image processing, image recognition, image semantic understanding, image retrieval, OCR, video processing, video semantic understanding, video content/behavior recognition, three-dimensional object reconstruction, 3D technologies, virtual reality, augmented reality, and simultaneous localization and mapping, and also include common biometric technologies such as face recognition and fingerprint recognition.
Natural Language Processing (NLP) is an important direction in the fields of computer science and artificial intelligence. It studies the theories and methods that enable effective communication between humans and computers using natural language. Natural language processing is a science that integrates linguistics, computer science, and mathematics. Research in this field involves natural language, i.e., the language people use every day, so it is closely related to the study of linguistics. Natural language processing technologies typically include text processing, semantic understanding, machine translation, question answering, knowledge graphs, and the like.
The solution provided in the embodiments of this application involves artificial intelligence technologies such as machine learning, and is explained in detail through the following embodiments.
In one embodiment, as shown in FIG. 1, a neural network structure searching method is provided. This embodiment is illustrated by applying the method to a terminal; it should be understood that the method may also be applied to a server, or to a system including a terminal and a server, implemented through interaction between the terminal and the server. In this embodiment, the method includes the following steps:
Step 102: obtain a training neural network structure.
Here, the training neural network structure refers to the structural information of a neural network given in the training data set. The structural information of a neural network includes: 1) the network topology, such as the number of layers and the connections between layers; 2) the type of each layer, such as convolutional layers, pooling layers, fully-connected layers, and activation layers; 3) the hyperparameters inside each layer, such as the number of convolution kernels, the number of channels, and the stride in a convolutional layer. A minimal tensor representation of such structural information is sketched below.
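For illustration only (this sketch is not part of the original disclosure), one way such structural information might be represented, assuming Python with PyTorch and a hypothetical five-entry layer-type vocabulary:

import torch

# Hypothetical 3-layer network: input -> conv3x3 -> output, plus a skip
# connection from the input directly to the output layer.
LAYER_TYPES = ["input", "conv3x3", "pool3x3", "fc", "output"]  # assumed vocabulary

# One-hot layer types, one row per layer (input, conv3x3, output).
node_types = torch.tensor([
    [1., 0., 0., 0., 0.],
    [0., 1., 0., 0., 0.],
    [0., 0., 0., 0., 1.],
])

# adjacency[u][v] = 1 means a directed edge from layer u to layer v.
adjacency = torch.tensor([
    [0., 1., 1.],
    [0., 0., 1.],
    [0., 0., 0.],
])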
Specifically, the terminal may acquire the neural network structure information from the training data set, and use the acquired neural network structure information as a training neural network structure.
In one embodiment, the training data set may be a data set that is pre-stored locally by the terminal. In another embodiment, the terminal may obtain the training data set from another computer device, such as a server, via a network or the like.
In a particular embodiment, the training data set may be an existing data set, such as NAS-Bench-101, NAS-Bench-201, and so on.
It can be understood that different task scenarios call for different types of training neural network structures. For example, when a suitable neural network structure needs to be searched for an image recognition task in computer vision, the selected training neural network structures are neural network structures for image recognition. In another task scenario, the neural network structures used will differ from those of the image recognition task, so a training neural network structure of the type corresponding to that task scenario must be selected for learning.
Step 104: input the training neural network structure into a graph neural network to obtain discrete structural features corresponding to the training neural network structure.
Here, a Graph Neural Network (GNN) is a neural network used for processing graph data. The discrete structural features corresponding to the training neural network structure are a representation of the neural network's structural features in discrete form.
Specifically, a neural network structure may be regarded as a Directed Acyclic Graph (DAG): each layer of the neural network is a node in the DAG, and each connection between layers is an edge, so feature extraction can be performed on the neural network structure by the graph neural network. After the terminal acquires the training neural network structure, it inputs the structure into the graph neural network, which extracts the network topology and node content information from the training neural network's DAG. All nodes in the DAG are encoded in order, starting from the input node; the encoding of each node combines the encodings of all of its predecessor nodes (i.e., the nodes connected to the current node by directed edges, carrying the network topology information) with the current node type (the node content information), and subsequent nodes are encoded recursively in the same way. Because the encoding of the output node incorporates the topology and node information of the entire training neural network DAG, it is taken as the encoding of the whole network. Meanwhile, the structural feature representations of different neural networks are distributed discretely, so the resulting encoded representation of the training neural network is a discrete expression.
For example, suppose a training neural network is a three-layer neural network corresponding to three nodes X1, X2, and X3, with topology X1 → X2 → X3 and X1 → X3. When encoding this network with the graph neural network, an aggregation function A aggregates the encoded representations of all predecessor nodes, and an update function U encodes the current node from the aggregated predecessor representation and the current node type. The representation proceeds as follows: 1) X1 has no predecessor, so only its node type T1 is considered, and X1 is encoded as X1 = U(T1, 0); 2) X2's predecessor is X1 and its node type is T2, so X2 = U(T2, A(X1)); 3) X3's predecessors are X1 and X2 and its node type is T3, so X3 = U(T3, A(X1, X2)).
In one embodiment, before the training neural network structure is input into the graph neural network, the graph neural network needs to be initialized, which includes determining its model structure information and model parameters. In a particular embodiment, the graph neural network may be a convolutional neural network. Since a convolutional neural network is a multi-layer neural network in which each layer consists of multiple two-dimensional planes and each plane consists of multiple independent neurons, it is necessary to determine which layers it contains (e.g., convolutional layers, pooling layers), the connection order among the layers, and which parameters each layer includes (e.g., weights, bias terms, convolution stride).
In one particular embodiment, the graph neural network includes two parts: 1) an aggregation function A, used to combine the encoded representations of all predecessor nodes, fitted with a gated sum; and 2) an update function U, used to produce the current node's representation from its node type and the predecessor representation, fitted with a Gated Recurrent Unit (GRU). The graph neural network takes the network topology information and all node information of the training neural network structure as input and produces the discrete structural features, expressed as follows:
h_v = U(T_v, A({h_u : u → v}))
where h_v is the feature representation of node v, and h_u is the feature representation of node u, a predecessor of node v. Network A aggregates the incoming-edge information of node v, that is, the feature representations of its predecessor nodes; network U then updates the feature representation of node v according to its node type. T_v is a vector representation of the node type (i.e., the layer type); for example, (0,0,1,0,0) indicates that, out of five layer types in total, this layer belongs to the third type, for example a convolutional layer.
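As an illustrative sketch only (assuming PyTorch; the class and parameter names are hypothetical, not from the original disclosure), the recursion h_v = U(T_v, A({h_u : u → v})) with a gated sum for A and a GRU cell for U might look like:

import torch
import torch.nn as nn

class DagEncoder(nn.Module):
    """Illustrative graph neural network: a gated sum fits the aggregation
    function A, and a GRU cell fits the update function U."""

    def __init__(self, num_types: int, hidden: int):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(hidden, hidden), nn.Sigmoid())
        self.map = nn.Linear(hidden, hidden)
        self.update = nn.GRUCell(num_types, hidden)   # U(T_v, A(...))
        self.hidden = hidden

    def forward(self, node_types: torch.Tensor, adjacency: torch.Tensor):
        # node_types: (n, num_types) one-hot rows; adjacency[u][v] = 1 for u -> v.
        # Nodes are assumed to be in topological order, output node last.
        hs = []
        for v in range(node_types.shape[0]):
            preds = adjacency[:, v].nonzero().flatten()
            if preds.numel() == 0:
                agg = torch.zeros(1, self.hidden)     # input node: no predecessors
            else:
                hp = torch.stack([hs[int(u)] for u in preds])
                agg = (self.gate(hp) * self.map(hp)).sum(0, keepdim=True)  # gated sum
            hs.append(self.update(node_types[v:v + 1], agg)[0])
        return hs[-1]  # the output node's code represents the whole DAG

With the tensors from the earlier representation sketch, DagEncoder(num_types=5, hidden=64)(node_types, adjacency) would return the output-node encoding representing the whole DAG.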
Further, the network parameters of the graph neural network may be initialized. In practice, the individual network parameters of the graph neural network may be initialized with many different small random numbers. Small random numbers ensure that the network does not enter a saturated state because of excessively large weights, which would cause training to fail; using different random numbers ensures that the network can learn normally.
Step 106: input the discrete structural features into the encoding network, and encode the discrete structural features into corresponding continuous structural features through the encoding network.
Here, an encoding network is a machine learning module used for encoding, where encoding is the process of converting information from one form or format into another. A continuous structural feature is a feature that appears as a continuous probability distribution in the hidden space. It can be understood that the hidden space in which the continuous structural features lie is a continuous space.
Specifically, after the terminal inputs the obtained discrete structural features into the encoding network, the encoding network encodes them into continuous structural features; at this point, the hidden space corresponding to the encoder is a continuous space.
In a particular embodiment, the encoding network may be the encoding part of a Variational Auto-Encoder (VAE). A variational auto-encoder, a form of deep generative model, is a generative network structure based on Variational Bayes (VB) inference proposed by Kingma et al. in 2014. Unlike a conventional auto-encoder, which describes the hidden space with a single value, it describes observations of the hidden space in a probabilistic manner.
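A minimal sketch of such a VAE-style encoding head (assuming PyTorch; the names are hypothetical), mapping a discrete structural feature to a Gaussian over the continuous hidden space:

import torch
import torch.nn as nn

class LatentEncoder(nn.Module):
    """Illustrative VAE-style encoding head: maps the discrete structural
    feature h to a Gaussian (mu, logvar) over the continuous hidden space."""

    def __init__(self, feat_dim: int, latent_dim: int):
        super().__init__()
        self.mu = nn.Linear(feat_dim, latent_dim)
        self.logvar = nn.Linear(feat_dim, latent_dim)

    def forward(self, h: torch.Tensor):
        return self.mu(h), self.logvar(h)   # parameters of N(mu, exp(logvar))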
Step 108: decode the continuous structural features through the decoding network to obtain a reconstructed neural network structure.
Here, a decoding network is a machine learning module used for decoding. Decoding is the inverse of encoding: it restores data expressed in another form to the original form or format, reconstructing data with the same form or format as the original.
Specifically, the terminal can decode and reconstruct through the decoding network based on the continuous structural features to obtain a reconstructed neural network structure.
In one embodiment, the terminal may sample from the continuous structural features, input the sampled points into the decoding network, and decode and reconstruct through the decoding network to obtain a reconstructed neural network structure.
In another embodiment, since a continuous structural feature is a continuous probability distribution in the hidden space, the terminal may instead compute the mean of that probability distribution, input the result into the decoding network, and decode and reconstruct through the decoding network to obtain a reconstructed neural network structure. Both options are sketched below.
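An illustrative sketch of the two embodiments (assuming PyTorch; decoder is a hypothetical decoding network, not an API from the original disclosure):

import torch

def decode_structure(decoder, mu, logvar, use_mean=False):
    """Obtain the decoder input either by sampling the continuous structural
    feature (first embodiment) or by taking its mean (second embodiment)."""
    if use_mean:
        z = mu                                # mean of the distribution
    else:
        std = torch.exp(0.5 * logvar)         # reparameterized sample
        z = mu + std * torch.randn_like(std)
    return decoder(z)                         # reconstructed structure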
Step 110: train the graph neural network, the encoding network, and the decoding network based on the reconstruction loss between the training neural network structure and the reconstructed neural network structure until a training stop condition is met, obtaining a target encoding network and a target decoding network, and determine the hidden space corresponding to the target encoding network as the target search space.
Here, the reconstruction loss reflects the difference between the reconstructed neural network structure and the training neural network structure: the smaller the difference, the smaller the reconstruction loss, and the closer the reconstructed neural network is to the training neural network. The training stop condition includes, but is not limited to, the training duration exceeding a preset time threshold, the number of training iterations exceeding a preset number, or the reconstruction loss falling below a preset threshold.
In one embodiment, the reconstruction loss between the training neural network structure and the reconstructed neural network structure may be the reconstruction loss of the neural network's hidden layers; in another embodiment, it may be the reconstruction loss of the neural network's connecting edges; in other embodiments, it may be the sum of the hidden-layer reconstruction loss and the connecting-edge reconstruction loss.
Specifically, the terminal determines the reconstruction loss between the training neural network structure and the reconstructed neural network structure from the difference between the two, back-propagates the reconstruction loss to adjust the network parameters of the graph neural network, the encoding network, and the decoding network, and ends training when the training stop condition is met, obtaining the trained target encoding network, target decoding network, and target graph neural network.
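For illustration, one joint update step might look like the following sketch (assuming PyTorch; gnn, encoder, decoder, arch, and loss_fn are hypothetical names, with loss_fn standing for the reconstruction loss defined later in this description):

import torch

def train_step(gnn, encoder, decoder, arch, loss_fn, optimizer):
    """One joint update of the three networks by back-propagating the
    reconstruction loss; `arch` is a training neural network structure."""
    h = gnn(arch.node_types, arch.adjacency)   # discrete structural feature
    mu, logvar = encoder(h)                    # continuous structural feature
    std = torch.exp(0.5 * logvar)
    z = mu + std * torch.randn_like(std)       # sample from the hidden space
    recon = decoder(z)                         # reconstructed structure
    loss = loss_fn(arch, recon)                # e.g. L_n + L_e, defined below
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

The optimizer would be built over the parameters of all three networks, e.g. torch.optim.Adam(list(gnn.parameters()) + list(encoder.parameters()) + list(decoder.parameters())).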
Step 112: search the target search space according to the target search strategy to obtain target structural features, and decode the target structural features through the target decoding network to obtain the target neural network structure.
Here, the target search strategy is the search strategy used to search for a neural network structure. Different target search strategies may be defined according to different search requirements; for example, when search efficiency is paramount, the strategy may be a random search of the target search space.
In one embodiment, the target search strategy is a strategy for searching for an optimal neural network structure. Different target search strategies correspond to different evaluation indexes of the optimal structure being sought. For example, the optimal neural network may be the one with the lowest model complexity, in which case the target search strategy is to search by minimizing model complexity; or the optimal neural network may be the one with the best generalization performance, in which case the strategy is to search by maximizing generalization performance. The target structural features are the structural features of the optimal neural network structure found under the target search strategy.
Specifically, the terminal may search iteratively in the target search space according to the target search strategy until the structural features of the optimal neural network structure are found; these are the target structural features. The searched target structural features are input into the trained target decoding network; because the target decoding network was trained on the reconstruction loss, it can accurately reconstruct the neural network structure corresponding to the target structural features, yielding the target neural network structure. The resulting target neural network structure can be used for the machine learning task of the current scenario; for example, if the training neural network structures in the current scenario are neural network structures for liveness detection, the target neural network structure obtained can be used for liveness detection.
In one embodiment, when searching the target search space, the terminal may first sample a point in the target search space and then, starting from that point, iteratively search for the neural network structural features matching the target search strategy by a gradient method until the target structural features are obtained.
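A minimal sketch of such a gradient-based search in the hidden space (assuming PyTorch; score_fn is a hypothetical differentiable objective derived from the target search strategy, e.g. predicted generalization performance minus predicted model complexity as described later):

import torch

def latent_search(z0, score_fn, steps=100, lr=0.1):
    """Gradient-based search in the continuous hidden space: starting from
    a sampled point z0, ascend the (scalar) predicted score."""
    z = z0.clone().requires_grad_(True)
    for _ in range(steps):
        score = score_fn(z)                    # e.g. f_perf(z) - f_comp(z)
        grad, = torch.autograd.grad(score, z)
        with torch.no_grad():
            z += lr * grad                     # ascend toward a better structure
    return z.detach()                          # target structural feature

The returned point would then be passed to the target decoding network to reconstruct the target neural network structure.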
In the above neural network structure searching method, the training neural network structure is input into the graph neural network, which effectively extracts the structural information of the training neural network and yields the discrete structural features corresponding to the training neural network structure. The discrete structural features are input into the encoding network, which encodes them into continuous structural features, thereby constructing a continuous search space. Decoding and reconstruction are performed from the continuous structural features through the decoding network to obtain a reconstructed neural network structure. Finally, the graph neural network, the encoding network, and the decoding network are trained based on the reconstruction loss between the training neural network structure and the reconstructed neural network structure until the training stop condition is met, yielding the target encoding network and the target decoding network. Because the target encoding network and the target decoding network are obtained through training on the reconstruction loss, the target encoding network learns the structural features of the training neural networks well, and the target decoding network can accurately decode and reconstruct structural features drawn from the target search space into a target neural network structure; the hidden space of the target encoding network can therefore be determined as the target search space. Since the search space is continuous, gradient-based optimization can be used, which improves the efficiency of neural network structure search.
In one embodiment, before the graph neural network, the encoding network, and the decoding network are trained based on the reconstruction loss between the training neural network structure and the reconstructed neural network structure, the neural network structure searching method further includes: acquiring at least one evaluation index label value corresponding to the training neural network structure; predicting from the continuous structural features corresponding to the training neural network structure with each evaluation index prediction network, to obtain the evaluation index training value corresponding to each evaluation index prediction network; and determining the evaluation index loss corresponding to each evaluation index label value from that label value and the corresponding evaluation index training value. Each evaluation index loss is used in training to obtain the target evaluation index prediction network corresponding to that evaluation index.
Here, an evaluation index is an index used to evaluate the quality of a neural network structure. Evaluation indexes include, but are not limited to, generalization performance, model complexity, and resource constraints. Generalization performance is the performance of a neural network on unknown data; it may be, for example, model accuracy, which describes how accurate the searched neural network structure is on the corresponding machine learning task. For instance, for a neural network structure used for image classification, the model accuracy describes its classification accuracy on the image classification task. Model complexity characterizes the complexity of the model structure and may be, for example, the parameter count or the training time.
The evaluation index label value is the value of an evaluation index of a training neural network given in the training data set; it serves as the training label, i.e., the expected output value, during training. An evaluation index prediction network is a network that predicts the evaluation index value of an unknown neural network structure. In one embodiment, the evaluation index label values include at least one of model complexity and generalization performance.
It can be understood that, depending on the search strategy for the neural network structure, there may be one or more evaluation indexes, and different evaluation indexes correspond to different evaluation index prediction networks, so there may likewise be one or more prediction networks. For example, when the evaluation index is model complexity, the corresponding prediction network is a model complexity prediction network, used to predict the model complexity of a neural network structure; when the evaluation index is generalization performance, the corresponding prediction network is a generalization performance prediction network, used to predict the generalization performance of a neural network structure.
Specifically, for the evaluation index prediction networks to accurately predict the evaluation index values of searched neural network structures, the terminal needs to train them. In this application, the search takes place in the hidden space corresponding to the encoding network, and ultimately the evaluation index prediction networks must predict the evaluation indexes corresponding to neural network structural features in that hidden space; the terminal therefore trains on the hidden space. The terminal predicts from the continuous structural features, obtained by encoding the training neural network into the hidden space, using each evaluation index prediction network, and obtains the evaluation index value output by each network; this real value output during training is called the evaluation index training value. The purpose of training an evaluation index prediction network is to make its real output fit the expected output value. During fitting, the parameters of each evaluation index prediction network are adjusted based on the evaluation index loss determined from the difference between the expected output value (the evaluation index label value) and the real output value (the evaluation index training value), yielding the corresponding target evaluation index prediction network, i.e., the evaluation index prediction network obtained when training completes.
In one embodiment, the terminal may sample from the continuous structural features, input the sampling results into each evaluation index prediction network, and predict the corresponding evaluation index values from the samples.
In another embodiment, the terminal may instead compute the mean of the continuous structural features, input the result into each evaluation index prediction network, and predict the corresponding evaluation index values from that mean.
In a specific embodiment, to emphasize the positive correlation between the evaluation index loss and the difference between the expected and real output values, the squared loss between the expected output value and the real output value can be used as the evaluation index loss.
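For illustration, one evaluation index prediction network might be a small MLP over the hidden space (a sketch under assumed names, not the original disclosure):

import torch
import torch.nn as nn

class IndexPredictor(nn.Module):
    """Illustrative evaluation index prediction network: a small MLP that
    maps a point in the hidden space to a scalar index value."""

    def __init__(self, latent_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, z: torch.Tensor):
        return self.net(z).squeeze(-1)   # scalar evaluation index value

The squared loss of the embodiment above would then be (IndexPredictor(...)(z) - label) ** 2.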
In one embodiment, as shown in FIG. 2, a neural network structure searching method is provided, including the following steps 202 to 216:
Step 202: obtain a training neural network structure.
Step 204: input the training neural network structure into the graph neural network to obtain discrete structural features corresponding to the training neural network structure.
Step 206: input the discrete structural features into the encoding network, and encode the discrete structural features into corresponding continuous structural features through the encoding network.
Step 208: decode the continuous structural features through the decoding network to obtain a reconstructed neural network structure.
Step 210: obtain at least one evaluation index label value corresponding to the training neural network structure.
Step 212: predict from the continuous structural features corresponding to the training neural network structure with each evaluation index prediction network, to obtain the evaluation index training value corresponding to each evaluation index prediction network.
Step 214: determine the evaluation index loss corresponding to each evaluation index label value from that label value and the corresponding evaluation index training value.
For steps 202 to 214, reference may be made to the descriptions in the above embodiments, which are not repeated here.
Step 216: jointly train the graph neural network, each evaluation index prediction network, the encoding network, and the decoding network based on the reconstruction loss between the training neural network structure and the reconstructed neural network structure together with each evaluation index loss, until a training stop condition is met, obtaining the target evaluation index prediction networks, the target encoding network, and the target decoding network.
It can be understood that, to search for the optimal neural network structure in the continuous search space, the graph neural network, the encoding network, the decoding network, and at least one evaluation index prediction network must all be trained. A supervised training loss function can therefore be constructed for training these networks jointly, and training them jointly according to this loss function improves training efficiency.
Specifically, the terminal can construct the supervised training loss function from the reconstruction loss between the training neural network structure and the reconstructed neural network structure together with each evaluation index loss, and then jointly train the graph neural network, each evaluation index prediction network, the encoding network, and the decoding network with it. The training process adjusts the network parameters of all of these networks until the training stop condition is met; at that point their network parameters are fixed, and the networks are taken as the target graph neural network, the target evaluation index prediction networks, the target encoding network, and the target decoding network.
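A sketch of such a supervised training loss (hypothetical names; recon_loss is the reconstruction loss defined below, and each predictor is assumed to return a scalar tensor):

def supervised_loss(recon_loss, z, predictors, labels):
    """Supervised training loss: reconstruction loss plus a squared loss per
    evaluation index. `predictors` maps an index name (e.g. "complexity",
    "accuracy") to its prediction network; `labels` holds the label values."""
    loss = recon_loss
    for name, predictor in predictors.items():
        loss = loss + (predictor(z) - labels[name]) ** 2   # squared index loss
    return loss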
FIG. 2A is a schematic diagram of the training process in one embodiment. In this embodiment, the graph neural network and the encoding network are collectively referred to as the encoding network. Referring to FIG. 2A, the terminal first inputs the training neural network structure into the encoding network, which encodes it into a continuous structural feature in the hidden space. It then samples from the continuous structural feature and inputs the sampling result into the evaluation index prediction network, so that the evaluation index prediction network can fit the evaluation indexes of the training neural network structure, while the decoding network restores the continuous structural feature into a concrete neural network structure, that is, the reconstructed neural network structure.
Referring to FIG. 2B, a diagram of a training neural network structure in one embodiment is shown. In this embodiment, the training neural network structure consists of one input layer, five convolutional layers, a 3x3 max pooling layer, and one output layer, where the five convolutional layers are, from top to bottom, a 5x5 depthwise separable convolutional layer, a 3x3 depthwise separable convolutional layer, a 3x3 normal convolutional layer, a 3x3 normal convolutional layer, and a 3x3 depthwise separable convolutional layer.
FIG. 2C is a schematic diagram of the reconstructed neural network structure obtained by reconstructing the training neural network structure of FIG. 2B in one embodiment. The reconstructed neural network structure consists of an input layer, four convolutional layers, a 3x3 average pooling layer, a 3x3 max pooling layer, and an output layer, where the four convolutional layers are, from top to bottom, a 5x5 depthwise separable convolutional layer, a 5x5 normal convolutional layer, a 3x3 depthwise separable convolutional layer, and a 3x3 normal convolutional layer.
It can be understood that the neural network structures shown in FIG. 2B and FIG. 2C are only examples; in a specific application, training neural network structures of different forms are selected according to the requirements of the application scenario, and the corresponding reconstructed neural network structures are reconstructed.
It can be understood that, in other embodiments, the terminal may first train the graph neural network, the encoding network, and the decoding network based on the reconstruction loss between the training neural network structure and the reconstructed neural network structure, fix the parameters of the resulting target graph neural network, target encoding network, and target decoding network, and then train the evaluation index prediction networks on the training sample set. Specifically, the terminal can input a training neural network structure from the training sample set into the target graph neural network to obtain the corresponding discrete structural features, input those features into the target encoding network to obtain the corresponding continuous structural features, obtain a real output value from the continuous structural features through an evaluation index prediction network, determine the evaluation index loss from the difference between the real output value and the expected output value, back-propagate this loss to adjust the model parameters of the evaluation index prediction network, and end training when the training stop condition is met, obtaining the target evaluation index prediction network.
In the above embodiments, by training the evaluation index prediction networks, the searched neural network structures can be evaluated by prediction during the model search stage, so that a neural network structure optimizing the evaluation indexes can be found.
In one embodiment, as shown in FIG. 3, searching the target search space according to the target search strategy to obtain the target structural features includes:
Step 302: sample in the target search space to obtain a sample point.
In one embodiment, the terminal may perform random sampling in the target search space to obtain sample points.
In another embodiment, where the encoding network is the encoding network of a variational auto-encoder and the decoding network is its decoding network, the terminal can sample from the prior distribution of the variational auto-encoder to obtain a sample point.
Step 304: search for neural network structural features starting from the sample point.
Step 306: input the searched neural network structural features into each trained target evaluation index prediction network to obtain the evaluation index prediction value corresponding to each target evaluation index prediction network.
Step 308: obtain a comprehensive evaluation value from the evaluation index prediction values according to the target search strategy.
Specifically, the terminal begins searching for a neural network structure from the sample point and inputs the searched neural network structural features into each trained target evaluation index prediction network, which predicts the evaluation index prediction value corresponding to that network. The terminal then combines the evaluation index prediction values according to the target search strategy to obtain a comprehensive evaluation value, which comprehensively evaluates the quality of the searched neural network structure.
It can be understood that when there is only one evaluation index, there is only one corresponding evaluation index prediction network and only one evaluation index prediction value for the searched neural network structure; in that case the evaluation index prediction value is itself the comprehensive evaluation value.
In one embodiment, the evaluation index label values include model complexity and generalization performance. The trained target evaluation index prediction network corresponding to model complexity is the target model complexity prediction network, and the one corresponding to generalization performance is the target generalization performance prediction network. When the terminal searches in the target search space, it can input the searched neural network structural features into both networks: the target model complexity prediction network predicts the model complexity of the searched features to give a model complexity prediction value, and the target generalization performance prediction network predicts their generalization performance to give a generalization performance prediction value.
Step 310: iteratively search for neural network structural features along the direction that optimizes the comprehensive evaluation value until the target structural features are obtained.
Specifically, the terminal can start from the sample point and iteratively search for neural network structural features in the direction that optimizes the comprehensive evaluation value using a gradient method until the target structural features are obtained. The gradient method may be gradient ascent or gradient descent, as determined by the target search strategy; likewise, whether the comprehensive evaluation value is to be minimized or maximized is determined by the target search strategy. It can be understood that, in other embodiments, other methods may be used to optimize the comprehensive evaluation value, and this application is not limited in this respect.
In one embodiment, the target search strategy is to maximize generalization performance while reducing model complexity. After obtaining the generalization performance prediction value and the model complexity prediction value, the terminal can subtract the model complexity prediction value from the generalization performance prediction value to obtain the comprehensive evaluation value, and then iteratively search for neural network structural features along the direction that maximizes the comprehensive evaluation value using gradient ascent until the target structural features are obtained. The specific formula is:
f(s) = f_perf(s) - f_comp(s)
where f(s) is the comprehensive evaluation value, f_perf(s) is the generalization performance prediction value, and f_comp(s) is the model complexity prediction value. In this embodiment, by optimizing the comprehensive evaluation value, a neural network structure with both excellent generalization performance and low model complexity can be found.
In another embodiment, the target search strategy is to minimize model complexity while improving generalization performance. After obtaining the generalization performance prediction value and the model complexity prediction value, the terminal can subtract the generalization performance prediction value from the model complexity prediction value to obtain the comprehensive evaluation value, and then iteratively search for neural network structural features along the direction that minimizes the comprehensive evaluation value using gradient descent until the target structural features are obtained. In this embodiment, by optimizing the comprehensive evaluation value, a neural network structure with low model complexity and excellent generalization performance can likewise be found.
In one embodiment, training the graph neural network, the encoding network, and the decoding network based on the reconstruction loss between the training neural network structure and the reconstructed neural network structure includes: determining the hidden-layer reconstruction loss from the hidden layers of the training neural network structure and the hidden layers of the reconstructed neural network structure; and training the graph neural network, the encoding network, and the decoding network based on the hidden-layer reconstruction loss.
Specifically, the terminal determines the reconstruction loss of the hidden layer with reference to the following formula:
$$L_n = \sum_i \text{CE}(T_i, T_i')$$
where L_n is the reconstruction loss of the hidden layer, T_i represents the layer type of the i-th layer of the training neural network structure, which may be, for example, (0,0,1,0,0), T_i' represents the layer type of the i-th layer of the reconstructed neural network structure, and CE denotes that the difference between the two layer types is measured by cross-entropy.
Specifically, after calculating the reconstruction loss of the hidden layer, the terminal can adjust the network parameters of the graph neural network, the encoding network and the decoding network by backpropagating this loss.
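As a minimal sketch of this loss under the stated definitions (one-hot layer types compared by cross-entropy, summed over layers), assuming PyTorch and illustrative tensor shapes:

```python
import torch
import torch.nn.functional as F

def hidden_layer_reconstruction_loss(layer_type_logits, layer_type_onehot):
    """L_n = sum_i CE(T_i, T_i'): cross-entropy between each one-hot layer
    type T_i of the training structure, e.g. (0,0,1,0,0), and the decoder's
    reconstructed layer-type scores T_i'. Shapes are illustrative:
    (num_layers, num_layer_types).
    """
    targets = layer_type_onehot.argmax(dim=1)  # one-hot -> class index
    return F.cross_entropy(layer_type_logits, targets, reduction="sum")
```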
In one embodiment, before training the graph neural network, the encoding network and the decoding network based on the reconstruction loss of the hidden layer, the method further comprises: determining the reconstruction loss of the connecting edges according to the connecting edges of the training neural network structure and the connecting edges of the reconstructed neural network structure. In that case, training the graph neural network, the encoding network and the decoding network based on the reconstruction loss of the hidden layer comprises: training the graph neural network, the encoding network and the decoding network based on the reconstruction loss of the hidden layer and the reconstruction loss of the connecting edges.
Specifically, the terminal computes the reconstruction loss of the connecting edges with reference to the following formula:
$$L_e = \sum_i \text{CE}(E_i, E_i')$$
where L_e is the reconstruction loss of the connecting edges, E_i represents all connecting-edge types of the i-th node of the training neural network structure, which may be, for example, (1,1,0,0,0,1), E_i' represents the connecting-edge types of the i-th node of the reconstructed neural network structure, and CE denotes that the difference between the two is measured by cross-entropy.
Specifically, after the reconstruction loss of the hidden layer and the reconstruction loss of the connecting edges are calculated, the two losses are summed, and the network parameters of the graph neural network, the encoding network and the decoding network are adjusted by backpropagating the summed loss.
It is understood that, in some other embodiments, after calculating the reconstruction loss of the connecting edges, the terminal may adjust the network parameters of the graph neural network, the encoding network and the decoding network by backpropagating that loss alone.
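A companion sketch for the connecting-edge term follows. Since E_i in the example above is a multi-hot vector, reading the per-node cross-entropy as a binary cross-entropy over edge slots is an assumption of this sketch, which also reuses hidden_layer_reconstruction_loss from the previous sketch:

```python
import torch
import torch.nn.functional as F

def edge_reconstruction_loss(edge_logits, edge_multi_hot):
    """L_e = sum_i CE(E_i, E_i'): each node's connecting edges are a
    multi-hot vector, e.g. E_i = (1,1,0,0,0,1), so each edge slot is treated
    as a binary presence indicator. Shapes are illustrative:
    (num_nodes, max_edges).
    """
    return F.binary_cross_entropy_with_logits(
        edge_logits, edge_multi_hot.float(), reduction="sum")

def total_reconstruction_loss(layer_logits, layer_onehot,
                              edge_logits, edge_multi_hot):
    # Sum the hidden-layer and connecting-edge losses before backpropagation,
    # as in the embodiment above.
    return (hidden_layer_reconstruction_loss(layer_logits, layer_onehot)
            + edge_reconstruction_loss(edge_logits, edge_multi_hot))
```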
In one embodiment, the encoding network is the encoding network of a variational self-encoder and the decoding network is the decoding network of a variational self-encoder. Training the graph neural network, the encoding network and the decoding network based on the reconstruction loss between the training neural network structure and the reconstructed neural network structure then comprises: training the graph neural network, the encoding network and the decoding network based on the relative entropy and the reconstruction loss between the training neural network structure and the reconstructed neural network structure.
Relative entropy, also called Kullback-Leibler divergence (KL divergence) or information divergence, is an asymmetric measure of the difference between two probability distributions.
Specifically, the variational self-encoder encodes each input as a continuous probability distribution, which in practice is expected to be a standard normal distribution with mean 0 and variance 1, so that the resulting hidden space has good continuity. To make the distribution produced by the encoder fit the standard normal distribution, a loss function can be constructed during training from a reconstruction term and a regularization term: the reconstruction term pushes the encoding-decoding scheme toward high fidelity, while the regularization term organizes the hidden space by pulling the distribution returned by the encoding network toward the standard normal distribution. Specifically, in this embodiment, the reconstruction term is the reconstruction loss between the training neural network structure and the reconstructed neural network structure, and the regularization term is the KL divergence between the returned distribution and the standard Gaussian.
In a specific embodiment, the terminal may construct the loss function from the KL divergence and the reconstruction loss, train based on this loss function to adjust the network parameters of the graph neural network, the encoding network and the decoding network, and optimize the loss function by gradient descent, the optimization target being to minimize the loss function.
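For a diagonal-Gaussian encoder, the KL divergence to the standard normal prior has a closed form, so the loss of this embodiment can be sketched as follows (a minimal illustration, assuming PyTorch and an encoder that outputs a mean and a log-variance):

```python
import torch

def kl_to_standard_normal(mu, logvar):
    """Closed-form KL divergence between the encoder's diagonal Gaussian
    N(mu, diag(exp(logvar))) and the standard normal prior N(0, I): the
    regularization term that keeps the hidden space close to the prior.
    """
    return -0.5 * torch.sum(1.0 + logvar - mu.pow(2) - logvar.exp())

def vae_loss(reconstruction_loss, mu, logvar):
    # reconstruction term + KL regularization term, minimized by gradient descent
    return reconstruction_loss + kl_to_standard_normal(mu, logvar)
```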
In this embodiment, a variational self-encoder is adopted as the encoding and decoding networks, and training is performed on both the relative entropy and the reconstruction loss. The resulting encoding and decoding networks therefore perform well, and the resulting hidden space has good continuity, which makes the search converge more easily and further improves search efficiency.
In a specific embodiment, the network architecture used in the neural network structure searching method provided in the embodiments of the present application is shown in fig. 4. Referring to fig. 4, the architecture comprises an encoder, a predictor, a decoder and a hidden space. The encoder represents the training neural network structure with a graph neural network (GNN), expressing its structure information in discrete form, and maps this discrete expression (i.e., the discrete structural features) into a suitable hidden space using the encoding part of a variational self-encoder (VAE); this mapping is learned during training by minimizing the KL divergence. The predictor comprises a generalization performance predictor (i.e., the generalization performance prediction network) and a model complexity predictor (i.e., the model complexity prediction network); these two regression models fit the latent relation between the continuous expression of a neural network structure (i.e., the continuous structural features) and its generalization performance and model complexity respectively, and in the model search stage an optimization algorithm is run on the predictor to obtain the continuous expression of the optimal neural network. The decoder adopts the decoding part of the variational self-encoder (VAE) to decode structural features in the continuous space into a neural network structure; during training it must minimize the reconstruction error between the decoded structure and the training neural network, and in the model search stage it decodes the searched neural network structural features. With this architecture, a continuous search space can be constructed for searching the neural network structure, finally yielding a neural network structure with excellent generalization performance and low model complexity.
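The following is a highly simplified sketch of how the components of fig. 4 fit together. The linear layers stand in for the real GNN, VAE and predictor networks, and all dimensions and names are illustrative assumptions, not the patent's concrete design:

```python
import torch
import torch.nn as nn

class StructureSearchModel(nn.Module):
    """Minimal sketch of the fig. 4 architecture: GNN encoder, VAE encoding
    and decoding parts, and two regression heads on the continuous expression.
    """
    def __init__(self, feature_dim=128, latent_dim=32):
        super().__init__()
        self.gnn = nn.Linear(feature_dim, feature_dim)      # stand-in for a real GNN
        self.enc_mu = nn.Linear(feature_dim, latent_dim)    # VAE encoding part
        self.enc_logvar = nn.Linear(feature_dim, latent_dim)
        self.decoder = nn.Linear(latent_dim, feature_dim)   # VAE decoding part
        self.perf_predictor = nn.Linear(latent_dim, 1)      # generalization head
        self.comp_predictor = nn.Linear(latent_dim, 1)      # complexity head

    def forward(self, structure_features):
        h = torch.relu(self.gnn(structure_features))        # discrete features
        mu, logvar = self.enc_mu(h), self.enc_logvar(h)     # continuous features
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization
        reconstruction = self.decoder(z)
        return (reconstruction, mu, logvar,
                self.perf_predictor(z), self.comp_predictor(z))
```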
In a specific embodiment, a neural network structure searching method is provided, which includes the following steps:
First, the training process includes the following steps:
1. Obtaining a training neural network structure, together with the model complexity label value and the generalization performance label value corresponding to it;
2. Inputting the training neural network structure into the graph neural network to obtain the discrete structural features corresponding to the training neural network structure;
3. Inputting the discrete structural features into the encoding part of a variational self-encoder, which encodes them into a continuous expression in a continuously differentiable feature space;
4. Sampling from the continuous expression and inputting the sampled points into the decoding part of the variational self-encoder to obtain a reconstructed neural network structure;
5. determining reconstruction loss of a hidden layer according to a hidden layer structure of a training neural network structure and a hidden layer structure of a reconstruction neural network structure;
6. determining reconstruction loss of the connecting edge according to the connecting edge of the training neural network structure and the connecting edge of the reconstruction neural network structure;
7. inputting sample points obtained by sampling into a generalization performance predictor and a model complexity predictor respectively to obtain a generalization performance training value and a model complexity training value;
8. Determining the model complexity loss according to the square of the difference between the model complexity label value and the model complexity training value;
9. determining the generalization performance loss according to the square of the difference between the generalization performance label value and the generalization performance training value;
10. Constructing a loss function for jointly training the graph neural network, the encoding part of the variational self-encoder, the decoding part of the variational self-encoder, the generalization performance predictor and the model complexity predictor from the reconstruction loss of the hidden layer, the reconstruction loss of the connecting edges, the model complexity loss, the generalization performance loss and the relative entropy (KL divergence); training by gradient descent with the target of minimizing this loss function, and adjusting the network parameters of the graph neural network, the encoding part, the decoding part, the generalization performance predictor and the model complexity predictor until the training process converges, thereby obtaining the target graph neural network, the target variational self-encoder, the target generalization performance prediction network and the target model complexity prediction network. During training, the generalization performance predictor and the model complexity predictor fit the continuous expression of the training neural network structure in the continuously differentiable feature space.
Wherein the loss function is as follows:
$$L(\phi, \theta) = L_n + L_e + D_{\text{KL}}\big(q_\phi \,\|\, \mathcal{N}(0, I)\big) + (y_{\text{label}} - y)^2 + (z_{\text{label}} - z)^2$$
where φ and θ are the parameters of the encoding network and the decoding network respectively, L_n and L_e are the reconstruction losses of the hidden layer and the connecting edges, D_KL is the relative entropy between the encoded distribution q_φ and the standard normal prior, y is the generalization performance training value, z is the model complexity training value, and y_label and z_label are the corresponding label values.
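A sketch of this joint objective under the stated definitions follows; equal weighting of the five terms is an assumption of the sketch, since the embodiment gives no coefficients:

```python
import torch

def joint_training_loss(L_n, L_e, mu, logvar, y, y_label, z, z_label):
    """Joint objective of step 10: hidden-layer and connecting-edge
    reconstruction losses, the KL regularizer, and the two squared
    prediction losses. y/z are training values, y_label/z_label labels.
    """
    kl = -0.5 * torch.sum(1.0 + logvar - mu.pow(2) - logvar.exp())
    perf_loss = (y_label - y).pow(2).sum()   # generalization performance loss
    comp_loss = (z_label - z).pow(2).sum()   # model complexity loss
    return L_n + L_e + kl + perf_loss + comp_loss
```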
Second, the searching process specifically includes the following steps:
11. Determining the continuously differentiable feature space corresponding to the target variational self-encoder as the target search space, and sampling from the prior distribution of the variational self-encoder in the target search space to obtain a sample point;
12. Searching the neural network structural features starting from the sample point. During the search, the searched neural network structural features are input into the trained target generalization performance predictor and target model complexity predictor to obtain a generalization performance prediction value and a model complexity prediction value, and the objective is optimized according to these prediction values. The optimization target is to maximize generalization performance while reducing model complexity; gradient ascent can be adopted as the optimization method. The optimization target is specifically:
$$\max_s \; f_{\text{perf}}(s) - f_{\text{comp}}(s)$$
where f_perf(s) is the generalization performance prediction value and f_comp(s) is the model complexity prediction value.
13. When the optimization target reaches its maximum, the searched neural network structural features are determined as the target structural features; the target structural features are input into the target decoding network and decoded through the target decoding network to obtain the target neural network structure.
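Steps 11 to 13 can be sketched as follows, assuming the illustrative StructureSearchModel from the earlier sketch; sampling from the prior, gradient ascent on the predicted objective, and decoding are marked in the comments:

```python
import torch

def search_structure(model, latent_dim=32, steps=200, lr=0.01):
    """Search sketch: sample a starting point from the prior N(0, I),
    ascend f_perf(s) - f_comp(s) by gradient steps, then decode the
    optimized latent point into a structure representation.
    """
    s = torch.randn(1, latent_dim, requires_grad=True)  # step 11: prior sample
    optimizer = torch.optim.SGD([s], lr=lr)
    for _ in range(steps):                              # step 12: gradient ascent
        optimizer.zero_grad()
        objective = model.perf_predictor(s) - model.comp_predictor(s)
        (-objective).sum().backward()
        optimizer.step()
    with torch.no_grad():                               # step 13: decode
        return model.decoder(s)
```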
In this embodiment, inputting the training neural network structure into the graph neural network to construct discrete structural features effectively extracts the structure information of the training neural network; using a variational self-encoder maps the neural network structure into a continuous search space; optimizing the graph neural network, the encoding part of the variational self-encoder, the decoding part of the variational self-encoder, the generalization performance predictor and the model complexity predictor in the continuously differentiable feature space with gradient methods gives the method high training efficiency in the training stage; and fitting the generalization performance and the model complexity of the neural network simultaneously ensures that the neural network found in the search stage combines excellent generalization performance with low model complexity.
The overall effect of the neural network structure search method provided by the embodiment of the application in practical application is shown in the following table 1:
TABLE 1
[Table 1 appears as an image in the original publication; it compares training time, generalization performance of the searched structures, and model complexity across methods on the same data sets.]
As can be seen from Table 1, with training samples selected from the same data set, the neural network structure search method provided in the embodiments of the present application achieves excellent results in training time, generalization performance and complexity of the searched model compared with other methods.
The application also provides an application scenario, and the neural network structure searching method is applied to the application scenario. Specifically, the application of the neural network structure search method in the application scenario is as follows:
In this application scenario, a suitable deep learning network architecture needs to be selected for an image classification task in computer vision; the image classification task may be, for example, classification of commodity images or gender classification of images containing human faces. First, a neural network structure for image classification is selected from the NAS-Bench-101 data set as the training neural network structure, and the parameter quantity and model accuracy corresponding to the selected structure are used as the training labels of the evaluation index prediction networks.
When searching the neural network structure, the structure for image classification is first input into the graph neural network to obtain the corresponding discrete structural features. The discrete structural features are then encoded into a continuous space through the encoding part of a variational self-encoder to obtain a continuous probability distribution. A sample point is drawn from this distribution and input into the decoding part of the variational self-encoder, which decodes and reconstructs it into a reconstructed neural network structure. The reconstruction loss of the hidden layer is determined from the hidden layer structure of the reconstructed neural network structure and that of the selected training neural network structure, and the reconstruction loss of the connecting edges is determined from the connecting edges of the training and reconstructed structures. The sample point is also input into the generalization performance predictor and the model complexity predictor, which yield an accuracy prediction value and a model parameter quantity prediction value respectively. The KL divergence, the reconstruction loss of the hidden layer, the reconstruction loss of the connecting edges, the squared loss between the accuracy prediction value and the model accuracy in the training label, and the squared loss between the parameter quantity prediction value and the parameter quantity in the training label are then summed to obtain a comprehensive loss. Based on this comprehensive loss, the graph neural network, the encoding part of the variational self-encoder, the decoding part of the variational self-encoder, the generalization performance predictor and the model complexity predictor are trained jointly, adjusting the parameters of each network by gradient descent with the target of minimizing the comprehensive loss, until the training process converges and the trained graph neural network, variational self-encoder, generalization performance predictor and model complexity predictor are obtained.
Searching can then be carried out based on the trained networks. Specifically, a sample point is drawn from the prior distribution of the variational self-encoder, and starting from this sample point the neural network structure is searched iteratively by gradient ascent. The searched neural network structural features are input into the trained generalization performance predictor and model complexity predictor to obtain a parameter quantity prediction value and a model accuracy prediction value. When the value obtained by subtracting the parameter quantity prediction value from the model accuracy prediction value reaches its maximum, the searched neural network structural features are determined as the target structural features, and the target structural features are decoded and reconstructed by the decoding network of the trained variational self-encoder, yielding a neural network structure that can be used for the image classification task.
Taking commodity image classification as an example, the categories can be clothing, fresh produce, food, personal care products and the like. For the image classification task, labelled commodity image data is first obtained; the labelled commodity images serve as training sample images and the corresponding category labels as training labels for supervised training of the neural network. After training, commodity images of unknown category can be classified. In practical applications, classifying commodity images makes it possible to recommend commodities of the same category to a user.
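As a minimal sketch of such supervised training (all names are placeholders; the loader is assumed to yield batches of images and category labels):

```python
import torch
import torch.nn as nn

def train_commodity_classifier(model, loader, epochs=5, lr=1e-3):
    """Supervised training of the searched structure on labelled commodity
    images: images as inputs, category labels (clothing, fresh produce, ...)
    as targets. `model` and `loader` are illustrative placeholders.
    """
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    criterion = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for images, labels in loader:
            optimizer.zero_grad()
            loss = criterion(model(images), labels)  # category labels as targets
            loss.backward()
            optimizer.step()
    return model  # ready to classify commodity images of unknown category
```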
The application also provides an application scenario, and the neural network structure searching method is applied to the application scenario. Specifically, the application of the neural network structure search method in the application scenario is as follows:
In this application scenario, a suitable deep learning network architecture needs to be selected for a text classification task in natural language processing; the text classification task may be, for example, news classification, sentiment classification, public opinion classification or spam classification. First, a neural network structure for text classification is selected from the ENAS-CIFAR-10 data set as the training neural network structure, and the training time and model accuracy corresponding to the selected structure are used as the training labels of the evaluation index prediction networks.
When searching the neural network structure, the structure for text classification is first input into the graph neural network to obtain the corresponding discrete structural features. The discrete structural features are then encoded into a continuous space through the encoding part of a variational self-encoder to obtain a continuous probability distribution. A sample point is drawn from this distribution and input into the decoding part of the variational self-encoder, which decodes and reconstructs it into a reconstructed neural network structure. The reconstruction loss of the hidden layer is determined from the hidden layer structure of the reconstructed neural network structure and that of the selected training neural network structure, and the reconstruction loss of the connecting edges is determined from the connecting edges of the training and reconstructed structures. The sample point is also input into the generalization performance predictor and the model complexity predictor, which yield an accuracy prediction value and a model training time prediction value respectively. The KL divergence, the reconstruction loss of the hidden layer, the reconstruction loss of the connecting edges, the squared loss between the accuracy prediction value and the model accuracy in the training label, and the squared loss between the training time prediction value and the training time in the training label are then summed to obtain a comprehensive loss. Based on this comprehensive loss, the graph neural network, the encoding part of the variational self-encoder, the decoding part of the variational self-encoder, the generalization performance predictor and the model complexity predictor are trained jointly, adjusting the parameters of each network by gradient descent with the target of minimizing the comprehensive loss, until the training process converges and the trained graph neural network, variational self-encoder, generalization performance predictor and model complexity predictor are obtained.
Searching can then be carried out based on the trained networks. Specifically, a sample point is drawn from the prior distribution of the variational self-encoder, and starting from this sample point the neural network structure is searched iteratively by gradient ascent. The searched neural network structural features are input into the trained generalization performance predictor and model complexity predictor to obtain a training time prediction value and a model accuracy prediction value. When the value obtained by subtracting the training time prediction value from the model accuracy prediction value reaches its maximum, the searched neural network structural features are determined as the target structural features, and the target structural features are decoded and reconstructed by the decoding network of the trained variational self-encoder, yielding a neural network structure that can be used for the text classification task.
Taking news classification as an example, the categories can be international news, domestic news and local news. For the text classification task, labelled news texts are first obtained; the labelled texts serve as training samples and the corresponding category labels as training labels for supervised training of the neural network. After training, news texts of unknown category can be classified. In practical applications, classifying news makes it possible to push news of the same type to users based on the types of news they browse frequently.
It should be understood that, although the steps in the flowcharts of fig. 1 and fig. 3 are shown in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated herein, there is no strict order restriction on these steps, and they may be performed in other orders. Moreover, at least some of the steps in fig. 1 and fig. 3 may comprise multiple sub-steps or stages, which are not necessarily performed at the same moment but may be performed at different moments, and which are not necessarily executed sequentially but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
In one embodiment, as shown in fig. 5, a neural network structure searching apparatus 500 is provided, which may be implemented as part of a computer device using software modules, hardware modules, or a combination of the two. The apparatus specifically includes: a training data acquisition module 502, a discrete encoding module 504, a continuous encoding module 506, a decoding module 508, a training module 510 and a search module 512, wherein:
a training data acquisition module 502 for acquiring a training neural network structure;
a discrete coding module 504, configured to input the training neural network structure into a graph neural network, so as to obtain a discrete structure characteristic corresponding to the training neural network structure;
a continuous encoding module 506, configured to input the discrete structural features into an encoding network, and encode the discrete structural features into corresponding continuous structural features through the encoding network;
a decoding module 508, configured to perform decoding according to the continuous structural features and the decoding network to obtain a reconstructed neural network structure;
a training module 510, configured to train the neural network, the coding network, and the decoding network based on a reconstruction loss between the training neural network structure and the reconstruction neural network structure until a training stop condition is met, obtain a target coding network and a target decoding network, and determine a hidden space corresponding to the target coding network as a target search space;
and the searching module 512 is configured to search from a target search space according to a target search strategy to obtain a target structure feature, and decode the target structure feature through a target decoding network to obtain a target neural network structure.
With the above neural network structure searching apparatus, inputting the training neural network structure into the graph neural network to obtain the corresponding discrete structural features allows the structure information of the training neural network to be effectively extracted; inputting the discrete structural features into the encoding network encodes them into continuous structural features, thereby constructing a continuous search space; decoding and reconstruction according to the continuous structural features and the decoding network yields a reconstructed neural network structure; and the graph neural network, the encoding network and the decoding network are trained based on the reconstruction loss between the training neural network structure and the reconstructed neural network structure until the training stop condition is met, yielding the target encoding network and the target decoding network. Because the target encoding network and the target decoding network are trained on the reconstruction loss, the target encoding network learns the structural features of the training neural network well, and the target decoding network can accurately decode and reconstruct structural features from the target search space into the target neural network structure; the hidden space of the target encoding network can therefore be determined as the target search space.
In one embodiment, the above apparatus further comprises an evaluation index input module, configured to: obtain at least one evaluation index label value corresponding to the training neural network structure; predict, according to the continuous structural features corresponding to the training neural network structure and each evaluation index prediction network, an evaluation index training value for each evaluation index prediction network; and determine, from each evaluation index label value and the corresponding evaluation index training value, the evaluation index loss corresponding to that label value, where each evaluation index loss is used for training the corresponding target evaluation index prediction network.
In one embodiment, the training module 510 is further configured to: jointly train the graph neural network, each evaluation index prediction network, the encoding network and the decoding network based on the reconstruction loss between the training neural network structure and the reconstructed neural network structure and each evaluation index loss, until the training stop condition is met, obtaining the target evaluation index prediction networks, the target encoding network and the target decoding network.
In one embodiment, the search module 512 is further configured to sample in the target search space to obtain sample points; searching the structural characteristics of the neural network by taking the sample points as starting points; inputting the searched neural network structural features into each trained target evaluation index prediction network to obtain evaluation index prediction values corresponding to each target evaluation index prediction network; obtaining a comprehensive evaluation value according to each evaluation index predicted value and a target search strategy; and iteratively searching the neural network structural features along the direction of the optimized comprehensive evaluation value until the target structural features are obtained.
In one embodiment, the at least one evaluation index tag value includes at least one of model complexity, generalization performance.
In one embodiment, the evaluation index tag values include model complexity and generalization performance; the searching module 512 is further configured to input the searched neural network structural features into the target model complexity prediction network and the target generalization performance prediction network, respectively, to obtain a generalization performance prediction value and a model complexity prediction value; subtracting the model complexity predicted value from the generalization performance predicted value to obtain a comprehensive evaluation value; and iteratively searching the neural network structural features along the direction of the maximized comprehensive evaluation value until the target structural features are obtained.
In one embodiment, the evaluation index tag values include model complexity and generalization performance; the searching module 512 is further configured to input the searched neural network structural features into the target model complexity prediction network and the target generalization performance prediction network, respectively, to obtain a generalization performance prediction value and a model complexity prediction value; subtracting the generalization performance predicted value from the model complexity predicted value to obtain a comprehensive evaluation value; and iteratively searching the neural network structural features along the direction of the minimized comprehensive evaluation value until the target structural features are obtained.
In one embodiment, the training module 510 is further configured to determine the reconstruction loss of the hidden layer according to the hidden layer structure of the training neural network structure and that of the reconstructed neural network structure, and to train the graph neural network, the encoding network and the decoding network based on the reconstruction loss of the hidden layer.
In one embodiment, the training module 510 is further configured to determine the reconstruction loss of the connecting edges according to the connecting edges of the training neural network structure and those of the reconstructed neural network structure, and to train the graph neural network, the encoding network and the decoding network based on the reconstruction loss of the hidden layer and the reconstruction loss of the connecting edges.
In one embodiment, the search module 512 is further configured to sample in an a priori distribution of the variational self-encoder to obtain sample points.
In one embodiment, where the encoding network is the encoding network of a variational self-encoder and the decoding network is the decoding network of a variational self-encoder, the training module 510 is further configured to train the graph neural network, the encoding network and the decoding network based on the relative entropy and the reconstruction loss between the training neural network structure and the reconstructed neural network structure.
In one embodiment, the decoding module 508 is further configured to sample according to the continuous structural features to obtain sampled structural features, and to input the sampled structural features into the decoding network to obtain the reconstructed neural network structure.
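The sampling step can be sketched with the reparameterization trick, which keeps the draw differentiable so that the reconstruction loss can be backpropagated through it (an illustrative sketch, assuming the continuous structural features are a mean and a log-variance):

```python
import torch

def sample_structural_feature(mu, logvar):
    """Draw a sampled structural feature from the continuous structural
    features (mu, logvar) via the reparameterization trick: a standard
    normal draw is rescaled by the encoded standard deviation and shifted
    by the encoded mean, so the operation stays differentiable.
    """
    eps = torch.randn_like(mu)
    return mu + eps * (0.5 * logvar).exp()
```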
For specific limitations of the neural network structure searching apparatus, reference may be made to the above limitations of the neural network structure searching method, which are not described herein again. The modules in the neural network structure searching device can be wholly or partially implemented by software, hardware and a combination thereof. The modules can be embedded in a hardware form or independent from a processor in the computer device, and can also be stored in a memory in the computer device in a software form, so that the processor can call and execute operations corresponding to the modules.
In one embodiment, a computer device is provided, which may be a terminal, and its internal structure diagram may be as shown in fig. 6. The computer device includes a processor, a memory, a communication interface, a display screen, and an input device connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The communication interface of the computer device is used for carrying out wired or wireless communication with an external terminal, and the wireless communication can be realized through WIFI, an operator network, NFC (near field communication) or other technologies. The computer program is executed by a processor to implement a neural network structure searching method. The display screen of the computer equipment can be a liquid crystal display screen or an electronic ink display screen, and the input device of the computer equipment can be a touch layer covered on the display screen, a key, a track ball or a touch pad arranged on the shell of the computer equipment, an external keyboard, a touch pad or a mouse and the like.
Those skilled in the art will appreciate that the architecture shown in fig. 6 is merely a block diagram of part of the structure related to the solution of the present application and does not limit the computer devices to which the solution applies; a particular computer device may include more or fewer components than shown, combine certain components, or have a different arrangement of components.
In one embodiment, a computer device is further provided, which includes a memory and a processor, the memory stores a computer program, and the processor implements the steps of the above method embodiments when executing the computer program.
In an embodiment, a computer-readable storage medium is provided, in which a computer program is stored which, when being executed by a processor, carries out the steps of the above-mentioned method embodiments.
In one embodiment, a computer program product or computer program is provided that includes computer instructions stored in a computer-readable storage medium. The computer instructions are read by a processor of a computer device from a computer-readable storage medium, and the computer instructions are executed by the processor to cause the computer device to perform the steps in the above-mentioned method embodiments.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program instructing relevant hardware; the computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, storage, database or other medium used in the embodiments provided herein can include at least one of non-volatile and volatile memory. Non-volatile memory may include read-only memory (ROM), magnetic tape, floppy disk, flash memory, optical storage, or the like. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM can take many forms, such as static random access memory (SRAM) or dynamic random access memory (DRAM), among others.
The technical features of the above embodiments can be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the above embodiments are not described, but should be considered as the scope of the present specification as long as there is no contradiction between the combinations of the technical features.
The above-mentioned embodiments only express several embodiments of the present application, and the description thereof is more specific and detailed, but not construed as limiting the scope of the invention. It should be noted that, for a person skilled in the art, several variations and modifications can be made without departing from the concept of the present application, which falls within the scope of protection of the present application. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims (15)

1. A neural network structure searching method, the method comprising:
acquiring a training neural network structure;
inputting the training neural network structure into a neural network of a graph to obtain discrete structure characteristics corresponding to the training neural network structure;
inputting the discrete structure features into a coding network, and coding the discrete structure features into corresponding continuous structure features through the coding network;
decoding according to the continuous structure characteristics and the decoding network to obtain a reconstructed neural network structure;
training the graph neural network, the coding network and the decoding network based on reconstruction loss between the training neural network structure and the reconstruction neural network structure until a target coding network and a target decoding network are obtained when a training stopping condition is met, and determining a hidden space corresponding to the target coding network as a target search space;
and searching from the target search space according to a target search strategy to obtain target structure characteristics, and decoding the target structure characteristics through the target decoding network to obtain a target neural network structure.
2. The method of claim 1, wherein prior to said training the graph neural network, the encoding network, and the decoding network based on reconstruction losses between the trained neural network structure and the reconstructed neural network structure, the method further comprises:
obtaining at least one evaluation index label value corresponding to the training neural network structure;
predicting according to the continuous structure characteristics corresponding to the training neural network structure and each evaluation index prediction network to respectively obtain an evaluation index training value corresponding to each evaluation index prediction network;
determining the evaluation index loss corresponding to each evaluation index label value according to each evaluation index label value and the corresponding evaluation index training value; and each evaluation index loss is used for training to obtain a corresponding target evaluation index prediction network.
3. The method of claim 2, wherein training the graph neural network, the coding network, and the decoding network based on the reconstruction loss between the training neural network structure and the reconstruction neural network structure until a training stop condition is met to obtain a target coding network and a target decoding network comprises:
and jointly training the graph neural network, each evaluation index prediction network, the coding network and the decoding network based on the reconstruction loss between the training neural network structure and the reconstruction neural network structure and each evaluation index loss until the training stopping condition is met, and obtaining the target evaluation index prediction network, the target coding network and the target decoding network.
4. The method of claim 2, wherein searching from the target search space according to a target search strategy to obtain a target structural feature comprises:
sampling in the target search space to obtain sample points;
searching for the structural features of the neural network by taking the sample points as starting points;
inputting the searched neural network structural features into each trained target evaluation index prediction network to obtain evaluation index prediction values corresponding to each target evaluation index prediction network;
obtaining a comprehensive evaluation value according to each evaluation index predicted value and the target search strategy;
and iteratively searching the neural network structural features along the direction of optimizing the comprehensive evaluation value until the target structural features are obtained.
5. The method of claim 2, wherein the at least one evaluation index tag value comprises at least one of model complexity, generalization performance.
6. The method of claim 4, wherein the evaluation index tag values include model complexity and generalization performance;
the step of inputting the searched neural network structural features into each trained target evaluation index prediction network to obtain the evaluation index prediction value corresponding to each target evaluation index prediction network comprises the following steps:
inputting the searched neural network structure characteristics into a target model complexity prediction network and a target generalization performance prediction network respectively to obtain a generalization performance prediction value and a model complexity prediction value;
obtaining a comprehensive evaluation value according to each evaluation index predicted value and the target search strategy, wherein the comprehensive evaluation value comprises the following steps:
subtracting the model complexity predicted value from the generalization performance predicted value to obtain a comprehensive evaluation value;
the iterative search of the neural network structural features along the direction of optimizing the comprehensive evaluation value until target structural features are obtained includes:
and iteratively searching the neural network structural features along the direction of maximizing the comprehensive evaluation value until the target structural features are obtained.
7. The method of claim 4, wherein the evaluation index tag values include model complexity and generalization performance;
the step of inputting the searched neural network structural features into each trained target evaluation index prediction network to obtain the evaluation index prediction value corresponding to each target evaluation index prediction network comprises the following steps:
inputting the searched neural network structure characteristics into a target model complexity prediction network and a target generalization performance prediction network respectively to obtain a generalization performance prediction value and a model complexity prediction value;
obtaining a comprehensive evaluation value according to each evaluation index predicted value and the target search strategy, wherein the comprehensive evaluation value comprises the following steps:
subtracting the generalization performance predicted value from the model complexity predicted value to obtain a comprehensive evaluation value;
the iterative search of the neural network structural features along the direction of optimizing the comprehensive evaluation value until target structural features are obtained includes:
and iteratively searching the neural network structural features along the direction of minimizing the comprehensive evaluation value until the target structural features are obtained.
8. The method of claim 1, wherein training the graph neural network, the encoding network, and the decoding network based on reconstruction losses between the trained neural network structure and the reconstructed neural network structure comprises:
determining reconstruction loss of a hidden layer according to the hidden layer structure of the training neural network structure and the hidden layer structure of the reconstruction neural network structure;
training the graph neural network, the encoding network, and the decoding network based on reconstruction losses of the hidden layer.
9. The method of claim 8, wherein before the training the graph neural network, the encoding network, and the decoding network based on the reconstruction loss of the hidden layer, the method further comprises:
determining reconstruction loss of the connecting edge according to the connecting edge of the training neural network structure and the connecting edge of the reconstruction neural network structure;
the training the graph neural network, the encoding network, and the decoding network based on the reconstruction loss of the hidden layer comprises:
training the graph neural network, the encoding network and the decoding network based on reconstruction loss of the hidden layer and reconstruction loss of the connected edges.
10. The method of claim 4, wherein the encoding network is an encoding network of a variational self-encoder and the decoding network is a decoding network of a variational self-encoder: the sampling in the target search space to obtain a sample point includes:
and sampling in the prior distribution of the variational self-encoder to obtain sample points.
11. The method according to any one of claims 1 to 10, wherein the encoding network is an encoding network of a variational self-encoder and the decoding network is a decoding network of a variational self-encoder: the training the graph neural network, the encoding network, and the decoding network based on reconstruction losses between the trained neural network structure and the reconstructed neural network structure, comprising:
training the graph neural network, the encoding network, and the decoding network based on relative entropy and reconstruction loss between the training neural network structure and the reconstruction neural network structure.
12. The method according to any one of claims 1 to 10, wherein the encoding network is an encoding network of a variational self-encoder and the decoding network is a decoding network of a variational self-encoder: and decoding according to the continuous structure characteristics and the decoding network to obtain a reconstructed neural network structure, and the method comprises the following steps:
sampling according to the continuous structural features to obtain sampling structural features;
and inputting the sampling structure characteristics into the decoding network to obtain the reconstructed neural network structure.
13. An apparatus for searching a neural network structure, the apparatus comprising:
the training data acquisition module is used for acquiring a training neural network structure;
the discrete coding module is used for inputting the training neural network structure into a graph neural network to obtain discrete structure characteristics corresponding to the training neural network structure;
the continuous coding module is used for inputting the discrete structure characteristics into a coding network and coding the discrete structure characteristics into corresponding continuous structure characteristics through the coding network;
the decoding module is used for decoding according to the continuous structure characteristics and the decoding network to obtain a reconstructed neural network structure;
the training module is used for training the graph neural network, the coding network and the decoding network based on reconstruction loss between the training neural network structure and the reconstruction neural network structure until a target coding network and a target decoding network are obtained when a training stopping condition is met, and determining a hidden space corresponding to the target coding network as a target search space;
and the searching module is used for searching from the target searching space according to a target searching strategy to obtain target structure characteristics, and decoding the target structure characteristics through the target decoding network to obtain a target neural network structure.
14. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements the steps of the method of any of claims 1 to 12.
15. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 12.
CN202011567991.3A 2020-12-25 2020-12-25 Neural network structure searching method, apparatus, computer device and storage medium Pending CN113408721A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011567991.3A CN113408721A (en) 2020-12-25 2020-12-25 Neural network structure searching method, apparatus, computer device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011567991.3A CN113408721A (en) 2020-12-25 2020-12-25 Neural network structure searching method, apparatus, computer device and storage medium

Publications (1)

Publication Number Publication Date
CN113408721A true CN113408721A (en) 2021-09-17

Family

ID=77677551

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011567991.3A Pending CN113408721A (en) 2020-12-25 2020-12-25 Neural network structure searching method, apparatus, computer device and storage medium

Country Status (1)

Country Link
CN (1) CN113408721A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114168320A (en) * 2021-11-11 2022-03-11 中国人民解放军国防科技大学 End-to-end edge intelligent model searching method and system based on implicit spatial mapping
WO2023082045A1 (en) * 2021-11-09 2023-05-19 华为技术有限公司 Neural network architecture search method and apparatus


Similar Documents

Publication Publication Date Title
CN112084331B (en) Text processing and model training method and device, computer equipment and storage medium
CN110866140B (en) Image feature extraction model training method, image searching method and computer equipment
CN109783666B (en) Image scene graph generation method based on iterative refinement
CN112417289B (en) Information intelligent recommendation method based on deep clustering
CN106909938B (en) Visual angle independence behavior identification method based on deep learning network
CN113177141B (en) Multi-label video hash retrieval method and device based on semantic embedded soft similarity
CN113673244B (en) Medical text processing method, medical text processing device, computer equipment and storage medium
CN116129141B (en) Medical data processing method, apparatus, device, medium and computer program product
Dai et al. Hybrid deep model for human behavior understanding on industrial internet of video things
Ning et al. Conditional generative adversarial networks based on the principle of homologycontinuity for face aging
CN113408721A (en) Neural network structure searching method, apparatus, computer device and storage medium
CN114764865A (en) Data classification model training method, data classification method and device
CN117315070A (en) Image generation method, apparatus, electronic device, storage medium, and program product
CN113822183B (en) Zero sample expression recognition method and system based on AU-EMO association and graph neural network
CN112115744A (en) Point cloud data processing method and device, computer storage medium and electronic equipment
Li et al. Human motion recognition information processing system based on LSTM Recurrent Neural Network Algorithm
Tang et al. A deep map transfer learning method for face recognition in an unrestricted smart city environment
CN114120245B (en) Crowd image analysis method, device and equipment based on deep neural network
CN117011569A (en) Image processing method and related device
CN117034133A (en) Data processing method, device, equipment and medium
Zhong et al. Multimodal cooperative self‐attention network for action recognition
CN113822291A (en) Image processing method, device, equipment and storage medium
CN113569867A (en) Image processing method and device, computer equipment and storage medium
Liu Improved convolutional neural networks for course teaching quality assessment
Shanqing et al. A multi-level feature weight fusion model for salient object detection

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination