CN114065003A - Network structure search method, system and medium for an ultra-large search space

Network structure search method, system and medium for an ultra-large search space

Info

Publication number
CN114065003A
Authority
CN
China
Prior art keywords
search space
network structure
search
network
searching
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111256098.3A
Other languages
Chinese (zh)
Inventor
谭明奎 (Mingkui Tan)
国雍 (Yong Guo)
陈耀佛 (Yaofo Chen)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Provincial Laboratory Of Artificial Intelligence And Digital Economy Guangzhou
South China University of Technology SCUT
Original Assignee
Guangdong Provincial Laboratory Of Artificial Intelligence And Digital Economy Guangzhou
South China University of Technology SCUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Provincial Laboratory Of Artificial Intelligence And Digital Economy Guangzhou and South China University of Technology SCUT
Priority to CN202111256098.3A
Publication of CN114065003A
Priority to PCT/CN2022/119120 (published as WO2023071592A1)
Legal status: Pending (current)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/90 Details of database functions independent of the retrieved data types
    • G06F 16/95 Retrieval from the web
    • G06F 16/953 Querying, e.g. by the use of web search engines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a network structure search method, system and medium for an ultra-large search space, wherein the method comprises the following steps: constructing a target data set for training a neural network; determining a neural network search space for a target task; searching for an efficient network structure with a reinforcement learning algorithm while the search space is progressively enlarged; and training the network structure obtained by the search on the target data set to obtain the final network structure. Progressively enlarging the search space effectively reduces the search difficulty and improves search efficiency and performance, and the method can be widely applied in the field of artificial intelligence.

Description

Network structure search method, system and medium for an ultra-large search space
Technical Field
The invention relates to the field of artificial intelligence, and in particular to a network structure search method, system and medium for an ultra-large search space.
Background
In recent years, deep neural networks have been widely applied to different tasks. With the growth in parameter counts and network depth, together with the effective use of GPUs, the accuracy and efficiency of neural network models have improved remarkably. However, most existing neural networks are designed manually, and the design process depends on rich experience in network structure design, which entails a high design cost. This makes deep neural network models difficult to apply to many real-world engineering tasks.
To address these problems, neural architecture search attempts to design efficient neural network architectures automatically. To find an efficient architecture, existing methods must explore the search space by sampling enough network structures. However, the search space is typically very large (often containing billions of candidate architectures), and with limited computing resources only a few network structures can be sampled from it, which greatly restricts how thoroughly the space can be explored. Finding a better network structure requires exploring the whole search space as fully as possible, and given the limited number of structures that can be sampled, this places very high demands on sampling accuracy during the search.
Disclosure of Invention
To solve, at least to some extent, one of the technical problems existing in the prior art, the present invention aims to provide a network structure search method, system and medium for an ultra-large search space.
The technical solution adopted by the invention is as follows:
A network structure search method for an ultra-large search space comprises the following steps:
constructing a target data set for training a neural network;
determining a neural network search space for a target task;
searching for an efficient network structure with a reinforcement learning algorithm while the search space is progressively enlarged;
and training the network structure obtained by the search on the target data set to obtain the final network structure.
Further, the constructing a target data set for training a neural network comprises:
collecting pictures from the target task scene and labeling them by category to construct the target data set;
and dividing the labeled target data set into a training set, a validation set and a test set.
Further, the determining a neural network search space for the target task comprises:
dividing the computing units that compose a deep convolutional neural network model into standard computing units and downsampling computing units;
and setting the search space of the computing units.
Further, searching for an efficient network structure with a reinforcement learning algorithm while the search space is progressively enlarged comprises the following steps:
A1, constructing a meta-controller with a single-layer bidirectional long short-term memory network, and generating a network architecture α ~ π(α, θ), where α is the generated network structure, π is the policy learned by the meta-controller, and θ denotes the network parameters of the meta-controller;
A2, constructing a hyper-network model for an image recognition task, the hyper-network model being obtained by stacking a plurality of computing units, with a plurality of candidate operations between the input features and the output features of each computing unit;
A3, constructing an initial search space from the candidate operations, gradually adding candidate operations during the search, and enlarging the previous search space to obtain a new one; the whole search process comprises K search stages, each stage corresponding to a search space Ω_i of a different size;
A4, generating a sub-network architecture α ~ π(α, θ) in the current search space with the meta-controller, and activating the sub-network architecture's corresponding candidate-operation weights w_α in the hyper-network model; training the hyper-network model on the target data set;
A5, generating a sub-network architecture α ~ π(α, θ) in the current search space with the meta-controller, and obtaining the sub-network weights w_α by inheriting the weights of the corresponding operations in the hyper-network model; testing the performance indicator R(α, w_α) on the partitioned validation data set and using it as the reward value to update the meta-controller weights θ;
A6, repeating steps A3 to A5 until all K candidate operations have been added to the search space.
Further, the training the network structure obtained by the search on the target data set to obtain the final network structure comprises:
obtaining a high-performance network structure from the trained meta-controller model;
and training the obtained network structure model on the target data set to obtain the final network structure model.
Further, the training the obtained network structure comprises:
training the obtained network structure model with a stochastic gradient descent algorithm until the network architecture model converges.
Further, step A3 specifically comprises:
for the initial search space Ω_0, randomly selecting one candidate operation from all the candidate operations to construct the initial search space;
once the current search space has been constructed, randomly selecting one candidate operation from the remaining candidate operations and adding it to the search space Ω_{i-1} to construct a new search space Ω_i.
Another technical solution adopted by the invention is as follows:
A network structure search system for an ultra-large search space comprises:
a data set construction module for constructing a target data set for training a neural network;
a search space determination module for determining a neural network search space for a target task;
a space search module for searching for an efficient network structure with a reinforcement learning algorithm while the search space is progressively enlarged;
and a model training module for training the network structure obtained by the search on the target data set to obtain the final network structure.
Another technical solution adopted by the invention is as follows:
A network structure search system for an ultra-large search space comprises:
at least one processor;
at least one memory for storing at least one program;
the at least one program, when executed by the at least one processor, causing the at least one processor to implement the method described above.
Another technical solution adopted by the invention is as follows:
A storage medium storing a processor-executable program which, when executed by a processor, performs the method described above.
The invention has the following beneficial effects: by progressively enlarging the search space, the method effectively reduces the search difficulty and improves search efficiency and performance.
Drawings
To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings required by the embodiments of the present invention or by the related prior art are described below. It should be understood that the drawings in the following description show only some embodiments of the technical solutions of the present invention, and that for those skilled in the art other drawings can be obtained from these drawings without creative effort.
FIG. 1 is a flowchart of the steps of a network structure search method for an ultra-large search space according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of the gradually enlarged network structure search space in an embodiment of the present invention;
FIG. 3 is a schematic comparison between a standard neural network search method and the search method of the automatic network structure search method for an ultra-large search space according to an embodiment of the present invention.
Detailed Description
Reference will now be made in detail to the embodiments of the present invention, examples of which are illustrated in the accompanying drawings, where the same or similar reference numerals refer throughout to the same or similar elements or to elements having the same or similar functions. The embodiments described below with reference to the accompanying drawings are illustrative, are intended only to explain the present invention, and are not to be construed as limiting it. The step numbers in the following embodiments are provided only for convenience of illustration; the order between the steps is not limited in any way, and the execution order of the steps in the embodiments can be adapted according to the understanding of those skilled in the art.
In the description of the present invention, it should be understood that references to orientations or positional relationships such as up, down, front, rear, left and right are based on the orientations or positional relationships shown in the drawings, are only for convenience and simplicity of description, and do not indicate or imply that the device or element referred to must have a specific orientation or be constructed and operated in a specific orientation; they are therefore not to be construed as limiting the present invention.
In the description of the present invention, 'several' means one or more and 'a plurality of' means two or more; 'greater than', 'less than', 'exceeding' and the like are understood as excluding the stated number, while 'above', 'below', 'within' and the like are understood as including it. Where 'first' and 'second' are used, they serve only to distinguish technical features and are not to be understood as indicating or implying relative importance, the number of the indicated technical features, or the precedence of the indicated technical features.
In the description of the present invention, unless otherwise explicitly limited, terms such as 'arranged', 'installed' and 'connected' should be understood in a broad sense, and those skilled in the art can reasonably determine their specific meanings in the present invention in combination with the specific content of the technical solution.
The overall scheme of the invention is shown in Fig. 1. The first step is to construct a target data set and set the search space for the target task. A relatively small initial search space is then constructed and searched; the search space is gradually enlarged by progressively adding candidate computing operations, and the network structures learned earlier are used to sample more accurately in the larger space, so that the search space is explored more fully under a limited number of samples. This reduces the search difficulty and helps find a better network structure. Finally, the found network structure is trained on the target data set until convergence, and the model is deployed to the target task. Each step in Fig. 1 is described in detail below:
S1: Collect and label a target data set for training a neural network.
S1-1: Collect pictures from the target task scene and label them by category to construct a data set;
S1-2: Divide the labeled data set into a training set, a validation set and a test set.
S2: Set a neural network search space for the target task.
S2-1: For computer vision tasks, the computing units that compose a deep convolutional neural network model can be divided into two categories: standard computing units and downsampling computing units. The search spaces of the two types of computing units are identical, but a standard computing unit preserves the spatial resolution of its input features, while a downsampling computing unit reduces the spatial resolution of its output features to half that of the input. A complete convolutional neural network is constructed by stacking these computing units multiple times.
S2-2: Set the search space of a computing unit: each computing unit contains 7 nodes, of which 2 are input nodes, 4 are intermediate nodes and 1 is an output node. There can be 8 different candidate operations between two nodes: 3×3 depthwise separable convolution, 5×5 depthwise separable convolution, 3×3 max pooling, 3×3 average pooling, 3×3 dilated convolution, 5×5 dilated convolution, a skip connection, and the null operation. Each convolution operation is followed by a batch normalization operation and a ReLU activation function; a sketch of this operation set is given below.
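Below is a PyTorch sketch of the 8 candidate operations on one edge of a computing unit. It is illustrative only: the module names, padding choices and the strided 1×1 convolution standing in for the skip connection in a downsampling unit are assumptions, not the patent's implementation; only the operation list and the convolution, batch normalization, ReLU ordering come from the description above.

```python
import torch
import torch.nn as nn

class Zero(nn.Module):
    """The null operation: outputs zeros, effectively severing the edge."""
    def __init__(self, stride):
        super().__init__()
        self.stride = stride

    def forward(self, x):
        return torch.zeros_like(x[:, :, ::self.stride, ::self.stride])

def candidate_ops(c, stride):
    """The 8 candidate operations between two nodes; a standard unit uses
    stride 1, a downsampling unit stride 2."""
    def sep_conv(k):
        return nn.Sequential(
            nn.Conv2d(c, c, k, stride, padding=k // 2, groups=c, bias=False),
            nn.Conv2d(c, c, 1, bias=False),   # depthwise then pointwise
            nn.BatchNorm2d(c), nn.ReLU())

    def dil_conv(k):
        # dilation=2 doubles the receptive field; padding keeps the size
        return nn.Sequential(
            nn.Conv2d(c, c, k, stride, padding=k - 1, dilation=2, bias=False),
            nn.BatchNorm2d(c), nn.ReLU())

    return {
        "sep_conv_3x3": sep_conv(3), "sep_conv_5x5": sep_conv(5),
        "dil_conv_3x3": dil_conv(3), "dil_conv_5x5": dil_conv(5),
        "max_pool_3x3": nn.MaxPool2d(3, stride, padding=1),
        "avg_pool_3x3": nn.AvgPool2d(3, stride, padding=1),
        # a strided 1x1 convolution stands in for skip when downsampling
        "skip": nn.Identity() if stride == 1 else nn.Conv2d(c, c, 1, stride, bias=False),
        "none": Zero(stride),
    }
```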
S3: Search for efficient network structures in a gradually enlarging search space with a reinforcement learning algorithm.
S3-1: A single-layer bidirectional long short-term memory network is used to construct a meta-controller for generating a network architecture α ~ π(α, θ), where α is a network structure and π is the policy learned by the controller. The network parameters θ of the meta-controller are initialized; a minimal sketch of such a controller follows.
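A minimal sketch of the meta-controller, assuming the single-layer bidirectional LSTM described above; the embedding size, hidden size, start token and one-categorical-decision-per-slot layout are illustrative assumptions, not the patent's specification:

```python
import torch
import torch.nn as nn

class MetaController(nn.Module):
    """Samples an architecture alpha ~ pi(alpha, theta) one decision at a time."""
    def __init__(self, num_ops, num_decisions, hidden=64):
        super().__init__()
        self.embed = nn.Embedding(num_ops + 1, hidden)  # index 0 is a start token
        self.lstm = nn.LSTM(hidden, hidden, num_layers=1,
                            bidirectional=True, batch_first=True)
        self.head = nn.Linear(2 * hidden, num_ops)      # 2x hidden: both directions
        self.num_decisions = num_decisions

    def sample(self):
        """Return an architecture (list of operation indices) and its log-probability."""
        token = torch.zeros(1, 1, dtype=torch.long)     # start token
        alpha, log_prob = [], torch.zeros(1)
        for _ in range(self.num_decisions):
            out, _ = self.lstm(self.embed(token))
            dist = torch.distributions.Categorical(logits=self.head(out[:, -1]))
            choice = dist.sample()
            log_prob = log_prob + dist.log_prob(choice)
            alpha.append(choice.item())
            token = (choice + 1).view(1, 1)             # feed the decision back in
        return alpha, log_prob
```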
S3-2: Construct a hyper-network model for the image recognition task, with its weight parameters denoted w. The whole hyper-network model is obtained by stacking multiple computing units, including standard computing units and downsampling computing units. As shown in Fig. 2, each computing unit (a node in the figure) contains multiple candidate operations (the connecting lines between nodes in the figure) between its input features and output features; only one candidate operation is activated in the training and testing phases of the network model, and that operation is finally used for the target task;
s3-3: the whole search process comprises K search stages, wherein each stage corresponds to a search space omega with different sizesi. Such as the initial search space omega0And if not, randomly selecting one candidate operation from all the candidate operations to construct an initial search space. Because the hyper-network model cannot be trained without parameters, the operations that specify the first addition must be parameterized (e.g., convolved). If the current search space is constructed, randomly selecting one candidate operation from the rest candidate operations to be added into the search space omegai-1In constructing a new search space omegai. This process is schematically illustrated in fig. 2, where different colored connecting lines between nodes represent different candidate operations. In the initial stage, the whole search space only has one candidate operation, the search space is small, and a network structure with excellent performance can be found in the space easily; continuously randomly selecting one from the rest candidate operations to be added into the search space to form a new search space along with the increase of the search stage;
Warm-up of newly added candidate operation weights: to make the competition among different computing operations fairer, the super-network is trained with randomly sampled network structures, so that every computing operation is trained with equal probability. In this way, candidate network structures containing a newly added computing operation can reach performance comparable to other network structures. With this operation warm-up, the search process becomes more stable and the search performance also improves markedly. Specifically, a sub-network structure α ~ p(α, Ω_i) is randomly sampled from the search space, where p denotes the uniform distribution and Ω_i the search space of the i-th stage. The sub-network model shares the corresponding parameter weights with the super-network model. After a sub-network structure has been randomly sampled, the generated sub-network weights w_α are trained on the partitioned training set with a stochastic gradient descent algorithm, as in the sketch below.
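A warm-up sketch under assumed interfaces: sample_uniform_architecture is a hypothetical helper, and supernet(images, alpha) is assumed to execute only the operations selected by α while sharing weights with the hyper-network:

```python
import itertools
import torch.nn.functional as F

def warm_up(supernet, space, train_loader, optimizer, steps):
    """Train the hyper-network on uniformly sampled architectures,
    alpha ~ p(alpha, Omega_i), so every candidate operation is trained
    with equal probability and newly added operations compete fairly."""
    data = itertools.cycle(train_loader)
    for _ in range(steps):
        alpha = sample_uniform_architecture(space)  # hypothetical helper
        images, labels = next(data)
        logits = supernet(images, alpha)  # shared weights w_alpha are updated
        loss = F.cross_entropy(logits, labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```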
S3-4: Train the hyper-network model: generate a sub-network architecture α ~ π(α, θ) in the current search space with the meta-controller, and activate the corresponding candidate-operation weights w_α in the hyper-network model. Select a batch of sample data from the partitioned training data set and train the generated sub-network model with a stochastic gradient descent algorithm.
S3-5: Train the meta-controller: generate a sub-network architecture α ~ π(α, θ) in the current search space with the meta-controller, and obtain the sub-network weights w_α by inheriting the weights of the corresponding operations in the hyper-network model. Select a batch of sample data from the partitioned validation data set and test the performance indicator R(α, w_α) of the generated sub-network model on the target task on this data. Use the tested performance indicator R(α, w_α) as the reward value and update the meta-controller weights θ with a reinforcement learning policy gradient algorithm, as sketched below;
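A policy-gradient sketch of S3-5, assuming the same controller and super-network interfaces as above; the moving-average baseline is a standard REINFORCE variance-reduction device and an assumption here, not something the patent specifies:

```python
import torch

def update_controller(controller, supernet, val_batch, ctrl_optimizer,
                      baseline, decay=0.95):
    """One REINFORCE step: the controller samples alpha ~ pi(alpha, theta),
    the sub-network inherits the super-network weights w_alpha, and its
    validation accuracy serves as the reward R(alpha, w_alpha)."""
    alpha, log_prob = controller.sample()
    images, labels = val_batch
    with torch.no_grad():                # w_alpha is inherited, not trained here
        logits = supernet(images, alpha)
        reward = (logits.argmax(dim=1) == labels).float().mean().item()
    baseline = decay * baseline + (1 - decay) * reward
    loss = -((reward - baseline) * log_prob).sum()   # maximize expected reward
    ctrl_optimizer.zero_grad()
    loss.backward()
    ctrl_optimizer.step()
    return baseline
```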
S3-6: Repeat steps S3-3 through S3-5 until all K candidate operations have been added to the search space.
S4: Infer the network structure and train the model on the target data set.
S4-1: Based on the trained meta-controller model, a high-performance network structure can be inferred. Given K candidate operations, the policy π(·, θ; Ω_K) learned in the final stage (i.e., the stage with the largest search space) is selected as the final policy for sampling network structures. First, 10 network structures are sampled, and then the one with the highest classification accuracy on the validation set is selected as the final network structure, as in the sketch below.
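A sketch of this selection step; evaluate_accuracy is a hypothetical helper that measures the validation accuracy of a weight-inheriting sub-network:

```python
def derive_final_architecture(controller, supernet, val_loader, n=10):
    """Sample n architectures from the final-stage policy and keep the one
    with the highest classification accuracy on the validation set."""
    best_alpha, best_acc = None, -1.0
    for _ in range(n):
        alpha, _ = controller.sample()
        acc = evaluate_accuracy(supernet, alpha, val_loader)  # hypothetical helper
        if acc > best_acc:
            best_alpha, best_acc = alpha, acc
    return best_alpha
```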
S4-2: Train the found network architecture model on the partitioned training data set with a stochastic gradient descent algorithm until convergence.
In summary, compared with the prior art, the method of this embodiment has the following beneficial effects:
Existing neural architecture search methods usually search directly in a fixed, extremely large search space; because of the huge search difficulty this entails, they can only find suboptimal network structures. Unlike these methods, the scheme of progressively enlarging the search space effectively reduces the search difficulty and improves search efficiency and performance. As shown in Fig. 3, once some good network structures have been found in a small search space, gradually enlarging the space makes it more likely to find a candidate subspace (the gray circle) that is highly similar to the optimal network structure found in the previous subspace. Sampling in the new subspace therefore has a high probability of finding a better network structure (or at least one with performance similar to the previous subspace's optimum), fewer sampling opportunities are wasted on very poor network structures, and the sampling accuracy improves. Searching in a gradually enlarged search space is in essence a multi-stage search process, and the candidate subspace evolves adaptively as the search proceeds. Because the candidate subspace is small in each search stage, the method can sample more accurately and thus find a high-performance network structure within a larger search space.
The beneficial effects of the method of this embodiment are presented below in combination with experimental data.
The automatic network structure search method for an ultra-large search space reduces the difficulty of neural architecture search in the initial stage; as the search stages advance, the search space gradually grows, and the structures found in the previous stage assist the search in the next stage. Tables 1 and 2 compare the method with the best known methods on the CIFAR-10 and ImageNet data sets, respectively. On these two common image recognition data sets, the scheme reduces the search cost and improves the search performance.
TABLE 1 (comparison with state-of-the-art methods on CIFAR-10; the table is rendered as an image in the original publication and its contents are not recoverable here)
TABLE 2 (comparison with state-of-the-art methods on ImageNet; the table is rendered as an image in the original publication and its contents are not recoverable here)
This embodiment further provides a network structure search system for an ultra-large search space, comprising:
a data set construction module for constructing a target data set for training a neural network;
a search space determination module for determining a neural network search space for a target task;
a space search module for searching for an efficient network structure with a reinforcement learning algorithm while the search space is progressively enlarged;
and a model training module for training the network structure obtained by the search on the target data set to obtain the final network structure.
The network structure search system for an ultra-large search space of this embodiment can execute the network structure search method for an ultra-large search space provided by the method embodiment of the invention, can execute any combination of the implementation steps of the method embodiment, and has the corresponding functions and beneficial effects of the method.
This embodiment further provides a network structure search system for an ultra-large search space, comprising:
at least one processor;
at least one memory for storing at least one program;
the at least one program, when executed by the at least one processor, causing the at least one processor to implement the method shown in Fig. 1.
The network structure search system for an ultra-large search space of this embodiment can execute the network structure search method for an ultra-large search space provided by the method embodiment of the invention, can execute any combination of the implementation steps of the method embodiment, and has the corresponding functions and beneficial effects of the method.
The embodiment of the application also discloses a computer program product or computer program comprising computer instructions stored in a computer-readable storage medium. A processor of a computer device can read the computer instructions from the computer-readable storage medium and execute them, causing the computer device to perform the method shown in Fig. 1.
This embodiment also provides a storage medium storing an instruction or program capable of executing the network structure search method for an ultra-large search space provided by the method embodiment of the invention; when the instruction or program is executed, any combination of the implementation steps of the method embodiment can be executed, with the corresponding functions and beneficial effects of the method.
In alternative embodiments, the functions/acts noted in the block diagrams may occur out of the order noted in the operational illustrations. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality/acts involved. Furthermore, the embodiments presented and described in the flow charts of the present invention are provided by way of example in order to provide a more thorough understanding of the technology. The disclosed methods are not limited to the operations and logic flows presented herein. Alternative embodiments are contemplated in which the order of various operations is changed and in which sub-operations described as part of larger operations are performed independently.
Furthermore, although the present invention is described in the context of functional modules, it should be understood that, unless otherwise stated to the contrary, one or more of the described functions and/or features may be integrated in a single physical device and/or software module, or one or more functions and/or features may be implemented in a separate physical device or software module. It will also be appreciated that a detailed discussion of the actual implementation of each module is not necessary for an understanding of the present invention. Rather, the actual implementation of the various functional modules in the apparatus disclosed herein will be understood within the ordinary skill of an engineer, given the nature, function, and internal relationship of the modules. Accordingly, those skilled in the art can, using ordinary skill, practice the invention as set forth in the claims without undue experimentation. It is also to be understood that the specific concepts disclosed are merely illustrative of and not intended to limit the scope of the invention, which is defined by the appended claims and their full scope of equivalents.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
The logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions that can be considered to implement logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). Additionally, the computer-readable medium could even be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, via for instance optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.
It should be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.
In the foregoing description of the specification, reference to the description of "one embodiment/example," "another embodiment/example," or "certain embodiments/examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
While embodiments of the present invention have been shown and described, it will be understood by those of ordinary skill in the art that: various changes, modifications, substitutions and alterations can be made to the embodiments without departing from the principles and spirit of the invention, the scope of which is defined by the claims and their equivalents.
While the preferred embodiments of the present invention have been illustrated and described, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (10)

1. A network structure search method for an ultra-large search space, characterized by comprising the following steps:
constructing a target data set for training a neural network;
determining a neural network search space for a target task;
searching for an efficient network structure with a reinforcement learning algorithm while the search space is progressively enlarged;
and training the network structure obtained by the search on the target data set to obtain the final network structure.
2. The network structure search method for an ultra-large search space according to claim 1, wherein the constructing a target data set for training a neural network comprises:
collecting pictures from the target task scene and labeling them by category to construct the target data set;
and dividing the labeled target data set into a training set, a validation set and a test set.
3. The network structure search method for an ultra-large search space according to claim 1, wherein the determining a neural network search space for the target task comprises:
dividing the computing units that compose a deep convolutional neural network model into standard computing units and downsampling computing units;
and setting the search space of the computing units.
4. The network structure search method for an ultra-large search space according to claim 1, wherein searching for an efficient network structure with a reinforcement learning algorithm while the search space is progressively enlarged comprises the following steps:
A1, constructing a meta-controller with a single-layer bidirectional long short-term memory network, and generating a network architecture α ~ π(α, θ), where α is the generated network structure, π is the policy learned by the meta-controller, and θ denotes the network parameters of the meta-controller;
A2, constructing a hyper-network model for an image recognition task, the hyper-network model being obtained by stacking a plurality of computing units, with a plurality of candidate operations between the input features and the output features of each computing unit;
A3, constructing an initial search space from the candidate operations, gradually adding candidate operations during the search, and enlarging the previous search space to obtain a new one; the whole search process comprises K search stages, each stage corresponding to a search space Ω_i of a different size;
A4, generating a sub-network architecture α ~ π(α, θ) in the current search space with the meta-controller, and activating the sub-network architecture's corresponding candidate-operation weights w_α in the hyper-network model; training the hyper-network model on the target data set;
A5, generating a sub-network architecture α ~ π(α, θ) in the current search space with the meta-controller, and obtaining the sub-network weights w_α by inheriting the weights of the corresponding operations in the hyper-network model; testing the performance indicator R(α, w_α) on the partitioned validation data set and using it as the reward value to update the meta-controller weights θ;
A6, repeating steps A3 to A5 until all K candidate operations have been added to the search space.
5. The network structure search method for an ultra-large search space according to claim 4, wherein the training the network structure obtained by the search on the target data set to obtain the final network structure model comprises:
obtaining a high-performance network structure from the trained meta-controller model;
and training the obtained network structure on the target data set to obtain the final network structure model.
6. The network structure search method for an ultra-large search space according to claim 5, wherein the training the obtained network structure comprises:
training the obtained network structure model with a stochastic gradient descent algorithm until the network architecture model converges.
7. The network structure search method for an ultra-large search space according to claim 5, wherein step A3 specifically comprises:
for the initial search space Ω_0, randomly selecting one candidate operation from all the candidate operations to construct the initial search space;
once the current search space has been constructed, randomly selecting one candidate operation from the remaining candidate operations and adding it to the search space Ω_{i-1} to construct a new search space Ω_i.
8. A network structure search system for an ultra-large search space, characterized by comprising:
a data set construction module for constructing a target data set for training a neural network;
a search space determination module for determining a neural network search space for a target task;
a space search module for searching for an efficient network structure with a reinforcement learning algorithm while the search space is progressively enlarged;
and a model training module for training the network structure model obtained by the search on the target data set to obtain the final network structure model.
9. A network structure search system for an ultra-large search space, characterized by comprising:
at least one processor;
at least one memory for storing at least one program;
the at least one program, when executed by the at least one processor, causing the at least one processor to implement the method of any one of claims 1-7.
10. A storage medium storing a processor-executable program, wherein the processor-executable program, when executed by a processor, is used to perform the method of any one of claims 1-7.
CN202111256098.3A 2021-10-27 2021-10-27 Network structure search method, system and medium for an ultra-large search space Pending CN114065003A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202111256098.3A CN114065003A (en) 2021-10-27 2021-10-27 Network structure search method, system and medium for an ultra-large search space
PCT/CN2022/119120 WO2023071592A1 (en) 2021-10-27 2022-09-15 Network structure search method for ultra-large search space, system and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111256098.3A CN114065003A (en) 2021-10-27 2021-10-27 Network structure search method, system and medium for an ultra-large search space

Publications (1)

Publication Number Publication Date
CN114065003A true CN114065003A (en) 2022-02-18

Family

ID=80235648

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111256098.3A Pending CN114065003A (en) 2021-10-27 2021-10-27 Network structure searching method, system and medium oriented to super large searching space

Country Status (2)

Country Link
CN (1) CN114065003A (en)
WO (1) WO2023071592A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023071592A1 (en) * 2021-10-27 2023-05-04 华南理工大学 Network structure search method for ultra-large search space, system and medium

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117806573B (en) * 2024-03-01 2024-05-24 山东云海国创云计算装备产业创新中心有限公司 Solid state disk searching method, device, equipment and medium

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP4094197A1 (en) * 2020-03-23 2022-11-30 Google LLC Neural architecture search with weight sharing
CN111814966A (en) * 2020-08-24 2020-10-23 国网浙江省电力有限公司 Neural network architecture searching method, neural network application method, device and storage medium
CN112489012A (en) * 2020-11-27 2021-03-12 大连东软教育科技集团有限公司 Neural network architecture method for CT image recognition
CN114065003A (en) * 2021-10-27 2022-02-18 华南理工大学 Network structure searching method, system and medium oriented to super large searching space

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023071592A1 (en) * 2021-10-27 2023-05-04 华南理工大学 Network structure search method for ultra-large search space, system and medium

Also Published As

Publication number Publication date
WO2023071592A1 (en) 2023-05-04

Similar Documents

Publication Publication Date Title
Liu et al. Progressive neural architecture search
EP3711000B1 (en) Regularized neural network architecture search
CN109948029B (en) Neural network self-adaptive depth Hash image searching method
US11544536B2 (en) Hybrid neural architecture search
CN110782015A (en) Training method and device for network structure optimizer of neural network and storage medium
CN109992779B (en) Emotion analysis method, device, equipment and storage medium based on CNN
CN111406267A (en) Neural architecture search using performance-predictive neural networks
CN114065003A (en) 2022-02-18 Network structure search method, system and medium for an ultra-large search space
US11544542B2 (en) Computing device and method
CN111311599B (en) Image processing method, device, electronic equipment and storage medium
CN112906865B (en) Neural network architecture searching method and device, electronic equipment and storage medium
CN113792768A (en) Hypergraph neural network classification method and device
CN115797632A (en) Image segmentation method based on multi-task learning
WO2021055442A1 (en) Small and fast video processing networks via neural architecture search
US20230051237A1 (en) Determining material properties based on machine learning models
CN111832693A (en) Neural network layer operation and model training method, device and equipment
CN111914949A (en) Zero sample learning model training method and device based on reinforcement learning
CN114821248B (en) Point cloud understanding-oriented data active screening and labeling method and device
CN116705192A (en) Drug virtual screening method and device based on deep learning
CN115641474A (en) Unknown type defect detection method and device based on efficient student network
CN111325343B (en) Neural network determination, target detection and intelligent driving control method and device
CN114496068A (en) Protein secondary structure prediction method, device, equipment and storage medium
CN115398446A (en) Machine learning algorithm search using symbolic programming
CN114826921B (en) Dynamic network resource allocation method, system and medium based on sampling subgraph
CN114512188B (en) DNA binding protein recognition method based on improved protein sequence position specificity matrix

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination