CN112906865A - Neural network architecture searching method and device, electronic equipment and storage medium - Google Patents

Neural network architecture searching method and device, electronic equipment and storage medium

Info

Publication number
CN112906865A
Authority
CN
China
Prior art keywords
network
search
individuals
individual
determining
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110191861.2A
Other languages
Chinese (zh)
Other versions
CN112906865B (en)
Inventor
骆剑平
陈宇苏
Current Assignee
Shenzhen University
Original Assignee
Shenzhen University
Priority date
Filing date
Publication date
Application filed by Shenzhen University filed Critical Shenzhen University
Priority to CN202110191861.2A priority Critical patent/CN112906865B/en
Publication of CN112906865A publication Critical patent/CN112906865A/en
Application granted granted Critical
Publication of CN112906865B publication Critical patent/CN112906865B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/08 - Learning methods
    • G06N 3/086 - Learning methods using evolutionary algorithms, e.g. genetic algorithms or genetic programming
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 - Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The embodiments of the present application disclose a neural network architecture searching method and apparatus, an electronic device, and a storage medium. The method comprises the following steps: performing search training on at least two network individuals in an initial population according to a first search parameter setting and a sample image data set, and determining a target network individual from the at least two network individuals according to the search training result, wherein each network individual comprises at least two network units connected in a chain manner, and each network unit comprises at least two nodes; performing search training on the target network individual according to a second search parameter setting and the sample image data set to obtain an initial neural network model; and determining the operation modes among the nodes in the network units of the initial neural network model to obtain a target neural network model. Through this technical scheme, an optimal target neural network is obtained while balancing network performance against computing-resource cost.

Description

Neural network architecture searching method and device, electronic equipment and storage medium
Technical Field
The embodiment of the application relates to the technical field of image processing, in particular to a neural network architecture searching method, a neural network architecture searching device, electronic equipment and a storage medium.
Background
Neural Architecture Search (NAS) is an important research direction in the field of Automated Machine Learning (AutoML). Its main process is to search, with a specific search method, a predefined search space containing the candidate operations of various neural networks, and finally to combine the results into one or more neural networks with good performance. In this process the search method is the key of the technology: it directly determines the computational cost consumed during the search and the performance of the final network.
In the prior art, the mainstream NAS methods include evolution-based, gradient-descent-based, and reinforcement-learning-based methods. However, evolution-based methods consume a large amount of time and computing resources during the solution process, while gradient-descent-based methods easily fall into a local optimum, so that the resulting network architecture is also locally optimal, homogeneous in structure, and limited in performance.
Disclosure of Invention
The embodiment of the application provides a neural network architecture searching method, a neural network architecture searching device, electronic equipment and a storage medium, so as to determine an optimal neural network model and improve network performance.
In a first aspect, an embodiment of the present application provides a neural network architecture searching method, where the method includes:
performing search training on at least two network individuals in the initial population according to the first search parameter setting and the sample image data set, and determining a target network individual from the at least two network individuals according to a search training result; wherein each network individual comprises at least two network units which are connected in a chain manner, and each network unit comprises at least two nodes;
according to the second search parameter setting and the sample image data set, carrying out search training on the target network individual to obtain an initial neural network model;
and determining an operation mode among nodes in the network unit of the initial neural network model to obtain a target neural network model.
In a second aspect, an embodiment of the present application provides a neural network architecture search apparatus, including:
the target network individual determining module is used for carrying out search training on at least two network individuals in the initial population according to the first search parameter setting and the sample image data set, and determining a target network individual from the at least two network individuals according to a search training result; wherein each network individual comprises at least two network units which are connected in a chain manner, and each network unit comprises at least two nodes;
the initial neural network model determining module is used for carrying out search training on the target network individuals according to second search parameter setting and a sample image data set to obtain an initial neural network model;
and the operation mode determining module is used for determining the operation mode among the nodes in the network unit of the initial neural network model so as to obtain the target neural network model.
In a third aspect, an embodiment of the present application provides an electronic device, including:
one or more processors;
a memory for storing one or more programs;
when the one or more programs are executed by the one or more processors, the one or more processors are caused to implement the neural network architecture search method as provided by any of the embodiments of the present application.
In a fourth aspect, the present application further provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the neural network architecture search method as provided in any of the embodiments of the present application.
According to the technical scheme of the embodiments of the present application, at least two network individuals in an initial population are search-trained by combining a first search parameter setting and a sample image data set, and a target network individual is determined from the at least two network individuals according to the search training result, wherein each network individual comprises at least two chain-connected network units and each network unit comprises at least two nodes; the target network individual is then search-trained according to a second search parameter setting and the sample image data set to obtain an initial neural network model, and the operation modes among the nodes in the network units of the initial neural network model are determined to obtain the target neural network model. By introducing the concept of a population, the diversity of the population is improved while dominant individuals are retained, so that the search for the network structure is less likely to fall into a local optimum; at the same time, an optimal target neural network is obtained while balancing network performance against computing-resource cost.
Drawings
Fig. 1A is a flowchart of a neural network architecture search method according to an embodiment of the present application;
fig. 1B is a schematic structural diagram of a network entity according to an embodiment of the present application;
fig. 1C is a schematic structural diagram of a network unit according to an embodiment of the present application;
fig. 2 is a flowchart of a neural network architecture search method according to a second embodiment of the present application;
fig. 3A is a flowchart of a neural network architecture search method according to a third embodiment of the present application;
fig. 3B is a schematic diagram of a network individual before non-dominated sorting according to a third embodiment of the present application;
fig. 3C is a schematic diagram of a non-dominated sorted network individual provided in the third embodiment of the present application;
fig. 4 is a schematic structural diagram of a neural network architecture search apparatus according to a fourth embodiment of the present application;
fig. 5 is a schematic structural diagram of an electronic device according to a fifth embodiment of the present application.
Detailed Description
The embodiments of the present application will be described in further detail with reference to the drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the embodiments of the application and are not limiting of the application. It should be further noted that, for convenience of description, only some structures related to the embodiments of the present application are shown in the drawings, not all of the structures are shown.
Example one
Fig. 1A is a flowchart of a neural network architecture search method according to an embodiment of the present application. The embodiment can be applied to image processing task scenarios, and is particularly suitable for image recognition and classification tasks. The method may be executed by the neural network architecture search apparatus provided in the embodiments of the present application; the apparatus may be implemented in hardware and/or software, and may be integrated into an electronic device that carries the neural network architecture search function, such as a server or a workstation configured with a Graphics Processing Unit (GPU) accelerator card.
As shown in fig. 1A, the method may specifically include:
s110, performing search training on at least two network individuals in the initial population according to the first search parameter setting and the sample image data set, and determining a target network individual from the at least two network individuals according to a search training result.
The initial population includes at least two network individuals. Any network individual may include at least two network units connected in a chain manner; for example, the network individual shown in fig. 1B is formed by combining n (n ≥ 2) network units in chain order. A network unit is a modularized network architecture and, according to its function, may be a normal cell or a dimension-reduction cell (reduction cell): a normal cell does not change the width and height of the input image data, whereas a reduction cell does. Further, each network unit may comprise at least two nodes; a specific network unit is composed of N (N > 2) nodes, where the nodes are feature maps in a convolutional network and each directed edge represents a specific operation between nodes, as shown in fig. 1C, in which 0, 1, 2, and 3 denote the nodes of a network unit. Optionally, the network unit contains a number of mixed operations (mixed-op), each of which may include 8 specific operation modes: residual connection (skip-connect), maximum pooling (max-pool-3x3), average pooling (avg-pool-3x3), 3x3 depthwise separable convolution (sep-conv-3x3), 5x5 depthwise separable convolution (sep-conv-5x5), 3x3 dilated convolution (dil-conv-3x3), 5x5 dilated convolution (dil-conv-5x5), and the zero operation. Each of these specific operation modes is given a network architecture parameter (also called an operation weight) α_i, i ∈ [1, 8], serving as the weight of that operation mode within a single mixed operation; these weights are adjusted and optimized by gradient descent during the subsequent search.
In the initial environment, multiple specific operation modes may exist between any two nodes of a network unit. For example, there are 4 specific operation modes between node 0 and node 2 on the right of fig. 1C, each assigned an operation weight α_1, α_2, α_3, α_4; these are adjusted and optimized by gradient descent during the subsequent search, and the operation mode with the maximum operation weight is taken as the final operation mode between node 0 and node 2, as shown on the left of fig. 1C.
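The mixed operation described above, i.e., weighting each candidate operation between two nodes by the softmax of its architecture weight α_i, can be sketched as follows. This is a minimal illustration, not the patent's implementation: scalar functions stand in for the real feature-map operations, and all names are assumptions.

```python
import math

def softmax(alphas):
    """Convert raw architecture weights into normalized mixing weights."""
    m = max(alphas)                      # subtract max for numerical stability
    exps = [math.exp(a - m) for a in alphas]
    s = sum(exps)
    return [e / s for e in exps]

def mixed_op(x, ops, alphas):
    """Output of one mixed operation: softmax-weighted sum of candidate ops."""
    weights = softmax(alphas)
    return sum(w * op(x) for w, op in zip(weights, ops))

# Hypothetical stand-ins for candidate operation modes (skip-connect,
# pooling, convolution, zero); real operations act on feature maps.
ops = [lambda x: x,            # skip-connect
       lambda x: 0.5 * x,      # stand-in for a pooling operation
       lambda x: 2.0 * x,      # stand-in for a convolution
       lambda x: 0.0]          # zero operation

out = mixed_op(1.0, ops, alphas=[0.1, 0.2, 0.3, 0.4])
```

During search, gradient descent on the α values shifts the mixing weights toward the better-performing operation; the discretization step later keeps only the operation with the maximum weight.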
In this embodiment, determining the sample image data set may comprise determining a small-sample data set and a low-resolution data set from an original image data set, and taking the small-sample data set and the low-resolution data set as the sample image data set. The original image data set refers to a standard data set acquired from the network, such as the CIFAR-10 or CIFAR-100 data set. Optionally, the small-sample data set is determined by uniform per-class sampling from the original image data set, for example by selecting the same number of images from each category. Optionally, the images in the original image data set are scaled and resampled to obtain the low-resolution data set; for example, the 32 × 32 × 3 images in the CIFAR-10 and CIFAR-100 data sets are downscaled by bilinear or cubic interpolation to 16 × 16 × 3 images. Illustratively, a set number of images are randomly chosen from the small-sample data set and the low-resolution data set of the sample image data set, respectively, as the first sample image data set.
It can be understood that, by using the small-sample data set and the low-resolution data set to approximate the original image data set during search training, the initial performance of the network individuals can be obtained quickly, saving time.
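The equal-per-class sampling used to build the small-sample data set can be sketched as follows (an assumed implementation: the patent does not specify the sampling code, and the toy integer "images" stand in for 32 × 32 × 3 arrays):

```python
import random
from collections import defaultdict

def sample_per_class(dataset, per_class, seed=0):
    """dataset: list of (image, label) pairs.
    Returns a subset with exactly `per_class` images from every class."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for img, label in dataset:
        by_label[label].append((img, label))
    subset = []
    for label, items in by_label.items():
        subset.extend(rng.sample(items, per_class))  # without replacement
    return subset

# Toy data set: 10 classes, 100 "images" per class.
data = [(i, i % 10) for i in range(1000)]
small = sample_per_class(data, per_class=5)
```

The low-resolution data set would be produced analogously by downscaling each selected image (e.g., with bilinear interpolation) before or after sampling.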
In this embodiment, according to the first search parameter setting and the first sample image data set, search training is performed on at least two network individuals in the initial population by using a gradient descent method, and according to a search training result, a target network individual is determined from the at least two network individuals. The first search parameter setting may be, for example, a setting of the number of iterations, the number of training samples, a network depth, and the like.
And S120, carrying out search training on the target network individuals according to the second search parameter setting and the sample image data set to obtain an initial neural network model.
In this embodiment, according to the second search parameter setting and the second sample image data set, search training is performed on the target network individual by the gradient descent method to obtain an initial neural network model. The second search parameter setting is a further refinement of the first search parameter setting (e.g., increasing the number of iterations, the number of training samples, and the network depth). A set number of images are randomly picked, in equal amounts per category, from the small-sample data set and the low-resolution data set of the sample image data set, respectively, as the second sample image data set; the number of images in the second sample image data set is larger than that in the first sample image data set.
S130, determining an operation mode among nodes in the network unit of the initial neural network model to obtain a target neural network model.
Optionally, candidate operation weights between nodes in the network unit of the initial neural network model may be determined; and determining the operation mode among the nodes in the network unit of the initial neural network model according to the candidate operation weight among the nodes in the network unit of the initial neural network model. The candidate operation weight refers to an operation weight of a specific operation mode in a mixing operation between any two nodes in the network unit.
Specifically, the candidate operation weights between nodes in the network units of the initial neural network model may be determined by a softmax(α) function, which is as follows:

p_m^{(i,j)} = exp(α_m^{(i,j)}) / Σ_{m'=1}^{8} exp(α_{m'}^{(i,j)})

where α_m^{(i,j)} denotes the architecture weight corresponding to the specific operation mode o_m between two nodes i and j in a network unit; compare the operation weights α_1, α_2, α_3, α_4 of the operation modes between node 0 and node 2 in fig. 1C.
Then, for any two nodes in each network unit of the initial neural network model, the candidate operation weights between the two nodes are evaluated with an argmax function, the operation mode corresponding to the maximum candidate operation weight is obtained, and this operation mode is taken as the specific operation mode between the two nodes. The argmax function is as follows:

o^{(i,j)} = o_{m*}, where m* = argmax_m α_m^{(i,j)}

and o^{(i,j)} denotes the specific operation mode finally retained between the two nodes i and j.
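The discretization step, softmax over the candidate weights followed by argmax, can be sketched as follows (illustrative names; the operation list matches the 8 operation modes named earlier, but the α values are made up for the example):

```python
import math

OP_NAMES = ["skip-connect", "max-pool-3x3", "avg-pool-3x3", "sep-conv-3x3",
            "sep-conv-5x5", "dil-conv-3x3", "dil-conv-5x5", "zero"]

def candidate_weights(alphas):
    """softmax(alpha): normalized candidate operation weights."""
    m = max(alphas)
    exps = [math.exp(a - m) for a in alphas]
    s = sum(exps)
    return [e / s for e in exps]

def select_operation(alphas, names=OP_NAMES):
    """argmax: keep only the operation mode with the largest weight."""
    weights = candidate_weights(alphas)
    best = max(range(len(weights)), key=lambda k: weights[k])
    return names[best]

# Example architecture weights for one node pair (made up for illustration).
alphas = [0.1, 1.2, 0.3, 2.5, 0.2, 0.4, 0.1, 0.0]
chosen = select_operation(alphas)
```

Since softmax is monotone, the argmax over the softmax weights equals the argmax over the raw α values; the softmax is still useful for inspecting the relative confidence of each candidate.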
According to the technical scheme of this embodiment, at least two network individuals in an initial population are search-trained by combining the first search parameter setting and the first sample image data set, and a target network individual is determined from the at least two network individuals according to the search training result, wherein each network individual comprises at least two chain-connected network units and each network unit comprises at least two nodes; the target network individual is then search-trained according to the second search parameter setting and the sample image data set to obtain an initial neural network model, and the operation modes among the nodes in its network units are determined to obtain the target neural network model. By introducing the concept of a population, the diversity of the population is improved while dominant individuals are retained, so that the search for the network structure is less likely to fall into a local optimum; at the same time, an optimal target neural network is obtained while balancing network performance against computing-resource cost.
On the basis of the above embodiment, after the operation modes between the nodes in the network units of the initial neural network model are determined, the network units may be recombined according to their attributes to determine the target neural network model. The attribute of a network unit is determined by its function: normal cell or reduction cell. Typically, the obtained initial neural network model includes 20 network units, of which 18 are normal cells and 2 are reduction cells; in this embodiment the 20 network units are re-chained to determine the target neural network model. For example, the reduction cells are placed at the 1/3 and 2/3 depth positions, and normal cells are placed at the other positions.
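The re-chaining rule, reduction cells at 1/3 and 2/3 of the depth and normal cells elsewhere, can be sketched as follows (an assumed layout function; the patent only states the positions, not the code):

```python
def cell_layout(total_cells=20):
    """Return a list of cell markers: 'R' (reduction cell) at the 1/3 and
    2/3 depth positions, 'N' (normal cell) everywhere else."""
    reduction_at = {total_cells // 3, 2 * total_cells // 3}
    return ["R" if i in reduction_at else "N" for i in range(total_cells)]

layout = cell_layout(20)   # 18 normal cells and 2 reduction cells
```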
Example two
Fig. 2 is a flowchart of a neural network architecture search method according to a second embodiment of the present application; on the basis of the above embodiment, optimization is performed on "at least two network individuals in the initial population are search-trained according to the first search parameter setting and the sample image dataset, and a target network individual is determined from the at least two network individuals according to a search training result".
As shown in fig. 2, the method may specifically include:
s210, performing first-stage search training on at least two network individuals in the initial population according to the first search parameter setting and the sample image data set, and determining a first network individual from the at least two network individuals according to a search training result of the first stage.
The search training result in the first stage may include the accuracy and parameter amount of each network individual.
In this embodiment, according to the first search parameter setting and the sample image dataset, a first-stage search training is performed on at least two network individuals in the initial population by using a gradient descent method, and according to a search training result in the first stage, a first network individual is determined from the at least two network individuals. Wherein, the number of the first network individuals is at least two.
Optionally, ranking each network individual according to the accuracy in the search training result of each network individual in the first stage, and then selecting a set number of network individuals as the first network individual according to the ranking order.
S220, carrying out cross and inheritance processing on the first network individual to obtain a new network individual, and taking the new network individual and the first network individual as second network individuals.
In this embodiment, in order to further enrich the diversity of the population, the first network individual is subjected to cross and inheritance processing to obtain a new network individual, and the new network individual and the first network individual are used as the second network individual.
Specifically, crossing the first network individuals means exchanging the normal cells and reduction cells of two network individuals A and B to obtain new network individuals. For example, according to their network-unit composition the two networks may be written as A(NormalCell_A, ReductionCell_A) and B(NormalCell_B, ReductionCell_B); the crossover operation on A and B then yields the new individuals C(NormalCell_A, ReductionCell_B) and D(NormalCell_B, ReductionCell_A). Inheritance of a first network individual means that the operation weights α_i between the nodes in the network units of an existing individual A are obtained and given to a new individual B, so that B inherits the network-unit (cell) structure of A while the two individuals A and B have different numbers of network units (cells). Through the crossover and inheritance operations on the first network individuals, the excellent architectures of the previous stage are retained and the diversity of the population is increased.
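The crossover and inheritance operations can be sketched as follows. The representation is illustrative (the patent encodes cells as architectures with learned α weights; here short strings stand in for them):

```python
from dataclasses import dataclass

@dataclass
class Individual:
    normal_cell: str      # stand-in for the normal-cell architecture
    reduction_cell: str   # stand-in for the reduction-cell architecture
    num_cells: int = 8    # length of the chain of network units

def crossover(a, b):
    """Swap the normal/reduction cells of A and B, giving children C and D."""
    c = Individual(a.normal_cell, b.reduction_cell, a.num_cells)
    d = Individual(b.normal_cell, a.reduction_cell, b.num_cells)
    return c, d

def inherit(parent, num_cells):
    """Child keeps the parent's cell structures (and hence its operation
    weights) but is built with a different number of network units."""
    return Individual(parent.normal_cell, parent.reduction_cell, num_cells)

A = Individual("normA", "redA", 8)
B = Individual("normB", "redB", 8)
C, D = crossover(A, B)
E = inherit(A, num_cells=14)
```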
And S230, performing second-stage search training on the second network individuals according to the third search parameter setting and the sample image data set, and determining target network individuals from the second network individuals according to the second-stage search training results.
A set number of images are randomly selected, in equal amounts per category, from the small-sample data set and the low-resolution data set of the sample image data set as the third sample image data set; the number of images in the third sample image data set is larger than that in the first sample image data set and smaller than that in the second sample image data set.
In this embodiment, according to the third search parameter setting and the third sample image data set, the second-stage search training is performed on the second network individuals by the gradient descent method, and the target network individual is determined from the second network individuals according to the second-stage search training result. The third search parameter setting increases the depth of the network individuals, the number of training iterations, the number of samples in the training data set, and so on.
Specifically, ranking each network individual according to the accuracy in the search training result of each network individual in the second stage, and selecting the network individual with the highest accuracy as the target network individual.
S240, searching and training the target network individual according to the second searching parameter setting and the sample image data set to obtain an initial neural network model.
In the embodiment, a target network individual is searched and trained according to the second search parameter setting and the second sample image data set to obtain an initial neural network model; wherein the second search parameter setting is a further improvement of the first search parameter setting (e.g., increasing the number of iterations, number of training samples, network depth).
And S250, determining an operation mode among nodes in the network unit of the initial neural network model to obtain a target neural network model.
According to the technical scheme of this embodiment, first-stage search training is performed on at least two network individuals in the initial population by combining the first search parameter setting and the sample image data set, and the first network individuals are determined from the at least two network individuals according to the first-stage search training result; the first network individuals are then crossed and inherited to obtain new network individuals, and the new network individuals together with the first network individuals are taken as the second network individuals; finally, second-stage search training is performed on the second network individuals according to the third search parameter setting and the sample image data set, and the target network individual is determined from them according to the second-stage search training result. In this scheme, the rough search of the first stage preliminarily screens the network individuals; expanding the preliminarily screened first network individuals into the second network individuals enriches population diversity; and the fine search over the second network individuals concentrates the search resources on the selected individuals. Searching in stages thus allocates computing resources reasonably, saving computation while gradually approaching the final excellent network architecture.
EXAMPLE III
Fig. 3A is a flowchart of a neural network architecture search method according to a third embodiment of the present application; on the basis of the above embodiment, optimization is performed on "performing first-stage search training on at least two network individuals in the initial population according to the first search parameter setting and the sample image dataset, and determining a first network individual from the at least two network individuals according to a search training result of the first stage".
As shown in fig. 3A, the method may specifically include:
s310, according to the first search parameter setting and the sample image data set, performing first-stage search training on at least two network individuals in the initial population to obtain the accuracy and parameter quantity of the at least two network individuals.
In this embodiment, according to the first search parameter setting and the sample image dataset, a first-stage search training is performed on each network individual in the initial population by using a gradient descent method, so as to obtain the accuracy and the parameter quantity of each network individual.
S320, sequencing the at least two network individuals according to the accuracy and the parameter quantity of the at least two network individuals.
In this embodiment, the accuracy and the parameter count of each network individual are first collected; each network individual is regarded as a point on a two-dimensional plane whose coordinates are its accuracy and its number of parameters, and the distribution of the network individuals on this plane is recorded. Then, the points on the two-dimensional plane are non-dominated sorted; fig. 3B shows a schematic diagram of the network individuals before non-dominated sorting.
Both objectives, accuracy and parameter count, are to be optimized, but it cannot be guaranteed that any single point is simultaneously optimal in both, so the points are non-dominated sorted, i.e., ranked and layered according to the two objective components (accuracy and parameter count). Fig. 3C shows a schematic diagram of the network individuals after non-dominated sorting: the points in the top layer are better in overall performance than the points in the other layers, and this top layer is also called the Pareto front.
Specifically, the steps of performing non-dominated sorting on the network individuals and extracting the pareto fronts are as follows:
S1: for each network individual Gj, count the number Nj of network individuals that dominate Gj, and record the set Sj of network individuals dominated by Gj;
S2: initialize the pareto front index I = 0;
S3: place every Gj with Nj = 0 into the set Fj;
S4: judge whether all network individuals have completed non-dominated sorting; if so, execute S9; if not, execute S5;
S5: judge whether all network individuals Gj in Fj have been traversed; if so, execute S8; if not, execute S6;
S6: obtain the next Gj and its corresponding Sj;
S7: for every network individual in Sj, subtract 1 from its Nj, and place it in the set H when its Nj reaches 0;
S8: let the front index I = I + 1, let the set H replace the set Fj, and return to continue executing S4;
S9: the sorting is complete; end.
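The steps S1-S9 above correspond to the fast non-dominated sort used in NSGA-II. A minimal sketch, assuming accuracy is to be maximised and parameter quantity minimised (the direction of the parameter objective is an assumption, since the translated text is ambiguous):

```python
def dominates(a, b):
    # a, b are (accuracy, params) pairs; assume accuracy is maximised and
    # the parameter quantity minimised (an assumed convention).
    return (a[0] >= b[0] and a[1] <= b[1]) and (a[0] > b[0] or a[1] < b[1])

def non_dominated_sort(points):
    # Fast non-dominated sort mirroring steps S1-S9 above.
    n = len(points)
    dominated_by = [set() for _ in range(n)]  # Sj: who individual j dominates
    counts = [0] * n                          # Nj: how many dominate j
    for i in range(n):
        for j in range(n):
            if dominates(points[i], points[j]):
                dominated_by[i].add(j)
            elif dominates(points[j], points[i]):
                counts[i] += 1
    fronts = [[i for i in range(n) if counts[i] == 0]]  # first pareto front
    while fronts[-1]:
        nxt = []
        for i in fronts[-1]:
            for j in dominated_by[i]:
                counts[j] -= 1
                if counts[j] == 0:  # all its dominators already placed
                    nxt.append(j)
        fronts.append(nxt)
    return fronts[:-1]

pts = [(0.90, 3.0), (0.85, 2.0), (0.80, 5.0), (0.95, 4.0)]
fronts = non_dominated_sort(pts)
```

Here the first three points are mutually non-dominated and form the pareto front, while (0.80, 5.0) is dominated and falls to the second layer.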
S330, determining a first network individual from the at least two network individuals according to the sequencing result.
In this embodiment, selection is performed along the pareto front according to the sorting result, and a set number of first network individuals are selected from the at least two network individuals.
Specifically, points on the front are selected; the selection criteria are accuracy and parameter quantity, and a diversity principle also needs to be satisfied. The diversity principle can be understood as requiring the selected points to be as far apart as possible on the plane. If the number of points on the front is insufficient, all of them are selected, and the selection continues in the next layer until the set number of network individuals has been selected.
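One common way to realise such a diversity principle is NSGA-II's crowding distance, which prefers points far from their neighbours on the front. The sketch below is illustrative only: the patent requires the selected points to be far apart, and the crowding-distance criterion is an assumed way to achieve that.

```python
def crowding_distance(front_points):
    # front_points: list of (accuracy, params) pairs on one pareto front.
    n = len(front_points)
    dist = [0.0] * n
    for m in range(2):  # one pass per objective
        order = sorted(range(n), key=lambda i: front_points[i][m])
        dist[order[0]] = dist[order[-1]] = float("inf")  # keep extremes
        span = front_points[order[-1]][m] - front_points[order[0]][m] or 1.0
        for k in range(1, n - 1):
            dist[order[k]] += (front_points[order[k + 1]][m]
                               - front_points[order[k - 1]][m]) / span
    return dist

def select(fronts, points, k):
    # Fill from the best front down; break ties inside the last, partial
    # front by crowding distance (most isolated points first).
    chosen = []
    for front in fronts:
        if len(chosen) + len(front) <= k:
            chosen.extend(front)
        else:
            d = crowding_distance([points[i] for i in front])
            ranked = sorted(front, key=lambda i: -d[front.index(i)])
            chosen.extend(ranked[:k - len(chosen)])
            break
    return chosen

points = [(0.90, 3.0), (0.85, 2.0), (0.80, 5.0), (0.95, 4.0)]
fronts = [[0, 1, 3], [2]]  # as produced by the non-dominated sorting step
selected = select(fronts, points, 2)
```

With a budget of two, the two extreme points of the front (highest accuracy and smallest parameter quantity) are kept, which matches the "as far apart as possible" requirement.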
S340, carrying out cross and inheritance processing on the first network individual to obtain a new network individual, and taking the new network individual and the first network individual as second network individuals.
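A minimal sketch of this cross-and-inheritance step, assuming a hypothetical flat integer-gene encoding of each network individual (the patent does not fix the encoding; single-point crossover and per-gene mutation are illustrative choices):

```python
import random

# Hypothetical encoding: each individual is a list of integer genes
# (e.g. candidate-operation indices per edge); illustrative only.

def crossover(parent_a, parent_b, rng):
    # Single-point crossover: splice a prefix of one parent onto the
    # suffix of the other.
    cut = rng.randrange(1, len(parent_a))
    return parent_a[:cut] + parent_b[cut:]

def mutate(genes, n_ops, rate, rng):
    # With probability `rate`, replace a gene with a random operation index.
    return [rng.randrange(n_ops) if rng.random() < rate else g for g in genes]

rng = random.Random(0)
first_individuals = [[0, 1, 2, 3], [3, 2, 1, 0]]
offspring = [mutate(crossover(*rng.sample(first_individuals, 2), rng),
                    n_ops=4, rate=0.1, rng=rng) for _ in range(4)]
# "Second network individuals" = parents kept alongside the new offspring.
second_individuals = first_individuals + offspring
```

Keeping the parents in `second_individuals` mirrors step S340: the new individuals and the first network individuals together form the second network individuals.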
And S350, performing second-stage search training on the second network individuals according to the third search parameter setting and the sample image data set, and determining target network individuals from the second network individuals according to the second-stage search training results.
And S360, carrying out search training on the target network individuals according to the second search parameter setting and the sample image data set to obtain an initial neural network model.
And S370, determining an operation mode among nodes in the network unit of the initial neural network model to obtain a target neural network model.
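Step S370 can be illustrated under the assumption that, as in gradient-based architecture search, each pair of nodes carries a weight per candidate operation and the retained operation is the one with the largest normalised weight. The candidate-operation names below are hypothetical:

```python
import math

CANDIDATE_OPS = ["skip", "conv3x3", "conv5x5", "maxpool"]  # illustrative names

def discretize(edge_weights):
    # edge_weights: {edge: [raw weight per candidate operation]}.
    # Softmax-normalise each edge's weights and keep the argmax operation.
    chosen = {}
    for edge, raw in edge_weights.items():
        exp = [math.exp(w) for w in raw]
        total = sum(exp)
        probs = [e / total for e in exp]
        chosen[edge] = CANDIDATE_OPS[probs.index(max(probs))]
    return chosen

weights = {(0, 1): [0.1, 1.2, -0.3, 0.0], (1, 2): [2.0, 0.1, 0.1, 0.1]}
ops = discretize(weights)
```

Since softmax is monotonic, the argmax of the normalised weights equals the argmax of the raw weights; the normalisation is kept only to show the candidate-operation weights as probabilities.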
According to the technical scheme of the embodiment of the application, at least two network individuals in an initial population are subjected to first-stage search training according to first search parameter setting and a sample image data set, so that the accuracy and parameter quantity of the at least two network individuals are obtained; and sequencing the at least two network individuals according to the accuracy and the parameter quantity of the at least two network individuals, and determining a first network individual from the at least two network individuals according to a sequencing result. According to the technical scheme, the network individuals with better performance in the population are obtained by using non-dominated sorting and pareto front selection, and the current performance and the potential performance of the network individuals are considered at the same time.
Example four
Fig. 4 is a schematic structural diagram of a neural network architecture search apparatus according to a fourth embodiment of the present application; the embodiment can be applied to image processing task scenes, and is particularly suitable for recognition and classification task scenes of images. The device can be implemented by hardware and/or software, and can be integrated in an electronic device bearing a neural network architecture search function, such as a server or a workstation configured with a Graphics Processing Unit (GPU) accelerator card, and the like.
As shown in fig. 4, the apparatus includes a target network individual determination module 410, an initial neural network model determination module 420, and an operation manner determination module 430, wherein,
a target network individual determining module 410, configured to perform search training on at least two network individuals in the initial population according to the first search parameter setting and the sample image data set, and determine a target network individual from the at least two network individuals according to a search training result; wherein each network entity comprises at least two network elements which are connected in a chain way, and each network element comprises at least two nodes;
the initial neural network model determining module 420 is configured to perform search training on the target network individual according to the second search parameter setting and the sample image data set, so as to obtain an initial neural network model;
and an operation mode determining module 430, configured to determine an operation mode between nodes in the network unit of the initial neural network model to obtain the target neural network model.
According to the technical scheme of the embodiment of the application, at least two network individuals in an initial population are subjected to search training by combining first search parameter setting and a sample image data set, and a target network individual is determined from the at least two network individuals according to a search training result; wherein each network entity comprises at least two network elements which are connected in a chain way, and each network element comprises at least two nodes; and then, carrying out search training on the target network individual according to the second search parameter setting and the sample image data set to obtain an initial neural network model, and determining an operation mode among nodes in network units of the initial neural network model to further obtain the target neural network model. According to the technical scheme, the population concept is introduced, the diversity of the population is improved while the dominant individuals are kept, the searching process of the network structure is not easy to fall into a local optimal state, and meanwhile, the optimal target neural network is obtained on the premise of balancing the network performance and the cost of computing resources.
Further, the target network individual determination module comprises a first network individual determination unit, a second network individual determination unit and a target network individual determination unit, wherein,
the first network individual determining unit is used for carrying out first-stage search training on at least two network individuals in the initial population according to the first search parameter setting and the sample image data set, and determining a first network individual from the at least two network individuals according to a search training result of the first stage;
the second network individual determining unit is used for carrying out cross and inheritance processing on the first network individual to obtain a new network individual, and taking the new network individual and the first network individual as a second network individual;
and the target network individual determining unit is used for performing second-stage search training on the second network individuals according to the third search parameter setting and the sample image data set, and determining the target network individuals from the second network individuals according to the search training results of the second stage.
Further, the first network individual determination unit comprises a training subunit, a sorting subunit and a first network individual determination subunit, wherein,
the training subunit is used for carrying out first-stage search training on at least two network individuals in the initial population according to the first search parameter setting and the sample image data set to obtain the accuracy and parameter quantity of the at least two network individuals;
the sequencing subunit is used for sequencing the at least two network individuals according to the accuracy and the parameter quantity of the at least two network individuals;
and the first network individual determining subunit is used for determining the first network individual from the at least two network individuals according to the sequencing result.
Further, the apparatus comprises a first data determining module and a sample image dataset determining module, wherein,
a first data determination module for determining a small sample dataset and a low fraction dataset from an original image dataset;
a sample image dataset determination module for taking the small sample dataset and the low fraction dataset as a sample image dataset.
Further, the operation manner determining module 430 includes a candidate operation weight determining unit and an operation manner determining unit, wherein,
the candidate operation weight determining unit is used for determining candidate operation weights among nodes in the network unit of the initial neural network model;
and the operation mode determining unit is used for determining the operation mode among the nodes in the network unit of the initial neural network model according to the candidate operation weight among the nodes in the network unit of the initial neural network model.
Further, the apparatus further comprises a target neural network model determining module, wherein,
and the target neural network model determining module is used for combining the network units according to the attributes of the network units of the initial neural network model to determine the target neural network model.
The neural network architecture search device provided in the above embodiments can execute the neural network architecture search method provided in any embodiment of the present application, and has corresponding functional modules and beneficial effects for executing the method.
EXAMPLE five
Fig. 5 is a schematic structural diagram of an electronic device provided in the fifth embodiment of the present application, and fig. 5 shows a block diagram of an exemplary device suitable for implementing the embodiments of the present application. The device shown in fig. 5 is only an example, and should not bring any limitation to the function and the scope of use of the embodiments of the present application.
As shown in FIG. 5, electronic device 12 is embodied in the form of a general purpose computing device. The components of electronic device 12 may include, but are not limited to: one or more processors or processing units 16, a system memory 28, and a bus 18 that couples various system components including the system memory 28 and the processing unit 16.
Bus 18 represents one or more of any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, such architectures include, but are not limited to, Industry Standard Architecture (ISA) bus, Micro Channel Architecture (MCA) bus, Enhanced ISA (EISA) bus, Video Electronics Standards Association (VESA) local bus, and Peripheral Component Interconnect (PCI) bus.
Electronic device 12 typically includes a variety of computer system readable media. Such media may be any available media that is accessible by electronic device 12 and includes both volatile and nonvolatile media, removable and non-removable media.
The system memory 28 may include computer system readable media in the form of volatile memory, such as Random Access Memory (RAM)30 and/or cache memory 32. The electronic device 12 may further include other removable/non-removable, volatile/nonvolatile computer system storage media. By way of example only, storage system 34 may be used to read from and write to non-removable, nonvolatile magnetic media (not shown in FIG. 5, and commonly referred to as a "hard drive"). Although not shown in FIG. 5, a magnetic disk drive for reading from and writing to a removable, nonvolatile magnetic disk (e.g., a "floppy disk") and an optical disk drive for reading from or writing to a removable, nonvolatile optical disk (e.g., a CD-ROM, DVD-ROM, or other optical media) may be provided. In these cases, each drive may be connected to bus 18 by one or more data media interfaces. System memory 28 may include at least one program product having a set (e.g., at least one) of program modules that are configured to carry out the functions of embodiments of the application.
A program/utility 40 having a set (at least one) of program modules 42 may be stored, for example, in system memory 28, such program modules 42 including, but not limited to, an operating system, one or more application programs, other program modules, and program data, each of which examples or some combination thereof may comprise an implementation of a network environment. Program modules 42 generally perform the functions and/or methodologies of the embodiments described herein.
Electronic device 12 may also communicate with one or more external devices 14 (e.g., keyboard, pointing device, display 24, etc.), with one or more devices that enable a user to interact with electronic device 12, and/or with any devices (e.g., network card, modem, etc.) that enable electronic device 12 to communicate with one or more other computing devices. Such communication may be through an input/output (I/O) interface 22. Also, the electronic device 12 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network, such as the Internet) via the network adapter 20. As shown, the network adapter 20 communicates with other modules of the electronic device 12 via the bus 18. It should be understood that although not shown in the figures, other hardware and/or software modules may be used in conjunction with electronic device 12, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.
The processing unit 16 executes programs stored in the system memory 28 to execute various functional applications and data processing, for example, to implement the neural network architecture search method provided in the embodiments of the present application.
EXAMPLE six
A sixth embodiment of the present application further provides a computer-readable storage medium, on which a computer program (or referred to as computer-executable instructions) is stored, where the computer program is used for executing the neural network architecture searching method provided in the embodiments of the present application when the computer program is executed by a processor.
The computer storage media of the embodiments of the present application may take any combination of one or more computer-readable media. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations for embodiments of the present application may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, or C++, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present application and the technical principles employed. It will be understood by those skilled in the art that the present application is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the application. Therefore, although the embodiments of the present application have been described in more detail through the above embodiments, the embodiments of the present application are not limited to the above embodiments, and many other equivalent embodiments may be included without departing from the spirit of the present application, and the scope of the present application is determined by the scope of the appended claims.

Claims (10)

1. A neural network architecture search method, the method comprising:
performing search training on at least two network individuals in the initial population according to the first search parameter setting and the sample image data set, and determining a target network individual from the at least two network individuals according to a search training result; wherein each network entity comprises at least two network elements which are connected in a chain way, and each network element comprises at least two nodes;
according to the second search parameter setting and the sample image data set, carrying out search training on the target network individual to obtain an initial neural network model;
and determining an operation mode among nodes in the network unit of the initial neural network model to obtain a target neural network model.
2. The method of claim 1, wherein performing search training on at least two network individuals in the initial population according to the first search parameter setting and the sample image dataset, and determining a target network individual from the at least two network individuals according to a search training result comprises:
performing first-stage search training on at least two network individuals in the initial population according to first search parameter setting and a sample image data set, and determining a first network individual from the at least two network individuals according to a search training result of the first stage;
carrying out cross and inheritance processing on the first network individual to obtain a new network individual, and taking the new network individual and the first network individual as second network individuals;
and performing second-stage search training on the second network individuals according to the third search parameter setting and the sample image data set, and determining target network individuals from the second network individuals according to the search training results of the second stage.
3. The method of claim 2, wherein performing a first stage search training on at least two network individuals in the initial population according to the first search parameter setting and the sample image dataset, and determining a first network individual from the at least two network individuals according to a result of the first stage search training comprises:
performing a first-stage search training on at least two network individuals in the initial population according to a first search parameter setting and a sample image data set to obtain the accuracy and parameter quantity of the at least two network individuals;
sequencing the at least two network individuals according to the accuracy and the parameter quantity of the at least two network individuals;
and determining a first network individual from the at least two network individuals according to the sequencing result.
4. The method of claim 1, further comprising:
determining a small sample data set and a low fraction data set according to an original image data set;
and using the small sample data set and the low fraction data set as sample image data sets.
5. The method of claim 1, wherein determining the manner of operation between nodes in the network elements of the initial neural network model comprises:
determining candidate operation weight values among nodes in network units of the initial neural network model;
and determining the operation mode among the nodes in the network unit of the initial neural network model according to the candidate operation weight among the nodes in the network unit of the initial neural network model.
6. The method of claim 1, wherein determining the operational mode between nodes in the network elements of the initial neural network model further comprises:
and combining the network units according to the attributes of the network units of the initial neural network model to determine a target neural network model.
7. An apparatus for neural network architecture search, the apparatus comprising:
the target network individual determining module is used for carrying out search training on at least two network individuals in the initial population according to the first search parameter setting and the sample image data set, and determining a target network individual from the at least two network individuals according to a search training result; wherein each network entity comprises at least two network elements which are connected in a chain way, and each network element comprises at least two nodes;
the initial neural network model determining module is used for carrying out search training on the target network individuals according to second search parameter setting and a sample image data set to obtain an initial neural network model;
and the operation mode determining module is used for determining the operation mode among the nodes in the network unit of the initial neural network model so as to obtain the target neural network model.
8. The apparatus of claim 7, wherein the target network individual determination module comprises:
the first network individual determining unit is used for carrying out first-stage search training on at least two network individuals in the initial population according to the first search parameter setting and the sample image data set, and determining a first network individual from the at least two network individuals according to a search training result of the first stage;
the second network individual determining unit is used for carrying out cross and inheritance processing on the first network individual to obtain a new network individual and taking the new network individual and the first network individual as a second network individual;
and the target network individual determining unit is used for carrying out second-stage search training on the second network individuals according to the third search parameter setting and the sample image data set, and determining the target network individuals from the second network individuals according to the second-stage search training results.
9. An electronic device, comprising:
one or more processors;
a memory for storing one or more programs;
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the neural network architecture search method of any one of claims 1-6.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out a neural network architecture search method according to any one of claims 1 to 6.
CN202110191861.2A 2021-02-19 2021-02-19 Neural network architecture searching method and device, electronic equipment and storage medium Active CN112906865B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110191861.2A CN112906865B (en) 2021-02-19 2021-02-19 Neural network architecture searching method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110191861.2A CN112906865B (en) 2021-02-19 2021-02-19 Neural network architecture searching method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN112906865A true CN112906865A (en) 2021-06-04
CN112906865B CN112906865B (en) 2023-08-18

Family

ID=76123869

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110191861.2A Active CN112906865B (en) 2021-02-19 2021-02-19 Neural network architecture searching method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112906865B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113554097A (en) * 2021-07-26 2021-10-26 北京市商汤科技开发有限公司 Model quantization method and device, electronic equipment and storage medium
CN113627611A (en) * 2021-08-06 2021-11-09 苏州科韵激光科技有限公司 Model training method and device, electronic equipment and storage medium
CN113780518A (en) * 2021-08-10 2021-12-10 深圳大学 Network architecture optimization method, terminal device and computer-readable storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190188562A1 (en) * 2017-12-15 2019-06-20 International Business Machines Corporation Deep Neural Network Hardening Framework
CN111553480A (en) * 2020-07-10 2020-08-18 腾讯科技(深圳)有限公司 Neural network searching method and device, computer readable medium and electronic equipment
US20200302272A1 (en) * 2019-03-19 2020-09-24 Cisco Technology, Inc. Systems and methods for auto machine learning and neural architecture search
CN111814966A (en) * 2020-08-24 2020-10-23 国网浙江省电力有限公司 Neural network architecture searching method, neural network application method, device and storage medium
CN111967569A (en) * 2020-06-29 2020-11-20 北京百度网讯科技有限公司 Neural network structure generation method and device, storage medium and electronic equipment

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190188562A1 (en) * 2017-12-15 2019-06-20 International Business Machines Corporation Deep Neural Network Hardening Framework
US20200302272A1 (en) * 2019-03-19 2020-09-24 Cisco Technology, Inc. Systems and methods for auto machine learning and neural architecture search
CN111967569A (en) * 2020-06-29 2020-11-20 北京百度网讯科技有限公司 Neural network structure generation method and device, storage medium and electronic equipment
CN111553480A (en) * 2020-07-10 2020-08-18 腾讯科技(深圳)有限公司 Neural network searching method and device, computer readable medium and electronic equipment
CN111814966A (en) * 2020-08-24 2020-10-23 国网浙江省电力有限公司 Neural network architecture searching method, neural network application method, device and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
BIAN Weiwei; QIU Xuyang; SHEN Yan: "Target recognition method based on neural network architecture search", Journal of Air Force Engineering University (Natural Science Edition), no. 04, pages 92 - 96 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113554097A (en) * 2021-07-26 2021-10-26 北京市商汤科技开发有限公司 Model quantization method and device, electronic equipment and storage medium
CN113554097B (en) * 2021-07-26 2023-03-24 北京市商汤科技开发有限公司 Model quantization method and device, electronic equipment and storage medium
CN113627611A (en) * 2021-08-06 2021-11-09 苏州科韵激光科技有限公司 Model training method and device, electronic equipment and storage medium
CN113780518A (en) * 2021-08-10 2021-12-10 深圳大学 Network architecture optimization method, terminal device and computer-readable storage medium
CN113780518B (en) * 2021-08-10 2024-03-08 深圳大学 Network architecture optimization method, terminal equipment and computer readable storage medium

Also Published As

Publication number Publication date
CN112906865B (en) 2023-08-18

Similar Documents

Publication Publication Date Title
CN110991311B (en) Target detection method based on dense connection deep network
CN110163234B (en) Model training method and device and storage medium
CN112906865B (en) Neural network architecture searching method and device, electronic equipment and storage medium
CN108920720B (en) Large-scale image retrieval method based on depth hash and GPU acceleration
CN112613581B (en) Image recognition method, system, computer equipment and storage medium
US10467547B1 (en) Normalizing text attributes for machine learning models
WO2023138188A1 (en) Feature fusion model training method and apparatus, sample retrieval method and apparatus, and computer device
CN112749300B (en) Method, apparatus, device, storage medium and program product for video classification
WO2021218037A1 (en) Target detection method and apparatus, computer device and storage medium
CN115496955A (en) Image classification model training method, image classification method, apparatus and medium
CN113139651A (en) Training method and device of label proportion learning model based on self-supervision learning
CN114154557A (en) Cancer tissue classification method, apparatus, electronic device, and storage medium
WO2023124342A1 (en) Low-cost automatic neural architecture search method for image classification
CN111178196B (en) Cell classification method, device and equipment
CN115344805A (en) Material auditing method, computing equipment and storage medium
WO2021253938A1 (en) Neural network training method and apparatus, and video recognition method and apparatus
CN111709473B (en) Clustering method and device for object features
CN111783688B (en) Remote sensing image scene classification method based on convolutional neural network
CN111158918B (en) Supporting point parallel enumeration load balancing method, device, equipment and medium
CN110009091B (en) Optimization of learning network in equivalence class space
CN111738290A (en) Image detection method, model construction and training method, device, equipment and medium
CN116958809A (en) Remote sensing small sample target detection method for feature library migration
US10997497B2 (en) Calculation device for and calculation method of performing convolution
CN108830302B (en) Image classification method, training method, classification prediction method and related device
US20220343146A1 (en) Method and system for temporal graph neural network acceleration

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant