CN112906865A - Neural network architecture searching method and device, electronic equipment and storage medium - Google Patents

Neural network architecture searching method and device, electronic equipment and storage medium

Info

Publication number
CN112906865A
Authority
CN
China
Prior art keywords
network
search
individuals
individual
determining
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110191861.2A
Other languages
Chinese (zh)
Other versions
CN112906865B (en)
Inventor
骆剑平
陈宇苏
Current Assignee
Shenzhen University
Original Assignee
Shenzhen University
Priority date
Filing date
Publication date
Application filed by Shenzhen University filed Critical Shenzhen University
Priority to CN202110191861.2A priority Critical patent/CN112906865B/en
Publication of CN112906865A publication Critical patent/CN112906865A/en
Application granted granted Critical
Publication of CN112906865B publication Critical patent/CN112906865B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/08 - Learning methods
    • G06N 3/086 - Learning methods using evolutionary algorithms, e.g. genetic algorithms or genetic programming
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 - Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The embodiments of the present application disclose a neural network architecture searching method and apparatus, an electronic device, and a storage medium. The method comprises the following steps: performing search training on at least two network individuals in an initial population according to a first search parameter setting and a sample image data set, and determining a target network individual from the at least two network individuals according to the search training result, wherein each network individual comprises at least two network units connected in a chain manner, and each network unit comprises at least two nodes; performing search training on the target network individual according to a second search parameter setting and the sample image data set to obtain an initial neural network model; and determining the operation modes among the nodes in the network units of the initial neural network model to obtain a target neural network model. Through this technical scheme, an optimal target neural network is obtained while balancing network performance against computing-resource cost.

Description

Neural network architecture searching method and device, electronic equipment and storage medium
Technical Field
The embodiment of the application relates to the technical field of image processing, in particular to a neural network architecture searching method, a neural network architecture searching device, electronic equipment and a storage medium.
Background
Neural Architecture Search (NAS) is an important research direction in the field of Automated Machine Learning (AutoML). Its main process is to search, with a specific search method, a predefined search space containing the candidate operations of various neural networks, and finally to combine the results into one or more neural networks with good performance. In this process the search method is the key of the technology: it directly determines the computational cost consumed during the search and the performance of the final network.
In the prior art, the mainstream NAS methods include evolution-based, gradient-descent-based, and reinforcement-learning-based methods. However, evolution-based methods consume a large amount of time and computing resources during the solution process, while gradient-descent-based methods easily fall into a local optimum, so that the resulting network architecture is also locally optimal, homogeneous in structure, and limited in performance.
Disclosure of Invention
The embodiment of the application provides a neural network architecture searching method, a neural network architecture searching device, electronic equipment and a storage medium, so as to determine an optimal neural network model and improve network performance.
In a first aspect, an embodiment of the present application provides a neural network architecture searching method, where the method includes:
performing search training on at least two network individuals in the initial population according to the first search parameter setting and the sample image data set, and determining a target network individual from the at least two network individuals according to a search training result; wherein each network individual comprises at least two network units which are connected in a chain manner, and each network unit comprises at least two nodes;
according to the second search parameter setting and the sample image data set, carrying out search training on the target network individual to obtain an initial neural network model;
and determining an operation mode among nodes in the network unit of the initial neural network model to obtain a target neural network model.
In a second aspect, an embodiment of the present application provides a neural network architecture search apparatus, including:
the target network individual determining module is used for carrying out search training on at least two network individuals in the initial population according to the first search parameter setting and the sample image data set, and determining a target network individual from the at least two network individuals according to a search training result; wherein each network individual comprises at least two network units which are connected in a chain manner, and each network unit comprises at least two nodes;
the initial neural network model determining module is used for carrying out search training on the target network individuals according to second search parameter setting and a sample image data set to obtain an initial neural network model;
and the operation mode determining module is used for determining the operation mode among the nodes in the network unit of the initial neural network model so as to obtain the target neural network model.
In a third aspect, an embodiment of the present application provides an electronic device, including:
one or more processors;
a memory for storing one or more programs;
when the one or more programs are executed by the one or more processors, the one or more processors are caused to implement the neural network architecture search method as provided by any of the embodiments of the present application.
In a fourth aspect, the present application further provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the neural network architecture search method as provided in any of the embodiments of the present application.
According to the technical scheme of the embodiments of the present application, at least two network individuals in an initial population are search-trained by combining a first search parameter setting and a sample image data set, and a target network individual is determined from the at least two network individuals according to the search training result, wherein each network individual comprises at least two chain-connected network units and each network unit comprises at least two nodes; the target network individual is then search-trained according to a second search parameter setting and the sample image data set to obtain an initial neural network model, and the operation modes among the nodes in the network units of the initial neural network model are determined to obtain the target neural network model. By introducing the concept of a population, the diversity of the population is improved while dominant individuals are retained, so that the search for the network structure is less likely to fall into a local optimum; at the same time, an optimal target neural network is obtained while balancing network performance against computing-resource cost.
Drawings
Fig. 1A is a flowchart of a neural network architecture search method according to an embodiment of the present application;
fig. 1B is a schematic structural diagram of a network entity according to an embodiment of the present application;
fig. 1C is a schematic structural diagram of a network unit according to an embodiment of the present application;
fig. 2 is a flowchart of a neural network architecture search method according to a second embodiment of the present application;
fig. 3A is a flowchart of a neural network architecture search method according to a third embodiment of the present application;
fig. 3B is a schematic diagram of a network individual before non-dominated sorting according to a third embodiment of the present application;
fig. 3C is a schematic diagram of a non-dominated sorted network individual provided in the third embodiment of the present application;
fig. 4 is a schematic structural diagram of a neural network architecture search apparatus according to a fourth embodiment of the present application;
fig. 5 is a schematic structural diagram of an electronic device according to a fifth embodiment of the present application.
Detailed Description
The embodiments of the present application will be described in further detail with reference to the drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the embodiments of the application and are not limiting of the application. It should be further noted that, for convenience of description, only some structures related to the embodiments of the present application are shown in the drawings, not all of the structures are shown.
Example one
Fig. 1A is a flowchart of a neural network architecture search method according to an embodiment of the present application. The embodiment can be applied to image processing task scenarios, and is particularly suitable for image recognition and classification tasks. The method may be executed by the neural network architecture search apparatus provided in the embodiments of the present application; the apparatus may be implemented in hardware and/or software, and may be integrated into an electronic device that carries the neural network architecture search function, such as a server or a workstation configured with a Graphics Processing Unit (GPU) accelerator card.
As shown in fig. 1A, the method may specifically include:
s110, performing search training on at least two network individuals in the initial population according to the first search parameter setting and the sample image data set, and determining a target network individual from the at least two network individuals according to a search training result.
The initial population includes at least two network individuals. Any network individual may include at least two network units connected in a chain manner; for example, the network individual shown in fig. 1B is formed by combining n (n ≥ 2) network units in chain order. A network unit is a modularized network architecture and, according to its function, may be a normal cell or a dimension-reduction cell (reduction cell): a normal cell does not change the width and height of the input image data, whereas a reduction cell does. Further, each network unit may comprise at least two nodes; a specific network unit is composed of N (N > 2) nodes, where the nodes are feature maps in a convolutional network and each directed edge represents a specific operation between nodes, as shown in fig. 1C, in which 0, 1, 2, and 3 denote the nodes of a network unit. Optionally, the network unit contains a number of mixed operations (mixed-op), each of which may include 8 specific operation modes: residual connection (skip-connect), maximum pooling (max-pool-3x3), average pooling (avg-pool-3x3), 3x3 depthwise separable convolution (sep-conv-3x3), 5x5 depthwise separable convolution (sep-conv-5x5), 3x3 dilated convolution (dil-conv-3x3), 5x5 dilated convolution (dil-conv-5x5), and the zero operation. Each of these specific operation modes is given a network architecture parameter (also called an operation weight) α_i, i ∈ [1, 8], serving as the weight of that operation mode within a single mixed operation; these weights are adjusted and optimized by gradient descent during the subsequent search.
In the initial environment, multiple specific operation modes may exist between any two nodes of a network unit. For example, there are 4 specific operation modes between node 0 and node 2 on the right of fig. 1C, each assigned an operation weight α_1, α_2, α_3, α_4; these are adjusted and optimized by gradient descent during the subsequent search, and the operation mode with the maximum operation weight is taken as the final operation mode between node 0 and node 2, as shown on the left of fig. 1C.
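The mixed operation described above, i.e., weighting each candidate operation between two nodes by the softmax of its architecture weight α_i, can be sketched as follows. This is a minimal illustration, not the patent's implementation: scalar functions stand in for the real feature-map operations, and all names are assumptions.

```python
import math

def softmax(alphas):
    """Convert raw architecture weights into normalized mixing weights."""
    m = max(alphas)                      # subtract max for numerical stability
    exps = [math.exp(a - m) for a in alphas]
    s = sum(exps)
    return [e / s for e in exps]

def mixed_op(x, ops, alphas):
    """Output of one mixed operation: softmax-weighted sum of candidate ops."""
    weights = softmax(alphas)
    return sum(w * op(x) for w, op in zip(weights, ops))

# Hypothetical stand-ins for candidate operation modes (skip-connect,
# pooling, convolution, zero); real operations act on feature maps.
ops = [lambda x: x,            # skip-connect
       lambda x: 0.5 * x,      # stand-in for a pooling operation
       lambda x: 2.0 * x,      # stand-in for a convolution
       lambda x: 0.0]          # zero operation

out = mixed_op(1.0, ops, alphas=[0.1, 0.2, 0.3, 0.4])
```

During search, gradient descent on the α values shifts the mixing weights toward the better-performing operation; the discretization step later keeps only the operation with the maximum weight.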
In this embodiment, determining the sample image data set may comprise determining a small-sample data set and a low-resolution data set from an original image data set, and taking the small-sample data set and the low-resolution data set as the sample image data set. The original image data set refers to a standard data set acquired from the network, such as the CIFAR-10 or CIFAR-100 data set. Optionally, the small-sample data set is determined by uniform per-class sampling from the original image data set, for example by selecting the same number of images from each category. Optionally, the images in the original image data set are scaled and resampled to obtain the low-resolution data set; for example, the 32 × 32 × 3 images in the CIFAR-10 and CIFAR-100 data sets are downscaled by bilinear or cubic interpolation to 16 × 16 × 3 images. Illustratively, a set number of images are randomly chosen from the small-sample data set and the low-resolution data set of the sample image data set, respectively, as the first sample image data set.
It can be understood that, by using the small-sample data set and the low-resolution data set to approximate the original image data set during search training, the initial performance of the network individuals can be obtained quickly, saving time.
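The equal-per-class sampling used to build the small-sample data set can be sketched as follows (an assumed implementation: the patent does not specify the sampling code, and the toy integer "images" stand in for 32 × 32 × 3 arrays):

```python
import random
from collections import defaultdict

def sample_per_class(dataset, per_class, seed=0):
    """dataset: list of (image, label) pairs.
    Returns a subset with exactly `per_class` images from every class."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for img, label in dataset:
        by_label[label].append((img, label))
    subset = []
    for label, items in by_label.items():
        subset.extend(rng.sample(items, per_class))  # without replacement
    return subset

# Toy data set: 10 classes, 100 "images" per class.
data = [(i, i % 10) for i in range(1000)]
small = sample_per_class(data, per_class=5)
```

The low-resolution data set would be produced analogously by downscaling each selected image (e.g., with bilinear interpolation) before or after sampling.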
In this embodiment, according to the first search parameter setting and the first sample image data set, search training is performed on at least two network individuals in the initial population by using a gradient descent method, and according to a search training result, a target network individual is determined from the at least two network individuals. The first search parameter setting may be, for example, a setting of the number of iterations, the number of training samples, a network depth, and the like.
And S120, carrying out search training on the target network individuals according to the second search parameter setting and the sample image data set to obtain an initial neural network model.
In this embodiment, according to the second search parameter setting and the second sample image data set, search training is performed on the target network individual by the gradient descent method to obtain an initial neural network model. The second search parameter setting is a further refinement of the first search parameter setting (e.g., increasing the number of iterations, the number of training samples, and the network depth). A set number of images are randomly picked, in equal amounts per category, from the small-sample data set and the low-resolution data set of the sample image data set, respectively, as the second sample image data set; the number of images in the second sample image data set is larger than that in the first sample image data set.
S130, determining an operation mode among nodes in the network unit of the initial neural network model to obtain a target neural network model.
Optionally, candidate operation weights between nodes in the network unit of the initial neural network model may be determined; and determining the operation mode among the nodes in the network unit of the initial neural network model according to the candidate operation weight among the nodes in the network unit of the initial neural network model. The candidate operation weight refers to an operation weight of a specific operation mode in a mixing operation between any two nodes in the network unit.
Specifically, the candidate operation weights between nodes in the network units of the initial neural network model may be determined by a softmax(α) function, which is as follows:

p_m^{(i,j)} = exp(α_m^{(i,j)}) / Σ_{m'=1}^{8} exp(α_{m'}^{(i,j)})

where α_m^{(i,j)} denotes the architecture weight corresponding to the specific operation mode o_m between two nodes i and j in a network unit; compare the operation weights α_1, α_2, α_3, α_4 of the operation modes between node 0 and node 2 in fig. 1C.
Then, for any two nodes in each network unit of the initial neural network model, the candidate operation weights between the two nodes are evaluated with an argmax function, the operation mode corresponding to the maximum candidate operation weight is obtained, and this operation mode is taken as the specific operation mode between the two nodes. The argmax function is as follows:

o^{(i,j)} = o_{m*}, where m* = argmax_m α_m^{(i,j)}

and o^{(i,j)} denotes the specific operation mode finally retained between the two nodes i and j.
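The discretization step, softmax over the candidate weights followed by argmax, can be sketched as follows (illustrative names; the operation list matches the 8 operation modes named earlier, but the α values are made up for the example):

```python
import math

OP_NAMES = ["skip-connect", "max-pool-3x3", "avg-pool-3x3", "sep-conv-3x3",
            "sep-conv-5x5", "dil-conv-3x3", "dil-conv-5x5", "zero"]

def candidate_weights(alphas):
    """softmax(alpha): normalized candidate operation weights."""
    m = max(alphas)
    exps = [math.exp(a - m) for a in alphas]
    s = sum(exps)
    return [e / s for e in exps]

def select_operation(alphas, names=OP_NAMES):
    """argmax: keep only the operation mode with the largest weight."""
    weights = candidate_weights(alphas)
    best = max(range(len(weights)), key=lambda k: weights[k])
    return names[best]

# Example architecture weights for one node pair (made up for illustration).
alphas = [0.1, 1.2, 0.3, 2.5, 0.2, 0.4, 0.1, 0.0]
chosen = select_operation(alphas)
```

Since softmax is monotone, the argmax over the softmax weights equals the argmax over the raw α values; the softmax is still useful for inspecting the relative confidence of each candidate.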
According to the technical scheme of this embodiment, at least two network individuals in an initial population are search-trained by combining the first search parameter setting and the first sample image data set, and a target network individual is determined from the at least two network individuals according to the search training result, wherein each network individual comprises at least two chain-connected network units and each network unit comprises at least two nodes; the target network individual is then search-trained according to the second search parameter setting and the sample image data set to obtain an initial neural network model, and the operation modes among the nodes in its network units are determined to obtain the target neural network model. By introducing the concept of a population, the diversity of the population is improved while dominant individuals are retained, so that the search for the network structure is less likely to fall into a local optimum; at the same time, an optimal target neural network is obtained while balancing network performance against computing-resource cost.
On the basis of the above embodiment, after the operation modes between the nodes in the network units of the initial neural network model are determined, the network units may be recombined according to their attributes to determine the target neural network model. The attribute of a network unit is determined by its function: normal cell or reduction cell. Typically, the obtained initial neural network model includes 20 network units, of which 18 are normal cells and 2 are reduction cells; in this embodiment the 20 network units are re-chained to determine the target neural network model. For example, the reduction cells are placed at the 1/3 and 2/3 depth positions, and normal cells are placed at the other positions.
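The re-chaining rule, reduction cells at 1/3 and 2/3 of the depth and normal cells elsewhere, can be sketched as follows (an assumed layout function; the patent only states the positions, not the code):

```python
def cell_layout(total_cells=20):
    """Return a list of cell markers: 'R' (reduction cell) at the 1/3 and
    2/3 depth positions, 'N' (normal cell) everywhere else."""
    reduction_at = {total_cells // 3, 2 * total_cells // 3}
    return ["R" if i in reduction_at else "N" for i in range(total_cells)]

layout = cell_layout(20)   # 18 normal cells and 2 reduction cells
```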
Example two
Fig. 2 is a flowchart of a neural network architecture search method according to a second embodiment of the present application; on the basis of the above embodiment, optimization is performed on "at least two network individuals in the initial population are search-trained according to the first search parameter setting and the sample image dataset, and a target network individual is determined from the at least two network individuals according to a search training result".
As shown in fig. 2, the method may specifically include:
s210, performing first-stage search training on at least two network individuals in the initial population according to the first search parameter setting and the sample image data set, and determining a first network individual from the at least two network individuals according to a search training result of the first stage.
The search training result in the first stage may include the accuracy and parameter amount of each network individual.
In this embodiment, according to the first search parameter setting and the sample image dataset, a first-stage search training is performed on at least two network individuals in the initial population by using a gradient descent method, and according to a search training result in the first stage, a first network individual is determined from the at least two network individuals. Wherein, the number of the first network individuals is at least two.
Optionally, ranking each network individual according to the accuracy in the search training result of each network individual in the first stage, and then selecting a set number of network individuals as the first network individual according to the ranking order.
S220, carrying out cross and inheritance processing on the first network individual to obtain a new network individual, and taking the new network individual and the first network individual as second network individuals.
In this embodiment, in order to further enrich the diversity of the population, the first network individual is subjected to cross and inheritance processing to obtain a new network individual, and the new network individual and the first network individual are used as the second network individual.
Specifically, crossing the first network individuals means exchanging the normal cells and reduction cells of two network individuals A and B to obtain new network individuals. For example, according to their network-unit composition the two networks may be written as A(NormalCell_A, ReductionCell_A) and B(NormalCell_B, ReductionCell_B); the crossover operation on A and B then yields the new individuals C(NormalCell_A, ReductionCell_B) and D(NormalCell_B, ReductionCell_A). Inheritance of a first network individual means that the operation weights α_i between the nodes in the network units of an existing individual A are obtained and given to a new individual B, so that B inherits the network-unit (cell) structure of A while the two individuals A and B have different numbers of network units (cells). Through the crossover and inheritance operations on the first network individuals, the excellent architectures of the previous stage are retained and the diversity of the population is increased.
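The crossover and inheritance operations can be sketched as follows. The representation is illustrative (the patent encodes cells as architectures with learned α weights; here short strings stand in for them):

```python
from dataclasses import dataclass

@dataclass
class Individual:
    normal_cell: str      # stand-in for the normal-cell architecture
    reduction_cell: str   # stand-in for the reduction-cell architecture
    num_cells: int = 8    # length of the chain of network units

def crossover(a, b):
    """Swap the normal/reduction cells of A and B, giving children C and D."""
    c = Individual(a.normal_cell, b.reduction_cell, a.num_cells)
    d = Individual(b.normal_cell, a.reduction_cell, b.num_cells)
    return c, d

def inherit(parent, num_cells):
    """Child keeps the parent's cell structures (and hence its operation
    weights) but is built with a different number of network units."""
    return Individual(parent.normal_cell, parent.reduction_cell, num_cells)

A = Individual("normA", "redA", 8)
B = Individual("normB", "redB", 8)
C, D = crossover(A, B)
E = inherit(A, num_cells=14)
```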
And S230, performing second-stage search training on the second network individuals according to the third search parameter setting and the sample image data set, and determining target network individuals from the second network individuals according to the second-stage search training results.
A set number of images are randomly selected, in equal amounts per category, from the small-sample data set and the low-resolution data set of the sample image data set as the third sample image data set; the number of images in the third sample image data set is larger than that in the first sample image data set and smaller than that in the second sample image data set.
In this embodiment, according to the third search parameter setting and the third sample image data set, the second-stage search training is performed on the second network individuals by the gradient descent method, and the target network individual is determined from the second network individuals according to the second-stage search training result. The third search parameter setting increases the depth of the network individuals, the number of training iterations, the number of samples in the training data set, and so on.
Specifically, ranking each network individual according to the accuracy in the search training result of each network individual in the second stage, and selecting the network individual with the highest accuracy as the target network individual.
S240, searching and training the target network individual according to the second searching parameter setting and the sample image data set to obtain an initial neural network model.
In the embodiment, a target network individual is searched and trained according to the second search parameter setting and the second sample image data set to obtain an initial neural network model; wherein the second search parameter setting is a further improvement of the first search parameter setting (e.g., increasing the number of iterations, number of training samples, network depth).
And S250, determining an operation mode among nodes in the network unit of the initial neural network model to obtain a target neural network model.
According to the technical scheme of this embodiment, first-stage search training is performed on at least two network individuals in the initial population by combining the first search parameter setting and the sample image data set, and the first network individuals are determined from the at least two network individuals according to the first-stage search training result; the first network individuals are then crossed and inherited to obtain new network individuals, and the new network individuals together with the first network individuals are taken as the second network individuals; finally, second-stage search training is performed on the second network individuals according to the third search parameter setting and the sample image data set, and the target network individual is determined from them according to the second-stage search training result. In this scheme, the rough search of the first stage preliminarily screens the network individuals; expanding the preliminarily screened first network individuals into the second network individuals enriches population diversity; and the fine search over the second network individuals concentrates the search resources on the selected individuals. Searching in stages thus allocates computing resources reasonably, saving computation while gradually approaching the final excellent network architecture.
EXAMPLE III
Fig. 3A is a flowchart of a neural network architecture search method according to a third embodiment of the present application; on the basis of the above embodiment, optimization is performed on "performing first-stage search training on at least two network individuals in the initial population according to the first search parameter setting and the sample image dataset, and determining a first network individual from the at least two network individuals according to a search training result of the first stage".
As shown in fig. 3A, the method may specifically include:
s310, according to the first search parameter setting and the sample image data set, performing first-stage search training on at least two network individuals in the initial population to obtain the accuracy and parameter quantity of the at least two network individuals.
In this embodiment, according to the first search parameter setting and the sample image dataset, a first-stage search training is performed on each network individual in the initial population by using a gradient descent method, so as to obtain the accuracy and the parameter quantity of each network individual.
S320, sequencing the at least two network individuals according to the accuracy and the parameter quantity of the at least two network individuals.
In this embodiment, the accuracy and the parameter count of each network individual are first collected; each network individual is regarded as a point on a two-dimensional plane whose coordinates are its accuracy and its number of parameters, and the distribution of the network individuals on this plane is recorded. Then, the points on the two-dimensional plane are non-dominated sorted; fig. 3B shows a schematic diagram of the network individuals before non-dominated sorting.
Both objectives, accuracy and parameter count, are to be optimized, but it cannot be guaranteed that any single point is simultaneously optimal in both, so the points are non-dominated sorted, i.e., ranked and layered according to the two objective components (accuracy and parameter count). Fig. 3C shows a schematic diagram of the network individuals after non-dominated sorting: the points in the top layer are better in overall performance than the points in the other layers, and this top layer is also called the Pareto front.
Specifically, the steps of performing non-dominated sorting on the network individuals and extracting the pareto fronts are as follows:
S1: for each network individual Gj, count the number Nj of network individuals that dominate Gj, and record the set Sj of network individuals dominated by Gj;
S2: initialize the pareto front index I = 0;
S3: place every Gj with Nj = 0 into the set Fj;
S4: judge whether all network individuals have completed non-dominated sorting; if so, execute S9; if not, execute S5;
S5: judge whether all network individuals Gj in Fj have been traversed; if so, execute S8; if not, execute S6;
S6: obtain the next Gj and its corresponding Sj;
S7: for every network individual in Sj, subtract 1 from its Nj, and place it in the set H when its Nj reaches 0;
S8: let the front index I = I + 1, let the set H replace the set Fj, and return to continue executing S4;
S9: the sorting is complete; end.
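The steps S1-S9 above correspond to the fast non-dominated sort used in NSGA-II. A minimal sketch, assuming accuracy is to be maximised and parameter quantity minimised (the direction of the parameter objective is an assumption, since the translated text is ambiguous):

```python
def dominates(a, b):
    # a, b are (accuracy, params) pairs; assume accuracy is maximised and
    # the parameter quantity minimised (an assumed convention).
    return (a[0] >= b[0] and a[1] <= b[1]) and (a[0] > b[0] or a[1] < b[1])

def non_dominated_sort(points):
    # Fast non-dominated sort mirroring steps S1-S9 above.
    n = len(points)
    dominated_by = [set() for _ in range(n)]  # Sj: who individual j dominates
    counts = [0] * n                          # Nj: how many dominate j
    for i in range(n):
        for j in range(n):
            if dominates(points[i], points[j]):
                dominated_by[i].add(j)
            elif dominates(points[j], points[i]):
                counts[i] += 1
    fronts = [[i for i in range(n) if counts[i] == 0]]  # first pareto front
    while fronts[-1]:
        nxt = []
        for i in fronts[-1]:
            for j in dominated_by[i]:
                counts[j] -= 1
                if counts[j] == 0:  # all its dominators already placed
                    nxt.append(j)
        fronts.append(nxt)
    return fronts[:-1]

pts = [(0.90, 3.0), (0.85, 2.0), (0.80, 5.0), (0.95, 4.0)]
fronts = non_dominated_sort(pts)
```

Here the first three points are mutually non-dominated and form the pareto front, while (0.80, 5.0) is dominated and falls to the second layer.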
S330, determining a first network individual from the at least two network individuals according to the sequencing result.
In this embodiment, selection is performed along the pareto front according to the sorting result, and a set number of first network individuals are selected from the at least two network individuals.
Specifically, points on the front are selected; the selection criteria are accuracy and parameter quantity, and a diversity principle also needs to be satisfied. The diversity principle can be understood as requiring the selected points to be as far apart as possible on the plane. If the number of points on the front is insufficient, all of them are selected, and the selection continues in the next layer until the set number of network individuals has been selected.
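One common way to realise such a diversity principle is NSGA-II's crowding distance, which prefers points far from their neighbours on the front. The sketch below is illustrative only: the patent requires the selected points to be far apart, and the crowding-distance criterion is an assumed way to achieve that.

```python
def crowding_distance(front_points):
    # front_points: list of (accuracy, params) pairs on one pareto front.
    n = len(front_points)
    dist = [0.0] * n
    for m in range(2):  # one pass per objective
        order = sorted(range(n), key=lambda i: front_points[i][m])
        dist[order[0]] = dist[order[-1]] = float("inf")  # keep extremes
        span = front_points[order[-1]][m] - front_points[order[0]][m] or 1.0
        for k in range(1, n - 1):
            dist[order[k]] += (front_points[order[k + 1]][m]
                               - front_points[order[k - 1]][m]) / span
    return dist

def select(fronts, points, k):
    # Fill from the best front down; break ties inside the last, partial
    # front by crowding distance (most isolated points first).
    chosen = []
    for front in fronts:
        if len(chosen) + len(front) <= k:
            chosen.extend(front)
        else:
            d = crowding_distance([points[i] for i in front])
            ranked = sorted(front, key=lambda i: -d[front.index(i)])
            chosen.extend(ranked[:k - len(chosen)])
            break
    return chosen

points = [(0.90, 3.0), (0.85, 2.0), (0.80, 5.0), (0.95, 4.0)]
fronts = [[0, 1, 3], [2]]  # as produced by the non-dominated sorting step
selected = select(fronts, points, 2)
```

With a budget of two, the two extreme points of the front (highest accuracy and smallest parameter quantity) are kept, which matches the "as far apart as possible" requirement.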
S340, carrying out cross and inheritance processing on the first network individual to obtain a new network individual, and taking the new network individual and the first network individual as second network individuals.
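A minimal sketch of this cross-and-inheritance step, assuming a hypothetical flat integer-gene encoding of each network individual (the patent does not fix the encoding; single-point crossover and per-gene mutation are illustrative choices):

```python
import random

# Hypothetical encoding: each individual is a list of integer genes
# (e.g. candidate-operation indices per edge); illustrative only.

def crossover(parent_a, parent_b, rng):
    # Single-point crossover: splice a prefix of one parent onto the
    # suffix of the other.
    cut = rng.randrange(1, len(parent_a))
    return parent_a[:cut] + parent_b[cut:]

def mutate(genes, n_ops, rate, rng):
    # With probability `rate`, replace a gene with a random operation index.
    return [rng.randrange(n_ops) if rng.random() < rate else g for g in genes]

rng = random.Random(0)
first_individuals = [[0, 1, 2, 3], [3, 2, 1, 0]]
offspring = [mutate(crossover(*rng.sample(first_individuals, 2), rng),
                    n_ops=4, rate=0.1, rng=rng) for _ in range(4)]
# "Second network individuals" = parents kept alongside the new offspring.
second_individuals = first_individuals + offspring
```

Keeping the parents in `second_individuals` mirrors step S340: the new individuals and the first network individuals together form the second network individuals.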
And S350, performing second-stage search training on the second network individuals according to the third search parameter setting and the sample image data set, and determining target network individuals from the second network individuals according to the second-stage search training results.
And S360, carrying out search training on the target network individuals according to the second search parameter setting and the sample image data set to obtain an initial neural network model.
And S370, determining an operation mode among nodes in the network unit of the initial neural network model to obtain a target neural network model.
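Step S370 can be illustrated under the assumption that, as in gradient-based architecture search, each pair of nodes carries a weight per candidate operation and the retained operation is the one with the largest normalised weight. The candidate-operation names below are hypothetical:

```python
import math

CANDIDATE_OPS = ["skip", "conv3x3", "conv5x5", "maxpool"]  # illustrative names

def discretize(edge_weights):
    # edge_weights: {edge: [raw weight per candidate operation]}.
    # Softmax-normalise each edge's weights and keep the argmax operation.
    chosen = {}
    for edge, raw in edge_weights.items():
        exp = [math.exp(w) for w in raw]
        total = sum(exp)
        probs = [e / total for e in exp]
        chosen[edge] = CANDIDATE_OPS[probs.index(max(probs))]
    return chosen

weights = {(0, 1): [0.1, 1.2, -0.3, 0.0], (1, 2): [2.0, 0.1, 0.1, 0.1]}
ops = discretize(weights)
```

Since softmax is monotonic, the argmax of the normalised weights equals the argmax of the raw weights; the normalisation is kept only to show the candidate-operation weights as probabilities.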
According to the technical scheme of the embodiment of the application, at least two network individuals in an initial population are subjected to first-stage search training according to first search parameter setting and a sample image data set, so that the accuracy and parameter quantity of the at least two network individuals are obtained; and sequencing the at least two network individuals according to the accuracy and the parameter quantity of the at least two network individuals, and determining a first network individual from the at least two network individuals according to a sequencing result. According to the technical scheme, the network individuals with better performance in the population are obtained by using non-dominated sorting and pareto front selection, and the current performance and the potential performance of the network individuals are considered at the same time.
Example four
Fig. 4 is a schematic structural diagram of a neural network architecture search apparatus according to a fourth embodiment of the present application; the embodiment can be applied to image processing task scenes, and is particularly suitable for recognition and classification task scenes of images. The device can be implemented by hardware and/or software, and can be integrated in an electronic device bearing a neural network architecture search function, such as a server or a workstation configured with a Graphics Processing Unit (GPU) accelerator card, and the like.
As shown in fig. 4, the apparatus includes a target network individual determination module 410, an initial neural network model determination module 420, and an operation manner determination module 430, wherein,
a target network individual determining module 410, configured to perform search training on at least two network individuals in the initial population according to the first search parameter setting and the sample image data set, and determine a target network individual from the at least two network individuals according to a search training result; wherein each network entity comprises at least two network elements which are connected in a chain way, and each network element comprises at least two nodes;
the initial neural network model determining module 420 is configured to perform search training on the target network individual according to the second search parameter setting and the sample image data set, so as to obtain an initial neural network model;
and an operation mode determining module 430, configured to determine an operation mode between nodes in the network unit of the initial neural network model to obtain the target neural network model.
According to the technical scheme of the embodiment of the application, at least two network individuals in an initial population are subjected to search training by combining first search parameter setting and a sample image data set, and a target network individual is determined from the at least two network individuals according to a search training result; wherein each network entity comprises at least two network elements which are connected in a chain way, and each network element comprises at least two nodes; and then, carrying out search training on the target network individual according to the second search parameter setting and the sample image data set to obtain an initial neural network model, and determining an operation mode among nodes in network units of the initial neural network model to further obtain the target neural network model. According to the technical scheme, the population concept is introduced, the diversity of the population is improved while the dominant individuals are kept, the searching process of the network structure is not easy to fall into a local optimal state, and meanwhile, the optimal target neural network is obtained on the premise of balancing the network performance and the cost of computing resources.
Further, the target network individual determination module comprises a first network individual determination unit, a second network individual determination unit and a target network individual determination unit, wherein,
the first network individual determining unit is used for carrying out first-stage search training on at least two network individuals in the initial population according to the first search parameter setting and the sample image data set, and determining a first network individual from the at least two network individuals according to a search training result of the first stage;
the second network individual determining unit is used for carrying out cross and inheritance processing on the first network individual to obtain a new network individual, and taking the new network individual and the first network individual as a second network individual;
and the target network individual determining unit is used for performing second-stage search training on the second network individuals according to the third search parameter setting and the sample image data set, and determining the target network individuals from the second network individuals according to the search training results of the second stage.
Further, the first network individual determination unit comprises a training subunit, a sorting subunit and a first network individual determination subunit, wherein,
the training subunit is used for carrying out first-stage search training on at least two network individuals in the initial population according to the first search parameter setting and the sample image data set to obtain the accuracy and parameter quantity of the at least two network individuals;
the sequencing subunit is used for sequencing the at least two network individuals according to the accuracy and the parameter quantity of the at least two network individuals;
and the first network individual determining subunit is used for determining the first network individual from the at least two network individuals according to the sequencing result.
Further, the apparatus comprises a first data determining module and a sample image dataset determining module, wherein,
a first data determination module for determining a small sample dataset and a low fraction dataset from an original image dataset;
a sample image dataset determination module for taking the small sample dataset and the low fraction dataset as a sample image dataset.
Further, the operation manner determining module 430 includes a candidate operation weight determining unit and an operation manner determining unit, wherein,
the candidate operation weight determining unit is used for determining candidate operation weights among nodes in the network unit of the initial neural network model;
and the operation mode determining unit is used for determining the operation mode among the nodes in the network unit of the initial neural network model according to the candidate operation weight among the nodes in the network unit of the initial neural network model.
Further, the apparatus further comprises a target neural network model determining module, wherein,
and the target neural network model determining module is used for combining the network units according to the attributes of the network units of the initial neural network model to determine the target neural network model.
The neural network architecture search device provided in the above embodiments can execute the neural network architecture search method provided in any embodiment of the present application, and has corresponding functional modules and beneficial effects for executing the method.
EXAMPLE five
Fig. 5 is a schematic structural diagram of an electronic device provided in the fifth embodiment of the present application, and fig. 5 shows a block diagram of an exemplary device suitable for implementing the embodiments of the present application. The device shown in fig. 5 is only an example, and should not bring any limitation to the function and the scope of use of the embodiments of the present application.
As shown in FIG. 5, electronic device 12 is embodied in the form of a general purpose computing device. The components of electronic device 12 may include, but are not limited to: one or more processors or processing units 16, a system memory 28, and a bus 18 that couples various system components including the system memory 28 and the processing unit 16.
Bus 18 represents one or more of any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, such architectures include, but are not limited to, Industry Standard Architecture (ISA) bus, Micro Channel Architecture (MCA) bus, Enhanced ISA (EISA) bus, Video Electronics Standards Association (VESA) local bus, and Peripheral Component Interconnect (PCI) bus.
Electronic device 12 typically includes a variety of computer system readable media. Such media may be any available media that is accessible by electronic device 12 and includes both volatile and nonvolatile media, removable and non-removable media.
The system memory 28 may include computer system readable media in the form of volatile memory, such as Random Access Memory (RAM)30 and/or cache memory 32. The electronic device 12 may further include other removable/non-removable, volatile/nonvolatile computer system storage media. By way of example only, storage system 34 may be used to read from and write to non-removable, nonvolatile magnetic media (not shown in FIG. 5, and commonly referred to as a "hard drive"). Although not shown in FIG. 5, a magnetic disk drive for reading from and writing to a removable, nonvolatile magnetic disk (e.g., a "floppy disk") and an optical disk drive for reading from or writing to a removable, nonvolatile optical disk (e.g., a CD-ROM, DVD-ROM, or other optical media) may be provided. In these cases, each drive may be connected to bus 18 by one or more data media interfaces. System memory 28 may include at least one program product having a set (e.g., at least one) of program modules that are configured to carry out the functions of embodiments of the application.
A program/utility 40 having a set (at least one) of program modules 42 may be stored, for example, in system memory 28, such program modules 42 including, but not limited to, an operating system, one or more application programs, other program modules, and program data, each of which examples or some combination thereof may comprise an implementation of a network environment. Program modules 42 generally perform the functions and/or methodologies of the embodiments described herein.
Electronic device 12 may also communicate with one or more external devices 14 (e.g., keyboard, pointing device, display 24, etc.), with one or more devices that enable a user to interact with electronic device 12, and/or with any devices (e.g., network card, modem, etc.) that enable electronic device 12 to communicate with one or more other computing devices. Such communication may be through an input/output (I/O) interface 22. Also, the electronic device 12 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network, such as the Internet) via the network adapter 20. As shown, the network adapter 20 communicates with other modules of the electronic device 12 via the bus 18. It should be understood that although not shown in the figures, other hardware and/or software modules may be used in conjunction with electronic device 12, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.
The processing unit 16 executes programs stored in the system memory 28 to execute various functional applications and data processing, for example, to implement the neural network architecture search method provided in the embodiments of the present application.
EXAMPLE six
A sixth embodiment of the present application further provides a computer-readable storage medium, on which a computer program (or referred to as computer-executable instructions) is stored, where the computer program is used for executing the neural network architecture searching method provided in the embodiments of the present application when the computer program is executed by a processor.
The computer storage media of the embodiments of the present application may take any combination of one or more computer-readable media. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations for embodiments of the present application may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, or C++, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present application and the technical principles employed. It will be understood by those skilled in the art that the present application is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the application. Therefore, although the embodiments of the present application have been described in more detail through the above embodiments, the embodiments of the present application are not limited to the above embodiments, and many other equivalent embodiments may be included without departing from the spirit of the present application, and the scope of the present application is determined by the scope of the appended claims.

Claims (10)

1. A neural network architecture search method, the method comprising:
performing search training on at least two network individuals in the initial population according to the first search parameter setting and the sample image data set, and determining a target network individual from the at least two network individuals according to a search training result; wherein each network entity comprises at least two network elements which are connected in a chain way, and each network element comprises at least two nodes;
according to the second search parameter setting and the sample image data set, carrying out search training on the target network individual to obtain an initial neural network model;
and determining an operation mode among nodes in the network unit of the initial neural network model to obtain a target neural network model.
2. The method of claim 1, wherein performing search training on at least two network individuals in the initial population according to the first search parameter setting and the sample image dataset, and determining a target network individual from the at least two network individuals according to a search training result comprises:
performing first-stage search training on at least two network individuals in the initial population according to first search parameter setting and a sample image data set, and determining a first network individual from the at least two network individuals according to a search training result of the first stage;
carrying out cross and inheritance processing on the first network individual to obtain a new network individual, and taking the new network individual and the first network individual as second network individuals;
and performing second-stage search training on the second network individuals according to the third search parameter setting and the sample image data set, and determining target network individuals from the second network individuals according to the search training results of the second stage.
3. The method of claim 2, wherein performing a first stage search training on at least two network individuals in the initial population according to the first search parameter setting and the sample image dataset, and determining a first network individual from the at least two network individuals according to a result of the first stage search training comprises:
performing a first-stage search training on at least two network individuals in the initial population according to a first search parameter setting and a sample image data set to obtain the accuracy and parameter quantity of the at least two network individuals;
sequencing the at least two network individuals according to the accuracy and the parameter quantity of the at least two network individuals;
and determining a first network individual from the at least two network individuals according to the sequencing result.
4. The method of claim 1, further comprising:
determining a small sample data set and a low fraction data set according to an original image data set;
and using the small sample data set and the low fraction data set as sample image data sets.
5. The method of claim 1, wherein determining the manner of operation between nodes in the network elements of the initial neural network model comprises:
determining candidate operation weight values among nodes in network units of the initial neural network model;
and determining the operation mode among the nodes in the network unit of the initial neural network model according to the candidate operation weight among the nodes in the network unit of the initial neural network model.
6. The method of claim 1, wherein determining the operational mode between nodes in the network elements of the initial neural network model further comprises:
and combining the network units according to the attributes of the network units of the initial neural network model to determine a target neural network model.
7. An apparatus for neural network architecture search, the apparatus comprising:
the target network individual determining module is used for carrying out search training on at least two network individuals in the initial population according to the first search parameter setting and the sample image data set, and determining a target network individual from the at least two network individuals according to a search training result; wherein each network entity comprises at least two network elements which are connected in a chain way, and each network element comprises at least two nodes;
the initial neural network model determining module is used for carrying out search training on the target network individuals according to second search parameter setting and a sample image data set to obtain an initial neural network model;
and the operation mode determining module is used for determining the operation mode among the nodes in the network unit of the initial neural network model so as to obtain the target neural network model.
8. The apparatus of claim 7, wherein the target network individual determination module comprises:
the first network individual determining unit is used for carrying out first-stage search training on at least two network individuals in the initial population according to the first search parameter setting and the sample image data set, and determining a first network individual from the at least two network individuals according to a search training result of the first stage;
the second network individual determining unit is used for carrying out cross and inheritance processing on the first network individual to obtain a new network individual and taking the new network individual and the first network individual as a second network individual;
and the target network individual determining unit is used for carrying out second-stage search training on the second network individuals according to the third search parameter setting and the sample image data set, and determining the target network individuals from the second network individuals according to the second-stage search training results.
9. An electronic device, comprising:
one or more processors;
a memory for storing one or more programs;
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the neural network architecture search method of any one of claims 1-6.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out a neural network architecture search method according to any one of claims 1 to 6.
CN202110191861.2A 2021-02-19 2021-02-19 Neural network architecture searching method and device, electronic equipment and storage medium Active CN112906865B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110191861.2A CN112906865B (en) 2021-02-19 2021-02-19 Neural network architecture searching method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110191861.2A CN112906865B (en) 2021-02-19 2021-02-19 Neural network architecture searching method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN112906865A true CN112906865A (en) 2021-06-04
CN112906865B CN112906865B (en) 2023-08-18

Family

ID=76123869

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110191861.2A Active CN112906865B (en) 2021-02-19 2021-02-19 Neural network architecture searching method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112906865B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113554097A (en) * 2021-07-26 2021-10-26 北京市商汤科技开发有限公司 Model quantization method and device, electronic equipment and storage medium
CN113627611A (en) * 2021-08-06 2021-11-09 苏州科韵激光科技有限公司 Model training method and device, electronic equipment and storage medium
CN113780518A (en) * 2021-08-10 2021-12-10 深圳大学 Network architecture optimization method, terminal device and computer-readable storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190188562A1 (en) * 2017-12-15 2019-06-20 International Business Machines Corporation Deep Neural Network Hardening Framework
CN111553480A (en) * 2020-07-10 2020-08-18 腾讯科技(深圳)有限公司 Neural network searching method and device, computer readable medium and electronic equipment
US20200302272A1 (en) * 2019-03-19 2020-09-24 Cisco Technology, Inc. Systems and methods for auto machine learning and neural architecture search
CN111814966A (en) * 2020-08-24 2020-10-23 国网浙江省电力有限公司 Neural network architecture searching method, neural network application method, device and storage medium
CN111967569A (en) * 2020-06-29 2020-11-20 北京百度网讯科技有限公司 Neural network structure generation method and device, storage medium and electronic equipment

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190188562A1 (en) * 2017-12-15 2019-06-20 International Business Machines Corporation Deep Neural Network Hardening Framework
US20200302272A1 (en) * 2019-03-19 2020-09-24 Cisco Technology, Inc. Systems and methods for auto machine learning and neural architecture search
CN111967569A (en) * 2020-06-29 2020-11-20 北京百度网讯科技有限公司 Neural network structure generation method and device, storage medium and electronic equipment
CN111553480A (en) * 2020-07-10 2020-08-18 腾讯科技(深圳)有限公司 Neural network searching method and device, computer readable medium and electronic equipment
CN111814966A (en) * 2020-08-24 2020-10-23 国网浙江省电力有限公司 Neural network architecture searching method, neural network application method, device and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
BIAN Weiwei; QIU Xuyang; SHEN Yan: "Target recognition method based on neural network architecture search", Journal of Air Force Engineering University (Natural Science Edition), no. 04, pages 92 - 96 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113554097A (en) * 2021-07-26 2021-10-26 北京市商汤科技开发有限公司 Model quantization method and device, electronic equipment and storage medium
CN113554097B (en) * 2021-07-26 2023-03-24 北京市商汤科技开发有限公司 Model quantization method and device, electronic equipment and storage medium
CN113627611A (en) * 2021-08-06 2021-11-09 苏州科韵激光科技有限公司 Model training method and device, electronic equipment and storage medium
CN113780518A (en) * 2021-08-10 2021-12-10 深圳大学 Network architecture optimization method, terminal device and computer-readable storage medium
CN113780518B (en) * 2021-08-10 2024-03-08 深圳大学 Network architecture optimization method, terminal equipment and computer readable storage medium

Also Published As

Publication number Publication date
CN112906865B (en) 2023-08-18

Similar Documents

Publication Publication Date Title
CN110991311B (en) Target detection method based on dense connection deep network
CN110163234B (en) Model training method and device and storage medium
CN112906865B (en) Neural network architecture searching method and device, electronic equipment and storage medium
CN108920720B (en) Large-scale image retrieval method based on depth hash and GPU acceleration
CN112613581B (en) Image recognition method, system, computer equipment and storage medium
US10467547B1 (en) Normalizing text attributes for machine learning models
WO2023138188A1 (en) Feature fusion model training method and apparatus, sample retrieval method and apparatus, and computer device
CN112749300B (en) Method, apparatus, device, storage medium and program product for video classification
WO2021218037A1 (en) Target detection method and apparatus, computer device and storage medium
CN115496955A (en) Image classification model training method, image classification method, apparatus and medium
CN113139651A (en) Training method and device of label proportion learning model based on self-supervision learning
CN114154557A (en) Cancer tissue classification method, apparatus, electronic device, and storage medium
WO2023124342A1 (en) Low-cost automatic neural architecture search method for image classification
CN111178196B (en) Cell classification method, device and equipment
CN115344805A (en) Material auditing method, computing equipment and storage medium
WO2021253938A1 (en) Neural network training method and apparatus, and video recognition method and apparatus
CN111709473B (en) Clustering method and device for object features
CN111783688B (en) Remote sensing image scene classification method based on convolutional neural network
CN111158918B (en) Supporting point parallel enumeration load balancing method, device, equipment and medium
CN110009091B (en) Optimization of learning network in equivalence class space
CN111738290A (en) Image detection method, model construction and training method, device, equipment and medium
CN116958809A (en) Remote sensing small sample target detection method for feature library migration
US10997497B2 (en) Calculation device for and calculation method of performing convolution
CN108830302B (en) Image classification method, training method, classification prediction method and related device
US20220343146A1 (en) Method and system for temporal graph neural network acceleration

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant