CN114626506A - Attention mechanism-based neural network unit structure searching method and system - Google Patents

Attention mechanism-based neural network unit structure searching method and system

Info

Publication number
CN114626506A
CN114626506A (application CN202210219650.XA)
Authority
CN
China
Prior art keywords
unit structure
attention
network
search
edge
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210219650.XA
Other languages
Chinese (zh)
Inventor
胡瑜
孙自浩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Computing Technology of CAS
Original Assignee
Institute of Computing Technology of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Computing Technology of CAS filed Critical Institute of Computing Technology of CAS
Priority to CN202210219650.XA
Publication of CN114626506A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/06: Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N 3/063: Physical realisation using electronic means
    • G06N 3/08: Learning methods
    • G06N 7/00: Computing arrangements based on specific mathematical models
    • G06N 7/01: Probabilistic graphical models, e.g. probabilistic networks


Abstract

The invention provides an attention-mechanism-based neural network unit structure searching method and system, comprising: constructing a macro-architecture super-network in a search space, wherein each layer's unit structure in the macro-architecture super-network is a directed acyclic graph, the nodes of the directed acyclic graph are connected by edges, and each edge represents a combination of a plurality of candidate operations in the search space; adding an attention module after the output feature maps of all candidate operations of each edge in the unit structure to obtain a network to be searched; and training the network to be searched with a labeled data set, gradually deleting the candidate operation with the smallest attention weight on each edge of the unit structure during training until a preset number of iterations is reached, and then removing all attention modules from the current network to be searched to obtain the neural network unit structure search result for the data set. The invention not only accounts for the interactions among operations but also retains every operation until the final step of the search.

Description

Attention mechanism-based neural network unit structure searching method and system
Technical Field
The invention relates to the technical fields of neural network architecture search and picture classification within automatic machine learning, and in particular to an attention-mechanism-based neural network unit structure searching method and device.
Background
Automatic machine learning (AutoML) refers to automating steps such as data preprocessing, feature selection and algorithm selection in machine learning, and neural network architecture design, hyperparameter optimization and model training in deep learning, so that desired results are obtained without manual intervention. Neural architecture search (NAS) belongs to the network-design category of automatic machine learning and refers to automatically searching for a neural network architecture: for different computer vision tasks such as classification, detection, segmentation and tracking, operations are combined according to a search strategy from a search space containing various operations (such as convolution, pooling and skip connections) to obtain a neural network architecture, whose performance on the corresponding task is then measured under a specified evaluation strategy.
Early neural network architecture search strategies, including reinforcement learning, evolutionary algorithms, random search and Bayesian optimization, generally required retraining each resulting network structure to evaluate its performance, so the whole search process was computationally intensive and time consuming. In recent years, differentiable search strategies have attracted extensive attention in academia and industry because they use weight sharing and gradient-descent optimization to significantly reduce search time. A particularly representative one is differentiable architecture search (DARTS), which searches for cell structures and then stacks the searched cells into a target network to verify performance.
However, the differentiable search strategy DARTS only considers the influence of the model's loss function on each operation weight in the search space; it does not consider the interactions among the operations. StacNAS identified the problem that, owing to multicollinearity among operations, similar operations split the vote, which biases the selection. StacNAS therefore first computes a correlation matrix of all operations in the original search space, groups the operations by correlation, and selects one operation from each group as its representative, the representatives forming a compact search space; in this compact space, StacNAS obtains each operation's weight on each edge in the same way as DARTS, keeps only the operation with the largest weight on each edge, replaces it with the operations of its group in the original search space, and continues searching; finally, the retained operations are determined by the operation weights on each edge. Since both StacNAS and DARTS delete lower-weight operations during the search, when the operation weights differ only slightly, a deleted operation can never be selected again, so only a suboptimal neural network architecture can be found.
Disclosure of Invention
To address the deficiencies of the prior art, the invention provides an attention-mechanism-based neural network unit structure searching method, comprising the following steps:
step 1, constructing a macro-architecture super-network in a search space, wherein each layer's unit structure in the macro-architecture super-network is a directed acyclic graph whose nodes comprise input nodes, intermediate nodes and output nodes; the input nodes receive the output feature maps of the preceding unit structures, each intermediate node aggregates the feature maps of all preceding nodes in the unit structure, the output node concatenates the feature maps of all intermediate nodes, the nodes of the directed acyclic graph are connected by edges, and each edge represents a combination of a plurality of candidate operations in the search space;
step 2, adding an attention module after the output feature maps of all candidate operations of each edge in the unit structure to obtain a network to be searched;
and step 3, training the network to be searched with a labeled data set, gradually deleting the candidate operation with the smallest attention weight on each edge of the unit structure of the intermediate search network during training, and, when training reaches a preset number of iterations, removing all attention modules from the current network to be searched to obtain the neural network unit structure search result for the data set.
In the neural network unit structure searching method, the data set comprises a plurality of samples, each sample having a corresponding label; the samples are pictures and the labels are picture categories; the search space is the DARTS search space.
The attention-mechanism-based neural network unit structure searching method further comprises:
step 4, training the neural network unit structure search result with the data set to obtain a picture classification model, and inputting a picture to be classified into the model to obtain the picture category of the picture to be classified.
In the attention-mechanism-based neural network unit structure searching method, each edge in a unit structure of the macro-architecture super-network consists of a plurality of candidate operations; an edge with m candidate operations produces m output feature maps $F_1, \dots, F_m$, each of size $\mathbb{R}^{b \times c \times h \times w}$. The m feature maps are concatenated along the channel dimension to obtain the concatenated feature $F_{con} \in \mathbb{R}^{b \times mc \times h \times w}$, which is input into the attention module to calculate the attention weight of each candidate operation; the attention module consists of a global average pooling layer, fully connected layers and a Sigmoid layer.
Step 3 comprises calculating the attention weight of each candidate operation on each edge in the unit structure of the network to be searched: the concatenated feature $F_{con}$ of all candidate operations on each edge is reduced by global average pooling to the pooled feature $F_{gap} \in \mathbb{R}^{b \times mc \times 1 \times 1}$; the feature $F_{gap}$ then passes through two fully connected layers and a Sigmoid layer, whose output is the attention weight $A \in \mathbb{R}^{b \times mc \times 1 \times 1}$.
The invention also provides a neural network unit structure searching system based on the attention mechanism, which comprises the following components:
an initialization module, for constructing a macro-architecture super-network in a search space, wherein each layer's unit structure in the macro-architecture super-network is a directed acyclic graph whose nodes comprise input nodes, intermediate nodes and output nodes; the input nodes receive the output feature maps of the preceding unit structures, each intermediate node aggregates the feature maps of all preceding nodes in the unit structure, the output node concatenates the feature maps of all intermediate nodes, the nodes of the directed acyclic graph are connected by edges, and each edge represents a combination of a plurality of candidate operations in the search space;
an adding module, for adding an attention module after the output feature maps of all candidate operations of each edge in the unit structure to obtain a network to be searched;
and a searching module, for training the network to be searched with a labeled data set, gradually deleting the candidate operation with the smallest attention weight on each edge of the unit structure of the intermediate search network during training until a preset number of iterations is reached, and then removing all attention modules from the current network to be searched to obtain the neural network unit structure search result for the data set.
In the neural network unit structure searching system, the data set comprises a plurality of samples, each sample having a corresponding label; the samples are pictures and the labels are picture categories; the search space is the DARTS search space.
The attention-mechanism-based neural network unit structure search system further comprises:
a picture classification module, for training the neural network unit structure search result with the data set to obtain a picture classification model, and inputting a picture to be classified into the model to obtain the picture category of the picture to be classified.
In the attention-mechanism-based neural network unit structure search system, each edge in a unit structure of the macro-architecture super-network consists of a plurality of candidate operations; an edge with m candidate operations produces m output feature maps $F_1, \dots, F_m$, each of size $\mathbb{R}^{b \times c \times h \times w}$. The m feature maps are concatenated along the channel dimension to obtain the concatenated feature $F_{con} \in \mathbb{R}^{b \times mc \times h \times w}$, which is input into the attention module to calculate the attention weight of each candidate operation; the attention module consists of a global average pooling layer, fully connected layers and a Sigmoid layer.
The searching module is used for calculating the attention weight of each candidate operation on each edge in the unit structure of the network to be searched: the concatenated feature $F_{con}$ of all candidate operations on each edge is reduced by global average pooling to the pooled feature $F_{gap} \in \mathbb{R}^{b \times mc \times 1 \times 1}$; the feature $F_{gap}$ then passes through two fully connected layers and a Sigmoid layer, whose output is the attention weight $A \in \mathbb{R}^{b \times mc \times 1 \times 1}$.
The invention also provides a storage medium storing a program for executing any of the above attention-mechanism-based neural network unit structure searching methods.
The invention also provides a client for use in any of the above attention-mechanism-based neural network unit structure search systems.
According to the above scheme, the advantages of the invention are as follows:
The invention provides an attention-based neural network unit (cell) structure searching method and system (ANCS for short), which, on the basis of a differentiable search strategy, uses an attention mechanism to evaluate the importance of each operation, adds a regularization term to the loss function to sparsify the operation weights, and finally selects which operations to retain, and how many, according to constraints on computation, storage and inference time. Compared with the prior art, the method both considers the interactions among operations and retains every operation until the final step of the search, thereby obtaining a neural network architecture with excellent performance.
Drawings
FIG. 1 is a schematic diagram of a neural network unit structure searching method based on an attention mechanism according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a super-network in accordance with an embodiment of the present invention;
FIG. 3 is a schematic view of an attention module of an embodiment of the present invention;
FIG. 4 is a schematic diagram of an apparatus for searching a neural network unit structure based on an attention mechanism according to an embodiment of the present invention.
Detailed Description
The invention aims to provide an attention-mechanism-based neural network unit structure searching method and device. The invention mainly addresses cell structure search in the DARTS search space and adopts an attention mechanism to measure the importance of each candidate operation in a cell.
In a first aspect, the present invention provides a neural network unit structure search method based on an attention mechanism, specifically including the following steps:
step 1, designing a directed acyclic graph.
The directed acyclic graph is composed of N (e.g., N = 3) intermediate nodes and E (e.g., E = 9) edges. Each node represents a feature map and each edge represents a combination of several candidate operations in the search space; N and E are hyperparameters, and the larger they are, the larger the search space and the harder it is to find a good structure. The graph has two input nodes, each receiving the output feature map of one of the two preceding unit structures; each intermediate node aggregates the feature maps of all preceding nodes in the unit structure; and the feature map of the output node is defined as the concatenation of the feature maps of all intermediate nodes.
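To make the structure concrete, the following is a minimal PyTorch sketch of such a cell, assuming an illustrative candidate-operation set and a `combine` callback that fuses the m candidate maps of an edge (the attention-based fusion described below); the names `MixedEdge`, `Cell` and `CANDIDATE_OPS` are our own, not from the patent:

```python
import torch
import torch.nn as nn

CANDIDATE_OPS = {  # a few representative candidate operations (illustrative)
    "skip":     lambda c: nn.Identity(),
    "conv3x3":  lambda c: nn.Conv2d(c, c, 3, padding=1, bias=False),
    "avg_pool": lambda c: nn.AvgPool2d(3, stride=1, padding=1),
}

class MixedEdge(nn.Module):
    """One edge of the DAG: all m candidate operations applied in parallel."""
    def __init__(self, channels):
        super().__init__()
        self.ops = nn.ModuleList(make(channels) for make in CANDIDATE_OPS.values())

    def forward(self, x):
        return [op(x) for op in self.ops]  # one feature map per candidate op

class Cell(nn.Module):
    """DAG with 2 input nodes, N intermediate nodes and one output node."""
    def __init__(self, channels, n_intermediate=3):
        super().__init__()
        self.n = n_intermediate
        # Intermediate node j has an incoming edge from both inputs and from
        # every earlier intermediate node: 2 + 3 + 4 = 9 edges for N = 3.
        self.edges = nn.ModuleList(
            MixedEdge(channels) for j in range(self.n) for _ in range(2 + j))

    def forward(self, s0, s1, combine):
        # `combine` fuses the m candidate maps of one edge into a single map
        # (the attention-based fusion of the following sections).
        states, k = [s0, s1], 0
        for _ in range(self.n):
            node = sum(combine(self.edges[k + i](h)) for i, h in enumerate(states))
            k += len(states)
            states.append(node)
        return torch.cat(states[2:], dim=1)  # output node: concat intermediates
```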
Step 2, adding an attention module to each edge of the directed acyclic graph.
In the well-defined directed acyclic graph, each edge is composed of a plurality of candidate operations. Assuming the edge has m candidate operations, each candidate operation receives the edge's input node, yielding m feature maps $F_1, \dots, F_m$, each of size $\mathbb{R}^{b \times c \times h \times w}$, where h and w are the height and width of the feature map, c is the number of channels, and b is the batch size. The m feature maps are then concatenated along the channel dimension to obtain the concatenated feature $F_{con} \in \mathbb{R}^{b \times mc \times h \times w}$, which is input into the proposed attention module to calculate the attention weight of each candidate operation. The attention module consists of a global average pooling layer, fully connected layers and a Sigmoid layer.
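A minimal sketch of this attention module in PyTorch follows; the reduction ratio of the two fully connected layers is an assumption (the text does not specify the hidden width):

```python
import torch.nn as nn

class OpAttention(nn.Module):
    """Global average pooling -> two FC layers -> Sigmoid, per concatenated channel."""
    def __init__(self, m_ops, channels, reduction=4):
        super().__init__()
        mc = m_ops * channels                    # channels of F_con
        self.pool = nn.AdaptiveAvgPool2d(1)      # global average pooling
        self.fc = nn.Sequential(
            nn.Linear(mc, mc // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(mc // reduction, mc),
            nn.Sigmoid(),                        # attention weight A in (0, 1)
        )

    def forward(self, f_con):                    # f_con: (b, m*c, h, w)
        b, mc, _, _ = f_con.shape
        a = self.pool(f_con).view(b, mc)         # F_gap: (b, m*c)
        return self.fc(a).view(b, mc, 1, 1)      # A: (b, m*c, 1, 1)
```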
Step 3, calculating the importance of each candidate operation on each edge.
First, the concatenated feature $F_{con}$ of all candidate operations on each edge is reduced by global average pooling to the pooled feature $F_{gap} \in \mathbb{R}^{b \times mc \times 1 \times 1}$. The feature $F_{gap}$ then passes through two fully connected layers and a Sigmoid layer; the Sigmoid output $A \in \mathbb{R}^{b \times mc \times 1 \times 1}$ is the attention weight and can be viewed as the importance of each channel. The importance of each candidate operation is therefore represented by the sum of the activation values of its corresponding channels. To allow the weights of the attention module to be updated, the activation value of each channel is multiplied by the original concatenated feature, giving the attention-weighted feature map $F_{att} = A \odot F_{con}$. To keep the edge output at the same dimensions as the original input feature map, a point-wise addition is performed over the m attention-weighted maps.
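Continuing the sketch above, one possible reading of this step is as follows; `attn` is the `OpAttention` sketch, and the point-wise addition over the m weighted maps is our interpretation of the dimension-preserving step:

```python
import torch

def edge_forward(op_outputs, attn):
    """Fuse the m candidate maps of one edge using the attention weights."""
    f_con = torch.cat(op_outputs, dim=1)            # F_con: (b, m*c, h, w)
    a = attn(f_con)                                 # A: (b, m*c, 1, 1)
    f_att = a * f_con                               # attention-weighted F_att
    b, mc, h, w = f_att.shape
    m = len(op_outputs)
    c = mc // m
    # Importance of candidate op i = sum of activations over its c channels.
    importance = a.view(b, m, c).sum(dim=(0, 2))    # shape (m,)
    # Point-wise addition over the m weighted maps restores (b, c, h, w).
    fused = f_att.view(b, m, c, h, w).sum(dim=1)
    return fused, importance
```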
Step 4, training and updating the weights of the candidate operations in the directed acyclic graph and the weights of the attention modules, based on the selected task.
Depending on the target task, which may be object classification, object detection, semantic segmentation, instance segmentation, object tracking, etc., the directed acyclic graph is trained on the received training data set using conventional machine-learning training techniques appropriate to the task (e.g., stochastic gradient descent with backpropagation), and the weights of the different operations and the attention weights on each edge are updated according to the backpropagated gradients.
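A hedged sketch of this training step, assuming a standard SGD setup (the optimizer, learning rate and epoch count are placeholders, not values from the patent):

```python
import torch
import torch.nn as nn

def train_supernet(model, train_loader, epochs=50, lr=0.025):
    """One plain SGD loop; backprop updates operation and attention weights jointly."""
    opt = torch.optim.SGD(model.parameters(), lr=lr,
                          momentum=0.9, weight_decay=3e-4)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in train_loader:        # labeled batches of the selected task
            opt.zero_grad()
            loss = loss_fn(model(x), y)  # task loss on the super-network
            loss.backward()              # gradients flow into attention modules
            opt.step()
```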
Step 5, evaluating the performance of the directed acyclic graph with the validation data set of the selected task.
When the network trained in step 4 has converged on the training set of the selected task, the performance of the directed acyclic graph is evaluated on the task's validation set, the attention weights of all candidate operations on each edge of the unit structure are obtained, and the operations with the lowest attention weights in the directed acyclic graph are deleted first.
Step 6, repeating step 5: continuing training until convergence and then evaluating again with the validation set.
Deleting the operations with low attention weights on each edge of the unit structure in step 5 reduces the accuracy of the super-network, so the training set must be used again to train to convergence. The validation set is then used for another evaluation, the attention weights of all candidate operations on each edge of the unit structure in the current state are obtained, and the operations with low attention weights in the directed acyclic graph are deleted again.
Step 7, outputting the neural network corresponding to the directed acyclic graph.
When only the single operation with the largest attention weight remains on each edge of the unit structure, the optimal target neural network for the selected task is obtained. Finally, the network is retrained on the full data set and its final performance is verified.
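A small sketch of reading off the final cell, under the assumption that per-edge importance vectors (as in the `edge_forward` sketch) are available; the dictionary layout is illustrative:

```python
def derive_cell(edge_importance):
    """Keep, for every edge, only the candidate with the largest attention weight."""
    # edge_importance: {edge_id: importance tensor of shape (m,)}
    return {edge: int(scores.argmax()) for edge, scores in edge_importance.items()}
```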
In a second aspect, the invention provides an attention mechanism-based neural network unit structure searching device, which specifically comprises the following modules:
A. A unit structure construction module: constructs the directed acyclic graph structure of the unit structure, composed of N intermediate nodes and E edges.
B. An attention module: added to each edge of the unit structure to extract the attention weights of all candidate operations on that edge, i.e., the importance of each candidate operation.
C. A unit structure searching and optimizing module: feeds the training set into the unit structure for forward propagation, and optimizes the weight parameters of the different candidate operations in the unit structure and the weight parameters of the attention modules through backpropagation.
D. A unit structure evaluation and update module: evaluates the importance of each candidate operation in the unit structure, deletes the operations with low attention weights according to each operation's weight, and then updates the topology of the unit structure.
E. A unit structure acquisition module: obtains the optimal unit structure model according to the importance of each candidate operation on each edge of the unit structure.
F. A target network training and verification module: retrains the obtained optimal unit structure model on the full training set to optimize its weight parameters, and verifies the performance of the searched unit structure model on the test set.
In order to make the aforementioned features and effects of the present invention more comprehensible, embodiments accompanied with figures are described in detail below.
Example 1
The invention provides a neural network unit structure searching method based on an attention mechanism, which comprises the following steps:
s11: the unit structure is designed to have an acyclic graph.
In this step, the invention performs neural network cell structure search, i.e., it searches for and determines the optimal candidate operation on each edge of the cell structure. Taking the DARTS search space as an example, each edge in the space contains a plurality of candidate operations, and the purpose of cell structure search is to select the optimal operations from them to form the final target structure.
S12: an attention module is added to each edge of the directed acyclic graph.
In this step, the super-network is formed by stacking L layers of cells, as shown in FIG. 2, where $c_{k-2}$ and $c_{k-1}$ are input nodes whose input data are the outputs of the two preceding cells, 0, 1 and 2 are intermediate nodes, and $c_k$ is the output node. Cells are of two types, normal cells and down-sampling (reduction) cells; the down-sampling cells are located at the $\lfloor L/3 \rfloor$-th and $\lfloor 2L/3 \rfloor$-th layers of the network, and normal cells occupy the other layers. Every cell contains a directed acyclic graph with N nodes and E edges, and each edge carries m candidate operations, such as the zero operation, convolution operations and pooling operations. Meanwhile, the invention adds an attention module after the output feature maps of all candidate operations of each edge, as shown in FIG. 3. The attention weights of the attention module can be interpreted as the importance of each candidate operation.
The normal unit structure does not change the resolution of the feature map; the down-sampling unit halves the feature-map resolution and doubles the channels. There are generally two down-sampling units in the search space, and they may be placed at different positions in the super-network as required; the invention does not limit this.
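A minimal sketch of this stacking rule, assuming the DARTS-style placement reconstructed above:

```python
def cell_types(num_layers):
    """Normal cells everywhere except reduction cells at depths L/3 and 2L/3."""
    reductions = {num_layers // 3, 2 * num_layers // 3}
    return ["reduction" if i in reductions else "normal"
            for i in range(num_layers)]

# cell_types(8) -> ['normal', 'normal', 'reduction', 'normal',
#                   'normal', 'reduction', 'normal', 'normal']
```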
S13: feed the training-set pictures of the selected task into the directed acyclic graph, compute the gradients, and use an optimizer to update the weights of the unit structure and of the attention modules along the gradient direction.
In this step, the super-network containing the attention modules is trained on the training set; the weights of the super-network are updated during training, and the weights of the attention modules are updated along with them, so that the importance of each candidate operation in the super-network is learned.
S14: delete the operations with low attention weights according to each operation's weight, then update the cell-structure topology.
In this step, as training of the super-network proceeds, the operations with smaller attention weights on each edge of the unit structure are deleted step by step, and the topology of the unit structure is then updated. The gradual deletion may be performed at every iteration, or after every preset number of iterations.
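A sketch of this progressive deletion, assuming each edge object exposes its remaining candidate list and current importance vector (names are illustrative):

```python
def prune_step(edges):
    """Drop the lowest-attention candidate on every edge that still has > 1.

    Intended to be called periodically during super-network training.
    """
    for edge in edges:
        if len(edge.ops) > 1:
            weakest = int(edge.importance.argmin())
            del edge.ops[weakest]   # nn.ModuleList supports item deletion
```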
S15: check whether the algorithm has reached the specified number of iterations.
This step determines whether training of the super-network containing the attention modules has reached the specified number of iterations, i.e., whether more than one candidate operation remains on any edge of the unit structure. If not, training continues from step S13; once the specified number of iterations is reached, i.e., only one operation remains on each edge, the method proceeds to the next step.
S16: obtain the target optimal unit structure.
In this step, candidate operations with low attention weights are deleted step by step through steps S13 to S15 until only the operation with the largest attention weight remains on each edge of the final cell structure, yielding the optimal target cell structure for the task.
S17: retrain the target structure on the entire data set and verify its performance.
In this step, the target structure obtained in step S16 is retrained on the entire training set until convergence, and its performance index is then tested on the test set.
Example 2
An embodiment of the present invention further provides an apparatus for searching a neural network unit structure based on an attention mechanism. As shown in FIG. 4, the apparatus includes: a unit structure construction module 21, an attention module 22, a unit structure searching and optimizing module 23, a unit structure evaluation and update module 24, a unit structure acquisition module 25, and a target network training and verification module 26.
The unit structure construction module 21 is configured to construct the directed acyclic graph structure of the unit structure, composed of N intermediate nodes and E edges. The attention module 22, which consists of a global average pooling layer, fully connected layers and a Sigmoid layer, is added to each edge of the unit structure to extract the attention weights of all candidate operations on that edge, i.e., the importance of each candidate operation. The unit structure searching and optimizing module 23 feeds the training set into the unit structure for forward propagation and optimizes the weight parameters of the different candidate operations in the unit structure and the weight parameters of the attention modules through backpropagation. The unit structure evaluation and update module 24 evaluates the importance of each candidate operation in the unit structure, deletes the operations with low attention weights according to each operation's weight, and then updates the topology of the unit structure. The unit structure acquisition module 25 obtains the optimal unit structure model according to the importance of each candidate operation on each edge of the unit structure. The target network training and verification module 26 retrains the obtained optimal unit structure model on the full training set to optimize its weight parameters, and verifies the performance of the searched unit structure model on the test set.
In the attention-mechanism-based neural network unit structure searching device provided by this embodiment of the invention, the working process of each module has the same technical features as the attention-mechanism-based neural network unit structure searching method; the same functions can therefore be achieved in the same way and are not described again.
The following are system embodiments corresponding to the above method embodiments, and this embodiment can be implemented in cooperation with the above embodiments. The related technical details mentioned in the above embodiments remain valid in this embodiment and are not repeated here; correspondingly, the related technical details mentioned in this embodiment can also be applied to the above embodiments.
The invention also provides a neural network unit structure searching system based on the attention mechanism, which comprises the following components:
an initialization module, for constructing a macro-architecture super-network in a search space, wherein each layer's unit structure in the macro-architecture super-network is a directed acyclic graph whose nodes comprise input nodes, intermediate nodes and output nodes; the input nodes receive the output feature maps of the preceding unit structures, each intermediate node aggregates the feature maps of all preceding nodes in the unit structure, the output node concatenates the feature maps of all intermediate nodes, the nodes of the directed acyclic graph are connected by edges, and each edge represents a combination of a plurality of candidate operations in the search space;
an adding module, for adding an attention module after the output feature maps of all candidate operations of each edge in the unit structure to obtain a network to be searched;
and a searching module, for training the network to be searched with a labeled data set, gradually deleting the candidate operation with the smallest attention weight on each edge of the unit structure of the intermediate search network during training until a preset number of iterations is reached, and then removing all attention modules from the current network to be searched to obtain the neural network unit structure search result for the data set.
In the neural network unit structure searching system, the data set comprises a plurality of samples, each sample having a corresponding label; the samples are pictures and the labels are picture categories; the search space is the DARTS search space.
The attention-mechanism-based neural network unit structure search system further comprises:
a picture classification module, for training the neural network unit structure search result with the data set to obtain a picture classification model, and inputting a picture to be classified into the model to obtain the picture category of the picture to be classified.
In the attention-mechanism-based neural network unit structure search system, each edge in a unit structure of the macro-architecture super-network consists of a plurality of candidate operations; an edge with m candidate operations produces m output feature maps $F_1, \dots, F_m$, each of size $\mathbb{R}^{b \times c \times h \times w}$. The m feature maps are concatenated along the channel dimension to obtain the concatenated feature $F_{con} \in \mathbb{R}^{b \times mc \times h \times w}$, which is input into the attention module to calculate the attention weight of each candidate operation; the attention module consists of a global average pooling layer, fully connected layers and a Sigmoid layer.
The searching module is configured to calculate the attention weight of each candidate operation on each edge in the unit structure of the network to be searched: the concatenated feature $F_{con}$ of all candidate operations on each edge is reduced by global average pooling to the pooled feature $F_{gap} \in \mathbb{R}^{b \times mc \times 1 \times 1}$; the feature $F_{gap}$ then passes through two fully connected layers and a Sigmoid layer, whose output is the attention weight $A \in \mathbb{R}^{b \times mc \times 1 \times 1}$.
The invention also provides a storage medium storing a program for executing any of the above attention-mechanism-based neural network unit structure searching methods.
The invention also provides a client for use in any of the above attention-mechanism-based neural network unit structure search systems.

Claims (10)

1. A neural network unit structure searching method based on an attention mechanism is characterized by comprising the following steps:
step 1, constructing a macro-architecture super-network in a search space, wherein each layer's unit structure in the macro-architecture super-network is a directed acyclic graph whose nodes comprise input nodes, intermediate nodes and output nodes; the input nodes receive the output feature maps of the preceding unit structures, each intermediate node aggregates the feature maps of all preceding nodes in the unit structure, the output node concatenates the feature maps of all intermediate nodes, the nodes of the directed acyclic graph are connected by edges, and each edge represents a combination of a plurality of candidate operations in the search space;
step 2, adding an attention module after the output feature maps of all candidate operations of each edge in the unit structure to obtain a network to be searched;
and step 3, training the network to be searched with a labeled data set, gradually deleting the candidate operation with the smallest attention weight on each edge of the unit structure of the intermediate search network during training until a preset number of iterations is reached, and then removing all attention modules from the current network to be searched to obtain the neural network unit structure search result for the data set.
2. The method of claim 1, wherein the data set comprises a plurality of samples, each sample having a corresponding label, the samples being pictures and the labels being picture categories; the search space is a DARTS search space.
3. The attention-mechanism-based neural network unit structure searching method of claim 2, further comprising:
step 4, training the neural network unit structure search result with the data set to obtain a picture classification model, and inputting a picture to be classified into the model to obtain the picture category of the picture to be classified.
4. The method according to claim 1 or 2, wherein each edge in the unit structure of the macro-architecture super-network consists of a plurality of candidate operations; an edge with m candidate operations produces m output feature maps $F_1, \dots, F_m$, each of size $\mathbb{R}^{b \times c \times h \times w}$; the m feature maps are concatenated along the channel dimension to obtain the concatenated feature $F_{con} \in \mathbb{R}^{b \times mc \times h \times w}$, which is input into the attention module to calculate the attention weight of each candidate operation, the attention module consisting of a global average pooling layer, fully connected layers and a Sigmoid layer;
step 3 comprises calculating the attention weight of each candidate operation on each edge in the unit structure of the network to be searched: the concatenated feature $F_{con}$ of all candidate operations on each edge is reduced by global average pooling to the pooled feature $F_{gap} \in \mathbb{R}^{b \times mc \times 1 \times 1}$; the feature $F_{gap}$ then passes through two fully connected layers and a Sigmoid layer, whose output is the attention weight $A \in \mathbb{R}^{b \times mc \times 1 \times 1}$.
5. An attention-mechanism-based neural network unit structure search system, characterized by comprising:
an initialization module, for constructing a macro-architecture super-network in a search space, wherein each layer's unit structure in the macro-architecture super-network is a directed acyclic graph whose nodes comprise input nodes, intermediate nodes and output nodes; the input nodes receive the output feature maps of the preceding unit structures, each intermediate node aggregates the feature maps of all preceding nodes in the unit structure, the output node concatenates the feature maps of all intermediate nodes, the nodes of the directed acyclic graph are connected by edges, and each edge represents a combination of a plurality of candidate operations in the search space;
an adding module, for adding an attention module after the output feature maps of all candidate operations of each edge in the unit structure to obtain a network to be searched;
and a searching module, for training the network to be searched with a labeled data set, gradually deleting the candidate operation with the smallest attention weight on each edge of the unit structure of the intermediate search network during training until a preset number of iterations is reached, and then removing all attention modules from the current network to be searched to obtain the neural network unit structure search result for the data set.
6. The neural network unit structure searching system of claim 5, wherein the data set comprises a plurality of samples, each sample having a corresponding label, the samples being pictures, the labels being picture categories; the search space is a DARTS search space.
7. The attention-mechanism-based neural network unit structure search system of claim 6, further comprising:
a picture classification module, for training the neural network unit structure search result with the data set to obtain a picture classification model, and inputting a picture to be classified into the model to obtain the picture category of the picture to be classified.
8. The attention-mechanism-based neural network unit structure search system of claim 5 or 6, wherein each edge in the unit structure of the macro-architecture super-network consists of a plurality of candidate operations; an edge with m candidate operations produces m output feature maps $F_1, \dots, F_m$, each of size $\mathbb{R}^{b \times c \times h \times w}$; the m feature maps are concatenated along the channel dimension to obtain the concatenated feature $F_{con} \in \mathbb{R}^{b \times mc \times h \times w}$, which is input into the attention module to calculate the attention weight of each candidate operation, the attention module consisting of a global average pooling layer, fully connected layers and a Sigmoid layer;
the searching module is used for calculating the attention weight of each candidate operation on each edge in the unit structure of the network to be searched: the concatenated feature $F_{con}$ of all candidate operations on each edge is reduced by global average pooling to the pooled feature $F_{gap} \in \mathbb{R}^{b \times mc \times 1 \times 1}$; the feature $F_{gap}$ then passes through two fully connected layers and a Sigmoid layer, whose output is the attention weight $A \in \mathbb{R}^{b \times mc \times 1 \times 1}$.
9. A storage medium storing a program for executing the attention-mechanism-based neural network unit structure searching method of any one of claims 1 to 4.
10. A client for use in the attention-mechanism-based neural network unit structure search system of any one of claims 5 to 8.
CN202210219650.XA (filed 2022-03-08, priority 2022-03-08): Attention mechanism-based neural network unit structure searching method and system. Status: Pending. Published as CN114626506A (en).

Priority Applications (1)

CN202210219650.XA, published as CN114626506A (en): Attention mechanism-based neural network unit structure searching method and system

Publications (1)

CN114626506A (en), published 2022-06-14

Family

ID=81899583

Family Applications (1)

CN202210219650.XA (priority/filing date 2022-03-08), published as CN114626506A (en): Attention mechanism-based neural network unit structure searching method and system

Country Status (1)

CN: CN114626506A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
CN116736713A * (priority 2023-06-13, published 2023-09-12, assignee 天津国能津能滨海热电有限公司): Power plant combustion control system and method based on NARX prediction model



Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination