Background
Convolutional Neural Networks (CNNs) are a class of deep learning algorithms built on a deep neural network structure and convolutional computation. Most image recognition methods today use CNNs to extract image features automatically, enabling a transition from an empirically driven, hand-crafted feature paradigm to a data-driven representation learning paradigm. However, CNNs also have the following disadvantages: (1) a large number of samples is needed for training, and training is slow; (2) the convolution kernel extracts only local features, the features finally used for recognition are aggregations of these local features, and global spatial features are lacking to some extent.
Much data can naturally be viewed as graphs, and representing data as graphs provides great flexibility and distinct perspectives on a problem. Data relations are modeled by graph topology: graph nodes represent data entities, graph edges represent the relations between entities, and a graph formed by a set of nodes and edges can describe the data completely and clearly. However, traditional graph analysis methods struggle to reach the application level and model performance achieved in computer vision, and existing machine learning algorithms cannot be applied directly to graph data. Graph Neural Networks (GNNs) use the strong learning capability of neural networks to learn and memorize the topological knowledge in a graph, thereby extracting the rich latent value in the graph structure.
Garcia et al. proposed using the GNN method for few-shot image recognition (see: Garcia V, Bruna J. Few-Shot Learning with Graph Neural Networks [C] // ICLR, 2018, https://arxiv.org/pdf/1711.04043.pdf), but the GNN input is the depth features of images of different classes, with one graph node per image. Kim et al. proposed an edge-labeling GNN structure that focuses on intra-class similarity and inter-class dissimilarity (see: Kim J, Kim T, Kim S, et al. Edge-Labeling Graph Neural Network for Few-shot Learning [C] // CVPR, 2019, http://openaccess.thecvf.com/content_CVPR_2019/papers/Kim_Edge-Labeling_Graph_Neural_Network_for_Few-Shot_Learning_CVPR_2019_paper.pdf), but it still obtains feature vectors through a convolutional neural network to initialize the graph nodes, and does not consider the relationship between depth features of different scales.
Because graph data structures differ, no existing method can process all graph data, so designing specific deep learning models to handle these different types of graphs is an important task. Moreover, the topological spatial relationships among multi-channel CNN convolution features are not yet understood, nor is it known how to mine the value behind the graph topology so as to achieve fast training and convergence.
Disclosure of Invention
The invention aims to provide a deep learning network structure and a training method based on a group convolution feature topological space, comprising the steps of: extracting multi-channel CNN convolution features; grouping them by channel index to form group convolutions; constructing a graph topology space in which each group convolution feature is viewed as a graph topology space node; automatically or manually constructing the node connection rule of the graph topology space and generating a Laplacian matrix L; feeding the Laplacian matrix L into a GNN hidden layer network and outputting group convolution feature topological space graph features; and finally completing feature recognition based on the graph features. During training, transfer learning is used selectively in the convolution feature extraction layer to improve model performance. The invention endows the CNN features of different channels with graph topological space rules, thereby accelerating conventional CNN training and convergence and providing a new idea and method for the fusion of CNNs and GNNs.
In order to achieve the above purpose, with reference to fig. 1, the present invention provides a deep learning network based on a group convolution feature topology space, where the deep learning network includes a convolution feature extraction layer, a group convolution topology layer, and a deep feature identification layer, which are connected in sequence;
the convolution feature extraction layer comprises a first convolution layer, a first pooling layer, a second convolution layer and a second pooling layer, which are sequentially connected, and is used for extracting the multi-channel CNN convolution features of sample data, the extraction result serving as the input of the group convolution topology layer;
the group convolution topology layer comprises a group convolution feature layer, a graph network input layer, a graph neural network hidden layer and a graph network output layer, which are sequentially connected, and is used for combining the extracted multi-channel CNN convolution features, grouping them by channel index to form group convolutions, constructing a graph topology space, viewing each group convolution feature as a graph topology space node, automatically or manually constructing the graph topology space node connection rule, generating a Laplacian matrix L, and taking the Laplacian matrix L as the input of the depth feature identification layer;
the depth feature identification layer comprises a leveling layer, a full connection output layer and a Softmax layer which are sequentially connected, and is used for outputting the group convolution feature topological space diagram features corresponding to the sample data according to the input Laplace matrix L.
With reference to fig. 2, the present invention further provides a training method for a deep learning network based on a group convolution feature topological space, where the deep learning network employs the network of claim 1;
the training method comprises the following steps:
s1, extracting multi-channel CNN convolution features from the input sample data;
s2, combining the extracted multi-channel CNN convolution features, grouping them by channel index to form group convolutions, constructing a graph topology space, viewing each group convolution feature as a graph topology space node, automatically or manually constructing the graph topology space node connection rule, and generating a Laplacian matrix L;
and S3, feeding the generated Laplacian matrix L into the GNN hidden layer network and outputting the group convolution feature topological space graph features.
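As a minimal sketch of the channel grouping in step S2, the multi-channel feature map can be partitioned by channel index so that each group of channels becomes one graph node. All shapes and variable names below are illustrative assumptions, not the patent's exact layout:

```python
import numpy as np

# Hypothetical sketch of step S2: partition C feature channels into G groups
# by channel index, so each group becomes one node of the graph topology space.
C, H, W_, G = 8, 4, 4, 4          # channels, height, width, number of groups
features = np.random.rand(C, H, W_)

# Group channels by index: group g holds channels [g*C//G, (g+1)*C//G).
groups = features.reshape(G, C // G, H, W_)

# Each node's initial feature vector: the flattened group-convolution features.
node_features = groups.reshape(G, -1)   # shape (G, C//G * H * W)
print(node_features.shape)              # (4, 32)
```

Each row of `node_features` then serves as the initial feature of one graph topology space node.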
As a preferred example, in step S1, the convolution feature extraction layer is selectively processed by a transfer learning algorithm according to the data characteristics of the sample data.
As a preferred example, selectively processing the convolution feature extraction layer with the transfer learning algorithm according to the data characteristics of the sample data means:
analyzing the data volume and the similarity of the sample data:
1) when the data volume of the sample data is smaller than the data volume threshold and the similarity is greater than or equal to the similarity threshold, all parameters of the convolution feature extraction layer are frozen, and only the group convolution topology layer and the depth feature identification layer participate in updating;
2) when the data volume of the sample data is smaller than the data volume threshold and the similarity is smaller than the similarity threshold, the convolution feature extraction layer adopts a fine-tune strategy;
3) when the data volume of the sample data is greater than or equal to the data volume threshold and the similarity is smaller than the similarity threshold, the whole network is retrained;
4) when the data volume of the sample data is greater than or equal to the data volume threshold and the similarity is greater than or equal to the similarity threshold, the convolution feature extraction layer adopts pre-trained parameters for training.
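The four-case decision above can be expressed as a small selection function. This is an illustrative helper under assumed names, not code from the patent:

```python
def transfer_strategy(data_volume, similarity, vol_thresh, sim_thresh):
    """Select the transfer-learning treatment of the convolution feature
    extraction layer, following the four cases described above.
    (Illustrative sketch; names and signature are assumptions.)"""
    if data_volume < vol_thresh and similarity >= sim_thresh:
        return "freeze"        # 1) freeze the extraction layer entirely
    if data_volume < vol_thresh and similarity < sim_thresh:
        return "fine-tune"     # 2) fine-tune strategy
    if data_volume >= vol_thresh and similarity < sim_thresh:
        return "retrain"       # 3) retrain the whole network
    return "pretrained"        # 4) reuse pre-trained extraction-layer parameters
```

For example, `transfer_strategy(10, 0.9, 100, 0.5)` returns `"freeze"`: a small but similar dataset only updates the group convolution topology layer and the identification layer.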
As a preferred example, in step S2, the process of automatically/manually constructing the graph topology space node connection rule includes the following steps:
s21, generating an adjacency matrix A automatically (randomly) or manually;
s22, calculating the Laplacian matrix L according to the following formula:
L = D^(-1/2) A D^(-1/2)
where D is the degree matrix of A.
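A minimal numpy sketch of this normalization, using the chain-graph adjacency described below (illustrative; a 4-node chain is assumed):

```python
import numpy as np

# Compute L = D^(-1/2) A D^(-1/2) for a 4-node chain graph.
A = np.zeros((4, 4))
for i in range(4):
    if i + 1 < 4:
        A[i, i + 1] = 1
        A[i + 1, i] = 1

deg = A.sum(axis=1)                       # node degrees: [1, 2, 2, 1]
D_inv_sqrt = np.diag(1.0 / np.sqrt(deg))  # D^(-1/2)
L = D_inv_sqrt @ A @ D_inv_sqrt
print(np.round(L, 3))
```

Note that the resulting L is symmetric, with entry L[i, j] = A[i, j] / sqrt(deg(i) * deg(j)).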
As a preferred example, in step S21, the method for manually generating the adjacency matrix A is as follows:
for i in range(len(features)):
    if i + 1 < len(features):
        A[i, i + 1] = 1
        A[i + 1, i] = 1
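Both construction rules can be sketched as runnable helpers. The chain rule follows the loop above; the "automatic" random rule is an assumption of this sketch (the patent only states that A is generated randomly, not the distribution used):

```python
import numpy as np

def chain_adjacency(n):
    """Manual rule above: connect each node i to node i+1 (a chain graph)."""
    A = np.zeros((n, n))
    for i in range(n):
        if i + 1 < n:
            A[i, i + 1] = 1
            A[i + 1, i] = 1
    return A

def random_adjacency(n, p=0.5, seed=0):
    """Automatic rule (an assumption of this sketch): sample a random
    symmetric 0/1 adjacency matrix with edge probability p, no self-loops."""
    rng = np.random.default_rng(seed)
    upper = rng.random((n, n)) < p
    A = np.triu(upper, k=1).astype(float)  # keep strict upper triangle
    return A + A.T                          # mirror for symmetry

print(chain_adjacency(4))
```

Either matrix can then be normalized into the Laplacian L of step S22.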
As a preferred example, in step S3, feeding the generated Laplacian matrix L into the GNN hidden layer network and outputting the group convolution feature topological space graph features comprises:
inputting the depth feature identification layer of the original CNN network, replacing the multi-channel CNN convolution features extracted by the second pooling layer with the group convolution feature topological space graph features from the group convolution topology layer, performing fully connected network training, and finally outputting the final result through the Softmax layer.
As a preferred example, the group convolution feature topological space graph features are extracted according to the following formula:
X^(l+1) = σ(L X^l W + B)
where X^(l+1) is the feature output of the layer following layer l of the graph network, W is the weight vector of the graph topological features, B is the bias vector, and σ is the activation function.
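One forward pass of this layer can be sketched in numpy. Dimensions are illustrative assumptions, and ReLU stands in for the unspecified activation σ:

```python
import numpy as np

# One-layer sketch of X^(l+1) = sigma(L X^l W + B), with ReLU as sigma.
rng = np.random.default_rng(0)
n_nodes, in_dim, out_dim = 4, 8, 16
L = np.eye(n_nodes)               # stands in for the normalized Laplacian
X = rng.random((n_nodes, in_dim)) # node features at layer l
W = rng.random((in_dim, out_dim)) # trainable weights
B = rng.random(out_dim)           # bias vector

X_next = np.maximum(0.0, L @ X @ W + B)   # ReLU activation
print(X_next.shape)                        # (4, 16)
```

Left-multiplying by L mixes each node's features with those of its graph neighbors before the linear transform, which is what propagates information across the topology space.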
Compared with the prior art, the technical scheme of the invention has the following remarkable beneficial effects:
1. The GNN and the CNN are fused: instead of directly identifying CNN features as in the traditional method, the multi-channel CNN convolution features are brought into a graph topology space, so that feature recognition is performed under a certain spatial topology rule and network training and convergence are accelerated.
2. Unlike prior GNN methods in which each node corresponds to an image, the method introduces a group convolution feature topology space in which each graph node corresponds to one group of extracted group convolution features; graph neural network propagation is carried out at the level of the image depth feature graph, mining the valuable information in the graph topology.
3. During training, transfer learning can be used selectively in the convolution feature extraction layer: with few samples, the convolution feature extraction layer is frozen or fine-tuned; with large samples, it directly adopts pre-trained model parameters, further accelerating training convergence.
It should be understood that all combinations of the foregoing concepts and additional concepts described in greater detail below can be considered as part of the inventive subject matter of this disclosure unless such concepts are mutually inconsistent. In addition, all combinations of claimed subject matter are considered a part of the presently disclosed subject matter.
The foregoing and other aspects, embodiments and features of the present teachings can be more fully understood from the following description taken in conjunction with the accompanying drawings. Additional aspects of the present invention, such as features and/or advantages of exemplary embodiments, will be apparent from the description which follows, or may be learned by practice of specific embodiments in accordance with the teachings of the present invention.
Detailed Description
In order to better understand the technical content of the present invention, specific embodiments are described below with reference to the accompanying drawings.
As shown in fig. 1, the present invention provides a detailed network structure, including: a convolution feature extraction layer L1, a group convolution topology layer L2 and a depth feature identification layer L3.
The convolution feature extraction layer L1 uses a conventional CNN network to extract multi-channel convolution features, and comprises a first convolution layer Conv1, a first pooling layer Pool1, a second convolution layer Conv2 and a second pooling layer Pool2.
The group convolution topology layer L2 comprises a group convolution feature layer (Group Features), a graph network input layer (Graph Input), a graph neural network hidden layer (Hidden Layer) and a graph network output layer (Graph Output); this layer is responsible for extracting the graph topological space features of the CNN features under different channels.
The depth feature identification layer L3 comprises a flattening layer (Flatten Layer), a fully connected layer (FC Layer), a fully connected output layer and a Softmax layer; the output features of the group convolution topology layer serve as input, and the final recognition result is output by the Softmax layer.
With reference to fig. 2, the present invention further provides a training method for a deep learning network based on a group convolution feature topology space, which specifically includes the following steps:
(1) and extracting multichannel CNN convolution characteristics according to input sample data.
Preferably, to further improve network performance, transfer learning is used selectively in the convolution feature extraction layer L1. This selective use of transfer learning falls into four cases: 1) the data volume is small but the similarity is high: all parameters of the convolution feature extraction layer are frozen, and only the group convolution topology layer and the feature identification layer participate in updating; 2) the data volume is small and the similarity is low: the convolution feature extraction layer adopts a fine-tune strategy; 3) the data volume is large and the similarity is low: all layers of the network are retrained; 4) the data volume is large and the similarity is high: the convolution feature extraction layer adopts pre-trained parameters for training.
(2) After the multi-channel convolution features are extracted, they are grouped by channel index to form group convolutions, a graph topology space is constructed, each group convolution feature is viewed as a graph topology space node, the graph topology space node connection rule is constructed automatically or manually, a Laplacian matrix L is generated and fed into the GNN hidden layer network, and the group convolution feature topological space graph features are output.
In the graph topology space, each node is a group convolution formed by grouping channel indexes; in this embodiment of the invention, each channel is one node. The method for automatically constructing the graph topology space node connection rule is to randomly generate an adjacency matrix A; the method for manually constructing the rule in this embodiment is as follows:
for i in range(len(features)):
    if i + 1 < len(features):
        A[i, i + 1] = 1
        A[i + 1, i] = 1
the calculation formula of the Laplace matrix L is L ═ D-1/2AD-1/2Where D is a degree matrix.
The output group convolution feature topological space graph features are extracted as follows:
X^(l+1) = σ(L X^l W + B)
where X^(l+1) is the feature output of the layer following layer l of the graph network, W is the weight vector of the graph topological features, B is the bias vector, and σ is the activation function.
(3) The depth feature identification layer L3 of the original CNN network is then applied: the multi-channel convolution features extracted by the second pooling layer Pool2 are replaced with the group convolution feature topological space graph features from the group convolution topology layer L2, fully connected network training is performed, and the final result is output through the Softmax layer.
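The identification stage can be sketched minimally in numpy: flatten the graph features, apply one fully connected layer, and normalize with Softmax. Layer sizes below are illustrative assumptions:

```python
import numpy as np

# Minimal sketch of the depth feature identification layer L3: flatten the
# graph features, apply a fully connected layer, and output class
# probabilities via Softmax.
rng = np.random.default_rng(0)
graph_features = rng.random((4, 16))       # (nodes, feature dim)
flat = graph_features.reshape(-1)          # flattening (Flatten) layer

n_classes = 10
W_fc = rng.random((flat.size, n_classes))  # fully connected weights
b_fc = rng.random(n_classes)
logits = flat @ W_fc + b_fc

exp = np.exp(logits - logits.max())        # numerically stable Softmax
probs = exp / exp.sum()
print(probs.sum())                         # sums to 1 up to float rounding
```

The class with the largest entry of `probs` is the final recognition result.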
In this disclosure, aspects of the present invention are described with reference to the accompanying drawings, in which a number of illustrative embodiments are shown. Embodiments of the present disclosure are not necessarily defined to include all aspects of the invention. It should be appreciated that the various concepts and embodiments described above, as well as those described in greater detail below, may be implemented in any of numerous ways, as the disclosed concepts and embodiments are not limited to any one implementation. In addition, some aspects of the present disclosure may be used alone, or in any suitable combination with other aspects of the present disclosure.
Although the present invention has been described with reference to the preferred embodiments, it is not intended to be limited thereto. Those skilled in the art can make various changes and modifications without departing from the spirit and scope of the invention. Therefore, the protection scope of the present invention should be determined by the appended claims.