CN112308089A - Attention mechanism-based capsule network multi-feature extraction method - Google Patents


Info

Publication number
CN112308089A
Authority
CN
China
Prior art keywords
network
layer
capsule
image
proofreading
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910689204.3A
Other languages
Chinese (zh)
Inventor
王耀彬
卜得庆
唐苹苹
王欣夷
李凌
孟慧玲
刘启川
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southwest University of Science and Technology
Original Assignee
Southwest University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Southwest University of Science and Technology filed Critical Southwest University of Science and Technology
Priority to CN201910689204.3A
Publication of CN112308089A
Legal status: Pending

Classifications

    • G06V 10/40 Extraction of image or video features (G Physics; G06 Computing; G06V Image or video recognition or understanding; G06V 10/00 Arrangements for image or video recognition or understanding)
    • G06N 3/044 Recurrent networks, e.g. Hopfield networks (G06N Computing arrangements based on specific computational models; G06N 3/00 Computing arrangements based on biological models; G06N 3/02 Neural networks; G06N 3/04 Architecture, e.g. interconnection topology)
    • G06N 3/045 Combinations of networks
    • G06N 3/084 Backpropagation, e.g. using gradient descent (G06N 3/08 Learning methods)


Abstract

The invention discloses an attention-mechanism-based capsule network method for multi-feature identification and extraction, comprising the following steps: (1) design an NCap network and use it to construct an attention-based capsule network framework; (2) input an image training set to the attention-mechanism capsule network, which, after training and learning, completes the recognition and extraction of image features and generates a corresponding optimal training model; (3) input an image to be identified to the attention-mechanism capsule network, which loads the optimal network model and identifies the image features; (4) the attention-mechanism capsule network outputs the identification result for the image to be identified. The invention proposes the idea of fusing a convolutional network mechanism with a capsule network structure under an attention mechanism, recording the relative position and direction of the image while reducing the number of parameters during training, thereby effectively improving recognition efficiency and accuracy.

Description

Attention mechanism-based capsule network multi-feature extraction method
Technical Field
The invention belongs to the technical field of image processing, and particularly relates to a multi-feature image recognition and classification method that combines an attention mechanism with a capsule network in the field of image recognition. The method can be used to extract and identify key feature information from images.
Background
In recent years, the technical fields of target recognition and feature extraction have moved from single-attribute to multi-attribute identification, and the growing maturity of these techniques has greatly promoted rapid innovation in re-identification technology. However, accurate multi-attribute classification still faces difficulties such as excessively high pixel dimensionality, low resolution, and noise interference. At present, target recognition and feature extraction methods basically rely on convolutional neural networks for training and learning. Convolutional layers are the key components of such networks, and the size of the convolution kernel determines the richness of the feature map. Meanwhile, to reduce the amount of model calculation, convolutional neural networks shrink the feature map through pooling operations, but pooling layers easily cause the loss of key information such as position and direction. For example, if the relative positions of the eyes and mouth in a face image are swapped, a convolutional neural network given the altered image is still far more likely to predict "face" than "non-face".
Aiming at the loss of key information such as position and direction during convolutional-network recognition, Geoffrey Hinton, a founding figure of neural networks, proposed the capsule network as a solution. The capsule network takes vectors (for instance, the distance and direction from a viewpoint to an image) as input and output, and uses a dynamic routing mechanism to update parameters iteratively, although the amount of calculation involved is considerable. Compared with a convolutional neural network, the capsule network's use of vector data as input and output prevents the loss of key information such as the relative position and direction of images during training, improving recognition accuracy.
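The vector-in/vector-out behaviour described above rests on the capsule network's "squash" nonlinearity from Sabour, Frosst and Hinton's dynamic-routing paper (not spelled out in this patent); a minimal sketch:

```python
import math

def squash(s):
    """Capsule "squash" nonlinearity: rescales a vector so its length
    lies in [0, 1) while preserving its direction, letting the length
    encode the probability that the detected entity is present."""
    norm_sq = sum(x * x for x in s)
    if norm_sq == 0.0:
        return [0.0] * len(s)
    norm = math.sqrt(norm_sq)
    scale = norm_sq / (1.0 + norm_sq) / norm
    return [x * scale for x in s]

v = squash([3.0, 4.0])                     # input vector of length 5
length = math.sqrt(sum(x * x for x in v))  # squashed length: 25/26, direction kept
```

Because the output length stays below 1, it can be read directly as a detection probability, which is what makes vector capsules usable in place of scalar activations.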
In the patent document "A capsule network image classification and identification method with an improved reconstruction network" (application number CN201810509412.6, publication number CN108985316A), Southwest University proposes a capsule network image classification and identification method that improves the reconstruction network. The specific steps are as follows. First, the method constructs a capsule network from a working network and a proofreading network, selects the working-network output vector with the largest magnitude as the image classification result, and feeds it to a margin loss to compute the deviation between the recognized and true results. The reconstruction network then restores the recognition result to an image and compares it with the input image to obtain a variance. Finally, the deviation and the variance are summed and fed back to the working network, which continues training and learning toward accurate image identification. By constructing the working and proofreading networks, obtaining the deviation and variance, and feeding them back for dynamic tuning, the method improves identification accuracy. However, it still has disadvantages: feeding the deviation and variance computed by the working and proofreading networks back to the working network inevitably affects the network's calculation rate and energy consumption, and the method is only suitable for image recognition of single-feature attributes.
In their paper "HydraPlus-Net: Attentive Deep Features for Pedestrian Analysis" (ICCV 2017), Xihui Liu et al. propose the attention-based deep network HydraPlus-Net. The method maps the image to be recognized to different feature layers over multiple paths and ranges through an attention-mechanism network, captures local and global information, and gathers features according to the semantics of different layers. The network consists mainly of a main network and an attention network, both built on convolutional neural network structures with a shared convolutional backbone; their outputs are concatenated, fused through global average pooling and a fully connected layer, and mapped either to characteristic attributes for multi-attribute identification or to feature vectors for re-identification. Using an attention-mechanism network, the method identifies multi-feature image attributes with good results. However, both the main network and the attention network still use a mainstream convolutional structure, whose most serious problem is that pooling layers discard important information such as position and direction, so the accuracy of multi-feature attribute identification remains limited.
Disclosure of Invention
In view of the current situation that fields such as target recognition and feature extraction mainly rely on convolutional neural networks, and aiming at the problems of low accuracy in multi-attribute feature recognition and the loss of important information such as position and direction during convolutional-network training, the invention provides a capsule network multi-feature extraction and recognition method based on an attention mechanism. During training, the identification area is dynamically adjusted and the capsule network is combined with the convolutional neural network: by comparing and feeding back the calculation errors between the capsule network and the convolutional network, important information such as the relative position and direction of the image is preserved at the training source, solving the drop in identification accuracy caused by losing this information during training.
To achieve the above purpose, the invention starts from the "attention" area of the training data and places an adjustable convolution layer between the input layer of the network architecture and the main and attention networks; this layer can dynamically adjust the deep, shallow and local semantics of the identified image so that specific areas are identified dynamically. In the main network and the attention network, an NCap network fusing a capsule network with a convolutional network is used; the attention-mechanism-based capsule network multi-feature identification and extraction method is realized by comparing and feeding back error parameters, dynamically adjusting the identification network weights, and computing the corresponding losses. The specific steps are as follows:
step 1: constructing an attention-mechanism capsule network, wherein the attention-mechanism capsule network comprises an input layer, an adjustable convolution layer, a main network, an attention network, a global average weighting, a fully connected structure and an output layer; the input layer is used for inputting training data and identification data, the adjustable convolution layer is used for adjusting the semantic range of the identified image, the main network is used for extracting the overall semantics of the person image at different scales, and the attention network is used for extracting the shallow and local semantics of the person image at different scales; after being converged, the main network and the attention network are connected in sequence with the global average weighting, the fully connected structure and the output layer; the main network comprises 3 serially connected NCap networks, the input end of the NCap network being connected with the data image input end and the output end of the NCap network being connected with the global weight calculation;
the attention network comprises 3 layers of 3 cascaded NCap networks, wherein the input end of each layer's NCap network is connected with the image input end and the output end of each layer's NCap network is connected with the global weight calculation;
the Ncap network comprises a working network and a proofreading network, wherein the working network is used for inputting an image and outputting an identification result of the image, and the proofreading network is used for comparing and feeding back training adjustment parameters to the working network;
the working network comprises a convolution structure and a fully connected structure. The output end of a convolution layer in the convolution structure is connected both to the following pooling layer of the convolution structure in the working network and to the capsule layer of the proofreading capsule in the proofreading network; the output end of a pooling layer in the convolution structure is connected both to the input end of the next convolution layer in the working network and to the proofreading layer of the proofreading capsule in the proofreading network; and the Nth pooling layer of the convolution structure is connected to the weight calculation of the fully connected structure in the working network. The fully connected structure consists of a weight calculation layer followed by a fully connected layer;
the proofreading network comprises a proofreading capsule, a loss layer and an optimization algorithm layer, wherein the proofreading capsule comprises a capsule, a proofreading layer and an image error loss; the input end of the capsule is connected to a convolution layer of the working network; the input end of the proofreading layer is connected to the output end of the capsule and to a pooling layer of the working network; the output end of the proofreading layer is connected to the image error loss; the output end of the image error loss layer is connected to the next convolution layer of the convolution structure in the working network and to the loss layer in the proofreading network; and the input and output ends of the optimization algorithm layer are connected to the loss layer and the working network respectively;
step 2: inputting an image training set to the attention mechanism capsule network, completing feature extraction of images after training and learning by the attention mechanism capsule network, and outputting an optimal network training model;
step 3: inputting an image to be recognized to the attention-mechanism capsule network and loading the optimal network training model, wherein the output of the working network is the obtained recognition feature;
step 4: the capsule network outputs the feature result of the image to be identified.
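Steps 1 to 4 can be sketched as a Python skeleton. Every name here (NCap, AttentionCapsuleNet, the pass-through forward) is illustrative scaffolding under our own assumptions, not the patent's implementation:

```python
class NCap:
    """Stand-in for one working-network + proofreading-network unit."""
    def forward(self, image):
        # A real NCap would convolve, pool and proofread; this
        # placeholder just passes the data through unchanged.
        return image

class AttentionCapsuleNet:
    def __init__(self):
        self.main_net = [NCap() for _ in range(3)]                      # 3 serial NCaps
        self.attention_net = [[NCap() for _ in range(3)] for _ in range(3)]
        self.best_model = None

    def train_on(self, training_set):
        # Step 2: training would iterate until accuracy stabilises;
        # here we only record a dummy "optimal model".
        self.best_model = {"trained_on": len(training_set)}

    def recognise(self, image):
        # Steps 3 and 4: the optimal model must be loaded before recognition.
        assert self.best_model is not None, "run step 2 first"
        features = image
        for ncap in self.main_net:
            features = ncap.forward(features)
        return features

net = AttentionCapsuleNet()
net.train_on(["img_%d" % i for i in range(10)])
result = net.recognise("query_image")
```

The skeleton's value is the data flow: train once, persist a best model, then require that model before any recognition pass.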
Existing feature recognition network structures are convolutional neural networks, which easily lose important information such as the relative position and direction of images, while existing capsule networks incur a large amount of calculation when performing vector transformations. The above design exploits the capsule network's ability to record important information such as relative position and direction together with the convolutional network's small amount of calculation, ensuring both high accuracy during training and low energy consumption on the hardware running the network.
Further, the specific process of training the attention-based capsule network framework in step 2 is as follows:
S2.1, the images in the image training set are input into the adjustable convolution layer, and after the parameter t of this layer is adjusted, the convolution operation yields multi-scale image information data D;
S2.2, the image information data D with a large-scale, shallow semantic range is transmitted to the main network composed of NCaps, and the global feature information I1 of the image is computed;
S2.3, the image information data D with a small-scale, deep semantic range is transmitted to the NCaps that form the attention network, and the local feature information I2 of the image is computed;
S2.4, the computed global feature information I1 and the local feature information I2 are merged, input to the global weight calculation, passed through the fully connected operation, and output as a classification.
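Step S2.4's merge of the global and local branches can be sketched as follows; since the patent does not define "global average weight" precisely, the pool-then-weight scheme and all names here are assumptions for illustration:

```python
def global_average(feature_map):
    """Collapse one 2-D feature map to a single scalar (global pooling)."""
    flat = [v for row in feature_map for v in row]
    return sum(flat) / len(flat)

def fuse_and_classify(global_feats, local_feats, weights, fc_weights):
    """Merge I1 (main network) and I2 (attention network): pool each map,
    weight each branch, then apply a toy fully connected layer."""
    pooled = [global_average(f) for f in global_feats + local_feats]
    weighted = [w * p for w, p in zip(weights, pooled)]
    return sum(w * x for w, x in zip(fc_weights, weighted))

I1 = [[[1.0, 3.0], [5.0, 7.0]]]   # one global feature map, mean 4.0
I2 = [[[2.0, 2.0], [2.0, 2.0]]]   # one local feature map, mean 2.0
score = fuse_and_classify(I1, I2, weights=[0.5, 0.5], fc_weights=[1.0, 1.0])
```

With equal branch weights the score is simply the mean of the two pooled branches, which makes the role of the per-branch weights easy to see.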
The NCap networks in the main network and the attention network train on the image information data in the following specific steps:
Q1, the image information data D is transmitted to the NCap working network, and before the convolution operation of the working network's convolution structure, the capsule in the proofreading capsule of the proofreading network records the image's position and direction information;
Q2, after the convolution and pooling operations of the working network's convolution structure, the pooled image information data is transferred to the proofreading layer in the proofreading capsule of the proofreading network;
Q3, the image information recorded in the capsule is compared with the pooled image information stored in the proofreading layer to obtain an image error loss;
Q4, the operations of Q2 and Q3 are repeated, and the n resulting losses are cascaded to obtain the final loss;
Q5, the final loss is optimized by an optimization algorithm and then fed back to the working network;
Q6, the working network adjusts the parameters of each layer in reverse order, from back to front, until its identification accuracy stabilizes, completing the training and learning of the NCap network.
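Steps Q3 to Q5 can be illustrated with toy numbers. The absolute-difference error, summation as the "cascade", and the gradient-free weight nudge are all assumptions for illustration, not the patent's actual optimization algorithm:

```python
def image_error(capsule_record, pooled_record):
    """Q3: discrepancy between the pre-pooling capsule record and the
    pooled record, here a simple sum of absolute differences."""
    return sum(abs(a - b) for a, b in zip(capsule_record, pooled_record))

def cascade_losses(losses):
    """Q4: combine the n per-layer losses into one final loss."""
    return sum(losses)

def feedback(weights, loss, lr=0.01):
    """Q5/Q6: adjust working-network parameters from back to front
    (the double reversal walks the layers last-to-first)."""
    return [w - lr * loss for w in reversed(weights)][::-1]

layer_errors = [image_error([1.0, 2.0], [1.5, 1.0]),   # layer 1 error: 1.5
                image_error([0.5, 0.5], [0.5, 1.0])]   # layer 2 error: 0.5
final_loss = cascade_losses(layer_errors)               # 1.5 + 0.5 = 2.0
new_weights = feedback([0.3, 0.7], final_loss)
```

The point of the sketch is the shape of the loop: per-layer proofreading errors accumulate into one loss, which then flows backward into the working network's parameters.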
The invention has the following beneficial effects:
(1) The invention avoids losing important information such as the relative position and direction of the image at the source of data training, and uses the attention mechanism to identify specific areas, so richer and more complete feature information can be extracted, improving the accuracy of image classification and identification.
(2) The network structure used by the invention, formed by fusing a capsule network with a convolutional network, is well suited to extracting multiple person attributes and to multi-feature extraction and identification in the current fields of feature extraction and target recognition, and lends itself to fine-grained feature extraction.
(3) The method combines the advantages of the capsule network and the convolutional network: it reduces the capsule network's large amount of calculation while avoiding the convolutional network's loss of high-level feature information during deep learning, and it is broadly applicable.
Drawings
FIG. 1 is a flowchart of an overall attention-based capsule network multi-feature extraction network according to an embodiment of the present invention;
FIG. 2 is a principal network NCap network architecture of an embodiment of the present invention;
FIG. 3 is an overall framework of an attention-directed capsule network of an embodiment of the present invention;
FIG. 4 is an illustration of the effect of the embodiment of the invention on the overall network framework;
fig. 5 is an illustration of the effect of the NCap capsule network structure according to the embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail with reference to the accompanying drawings and specific embodiments.
To address the problems of low accuracy in multi-attribute feature recognition and the loss of important information such as position and direction during convolutional-network training, the technical scheme adopted by the invention is as follows: first, a capsule network NCap is designed by combining the advantages of the capsule network and the convolutional network, and an attention-mechanism-based capsule network framework is constructed from NCaps; then image data is acquired from a public data set and trained in the capsule network framework, after which the network completes the extraction and identification of image features and generates an optimal training model; in the identification stage, the image to be identified is input to the attention-mechanism capsule network and the optimal training model file is loaded, yielding the identification result as output. The flow chart is shown in fig. 1.
The implementation of the overall network framework of the embodiment comprises the following specific steps:
designing an NCap network, and constructing an attention capsule network framework by using the NCap network;
the NCap network is provided with a working network and a proofreading network as shown in FIG. 2, wherein the working network is used for inputting an image and outputting an identification result of the image, and the proofreading network is used for comparing and feeding back training adjustment parameters to the working network;
the working network comprises a convolution structure and a fully connected structure. The output end of a convolution layer in the convolution structure is connected both to the following pooling layer of the convolution structure in the working network and to the capsule layer of the proofreading capsule in the proofreading network; the output end of a pooling layer in the convolution structure is connected both to the input end of the next convolution layer in the working network and to the proofreading layer of the proofreading capsule in the proofreading network; and the Nth pooling layer of the convolution structure is connected to the weight calculation of the fully connected structure in the working network. The fully connected structure consists of a weight calculation layer followed by a fully connected layer;
the proofreading network comprises a proofreading capsule, a loss layer and an optimization algorithm layer, wherein the proofreading capsule comprises a capsule, a proofreading layer and an image error loss; the input end of the capsule is connected to a convolution layer of the working network; the input end of the proofreading layer is connected to the output end of the capsule and to a pooling layer of the working network; the output end of the proofreading layer is connected to the image error loss; the output end of the image error loss layer is connected to the next convolution layer of the convolution structure in the working network and to the loss layer in the proofreading network; and the input and output ends of the optimization algorithm layer are connected to the loss layer and the working network respectively;
the attention mechanism capsule network framework is shown in fig. 3, the attention mechanism capsule network comprises an input layer, an adjustable convolution layer, a main network, an attention network, a global average weight, a full connection structure and an output layer, the input layer is used for inputting training data and identification data, the adjustable convolution layer is used for adjusting the semantic range of an identification image, the main network is used for extracting the overall semantics of a character image under different scales, the attention network is used for extracting the shallow semantics and the local semantics of the character image under different scales, and the main network and the attention network are connected with the global average weight, the full connection structure and the output layer in sequence after being converged;
the main network comprises 3 serially connected Ncap networks, the input end of the Ncap network is connected with the input end of the data image, and the output end of the Ncap network is connected with the global weight calculation;
the attention network comprises 3 layers and 3 cascaded Ncap networks, wherein the input end of the Ncap network is connected with the input end of an image, and the output end of the Ncap network is connected with the global weight calculation;
An image training set is input to the attention-mechanism capsule network; after the attention-mechanism network completes its training, the recognition and extraction of multiple person features from the images is accomplished, and the optimal trained network model is stored;
The training and identification images used in this embodiment come from the pedestrian re-identification data set PA-100K, which contains 100,000 pedestrian images annotated with 26 human attributes. 80% of the data set was used as the training set, 10% as the validation set, and 10% as the test set.
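The 80/10/10 split can be reproduced deterministically; the file names below are placeholders, and no actual PA-100K loading is shown:

```python
import random

def split_dataset(samples, seed=0):
    """Shuffle with a fixed seed, then cut 80% / 10% / 10%."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    n = len(shuffled)
    n_train, n_val = int(0.8 * n), int(0.1 * n)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])

all_images = ["img_%05d.jpg" % i for i in range(100_000)]
train_set, val_set, test_set = split_dataset(all_images)
```

Seeding the shuffle keeps the partition stable across runs, which matters when comparing training configurations on the same validation and test sets.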
An example of the effect of each NCap network of the overall attention mechanism capsule network framework is shown in fig. 4, in which the specific steps of training the re-recognition data set are as follows:
S2.1, the image data set PA-100K to be trained is input into the Conv-layers of the attention-mechanism capsule network framework, where the Conv-layers set the granularity and range of identification through the size of the corresponding convolution kernel;
S2.2, the shallow-semantic (large-granularity) image information I1 generated in step S2.1 is transmitted to the main network for training;
S2.3, the deep-semantic (small-granularity) image information I2-1, I2-2 and I2-3 generated in step S2.1 is transmitted to the attention network for training;
S2.4, the person-attribute identification feature information output by the main network and the attention network undergoes global weight calculation and full connection, and is then classified and output.
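How the kernel size set by the Conv-layers in step S2.1 trades granularity for coverage follows from the standard valid-convolution output-size formula, a general CNN fact rather than anything specific to this patent:

```python
def output_size(image_size, kernel_size, stride=1):
    """Spatial size of a valid (no-padding) convolution output."""
    return (image_size - kernel_size) // stride + 1

# On a 32x32 input, a small kernel preserves more spatial detail
# (local semantics), while a larger kernel covers a wider region
# per output unit (shallow, large-scale semantics).
fine = output_size(32, 3)    # 3x3 kernel
coarse = output_size(32, 9)  # 9x9 kernel
```

Each output unit of the 9x9 convolution summarises a 9x9 patch, so the same layer parameter shifts the network between fine local detail and broad context.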
In the above steps, the image data enters the NCap network in the main network and the attention network for training and learning, and the network effect is shown in fig. 5, which includes the following specific steps:
Q1, the image information data I is transmitted to the NCap working network, and before the convolution operation of convolution layer N in the working network's convolution structure, the capsule in the proofreading capsule of the proofreading network records the image's position and direction information D1;
Q2, after the convolution and pooling operations of the working network's convolution structure, the pooled image information data is transferred to the proofreading layer in the proofreading capsule of the proofreading network, yielding image information D2;
Q3, the image information data D1 and D2 produced in steps Q1 and Q2 are compared to obtain an image error LOSS, which is passed into convolution layer N+1 in the working network and the LOSS layer in the proofreading network;
Q4, the operations of Q2 and Q3 are repeated, and the n resulting losses are cascaded to obtain the final loss;
Q5, the final loss is optimized by an optimization algorithm and then fed back to the working network;
Q6, the working network adjusts the parameters of each layer in reverse order, from back to front, until its identification accuracy stabilizes, completing the training and learning of the NCap network.
An image to be recognized is input to the attention-mechanism capsule network, the optimal training network model is loaded into the attention-mechanism capsule network, and the output result is the recognition feature result;
and step four, outputting the character attribute feature recognition result of the image to be recognized by the capsule network.
The foregoing is merely illustrative of the present invention. The convolutional neural network has the advantage of a small amount of calculation but loses important information such as direction and position, while the capsule network records such important information but requires a large amount of calculation. The method therefore combines the two and, together with an attention mechanism, extracts and identifies person feature information at multiple scales and ranges, achieving advantages such as high accuracy and multi-attribute identification.
Although the embodiments of the present invention have been described with reference to the accompanying drawings, this is not intended to limit the scope of the present invention; those skilled in the art will understand that various modifications and changes may be made without departing from the technical solutions of the present invention.

Claims (4)

1. A capsule network multi-feature extraction method based on an attention mechanism is characterized by comprising the following steps:
step 1: constructing an attention-mechanism capsule network, wherein the attention-mechanism capsule network comprises an input layer, an adjustable convolution layer, a main network, an attention network, a global average weighting, a fully connected structure and an output layer; the input layer is used for inputting training data and identification data, the adjustable convolution layer is used for adjusting the semantic range of the identified image, the main network is used for extracting the overall semantics of the person image at different scales, and the attention network is used for extracting the shallow and local semantics of the person image at different scales; after being converged, the main network and the attention network are connected in sequence with the global average weighting, the fully connected structure and the output layer;
the main network comprises 3 serially connected Ncap networks, the input end of the Ncap network is connected with the input end of the data image, and the output end of the Ncap network is connected with the global weight calculation;
the attention network comprises 3 layers and 3 cascaded Ncap networks, wherein the input end of each layer of Ncap network is connected with the input end of an image, and the output end of each layer of Ncap network is connected with a global weight calculation;
the Ncap network comprises a working network and a proofreading network, wherein the working network is used for inputting an image and outputting an identification result of the image, and the proofreading network is used for comparing and feeding back training adjustment parameters to the working network;
the working network comprises a convolution structure and a full connection structure. The output end of the convolution layer of the convolution structure is respectively connected with the pooling layer of the convolution structure in the working network and the capsule layer of the proofreading capsule in the proofreading network, the output end of the pooling layer of the convolution structure is respectively connected with the input end of the convolution layer of the convolution structure in the working network and the proofreading layer of the proofreading capsule in the proofreading network, and the Nth pooling layer of the convolution structure is connected with the weight calculation of the full-connection structure in the working network. The full connection structure is a network structure of a weight calculation layer and a full connection layer in sequence;
the proofreading network comprises a proofreading capsule, a loss layer and an optimization algorithm layer, wherein the proofreading capsule comprises a capsule, a proofreading layer and a proofreading image error loss, the input end of the capsule is connected with a convolution layer of the working network, the input end of the proofreading layer is respectively connected with the output end of the capsule and a pooling layer of the working network, the output end of the proofreading layer is connected with the image error loss, the output end of the image error loss layer is respectively a convolution layer next to a convolution structure in the working network and the loss layer in the proofreading network, and the input end and the output end of the optimization algorithm layer are respectively connected with the loss layer and the working network;
step 2: inputting an image training set to the attention mechanism capsule network, completing feature extraction of images after training and learning by the attention mechanism capsule network, and outputting an optimal network training model;
and step 3: inputting an image to be recognized to the attention mechanism capsule network and loading an optimal network training model, wherein the output of the working network is the obtained recognition characteristic;
and 4, step 4: and the capsule network outputs the characteristic result of the image to be identified.
2. The attention-based capsule network multi-feature extraction method of claim 1, wherein the training and learning of the capsule network in step 2 proceeds as follows:
S2.1, inputting the images of the image training set into the adjustable convolution layer; after the parameter t of this layer is adjusted, the convolution operation yields multi-scale image information data D;
S2.2, passing the image information data D of the large-scale, shallow semantic range to the main network composed of NCaps, and computing the global feature information I1 of the image;
S2.3, passing the image information data D of the small-scale, deep semantic range to the NCaps that form the attention network, and computing the local feature information I2 of the image;
S2.4, merging the computed global feature information I1 and local feature information I2, feeding the merged information into the global weight calculation, performing the fully connected operation, and classifying and outputting the result.
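The role of the parameter t in S2.1, which controls the semantic range (receptive field) of the extracted features, can be sketched in one dimension. `tunable_conv` and the box kernel are illustrative assumptions, not the patented adjustable convolution layer:

```python
import numpy as np

def tunable_conv(signal, t):
    # Toy 'adjustable convolution layer': the parameter t sets the
    # kernel width, i.e. how large a semantic range each output
    # value summarises.
    kernel = np.ones(t) / t          # box filter of width t
    return np.convolve(signal, kernel, mode='valid')

x = np.arange(8, dtype=float)
small_scale = tunable_conv(x, 2)   # narrow range: fine detail, longer output
large_scale = tunable_conv(x, 4)   # broad range: coarse semantics, shorter output
```

Larger t averages over a wider window, which mimics feeding shallower, larger-scale semantics to the main network and deeper, smaller-scale semantics to the attention network as in S2.2 and S2.3.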
3. The attention-based capsule network multi-feature extraction method of claim 1 or 2, wherein the NCap network trains on image information data as follows:
Q1, the image information data D are passed to the working network of the NCap; before the convolution operation of the working network's convolution structure, the capsule in the proofreading capsule of the proofreading network records the position and direction information of the data;
Q2, after the convolution and pooling operations of the working network's convolution structure, the pooled image information data are passed to the proofreading layer in the proofreading capsule of the proofreading network;
Q3, the image information recorded in the capsule is compared with the pooled image information held in the proofreading layer to obtain an image error loss;
Q4, the operations of Q2 and Q3 are repeated, and the n losses are cascaded to obtain the final loss;
Q5, the final loss is optimized by the optimization algorithm and then fed back to the working network;
Q6, the working network adjusts the parameters of each layer in reverse order, from back to front, until its recognition accuracy stabilizes, completing the training and learning of the NCap network.
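Steps Q1 to Q4 can be sketched as a loop. Everything here is an illustrative stand-in: `proofread_losses`, the linear "convolution", the mean "pooling" and the summation used to cascade the per-stage losses are assumptions; the claims do not fix these operations:

```python
import numpy as np

def proofread_losses(x, layers):
    # Sketch of the NCap proofreading pass: before each conv+pool
    # stage a 'capsule' snapshots the input (Q1); after pooling the
    # proofreading layer compares the two (Q2/Q3); the n per-stage
    # losses are cascaded, here by summation, into the final loss (Q4).
    losses = []
    for w in layers:
        snapshot = x.copy()                     # Q1: capsule records pre-conv info
        x = np.maximum(x @ w, 0.0)              # convolution stand-in
        x = x.reshape(-1, 2).mean(axis=1)       # pooling stand-in (halves length)
        recon = np.repeat(x, snapshot.size // x.size)  # proofreading layer aligns sizes
        losses.append(float(np.mean((recon - snapshot) ** 2)))  # Q3: image error loss
    return sum(losses), losses

rng = np.random.default_rng(1)
layers = [rng.standard_normal((16, 16)) * 0.1,
          rng.standard_normal((8, 8)) * 0.1]
total, per_stage = proofread_losses(rng.standard_normal(16), layers)
```

Steps Q5 and Q6 (running the optimizer on the final loss and adjusting the layer parameters back to front) are omitted; in practice they would be an ordinary backpropagation update driven by `total`.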
4. The attention-based capsule network multi-feature extraction method of claim 1, 2 or 3, wherein the NCap network structure comprises a working network and a proofreading network; the working network comprises a convolution structure and a fully connected structure, and the proofreading network comprises n proofreading capsules, a LOSS layer and an optimization algorithm layer; the NCap network operates mainly by using the image information before convolution and the image information after convolution to generate a loss error, which is fed back to the next convolution layer in the working network's convolution structure; the n losses generated by the n proofreading capsules are cascaded into the LOSS layer and, after the optimization algorithm, fed back to the working network, whose parameters are adjusted in reverse order.
CN201910689204.3A 2019-07-29 2019-07-29 Attention mechanism-based capsule network multi-feature extraction method Pending CN112308089A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910689204.3A CN112308089A (en) 2019-07-29 2019-07-29 Attention mechanism-based capsule network multi-feature extraction method

Publications (1)

Publication Number Publication Date
CN112308089A true CN112308089A (en) 2021-02-02

Family

ID=74329486

Country Status (1)

Country Link
CN (1) CN112308089A (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108549646A (en) * 2018-04-24 2018-09-18 中译语通科技股份有限公司 A kind of neural network machine translation system based on capsule, information data processing terminal
CN108596870A (en) * 2018-03-06 2018-09-28 重庆金山医疗器械有限公司 Capsule endoscope image based on deep learning screens out method, apparatus and equipment
CN108898577A (en) * 2018-05-24 2018-11-27 西南大学 Based on the good malign lung nodules identification device and method for improving capsule network
CN108985316A (en) * 2018-05-24 2018-12-11 西南大学 A kind of capsule network image classification recognition methods improving reconstructed network
CN109241283A (en) * 2018-08-08 2019-01-18 广东工业大学 A kind of file classification method based on multi-angle capsule network
US20190034800A1 (en) * 2016-04-04 2019-01-31 Olympus Corporation Learning method, image recognition device, and computer-readable storage medium
CN109710769A (en) * 2019-01-23 2019-05-03 福州大学 A kind of waterborne troops's comment detection system and method based on capsule network

Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
ABDULLAH M. ALGAMDI et al.: "Learning Temporal Information from Spatial Information Using CapsNets for Human Action Recognition", ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) *
ASSAF HOOGI et al.: "Self-Attention Capsule Networks for Image Classification", arXiv *
XIHUI LIU et al.: "HydraPlus-Net: Attentive Deep Features for Pedestrian Analysis", 2017 IEEE International Conference on Computer Vision (ICCV) *
FU Jiahui et al.: "Research on Capsule Network Features Based on Affine Transformation", Journal of Signal Processing *
LIN Shaodan et al.: "A Target Recognition Model Combining Capsule Networks and Convolutional Neural Networks", Telecommunication Engineering *
WANG Jinjia et al.: "Household Activity Recognition Based on an Attention Capsule Network", Acta Automatica Sinica *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113298037A (en) * 2021-06-18 2021-08-24 重庆交通大学 Vehicle weight recognition method based on capsule network
CN113298037B (en) * 2021-06-18 2022-06-03 重庆交通大学 Vehicle weight recognition method based on capsule network
CN113591556A (en) * 2021-06-22 2021-11-02 长春理工大学 Three-dimensional point cloud semantic analysis method based on neural network three-body model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20210202