CN111507410B - Construction method of convolutional capsule layer and classification method and device of multi-view images - Google Patents

Construction method of convolutional capsule layer and classification method and device of multi-view images

Info

Publication number
CN111507410B
Authority
CN
China
Prior art keywords
capsule
layer
input
output layer
output
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010309310.7A
Other languages
Chinese (zh)
Other versions
CN111507410A (en)
Inventor
宁欣
李卫军
田伟娟
孙琳钧
李爽
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhongke Shangyi Health Technology Beijing Co ltd
Original Assignee
Institute of Semiconductors of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Semiconductors of CAS
Priority to CN202010309310.7A
Publication of CN111507410A
Application granted
Publication of CN111507410B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/047 Probabilistic or stochastic networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The application provides a construction method of a convolutional capsule layer. The convolutional capsule layer comprises at least an input layer and an output layer, each having a plurality of capsules, and the method comprises the following steps: S1, taking the inner product of a Gabor filter and a convolution kernel to obtain a convolutional Gabor filter from input-layer capsules to output-layer capsules; S2, convolving the convolutional Gabor filter with the feature map input to the input layer to obtain prediction vectors; S3, constructing a self-attention route to obtain the assignment probabilities from input-layer capsules to output-layer capsules; S4, obtaining the input of the output-layer capsules according to the assignment probabilities; and S5, activating the input of the output-layer capsules through a Squash activation function to obtain the output of the output-layer capsules. In addition, a construction apparatus for the convolutional capsule layer, a multi-view image classification method and apparatus, and an electronic device are also provided.

Description

Construction method of convolutional capsule layer and classification method and device of multi-view images
Technical Field
The application relates to the technical field of pattern recognition, in particular to a construction method of a convolutional capsule layer and a classification method and apparatus for multi-view images.
Background
Convolutional neural networks (CNNs) have made breakthroughs in many computer vision tasks in recent years and significantly outperform many traditional hand-crafted feature-driven models. Two common themes for improving CNN performance are increasing the depth and width of the network (e.g., the number of levels and the number of units per level) and using as much training data as possible. Despite this success, CNNs have several limitations, such as the invariance caused by pooling and an inability to model the spatial relationships between parts. To address these limitations, the dynamic-routing-based CapsNet was proposed; comprising only one convolutional layer and one fully connected capsule layer, it has shown results comparable to CNNs on several standard datasets. Beyond dynamic routing, the matrix capsule approach, which represents each entity by a pose matrix and routes with EM routing, has many extensions, such as data augmentation using mixed hit-and-miss layers. Existing attempts to create a deep CapsNet by simply stacking fully connected capsule layers yield an architecture similar to an MLP model, with several limitations. First, the dynamic routing used in capsule networks is extremely computationally expensive, and multiple routing layers increase training and inference times. Second, it has recently been shown that stacking fully connected capsule layers leads to poor learning in the middle layers: when there are too many capsules, the coupling coefficients become too small, attenuating the gradient flow and inhibiting learning. Third, it has been shown that, particularly in the lower layers, related units tend to concentrate in local regions. Although local routing could exploit this observation explicitly, such local routing cannot be implemented with fully connected capsule layers.
Disclosure of Invention
Technical problem to be solved
The application provides a construction method for a convolutional capsule layer and a classification method and apparatus for multi-view images, which at least partially solve the technical problems identified above.
(II) technical scheme
In a first aspect, the present application provides a method of constructing a convolutional capsule layer, the convolutional capsule layer comprising at least an input layer and an output layer, each having a plurality of capsules, the method comprising: S1, taking the inner product of a Gabor filter and a convolution kernel to obtain a convolutional Gabor filter from input-layer capsules to output-layer capsules; S2, convolving the convolutional Gabor filter with the feature map input to the input layer to obtain prediction vectors; S3, constructing a self-attention route to obtain the assignment probabilities from input-layer capsules to output-layer capsules; S4, obtaining the input of the output-layer capsules according to the assignment probabilities; and S5, activating the input of the output-layer capsules through a Squash activation function to obtain the output of the output-layer capsules.
Optionally, the self-attention route is constructed as follows: the prediction vector [w_j, h_j, n_i, n_j, d_j] is transposed to obtain [w_j, h_j, n_j, n_i, d_j], so that the number of elements n_j of the j-th output-layer capsule serves as the heads of a multi-head attention mechanism, and the correlation between the affine-transformed initial prediction vectors of the i-th input-layer capsule is computed along the n_i dimension, where w_j is the width of the convolved feature map, h_j is its height, n_i is the number of elements of the i-th input-layer capsule, n_j is the number of elements of the j-th output-layer capsule, and d_j is the capsule dimension.
Optionally, the assignment probability is calculated as follows:

The attention value head_h of the prediction vector is obtained as

head_h = softmax(X·Y^T / √dim)·Z

where X is the query vector, Y is the key vector, and Z is the value vector, each obtained from the prediction vector û_{j|i} by a linear mapping with a parameter matrix, and dim is the dimension of the query and key vectors.

The attention value is taken as the weight coefficient from input-layer capsules to output-layer capsules.

The weight coefficients are concatenated to obtain the probability value from input-layer capsule i to output-layer capsule j: c_ij = Concat(head_1, ..., head_h, ..., head_H), H = n_j.
Optionally, the input of an output-layer capsule is calculated as:

s_j = Σ_i c_ij · û_{j|i}

where s_j is the input of the j-th output-layer capsule, c_ij is the probability value from input-layer capsule i to output-layer capsule j, û_{j|i} is the prediction vector, i indexes the capsules of the input layer, and j indexes the capsules of the output layer.
Optionally, the output of an output-layer capsule is calculated as:

v_j = (||s_j||^2 / (1 + ||s_j||^2)) · (s_j / ||s_j||)

where v_j is the output vector of the j-th output-layer capsule.
In a second aspect, the present application provides a method for classifying multi-view images based on the above convolutional capsule layer, comprising: inputting an image into a convolutional neural network to obtain a main feature image; and inputting the main feature image into two of the convolutional capsule layers to obtain a classification result for the multi-view image.
Optionally, the convolutional neural network comprises an input layer, a plurality of convolutional layers, a ReLU layer that sets part of the neuron outputs to 0 to induce sparsity, and a max-pooling layer that compresses the feature image to obtain the main feature image.
In a third aspect, the present application provides an apparatus for constructing a convolutional capsule layer, the convolutional capsule layer comprising at least an input layer and an output layer, each having a plurality of capsules, the apparatus comprising: an inner product module for taking the inner product of the Gabor filter and the convolution kernel to obtain a convolutional Gabor filter from input-layer capsules to output-layer capsules; a convolution module for convolving the convolutional Gabor filter with the feature map input to the input layer to obtain prediction vectors; a construction module for constructing a self-attention route to obtain the assignment probabilities from input-layer capsules to output-layer capsules; an obtaining module for obtaining the input of the output-layer capsules according to the assignment probabilities; and an activation module for activating the input of the output-layer capsules through a Squash activation function to obtain the output of the output-layer capsules.
In a fourth aspect, the present application provides an apparatus for classifying multi-view images, comprising: a first input module for inputting an image into a convolutional neural network to obtain a main feature image; and a second input module for inputting the main feature image into two of the convolutional capsule layers to obtain a classification result for the multi-view image.
In a fifth aspect, the present application provides an electronic device, comprising: a processor; and a memory having computer readable instructions stored thereon, which when executed by the processor, cause the processor to perform the above-described method.
(III) advantageous effects
The application provides a construction method for a convolutional capsule layer and a classification method and apparatus for multi-view images. Replacing the traditional capsule construction based on ordinary convolution with a 3D convolution method based on Gabor convolution greatly reduces the complexity of the algorithm and makes the construction of a deep capsule network feasible, and the modulation by the Gabor filter used for convolution guides the learning of the convolutional features. Finally, a deep capsule network based on sausage capsule learning is constructed, which overcomes the vanishing gradients caused by deep stacking and the excessive coupling of capsules in traditional capsule network construction. The method can be applied to multi-view image classification scenarios such as image retrieval, intelligent monitoring, intelligent transportation, and security surveillance.
Drawings
FIG. 1 schematically illustrates a step diagram of a method of constructing a convolutional capsule layer according to an embodiment of the disclosure;
FIG. 2 schematically illustrates a flow chart of a method of constructing a convolutional capsule layer according to an embodiment of the disclosure;
FIG. 3 schematically illustrates a step diagram of a classification method of multi-view images according to an embodiment of the present disclosure;
FIG. 4 schematically illustrates a block diagram of a construction apparatus for a convolutional capsule layer according to an embodiment of the disclosure;
FIG. 5 schematically illustrates a block diagram of a classification apparatus of multi-view images according to an embodiment of the present disclosure;
FIG. 6 schematically illustrates a block diagram of an electronic device according to an embodiment of the disclosure.
Detailed Description
Embodiments of the present disclosure will be described below with reference to the accompanying drawings. It should be understood that the description is illustrative only and is not intended to limit the scope of the present disclosure. In the following detailed description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the embodiments of the disclosure. It may be evident, however, that one or more embodiments may be practiced without these specific details. Moreover, in the following description, descriptions of well-known structures and techniques are omitted so as to not unnecessarily obscure the concepts of the present disclosure.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. The terms "comprises," "comprising," and the like, as used herein, specify the presence of stated features, steps, operations, and/or components, but do not preclude the presence or addition of one or more other features, steps, operations, or components.
All terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art unless otherwise defined. It is noted that the terms used herein should be interpreted as having a meaning that is consistent with the context of this specification and should not be interpreted in an idealized or overly formal sense.
Where a convention analogous to "at least one of A, B and C, etc." is used, in general such a construction is intended in the sense one having skill in the art would understand the convention (e.g., "a system having at least one of A, B and C" would include but not be limited to systems that have A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B, and C together, etc.). Where a convention analogous to "at least one of A, B or C, etc." is used, in general such a construction is intended in the sense one having skill in the art would understand the convention (e.g., "a system having at least one of A, B or C" would include but not be limited to systems that have A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B, and C together, etc.).
Some block diagrams and/or flow diagrams are shown in the figures. It will be understood that some blocks of the block diagrams and/or flowchart illustrations, or combinations thereof, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the instructions, which execute via the processor, create means for implementing the functions/acts specified in the block diagrams and/or flowchart block or blocks. The techniques of this disclosure may be implemented in hardware and/or software (including firmware, microcode, etc.). In addition, the techniques of this disclosure may take the form of a computer program product on a computer-readable storage medium having instructions stored thereon for use by or in connection with an instruction execution system.
Embodiments of the present disclosure provide a method of constructing a convolutional capsule layer, the convolutional capsule layer comprising at least an input layer and an output layer, each having a plurality of capsules. As shown in fig. 1, the method comprises: S1, taking the inner product of a Gabor filter and a convolution kernel to obtain a convolutional Gabor filter from input-layer capsules to output-layer capsules; S2, convolving the convolutional Gabor filter with the feature map input to the input layer to obtain prediction vectors; S3, constructing a self-attention route to obtain the assignment probabilities from input-layer capsules to output-layer capsules; S4, obtaining the input of the output-layer capsules according to the assignment probabilities; and S5, activating the input of the output-layer capsules through a Squash activation function to obtain the output of the output-layer capsules.
The construction method of the convolutional capsule layer in the present disclosure will be described in detail below with reference to the accompanying drawings. The input layer of the convolutional capsule layer of the disclosed embodiments includes a plurality of capsules, and the output layer also includes a plurality of capsules.
S1, taking the inner product of the Gabor filter and the convolution kernel to obtain a convolutional Gabor filter from input-layer capsules to output-layer capsules;
as shown in FIG. 2, firstly, a gabor filter with 4 directions can be initialized, the parameters are fixed, secondly, a convolution kernel with fixed size and learnable parameters is initialized, and the two are subjected to inner product to obtain a convolution gabor filter wijWherein i is inputThe ith capsule of the layer, j being the jth capsule of the output layer.
S2, convolving the convolutional Gabor filter with the feature map input to the input layer to obtain the prediction vector;
The convolutional Gabor filter is convolved with the i-th input feature map u_i to obtain the prediction vector û_{j|i}. The calculation formula is:

û_{j|i} = w_ij * u_i
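Continuing the sketch above (the filter and feature-map sizes are again assumptions), the prediction maps then reduce to an ordinary 2-D convolution:

```python
import torch
import torch.nn.functional as F

w_ij = torch.randn(64, 8, 3, 3)         # convolutional Gabor filters from S1 (assumed shape)
u_i = torch.randn(1, 8, 14, 14)         # input feature map u_i of capsule i (assumed size)
# û_{j|i} = w_ij * u_i: a plain 2-D convolution yields the prediction maps.
u_hat = F.conv2d(u_i, w_ij, padding=1)  # -> (1, 64, 14, 14)
```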
S3, constructing a self-attention route to obtain the assignment probabilities from input-layer capsules to output-layer capsules;
the self-attention route can be constructed by using a prediction vector [ wj,hj,ni,nj,dj]Transpose to obtain [ wj,hj,nj,ni,dj]So as to output the number n of capsules corresponding to the jth capsule of the output layerjAs heads of a multi-headed attention mechanism, along niCalculating the correlation between initial prediction vectors of the ith capsule of the input layer after affine transformation in the dimension where wjFor the width of the convolved feature map, hjFor the height of the convolved feature map, niFor the number of elements of the i-th capsule of the input layer, njNumber of elements of jth capsule of output layer, djIs the dimension of the capsule.
The attention value head_h of the prediction vector is obtained as

head_h = softmax(X·Y^T / √dim)·Z

where X is the query vector, Y is the key vector, and Z is the value vector. X, Y, and Z are obtained from the prediction vector by linear mappings with parameter matrices. The similarity between the query vector X and the key vector Y is first computed as an inner product; the scale factor 1/√dim then keeps the inner-product values from becoming too large, where dim is the dimension of the query and key vectors.
The attention value head_h is used as the weight coefficient from input-layer capsules to output-layer capsules.

The weight coefficients are concatenated to obtain the probability value from input-layer capsule i to output-layer capsule j: c_ij = Concat(head_1, ..., head_h, ..., head_H), H = n_j. That is, the assignment probabilities from input-layer capsules to output-layer capsules are obtained by concatenating the weight coefficients of the individual attention heads.
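A hedged sketch of this routing step, assuming the five-axis layout [w_j, h_j, n_i, n_j, d_j] described above; the projection matrices and all sizes are illustrative, not the patent's exact parameterization:

```python
import torch
import torch.nn.functional as F

def self_attention_route(u_hat, dim=8):
    """u_hat: prediction vectors laid out as (w_j, h_j, n_i, n_j, d_j)."""
    wj, hj, ni, nj, dj = u_hat.shape
    u = u_hat.permute(0, 1, 3, 2, 4)                 # -> (w_j, h_j, n_j, n_i, d_j)
    # X, Y, Z as linear maps of the prediction vectors (random here; learned in practice).
    Wq, Wk, Wv = torch.randn(dj, dim), torch.randn(dj, dim), torch.randn(dj, 1)
    X, Y, Z = u @ Wq, u @ Wk, u @ Wv
    # head_h = softmax(X·Y^T / sqrt(dim))·Z, one head per output capsule (H = n_j).
    head = F.softmax(X @ Y.transpose(-2, -1) / dim ** 0.5, dim=-1) @ Z
    # Concatenating the n_j heads recovers c_ij with shape (w_j, h_j, n_i, n_j).
    return head.squeeze(-1).permute(0, 1, 3, 2)

c_ij = self_attention_route(torch.randn(14, 14, 32, 10, 8))  # toy sizes
```

Note how each of the n_j output capsules acts as one attention head, and stacking the heads back along the capsule axis matches the Concat formulation above.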
S4, obtaining the input of the capsule of the output layer according to the distribution probability;
The input of an output-layer capsule is calculated as:

s_j = Σ_i c_ij · û_{j|i}

where s_j is the input of the j-th output-layer capsule, c_ij is the probability value from input-layer capsule i to output-layer capsule j, and û_{j|i} is the prediction vector.
And S5, activating the input of the capsules of the output layer through a Squash activation function to obtain the output of the capsules of the output layer.
The formula for the output of an output-layer capsule is:

v_j = (||s_j||^2 / (1 + ||s_j||^2)) · (s_j / ||s_j||)

where v_j is the output vector of the j-th output-layer capsule.
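Putting S4 and S5 together, a small sketch with the same assumed shapes as above (the eps guard against division by zero is an implementation detail, not from the text):

```python
import torch

def squash(s, dim=-1, eps=1e-8):
    """v = (||s||^2 / (1 + ||s||^2)) · s / ||s||, applied along `dim`."""
    sq = (s ** 2).sum(dim=dim, keepdim=True)
    return (sq / (1.0 + sq)) * s / torch.sqrt(sq + eps)

def output_capsules(u_hat, c_ij):
    """u_hat: (w_j, h_j, n_i, n_j, d_j); c_ij: (w_j, h_j, n_i, n_j)."""
    s_j = (c_ij.unsqueeze(-1) * u_hat).sum(dim=2)  # s_j = Σ_i c_ij · û_{j|i}
    return squash(s_j)                             # v_j, shape (w_j, h_j, n_j, d_j)

v_j = output_capsules(torch.randn(14, 14, 32, 10, 8),
                      torch.randn(14, 14, 32, 10).softmax(dim=-1))
```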
Building on the self-attention mechanism and on the study of Gabor convolution, the method provides a novel way to construct a Gabor convolutional capsule based on self-attention routing. On this basis, a convolutional capsule network with attention routing is constructed, which reduces parameters, enables local routing, and brings a dynamic-routing-style mechanism into convolutional neural networks so that deeper network structures can be built. The method addresses the vanishing gradients caused by deep stacking and the excessive coupling of capsules in traditional capsule network construction, and preserves the accuracy of the feature representation in multi-view image classification.
The present disclosure further discloses a method for classifying multi-view images based on the above convolutional capsule layer. As shown in fig. 3, the method comprises:
S31, inputting the image into a convolutional neural network to obtain a main feature image;
the convolutional neural network includes an input layer, a plurality of convolutional layers, a ReLU layer, and a max-firing layer. Converting the multi-view image X into (X)1,x2,……xm) And inputting an input layer, wherein the ReLU layer is used for enabling partial neuron output to be 0, sparseness is caused, and the max-posing layer is used for compressing the characteristic image to obtain a main characteristic image.
S32, inputting the main feature image into two of the convolutional capsule layers to obtain the classification result for the multi-view image.
Based on the same inventive concept, the embodiments of the present disclosure further provide an apparatus for constructing a convolutional capsule layer, which is introduced below with reference to fig. 4.
Fig. 4 schematically illustrates a block diagram of a construction apparatus 400 for a convolutional capsule layer, in accordance with an embodiment of the disclosure.

As shown in fig. 4, the construction apparatus 400 for the convolutional capsule layer includes an inner product module 410, a convolution module 420, a construction module 430, an obtaining module 440, and an activation module 450. The construction apparatus 400 may perform the various methods described above with reference to figs. 1 and 2.
The convolutional capsule layer includes at least an input layer having a plurality of capsules and an output layer, and the apparatus comprises:

the inner product module 410, which performs, for example, operation S1 described above with reference to fig. 1, for taking the inner product of the Gabor filter and the convolution kernel to obtain a convolutional Gabor filter from input-layer capsules to output-layer capsules;

the convolution module 420, which performs, for example, operation S2 described above with reference to fig. 1, for convolving the convolutional Gabor filter with the feature map input to the input layer to obtain the prediction vector;

the construction module 430, which performs, for example, operation S3 described above with reference to fig. 1, for constructing a self-attention route to obtain the assignment probabilities from input-layer capsules to output-layer capsules;

the obtaining module 440, which performs, for example, operation S4 described above with reference to fig. 1, for obtaining the input of the output-layer capsules according to the assignment probabilities;

the activation module 450, which performs, for example, operation S5 described above with reference to fig. 1, for activating the input of the output-layer capsules via the Squash activation function to obtain the output of the output-layer capsules.
The embodiment of the present disclosure further provides a device for classifying multi-view images, and the device 500 for classifying multi-view images according to the embodiment of the present disclosure is described below with reference to fig. 5.
Fig. 5 schematically shows a block diagram of a classification apparatus 500 of a multi-view image according to an embodiment of the present disclosure.
As shown in fig. 5, the apparatus 500 for classifying multi-view images includes a first input module 510 and a second input module 520. The apparatus 500 for classifying a multi-view image may perform various methods described above with reference to fig. 3.
The first input module 510 performs, for example, operation S31 described above with reference to fig. 3, for inputting the image into a convolutional neural network to obtain a main feature image;

the second input module 520 performs, for example, operation S32 described above with reference to fig. 3, for inputting the main feature image into two of the convolutional capsule layers to obtain the classification result for the multi-view image.
Fig. 6 schematically shows a block diagram of an electronic device adapted to implement the methods of the present disclosure, in accordance with an embodiment of the present disclosure. The electronic device shown in fig. 6 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present disclosure.
As shown in fig. 6, the electronic device 600 includes a processor 610 and a computer-readable storage medium 620. The electronic device 600 may perform a method according to an embodiment of the disclosure.
In particular, the processor 610 may comprise, for example, a general purpose microprocessor, an instruction set processor and/or related chip set and/or a special purpose microprocessor (e.g., an Application Specific Integrated Circuit (ASIC)), or the like. The processor 610 may also include onboard memory for caching purposes. The processor 610 may be a single processing unit or a plurality of processing units for performing the different actions of the method flows according to embodiments of the present disclosure.
Computer-readable storage medium 620 may be, for example, any medium that can contain, store, communicate, propagate, or transport the instructions. For example, a readable storage medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, device, or propagation medium. Specific examples of the readable storage medium include: magnetic storage devices, such as magnetic tape or Hard Disk Drives (HDDs); optical storage devices, such as compact disks (CD-ROMs); a memory, such as a Random Access Memory (RAM) or a flash memory; and/or wired/wireless communication links.
The computer-readable storage medium 620 may include a computer program 621, which computer program 621 may include code/computer-executable instructions that, when executed by the processor 610, cause the processor 610 to perform a method according to an embodiment of the disclosure, or any variation thereof.
The computer program 621 may be configured with, for example, computer program code comprising computer program modules. For example, the code in the computer program 621 may include one or more program modules, such as modules 621A, 621B, and so on. It should be noted that the division and number of the modules are not fixed; those skilled in the art may use suitable program modules or combinations thereof according to the actual situation, so that when these program modules are executed by the processor 610, the processor 610 can carry out the method according to the embodiments of the present disclosure or any variation thereof.
Those skilled in the art will appreciate that various combinations and/or combinations of features recited in the various embodiments and/or claims of the present disclosure can be made, even if such combinations or combinations are not expressly recited in the present disclosure.
The above-mentioned embodiments are intended to illustrate the objects, technical solutions and advantages of the present invention in further detail, and it should be understood that the above-mentioned embodiments are only exemplary embodiments of the present invention, and are not intended to limit the present invention, and any modifications, equivalents, improvements and the like made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (8)

1. A method of constructing a convolutional capsule layer, the convolutional capsule layer comprising at least an input layer having a plurality of capsules and an output layer, the method comprising:
S1, taking the inner product of the Gabor filter and the convolution kernel to obtain a convolutional Gabor filter from input-layer capsules to output-layer capsules;

S2, convolving the convolutional Gabor filter with the feature map input to the input layer to obtain a prediction vector;
S3, constructing a self-attention route to obtain the assignment probabilities from input-layer capsules to output-layer capsules, the self-attention route being constructed as follows: the prediction vector [w_j, h_j, n_i, n_j, d_j] is transposed to obtain [w_j, h_j, n_j, n_i, d_j], so that the number of elements n_j of the j-th output-layer capsule serves as the heads of a multi-head attention mechanism, and the correlation between the affine-transformed initial prediction vectors of the i-th input-layer capsule is computed along the n_i dimension, where w_j is the width of the convolved feature map, h_j is its height, n_i is the number of elements of the i-th input-layer capsule, n_j is the number of elements of the j-th output-layer capsule, and d_j is the capsule dimension; and the assignment probability being calculated as follows: the attention value head_h of the prediction vector is obtained as

head_h = softmax(X·Y^T / √dim)·Z

where X is the query vector, Y is the key vector, and Z is the value vector, each a linear mapping of the prediction vector û_{j|i}; the attention value is taken as the weight coefficient from input-layer capsules to output-layer capsules; and the weight coefficients are concatenated to obtain the probability value from input-layer capsule i to output-layer capsule j: c_ij = Concat(head_1, ..., head_h, ..., head_H), H = n_j;
S4, obtaining the input of the output-layer capsules according to the assignment probabilities;

and S5, activating the input of the output-layer capsules through a Squash activation function to obtain the output of the output-layer capsules.
2. The construction method according to claim 1, wherein the input of an output-layer capsule is calculated by:

s_j = Σ_i c_ij · û_{j|i}

wherein s_j is the input of the output-layer capsule, c_ij is the probability value from input-layer capsule i to output-layer capsule j, û_{j|i} is the prediction vector, i is the i-th capsule of the input layer, and j is the j-th capsule of the output layer.
3. The construction method according to claim 2, wherein the output of an output-layer capsule is calculated by:

v_j = (||s_j||^2 / (1 + ||s_j||^2)) · (s_j / ||s_j||)

wherein v_j is the output vector of the j-th capsule of the output layer.
4. A method for classifying multi-view images based on the method for constructing a convolutional capsule layer according to any one of claims 1 to 3, comprising:

inputting the image into a convolutional neural network to obtain a main feature image;

and inputting the main feature image into two of the convolutional capsule layers to obtain a classification result for the multi-view images.
5. The method according to claim 4, wherein the convolutional neural network comprises an input layer, a plurality of convolutional layers, a ReLU layer for setting part of the neuron outputs to 0 to induce sparsity, and a max-pooling layer for compressing the feature image to obtain the main feature image.
6. An apparatus for constructing a convolutional capsule layer, the convolutional capsule layer comprising at least an input layer having a plurality of capsules and an output layer, the apparatus comprising:

an inner product module for taking the inner product of the Gabor filter and the convolution kernel to obtain a convolutional Gabor filter from input-layer capsules to output-layer capsules;

a convolution module for convolving the convolutional Gabor filter with the feature map input to the input layer to obtain a prediction vector;
a construction module, configured to construct a self-attention route to obtain the assignment probabilities from input-layer capsules to output-layer capsules, the self-attention route being constructed as follows: the prediction vector [w_j, h_j, n_i, n_j, d_j] is transposed to obtain [w_j, h_j, n_j, n_i, d_j], so that the number of elements n_j of the j-th output-layer capsule serves as the heads of a multi-head attention mechanism, and the correlation between the affine-transformed initial prediction vectors of the i-th input-layer capsule is computed along the n_i dimension, where w_j is the width of the convolved feature map, h_j is its height, n_i is the number of elements of the i-th input-layer capsule, n_j is the number of elements of the j-th output-layer capsule, and d_j is the capsule dimension; and the assignment probability being calculated as follows: the attention value head_h of the prediction vector is obtained as

head_h = softmax(X·Y^T / √dim)·Z

where X is the query vector, Y is the key vector, and Z is the value vector, each a linear mapping of the prediction vector û_{j|i}; the attention value is taken as the weight coefficient from input-layer capsules to output-layer capsules; and the weight coefficients are concatenated to obtain the probability value from input-layer capsule i to output-layer capsule j: c_ij = Concat(head_1, ..., head_h, ..., head_H), H = n_j;
an obtaining module for obtaining the input of the output-layer capsules according to the assignment probabilities;

and an activation module for activating the input of the output-layer capsules through a Squash activation function to obtain the output of the output-layer capsules.
7. A multi-view image classification apparatus based on the method for constructing a convolutional capsule layer according to any one of claims 1 to 3, comprising:

a first input module for inputting the image into a convolutional neural network to obtain a main feature image;

and a second input module for inputting the main feature image into two of the convolutional capsule layers to obtain a classification result for the multi-view image.
8. An electronic device, comprising:
a processor; and
a memory having computer-readable instructions stored thereon that, when executed by the processor, cause the processor to perform the method of any of claims 1-5.
CN202010309310.7A 2020-04-17 2020-04-17 Construction method of convolutional capsule layer and classification method and device of multi-view images Active CN111507410B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010309310.7A CN111507410B (en) 2020-04-17 2020-04-17 Construction method of convolutional capsule layer and classification method and device of multi-view images

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010309310.7A CN111507410B (en) 2020-04-17 2020-04-17 Construction method of convolutional capsule layer and classification method and device of multi-view images

Publications (2)

Publication Number Publication Date
CN111507410A CN111507410A (en) 2020-08-07
CN111507410B true CN111507410B (en) 2021-02-12

Family

ID=71869444

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010309310.7A Active CN111507410B (en) 2020-04-17 2020-04-17 Construction method of convolutional capsule layer and classification method and device of multi-view images

Country Status (1)

Country Link
CN (1) CN111507410B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113205137B (en) * 2021-04-30 2023-06-20 中国人民大学 Image recognition method and system based on capsule parameter optimization

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103456014A (en) * 2013-09-04 2013-12-18 西北工业大学 Scene matching suitability analyzing method based on multiple-feature integrating visual attention model
CN106097335A (en) * 2016-06-08 2016-11-09 安翰光电技术(武汉)有限公司 Digestive tract focus image identification system and recognition methods
CN107909059A (en) * 2017-11-30 2018-04-13 中南大学 It is a kind of towards cooperateing with complicated City scenarios the traffic mark board of bionical vision to detect and recognition methods
CN109063724A (en) * 2018-06-12 2018-12-21 中国科学院深圳先进技术研究院 A kind of enhanced production confrontation network and target sample recognition methods

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100046816A1 (en) * 2008-08-19 2010-02-25 Igual-Munoz Laura Method for automatic classification of in vivo images

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103456014A (en) * 2013-09-04 2013-12-18 西北工业大学 Scene matching suitability analyzing method based on multiple-feature integrating visual attention model
CN106097335A (en) * 2016-06-08 2016-11-09 安翰光电技术(武汉)有限公司 Digestive tract focus image identification system and recognition methods
CN107909059A (en) * 2017-11-30 2018-04-13 中南大学 It is a kind of towards cooperateing with complicated City scenarios the traffic mark board of bionical vision to detect and recognition methods
CN109063724A (en) * 2018-06-12 2018-12-21 中国科学院深圳先进技术研究院 A kind of enhanced production confrontation network and target sample recognition methods

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
BDARS_CapsNet: Bi-Directional Attention Routing Sausage Capsule Network; Xin Ning et al.; IEEE Access; 2020-03-23; pp. 59059-59068 *

Also Published As

Publication number Publication date
CN111507410A (en) 2020-08-07

Similar Documents

Publication Publication Date Title
CN110175671B (en) Neural network construction method, image processing method and device
US20220108546A1 (en) Object detection method and apparatus, and computer storage medium
CN110188795B (en) Image classification method, data processing method and device
WO2019228317A1 (en) Face recognition method and device, and computer readable medium
US10037457B2 (en) Methods and systems for verifying face images based on canonical images
CN111914997B (en) Method for training neural network, image processing method and device
US20230153615A1 (en) Neural network distillation method and apparatus
CN111192270A (en) Point cloud semantic segmentation method based on point global context reasoning
WO2022052601A1 (en) Neural network model training method, and image processing method and device
CN113326930B (en) Data processing method, neural network training method, related device and equipment
US20220157046A1 (en) Image Classification Method And Apparatus
CN113065645B (en) Twin attention network, image processing method and device
CN110222718B (en) Image processing method and device
CN111695673B (en) Method for training neural network predictor, image processing method and device
WO2021018245A1 (en) Image classification method and apparatus
CN112215332A (en) Searching method of neural network structure, image processing method and device
EP3965071A2 (en) Method and apparatus for pose identification
CN111797970B (en) Method and device for training neural network
WO2022156475A1 (en) Neural network model training method and apparatus, and data processing method and apparatus
WO2023280113A1 (en) Data processing method, training method for neural network model, and apparatus
US20220222934A1 (en) Neural network construction method and apparatus, and image processing method and apparatus
CN111507410B (en) Construction method of convolutional capsule layer and classification method and device of multi-view images
CN116888605A (en) Operation method, training method and device of neural network model
CN113065575A (en) Image processing method and related device
CN110826726B (en) Target processing method, target processing device, target processing apparatus, and medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
TR01 Transfer of patent right
TR01 Transfer of patent right

Effective date of registration: 20230117

Address after: Room 302, Floor 3, Building 20, No. 2, Jingyuan North Street, Daxing Economic and Technological Development Zone, Beijing, 100176 (Yizhuang Cluster, High-end Industrial Zone, Beijing Pilot Free Trade Zone)

Patentee after: Zhongke Shangyi Health Technology (Beijing) Co.,Ltd.

Address before: 100083 No. 35, Qinghua East Road, Beijing, Haidian District

Patentee before: INSTITUTE OF SEMICONDUCTORS, CHINESE ACADEMY OF SCIENCES