CN114581698A - Target classification method based on space cross attention mechanism feature fusion - Google Patents

Target classification method based on space cross attention mechanism feature fusion

Info

Publication number
CN114581698A
Authority
CN
China
Prior art keywords
feature
output
dimensional
features
fusion
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210084352.4A
Other languages
Chinese (zh)
Inventor
李岳阳
顾中轩
罗海驰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangnan University
Original Assignee
Jiangnan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangnan University filed Critical Jiangnan University
Priority to CN202210084352.4A priority Critical patent/CN114581698A/en
Publication of CN114581698A publication Critical patent/CN114581698A/en
Pending legal-status Critical Current

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/048Activation functions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)
  • Apparatus For Radiation Diagnosis (AREA)

Abstract

The invention discloses a target classification method based on spatial cross-attention feature fusion, belonging to the technical field of computer-aided detection. The method captures the important features of similar feature maps through a parameter-free attention mechanism, and then recalibrates one feature map with the features of another through a channel-attention excitation method, thereby achieving the effect of fusion.

Description

Target classification method based on space cross attention mechanism feature fusion
Technical Field
The invention relates to a target classification method based on spatial cross-attention feature fusion, and belongs to the technical field of computer-aided detection.
Background
With the development of image processing technology, image-based object recognition and classification have been widely applied in many fields, including computer-aided detection. In the medical field, for example, when screening for lung nodules a clinician judges whether a nodule indicates early lung cancer by comprehensively weighing factors such as the presence of the nodule, its size and density, the patient's long-term smoking history, and any family history of lung cancer.
Low-dose helical computed tomography (LDCT) is currently the most widely used lung-nodule screening modality and can detect nodules larger than 3 mm. However, a single LDCT scan produces several hundred axial images, and during nodule screening a specialist must examine every one of them, which is time-consuming and labor-intensive. To reduce the physicians' workload, computer-aided detection (CAD) systems for pulmonary tumor nodules are now commonly used to assist radiologists.
In use, a doctor inputs the CT image to be examined into the system, which quickly locates suspected nodule regions and gives the probability that each candidate nodule is positive, i.e., it distinguishes true lung nodules in the CT image from the shadows of organs, blood vessels, and the like. The system consists of two parts: nodule candidate detection, and lung-nodule identification and classification. The ability of the identification-and-classification stage to reject false-positive candidates determines overall system performance, so improving its accuracy is the main direction of development for aided lung-nodule detection.
In recent years, with the spread of deep learning, computer-aided medical diagnosis systems based on convolutional neural networks (CNN) have become a research hotspot, and work on lung-nodule identification has produced some results. Early CNN models identify false-positive nodules from two-dimensional CT images. Because lung nodules differ in position and appearance (their morphology can be subdivided into solitary, vascular-adhesion, pleural-adhesion, ground-glass, and cavitary types), false-positive identification is difficult. Three-dimensional images carry richer semantic information than two-dimensional ones, so identifying false-positive nodules from three-dimensional images can improve model robustness.
However, existing approaches to identifying false-positive nodules from three-dimensional images usually rely on multi-scale feature fusion to improve detection accuracy: the multi-scale features extracted at each stage are fused, typically by pyramid-style upward fusion after downsampled feature extraction, and target classification is performed on the fused features. This style of fusion causes feature redundancy, which in turn lowers classification accuracy.
Disclosure of Invention
In order to further improve the precision of target classification in the computer aided detection technology, the invention provides a target classification method based on spatial cross attention mechanism feature fusion, which comprises the following steps:
step 1: acquiring a three-dimensional image to be classified, and setting the length, width and height of a target area in the three-dimensional image to be classified as L, W and H respectively;
step 2: performing feature extraction on a target area in an image to be classified by adopting a 3DSeNet backbone network to obtain four primary feature vectors; the 3DSeNet backbone network is composed of a plurality of SeBlock blocks, and the SeBlock blocks are obtained by adding an SE three-dimensional channel attention module in a ResBlock block;
step 3: respectively carrying out feature refinement on four primary feature vectors output by a 3DSeNet backbone network to obtain refined feature vectors;
step 4: using the feature of one feature map to calibrate another feature map by using the feature of the feature map to re-calibrate the four thinned feature vectors by a channel attention mechanism excitation method, and finally obtaining a feature vector for classification;
step 5: and classifying the three-dimensional image to be classified according to the finally obtained characteristic vector for classification.
Optionally, the SE three-dimensional channel attention module first performs a Squeeze operation on the input feature x ∈ R^(C×L×W×H) to obtain a channel-based global feature map z ∈ R^(C×1×1×1), performs an Excitation operation on the global feature map to obtain a feature map s_c, and then uses a Scale operation to multiply s_c with the originally input feature x, completing the feature recalibration that corrects the features.
Optionally, the SE three-dimensional channel attention module in Step 2 obtains the four preliminary feature vectors from the input feature x ∈ R^(C×L×W×H) as follows:

performing the Squeeze operation on the input feature x to obtain the channel-based global feature of the feature map:

z_c = F_sq(x_c) = (1/(L·W·H)) Σ_{i=1..L} Σ_{j=1..W} Σ_{k=1..H} x_c(i, j, k)    (2)

that is, three-dimensional global adaptive average pooling encodes the entire spatial feature on a channel into one global feature, outputting the feature map z ∈ R^(C×1×1×1), where C denotes the number of channels;

obtaining the feature map s_c through the Excitation operation:

s_c = σ(w_2 δ(w_1 z))    (3)

where z is the output of the Squeeze operation, σ is the sigmoid activation function, δ is the ReLU activation function, w_1 ∈ R^((C/R)×C) and w_2 ∈ R^(C×(C/R)) are the excitation weights, and R denotes a reduction factor;

multiplying the s_c obtained from the Excitation operation with x ∈ R^(C×L×W×H) through the Scale operation:

x̃_c = F_scale(x_c, s_c) = s_c · x_c    (4)

where x̃ = [x̃_1, x̃_2, …, x̃_C]; the output of the SE three-dimensional channel attention module has the same size as the input feature map;

the four preliminary feature vectors are denoted x_1, x_2, x_3, and x_4.
Optionally, Step 3 includes:

passing the input feature x_2 through a multi-scale feature refinement module to obtain an output x'_2, fusing the input feature x_1 with the upsampled x'_2 and feeding the result into the multi-scale feature refinement module f_s to obtain an output x'_1, and finally fusing x_1 with x'_1 to obtain an output s_1:

x'_1 = f_s(λ_1 x_1 + λ_2 Up(x'_2)),  s_1 = λ'_1 x_1 + λ'_2 x'_1    (5)

where f_s is the feature refinement module, λ_1, λ_2 are one set of linear parameters, λ'_1, λ'_2 are another set, and Up(x'_2) denotes upsampling the refined feature x'_2;

passing the input feature x_3 through the multi-scale feature refinement module to obtain an output x'_3, fusing the input feature x_2 with the upsampled x'_3 and feeding the result into f_s to obtain an output x'_2, and finally fusing x_2, x'_2, and the downsampled output feature s_1 to obtain an output s_2:

x'_2 = f_s(λ_1 x_2 + λ_2 Up(x'_3)),  s_2 = λ'_1 x_2 + λ'_2 x'_2 + λ'_3 Down(s_1)    (6)

where Down(s_1) denotes downsampling the stage-1 fused feature s_1, and λ'_1, λ'_2, λ'_3 are a set of linear parameters;

passing the input feature x_4 through the multi-scale feature refinement module to obtain an output x'_4, fusing the input feature x_3 with the upsampled x'_4 and feeding the result into f_s to obtain an output x'_3, and finally fusing x_3, x'_3, and the downsampled output feature s_2 to obtain an output s_3:

x'_3 = f_s(λ_1 x_3 + λ_2 Up(x'_4)),  s_3 = λ'_1 x_3 + λ'_2 x'_3 + λ'_3 Down(s_2)    (7)

passing the input feature x_4 through the multi-scale feature refinement module to obtain the output x'_4, and fusing x_4, x'_4, and the downsampled output feature s_3 of stage 3 to obtain the output feature s_4:

x'_4 = f_s(x_4),  s_4 = λ'_1 x_4 + λ'_2 x'_4 + λ'_3 Down(s_3)    (8)
Optionally, Step 4 includes:

fusing s_4 and s_3 through the cross-attention mechanism, calibrating the features of s_4 onto s_3 and outputting the fusion feature F_3;

fusing F_3 and s_2 through the cross-attention fusion mechanism, calibrating the features of F_3 onto s_2 and outputting the fusion feature F_2;

fusing F_2 and s_1 through the cross-attention mechanism, calibrating the features of F_2 onto s_1 and outputting the fusion feature F_1.

Optionally, fusing s_4 and s_3 through the cross-attention mechanism, calibrating the features of s_4 onto s_3 and outputting the fusion feature F_3, comprises:

performing similarity recalibration in three-dimensional space on the two channel-identical features s_4 and s_3:

s'_3 = SimAM(s_3),  s'_4 = SimAM(s_4)    (14)

compressing s'_4 into a channel-based one-dimensional vector f_4 through the channel-attention Squeeze-Excitation operation, and multiplying it with s'_3 to obtain the output F_3, which recalibrates the features of s_3:

f_4 = σ(w_2 δ(w_1 F_sq(s'_4))),  F_3 = s'_3 · f_4    (15)

where F_sq is the Squeeze operation of equation (2), SimAM(·) weights each pixel by its linear separability within its channel, and σ denotes the sigmoid activation function.
Step 5 includes:

pooling the feature F_1 over three-dimensional space into a one-dimensional classification vector of length 4379, from which the classifier outputs the confidence sequence of the two classes.
The application also provides a lung-nodule positive-probability prediction method that uses the above method to obtain a lung-nodule positive-probability prediction from suspected-lung-nodule data output by a CAD system.
The invention has the beneficial effects that:
the method not only effectively fuses the characteristics of the two characteristic graphs, but also is different from the method for fusing similar characteristic graphs by means of upsampling of a characteristic pyramid.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed to be used in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
FIG. 1 is a three-dimensional representation of a segmentation of the pulmonary parenchyma provided in one embodiment of the present invention.
Fig. 2 is a schematic model structure diagram of an object classification method based on spatial cross-attention mechanism feature fusion provided in an embodiment of the present invention.
FIG. 3 is a schematic diagram of an SE three-dimensional channel attention module provided in one embodiment of the present invention.
Fig. 4 is a schematic diagram of a SeBlock provided in an embodiment of the present invention.
FIG. 5 is a schematic diagram of a feature refinement module provided in one embodiment of the present invention.
FIG. 6 is a schematic diagram of a feature fusion method provided in an embodiment of the present invention.
FIG. 7 is a schematic diagram of a cross attention feature fusion Module (CSFA) provided in an embodiment of the present invention.
FIG. 8 is a graph illustrating an example effect of a positive lung nodule prediction provided in one embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention will be described in detail with reference to the accompanying drawings.
The first embodiment is as follows:
the embodiment provides a target classification method based on spatial cross attention mechanism feature fusion, which comprises the following steps:
step 1: acquiring a three-dimensional image to be classified, and setting the length, width and height of a target area in the three-dimensional image to be classified as L, W and H respectively;
step 2: performing feature extraction on a target area in an image to be classified by adopting a 3DSeNet backbone network to obtain four primary feature vectors; the 3DSeNet backbone network is composed of a plurality of SeBlock blocks, and the SeBlock blocks are obtained by adding an SE three-dimensional channel attention module in a ResBlock block;
step 3: respectively carrying out feature refinement on four primary feature vectors output by a 3DSeNet backbone network to obtain refined feature vectors;
step 4: re-calibrating the other characteristic diagram by using the characteristics of one characteristic diagram by using the four thinned characteristic vectors in a channel attention mechanism excitation method, and finally obtaining characteristic vectors for classification;
step 5: and classifying the three-dimensional image to be classified according to the finally obtained characteristic vector for classification.
Example two:
This embodiment provides a target classification method based on spatial cross-attention feature fusion. A CAD (computer-aided detection) system determines suspected nodule regions from a CT image, and the method then processes each region to give the probability that it is a positive lung nodule (i.e., each suspected nodule region is classified as a true or a false lung nodule). The method comprises:
step one, obtaining suspected pulmonary nodule data:
in the actual detection process, the CAD system may directly give the data of the suspected lung nodule, and in this embodiment, the LUNA16 data set is taken as an example for description, so that the data in the data set is preprocessed to obtain the data of the suspected lung nodule, specifically, the method includes:
step 1-1: pulmonary parenchymal segmentation
In the original CT image, X-ray attenuation values are recorded in Hounsfield units (HU). The CT value of a substance reflects its density: the higher the CT value, the denser the substance. The HU value of the lung is around -500, so for lung parenchyma segmentation the threshold interval can be set to [-1000, 400], i.e., HU values greater than 400 are set to 400 and values less than -1000 are set to -1000; the HU values are then normalized to the range [0, 255]. Fig. 1 shows a three-dimensional image of the segmented lung parenchyma.
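For illustration, the thresholding and normalization just described can be sketched as follows (a minimal sketch; the function name and the use of NumPy are ours, not the patent's):

```python
import numpy as np

def normalize_hu(volume: np.ndarray, lo: float = -1000.0, hi: float = 400.0) -> np.ndarray:
    """Clip HU values to [lo, hi] and rescale them to the [0, 255] range."""
    v = np.clip(volume.astype(np.float32), lo, hi)  # HU > 400 -> 400, HU < -1000 -> -1000
    return (v - lo) / (hi - lo) * 255.0             # min-max normalization to [0, 255]
```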
Step 1-2: extracting data of suspected pulmonary nodules
For each candidate nodule in the LUNA16 data set, i.e., each suspected lung-nodule region, its coordinate file is read to obtain the candidate's three-dimensional world coordinate v_world, and the voxel coordinate v_voxel is calculated by the following formula:

v_voxel = (v_world - v_origin) / d_spacing    (1)

where v_origin is the origin coordinate of the lung and d_spacing is the pixel spacing.
A cube with equal length, width, and height is then cropped around the voxel coordinate of each candidate nodule as the center point.

According to the size distribution of suspected lung-nodule blocks in the LUNA16 data set, cubes with side lengths of 24, 28, 32, 36, and 40 mm are cropped and saved as .npy files for convenient use in subsequent training and testing. If the cropping range exceeds the image area, it is padded with zeros.
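Equation (1) and the zero-padded cube cropping can be sketched as below. The sketch assumes the volume has already been resampled to 1 mm isotropic spacing (so millimetre side lengths equal voxel counts) and uses (z, y, x) axis order; the function names are hypothetical:

```python
import numpy as np

def world_to_voxel(v_world, v_origin, d_spacing) -> np.ndarray:
    """Equation (1): v_voxel = (v_world - v_origin) / d_spacing."""
    return np.rint((np.asarray(v_world) - np.asarray(v_origin))
                   / np.asarray(d_spacing)).astype(int)

def crop_cube(volume: np.ndarray, center: np.ndarray, side: int) -> np.ndarray:
    """Crop a side^3 cube around `center`; regions outside the image stay zero."""
    out = np.zeros((side,) * 3, dtype=volume.dtype)
    lo = center - side // 2
    # Source slices clipped to the image, destination slices shifted accordingly.
    src = tuple(slice(max(l, 0), min(l + side, s)) for l, s in zip(lo, volume.shape))
    dst = tuple(slice(max(-l, 0), max(-l, 0) + (sl.stop - sl.start))
                for l, sl in zip(lo, src))
    out[dst] = volume[src]
    return out
```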
Step 1-3: data enhancement
Analyzing the selected LUNA16 data set and counting all positive and negative samples: there are 1557 positive lung nodules (positive samples, label 1), 0.21% of the total, and 753418 false-positive nodules (negative samples, label 0), 99.79% of the total. Since negatives vastly outnumber positives, a trained binary classifier tends to predict the negative class, which prevents effective evaluation of its performance. To alleviate the sample-number imbalance, data enhancement is used to expand the number of positive samples. The selected data-enhancement methods are as follows:
(1) rotating: the cube image is rotated 90 °, 180 °, 270 ° along the cross-section.
(2) Mirroring: the cube image is symmetrically inverted along the coronal and sagittal planes, respectively.
By these methods, the positive samples are augmented so that the final number of positive samples is 20 times the original number.
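The two enhancement operations can be sketched as follows, assuming (z, y, x) axis order so that axes (1, 2) span the cross-section; the function is illustrative only:

```python
import numpy as np

def augment_positive(cube: np.ndarray) -> list:
    """Return the original cube plus its axial rotations and mirror flips."""
    out = [cube]
    out += [np.rot90(cube, k, axes=(1, 2)) for k in (1, 2, 3)]  # 90/180/270 degrees in the cross-section
    out += [cube[:, ::-1, :], cube[:, :, ::-1]]                 # coronal and sagittal mirroring
    return out
```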
Note that the LUNA16 data set is a public data set comprising 888 low-dose pulmonary CT images (mhd format). Each original image is three-dimensional and consists of a series of axial two-dimensional slices of the thorax; the number of slices varies with the scanning machine, the scan layer thickness, and the patient.
Step two: modeling
Analysis of the preprocessed data set shows that constructing a suitable model requires solving the following two problems:
(1) Depending on their location and manifestation, lung nodules can be subdivided into solitary, vascular-adhesion, pleural-adhesion, ground-glass, and cavitary types. Their small size and irregular shape increase the difficulty of identifying negative samples.
(2) The numbers of positive and negative samples remain unbalanced after data enhancement, so the classifier must be adjusted during model construction and training so that the model perceives the two classes approximately equally.
For the above two problems, as shown in fig. 2, the model design is divided into three stages to obtain better classification, and the three stages are respectively: a backbone network feature extraction stage, a multi-scale feature fusion stage and a feature classification stage.
In the backbone feature extraction stage, a 3DSeNet model is adopted; its three-dimensional channel attention mechanism lets it extract features better. 3DSeNet is the basic network model; for details refer to Yang J, Jiang X, Ma X. 3DSeNet: 3D Spatial Attention Region Ensemble Network for Real-time 3D Hand Position Estimation [C] // 2020 10th International Conference on Information Science and Technology (ICIST), 2020.
In the multi-scale feature fusion stage, feature refinement is carried out by acquiring feature maps of different stages of a backbone network, and similar feature maps are fused through a spatial cross attention fusion mechanism, so that the effect of multi-scale feature fusion is achieved.
In the feature classification stage, learnable parameters are introduced into the classifier and adapt to the proportion of positive and negative samples during training; this linear change improves the perception of the minority class and yields better classification performance.
Step 2-1: backbone network feature extraction stage
A backbone network is built for feature extraction; it adopts 3DSeNet and consists of several SeBlock blocks. Fig. 4 shows a schematic diagram of a SeBlock: an SE three-dimensional channel attention module is added to a ResBlock (the main module of the backbone 3DResNet model), which strengthens the feature extraction capability.
The principle of the SE three-dimensional channel attention module is shown in FIG. 3. The method comprises the following specific steps:
(1) Squeeze operation

For the input feature x ∈ R^(C×L×W×H) (where C denotes the number of channels and L, W, and H denote the length, width, and height of the cube), a Squeeze operation is first performed to obtain the channel-based global feature of the feature map:

z_c = F_sq(x_c) = (1/(L·W·H)) Σ_{i=1..L} Σ_{j=1..W} Σ_{k=1..H} x_c(i, j, k)    (2)

That is, three-dimensional global adaptive average pooling encodes the entire spatial feature on a channel into one global feature, outputting the feature map z ∈ R^(C×1×1×1).
(2) Excitation operation

The Excitation operation obtains the feature map s_c through the following formula:

s_c = σ(w_2 δ(w_1 z))    (3)

where z is the output of the Squeeze operation, σ is the sigmoid activation function, δ is the ReLU activation function, w_1 ∈ R^((C/R)×C) and w_2 ∈ R^(C×(C/R)) are the excitation weights, and R denotes a reduction factor, here taken as 16.
(3) Scale operation

Using the learned channel weights, the Scale operation multiplies s_c with the original feature map, completing the feature recalibration: the corrected features retain the valuable features and reject the worthless ones. The formula of the Scale operation is:

x̃_c = F_scale(x_c, s_c) = s_c · x_c    (4)

where x̃ = [x̃_1, x̃_2, …, x̃_C]. The output of the SE three-dimensional channel attention module has the same size as the input feature map.
The SE three-dimensional channel attention module can be inserted as a plug-and-play module into the ResBlock feature extraction module of 3DResNet to construct the SeBlock module, achieving three-dimensional feature recalibration.
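A minimal PyTorch sketch of the SE three-dimensional channel attention module of equations (2)-(4) follows; the layer names are ours, δ is taken as ReLU and σ as sigmoid as described above, and R = 16:

```python
import torch
import torch.nn as nn

class SE3D(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.squeeze = nn.AdaptiveAvgPool3d(1)           # F_sq: (C, L, W, H) -> (C, 1, 1, 1), eq. (2)
        self.excite = nn.Sequential(                     # s_c = sigma(w_2 delta(w_1 z)), eq. (3)
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (N, C, L, W, H)
        n, c = x.shape[:2]
        z = self.squeeze(x).view(n, c)                    # channel-based global feature
        s = self.excite(z).view(n, c, 1, 1, 1)            # channel weights
        return x * s                                      # Scale recalibration, eq. (4)
```

Inserted after the convolutions of a ResBlock, this module yields the SeBlock of fig. 4.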
Step 2-2: multi-scale feature fusion phase
The four outputs of the backbone in the feature extraction stage (stage1, stage2, stage3, and stage4) pass through the multi-scale feature fusion module shown in fig. 2 to produce the feature vector used for classification. Lung nodules vary in size, are irregular in shape, and are randomly distributed, which hinders effective feature extraction by the backbone network alone. To improve the feature extraction effect of the model, the output of each backbone stage is first refined at multiple scales and then fused with the features of the other stages. The detailed steps are as follows:
(1) Feature refinement module

The invention adopts a multi-scale feature refinement module, shown in fig. 5, which can fuse the next level's upsampled information into its input (see the dashed box) and refines the features with four branches: 1×1×1, 3×3×3, 5×5×5, and 7×7×7 three-dimensional convolutions, each followed by a 3×3×3 three-dimensional dilated convolution with dilation rates 1, 3, 5, and 7, respectively.
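One way to realize the four-branch refinement module is sketched below; the text does not specify how the branch outputs are merged, so element-wise summation is assumed:

```python
import torch.nn as nn

class FeatureRefine(nn.Module):
    """Multi-scale feature refinement f_s: a k x k x k convolution (k = 1, 3, 5, 7)
    followed by a 3 x 3 x 3 dilated convolution with dilation rate 1, 3, 5, or 7;
    padding keeps the spatial size unchanged."""
    def __init__(self, channels: int):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Sequential(
                nn.Conv3d(channels, channels, k, padding=k // 2),
                nn.Conv3d(channels, channels, 3, padding=d, dilation=d),
            )
            for k, d in zip((1, 3, 5, 7), (1, 3, 5, 7))
        )

    def forward(self, x):
        return sum(branch(x) for branch in self.branches)  # merged by summation (assumption)
```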
(2) Feature fusion framework
The feature fusion method from stage1 to stage4 can be described as follows:

stage 1: the input feature x_2 of stage2 passes through a multi-scale feature refinement module to obtain an output x'_2; the input feature x_1 of stage1 is fused with the upsampled x'_2 and fed into the multi-scale feature refinement module f_s to obtain an output x'_1; finally x_1 and x'_1 are fused to obtain an output s_1. The feature fusion formula of stage1 is:

x'_1 = f_s(λ_1 x_1 + λ_2 Up(x'_2)),  s_1 = λ'_1 x_1 + λ'_2 x'_1    (5)

where f_s is the feature refinement module, λ_1, λ_2 are one set of linear parameters, λ'_1, λ'_2 are another set, and Up(x'_2) denotes upsampling the refined feature x'_2;

stage 2: the input feature x_3 of stage3 passes through the multi-scale feature refinement module to obtain an output x'_3; the input feature x_2 of stage2 is fused with the upsampled x'_3 and fed into f_s to obtain an output x'_2; finally x_2, x'_2, and the output feature s_1 of stage1 are fused to obtain an output s_2. The feature fusion formula of stage2 is:

x'_2 = f_s(λ_1 x_2 + λ_2 Up(x'_3)),  s_2 = λ'_1 x_2 + λ'_2 x'_2 + λ'_3 Down(s_1)    (6)

where f_s is the feature refinement module, Down(s_1) denotes downsampling the stage1 fused feature s_1, and λ'_3 is a linear parameter;

stage 3: the input feature x_4 of stage4 passes through the multi-scale feature refinement module to obtain an output x'_4; the input feature x_3 of stage3 is fused with the upsampled x'_4 and fed into f_s to obtain an output x'_3; finally x_3, x'_3, and the downsampled output feature s_2 of stage2 are fused to obtain an output s_3. The feature fusion formula of stage3 is:

x'_3 = f_s(λ_1 x_3 + λ_2 Up(x'_4)),  s_3 = λ'_1 x_3 + λ'_2 x'_3 + λ'_3 Down(s_2)    (7)

stage 4: the input feature x_4 passes through the multi-scale feature refinement module to obtain the output x'_4; x_4, x'_4, and the downsampled output feature s_3 of stage3 are fused to obtain the output feature s_4. The feature fusion formula of stage4 is:

x'_4 = f_s(x_4),  s_4 = λ'_1 x_4 + λ'_2 x'_4 + λ'_3 Down(s_3)    (8)
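Equations (5)-(8) can be sketched as follows, reusing the FeatureRefine module above. The sketch assumes the stage features have been projected to a common channel width, treats the λ parameters as learnable scalars, uses trilinear interpolation for Up and adaptive average pooling for Down, and resolves the text's reuse of the x'_k symbols by computing each x'_k once in cascaded form; none of these choices is fixed by the text:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureFusion(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.fs = nn.ModuleList(FeatureRefine(channels) for _ in range(4))
        self.lam = nn.Parameter(torch.ones(4, 2))   # lambda_1, lambda_2 per stage
        self.lamp = nn.Parameter(torch.ones(4, 3))  # lambda'_1 .. lambda'_3 per stage

    @staticmethod
    def up(x, ref):    # Up(.): upsample to the reference spatial size
        return F.interpolate(x, size=ref.shape[2:], mode="trilinear", align_corners=False)

    @staticmethod
    def down(x, ref):  # Down(.): downsample to the reference spatial size
        return F.adaptive_avg_pool3d(x, ref.shape[2:])

    def forward(self, x1, x2, x3, x4):
        x4p = self.fs[3](x4)                                                       # x'_4, eq. (8)
        x3p = self.fs[2](self.lam[2, 0] * x3 + self.lam[2, 1] * self.up(x4p, x3))  # x'_3, eq. (7)
        x2p = self.fs[1](self.lam[1, 0] * x2 + self.lam[1, 1] * self.up(x3p, x2))  # x'_2, eq. (6)
        x1p = self.fs[0](self.lam[0, 0] * x1 + self.lam[0, 1] * self.up(x2p, x1))  # x'_1, eq. (5)
        s1 = self.lamp[0, 0] * x1 + self.lamp[0, 1] * x1p
        s2 = self.lamp[1, 0] * x2 + self.lamp[1, 1] * x2p + self.lamp[1, 2] * self.down(s1, x2)
        s3 = self.lamp[2, 0] * x3 + self.lamp[2, 1] * x3p + self.lamp[2, 2] * self.down(s2, x3)
        s4 = self.lamp[3, 0] * x4 + self.lamp[3, 1] * x4p + self.lamp[3, 2] * self.down(s3, x4)
        return s1, s2, s3, s4
```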
(3) spatial cross-attention fusion mechanism
Analysis of the fusion framework's output vectors and of the fusion process shows that the output vectors of each layer have the same number of channels and have already fused similar features. Further fusing them by feature-pyramid upsampling and concatenation would cause feature redundancy and degrade precision. The invention therefore proposes a new feature fusion method: fusing similar feature maps with a spatial cross-attention mechanism.
(a) Attention mechanism SimAM

The invention uses the SimAM attention mechanism to compute the similarity of different vectors. SimAM evaluates the importance of each pixel of a three-dimensional feature map through an energy function and derives the weights without any additional parameters; it is a parameter-free three-dimensional attention module.
Suppose the input feature map is x ∈ R^(C×L×W×H). An energy function e_t is defined for each target pixel t:

e_t(w_t, b_t, y, x_i) = (1/(M-1)) Σ_{i=1..M-1} (y_o - (w_t x_i + b_t))^2 + (y_t - (w_t t + b_t))^2    (9)

where M = L×W×H is the number of pixels in a channel, t is the target pixel, x_i denotes the other pixels of the same channel, and y_t and y_o are binary labels (y_t takes 1 and y_o takes -1). Minimizing equation (9) is equivalent to training the linear separability between pixel t and the other pixels of the same channel. To improve the generalization capability of the model, a regularization coefficient λ can be added to the function, giving:

e_t = (1/(M-1)) Σ_{i=1..M-1} (-1 - (w_t x_i + b_t))^2 + (1 - (w_t t + b_t))^2 + λ w_t^2    (10)

Minimizing the energy function e_t of equation (10) yields w_t and b_t:

w_t = -2(t - μ_t) / ((t - μ_t)^2 + 2σ_t^2 + 2λ),  b_t = -(t + μ_t) w_t / 2    (11)

where μ_t and σ_t^2 are the mean and variance of all pixels of the channel except t; w_t and b_t are computed within a single channel.

Assuming that all pixels in a single channel follow the same distribution, the mean and variance can be computed once over all pixels and reused to assess the importance of every pixel on that channel. The minimum energy of a pixel can then be expressed as:

e_t* = 4(σ̂^2 + λ) / ((t - μ̂)^2 + 2σ̂^2 + 2λ)    (12)

where μ̂ = (1/M) Σ_{i=1..M} x_i and σ̂^2 = (1/M) Σ_{i=1..M} (x_i - μ̂)^2. The importance of each pixel can therefore be expressed as 1/e_t*: the lower the energy, the more linearly separable pixel t is from the other pixels, and the higher its importance.

Given the input feature map X, the output of the SimAM attention mechanism can be expressed as:

X̃ = sigmoid(1/E) ⊙ X    (13)

where E groups all e_t* over the channel and spatial dimensions.
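Because SimAM's weights have the closed form of equations (11)-(12), the whole mechanism reduces to a few tensor operations. The sketch below mirrors the published SimAM implementation, extended to three dimensions; the regularization coefficient λ is a hyperparameter:

```python
import torch
import torch.nn as nn

class SimAM(nn.Module):
    def __init__(self, lam: float = 1e-4):
        super().__init__()
        self.lam = lam  # regularization coefficient lambda

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (N, C, L, W, H)
        n = x[0, 0].numel() - 1                           # M - 1 pixels per channel
        d = (x - x.mean(dim=(2, 3, 4), keepdim=True)).pow(2)  # (t - mu)^2 for every pixel
        v = d.sum(dim=(2, 3, 4), keepdim=True) / n            # channel variance estimate
        e_inv = d / (4 * (v + self.lam)) + 0.5                # 1 / e_t*, eq. (12)
        return x * torch.sigmoid(e_inv)                       # eq. (13)
```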
(b) Spatial cross-attention fusion

The multi-scale features [s_1, s_2, s_3, s_4] output by the feature fusion framework produce the feature vector for classification through pairwise feature fusion. The invention replaces the upsampling method of feature-pyramid fusion with a spatial cross-attention mechanism, whose main steps are as follows.

In fig. 7, the fusion of s_3 and s_4 is taken as an example to describe the cross-attention fusion mechanism (CSFA), a method of fusing two features:

s_3 and s_4 have equal channel numbers, and each has already absorbed features of similar scales during feature fusion. The SimAM attention mechanism performs similarity recalibration in three-dimensional space on the two channel-identical features:

s'_3 = SimAM(s_3),  s'_4 = SimAM(s_4)    (14)

The channel-attention compressed-excitation method (Squeeze operation) then compresses s'_4 into a channel-based one-dimensional vector f_4, which is multiplied with s'_3 to obtain the output F_3, recalibrating the features of s_3:

f_4 = σ(w_2 δ(w_1 F_sq(s'_4))),  F_3 = s'_3 · f_4    (15)

Because s_3 already carries features similar to those of s_4, recalibrating s_3 with s_4 is equivalent in effect to fusion, while reducing the parameter count and feature redundancy compared with fusing similar features by upsampling.
The fusion process of the multi-scale features [s_1, s_2, s_3, s_4] output by the feature fusion framework is shown in fig. 6 (a code sketch follows this list):

1. s_4 and s_3 pass through CSFA; the features of s_4 are calibrated onto s_3, outputting the fusion feature F_3.

2. F_3 and s_2 pass through CSFA; the features of F_3 are calibrated onto s_2, outputting the fusion feature F_2.

3. F_2 and s_1 pass through CSFA; the features of F_2 are calibrated onto s_1, outputting the fusion feature F_1.
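A sketch of one CSFA step (equations (14)-(15)) is given below. It reuses the SimAM sketch above, and its excitation again assumes the ReLU/sigmoid pair with reduction factor R:

```python
import torch.nn as nn

class CSFA(nn.Module):
    """Cross-attention fusion: SimAM recalibration, then the channel weights of the
    higher-level feature scale the lower-level one. The spatial sizes may differ
    (the excitation vector is 1 x 1 x 1 spatially), but channel counts must match."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.simam = SimAM()
        self.excite = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, s_low, s_high):  # e.g. s_low = s_3, s_high = s_4
        a = self.simam(s_low)          # s'_3, eq. (14)
        b = self.simam(s_high)         # s'_4, eq. (14)
        n, c = b.shape[:2]
        f = self.excite(b.mean(dim=(2, 3, 4))).view(n, c, 1, 1, 1)  # f_4, eq. (15)
        return a * f                   # F_3 = s'_3 * f_4

# Top-down fusion chain of fig. 6:
#   F3 = csfa(s3, s4); F2 = csfa(s2, F3); F1 = csfa(s1, F2)
```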
Step 2-3: feature classification phase
In the feature classification stage, the invention adopts a long-tail learning classifier. When features are input, the confidence is adaptively adjusted according to their distribution. The principle can be described as follows.
For the labels [0, 1], let the output of the model, i.e., the predicted confidence sequence, be z = [z_0, z_1]. The class confidences are adjusted by a linear transformation:

z̃_j = α_j z_j + β_j    (16)

where α_j, β_j are correction parameters to be learned that adjust each class's probability distribution. A confidence function is then defined to combine the aligned probability with the original probability, giving the corrected probability ẑ_j:

ẑ_j = σ(f(x)) z̃_j + (1 - σ(f(x))) z_j    (17)

where σ(x) is the activation function and f(x) is a confidence score computed from the input features.
The vector produced by spatial pyramid pooling (SPP) is fed into the long-tail learning classifier, whose input dimension is 4379 and output dimension is 2, to obtain the confidence sequence of the two classes.
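Equations (16)-(17) admit the following sketch. The patent does not spell out where the mixing score σ(f(x)) comes from, so the gate below (a linear layer on the SPP vector) is one plausible reading, not the patent's definitive formulation:

```python
import torch
import torch.nn as nn

class LongTailClassifier(nn.Module):
    def __init__(self, in_dim: int = 4379, n_classes: int = 2):
        super().__init__()
        self.fc = nn.Linear(in_dim, n_classes)        # original confidences z
        self.alpha = nn.Parameter(torch.ones(n_classes))
        self.beta = nn.Parameter(torch.zeros(n_classes))
        self.gate = nn.Linear(in_dim, 1)              # assumed source of the mixing score f(x)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        z = self.fc(x)
        z_adj = self.alpha * z + self.beta            # eq. (16): linear confidence adjustment
        g = torch.sigmoid(self.gate(x))               # sigma(f(x))
        return g * z_adj + (1 - g) * z                # eq. (17): mix aligned and original
```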
Step three: model training
The model was trained on 4 GeForce RTX 3090 GPUs with the batch size set to 128 for 100 epochs. The optimizer is SGD (initial learning rate 0.001, momentum factor 0.9), the loss function adopts cross entropy, and ten-fold cross-validation is performed.
Step four: weighted fusion of the multi-size prediction confidence sequences
Because lung nodules vary widely in shape, size, and texture, samples of different sizes are cropped around the same suspicious lesion region and trained separately, each model finally outputting a confidence sequence z = [z_0, z_1].

The confidence sequences of the five sizes 24³, 28³, 32³, 36³, and 40³ mm³ are fused by weighting:

z_final = Σ_{i=1..5} ω z_i    (18)

where ω = 0.2. This yields the confidence output by the final model, and argmax selects the dimension with the higher confidence as the final prediction.
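Equation (18) is a plain weighted sum; a sketch (function name ours):

```python
import numpy as np

def fuse_confidences(z_list, weight: float = 0.2):
    """Weighted fusion of the five size-specific confidence sequences, eq. (18)."""
    z = np.asarray(z_list)             # shape (5, 2): one [z_0, z_1] per input size
    fused = (weight * z).sum(axis=0)   # omega = 0.2 for every size
    return fused, int(fused.argmax())  # final confidences and the predicted class
```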
Step five: model evaluation
The model was evaluated using the FROC standard. The independent variable of FROC is the average number of false-positive samples per CT scan (FPPS), and the dependent variable is sensitivity.
The sensitivity is calculated as:

sensitivity = TP / (TP + FN)    (19)
where TP denotes true positives and FN denotes false negatives.
FROC reflects the classification performance of a model at different thresholds. Seven representative points are taken from the FROC curve: 0.125, 0.25, 0.5, 1, 2, 4, and 8 FPPS. CPM, the average over these seven points, summarizes the classification performance of the model:

CPM = (1/7) Σ_{i} Recall_{fpr=i}    (20)

where Recall_{fpr=i} denotes the recall corresponding to false-positive rate i.
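Equations (19)-(20) can be computed directly once the recall values have been read off the FROC curve; the function names are hypothetical:

```python
def sensitivity(tp: int, fn: int) -> float:
    """Eq. (19): sensitivity = TP / (TP + FN)."""
    return tp / (tp + fn)

def cpm(recall_at_fpps: dict) -> float:
    """Eq. (20): CPM is the mean recall over the seven representative FPPS points."""
    points = (0.125, 0.25, 0.5, 1, 2, 4, 8)
    return sum(recall_at_fpps[p] for p in points) / len(points)
```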
The CPM results of the present invention are shown in Table 1.

Table 1: CPM results (the table is provided as an image in the original publication and is not reproduced here).
Step 6: results display
Fig. 8 shows some positive lung-nodule prediction results, displaying a two-dimensional cross-section and the corresponding confidence for each case. As fig. 8 shows, (a), (b), and (c) are simple samples on which the model performs very well; (d), (e), and (f) are very small or very large nodules that the model also recognizes well; (g), (h), and (i) are difficult samples with interfering information and irregular shapes, on which the model still achieves a considerable recognition effect.
Table 2: uniformly taking suspected lung nodule blocks with the length, width and height of 32mm as input contrast effects:
Figure RE-GDA0003625028910000124
As shown in Table 2, M2 reduces the number of backbone layers relative to the method M1 of the present application; although it achieves the highest score at 8 FPPS, all its other points are lower than M1. M3 replaces the loss function with cross entropy and scores higher than M1 at 0.125 FPPS, but its results at 4 and 8 FPPS are unsatisfactory, showing that GRWLoss has a certain adjusting effect on DisAlignLinear. M4 replaces CSFA with an FPN upsampling connection and underperforms M1, showing that CSFA improves the fusion of similar features. M5 and M6 adopt a traditional fully connected layer: M5 uses Focal Loss to address the data imbalance, while M6 balances the data by sampling as many negative samples as positive ones for the training set.
Table 3: comparison of effects for different input sizes:
Figure RE-GDA0003625028910000131
CPM score for the model at 243To 403Shows an ascending trend but is 403To 563The interval (a) is in a downward trend. Accordingly, the present application selects 243To 4035 sizes for weighted fusion:
table 4: selecting 243To 4035 sizes of
Figure RE-GDA0003625028910000132
Some steps in the embodiments of the present invention may be implemented by software, and the corresponding software program may be stored in a readable storage medium, such as an optical disc or a hard disk.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims (8)

1. A method for classifying an object based on spatial cross attention mechanism feature fusion, the method comprising:
step 1: acquiring a three-dimensional image to be classified, and setting the length, width and height of a target area in the three-dimensional image to be classified as L, W and H respectively;
step 2: performing feature extraction on a target area in an image to be classified by adopting a 3DSeNet backbone network to obtain four primary feature vectors; the 3DSeNet backbone network is composed of a plurality of SeBlock blocks, and the SeBlock blocks are obtained by adding an SE three-dimensional channel attention module in a ResBlock block;
step 3: respectively carrying out feature refinement on four primary feature vectors output by a 3DSeNet backbone network to obtain refined feature vectors;
step 4: re-calibrating the other characteristic diagram by using the characteristics of one characteristic diagram by using the four thinned characteristic vectors in a channel attention mechanism excitation method, and finally obtaining characteristic vectors for classification;
step 5: and classifying the three-dimensional image to be classified according to the finally obtained characteristic vector for classification.
2. The method of claim 1, wherein the SE three-dimensional channel attention module first performs a Squeeze operation on the input feature x ∈ R^(C×L×W×H) to obtain a channel-based global feature map z ∈ R^(C×1×1×1), performs an Excitation operation on the global feature map to obtain a feature map s_c, and then uses a Scale operation to multiply s_c with the originally input feature x, completing the feature recalibration that corrects the features.
3. The method of claim 2, wherein the SE three-dimensional channel attention module in Step 2 obtains the four preliminary feature vectors from the input feature x ∈ R^(C×L×W×H) by:

performing the Squeeze operation on the input feature x to obtain the channel-based global feature of the feature map:

z_c = F_sq(x_c) = (1/(L·W·H)) Σ_{i=1..L} Σ_{j=1..W} Σ_{k=1..H} x_c(i, j, k)    (2)

that is, three-dimensional global adaptive average pooling encodes the entire spatial feature on a channel into one global feature, outputting the feature map z ∈ R^(C×1×1×1), where C denotes the number of channels;

obtaining the feature map s_c through the Excitation operation:

s_c = σ(w_2 δ(w_1 z))    (3)

where z is the output of the Squeeze operation, σ is the sigmoid activation function, δ is the ReLU activation function, w_1 ∈ R^((C/R)×C) and w_2 ∈ R^(C×(C/R)) are the excitation weights, and R denotes a reduction factor;

multiplying the s_c obtained from the Excitation operation with the input feature x ∈ R^(C×L×W×H) through the Scale operation:

x̃_c = F_scale(x_c, s_c) = s_c · x_c    (4)

where x̃ = [x̃_1, x̃_2, …, x̃_C]; the output of the SE three-dimensional channel attention module has the same size as the input feature map;

the four preliminary feature vectors are denoted x_1, x_2, x_3, and x_4.
4. The method of claim 3, wherein Step 3 comprises:

passing the input feature x_2 through a multi-scale feature refinement module to obtain an output x'_2, fusing the input feature x_1 with the upsampled x'_2 and feeding the result into the multi-scale feature refinement module f_s to obtain an output x'_1, and finally fusing x_1 with x'_1 to obtain an output s_1:

x'_1 = f_s(λ_1 x_1 + λ_2 Up(x'_2)),  s_1 = λ'_1 x_1 + λ'_2 x'_1    (5)

where f_s is the multi-scale feature refinement module, λ_1, λ_2 are one set of linear parameters, λ'_1, λ'_2 are another set, and Up(x'_2) denotes upsampling the refined feature x'_2;

passing the input feature x_3 through the multi-scale feature refinement module to obtain an output x'_3, fusing the input feature x_2 with the upsampled x'_3 and feeding the result into f_s to obtain an output x'_2, and finally fusing x_2, x'_2, and the downsampled output feature s_1 to obtain an output s_2:

x'_2 = f_s(λ_1 x_2 + λ_2 Up(x'_3)),  s_2 = λ'_1 x_2 + λ'_2 x'_2 + λ'_3 Down(s_1)    (6)

where Down(s_1) denotes downsampling the stage-1 fused feature s_1, and λ'_1, λ'_2, λ'_3 are a set of linear parameters;

passing the input feature x_4 through the multi-scale feature refinement module to obtain an output x'_4, fusing the input feature x_3 with the upsampled x'_4 and feeding the result into f_s to obtain an output x'_3, and finally fusing x_3, x'_3, and the downsampled output feature s_2 to obtain an output s_3:

x'_3 = f_s(λ_1 x_3 + λ_2 Up(x'_4)),  s_3 = λ'_1 x_3 + λ'_2 x'_3 + λ'_3 Down(s_2)    (7)

passing the input feature x_4 through the multi-scale feature refinement module to obtain the output x'_4, and fusing x_4, x'_4, and the downsampled output feature s_3 of stage 3 to obtain the output feature s_4:

x'_4 = f_s(x_4),  s_4 = λ'_1 x_4 + λ'_2 x'_4 + λ'_3 Down(s_3)    (8)
5. The method of claim 4, wherein Step 4 comprises:

fusing s_4 and s_3 through the cross-attention mechanism, calibrating the features of s_4 onto s_3 and outputting the fusion feature F_3;

fusing F_3 and s_2 through the cross-attention mechanism, calibrating the features of F_3 onto s_2 and outputting the fusion feature F_2;

fusing F_2 and s_1 through the cross-attention mechanism, calibrating the features of F_2 onto s_1 and outputting the fusion feature F_1.
6. The method of claim 5, wherein fusing s_4 and s_3 through the cross-attention mechanism, calibrating the features of s_4 onto s_3 and outputting the fusion feature F_3, comprises:

performing similarity recalibration in three-dimensional space on the two channel-identical features s_4 and s_3:

s'_3 = SimAM(s_3),  s'_4 = SimAM(s_4)    (14)

compressing s'_4 into a channel-based one-dimensional vector f_4 through the channel-attention Squeeze operation, and multiplying it with s'_3 to obtain the output F_3, which recalibrates the features of s_3:

f_4 = σ(w_2 δ(w_1 F_sq(s'_4))),  F_3 = s'_3 · f_4    (15)

where F_sq is the Squeeze operation of equation (2), SimAM(·) weights each pixel by its linear separability within its channel, and σ denotes the sigmoid activation function.
7. The method of claim 6, wherein Step 5 comprises:

pooling the feature F_1 over three-dimensional space into a one-dimensional classification vector of length 4379, from which the classifier outputs the confidence sequence of the two classes.
8. A method for predicting lung nodule positive probability, which is characterized in that the method adopts the method of any one of claims 1-7 to obtain a lung nodule positive probability prediction value based on suspected lung nodule data output by a CAD system.
CN202210084352.4A 2022-01-20 2022-01-20 Target classification method based on space cross attention mechanism feature fusion Pending CN114581698A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210084352.4A CN114581698A (en) 2022-01-20 2022-01-20 Target classification method based on space cross attention mechanism feature fusion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210084352.4A CN114581698A (en) 2022-01-20 2022-01-20 Target classification method based on space cross attention mechanism feature fusion

Publications (1)

Publication Number Publication Date
CN114581698A 2022-06-03

Family

ID=81772515

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210084352.4A Pending CN114581698A (en) 2022-01-20 2022-01-20 Target classification method based on space cross attention mechanism feature fusion

Country Status (1)

Country Link
CN (1) CN114581698A (en)


Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115276784A (en) * 2022-07-26 2022-11-01 西安电子科技大学 Deep learning-based orbital angular momentum modal identification method
CN115276784B (en) * 2022-07-26 2024-01-23 西安电子科技大学 Deep learning-based orbital angular momentum modal identification method
CN116188392A (en) * 2022-12-30 2023-05-30 阿里巴巴(中国)有限公司 Image processing method, computer-readable storage medium, and computer terminal

Similar Documents

Publication Publication Date Title
Santos et al. Artificial intelligence, machine learning, computer-aided diagnosis, and radiomics: advances in imaging towards to precision medicine
Binczyk et al. Radiomics and artificial intelligence in lung cancer screening
Halder et al. Lung nodule detection from feature engineering to deep learning in thoracic CT images: a comprehensive review
Sun et al. Deep learning-based classification of liver cancer histopathology images using only global labels
CN107016665B (en) CT pulmonary nodule detection method based on deep convolutional neural network
Froz et al. Lung nodule classification using artificial crawlers, directional texture and support vector machine
CN110766051A (en) Lung nodule morphological classification method based on neural network
Shaukat et al. Computer-aided detection of lung nodules: a review
CN112270666A (en) Non-small cell lung cancer pathological section identification method based on deep convolutional neural network
WO2018107371A1 (en) Image searching system and method
ur Rehman et al. An appraisal of nodules detection techniques for lung cancer in CT images
Liu Stbi-yolo: A real-time object detection method for lung nodule recognition
CN111798424B (en) Medical image-based nodule detection method and device and electronic equipment
CN114581698A (en) Target classification method based on space cross attention mechanism feature fusion
US11062443B2 (en) Similarity determination apparatus, similarity determination method, and program
CN111767952A (en) Interpretable classification method for benign and malignant pulmonary nodules
CN112258461A (en) Pulmonary nodule detection method based on convolutional neural network
CN116091490A (en) Lung nodule detection method based on YOLOv4-CA-CBAM-K-means++ -SIOU
Mobiny et al. Lung cancer screening using adaptive memory-augmented recurrent networks
CN115715416A (en) Medical data inspector based on machine learning
Zhou et al. Deep learning-based breast region extraction of mammographic images combining pre-processing methods and semantic segmentation supported by Deeplab v3+
Zhang et al. LungSeek: 3D Selective Kernel residual network for pulmonary nodule diagnosis
CN117710760B (en) Method for detecting chest X-ray focus by using residual noted neural network
Tian et al. Radiomics and its clinical application: artificial intelligence and medical big data
CN114565786A (en) Tomography image classification device and method based on channel attention mechanism

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination