CN114782403A - Pneumonia image detection method and device based on mixed space and inter-channel attention - Google Patents

Pneumonia image detection method and device based on mixed space and inter-channel attention

Info

Publication number
CN114782403A
Authority
CN
China
Prior art keywords
feature
image
tensor
network
attention
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210536524.7A
Other languages
Chinese (zh)
Inventor
庞子龙
莫也
马韶胤
武戈
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Henan University
Original Assignee
Henan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Henan University filed Critical Henan University
Priority to CN202210536524.7A priority Critical patent/CN114782403A/en
Publication of CN114782403A publication Critical patent/CN114782403A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/0002Inspection of images, e.g. flaw detection
    • G06T7/0012Biomedical image inspection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformations in the plane of the image
    • G06T3/60Rotation of whole images or parts thereof
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10116X-ray image
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20112Image segmentation details
    • G06T2207/20132Image cropping
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30004Biomedical image processing
    • G06T2207/30061Lung

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Molecular Biology (AREA)
  • Evolutionary Biology (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Medical Informatics (AREA)
  • Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
  • Radiology & Medical Imaging (AREA)
  • Quality & Reliability (AREA)
  • Apparatus For Radiation Diagnosis (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a pneumonia image detection method and device based on mixed spatial and inter-channel attention. The method comprises the following steps: Step 1: perform data preprocessing on the lung X-ray image; Step 2: construct a first feature network, and perform feature extraction on the preprocessed lung X-ray image with the first feature network to obtain a feature map C; Step 3: construct a second feature network, and extract features of the feature map C with the second feature network to obtain a feature map F; Step 4: construct an attention module mixing spatial attention and inter-channel attention, and process the feature map F with the attention module to obtain a feature tensor X; Step 5: construct a network classifier, and detect the feature tensor X with the network classifier to obtain a detection result.

Description

Pneumonia image detection method and device based on mixed space and inter-channel attention
Technical Field
The invention relates to the technical field of medical image recognition, in particular to a pneumonia image detection method and device based on mixed space and inter-channel attention.
Background
Pneumonia is an inflammation of the terminal airways, alveoli and pulmonary interstitium, and can be classified into bacterial pneumonia, viral pneumonia and so on. Its etiologies are numerous and its morbidity is high, making it one of the most common infectious diseases. Early diagnosis of pneumonia is critical to its successful cure. Pneumonia may be detected by X-ray imaging, pulmonary CT, magnetic resonance imaging (MRI), and the like. Lung X-ray examination has the advantages of a convenient procedure, a small radiation dose and low cost, and is the first choice in current clinical detection. For the doctor, however, checking the lesion information in a lung medical image through manual radiograph interpretation is a complicated task: the traditional interpretation method usually consumes a great deal of time and energy, the accuracy of diagnosis depends mainly on the doctor's skill and work experience, and misdiagnosis and missed diagnosis may occur due to visual fatigue, environmental disturbance and the like.
Since the start of the 21st century, object detection has emerged with the development of computer science, mainly image recognition and pattern recognition technologies. The main task of object detection is to identify the class of objects in the input image and their location coordinates; the types of objects that can be detected are defined by manually specifying the desired objects in the image. Because the shape, size and position of objects differ from picture to picture, improving the accuracy of target detection is a constant need. At present, artificial-intelligence-assisted diagnosis of medical images can reach expert-level precision; applying artificial intelligence to pneumonia diagnosis can effectively improve diagnostic efficiency and quality, and help relieve the imbalance of medical resources. Lesion-region detection on lung X-ray images helps the doctor make a diagnosis by automatically analyzing the image and outputting information such as the position and size of the lesion region. However, the lung X-ray detection task differs from other image detection tasks: lung X-ray images exhibit high inter-class similarity and low intra-class variability, that is, image features of different classes are highly similar while images of the same class differ little. Training on data with these characteristics easily causes model bias and overfitting, reduces the generalization ability of the network and increases the difficulty of image recognition, so the pneumonia detection effect of a purely conventional network is unsatisfactory, and the classification precision still needs to be improved through a better network structure.
Disclosure of Invention
In order to improve the pneumonia detection effect based on lung X-ray images, the invention provides a pneumonia image detection method and device based on mixed spatial and inter-channel attention. The specific scheme is as follows:
The invention provides a pneumonia image detection method based on mixed spatial and inter-channel attention, comprising the following steps:
Step 1: perform data preprocessing on the lung X-ray image;
Step 2: construct a first feature network, and perform feature extraction on the preprocessed lung X-ray image with the first feature network to obtain a feature map C;
Step 3: construct a second feature network, and extract features of the feature map C with the second feature network to obtain a feature map F;
Step 4: construct an attention module mixing spatial attention and inter-channel attention, and process the feature map F with the attention module to obtain a feature tensor X;
Step 5: construct a network classifier, and detect the feature tensor X with the network classifier to obtain a detection result.
Further, step 1 specifically includes:
step 1.1: screening out unsatisfactory lung X-ray images;
step 1.2: dividing a data set consisting of all lung X-ray images meeting the requirements into a training set, a verification set and a test set;
step 1.3: converting each lung X-ray image in the data set into an RGB three-channel image;
step 1.4: carrying out image enhancement on each RGB three-channel image;
step 1.5: converting each RGB three-channel image subjected to image enhancement into a tensor image;
step 1.6: normalize each channel of each RGB three-channel image, then standardize the tensor image according to the mean vector and standard-deviation vector of the three channels;
step 1.7: and converting each normalized tensor image into a gray level image.
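Steps 1.5–1.7 can be sketched as follows. This is a minimal NumPy illustration: the mean/std arguments and the BT.601 grayscale weights are assumptions for demonstration, not values fixed by the patent text.

```python
import numpy as np

def to_tensor_and_normalize(img, mean, std):
    """Convert an HxWx3 uint8 RGB image to a float tensor in [0, 1]
    (step 1.5), then standardize each channel with the given
    per-channel mean/std vectors (step 1.6)."""
    x = img.astype(np.float32) / 255.0          # scale pixel values to [0, 1]
    mean = np.asarray(mean, dtype=np.float32)   # shape (3,)
    std = np.asarray(std, dtype=np.float32)     # shape (3,)
    return (x - mean) / std                     # per-channel standardization

def to_grayscale(x):
    """Step 1.7: collapse the 3 channels to a single gray channel.
    The ITU-R BT.601 luma weights are a common convention, assumed here."""
    w = np.array([0.299, 0.587, 0.114], dtype=np.float32)
    return x @ w
```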
Further, step 1.4 specifically includes:
step A1: horizontally flip the RGB three-channel image with a flipping probability of 0.5;
step A2: adjusting the image attribute of the reversed RGB three-channel image, specifically: setting the brightness offset amplitude to 0.5, the contrast offset amplitude to 0.5, the saturation offset amplitude to 0.5, and the hue offset amplitude to 0;
step A3: set the random cropping area ratio to (0.7, 1.0), randomly crop the attribute-adjusted RGB three-channel image to different sizes and aspect ratios, then scale the cropped RGB three-channel image to 224 × 224 pixels;
step A4: automatically augment each scaled RGB three-channel image using RandAugment.
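Steps A1–A4 map onto standard augmentation primitives (in practice, torchvision's RandomHorizontalFlip, ColorJitter, RandomResizedCrop and RandAugment cover them). As a dependency-free illustration, the flip of step A1 and the random resized crop of step A3 can be sketched in NumPy; the nearest-neighbour resizing and the fixed seed are simplifications of this sketch, not part of the method:

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed only for reproducibility of the sketch

def random_hflip(img, p=0.5):
    """Step A1: flip the image horizontally (along the width axis)
    with probability p."""
    return img[:, ::-1] if rng.random() < p else img

def random_resized_crop(img, scale=(0.7, 1.0), out_size=224):
    """Step A3: crop a random area fraction in `scale` with a random
    aspect ratio, then resize to out_size x out_size
    (nearest-neighbour resize for brevity)."""
    h, w = img.shape[:2]
    area = h * w * rng.uniform(*scale)
    ratio = rng.uniform(3 / 4, 4 / 3)                  # random aspect ratio
    ch = min(h, int(round(np.sqrt(area / ratio))))     # crop height
    cw = min(w, int(round(np.sqrt(area * ratio))))     # crop width
    top = rng.integers(0, h - ch + 1)
    left = rng.integers(0, w - cw + 1)
    crop = img[top:top + ch, left:left + cw]
    ys = (np.arange(out_size) * ch / out_size).astype(int)
    xs = (np.arange(out_size) * cw / out_size).astype(int)
    return crop[ys][:, xs]
```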
Further, the first feature network adopts ResNet101 with Inception convolution as the backbone network, and includes 5 feature extraction layers in total: the first, second, third, fourth and fifth feature layers;
the feature extraction process of the first feature layer is as follows: first, a convolution operation is performed on the input preprocessed lung X-ray image with 64 convolution kernels of 3 input channels; then a Batch Normalization operation is performed with a BN layer; the result is then processed with a ReLU activation function and finally input to a max pooling layer with 64 channels to obtain feature map C1;
the feature extraction process of the second feature layer comprises a first branch and a second branch, the two branches sequentially repeat three times of feature extraction operations on the input feature map C1 according to respective feature extraction processes, and then the last output of the two branches is subjected to preset processing operation to obtain a feature map C2; the preset processing operation specifically includes: adding the outputs of the two branches and then processing by adopting a ReLu activation function;
the feature extraction process of the third feature layer comprises a third branch and a fourth branch, the two branches sequentially perform feature extraction operations on the input feature map C2 in the third feature layer under different parameter states for four times according to respective feature extraction processes, and then execute the preset processing operation on the last output of the two branches to obtain a feature map C3;
the feature extraction process of the fourth feature layer comprises a fifth branch and a sixth branch, the two branches sequentially perform twenty-three times of feature extraction operations on the input feature map C3 in the fourth feature layer under different parameter states according to respective feature extraction processes, and then perform the preset processing operation on the last output of the two branches to obtain a feature map C4;
the feature extraction process of the fifth feature layer includes a seventh branch and an eighth branch, the two branches sequentially perform three feature extraction operations on the input feature map C4 in the fifth feature layer under different parameter states according to respective feature extraction processes, then perform the preset processing operation on the last output of the two branches to obtain a feature map C5, and use the feature map C5 as a final feature map C.
Further, the second feature extraction network adopts an FPN network;
the feature extraction process of the FPN network is as follows: reduce the number of channels of feature map C5 from 2048 to 256 with a 1 × 1 convolution, then perform an up-sampling operation to obtain a feature map of the same size as feature map C4, denoted C5_up, and perform a weighted summation of C5_up and C4 to obtain feature map P; perform feature fusion on feature map P with a 3 × 3 convolution to obtain feature map F; laterally connect feature map F and increase the number of channels back to 2048.
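The top-down FPN step just described can be sketched with toy shapes in NumPy. Channel counts are scaled down from 2048/256 for illustration, and the fusion weight `alpha` is an assumption, since the text does not specify the weighting:

```python
import numpy as np

def conv1x1(x, w):
    """A 1x1 convolution is a per-pixel channel-mixing matmul:
    x is (C_in, H, W), w is (C_out, C_in)."""
    c, h, w_ = x.shape
    return (w @ x.reshape(c, -1)).reshape(w.shape[0], h, w_)

def upsample2x(x):
    """Nearest-neighbour 2x upsampling over the spatial axes."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def fpn_merge(c5, c4, w_reduce, alpha=0.5):
    """Top-down FPN step from the text: a 1x1 conv reduces C5's
    channels, it is upsampled to C4's spatial size, then combined
    with C4 by a weighted sum (C4 is assumed already channel-matched)."""
    c5_up = upsample2x(conv1x1(c5, w_reduce))
    return alpha * c5_up + (1 - alpha) * c4
```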
Further, the processing procedure of the feature map F by the attention module includes:
step B1: for each image set containing m lung X-ray images, each of size H0 × W0, pass each lung X-ray image through the first and second feature networks to obtain the corresponding feature map F; all feature maps F of each image set form a feature tensor X;
step B2: performing 1 × 1 convolution operation on the feature tensor X, and dividing the convolution operation by the regularized transpose of the feature tensor X;
step B3: unfold the feature tensor X obtained in step B2 from the second dimension using the flatten() function, thereby separating the feature tensor of each lung X-ray image into X_1, X_2, X_3, …, X_{H×W};
Step B4: perform a global average pooling operation on the features of all positions in the feature tensor X to obtain the global class-independent feature g:

g = (1 / (H × W)) · Σ_{j=1}^{H×W} X_j
step B5: compute, for each class i and each position j, the attention score

score_i^j = exp(X_j^T · m_i / T) / Σ_{k=1}^{H×W} exp(X_k^T · m_i / T)

and obtain the class-specific feature tensor a by a weighted combination of the feature tensors with score (as T → 0 this approaches taking the maximum over all spatial positions for each class):

a_i = Σ_{j=1}^{H×W} score_i^j · X_j

where T > 0 is a hyper-parameter, X_j^T and X_k^T denote the transposes of X_j and X_k respectively, and m_i denotes the classifier parameter of the i-th class;
step B6: obtain the final f_i from the class-specific feature tensor a and the global class-independent feature g: f_i = g + λ · a_i; then reshape f_i to [m, 2048, 1], where f_i denotes the feature vector of the i-th class;
step B7: expand f_i to the same size as the feature tensor X in step B1, and multiply it with the feature tensor X in step B1 to obtain a new feature tensor X.
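Steps B3–B6 can be sketched in NumPy as follows, treating the feature map of one image as N = H×W position vectors. The softmax over positions realizes the score formula of step B5; the value of λ and the toy shapes are illustrative assumptions:

```python
import numpy as np

def hybrid_attention(X, m, T=1.0, lam=0.5):
    """Sketch of steps B3-B6: X is (N, D) with N = H*W spatial
    positions and D channels; m is (num_classes, D) classifier
    parameters. Returns f with rows f_i = g + lam * a_i."""
    g = X.mean(axis=0)                           # B4: global average pooling -> g
    logits = (m @ X.T) / T                       # (classes, N): X_j^T m_i / T
    logits -= logits.max(axis=1, keepdims=True)  # numerically stable softmax
    score = np.exp(logits)
    score /= score.sum(axis=1, keepdims=True)    # B5: softmax over positions
    a = score @ X                                # B5: weighted combination -> a_i
    return g + lam * a                           # B6: f_i = g + lambda * a_i
```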
Further, the detection process of the network classifier specifically includes:
step 5.1: apply adaptive average pooling to the feature tensor X;
step 5.2: unfold the feature tensor X of step 5.1 from the first dimension using the flatten() function, then input the unfolded feature tensor X into a fully connected layer for linear conversion to obtain X′, specifically: X′ = X · A^T + b, where A^T denotes the transpose of A, A is the parameter matrix of the fully connected layer, and b is a bias row vector;
step 5.3: input the linearly converted feature vector X′ into a ReLU activation function, specifically: ReLU(X′) = (X′)_+ = max(0, X′);
Step 5.4: input the feature tensor X′ output in step 5.3 into the fully connected layer again for linear conversion, and output a feature tensor X″ of size [2048, 2];

step 5.5: input the linearly converted feature tensor X″ into a loss function to calculate the loss value; the model is then trained via the optimizer, loss-function evaluation and hyper-parameter adjustment. The loss function adopts binary cross-entropy, specifically

L = −(1/C) · Σ_{i=1}^{C} [ y_i · log(ŷ_i) + (1 − y_i) · log(1 − ŷ_i) ]

where ŷ_i = 1 / (1 + e^{−x_i}), C is the number of categories, y denotes the true label of the image, and ŷ denotes the predicted value of the image.
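A minimal NumPy sketch of the classifier of steps 5.2–5.5. The layer shapes are illustrative, and the sigmoid applied before the binary cross-entropy is an assumption consistent with the predicted value ŷ used in step 5.5:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def classifier_head(X, A1, b1, A2, b2):
    """Steps 5.2-5.4: two linear layers with a ReLU in between:
    X' = X A1^T + b1, then ReLU, then X'' = ReLU(X') A2^T + b2."""
    h = np.maximum(X @ A1.T + b1, 0.0)  # linear + ReLU (steps 5.2-5.3)
    return h @ A2.T + b2                # second linear layer (step 5.4)

def binary_cross_entropy(logits, y):
    """Step 5.5: mean binary cross-entropy over C categories,
    with y_hat = sigmoid(logits)."""
    p = sigmoid(logits)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))
```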
Further, in step 5.5, the optimizer adopts an SGD optimizer.
The invention also provides a pneumonia image detection device based on mixed space and inter-channel attention, which comprises:
the preprocessing module is used for preprocessing the data of the lung X-ray image;
the first feature network construction module is used for constructing a first feature network, and performing feature extraction on the preprocessed lung X-ray image by adopting the first feature network to obtain a feature map C;
the second feature network construction module is used for constructing a second feature network, and the second feature network is adopted to perform feature extraction on the feature map C to obtain a feature map F;
the attention mechanism construction module is used for constructing an attention module for mixing space attention and inter-channel attention, and the attention module is adopted for processing the feature map F to obtain a feature tensor X;
and the classifier building module is used for building a network classifier, and detecting the characteristic tensor X by adopting the network classifier to obtain a detection result.
The invention has the beneficial effects that:
1. The invention adds an attention mechanism mixing spatial and inter-channel attention into the classification process of lung X-ray images, generates class-specific features for the two classes (pneumonia and no pneumonia), and improves performance without any additional training burden.
2. The invention applies the attention mechanism to the feature fusion process, which can effectively exploit the information useful for pneumonia detection in different feature extraction layers and suppress irrelevant noise, thereby improving detection efficiency.
Drawings
Fig. 1 is a schematic flowchart of a pneumonia image detection method based on mixed space and inter-channel attention according to an embodiment of the present invention;
fig. 2 is a schematic structural diagram of a first feature network according to an embodiment of the present invention;
fig. 3 is a schematic structural diagram of a second feature network according to an embodiment of the present invention;
fig. 4 is a schematic structural diagram of an attention module for mixing attention between a space and a channel according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the technical solutions in the embodiments of the present invention will be described clearly below with reference to the accompanying drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Example 1
As shown in fig. 1, an embodiment of the present invention provides a pneumonia image detection method based on mixed space and inter-channel attention, including the following steps:
s101: carrying out data preprocessing on the lung X-ray image;
specifically, the method specifically comprises the following steps:
s1011: screening out unsatisfactory lung X-ray images;
s1012: dividing the lung X-ray image data set into a training set, a verification set and a test set;
s1013: converting the lung X-ray image into an RGB three-channel image;
s1014: carrying out image enhancement on the lung X-ray image;
as an implementation, the sub-step specifically includes:
step A1: horizontally flipping the lung X-ray image with probability p = 0.5;
step A2: adjusting the image attribute of the lung X-ray image, specifically: setting the brightness offset amplitude to 0.5, the contrast offset amplitude to 0.5, the saturation offset amplitude to 0.5, and the hue offset amplitude to 0;
step A3: setting the random cropping area ratio to (0.7, 1.0), randomly cropping the lung X-ray image to different sizes and aspect ratios, and scaling the cropped lung X-ray image to 224 × 224 pixels;
step A4: automatic data augmentation of each lung X-ray image using RandAugment.
S1015: converting the picture format of the lung X-ray image into a Tensor format (namely a vector format adopted in training), and normalizing, namely dividing each channel by 255;
s1016: regularizing each channel of the lung X-ray image, and normalizing a tensor image according to a mean vector and a standard vector of the three channels;
specifically, the mean vector of the given three-channel tensor is set to [0, 0, 0] and the standard-deviation vector is set to [1, 1, 1]. The tensor image is then normalized per channel according to the mean vector and standard-deviation vector of the three channels, namely:

x_norm = (x − mean) / std
s1017: the lung X-ray images are converted into grey-scale images.
S102: constructing a first feature network, and performing feature extraction on the preprocessed lung X-ray image by adopting the first feature network to obtain a feature map C;
specifically, as shown in fig. 2, the first feature network adopts ResNet101 with Inception convolution as the backbone network, and includes 5 feature extraction layers in total: the first, second, third, fourth and fifth feature layers;
The Inception convolution gives the convolution kernels of the original ResNet101 independent dilation values across different axes (dimensions), channels and layers. For each layer, the dilation values (d_x^i, d_y^i) of the two axes of each channel come from the trained supernet. Let d be the set of two-axis dilation values of all channels in the layer, expressed as follows:

d = { (d_x^i, d_y^i) | 1 ≤ d_x^i, d_y^i ≤ d_max, i = 1, 2, …, C_out }

where d_x^i and d_y^i denote the x-axis and y-axis dilation values of the i-th channel, d_max is the maximum dilation value, and C_out is the number of output channels.
There are 4 supernets, whose training parameters correspond to the second through fifth feature layers of ResNet101, respectively. Each layer of a supernet consists of several convolutions covering all possible dilation values. After the supernet is trained, a dilation value is selected for each layer according to the principle of minimizing the loss function, which determines the optimal dilation pattern of that layer. The specific method is as follows: for each layer in the supernet, let W be the tensor of all original parameters, W_i the tensor of the original parameters of the i-th channel, and W̃_i the parameters of the dilated convolution kernel of the i-th channel; W̃ denotes the stack of the W̃_i, i ∈ {1, 2, …, C_out}, along the output dimension. Since W and W̃ are independent of X, the expectation over X of the L_1 difference between the convolutions with W and with W̃ is optimized, making L_1 minimal. X is the input lung X-ray image; since the groups of X do not differ greatly, the expectation of X can be replaced by a constant α, namely:

L_1 = | W ⊛ (α · 1) − W̃ ⊛ (α · 1) |

where 1 is the all-ones matrix and ⊛ is the convolution operation. Minimizing L_1 determines the optimal d of each layer. Afterwards, applying the optimal d of each layer as parameters to ResNet101 yields the backbone network of ResNet101 using Inception convolution.
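Independent per-axis dilation, as used in the search above, amounts to spacing out the kernel taps along x and y. The following is a hypothetical NumPy helper that materializes a (dx, dy)-dilated kernel by zero insertion; the selection loop over candidate d values is omitted:

```python
import numpy as np

def dilate_kernel(w, dx, dy):
    """Insert zeros into a 2D kernel to realize per-axis dilation
    (dx along columns, dy along rows): a ky x kx kernel becomes
    ((ky-1)*dy+1) x ((kx-1)*dx+1), with the original taps preserved."""
    ky, kx = w.shape
    out = np.zeros(((ky - 1) * dy + 1, (kx - 1) * dx + 1), dtype=w.dtype)
    out[::dy, ::dx] = w  # place original weights on the dilated grid
    return out
```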
The feature extraction process of the first feature layer comprises the following steps: firstly, carrying out convolution operation on the input preprocessed lung X-ray image by adopting 64 convolution kernels with the number of channels being 3, then carrying out Batchnormalization operation by adopting a BN layer, then processing by adopting a ReLu activation function, and finally inputting to a maximum pooling layer with the number of channels being 64 to obtain a characteristic map C1;
the feature extraction process of the second feature layer comprises a first branch and a second branch, the two branches sequentially repeat three times of feature extraction operations on the input feature map C1 according to respective feature extraction processes, and then the last output of the two branches is subjected to preset processing operation to obtain a feature map C2; the preset processing operation specifically comprises: adding the outputs of the two branches and then processing by adopting a ReLu activation function;
wherein, the feature extraction process of the first branch comprises in sequence: performing convolution operation on an input characteristic diagram by adopting 64 convolution kernels, performing Batch Normalization operation by adopting a BN layer, performing convolution operation by adopting 64 convolution kernels, performing Batch Normalization operation by adopting the BN layer, performing convolution operation by adopting 256 convolution kernels, and performing Batch Normalization operation by adopting the BN layer; the feature extraction process of the second branch comprises the following steps: carrying out convolution operation on the input feature map by adopting 256 convolution kernels;
the feature extraction process of the third feature layer comprises a third branch and a fourth branch, the two branches sequentially perform feature extraction operations on the input feature map C2 in the third feature layer under different parameter states for four times according to respective feature extraction processes, and then execute the preset processing operation on the last output of the two branches to obtain a feature map C3;
wherein the feature extraction process of the third branch sequentially comprises: a first convolution operation on the input feature map with 128 convolution kernels, a Batch Normalization operation with a BN layer, a down-sampling operation with 128 convolution kernels, a Batch Normalization operation with a BN layer, a second convolution operation with 512 convolution kernels, and a Batch Normalization operation with a BN layer; the feature extraction process of the fourth branch comprises: a down-sampling operation on the input feature map with 512 convolution kernels;
the feature extraction process of the fourth feature layer comprises a fifth branch and a sixth branch, the two branches sequentially perform twenty-three times of feature extraction operations on the input feature map C3 in the fourth feature layer under different parameter states according to respective feature extraction processes, and then perform the preset processing operation on the last output of the two branches to obtain a feature map C4;
wherein the feature extraction process of the fifth branch sequentially comprises: a first convolution operation on the input feature map with 256 convolution kernels, a Batch Normalization operation with a BN layer, a down-sampling operation with 256 convolution kernels, a Batch Normalization operation with a BN layer, a second convolution operation with 1024 convolution kernels, and a Batch Normalization operation with a BN layer; the feature extraction process of the sixth branch comprises: a down-sampling operation on the input feature map with 1024 convolution kernels;
the feature extraction process of the fifth feature layer comprises a seventh branch and an eighth branch, the two branches sequentially perform three times of feature extraction operations on the input feature map C4 in the fifth feature layer under different parameter states according to respective feature extraction processes, then perform the preset processing operation on the last output of the two branches to obtain a feature map C5, and use the feature map C5 as a final feature map C;
wherein the feature extraction process of the seventh branch sequentially comprises: a first convolution operation on the input feature map with 512 convolution kernels, a Batch Normalization operation with a BN layer, a down-sampling operation with 512 convolution kernels, a Batch Normalization operation with a BN layer, a second convolution operation with 2048 convolution kernels, and a Batch Normalization operation with a BN layer; the feature extraction process of the eighth branch comprises: a down-sampling operation on the input feature map with 2048 convolution kernels;
s103: constructing a second feature network, and extracting features of the feature graph C by using the second feature network to obtain a feature graph F;
specifically, the second feature network adopts an FPN network; the feature extraction process of the FPN network comprises the following steps: reducing the number of channels of the feature map C5 from 2048 to 256 by using a 1 × 1 convolution, then performing an up-sampling operation to obtain a feature map with the same size as the feature map C4, recorded as the feature map C5_up, and performing weighted summation on the feature map C5_up and the feature map C4 to obtain a feature map P; performing feature fusion on the feature map P by adopting a 3 × 3 convolution to obtain a feature map F; finally, transversely connecting the feature map F and increasing the number of channels to 2048.
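The FPN top-down step above (1×1 channel reduction, upsampling to C4's size, then summation) can be sketched in a few lines of numpy. The random projection matrix standing in for the learned 1×1 kernel, the nearest-neighbor upsampling, the equal summation weights, and the assumption that C4 has already been reduced to 256 channels are all illustrative choices, not details from the patent.

```python
import numpy as np

# Illustrative numpy sketch of the FPN top-down step: a random matrix stands
# in for the learned 1x1 convolution, and C4 is assumed to be already at
# 256 channels (e.g. via its own lateral 1x1 convolution).
rng = np.random.default_rng(0)
C5 = rng.standard_normal((2048, 7, 7))    # deepest feature map
C4 = rng.standard_normal((256, 14, 14))   # one level shallower (assumed 256 channels)

# A 1x1 convolution is a per-pixel linear map over channels: 2048 -> 256.
W = rng.standard_normal((256, 2048)) * 0.01
C5_reduced = np.einsum('oc,chw->ohw', W, C5)             # (256, 7, 7)

# Upsample to C4's spatial size (nearest-neighbor, factor 2).
C5_up = C5_reduced.repeat(2, axis=1).repeat(2, axis=2)   # (256, 14, 14)

P = C5_up + C4    # summation (equal weights assumed for illustration)
print(P.shape)    # -> (256, 14, 14)
```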
S104: constructing an attention module (HSC module) for mixing spatial attention and interchannel attention, and processing the feature map F by using the attention module to obtain a feature tensor X;
specifically, the present step includes the following substeps:
step B1: for each batch of image sets containing m lung X-ray images, each of size H_0 × W_0, a corresponding feature map F is obtained after each lung X-ray image passes through the first feature extraction network and the second feature extraction network; all the feature maps F of each batch of image sets form a feature tensor X;
step B2: performing 1 × 1 convolution operation on the feature tensor X, and dividing by the regularized transpose;
step B3: unfolding the feature tensor X obtained in step B2 from the second dimension by using the flatten() function, so as to separate the feature tensor of each pulmonary X-ray image into X_1, X_2, X_3, …, X_{H×W};
step B4: performing a global average pooling operation on the features of all positions in the feature tensor X to obtain a global class-independent feature g:

g = (1/(H×W)) · Σ_{k=1}^{H×W} x_k

step B5: computing

score_i^j = exp(T · x_j^T · m_i) / Σ_{k=1}^{H×W} exp(T · x_k^T · m_i)

and obtaining, for each category, the maximum over all spatial positions by performing a weighted combination of the feature tensors with score, yielding the class-specific feature tensor a:

a_i = Σ_{j=1}^{H×W} score_i^j · x_j

wherein T > 0 is a hyper-parameter, x_j^T and x_k^T respectively denote the transposes of X_j and X_k, and m_i denotes the classifier parameter of the i-th class;
step B6: obtaining the final f_i according to the class-specific feature tensor a and the global class-independent feature g: f_i = g + λ·a_i; and expanding f_i to [m, 2048, 1]; wherein f_i represents the feature vector of the i-th class;
step B7: expanding f_i to the same size as the feature tensor X in step B1, and then multiplying it with the feature tensor X in step B1 to obtain a new feature tensor X.
This step inputs the fused feature map F into the HSC module so that the spatial attention of each object class can be exploited for higher accuracy. By computing and adjusting the weights of the different channels, the proportion of useful information is increased and that of useless information reduced; by computing and adjusting the weights of all pixels on a feature map, the proportion of effective features is increased and the influence of background information reduced.
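Steps B1–B7 above can be condensed into a small numpy sketch. The toy dimensions, the T and λ values, and the way the per-class vectors f_i are collapsed into a single channel gate before multiplying back onto X are illustrative assumptions; only the g / score / a / f formulas follow the steps above.

```python
import numpy as np

# Toy sketch of the mixed-attention computation in steps B4-B7.
rng = np.random.default_rng(1)
d, n_pos, n_classes = 8, 49, 2      # channels, spatial positions (7x7), classes
T, lam = 1.0, 0.1                   # hyper-parameters (illustrative values)

X = rng.standard_normal((d, n_pos))     # feature vector x_j at each position j
M = rng.standard_normal((n_classes, d)) # classifier parameters m_i

g = X.mean(axis=1)                      # B4: global class-independent feature

logits = T * (M @ X)                    # T * x_j^T m_i for every class i, position j
score = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)  # B5: softmax over positions
a = score @ X.T                         # class-specific features a_i, shape (n_classes, d)

f = g[None, :] + lam * a                # B6: f_i = g + lambda * a_i
gate = f.mean(axis=0)                   # collapse classes to one channel gate (assumption)
X_new = X * gate[:, None]               # B7: reweight the original feature tensor

print(f.shape, X_new.shape)             # -> (2, 8) (8, 49)
```

Larger T sharpens the softmax toward the spatial maximum for each class, which matches the "maximum value of all spatial positions" wording in step B5.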
S105: and constructing a network classifier, and detecting the characteristic tensor X by adopting the network classifier to obtain a detection result.
Specifically, this step includes the following substeps:
s1051: using adaptive average pooling for the feature tensor X;
s1052: unfolding the feature tensor X in step S1051 from the first dimension by using the flatten() function, and then inputting the unfolded feature tensor X into a full connection layer to perform linear conversion to obtain X′, specifically: X′ = X·A^T + b; wherein A^T represents the transpose of A, A is the parameter matrix of the full connection layer, and b is a bias row vector;
s1053: inputting the linearly converted feature tensor X′ into a ReLU activation function, specifically: ReLU(X′) = max(0, X′);
S1054: the feature tensor X' output in step S1053 is input to the full connection layer again for linear conversion, and a feature tensor X ″ having a size of [2048, 2] is output:
s1055: inputting the linearly converted feature tensor X″ into a loss function to calculate a loss value; then modifying the training model through an optimizer, the loss function, data evaluation and hyper-parameters; wherein the loss function adopts a binary cross entropy function, specifically:

loss = −(1/C) · Σ_{c=1}^{C} [ y_c · log(ŷ_c) + (1 − y_c) · log(1 − ŷ_c) ]

wherein C is the number of categories, y represents the true value of the image, and ŷ represents the predicted value of the image.
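A minimal numpy version of the binary cross-entropy loss used in step S1055, with toy labels and predictions; the clipping epsilon is a numerical-stability assumption, not part of the patent's formula.

```python
import numpy as np

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    """loss = -(1/C) * sum_c [ y_c*log(p_c) + (1-y_c)*log(1-p_c) ]."""
    y_pred = np.clip(y_pred, eps, 1 - eps)   # guard against log(0)
    return float(-np.mean(y_true * np.log(y_pred)
                          + (1 - y_true) * np.log(1 - y_pred)))

# Toy example with C = 2 categories:
y = np.array([1.0, 0.0])        # ground-truth values
y_hat = np.array([0.9, 0.2])    # predicted values
print(round(binary_cross_entropy(y, y_hat), 4))   # -> 0.1643
```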
As an implementation manner, in step S1055, the optimizer adopts an SGD optimizer. The update rule is specifically:

θ_j := θ_j + α · (y^(i) − h_θ(x^(i))) · x_j^(i)

wherein α refers to the learning rate, with an initial value of 0.01, multiplied by 0.1 at each iteration; y^(i) − h_θ(x^(i)) denotes the loss term, i denotes the number of cycles, j denotes the parameter number, and θ_j represents the j-th parameter.
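The SGD update above, with the stated learning-rate schedule (initial α = 0.01, multiplied by 0.1 each iteration), can be sketched in plain Python; the linear hypothesis h_θ(x) = θ·x and the toy data are illustrative assumptions.

```python
# Plain-Python sketch of the update theta_j <- theta_j + alpha*(y - h(x))*x_j
# with the decaying learning rate described above; h_theta is a linear
# hypothesis (an illustrative assumption).

def sgd_step(theta, x, y, alpha):
    h = sum(t * xi for t, xi in zip(theta, x))        # h_theta(x) = theta . x
    err = y - h                                       # y^(i) - h_theta(x^(i))
    return [t + alpha * err * xi for t, xi in zip(theta, x)]

alpha = 0.01                      # initial learning rate
theta = [0.0, 0.0]
for _ in range(3):
    theta = sgd_step(theta, x=[1.0, 2.0], y=1.0, alpha=alpha)
    alpha *= 0.1                  # multiply by 0.1 each iteration
print(theta)
```

Because the rate decays geometrically, later steps contribute progressively smaller corrections, which stabilizes the parameters early in training.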
The pneumonia X-ray image detection method based on mixed spatial and inter-channel attention provided by the invention preprocesses the selected lung X-ray images and assembles them into a data set to enhance the image characteristics; the processed lung X-ray images are used to tune parameters and train a convolutional neural network for feature extraction. The feature extraction network adopts a ResNet101 network, which realizes feature fusion among different feature maps by constructing residual blocks and alleviates the gradient-vanishing and gradient-explosion problems of deep networks. The lung X-ray images in the training set sequentially undergo convolution operations and feature fusion, a mixed spatial and inter-channel attention mechanism is added, and finally a network classifier is constructed to perform image classification diagnosis and obtain the prediction result.
Example 2
On the basis of the above embodiment 1, in the embodiment of the present invention, the size of the lung X-ray image is set to 3 × 224 × 224, and the batch is set to 16. The pneumonia image detection method comprises the following steps:
step S201: after passing through the first characteristic network and the second characteristic network, the size of a characteristic tensor X formed by all characteristic images F of each batch of image sets is [16,256,7,7 ];
step S202: using a1 × 1 convolution operation on the feature tensor X, and dividing by the regularized transpose;
step S203: unfolding the feature tensor X from the second dimension, thereby separating the feature tensor of each pulmonary X-ray image into X_1, X_2, X_3, …, X_49;
Step S204: obtaining global class-independent features g by performing a global average pooling operation on the features of all positions in the feature tensor X:
Figure BDA0003648489360000121
step S205: calculating out
Figure BDA0003648489360000122
Taking the value of the first dimension; then, carrying out weighted combination operation of the feature tensors on the score to obtain the maximum value of all spatial positions of each category, and obtaining a feature tensor a with specific category:
Figure BDA0003648489360000123
step S206: the final f_i is obtained by adding the class-specific feature and the global class-independent feature: f_i = g + λ·a_i, wherein λ = 0.1; the final prediction result can be obtained by applying f_i to each score tensor and fusing;
step S207: expanding f_i to [16, 2048, 1];

step S208: expanding f_i to the same size as the feature tensor X in step S201, and multiplying it with the feature tensor X in step S201 to obtain a new feature tensor X;
step S209: by detecting the new feature tensor X by using the network classifier in embodiment 1, a detection result, that is, whether the input X-ray image of the lung belongs to the pneumonia image can be obtained.
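The tensor shapes that flow through steps S201–S203 can be checked with a one-line numpy reshape; the moveaxis view exposing one row per spatial position is an illustrative convenience, not part of the method.

```python
import numpy as np

# Check the shapes of steps S201-S203: a batch tensor [16, 256, 7, 7] is
# unfolded from the second dimension into 49 position vectors X_1..X_49.
X = np.zeros((16, 256, 7, 7))
X_flat = X.reshape(16, 256, 7 * 7)      # flatten from the second dimension -> [16, 256, 49]
positions = np.moveaxis(X_flat, 1, 2)   # [16, 49, 256]: one 256-d vector per position
print(X_flat.shape, positions.shape)    # -> (16, 256, 49) (16, 49, 256)
```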
Example 3
The embodiment of the invention also provides a pneumonia image detection device based on attention between the mixed space and the channels, which comprises:
the preprocessing module is used for preprocessing the data of the lung X-ray image;
the first feature network construction module is used for constructing a first feature network, and performing feature extraction on the preprocessed lung X-ray image by adopting the first feature network to obtain a feature map C;
the second feature network construction module is used for constructing a second feature network, and performing feature extraction on the feature map C by adopting the second feature network to obtain a feature map F;
the attention mechanism construction module is used for constructing an attention module of attention between a mixing space and a channel, and the attention module is adopted to process the characteristic diagram F to obtain a new characteristic diagram or a new characteristic tensor X;
and the classifier building module is used for building a network classifier, and detecting the characteristic diagram by adopting the network classifier to obtain a detection result.
The pneumonia image detection device provided by the embodiment of the invention is used for realizing the method embodiment, and specific functions of the pneumonia image detection device can refer to the method embodiment, and are not described again here.

Claims (9)

1. The pneumonia image detection method based on mixed space and inter-channel attention is characterized by comprising the following steps:
step 1: carrying out data preprocessing on the lung X-ray image;
step 2: constructing a first feature network, and performing feature extraction on the preprocessed lung X-ray image by adopting the first feature network to obtain a feature map C;
and 3, step 3: constructing a second feature network, and extracting features of the feature map C by adopting the second feature network to obtain a feature map F;
and 4, step 4: constructing an attention module for mixing spatial attention and inter-channel attention, and processing the characteristic diagram F by adopting the attention module to obtain a characteristic tensor X;
and 5: and constructing a network classifier, and detecting the characteristic tensor X by adopting the network classifier to obtain a detection result.
2. The pneumonia image detection method based on mixed space and inter-channel attention according to claim 1 is characterized in that step 1 specifically comprises:
step 1.1: screening out unsatisfactory lung X-ray images;
step 1.2: dividing a data set consisting of all lung X-ray images meeting the requirements into a training set, a verification set and a test set;
step 1.3: converting each lung X-ray image in the data set into an RGB three-channel image;
step 1.4: carrying out image enhancement on each RGB three-channel image;
step 1.5: converting each RGB three-channel image subjected to image enhancement into a tensor image;
step 1.6: regularizing each channel of each RGB three-channel image, and then normalizing the tensor image according to a mean vector and a standard vector of three channels;
step 1.7: and converting each normalized tensor image into a gray level image.
3. The pneumonia image detection method based on mixed space and inter-channel attention according to claim 2 is characterized in that step 1.4 specifically comprises:
step A1: horizontally overturning the RGB three-channel image according to the overturning probability of 0.5;
step A2: adjusting the image attribute of the reversed RGB three-channel image, specifically: setting the brightness offset amplitude to 0.5, the contrast offset amplitude to 0.5, the saturation offset amplitude to 0.5, and the hue offset amplitude to 0;
step A3: setting a random cutting area ratio to be (0.7, 1.0), randomly cutting the RGB three-channel image with the adjusted image attribute to different sizes and width-height ratios, and then scaling the size of the cut RGB three-channel image to 224 multiplied by 224 pixels;
step A4: and automatically augmenting each scaled RGB three-channel image by using RandAugment.
4. The pneumonia image detection method based on mixed space and inter-channel attention according to claim 1 is characterized in that the first feature network adopts ResNet101 using Inception convolution as a backbone network, and comprises 5 feature extraction layers in total, which are respectively: the first feature layer, the second feature layer, the third feature layer, the fourth feature layer and the fifth feature layer;
the feature extraction process of the first feature layer comprises the following steps: firstly, carrying out convolution operation on the input preprocessed lung X-ray image by adopting 64 convolution kernels with the number of channels being 3, then carrying out Batchnormalization operation by adopting a BN layer, then processing by adopting a ReLu activation function, and finally inputting to a maximum pooling layer with the number of channels being 64 to obtain a characteristic map C1;
the feature extraction process of the second feature layer comprises a first branch and a second branch, the two branches sequentially repeat three times of feature extraction operations on the input feature map C1 according to respective feature extraction processes, and then the last output of the two branches is subjected to preset processing operation to obtain a feature map C2; the preset processing operation specifically includes: adding the outputs of the two branches and then processing by adopting a ReLu activation function;
the feature extraction process of the third feature layer comprises a third branch and a fourth branch, the two branches sequentially perform feature extraction operations on the input feature map C2 in the third feature layer under different parameter states for four times according to respective feature extraction processes, and then execute the preset processing operation on the last output of the two branches to obtain a feature map C3;
the feature extraction process of the fourth feature layer comprises a fifth branch and a sixth branch, the two branches sequentially perform twenty-three times of feature extraction operations on the input feature map C3 in the fourth feature layer under different parameter states according to respective feature extraction processes, and then perform the preset processing operation on the last output of the two branches to obtain a feature map C4;
the feature extraction process of the fifth feature layer includes a seventh branch and an eighth branch, the two branches sequentially perform three feature extraction operations on the input feature map C4 in the fifth feature layer under different parameter states according to respective feature extraction processes, then perform the preset processing operation on the last output of the two branches to obtain a feature map C5, and use the feature map C5 as a final feature map C.
5. The pneumonia image detection method based on mixed space and inter-channel attention according to claim 4 wherein said second feature extraction network employs an FPN network;
the feature extraction process of the FPN network comprises the following steps: reducing the number of channels of the feature map C5 from 2048 to 256 by using a 1 × 1 convolution, then performing an up-sampling operation to obtain a feature map with the same size as the feature map C4, recorded as the feature map C5_up, and performing weighted summation on the feature map C5_up and the feature map C4 to obtain a feature map P; performing feature fusion on the feature map P by adopting a 3 × 3 convolution to obtain a feature map F; finally, transversely connecting the feature map F and increasing the number of channels to 2048.
6. The pneumonia image detection method based on mixed space and inter-channel attention of claim 1 is characterized in that the attention module processes the feature map F including:
step B1: for each batch of image sets containing m lung X-ray images, each of size H_0 × W_0, a corresponding feature map F is obtained after the lung X-ray image passes through a first feature extraction network and a second feature extraction network; all feature maps F of each batch of image sets form a feature tensor X;
step B2: performing 1 × 1 convolution operation on the feature tensor X, and dividing by the regularized transpose;
step B3: unfolding the feature tensor X obtained in step B2 from the second dimension using the flatten() function, thereby separating the feature tensor of each pulmonary X-ray image into X_1, X_2, X_3, …, X_{H×W};
step B4: carrying out a global average pooling operation on the features of all positions in the feature tensor X to obtain the global class-independent feature g:

g = (1/(H×W)) · Σ_{k=1}^{H×W} x_k

step B5: computing

score_i^j = exp(T · x_j^T · m_i) / Σ_{k=1}^{H×W} exp(T · x_k^T · m_i)

and obtaining, for each category, the maximum over all spatial positions by performing a weighted combination of the feature tensors with score, yielding the class-specific feature tensor a:

a_i = Σ_{j=1}^{H×W} score_i^j · x_j

wherein T > 0 is a hyper-parameter, x_j^T and x_k^T respectively denote the transposes of X_j and X_k, and m_i denotes the classifier parameter of the i-th class;
step B6: obtaining the final f_i according to the class-specific feature tensor a and the global class-independent feature g: f_i = g + λ·a_i; and expanding f_i to [m, 2048, 1]; wherein f_i represents the feature vector of the i-th class;
step B7: expanding f_i to the same size as the feature tensor X in step B1, and multiplying it with the feature tensor X in step B1 to obtain a new feature tensor X.
7. The pneumonia image detection method based on mixed space and inter-channel attention of claim 1 is characterized in that the detection process of the network classifier specifically comprises the following steps:
step 5.1: using adaptive average pooling for the feature tensor X;
step 5.2: unfolding the feature tensor X in step 5.1 from the first dimension by using the flatten() function, and then inputting the unfolded feature tensor X into a full connection layer to perform linear conversion to obtain X′, specifically: X′ = X·A^T + b; wherein A^T represents the transpose of A, A is the parameter matrix of the full connection layer, and b is a bias row vector;
step 5.3: inputting the linearly converted feature tensor X′ into a ReLU activation function, specifically: ReLU(X′) = max(0, X′);
step 5.4: inputting the feature tensor X 'output in the step 5.3 into the full-connection layer again for linear conversion, and outputting a feature tensor X' with the size of [2048, 2 ]:
step 5.5: inputting the linearly converted feature tensor X″ into a loss function to calculate a loss value; then modifying the training model through an optimizer, the loss function, data evaluation and hyper-parameters; wherein the loss function adopts a binary cross entropy function, specifically:

loss = −(1/C) · Σ_{c=1}^{C} [ y_c · log(ŷ_c) + (1 − y_c) · log(1 − ŷ_c) ]

wherein C is the number of categories, y represents the true value of the image, and ŷ represents the predicted value of the image.
8. The pneumonia image detection method based on mixed spatial and interchannel attention of claim 7 wherein in step 5.5 said optimizer employs an SGD optimizer.
9. Pneumonia image detection device based on mixed space and interchannel attention includes:
the preprocessing module is used for preprocessing the data of the lung X-ray image;
the first feature network construction module is used for constructing a first feature network, and performing feature extraction on the preprocessed lung X-ray image by adopting the first feature network to obtain a feature map C;
the second feature network construction module is used for constructing a second feature network, and performing feature extraction on the feature map C by adopting the second feature network to obtain a feature map F;
the attention mechanism construction module is used for constructing an attention module for mixing space attention and inter-channel attention, and the attention module is adopted for processing the characteristic diagram F to obtain a characteristic tensor X;
and the classifier building module is used for building a network classifier, and detecting the characteristic tensor X by adopting the network classifier to obtain a detection result.
CN202210536524.7A 2022-05-17 2022-05-17 Pneumonia image detection method and device based on mixed space and inter-channel attention Pending CN114782403A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210536524.7A CN114782403A (en) 2022-05-17 2022-05-17 Pneumonia image detection method and device based on mixed space and inter-channel attention


Publications (1)

Publication Number Publication Date
CN114782403A true CN114782403A (en) 2022-07-22

Family

ID=82437616

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210536524.7A Pending CN114782403A (en) 2022-05-17 2022-05-17 Pneumonia image detection method and device based on mixed space and inter-channel attention

Country Status (1)

Country Link
CN (1) CN114782403A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024045320A1 (en) * 2022-08-31 2024-03-07 北京龙智数科科技服务有限公司 Facial recognition method and apparatus


Similar Documents

Publication Publication Date Title
CN113065558B (en) Lightweight small target detection method combined with attention mechanism
CN110443143B (en) Multi-branch convolutional neural network fused remote sensing image scene classification method
CN108960143B (en) Ship detection deep learning method in high-resolution visible light remote sensing image
CN106529447B (en) Method for identifying face of thumbnail
CN110348319B (en) Face anti-counterfeiting method based on face depth information and edge image fusion
CN109685819B (en) Three-dimensional medical image segmentation method based on feature enhancement
Lin et al. Hyperspectral image denoising via matrix factorization and deep prior regularization
CN112580782B (en) Channel-enhanced dual-attention generation countermeasure network and image generation method
CN112288011B (en) Image matching method based on self-attention deep neural network
CN112348036A (en) Self-adaptive target detection method based on lightweight residual learning and deconvolution cascade
WO2022083335A1 (en) Self-attention mechanism-based behavior recognition method
CN112734764A (en) Unsupervised medical image segmentation method based on countermeasure network
CN110648311A (en) Acne image focus segmentation and counting network model based on multitask learning
CN115222998B (en) Image classification method
CN111680755A (en) Medical image recognition model construction method, medical image recognition device, medical image recognition medium and medical image recognition terminal
CN111899203A (en) Real image generation method based on label graph under unsupervised training and storage medium
CN115049952A (en) Juvenile fish limb identification method based on multi-scale cascade perception deep learning network
CN115880523A (en) Image classification model, model training method and application thereof
CN116229230A (en) Vein recognition neural network model, method and system based on multi-scale transducer
CN114782403A (en) Pneumonia image detection method and device based on mixed space and inter-channel attention
CN115100165A (en) Colorectal cancer T staging method and system based on tumor region CT image
CN117115675A (en) Cross-time-phase light-weight spatial spectrum feature fusion hyperspectral change detection method, system, equipment and medium
CN117036948A (en) Sensitized plant identification method based on attention mechanism
CN116189160A (en) Infrared dim target detection method based on local contrast mechanism
CN109829377A (en) A kind of pedestrian's recognition methods again based on depth cosine metric learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination