CN114782403A - Pneumonia image detection method and device based on mixed space and inter-channel attention - Google Patents

Pneumonia image detection method and device based on mixed space and inter-channel attention

Info

Publication number
CN114782403A
Authority
CN
China
Prior art keywords
feature
image
tensor
network
attention
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210536524.7A
Other languages
Chinese (zh)
Inventor
庞子龙
莫也
马韶胤
武戈
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Henan University
Original Assignee
Henan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Henan University filed Critical Henan University
Priority to CN202210536524.7A priority Critical patent/CN114782403A/en
Publication of CN114782403A publication Critical patent/CN114782403A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/0002Inspection of images, e.g. flaw detection
    • G06T7/0012Biomedical image inspection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformations in the plane of the image
    • G06T3/60Rotation of whole images or parts thereof
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10116X-ray image
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20112Image segmentation details
    • G06T2207/20132Image cropping
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30004Biomedical image processing
    • G06T2207/30061Lung

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Molecular Biology (AREA)
  • Evolutionary Biology (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Medical Informatics (AREA)
  • Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
  • Radiology & Medical Imaging (AREA)
  • Quality & Reliability (AREA)
  • Apparatus For Radiation Diagnosis (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a pneumonia image detection method and device based on mixed spatial and inter-channel attention. The method comprises the following steps: Step 1: perform data preprocessing on the lung X-ray image; Step 2: construct a first feature network, and perform feature extraction on the preprocessed lung X-ray image with the first feature network to obtain a feature map C; Step 3: construct a second feature network, and extract features of the feature map C with the second feature network to obtain a feature map F; Step 4: construct an attention module mixing spatial attention and inter-channel attention, and process the feature map F with the attention module to obtain a feature tensor X; Step 5: construct a network classifier, and detect the feature tensor X with the network classifier to obtain a detection result.

Description

Pneumonia image detection method and device based on mixed space and inter-channel attention
Technical Field
The invention relates to the technical field of medical image recognition, in particular to a pneumonia image detection method and device based on mixed space and inter-channel attention.
Background
Pneumonia is an inflammation of the terminal airways, alveoli and pulmonary interstitium, and can be classified into bacterial pneumonia, viral pneumonia and so on. Its etiologies are numerous and its morbidity is high, making it one of the most common infectious diseases. Early diagnosis of pneumonia is critical to its successful cure. Pneumonia may be detected by X-ray imaging, pulmonary CT, magnetic resonance imaging (MRI), and the like. Lung X-ray examination has the advantages of a convenient procedure, a small radiation dose and low cost, and is the first choice in current clinical detection. For the doctor, however, checking the lesion information in a lung medical image through manual radiograph interpretation is a complicated task: the traditional interpretation method usually consumes a great deal of time and energy, the accuracy of diagnosis depends mainly on the doctor's skill and work experience, and misdiagnosis and missed diagnosis may occur due to visual fatigue, environmental disturbance and the like.
Since the start of the 21st century, object detection has emerged with the development of computer science, mainly image recognition and pattern recognition technologies. The main task of object detection is to identify the class of objects in the input image and their location coordinates; the types of objects that can be detected are defined by manually specifying the desired objects in the image. Because the shape, size and position of objects differ from picture to picture, improving the accuracy of target detection is a constant need. At present, artificial-intelligence-assisted diagnosis of medical images can reach expert-level precision; applying artificial intelligence to pneumonia diagnosis can effectively improve diagnostic efficiency and quality, and help relieve the imbalance of medical resources. Lesion-region detection on lung X-ray images helps the doctor make a diagnosis by automatically analyzing the image and outputting information such as the position and size of the lesion region. However, the lung X-ray detection task differs from other image detection tasks: lung X-ray images exhibit high inter-class similarity and low intra-class variability, that is, image features of different classes are highly similar while images of the same class differ little. Training on data with these characteristics easily causes model bias and overfitting, reduces the generalization ability of the network and increases the difficulty of image recognition, so the pneumonia detection effect of a purely conventional network is unsatisfactory, and the classification precision still needs to be improved through a better network structure.
Disclosure of Invention
In order to improve the pneumonia detection effect based on lung X-ray images, the invention provides a pneumonia image detection method and device based on mixed spatial and inter-channel attention. The specific scheme is as follows:
The invention provides a pneumonia image detection method based on mixed spatial and inter-channel attention, comprising the following steps:
Step 1: perform data preprocessing on the lung X-ray image;
Step 2: construct a first feature network, and perform feature extraction on the preprocessed lung X-ray image with the first feature network to obtain a feature map C;
Step 3: construct a second feature network, and extract features of the feature map C with the second feature network to obtain a feature map F;
Step 4: construct an attention module mixing spatial attention and inter-channel attention, and process the feature map F with the attention module to obtain a feature tensor X;
Step 5: construct a network classifier, and detect the feature tensor X with the network classifier to obtain a detection result.
Further, step 1 specifically includes:
step 1.1: screening out unsatisfactory lung X-ray images;
step 1.2: dividing a data set consisting of all lung X-ray images meeting the requirements into a training set, a verification set and a test set;
step 1.3: converting each lung X-ray image in the data set into an RGB three-channel image;
step 1.4: carrying out image enhancement on each RGB three-channel image;
step 1.5: converting each RGB three-channel image subjected to image enhancement into a tensor image;
step 1.6: normalize each channel of each RGB three-channel image, then standardize the tensor image according to the mean vector and standard-deviation vector of the three channels;
step 1.7: and converting each normalized tensor image into a gray level image.
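Steps 1.5–1.7 can be sketched as follows. This is a minimal NumPy illustration: the mean/std arguments and the BT.601 grayscale weights are assumptions for demonstration, not values fixed by the patent text.

```python
import numpy as np

def to_tensor_and_normalize(img, mean, std):
    """Convert an HxWx3 uint8 RGB image to a float tensor in [0, 1]
    (step 1.5), then standardize each channel with the given
    per-channel mean/std vectors (step 1.6)."""
    x = img.astype(np.float32) / 255.0          # scale pixel values to [0, 1]
    mean = np.asarray(mean, dtype=np.float32)   # shape (3,)
    std = np.asarray(std, dtype=np.float32)     # shape (3,)
    return (x - mean) / std                     # per-channel standardization

def to_grayscale(x):
    """Step 1.7: collapse the 3 channels to a single gray channel.
    The ITU-R BT.601 luma weights are a common convention, assumed here."""
    w = np.array([0.299, 0.587, 0.114], dtype=np.float32)
    return x @ w
```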
Further, step 1.4 specifically includes:
step A1: horizontally flip the RGB three-channel image with a flipping probability of 0.5;
step A2: adjusting the image attribute of the reversed RGB three-channel image, specifically: setting the brightness offset amplitude to 0.5, the contrast offset amplitude to 0.5, the saturation offset amplitude to 0.5, and the hue offset amplitude to 0;
step A3: set the random cropping area ratio to (0.7, 1.0), randomly crop the attribute-adjusted RGB three-channel image to different sizes and aspect ratios, then scale the cropped RGB three-channel image to 224 × 224 pixels;
step A4: automatically augment each scaled RGB three-channel image using RandAugment.
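Steps A1–A4 map onto standard augmentation primitives (in practice, torchvision's RandomHorizontalFlip, ColorJitter, RandomResizedCrop and RandAugment cover them). As a dependency-free illustration, the flip of step A1 and the random resized crop of step A3 can be sketched in NumPy; the nearest-neighbour resizing and the fixed seed are simplifications of this sketch, not part of the method:

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed only for reproducibility of the sketch

def random_hflip(img, p=0.5):
    """Step A1: flip the image horizontally (along the width axis)
    with probability p."""
    return img[:, ::-1] if rng.random() < p else img

def random_resized_crop(img, scale=(0.7, 1.0), out_size=224):
    """Step A3: crop a random area fraction in `scale` with a random
    aspect ratio, then resize to out_size x out_size
    (nearest-neighbour resize for brevity)."""
    h, w = img.shape[:2]
    area = h * w * rng.uniform(*scale)
    ratio = rng.uniform(3 / 4, 4 / 3)                  # random aspect ratio
    ch = min(h, int(round(np.sqrt(area / ratio))))     # crop height
    cw = min(w, int(round(np.sqrt(area * ratio))))     # crop width
    top = rng.integers(0, h - ch + 1)
    left = rng.integers(0, w - cw + 1)
    crop = img[top:top + ch, left:left + cw]
    ys = (np.arange(out_size) * ch / out_size).astype(int)
    xs = (np.arange(out_size) * cw / out_size).astype(int)
    return crop[ys][:, xs]
```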
Further, the first feature network adopts ResNet101 with Inception convolution as the backbone network, and includes 5 feature extraction layers in total: the first, second, third, fourth and fifth feature layers;
the feature extraction process of the first feature layer is as follows: first, a convolution operation is performed on the input preprocessed lung X-ray image with 64 convolution kernels of 3 input channels; then a Batch Normalization operation is performed with a BN layer; the result is then processed with a ReLU activation function and finally input to a max pooling layer with 64 channels to obtain feature map C1;
the feature extraction process of the second feature layer comprises a first branch and a second branch, the two branches sequentially repeat three times of feature extraction operations on the input feature map C1 according to respective feature extraction processes, and then the last output of the two branches is subjected to preset processing operation to obtain a feature map C2; the preset processing operation specifically includes: adding the outputs of the two branches and then processing by adopting a ReLu activation function;
the feature extraction process of the third feature layer comprises a third branch and a fourth branch, the two branches sequentially perform feature extraction operations on the input feature map C2 in the third feature layer under different parameter states for four times according to respective feature extraction processes, and then execute the preset processing operation on the last output of the two branches to obtain a feature map C3;
the feature extraction process of the fourth feature layer comprises a fifth branch and a sixth branch, the two branches sequentially perform twenty-three times of feature extraction operations on the input feature map C3 in the fourth feature layer under different parameter states according to respective feature extraction processes, and then perform the preset processing operation on the last output of the two branches to obtain a feature map C4;
the feature extraction process of the fifth feature layer includes a seventh branch and an eighth branch, the two branches sequentially perform three feature extraction operations on the input feature map C4 in the fifth feature layer under different parameter states according to respective feature extraction processes, then perform the preset processing operation on the last output of the two branches to obtain a feature map C5, and use the feature map C5 as a final feature map C.
Further, the second feature extraction network adopts an FPN network;
the feature extraction process of the FPN network is as follows: reduce the number of channels of feature map C5 from 2048 to 256 with a 1 × 1 convolution, then perform an up-sampling operation to obtain a feature map of the same size as feature map C4, denoted C5_up, and perform a weighted summation of C5_up and C4 to obtain feature map P; perform feature fusion on feature map P with a 3 × 3 convolution to obtain feature map F; laterally connect feature map F and increase the number of channels back to 2048.
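The top-down FPN step just described can be sketched with toy shapes in NumPy. Channel counts are scaled down from 2048/256 for illustration, and the fusion weight `alpha` is an assumption, since the text does not specify the weighting:

```python
import numpy as np

def conv1x1(x, w):
    """A 1x1 convolution is a per-pixel channel-mixing matmul:
    x is (C_in, H, W), w is (C_out, C_in)."""
    c, h, w_ = x.shape
    return (w @ x.reshape(c, -1)).reshape(w.shape[0], h, w_)

def upsample2x(x):
    """Nearest-neighbour 2x upsampling over the spatial axes."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def fpn_merge(c5, c4, w_reduce, alpha=0.5):
    """Top-down FPN step from the text: a 1x1 conv reduces C5's
    channels, it is upsampled to C4's spatial size, then combined
    with C4 by a weighted sum (C4 is assumed already channel-matched)."""
    c5_up = upsample2x(conv1x1(c5, w_reduce))
    return alpha * c5_up + (1 - alpha) * c4
```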
Further, the processing procedure of the feature map F by the attention module includes:
step B1: for each image set containing m lung X-ray images, each of size H0 × W0, pass each lung X-ray image through the first and second feature networks to obtain the corresponding feature map F; all feature maps F of each image set form a feature tensor X;
step B2: performing 1 × 1 convolution operation on the feature tensor X, and dividing the convolution operation by the regularized transpose of the feature tensor X;
step B3: unfold the feature tensor X obtained in step B2 from the second dimension using the flatten() function, thereby separating the feature tensor of each lung X-ray image into X_1, X_2, X_3, …, X_{H×W};
Step B4: perform a global average pooling operation on the features of all positions in the feature tensor X to obtain the global class-independent feature g:

g = (1 / (H × W)) · Σ_{j=1}^{H×W} X_j
step B5: compute, for each class i and each position j, the attention score

score_i^j = exp(X_j^T · m_i / T) / Σ_{k=1}^{H×W} exp(X_k^T · m_i / T)

and obtain the class-specific feature tensor a by a weighted combination of the feature tensors with score (as T → 0 this approaches taking the maximum over all spatial positions for each class):

a_i = Σ_{j=1}^{H×W} score_i^j · X_j

where T > 0 is a hyper-parameter, X_j^T and X_k^T denote the transposes of X_j and X_k respectively, and m_i denotes the classifier parameter of the i-th class;
step B6: obtain the final f_i from the class-specific feature tensor a and the global class-independent feature g: f_i = g + λ · a_i; then reshape f_i to [m, 2048, 1], where f_i denotes the feature vector of the i-th class;
step B7: expand f_i to the same size as the feature tensor X in step B1, and multiply it with the feature tensor X in step B1 to obtain a new feature tensor X.
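Steps B3–B6 can be sketched in NumPy as follows, treating the feature map of one image as N = H×W position vectors. The softmax over positions realizes the score formula of step B5; the value of λ and the toy shapes are illustrative assumptions:

```python
import numpy as np

def hybrid_attention(X, m, T=1.0, lam=0.5):
    """Sketch of steps B3-B6: X is (N, D) with N = H*W spatial
    positions and D channels; m is (num_classes, D) classifier
    parameters. Returns f with rows f_i = g + lam * a_i."""
    g = X.mean(axis=0)                           # B4: global average pooling -> g
    logits = (m @ X.T) / T                       # (classes, N): X_j^T m_i / T
    logits -= logits.max(axis=1, keepdims=True)  # numerically stable softmax
    score = np.exp(logits)
    score /= score.sum(axis=1, keepdims=True)    # B5: softmax over positions
    a = score @ X                                # B5: weighted combination -> a_i
    return g + lam * a                           # B6: f_i = g + lambda * a_i
```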
Further, the detection process of the network classifier specifically includes:
step 5.1: apply adaptive average pooling to the feature tensor X;
step 5.2: unfold the feature tensor X of step 5.1 from the first dimension using the flatten() function, then input the unfolded feature tensor X into a fully connected layer for linear conversion to obtain X′, specifically: X′ = X · A^T + b, where A^T denotes the transpose of A, A is the parameter matrix of the fully connected layer, and b is a bias row vector;
step 5.3: input the linearly converted feature vector X′ into a ReLU activation function, specifically: ReLU(X′) = (X′)_+ = max(0, X′);
Step 5.4: input the feature tensor X′ output in step 5.3 into the fully connected layer again for linear conversion, and output a feature tensor X″ of size [2048, 2];

step 5.5: input the linearly converted feature tensor X″ into a loss function to calculate the loss value; the model is then trained via the optimizer, loss-function evaluation and hyper-parameter adjustment. The loss function adopts binary cross-entropy, specifically

L = −(1/C) · Σ_{i=1}^{C} [ y_i · log(ŷ_i) + (1 − y_i) · log(1 − ŷ_i) ]

where ŷ_i = 1 / (1 + e^{−x_i}), C is the number of categories, y denotes the true label of the image, and ŷ denotes the predicted value of the image.
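A minimal NumPy sketch of the classifier of steps 5.2–5.5. The layer shapes are illustrative, and the sigmoid applied before the binary cross-entropy is an assumption consistent with the predicted value ŷ used in step 5.5:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def classifier_head(X, A1, b1, A2, b2):
    """Steps 5.2-5.4: two linear layers with a ReLU in between:
    X' = X A1^T + b1, then ReLU, then X'' = ReLU(X') A2^T + b2."""
    h = np.maximum(X @ A1.T + b1, 0.0)  # linear + ReLU (steps 5.2-5.3)
    return h @ A2.T + b2                # second linear layer (step 5.4)

def binary_cross_entropy(logits, y):
    """Step 5.5: mean binary cross-entropy over C categories,
    with y_hat = sigmoid(logits)."""
    p = sigmoid(logits)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))
```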
Further, in step 5.5, the optimizer adopts an SGD optimizer.
The invention also provides a pneumonia image detection device based on mixed space and inter-channel attention, which comprises:
the preprocessing module is used for preprocessing the data of the lung X-ray image;
the first feature network construction module is used for constructing a first feature network, and performing feature extraction on the preprocessed lung X-ray image by adopting the first feature network to obtain a feature map C;
the second feature network construction module is used for constructing a second feature network, and the second feature network is adopted to perform feature extraction on the feature map C to obtain a feature map F;
the attention mechanism construction module is used for constructing an attention module for mixing space attention and inter-channel attention, and the attention module is adopted for processing the feature map F to obtain a feature tensor X;
and the classifier building module is used for building a network classifier, and detecting the characteristic tensor X by adopting the network classifier to obtain a detection result.
The invention has the beneficial effects that:
1. The invention adds an attention mechanism mixing spatial and inter-channel attention into the classification process of lung X-ray images, generates class-specific features for the two classes (pneumonia and no pneumonia), and improves performance without any additional training burden.
2. The invention applies the attention mechanism to the feature fusion process, which can effectively exploit the information useful for pneumonia detection in different feature extraction layers and suppress irrelevant noise, thereby improving detection efficiency.
Drawings
Fig. 1 is a schematic flowchart of a pneumonia image detection method based on mixed space and inter-channel attention according to an embodiment of the present invention;
fig. 2 is a schematic structural diagram of a first feature network according to an embodiment of the present invention;
fig. 3 is a schematic structural diagram of a second feature network according to an embodiment of the present invention;
fig. 4 is a schematic structural diagram of an attention module for mixing attention between a space and a channel according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the technical solutions in the embodiments of the present invention will be described clearly below with reference to the accompanying drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Example 1
As shown in fig. 1, an embodiment of the present invention provides a pneumonia image detection method based on mixed space and inter-channel attention, including the following steps:
s101: carrying out data preprocessing on the lung X-ray image;
specifically, the method specifically comprises the following steps:
s1011: screening out unsatisfactory lung X-ray images;
s1012: dividing the lung X-ray image data set into a training set, a verification set and a test set;
s1013: converting the lung X-ray image into an RGB three-channel image;
s1014: carrying out image enhancement on the lung X-ray image;
as an implementation, the sub-step specifically includes:
step A1: horizontally flipping the lung X-ray image with probability p = 0.5;
step A2: adjusting the image attribute of the lung X-ray image, specifically: setting the brightness offset amplitude to 0.5, the contrast offset amplitude to 0.5, the saturation offset amplitude to 0.5, and the hue offset amplitude to 0;
step A3: setting the random cropping area ratio to (0.7, 1.0), randomly cropping the lung X-ray image to different sizes and aspect ratios, and scaling the cropped lung X-ray image to 224 × 224 pixels;
step A4: automatic data augmentation of each lung X-ray image using RandAugment.
S1015: converting the picture format of the lung X-ray image into a Tensor format (namely a vector format adopted in training), and normalizing, namely dividing each channel by 255;
s1016: regularizing each channel of the lung X-ray image, and normalizing a tensor image according to a mean vector and a standard vector of the three channels;
specifically, the mean vector of the given three-channel tensor is set to [0, 0, 0] and the standard-deviation vector is set to [1, 1, 1]. The tensor image is then normalized per channel according to the mean vector and standard-deviation vector of the three channels, namely:

x_norm = (x − mean) / std
s1017: the lung X-ray images are converted into grey-scale images.
S102: constructing a first feature network, and performing feature extraction on the preprocessed lung X-ray image by adopting the first feature network to obtain a feature map C;
specifically, as shown in fig. 2, the first feature network adopts ResNet101 with Inception convolution as the backbone network, and includes 5 feature extraction layers in total: the first, second, third, fourth and fifth feature layers;
The Inception convolution gives the convolution kernels of the original ResNet101 independent dilation values across different axes (dimensions), channels and layers. For each layer, the dilation values (d_x^i, d_y^i) of the two axes of each channel come from the trained supernet. Let d be the set of two-axis dilation values of all channels in the layer, expressed as follows:

d = { (d_x^i, d_y^i) | 1 ≤ d_x^i, d_y^i ≤ d_max, i = 1, 2, …, C_out }

where d_x^i and d_y^i denote the x-axis and y-axis dilation values of the i-th channel, d_max is the maximum dilation value, and C_out is the number of output channels.
There are 4 supernets, whose training parameters correspond to the second through fifth feature layers of ResNet101, respectively. Each layer of a supernet consists of several convolutions covering all possible dilation values. After the supernet is trained, a dilation value is selected for each layer according to the principle of minimizing the loss function, which determines the optimal dilation pattern of that layer. The specific method is as follows: for each layer in the supernet, let W be the tensor of all original parameters, W_i the tensor of the original parameters of the i-th channel, and W̃_i the parameters of the dilated convolution kernel of the i-th channel; W̃ denotes the stack of the W̃_i, i ∈ {1, 2, …, C_out}, along the output dimension. Since W and W̃ are independent of X, the expectation over X of the L_1 difference between the convolutions with W and with W̃ is optimized, making L_1 minimal. X is the input lung X-ray image; since the groups of X do not differ greatly, the expectation of X can be replaced by a constant α, namely:

L_1 = | W ⊛ (α · 1) − W̃ ⊛ (α · 1) |

where 1 is the all-ones matrix and ⊛ is the convolution operation. Minimizing L_1 determines the optimal d of each layer. Afterwards, applying the optimal d of each layer as parameters to ResNet101 yields the backbone network of ResNet101 using Inception convolution.
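Independent per-axis dilation, as used in the search above, amounts to spacing out the kernel taps along x and y. The following is a hypothetical NumPy helper that materializes a (dx, dy)-dilated kernel by zero insertion; the selection loop over candidate d values is omitted:

```python
import numpy as np

def dilate_kernel(w, dx, dy):
    """Insert zeros into a 2D kernel to realize per-axis dilation
    (dx along columns, dy along rows): a ky x kx kernel becomes
    ((ky-1)*dy+1) x ((kx-1)*dx+1), with the original taps preserved."""
    ky, kx = w.shape
    out = np.zeros(((ky - 1) * dy + 1, (kx - 1) * dx + 1), dtype=w.dtype)
    out[::dy, ::dx] = w  # place original weights on the dilated grid
    return out
```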
The feature extraction process of the first feature layer comprises the following steps: firstly, carrying out convolution operation on the input preprocessed lung X-ray image by adopting 64 convolution kernels with the number of channels being 3, then carrying out Batchnormalization operation by adopting a BN layer, then processing by adopting a ReLu activation function, and finally inputting to a maximum pooling layer with the number of channels being 64 to obtain a characteristic map C1;
the feature extraction process of the second feature layer comprises a first branch and a second branch, the two branches sequentially repeat three times of feature extraction operations on the input feature map C1 according to respective feature extraction processes, and then the last output of the two branches is subjected to preset processing operation to obtain a feature map C2; the preset processing operation specifically comprises: adding the outputs of the two branches and then processing by adopting a ReLu activation function;
wherein, the feature extraction process of the first branch comprises in sequence: performing convolution operation on an input characteristic diagram by adopting 64 convolution kernels, performing Batch Normalization operation by adopting a BN layer, performing convolution operation by adopting 64 convolution kernels, performing Batch Normalization operation by adopting the BN layer, performing convolution operation by adopting 256 convolution kernels, and performing Batch Normalization operation by adopting the BN layer; the feature extraction process of the second branch comprises the following steps: carrying out convolution operation on the input feature map by adopting 256 convolution kernels;
the feature extraction process of the third feature layer comprises a third branch and a fourth branch, the two branches sequentially perform feature extraction operations on the input feature map C2 in the third feature layer under different parameter states for four times according to respective feature extraction processes, and then execute the preset processing operation on the last output of the two branches to obtain a feature map C3;
wherein the feature extraction process of the third branch sequentially comprises: a first convolution operation on the input feature map with 128 convolution kernels, a Batch Normalization operation with a BN layer, a down-sampling operation with 128 convolution kernels, a Batch Normalization operation with a BN layer, a second convolution operation with 512 convolution kernels, and a Batch Normalization operation with a BN layer; the feature extraction process of the fourth branch comprises: a down-sampling operation on the input feature map with 512 convolution kernels;
the feature extraction process of the fourth feature layer comprises a fifth branch and a sixth branch, the two branches sequentially perform twenty-three times of feature extraction operations on the input feature map C3 in the fourth feature layer under different parameter states according to respective feature extraction processes, and then perform the preset processing operation on the last output of the two branches to obtain a feature map C4;
wherein the feature extraction process of the fifth branch sequentially comprises: a first convolution operation on the input feature map with 256 convolution kernels, a Batch Normalization operation with a BN layer, a down-sampling operation with 256 convolution kernels, a Batch Normalization operation with a BN layer, a second convolution operation with 1024 convolution kernels, and a Batch Normalization operation with a BN layer; the feature extraction process of the sixth branch comprises: a down-sampling operation on the input feature map with 1024 convolution kernels;
the feature extraction process of the fifth feature layer comprises a seventh branch and an eighth branch, the two branches sequentially perform three times of feature extraction operations on the input feature map C4 in the fifth feature layer under different parameter states according to respective feature extraction processes, then perform the preset processing operation on the last output of the two branches to obtain a feature map C5, and use the feature map C5 as a final feature map C;
wherein the feature extraction process of the seventh branch sequentially comprises: a first convolution operation on the input feature map with 512 convolution kernels, a Batch Normalization operation with a BN layer, a down-sampling operation with 512 convolution kernels, a Batch Normalization operation with a BN layer, a second convolution operation with 2048 convolution kernels, and a Batch Normalization operation with a BN layer; the feature extraction process of the eighth branch comprises: a down-sampling operation on the input feature map with 2048 convolution kernels;
s103: constructing a second feature network, and extracting features of the feature graph C by using the second feature network to obtain a feature graph F;
specifically, the second feature network adopts an FPN network; the feature extraction process of the FPN network comprises the following steps: reducing the number of channels of the feature map C5 from 2048 to 256 by using a 1 × 1 convolution, then performing an up-sampling operation to obtain a feature map with the same size as the feature map C4, recorded as the feature map C5_up, and performing weighted summation on the feature map C5_up and the feature map C4 to obtain a feature map P; performing feature fusion on the feature map P by adopting a 3 × 3 convolution to obtain a feature map F; finally, transversely connecting the feature map F and increasing the number of channels to 2048.
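The FPN top-down step above (1×1 channel reduction, upsampling to C4's size, then summation) can be sketched in a few lines of numpy. The random projection matrix standing in for the learned 1×1 kernel, the nearest-neighbor upsampling, the equal summation weights, and the assumption that C4 has already been reduced to 256 channels are all illustrative choices, not details from the patent.

```python
import numpy as np

# Illustrative numpy sketch of the FPN top-down step: a random matrix stands
# in for the learned 1x1 convolution, and C4 is assumed to be already at
# 256 channels (e.g. via its own lateral 1x1 convolution).
rng = np.random.default_rng(0)
C5 = rng.standard_normal((2048, 7, 7))    # deepest feature map
C4 = rng.standard_normal((256, 14, 14))   # one level shallower (assumed 256 channels)

# A 1x1 convolution is a per-pixel linear map over channels: 2048 -> 256.
W = rng.standard_normal((256, 2048)) * 0.01
C5_reduced = np.einsum('oc,chw->ohw', W, C5)             # (256, 7, 7)

# Upsample to C4's spatial size (nearest-neighbor, factor 2).
C5_up = C5_reduced.repeat(2, axis=1).repeat(2, axis=2)   # (256, 14, 14)

P = C5_up + C4    # summation (equal weights assumed for illustration)
print(P.shape)    # -> (256, 14, 14)
```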
S104: constructing an attention module (HSC module) for mixing spatial attention and interchannel attention, and processing the feature map F by using the attention module to obtain a feature tensor X;
specifically, the present step includes the following substeps:
step B1: for each batch of image sets containing m lung X-ray images, each of size H_0 × W_0, a corresponding feature map F is obtained after each lung X-ray image passes through the first feature extraction network and the second feature extraction network; all the feature maps F of each batch of image sets form a feature tensor X;
step B2: performing 1 × 1 convolution operation on the feature tensor X, and dividing by the regularized transpose;
step B3: unfolding the feature tensor X obtained in step B2 from the second dimension by using the flatten() function, so as to separate the feature tensor of each pulmonary X-ray image into X_1, X_2, X_3, …, X_{H×W};
step B4: performing a global average pooling operation on the features of all positions in the feature tensor X to obtain a global class-independent feature g:

g = (1/(H×W)) · Σ_{k=1}^{H×W} x_k

step B5: computing

score_i^j = exp(T · x_j^T · m_i) / Σ_{k=1}^{H×W} exp(T · x_k^T · m_i)

and obtaining, for each category, the maximum over all spatial positions by performing a weighted combination of the feature tensors with score, yielding the class-specific feature tensor a:

a_i = Σ_{j=1}^{H×W} score_i^j · x_j

wherein T > 0 is a hyper-parameter, x_j^T and x_k^T respectively denote the transposes of X_j and X_k, and m_i denotes the classifier parameter of the i-th class;
step B6: obtaining the final f_i according to the class-specific feature tensor a and the global class-independent feature g: f_i = g + λ·a_i; and expanding f_i to [m, 2048, 1]; wherein f_i represents the feature vector of the i-th class;
step B7: expanding f_i to the same size as the feature tensor X in step B1, and then multiplying it with the feature tensor X in step B1 to obtain a new feature tensor X.
This step inputs the fused feature map F into the HSC module so that the spatial attention of each object class can be exploited for higher accuracy. By computing and adjusting the weights of the different channels, the proportion of useful information is increased and that of useless information reduced; by computing and adjusting the weights of all pixels on a feature map, the proportion of effective features is increased and the influence of background information reduced.
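Steps B1–B7 above can be condensed into a small numpy sketch. The toy dimensions, the T and λ values, and the way the per-class vectors f_i are collapsed into a single channel gate before multiplying back onto X are illustrative assumptions; only the g / score / a / f formulas follow the steps above.

```python
import numpy as np

# Toy sketch of the mixed-attention computation in steps B4-B7.
rng = np.random.default_rng(1)
d, n_pos, n_classes = 8, 49, 2      # channels, spatial positions (7x7), classes
T, lam = 1.0, 0.1                   # hyper-parameters (illustrative values)

X = rng.standard_normal((d, n_pos))     # feature vector x_j at each position j
M = rng.standard_normal((n_classes, d)) # classifier parameters m_i

g = X.mean(axis=1)                      # B4: global class-independent feature

logits = T * (M @ X)                    # T * x_j^T m_i for every class i, position j
score = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)  # B5: softmax over positions
a = score @ X.T                         # class-specific features a_i, shape (n_classes, d)

f = g[None, :] + lam * a                # B6: f_i = g + lambda * a_i
gate = f.mean(axis=0)                   # collapse classes to one channel gate (assumption)
X_new = X * gate[:, None]               # B7: reweight the original feature tensor

print(f.shape, X_new.shape)             # -> (2, 8) (8, 49)
```

Larger T sharpens the softmax toward the spatial maximum for each class, which matches the "maximum value of all spatial positions" wording in step B5.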
S105: and constructing a network classifier, and detecting the characteristic tensor X by adopting the network classifier to obtain a detection result.
Specifically, this step includes the following substeps:
s1051: using adaptive average pooling for the feature tensor X;
s1052: unfolding the feature tensor X in step S1051 from the first dimension by using the flatten() function, and then inputting the unfolded feature tensor X into a full connection layer to perform linear conversion to obtain X′, specifically: X′ = X·A^T + b; wherein A^T represents the transpose of A, A is the parameter matrix of the full connection layer, and b is a bias row vector;
s1053: inputting the linearly converted feature tensor X′ into a ReLU activation function, specifically: ReLU(X′) = max(0, X′);
S1054: the feature tensor X' output in step S1053 is input to the full connection layer again for linear conversion, and a feature tensor X ″ having a size of [2048, 2] is output:
s1055: inputting the linearly converted feature tensor X″ into a loss function to calculate a loss value; then modifying the training model through an optimizer, the loss function, data evaluation and hyper-parameters; wherein the loss function adopts a binary cross entropy function, specifically:

loss = −(1/C) · Σ_{c=1}^{C} [ y_c · log(ŷ_c) + (1 − y_c) · log(1 − ŷ_c) ]

wherein C is the number of categories, y represents the true value of the image, and ŷ represents the predicted value of the image.
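A minimal numpy version of the binary cross-entropy loss used in step S1055, with toy labels and predictions; the clipping epsilon is a numerical-stability assumption, not part of the patent's formula.

```python
import numpy as np

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    """loss = -(1/C) * sum_c [ y_c*log(p_c) + (1-y_c)*log(1-p_c) ]."""
    y_pred = np.clip(y_pred, eps, 1 - eps)   # guard against log(0)
    return float(-np.mean(y_true * np.log(y_pred)
                          + (1 - y_true) * np.log(1 - y_pred)))

# Toy example with C = 2 categories:
y = np.array([1.0, 0.0])        # ground-truth values
y_hat = np.array([0.9, 0.2])    # predicted values
print(round(binary_cross_entropy(y, y_hat), 4))   # -> 0.1643
```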
As an implementation manner, in step S1055, the optimizer adopts an SGD optimizer. The update rule is specifically:

θ_j := θ_j + α · (y^(i) − h_θ(x^(i))) · x_j^(i)

wherein α refers to the learning rate, with an initial value of 0.01, multiplied by 0.1 at each iteration; y^(i) − h_θ(x^(i)) denotes the loss term, i denotes the number of cycles, j denotes the parameter number, and θ_j represents the j-th parameter.
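The SGD update above, with the stated learning-rate schedule (initial α = 0.01, multiplied by 0.1 each iteration), can be sketched in plain Python; the linear hypothesis h_θ(x) = θ·x and the toy data are illustrative assumptions.

```python
# Plain-Python sketch of the update theta_j <- theta_j + alpha*(y - h(x))*x_j
# with the decaying learning rate described above; h_theta is a linear
# hypothesis (an illustrative assumption).

def sgd_step(theta, x, y, alpha):
    h = sum(t * xi for t, xi in zip(theta, x))        # h_theta(x) = theta . x
    err = y - h                                       # y^(i) - h_theta(x^(i))
    return [t + alpha * err * xi for t, xi in zip(theta, x)]

alpha = 0.01                      # initial learning rate
theta = [0.0, 0.0]
for _ in range(3):
    theta = sgd_step(theta, x=[1.0, 2.0], y=1.0, alpha=alpha)
    alpha *= 0.1                  # multiply by 0.1 each iteration
print(theta)
```

Because the rate decays geometrically, later steps contribute progressively smaller corrections, which stabilizes the parameters early in training.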
The pneumonia X-ray image detection method based on mixed spatial and inter-channel attention provided by the invention preprocesses the selected lung X-ray images and assembles them into a data set to enhance the image characteristics; the processed lung X-ray images are used to tune parameters and train a convolutional neural network for feature extraction. The feature extraction network adopts a ResNet101 network, which realizes feature fusion among different feature maps by constructing residual blocks and alleviates the gradient-vanishing and gradient-explosion problems of deep networks. The lung X-ray images in the training set sequentially undergo convolution operations and feature fusion, a mixed spatial and inter-channel attention mechanism is added, and finally a network classifier is constructed to perform image classification diagnosis and obtain the prediction result.
Example 2
On the basis of the above embodiment 1, in the embodiment of the present invention, the size of the lung X-ray image is set to 3 × 224 × 224, and the batch is set to 16. The pneumonia image detection method comprises the following steps:
step S201: after passing through the first characteristic network and the second characteristic network, the size of a characteristic tensor X formed by all characteristic images F of each batch of image sets is [16,256,7,7 ];
step S202: using a1 × 1 convolution operation on the feature tensor X, and dividing by the regularized transpose;
step S203: unfolding the feature tensor X from the second dimension, thereby separating the feature tensor of each pulmonary X-ray image into X_1, X_2, X_3, …, X_49;
Step S204: obtaining global class-independent features g by performing a global average pooling operation on the features of all positions in the feature tensor X:
Figure BDA0003648489360000121
step S205: calculating out
Figure BDA0003648489360000122
Taking the value of the first dimension; then, carrying out weighted combination operation of the feature tensors on the score to obtain the maximum value of all spatial positions of each category, and obtaining a feature tensor a with specific category:
Figure BDA0003648489360000123
step S206: the final f_i is obtained by adding the class-specific feature and the global class-independent feature: f_i = g + λ·a_i, wherein λ = 0.1; the final prediction result can be obtained by applying f_i to each score tensor and fusing;
step S207: expanding f_i to [16, 2048, 1];

step S208: expanding f_i to the same size as the feature tensor X in step S201, and multiplying it with the feature tensor X in step S201 to obtain a new feature tensor X;
step S209: by detecting the new feature tensor X by using the network classifier in embodiment 1, a detection result, that is, whether the input X-ray image of the lung belongs to the pneumonia image can be obtained.
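The tensor shapes that flow through steps S201–S203 can be checked with a one-line numpy reshape; the moveaxis view exposing one row per spatial position is an illustrative convenience, not part of the method.

```python
import numpy as np

# Check the shapes of steps S201-S203: a batch tensor [16, 256, 7, 7] is
# unfolded from the second dimension into 49 position vectors X_1..X_49.
X = np.zeros((16, 256, 7, 7))
X_flat = X.reshape(16, 256, 7 * 7)      # flatten from the second dimension -> [16, 256, 49]
positions = np.moveaxis(X_flat, 1, 2)   # [16, 49, 256]: one 256-d vector per position
print(X_flat.shape, positions.shape)    # -> (16, 256, 49) (16, 49, 256)
```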
Example 3
The embodiment of the invention also provides a pneumonia image detection device based on attention between the mixed space and the channels, which comprises:
the preprocessing module is used for preprocessing the data of the lung X-ray image;
the first feature network construction module is used for constructing a first feature network, and performing feature extraction on the preprocessed lung X-ray image by adopting the first feature network to obtain a feature map C;
the second feature network construction module is used for constructing a second feature network, and performing feature extraction on the feature map C by adopting the second feature network to obtain a feature map F;
the attention mechanism construction module is used for constructing an attention module of attention between a mixing space and a channel, and the attention module is adopted to process the characteristic diagram F to obtain a new characteristic diagram or a new characteristic tensor X;
and the classifier building module is used for building a network classifier, and detecting the characteristic diagram by adopting the network classifier to obtain a detection result.
The pneumonia image detection device provided by the embodiment of the invention is used for realizing the method embodiment, and specific functions of the pneumonia image detection device can refer to the method embodiment, and are not described again here.

Claims (9)

1. The pneumonia image detection method based on mixed space and inter-channel attention is characterized by comprising the following steps:
step 1: carrying out data preprocessing on the lung X-ray image;
step 2: constructing a first feature network, and performing feature extraction on the preprocessed lung X-ray image by adopting the first feature network to obtain a feature map C;
and 3, step 3: constructing a second feature network, and extracting features of the feature map C by adopting the second feature network to obtain a feature map F;
and 4, step 4: constructing an attention module for mixing spatial attention and inter-channel attention, and processing the characteristic diagram F by adopting the attention module to obtain a characteristic tensor X;
and 5: and constructing a network classifier, and detecting the characteristic tensor X by adopting the network classifier to obtain a detection result.
2. The pneumonia image detection method based on mixed space and inter-channel attention according to claim 1 is characterized in that step 1 specifically comprises:
step 1.1: screening out unsatisfactory lung X-ray images;
step 1.2: dividing a data set consisting of all lung X-ray images meeting the requirements into a training set, a verification set and a test set;
step 1.3: converting each lung X-ray image in the data set into an RGB three-channel image;
step 1.4: carrying out image enhancement on each RGB three-channel image;
step 1.5: converting each RGB three-channel image subjected to image enhancement into a tensor image;
step 1.6: regularizing each channel of each RGB three-channel image, and then normalizing the tensor image according to a mean vector and a standard vector of three channels;
step 1.7: and converting each normalized tensor image into a gray level image.
3. The pneumonia image detection method based on mixed space and inter-channel attention according to claim 2 is characterized in that step 1.4 specifically comprises:
step A1: horizontally overturning the RGB three-channel image according to the overturning probability of 0.5;
step A2: adjusting the image attribute of the reversed RGB three-channel image, specifically: setting the brightness offset amplitude to 0.5, the contrast offset amplitude to 0.5, the saturation offset amplitude to 0.5, and the hue offset amplitude to 0;
step A3: setting a random cutting area ratio to be (0.7, 1.0), randomly cutting the RGB three-channel image with the adjusted image attribute to different sizes and width-height ratios, and then scaling the size of the cut RGB three-channel image to 224 multiplied by 224 pixels;
step A4: and automatically augmenting each scaled RGB three-channel image by using RandAugment.
4. The pneumonia image detection method based on mixed space and inter-channel attention according to claim 1 is characterized in that the first feature network adopts ResNet101 using Inception convolution as a backbone network, and comprises 5 feature extraction layers in total, which are respectively: the first feature layer, the second feature layer, the third feature layer, the fourth feature layer and the fifth feature layer;
the feature extraction process of the first feature layer comprises the following steps: firstly, carrying out convolution operation on the input preprocessed lung X-ray image by adopting 64 convolution kernels with the number of channels being 3, then carrying out Batchnormalization operation by adopting a BN layer, then processing by adopting a ReLu activation function, and finally inputting to a maximum pooling layer with the number of channels being 64 to obtain a characteristic map C1;
the feature extraction process of the second feature layer comprises a first branch and a second branch, the two branches sequentially repeat three times of feature extraction operations on the input feature map C1 according to respective feature extraction processes, and then the last output of the two branches is subjected to preset processing operation to obtain a feature map C2; the preset processing operation specifically includes: adding the outputs of the two branches and then processing by adopting a ReLu activation function;
the feature extraction process of the third feature layer comprises a third branch and a fourth branch, the two branches sequentially perform feature extraction operations on the input feature map C2 in the third feature layer under different parameter states for four times according to respective feature extraction processes, and then execute the preset processing operation on the last output of the two branches to obtain a feature map C3;
the feature extraction process of the fourth feature layer comprises a fifth branch and a sixth branch, the two branches sequentially perform twenty-three times of feature extraction operations on the input feature map C3 in the fourth feature layer under different parameter states according to respective feature extraction processes, and then perform the preset processing operation on the last output of the two branches to obtain a feature map C4;
the feature extraction process of the fifth feature layer includes a seventh branch and an eighth branch, the two branches sequentially perform three feature extraction operations on the input feature map C4 in the fifth feature layer under different parameter states according to respective feature extraction processes, then perform the preset processing operation on the last output of the two branches to obtain a feature map C5, and use the feature map C5 as a final feature map C.
5. The pneumonia image detection method based on mixed space and inter-channel attention according to claim 4 wherein said second feature extraction network employs an FPN network;
the feature extraction process of the FPN network comprises the following steps: reducing the number of channels of the feature map C5 from 2048 to 256 by using a 1 × 1 convolution, then performing an up-sampling operation to obtain a feature map with the same size as the feature map C4, recorded as the feature map C5_up, and performing weighted summation on the feature map C5_up and the feature map C4 to obtain a feature map P; performing feature fusion on the feature map P by adopting a 3 × 3 convolution to obtain a feature map F; finally, transversely connecting the feature map F and increasing the number of channels to 2048.
6. The pneumonia image detection method based on mixed space and inter-channel attention of claim 1 is characterized in that the attention module processes the feature map F including:
step B1: for each batch of image sets containing m lung X-ray images, each of size H_0 × W_0, a corresponding feature map F is obtained after the lung X-ray image passes through a first feature extraction network and a second feature extraction network; all feature maps F of each batch of image sets form a feature tensor X;
step B2: performing 1 × 1 convolution operation on the feature tensor X, and dividing by the regularized transpose;
step B3: unfolding the feature tensor X obtained in step B2 from the second dimension using the flatten() function, thereby separating the feature tensor of each pulmonary X-ray image into X_1, X_2, X_3, …, X_{H×W};
step B4: carrying out a global average pooling operation on the features of all positions in the feature tensor X to obtain the global class-independent feature g:

g = (1/(H×W)) · Σ_{k=1}^{H×W} x_k

step B5: computing

score_i^j = exp(T · x_j^T · m_i) / Σ_{k=1}^{H×W} exp(T · x_k^T · m_i)

and obtaining, for each category, the maximum over all spatial positions by performing a weighted combination of the feature tensors with score, yielding the class-specific feature tensor a:

a_i = Σ_{j=1}^{H×W} score_i^j · x_j

wherein T > 0 is a hyper-parameter, x_j^T and x_k^T respectively denote the transposes of X_j and X_k, and m_i denotes the classifier parameter of the i-th class;
step B6: obtaining the final f_i according to the class-specific feature tensor a and the global class-independent feature g: f_i = g + λ·a_i; and expanding f_i to [m, 2048, 1]; wherein f_i represents the feature vector of the i-th class;
step B7: expanding f_i to the same size as the feature tensor X in step B1, and multiplying it with the feature tensor X in step B1 to obtain a new feature tensor X.
7. The pneumonia image detection method based on mixed space and inter-channel attention of claim 1 is characterized in that the detection process of the network classifier specifically comprises the following steps:
step 5.1: using adaptive average pooling for the feature tensor X;
step 5.2: unfolding the feature tensor X in step 5.1 from the first dimension by using the flatten() function, and then inputting the unfolded feature tensor X into a full connection layer to perform linear conversion to obtain X′, specifically: X′ = X·A^T + b; wherein A^T represents the transpose of A, A is the parameter matrix of the full connection layer, and b is a bias row vector;
step 5.3: inputting the linearly converted feature tensor X′ into a ReLU activation function, specifically: ReLU(X′) = max(0, X′);
step 5.4: inputting the feature tensor X 'output in the step 5.3 into the full-connection layer again for linear conversion, and outputting a feature tensor X' with the size of [2048, 2 ]:
step 5.5: inputting the linearly converted feature tensor X″ into a loss function to calculate a loss value; then modifying the training model through an optimizer, the loss function, data evaluation and hyper-parameters; wherein the loss function adopts a binary cross entropy function, specifically:

loss = −(1/C) · Σ_{c=1}^{C} [ y_c · log(ŷ_c) + (1 − y_c) · log(1 − ŷ_c) ]

wherein C is the number of categories, y represents the true value of the image, and ŷ represents the predicted value of the image.
8. The pneumonia image detection method based on mixed spatial and interchannel attention of claim 7 wherein in step 5.5 said optimizer employs an SGD optimizer.
9. Pneumonia image detection device based on mixed space and interchannel attention includes:
the preprocessing module is used for preprocessing the data of the lung X-ray image;
the first feature network construction module is used for constructing a first feature network, and performing feature extraction on the preprocessed lung X-ray image by adopting the first feature network to obtain a feature map C;
the second feature network construction module is used for constructing a second feature network, and performing feature extraction on the feature map C by adopting the second feature network to obtain a feature map F;
the attention mechanism construction module is used for constructing an attention module for mixing space attention and inter-channel attention, and the attention module is adopted for processing the characteristic diagram F to obtain a characteristic tensor X;
and the classifier building module is used for building a network classifier, and detecting the characteristic tensor X by adopting the network classifier to obtain a detection result.
CN202210536524.7A 2022-05-17 2022-05-17 Pneumonia image detection method and device based on mixed space and inter-channel attention Pending CN114782403A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210536524.7A CN114782403A (en) 2022-05-17 2022-05-17 Pneumonia image detection method and device based on mixed space and inter-channel attention


Publications (1)

Publication Number Publication Date
CN114782403A true CN114782403A (en) 2022-07-22

Family

ID=82437616

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210536524.7A Pending CN114782403A (en) 2022-05-17 2022-05-17 Pneumonia image detection method and device based on mixed space and inter-channel attention

Country Status (1)

Country Link
CN (1) CN114782403A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024045320A1 (en) * 2022-08-31 2024-03-07 北京龙智数科科技服务有限公司 Facial recognition method and apparatus


Similar Documents

Publication Publication Date Title
CN113065558B (en) Lightweight small target detection method combined with attention mechanism
CN110443143B (en) Multi-branch convolutional neural network fused remote sensing image scene classification method
CN108960143B (en) Ship detection deep learning method in high-resolution visible light remote sensing image
CN106529447B (en) Method for identifying face of thumbnail
CN110348319B (en) Face anti-counterfeiting method based on face depth information and edge image fusion
CN109685819B (en) Three-dimensional medical image segmentation method based on feature enhancement
Lin et al. Hyperspectral image denoising via matrix factorization and deep prior regularization
CN112580782B (en) Channel-enhanced dual-attention generation countermeasure network and image generation method
CN112288011B (en) Image matching method based on self-attention deep neural network
CN112348036A (en) Self-adaptive target detection method based on lightweight residual learning and deconvolution cascade
WO2022083335A1 (en) Self-attention mechanism-based behavior recognition method
CN112734764A (en) Unsupervised medical image segmentation method based on countermeasure network
CN110648311A (en) Acne image focus segmentation and counting network model based on multitask learning
CN115222998B (en) Image classification method
CN111680755A (en) Medical image recognition model construction method, medical image recognition device, medical image recognition medium and medical image recognition terminal
CN111899203A (en) Real image generation method based on label graph under unsupervised training and storage medium
CN115049952A (en) Juvenile fish limb identification method based on multi-scale cascade perception deep learning network
CN115880523A (en) Image classification model, model training method and application thereof
CN116229230A (en) Vein recognition neural network model, method and system based on multi-scale transducer
CN114782403A (en) Pneumonia image detection method and device based on mixed space and inter-channel attention
CN115100165A (en) Colorectal cancer T staging method and system based on tumor region CT image
CN117115675A (en) Cross-time-phase light-weight spatial spectrum feature fusion hyperspectral change detection method, system, equipment and medium
CN117036948A (en) Sensitized plant identification method based on attention mechanism
CN116189160A (en) Infrared dim target detection method based on local contrast mechanism
CN109829377A (en) A kind of pedestrian's recognition methods again based on depth cosine metric learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination