CN114266735A - Method for detecting pathological change abnormality of chest X-ray image - Google Patents

Method for detecting pathological change abnormality of chest X-ray image

Info

Publication number
CN114266735A
Authority
CN
China
Prior art keywords
extraction module
context information
lesion
sequence
inputting
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111484958.9A
Other languages
Chinese (zh)
Inventor
巫义锐
孔其然
袁驰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hohai University HHU
Original Assignee
Hohai University HHU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hohai University HHU filed Critical Hohai University HHU
Priority to CN202111484958.9A
Publication of CN114266735A
Legal status: Pending

Abstract

The invention discloses a method for detecting lesion abnormalities in chest X-ray images, which comprises: inputting an image to be detected into a feature extraction module to obtain a first feature map; inputting the first feature map into a context information extraction module to obtain a second feature map rich in context information; unfolding the second feature map into a one-dimensional sequence, mapping the one-dimensional sequence into an embedded sequence of a set dimension, and adding position encoding information to the embedded sequence; and inputting the position-encoded sequence into a transformer network model, which outputs target boxes and categories of lesions, completing lesion abnormality detection for the X-ray image. The method effectively copes with the complexity and diversity of chest X-ray lesions, has high detection accuracy, and effectively completes chest X-ray lesion detection.

Description

Method for detecting pathological change abnormality of chest X-ray image
Technical Field
The invention relates to a method for detecting lesion abnormalities in chest X-ray images, and belongs to the technical field of image processing.
Background
In recent years, X-ray images have played a very important role in medical diagnosis. To make rapid and accurate automatic diagnosis possible, a great deal of research has been devoted to developing intelligent computer-aided detection systems that help doctors diagnose lesions in chest X-ray films. With the outbreak of COVID-19, demand for X-ray diagnosis of chest diseases has greatly increased, and effective computer-aided tools are urgently needed to reduce the burden on doctors. Identifying X-ray images is very difficult because organ structures overlap along the projection direction, compounded by the diversity of chest X-ray diseases; a diagnosis often requires a highly experienced physician. Existing X-ray image detection methods struggle to cope with complex scenes and have poor accuracy.
Disclosure of Invention
The invention aims to provide a chest X-ray image lesion abnormality detection method to solve the problems that existing X-ray image detection methods struggle to cope with complex scenes and have poor accuracy.
In order to achieve the purpose, the invention adopts the following technical scheme:
the invention provides a method for detecting lesion abnormalities in chest X-ray images, implemented based on an abnormality detection model, wherein the abnormality detection model comprises a feature extraction module, a context information extraction module, a position encoder and a transformer network model which are sequentially connected, and the method comprises the following steps:
inputting an image to be detected into a feature extraction module to obtain a first feature map;
inputting the first feature map into a context information extraction module to obtain a second feature map rich in context information;
expanding the second feature map into a one-dimensional sequence, mapping the one-dimensional sequence into an embedded sequence of a set dimension, and adding position encoding information to the embedded sequence using the position encoder;
and inputting the position-encoded sequence into the transformer network model, and outputting target boxes and categories of lesions.
Further, the feature extraction module employs a ResNet network.
Further, inputting the first feature map into the context information extraction module to obtain a second feature map rich in context information includes:
inputting the first feature map into the context information extraction module, and adding the module's output to the feature map before input to obtain a new feature map;
after pooling downsampling the new feature map, using it as the input of the context information extraction module;
and repeating the above steps multiple times until the feature map output by the context information extraction module fuses multiple layers of information.
Further, the context information extraction module comprises, connected in sequence, 2 standard convolutional layers, a plurality of bottleneck structures containing skip connections, and 1 standard convolutional layer; each bottleneck structure comprises, connected in sequence, 1 standard convolutional layer, 1 dilated convolutional layer, and 1 standard convolutional layer.
Further, the convolution kernel sizes of the 3 standard convolutional layers in the context information extraction module are 1x1, 3x3 and 3x3 respectively, with 128, 128 and 512 channels; the convolution kernel sizes of the 2 standard convolutional layers in each bottleneck structure are both 1x1, with 128 and 128 channels, and the dilated convolutional layer has a 3x3 kernel, a dilation rate of 2 and 256 channels.
Further, the adding position coding information in the embedded sequence includes:
position coding information is added using sine and cosine functions of different frequencies.
Further, the transformer network model comprises a transformer encoder and a transformer decoder applying a multi-head attention mechanism, and a multi-layer feedforward neural network.
Further, the anomaly detection model is obtained by training through the following method:
acquiring a chest lesion data set of X-ray images, wherein the data set is divided into multiple lesion category sub-datasets; each sub-dataset contains multiple sample images of the same lesion category, and each sample image contains the coordinates of the lesion and is labeled with the lesion category;
inputting each sample image of each sub-dataset into the abnormality detection model and outputting a prediction result;
and optimally matching the prediction result with the ground truth using the Hungarian algorithm to obtain a loss function, back-propagating according to the loss function, performing gradient descent, and training to obtain the abnormality detection model.
Further, the loss function is:

$$L(y,\hat{y}) = \sum_{i=1}^{N}\left[-\log \hat{p}_{\hat{\sigma}(i)}(c_i) + \mathbb{1}_{\{c_i \neq \varnothing\}}\, L_{box}(b_i, \hat{b}_{\hat{\sigma}(i)})\right]$$

wherein the loss function includes a classification loss function and a localization regression loss function, and the classification loss function adopts cross-entropy loss:

$$L_{cls} = -\sum_{i=1}^{N} \log \hat{p}_{\hat{\sigma}(i)}(c_i)$$

the localization regression loss function includes an IoU loss and a regression loss, expressed as:

$$L_{box}(b_i, \hat{b}_{\hat{\sigma}(i)}) = \lambda_{iou} L_{iou}(b_i, \hat{b}_{\hat{\sigma}(i)}) + \lambda_{reg} L_{reg}(b_i, \hat{b}_{\hat{\sigma}(i)})$$

wherein $L_{reg}$ is the smooth L1 function, which is of the form:

$$L_{reg}(x) = \begin{cases} 0.5x^2, & |x| < 1 \\ |x| - 0.5, & \text{otherwise} \end{cases}$$

$L_{iou}$ is a GIoU function of the form:

$$L_{iou}(A,B) = 1 - \left(\frac{|A \cap B|}{|A \cup B|} - \frac{|C \setminus (A \cup B)|}{|C|}\right)$$

wherein A and B denote the rectangles participating in the calculation, C denotes the smallest rectangular box containing both A and B, and |·| denotes the area of a rectangular box.
Compared with the prior art, the invention has the following beneficial effects:
The chest X-ray image lesion abnormality detection method of the invention uses a transformer structure fused with context information as the feature extractor; it effectively copes with the complexity and diversity of chest X-ray lesions, has high detection accuracy, distinguishes lesion regions of different categories, and marks lesion regions accurately.
Drawings
FIG. 1 is a network structure diagram of a method for detecting abnormal lesion in chest X-ray images according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a context information extraction module;
FIG. 3 is a chest X-ray image to be examined;
FIG. 4 is a graph showing the results of detection.
Detailed Description
The invention is further described with reference to specific examples. The following examples are only for illustrating the technical solutions of the present invention more clearly, and the protection scope of the present invention is not limited thereby.
The embodiment of the invention provides a method for detecting lesion abnormalities in chest X-ray images, implemented based on the abnormality detection model shown in FIG. 1. The abnormality detection model comprises a feature extraction module, a context information extraction module, a position encoder and a transformer network model which are sequentially connected.
With reference to FIG. 1, the chest X-ray image lesion abnormality detection method specifically includes the following steps:
step 1, inputting an image to be detected into a feature extraction module to obtain a first feature map;
Using ResNet as the feature extraction module, the input image is converted into a feature map through convolution, pooling and skip connections.
Step 2, inputting the first feature map into a context information extraction module to obtain a second feature map rich in context information;
As shown in FIG. 2, the context information extraction module (DCE module) comprises, connected in sequence, a 1x1 standard convolutional layer, a 3x3 standard convolutional layer, a plurality of bottleneck structures with skip connections, and a 3x3 standard convolutional layer, where the 3 standard convolutional layers have 128, 128 and 512 channels respectively.
Each bottleneck structure with a skip connection comprises, connected in sequence, a 1x1 standard convolutional layer, a 3x3 dilated convolutional layer and a 1x1 standard convolutional layer. The 2 standard convolutional layers have 128 and 128 channels respectively, and the dilated convolutional layer has a dilation rate of 2 and 256 channels.
The feature map input to the DCE module first has its dimension reduced by the 1x1 and 3x3 standard convolutions, then passes through the bottleneck structures with skip connections, and finally has its channel dimension raised by the last standard convolution.
The context information is extracted in an iterative fusion manner.
In an embodiment, performing context feature extraction on the first feature map using the context information extraction module specifically includes:
Step 21, feeding the feature map into the DCE module and adding the result to the original feature map, where the feature map output by step 1 is denoted $F_1$;
Step 22, halving the size of the feature map using a pooling operation, where the pooling layer size is 2x2;
Step 23, repeating steps 21 and 22 several times until the feature map fuses multiple layers of information.
As shown in FIG. 2, the l-th layer feature $F_l$ passes through the DCE module and one pooling downsampling operation to obtain the smaller (l+1)-th layer feature $F_{l+1}$, formulated as:

$$\tilde{F}_l = F_l + f_{DCE}(F_l)$$
$$F_{l+1} = f_{down}(\tilde{F}_l)$$

where $F_l$ denotes the l-th layer feature map, $f_{DCE}$ denotes the context information extraction module, and $f_{down}$ denotes downsampling; here a pooling operation is used as the downsampling operation.
In this embodiment there are 4 layers of feature maps, finally yielding the feature map $F_4$.
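The iterative fusion of steps 21-23 can be sketched in NumPy. The `dce_stub` below is a hypothetical placeholder for the DCE module (a learned convolutional block in the invention), and a single-channel feature map is used for brevity:

```python
import numpy as np

def dce_stub(feature):
    """Hypothetical stand-in for the DCE module: any map that
    preserves the spatial shape of the feature map."""
    return np.tanh(feature)

def avg_pool_2x2(feature):
    """2x2 average-pooling downsampling, halving each spatial side."""
    h, w = feature.shape
    return feature.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def iterative_fusion(f1, num_layers=4):
    """Repeat residual DCE addition then 2x2 pooling (steps 21-23)."""
    feats = [f1]
    f = f1
    for _ in range(num_layers - 1):
        f = f + dce_stub(f)      # step 21: F + f_DCE(F), skip connection
        f = avg_pool_2x2(f)      # step 22: halve the spatial size
        feats.append(f)
    return feats

# F1..F4, spatial sizes 32 -> 16 -> 8 -> 4
feats = iterative_fusion(np.random.rand(32, 32))
```

Each iteration fuses context information at one scale before shrinking the map, matching the 4-layer feature hierarchy of this embodiment.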
Step 3, unfolding the second feature map into a one-dimensional sequence, mapping the one-dimensional sequence into an embedded sequence of a set dimension, and adding position encoding information to the embedded sequence using the position encoder;
The one-dimensional sequence is mapped into a $d_{model}$-dimensional embedded sequence, and position encoding information is then added using sine and cosine functions of different frequencies:

$$PE_{(pos,2i)} = \sin\left(pos / 10000^{2i/d_{model}}\right)$$
$$PE_{(pos,2i+1)} = \cos\left(pos / 10000^{2i/d_{model}}\right)$$

where $pos$ denotes the position, $i$ denotes the dimension, and $d_{model}$ denotes the total dimension of the embedded sequence; the dimension of the position-encoded sequence is still $d_{model}$.
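The sinusoidal position encoding can be computed directly; a minimal NumPy sketch (the sequence length of 100 is an illustrative choice):

```python
import numpy as np

def sinusoidal_position_encoding(seq_len, d_model):
    """PE(pos, 2i)   = sin(pos / 10000^(2i/d_model))
       PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model))"""
    pos = np.arange(seq_len)[:, None]          # (seq_len, 1)
    i = np.arange(0, d_model, 2)[None, :]      # even dimension indices
    angle = pos / np.power(10000.0, i / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angle)                # even dims: sine
    pe[:, 1::2] = np.cos(angle)                # odd dims: cosine
    return pe

pe = sinusoidal_position_encoding(seq_len=100, d_model=256)
# adding pe to the embedded sequence keeps its dimension at d_model
```

Because the encoding is added element-wise, the sequence dimension is unchanged, as stated above.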
Step 4, inputting the position-encoded sequence into the transformer network model, and outputting target boxes and categories of lesions.
The transformer network model comprises a transformer encoder and a transformer decoder applying a multi-head attention mechanism, and a multi-layer feedforward neural network.
The multi-head self-attention mechanism may be expressed as:

$$\mathrm{MultiHead}(Q,K,V) = \mathrm{Concat}(\mathrm{head}_1, \ldots, \mathrm{head}_h)W^O$$

where Concat denotes concatenation of the feature tensors and $\mathrm{head}_i$ denotes the i-th single attention head:

$$\mathrm{head}_i = \mathrm{Attention}(XW_i^Q, XW_i^K, XW_i^V)$$

where $W_i^Q \in \mathbb{R}^{d_{model} \times N_q}$, $W_i^K, W_i^V \in \mathbb{R}^{d_{model} \times N_{kv}}$, and $W^O \in \mathbb{R}^{h N_{kv} \times d_{model}}$; $h$ denotes the number of heads of multi-head attention, and the dimension of $X$ is $d_{model}$.
The multi-head attention consists of $h$ single-head attention mechanisms, where the single-head self-attention mechanism is expressed as:

$$\mathrm{Attention}(Q,K,V) = \mathrm{softmax}\left(\frac{QK^T}{\sqrt{N_{kv}}}\right)V$$

where $Q$, $K$ and $V$ are obtained from the input sequence $X$ by a series of matrix multiplications and represent the query, key and value respectively. The dimension of $Q$ is $N_q$, and the dimensions of $K$ and $V$ are $N_{kv}$. The Softmax function is used to compute the attention weight $A$ from the query and key:

$$A_{ij} = \frac{\exp\left(Q_i K_j^T / \sqrt{N_{kv}}\right)}{\sum_{j} \exp\left(Q_i K_j^T / \sqrt{N_{kv}}\right)}$$

where $i$ is the index of the query and $j$ is the index of the key, and $N_{kv}$ denotes the dimension of the key $K$. The final result is the sum of the values weighted by the attention weights, i.e. the i-th row of the output of the single-head attention mechanism is:

$$\mathrm{Attention}(Q,K,V)_i = \sum_{j} A_{ij} V_j$$
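A minimal NumPy sketch of the single-head scaled dot-product attention described above; the random inputs are illustrative (in the model, Q, K and V come from learned projections of X):

```python
import numpy as np

def softmax(x, axis=-1):
    # subtract the row max for numerical stability
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def single_head_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = K.shape[-1]
    A = softmax(Q @ K.T / np.sqrt(d_k))   # attention weights, rows sum to 1
    return A @ V, A                       # row i of output = sum_j A_ij V_j

rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(3, 5, 32))     # 5 tokens, per-head dim N_kv = 32
out, A = single_head_attention(Q, K, V)
```

Multi-head attention runs h such heads in parallel on different learned projections and concatenates their outputs.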
The output sequence is obtained through several transformer encoders and transformer decoders.
Then the output sequence passes through a two-layer feedforward neural network whose output is $N_{obj} \times 4$ detection box coordinates and the corresponding lesion categories, finally completing detection, where $N_{obj}$ denotes the number of detection boxes and 4 is the dimension of the rectangular detection box coordinates. ReLU is used as the activation function in between, which may be expressed as:

$$\mathrm{FFN}(x) = \max(0, xW_1 + b_1)W_2 + b_2$$

where $W_1$ and $W_2$ are parameter matrices and $b_1$ and $b_2$ are biases; the dimension of the input $x$ is $d_{model}$. In the invention, $d_{model}$ is set to 256, the number of attention heads $h$ is set to 8, and $N_q$ and $N_{kv}$ are set to $d_{model}/h = 32$.
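The two-layer feedforward network with ReLU can be sketched as follows; the hidden dimension of 1024 and the random weights are illustrative assumptions, not values from the invention:

```python
import numpy as np

def ffn(x, W1, b1, W2, b2):
    """FFN(x) = max(0, x W1 + b1) W2 + b2, with ReLU in between."""
    return np.maximum(0.0, x @ W1 + b1) @ W2 + b2

d_model, d_hidden = 256, 1024            # d_hidden is an illustrative choice
rng = np.random.default_rng(0)
W1 = rng.normal(size=(d_model, d_hidden)) * 0.02
b1 = np.zeros(d_hidden)
W2 = rng.normal(size=(d_hidden, d_model)) * 0.02
b2 = np.zeros(d_model)

x = rng.normal(size=(10, d_model))       # 10 sequence positions
y = ffn(x, W1, b1, W2, b2)               # output keeps shape (10, d_model)
```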
In the embodiment of the present invention, the abnormality detection model also needs to be trained in advance, which specifically includes:
Step a, acquiring a chest lesion data set of X-ray images, wherein the data set is divided into multiple lesion category sub-datasets; each sub-dataset contains multiple sample images of the same lesion category, and each sample image contains the coordinates of the lesion and is labeled with the lesion category;
Step b, inputting each sample image of each sub-dataset into the abnormality detection model and outputting a prediction result;
Step c, optimally matching the prediction result with the ground truth using the Hungarian algorithm to obtain a loss function, back-propagating according to the loss function, performing gradient descent, and training to obtain the model.
The model outputs $N_{obj}$ target boxes as the prediction result ($N_{obj}$ is abbreviated $N$ below), denoted by the prediction set $\hat{y} = \{\hat{y}_i\}_{i=1}^{N}$.

The best match $\hat{\sigma}$ found by the Hungarian algorithm is used for training:

$$\hat{\sigma} = \arg\min_{\sigma \in \Omega_N} \sum_{i=1}^{N} L_{match}(y_i, \hat{y}_{\sigma(i)})$$

where $\Omega_N$ denotes all permutations of the ground-truth set $y$, and $L_{match}$ is a function measuring the difference between a prediction and a ground truth. Each prediction $\hat{y}_i = (\hat{b}_i, \hat{p}_i)$ comprises the coordinates and confidence of a target box; $\hat{p}_{\sigma(i)}(c_i)$ denotes the probability that predicted target box $\sigma(i)$ belongs to category $c_i$; and $\hat{\sigma}$ denotes the optimal permutation, found by the Hungarian algorithm, that minimises the overall matching loss. The matching loss is:

$$L_{match}(y_i, \hat{y}_{\sigma(i)}) = -\mathbb{1}_{\{c_i \neq \varnothing\}}\, \hat{p}_{\sigma(i)}(c_i) + \mathbb{1}_{\{c_i \neq \varnothing\}}\, L_{box}(b_i, \hat{b}_{\sigma(i)})$$
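For small N, the optimal assignment found by the Hungarian algorithm can be reproduced by exhaustive search over all permutations in Omega_N. The sketch below uses a simplified L1 box cost as the matching loss (the invention's matching loss also includes a classification term); the boxes are illustrative:

```python
from itertools import permutations

def match_cost(truth, pred):
    """Illustrative pairwise matching cost: L1 distance between
    box coordinates (x1, y1, x2, y2)."""
    return sum(abs(t - p) for t, p in zip(truth, pred))

def best_assignment(truths, preds):
    """Exhaustive search over all permutations for the assignment
    minimising the total matching loss. The Hungarian algorithm
    finds the same optimum in O(N^3) instead of O(N!)."""
    best_cost, best_perm = float("inf"), None
    for perm in permutations(range(len(preds))):
        cost = sum(match_cost(truths[i], preds[j])
                   for i, j in enumerate(perm))
        if cost < best_cost:
            best_cost, best_perm = cost, perm
    return best_perm, best_cost

truths = [(0, 0, 10, 10), (50, 50, 60, 60)]
preds  = [(49, 51, 61, 60), (1, 0, 10, 9)]
perm, cost = best_assignment(truths, preds)
# truth 0 pairs with prediction 1, truth 1 with prediction 0: perm == (1, 0)
```

In practice `scipy.optimize.linear_sum_assignment` implements the Hungarian optimum efficiently.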
In training, the loss function includes a classification loss and a localization loss.

The classification loss adopts cross-entropy loss:

$$L_{cls} = -\sum_{i=1}^{N} \log \hat{p}_{\hat{\sigma}(i)}(c_i)$$

The localization regression loss function includes an IoU loss and a regression loss, expressed as:

$$L_{box}(b_i, \hat{b}_{\hat{\sigma}(i)}) = \lambda_{iou} L_{iou}(b_i, \hat{b}_{\hat{\sigma}(i)}) + \lambda_{reg} L_{reg}(b_i, \hat{b}_{\hat{\sigma}(i)})$$

where $L_{reg}$ is the smooth L1 function, which is of the form:

$$L_{reg}(x) = \begin{cases} 0.5x^2, & |x| < 1 \\ |x| - 0.5, & \text{otherwise} \end{cases}$$

$L_{iou}$ is a GIoU function of the form:

$$L_{iou}(A,B) = 1 - \left(\frac{|A \cap B|}{|A \cup B|} - \frac{|C \setminus (A \cup B)|}{|C|}\right)$$

where A and B denote the rectangular boxes participating in the calculation, C denotes the smallest rectangular box containing both A and B, and |·| denotes the area of a rectangular box.
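The GIoU and smooth L1 terms can be sketched in plain Python; boxes are assumed to be (x1, y1, x2, y2) tuples, and the loss uses 1 minus the GIoU value:

```python
def box_area(box):
    x1, y1, x2, y2 = box
    return max(0.0, x2 - x1) * max(0.0, y2 - y1)

def intersection(a, b):
    # overlap rectangle; area is 0 when the boxes are disjoint
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    return box_area((x1, y1, x2, y2))

def enclosing(a, b):
    # C: smallest rectangle containing both A and B
    return (min(a[0], b[0]), min(a[1], b[1]),
            max(a[2], b[2]), max(a[3], b[3]))

def giou(a, b):
    """GIoU = IoU - |C \\ (A u B)| / |C|; ranges in (-1, 1]."""
    inter = intersection(a, b)
    union = box_area(a) + box_area(b) - inter
    c = box_area(enclosing(a, b))
    return inter / union - (c - union) / c

def smooth_l1(x):
    """0.5 x^2 if |x| < 1, else |x| - 0.5."""
    return 0.5 * x * x if abs(x) < 1 else abs(x) - 0.5

# L_iou is then 1 - giou(a, b); identical boxes give loss 0
```

Unlike plain IoU, the GIoU term still provides a gradient signal when the predicted and ground-truth boxes do not overlap, via the enclosing box C.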
Training uses an Adam optimizer with $\beta_1 = 0.9$, $\beta_2 = 0.98$ and $\epsilon = 10^{-9}$. During training, the learning rate is continuously varied according to the following formula:

$$lrate = d_{model}^{-0.5} \cdot \min\left(step\_num^{-0.5},\; step\_num \cdot warmup\_steps^{-1.5}\right)$$

where $step\_num$ denotes the number of training steps taken and $warmup\_steps$ denotes the number of warmup steps, here set to 4000.
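The warmup learning-rate schedule can be sketched in Python; the formula form (the standard transformer schedule) is an assumption consistent with the stated $d_{model} = 256$ and warmup step count of 4000:

```python
def learning_rate(step_num, d_model=256, warmup_steps=4000):
    """lrate = d_model^-0.5 * min(step_num^-0.5,
                                  step_num * warmup_steps^-1.5)"""
    return d_model ** -0.5 * min(step_num ** -0.5,
                                 step_num * warmup_steps ** -1.5)

# rises linearly during warmup, peaks at step 4000, then decays as 1/sqrt(step)
lr_early = learning_rate(1000)
lr_peak  = learning_rate(4000)
lr_late  = learning_rate(16000)
```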
As shown in FIG. 3, a chest X-ray image to be detected is input into the abnormality detection model, and the model outputs the lesion regions and categories of the image. As shown in FIG. 4, the method successfully detects lung consolidation, effusion and fibrosis in the X-ray image.
Through the above embodiment, the chest X-ray image lesion abnormality detection method adopts a transformer structure fused with context information as the feature extractor; it effectively copes with the complexity and diversity of chest X-ray lesions, has high detection accuracy, distinguishes lesion regions of different categories, and marks lesion regions accurately.
The present invention has been disclosed in terms of preferred embodiments, but is not limited thereto; all technical solutions obtained by equivalent substitution or transformation fall within the protection scope of the present invention.

Claims (9)

1. A chest X-ray image lesion abnormality detection method, characterized in that: the method is implemented based on an abnormality detection model, wherein the abnormality detection model comprises a feature extraction module, a context information extraction module, a position encoder and a transformer network model which are sequentially connected, and the method comprises the following steps:
inputting an image to be detected into a feature extraction module to obtain a first feature map;
inputting the first feature map into a context information extraction module to obtain a second feature map rich in context information;
expanding the second feature map into a one-dimensional sequence, mapping the one-dimensional sequence into an embedded sequence of a set dimension, and adding position encoding information to the embedded sequence using the position encoder;
and inputting the position-encoded sequence into the transformer network model, and outputting target boxes and categories of lesions.
2. The method of claim 1, wherein the feature extraction module employs a ResNet network.
3. The method of claim 1, wherein inputting the first feature map into the context information extraction module to obtain a second feature map rich in context information comprises:
inputting the first feature map into the context information extraction module, and adding the module's output to the feature map before input to obtain a new feature map;
after pooling downsampling the new feature map, using it as the input of the context information extraction module;
and repeating the above steps multiple times until the feature map output by the context information extraction module fuses multiple layers of information.
4. The method of claim 1, wherein the context information extraction module comprises, connected in sequence, 2 standard convolutional layers, a plurality of bottleneck structures containing skip connections, and 1 standard convolutional layer; each bottleneck structure comprises, connected in sequence, 1 standard convolutional layer, 1 dilated convolutional layer, and 1 standard convolutional layer.
5. The method of claim 1, wherein the convolution kernel sizes of the 3 standard convolutional layers in the context information extraction module are 1x1, 3x3 and 3x3 respectively, with 128, 128 and 512 channels; the convolution kernel sizes of the 2 standard convolutional layers in each bottleneck structure are both 1x1, with 128 and 128 channels, and the dilated convolutional layer has a 3x3 kernel, a dilation rate of 2 and 256 channels.
6. The method of claim 1, wherein adding position-coding information to the embedded sequence comprises:
position coding information is added using sine and cosine functions of different frequencies.
7. The method of claim 1, wherein the transformer network model comprises a transformer encoder and a transformer decoder applying a multi-head attention mechanism, and a multi-layer feedforward neural network.
8. The method of claim 1, wherein the anomaly detection model is trained by:
acquiring a chest lesion data set of X-ray images, wherein the data set is divided into multiple lesion category sub-datasets; each sub-dataset contains multiple sample images of the same lesion category, and each sample image contains the coordinates of the lesion and is labeled with the lesion category;
inputting each sample image of each sub-dataset into the abnormality detection model and outputting a prediction result;
and optimally matching the prediction result with the ground truth using the Hungarian algorithm to obtain a loss function, back-propagating according to the loss function, performing gradient descent, and training to obtain the abnormality detection model.
9. The method of claim 8, wherein the loss function is:
$$L(y,\hat{y}) = \sum_{i=1}^{N}\left[-\log \hat{p}_{\hat{\sigma}(i)}(c_i) + \mathbb{1}_{\{c_i \neq \varnothing\}}\, L_{box}(b_i, \hat{b}_{\hat{\sigma}(i)})\right]$$

wherein the loss function includes a classification loss function and a localization regression loss function, and the classification loss function adopts cross-entropy loss:

$$L_{cls} = -\sum_{i=1}^{N} \log \hat{p}_{\hat{\sigma}(i)}(c_i)$$

the localization regression loss function includes an IoU loss and a regression loss, expressed as:

$$L_{box}(b_i, \hat{b}_{\hat{\sigma}(i)}) = \lambda_{iou} L_{iou}(b_i, \hat{b}_{\hat{\sigma}(i)}) + \lambda_{reg} L_{reg}(b_i, \hat{b}_{\hat{\sigma}(i)})$$

wherein $L_{reg}$ is the smooth L1 function, which is of the form:

$$L_{reg}(x) = \begin{cases} 0.5x^2, & |x| < 1 \\ |x| - 0.5, & \text{otherwise} \end{cases}$$

$L_{iou}$ is a GIoU function of the form:

$$L_{iou}(A,B) = 1 - \left(\frac{|A \cap B|}{|A \cup B|} - \frac{|C \setminus (A \cup B)|}{|C|}\right)$$

wherein A and B denote the rectangles participating in the calculation, C denotes the smallest rectangular box containing both A and B, and |·| denotes the area of a rectangular box.
CN202111484958.9A 2021-12-07 2021-12-07 Method for detecting pathological change abnormality of chest X-ray image Pending CN114266735A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111484958.9A CN114266735A (en) 2021-12-07 2021-12-07 Method for detecting pathological change abnormality of chest X-ray image


Publications (1)

Publication Number Publication Date
CN114266735A true CN114266735A (en) 2022-04-01

Family

ID=80826725

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111484958.9A Pending CN114266735A (en) 2021-12-07 2021-12-07 Method for detecting pathological change abnormality of chest X-ray image

Country Status (1)

Country Link
CN (1) CN114266735A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116965843A (en) * 2023-09-19 2023-10-31 南方医科大学南方医院 Mammary gland stereotactic system
CN117522877A (en) * 2024-01-08 2024-02-06 吉林大学 Method for constructing chest multi-disease diagnosis model based on visual self-attention

Citations (3)

Publication number Priority date Publication date Assignee Title
CN111626379A (en) * 2020-07-07 2020-09-04 中国计量大学 X-ray image detection method for pneumonia
CN113065402A (en) * 2021-03-05 2021-07-02 四川翼飞视科技有限公司 Face detection method based on deformed attention mechanism
CN113469962A (en) * 2021-06-24 2021-10-01 江苏大学 Feature extraction and image-text fusion method and system for cancer lesion detection



Similar Documents

Publication Publication Date Title
Mansilla et al. Learning deformable registration of medical images with anatomical constraints
Majid et al. Classification of stomach infections: A paradigm of convolutional neural network along with classical features fusion and selection
CN107766894B (en) Remote sensing image natural language generation method based on attention mechanism and deep learning
Yadav et al. Lung-GANs: unsupervised representation learning for lung disease classification using chest CT and X-ray images
CN114266735A (en) Method for detecting pathological change abnormality of chest X-ray image
CN111275118B (en) Chest film multi-label classification method based on self-correction type label generation network
CN115223678A (en) X-ray chest radiography diagnosis report generation method based on multi-task multi-mode deep learning
CN114170232B (en) Transformer-based X-ray chest radiography automatic diagnosis and new crown infection area distinguishing method
Sun et al. Context matters: Graph-based self-supervised representation learning for medical images
CN113312973A (en) Method and system for extracting features of gesture recognition key points
CN117058448A (en) Pulmonary CT image classification system based on domain knowledge and parallel separable convolution Swin transducer
Attallah Deep learning-based CAD system for COVID-19 diagnosis via spectral-temporal images
CN116129426A (en) Fine granularity classification method for cervical cell smear 18 category
Khademi et al. Spatio-temporal hybrid fusion of cae and swin transformers for lung cancer malignancy prediction
prasad Koyyada et al. An explainable artificial intelligence model for identifying local indicators and detecting lung disease from chest X-ray images
CN114202002A (en) Pulmonary nodule detection device based on improved FasterRCNN algorithm
CN113313699A (en) X-ray chest disease classification and positioning method based on weak supervised learning and electronic equipment
Xu et al. Identification of benign and malignant lung nodules in CT images based on ensemble learning method
Zheng et al. MA-Net: Mutex attention network for COVID-19 diagnosis on CT images
Elhanashi et al. Classification and Localization of Multi-type Abnormalities on Chest X-rays Images
JP2004188201A (en) Method to automatically construct two-dimensional statistical form model for lung area
Chaisangmongkon et al. External validation of deep learning algorithms for cardiothoracic ratio measurement
Kim et al. Severity quantification and lesion localization of covid-19 on CXR using vision transformer
CN115239740A (en) GT-UNet-based full-center segmentation algorithm
CN113989576A (en) Medical image classification method combining wavelet transformation and tensor network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination