CN117522892A - Lung adenocarcinoma CT image focus segmentation method based on space channel attention enhancement - Google Patents
- Publication number
- CN117522892A CN117522892A CN202311524395.0A CN202311524395A CN117522892A CN 117522892 A CN117522892 A CN 117522892A CN 202311524395 A CN202311524395 A CN 202311524395A CN 117522892 A CN117522892 A CN 117522892A
- Authority
- CN
- China
- Prior art keywords
- image
- focus
- segmentation
- lung
- module
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/11—Region-based segmentation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0464—Convolutional networks [CNN, ConvNet]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/13—Edge detection
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/25—Determination of region of interest [ROI] or a volume of interest [VOI]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/28—Quantising the image, e.g. histogram thresholding for discrimination between background and foreground patterns
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/774—Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10072—Tomographic images
- G06T2207/10081—Computed x-ray tomography [CT]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30004—Biomedical image processing
- G06T2207/30061—Lung
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30004—Biomedical image processing
- G06T2207/30096—Tumor; Lesion
Abstract
The invention relates to a lung adenocarcinoma CT image focus segmentation method based on spatial channel attention enhancement, comprising the following steps: acquiring a lung CT image to be detected; inputting the lung CT image to be detected into a preset segmentation model to obtain a focus segmentation mask image; and extracting image features based on the focus segmentation mask map and quantifying them to obtain tumor image feature information in the lung CT image to be detected. The segmentation model is obtained by training on a training set comprising a plurality of patient lung CT images, and is constructed using the segmentation network EfficientUNet++ enhanced with scSE spatial-channel attention. The invention can reduce the time cost and labor cost of the early lung adenocarcinoma CT image focus segmentation task and improve segmentation accuracy.
Description
Technical Field
The invention relates to the technical field of medical image processing, in particular to a lung adenocarcinoma CT image focus segmentation method based on spatial channel attention enhancement.
Background
Clinically, lung cancer is divided into two types, non-small cell lung cancer and small cell lung cancer; the former accounts for about 85% of lung cancers, and lung adenocarcinoma is the major subtype of non-small cell lung cancer. Imaging examination is currently the main way hospitals screen for lung canceration, among which computed tomography (Computed Tomography, CT) is the most common. CT image diagnosis, based on radiomics methods, has great potential for quantitative tumor analysis; it is a necessary premise and important foundation for helping radiologists perform downstream image analysis, and is of great significance for guiding clinical decisions.
In recent years, with the rapid development of computer software and hardware and the continuous growth of GPU computing resources, segmenting cancer lesions with deep learning methods has become an active research direction. A deep learning method feeds preprocessed CT images into a constructed neural network, automatically extracts image features, learns from the extracted feature maps, and finally detects the nodule regions of interest, forming an end-to-end detection model. Such a method can automatically segment lesions in CT images, assist radiologists in analyzing them quickly, reduce radiologists' workload, and evaluate pathological tissue quantitatively or qualitatively, thereby achieving a good prognosis earlier and at lower cost.
The current early lung adenocarcinoma CT image focus segmentation task has the following shortcomings:
(1) In the traditional workflow, the diagnosis a radiologist gives by visually reading CT images depends on the doctor's experience, and suffers from long reading time, strong influence of subjective factors and relatively low detection accuracy;
(2) When a traditional segmentation network based on a U-shaped structure segments lung nodules, problems such as nodule adhesion and high false-positive rates arise, which affect the prognosis analysis of patients.
Therefore, how to segment lung nodules in CT images automatically, quickly and accurately is an urgent problem for the current early lung adenocarcinoma CT image focus segmentation task.
Disclosure of Invention
The invention aims to provide a lung adenocarcinoma CT image focus segmentation method based on spatial channel attention enhancement, which, through the segmentation network EfficientUNet++ fused with the scSE attention mechanism, reduces the time cost and labor cost of the early lung adenocarcinoma CT image focus segmentation task and improves segmentation accuracy.
In order to achieve the above object, the present invention provides the following solutions:
a lung adenocarcinoma CT image focus segmentation method based on spatial channel attention enhancement comprises the following steps:
acquiring a CT image of a lung to be detected;
inputting the lung CT image to be detected into a preset segmentation model to obtain a focus segmentation mask image;
and extracting image features based on the focus segmentation mask image and quantifying to obtain tumor information in the CT image of the lung to be detected, wherein the segmentation model is obtained based on training set training, the training set comprises a plurality of CT images of the lung of the patient, and the segmentation model is constructed by adopting a segmentation network EfficientUNet++ with enhanced spatial channel attention.
Preferably, before training the segmentation model based on the training set, the method further comprises preprocessing a patient lung CT image in the training set, and the preprocessing process includes:
and performing sequence cutting, windowing, resampling normalization and data enhancement on the patient lung CT image to obtain an effective focus area slice of the patient lung CT image, wherein the effective focus area slice comprises a 2D lung image and a focus mask image corresponding to the 2D lung image.
Preferably, the segmentation model comprises: an Efficientnet module, a k×k depth convolution module, a scSE attention module and a Sigmoid function, wherein the Efficientnet module is used for extracting first depth features of the input image through downsampling; the k×k depth convolution module is used for further extracting second depth features of the input image based on upsampling; the scSE attention module is used for screening and refining the first depth features and the second depth features to obtain feature maps; and the Sigmoid function is used for mapping the feature maps into the focus segmentation mask map.
Further, the Efficientnet module includes: one layer of 3 x 3 convolution, seven layers of different sized depth separable convolutions, and one layer of 1 x 1 convolution, connected in sequence.
Further, the scSE attention module includes: a cSE channel attention module and a sSE spatial attention module, wherein the cSE channel attention module is used for calibrating channel information and acquiring a first feature map; the sSE spatial attention module is used for calibrating spatial information and acquiring a second characteristic diagram.
Preferably, training the segmentation model further comprises:
and calculating a Loss value of the focus segmentation mask map generated by the segmentation model and the actual focus segmentation mask map based on a BCEDice Loss function, reversely updating model parameters based on the Loss value, and carrying out the next training by iteration until the Loss value tends to be stable.
Preferably, before extracting the image features based on the focus segmentation mask map and quantifying, the method further comprises performing a binarization process on the focus segmentation mask map, where the binarization process includes:
setting a threshold value threshold, setting a pixel value between 0 and threshold in the focus segmentation mask map as 0, setting a pixel value between threshold and 1 as 255, and obtaining a binary focus segmentation mask map.
Preferably, extracting image features based on the focus segmentation mask map and quantifying includes:
performing edge detection on the binary focus segmentation mask map based on different pixel values, acquiring and storing a sequence of focus region edge points, setting that the pixel position difference of two adjacent points is not more than 1, dividing a closed region formed by the focus region edge points into contours of different levels, and acquiring focus contours in the binary focus segmentation mask map;
searching the minimum external regular polygon of the focus outline, marking the minimum external regular polygons in different focus outlines respectively, obtaining the vertex coordinates of the minimum external regular polygon, calculating the center coordinates, the maximum diameter and the area of the nodule based on the vertex coordinates of the minimum external regular polygon, and obtaining the tumor information.
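The quantification step can be approximated with SciPy's connected-component tools. This is a simplified stand-in (assumption), not the patent's procedure: component labeling replaces edge-point tracing, and an axis-aligned bounding box replaces the minimum enclosing polygon, with the box diagonal serving as a crude upper bound on the maximum diameter:

```python
import numpy as np
from scipy import ndimage

def quantify_lesions(mask):
    """Per-lesion center, diameter bound and pixel area from a binary mask."""
    labeled, n = ndimage.label(mask > 0)  # connected components as lesions
    results = []
    for idx, sl in enumerate(ndimage.find_objects(labeled), start=1):
        ys, xs = sl  # bounding-box slices of component idx
        h = ys.stop - ys.start
        w = xs.stop - xs.start
        results.append({
            "center": ((ys.start + ys.stop - 1) / 2, (xs.start + xs.stop - 1) / 2),
            "max_diameter": float(np.hypot(h, w)),  # box diagonal as an upper bound
            "area_px": int((labeled[sl] == idx).sum()),
        })
    return results
```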
To further achieve the above object, the present invention further provides a lung adenocarcinoma CT image lesion segmentation system based on spatial channel attention enhancement, comprising: the system comprises a CT image acquisition module, a focus region segmentation module and a tumor information quantization module;
the CT image acquisition module is used for acquiring a lung CT image to be detected;
the focus region segmentation module is used for inputting the lung CT image to be detected into a preset segmentation model to obtain a focus segmentation mask map;
and the tumor information quantization module is used for extracting image features based on the focus segmentation mask image and quantizing the image features to acquire tumor information in the CT image of the lung to be detected.
To further achieve the above object, the present invention further provides a computer readable storage medium storing a program, wherein the program when executed by a processor implements a lung adenocarcinoma CT image focus segmentation method based on spatial channel attention enhancement.
The beneficial effects of the invention are as follows:
(1) The invention provides a new Attention module scSE which is inserted into a neural network to strengthen the characteristic extraction and discrimination capability of the scSE, improve the segmentation precision of a lung nodule focus area and provide a more accurate focus mask for the follow-up tumor information quantitative analysis;
(2) The invention can automatically generate the lung focus prediction mask image by a computer under the condition of only the original lung image, provides a channel for obtaining more reliable tumor segmentation for a detector, and has better effect even by the computer under certain conditions than manual diagnosis;
(3) The invention digs deep image features in the Encoder deeply, then the features pass through a scSE module in the Encoder respectively, finally, the features are encoded into a feature vector according to the assigned weight and are given to shallow features, and then the feature vector is combined with the deep features through dense jump connection; the intention of the method is to fully utilize the channel and the space characteristics of the same image, and then merge the channel and the space characteristics into a characteristic vector with high importance through addition operation, so that more important characteristics in the image can be extracted and distinguished more effectively;
(4) The invention saves the time and cost of the radiologist's diagnosis of reading the film to a great extent, and simultaneously alleviates the problems of node adhesion and high false positive.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions of the prior art, the drawings that are needed in the embodiments will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a method for segmenting lung adenocarcinoma CT image lesions based on spatial channel attention enhancement according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a split network EfficientUNet++ architecture according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a feature extraction part of the Efficient net according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a scSE module according to an embodiment of the invention;
FIG. 5 is a schematic diagram of a spatial channel attention-based lung adenocarcinoma CT image lesion segmentation system according to an embodiment of the present invention;
fig. 6 is a schematic structural diagram of a computer-readable storage medium according to an embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
In order that the above-recited objects, features and advantages of the present invention will become more readily apparent, a more particular description of the invention will be rendered by reference to the appended drawings and appended detailed description.
The embodiment provides a lung adenocarcinoma CT image focus segmentation method based on spatial channel attention enhancement, as shown in fig. 1, comprising the following steps:
s1, constructing a segmentation model, and training the segmentation model.
Step 1, CT image preprocessing;
and performing operations such as sequence cutting, windowing, resampling normalization, data enhancement and the like on the original lung CT image of the patient to obtain an effective focus region slice of the normalized lung CT image.
Sequence cutting starts from the first 3D scan slice of each patient and slices the original 3D lung CT image (recorded as data) layer by layer along the z-axis direction, finally obtaining M equal-sized square 2D original lung images containing the focus area, together with the corresponding focus mask images (recorded as slices).
Windowing refers to setting the lung window to [-400,1500] for each lung CT slice image according to the Hu values of lung organs and applying a gray-value conversion function, where m and n respectively denote the upper and lower boundary values of the window, θ denotes the CT value of the original image, and g denotes the mapped pixel gray value:

g = 0, θ ≤ n;  g = 255 × (θ − n) / (m − n), n < θ < m;  g = 255, θ ≥ m
and enabling a tumor focus area with a CT value higher than a window range in the image to be white, and enabling a background area lower than the window range to be black.
Resampling normalization refers to resampling a CT image picture in a data set to 1mm by 1mm spatial resolution through a linear interpolation method, so that the spatial interval between pixels of the CT image in a world coordinate system is unified with the physical distance between each pixel in reality.
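Resampling to isotropic 1 mm spacing can be illustrated with `scipy.ndimage.zoom`, where order=1 selects the linear interpolation mentioned above (a sketch only; the patent does not name a specific library, and the (z, y, x) spacing convention is an assumption):

```python
import numpy as np
from scipy.ndimage import zoom

def resample_to_1mm(volume, spacing):
    """Resample a CT volume to 1 x 1 x 1 mm voxels by linear interpolation.

    volume:  3D array of CT values.
    spacing: per-axis voxel size in mm, e.g. (z, y, x) from the DICOM header.
             Zoom factor = old spacing / 1 mm, so new size = old size * spacing.
    """
    factors = np.asarray(spacing, dtype=float)
    return zoom(volume, factors, order=1)  # order=1: trilinear interpolation
```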
The data enhancement comprises a traditional data enhancement method and a data enhancement method based on bilateral filtering, and specifically comprises the following steps:
traditional data enhancement, performing a series of affine transformations of up-down flipping, mirror flipping, random angle rotation and image brightness transformation on the original lung image and the lesion mask image in the dataset, and generating a new image similar to the original image to expand the dataset.
Data enhancement based on bilateral filtering adopts a bilateral filtering method that combines a low-pass filter with an edge-stopping function. It considers the differences in both the spatial domain and the range domain of the image, protects high-frequency detail at edges where gray values change sharply, removes as much as possible the noise produced by air, capillaries, alveoli and other tissues in the lung parenchyma, preserves the contour features of the focus area, and generates denoised, smoothed images similar to the originals to expand the data set. This is achieved by the following formula:

g(i,j) = Σ_(k,l) f(k,l) · w(i,j,k,l) / Σ_(k,l) w(i,j,k,l)

w(i,j,k,l) = exp( −((i−k)² + (j−l)²) / (2σ_d²) − (f(i,j) − f(k,l))² / (2σ_r²) )

where (i,j) denotes the position of a pixel in the image, (k,l) the position of a neighborhood pixel, f the input pixel value at a coordinate, g the output pixel value, and the weight coefficient w the product of the spatial-domain kernel and the range-domain kernel.
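A brute-force NumPy rendering of the bilateral filter just described (illustrative only — far slower than production implementations; the radius and the σ_d, σ_r defaults are assumptions, as the text gives no values):

```python
import numpy as np

def bilateral_filter(img, radius=2, sigma_d=2.0, sigma_r=30.0):
    """Bilateral filter: each output pixel is a weighted average of its
    neighborhood, with weights = spatial Gaussian * range Gaussian."""
    img = np.asarray(img, dtype=np.float64)
    H, W = img.shape
    out = np.zeros_like(img)
    ys, xs = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    spatial = np.exp(-(ys**2 + xs**2) / (2.0 * sigma_d**2))  # spatial-domain kernel
    pad = np.pad(img, radius, mode="edge")
    for i in range(H):
        for j in range(W):
            patch = pad[i:i + 2 * radius + 1, j:j + 2 * radius + 1]
            rng = np.exp(-(patch - img[i, j])**2 / (2.0 * sigma_r**2))  # range kernel
            w = spatial * rng
            out[i, j] = (patch * w).sum() / w.sum()
    return out
```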
Step 2, respectively taking the preprocessed 2D original lung training image containing the focus area and the corresponding focus mask image as a source and a target of a segmentation model, and simultaneously inputting the source and the target into the segmentation model for training to obtain a predicted focus segmentation mask image;
the training scheme and the super-parameters are set to have a learning rate lr=0.001, a batch size batch_size=4, an optimization function optimizer=adam, training times epoch=50, a Loss function loss=bcEDice Loss, and the learning rate optimization strategy is reduced by one tenth every 15 epochs;
as shown in FIG. 2, the segmentation model is constructed based on a segmentation network Efficient UNet++ with enhanced attention of a scSE spatial channel, and an Efficient Net network is used as a new Encoder to replace the original Encoder of a U-Net++ network so as to improve the extraction capability of the U-Net++ on image depth characteristics; the Decoder section can be seen as a splice using four different depth U-Net subnetworks, each subnetwork containing a different number of 3 x 3 convolutional layers for feature extraction, the outputs of which are spliced together with corresponding feature maps from the Encoder through a series of nested and densely-hopped connections for generating full resolution feature maps at multiple semantic levels; similar to U-Net, the Decoder part also comprises four up-sampling operations, and a new Attention module scSE is inserted into the Decoder up-sampling part, so as to improve the characteristic identification capability of the network to channels and spaces, namely, emphasize important characteristics and simultaneously restrain irrelevant characteristics, thereby achieving the segmentation effect with higher precision.
The segmentation model comprises an Efficientnet module, a k×k depth convolution module, a scSE attention module and a Sigmoid function;
as shown in fig. 3, the Efficientnet module uniformly scales three dimensions of depth, width and resolution of the input image of the network by adopting a composite coefficient, and obtains the best classification effect by balancing weights of the dimensions; the input image is firstly subjected to 3X 3 convolution, then 7 layers of depth separable convolutions with different sizes are carried out, the purpose is to carry out dimension lifting on the feature image obtained after the convolution, the parameter calculation amount is reduced while the feature is extracted, finally, the dimension of the feature image is reduced through 1X 1 convolution, the original resolution of the image is recovered, and an output image containing depth features is obtained;
assuming that the encodings share n layers, in the ith layer (n=4, i e {1,2,3,4 }) extracting depth features through a k×k depth convolution module, compression (Squeeze) and Excitation (expression) operation, filtering and refining through a scSE attention module, and then mutually fusing the extracted depth features with the n-i layers of the encodings through a Dense jump connection (decode skip-connection), generating a new feature map and transmitting the new feature map to the next Encoder layer, namely the n-i+1 layer, and the like until the last layer of the encodings, wherein the output feature map is mapped through a Sigmoid to obtain a predicted focus mask map;
as shown in fig. 4, the scSE attention module merges the channel attention module (cSE) with global space information to recalibrate the channel information, and the space attention module (sSE) provides greater weight to the interested segmentation area to guide the network to pay more attention, compresses the feature images output by each layer of sampling on the Decoder, simultaneously excavates important features of the image from two layers of space and channel, then adds the two results, performs stronger excitation operation, redistributes different weights, and obtains an output feature image with high importance;
the scSE attention module is the result of a parallel combination of cSE and sSE modules; the cSE module changes the dimension of an input feature map from [ C, H, W ] to [ C, 1] through a global average pooling layer, then uses two 1X 1 convolutions to perform information dimension reduction and dimension increase operations to obtain a vector of the C dimension, uses a Sigmoid function to normalize the vector to obtain a corresponding weight vector file, and finally multiplies the input feature map through a channel-wise function to obtain an output feature map calibrated by channel information, so as to enhance the learning of important channel features by a model; the sSE module is similar to the cSE module, but when weight information is extracted, the image space dimension is expanded, a global average pooling layer is not used any more, an output channel is 1, a convolution layer with a convolution kernel size of 1×1 is used for carrying out information dimension reduction and dimension increase operation, and finally, a Sigmoid function is used for normalizing and generating space position weight values of features on each channel and carrying out space information calibration, so that the output feature map contains more important space position information, and irrelevant space position information is ignored;
the feature maps output by the cSE and sSE modules are added to obtain a new feature map, which is then used to assign low weights to spatial and channel features of low importance and high weights to those of high importance, as follows:
weight = softmax(L) * U
where L is the new feature map obtained by adding the feature maps output by the cSE and sSE modules; softmax(L) assigns low values to features of low importance and high values to features of high importance, and each position is finally multiplied with the original feature map U, achieving the goal of distributing weights according to feature importance;
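As a rough illustration, the scSE recalibration described above can be sketched in NumPy. This is a sketch only: the random stand-in weights, the use of dense layers in place of learned 1×1 convolutions, the reduction ratio r, and taking the softmax over the channel axis are all assumptions for illustration, not details fixed by this description.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cse(U, r=2):
    """Channel squeeze-and-excitation over a feature map U of shape [C, H, W]."""
    C = U.shape[0]
    z = U.mean(axis=(1, 2))                      # global average pool -> [C]
    rng = np.random.default_rng(0)               # stand-in for learned 1x1 convs
    W1 = rng.standard_normal((C // r, C)) * 0.1  # dimension reduction
    W2 = rng.standard_normal((C, C // r)) * 0.1  # dimension expansion
    w = sigmoid(W2 @ np.maximum(W1 @ z, 0))      # [C] channel weights in (0, 1)
    return U * w[:, None, None]                  # channel-wise multiplication

def sse(U):
    """Spatial squeeze-and-excitation: 1x1 conv to one channel, sigmoid gate."""
    C = U.shape[0]
    rng = np.random.default_rng(1)
    q = rng.standard_normal(C) * 0.1             # weights of a 1-channel 1x1 conv
    s = sigmoid(np.tensordot(q, U, axes=(0, 0)))  # [H, W] spatial weights
    return U * s[None, :, :]

def scse(U):
    """Parallel cSE + sSE, then softmax re-weighting of the sum against U."""
    L = cse(U) + sse(U)
    soft = np.exp(L) / np.exp(L).sum(axis=0, keepdims=True)  # softmax (channel axis assumed)
    return soft * U                               # weight = softmax(L) * U

U = np.random.default_rng(2).standard_normal((4, 8, 8))
out = scse(U)
print(out.shape)   # (4, 8, 8)
```

In a real model the two 1×1 convolutions are trained jointly with the network; the random matrices here only stand in for their shapes.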
and the Loss between the focus mask map generated by network prediction and the real focus mask map is calculated through BCEDice Loss and back-propagated to update the network parameters; training continues to the next iteration until the Loss stabilizes, at which point training ends.
BCEDice Loss can be expressed as:
L_new = αL_Dice + (1 − α)L_BCE
wherein L_BCE represents the calculated value of the BCE loss function, L_Dice represents the calculated value of the Dice loss function, x_i represents the predicted value of the network, y_i represents the true value of the label image, N represents the number of pixels in the image, and α is a balance factor used to balance the proportion of the Dice loss function and the BCE loss function.
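A minimal NumPy sketch of the BCEDice Loss above; the smoothing constant eps and the default balance factor α = 0.5 are illustrative assumptions, not values fixed by this description.

```python
import numpy as np

def bce_dice_loss(pred, target, alpha=0.5, eps=1e-7):
    """L_new = alpha * L_Dice + (1 - alpha) * L_BCE over per-pixel predictions."""
    pred = np.clip(pred, eps, 1.0 - eps)   # avoid log(0)
    # Binary cross-entropy averaged over the N pixels
    l_bce = -np.mean(target * np.log(pred) + (1 - target) * np.log(1 - pred))
    # Dice loss: 1 minus the Dice coefficient of prediction and label
    inter = np.sum(pred * target)
    l_dice = 1.0 - (2.0 * inter + eps) / (np.sum(pred) + np.sum(target) + eps)
    return alpha * l_dice + (1 - alpha) * l_bce

pred = np.array([0.9, 0.8, 0.1, 0.2])      # network predictions x_i
target = np.array([1.0, 1.0, 0.0, 0.0])    # label ground truth y_i
loss = bce_dice_loss(pred, target)
print(round(loss, 4))
```

A prediction closer to the label drives both terms, and hence the combined loss, toward zero, which is what makes the weighted sum usable as a single training objective.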
S2, inputting the CT image of the lung to be detected into a trained segmentation model, obtaining a focus segmentation mask map, and performing binarization processing on the focus segmentation mask map.
The focus segmentation mask map is denoted as pred. A threshold value threshold = 0.5 is set; pixel values in pred between 0 and threshold are set to 0, and pixel values between threshold and 1 are set to 255 (pred[pred > 0.5] = 255, pred[pred <= 0.5] = 0), yielding a binary focus segmentation mask map m, in which white is the foreground, representing the lung nodule tumor region, and black is the background, representing lung tissue excluding the nodule.
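The thresholding step can be written directly in NumPy; the small array values here are purely illustrative.

```python
import numpy as np

def binarize_mask(pred, threshold=0.5):
    """Map a [0, 1] probability mask pred to a {0, 255} binary focus mask m."""
    m = np.zeros_like(pred, dtype=np.uint8)
    m[pred > threshold] = 255   # foreground: lung nodule tumor region
    m[pred <= threshold] = 0    # background: lung tissue excluding the nodule
    return m

pred = np.array([[0.1, 0.7],
                 [0.5, 0.92]])
m = binarize_mask(pred)
print(m)
```

Note that a pixel exactly at the threshold (0.5 here) falls into the background, matching pred[pred <= 0.5] = 0 above.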
S3, carrying out focus contour sketching on the binary focus segmentation mask map m, extracting image features and quantifying, wherein the specific operation is as follows:
edge detection is carried out on the mask map m according to the different pixel values; the sequence of focus-region edge points is obtained and stored, with the pixel positions of any two adjacent points differing by no more than 1. The closed regions formed by the edge points are divided into contours of different levels by a tree search algorithm, yielding all focus contours in the mask map;
then the minimum circumscribed polygon of each focus contour is found by a polygon approximation method and filled with color for marking. The specific operation is as follows: an error threshold Dmax is designated; the straight line between two points A and B on the focus contour curve is taken as a chord of the contour curve; all other points on the curve are traversed, the distance from each point to the line AB is computed, the maximum-distance point C is found, and its distance is recorded as d. If d is smaller than Dmax, the line AB is taken as an approximating line of the contour curve; otherwise AB is divided into the two segments AC and CB and the operation is repeated until every maximum distance is smaller than Dmax, yielding the approximating polygon of the focus contour. Each contour is then filled with a different color according to its index to show the distinction;
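The chord-splitting procedure described above is essentially the Ramer–Douglas–Peucker scheme. A minimal recursive sketch follows; the sample contour points and the threshold Dmax = 1.0 are illustrative assumptions.

```python
import numpy as np

def point_line_dist(p, a, b):
    """Perpendicular distance from point p to the line through a and b."""
    p, a, b = np.asarray(p, float), np.asarray(a, float), np.asarray(b, float)
    d = b - a
    if np.allclose(a, b):
        return float(np.linalg.norm(p - a))
    # 2D cross product magnitude / chord length
    return float(abs(d[0] * (p - a)[1] - d[1] * (p - a)[0]) / np.linalg.norm(d))

def approx_contour(points, dmax):
    """Split chord AB at the farthest point C until all distances < dmax."""
    a, b = points[0], points[-1]
    dists = [point_line_dist(p, a, b) for p in points[1:-1]]
    if not dists or max(dists) < dmax:
        return [a, b]                      # AB approximates this stretch
    c = 1 + int(np.argmax(dists))          # farthest point C -> split into AC, CB
    left = approx_contour(points[: c + 1], dmax)
    right = approx_contour(points[c:], dmax)
    return left[:-1] + right               # drop the duplicated point C

contour = [(0, 0), (1, 0.1), (2, -0.1), (3, 5), (4, 6), (5, 7), (6, 8.1), (7, 9)]
poly = approx_contour(contour, dmax=1.0)
print(poly)
```

Points that deviate from the current chord by less than Dmax are discarded, so the result keeps only the vertices needed to stay within the error threshold.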
and image feature information such as the center coordinates, maximum diameter and area of the nodule is calculated from the vertex coordinates of the minimum circumscribed polygon, so that lung nodules can be qualitatively classified according to these features, a tumor information quantization result is obtained, and a basis is provided for downstream clinical analysis.
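One plausible way to compute these quantities from the polygon vertices is sketched below: the vertex centroid as the center, the largest pairwise vertex distance as the maximum diameter, and the shoelace formula for the area. The square example is illustrative only.

```python
import numpy as np
from itertools import combinations

def quantify_nodule(vertices):
    """Center, maximum diameter and area of a nodule from its polygon vertices."""
    v = np.asarray(vertices, dtype=float)
    center = v.mean(axis=0)                              # vertex centroid
    # Maximum diameter: largest distance between any two vertices
    diameter = max(np.linalg.norm(a - b) for a, b in combinations(v, 2))
    # Shoelace formula for the polygon area
    x, y = v[:, 0], v[:, 1]
    area = 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))
    return center, float(diameter), float(area)

square = [(0, 0), (10, 0), (10, 10), (0, 10)]
center, diameter, area = quantify_nodule(square)
print(center, diameter, area)   # [5. 5.] 14.142135623730951 100.0
```

For a 10×10 square the maximum diameter is the diagonal (√200 ≈ 14.14) and the area is 100, which is easy to verify by hand.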
To further optimize the technical solution, the present embodiment further provides a lung adenocarcinoma CT image focus segmentation system 100 based on spatial channel attention enhancement, as shown in fig. 5, including: a CT image acquisition module 101, a lesion area segmentation module 102, and a tumor information quantization module 103;
the system is designed and developed based on Streamlit, a Python Web application framework for machine learning, and is deployed to a cloud server using the Git tool;
the CT image acquisition module 101 is used for acquiring a lung CT image to be detected;
the focus region segmentation module 102 is used for inputting the CT image of the lung to be detected into a preset segmentation model to obtain a focus segmentation mask map;
and the tumor information quantization module 103 is used for extracting image features based on the focus segmentation mask image and quantizing the image features to acquire tumor information in the CT image of the lung to be detected.
In order to further optimize the technical scheme, the embodiment also provides a computer readable storage medium, as shown in fig. 6, wherein a program is stored in a memory, and when the program is executed by a processor, the lung adenocarcinoma CT image focus segmentation method based on the spatial channel attention enhancement is realized.
Those skilled in the art will appreciate that the processes implementing all or part of the methods of the above embodiments may be realized by a computer program instructing the relevant hardware; the program may be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the method embodiments above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory. The non-volatile memory can include Read-Only Memory (ROM), Programmable ROM (PROM), Electrically Programmable ROM (EPROM), Electrically Erasable Programmable ROM (EEPROM), or flash memory. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDR SDRAM), Enhanced SDRAM (ESDRAM), Synchronous Link DRAM (SLDRAM), Rambus Direct RAM (RDRAM), Direct Rambus Dynamic RAM (DRDRAM), and Rambus Dynamic RAM (RDRAM), among others.
The above embodiments merely describe preferred embodiments of the present invention, and the scope of the present invention is not limited thereto; modifications and improvements made by those skilled in the art to which the present invention pertains, without departing from the spirit of the present invention, all fall within the scope of protection defined by the appended claims.
Claims (10)
1. The lung adenocarcinoma CT image focus segmentation method based on the spatial channel attention enhancement is characterized by comprising the following steps of:
acquiring a CT image of a lung to be detected;
inputting the lung CT image to be detected into a preset segmentation model to obtain a focus segmentation mask image;
and extracting image features based on the focus segmentation mask image and quantifying them to obtain tumor information in the CT image of the lung to be detected, wherein the segmentation model is obtained by training on a training set, the training set comprises a plurality of patient lung CT images, and the segmentation model is constructed using the spatial-channel-attention-enhanced segmentation network EfficientUNet++.
2. The method of claim 1, further comprising preprocessing the patient lung CT images in the training set before training the segmentation model based on the training set, wherein the preprocessing process comprises:
and performing sequence cutting, windowing, resampling normalization and data enhancement on the patient lung CT image to obtain an effective focus area slice of the patient lung CT image, wherein the effective focus area slice comprises a 2D lung image and a focus mask image corresponding to the 2D lung image.
3. The method of claim 1, wherein the segmentation model comprises: an EfficientNet module, a k×k depth convolution module, a scSE attention module, and a Sigmoid function, wherein the EfficientNet module is used for extracting a first depth feature of an input image through downsampling; the k×k depth convolution module is used for further extracting a second depth feature of the input image based on upsampling; the scSE attention module is used for screening and refining the first depth feature and the second depth feature to obtain feature maps; and the Sigmoid function is used for mapping the feature maps into the focus segmentation mask map.
4. The method of claim 3, wherein the EfficientNet module comprises: one 3×3 convolution layer, seven depth-separable convolution layers of different sizes, and one 1×1 convolution layer, connected in sequence.
5. The method of claim 3, wherein the scSE attention module comprises: a cSE channel attention module and a sSE spatial attention module, wherein the cSE channel attention module is used for calibrating channel information and acquiring a first feature map; the sSE spatial attention module is used for calibrating spatial information and acquiring a second feature map.
6. The method of claim 1, wherein training the segmentation model further comprises:
and calculating a Loss value of the focus segmentation mask map generated by the segmentation model and the actual focus segmentation mask map based on a BCEDice Loss function, reversely updating model parameters based on the Loss value, and carrying out the next training by iteration until the Loss value tends to be stable.
7. The method for segmenting lung adenocarcinoma CT image lesions based on spatial channel attention enhancement according to claim 1, further comprising binarizing the lesion segmentation mask map before extracting image features based on the lesion segmentation mask map and quantifying, wherein the binarizing process comprises:
setting a threshold value threshold, setting a pixel value between 0 and threshold in the focus segmentation mask map as 0, setting a pixel value between threshold and 1 as 255, and obtaining a binary focus segmentation mask map.
8. The method of claim 7, wherein extracting and quantifying image features based on the focus segmentation mask map comprises:
performing edge detection on the binary focus segmentation mask map based on different pixel values, acquiring and storing a sequence of focus region edge points, setting that the pixel position difference of two adjacent points is not more than 1, dividing a closed region formed by the focus region edge points into contours of different levels, and acquiring focus contours in the binary focus segmentation mask map;
searching the minimum external regular polygon of the focus outline, marking the minimum external regular polygons in different focus outlines respectively, obtaining the vertex coordinates of the minimum external regular polygon, calculating the center coordinates, the maximum diameter and the area of the nodule based on the vertex coordinates of the minimum external regular polygon, and obtaining the tumor information.
9. A lung adenocarcinoma CT image lesion segmentation system based on spatial channel attention enhancement, implementing the method according to any one of claims 1-8, comprising: a CT image acquisition module, a focus region segmentation module, and a tumor information quantization module;
the CT image acquisition module is used for acquiring a lung CT image to be detected;
the focus region segmentation module is used for inputting the lung CT image to be detected into a preset segmentation model to obtain a focus segmentation mask map;
and the tumor information quantization module is used for extracting image features based on the focus segmentation mask image and quantizing the image features to acquire tumor information in the CT image of the lung to be detected.
10. A computer readable storage medium storing a program, wherein the program when executed by a processor implements the lung adenocarcinoma CT image lesion segmentation method based on spatial channel attention enhancement according to any of claims 1-8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311524395.0A CN117522892A (en) | 2023-11-15 | 2023-11-15 | Lung adenocarcinoma CT image focus segmentation method based on space channel attention enhancement |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117522892A true CN117522892A (en) | 2024-02-06 |
Family
ID=89765872
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202311524395.0A Pending CN117522892A (en) | 2023-11-15 | 2023-11-15 | Lung adenocarcinoma CT image focus segmentation method based on space channel attention enhancement |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117522892A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN118471521A (en) * | 2024-07-12 | 2024-08-09 | 浙江省肿瘤医院 | Cervical cancer prognosis full-automatic prediction method based on deep learning |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110706246B (en) | Blood vessel image segmentation method and device, electronic equipment and storage medium | |
CN109461495B (en) | Medical image recognition method, model training method and server | |
US8319793B2 (en) | Analyzing pixel data by imprinting objects of a computer-implemented network structure into other objects | |
CN109003267B (en) | Computer-implemented method and system for automatically detecting target object from 3D image | |
CN110930416B (en) | MRI image prostate segmentation method based on U-shaped network | |
WO2021203795A1 (en) | Pancreas ct automatic segmentation method based on saliency dense connection expansion convolutional network | |
CN111882560B (en) | Lung parenchyma CT image segmentation method based on weighted full convolution neural network | |
US20140314299A1 (en) | System and Method for Multiplexed Biomarker Quantitation Using Single Cell Segmentation on Sequentially Stained Tissue | |
CN112991365B (en) | Coronary artery segmentation method, system and storage medium | |
CN113706564A (en) | Meibomian gland segmentation network training method and device based on multiple supervision modes | |
CN111415728A (en) | CT image data automatic classification method and device based on CNN and GAN | |
CN117522892A (en) | Lung adenocarcinoma CT image focus segmentation method based on space channel attention enhancement | |
CN112215842A (en) | Malignant nodule edge detection image processing method based on benign thyroid template | |
Liu et al. | Extracting lungs from CT images via deep convolutional neural network based segmentation and two-pass contour refinement | |
CN114841947A (en) | Method and device for multi-scale feature extraction and prognosis analysis of H & E staining pathological image tumor region | |
CN115546605A (en) | Training method and device based on image labeling and segmentation model | |
CN114693671B (en) | Lung nodule semi-automatic segmentation method, device, equipment and medium based on deep learning | |
Sharma et al. | A comparative study of cell nuclei attributed relational graphs for knowledge description and categorization in histopathological gastric cancer whole slide images | |
CN117036288A (en) | Tumor subtype diagnosis method for full-slice pathological image | |
CN113256670A (en) | Image processing method and device, and network model training method and device | |
CN116883341A (en) | Liver tumor CT image automatic segmentation method based on deep learning | |
CN116777893A (en) | Segmentation and identification method based on characteristic nodules of breast ultrasound transverse and longitudinal sections | |
CN117292217A (en) | Skin typing data augmentation method and system based on countermeasure generation network | |
CN116740386A (en) | Image processing method, apparatus, device and computer readable storage medium | |
CN115797378A (en) | Prostate contour segmentation method based on geometric intersection ratio loss |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||