CN113571190A - Lung function decline prediction device and prediction method thereof - Google Patents
- Publication number: CN113571190A
- Application number: CN202110988100.XA
- Authority: CN (China)
- Prior art keywords: layer, convolution layer, feature, convolution, lung
- Legal status: Granted (the legal status is an assumption and is not a legal conclusion)
Classifications
- G16H50/30 — ICT for calculating health indices; for individual health risk assessment
- G06F18/253 — Pattern recognition; fusion techniques of extracted features
- G06N3/045 — Neural networks; combinations of networks
- G06N3/048 — Neural networks; activation functions
- G06N3/08 — Neural networks; learning methods
- G06T7/0012 — Biomedical image inspection
- G06T2207/10081 — Computed x-ray tomography [CT]
- G06T2207/20081 — Training; learning
- G06T2207/20084 — Artificial neural networks [ANN]
- G06T2207/30061 — Lung
Abstract
The invention discloses a lung function decline prediction device and a prediction method thereof. The lung function progression prediction model provided by the invention comprises a CT (computed tomography) feature extraction network and a multi-modal feature prediction network. The CT feature extraction network performs CT feature extraction on preprocessed lung CT images; the multi-modal feature prediction network predicts the progression of lung function, taking as input the multi-modal features formed by fusing the CT features extracted by the CT feature extraction network with clinical features, and outputting predicted FVC values for different future weeks. In the lung function progression prediction model constructed by the invention, the CT feature extraction network improves the ability to extract CT features, and the multi-modal feature prediction network predicts lung function progression from multi-modal data, effectively improving the accuracy of model prediction.
Description
Technical Field
The invention belongs to the field of artificial intelligence and relates to a lung function decline prediction device based on multi-modal data and a corresponding non-diagnostic lung function decline prediction method.
Background
Idiopathic pulmonary fibrosis (IPF) is a chronic lung disease characterized by insidious onset, unknown etiology, histological or imaging manifestations of usual interstitial pneumonia, progressive dyspnea and declining lung function, with incidence and prevalence of 0.09–1.30 and 0.33–4.51 per 10,000 people, respectively. Because IPF is progressive and diagnostic tools are limited, it may eventually result in complete loss of lung function. The typical median survival of IPF patients is only 3–5 years, and the prognosis is poor. Although no widely used technique exists for estimating the progression of IPF, it is generally believed that the decline of lung function in IPF patients can provide guidance for the prognosis of IPF.
Disclosure of Invention
It is an object of the present invention to address the deficiencies of the prior art by providing a method for predicting lung function decline based on multi-modal data for non-diagnostic purposes.
The method of the invention for predicting lung function decline based on multi-modal data, not for diagnostic purposes, comprises the following steps:
Step (1), acquiring historical lung CT images and corresponding clinical text data, wherein the clinical text data comprise the lung influencing factors, the week number at which forced vital capacity was measured, the forced vital capacity (FVC), and FVC as a percentage of the normal reference value; the lung influencing factors comprise age, sex and smoking status;
Step (2), preprocessing the historical lung CT images and the corresponding clinical text data to construct a data set;
preferably, the step (2) is specifically:
2-1, preprocessing the lung CT images: removing DICOM medical image files that cannot be opened and CT images of no value that contain no lung information; resizing all images to a uniform 512 × 512.
2-2, preprocessing the clinical text data: removing incomplete and erroneous records from the clinical text data; performing feature engineering on the clinical text data to generate more effective data features for model training; applying Min-Max normalization; finally obtaining the preprocessed clinical features.
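The Min-Max normalization of step 2-2 can be sketched as follows; `min_max_normalize` is a hypothetical helper name, and returning zeros for a constant column is an implementation choice not stated in the text:

```python
import numpy as np

def min_max_normalize(column):
    """Min-Max scaling of one clinical feature column to [0, 1].

    A constant column is mapped to zeros to avoid division by zero
    (an assumption; the text only says Min-Max normalization is applied).
    """
    column = np.asarray(column, dtype=float)
    span = column.max() - column.min()
    if span == 0:
        return np.zeros_like(column)
    return (column - column.min()) / span
```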
2-3, fitting the week numbers in the data set and the corresponding FVC values by the least squares method to obtain the linear change rate of the FVC, which is used as one of the training-set labels.
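The least-squares fit of step 2-3 amounts to fitting a line FVC ≈ a·week + b and keeping the slope a as the label. A minimal NumPy sketch (`fvc_linear_rate` is a hypothetical helper name):

```python
import numpy as np

def fvc_linear_rate(weeks, fvc_values):
    """Least-squares slope of FVC against week number (step 2-3).

    np.polyfit with deg=1 solves the least-squares line fit; the first
    returned coefficient is the slope, i.e. the FVC linear change rate.
    """
    weeks = np.asarray(weeks, dtype=float)
    fvc_values = np.asarray(fvc_values, dtype=float)
    slope, _intercept = np.polyfit(weeks, fvc_values, deg=1)
    return slope
```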
Step (3), constructing a lung function progress prediction model, and training by using the data set constructed in the step (2)
The lung function progression prediction model comprises a CT feature extraction network and a multi-modal feature prediction network. The CT feature extraction network performs CT feature extraction on the preprocessed lung CT images; the multi-modal feature prediction network predicts the progression of lung function, taking as input the multi-modal features formed by fusing the CT features extracted by the CT feature extraction network with the clinical features, and outputting the predicted FVC values for different future weeks.
3-1 construction of CT feature extraction network
The CT feature extraction network takes InceptionV1 as its backbone and comprises a front-end down-sampling module and a multi-scale CT feature fusion module.
The front-end down-sampling module comprises 1 × 1 and 3 × 3 convolution layers and a max pooling layer; down-sampling yields high-dimensional features, reduces the number of network parameters and speeds up computation while helping to prevent over-fitting.
Preferably, the front-end down-sampling module comprises three serially-connected 3 × 3 convolution layers, a maximum pooling layer, 1 × 1 convolution layer, two serially-connected 3 × 3 convolution layers and a maximum pooling layer which are sequentially cascaded;
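The preferred front-end module above (three 3 × 3 convs, max pool, 1 × 1 conv, two 3 × 3 convs, max pool) can be sketched in PyTorch. The channel widths, strides and activations below are illustrative assumptions, since the text fixes only the layer order:

```python
import torch
import torch.nn as nn

# Sketch of the front-end down-sampling module: three 3x3 convs ->
# max pool -> 1x1 conv -> two 3x3 convs -> max pool. Channel widths
# and strides are NOT given in the text and are assumed here.
front_end = nn.Sequential(
    nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(inplace=True),
    nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(inplace=True),
    nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(inplace=True),
    nn.MaxPool2d(3, stride=2, padding=1),
    nn.Conv2d(64, 64, 1), nn.ReLU(inplace=True),
    nn.Conv2d(64, 96, 3, padding=1), nn.ReLU(inplace=True),
    nn.Conv2d(96, 128, 3, padding=1), nn.ReLU(inplace=True),
    nn.MaxPool2d(3, stride=2, padding=1),
)

x = torch.randn(1, 1, 512, 512)   # one preprocessed 512x512 CT slice
y = front_end(x)                  # down-sampled high-dimensional features
```

With these assumed strides, a 512 × 512 slice is reduced 8× spatially while the channel count grows, which is the parameter-saving effect the text describes.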
The multi-scale CT feature fusion module comprises, cascaded in sequence, n1 serially connected multi-scale CT feature fusion modules A, a first max pooling layer, n2 serially connected multi-scale CT feature fusion modules B, a second max pooling layer, n3 serially connected multi-scale CT feature fusion modules C, an average pooling layer, a global average pooling layer, a first fully connected layer and a second fully connected layer, wherein n1 ≥ 1, n2 ≥ 1 and n3 ≥ 1.
Preferably, the multi-scale CT feature fusion module comprises 2 serially-connected multi-scale CT feature fusion modules a, a first maximum pooling layer, 2 serially-connected multi-scale CT feature fusion modules B, a second maximum pooling layer, 2 serially-connected multi-scale CT feature fusion modules C, an average pooling layer, a global average pooling layer, a first full-connection layer and a second full-connection layer which are sequentially cascaded;
the multi-scale CT feature fusion module A comprises 5 parallel branches, a feature fusion layer, a residual connection layer, an improved CBAM channel attention module and a 1 x 1 convolution dimensionality increasing layer; the 1 st branch in the 5 parallel branches comprises a 1 x 1 convolution layer; the 2 nd branch comprises 1 × 1 convolution layer, 3 × 3 convolution layer, a cavity convolution layer with the cavity rate of 2 and a characteristic diagram addition layer, wherein the output end of the 1 × 1 convolution layer is connected with the input end of the 3 × 3 convolution layer, the input end of the cavity convolution layer with the cavity rate of 2, the output end of the 3 × 3 convolution layer and the output end of the cavity convolution layer with the cavity rate of 2 are connected with the characteristic diagram addition layer; the 3 rd branch comprises 1 × 1 convolution layer, 5 × 5 convolution layer, a cavity convolution layer with the cavity rate of 2 and a characteristic diagram addition layer, wherein the output end of the 1 × 1 convolution layer is connected with the input end of the 5 × 5 convolution layer, the input end of the cavity convolution layer with the cavity rate of 2, the output end of the 5 × 5 convolution layer and the output end of the cavity convolution layer with the cavity rate of 2 are connected with the characteristic diagram addition layer; the 4 th branch comprises an average pooling layer and a 1 x 1 convolution layer which are sequentially cascaded; the output feature maps of the 1 st to 4 th branches are connected with Concatenate at a feature fusion layer to form a multi-scale CT feature; the residual error connecting layer adds the multi-scale CT characteristics and the original input characteristic diagram of the 5 th branch, so that the characterization capability of the network is improved; the improved CBAM channel attention module is used for receiving the features after residual connecting layer processing, 
adding attention weight to the multi-scale CT features and inhibiting useless information; and (3) performing cross-channel feature fusion on the multi-scale CT features output by the improved CBAM channel attention module by the 1 × 1 convolution dimensionality increasing layer, and widening the number of network channels by using the minimum parameters.
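As a concrete illustration of the branch wiring described above, a PyTorch sketch of one branch of module A (a 1 × 1 convolution feeding a 3 × 3 convolution and a parallel dilated convolution with dilation rate 2, whose outputs are added element-wise). Channel counts and the class name are illustrative assumptions:

```python
import torch
import torch.nn as nn

class BranchWithDilation(nn.Module):
    """Sketch of branch 2 of fusion module A: 1x1 conv -> {3x3 conv,
    dilated 3x3 conv (rate 2)} -> element-wise addition.
    """
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.reduce = nn.Conv2d(in_ch, out_ch, kernel_size=1)
        self.conv3 = nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1)
        # dilation=2 with padding=2 keeps the spatial size unchanged,
        # so the two feature maps can be added directly
        self.dilated = nn.Conv2d(out_ch, out_ch, kernel_size=3,
                                 padding=2, dilation=2)

    def forward(self, x):
        x = self.reduce(x)
        return self.conv3(x) + self.dilated(x)  # feature-map addition layer
```

The dilated path widens the receptive field without extra down-sampling, which is how the patent supplements the detail information lost by plain convolutions.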
The multi-scale CT feature fusion module B comprises 5 parallel branches, a feature fusion layer, a residual connection layer and a 1 × 1 convolution dimensionality-increasing layer. The 1st, 2nd and 4th of the 5 parallel branches have the same structure as in multi-scale CT feature fusion module A. The 3rd branch comprises a 1 × 1 convolution layer, a first 3 × 3 convolution layer, a second 3 × 3 convolution layer, a dilated convolution layer with dilation rate 2 and a feature-map addition layer; the output of the 1 × 1 convolution layer is connected to the input of the first 3 × 3 convolution layer, the first input of the second 3 × 3 convolution layer and the input of the dilated convolution layer; the output of the first 3 × 3 convolution layer is connected to the second input of the second 3 × 3 convolution layer; and the outputs of the second 3 × 3 convolution layer and the dilated convolution layer are connected to the feature-map addition layer. The output feature maps of branches 1-4 are fused by Concatenate at the feature fusion layer to form the multi-scale CT features. The residual connection layer adds the multi-scale CT features to the original input feature map carried by the 5th branch, improving the representational capability of the network. The 1 × 1 convolution dimensionality-increasing layer performs cross-channel feature fusion on the features processed by the residual connection layer, widening the number of network channels with a minimum of parameters.
The multi-scale CT feature fusion module C comprises 5 parallel branches, a feature fusion layer, a residual connection layer, an improved CBAM channel attention module and a 1 × 1 convolution dimensionality-increasing layer. The 1st of the 5 parallel branches has the same structure as in multi-scale CT feature fusion module A. The 2nd branch comprises a 1 × 1 convolution layer, a 1 × 3 convolution layer, a 3 × 1 convolution layer, a dilated convolution layer with dilation rate 2 and a feature-map addition layer; the output of the 1 × 1 convolution layer is connected to the inputs of the 1 × 3 convolution layer and the dilated convolution layer, the output of the 1 × 3 convolution layer is connected to the input of the 3 × 1 convolution layer, and the outputs of the 3 × 1 convolution layer and the dilated convolution layer are connected to the feature-map addition layer. The 3rd branch comprises a 1 × 1 convolution layer, 1 × 3 convolution layer A, 3 × 1 convolution layer A, dilated convolution layer A with dilation rate 2, feature-map addition layer A, 1 × 3 convolution layer B, 3 × 1 convolution layer B, dilated convolution layer B with dilation rate 2 and feature-map addition layer B; the output of the 1 × 1 convolution layer is connected to the inputs of 1 × 3 convolution layer A and dilated convolution layer A; the output of 1 × 3 convolution layer A is connected to the input of 3 × 1 convolution layer A; the outputs of 3 × 1 convolution layer A and dilated convolution layer A are connected to feature-map addition layer A; the output of feature-map addition layer A is connected to the inputs of 1 × 3 convolution layer B and dilated convolution layer B; the output of 1 × 3 convolution layer B is connected to the input of 3 × 1 convolution layer B; and the outputs of 3 × 1 convolution layer B and dilated convolution layer B are connected to feature-map addition layer B. The 4th branch comprises an average pooling layer, a max pooling layer, a feature-map addition layer and a 1 × 1 convolution layer; the outputs of the average pooling layer and the max pooling layer are connected to the input of the feature-map addition layer, whose output is connected to the input of the 1 × 1 convolution layer.
The improved CBAM channel attention module (CBAM-ICA) performs global average pooling and global max pooling on the input feature map, generates two different channel attention maps through two 1 × 1 convolution layers and a Sigmoid activation function, multiplies the two channel attention maps to form the final attention weight, and multiplies this weight pixel-by-pixel with the input feature map F to obtain the final output feature F'. The process is given by equation (1):
F' = (ε(C(P_ag(F))) × ε(C(P_mx(F)))) ⊙ F        (1)
where F denotes the input feature map, P_ag denotes global average pooling, P_mx denotes global max pooling, C denotes the two 1 × 1 convolution layers, ε denotes the Sigmoid activation function, and F' denotes the output feature after the improved CBAM channel attention module.
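Equation (1) can be sketched in NumPy. Since a 1 × 1 convolution applied to a pooled (C, 1, 1) vector reduces to a matrix multiply, the two convolutions are represented below by plain weight matrices W1 and W2 (their shapes are an assumption; the text states only "two 1 × 1 convolutional layers"):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cbam_ica(F, W1, W2):
    """Improved CBAM channel attention, a sketch of equation (1).

    F      : feature map of shape (C, H, W)
    W1, W2 : (C, C) weight matrices standing in for the two shared
             1x1 convolution layers (illustrative assumption)
    """
    p_ag = F.mean(axis=(1, 2))        # P_ag: global average pooling -> (C,)
    p_mx = F.max(axis=(1, 2))         # P_mx: global max pooling -> (C,)
    w_ag = sigmoid(W2 @ (W1 @ p_ag))  # first channel attention map
    w_mx = sigmoid(W2 @ (W1 @ p_mx))  # second channel attention map
    weight = w_ag * w_mx              # product forms the final weight
    return weight[:, None, None] * F  # pixel-wise multiply with F
```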
3-2 building a multimodal feature prediction network
The multi-modal feature prediction network comprises a first multi-modal feature module and a second multi-modal feature module;
the first multi-modal feature module takes CT features extracted by a CT feature extraction network and preprocessed lung influence factors as input, takes the linear change rate of the FVC as output, and comprises a Concatenate feature fusion layer A and a full connection layer which are sequentially cascaded; the concatemate feature fusion layer A is used for carrying out concatemate fusion on CT features extracted by a CT feature extraction network and preprocessed clinical features (age, gender and smoking condition) to obtain first multi-modal features; the full-link layer is used for predicting the FVC linear change rate. The loss function used for predicting the linear change rate of the FVC is an average Absolute error MAE (mean Absolute error), which is the sum of Absolute values of the difference between a target value and a predicted value, and represents the average error amplitude of the predicted value, so that the method has better robustness. In order to relieve the appearance of an overfitting phenomenon in the network training process and enable the network to have good generalization, a dropout layer is added before a full connection layer.
The second multi-modal feature module takes the FVC linear change rate and all clinical features output by the first multi-modal feature module as input, and takes the FVC predicted values in different weeks in the future as output, and comprises an attention module and a multi-layer perceptron (MLP) which are sequentially cascaded.
The attention module computation is given by equation (2):
F_wx = ε(M(F_x)) ⊙ F_x + F_x        (2)
where F_x denotes the FVC linear change rate and all clinical features output by the first multi-modal feature module, M denotes two fully connected layers, ε denotes the Sigmoid activation function, and F_wx denotes the output feature after the attention module. The multi-modal feature F_x passes through the two fully connected layers and the Sigmoid activation function to obtain the attention weight; this weight is multiplied with the input feature F_x and the product is added to F_x to obtain the final output feature F_wx.
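The described computation (two fully connected layers, Sigmoid weight, multiply, residual add) can be sketched in NumPy; the weight shapes and function name are illustrative assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def feature_attention(fx, W1, b1, W2, b2):
    """Attention over the fused clinical / FVC-rate feature vector.

    fx : 1-D multimodal feature vector F_x
    M  : two fully connected layers, here weights (W1, b1) and (W2, b2).
    Returns sigmoid(M(fx)) * fx + fx: attention-weighted features
    with a residual addition, as described in the text.
    """
    hidden = W1 @ fx + b1
    weight = sigmoid(W2 @ hidden + b2)  # attention weights in (0, 1)
    return weight * fx + fx
```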
The multilayer perceptron comprises a first full-connection layer, an ELU activation function layer, a second full-connection layer, a GELU activation function layer and a third full-connection layer which are sequentially cascaded, the output characteristic diagram of the attention module is used as input, and the FVC value is used as output.
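The multilayer perceptron above can be written directly as a PyTorch `Sequential`; the layer widths are assumptions, since the text fixes only the FC/ELU/FC/GELU/FC order:

```python
import torch
import torch.nn as nn

def make_mlp(in_features: int, hidden: int = 64) -> nn.Sequential:
    """Sketch of the MLP that maps the attention output to an FVC value:
    FC -> ELU -> FC -> GELU -> FC. Widths are illustrative assumptions.
    """
    return nn.Sequential(
        nn.Linear(in_features, hidden),
        nn.ELU(),
        nn.Linear(hidden, hidden),
        nn.GELU(),
        nn.Linear(hidden, 1),   # predicted FVC for one future week
    )
```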
Step (4), using the trained lung function progression prediction model to predict the progression of lung function.
It is another object of the present invention to provide a lung function deterioration predicting apparatus based on multi-modal data, comprising:
the lung data acquisition module is used for acquiring lung CT images and corresponding lung influence factors, wherein the lung influence factors comprise age, gender and smoking conditions;
the lung data preprocessing module is used for preprocessing the lung CT image and the lung influence factors;
and the lung function progress prediction module is used for processing the preprocessed lung CT images and the lung influence factors by utilizing the trained lung function progress prediction model so as to obtain the FVC prediction values in different weeks in the future.
The invention has the beneficial effects that:
(1) high accuracy of lung function prediction
The lung function progress prediction model constructed by the method improves the extraction capability of CT characteristics through the CT characteristic extraction network, and predicts the lung function progress by multi-modal data through the multi-modal characteristic prediction network, thereby effectively improving the accuracy of model prediction.
(2) Generalization ability enhancement of lung function progression prediction model
The lung function progression prediction model constructed by the invention takes a series of measures to avoid over-fitting: an Adam optimizer adaptively adjusts the learning rate during training; a dropout layer added to the CT feature extraction network mitigates over-fitting during model training; the mean absolute error (MAE) is used as the loss function when predicting the linear change rate of the FVC, giving better robustness; the FVC value is predicted in the multi-modal feature prediction network and trained with K-fold cross-validation, reducing the risk of over-fitting to a certain extent; ELU and GELU activation functions are used after the fully connected layers, improving robustness to noise and the generalization capability of the model; and an early-termination value is set, so that training stops once the loss has failed to decrease 15 times. These measures ultimately enhance the generalization capability of the model.
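Two of these measures, Adam with an adaptive learning rate and early termination after 15 non-improving evaluations, can be sketched in a minimal PyTorch training loop. The `train` function, the learning rate and the loaders are illustrative placeholders, not the patent's actual training code:

```python
import torch

def train(model, loss_fn, train_loader, val_loader, max_epochs=200):
    """Sketch of the anti-over-fitting training procedure: Adam optimizer
    plus early stopping when the validation loss has not decreased for
    15 consecutive epochs (the early-termination value from the text).
    """
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # lr assumed
    best_val, patience = float("inf"), 0
    for _epoch in range(max_epochs):
        model.train()
        for x, y in train_loader:
            optimizer.zero_grad()
            loss = loss_fn(model(x), y)
            loss.backward()
            optimizer.step()
        model.eval()
        with torch.no_grad():
            val = sum(loss_fn(model(x), y).item() for x, y in val_loader)
        if val < best_val:
            best_val, patience = val, 0
        else:
            patience += 1
            if patience >= 15:   # stop: loss failed to decrease 15 times
                break
    return model
```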
(3) The lung function progress prediction model can effectively predict the lung function progress
The lung function progress prediction model can predict the FVC values of different weeks in the future, so that the severity of the lung function decline of the predicted person can be better understood, and the lung function progress prediction model has guiding significance for the prognosis of the lung function of the predicted person.
Drawings
FIG. 1 is a schematic diagram of the structure of a model for predicting the progression of lung function;
FIG. 2 is a schematic diagram of a CT feature extraction network;
FIG. 3 is a schematic structural diagram of a multi-scale CT feature fusion module A;
FIG. 4 is a schematic structural diagram of a multi-scale CT feature fusion module B;
FIG. 5 is a schematic structural diagram of a multi-scale CT feature fusion module C;
FIG. 6 is a schematic diagram of a structure of a channel attention module of the modified CBAM;
fig. 7 is a schematic structural diagram of a multi-modal feature prediction network.
Detailed Description
The present invention is further described with reference to the following specific embodiment.
A method for non-diagnostic purposes of predicting lung function decline based on multi-modal data, comprising the steps of:
Step (1), acquiring historical lung CT images and corresponding clinical text data, wherein the clinical text data comprise the lung influencing factors, the week number at which forced vital capacity was measured, the forced vital capacity (FVC), and FVC as a percentage of the normal reference value; the lung influencing factors comprise age, sex and smoking status;
table 1 clinical text data
Preprocessing historical lung CT images and corresponding clinical text data to construct a data set; the method comprises the following steps:
2-1, preprocessing the lung CT images: removing DICOM medical image files that cannot be opened and CT images of no value that contain no lung information; resizing all images to a uniform 512 × 512.
2-2, preprocessing the clinical text data: removing incomplete and erroneous records from the clinical text data; performing feature engineering to generate more effective data features for model training; applying Min-Max normalization; finally obtaining the preprocessed clinical features.
2-3, fitting the week numbers in the data set and the corresponding FVC values by the least squares method to obtain the linear change rate of the FVC, which is used as one of the training-set labels.
Step (3), constructing a lung function progress prediction model as shown in figure 1, and training by using the data set constructed in the step (2)
The lung function progression prediction model comprises a CT feature extraction network and a multi-modal feature prediction network. The CT feature extraction network performs CT feature extraction on the preprocessed lung CT images l_i, where 0 ≤ i < N and N denotes the number of weeks for which the forced vital capacity is predicted; the multi-modal feature prediction network predicts the progression of lung function, taking as input the multi-modal features formed by fusing the CT features extracted by the CT feature extraction network with the clinical features, and outputting the predicted FVC values FVC_N for the different future week numbers.
3-1 constructing CT feature extraction network as shown in FIG. 2
The CT feature extraction network takes InceptionV1 as its backbone and comprises a front-end down-sampling module and a multi-scale CT feature fusion module.
Compared with the InceptionV1 network, this network adds residual connections and an improved CBAM channel attention module to expand the receptive field and focus on the effective features of the lung region, and adds dilated convolution modules in parallel with the convolution layers to supplement lost detail information, finally forming three different multi-scale CT feature fusion modules, each stacked twice in series. The lung CT images thus undergo multi-scale feature extraction and fusion, which enhances the CT feature extraction capability of the network and yields more accurate and effective CT features.
The front-end down-sampling module comprises 1 × 1 and 3 × 3 convolution layers and a max pooling layer; down-sampling yields high-dimensional features, reduces the number of network parameters and speeds up computation while helping to prevent over-fitting.
The front-end down-sampling module comprises three serially-connected 3 x 3 convolution layers, a maximum pooling layer, a 1 x 1 convolution layer, two serially-connected 3 x 3 convolution layers and a maximum pooling layer which are sequentially cascaded;
the multi-scale CT feature fusion module comprises 2 serially-connected multi-scale CT feature fusion modules A, a first maximum pooling layer, 2 serially-connected multi-scale CT feature fusion modules B, a second maximum pooling layer, 2 serially-connected multi-scale CT feature fusion modules C, an average pooling layer, a global average pooling layer, a first full-connection layer and a second full-connection layer which are sequentially cascaded.
As shown in fig. 3, the multi-scale CT feature fusion module A comprises 5 parallel branches, a feature fusion layer, a residual connection layer, an improved CBAM channel attention module and a 1 × 1 convolution dimensionality-increasing layer. The 1st of the 5 parallel branches comprises a 1 × 1 convolution layer. The 2nd branch comprises a 1 × 1 convolution layer, a 3 × 3 convolution layer, a dilated convolution layer with dilation rate 2 and a feature-map addition layer; the output of the 1 × 1 convolution layer is connected to the inputs of both the 3 × 3 convolution layer and the dilated convolution layer, and the outputs of the 3 × 3 convolution layer and the dilated convolution layer are connected to the feature-map addition layer. The 3rd branch comprises a 1 × 1 convolution layer, a 5 × 5 convolution layer, a dilated convolution layer with dilation rate 2 and a feature-map addition layer; the output of the 1 × 1 convolution layer is connected to the inputs of both the 5 × 5 convolution layer and the dilated convolution layer, and the outputs of the 5 × 5 convolution layer and the dilated convolution layer are connected to the feature-map addition layer. The 4th branch comprises an average pooling layer and a 1 × 1 convolution layer cascaded in sequence. The output feature maps of branches 1-4 are fused by Concatenate at the feature fusion layer to form the multi-scale CT features. The residual connection layer adds the multi-scale CT features to the original input feature map carried by the 5th branch, improving the representational capability of the network. The improved CBAM channel attention module receives the features processed by the residual connection layer, adds attention weights to the multi-scale CT features and suppresses useless information. The 1 × 1 convolution dimensionality-increasing layer performs cross-channel feature fusion on the multi-scale CT features output by the improved CBAM channel attention module, widening the number of network channels with a minimum of parameters.
As shown in fig. 4, the multi-scale CT feature fusion module B includes 5 parallel branches, a feature fusion layer, a residual connection layer, and a 1 × 1 convolution dimensionality-increasing layer. The 1st, 2nd and 4th of the 5 parallel branches have the same structure as in the multi-scale CT feature fusion module A. The 3rd branch comprises a 1 × 1 convolution layer, a first 3 × 3 convolution layer, a second 3 × 3 convolution layer, a dilated convolution layer with a dilation rate of 2, and a feature-map addition layer; the output end of the 1 × 1 convolution layer is connected with the input end of the first 3 × 3 convolution layer, the first input end of the second 3 × 3 convolution layer, and the input end of the dilated convolution layer; the output end of the first 3 × 3 convolution layer is connected with the second input end of the second 3 × 3 convolution layer; and the output ends of the second 3 × 3 convolution layer and the dilated convolution layer are connected with the feature-map addition layer. The output feature maps of the 1st to 4th branches are fused by Concatenate at the feature fusion layer to form the multi-scale CT features. The residual connection layer adds the multi-scale CT features to the original input feature map carried by the 5th branch, improving the representational capability of the network. Finally, the 1 × 1 convolution dimensionality-increasing layer performs cross-channel feature fusion on the features processed by the residual connection layer, widening the number of network channels with a minimum of parameters.
As shown in fig. 5, the multi-scale CT feature fusion module C includes 5 parallel branches, a feature fusion layer, a residual connection layer, an improved CBAM channel attention module, and a 1 × 1 convolution dimensionality-increasing layer. The 1st of the 5 parallel branches has the same structure as in the multi-scale CT feature fusion module A. The 2nd branch comprises a 1 × 1 convolution layer, a 1 × 3 convolution layer, a 3 × 1 convolution layer, a dilated convolution layer with a dilation rate of 2, and a feature-map addition layer; the output end of the 1 × 1 convolution layer is connected with the input end of the 1 × 3 convolution layer and the input end of the dilated convolution layer; the output end of the 1 × 3 convolution layer is connected with the input end of the 3 × 1 convolution layer; and the output ends of the 3 × 1 convolution layer and the dilated convolution layer are connected with the feature-map addition layer. The 3rd branch comprises a 1 × 1 convolution layer, a 1 × 3 convolution layer A, a 3 × 1 convolution layer A, a dilated convolution layer A with a dilation rate of 2, a feature-map addition layer A, a 1 × 3 convolution layer B, a 3 × 1 convolution layer B, a dilated convolution layer B with a dilation rate of 2, and a feature-map addition layer B. The output end of the 1 × 1 convolution layer is connected with the input end of the 1 × 3 convolution layer A, the input end of the dilated convolution layer A, and the first input end of the 1 × 3 convolution layer B; the output end of the 1 × 3 convolution layer A is connected with the input end of the 3 × 1 convolution layer A; the output ends of the 3 × 1 convolution layer A and the dilated convolution layer A are connected with the input ends of the feature-map addition layer A; the output end of the feature-map addition layer A is connected with the second input end of the 1 × 3 convolution layer B and the input end of the dilated convolution layer B; the output end of the 1 × 3 convolution layer B is connected with the input end of the 3 × 1 convolution layer B; and the output ends of the 3 × 1 convolution layer B and the dilated convolution layer B are connected with the input ends of the feature-map addition layer B. The 4th branch comprises an average pooling layer, a maximum pooling layer, a feature-map addition layer, and a 1 × 1 convolution layer; the output ends of the average pooling layer and the maximum pooling layer are connected with the input ends of the feature-map addition layer, and the output end of the feature-map addition layer is connected with the input end of the 1 × 1 convolution layer.
As shown in fig. 6, the improved CBAM channel attention module operates as follows:
the method comprises the steps of changing two fully-connected layers of the original CBAM for calculating the attention weight into 1-1 convolutional layers, converting the original calculation process of adding the layers and calculating the weight by using an activation function into the process of directly calculating the attention weight value by using a Sigmoid activation function for two convolution output characteristics, then multiplying the two attention weights, outputting the channel attention weight value, and removing the space attention part in the CBAM attention mechanism. That is, in the improved CBAM channel attention module, first, the input feature maps are respectively subjected to global average pooling and global maximum pooling, then, two 1 × 1 convolutional layers are passed through, two different channel attention feature maps are generated through a Sigmoid activation function, finally, the two channel attention feature maps are multiplied to form a final attention weight, and the final attention weight is multiplied with the input feature map F pixel by pixel to obtain a final output feature F'. The specific process can be represented by formula (1):
F' = (ε(C(P_ag(F))) × ε(C(P_mx(F)))) ⊙ F    (1)
wherein F represents the input feature map, P_ag represents global average pooling, P_mx represents global maximum pooling, C represents the two 1 × 1 convolutional layers, ε represents the Sigmoid activation function, and F' represents the output features after passing through the improved CBAM channel attention module.
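As a concrete illustration, the channel attention of formula (1) can be sketched in numpy. On a globally pooled (C, 1, 1) vector a 1 × 1 convolution reduces to a matrix multiply across channels, so the two convolutions are modeled here as plain matrices W1 and W2 (an illustrative assumption; the patent does not specify layer sizes):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cbam_ica_channel_attention(F, W1, W2):
    """Sketch of formula (1): F' = (sigmoid(C(P_ag(F))) * sigmoid(C(P_mx(F)))) ⊙ F.

    F      : feature map of shape (C, H, W)
    W1, W2 : stand-ins for the two shared 1x1 convolutions (matrices, since a
             1x1 conv on a pooled (C, 1, 1) vector is a channel-wise linear map).
    """
    p_avg = F.mean(axis=(1, 2))          # global average pooling -> (C,)
    p_max = F.max(axis=(1, 2))           # global maximum pooling -> (C,)
    w_avg = sigmoid(W2 @ (W1 @ p_avg))   # attention weight of the average branch
    w_max = sigmoid(W2 @ (W1 @ p_max))   # attention weight of the max branch
    w = w_avg * w_max                    # multiply the two weights (improved CBAM)
    return w[:, None, None] * F          # pixel-by-pixel (broadcast) multiply with F

C, H, W = 8, 4, 4
rng = np.random.default_rng(0)
F = rng.random((C, H, W))
W1 = rng.standard_normal((C // 2, C))    # channel reduction, as in CBAM
W2 = rng.standard_normal((C, C // 2))
Fp = cbam_ica_channel_attention(F, W1, W2)
```

Since each weight lies in (0, 1), the output F' is an attenuated copy of F in which less informative channels are suppressed.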
3-2 construction of a multimodal feature prediction network as in FIG. 7
The multi-modal feature prediction network comprises a first multi-modal feature module and a second multi-modal feature module;
the first multi-modal feature module takes CT features extracted by a CT feature extraction network and preprocessed lung influence factors as input, |iThe corresponding FVC linear change rate is taken as output and comprises a Concatenate characteristic fusion layer A, a,A fully-connected layer; the concatemate feature fusion layer A is used for carrying out concatemate fusion on CT features extracted by a CT feature extraction network and preprocessed clinical features (age, gender and smoking condition) to obtain first multi-modal features; the full-link layer is used for predicting the FVC linear change rate. The loss function used for predicting the linear change rate of the FVC is an average Absolute error MAE (mean Absolute error), which is the sum of Absolute values of the difference between a target value and a predicted value, and represents the average error amplitude of the predicted value, so that the method has better robustness. In order to relieve the appearance of an overfitting phenomenon in the network training process and enable the network to have good generalization, a dropout layer is added before a full connection layer.
The second multi-modal feature module takes the FVC linear change rate output by the first multi-modal feature module and all clinical features (the lung influence factors, the week number corresponding to each forced vital capacity measurement, the forced vital capacity FVC, and the percentage of the forced vital capacity relative to the standard value of a normal person) as input and the FVC value at week N as output, and comprises an attention module and a multilayer perceptron (MLP) which are sequentially cascaded.
The attention module calculation process is represented by formula (2):

F_wx = ε(M(F_x)) ⊙ F_x + F_x    (2)

where F_x denotes the FVC linear change rate and all clinical features output by the first multi-modal feature module, M denotes two fully-connected layers, ε denotes the Sigmoid activation function, and F_wx denotes the output features after passing through the attention module. The multi-modal feature F_x passes through the two fully-connected layers, the attention weight is computed by the Sigmoid activation function, and finally the attention weight is multiplied with the input feature F_x and the product is added to F_x to obtain the final output feature F_wx.
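The attention computation described above (two fully-connected layers, Sigmoid, multiply by the input, residual add) can be sketched in numpy; M1 and M2 stand in for the two fully-connected layers (illustrative matrices — biases and exact layer sizes are not given in the text):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def multimodal_attention(fx, M1, M2):
    """Sketch of formula (2): F_wx = sigmoid(M(F_x)) ⊙ F_x + F_x.

    fx     : multi-modal feature vector (FVC change rate + clinical features)
    M1, M2 : stand-ins for the two fully-connected layers M.
    """
    w = sigmoid(M2 @ (M1 @ fx))   # attention weight from two FC layers + Sigmoid
    return w * fx + fx            # multiply by the input, then residual add

rng = np.random.default_rng(1)
fx = rng.random(6)
M1 = rng.standard_normal((6, 6))
M2 = rng.standard_normal((6, 6))
fwx = multimodal_attention(fx, M1, M2)
```

The residual term keeps every input feature in the output, so the attention can only rescale features between 1× and 2× their original magnitude rather than zero them out.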
The multilayer perceptron comprises a first fully-connected layer, an ELU activation function layer, a second fully-connected layer, a GELU activation function layer and a third fully-connected layer which are sequentially cascaded; it takes the output feature map of the attention module as input and the FVC value as output. The third fully-connected layer outputs three feature values: Out1, Out2 and Out3. Out2 is the predicted FVC value, and Out3 minus Out1 is the standard-deviation value used to compute the Laplace log-likelihood score for model evaluation. The loss function used for predicting the FVC values at different weeks in the future is the quantile loss function, with quantile values [0.2, 0.5, 0.8]. Training uses K-fold cross-validation: (K−1)/K of the samples are randomly selected from all multi-modal features as the training set and the remainder as the validation set; in addition, an early-stopping patience of 15 is set, so training stops once the loss function no longer decreases, which reduces the risk of overfitting. The second multi-modal feature module uses K = 6; training is performed six times, and the final prediction is the average of the six prediction results. ELU and GELU activation functions are used after the fully-connected layers to improve robustness to noise and the generalization capability of the network.
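A numpy sketch of the quantile (pinball) loss with the chosen quantiles [0.2, 0.5, 0.8], together with the Laplace log-likelihood evaluation metric in which Out3 − Out1 plays the role of the standard deviation. The sample numbers are invented for illustration, and the clipping conventions that competition-style versions of this metric usually apply are omitted here as a simplifying assumption:

```python
import numpy as np

QUANTILES = [0.2, 0.5, 0.8]

def quantile_loss(y_true, y_pred, q):
    """Pinball loss for a single quantile q: penalizes under-prediction with
    weight q and over-prediction with weight (1 - q)."""
    e = y_true - y_pred
    return np.mean(np.maximum(q * e, (q - 1) * e))

def total_quantile_loss(y_true, preds):
    """Average pinball loss over the three heads Out1/Out2/Out3,
    matched to the quantile values [0.2, 0.5, 0.8]."""
    return np.mean([quantile_loss(y_true, p, q) for p, q in zip(preds, QUANTILES)])

def laplace_log_likelihood(y_true, y_pred, sigma):
    """Evaluation metric; sigma = Out3 - Out1 acts as the standard deviation."""
    delta = np.abs(y_true - y_pred)
    return np.mean(-np.sqrt(2.0) * delta / sigma - np.log(np.sqrt(2.0) * sigma))

# Two illustrative patients: true FVC and the three quantile heads.
y = np.array([2500.0, 2400.0])
out1 = np.array([2300.0, 2200.0])   # 0.2 quantile
out2 = np.array([2480.0, 2390.0])   # 0.5 quantile = FVC prediction
out3 = np.array([2600.0, 2500.0])   # 0.8 quantile
loss = total_quantile_loss(y, [out1, out2, out3])
score = laplace_log_likelihood(y, out2, out3 - out1)
```

A wider Out3 − Out1 interval lowers the penalty for a given error but pays a log(sigma) cost, so the score rewards predictions that are both accurate and confidently narrow.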
Step (4): the trained lung function progress prediction model is used to predict the progress of lung function.
The preprocessed lung CT images and clinical features in the test set are input into the trained lung function progress prediction model, which predicts the FVC values at different weeks in the future and finally outputs the lung function progress prediction results for the subjects in the test set.
In order to use the appropriate quantile values in the quantile loss function calculation process to obtain a better prediction result, experiments were performed on the selection of the quantile values, and the experimental results are shown in table 2.
TABLE 2 comparison of results for different quantiles
The experimental results show that using [0.2, 0.5, 0.8] as the quantile values of the quantile loss function yields a better Laplace log-likelihood score and improves the accuracy of the model predictions.
In order to verify that the performance improvement effect obtained by introducing the improved CBAM channel attention mechanism into the multi-scale CT feature fusion module is better, an attention mechanism introduction position comparison experiment is performed, and the experiment result is shown in table 3.
TABLE 3 attention mechanism introduction position comparison
The experimental results show that introducing the attention mechanism at appropriate positions in the multi-scale CT feature fusion modules effectively improves model performance and prediction accuracy; the best effect is obtained when the attention mechanism is added to the structures of both the multi-scale CT feature fusion module A and the multi-scale CT feature fusion module C.
In order to compare the roles of different attention modules in the CT feature extraction network, comparative experiments were conducted with the SE, CBAM, ECA, scSE and CBAM-ICA attention modules. Each attention module was added after the multi-scale concatenation in the multi-scale CT feature fusion modules A and C of the CT feature extraction network; the experimental results are shown in table 4.
TABLE 4 Attention module comparison experiment
As can be seen from the table, the CBAM-ICA module improves model performance the most among the compared attention modules, while its parameter count does not increase significantly and remains the same as that of the CBAM attention module.
There are two ways to combine the attention mechanism with the residual module. In structure a, after the multi-scale feature fusion is completed, attention weights are added to the multi-scale features, which are then residual-connected with the original input features. In structure b, after the multi-scale feature fusion is completed, the multi-scale features are first residual-connected with the original input features, and the attention mechanism is applied afterwards. To compare the merits of the two module structures, comparative experiments were performed on structures a and b; the experimental results are shown in table 5.
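The difference between the two wirings can be made concrete with toy stand-ins for the attention and the multi-scale branch (illustrative functions only, not the patent's layers):

```python
import numpy as np

def attn(x):
    """Toy channel attention: uniformly halve every value (stand-in)."""
    return 0.5 * x

def structure_a(x, branch):
    """Structure a: attention on the fused multi-scale features, THEN residual add."""
    return attn(branch(x)) + x

def structure_b(x, branch):
    """Structure b: residual add first, THEN attention on the combined features."""
    return attn(branch(x) + x)

branch = lambda x: 2.0 * x      # toy multi-scale branch (stand-in)
x = np.ones(4)
a = structure_a(x, branch)      # 0.5 * (2x) + x   = 2.0 * x
b = structure_b(x, branch)      # 0.5 * (2x + x)   = 1.5 * x
```

The toy example makes the structural difference visible: in structure a the identity path bypasses the attention untouched, while in structure b the attention also rescales the residual path, so the two orderings are genuinely different operators rather than a cosmetic reordering.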
TABLE 5 comparison of residual error plus CBAM-ICA Module Structure
As can be seen from the table, the Laplace log-likelihood score of structure b is significantly better than that of structure a at the same parameter count. Therefore, the multi-scale CT feature fusion modules in the CT feature extraction network adopt structure b to combine the residual connection with the CBAM-ICA module.
In order to verify the performance of the CT feature extraction network in the lung function progress prediction model, comparative experiments were performed in which the CT feature extraction network was replaced by different networks. The networks selected for comparison were InceptionV1, InceptionV3, Inception-ResNet-V2, ResNet50, DenseNet121 and EfficientNetB0. The results of the experiment are shown in Table 6.
TABLE 6 CT feature extraction network comparison
From the table, it is observed that the three networks built on the Inception multi-scale module achieve Laplace log-likelihood scores on the test set superior to most of the other networks, so the invention uses InceptionV1 as the backbone of the CT feature extraction network. Compared with the other networks, the model achieves a Laplace log-likelihood score of -6.8107 on the test set, the best result, while using a small number of network parameters.
Comparative experiments were carried out against several existing lung function progress prediction methods to verify the effectiveness of the lung function progress prediction model. The compared methods are FibrosisNet, Fibro-CoSANet, and a DNN + GBDT + NGBoost + ElasticNet ensemble model. The results of the experiment are shown in Table 7.
TABLE 7 Lung function progression prediction method comparison
The experimental results show that, compared with the existing lung function progress prediction methods, the model of the invention achieves a better Laplace log-likelihood score and can therefore predict the progress of lung function more accurately.
The base model used in the ablation experiments is the lung function progress prediction model with the residual module, the dilated convolution module and the CBAM-ICA attention module removed from the CT feature extraction network. During the experiments the three modules were added back individually, and the results were compared with those of the base model. The results of the experiment are shown in Table 8.
TABLE 8 Model ablation experiment
According to the experimental results, adding the residual module, the dilated convolution module or the CBAM-ICA attention module to the CT feature extraction network each improves the Laplace log-likelihood score to a different degree, and the prediction score of the model is best when all three modules are added simultaneously. The lung function progress prediction model containing the residual module, the dilated convolution module and the CBAM-ICA attention module therefore gives more accurate predictions.
To verify the validity of the multi-modal data, the lung function progress prediction model based on the multi-modal data of the present invention was compared with a model method using only clinical text data or lung CT image data, and the experimental results are shown in table 9.
TABLE 9 Effect of different modality data prediction
As can be seen from the experimental results, the Laplace log-likelihood score of the model using only lung CT image data is lower than that of the model using only clinical text data, and the scores of these single-modality methods are all much lower than that of the lung function progress prediction model based on multi-modal data. Therefore, compared with single-modality medical data, multi-modal data can effectively improve the accuracy of model prediction.
The above embodiments do not limit the present invention, and the present invention is not restricted to them; any implementation that meets the requirements of the present invention falls within its scope of protection.
Claims (6)
1. A method for predicting lung function decline based on multi-modal data for non-diagnostic purposes, comprising the steps of:
acquiring historical lung CT images and corresponding clinical text data; wherein the clinical text data comprise lung influence factors, the week number corresponding to each forced vital capacity measurement, the forced vital capacity FVC, and the percentage of the forced vital capacity relative to the standard value of a normal person; the lung influence factors comprise age, gender and smoking status;
preprocessing historical lung CT images and corresponding clinical text data to construct a data set;
step (3), constructing a lung function progress prediction model and training it with the data set constructed in step (2);
The lung function progress prediction model comprises a CT (computed tomography) feature extraction network and a multi-modal feature prediction network; the CT feature extraction network performs CT feature extraction on the preprocessed lung CT images; the multi-modal feature prediction network predicts the lung function progress condition, taking as input the multi-modal features formed by fusing the CT features extracted by the CT feature extraction network with the clinical features and giving as output the FVC predicted values at different weeks in the future;
3-1 construction of CT feature extraction network
The CT feature extraction network takes InceptionV1 as its backbone network and comprises a front-end downsampling module and a multi-scale CT feature fusion module;
the multi-scale CT feature fusion module comprises n1 serially-connected multi-scale CT feature fusion modules A, a first maximum pooling layer, n2 serially-connected multi-scale CT feature fusion modules B, a second maximum pooling layer, n3 serially-connected multi-scale CT feature fusion modules C, an average pooling layer, a global average pooling layer, a first full-connection layer and a second full-connection layer which are sequentially cascaded, wherein n1 is more than or equal to 1, n2 is more than or equal to 1, and n3 is more than or equal to 1;
the multi-scale CT feature fusion module A comprises 5 parallel branches, a feature fusion layer, a residual connection layer, an improved CBAM channel attention module and a 1 x 1 convolution dimensionality increasing layer; the 1 st branch in the 5 parallel branches comprises a 1 x 1 convolution layer; the 2 nd branch comprises 1 × 1 convolution layer, 3 × 3 convolution layer, a cavity convolution layer with the cavity rate of 2 and a characteristic diagram addition layer, wherein the output end of the 1 × 1 convolution layer is connected with the input end of the 3 × 3 convolution layer, the input end of the cavity convolution layer with the cavity rate of 2, the output end of the 3 × 3 convolution layer and the output end of the cavity convolution layer with the cavity rate of 2 are connected with the characteristic diagram addition layer; the 3 rd branch comprises 1 × 1 convolution layer, 5 × 5 convolution layer, a cavity convolution layer with the cavity rate of 2 and a characteristic diagram addition layer, wherein the output end of the 1 × 1 convolution layer is connected with the input end of the 5 × 5 convolution layer, the input end of the cavity convolution layer with the cavity rate of 2, the output end of the 5 × 5 convolution layer and the output end of the cavity convolution layer with the cavity rate of 2 are connected with the characteristic diagram addition layer; the 4 th branch comprises an average pooling layer and a 1 x 1 convolution layer which are sequentially cascaded; the output feature maps of the 1 st to 4 th branches are connected with Concatenate at a feature fusion layer to form a multi-scale CT feature; adding the multi-scale CT characteristics and the original input characteristic graph of the 5 th branch by the residual connection layer; the improved CBAM channel attention module is used for receiving the processed features of the residual connecting layer and adding attention weight to the multi-scale CT features; 1, performing 
cross-channel feature fusion on the multi-scale CT features output by the improved CBAM channel attention module by the convolution dimensionality-increasing layer;
the multi-scale CT feature fusion module B comprises 5 parallel branches, a feature fusion layer, a residual connection layer and a 1 × 1 convolution dimensionality-increasing layer; the 1st, 2nd and 4th of the 5 parallel branches have the same structure as in the multi-scale CT feature fusion module A; the 3rd branch comprises a 1 × 1 convolution layer, a first 3 × 3 convolution layer, a second 3 × 3 convolution layer, a dilated convolution layer with a dilation rate of 2 and a feature-map addition layer, wherein the output end of the 1 × 1 convolution layer is connected with the input end of the first 3 × 3 convolution layer, the first input end of the second 3 × 3 convolution layer and the input end of the dilated convolution layer, the output end of the first 3 × 3 convolution layer is connected with the second input end of the second 3 × 3 convolution layer, and the output ends of the second 3 × 3 convolution layer and the dilated convolution layer are connected with the feature-map addition layer; the output feature maps of the 1st to 4th branches are fused by Concatenate at the feature fusion layer to form multi-scale CT features; the residual connection layer adds the multi-scale CT features to the original input feature map carried by the 5th branch; and the 1 × 1 convolution dimensionality-increasing layer performs cross-channel feature fusion on the features processed by the residual connection layer;
the multi-scale CT feature fusion module C comprises 5 parallel branches, a feature fusion layer, a residual connection layer, an improved CBAM channel attention module and a 1 × 1 convolution dimensionality-increasing layer; the 1st of the 5 parallel branches has the same structure as in the multi-scale CT feature fusion module A; the 2nd branch comprises a 1 × 1 convolution layer, a 1 × 3 convolution layer, a 3 × 1 convolution layer, a dilated convolution layer with a dilation rate of 2 and a feature-map addition layer, wherein the output end of the 1 × 1 convolution layer is connected with the input end of the 1 × 3 convolution layer and the input end of the dilated convolution layer, the output end of the 1 × 3 convolution layer is connected with the input end of the 3 × 1 convolution layer, and the output ends of the 3 × 1 convolution layer and the dilated convolution layer are connected with the feature-map addition layer; the 3rd branch comprises a 1 × 1 convolution layer, a 1 × 3 convolution layer A, a 3 × 1 convolution layer A, a dilated convolution layer A with a dilation rate of 2, a feature-map addition layer A, a 1 × 3 convolution layer B, a 3 × 1 convolution layer B, a dilated convolution layer B with a dilation rate of 2 and a feature-map addition layer B, wherein the output end of the 1 × 1 convolution layer is connected with the input end of the 1 × 3 convolution layer A, the input end of the dilated convolution layer A and the first input end of the 1 × 3 convolution layer B, the output end of the 1 × 3 convolution layer A is connected with the input end of the 3 × 1 convolution layer A, the output ends of the 3 × 1 convolution layer A and the dilated convolution layer A are connected with the input ends of the feature-map addition layer A, the output end of the feature-map addition layer A is connected with the second input end of the 1 × 3 convolution layer B and the input end of the dilated convolution layer B, the output end of the 1 × 3 convolution layer B is connected with the input end of the 3 × 1 convolution layer B, and the output ends of the 3 × 1 convolution layer B and the dilated convolution layer B are connected with the input ends of the feature-map addition layer B; the 4th branch comprises an average pooling layer, a maximum pooling layer, a feature-map addition layer and a 1 × 1 convolution layer, wherein the output ends of the average pooling layer and the maximum pooling layer are connected with the input ends of the feature-map addition layer, and the output end of the feature-map addition layer is connected with the input end of the 1 × 1 convolution layer;
the improved CBAM channel attention module performs global average pooling and global maximum pooling on the input feature map respectively, then passes each pooled result through two 1 × 1 convolutional layers and a Sigmoid activation function to generate two different channel attention maps, finally multiplies the two channel attention maps to form the final attention weight, and multiplies the final attention weight pixel by pixel with the input feature map F to obtain the final output feature F';
3-2 building a multimodal feature prediction network
The multi-modal feature prediction network comprises a first multi-modal feature module and a second multi-modal feature module;
the first multi-modal feature module takes the CT features extracted by the CT feature extraction network and the preprocessed lung influence factors as input and the FVC linear change rate as output, and comprises a Concatenate feature fusion layer A and a fully-connected layer which are sequentially cascaded; the Concatenate feature fusion layer A performs Concatenate fusion of the CT features extracted by the CT feature extraction network and the lung influence factors in the preprocessed clinical features to obtain first multi-modal features; the fully-connected layer is used for predicting the FVC linear change rate;
the second multi-modal characteristic module takes the FVC linear change rate and all clinical characteristics output by the first multi-modal characteristic module as input, and takes the FVC predicted values in different weeks in the future as output, and comprises an attention module and a multilayer perceptron MLP which are sequentially cascaded;
the attention module calculation process is represented by equation 2:
in the formula, FxRepresents the linear rate of change of the FVC and all clinical features output by the first multimodal feature model, M represents two fully connected layers, ε represents the Sigmoid activation function, FwxRepresenting the output characteristics after passing through the attention module;
the multilayer perceptron comprises a first full-connection layer, an ELU activation function layer, a second full-connection layer, a GELU activation function layer and a third full-connection layer which are sequentially cascaded, the output characteristic diagram of the attention module is used as input, and the FVC value is used as output;
step (4), utilizing the trained lung function progress prediction model to realize the prediction of lung function progress.
2. The method for predicting lung function decline based on multi-modal data for non-diagnostic purposes as claimed in claim 1, wherein the step (2) is specifically:
2-1, preprocessing the lung CT image:
removing DICOM medical image files that cannot be opened and valueless CT images that contain no lung information, and resizing the images to a uniform size;
2-2, preprocessing clinical text data:
removing incomplete and erroneous entries from the clinical text data, performing feature engineering on the clinical text data, and applying Min-Max normalization to obtain the required clinical features;
and 2-3, fitting the different week numbers in the data set and the corresponding FVC values by the least-squares method to obtain the FVC linear change rate, which is used as one label of the training set.
3. The method according to claim 1, wherein the front-end down-sampling module in the CT feature extraction network comprises three serially connected 3 x 3 convolutional layers, a max-pooling layer, a 1 x 1 convolutional layer, two serially connected 3 x 3 convolutional layers, and a max-pooling layer, which are sequentially cascaded.
4. The method according to claim 1, wherein the multi-scale CT feature fusion module in the CT feature extraction network comprises sequentially cascaded 2 serially connected multi-scale CT feature fusion modules a, a first maximum pooling layer, 2 serially connected multi-scale CT feature fusion modules B, a second maximum pooling layer, 2 serially connected multi-scale CT feature fusion modules C, an average pooling layer, a global average pooling layer, a first fully-connected layer, and a second fully-connected layer.
5. The method of claim 1, wherein the specific process of the channel attention module in the CT feature extraction network for improving CBAM is represented by formula 1:
F' = (ε(C(P_ag(F))) × ε(C(P_mx(F)))) ⊙ F    (1)
wherein F represents the input feature map, P_ag represents global average pooling, P_mx represents global maximum pooling, C represents the two 1 × 1 convolutional layers, ε represents the Sigmoid activation function, and F' represents the output features after passing through the improved CBAM channel attention module.
6. A lung function decline prediction apparatus based on multi-modal data, comprising:
the lung data acquisition module, configured to acquire lung CT images and corresponding lung influencing factors, wherein the lung influencing factors comprise age, sex, and smoking status;
the lung data preprocessing module, configured to preprocess the lung CT images and the lung influencing factors; and
the lung function progression prediction module, configured to process the preprocessed lung CT images and lung influencing factors with the trained lung function progression prediction model to obtain predicted FVC values for different future weeks.
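The three-module device of claim 6 can be sketched as a plain pipeline. All function bodies below are illustrative stand-ins; a real device would wrap the CT preprocessing and the trained prediction model, and the baseline FVC and decline rate are hypothetical numbers.

```python
def acquire_lung_data():
    """Stand-in for the lung data acquisition module (illustrative values)."""
    return {
        "ct": [[0.0, 0.1], [0.2, 0.3]],                      # toy CT slice
        "factors": {"age": 65, "sex": "M", "smoker": True},  # influencing factors
    }

def preprocess(sample):
    """Stand-in for the preprocessing module: normalize CT, encode factors."""
    flat = [v for row in sample["ct"] for v in row]
    lo, hi = min(flat), max(flat)
    norm = [[(v - lo) / (hi - lo) for v in row] for row in sample["ct"]]
    f = sample["factors"]
    encoded = [f["age"] / 100.0,
               1.0 if f["sex"] == "M" else 0.0,
               1.0 if f["smoker"] else 0.0]
    return {"ct": norm, "factors": encoded}

def predict_fvc(sample, weeks=(4, 8, 12)):
    """Stand-in for the trained lung function progression prediction model."""
    base, rate = 2700.0, -10.0  # hypothetical baseline FVC (ml) and weekly decline
    return {w: base + rate * w for w in weeks}

prediction = predict_fvc(preprocess(acquire_lung_data()))
```

The output maps each future week to a predicted FVC value, mirroring the "FVC prediction values in different future weeks" of the claim.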
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110988100.XA CN113571190B (en) | 2021-08-26 | 2021-08-26 | Device and method for predicting lung function decline |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113571190A true CN113571190A (en) | 2021-10-29 |
CN113571190B CN113571190B (en) | 2023-09-19 |
Family
ID=78172782
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110988100.XA Active CN113571190B (en) | 2021-08-26 | 2021-08-26 | Device and method for predicting lung function decline |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113571190B (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116779170A (en) * | 2023-08-24 | 2023-09-19 | 济南市人民医院 | Pulmonary function attenuation prediction system and device based on self-adaptive deep learning |
CN117454235A (en) * | 2023-02-20 | 2024-01-26 | 宁夏隆基宁光仪表股份有限公司 | Multi-input distributed photovoltaic arc fault diagnosis method, system and device |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112447304A (en) * | 2020-11-25 | 2021-03-05 | 深圳市华嘉生物智能科技有限公司 | Visual inspection method and device for judging development of infectious diseases |
CN112530578A (en) * | 2020-12-02 | 2021-03-19 | 中国科学院大学宁波华美医院 | Viral pneumonia intelligent diagnosis system based on multi-mode information fusion |
CN112786189A (en) * | 2021-01-05 | 2021-05-11 | 重庆邮电大学 | Intelligent diagnosis system for new coronary pneumonia based on deep learning |
Similar Documents
Publication | Title
---|---
CN113571190B (en) | Device and method for predicting lung function decline
CN107480702B | Feature selection and feature fusion method for HCC pathological image recognition
CN109389171B | Medical image classification method based on multi-granularity convolution noise reduction automatic encoder technology
CN109994201B | Diabetes and hypertension probability calculation system based on deep learning
CN112085157B | Disease prediction method and device based on neural network and tree model
Huang et al. | End-to-end continuous emotion recognition from video using 3D ConvLSTM networks
Phankokkruad | COVID-19 pneumonia detection in chest X-ray images using transfer learning of convolutional neural networks
CN113012811A | Traditional Chinese medicine syndrome diagnosis and health evaluation method combining deep convolutional network and graph neural network
CN112489769A | Intelligent traditional Chinese medicine diagnosis and medicine recommendation system for chronic diseases based on deep neural network
CN115937604A | anti-NMDAR encephalitis prognosis classification method based on multi-modal feature fusion
CN116013449B | Auxiliary prediction method for cardiomyopathy prognosis by fusing clinical information and magnetic resonance image
CN113610118B | Glaucoma diagnosis method, device, equipment and method based on multitasking course learning
Do et al. | Affective expression analysis in-the-wild using multi-task temporal statistical deep learning model
CN116680105A | Time sequence abnormality detection method based on neighborhood information fusion attention mechanism
Venu | An ensemble-based approach by fine-tuning the deep transfer learning models to classify pneumonia from chest X-ray images
CN115290326A | Rolling bearing fault intelligent diagnosis method
Liu et al. | Audio and video bimodal emotion recognition in social networks based on improved alexnet network and attention mechanism
Nafea et al. | A Deep Learning Algorithm for Lung Cancer Detection Using EfficientNet-B3
CN116778158B | Multi-tissue composition image segmentation method and system based on improved U-shaped network
CN111582287B | Image description method based on sufficient visual information and text information
Haddada et al. | Comparative study of deep learning architectures for early alzheimer detection
Dadgar et al. | A hybrid method of feature selection and neural network with genetic algorithm to predict diabetes
CN115618751A | Steel plate mechanical property prediction method
CN115547502A | Hemodialysis patient risk prediction device based on time sequence data
CN115170885A | Brain tumor classification detection method and system based on feature pyramid network structure and channel attention mechanism
Legal Events
Code | Title
---|---
PB01 | Publication
SE01 | Entry into force of request for substantive examination
GR01 | Patent grant