CN114529516A - Pulmonary nodule detection and classification method based on multi-attention and multi-task feature fusion - Google Patents
Pulmonary nodule detection and classification method based on multi-attention and multi-task feature fusion
- Publication number: CN114529516A (application CN202210050130.0A)
- Authority: CN (China)
- Prior art keywords: detection, attention, network, task, lung
- Prior art date: 2022-01-17
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G06T7/0012: Biomedical image inspection
- G06F18/24: Classification techniques
- G06F18/253: Fusion techniques of extracted features
- G06N3/045: Combinations of networks
- G06T7/11: Region-based segmentation
- G16H30/20: ICT for handling medical images, e.g. DICOM, HL7 or PACS
- G16H50/30: ICT for calculating health indices; for individual health risk assessment
- G06T2207/10081: Computed x-ray tomography [CT]
- G06T2207/20132: Image cropping
- G06T2207/30064: Lung nodule
Abstract
The invention discloses a pulmonary nodule detection and classification method based on multi-attention and multi-task feature fusion, comprising the following steps: preprocessing the original CT images; introducing spatial attention and channel attention, which fuse spatial semantic features and channel features, into the feature extraction network, and adding an SPCS module with self-attention at the tail to build a dual-path residual network that extracts features at multiple scales, fusing these features with a feature pyramid network; constructing detection and segmentation task branches, where the segmentation branch outputs multi-scale masks and aggregates their results, and the detection branch re-fuses features through a path aggregation network and outputs detection results; and finally unifying the results of the two task branches as the overall detection output, which achieves higher precision and sensitivity. According to the detection results, patches are cropped around the nodule positions, and a benign-malignant and cancer-risk-grade classification network completes the benign-malignant discrimination of lung nodules and the prediction of the cancer risk grade, thereby assisting doctors in disease diagnosis.
Description
Technical Field
The invention belongs to the technical field of deep learning, computer vision and medical image processing, and particularly relates to a pulmonary nodule detection and classification method based on multi-attention and multi-task feature fusion.
Background
It is well known that aging is one cause of human death; another is cancer. Under the influence of smoking, air pollution, occupational exposure and other factors, the number of lung cancer patients is increasing year by year. Among all cancer types worldwide, lung cancer ranks first in both incidence and mortality and is one of the most common and most dangerous cancers today. Because the disease is often diagnosed at an advanced stage, its overall prognosis is very poor. An early manifestation of lung cancer is the lung nodule, which requires further examination and close observation once found in a patient. Early detection and diagnosis of lung nodules is therefore crucial to improving patient survival, and the use of computers to assist radiologists in detecting and identifying lung nodules, namely Computer-Aided Diagnosis (CAD), has been studied extensively.
Currently, many studies only detect or segment lung nodules in CT images, and few go on to predict the benignity or malignancy of the detected or segmented results. In addition, because the CT scan of a single case often contains 200 to 400 tomographic images, a CAD system built on conventional methods is prone to missing or misdiagnosing lesions such as lung nodules, which are small and not easily distinguished by the naked eye; this imposes an additional economic burden on the patient and reduces the chance of a cure. Therefore, a model built on convolutional neural networks can be used to detect and diagnose the patient's CT scans, effectively supporting the pathologist's work while achieving a good detection rate.
In recent years, lung nodule detection based on deep learning has generally used two-stage algorithms: first, candidate nodule regions are generated; second, a false-positive reduction stage is added to raise detection sensitivity. Such detection algorithms are complex to design, have poor real-time performance, and struggle to meet the changing demands of clinical practice. Meanwhile, existing lung nodule detection work only classifies nodules as positive or non-positive, and it is a mistake to regard a "positive nodule" as malignant and a "non-positive nodule" as benign. In fact, these "positive nodules" include all nodules that a radiologist considers potentially malignant, with the degree of malignancy divided into five grades: grade one, highly likely benign; grade two, moderately likely benign; grade three, indeterminate; grade four, moderately likely malignant; grade five, highly likely malignant. From these grade descriptions it is clear that some lung nodules with a high probability of being benign are still included among the "positive nodules". In other words, existing CAD systems are merely aids for nodule detection and do not identify the malignancy of lung nodules; the final diagnosis still depends on the physician's judgment. It is therefore necessary to design a mature and convenient intelligent diagnosis method for pulmonary nodules.
Application publication No. CN113902676A discloses a lung nodule image detection method and system based on an attention-mechanism neural network, comprising: acquiring a lung nodule image; and obtaining a classification result for the lung nodule from the acquired image and a preset neural network detection model, where the detection model uses an attention mechanism and a 3D residual neural network. Its advantage is that multiple trainable attention modules are fused into the network to effectively extract high-level abstract features of the data, which are used directly for recognition, classification and detection with a high degree of automation; real nodules can be effectively distinguished from non-nodules, giving good results in raising the detection rate and reducing the false-positive rate. By contrast, the present invention, while guaranteeing high accuracy, builds multiple task branches according to multiple task forms so that nodules of different characteristic morphologies can be found; channel and spatial attention mechanisms are added to focus fully on the position and contour information of lung nodules, and an SPCS module is added behind the backbone network to complete information fusion across multiple scales, which aids the aggregation of high-level semantic information and the subsequent classification of the nodules. High detection sensitivity is thereby achieved together with high accuracy.
CN112364909A discloses a method and device for classifying ink-wash paintings based on an attention mechanism and multi-scale fusion. The classification method comprises: acquiring the ink-wash paintings to be classified; inputting them into a pre-trained deep convolutional neural network model and extracting low-level, mid-level and high-level features according to preset rules; feeding the low-level and mid-level features into a pre-trained spatial attention module to extract spatially resolved feature information; feeding the high-level features into a pre-trained multi-scale feature module to prevent over-fitting; fusing the processed low-, mid- and high-level features in a pre-trained conditional random field model; passing the fused features through a pre-trained channel attention module to weight each feature channel; and finally inputting the processed features into a pre-trained classifier, thereby classifying ink-wash paintings with high accuracy. That ink-wash classification pipeline has no preceding detection stage, whereas the classification in this patent operates on detected lung nodules. Unlike ink-wash paintings, lung nodules appear at different sizes in a CT image; when the target image is cropped, the crop is made not only according to the nodule size but also in a way that preserves the size information of the nodule, which provides a useful reference for subsequent classification and allows adaptive cropping. To cope with the challenge posed by differing nodule diameters, convolution kernels of different sizes are used in the first layer of the classification network, and the resulting nodule features are aggregated for further feature extraction, providing an important feature basis for subsequent classification.
Disclosure of Invention
The present invention is directed to solving the above problems of the prior art. A pulmonary nodule detection and classification method based on multi-attention and multi-task feature fusion is provided, which detects pulmonary nodules with high sensitivity and high precision and, through a classification network, evaluates whether a nodule is benign or malignant and assigns a cancer risk grade. The technical scheme of the invention is as follows:
A pulmonary nodule detection and classification method based on multi-attention and multi-task feature fusion comprises the following steps:
the original CT image retains HU values in the range [-1000, +400] and truncates the rest, and a linear transformation maps the retained values to [0, 255], producing a 512 × 512 image that can also be inspected visually;
after image normalization, the image is input into a lung nodule detection network that has spatial attention, channel attention and self-attention mechanisms and two task branches, to obtain a detection result;
and according to the detection result, patches are cropped around the nodule positions and input into a classification network with attention mechanisms and two task branches, to obtain predictions of whether each lung nodule is benign or malignant and of its cancer risk grade.
Further, the pulmonary nodule detection network comprises a feature extraction network, feature fusion, and two task branches. The image is passed through the feature extraction network to obtain feature maps at six scales: 512 × 512, 256 × 256, 128 × 128, 64 × 64, 32 × 32 and 16 × 16;
The features at the 16 × 16, 32 × 32 and 64 × 64 scales are fused first, from deep to shallow, and the network then splits into two task branches, namely a detection task branch and a segmentation task branch;
Further, the detection branch takes the fused feature maps at the 16 × 16, 32 × 32 and 64 × 64 scales, fuses them again from shallow to deep, and finally outputs the nodule detection results on these three scales;
Further, starting from the fused 64 × 64 features, the segmentation branch continues to fuse the 128 × 128, 256 × 256 and 512 × 512 features, outputs semantic segmentation maps at three different scales through up-sampling, convolution and sigmoid activation, and combines the three to output the final semantic segmentation map. Through result conversion, this segmentation map is unified with the output of the detection branch, and the final lung nodule detection result is output.
Furthermore, the feature extraction network has 40 layers in total, comprising 17 convolutional layers and 23 residual blocks, where each residual block contains channel attention with a 1 × 1 convolution kernel and spatial attention with a 7 × 7 convolution kernel; the first convolution group contains 2 convolutional layers so that the channels can be expanded rapidly, and the other convolution groups contain 3 convolutional layers;
Furthermore, the feature extraction network uses a residual network as its backbone; a spatial attention mechanism and a channel attention mechanism are added inside each residual block, and an extra path containing only a 1 × 1 convolution is added within the block to form a dual path, which helps the network focus on localization information and keeps information flowing smoothly;
further, an SPCS module is added at the tail part of the backbone network, and information and focusing important semantic information are further extracted from the whole situation and the multiple scales. When the normalized image is input, learnable normalization is carried out through a layer of BactchNorm, and then characteristic maps of six scales of 512 × 512, 256 × 256, 128 × 128, 64 × 64, 32 × 32 and 16 × 16 are output through a backbone network and an SPCS module;
further, the SPCS module is a spatial pyramid convolution with a self-attention mechanism, takes a deepest feature map extracted by a backbone network as input, is divided into four paths after one layer of convolution, one path is connected as a shortcut, and the other three paths are respectively subjected to convolution and self-attention by convolution kernels of 5 × 5, 9 × 9 and 13 × 13, and are fused and output by one layer of convolution;
further, the detection branch outputs three prediction results on the feature maps of the three scales of 16 × 16, 32 × 32 and 64 × 64 through two-layer convolution. The smallest lung nodule also dominates eight pixels wide, so it is appropriate to choose these three dimensions for prediction;
further, the network output has 4 values, target confidence, coordinate x, coordinate y, and nodule diameter, respectively, where results with target confidence greater than 0.85 will remain and results with confidence less will be suppressed;
furthermore, the segmentation branch obtains a semantic segmentation image of the image, and the semantic segmentation image is subjected to binarization processing and morphological processing through threshold filtering according to the following formula:
wherein M denotes a binary segmentation map resulting from the segmentation of the branch, which indicates a morphological erosion by the convolution kernel K,representing the morphological dilation by the convolution kernel K. Firstly, etching is carried out to remove the noise interference of the sporadic pixels; then the lung nodules expand greatly and corrode finally, and the slightly discontinuous areas are connected into one area because some lung nodules have the characteristic that the outline of a frosted glass-shaped boundary is not very clear, the size (diameter) of the lung nodules is kept unchanged, and the targets with the small size are removed. And finally, in the processed segmentation map, searching a minimum circumscribed rectangular frame of each connection region to obtain the position of each rectangular frame, namely the position of the lung nodule detected in the segmentation map, wherein the maximum side length of the rectangular frame is the diameter of the nodule. The reliability value of the result is calculated according to the following formula:
m=M(box)
p=mean(mean(m),max(m))
where M is the original segmentation map output by the network — each pixel value on the map is a probability value of whether it belongs to a lung nodule. M (box) indicates the target frame obtained by the conversion, and the segmentation binary mask is used to ask for the value of the region in M, resulting in M. Finally, the confidence of the target object is obtained. The semantic segmentation map is also converted into a node target output represented by four values of target confidence, coordinate x, coordinate y and node diameter as the same as the detection branch.
Furthermore, the results of the detection branch and the segmentation branch are unified by taking their union, which is output as the final detection result of the pulmonary nodule detection network.
The lung nodule classification network has the same spatial attention, channel attention and self-attention mechanisms. Because lung nodules appear at different sizes in the image and the cropping preserves their size characteristics, and because lung nodules have complex morphological characteristics, the start of the network performs convolutions with kernels of size 7 × 7, 14 × 14, 32 × 32 and 64 × 64, focuses on key features through spatial attention and channel attention, and then fuses the four-scale features by convolution. High-level semantic information is extracted through a series of residual blocks and self-attention blocks, where the residual block is the same as that used in the pulmonary nodule detection network. Semantic features of different depths are fused, and a fully connected neural network outputs the predictions of the two branches: whether the lung nodule is benign or malignant, and its cancer risk grade.
The detection results of the detection model are cropped to 64 × 64 patches and used as the input for benign-malignant prediction;
The cropping mode is determined by the nodule diameter in the result produced by the lung nodule detection network: for nodules whose diameter is smaller than 57 pixels, a 57 × 57 square region centered on the detected center point is cropped and then resized to 64 × 64;
Further, for nodules whose diameter exceeds 57 pixels, a square region with side length equal to the true diameter is cropped and then resized to 64 × 64. The patches are then fed into the subsequent pulmonary nodule classification network. Using these patches as the training set, benign-malignant classifier models are trained on the convolutional neural network structure, several models are compared on the verification set, and the best two-dimensional deep convolutional neural network model is selected;
further, the pulmonary nodule classification network also has spatial attention, channel attention and self-attention mechanisms. Since lung nodules have different sizes in the picture, the cropping will also retain the size characteristics, and meanwhile, due to the complex morphological characteristics of lung nodules, at the beginning of the network, convolution operations will be performed by convolution kernels with sizes of 7 × 7, 14 × 14, 32 × 32 and 64 × 64, and key features are focused through a spatial attention mechanism and a channel attention mechanism, and then four-scale features are fused through convolution;
further, high-level semantic information is extracted through a series of residual blocks and self-attention blocks. Wherein the residual block is the same as the residual block used by the pulmonary nodule detection network.
Furthermore, semantic features of different depths are fused, the prediction results of the two branches are output by the fully-connected neural network, and the five classes of benign, malignant and cancer risk levels of the pulmonary nodules are classified respectively.
Finally, considering that the morphological characteristics of the nodule are complex, a single characteristic set cannot fully describe the information of the lung nodule, and the malignancy degree of the lung nodule needs to be classified by fusing edge characteristics and texture characteristics; extracting four scale features of 7 multiplied by 7, 14 multiplied by 14, 32 multiplied by 32 and 64 multiplied by 64 from the classification network, and learning high-level semantic information through a plurality of staggered blocks; the advanced feature learning stage is composed of a series of residual blocks, the features of the last average pooling layer and the features of the last residual block are fused, and finally, the full-connection layer outputs good and malignant results.
The invention has the following advantages and beneficial effects:
the method is used for detecting and classifying the lung nodules based on a multi-attention mechanism, a multi-task form and multi-scale feature fusion, and the lung nodules are used at different positions of a network according to different characteristics of the lung nodules through various attentions. Wherein, the channel attention and the space attention introduce small calculation amount and use shallow layer to focus key features; self-attention is used in the SPCS module at the deepest level because its computation is exponential to size and has a higher global feature focusing capability. The use of the multiple attention mechanisms solves the problems of difficult detection and low detection precision of small targets such as lung nodules. And two task branches are constructed, the lung nodule characteristics are inspected differently through two task forms, the lung nodules which cannot be detected in the respective task forms can be detected, the output results of the two task branches are comprehensively unified in an inference stage through joint training, the detection rate of the lung nodules is further improved, and the method is realized on the premise of high accuracy. Meanwhile, the benign and malignant classification task of the lung nodules has correlation with the five classification tasks of the cancer risk level, the improvement of the common task can be promoted, and therefore the classification prediction result of the model also has high accuracy. In addition, the detection result of the lung nodule provides more valuable reference for the subsequent classification of the lung nodule.
Drawings
FIG. 1 is a flow chart of a preferred embodiment of the present invention for intelligently diagnosing pulmonary nodules;
FIG. 2 is a block diagram of a pulmonary nodule detection network of the present invention;
FIG. 3 is a block diagram of a pulmonary nodule classification network of the present invention;
Detailed Description
The technical solutions in the embodiments of the present invention will be described in detail and clearly with reference to the accompanying drawings. The described embodiments are only some of the embodiments of the present invention.
The technical scheme for solving the technical problems is as follows:
This patent provides a fully automatic, intelligent pulmonary nodule detection and classification method. The invention provides a multi-cascade parallel network that completes nodule detection and classification based on a multi-attention mechanism, multi-scale feature fusion and a multi-task form. It makes good use of deep learning's ability to learn lung nodule features autonomously and, for lung nodules of different diameters, fuses nodule features at different scales, thereby obtaining high detection sensitivity and classification accuracy. As shown in Fig. 1, the method mainly comprises two parts: nodule detection and benign-malignant prediction.
As shown in Fig. 2, the pulmonary nodule detection network is based on the multi-attention mechanism, the multi-task form and multi-scale feature fusion. The original CT image is normalized and then input to the detection network. It first passes through a BN layer for learnable normalization of its parameters and then through six blocks: Block0, Block1, Block2, Block3, Block4 and Block5. Block0 consists of two 3 × 3 convolutional layers whose main purpose is to rapidly expand the channels and enrich the feature representation. Blocks 1, 2, 3, 4 and 5 each contain a 3 × 3 convolution with stride 2 for down-sampling, after which the computation splits into two paths: one is a stack of several residual blocks and the other is a single 1 × 1 convolution; the results of the two paths are concatenated and fused by a 3 × 3 convolution as the output of the block. Each residual block consists of two 3 × 3 convolutions with a spatial attention mechanism and a channel attention mechanism in between; its detailed structure is shown in the residual-block part of the legend of Fig. 1. After these six blocks, six feature maps are output at the scales 512 × 512, 256 × 256, 128 × 128, 64 × 64, 32 × 32 and 16 × 16, and the 16 × 16 feature map is additionally processed by the SPCS module before being output. The BN layer, the six block modules and the SPCS module together form the backbone network. Detailed parameters are shown in Table 1.
Table 1. Structure and parameters of the feature extraction network
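For illustration only, and not as part of the original disclosure, the following PyTorch sketch shows one plausible form of the residual block and dual-path structure described above. The channel counts, the CBAM-style formulation of the attention modules and the exact placement of the extra 1 × 1 path are assumptions; the patent's own layer parameters are those of Table 1.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Channel attention built from 1x1 convolutions (squeeze-and-excitation style)."""
    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        return x * self.fc(self.pool(x))

class SpatialAttention(nn.Module):
    """Spatial attention using a 7x7 convolution over channel-wise mean/max maps."""
    def __init__(self, kernel_size: int = 7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):
        avg_map = x.mean(dim=1, keepdim=True)
        max_map, _ = x.max(dim=1, keepdim=True)
        attn = torch.sigmoid(self.conv(torch.cat([avg_map, max_map], dim=1)))
        return x * attn

class DualPathResidualBlock(nn.Module):
    """Two 3x3 convolutions with channel/spatial attention in between,
    plus a parallel path containing only a 1x1 convolution."""
    def __init__(self, channels: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            ChannelAttention(channels),
            SpatialAttention(),
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
        )
        self.skip = nn.Conv2d(channels, channels, kernel_size=1)  # the extra 1x1 path
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(self.body(x) + self.skip(x))

# Example forward pass on a dummy 64-channel feature map
feats = torch.randn(1, 64, 64, 64)
print(DualPathResidualBlock(64)(feats).shape)  # torch.Size([1, 64, 64, 64])
```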
As shown in the SPCS-module area of Fig. 2, the SPCS module first passes its input through a 3 × 3 convolutional layer and then splits it into four paths: one path is a shortcut connection, while the other three pass through convolutions with 5 × 5, 9 × 9 and 13 × 13 kernels followed by self-attention; finally, the features of the four paths are merged and output through one convolutional layer.
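A minimal PyTorch sketch of such a spatial-pyramid convolution with self-attention is given below for illustration; the channel count and the use of nn.MultiheadAttention to realize the self-attention step are assumptions, not details taken from the disclosure.

```python
import torch
import torch.nn as nn

class SelfAttention2d(nn.Module):
    """Self-attention over the spatial positions of a 2D feature map."""
    def __init__(self, channels: int, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)

    def forward(self, x):
        b, c, h, w = x.shape
        seq = x.flatten(2).transpose(1, 2)            # (B, H*W, C)
        out, _ = self.attn(seq, seq, seq)
        return out.transpose(1, 2).reshape(b, c, h, w)

class SPCS(nn.Module):
    """Spatial pyramid convolution with self-attention: one shortcut path plus
    5x5 / 9x9 / 13x13 convolution-and-attention paths, fused by a final conv."""
    def __init__(self, channels: int):
        super().__init__()
        self.entry = nn.Conv2d(channels, channels, 3, padding=1)
        self.paths = nn.ModuleList([
            nn.Sequential(nn.Conv2d(channels, channels, k, padding=k // 2),
                          SelfAttention2d(channels))
            for k in (5, 9, 13)
        ])
        self.fuse = nn.Conv2d(channels * 4, channels, kernel_size=1)

    def forward(self, x):
        x = self.entry(x)
        branches = [x] + [p(x) for p in self.paths]   # shortcut + 3 pyramid paths
        return self.fuse(torch.cat(branches, dim=1))

# Example: deepest 16 x 16 feature map with 256 channels (channel count assumed)
print(SPCS(256)(torch.randn(1, 256, 16, 16)).shape)  # torch.Size([1, 256, 16, 16])
```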
Here, the BN layer plus the six block modules and the SPCS module form the backbone network. Spatial attention and channel attention are used in every residual block to help the network aggregate relevant information and suppress irrelevant information, improving detection accuracy and sensitivity for small lung nodules; the self-attention mechanism is used only in the SPCS module, where, deep in the network, it aggregates globally useful features more comprehensively and strengthens semantic understanding, which benefits classification and thus detection accuracy. The backbone processes the image into features at six scales: 512 × 512, 256 × 256, 128 × 128, 64 × 64, 32 × 32 and 16 × 16. Feature fusion then proceeds from deep to shallow in FPN fashion, starting at the 16 × 16 scale and continuing through 32 × 32 to 64 × 64. Once the fused 64 × 64 feature is obtained, the network splits into two task branches: a segmentation task branch and a detection task branch.
The detection task branch uses the three fused feature maps at 64 × 64, 32 × 32 and 16 × 16 and fuses them again from shallow to deep, from the 64 × 64 and 32 × 32 scales down to 16 × 16, in PAN fashion, outputting the lung nodule predictions. Even the smallest lung nodule occupies about 8 pixels in width, so these scales are suitable. This process is illustrated in Fig. 2. A lung nodule target is represented by its target confidence, x-axis coordinate, y-axis coordinate and diameter. Results with a target confidence greater than 0.85 are kept, and results below this threshold are suppressed.
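As a small illustrative sketch (the row layout [confidence, x, y, diameter] follows the representation just described; the function name and NumPy usage are assumptions), the 0.85 confidence threshold could be applied as follows:

```python
import numpy as np

def filter_detections(raw: np.ndarray, conf_thresh: float = 0.85) -> np.ndarray:
    """Keep rows [confidence, x, y, diameter] whose confidence exceeds the threshold."""
    return raw[raw[:, 0] > conf_thresh]

raw_outputs = np.array([[0.92, 120.0, 88.0, 9.5],
                        [0.40, 300.0, 240.0, 6.0]])   # the second row is suppressed
print(filter_detections(raw_outputs))
```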
The segmentation branch continues feature fusion from deep to shallow and decodes the features into a semantic segmentation map of the image. After the fused 64 × 64 feature is obtained, it is up-sampled, concatenated with the 128 × 128 feature and fused by convolution, as shown in Fig. 2; this process continues up to the 512 × 512 scale. Meanwhile, the fused 128 × 128, 256 × 256 and 512 × 512 features are each convolved and up-sampled to the size of the input image to obtain semantic segmentation maps at different scales, forming a progression from coarse to fine segmentation; their results are fused by a convolution operation and output as the final segmentation map. The segmentation result is then unified with the output of the detection branch through a result conversion.
The segmentation branch outputs the semantic segmentation map of the image, which is filtered by a threshold; regions likely to be lung nodules appear as the white highlighted regions in Fig. 2. The map is then binarized and morphologically processed according to the following formula:
M′ = (((M ⊖ K) ⊕ K) ⊕ K) ⊖ K
where M denotes the binary segmentation map produced by the segmentation branch, ⊖ denotes morphological erosion with the structuring element K, and ⊕ denotes morphological dilation with K. Erosion is applied first to remove the noise of scattered isolated pixels; the nodule regions are then dilated heavily and finally eroded, so that slightly disconnected areas are joined into one region (some lung nodules have ground-glass boundaries whose contours are not very clear), the nodule size (diameter) is kept unchanged, and overly small targets are removed. Finally, in the processed segmentation map, the minimum enclosing rectangle of each connected region is found; the position of each rectangle is the position of a lung nodule detected in the segmentation map, and the longest side of the rectangle is taken as the nodule diameter. The confidence value of each result is calculated as follows:
m = M(box)
p = mean(mean(m), max(m))
where M is the original segmentation map output by the network, in which each pixel value is the probability that the pixel belongs to a lung nodule; M(box) denotes taking the values of M inside the converted target box (using the segmentation binary mask), giving m; and p is the resulting target confidence. The semantic segmentation map is thereby converted into the same nodule target representation used by the detection branch: target confidence, coordinate x, coordinate y and nodule diameter.
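The conversion from segmentation map to nodule targets can be illustrated with the following OpenCV/NumPy sketch. The kernel size, the number of dilation iterations and the probability threshold are assumptions; the erosion-dilation-erosion order and the confidence formula p = mean(mean(m), max(m)) follow the text above.

```python
import cv2
import numpy as np

def segmentation_to_detections(prob_map: np.ndarray,
                               thresh: float = 0.5,
                               kernel_size: int = 3) -> np.ndarray:
    """Convert a segmentation probability map into [confidence, x, y, diameter] rows.

    prob_map: H x W array of per-pixel lung-nodule probabilities in [0, 1].
    """
    kernel = np.ones((kernel_size, kernel_size), np.uint8)
    mask = (prob_map > thresh).astype(np.uint8)

    # erosion -> (heavier) dilation -> erosion, as described in the text
    mask = cv2.erode(mask, kernel, iterations=1)    # remove isolated noise pixels
    mask = cv2.dilate(mask, kernel, iterations=2)   # bridge slightly broken regions
    mask = cv2.erode(mask, kernel, iterations=1)    # restore the approximate size

    detections = []
    num, labels, stats, _ = cv2.connectedComponentsWithStats(mask, connectivity=8)
    for i in range(1, num):                          # label 0 is the background
        x, y, w, h, area = stats[i]
        region = prob_map[y:y + h, x:x + w]          # values of M inside the box
        conf = (region.mean() + region.max()) / 2.0  # p = mean(mean(m), max(m))
        detections.append([conf, x + w / 2.0, y + h / 2.0, float(max(w, h))])
    return np.array(detections)

# Example on a synthetic probability map containing one bright blob
pm = np.zeros((512, 512), np.float32)
pm[100:110, 200:210] = 0.9
print(segmentation_to_detections(pm))
```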
Finally, the results of the detection branch and the segmentation branch are unified: their union is taken and output as the final detection result of the pulmonary nodule detection network. The two branch results complement each other, so detection sensitivity is improved while high precision is maintained. As shown in Fig. 2, the current input contains three nodules in total. The detection branch finds two nodules and the segmentation branch also finds two, but the branches do not find the same pair: one nodule is found by both, and each branch additionally finds a different nodule the other misses. Integrating the two sets of results gives a final output of three detected nodules. Because the two task forms examine lung nodule features differently, each can supply nodules the other cannot detect, so the network improves detection sensitivity while preserving the accuracy of lung nodule detection.
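For illustration, taking the union of the two branches' results might look like the sketch below; the duplicate test by center distance and its 8-pixel threshold are assumptions, since the text only states that the union of the two result sets is taken.

```python
import numpy as np

def unify_results(det_boxes: np.ndarray, seg_boxes: np.ndarray,
                  dist_thresh: float = 8.0) -> np.ndarray:
    """Union of detection-branch and segmentation-branch results.

    Each row is [confidence, x, y, diameter]; a segmentation-branch box whose
    centre lies within dist_thresh pixels of an existing detection is a duplicate.
    """
    merged = list(det_boxes)
    for s in seg_boxes:
        dup = any(np.hypot(s[1] - d[1], s[2] - d[2]) < dist_thresh for d in merged)
        if not dup:
            merged.append(s)
    return np.array(merged)

det = np.array([[0.95, 120, 88, 9.5], [0.90, 310, 250, 6.0]])
seg = np.array([[0.88, 121, 89, 9.0], [0.86, 40, 400, 7.5]])
print(unify_results(det, seg))   # three distinct nodules survive, as in Fig. 2
```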
Early determination of whether a lung nodule is benign or malignant is of great significance for clinical diagnosis. For malignant nodules, a malignancy risk assessment is commonly used to help physicians determine the cancer stage and plan the subsequent prognosis. However, because nodules differ in size, shape and location, classifying them in computer-aided diagnosis systems has long been a significant challenge. Based on the lung nodule detection results, this method completes the binary benign-malignant classification and the five-class cancer-risk-grade classification. The pulmonary nodule classification model proposed in this patent is shown in Fig. 3 and mainly comprises three parts: multi-scale feature extraction, high-level feature learning and feature fusion.
First, the nodule detection results are obtained from the pulmonary nodule detection network. According to these results, the lung nodule image at the corresponding position is cropped from the original image and sent to the next-stage pulmonary nodule classification network.
The cropping mode is determined by the nodule diameter: for nodules whose diameter is smaller than 57 pixels, a 57 × 57 square region centered on the detected center point is cropped and resized to 64 × 64; for nodules whose diameter exceeds 57 pixels, a square region with side length equal to the true diameter is cropped and resized to 64 × 64. The patch is then input to the subsequent pulmonary nodule classification network.
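An illustrative sketch of this cropping rule is given below, assuming OpenCV for resizing; the function and parameter names are made up here and are not part of the disclosure.

```python
import cv2
import numpy as np

def crop_nodule(image: np.ndarray, cx: float, cy: float, diameter: float,
                min_crop: int = 57, out_size: int = 64) -> np.ndarray:
    """Crop a square patch around a detected nodule and resize it to 64 x 64.

    Nodules smaller than 57 pixels use a fixed 57 x 57 window; larger nodules
    use a window equal to their diameter, so relative size information is kept.
    """
    side = int(max(min_crop, np.ceil(diameter)))
    half = side // 2
    h, w = image.shape[:2]
    x0 = int(np.clip(cx - half, 0, w - side))
    y0 = int(np.clip(cy - half, 0, h - side))
    patch = image[y0:y0 + side, x0:x0 + side]
    return cv2.resize(patch, (out_size, out_size), interpolation=cv2.INTER_LINEAR)

# Example: crop a small nodule from a preprocessed 512 x 512 slice
ct_slice = np.random.randint(0, 256, (512, 512), np.uint8)
print(crop_nodule(ct_slice, cx=120, cy=88, diameter=9.5).shape)  # (64, 64)
```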
The lung nodule classification network based on multi-attention and multi-task feature fusion likewise contains spatial attention, channel attention and self-attention, and has two task branches: one predicts whether the lung nodule is benign or malignant, and the other predicts its cancer risk grade. The two task branches are related but distinct. The cancer risk grades are five levels, from grade one to grade five, corresponding to a probability of cancer from low to high as assessed by radiologists. Grades one and two are considered benign, grades four and five malignant, and grade three is indeterminate. Treating the problem from these two angles helps improve the accuracy of both tasks. The detailed parameters of the network structure are shown in Table 2:
table 2 structure and parameters of feature extraction network
Because lung nodules appear at different sizes in the image and cropping preserves their size characteristics, and because lung nodules have complex morphological characteristics, the start of the network performs convolutions with kernels of size 7 × 7, 14 × 14, 32 × 32 and 64 × 64, focuses on key features through spatial and channel attention, and then fuses the four-scale features by convolution. High-level semantic information is extracted through a series of residual blocks and self-attention blocks, where the residual block is the same as that used in the pulmonary nodule detection network. Semantic features of different depths are fused, and the fully connected neural network outputs the predictions of the two branches: whether the lung nodule is benign or malignant, and its cancer risk grade. This provides additional reference information for the detected lung nodules and allows an intelligent pulmonary nodule diagnosis system to be built.
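To make this structure concrete, the following PyTorch sketch shows a multi-scale first layer with 7 × 7, 14 × 14, 32 × 32 and 64 × 64 kernels and two output heads (binary benign-malignant and five-class risk grade). The channel widths, the very small trunk and the cropping of even-kernel outputs are simplifying assumptions; the attention modules and residual blocks of the actual network (Table 2) are omitted for brevity.

```python
import torch
import torch.nn as nn

class MultiScaleStem(nn.Module):
    """Parallel convolutions with 7x7, 14x14, 32x32 and 64x64 kernels on the
    64 x 64 nodule patch, fused by a 1x1 convolution."""
    def __init__(self, in_ch: int = 1, ch: int = 16):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv2d(in_ch, ch, k, padding=k // 2) for k in (7, 14, 32, 64)
        ])
        self.fuse = nn.Conv2d(ch * 4, ch * 4, kernel_size=1)

    def forward(self, x):
        h, w = x.shape[-2:]
        outs = []
        for conv in self.branches:
            o = conv(x)
            # even kernels enlarge the output by one pixel; crop back to H x W
            outs.append(o[..., :h, :w])
        return self.fuse(torch.cat(outs, dim=1))

class NoduleClassifier(nn.Module):
    """Shared trunk with two heads: benign/malignant (2-way) and risk grade (5-way)."""
    def __init__(self):
        super().__init__()
        self.stem = MultiScaleStem()
        self.trunk = nn.Sequential(
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head_benign_malignant = nn.Linear(128, 2)
        self.head_risk_grade = nn.Linear(128, 5)

    def forward(self, x):
        f = self.trunk(self.stem(x))
        return self.head_benign_malignant(f), self.head_risk_grade(f)

# Example forward pass on a dummy 64 x 64 nodule patch
patch = torch.randn(1, 1, 64, 64)
bm, grade = NoduleClassifier()(patch)
print(bm.shape, grade.shape)  # torch.Size([1, 2]) torch.Size([1, 5])
```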
The flow of the intelligent pulmonary nodule diagnosis provided by the invention is shown in Fig. 1. The CT image retains HU values in [-1000, +400] and discards the rest, and a linear transformation converts it into a 512 × 512 image in the range [0, 255]; this allows the image to be inspected visually by a doctor and, after normalization, to be conveniently input to the pulmonary nodule detection network. After the CT image is preprocessed, lung nodule detection is completed by the detection network based on the multi-attention mechanism and multi-task form, giving the detection results; the corresponding lung nodule regions are then cropped using the appropriate cropping strategy to obtain an image of each detected nodule; finally, the benign-malignant and cancer-risk-grade prediction network outputs a prediction for each lung nodule, and the corresponding results are displayed and stored for further use.
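For illustration only, a minimal NumPy sketch of the HU clipping and linear rescaling described above might look as follows; the function name and the assumption that each slice is a 512 × 512 array of Hounsfield units are made here for the example.

```python
import numpy as np

def preprocess_ct_slice(hu_slice: np.ndarray,
                        hu_min: float = -1000.0,
                        hu_max: float = 400.0) -> np.ndarray:
    """Clip a CT slice to the [-1000, +400] HU window and rescale it to [0, 255].

    hu_slice is assumed to be a 512 x 512 array of Hounsfield units.
    """
    clipped = np.clip(hu_slice, hu_min, hu_max)
    scaled = (clipped - hu_min) / (hu_max - hu_min) * 255.0
    return scaled.astype(np.uint8)

# Example on a fake 512 x 512 slice of random HU values
slice_img = preprocess_ct_slice(np.random.uniform(-2000, 2000, (512, 512)))
print(slice_img.shape, slice_img.min(), slice_img.max())
```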
The training process of the pulmonary nodule detection network of the present invention is as follows:
the CT image is converted into a png image, and a doctor marks the position and the mask of a corresponding pulmonary nodule, the LUNA16 data set is used, the position of the pulmonary nodule is marked by a professional doctor, and the mask is marked additionally;
and after data enhancement processing, the CT image is input into a pulmonary nodule detection network for prediction. The segmentation branch is trained by using binary cross entropy, the target confidence of the detection branch is trained by using binary cross entropy, and the prediction of two coordinates and the lung nodule diameter is trained by using a mean square error loss function.
The whole network is regularized with dropout at a probability of 0.1 and trained with the Adam optimization algorithm, using an initial learning rate of 0.001, beta1 = 0.9, beta2 = 0.999 and a batch size of 8, for a total of 1000 epochs.
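The loss terms and optimizer settings above can be sketched in PyTorch as follows; the placeholder network and the dummy tensors are illustrative only and do not reflect the actual architecture.

```python
import torch
import torch.nn as nn

# Hyper-parameters taken from the text above; the network itself is a placeholder here.
model = nn.Sequential(nn.Conv2d(1, 4, 3, padding=1), nn.Dropout(p=0.1))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, betas=(0.9, 0.999))

bce = nn.BCEWithLogitsLoss()   # segmentation mask and detection confidence
mse = nn.MSELoss()             # x, y coordinates and nodule diameter

def detection_loss(pred_conf, gt_conf, pred_box, gt_box, pred_mask, gt_mask):
    """Combined multi-task loss: BCE for confidence and mask, MSE for geometry."""
    return bce(pred_conf, gt_conf) + mse(pred_box, gt_box) + bce(pred_mask, gt_mask)

# Illustrative loss evaluation on dummy predictions (batch size 8, as in the text)
p_conf = torch.randn(8, 1, requires_grad=True)
g_conf = torch.randint(0, 2, (8, 1)).float()
p_box, g_box = torch.randn(8, 3, requires_grad=True), torch.randn(8, 3)
p_mask = torch.randn(8, 1, 64, 64, requires_grad=True)
g_mask = torch.randint(0, 2, (8, 1, 64, 64)).float()
print(detection_loss(p_conf, g_conf, p_box, g_box, p_mask, g_mask).item())
```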
The training process of the pulmonary nodule classification network of the invention is as follows:
adopting a cutting strategy which is the same as the reasoning phase to manufacture a data set for classifying network training from the CT video;
and after data enhancement, the pictures are sent to a classification network for prediction. And (3) optimizing by using a cross entropy loss function, adopting the same optimization algorithm as the detection network, and iterating for 500 epochs with an initial learning rate of 0.0001, a parameter beta 1 of 0.9 and a parameter beta 2 of 0.999 and a batch size of 64.
The detection network achieves a detection sensitivity of 0.94 at an average of 0.125 false positives per CT scan and a CPM score of 0.949; for classification, the AUC of the benign-malignant classification is 0.95, and the macro-averaged ROC score of the cancer-risk-grade prediction is 0.95.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The above examples are to be construed as merely illustrative and not limitative of the remainder of the disclosure. After reading the description of the invention, the skilled person can make various changes or modifications to the invention, and these equivalent changes and modifications also fall into the scope of the invention defined by the claims.
Claims (5)
1. A pulmonary nodule detection and classification method based on multi-attention and multi-task feature fusion is characterized by comprising the following steps:
preprocessing a lung CT scanning image to obtain CT image data for training, dividing the CT image data into a training set and a verification set, and inputting the CT image data into a lung nodule detection network to finish nodule detection;
the pulmonary nodule detection network outputs feature maps at six different scales, namely 512 × 512, 256 × 256, 128 × 128, 64 × 64, 32 × 32 and 16 × 16, through a backbone network consisting of six block modules and one SPCS module;
the detection branch then fuses the three feature maps at 64 × 64, 32 × 32 and 16 × 16 and outputs the prediction results, namely the target confidence, x-axis coordinate, y-axis coordinate and lung nodule diameter;
the segmentation task branch fuses the six feature maps, outputs semantic segmentation maps at the 512 × 512, 256 × 256 and 128 × 128 scales through convolution, up-samples them to 512 × 512, and finally fuses the three by convolution to output the final result;
the results of the segmentation branch are then converted and unified with the results of the detection branch as the overall pulmonary nodule detection output;
and the region data of the corresponding lung nodule is cropped according to the lung nodule detection result, the benign-malignant judgment and the cancer-risk-grade judgment are completed by the benign-malignant and cancer-risk-grade prediction network, and the result is finally output.
2. The method of claim 1, wherein the backbone network comprises three attention mechanisms, namely a channel attention mechanism, a spatial attention mechanism and a self-attention mechanism.
3. The method for lung nodule detection and classification based on multi-attention and multi-task feature fusion as claimed in claim 1, wherein the multiple tasks are a detection task and a segmentation task.
4. The method for lung nodule detection and classification based on multi-attention and multi-task feature fusion as claimed in claim 1, wherein the SPCS module is a spatial pyramid convolution with a self-attention mechanism, takes the deepest feature map extracted by the backbone network as an input, and after one layer of convolution, divides the input into four paths, one path is connected as a shortcut, and the other three paths respectively pass convolution with convolution kernels of 5 × 5, 9 × 9 and 13 × 13 and self-attention, and fuses the four paths of features through one layer of convolution and outputs, thereby enhancing the identification capability of lung nodule representation.
5. The method for lung nodule detection and classification based on multi-attention and multi-task feature fusion as claimed in claim 1, wherein the network for predicting benign and malignant lung nodules and risk level of cancer uses the channel attention mechanism, the spatial attention mechanism and the self-attention mechanism as well, and has two task branches, one for predicting benign and malignant lung nodules and the other for predicting risk level of cancer.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210050130.0A CN114529516B (en) | 2022-01-17 | 2022-01-17 | Lung nodule detection and classification method based on multi-attention and multi-task feature fusion |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210050130.0A CN114529516B (en) | 2022-01-17 | 2022-01-17 | Lung nodule detection and classification method based on multi-attention and multi-task feature fusion |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114529516A true CN114529516A (en) | 2022-05-24 |
CN114529516B CN114529516B (en) | 2024-09-24 |
Family
ID=81621538
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210050130.0A Active CN114529516B (en) | 2022-01-17 | 2022-01-17 | Lung nodule detection and classification method based on multi-attention and multi-task feature fusion |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114529516B (en) |
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2019200740A1 (en) * | 2018-04-20 | 2019-10-24 | 平安科技(深圳)有限公司 | Pulmonary nodule detection method and apparatus, computer device, and storage medium |
CN110310281A (en) * | 2019-07-10 | 2019-10-08 | 重庆邮电大学 | Lung neoplasm detection and dividing method in a kind of Virtual Medical based on Mask-RCNN deep learning |
Non-Patent Citations (2)
Title |
---|
JINGYA LIU et al.: "Accurate and Robust Pulmonary Nodule Detection by 3D Feature Pyramid Network with Self-supervised Feature Learning", Image and Video Processing, 25 July 2019 (2019-07-25) *
HUANG Yuting: "Research on Intelligent Aided Diagnosis of Pulmonary Nodules Based on Deep Learning", Wanfang Data, 6 July 2023 (2023-07-06) *
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115358976A (en) * | 2022-08-10 | 2022-11-18 | 北京医准智能科技有限公司 | Image identification method, device, equipment and storage medium |
CN115984161A (en) * | 2022-09-02 | 2023-04-18 | 山东财经大学 | Pulmonary nodule detection method based on 3DCNN and double-branch structure technology |
CN115984161B (en) * | 2022-09-02 | 2023-09-26 | 山东财经大学 | Pulmonary nodule detection method based on 3DCNN and double-branch structure technology |
CN115330754A (en) * | 2022-10-10 | 2022-11-11 | 楚大智能(武汉)技术研究院有限公司 | Glass bottle mouth defect detection method, device, equipment and storage medium |
CN115330754B (en) * | 2022-10-10 | 2022-12-23 | 楚大智能(武汉)技术研究院有限公司 | Glass bottle mouth defect detection method, device, equipment and storage medium |
CN116502158A (en) * | 2023-02-07 | 2023-07-28 | 北京纳通医用机器人科技有限公司 | Method, device, equipment and storage medium for identifying lung cancer stage |
CN116502158B (en) * | 2023-02-07 | 2023-10-27 | 北京纳通医用机器人科技有限公司 | Method, device, equipment and storage medium for identifying lung cancer stage |
CN116681958A (en) * | 2023-08-04 | 2023-09-01 | 首都医科大学附属北京妇产医院 | Fetal lung ultrasonic image maturity prediction method based on machine learning |
CN116681958B (en) * | 2023-08-04 | 2023-10-20 | 首都医科大学附属北京妇产医院 | Fetal lung ultrasonic image maturity prediction method based on machine learning |
Also Published As
Publication number | Publication date |
---|---|
CN114529516B (en) | 2024-09-24 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |