CN115496744A - Lung cancer image segmentation method, device, terminal and medium based on mixed attention - Google Patents


Info

Publication number: CN115496744A
Application number: CN202211267954.XA
Authority: CN (China)
Prior art keywords: attention, lung cancer, module, layer, image
Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis)
Other languages: Chinese (zh)
Inventors: 公一, 韩峻松, 苏军, 徐祎春, 周佳菁
Current and original assignee: SHANGHAI BIOCHIP CO Ltd (the listed assignees may be inaccurate; Google has not performed a legal analysis)
Application filed by SHANGHAI BIOCHIP CO Ltd

Classifications

    • G06T7/0012 Biomedical image inspection (G06T7/00 Image analysis; G06T7/0002 Inspection of images, e.g. flaw detection)
    • G06N3/08 Learning methods (G06N3/02 Neural networks; G06N3/00 Computing arrangements based on biological models)
    • G06T7/11 Region-based segmentation (G06T7/10 Segmentation; edge detection)
    • G06T2207/20021 Dividing image into blocks, subimages or windows
    • G06T2207/20081 Training; learning
    • G06T2207/20084 Artificial neural networks [ANN]
    • G06T2207/30061 Lung (G06T2207/30004 Biomedical image processing)


Abstract

The invention provides a mixed-attention-based lung cancer image segmentation method, device, terminal and medium. A lung cancer pathological section image dataset labeled with lung cancer features is acquired and preprocessed; a mixed attention neural network model fusing channel attention, coordinate attention and spatial attention is constructed from the preprocessed dataset; and the mixed attention network is trained and validated. The method predicts lung cancer types more accurately and can predict lesion regions, a function that classification-based neural networks do not possess. It improves the classical neural-network attention mechanism by fusing channel, spatial and coordinate attention into a single mixed attention module, which attends to the channel, spatial and coordinate dimensions simultaneously, better sharpening attention on lung cancer features and overcoming the inefficient feature extraction of networks with no attention or only a single attention mechanism.

Description

Lung cancer image segmentation method, device, terminal and medium based on mixed attention
Technical Field
The invention relates to the technical field of lung cancer image segmentation based on single-image prediction, and in particular to a mixed-attention-based lung cancer image segmentation method, device, terminal and medium.
Background
With the rising incidence of lung cancer and advances in medical care, ever more lung cancer pathological sections need preliminary diagnosis so that physicians can guide the treatment decisions of individual patients in combination with molecular and clinical information. Traditionally, lung cancer pathological section images must be interpreted manually by professional pathologists, which consumes a great deal of time and manpower. The rise of artificial intelligence makes it possible to identify lung cancer types in images automatically. A mixed-attention-based lung cancer image segmentation method that predicts the presence and type of lung cancer therefore has important significance for reducing repetitive work by medical personnel and improving diagnostic efficiency, and is also convenient for scientific research.
Existing lung cancer image detection falls into two main categories: 1) classification-based methods, which use a neural network such as VGG with linear regression at the back end to determine the lung cancer type of the whole picture; and 2) segmentation-based methods, which use a neural network to regress a mask map. Both of these deep-learning-based detection approaches have limitations and cannot delineate the distribution of lung cancer within a picture.
Disclosure of Invention
In view of the above-mentioned shortcomings of the prior art, an object of the present invention is to provide a method, an apparatus, a terminal and a medium for lung cancer image segmentation based on mixed attention, which are used to solve the technical problems that the existing lung cancer detection method has limitations and the distribution of lung cancer in a picture cannot be drawn.
To achieve the above and other related objects, a first aspect of the present invention provides a method for lung cancer image segmentation based on mixed attention, including: acquiring and preprocessing a lung cancer pathological section image data set with a label marked with lung cancer characteristics; constructing a mixed attention neural network model fused with channel attention, coordinate attention and space attention according to the preprocessed lung cancer pathological section image data set; and training and verifying the mixed attention neural network for outputting a corresponding segmentation graph and a prediction result after inputting the lung cancer image to be predicted.
In some embodiments of the first aspect of the present invention, acquiring and preprocessing the lung cancer pathological section image dataset labeled with lung cancer features includes: downloading data of the lung cancer pathological section image type from a public dataset to form a lung cancer pathological section image dataset; dividing the dataset into a training set and a validation set in a preset ratio; converting the image labels of the images in the dataset into mask maps to serve as true-value maps; and adjusting the width and height pixels of each image to multiples of 8, with the true-value map adjusted proportionally.
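The resize step above can be sketched as follows. Because the encoder halves the feature map three times (to 1/2, 1/4, then 1/8 of the original), widths and heights must be multiples of 8. This is a minimal illustrative helper, not code from the patent; the function names are assumptions.

```python
def to_multiple_of_8(size: int) -> int:
    """Round a pixel dimension up to the nearest multiple of 8."""
    return ((size + 7) // 8) * 8

def adjusted_shape(width: int, height: int) -> tuple:
    # Both dimensions are adjusted; the true-value (mask) map would
    # then be rescaled by the same ratios.
    return (to_multiple_of_8(width), to_multiple_of_8(height))
```

Rounding up (rather than cropping) preserves all labeled tissue; the patent only requires that the final dimensions be multiples of 8.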
In some embodiments of the first aspect of the present invention, the constructing a mixed attention neural network model fused with channel attention, coordinate attention and spatial attention according to the preprocessed lung cancer pathological section image dataset includes: constructing an encoder backbone network by taking an initial input feature map layer in a convolutional neural network as a feature extraction layer so as to extract features of image data in a lung cancer pathological section image data set; constructing a decoder backbone network; the decoder backbone network comprises 5 modules, wherein the first three modules are respectively provided with branch line outputs, and output results are unified in dimensionality through the mixed attention module and are output after being connected according to channels to serve as the input of a fourth module; the output of the fourth module is used as the input of the fifth module, and the final output result is a single channel which is the same as the size of the input original image.
In some embodiments of the first aspect of the present invention, the initial input feature map layer in the convolutional neural network VGG16 is used as a feature extraction layer; the number of convolution kernels of each branch line of each layer in the first module is N, and then the convolution kernels are connected with a maximum pooling layer; the number of convolution kernels of each branch line of each layer in the second module is 2N, the second module is connected with a maximum pooling layer, and the size of a characteristic graph is 1/2 of that of an original image; the number of convolution kernels of each branch line of each layer in the third module is 4N, the third module is connected with a maximum pooling layer, and the size of a characteristic graph is 1/4 of that of an original image; the number of convolution kernels of each branch line of each layer in the fourth module is 8N, the output of the branch lines of the first module, the second module and the third module is modified into a uniform dimension, the branch lines are connected according to channels and then output, the number of output channels is 8N, and the size of a feature map is 1/8 of that of an original image; wherein N is an integer greater than 0.
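The channel and scale schedule just described (N, 2N, 4N, 8N kernels, with feature maps shrinking to 1/2, 1/4 and 1/8 of the original) can be tabulated with a small sketch; the function name and tuple layout are illustrative only.

```python
def encoder_plan(N, H, W):
    # Returns (block, channels, height, width) per encoder block.
    # Blocks 1-3 are each followed by 2x2 max pooling; block 4
    # concatenates the mixed-attention branch outputs into 8N
    # channels at 1/8 of the original size.
    plan, ch, h, w = [], N, H, W
    for blk in (1, 2, 3):
        plan.append((blk, ch, h, w))
        ch, h, w = ch * 2, h // 2, w // 2
    plan.append((4, 8 * N, H // 8, W // 8))
    return plan
```

With N = 64 this reproduces the VGG16-style 64/128/256/512 schedule used in the embodiment.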
In some embodiments of the first aspect of the present invention, the process of constructing the hybrid attention module comprises: processing the initial input feature map separately with a channel attention mechanism, a coordinate attention mechanism and a spatial attention mechanism, then superposing the results, so as to apply attention weighting over the channel, coordinate and spatial dimensions.
In some embodiments of the first aspect of the present invention, the hybrid attention module performs three sets of processing on an initial input feature map. The first set processes the initial input feature map with a channel attention mechanism: average pooling and maximum pooling are applied to the initial input feature map; each pooled feature map is passed in turn through a convolution module, an activation function module and a convolution module, and the results are added; the sum is processed by an activation function module, and the output is multiplied by the initial input feature map to give the first set of processing results. The second set processes the initial input feature map with a coordinate attention mechanism: the initial input feature map is average-pooled along the height direction and along the width direction, and the two pooled feature maps are spliced; the spliced feature map is processed in turn by a convolution module and a normalization module to give a first feature map to be multiplied, which is fed to an activation function module to give a second feature map to be multiplied; the two feature maps are multiplied; the product is decomposed into height-direction and width-direction feature maps, each is processed by a convolution module and an activation function module, and the results are added; the sum is multiplied by the initial input feature map to give the second set of processing results. The third set processes the initial input feature map with a spatial attention mechanism: the initial input feature map is averaged and maximized over the spatial dimension with channel number 1, and the results are spliced; the spliced feature map is processed in turn by a convolution module and an activation function module, and the output is multiplied by the initial input feature map to give the third set of processing results. Finally, the first, second and third sets of processing results and the initial input feature map are added and output.
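The three branches and the residual fusion just described can be sketched in pure Python on a tiny C×H×W feature map. The learned convolution, normalization and splicing steps are deliberately collapsed into simple pooling-plus-sigmoid gates, so this is a structural sketch of the branch-and-sum design under stated simplifications, not the patented module itself.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Toy feature map layout: fmap[c][h][w], a C x H x W nested list.

def channel_attention(fmap):
    # Average- and max-pool each channel over its H x W plane,
    # squash with a sigmoid, and rescale the channel. (The two 1x1
    # convolutions of the patented branch are omitted for brevity.)
    out = []
    for plane in fmap:
        vals = [v for row in plane for v in row]
        gate = sigmoid(sum(vals) / len(vals) + max(vals))
        out.append([[v * gate for v in row] for row in plane])
    return out

def coordinate_attention(fmap):
    # Average-pool along width for a per-(c, h) descriptor and along
    # height for a per-(c, w) descriptor, then gate each position by
    # the product of the two sigmoid-squashed descriptors.
    C, H, W = len(fmap), len(fmap[0]), len(fmap[0][0])
    out = []
    for c in range(C):
        gh = [sigmoid(sum(fmap[c][h]) / W) for h in range(H)]
        gw = [sigmoid(sum(fmap[c][h][w] for h in range(H)) / H)
              for w in range(W)]
        out.append([[fmap[c][h][w] * gh[h] * gw[w] for w in range(W)]
                    for h in range(H)])
    return out

def spatial_attention(fmap):
    # Per-position mean and max across channels, squashed into a
    # single-channel gate that rescales every channel.
    C, H, W = len(fmap), len(fmap[0]), len(fmap[0][0])
    gate = [[sigmoid(sum(fmap[c][h][w] for c in range(C)) / C
                     + max(fmap[c][h][w] for c in range(C)))
             for w in range(W)] for h in range(H)]
    return [[[fmap[c][h][w] * gate[h][w] for w in range(W)]
             for h in range(H)] for c in range(C)]

def mixed_attention(fmap):
    # Residual fusion described above: the three branch outputs and
    # the initial input feature map are added together.
    branches = [channel_attention(fmap), coordinate_attention(fmap),
                spatial_attention(fmap)]
    C, H, W = len(fmap), len(fmap[0]), len(fmap[0][0])
    return [[[fmap[c][h][w] + sum(b[c][h][w] for b in branches)
              for w in range(W)] for h in range(H)] for c in range(C)]
```

Note that the output keeps the input's shape, so the module can be dropped between any two layers of equal dimension, which is exactly why it can re-weight the encoder branch outputs before concatenation.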
In some embodiments of the first aspect of the present invention, in the decoder backbone network, each layer of each branch line in the first module has 8N convolution kernels, followed by an up-sampling layer; each layer of each branch line in the second module has 4N convolution kernels, followed by an up-sampling layer; each layer of each branch line in the third module has 2N convolution kernels, followed by an up-sampling layer; each layer of each branch line in the fourth module has N convolution kernels; the layers of the fifth module have (N, 1) convolution kernels, and the final output is a single-channel image of the same size as the original image; where N is an integer greater than 0.
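The decoder schedule mirrors the encoder's: channels shrink 8N → 4N → 2N → N → 1 while three up-sampling steps restore the 1/8-scale feature map to the original size. A small sketch (names and tuple layout are illustrative only):

```python
def decoder_plan(N, H8, W8):
    # Returns (block, channels, height, width) per decoder block,
    # starting from the 1/8-scale encoder output (H8 x W8).
    # Blocks 1-3 are each followed by 2x upsampling; block 5 maps
    # N channels down to a single-channel segmentation map.
    plan, h, w = [], H8, W8
    for blk, ch in ((1, 8 * N), (2, 4 * N), (3, 2 * N)):
        plan.append((blk, ch, h, w))
        h, w = h * 2, w * 2
    plan.append((4, N, h, w))
    plan.append((5, 1, h, w))
    return plan
```

With N = 64 and a 512×512 input (so a 64×64 encoder output), the final block emits a 1×512×512 map, matching the claim that the output is a single channel at the original image size.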
In some embodiments of the first aspect of the present invention, the process of training and validating the hybrid attention neural network comprises: setting a loss function and selecting a model optimizer and parameters thereof; inputting the preprocessed images and truth diagrams in the training set into a neural network for training; and loading the neural network parameters obtained by training, and evaluating the performance of the neural network obtained by training by using a verification set test.
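The patent does not name the loss function chosen in the training step, so as one hedged, plausible example here is the soft Dice loss commonly used for binary segmentation masks; the helper name is an assumption.

```python
def dice_loss(pred, target, eps=1e-6):
    # Soft Dice loss over flattened mask probabilities: 1 minus the
    # Dice overlap between prediction and true-value map. `eps`
    # avoids division by zero on empty masks.
    inter = sum(p * t for p, t in zip(pred, target))
    return 1.0 - (2.0 * inter + eps) / (sum(pred) + sum(target) + eps)
```

A perfect prediction drives the loss to 0; a prediction with no overlap drives it toward 1, which is why Dice-style losses pair well with the class imbalance typical of lesion segmentation.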
To achieve the above and other related objects, a second aspect of the present invention provides a mixed attention-based lung cancer image segmentation apparatus, comprising: the preprocessing module is used for acquiring and preprocessing the lung cancer pathological section image dataset marked with the lung cancer characteristics; the model construction module is used for constructing a mixed attention neural network model fused with channel attention, coordinate attention and space attention according to the preprocessed lung cancer pathological section image data set; and the training and verifying module is used for training and verifying the mixed attention neural network for outputting a corresponding segmentation graph and a prediction result after inputting the lung cancer image to be predicted.
To achieve the above and other related objects, a third aspect of the present invention provides a computer-readable storage medium on which a computer program is stored, the computer program, when being executed by a processor, implementing the mixed attention-based lung cancer image segmentation method.
To achieve the above and other related objects, a fourth aspect of the present invention provides an electronic terminal comprising: a processor and a memory; the memory is used for storing a computer program, and the processor is used for executing the computer program stored by the memory to enable the terminal to execute the lung cancer image segmentation method based on mixed attention.
As described above, the lung cancer image segmentation method, apparatus, terminal and medium based on mixed attention according to the present invention have the following advantageous effects:
(1) The method can predict the lung cancer types more accurately, and can realize the function of predicting lesion areas which is not provided by a neural network based on classification.
(2) The invention designs a new mixed attention module, improves the classical attention mechanism based on a neural network, forms a mixed attention module by fusing the ideas of channel attention, space attention and coordinate attention, can simultaneously mix the attention on the channel dimension, the space dimension and the coordinate dimension, and better improves the attention on the lung cancer characteristics.
(3) On the basis of extracting features with the VGG16 feature layers, the overall architecture of the invention introduces a mixed attention module to extract feature information more effectively, improves the network's attention to lung cancer features, and overcomes the inefficient feature extraction of networks with no attention or only a single attention mechanism.
Drawings
Fig. 1 is a flowchart illustrating a lung cancer image segmentation method based on mixed attention according to an embodiment of the present invention.
Fig. 2 is a schematic flow chart illustrating a process of acquiring and preprocessing an image dataset of lung cancer pathological section labeled with lung cancer features according to an embodiment of the present invention.
Fig. 3 is a schematic diagram illustrating a process of constructing a hybrid attention neural network model according to an embodiment of the present invention.
Fig. 4A is a flowchart illustrating a lung cancer image segmentation method based on mixed attention according to an embodiment of the invention.
Fig. 4B is a schematic diagram illustrating a structure of a lung cancer image segmentation neural network according to an embodiment of the present invention.
Fig. 5 is a schematic structural diagram of an electronic terminal according to an embodiment of the invention.
Fig. 6 is a schematic structural diagram illustrating a lung cancer image segmentation apparatus based on mixed attention according to an embodiment of the present invention.
Detailed Description
The embodiments of the present invention are described below with reference to specific embodiments, and other advantages and effects of the present invention will be easily understood by those skilled in the art from the disclosure of the present specification. The invention is capable of other and different embodiments and of being practiced or of being carried out in various ways, and its several details are capable of modification in various respects, all without departing from the spirit and scope of the present invention. It is to be noted that the features in the following embodiments and examples may be combined with each other without conflict.
As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, operations, elements, components, items, species, and/or groups, but do not preclude the presence or addition of one or more other features, operations, elements, components, items, species, and/or groups thereof. The terms "or" and "and/or" as used herein are to be construed as inclusive, meaning any one or any combination. Thus, "A, B or C" or "A, B and/or C" means "any of the following: A; B; C; A and B; A and C; B and C; A, B and C". An exception to this definition occurs only when a combination of elements, functions, or operations is inherently mutually exclusive in some way.
In order to solve the problems described in the background art, the invention provides a mixed-attention-based lung cancer image segmentation method, device, terminal and medium. The aim is to extract features of lung cancer images at different scales effectively with a mixed attention neural network while concentrating attention on the region of a single image where the lung cancer lies. This overcomes the inefficient feature extraction of networks with no attention or only a single attention mechanism, strengthens effective feature extraction under mixed attention in deep learning, and has important practical significance for improving network performance and assisting professionals in detection.
Meanwhile, in order to make the objects, technical solutions and advantages of the present invention more clearly apparent, the technical solutions in the embodiments of the present invention are further described in detail by the following embodiments in conjunction with the accompanying drawings. It should be understood that the specific embodiments described herein are merely illustrative of the invention and do not limit the invention.
Before the present invention is explained in further detail, terms and expressions referred to in the embodiments of the present invention are explained, and the terms and expressions referred to in the embodiments of the present invention are applicable to the following explanations:
<1> Mixed attention mechanism: the core goal of an attention mechanism in deep learning is to select, from a large amount of information, the information most critical to the current task; its essence is weighting, the hope being that a neural network automatically learns which places in a picture or text sequence need attention. A mixed attention mechanism evaluates and scores channel attention and spatial attention simultaneously, as in CBAM (Convolutional Block Attention Module).
<2> OpenCV: a cross-platform computer vision and machine learning software library released under the Apache 2.0 license (open source) that runs on the Linux, Windows, Android and Mac OS operating systems. OpenCV consists of a series of C functions and a small amount of C++, provides interfaces for languages such as Python, Ruby and MATLAB, and implements many general algorithms in image processing and computer vision.
Embodiments of the present invention provide a method for lung cancer image segmentation based on mixed attention, a system for the method for lung cancer image segmentation based on mixed attention, and a storage medium storing an executable program for implementing the method for lung cancer image segmentation based on mixed attention. In terms of implementation of the method for segmenting the lung cancer image based on mixed attention, the embodiment of the invention will describe an exemplary implementation scenario of the segmentation of the lung cancer image based on mixed attention.
Fig. 1 is a flowchart illustrating a lung cancer image segmentation method based on mixed attention according to an embodiment of the present invention. The lung cancer image segmentation method based on mixed attention in the embodiment mainly comprises the following steps:
step S1: and acquiring and preprocessing a lung cancer pathological section image data set with a label marked with lung cancer characteristics.
Specifically, the implementation process of acquiring and preprocessing the lung cancer pathological section image dataset labeled with the lung cancer features includes the sub-steps shown in fig. 2:
step S11: and downloading data with the data type of the lung cancer pathological section map from the public data set to form a lung cancer pathological section image data set.
Step S12: and dividing the lung cancer pathological section image data set into a training set and a verification set according to a preset proportion.
Step S13: converting the image labels of the images in the lung cancer pathological section image data set into mask images as true value images; and adjusting the width and height pixels in the image data of each image to be multiples of 8, and correspondingly adjusting the true value image according to the proportion. It should be noted that, since the image data is input into the neural network model for feature extraction, pooling, etc., and the image size is reduced to 1/8 of the original image step by step in these processes, the width and height of the image data need to be adjusted to be multiples of 8 in the preprocessing.
In this embodiment, the conversion of image labels into mask maps serving as true-value maps can be implemented with the OpenCV computer vision library. The concept of a "mask" in digital image processing comes from the PCB plate-making process: many chip-processing steps in semiconductor manufacturing use photolithography, and the graphic "negative" used in these steps is called a mask. It covers selected areas on the silicon wafer with an opaque graphic template, so that subsequent etching or diffusion affects only the areas outside the selection. Accordingly, the roles of an image mask in digital image processing mainly include: (1) multiplying a pre-made mask of interest with the image to be processed to obtain a region of interest (ROI), where image values inside the region stay unchanged and values outside it become 0; (2) shielding certain areas of the image with a mask so that they do not take part in certain processing or calculations; (3) detecting and extracting structural features in the image similar to the mask using a similarity variable or image matching method; (4) making a special-shaped image.
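Role (1) above, keeping values inside the ROI and zeroing everything outside, reduces to an element-wise multiply of image and mask; a minimal illustrative sketch (not the OpenCV call itself):

```python
def apply_mask(image, mask):
    # ROI extraction as described: pixel values where the mask is
    # nonzero are kept; values outside the mask become 0.
    return [[px if m else 0 for px, m in zip(irow, mrow)]
            for irow, mrow in zip(image, mask)]
```

In practice the same effect is obtained with an OpenCV/NumPy element-wise product or bitwise AND between the image array and a binary mask array.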
Step S2: and constructing a mixed attention neural network model fused with channel attention, coordinate attention and space attention according to the preprocessed lung cancer pathological section image data set.
In this embodiment, the step S2 can be divided into the sub-steps shown in fig. 3:
step S21: and constructing an encoder backbone network by taking the initial input feature map layer in the convolutional neural network as a feature extraction layer so as to extract the features of the image data in the lung cancer pathological section image data set.
Specifically, the initial input feature layers (Feature layers) of VGG16 are used as feature extraction layers. The basic structure of each layer is a two-dimensional convolution layer (Conv2d) with a 3×3 convolution kernel (kernel = 3), with a ReLU activation function after each convolution layer; features are extracted with this structure. The constructed encoder comprises 4 modules (blocks): a first block, a second block, a third block and the mixed attention module. The first, second and third blocks each output a branch line; the branch outputs are unified in dimension by the mixed attention module and concatenated along the channel axis to serve as the input of the fourth module.
More specifically, the step S21 may be divided into the following sub-steps:
step S211: taking an initial input feature map layer of the VGG16 as a feature extraction layer; the number of convolution kernels of each branch line of each layer in the first module is N, and then the convolution kernels are connected with a maximum pooling layer; the number of convolution kernels of each branch line of each layer in the second module is 2N, the second module is connected with a maximum pooling layer, and the size of a characteristic graph is 1/2 of that of an original image; the number of convolution kernels of each branch line of each layer in the third module is 4N, the third module is connected with a maximum pooling layer, and the size of a characteristic graph is 1/4 of that of an original image; the number of convolution kernels of each branch line of each layer in the fourth module is 8N, the output of the branch lines of the first module, the second module and the third module is modified into a uniform dimension, the branch lines are connected according to channels and then output, the number of output channels is 8N, and the size of a feature map is 1/8 of that of an original image; wherein N is an integer greater than 0.
For example, the encoder has 4 blocks. The first block has 2 convolution layers with 64 and 64 kernels, followed by a max pooling layer (kernel = 2); the second block has 2 convolution layers with 128 and 128 kernels, followed by a max pooling layer (kernel = 2); the third block has 3 convolution layers with 256, 256 and 256 kernels, followed by a max pooling layer (kernel = 2); the fourth block has 3 convolution layers with 512, 512 and 512 kernels. The first, second and third blocks each provide a branch output; the branch outputs are modified to a uniform dimension by the mixed attention module and concatenated along the channel axis, giving 512 output channels and a feature map one eighth the size of the original image.
Step S212: and constructing a mixed attention module fused with channel attention, coordinate attention and space attention, and processing and then superposing the initial input feature maps respectively based on a channel attention mechanism, a coordinate attention mechanism and a space attention mechanism so as to carry out attention weighting on the channel, the coordinate and the space.
The mixed attention module constructed in the embodiment combines the ideas of channel attention and space attention to form a new mixed attention module. Specifically, the initial input feature map is processed in three groups, and the specific process is as shown in fig. 4:
processing the initial input feature map based on a channel attention mechanism in a first set of processes, comprising: respectively carrying out average pooling and maximum pooling on the initial input feature map; all feature maps obtained by pooling are processed by a convolution module, an activation function module and a convolution module in sequence and then added; processing the feature maps obtained by adding by an activation function module, and multiplying the obtained result by the initial input feature map to obtain a first group of processing results;
for example, average pooling (AvgPool) and maximum pooling (MaxPool) are first applied to the initial input feature map to obtain two pooled feature maps; the average-pooled and max-pooled feature maps are each processed by a 1×1 convolution kernel, a ReLU activation function and another 1×1 convolution kernel, then added; the sum is processed by a Sigmoid activation function, and the result is multiplied by the initial input feature map to obtain the first group of processing results.
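The first group of processing can be sketched as a CBAM-style channel-attention module in PyTorch; the reduction ratio of 16 in the two-layer 1×1-convolution bottleneck is an assumption, not stated in the text:

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Channel attention: avg- and max-pooled (batch, channel, 1, 1)
    descriptors pass through a shared 1x1 conv -> ReLU -> 1x1 conv
    bottleneck, are summed, squashed by a sigmoid, and rescale the
    input channel-wise."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        self.max_pool = nn.AdaptiveMaxPool2d(1)
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1, bias=False),
        )

    def forward(self, x):
        w = torch.sigmoid(self.mlp(self.avg_pool(x)) + self.mlp(self.max_pool(x)))
        return x * w  # broadcast per-channel weights over (h, w)

ca = ChannelAttention(64)
x = torch.randn(2, 64, 16, 16)
y = ca(x)
```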
It should be noted that, in the embodiment of the present invention, a small convolutional network with two convolutional layers is used to learn channel-level features; since the input feature dimension is (batch, channel, 1, 1), two 1×1 convolution kernels are used for the convolution processing, and two fully connected layers may be used here instead.
It should be understood that pooling, which is typically applied after a convolutional layer, can increase the model's running speed to some extent; maximum pooling (MaxPool) selects the maximum value within each pooling window as the output, while average pooling (AvgPool) selects the average value within each pooling window as the output.
Processing the initial input feature map based on a coordinate attention mechanism in a second set of processes, comprising: respectively carrying out average pooling treatment on the initial input feature maps in the height direction and the width direction, and splicing the pooling feature maps in the height direction with the pooling feature maps in the width direction; the spliced feature graphs are sequentially processed by a convolution module and a normalization module to obtain a first to-be-multiplied feature graph, and the first to-be-multiplied feature graph is input into an activation function module to obtain a second to-be-multiplied feature graph; multiplying the first to-be-multiplied feature map with the second to-be-multiplied feature map; decomposing the feature maps obtained by multiplication into feature maps in the height direction and the width direction, and adding the feature maps in all directions after the feature maps are processed by a convolution module and an activation function module; and multiplying the feature map obtained by adding with the initial input feature map to obtain a second group of processing results.
For example, the initial input feature map is average-pooled along the height direction and the width direction; after dimension reduction and transposition of the last dimension, a height-direction pooled feature map (AvgPool_H) and a width-direction pooled feature map (AvgPool_W) are obtained, which are then concatenated (Concat). The concatenated feature map is processed in turn by a 1×1 convolution kernel and a BN module to obtain the first to-be-multiplied feature map; the first to-be-multiplied feature map is passed through an activation function whose maximum value is 1 (for example, a ReLU6 activation function with the maximum adjusted to 1, or an ordinary ReLU capped at 1) to obtain the second to-be-multiplied feature map. The feature map obtained by multiplying the first and second to-be-multiplied feature maps is decomposed along spatial dimension dim = 2 into a height-direction feature map and a width-direction feature map; each is processed by a 1×1 convolution kernel and a Sigmoid activation function, and the results are added; the feature map obtained by this addition is multiplied by the initial input feature map to obtain the second group of processing results.
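A hedged PyTorch sketch of the second group of processing follows; the channel reduction ratio and the exact naming of the two pooling directions are assumptions, and the "activation with maximum value 1" is realized as a ReLU clamped at 1, as the text suggests:

```python
import torch
import torch.nn as nn

class CoordinateAttention(nn.Module):
    """Coordinate attention in the spirit of the text: directional average
    pooling, concatenation, 1x1 conv + BN, multiplication of the
    pre-activation map ("first to-be-multiplied") with its capped-ReLU
    copy ("second to-be-multiplied"), split back into H/W branches,
    1x1 conv + sigmoid per branch, summed, then rescaling the input."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        mid = max(8, channels // reduction)  # assumed bottleneck width
        self.conv1 = nn.Conv2d(channels, mid, kernel_size=1)
        self.bn = nn.BatchNorm2d(mid)
        self.conv_h = nn.Conv2d(mid, channels, kernel_size=1)
        self.conv_w = nn.Conv2d(mid, channels, kernel_size=1)

    def forward(self, x):
        b, c, h, w = x.shape
        xh = x.mean(dim=3, keepdim=True)                      # (b, c, h, 1)
        xw = x.mean(dim=2, keepdim=True).permute(0, 1, 3, 2)  # (b, c, w, 1)
        y1 = self.bn(self.conv1(torch.cat([xh, xw], dim=2)))  # first map
        y2 = torch.clamp(torch.relu(y1), max=1.0)             # ReLU capped at 1
        y = y1 * y2
        yh, yw = torch.split(y, [h, w], dim=2)
        ah = torch.sigmoid(self.conv_h(yh))                      # (b, c, h, 1)
        aw = torch.sigmoid(self.conv_w(yw.permute(0, 1, 3, 2)))  # (b, c, 1, w)
        return x * (ah + aw)  # broadcast sum over (h, w), rescale input

coord = CoordinateAttention(64)
x = torch.randn(2, 64, 12, 10)
y = coord(x)
```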
It should be understood that dim here denotes a tensor axis index rather than a number of color channels: for a feature map laid out as (batch, channel, height, width), dim = 1 is the channel axis, and dim = 2 is the height axis along which the concatenated map is split. The BN (Batch Normalization) module is a batch normalization layer that transforms the data of each layer toward a mean of 0 and a variance of 1, helping the model converge more easily.
Processing the initial input feature map based on a spatial attention mechanism in a third set of processes, comprising: respectively applying averaging and maximization to the initial input feature map along the channel axis (dim = 1) and then concatenating the results; the concatenated feature map is sequentially processed by a convolution module and an activation function module, and the result is multiplied by the initial input feature map to obtain a third group of processing results.
For example, the initial input feature map is respectively subjected to averaging (mean) and maximization (max) along the channel axis (dim = 1) and the two results are concatenated; the concatenated feature map is processed in turn by a convolution kernel (the specific size can be 5×5, 7×7, 9×9, 11×11, etc.) and a Sigmoid activation function.
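The third group of processing corresponds to a CBAM-style spatial-attention module; a minimal PyTorch sketch, with the 7×7 kernel chosen from the sizes the text allows:

```python
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    """Spatial attention: channel-wise mean and max maps are concatenated,
    passed through one large-kernel conv and a sigmoid, and used to
    rescale the input spatially."""
    def __init__(self, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):
        avg = x.mean(dim=1, keepdim=True)        # (b, 1, h, w)
        mx = x.max(dim=1, keepdim=True).values   # (b, 1, h, w)
        w = torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))
        return x * w  # broadcast the (b, 1, h, w) mask over channels

sa = SpatialAttention(kernel_size=7)
x = torch.randn(2, 64, 16, 16)
y = sa(x)
```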
The first, second and third groups of processing results are then added to the initial input feature map and output; this processing performs attention weighting over channels, coordinates and space simultaneously, thereby improving precision.
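The residual fusion of the three branches can be sketched as follows; `channel_att`, `coord_att` and `spatial_att` are placeholders for the three attention branches (identity functions here, so the sketch stays self-contained):

```python
import torch

def mixed_attention(x, channel_att, coord_att, spatial_att):
    """Sum the three branch outputs with the original input (residual
    connection), as the text describes."""
    return channel_att(x) + coord_att(x) + spatial_att(x) + x

# identity placeholders standing in for the real attention modules
identity = lambda t: t
x = torch.randn(1, 64, 16, 16)
y = mixed_attention(x, identity, identity, identity)
```

With identity branches the output is exactly 4·x, which makes the residual structure easy to verify.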
Step S22: constructing a decoder backbone network; the decoder backbone network comprises 5 modules, the first three of which (block5, block6 and block7) each provide a branch output; the outputs are unified in dimension by the mixed attention module, concatenated along the channel dimension and then output as the input of the fourth module (block8); the output of the fourth module is used as the input of the fifth module (block9), and the final output is a single-channel map of the same size as the input image.
In this embodiment, the number of convolution kernels of each branch line in each layer in the first module is 8N (e.g., 512, 512, 512), followed by an upsampling layer; the number in the second module is 4N (e.g., 256, 256, 256), followed by an upsampling layer; the number in the third module is 2N (e.g., 128, 128, 128), followed by an upsampling layer; the number in the fourth module is N (e.g., 64, 64, 64); the number of convolution kernels in the fifth module is (64, 1), and the final output is a single-channel map of the same size as the original image.
For example, the upsampling may use the parameters (scale_factor = 2, mode = 'nearest'), i.e. the spatial size of the input is doubled using the nearest-neighbour algorithm; alternatively, the upsampling algorithm may be linear, bilinear, bicubic, trilinear, or the like.
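A minimal PyTorch illustration of the upsampling choices; `scale_factor=2` doubles the spatial size (note that of the listed modes, 'linear' requires 3-D and 'trilinear' 5-D inputs, so only 'nearest', 'bilinear' and 'bicubic' apply to 4-D feature maps):

```python
import torch
import torch.nn as nn

x = torch.randn(1, 512, 8, 8)  # e.g. the 1/8-resolution encoder output

# nearest-neighbour upsampling, doubling height and width
up_nearest = nn.Upsample(scale_factor=2, mode='nearest')
y_nearest = up_nearest(x)

# a bilinear alternative for 4-D (batch, channel, h, w) inputs
up_bilinear = nn.Upsample(scale_factor=2, mode='bilinear', align_corners=False)
y_bilinear = up_bilinear(x)
```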
The overall structure of the lung cancer image segmentation neural network based on mixed attention in the embodiment of the invention is shown in fig. 4B: wherein block 1-block 4 form an encoder backbone network, and block 5-block 9 form a decoder backbone network.
In this embodiment, the structure and processing procedure of the hybrid attention module in the decoder backbone network are the same as those of the hybrid attention module in the encoder, and thus are not described again.
It is worth emphasizing that the attention of the invention is enhanced from three dimensions. First, at the channel level: each channel acts as a feature detector, and attention to particular channels is increased. Second, at the spatial-coordinate level: the width and height are decomposed and their relationship is learned, improving attention along the width and height coordinate dimensions. Third, at the spatial level of the 2D image: attention to particular regions of the 2D image is enhanced. The prior art enhances attention only from a single angle; the invention mixes the three kinds of attention and fully utilizes the three different dimensions of channel, coordinate and space to comprehensively improve attention to features in the image. Since the label to be learned by the method is the annotated lung cancer feature, the method can thereby improve the network's attention to lung cancer features.
And step S3: and training and verifying the mixed attention neural network for outputting a corresponding segmentation graph and a prediction result after inputting the lung cancer image to be predicted.
In this embodiment, the training and verifying the hybrid attention neural network includes:
step S31: a loss function is set and a model optimizer and its parameters are selected.
In some examples, the loss function uses the BCELoss function (Binary Cross Entropy), which returns the average of the loss. Alternatively, the CELoss function (Cross Entropy) or the MSE function (Mean Squared Error) may be used; the present embodiment is not limited thereto.
Furthermore, an optimizer such as Adam, SGD, Momentum, AdaGrad or RMSProp is selected to optimize the neural network model. Taking the Adam optimizer as an example, batch_size is set to 1, the initial learning rate is 0.0001, and the number of epochs is set to 500.
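The loss and optimizer setup can be sketched as follows; the one-layer stand-in network is purely illustrative, while BCELoss, Adam, lr = 0.0001 and batch_size = 1 follow the text:

```python
import torch
import torch.nn as nn

# stand-in for the segmentation network: one conv + sigmoid so that the
# output is a single-channel probability map in (0, 1), as BCELoss expects
model = nn.Sequential(nn.Conv2d(3, 1, 3, padding=1), nn.Sigmoid())

criterion = nn.BCELoss()  # returns the mean loss by default
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

x = torch.randn(1, 3, 32, 32)                          # batch_size = 1
target = torch.randint(0, 2, (1, 1, 32, 32)).float()   # binary truth map

loss = criterion(model(x), target)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```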
Step S32: and inputting the preprocessed images and the truth-value diagram in the training set into a neural network for training.
In this embodiment, the optimizer is used to update and compute the network parameters that affect the model training and model output, so that they approximate or reach their optimal values, thereby minimizing the loss function.
Step S33: and loading the neural network parameters obtained by training, and evaluating the performance of the neural network obtained by training by using a verification set test.
In this embodiment, mIoU (mean Intersection over Union), precision (Precision), recall (Recall), mPA (mean Pixel Accuracy), accuracy (Accuracy) or F-score (the harmonic mean of precision and recall) is selected as the evaluation metric.
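For a binary segmentation mask, IoU, precision and recall reduce to simple counts of true/false positives and negatives; a minimal NumPy sketch (mIoU would average the per-class IoU over all classes):

```python
import numpy as np

def binary_metrics(pred, gt):
    """pred, gt: boolean arrays of the same shape.
    Returns (IoU, precision, recall) for the positive class."""
    tp = np.logical_and(pred, gt).sum()    # true positives
    fp = np.logical_and(pred, ~gt).sum()   # false positives
    fn = np.logical_and(~pred, gt).sum()   # false negatives
    iou = tp / (tp + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return iou, precision, recall

pred = np.array([[1, 1], [0, 0]], dtype=bool)
gt = np.array([[1, 0], [1, 0]], dtype=bool)
iou, p, r = binary_metrics(pred, gt)  # tp=1, fp=1, fn=1
```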
Referring to fig. 5, a schematic diagram of an optional hardware structure of the electronic terminal 500 according to the embodiment of the present invention is shown; the terminal 500 may be a mobile phone, a computer device, a tablet device, a personal digital processing device, a factory background processing device, or the like. The electronic terminal 500 includes: at least one processor 501, a memory 502, at least one network interface 504, and a user interface 506. The various components in the device are coupled together by a bus system 505. It will be appreciated that the bus system 505 is used to enable connected communication between these components. The bus system 505 includes a power bus, a control bus, and a status signal bus in addition to a data bus. For the sake of clarity, however, the various buses are labeled as the bus system 505 in fig. 5.
The user interface 506 may include, among other things, a display, a keyboard, a mouse, a trackball, a click wheel, keys, buttons, a touch pad, a touch screen, or the like.
It will be appreciated that the memory 502 can be either volatile memory or nonvolatile memory, and can include both volatile and nonvolatile memory. The nonvolatile memory may be, for example, a Read-Only Memory (ROM) or a Programmable Read-Only Memory (PROM); the volatile memory may be a Random Access Memory (RAM), which serves as an external cache. By way of example, and not limitation, many forms of RAM are available, such as Static Random Access Memory (SRAM) and Synchronous Static Random Access Memory (SSRAM). The memory described in the embodiments of the present invention is intended to comprise, without being limited to, these and any other suitable types of memory.
The memory 502 in embodiments of the present invention is used to store various types of data to support the operation of the electronic terminal 500. Examples of such data include: any executable programs for operating on the electronic terminal 500, such as an operating system 5021 and application programs 5022; the operating system 5021 includes various system programs such as a framework layer, a core library layer, a driver layer, etc. for implementing various basic services and processing hardware-based tasks. The application 5022 may contain various applications such as a media player (MediaPlayer), a Browser (Browser), etc., for implementing various application services. The method for segmenting the lung cancer image based on the mixed attention provided by the embodiment of the invention can be contained in the application program 5022.
The method disclosed by the above-mentioned embodiments of the present invention may be applied to the processor 501, or implemented by the processor 501. The processor 501 may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the above method may be performed by integrated hardware logic circuits or by software instructions in the processor 501. The processor 501 may be a general purpose processor, a Digital Signal Processor (DSP), another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The processor 501 may implement or perform the methods, steps, and logic blocks disclosed in the embodiments of the present invention. A general purpose processor may be a microprocessor or any conventional processor. The steps of the method provided by the embodiment of the invention may be directly embodied as being executed by a hardware decoding processor, or executed by a combination of hardware and software modules in a decoding processor. The software modules may be located in a storage medium; the processor reads the information in the memory and, in combination with its hardware, performs the steps of the foregoing method.
In an exemplary embodiment, the electronic terminal 500 may be implemented by one or more Application Specific Integrated Circuits (ASICs), DSPs, Programmable Logic Devices (PLDs) or Complex Programmable Logic Devices (CPLDs) for performing the aforementioned methods.
Fig. 6 is a schematic structural diagram illustrating a lung cancer image segmentation apparatus based on mixed attention according to an embodiment of the present invention. In this embodiment, the lung cancer image segmentation apparatus 600 based on mixed attention includes a preprocessing module 601, a model building module 602, and a training and verification module 603.
The preprocessing module 601 is used for acquiring and preprocessing a lung cancer pathological section image dataset marked with lung cancer characteristics; the model construction module 602 is configured to construct a mixed attention neural network model fused with channel attention, coordinate attention and spatial attention according to the preprocessed lung cancer pathological section image dataset; the training and verification module 603 is configured to train and verify the hybrid attention neural network, so as to output a segmentation map and a prediction result after inputting the lung cancer image to be predicted.
In some examples, the process of acquiring and preprocessing the lung cancer pathological section image dataset labeled with the lung cancer features by the preprocessing module 601 includes: downloading data of which the data type is a lung cancer pathological section image from the public data set to form a lung cancer pathological section image data set; dividing the lung cancer pathological section image data set into a training set and a verification set according to a preset proportion; converting the image labels of the images in the lung cancer pathological section image data set into mask images as true value images; and adjusting the width and height pixels in the image data of each image to be multiples of 8, and proportionally and correspondingly adjusting the true value image.
In some examples, the model construction module 602 constructs a mixed attention neural network model fused with channel attention, coordinate attention and spatial attention from the preprocessed lung cancer pathological section image dataset, including: constructing an encoder backbone network by taking an initial input feature map layer in a convolutional neural network as a feature extraction layer, so as to extract features of the image data in the lung cancer pathological section image dataset; and constructing a decoder backbone network. The decoder backbone network comprises 5 modules; the first three modules each provide a branch output, and the outputs are unified in dimension by the mixed attention module, concatenated along the channel dimension and then output as the input of the fourth module; the output of the fourth module is used as the input of the fifth module, and the final output is a single-channel map of the same size as the input image.
In some examples, the model building module 602 takes an initial input feature layer in the convolutional neural network VGG16 as a feature extraction layer; the number of convolution kernels of each branch line of each layer in the first module is N, and then the convolution kernels are connected with a maximum pooling layer; the number of convolution kernels of each branch line of each layer in the second module is 2N, the second module is connected with a maximum pooling layer, and the size of a characteristic graph is 1/2 of that of an original image; the number of convolution kernels of each branch line of each layer in the third module is 4N, the third module is connected with a maximum pooling layer, and the size of a characteristic graph is 1/4 of that of an original image; the convolution kernel number of each branch line of each layer in the fourth module is 8N, after the output of the branch lines of the first module, the second module and the third module is modified into a uniform dimension, the output is carried out after the output is connected according to channels, the number of the output channels is 8N, and the size of the characteristic diagram is 1/8 of that of the original image; wherein N is an integer greater than 0.
In some examples, the construction process of the hybrid attention module includes: and processing and superposing the initial input feature maps respectively based on a channel attention mechanism, a coordinate attention mechanism and a space attention mechanism so as to weight the attention of the channel, the coordinate and the space.
In some examples, the hybrid attention module performs three sets of processing on an initial input feature map: processing the initial input feature map based on a channel attention mechanism in a first set of processes, comprising: respectively carrying out average pooling and maximum pooling on the initial input feature map; all feature maps obtained by pooling are processed by a convolution module, an activation function module and a convolution module in sequence and then added; processing the feature maps obtained by adding by an activation function module, and multiplying the obtained result by the initial input feature map to obtain a first group of processing results; processing the initial input feature map based on a coordinate attention mechanism in a second set of processes, comprising: carrying out average pooling treatment on the initial input feature map in the height direction and the width direction respectively, and splicing the pooling feature map in the height direction with the pooling feature map in the width direction; the spliced feature graphs are sequentially processed by a convolution module and a normalization module to obtain a first to-be-multiplied feature graph, and the first to-be-multiplied feature graph is input into an activation function module to obtain a second to-be-multiplied feature graph; multiplying the first to-be-multiplied feature map with the second to-be-multiplied feature map; decomposing the feature maps obtained by multiplication into feature maps in the height direction and the width direction, and adding the feature maps in all directions after the feature maps are processed by a convolution module and an activation function module; multiplying the feature map obtained by adding with the initial input feature map to obtain a second group of processing results; processing the initial input feature map based on a spatial attention mechanism in a third set of processes, comprising: respectively carrying out 
averaging and maximization processing on the initial input feature map along the channel axis (dim = 1) and then splicing; the feature map obtained by splicing is sequentially processed by a convolution module and an activation function module, and the result is multiplied by the initial input feature map to obtain a third group of processing results; and adding the first group of processing results, the second group of processing results, the third group of processing results and the initial input feature map and outputting.
In some examples, in the decoder backbone network constructed by the model construction module 602, the number of convolution kernels of each branch line in each layer in the first module is 8N, followed by an upsampling layer; the number in the second module is 4N, followed by an upsampling layer; the number in the third module is 2N, followed by an upsampling layer; the number of convolution kernels of each branch line in each layer in the fourth module is N; the number of convolution kernels in each layer in the fifth module is (N, 1), and the final output is a single-channel image of the same size as the original image; wherein N is an integer greater than 0.
In some examples, the process of model building module 602 training and validating the hybrid attention neural network includes: setting a loss function and selecting a model optimizer and parameters thereof; inputting the preprocessed images and truth diagrams in the training set into a neural network for training; and loading the parameters of the neural network obtained by training, and evaluating the performance of the neural network obtained by training by using a verification set test.
It should be noted that: the lung cancer image segmentation apparatus based on mixed attention provided in the above embodiment is only exemplified by the above division of each program module when performing lung cancer image segmentation based on mixed attention, and in practical applications, the above processing may be distributed to different program modules according to needs, that is, the internal structure of the apparatus may be divided into different program modules to complete all or part of the above-described processing. In addition, the lung cancer image segmentation device based on mixed attention provided by the above embodiment and the lung cancer image segmentation method based on mixed attention belong to the same concept, and the specific implementation process thereof is described in the method embodiment and is not described herein again.
Those of ordinary skill in the art will understand that: all or part of the steps for implementing the above method embodiments may be performed by hardware associated with a computer program. The aforementioned computer program may be stored in a computer-readable storage medium. When executed, the program performs steps comprising the method embodiments described above; and the aforementioned storage medium includes: various media that can store program codes, such as ROM, RAM, magnetic or optical disks.
In embodiments provided herein, the computer-readable and writable storage medium may comprise read-only memory, random-access memory, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, flash memory, a USB flash drive, a removable hard disk, or any other medium which can be used to store desired program code in the form of instructions or data structures and which can be accessed by a computer. Also, any connection is properly termed a computer-readable medium. For example, if the instructions are transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, digital Subscriber Line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium. It should be understood, however, that computer-readable-writable storage media and data storage media do not include connections, carrier waves, signals, or other transitory media, but are intended to be non-transitory, tangible storage media. Disk and disc, as used in this application, includes Compact Disc (CD), laser disc, optical disc, digital Versatile Disc (DVD), floppy disk and blu-ray disc where disks usually reproduce data magnetically, while discs reproduce data optically with lasers.
In summary, the present invention provides a lung cancer image segmentation method, apparatus, terminal and medium based on mixed attention, which can predict lung cancer types more accurately and can predict lesion regions, a function that classification-based neural networks lack. The invention designs a new mixed attention module that improves the classical neural-network attention mechanism by fusing the ideas of channel attention, spatial attention and coordinate attention, so that attention over the channel, spatial and coordinate dimensions can be mixed simultaneously, better improving attention to lung cancer features. On the basis of extracting features from the VGG16 feature layer, the overall architecture better extracts feature information by introducing the mixed attention module, improves the network's attention to lung cancer features, and overcomes the low feature-extraction efficiency of networks with no attention or only a single kind of attention. Therefore, the invention effectively overcomes various defects in the prior art and has high industrial utilization value.
The foregoing embodiments are merely illustrative of the principles and utilities of the present invention and are not intended to limit the invention. Any person skilled in the art can modify or change the above-mentioned embodiments without departing from the spirit and scope of the present invention. Accordingly, all equivalent modifications or changes made by those skilled in the art without departing from the spirit and technical ideas disclosed herein are intended to be covered by the claims of the present invention.

Claims (11)

1. A lung cancer image segmentation method based on mixed attention is characterized by comprising the following steps:
acquiring and preprocessing a lung cancer pathological section image data set with a label marked with lung cancer characteristics;
constructing a mixed attention neural network model fused with channel attention, coordinate attention and space attention according to the preprocessed lung cancer pathological section image data set;
and training and verifying the mixed attention neural network for outputting a corresponding segmentation graph and a prediction result after inputting the lung cancer image to be predicted.
2. The lung cancer image segmentation method based on mixed attention according to claim 1, wherein the acquiring and preprocessing of the lung cancer pathological section image dataset labeled with lung cancer features comprises:
downloading data of which the data type is a lung cancer pathological section image from the public data set to form a lung cancer pathological section image data set;
dividing the lung cancer pathological section image data set into a training set and a verification set according to a preset proportion;
converting the image labels of the images in the lung cancer pathological section image data set into mask images as true value images; and adjusting the width and height pixels in the image data of each image to be multiples of 8, and correspondingly adjusting the true value image according to the proportion.
3. The method for lung cancer image segmentation based on mixed attention according to claim 1, wherein the constructing a mixed attention neural network model fused with channel attention, coordinate attention and spatial attention according to the preprocessed lung cancer pathological section image dataset comprises:
constructing an encoder backbone network by taking an initial input feature map layer in a convolutional neural network as a feature extraction layer so as to extract features of image data in a lung cancer pathological section image data set;
constructing a decoder backbone network; the decoder backbone network comprises 5 modules, wherein the first three modules are respectively provided with branch line outputs, and output results are unified in dimensionality through the mixed attention module and are output after being connected according to channels to serve as the input of a fourth module; the output of the fourth module is used as the input of the fifth module, and the final output result is a single channel which is the same as the size of the input original image.
4. The lung cancer image segmentation method based on mixed attention of claim 3, wherein an initial input feature map layer in a convolutional neural network VGG16 is used as a feature extraction layer; the number of convolution kernels of each branch line of each layer in the first module is N, followed by a maximum pooling layer; the number of convolution kernels of each branch line of each layer in the second module is 2N, followed by a maximum pooling layer, and the size of the feature map is 1/2 of that of the original image; the number of convolution kernels of each branch line of each layer in the third module is 4N, followed by a maximum pooling layer, and the size of the feature map is 1/4 of that of the original image; the number of convolution kernels of each branch line of each layer in the fourth module is 8N; after the branch outputs of the first, second and third modules are modified into a uniform dimension, they are concatenated along the channel dimension and output; the number of output channels is 8N, and the size of the feature map is 1/8 of that of the original image; wherein N is an integer greater than 0.
5. The method according to claim 3, wherein the construction of the mixed attention module comprises: processing the initial input feature map separately with a channel attention mechanism, a coordinate attention mechanism and a spatial attention mechanism and superposing the results, so as to weight attention over channels, coordinates and space.
6. The mixed attention based lung cancer image segmentation method according to claim 5, wherein the mixed attention module applies three sets of processing to the initial input feature map:
in the first set of processing, the initial input feature map is processed by a channel attention mechanism: average pooling and maximum pooling are applied separately to the initial input feature map; each pooled feature map is passed through a convolution module, an activation function module and a convolution module in sequence, and the results are added; the summed feature map is processed by an activation function module, and the result is multiplied by the initial input feature map to obtain the first set of processing results;
in the second set of processing, the initial input feature map is processed by a coordinate attention mechanism: average pooling is applied to the initial input feature map along the height direction and the width direction separately, and the height-direction pooled feature map is spliced with the width-direction pooled feature map; the spliced feature map is processed by a convolution module and a normalization module in sequence to obtain a first to-be-multiplied feature map, which is fed to an activation function module to obtain a second to-be-multiplied feature map; the first and second to-be-multiplied feature maps are multiplied; the product is decomposed into a height-direction feature map and a width-direction feature map, each is processed by a convolution module and an activation function module, and the two are added; the summed feature map is multiplied by the initial input feature map to obtain the second set of processing results;
in the third set of processing, the initial input feature map is processed by a spatial attention mechanism: averaging and maximization are applied to the initial input feature map along the channel dimension, each yielding a single-channel spatial map, and the two maps are spliced; the spliced feature map is processed by a convolution module and an activation function module in sequence, and the result is multiplied by the initial input feature map to obtain the third set of processing results;
and the first, second and third sets of processing results are added to the initial input feature map and output.
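Read together, claims 5 and 6 describe a residual sum of three attention branches. The NumPy sketch below mirrors only the tensor shapes and combination logic of that data flow; every learned convolution and normalization module is replaced by an identity or sigmoid stand-in (a deliberate simplification marked in the comments — a real implementation would use trained convolutions):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(x):
    """First set: spatial avg- and max-pooling, then a per-channel sigmoid gate."""
    avg = x.mean(axis=(1, 2))              # (C,)
    mx = x.max(axis=(1, 2))                # (C,)
    gate = sigmoid(avg + mx)               # stand-in for conv-activation-conv on each, then add
    return x * gate[:, None, None]

def coordinate_attention(x):
    """Second set: directional pooling, joint transform, per-direction gates."""
    c, h, w = x.shape
    pooled = np.concatenate([x.mean(axis=2), x.mean(axis=1)], axis=1)  # (C, H+W)
    t1 = pooled                            # stand-in for conv + normalization
    t2 = sigmoid(t1)                       # activation branch
    t = t1 * t2                            # first x second to-be-multiplied maps
    a_h = sigmoid(t[:, :h])                # stand-in for conv + activation, height part
    a_w = sigmoid(t[:, h:])                # width part
    gate = a_h[:, :, None] + a_w[:, None, :]  # add the directional maps
    return x * gate

def spatial_attention(x):
    """Third set: channel-wise mean and max, fused into one spatial sigmoid gate."""
    avg = x.mean(axis=0)                   # (H, W)
    mx = x.max(axis=0)                     # (H, W)
    gate = sigmoid((avg + mx) / 2.0)       # stand-in for conv on the 2-channel splice
    return x * gate[None, :, :]

def mixed_attention(x):
    """Add the three branch outputs to the initial input feature map (residual sum)."""
    return channel_attention(x) + coordinate_attention(x) + spatial_attention(x) + x
```

Because each branch only rescales the input and the module ends in a residual sum, the output keeps the (C, H, W) shape of the initial input feature map, as the decoder in claim 3 requires.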
7. The mixed attention based lung cancer image segmentation method according to claim 3, wherein, in the decoder backbone network, each layer of the first module has 8N convolution kernels per branch line and is followed by an up-sampling layer; each layer of the second module has 4N convolution kernels per branch line and is followed by an up-sampling layer; each layer of the third module has 2N convolution kernels per branch line and is followed by an up-sampling layer; each layer of the fourth module has N convolution kernels per branch line; the layers of the fifth module have (N, 1) convolution kernels, and the final output is a single-channel image of the same size as the original image; wherein N is an integer greater than 0.
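The decoder of claim 7 can be traced the same way as the encoder. In the sketch below, nearest-neighbour up-sampling stands in for the claim's up-sampling layers, and the fifth module's "(N, 1)" kernel counts are read as two successive layers with N and then 1 kernels; both readings are assumptions, not statements of the claim:

```python
import numpy as np

def upsample2x(x):
    """Nearest-neighbour 2x up-sampling of a (C, H, W) feature map."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def decoder_trace(h, w, n):
    """(channels, height, width) after each decoder module, per claim 7.

    (h, w) is the bottleneck map size, i.e. 1/8 of the original image.
    """
    return [
        (8 * n, 2 * h, 2 * w),   # module 1: 8N kernels, then up-sampling
        (4 * n, 4 * h, 4 * w),   # module 2: 4N kernels, then up-sampling
        (2 * n, 8 * h, 8 * w),   # module 3: 2N kernels, then up-sampling (back to full size)
        (n, 8 * h, 8 * w),       # module 4: N kernels, no up-sampling
        (1, 8 * h, 8 * w),       # module 5: (N, 1) kernels -> single-channel output
    ]
```

Starting from a 32x32 bottleneck (a 256x256 original at 1/8 scale) with N = 64, the trace ends at (1, 256, 256): a single-channel map the size of the original image, as the claim states.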
8. The mixed attention based lung cancer image segmentation method according to claim 1, wherein the process of training and validating the mixed attention neural network comprises: setting a loss function and selecting a model optimizer and its parameters; inputting the preprocessed images and ground-truth maps of the training set into the neural network for training; and loading the trained network parameters and evaluating the performance of the trained network on the validation set.
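Claim 8 does not name the loss function or the optimizer. For binary segmentation masks like these, a soft Dice loss is a common choice, so a hedged illustration is given below; the loss and its use here are an assumption for the sketch, not a feature of the claim:

```python
import numpy as np

def dice_loss(pred, target, eps=1e-6):
    """Soft Dice loss between a predicted probability map and a binary ground-truth mask.

    Returns a value near 0 for a perfect match and near 1 for no overlap.
    """
    pred = pred.ravel()
    target = target.ravel()
    intersection = (pred * target).sum()
    return 1.0 - (2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)
```

During training, this scalar would be minimized over the training-set image/mask pairs, and the same metric (or the Dice coefficient, its complement) reported on the validation set.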
9. A mixed attention-based lung cancer image segmentation apparatus, comprising:
a preprocessing module, used for acquiring and preprocessing a lung cancer pathological section image dataset annotated with lung cancer features;
a model construction module, used for constructing, from the preprocessed lung cancer pathological section image dataset, a mixed attention neural network model that fuses channel attention, coordinate attention and spatial attention;
and a training and validation module, used for training and validating the mixed attention neural network, which outputs a corresponding segmentation map and prediction result after a lung cancer image to be predicted is input.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, implements the hybrid attention-based lung cancer image segmentation method according to any one of claims 1 to 8.
11. An electronic terminal, comprising: a processor and a memory;
the memory is used for storing a computer program;
the processor is configured to execute the computer program stored in the memory to cause the terminal to perform the mixed attention based lung cancer image segmentation method according to any one of claims 1 to 8.
CN202211267954.XA 2022-10-17 2022-10-17 Lung cancer image segmentation method, device, terminal and medium based on mixed attention Pending CN115496744A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211267954.XA CN115496744A (en) 2022-10-17 2022-10-17 Lung cancer image segmentation method, device, terminal and medium based on mixed attention

Publications (1)

Publication Number Publication Date
CN115496744A true CN115496744A (en) 2022-12-20

Family

ID=84473978

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211267954.XA Pending CN115496744A (en) 2022-10-17 2022-10-17 Lung cancer image segmentation method, device, terminal and medium based on mixed attention

Country Status (1)

Country Link
CN (1) CN115496744A (en)


Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111915526A (en) * 2020-08-05 2020-11-10 湖北工业大学 Photographing method based on brightness attention mechanism low-illumination image enhancement algorithm
CN112270666A (en) * 2020-11-03 2021-01-26 辽宁工程技术大学 Non-small cell lung cancer pathological section identification method based on deep convolutional neural network
CN113674288A (en) * 2021-07-05 2021-11-19 华南理工大学 Automatic segmentation method for non-small cell lung cancer digital pathological image tissues
CN114049491A (en) * 2021-11-09 2022-02-15 百果园技术(新加坡)有限公司 Fingerprint segmentation model training method, fingerprint segmentation device, fingerprint segmentation equipment and fingerprint segmentation medium
CN114219044A (en) * 2021-12-22 2022-03-22 深圳大学 Image classification method, device, terminal and storage medium
CN114254715A (en) * 2022-03-02 2022-03-29 自然资源部第一海洋研究所 Super-resolution method, system and application of GF-1WFV satellite image
CN114283153A (en) * 2021-11-15 2022-04-05 天津大学 Lung pathological section cancer cell segmentation algorithm based on neural network
CN114648541A (en) * 2022-03-28 2022-06-21 中国人民解放军总医院第七医学中心 Automatic segmentation method for non-small cell lung cancer gross tumor target area

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
HAOJIE GUO ET AL.: "CSCAM: Global Spatial Coordinate Attention Module for Fine-Grained Image Recognition", IMAGE AND GRAPHICS, 30 September 2021 (2021-09-30), pages 431 - 442, XP047611809, DOI: 10.1007/978-3-030-87355-4_36 *
MENG FANHONG ET AL.: "Semantic segmentation algorithm for cancerous tissue images based on improved SegNet", Journal of Changchun University of Science and Technology, 30 April 2022 (2022-04-30) *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116612142A (en) * 2023-07-19 2023-08-18 青岛市中心医院 Intelligent lung cancer CT sample data segmentation method and device
CN116612142B (en) * 2023-07-19 2023-09-22 青岛市中心医院 Intelligent lung cancer CT sample data segmentation method and device

Similar Documents

Publication Publication Date Title
CN111104962B (en) Semantic segmentation method and device for image, electronic equipment and readable storage medium
CN108345890B (en) Image processing method, device and related equipment
CN112699991A (en) Method, electronic device, and computer-readable medium for accelerating information processing for neural network training
US9912349B1 (en) Method and apparatus for processing floating point number matrix, an apparatus and computer-readable storage medium
CN110443222B (en) Method and device for training face key point detection model
US20190347824A1 (en) Method and apparatus for positioning pupil, storage medium, electronic device
US20200151186A1 (en) Cognitive Computer Assisted Attribute Acquisition Through Iterative Disclosure
CN111832666B (en) Medical image data amplification method, device, medium, and electronic apparatus
WO2023280148A1 (en) Blood vessel segmentation method and apparatus, and electronic device and readable medium
WO2022012179A1 (en) Method and apparatus for generating feature extraction network, and device and computer-readable medium
Huang et al. Medical image segmentation using deep learning with feature enhancement
US20210225002A1 (en) Techniques for Interactive Image Segmentation Networks
Shen et al. Hrenet: A hard region enhancement network for polyp segmentation
CN113674288B (en) Automatic segmentation method for digital pathological image tissue of non-small cell lung cancer
CN111932577B (en) Text detection method, electronic device and computer readable medium
CN112668588A (en) Parking space information generation method, device, equipment and computer readable medium
CN112132834A (en) Ventricular image segmentation method, system, device and storage medium
CN115496744A (en) Lung cancer image segmentation method, device, terminal and medium based on mixed attention
WO2021139351A1 (en) Image segmentation method, apparatus, medium, and electronic device
CN112465737B (en) Image processing model training method, image processing method and image processing device
CN117315758A (en) Facial expression detection method and device, electronic equipment and storage medium
CN113139617B (en) Power transmission line autonomous positioning method and device and terminal equipment
CN113610856B (en) Method and device for training image segmentation model and image segmentation
CN115239655A (en) Thyroid ultrasonic image tumor segmentation and classification method and device
CN115731240A (en) Segmentation method, segmentation device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination