CN114511581A - Multi-task multi-resolution collaborative esophageal cancer lesion segmentation method and device - Google Patents
- Publication number
- CN114511581A (application number CN202210415926.1A)
- Authority
- CN
- China
- Prior art keywords
- lesion
- esophageal cancer
- segmentation
- result
- cancer lesion
- Prior art date
- Legal status: Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/11—Region-based segmentation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/253—Fusion techniques of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T3/00—Geometric image transformations in the plane of the image
- G06T3/40—Scaling of whole images or parts thereof, e.g. expanding or contracting
- G06T3/4007—Scaling of whole images or parts thereof, e.g. expanding or contracting based on interpolation, e.g. bilinear interpolation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10068—Endoscopic image
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30004—Biomedical image processing
- G06T2207/30096—Tumor; Lesion
Abstract
The invention discloses a multi-task multi-resolution collaborative esophageal cancer lesion segmentation method and device, mainly addressing the problem that existing esophageal cancer lesion identification methods cannot delineate the boundary between lesion and normal areas with high precision, which affects assessment of the disease and determination of the subsequent treatment scheme. First, an esophageal cancer lesion image dataset is acquired and preprocessed; a multi-task multi-resolution collaborative segmentation model is then constructed and trained; next, lesion image data to be predicted are fed into the trained model, which outputs a lesion segmentation result and a lesion contour result; finally, the contour result is processed with a conditional random field and merged with the segmentation result to form the final segmentation. By fully exploiting lesion contour information and extracting and fusing rich features, the invention effectively separates lesion features and achieves high-precision segmentation of the lesion area.
Description
Technical Field
The invention relates to the technical field of esophageal cancer lesion recognition, and in particular to a multi-task multi-resolution collaborative esophageal cancer lesion segmentation method and device.
Background
Esophageal cancer is one of the main malignant tumors threatening human health: in China it ranks fourth in mortality and sixth in incidence, and Chinese cases account for about half of those worldwide. The five-year survival rate of patients with early esophageal cancer after surgical treatment can exceed 90 percent, whereas for advanced esophageal cancer it is below 20 percent. Early diagnosis and early treatment are therefore key to improving patient prognosis. Endoscopy combined with biopsy histopathology is an important means of diagnosing early esophageal cancer, and effective biopsy of lesions is essential to ensure that lesions are not missed. Endoscopic submucosal resection is the preferred treatment for early esophageal cancer, and accurate preoperative assessment of the lesion extent is crucial to endoscopic treatment. On the basis of diagnosing early esophageal cancer, accurate assessment of lesion boundaries is therefore critical both for selecting biopsy sites and for choosing the patient's subsequent treatment regimen.
At present, esophageal cancer lesion areas are identified either manually or with artificial intelligence (AI). Manual identification is influenced by many factors, such as the physician's operating skill and ability to diagnose and recognize lesions. Moreover, the features of early esophageal cancer lesions, such as color difference, morphology, the background esophageal mucosa and esophagitis, are difficult to distinguish, and interference from factors such as digestive-tract mucus makes judging the boundary of early esophageal cancer lesions a serious challenge.
AI-based identification works through trained models. Existing work includes BP neural networks trained to segment and recognize blood vessels, and recognition after U-Net training. Nasr-Esfahani et al. divide contrast images into vessel and background regions, feed them into a CNN that combines global and local information, and incorporate a Canny edge detector during training to obtain good results. Jun et al. introduced the T-Net to overcome U-Net's limitation of having only one set of cascade layers between encoding and decoding blocks, adding pooling and up-sampling layers during encoding to make the predicted mask more accurate.
The above methods all achieve certain results, but none comprehensively considers the lesion's contour information together with the feature information of different coding layers, both of which aid lesion segmentation. As a result, existing esophageal cancer lesion identification methods cannot accurately delineate the boundary between lesion and normal areas, which affects assessment of the disease and determination of the subsequent treatment scheme.
Disclosure of Invention
The invention aims to provide a multi-task multi-resolution collaborative esophageal cancer lesion segmentation method and device, to solve the problem that existing esophageal cancer lesion identification methods cannot delineate the boundary between lesion and normal regions with high precision, which affects assessment of the disease and determination of the subsequent treatment scheme.
In order to solve the above problems, the present invention provides the following technical solutions:
a multitask and multiresolution collaborative esophageal cancer lesion segmentation method comprises the following steps:
s1, acquiring an esophageal cancer lesion image dataset and preprocessing it; the esophageal cancer lesion image data are any one or more of esophageal cancer endoscope images and esophageal cancer CT images;
s2, constructing a multi-task multi-resolution collaborative esophageal cancer lesion segmentation model, wherein the segmentation model comprises four coding blocks and four decoding blocks; corresponding coding and decoding blocks are joined by skip connections; an attention module is fused into the segmentation model; each coding block comprises a convolution layer, a batch normalization layer and a MaxPooling layer; each decoding block comprises a convolution layer, a batch normalization layer and an up-sampling layer.
S3, training the segmentation model in the step S2 by adopting the esophageal cancer lesion image data obtained in the step S1;
s4, inputting the esophageal cancer lesion image data to be predicted into the segmentation model trained in the step S3, and outputting a lesion segmentation result and a lesion contour result;
and S5, combining the lesion contour result processed by the conditional random field and the lesion segmentation result in the step S4 to form a final segmentation result.
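Taken together, steps S1 to S5 form a predict-and-refine pipeline. The sketch below is schematic only: `model` and `crf` are stand-in callables, and the 0.5 threshold and union-style merge are illustrative assumptions rather than the patent's actual implementation.

```python
import numpy as np

def preprocess(image):
    """S1: min-max normalization (sub-block splitting omitted for brevity)."""
    return (image - image.min()) / (image.max() - image.min() + 1e-8)

def segment_pipeline(image, model, crf, thr=0.5):
    """Schematic flow of steps S1-S5: `model` returns (segmentation, contour)
    probability maps, `crf` refines the contour map, and the two branches
    are merged by thresholded union (an illustrative reading)."""
    x = preprocess(image)                         # S1: preprocessing
    seg, contour = model(x)                       # S2-S4: trained model output
    refined = crf(contour)                        # S5: CRF on the contour branch
    return ((seg > thr) | (refined > thr)).astype(np.uint8)  # S5: merge

# Dummy model and CRF so the sketch runs end to end.
dummy_model = lambda x: (x, x)
identity_crf = lambda c: c
img = np.array([[0.2, 0.9], [0.8, 0.1]])
print(segment_pipeline(img, dummy_model, identity_crf))
```

Swapping `dummy_model` for a trained network and `identity_crf` for a dense CRF yields the full method; the surrounding steps stay unchanged.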
The method first processes the esophageal cancer lesion image data, then builds and trains the multi-task multi-resolution collaborative segmentation model. Image data to be predicted are input to obtain a lesion segmentation result and a lesion contour result, and the contour result, after conditional random field processing, is merged with the segmentation result to form the final segmentation. By fully exploiting lesion contour information and extracting and fusing rich features, the invention effectively separates lesion features, finds the boundary between lesion and normal areas, and achieves high-precision segmentation of the lesion area.
Further, in the preprocessing, the esophageal cancer lesion area is annotated on each image to form a label, after which the esophageal cancer lesion image data are read and normalized.
Further, the preprocessing also includes augmenting the normalized data by rotation, cropping and noise addition, then randomly dividing the esophageal cancer lesion image data into several sub-blocks, each carrying a label of the corresponding size.
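The preprocessing just described (normalize, augment by rotation, cropping and noise, split into labeled sub-blocks) can be sketched as follows. The 64-pixel patch size and the noise level are illustrative assumptions; the patent does not specify them.

```python
import numpy as np

def normalize(img):
    """Min-max normalize an image to [0, 1]."""
    lo, hi = img.min(), img.max()
    return (img - lo) / (hi - lo + 1e-8)

def augment(img, mask, rng):
    """Randomly rotate by 90-degree steps, flip, and add Gaussian noise;
    the same geometric transform is applied to image and label mask."""
    k = rng.integers(0, 4)
    img, mask = np.rot90(img, k), np.rot90(mask, k)
    if rng.random() < 0.5:
        img, mask = np.fliplr(img), np.fliplr(mask)
    img = img + rng.normal(0, 0.01, img.shape)   # illustrative noise level
    return np.clip(img, 0, 1), mask

def split_into_patches(img, mask, patch=64):
    """Split an image and its label into non-overlapping patches so each
    sub-block carries a label of the corresponding size."""
    h, w = img.shape[:2]
    patches = []
    for y in range(0, h - patch + 1, patch):
        for x in range(0, w - patch + 1, patch):
            patches.append((img[y:y + patch, x:x + patch],
                            mask[y:y + patch, x:x + patch]))
    return patches

rng = np.random.default_rng(0)
img = rng.random((128, 128))
mask = (img > 0.5).astype(np.uint8)
pairs = split_into_patches(*augment(normalize(img), mask, rng))
print(len(pairs))  # a 128x128 image yields 4 patches of 64x64
```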
Further, the training process in step S3 is as follows:
s301, feeding the sub-blocks into the coding blocks for down-sampling, and fusing the features of the bottommost coding block with the sampled features of the preceding three coding blocks to obtain fused features;
s302, sending the fused features to an attention module, which outputs the adjusted features; the attention module comprises a channel attention module and a spatial attention module;
s303, feeding the adjusted features into the decoding blocks, where each feature point is interpolated by bilinear interpolation, followed by convolution, normalization and ReLU operations to complete up-sampling, before entering the output layer;
s304, the output layer divides the characteristics into two branches, wherein one branch is a lesion segmentation result, and the other branch is a lesion contour result;
s305, training the segmentation model on the two branches of step S304 with a joint loss function to obtain the trained segmentation model.
Further, the coding blocks use MaxPooling for down-sampling, and the decoding blocks use bilinear interpolation for up-sampling.
Further, the Dice-coefficient term of the joint loss function is defined as

$$L_{Dice} = 1 - \frac{2\,|X \cap Y|}{|X| + |Y|}$$

where $X$ denotes the predicted result and $Y$ the manually annotated label map. The cross-entropy term of the joint loss function is defined as

$$L_{CE} = -\frac{1}{N}\sum_{i}\sum_{k=1}^{K} y_{ik}\,\log p_{ik}$$

where $K$ is the number of categories, $y_{ik}$ is the sign function (0 or 1) that takes 1 if sample $i$ belongs to true category $k$ and 0 otherwise, and $p_{ik}$ is the probability that pixel $i$ belongs to category $k$.
Further, the specific process of step S5 is: define $P_{seg}(x)$ as the probability that pixel $x$ is a lesion and $P_{con}(x)$ as the probability that pixel $x$ belongs to the lesion contour; the lesion contour result is processed with a conditional random field and then merged with the lesion segmentation result, i.e.

$$\hat{y}(x) = \operatorname{Merge}\big(P_{seg}(x),\ \mathrm{CRF}(P_{con}(x))\big)$$

where $\mathrm{CRF}$ denotes the conditional random field.
Further, the multi-task multi-resolution collaborative esophageal cancer lesion segmentation method also includes evaluating the segmentation model trained in step S3: the joint loss function is used as the metric, the overlap between the output final segmentation and the annotated cancerous region is compared, and the optimal segmentation model is retained according to a preset evaluation index.
A multi-task multi-resolution collaborative esophageal cancer lesion segmentation device comprises a memory for storing executable instructions, and a processor that executes the instructions stored in the memory to implement the multi-task multi-resolution collaborative esophageal cancer lesion segmentation method.
Compared with the prior art, the invention has the following beneficial effects:
(1) The method first processes the esophageal cancer lesion image data, then builds and trains the multi-task multi-resolution collaborative segmentation model; image data to be predicted are input to obtain a lesion segmentation result and a lesion contour result, and the contour result, after conditional random field processing, is merged with the segmentation result to form the final segmentation. By fully exploiting lesion contour information and extracting and fusing rich features, the invention effectively separates lesion features, finds the boundary between lesion and normal areas, and achieves high-precision segmentation of the lesion area.
(2) Features of different resolutions from the coding blocks are fused and sent to a convolutional attention mechanism to strengthen the information useful to the segmentation labels. The convolutional attention module is built from channel attention and spatial attention sub-modules over the multi-resolution features, and the lesion contour guides the network's attention to the lesion boundary, yielding a more accurate segmentation result.
(3) Multi-scale information fusion is constructed from the down-sampled information of every step in the segmentation module, fusing the information of the different blocks more effectively, thereby enlarging the receptive field, obtaining more accurate features and further improving segmentation accuracy.
(4) The method can segment esophageal cancer lesions rapidly and in batches without supervision, saving the labor and materials of manual lesion annotation and providing a stronger basis for computer-aided diagnosis.
Drawings
To illustrate the embodiments of the present invention more clearly, the drawings used in their description are briefly introduced below. The drawings show some embodiments of the invention; those skilled in the art can derive other drawings from them without inventive effort.
FIG. 1 is a schematic flow chart of training a segmentation model according to the present invention.
FIG. 2 is a flow chart of testing the segmentation model of the present invention.
FIG. 3 is a schematic view of the segmentation process of esophageal cancer lesion image data by the segmentation model of the present invention.
Detailed Description
To make the objects, technical solutions and advantages of the present invention more apparent, the invention is further described in detail with reference to fig. 1 to 3. The described embodiments should not be construed as limiting the invention, and all other embodiments obtained by a person of ordinary skill in the art without creative effort fall within its protection scope.
Example 1
As shown in fig. 1 to fig. 3, a method for segmenting esophageal cancer lesion by multi-task and multi-resolution cooperation comprises the following steps:
s1, acquiring an esophageal cancer lesion image dataset and preprocessing it: the lesion area is annotated on each image to form a label, the image data are read and normalized, augmented by rotation, cropping and noise addition, and then randomly divided into several sub-blocks, each carrying a label of the corresponding size.
S2, constructing the multi-task multi-resolution collaborative esophageal cancer lesion segmentation model, which comprises four coding blocks and four decoding blocks; corresponding coding and decoding blocks are joined by skip connections; an attention module is fused into the model; the coding blocks use MaxPooling for down-sampling and the decoding blocks use bilinear interpolation for up-sampling;
Each coding block comprises a convolution layer, a batch normalization layer, a ReLU layer and a MaxPooling layer. The features of the four coding blocks at different resolutions are fused and sent to the convolutional attention module, followed by four decoding blocks, each comprising a convolution layer, a batch normalization layer and a ReLU layer. The final output layer has two branches: one produces the lesion segmentation result and the other computes the lesion contour result. Corresponding layers of the U shape are joined by skip connections, forming the multi-task multi-resolution collaborative segmentation model;
s3, training the segmentation model in the step S2 by adopting the esophageal cancer lesion image data obtained in the step S1;
s4, inputting the esophageal cancer lesion image data to be predicted into the segmentation model trained in the step S3, and outputting a lesion segmentation result and a lesion contour result;
and S5, combining the lesion contour result processed by the conditional random field and the lesion segmentation result in the step S4 to form a final segmentation result.
Example 2
As shown in fig. 1 and fig. 2, the specific process of training the segmentation model by the training set is as follows:
step 1: and sending the sub-block data in the training set into the coding block.
Step 2: and carrying out convolution, batch normalization and maximum pooling on the subblock data in the coding block so as to obtain the maximum receptive field.
And step 3: and at the bottommost coding block, fusing the characteristics of the former coding blocks and the bottom down-sampling characteristic, wherein the characteristics of different characteristic scale sizes are processed into the same size by utilizing convolution, and then the same size is sent to an attention module.
Step 4: the attention module computes attention over an input feature map. In the channel attention sub-module, global max pooling and global average pooling are computed for each channel of the input feature map; both results are sent through a shared three-layer perceptron, the outputs are added element-wise, and a Sigmoid activation yields the channel attention map, which is multiplied onto the features channel by channel. The spatial attention sub-module then takes this channel-refined feature map, computes global max pooling and global average pooling along the channel dimension, concatenates the two maps, applies a conventional convolution, and uses a Sigmoid activation to obtain the spatial attention map, which is multiplied onto the features along the width and height dimensions. Finally the adjusted feature map is output.
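A minimal NumPy sketch of the channel-then-spatial attention described in step 4, in the style of a convolutional block attention module. The perceptron weights, reduction ratio and convolution kernel below are random placeholders, assumed purely for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(feat, W1, W2):
    """Channel attention: per-channel global max- and average-pooling,
    a shared two-layer perceptron, element-wise sum, then Sigmoid."""
    c = feat.shape[0]
    mx = feat.reshape(c, -1).max(axis=1)
    avg = feat.reshape(c, -1).mean(axis=1)
    mlp = lambda v: W2 @ np.maximum(W1 @ v, 0)   # shared MLP with ReLU
    weights = sigmoid(mlp(mx) + mlp(avg))        # one weight per channel
    return feat * weights[:, None, None]

def spatial_attention(feat, kernel):
    """Spatial attention: max and mean over the channel axis, concatenate,
    convolve (a single 3x3 kernel via direct correlation), then Sigmoid."""
    mx, avg = feat.max(axis=0), feat.mean(axis=0)
    stacked = np.stack([mx, avg])                # (2, H, W)
    pad = np.pad(stacked, ((0, 0), (1, 1), (1, 1)))
    h, w = mx.shape
    out = np.zeros((h, w))
    for y in range(h):
        for x in range(w):
            out[y, x] = (pad[:, y:y + 3, x:x + 3] * kernel).sum()
    return feat * sigmoid(out)[None]

rng = np.random.default_rng(0)
feat = rng.random((4, 8, 8))                     # (channels, H, W)
W1, W2 = rng.random((2, 4)), rng.random((4, 2))  # reduction ratio 2 (assumed)
out = spatial_attention(channel_attention(feat, W1, W2), rng.random((2, 3, 3)))
print(out.shape)  # same shape as the input features
```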
Step 5: the features from the coding blocks and the convolutional attention module are sent to the decoding blocks, where each point is interpolated by bilinear interpolation, followed by convolution, normalization and ReLU operations to complete up-sampling.
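The bilinear up-sampling used in the decoding blocks can be written out explicitly as follows (a single-channel map, an integer scale factor and half-pixel sample centers are assumed for the sketch).

```python
import numpy as np

def bilinear_upsample(feat, scale=2):
    """Upsample a 2-D feature map by bilinear interpolation, the operation
    the decoding blocks use in place of transposed convolution."""
    h, w = feat.shape
    H, W = h * scale, w * scale
    # Half-pixel-centered source coordinates for each output position.
    ys = (np.arange(H) + 0.5) / scale - 0.5
    xs = (np.arange(W) + 0.5) / scale - 0.5
    y0 = np.clip(np.floor(ys).astype(int), 0, h - 1)
    x0 = np.clip(np.floor(xs).astype(int), 0, w - 1)
    y1 = np.clip(y0 + 1, 0, h - 1)
    x1 = np.clip(x0 + 1, 0, w - 1)
    wy = np.clip(ys - y0, 0, 1)[:, None]
    wx = np.clip(xs - x0, 0, 1)[None, :]
    top = feat[np.ix_(y0, x0)] * (1 - wx) + feat[np.ix_(y0, x1)] * wx
    bot = feat[np.ix_(y1, x0)] * (1 - wx) + feat[np.ix_(y1, x1)] * wx
    return top * (1 - wy) + bot * wy

f = np.array([[0.0, 1.0], [2.0, 3.0]])
up = bilinear_upsample(f)
print(up.shape)  # (4, 4)
```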
Step 6: the four coding blocks and the decoding blocks form a U-shaped shape, and the down-sampling edges of the U-shaped shape are symmetrically spliced with the up-sampling edges through a cross-layer connecting structure.
Step 7: after each sub-block passes through the four coding blocks and four decoding blocks, the final output layer has two branches: one produces the lesion segmentation result and the other computes the lesion contour result. The two branches train the segmentation model with a joint loss function, in which the Dice-coefficient term is defined as

$$L_{Dice} = 1 - \frac{2\,|X \cap Y|}{|X| + |Y|}$$

where $X$ denotes the predicted result and $Y$ the manually annotated label map, and the cross-entropy term is defined as

$$L_{CE} = -\frac{1}{N}\sum_{i}\sum_{k=1}^{K} y_{ik}\,\log p_{ik}$$

where $K$ is the number of categories, $y_{ik}$ is the sign function (0 or 1), taking 1 if sample $i$ belongs to true category $k$ and 0 otherwise, and $p_{ik}$ is the probability that pixel $i$ belongs to category $k$.
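A runnable sketch of the joint loss from step 7 for the binary lesion/background case. The equal weighting `w=0.5` between the Dice and cross-entropy terms is an assumption; the patent does not state the weights.

```python
import numpy as np

def dice_loss(pred, target, eps=1e-6):
    """Soft Dice loss: 1 - 2|X∩Y| / (|X|+|Y|) over probability maps."""
    inter = (pred * target).sum()
    return 1.0 - (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)

def cross_entropy(prob, onehot, eps=1e-12):
    """Pixel-wise cross entropy, -sum_k y_ik log p_ik averaged over pixels;
    `onehot` is the 0/1 indicator of the true class per pixel."""
    return -(onehot * np.log(prob + eps)).sum(axis=-1).mean()

def joint_loss(pred, target, w=0.5):
    """Combine the two terms (the weight w is an illustrative assumption)."""
    onehot = np.stack([1 - target, target], axis=-1)  # background, lesion
    prob = np.stack([1 - pred, pred], axis=-1)
    return w * dice_loss(pred, target) + (1 - w) * cross_entropy(prob, onehot)

pred = np.array([[0.9, 0.1], [0.8, 0.2]])     # predicted lesion probabilities
target = np.array([[1.0, 0.0], [1.0, 0.0]])   # binary label map
print(round(joint_loss(pred, target), 4))
```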
Step 8: at prediction time, the trained model computes the two branch outputs separately: $P_{seg}(x)$, the probability that pixel $x$ is a lesion, and $P_{con}(x)$, the probability that pixel $x$ belongs to the lesion contour. The lesion contour result is post-processed with a conditional random field, and the final lesion segmentation is obtained by merging the refined contour with the segmentation result:

$$\hat{y}(x) = \operatorname{Merge}\big(P_{seg}(x),\ \mathrm{CRF}(P_{con}(x))\big)$$

where $\mathrm{CRF}$ denotes the conditional random field.
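Step 8 can be illustrated as below. A real dense CRF with pairwise potentials is beyond a short sketch, so `smooth` is a deliberately simplified stand-in for the CRF refinement, and the union-style merge is likewise only one plausible reading of the merging formula.

```python
import numpy as np

def smooth(p, iters=2):
    """Neighbourhood averaging, used here as a lightweight stand-in for the
    dense-CRF refinement of the contour probabilities (illustration only)."""
    for _ in range(iters):
        p = (p
             + np.roll(p, 1, 0) + np.roll(p, -1, 0)
             + np.roll(p, 1, 1) + np.roll(p, -1, 1)) / 5.0
    return p

def merge(seg_prob, contour_prob, thr=0.5):
    """Combine the refined contour map with the lesion segmentation:
    a pixel is lesion if either branch, after refinement, says so."""
    refined = smooth(contour_prob) > thr
    return ((seg_prob > thr) | refined).astype(np.uint8)

seg = np.zeros((4, 4)); seg[1:3, 1:3] = 0.9   # segmentation branch
con = np.zeros((4, 4)); con[1, 1] = 1.0       # contour branch
print(merge(seg, con))
```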
Example 3
As shown in fig. 1 to 3, the segmentation model is trained as follows: the augmented esophageal cancer lesion images are divided into sub-blocks by the blocking operation and sent into the segmentation model in batches. The learning rate is set to 0.001 and decayed tenfold after 20 learning iterations; convolution weights are initialized from a Gaussian distribution; the training batch size is 16 and the number of learning iterations is 200. Training uses the BP (back-propagation) algorithm to compute gradients and update the weights, with one parameter update per batch. After each iteration the model's segmentation evaluation result is checked: if the current error is smaller than that of the previous iteration, the current segmentation model is saved, and training continues until the maximum number of iterations is reached.
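The training schedule in this example (base learning rate 0.001, tenfold decay after 20 learning iterations, 200 iterations total) can be expressed as a step-decay function. Whether the decay repeats every 20 iterations or occurs only once is not explicit in the text; this sketch assumes a repeating step decay.

```python
def learning_rate(iteration, base=0.001, step=20, factor=10.0):
    """Step decay: divide the base rate by `factor` every `step` iterations."""
    return base / (factor ** (iteration // step))

# Sample the schedule at a few iterations over the 200-iteration run.
schedule = [learning_rate(i) for i in (0, 19, 20, 40, 199)]
print(schedule)
```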
Example 4
The multi-task multi-resolution collaborative esophageal cancer lesion segmentation method further includes evaluating the segmentation model trained in step S3: the joint loss function is used as the metric, the overlap between the output final segmentation and the annotated cancerous region is compared, and the optimal segmentation model is retained according to a preset evaluation index.
Example 5
A multi-task multi-resolution collaborative esophageal cancer lesion segmentation device comprises a memory for storing executable instructions, and a processor that executes the instructions stored in the memory to implement the multi-task multi-resolution collaborative esophageal cancer lesion segmentation method.
The invention obtains a lesion segmentation result and a lesion contour result, applies conditional random field processing to the contour branch, and merges it with the segmentation result to obtain the final segmentation. The method thus fully captures multi-scale information, strengthens segmentation of the esophageal cancer lesion boundary, and post-processes the lesion segmentation with a conditional random field, improving the effectiveness of esophageal cancer lesion segmentation and providing a relatively accurate basis of judgment for physicians.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method can be implemented in other ways. The apparatus embodiments described above are merely illustrative, and for example, the flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
In addition, the functional modules in the embodiments of the present invention may be integrated together to form an independent part, or each module may exist separately, or two or more modules may be integrated to form an independent part.
The functions, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention. It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.
The above description is only for the specific embodiments of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present invention, and all the changes or substitutions should be covered within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.
Claims (9)
1. A multi-task and multi-resolution collaborative esophageal cancer lesion segmentation method, characterized by comprising the following steps:
s1, acquiring an esophageal cancer lesion image dataset and preprocessing the esophageal cancer lesion image dataset;
s2, constructing a multitask and multiresolution collaborative esophageal cancer lesion segmentation model, wherein the segmentation model comprises four decoding blocks and four coding blocks; the parts corresponding to the decoding block and the coding block adopt jump connection; the segmentation model is fused with an attention module;
s3, training the segmentation model in the step S2 by adopting the esophageal cancer lesion image data obtained in the step S1;
s4, inputting the esophageal cancer lesion image data to be predicted into the segmentation model trained in the step S3, and outputting a lesion segmentation result and a lesion contour result;
s5, combining the lesion contour result of the step S4 with the lesion segmentation result after the conditional random field processing to form a final segmentation result.
2. The method for segmenting esophageal cancer lesions with multitask and multiresolution cooperation as claimed in claim 1, wherein the specific process of preprocessing in step S1 is as follows: marking the esophageal cancer lesion area on the esophageal cancer lesion image to form a label, and then reading esophageal cancer lesion image data and carrying out normalization processing.
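A minimal sketch of this preprocessing step, assuming z-score normalization (the claim does not name a specific normalization scheme) and a binary lesion mask:

```python
import numpy as np

def normalize_case(image, mask):
    # Claim 2: mark the lesion area as a binary label, then normalize the image.
    # Z-score normalization is an assumption; the claim only says "normalization".
    label = (mask > 0).astype(np.uint8)            # lesion pixels -> 1, background -> 0
    img = image.astype(np.float32)
    img = (img - img.mean()) / (img.std() + 1e-8)  # zero mean, unit variance
    return img, label
```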
3. The method according to claim 2, wherein the preprocessing further comprises: augmenting the normalized data by rotation, cropping and noise addition, and then randomly dividing the esophageal cancer lesion image data into a plurality of sub-blocks, each sub-block being marked with a label of the corresponding size.
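The augmentation and sub-block split might be sketched as follows; the 90-degree rotations, noise level, and grid (rather than random) split positions are illustrative assumptions:

```python
import numpy as np

def augment(image, label, rng):
    # Claim 3: augment by rotation, cropping (omitted here) and additive noise;
    # 90-degree rotations and the 0.01 noise sigma are illustrative assumptions.
    k = int(rng.integers(0, 4))
    img = np.rot90(image, k).copy()
    lab = np.rot90(label, k).copy()                 # label rotates with the image
    img = img + rng.normal(0.0, 0.01, img.shape)    # noise on the image only
    return img, lab

def to_subblocks(image, label, size):
    # Split a case into equally sized sub-blocks, each paired with the label
    # crop of the same size (grid split; the claim says the split is random).
    blocks = []
    h, w = image.shape
    for y in range(0, h - size + 1, size):
        for x in range(0, w - size + 1, size):
            blocks.append((image[y:y + size, x:x + size],
                           label[y:y + size, x:x + size]))
    return blocks
```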
4. The method for segmenting the esophageal cancer lesion based on the multi-task and multi-resolution cooperation of claim 3, wherein the training process in the step S3 is as follows:
s301, enabling the sub-blocks to enter the sampling features of the coding blocks, and fusing the sampling features of the bottommost coding block with the sampling features of the three coding blocks to obtain a fusion feature;
s302, sending the fusion features to an attention module to output the adjusted features; the attention module includes a channel attention module and a spatial attention module;
s303, sending the adjusted features into a decoding block, interpolating each feature point by using a bidirectional bilinear interpolation method in the decoding block, performing convolution, standardization and ReLU operation to finish up-sampling, and then entering an output layer;
s304, the output layer divides the characteristics into two branches, wherein one branch is a lesion segmentation result, and the other branch is a lesion contour result;
S305, training the segmentation model on both branches of step S304 with a joint loss function, to obtain the trained segmentation model.
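Steps S302-S303 can be illustrated with a small NumPy sketch; the sigmoid-gated channel/spatial attention is a minimal CBAM-style assumption (the claims do not give the exact form), and the bilinear routine follows the common align_corners=False convention, with the subsequent conv/norm/ReLU omitted:

```python
import numpy as np

def _sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(feat):
    # S302: gate each channel by a sigmoid of its global average
    # (minimal CBAM-style sketch; an assumption, not the claimed exact form).
    return feat * _sigmoid(feat.mean(axis=(1, 2), keepdims=True))

def spatial_attention(feat):
    # S302: spatial attention computed from the channel-mean map.
    return feat * _sigmoid(feat.mean(axis=0, keepdims=True))

def bilinear_upsample2x(feat):
    # S303: bilinear 2x upsampling of a (C, H, W) feature map.
    c, h, w = feat.shape
    ys = (np.arange(2 * h) + 0.5) / 2.0 - 0.5      # source coordinates
    xs = (np.arange(2 * w) + 0.5) / 2.0 - 0.5
    y0 = np.clip(np.floor(ys).astype(int), 0, h - 1)
    x0 = np.clip(np.floor(xs).astype(int), 0, w - 1)
    y1 = np.clip(y0 + 1, 0, h - 1)
    x1 = np.clip(x0 + 1, 0, w - 1)
    wy = np.clip(ys - y0, 0.0, 1.0)[None, :, None]
    wx = np.clip(xs - x0, 0.0, 1.0)[None, None, :]
    top = feat[:, y0][:, :, x0] * (1 - wx) + feat[:, y0][:, :, x1] * wx
    bot = feat[:, y1][:, :, x0] * (1 - wx) + feat[:, y1][:, :, x1] * wx
    return top * (1 - wy) + bot * wy
```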
5. The method as claimed in claim 4, wherein the coding blocks perform down-sampling by max pooling, and the decoding blocks perform up-sampling by bilinear interpolation.
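A 2x2, stride-2 max-pooling sketch (the kernel and stride are assumptions; the claim names only max pooling):

```python
import numpy as np

def maxpool2x(feat):
    # Claim 5: max-pooling downsampling in the coding (encoder) blocks.
    # Crops odd trailing rows/columns, then takes the max over 2x2 windows.
    c, h, w = feat.shape
    feat = feat[:, :h - h % 2, :w - w % 2]
    return feat.reshape(c, h // 2, 2, w // 2, 2).max(axis=(2, 4))
```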
6. The method for segmenting the esophageal cancer lesion based on the multi-task and multi-resolution cooperation of claim 4, wherein the Dice-coefficient term of the joint loss function is defined as: $L_{Dice} = 1 - \frac{2\sum_i p_i g_i}{\sum_i p_i + \sum_i g_i}$, wherein $p_i$ represents the prediction result and $g_i$ represents the manually annotated label map; the cross-entropy loss function in the joint loss function is defined as: $L_{CE} = -\frac{1}{N}\sum_{i=1}^{N}\sum_{k=1}^{K} y_{ik}\log p_{ik}$, wherein K represents the number of categories, $y_{ik}$ is a sign function that takes 1 when sample i belongs to the true class k and 0 otherwise, and $p_{ik}$ is the probability that pixel point i belongs to category k.
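The two loss terms can be sketched in NumPy as follows; the equal weighting in `joint_loss` is an assumption, since the claim defines the terms but not how they are combined:

```python
import numpy as np

def dice_loss(pred, target, eps=1e-6):
    # Soft Dice: 1 - 2*sum(p*g) / (sum(p) + sum(g)); pred in [0,1], target in {0,1}.
    inter = (pred * target).sum()
    return 1.0 - (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)

def cross_entropy(prob, onehot, eps=1e-12):
    # Pixel-wise CE over K classes: -(1/N) * sum_i sum_k y_ik * log(p_ik).
    return float(-np.mean((onehot * np.log(prob + eps)).sum(axis=-1)))

def joint_loss(pred, target, prob, onehot, alpha=0.5):
    # Equal weighting of the two terms is an assumption, not from the claim.
    return alpha * dice_loss(pred, target) + (1.0 - alpha) * cross_entropy(prob, onehot)
```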
7. The method for segmenting the esophageal cancer lesion based on the multi-task and multi-resolution cooperation of claim 6, wherein the specific process of the step S5 is as follows: define $P_{les}(i)$ as the probability that pixel $i$ is a lesion, and $P_{con}(i)$ as the lesion contour probability of pixel $i$; the lesion contour result is processed by using a conditional random field and then merged with the lesion segmentation result, the calculation formula being: $P(i) = P_{les}(i) \cup \mathrm{CRF}(P_{con}(i))$, wherein CRF is the conditional random field.
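A runnable sketch of this merge step; the real CRF stage would use a dense-CRF library (e.g. pydensecrf), so `crf_fill` here is a hypothetical stand-in, and the pixel-wise union is one plausible reading of the merge:

```python
import numpy as np

def crf_fill(contour_prob):
    # Stand-in for CRF(P_con): a real system would run a dense CRF to fill
    # the contour into a region; here we simply binarize so the merge runs.
    return (contour_prob > 0.5).astype(np.float32)

def merge_results(p_les, p_con):
    # Final map as the pixel-wise union (maximum) of the binarized lesion
    # probability map and the CRF-filled contour map.
    return np.maximum((p_les > 0.5).astype(np.float32), crf_fill(p_con))
```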
8. The method according to claim 7, further comprising evaluating the segmentation model trained in step S3, wherein the evaluation process is as follows: the model is measured with the joint loss function, the output final segmentation result is compared with the annotated cancerous region for their overlapping part, and the optimal segmentation model is retained according to a preset evaluation index.
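The overlap evaluation might look like this, assuming the Dice coefficient as the preset evaluation index (the claim does not name the index):

```python
import numpy as np

def dice_coefficient(pred, truth, eps=1e-8):
    # Claim 8: overlap between the final segmentation and the annotated
    # cancerous region; Dice is assumed as the evaluation index.
    pred = pred.astype(bool)
    truth = truth.astype(bool)
    inter = np.logical_and(pred, truth).sum()
    return (2.0 * inter + eps) / (pred.sum() + truth.sum() + eps)

def keep_best(checkpoints):
    # Retain the checkpoint with the highest index;
    # `checkpoints` is a list of (model, score) pairs.
    return max(checkpoints, key=lambda cs: cs[1])[0]
```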
9. A multi-task and multi-resolution collaborative esophageal cancer lesion area segmentation device, characterized by comprising:
a memory for storing executable instructions; and
a processor for executing the executable instructions stored in the memory to implement the multi-task and multi-resolution collaborative esophageal cancer lesion segmentation method of any one of claims 1-8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210415926.1A CN114511581B (en) | 2022-04-20 | 2022-04-20 | Multi-task multi-resolution collaborative esophageal cancer lesion segmentation method and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210415926.1A CN114511581B (en) | 2022-04-20 | 2022-04-20 | Multi-task multi-resolution collaborative esophageal cancer lesion segmentation method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114511581A true CN114511581A (en) | 2022-05-17 |
CN114511581B CN114511581B (en) | 2022-07-08 |
Family
ID=81555330
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210415926.1A Active CN114511581B (en) | 2022-04-20 | 2022-04-20 | Multi-task multi-resolution collaborative esophageal cancer lesion segmentation method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114511581B (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115082402A (en) * | 2022-06-22 | 2022-09-20 | 济南大学 | Esophageal squamous carcinoma image segmentation method and system based on attention mechanism |
CN115731243A (en) * | 2022-11-29 | 2023-03-03 | 北京长木谷医疗科技有限公司 | Spine image segmentation method and device based on artificial intelligence and attention mechanism |
Citations (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108734719A (en) * | 2017-04-14 | 2018-11-02 | 浙江工商大学 | Background automatic division method before a kind of lepidopterous insects image based on full convolutional neural networks |
CN109035263A (en) * | 2018-08-14 | 2018-12-18 | 电子科技大学 | Brain tumor image automatic segmentation method based on convolutional neural networks |
CN109727253A (en) * | 2018-11-14 | 2019-05-07 | 西安大数据与人工智能研究院 | Divide the aided detection method of Lung neoplasm automatically based on depth convolutional neural networks |
US20190279361A1 (en) * | 2018-03-07 | 2019-09-12 | University Of Virginia Patent Foundation | Automatic quantification of cardiac mri for hypertrophic cardiomyopathy |
CN110428432A (en) * | 2019-08-08 | 2019-11-08 | 梅礼晔 | The deep neural network algorithm of colon body of gland Image Automatic Segmentation |
CN111563906A (en) * | 2020-05-07 | 2020-08-21 | 南开大学 | Knee joint magnetic resonance image automatic segmentation method based on deep convolutional neural network |
CN111784653A (en) * | 2020-06-28 | 2020-10-16 | 西安电子科技大学 | Multi-scale network MRI pancreas contour positioning method based on shape constraint |
CN112017191A (en) * | 2020-08-12 | 2020-12-01 | 西北大学 | Method for establishing and segmenting liver pathology image segmentation model based on attention mechanism |
CN112464579A (en) * | 2021-02-02 | 2021-03-09 | 四川大学 | Identification modeling method for searching esophageal cancer lesion area based on evolutionary neural network structure |
CN112967287A (en) * | 2021-01-29 | 2021-06-15 | 平安科技(深圳)有限公司 | Gastric cancer focus identification method, device, equipment and storage medium based on image processing |
CN113256536A (en) * | 2021-06-18 | 2021-08-13 | 之江实验室 | Ultrahigh-dimensional data reconstruction deep learning method based on wavelet analysis |
CN113362350A (en) * | 2021-07-26 | 2021-09-07 | 海南大学 | Segmentation method and device for cancer medical record image, terminal device and storage medium |
US20210290096A1 (en) * | 2018-07-31 | 2021-09-23 | Washington University | Methods and systems for segmenting organs in images using a cnn-based correction network |
WO2021184817A1 (en) * | 2020-03-16 | 2021-09-23 | 苏州科技大学 | Method for segmenting liver and focus thereof in medical image |
CN114332462A (en) * | 2021-12-31 | 2022-04-12 | 福州大学 | MRI segmentation method for integrating attention mechanism into cerebral lesion |
Patent Citations (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108734719A (en) * | 2017-04-14 | 2018-11-02 | 浙江工商大学 | Background automatic division method before a kind of lepidopterous insects image based on full convolutional neural networks |
US20190279361A1 (en) * | 2018-03-07 | 2019-09-12 | University Of Virginia Patent Foundation | Automatic quantification of cardiac mri for hypertrophic cardiomyopathy |
US20210290096A1 (en) * | 2018-07-31 | 2021-09-23 | Washington University | Methods and systems for segmenting organs in images using a cnn-based correction network |
CN109035263A (en) * | 2018-08-14 | 2018-12-18 | 电子科技大学 | Brain tumor image automatic segmentation method based on convolutional neural networks |
CN109727253A (en) * | 2018-11-14 | 2019-05-07 | 西安大数据与人工智能研究院 | Divide the aided detection method of Lung neoplasm automatically based on depth convolutional neural networks |
CN110428432A (en) * | 2019-08-08 | 2019-11-08 | 梅礼晔 | The deep neural network algorithm of colon body of gland Image Automatic Segmentation |
WO2021184817A1 (en) * | 2020-03-16 | 2021-09-23 | 苏州科技大学 | Method for segmenting liver and focus thereof in medical image |
CN111563906A (en) * | 2020-05-07 | 2020-08-21 | 南开大学 | Knee joint magnetic resonance image automatic segmentation method based on deep convolutional neural network |
CN111784653A (en) * | 2020-06-28 | 2020-10-16 | 西安电子科技大学 | Multi-scale network MRI pancreas contour positioning method based on shape constraint |
CN112017191A (en) * | 2020-08-12 | 2020-12-01 | 西北大学 | Method for establishing and segmenting liver pathology image segmentation model based on attention mechanism |
CN112967287A (en) * | 2021-01-29 | 2021-06-15 | 平安科技(深圳)有限公司 | Gastric cancer focus identification method, device, equipment and storage medium based on image processing |
CN112464579A (en) * | 2021-02-02 | 2021-03-09 | 四川大学 | Identification modeling method for searching esophageal cancer lesion area based on evolutionary neural network structure |
CN113256536A (en) * | 2021-06-18 | 2021-08-13 | 之江实验室 | Ultrahigh-dimensional data reconstruction deep learning method based on wavelet analysis |
CN113362350A (en) * | 2021-07-26 | 2021-09-07 | 海南大学 | Segmentation method and device for cancer medical record image, terminal device and storage medium |
CN114332462A (en) * | 2021-12-31 | 2022-04-12 | 福州大学 | MRI segmentation method for integrating attention mechanism into cerebral lesion |
Non-Patent Citations (2)
Title |
---|
SHUHUI LIU et al.: "Precisely identify the geometry of catalyst particles from S/TEM images via a boundary attention deep learning network", Materials Today Communications *
WANG TAO: "Research on cloud detection algorithms based on global and local feature aggregation", China Master's Theses Full-text Database, Engineering Science and Technology II *
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115082402A (en) * | 2022-06-22 | 2022-09-20 | 济南大学 | Esophageal squamous carcinoma image segmentation method and system based on attention mechanism |
CN115731243A (en) * | 2022-11-29 | 2023-03-03 | 北京长木谷医疗科技有限公司 | Spine image segmentation method and device based on artificial intelligence and attention mechanism |
CN115731243B (en) * | 2022-11-29 | 2024-02-09 | 北京长木谷医疗科技股份有限公司 | Spine image segmentation method and device based on artificial intelligence and attention mechanism |
Also Published As
Publication number | Publication date |
---|---|
CN114511581B (en) | 2022-07-08 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN114511581B (en) | Multi-task multi-resolution collaborative esophageal cancer lesion segmentation method and device | |
CN112258530A (en) | Neural network-based computer-aided lung nodule automatic segmentation method | |
CN111476757A (en) | Coronary artery patch data detection method, system, storage medium and terminal | |
CN112991363A (en) | Brain tumor image segmentation method and device, electronic equipment and storage medium | |
CN112508884B (en) | Comprehensive detection device and method for cancerous region | |
CN112991346B (en) | Training method and training system for learning network for medical image analysis | |
CN111553892A (en) | Lung nodule segmentation calculation method, device and system based on deep learning | |
CN113112559A (en) | Ultrasonic image segmentation method and device, terminal equipment and storage medium | |
CN117078692B (en) | Medical ultrasonic image segmentation method and system based on self-adaptive feature fusion | |
CN113034507A (en) | CCTA image-based coronary artery three-dimensional segmentation method | |
CN116645380A (en) | Automatic segmentation method for esophageal cancer CT image tumor area based on two-stage progressive information fusion | |
CN115018863A (en) | Image segmentation method and device based on deep learning | |
CN116883341A (en) | Liver tumor CT image automatic segmentation method based on deep learning | |
Chen et al. | Direction-guided and multi-scale feature screening for fetal head–pubic symphysis segmentation and angle of progression calculation | |
Mani | Deep learning models for semantic multi-modal medical image segmentation | |
CN114170076A (en) | Method for extracting target object information from video based on super-resolution and application | |
Pang et al. | Correlation matters: multi-scale fine-grained contextual information extraction for hepatic tumor segmentation | |
CN113379691A (en) | Breast lesion deep learning segmentation method based on prior guidance | |
CN113643317B (en) | Coronary artery segmentation method based on depth geometric evolution model | |
CN118097193B (en) | Hash coding image matching method and system based on graph convolution | |
Yookwan et al. | Robust Approach for Lumen Segmentation in IVUS Image using Deeply Learn U-net Model | |
CN117522890A (en) | Image segmentation method, device, computer equipment and storage medium | |
CN118657800A (en) | Multi-lesion joint segmentation method in retina OCT (optical coherence tomography) image based on hybrid network | |
Johora et al. | Early Detection of Lung Cancer from CT Scan Images Using Binarization Technique | |
Li et al. | 3D Multiple-Contextual ROI-Attention Network for Efficient and Accurate Volumetric Medical Image Segmentation |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
TR01 | Transfer of patent right | ||
Effective date of registration: 20240511
Address after: Unit 203-2, No. 55 Chengyi North Street, Software Park Phase III, Xiamen Torch High tech Zone, Xiamen, Fujian Province, 361024
Patentee after: Xiamen yinnuo Medical Technology Co.,Ltd. (China)
Address before: No. 37 Guoxue Road, Wuhou District, Chengdu, Sichuan, 610041
Patentee before: WEST CHINA HOSPITAL OF SICHUAN University (China)