CN115272283A - Endoscopic OCT image segmentation method, device, medium and product for colorectal tumor - Google Patents

Endoscopic OCT image segmentation method, device, medium and product for colorectal tumor

Info

Publication number
CN115272283A
CN115272283A
Authority
CN
China
Prior art keywords
oct image
module
boundary
loss
attention
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210988273.6A
Other languages
Chinese (zh)
Inventor
吕晶
李敏
唐玉国
任林
王艳
周镇乔
贾宏博
陈月岩
王斯博
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Institute of Biomedical Engineering and Technology of CAS
Original Assignee
Suzhou Institute of Biomedical Engineering and Technology of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou Institute of Biomedical Engineering and Technology of CAS filed Critical Suzhou Institute of Biomedical Engineering and Technology of CAS
Priority to CN202210988273.6A priority Critical patent/CN115272283A/en
Publication of CN115272283A publication Critical patent/CN115272283A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/0002 Inspection of images, e.g. flaw detection
    • G06T 7/0012 Biomedical image inspection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/10 Segmentation; Edge detection
    • G06T 7/11 Region-based segmentation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V 10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10072 Tomographic images
    • G06T 2207/10101 Optical tomography; Optical coherence tomography [OCT]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20024 Filtering details
    • G06T 2207/20032 Median filtering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/30 Subject of image; Context of image processing
    • G06T 2207/30004 Biomedical image processing
    • G06T 2207/30096 Tumor; Lesion

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • Medical Informatics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Multimedia (AREA)
  • Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
  • Radiology & Medical Imaging (AREA)
  • Quality & Reliability (AREA)
  • Image Processing (AREA)

Abstract

The invention relates to an endoscopic OCT image segmentation method, device, medium and product for colorectal tumors, wherein the method comprises the following steps: preprocessing an endoscopic colorectal OCT image; labeling tissue and tumor regions of the endoscopic colorectal OCT image, and generating a training set, a verification set and a test set; constructing a basic network, and adding a multi-scale input feature fusion module based on dilated convolution and a Triplet Attention triple attention mechanism module to the basic network; constructing a weighted mixed loss function combining the Dice Loss and the Boundary Loss, and training the model in stages by adjusting the weights; and carrying out region segmentation on the preprocessed input OCT image through the trained model structure and parameters. The invention improves the model structure based on the Unet network and segments OCT image regions more accurately, thereby facilitating quantitative analysis of tumors in endoscopic OCT images and providing effective help for early cancer diagnosis and accurate resection.

Description

Endoscopic OCT image segmentation method, device, medium and product for colorectal tumor
Technical Field
The invention relates to the technical field of medical image processing, in particular to an endoscopic OCT image segmentation method, device, medium and product for colorectal tumors.
Background
Optical Coherence Tomography (OCT) is a high-resolution, non-contact, non-invasive technique for cross-sectional imaging of biological tissue, and it opens up a new way of detecting and diagnosing early canceration of human organ tissue. Tissue stratification and tumor region segmentation based on endoscopic OCT images are crucial to clinical diagnosis and quantitative disease analysis. Because OCT images suffer from speckle noise, uneven gray scale, artifacts and the like, accurate segmentation of OCT images remains difficult, particularly for luminal tissue. This is because luminal tissue has several particular characteristics: (1) the endoscopic OCT imaging probe is wrapped in a layer of plastic protective film that isolates the probe from the tissue, and this protective film can interfere with image segmentation; (2) natural body motion (including respiration, intestinal peristalsis, etc.) gives the luminal tissue structure an irregular shape, making it difficult to extract a stable segmentation reference; (3) tissue fluid in the lumen also complicates image segmentation. Therefore, how to accurately segment colorectal tumor images acquired by a swept-source endoscopic OCT system with 360-degree rotational scanning is a key problem. Owing to the different working principles, traditional retinal segmentation methods are difficult to apply to intraluminal OCT image segmentation.
Li et al. proposed a model training framework with multiple Unets combined in parallel to stratify the tissue of guinea pig esophageal OCT images obtained with an endoscopic OCT system; the method emphasizes parallel training of multiple models to improve the prediction of tissue-level topology, but the model training and prediction process becomes more complicated. C. Wang and M. Gan proposed the TSA-Net network model and applied it to automatic stratification of guinea pig esophageal OCT tissue. TSA-Net takes Unet as the basic network, in which the TSA module uses a self-attention mechanism to capture the global context dependency of OCT images; a mixed cross-entropy and Dice loss function is adopted to train the model, achieving high segmentation precision. However, the TSA module needs a large amount of memory to compute the self-attention feature maps, and if several TSA modules are added, the burden on the computing device grows. Yang et al. proposed the Bicon-CE network structure based on bilateral connectivity for epithelial tissue segmentation in human esophageal OCT images; the model formulates the segmentation task as a combination of pixel connectivity modeling and pixel-wise tissue classification, thereby reducing topological problems in the segmentation results such as breaks and abnormal predictions. However, the model is only applied to binary segmentation, and extending it to multi-class problems may require a model structure with more parameters and greater complexity. Each of the above models has its own strengths, and all focus on the image-layering task to obtain specific layer information, which weakens the accurate segmentation of the whole tumor region and complicates subsequent accurate tumor resection. Therefore, the invention mainly focuses on accurate segmentation of the tumor tissue region while de-emphasizing layer information, and provides an endoscopic OCT image segmentation method for colorectal tumors based on a triple attention mechanism U-shaped network, so as to help doctors achieve accurate resection of tumor regions in the future.
Disclosure of Invention
To achieve the above objects and other advantages and in accordance with the purpose of the invention, a first object of the present invention is to provide an endoscopic OCT image segmentation method of colorectal tumor, comprising the steps of:
preprocessing an endoscopic OCT image of the colon and the rectum;
marking tissues and tumor regions of the endoscopic OCT image in the colon and rectum, and generating a training set, a verification set and a test set;
constructing a basic network, and adding a multi-scale input feature fusion module based on dilated convolution and a triple attention mechanism module to the basic network;
constructing a weighted mixed Loss function based on the combination of the Dice Loss and the Boundary Loss, and training the model in stages by adjusting the weights;
and carrying out region segmentation on the preprocessed input OCT image through the trained model structure and parameters.
Further, the preprocessing of the colorectal endoscopic OCT image comprises the following steps:
cropping a region of interest of the colorectal endoscopic OCT image and resizing to a uniform size;
carrying out noise reduction on the OCT image by adopting a median filtering method;
and adjusting the gray value of the image by adopting a Gamma nonlinear transformation method, and enhancing the details of the dark part of the image.
Further, the labeling of tissue and tumor regions of the colorectal endoscopic OCT image and the generation of the training set, the verification set and the test set comprise the following steps:
marking the tissue levels of the preprocessed colorectal endoscopic OCT image with a labeling tool, wherein the labeled classes are a mixed tissue region of the mucosa layer and the muscularis propria (intrinsic muscle layer), and a tumor region;
and randomly sampling according to a preset proportion to form a training sample set and a test sample set.
Further, the generating of the training set, the verification set and the test set comprises the following steps:
performing enhancement processing on the labeled samples, with data enhancement operations based on the torchvision.transforms library;
and randomly sampling the enhanced samples at a preset ratio to form a training set and a verification set for model training.
Further, the construction of the basic network and the addition, to the basic network, of the multi-scale input feature fusion module based on dilated convolution and the triple attention mechanism module comprise the following steps:
taking a Unet network structure as a basic network;
adding a multi-scale fusion module in an Encoder structure of the Unet network;
and adding a triple attention mechanism module in different layers of a Decoder structure of the Unet network, wherein the input and output sizes of the triple attention mechanism module are consistent.
Further, the adding of the multi-scale fusion module in the Encoder structure of the Unet network comprises the following steps:
adding a multi-scale fusion module at the second layer of the Encoder structure of the Unet network: performing an AvePool operation on the upper-layer input image, and performing several DilatedConv operations to generate several groups of output feature maps with different receptive fields, each group containing several feature maps; fusing the obtained channel feature map with the channel feature map passed down after the first-layer pooling operation to obtain enhanced features;
adding a multi-scale fusion module at the third layer of the Encoder structure of the Unet network, and merging the obtained channel feature map with the channel feature map passed down after the second-layer operations;
adding a multi-scale fusion module at the fourth layer of the Encoder structure of the Unet network, merging the obtained channel feature map with the channel feature map passed down after the third-layer operations, and performing the subsequent convolution operations;
the method for adding the triple attention mechanism module in different layers of the Decoder structure of the Unet network comprises the following steps:
constructing a Mixed-Pool module: performing a MaxPool maximum pooling operation and an AvePool average pooling operation on the input tensor in the channel dimension respectively, and finally merging the two pooling results;
constructing a Basic-Conv module, wherein the Basic-Conv module consists of Conv, Batch Norm and Sigmoid, and performing the Conv, Batch Norm and Sigmoid operations on the input tensor in sequence;
constructing a triple attention mechanism module, wherein the triple attention mechanism module consists of three branches, each branch constructed from a Mixed-Pool module and a Conv module; constructing a first branch module: given an input tensor of size (C, H, W), performing a Permute (0,2,1,3) dimension transformation on the H and C dimensions so that the tensor of shape (C, H, W) becomes one of shape (H, C, W), performing the Mixed-Pool operation to obtain a tensor of shape (2, C, W), performing the Conv operation to obtain an attention weight of shape (C, W), multiplying the attention weight by the dimension-transformed tensor to obtain attention-weighted feature values, performing the same Permute (0,2,1,3) dimension transformation again on the obtained feature values to recover the shape (C, H, W), and obtaining the attention-weighted first branch feature map Map1;
constructing a second branch module, carrying out C and W dimension interaction, and carrying out dimension transformation operation to be Permute (0,3,2,1) to obtain a second branch feature Map2 weighted by attention;
constructing a third branch module, and carrying out dimension interaction between H and W to obtain a third branch feature Map3 weighted by attention;
weighting and merging the three branches, respectively weighting and summing the attention feature values of the three branches;
and adding an attention mechanism module, and adding three-branch attention mechanism modules to the Decoder structure before the UpConv deconvolution operations of the second layer, the third layer and the fourth layer.
Further, the construction of the weighted mixed Loss function based on the combination of the Dice Loss and the Boundary Loss, and the training of the model in stages by adjusting the weights, comprise the following steps:
defining the Dice Loss function, whose inputs are the model prediction result Prediction and the real label Target: converting Prediction into probability values through a Softmax operation, converting Target into one-hot codes, computing |X_gt ∩ Y_pred| and |X_gt| + |Y_pred| respectively to obtain the Dice Loss of each Batch, and then taking the average of the Dice Loss over all Batches as the final Dice Loss value;
defining the Boundary Loss function, whose inputs are the model prediction result Prediction and the real label Target: converting Prediction into probability values through a Softmax operation, converting Target into one-hot codes, first calculating the boundaries according to the boundary calculation formula, then calculating the expanded boundaries according to the expanded-boundary calculation formula, calculating the boundary prediction precision and recall, computing the boundary loss L_BF1c of each region class for each Batch, and then taking the average of the Boundary Loss over all Batches as the final Boundary Loss value;
defining a weighted mixed Loss function, and performing weighted addition of the Dice Loss function and the Boundary Loss function.
A second object of the present invention is to provide an electronic apparatus, comprising: a memory having program code stored thereon; a processor coupled with the memory and when the program code is executed by the processor, implementing an endoscopic OCT image segmentation method of a colorectal tumor.
It is a third object of the present invention to provide a computer readable storage medium having stored thereon program instructions which, when executed, implement a method of endoscopic OCT image segmentation of colorectal tumors.
A fourth object of the present invention is to provide a computer program product comprising computer program/instructions which when executed by a processor, implement a method for endoscopic OCT image segmentation of colorectal tumors.
Compared with the prior art, the invention has the beneficial effects that:
aiming at the difficulty of an endoscopic OCT image segmentation task and the limitation of the existing method, the invention provides an endoscopic OCT image segmentation method for applying a U-shaped network based on a triple attention mechanism to colorectal tumors. The advantages of the invention are mainly embodied in the following two aspects:
(1) A model structure based on a U-shaped network framework and a triple attention mechanism is provided. On the basis of the classical Unet network, a multi-scale input feature fusion module based on dilated convolution is added to the Encoder structure of Unet, so that the model always keeps a global field of view during feature learning at different levels and effectively captures context information; meanwhile, a Triplet Attention module is added to the Decoder structure of Unet, which has the notable advantages of being lightweight with almost no learnable parameters, and can capture spatial dependencies and extract cross-dimension interactive features;
(2) A staged training strategy with a weighted mixed loss function is adopted: in the early stage of model training, the Dice Loss is used as the loss function, so that learning focuses mainly on accuracy within the different hierarchical regions; in the later stage of model training, a loss function combining the Dice Loss and the Boundary Loss is adopted, so that learning attends to the region boundaries while still considering the OCT hierarchical regions.
The foregoing description is only an overview of the technical solutions of the present invention, and in order to make the technical solutions of the present invention more clearly understood and to implement them in accordance with the contents of the description, the following detailed description is given with reference to the preferred embodiments of the present invention and the accompanying drawings. The detailed description of the present invention is given in detail by the following examples and the accompanying drawings.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention without limiting the invention. In the drawings:
FIG. 1 is a flow chart of an endoscopic OCT image segmentation method applied to colorectal tumor by a U-shaped network based on a triple attention mechanism in embodiment 1;
FIG. 2 is a diagram of a model architecture based on a U-network framework and a triple attention mechanism;
FIG. 3 is a diagram of a model structure sub-module;
FIG. 4 is a diagram showing the result of tumor segmentation in mouse colorectal images acquired by the swept-frequency endoscopic OCT system;
FIG. 5 is a detailed comparison graph of mouse colorectal OCT image segmentation results of different models;
FIG. 6 is a comparison graph of tumor segmentation effects of different models on a fine region of a mouse colorectal OCT image;
FIG. 7 is a schematic view of an electronic apparatus according to embodiment 2;
fig. 8 is a schematic diagram of a computer-readable storage medium of embodiment 3.
Detailed Description
The present invention will be further described with reference to the accompanying drawings and the detailed description, and it should be noted that any combination of the embodiments or technical features described below can be used to form a new embodiment without conflict.
Example 1
An endoscopic OCT image segmentation method for colorectal tumors is used to finely segment tumor regions in endoscopic OCT images of colorectal tissue layers. As shown in fig. 1, the method comprises the following steps:
performing colorectal OCT imaging on tumor-bearing and tumor-free live guinea pigs with a high-speed endoscopic SS-OCT device, and selecting OCT 2D images that effectively display tissue layers such as the mouse colorectal mucosa layer and muscularis propria (intrinsic muscle layer) as the data source of this embodiment;
preprocessing the collected colorectal endoscopic OCT images with median filtering, Gamma transformation and cropping to remove artifacts and reduce noise; this specifically comprises the following steps:
cropping a region of interest of the colorectal endoscopic OCT image, removing artifacts, and resizing to a uniform size such as 1024 × 128;
performing noise reduction on the OCT image with a median filter: scanning each pixel in the image with a fixed-size template, sorting the gray values of the pixels within the template from large to small into a sequence, then taking the median of the sequence and assigning it to the pixel at the center of the template, thereby obtaining the denoised image.
The gray values of the image are then adjusted with a Gamma nonlinear transformation to enhance dark-region detail; the Gamma transformation is shown in formula (1).
s = c·r^γ (1), where r is the normalized input gray value, s is the output gray value, c is a constant, and γ is the Gamma exponent (γ < 1 enhances dark-region detail).
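For illustration, this preprocessing can be sketched in Python with OpenCV; the function name, kernel size and γ value below are illustrative assumptions rather than the settings of this embodiment:

    # Minimal preprocessing sketch: median filtering, then the power-law
    # (Gamma) transform of formula (1). Kernel size and gamma are assumed.
    import cv2
    import numpy as np

    def preprocess_oct(img_gray: np.ndarray, kernel_size: int = 5,
                       gamma: float = 0.6) -> np.ndarray:
        # Median filter: each pixel becomes the median of its neighborhood.
        denoised = cv2.medianBlur(img_gray, kernel_size)
        # Normalize to [0, 1]; gamma < 1 brightens dark regions (s = r ** gamma).
        normalized = denoised.astype(np.float32) / 255.0
        corrected = np.power(normalized, gamma)
        return (corrected * 255.0).astype(np.uint8)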
Marking tissues and tumor regions of the endoscopic OCT image in the colon and rectum, performing effective sample expansion by adopting a data enhancement technology, and generating a training set, a verification set and a test set in a random mode; the method specifically comprises the following steps:
marking the tissue levels of the preprocessed colorectal endoscopic OCT image with an open-source labeling tool; the labeled classes are a mixed tissue region of the mucosa layer and the muscularis propria (intrinsic muscle layer), and a tumor region;
the training sample set and the test sample set are formed by randomly sampling at a preset ratio, for example 9:1.
Enhancing the labeled samples by performing data enhancement operations such as horizontal flipping, Gaussian blurring, random contrast adjustment, random brightness adjustment and random distortion based on the torchvision.transforms library.
And randomly sampling the enhanced samples at a preset ratio such as 8:1 to form a training set and a verification set for subsequent model training and validation.
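A minimal sketch of the augmentation and the 8:1 split based on torchvision.transforms is given below; the parameter values and helper names are illustrative assumptions, and for segmentation data the geometric transforms would need to be applied jointly to the image and its label map:

    # Augmentation sketch with torchvision.transforms; parameters are assumed.
    import torch
    from torch.utils.data import random_split
    from torchvision import transforms

    augment = transforms.Compose([
        transforms.RandomHorizontalFlip(p=0.5),                # horizontal flip
        transforms.GaussianBlur(kernel_size=3),                # Gaussian blur
        transforms.ColorJitter(brightness=0.2, contrast=0.2),  # random brightness/contrast
        transforms.RandomAffine(degrees=0, shear=5),           # mild random distortion
    ])

    def split_8_to_1(dataset):
        # Randomly split the enhanced samples into training/verification sets at 8:1.
        n_val = len(dataset) // 9
        return random_split(dataset, [len(dataset) - n_val, n_val],
                            generator=torch.Generator().manual_seed(0))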
Constructing a basic network, and adding a multi-scale input feature fusion module based on dilated convolution and a triple attention mechanism module to the basic network, as shown in FIG. 2; this specifically comprises the following steps:
as shown in fig. 3 (a), the Unet network structure is used as the basic network;
as shown in fig. 3 (b), a multi-scale fusion module is added to the Encoder structure of the Unet network. To keep a global field of view at each Encoder layer, inputs of different scales are extracted with an AvePool pooling operation, the input image at each scale is convolved with four dilated convolution operations of different kernel sizes to construct a pyramid feature map, and the obtained pyramid feature map is fused with the feature map passed down from the previous layer to obtain enhanced features. This module is used to capture global context information. The input image of the first layer of the Unet Encoder structure has size 1024 × 128, and multi-scale inputs are added to the second, third and fourth layers respectively; specifically, the method comprises the following steps:
adding a multi-scale fusion module at the second layer of the Encoder structure: performing an AvePool operation on the upper-layer input image to obtain a 512 × 64 image, performing 4 DilatedConv operations to generate 4 groups of output feature maps with different receptive fields, each group containing 8 feature maps, so that the final output has 32 channels; merging the obtained 32-channel feature map with the 64-channel feature map passed down after the first-layer pooling operation into a 96-channel feature map, and performing the subsequent operations;
adding a multi-scale fusion module at the third layer of the Encoder structure, with operations similar to those of the second layer and a final output of 64 channels; merging the obtained 64-channel feature map with the 128-channel feature map passed down after the second-layer operations into a 192-channel feature map, and performing the subsequent convolution operations;
adding a multi-scale fusion module at the fourth layer of the Encoder structure, again with operations similar to those of the second layer and a final output of 128 channels; merging the obtained 128-channel feature map with the 256-channel feature map passed down after the third-layer operations into a 384-channel feature map, and performing the subsequent convolution operations;
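The encoder-side fusion described above can be sketched as the following PyTorch module; the class name MultiScaleFusion, the dilation rates and the pooling factor are assumptions chosen to reproduce the 32-channel second-layer output, not the exact implementation:

    # Sketch of the multi-scale input feature fusion module (assumed rates).
    import torch
    import torch.nn as nn

    class MultiScaleFusion(nn.Module):
        """Pool the raw input to the current encoder scale, build a pyramid of
        dilated-conv feature maps, and concatenate it with the feature map
        passed down from the previous encoder layer."""
        def __init__(self, in_ch=3, out_ch=32, scale=2, dilations=(1, 2, 4, 8)):
            super().__init__()
            branch_ch = out_ch // len(dilations)        # e.g. 4 groups of 8 maps
            self.pool = nn.AvgPool2d(kernel_size=scale)  # AvePool to this scale
            self.branches = nn.ModuleList([
                nn.Sequential(
                    nn.Conv2d(in_ch, branch_ch, 3, padding=d, dilation=d),  # DilatedConv
                    nn.BatchNorm2d(branch_ch),
                    nn.ReLU(inplace=True))
                for d in dilations])

        def forward(self, raw_input, skip_features):
            x = self.pool(raw_input)
            pyramid = torch.cat([b(x) for b in self.branches], dim=1)
            # e.g. 32-channel pyramid + 64-channel encoder features -> 96 channels
            return torch.cat([pyramid, skip_features], dim=1)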
as shown in fig. 3 (c), triple attention mechanism modules are added at different layers of the Decoder structure of the Unet network to help capture spatial dependencies without dimensionality reduction and without a large number of learnable parameters, extracting cross-dimension interactive features; the input and output sizes of the triple attention mechanism module are identical. Specifically, the method comprises the following steps:
a Mixed-Pool module is constructed, consisting of MaxPool maximum pooling and AvePool average pooling operations, which are performed on the input tensor in the channel dimension respectively; the two pooling results are finally merged, with the concrete implementation shown in fig. 3 (d). Given an input tensor of size (C, H, W), the MaxPool maximum pooling operation takes the maximum of each feature value over the channel dimension to obtain an (H, W) feature map, the AvePool average pooling operation averages each feature value over the channel dimension to obtain an (H, W) feature map, and the two feature tensors are then merged to obtain a tensor of shape (2, H, W);
a Basic-Conv module is constructed, consisting of Conv, Batch Norm and Sigmoid; 7×7 Conv, Batch Norm and Sigmoid operations are performed on the input tensor in sequence, with the concrete implementation shown in fig. 3 (e). For a given input tensor of size (C, H, W), a 7×7 Conv2D operation with one output channel is performed, normalization is applied through Batch Norm, and values are mapped between 0 and 1 through Sigmoid to represent the attention weights;
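The two building blocks can be sketched as follows; the 7×7 kernel follows the description above, while the class names and padding are illustrative assumptions:

    # Sketches of the Mixed-Pool and Basic-Conv building blocks.
    import torch
    import torch.nn as nn

    class MixedPool(nn.Module):
        """Channel-wise max and average pooling, merged: (C,H,W) -> (2,H,W)."""
        def forward(self, x):                    # x: (B, C, H, W)
            max_map = torch.max(x, dim=1, keepdim=True).values
            avg_map = torch.mean(x, dim=1, keepdim=True)
            return torch.cat([max_map, avg_map], dim=1)   # (B, 2, H, W)

    class BasicConv(nn.Module):
        """7x7 Conv (one output channel) + Batch Norm + Sigmoid -> attention weights."""
        def __init__(self):
            super().__init__()
            self.conv = nn.Conv2d(2, 1, kernel_size=7, padding=3)
            self.bn = nn.BatchNorm2d(1)

        def forward(self, x):
            return torch.sigmoid(self.bn(self.conv(x)))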
and constructing a triple attention mechanism module, wherein the triple attention mechanism module consists of three branches, and each branch is constructed based on a Mixed-Pool module and a Conv module.
Constructing the first branch module of the Triple Attention module, the first branch being used for C and H dimension interaction, as in the upper branch of the Triple Attention module in fig. 3 (c); this specifically comprises the following steps:
giving an input tensor with the size of (C, H, W), and performing Permute (0,2,1,3) dimension transformation on H and C dimensions to obtain a tensor which is changed from the shape of (C, H, W) into the shape of (H, C, W);
carrying out Mixed-Pool operation on the tensor obtained in the last step to obtain the tensor with the shape of (2, C, W);
performing the Conv operation on the tensor obtained in the previous step, namely sequentially performing the 7×7 Conv, Batch Norm and Sigmoid operations, to obtain the attention weight of shape (C, W);
multiplying the attention weight obtained in the previous step by the Permute-transformed tensor to obtain attention-weighted feature values;
performing the same Permute (0,2,1,3) dimension transformation again on the attention feature values obtained in the previous step, recovering the shape (C, H, W), and obtaining the attention-weighted first branch feature map Map1;
constructing a second branch module for C and W dimension interaction, with the dimension transformation operation being Permute (0,3,2,1); the specific operation process is similar to that of the first branch, as in the middle branch of the Triple Attention module in fig. 3 (c), obtaining the attention-weighted second branch feature map Map2;
constructing a third branch module for H and W dimension interaction; this branch needs no Permute dimension transformation, and the other operations are similar to those of the first branch, as in the lower branch of the Triple Attention module in fig. 3 (c), obtaining the attention-weighted third branch feature map Map3;
weighting and merging the three branches: the attention feature values of the three branches are summed with weights, which in this embodiment are 0.25, 0.25 and 0.5 in sequence, as shown in fig. 3 (c); the formula is given in formula (2):
Outputs=0.25*Map1+0.25*Map2+0.5*Map3 (2)
and adding the attention mechanism modules: the three-branch attention mechanism module is added to the Decoder structure of the network, specifically before the UpConv deconvolution operations of the second, third and fourth layers of the Decoder structure.
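Reusing the MixedPool and BasicConv sketches above, the three branches and their weighted merge of formula (2) can be sketched as follows; the class names are assumptions of this sketch:

    # Sketch of one attention branch and the three-branch weighted merge.
    import torch.nn as nn

    class AttentionBranch(nn.Module):
        """Permute so the interacting dimension pair occupies the spatial
        positions, compute the attention map, weight, and permute back."""
        def __init__(self, perm=(0, 2, 1, 3)):
            super().__init__()
            self.perm = perm                   # each perm used here is self-inverse
            self.pool, self.conv = MixedPool(), BasicConv()

        def forward(self, x):
            y = x.permute(*self.perm)               # e.g. (B,C,H,W) -> (B,H,C,W)
            weighted = y * self.conv(self.pool(y))  # attention-weighted features
            return weighted.permute(*self.perm)     # restore (B, C, H, W)

    class TripleAttention(nn.Module):
        def __init__(self):
            super().__init__()
            self.branch_ch = AttentionBranch((0, 2, 1, 3))  # C-H interaction
            self.branch_cw = AttentionBranch((0, 3, 2, 1))  # C-W interaction
            self.branch_hw = AttentionBranch((0, 1, 2, 3))  # H-W, no transposition

        def forward(self, x):    # input and output sizes are identical
            return (0.25 * self.branch_ch(x) + 0.25 * self.branch_cw(x)
                    + 0.5 * self.branch_hw(x))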
The invention improves the model structure based on the Unet network, more accurately segments the OCT image area, is beneficial to quantitatively analyzing the tumor in the endoscopic OCT image, and provides effective help for early cancer diagnosis and accurate excision.
A weighted mixed loss function based on the combination of the Dice Loss and the Boundary Loss is constructed and used during training to measure the difference between the model prediction and the target value, continuously optimizing the model parameters. The Dice Loss function supervises the region prediction error of the model parameters, and the Boundary Loss function supervises the boundary prediction error; the two are added with weights to obtain a mixed loss function with adjustable weights, and the model is trained in stages by adjusting the weights. In this embodiment, the model is trained with a strategy of adjusting the loss-function weights in stages: in the first stage, the weight of the Dice Loss function is set to 1 and the weight of the Boundary Loss is set to 0, making the model focus on learning region features; in the second stage, the weights of the Dice Loss function and the Boundary Loss function are both set to 0.5, so that the model focuses on learning boundary features while still considering region learning. Specifically, the method comprises the following steps:
Defining the Dice Loss function: DiceCoef characterizes the similarity between two samples and its expression is given in formula (3), where X_gt is the true label of a training sample and Y_pred is the corresponding model prediction. DiceLoss characterizes the difference between the two samples, and its expression is given in formula (4).
DiceCoef = 2|X_gt ∩ Y_pred| / (|X_gt| + |Y_pred|) (3)
DiceLoss = 1 - DiceCoef (4)
The inputs of the Dice Loss function are the prediction result Prediction and the real label Target. The specific calculation first converts Prediction into probability values through a Softmax operation and converts Target into one-hot codes, then computes |X_gt ∩ Y_pred| and |X_gt| + |Y_pred| respectively to obtain the Dice Loss of each Batch, and finally takes the average of the Dice Loss over all Batches as the final Dice Loss value;
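A common soft multi-class implementation of these steps is sketched below; the function name and the ε smoothing term are assumptions added for illustration and numerical stability:

    # Dice Loss sketch: Softmax probabilities vs. one-hot targets, batch mean.
    import torch
    import torch.nn.functional as F

    def dice_loss(prediction, target, num_classes, eps=1e-6):
        # prediction: (B, C, H, W) logits; target: (B, H, W) class indices.
        prob = F.softmax(prediction, dim=1)
        onehot = F.one_hot(target, num_classes).permute(0, 3, 1, 2).float()
        intersection = torch.sum(prob * onehot, dim=(2, 3))   # |X_gt ∩ Y_pred|
        cardinality = torch.sum(prob + onehot, dim=(2, 3))    # |X_gt| + |Y_pred|
        dice_coef = (2.0 * intersection + eps) / (cardinality + eps)
        return torch.mean(1.0 - dice_coef)    # mean over classes and the Batch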
Defining the Boundary Loss function: first define BF1, i.e. the boundary F1-score, as shown in formula (5), where P denotes the boundary prediction precision and R denotes the boundary prediction recall; the BF1 score lies between 0 and 1, and the larger the value, the more accurate the boundary prediction. BoundaryLoss is then defined as in formula (6),
BF1 = 2*P*R / (P + R) (5)
BoundaryLoss=1-BF1 (6)
where, for one region class c, P_c and R_c are defined as in formula (7),
P_c = |B_pred_c ∩ B_gt_c,ext| / |B_pred_c|, R_c = |B_gt_c ∩ B_pred_c,ext| / |B_gt_c| (7)
where B_pred_c and B_gt_c denote the predicted and labeled boundaries of region class c, and the subscript ext denotes the corresponding expanded boundary.
The specific calculation process is as follows: first, Prediction is converted into probability values through a Softmax operation, and Target is converted into one-hot codes. The boundaries are then calculated according to the boundary calculation formula (8); that is, the boundaries of each prediction region of Prediction and each labeled region of Target are computed with MaxPool, with the kernel size of the MaxPool operation set to 3 and the padding set to 1. Next, the expanded boundaries are calculated according to the expanded-boundary calculation formula (9); that is, the expanded boundaries of each prediction region of Prediction and each labeled region of Target are computed with MaxPool, with the kernel size of the MaxPool operation set to 5 and the padding set to 2. After the boundaries are computed, the boundary prediction precision and recall are calculated according to formula (7), and the boundary loss L_BF1c of each region class is obtained according to formula (10); after the region-class boundary losses L_BF1c of each Batch are computed, the average of the Boundary Loss over all Batches is taken as the final Boundary Loss value, as shown in formula (11).
B = MaxPool(1 - y; kernel=3, padding=1) - (1 - y), where y is a region map and B its boundary map (8)
B_ext = MaxPool(B; kernel=5, padding=2) (9)
L_BF1c = 1 - BF1_c = 1 - 2*P_c*R_c / (P_c + R_c) (10)
BoundaryLoss = mean over all Batches of the mean over region classes c of L_BF1c (11)
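The MaxPool-based boundary construction of formulas (8) to (11) can be sketched as follows; applying it directly to the soft Softmax probabilities, and the ε terms, are assumptions of this sketch:

    # Boundary Loss sketch: MaxPool boundaries (k=3, p=1), expanded
    # boundaries (k=5, p=2), per-class precision/recall, BF1, batch mean.
    import torch
    import torch.nn.functional as F

    def boundary_loss(prediction, target, num_classes, eps=1e-7):
        prob = F.softmax(prediction, dim=1)
        onehot = F.one_hot(target, num_classes).permute(0, 3, 1, 2).float()

        def boundary(mask):    # formula (8)
            return F.max_pool2d(1.0 - mask, 3, stride=1, padding=1) - (1.0 - mask)

        pred_b, gt_b = boundary(prob), boundary(onehot)
        pred_b_ext = F.max_pool2d(pred_b, 5, stride=1, padding=2)   # formula (9)
        gt_b_ext = F.max_pool2d(gt_b, 5, stride=1, padding=2)

        dims = (2, 3)          # per region class c, per sample: formula (7)
        p = torch.sum(pred_b * gt_b_ext, dim=dims) / (torch.sum(pred_b, dim=dims) + eps)
        r = torch.sum(gt_b * pred_b_ext, dim=dims) / (torch.sum(gt_b, dim=dims) + eps)
        bf1 = 2.0 * p * r / (p + r + eps)     # formula (5)
        return torch.mean(1.0 - bf1)          # formulas (10)-(11): batch mean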
Defining a weighted mixed loss function: as shown in formula (12), the two defined loss functions are added with weights to serve as the loss function (FinalLoss) actually used in model training; the weight coefficient is adjustable, and loss functions with different weight values can be adopted at different training stages.
FinalLoss=α*DiceLoss+(1-α)*BoundaryLoss (12)
Model training is based on Ubuntu 18.04 LTS with an NVIDIA GeForce RTX 3070 GPU (8 GB of memory), using the PyTorch deep learning framework; the GPU is used throughout training. The model input size is 1024 × 128, with 3 input channels and 3 output channels; the optimizer is Adam, the BatchSize is 2, training runs for 600 epochs, and the learning rate is 0.0001. In the first 300 epochs the loss function is the Dice Loss, and in the last 300 epochs the loss function is the weighted mixture of the Dice Loss and the Boundary Loss with a weight coefficient of 0.5. During training the model gradually converges, and the model with the minimum loss on the verification set is saved as the optimal model.
The constructed network model is trained and the weight parameters are learned. In the first stage of model training, the weight of the Dice Loss function is set to 1 and the weight of the Boundary Loss function to 0, so that the model focuses on learning region features, as shown in formula (13),
FinalLoss=DiceLoss (13)
in the second stage of model training, the weights of the Dice Loss function and the Boundary Loss function are both set to 0.5, so that the model focuses on learning boundary features while still considering region learning, as shown in formula (14),
FinalLoss=0.5*DiceLoss+0.5*BoundaryLoss (14)
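Combining the two loss sketches above, the staged weighting of formulas (13) and (14) reduces to a schedule on α in formula (12); a compact sketch, assuming the epoch counts of this embodiment:

    # Staged FinalLoss sketch: alpha = 1 for the first 300 epochs, 0.5 after.
    def final_loss(prediction, target, num_classes, epoch, switch_epoch=300):
        alpha = 1.0 if epoch < switch_epoch else 0.5
        loss = alpha * dice_loss(prediction, target, num_classes)
        if alpha < 1.0:
            loss = loss + (1.0 - alpha) * boundary_loss(prediction, target, num_classes)
        return loss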
the OCT image is collected from equipment, preprocessing steps such as noise reduction and GAMMA conversion are carried out, then the image is cut to extract an ROI, the size is adjusted to 1024 × 128 to be used as input, a model structure and trained network parameters are called, and the segmentation of an OCT image tumor region is achieved. Namely, the input OCT image after preprocessing is subjected to region segmentation through a trained model structure and parameters.
The segmentation result is smoothed, the two largest foreground regions other than the background are extracted, and the region contour lines are extracted. If a tumor region is predicted, the ratio of the number of tumor-region pixels to the total number of OCT image pixels is calculated from the pixel characteristics of the identified tumor region to obtain the tumor area ratio, as shown in FIG. 4. Quantitative index extraction after OCT-image-based tumor segmentation can be used to judge tumor size and its development over time, and has application value in early diagnosis of gastrointestinal tumors, postoperative follow-up, monitoring of tumor progression and the like.
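For illustration, the area-ratio computation can be sketched with OpenCV as below; the function name, the tumor label index and the opening kernel are assumptions of this sketch:

    # Post-processing sketch: smooth the tumor mask, extract contours,
    # and compute the tumor-pixel fraction of the whole OCT image.
    import cv2
    import numpy as np

    def tumor_area_ratio(seg_map: np.ndarray, tumor_label: int = 2):
        # seg_map: (H, W) predicted class indices.
        tumor_mask = (seg_map == tumor_label).astype(np.uint8)
        # Morphological opening suppresses small mis-segmented specks.
        tumor_mask = cv2.morphologyEx(tumor_mask, cv2.MORPH_OPEN,
                                      np.ones((3, 3), np.uint8))
        contours, _ = cv2.findContours(tumor_mask, cv2.RETR_EXTERNAL,
                                       cv2.CHAIN_APPROX_SIMPLE)
        ratio = float(tumor_mask.sum()) / seg_map.size
        return ratio, contours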
All test samples in the test set are segmented and predicted with the Unet model and with the model of the invention respectively; comparing the prediction results leads to the following conclusions:
(1) Comparing the prediction evaluation indexes: the evaluation indexes DSC (Dice Similarity Coefficient) and Accuracy are calculated, with the results shown in Table 1.
TABLE 1 accuracy contrast table for different OCT image segmentation methods
(2) Comparing boundary integrity and mis-segmented regions: as seen in fig. 5, the segmentation of the Unet model is prone to broken boundary predictions, mis-segmentation of small regions and similar phenomena, whereas the model proposed by the invention improves the accuracy of boundary prediction and reduces region mis-segmentation.
(3) Comparing the segmentation of inconspicuous small tumor regions: as seen in fig. 6, when the Unet model is trained without the weighted mixed loss function, it struggles to learn inconspicuous small tumor regions; when either the Unet model or the model proposed by the invention is trained with the staged weighted mixed loss function, the learning of inconspicuous small tumor regions improves significantly.
The model structure based on the U-shaped network framework and the triple attention mechanism combines the ideas of multi-scale feature fusion and triple attention, so that the model keeps a global field of view during feature learning, captures global context information from multiple fields of view, and effectively uses the spatial dependencies between pixels when aggregating features to recover region features. In addition, the invention updates and learns the model parameters with a combination of region loss and boundary loss, so that the model can learn and train hierarchically like the human brain. The multi-scale feature fusion module adopted in the invention has a certain computational overhead, but the triple attention mechanism adopted is lighter than common attention mechanisms and has almost no parameters, so the whole model structure does not become complicated.
Example 2
An electronic device 200, as shown in FIG. 7, includes but is not limited to: a memory 201 having program code stored thereon; a processor 202 coupled with the memory and when the program code is executed by the processor, implementing an endoscopic OCT image segmentation method of a colorectal tumor. For the detailed description of the method, reference may be made to the corresponding description in the above method embodiments, which is not repeated herein.
Example 3
A computer readable storage medium, as shown in fig. 8, having stored thereon program instructions which, when executed, implement a method for endoscopic OCT image segmentation of colorectal tumors. For the detailed description of the method, reference may be made to the corresponding description in the above method embodiments, which is not repeated herein.
Example 4
A computer program product comprising computer program/instructions which, when executed by a processor, implement a method of endoscopic OCT image segmentation of colorectal tumours. For the detailed description of the method, reference may be made to the corresponding description in the above method embodiments, which is not repeated herein.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a … …" does not exclude the presence of another identical element in a process, method, article, or apparatus that comprises the element.
The embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments.
The foregoing is merely an example of the present specification and is not intended to limit one or more embodiments of the present specification. Various modifications and alterations to one or more embodiments described herein will be apparent to those skilled in the art. Any modification, equivalent replacement, improvement or the like made within the spirit and principle of one or more embodiments of the present disclosure should be included in the scope of the claims of one or more embodiments of the present disclosure.

Claims (10)

1. An endoscopic OCT image segmentation method for colorectal tumors is characterized by comprising the following steps:
preprocessing an endoscopic OCT image of the colon and the rectum;
marking tissues and tumor regions of the endoscopic OCT image of the colon and rectum, and generating a training set, a verification set and a test set;
constructing a basic network, and adding a multi-scale input feature fusion module based on dilated convolution and a triple attention mechanism module to the basic network;
constructing a weighted mixed Loss function based on the combination of the Dice Loss and the Boundary Loss, and training the model in stages by adjusting the weights;
and carrying out region segmentation on the preprocessed input OCT image through the trained model structure and parameters.
2. The endoscopic OCT image segmentation method for colorectal tumor according to claim 1, wherein the preprocessing of the endoscopic OCT image for colorectal tumor comprises the following steps:
cropping a region of interest of the colorectal endoscopic OCT image and resizing to a uniform size;
carrying out noise reduction on the OCT image by adopting a median filtering method;
and adjusting the gray value of the image by adopting a Gamma nonlinear transformation method, and enhancing the details of the dark part of the image.
3. The endoscopic OCT image segmentation method for colorectal tumor according to claim 1, wherein the step of labeling the tissue and tumor region of the endoscopic OCT image for colorectal tumor and generating training set, verification set and test set comprises the steps of:
marking the tissue levels of the preprocessed colorectal endoscopic OCT image with a labeling tool, wherein the labeled classes are a mixed tissue region of the mucosa layer and the muscularis propria (intrinsic muscle layer), and a tumor region;
and randomly sampling according to a preset proportion to form a training sample set and a test sample set.
4. The method for endoscopic OCT image segmentation of a colorectal tumor according to claim 1, wherein said generation of a training set, a validation set and a test set comprises the following steps:
performing enhancement processing on the labeled samples, with data enhancement operations based on the torchvision.transforms library;
and randomly sampling the enhanced samples at a preset ratio to form a training set and a verification set for model training.
5. The method for endoscopic OCT image segmentation of colorectal tumor according to claim 1, wherein said constructing a basic network and adding, in the basic network, a multi-scale input feature fusion module based on dilated convolution and a triple attention mechanism module comprises the following steps:
taking a Unet network structure as a basic network;
adding a multi-scale fusion module in an Encoder structure of the Unet network;
and adding a triple attention mechanism module in different layers of a Decoder structure of the Unet network, wherein the input and output sizes of the triple attention mechanism module are consistent.
6. The method for endoscopic OCT image segmentation of colorectal tumor according to claim 5, wherein adding a multi-scale fusion module in the Encoder structure of Unet network comprises the following steps:
adding a multi-scale fusion module at a second layer of an Encoder structure of the Unet network: performing an AvePool operation on the upper-layer input image, and performing several DilatedConv operations to generate several groups of output feature maps with different receptive fields, each group containing several feature maps; fusing the obtained channel feature map with the channel feature map passed down after the first-layer pooling operation to obtain enhanced features;
adding a multi-scale fusion module at the third layer of the Encoder structure of the Unet network, and merging the obtained channel feature map with the channel feature map passed down after the second-layer operations;
adding a multi-scale fusion module at the fourth layer of the Encoder structure of the Unet network, merging the obtained channel feature map with the channel feature map passed down after the third-layer operations, and performing the subsequent convolution operations;
the method for adding the triple attention mechanism module in different layers of the Decoder structure of the Unet network comprises the following steps:
constructing a Mixed-Pool module: performing a MaxPool maximum pooling operation and an AvePool average pooling operation on the input tensor in the channel dimension respectively, and finally merging the two pooling results;
constructing a Basic-Conv module, wherein the Basic-Conv module consists of Conv, Batch Norm and Sigmoid, and performing the Conv, Batch Norm and Sigmoid operations on the input tensor in sequence;
constructing a triple attention mechanism module, wherein the triple attention mechanism module consists of three branches, each branch constructed from a Mixed-Pool module and a Conv module; constructing a first branch module: given an input tensor of size (C, H, W), performing a Permute (0,2,1,3) dimension transformation on the H and C dimensions so that the tensor of shape (C, H, W) becomes one of shape (H, C, W), performing the Mixed-Pool operation to obtain a tensor of shape (2, C, W), performing the Conv operation to obtain an attention weight of shape (C, W), multiplying the attention weight by the dimension-transformed tensor to obtain attention-weighted feature values, performing the same Permute (0,2,1,3) dimension transformation again on the obtained feature values to recover the shape (C, H, W), and obtaining the attention-weighted first branch feature map Map1;
constructing a second branch module for C and W dimension interaction, with the dimension transformation operation being Permute (0,3,2,1), to obtain the attention-weighted second branch feature map Map2;
constructing a third branch module, and carrying out dimension interaction between H and W to obtain a third branch feature Map3 weighted by attention;
weighting and merging the three branches, respectively weighting and summing the attention feature values of the three branches;
and adding an attention mechanism module, and adding the three-branch attention mechanism module to the Decoder structure before the UpConv deconvolution operations of the second layer, the third layer and the fourth layer.
7. The method for endoscopic OCT image segmentation of colorectal tumor according to claim 1, wherein the construction of the weighted mixed Loss function based on the combination of the Dice Loss and the Boundary Loss, and the training of the model in stages by adjusting the weights, comprises the following steps:
defining the Dice Loss function, whose inputs are the model prediction result Prediction and the real label Target: converting Prediction into probability values through a Softmax operation, converting Target into one-hot codes, computing |X_gt ∩ Y_pred| and |X_gt| + |Y_pred| respectively to obtain the Dice Loss of each Batch, and then taking the average of the Dice Loss over all Batches as the final Dice Loss value;
defining the Boundary Loss function, whose inputs are the model prediction result Prediction and the real label Target: converting Prediction into probability values through a Softmax operation, converting Target into one-hot codes, first calculating the boundaries according to the boundary calculation formula, then calculating the expanded boundaries according to the expanded-boundary calculation formula, calculating the boundary prediction precision and recall, computing the boundary loss L_BF1c of each region class for each Batch, and then taking the average of the Boundary Loss over all Batches as the final Boundary Loss value;
defining a weighted mixed Loss function, and performing weighted addition of the Dice Loss function and the Boundary Loss function.
8. An electronic device, comprising: a memory having program code stored thereon; a processor coupled with the memory and implementing the method of any of claims 1 to 7 when the program code is executed by the processor.
9. A computer-readable storage medium, having stored thereon program instructions which, when executed, implement the method of any one of claims 1 to 7.
10. A computer program product comprising computer programs/instructions, characterized in that the computer programs/instructions, when executed by a processor, implement the method according to any of claims 1 to 7.
CN202210988273.6A 2022-08-17 2022-08-17 Endoscopic OCT image segmentation method, device, medium and product for colorectal tumor Pending CN115272283A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210988273.6A CN115272283A (en) 2022-08-17 2022-08-17 Endoscopic OCT image segmentation method, device, medium and product for colorectal tumor

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210988273.6A CN115272283A (en) 2022-08-17 2022-08-17 Endoscopic OCT image segmentation method, device, medium and product for colorectal tumor

Publications (1)

Publication Number Publication Date
CN115272283A true CN115272283A (en) 2022-11-01

Family

ID=83753806

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210988273.6A Pending CN115272283A (en) 2022-08-17 2022-08-17 Endoscopic OCT image segmentation method, device, medium and product for colorectal tumor

Country Status (1)

Country Link
CN (1) CN115272283A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118485834A (en) * 2024-07-12 2024-08-13 西南医科大学附属医院 Tumor segmentation method based on artificial intelligence



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination