CN112767406B - Deep convolution neural network training method for corneal ulcer segmentation and segmentation method - Google Patents

Deep convolution neural network training method for corneal ulcer segmentation and segmentation method

Info

Publication number
CN112767406B
CN112767406B (application CN202110140538.2A)
Authority
CN
China
Prior art keywords
module
global
neural network
corneal ulcer
segmentation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110140538.2A
Other languages
Chinese (zh)
Other versions
CN112767406A (en)
Inventor
陈新建
王婷婷
朱伟芳
陈中悦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou University
Original Assignee
Suzhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou University filed Critical Suzhou University
Priority to CN202110140538.2A priority Critical patent/CN112767406B/en
Publication of CN112767406A publication Critical patent/CN112767406A/en
Application granted granted Critical
Publication of CN112767406B publication Critical patent/CN112767406B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/10 Segmentation; Edge detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/30 Subject of image; Context of image processing
    • G06T 2207/30004 Biomedical image processing
    • G06T 2207/30041 Eye; Retina; Ophthalmic

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses a deep convolutional neural network suitable for corneal ulcer segmentation in fluorescein-stained slit-lamp images, and relates to the technical field of medical image segmentation. The network improves the segmentation accuracy of corneal ulcers.

Description

Deep convolution neural network training method for corneal ulcer segmentation and segmentation method
Technical Field
The application relates to a deep convolutional neural network training method for corneal ulcer segmentation and a corresponding segmentation method, and belongs to the technical field of medical image segmentation.
Background
Corneal ulcers are common corneal diseases and the main cause of corneal blindness. Fluorescein is the most widely used diagnostic dye in optometry and ophthalmology for assessing ocular surface integrity, particularly corneal integrity. Under fluorescein staining, ulcerated areas appear bright green while the other areas of the cornea appear blue or brown, so fluorescein staining is a common method for diagnosing corneal ulcers in many ophthalmic examinations. However, dot-sheet mixed corneal ulcers and sheet-like corneal ulcers vary widely in size and shape, causing inter-lesion and intra-lesion inconsistency that degrades segmentation accuracy, especially for tiny punctate lesion areas. Detecting and segmenting corneal ulcer regions in slit-lamp images is therefore very challenging.
In recent years, researchers have proposed several semi-automatic and automatic methods for segmenting corneal ulcer lesions. Most of these algorithms are complicated, their segmentation accuracy is limited, and smaller punctate lesion areas are often missed.
Disclosure of Invention
The application aims to provide a deep convolutional neural network suitable for corneal ulcer segmentation in fluorescein-stained slit-lamp images, to solve the problems in the prior art.
In order to achieve the above purpose, the present application provides the following technical solutions:
According to a first aspect, an embodiment of the present application provides a deep convolutional neural network suitable for corneal ulcer segmentation in fluorescein-stained slit-lamp images, the deep convolutional neural network comprising:
a U-shaped encoder-decoder convolutional neural network, a multi-scale adaptive deformable module, and a global multi-scale pyramid feature aggregation module, wherein the multi-scale adaptive deformable module is embedded at the top of the U-shaped encoder-decoder convolutional neural network, the global multi-scale pyramid feature aggregation module is embedded in the inter-layer skip connections of the U-shaped encoder-decoder convolutional neural network, and the deep convolutional neural network is used for segmenting fluorescein-stained slit-lamp corneal ulcer images.
Optionally, the multi-scale adaptive deformable module includes: a parallel and deformable convolution module, a multi-global spatial attention module, a multi-global channel attention module, and an adaptive residual module.
Optionally, the parallel and deformable convolution module includes n convolution branches and a deformable convolution layer, where each of the n convolution branches compresses the channels to obtain n sets of extracted features, the deformable convolution layer processes the n concatenated features to obtain output features, and the output features are fed to the multi-global spatial attention module and the multi-global channel attention module, where n is an integer greater than 1.
Optionally, the multi-global spatial attention module performs two-dimensional average pooling and maximum pooling on the input features, where the average pooling yields the mean of all channels at each position of the input feature map and the maximum pooling extracts the spatial response information of each channel in the feature map; the outputs of the two pooling operations are concatenated, the concatenated features are compressed through a convolution layer, and the compressed features are normalized by a Sigmoid function to obtain context information in the spatial dimension of the original feature mapping.
Optionally, the multi-global channel attention module is configured to calculate the global channel maximum and the global channel average of each channel's feature map, concatenate the global-channel-maximum and global-channel-average feature maps, smooth and compress the concatenated result through a convolution layer, reshape the result and feed it to a fully connected layer, and apply a Sigmoid function to obtain the weight of each feature map.
Optionally, the adaptive residual module is configured to multiply the features output by the multi-global spatial attention module and the multi-global channel attention module by learnable coefficients, add the products to the output of the deformable convolution, smooth the sum by a convolution, and add the smoothed features to the original features to form a residual mechanism.
Optionally, the global multi-scale pyramid feature aggregation module takes the current-stage feature map and a low-level feature map as input, processes the low-level feature map and resizes it to the size of the current feature map, superimposes and processes the current feature map and the resized low-level feature map, adjusts the processed features to the channel number of the decoder layer, and feeds the processed features to the decoder.
Optionally, the global multi-scale pyramid feature aggregation module processes the superimposed features through a squeeze-and-excitation module comprising a squeeze unit and an excitation unit: the squeeze unit obtains global information by passing the superimposed features through a global average pooling layer, the excitation unit processes this global information to obtain a weight coefficient for each feature map, and the processed features are obtained from the input features and their corresponding weight coefficients.
In a second aspect, a method for training a deep convolutional neural network for corneal ulcer segmentation is provided, the method comprising:
obtaining a sample corneal ulcer fluorescence staining image, wherein the sample corneal ulcer fluorescence staining image comprises a dot-sheet mixed corneal ulcer image and a sheet-like corneal ulcer image;
training a deep convolutional neural network for corneal ulcer segmentation from the sample corneal ulcer fluorescence staining image, the deep convolutional neural network being a network as described in the first aspect.
Optionally, before training the deep convolutional neural network for corneal ulcer segmentation according to the sample corneal ulcer fluorescence staining image, the method further comprises:
downsampling the sample corneal ulcer fluorescence staining image by bilinear interpolation;

normalizing the downsampled sample corneal ulcer fluorescence staining image;
the training of the deep convolutional neural network for corneal ulcer segmentation according to the sample corneal ulcer fluorescence staining image comprises the following steps: training the deep convolutional neural network according to the normalized sample cornea ulcer fluorescent staining image.
In a third aspect, a method of corneal ulcer segmentation is provided, the method comprising:
obtaining a fluorescent staining image of the corneal ulcer;
and segmenting the corneal ulcer fluorescence staining image with the trained deep convolutional neural network, wherein the deep convolutional neural network is trained by the training method of the second aspect.
A sample corneal ulcer fluorescence staining image is obtained, comprising dot-sheet mixed corneal ulcer images and sheet-like corneal ulcer images, and a corneal ulcer segmentation model is trained from the sample images. The model comprises a U-shaped encoder-decoder convolutional neural network, a multi-scale adaptive deformable module and a global multi-scale pyramid feature aggregation module; the multi-scale adaptive deformable module is embedded at the top of the U-shaped encoder-decoder network, the global multi-scale pyramid feature aggregation module is embedded in its inter-layer skip connections, and the model is used for segmenting fluorescein-stained slit-lamp corneal ulcer images. In other words, adding a multi-scale adaptive deformable module and a global multi-scale pyramid feature aggregation module to the U-shaped encoder-decoder convolutional neural network improves the network's ability to learn global semantic information, guides the model to focus on the multi-scale features of the target, and promotes the aggregation of context information, thereby improving the segmentation accuracy of the segmentation network.
The foregoing is only an overview of the technical solutions of the present application; for a clearer understanding of the technical means of the application, reference is made to the following description of preferred embodiments together with the accompanying drawings.
Drawings
FIG. 1 is a schematic diagram of the overall structure of a CU-SegNet network according to one embodiment of the present application;
FIG. 2 is a schematic diagram of a residual module according to an embodiment of the present application;
FIG. 3 is a schematic structural diagram of a multi-scale adaptive deformable module according to an embodiment of the present application;
FIG. 4 is a schematic diagram of an SE module according to one embodiment of the present application;
FIG. 5 is a flow chart of a method for training a corneal ulcer segmentation model according to one embodiment of the present application;
FIG. 6 is a schematic representation of a fluorescent staining image of a sample corneal ulcer provided in accordance with one embodiment of the present application;
FIG. 7 is a schematic diagram of segmentation results obtained by the corneal ulcer segmentation model trained by the method of the present application and by various existing methods;
FIG. 8 is a flow chart of a corneal ulcer segmentation method according to an embodiment of the present application.
Detailed Description
The following describes the technical solutions of the embodiments of the present application clearly and completely with reference to the accompanying drawings; the described embodiments are only some, not all, of the embodiments of the application. All other embodiments obtained by those skilled in the art based on the embodiments of the application without inventive effort fall within the scope of the application.
In the description of the present application, it should be noted that the directions or positional relationships indicated by the terms "center", "upper", "lower", "left", "right", "vertical", "horizontal", "inner", "outer", etc. are based on the directions or positional relationships shown in the drawings, are merely for convenience of describing the present application and simplifying the description, and do not indicate or imply that the devices or elements referred to must have a specific orientation, be configured and operated in a specific orientation, and thus should not be construed as limiting the present application. Furthermore, the terms "first," "second," and "third" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.
In the description of the present application, it should be noted that, unless explicitly specified and limited otherwise, the terms "mounted," "connected," and "connected" are to be construed broadly, and may be either fixedly connected, detachably connected, or integrally connected, for example; can be mechanically or electrically connected; can be directly connected or indirectly connected through an intermediate medium, and can be communication between two elements. The specific meaning of the above terms in the present application will be understood in specific cases by those of ordinary skill in the art.
In addition, the technical features of the different embodiments of the present application described below may be combined with each other as long as they do not collide with each other.
The application provides a deep convolutional neural network suitable for corneal ulcer segmentation in fluorescein-stained slit-lamp images. The network comprises a U-shaped encoder-decoder convolutional neural network, a multi-scale adaptive deformable module and a global multi-scale pyramid feature aggregation module; the multi-scale adaptive deformable module is embedded at the top of the U-shaped encoder-decoder convolutional neural network, the global multi-scale pyramid feature aggregation module is embedded in its inter-layer skip connections, and the network is used for segmenting fluorescein-stained slit-lamp corneal ulcer images.
The U-shaped convolutional neural network may be CU-SegNet, whose U-shaped encoder-decoder structure consists of an encoder path and a decoder path. Referring to FIG. 1, the encoder path uses a pretrained ResNet-34 as the encoder and has 5 encoder blocks in total: the first encoder block contains a 7×7 convolution layer with stride 2, a batch normalization layer and a 3×3 max-pooling layer with stride 2, and each of the remaining four encoder blocks contains 3 residual blocks and a 1×1 convolution layer with stride 2. Each residual block contains two 3×3 convolution layers with stride 1 (each followed by a batch normalization layer), see FIG. 2. A shortcut connection between the input and the output avoids vanishing gradients and accelerates network convergence. The decoder path recovers the high-level semantic features of the input, and the skip connections compensate for the information loss caused by repeated convolution and pooling operations. The decoder path has 5 decoder blocks; four of them contain a 1×1 convolution layer, a 3×3 deconvolution layer with stride 2 and a 1×1 convolution layer (each convolution followed by a batch normalization layer). The last decoder block, which restores the image size, contains one 4×4 deconvolution with stride 1 and two 3×3 convolution layers with stride 1.
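As a concrete illustration, the following is a minimal PyTorch sketch of one of the four intermediate decoder blocks; the channel counts and the placement of ReLU activations are assumptions, since the text specifies only the layer types, kernel sizes and strides:

```python
import torch.nn as nn

class DecoderBlock(nn.Module):
    """One intermediate decoder block: 1x1 conv -> 3x3 transposed conv with
    stride 2 (doubles the spatial resolution) -> 1x1 conv, each convolution
    followed by batch normalization. ReLU placement is an assumption."""
    def __init__(self, in_ch: int, mid_ch: int, out_ch: int):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, mid_ch, kernel_size=1, bias=False),
            nn.BatchNorm2d(mid_ch), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(mid_ch, mid_ch, kernel_size=3, stride=2,
                               padding=1, output_padding=1, bias=False),
            nn.BatchNorm2d(mid_ch), nn.ReLU(inplace=True),
            nn.Conv2d(mid_ch, out_ch, kernel_size=1, bias=False),
            nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.block(x)
```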
With reference to FIG. 1, the present application embeds a global multi-scale pyramid feature aggregation module (MGPA) and a multi-scale adaptive deformable module (MAD) in the U-shaped network structure; the specific structure of each module is as follows.
Referring to fig. 3, the multi-scale adaptive deformable module includes: parallel and deformable convolution modules, a multi-global spatial attention module, a multi-global channel attention module, and an adaptive residual module. Wherein:
the parallel and deformable convolution module comprises n convolution branches and a deformable convolution layer, wherein each convolution branch in the n convolution branches is used for compressing a channel to obtain n extracted features, the deformable convolution layer processes the n features input in series to obtain output features, and the output features are input to the multi-global-space attention module and the multi-global-channel attention module, wherein n is an integer greater than 1.
For example, in one possible implementation, the parallel and deformable convolution module comprises four parallel convolution branches and one deformable convolution layer. First, the four branches compress the channels to reduce computational cost. Each branch applies a 1×1 standard convolution followed by a 3×3 dilated convolution with dilation rate 1, 3, 5 or 7, giving receptive fields of size 3, 11, 19 and 27, respectively. The resulting feature maps are then concatenated and fed into a deformable convolution layer, which enhances the spatial sampling locations in the module by adding extra offsets of the convolution kernel size in the horizontal and vertical directions. Finally, the feature map output by the module is fed to the structurally parallel multi-global spatial attention module and multi-global channel attention module.
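The following PyTorch sketch illustrates this module under stated assumptions: the per-branch channel width and the auxiliary convolution that predicts the deformable-kernel offsets are implementation choices not specified in the text:

```python
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d

class ParallelDeformableConv(nn.Module):
    """Four parallel branches (1x1 conv + 3x3 dilated conv, dilation rates
    1/3/5/7) compress the channels; their outputs are concatenated and passed
    through a deformable convolution whose offsets are predicted by an
    auxiliary 3x3 convolution (the offset predictor is an assumption)."""
    def __init__(self, in_ch: int, branch_ch: int):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(in_ch, branch_ch, kernel_size=1, bias=False),
                nn.Conv2d(branch_ch, branch_ch, kernel_size=3,
                          padding=rate, dilation=rate, bias=False),
            )
            for rate in (1, 3, 5, 7)
        ])
        cat_ch = 4 * branch_ch
        # 2 offsets (dx, dy) per tap of the 3x3 deformable kernel
        self.offset = nn.Conv2d(cat_ch, 2 * 3 * 3, kernel_size=3, padding=1)
        self.deform = DeformConv2d(cat_ch, cat_ch, kernel_size=3, padding=1)

    def forward(self, x):
        feats = torch.cat([branch(x) for branch in self.branches], dim=1)
        return self.deform(feats, self.offset(feats))
```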
The multi-global spatial attention module performs two-dimensional average pooling and maximum pooling on the input features. Maximum pooling extracts the most significant spatial response information in each channel of the feature map, but it may also introduce noise because lesions differ in size and shape. Average pooling yields the mean of all channels at each position of the input feature map; it suppresses some of the noise in the channels, but it also suppresses the most salient spatial response information across all channels. Therefore, to obtain the most significant spatial response of each channel while suppressing noise, the module applies both operations to the input feature map, producing a max-pooling branch (H×W, where H and W denote the height and width of the feature map) and an average-pooling branch (H×W). The two branch results are concatenated into a feature map of size 2×H×W, which a subsequent convolution layer compresses to 1×H×W. Finally, a Sigmoid function normalizes the feature map to the range 0-1 (H×W), yielding context information in the spatial dimension of the original feature mapping while suppressing noise interference.
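A minimal PyTorch sketch of this spatial attention branch follows; the kernel size of the compressing convolution is an assumption, as the text does not specify it:

```python
import torch
import torch.nn as nn

class MultiGlobalSpatialAttention(nn.Module):
    """Channel-wise average and maximum (each 1 x H x W) are concatenated,
    compressed to one channel by a convolution, and normalized with a
    Sigmoid to yield a spatial attention map in [0, 1]."""
    def __init__(self, kernel_size: int = 7):  # kernel size is an assumption
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size,
                              padding=kernel_size // 2, bias=False)

    def forward(self, x):                        # x: B x C x H x W
        avg = x.mean(dim=1, keepdim=True)        # average over channels
        mx, _ = x.max(dim=1, keepdim=True)       # maximum over channels
        return torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))
```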
The multi-global channel attention module calculates the global channel maximum and the global channel average of each channel's feature map, concatenates the two descriptors, smooths and compresses the concatenated result through a convolution layer, reshapes the result, feeds it to a fully connected layer, and applies a Sigmoid function to obtain the weight of each feature map.

Specifically, the feature maps are fed into two parallel branches that compute the per-channel maximum and average, respectively. The global channel maximum descriptor (C×1×1, where C is the number of channels) and the global channel average descriptor (C×1×1) are concatenated, and the concatenated result is smoothed and compressed by a convolution layer. Finally, the result is reshaped to C×1, fed into a fully connected layer, and passed through a Sigmoid function to obtain the weight of each feature map. The module thus obtains the response of each channel's feature map while suppressing noise interference.
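A corresponding sketch of the channel attention branch; the exact shapes of the fusing convolution and the single fully connected layer are assumptions about details the text leaves open:

```python
import torch
import torch.nn as nn

class MultiGlobalChannelAttention(nn.Module):
    """Per-channel global average and maximum (each C x 1 x 1) are stacked,
    fused by a convolution, passed through a fully connected layer and a
    Sigmoid to produce one weight per channel."""
    def __init__(self, channels: int):
        super().__init__()
        self.fuse = nn.Conv1d(2, 1, kernel_size=1, bias=False)
        self.fc = nn.Linear(channels, channels)

    def forward(self, x):                         # x: B x C x H x W
        b, c, _, _ = x.shape
        avg = x.mean(dim=(2, 3))                  # B x C
        mx = x.amax(dim=(2, 3))                   # B x C
        desc = torch.stack([avg, mx], dim=1)      # B x 2 x C
        w = self.fuse(desc).squeeze(1)            # B x C
        return torch.sigmoid(self.fc(w)).view(b, c, 1, 1)
```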
The adaptive residual module multiplies the features output by the multi-global spatial attention module and the multi-global channel attention module by learnable coefficients, adds the products to the output of the deformable convolution, smooths the sum by convolution, and adds the smoothed features to the original features to form a residual mechanism.

The feature maps of the multi-global spatial attention module (X2) and the multi-global channel attention module (X3) are each weighted by a coefficient, combined with the output of the deformable convolution (X1), and smoothed by a convolution. The learnable parameters λ and γ are initialized to 1.0. This process can be summarized as:

Output = Conv(X1 + λ·X1·X2 + γ·X1·X3)

Finally, the smoothed feature map is added directly to the original feature map to construct the residual mechanism.
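Under the notation above, the combination might be sketched as follows; the channel count and kernel size of the smoothing convolution are assumptions:

```python
import torch
import torch.nn as nn

class AdaptiveResidual(nn.Module):
    """Output = Conv(X1 + lambda*X1*X2 + gamma*X1*X3), followed by a residual
    addition with the original input feature map."""
    def __init__(self, channels: int):
        super().__init__()
        self.lam = nn.Parameter(torch.tensor(1.0))   # initialized to 1.0
        self.gam = nn.Parameter(torch.tensor(1.0))   # initialized to 1.0
        self.smooth = nn.Conv2d(channels, channels, 3, padding=1, bias=False)

    def forward(self, x_orig, x1, x2, x3):
        # x1: deformable conv output (B x C x H x W)
        # x2: spatial attention map (B x 1 x H x W), broadcast over channels
        # x3: channel attention weights (B x C x 1 x 1), broadcast spatially
        fused = x1 + self.lam * x1 * x2 + self.gam * x1 * x3
        return self.smooth(fused) + x_orig           # residual mechanism
```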
The global multi-scale pyramid feature aggregation module takes the current-stage feature map and a low-level feature map as input, processes the low-level feature map and resizes it to the size of the current feature map, superimposes and processes the current feature map and the resized low-level feature map, adjusts the processed features to the channel number of the decoder layer, and feeds them to the decoder.

The module processes the superimposed features through a squeeze-and-excitation module comprising a squeeze unit and an excitation unit: the squeeze unit obtains global information by passing the superimposed features through a global average pooling layer, the excitation unit processes this global information to obtain a weight coefficient for each feature map, and the processed features are obtained from the input features and their corresponding weight coefficients.
Specifically, with reference to FIG. 1, the original skip connections in the U-shaped network create a semantic gap. To solve this problem, the application designs a multi-scale global pyramid feature aggregation module, embedded in the skip connections, to supplement the high-level feature maps with local feature information. The module takes the current-stage feature map F2 (C2×H2×W2) and a low-level feature map F1 (C1×H1×W1) as input. First, the F1 feature map is fed into max-pooling layers with pooling strides of 2, 3 and 6, respectively, to capture targets of different sizes and shapes; each pooled result is then passed through a 1×1 convolution layer and resized by bilinear interpolation to the same size as the current-stage feature map. At the same time, F1 is fed through a separate 1×1 convolution layer and resized to the size of F2 to supplement more local feature information directly. All these results are superimposed with the current-stage feature map F2. Finally, the aggregated feature map ((4C1+C2)×H2×W2) is fed through a 1×1 convolution layer and the Squeeze-and-Excitation (SE) module to adjust the number of channels to match the corresponding decoder layer. The module thereby supplements the high-level feature map with local feature information. The SE module consists of two parts, squeeze and excitation, as shown in FIG. 4. The squeeze operation is performed by a global average pooling layer; its result has dimension C×1×1 and describes the value distribution of the layer's C feature maps, i.e., the global information, where C is the number of channels. The excitation operation first applies a 1×1 convolution to the squeezed result, producing an output of dimension C/r×1×1, where r is the compression ratio, and passes it through a ReLU activation layer, which keeps the dimension unchanged; a second 1×1 convolution restores the dimension to C×1×1, and a Sigmoid function then yields the weight coefficients of the C feature maps. These C weight coefficients, learned through the convolution operations and non-linear layers, represent the importance of each channel; the two 1×1 convolutions fuse the features of each channel's feature map. Finally, the input features are multiplied pointwise by the obtained weights to select important feature information and suppress irrelevant information.
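A sketch of the aggregation path and the SE module described above; the reduction ratio r and the ordering of the SE module relative to the final 1×1 convolution are assumptions based on one reading of the text:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SEBlock(nn.Module):
    """Squeeze (global average pooling) and excitation (1x1 conv -> ReLU ->
    1x1 conv -> Sigmoid), then channel-wise rescaling of the input."""
    def __init__(self, channels: int, r: int = 16):  # r is an assumption
        super().__init__()
        self.fc1 = nn.Conv2d(channels, max(channels // r, 1), 1)
        self.fc2 = nn.Conv2d(max(channels // r, 1), channels, 1)

    def forward(self, x):
        w = F.adaptive_avg_pool2d(x, 1)                    # global info, C x 1 x 1
        w = torch.sigmoid(self.fc2(F.relu(self.fc1(w))))   # per-channel weights
        return x * w

class MGPA(nn.Module):
    """Max-pooling pyramid (strides 2/3/6) plus a direct path on F1, bilinear
    resizing to F2's resolution, concatenation ((4*C1 + C2) channels), and
    SE + 1x1 conv to match the decoder layer's channel count."""
    def __init__(self, c1: int, c2: int, out_ch: int):
        super().__init__()
        self.pyramid = nn.ModuleList([nn.Conv2d(c1, c1, 1) for _ in range(3)])
        self.direct = nn.Conv2d(c1, c1, 1)
        self.se = SEBlock(4 * c1 + c2)
        self.out = nn.Conv2d(4 * c1 + c2, out_ch, 1)

    def forward(self, f1, f2):
        size = f2.shape[2:]
        feats = [f2]
        for stride, conv in zip((2, 3, 6), self.pyramid):
            p = F.max_pool2d(f1, kernel_size=stride, stride=stride)
            feats.append(F.interpolate(conv(p), size=size,
                                       mode='bilinear', align_corners=False))
        feats.append(F.interpolate(self.direct(f1), size=size,
                                   mode='bilinear', align_corners=False))
        return self.out(self.se(torch.cat(feats, dim=1)))
```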
Referring to fig. 5, a flowchart of a method for training a deep convolutional neural network for corneal ulcer segmentation according to an embodiment of the present application is shown, and the method includes:
step 501, obtaining a sample corneal ulcer fluorescence staining image, wherein the sample corneal ulcer fluorescence staining image comprises a dot-sheet mixed corneal ulcer image and a sheet corneal ulcer image;
the sample corneal ulcer fluorescent staining image may be a gold standard corneal ulcer image. In practice, the resolution of the fluorescent dye image of each sample corneal ulcer may be 2592×1728 pixels.
For example, please refer to fig. 6, which shows a schematic diagram of several possible fluorescent staining images of a sample corneal ulcer.
In addition, the sample corneal ulcer fluorescence staining images in the present application may come from the SUSTech-SYSU dataset, which is used not only to develop and evaluate automated corneal ulcer segmentation algorithms but also to identify general and specific ulcer patterns and ulcer severity.
The application trains the model in an integrated PyTorch environment with one Tesla K40 GPU (Graphics Processing Unit) with 12 GB of memory.
Alternatively, the sample set may be divided into a training set and a test set: the corneal ulcer segmentation model is trained on the sample fluorescence staining images in the training set and tested on those in the test set. For example, if the sample set contains 354 images, they can be divided into four folds (90, 90, 90, 84) and the model trained with a four-fold cross-validation strategy; after training, the validation set is used to evaluate the model's performance.
Step 502, training the deep convolutional neural network from the sample corneal ulcer fluorescence staining images, wherein the deep convolutional neural network comprises a U-shaped encoder-decoder convolutional neural network, a multi-scale adaptive deformable module and a global multi-scale pyramid feature aggregation module; the multi-scale adaptive deformable module is embedded at the top of the U-shaped encoder-decoder network, the global multi-scale pyramid feature aggregation module is embedded in its inter-layer skip connections, and the network is used for segmenting fluorescein-stained slit-lamp corneal ulcer images.
Optionally, after the sample corneal ulcer fluorescence staining images are obtained, they may be preprocessed, and the training in this step is then performed on the preprocessed images. The preprocessing may include:
(1) Downsampling the sample corneal ulcer fluorescence staining images by bilinear interpolation.

For example, to reduce the computational cost of network training, all sample corneal ulcer fluorescence staining images are downsampled to 512×512×3 by bilinear interpolation.

(2) Normalizing the downsampled sample corneal ulcer fluorescence staining images.
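A minimal sketch of this preprocessing, assuming min-max normalization (the text states only that normalization is applied, not which kind):

```python
import torch
import torch.nn.functional as F

def preprocess(img: torch.Tensor) -> torch.Tensor:
    """Bilinearly downsample a 3 x H x W image (e.g. 3 x 1728 x 2592) to
    3 x 512 x 512 and min-max normalize it to [0, 1] (an assumption)."""
    img = img.float().unsqueeze(0)                 # add batch dim: 1 x 3 x H x W
    img = F.interpolate(img, size=(512, 512), mode='bilinear',
                        align_corners=False).squeeze(0)
    return (img - img.min()) / (img.max() - img.min() + 1e-8)
```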
Of course, in actual implementation, to prevent overfitting and enhance the generalization ability of the model, online data augmentation is performed after the sample corneal ulcer fluorescence staining images are acquired, including rotation from -10 to 10 degrees, horizontal flipping, vertical flipping, Gaussian noise addition, and affine transformation; an example pipeline is sketched below.
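The listed operations could be realized, for example, with torchvision transforms operating on image tensors; the noise level and affine parameters below are assumptions, and in practice the geometric transforms must be applied identically to each image and its mask:

```python
import torch
from torchvision import transforms

class AddGaussianNoise:
    """Additive Gaussian noise; the standard deviation is an assumption."""
    def __init__(self, sigma: float = 0.01):
        self.sigma = sigma

    def __call__(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.sigma * torch.randn_like(x)

augment = transforms.Compose([
    transforms.RandomRotation(degrees=10),     # rotation in [-10, 10] degrees
    transforms.RandomHorizontalFlip(),
    transforms.RandomVerticalFlip(),
    transforms.RandomAffine(degrees=0, translate=(0.05, 0.05)),  # params assumed
    AddGaussianNoise(),
])
```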
In actual implementation, the model can be trained by minimizing the cross-entropy and Dice losses through a back-propagation algorithm, using the Adam optimizer to minimize the cost function, with a base learning rate of 0.0005 and a weight decay of 0.0001. The batch size is set to 4 and the number of epochs to 100.
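A sketch of a training loop under these hyper-parameters; binary cross-entropy is assumed as the concrete form of the cross-entropy term, and the soft Dice loss below is one standard formulation:

```python
import torch
import torch.nn as nn

class DiceLoss(nn.Module):
    """Soft Dice loss for binary segmentation (one standard formulation)."""
    def __init__(self, eps: float = 1e-6):
        super().__init__()
        self.eps = eps

    def forward(self, logits, target):
        prob = torch.sigmoid(logits)
        inter = (prob * target).sum(dim=(2, 3))
        union = prob.sum(dim=(2, 3)) + target.sum(dim=(2, 3))
        return 1 - ((2 * inter + self.eps) / (union + self.eps)).mean()

def train(model, loader, epochs: int = 100):
    # Adam with base learning rate 0.0005 and weight decay 0.0001, as stated
    opt = torch.optim.Adam(model.parameters(), lr=0.0005, weight_decay=0.0001)
    bce, dice = nn.BCEWithLogitsLoss(), DiceLoss()
    for _ in range(epochs):
        for images, masks in loader:     # loader with batch size 4
            opt.zero_grad()
            logits = model(images)
            loss = bce(logits, masks) + dice(logits, masks)
            loss.backward()
            opt.step()
```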
After the corneal ulcer segmentation model is trained, it is validated. The evaluation indexes used in the application are the Dice similarity coefficient (DSC), sensitivity (SEN), specificity (SPE) and Pearson's correlation coefficient (PCC), defined as follows:

Dice = 2TP / (2TP + FP + FN) = 2|X∩Y| / (|X| + |Y|)
SEN = TP / (TP + FN)
SPE = TN / (TN + FP)
PCC = cov(X, Y) / (σX·σY)

wherein TP, FP, TN and FN represent true positives, false positives, true negatives and false negatives, respectively, and X and Y represent the prediction set and the gold-standard set, respectively.
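These definitions translate directly into code; the following sketch computes Dice, sensitivity and specificity from binary masks (PCC is omitted here):

```python
import torch

def seg_metrics(pred: torch.Tensor, gold: torch.Tensor):
    """Dice, sensitivity and specificity from binary 0/1 masks."""
    tp = ((pred == 1) & (gold == 1)).sum().item()
    fp = ((pred == 1) & (gold == 0)).sum().item()
    tn = ((pred == 0) & (gold == 0)).sum().item()
    fn = ((pred == 0) & (gold == 1)).sum().item()
    dice = 2 * tp / (2 * tp + fp + fn)
    sen = tp / (tp + fn)
    spe = tn / (tn + fp)
    return dice, sen, spe
```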
The present application evaluates and compares the original U-shaped encoder-decoder structure with the proposed convolutional neural network CU-SegNet on the test dataset. To demonstrate the effectiveness of the MGPA and MAD modules, a series of ablation experiments were performed; the results are shown in Table 1. The original U-shaped encoder-decoder structure is denoted "Backbone"; adding the MGPA module to it is denoted "Backbone+MGPA"; adding the MAD module is denoted "Backbone+MAD"; and adding both modules, i.e., the method proposed in this application, is denoted "Proposed". The Dice coefficient of the original U-shaped encoder-decoder structure is 87.71%, while the improved network reaches 89.14%, an improvement of 1.43%. Table 1 also shows that the MGPA module and the MAD module each segment more accurately than the original U-shaped encoder-decoder structure.
TABLE 1
FIG. 7 compares the segmentation results of the proposed network with other improved networks based on the U-shaped encoder-decoder architecture. The segmentation accuracy of the application is superior to that before the improvement. In summary, the application provides two modules, MGPA and MAD, to ensure the segmentation accuracy and efficiency for lesions in fluorescein-stained slit-lamp corneal ulcer images.
Thus, a training method for the CU-SegNet neural network, an automatic segmentation method for lesions in fluorescein-stained slit-lamp corneal ulcer images, has been realized and verified. The network outperforms the original U-shaped encoder-decoder convolutional neural network in experiments and makes better predictions for automatic lesion segmentation. Moreover, the attention modules MGPA and MAD designed in the application are not complex and can be embedded into any convolutional neural network, strengthening the network's feature extraction capability, improving its overall performance, and greatly improving the accuracy of automatic segmentation of lesions in fluorescein-stained slit-lamp corneal ulcer images. The application combines image preprocessing, construction and training of the CU-SegNet model, and testing, which greatly facilitates follow-up research on slit-lamp fluorescein-stained corneal ulcer lesions, such as classification of lesion severity.
In summary, a sample corneal ulcer fluorescence staining image is acquired, comprising dot-sheet mixed corneal ulcer images and sheet-like corneal ulcer images, and a deep convolutional neural network is trained from it. The network comprises a U-shaped encoder-decoder convolutional neural network, a multi-scale adaptive deformable module and a global multi-scale pyramid feature aggregation module; the multi-scale adaptive deformable module is embedded at the top of the U-shaped encoder-decoder network, the global multi-scale pyramid feature aggregation module is embedded in its inter-layer skip connections, and the network is used for segmenting fluorescein-stained slit-lamp corneal ulcer images. Adding these two modules to the U-shaped encoder-decoder convolutional neural network improves the network's ability to learn global semantic information, guides the model to focus on the multi-scale features of the target, and promotes the aggregation of context information, thereby improving the segmentation accuracy of the segmentation network.
Referring to FIG. 8, which shows a flowchart of the corneal ulcer segmentation method of the present application, the method includes:
step 801, obtaining a fluorescence staining image of a corneal ulcer;
step 802, segmenting the corneal ulcer fluorescence staining image with the trained deep convolutional neural network.
The deep convolutional neural network is trained by the training method of the embodiment shown in fig. 5.
In summary, a corneal ulcer fluorescence staining image is obtained and segmented with the trained deep convolutional neural network, the network being trained as in the above embodiment. Adding the multi-scale adaptive deformable module and the global multi-scale pyramid feature aggregation module to the U-shaped encoder-decoder convolutional neural network improves the network's ability to learn global semantic information, guides the model to focus on the multi-scale features of the target, and promotes the aggregation of context information, thereby improving the segmentation accuracy of the segmentation network.
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, as long as a combination of technical features contains no contradiction, it should be considered within the scope of this description.
The above examples illustrate only a few embodiments of the application, described in some detail, but they are not to be construed as limiting the scope of the application. It should be noted that those skilled in the art can make several variations and modifications without departing from the spirit of the application, all of which fall within its protection scope. Accordingly, the protection scope of the present application shall be determined by the appended claims.

Claims (5)

1. A method for training a deep convolutional neural network for corneal ulcer segmentation, the method comprising: obtaining a sample corneal ulcer fluorescence staining image, wherein the sample corneal ulcer fluorescence staining image comprises a dot-sheet mixed corneal ulcer image and a sheet-like corneal ulcer image;
training a deep convolutional neural network for corneal ulcer segmentation according to the sample corneal ulcer fluorescent staining image;
the depth convolution neural network comprises a U-shaped encoder decoder convolution neural network, a multi-scale self-adaptive deformable module and a global multi-scale pyramid feature aggregation module, wherein the multi-scale self-adaptive deformable module is embedded into the top end of the U-shaped encoder decoder convolution neural network, the global multi-scale pyramid feature aggregation module is embedded into interlayer jump connection of the U-shaped encoder decoder convolution neural network, and the depth convolution neural network is used for segmenting fluorescent staining images of corneal ulcers of slit lamps;
the multi-scale adaptive deformable module comprises: a parallel and deformable convolution module, a multi-global spatial attention module, a multi-global channel attention module and an adaptive residual module; the parallel and deformable convolution module utilizes parallel convolution branches to carry out channel compression on the original input characteristic graph and processes the characteristics of the serial input through the deformable convolution layer; the output feature map of the module is input to a multi-global-space attention module and the multi-global-channel attention module respectively; the multi-global space attention module performs two-dimensional average pooling and maximum pooling operation on the input features, then splices the features and sends the features into a convolution layer for compression, and normalizes the features through a Sigmoid function to obtain context information in the space dimension of the original feature mapping; the multi-global channel attention module calculates the global channel maximum value and the global channel average value of each channel feature map, and then sends the global channel maximum value and the global channel average value into a full-connection layer, and obtains the weight of each feature map through a Sigmoid function; the self-adaptive residual error module is used for multiplying the characteristics output by the multi-global space attention module and the multi-global channel attention module by coefficients, adding the multiplication result to the output result of the parallel and deformable convolution module, smoothing the added characteristics through convolution, and adding the smoothed characteristics to the original input characteristics to obtain a residual error mechanism.
2. The method according to claim 1, wherein the global multi-scale pyramid feature aggregation module takes the current-stage feature map and a low-level feature map as input, processes the low-level feature map and resizes it to the size of the current feature map, superimposes and processes the current feature map and the resized low-level feature map, adjusts the processed features to the channel number of the decoder layer, and feeds the processed features to the decoder.
3. The deep convolutional neural network training method for corneal ulcer segmentation according to claim 2, wherein the global multi-scale pyramid feature aggregation module processes the superimposed features through a squeeze-and-excitation module comprising a squeeze unit and an excitation unit: the squeeze unit obtains global information by passing the superimposed features through a global average pooling layer, the excitation unit processes this global information to obtain a weight coefficient for each feature map, and the processed features are obtained from the input features and their corresponding weight coefficients.
4. The method of training a deep convolutional neural network for corneal ulcer segmentation according to claim 1, wherein prior to training the deep convolutional neural network for corneal ulcer segmentation from the sample corneal ulcer fluorescence staining image, the method further comprises:
downsampling the sample corneal ulcer fluorescence staining image by bilinear interpolation;

normalizing the downsampled sample corneal ulcer fluorescence staining image;
the training of the deep convolutional neural network for corneal ulcer segmentation according to the sample corneal ulcer fluorescence staining image comprises the following steps:
training the deep convolutional neural network according to the normalized sample cornea ulcer fluorescent staining image.
5. A corneal ulcer segmentation method, the method comprising:
obtaining a fluorescent staining image of the corneal ulcer;
segmenting the corneal ulcer fluorescence staining image with a trained deep convolutional neural network, wherein the deep convolutional neural network is trained by the deep convolutional neural network training method for corneal ulcer segmentation according to any one of claims 1 to 4.
CN202110140538.2A 2021-02-02 2021-02-02 Deep convolution neural network training method for corneal ulcer segmentation and segmentation method Active CN112767406B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110140538.2A CN112767406B (en) 2021-02-02 2021-02-02 Deep convolution neural network training method for corneal ulcer segmentation and segmentation method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110140538.2A CN112767406B (en) 2021-02-02 2021-02-02 Deep convolution neural network training method for corneal ulcer segmentation and segmentation method

Publications (2)

Publication Number Publication Date
CN112767406A CN112767406A (en) 2021-05-07
CN112767406B true CN112767406B (en) 2023-12-12

Family

ID=75704601

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110140538.2A Active CN112767406B (en) 2021-02-02 2021-02-02 Deep convolution neural network training method for corneal ulcer segmentation and segmentation method

Country Status (1)

Country Link
CN (1) CN112767406B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116188918B (en) * 2023-04-27 2023-07-25 上海齐感电子信息科技有限公司 Image denoising method, training method of network model, device, medium and equipment
CN117058160B (en) * 2023-10-11 2024-01-16 湖南大学 Three-dimensional medical image segmentation method and system based on self-adaptive feature fusion network

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110490813A (en) * 2019-07-05 2019-11-22 特斯联(北京)科技有限公司 Characteristic pattern Enhancement Method, device, equipment and the medium of convolutional neural networks
CN110516511A (en) * 2018-05-21 2019-11-29 北京京东尚科信息技术有限公司 Method and apparatus for handling information
CN111652871A (en) * 2020-06-03 2020-09-11 中国科学院宁波工业技术研究院慈溪生物医学工程研究所 Corneal nerve curvature measuring system and method based on IVCM image

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10342502B2 (en) * 2015-11-18 2019-07-09 Lightlab Imaging, Inc. X-ray image feature detection and registration systems and methods
US11138469B2 (en) * 2019-01-15 2021-10-05 Naver Corporation Training and using a convolutional neural network for person re-identification

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110516511A (en) * 2018-05-21 2019-11-29 北京京东尚科信息技术有限公司 Method and apparatus for handling information
CN110490813A (en) * 2019-07-05 2019-11-22 特斯联(北京)科技有限公司 Characteristic pattern Enhancement Method, device, equipment and the medium of convolutional neural networks
CN111652871A (en) * 2020-06-03 2020-09-11 中国科学院宁波工业技术研究院慈溪生物医学工程研究所 Corneal nerve curvature measuring system and method based on IVCM image

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Fast content-aware resizing of multi-layer information visualization via adaptive triangulation; Chenhui Li et al.; ScienceDirect; full text *

Also Published As

Publication number Publication date
CN112767406A (en) 2021-05-07

Similar Documents

Publication Publication Date Title
CN110197493B (en) Fundus image blood vessel segmentation method
CN112949565B (en) Single-sample partially-shielded face recognition method and system based on attention mechanism
CN111340789A (en) Method, device, equipment and storage medium for identifying and quantifying eye fundus retinal blood vessels
CN112767406B (en) Deep convolution neural network training method for corneal ulcer segmentation and segmentation method
CN110751636B (en) Fundus image retinal arteriosclerosis detection method based on improved coding and decoding network
CN112258488A (en) Medical image focus segmentation method
CN112116009B (en) New coronal pneumonia X-ray image identification method and system based on convolutional neural network
CN113239755B (en) Medical hyperspectral image classification method based on space-spectrum fusion deep learning
CN111709446B (en) X-ray chest radiography classification device based on improved dense connection network
CN112017185A (en) Focus segmentation method, device and storage medium
CN112784849B (en) Glandular segmentation method based on multi-scale attention selection
CN113012163A (en) Retina blood vessel segmentation method, equipment and storage medium based on multi-scale attention network
CN114140651A (en) Stomach focus recognition model training method and stomach focus recognition method
CN117058676B (en) Blood vessel segmentation method, device and system based on fundus examination image
CN115223193B (en) Capsule endoscope image focus identification method based on focus feature importance
CN115471470A (en) Esophageal cancer CT image segmentation method
CN116543429A (en) Tongue image recognition system and method based on depth separable convolution
CN112489062B (en) Medical image segmentation method and system based on boundary and neighborhood guidance
CN114418999B (en) Retinopathy detection system based on lesion attention pyramid convolution neural network
US20220156929A1 (en) Medical image analyzing system and method thereof
CN116416452A (en) Lung adenocarcinoma invasive intelligent classification system based on two-stage deep learning model
CN113724262B (en) CNV segmentation method for choroidal neovascularization in retina OCT image
CN115330600A (en) Lung CT image super-resolution method based on improved SRGAN
CN112862089B (en) Medical image deep learning method with interpretability
CN113139627B (en) Mediastinal lump identification method, system and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant