CN113205094A - Tumor image segmentation method and system based on ORSU-Net - Google Patents
- Publication number: CN113205094A
- Application number: CN202110389723.5A
- Authority
- CN
- China
- Prior art keywords
- segmentation
- network
- orsu
- net
- module
- Prior art date
- Legal status: Pending (an assumption, not a legal conclusion; Google has not performed a legal analysis)
Classifications
- G06V 10/267 - Segmentation of patterns in the image field by performing operations on regions, e.g. growing, shrinking or watersheds
- G06F 18/253 - Fusion techniques of extracted features
- G06N 3/04 - Neural networks; architecture, e.g. interconnection topology
- G06N 3/08 - Neural networks; learning methods
- G06V 10/30 - Image preprocessing; noise filtering
- G06V 10/44 - Local feature extraction by analysis of parts of the pattern, e.g. edges, contours, corners
- G06V 10/462 - Salient features, e.g. scale-invariant feature transforms [SIFT]
- G06V 20/41 - Higher-level, semantic clustering, classification or understanding of video scenes
- G06V 20/46 - Extracting features or characteristics from video content
- G06V 20/49 - Segmenting video sequences
- G06V 2201/03 - Recognition of patterns in medical or anatomical images
Abstract
The invention discloses a tumor image segmentation method and system based on ORSU-Net. Octave convolution (OctConv) replaces conventional convolution and is combined with U-Net to form a new ORSU-Net network for tumor segmentation; applying ORSU-Net to frames before and after image enhancement makes better use of local and global context information to improve segmentation. ORSU-Net follows the basic structure of U-Net: features are extracted by convolution operations, and downsampled encoding extracts feature information of the cystoscope tumor data at different scales. A decoding module in the second half of the convolutional neural network then upsamples the downsampled cystoscope tumor data to recover its spatial dimensions and restore detail; meanwhile, the middle part of the network is linked by skip connections, passing low-level information to the deep layers. Compared with conventional convolution, octave convolution further reduces computation and memory consumption while improving segmentation accuracy.
Description
Technical Field
The invention belongs to the intersection of image segmentation and medical engineering, and particularly relates to an ORSU-Net-based tumor segmentation method and system.
Background
Bladder cancer is the ninth most common malignancy worldwide, and the standard means of diagnosing and detecting it still relies on white-light cystoscopy; over a million cystoscopies are performed annually worldwide. The high recurrence rate of bladder tumors requires frequent monitoring and intervention by medical personnel. Papillary tumors and flat lesions that are hard to detect under a white-light cystoscope can be detected under a blue-light cystoscope. Although blue-light cystoscopy improves tumor detection, it requires preoperative intravesical instillation of hexaminolevulinate and a special fluorescent cystoscope, so it should be used in moderation. There is therefore a need for a low-cost, non-invasive, real-time, easy-to-use auxiliary imaging technique that remedies the diagnostic shortcomings of white-light cystoscopy.
Many researchers have proposed automated segmentation systems using existing techniques. Early systems were based on conventional methods, relying mainly on edge-detection filters and other mathematical techniques.
Since the 2000s, thanks to advances in computer hardware, deep learning has come to the fore and demonstrated powerful capabilities in image-processing tasks. With the development of deep-learning-based automatic image processing, the limitations of tumor identification in white-light cystoscopy can be addressed by deep-learning methods. Convolutional neural networks can learn complex relationships and incorporate existing knowledge into an inference model, and hold great promise. Medical images are characterized by relatively simple semantics, fairly fixed structure, small data volume, and multiple modalities, so the network must be well designed to extract features of the different modalities and achieve good results. U-Net, proposed in 2015, performs well on medical segmentation: by concatenating encoder features onto decoder features it achieves good results even with little data, but U-Net is not deep enough to represent features accurately, so in some cases its effect is limited.
Disclosure of Invention
To address the limitations of white-light cystoscopy, we use convolutional neural networks to develop a deep-learning algorithm that enhances the tumor-detection capability of white-light cystoscopy. A cystoscope tumor-detection assistant is developed so that a doctor can perform real-time tumor detection and segmentation while performing cystoscopy, preventing missed detections.
In order to achieve this purpose, the invention provides the following technical scheme: an ORSU-Net-based tumor image segmentation method comprising the following steps:
step 1), data set acquisition: collecting medical cystoscopy videos, selecting bladder-tumor key frames from the video stream to make training samples, labeling the key frames with irregular outlines, and assigning each pixel in the image to its corresponding category to achieve pixel-level classification;
step 2), preprocessing the data set: applying image enhancement and image denoising preprocessing to the prepared data set so as to address segmentation difficulties encountered during testing;
step 3), training the network model: constructing a new ORSU-Net segmentation network model, training it with the training samples generated in step 2, and generating a prediction mask;
the structure of the ORSU-Net segmentation network model comprises, connected in sequence: input layer - second combination module - output layer;
the second combination module comprises an encoder module and a decoder module; the encoder module comprises n1 octave convolution layers and n2 downsampling layers, the decoder module is symmetric to the encoder module and comprises n2 upsampling layers and n1 octave convolution layers, and local features and multi-scale features are fused through residual connection;
step 4), calculating the loss function: computing the loss between the predicted tumor segmentation and the ground-truth tumor segmentation;
step 5), optimizing the network: taking the loss function as the optimization objective, the segmentation network model participates in gradient back-propagation during network optimization, realizing optimized tumor image segmentation.
Further, in step 2), a dual residual network is used to denoise the data set, implemented as follows:
the network structure of the dual residual network comprises: two input convolution layers - n3 dual residual modules for motion-blur removal - two output convolution layers, with the result of the second output convolution layer residual-connected to the input;
the structure of a dual residual module for motion-blur removal is set as: a first convolution layer - an upsampling layer and a second convolution layer - a downsampling layer, wherein the second convolution layer has kernel size k1 and dilation coefficient d, and the downsampling layer is a convolution layer with kernel size k2 and stride 2.
Further, the dual residual network includes six dual residual modules.
Further, the expression of the loss function is:

L = Σ_{h=1}^{H} (λ_BCE · l_BCE^{(h)} + λ_KL · l_KL^{(h)})

where H denotes the number of network layers, and the functional expressions of l_BCE and l_KL are respectively:

l_BCE = −Σ_{i=1}^{M} Σ_{j=1}^{N} [G(i,j) log S(i,j) + (1 − G(i,j)) log(1 − S(i,j))]

l_KL = Σ_{i=1}^{M} Σ_{j=1}^{N} G(i,j) log(G(i,j) / S(i,j))

where (i,j) is the pixel coordinate and (M,N) are the width and height of the image; G(i,j) and S(i,j) are respectively the ground-truth and predicted target segmentation pixel values, and λ_BCE and λ_KL are respectively the weights of the BCE loss function and the KL loss function.
The invention also provides an ORSU-Net-based tumor image segmentation system, which comprises the following modules:
the data set acquisition module is used for collecting medical cystoscopy videos, selecting bladder-tumor key frames from the video stream to make training samples, labeling the key frames with irregular outlines, and assigning each pixel in the image to its corresponding category to achieve pixel-level classification;
the data set preprocessing module is used for applying image enhancement and image denoising preprocessing to the prepared data set so as to address segmentation difficulties encountered during testing;
the network model training module is used for constructing a new ORSU-Net segmentation network model, training the constructed network model by using the training samples generated in the data set preprocessing module, and generating a prediction mask;
the structure of the ORSU-Net segmentation network model comprises the following components in sequential connection: input layer-second combination module-output layer;
the second combination module comprises an encoder module and a decoder module, wherein the encoder module comprises n1 octave convolutional layers and n2 downsample layers, the decoder module is symmetrical to the encoder module and comprises n2 upsample layers and n1 octave convolutional layers, and the local feature and the multi-scale feature are fused through residual connection;
the loss function calculation module is used for calculating the loss between the pre-training tumor prediction segmentation result and the tumor segmentation truth value;
and the network optimization module is used for taking the loss function as an optimization target function, so that the segmentation network model participates in the gradient back propagation process in network optimization, and the optimization of tumor image segmentation is realized.
Further, a dual residual network is used in the data set preprocessing module to denoise the data set, implemented as follows:
the network structure of the dual residual network comprises: two input convolution layers - n3 dual residual modules for motion-blur removal - two output convolution layers, with the result of the second output convolution layer residual-connected to the input;
the structure of a dual residual module for motion-blur removal is set as: a first convolution layer - an upsampling layer and a second convolution layer - a downsampling layer, wherein the second convolution layer has kernel size k1 and dilation coefficient d, and the downsampling layer is a convolution layer with kernel size k2 and stride 2.
Further, the dual residual network includes six dual residual modules.
Further, the expression of the loss function is:

L = Σ_{h=1}^{H} (λ_BCE · l_BCE^{(h)} + λ_KL · l_KL^{(h)})

where H denotes the number of network layers, and the functional expressions of l_BCE and l_KL are respectively:

l_BCE = −Σ_{i=1}^{M} Σ_{j=1}^{N} [G(i,j) log S(i,j) + (1 − G(i,j)) log(1 − S(i,j))]

l_KL = Σ_{i=1}^{M} Σ_{j=1}^{N} G(i,j) log(G(i,j) / S(i,j))

where (i,j) is the pixel coordinate and (M,N) are the width and height of the image; G(i,j) and S(i,j) are respectively the ground-truth and predicted target segmentation pixel values, and λ_BCE and λ_KL are respectively the weights of the BCE loss function and the KL loss function.
Compared with the prior art, the invention has the following advantages and beneficial effects:
By using octave convolution, the invention decomposes the output feature map of a convolution layer into high- and low-frequency feature maps stored in separate groups. The spatial resolution of the low-frequency group can be safely reduced, cutting spatial redundancy, and performing the low-frequency convolution operation on the low-frequency information effectively enlarges the receptive field in pixel space. Compared with conventional convolution, this further reduces computation and memory consumption while improving segmentation accuracy.
The method extracts multi-scale features from progressively downsampled feature maps and encodes them into high-resolution feature maps by progressive upsampling, concatenation, and convolution. This process reduces the loss of detail caused by direct large-factor upsampling. Local and multi-scale features are fused by residual connections, so the network can extract features at multiple scales directly within each residual block.
Drawings
FIG. 1 is a flow chart of an embodiment of the present invention.
FIG. 2 is a schematic diagram of an ORSU-Net network structure according to the present invention.
Fig. 3 is a schematic diagram of the basic structure of the residual error network.
Fig. 4 is a schematic diagram of a dual residual block structure.
Fig. 5 is a schematic diagram of a dual residual network structure.
Fig. 6 is a schematic diagram of the principle of octave convolution.
Detailed Description
The technical solution of the present invention is further explained with reference to the drawings and the embodiments.
The invention replaces conventional convolution with OctConv (octave convolution) and combines it with U-Net, providing a new ORSU-Net (octave residual U-Net) network for tumor segmentation; the new ORSU-Net is applied to frames before and after image enhancement so as to better exploit local and global context information and improve segmentation. ORSU-Net follows the basic structure of U-Net: it extracts features through convolution operations, and extracts feature information of the cystoscope tumor data at different scales through downsampled encoding. A decoding module in the second half of the convolutional neural network then upsamples the downsampled cystoscope tumor data to recover its spatial dimensions and restore detail; meanwhile, the middle part of the network is linked by skip connections, passing low-level information to the deep layers. The input convolution layer uses OctConv (octave convolution) instead of ordinary convolution to extract local features. Compared with conventional convolution, octave convolution further reduces computation and memory consumption while improving segmentation accuracy. The U-Net-like structure inherits the advantages of U-Net, and local and multi-scale features are fused by residual connection.
The method specifically comprises the following steps:
1) Data set acquisition: collect medical cystoscopy videos, select bladder-tumor key frames from the video stream to make training samples, label the samples with irregular outlines, and assign each pixel in the image to its corresponding class to achieve pixel-level classification;
The collected cystoscopy videos are screened to obtain key frames containing tumors, which are labeled with irregular outlines to determine the size and position of the tumor. The training-set images used in this implementation are 1280 × 720 frames; sub-images of a specified size are extracted around the labeled tumor position as training samples, overcoming the inconsistent specifications of different videos and the data imbalance between normal tissue and tumor in the original images.
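The sub-image extraction step above can be sketched as follows; the 256 × 256 patch size, the `extract_patch` helper name, and centring on the labeled tumor position are illustrative assumptions (the patent only specifies 1280 × 720 source frames and sub-images of a specified size):

```python
import numpy as np

def extract_patch(frame, cx, cy, size=256):
    """Crop a size x size training patch centred on the labelled tumor
    position (cx, cy), clamping the window to the frame borders so the
    patch never leaves the 1280 x 720 image."""
    h, w = frame.shape[:2]
    x0 = min(max(cx - size // 2, 0), w - size)
    y0 = min(max(cy - size // 2, 0), h - size)
    return frame[y0:y0 + size, x0:x0 + size]

frame = np.zeros((720, 1280, 3), dtype=np.uint8)   # one key frame
patch = extract_patch(frame, cx=40, cy=700)        # tumor near a corner
print(patch.shape)  # (256, 256, 3)
```

Clamping rather than padding keeps every patch filled with real image content, which matters when the tumor is labeled near the frame border.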
2) Preprocessing the data set: apply preprocessing such as image enhancement and image denoising to the prepared data set so as to address segmentation difficulties encountered during testing;
In the embodiment of the invention, a dual residual network (Dual Residual Networks) is used to denoise the data set; the dual residual network for image denoising is implemented as follows:
A dual residual network DuRN-U for motion-blur removal is built, structured sequentially as: two input convolution layers - six dual residual modules for motion-blur removal - two output convolution layers, with the result of the second output convolution layer residual-connected to the input.
The dual residual module for motion-blur removal is structured sequentially as: a convolution layer with a 3 × 3 kernel - upsampling plus a convolution layer (kernel size k1, dilation coefficient d) - downsampling (a convolution layer with kernel size k2 and stride 2), with the result of the output convolution layer connected to the input. The parameters of the six dual residual modules in the DuRN-U network are respectively: k1=5, k2=3, d=1; k1=7, k2=5, d=1; k1=7, k2=5, d=2; k1=11, k2=7, d=2; k1=11, k2=5, d=1; k1=11, k2=7, d=3.
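A minimal single-channel sketch of this dual-residual wiring is given below; the fixed 3 × 3 averaging kernels standing in for learned convolutions, the nearest-neighbour upsampling, and the exact placement of the two residual additions are assumptions for illustration, not the patent's trained network:

```python
import numpy as np

def conv3x3(x, dilation=1):
    # Zero-padded 3x3 convolution on a single-channel map, with a fixed
    # averaging kernel standing in for learned weights.
    d = dilation
    p = np.pad(x.astype(float), d)
    h, w = x.shape
    out = np.zeros((h, w))
    for dy in (0, d, 2 * d):
        for dx in (0, d, 2 * d):
            out += p[dy:dy + h, dx:dx + w]
    return out / 9.0

def upsample2(x):
    # Nearest-neighbour 2x upsampling.
    return x.repeat(2, axis=0).repeat(2, axis=1)

def durb(x, dilation=1):
    # Two leading conv layers, then the paired operations:
    # T1 = upsampling + dilated conv (kernel size simplified to 3x3),
    # T2 = stride-2 downsampling conv; one residual addition after each.
    h = conv3x3(conv3x3(x))
    h = conv3x3(upsample2(h), dilation=dilation)   # T1
    h = h + upsample2(x)                           # first residual connection
    h = conv3x3(h)[::2, ::2]                       # T2: stride-2 conv
    return h + x                                   # second residual connection

y = durb(np.ones((8, 8)), dilation=2)
print(y.shape)  # (8, 8)
```

Because T1 doubles the spatial size and T2 halves it again, both residual additions operate on matching shapes, which is what lets the block be repeated six times without changing resolution.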
The basic structure of the residual network is shown in fig. 3(a): a network of three modules f1, f2, f3 has 2^3 = 8 paths from input to output: f1 → f2 → f3, f1 → f2, f1 → f3, f2 → f3, f1, f2, f3, and the identity. Each module acts as a computation unit that can be attached to or detached from the host network. Exploiting this property, paired connection operations are introduced. Denoting the paired operations by f and g, one configuration of the present invention treats each pair (f_i, g_i) as a unit module, as shown in fig. 3(b); in this connection, f_i and g_i are paired on every path. To gain a further performance improvement, another structure called "dual residual connection" is considered, as shown in fig. 3(d). This structure allows f_i and g_j to be paired for any i ≤ j; for example, six combinations occur over all possible paths: (f1, g1), (f2, g2), (f3, g3), (f1, g2), (f1, g3), and (f2, g3). Increasing the potential connections between {f_i} and {g_i} in this way improves performance on image-restoration tasks while ensuring that f and g remain paired on every possible path; other connection structures, such as fig. 3(c), do not guarantee this pairing. A module that guarantees the f-g pairing is called a dual residual block (DuRB), as shown in fig. 4. The DuRB is a generic structure containing two containers for the paired operations, which can be chosen according to the usage scenario; for different tasks, the paired operations of the DuRB and the overall network are specified accordingly.
The basic structure of the dual residual block is shown in fig. 4(a), where c denotes a convolution layer (3 × 3 kernel), and T1^l and T2^l denote the containers for the first and second paired operations of the l-th dual residual block in the network. Normalization and ReLU layers are inserted where necessary. For the case shown in fig. 4(b), the operations T1^l and T2^l are specified as [upsampling + convolution, stride-2 downsampling]; this module mainly targets restoration of motion-blurred images.
The entire dual residual network adopts a symmetric encoding-decoding structure, as shown in fig. 5. The network consists of an initial block performing 4:1 downsampling on the input image, followed by two convolution operations with a normalization layer + ReLU (n + r), six repeated DuRB modules, and finally 1:2 upsampling that restores the output of the last DuRB to the original size.
3) Training the network model: construct the new ORSU-Net network model provided by the invention, train it with the training samples generated in step 2, and generate a prediction mask;
the ORSU-Net segmentation network constructed by the invention is further described in detail with reference to FIG. 2:
An ORSU-Net segmentation network with length L = 7 is built, structured sequentially as: input layer - second combination module - output layer.
The second combination module is similar in structure to U-Net, consisting of downsampling convolution layers and the corresponding upsampling convolution layers.
The encoder module comprises 7 octave convolution layers and 5 downsampling layers, with 32, 64, 128, 256, and 512 feature maps respectively; the decoder module is symmetric to the encoder and comprises 5 upsampling layers and 7 octave convolution layers, with 512, 256, 128, 64, and 32 feature maps respectively. The input-layer output is connected to the decoder input, and local features and multi-scale features are fused by residual connection.
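The channel plan and resolutions implied by this encoder-decoder symmetry can be traced with a short sketch; the 256 × 256 input resolution, the halving of resolution per downsampling layer, and the `orsu_shape_trace` helper are assumptions for illustration:

```python
def orsu_shape_trace(h=256, w=256, widths=(32, 64, 128, 256, 512)):
    """Trace (channels, height, width) through the symmetric
    encoder/decoder: each encoder stage is followed by 2x downsampling,
    each decoder stage by an upsampling layer restoring the mirrored
    encoder resolution."""
    enc = []
    for c in widths[:-1]:
        enc.append((c, h, w))
        h, w = h // 2, w // 2            # downsampling layer
    bottom = (widths[-1], h, w)          # deepest feature maps
    dec = [(c, eh, ew)
           for c, (_, eh, ew) in zip(reversed(widths[:-1]), reversed(enc))]
    return enc, bottom, dec

enc, bottom, dec = orsu_shape_trace()
print(bottom)    # (512, 16, 16): deepest resolution
print(dec[-1])   # (32, 256, 256): restored to input resolution
```

The mirrored widths are what allow each decoder stage to be concatenated with (and residually connected to) the encoder stage of the same resolution.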
Referring to fig. 2, the main structure of the network is that in the ORSU-Net of the present invention, the input convolution layer uses octave convolution instead of the conventional convolution to perform local feature extraction, and the input feature X (H × W × C) is extractedin) Conversion to intermediate feature F with Cout channel1(x);
ORSU-Net is a U-Net-like symmetric codec structure of length L; the deeper the structure, the larger the value of L. In the invention, the ORSU-Net network has length L = 7; the encoder module comprises 7 octave convolution layers and 5 downsampling layers, with 32, 64, 128, 256, and 512 feature maps respectively; the decoder module is symmetric to the encoder and comprises 5 upsampling layers and 7 octave convolution layers, with 512, 256, 128, 64, and 32 feature maps respectively. Local features and multi-scale features are fused by residual connection: F1(x) + U(F1(x)), where U denotes the U-Net-like encoder-decoder.
Referring to fig. 6, a detailed diagram of the implementation of octave convolution, which is composed of four computation paths corresponding to four terms: f(X^H; W^{H→H}), upsample(f(X^L; W^{L→H})), f(X^L; W^{L→L}), and f(pool(X^H, 2); W^{H→L}); the two solid paths correspond to the intra-frequency information updates of the high- and low-frequency feature maps, and the two dashed paths facilitate information exchange between the two octaves.
Octave convolution decomposes the output feature map of a convolution layer into high- and low-frequency feature maps stored in different groups, analogous to the decomposition of a natural image into spatial-frequency components. It can safely reduce the spatial resolution of the low-frequency group and reduce spatial redundancy through information sharing between adjacent locations. In addition, by performing the low-frequency convolution operation on the low-frequency information, octave convolution effectively enlarges the receptive field in pixel space. Using octave convolution can therefore further reduce the computational and memory overhead of the network while retaining the design advantages of U-Net.
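A pointwise (1 × 1) sketch of this four-path computation is given below; the function name `octconv1x1`, the nearest-neighbour upsampling, and the 2 × 2 average pooling are illustrative assumptions consistent with the description above:

```python
import numpy as np

def octconv1x1(xh, xl, w_hh, w_hl, w_lh, w_ll):
    """Pointwise octave convolution. xh: (C_h, H, W) high-frequency map;
    xl: (C_l, H/2, W/2) low-frequency map. Four paths: H->H and L->L
    (solid, intra-frequency updates); L->H and H->L (dashed, exchange)."""
    def pw(w, x):          # 1x1 convolution = per-pixel channel mixing
        return np.einsum('oc,chw->ohw', w, x)
    def pool2(x):          # 2x2 average pooling
        return 0.25 * (x[:, ::2, ::2] + x[:, 1::2, ::2]
                       + x[:, ::2, 1::2] + x[:, 1::2, 1::2])
    def up2(x):            # nearest-neighbour 2x upsampling
        return x.repeat(2, axis=1).repeat(2, axis=2)
    yh = pw(w_hh, xh) + up2(pw(w_lh, xl))     # f(X^H;W^{H->H}) + upsample(f(X^L;W^{L->H}))
    yl = pw(w_ll, xl) + pw(w_hl, pool2(xh))   # f(X^L;W^{L->L}) + f(pool(X^H,2);W^{H->L})
    return yh, yl

xh, xl = np.ones((4, 8, 8)), np.ones((2, 4, 4))
yh, yl = octconv1x1(xh, xl,
                    w_hh=np.ones((3, 4)), w_hl=np.ones((1, 4)),
                    w_lh=np.ones((3, 2)), w_ll=np.ones((1, 2)))
print(yh.shape, yl.shape)  # (3, 8, 8) (1, 4, 4)
```

Note that the low-frequency branch carries half the spatial resolution throughout, which is where the computation and memory savings come from.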
The proposed ORSU-Net and U-Net have similar structures and can capture multi-scale features of the image without sacrificing high-resolution features. The biggest difference is that ORSU-Net replaces ordinary convolution with octave convolution, which, compared with conventional convolution, further reduces computation and memory consumption while improving segmentation accuracy.
4) Calculating the loss function: compute the loss between the predicted tumor segmentation and the ground-truth tumor segmentation;
During training, the invention adopts a layered training-supervision strategy in place of the standard top-level supervision and deep-supervision schemes; the tumor segmentation loss function is:

L = Σ_{h=1}^{H} (λ_BCE · l_BCE^{(h)} + λ_KL · l_KL^{(h)})

where H denotes the number of network layers, and the functional expressions of l_BCE and l_KL are respectively:

l_BCE = −Σ_{i=1}^{M} Σ_{j=1}^{N} [G(i,j) log S(i,j) + (1 − G(i,j)) log(1 − S(i,j))]

l_KL = Σ_{i=1}^{M} Σ_{j=1}^{N} G(i,j) log(G(i,j) / S(i,j))

where (i,j) is the pixel coordinate and (M,N) are the width and height of the image; G(i,j) and S(i,j) are respectively the ground-truth and predicted target segmentation pixel values, and λ_BCE and λ_KL are respectively the weights of the BCE loss function and the KL loss function.
For each layer, the standard BCE loss function and the KL loss function are used to calculate the loss. Adding a probabilistic prediction-matching penalty (i.e., the KL loss function) between any two layers facilitates multi-layer interaction between different layers. Since the loss functions of different layers share a consistent optimization target, the robustness and generalization of the model are ensured.
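The layered supervision above can be sketched in code. This is a hedged sketch: the uniform pairwise KL weighting and the values of w_bce and w_kl are illustrative assumptions, not parameters stated in the invention.

```python
import numpy as np

def bce_loss(g, s, eps=1e-7):
    """Binary cross-entropy between ground truth g and prediction s."""
    s = np.clip(s, eps, 1 - eps)
    return -np.sum(g * np.log(s) + (1 - g) * np.log(1 - s))

def kl_loss(s_a, s_b, eps=1e-7):
    """KL-divergence penalty matching the predictions of two layers."""
    s_a = np.clip(s_a, eps, 1 - eps)
    s_b = np.clip(s_b, eps, 1 - eps)
    return np.sum(s_a * np.log(s_a / s_b))

def layered_loss(g, preds, w_bce=1.0, w_kl=0.1):
    """Hierarchical supervision: per-layer BCE plus pairwise KL penalties.

    g:     (M, N) ground-truth mask.
    preds: list of H per-layer (M, N) probability maps.
    w_bce, w_kl: loss weights (illustrative values, not from the source).
    """
    total = sum(w_bce * bce_loss(g, s) for s in preds)
    # Prediction-matching penalty between any two layers, so that all
    # layers are pulled toward a consistent optimization target.
    for a in range(len(preds)):
        for b in range(a + 1, len(preds)):
            total += w_kl * kl_loss(preds[a], preds[b])
    return total
```

The KL term vanishes when two layers agree, so it only penalizes disagreement between layer-wise predictions.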
5) Optimizing the network: taking the loss function as the optimization objective function, the segmentation network model participates in the gradient back-propagation process during network optimization, realizing optimization of cystoscope-video tumor segmentation.
The embodiment of the invention also provides an ORSU-Net-based tumor image segmentation system, which comprises the following modules:
the data set acquisition module is used for acquiring a medical cystoscope detection video, selecting bladder tumor key frames from a video stream to make a training sample, labeling the bladder tumor key frames by using irregular frames, and dividing each pixel in an image into corresponding categories to realize pixel-level classification;
the data set preprocessing module is used for carrying out image enhancement and image denoising preprocessing on the prepared data set, so as to reduce segmentation errors during testing;
the network model training module is used for constructing a new ORSU-Net segmentation network model, training the constructed network model by using the training samples generated in the data set preprocessing module, and generating a prediction mask;
the structure of the ORSU-Net segmentation network model comprises the following components in sequential connection: input layer-second combination module-output layer;
the second combination module comprises an encoder module and a decoder module, wherein the encoder module comprises n1 octave convolutional layers and n2 downsample layers, the decoder module is symmetrical to the encoder module and comprises n2 upsample layers and n1 octave convolutional layers, and the local feature and the multi-scale feature are fused through residual connection;
the loss function calculation module is used for calculating the loss between the pre-training tumor prediction segmentation result and the tumor segmentation truth value;
and the network optimization module is used for taking the loss function as an optimization target function, so that the segmentation network model participates in the gradient back propagation process in network optimization, and the optimization of tumor image segmentation is realized.
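The encoder-decoder layout of the ORSU-Net model described by the modules above can be sketched schematically. In this hedged sketch, octconv is an identity placeholder standing in for an octave convolution layer, n2 = 2 is an assumed depth, and the skip and residual connections follow the symmetric structure described in the text.

```python
import numpy as np

def octconv(x):
    # Identity placeholder for an octave convolution layer.
    return x

def down(x):
    """2x average-pool downsampling of an (H, W) map."""
    return x.reshape(x.shape[0] // 2, 2, x.shape[1] // 2, 2).mean(axis=(1, 3))

def up(x):
    """2x nearest-neighbour upsampling of an (H, W) map."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

def orsu_block(x, n2=2):
    """Symmetric encoder/decoder with skip connections and a residual add."""
    h = octconv(x)            # input transform
    inp = h
    skips = []
    for _ in range(n2):       # encoder: octave conv, keep skip, downsample
        h = octconv(h)
        skips.append(h)
        h = down(h)
    h = octconv(h)            # bottleneck
    for _ in range(n2):       # decoder: upsample, fuse skip, octave conv
        h = octconv(up(h) + skips.pop())
    return inp + h            # residual fusion of local and multi-scale features
```

The final residual add is how the local feature (input transform) and the multi-scale feature (decoder output) are fused.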
Further, a dual residual network is used in the data set preprocessing module to perform noise reduction processing on the data set, and the specific implementation mode is as follows;
the network structure of the dual residual network comprises: two input convolutional layers - n3 dual residual modules for motion blur removal - two output convolutional layers, wherein the result of the second output convolutional layer is residual-connected with the input;
the structure of a dual residual module for removing motion blur is set as follows: a first convolutional layer - an upsampling layer and a second convolutional layer - a downsampling layer, wherein the second convolutional layer has a convolution kernel size of k1 and a dilation coefficient of d, and the downsampling layer is a convolutional layer with a convolution kernel size of k2 and a stride of 2.
Further, the dual residual network includes six dual residual modules.
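The dual residual structure described above can be sketched as follows. This is a hedged sketch: the convolutions are identity placeholders (abstracting kernel size k1, dilation d, and the stride-2 kernel-size-k2 downsampling convolution), so only the connection pattern of the modules is illustrated.

```python
import numpy as np

def conv(x):
    # Identity placeholder for a convolution (kernel size k1, dilation d).
    return x

def up(x):
    """2x nearest-neighbour upsampling."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

def down(x):
    # Stands in for the stride-2 convolution with kernel size k2.
    return x[::2, ::2]

def dual_residual_module(x):
    """Paired units: conv + upsample, then conv + downsample, with the
    module output added back to its input as a residual."""
    t = up(conv(x))
    t = down(conv(t))
    return x + t

def dual_residual_network(x, n3=6):
    """Two input convs, n3 dual residual modules for motion-blur removal,
    two output convs, and a global input-to-output residual connection."""
    h = conv(conv(x))
    for _ in range(n3):
        h = dual_residual_module(h)
    return x + conv(conv(h))
```

The upsample/downsample pair inside each module is the "dual" paired operation; the global residual lets the stack learn only the correction to the degraded input.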
Further, the expression of the loss function is as follows;
L = \sum_{h=1}^{H} \left( w_{BCE}^{(h)} \, l_{BCE}^{(h)} + w_{KL}^{(h)} \, l_{KL}^{(h)} \right)

where H denotes the number of network layers, and the functional expressions of l_{BCE} and l_{KL} are respectively:

l_{BCE} = -\sum_{i=1}^{M} \sum_{j=1}^{N} \left[ G(i,j) \log S(i,j) + (1 - G(i,j)) \log(1 - S(i,j)) \right]

l_{KL} = \sum_{i=1}^{M} \sum_{j=1}^{N} S_a(i,j) \log \frac{S_a(i,j)}{S_b(i,j)}

wherein (i, j) is the pixel coordinate, (M, N) are the width and height of the image, G(i, j) and S(i, j) are the true value and the predicted target segmentation pixel value respectively, S_a and S_b are the probabilistic predictions of two different layers, and w_{BCE} and w_{KL} are the weights of the BCE loss function and the KL loss function respectively.
The specific embodiments described herein are merely illustrative of the spirit of the invention. Various modifications or additions may be made to the described embodiments or alternatives may be employed by those skilled in the art without departing from the spirit or ambit of the invention as defined in the appended claims.
Claims (8)
1. An ORSU-Net-based tumor image segmentation method is characterized by comprising the following steps:
step 1), data set acquisition: collecting a medical cystoscope detection video, selecting a bladder tumor key frame from a video stream to make a training sample, labeling the bladder tumor key frame by using an irregular frame, and dividing each pixel in an image into corresponding categories to realize pixel-level classification;
step 2), preprocessing a data set: carrying out image enhancement and image denoising pretreatment on the manufactured data set so as to solve the segmentation problem in the test process;
step 3), training a network model: constructing a new ORSU-Net segmentation network model, training the constructed network model by using the training samples generated in the step 2, and generating a prediction mask;
the structure of the ORSU-Net segmentation network model comprises the following components in sequential connection: input layer-second combination module-output layer;
the second combination module comprises an encoder module and a decoder module, wherein the encoder module comprises n1 octave convolutional layers and n2 downsample layers, the decoder module is symmetrical to the encoder module and comprises n2 upsample layers and n1 octave convolutional layers, and the local feature and the multi-scale feature are fused through residual connection;
step 4), calculating a loss function: calculating the loss between the pre-training tumor prediction segmentation result and the tumor segmentation truth value;
step 5), optimizing the network: and taking the loss function as an optimization objective function, and enabling the segmentation network model to participate in a gradient back propagation process in network optimization to realize optimization of tumor image segmentation.
2. The ORSU-Net based tumor image segmentation method of claim 1, wherein: in the step 2), a dual residual network is used for carrying out noise reduction processing on the data set, and the specific implementation mode is as follows;
the network structure of the dual residual network comprises: two input convolutional layers - n3 dual residual modules for motion blur removal - two output convolutional layers, wherein the result of the second output convolutional layer is residual-connected with the input;
the structure of a dual residual module for removing motion blur is set as follows: a first convolutional layer - an upsampling layer and a second convolutional layer - a downsampling layer, wherein the second convolutional layer has a convolution kernel size of k1 and a dilation coefficient of d, and the downsampling layer is a convolutional layer with a convolution kernel size of k2 and a stride of 2.
3. The ORSU-Net based tumor image segmentation method of claim 2, wherein: the dual residual network includes six dual residual modules.
4. The ORSU-Net based tumor image segmentation method of claim 1, wherein: the expression of the loss function is as follows;
L = \sum_{h=1}^{H} \left( w_{BCE}^{(h)} \, l_{BCE}^{(h)} + w_{KL}^{(h)} \, l_{KL}^{(h)} \right)

where H denotes the number of network layers, and the functional expressions of l_{BCE} and l_{KL} are respectively:

l_{BCE} = -\sum_{i=1}^{M} \sum_{j=1}^{N} \left[ G(i,j) \log S(i,j) + (1 - G(i,j)) \log(1 - S(i,j)) \right]

l_{KL} = \sum_{i=1}^{M} \sum_{j=1}^{N} S_a(i,j) \log \frac{S_a(i,j)}{S_b(i,j)}

wherein (i, j) is the pixel coordinate, (M, N) are the width and height of the image, G(i, j) and S(i, j) are the true value and the predicted target segmentation pixel value respectively, S_a and S_b are the probabilistic predictions of two different layers, and w_{BCE} and w_{KL} are the weights of the BCE loss function and the KL loss function respectively.
5. An ORSU-Net based tumor image segmentation system, comprising the following modules:
the data set acquisition module is used for acquiring a medical cystoscope detection video, selecting bladder tumor key frames from a video stream to make a training sample, labeling the bladder tumor key frames by using irregular frames, and dividing each pixel in an image into corresponding categories to realize pixel-level classification;
the data set preprocessing module is used for carrying out image enhancement and image denoising preprocessing on the manufactured data set so as to solve the segmentation problem in the test process;
the network model training module is used for constructing a new ORSU-Net segmentation network model, training the constructed network model by using the training samples generated in the data set preprocessing module, and generating a prediction mask;
the structure of the ORSU-Net segmentation network model comprises the following components in sequential connection: input layer-second combination module-output layer;
the second combination module comprises an encoder module and a decoder module, wherein the encoder module comprises n1 octave convolutional layers and n2 downsample layers, the decoder module is symmetrical to the encoder module and comprises n2 upsample layers and n1 octave convolutional layers, and the local feature and the multi-scale feature are fused through residual connection;
the loss function calculation module is used for calculating the loss between the pre-training tumor prediction segmentation result and the tumor segmentation truth value;
and the network optimization module is used for taking the loss function as an optimization target function, so that the segmentation network model participates in the gradient back propagation process in network optimization, and the optimization of tumor image segmentation is realized.
6. An ORSU-Net based tumor image segmentation system according to claim 5, wherein: the data set preprocessing module uses a dual residual network to perform noise reduction processing on the data set, and the specific implementation mode is as follows;
the network structure of the dual residual network comprises: two input convolutional layers - n3 dual residual modules for motion blur removal - two output convolutional layers, wherein the result of the second output convolutional layer is residual-connected with the input;
the structure of a dual residual module for removing motion blur is set as follows: a first convolutional layer - an upsampling layer and a second convolutional layer - a downsampling layer, wherein the second convolutional layer has a convolution kernel size of k1 and a dilation coefficient of d, and the downsampling layer is a convolutional layer with a convolution kernel size of k2 and a stride of 2.
7. An ORSU-Net based tumor image segmentation system according to claim 6, wherein: the dual residual network includes six dual residual modules.
8. An ORSU-Net based tumor image segmentation system according to claim 5, wherein: the expression of the loss function is as follows;
L = \sum_{h=1}^{H} \left( w_{BCE}^{(h)} \, l_{BCE}^{(h)} + w_{KL}^{(h)} \, l_{KL}^{(h)} \right)

where H denotes the number of network layers, and the functional expressions of l_{BCE} and l_{KL} are respectively:

l_{BCE} = -\sum_{i=1}^{M} \sum_{j=1}^{N} \left[ G(i,j) \log S(i,j) + (1 - G(i,j)) \log(1 - S(i,j)) \right]

l_{KL} = \sum_{i=1}^{M} \sum_{j=1}^{N} S_a(i,j) \log \frac{S_a(i,j)}{S_b(i,j)}
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110389723.5A CN113205094A (en) | 2021-04-12 | 2021-04-12 | Tumor image segmentation method and system based on ORSU-Net |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113205094A true CN113205094A (en) | 2021-08-03 |
Family
ID=77026560
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110389723.5A Pending CN113205094A (en) | 2021-04-12 | 2021-04-12 | Tumor image segmentation method and system based on ORSU-Net |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113205094A (en) |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111627019A (en) * | 2020-06-03 | 2020-09-04 | 西安理工大学 | Liver tumor segmentation method and system based on convolutional neural network |
Non-Patent Citations (2)
Title |
---|
Chenjie Wang et al.: "U^2-ONet: A Two-level Nested Octave U-structure with Multiscale Attention Mechanism for Moving Instances Segmentation", arXiv:2007.13092v1 *
Xing Liu et al.: "Dual Residual Networks Leveraging the Potential of Paired Operations for Image Restoration", arXiv:1903.08817v1 *
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113781410A (en) * | 2021-08-25 | 2021-12-10 | 南京邮电大学 | Medical image segmentation method and system based on MEDU-Net + network |
CN113781410B (en) * | 2021-08-25 | 2023-10-13 | 南京邮电大学 | Medical image segmentation method and system based on MEDU-Net+network |
CN114612479A (en) * | 2022-02-09 | 2022-06-10 | 苏州大学 | Medical image segmentation method based on global and local feature reconstruction network |
CN115908831A (en) * | 2022-11-18 | 2023-04-04 | 中国人民解放军军事科学院系统工程研究院 | Image detection method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 20210803 |