CN113205094A - Tumor image segmentation method and system based on ORSU-Net - Google Patents
- Publication number: CN113205094A
- Application number: CN202110389723.5A
- Authority
- CN
- China
- Prior art keywords
- segmentation
- network
- orsu
- net
- module
- Prior art date
- Legal status: Pending (an assumption, not a legal conclusion; Google has not performed a legal analysis)
Classifications
- G06V 10/267 - Segmentation of patterns in the image field by performing operations on regions, e.g. growing, shrinking or watersheds
- G06F 18/253 - Fusion techniques of extracted features
- G06N 3/04 - Neural networks; architecture, e.g. interconnection topology
- G06N 3/08 - Neural networks; learning methods
- G06V 10/30 - Image preprocessing; noise filtering
- G06V 10/44 - Local feature extraction by analysis of parts of the pattern, e.g. edges, contours, corners
- G06V 10/462 - Salient features, e.g. scale-invariant feature transforms [SIFT]
- G06V 20/41 - Higher-level, semantic clustering, classification or understanding of video scenes
- G06V 20/46 - Extracting features or characteristics from video content
- G06V 20/49 - Segmenting video sequences
- G06V 2201/03 - Recognition of patterns in medical or anatomical images
Abstract
The invention discloses a tumor image segmentation method and system based on ORSU-Net. Octave convolution (OctConv) replaces conventional convolution and is combined with U-Net to form a new ORSU-Net network for tumor segmentation; applying ORSU-Net to frames before and after image enhancement makes better use of local and global context information to improve segmentation. ORSU-Net follows the basic structure of U-Net: features are extracted by convolution operations, and downsampled encoding extracts feature information of the cystoscope tumor data at different scales. A decoding module in the second half of the convolutional neural network then upsamples the downsampled cystoscope tumor data to recover its spatial dimensions and restore detail; meanwhile, the middle part of the network is linked by skip connections, passing low-level information to the deep layers. Compared with conventional convolution, octave convolution further reduces computation and memory consumption while improving segmentation accuracy.
Description
Technical Field
The invention belongs to the intersection of image segmentation and medical engineering, and particularly relates to an ORSU-Net-based tumor segmentation method and system.
Background
Bladder cancer is the ninth most common malignancy worldwide, and the standard means of diagnosing and detecting it still relies on white-light cystoscopy; over a million cystoscopies are performed annually worldwide. The high recurrence rate of bladder tumors requires frequent monitoring and intervention by medical personnel. Papillary tumors and flat lesions that are hard to detect under a white-light cystoscope can be detected under a blue-light cystoscope. Although blue-light cystoscopy improves tumor detection, it requires preoperative intravesical instillation of hexaminolevulinate and a special fluorescent cystoscope, so it should be used in moderation. There is therefore a need for a low-cost, non-invasive, real-time, easy-to-use auxiliary imaging technique that remedies the diagnostic shortcomings of white-light cystoscopy.
Many researchers have proposed automated segmentation systems using existing techniques. Early systems were based on conventional methods, relying mainly on edge-detection filters and other mathematical techniques.
Since the 2000s, thanks to advances in computer hardware, deep learning has come to the fore and demonstrated powerful capabilities in image-processing tasks. With the development of deep-learning-based automatic image processing, the limitations of tumor identification in white-light cystoscopy can be addressed by deep-learning methods. Convolutional neural networks can learn complex relationships and incorporate existing knowledge into an inference model, and hold great promise. Medical images are characterized by relatively simple semantics, fairly fixed structure, small data volume, and multiple modalities, so the network must be well designed to extract features of the different modalities and achieve good results. U-Net, proposed in 2015, performs well on medical segmentation: by concatenating encoder features onto decoder features it achieves good results even with little data, but U-Net is not deep enough to represent features accurately, so in some cases its effect is limited.
Disclosure of Invention
To address the limitations of white-light cystoscopy, we use convolutional neural networks to develop a deep-learning algorithm that enhances the tumor-detection capability of white-light cystoscopy. A cystoscope tumor-detection assistant is developed so that a doctor can perform real-time tumor detection and segmentation while performing cystoscopy, preventing missed detections.
In order to achieve this purpose, the invention provides the following technical scheme: an ORSU-Net-based tumor image segmentation method comprising the following steps:
step 1), data set acquisition: collecting medical cystoscopy videos, selecting bladder-tumor key frames from the video stream to make training samples, labeling the key frames with irregular outlines, and assigning each pixel in the image to its corresponding category to achieve pixel-level classification;
step 2), preprocessing the data set: applying image enhancement and image denoising preprocessing to the prepared data set so as to address segmentation difficulties encountered during testing;
step 3), training the network model: constructing a new ORSU-Net segmentation network model, training it with the training samples generated in step 2, and generating a prediction mask;
the structure of the ORSU-Net segmentation network model comprises, connected in sequence: input layer - second combination module - output layer;
the second combination module comprises an encoder module and a decoder module; the encoder module comprises n1 octave convolution layers and n2 downsampling layers, the decoder module is symmetric to the encoder module and comprises n2 upsampling layers and n1 octave convolution layers, and local features and multi-scale features are fused through residual connection;
step 4), calculating the loss function: computing the loss between the predicted tumor segmentation and the ground-truth tumor segmentation;
step 5), optimizing the network: taking the loss function as the optimization objective, the segmentation network model participates in gradient back-propagation during network optimization, realizing optimized tumor image segmentation.
Further, in step 2), a dual residual network is used to denoise the data set, implemented as follows:
the network structure of the dual residual network comprises: two input convolution layers - n3 dual residual modules for motion-blur removal - two output convolution layers, with the result of the second output convolution layer residual-connected to the input;
the structure of a dual residual module for motion-blur removal is set as: a first convolution layer - an upsampling layer and a second convolution layer - a downsampling layer, wherein the second convolution layer has kernel size k1 and dilation coefficient d, and the downsampling layer is a convolution layer with kernel size k2 and stride 2.
Further, the dual residual network includes six dual residual modules.
Further, the expression of the loss function is:

L = Σ_{h=1}^{H} (λ_BCE · l_BCE^{(h)} + λ_KL · l_KL^{(h)})

where H denotes the number of network layers, and the functional expressions of l_BCE and l_KL are respectively:

l_BCE = −Σ_{i=1}^{M} Σ_{j=1}^{N} [G(i,j) log S(i,j) + (1 − G(i,j)) log(1 − S(i,j))]

l_KL = Σ_{i=1}^{M} Σ_{j=1}^{N} G(i,j) log(G(i,j) / S(i,j))

where (i,j) is the pixel coordinate and (M,N) are the width and height of the image; G(i,j) and S(i,j) are respectively the ground-truth and predicted target segmentation pixel values, and λ_BCE and λ_KL are respectively the weights of the BCE loss function and the KL loss function.
The invention also provides an ORSU-Net-based tumor image segmentation system, which comprises the following modules:
the data set acquisition module is used for collecting medical cystoscopy videos, selecting bladder-tumor key frames from the video stream to make training samples, labeling the key frames with irregular outlines, and assigning each pixel in the image to its corresponding category to achieve pixel-level classification;
the data set preprocessing module is used for applying image enhancement and image denoising preprocessing to the prepared data set so as to address segmentation difficulties encountered during testing;
the network model training module is used for constructing a new ORSU-Net segmentation network model, training the constructed network model by using the training samples generated in the data set preprocessing module, and generating a prediction mask;
the structure of the ORSU-Net segmentation network model comprises the following components in sequential connection: input layer-second combination module-output layer;
the second combination module comprises an encoder module and a decoder module, wherein the encoder module comprises n1 octave convolutional layers and n2 downsample layers, the decoder module is symmetrical to the encoder module and comprises n2 upsample layers and n1 octave convolutional layers, and the local feature and the multi-scale feature are fused through residual connection;
the loss function calculation module is used for calculating the loss between the pre-training tumor prediction segmentation result and the tumor segmentation truth value;
and the network optimization module is used for taking the loss function as an optimization target function, so that the segmentation network model participates in the gradient back propagation process in network optimization, and the optimization of tumor image segmentation is realized.
Further, a dual residual network is used in the data set preprocessing module to denoise the data set, implemented as follows:
the network structure of the dual residual network comprises: two input convolution layers - n3 dual residual modules for motion-blur removal - two output convolution layers, with the result of the second output convolution layer residual-connected to the input;
the structure of a dual residual module for motion-blur removal is set as: a first convolution layer - an upsampling layer and a second convolution layer - a downsampling layer, wherein the second convolution layer has kernel size k1 and dilation coefficient d, and the downsampling layer is a convolution layer with kernel size k2 and stride 2.
Further, the dual residual network includes six dual residual modules.
Further, the expression of the loss function is:

L = Σ_{h=1}^{H} (λ_BCE · l_BCE^{(h)} + λ_KL · l_KL^{(h)})

where H denotes the number of network layers, and the functional expressions of l_BCE and l_KL are respectively:

l_BCE = −Σ_{i=1}^{M} Σ_{j=1}^{N} [G(i,j) log S(i,j) + (1 − G(i,j)) log(1 − S(i,j))]

l_KL = Σ_{i=1}^{M} Σ_{j=1}^{N} G(i,j) log(G(i,j) / S(i,j))

where (i,j) is the pixel coordinate and (M,N) are the width and height of the image; G(i,j) and S(i,j) are respectively the ground-truth and predicted target segmentation pixel values, and λ_BCE and λ_KL are respectively the weights of the BCE loss function and the KL loss function.
Compared with the prior art, the invention has the following advantages and beneficial effects:
By using octave convolution, the invention decomposes the output feature map of a convolution layer into high- and low-frequency feature maps stored in separate groups. The spatial resolution of the low-frequency group can be safely reduced, cutting spatial redundancy, and performing the low-frequency convolution operation on the low-frequency information effectively enlarges the receptive field in pixel space. Compared with conventional convolution, this further reduces computation and memory consumption while improving segmentation accuracy.
The method extracts multi-scale features from progressively downsampled feature maps and encodes them into high-resolution feature maps by progressive upsampling, concatenation, and convolution. This process reduces the loss of detail caused by direct large-factor upsampling. Local and multi-scale features are fused by residual connections, so the network can extract features at multiple scales directly within each residual block.
Drawings
FIG. 1 is a flow chart of an embodiment of the present invention.
FIG. 2 is a schematic diagram of an ORSU-Net network structure according to the present invention.
Fig. 3 is a schematic diagram of the basic structure of the residual error network.
Fig. 4 is a schematic diagram of a dual residual block structure.
Fig. 5 is a schematic diagram of a dual residual network structure.
Fig. 6 is a schematic diagram of the principle of octave convolution.
Detailed Description
The technical solution of the present invention is further explained with reference to the drawings and the embodiments.
The invention replaces conventional convolution with OctConv (octave convolution) and combines it with U-Net, providing a new ORSU-Net (octave residual U-Net) network for tumor segmentation; the new ORSU-Net is applied to frames before and after image enhancement so as to better exploit local and global context information and improve segmentation. ORSU-Net follows the basic structure of U-Net: it extracts features through convolution operations, and extracts feature information of the cystoscope tumor data at different scales through downsampled encoding. A decoding module in the second half of the convolutional neural network then upsamples the downsampled cystoscope tumor data to recover its spatial dimensions and restore detail; meanwhile, the middle part of the network is linked by skip connections, passing low-level information to the deep layers. The input convolution layer uses OctConv (octave convolution) instead of ordinary convolution to extract local features. Compared with conventional convolution, octave convolution further reduces computation and memory consumption while improving segmentation accuracy. The U-Net-like structure inherits the advantages of U-Net, and local and multi-scale features are fused by residual connection.
The method specifically comprises the following steps:
1) Data set acquisition: collect medical cystoscopy videos, select bladder-tumor key frames from the video stream to make training samples, label the samples with irregular outlines, and assign each pixel in the image to its corresponding class to achieve pixel-level classification;
The collected cystoscopy videos are screened to obtain key frames containing tumors, which are labeled with irregular outlines to determine the size and position of the tumor. The training-set images used in this implementation are 1280 × 720 frames; sub-images of a specified size are extracted around the labeled tumor position as training samples, overcoming the inconsistent specifications of different videos and the data imbalance between normal tissue and tumor in the original images.
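The sub-image extraction step above can be sketched as follows; the 256 × 256 patch size, the `extract_patch` helper name, and centring on the labeled tumor position are illustrative assumptions (the patent only specifies 1280 × 720 source frames and sub-images of a specified size):

```python
import numpy as np

def extract_patch(frame, cx, cy, size=256):
    """Crop a size x size training patch centred on the labelled tumor
    position (cx, cy), clamping the window to the frame borders so the
    patch never leaves the 1280 x 720 image."""
    h, w = frame.shape[:2]
    x0 = min(max(cx - size // 2, 0), w - size)
    y0 = min(max(cy - size // 2, 0), h - size)
    return frame[y0:y0 + size, x0:x0 + size]

frame = np.zeros((720, 1280, 3), dtype=np.uint8)   # one key frame
patch = extract_patch(frame, cx=40, cy=700)        # tumor near a corner
print(patch.shape)  # (256, 256, 3)
```

Clamping rather than padding keeps every patch filled with real image content, which matters when the tumor is labeled near the frame border.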
2) Preprocessing the data set: apply preprocessing such as image enhancement and image denoising to the prepared data set so as to address segmentation difficulties encountered during testing;
In the embodiment of the invention, a dual residual network (Dual Residual Networks) is used to denoise the data set; the dual residual network for image denoising is implemented as follows:
A dual residual network DuRN-U for motion-blur removal is built, structured sequentially as: two input convolution layers - six dual residual modules for motion-blur removal - two output convolution layers, with the result of the second output convolution layer residual-connected to the input.
The dual residual module for motion-blur removal is structured sequentially as: a convolution layer with a 3 × 3 kernel - upsampling plus a convolution layer (kernel size k1, dilation coefficient d) - downsampling (a convolution layer with kernel size k2 and stride 2), with the result of the output convolution layer connected to the input. The parameters of the six dual residual modules in the DuRN-U network are respectively: k1=5, k2=3, d=1; k1=7, k2=5, d=1; k1=7, k2=5, d=2; k1=11, k2=7, d=2; k1=11, k2=5, d=1; k1=11, k2=7, d=3.
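A minimal single-channel sketch of this dual-residual wiring is given below; the fixed 3 × 3 averaging kernels standing in for learned convolutions, the nearest-neighbour upsampling, and the exact placement of the two residual additions are assumptions for illustration, not the patent's trained network:

```python
import numpy as np

def conv3x3(x, dilation=1):
    # Zero-padded 3x3 convolution on a single-channel map, with a fixed
    # averaging kernel standing in for learned weights.
    d = dilation
    p = np.pad(x.astype(float), d)
    h, w = x.shape
    out = np.zeros((h, w))
    for dy in (0, d, 2 * d):
        for dx in (0, d, 2 * d):
            out += p[dy:dy + h, dx:dx + w]
    return out / 9.0

def upsample2(x):
    # Nearest-neighbour 2x upsampling.
    return x.repeat(2, axis=0).repeat(2, axis=1)

def durb(x, dilation=1):
    # Two leading conv layers, then the paired operations:
    # T1 = upsampling + dilated conv (kernel size simplified to 3x3),
    # T2 = stride-2 downsampling conv; one residual addition after each.
    h = conv3x3(conv3x3(x))
    h = conv3x3(upsample2(h), dilation=dilation)   # T1
    h = h + upsample2(x)                           # first residual connection
    h = conv3x3(h)[::2, ::2]                       # T2: stride-2 conv
    return h + x                                   # second residual connection

y = durb(np.ones((8, 8)), dilation=2)
print(y.shape)  # (8, 8)
```

Because T1 doubles the spatial size and T2 halves it again, both residual additions operate on matching shapes, which is what lets the block be repeated six times without changing resolution.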
The basic structure of the residual network is shown in fig. 3(a): a network of three modules f1, f2, f3 has 2^3 = 8 paths from input to output: f1 → f2 → f3, f1 → f2, f1 → f3, f2 → f3, f1, f2, f3, and the identity. Each module acts as a computation unit that can be attached to or detached from the host network. Exploiting this property, paired connection operations are introduced. Denoting the paired operations by f and g, one configuration of the present invention treats each pair (f_i, g_i) as a unit module, as shown in fig. 3(b); in this connection, f_i and g_i are paired on every path. To gain a further performance improvement, another structure called "dual residual connection" is considered, as shown in fig. 3(d). This structure allows f_i and g_j to be paired for any i ≤ j; for example, six combinations occur over all possible paths: (f1, g1), (f2, g2), (f3, g3), (f1, g2), (f1, g3), and (f2, g3). Increasing the potential connections between {f_i} and {g_i} in this way improves performance on image-restoration tasks while ensuring that f and g remain paired on every possible path; other connection structures, such as fig. 3(c), do not guarantee this pairing. A module that guarantees the f-g pairing is called a dual residual block (DuRB), as shown in fig. 4. The DuRB is a generic structure containing two containers for the paired operations, which can be chosen according to the usage scenario; for different tasks, the paired operations of the DuRB and the overall network are specified accordingly.
The basic structure of the dual residual block is shown in fig. 4(a), where c denotes a convolution layer (3 × 3 kernel), and T1^l and T2^l denote the containers for the first and second paired operations of the l-th dual residual block in the network. Normalization and ReLU layers are inserted where necessary. For the case shown in fig. 4(b), the operations T1^l and T2^l are specified as [upsampling + convolution, stride-2 downsampling]; this module mainly targets restoration of motion-blurred images.
The entire dual residual network adopts a symmetric encoding-decoding structure, as shown in fig. 5. The network consists of an initial block performing 4:1 downsampling on the input image, followed by two convolution operations with a normalization layer + ReLU (n + r), six repeated DuRB modules, and finally 1:2 upsampling that restores the output of the last DuRB to the original size.
3) Training the network model: construct the new ORSU-Net network model provided by the invention, train it with the training samples generated in step 2, and generate a prediction mask;
the ORSU-Net segmentation network constructed by the invention is further described in detail with reference to FIG. 2:
An ORSU-Net segmentation network with length L = 7 is built, structured sequentially as: input layer - second combination module - output layer.
The second combination module is similar in structure to U-Net, consisting of downsampling convolution layers and the corresponding upsampling convolution layers.
The encoder module comprises 7 octave convolution layers and 5 downsampling layers, with 32, 64, 128, 256, and 512 feature maps respectively; the decoder module is symmetric to the encoder and comprises 5 upsampling layers and 7 octave convolution layers, with 512, 256, 128, 64, and 32 feature maps respectively. The input-layer output is connected to the decoder input, and local features and multi-scale features are fused by residual connection.
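The channel plan and resolutions implied by this encoder-decoder symmetry can be traced with a short sketch; the 256 × 256 input resolution, the halving of resolution per downsampling layer, and the `orsu_shape_trace` helper are assumptions for illustration:

```python
def orsu_shape_trace(h=256, w=256, widths=(32, 64, 128, 256, 512)):
    """Trace (channels, height, width) through the symmetric
    encoder/decoder: each encoder stage is followed by 2x downsampling,
    each decoder stage by an upsampling layer restoring the mirrored
    encoder resolution."""
    enc = []
    for c in widths[:-1]:
        enc.append((c, h, w))
        h, w = h // 2, w // 2            # downsampling layer
    bottom = (widths[-1], h, w)          # deepest feature maps
    dec = [(c, eh, ew)
           for c, (_, eh, ew) in zip(reversed(widths[:-1]), reversed(enc))]
    return enc, bottom, dec

enc, bottom, dec = orsu_shape_trace()
print(bottom)    # (512, 16, 16): deepest resolution
print(dec[-1])   # (32, 256, 256): restored to input resolution
```

The mirrored widths are what allow each decoder stage to be concatenated with (and residually connected to) the encoder stage of the same resolution.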
Referring to fig. 2, the main structure of the network is that in the ORSU-Net of the present invention, the input convolution layer uses octave convolution instead of the conventional convolution to perform local feature extraction, and the input feature X (H × W × C) is extractedin) Conversion to intermediate feature F with Cout channel1(x);
ORSU-Net is a U-Net-like symmetric codec structure of length L; the deeper the structure, the larger the value of L. In the invention, the ORSU-Net network has length L = 7; the encoder module comprises 7 octave convolution layers and 5 downsampling layers, with 32, 64, 128, 256, and 512 feature maps respectively; the decoder module is symmetric to the encoder and comprises 5 upsampling layers and 7 octave convolution layers, with 512, 256, 128, 64, and 32 feature maps respectively. Local features and multi-scale features are fused by residual connection: F1(x) + U(F1(x)), where U denotes the U-Net-like encoder-decoder.
Referring to fig. 6, a detailed diagram of the implementation of octave convolution, which is composed of four computation paths corresponding to four terms: f(X^H; W^{H→H}), upsample(f(X^L; W^{L→H})), f(X^L; W^{L→L}), and f(pool(X^H, 2); W^{H→L}); the two solid paths correspond to the intra-frequency information updates of the high- and low-frequency feature maps, and the two dashed paths facilitate information exchange between the two octaves.
Octave convolution decomposes the output feature map of a convolution layer into high- and low-frequency feature maps stored in different groups, analogous to the decomposition of a natural image into spatial-frequency components. It can safely reduce the spatial resolution of the low-frequency group and reduce spatial redundancy through information sharing between adjacent locations. In addition, by performing the low-frequency convolution operation on the low-frequency information, octave convolution effectively enlarges the receptive field in pixel space. Using octave convolution can therefore further reduce the computational and memory overhead of the network while retaining the design advantages of U-Net.
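A pointwise (1 × 1) sketch of this four-path computation is given below; the function name `octconv1x1`, the nearest-neighbour upsampling, and the 2 × 2 average pooling are illustrative assumptions consistent with the description above:

```python
import numpy as np

def octconv1x1(xh, xl, w_hh, w_hl, w_lh, w_ll):
    """Pointwise octave convolution. xh: (C_h, H, W) high-frequency map;
    xl: (C_l, H/2, W/2) low-frequency map. Four paths: H->H and L->L
    (solid, intra-frequency updates); L->H and H->L (dashed, exchange)."""
    def pw(w, x):          # 1x1 convolution = per-pixel channel mixing
        return np.einsum('oc,chw->ohw', w, x)
    def pool2(x):          # 2x2 average pooling
        return 0.25 * (x[:, ::2, ::2] + x[:, 1::2, ::2]
                       + x[:, ::2, 1::2] + x[:, 1::2, 1::2])
    def up2(x):            # nearest-neighbour 2x upsampling
        return x.repeat(2, axis=1).repeat(2, axis=2)
    yh = pw(w_hh, xh) + up2(pw(w_lh, xl))     # f(X^H;W^{H->H}) + upsample(f(X^L;W^{L->H}))
    yl = pw(w_ll, xl) + pw(w_hl, pool2(xh))   # f(X^L;W^{L->L}) + f(pool(X^H,2);W^{H->L})
    return yh, yl

xh, xl = np.ones((4, 8, 8)), np.ones((2, 4, 4))
yh, yl = octconv1x1(xh, xl,
                    w_hh=np.ones((3, 4)), w_hl=np.ones((1, 4)),
                    w_lh=np.ones((3, 2)), w_ll=np.ones((1, 2)))
print(yh.shape, yl.shape)  # (3, 8, 8) (1, 4, 4)
```

Note that the low-frequency branch carries half the spatial resolution throughout, which is where the computation and memory savings come from.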
The proposed ORSU-Net and U-Net have similar structures and can capture multi-scale features of the image without sacrificing high-resolution features. The biggest difference is that ORSU-Net replaces ordinary convolution with octave convolution, which, compared with conventional convolution, further reduces computation and memory consumption while improving segmentation accuracy.
4) Calculating the loss function: compute the loss between the predicted tumor segmentation and the ground-truth tumor segmentation;
During training, the invention adopts a layered training-supervision strategy in place of the standard top-level supervision and deep-supervision schemes; the tumor segmentation loss function is:

L = Σ_{h=1}^{H} (λ_BCE · l_BCE^{(h)} + λ_KL · l_KL^{(h)})

where H denotes the number of network layers, and the functional expressions of l_BCE and l_KL are respectively:

l_BCE = −Σ_{i=1}^{M} Σ_{j=1}^{N} [G(i,j) log S(i,j) + (1 − G(i,j)) log(1 − S(i,j))]

l_KL = Σ_{i=1}^{M} Σ_{j=1}^{N} G(i,j) log(G(i,j) / S(i,j))

where (i,j) is the pixel coordinate and (M,N) are the width and height of the image; G(i,j) and S(i,j) are respectively the ground-truth and predicted target segmentation pixel values, and λ_BCE and λ_KL are respectively the weights of the BCE loss function and the KL loss function.
For each layer, the standard BCE loss function and the KL loss function are used to calculate the loss. Adding a probabilistic prediction-matching penalty (i.e., the KL loss function) between any two layers facilitates multi-layer interaction between different layers. Since the loss functions of different layers share a consistent optimization target, the robustness and generalization of the model are ensured.
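The layered supervision above can be sketched in code. This is a hedged sketch: the uniform pairwise KL weighting and the values of w_bce and w_kl are illustrative assumptions, not parameters stated in the invention.

```python
import numpy as np

def bce_loss(g, s, eps=1e-7):
    """Binary cross-entropy between ground truth g and prediction s."""
    s = np.clip(s, eps, 1 - eps)
    return -np.sum(g * np.log(s) + (1 - g) * np.log(1 - s))

def kl_loss(s_a, s_b, eps=1e-7):
    """KL-divergence penalty matching the predictions of two layers."""
    s_a = np.clip(s_a, eps, 1 - eps)
    s_b = np.clip(s_b, eps, 1 - eps)
    return np.sum(s_a * np.log(s_a / s_b))

def layered_loss(g, preds, w_bce=1.0, w_kl=0.1):
    """Hierarchical supervision: per-layer BCE plus pairwise KL penalties.

    g:     (M, N) ground-truth mask.
    preds: list of H per-layer (M, N) probability maps.
    w_bce, w_kl: loss weights (illustrative values, not from the source).
    """
    total = sum(w_bce * bce_loss(g, s) for s in preds)
    # Prediction-matching penalty between any two layers, so that all
    # layers are pulled toward a consistent optimization target.
    for a in range(len(preds)):
        for b in range(a + 1, len(preds)):
            total += w_kl * kl_loss(preds[a], preds[b])
    return total
```

The KL term vanishes when two layers agree, so it only penalizes disagreement between layer-wise predictions.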
5) Optimizing the network: taking the loss function as the optimization objective function, the segmentation network model participates in the gradient back-propagation process during network optimization, realizing optimization of cystoscope-video tumor segmentation.
The embodiment of the invention also provides an ORSU-Net-based tumor image segmentation system, which comprises the following modules:
the data set acquisition module is used for acquiring a medical cystoscope detection video, selecting bladder tumor key frames from a video stream to make a training sample, labeling the bladder tumor key frames by using irregular frames, and dividing each pixel in an image into corresponding categories to realize pixel-level classification;
the data set preprocessing module is used for carrying out image enhancement and image denoising preprocessing on the prepared data set, so as to reduce segmentation errors during testing;
the network model training module is used for constructing a new ORSU-Net segmentation network model, training the constructed network model by using the training samples generated in the data set preprocessing module, and generating a prediction mask;
the structure of the ORSU-Net segmentation network model comprises the following components in sequential connection: input layer-second combination module-output layer;
the second combination module comprises an encoder module and a decoder module, wherein the encoder module comprises n1 octave convolutional layers and n2 downsample layers, the decoder module is symmetrical to the encoder module and comprises n2 upsample layers and n1 octave convolutional layers, and the local feature and the multi-scale feature are fused through residual connection;
the loss function calculation module is used for calculating the loss between the pre-training tumor prediction segmentation result and the tumor segmentation truth value;
and the network optimization module is used for taking the loss function as an optimization target function, so that the segmentation network model participates in the gradient back propagation process in network optimization, and the optimization of tumor image segmentation is realized.
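The encoder-decoder layout of the ORSU-Net model described by the modules above can be sketched schematically. In this hedged sketch, octconv is an identity placeholder standing in for an octave convolution layer, n2 = 2 is an assumed depth, and the skip and residual connections follow the symmetric structure described in the text.

```python
import numpy as np

def octconv(x):
    # Identity placeholder for an octave convolution layer.
    return x

def down(x):
    """2x average-pool downsampling of an (H, W) map."""
    return x.reshape(x.shape[0] // 2, 2, x.shape[1] // 2, 2).mean(axis=(1, 3))

def up(x):
    """2x nearest-neighbour upsampling of an (H, W) map."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

def orsu_block(x, n2=2):
    """Symmetric encoder/decoder with skip connections and a residual add."""
    h = octconv(x)            # input transform
    inp = h
    skips = []
    for _ in range(n2):       # encoder: octave conv, keep skip, downsample
        h = octconv(h)
        skips.append(h)
        h = down(h)
    h = octconv(h)            # bottleneck
    for _ in range(n2):       # decoder: upsample, fuse skip, octave conv
        h = octconv(up(h) + skips.pop())
    return inp + h            # residual fusion of local and multi-scale features
```

The final residual add is how the local feature (input transform) and the multi-scale feature (decoder output) are fused.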
Further, a dual residual network is used in the data set preprocessing module to perform noise reduction processing on the data set, and the specific implementation mode is as follows;
the network structure of the dual residual network comprises: two input convolutional layers - n3 dual residual modules for motion blur removal - two output convolutional layers, wherein the result of the second output convolutional layer is residual-connected with the input;
the structure of a dual residual module for removing motion blur is set as follows: a first convolutional layer - an upsampling layer and a second convolutional layer - a downsampling layer, wherein the second convolutional layer has a convolution kernel size of k1 and a dilation coefficient of d, and the downsampling layer is a convolutional layer with a convolution kernel size of k2 and a stride of 2.
Further, the dual residual network includes six dual residual modules.
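The dual residual structure described above can be sketched as follows. This is a hedged sketch: the convolutions are identity placeholders (abstracting kernel size k1, dilation d, and the stride-2 kernel-size-k2 downsampling convolution), so only the connection pattern of the modules is illustrated.

```python
import numpy as np

def conv(x):
    # Identity placeholder for a convolution (kernel size k1, dilation d).
    return x

def up(x):
    """2x nearest-neighbour upsampling."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

def down(x):
    # Stands in for the stride-2 convolution with kernel size k2.
    return x[::2, ::2]

def dual_residual_module(x):
    """Paired units: conv + upsample, then conv + downsample, with the
    module output added back to its input as a residual."""
    t = up(conv(x))
    t = down(conv(t))
    return x + t

def dual_residual_network(x, n3=6):
    """Two input convs, n3 dual residual modules for motion-blur removal,
    two output convs, and a global input-to-output residual connection."""
    h = conv(conv(x))
    for _ in range(n3):
        h = dual_residual_module(h)
    return x + conv(conv(h))
```

The upsample/downsample pair inside each module is the "dual" paired operation; the global residual lets the stack learn only the correction to the degraded input.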
Further, the expression of the loss function is as follows;
L = \sum_{h=1}^{H} \left( w_{BCE}^{(h)} \, l_{BCE}^{(h)} + w_{KL}^{(h)} \, l_{KL}^{(h)} \right)

where H denotes the number of network layers, and the functional expressions of l_{BCE} and l_{KL} are respectively:

l_{BCE} = -\sum_{i=1}^{M} \sum_{j=1}^{N} \left[ G(i,j) \log S(i,j) + (1 - G(i,j)) \log(1 - S(i,j)) \right]

l_{KL} = \sum_{i=1}^{M} \sum_{j=1}^{N} S_a(i,j) \log \frac{S_a(i,j)}{S_b(i,j)}

wherein (i, j) is the pixel coordinate, (M, N) are the width and height of the image, G(i, j) and S(i, j) are the true value and the predicted target segmentation pixel value respectively, S_a and S_b are the probabilistic predictions of two different layers, and w_{BCE} and w_{KL} are the weights of the BCE loss function and the KL loss function respectively.
The specific embodiments described herein are merely illustrative of the spirit of the invention. Various modifications or additions may be made to the described embodiments or alternatives may be employed by those skilled in the art without departing from the spirit or ambit of the invention as defined in the appended claims.
Claims (8)
1. An ORSU-Net-based tumor image segmentation method is characterized by comprising the following steps:
step 1), data set acquisition: collecting a medical cystoscope detection video, selecting a bladder tumor key frame from a video stream to make a training sample, labeling the bladder tumor key frame by using an irregular frame, and dividing each pixel in an image into corresponding categories to realize pixel-level classification;
step 2), preprocessing a data set: carrying out image enhancement and image denoising pretreatment on the manufactured data set so as to solve the segmentation problem in the test process;
step 3), training a network model: constructing a new ORSU-Net segmentation network model, training the constructed network model by using the training samples generated in the step 2, and generating a prediction mask;
the structure of the ORSU-Net segmentation network model comprises the following components in sequential connection: input layer-second combination module-output layer;
the second combination module comprises an encoder module and a decoder module, wherein the encoder module comprises n1 octave convolutional layers and n2 downsample layers, the decoder module is symmetrical to the encoder module and comprises n2 upsample layers and n1 octave convolutional layers, and the local feature and the multi-scale feature are fused through residual connection;
step 4), calculating a loss function: calculating the loss between the pre-training tumor prediction segmentation result and the tumor segmentation truth value;
step 5), optimizing the network: and taking the loss function as an optimization objective function, and enabling the segmentation network model to participate in a gradient back propagation process in network optimization to realize optimization of tumor image segmentation.
2. The ORSU-Net based tumor image segmentation method of claim 1, wherein: in the step 2), a dual residual network is used for carrying out noise reduction processing on the data set, and the specific implementation mode is as follows;
the network structure of the dual residual network comprises: two input convolutional layers - n3 dual residual modules for motion blur removal - two output convolutional layers, wherein the result of the second output convolutional layer is residual-connected with the input;
the structure of a dual residual module for removing motion blur is set as follows: a first convolutional layer - an upsampling layer and a second convolutional layer - a downsampling layer, wherein the second convolutional layer has a convolution kernel size of k1 and a dilation coefficient of d, and the downsampling layer is a convolutional layer with a convolution kernel size of k2 and a stride of 2.
3. The ORSU-Net based tumor image segmentation method of claim 2, wherein: the dual residual network includes six dual residual modules.
4. The ORSU-Net based tumor image segmentation method of claim 1, wherein: the expression of the loss function is as follows;
L = \sum_{h=1}^{H} \left( w_{BCE}^{(h)} \, l_{BCE}^{(h)} + w_{KL}^{(h)} \, l_{KL}^{(h)} \right)

where H denotes the number of network layers, and the functional expressions of l_{BCE} and l_{KL} are respectively:

l_{BCE} = -\sum_{i=1}^{M} \sum_{j=1}^{N} \left[ G(i,j) \log S(i,j) + (1 - G(i,j)) \log(1 - S(i,j)) \right]

l_{KL} = \sum_{i=1}^{M} \sum_{j=1}^{N} S_a(i,j) \log \frac{S_a(i,j)}{S_b(i,j)}

wherein (i, j) is the pixel coordinate, (M, N) are the width and height of the image, G(i, j) and S(i, j) are the true value and the predicted target segmentation pixel value respectively, S_a and S_b are the probabilistic predictions of two different layers, and w_{BCE} and w_{KL} are the weights of the BCE loss function and the KL loss function respectively.
5. An ORSU-Net based tumor image segmentation system, comprising the following modules:
the data set acquisition module is used for acquiring a medical cystoscope detection video, selecting bladder tumor key frames from a video stream to make a training sample, labeling the bladder tumor key frames by using irregular frames, and dividing each pixel in an image into corresponding categories to realize pixel-level classification;
the data set preprocessing module is used for carrying out image enhancement and image denoising preprocessing on the manufactured data set so as to solve the segmentation problem in the test process;
the network model training module is used for constructing a new ORSU-Net segmentation network model, training the constructed network model by using the training samples generated in the data set preprocessing module, and generating a prediction mask;
the structure of the ORSU-Net segmentation network model comprises the following components in sequential connection: input layer-second combination module-output layer;
the second combination module comprises an encoder module and a decoder module, wherein the encoder module comprises n1 octave convolutional layers and n2 downsample layers, the decoder module is symmetrical to the encoder module and comprises n2 upsample layers and n1 octave convolutional layers, and the local feature and the multi-scale feature are fused through residual connection;
the loss function calculation module is used for calculating the loss between the pre-training tumor prediction segmentation result and the tumor segmentation truth value;
and the network optimization module is used for taking the loss function as an optimization target function, so that the segmentation network model participates in the gradient back propagation process in network optimization, and the optimization of tumor image segmentation is realized.
6. An ORSU-Net based tumor image segmentation system according to claim 5, wherein: the data set preprocessing module uses a dual residual network to perform noise reduction processing on the data set, and the specific implementation mode is as follows;
the network structure of the dual residual network comprises: two input convolutional layers - n3 dual residual modules for motion blur removal - two output convolutional layers, wherein the result of the second output convolutional layer is residual-connected with the input;
the structure of a dual residual module for removing motion blur is set as follows: a first convolutional layer - an upsampling layer and a second convolutional layer - a downsampling layer, wherein the second convolutional layer has a convolution kernel size of k1 and a dilation coefficient of d, and the downsampling layer is a convolutional layer with a convolution kernel size of k2 and a stride of 2.
7. An ORSU-Net based tumor image segmentation system according to claim 6, wherein: the dual residual network includes six dual residual modules.
8. An ORSU-Net based tumor image segmentation system according to claim 5, wherein: the expression of the loss function is as follows;
L = \sum_{h=1}^{H} \left( w_{BCE}^{(h)} \, l_{BCE}^{(h)} + w_{KL}^{(h)} \, l_{KL}^{(h)} \right)

where H denotes the number of network layers, and the functional expressions of l_{BCE} and l_{KL} are respectively:

l_{BCE} = -\sum_{i=1}^{M} \sum_{j=1}^{N} \left[ G(i,j) \log S(i,j) + (1 - G(i,j)) \log(1 - S(i,j)) \right]

l_{KL} = \sum_{i=1}^{M} \sum_{j=1}^{N} S_a(i,j) \log \frac{S_a(i,j)}{S_b(i,j)}
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110389723.5A CN113205094A (en) | 2021-04-12 | 2021-04-12 | Tumor image segmentation method and system based on ORSU-Net |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113205094A true CN113205094A (en) | 2021-08-03 |
Family
ID=77026560
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110389723.5A Pending CN113205094A (en) | 2021-04-12 | 2021-04-12 | Tumor image segmentation method and system based on ORSU-Net |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113205094A (en) |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111627019A (en) * | 2020-06-03 | 2020-09-04 | 西安理工大学 | Liver tumor segmentation method and system based on convolutional neural network |
Non-Patent Citations (2)
Title |
---|
Chenjie Wang et al.: "U^2-ONet: A Two-level Nested Octave U-structure with Multiscale Attention Mechanism for Moving Instances Segmentation", arXiv:2007.13092v1 *
Xing Liu et al.: "Dual Residual Networks Leveraging the Potential of Paired Operations for Image Restoration", arXiv:1903.08817v1 *
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113781410A (en) * | 2021-08-25 | 2021-12-10 | 南京邮电大学 | Medical image segmentation method and system based on MEDU-Net + network |
CN113781410B (en) * | 2021-08-25 | 2023-10-13 | 南京邮电大学 | Medical image segmentation method and system based on MEDU-Net+network |
CN114612479A (en) * | 2022-02-09 | 2022-06-10 | 苏州大学 | Medical image segmentation method based on global and local feature reconstruction network |
CN115908831A (en) * | 2022-11-18 | 2023-04-04 | 中国人民解放军军事科学院系统工程研究院 | Image detection method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 20210803 |