CN113205094A - Tumor image segmentation method and system based on ORSU-Net - Google Patents

Tumor image segmentation method and system based on ORSU-Net

Info

Publication number
CN113205094A
Authority
CN
China
Prior art keywords
segmentation
network
orsu
net
module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110389723.5A
Other languages
Chinese (zh)
Inventor
罗斌
李露
杨琨
王行环
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan University WHU
Original Assignee
Wuhan University WHU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan University WHU filed Critical Wuhan University WHU
Priority to CN202110389723.5A priority Critical patent/CN113205094A/en
Publication of CN113205094A publication Critical patent/CN113205094A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/26Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/30Noise filtering
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/46Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462Salient features, e.g. scale invariant feature transforms [SIFT]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/41Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/46Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/49Segmenting video sequences, i.e. computational techniques such as parsing or cutting the sequence, low-level clustering or determining units such as shots or scenes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/03Recognition of patterns in medical or anatomical images

Abstract

The invention discloses a tumor image segmentation method and system based on ORSU-Net. Octave convolution (OctConv) replaces conventional convolution and is combined with U-Net, yielding a new ORSU-Net network for tumor segmentation; the new ORSU-Net is applied to frames before and after image enhancement processing, so that local and global context information is better exploited to improve the segmentation effect. ORSU-Net follows the basic structure of U-Net: features are extracted by convolution operations, and feature information of the cystoscope tumor data is extracted at different scales through down-sampled encoding. A decoding module in the second half of the convolutional neural network then up-samples the down-sampled cystoscope tumor data set to recover its spatial dimensions and restore detail, while skip connections in the middle of the network pass low-level information to the deeper layers. Compared with conventional methods, using octave convolution further reduces computation and memory consumption while improving segmentation accuracy.

Description

Tumor image segmentation method and system based on ORSU-Net
Technical Field
The invention belongs to the field where image segmentation and medical engineering intersect, and particularly relates to an ORSU-Net-based tumor segmentation method and system.
Background
Bladder cancer is the ninth most common malignancy worldwide, and the standard means of diagnosing and monitoring bladder cancer still rely on white-light cystoscopy; over a million cystoscopies are performed annually worldwide. The high recurrence rate of bladder tumors requires frequent monitoring and intervention by medical personnel. Papillary tumors and flat lesions that are not easily detected under a white-light cystoscope can be detected under a blue-light cystoscope. Although blue-light cystoscopy improves tumor detection, it requires pre-operative intravesical instillation of hexaminolevulinate and a special fluorescent cystoscope, so it should be used sparingly. There is therefore a need for a low-cost, non-invasive, real-time, easy-to-use auxiliary imaging technique that addresses the diagnostic deficiencies of white-light cystoscopy.
Various automated segmentation systems have been proposed by many researchers using existing techniques. Early systems were based on conventional methods, primarily by means of edge detection filters and mathematical methods.
Since the 2000s, thanks to advances in computer hardware, deep learning methods have come to the fore and demonstrated powerful capabilities in image processing tasks. With the development of deep-learning-based automatic image processing, the limitations of tumor identification in white-light cystoscopy can be addressed by deep learning methods. Convolutional neural networks can learn complex relationships and incorporate existing knowledge into an inference model, and thus have great development prospects. Medical images are characterized by relatively simple image semantics, fairly fixed structures, small data volumes, and multiple modalities, so a network must be carefully designed to extract the features of different modalities and achieve better results. Since its proposal in 2015, U-Net has performed well in medical segmentation: by concatenating encoder features onto decoder features it achieves good results even with small data volumes, but U-Net is not deep enough to express features accurately, so in some cases its effect is limited.
Disclosure of Invention
To address the limitations of white-light cystoscopy, we use convolutional neural networks to develop a deep learning algorithm that enhances the tumor detection capability of white-light cystoscopy. A cystoscopic tumor detection assistant is developed so that a doctor can perform real-time tumor detection and segmentation while performing cystoscopy, preventing missed detections.
In order to achieve the purpose, the invention provides the technical scheme that: an ORSU-Net-based tumor image segmentation method comprises the following steps:
step 1), data set acquisition: collecting a medical cystoscope detection video, selecting a bladder tumor key frame from a video stream to make a training sample, labeling the bladder tumor key frame by using an irregular frame, and dividing each pixel in an image into corresponding categories to realize pixel-level classification;
step 2), preprocessing a data set: carrying out image enhancement and image denoising pretreatment on the manufactured data set so as to solve the segmentation problem in the test process;
step 3), training a network model: constructing a new ORSU-Net segmentation network model, training the constructed network model by using the training samples generated in the step 2, and generating a prediction mask;
the structure of the ORSU-Net segmentation network model comprises the following components in sequential connection: input layer-second combination module-output layer;
the second combination module comprises an encoder module and a decoder module, wherein the encoder module comprises n1 octave convolutional layers and n2 downsample layers, the decoder module is symmetrical to the encoder module and comprises n2 upsample layers and n1 octave convolutional layers, and the local feature and the multi-scale feature are fused through residual connection;
step 4), calculating a loss function: calculating the loss between the pre-training tumor prediction segmentation result and the tumor segmentation truth value;
step 5), optimizing the network: and taking the loss function as an optimization objective function, and enabling the segmentation network model to participate in a gradient back propagation process in network optimization to realize optimization of tumor image segmentation.
Further, in step 2), a dual residual network is used to perform noise reduction on the data set, implemented as follows;
the network structure of the dual residual network comprises: two input convolutional layers - n3 dual residual modules for motion-blur removal - two output convolutional layers, with the result of the second output convolutional layer residual-connected to the input;
each dual residual module for motion-blur removal is structured as: a first convolutional layer - an up-sampling layer and a second convolutional layer - a down-sampling layer, wherein the second convolutional layer has kernel size k1 and dilation coefficient d, and the down-sampling layer is a convolutional layer with kernel size k2 and stride 2.
Further, the dual residual network includes six dual residual modules.
Further, the expression of the loss function is as follows;
$$\mathcal{L}=\sum_{h=1}^{H}\left(w_{h}^{BCE}\, l_{BCE}^{(h)}+w_{h}^{KL}\, l_{KL}^{(h)}\right)$$

where H denotes the number of network layers, and the functional expressions of $l_{BCE}$ and $l_{KL}$ are, respectively:

$$l_{BCE}=-\sum_{(i,j)}^{(M,N)}\Bigl[G(i,j)\log S(i,j)+\bigl(1-G(i,j)\bigr)\log\bigl(1-S(i,j)\bigr)\Bigr]$$

$$l_{KL}=\sum_{(i,j)}^{(M,N)}G(i,j)\log\frac{G(i,j)}{S(i,j)}$$

where (i, j) are the pixel coordinates, (M, N) are the width and height of the image, G(i, j) and S(i, j) are the ground-truth and predicted target segmentation pixel values, respectively, and $w_{h}^{BCE}$ and $w_{h}^{KL}$ are the weights of the BCE loss function and the KL loss function.
The invention also provides an ORSU-Net-based tumor image segmentation system, which comprises the following modules:
the data set acquisition module is used for acquiring a medical cystoscope detection video, selecting bladder tumor key frames from a video stream to make a training sample, labeling the bladder tumor key frames by using irregular frames, and dividing each pixel in an image into corresponding categories to realize pixel-level classification;
the data set preprocessing module is used for carrying out image enhancement and image denoising preprocessing on the manufactured data set so as to solve the segmentation problem in the test process;
the network model training module is used for constructing a new ORSU-Net segmentation network model, training the constructed network model by using the training samples generated in the data set preprocessing module, and generating a prediction mask;
the structure of the ORSU-Net segmentation network model comprises the following components in sequential connection: input layer-second combination module-output layer;
the second combination module comprises an encoder module and a decoder module, wherein the encoder module comprises n1 octave convolutional layers and n2 downsample layers, the decoder module is symmetrical to the encoder module and comprises n2 upsample layers and n1 octave convolutional layers, and the local feature and the multi-scale feature are fused through residual connection;
the loss function calculation module is used for calculating the loss between the pre-training tumor prediction segmentation result and the tumor segmentation truth value;
and the network optimization module is used for taking the loss function as an optimization target function, so that the segmentation network model participates in the gradient back propagation process in network optimization, and the optimization of tumor image segmentation is realized.
Further, a dual residual network is used in the data set preprocessing module to perform noise reduction on the data set, implemented as follows;
the network structure of the dual residual network comprises: two input convolutional layers - n3 dual residual modules for motion-blur removal - two output convolutional layers, with the result of the second output convolutional layer residual-connected to the input;
each dual residual module for motion-blur removal is structured as: a first convolutional layer - an up-sampling layer and a second convolutional layer - a down-sampling layer, wherein the second convolutional layer has kernel size k1 and dilation coefficient d, and the down-sampling layer is a convolutional layer with kernel size k2 and stride 2.
Further, the dual residual network includes six dual residual modules.
Further, the expression of the loss function is as follows;
$$\mathcal{L}=\sum_{h=1}^{H}\left(w_{h}^{BCE}\, l_{BCE}^{(h)}+w_{h}^{KL}\, l_{KL}^{(h)}\right)$$

where H denotes the number of network layers, and the functional expressions of $l_{BCE}$ and $l_{KL}$ are, respectively:

$$l_{BCE}=-\sum_{(i,j)}^{(M,N)}\Bigl[G(i,j)\log S(i,j)+\bigl(1-G(i,j)\bigr)\log\bigl(1-S(i,j)\bigr)\Bigr]$$

$$l_{KL}=\sum_{(i,j)}^{(M,N)}G(i,j)\log\frac{G(i,j)}{S(i,j)}$$

where (i, j) are the pixel coordinates, (M, N) are the width and height of the image, G(i, j) and S(i, j) are the ground-truth and predicted target segmentation pixel values, respectively, and $w_{h}^{BCE}$ and $w_{h}^{KL}$ are the weights of the BCE loss function and the KL loss function.
Compared with the prior art, the invention has the following advantages and beneficial effects:
Using octave convolution, the invention decomposes the output feature map of a convolutional layer into high- and low-frequency feature maps stored in different groups; the spatial resolution of the low-frequency group can be safely reduced, cutting spatial redundancy, and performing the low-frequency convolution operation on the low-frequency information effectively enlarges the receptive field in pixel space. Compared with conventional methods, the method can therefore further reduce computation and memory consumption while improving segmentation accuracy.
The method extracts multi-scale features from the progressively down-sampled feature maps and encodes them back into high-resolution feature maps through progressive up-sampling, concatenation, and convolution. This process reduces the loss of detail caused by direct large-scale up-sampling. Residual connections join the local features and the multi-scale features, so that the network can extract features at multiple scales directly from each residual block.
Drawings
FIG. 1 is a flow chart of an embodiment of the present invention.
FIG. 2 is a schematic diagram of an ORSU-Net network structure according to the present invention.
Fig. 3 is a schematic diagram of the basic structure of the residual error network.
Fig. 4 is a schematic diagram of a dual residual block structure.
Fig. 5 is a schematic diagram of a dual residual network structure.
Fig. 6 is a schematic diagram of the principle of octave convolution.
Detailed Description
The technical solution of the present invention is further explained with reference to the drawings and the embodiments.
The invention replaces conventional convolution with OctConv (octave convolution) and combines it with U-Net, thereby providing a new ORSU-Net (octave residual U-Net) network for tumor segmentation; the new ORSU-Net is applied to frames before and after image enhancement processing to better exploit local and global context information and improve the segmentation effect. ORSU-Net follows the basic structure of U-Net: features are extracted through convolution operations, and feature information of the cystoscope tumor data is extracted at different scales through down-sampled encoding. A decoding module in the second half of the convolutional neural network then up-samples the down-sampled cystoscope tumor data set to recover its spatial dimensions and restore detail, while skip connections in the middle of the network pass low-level information to the deeper layers. The input convolutional layer uses OctConv (octave convolution) instead of ordinary convolution to extract local features. Compared with conventional methods, using octave convolution further reduces computation and memory consumption while improving segmentation accuracy. The U-Net-like structure inherits the advantages of U-Net, and local features and multi-scale features are fused by residual connection.
The method specifically comprises the following steps:
1) data set acquisition: collecting a medical cystoscope detection video, selecting a bladder tumor key frame from a video stream to make a training sample, labeling the training sample by using an irregular frame, and classifying each pixel in an image into a corresponding class to realize pixel-level classification;
The collected cystoscopy videos are screened to obtain key frames containing tumors, which are annotated with irregular boxes to determine the size and position of each tumor. The training-set images used in the implementation of the invention are 1280 × 720 in size; a sub-image of a specified size is extracted as the training sample by locating the tumor position annotated in the image, which overcomes the inconsistent formats of different videos and the data imbalance between normal tissue and tumor in the original images.
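As a concrete illustration of this cropping step, the following is a minimal Python/NumPy sketch; the function name, the 512 × 512 patch size, and the bounding-rectangle format of the annotation are illustrative assumptions, not values specified by the invention.

```python
import numpy as np

def extract_training_patch(frame: np.ndarray, bbox, patch=(512, 512)):
    """Crop a fixed-size training sample centred on an annotated tumor box.

    frame: key frame as an (H, W, 3) array, assumed larger than `patch`.
    bbox:  (x0, y0, x1, y1) bounding rectangle of the irregular annotation.
    """
    h, w = frame.shape[:2]
    x0, y0, x1, y1 = bbox
    cx, cy = (x0 + x1) // 2, (y0 + y1) // 2   # tumor centre
    pw, ph = patch
    # clamp the crop window so it stays inside the frame
    left = min(max(cx - pw // 2, 0), w - pw)
    top = min(max(cy - ph // 2, 0), h - ph)
    return frame[top:top + ph, left:left + pw]
```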
2) Preprocessing a data set: the prepared data set is preprocessed with image enhancement, image denoising, etc., to facilitate segmentation at test time;
in the embodiment of the invention, a dual residual network (Dual Residual Networks) is used to denoise the data set; the dual residual network for image denoising is implemented as follows:
A dual residual network DuRN-U for motion-blur removal is built, consisting in sequence of: two input convolutional layers - six dual residual modules for motion-blur removal - two output convolutional layers, with the result of the second output convolutional layer connected to the input.
Each dual residual module for motion-blur removal is structured in sequence as: a convolutional layer with a 3 × 3 kernel - up-sampling and a convolutional layer (kernel size k1, dilation coefficient d) - down-sampling (a convolutional layer with kernel size k2 and stride 2); the result of the output convolutional layer is connected to the input. The parameters of the six dual residual modules in the DuRN-U network are, respectively: k1=5, k2=3, d=1; k1=7, k2=5, d=1; k1=7, k2=5, d=2; k1=11, k2=7, d=2; k1=11, k2=5, d=1; k1=11, k2=7, d=3.
The basic structure of the residual network is shown in Fig. 3(a): a network consisting of three modules f1, f2 and f3 has 2^3 = 8 paths from input to output: f1→f2→f3, f1→f2, f1→f3, f2→f3, f1, f2, f3, and the identity path. Each module can operate as a computing unit that is attached to or detached from the host network. Taking this property of the residual network into account, a paired connection operation is applied. Denoting the paired operations by f and g, one configuration treats each pair (fi, gi) as a unit module, as shown in Fig. 3(b); under this connection, fi and gi remain paired on any path. To improve performance further, another structure called "dual residual connection" is considered, shown in Fig. 3(d). This structure allows fi and gj to be paired for any i and j with i ≤ j; for example, six combinations occur over all possible paths: (f1, g1), (f2, g2), (f3, g3), (f1, g2), (f1, g3), and (f2, g3). Increasing the potential connections between {fi} and {gi} in this way can improve performance on image restoration tasks while ensuring that f and g remain paired on every possible path, whereas other connection structures, such as the one in Fig. 3(c), do not guarantee that f and g are paired. A module that guarantees the pairing of f and g is called a dual residual block (DuRB), as shown in Fig. 4. The DuRB is a generic structure containing two containers for the paired operations, and the two operations can be chosen according to the usage scenario; for different tasks, the paired operations of the DuRB and the overall network are specified accordingly.
The basic structure of the dual residual block is shown in Fig. 4(a), where c denotes a convolutional layer (3 × 3 kernel), and T1^l and T2^l denote the containers for the first and second paired operations of the l-th dual residual block in the network; normalization and ReLU layers can be inserted where necessary. For the case shown in Fig. 4(b), the operations T1^l and T2^l are specified as [up-sampling + convolution, down-sampling] (with stride 2). This module is aimed mainly at restoring motion-blurred images.
The entire dual residual network adopts a symmetric encoding-decoding structure, as shown in Fig. 5. The network consists of an initial block that performs 4:1 down-sampling of the input image followed by two convolution operations with a normalization layer + ReLU (n + r), then six repeated DuRB modules; finally, the output of the last DuRB is enlarged back toward the original size by 1:2 up-sampling.
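To make the pairing concrete, the following is a minimal PyTorch sketch of one DuRB with the [up-sampling + dilated convolution, stride-2 convolution] operation pair described above; the class name, the handling of the second residual input, and the activation placement are illustrative readings rather than details fixed by the invention.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DuRBU(nn.Module):
    """One dual residual block for motion-blur removal (DuRB-U style):
    T1 = up-sampling + dilated convolution (kernel k1, dilation d),
    T2 = stride-2 convolution (kernel k2), with two residual connections
    that keep T1 and T2 paired on every path."""
    def __init__(self, ch, k1, k2, d):
        super().__init__()
        self.c1 = nn.Conv2d(ch, ch, 3, padding=1)
        self.c2 = nn.Conv2d(ch, ch, 3, padding=1)
        self.t1 = nn.Conv2d(ch, ch, k1, padding=d * (k1 // 2), dilation=d)
        self.t2 = nn.Conv2d(ch, ch, k2, stride=2, padding=k2 // 2)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x, res):
        # res: paired residual feature at twice the spatial size of x
        h = self.relu(self.c2(self.relu(self.c1(x))))
        h = F.interpolate(h, scale_factor=2, mode='nearest')  # up-sampling half of T1
        res = self.t1(h) + res   # first residual connection (after T1)
        h = self.t2(res) + x     # second residual connection (after T2)
        return h, res

# The six modules of DuRN-U would then be instantiated with the (k1, k2, d)
# triples listed above, e.g. DuRBU(ch=32, k1=5, k2=3, d=1) for the first block:
x, res = torch.randn(1, 32, 64, 64), torch.randn(1, 32, 128, 128)
x, res = DuRBU(32, 5, 3, 1)(x, res)
```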
3) Training a network model: constructing a new ORSU-Net network model provided by the invention, training the constructed network model by using the training samples generated in the step 2, and generating a prediction mask;
the ORSU-Net segmentation network constructed by the invention is further described in detail with reference to FIG. 2:
an ORSU-Net segmentation network with the length L being 7 is built, and the structure of the ORSU-Net segmentation network is as follows in sequence: input layer-second combination module-output layer.
The second combination module is similar to the U-Net structure: down-sampling convolutional layers and the corresponding up-sampling convolutional layers.
The encoder module comprises 7 octave convolutional layers and 5 down-sampling layers, with 32, 64, 128, 256 and 512 feature maps respectively; the decoder module is symmetric to the encoder, comprising 5 up-sampling layers and 7 octave convolutional layers with 512, 256, 128, 64 and 32 feature maps respectively. The output of the input layer is connected to the decoder input, and local features and multi-scale features are fused by residual connection.
Referring to fig. 2, the main structure of the network is as follows: in the ORSU-Net of the invention, the input convolutional layer uses octave convolution instead of conventional convolution for local feature extraction, converting the input feature X (H × W × C_in) into an intermediate feature F1(x) with C_out channels;
ORSU-Net is a U-Net-like symmetric encoder-decoder structure of length L; the deeper the structure, the larger the value of L. In the invention, the ORSU-Net length is L = 7: the encoder module comprises 7 octave convolutional layers and 5 down-sampling layers, with 32, 64, 128, 256 and 512 feature maps respectively; the decoder module is symmetric to the encoder, comprising 5 up-sampling layers and 7 octave convolutional layers with 512, 256, 128, 64 and 32 feature maps respectively. Local features and multi-scale features are fused by the residual connection F1(x) + U(F1(x)), where U denotes the U-shaped encoder-decoder applied to F1(x);
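To make the residual fusion concrete, the following is a minimal PyTorch sketch of a single ORSU-style block: an input convolution F1, a small U-shaped encoder-decoder U(·) with skip connections, and the fusion F1(x) + U(F1(x)). For brevity, plain Conv2d stands in for octave convolution (sketched separately below); the depth, channel counts, and names are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConvBNReLU(nn.Module):
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.body = nn.Sequential(nn.Conv2d(in_ch, out_ch, 3, padding=1),
                                  nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True))
    def forward(self, x):
        return self.body(x)

class ORSUBlock(nn.Module):
    """F1(x) + U(F1(x)): input conv, U-shaped encoder-decoder, residual fusion."""
    def __init__(self, in_ch=3, mid_ch=32, out_ch=64, depth=5):
        super().__init__()
        self.f1 = ConvBNReLU(in_ch, out_ch)   # local feature F1(x)
        self.enc = nn.ModuleList(
            [ConvBNReLU(out_ch, mid_ch)] +
            [ConvBNReLU(mid_ch, mid_ch) for _ in range(depth - 1)])
        self.dec = nn.ModuleList(
            [ConvBNReLU(mid_ch * 2, mid_ch) for _ in range(depth - 2)] +
            [ConvBNReLU(mid_ch * 2, out_ch)])

    def forward(self, x):
        f1 = self.f1(x)
        feats, h = [], f1
        for i, layer in enumerate(self.enc):
            h = layer(h)
            feats.append(h)
            if i < len(self.enc) - 1:         # down-sample between encoder stages
                h = F.max_pool2d(h, 2)
        h = feats.pop()                       # bottleneck feature
        for layer in self.dec:                # up-sample, concatenate skip, convolve
            skip = feats.pop()
            h = F.interpolate(h, size=skip.shape[2:], mode='bilinear',
                              align_corners=False)
            h = layer(torch.cat([h, skip], dim=1))
        return f1 + h                         # residual fusion F1(x) + U(F1(x))

y = ORSUBlock()(torch.randn(1, 3, 256, 256))  # -> (1, 64, 256, 256)
```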
referring to fig. 6, a detailed diagram of an implementation of octave convolution is shown. It is composed of four computation paths corresponding to four terms: $f(X^{H}; W^{H\to H})$, $\mathrm{upsample}\bigl(f(X^{L}; W^{L\to H})\bigr)$, $f(X^{L}; W^{L\to L})$ and $f\bigl(\mathrm{pool}(X^{H}, 2); W^{H\to L}\bigr)$; the two solid paths correspond to information updates within the high- and low-frequency feature maps, and the two dashed paths facilitate information exchange between the two octaves.
Like the decomposition of a natural image into spatial frequency components, octave convolution decomposes the output feature map of a convolutional layer into high- and low-frequency feature maps stored in different groups. It can safely reduce the spatial resolution of the low-frequency group and reduce spatial redundancy through information sharing between adjacent locations. In addition, by performing the low-frequency convolution operation on the low-frequency information, octave convolution effectively enlarges the receptive field in pixel space. Using octave convolution can therefore further reduce the computational and memory overhead of the network while retaining the design advantages of U-Net.
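Assuming an even split of channels between the two frequency groups, a minimal PyTorch sketch of the four computation paths of Fig. 6 might look as follows; the class name and the nearest-neighbour up-sampling / average-pooling choices are illustrative, not prescribed by the invention.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class OctaveConv(nn.Module):
    """Octave convolution: channels are split into a high-frequency group at
    full resolution and a low-frequency group at half resolution; `alpha`
    is the fraction of channels assigned to the low-frequency group."""
    def __init__(self, in_ch, out_ch, kernel_size=3, alpha=0.5, padding=1):
        super().__init__()
        self.in_lo, self.out_lo = int(alpha * in_ch), int(alpha * out_ch)
        self.in_hi, self.out_hi = in_ch - self.in_lo, out_ch - self.out_lo
        # four computation paths: H->H, L->H, L->L, H->L
        self.w_hh = nn.Conv2d(self.in_hi, self.out_hi, kernel_size, padding=padding)
        self.w_lh = nn.Conv2d(self.in_lo, self.out_hi, kernel_size, padding=padding)
        self.w_ll = nn.Conv2d(self.in_lo, self.out_lo, kernel_size, padding=padding)
        self.w_hl = nn.Conv2d(self.in_hi, self.out_lo, kernel_size, padding=padding)

    def forward(self, x_hi, x_lo):
        y_hh = self.w_hh(x_hi)                    # solid path: f(X^H; W^{H->H})
        y_ll = self.w_ll(x_lo)                    # solid path: f(X^L; W^{L->L})
        y_lh = F.interpolate(self.w_lh(x_lo),     # dashed: upsample(f(X^L; W^{L->H}))
                             scale_factor=2, mode='nearest')
        y_hl = self.w_hl(F.avg_pool2d(x_hi, 2))   # dashed: f(pool(X^H, 2); W^{H->L})
        return y_hh + y_lh, y_ll + y_hl

oc = OctaveConv(32, 32)                           # 16 high + 16 low channels
y_hi, y_lo = oc(torch.randn(1, 16, 64, 64), torch.randn(1, 16, 32, 32))
```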
The ORSU-Net and the U-Net provided by the invention have similar structures, and can capture the multi-scale features of the image on the premise of not reducing the high-resolution features. The biggest difference is that ORSU-Net replaces the normal convolution with an octave convolution. Compared with the traditional method, the octave convolution is used, so that the calculation amount and the memory consumption can be further reduced, and meanwhile, the segmentation accuracy is improved.
4) Calculating a loss function: calculating the loss between the pre-training tumor prediction segmentation result and the tumor segmentation truth value;
in the training process, the invention adopts a layered training supervision strategy to replace a standard top-level supervision training and deep supervision scheme, and the tumor segmentation loss function in the invention is as follows:
$$\mathcal{L}=\sum_{h=1}^{H}\left(w_{h}^{BCE}\, l_{BCE}^{(h)}+w_{h}^{KL}\, l_{KL}^{(h)}\right)$$

where H denotes the number of network layers, and the functional expressions of $l_{BCE}$ and $l_{KL}$ are, respectively:

$$l_{BCE}=-\sum_{(i,j)}^{(M,N)}\Bigl[G(i,j)\log S(i,j)+\bigl(1-G(i,j)\bigr)\log\bigl(1-S(i,j)\bigr)\Bigr]$$

$$l_{KL}=\sum_{(i,j)}^{(M,N)}G(i,j)\log\frac{G(i,j)}{S(i,j)}$$

where (i, j) are the pixel coordinates, (M, N) are the width and height of the image, G(i, j) and S(i, j) are the ground-truth and predicted target segmentation pixel values, respectively, and $w_{h}^{BCE}$ and $w_{h}^{KL}$ are the weights of the BCE loss function and the KL loss function.
For each layer, we used the standard BCE loss function and KL loss function to calculate the loss. By adding a pair of probabilistic predictive match penalties (i.e., KL penalty functions) between any two layers, multi-layer interactions between different layers are facilitated. The optimization targets of different layer loss functions are consistent, so that the robustness and the generalization of the model are ensured.
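As one reading of this layered supervision, the following PyTorch sketch computes the per-layer BCE + KL loss against the ground truth for every side output (assumed to be logit maps already resized to the mask resolution); the function and weight names are illustrative, and the pairwise layer-to-layer KL matching described above would replace `target` with the probability map of another layer.

```python
import torch
import torch.nn.functional as F

def layered_supervision_loss(side_outputs, target, w_bce=None, w_kl=None, eps=1e-7):
    """side_outputs: list of H per-layer logit maps, each (B, 1, M, N);
    target: float ground-truth mask G in {0, 1} of the same shape.
    Returns sum_h (w_bce[h] * l_BCE^(h) + w_kl[h] * l_KL^(h))."""
    H = len(side_outputs)
    w_bce = w_bce if w_bce is not None else [1.0] * H
    w_kl = w_kl if w_kl is not None else [1.0] * H
    total = 0.0
    for h, logits in enumerate(side_outputs):
        s = torch.sigmoid(logits).clamp(eps, 1.0 - eps)   # predicted S(i, j)
        l_bce = F.binary_cross_entropy(s, target)
        # KL term G(i,j) log(G(i,j)/S(i,j)); 0 log 0 is treated as 0
        l_kl = (target * torch.log((target + eps) / s)).mean()
        total = total + w_bce[h] * l_bce + w_kl[h] * l_kl
    return total
```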
5) Optimizing the network: and taking the loss function as an optimization objective function, and enabling the convolutional neural network to participate in the gradient back propagation process in network optimization to realize optimization of cystoscope video tumor segmentation.
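A minimal sketch of this optimization step follows; the model interface, data loader, optimizer choice, and learning rate are illustrative assumptions, and `layered_supervision_loss` refers to the sketch above.

```python
import torch

# assumes `model` returns the list of side outputs used above and
# `loader` yields (frame, mask) float tensor pairs
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
for frame, mask in loader:
    optimizer.zero_grad()
    side_outputs = model(frame)
    loss = layered_supervision_loss(side_outputs, mask)
    loss.backward()      # gradient back-propagation through the segmentation network
    optimizer.step()
```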
The embodiment of the invention also provides an ORSU-Net-based tumor image segmentation system, which comprises the following modules:
the data set acquisition module is used for acquiring a medical cystoscope detection video, selecting bladder tumor key frames from a video stream to make a training sample, labeling the bladder tumor key frames by using irregular frames, and dividing each pixel in an image into corresponding categories to realize pixel-level classification;
the data set preprocessing module is used for carrying out image enhancement and image denoising preprocessing on the manufactured data set so as to solve the segmentation problem in the test process;
the network model training module is used for constructing a new ORSU-Net segmentation network model, training the constructed network model by using the training samples generated in the data set preprocessing module, and generating a prediction mask;
the structure of the ORSU-Net segmentation network model comprises the following components in sequential connection: input layer-second combination module-output layer;
the second combination module comprises an encoder module and a decoder module, wherein the encoder module comprises n1 octave convolutional layers and n2 downsample layers, the decoder module is symmetrical to the encoder module and comprises n2 upsample layers and n1 octave convolutional layers, and the local feature and the multi-scale feature are fused through residual connection;
the loss function calculation module is used for calculating the loss between the pre-training tumor prediction segmentation result and the tumor segmentation truth value;
and the network optimization module is used for taking the loss function as an optimization target function, so that the segmentation network model participates in the gradient back propagation process in network optimization, and the optimization of tumor image segmentation is realized.
Further, a dual residual error network is used in the data set preprocessing module to perform noise reduction processing on the data set, and the specific implementation mode is as follows;
the network structure of the dual residual error network comprises: two input convolutional layers-n 3 dual residual modules for motion blur removal-two output convolutional layers, the result of the second output convolutional layer is residual connected with the input;
the structure of a dual residual module for removing motion blur is set as follows: a first convolutional layer-an upsampling layer and a second convolutional layer-a downsampling layer, wherein the second convolutional layer has a convolutional kernel size of k1 and a coefficient of expansion of d, and the downsampling is such that the convolutional kernel size is k2Convolution layer with step size of 2.
Further, the dual residual network includes six dual residual modules.
Further, the expression of the loss function is as follows;
Figure BDA0003016098490000081
where H denotes the number of network layers, lBCE、lKLThe functional expressions of (a) are respectively:
Figure BDA0003016098490000091
Figure BDA0003016098490000092
wherein (i, j) is the pixel coordinate, (M, N) is the width and height of the image, G (i, j) and S (i, j) are the true value and the predicted target segmentation pixel value, respectively,
Figure BDA0003016098490000093
and
Figure BDA0003016098490000094
the weights of the BCE loss function and the KL loss function are respectively.
The specific embodiments described herein are merely illustrative of the spirit of the invention. Various modifications or additions may be made to the described embodiments or alternatives may be employed by those skilled in the art without departing from the spirit or ambit of the invention as defined in the appended claims.

Claims (8)

1. An ORSU-Net-based tumor image segmentation method is characterized by comprising the following steps:
step 1), data set acquisition: collecting a medical cystoscope detection video, selecting a bladder tumor key frame from a video stream to make a training sample, labeling the bladder tumor key frame by using an irregular frame, and dividing each pixel in an image into corresponding categories to realize pixel-level classification;
step 2), preprocessing a data set: carrying out image enhancement and image denoising pretreatment on the manufactured data set so as to solve the segmentation problem in the test process;
step 3), training a network model: constructing a new ORSU-Net segmentation network model, training the constructed network model by using the training samples generated in the step 2, and generating a prediction mask;
the structure of the ORSU-Net segmentation network model comprises the following components in sequential connection: input layer-second combination module-output layer;
the second combination module comprises an encoder module and a decoder module, wherein the encoder module comprises n1 octave convolutional layers and n2 downsample layers, the decoder module is symmetrical to the encoder module and comprises n2 upsample layers and n1 octave convolutional layers, and the local feature and the multi-scale feature are fused through residual connection;
step 4), calculating a loss function: calculating the loss between the pre-training tumor prediction segmentation result and the tumor segmentation truth value;
step 5), optimizing the network: and taking the loss function as an optimization objective function, and enabling the segmentation network model to participate in a gradient back propagation process in network optimization to realize optimization of tumor image segmentation.
2. The ORSU-Net based tumor image segmentation method of claim 1, wherein: in step 2), a dual residual network is used to perform noise reduction on the data set, implemented as follows;
the network structure of the dual residual network comprises: two input convolutional layers - n3 dual residual modules for motion-blur removal - two output convolutional layers, with the result of the second output convolutional layer residual-connected to the input;
each dual residual module for motion-blur removal is structured as: a first convolutional layer - an up-sampling layer and a second convolutional layer - a down-sampling layer, wherein the second convolutional layer has kernel size k1 and dilation coefficient d, and the down-sampling layer is a convolutional layer with kernel size k2 and stride 2.
3. The ORSU-Net based tumor image segmentation method of claim 2, wherein: the dual residual network includes six dual residual modules.
4. The ORSU-Net based tumor image segmentation method of claim 1, wherein: the expression of the loss function is as follows;
$$\mathcal{L}=\sum_{h=1}^{H}\left(w_{h}^{BCE}\, l_{BCE}^{(h)}+w_{h}^{KL}\, l_{KL}^{(h)}\right)$$

where H denotes the number of network layers, and the functional expressions of $l_{BCE}$ and $l_{KL}$ are, respectively:

$$l_{BCE}=-\sum_{(i,j)}^{(M,N)}\Bigl[G(i,j)\log S(i,j)+\bigl(1-G(i,j)\bigr)\log\bigl(1-S(i,j)\bigr)\Bigr]$$

$$l_{KL}=\sum_{(i,j)}^{(M,N)}G(i,j)\log\frac{G(i,j)}{S(i,j)}$$

where (i, j) are the pixel coordinates, (M, N) are the width and height of the image, G(i, j) and S(i, j) are the ground-truth and predicted target segmentation pixel values, respectively, and $w_{h}^{BCE}$ and $w_{h}^{KL}$ are the weights of the BCE loss function and the KL loss function.
5. An ORSU-Net based tumor image segmentation system, comprising the following modules:
the data set acquisition module is used for acquiring a medical cystoscope detection video, selecting bladder tumor key frames from a video stream to make a training sample, labeling the bladder tumor key frames by using irregular frames, and dividing each pixel in an image into corresponding categories to realize pixel-level classification;
the data set preprocessing module is used for carrying out image enhancement and image denoising preprocessing on the manufactured data set so as to solve the segmentation problem in the test process;
the network model training module is used for constructing a new ORSU-Net segmentation network model, training the constructed network model by using the training samples generated in the data set preprocessing module, and generating a prediction mask;
the structure of the ORSU-Net segmentation network model comprises the following components in sequential connection: input layer-second combination module-output layer;
the second combination module comprises an encoder module and a decoder module, wherein the encoder module comprises n1 octave convolutional layers and n2 downsample layers, the decoder module is symmetrical to the encoder module and comprises n2 upsample layers and n1 octave convolutional layers, and the local feature and the multi-scale feature are fused through residual connection;
the loss function calculation module is used for calculating the loss between the pre-training tumor prediction segmentation result and the tumor segmentation truth value;
and the network optimization module is used for taking the loss function as an optimization target function, so that the segmentation network model participates in the gradient back propagation process in network optimization, and the optimization of tumor image segmentation is realized.
6. An ORSU-Net based tumor image segmentation system according to claim 5, wherein: the data set preprocessing module uses a dual residual network to perform noise reduction on the data set, implemented as follows;
the network structure of the dual residual network comprises: two input convolutional layers - n3 dual residual modules for motion-blur removal - two output convolutional layers, with the result of the second output convolutional layer residual-connected to the input;
each dual residual module for motion-blur removal is structured as: a first convolutional layer - an up-sampling layer and a second convolutional layer - a down-sampling layer, wherein the second convolutional layer has kernel size k1 and dilation coefficient d, and the down-sampling layer is a convolutional layer with kernel size k2 and stride 2.
7. An ORSU-Net based tumor image segmentation system according to claim 6, wherein: the dual residual network includes six dual residual modules.
8. An ORSU-Net based tumor image segmentation system according to claim 5, wherein: the expression of the loss function is as follows;
$$\mathcal{L}=\sum_{h=1}^{H}\left(w_{h}^{BCE}\, l_{BCE}^{(h)}+w_{h}^{KL}\, l_{KL}^{(h)}\right)$$

where H denotes the number of network layers, and the functional expressions of $l_{BCE}$ and $l_{KL}$ are, respectively:

$$l_{BCE}=-\sum_{(i,j)}^{(M,N)}\Bigl[G(i,j)\log S(i,j)+\bigl(1-G(i,j)\bigr)\log\bigl(1-S(i,j)\bigr)\Bigr]$$

$$l_{KL}=\sum_{(i,j)}^{(M,N)}G(i,j)\log\frac{G(i,j)}{S(i,j)}$$

where (i, j) are the pixel coordinates, (M, N) are the width and height of the image, G(i, j) and S(i, j) are the ground-truth and predicted target segmentation pixel values, respectively, and $w_{h}^{BCE}$ and $w_{h}^{KL}$ are the weights of the BCE loss function and the KL loss function.
CN202110389723.5A 2021-04-12 2021-04-12 Tumor image segmentation method and system based on ORSU-Net Pending CN113205094A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110389723.5A CN113205094A (en) 2021-04-12 2021-04-12 Tumor image segmentation method and system based on ORSU-Net

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110389723.5A CN113205094A (en) 2021-04-12 2021-04-12 Tumor image segmentation method and system based on ORSU-Net

Publications (1)

Publication Number Publication Date
CN113205094A true CN113205094A (en) 2021-08-03

Family

ID=77026560

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110389723.5A Pending CN113205094A (en) 2021-04-12 2021-04-12 Tumor image segmentation method and system based on ORSU-Net

Country Status (1)

Country Link
CN (1) CN113205094A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113781410A (en) * 2021-08-25 2021-12-10 南京邮电大学 Medical image segmentation method and system based on MEDU-Net + network
CN114612479A (en) * 2022-02-09 2022-06-10 苏州大学 Medical image segmentation method based on global and local feature reconstruction network
CN115908831A (en) * 2022-11-18 2023-04-04 中国人民解放军军事科学院系统工程研究院 Image detection method and device

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111627019A (en) * 2020-06-03 2020-09-04 西安理工大学 Liver tumor segmentation method and system based on convolutional neural network

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111627019A (en) * 2020-06-03 2020-09-04 西安理工大学 Liver tumor segmentation method and system based on convolutional neural network

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Chenjie Wang et al., "U^2-ONet: A Two-level Nested Octave U-structure with Multiscale Attention Mechanism for Moving Instances Segmentation", arXiv:2007.13092v1 *
Xing Liu et al., "Dual Residual Networks Leveraging the Potential of Paired Operations for Image Restoration", arXiv:1903.08817v1 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113781410A (en) * 2021-08-25 2021-12-10 南京邮电大学 Medical image segmentation method and system based on MEDU-Net + network
CN113781410B (en) * 2021-08-25 2023-10-13 南京邮电大学 Medical image segmentation method and system based on MEDU-Net+network
CN114612479A (en) * 2022-02-09 2022-06-10 苏州大学 Medical image segmentation method based on global and local feature reconstruction network
CN115908831A (en) * 2022-11-18 2023-04-04 中国人民解放军军事科学院系统工程研究院 Image detection method and device


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210803