CN114693933A - Medical image segmentation device based on a generative adversarial network and multi-scale feature fusion - Google Patents

Medical image segmentation device based on a generative adversarial network and multi-scale feature fusion Download PDF

Info

Publication number
CN114693933A
Authority
CN
China
Prior art keywords
liver
network
segmentation
data
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210361443.8A
Other languages
Chinese (zh)
Inventor
孙美君
杨淑清
王征
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianjin University
Original Assignee
Tianjin University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianjin University
Priority to CN202210361443.8A
Publication of CN114693933A
Legal status: Pending


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/2155 Generating training patterns; Bootstrap methods, e.g. bagging or boosting characterised by the incorporation of unlabelled data, e.g. multiple instance learning [MIL], semi-supervised techniques using expectation-maximisation [EM] or naïve labelling
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/047 Probabilistic or stochastic networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/0002 Inspection of images, e.g. flaw detection
    • G06T7/0012 Biomedical image inspection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10072 Tomographic images
    • G06T2207/10081 Computed x-ray tomography [CT]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20084 Artificial neural networks [ANN]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30004 Biomedical image processing
    • G06T2207/30056 Liver; Hepatic
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30004 Biomedical image processing
    • G06T2207/30096 Tumor; Lesion

Abstract

The invention discloses a medical image segmentation device based on a generative adversarial network and multi-scale feature fusion, comprising a coarse-to-fine liver tumor segmentation framework. The input of the framework is an original 3D CT image, which is preprocessed to obtain a normalized image. The liver is segmented by the discriminator of a trained generative adversarial network based on feature-map synthesis, which outputs a prediction probability map; the value of each voxel in the probability map represents the probability that the voxel belongs to the liver, and the network learns more information through adversarial learning between the generator and the discriminator. The liver ROI is then extracted automatically: the 3D liver segmentation result is multiplied element-wise with the normalized image to mask out unrelated organs, the minimum bounding cuboid of the liver region is computed and cropped, and livers of different sizes are resampled to the same size. Finally, taking the liver ROI as input, a trained three-channel cascade network based on an improved V-Net fuses multi-scale features, enlarges the receptive field, handles the differences in position, shape and size of target regions across different data as well as the blurred boundaries of lesion regions, and outputs the tumor segmentation result.

Description

Medical image segmentation device based on a generative adversarial network and multi-scale feature fusion
Technical Field
The invention relates to the field of computer vision and machine learning, in particular to a medical image segmentation device based on a generative adversarial network and multi-scale feature fusion.
Background
With the development of computer technology and biomedicine, medical imaging devices have become widespread, and various medical imaging technologies are widely applied in the clinic. For example, magnetic resonance imaging (MRI), computed tomography (CT), ultrasound, and X-ray imaging are all used to visualize organs, tissues, and diseased areas inside the human body in a non-invasive manner. Medical image segmentation plays an indispensable auxiliary role in disease diagnosis, case analysis, surgical planning and prognosis evaluation; it can provide doctors with extremely valuable information such as the location of organ lesions, the size of the lesion area and the severity of the lesion, and can support real-time imaging during surgery. Deep learning has developed rapidly in the field of medical image segmentation: by combining low-level features into abstract high-level features through nonlinear combinations, it addresses the low resolution and high complexity that make medical images difficult to analyze, so the construction, improvement and reasonable interpretation of deep learning networks has become one of the hot topics of current research at the intersection of artificial intelligence and medicine.
At present, high-precision automatic segmentation of liver tumors remains one of the most challenging tasks in medical image processing. Owing to the cost of labor and the professional knowledge required, it is difficult to produce voxel-level labels of the liver and liver tumors for a large liver image dataset, and this lack of labeled data is undoubtedly a pressing problem for data-driven deep learning models. Moreover, compared with the liver, liver tumors are small in volume and uneven in gray-level distribution; their shape, number and position vary from person to person; and the boundaries between tumors and organs are quite blurred, which further increases the difficulty of fine-grained liver tumor segmentation.
Currently, medical image segmentation methods at home and abroad can be divided into conventional techniques, techniques based on shallow machine learning, and techniques based on deep learning. Traditional image segmentation techniques use features of an image such as gray scale, texture and edges, and segment the target area through manually set feature values; the quality of the segmentation result is closely tied to the manually set features, so prediction performance in complex scenes is usually limited and a large amount of available raw image information is ignored. Facing the limitations of traditional segmentation techniques, the development of machine learning provided a new solution for medical image segmentation; shallow machine learning techniques such as clustering and support vector machines all rely on manual feature extraction, so labor and time costs are high, and the quality of feature selection directly affects the segmentation result. Research on deep learning has undergone a long development: from the introduction of the McCulloch-Pitts (MP) neuron model in 1943, successive researchers proposed network models such as the perceptron, the back-propagation algorithm, convolutional neural networks, generative adversarial networks and residual networks, each injecting fresh blood into machine learning. With the application and popularization of artificial intelligence in the medical field, deep learning algorithms are used to segment the liver and its tumors from medical images, and segmentation accuracy has greatly improved compared with traditional image segmentation methods and methods based on shallow machine learning; nevertheless, some problems remain as barriers to further development.
For example, for liver tumors with uneven density distribution or at different scales, existing segmentation techniques still leave considerable room for improvement; in addition, no large-scale labeled dataset currently exists that can fully meet the training requirements of a deep network, so it is necessary to study how to learn from limited sample data.
For medical images, especially three-dimensional images, acquiring annotation data requires a great deal of labor and time, and this small-sample problem hinders the development of AI (artificial intelligence) in medical imaging to a certain extent. In recent years researchers have tried a variety of approaches to fully exploit incomplete datasets, proposing many high-performance models to reduce the need for labeled data in medical image segmentation. Data determines the upper limit of model performance; when the amount of labeled data is very limited, making a small amount of data play a greater role is an urgent problem in medical image segmentation research.
At present, deep learning achieves excellent results in liver tumor segmentation, but owing to the characteristics of medical images, the segmentation task still faces the following problems:
1) to reduce computation, most 3D segmentation models splice 2D segmentation results into a 3D result; in the model's optimization target, the ROI (region of interest) boundary curve in a 2D image slice then replaces the ROI surface of the 3D image, which undoubtedly reduces segmentation accuracy;
2) in abdominal CT images of different cases, the size, position and texture of liver tumors differ greatly, and accurately locating the tumor region directly with an end-to-end network is very difficult;
3) training deep neural networks relies on a large number of labeled medical images, and completing accurate voxel-level labeling manually is very time-consuming, labor-intensive and somewhat subjective. In addition, because medical images follow different imaging protocols, a training set labeled for one study is difficult to reuse for another and often requires re-labeling. How to make full use of the existing small amount of label information when labeled data is incomplete is therefore worth further exploration;
4) most segmentation algorithms perform well only on images with clear, sharp boundaries and strong contrast. Affected by the partial volume effect, tissue motion, noise, artifacts and the like, liver tumor boundaries in CT images are blurred, contrast with the liver is low, and the data are complex. A neural network model can delineate edges well by extracting a large number of features; how to use data at different scales to improve the model's representational capability, and thereby achieve refined segmentation of tumor edges, is also a problem to be solved.
Disclosure of Invention
Aiming at the small-sample problem of medical images, the invention segments the liver and its tumors from abdominal CT images through a semi-supervised learning mode that combines the multi-scale semantic information of medical images, using a generative adversarial network based on feature-map synthesis together with improved multi-scale V-Nets, as described in detail below.
A medical image segmentation apparatus based on a generative adversarial network and multi-scale feature fusion, the apparatus comprising a coarse-to-fine liver tumor segmentation framework:
the input of the segmentation framework is an original 3D CT image, which is preprocessed to obtain a normalized image; the liver is segmented by the discriminator of a trained generative adversarial network based on feature-map synthesis, which outputs a prediction probability map;
the value of each voxel in the probability map represents the probability that the voxel belongs to the liver, and the generative adversarial network learns more information through adversarial learning between the generator and the discriminator;
automatic extraction of the liver ROI: the 3D liver segmentation result is multiplied element-wise with the normalized image to mask out unrelated organs, the minimum bounding cuboid of the liver region is computed and cropped, and livers of different sizes are resampled to the same size;
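As a minimal sketch of this ROI-extraction step, the masking, bounding-cuboid cropping and resampling can be written in a few lines of numpy. Nearest-neighbor resampling and the 64 × 64 × 64 output size are illustrative assumptions made for the example, not choices taken from the patent:

```python
import numpy as np

def resize_nn(vol, out_shape):
    """Nearest-neighbor resampling of a 3D volume to a fixed shape."""
    idx = [np.round(np.linspace(0, s - 1, o)).astype(int)
           for s, o in zip(vol.shape, out_shape)]
    return vol[np.ix_(*idx)]

def extract_liver_roi(normalized_image, liver_mask, out_shape=(64, 64, 64)):
    """Mask out unrelated organs, crop the minimum bounding cuboid of
    the liver, and resample the crop to a fixed size."""
    # Element-wise (dot) multiplication with the 3D liver segmentation.
    masked = normalized_image * (liver_mask > 0)

    # Minimum bounding cuboid of the liver region.
    coords = np.argwhere(liver_mask > 0)
    lo, hi = coords.min(axis=0), coords.max(axis=0) + 1
    crop = masked[lo[0]:hi[0], lo[1]:hi[1], lo[2]:hi[2]]

    # Resample livers of different sizes to the same size.
    return resize_nn(crop, out_shape)
```

In the full framework the liver mask would come from the discriminator's prediction; trilinear interpolation would normally replace the nearest-neighbor resampling used here for brevity.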
the method comprises the steps of taking a liver ROI as input, fusing multi-scale features by utilizing a trained three-channel cascade network based on improved V-Net, expanding a receptive field, processing the problems of position, shape and size difference of target regions and fuzzy boundaries of lesion regions in different data, and finally obtaining a tumor segmentation result.
The discriminator is a V-Net network integrated with a pyramid pooling module, the generator is a neural network synthesized based on a feature map, the generator is trained by using the feature map output by the spatial pyramid pooling, and distribution of CT images is learned from label-free data to generate pseudo image data.
The discriminator uses the divided labeled data, non-labeled data and pseudo data generated by the generator to carry out semi-supervised learning; the network of the discriminator and the network of the generator mutually resist and learn until the discriminant and the generator reach dynamic balance, and the training is finished.
Further, the V-Net network with the pyramid pooling module is as follows:
pyramid pooling is improved with a spatial shape awareness module that captures the long-distance dependencies between different lesion areas in an image as well as local context dependencies;
for the input tensor, three mutually perpendicular slice-shaped sub-modules process the input to obtain three outputs, which are expanded to the same size as the input tensor and fused to obtain a new feature vector.
Furthermore, the fusion process is a dot-product operation followed by a Softmax activation function; the fused tensor and the original input tensor then undergo the same fusion operation to obtain the final output tensor;
three-dimensional pooling layers at three scales pool the input feature maps to 1 × 1 × 1, 2 × 2 × 2 and 3 × 3 × 3 respectively; the number of channels of the three pooling results is reduced by convolution; the results are then up-sampled to the size of the original feature map and fused with the original feature map and the output of the spatial shape awareness module; and the number of channels is reduced again by convolution to obtain a feature map containing multi-scale information.
The three-channel cascade network based on the improved V-Net comprises a multi-scale segmentation network:
the input of the first branch is the original input data with each of its three dimensions reduced to 0.5 times; after the corresponding segmentation result is output, it is up-sampled by a factor of 2 and fused with the results of the other two branches to output the segmentation map of the liver tumor.
The technical scheme provided by the invention has the following beneficial effects:
1. the invention uses the idea of a cascade network to extract the liver region directly from 3D images and then segment the tumor region more finely from the liver region, realizing end-to-end detection in medical images;
2. the invention uses semi-supervised learning, which makes full use of the information in unlabeled data, alleviates the over-dependence of deep learning models on labeled data, and achieves higher segmentation accuracy with less labeled data;
3. the invention improves the original pyramid pooling module with a spatial shape awareness module; through multiple receptive fields of different sizes it can match appropriate target sizes from both global and local perspectives, effectively exploits features at different scales, uses the semantic information of high-level features to make the feature maps of low-level features more complete, uses the details of low-level features to refine the edges of high-level features, and avoids interference from background noise.
Drawings
FIG. 1 is a schematic structural diagram of the medical image segmentation apparatus based on a generative adversarial network and multi-scale feature fusion;
FIG. 2 is a schematic diagram of the liver segmentation model;
FIG. 3 is a schematic diagram of the training process of the liver segmentation model;
FIG. 4 is a schematic diagram of the pyramid pooling module improved with spatial shape awareness;
FIG. 5 is a schematic diagram of the spatial shape awareness module;
FIG. 6 is a schematic diagram of the tumor segmentation model;
FIG. 7 is a schematic diagram visualizing the liver tumor segmentation results of each model.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention are described in further detail below.
Example 1
The embodiment of the invention provides a medical image segmentation device based on a generative adversarial network and multi-scale feature fusion, and describes a coarse-to-fine liver tumor segmentation framework. As shown in fig. 1, the input of the framework is an original 3D CT image, which is preprocessed to obtain a normalized image. The liver is then segmented by the discriminator of a trained generative adversarial network based on feature-map synthesis (FRGAN), which outputs a prediction probability map; the value of each voxel in the probability map represents the probability that the voxel belongs to the liver, and FRGAN learns more information from less labeled data through adversarial learning between the generator and the discriminator. The liver ROI is then extracted automatically: the 3D liver segmentation result is multiplied element-wise with the normalized image to mask out unrelated organs, the minimum bounding cuboid of the liver region is computed and cropped, and livers of different sizes are resampled to the same size. Finally, taking the liver ROI as input, a trained three-channel cascade network based on an improved V-Net (a fully convolutional neural network for segmenting volumetric medical images) fuses multi-scale features, enlarging the receptive field and better handling the differences in position, shape and size of target regions across different data as well as the blurred boundaries of lesion regions, and the tumor segmentation result is obtained.
In summary, tumor segmentation in the above framework is performed within a smaller and more accurate liver region rather than in the original image, which effectively reduces false tumor segmentations and improves tumor segmentation accuracy.
Example 2
The scheme of Example 1 is further described below in conjunction with figs. 2-5:
1. Liver region extraction
Embodiments of the present invention use a generative adversarial network based on feature-map synthesis (FRGAN) as the liver region segmentation module of the segmentation framework in fig. 1. Fig. 2 shows the network structure of FRGAN, which mainly comprises a generator and a discriminator: a V-Net network with an integrated pyramid pooling module serves as the discriminator and outputs the segmentation result, and a neural network based on the feature-map synthesis method serves as the generator. The generator is trained on the feature maps output by the S-PPM (spatial pyramid pooling) module; it learns the distribution of CT images from unlabeled data and generates pseudo image data. To verify the performance of the model, the experimental data are divided into labeled and unlabeled data at a ratio of 4:6, and the discriminator performs semi-supervised learning with the partitioned labeled data, the unlabeled data and the pseudo data produced by the generator. The discriminator and generator networks learn adversarially against each other until they reach a dynamic balance, at which point training ends. The three colored arrows in fig. 3 represent the training process of the network.
The left half of fig. 2 shows the network structure of the generator, which produces a pseudo image using the feature-map synthesis method. The S-PPM module in the discriminator network aggregates multi-level, multi-scale feature information; its output feature map serves as the input of the generator, and a pseudo image of the same size as the real data is generated through 4 upsampling operations and 4 stages of convolution. The real and fake images then each pass through the encoder part of the discriminator to extract a feature map of the unlabeled image (Unlabeled Feature-map) and a feature map of the fake image (Fake Feature-map); the difference between their means is used as the loss, and after a number of iterations the generator can synthesize, through the feature maps, fake images that are closer to the real images.
Fig. 4 shows the network structure of the improved pyramid pooling module S-PPM. Organ tumors in medical images are irregular in shape and unevenly distributed; the original pyramid pooling relies entirely on stacking pooling layers of different sizes and has difficulty learning target-region features and key position information from medical images. Embodiments of the present invention therefore improve the pyramid pooling module with a Spatial Shape Awareness Module (SSAM). The network structure of SSAM is shown in fig. 5; it can flexibly capture the long-distance dependencies between different lesion areas in an image, effectively capture local context correlations, and suppress interference from irrelevant regions. For the input tensor, three mutually perpendicular slice-shaped sub-modules process the input to obtain three outputs, which are expanded to the same size as the input tensor and fused to obtain a new feature vector; the fusion process is a dot-product operation followed by a Softmax activation function. Finally, the fused tensor and the original input tensor undergo the same fusion operation to obtain the final output tensor. On this basis, three-dimensional pooling layers at three scales pool the input feature maps to 1 × 1 × 1, 2 × 2 × 2 and 3 × 3 × 3 respectively; the number of channels of the three pooling results is then reduced by convolution; the results are up-sampled to the size of the original feature map and fused with the original feature map and the output of SSAM; and the number of channels is reduced again by convolution to obtain a feature map containing multi-scale information.
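The data flow of the S-PPM module can be approximated in plain numpy. This is a loose, hedged interpretation: the slab poolings, the dot-product-then-Softmax fusion, and the multi-scale pooling branches follow the text, while all learned convolutions (including the 1 × 1 × 1 channel-reduction layers) are omitted, so only tensor shapes and fusion order are illustrated:

```python
import numpy as np

def softmax(v, axis=-1):
    e = np.exp(v - v.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def ssam(x):
    """Spatial shape awareness sketch: three mutually perpendicular slab
    poolings, broadcast back to the input size, dot-product fused,
    passed through Softmax, and fused again with the input tensor."""
    slabs = [x.mean(axis=a, keepdims=True) for a in (1, 2, 3)]
    fused = np.ones_like(x)
    for s in slabs:
        fused = fused * np.broadcast_to(s, x.shape)   # dot-product fusion
    att = softmax(fused.reshape(x.shape[0], -1)).reshape(x.shape)
    return x * att                                    # fuse with the input

def adaptive_avg_pool3d(x, o):
    """Average-pool a (C, D, H, W) tensor to (C, o, o, o)."""
    splits = [np.array_split(np.arange(n), o) for n in x.shape[1:]]
    out = np.zeros((x.shape[0], o, o, o), dtype=x.dtype)
    for i, di in enumerate(splits[0]):
        for j, hj in enumerate(splits[1]):
            for k, wk in enumerate(splits[2]):
                out[:, i, j, k] = x[:, di][:, :, hj][:, :, :, wk].mean(axis=(1, 2, 3))
    return out

def upsample_nn(x, out_spatial):
    """Nearest-neighbor upsampling of (C, D, H, W) to a spatial size."""
    idx = [np.round(np.linspace(0, s - 1, o)).astype(int)
           for s, o in zip(x.shape[1:], out_spatial)]
    return x[:, idx[0]][:, :, idx[1]][:, :, :, idx[2]]

def s_ppm(x):
    """Concatenate the input, the SSAM output, and the three pooled
    branches (1, 2, 3) upsampled back to the input size; the real
    module would then reduce channels with a convolution."""
    branches = [x, ssam(x)]
    for o in (1, 2, 3):
        branches.append(upsample_nn(adaptive_avg_pool3d(x, o), x.shape[1:]))
    return np.concatenate(branches, axis=0)
```

With a 4-channel input, the concatenation yields 4 × 5 = 20 channels before the final channel-reducing convolution that this sketch omits.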
The loss function l_G of the generator is as follows:
l_G = ‖E_{x~P_data(x)} f(x) − E_{x′~G(x)} f(x′)‖₂² (1)
where x represents the input unlabeled data, x′ represents the pseudo data generated by the generator from the unlabeled data, f(x) represents the unlabeled feature map extracted after the unlabeled data passes through the S-PPM module in the discriminator, f(x′) represents the pseudo feature map extracted after the pseudo data passes through the S-PPM module, E denotes the expectation over the unlabeled and pseudo data, and G(x) represents the distribution of images produced by the generator.
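Eq. (1) is a feature-matching loss: the generator is penalized by the distance between the mean feature maps of unlabeled and pseudo images. A minimal numpy sketch, assuming the features are given as (batch, feature) arrays:

```python
import numpy as np

def generator_loss(f_unlabeled, f_fake):
    """Feature-matching form of the generator loss: squared L2 distance
    between the batch-mean unlabeled feature map f(x) and the
    batch-mean pseudo feature map f(x') from the S-PPM module."""
    diff = f_unlabeled.mean(axis=0) - f_fake.mean(axis=0)
    return float(np.sum(diff ** 2))
```

As the two mean feature maps converge, the loss goes to zero, which is the dynamic balance the training aims for.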
The discriminator computes the supervised loss between the labels of the real labeled samples and the predicted target-region segmentation of the labeled samples, and at the same time computes the unsupervised losses for the predicted target-region segmentation of the unlabeled samples and of the pseudo samples. The discriminator loss l_D consists of three parts: the labeled-image loss l_Labeled, the unlabeled-image loss l_Unlabeled, and the pseudo-image loss l_Fake:
l_D = l_Labeled + l_Unlabeled + l_Fake (2)
For the input data, x_G represents the label corresponding to a labeled image x; x, x_G ~ P_data(x, x_G) indicates that the input is labeled data, x ~ P_data(x) indicates that the input is unlabeled data, and x′ ~ G(f) indicates that the input is pseudo data, where G(f) represents the distribution of the pseudo data. Using the cross-entropy loss function (well known to those skilled in the art), the losses of the three kinds of input image can be computed as:
l_Labeled = −E_{x,x_G~P_data(x,x_G)} Σ_i log p_D(y_i = x_G,i | x, y_i < N+1)
l_Unlabeled = −E_{x~P_data(x)} Σ_i log(1 − p_D(y_i = N+1 | x)) (3)
l_Fake = −E_{x′~G(f)} Σ_i log p_D(y_i = N+1 | x′)
where p_D(y_i | x, y_i < N+1) denotes the probability that voxel i in image x is predicted to belong to class y_i, p_D(y_i = N+1 | x) denotes the probability that a voxel in the unlabeled data is predicted to be a fake image, p_D(y_i = N+1 | x′) denotes the probability that a voxel in the pseudo data is predicted to be a fake image, N = 1, and p_D denotes the discriminator network's predicted probability for each kind of data.
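The three-part discriminator loss of Eq. (2) can be sketched as follows. The cross-entropy terms are a standard semi-supervised-GAN reading of the probabilities described above (labeled voxels should match their class, unlabeled voxels should not look fake, generated voxels should look fake); the array shapes are assumptions made for the example:

```python
import numpy as np

def discriminator_loss(p_labeled, y_true, p_unlabeled_fake, p_fake_fake, eps=1e-8):
    """Semi-supervised discriminator loss: N real classes plus an
    extra class N+1 for "fake".
    p_labeled:        (V, N+1) per-voxel class probabilities for labeled voxels
    y_true:           (V,)    ground-truth class index of each labeled voxel
    p_unlabeled_fake: (V,)    probability that an unlabeled voxel is fake
    p_fake_fake:      (V,)    probability that a generated voxel is fake
    """
    v = np.arange(len(y_true))
    l_labeled = -np.mean(np.log(p_labeled[v, y_true] + eps))
    l_unlabeled = -np.mean(np.log(1.0 - p_unlabeled_fake + eps))  # real data: not fake
    l_fake = -np.mean(np.log(p_fake_fake + eps))                  # pseudo data: fake
    return l_labeled + l_unlabeled + l_fake
```

A confident, correct discriminator drives all three terms toward zero; the generator's feature-matching loss pulls in the opposite direction until the two reach dynamic balance.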
The loss function computes, in each training iteration, the difference between the network's forward-pass result and the ground truth, guiding the next training step in the correct direction and updating the parameters of the neural network during training; after training, the network has learned the distribution of the data images and has the ability to extract liver regions from them.
2. Tumor segmentation
After liver region extraction is completed, the liver tumor is segmented. To make better use of multi-scale information and improve the segmentation performance of the network model, a three-channel parallel liver tumor segmentation network (MP V-Nets) is designed based on the S-PPM V-Net; its structure is shown in fig. 6. The multi-scale segmentation network contains 3 similar segmentation branches, which differ only in the scale of their inputs and in the sampling operations performed when the final segmentation results are merged. Taking the first branch in fig. 6 as an example, its input is the original input data with each of its three dimensions reduced to 0.5 times; after the corresponding segmentation result is output, the result is up-sampled by a factor of 2 and fused with the results of the other two branches, and finally the segmentation map of the liver tumor is output.
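The three-branch forward pass can be sketched as follows; the branch networks are passed in as callables, nearest-neighbor resampling stands in for the network's sampling layers, and averaging is used as a simple stand-in for the fusion step, which the text does not specify in detail:

```python
import numpy as np

def resize_nn(vol, out_shape):
    """Nearest-neighbor resampling of a 3D volume to a fixed shape."""
    idx = [np.round(np.linspace(0, s - 1, o)).astype(int)
           for s, o in zip(vol.shape, out_shape)]
    return vol[np.ix_(*idx)]

def mp_vnets_forward(roi, branch_nets, scales=(0.5, 1.0, 2.0)):
    """Run three parallel branches at 0.5x / 1x / 2x input scale,
    resample each probability map back to the input size (e.g. 2x
    upsampling for the 0.5x branch), and fuse the three maps."""
    probs = []
    for net, s in zip(branch_nets, scales):
        shp = tuple(max(1, int(round(d * s))) for d in roi.shape)
        out = net(resize_nn(roi, shp))           # per-voxel tumor probability
        probs.append(resize_nn(out, roi.shape))  # back to the input size
    return np.mean(probs, axis=0)                # fusion by averaging (assumed)
```

Each `net` here would be one S-PPM V-Net branch in the real model; any callable mapping a volume to a same-shaped probability map fits the sketch.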
The loss function L_MPV of the MP V-Nets network is defined as the sum of the three parallel branch loss functions, each of which is a cross-entropy loss. Let x_ij denote the input data of the i-th branch, N the total number of voxels in the data, x_G,ij the ground-truth label of the j-th voxel, and p(x_ij) the predicted probability vector of that voxel. The loss function is computed as:
L_MPV = L_0.5 + L_1 + L_2 (4)
L_i = −(1/N) Σ_{j=1..N} x_G,ij log p(x_ij) (5)
The loss function L_MPV computes the difference between the tumor segmentation network's predictions and the real labels; the parameters of each network layer are updated during back-propagation, and after many iterations the network gains the ability to distinguish whether a given voxel belongs to the foreground or the background, segmenting the tumor accurately from the image.
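Under these definitions, the total loss of Eq. (4) is just the sum of three per-branch cross-entropy terms. A numpy sketch, using the binary (two-term) form of cross-entropy as an illustrative reading:

```python
import numpy as np

def branch_cross_entropy(p, y, eps=1e-8):
    """Per-branch loss: mean voxel-wise binary cross-entropy between the
    predicted foreground probability p and the 0/1 label y."""
    return float(-np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps)))

def l_mpv(branch_preds, branch_labels):
    """Total loss: the sum of the three parallel branch losses."""
    return sum(branch_cross_entropy(p, y)
               for p, y in zip(branch_preds, branch_labels))
```

Since each branch sees the labels resampled to its own scale, `branch_labels` would hold three differently sized arrays in practice; the sketch only assumes each matches its prediction's shape.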
Example 3
The schemes of Examples 1 and 2 are further described below in conjunction with fig. 7 and Table 1:
the algorithm experiment code in the embodiment of the invention uses Python programming language, the algorithm frame is built in an efficient and flexible Pytrch deep learning frame, and because the scheme of the embodiment of the invention is based on a deep neural network, a GPU (graphics processing unit) processor needs to be configured in the experiment link, and the complex neural network calculation is completed by matching with a Cuda acceleration operation platform promoted by NVIDIA company and a GPU database cuDNN for the deep neural network.
Since the FRGAN liver segmentation network uses a semi-supervised training mode, the 100 preprocessed training-set volumes are randomly divided into 40 and 60 volumes, used respectively as the labeled and unlabeled data for training the discriminator. The Adam optimization algorithm is used during training; the initial learning rate of the generator is set to 1×10^-3, the initial learning rate of the discriminator to 1×10^-5, the batch size to 2, and the maximum number of training iterations to 20000. The segmented liver image region is used as the training data for the tumor segmentation stage; at the same time, the liver in the label is set to 0 and the tumor to 1.
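The 40/60 split and the label remapping for the tumor stage can be sketched as follows; the input label convention 0 = background, 1 = liver, 2 = tumor is an assumption inferred from context, not stated explicitly in the patent:

```python
import numpy as np

rng = np.random.default_rng(0)

# Randomly split the 100 preprocessed training volumes into 40 labeled
# and 60 unlabeled subsets for semi-supervised discriminator training.
indices = rng.permutation(100)
labeled_idx, unlabeled_idx = indices[:40], indices[40:]

def remap_for_tumor_stage(label_vol):
    # Assumed input labels: 0 = background, 1 = liver, 2 = tumor.
    # For the tumor stage, liver becomes background (0) and tumor becomes 1.
    return (label_vol == 2).astype(np.uint8)

lab = np.array([[0, 1, 2], [1, 2, 0]])
tumor_lab = remap_for_tumor_stage(lab)
```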
The three-channel parallel tumor segmentation model MP V-Nets is then trained. The Adam optimization algorithm is used during training, with the batch size set to 2, the initial learning rate to 1×10^-3, the gradient coefficient to 0.9, the squared-gradient coefficient to 0.999, and the weight decay coefficient to 1×10^-8. The maximum number of training epochs is set to 200; training is stopped when obvious overfitting occurs, and all weight parameters of the network are saved.
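For reference, one Adam update with the hyper-parameters listed above (learning rate 1×10^-3, gradient coefficient β1 = 0.9, squared-gradient coefficient β2 = 0.999, weight decay 1×10^-8) looks as follows; this is a generic textbook Adam step in NumPy, not code from the patent:

```python
import numpy as np

def adam_step(w, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999,
              weight_decay=1e-8, eps=1e-8):
    # L2-style weight decay folded into the gradient (as in classic Adam).
    grad = grad + weight_decay * w
    # Exponential moving averages of the gradient and its square.
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    # Bias correction for the early steps (t counts from 1).
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    # Parameter update.
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

w = np.array([1.0])
m = np.zeros_like(w)
v = np.zeros_like(w)
w, m, v = adam_step(w, np.array([0.5]), m, v, t=1)
```

On the very first step the bias-corrected update is approximately lr times the sign of the gradient, so the parameter moves by about 1×10^-3.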
The LiTS2017 dataset is used, containing 131 samples in total: 100 for the training set and 31 for the test set. The evaluation indices are DICE (Dice coefficient), ACC (accuracy), VOE (volumetric overlap error), RVD (relative volume difference), SP (specificity), and SE (sensitivity). The Dice coefficient is a similarity measure commonly used in medical image segmentation; the larger its value, the closer the output result is to the ground truth.
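The six evaluation indices can be computed from binary masks using their standard definitions, sketched here in NumPy (not code from the patent):

```python
import numpy as np

def segmentation_metrics(pred, gt):
    # pred, gt: binary masks (1 = tumor voxel, 0 = background).
    pred, gt = pred.astype(bool), gt.astype(bool)
    tp = np.sum(pred & gt)
    tn = np.sum(~pred & ~gt)
    fp = np.sum(pred & ~gt)
    fn = np.sum(~pred & gt)
    dice = 2 * tp / (2 * tp + fp + fn)        # Dice coefficient
    acc = (tp + tn) / (tp + tn + fp + fn)     # accuracy
    voe = 1 - tp / (tp + fp + fn)             # volumetric overlap error
    rvd = (pred.sum() - gt.sum()) / gt.sum()  # relative volume difference
    sp = tn / (tn + fp)                       # specificity
    se = tp / (tp + fn)                       # sensitivity
    return dict(DICE=dice, ACC=acc, VOE=voe, RVD=rvd, SP=sp, SE=se)

gt = np.array([0, 0, 1, 1, 1, 0])
m = segmentation_metrics(gt, gt)  # perfect prediction
```

A perfect prediction gives DICE, ACC, SP, and SE of 1 and VOE and RVD of 0, which is why a larger Dice (and smaller VOE/RVD) indicates a result closer to the ground truth.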
Five different advanced 3D segmentation algorithms are selected for comparison experiments; the test results of each method are shown in Table 1. The five models are DenseVoxNet, Rubik's cube++, 3D U-Net, V-Net, and Vox2Vox, corresponding to [a] to [e] in the table.
TABLE 1 liver tumor segmentation index comparison
[Table 1 — comparison of liver tumor segmentation indices (DICE, ACC, VOE, RVD, SP, SE) for MP V-Nets and the five comparison models; the table is rendered as an image in the original document]
As can be seen from the above table:
(1) Considering all indices comprehensively, the three-channel parallel liver tumor segmentation model MP V-Nets proposed in the embodiment of the invention performs better and is generally superior to the other methods, which proves that the coarse-to-fine segmentation framework is an excellent choice for accurate liver tumor segmentation.
(2) Compared with Rubik's cube++, MP V-Nets is slightly inferior in relative volume difference but performs better on the DICE coefficient, ACC, VOE, and SE, with an improvement of 3.79% on the most important index, DICE, indicating that the model's segmentation results are closer to the ground truth.
(3) Compared with V-Net, after the spatial shape-aware module and the pyramid pooling module are integrated, the DICE index improves and VOE and RVD decrease markedly. This shows that the spatial shape-aware module and the improved pyramid pooling module activate more spatial information in the feature maps, fully mine the high-level semantic information of the image, and retain more image edge information, proving the accuracy and superiority of the scheme's liver tumor segmentation.
To observe and compare the performance of the models more intuitively, their segmentation effects on several hard-to-segment cases are compared and the segmentation results of the different models are visualized, as shown in fig. 7. Five specific cases of high segmentation difficulty from the experimental tests are selected for display. It can clearly be seen that, compared with the liver, tumors are widely distributed, numerous, and varied in shape and size, which increases the segmentation difficulty. The first four columns in the figure are the original data, the liver region, the gold standard, and the segmentation result of this device; the next five columns are the segmentation results of the five comparison models. The comparison shows that when tumors are small and densely distributed, the other models easily merge several small tumors into a single region, whereas this model accurately excludes the irrelevant areas between tumors; when the grey-level contrast between a tumor and the surrounding tissue is small, this model obtains a tumor boundary very close to the gold standard, while the other models struggle to segment the tumor edge accurately, resulting in under-segmentation.
In the embodiment of the present invention, except where a device model is specifically described, the models of the other devices are not limited, as long as a device can perform the above functions.
Those skilled in the art will appreciate that the drawings are only schematic illustrations of preferred embodiments, and the above-mentioned serial numbers of the embodiments of the present invention are only for description and do not represent the merits of the embodiments.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims (6)

1. An apparatus for segmenting medical images based on a generative adversarial network and multi-scale feature fusion, the apparatus comprising: a coarse-to-fine liver tumor segmentation framework, wherein
the input to the segmentation framework is an original 3D CT image, which is preprocessed to obtain a standardized image; the liver is segmented using the discriminator of a trained feature-map-synthesis-based generative adversarial network, and a prediction probability map is output;
the value of each pixel in the probability map represents the probability that the pixel belongs to the liver, and the generative adversarial network learns more information through adversarial learning between the generator and the discriminator;
automatic extraction of the liver ROI: the liver 3D segmentation result is point-wise multiplied with the standardized image to mask out other, irrelevant organs; the minimum bounding cuboid of the liver region is computed and cropped; and livers of different sizes are resampled to the same size;
the liver ROI is taken as input, and a trained three-channel cascade network based on an improved V-Net is used to fuse multi-scale features, enlarge the receptive field, and handle the differences in position, shape, and size of the target regions and the blurred boundaries of lesion regions in different data, finally yielding the tumor segmentation result.
2. The medical image segmentation apparatus based on a generative adversarial network and multi-scale feature fusion as claimed in claim 1, wherein
the discriminator is a V-Net network integrated with a pyramid pooling module, and the generator is a neural network based on feature-map synthesis; the generator is trained using the feature maps output by the spatial pyramid pooling and learns the distribution of CT images from unlabeled data to generate pseudo image data.
3. The medical image segmentation apparatus based on a generative adversarial network and multi-scale feature fusion according to claim 1 or 2, wherein
the discriminator performs semi-supervised learning using the divided labeled data, the unlabeled data, and the pseudo data generated by the generator; the discriminator network and the generator network learn adversarially against each other until the discriminator and the generator reach a dynamic balance, at which point training is finished.
4. The medical image segmentation apparatus based on a generative adversarial network and multi-scale feature fusion as claimed in claim 2, wherein the V-Net network based on the pyramid pooling module is as follows:
an improved pyramid pooling module, together with a spatial shape-aware module, is attached in the skip connections of the V-Net network to capture long-distance dependencies between different lesion areas in the image and to capture local context dependencies;
for the input tensor, three mutually perpendicular slab-shaped kernels are used for processing to obtain three outputs, which are expanded to the same size as the input tensor and fused to obtain a new feature vector.
5. The medical image segmentation apparatus based on a generative adversarial network and multi-scale feature fusion as claimed in claim 4, wherein the fusion process is a dot-product operation followed by a Softmax activation function, and the fused tensor and the original input tensor undergo the same fusion operation to obtain the final output tensor;
the input feature maps are pooled to three scales, 1×1×1, 2×2×2, and 3×3×3, using three-dimensional pooling layers of three scales; the number of channels of the three pooled results is reduced by convolution; the results are then up-sampled to the size of the original feature map and fused with the original feature map and the output of the spatial shape-aware module; and the number of channels is reduced by convolution again to obtain a feature map containing multi-scale information.
6. The medical image segmentation apparatus based on a generative adversarial network and multi-scale feature fusion as claimed in claim 4, wherein the three-channel cascade network based on the improved V-Net is a multi-scale segmentation network, in which
the input of the first branch is the original input data with each of its three dimensions reduced to 0.5 times; after the corresponding segmentation result is output, it is up-sampled by a factor of 2 and fused with the results of the other two branches to output the segmentation map of the liver tumor.
CN202210361443.8A 2022-04-07 2022-04-07 Medical image segmentation device based on generation of confrontation network and multi-scale feature fusion Pending CN114693933A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210361443.8A CN114693933A (en) 2022-04-07 2022-04-07 Medical image segmentation device based on generation of confrontation network and multi-scale feature fusion

Publications (1)

Publication Number Publication Date
CN114693933A true CN114693933A (en) 2022-07-01

Family

ID=82142910

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210361443.8A Pending CN114693933A (en) 2022-04-07 2022-04-07 Medical image segmentation device based on generation of confrontation network and multi-scale feature fusion

Country Status (1)

Country Link
CN (1) CN114693933A (en)

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115393301A (en) * 2022-08-16 2022-11-25 中山大学附属第一医院 Method and device for image omics analysis of liver two-dimensional shear wave elastic image
CN115393301B (en) * 2022-08-16 2024-03-12 中山大学附属第一医院 Image histology analysis method and device for liver two-dimensional shear wave elastic image
CN115359881A (en) * 2022-10-19 2022-11-18 成都理工大学 Nasopharyngeal carcinoma tumor automatic delineation method based on deep learning
CN115953418A (en) * 2023-02-01 2023-04-11 公安部第一研究所 Method, storage medium and equipment for stripping notebook region in security check CT three-dimensional image
CN115953418B (en) * 2023-02-01 2023-11-07 公安部第一研究所 Notebook area stripping method, storage medium and device in security inspection CT three-dimensional image
CN116206109A (en) * 2023-02-21 2023-06-02 桂林电子科技大学 Liver tumor segmentation method based on cascade network
CN116206109B (en) * 2023-02-21 2023-11-07 桂林电子科技大学 Liver tumor segmentation method based on cascade network
CN116580133A (en) * 2023-07-14 2023-08-11 北京大学 Image synthesis method, device, electronic equipment and storage medium
CN116580133B (en) * 2023-07-14 2023-09-22 北京大学 Image synthesis method, device, electronic equipment and storage medium
CN117636076A (en) * 2024-01-25 2024-03-01 北京航空航天大学 Prostate MRI image classification method based on deep learning image model
CN117636076B (en) * 2024-01-25 2024-04-12 北京航空航天大学 Prostate MRI image classification method based on deep learning image model


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination