CN115439486A

CN115439486A - Semi-supervised organ tissue image segmentation method and system based on dual-countermeasure network

Info

Publication number: CN115439486A
Application number: CN202210629202.7A
Authority: CN
Inventors: 雷涛; 张栋; 尚佳童; 杜晓刚; 王营博; 丁丽萍
Original assignee: Shaanxi University of Science and Technology
Current assignee: Shaanxi University of Science and Technology
Priority date: 2022-05-27
Filing date: 2022-05-27
Publication date: 2022-12-06

Abstract

The invention discloses a semi-supervised organ tissue image segmentation method and system based on a dual adversarial (dual-countermeasure) network, and belongs to the field of medical organ tissue image processing. Adversarial training with two discriminator networks promotes mutual learning between the segmentation network and the discriminator networks and improves the segmentation network's ability to transfer knowledge from labeled data to unlabeled data. A bidirectional attention component based on dynamic convolution prevents overfitting, reduces error accumulation in the mean teacher framework, and improves the generalization capability of the segmentation network, thereby improving segmentation performance. The method has a smaller memory footprint and a faster inference speed. By jointly learning from a small amount of accurately labeled data and a large amount of unlabeled data, it reduces the network's dependence on pixel-level labeled data, so that organs and other tissues of the human body can be accurately located and segmented.

Description

Semi-supervised organ tissue image segmentation method and system based on dual-countermeasure network

Technical Field

The invention belongs to the field of medical organ tissue image processing and relates to a semi-supervised organ tissue image segmentation method based on a dual adversarial network. It is particularly suitable for semi-supervised medical image segmentation scenarios with a small amount of accurately annotated data and a large amount of unlabeled data.

Background

Encoder-decoder convolutional neural networks based on supervised learning, such as U-Net, U-Net++, and DenseUNet, have achieved remarkable results in medical image semantic segmentation and have effectively advanced the field. However, the success of these techniques relies heavily on large amounts of pixel-level labeled data. In abdominal CT image segmentation, accurately labeled data are scarce, and CT scans suffer from high noise and low contrast, which makes labeling difficult. In addition, labeling medical images requires substantial medical expertise, so the labeling cost is high. Semi-supervised learning is a learning paradigm that addresses incomplete data labeling within weakly supervised learning. It trains jointly on a small amount of labeled data and a large amount of unlabeled data, which better matches actual clinical scenarios and is of significant research value. To make better use of unlabeled data, the mainstream semi-supervised image segmentation methods based on perturbation consistency regularization have achieved great success. Other approaches, such as self-training, generative adversarial networks, contrastive learning, and co-training, have also been applied to semi-supervised image segmentation.

Among consistency regularization methods, the Mean Teacher (MT) framework is one of the mainstream approaches. The MT framework first performs supervised learning on the labeled data. It then uses a teacher model to provide pseudo-labels for the unlabeled data, and applies different regularization schemes so that the teacher and student models output similar predictions for unlabeled data under different perturbations. Finally, the student model is updated through feedback from the supervised loss and the consistency loss. The teacher model is an exponential moving average (EMA) of the student model's weights and continuously accumulates the network's historical predictions on the unlabeled data. Mean-teacher-based methods are very effective, but two problems remain. (1) Mainstream semi-supervised medical image segmentation networks directly reuse segmentation networks designed for supervised learning. Because the different models in the segmentation network share the same weight parameters, training on a small data set easily leads to overfitting and to poor-quality pseudo-labels for unlabeled data, so the parameters of the teacher and student models become highly coupled and erroneous information accumulates. (2) Mainstream semi-supervised medical image segmentation methods do not make good use of the prior data distribution relationship between labeled and unlabeled data, so the network learns the features of unlabeled data inefficiently and the model generalizes poorly.

Disclosure of Invention

Organ tissue image segmentation models based on deep learning depend heavily on large amounts of pixel-level labeled data, but labeling organ tissue images is extremely difficult, requires professional knowledge, and is time-consuming and labor-intensive. The main purpose of the invention is therefore to provide a semi-supervised organ tissue image segmentation method and system based on a dual adversarial network. Adversarial training with two discriminator networks promotes mutual learning between the segmentation network and the discriminator networks. The two discriminator networks respectively learn the prediction consistency of the segmentation network between labeled and unlabeled data and the prediction consistency of the same data under different perturbations, which improves the segmentation network's ability to transfer knowledge from labeled data to unlabeled data. In addition, to prevent overfitting and improve the utilization of unlabeled data, the invention provides a bidirectional attention component based on dynamic convolution, which dynamically adjusts the network parameters according to the structural prior information of each sample. The component prevents overfitting, reduces error accumulation in the mean teacher, and improves the generalization capability of the segmentation network, thereby improving the performance of the image segmentation network.

Compared with existing semi-supervised segmentation networks, the method obtains finer image segmentation results with a smaller memory footprint and a faster inference speed. By jointly learning from a small amount of accurately labeled data and a large amount of unlabeled data, it reduces the network's dependence on pixel-level labeled data and improves knowledge transfer from labeled data to unlabeled data, so that organs and other tissues of the human body are accurately located and segmented.

In addition, the invention designs a deep-learning-based segmentation system for human organs and skin lesion tissues, which assists artificial intelligence in human abdominal organ and tissue image segmentation, tissue marker localization, image reconstruction, risk assessment, lesion detection, and auxiliary training.

The auxiliary training includes supporting the routine training of medical staff on the basis of liver image segmentation data, which helps improve the precision and efficiency of abdominal organ image and tissue lesion analysis.

To achieve the above purpose, the invention adopts the following technical scheme:

The invention discloses a semi-supervised organ and tissue image segmentation method based on a dual adversarial network, which comprises the following steps:

Step one: the original data are preprocessed according to the imaging principle of the images, and the contrast of the organ and tissue images is enhanced to obtain a clearer organ and tissue image data set.

The image preprocessing includes window level and window width adjustment, image resampling, noise addition, random rotation, and random flipping.

The window level and window width adjustment in step one is implemented as follows:

Step 1.1: appropriate window level and window width parameters are selected for the organ data according to the segmentation task, enhancing the contrast of the target organ.

Because medical image data have low contrast, are heavily affected by noise, and vary in image size, the portions of the data that fall outside the selected range are clipped to reduce interference from other clutter and obtain clear images of organs and tissues. Setting an appropriate window level and window width enhances the contrast of the organ region. Resampling resolves the inconsistent image resolution and prepares the data for training the segmentation model.

The window level and window width adjustment in step 1.1 relies on the fact that different pixel value ranges in the image correspond to different abdominal organ tissues. To enhance the target organ region, the data are normalized to the range $[m, n]$, yielding a preprocessed image $F$. The image is first converted to Hounsfield unit (HU) values:

$$HU = Pixel \times R_s + R_i$$

where $HU$ represents the density of different tissues, $Pixel$ is the pixel value, $R_s$ is the conversion coefficient, and $R_i$ is the conversion bias. After the image is converted to Hounsfield values, the maximum value $H_{max}$ and the minimum value $H_{min}$ of the tissue density are calculated to adjust the window level and window width:

$$H_{max} = (2 \times wc + ww)/2.0 + 0.5$$

$$H_{min} = (2 \times wc - ww)/2.0 + 0.5$$

where $ww$ (window width) is the window width and $wc$ (window center) is the window level. The resulting HU values are normalized to a fixed value range $[m, n]$ to obtain the preprocessed image $F$, with the formula:
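The windowing and normalization of step 1.1 can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name, the default values (taken from Example 1: conversion coefficient 1, conversion bias -1024, window level 100, window width 400, target range [0, 255]) and the linear min-max mapping onto $[m, n]$ are assumptions, since the normalization formula itself is not reproduced in this text.

```python
import numpy as np

def window_and_normalize(pixels, rs=1.0, ri=-1024.0, wc=100.0, ww=400.0, m=0.0, n=255.0):
    """Convert raw pixel values to HU, clip to the window, and rescale to [m, n]."""
    hu = pixels.astype(np.float64) * rs + ri      # HU = Pixel * Rs + Ri
    h_low = (2.0 * wc - ww) / 2.0 + 0.5           # lower window bound
    h_high = (2.0 * wc + ww) / 2.0 + 0.5          # upper window bound
    hu = np.clip(hu, h_low, h_high)               # cut off values outside the window
    # Assumed linear min-max mapping of [h_low, h_high] onto [m, n].
    return (hu - h_low) / (h_high - h_low) * (n - m) + m

# Hypothetical raw pixel values, chosen only to exercise both clipping ends.
raw = np.array([[0, 924, 1124], [1324, 1424, 4000]])
F = window_and_normalize(raw)
```

Values below the lower bound map to `m` and values above the upper bound map to `n`, so the contrast budget of the output range is spent entirely on the selected tissue window.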

Step 1.2: data augmentation is performed on the data of different tissues.

The purpose of the data augmentation in step 1.2 is, on the one hand, to increase the diversity of the data and the generalization performance of the model and, on the other hand, to prevent the network from overfitting and to facilitate model training and convergence.
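A minimal sketch of the augmentation operations named in step one (noise addition, random rotation, random flipping) is given below. The text does not specify rotation angles, noise type, or probabilities, so the 90-degree rotation steps, the additive Gaussian noise, the flip probability of 0.5, and the function name `augment` are all assumptions made for illustration. The geometric transforms are applied identically to image and label so the pair stays aligned.

```python
import numpy as np

def augment(image, label, rng, noise_std=0.01):
    """Apply the same random rotation/flip to image and label; add noise to the image only."""
    k = int(rng.integers(0, 4))                     # rotate by k * 90 degrees (assumed angle set)
    image, label = np.rot90(image, k), np.rot90(label, k)
    if rng.random() < 0.5:                          # random horizontal flip
        image, label = np.fliplr(image), np.fliplr(label)
    # Additive Gaussian noise perturbs intensities but never the label.
    image = image + rng.normal(0.0, noise_std, size=image.shape)
    return image, label

rng = np.random.default_rng(0)
img = np.arange(16, dtype=np.float64).reshape(4, 4)
lab = (img > 7).astype(np.uint8)
aug_img, aug_lab = augment(img, lab, rng)
```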

Step two: using the data set preprocessed in step one, features are extracted with the designed dual adversarial convolutional neural network, the extracted features are decoded into a low-dimensional feature representation, and a semantic-level segmentation result is finally output. The invention provides a dual adversarial image segmentation model, DA-Net.

The DA-Net network is composed of two segmentation networks and two discriminator networks. The segmentation networks are a student model and a teacher model. The two models have the same encoder-decoder structure; the difference is that the student model is trained through the loss function, whereas the teacher model is an exponential moving average of the student model's weights. Each discriminator network consists of convolutional layers, dynamic bidirectional attention components, and global average pooling. DA-Net still uses the mean teacher as its basic framework.

For the discriminator networks, an adversarial consistency training strategy is adopted, in which two discriminators of the same structure serve different goals. The first discriminator learns the consistency of the segmentation network's prediction quality between unlabeled data and labeled data. The second discriminator learns the prediction consistency of the teacher and student networks on the same data under different perturbations. Unlike other adversarial networks, the input to each discriminator network is the output of the segmentation network together with the original image. Taking the original image as a reference, the discriminator network learns the matching relationship between the segmentation result and the original image, and thereby measures the quality of the segmentation result.

In the segmentation network, the dynamic bidirectional attention component replaces the conventional standard convolution in all layers except the first. The component adaptively adjusts the convolution kernel parameters according to the input, which improves the feature representation capability of the network and reduces the risk of overfitting. It fully decouples the spatial and channel relationships, reduces the number of network parameters while improving feature representation, increases the inference speed, and makes the network easier to deploy on edge devices. In addition, the segmentation network and the discriminators are trained alternately, and the discriminators are not needed in the inference stage, so no additional computational overhead is incurred.

In step two, the organ tissue segmentation network based on the dual adversarial network is implemented in two parts: the generation of the dynamic bidirectional attention component and the adversarial consistency training strategy.

Step 2.1: generation of the dynamic bidirectional attention component. The component adaptively adjusts the convolution kernel parameters according to the input, which improves the feature representation capability of the network and reduces the risk of overfitting. It fully decouples the spatial and channel relationships, reduces the number of network parameters while improving feature representation, increases the inference speed, and makes the network easier to deploy on edge devices.

Given input data $x \in \mathbb{R}^{C \times H \times W}$, $C$ denotes the number of input channels and $H \times W$ denotes the resolution of the input feature map. To enhance the saliency of important spatial positions, the input feature map first passes through a simple spatial attention: it is reduced in dimension by a 1×1 convolution and normalized by a sigmoid activation function, and the resulting spatial attention weights are multiplied pixel by pixel with the input feature map to obtain the output feature map $x_1 \in \mathbb{R}^{C \times H \times W}$.

$x_1$ is first passed through global average pooling to obtain a feature map of size $C \times 1 \times 1$, which is then reduced in dimension by a 1×1 convolution and passed through a softmax activation function to obtain the coefficients $p \in \mathbb{R}^{N}$.

$N$ is the predefined number of convolution kernels, a hyperparameter set according to the specific task; it is set to $N = 4$ based on experimental verification. The coefficients $p$ are multiplied with the $N$ convolution kernels respectively and the weighted kernels are summed, so that only one convolution kernel is generated and used for the convolution operation. The resulting convolution kernel weight $\tilde{w}$ is expressed as:

$$\tilde{w} = \sum_{i=1}^{N} p_i w_i$$

where $p_i$ denotes the $i$-th coefficient of $p$, $0 \le p_i \le 1$, $\sum_{i=1}^{N} p_i = 1$, and $w_i$ is the weight of the $i$-th convolution kernel.
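The kernel generation described in step 2.1 can be sketched in NumPy as below. This is an illustrative sketch under stated assumptions: the 1×1 convolutions are modeled as plain linear maps without bias, the spatial branch reduces the channels to a single attention map, and all names (`spatial_attention`, `aggregate_kernel`, `w_sp`, `proj`) and tensor sizes are hypothetical rather than taken from the patent.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def spatial_attention(x, w_sp):
    """x: (C, H, W); w_sp: (C,) weights of a 1x1 conv reducing C channels to 1."""
    att = sigmoid(np.tensordot(w_sp, x, axes=1))   # (H, W) spatial attention weights
    return x * att                                 # broadcast over channels -> x_1

def aggregate_kernel(x1, proj, kernels):
    """x1: (C, H, W); proj: (N, C) 1x1 conv weights; kernels: (N, C_out, C_in, k, k)."""
    pooled = x1.mean(axis=(1, 2))                  # global average pooling -> (C,)
    p = softmax(proj @ pooled)                     # N coefficients, each in [0, 1], summing to 1
    w = np.tensordot(p, kernels, axes=1)           # sum_i p_i * w_i -> (C_out, C_in, k, k)
    return p, w

rng = np.random.default_rng(0)
C, H, W_, N = 8, 16, 16, 4                         # N = 4 as stated in the text
x = rng.normal(size=(C, H, W_))
x1 = spatial_attention(x, rng.normal(size=C))
kernels = rng.normal(size=(N, 6, C, 3, 3))
p, w = aggregate_kernel(x1, rng.normal(size=(N, C)), kernels)
```

Because the N kernels are collapsed into one before the convolution runs, the convolution itself costs the same as a standard convolution; only the small coefficient branch is added per sample.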

Meanwhile, the parameter count $Q$ of the standard dynamic convolution is:

$$Q = C_{in} \times N + N \times C_{in} \times C_{out} \times k \times k$$

where $k \times k$ is the size of the convolution kernel, and $C_{in}$ and $C_{out}$ are the numbers of channels of the input and output feature maps. This is more than $N$ times the parameter count of an ordinary convolution. To reduce the parameter count, the original parameter count is reduced to

Compared with the standard convolution and the standard dynamic convolution, the method greatly reduces the parameter count while effectively improving model performance.
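To make the "more than N times" statement concrete, the short sketch below evaluates the parameter count of an ordinary convolution and of the standard dynamic convolution using the formula for Q. The channel sizes (64 and 128) are arbitrary illustrative values, not figures from the patent, and the sketch does not model the reduced parameter count of the proposed component.

```python
def standard_conv_params(c_in, c_out, k):
    # Weights of an ordinary k x k convolution (bias ignored).
    return c_in * c_out * k * k

def dynamic_conv_params(c_in, c_out, k, n):
    # Q = C_in * N + N * C_in * C_out * k * k
    return c_in * n + n * c_in * c_out * k * k

q_std = standard_conv_params(64, 128, 3)
q_dyn = dynamic_conv_params(64, 128, 3, 4)
ratio = q_dyn / q_std   # slightly above N = 4 because of the C_in * N term
```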

Step 2.2: the adversarial consistency training strategy uses two discriminators of the same structure to serve different goals. The first discriminator learns the consistency of the segmentation network's prediction quality between unlabeled data and labeled data. The second discriminator learns the prediction consistency of the teacher and student networks on the same data under different perturbations. Unlike other adversarial networks, the input to each discriminator network is the output of the segmentation network together with the original image. Taking the original image as a reference, the discriminator network learns the matching relationship between the segmentation result and the original image, and thereby measures the quality of the segmentation result. In the segmentation network, the dynamic bidirectional attention component replaces the conventional standard convolution in all layers except the first.
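The way a discriminator input pairs the segmentation output with the original image can be sketched as a channel-wise concatenation. This is only an illustration: whether the segmentation output is converted to per-pixel probabilities before being joined with the image, and the channel order, are assumptions, and `discriminator_input` is a hypothetical helper name.

```python
import numpy as np

def discriminator_input(image, seg_logits):
    """image: (C_img, H, W); seg_logits: (K, H, W). Returns (K + C_img, H, W)."""
    z = seg_logits - seg_logits.max(axis=0, keepdims=True)
    probs = np.exp(z) / np.exp(z).sum(axis=0, keepdims=True)   # per-pixel class probabilities
    return np.concatenate([probs, image], axis=0)              # join along the channel axis

rng = np.random.default_rng(0)
image = rng.normal(size=(1, 8, 8))      # single-channel image, illustrative size
logits = rng.normal(size=(2, 8, 8))     # two-class segmentation output
d_in = discriminator_input(image, logits)
```

Feeding the image alongside the prediction lets the discriminator judge whether a mask matches the anatomy it was predicted from, rather than only whether the mask looks plausible in isolation.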

The adversarial consistency training framework adds two discriminator networks to the mean teacher. The two discriminator networks have the same structure but different functions: discriminator $D_1$ learns the difference in the quality of the network's outputs on labeled data and unlabeled data, and discriminator $D_2$ learns the difference between unlabeled data under perturbation and the same data without perturbation. Finally, the supervised loss $L_s$, the consistency loss $L_{semi}$, and the adversarial losses $L_{adv1}$ and $L_{adv2}$ encourage the student network to generate high-quality predictions on unlabeled data. The adversarial consistency training strategy of the DA-Net model is implemented as follows:

Adversarial learning is realized by alternate training. The segmentation network takes a medical image as input and outputs a segmentation prediction map. The output of the segmentation network and the input image are concatenated and fed into the discriminator network, whose output is a class score, where 0 indicates a poor-quality segmentation result and 1 indicates a good-quality one. During training, the segmentation network is encouraged to generate, for unlabeled data $x_u$, high-quality segmentation results whose score is close to 1. In summary, training drives the segmentation network to output high-quality segmentation results such that the discriminator network cannot distinguish whether its input comes from a label or from the segmentation network. The segmentation network objective function $L(\theta_s)$ is defined as:

The discriminator networks aim to distinguish the outputs of the segmentation network as well as possible. The objective functions of discriminators $D_1$ and $D_2$ are defined as:

where $L_s(\cdot)$ combines the multi-class cross-entropy loss and the Dice loss, $L_{semi}$ is the mean squared error, and $L_{adv1}$ and $L_{adv2}$ are multi-class cross-entropy losses; $x_i$ and $y_i$ are the labeled input data and the corresponding labels; $x_u$ and $x_{ema}$ are the unlabeled input data and the unlabeled input data with noise perturbation; the remaining terms are the segmentation results for the labeled data and the unlabeled data and the prediction of the teacher network; $\lambda$ is a weighting coefficient that follows a Gaussian ramp-up curve; and $i$ is the number of training iterations.
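The text states only that the weighting coefficient follows a Gaussian ramp-up curve in the iteration count. A minimal sketch is shown below, assuming the form $\lambda_{max} \cdot e^{-5(1 - i/I)^2}$ that is commonly used in mean teacher implementations; the constant 5, the ramp length `ramp_iters`, and the maximum weight `lam_max` are assumptions and are not taken from the patent.

```python
import math

def gaussian_rampup(i, ramp_iters, lam_max=1.0):
    """Weight rising smoothly from lam_max * exp(-5) at i = 0 to lam_max at i >= ramp_iters."""
    if ramp_iters <= 0:
        return lam_max
    t = min(max(i / ramp_iters, 0.0), 1.0)   # training progress clipped to [0, 1]
    return lam_max * math.exp(-5.0 * (1.0 - t) ** 2)

weights = [gaussian_rampup(i, 100) for i in (0, 50, 100, 200)]
```

Keeping the weight small early on limits the influence of the consistency term while the teacher's predictions on unlabeled data are still unreliable.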

The parameters of the teacher model are an EMA accumulation of the student model parameters, an approach that has proven effective in most methods, defined as:

$$\theta'_t = \alpha \theta'_{t-1} + (1 - \alpha)\theta_t$$

where $\theta'_t$ denotes the updated parameters of the teacher model, $\theta_t$ denotes the parameters of the student model, and $\alpha$ is the smoothing coefficient hyperparameter, which determines the dependency between the teacher model and the student model. Empirically, the best performance is obtained with $\alpha = 0.999$.
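The EMA update of the teacher parameters can be sketched as follows. Parameters are represented here as a plain dictionary of NumPy arrays, which is an illustrative assumption; in practice the same update would be applied to each parameter tensor of the network.

```python
import numpy as np

def ema_update(teacher, student, alpha=0.999):
    """theta'_t = alpha * theta'_{t-1} + (1 - alpha) * theta_t, per parameter tensor."""
    for name, theta in student.items():
        teacher[name] = alpha * teacher[name] + (1.0 - alpha) * theta
    return teacher

# Hypothetical parameter values, used only to show one update step.
student = {"w": np.array([1.0, 2.0]), "b": np.array([0.5])}
teacher = {"w": np.zeros(2), "b": np.zeros(1)}
teacher = ema_update(teacher, student, alpha=0.999)
```

With $\alpha = 0.999$ each step moves the teacher only 0.1% of the way toward the student, so the teacher behaves like a slowly varying average of recent student weights.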

The processed data are input into the trained model to obtain a preliminary segmentation result p.

Step three: building on steps one and two, joint learning from a small amount of accurately labeled data and a large amount of unlabeled data reduces the network's dependence on pixel-level labeled data and improves the segmentation network's ability to transfer knowledge from labeled data to unlabeled data, so that organs and other tissues of the human body are accurately located and segmented. The segmentation result p is then post-processed: holes are filled through morphological reconstruction to obtain a refined organ and tissue image segmentation result.
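The hole-filling post-processing can be illustrated with a small self-contained sketch. Instead of a library morphological-reconstruction routine, it uses a breadth-first flood fill of the background from the image border, which for a 2D binary mask with 4-connected background yields the same result: any background pixel not reachable from the border is treated as a hole. The function name and the toy mask are hypothetical.

```python
from collections import deque
import numpy as np

def fill_holes(mask):
    """Fill background regions of a 2D binary mask that are not connected to the border."""
    mask = mask.astype(bool)
    h, w = mask.shape
    reached = np.zeros_like(mask)            # background pixels reachable from the border
    queue = deque()
    for r in range(h):
        for c in range(w):
            on_border = r in (0, h - 1) or c in (0, w - 1)
            if on_border and not mask[r, c]:
                reached[r, c] = True
                queue.append((r, c))
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            rr, cc = r + dr, c + dc
            if 0 <= rr < h and 0 <= cc < w and not mask[rr, cc] and not reached[rr, cc]:
                reached[rr, cc] = True
                queue.append((rr, cc))
    return ~reached                          # foreground plus enclosed background

p = np.zeros((5, 5), dtype=np.uint8)
p[1:4, 1:4] = 1
p[2, 2] = 0                                  # a one-pixel hole inside the object
filled = fill_holes(p)
```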

Step four: the image segmentation results of organs and tissues are evaluated using the Dice coefficient and the average symmetric surface distance (ASD).

The performance of the segmentation result is evaluated using the Dice coefficient (DI), the Jaccard similarity index (JA), pixel accuracy (AC), sensitivity (SE), and specificity (SP), defined as follows:

$$DI = \frac{2TP}{2TP + FP + FN}, \quad JA = \frac{TP}{TP + FP + FN}, \quad AC = \frac{TP + TN}{TP + TN + FP + FN}$$

$$SE = \frac{TP}{TP + FN}, \quad SP = \frac{TN}{TN + FP}$$

where TP, TN, FP, and FN denote correctly predicted positives, correctly predicted negatives, incorrectly predicted positives, and incorrectly predicted negatives, respectively.

$$ASD(A, B) = \frac{\sum_{a \in S(A)} \min_{b \in S(B)} d(a, b) + \sum_{b \in S(B)} \min_{a \in S(A)} d(b, a)}{|S(A)| + |S(B)|}$$

where $A$ and $B$ represent the ground truth and the segmentation result, respectively, $S(A)$ and $S(B)$ represent the surface voxel sets of $A$ and $B$, and $d(\cdot)$ represents the Euclidean distance. The closer the Dice coefficient, Jaccard similarity index, pixel accuracy, sensitivity, and specificity are to 1, the closer the result is to the ground truth and the better the segmentation. The smaller the ASD, that is, the average surface distance, the closer the result is to the ground truth. Notably, the final organ segmentation results are evaluated on the synthesized 3D volume.
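The overlap metrics named above can be computed from the four confusion counts as in the sketch below. It uses the standard definitions of these metrics for binary masks; the function names and the toy masks are illustrative, ASD is not included, and the sketch assumes both classes occur so that no denominator is zero.

```python
import numpy as np

def confusion_counts(pred, truth):
    pred, truth = pred.astype(bool), truth.astype(bool)
    tp = int(np.sum(pred & truth))      # correctly predicted positives
    tn = int(np.sum(~pred & ~truth))    # correctly predicted negatives
    fp = int(np.sum(pred & ~truth))     # incorrectly predicted positives
    fn = int(np.sum(~pred & truth))     # incorrectly predicted negatives
    return tp, tn, fp, fn

def segmentation_metrics(pred, truth):
    tp, tn, fp, fn = confusion_counts(pred, truth)
    return {
        "DI": 2 * tp / (2 * tp + fp + fn),
        "JA": tp / (tp + fp + fn),
        "AC": (tp + tn) / (tp + tn + fp + fn),
        "SE": tp / (tp + fn),
        "SP": tn / (tn + fp),
    }

truth = np.array([[1, 1, 0, 0], [1, 1, 0, 0]])
pred = np.array([[1, 1, 1, 0], [1, 0, 0, 0]])
scores = segmentation_metrics(pred, truth)
```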

The method further comprises the following step: based on the organ and tissue image segmentation results evaluated in step four, a deep-learning-based segmentation system for human organs and skin lesion tissues is designed, which assists artificial intelligence in human abdominal organ and tissue image segmentation, tissue marker localization, image reconstruction, risk assessment, lesion detection, and auxiliary training.

The auxiliary training includes supporting the routine training of medical staff on the basis of liver image segmentation data, which helps improve the precision and efficiency of abdominal organ image and tissue lesion analysis.

The invention also discloses a semi-supervised organ tissue image segmentation system based on a dual adversarial network, which implements the above semi-supervised organ tissue image segmentation method. The system comprises an image preprocessing module, a network training module, a network test module, and a visualization module.

The image preprocessing module uniformly preprocesses the input images, mainly through data augmentation and resampling, which helps accelerate model convergence and reduce the risk of overfitting.

The network training module trains the model on the preprocessed data by combining the mean teacher training strategy with the adversarial consistency training strategy, using selected parameters such as the number of iterations, the learning rate, and the batch size.

The network test module runs prediction with the trained model and outputs the image segmentation prediction result.

The visualization module visualizes the loss curve of the training set, the accuracy curve of the test set, and the image segmentation results.

The organ tissue includes liver, pancreas, kidney, lung tissue, and skin tissue.

The invention relates to the technology of convolutional neural network, computer vision, pattern recognition and the like, and has the following beneficial effects:

1. The invention discloses a method and a system for semi-supervised organ tissue image segmentation based on a dual adversarial network. To prevent overfitting and improve the utilization of unlabeled data, the invention provides a bidirectional attention component based on dynamic convolution, which dynamically adjusts the network parameters according to the structural prior information of each sample. The component prevents overfitting, reduces error accumulation in the mean teacher, and improves the generalization capability of the segmentation network, thereby improving the performance of the image segmentation network. Compared with existing semi-supervised segmentation networks, the method obtains finer image segmentation results with a smaller memory footprint and a faster inference speed. By jointly learning from a small amount of accurately labeled data and a large amount of unlabeled data, it reduces the network's dependence on pixel-level labeled data and improves knowledge transfer from labeled data to unlabeled data, so that organs and other tissues of the human body are accurately located and segmented.

2. The invention discloses a method and a system for semi-supervised organ tissue image segmentation based on a dual adversarial network.

3. The invention discloses a method and a system for semi-supervised organ tissue image segmentation based on a dual adversarial network, which adopt an adversarial consistency training strategy in which two discriminators $D_1$ and $D_2$ respectively learn the prior relationship between unlabeled and labeled data and the prediction consistency of the same data under different perturbations. This makes effective use of unlabeled data and reduces the network's dependence on labeled data.

4. The invention discloses a method and a system for semi-supervised organ tissue image segmentation based on a dual adversarial network.

5. The invention discloses a method and a system for semi-supervised organ tissue image segmentation based on a dual adversarial network.

6. The invention discloses a method and a system for semi-supervised organ tissue image segmentation based on a dual adversarial network, which can assist artificial intelligence in human abdominal organ and tissue image segmentation, tissue marker localization, image reconstruction, risk assessment, lesion detection, and auxiliary training.

Drawings

FIG. 1 is a flow chart of the semi-supervised organ tissue image segmentation method based on a dual adversarial network.

FIG. 2 is a detailed structure diagram of the semi-supervised image segmentation network based on dual adversarial learning of the invention.

FIG. 3 is a detailed block diagram of the bidirectional attention component of the present invention based on dynamic convolution.

Fig. 4 is a visualization of the segmentation result on the LiTS liver segmentation dataset according to the present invention.

Fig. 5 is a visualization of the segmentation result on a skin lesion data set according to the invention.

Detailed Description

For better illustrating the objects and advantages of the present invention, the following description is provided in conjunction with the accompanying drawings and examples.

Example 1:

As shown in FIG. 1, the semi-supervised organ tissue image segmentation method based on a dual adversarial network disclosed in this embodiment comprises the following specific steps:

The window level and window width adjustment in step 1.1 relies on the fact that different pixel value ranges in the image correspond to different abdominal organ tissues. To enhance the target organ region, the data are normalized to the range [-200, 250], yielding a preprocessed image $F$. The image is first converted to Hounsfield unit (HU) values:

$$HU = Pixel \times 1 - 1024$$

After the image is converted to Hounsfield values, the maximum value $H_{max}$ and the minimum value $H_{min}$ of the tissue density are calculated to adjust the window level and window width:

$$H_{max} = (2 \times 100 + 400)/2.0 + 0.5$$

$$H_{min} = (2 \times 100 - 400)/2.0 + 0.5$$

The resulting HU values are normalized to a fixed value range [0, 255] to obtain the preprocessed image $F$, with the formula:

Step 1.2: data augmentation is performed on the data of different tissues.

Step two: using the data set preprocessed in step one, features are extracted with the designed dual adversarial convolutional neural network, the extracted features are decoded into a low-dimensional feature representation, and a semantic-level segmentation result is finally output. The invention provides a dual adversarial image segmentation model, DA-Net.

The DA-Net network is composed of two segmentation networks and two discriminator networks. The segmentation networks are a student model and a teacher model. The two models have the same encoder-decoder structure; the difference is that the student model is trained through the loss function, whereas the teacher model is an exponential moving average of the student model's weights. Each discriminator network consists of convolutional layers, dynamic bidirectional attention components, and global average pooling. DA-Net still uses the mean teacher as its basic framework.

For the discriminator networks, an adversarial consistency training strategy is adopted, in which two discriminators of the same structure serve different goals. The first discriminator learns the consistency of the segmentation network's prediction quality between unlabeled data and labeled data. The second discriminator learns the prediction consistency of the teacher and student networks on the same data under different perturbations. Unlike other adversarial networks, the input to each discriminator network is the output of the segmentation network together with the original image. Taking the original image as a reference, the discriminator network learns the matching relationship between the segmentation result and the original image, and thereby measures the quality of the segmentation result.

In the segmentation network, the dynamic bidirectional attention component replaces the conventional standard convolution in all layers except the first. The component adaptively adjusts the convolution kernel parameters according to the input, which improves the feature representation capability of the network and reduces the risk of overfitting. It fully decouples the spatial and channel relationships, reduces the number of network parameters while improving feature representation, increases the inference speed, and makes the network easier to deploy on edge devices. In addition, the segmentation network and the discriminators are trained alternately, and the discriminators are not needed in the inference stage, so no additional computational overhead is incurred.

Step 2.1: generating the bidirectional attention component. The dynamic bidirectional attention component adaptively adjusts the parameters of the convolution kernel according to the input, which improves the feature representation capability of the network and reduces the risk of overfitting. It fully decouples the spatial and channel relations, which reduces the number of network parameters while improving feature representation, increases the inference speed of the network, and makes deployment on edge devices easier.

As shown in FIG. 3, given input data $x \in \mathbb{R}^{C \times H \times W}$, C denotes the number of input channels and H × W denotes the resolution of the input feature map. To enhance the saliency of important spatial positions, the input feature map is first passed through a simple spatial attention. As shown in FIG. 3(a), the input feature map is reduced in dimension by a 1 × 1 convolution and normalized by a sigmoid activation function, and the resulting spatial attention weights are multiplied pixel by pixel with the input feature map to obtain the output feature map $x_1 \in \mathbb{R}^{C \times H \times W}$.

$x_1$ is first passed through global average pooling to obtain a feature map of size C × 1 × 1, which is then reduced in dimension by a 1 × 1 convolution and passed through a softmax activation function to obtain the coefficients $p \in \mathbb{R}^{N}$.

N is the number of predefined convolution kernels. It is a hyper-parameter set according to the specific task, and N = 4 was chosen through experimental verification. The coefficients p are multiplied with the N convolution kernels respectively and the weighted kernels are summed, so that only one convolution kernel is generated for the convolution operation. The weight $w$ of this convolution kernel is expressed as follows:

$$w = \sum_{i=1}^{N} p_i w_i$$

where $p_i$ denotes the i-th coefficient of p, $0 \le p_i \le 1$, $\sum_{i=1}^{N} p_i = 1$, and $w_i$ is the weight of the i-th convolution kernel.
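The attention and kernel-aggregation steps described above can be sketched in plain Python. This is a minimal illustration, not the patented implementation: the function names, the nested-list tensor layout, and the assumption that the 1 × 1 convolution in the spatial attention reduces the input to a single channel are all hypothetical choices made for the example.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def softmax(logits):
    # Subtract the maximum for numerical stability.
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def spatial_attention(x, conv_w, conv_b=0.0):
    """x is a C x H x W nested list; conv_w holds the C weights of a
    1 x 1 convolution that maps C channels to one channel (assumption)."""
    C, H, W = len(x), len(x[0]), len(x[0][0])
    attn = [[sigmoid(sum(conv_w[c] * x[c][h][w] for c in range(C)) + conv_b)
             for w in range(W)] for h in range(H)]
    # Multiply the attention map pixel by pixel with every input channel.
    return [[[x[c][h][w] * attn[h][w] for w in range(W)]
             for h in range(H)] for c in range(C)]

def global_average_pool(x):
    # One average per channel, i.e. a C x 1 x 1 feature map stored as a list.
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in x]

def kernel_coefficients(pooled, fc_w):
    """fc_w is an N x C weight matrix of a 1 x 1 convolution mapping the
    pooled C values to N logits; softmax turns them into coefficients p."""
    logits = [sum(w * v for w, v in zip(row, pooled)) for row in fc_w]
    return softmax(logits)

def aggregate_kernels(p, kernels):
    # w = sum_i p_i * w_i, with each kernel stored as a flat weight list.
    size = len(kernels[0])
    return [sum(p[i] * kernels[i][j] for i in range(len(p))) for j in range(size)]

# Toy example: C = 2 channels, 2 x 2 resolution, N = 4 kernels of size 3 x 3.
x = [[[1.0, 2.0], [3.0, 4.0]], [[0.5, 0.5], [0.5, 0.5]]]
x1 = spatial_attention(x, conv_w=[0.1, -0.2])
pooled = global_average_pool(x1)
p = kernel_coefficients(pooled, fc_w=[[0.3, 0.1], [-0.2, 0.4], [0.0, 0.0], [0.5, -0.5]])
kernels = [[1.0] * 9, [2.0] * 9, [3.0] * 9, [4.0] * 9]
w = aggregate_kernels(p, kernels)
```

Because p comes from a softmax, the aggregated kernel is a convex combination of the N kernels, so only a single convolution has to be executed per input regardless of N.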

The parameter count Q of this dynamic convolution, for N = 4 kernels of size 3 × 3, is

$$Q = C_{in} \times 4 + 4 \times C_{in} \times C_{out} \times 3 \times 3$$

where $C_{in}$ and $C_{out}$ are the numbers of channels of the input and output feature maps. The parameter count is therefore more than 4 times that of an ordinary convolution. To reduce it, the original parameter count is reduced to
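As a quick arithmetic check of the parameter count Q stated above, the following sketch compares it with a standard 3 × 3 convolution. The channel sizes 64 and 128 are arbitrary example values, biases are ignored, and the function names are illustrative only.

```python
def dynamic_conv_params(c_in, c_out, n=4, k=3):
    # C_in * N weights for the coefficient branch plus N full k x k kernels.
    return c_in * n + n * c_in * c_out * k * k

def standard_conv_params(c_in, c_out, k=3):
    # A single k x k kernel bank, bias ignored.
    return c_in * c_out * k * k

c_in, c_out = 64, 128  # example channel counts (assumption)
q = dynamic_conv_params(c_in, c_out)
base = standard_conv_params(c_in, c_out)
ratio = q / base
print(q, base, round(ratio, 4))
```

The ratio is slightly above N because of the extra C_in × N term, which is why the text describes the count as more than 4 times that of an ordinary convolution.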

Step 2.2: with the adversarial consistency training strategy, two discriminators of the same structure are used to achieve different goals. The first discriminator learns whether the prediction quality of the segmentation network is consistent between unlabeled data and labeled data. The second discriminator learns whether the predictions of the teacher and student networks are consistent for the same data under different perturbations. Unlike in other adversarial networks, the input to each discriminator network is the output of the segmentation network together with the original image. Taking the original image as a reference, the discriminator network learns how well the segmentation result matches the original image and thereby measures the quality of the segmentation result. In the segmentation network, the bidirectional attention component replaces the conventional standard convolution in all layers except the first.

The adversarial consistency training framework adds two discriminator networks on top of the mean teacher. The two discriminators have the same structure but different functions: discriminator $D_1$ learns the difference in output quality of the segmentation network between labeled and unlabeled data, and discriminator $D_2$ learns the difference between the predictions for unlabeled data with and without perturbation. Finally, the supervised loss $L_s$, the consistency loss $L_{semi}$, and the adversarial losses $L_{adv1}$ and $L_{adv2}$ together encourage the student network to generate high-quality predictions for unlabeled data. The structure of the DA-Net model is shown in FIG. 2. The adversarial consistency training strategy is implemented as follows:

Adversarial learning is realized by alternating training. The segmentation network takes a medical image as input and outputs a segmentation prediction map. The output of the segmentation network and the input image are concatenated and fed into the discrimination network, whose output is a class score: 0 indicates a poor-quality segmentation result and 1 indicates a good-quality segmentation result. During training, the segmentation network is encouraged to generate, for unlabeled data $x_u$, high-quality segmentation results with a score close to 1. In summary, training aims for the segmentation network to output high-quality segmentation results such that the discrimination network cannot tell whether its input comes from a label or from the segmentation network. The segmentation network objective function $L(\theta_s)$ is defined as:

where $L_s(\cdot)$ is the multi-class cross-entropy loss plus the Dice loss, $L_{semi}$ is the mean squared error, and $L_{adv1}$ and $L_{adv2}$ are multi-class cross-entropy losses; $x_i$ and $y_i$ are the input data and the corresponding labels; $x_u$ and $x_{ema}$ are the input unlabeled data with noise perturbation; the predicted maps are, respectively, the segmentation results for labeled data and unlabeled data and the prediction result of the teacher network; λ is a weighting coefficient that follows a Gaussian ramp-up curve; and i is the number of training iterations.
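The text states only that λ follows a Gaussian rising curve over the training iterations; the exact expression is not reproduced here. The sketch below therefore assumes the ramp-up form commonly used in mean teacher implementations, exp(-5(1 - t)^2) with t the fraction of the ramp-up completed. The function name, the constant 5, and the `max_weight` and `rampup_length` parameters are all assumptions for illustration.

```python
import math

def gaussian_rampup(iteration, rampup_length, max_weight=1.0):
    """Weight that rises smoothly from max_weight * exp(-5) to max_weight.

    Assumed form: max_weight * exp(-5 * (1 - t)^2), t = iteration / rampup_length,
    clamped to [0, 1]. The source does not give the exact formula.
    """
    if rampup_length <= 0:
        return max_weight
    t = min(max(iteration / rampup_length, 0.0), 1.0)
    return max_weight * math.exp(-5.0 * (1.0 - t) ** 2)

# Weight at a few points of a 100-iteration ramp-up.
schedule = [gaussian_rampup(i, 100) for i in (0, 25, 50, 75, 100, 200)]
```

A schedule of this kind keeps the unsupervised terms small early in training, when the teacher predictions are still unreliable, and lets them reach full weight later.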

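The parameters of the teacher model are an exponential moving average (EMA) of the parameters of the student model:

$$\theta'_t = 0.999 \cdot \theta'_{t-1} + (1 - 0.999) \cdot \theta_t$$

where $\theta'_t$ are the updated parameters of the teacher model and $\theta_t$ are the weight parameters of the student model. The processed data are then input into the trained model to obtain a preliminary segmentation result p.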
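The teacher update above can be illustrated with a small sketch that treats each model as a dictionary of scalar parameters. The dictionary representation and the name `ema_update` are illustrative assumptions; in a real network the same rule would be applied to every weight tensor.

```python
def ema_update(teacher, student, alpha=0.999):
    """Return new teacher parameters: alpha * teacher + (1 - alpha) * student."""
    return {name: alpha * teacher[name] + (1.0 - alpha) * student[name]
            for name in teacher}

# Toy parameters: the teacher moves only 0.1% of the way toward the student per step.
teacher = {"w": 1.0, "b": 0.0}
student = {"w": 0.0, "b": 2.0}
teacher = ema_update(teacher, student)
```

With a smoothing coefficient of 0.999 the teacher changes slowly, so it averages the student over roughly the last thousand updates rather than following any single noisy step.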

Step three: post-process the segmentation result p, filling holes by morphological reconstruction, to obtain a refined organ and tissue image segmentation result.
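For a binary mask, hole filling can be sketched as follows. This example uses a flood fill from the image border with 4-connectivity, which for binary images gives the same result as the usual reconstruction-based hole filling; it is a simplified stand-in, not necessarily the routine used by the invention, and the function name is hypothetical.

```python
from collections import deque

def fill_holes(mask):
    """Fill background regions of a 2D binary mask that do not touch the border."""
    h, w = len(mask), len(mask[0])
    reached = [[False] * w for _ in range(h)]
    queue = deque()
    # Seed the flood fill with every background pixel on the image border.
    for r in range(h):
        for c in range(w):
            on_border = r in (0, h - 1) or c in (0, w - 1)
            if on_border and mask[r][c] == 0:
                reached[r][c] = True
                queue.append((r, c))
    # Spread through background pixels connected to the border.
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < h and 0 <= nc < w and not reached[nr][nc] and mask[nr][nc] == 0:
                reached[nr][nc] = True
                queue.append((nr, nc))
    # Background pixels never reached are enclosed holes and become foreground.
    return [[1 if mask[r][c] == 1 or not reached[r][c] else 0 for c in range(w)]
            for r in range(h)]

ring = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 0, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
filled = fill_holes(ring)
```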

In the experiment of the invention, two data sets, namely a liver image and a skin tissue lesion image, are adopted to verify the effectiveness of the invention.

Liver image: the invention selects the Liver Tumor Segmentation Challenge (LiTS) data set as the experimental data set. LiTS comprises 131 CT scans with labels and 70 scans whose labels are not public. To enhance liver contrast and remove interference, the intensity values of all CT images are clipped to the range [-200, 250] HU, and the size of each image is 512 × 512. For semi-supervised learning, 121 cases are randomly selected as the training set and the remaining 10 cases are used as the test set; random data augmentation such as flipping, mirroring, and rotation is applied to the training set. For comparison, 10% and 20% of the cases in the training set are randomly selected as labeled data, respectively, and the rest are used as unlabeled data.
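The intensity clipping described for the CT data amounts to clamping each value to [-200, 250] HU. A minimal sketch on a flat list of HU values follows; the function name and list representation are illustrative assumptions.

```python
def clip_hu(values, lower=-200.0, upper=250.0):
    """Clamp every HU value to the interval [lower, upper]."""
    return [min(max(v, lower), upper) for v in values]

# Air (-1000 HU) and dense bone (400 HU) are clamped; soft tissue is unchanged.
clipped = clip_hu([-1000, -50, 0, 120, 400])
```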

Skin tissue lesion image: the skin tissue image data set is from the 2018 International Skin Imaging Collaboration (ISIC) skin lesion segmentation challenge. The training set contains 2594 images and the validation set contains 100 images. The data set covers different types of skin lesions at different resolutions. To improve the computational efficiency of the different models, we resize all images to 256 × 192. As before, for semi-supervised learning we randomly select 10% (259 images) and 20% (519 images) of the training set as labeled data and use the rest as unlabeled data. In the training phase, online random data augmentation is performed.
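The labeled/unlabeled split can be sketched as a seeded random partition. The rounding rule and the function name are assumptions chosen so that the counts match the 259 and 519 labeled images quoted above for a 2594-image training set.

```python
import random

def split_labeled(items, labeled_fraction, seed=0):
    """Randomly partition items into (labeled, unlabeled) subsets."""
    rng = random.Random(seed)          # fixed seed for a reproducible split
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_labeled = round(len(shuffled) * labeled_fraction)
    return shuffled[:n_labeled], shuffled[n_labeled:]

image_ids = list(range(2594))
labeled_10, unlabeled_10 = split_labeled(image_ids, 0.10)
labeled_20, unlabeled_20 = split_labeled(image_ids, 0.20)
```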

All experiments were performed on one server with the following configuration: Intel(R) Xeon(R) Gold 6226R CPU @ 2.90 GHz, 40 GB RAM, NVIDIA GeForce RTX 3090 GPU, Ubuntu 18.04, and PyTorch 1.7. The invention uses Adam to optimize the segmentation model, with an initial learning rate of 1 × 10^-3. The discrimination network is optimized with stochastic gradient descent with a momentum of 0.9, an initial learning rate of 0.01, and a weight decay of 0.0001.

The invention highlights two contributions: the bidirectional attention component TAC-Dy and adversarial consistency learning. The semi-supervised mean teacher (MT) method is taken as the baseline, with U-Net as the backbone network. Ablation experiments were performed on the LiTS liver test set, with the training set divided into 10% labeled and 90% unlabeled data. The results in Table 1 demonstrate the effectiveness of the invention. The U-Net-based mean teacher method segments the liver with a Dice score of 92.39%; adding the designed discriminator $D_1$, discriminator $D_2$, and DyTAC improves on MT by 0.72%, 0.75%, and 0.97%, respectively. The bidirectional attention component effectively improves the feature representation capability of the network, discriminator $D_1$ effectively exploits the relation between unlabeled and labeled data, and discriminator $D_2$ effectively improves the generalization capability of the network.

TABLE 1 ablation experiments on the LiTS test set

To further verify the effectiveness of the proposed framework, the invention is compared with mainstream semi-supervised image segmentation methods. Tables 2 and 3 show the performance on the test data set of the supervised method U-Net, the semi-supervised methods DAN, Mean Teacher, UA-MT, TCSM_v2, and CPS, and the proposed DA-Net. With 10% and 20% labeled training data, the proposed method reaches Dice scores of 94.12% and 95.07% and average surface distances (ASD) of 3.51 mm and 3.04 mm, respectively, which is superior to the other methods.

TABLE 2 comparative experimental results of different methods on LiTS liver test set, 10% of labeled data

TABLE 3 comparative experimental results of different methods on LiTS liver test set, 20% of labeled data

Table 4 shows the results of comparative experiments under skin tissue data, and it can be seen that the method of the present invention has better generalization performance. In addition, fig. 4 shows visualization results of liver segmentation of different methods under 10% labeled data, where the green part represents ground truth, the red part represents segmentation results, and the yellow part represents the overlap of segmentation results with ground truth, so that fewer green and red parts, and more yellow parts represent better segmentation results. Fig. 5 shows the segmentation result of the skin tissue lesion under the condition of 20% of training data, and as can be seen from the visualization result, the segmentation result of the invention is higher in quality compared with other methods.

Table 4 comparative experiments on skin lesion data sets

Method	Labeled/unlabeled	DI	JA	SE	AC	SP
UNet++	2594/0	87.67	80.06	90.65	93.29	96.78
UNet++	259/0	82.57	73.55	88.31	91.00	93.76
DAN (MICCAI 2017)	259/2335	84.26	75.15	87.23	91.97	95.75
Mean Teacher (NIPS 2017)	259/2335	84.58	76.54	87.25	92.02	95.69
UA-MT (MICCAI 2019)	259/2335	84.80	78.02	88.63	91.94	95.82
TCSM_v2 (TNNLS 2020)	259/2335	84.71	75.55	90.22	91.92	95.77
CPS (CVPR 2021)	259/2335	84.72	76.81	86.87	91.87	95.42
DA-Net (ours)	259/2335	85.19	77.80	89.38	91.78	96.80
UNet++	519/0	84.36	75.64	88.83	92.15	94.95
DAN (MICCAI 2017)	519/2075	85.41	77.16	89.69	92.16	95.01
Mean Teacher (NIPS 2017)	519/2075	85.83	77.48	89.97	92.57	94.46
UA-MT (MICCAI 2019)	519/2075	86.19	78.06	91.15	92.64	94.49
TCSM_v2 (TNNLS 2020)	519/2075	86.16	77.98	91.07	92.56	94.26
CPS (CVPR 2021)	519/2075	86.34	78.17	90.57	92.72	94.78
DA-Net (ours)	519/2075	86.63	78.37	90.72	93.09	94.52

In addition, Table 5 shows the efficiency analysis of the different networks at the inference stage. Since the proposed discriminator networks are used only in the training phase, only the efficiency of the segmentation network is tested. Specifically, the invention replaces the standard convolutions of the segmentation network, except in the first layer, with the bidirectional attention component based on dynamic convolution. The input size used for the computational estimate is 1 × 256 × 256. The invention markedly improves the inference speed of the segmentation network and reduces its parameter count.

TABLE 5 comparison of computational efficiencies of different network models

Model	Operations (GFLOPs)	Parameters (M)	Storage usage (MB)
U-Net	65.39	34.52	131.82
DA-Net	9.26	5.18	21.11
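The reductions implied by Table 5 can be computed directly from its entries. The sketch below only divides the U-Net figures by the DA-Net figures; the dictionary layout and key names are illustrative.

```python
table5 = {
    "U-Net": {"gflops": 65.39, "params_m": 34.52, "storage_mb": 131.82},
    "DA-Net": {"gflops": 9.26, "params_m": 5.18, "storage_mb": 21.11},
}

def reduction_factors(baseline, model):
    """Ratio baseline / model for each reported quantity."""
    return {key: baseline[key] / model[key] for key in baseline}

factors = reduction_factors(table5["U-Net"], table5["DA-Net"])
for key, value in factors.items():
    print(f"{key}: {value:.2f}x")
```

On these figures DA-Net uses roughly 7 times fewer operations and between 6 and 7 times fewer parameters and less storage than U-Net.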

The visualization module visualizes the loss curve of the training set, the accuracy curve of the test set, and the image segmentation results.

The above detailed description is further intended to illustrate the objects, technical solutions and advantages of the present invention, and it should be understood that the above detailed description is only an example of the present invention and is not intended to limit the scope of the present invention, and any modifications, equivalents, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims

1. A semi-supervised organ and tissue image segmentation method based on a dual-countermeasure network, characterized by comprising the following steps:

step one: preprocessing the original data according to the imaging principle of the image, and enhancing the contrast of the organ and tissue image to obtain a clearer organ and tissue image data set;

step two: performing feature extraction on the data set preprocessed in step one by using the designed convolutional neural network based on dual adversarial training, decoding the extracted features into a low-dimensional feature representation, and outputting a semantic-level segmentation result; a dual-adversarial image segmentation model, DA-Net, is adopted; the DA-Net network consists of two segmentation networks and two discriminator networks; the segmentation networks are a student model and a teacher model; the student model and the teacher model have the same structure, both based on an encoder-decoder structure, the difference being that the student model is trained by a loss function while the teacher model is an exponential moving average of the weights of the student model; each discriminator network consists of convolutional layers, dynamic bidirectional attention components and global average pooling; DA-Net keeps the mean teacher as its basic framework; for the discriminator networks, an adversarial consistency training strategy is adopted, in which two discriminators of the same structure achieve different goals; the first discriminator learns the consistency of the prediction quality of the segmentation network between unlabeled data and labeled data; the second discriminator learns the consistency of the teacher and student network predictions for the same data under different perturbations; the input of each discriminator network is the output of the segmentation network together with the original image; taking the original image as a reference, the discriminator network learns the matching relation between the segmentation result and the original image and thereby measures the quality of the segmentation result; in the segmentation network, the bidirectional attention component replaces the conventional standard convolution; the dynamic bidirectional attention component adaptively adjusts the parameters of the convolution kernel according to the input, improves the feature representation capability of the network and reduces the risk of overfitting; the dynamic bidirectional attention component fully decouples the spatial and channel relations, reduces the number of network parameters while improving feature representation, increases the inference speed of the network, and is easier to deploy on edge devices; in addition, the segmentation network and the discriminators are trained alternately, and the discriminators are not needed in the inference stage, so additional computational overhead is avoided;

step three: performing joint learning on a small amount of data with accurate labels and a large amount of unmarked data based on the first step and the second step to reduce the dependence of a network on pixel-level labeled data and improve the knowledge migration capability of a segmentation network on labeled data to unmarked data, so as to perform accurate positioning and segmentation on organs and other tissues of a human body, obtain more precise organ and tissue image segmentation results, perform post-processing on the segmentation results p, and perform hole filling through morphological reconstruction to obtain precise organ and tissue image segmentation results;

step four: and evaluating the image segmentation result of the organ and the tissue by using the Dice coefficient and the average symmetric surface distance ASD, and realizing semi-supervised organ and tissue image segmentation based on a dual-countermeasure network.

2. The method for semi-supervised organ and tissue image segmentation based on the dual-countermeasure network as claimed in claim 1, wherein: in step one, the image preprocessing comprises window level and window width adjustment, image resampling, noise addition, random rotation and random flipping;

step 1.1: selecting appropriate window level and window width parameters according to different segmentation tasks for organ data, and enhancing the contrast of a target organ;

because the contrast in the medical image data is low, the image is seriously interfered by noise and the sizes of the images are inconsistent, in order to acquire clear organ and tissue images, other clutter interference is reduced by cutting off the part which exceeds the range in the data; the contrast of the organ area is enhanced by setting a proper window level and window width; solving the problem of inconsistent image resolution through resampling operation, and preparing for next training of the segmentation model;

the main principle of the window level and window width adjustment in step 1.1 is that different pixel value ranges in the image correspond to different abdominal organ tissues; to enhance the target organ region, the data are normalized to the range [m, n] to obtain the preprocessed image F; the image is first converted to Hounsfield unit (HU) values by the formula:

$$HU = Pixel \times Rs + Ri$$

where HU represents the density of different tissues, Pixel is the pixel value, Rs is the conversion coefficient (rescale slope), and Ri is the conversion offset (rescale intercept); after the image is converted to Hounsfield values, the maximum value $H_{max}$ and the minimum value $H_{min}$ of the tissue density are calculated to adjust the window level and window width, with the specific formulas:

$$H_{max} = (2 \times wc + ww)/2.0 + 0.5$$

$$H_{min} = (2 \times wc - ww)/2.0 + 0.5$$

where ww is the window width and wc (window center) is the window level; the obtained HU values are normalized to a fixed value range [m, n] to obtain the preprocessed image F, with the formula:
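The window level and window width adjustment can be sketched as below. The HU conversion follows the formula given in the claim; the window bounds take the lower bound as wc - ww/2 + 0.5 and the upper bound as wc + ww/2 + 0.5; and the final rescaling to [m, n] is assumed to be a linear min-max mapping, since the normalization formula itself is not reproduced in the text. All function and parameter names are illustrative.

```python
def to_hu(pixel, rescale_slope, rescale_intercept):
    # HU = Pixel * Rs + Ri
    return pixel * rescale_slope + rescale_intercept

def window_bounds(wc, ww):
    """Return (h_min, h_max) for window level wc and window width ww."""
    h_min = (2 * wc - ww) / 2.0 + 0.5
    h_max = (2 * wc + ww) / 2.0 + 0.5
    return h_min, h_max

def apply_window(hu, wc, ww, m=0.0, n=1.0):
    """Clip hu to the window and map it linearly to [m, n] (assumed form)."""
    h_min, h_max = window_bounds(wc, ww)
    clipped = min(max(hu, h_min), h_max)
    return m + (clipped - h_min) * (n - m) / (h_max - h_min)

# Example: a window centred at 25 HU with a width of 450 HU.
bounds = window_bounds(25, 450)
values = [apply_window(h, 25, 450) for h in (-1000.0, 25.5, 1000.0)]
```

Values below the window map to m and values above it map to n, so the full output range is spent on the tissue densities of interest.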

step 1.2: carrying out data enhancement on data of different organizations;

3. The method of claim 2, wherein the semi-supervised organ and tissue image segmentation based on the dual countermeasure network comprises: in the second step, the specific implementation of the organ tissue segmentation network based on the dual confrontation network is divided into two parts, namely a generation process of a dynamic bidirectional attention component and a confrontation consistency training strategy;

step 2.1: generating a bidirectional attention component, wherein the dynamic bidirectional attention component adaptively adjusts parameters of a convolution kernel according to different inputs, improves the characteristic representation capability of the network, and reduces the risk of overfitting; the dynamic bidirectional attention component fully decouples the relation between the space and the channel, reduces the parameter quantity of the network while improving the characteristic expression capability of the network, improves the reasoning speed of the network, and is easier to deploy in edge equipment;

given input data $x \in \mathbb{R}^{C \times H \times W}$, C denotes the number of input channels and H × W denotes the resolution of the input feature map; to enhance the saliency of important spatial positions, the input feature map is first passed through a simple spatial attention; the input feature map is reduced in dimension by a 1 × 1 convolution and normalized by a sigmoid activation function, and the resulting spatial attention weights are multiplied pixel by pixel with the input feature map to obtain the output feature map $x_1 \in \mathbb{R}^{C \times H \times W}$;

$x_1$ is first passed through global average pooling to obtain a feature map of size C × 1 × 1, which is then reduced in dimension by a 1 × 1 convolution and passed through a softmax activation function to obtain the coefficients $p \in \mathbb{R}^{N}$;

where N is the number of predefined convolution kernels, a hyper-parameter set according to the specific task, and N = 4 is chosen through experimental verification; the coefficients p are multiplied with the N convolution kernels respectively and the weighted kernels are summed, so that only one convolution kernel is generated for the convolution operation; the weight $w$ of this convolution kernel is expressed as follows:

$$w = \sum_{i=1}^{N} p_i w_i$$

where $p_i$ denotes the i-th coefficient of p, $0 \le p_i \le 1$, $\sum_{i=1}^{N} p_i = 1$, and $w_i$ is the weight of the i-th convolution kernel;

$$Q = C_{in} \times N + N \times C_{in} \times C_{out} \times k \times k$$

where Q is the parameter count of the dynamic convolution, k × k is the size of the convolution kernel, and $C_{in}$ and $C_{out}$ are the numbers of channels of the input and output feature maps; the parameter count is therefore more than N times that of an ordinary convolution, and in order to reduce it, the original parameter count is reduced to

Step 2.2: adopting an adversarial consistency training strategy, in which two discriminators of the same structure achieve different goals; the first discriminator learns the consistency of the prediction quality of the segmentation network between unlabeled data and labeled data; the second discriminator learns the consistency of the teacher and student network predictions for the same data under different perturbations; unlike in other adversarial networks, the input of each discriminator network is the output of the segmentation network together with the original image; taking the original image as a reference, the discriminator network learns the matching relation between the segmentation result and the original image and thereby measures the quality of the segmentation result; in the segmentation network, the bidirectional attention component replaces the conventional standard convolution;

the adversarial consistency training framework adds two discriminator networks on top of the mean teacher; the two discriminator networks have the same structure but different functions; discriminator $D_1$ learns the difference in output quality of the segmentation network between labeled data and unlabeled data; discriminator $D_2$ learns the difference between the predictions for unlabeled data with and without perturbation; finally, the supervised loss $L_s$, the consistency loss $L_{semi}$ and the adversarial losses $L_{adv1}$ and $L_{adv2}$ together encourage the student network to generate high-quality prediction results for unlabeled data; the adversarial consistency training strategy with the DA-Net model structure is implemented as follows:

adversarial learning is realized by alternating training; the segmentation network takes a medical image as input and outputs a segmentation prediction map; the output of the segmentation network and the input image are concatenated and fed into the discrimination network, whose output is a class score, where 0 indicates a poor-quality segmentation result and 1 indicates a good-quality segmentation result; during training, the segmentation network is encouraged to generate, for unlabeled data $x_u$, high-quality segmentation results with a score close to 1; in summary, training aims for the segmentation network to output high-quality segmentation results such that the discrimination network cannot tell whether its input comes from a label or from the segmentation network; the segmentation network objective function $L(\theta_s)$ is defined as:

the discrimination networks aim to distinguish the outputs of the segmentation network as well as possible, and the objective functions of discriminators $D_1$ and $D_2$ are defined as:

where $L_s(\cdot)$ is the multi-class cross-entropy loss plus the Dice loss, $L_{semi}$ is the mean squared error, and $L_{adv1}$ and $L_{adv2}$ are multi-class cross-entropy losses; $x_i$ and $y_i$ are the input data and the corresponding labels; $x_u$ and $x_{ema}$ are the input unlabeled data with noise perturbation; the predicted maps are, respectively, the segmentation results for labeled data and unlabeled data and the prediction result of the teacher network; λ is a weighting coefficient that follows a Gaussian ramp-up curve; and i is the number of training iterations;

the parameters of the teacher model are an exponential moving average (EMA) of the student model parameters, which has proven effective in most methods, and are defined as:

$$\theta'_t = \alpha \theta'_{t-1} + (1 - \alpha) \theta_t$$

where $\theta'_t$ are the updated parameters of the teacher model, $\theta_t$ are the weight parameters of the student model, and α is the smoothing coefficient hyper-parameter, which determines how strongly the teacher model depends on the student model;

4. The method of claim 3, wherein the semi-supervised organ and tissue image segmentation based on the dual countermeasure network comprises: the implementation method of the fourth step is that,

the performance of the segmentation result is evaluated by using the Dice coefficient (DI), Jaccard similarity index (JA), pixel accuracy (AC), sensitivity (SE), and specificity (SP), with the specific formulas:

$$DI = \frac{2TP}{2TP + FP + FN}, \quad JA = \frac{TP}{TP + FP + FN}, \quad AC = \frac{TP + TN}{TP + TN + FP + FN}, \quad SE = \frac{TP}{TP + FN}, \quad SP = \frac{TN}{TN + FP}$$

where TP, TN, FP and FN denote, respectively, correctly predicted positives, correctly predicted negatives, incorrectly predicted positives and incorrectly predicted negatives;

$$ASD = \frac{1}{|S(A)| + |S(B)|} \left( \sum_{a \in S(A)} \min_{b \in S(B)} d(a, b) + \sum_{b \in S(B)} \min_{a \in S(A)} d(b, a) \right)$$

where A and B denote the ground truth and the segmentation result, respectively; S(A) and S(B) denote the surface voxel sets of A and B, and d(·) denotes the Euclidean distance; the closer the Dice coefficient, Jaccard similarity index, pixel accuracy, sensitivity and specificity are to 1, the closer the segmentation result is to the ground truth and the better the segmentation; ASD denotes the average surface distance, and a smaller ASD means the segmentation result is closer to the ground truth; notably, the final organ segmentation results are evaluated in 3D after the slices are assembled into a volume.
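The overlap-based metrics named in this claim can be computed from the confusion counts of two binary masks. The sketch below uses the standard definitions of these metrics in terms of TP, TN, FP and FN; it works on flattened 0/1 lists, assumes both classes occur so that no denominator is zero, and does not cover ASD. Function names are illustrative.

```python
def confusion_counts(truth, pred):
    """Count TP, TN, FP, FN for two equally long lists of 0/1 labels."""
    tp = sum(1 for t, p in zip(truth, pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(truth, pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(truth, pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(truth, pred) if t == 1 and p == 0)
    return tp, tn, fp, fn

def segmentation_metrics(truth, pred):
    tp, tn, fp, fn = confusion_counts(truth, pred)
    return {
        "DI": 2 * tp / (2 * tp + fp + fn),      # Dice coefficient
        "JA": tp / (tp + fp + fn),              # Jaccard similarity index
        "AC": (tp + tn) / (tp + tn + fp + fn),  # pixel accuracy
        "SE": tp / (tp + fn),                   # sensitivity
        "SP": tn / (tn + fp),                   # specificity
    }

truth = [1, 1, 1, 0, 0, 0, 0, 0]
pred = [1, 1, 0, 1, 0, 0, 0, 0]
metrics = segmentation_metrics(truth, pred)
```

In this toy case TP = 2, TN = 4, FP = 1 and FN = 1, so the Dice coefficient is 4/6 while the Jaccard index is 2/4, illustrating that Dice is always at least as large as Jaccard for the same masks.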

5. The method of claim 4, wherein the semi-supervised organ and tissue image segmentation based on the dual countermeasure network further comprises: step five: according to the organ and tissue image segmentation results evaluated in step four, designing a deep-learning-based segmentation system for human organs and skin lesion tissue, which uses artificial intelligence to assist in segmenting human abdominal organs and tissues, marking and locating tissues, image reconstruction, risk assessment, lesion detection, and training.

6. The method of claim 5, wherein the semi-supervised organ and tissue image segmentation based on the dual countermeasure network comprises: the auxiliary training comprises the step of assisting the daily training of medical staff based on the liver image segmentation data, and the auxiliary support improves the analysis precision and efficiency of abdominal organ images and tissue lesions.

7. A double-countermeasure-network-based semi-supervised organ tissue image segmentation system for implementing the double-countermeasure-network-based semi-supervised organ tissue image segmentation method of claim 1, 2, 4, 5 or 6, wherein: the system comprises an image preprocessing module, a network training module, a testing module and a visualization module;

the image preprocessing module is used for uniformly preprocessing an input image, mainly comprises data enhancement and resampling of the image, and is beneficial to accelerating the convergence speed of the model and reducing the overfitting risk;

the network training module is used for training the preprocessed data by integrating an average teacher and an antagonistic consistency training strategy and selecting iteration times, a learning rate and batch number;

the network test module is used for predicting the trained model and outputting an image segmentation prediction result;