WO2023065070A1

WO2023065070A1 - Multi-domain medical image segmentation method based on domain adaptation

Info

Publication number: WO2023065070A1
Application number: PCT/CN2021/124414
Authority: WO
Inventors: 乐美琰; 秦文健; 谢耀钦
Original assignee: 中国科学院深圳先进技术研究院
Priority date: 2021-10-18
Filing date: 2021-10-18
Publication date: 2023-04-27

Abstract

A multi-domain medical image segmentation method based on domain adaptation. The method comprises: training a variational autoencoder by using a set loss function as a target to extract latent space codes of different domains and style vectors of corresponding domains, wherein the variational autoencoder comprises an encoder and a decoder; for an image to be processed, using the variational autoencoder to infer domain information and extract domain style vectors, and subtracting style vectors of corresponding domains from latent space codes of the image to obtain de-stylized latent space codes; inputting the de-stylized latent space codes into the decoder to reconstruct an image having a unified style; and inputting the image having a unified style into a trained segmentation network to obtain a segmentation result. According to the method, domain gaps between multi-domain data are eliminated, the data volume for training the segmentation network is increased, and the image segmentation precision is improved.

Description

A Multi-Domain Medical Image Segmentation Method Based on Domain Adaptation

Technical field

The present invention relates to the technical field of medical image processing, and more specifically, to a multi-domain medical image segmentation method based on domain adaptation.

Background

Clinically, the precise segmentation of medical images is of great significance. On the one hand, volume calculation and shape evaluation of the segmented structures can assist clinicians in diagnosing the health status of relevant parts or diagnosing the type of disease. On the other hand, analyzing the dose distribution information within the target area is an essential step in radiotherapy planning. In addition, the calculation of features such as the gray distribution in the target area is also helpful for efficacy analysis and prognosis. Currently, clinical image segmentation is usually done manually, which is time-consuming and labor-intensive.

In recent years, the development of artificial intelligence technology has brought new opportunities for medical image segmentation, but it requires collecting large amounts of medical image data to train machine learning models. Because of privacy protection, medical imaging data is difficult to collect. For a given body part, it is easier to collect a small amount of data from each of several hospitals. However, since the instruments and image acquisition standards of each hospital differ, such multi-source data can differ greatly even within the same modality. In addition, because doctors are accustomed to observing and comparing images of multiple modalities at the same time, and tend to delineate the target area in the modality where it is seen most clearly, there are significant differences between labeled data even when the data is collected in the same hospital. For these reasons medical images span multiple domains, with only a small amount of data in each domain. This data distribution poses a great challenge to the training of a single image segmentation network.

In the prior art, methods for segmenting target regions using multi-domain medical images fall mainly into three categories: transfer learning, style transfer, and feature mapping to a common space. In transfer learning, a segmentation model is pre-trained on a domain with a large amount of data and then fine-tuned with a small amount of labeled data from each of the other domains, yielding a segmentation model suited to each domain. The disadvantage of this method is that a domain with a large amount of data must be found for pre-training; in practice the sample size of medical images collected in each domain is small, which leads to poor performance of the pre-trained model and in turn degrades the model after transfer.

Style transfer converts the style of medical images by means of Generative Adversarial Networks (GANs). Specifically, a domain with a large amount of data is taken as the source domain and a segmentation network is trained on it; the images of the other domains, which serve as target domains, are then converted into the style of the source domain by a GAN and finally segmented with the source-domain segmentation network. This approach also requires a large number of medical image samples. Moreover, a separate GAN must be trained between each target domain and the source domain, so the number of parameters grows linearly with the number of domains and consumes considerable resources when the number of domains is large.

Feature mapping to a common space shortens the distance between the feature spaces of different domains by applying an adversarial loss or a distribution constraint to the features extracted from each domain. This yields a common feature space, and the features in that space are then decoded in a unified way to obtain the segmentation result.

Current schemes for mapping features to a common space usually use a VAE (Variational Autoencoder) to extract features. To map the images of every domain into the same feature space, the KL divergence is often used to constrain the distribution of the VAE coding space, for example by constraining the coding-space distribution of all domains to a standard Gaussian distribution. When all domains share one VAE, the mean absolute error loss in the VAE still allows features in the common space to be decoded back into images that carry the domain style. This indicates that the feature distribution of each domain deviates from the standard Gaussian distribution and that the direction of the deviation differs considerably between domains, so an inter-domain gap remains when image segmentation is performed. Training a separate VAE for each domain, on the other hand, consumes a large amount of resources.

In summary, precise segmentation of medical images can help doctors diagnose and treat the related diseases more effectively. However, current medical images are often scattered across multiple sources or modalities, with only a small amount of data in each domain. If the images of multiple domains are simply mixed together to train one image segmentation model for all domains, the function the model must learn becomes very complicated, so training is prone to underfitting or to overfitting to certain domains.

Summary of the invention

The purpose of the present invention is to overcome the above defects of the prior art by providing a multi-domain medical image segmentation method based on domain adaptation. The method comprises the following steps:

Step S1: Train a variational autoencoder with the set loss function as the target to extract latent space codes of different domains and style vectors of corresponding domains, wherein the variational autoencoder includes an encoder and a decoder;

Step S2: For the image to be processed, use the variational autoencoder to infer the domain information and extract the domain style vector, subtract the style vector of the corresponding domain from the latent space code of the image, and obtain the destylized latent space code;

Step S3: Input the de-stylized latent space code into the decoder to reconstruct an image with a unified style;

Step S4: Input the image of the unified style into the trained segmentation network to obtain a segmentation result.
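The inference path of steps S2 to S4 can be sketched as follows. This is a minimal illustration, not the patented implementation: the `encode`, `decode` and `segment` callables are hypothetical stand-ins for the trained networks of step S1 and of the segmentation stage, and images and latent codes are represented as plain lists of floats for brevity.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def segment_unknown_domain_image(
    image: Vector,
    encode: Callable[[Vector], Vector],      # hypothetical trained VAE encoder
    decode: Callable[[Vector], Vector],      # hypothetical trained VAE decoder
    segment: Callable[[Vector], List[int]],  # hypothetical trained segmentation network
    style_vectors: Sequence[Vector],         # one style vector per domain
) -> List[int]:
    """Sketch of steps S2 to S4 for one image whose domain is unknown."""
    latent = encode(image)
    # S2: infer the domain from the latent code. Softmax is monotonic,
    # so the argmax of the softmax equals the argmax of the raw code.
    domain = max(range(len(latent)), key=lambda k: latent[k])
    # S2: subtract the style vector of the inferred domain.
    destylized = [z - s for z, s in zip(latent, style_vectors[domain])]
    # S3: decode the de-stylized code into a unified-style image.
    unified = decode(destylized)
    # S4: segment the unified-style image.
    return segment(unified)
```

Passing the networks in as callables keeps the sketch independent of any particular deep learning framework; in practice each would be a trained model.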

Compared with the prior art, the advantage of the present invention is that de-stylizing the latent space code eliminates the domain gap between multi-domain data and yields images with a uniform style for training the segmentation network. Compared with training directly on a single domain, this method increases the amount of data available for training the segmentation network and can improve segmentation accuracy. In addition, the present invention in principle places no limit on the number of domains, and increasing the number of domains does not noticeably increase the number of network parameters.

Other features of the present invention and advantages thereof will become apparent from the following detailed description of exemplary embodiments of the present invention with reference to the accompanying drawings.

Brief description of the drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention.

Fig. 1 is a schematic diagram of a multi-domain medical image segmentation process based on domain adaptation according to an embodiment of the present invention;

Fig. 2 is a flowchart of a multi-domain medical image segmentation method based on domain adaptation according to an embodiment of the present invention.

Detailed description

Various exemplary embodiments of the present invention will now be described in detail with reference to the accompanying drawings. It should be noted that the relative arrangements of components and steps, numerical expressions and numerical values set forth in these embodiments do not limit the scope of the present invention unless specifically stated otherwise.

The following description of at least one exemplary embodiment is merely illustrative in nature and in no way taken as limiting the invention, its application or uses.

Techniques, methods and devices known to those of ordinary skill in the relevant art may not be discussed in detail, but where appropriate, such techniques, methods and devices should be considered part of the description.

In all examples shown and discussed herein, any specific values should be construed as exemplary only, and not as limitations. Therefore, other instances of the exemplary embodiment may have different values.

It should be noted that like numerals and letters denote like items in the following figures, therefore, once an item is defined in one figure, it does not require further discussion in subsequent figures.

The present invention aims to extract latent space encoding features of different domains by training a shared VAE to reduce resource occupation. The domain gap between multi-domain data is then eliminated by destylizing the latent space encoding. And when training VAE, the latent space encoding features of images in different domains are guided into non-overlapping distributions, thereby avoiding the directional bias problem that occurs when all domains are guided to a standard Gaussian distribution. Further, after destylizing the encoding of the latent space, all domains will be mapped to a common feature space, and then a unified style image will be decoded to train the subsequent segmentation network. In this way, images from all domains are used to train the segmentation network, which is equivalent to enriching the amount of training data.

Referring to Fig. 1, for clarity the method is described in terms of functional modules, which generally include a VAE module, a de-stylization module and a segmentation module. Suppose there are N domains in total, and let M_i denote an image of the i-th domain. The remaining quantities shown in Fig. 1 are: the latent space features extracted from the i-th domain image; the reconstructed image of the j-th domain obtained by transforming those features and decoding them; the unified-style image decoded after the feature vector of the i-th domain has been de-stylized (that is, after the domain style vector has been subtracted); and the segmentation result of the i-th domain image, whose gold standard is denoted Y_i.

Specifically, as shown in Fig. 1 and Fig. 2, the provided multi-domain medical image segmentation method based on domain adaptation includes the following steps.

Step S210, training a variational autoencoder to extract latent space coding features for multi-domain images.

The variational autoencoder includes an encoder and a decoder. The encoder takes multi-domain images as input and extracts latent space encoding features, and the decoder reconstructs images from those features. In other words, the variational autoencoder encodes an image into a feature representation vector that contains the information of the original image, so that the original image can be recovered by decoding.

In one embodiment, the loss function for training the variational autoencoder combines three terms: a KL-divergence term D_KL, a pixel-wise reconstruction term and an adversarial term, with λ_1 and λ_2 denoting the weights of the corresponding terms.

D_KL denotes the loss formed by the KL divergence, which pulls two distributions as close together as possible. Here it is computed between the latent distribution of the i-th domain, N(μ_i, Σ_i), and a target Gaussian whose mean is the style vector of that domain and whose covariance is the N×N identity matrix. The style vector of the i-th domain is a vector of length N whose i-th component is 5 and whose other components are all 0, so that the latent space codes of samples from different domains are drawn toward different Gaussian distributions. Σ_i denotes the covariance matrix of the latent space code of an image of the i-th domain, the identity matrix coincides with the covariance matrix of the standard Gaussian distribution, x denotes a sample drawn from the Gaussian distribution N(μ, Σ), and μ_i denotes the mean of the latent space code of an image of the i-th domain.
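As a worked illustration of the KL term, the divergence between a Gaussian with diagonal covariance and a unit-covariance Gaussian centred on the domain style vector has a simple closed form. The sketch below assumes a diagonal covariance (a common VAE choice that the text does not state) and uses the hypothetical function names `style_vector` and `kl_to_style_gaussian`.

```python
import math
from typing import List


def style_vector(domain_index: int, num_domains: int, magnitude: float = 5.0) -> List[float]:
    """Target mean for one domain: `magnitude` at the domain's index, 0 elsewhere."""
    return [magnitude if k == domain_index else 0.0 for k in range(num_domains)]


def kl_to_style_gaussian(mu: List[float], var: List[float], domain_index: int) -> float:
    """Closed-form KL( N(mu, diag(var)) || N(style_vector, I) ).

    Assumes a diagonal covariance, given as the per-dimension variances `var`.
    """
    target = style_vector(domain_index, len(mu))
    return 0.5 * sum(
        v + (m - t) ** 2 - 1.0 - math.log(v)
        for m, v, t in zip(mu, var, target)
    )
```

The term is zero when the code already matches the target of its own domain and grows with the squared distance between the mean and the style vector, which is what pushes different domains toward separate Gaussians.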

The reconstruction term is the absolute error between the reconstructed image and the original image, computed pixel by pixel; its mean over all pixels can be used. This loss ensures that the latent space code can be decoded back into the original image, that is, that the extracted latent space code retains the structure and other information of the original image. Besides decoding back to an image of the same domain, the latent space code can also be transformed so that it decodes to an image of the j-th domain. In this way images of the N-1 other domains can be obtained; real images of the corresponding domain are then drawn from the data, and an adversarial loss between the two is computed to make the reconstructed images more realistic.
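Collecting the three terms described above, one plausible written-out form of the training objective is given below. The symbol names L_VAE, L_rec, L_adv and s_i (the style vector of the i-th domain) are assumed here for illustration, as is the placement of λ_1 on the reconstruction term and λ_2 on the adversarial term.

```latex
\begin{align}
\mathcal{L}_{\mathrm{VAE}} &= D_{KL} + \lambda_1 \mathcal{L}_{\mathrm{rec}} + \lambda_2 \mathcal{L}_{\mathrm{adv}} \\
D_{KL} &= \mathbb{E}_{x \sim \mathcal{N}(\mu_i, \Sigma_i)}
  \left[ \log \mathcal{N}(x;\, \mu_i, \Sigma_i) - \log \mathcal{N}(x;\, s_i, I) \right]
\end{align}
```

The second line is simply the definition of the KL divergence between the encoder distribution N(μ_i, Σ_i) and the target distribution N(s_i, I).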

In one embodiment, the reconstruction loss and the adversarial loss are defined as follows. W denotes the width of the image and H its height; the reconstruction loss compares the value of image M_i at position (x, y) with the value of the reconstructed image at the same position. In the adversarial loss, D denotes the discriminator, and the two data distributions involved are the data distribution of the i-th image domain and the data distribution of images of the j-th domain generated from images of the i-th domain.
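The two terms can be sketched as follows. The reconstruction loss follows the description directly (mean absolute error over W×H pixels). For the adversarial loss, the standard logarithmic GAN objective is assumed, since the exact form is not recoverable from the text; the function names and the `eps` stabiliser are illustrative.

```python
import math
from typing import List

Image = List[List[float]]  # rows of pixel values, indexed as image[y][x]


def reconstruction_loss(image: Image, reconstruction: Image) -> float:
    """Mean absolute error between two images of identical size."""
    height, width = len(image), len(image[0])
    total = sum(
        abs(image[y][x] - reconstruction[y][x])
        for y in range(height)
        for x in range(width)
    )
    return total / (width * height)


def adversarial_loss(d_real: List[float], d_fake: List[float], eps: float = 1e-12) -> float:
    """Assumed standard GAN value: E[log D(real)] + E[log(1 - D(fake))].

    `d_real` and `d_fake` are discriminator outputs in (0, 1) for real images
    and for images generated by decoding a transformed latent code.
    """
    real_term = sum(math.log(p + eps) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p + eps) for p in d_fake) / len(d_fake)
    return real_term + fake_term
```

Under this assumed form the discriminator maximises the value while the encoder and decoder minimise it, which drives generated images toward the real data distribution.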

In this step, the essence of the VAE is to use the encoder to learn the style of the image (the latent space code) while preserving its structural information, and then to decode the structure together with a selected style, obtaining reconstructed images that share the same structure but differ in style. During training, the style encoding vector is preferably drawn toward the preset one-hot encoding, so that the image style is easier to control and eliminate in the later segmentation stage.

Step S220, extracting the image style of each domain, subtracting the style vector of the domain from the latent space code of the image to obtain a de-stylized latent space code, and reconstructing an image with a unified style.

After the VAE module has been trained, the style of an image can be extracted and controlled. To unify the image style across domains, the domain of the image is first identified, and the style vector of that domain is then subtracted from the latent space code μ_i of the image, so that the latent space codes of images in all domains approach (0, 0, …, 0); the result is the de-stylized latent space code. Feeding the de-stylized latent space code into the decoder reconstructs a unified-style image, which is subsequently used to train the unified segmentation network.
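The subtraction can be illustrated with a short sketch for the training phase, where the domain of each image is known. The function names are hypothetical, and the style vector follows the earlier definition (value 5 at the domain's index, 0 elsewhere).

```python
from typing import List


def style_vector(domain_index: int, num_domains: int, magnitude: float = 5.0) -> List[float]:
    """Style vector of a domain: `magnitude` at its index, 0 elsewhere."""
    return [magnitude if k == domain_index else 0.0 for k in range(num_domains)]


def destylize(latent_mean: List[float], domain_index: int, magnitude: float = 5.0) -> List[float]:
    """Subtract the known domain's style vector from the latent mean."""
    style = style_vector(domain_index, len(latent_mean), magnitude)
    return [m - s for m, s in zip(latent_mean, style)]
```

For example, with three domains, a code such as `[4.9, 0.2, -0.1]` from domain 0 and a code such as `[0.1, 5.3, 0.0]` from domain 1 both end up close to the origin after de-stylization, which is what places all domains in a common feature space.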

It should be noted that the domain of each image is known only in the training phase, not in the testing phase. During testing (or when an image to be processed is actually segmented), the image arrives with an unknown domain. It is therefore first fed into the VAE encoder, its domain is inferred from the resulting latent space code as in formula (5), and the code is finally de-stylized using the style vector of the inferred domain.

domain* = argmax(softmax(μ_i))    (5)
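Formula (5) can be sketched in a few lines. The helper names are illustrative; the softmax is written in its numerically stable form, which does not change the result.

```python
import math
from typing import List


def softmax(values: List[float]) -> List[float]:
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def infer_domain(latent_mean: List[float]) -> int:
    """Index of the most probable domain according to formula (5)."""
    probs = softmax(latent_mean)
    return max(range(len(probs)), key=lambda k: probs[k])
```

Because each domain's style vector has its large component at a distinct index, the largest component of the latent mean identifies the domain, provided training has drawn the codes close to their targets.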

Step S230, using images of a unified style to train a segmentation network.

In one embodiment, the image segmentation network uses the U-Net framework, and the update of the network parameters is supervised by two losses, Dice and cross-entropy. The total segmentation loss L_Seg is expressed as:

L_Seg = L_Dice + λ_3 L_CE    (6)

Here L_Dice and L_CE are the Dice loss and the cross-entropy loss, Y denotes the gold standard of the segmentation task, N_c denotes the number of classes to be segmented, x and y denote spatial coordinates, and λ_3 is a preset weight parameter. Both losses compare Y with the prediction of the segmentation network, which gives, for each position (x, y), the probability that the position belongs to the c-th class.
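A minimal sketch of formula (6) follows. The soft Dice and cross-entropy definitions used here are the common textbook forms and are assumptions, as are the function names and the `eps` stabilisers; predictions and one-hot labels are given as `[class][pixel]` lists with the spatial positions flattened.

```python
import math
from typing import List

Maps = List[List[float]]  # indexed as maps[class][pixel]


def dice_loss(probs: Maps, onehot: Maps, eps: float = 1e-6) -> float:
    """Soft Dice loss averaged over the N_c classes (assumed form)."""
    per_class = []
    for p_c, y_c in zip(probs, onehot):
        intersection = sum(p * y for p, y in zip(p_c, y_c))
        denominator = sum(p_c) + sum(y_c)
        per_class.append(1.0 - (2.0 * intersection + eps) / (denominator + eps))
    return sum(per_class) / len(per_class)


def cross_entropy_loss(probs: Maps, onehot: Maps, eps: float = 1e-12) -> float:
    """Pixel-averaged cross-entropy between one-hot labels and probabilities."""
    num_pixels = len(probs[0])
    total = -sum(
        y * math.log(p + eps)
        for p_c, y_c in zip(probs, onehot)
        for p, y in zip(p_c, y_c)
    )
    return total / num_pixels


def segmentation_loss(probs: Maps, onehot: Maps, lambda_3: float = 1.0) -> float:
    """Formula (6): L_Seg = L_Dice + lambda_3 * L_CE."""
    return dice_loss(probs, onehot) + lambda_3 * cross_entropy_loss(probs, onehot)
```

The Dice term measures region overlap and is less sensitive to class imbalance, while the cross-entropy term gives a per-pixel gradient; λ_3 balances the two.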

In this step, images of all domains can be used to train the segmentation network, making full use of the data and increasing the training sample size of the segmentation network.

It should be noted that, without departing from the spirit and scope of the present invention, those skilled in the art may make appropriate changes or modifications to the above embodiments. For example, other loss functions, such as a likelihood loss or an exponential loss, may be used to train the variational autoencoder or the image segmentation network.

In summary, the present invention provides a multi-domain medical image segmentation method based on domain adaptation that maps the samples of all domains to a common feature space, improves the utilization of labeled multi-domain medical images, and makes the trained unified segmentation network more robust. It proposes a VAE-based method for encoding domain style in which the style encoding vector is drawn toward a preset encoding value, so that the image style is easier to control in the later segmentation stage and easier to eliminate. It further proposes a style removal method in which the defined domain style vector is subtracted from the latent space vector produced by the VAE encoder to obtain a de-stylized latent space vector. Compared with directly constraining the distribution of the latent feature vectors to a standard Gaussian distribution, the domain deviations obtained in this way do not show strong directional differences, which provides a more efficient way of mapping images of different domains to a common feature space.

The present invention can be a system, method and/or computer program product. A computer program product may include a computer readable storage medium having computer readable program instructions thereon for causing a processor to implement various aspects of the present invention.

A computer readable storage medium may be a tangible device that can retain and store instructions for use by an instruction execution device. A computer readable storage medium may be, for example, but is not limited to, an electrical storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of computer-readable storage media include: portable computer diskettes, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), static random access memory (SRAM), compact disc read-only memory (CD-ROM), digital versatile discs (DVD), memory sticks, floppy disks, mechanically encoded devices such as punch cards or raised structures in a groove having instructions stored thereon, and any suitable combination of the above. As used herein, computer-readable storage media are not to be construed as transient signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through waveguides or other transmission media (e.g., pulses of light through fiber optic cables), or electrical signals transmitted through wires.

Computer-readable program instructions described herein may be downloaded from a computer-readable storage medium to a respective computing/processing device, or downloaded to an external computer or external storage device over a network, such as the Internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, fiber optic transmission, wireless transmission, routers, firewalls, switches, gateway computers, and/or edge servers. A network adapter card or a network interface in each computing/processing device receives computer-readable program instructions from the network and forwards the computer-readable program instructions for storage in a computer-readable storage medium in each computing/processing device.

Computer program instructions for carrying out operations of the present invention may be assembly instructions, instruction set architecture (ISA) instructions, machine instructions, machine-related instructions, microcode, firmware instructions, state setting data, or source code or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk, C++ and Python, and conventional procedural programming languages such as the "C" language or similar programming languages. Computer readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In cases involving a remote computer, the remote computer can be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it can be connected to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, an electronic circuit, such as a programmable logic circuit, field programmable gate array (FPGA), or programmable logic array (PLA), can be customized by utilizing state information of the computer-readable program instructions, and the electronic circuit can execute the computer-readable program instructions to implement various aspects of the invention.

Aspects of the present invention are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It should be understood that each block of the flowcharts and/or block diagrams, and combinations of blocks in the flowcharts and/or block diagrams, can be implemented by computer-readable program instructions.

These computer-readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, when executed by the processor of the computer or other programmable data processing apparatus, produce an apparatus for implementing the functions/actions specified in one or more blocks in the flowchart and/or block diagram. These computer-readable program instructions can also be stored in a computer-readable storage medium, where they cause computers, programmable data processing devices and/or other devices to work in a specific way, so that the computer-readable medium storing the instructions constitutes an article of manufacture comprising instructions for implementing various aspects of the functions/actions specified in one or more blocks in the flowcharts and/or block diagrams.

It is also possible to load computer-readable program instructions into a computer, other programmable data processing device, or other equipment, so that a series of operational steps are performed on the computer, other programmable data processing device, or other equipment to produce a computer-implemented process, so that instructions executed on computers, other programmable data processing devices, or other devices implement the functions/actions specified in one or more blocks in the flowcharts and/or block diagrams.

The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in a flowchart or block diagram may represent a module, a program segment, or a portion of instructions comprising one or more executable instructions. In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks in succession may, in fact, be executed substantially concurrently, or they may sometimes be executed in the reverse order, depending upon the functionality involved. It should also be noted that each block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations, can be implemented by a dedicated hardware-based system that performs the specified function or action, or may be implemented by a combination of dedicated hardware and computer instructions. It is well known to those skilled in the art that implementation by means of hardware, implementation by means of software, and implementation by a combination of software and hardware are all equivalent.

Various embodiments of the present invention have been described above. The foregoing description is exemplary, not exhaustive, and is not limited to the disclosed embodiments. Many modifications and alterations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein is chosen to best explain the principles of each embodiment, its practical application or its technical improvement over technology in the market, or to enable others of ordinary skill in the art to understand each embodiment disclosed herein. The scope of the invention is defined by the appended claims.

Claims

1. A multi-domain medical image segmentation method based on domain adaptation, comprising the following steps:

Step S1: Train a variational autoencoder with the set loss function as the target to extract latent space codes of different domains and style vectors of corresponding domains, wherein the variational autoencoder includes an encoder and a decoder;

Step S2: For the image to be processed, use the variational autoencoder to infer the domain information and extract the domain style vector, subtract the style vector of the corresponding domain from the latent space code of the image, and obtain the destylized latent space code;

Step S3: Input the de-stylized latent space code into the decoder to reconstruct an image with a unified style;

Step S4: Input the image of the unified style into the trained segmentation network to obtain a segmentation result.
2. The method according to claim 1, wherein the loss function for training the variational autoencoder combines: a loss D_KL constituted by the KL divergence, computed using a vector of length N and an identity matrix of dimension N×N, where Σ_i denotes the covariance matrix of the latent space code of an image of the i-th domain and μ_i denotes the mean of the latent space code of an image of the i-th domain; a term given by the absolute error between the reconstructed image and the original image, calculated pixel by pixel; and an adversarial loss between images drawn from the corresponding domain of the real data and images reconstructed by transforming the latent space code so that it decodes to an image of the j-th domain; where λ_1 and λ_2 denote the weights of the corresponding terms, and x denotes a sample drawn from the Gaussian distribution N(μ, Σ).
3. The method according to claim 2, characterized in that, in the reconstruction loss and the adversarial loss, W denotes the width of the image, H denotes the height of the image, the reconstruction loss compares the value of image M_i at position (x, y) with the value of the reconstructed image at position (x, y), D denotes the discriminator, and the adversarial loss is taken over the data distribution of the i-th image domain and the data distribution of images of the j-th domain generated from images of the i-th domain.
4. The method according to claim 1, characterized in that in step S2, the domain information is inferred according to the following formula:

domain* = argmax(softmax(μ_i))

where μ_i denotes the latent space code.
5. The method according to claim 1, wherein the loss function for training the segmentation network is set to:

L_Seg = L_Dice + λ_3 L_CE

where L_Seg denotes the total segmentation loss, L_Dice denotes the Dice loss, L_CE denotes the cross-entropy loss, and λ_3 denotes the weight coefficient.
6. The method according to claim 5, characterized in that the Dice loss and the cross-entropy loss are each computed from the gold standard Y of the segmentation task and the prediction result of the segmentation network, where N_c denotes the number of classes to be segmented, x and y denote spatial coordinates, and the prediction gives the probability that position (x, y) belongs to the c-th class.
7. The method according to claim 1, characterized in that, in the process of training the variational autoencoder, the style encoding vector is drawn toward the preset one-hot encoding.
8. The method according to claim 1, wherein the segmentation network adopts a U-Net framework and is trained using images of multiple domains.
9. A computer-readable storage medium on which a computer program is stored, wherein the steps of the method according to any one of claims 1 to 8 are implemented when the program is executed by a processor.
10. A computer device comprising a memory and a processor, the memory storing a computer program executable on the processor, wherein the steps of the method according to any one of claims 1 to 8 are implemented when the processor executes the program.