CN116630634B - Image processing method, device, equipment and storage medium


Info

Publication number: CN116630634B (grant); CN116630634A (application publication)
Application number: CN202310618973.0A
Authority: CN (China)
Original language: Chinese (zh)
Prior art keywords: sample image, image, noise, standard, denoising
Inventor: 孙安澜
Assignee (original and current): Zhejiang Yizhun Intelligent Technology Co., Ltd.
Application filed by Zhejiang Yizhun Intelligent Technology Co., Ltd.
Legal status: Active (granted)


Classifications

    • G06V 10/30: Image preprocessing; noise filtering
    • G06N 3/0455: Auto-encoder networks; encoder-decoder networks
    • G06N 3/0464: Convolutional networks [CNN, ConvNet]
    • G06N 3/09: Supervised learning
    • G06V 10/54: Extraction of image or video features relating to texture
    • G06V 10/764: Recognition or understanding using classification, e.g. of video objects
    • G06V 10/774: Generating sets of training patterns; bootstrap methods, e.g. bagging or boosting
    • G06V 10/806: Fusion of extracted features
    • G06V 10/82: Recognition or understanding using neural networks
    • Y02T 10/40: Engine management systems


Abstract

The present disclosure provides an image processing method, apparatus, device, and storage medium. An image to be processed is acquired and input into the back diffusion network of a stable diffusion model to obtain a denoised image to be processed. The stable diffusion model is trained by alternately using paired data and unpaired data. This effectively addresses the prior-art problem that no paired data exist for ultrasound image denoising: the training logic that alternates paired and unpaired training gives the model the strong supervision of the paired denoising task while also allowing it to capture the noise distribution characteristics peculiar to ultrasound images from the unpaired training, so that the noise of the ultrasound image is removed effectively.

Description

Image processing method, device, equipment and storage medium
Technical Field
The present disclosure relates to the technical field of medical imaging, and in particular to an image processing method, apparatus, device, and storage medium.
Background
Ultrasound examination is a method of diagnosing human diseases using an ultrasonic diagnostic apparatus.
The artifact most likely to affect diagnosis in ultrasound imaging is speckle noise. Most current speckle-noise processing relies on traditional image post-processing techniques, such as the Lee filter, the Kuan filter, correlation-based denoising, and wavelet-threshold denoising, as well as diffusion-based approaches such as the PM, SRAD, and NCD models.
Although these conventional denoising algorithms can partially remove noise from an ultrasound image, they struggle with noise whose distribution is complex. Noise artifacts in ultrasound images pose a considerable challenge to the generalization and robustness of algorithms, especially in intelligent image analysis. The mainstream remedy is to collect ultrasound data from every machine model and incorporate it into model training. While this can improve model accuracy and generalization, image collection for ultrasound data is expensive, and more machine-model data also means higher labeling cost, so the overall cost multiplies. Moreover, ultrasound data are difficult to collect in the first place, which is why the prior art has no effective method for handling this kind of image noise.
Disclosure of Invention
The present disclosure provides an image processing method, apparatus, device, and storage medium, to at least solve the above technical problems in the prior art.
According to a first aspect of the present disclosure, there is provided an image processing method, the method comprising:
acquiring an image to be processed;
inputting the image to be processed into a back diffusion network in a stable diffusion model to obtain a denoised image to be processed;
The stable diffusion model is a model trained by alternately using paired data and unpaired data.
In one embodiment, within the stable diffusion model, the training process using paired data includes:
obtaining a standard sample image, wherein the standard sample image is a noise-free image or an image whose noise content is below a preset noise level;
inputting the standard sample image into a diffusion network in the stable diffusion model to obtain noise-added sample image features;
inputting the noise-added sample image features into a back diffusion network in the stable diffusion model to obtain a first denoised sample image;
comparing the first denoised sample image with the standard sample image and optimizing parameters of the diffusion network and the inverse diffusion network in combination with a pairing loss function.
In an embodiment, the inputting the standard sample image into the diffusion network in the stable diffusion model to obtain the noise-added sample image features includes:
inputting the standard sample image into an encoder in the diffusion network to obtain standard image characteristics;
and adding noise features to the standard image features a preset number of times to obtain the noise-added sample image features.
In one embodiment, the inputting the noise-added sample image features into the back diffusion network in the stable diffusion model to obtain a first denoised sample image includes:
removing the noise features a preset number of times from the noise-added sample image features in the inverse diffusion network to obtain intermediate image features;
and inputting the intermediate image features into a decoder in the inverse diffusion network to obtain a first denoised sample image.
In one embodiment, within the stable diffusion model, the training process using unpaired data includes:
acquiring a noise sample image;
inputting the noise sample image into a back diffusion network in the stable diffusion model to obtain a second denoising sample image;
and inputting the noise sample image, the second denoising sample image and the standard sample image into a perception discriminator in the stable diffusion model, and optimizing parameters of the perception discriminator by combining a non-pairing loss function.
In an embodiment, the comparing the noise sample image, the second denoising sample image, and the standard sample image, and optimizing model parameters of the stable diffusion model in combination with a non-pairing loss function includes:
Inputting the noise sample image, the second denoising sample image and the standard sample image into a feature extraction layer in the perception discriminator to obtain noise semantic features, denoising semantic features, denoising texture features and standard texture features;
processing the denoising texture features and the standard texture features, inputting the processed denoising texture features and the standard texture features into a texture discriminator in the perception discriminator, and optimizing parameters of the texture discriminator and the feature extraction layer according to a judging result and a texture loss function;
inputting the noise semantic features and the denoising semantic features into a content discriminator in the perception discriminator, and optimizing parameters of the content discriminator according to a judging result and a content loss function.
In one embodiment, the processing the de-noised texture features and the standard texture features and inputting into a texture discriminator within the perception discriminator includes:
converting the denoising texture features and the standard texture features into a denoising texture matrix and a standard texture matrix;
and inputting the denoising texture matrix and the standard texture matrix into a full-connection layer and a normalization layer in the texture discriminator.
According to a second aspect of the present disclosure, there is provided an image processing apparatus including:
the acquisition module is used for acquiring the image to be processed;
the denoising module is used for inputting the image to be processed into a back diffusion network in the stable diffusion model to obtain a denoised image to be processed;
the stable diffusion model is a model trained by alternately using paired data and unpaired data.
In an embodiment, the method further comprises:
the training module is used for acquiring a standard sample image, wherein the standard sample image is a noise-free image or an image whose noise content is below a preset noise level; inputting the standard sample image into a diffusion network in the stable diffusion model to obtain noise-added sample image features; inputting the noise-added sample image features into a back diffusion network in the stable diffusion model to obtain a first denoised sample image; comparing the first denoised sample image with the standard sample image and optimizing parameters of the diffusion network and the inverse diffusion network in combination with a pairing loss function.
In an embodiment, the training module is specifically further configured to:
inputting the standard sample image into an encoder in the diffusion network to obtain standard image features; and adding noise features to the standard image features a preset number of times to obtain the noise-added sample image features.
In an embodiment, the training module is specifically further configured to:
removing the noise features a preset number of times from the noise-added sample image features in the inverse diffusion network to obtain intermediate image features; and inputting the intermediate image features into a decoder in the inverse diffusion network to obtain a first denoised sample image.
In an embodiment, the training module is specifically further configured to:
acquiring a noise sample image; inputting the noise sample image into a back diffusion network in the stable diffusion model to obtain a second denoising sample image; and inputting the noise sample image, the second denoising sample image and the standard sample image into a perception discriminator in the stable diffusion model, and optimizing parameters of the perception discriminator by combining a non-pairing loss function.
In an embodiment, the training module is specifically further configured to:
inputting the noise sample image, the second denoising sample image and the standard sample image into a feature extraction layer in the perception discriminator to obtain noise semantic features, denoising semantic features, denoising texture features and standard texture features; processing the denoising texture features and the standard texture features, inputting the processed denoising texture features and the standard texture features into a texture discriminator in the perception discriminator, and optimizing parameters of the texture discriminator and the feature extraction layer according to a judging result and a texture loss function; inputting the noise semantic features and the denoising semantic features into a content discriminator in the perception discriminator, and optimizing parameters of the content discriminator according to a judging result and a content loss function.
In an embodiment, the training module is specifically further configured to:
converting the denoising texture features and the standard texture features into a denoising texture matrix and a standard texture matrix; and inputting the denoising texture matrix and the standard texture matrix into a full-connection layer and a normalization layer in the texture discriminator.
According to a third aspect of the present disclosure, there is provided an electronic device comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the methods described in the present disclosure.
According to a fourth aspect of the present disclosure, there is provided a non-transitory computer readable storage medium storing computer instructions for causing the computer to perform the method of the present disclosure.
With the image processing method, apparatus, device, and storage medium of the present disclosure, an image to be processed is acquired and input into the back diffusion network of a stable diffusion model to obtain a denoised image to be processed. Because the stable diffusion model is trained by alternately using paired data and unpaired data, the prior-art problem that no paired data exist for ultrasound image denoising is effectively solved: the training logic that alternates paired and unpaired training gives the model the strong supervision of the paired denoising task while also allowing it to capture the noise distribution characteristics peculiar to ultrasound images from the unpaired training, so that the noise of the ultrasound image is removed effectively.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The above, as well as additional purposes, features, and advantages of exemplary embodiments of the present disclosure will become readily apparent from the following detailed description when read in conjunction with the accompanying drawings. Several embodiments of the present disclosure are illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings, in which:
in the drawings, the same or corresponding reference numerals indicate the same or corresponding parts.
Fig. 1 shows a schematic implementation flow diagram of an image processing method according to an embodiment of the disclosure;
FIG. 2 illustrates a schematic implementation flow diagram of an exemplary image processing method provided by an embodiment of the present disclosure;
fig. 3 is a schematic diagram showing the structure of an image processing apparatus according to an embodiment of the present disclosure;
fig. 4 shows a schematic diagram of a composition structure of an electronic device according to an embodiment of the disclosure.
Detailed Description
To make the objects, features, and advantages of the present disclosure more comprehensible, the technical solutions in the embodiments of the present disclosure are described below in conjunction with the accompanying drawings. Evidently, the described embodiments are only some, not all, of the embodiments of the present disclosure. All other embodiments that a person skilled in the art could obtain from these embodiments without inventive effort fall within the scope of protection of this disclosure.
Currently, ultrasound is used in a wide range of medical fields. Clear cross-sectional images of the various organs of the human viscera can be obtained with ultrasonic waves, and ultrasound is comparatively well suited to diagnosing diseases of internal organs such as the liver, gallbladder, kidney, bladder, uterus, and ovary. Ultrasound examination is inexpensive, causes no adverse reactions, and can be performed repeatedly.
With the development of science and technology, more and more novel ultrasonic scanning techniques have been invented, such as color Doppler ultrasound and three-dimensional ultrasound. The principle of ultrasound examination is to use ultrasound as a medium to examine the size, shape, and internal echo of the internal organs of the human body. Ultrasonic waves propagating through the human body behave as sound waves: they refract, reflect, and diffract, and different tissues have different refraction and reflection rates. The ultrasonic probe transmits ultrasonic waves and receives them back, and clear ultrasound images are obtained through post-processing.
Speckle noise is the artifact most likely to affect diagnosis in ultrasound imaging, and the significant domain gap at the noise level between ultrasound images from different machine models also challenges algorithm generalization. Moreover, speckle-noise patterns in ultrasound images significantly affect lesion observation. The mainstream remedy is to collect ultrasound data from every machine model and incorporate it into model training, which can improve model accuracy and generalization but at an extremely high cost. Unlike CT or MRI scans, ultrasound scan videos are not archived by most hospitals, which means image collection is relatively expensive and must be done prospectively by hospitals. Meanwhile, ultrasound scanning is a video modality, so labeling costs are enormous, and data from more machine models means labeling costs multiply. Finally, ultrasound is a mature technology with large equipment fleets and scanning volumes, rapid technology updates and iteration, and a large number of brands and models in China and worldwide, which places high demands on data scale. Simply relying on expanded data volume to improve ultrasound generalization is a costly affair.
To overcome this difficulty, much research has focused on using techniques such as domain generalization to improve the performance of ultrasound models under different noise conditions. Such work falls into two categories: generalization optimization on the model side and generalization optimization on the data side. Model-side optimization leaves the image data untouched and instead edits at the model's feature level, aligning training-domain features with the features of certain specific machine-model domains. Data-side generalization optimization keeps the model parameters fixed and edits in image space, aligning the ultrasound images that require generalization with the images in the training set.
Both kinds of method have some effect, but every domain-generalization algorithm suffers some performance degradation, and the denoising effect is poor. This embodiment therefore provides an image processing method, which specifically comprises the following steps:
fig. 1 is a flowchart of an image processing method according to an embodiment of the present disclosure, where the method may be performed by an image processing apparatus according to an embodiment of the present disclosure, and the apparatus may be implemented in software and/or hardware. The method specifically comprises the following steps:
s110, acquiring an image to be processed.
The image to be processed is an ultrasound image containing noise. In this embodiment, an image acquired directly from a patient's ultrasound examination may serve as the image to be processed; alternatively, an image transmitted by another electronic device, by wire or wirelessly, may be received and used as the image to be processed.
S120, inputting the image to be processed into a back diffusion network in a stable diffusion model to obtain a denoised image to be processed; the stable diffusion model is a model trained by alternately using paired data and unpaired data.
The stable diffusion model is a stable diffusion model with an image prior; it is an image editing network whose main body comprises a diffusion network and a back diffusion network. When training the network, the diffusion network adds noise to the received image, manufacturing paired data artificially, and the back diffusion network denoises the noise-added image. Once a mature stable diffusion model is obtained, the model feeds the received image to be processed into the back diffusion network, so that the image to be processed can be denoised. The paired data consist of the first denoised sample image and the standard sample image; the unpaired data consist of the noise sample image, the second denoised sample image, and the standard sample image (each described in detail below).
When the stable diffusion model is applied, the diffusion process is discarded and only the back diffusion process is used: this process takes a randomly sampled vector as input, and after the image to be processed has undergone the denoising operation a preset number of times, a denoised image to be processed is finally obtained through the decoder.
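For intuition, the application-stage flow just described might be sketched as follows. Here `encoder`, `denoise_step`, and `decoder` are hypothetical stand-ins for the trained networks; the 1024-dimensional feature size and 500-step count follow the description further below, and everything else is an illustrative assumption rather than the patent's exact implementation.

```python
import torch

@torch.no_grad()
def infer(encoder, denoise_step, decoder, image, steps=500, dim=1024):
    """Application stage: diffusion is discarded; only back diffusion runs.
    Start from a randomly sampled vector, denoise a preset number of times
    (conditioned on the input image's features), then decode the result."""
    prior = encoder(image)              # image prior supplying K and V
    f = torch.randn(1, dim)             # randomly sampled input vector
    for _ in range(steps):              # preset number of denoising steps
        f = denoise_step(f, prior)
    return decoder(f)

# Smoke test with stand-in callables (not the trained networks).
out = infer(lambda img: torch.zeros(1, 1024),
            lambda f, p: 0.99 * f + 0.01 * p,
            lambda f: f,
            image=torch.randn(1, 1, 256, 256))
```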
Owing to the nature of ultrasound scanning, paired noisy and noise-free images are difficult to obtain, and effective supervision for the denoising task is therefore hard to create. For the stable diffusion model and the networks commonly used for image editing, such as the VAE, Auto-Encoder, and UNet, the original training constraint is paired: a hidden-layer feature corresponds to a specific output image, and that hidden-layer feature is obtained by encoding a specific image. To solve this problem, we designed a training scheme that mixes paired supervision and unpaired supervision, that is, a scheme in which paired and unpaired supervision are trained alternately.
In an embodiment of the present disclosure, within the stable diffusion model, the process of training using paired data includes: obtaining a standard sample image, wherein the standard sample image is a noise-free image or an image whose noise content is below a preset noise level; inputting the standard sample image into a diffusion network in the stable diffusion model to obtain noise-added sample image features; inputting the noise-added sample image features into a back diffusion network in the stable diffusion model to obtain a first denoised sample image; comparing the first denoised sample image with the standard sample image and optimizing parameters of the diffusion network and the inverse diffusion network in combination with a pairing loss function.
The first denoised sample image is the image obtained by denoising the standard sample image. In this training mode, an image with low noise or no noise is chosen as the standard sample image, and noise features are then randomly added to it during each training iteration to manufacture a realistic input sample for the noise editing network. The purpose of this training pattern is to reconstruct the corresponding standard sample directly from the noise-added image.
In paired training, this embodiment possesses, for each denoised image, the original image before the specific noise was added, namely the paired data composed of the first denoised sample image and the standard sample image. The pairing loss function for this part is therefore relatively simple: the two-norm loss between the image generated by the stable diffusion model's decoder and the original image, i.e., between the first denoised sample image and the standard sample image.
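Written out, with O_1 denoting the first denoised sample image and I the standard sample image (the squared form is an assumption; the text specifies only a two-norm loss):

```latex
\mathcal{L}_{\mathrm{pair}} = \left\lVert O_1 - I \right\rVert_2^2
```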
In an embodiment of the present disclosure, the inputting the standard sample image into the diffusion network in the stable diffusion model to obtain the noise-added sample image features includes: inputting the standard sample image into an encoder in the diffusion network to obtain standard image features; and adding noise features to the standard image features a preset number of times to obtain the noise-added sample image features.
The noise features may be Gaussian noise or Laplace noise. The encoder is used to extract image features and may be, for example, the backbone of a ResNet50. The noise-added sample image features are obtained by adding noise to the standard image features.
The stable diffusion model provided in this embodiment is an improvement on the diffusion model, and its main training and inference procedures follow the diffusion-model flow. In the training phase, the stable diffusion model consists of two parts: a forward diffusion process and a backward denoising process. Apart from the encoder, the forward diffusion process is non-parametric, i.e., the diffusion process has no model parameters available for learning. The forward diffusion process takes any standard sample image as input, compresses the image into a 1024-dimensional vector using a ResNet50 backbone as the encoder, and defines this vector as $F_0$. Noise is then added to $F_0$ successively a preset number of times, for example 500 times in total; the noise added at each step is a 1024-dimensional vector $V_{i+1}$ sampled from $N(0,1)$, multiplied by a weight $\beta_{i+1}$ between 0 and 1 and added to the vector from the previous step. The formulation is:

$$F_{i+1} = F_i + \beta_{i+1} \cdot V_{i+1}$$

where $\beta_{i+1}$ is the weight, $V_{i+1}$ is a 1024-dimensional vector sampled from $N(0,1)$, $F_i$ is the noise-added sample image feature after $i$ noise additions, and $F_{i+1}$ is the noise-added sample image feature after $i+1$ noise additions.
After the preset number of noise additions, the result can be regarded in this embodiment as a vector conforming to N(0, 1). The point of continuously adding Gaussian noise to the image coding vector during diffusion is to gradually turn it into an isotropically distributed Gaussian vector.
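A minimal PyTorch-style sketch of this forward noising recursion, taking the update rule literally from the formula above; the 1024-dimensional features and 500 steps follow the text, while the specific beta schedule values are illustrative assumptions.

```python
import torch

def forward_diffuse(f0: torch.Tensor, betas: torch.Tensor) -> torch.Tensor:
    """Forward diffusion: at each step, Gaussian noise weighted by beta is
    added to the feature from the previous step (F_{i+1} = F_i + beta * V)."""
    f = f0
    for beta in betas:
        v = torch.randn_like(f)   # V_{i+1} ~ N(0, 1), same 1024-dim shape
        f = f + beta * v          # weighted noise added to previous vector
    return f

betas = torch.linspace(1e-4, 0.02, 500)   # illustrative 500-step schedule
f0 = torch.randn(8, 1024)                 # stand-in for encoder output F_0
f_noisy = forward_diffuse(f0, betas)
```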
In an embodiment of the present disclosure, the inputting the noise-added sample image features into the back diffusion network in the stable diffusion model to obtain a first denoised sample image includes: removing the noise features a preset number of times from the noise-added sample image features in the inverse diffusion network to obtain intermediate image features; and inputting the intermediate image features into a decoder in the inverse diffusion network to obtain a first denoised sample image.
The first denoised sample image is the image obtained by denoising the noise-added sample image features. The decoder is used to generate the first denoised sample image.
Specifically, the back diffusion process and the diffusion process are two symmetrical, opposite processes; the backward process is a denoising process, and it contains a large number of learnable parameters. The backward process as a whole mirrors the forward diffusion process: the noise-added sample image features are input into the back diffusion network, a vector is randomly sampled from N(0, 1), the denoising operation is performed a preset number of times to obtain the intermediate image features, and the intermediate image features are then input into the decoder in the back diffusion network to obtain the first denoised sample image.
In particular, the back diffusion process is consistent with stable diffusion. In the decoder section, each decoding step uses a modified version of the UNet network: its overall structure is consistent with UNet, but the feature downsampling and upsampling sections incorporate a transformer encoder. Also, unlike stable diffusion, multimodal features are used as the vector K and vector V of that encoder. In the back diffusion process, the encoder's vector Q comes from the result of encoding F_i, while vector K and vector V come from the input picture prior; under the back diffusion process of this embodiment, that prior is the standard image feature obtained by inputting the standard sample image to the encoder in the diffusion network.
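The cross-attention arrangement described here (Q from the current feature F_i, K and V from the image prior) could be sketched roughly as below; the module layout, hidden sizes, and the residual update rule are illustrative assumptions, not the patent's exact architecture.

```python
import torch
import torch.nn as nn

class DenoiseStep(nn.Module):
    """One back-diffusion step: cross-attention with Q from the noisy
    feature F_i and K/V from the image prior, then a noise prediction."""

    def __init__(self, dim: int = 1024, heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.noise_pred = nn.Sequential(nn.Linear(dim, dim), nn.SiLU(),
                                        nn.Linear(dim, dim))

    def forward(self, f_i, prior, beta):
        q = f_i.unsqueeze(1)                        # query from F_i
        kv = prior.unsqueeze(1)                     # key/value from prior
        ctx, _ = self.attn(q, kv, kv)
        eps_hat = self.noise_pred(ctx.squeeze(1))   # predicted noise
        return f_i - beta * eps_hat                 # undo one weighted step

step = DenoiseStep()
f = torch.randn(8, 1024)       # randomly sampled starting vector
prior = torch.randn(8, 1024)   # standard image features from the encoder
f = step(f, prior, beta=0.02)  # repeat for the preset number of steps
```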
Compared with the existing domain-generalization line of thought, this embodiment designs a noise editing algorithm that removes noise from the image directly. Moreover, compared with existing deep-learning denoising methods, we have put more thought and design into the image generator section: most existing methods use fully convolutional networks or UNet-like downsampling/upsampling networks for image editing. Although such methods can capture image detail, they consider neither the characteristics of ultrasound data nor those of the task, and problems such as blurred images after denoising easily arise. Therefore, taking the characteristics of the denoising task into account and drawing on the SOTA algorithms in the image editing field, a stable diffusion model is introduced for noise editing. Corresponding loss functions and training logic are designed at the same time, so that the generated images are clearer and more reliable.
Although the paired training process described above gives the model a strong target constraint, it has two shortcomings. First, because an ultrasound image is closely tied to factors such as the physician's operating technique and the patient's posture, and is not reproducible, strictly matched noisy and noise-free real data cannot be acquired by switching machines or similar means. Second, paired denoising data can be constructed by adding artificial noise to samples, but artificial noise does not directly transfer to the ultrasound image noise distribution. To further improve the effect, this embodiment proposes unpaired supervised training. Under unpaired supervision, this embodiment cannot directly supervise the output for a particular input. What remains fixed, however, is that the desired output of the denoising task satisfies two requirements: first, the output image is noise-free or low-noise; second, the content of the output image is consistent with the original image. Therefore, by introducing an adversarial-training mechanism and designing a loss function for each of the two targets, denoising can be achieved without damaging the content.
In this embodiment, the training process using unpaired data in the stable diffusion model includes: acquiring a noise sample image; inputting the noise sample image into a back diffusion network in the stable diffusion model to obtain a second denoising sample image; and inputting the noise sample image, the second denoising sample image and the standard sample image into a perception discriminator in the stable diffusion model, and optimizing parameters of the perception discriminator by combining a non-pairing loss function.
The noise sample image is an image that genuinely contains noise, for example a noisy ultrasound image produced by scanning. The second denoised sample image is the denoised noise sample image. The perception discriminator comprises a texture discriminator and a content discriminator. Because noise features are themselves a kind of low-level texture, the texture discriminator compares the second denoised sample image with the standard sample image so that the noise features of the second denoised sample image approach the standard sample image as closely as possible. And because the main presented content of the second denoised sample image and the noise sample image is the same, the content discriminator compares the two so that the content of the second denoised sample image approaches the noise sample image as closely as possible. The unpaired loss function comprises the loss functions involved in both the texture discriminator and the content discriminator.
In this embodiment, the comparing the noise sample image, the second denoising sample image, and the standard sample image, and optimizing the model parameters of the stable diffusion model in combination with the unpaired loss function includes: inputting the noise sample image, the second denoising sample image and the standard sample image into a feature extraction layer in the perception discriminator to obtain noise semantic features, denoising semantic features, denoising texture features and standard texture features; processing the denoising texture features and the standard texture features, inputting the processed denoising texture features and the standard texture features into a texture discriminator in the perception discriminator, and optimizing parameters of the texture discriminator and the feature extraction layer according to a judging result and a texture loss function; inputting the noise semantic features and the denoising semantic features into a content discriminator in the perception discriminator, and optimizing parameters of the content discriminator according to a judging result and a content loss function.
The perception discriminator adopts a backbone network pretrained on ImageNet, a framework for vision-related tasks; a model pretrained in this way has strong generalization capability, transfers more easily, and is well suited to tasks that extract texture features and content features. The feature extraction layer may be a ResNet network, which is used to extract the noise semantic features, denoising semantic features, denoising texture features, and standard texture features.
The noise semantic features are the semantic portions of the five levels of downsampled features obtained by inputting the noise sample image into the ResNet network; the denoising semantic features are the semantic portions of the five levels of downsampled features obtained by inputting the second denoised sample image into the ResNet network; the denoising texture features are the texture portions of the five levels of downsampled features obtained by inputting the second denoised sample image into the ResNet network; and the standard texture features are the texture portions of the five levels of downsampled features obtained by inputting the standard sample image into the ResNet network.
Because strict paired-data supervision cannot be obtained directly under the unpaired training condition, this embodiment trains with the noise sample image, the second denoised sample image, and the standard sample image as unpaired data, and the losses produced in training are adversarial. That is, this embodiment instead uses adversarial loss constraints over the distributions of the two unpaired data sets to optimize the generated results.
Specifically, to guide the editing of noise images, this embodiment may first take the set of noise-free and low-noise standard sample images as a standard set. This embodiment then selects a noise sample image I and passes it into the noise editing network (i.e., the back diffusion network in the stable diffusion model) to generate a denoised image; this second denoised sample image is denoted O. A standard sample image S is also selected from the standard set. These three images are then passed into a ResNet pretrained on ImageNet to obtain five levels of downsampled features. Based on the differing nature of its downsampled features, we divide the ResNet into two groups that represent content differently: the texture features of the first three levels and the semantic features of the last two, yielding the noise semantic features, denoising semantic features, denoising texture features, and standard texture features. For the texture features of the first three levels, this embodiment expects the denoising texture features to approach the standard texture features. For the semantic features of the last two levels, this embodiment expects the denoising semantic features to approach the noise semantic features (i.e., the semantic features of the noisy original).
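A sketch of this five-level split using a torchvision ResNet50 pretrained on ImageNet; the specific stage grouping below is an assumption consistent with the three-texture/two-semantic division described above, not the patent's exact implementation.

```python
import torch
import torch.nn as nn
import torchvision.models as models

resnet = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
stem = nn.Sequential(resnet.conv1, resnet.bn1, resnet.relu, resnet.maxpool)
stages = [stem, resnet.layer1, resnet.layer2, resnet.layer3, resnet.layer4]

def split_features(x: torch.Tensor):
    """Collect the five downsampled feature levels; the first three are
    treated as texture (where noise lives), the last two as semantics."""
    feats = []
    for stage in stages:
        x = stage(x)
        feats.append(x)
    return feats[:3], feats[3:]   # (texture levels, semantic levels)

texture_feats, semantic_feats = split_features(torch.randn(1, 3, 224, 224))
```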
The present embodiment uses a stable diffusion model for noise editing, which has several benefits. First, the stable diffusion model is the SOTA algorithm in image editing for current image generation. Second, training a stable diffusion model is itself training a denoising algorithm, which naturally matches the task requirement of this embodiment. Third, the stable diffusion model can incorporate priors from multiple modalities, such as images and text, to guide image generation, so that the generated image meets some of the specific requirements of this embodiment.
Fig. 2 is a flowchart illustrating an exemplary image processing method according to an embodiment of the present disclosure. The vector Q, the vector K and the vector V are vectors used in the diffusion process and the inverse diffusion process in the model.
Specifically, the input of the present embodiment may be two groups of input images: (input map, standard sample) and (input map, noise-added sample). The group (input map, noise-added sample) is explained first; its purpose is to let the model learn the denoising task. Specifically, the input map is any noise-free/low-noise standard sample image I, and the noise-added sample is the image I_noise obtained by randomly adding Gaussian noise to the standard sample image I, i.e., the noise-added sample image. The standard sample image I is then fed, via the encoder's feature extraction, into the noise editing network to generate a denoised picture O_1, i.e., the first denoised sample image. The pairing loss then optimizes the network by pulling together the distributions of the generated first denoised sample image and the standard sample image, so that the noise editing network learns the denoising task.
With the (input map, noise-added sample) mode, the model can learn the denoising task, but the noise is Gaussian noise, which does not match the characteristics of the noise distribution in ultrasound images. Adding the other training pattern adapts the model to ultrasound images.
Next, the group (input map, standard sample) is explained; its purpose is to let the model learn the real noise distribution in ultrasound images. The input map is any real noisy image I', i.e., a noise sample image. The standard sample is any noise-free/low-noise standard sample image I. The noise sample image I' is then fed, via the encoder's feature extraction, into the noise editing network to generate a denoised image O_2, i.e., the second denoised sample image. The texture discriminator and the content discriminator then constrain the generated image: while keeping the second denoised sample image O_2 consistent with the content of the noise sample image I', they pull the texture distribution of O_2 toward that of the standard sample image I. The reason pairing loss is not used directly here is that O_2 and the standard sample image I are two images whose content does not correspond (they are not paired); directly applying the pairing loss (the two-norm loss) would force the generated image to become identical to the standard sample image I, thus changing the content. To avoid this problem, the present embodiment turns the denoising task into a texture editing task, treating noise as a local texture feature. This embodiment can therefore instead require that the texture of the second denoised sample image O_2 be consistent with the texture of the standard sample image I, while its content is consistent with the noise sample image I'. This exploits the property that CNN-extracted features aggregate gradually from local detail to global content, so that separate editing of texture and content can be realized by placing different constraints on features of different levels.
In this embodiment, the processing the denoising texture features and the standard texture features and inputting them into the texture discriminator in the perception discriminator includes: converting the denoising texture features and the standard texture features into a denoising texture matrix and a standard texture matrix; and inputting the denoising texture matrix and the standard texture matrix into a fully connected layer and a normalization layer in the texture discriminator.
For paired training and unpaired training, loss functions are designed separately in the Loss module: for paired training, this embodiment designs the pairing loss function; for the unpaired training case, this embodiment designs two adversarial losses, generated by the texture discriminator and the content discriminator respectively.
For the losses relating to the texture features of the first three levels, this embodiment converts the denoising texture features and the standard texture features into the corresponding Gram matrices Gram_O and Gram_S, namely the denoising texture matrix and the standard texture matrix. This embodiment then concatenates the denoising texture matrix and the standard texture matrix as input to the texture discriminator. The texture discriminator in this embodiment may be a fully connected network layer whose last layer outputs a 2-dimensional vector, which passes through the softmax normalization layer to form a binary probability distribution. The training goal of the texture discriminator is to distinguish Gram_O from Gram_S, and the optimized parameters are the texture discriminator parameters and the ResNet parameters.
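A minimal sketch of the Gram-matrix conversion and of such a fully connected texture discriminator; the channel count and hidden size are illustrative assumptions.

```python
import torch
import torch.nn as nn

def gram_matrix(feat: torch.Tensor) -> torch.Tensor:
    """(B, C, H, W) -> (B, C, C): channel co-occurrence statistics that
    summarize texture while discarding spatial layout."""
    b, c, h, w = feat.shape
    f = feat.view(b, c, h * w)
    return torch.bmm(f, f.transpose(1, 2)) / (c * h * w)

class TextureDiscriminator(nn.Module):
    """Fully connected layers over the concatenated Gram matrices; the
    final 2-dim output passes through softmax to a binary distribution."""

    def __init__(self, channels: int = 256, hidden: int = 512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Flatten(),
            nn.Linear(2 * channels * channels, hidden), nn.ReLU(),
            nn.Linear(hidden, 2), nn.Softmax(dim=-1))

    def forward(self, gram_o, gram_s):
        return self.net(torch.cat([gram_o, gram_s], dim=1))

disc = TextureDiscriminator(channels=256)
gram_o = gram_matrix(torch.randn(4, 256, 56, 56))  # denoising texture matrix
gram_s = gram_matrix(torch.randn(4, 256, 56, 56))  # standard texture matrix
probs = disc(gram_o, gram_s)                       # binary probability
```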
At the same time, the diffusion model's parameters are optimized to try to fool the texture discriminator. As for the content-feature loss: because the second denoised sample image O_2 corresponds one-to-one with the noise sample image I', this embodiment can directly compute the two-norm loss between the corresponding features of O_2 and the noise sample image I' and use it as the content loss.
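With φ_l denoting the l-th ResNet feature level (an assumed notation) and levels 4 and 5 forming the semantic group described above, this content loss can be written:

```latex
\mathcal{L}_{\mathrm{content}} = \sum_{l=4}^{5} \left\lVert \phi_l(O_2) - \phi_l(I') \right\rVert_2^2
```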
By constraining the features of the generated image during the training stage, the Loss calculation module in this embodiment ensures that the generated image is consistent with the content of the input image while the noise is removed. The Loss calculation module contains three parts: the pairing loss, the noise texture loss, and the content loss. In short, through the optimization method and loss functions of unpaired training, noise features and semantic features are effectively separated and optimized adversarially; the scheme is therefore applicable to the noise editing problem of ultrasound images, generalizes strongly, can be applied to image editing tasks such as cross-machine-model generalization of ultrasound images, and also ensures that the denoised image is not distorted.
During training, the present embodiment alternates paired-data training and unpaired-data training, for example: paired-data training pass 1, unpaired-data training pass 1, paired-data training pass 2, unpaired-data training pass 2, and so on. In this way, the model both receives the strong supervision of the paired denoising task and captures the noise distribution characteristics peculiar to ultrasound images from the unpaired training.
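The alternation itself reduces to a simple loop; `paired_step` and `unpaired_step` below are hypothetical callables wrapping the paired and unpaired optimization passes described above.

```python
from typing import Callable, Iterable

def alternate_training(paired_step: Callable, unpaired_step: Callable,
                       paired_data: Iterable, unpaired_data: Iterable) -> None:
    """Alternate strictly: paired pass 1, unpaired pass 1, paired pass 2, ..."""
    for p_batch, u_batch in zip(paired_data, unpaired_data):
        paired_step(p_batch)      # strong supervision (pairing loss)
        unpaired_step(u_batch)    # adversarial texture/content losses

# Smoke test with no-op callables.
alternate_training(lambda b: None, lambda b: None, range(2), range(2))
```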
In addition, the image processing method provided in this embodiment may be applied not only to ultrasound images but also to other types of medical images, such as X-ray images and computed tomography (CT) images.
Fig. 3 is a schematic structural diagram of an image processing apparatus according to an embodiment of the present disclosure, where the apparatus specifically includes:
an acquisition module 310, configured to acquire an image to be processed;
the denoising module 320 is configured to input the image to be processed into a back diffusion network in a stable diffusion model, so as to obtain a denoised image to be processed;
the stable diffusion model is a model trained by alternately using paired data and unpaired data.
In an embodiment, the method further comprises:
the training module is used for acquiring a standard sample image, wherein the standard sample image is a noise-free image or an image whose noise content is below a preset noise level; inputting the standard sample image into a diffusion network in the stable diffusion model to obtain noise-added sample image features; inputting the noise-added sample image features into a back diffusion network in the stable diffusion model to obtain a first denoised sample image; comparing the first denoised sample image with the standard sample image and optimizing parameters of the diffusion network and the inverse diffusion network in combination with a pairing loss function.
In an embodiment, the training module is specifically further configured to:
inputting the standard sample image into an encoder in the diffusion network to obtain standard image features; and adding noise features to the standard image features a preset number of times to obtain the noise-added sample image features.
In an embodiment, the training module is specifically further configured to:
removing the noise features a preset number of times from the noise-added sample image features in the inverse diffusion network to obtain intermediate image features; and inputting the intermediate image features into a decoder in the inverse diffusion network to obtain a first denoised sample image.
In an embodiment, the training module is specifically further configured to:
acquiring a noise sample image; inputting the noise sample image into a back diffusion network in the stable diffusion model to obtain a second denoising sample image; and inputting the noise sample image, the second denoising sample image and the standard sample image into a perception discriminator in the stable diffusion model, and optimizing parameters of the perception discriminator by combining a non-pairing loss function.
In an embodiment, the training module is specifically further configured to:
inputting the noise sample image, the second denoising sample image and the standard sample image into a feature extraction layer in the perception discriminator to obtain noise semantic features, denoising semantic features, denoising texture features and standard texture features; processing the denoising texture features and the standard texture features, inputting the processed denoising texture features and the standard texture features into a texture discriminator in the perception discriminator, and optimizing parameters of the texture discriminator and the feature extraction layer according to a judging result and a texture loss function; inputting the noise semantic features and the denoising semantic features into a content discriminator in the perception discriminator, and optimizing parameters of the content discriminator according to a judging result and a content loss function.
In an embodiment, the training module is specifically further configured to:
converting the denoising texture features and the standard texture features into a denoising texture matrix and a standard texture matrix; and inputting the denoising texture matrix and the standard texture matrix into a full-connection layer and a normalization layer in the texture discriminator.
According to embodiments of the present disclosure, the present disclosure also provides an electronic device and a readable storage medium.
Fig. 4 illustrates a schematic block diagram of an example electronic device 400 that may be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 4, the apparatus 400 includes a computing unit 401 that can perform various suitable actions and processes according to a computer program stored in a Read Only Memory (ROM) 402 or a computer program loaded from a storage unit 408 into a Random Access Memory (RAM) 403. In RAM 403, various programs and data required for the operation of device 400 may also be stored. The computing unit 401, ROM 402, and RAM 403 are connected to each other by a bus 404. An input/output (I/O) interface 405 is also connected to bus 404.
Various components in device 400 are connected to I/O interface 405, including: an input unit 406 such as a keyboard, a mouse, etc.; an output unit 407 such as various types of displays, speakers, and the like; a storage unit 408, such as a magnetic disk, optical disk, etc.; and a communication unit 409 such as a network card, modem, wireless communication transceiver, etc. The communication unit 409 allows the device 400 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunication networks.
The computing unit 401 may be a variety of general purpose and/or special purpose processing components having processing and computing capabilities. Some examples of computing unit 401 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 401 performs the respective methods and processes described above, for example, an image processing method. For example, in some embodiments, the image processing method may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as the storage unit 408. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 400 via the ROM 402 and/or the communication unit 409. When a computer program is loaded into RAM 403 and executed by computing unit 401, one or more steps of the image processing method described above may be performed. Alternatively, in other embodiments, the computing unit 401 may be configured to perform the image processing method by any other suitable means (e.g. by means of firmware).
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuitry, field-programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), systems on chips (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor that can receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for carrying out the methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general-purpose computer, special-purpose computer, or other programmable data processing apparatus, such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowcharts and/or block diagrams to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine, or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local area networks (LANs), wide area networks (WANs), and the internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server incorporating a blockchain.
It should be appreciated that the various forms of flow shown above may be used, with steps reordered, added, or deleted. For example, the steps recited in the present disclosure may be performed in parallel, sequentially, or in a different order, provided that the desired results of the technical solutions of the present disclosure are achieved; no limitation is imposed herein.
Furthermore, the terms "first," "second," and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined with "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present disclosure, "a plurality" means two or more, unless explicitly defined otherwise.
The foregoing is merely a description of specific embodiments of the disclosure, but the protection scope of the disclosure is not limited thereto. Any changes or substitutions that a person skilled in the art could readily conceive of within the technical scope of the disclosure shall be covered by the protection scope of the disclosure. Therefore, the protection scope of the present disclosure shall be subject to the protection scope of the claims.

Claims (10)

1. An image processing method, the method comprising:
acquiring an image to be processed;
inputting the image to be processed into an inverse diffusion network in a stable diffusion model to obtain a denoised image;
wherein the stable diffusion model is trained by alternately using paired data and unpaired data, the paired data comprising a first denoised sample image and a standard sample image;
wherein, in the stable diffusion model, the training process using unpaired data comprises: acquiring a noise sample image; inputting the noise sample image into the inverse diffusion network in the stable diffusion model to obtain a second denoised sample image; and inputting the noise sample image, the second denoised sample image and the standard sample image into a perception discriminator in the stable diffusion model, and optimizing parameters of the perception discriminator in combination with an unpaired loss function;
wherein inputting the noise sample image, the second denoised sample image and the standard sample image into the perception discriminator in the stable diffusion model, and optimizing the parameters of the perception discriminator in combination with the unpaired loss function, comprises: inputting the noise sample image, the second denoised sample image and the standard sample image into a feature extraction layer in the perception discriminator to obtain noise semantic features, denoising semantic features, denoising texture features and standard texture features; processing the denoising texture features and the standard texture features, inputting the processed features into a texture discriminator in the perception discriminator, and optimizing parameters of the texture discriminator and the feature extraction layer according to a discrimination result and a texture loss function; and inputting the noise semantic features and the denoising semantic features into a content discriminator in the perception discriminator, and optimizing parameters of the content discriminator according to a discrimination result and a content loss function.
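Purely as a hedged illustration of the unpaired stage recited above (the discriminator interfaces, the binary cross-entropy losses and the feature-extractor output convention are all assumptions), one training step might look like the following sketch, reusing gram_matrix from the earlier example and assuming the extractor returns a (semantic, texture) feature pair:

    import torch
    import torch.nn.functional as F

    def unpaired_step(noise_img, standard_img, back_diffusion, extractor,
                      texture_disc, content_disc, optimizer):
        denoised = back_diffusion(noise_img)        # second denoised sample image
        noise_sem, _ = extractor(noise_img)         # noise semantic features
        den_sem, den_tex = extractor(denoised)      # denoising semantic/texture features
        _, std_tex = extractor(standard_img)        # standard texture features

        # Texture branch: standard textures are treated as real, denoised textures as fake.
        real = texture_disc(gram_matrix(std_tex))
        fake = texture_disc(gram_matrix(den_tex))
        texture_loss = F.binary_cross_entropy(real, torch.ones_like(real)) \
                     + F.binary_cross_entropy(fake, torch.zeros_like(fake))

        # Content branch: the denoised image should keep the noisy input's semantics.
        score = content_disc(torch.cat([noise_sem, den_sem], dim=1))
        content_loss = F.binary_cross_entropy(score, torch.ones_like(score))

        loss = texture_loss + content_loss          # unpaired loss
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()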
2. The method of claim 1, wherein, in the stable diffusion model, the training process using paired data comprises:
obtaining the standard sample image, wherein the standard sample image is a noiseless image or an image whose noise content is lower than a preset noise level;
inputting the standard sample image into a diffusion network in the stable diffusion model to obtain noisy sample image features;
inputting the noisy sample image features into the inverse diffusion network in the stable diffusion model to obtain the first denoised sample image;
and comparing the first denoised sample image with the standard sample image, and optimizing parameters of the diffusion network and the inverse diffusion network in combination with a pairing loss function.
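A minimal sketch of this paired stage, assuming PyTorch and an L1 pairing loss (the concrete loss function is an assumption, not specified by the claim):

    import torch.nn.functional as F

    def paired_step(standard_img, diffusion, back_diffusion, optimizer):
        noisy_feats = diffusion(standard_img)      # encoder plus repeated noise injection
        denoised = back_diffusion(noisy_feats)     # first denoised sample image
        loss = F.l1_loss(denoised, standard_img)   # pairing loss against the standard sample image
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()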
3. The method of claim 2, wherein inputting the standard sample image into the diffusion network in the stable diffusion model to obtain the noisy sample image features comprises:
inputting the standard sample image into an encoder in the diffusion network to obtain standard image features;
and adding noise features to the standard image features a preset number of times to obtain the noisy sample image features.
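As a hedged illustration of this claim, the sketch below uses a DDPM-style forward step with a constant noise schedule; both the schedule and the step count are assumed values:

    import torch

    def diffuse(standard_img, encoder, steps=1000, beta=0.01):
        z = encoder(standard_img)  # standard image features
        for _ in range(steps):     # add noise features a preset number of times
            z = (1 - beta) ** 0.5 * z + beta ** 0.5 * torch.randn_like(z)
        return z                   # noisy sample image features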
4. The method of claim 3, wherein inputting the noisy sample image features into the inverse diffusion network in the stable diffusion model to obtain the first denoised sample image comprises:
removing the noise features from the noisy sample image features the preset number of times in the inverse diffusion network to obtain intermediate image features;
and inputting the intermediate image features into a decoder in the inverse diffusion network to obtain the first denoised sample image.
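And a matching sketch of the reverse pass, assuming a learned noise predictor eps_net that deterministically inverts each noising step (a simplification of a full reverse diffusion process; eps_net and decoder are hypothetical modules):

    def denoise(noisy_feats, eps_net, decoder, steps=1000, beta=0.01):
        z = noisy_feats
        for t in reversed(range(steps)):  # remove the noise features the preset number of times
            z = (z - beta ** 0.5 * eps_net(z, t)) / (1 - beta) ** 0.5  # intermediate image features
        return decoder(z)                 # first denoised sample image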
5. The method of claim 1, wherein processing the denoising texture features and the standard texture features and inputting the processed features into the texture discriminator in the perception discriminator comprises:
converting the denoising texture features and the standard texture features into a denoising texture matrix and a standard texture matrix;
and inputting the denoising texture matrix and the standard texture matrix into a fully connected layer and a normalization layer in the texture discriminator.
6. An image processing apparatus, the apparatus comprising:
an acquisition module configured to acquire an image to be processed;
a denoising module configured to input the image to be processed into an inverse diffusion network in a stable diffusion model to obtain a denoised image;
wherein the stable diffusion model is trained by alternately using paired data and unpaired data, the paired data comprising a first denoised sample image and a standard sample image;
wherein, in the stable diffusion model, the training process using unpaired data comprises: acquiring a noise sample image; inputting the noise sample image into the inverse diffusion network in the stable diffusion model to obtain a second denoised sample image; and inputting the noise sample image, the second denoised sample image and the standard sample image into a perception discriminator in the stable diffusion model, and optimizing parameters of the perception discriminator in combination with an unpaired loss function;
wherein inputting the noise sample image, the second denoised sample image and the standard sample image into the perception discriminator in the stable diffusion model, and optimizing the parameters of the perception discriminator in combination with the unpaired loss function, comprises: inputting the noise sample image, the second denoised sample image and the standard sample image into a feature extraction layer in the perception discriminator to obtain noise semantic features, denoising semantic features, denoising texture features and standard texture features; processing the denoising texture features and the standard texture features, inputting the processed features into a texture discriminator in the perception discriminator, and optimizing parameters of the texture discriminator and the feature extraction layer according to a discrimination result and a texture loss function; and inputting the noise semantic features and the denoising semantic features into a content discriminator in the perception discriminator, and optimizing parameters of the content discriminator according to a discrimination result and a content loss function.
7. The apparatus of claim 6, wherein, in the stable diffusion model, the training process using paired data comprises:
obtaining the standard sample image, wherein the standard sample image is a noiseless image or an image whose noise content is lower than a preset noise level; inputting the standard sample image into a diffusion network in the stable diffusion model to obtain noisy sample image features; inputting the noisy sample image features into the inverse diffusion network in the stable diffusion model to obtain the first denoised sample image; and comparing the first denoised sample image with the standard sample image, and optimizing parameters of the diffusion network and the inverse diffusion network in combination with a pairing loss function.
8. The apparatus of claim 6, wherein inputting the standard sample image into the diffusion network in the stable diffusion model to obtain the noisy sample image features comprises:
inputting the standard sample image into an encoder in the diffusion network to obtain standard image features; and adding noise features to the standard image features a preset number of times to obtain the noisy sample image features.
9. An electronic device, comprising:
At least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-5.
10. A non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the method of any one of claims 1-5.
CN202310618973.0A 2023-05-29 2023-05-29 Image processing method, device, equipment and storage medium Active CN116630634B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310618973.0A CN116630634B (en) 2023-05-29 2023-05-29 Image processing method, device, equipment and storage medium


Publications (2)

Publication Number Publication Date
CN116630634A CN116630634A (en) 2023-08-22
CN116630634B (en) 2024-01-30

Family

ID=87613050

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310618973.0A Active CN116630634B (en) 2023-05-29 2023-05-29 Image processing method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116630634B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118013206A (en) * 2024-04-09 2024-05-10 深圳市曼希尔科技有限公司 Ultrasonic data noise reduction method and system and electronic equipment

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107341474A (en) * 2017-07-06 2017-11-10 Huaihai Institute of Technology Unsupervised side-scan sonar image target detection method based on diffusion mapping
WO2021108382A1 (en) * 2019-11-26 2021-06-03 University Of Cincinnati Characterizing intra-site tumor heterogeneity
WO2023041167A1 (en) * 2021-09-16 2023-03-23 Elekta Ab (Publ) Generative model of phase space
CN113792736A (en) * 2021-10-22 2021-12-14 上海联影智能医疗科技有限公司 Medical image processing method, apparatus and medium based on deep learning
CN115222583A (en) * 2022-07-29 2022-10-21 平安科技(深圳)有限公司 Model training method and device, image processing method, electronic device and medium
CN115641834A (en) * 2022-09-09 2023-01-24 平安科技(深圳)有限公司 Voice synthesis method and device, electronic equipment and storage medium
CN116029968A (en) * 2022-09-09 2023-04-28 平安科技(深圳)有限公司 Monkey pox infection skin image detection method and device, electronic equipment and storage medium
CN115908187A (en) * 2022-12-07 2023-04-04 北京航空航天大学 Image characteristic analysis and generation method based on rapid denoising diffusion probability model
CN116110099A (en) * 2023-01-19 2023-05-12 北京百度网讯科技有限公司 Head portrait generating method and head portrait replacing method
CN116168269A (en) * 2023-01-19 2023-05-26 支付宝(杭州)信息技术有限公司 Training method and system for palm print image generation model, and palm print image generation method and system

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Mathews Jacob et al. Model-Based Deep Learning for Reconstruction of Joint k-q Under-sampled High Resolution Diffusion MRI. 2020 IEEE 17th International Symposium on Biomedical Imaging (ISBI). 2020, pp. 913-916. *
Zheng Zhongxing et al. Research on a foreign object intrusion detection algorithm for metro tracks based on a fast diffusion generative model. Railway Standard Design. 2023, vol. 68, no. 6, pp. 1-10. *

Similar Documents

Publication Publication Date Title
Chen et al. Low-dose CT with a residual encoder-decoder convolutional neural network
WO2021077997A9 (en) Multi-generator generative adversarial network learning method for image denoising
CN109409503B (en) Neural network training method, image conversion method, device, equipment and medium
Ye et al. Deep residual learning for model-based iterative ct reconstruction using plug-and-play framework
US10867375B2 (en) Forecasting images for image processing
CN110956632B (en) Method and device for automatically detecting pectoralis major region in molybdenum target image
US20170294014A1 (en) Image processing used to estimate abnormalities
CN116630634B (en) Image processing method, device, equipment and storage medium
CN111815766A (en) Processing method and system for reconstructing blood vessel three-dimensional model based on 2D-DSA image
Wang et al. Low-dose CT denoising using a progressive wasserstein generative adversarial network
Zhang et al. A novel denoising method for low-dose CT images based on transformer and CNN
Huang et al. A stability-enhanced CycleGAN for effective domain transformation of unpaired ultrasound images
Liu et al. DL‐MRI: A Unified Framework of Deep Learning‐Based MRI Super Resolution
WO2024051018A1 (en) Pet parameter image enhancement method and apparatus, device, and storage medium
Yang et al. Quasi-supervised learning for super-resolution PET
Beevi et al. Denoising transthoracic echocardiographic images in regional wall motion abnormality using deep learning techniques
EP4343680A1 (en) De-noising data
CN114419375B (en) Image classification method, training device, electronic equipment and storage medium
Yang et al. X‐Ray Breast Images Denoising Method Based on the Convolutional Autoencoder
Zhang et al. Multi-scale network with the deeper and wider residual block for MRI motion artifact correction
Guo et al. Thyroid nodule ultrasonic imaging segmentation based on a deep learning model and data augmentation
Mahmoud et al. Variant Wasserstein Generative Adversarial Network Applied on Low Dose CT Image Denoising.
Khvostikov et al. Influence of ultrasound despeckling on the liver fibrosis classification
Mehr et al. Deep Learning-Based Ultrasound Image Despeckling by Noise Model Estimation.
CN116363263B (en) Image editing method, system, electronic device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: Room 3011, 2nd Floor, Building A, No. 1092 Jiangnan Road, Nanmingshan Street, Liandu District, Lishui City, Zhejiang Province, 323000

Applicant after: Zhejiang Yizhun Intelligent Technology Co.,Ltd.

Address before: No. 301, 3rd Floor, Zhizhen Building, No. 7 Zhichun Road, Haidian District, Beijing, 100083

Applicant before: Beijing Yizhun Intelligent Technology Co.,Ltd.

GR01 Patent grant