WO2023202283A1 - Training method for image generation model, image generation method, apparatus and device - Google Patents

Training method for image generation model, image generation method, apparatus and device

Info

Publication number
WO2023202283A1
WO2023202283A1 (PCT/CN2023/081897)
Authority
WO
WIPO (PCT)
Prior art keywords
image, sample, image generation, sample image, images
Application number
PCT/CN2023/081897
Other languages
English (en), French (fr)
Inventors
黄雅雯, 郑冶枫, 周鹤翔, 周毅
Original Assignee
Tencent Technology (Shenzhen) Company Limited (腾讯科技(深圳)有限公司)
Application filed by Tencent Technology (Shenzhen) Company Limited (腾讯科技(深圳)有限公司)
Publication of WO2023202283A1


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 Geometric image transformations in the plane of the image
    • G06T3/40 Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4053 Scaling based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
    • G06T3/4046 Scaling using neural networks
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Definitions

  • the present application relates to the field of image processing technology, and in particular to an image generation model training method, image generation method, device and equipment.
  • High-resolution three-dimensional medical images display detailed information more clearly, contribute to medical diagnosis and analysis, and play an important role in the medical field. Because high-resolution three-dimensional images are costly to image and require long acquisition times, they are difficult to obtain directly. Currently, high-resolution three-dimensional images are usually obtained indirectly, by generating them from low-resolution three-dimensional images.
  • Embodiments of the present application provide an image generation model training method, image generation method, device and equipment.
  • A method for training an image generation model includes: training a correction layer in the image generation model based on a plurality of first sample images and a plurality of second sample images, where the correction layer is used to convert the image type of the input sample image and the resolution of the first sample images is higher than that of the second sample images; processing the plurality of first sample images based on the trained correction layer to obtain a plurality of third sample images; and training an image generation layer in the image generation model based on the plurality of first sample images and the plurality of third sample images, where the image generation layer is used to increase the resolution of the input sample image.
  • An image generation method includes: processing a first image through the correction layer in the image generation model to obtain a second image, where the correction layer is used to convert the image type of the input image and the second image belongs to the target image type; and processing the second image through the image generation layer in the image generation model to obtain a third image, where the image generation layer is used to increase the resolution of the input image.
  • A training device for an image generation model includes: a first training module for training a correction layer in the image generation model based on a plurality of first sample images and a plurality of second sample images, where the correction layer is used to convert the image type of the input sample image and the resolution of the first sample images is higher than that of the second sample images; and a second training module for processing the plurality of first sample images based on the trained correction layer to obtain a plurality of third sample images, and for training an image generation layer in the image generation model based on the plurality of first sample images and the plurality of third sample images, where the image generation layer is used to increase the resolution of the input sample image.
  • An image generation device includes: a correction module for processing a first image through the correction layer in the image generation model to obtain a second image, where the correction layer is used to convert the image type of the input image and the second image belongs to the target image type; and an image generation module for processing the second image through the image generation layer in the image generation model to obtain a third image, where the image generation layer is used to increase the resolution of the input image.
  • A computer device includes a processor and a memory; the memory is used to store at least one computer program, and the at least one computer program is loaded and executed by the processor to implement the training method for the image generation model or the image generation method provided in the embodiments of the present application.
  • A computer-readable storage medium stores at least one computer program; the at least one computer program is loaded and executed by a processor to implement the training method for the image generation model or the image generation method provided in the embodiments of the present application.
  • A computer program product includes a computer program that, when executed by a processor, implements the training method for the image generation model or the image generation method provided in each of the above aspects or their optional implementations.
  • Figure 1 is a schematic diagram of the implementation environment of an image generation model training method provided according to an embodiment of the present application
  • Figure 2 is a flow chart of a training method for an image generation model provided according to an embodiment of the present application
  • Figure 3 is a flow chart of another training method for an image generation model provided according to an embodiment of the present application.
  • Figure 4 is a schematic structural diagram of an image generation model provided according to an embodiment of the present application.
  • Figure 5 is an effect display diagram provided according to an embodiment of the present application.
  • Figure 6 is an experimental effect diagram provided according to an embodiment of the present application.
  • Figure 7 is a flow chart of an image generation method provided according to an embodiment of the present application.
  • Figure 8 is a schematic diagram of an image generation process provided according to an embodiment of the present application.
  • Figure 9 is a block diagram of a training device for an image generation model provided according to an embodiment of the present application.
  • Figure 10 is a block diagram of another training device for an image generation model provided according to an embodiment of the present application.
  • Figure 11 is a block diagram of an image generation device provided according to an embodiment of the present application.
  • Figure 12 is a structural block diagram of a terminal provided according to an embodiment of the present application.
  • Figure 13 is a schematic structural diagram of a server provided according to an embodiment of the present application.
  • The information (including but not limited to user equipment information and user personal information), data (including but not limited to data used for analysis, stored data and displayed data) and signals involved in this application are all authorized by the user or fully authorized by all parties, and the collection, use and processing of the relevant data comply with the relevant laws, regulations and standards of the relevant countries and regions.
  • the sample images involved in this application were obtained with full authorization.
  • In the related art, the method of generating high-resolution three-dimensional images from low-resolution three-dimensional images relies on manually matched training data sets and test data sets. Due to the lack of such manually matched data sets, the efficiency of generating high-resolution three-dimensional images is low.
  • Data domain is also called value domain.
  • A domain is a collection of technical attributes used to describe a field, including data type, data length, number of decimal places, and value range. Fields with the same definitions of the above technical attributes can be grouped into the same domain, and when the attribute definition of a domain changes, the attributes of all fields that reference it change accordingly.
  • Voxel is the abbreviation of volume element (Volume Pixel).
  • As the name suggests, a voxel is the smallest unit of digital data in three-dimensional space segmentation; it is conceptually similar to the pixel, the smallest unit of two-dimensional space, which is used in the image data of two-dimensional computer images. A solid containing voxels can be displayed by stereo rendering or by extracting a polygonal isosurface at a given threshold contour. Voxels are used in fields such as three-dimensional imaging, scientific data, and medical imaging.
  • Spatial resolution refers to the size of the ground range represented by a pixel, that is, the instantaneous field of view of the scanner, or the smallest unit at which ground objects can be resolved. In other words, spatial resolution is the minimum distance between two adjacent features that can be distinguished on a remote sensing image. For photographic images, it is usually expressed as the number of distinguishable black-and-white "line pairs" per unit length (line pairs/mm); for scanned images, the instantaneous field of view is usually used. Spatial resolution is one of the important indicators for evaluating sensor performance and remote sensing information, and an important basis for identifying the shape and size of ground objects. Intuitively, it is the critical geometric size of an object that can be identified by an instrument.
  • A Generative Adversarial Network (GAN) is a deep learning model and one of the most promising methods for unsupervised learning on complex distributions in recent years. The model learns through the mutual game of (at least) two modules in the framework, the generative model and the discriminative model, to produce good output.
  • the embodiment of the present application provides a training solution for an image generation model, which is executed by a computer device, and the computer device is provided as a terminal or a server.
  • the following uses a computer device as a server as an example to introduce the implementation environment of the solution provided by the embodiment of the present application.
  • Figure 1 is a schematic diagram of the implementation environment of an image generation model training method provided by the embodiment of the present application. Referring to Figure 1, the implementation environment includes a terminal 101 and a server 102.
  • the terminal 101 and the server 102 can be connected directly or indirectly through wired or wireless communication methods, which is not limited in this application.
  • the terminal 101 is a smartphone, a tablet computer, a notebook computer, a desktop computer, a smart speaker, a smart watch, a smart voice interaction device, a smart home appliance, a vehicle-mounted terminal, etc., but is not limited thereto.
  • the terminal 101 is installed and runs an application program for image processing, and the image is a two-dimensional image or a three-dimensional image.
  • For example, the application program sends the image to be processed to the server 102 and displays the image returned by the server 102; or, the application program obtains a trained image generation model from the server 102 and processes the image to be processed based on that image generation model.
  • the number of terminals 101 can be more or less. For example, there may be one terminal 101, or there may be dozens, hundreds, or more terminals. The embodiment of the present application does not limit the number and device types of terminals 101.
  • The server 102 is an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communications, middleware services, domain name services, security services, CDN (Content Delivery Network), and big data and artificial intelligence platforms.
  • the server 102 is used to provide background services for the above-mentioned application programs for image processing.
  • In some embodiments, the server 102 undertakes the main image processing work and the terminal 101 undertakes the secondary image processing work; or the server 102 undertakes the secondary image processing work and the terminal 101 undertakes the main image processing work; or the server 102 and the terminal 101 use a distributed computing architecture for collaborative image processing.
  • For example, the server 102 obtains multiple authorized three-dimensional medical images from multiple terminals 101, where the multiple three-dimensional medical images include three-dimensional medical images with a first resolution and three-dimensional medical images with a second resolution, and the first resolution is higher than the second resolution. The server 102 then uses the acquired three-dimensional medical images as sample images and trains the image generation model based on the image generation model training method provided in the embodiments of the present application. The server 102 can process a three-dimensional medical image of the second resolution uploaded by the terminal 101 based on the image generation model to generate a three-dimensional medical image of the first resolution. The server 102 then returns the generated three-dimensional medical image to the terminal 101, and the terminal 101 displays it.
  • the computer device can be configured as a terminal or a server.
  • Figure 2 is a flow chart of a training method for an image generation model provided according to an embodiment of the present application. As shown in Figure 2, in the embodiment of the present application, execution by a server is used as an example for explanation. The method includes the following steps:
  • The server trains the correction layer in the image generation model based on the plurality of first sample images and the plurality of second sample images; the correction layer is used to convert the image type of the input sample image, and the resolution of the first sample images is higher than that of the second sample images.
  • the server is the server 102 shown in FIG. 1
  • the first sample image and the second sample image are two-dimensional images or three-dimensional images.
  • The resolution of the first sample image is higher than the resolution of the second sample image; that is, the first sample image is a sample image with a first resolution and the second sample image is a sample image with a second resolution, where the first resolution is higher than the second resolution. The embodiments of the present application do not limit the specific values of the first resolution and the second resolution.
  • The image types of the first sample image and the second sample image are different, and the server can train the correction layer based on sample images of different resolutions and different image types, so that the trained correction layer can convert an image from the image type of the second sample image to the image type of the first sample image.
  • In some embodiments, the first sample images and the second sample images appear in pairs; that is, the server trains the correction layer in the image generation model based on multiple sample image pairs, where each sample image pair includes a first sample image and a second sample image of different image types, and the resolution of the first sample image is higher than that of the second sample image. The multiple first sample images and multiple second sample images thus form multiple sample image pairs. The two sample images in each pair may have the same or different image content, but the pairing relationship of each pair needs to be maintained: the two sample images in a pair are input into the image generation model at the same time, that is, the first sample image and the second sample image in each pair have an association relationship indicating that they are input into the image generation model at the same time.
  • the server processes the plurality of first sample images based on the trained correction layer to obtain a plurality of third sample images.
  • In some embodiments, the server processes each first sample image based on the trained correction layer to obtain a third sample image associated with that first sample image; the association relationship between the third sample image and the first sample image indicates that the third sample image is generated from the first sample image. The resolution of the third sample image is the second resolution, that is, lower than the resolution of the first sample image, and the third sample image belongs to the same image type as the first sample image.
  • the server trains an image generation layer in the image generation model based on the plurality of first sample images and the plurality of third sample images.
  • the image generation layer is used to improve the resolution of the input sample image.
  • the server processes the third sample image based on the image generation layer to improve the resolution of the third sample image.
  • the server uses the first sample image associated with the third sample image as supervisory information to train the image generation layer to improve the accuracy of the image generated by the image generation layer.
  • This application provides a training solution for an image generation model. By training the correction layer based on the first sample images and the second sample images, the correction layer can convert images to a preset image type. Since each third sample image is obtained by processing a first sample image with the correction layer, the higher-resolution first sample image associated with each third sample image can be used as supervision information, and each third sample image can be used as an input image, to train the image generation layer and improve its image reconstruction ability. Since the image generation model can convert the image type of a sample image, model training can be completed without training and test data sets of the same image type, which overcomes the lack of training data of the same image type, reduces the training cost of the image generation model, and improves the efficiency of generating higher-resolution images.
  • FIG. 3 is a flow chart of another training method for an image generation model provided according to an embodiment of the present application.
  • a computer device is used as a server as an example for explanation. The method includes the following steps:
  • the server obtains multiple first sample images and multiple second sample images, and the resolution of the first sample images is higher than that of the second sample images.
  • the server obtains a three-dimensional image set, which includes multiple three-dimensional images.
  • The server pre-processes the multiple three-dimensional images in the three-dimensional image set to obtain the multiple first sample images and the multiple second sample images, ensuring that the resolution of the first sample images is higher than that of the second sample images and that the parameters of the obtained sample images meet the input conditions of the model.
  • For example, the three-dimensional image set is the HCP (Human Connectome Project) image set, which contains more than 1,000 brain structure samples. The set includes both high-resolution images with a spatial resolution of 0.7 mm and a voxel size of 320*320*256, and low-resolution images with a spatial resolution of 2.0 mm and a voxel size of 113*136*113. The server randomly segments each high-resolution image to obtain an image with 80*80*80 voxels as a first sample image, and randomly segments each low-resolution image to obtain an image with 40*40*40 voxels as a second sample image.
  • In some embodiments, before the server inputs the first sample images and the second sample images into the model, it can also randomly flip (rotate) each sample image around one of its spatial axes by 90°, 180° or 270°; the embodiments of the present application do not limit this.
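  • For illustration, the following is a minimal sketch of the cropping and flipping described above, assuming the volumes are stored as PyTorch tensors of shape (D, H, W); the helper names random_crop_3d and random_rotate_3d are illustrative and not taken from the patent.

```python
import torch

def random_crop_3d(volume: torch.Tensor, size: int) -> torch.Tensor:
    """Randomly cut a cubic sub-volume, e.g. 80**3 from a high-resolution
    image or 40**3 from a low-resolution image."""
    d, h, w = volume.shape
    zs = torch.randint(0, d - size + 1, (1,)).item()
    ys = torch.randint(0, h - size + 1, (1,)).item()
    xs = torch.randint(0, w - size + 1, (1,)).item()
    return volume[zs:zs + size, ys:ys + size, xs:xs + size]

def random_rotate_3d(volume: torch.Tensor) -> torch.Tensor:
    """Rotate by 90, 180 or 270 degrees in a randomly chosen spatial plane."""
    k = torch.randint(1, 4, (1,)).item()             # number of 90-degree turns
    planes = [(0, 1), (1, 2), (0, 2)]
    dims = planes[torch.randint(0, 3, (1,)).item()]  # plane of rotation
    return torch.rot90(volume, k, dims)

# Usage: crop an 80**3 first-sample patch and augment it.
hr_volume = torch.randn(320, 320, 256)               # stand-in for an HCP volume
patch = random_rotate_3d(random_crop_3d(hr_volume, 80))
```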
  • The server uses an iterative approach to train the correction layer in the image generation model based on the multiple first sample images and the multiple second sample images; the correction layer is used to convert the image type of the input image. In some embodiments, the structure of the correction layer is a generative adversarial network; see steps 302 to 305 below.
  • In some embodiments, the learning rate is initialized to 10^-4 and reduced to one third of its previous value every 100 epochs. The server trains the correction layer for more than 300 epochs with a mini-batch size of 4.
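  • A sketch of this schedule follows: the learning rate starts at 10^-4 and is cut to one third of its previous value every 100 epochs. Using Adam for the correction layer is an assumption here; the patent only names Adam for the image generation layer.

```python
import torch

model = torch.nn.Conv3d(1, 1, 3, padding=1)  # placeholder for the correction layer
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=100, gamma=1 / 3)

for epoch in range(300):
    # ... iterate over mini-batches of size 4, compute the loss,
    # call loss.backward() and optimizer.step() ...
    scheduler.step()  # lr becomes ~3.3e-5 after epoch 100, ~1.1e-5 after epoch 200
```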
  • As described above, the first sample images and the second sample images appear in pairs; that is, the server trains the correction layer in the image generation model based on multiple sample image pairs, where each sample image pair includes a first sample image and a second sample image of different image types and the resolution of the first sample image is higher than that of the second sample image, so that the multiple first sample images and multiple second sample images form multiple sample image pairs. The two sample images in each pair may have the same or different image content, but the pairing relationship of each pair needs to be maintained: the first sample image and the second sample image in a pair have an association relationship indicating that they are input into the image generation model at the same time.
  • the server processes multiple first sample images respectively to obtain multiple fourth sample images.
  • the resolution of the fourth sample images is lower than that of the first sample images.
  • In some embodiments, the server reduces the resolution of each first sample image to obtain a fourth sample image associated with that first sample image; the association relationship between the first sample image and the fourth sample image indicates that the fourth sample image is generated by transforming the first sample image.
  • the server first adds noise to the sample image and then reduces the resolution of the sample image.
  • In some embodiments, the server adds Gaussian noise to the first sample image, and then reduces the resolution of the noisy first sample image by image interpolation, downsampling or other methods to obtain the fourth sample image associated with the first sample image; the embodiments of this application do not limit the specific method.
  • the image generation model trained based on the sample image can be applied to situations where the image includes noise, thereby improving the accuracy of the image generation model.
  • the above-mentioned processing process of the first sample image can be expressed by the following formula (1).
  • y_LR = D(y_HR + n)    (1)
  • where y_LR represents the fourth sample image, D() represents the interpolation (downsampling) function, y_HR represents the first sample image, and n represents Gaussian noise.
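  • A minimal sketch of formula (1) follows, assuming trilinear interpolation as D() and an illustrative noise level sigma; the patent permits interpolation, downsampling or other methods.

```python
import torch
import torch.nn.functional as F

def degrade(y_hr: torch.Tensor, scale: int = 2, sigma: float = 0.01) -> torch.Tensor:
    """y_LR = D(y_HR + n): add Gaussian noise, then downsample by interpolation."""
    noisy = y_hr + sigma * torch.randn_like(y_hr)        # y_HR + n
    return F.interpolate(noisy, scale_factor=1 / scale,  # D(...)
                         mode="trilinear", align_corners=False)

y_hr = torch.randn(1, 1, 80, 80, 80)   # (batch, channel, D, H, W)
y_lr = degrade(y_hr)                   # -> (1, 1, 40, 40, 40), the fourth sample image
```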
  • In some embodiments, this step 302 is implemented by the server based on the correction layer of the image generation model, that is, the server inputs each first sample image into the correction layer and the correction layer processes it; or this step 302 is implemented by the server based on an image processing layer, that is, the server inputs the first sample image into the image processing layer and the image processing layer processes it.
  • Based on the correction layer in the image generation model, the server processes the plurality of fourth sample images respectively to obtain a plurality of first intermediate images, and the plurality of first intermediate images belong to the first image type; the fourth sample images belong to the second image type. In some embodiments, the server converts the image type of the fourth sample image through the correction layer to obtain a first intermediate image associated with the fourth sample image. In this way, the correction layer can be trained based on the accuracy of the conversion, so that the correction layer can accurately convert images to the first image type.
  • The image type to which an image belongs can also be called the data domain to which the image belongs; in different embodiments, the data domains to which images belong differ. For example, CT (Computed Tomography) images and MRI (Magnetic Resonance Imaging) images belong to different data domains. The noise and blur kernel distributions of the images in the same data domain are the same or similar.
  • In the embodiments of the present application, the high-resolution first sample image belongs to the second image type, and the low-resolution second sample image belongs to the first image type. The image type of the fourth sample image obtained by resolution degradation of the first sample image does not change, so the low-resolution fourth sample image also belongs to the second image type. The correction layer is used to convert the fourth sample image of the second image type into a first intermediate image of the first image type; that is, the low-resolution fourth sample image is corrected from the high-resolution data domain to which the first sample image belongs to the low-resolution data domain to which the second sample image belongs.
  • In some embodiments, the correction layer in the image generation model includes a first generator for converting an image from the second image type to the first image type, that is, from one data domain to another data domain. The server can process the fourth sample image based on the first generator: the server convolves the fourth sample image through the first generator to obtain the first intermediate image associated with the fourth sample image.
  • In some embodiments, the backbone network of the first generator is a three-dimensional convolutional neural network; the backbone network connects multiple dense blocks through residual connections, and each dense block includes multiple three-dimensional convolutional layers. For example, the first generator consists of three dense blocks, each including four three-dimensional convolutional layers. In some embodiments, the three-dimensional convolution kernels of the three-dimensional convolutional layers are replaced with a lightweight convolution module, the queue module, in order to reduce the number of parameters of the first generator while maintaining the original performance, thereby reducing the training difficulty of the correction layer and the computing resources consumed when using the correction layer.
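  • An illustrative sketch of such a generator backbone follows: three dense blocks of four 3D convolutions each, joined by residual connections. The lightweight queue module is not publicly specified, so plain Conv3d layers stand in for it, and the channel widths are assumptions.

```python
import torch
import torch.nn as nn

class DenseBlock3D(nn.Module):
    """Four 3D conv layers with dense (concatenated) connectivity."""
    def __init__(self, channels: int = 32, growth: int = 16, layers: int = 4):
        super().__init__()
        self.convs = nn.ModuleList()
        c = channels
        for _ in range(layers):
            self.convs.append(nn.Sequential(
                nn.Conv3d(c, growth, 3, padding=1), nn.LeakyReLU(0.2, inplace=True)))
            c += growth
        self.fuse = nn.Conv3d(c, channels, 1)   # back to the block's input width

    def forward(self, x):
        feats = [x]
        for conv in self.convs:
            feats.append(conv(torch.cat(feats, dim=1)))
        return self.fuse(torch.cat(feats, dim=1))

class Generator3D(nn.Module):
    """Three dense blocks connected by residual (skip) additions."""
    def __init__(self, channels: int = 32):
        super().__init__()
        self.head = nn.Conv3d(1, channels, 3, padding=1)
        self.blocks = nn.ModuleList(DenseBlock3D(channels) for _ in range(3))
        self.tail = nn.Conv3d(channels, 1, 3, padding=1)

    def forward(self, x):
        h = self.head(x)
        for block in self.blocks:
            h = h + block(h)                    # residual connection per block
        return self.tail(h)

g_x = Generator3D()                             # e.g. the first generator G_X()
out = g_x(torch.randn(1, 1, 40, 40, 40))        # domain shift, spatial size unchanged
```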
  • the above-mentioned first intermediate image can be expressed as G X (y LR ), where y LR represents the fourth sample image and G X () represents the first generator.
  • It should be noted that the plurality of first intermediate images in the embodiments of the present application are intermediate images in the training process; for convenience of description, they are considered to belong to the first image type. In the early stage of training, the correction layer cannot yet accurately convert the fourth sample image to the first image type, and the first intermediate image obtained at this stage does not strictly belong to the first image type; after training, the correction layer can accurately convert the fourth sample image to the first image type, and the first intermediate image obtained then belongs to the first image type.
  • Based on the correction layer in the image generation model, the server processes the plurality of second sample images respectively to obtain a plurality of second intermediate images, and the plurality of second intermediate images belong to the second image type; the second sample images belong to the first image type. In some embodiments, the server converts the image type of the second sample image through the correction layer to obtain a second intermediate image associated with the second sample image. In this way, the correction layer can be trained based on the accuracy of the conversion, so that the correction layer can accurately convert images to the second image type.
  • For example, each sample image pair includes a high-resolution first sample image, which belongs to the second image type (the high-resolution data domain), and a low-resolution second sample image, which belongs to the first image type (the low-resolution data domain). First, image degradation is performed on the high-resolution first sample image of the pair to obtain a low-resolution fourth sample image; the image type of the fourth sample image does not change, so it also belongs to the second image type, that is, the high-resolution data domain. Then, a domain shift is performed on the fourth sample image, converting it from the second image type to the first image type to obtain a first intermediate image; the first intermediate image is thus not only a low-resolution image but is also corrected to the low-resolution data domain. Similarly, a domain shift is performed on the low-resolution second sample image in the pair, converting it from the first image type to the second image type to obtain a second intermediate image; the second intermediate image, although a low-resolution image, is corrected to the high-resolution data domain.
  • In some embodiments, the correction layer in the image generation model includes a second generator for converting images from the first image type to the second image type. The server can process the second sample image through the second generator: the server convolves the second sample image through the second generator to obtain the second intermediate image associated with the second sample image.
  • In some embodiments, the backbone network of the second generator is a three-dimensional convolutional neural network; the backbone network connects multiple dense blocks through residual connections, and each dense block includes multiple three-dimensional convolutional layers. For example, the second generator consists of three dense blocks, each including four three-dimensional convolutional layers. In some embodiments, the three-dimensional convolution kernels are likewise replaced with the lightweight queue module to reduce the number of parameters of the second generator while maintaining the original performance, thereby reducing the training difficulty of the correction layer and the computing resources consumed when using the correction layer.
  • the above-mentioned second intermediate image can be expressed as G Y (x LR ), where x LR represents the second sample image and G Y () represents the second generator.
  • It should be noted that the plurality of second intermediate images in the embodiments of the present application are intermediate images in the training process; for convenience of description, they are considered to belong to the second image type. In the early stage of training, the correction layer cannot yet accurately convert the second sample image to the second image type, and the second intermediate image obtained at this stage does not strictly belong to the second image type; after training, the correction layer can accurately convert the second sample image to the second image type, and the second intermediate image obtained then belongs to the second image type.
  • the server trains the correction layer in the image generation model based on the plurality of first intermediate images, the plurality of second sample images, the plurality of fourth sample images, and the plurality of second intermediate images.
  • the training goal of the correction layer is to make the first intermediate image and the second sample image belong to the same image type, and the second intermediate image and the fourth sample image belong to the same image type.
  • The server can train the correction layer through the differences between the plurality of first intermediate images and the plurality of second sample images, and the differences between the plurality of fourth sample images and the plurality of second intermediate images, so that the correction layer can accurately convert images to other image types.
  • For example, training is organized in units of sample image pairs {first sample image, second sample image}, where the first sample image is a high-resolution image belonging to the high-resolution data domain (the second image type) and the second sample image is a low-resolution image belonging to the low-resolution data domain (the first image type). After performing the above steps on each sample image pair, four low-resolution images associated with the pair can be obtained: the first intermediate image, the second sample image, the fourth sample image and the second intermediate image. Among them, the fourth sample image is a low-resolution image obtained by direct image degradation of the first sample image and belongs to the high-resolution data domain; the first intermediate image is a low-resolution image obtained by domain-shifting the fourth sample image through the correction layer and belongs to the low-resolution data domain; the second intermediate image is a low-resolution image obtained by domain-shifting the second sample image through the correction layer and belongs to the high-resolution data domain. The training goal of the correction layer is therefore to train it, through the first intermediate image, the second sample image, the fourth sample image and the second intermediate image associated with each sample image pair, so as to minimize the difference between the two low-resolution images {second sample image, first intermediate image} belonging to the low-resolution data domain (both of the first image type), and to minimize the difference between the two low-resolution images {fourth sample image, second intermediate image} belonging to the high-resolution data domain (both of the second image type).
  • In some embodiments, the correction layer in the image generation model includes a first discriminator and a second discriminator, both of which are used to determine the similarity between images. For the first intermediate image and the second sample image, associated discriminant information can be obtained through the first discriminator; for the second intermediate image and the fourth sample image, associated discriminant information can likewise be obtained through the second discriminator. The two pieces of discriminant information reflect the similarity between the image produced by a generator and a real image; based on them, the adversarial losses of the generators and the discriminators are determined.
  • the server can process a plurality of first intermediate images and a plurality of second sample images through the first discriminator, and process a plurality of second intermediate images and a plurality of fourth sample images through the second discriminator.
  • the server can use the first discriminator to distinguish the first intermediate image and the second sample image associated with the first intermediate image, and obtain the first intermediate image and the second sample image. similarity between images.
  • the server can use the second discriminator to distinguish the second intermediate image and the fourth sample image associated with the second intermediate image, and obtain the similarity between the second intermediate image and the fourth sample image.
  • In this way, the server can train the correction layer in the image generation model through the similarities between the plurality of first intermediate images and the plurality of second sample images, and between the plurality of second intermediate images and the plurality of fourth sample images.
  • In some embodiments, the first discriminator and the second discriminator are constructed as PatchGAN discriminators (discriminators based on a fully convolutional network).
  • For example, the second discriminator discriminates the difference between the fourth sample image belonging to the second image type and the second intermediate image converted to the second image type, and the parameters of the correction layer are adjusted based on the discrimination result, thereby improving the accuracy with which the correction layer converts the image type of an image.
  • As described above, the correction layer is trained to minimize the difference between the two low-resolution images {second sample image, first intermediate image} belonging to the low-resolution data domain (both of the first image type), and to minimize the difference between the two low-resolution images {fourth sample image, second intermediate image} belonging to the high-resolution data domain (both of the second image type). Therefore, two discriminators are configured in the correction layer: the first discriminator is used to determine the similarity between the {second sample image, first intermediate image} belonging to the first image type, and the second discriminator is used to determine the similarity between the {fourth sample image, second intermediate image} belonging to the second image type. For each sample image pair, two similarities are thus calculated: the similarity between the two low-resolution images in the low-resolution data domain, from the first discriminator, and the similarity between the two low-resolution images in the high-resolution data domain, from the second discriminator. Based on the multiple similarities output by the first discriminator for the multiple sample image pairs, and the multiple similarities output by the second discriminator for the multiple sample image pairs, the correction layer in the image generation model can be trained.
  • In some embodiments, the first discriminator and the second discriminator have the same structure. For example, the first discriminator includes six convolutional layers, all with a convolution stride of 1, and each convolutional layer is followed by a LeakyReLU (activation function) layer. Except for the first and last layers, the convolutional layers are replaced by the lightweight convolution module to reduce the number of parameters of the first discriminator while maintaining the original performance, thereby reducing the training difficulty of the correction layer and the computing resources consumed when using it (see the description of the generators above, which is not repeated here). It should be noted that the LeakyReLU layers are not followed by normalization layers.
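  • A sketch of such a PatchGAN-style discriminator follows: six 3D convolutions, all with stride 1, each intermediate one followed by LeakyReLU and no normalization layer; the channel widths are assumptions.

```python
import torch
import torch.nn as nn

class PatchDiscriminator3D(nn.Module):
    def __init__(self, widths=(1, 32, 64, 64, 64, 32)):
        super().__init__()
        layers = []
        for cin, cout in zip(widths[:-1], widths[1:]):
            layers += [nn.Conv3d(cin, cout, 3, stride=1, padding=1),
                       nn.LeakyReLU(0.2, inplace=True)]   # no normalization layer
        layers.append(nn.Conv3d(widths[-1], 1, 3, stride=1, padding=1))  # 6th conv
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)   # a score map: one realness score per local patch

d_x = PatchDiscriminator3D()                    # e.g. the first discriminator D_X()
scores = d_x(torch.randn(1, 1, 40, 40, 40))
```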
  • In some embodiments, the adversarial loss is obtained by the following formula (2):
  • L_adv(G_Y, D_Y, X_LR, Y_LR) = E_{y∼P(Y_LR)}[D_Y(y)] − E_{x∼P(X_LR)}[D_Y(G_Y(x))] − λ·E_{x̂∼P(penalty)}[(‖∇_{x̂}D_Y(x̂)‖₂ − 1)²]    (2)
  • where L_adv(G_Y, D_Y, X_LR, Y_LR) represents the adversarial loss of the second generator and the second discriminator; E_{x∼P(X_LR)} indicates that x obeys the data distribution P(X_LR) of x_LR in the image type X_LR; E_{y∼P(Y_LR)} indicates that y obeys the data distribution P(Y_LR) of y_LR in the image type Y_LR; λ represents a weight parameter; E_{x̂∼P(penalty)} indicates that x̂ obeys the gradient penalty distribution P(penalty) of x_LR; ∇ represents the gradient; and D_Y() represents the second discriminator.
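  • The gradient penalty term above can be sketched as follows; sampling x̂ uniformly on segments between real and generated volumes is the standard construction of the penalty distribution, which the patent does not spell out.

```python
import torch

def gradient_penalty(disc, real, fake, lam=10.0):
    """lambda * E[(||grad D(x_hat)||_2 - 1)^2] over interpolates x_hat."""
    eps = torch.rand(real.size(0), 1, 1, 1, 1, device=real.device)
    x_hat = (eps * real + (1 - eps) * fake).requires_grad_(True)
    scores = disc(x_hat)
    grads, = torch.autograd.grad(scores.sum(), x_hat, create_graph=True)
    norm = grads.flatten(1).norm(2, dim=1)
    return lam * ((norm - 1) ** 2).mean()
```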
  • Here, the image type X_LR represents the first image type, that is, the low-resolution data domain, and the image type Y_LR represents the second image type, that is, the high-resolution data domain.
  • In some embodiments, the correction layer can convert y_LR to the image type X_LR and then convert it back to the image type Y_LR, and can similarly convert x_LR to the image type Y_LR and then convert it back to the image type X_LR. The cycle consistency loss is obtained by the following formula (3):
  • L_cyc(G_X, G_Y) = E_{y∼P(Y_LR)}[‖G_Y(G_X(y_LR)) − y_LR‖₁] + E_{x∼P(X_LR)}[‖G_X(G_Y(x_LR)) − x_LR‖₁]    (3)
  • where L_cyc(G_X, G_Y) represents the cycle consistency loss, G_X() represents the first generator, G_Y() represents the second generator, and y_LR represents the fourth sample image.
  • the third sample image is an image obtained after inputting the fourth sample image to the first generator and the second generator successively.
  • In some embodiments, the intrinsic (identity) loss is obtained by the following formula (4):
  • L_idt(G_Y) = E_{y∼P(Y_LR)}[‖G_Y(y_LR) − y_LR‖₁]    (4)
  • where L_idt(G_Y) represents the intrinsic loss of G_Y(), G_Y() represents the second generator, and y_LR represents the fourth sample image; the intrinsic loss L_idt(G_X) of the first generator G_X() is defined analogously over the second sample image x_LR.
  • In the embodiments of the present application, the L1 norm is applied in L_cyc(G_X, G_Y) and L_idt(G_Y). The first training loss is obtained by the following formula (5):
  • L_dom = L_adv(G_Y, D_Y, X_LR, Y_LR) + L_adv(G_X, D_X, Y_LR, X_LR) + λ_cyc·L_cyc(G_X, G_Y) + λ_idt·(L_idt(G_X) + L_idt(G_Y))    (5)
  • where L_dom represents the first training loss; L_adv(G_Y, D_Y, X_LR, Y_LR) represents the adversarial loss of the second generator and the second discriminator; L_adv(G_X, D_X, Y_LR, X_LR) represents the adversarial loss of the first generator and the first discriminator; L_cyc(G_X, G_Y) represents the cycle consistency loss; L_idt(G_Y) represents the intrinsic loss of G_Y(); L_idt(G_X) represents the intrinsic loss of G_X(); and λ_cyc and λ_idt represent the weight parameters.
  • In some embodiments, the values of the above weight parameters λ_cyc and λ_idt are both 1.
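  • A sketch of formula (5) follows, combining the two adversarial losses with the cycle consistency and intrinsic losses under weight parameters of 1; g_x and g_y are the two generators and adv_x, adv_y are the precomputed adversarial losses (see the sketches above).

```python
import torch
import torch.nn.functional as F

def first_training_loss(g_x, g_y, x_lr, y_lr, adv_x, adv_y,
                        lam_cyc=1.0, lam_idt=1.0):
    # Cycle consistency, formula (3): convert to the other type and back.
    l_cyc = (F.l1_loss(g_y(g_x(y_lr)), y_lr) +
             F.l1_loss(g_x(g_y(x_lr)), x_lr))
    # Intrinsic (identity) losses, formula (4): a generator applied to an
    # image already of its target type should leave it unchanged.
    l_idt = F.l1_loss(g_y(y_lr), y_lr) + F.l1_loss(g_x(x_lr), x_lr)
    return adv_x + adv_y + lam_cyc * l_cyc + lam_idt * l_idt
```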
  • The server processes the multiple first sample images based on the trained correction layer to obtain multiple third sample images. After the server completes training of the correction layer, it processes the multiple first sample images based on the correction layer to obtain the multiple third sample images required for training the image generation layer.
  • The server uses an iterative method to train the image generation layer in the image generation model based on the multiple third sample images; the image generation layer is used to improve the resolution of the image (see steps 307 and 308 below). In some embodiments, the image generation layer is trained using the Adam optimizer; after the server trains the correction layer, it starts training the image generation layer, using 500 epochs and a mini-batch size of 8.
  • In some embodiments, the above processing of the first sample image can be expressed by formula (1) above and the following formula (6):
  • ŷ_LR = G_Y(G_X(y_LR))    (6)
  • where ŷ_LR denotes the resulting third sample image, G_Y() represents the second generator, G_X() represents the first generator, and y_LR represents the fourth sample image. That is, the resolution of the first sample image is first reduced to obtain the fourth sample image, and the fourth sample image is then input into the first generator and the second generator successively to obtain the third sample image.
  • In this way, the low-resolution fourth sample image is first converted from the second image type to the first image type through the first generator, and then converted from the first image type back to the second image type through the second generator. This ensures that the first sample image, the fourth sample image and the third sample image have the same image type (the second image type) and belong to the same data domain (the high-resolution data domain), while the first sample image is a high-resolution image and the third and fourth sample images are low-resolution images.
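  • Reusing the degrade, g_x and g_y sketches above, the whole chain of formulas (1) and (6) reduces to two lines:

```python
y_lr = degrade(y_hr)   # formula (1): fourth sample image, second image type
y3 = g_y(g_x(y_lr))    # formula (6): third sample image, back in the second image type
```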
  • the server processes the plurality of third sample images through the image generation layer in the image generation model to obtain multiple result images.
  • In some embodiments, the server processes each third sample image through the image generation layer to improve its resolution and obtain the result image associated with that third sample image.
  • In some embodiments, the architecture of the image generation layer is VolumeNet, which implements a deeper and wider network structure with a parallel convolutional layer layout. For example, the image generation layer includes 9 convolutional layers and 1 upsampling layer based on pixel shuffle, with a fully connected layer after the upsampling layer.
  • In this case, the above result image can be expressed as U(ŷ_LR), where ŷ_LR represents the third sample image and U() represents the image generation layer.
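  • An illustrative sketch of such a generation layer follows: nine 3D convolutions, a pixel-shuffle-style upsampling step, and a final 1x1x1 convolution standing in for the fully connected layer. PyTorch's PixelShuffle is 2D only, so a manual 3D voxel shuffle is written out; the widths are assumptions.

```python
import torch
import torch.nn as nn

def voxel_shuffle(x: torch.Tensor, r: int) -> torch.Tensor:
    """Rearrange (N, C*r^3, D, H, W) -> (N, C, D*r, H*r, W*r)."""
    n, c, d, h, w = x.shape
    c_out = c // r ** 3
    x = x.view(n, c_out, r, r, r, d, h, w)
    x = x.permute(0, 1, 5, 2, 6, 3, 7, 4)       # interleave the r factors
    return x.reshape(n, c_out, d * r, h * r, w * r)

class GenerationLayer(nn.Module):
    def __init__(self, channels: int = 32, scale: int = 2):
        super().__init__()
        self.scale = scale
        convs = [nn.Conv3d(1, channels, 3, padding=1), nn.ReLU(inplace=True)]
        for _ in range(7):
            convs += [nn.Conv3d(channels, channels, 3, padding=1), nn.ReLU(inplace=True)]
        convs.append(nn.Conv3d(channels, channels * scale ** 3, 3, padding=1))  # 9th conv
        self.convs = nn.Sequential(*convs)
        self.head = nn.Conv3d(channels, 1, 1)   # 1x1x1 conv as the final dense step

    def forward(self, x):
        return self.head(voxel_shuffle(self.convs(x), self.scale))

u = GenerationLayer()                           # U()
hr = u(torch.randn(1, 1, 40, 40, 40))           # -> (1, 1, 80, 80, 80)
```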
  • The server trains the image generation layer based on the differences between the multiple result images and the multiple associated first sample images. In some embodiments, the server uses the first sample image as supervision information and adjusts the parameters of the image generation layer through the difference between the first sample image and its associated result image. The result image represents the high-resolution reconstruction result of the fourth sample image; by comparing the high-resolution reconstruction result image with the original high-resolution first sample image, iterative training of the image generation layer can be achieved, and the training goal is to minimize the difference between each result image and the first sample image associated with that result image.
  • For example, a batch of high-resolution images with similar noise and blur kernel distributions are used as the multiple first sample images. The first sample image is denoted y_HR, and y_HR ∈ Y_HR indicates that the first sample image y_HR belongs to the image type Y_HR; the second sample image is denoted x_LR, and x_LR ∈ X_LR indicates that the second sample image x_LR belongs to the image type X_LR.
  • FIG. 4 is a schematic structural diagram of an image generation model provided according to an embodiment of the present application.
  • the image generation model includes a correction layer and an image generation layer.
  • The correction layer includes a first generator G_X(), a second generator G_Y(), a first discriminator D_X() and a second discriminator D_Y().
  • First, the server trains the correction layer in the image generation model based on the multiple fourth sample images and the multiple second sample images. Taking a sample image pair as an example, the pair includes a first sample image and a second sample image, and the resolution of the first sample image is reduced to obtain the fourth sample image. The fourth sample image and the second sample image are input into the correction layer: for the input fourth sample image, the fourth sample image is processed by the first generator to obtain the first intermediate image G_X(y_LR); for the input second sample image, the second sample image is processed by the second generator to obtain the second intermediate image G_Y(x_LR). The first intermediate image and the second sample image are distinguished through the first discriminator to obtain D_X(G_X(y_LR), x_LR), and the second intermediate image and the fourth sample image are distinguished through the second discriminator to obtain D_Y(G_Y(x_LR), y_LR). The parameters of the first generator and the second generator in the correction layer are adjusted based on D_X(G_X(y_LR), x_LR) and D_Y(G_Y(x_LR), y_LR). Second, after the correction layer is trained, the multiple first sample images are processed through the correction layer to obtain multiple third sample images.
  • The third sample image is expressed as ŷ_LR. Taking one third sample image as an example, the third sample image is processed through the image generation layer to obtain the result image U(ŷ_LR), and the parameters of the image generation layer are adjusted based on the difference between y_HR and U(ŷ_LR).
  • In some embodiments, the training loss of the image generation layer is the second training loss, which can be obtained by the following formula (7):
  • L_ups = ‖U(ŷ_LR) − y_HR‖₁    (7)
  • where L_ups represents the second training loss, U() represents the image generation layer, ŷ_LR represents the third sample image, and y_HR represents the first sample image.
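  • A sketch of one training step around formula (7) follows, reusing u, g_x, g_y and degrade from the sketches above; the L1 reconstruction loss and the learning rate are assumptions.

```python
import torch
import torch.nn.functional as F

optimizer = torch.optim.Adam(u.parameters(), lr=1e-4)

def train_step(y_hr: torch.Tensor) -> float:
    with torch.no_grad():              # the correction layer is already trained
        y3 = g_y(g_x(degrade(y_hr)))   # third sample image
    loss = F.l1_loss(u(y3), y_hr)      # L_ups: result image vs. first sample image
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```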
  • In some embodiments, a second sample image can also be input: the second sample image is processed based on the trained correction layer to obtain G_Y(x_LR), and G_Y(x_LR) is processed through the image generation layer to obtain U(G_Y(x_LR)), which represents the image generated based on x_LR. Since x_LR has no associated high-resolution image as a reference, the difference between y_HR and U(ŷ_LR) is used to represent the performance of the model.
  • Figure 5 is an effect display diagram provided according to the embodiment of the present application.
  • The evaluation indicators used are normalized mean absolute error (Mean Absolute Error, MAE), peak signal-to-noise ratio (Peak Signal to Noise Ratio, PSNR) and structural similarity (Structural Similarity, SSIM), which are used to evaluate the difference between the first sample image and the result image.
  • Part (a) of Figure 5 exemplarily shows the result images output when four different first sample images are input: the normalized mean absolute error is 0.015±0.012, the peak signal-to-noise ratio is 34.793±2.246, and the structural similarity is 0.919±0.005, where the value after ± represents the standard deviation.
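  • The three indicators can be computed as sketched below, with SSIM taken from scikit-image; normalizing the MAE by the intensity range of the reference image is an assumption.

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate(result: np.ndarray, reference: np.ndarray) -> dict:
    span = float(reference.max() - reference.min())
    mae = float(np.abs(result - reference).mean()) / span        # normalized MAE
    psnr = peak_signal_noise_ratio(reference, result, data_range=span)
    ssim = structural_similarity(reference, result, data_range=span)
    return {"MAE": mae, "PSNR": psnr, "SSIM": ssim}
```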
  • (b) in Figure 5 exemplarily shows the result image output when four different second sample images are input.
  • the input image and the real image are matched image pairs, the input image is a low-resolution image, and the real image is an associated high-resolution image.
  • the experimental results are shown in Table 1.
  • the numbers in parentheses in Table 1 represent the standard deviation. It can be seen from Table 1 that the solution provided by this application performs better than the traditional solution on the above two data sets and has high applicability and stability.
  • This application provides a training solution for an image generation model. By training the correction layer based on the first sample images and the second sample images, the correction layer can convert images to a preset image type. Since each third sample image is obtained by processing a first sample image with the correction layer, the higher-resolution first sample image associated with each third sample image is used as supervision information and each third sample image is used as an input image to train the image generation layer, which improves the image reconstruction capability of the image generation layer. Since the image generation model can convert the image type of a sample image, model training can be completed without training and test data sets of the same image type, which overcomes the lack of training data of the same image type, reduces the training cost of the image generation model, and improves the efficiency of generating higher-resolution images.
  • the solution provided by the embodiment of the present application also integrates lightweight modules into the framework of the image generation model, thereby reducing the number of parameters of the image generation model, reducing the training difficulty and consumption of computing resources of the image generation model.
  • FIG. 7 is a flow chart of an image generation method provided according to an embodiment of the present application. As shown in Figure 7, the image generation method is implemented based on the above image generation model; here, a computer device being a terminal is used as an example for explanation. The method includes the following steps:
  • the terminal processes the first image through the correction layer in the image generation model to obtain a second image.
  • the correction layer is used to convert the image type of the input image, and the second image belongs to the target image type.
  • the terminal is the terminal 101 shown in Figure 1, and the terminal can obtain the trained image generation model from the server.
  • the first image is a two-dimensional image or a three-dimensional image to be processed.
  • The terminal can input the first image into the image generation model, and the correction layer in the image generation model processes the first image, converting it from its original image type to the target image type to obtain the second image.
  • the terminal processes the second image through the image generation layer in the image generation model to obtain the third image.
  • the image generation layer is used to improve the resolution of the input image.
  • In some embodiments, the image generation layer in the image generation model processes the second image output by the correction layer, sequentially applying convolution, upsampling and a fully connected layer to the input, to improve the resolution of the second image and obtain the third image.
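  • A minimal inference sketch of this two-stage pipeline follows, reusing the Generator3D and GenerationLayer sketches above as stand-ins for the trained correction and image generation layers.

```python
import torch

correction = Generator3D()      # shifts the image to the target image type
generation = GenerationLayer()  # U(), raises the resolution
correction.eval(); generation.eval()

@torch.no_grad()
def generate(first_image: torch.Tensor) -> torch.Tensor:
    second_image = correction(first_image)  # target image type, same size
    third_image = generation(second_image)  # higher resolution
    return third_image

out = generate(torch.randn(1, 1, 40, 40, 40))  # -> (1, 1, 80, 80, 80)
```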
  • an image generation method provided by embodiments of the present application can also be applied to a server.
  • the first terminal sends an image generation request to the server.
  • the server processes the first image carried in the image generation request based on the image generation model to obtain a third image.
  • the server sends the third image to the second terminal, and the second terminal displays it.
  • the second terminal can be the same as the first terminal, or can be different from the first terminal.
  • FIG. 8 is a schematic diagram of an image generation process provided according to an embodiment of the present application.
  • the first terminal 801 sends the first image to the server 802, and the server 802 sends the third image to the second terminal 803.
  • This application provides an image generation solution. By training the correction layer based on the first sample images and the second sample images, the correction layer can convert images to a preset image type. Since each third sample image is obtained by processing a first sample image with the correction layer, the higher-resolution first sample image associated with each third sample image is used as supervision information and each third sample image is used as an input image to train the image generation layer, which improves the image reconstruction capability of the image generation layer. Model training can thus be completed without training and test data sets of the same image type, which overcomes the lack of training data of the same image type, reduces the training cost of the image generation model, and improves the efficiency of generating higher-resolution images. In addition, the solution provided by the embodiments of the present application integrates lightweight modules into the framework of the image generation model, thereby reducing the number of parameters of the image generation model and reducing its training difficulty and consumption of computing resources. An input image processed through the image generation model yields an image with higher resolution; the processing is convenient and fast, and the generated image has high accuracy, which improves the efficiency of image generation.
  • Figure 9 is a block diagram of a training device for an image generation model provided according to an embodiment of the present application.
  • the device includes: a first training module 901 and a second training module 902.
  • the first training module 901 is used to train a correction layer in the image generation model based on a plurality of first sample images and a plurality of second sample images.
  • the correction layer is used to convert the image type of the input sample image, and the resolution of the first sample images is higher than that of the second sample images;
  • the second training module 902 is used to process the plurality of first sample images based on the trained correction layer to obtain a plurality of third sample images;
  • the second training module 902 is also used to train an image generation layer in the image generation model based on the plurality of first sample images and the plurality of third sample images.
  • the image generation layer is used to increase the resolution of the input sample image.
  • Figure 10 is a block diagram of another training device for an image generation model provided according to an embodiment of the present application.
  • the first training module 901 includes:
  • the image processing unit 9011 is used to determine multiple sample image pairs, each sample image pair including one first sample image and one second sample image, and to process the first sample image in each sample image pair to obtain a fourth sample image whose resolution is lower than that of the first sample image;
  • the first correction unit 9012 is configured to process the fourth sample image based on the correction layer in the image generation model to obtain a first intermediate image, where the first intermediate image belongs to the first image type;
  • the second correction unit 9013 is configured to process the second sample image in the sample image pair based on the correction layer in the image generation model to obtain a second intermediate image, where the second intermediate image belongs to the second image type;
  • the training unit 9014 is configured to train the correction layer in the image generation model based on the first intermediate image, the second sample image, the fourth sample image and the second intermediate image associated with each sample image pair.
  • the image processing unit 9011 is used to add Gaussian noise to the first sample image, and to reduce the resolution of the first sample image after the Gaussian noise is added, so as to obtain the fourth sample image associated with the first sample image.
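A sketch of this degradation step (formula (1) in the detailed description below), assuming PyTorch tensors of shape (B, C, D, H, W); the noise level, scale factor, and choice of trilinear interpolation for the downsampling operator are illustrative assumptions:

```python
import torch
import torch.nn.functional as F

def degrade(y_hr, sigma=0.01, scale=0.5):
    # y_LR = D↓(y_HR + n): add Gaussian noise n, then reduce the
    # resolution with an interpolation-based downsampling operator D↓.
    noisy = y_hr + sigma * torch.randn_like(y_hr)
    return F.interpolate(noisy, scale_factor=scale, mode="trilinear",
                         align_corners=False)
```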
  • the correction layer in the image generation model includes a first generator for converting images from the second image type to the first image type;
  • the first correction unit 9012 is used to convolve the fourth sample image through the first generator to obtain the first intermediate image associated with the fourth sample image.
  • the correction layer in the image generation model includes a second generator for converting images from the first image type to the second image type; the second correction unit 9013 is used to convolve the second sample image through the second generator to obtain the second intermediate image associated with the second sample image.
  • the correction layer in the image generation model includes a first discriminator and a second discriminator, both of which are used to determine the similarity between images. The training unit 9014 is used to discriminate the first intermediate image and the second sample image associated with the first intermediate image through the first discriminator, obtaining the similarity between the first intermediate image and the second sample image; to discriminate the second intermediate image and the fourth sample image associated with the second intermediate image through the second discriminator, obtaining the similarity between the second intermediate image and the fourth sample image; and to train the correction layer in the image generation model through the multiple similarities output by the first discriminator and the multiple similarities output by the second discriminator.
  • the second training module 902 is used to process each third sample image separately through the image generation layer in the image generation model to obtain a result image, and to train the image generation layer based on the difference between each result image and the first sample image associated with that result image.
  • This application provides a training solution for an image generation model.
  • By training the correction layer based on the first sample images and the second sample images, the correction layer can convert images to a preset image type. Since each third sample image is obtained by processing a first sample image with the correction layer, using the higher-resolution first sample image associated with each third sample image as supervision information and each third sample image as the input image to train the image generation layer can improve the image reconstruction capability of the image generation layer.
  • Moreover, because the image generation model can convert the image type of sample images, model training can be completed without using training and test data sets of the same image type, which overcomes the lack of training data of the same image type, reduces the training cost of the image generation model, and improves the efficiency of generating higher-resolution images.
  • the solution provided by the embodiment of the present application also integrates lightweight modules into the framework of the image generation model, thereby reducing the number of parameters of the image generation model, reducing the training difficulty and consumption of computing resources of the image generation model.
  • Figure 11 is a block diagram of an image generation device provided according to an embodiment of the present application.
  • the device includes: a correction module 1101 and an image generation module 1102.
  • the correction module 1101 is used to process the first image through the correction layer in the image generation model to obtain a second image.
  • the correction layer is used to convert the image type of the input image, and the second image belongs to the target image type;
  • the image generation module 1102 is used to process the second image through the image generation layer in the image generation model to obtain a third image.
  • the image generation layer is used to improve the resolution of the input image.
  • When the image generation device provided in the above embodiments generates images, the division into the above functional modules is merely used as an example for illustration. In practical applications, the above functions can be allocated to different functional modules as needed; that is, the internal structure of the device is divided into different functional modules to complete all or part of the functions described above.
  • the image generation device provided by the above embodiments and the image generation method embodiments belong to the same concept. Please refer to the method embodiments for the specific implementation process, which will not be described again here.
  • the computer device can be configured as a terminal or a server.
  • the terminal can be used as the execution subject to implement the technical solution provided by the embodiment of the present application.
  • the technical solution provided by the embodiments of this application can be implemented by the server as the execution subject, or the technical solution provided by this application can be implemented through the interaction between the terminal and the server, which is not limited by the embodiments of this application.
  • FIG. 12 is a structural block diagram of a terminal 1200 provided according to an embodiment of the present application.
  • the terminal 1200 may be called a user device, a portable terminal, a laptop terminal, a desktop terminal, or by other names.
  • the terminal 1200 includes: a processor 1201 and a memory 1202.
  • the processor 1201 includes one or more processing cores, such as a 4-core processor, an 8-core processor, etc.
  • the processor 1201 is implemented using at least one hardware form among DSP (Digital Signal Processing, digital signal processing), FPGA (Field-Programmable Gate Array, field programmable gate array), and PLA (Programmable Logic Array, programmable logic array).
  • the processor 1201 includes a main processor and a co-processor.
  • the main processor is a processor used to process data in the wake-up state, also called CPU (Central Processing Unit, central processing unit);
  • a coprocessor is a low-power processor used to process data in standby mode.
  • the processor 1201 is integrated with a GPU (Graphics Processing Unit, image processor), and the GPU is responsible for rendering and drawing the content that needs to be displayed on the display screen.
  • the processor 1201 also includes an AI (Artificial Intelligence, artificial intelligence) processor, which is used to process computing operations related to machine learning.
  • Memory 1202 includes one or more computer-readable storage media, which are non-transitory. Memory 1202 may also include high-speed random access memory and non-volatile memory, such as one or more disk storage devices or flash memory storage devices. In some embodiments, the non-transitory computer-readable storage medium in the memory 1202 is used to store at least one computer program, and the at least one computer program is executed by the processor 1201 to implement the training method for an image generation model or the image generation method provided by the method embodiments in this application.
  • the terminal 1200 optionally further includes: a peripheral device interface 1203 and at least one peripheral device.
  • the processor 1201, the memory 1202 and the peripheral device interface 1203 are connected through a bus or a signal line.
  • Each peripheral device is connected to the peripheral device interface 1203 through a bus, a signal line or a circuit board.
  • the peripheral device includes: at least one of a display screen 1205 and a camera assembly 1206.
  • the peripheral device interface 1203 is used to connect at least one peripheral device related to I/O (Input/Output) to the processor 1201 and the memory 1202 .
  • the processor 1201, the memory 1202, and the peripheral device interface 1203 are integrated on the same chip or circuit board; in some other embodiments, any one or two of the processor 1201, the memory 1202, and the peripheral device interface 1203 are implemented on a separate chip or circuit board, which is not limited in this embodiment.
  • the display screen 1205 is used to display UI (User Interface, user interface).
  • the UI includes graphics, text, icons, videos, and any combination thereof.
  • display screen 1205 is a touch display screen, display screen 1205 also has the ability to collect touch signals on or above the surface of display screen 1205 .
  • the touch signal is input to the processor 1201 as a control signal for processing.
  • the display screen 1205 is also used to provide virtual buttons and/or virtual keyboards, also called soft buttons and/or soft keyboards.
  • in some embodiments, there is one display screen 1205, provided on the front panel of the terminal 1200; in other embodiments, there are at least two display screens 1205, respectively arranged on different surfaces of the terminal 1200 or in a folding design; in still other embodiments, the display screen 1205 is a flexible display screen arranged on a curved or folding surface of the terminal 1200. The display screen 1205 can even be set in a non-rectangular irregular shape, that is, a special-shaped screen.
  • the display screen 1205 is made of materials such as LCD (Liquid Crystal Display) and OLED (Organic Light-Emitting Diode).
  • the camera component 1206 is used to capture images or videos.
  • camera assembly 1206 includes a front-facing camera and a rear-facing camera.
  • the front camera is set on the front panel of the terminal, and the rear camera is set on the back of the terminal.
  • there are at least two rear cameras, each of which is any one of a main camera, a depth-of-field camera, a wide-angle camera, or a telephoto camera, so that the main camera and the depth-of-field camera can be fused to realize the background blur function, and the main camera and the wide-angle camera can be fused to realize panoramic shooting, VR (Virtual Reality) shooting, or other fused shooting functions.
  • camera assembly 1206 also includes a flash.
  • the flash is a single color temperature flash or a dual color temperature flash. Dual color temperature flash refers to a combination of warm light flash and cold light flash, used for light compensation under different color temperatures.
  • the structure shown in Figure 12 does not constitute a limitation on the terminal 1200, which can include more or fewer components than shown, combine certain components, or adopt a different component arrangement.
  • FIG. 13 is a schematic structural diagram of a server according to an embodiment of the present application.
  • the server 1300 may vary greatly due to different configurations or performance, and can include one or more processors 1301 and one or more memories 1302, where at least one computer program is stored in the memory 1302, and the at least one computer program is loaded and executed by the processor 1301 to implement the training method for an image generation model or the image generation method provided by the above method embodiments.
  • the server also has components such as a wired or wireless network interface, a keyboard, and an input/output interface, so as to perform input and output.
  • the server also includes other components for realizing device functions, which will not be described again here.
  • Embodiments of the present application also provide a computer-readable storage medium.
  • the computer-readable storage medium stores at least one computer program.
  • the at least one computer program is loaded and executed by the processor of a computer device to implement the training method for an image generation model or the image generation method of the above embodiments.
  • the computer-readable storage medium can be a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), a compact disc read-only memory (Compact Disc Read-Only Memory, CD-ROM), a magnetic tape, a floppy disk, an optical data storage device, or the like.
  • the computer program involved in the embodiments of the present application may be deployed and executed on one computer device, or executed on multiple computer devices located at one site, or executed on multiple computer devices distributed across multiple sites and interconnected through a communication network.
  • Multiple computer devices distributed in multiple locations and interconnected through the communication network form a blockchain system.
  • Embodiments of the present application also provide a computer program product.
  • the computer program product includes computer program code, and the computer program code is stored in a computer-readable storage medium.
  • the processor of the computer device reads the computer program code from the computer-readable storage medium, and the processor executes the computer program code, so that the computer device performs the training method for an image generation model or the image generation method provided in the above various optional implementations.
  • Those of ordinary skill in the art can understand that all or part of the steps of the above embodiments can be implemented by hardware, or by a program instructing the relevant hardware; the program can be stored in a computer-readable storage medium.
  • the storage medium mentioned above can be a read-only memory, a magnetic disk, an optical disc, or the like.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

A training method for an image generation model, an image generation method, an apparatus, and a device, relating to the field of image processing technology. The method includes: training a correction layer in an image generation model based on a plurality of first sample images and a plurality of second sample images (201); processing the plurality of first sample images based on the trained correction layer (202; 306) to obtain a plurality of third sample images; and training an image generation layer in the image generation model based on the plurality of first sample images and the plurality of third sample images (203).

Description

Training method for image generation model, image generation method, apparatus, and device

This application claims priority to Chinese Patent Application No. 202210431484.X, filed on April 22, 2022 and entitled "Training method for image generation model, image generation method, apparatus, and device", the entire contents of which are incorporated herein by reference.
TECHNICAL FIELD

This application relates to the field of image processing technology, and in particular to a training method for an image generation model, an image generation method, an apparatus, and a device.

BACKGROUND

High-resolution three-dimensional medical images can display detailed information more clearly, assist medical diagnosis and analysis, and play an important role in the medical field. Since high-resolution three-dimensional images are costly to acquire and require a long acquisition time, it is difficult to obtain them directly. At present, high-resolution three-dimensional images are usually obtained indirectly by generating them from low-resolution three-dimensional images.
SUMMARY

Embodiments of this application provide a training method for an image generation model, an image generation method, an apparatus, and a device.

In one aspect, a training method for an image generation model is provided. The method includes: training a correction layer in an image generation model based on a plurality of first sample images and a plurality of second sample images, the correction layer being used to convert the image type of an input sample image, the resolution of the first sample images being higher than that of the second sample images; processing the plurality of first sample images based on the trained correction layer to obtain a plurality of third sample images; and training an image generation layer in the image generation model based on the plurality of first sample images and the plurality of third sample images, the image generation layer being used to increase the resolution of an input sample image.

In another aspect, an image generation method is provided. The method includes: processing a first image through the correction layer in an image generation model to obtain a second image, the correction layer being used to convert the image type of an input image, the second image belonging to a target image type; and processing the second image through the image generation layer in the image generation model to obtain a third image, the image generation layer being used to increase the resolution of an input image.

In another aspect, a training apparatus for an image generation model is provided. The apparatus includes: a first training module, configured to train a correction layer in an image generation model based on a plurality of first sample images and a plurality of second sample images, the correction layer being used to convert the image type of an input sample image, the resolution of the first sample images being higher than that of the second sample images; and a second training module, configured to process the plurality of first sample images based on the trained correction layer to obtain a plurality of third sample images. The second training module is further configured to train an image generation layer in the image generation model based on the plurality of first sample images and the plurality of third sample images, the image generation layer being used to increase the resolution of an input sample image.

In another aspect, an image generation apparatus is provided. The apparatus includes: a correction module, configured to process a first image through the correction layer in an image generation model to obtain a second image, the correction layer being used to convert the image type of an input image, the second image belonging to a target image type; and an image generation module, configured to process the second image through the image generation layer in the image generation model to obtain a third image, the image generation layer being used to increase the resolution of an input image.

In another aspect, a computer device is provided, including a processor and a memory, the memory being used to store at least one piece of computer program, the at least one piece of computer program being loaded and executed by the processor to implement the training method for an image generation model or the image generation method in the embodiments of this application.

In another aspect, a computer-readable storage medium is provided, storing at least one piece of computer program, the at least one piece of computer program being loaded and executed by a processor to implement the training method for an image generation model or the image generation method in the embodiments of this application.

In another aspect, a computer program product is provided, including a computer program, the computer program, when executed by a processor, implementing the training method for an image generation model or the image generation method provided in the above aspects or in the various optional implementations of the aspects.
BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a schematic diagram of an implementation environment of a training method for an image generation model according to an embodiment of this application;

FIG. 2 is a flowchart of a training method for an image generation model according to an embodiment of this application;

FIG. 3 is a flowchart of another training method for an image generation model according to an embodiment of this application;

FIG. 4 is a schematic structural diagram of an image generation model according to an embodiment of this application;

FIG. 5 is a diagram showing results according to an embodiment of this application;

FIG. 6 is a diagram of experimental results according to an embodiment of this application;

FIG. 7 is a flowchart of an image generation method according to an embodiment of this application;

FIG. 8 is a schematic diagram of an image generation procedure according to an embodiment of this application;

FIG. 9 is a block diagram of a training apparatus for an image generation model according to an embodiment of this application;

FIG. 10 is a block diagram of another training apparatus for an image generation model according to an embodiment of this application;

FIG. 11 is a block diagram of an image generation apparatus according to an embodiment of this application;

FIG. 12 is a structural block diagram of a terminal according to an embodiment of this application;

FIG. 13 is a schematic structural diagram of a server according to an embodiment of this application.
DETAILED DESCRIPTION

To make the objectives, technical solutions, and advantages of this application clearer, implementations of this application are described in further detail below with reference to the accompanying drawings.

In this application, the terms "first", "second", and the like are used to distinguish between identical or similar items whose roles and functions are essentially the same. It should be understood that "first", "second", and "nth" have no logical or temporal dependency on one another, and do not limit quantity or execution order.

In this application, the term "at least one" means one or more, and "a plurality of" means two or more.

It should be noted that the information (including but not limited to user equipment information, user personal information, etc.), data (including but not limited to data for analysis, stored data, displayed data, etc.), and signals involved in this application are all authorized by the users or fully authorized by all parties, and the collection, use, and processing of the relevant data must comply with the relevant laws, regulations, and standards of the relevant countries and regions. For example, the sample images involved in this application are all obtained with full authorization.
Generating high-resolution three-dimensional images from low-resolution three-dimensional images relies heavily on manually matched training data sets and test data sets. Since manually matched data sets are scarce, the efficiency of generating high-resolution three-dimensional images is low.

The terms involved in this application are explained below.

A data domain (Domain) is also called a value domain. A domain is a set used to describe the technical attributes of a field, including data type, data length, number of decimal places, value range, and so on. Fields with the same definitions of the above technical attributes can be included in the same domain, and when the attribute definition of a domain changes, the attributes of all fields referencing it are flagged accordingly.

A voxel is short for volume pixel (Volume Pixel). A volume containing voxels can be represented by volume rendering or by extracting polygonal isosurfaces at a given threshold contour. As the name suggests, a voxel is the smallest unit in the partitioning of three-dimensional space, and voxels are used in fields such as three-dimensional imaging, scientific data, and medical imaging. It is conceptually similar to the smallest unit of two-dimensional space, the pixel, which is used in the image data of two-dimensional computer images.

Spatial resolution refers to the size of the ground area represented by a pixel, that is, the instantaneous field of view of a scanner, or the smallest distinguishable unit of a ground object. Put another way, spatial resolution refers to the minimum distance between two adjacent ground objects that can be identified on a remote sensing image. For photographic images, it is usually expressed as the number of distinguishable black-and-white "line pairs" per unit length (line pairs/mm); for scanned images, the instantaneous field-of-view angle is usually used. Spatial resolution is one of the important indicators for evaluating sensor performance and remote sensing information, and an important basis for identifying the shape and size of ground objects. Intuitively, spatial resolution is the critical geometric size of an object that an instrument can identify.

A generative adversarial network (GAN, Generative Adversarial Networks) is a deep learning model and one of the most promising methods in recent years for unsupervised learning on complex distributions. Through the mutual game-playing of (at least) two modules in the framework, a generative model (Generative Model) and a discriminative model (Discriminative Model), the model learns to produce reasonably good outputs.
Embodiments of this application provide a training solution for an image generation model, executed by a computer device, where the computer device is provided as a terminal or a server. Taking the computer device being a server as an example, the implementation environment of the solution provided by the embodiments of this application is introduced below. FIG. 1 is a schematic diagram of an implementation environment of a training method for an image generation model according to an embodiment of this application. Referring to FIG. 1, the implementation environment includes a terminal 101 and a server 102.

The terminal 101 and the server 102 can be directly or indirectly connected through wired or wireless communication, which is not limited in this application.

In some embodiments, the terminal 101 is a smartphone, tablet computer, laptop computer, desktop computer, smart speaker, smart watch, smart voice interaction device, smart home appliance, in-vehicle terminal, or the like, but is not limited thereto. The terminal 101 has installed and runs an application for image processing, where the image is a two-dimensional image or a three-dimensional image. In some embodiments, the application sends the image to be processed to the server 102 and displays the image returned by the server 102; alternatively, the application obtains the trained image generation model from the server 102 and processes the image to be processed based on the image generation model. A person skilled in the art will appreciate that the number of terminals 101 can be larger or smaller; for example, there may be one terminal 101, or dozens, hundreds, or more. The embodiments of this application do not limit the number or device type of the terminals 101.

In some embodiments, the server 102 is an independent physical server, or a server cluster or distributed system composed of multiple physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN (Content Delivery Network), and big data and artificial intelligence platforms. The server 102 is used to provide background services for the above application for image processing. In some embodiments, the server 102 undertakes the primary image processing work and the terminal 101 the secondary image processing work; or the server 102 undertakes the secondary image processing work and the terminal 101 the primary image processing work; or the server 102 and the terminal 101 perform collaborative image processing using a distributed computing architecture.

In an exemplary scenario, taking the terminal 101 having installed and running an application for viewing three-dimensional medical images as an example, the server 102 obtains multiple authorized three-dimensional medical images from multiple terminals 101, where the multiple three-dimensional medical images include three-dimensional medical images of a first resolution and three-dimensional medical images of a second resolution, the first resolution being higher than the second resolution. The server 102 then takes the obtained three-dimensional medical images as sample images and trains an image generation model based on the training method for an image generation model provided by the embodiments of this application. Based on the image generation model, the server 102 can process a three-dimensional medical image of the second resolution uploaded by the terminal 101 to generate a three-dimensional medical image of the first resolution. The server 102 then returns the generated three-dimensional medical image to the terminal 101 for display.
In the embodiments of this application, the computer device can be configured as a terminal or a server. FIG. 2 is a flowchart of a training method for an image generation model according to an embodiment of this application. As shown in FIG. 2, execution by a server is taken as an example in this embodiment. The method includes the following steps:

201. The server trains a correction layer in an image generation model based on a plurality of first sample images and a plurality of second sample images, where the correction layer is used to convert the image type of an input sample image, and the resolution of the first sample images is higher than that of the second sample images.

In this embodiment, the server is the server 102 shown in FIG. 1, and the first and second sample images are two-dimensional or three-dimensional images. The resolution of a first sample image is higher than that of a second sample image; for example, a first sample image has a first resolution and a second sample image has a second resolution, the first resolution being higher than the second, with neither resolution being restricted by the embodiments of this application. In addition, the first and second sample images have different image types, and the server can train the correction layer based on sample images of two different resolutions and image types, so that the trained correction layer can convert an image from the image type of the second sample images to the image type of the first sample images.

In some embodiments, the first and second sample images occur in pairs; that is, the server trains the correction layer in the image generation model based on multiple sample image pairs, each containing one first sample image and one second sample image, where the two have different image types and the first sample image has the higher resolution. This is equivalent to the plurality of first sample images and the plurality of second sample images forming multiple sample image pairs. The two sample images in each pair have the same or different image content, but must be input to the image generation model at the same time; that is, the first and second sample images in each pair have an association relationship indicating that they are input to the image generation model simultaneously.

202. The server processes the plurality of first sample images based on the trained correction layer to obtain a plurality of third sample images.

In this embodiment, for any first sample image, the server processes the first sample image based on the trained correction layer to obtain a third sample image associated with it, where the association indicates that the third sample image is generated from the first sample image. The resolution of the third sample image is the second resolution, i.e., lower than that of the first sample image, while its image type is the same as that of the first sample image. In this way, the trained correction layer can reduce the resolution of each first sample image without changing its image type, yielding a third sample image.

203. The server trains an image generation layer in the image generation model based on the plurality of first sample images and the plurality of third sample images, where the image generation layer is used to increase the resolution of an input sample image.

In this embodiment, for any third sample image, the server processes the third sample image based on the image generation layer to increase its resolution. In some embodiments, the server uses the first sample image associated with the third sample image as supervision information to train the image generation layer, so as to improve the accuracy of the images generated by the image generation layer.

This application provides a training solution for an image generation model. By training the correction layer based on the first and second sample images, the correction layer can convert images to a preset image type. Since each third sample image is obtained by processing a first sample image with the correction layer, using the higher-resolution first sample image associated with each third sample image as supervision information and each third sample image as the input image to train the image generation layer can improve the image reconstruction capability of the image generation layer. Moreover, because the image generation model can convert the image type of sample images, model training can be completed without using training and test data sets of the same image type, which overcomes the lack of training data of the same image type, reduces the training cost of the image generation model, and improves the efficiency of generating higher-resolution images.
FIG. 2 above exemplarily shows the training solution for an image generation model provided by this application. The solution is further described below based on an application scenario in which a model for generating three-dimensional images is trained. FIG. 3 is a flowchart of another training method for an image generation model according to an embodiment of this application. As shown in FIG. 3, the computer device being a server is taken as an example in this embodiment. The method includes the following steps:

301. The server obtains a plurality of first sample images and a plurality of second sample images, where the resolution of the first sample images is higher than that of the second sample images.

In this embodiment, the server obtains a three-dimensional image set including multiple three-dimensional images, and preprocesses them to obtain the plurality of first sample images and the plurality of second sample images, ensuring that the resolution of the first sample images is higher than that of the second sample images. By preprocessing the image data in the three-dimensional image set, the parameters of the obtained first and second sample images satisfy the input conditions of the model.

For example, take the Human Connectome Project (HCP) as the three-dimensional image set. The HCP image set contains more than 1000 brain structure samples, including both high-resolution images with a spatial resolution of 0.7 mm and 320*320*256 voxels and low-resolution images with a spatial resolution of 2.0 mm and 113*136*113 voxels. The server randomly crops 80*80*80-voxel patches from the high-resolution images as the first sample images, and randomly crops 40*40*40-voxel patches from the low-resolution images as the second sample images. This guarantees that the spatial resolution of the randomly cropped first sample images is necessarily higher than that of the second sample images, and that the voxel sizes of the first and second sample images differ. Optionally, before inputting the first and second sample images into the model, the server can also randomly rotate the sample images about their spatial axes by 90°, 180°, or 270°, which is not limited by the embodiments of this application.
In some embodiments, the server trains the correction layer in the image generation model iteratively based on the plurality of first and second sample images, where the correction layer is used to convert the image type of the input image and is structured as a generative adversarial network; see steps 302 to 305 below. For example, the correction layer is trained with the Adam optimizer, whose parameters are beta1=0.9 and beta2=0.999. The learning rate is initialized to 10^-4 and reduced to one third of its previous value every 100 epochs. The server uses more than 300 epochs with a mini-batch size of 4 when training the correction layer.
In some embodiments, the first and second sample images occur in pairs; that is, the server trains the correction layer in the image generation model based on multiple sample image pairs, each containing one first sample image and one second sample image, where the two have different image types and the first sample image has the higher resolution. This is equivalent to the plurality of first sample images and the plurality of second sample images forming multiple sample image pairs. The two sample images in each pair have the same or different image content, but must be input to the image generation model at the same time; that is, the first and second sample images in each pair have an association relationship indicating that they are input to the image generation model simultaneously.
302. The server processes the plurality of first sample images separately to obtain a plurality of fourth sample images, where the resolution of a fourth sample image is lower than that of the first sample image.

In this embodiment, for any first sample image, the server reduces the resolution of the first sample image to obtain a fourth sample image associated with it, where the association indicates that the fourth sample image is generated by transforming the first sample image. Optionally, the plurality of first and second sample images obtained in step 301 are divided into multiple sample image pairs, each including one first sample image and one second sample image; the first sample image in each pair is then processed to obtain a fourth sample image whose resolution is lower than that of the first sample image.

In some embodiments, the server first adds noise to the sample image and then reduces its resolution. Put another way, for any first sample image, the server adds Gaussian noise to the first sample image and then reduces the resolution of the noisy first sample image to obtain the associated fourth sample image. It should be noted that the server can apply image interpolation, downsampling, or other methods to the noisy first sample image to reduce its resolution and obtain the associated fourth sample image, which is not limited by the embodiments of this application. Adding Gaussian noise to the sample images makes the image generation model trained on them applicable to noisy images, thereby improving the accuracy of the image generation model.

In some embodiments, the above processing of the first sample image can be expressed by the following formula (1):

$$y_{LR} = D_{\downarrow}(y_{HR} + n) \tag{1}$$

where $y_{LR}$ denotes the fourth sample image, $D_{\downarrow}(\cdot)$ denotes an interpolation function, $y_{HR}$ denotes the first sample image, and $n$ denotes Gaussian noise.

In some embodiments, this step 302 is implemented by the server based on the correction layer of the image generation model, i.e., after the server inputs each first sample image into the correction layer, the correction layer processes the first sample image; alternatively, this step 302 is implemented by the server based on an image processing layer, i.e., the server inputs the first sample image into the image processing layer, which processes it.
303. The server processes the plurality of fourth sample images separately based on the correction layer in the image generation model to obtain a plurality of first intermediate images, where the first intermediate images belong to a first image type.

In this embodiment, the fourth sample images belong to a second image type. For any fourth sample image, the server converts its image type through the correction layer to obtain the first intermediate image associated with it. By converting the image type of the fourth sample images, the correction layer can be trained based on the conversion accuracy so that it can accurately convert images to the first image type.

In some embodiments, images of different image types belong to different data domains. For example, CT (Computed Tomography) images and MRI (Magnetic Resonance Imaging) images are images of different types, and they belong to different data domains. Therefore, the image type to which an image belongs can also be called the data domain to which the image belongs; the noise and blur kernel distributions of images within the same data domain are the same or similar.

Put another way, the high-resolution first sample images belong to the second image type, and the low-resolution second sample images belong to the first image type. The fourth sample image obtained by degrading the resolution of a first sample image does not change image type, so the low-resolution fourth sample images also belong to the second image type. In this step 303, the correction layer can convert a fourth sample image of the second image type into a first intermediate image of the first image type; in other words, the low-resolution fourth sample image is corrected from the high-resolution data domain of the first sample images to the low-resolution data domain of the second sample images.

In some embodiments, the correction layer in the image generation model includes a first generator for converting images from the second image type to the first image type; in other words, the first generator converts images from one data domain to another. The server can process the fourth sample images based on the first generator. Accordingly, for any fourth sample image, the server convolves the fourth sample image through the first generator to obtain the first intermediate image associated with it. The backbone of the first generator is a three-dimensional convolutional neural network that connects multiple dense blocks through residual connections, each dense block containing multiple three-dimensional convolutional layers. Providing the first generator allows the correction layer to process sample images by convolution, improving the accuracy of image type conversion.

In some embodiments, the first generator consists of three dense blocks, each containing four three-dimensional convolutional layers. Except for the first and last three-dimensional convolutional layers of the neural network, the three-dimensional convolution kernels are replaced by a lightweight convolution module, the queue module, so as to reduce the number of parameters of the first generator while maintaining the original performance, thereby reducing the training difficulty of the correction layer and the computing resources consumed when it is used.
In some embodiments, the first intermediate image can be expressed as $G_X(y_{LR})$, where $y_{LR}$ denotes the fourth sample image and $G_X(\cdot)$ denotes the first generator.

It should be noted that the plurality of first intermediate images in the embodiments of this application are intermediate images during training, and for ease of description they are considered to belong to the first image type. In the initial stage of training, however, the correction layer cannot accurately convert the fourth sample images to the first image type, and the first intermediate images obtained at that point do not belong to the first image type. As training proceeds, the correction layer becomes able to accurately convert the fourth sample images to the first image type, at which point the obtained first intermediate images do belong to the first image type.
304. The server processes the plurality of second sample images separately based on the correction layer in the image generation model to obtain a plurality of second intermediate images, where the second intermediate images belong to the second image type.

In this embodiment, the second sample images belong to the first image type. For any second sample image, the server can convert its image type through the correction layer to obtain the second intermediate image associated with it. By converting the image type of the second sample images, the correction layer can be trained based on the conversion accuracy so that it can accurately convert images to the second image type.

Put another way, each sample image pair includes a pair of sample images: a high-resolution first sample image and a low-resolution second sample image, where the first sample image belongs to the second image type, i.e., the high-resolution data domain, and the second sample image belongs to the first image type, i.e., the low-resolution data domain. In the training stage, for each sample image pair, the high-resolution first sample image in the pair is degraded to obtain a low-resolution fourth sample image whose image type is unchanged, so the fourth sample image also belongs to the second image type, i.e., the high-resolution data domain. Next, a domain shift is applied to the fourth sample image, converting it from the second image type to the first image type to obtain the first intermediate image; at this point the first intermediate image is not only a low-resolution image but has also been corrected to the low-resolution data domain. Similarly, a domain shift is applied to the low-resolution second sample image in the pair, converting it from the first image type to the second image type to obtain the second intermediate image; at this point the second intermediate image, although a low-resolution image, has been corrected to the high-resolution data domain. This trains the correction layer's ability to apply domain shifts to low-resolution images, so that it learns both to convert low-resolution images from the first image type to the second and from the second image type to the first, achieving accurate correction between the two data domains.

In some embodiments, the correction layer in the image generation model includes a second generator for converting images from the first image type to the second image type. The server can process the second sample images through the second generator. Accordingly, for any second sample image, the server can convolve the second sample image through the second generator to obtain the second intermediate image associated with it. The backbone of the second generator is a three-dimensional convolutional neural network that connects multiple dense blocks through residual connections, each dense block containing multiple three-dimensional convolutional layers. Providing the second generator allows the correction layer to process sample images by convolution, improving the accuracy of image type conversion.

In some embodiments, the second generator consists of three dense blocks, each containing four three-dimensional convolutional layers. Except for the first and last three-dimensional convolutional layers of the neural network, the three-dimensional convolution kernels are replaced by the lightweight queue module, so as to reduce the number of parameters of the second generator while maintaining the original performance, thereby reducing the training difficulty of the correction layer and the computing resources consumed when it is used.

In some embodiments, the second intermediate image can be expressed as $G_Y(x_{LR})$, where $x_{LR}$ denotes the second sample image and $G_Y(\cdot)$ denotes the second generator.

It should be noted that the plurality of second intermediate images in the embodiments of this application are intermediate images during training, and for ease of description they are considered to belong to the second image type. In the initial stage of training, however, the correction layer cannot accurately convert the second sample images to the second image type, and the second intermediate images obtained at that point do not belong to the second image type. As training proceeds, the correction layer becomes able to accurately convert the second sample images to the second image type, at which point the obtained second intermediate images do belong to the second image type.
305. The server trains the correction layer in the image generation model based on the plurality of first intermediate images, second sample images, fourth sample images, and second intermediate images.

In this embodiment, the training objective of the correction layer is to make the first intermediate images and the second sample images belong to the same image type, and the second intermediate images and the fourth sample images belong to the same image type. The server can train the correction layer through the differences between the first intermediate images and the second sample images and the differences between the fourth sample images and the second intermediate images, so that the correction layer can accurately convert images to other image types.

Put another way, taking each sample image pair {first sample image, second sample image} as a unit, where the first sample image is a high-resolution image belonging to the high-resolution data domain (second image type) and the second sample image is a low-resolution image belonging to the low-resolution data domain (first image type), performing steps 302-303 on the first sample image in each pair and step 304 on the second sample image in the pair yields four low-resolution images associated with each pair: the first intermediate image, the second sample image, the fourth sample image, and the second intermediate image. Here, the fourth sample image is the low-resolution image obtained by directly degrading the first sample image and belongs to the high-resolution data domain; the first intermediate image is the low-resolution image obtained by domain-shifting the fourth sample image through the correction layer and belongs to the low-resolution data domain; and the second intermediate image is the low-resolution image obtained by domain-shifting the second sample image through the correction layer and belongs to the high-resolution data domain. The training objective of the correction layer is therefore: train the correction layer using the first intermediate image, second sample image, fourth sample image, and second intermediate image associated with each sample image pair, so as to minimize the difference between the two low-resolution images belonging to the low-resolution data domain, {second sample image, first intermediate image} (both of the first image type), and to minimize the difference between the two low-resolution images belonging to the high-resolution data domain, {fourth sample image, second intermediate image} (both of the second image type).

In some embodiments, the correction layer in the image generation model includes a first discriminator and a second discriminator, both used to determine the similarity between images. For either discriminator, associated discrimination information can be obtained for an image coming from a generator, and likewise for a real image; these two kinds of discrimination information can reflect the similarity between the image from the generator and the real image, and the adversarial loss of the generator and discriminator is determined based on them. Accordingly, the server can process the plurality of first intermediate images and second sample images through the first discriminator, and the plurality of second intermediate images and fourth sample images through the second discriminator. For any first intermediate image, the server can discriminate the first intermediate image and its associated second sample image through the first discriminator to obtain the similarity between them; for any second intermediate image, the server can discriminate the second intermediate image and its associated fourth sample image through the second discriminator to obtain the similarity between them. Finally, the server can train the correction layer in the image generation model through the similarities between the first intermediate images and second sample images and the similarities between the second intermediate images and fourth sample images. The first and second discriminators are built as PatchGAN (fully convolutional) discriminators. Providing the two discriminators allows the first discriminator to discriminate the difference between the second sample images of the first image type and the first intermediate images converted to the first image type, and the second discriminator to discriminate the difference between the fourth sample images of the second image type and the second intermediate images converted to the second image type; the parameters of the correction layer are adjusted based on the discrimination results, thereby improving the accuracy of the correction layer's image type conversion.

Put another way, to achieve the above training objective of the correction layer — minimizing the difference between the two low-resolution images belonging to the low-resolution data domain, {second sample image, first intermediate image} (both of the first image type), and minimizing the difference between the two low-resolution images belonging to the high-resolution data domain, {fourth sample image, second intermediate image} (both of the second image type) — two discriminators are configured in the correction layer. The first discriminator discriminates the similarity between {second sample image, first intermediate image}, which both belong to the first image type, and the second discriminator discriminates the similarity between {fourth sample image, second intermediate image}, which both belong to the second image type. Thus, for each sample image pair, in addition to the four associated low-resolution images, two similarities can be computed by the two discriminators: the similarity of the two low-resolution images in the low-resolution data domain from the first discriminator, and the similarity of the two low-resolution images in the high-resolution data domain from the second discriminator. Based on the multiple similarities output by the first discriminator for the multiple sample image pairs and the multiple similarities output by the second discriminator, the correction layer in the image generation model can be trained.

In some embodiments, the first and second discriminators have the same structure. The first discriminator includes six convolutional layers, each with stride 1 and followed by a LeakyReLU (activation function) layer. Except for the first and last layers, the convolutional layers are replaced by the lightweight convolution module to reduce the number of parameters of the first discriminator while maintaining the original performance, thereby reducing the training difficulty of the correction layer and the computing resources consumed when it is used; see the generators, which will not be repeated here. It should be noted that the LeakyReLU layers are not followed by normalization layers.
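A sketch of a discriminator with this shape, assuming PyTorch; the kernel size and channel width are illustrative, and the lightweight replacement of the middle layers is omitted for brevity:

```python
import torch.nn as nn

def make_discriminator(c_in=1, width=32):
    # Six stride-1 3D convolutional layers, each (except the last) followed
    # by LeakyReLU, with no normalization layers, in PatchGAN style.
    layers, c = [], c_in
    for i in range(6):
        c_next = 1 if i == 5 else width
        layers.append(nn.Conv3d(c, c_next, kernel_size=3, stride=1, padding=1))
        if i < 5:
            layers.append(nn.LeakyReLU(0.2))
        c = c_next
    return nn.Sequential(*layers)
```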
It should be noted that when training the correction layer, the Wasserstein distance and a gradient penalty are used to prevent gradient explosion or vanishing. The adversarial loss of the second generator and the second discriminator is given by the following formula (2):

$$L_{adv}(G_Y, D_Y, X_{LR}, Y_{LR}) = \mathbb{E}_{x \sim P(X_{LR})}\big[D_Y(G_Y(x_{LR}))\big] - \mathbb{E}_{y \sim P(Y_{LR})}\big[D_Y(y_{LR})\big] + \lambda\, \mathbb{E}_{x \sim P(penalty)}\big[\big(\|\nabla_x D_Y(x)\|_2 - 1\big)^2\big] \tag{2}$$

where $L_{adv}(G_Y, D_Y, X_{LR}, Y_{LR})$ denotes the adversarial loss of the second generator and the second discriminator; $\mathbb{E}_{x \sim P(X_{LR})}$ denotes that $x$ follows the data distribution $P(X_{LR})$, where $P(X_{LR})$ denotes the data distribution of $x_{LR}$ in image type $X_{LR}$; $\mathbb{E}_{y \sim P(Y_{LR})}$ denotes that $y$ follows the data distribution $P(Y_{LR})$, where $P(Y_{LR})$ denotes the data distribution of $y_{LR}$ in image type $Y_{LR}$; $\lambda$ denotes a weight parameter; $\mathbb{E}_{x \sim P(penalty)}$ denotes that $x$ follows the data distribution $P(penalty)$, where $P(penalty)$ denotes the gradient penalty distribution of $x_{LR}$; $\nabla$ denotes the gradient; and $D_Y(\cdot)$ denotes the second discriminator. In one example, image type $X_{LR}$ denotes the first image type, i.e., the low-resolution data domain, and image type $Y_{LR}$ denotes the second image type, i.e., the high-resolution data domain.
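A sketch of this adversarial objective with gradient penalty, assuming PyTorch; the penalty weight `lam=10.0` is a common default for WGAN-GP and is not specified in the description:

```python
import torch

def gradient_penalty(critic, real, fake):
    # Penalize deviation of the critic's gradient norm from 1 on samples
    # interpolated between real and generated volumes.
    eps = torch.rand(real.size(0), 1, 1, 1, 1, device=real.device)
    mix = (eps * real + (1 - eps) * fake).requires_grad_(True)
    grad = torch.autograd.grad(critic(mix).sum(), mix, create_graph=True)[0]
    return ((grad.flatten(1).norm(2, dim=1) - 1) ** 2).mean()

def adv_loss(d_y, g_y, x_lr, y_lr, lam=10.0):
    # Critic-side form of L_adv(G_Y, D_Y, X_LR, Y_LR) in formula (2).
    fake = g_y(x_lr)
    return (d_y(fake).mean() - d_y(y_lr).mean()
            + lam * gradient_penalty(d_y, y_lr, fake))
```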
It should be noted that, assuming the correction layer can convert $y_{LR}$ to image type $X_{LR}$ and then back to image type $Y_{LR}$, and likewise convert $x_{LR}$ to image type $Y_{LR}$ and then back to image type $X_{LR}$, the cycle consistency loss is given by the following formula (3):

$$L_{cyc}(G_X, G_Y) = \big\| \tilde{y}_{LR} - y_{LR} \big\| \tag{3}$$

where $L_{cyc}(G_X, G_Y)$ denotes the cycle consistency loss, $G_X(\cdot)$ denotes the first generator, $G_Y(\cdot)$ denotes the second generator, $\tilde{y}_{LR}$ denotes the third sample image, and $y_{LR}$ denotes the fourth sample image. The third sample image is the image obtained by feeding the fourth sample image through the first generator and then the second generator, i.e., $\tilde{y}_{LR} = G_Y(G_X(y_{LR}))$.
It should be noted that an identity loss is also introduced during training to preserve the intrinsic contrast and brightness of the second generator. The identity loss is given by the following formula (4):

$$L_{idt}(G_Y) = \big\| G_Y(y_{LR}) - y_{LR} \big\| \tag{4}$$

where $L_{idt}(G_Y)$ denotes the identity loss of $G_Y(\cdot)$, $G_Y(\cdot)$ denotes the second generator, and $y_{LR}$ denotes the fourth sample image.
It should be noted that during training, the L1 norm is applied to $L_{cyc}(G_X, G_Y)$ and $L_{idt}(G_Y)$ to obtain the first training loss. The first training loss is given by the following formula (5):

$$L_{dom} = L_{adv}(G_Y, D_Y, X_{LR}, Y_{LR}) + L_{adv}(G_X, D_X, Y_{LR}, X_{LR}) + \lambda_{cyc}\big(L_{cyc}(G_X, G_Y) + L_{cyc}(G_Y, G_X)\big) + \lambda_{idt}\big(L_{idt}(G_Y) + L_{idt}(G_X)\big) \tag{5}$$

where $L_{dom}$ denotes the first training loss; $L_{adv}(G_Y, D_Y, X_{LR}, Y_{LR})$ denotes the adversarial loss of the second generator and second discriminator; $L_{adv}(G_X, D_X, Y_{LR}, X_{LR})$ denotes the adversarial loss of the first generator and first discriminator; $L_{cyc}(G_X, G_Y)$ and $L_{cyc}(G_Y, G_X)$ denote cycle consistency losses; $L_{idt}(G_Y)$ and $L_{idt}(G_X)$ denote the identity losses of $G_Y(\cdot)$ and $G_X(\cdot)$; and $\lambda_{cyc}$ and $\lambda_{idt}$ denote weight parameters. Optionally, the above weight parameters take the value 1.
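The cycle-consistency and identity terms of formula (5) can be sketched as follows, assuming PyTorch, with the L1 norm applied as stated above and both weights set to 1; the two adversarial terms come from the `adv_loss` sketch above, and forming $L_{idt}(G_X)$ symmetrically on $x_{LR}$ is an assumption:

```python
import torch.nn.functional as F

def cycle_identity_losses(g_x, g_y, x_lr, y_lr, lam_cyc=1.0, lam_idt=1.0):
    # L_cyc terms (formula (3)) and L_idt terms (formula (4)) of L_dom.
    cyc = F.l1_loss(g_y(g_x(y_lr)), y_lr) + F.l1_loss(g_x(g_y(x_lr)), x_lr)
    idt = F.l1_loss(g_y(y_lr), y_lr) + F.l1_loss(g_x(x_lr), x_lr)
    return lam_cyc * cyc + lam_idt * idt
```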
306. The server processes the plurality of first sample images based on the trained correction layer to obtain a plurality of third sample images.

In this embodiment, after training the correction layer, the server processes the plurality of first sample images separately based on the correction layer to obtain the plurality of third sample images needed to train the image generation layer. Based on the plurality of third sample images, the server trains the image generation layer in the image generation model iteratively, where the image generation layer is used to increase image resolution; see steps 307 and 308 below.

For example, the image generation layer is trained with the Adam optimizer. After training the correction layer, the server begins training the image generation layer, using 500 epochs and a mini-batch size of 8.

In some embodiments, the above processing of the first sample images can be expressed by formula (1) above and the following formula (6):

$$\tilde{y}_{LR} = G_Y\big(G_X(y_{LR})\big) \tag{6}$$

where $\tilde{y}_{LR}$ denotes the third sample image, $G_Y(\cdot)$ denotes the second generator, $G_X(\cdot)$ denotes the first generator, and $y_{LR}$ denotes the fourth sample image.

In the above process, the resolution of the first sample image is first reduced to obtain the fourth sample image, which is then fed through the first generator and the second generator in turn to obtain the third sample image. In this way, the low-resolution fourth sample image is first converted from the second image type to the first image type by the first generator, and then converted back from the first image type to the second image type by the second generator; that is, the first, fourth, and third sample images are guaranteed to have the same image type (the second image type) and to belong to the same data domain (the high-resolution data domain), but the first sample image is a high-resolution image while the third and fourth sample images are low-resolution images.
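Combining formulas (1) and (6), generating a third sample image can be sketched as follows (reusing the `degrade()` sketch above):

```python
import torch

def make_third_sample(y_hr, g_x, g_y):
    # Third sample = G_Y(G_X(y_LR)), where y_LR = D↓(y_HR + n) from (1).
    y_lr = degrade(y_hr)          # fourth sample image
    with torch.no_grad():         # the correction layer is already trained
        return g_y(g_x(y_lr))     # third sample image, high-resolution domain
```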
307. The server processes the plurality of third sample images separately through the image generation layer in the image generation model to obtain a plurality of result images.

In this embodiment, for any third sample image, the server can process the third sample image through the image generation layer to increase its resolution and obtain the result image associated with it.

In some embodiments, the architecture of the image generation layer is Volumnet, which achieves a deeper and wider network structure with a parallel convolutional layer layout. The image generation layer contains 9 convolutional layers and 1 upsampling layer based on pixel shuffle, with a fully connected layer connected after the upsampling layer.

In some embodiments, the above result image can be expressed as $U(\tilde{y}_{LR})$, where $\tilde{y}_{LR}$ denotes the third sample image and $U(\cdot)$ denotes the image generation layer.
308. The server trains the image generation layer based on the differences between the plurality of result images and the associated plurality of first sample images.

In this embodiment, since a result image is generated from a third sample image, which in turn is obtained from a first sample image, the server can use the first sample image as supervision information and adjust the parameters of the image generation layer according to the difference between the result image associated with the first sample image and the first sample image itself.

Put another way, since the image generation layer can increase the resolution of the input image, feeding each low-resolution third sample image into the image generation layer yields a high-resolution result image, which represents the high-resolution reconstruction of the fourth sample image. By comparing the difference between the high-resolution reconstructed result image and the original high-resolution first sample image, iterative training of the image generation layer can be achieved, with the training objective of minimizing the difference between each result image and the first sample image associated with it.

It should be noted that the solution provided by the embodiments of this application can be applied both to training models that generate three-dimensional images and to models that generate two-dimensional images, which is not limited by the embodiments of this application.
It should be noted that, to make the solution provided by the embodiments of this application easier to understand, it is further described below. A batch of high-resolution images with similar noise and blur kernel distributions is taken as the plurality of first sample images, a first sample image being denoted $y_{HR}$, with $y_{HR} \in Y_{HR}$ indicating that the first sample image $y_{HR}$ belongs to image type $Y_{HR}$. Another batch of low-resolution images is taken as the plurality of second sample images, a second sample image being denoted $x_{LR}$, with $x_{LR} \in X_{LR}$ indicating that the second sample image $x_{LR}$ belongs to image type $X_{LR}$. Then, the first sample image is processed by the correction layer to obtain the fourth sample image $y_{LR}$, and it is assumed that $y_{LR} \in Y_{LR}$, i.e., the fourth sample image belongs to image type $Y_{LR}$. The training objective of the correction layer is to correct between $x_{LR}$ and $y_{LR}$. FIG. 4 is a schematic structural diagram of an image generation model according to an embodiment of this application; the image generation model includes a correction layer and an image generation layer, the correction layer including a first generator $G_X(\cdot)$, a second generator $G_Y(\cdot)$, a first discriminator $D_X(\cdot)$, and a second discriminator $D_Y(\cdot)$. First, the server trains the correction layer in the image generation model based on the plurality of fourth sample images and the plurality of second sample images. Taking one input as an example, one sample image pair, comprising a first sample image and a second sample image, is input to the image generation model each time; the resolution of the first sample image is reduced to obtain the fourth sample image, and the fourth sample image and the second sample image are input into the correction layer. The input fourth sample image is processed by the first generator to obtain the first intermediate image $G_X(y_{LR})$; the input second sample image is processed by the second generator to obtain the second intermediate image $G_Y(x_{LR})$. Then, the first intermediate image and the second sample image are discriminated by the first discriminator to obtain $D_X(G_X(y_{LR}), x_{LR})$, and the second intermediate image and the fourth sample image are discriminated by the second discriminator to obtain $D_Y(G_Y(x_{LR}), y_{LR})$. Finally, the parameters of the first and second generators in the correction layer are adjusted based on $D_X(G_X(y_{LR}), x_{LR})$ and $D_Y(G_Y(x_{LR}), y_{LR})$. Next, after the correction layer is trained, the plurality of first sample images are processed through the correction layer to obtain the plurality of third sample images, a third sample image being denoted $\tilde{y}_{LR}$. Taking one third sample image as an example, the third sample image is processed through the image generation layer to obtain the result image $U(\tilde{y}_{LR})$, and the parameters of the image generation layer are adjusted based on $y_{HR}$.
It should be noted that the training loss of the image generation layer is the second training loss, which can be obtained by the following formula (7):

$$L_{ups} = \big\| U(\tilde{y}_{LR}) - y_{HR} \big\| \tag{7}$$

where $L_{ups}$ denotes the second training loss, $U(\tilde{y}_{LR})$ denotes the result image, and $y_{HR}$ denotes the first sample image.
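One training update of the image generation layer under formula (7) can be sketched as follows; the description writes only a norm, so the use of L1 here is an assumption:

```python
import torch.nn.functional as F

def generation_layer_step(u, y3, y_hr, opt):
    # Minimize || U(y3) - y_HR ||: compare the reconstructed result image
    # with the original high-resolution first sample image.
    opt.zero_grad()
    loss = F.l1_loss(u(y3), y_hr)  # L1 as an assumed instantiation of ||.||
    loss.backward()
    opt.step()
    return loss.item()
```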
It should be noted that when training the image generation layer, a second sample image can also be input and processed based on the trained correction layer to obtain $G_Y(x_{LR})$. $G_Y(x_{LR})$ is then processed through the image generation layer to obtain $U(G_Y(x_{LR}))$, which denotes the image generated based on $x_{LR}$. Since $x_{LR}$ has no associated high-resolution image as a reference, the difference between $U(G_Y(x_{LR}))$ and $y_{HR}$ is used to indicate the performance of the model.
It should be noted that, to further illustrate the effect of the image generation model trained by the solution provided by the embodiments of this application, see FIG. 5, which is a diagram showing results according to an embodiment of this application. The evaluation metrics used are the normalized mean absolute error (Mean Absolute Error, MAE), the peak signal to noise ratio (Peak Signal to Noise Ratio, PSNR), and the structural similarity (Structural Similarity, SSIM), which are used to evaluate the difference between the first sample images and the result images. The smaller the normalized MAE and the larger the PSNR and SSIM, the higher the image quality. As shown in FIG. 5, part (a) exemplarily shows the result images output when four different first sample images are input; by comparison, the normalized MAE is 0.015±0.012, the PSNR is 34.793±2.246, and the SSIM is 0.919±0.005, where ± denotes the standard deviation. Part (b) of FIG. 5 exemplarily shows the result images output when four different second sample images are input.
It should be noted that, to verify the effect of the image generation model trained by the solution provided by the embodiments of this application, a controlled experiment is designed to compare the solution provided by this application with existing solutions. The experiment uses the Bicubic-0.75 data set processed by bicubic interpolation and the Nearest-0.25 data set processed by nearest-neighbor interpolation, and the conventional solutions are denoted Solution 1 through Solution 7. The evaluation metrics are the normalized MAE, PSNR, and SSIM. See FIG. 6 for the experimental results, which is a diagram of experimental results according to an embodiment of this application. As shown in FIG. 6, the input image and the real image form a matched image pair, the input image being a low-resolution image and the real image the associated high-resolution image. The experimental results are shown in Table 1, where the numbers in parentheses denote standard deviations. Table 1 shows that the solution provided by this application outperforms the conventional solutions on both data sets, with high applicability and stability.

Table 1
This application provides a training solution for an image generation model. By training the correction layer based on the first and second sample images, the correction layer can convert images to a preset image type. Since each third sample image is obtained by processing a first sample image with the correction layer, using the higher-resolution first sample image associated with each third sample image as supervision information and each third sample image as the input image to train the image generation layer can improve the image reconstruction capability of the image generation layer. Moreover, because the image generation model can convert the image type of sample images, model training can be completed without using training and test data sets of the same image type, which overcomes the lack of training data of the same image type, reduces the training cost of the image generation model, and improves the efficiency of generating higher-resolution images. In addition, the solution provided by the embodiments of this application integrates lightweight modules into the framework of the image generation model, reducing the number of parameters of the image generation model as well as its training difficulty and computing resource consumption.
FIG. 7 is a flowchart of an image generation method according to an embodiment of this application. As shown in FIG. 7, the image generation method is implemented based on the above image generation model; in this embodiment, the computer device being a terminal is taken as an example. The method includes the following steps:

701. The terminal processes a first image through the correction layer in the image generation model to obtain a second image, where the correction layer is used to convert the image type of the input image, and the second image belongs to a target image type.

In this embodiment, the terminal is the terminal 101 shown in FIG. 1 and can obtain the trained image generation model from the server. The first image is a two-dimensional or three-dimensional image to be processed. The terminal can input the first image into the image generation model, and the correction layer in the image generation model can process the first image, converting its original image type to the target image type to obtain the second image.

702. The terminal processes the second image through the image generation layer in the image generation model to obtain a third image, where the image generation layer is used to increase the resolution of the input image.

In this embodiment, the image generation layer in the image generation model can process the second image output by the correction layer by applying convolution, upsampling, and a fully connected layer in sequence to the input image, so as to increase the resolution of the second image and obtain the third image.

In some embodiments, the image generation method provided by the embodiments of this application can also be applied in a server. Accordingly, a first terminal sends an image generation request to the server; the server processes the first image carried in the image generation request based on the image generation model to obtain a third image; and the server sends the third image to a second terminal, which displays it. The second terminal can be the same as or different from the first terminal.

For example, see FIG. 8, which is a schematic diagram of an image generation procedure according to an embodiment of this application. The first terminal 801 sends the first image to the server 802, and the server 802 sends the third image to the second terminal 803.

This application provides an image generation solution. By training the correction layer based on the first and second sample images, the correction layer can convert images to a preset image type. Since each third sample image is obtained by processing a first sample image with the correction layer, using the higher-resolution first sample image associated with each third sample image as supervision information and each third sample image as the input image to train the image generation layer can improve the image reconstruction capability of the image generation layer. Moreover, because the image generation model can convert the image type of sample images, model training can be completed without using training and test data sets of the same image type, which overcomes the lack of training data of the same image type, reduces the training cost of the image generation model, and improves the efficiency of generating higher-resolution images. In addition, the solution provided by the embodiments of this application integrates lightweight modules into the framework of the image generation model, reducing the number of parameters of the image generation model as well as its training difficulty and computing resource consumption. Processing the input image through the image generation model then yields an image of higher resolution; the processing is convenient and fast, the generated images are highly accurate, and the efficiency of image generation is improved.
FIG. 9 is a block diagram of a training apparatus for an image generation model according to an embodiment of this application. Referring to FIG. 9, the apparatus includes a first training module 901 and a second training module 902.

The first training module 901 is used to train a correction layer in an image generation model based on a plurality of first sample images and a plurality of second sample images, where the correction layer is used to convert the image type of an input sample image, and the resolution of the first sample images is higher than that of the second sample images.

The second training module 902 is used to process the plurality of first sample images based on the trained correction layer to obtain a plurality of third sample images.

The second training module 902 is further used to train an image generation layer in the image generation model based on the plurality of first sample images and the plurality of third sample images, where the image generation layer is used to increase the resolution of an input sample image.
In some embodiments, FIG. 10 is a block diagram of another training apparatus for an image generation model according to an embodiment of this application. Referring to FIG. 10, the first training module 901 includes:

an image processing unit 9011, used to determine multiple sample image pairs, each including one first sample image and one second sample image, and to process the first sample image in each sample image pair to obtain a fourth sample image whose resolution is lower than that of the first sample image;

a first correction unit 9012, used to process the fourth sample image based on the correction layer in the image generation model to obtain a first intermediate image, where the first intermediate image belongs to a first image type;

a second correction unit 9013, used to process the second sample image in the sample image pair based on the correction layer in the image generation model to obtain a second intermediate image, where the second intermediate image belongs to a second image type;

a training unit 9014, used to train the correction layer in the image generation model based on the first intermediate image, second sample image, fourth sample image, and second intermediate image associated with each sample image pair.
In some embodiments, the image processing unit 9011 is used to add Gaussian noise to the first sample image, and to reduce the resolution of the first sample image after the Gaussian noise is added, so as to obtain the fourth sample image associated with the first sample image.

In some embodiments, the correction layer in the image generation model includes a first generator for converting images from the second image type to the first image type; the first correction unit 9012 is used to convolve the fourth sample image through the first generator to obtain the first intermediate image associated with the fourth sample image.

In some embodiments, the correction layer in the image generation model includes a second generator for converting images from the first image type to the second image type; the second correction unit 9013 is used to convolve the second sample image through the second generator to obtain the second intermediate image associated with the second sample image.

In some embodiments, the correction layer in the image generation model includes a first discriminator and a second discriminator, both of which are used to determine the similarity between images. The training unit 9014 is used to discriminate the first intermediate image and the second sample image associated with the first intermediate image through the first discriminator, obtaining the similarity between them; to discriminate the second intermediate image and the fourth sample image associated with the second intermediate image through the second discriminator, obtaining the similarity between them; and to train the correction layer in the image generation model through the multiple similarities output by the first discriminator and the multiple similarities output by the second discriminator.

In some embodiments, the second training module 902 is used to process each third sample image separately through the image generation layer in the image generation model to obtain a result image, and to train the image generation layer based on the difference between each result image and the first sample image associated with that result image.
This application provides a training solution for an image generation model. By training the correction layer based on the first and second sample images, the correction layer can convert images to a preset image type. Since each third sample image is obtained by processing a first sample image with the correction layer, using the higher-resolution first sample image associated with each third sample image as supervision information and each third sample image as the input image to train the image generation layer can improve the image reconstruction capability of the image generation layer. Moreover, because the image generation model can convert the image type of sample images, model training can be completed without using training and test data sets of the same image type, which overcomes the lack of training data of the same image type, reduces the training cost of the image generation model, and improves the efficiency of generating higher-resolution images. In addition, the solution provided by the embodiments of this application integrates lightweight modules into the framework of the image generation model, reducing the number of parameters of the image generation model as well as its training difficulty and computing resource consumption.

It should be noted that when the training apparatus for an image generation model provided in the above embodiments trains the model, the division into the above functional modules is merely used as an example for illustration; in practical applications, the above functions can be allocated to different functional modules as needed, i.e., the internal structure of the apparatus is divided into different functional modules to complete all or part of the functions described above. In addition, the training apparatus for an image generation model provided in the above embodiments and the embodiments of the training method for an image generation model belong to the same concept; for the specific implementation process, see the method embodiments, which will not be repeated here.
FIG. 11 is a block diagram of an image generation apparatus according to an embodiment of this application. Referring to FIG. 11, the apparatus includes a correction module 1101 and an image generation module 1102.

The correction module 1101 is used to process a first image through the correction layer in the image generation model to obtain a second image, where the correction layer is used to convert the image type of the input image, and the second image belongs to a target image type.

The image generation module 1102 is used to process the second image through the image generation layer in the image generation model to obtain a third image, where the image generation layer is used to increase the resolution of the input image.

It should be noted that when the image generation apparatus provided in the above embodiments generates images, the division into the above functional modules is merely used as an example for illustration; in practical applications, the above functions can be allocated to different functional modules as needed, i.e., the internal structure of the apparatus is divided into different functional modules to complete all or part of the functions described above. In addition, the image generation apparatus provided in the above embodiments and the image generation method embodiments belong to the same concept; for the specific implementation process, see the method embodiments, which will not be repeated here.
In the embodiments of this application, the computer device can be configured as a terminal or a server. When the computer device is configured as a terminal, the terminal can serve as the execution subject to implement the technical solutions provided by the embodiments of this application; when the computer device is configured as a server, the server can serve as the execution subject to implement the technical solutions provided by the embodiments of this application, or the technical solutions provided by this application can be implemented through interaction between the terminal and the server, which is not limited by the embodiments of this application.

When the computer device is configured as a terminal, FIG. 12 is a structural block diagram of a terminal 1200 according to an embodiment of this application. The terminal 1200 may be called a user device, a portable terminal, a laptop terminal, a desktop terminal, or by other names.
Generally, the terminal 1200 includes a processor 1201 and a memory 1202.

The processor 1201 includes one or more processing cores, such as a 4-core processor or an 8-core processor. The processor 1201 is implemented in at least one hardware form among DSP (Digital Signal Processing), FPGA (Field-Programmable Gate Array), and PLA (Programmable Logic Array). In some embodiments, the processor 1201 includes a main processor and a coprocessor; the main processor is a processor for processing data in the awake state, also called a CPU (Central Processing Unit), and the coprocessor is a low-power processor for processing data in the standby state. In some embodiments, the processor 1201 is integrated with a GPU (Graphics Processing Unit), which is responsible for rendering and drawing the content to be displayed on the display screen. In some embodiments, the processor 1201 also includes an AI (Artificial Intelligence) processor for processing computing operations related to machine learning.

The memory 1202 includes one or more computer-readable storage media, which are non-transitory. The memory 1202 may also include high-speed random access memory and non-volatile memory, such as one or more disk storage devices or flash memory storage devices. In some embodiments, the non-transitory computer-readable storage medium in the memory 1202 is used to store at least one computer program, which is executed by the processor 1201 to implement the training method for an image generation model or the image generation method provided by the method embodiments in this application.

In some embodiments, the terminal 1200 optionally further includes a peripheral device interface 1203 and at least one peripheral device. The processor 1201, the memory 1202, and the peripheral device interface 1203 are connected through a bus or signal line. Each peripheral device is connected to the peripheral device interface 1203 through a bus, signal line, or circuit board. In some embodiments, the peripheral device includes at least one of a display screen 1205 and a camera assembly 1206.

The peripheral device interface 1203 is used to connect at least one I/O (Input/Output)-related peripheral device to the processor 1201 and the memory 1202. In some embodiments, the processor 1201, the memory 1202, and the peripheral device interface 1203 are integrated on the same chip or circuit board; in some other embodiments, any one or two of the processor 1201, the memory 1202, and the peripheral device interface 1203 are implemented on a separate chip or circuit board, which is not limited in this embodiment.

The display screen 1205 is used to display a UI (User Interface). The UI includes graphics, text, icons, video, and any combination thereof. When the display screen 1205 is a touch display screen, it also has the ability to collect touch signals on or above its surface; the touch signal is input to the processor 1201 as a control signal for processing. At this point, the display screen 1205 is also used to provide virtual buttons and/or a virtual keyboard, also called soft buttons and/or a soft keyboard. In some embodiments, there is one display screen 1205, provided on the front panel of the terminal 1200; in other embodiments, there are at least two display screens 1205, respectively arranged on different surfaces of the terminal 1200 or in a folding design; in still other embodiments, the display screen 1205 is a flexible display screen arranged on a curved or folding surface of the terminal 1200. The display screen 1205 can even be set in a non-rectangular irregular shape, that is, a special-shaped screen. The display screen 1205 is made of materials such as LCD (Liquid Crystal Display) and OLED (Organic Light-Emitting Diode).

The camera assembly 1206 is used to capture images or video. In some embodiments, the camera assembly 1206 includes a front camera and a rear camera. Generally, the front camera is provided on the front panel of the terminal and the rear camera on its back. In some embodiments, there are at least two rear cameras, each being any one of a main camera, a depth-of-field camera, a wide-angle camera, or a telephoto camera, so that the main camera and the depth-of-field camera can be fused to realize the background blur function, and the main camera and the wide-angle camera can be fused to realize panoramic shooting, VR (Virtual Reality) shooting, or other fused shooting functions. In some embodiments, the camera assembly 1206 also includes a flash, which is a single color temperature flash or a dual color temperature flash; a dual color temperature flash is a combination of a warm-light flash and a cold-light flash, used for light compensation at different color temperatures.

A person skilled in the art can understand that the structure shown in FIG. 12 does not constitute a limitation on the terminal 1200, which can include more or fewer components than shown, combine certain components, or adopt a different component arrangement.
When the computer device is configured as a server, FIG. 13 is a schematic structural diagram of a server according to an embodiment of this application. The server 1300 may vary greatly due to different configurations or performance, and can include one or more processors 1301 and one or more memories 1302, where at least one computer program is stored in the memory 1302 and is loaded and executed by the processor 1301 to implement the training method for an image generation model or the image generation method provided by the above method embodiments. Of course, the server also has components such as a wired or wireless network interface, a keyboard, and an input/output interface so as to perform input and output, and further includes other components for realizing device functions, which will not be described here.
Embodiments of this application also provide a computer-readable storage medium storing at least one piece of computer program, which is loaded and executed by a processor of a computer device to implement the training method for an image generation model or the image generation method of the above embodiments. For example, the computer-readable storage medium is a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), a compact disc read-only memory (Compact Disc Read-Only Memory, CD-ROM), a magnetic tape, a floppy disk, an optical data storage device, or the like.

In some embodiments, the computer program involved in the embodiments of this application may be deployed and executed on one computer device, or executed on multiple computer devices located at one site, or executed on multiple computer devices distributed across multiple sites and interconnected through a communication network; multiple computer devices distributed across multiple sites and interconnected through a communication network form a blockchain system.

Embodiments of this application also provide a computer program product, which includes computer program code stored in a computer-readable storage medium. A processor of a computer device reads the computer program code from the computer-readable storage medium and executes it, so that the computer device performs the training method for an image generation model or the image generation method provided in the above various optional implementations.

Those of ordinary skill in the art can understand that all or part of the steps of the above embodiments can be implemented by hardware, or by a program instructing the relevant hardware; the program can be stored in a computer-readable storage medium, and the above-mentioned storage medium can be a read-only memory, a magnetic disk, an optical disc, or the like.

The above are merely optional embodiments of this application and are not intended to limit this application. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of this application shall fall within the protection scope of this application.

Claims (13)

  1. A training method for an image generation model, executed by a server, the method comprising:
    training a correction layer in an image generation model based on a plurality of first sample images and a plurality of second sample images, wherein the correction layer is used to convert an image type of an input sample image, and a resolution of the first sample images is higher than that of the second sample images;
    processing the plurality of first sample images based on the trained correction layer to obtain a plurality of third sample images;
    training an image generation layer in the image generation model based on the plurality of first sample images and the plurality of third sample images, wherein the image generation layer is used to increase the resolution of an input sample image.
  2. The method according to claim 1, wherein the training a correction layer in an image generation model based on a plurality of first sample images and a plurality of second sample images comprises:
    determining a plurality of sample image pairs, each of the sample image pairs comprising one first sample image and one second sample image;
    processing the first sample image in each of the sample image pairs to obtain a fourth sample image, a resolution of the fourth sample image being lower than that of the first sample image;
    processing the fourth sample image based on the correction layer in the image generation model to obtain a first intermediate image, the first intermediate image belonging to a first image type;
    processing the second sample image in the sample image pair based on the correction layer in the image generation model to obtain a second intermediate image, the second intermediate image belonging to a second image type;
    training the correction layer in the image generation model based on the first intermediate image, the second sample image, the fourth sample image, and the second intermediate image associated with each sample image pair.
  3. The method according to claim 2, wherein the processing the first sample image in each of the sample image pairs to obtain a fourth sample image comprises:
    adding Gaussian noise to the first sample image;
    reducing the resolution of the first sample image to which the Gaussian noise has been added, to obtain the fourth sample image.
  4. The method according to claim 2, wherein the correction layer in the image generation model comprises a first generator, the first generator being used to convert an image from the second image type to the first image type;
    the processing the fourth sample image based on the correction layer in the image generation model to obtain a first intermediate image comprises:
    convolving the fourth sample image through the first generator to obtain the first intermediate image.
  5. The method according to claim 2, wherein the correction layer in the image generation model comprises a second generator, the second generator being used to convert an image from the first image type to the second image type;
    the processing the second sample image in the sample image pair based on the correction layer in the image generation model to obtain a second intermediate image comprises:
    convolving the second sample image through the second generator to obtain the second intermediate image.
  6. The method according to claim 2, wherein the correction layer in the image generation model comprises a first discriminator and a second discriminator, both the first discriminator and the second discriminator being used to determine the similarity between images;
    the training the correction layer in the image generation model based on the first intermediate image, the second sample image, the fourth sample image, and the second intermediate image associated with each sample image pair comprises:
    discriminating the first intermediate image and the second sample image through the first discriminator to obtain a similarity between the first intermediate image and the second sample image;
    discriminating the second intermediate image and the fourth sample image through the second discriminator to obtain a similarity between the second intermediate image and the fourth sample image;
    training the correction layer in the image generation model through a plurality of similarities output by the first discriminator and a plurality of similarities output by the second discriminator.
  7. The method according to claim 1, wherein the training an image generation layer in the image generation model based on the plurality of first sample images and the plurality of third sample images comprises:
    processing each of the third sample images through the image generation layer in the image generation model to obtain a result image;
    training the image generation layer based on a difference between each result image and the first sample image associated with the result image.
  8. An image generation method, executed by a terminal, the method comprising:
    processing a first image through a correction layer in an image generation model to obtain a second image, the correction layer being used to convert an image type of an input image, the second image belonging to a target image type;
    processing the second image through an image generation layer in the image generation model to obtain a third image, the image generation layer being used to increase the resolution of an input image.
  9. A training apparatus for an image generation model, the apparatus comprising:
    a first training module, configured to train a correction layer in an image generation model based on a plurality of first sample images and a plurality of second sample images, the correction layer being used to convert an image type to which an image belongs, a resolution of the first sample images being higher than that of the second sample images;
    a second training module, configured to process the plurality of first sample images based on the trained correction layer to obtain a plurality of third sample images;
    the second training module being further configured to train an image generation layer in the image generation model based on the plurality of first sample images and the plurality of third sample images, the image generation layer being used to increase image resolution.
  10. An image generation apparatus, the apparatus comprising:
    a correction module, configured to process a first image through a correction layer in an image generation model to obtain a second image, the correction layer being used to convert an image type of an input image, the second image belonging to a target image type;
    an image generation module, configured to process the second image through an image generation layer in the image generation model to obtain a third image, the image generation layer being used to increase the resolution of an input image.
  11. A computer device, comprising a processor and a memory, the memory being used to store at least one piece of computer program, the at least one piece of computer program being loaded by the processor to execute the training method for an image generation model according to any one of claims 1 to 7, or the at least one piece of computer program being loaded by the processor to execute the image generation method according to claim 8.
  12. A computer-readable storage medium, used to store at least one piece of computer program, the at least one piece of computer program being used to execute the training method for an image generation model according to any one of claims 1 to 7, or the at least one piece of computer program being used to execute the image generation method according to claim 8.
  13. A computer program product, comprising a computer program, the computer program, when executed by a processor, implementing the training method for an image generation model according to any one of claims 1 to 7, or, the computer program, when executed by a processor, implementing the image generation method according to claim 8.
PCT/CN2023/081897 2022-04-22 2023-03-16 Training method for image generation model, image generation method, apparatus and device WO2023202283A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210431484.XA 2022-04-22 2022-04-22 Training method for image generation model, image generation method, apparatus and device
CN202210431484.X 2022-04-22

Publications (1)

Publication Number Publication Date
WO2023202283A1 true WO2023202283A1 (zh) 2023-10-26

Family

ID=83376807

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/081897 WO2023202283A1 (zh) 2022-04-22 2023-03-16 图像生成模型的训练方法、图像生成方法、装置及设备

Country Status (2)

Country Link
CN (1) CN115131199A (zh)
WO (1) WO2023202283A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118658015A (zh) * 2024-08-20 2024-09-17 江西和壹科技有限公司 Map resource identification method and system based on deep learning, and storage medium

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US12103045B2 (en) * 2015-07-16 2024-10-01 Sortera Technologies, Inc. Removing airbag modules from automotive scrap
US11969764B2 (en) * 2016-07-18 2024-04-30 Sortera Technologies, Inc. Sorting of plastics
CN115131199A (zh) * 2022-04-22 2022-09-30 腾讯医疗健康(深圳)有限公司 Training method for image generation model, image generation method, apparatus and device

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190295302A1 (en) * 2018-03-22 2019-09-26 Northeastern University Segmentation Guided Image Generation With Adversarial Networks
CN111340682A (zh) * 2018-12-19 2020-06-26 通用电气公司 Method and system for converting medical images into images of different styles using deep neural networks
CN115131199A (zh) * 2022-04-22 2022-09-30 腾讯医疗健康(深圳)有限公司 Training method for image generation model, image generation method, apparatus and device

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190295302A1 (en) * 2018-03-22 2019-09-26 Northeastern University Segmentation Guided Image Generation With Adversarial Networks
CN111340682A (zh) * 2018-12-19 2020-06-26 通用电气公司 Method and system for converting medical images into images of different styles using deep neural networks
CN115131199A (zh) * 2022-04-22 2022-09-30 腾讯医疗健康(深圳)有限公司 Training method for image generation model, image generation method, apparatus and device

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118658015A (zh) * 2024-08-20 2024-09-17 江西和壹科技有限公司 Map resource identification method and system based on deep learning, and storage medium

Also Published As

Publication number Publication date
CN115131199A (zh) 2022-09-30

Similar Documents

Publication Publication Date Title
WO2023202283A1 (zh) Training method for image generation model, image generation method, apparatus and device
US20210004935A1 (en) Image Super-Resolution Method and Apparatus
KR102663519B1 (ko) 교차 도메인 이미지 변환 기법
US11151780B2 (en) Lighting estimation using an input image and depth map
US11080553B2 (en) Image search method and apparatus
CN110136056A (zh) 图像超分辨率重建的方法和装置
US20230019972A1 (en) Systems and methods of contrastive point completion with fine-to-coarse refinement
US20220130025A1 (en) Picture optimization method device, terminal and corresponding storage medium
CN111667459B (zh) 一种基于3d可变卷积和时序特征融合的医学征象检测方法、系统、终端及存储介质
CN110852940A (zh) 一种图像处理方法及相关设备
CN113256529A (zh) 图像处理方法、装置、计算机设备及存储介质
CN114519731A (zh) 深度图像补全的方法和装置
Zhang et al. Unsupervised intrinsic image decomposition using internal self-similarity cues
Lu et al. Parallel region-based deep residual networks for face hallucination
CN116246026B (zh) 三维重建模型的训练方法、三维场景渲染方法及装置
WO2023045724A1 (zh) 图像处理方法、电子设备、存储介质及程序产品
Zhao et al. A survey for light field super-resolution
CN116912467A (zh) 图像拼接方法、装置、设备及存储介质
Shao et al. Two-stream coupling network with bidirectional interaction between structure and texture for image inpainting
Madeira et al. Neural Colour Correction for Indoor 3D Reconstruction Using RGB-D Data
CN111178501A (zh) 双循环对抗网络架构的优化方法、系统、电子设备及装置
Yang et al. An end‐to‐end perceptual enhancement method for UHD portrait images
US20240223742A1 (en) Depth-varying reprojection passthrough in video see-through (vst) extended reality (xr)
KR102690903B1 (ko) 선택적 초해상화에 기반한 다중시점 리얼타임 메타버스 컨텐츠데이터를 구축하는 방법 및 시스템
CN116681818B (zh) 新视角重建方法、新视角重建网络的训练方法及装置

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23790944

Country of ref document: EP

Kind code of ref document: A1