WO2024049441A1 - Latent prior embedded network for restoration and enhancement of images - Google Patents

Latent prior embedded network for restoration and enhancement of images

Info

Publication number
WO2024049441A1
WO2024049441A1 (PCT/US2022/042441)
Authority
WO
WIPO (PCT)
Prior art keywords
vector
latent
network
lpen
computing system
Prior art date
Application number
PCT/US2022/042441
Other languages
English (en)
Inventor
Jenhao Hsiao
Original Assignee
Innopeak Technology, Inc.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Innopeak Technology, Inc. filed Critical Innopeak Technology, Inc.
Priority to PCT/US2022/042441 priority Critical patent/WO2024049441A1/fr
Publication of WO2024049441A1 publication Critical patent/WO2024049441A1/fr

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 Image enhancement or restoration
    • G06T5/70 Denoising; Smoothing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20084 Artificial neural networks [ANN]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30196 Human being; Person
    • G06T2207/30201 Face

Definitions

  • the techniques of this disclosure generally relate to tools and techniques for implementing image restoration and enhancement, and, more particularly, to methods, systems, and apparatuses for implementing latent prior embedded network for restoration and enhancement of images (e.g., facial image, or the like).
  • a method may be provided for restoring and enhancing low quality images.
  • the method may comprise: mapping, using a computing system and an embedding network ("Network M”) that was previously trained, an input image into a first vector, the input image being a low quality image suffering from one or more unknown degradation effects; performing, using the computing system and a latent prior embedded network (“LPEN”) that was previously trained, reverse diffusion on the first vector to produce a second vector, based on a trained diffusion model (“DM”) of the LPEN, the second vector being a denoised version of the first vector; and decoding, using the computing system and a trained decoder of the LPEN, the second vector to produce an output image corresponding to the input image, the output image being a restored and enhanced version of the input image.
  • a system may be provided for restoring and enhancing low quality images.
  • the system may comprise a computing system, which may comprise at least one first processor and a first non-transitory computer readable medium communicatively coupled to the at least one first processor.
  • the first non-transitory computer readable medium may have stored thereon computer software comprising a first set of instructions that, when executed by the at least one first processor, causes the computing system to: map, using an embedding network ("Network M”) that was previously trained, an input image into a first vector, the input image being a low quality image suffering from one or more unknown degradation effects; perform, using a latent prior embedded network (“LPEN”) that was previously trained, reverse diffusion on the first vector to produce a second vector, based on a trained diffusion model (“DM”) of the LPEN, the second vector being a denoised version of the first vector; and decode, using a trained decoder of the LPEN, the second vector to produce an output image corresponding to the input image, the output image being a restored and enhanced version of the input image.
  • a method may be provided for training models of an artificial intelligence (“AI") system to restore and enhance low quality images.
  • the method may comprise: training, using a computing system, a latent prior embedded network (“LPEN”) of the AI system; and after training the LPEN, training, using the computing system, an embedding network (“Network M”).
  • the method may comprise training the LPEN by: encoding, using the computing system and an encoder of the LPEN, a first input image to produce a first latent vector, the first input image being a high quality image among a plurality of high quality images; performing, using the computing system and the LPEN, forward diffusion on the first latent vector to produce a second latent vector, based on a diffusion model ("DM") of the LPEN, the second latent vector being a noise-modified version of the first latent vector; performing, using the computing system and the LPEN, reverse diffusion on the second latent vector to produce a third latent vector, based on the DM of the LPEN, the third latent vector being a denoised version of the second latent vector; decoding, using the computing system and a decoder of the LPEN, the third latent vector to produce a first output image; calculating, using the computing system, a first loss value between the first output image and the first input image, based on a first loss function; updating, using the computing system, the DM based at least in part on the calculated first loss value; and repeating, using the computing system, the encoding, the forward diffusion, the reverse diffusion, the decoding, the first loss value calculation, and the updating steps, either for a first predetermined number of iterations or until the calculated first loss value is below a first predetermined threshold value, to train the DM, the encoder, and the decoder of the LPEN.
  • the method may further comprise training the Network M by: mapping, using the computing system and the Network M, a second input image into a first vector, the second input image being a low quality image suffering from one or more degradation effects, the second input image corresponding to a third image, the third image being a high quality version of the second input image; performing, using the computing system and the trained LPEN, reverse diffusion on the first vector to produce a second vector, based on the trained DM of the LPEN, the second vector being a denoised version of the first vector; decoding, using the computing system and the trained decoder of the LPEN, the second vector to produce a second output image corresponding to the second input image; calculating, using the computing system, a second loss value between the second output image and the third image, based on a second loss function; updating, using the computing system, a model of the Network M based at least in part on the calculated second loss value; and repeating, using the computing system, the mapping, the reverse diffusion, the decoding, the second loss value calculation, and the updating steps, either for a second predetermined number of iterations or until the calculated second loss value is below a second predetermined threshold value, to train the Network M. A sketch of this two-stage training procedure is shown below.
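  • The following PyTorch-style sketch mirrors the two training stages above. It is a minimal sketch under stated assumptions: the module names (encoder, decoder, denoiser, embed_net), the Adam optimizer, the MSE losses, and all hyperparameters are illustrative placeholders rather than the patent's actual implementation, and the forward_diffusion / reverse_diffusion helpers are the standard DDPM routines sketched later in this section.

    import torch
    import torch.nn.functional as F

    def train_lpen(encoder, decoder, denoiser, hq_images, T=50, steps=10000, lr=1e-4):
        """Stage 1: train the latent prior (encoder E, decoder D, and the DM)."""
        params = (list(encoder.parameters()) + list(decoder.parameters())
                  + list(denoiser.parameters()))
        opt = torch.optim.Adam(params, lr=lr)
        for _ in range(steps):
            x = next(hq_images)                          # batch of HQ training images
            z0 = encoder(x)                              # first latent vector x'
            zT = forward_diffusion(z0, T)                # second latent vector (noised)
            z0_hat = reverse_diffusion(denoiser, zT, T)  # third latent vector (denoised)
            x_hat = decoder(z0_hat)                      # first output image
            loss = F.mse_loss(x_hat, x)                  # first loss (reconstruction)
            opt.zero_grad(); loss.backward(); opt.step()
            # Note: backpropagating through the whole reverse chain is memory-hungry;
            # practical diffusion training instead uses the noise-prediction objective.

    def train_network_m(embed_net, denoiser, decoder, lq_hq_pairs, T=50, steps=10000, lr=1e-4):
        """Stage 2: train embedding Network M against the frozen, trained LPEN."""
        for p in list(denoiser.parameters()) + list(decoder.parameters()):
            p.requires_grad_(False)                      # the trained LPEN stays fixed
        opt = torch.optim.Adam(embed_net.parameters(), lr=lr)
        for _ in range(steps):
            y, y_hq = next(lq_hq_pairs)                  # synthesized LQ/HQ image pair
            y_vec = embed_net(y)                         # map LQ image into latent space
            z0_hat = reverse_diffusion(denoiser, y_vec, T)
            y_hat = decoder(z0_hat)                      # second output image
            loss = F.mse_loss(y_hat, y_hq)               # second loss vs. HQ ground truth
            opt.zero_grad(); loss.backward(); opt.step()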
  • a sub-label is associated with a reference numeral to denote one of multiple similar components.
  • Fig.1 is a schematic diagram illustrating a system for implementing latent prior embedded network for restoration and enhancement of images, in accordance with various embodiments.
  • Figs.2A and 2B are schematic block flow diagrams illustrating a non-limiting example of a method for training a latent prior embedded network (“LPEN”) and an embedding network (“Network M”) for implementing latent prior embedded network for restoration and enhancement of images, in accordance with various embodiments.
  • Fig.2C is a schematic block flow diagram illustrating a non-limiting example of a method for implementing latent prior embedded network for restoration and enhancement of images, in accordance with various embodiments.
  • Fig.3 is a set of diagrams illustrating various non-limiting examples comparing low quality facial images and the corresponding high quality facial image results obtained by implementing latent prior embedded network for restoration and enhancement of images, in accordance with various embodiments.
  • Figs.4A-4D are flow diagrams illustrating a method for implementing latent prior embedded network for restoration and enhancement of images, in accordance with various embodiments.
  • Fig.5 is a block diagram illustrating an example of computer or system hardware architecture, in accordance with various embodiments.
  • Various embodiments provide tools and techniques for implementing image restoration and enhancement, and, more particularly, to methods, systems, and apparatuses for implementing latent prior embedded network for restoration and enhancement of images (e.g., facial image, or the like).
  • a latent prior embedded network is provided for restoration and enhancement of images (e.g., facial images, or the like). This allows seamless integration of the advantages of a diffusion model with a DNN.
  • a diffusion model for HQ face or facial image generation is first pre-trained and used as a latent prior network.
  • a DNN embedding network may then be trained by a set of synthesized LQ to HQ face or facial image pairs, during which the DNN learns to map the input degraded image to a desired latent space so that the latent prior network can reproduce the desired HQ face or facial images.
  • Figs.1-5 illustrate some of the features of the method, system, and apparatus for implementing image restoration and enhancement, and, more particularly, to methods, systems, and apparatuses for implementing latent prior embedded network for restoration and enhancement of images (e.g., facial image, or the like), as referred to above.
  • Figs.1-5 refer to examples of different embodiments that include various components and steps, which can be considered alternatives or which can be used in conjunction with one another in the various embodiments.
  • the description of the illustrated methods, systems, and apparatuses shown in Figs.1-5 is provided for purposes of illustration and should not be considered to limit the scope of the different embodiments.
  • Fig.1 is a schematic diagram illustrating a system 100 for implementing latent prior embedded network for restoration and enhancement of images, in accordance with various embodiments.
  • system 100 may comprise user device 105, which may include, but is not limited to, one of a smartphone, a mobile phone, a tablet computer, a laptop computer, a desktop computer, a wearable device, a digital photo album, a media player, a camera, a drone, a surveillance system, or a security system, and/or the like.
  • user device 105 may include, without limitation, at least one of computing system(s) / artificial intelligence (“AI") system(s) 110a, data storage 120a, communications system 125, display screen 130 (optional), or audio playback device 135 (optional), and/or the like.
  • computing system(s) / AI system(s) 110a may include, but is not limited to, at least one of a machine learning (“ML") system 115a, latent prior embedded network (“LPEN”) 115b, or embedding network (“Network M”) 115c, and/or the like.
  • computing system(s) / AI system(s) 110a may further include one or more other processors 115n.
  • computing system(s) 110a and/or at least one other processor 115n may each include, without limitation, at least one of a processor on a user device (e.g., user device 105, or the like), a graphics engine, a graphics rendering engine, a game engine, or a three-dimensional (“3D”) game engine, and/or the like.
  • the AI system(s) 110a and/or ML system 115a may include a neural network including, but not limited to, at least one of a deep learning network, a deep neural network ("DNN"), a generative adversarial network ("GAN"), a feed-forward artificial neural network ("ANN"), a recurrent neural network ("RNN"), a convolutional neural network ("CNN"), or a fully convolutional network ("FCN"), and/or the like.
  • the data storage 120a may include, but is not limited to, at least one of read-only memory (“ROM”), programmable read-only memory (“PROM”), erasable programmable read-only memory (“EPROM”), electrically erasable programmable read-only memory (“EEPROM”), flash memory, other non-volatile memory devices, random-access memory (“RAM”), static random-access memory (“SRAM”), dynamic random-access memory (“DRAM”), synchronous dynamic random-access memory (“SDRAM”), virtual memory, a RAM disk, or other volatile memory devices, non-volatile RAM devices, and/or the like.
  • the communications system 125 may include wireless communications devices capable of communicating using protocols including, but not limited to, at least one of Bluetooth TM communications protocol, Wi-Fi ® communications protocol, or other 802.11 suite of communications protocols, ZigBee ® communications protocol, Z-wave ® communications protocol, or other 802.15.4 suite of communications protocols, cellular communications protocol (e.g., 3G, 4G, 4G LTE, 5G, etc.), or other suitable communications protocols, and/or the like.
  • Some user devices may each include at least one integrated display screen 130 (in some cases, including a non-touchscreen display screen(s), while, in other cases, including a touchscreen display screen(s), and, in still other cases, including a combination of at least one non-touchscreen display screen and at least one touchscreen display screen) and at least one integrated audio playback device 135 (e.g., built-in speakers or the like).
  • Some user devices may each include at least one external display screen or monitor (not shown) (which may be a non-touchscreen display device or a touchscreen display device, or the like) and one of at least one integrated audio playback device 135 (e.g., built-in speakers, etc.) and/or at least one external audio playback device (not shown; e.g., external or peripheral speakers, wired earphones, wired earbuds, wired headphones, wireless earphones, wireless earbuds, wireless headphones, or the like).
  • System 100 may further comprise one or more remote computing systems 110b (and corresponding database(s) 120b), one or more content sources 140 (and corresponding database(s) 140a), and a content distribution system 145 (and corresponding database(s) 145a), and/or the like, that each communicatively couples with user device 105 via network(s) 150 (and via communications system 125) to provide low quality ("LQ") images or image data 160 for the computing system(s) / AI system(s) 110a to analyze or process, as described in detail below.
  • the one or more remote computing systems 110b may include, without limitation, a server computer over a network, a cloud computing system, or a distributed computing system, and/or the like, and may be similar, if not identical, to computing system 110a at least in terms of features and/or functionalities, except that they are accessible remotely via network(s) 150.
  • the resultant high quality ("HQ") images or image data 165 may be sent to content distribution system 145 (and corresponding database(s) 145a; via network(s) 150 and via communications system 125) for storage and/or distribution to other devices (e.g., user devices 155a-155n).
  • user device 105 may directly send the HQ images or image data 165 to one or more user devices 155a-155n (collectively, "user devices 155" or the like), which may each include, but are not limited to, at least one of a smartphone, a mobile phone, a tablet computer, a laptop computer, a desktop computer, a wearable device, a digital photo album, a media player, a camera, a drone, a surveillance system, a security system, or a display device, and/or the like.
  • the display device may include, without limitation, one of a smart television, a television (in some cases, coupled to the user device 105 or network(s) 150 via a set-top-box or other intermediary media player), or a monitor or digital display panel (in some cases, coupled to the user device 105 or network(s) 150 via an externally connected user device (e.g., desktop computer, server computer, etc.)), and/or the like.
  • one or more of the user devices 155 may send the LQ images or image data 160 to at least one of the user device 105, the remote computing system 110b, the content source(s) 140, and/or the content distribution system 145 via network(s) 150, or the like.
  • the lightning bolt symbols are used to denote wireless communications between communications system 125 and network(s) 150 (in some cases, via network access points or the like (not shown)), between communications system 125 and at least one of the one or more user devices 155a-155n, and between network(s) 150 and at least one of the one or more user devices 155a-155n (in some cases, via network access points or the like (not shown)).
  • computing system(s) / AI system(s) 110a, ML system 115a, LPEN 115b, Network M 115c, other processor(s) 115n, user device 105, and/or remote computing system 110b may perform the method of implementing latent prior embedded network for restoration and enhancement of images, as shown, and described below, with respect to Figs.2-4.
  • Figs.2A-2C are schematic block flow diagrams illustrating a non-limiting example of a method 200 for training and inferencing for implementing latent prior embedded network for restoration and enhancement of images, in accordance with various embodiments.
  • Figs.2A and 2B are schematic block flow diagrams illustrating a non-limiting example of a method 200 for training a latent prior embedded network ("LPEN") and an embedding network ("Network M") for implementing latent prior embedded network for restoration and enhancement of images, in accordance with various embodiments.
  • Fig.2C is a schematic block flow diagram illustrating a non-limiting example of a method 200 for implementing latent prior embedded network for restoration and enhancement of images, in accordance with various embodiments.
  • The approach first builds a latent prior network based on a diffusion model (such as the LPEN 205 of Fig.2, or the like), and then embeds it into a deep neural network ("DNN") as a decoder for high quality ("HQ") face or facial image restoration.
  • the first part of LPEN 205 is a latent prior network (including, but not limited to, encoder E 215 and decoder D 230), which is a diffusion model that may be pre-trained to learn to reproduce the desired HQ face or facial image, using a large-scale face or facial image dataset (e.g., the FFHQ face dataset, or the like).
  • the second part is an embedding network (encoder M or Network M 250) that maps an LQ image to the latent space in the latent prior network.
  • the generation process is essentially a one-to-one mapping, in which the latent prior network reproduces the desired HQ face or facial image based on the mapped LQ image; this prior knowledge largely alleviates the uncertainty of the one-to-many mapping found in conventional methods.
  • the details of each module in the system 200 are provided below.
  • given the latent vector z = E(x) produced by encoder E 215 from an input image x, the reconstruction x̂ may then be given by x̂ = D(z) = D(E(x)), where D is decoder D 230.
  • the system employs a diffusion model ("DM") to conduct the image-to-image translation task.
  • Diffusion models are probabilistic models designed to learn a data distribution p(x) by gradually denoising a normally distributed variable, which corresponds to learning the reverse process of a fixed Markov Chain of length T.
  • the neural backbone ε_θ(z_t, t) of the model may be realized as a time-conditional UNet, or the like. Since the forward process is fixed, z_t can be efficiently obtained from E 215 during training, and samples from p_θ(z) can be decoded to image space with a single pass through D 230.
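  • For context, in the standard latent diffusion formulation that this notation follows, the denoising backbone ε_θ is typically trained with the noise-prediction objective below. The patent does not spell out its training objective, so this is background for the reader, not a claim of the patent:

    L_LDM = E_{z = E(x), ε ~ N(0, 1), t} [ || ε − ε_θ(z_t, t) ||² ]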
  • training LPEN 205 may be performed as follows.
  • Encoder E 215 of LPEN 205 may encode a first input image x 210 (of size H × W × 3 in RGB space) to produce a first latent vector x', the first input image being a high quality image among a plurality of high quality images.
  • LPEN 205 may perform forward diffusion on the first latent vector x' (of size h × w × n_z in latent space) to produce a second latent vector z_T, based on a DM of the LPEN, by performing forward diffusion (e.g., using noise encoders or forward diffusion modules 220a-220n, or the like) on each of a successive plurality of fourth latent vectors z_0 – z_T-1, with each successive fourth latent vector being a noise-modified version of a preceding fourth latent vector among the successive plurality of fourth latent vectors based on forward diffusion thereof, the first of the successive plurality of fourth latent vectors (e.g., z_0) being a noise-modified version of the first latent vector x', the second of the successive plurality of fourth latent vectors (e.g., z_1) being a noise-modified version of the first of the successive plurality of fourth latent vectors (e.g., z_0), and the second latent vector z_T being a noise-modified version of the last of the successive plurality of fourth latent vectors (e.g., z_T-1), as in the sketch below.
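  • A minimal sketch of this forward (noising) process follows, assuming the standard DDPM linear β-schedule; the patent does not specify a particular schedule, so the values here are illustrative:

    import torch

    T = 50
    betas = torch.linspace(1e-4, 0.02, T)        # noise schedule beta_1..beta_T (assumed)
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)    # cumulative products alpha-bar_t

    def forward_step(z_prev, t):
        """One noising step q(z_t | z_t-1): each z_t is a noise-modified z_t-1."""
        noise = torch.randn_like(z_prev)
        return torch.sqrt(1.0 - betas[t]) * z_prev + torch.sqrt(betas[t]) * noise

    def forward_diffusion(z0, t):
        """Closed-form equivalent: sample z_t directly from z_0 in a single shot."""
        noise = torch.randn_like(z0)
        a_bar = alpha_bars[t - 1]
        return torch.sqrt(a_bar) * z0 + torch.sqrt(1.0 - a_bar) * noise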
  • LPEN may then perform reverse diffusion on the second latent vector z_T to produce a third latent vector z_0', based on the DM of the LPEN, by performing reverse diffusion (e.g., using denoisers or reverse diffusion modules 225a-225n, or the like) on each of a successive plurality of fifth latent vectors z_1' – z_T', with each successive fifth latent vector being a denoised version of a preceding fifth latent vector among the successive plurality of fifth latent vectors based on reverse diffusion thereof, the first of the successive plurality of fifth latent vectors (e.g., z_T') being a denoised version of the second latent vector z_T, the second of the successive plurality of fifth latent vectors (e.g., z_T-1') being a denoised version of the first of the successive plurality of fifth latent vectors (e.g., z_T'), and the third latent vector z_0' being a denoised version of the last of the successive plurality of fifth latent vectors (e.g., z_1').
  • Decoder D 230 of the LPEN 205 may then decode the third latent vector z_0' to produce a first output image x̂ 235.
  • LPEN 205 may then calculate a first loss value between the first output image x̂ 235 (of size H × W × 3 in RGB space) and the first input image x 210, based on a first loss function 240, which may include, but is not limited to, at least one of focal loss, contrastive loss, cross-entropy loss, or multi-class classification loss, and/or the like.
  • LPEN may then update the DM based at least in part on the calculated first loss value.
  • the encoding, the forward diffusion, the reverse diffusion, the decoding, the first loss value calculation, and the updating steps may be repeated for at least one of the high quality image or one or more of the plurality of high quality images, either for a first predetermined number of iterations or until the calculated first loss value is below a first predetermined threshold value, to train the DM, the encoder, and the decoder of LPEN.
  • an embedding network M 250, which, in some cases, may include, without limitation, a CNN that converts an image from RGB space (of size H × W × 3) to the latent space (of size h × w × n_z), to map the low quality image y 245 into the latent space z.
  • The reverse diffusion process may then be performed via a sampling process p_θ(z_t-1 | z_t), which iteratively denoises the latent vector from z_T back to z_0, as in the sketch below.
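  • A matching sketch of the reverse (denoising) process, using the standard DDPM ancestral-sampling update and reusing the betas / alphas / alpha_bars schedule from the forward sketch above; the denoiser argument stands in for the time-conditional UNet ε_θ(z_t, t), whose call signature is assumed for illustration:

    import torch

    def reverse_step(denoiser, z_t, t):
        """One sampling step p_θ(z_t-1 | z_t): predict the noise, then step back."""
        t_batch = torch.full((z_t.shape[0],), t, dtype=torch.long)
        eps = denoiser(z_t, t_batch)
        coef = betas[t] / torch.sqrt(1.0 - alpha_bars[t])
        z_prev = (z_t - coef * eps) / torch.sqrt(alphas[t])
        if t > 0:
            z_prev = z_prev + torch.sqrt(betas[t]) * torch.randn_like(z_t)  # sigma_t noise
        return z_prev

    def reverse_diffusion(denoiser, z_T, T):
        """Full chain z_T -> z_0': iterate the sampling step from t = T-1 down to 0."""
        z = z_T
        for t in reversed(range(T)):
            z = reverse_step(denoiser, z, t)
        return z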
  • training Network M 250 may be performed as follows.
  • the Network M 250 may map a second input image y 245 into a first vector y', the second input image y 245 being a low quality image suffering from one or more degradation effects, including, but not limited to, at least one of low-resolution, blur, noise, misfocus issues, compression artifacts, or distortion effects, and/or the like.
  • the second input image y 245 may correspond to a third image which may be a high quality version of the second input image y 245.
  • the trained LPEN 205' may perform reverse diffusion on the first vector y' to produce a second vector z_0', based on the trained DM of the LPEN 205', by performing reverse diffusion on each of a successive plurality of third vectors z_1' – z_T', with each successive third vector being a denoised version of a preceding third vector among the successive plurality of third vectors based on reverse diffusion thereof, the first of the successive plurality of third vectors (e.g., z_T') being a denoised version of the first vector y', the second of the successive plurality of third vectors (e.g., z_T-1') being a denoised version of the first of the successive plurality of third vectors (e.g., z_T'), and the second vector z_0' being a denoised version of the last of the successive plurality of third vectors (e.g., z_1').
  • Decoder D 230 of the trained LPEN 205' may then decode the second vector z_0' to produce a second output image ŷ 255 corresponding to the second input image y 245.
  • LPEN 205' may then calculate a second loss value between the second output image ŷ 255 (of size H × W × 3 in RGB space) and the third image ȳ 260, based on a second loss function 265, which may include, but is not limited to, at least one of focal loss, contrastive loss, cross-entropy loss, or multi-class classification loss, and/or the like.
  • LPEN may then update a model of the Network M 250 based at least in part on the calculated second loss value.
  • The mapping, the reverse diffusion, the decoding, the second loss value calculation, and the updating steps may be repeated for at least one of the second input image or one or more of a plurality of low quality images having a known corresponding plurality of high quality images, either for a second predetermined number of iterations or until the calculated second loss value is below a second predetermined threshold value, to train Network M 250.
  • Once the LPEN and Network M have been trained (i.e., yielding trained LPEN 205' and trained Network M 250'), inferencing may be performed (as follows) on images having unknown degradation, including, but not limited to, at least one of unknown low-resolution, unknown blur, unknown noise, unknown misfocus issues, unknown compression artifacts, or unknown distortion effects, and/or the like.
  • the system may receive an input image ỹ 270 (of size H'' × W'' × 3 in RGB space), the input image ỹ 270 being a low quality image suffering from one or more unknown degradation effects, as described above.
  • the system may resize the low quality image ỹ 270 to a predetermined resolution (not shown).
  • the trained Network M 250' may map the (resized) input image ỹ 270 into a first vector y''.
  • the trained LPEN 205' may perform reverse diffusion on the first vector y'' to produce a second vector z_0', based on the trained DM of the LPEN 205', by performing reverse diffusion on each of a successive plurality of third vectors z_1' – z_T', with each successive third vector being a denoised version of a preceding third vector among the successive plurality of third vectors based on reverse diffusion thereof, the first of the successive plurality of third vectors (e.g., z_T') being a denoised version of the first vector y'', the second of the successive plurality of third vectors (e.g., z_T-1') being a denoised version of the first of the successive plurality of third vectors (e.g., z_T'), and the second vector z_0' being a denoised version of the last of the successive plurality of third vectors (e.g., z_1'). Decoder D 230 of the trained LPEN 205' may then decode the second vector z_0' to produce an output image corresponding to the input image ỹ 270, the output image being a restored and enhanced version of the input image. This end-to-end flow is sketched below.
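  • Putting the trained pieces together, inference might look like the following sketch; the 512 × 512 working resolution and the module names are assumptions for illustration, and reverse_diffusion is the helper from the sampling sketch above:

    import torch
    import torch.nn.functional as F

    @torch.no_grad()
    def restore_image(lq_image, embed_net, denoiser, decoder, T=50, size=512):
        """Restore and enhance an LQ image tensor of shape (1, 3, H'', W'')."""
        # Resize the low quality input to the predetermined working resolution.
        y = F.interpolate(lq_image, size=(size, size), mode="bilinear",
                          align_corners=False)
        y_vec = embed_net(y)                           # first vector y'' (Network M)
        z0 = reverse_diffusion(denoiser, y_vec, T)     # second vector z_0' (trained DM)
        return decoder(z0)                             # restored and enhanced output

    # Hypothetical usage with trained modules:
    # hq = restore_image(lq_tensor, network_m, unet, decoder_d)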
  • Fig.3 is a set of diagrams illustrating various non-limiting examples 300 comparing low quality facial images (as shown, e.g., on the left side) and the corresponding high quality facial image results obtained by implementing latent prior embedded network for restoration and enhancement of images (as shown, e.g., on the right side), in accordance with various embodiments. As shown in Fig.3, the degradation effects are greatly reduced, if not eliminated.
  • Figs.4A-4D are flow diagrams illustrating a method 400 for implementing latent prior embedded network for restoration and enhancement of images, in accordance with various embodiments.
  • While the techniques and procedures of Fig.4 are depicted and/or described in a certain order for purposes of illustration, it should be appreciated that certain procedures may be reordered and/or omitted within the scope of various embodiments.
  • Moreover, while the method 400 illustrated by Fig.4 can be implemented by or with (and, in some cases, is described below with respect to) the systems, examples, or embodiments 100, 200, and 300 of Figs.1, 2, and 3, respectively (or components thereof), such methods may also be implemented using any suitable hardware (or software) implementation.
  • method 400 at block 405, may comprise training, using a computing system, a latent prior embedded network ("LPEN") of an artificial intelligence (“AI”) system.
  • method may comprise, after training the LPEN, training, using the computing system, an embedding network ("Network M").
  • Method 400 may further comprise using both the trained LPEN and the trained Network M to restore and enhance low quality images (block 415).
  • the computing system may comprise at least one of a machine learning system, an artificial intelligence (“AI") system, a deep learning system, a processor on a user device, a server computer over a network, a cloud computing system, or a distributed computing system, and/or the like.
  • each of the Network M or the LPEN may comprise at least one of a machine learning system, an AI system, a deep learning system, a neural network, a deep neural network (“DNN”), a convolutional neural network (“CNN”), a U-Net CNN, a fully convolutional network (“FCN”), or a generative adversarial network (“GAN”), and/or the like.
  • training the LPEN may comprise encoding, using the computing system and an encoder of the LPEN, a first input image to produce a first latent vector, the first input image being a high quality image among a plurality of high quality images (block 420); performing, using the computing system and the LPEN, forward diffusion on the first latent vector to produce a second latent vector, based on a diffusion model ("DM") of the LPEN, the second latent vector being a noise-modified version of the first latent vector (block 425); performing, using the computing system and the LPEN, reverse diffusion on the second latent vector to produce a third latent vector, based on the DM of the LPEN, the third latent vector being a denoised version of the second latent vector (block 430); decoding, using the computing system and a decoder of the LPEN, the third latent vector to produce a first output image (block 435); calculating, using the computing system, a first loss value between the first output image and the first input image, based on a first loss function; updating, using the computing system, the DM based at least in part on the calculated first loss value; and repeating these steps, as described above, to train the DM, the encoder, and the decoder of the LPEN.
  • the first loss function may comprise at least one of focal loss, contrastive loss, cross-entropy loss, or multi-class classification loss, and/or the like.
  • the first input image may comprise a high quality image of a human face.
  • Alternatively, or additionally, the first input image may comprise a high quality image of one of an object, an animal, a person, a building, a landmark, a vehicle, or a scene, and/or the like.
  • performing forward diffusion on the first latent vector to produce the second latent vector, based on the DM of the LPEN, may comprise performing forward diffusion on each of a successive plurality of fourth latent vectors, with each successive fourth latent vector being a noise-modified version of a preceding fourth latent vector among the successive plurality of fourth latent vectors based on forward diffusion thereof (block 425a), the first of the successive plurality of fourth latent vectors being a noise-modified version of the first latent vector, and the second latent vector being a noise-modified version of the last of the successive plurality of fourth latent vectors.
  • performing reverse diffusion on the second latent vector to produce the third latent vector, based on the DM of the LPEN, may comprise performing reverse diffusion on each of a successive plurality of fifth latent vectors, with each successive fifth latent vector being a denoised version of a preceding fifth latent vector among the successive plurality of fifth latent vectors based on reverse diffusion thereof (block 430a), the first of the successive plurality of fifth latent vectors being a denoised version of the second latent vector, and the third latent vector being a denoised version of the last of the successive plurality of fifth latent vectors.
  • training the Network M may comprise: mapping, using the computing system and the Network M, a second input image into a first vector, the second input image being a low quality image suffering from one or more degradation effects (block 450), the second input image corresponding to a third image, the third image being a high quality version of the second input image; performing, using the computing system and the trained LPEN, reverse diffusion on the first vector to produce a second vector, based on the trained DM of the LPEN, the second vector being a denoised version of the first vector (block 455); decoding, using the computing system and the trained decoder of the LPEN, the second vector to produce a second output image corresponding to the second input image (block 460); calculating, using the computing system, a second loss value between the second output image and the third image, based on a second loss function (block 465); updating, using the computing system, a model of the Network M based at least in part on the calculated second loss value; and repeating these steps, as described above, to train the Network M.
  • the second loss function may comprise at least one of focal loss, contrastive loss, cross-entropy loss, or multi-class classification loss, and/or the like.
  • the one or more degradation effects may comprise at least one of low-resolution, blur, noise, misfocus issues, compression artifacts, or distortion effects, and/or the like.
  • the second input image may comprise a low quality image of a human face.
  • the second input image may comprise a low quality image of an object, an animal, a person, a building, a landmark, a vehicle, or a scene, and/or the like.
  • performing reverse diffusion on the first vector to produce the second vector, based on the DM of the LPEN, may comprise performing reverse diffusion on each of a successive plurality of third vectors, with each successive third vector being a denoised version of a preceding third vector among the successive plurality of third vectors based on reverse diffusion thereof (block 455a), the first of the successive plurality of third vectors being a denoised version of the first vector, and the second vector being a denoised version of the last of the successive plurality of third vectors.
  • restoring and enhancing low quality images may comprise: receiving, using the computing system, an input image, the input image being a low quality image suffering from one or more unknown degradation effects (block 475); resizing, using the computing system, the low quality image to a predetermined resolution (block 480); mapping, using the computing system and the Network M that was previously trained, the input image into a first vector (block 485); performing, using the computing system and the LPEN that was previously trained, reverse diffusion on the first vector to produce a second vector, based on the trained DM of the LPEN, the second vector being a denoised version of the first vector (block 490); and decoding, using the computing system and the trained decoder of the LPEN, the second vector to produce an output image corresponding to the input image, the output image being a restored and enhanced version of the input image (block 495).
  • the first vector is a latent vector representation of the input image in the LPEN.
  • the one or more unknown degradation effects may comprise at least one of unknown low-resolution, unknown blur, unknown noise, unknown misfocus issues, unknown compression artifacts, or unknown distortion effects, and/or the like.
  • the input image may comprise a low quality image of a human face. Alternatively, or additionally, the input image may comprise a low quality image of one of an object, an animal, a person, a building, a landmark, a vehicle, or a scene, and/or the like.
  • performing reverse diffusion on the first vector to produce the second vector, based on the trained DM of the LPEN, may comprise performing reverse diffusion on each of a successive plurality of third vectors, with each successive third vector being a denoised version of a preceding third vector among the successive plurality of third vectors based on reverse diffusion thereof, the first of the successive plurality of third vectors being a denoised version of the first vector, and the second vector being a denoised version of the last of the successive plurality of third vectors (block 490a).
  • Examples of System and Hardware Implementation: Fig.5 is a block diagram illustrating an example of computer or system hardware architecture, in accordance with various embodiments.
  • Fig.5 provides a schematic illustration of one embodiment of a computer system 500 of the service provider system hardware that can perform the methods provided by various other embodiments, as described herein, and/or can perform the functions of computer or hardware system (i.e., user device 105, computing system / artificial intelligence ("AI") system 110a and 110b, display screen 130, audio playback device 135, content source(s) 140, content distribution system 145, and display devices 155a-155n, etc.), as described above.
  • Fig.5 is meant only to provide a generalized illustration of various components, of which one or more (or none) of each may be utilized as appropriate.
  • Fig.5, therefore, broadly illustrates how individual system elements may be implemented in a relatively separated or relatively more integrated manner.
  • the computer or hardware system 500 – which might represent an embodiment of the computer or hardware system (i.e., user device 105, computing system / AI system 110a and 110b, display screen 130, audio playback device 135, content source(s) 140, content distribution system 145, and display devices 155a-155n, etc.), described above with respect to Figs.1-4 – is shown comprising hardware elements that can be electrically coupled via a bus 505 (or may otherwise be in communication, as appropriate).
  • the hardware elements may include one or more processors 510, including, without limitation, one or more general- purpose processors and/or one or more special-purpose processors (such as microprocessors, digital signal processing chips, graphics acceleration processors, and/or the like); one or more input devices 515, which can include, without limitation, a mouse, a keyboard, and/or the like; and one or more output devices 520, which can include, without limitation, a display device, a printer, and/or the like.
  • the computer or hardware system 500 may further include (and/or be in communication with) one or more storage devices 525, which can comprise, without limitation, local and/or network accessible storage, and/or can include, without limitation, a disk drive, a drive array, an optical storage device, solid-state storage device such as a random access memory (“RAM”) and/or a read-only memory (“ROM”), which can be programmable, flash-updateable, and/or the like.
  • Such storage devices may be configured to implement any appropriate data stores, including, without limitation, various file systems, database structures, and/or the like.
  • the computer or hardware system 500 might also include a communications subsystem 530, which can include, without limitation, a modem, a network card (wireless or wired), an infra-red communication device, a wireless communication device and/or chipset (such as a Bluetooth™ device, an 802.11 device, a WiFi device, a WiMax device, a WWAN device, cellular communication facilities, etc.), and/or the like.
  • the communications subsystem 530 may permit data to be exchanged with a network (such as the network described below, to name one example), with other computer or hardware systems, and/or with any other devices described herein.
  • the computer or hardware system 500 will further comprise a working memory 535, which can include a RAM or ROM device, as described above.
  • the computer or hardware system 500 also may comprise software elements, shown as being currently located within the working memory 535, including an operating system 540, device drivers, executable libraries, and/or other code, such as one or more application programs 545, which may comprise computer programs provided by various embodiments (including, without limitation, hypervisors, VMs, and the like), and/or may be designed to implement methods, and/or configure systems, provided by other embodiments, as described herein.
  • one or more procedures described with respect to the method(s) discussed above might be implemented as code and/or instructions executable by a computer (and/or a processor within a computer); in an aspect, then, such code and/or instructions can be used to configure and/or adapt a general purpose computer (or other device) to perform one or more operations in accordance with the described methods.
  • a set of these instructions and/or code might be encoded and/or stored on a non-transitory computer readable storage medium, such as the storage device(s) 525 described above. In some cases, the storage medium might be incorporated within a computer system, such as the system 500.
  • the storage medium might be separate from a computer system (i.e., a removable medium, such as a compact disc, etc.), and/or provided in an installation package, such that the storage medium can be used to program, configure, and/or adapt a general purpose computer with the instructions/code stored thereon.
  • These instructions might take the form of executable code, which is executable by the computer or hardware system 500 and/or might take the form of source and/or installable code, which, upon compilation and/or installation on the computer or hardware system 500 (e.g., using any of a variety of generally available compilers, installation programs, compression/decompression utilities, etc.) then takes the form of executable code.
  • some or all of the procedures of such methods are performed by the computer or hardware system 500 in response to processor 510 executing one or more sequences of one or more instructions (which might be incorporated into the operating system 540 and/or other code, such as an application program 545) contained in the working memory 535.
  • Such instructions may be read into the working memory 535 from another computer readable medium, such as one or more of the storage device(s) 525.
  • execution of the sequences of instructions contained in the working memory 535 might cause the processor(s) 510 to perform one or more procedures of the methods described herein.
  • The terms "machine readable medium" and "computer readable medium," as used herein, refer to any medium that participates in providing data that causes a machine to operate in some fashion.
  • various computer readable media might be involved in providing instructions/code to processor(s) 510 for execution and/or might be used to store and/or carry such instructions/code (e.g., as signals).
  • a computer readable medium is a non-transitory, physical, and/or tangible storage medium.
  • a computer readable medium may take many forms, including, but not limited to, non-volatile media, volatile media, or the like.
  • Non-volatile media includes, for example, optical and/or magnetic disks, such as the storage device(s) 525.
  • Volatile media includes, without limitation, dynamic memory, such as the working memory 535.
  • a computer readable medium may take the form of transmission media, which includes, without limitation, coaxial cables, copper wire, and fiber optics, including the wires that comprise the bus 505, as well as the various components of the communication subsystem 530 (and/or the media by which the communications subsystem 530 provides communication with other devices).
  • transmission media can also take the form of waves (including without limitation radio, acoustic, and/or light waves, such as those generated during radio-wave and infra-red data communications).
  • Common forms of physical and/or tangible computer readable media include, for example, a floppy disk, a flexible disk, a hard disk, magnetic tape, or any other magnetic medium, a CD-ROM, any other optical medium, punch cards, paper tape, any other physical medium with patterns of holes, a RAM, a PROM, an EPROM, a FLASH-EPROM, any other memory chip or cartridge, a carrier wave as described hereinafter, or any other medium from which a computer can read instructions and/or code.
  • Various forms of computer readable media may be involved in carrying one or more sequences of one or more instructions to the processor(s) 510 for execution.
  • the instructions may initially be carried on a magnetic disk and/or optical disc of a remote computer.
  • a remote computer might load the instructions into its dynamic memory and send the instructions as signals over a transmission medium to be received and/or executed by the computer or hardware system 500.
  • These signals, which might be in the form of electromagnetic signals, acoustic signals, optical signals, and/or the like, are all examples of carrier waves on which instructions can be encoded, in accordance with various embodiments of the invention.
  • the communications subsystem 530 (and/or components thereof) generally will receive the signals, and the bus 505 then might carry the signals (and/or the data, instructions, etc. carried by the signals) to the working memory 535, from which the processor(s) 510 retrieves and executes the instructions.

Landscapes

  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Image Processing (AREA)

Abstract

Novel tools and techniques are provided for implementing a latent prior embedded network for restoration and enhancement of images (e.g., a facial image, or the like). In various embodiments, a computing system may map, using an embedding network ("Network M") that was previously trained, an input image into a first vector, the input image being a low quality image suffering from one or more unknown degradation effects; may perform, using a latent prior embedded network ("LPEN") that was previously trained, reverse diffusion on the first vector to produce a second vector, based on a trained diffusion model ("DM") of the LPEN, the second vector being a denoised version of the first vector; and may decode, using a trained decoder of the LPEN, the second vector to produce an output image corresponding to the input image, the output image being a restored and enhanced version of the input image.
PCT/US2022/042441 2022-09-02 2022-09-02 Latent prior embedded network for restoration and enhancement of images WO2024049441A1 (fr)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/US2022/042441 WO2024049441A1 (fr) 2022-09-02 2022-09-02 Latent prior embedded network for restoration and enhancement of images

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/US2022/042441 WO2024049441A1 (fr) 2022-09-02 2022-09-02 Latent prior embedded network for restoration and enhancement of images

Publications (1)

Publication Number Publication Date
WO2024049441A1 (fr)

Family

ID=90098499

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2022/042441 WO2024049441A1 (fr) Latent prior embedded network for restoration and enhancement of images

Country Status (1)

Country Link
WO (1) WO2024049441A1 (fr)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020131864A1 (fr) * 2018-12-18 2020-06-25 Pathware Inc. Computational microscopy-based system and method for automated imaging and analysis of pathology specimens
CN113632146A (zh) * 2019-04-04 2021-11-09 Google LLC Neural re-rendering from 3D models

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020131864A1 (fr) * 2018-12-18 2020-06-25 Pathware Inc. Computational microscopy-based system and method for automated imaging and analysis of pathology specimens
CN113632146A (zh) * 2019-04-04 2021-11-09 Google LLC Neural re-rendering from 3D models

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
FLORINEL-ALIN CROITORU; VLAD HONDRU; RADU TUDOR IONESCU; MUBARAK SHAH: "Diffusion Models in Vision: A Survey", ARXIV.ORG, 10 September 2022 (2022-09-10), pages 1 - 20, XP091314787 *

Similar Documents

Publication Publication Date Title
US10652565B1 (en) Image compression and decompression using embeddings
CN109816589B (zh) Method and apparatus for generating a cartoon style conversion model
US11978178B2 (en) Electronic device, control method thereof, and system
KR20210101233A (ko) Method and apparatus for providing a rendering engine model comprising a description of a neural network embedded in a media item
US10943115B2 (en) Processing image data to perform object detection
US11062210B2 (en) Method and apparatus for training a neural network used for denoising
CN113348486A (zh) Image display with selective motion depiction
CN113505848B (zh) Model training method and apparatus
US11568524B2 (en) Tunable models for changing faces in images
KR20190091806A (ko) System and method for generating video sequences using a generative adversarial network
CN115376035A Real-time enhancement for streaming content
CN113744159B (zh) Remote sensing image defogging method and apparatus, and electronic device
CN113658065A (zh) Image noise reduction method and apparatus, computer-readable medium, and electronic device
KR20200073078A (ko) Image processing apparatus for learning parameters based on machine learning, and operating method thereof
KR102612625B1 (ko) Neural network-based feature point training apparatus and method
CN107920275B (zh) Video playing method, apparatus, terminal, and storage medium
WO2024049441A1 (fr) Latent prior embedded network for restoration and enhancement of images
WO2021042232A1 (fr) Methods and systems for enhanced image encoding
WO2023045627A1 (fr) Image super-resolution method, apparatus and device, and storage medium
EP4198878A1 Method and apparatus for image restoration based on a burst image
WO2022021025A1 (fr) Image enhancement method and apparatus
CN117808857B (zh) Self-supervised 360° depth estimation method, apparatus, device, and medium
CN114449280B (zh) Video encoding and decoding method, apparatus, and device
US20240161344A1 (en) Recovering gamut color loss utilizing lightweight neural networks
WO2022178834A1 (fr) Image processing method and apparatus

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22957591

Country of ref document: EP

Kind code of ref document: A1