WO2021077140A2 - Systems and methods of prior knowledge transfer for image inpainting - Google Patents

Systems and methods of prior knowledge transfer for image inpainting Download PDF

Info

Publication number
WO2021077140A2
WO2021077140A2 PCT/US2021/016774 US2021016774W
Authority
WO
WIPO (PCT)
Prior art keywords
image
model
data
training
inpainting
Prior art date
Application number
PCT/US2021/016774
Other languages
English (en)
Other versions
WO2021077140A3 (fr)
Inventor
Jenhao Hsiao
Original Assignee
Innopeak Technology, Inc.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Innopeak Technology, Inc. filed Critical Innopeak Technology, Inc.
Priority to PCT/US2021/016774 priority Critical patent/WO2021077140A2/fr
Publication of WO2021077140A2 publication Critical patent/WO2021077140A2/fr
Publication of WO2021077140A3 publication Critical patent/WO2021077140A3/fr

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/77Retouching; Inpainting; Scratch removal
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • G06F18/24133Distances to prototypes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/60Image enhancement or restoration using machine learning, e.g. neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]

Definitions

  • the present application generally relates to artificial intelligence, and more specifically to methods and systems for using deep learning techniques for image inpainting, such as facial feature inpainting.
  • Image inpainting is a process in which missing, lost, and/or deteriorated parts of images and/or videos are recovered (e.g., reconstructed).
  • a typical approach to filling in a particular missing region of an image is to “borrow” pixels from surrounding regions of the image that are not missing. While such techniques may work for inpainting the background of an image, they fail to generalize to cases where the surrounding regions do not contain suitable information to fill the missing parts. Furthermore, in some circumstances, the missing regions may also require the inpainting system to infer features that should have been present in those regions.
  • Deep learning techniques have been applied in image inpainting and yielded significant improvements in the process.
  • a supervised one-stage generator includes a supervised completion network learned based on paired data and is applied to recover missing regions in an incomplete image.
  • These deep learning techniques do not fully utilize existing information in an image and reconstruct features that are over-smoothed and/or blurry. It would be beneficial to develop systems and methods to implement image inpainting efficiently and accurately based on deep learning techniques.
  • the present application describes embodiments related to image inpainting and, more particularly, to systems, devices, and methods for image inpainting of facial features, such as a person’s eyes, eyebrows, nose, chin and mouth.
  • Facial features inpainting can be applied to replace closed eyes in the picture.
  • When a group photo is taken, the group of users often has to pose for an extended period of time, such that a series of photos may be taken in the hope of obtaining a good image in which everyone’s eyes are open.
  • In another example, a user may be talking when a photo is captured, resulting in the user’s mouth being wide open in the photo.
  • a two-stage adversarial model includes a pre-trained unconditional prior image generator (e.g., an unsupervised learning network) followed by an image completion network (e.g., a supervised learning network), thereby transferring pre-learned domain knowledge gained from the prior image generator to a corresponding inpainting task.
  • the prior image generator is an unconditional generator network that is pretrained based on a large-scale domain dataset and configured to generate realistic images based on a target distribution.
  • Random faces generated by the prior image generator have an identical distribution to a true distribution of faces, and a certain type of facial features (e.g., eyebrows) generated by the prior image generator have an identical distribution to a true distribution of the same type of facial features.
  • the prior image generator corresponds to a latent space that can be used to generate relevant prior images for the corresponding inpainting task. Hallucinated content from the prior image generator is used as a prior, and the related knowledge is transferred online to the second-stage image completion network for the purpose of filling in any missing region or replacing any defective region with a desirable image quality, as sketched below.
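  • For illustration only, a minimal sketch of this two-stage inference flow is given below, assuming a PyTorch-style implementation; the module names, the latent dimension of 512, and the channel-wise concatenation of the two inputs are assumptions rather than details from this application.

```python
import torch

def inpaint(prior_generator, completion_net, incomplete_image):
    """Two-stage sketch: hallucinate a prior image from random noise, then feed
    it together with the incomplete image to the image completion network."""
    # Stage 1: sample a random multivariate normal vector and synthesize a
    # realistic prior image with the pre-trained, unconditional generator.
    z = torch.randn(1, 512)                 # latent dimension is an assumption
    prior_image = prior_generator(z)        # hallucinated domain-specific prior

    # Stage 2: the supervised completion network consumes the incomplete input
    # image and the hallucinated prior, so the pre-learned domain knowledge is
    # transferred to the inpainting task.
    inputs = torch.cat([incomplete_image, prior_image], dim=1)
    return completion_net(inputs)           # prediction image with inpainted region
```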
  • a method includes obtaining a first image.
  • the first image (e.g., a face image) includes a first portion to be enhanced.
  • the first portion to be enhanced includes an eye region in which the eyes of a person are closed or not entirely open.
  • the first portion to be enhanced is damaged, defective, deteriorating, or missing in the first image.
  • the method includes generating a prior image that is substantially similar to the first image based on a random multivariate normal vector.
  • the method also includes generating a prediction image from the first image and the prior image using an image completion model.
  • the prediction image replaces the first portion of the first image with an inpainting portion.
  • the inpainting portion has a resolution equal to or greater than the resolution of a remainder of the first image.
  • generating the prior image further includes obtaining the random multivariate normal vector, mapping the random multivariate normal vector to a latent code using a mapping model, and combining the latent code and the first image using a synthesis model to generate the prior image.
  • the synthesis model includes a deep convolutional generative adversarial network (DCGAN) configured to receive the latent code and synthesize the prior image from the latent code.
  • the DCGAN includes a low convolutional layer and one or more high convolutional layers. The low convolutional layer is configured to receive a learned fixed code c0. Each of the low and high convolutional layers is configured to receive a style code. Each convolutional layer is configured to project the latent code into a set of per-channel factors and offsets used to multiply an output of each channel of convolutional layer activations.
  • the mapping model, synthesis model, and image completion model are trained jointly using a plurality of training images based on a loss function.
  • the mapping model and synthesis model are trained using a first plurality of training images in a first training stage.
  • the image completion model is trained using a second plurality of training images (e.g., the second plurality of training images are distinct from the first plurality of training images) in a second training stage following the first training stage.
  • the image completion model includes an encoder configured to down-sample the first image and a plurality of residual blocks configured to process the down-sampled first image.
  • the image completion model also includes a decoder configured to up-sample the processed first image to an original size of the first image. Further, in some embodiments, at least one of the plurality of residual blocks is configured to implement dilated convolution with a dilation factor of 2.
  • the method is implemented by a user application installed at an electronic device that is communicatively coupled to a server system and configured to obtain the image completion model from the server system. Further, in some embodiments, the image completion model is trained at the server system.
  • the image completion model is trained using a loss function that is a weighted combination of a reconstruction loss, a style loss, a perceptual loss, and an adversarial loss.
  • a computer system includes one or more processors and memory.
  • the memory includes instructions that, when executed by the one or more processors, cause the processors to perform any of the above methods discussed herein.
  • a non-transitory computer readable storage medium stores instructions for execution by one or more processors. The instructions, when executed by the one or more processors, cause the processors to perform any of the above methods described herein.
  • Figure 1 is an example data processing environment having one or more servers communicatively coupled to one or more client devices, in accordance with some embodiments.
  • Figure 2 is a block diagram illustrating a data processing system, in accordance with some embodiments.
  • Figure 3 is an example data processing environment for training and applying a neural network based (NN-based) data processing model for processing visual and/or audio data, in accordance with some embodiments.
  • Figure 4A is an example neural network (NN) applied to process content data in an NN-based data processing model, in accordance with some embodiments.
  • Figure 4B is an example node in the neural network, in accordance with some embodiments.
  • Figure 5 illustrates a process for image inpainting, in accordance with some embodiments.
  • Figures 6A and 6B are two sets of images each of which compares image inpainting results obtained using a one-step inpainting approach and a two-step adversarial model approach, in accordance with some embodiments.
  • Figure 7 illustrates four sets of images each of which compares images processed before and after an eyes inpainting process, in accordance with some embodiments.
  • Figures 8A and 8B are a flowchart of an example image inpainting process, in accordance with some embodiments.
  • FIG. 1 is an example data processing environment 100 having one or more servers 102 communicatively coupled to one or more client devices 104, in accordance with some embodiments.
  • the one or more client devices 104 may be, for example, desktop computers 104 A, tablet computers 104B, mobile phones 104C, or intelligent, multi-sensing, network- connected home devices (e.g., a camera).
  • Each client device 104 can collect data or user inputs, execute user applications, and present outputs on its user interface. The collected data or user inputs can be processed locally at the client device 104 and/or remotely by the server(s) 102.
  • The one or more servers 102 provide system data (e.g., boot files, operating system images, and user applications) to the client devices 104, and in some embodiments, process the data and user inputs received from the client device(s) 104 when the user applications are executed on the client devices 104.
  • the data processing environment 100 further includes a storage 106 for storing data related to the servers 102, client devices 104, and applications executed on the client devices 104.
  • the one or more servers 102 can enable real-time data communication with the client devices 104 that are remote from each other or from the one or more servers 102. Further, in some embodiments, the one or more servers 102 can implement data processing tasks that cannot be or are preferably not completed locally by the client devices 104.
  • the client devices 104 include a game console that executes an interactive online gaming application. The game console receives a user instruction and sends it to a game server 102 with user data. The game server 102 generates a stream of video data based on the user instruction and user data and provides the stream of video data for display on the game console and other client devices that are engaged in the same game session with the game console.
  • the client devices 104 include a networked surveillance camera and a mobile phone 104C.
  • the networked surveillance camera collects video data and streams the video data to a surveillance camera server 102 in real time. While the video data is optionally pre-processed on the surveillance camera, the surveillance camera server 102 processes the video data to identify motion or audio events in the video data and shares information of these events with the mobile phone 104C, thereby allowing a user of the mobile phone 104C to monitor the events occurring near the networked surveillance camera in real time and remotely.
  • the one or more servers 102, one or more client devices 104, and storage 106 are communicatively coupled to each other via one or more communication networks 108, which are the medium used to provide communications links between these devices and computers connected together within the data processing environment 100.
  • the one or more communication networks 108 may include connections, such as wire, wireless communication links, or fiber optic cables.
  • Examples of the one or more communication networks 108 include local area networks (LAN), wide area networks (WAN) such as the Internet, or a combination thereof
  • the one or more communication networks 108 are, optionally, implemented using any known network protocol, including various wired or wireless protocols, such as Ethernet, Universal Serial Bus (USB), FIREWIRE, Long Term Evolution (LTE), Global System for Mobile Communications (GSM), Enhanced Data GSM Environment (EDGE), code division multiple access (CDMA), time division multiple access (TDMA), Bluetooth, Wi-Fi, voice over Internet Protocol (VoIP), Wi-MAX, or any other suitable communication protocol.
  • a connection to the one or more communication networks 108 may be established either directly (e.g., using 3G/4G connectivity to a wireless carrier), or through a network interface 110 (e.g., a router, switch, gateway, hub, or an intelligent, dedicated whole-home control node), or through any combination thereof.
  • the one or more communication networks 108 can represent the Internet, a worldwide collection of networks and gateways that use the Transmission Control Protocol/Internet Protocol (TCP/IP) suite of protocols to communicate with one another.
  • At the heart of the Internet is a backbone of high-speed data communication lines between major nodes or host computers, consisting of thousands of commercial, governmental, educational and other computer systems that route data and messages.
  • Deep learning techniques are applied in the data processing environment 100 to process content data (e.g., video, image, audio, or textual data) obtained by an application executed at a client device 104 to identify information contained in the content data, match the content data with other data, categorize the content data, or synthesize related content data.
  • data processing models are created based on one or more neural networks to process the content data. These data processing models are trained with training data before they are applied to process the content data.
  • both model training and data processing are implemented locally at each individual client device 104 (e.g., the client device 104C).
  • the client device 104C obtains the training data from the one or more servers 102 or storage 106 and applies the training data to train the data processing models. Subsequent to model training, the client device 104C obtains the content data (e.g., captures video data via an internal camera) and processes the content data using the trained data processing models locally. Alternatively, in some embodiments, both model training and data processing are implemented remotely at a server 102 (e.g., the server 102A) associated with a client device 104 (e.g., the client device 104A). The server 102A obtains the training data from itself, another server 102, or the storage 106, and applies the training data to train the data processing models.
  • the client device 104A obtains the content data, sends the content data to the server 102A (e.g., in an application) for data processing using the trained data processing models, receives data processing results from the server 102A, and presents the results on a user interface (e.g., associated with the application).
  • the client device 104A itself implements no or little data processing on the content data prior to sending the content data to the server 102A.
  • data processing is implemented locally at a client device 104 (e.g., the client device 104B), while model training is implemented remotely at a server 102 (e.g., the server 102B) associated with the client device 104B.
  • the server 102B obtains the training data from itself, another server 102, or the storage 106, and applies the training data to train the data processing models.
  • the trained data processing models are optionally stored in the server 102B or storage 106.
  • the client device 104B imports the trained data processing models from the server 102B or storage 106, processes the content data using the data processing models, and generates data processing results to be presented on a user interface locally.
  • FIG. 2 is a block diagram illustrating a data processing system 200, in accordance with some embodiments.
  • the data processing system 200 includes a server 102, a client device 104, a storage 106, or a combination thereof.
  • the data processing system 200 typically includes one or more processing units (CPUs) 202, one or more network interfaces 204, memory 206, and one or more communication buses 208 for interconnecting these components (sometimes called a chipset).
  • the data processing system 200 includes one or more input devices 210 that facilitate user input, such as a keyboard, a mouse, a voice- command input unit or microphone, a touch screen display, a touch-sensitive input pad, a gesture capturing camera, or other input buttons or controls.
  • the client device 104 of the data processing system 200 uses a microphone and voice recognition or a camera and gesture recognition to supplement or replace the keyboard.
  • the client device 104 includes one or more cameras, scanners, or photo sensor units for capturing images, for example, of graphic serial codes printed on the electronic devices.
  • the data processing system 200 also includes one or more output devices 212 that enable presentation of user interfaces and display content, including one or more speakers and/or one or more visual displays.
  • the client device 104 includes a location detection device, such as a GPS (global positioning satellite) or other geo-location receiver, for determining the location of the client device 104.
  • Memory 206 includes high-speed random access memory, such as DRAM, and, optionally, non-volatile memory.
  • Memory 206 optionally includes one or more storage devices remotely located from the one or more processing units 202. Memory 206, or alternatively the non-volatile memory within memory 206, includes a non-transitory computer readable storage medium. In some embodiments, memory 206, or the non-transitory computer readable storage medium of memory 206, stores the following programs, modules, and data structures, or a subset or superset thereof:
  • Operating system 214 including procedures for handling various basic system services and for performing hardware dependent tasks
  • Network communication module 216 for connecting each server 102 or client device 104 to other devices (e.g., server 102, client device 104, or storage 106) via one or more network interfaces 204 (wired or wireless) and one or more communication networks 108, such as the Internet, other wide area networks, local area networks, metropolitan area networks, and so on;
  • User interface module 218 for enabling presentation of information (e.g., a graphical user interface for application(s) 224, widgets, websites and web pages thereof, and/or games, audio and/or video content, text, etc.) at each client device 104 via one or more output devices 212 (e.g., displays, speakers, etc.);
  • Input processing module 220 for detecting one or more user inputs or interactions from one of the one or more input devices 210 and interpreting the detected input or interaction;
  • Web browser module 222 for navigating, requesting (e.g., via HTTP), and displaying websites and web pages thereof, including a web interface for logging into a user account associated with a client device 104 or another electronic device, controlling the client or electronic device if associated with the user account, and editing and reviewing settings and data that are associated with the user account;
  • One or more user applications 224 for execution by the data processing system 200 (e.g., games, social network applications, smart home applications, and/or other web or non- web based applications for controlling another electronic device and reviewing data captured by such devices);
  • Model training module 226 for receiving training data and establishing a data processing model for processing content data (e.g., video, image, audio, or textual data) to be collected or obtained by a client device 104;
  • Data processing module 228 for processing content data using data processing models 240, thereby identifying information contained in the content data, matching the content data with other data, categorizing the content data, or synthesizing related content data, where in some embodiments, the data processing module 228 is associated with one of the user applications 224 to process the content data in response to a user instruction received from the user application 224;
  • the one or more databases 230 are stored in one of the server 102, client device 104, and storage 106 of the data processing system 200.
  • the one or more databases 230 are distributed in more than one of the server 102, client device 104, and storage 106 of the data processing system 200.
  • more than one copy of the above data is stored at distinct devices, e.g., two copies of the data processing models 240 are stored at the server 102 and storage 106, respectively.
  • Each of the above identified elements may be stored in one or more of the previously mentioned memory devices, and corresponds to a set of instructions for performing a function described above.
  • The above identified modules or programs (i.e., sets of instructions) need not be implemented as separate software programs, procedures, modules, or data structures, and various subsets of these modules may be combined or otherwise rearranged in various embodiments.
  • memory 206 optionally, stores a subset of the modules and data structures identified above.
  • memory 206 optionally, stores additional modules and data structures not described above.
  • FIG. 3 is another example data processing system 300 for training and applying a neural network based (NN-based) data processing model 240 for processing content data (e.g., video, image, audio, or textual data), in accordance with some embodiments.
  • the data processing system 300 includes a model training module 226 for establishing the data processing model 240 and a data processing module 228 for processing the content data using the data processing model 240.
  • both of the model training module 226 and the data processing module 228 are located on a client device 104 of the data processing system 300, while a training data source 304 distinct from the client device 104 provides training data 306 to the client device 104.
  • the training data source 304 is optionally a server 102 or storage 106.
  • both of the model training module 226 and the data processing module 228 are located on a server 102 of the data processing system 300.
  • the training data source 304 providing the training data 306 is optionally the server 102 itself, another server 102, or the storage 106.
  • the model training module 226 and the data processing module 228 are separately located on a server 102 and client device 104, and the server 102 provides the trained data processing model 240 to the client device 104.
  • the model training module 226 includes one or more data pre-processing modules 308, a model training engine 310, and a loss control module 312.
  • the data processing model 240 is trained according to a type of the content data to be processed.
  • the training data 306 is consistent with the type of the content data, and so is the data pre-processing module 308 applied to process the training data 306.
  • an image pre-processing module 308 A is configured to process image training data 306 to a predefined image format, e.g., extract a region of interest (ROI) in each training image, and crop each training image to a predefined image size.
  • an audio pre-processing module 308B is configured to process audio training data 306 to a predefined audio format, e.g., converting each training sequence to a frequency domain using a Fourier transform.
  • the model training engine 310 receives pre-processed training data provided by the data pre-processing modules 308, further processes the pre-processed training data using an existing data processing model 240, and generates an output from each training data item.
  • the loss control module 312 can monitor a loss function comparing the output associated with the respective training data item and a ground truth of the respective training data item.
  • the model training engine 310 modifies the data processing model 240 to reduce the loss function, until the loss function satisfies a loss criterion (e.g., a comparison result of the loss function is minimized or reduced below a loss threshold).
  • the modified data processing model 240 is provided to the data processing module 228 to process the content data.
  • the model training module 226 offers supervised learning in which the training data is entirely labelled and includes a desired output for each training data item (also called the ground truth in some situations). Conversely, in some embodiments, the model training module 226 offers unsupervised learning in which the training data are not labelled. The model training module 226 is configured to identify previously undetected patterns in the training data without pre-existing labels and with no or little human supervision. Additionally, in some embodiments, the model training module 226 offers partially supervised learning in which the training data are partially labelled.
  • the data processing module 228 includes a data pre-processing module 314, a model-based processing module 316, and a data post-processing module 318.
  • the data pre-processing module 314 pre-processes the content data based on the type of the content data. Functions of the data pre-processing module 314 are consistent with those of the pre-processing modules 308 and convert the content data to a predefined content format that is acceptable by inputs of the model-based processing module 316. Examples of the content data include one or more of: video, image, audio, textual, and other types of data.
  • each image is pre-processed to extract an ROI or cropped to a predefined image size
  • an audio clip is pre-processed to convert to a frequency domain using a Fourier transform.
  • the content data includes two or more types, e.g., video data and textual data.
  • the model-based processing module 316 applies the trained data processing model 240 provided by the model training module 226 to process the pre-processed content data.
  • the model-based processing module 316 can also monitor an error indicator to determine whether the content data has been properly processed in the data processing model 240.
  • the processed content data is further processed by the data post-processing module 318 to present the processed content data in a preferred format or to provide other related information that can be derived from the processed content data.
  • Figure 4A is an example neural network (NN) 400 applied to process content data in an NN-based data processing model 240, in accordance with some embodiments.
  • Figure 4B is an example node 420 in the neural network (NN) 400, in accordance with some embodiments.
  • the data processing model 240 is established based on the neural network 400.
  • a corresponding model-based processing module 316 applies the data processing model 240 including the neural network 400 to process content data that has been converted to a predefined content format.
  • the neural network 400 includes a collection of nodes 420 that are connected by links 412. Each node 420 receives one or more node inputs and applies a propagation function to generate a node output from the one or more node inputs.
  • the node output is provided via one or more links 412 to one or more other nodes 420
  • a weight w associated with each link 412 is applied to the node output.
  • the one or more node inputs are combined based on corresponding weights w1, w2, w3, and w4 according to the propagation function.
  • the propagation function is a product of a non-linear activation function and a linear weighted combination of the one or more node inputs.
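  • As a concrete illustration of this propagation function, the short sketch below computes one node output as a non-linear activation applied to the weighted combination of the node inputs plus a bias term; the sigmoid activation and the example numbers are arbitrary choices, not values from this application.

```python
import math

def node_output(inputs, weights, bias=0.0):
    """Propagation function of a single node 420: a weighted combination of the
    node inputs plus a bias, passed through a non-linear activation (sigmoid)."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-weighted_sum))  # sigmoid activation

# Example: a node with four inputs combined using weights w1..w4 (Figure 4B).
print(node_output([0.2, 0.5, 0.1, 0.9], [0.4, -0.3, 0.8, 0.1], bias=0.05))
```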
  • the collection of nodes 420 is organized into one or more layers in the neural network 400.
  • the one or more layers includes a single layer acting as both an input layer and an output layer.
  • the one or more layers includes an input layer 402 for receiving inputs, an output layer 406 for providing outputs, and zero or more hidden layers 404 (e.g., 404A and 404B) between the input and output layers 402 and 406.
  • a deep neural network has more than one hidden layer 404 between the input and output layers 402 and 406. In the neural network 400, each layer is only connected with its immediately preceding and/or immediately following layer.
  • a layer 402 or 404B is a fully connected layer because each node 420 in the layer 402 or 404B is connected to every node 420 in its immediately following layer.
  • one of the one or more hidden layers 404 includes two or more nodes that are connected to the same node in its immediately following layer for down-sampling or pooling the nodes 420 between these two layers.
  • max pooling uses a maximum value of the two or more nodes in the layer 404B for generating the node of the immediately following layer 406 connected to the two or more nodes.
  • a convolutional neural network is applied in a data processing model 240 to process content data (particularly, video and image data).
  • the CNN employs convolution operations and belongs to a class of deep neural networks 400, i.e., a feedforward neural network that only moves data forward from the input layer 402 through the hidden layers to the output layer 406.
  • the one or more hidden layers of the CNN are convolutional layers convolving with a multiplication or dot product.
  • Each node in a convolutional layer receives inputs from a receptive area associated with a previous layer (e.g., five nodes), and the receptive area is smaller than the entire previous layer and may vary based on a location of the convolution layer in the convolutional neural network.
  • Video or image data is pre-processed to a predefined video/image format corresponding to the inputs of the CNN.
  • the pre-processed video or image data is abstracted by each layer of the CNN to a respective feature map.
  • a recurrent neural network is applied in the data processing model 240 to process content data (particularly, textual and audio data).
  • Nodes in successive layers of the RNN follow a temporal sequence, such that the RNN exhibits a temporal dynamic behavior.
  • each node 420 of the RNN has a time-varying real-valued activation.
  • Examples of the RNN include, but are not limited to, a long short-term memory (LSTM) network, a fully recurrent network, an Elman network, a Jordan network, a Hopfield network, a bidirectional associative memory (BAM) network, an echo state network, an independently recurrent neural network (IndRNN), a recursive neural network, and a neural history compressor.
  • the RNN can be used for handwriting or speech recognition.
  • two or more types of content data are processed by the data processing module 228, and two or more types of neural networks (e.g., both CNN and RNN) are applied to process the content data jointly.
  • the training process is a process for calibrating all of the weights wi for each layer of the learning model using a training data set that is provided in the input layer 402.
  • the training process typically includes two steps, forward propagation and backward propagation, which are repeated multiple times until a predefined convergence condition is satisfied.
  • During forward propagation, the set of weights for different layers are applied to the input data and intermediate results from the previous layers.
  • During backward propagation, a margin of error of the output (e.g., a loss function) is measured, and the weights are adjusted accordingly to decrease the error.
  • the activation function is optionally linear, rectified linear unit, sigmoid, hyperbolic tangent, or of other types.
  • a network bias term b is added to the sum of the weighted outputs from the previous layer before the activation function is applied.
  • the network bias b provides a perturbation that helps the NN 400 avoid overfitting the training data.
  • the result of the training includes the network bias parameter b for each layer.
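  • A compact sketch of this forward/backward training cycle is shown below, assuming a generic PyTorch model and data loader; the optimizer, learning rate, loss function, and convergence threshold are placeholders rather than values specified in this application.

```python
import torch
from torch import nn

def train(model, data_loader, epochs=10, lr=1e-3, loss_threshold=1e-3):
    """Repeats forward propagation (compute outputs and measure a loss against
    the ground truth) and backward propagation (adjust weights w and biases b
    to decrease the error) until a simple convergence condition is satisfied."""
    criterion = nn.MSELoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(epochs):
        for inputs, ground_truth in data_loader:
            outputs = model(inputs)                  # forward propagation
            loss = criterion(outputs, ground_truth)  # margin of error of the output
            optimizer.zero_grad()
            loss.backward()                          # backward propagation
            optimizer.step()                         # update weights and biases
            if loss.item() < loss_threshold:         # predefined convergence condition
                return model
    return model
```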
  • Figure 5 illustrates a process 500 for image inpainting implemented in an image inpainting system, in accordance with some embodiments. While Figure 5 uses eye inpainting as an example inpainting application (e.g., to fix a pair of closed eyes in a photo), it should be obvious to one of ordinary skill in the art that Figure 5 is not intended to be limiting, and the embodiments disclosed in Figure 5 are equally applicable to inpainting of other features, such as other facial features (e.g., nose, eyebrows, mouth, and chin).
  • the image inpainting system includes a PriorNet generator 502 and an image completion network 516 that are configured to implement the process 500 jointly.
  • the PriorNet generator 502 is a generator that has been pre-trained using a generative adversarial network (GAN).
  • the GAN is configured to learn a mapping from a latent distribution to real data via adversarial training. After learning this non-linear mapping, the GAN produces photo-realistic images from randomly sampled latent codes.
  • the GAN-trained PriorNet generator 502 is used to generate a domain-specific (e.g., face) prior image 512 for a corresponding inpainting task.
  • the inpainting task includes obtaining (e.g., by the data processing system 200 or 300) an input image 514.
  • the input image 514 includes a portion to be enhanced 530.
  • the input image 514 includes a face of a person whose eyes are closed.
  • the PriorNet generator 502 generates a prior image 512 (e.g., I prior) that is substantially similar to the input image 514.
  • the PriorNet generator 502 adopts a mapping and synthesis strategy for generating the prior image 512.
  • the PriorNet generator 502 includes a mapping network 506 (e.g., a mapping model) and a synthesis network 510 (e.g., a synthesis model).
  • the mapping network 506 takes random noise (e.g., a random signal, such as a 64-bit signal, a 128-bit signal, white noise, or an image, or a video, etc.), models the random noise according to a Gaussian distribution, projects the modelled noise, and transforms the projected noise into a meaningful output.
  • the mapping network 506 (e.g., denoted by a mapping function M(·)) optionally includes a plurality of layers (e.g., 8 layers) that are fully connected, and is configured to map random multivariate normal vectors to latent codes.
  • the mapping network 506 takes an input s 504 (e.g., a code sampled from a normal distribution) and outputs a latent code 508, represented as M(s).
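  • A minimal sketch of such a mapping network is given below, assuming a PyTorch-style stack of eight fully connected layers that maps a random multivariate normal vector s to a latent code z = M(s); the layer width of 512 and the LeakyReLU activation are assumptions.

```python
import torch
from torch import nn

class MappingNetwork(nn.Module):
    """Maps a random multivariate normal vector s to a latent code z = M(s)
    through a stack of fully connected layers (eight layers in this sketch)."""
    def __init__(self, dim=512, num_layers=8):
        super().__init__()
        layers = []
        for _ in range(num_layers):
            layers += [nn.Linear(dim, dim), nn.LeakyReLU(0.2)]
        self.net = nn.Sequential(*layers)

    def forward(self, s):
        return self.net(s)

s = torch.randn(1, 512)      # input s 504, sampled from a normal distribution
z = MappingNetwork()(s)      # latent code 508
```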
  • the PriorNet generator 502 further includes a synthesis network 510.
  • the synthesis network 510 optionally includes a deep convolutional generative adversarial network (DCGAN) generator architecture with several modifications: (i) an input to a lower convolutional layer is a learned fixed code (e.g., denoted by c0), and (ii) a style code is fed into each convolutional layer.
  • the latent code z 508 is first projected by a linear layer into a set of per-channel factors and offsets, and the factors and offsets are used to multiply an output of each channel of convolutional layer activations. This operation is called Adaptive Instance Normalization (AdaIN) and was previously used in style transfer.
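  • The per-channel modulation described above can be sketched as follows; the use of instance normalization before the learned scale and offset mirrors the AdaIN formulation, while the channel count, latent dimension, and layer placement are assumptions.

```python
import torch
from torch import nn

class AdaIN(nn.Module):
    """Projects the latent code z with linear layers into per-channel factors
    (scales) and offsets that modulate each channel of the conv activations."""
    def __init__(self, latent_dim, num_channels):
        super().__init__()
        self.norm = nn.InstanceNorm2d(num_channels)
        self.to_scale = nn.Linear(latent_dim, num_channels)
        self.to_offset = nn.Linear(latent_dim, num_channels)

    def forward(self, activations, z):
        scale = self.to_scale(z).unsqueeze(-1).unsqueeze(-1)    # (N, C, 1, 1)
        offset = self.to_offset(z).unsqueeze(-1).unsqueeze(-1)  # (N, C, 1, 1)
        return self.norm(activations) * scale + offset

x = torch.randn(1, 64, 16, 16)   # output of one convolutional layer
z = torch.randn(1, 512)          # latent/style code 508
styled = AdaIN(512, 64)(x, z)    # per-channel modulated activations
```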
  • the PriorNet generator 502 includes an unconditional generator network and is trained using adversarial training and progressive growing based on one or more existing large-scale face datasets, such as the CelebA-HQ or Flickr-Faces-HQ datasets. In some embodiments, the PriorNet generator 502 is trained separately from the following image completion network 516 and is used to generate the prior image 512 for the inpainting task.
  • the perceptual feature vector is adopted from a VGG-19 network (e.g., described in Russakovsky et al., "ImageNet large scale visual recognition challenge," International Journal of Computer Vision, 115(3):211-252, 2015) that is pre-trained on an ImageNet dataset.
  • the image inpainting system implementing the process 500 further includes the image completion network 516 (e.g., an image completion model).
  • the image completion network 516 uses the incomplete input image 514’ and the synthesized prior image Iprior (512) as inputs, and outputs (e.g., generates) a prediction image (e.g., Ipred) 518.
  • the incomplete input image 514’ is an image that includes a portion to be enhanced.
  • the incomplete input image 514’ is an input image 514 in which a person’s eyes are closed
  • the prediction image 518 is an image in which the closed eyes are replaced with open eyes.
  • the prediction image 518 includes a color image. In some embodiments, the prediction image 518 has the same resolution as the input image 514 (e.g., the prior image 512 or the incomplete input image 514’). An inpainting portion replacing the portion to be enhanced 530 has the same resolution as or a higher resolution than the rest of the input image 514 or the prediction image 518.
  • the prediction image 518 is represented as Ipred = Gc(Iin, Iprior), where Iin denotes the incomplete input image 514’, Iprior denotes the prior image 512, and Gc denotes one or more image completion generators of the image completion network 516.
  • the image completion generator includes encoders that down-sample the images 512 and 514 (e.g., down-sample twice), followed by residual blocks (e.g., eight residual blocks) and decoders that up-sample the down-sampled images 512 and 514 back to the original size.
  • the image completion generator uses dilated convolutions with a dilation factor of two in the residual layers instead of regular convolutions.
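  • A hedged sketch of this encoder–residual–decoder completion generator is given below; the two down-sampling steps, the eight residual blocks with dilated convolutions (dilation factor 2), and the up-sampling back to the original size follow the description above, while the channel widths, kernel sizes, and activations are assumptions.

```python
import torch
from torch import nn

class DilatedResidualBlock(nn.Module):
    """Residual block using dilated convolutions with a dilation factor of 2."""
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=2, dilation=2),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=2, dilation=2),
        )

    def forward(self, x):
        return x + self.body(x)

class CompletionGenerator(nn.Module):
    """Encoder that down-samples twice, eight dilated residual blocks, and a
    decoder that up-samples the result back to the original image size."""
    def __init__(self, in_channels=6, base=64):  # 6 channels: incomplete image + prior image
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(in_channels, base, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(base, base * 2, 4, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        self.residual = nn.Sequential(*[DilatedResidualBlock(base * 2) for _ in range(8)])
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(base * 2, base, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(base, 3, 4, stride=2, padding=1), nn.Tanh(),
        )

    def forward(self, incomplete_image, prior_image):
        x = torch.cat([incomplete_image, prior_image], dim=1)
        return self.decoder(self.residual(self.encoder(x)))
```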
  • the image completion network 516 is trained using a supervised learning process, in which the training data is entirely labelled and consists of input-output pairs each having a desired output (e.g., ground truth).
  • an input training data item can be an image in which a person’s eyes are open but deliberately masked and the desired output is the image in which the person’s eyes are unmasked.
  • the image completion network 516 is trained using a loss function (e.g., a joint loss) that includes one or more of: a reconstruction loss 522, a perceptual loss 524, a style loss 526, and an adversarial loss 528.
  • the reconstruction loss 522 includes a mean-squared error or cross-entropy between the output and the input training data item, which penalizes the image completion network 516 for creating outputs different from the input.
  • the reconstruction loss 522 is normalized by a mask size.
  • the perceptual loss 524 penalizes results that are not perceptually similar to labels by defining a distance measure between activation maps of a pre-trained network.
  • the perceptual loss 524 is defined as a distance between activation maps φ of the pre-trained network computed for the prediction image Ipred and the ground truth image Igt.
  • In some embodiments, a Gram matrix Gφ (e.g., a 6 × 6 Gram matrix) constructed from the activation maps φ is used to compute the style loss 526.
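  • The sketch below illustrates activation-map and Gram-matrix losses of this kind using features from a VGG-19 network pre-trained on ImageNet; the selected layer indices, the L1 distance, and the normalization of the Gram matrix are assumptions rather than the definitions used in this application.

```python
import torch
from torchvision.models import vgg19, VGG19_Weights

# Frozen, pre-trained VGG-19 used only as a feature extractor.
vgg_features = vgg19(weights=VGG19_Weights.IMAGENET1K_V1).features.eval()
for p in vgg_features.parameters():
    p.requires_grad_(False)

def activation_maps(img, layer_indices=(3, 8, 17, 26)):
    """Collects activation maps phi at a few intermediate layers (indices assumed)."""
    maps, x = [], img
    for i, layer in enumerate(vgg_features):
        x = layer(x)
        if i in layer_indices:
            maps.append(x)
    return maps

def gram_matrix(phi):
    n, c, h, w = phi.shape
    f = phi.view(n, c, h * w)
    return f @ f.transpose(1, 2) / (c * h * w)   # C x C Gram matrix G_phi

def perceptual_loss(pred, gt):
    """Distance between activation maps of the prediction and the ground truth."""
    return sum(torch.mean(torch.abs(p - g))
               for p, g in zip(activation_maps(pred), activation_maps(gt)))

def style_loss(pred, gt):
    """Distance between Gram matrices constructed from the activation maps."""
    return sum(torch.mean(torch.abs(gram_matrix(p) - gram_matrix(g)))
               for p, g in zip(activation_maps(pred), activation_maps(gt)))
```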
  • the adversarial loss 528 is defined as:
  • Ladv = E_Igt[log(D(Igt))] + E_Ipred[log(1 - D(Ipred))]    (5)
  • Igt is the ground truth image (e.g., the complete image without missing areas), and
  • D(·) is a discriminator 520 (e.g., a 70 × 70 PatchGAN architecture that determines whether or not overlapping image patches of size 70 × 70 are real).
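  • Below is a hedged sketch of equation (5) together with a PatchGAN-style discriminator that scores overlapping patches as real or fake; the layer configuration approximating 70 × 70 receptive fields follows the common PatchGAN recipe and is an assumption here, as is the channel width.

```python
import torch
from torch import nn

class PatchDiscriminator(nn.Module):
    """PatchGAN-style discriminator D(.): outputs a grid of real/fake scores,
    one per overlapping image patch, instead of a single scalar."""
    def __init__(self, in_channels=3, base=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, base, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(base, base * 2, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(base * 2, base * 4, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(base * 4, base * 8, 4, stride=1, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(base * 8, 1, 4, stride=1, padding=1), nn.Sigmoid(),
        )

    def forward(self, img):
        return self.net(img)

def adversarial_loss(discriminator, gt_image, pred_image, eps=1e-8):
    """L_adv = E[log(D(Igt))] + E[log(1 - D(Ipred))], per equation (5)."""
    return torch.mean(torch.log(discriminator(gt_image) + eps)) + \
           torch.mean(torch.log(1.0 - discriminator(pred_image) + eps))
```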
  • instance normalization is used across all layers of the image completion network 516.
  • In some embodiments, the loss function is a weighted combination of the reconstruction loss 522 (e.g., L1), the perceptual loss 524 (e.g., Lperc), the style loss 526 (e.g., Lstyle), and the adversarial loss 528 (e.g., Ladv).
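  • A minimal sketch of such a weighted joint loss is given below; the individual weights are illustrative placeholders, since this application does not specify their values.

```python
def joint_loss(l_rec, l_perc, l_style, l_adv,
               w_rec=1.0, w_perc=0.1, w_style=250.0, w_adv=0.1):
    """Weighted combination of the reconstruction, perceptual, style, and
    adversarial losses; the weights are assumptions, not specified values."""
    return w_rec * l_rec + w_perc * l_perc + w_style * l_style + w_adv * l_adv
```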
  • In some embodiments, the PriorNet generator 502 (e.g., the mapping network 506 and the synthesis network 510) is trained using a first training dataset (e.g., a first plurality of training images) in a first training stage, and the image completion network 516 is trained using a second training dataset (e.g., a second plurality of training images) in a second training stage following the first training stage.
  • the first training dataset is distinct from the second training dataset.
  • the size of the first training dataset is much larger than the size of the second training dataset.
  • For example, the first training dataset is on the order of millions of images compared to tens of thousands for the second training dataset, or on the order of hundreds of thousands of images compared to tens of thousands for the second training dataset.
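  • A hedged sketch of the second training stage is given below, reusing the loss sketches above; the PriorNet generator is frozen so that only its pre-learned domain knowledge is transferred, the discriminator update is omitted for brevity, and the optimizer, learning rate, and latent dimension are assumptions.

```python
import torch

def train_stage_two(priornet, completion_net, discriminator, paired_loader,
                    epochs=20, lr=1e-4):
    """Second training stage: the pre-trained PriorNet generator stays fixed,
    and the image completion network is trained on paired (masked image,
    ground truth) examples using the joint loss."""
    for p in priornet.parameters():
        p.requires_grad_(False)                    # keep stage-1 weights frozen
    optimizer = torch.optim.Adam(completion_net.parameters(), lr=lr)
    for _ in range(epochs):
        for masked_img, gt_img in paired_loader:
            z = torch.randn(masked_img.size(0), 512)
            prior_img = priornet(z)                # hallucinated domain prior
            pred_img = completion_net(masked_img, prior_img)
            l_rec = torch.mean(torch.abs(pred_img - gt_img))                # L1 reconstruction
            l_adv = -torch.mean(torch.log(discriminator(pred_img) + 1e-8))  # generator term
            loss = joint_loss(l_rec, perceptual_loss(pred_img, gt_img),
                              style_loss(pred_img, gt_img), l_adv)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return completion_net
```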
  • a one-stage inpainting approach uses a one-stage model that is composed of only the image completion network 516.
  • the PriorNet generator 502 (e.g., the mapping network 506 and the synthesis network 510) and the image completion network 516 collectively form a two-stage adversarial model for image inpainting, in which the pre-learned domain knowledge of the PriorNet generator 502 is transferred to the image completion network 516 for the image inpainting task.
  • the two-stage adversarial model outperforms the one-stage inpainting approach by generating features that are sharper, less blurry, and not overly smoothed.
  • the image inpainting process 500 is implemented as part of a user application 224 that is installed at an electronic device (e.g., a client device 104, such as a mobile phone that includes a camera and a display), which is communicatively coupled to a server system 102.
  • the electronic device captures an input image (e.g., corresponding to an incomplete input image 514’) that includes a person whose eyes are closed.
  • the electronic device (e.g., via the user application 224) generates a prior image 512 using the pre-trained PriorNet generator 502 in accordance with the process 500 described above with respect to Figure 5.
  • the application 224 also generates a prediction image 518 based on the prior image 512 and the captured input image 514 using the image completion network 516.
  • the prediction image 518 is an image in which the person’s closed eyes are replaced with computer-generated open eyes.
  • both the prior image 512 and the predicted image 518 are generated locally by an electronic device 104.
  • the prior image 512 is generated locally at the electronic device 104.
  • the prior image 512 is then transmitted to a server system 102 along with the incomplete input image 514’.
  • the server system 102 generates the predicted image 518 using the completion model 516 at the server system 102.
  • the server returns the predicted image 518 to the electronic device for display on the electronic device 104.
  • In some embodiments, the electronic device transmits the captured image to the server system 102, and the server generates both the prior image 512 and the predicted image 518 using the image inpainting system and returns the predicted image 518 to the electronic device for display on the electronic device.
  • Figures 6A and 6B are two sets of images 600 and 650, each of which compares image inpainting results obtained using a one-step inpainting approach and a two-step adversarial model approach, in accordance with some embodiments.
  • the one-step inpainting approach includes an image completion network 516 and does not include a PriorNet network 502.
  • the two-step adversarial model approach includes both the PriorNet network 502 and the image completion network 516.
  • a first image 602 is obtained with eyes closed.
  • a second image 604 is marked with a first portion 610 to be enhanced, and the first portion includes the closed eyes.
  • a third image 606 is generated using only the image completion network 516, and a fourth image 608 is generated using the two-stage model that includes the PriorNet generator 502 followed by the image completion network 516.
  • the first portion 610 is replaced with an inpainting portion.
  • the third image 606 corresponds to a blurrier inpainting portion, partially due to limited training data available for training a corresponding deep learning model.
  • the fourth image 608 provides superior perceptual results with a higher quality for the inpainting portion due to the introduction of the large-scale unsupervised pre-trained PriorNet 502. That is, the inpainting portion of the fourth image 608 generated by the two-step adversarial model approach is visually sharper and has more details than the counterpart of the third image 606 generated by the one-step inpainting approach.
  • a first image 652 is obtained with eyes closed.
  • a second image 654 is marked with a first portion 660 to be enhanced, and the first portion includes the closed eyes.
  • a third image 656 is generated using only the image completion network 516, and a fourth image 658 is generated using the two-stage model that includes the PriorNet generator 502 followed by the image completion network 516.
  • the first portion 660 is replaced with an inpainting portion that is distinct from the counterpart inpainting portion in Figure 6A.
  • the inpainting portion of the fourth image 658 generated by the two-step adversarial model approach is visually sharper and has more details than the counterpart of the third image 656 generated by the one-step inpainting approach.
  • Figure 7 illustrates four sets of images 700 each of which compares images processed before and after an eyes inpainting process, in accordance with some embodiments.
  • the image on the left is an original image in which a person’s eyes are closed
  • the image on the right is a retouched image that is generated from the original image using a two-stage adversarial model described above in Figure 5.
  • a first portion of the original image having the closed eyes is replaced with a computer-generated inpainting portion of open eyes, while the rest of the original image remains identical in the retouched image.
  • the computer-generated inpainting portion of each retouched image has the same resolution as or a larger resolution than the rest of the original image, thereby allowing the inpainting portion to be seamlessly merged into the original image.
  • Figures 8A and 8B are a flowchart of an example image inpainting process 800, in accordance with some embodiments.
  • the image inpainting process 800 is implemented locally in an electronic device 104 (e.g. a mobile phone), while any deep learning network applied in the process 800 is trained locally using training data downloaded from a remote server system 102.
  • the image inpainting process 800 is implemented locally in the electronic device 104, while any deep learning network applied in the process 800 is trained remotely in the remote server system 102 and downloaded to the electronic device 104.
  • the image inpainting process 800 is implemented in the server system 102, while any deep learning network applied in the process 800 is trained remotely in the remote server system 102 as well.
  • the image inpainting process 800 is implemented jointly by the electronic device 104 and server system 102, i.e., split therebetween.
  • the process 800 includes obtaining (802) a first image 514.
  • the first image 514 includes a first portion to be enhanced 530.
  • the first image 514 is captured locally by the electronic device (e.g., a mobile phone) that is configured to process the first image 514 via the image inpainting process 800.
  • the first image 514 is received from a second electronic device (e.g., a second mobile device) by a first electronic device (e.g., a first mobile phone) that is configured to process the first image 514 via the image inpainting process 800.
  • an instant messaging application is executed at each of the first and second electronic devices to communicate the first image 514 therebetween.
  • the first portion to be enhanced 530 includes an eye region in which the eyes of a person are closed or not entirely open.
  • the first portion to be enhanced 530 includes other facial features of a person, such as the nose, mouth, chin, skin, eyebrows, a mole, the face shape, or ears of the person.
  • the first portion to be enhanced 530 is damaged, defective, deteriorating, or missing in the first image 514.
  • one or both eyes are open too wide or closed. Eyeballs are oriented in a wrong direction. A shape of a nose is not desirable. A mouth is open too wide or closed too tightly. An eyebrow is too high or too low. A mole needs to be removed. A face shape is not desirable. A double chin needs to be removed.
  • the first image 514 includes a black-and-white image, and enhancing the first portion 530 includes generating a colored version of the black-and-white image.
  • the process 800 includes generating (804) a prior image 512 that is substantially similar to the first image 514 based on a random multivariate normal vector and generating (806) a prediction image 518 from the first image 514 and the prior image 512 using an image completion model.
  • the prediction image 518 replaces (808) the first portion of the first image 514 with an inpainting portion.
  • the first image 514 used to generate the prediction image 518 optionally includes the portion to be enhanced 530. Alternatively, the portion to be enhanced 530 may be removed to facilitate generation of the prediction image 518.
  • the inpainting portion has a resolution equal to or greater than the resolution of a remainder of the first image 514.
  • generating the prior image 512 further includes obtaining (810) the random multivariate normal vector, mapping (812) the random multivariate normal vector to a latent code using a mapping model (e.g., mapping network 506, Figure 5), and combining (814) the latent code and the first image 514 using a synthesis model (e.g., synthesis network 510, Figure 5) to generate the prior image 512.
  • the synthesis model includes (816) a deep convolutional generative adversarial network (DCGAN) configured to receive the latent code and synthesize the prior image 512 from the latent code.
  • the DCGAN includes a low convolutional layer and one or more high convolutional layers.
  • the low convolutional layer is configured to receive a learned fixed code c0.
  • Each of the low and high convolutional layers is configured to receive a style code.
  • Each convolutional layer is configured to project the latent code into a set of per-channel factors and offsets used to multiply an output of each channel of convolutional layer activations.
  • the mapping model, the synthesis model, and the image completion model are trained (818) jointly using a plurality of training images based on a loss function.
  • the mapping model and synthesis model are trained (820) using a first plurality of training images in a first training stage.
  • the image completion model is trained (822) using a second plurality of training images (e.g., the second plurality of training images are distinct from the first plurality of training images) in a second training stage following the first training stage.
  • the image completion model is trained at the server system.
  • the image completion model is trained (824) using a loss function.
  • the loss function is (826) a weighted combination of a reconstruction loss, a style loss, a perceptual loss, and an adversarial loss. More details on training of deep learning models applied in the process 800 are discussed above with reference to Figure 5.
  • the image completion model includes an encoder configured to down-sample the first image 514 and a plurality of residual blocks configured to process the down-sampled first image 514.
  • the image completion model also includes a decoder configured to up-sample the processed first image 514 to an original size of the first image 514.
  • at least one of the plurality of residual blocks is configured to implement dilated convolution with a dilation factor of 2.
  • each of the plurality of residual blocks is configured to implement dilated convolution with a dilation factor of 2.
  • the process 800 is implemented by a user application 224 installed at an electronic device.
  • the user application is integrated in an internal photo album of the electronic device and configured to process any images stored in the internal photo album.
  • the internal photo album is optionally executed with an operating system of the electronic device.
  • the user application is a picture application that is distinct from the internal photo album and configured to process any images captured, modified or stored by the picture application.
  • the user application is an instant messaging application, and configured to allow a user of the instant messaging application to modify the first image 514 shared from or to the user via the instant messaging application.
  • the particular order in which the operations in Figures 8A and 8B have been described is merely exemplary and is not intended to indicate that the described order is the only order in which the operations could be performed.
  • One of ordinary skill in the art would recognize various ways to apply the details described above with reference to Figures 1-7 to the image inpainting process 800 described in Figures 8A and 8B. For brevity, these details are not repeated here.
  • Computer-readable media may include computer-readable storage media, which corresponds to a tangible medium such as data storage media, or communication media including any medium that facilitates transfer of a computer program from one place to another, e.g., according to a communication protocol.
  • computer-readable media generally may correspond to (1) tangible computer-readable storage media which is non-transitory or (2) a communication medium such as a signal or carrier wave.
  • Data storage media may be any available media that can be accessed by one or more computers or one or more processors to retrieve instructions, code and/or data structures for implementation of the embodiments described in the present application.
  • a computer program product may include a computer-readable medium.
  • although the terms first, second, etc. may be used herein to describe various elements, these elements should not be limited by these terms. These terms are only used to distinguish one element from another.
  • a first network could be termed a second network, and, similarly, a second network could be termed a first network, without departing from the scope of the embodiments.
  • the first network and the second network are both networks, but they are not the same network.
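
The sketch below outlines, under the assumption of a PyTorch implementation, the prior-image flow summarized in operations 810-814: a random multivariate normal vector is mapped to a latent code by a mapping model, and a synthesis model combines that latent code with the first image to produce the prior image. The module names, layer sizes, and the particular way the latent code is injected are illustrative assumptions, not details taken from this application.

```python
import torch
import torch.nn as nn

LATENT_DIM = 512  # assumed size of the random multivariate normal vector

class MappingNetwork(nn.Module):
    """Maps a random normal vector z to a latent code w (illustrative)."""
    def __init__(self, dim=LATENT_DIM, n_layers=8):
        super().__init__()
        layers = []
        for _ in range(n_layers):
            layers += [nn.Linear(dim, dim), nn.LeakyReLU(0.2)]
        self.net = nn.Sequential(*layers)

    def forward(self, z):
        return self.net(z)

class SynthesisNetwork(nn.Module):
    """Combines the latent code with the first image to produce a prior image."""
    def __init__(self, dim=LATENT_DIM):
        super().__init__()
        self.image_encoder = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, dim, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(dim, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 3, 4, stride=2, padding=1), nn.Tanh(),
        )

    def forward(self, w, first_image):
        feat = self.image_encoder(first_image)   # B x dim x H/4 x W/4
        feat = feat + w[:, :, None, None]        # inject the latent code per channel
        return self.decoder(feat)                # prior image at the input resolution

# Usage: sample z ~ N(0, I), map it to a latent code, then synthesize the prior image.
z = torch.randn(1, LATENT_DIM)
mapping, synthesis = MappingNetwork(), SynthesisNetwork()
first_image = torch.zeros(1, 3, 256, 256)        # placeholder for the (masked) first image
prior_image = synthesis(mapping(z), first_image)
```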
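
Next is a minimal sketch of a style-modulated convolutional layer of the kind described for the DCGAN-based synthesis model: the lowest layer starts from a learned fixed code c₀, and each layer projects the latent code into per-channel factors and offsets that modulate its activations. This is written against PyTorch; the class names, channel counts, and the 1 + scale parameterization are assumptions made for illustration only.

```python
import torch
import torch.nn as nn

class StyleModulatedConv(nn.Module):
    """A convolution whose output channels are scaled and shifted by the latent code."""
    def __init__(self, in_ch, out_ch, latent_dim=512):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, 3, padding=1)
        # Projects the latent code into a per-channel factor and offset.
        self.to_style = nn.Linear(latent_dim, 2 * out_ch)

    def forward(self, x, w):
        h = self.conv(x)
        scale, offset = self.to_style(w).chunk(2, dim=1)          # B x out_ch each
        return h * (1 + scale[:, :, None, None]) + offset[:, :, None, None]

class TinySynthesis(nn.Module):
    """Lowest layer starts from a learned fixed code c0; all layers receive a style code."""
    def __init__(self, latent_dim=512):
        super().__init__()
        self.c0 = nn.Parameter(torch.randn(1, 256, 4, 4))         # learned fixed code c0
        self.layer1 = StyleModulatedConv(256, 128, latent_dim)
        self.layer2 = StyleModulatedConv(128, 3, latent_dim)
        self.up = nn.Upsample(scale_factor=2, mode="nearest")

    def forward(self, w):
        x = self.c0.expand(w.shape[0], -1, -1, -1)
        x = self.up(torch.relu(self.layer1(x, w)))
        return torch.tanh(self.layer2(x, w))

# Usage: two latent codes produce two small synthesized images.
out = TinySynthesis()(torch.randn(2, 512))                        # 2 x 3 x 8 x 8
```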
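
The following is a structural sketch of an image completion model with the shape described above: an encoder that down-samples the input, a stack of residual blocks using dilated convolution with a dilation factor of 2, and a decoder that up-samples back to the original size. The channel counts, the number of blocks, and the channel-wise concatenation of the first image with the prior image are illustrative assumptions.

```python
import torch
import torch.nn as nn

class DilatedResBlock(nn.Module):
    """Residual block built on dilated convolution with a dilation factor of 2."""
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=2, dilation=2), nn.ReLU(),
            nn.Conv2d(ch, ch, 3, padding=2, dilation=2),
        )

    def forward(self, x):
        return torch.relu(x + self.body(x))      # residual connection

class CompletionModel(nn.Module):
    def __init__(self, ch=64, n_blocks=4):
        super().__init__()
        self.encoder = nn.Sequential(            # down-samples by a factor of 4
            nn.Conv2d(6, ch, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(ch, ch, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.blocks = nn.Sequential(*[DilatedResBlock(ch) for _ in range(n_blocks)])
        self.decoder = nn.Sequential(            # up-samples back to the original size
            nn.ConvTranspose2d(ch, ch, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(ch, 3, 4, stride=2, padding=1), nn.Tanh(),
        )

    def forward(self, first_image, prior_image):
        # The first image and the prior image are concatenated along the channel axis.
        x = torch.cat([first_image, prior_image], dim=1)
        return self.decoder(self.blocks(self.encoder(x)))

# Usage: a 256x256 first image plus its prior image yields a 256x256 prediction image.
pred = CompletionModel()(torch.zeros(1, 3, 256, 256), torch.zeros(1, 3, 256, 256))
```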
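
The weighted-loss sketch below shows one way the combination of reconstruction, style, perceptual, and adversarial terms described in operation 826 could be assembled in PyTorch. The weights, the use of L1 distances, the Gram-matrix style term, and the non-saturating adversarial term are illustrative choices; the application does not fix them in this passage.

```python
import torch
import torch.nn.functional as F

def gram_matrix(feat):
    """Channel-by-channel correlation of a feature map, used by the style term."""
    b, c, h, w = feat.shape
    f = feat.reshape(b, c, h * w)
    return torch.bmm(f, f.transpose(1, 2)) / (c * h * w)

def inpainting_loss(pred, target, pred_feats, target_feats, disc_logits_fake,
                    w_rec=1.0, w_style=250.0, w_perc=0.1, w_adv=0.01):
    # Reconstruction loss: pixel-wise L1 between the prediction and the ground truth.
    l_rec = F.l1_loss(pred, target)
    # Perceptual and style losses over feature maps from a fixed network (e.g., VGG).
    l_perc = sum(F.l1_loss(pf, tf) for pf, tf in zip(pred_feats, target_feats))
    l_style = sum(F.l1_loss(gram_matrix(pf), gram_matrix(tf))
                  for pf, tf in zip(pred_feats, target_feats))
    # Adversarial (generator) loss from the discriminator's logits on the prediction.
    l_adv = F.softplus(-disc_logits_fake).mean()
    return w_rec * l_rec + w_style * l_style + w_perc * l_perc + w_adv * l_adv
```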
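
Finally, a sketch of the two-stage schedule in which the mapping and synthesis models are trained first and the image completion model is trained afterwards on a distinct set of images. The optimizers, learning rates, data loaders, and the loss callables (for instance, a wrapper around the weighted combination sketched above) are all assumptions for illustration, not details from the application.

```python
import itertools
import torch

def train_two_stages(mapping, synthesis, completion, stage1_loader, stage2_loader,
                     stage1_loss, stage2_loss, latent_dim=512, epochs=1):
    # Stage 1: jointly optimize the prior-generation models (mapping + synthesis).
    opt1 = torch.optim.Adam(
        itertools.chain(mapping.parameters(), synthesis.parameters()), lr=2e-4)
    for _ in range(epochs):
        for images in stage1_loader:
            z = torch.randn(images.shape[0], latent_dim)
            prior = synthesis(mapping(z), images)
            loss = stage1_loss(prior, images)
            opt1.zero_grad(); loss.backward(); opt1.step()

    # Stage 2: freeze the prior models and train only the image completion model.
    for p in itertools.chain(mapping.parameters(), synthesis.parameters()):
        p.requires_grad_(False)
    opt2 = torch.optim.Adam(completion.parameters(), lr=1e-4)
    for _ in range(epochs):
        for masked, target in stage2_loader:
            with torch.no_grad():
                z = torch.randn(masked.shape[0], latent_dim)
                prior = synthesis(mapping(z), masked)
            pred = completion(masked, prior)
            loss = stage2_loss(pred, target)
            opt2.zero_grad(); loss.backward(); opt2.step()
```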

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Medical Informatics (AREA)
  • Computational Linguistics (AREA)
  • Databases & Information Systems (AREA)
  • Molecular Biology (AREA)
  • Multimedia (AREA)
  • Biomedical Technology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Measuring And Recording Apparatus For Diagnosis (AREA)
  • Image Processing (AREA)
  • Image Analysis (AREA)

Abstract

This application relates to image inpainting. A first image is obtained with a first portion to be enhanced. A prior image, intended to be substantially similar to the first image, is generated based on a random multivariate normal vector. A prediction image is generated from the first image and the prior image using an image completion model. In the prediction image, the first portion of the first image is replaced or updated with an inpainting portion. In some embodiments, the first portion to be enhanced includes a facial feature, for example, closed eyes.
PCT/US2021/016774 2021-02-05 2021-02-05 Systèmes et procédés de transfert de connaissance préalable pour la retouche d'image WO2021077140A2 (fr)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/US2021/016774 WO2021077140A2 (fr) 2021-02-05 2021-02-05 Systèmes et procédés de transfert de connaissance préalable pour la retouche d'image

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/US2021/016774 WO2021077140A2 (fr) 2021-02-05 2021-02-05 Systèmes et procédés de transfert de connaissance préalable pour la retouche d'image

Publications (2)

Publication Number Publication Date
WO2021077140A2 true WO2021077140A2 (fr) 2021-04-22
WO2021077140A3 WO2021077140A3 (fr) 2021-12-16

Family

ID=75538313

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2021/016774 WO2021077140A2 (fr) 2021-02-05 2021-02-05 Systèmes et procédés de transfert de connaissance préalable pour la retouche d'image

Country Status (1)

Country Link
WO (1) WO2021077140A2 (fr)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113505845A (zh) * 2021-07-23 2021-10-15 黑龙江省博雅智睿科技发展有限责任公司 一种基于语言的深度学习训练集图像生成方法
CN114331903A (zh) * 2021-12-31 2022-04-12 电子科技大学 一种图像修复方法及存储介质
CN114581343A (zh) * 2022-05-05 2022-06-03 南京大学 一种图像的修复方法、装置、电子设备及存储介质
CN115983495A (zh) * 2023-02-20 2023-04-18 哈尔滨工业大学(深圳)(哈尔滨工业大学深圳科技创新研究院) 基于RFR-Net的全球中性大气温度密度预测方法及设备
WO2023207778A1 (fr) * 2022-04-24 2023-11-02 腾讯科技(深圳)有限公司 Procédé et dispositif de récupération de données, et support d'enregistrement
CN117094919A (zh) * 2023-10-20 2023-11-21 中国传媒大学 基于扩散模型的壁画数字化修复系统及方法
CN117994171A (zh) * 2024-04-03 2024-05-07 中国海洋大学 基于傅里叶变换扩散模型的海表面温度图像的补全方法
CN117994171B (zh) * 2024-04-03 2024-05-31 中国海洋大学 基于傅里叶变换扩散模型的海表面温度图像的补全方法

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6987520B2 (en) * 2003-02-24 2006-01-17 Microsoft Corporation Image region filling by exemplar-based inpainting
US9373160B2 (en) * 2013-12-18 2016-06-21 New York University System, method and computer-accessible medium for restoring an image taken through a window
US9159123B2 (en) * 2014-01-24 2015-10-13 Adobe Systems Incorporated Image prior as a shared basis mixture model
US10740881B2 (en) * 2018-03-26 2020-08-11 Adobe Inc. Deep patch feature prediction for image inpainting
EP3742346A3 (fr) * 2019-05-23 2021-06-16 HTC Corporation Procédé de formation d'un réseau antagoniste génératif (gan), procédé de génération d'images en utilisant un gan et support d'enregistrement lisible par ordinateur

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113505845A (zh) * 2021-07-23 2021-10-15 黑龙江省博雅智睿科技发展有限责任公司 一种基于语言的深度学习训练集图像生成方法
CN114331903A (zh) * 2021-12-31 2022-04-12 电子科技大学 一种图像修复方法及存储介质
WO2023207778A1 (fr) * 2022-04-24 2023-11-02 腾讯科技(深圳)有限公司 Procédé et dispositif de récupération de données, et support d'enregistrement
CN114581343A (zh) * 2022-05-05 2022-06-03 南京大学 一种图像的修复方法、装置、电子设备及存储介质
CN115983495A (zh) * 2023-02-20 2023-04-18 哈尔滨工业大学(深圳)(哈尔滨工业大学深圳科技创新研究院) 基于RFR-Net的全球中性大气温度密度预测方法及设备
CN115983495B (zh) * 2023-02-20 2023-08-11 哈尔滨工业大学(深圳)(哈尔滨工业大学深圳科技创新研究院) 基于RFR-Net的全球中性大气温度密度预测方法及设备
CN117094919A (zh) * 2023-10-20 2023-11-21 中国传媒大学 基于扩散模型的壁画数字化修复系统及方法
CN117094919B (zh) * 2023-10-20 2023-12-15 中国传媒大学 基于扩散模型的壁画数字化修复系统及方法
CN117994171A (zh) * 2024-04-03 2024-05-07 中国海洋大学 基于傅里叶变换扩散模型的海表面温度图像的补全方法
CN117994171B (zh) * 2024-04-03 2024-05-31 中国海洋大学 基于傅里叶变换扩散模型的海表面温度图像的补全方法

Also Published As

Publication number Publication date
WO2021077140A3 (fr) 2021-12-16

Similar Documents

Publication Publication Date Title
WO2021077140A2 (fr) Systèmes et procédés de transfert de connaissance préalable pour la retouche d'image
Liao et al. DR-GAN: Automatic radial distortion rectification using conditional GAN in real-time
US11481869B2 (en) Cross-domain image translation
WO2021081562A2 (fr) Modèle de reconnaissance de texte multi-tête pour la reconnaissance optique de caractères multilingue
WO2021184026A1 (fr) Fusion audiovisuelle avec attention intermodale pour la reconnaissance d'actions vidéo
CN108388889B (zh) 用于分析人脸图像的方法和装置
CN115699082A (zh) 缺陷检测方法及装置、存储介质及电子设备
WO2023101679A1 (fr) Récupération inter-modale d'image de texte sur la base d'une expansion de mots virtuels
WO2021092600A2 (fr) Réseau pose-over-parts pour estimation de pose multi-personnes
WO2022103877A1 (fr) Génération d'avatar 3d à commande audio réaliste
US20230196739A1 (en) Machine learning device and far-infrared image capturing device
US20230267587A1 (en) Tuning color image fusion towards original input color with adjustable details
WO2023086398A1 (fr) Réseaux de rendu 3d basés sur des champs de radiance neurale de réfraction
WO2023277877A1 (fr) Détection et reconstruction de plan sémantique 3d
CN111553961B (zh) 线稿对应色图的获取方法和装置、存储介质和电子装置
WO2023091131A1 (fr) Procédés et systèmes pour récupérer des images sur la base de caractéristiques de plan sémantique
WO2023069086A1 (fr) Système et procédé de ré-éclairage de portrait dynamique
WO2023027712A1 (fr) Procédés et systèmes permettant de reconstruire simultanément une pose et des modèles humains 3d paramétriques dans des dispositifs mobiles
WO2023277888A1 (fr) Suivi de la main selon multiples perspectives
CN116420163A (zh) 识别系统、识别方法、程序、学习方法、学习完毕模型、蒸馏模型及学习用数据集生成方法
US20230274403A1 (en) Depth-based see-through prevention in image fusion
RU2817316C2 (ru) Способ и устройство для обучения модели генерирования изображений, способ и устройство для генерирования изображений и их устройства
WO2023172257A1 (fr) Stéréo photométrique pour surface dynamique avec champ de mouvement
US20240087344A1 (en) Real-time scene text area detection
WO2023023160A1 (fr) Reconstruction d'informations de profondeur à partir d'images stéréo multi-vues (mvs)

Legal Events

Date Code Title Description
NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21719013

Country of ref document: EP

Kind code of ref document: A2