WO2024117708A1 - Facial image conversion method using a diffusion model - Google Patents

Facial image conversion method using a diffusion model

Info

Publication number
WO2024117708A1
Authority
WO
WIPO (PCT)
Prior art keywords
loss function
image
neural network
information
network model
Prior art date
Application number
PCT/KR2023/019225
Other languages
English (en)
Korean (ko)
Inventor
김기홍
김윤호
이광희
Original Assignee
주식회사 비브스튜디오스
Priority date
Filing date
Publication date
Application filed by 주식회사 비브스튜디오스
Publication of WO2024117708A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformations in the plane of the image
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformations in the plane of the image
    • G06T3/40Scaling of whole images or parts thereof, e.g. expanding or contracting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/20Image enhancement or restoration using local operators
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74Image or video pattern matching; Proximity measures in feature spaces
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/12Fingerprints or palmprints
    • G06V40/13Sensors therefor

Definitions

  • This disclosure relates to a facial image conversion method using a diffusion model, and more specifically, to a method of providing a face image in which identity characteristics are exchanged between two people using a diffusion model so that the characteristics of one person's face are transferred to a photo of another person.
  • the present disclosure has been derived based at least on the technical background examined above, but the technical problem or purpose of the present disclosure is not limited to solving the problems or shortcomings examined above.
  • the present disclosure can cover various technical issues related to the content to be described below.
  • the problem to be solved by this disclosure is to provide a face image transformed so that the characteristics of one person's face are transferred to a photo of another person, by exchanging identity characteristics between two people using a diffusion model.
  • a method performed by a computing device for solving the above-described problem is disclosed.
  • the method includes: receiving a target image and an original image; removing, using a neural network model, part or all of the noise from first image information containing noise and obtaining second image information; comparing the second image information and the target image to calculate a feature loss function; comparing the second image information and the original image to calculate a structure loss function; and training the neural network model based on at least one of the feature loss function or the structure loss function.
  • the neural network model may include a neural network model trained to remove part or all of Gaussian distributed noise to obtain an image from which part or all of the noise has been removed, and a neural network model trained, when the original image and a mask condition are input, to obtain the second image information from which part or all of the noise has been removed based on the original image and the mask condition.
  • the mask condition may be determined by creating a masking area for a portion of the original image where conversion is desired and based on the created masking area.
  • calculating a feature loss function by comparing the second image information and the target image may include extracting a first identity feature vector based on the second image information, extracting a target identity feature vector based on the target image, and calculating the feature loss function based on the extracted first identity feature vector and the target identity feature vector.
  • the structure loss function includes a region loss function or a gaze loss function, and calculating the structure loss function by comparing the second image information and the original image may include: calculating a region loss function by comparing the second image information and the original image; and calculating a gaze loss function by comparing the second image information and the original image.
  • the step of calculating a region loss function by comparing the second image information and the original image may include extracting first segmentation area information based on the second image information, extracting original segmentation area information based on the original image, and calculating the region loss function based on the extracted first segmentation area information and the original segmentation area information.
  • the step of calculating a gaze loss function by comparing the second image information and the original image may include extracting first gaze direction information based on the second image information, extracting original gaze direction information based on the original image, and calculating the gaze direction loss function based on the extracted first gaze direction information and the original gaze direction information.
  • the structure loss function includes a region loss function or a gaze loss function
  • training the neural network model based on at least one of the feature loss function or the structure loss function includes: calculating a transformation loss function based on the feature loss function and the structure loss function; and training the neural network model based on the transformation loss function, wherein the transformation loss function may be calculated as a weighted sum of the feature loss function, the region loss function, and the gaze loss function.
  • a computer program stored in a computer-readable storage medium is disclosed. When executed on one or more processors, the computer program causes the one or more processors to perform operations for learning a neural network model for converting an image, the operations comprising: receiving a target image and an original image; removing, using a neural network model, part or all of the noise from first image information containing noise and obtaining second image information; comparing the second image information and the target image to calculate a feature loss function; comparing the second image information and the original image to calculate a structure loss function; and training the neural network model based on at least one of the feature loss function or the structure loss function.
  • a computing device for solving the above-described problem is disclosed.
  • the device includes at least one processor and a memory, wherein the processor is configured to: receive a target image and an original image; remove, using a neural network model, part or all of the noise from first image information containing noise and obtain second image information; compare the second image information and the target image to calculate a feature loss function; compare the second image information and the original image to calculate a structure loss function; and train the neural network model based on at least one of the feature loss function or the structure loss function.
  • the present disclosure can achieve the effect of naturally converting a face image by applying desired identity characteristics to the image, synthesizing the facial features of the person being synthesized at the correct position, and processing the person's gaze correctly.
  • FIG. 1 is a block diagram of a computing device for converting a face image using a diffusion model according to an embodiment of the present disclosure.
  • Figure 2 is a schematic diagram showing a network function according to an embodiment of the present disclosure.
  • Figure 3 is a flowchart showing a method for learning a neural network model for converting an image according to an embodiment of the present disclosure.
  • Figure 4 is a schematic diagram illustrating a process in which a neural network model is trained to remove part or all of Gaussian distributed noise and obtain an image from which part or all of the noise has been removed, according to an embodiment of the present disclosure.
  • FIG. 5 is a schematic diagram showing a process in which, when the original image, target image, and mask condition are input to a neural network model according to an embodiment of the present disclosure, the neural network model is trained to obtain second image information from which part or all of the noise has been removed based on the original image, the target image, and the mask condition.
  • FIG. 6A is a schematic diagram illustrating a process for calculating a feature loss function by comparing second image information and a target image according to an embodiment of the present disclosure.
  • FIG. 6B is a schematic diagram illustrating a process for calculating a region loss function by comparing second image information and the original image according to an embodiment of the present disclosure.
  • Figure 6C is a schematic diagram showing a process of calculating a gaze direction loss function by comparing second image information and the original image according to an embodiment of the present disclosure.
  • FIG. 7 is a brief, general schematic diagram of an example computing environment in which embodiments of the present disclosure may be implemented.
  • a component may be, but is not limited to, a process running on a processor, a processor, an object, a thread of execution, a program, and/or a computer.
  • an application running on a computing device and the computing device can be a component.
  • One or more components may reside within a processor and/or thread of execution.
  • a component may be localized within one computer.
  • a component may be distributed between two or more computers. Additionally, these components can execute from various computer-readable media having various data structures stored thereon.
  • Components may communicate through local and/or remote processes, for example, depending on a signal having one or more data packets (e.g., data and/or signals from one component interacting with another component in a local system or a distributed system, or with other systems over a network such as the Internet).
  • the term “or” is intended to mean an inclusive “or” rather than an exclusive “or.” That is, unless otherwise specified or clear from context, “X uses A or B” is intended to mean any of the natural inclusive substitutions: X uses A, X uses B, or X uses both A and B. Additionally, the term “and/or” as used herein should be understood to refer to and include all possible combinations of one or more of the related listed items.
  • the term “at least one of A or B” should be interpreted to mean “a case containing only A,” “a case containing only B,” and “a case of combining A and B.”
  • the terms network function, artificial neural network, and neural network may be used interchangeably.
  • FIG. 1 is a block diagram of a computing device for converting a face image using a diffusion model according to an embodiment of the present disclosure.
  • the configuration of the computing device 100 shown in FIG. 1 is only a simplified example.
  • the computing device 100 may include different components for performing the computing environment of the computing device 100, and only some of the disclosed components may configure the computing device 100.
  • the computing device 100 may include a processor 110, a memory 130, and a network unit 150.
  • the processor 110 may be composed of one or more cores, and may include a processor for data analysis and deep learning, such as a central processing unit (CPU), a general purpose graphics processing unit (GPGPU), or a tensor processing unit (TPU) of the computing device.
  • the processor 110 may read a computer program stored in the memory 130 and perform data processing for machine learning according to an embodiment of the present disclosure. According to an embodiment of the present disclosure, the processor 110 may perform an operation for learning a neural network model.
  • the processor 110 may perform calculations for learning a neural network model, such as processing input data for learning in deep learning (DL), extracting features from the input data, calculating errors, and updating the weights of the neural network model using backpropagation.
  • At least one of the CPU, GPGPU, and TPU of the processor 110 may process learning of the neural network model.
  • the CPU and GPGPU can work together to process neural network model learning and data classification using the neural network model.
  • the processors of a plurality of computing devices can be used together to process learning of a neural network model and data classification using the neural network model.
  • a computer program executed in a computing device may be a CPU, GPGPU, or TPU executable program.
  • the memory 130 may store any type of information generated or determined by the processor 110 and any type of information received by the network unit 150.
  • the memory 130 may include at least one type of storage medium among a flash memory type, a hard disk type, a multimedia card micro type, a card type memory (e.g., SD or XD memory), random access memory (RAM), read-only memory (ROM), magnetic memory, a magnetic disk, and an optical disk.
  • the computing device 100 may operate in connection with web storage that performs a storage function of the memory 130 on the Internet.
  • the description of the memory described above is merely an example, and the present disclosure is not limited thereto.
  • the network unit 150 can use a variety of wired communication systems, such as Public Switched Telephone Network (PSTN), x Digital Subscriber Line (xDSL), Rate Adaptive DSL (RADSL), Multi Rate DSL (MDSL), Very High Speed DSL (VDSL), Universal Asymmetric DSL (UADSL), High Bit Rate DSL (HDSL), and Local Area Network (LAN).
  • the network unit 150 presented in the present disclosure can also use a variety of wireless communication systems, such as Code Division Multi Access (CDMA), Time Division Multi Access (TDMA), Frequency Division Multi Access (FDMA), Orthogonal Frequency Division Multi Access (OFDMA), Single Carrier-FDMA (SC-FDMA), and other systems.
  • the network unit 150 may be configured regardless of its communication mode, such as wired or wireless, and may be composed of various communication networks such as a personal area network (PAN) and a wide area network (WAN). Additionally, the network may be the well-known World Wide Web (WWW), and may also use wireless transmission technology used for short-distance communication, such as Infrared Data Association (IrDA) or Bluetooth. The techniques described in this disclosure can also be used in the other networks mentioned above.
  • Figure 2 is a schematic diagram showing a network function according to an embodiment of the present disclosure.
  • a neural network can generally consist of a set of interconnected computational units, which can be referred to as nodes. These nodes may also be referred to as neurons.
  • a neural network consists of at least one node. Nodes (or neurons) that make up neural networks may be interconnected by one or more links.
  • one or more nodes connected through a link may form a relative input node and output node relationship.
  • the concepts of input node and output node are relative, and any node in an output node relationship with one node may be in an input node relationship with another node, and vice versa.
  • input node to output node relationships can be created around links.
  • One or more output nodes can be connected to one input node through a link, and vice versa.
  • the value of the data of the output node may be determined based on the data input to the input node.
  • the link connecting the input node and the output node may have a weight. The weight may be variable and may be varied by a user or an algorithm in order for the neural network to perform the desired function. For example, when one or more input nodes are connected to one output node by respective links, the output node value can be determined based on the values input to the input nodes connected to the output node and the weights of the links corresponding to each input node.
  • one or more nodes are interconnected through one or more links to form an input node and output node relationship within the neural network.
  • the characteristics of the neural network may be determined according to the number of nodes and links within the neural network, the correlation between the nodes and links, and the value of the weight assigned to each link. For example, if the same number of nodes and links exist and two neural networks with different weight values of the links exist, the two neural networks may be recognized as different from each other.
  • a neural network may consist of a set of one or more nodes.
  • a subset of nodes that make up a neural network can form a layer.
  • Some of the nodes constituting the neural network may form one layer based on the distances from the first input node.
  • a set of nodes with a distance n from the initial input node may constitute n layers.
  • the distance from the initial input node can be defined by the minimum number of links that must be passed to reach the node from the initial input node.
  • this definition of a layer is arbitrary for explanation purposes, and the order of a layer within a neural network may be defined in a different way than described above.
  • a layer of nodes may be defined by distance from the final output node.
  • the initial input node may refer to one or more nodes in the neural network through which data is directly input without going through links in relationships with other nodes. Alternatively, in the relationship between nodes based on links within a neural network, it may refer to nodes that do not have other input nodes connected by links. Similarly, the final output node may refer to one or more nodes that do not have an output node in their relationship with other nodes among the nodes in the neural network. Additionally, hidden nodes may refer to nodes constituting a neural network other than the first input node and the last output node.
  • the neural network according to an embodiment of the present disclosure may be a neural network in which the number of nodes in the input layer is the same as the number of nodes in the output layer, and the number of nodes decreases and then increases again as it progresses from the input layer to the hidden layer.
  • the neural network according to another embodiment of the present disclosure may be a neural network in which the number of nodes in the input layer is less than the number of nodes in the output layer, and the number of nodes decreases as it progresses from the input layer to the hidden layer.
  • the neural network according to another embodiment of the present disclosure may be a neural network in which the number of nodes in the input layer is greater than the number of nodes in the output layer, and the number of nodes increases as it progresses from the input layer to the hidden layer.
  • a neural network according to another embodiment of the present disclosure may be a neural network that is a combination of the above-described neural networks.
  • a deep neural network may refer to a neural network that includes multiple hidden layers in addition to the input layer and output layer. Deep neural networks make it possible to identify latent structures in data. In other words, it is possible to identify the latent structure of a photo, text, video, voice, or music (e.g., what object is in the photo, what the content and emotion of the text are, what the content and emotion of the voice are, etc.). Deep neural networks include convolutional neural networks (CNN), recurrent neural networks (RNN), autoencoders, generative adversarial networks (GAN), restricted Boltzmann machines (RBM), deep belief networks (DBN), Q networks, U networks, Siamese networks, etc.
  • the network function may include an autoencoder.
  • An autoencoder may be a type of artificial neural network for outputting output data similar to input data.
  • the autoencoder may include at least one hidden layer, and an odd number of hidden layers may be placed between input and output layers.
  • the number of nodes in each layer may be reduced from the number of nodes in the input layer to an intermediate layer called the bottleneck layer (encoding), and then expanded symmetrically from the bottleneck layer to the output layer (symmetric to the input layer).
  • Autoencoders can perform nonlinear dimensionality reduction.
  • the number of nodes in the input layer and the output layer may correspond to the dimension of the input data remaining after preprocessing.
  • the number of nodes in the hidden layers included in the encoder may have a structure that decreases with distance from the input layer. If the number of nodes in the bottleneck layer (the layer with the fewest nodes, located between the encoder and the decoder) is too small, not enough information may be conveyed, so it may be maintained above a certain number (e.g., more than half of the number of nodes in the input layer).
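  • The following is a minimal sketch (not part of the original disclosure; PyTorch and a hypothetical 784-dimensional input are assumed) of the symmetric encoder-bottleneck-decoder structure described above, with the bottleneck kept at half the input dimension in line with the guidance above.

```python
import torch
from torch import nn

class AutoEncoder(nn.Module):
    """Symmetric autoencoder: node counts shrink to the bottleneck, then expand."""
    def __init__(self, in_dim: int = 784, bottleneck: int = 392):
        super().__init__()
        # Encoder: the number of nodes decreases toward the bottleneck layer.
        self.encoder = nn.Sequential(
            nn.Linear(in_dim, 512), nn.ReLU(),
            nn.Linear(512, bottleneck), nn.ReLU(),
        )
        # Decoder: expands symmetrically back to the input dimension.
        self.decoder = nn.Sequential(
            nn.Linear(bottleneck, 512), nn.ReLU(),
            nn.Linear(512, in_dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.decoder(self.encoder(x))
```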
  • a neural network may be trained in at least one of supervised learning, unsupervised learning, semi-supervised learning, or reinforcement learning.
  • Learning of a neural network may be a process of applying knowledge for the neural network to perform a specific operation to the neural network.
  • Neural networks can be trained to minimize output errors.
  • In neural network learning, learning data is repeatedly input into the neural network, the error between the output of the neural network and the target for the learning data is calculated, and the error is backpropagated from the output layer of the neural network toward the input layer in the direction of reducing the error, updating the weight of each node in the neural network.
  • In supervised learning, learning data in which each item is labeled with the correct answer is used (i.e., labeled learning data), and in the case of unsupervised learning, the correct answer may not be labeled in each item of learning data.
  • the learning data may be data in which each item of learning data is labeled with a category.
  • Labeled training data is input to the neural network, and the error can be calculated by comparing the output (category) of the neural network with the label of the training data.
  • the error can be calculated by comparing the input training data with the neural network output. The calculated error is backpropagated in the reverse direction (i.e., from the output layer to the input layer) in the neural network, and the connection weight of each node in each layer of the neural network can be updated according to backpropagation. The amount of change in the connection weight of each updated node may be determined according to the learning rate.
  • the neural network's calculation of input data and backpropagation of errors can constitute a learning cycle (epoch).
  • the learning rate may be applied differently depending on the number of repetitions of the learning cycle of the neural network. For example, in the early stages of neural network training, a high learning rate can be used to increase efficiency by allowing the neural network to quickly achieve a certain level of performance, and in the later stages of training, a low learning rate can be used to increase accuracy.
  • the training data can generally be a subset of real data (i.e., the data to be processed using the learned neural network), and thus there may be a learning cycle in which the error for the training data decreases but the error for the real data increases.
  • Overfitting is a phenomenon in which errors in actual data increase due to excessive learning on training data. For example, a phenomenon in which a neural network that learned to recognize cats from yellow cats fails to recognize a non-yellow cat as a cat may be a type of overfitting. Overfitting can cause errors in machine learning algorithms to increase. To prevent such overfitting, various optimization methods can be used, such as increasing the learning data, regularization, dropout that disables some of the network's nodes during the learning process, and use of a batch normalization layer.
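  • As a minimal illustration (not part of the original disclosure; PyTorch is assumed), the countermeasures named above can be combined as follows: dropout and a batch normalization layer inside the network, and weight regularization via the optimizer's weight decay.

```python
import torch
from torch import nn

model = nn.Sequential(
    nn.Linear(128, 64),
    nn.BatchNorm1d(64),   # batch normalization layer
    nn.ReLU(),
    nn.Dropout(p=0.5),    # randomly disables some nodes during training
    nn.Linear(64, 10),
)
# Weight decay acts as L2 regularization; a smaller learning rate could be
# used in later epochs, as described above for the learning-rate schedule.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)
```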
  • Figure 3 is a flowchart showing a method for learning a neural network model for converting an image according to an embodiment of the present disclosure.
  • the computing device 100 may directly acquire an “image for conversion using a diffusion model” or receive it from an external system.
  • the external system may be a server, database, etc. that stores and manages images for post-processing.
  • the computing device 100 may use an image directly acquired or received from an external system as “input data for converting a facial image using a diffusion model.”
  • the computing device 100 may receive the target image and the original image (S110).
  • the original image may include an image that is the target of conversion
  • the target image may include an image with characteristics to be reflected in the original image.
  • the original image may include an image in which only the identity of the face is subject to conversion while maintaining the posture, placement of facial features, gaze direction, etc.
  • the target image may include an image containing the identity characteristics of the face to be reflected in the original image.
  • Although a face image is used as an example, the target image and the original image are not limited to face images and may include various images.
  • the computing device 100 may use a neural network model to remove part or all of the noise from first image information containing noise and obtain second image information (S120).
  • the terms first image information and second image information are only used to distinguish an image from which noise has been partially or completely removed from an image containing noise, and are not limited to referring to a specific image.
  • the first image information containing noise may include isotropic Gaussian distributed noise, and the computing device 100 may use the neural network model to partially remove noise from the Gaussian distributed noise and acquire the resulting image as second image information.
  • the computing device 100 may obtain an image from which the noise has been completely removed by repeating the process of removing the noise using the neural network model, and in this case, the image from which the noise has been completely removed will be included in the second image information.
  • An image containing noise immediately before an image from which noise has been completely removed may be included in the first image information.
  • the specific process in which part or all of the noise is removed from the first image information containing noise by using a neural network model and the second image information is acquired will be described later with reference to FIG. 5.
  • the neural network model may include a neural network model trained to remove part or all of Gaussian distributed noise to obtain an image from which part or all of the noise has been removed, and a neural network model trained, when the original image and the mask condition are input, to obtain the second image information from which part or all of the noise has been removed based on the original image and the mask condition.
  • the specific process by which the neural network model is trained to remove part or all of the Gaussian distributed noise to obtain an image from which part or all of the noise has been removed will be described later with reference to FIG. 4.
  • the mask condition can be determined by creating a masking area for the part of the original image received through step S110 for which conversion is desired, and based on the created masking area. The specific process for this is described later with reference to FIG. 5.
  • the computing device 100 may calculate a feature loss function by comparing the target image received through step S110 and the second image information obtained through step S120 (S130-1). Specifically, the computing device 100 may extract a target identity feature vector based on the target image received through step S110, extract a first identity feature vector based on the second image information obtained through step S120, and calculate the feature loss function based on the extracted first identity feature vector and the target identity feature vector.
  • the calculated feature loss function can be used in the process of learning a neural network model for image conversion, and the specific process will be described later with reference to FIGS. 5 and 6A. Meanwhile, the feature loss function may be obtained through comparison of the identity feature vectors (first and target), but is not limited to the above examples and may be obtained through various methods.
  • the computing device 100 may calculate a structure loss function by comparing the original image received through step S110 and the second image information obtained through step S120 (S130-2).
  • the structure loss function may include a region loss function or a gaze loss function
  • the step of calculating the structure loss function by comparing the original image received through step S110 and the second image information obtained through step S120 may include calculating a region loss function by comparing the second image information and the original image, and calculating a gaze loss function by comparing the second image information and the original image.
  • the computing device 100 may extract first segmentation area information based on the second image information, extract original segmentation area information based on the original image, and calculate the region loss function based on the extracted first segmentation area information and the original segmentation area information. Additionally, the computing device 100 may extract first gaze direction information based on the second image information, extract original gaze direction information based on the original image, and calculate the gaze direction loss function based on the extracted first gaze direction information and the original gaze direction information.
  • the calculated structure loss function can be used in the process of learning a neural network model for image conversion, and the specific process will be described later with reference to FIGS. 5, 6B, and 6C. Meanwhile, the structure loss function may be obtained through comparison of the segmentation area information (first and original) and comparison of the gaze direction information (first and original), but is not limited to the above examples and can be obtained through various methods.
  • the computing device 100 may train the neural network model based on at least one of the feature loss function calculated through step S130-1 or the structure loss function calculated through step S130-2 (S140).
  • the computing device 100 may calculate a transformation loss function based on the feature loss function and the structure loss function, and train the neural network model based on the transformation loss function.
  • the transformation loss function may be calculated as a weighted sum of the feature loss function, the region loss function, and the gaze loss function.
  • As the computing device 100 calculates the transformation loss function by adjusting the weights of the feature loss function, the region loss function, and the gaze direction loss function, and trains the neural network model based on the transformation loss function, a technical effect can be obtained in which image conversion using the neural network model can focus on reflecting desired characteristics. Meanwhile, the specific process in which the transformation loss function is calculated and the neural network model is trained based on it will be described later with reference to FIG. 5.
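  • As a condensed, non-authoritative sketch of steps S110 through S140 (PyTorch is assumed; the callables denoise, feature_loss, region_loss, and gaze_loss are hypothetical stand-ins for the components described later with reference to FIGS. 5 to 6C), one training step could look as follows.

```python
import torch

def training_step(denoise, feature_loss, region_loss, gaze_loss,
                  model, optimizer, original_img, target_img,
                  w_feature=1.0, w_region=1.0, w_gaze=1.0):
    # S120: use the neural network model to remove part or all of the noise
    # and obtain the second image information.
    second_img = denoise(model, original_img)

    # S130-1: feature loss (second image information vs. target image).
    # S130-2: structure loss = region loss + gaze loss (vs. original image).
    transformation_loss = (w_feature * feature_loss(second_img, target_img)
                           + w_region * region_loss(second_img, original_img)
                           + w_gaze * gaze_loss(second_img, original_img))

    # S140: train the neural network model on the weighted-sum loss.
    optimizer.zero_grad()
    transformation_loss.backward()
    optimizer.step()
    return transformation_loss.item()
```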
  • Figure 4 is a schematic diagram illustrating a process in which a neural network model is trained to remove part or all of Gaussian distributed noise and obtain an image from which part or all of the noise has been removed, according to an embodiment of the present disclosure.
  • the computing device 100 can train a neural network model to remove part or all of the isotropic Gaussian distributed noise 10-4 and thereby obtain an image (10-1, 10-2, or 10-3) from which part or all of the noise has been removed.
  • the neural network model may include a conditional noise prediction model.
  • the conditional noise prediction model may include a U-Net structure in which the input and output have the same size; the noise-containing image x(t) and the diffusion time step t are used as input, and the diffusion noise included in the noise-containing image x(t) can be predicted and output.
  • the computing device 100 repeats the process of gradually adding random Gaussian distributed noise to the image x(0) (10-1) that does not contain noise over T time steps (in the direction from 10-2 to 10-3), and as a result a forward process can be performed to obtain isotropic Gaussian distributed noise x(T) (10-4).
  • the forward process according to an embodiment of the present disclosure can be performed using the following equation.
  • [Equation 1] $x(t) = \sqrt{1-\beta_t}\,x(t-1) + \sqrt{\beta_t}\,\epsilon$
  • In Equation 1 above, $\beta_t$ is used as a hyperparameter in the process of calculating the diffusion coefficient and can be set to an arbitrary value; for example, $\beta_1$ may have a value of 0.0001 and $\beta_T$ may have a value of 0.02, the value of $\beta_t$ may increase linearly or according to a cosine function from $\beta_1$ to $\beta_T$, and T, which means the total number of diffusion steps, may be set to 1000. However, increasing $\beta_t$ linearly or according to a cosine function is only an example, and the manner of increase can be determined through various methods. Also, $\epsilon$ may mean random Gaussian distributed noise.
  • Equations (1), (2), and (3) in Equation 2 express the specific meaning of the diffusion coefficients.
  • [Equation 2] $\alpha_t = 1 - \beta_t \;\;(1), \qquad \bar{\alpha}_t = \prod_{s=1}^{t} \alpha_s \;\;(2), \qquad x(t) = \sqrt{\bar{\alpha}_t}\,x(0) + \sqrt{1-\bar{\alpha}_t}\,\epsilon \;\;(3)$
  • The diffusion coefficient $\alpha_t$ at a specific time step t can be calculated by subtracting the hyperparameter $\beta_t$ from 1, and $\bar{\alpha}_t$ in equation (2) of Equation 2 above means the diffusion coefficient accumulated sequentially from time step 1 to t. Equation (3) in Equation 2 expresses the image x(t) (10-3) containing the noise at time step t as a formula of the image x(0) (10-1) that does not contain noise, the accumulated diffusion coefficient $\bar{\alpha}_t$, and random Gaussian distributed noise $\epsilon$.
  • In other words, according to Equations 1 and 2 above, the computing device 100 can perform the forward process of gradually adding random Gaussian distributed noise to the image x(0) (10-1) that does not contain noise over T time steps (in the direction from 10-2 to 10-3) and obtain the isotropic Gaussian distributed noise x(T) (10-4) as a result.
  • the forward process is not limited to the example of Equation 2 above, and various processes that add noise to data may be included in the forward process.
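  • A minimal sketch (PyTorch is assumed; not part of the original disclosure) of the forward process of Equations 1 and 2, with a linear beta schedule from 0.0001 to 0.02 over T = 1000 steps and the closed-form jump from x(0) to x(t).

```python
import torch

T = 1000
beta = torch.linspace(1e-4, 0.02, T)        # linear schedule: beta_1 ... beta_T
alpha = 1.0 - beta                          # Equation 2-(1)
alpha_bar = torch.cumprod(alpha, dim=0)     # Equation 2-(2): accumulated coefficient

def q_sample(x0: torch.Tensor, t: torch.Tensor, noise: torch.Tensor) -> torch.Tensor:
    """Equation 2-(3): x(t) = sqrt(alpha_bar_t) * x(0) + sqrt(1 - alpha_bar_t) * eps."""
    a = alpha_bar[t].view(-1, 1, 1, 1)      # per-sample coefficient for an image batch
    return a.sqrt() * x0 + (1.0 - a).sqrt() * noise
```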
  • the computing device 100 can train the neural network model to repeat, in the opposite direction to the forward process, the process of gradually removing random Gaussian distributed noise from the isotropic Gaussian distributed noise x(T) (10-4) over T time steps (in the direction from 10-3 to 10-2), and to perform a reverse process that obtains an image x(0) (10-1) that does not contain noise as a result.
  • the formula representing the reverse process can be expressed as follows.
  • [Equation 3] $x(t-1) = \frac{1}{\sqrt{\alpha_t}}\left(x(t) - \frac{1-\alpha_t}{\sqrt{1-\bar{\alpha}_t}}\,\epsilon_\theta(x(t), t)\right) + \sigma_t z \;\;(1), \qquad \alpha_t = 1 - \beta_t \;\;(2), \qquad \sigma_t^2 = \beta_t \;\;(3)$
  • Equation (1) of Equation 3 represents the reverse process of obtaining “the previous image x(t-1) (10-2) from which some of the noise has been removed” from the noise prediction result $\epsilon_\theta(x(t), t)$ predicted by the conditional noise prediction model for “the image x(t) (10-3) containing the noise,” where z means random Gaussian distributed noise. $\alpha_t$ in equation (2) of Equation 3 above means the diffusion coefficient at the current time step t, and $\sigma_t$ in equation (3) of Equation 3 above means the dispersion parameter, which can be calculated based on the diffusion coefficient $\beta_t$.
  • the reverse process is not limited to Equation 3 above, and various processes for removing noise from data containing noise may be included in the reverse process.
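  • A minimal sketch (PyTorch is assumed; not part of the original disclosure) of one reverse step of Equation 3, where eps_model stands for the conditional noise prediction model and the dispersion parameter is taken as $\sigma_t = \sqrt{\beta_t}$ for simplicity.

```python
import torch

T = 1000
beta = torch.linspace(1e-4, 0.02, T)
alpha = 1.0 - beta
alpha_bar = torch.cumprod(alpha, dim=0)

@torch.no_grad()
def p_sample(eps_model, x_t: torch.Tensor, t: int) -> torch.Tensor:
    """One denoising step x(t) -> x(t-1) following Equation 3-(1)."""
    eps = eps_model(x_t, t)                                  # predicted diffusion noise
    mean = (x_t - (1.0 - alpha[t]) / (1.0 - alpha_bar[t]).sqrt() * eps) / alpha[t].sqrt()
    if t == 0:
        return mean                                          # final step: no extra noise is added
    sigma_t = beta[t].sqrt()                                 # dispersion parameter from beta_t
    return mean + sigma_t * torch.randn_like(x_t)
```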
  • the computing device 100 inputs an image x(t) (10-3) containing noise and a time step t into the conditional noise prediction model, calculates a noise prediction loss by comparing the noise prediction result predicted by the conditional noise prediction model with the diffusion noise actually included, and can train the conditional noise prediction model by performing gradient descent according to the noise prediction loss.
  • the noise prediction loss calculated by comparing the noise prediction result predicted by the conditional noise prediction model with the diffusion noise actually included can be expressed by the following formula.
  • [Equation 4] $L_{noise} = \mathbb{E}_{x(0),\,\epsilon,\,t}\left[\left\lVert \epsilon - \epsilon_\theta(x(t), t) \right\rVert^2\right]$
  • In other words, the noise prediction loss can be calculated by comparing the diffusion noise actually included ($\epsilon$) with the noise prediction result predicted by the conditional noise prediction model ($\epsilon_\theta(x(t), t)$).
  • the noise prediction loss is not limited to the example of Equation 4, and various loss functions calculated by comparing the noise prediction result and the actually included diffusion noise may be included in the noise prediction loss.
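  • A minimal sketch (PyTorch is assumed; not part of the original disclosure) of the noise prediction loss of Equation 4: random noise is injected with the forward process and the model is trained to predict it.

```python
import torch
import torch.nn.functional as F

def noise_prediction_loss(eps_model, x0: torch.Tensor, alpha_bar: torch.Tensor) -> torch.Tensor:
    """|| eps - eps_theta(x(t), t) ||^2 averaged over a batch."""
    alpha_bar = alpha_bar.to(x0.device)
    t = torch.randint(0, alpha_bar.shape[0], (x0.shape[0],), device=x0.device)  # random time steps
    eps = torch.randn_like(x0)                       # diffusion noise actually injected
    a = alpha_bar[t].view(-1, 1, 1, 1)
    x_t = a.sqrt() * x0 + (1.0 - a).sqrt() * eps     # Equation 2-(3)
    return F.mse_loss(eps_model(x_t, t), eps)        # Equation 4
```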
  • the neural network model predicts the diffusion noise included in the noise-containing image x(t) (10-3) using the learned conditional noise prediction model, and can be trained to partially remove the predicted diffusion noise and obtain “image x(t-1) (10-2) with some of the noise removed.”
  • the neural network model predicts the diffusion noise included in “the image x(t) (10-3) containing the noise” and repeats the process of removing the predicted diffusion noise one or more times, so that it can be trained to remove all of the diffusion noise included in “the image x(t) (10-3) containing the noise” and obtain an image x(0) (10-1) from which all the noise has been removed. Meanwhile, by using an additional method in the process of removing diffusion noise from an image containing noise using the neural network model, the neural network model can be trained to generate a converted image. The specific process for this is described later with reference to FIG. 5.
  • FIG. 5 is a schematic diagram showing a process in which, when the original image, target image, and mask condition are input to a neural network model according to an embodiment of the present disclosure, the neural network model is trained to obtain second image information from which part or all of the noise has been removed based on the original image, the target image, and the mask condition.
  • the terms first image information and second image information are used to distinguish an image containing noise (for example, the image 22-3 of x(1)') from an image in which some or all of the noise has been removed (for example, the image 22-4 of x(0)'), and are not limited to meaning one specific image.
  • the first image information containing noise may include the isotropic Gaussian distributed noise 22-1, and the computing device 100 may use the neural network model to partially remove noise from the isotropic Gaussian distributed noise 22-1 and obtain x(T-1)' (22-2), an image from which noise has been partially removed, as second image information.
  • the computing device 100 can obtain an image x(0)' (22-4) from which the noise has been completely removed by repeating the process of removing the noise using the neural network model; in this case, the image x(0)' (22-4) from which the noise has been completely removed may be included in the second image information, and the image x(1)' (22-3), which still contains noise at the stage immediately before the image x(0)' (22-4) from which the noise has been completely removed is acquired, may be included in the first image information.
  • the computing device 100 may perform facial image conversion using a diffusion model based on the original image 11 and the target image 21 directly acquired or received from an external system.
  • the target image 21 may be an image having the identity characteristics to be applied when converting the original image 11. For example, if the identity features of the face in the image of a specific person (in the example of FIG. 5, a person wearing a gray suit) are to be applied when converting the original image 11, the image of the specific person from which the identity features are to be extracted may be included in the target image 21.
  • the original image 11 may include an image with features desired to be preserved during image conversion. For example, referring to FIG. 5, the original image 11 may include an image of a person having the posture, facial expression, background, gaze direction, etc. to be preserved when converting the image, excluding identity features such as the eyes, nose, mouth, and facial outline of the lower jaw. However, the person image is only an example, and various images other than a person image may be included.
  • the computing device 100 may input the original image 11 and the mask conditions 12-1 and 12-2 into the neural network model, and the neural network model may be trained to obtain the second image information from which part or all of the noise has been removed based on the original image 11 and the mask conditions 12-1 and 12-2.
  • the mask conditions (12-1, 12-2) can be determined by creating a masking area (12-1) for the part of the original image (11) where conversion is desired, and based on the created masking area (12-1). For example, if the face part of the original image 11 is to be transformed, deep learning-based segmentation technology can be used to create a masking area 12-1 for the face part.
  • Deep learning-based segmentation technologies such as Fully Convolutional Network (FCN), U-Net, DeepLab, and Mask RCNN can be used to create the masking area (12-1) for the part of the original image where conversion is desired, but the technology is not limited thereto.
  • the computing device 100 may utilize deep learning-based segmentation technology to create a masking area 12-1 for the facial part desired to be converted in the original image 11, and may create an inverted masking area 12-2 based on the generated masking area 12-1.
  • the inverted masking area 12-2 indicates the part where random Gaussian distributed noise is gradually removed over T time steps with respect to the isotropic Gaussian distributed noise x(T) (22-1), and the black portion in the inverted masking area 12-2 may include information with features desired to be preserved when converting the original image 11 (for example, features other than identity features such as the eyes, nose, mouth, and facial outline of the lower jaw).
  • the computing device 100 extracts original background information through an element-wise product between the generated masking area 12-1 and the original image information 10, and applies the extracted original background information to the neural network model in the process of obtaining a converted image x(0)' (22-4) that does not contain noise, so that the converted image x(0)' (22-4) with the original background information maintained can be obtained.
  • the computing device 100 performs an element-wise product between the inverted masking area 12-2 and the isotropic Gaussian distributed noise x(T) (22-1) for the part desired to be transformed, so that the neural network model removes part or all of the noise from the first image information (e.g., the image of 22-1) containing noise and obtains second image information.
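  • A minimal sketch (PyTorch is assumed; not part of the original disclosure) of the mask condition and the element-wise products described above; the pairing of mask and image follows the description above, and summing the two products into one conditioned input is an assumption of this sketch.

```python
import torch

def build_masked_input(original_img: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    """original_img: (B, C, H, W); mask: masking area 12-1 in [0, 1], broadcastable."""
    inverted_mask = 1.0 - mask                    # inverted masking area 12-2
    noise = torch.randn_like(original_img)        # isotropic Gaussian distributed noise x(T)
    background = mask * original_img              # element-wise product: original background information
    noised_region = inverted_mask * noise         # element-wise product: part desired to be transformed
    return background + noised_region             # combined conditioned input (assumption of this sketch)
```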
  • the neural network model is trained based on the transformation loss function 31, so that a transformed image x(0)' (22-4) that does not contain noise can be obtained as a result.
  • the transformation loss function 31 may be calculated based on the feature loss function and the structure loss function, and the structure loss function may include a region loss function or a gaze loss function.
  • the conversion loss function 31 can be calculated using the following formula.
  • [Conversion loss] $L_{conversion} = \lambda_{1}\,L_{feature} + \lambda_{2}\,L_{region} + \lambda_{3}\,L_{gaze}$
  • In other words, the conversion loss function 31 can be calculated as a weighted sum of “a feature loss function for reflecting the desired identity of the target image 21 in the original image 11,” “a region loss function for ensuring that the converted image has the posture, facial expression, etc. of the original image 11 excluding its identity features,” and “a gaze direction loss function for ensuring that the converted image has the gaze direction of the original image 11.”
  • the neural network model is trained based on the transformation loss function 31 so that it may learn to obtain “an image x(0)' (22-4) that reflects the desired identity of the target image 21 with respect to the original image 11 and that has the posture, facial expression, and gaze direction of the original image 11 excluding its identity features.” At this time, the second image information x(T-1)' can be obtained by removing part or all of the random Gaussian distributed noise from the information x(T) (22-1). However, if the model is not trained based on the transformation loss, the noise removal process of the neural network model may simply be performed so that the “image x(0)' (22-4)” without noise is the same as the original image 11.
  • the computing device 100 may use the neural network model trained through the example described above with reference to FIG. 5 (i.e., the mask condition or the transformation loss function) to remove part or all of the random Gaussian distributed noise from the isotropic Gaussian distributed noise x(T) (22-1) (in the direction from 22-2 to 22-3) and obtain the transformed image x(0)' (22-4). In addition, according to an embodiment of the present disclosure, the computing device 100 can focus on reflecting the desired characteristics of the target image by performing original image transformation using the neural network model.
  • Figures 6A to 6C are schematic diagrams showing the process of calculating the three loss functions used to calculate the conversion loss function according to an embodiment of the present disclosure.
  • FIG. 6A is a schematic diagram showing a process for calculating a feature loss function by comparing second image information and a target image according to an embodiment of the present disclosure.
  • the computing device 100 can extract a target identity feature vector 32-1 based on the target image 21 and extract a first identity feature vector 32-2 based on the second image information 23.
  • the computing device 100 can input the target image 21 and the second image information 23 into an ID embedding network (e.g., ArcFace, SphereFace, Smooth-Swap, Partial-fc, etc.) to extract “the target identity feature vector 32-1, which includes the identity features of the face in the target image 21,” and the first identity feature vector 32-2, respectively.
  • the computing device 100 can compare the extracted target identity feature vector 32-1 and the first identity feature vector 32-2 to calculate the feature loss function (33), and can train the neural network model based on the calculated feature loss function.
  • Specifically, the feature loss function can be calculated by performing a cosine similarity operation between the extracted target identity feature vector 32-1 and the first identity feature vector 32-2 (33), and the neural network model can be trained based on the feature loss function by using the calculated feature loss function as a loss term when training the neural network model.
  • it is not limited to the cosine similarity operation and various operations may be used in the process of calculating the feature loss function.
  • a technical effect can be obtained that allows the identity features of the target image to be well reflected in the part to be converted when converting the image.
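  • A minimal sketch (PyTorch is assumed; not part of the original disclosure) of the feature loss of FIG. 6A, where id_embedder stands for an ID embedding network such as an ArcFace-style model.

```python
import torch
import torch.nn.functional as F

def feature_loss(id_embedder, second_img: torch.Tensor, target_img: torch.Tensor) -> torch.Tensor:
    v_first = id_embedder(second_img)      # first identity feature vector (32-2)
    v_target = id_embedder(target_img)     # target identity feature vector (32-1)
    # 1 - cosine similarity: small when the two identities match
    return 1.0 - F.cosine_similarity(v_first, v_target, dim=-1).mean()
```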
  • FIG. 6B is a schematic diagram showing a process of calculating a region loss function by comparing second image information and the original image according to an embodiment of the present disclosure.
  • the computing device 100 can extract original segmentation area information 42-1 based on the original image 11 and extract first segmentation area information 42-2 based on the second image information 23.
  • the computing device 100 can input the original image 11 and the second image information 23 into a deep learning-based segmentation network (e.g., a face parser, etc.) to extract “the original segmentation area information 42-1, which includes information with features of the original image 11 desired to be preserved during conversion (e.g., posture, arrangement of facial features),” and the first segmentation area information 42-2, respectively.
  • the computing device 100 can compare the extracted original segmentation area information 42-1 and the first segmentation area information 42-2 to calculate the region loss function (43), and can train the neural network model based on the calculated region loss function.
  • Specifically, the region loss function can be calculated by calculating an L1 loss between the extracted original segmentation area information 42-1 and the first segmentation area information 42-2 (43), and the neural network model can be trained based on the region loss function by using the calculated region loss function as a loss term when training the neural network model.
  • However, the calculation is not limited to the L1 loss calculation, and various operations may be used in the process of calculating the region loss function.
  • a technical effect can be obtained that allows the features of the original image to be well maintained for the part that is desired to be preserved when converting the image.
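  • A minimal sketch (PyTorch is assumed; not part of the original disclosure) of the region loss of FIG. 6B, where face_parser stands for a deep learning-based segmentation network.

```python
import torch
import torch.nn.functional as F

def region_loss(face_parser, second_img: torch.Tensor, original_img: torch.Tensor) -> torch.Tensor:
    seg_first = face_parser(second_img)       # first segmentation area information (42-2)
    seg_original = face_parser(original_img)  # original segmentation area information (42-1)
    return F.l1_loss(seg_first, seg_original)
```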
  • Figure 6C is a schematic diagram showing the process of calculating a gaze direction loss function by comparing second image information and the original image according to an embodiment of the present disclosure.
  • the computing device 100 can extract original gaze direction information 52-1 based on the original image 11 and extract first gaze direction information 52-2 based on the second image information 23 (51).
  • the computing device 100 can input the original image 11 and the second image information 23 into a deep learning-based gaze direction detection network (e.g., GazeNet, HRNetGaze, a keypoint estimator, etc.) to extract “the original gaze direction information 52-1, which includes the gaze direction information of the original image 11 to be preserved during conversion,” and the first gaze direction information 52-2, respectively.
  • the computing device 100 can compare the extracted original gaze direction information 52-1 and the first gaze direction information 52-2 to calculate the gaze direction loss function (53), and can train the neural network model based on the calculated gaze direction loss function.
  • Specifically, the gaze direction loss function can be calculated by calculating an L1 loss between the extracted original gaze direction information 52-1 and the first gaze direction information 52-2 (53), and the neural network model can be trained based on the gaze direction loss function by using the calculated gaze direction loss function as a loss term when training the neural network model.
  • it is not limited to the L1 loss calculation and various operations may be used in the process of calculating the gaze direction loss function.
  • a technical effect can be obtained that allows the gaze direction of the original image to be well maintained with respect to the gaze direction desired to be preserved when converting the image.
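  • A minimal sketch (PyTorch is assumed; not part of the original disclosure) of the gaze direction loss of FIG. 6C, where gaze_net stands for a deep learning-based gaze direction detection network.

```python
import torch
import torch.nn.functional as F

def gaze_loss(gaze_net, second_img: torch.Tensor, original_img: torch.Tensor) -> torch.Tensor:
    gaze_first = gaze_net(second_img)        # first gaze direction information (52-2)
    gaze_original = gaze_net(original_img)   # original gaze direction information (52-1)
    return F.l1_loss(gaze_first, gaze_original)
```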
  • Data structure can refer to the organization, management, and storage of data to enable efficient access and modification of data.
  • Data structure can refer to the organization of data to solve a specific problem (e.g., retrieving data, storing data, or modifying data in the shortest possible time).
  • a data structure may be defined as a physical or logical relationship between data elements designed to support a specific data processing function. Logical relationships between data elements may include connection relationships between user-defined data elements. Physical relationships between data elements may include actual relationships between data elements that are physically stored in a computer-readable storage medium (e.g., a persistent storage device).
  • a data structure may specifically include a set of data, relationships between data, and functions or instructions applicable to the data. Effectively designed data structures allow computing devices to perform computations while minimizing the use of the computing device's resources. Specifically, computing devices can increase the efficiency of operations, reading, insertion, deletion, comparison, exchange, and search through effectively designed data structures.
  • Data structures can be divided into linear data structures and non-linear data structures depending on the type of data structure.
  • a linear data structure may be a structure in which only one piece of data is connected to another piece of data.
  • Linear data structures may include List, Stack, Queue, and Deque.
  • a list can refer to a set of data that has an internal order.
  • the list may include a linked list.
  • a linked list may be a data structure in which data is connected in such a way that each data has a pointer and is connected in one line. In a linked list, a pointer can contain connection information to the next or previous data.
  • a linked list can be expressed as a singly linked list, doubly linked list, or circular linked list depending on its form.
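  • For illustration only, a minimal sketch of a singly linked list, in which each piece of data holds a pointer to the next piece of data, is shown below; the class names are arbitrary.

      class Node:
          def __init__(self, data):
              self.data = data
              self.next = None  # pointer containing connection information to the next data

      class SinglyLinkedList:
          def __init__(self):
              self.head = None

          def append(self, data):
              node = Node(data)
              if self.head is None:
                  self.head = node
                  return
              current = self.head
              while current.next is not None:
                  current = current.next
              current.next = node  # connect the new data at the end of the line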
  • a stack may be a data listing structure that allows limited access to data.
  • a stack can be a linear data structure in which data can be processed (for example, inserted or deleted) at only one end of the data structure.
  • Data stored in a stack may follow a last-in, first-out (LIFO) order, in which the data stored most recently is retrieved first.
  • a queue is a data listing structure that allows limited access to data; unlike a stack, it may follow a first-in, first-out (FIFO) order, in which data stored later is released later.
  • a deque can be a data structure that can process data at both ends of the data structure; a brief sketch of these linear structures follows below.
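  • For illustration only, the sketch below shows the stack (LIFO), queue (FIFO), and deque behaviors described above using standard Python containers.

      from collections import deque

      stack = []                       # stack: last in, first out (LIFO)
      stack.append(1); stack.append(2)
      assert stack.pop() == 2          # the most recently stored data comes out first

      queue = deque()                  # queue: first in, first out (FIFO)
      queue.append(1); queue.append(2)
      assert queue.popleft() == 1      # the earliest stored data comes out first

      dq = deque([1, 2, 3])            # deque: data can be processed at both ends
      dq.appendleft(0); dq.append(4)
      assert dq.popleft() == 0 and dq.pop() == 4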
  • a non-linear data structure may be a structure in which multiple pieces of data are connected behind one piece of data.
  • Nonlinear data structures may include graph data structures.
  • a graph data structure can be defined by vertices and edges, and an edge can include a line connecting two different vertices.
  • Graph data structure may include a tree data structure.
  • a tree data structure may be a data structure in which there is only one path connecting two different vertices among a plurality of vertices included in the tree. In other words, it may be a data structure that does not form a loop in the graph data structure.
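  • As an illustrative sketch only, a graph can be represented by its vertices and edges as an adjacency list; for a connected graph, having exactly |V| - 1 edges means there is no loop, i.e., the graph is a tree.

      from collections import defaultdict

      edges = [("a", "b"), ("a", "c"), ("c", "d")]   # an edge connects two different vertices
      graph = defaultdict(list)
      for u, v in edges:
          graph[u].append(v)
          graph[v].append(u)

      vertices = set(graph)
      # necessary condition for a connected graph to be a tree (no loop)
      is_tree = len(edges) == len(vertices) - 1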
  • Data structures may include a neural network, and a data structure including a neural network may be stored in a computer-readable medium. A data structure including a neural network may also include data preprocessed for processing by the neural network, data input to the neural network, weights of the neural network, hyperparameters of the neural network, data acquired from the neural network, an activation function associated with each node or layer of the neural network, and a loss function for training the neural network.
  • a data structure containing a neural network may include any of the components disclosed above.
  • the data structure including the neural network may be configured to include all or any combination of data preprocessed for processing by the neural network, data input to the neural network, weights of the neural network, hyperparameters of the neural network, data acquired from the neural network, an activation function associated with each node or layer of the neural network, and a loss function for training the neural network.
  • a data structure containing a neural network may include any other information that determines the characteristics of the neural network.
  • the data structure may include all types of data used or generated in the computational process of a neural network and is not limited to the above.
  • Computer-readable media may include computer-readable recording media and/or computer-readable transmission media.
  • a neural network can generally consist of a set of interconnected computational units, which can be referred to as nodes. These nodes may also be referred to as neurons.
  • a neural network consists of at least one node.
  • the data structure may include data input to the neural network.
  • a data structure containing data input to a neural network may be stored in a computer-readable medium.
  • Data input to the neural network may include learning data input during the neural network learning process and/or input data input to the neural network on which training has been completed.
  • Data input to the neural network may include data that has undergone pre-processing and/or data subject to pre-processing.
  • Preprocessing may include a data processing process to input data into a neural network. Therefore, the data structure may include data subject to preprocessing and data generated by preprocessing.
  • the above-described data structure is only an example and the present disclosure is not limited thereto.
  • the data structure may include the weights of the neural network. (In this specification, weights and parameters may be used with the same meaning.) And the data structure including the weights of the neural network may be stored in a computer-readable medium.
  • a neural network may include multiple weights. Weights may be variable and may be varied by the user or an algorithm in order for the neural network to perform a desired function. For example, when one or more input nodes are connected to one output node by respective links, the output node can determine its output data value based on the values input to the input nodes connected to it and the weight set on the link corresponding to each input node; a minimal sketch follows below.
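  • A minimal sketch of the relationship described above is given below for illustration only; the function name and the identity activation are assumptions.

      def output_node_value(input_values, link_weights, activation=lambda x: x):
          # the output node's value is determined from the values input to the
          # connected input nodes and the weight set on each corresponding link
          weighted_sum = sum(v * w for v, w in zip(input_values, link_weights))
          return activation(weighted_sum)

      # e.g., two input nodes connected to one output node by two weighted links
      y = output_node_value([0.5, -1.0], [0.8, 0.2])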
  • the above-described data structure is only an example and the present disclosure is not limited thereto.
  • the weights may include weights that are changed during the neural network learning process and/or weights for which neural network learning has been completed.
  • Weights that change during the neural network learning process may include weights that change at the start of the learning cycle and/or weights that change during the learning cycle.
  • the above-described data structure is only an example and the present disclosure is not limited thereto.
  • the data structure including the weights of the neural network may be stored in a computer-readable storage medium (e.g., memory, hard disk) after going through a serialization process.
  • Serialization can be the process of converting a data structure into a form that can be stored on the same or a different computing device and later reorganized and used.
  • Computing devices can transmit and receive data over a network by serializing data structures.
  • Serialized data structures containing the weights of a neural network can be reconstructed on the same computing device or on a different computing device through deserialization; an illustrative sketch follows below.
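  • As an illustrative sketch only, the following shows one common way such a weight data structure can be serialized to a storage medium and later deserialized; PyTorch's state_dict, the placeholder model, and the file name are assumptions made for this sketch.

      import torch

      model = torch.nn.Linear(4, 2)                        # placeholder neural network

      # serialization: convert the data structure containing the weights into a storable form
      torch.save(model.state_dict(), "weights.pt")

      # deserialization: reconstruct the weights on the same or a different computing device
      restored = torch.nn.Linear(4, 2)
      restored.load_state_dict(torch.load("weights.pt"))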
  • the data structure including the weights of the neural network is not limited to serialization.
  • the data structure including the weights of the neural network may include a data structure for increasing computational efficiency while minimizing the use of computing device resources (e.g., among non-linear data structures, a B-Tree, a Trie, an m-way search tree, an AVL tree, or a Red-Black Tree).
  • the data structure may include hyper-parameters of a neural network. And the data structure including the hyperparameters of the neural network can be stored in a computer-readable medium.
  • a hyperparameter may be a variable that can be changed by the user. Hyperparameters may include, for example, a learning rate, a cost function, the number of learning cycle repetitions, weight initialization (e.g., setting the range of weight values subject to weight initialization), and the number of hidden units (e.g., the number of hidden layers and the number of nodes in each hidden layer); an illustrative sketch follows below.
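  • For illustration only, the sketch below keeps such hyperparameters in a simple data structure and applies one of them when configuring training; the specific names and values are assumptions.

      import torch

      hyperparameters = {
          "learning_rate": 1e-4,
          "num_learning_cycles": 100,         # number of learning cycle repetitions
          "weight_init_range": (-0.1, 0.1),   # range of weight values subject to initialization
          "num_hidden_layers": 4,
          "hidden_units_per_layer": 256,
      }

      model = torch.nn.Linear(8, 8)            # placeholder neural network
      optimizer = torch.optim.Adam(model.parameters(),
                                   lr=hyperparameters["learning_rate"])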
  • the above-described data structure is only an example and the present disclosure is not limited thereto.
  • FIG. 7 is a brief, general schematic diagram of an example computing environment in which embodiments of the present disclosure may be implemented.
  • program modules include routines, programs, components, data structures, etc. that perform specific tasks or implement specific abstract data types.
  • the described embodiments of the disclosure can also be practiced in distributed computing environments where certain tasks are performed by remote processing devices that are linked through a communications network.
  • program modules may be located in both local and remote memory storage devices.
  • Computers typically include a variety of computer-readable media.
  • Computer-readable media can be any medium that can be accessed by a computer, and such computer-readable media includes volatile and non-volatile media, transitory and non-transitory media, and removable and non-removable media.
  • Computer-readable media may include computer-readable storage media and computer-readable transmission media.
  • Computer-readable storage media includes volatile and non-volatile media, transitory and non-transitory media, and removable and non-removable media implemented in any method or technology for the storage of information such as computer-readable instructions, data structures, program modules, or other data.
  • Computer-readable storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital video disk (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be accessed by a computer and used to store the desired information.
  • a computer-readable transmission medium typically embodies computer-readable instructions, data structures, program modules, or other data in a modulated data signal, such as a carrier wave or other transport mechanism, and includes all information delivery media.
  • modulated data signal refers to a signal in which one or more of the characteristics of the signal have been set or changed to encode information within the signal.
  • computer-readable transmission media includes wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, RF, infrared, and other wireless media. Combinations of any of the above are also intended to be included within the scope of computer-readable transmission media.
  • System bus 1108 couples system components, including but not limited to system memory 1106, to processing unit 1104.
  • Processing unit 1104 may be any of a variety of commercially available processors. Dual processors and other multiprocessor architectures may also be used as processing unit 1104.
  • System bus 1108 may be any of several types of bus structures that may further be interconnected to a memory bus, peripheral bus, and local bus using any of a variety of commercial bus architectures.
  • System memory 1106 includes read only memory (ROM) 1110 and random access memory (RAM) 1112.
  • a basic input/output system (BIOS) is stored in non-volatile memory 1110, such as ROM, EPROM, or EEPROM, and contains the basic routines that help transfer information between components within the computer 1102, such as during startup.
  • RAM 1112 may also include high-speed RAM, such as static RAM, for caching data.
  • Computer 1102 may also include an internal hard disk drive (HDD) 1114 (e.g., EIDE, SATA), which may also be configured for external use within a suitable chassis (not shown), a magnetic floppy disk drive (FDD) 1116 (e.g., for reading from or writing to a removable diskette), and an optical disk drive 1120 (e.g., for reading a CD-ROM disk 1122 or for reading from or writing to other high-capacity optical media such as DVDs).
  • Hard disk drive 1114, magnetic disk drive 1116, and optical disk drive 1120 can be connected to the system bus 1108 by a hard disk drive interface 1124, a magnetic disk drive interface 1126, and an optical drive interface 1128, respectively.
  • the interface 1124 for implementing an external drive includes at least one or both of Universal Serial Bus (USB) and IEEE 1394 interface technologies.
  • drives and their associated computer-readable media provide non-volatile storage of data, data structures, computer-executable instructions, and the like.
  • the drives and media correspond to the storage of any data in a suitable digital format.
  • Although the description of computer-readable media above refers to HDDs, removable magnetic disks, and removable optical media such as CDs or DVDs, those of ordinary skill in the art will appreciate that other types of computer-readable media, such as zip drives, magnetic cassettes, flash memory cards, and cartridges, may also be used in the exemplary operating environment, and that any such media may contain computer-executable instructions for performing the methods of the present disclosure.
  • a number of program modules may be stored in the drive and RAM 1112, including an operating system 1130, one or more application programs 1132, other program modules 1134, and program data 1136. All or portions of the operating system, applications, modules and/or data may also be cached in RAM 1112. It will be appreciated that the present disclosure may be implemented on various commercially available operating systems or combinations of operating systems.
  • a user may enter commands and information into computer 1102 through one or more wired/wireless input devices, such as a keyboard 1138 and a pointing device such as mouse 1140.
  • Other input devices may include microphones, IR remote controls, joysticks, game pads, stylus pens, touch screens, etc.
  • These and other input devices are often connected through an input device interface 1142 that is coupled to the system bus 1108, but may also be connected by other interfaces, such as a parallel port, an IEEE 1394 serial port, a game port, a USB port, an IR interface, etc.
  • a monitor 1144 or other type of display device is also connected to system bus 1108 through an interface, such as a video adapter 1146.
  • computers typically include other peripheral output devices (not shown) such as speakers, printers, etc.
  • Computer 1102 may operate in a networked environment using logical connections to one or more remote computers, such as remote computer(s) 1148, via wired and/or wireless communications.
  • Remote computer(s) 1148 may be a workstation, a computing device, a router, a personal computer, a portable computer, a microprocessor-based entertainment device, a peer device, or another conventional network node, and is generally connected to computer 1102.
  • the logical connections depicted include wired/wireless connections to a local area network (LAN) 1152 and/or a larger network, such as a wide area network (WAN) 1154.
  • LAN and WAN networking environments are common in offices and companies and facilitate enterprise-wide computer networks, such as intranets, all of which can be connected to a worldwide computer network, such as the Internet.
  • When used in a LAN networking environment, computer 1102 is connected to the local network 1152 through a wired and/or wireless communication network interface or adapter 1156. The adapter 1156 may facilitate wired or wireless communication to the LAN 1152, which may also include a wireless access point disposed thereon for communicating with the wireless adapter 1156.
  • When used in a WAN networking environment, the computer 1102 may include a modem 1158, be connected to a communicating computing device on the WAN 1154, or have other means of establishing communications over the WAN 1154, such as via the Internet. The modem 1158, which may be internal or external and a wired or wireless device, is coupled to the system bus 1108 via the serial port interface 1142.
  • program modules described for computer 1102, or portions thereof may be stored in remote memory/storage device 1150. It will be appreciated that the network connections shown are exemplary and that other means of establishing a communications link between computers may be used.
  • Computer 1102 operates to communicate with any wireless device or entity deployed and operating in wireless communication, such as a printer, a scanner, a desktop and/or portable computer, a portable data assistant (PDA), a communications satellite, any equipment or location associated with a wirelessly detectable tag, and a telephone. This includes at least Wi-Fi and Bluetooth wireless technologies. Accordingly, the communication may have a predefined structure as in a conventional network, or may simply be ad hoc communication between at least two devices.
  • Wi-Fi (Wireless Fidelity) is a wireless technology, like a cell phone, that allows devices such as computers to send and receive data indoors and outdoors, anywhere within the coverage area of a cell tower.
  • Wi-Fi networks use wireless technology called IEEE 802.11 (a, b, g, etc.) to provide secure, reliable, and high-speed wireless connections.
  • Wi-Fi can be used to connect computers to each other, to the Internet, and to wired networks (using IEEE 802.3 or Ethernet).
  • Wi-Fi networks can operate in the unlicensed 2.4 and 5 GHz wireless bands, for example, at data rates of 11 Mbps (802.11b) or 54 Mbps (802.11a), or in products that include both bands (dual band).
  • the various embodiments presented herein may be implemented as a method, apparatus, or article of manufacture using standard programming and/or engineering techniques.
  • article of manufacture includes a computer program, carrier, or media accessible from any computer-readable storage device.
  • computer-readable storage media include, but are not limited to, magnetic storage devices (e.g., hard disks, floppy disks, magnetic strips, etc.), optical disks (e.g., CDs, DVDs, etc.), smart cards, and flash memory devices (e.g., EEPROM, cards, sticks, key drives, etc.).
  • various storage media presented herein include one or more devices and/or other machine-readable media for storing information.
  • The present disclosure can be used in computing devices, systems, etc. to train a neural network model for converting images.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • General Engineering & Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Computational Linguistics (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)

Abstract

The present disclosure discloses, according to one embodiment, a method for training a neural network model for converting an image, the method being performed by one or more processors of a computing device. The method may comprise the steps of: receiving a target image and an original image; removing, using a neural network model, some or all of the noise included in first image information to obtain second image information; comparing the second image information with the target image to calculate a feature loss function; comparing the second image information with the original image to calculate a structure loss function; and training the neural network model based on the feature loss function and/or the structure loss function.
PCT/KR2023/019225 2022-12-02 2023-11-27 Procédé de conversion d'image faciale à l'aide d'un modèle de diffusion WO2024117708A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
KR10-2022-0166571 2022-12-02
KR1020220166571A KR102615322B1 (ko) 2022-12-02 2022-12-02 확산 모델을 이용한 얼굴 이미지 변환 방법

Publications (1)

Publication Number Publication Date
WO2024117708A1 true WO2024117708A1 (fr) 2024-06-06

Family

ID=89385271

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/KR2023/019225 WO2024117708A1 (fr) 2022-12-02 2023-11-27 Procédé de conversion d'image faciale à l'aide d'un modèle de diffusion

Country Status (2)

Country Link
KR (2) KR102615322B1 (fr)
WO (1) WO2024117708A1 (fr)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118097363A (zh) * 2024-04-28 2024-05-28 南昌大学 一种基于近红外成像的人脸图像生成与识别方法及系统

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR102409988B1 (ko) * 2021-11-03 2022-06-16 주식회사 클레온 딥러닝 네트워크를 이용한 얼굴 변환 방법 및 장치
KR20220145792A (ko) * 2021-04-22 2022-10-31 서울대학교산학협력단 비디오 신원 복원 모델을 이용한 얼굴 이미지 재구성 방법 및 장치

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20220145792A (ko) * 2021-04-22 2022-10-31 서울대학교산학협력단 비디오 신원 복원 모델을 이용한 얼굴 이미지 재구성 방법 및 장치
KR102409988B1 (ko) * 2021-11-03 2022-06-16 주식회사 클레온 딥러닝 네트워크를 이용한 얼굴 변환 방법 및 장치

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
JIANG WEI, JIANG WEI, DONG WENTAO: "Facke: a Survey on Generative Models for Face Swapping", ARXIV (CORNELL UNIVERSITY), CORNELL UNIVERSITY LIBRARY, ARXIV.ORG, 22 June 2022 (2022-06-22), XP093177000, Retrieved from the Internet <URL:https://arxiv.org/pdf/2206.11203> DOI: 10.48550/arxiv.2206.11203 *
XU CHAO; ZHANG JIANGNING; HUA MIAO; HE QIAN; YI ZILI; LIU YONG: "Region-Aware Face Swapping", 2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), IEEE, 18 June 2022 (2022-06-18), pages 7622 - 7631, XP034194883, DOI: 10.1109/CVPR52688.2022.00748 *
XU YANGYANG; DENG BAILIN; WANG JUNLE; JING YANQING; PAN JIA; HE SHENGFENG: "High-resolution Face Swapping via Latent Semantics Disentanglement", 2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), IEEE, 18 June 2022 (2022-06-18), pages 7632 - 7641, XP034193284, DOI: 10.1109/CVPR52688.2022.00749 *

Also Published As

Publication number Publication date
KR102615322B1 (ko) 2023-12-19
KR102665707B1 (ko) 2024-05-14

Similar Documents

Publication Publication Date Title
WO2024117708A1 (fr) 2022-12-02 2024-06-06 Procédé de conversion d'image faciale à l'aide d'un modèle de diffusion
WO2021261825A1 (fr) 2020-06-24 2021-12-30 Dispositif et procédé de génération de données météorologiques reposant sur l'apprentissage automatique
WO2022255564A1 (fr) 2021-06-02 2022-12-08 Procédé d'analyse de signal biologique
WO2021040354A1 (fr) Procédé de traitement de données utilisant un réseau de neurones artificiels
WO2022149696A1 (fr) 2021-01-11 2022-07-14 Procédé de classification utilisant un modèle d'apprentissage profond
WO2022119162A1 (fr) Méthode de prédiction de maladie basée sur une image médicale
WO2021080175A1 (fr) Procédé de traitement de contenu
WO2021157863A1 (fr) Construction de graphe à base de codeur automatique pour apprentissage semi-supervisé
WO2024058465A1 (fr) 2022-09-14 2024-03-21 Procédé d'apprentissage de modèle de réseau neuronal local pour apprentissage fédéré
KR20240035302A (ko) 신경망 모델을 활용한 부분적인 이미지 변환 방법
WO2024080791A1 (fr) 2022-10-13 2024-04-18 Procédé de génération d'ensemble de données
WO2023128349A1 (fr) 2021-12-28 2023-07-06 Procédé d'imagerie très haute résolution à l'aide d'un apprentissage coopératif
WO2023101417A1 (fr) 2021-11-30 2023-06-08 Procédé permettant de prédire une précipitation sur la base d'un apprentissage profond
WO2023224350A2 (fr) 2022-05-17 2023-11-23 Procédé et dispositif de détection de point de repère à partir d'une image de volume 3d
KR20240034095A (ko) 신경망 모델을 이용한 이미지 특징 추출 방법
WO2021251691A1 (fr) 2020-06-08 2021-12-16 Procédé de détection d'objet à base de rpn sans ancrage
WO2024143907A1 (fr) Procédé pour entraîner un modèle de réseau neuronal pour convertir une image en utilisant des images partielles
WO2021194105A1 (fr) 2020-03-24 2021-09-30 Procédé d'apprentissage de modèle de simulation d'expert, et dispositif d'apprentissage
WO2023027280A1 (fr) 2021-08-24 2023-03-02 Procédé de déduction d'un épitope candidat
WO2024143909A1 (fr) 2022-12-28 2024-07-04 Procédé de conversion d'image en étapes en prenant en considération des changements d'angle
WO2024034847A1 (fr) 2022-08-08 2024-02-15 Procédé de prédiction de lésion sur la base d'une image échographique
KR102649764B1 (ko) 페이스 스왑 이미지 생성 방법
WO2022050578A1 (fr) Procédé de détermination de maladie
WO2023219237A1 (fr) 2022-05-09 2023-11-16 Procédé basé sur l'intelligence artificielle pour évaluer une doublure
WO2023075351A1 (fr) 2021-10-25 2023-05-04 Procédé d'apprentissage d'intelligence artificielle pour robot industriel