CN113344792A - Image generation method and device and electronic equipment - Google Patents

Image generation method and device and electronic equipment

Info

Publication number
CN113344792A
CN113344792A
Authority
CN
China
Prior art keywords
resolution face
super
network model
resolution
face image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110879082.1A
Other languages
Chinese (zh)
Other versions
CN113344792B (en)
Inventor
李亚鹏
王宁波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Dahua Technology Co Ltd
Original Assignee
Zhejiang Dahua Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Dahua Technology Co Ltd filed Critical Zhejiang Dahua Technology Co Ltd
Priority to CN202110879082.1A priority Critical patent/CN113344792B/en
Publication of CN113344792A publication Critical patent/CN113344792A/en
Priority to PCT/CN2021/128518 priority patent/WO2023010701A1/en
Application granted granted Critical
Publication of CN113344792B publication Critical patent/CN113344792B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 Geometric image transformation in the plane of the image
    • G06T3/40 Scaling the whole image or part thereof
    • G06T3/4053 Super resolution, i.e. output image resolution higher than sensor resolution
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 Geometric image transformation in the plane of the image
    • G06T3/40 Scaling the whole image or part thereof
    • G06T3/4046 Scaling the whole image or part thereof using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/778 Active pattern-learning, e.g. online learning of image or video features
    • G06V10/7796 Active pattern-learning, e.g. online learning of image or video features based on specific statistical tests
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172 Classification, e.g. identification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Human Computer Interaction (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The application discloses an image generation method, an image generation device, and electronic equipment. The method includes: sequentially inputting N frames of low-resolution face images into a first network model in an iterative manner to train the first network model; constraining the training process with the output loss of the first network model until the training result of the first network model converges; and recording the trained first network model as a second network model. The second network model can perform super-resolution processing on multiple frames of low-resolution face images of any target to obtain a super-resolution face image whose identity is consistent with that of the multiple frames of low-resolution face images. Based on the method, the problem that, when super-resolution processing is performed on a single frame of low-resolution face image, the identity information in the obtained super-resolution face image cannot be guaranteed to be consistent with the identity information of the single-frame low-resolution face image can be solved.

Description

Image generation method and device and electronic equipment
Technical Field
The application relates to the technical field of face recognition, in particular to an image generation method, an image generation device and electronic equipment.
Background
With the rapid development of scientific technology and the arrival of the big data era, information security becomes more and more important. As a safe, non-contact, convenient and efficient identity information authentication mode, face recognition has been widely applied to various aspects of social life. However, in a relatively large monitoring scene, the size of a face appearing in a video is generally small, the image definition is low, and the requirement of face recognition is difficult to meet, so the face super-resolution technology becomes more and more important. The face super-resolution technology essentially adds high-frequency features to a low-resolution face image to generate a high-resolution face image.
In the prior art, a super-resolution face image is usually obtained by super-resolution processing based on a single-frame low-resolution face image, and the super-resolution face image obtained in this way has face information loss, so that identity information in the super-resolution face image cannot be guaranteed to be consistent with identity information in the single-frame low-resolution face image.
Disclosure of Invention
The application provides an image generation method, an image generation device and electronic equipment, which are used for obtaining a super-resolution face image consistent with identity information of a low-resolution face image after carrying out super-resolution processing on a plurality of frames of low-resolution face images.
In a first aspect, the present application provides an image generation method, including:
acquiring N frames of low-resolution face images of a first target, wherein N is a positive integer greater than or equal to 2;
training a first network model according to the N frames of low-resolution face images to obtain a second network model, wherein the first network model can perform super-resolution processing on the low-resolution face images;
sequentially performing super-resolution processing on the N frames of low-resolution face images based on the second network model to obtain N frames of super-resolution face images;
and taking the last generated super-resolution face image in the N frames of super-resolution face images as a final face image.
By the image generation method, the super-resolution face image consistent with the identity information of the low-resolution face image can be obtained after the super-resolution processing is carried out on the multi-frame low-resolution face image.
In a possible design, the training a first network model according to the N frames of low-resolution face images to obtain a second network model includes:
calculating the output loss of a first network model according to the N frames of low-resolution face images, wherein the output loss is used for restricting the training process of the first network model;
judging whether the training result of the first network model is converged or not according to the output loss;
if the training result is not converged, adjusting parameters of the first network model, and continuing to train the first network model until the training result is converged;
and if the training result is converged, recording the first network model after the training as a second network model.
According to the method, the first network model is constrained to carry out a training process through the output loss of the first network model, and a second network model is obtained after the training is finished, wherein the second network model can carry out super-resolution processing on multi-frame low-resolution face images of any target to obtain super-resolution face images with identities consistent with the multi-frame low-resolution face images.
In one possible design, the calculating the output loss of the first network model according to the N frames of low-resolution face images includes:
obtaining N random variables and 1 super-resolution face image set from the N frames of low-resolution face images through a first network model, wherein the number of the super-resolution face image frames in the super-resolution face image set is N;
sequentially inputting the super-resolution face images in the super-resolution face image set into an identification network, and extracting to obtain N face characteristic values;
and inputting the N random variables and the N face characteristic values into a loss function, and calculating to obtain the output loss of the first network model.
By the method, the output loss of the first network model is calculated and used for constraining the training process of the first network model so that the training result is converged.
In one possible design, the obtaining N random variables and 1 super-resolution face image set from the N frames of low-resolution face images through a first network model includes:
determining a frame of low-resolution face image from the N frames of low-resolution face images as a first reference frame;
inputting a first super-resolution face image and a first low-resolution face image into the first network model to obtain a random variable corresponding to the first low-resolution face image, wherein the first super-resolution face image is a real high-resolution face image of the first target, and the first low-resolution face image is a next frame image of the first reference frame;
inputting the random variable and the first reference frame into the first network model to obtain a second super-resolution face image;
and replacing the first super-resolution face image with the second super-resolution face image, replacing the first low-resolution face image with the next frame image of the first low-resolution face image, continuing to train the first network model, and forming 1 super-resolution face image set by the sequentially generated super-resolution face images.
By the method, the super-resolution face images in the super-resolution face image set are used for extracting face characteristic values, and the face characteristic values and the N random variables are used for calculating the output loss of the first network model.
In one possible design, the inputting the N random variables and the N face feature values into a loss function to calculate an output loss of the first network model includes:
inputting the N random variables into a negative log-likelihood loss function, and calculating to obtain a negative log-likelihood loss, wherein the negative log-likelihood loss is used for constraining the first network model so that the random variables output by the first network model obey a standard normal distribution;
inputting the N face characteristic values into a cosine loss function, and calculating to obtain cosine loss, wherein the cosine loss is used for calculating the difference degree between the super-resolution face characteristic and the real face characteristic;
inputting the cosine loss into a cosine comparison loss function, and calculating to obtain cosine comparison loss, wherein the cosine comparison loss is used for constraining a first network model so that the similarity between the super-resolution face image generated each time and the real high-resolution face image is greater than the similarity between the super-resolution face image generated last time and the real high-resolution face image;
and inputting the negative log likelihood loss, the cosine loss and the cosine comparison loss into a loss function, and calculating to obtain the output loss of the first network model, wherein the output loss is used for restricting the training process of the first network model.
By the method, the output loss of the first network model is obtained through calculation, and the training process of the first network model is constrained through the output loss, so that the random variables encoded by the first network model obey a standard normal distribution, and the similarity between the super-resolution face image generated by the first network model each time and the real high-resolution face image is greater than the similarity between the super-resolution face image generated the previous time and the real high-resolution face image.
In one possible design, sequentially performing super-resolution processing on the N frames of low-resolution face images based on the second network model to obtain N frames of super-resolution face images, includes:
randomly sampling a first random variable from the random variables which are generated in the training process and obey the standard normal distribution, and determining a second reference frame from the N frames of low-resolution face images;
inputting the first random variable and a second reference frame into a second network model to obtain a super-resolution face image corresponding to the first random variable;
inputting the super-resolution face image and a second low-resolution face image into the second network model to obtain a second random variable, wherein the second low-resolution face image is a next frame image of the second reference frame;
and replacing the first random variable with the second random variable, replacing the second low-resolution face image with the next frame image of the second low-resolution face image, and continuously performing super-resolution processing on the replaced second low-resolution face image to sequentially obtain N frames of super-resolution face images.
By the method, the super-resolution processing is carried out on the N frames of low-resolution face images based on the second network model, and the detail characteristics of one frame of low-resolution face image are added in the super-resolution face image generated each time compared with the super-resolution face image generated last time, so that the detail characteristics of the N frames of low-resolution face images are contained in the super-resolution face image generated last time and are consistent with the identity of the N frames of low-resolution face images.
In a second aspect, the present application provides an image generation apparatus, the apparatus comprising:
the acquisition module is used for acquiring N frames of low-resolution face images of the first target, wherein N is a positive integer greater than or equal to 2;
the training module is used for training a first network model according to the N frames of low-resolution face images to obtain a second network model, wherein the first network model can perform super-resolution processing on the low-resolution face images;
the processing module is used for sequentially carrying out super-resolution processing on the N frames of low-resolution face images based on the second network model to obtain N frames of super-resolution face images;
and the selection module is used for taking the last generated frame of super-resolution face image in the N frames of super-resolution face images as a final face image.
In one possible design, the training module includes:
the computing unit is used for computing the output loss of a first network model according to the N frames of low-resolution face images, wherein the output loss is used for restricting the training process of the first network model;
a judging unit, configured to judge whether a training result of the first network model converges according to the output loss;
the adjusting unit is used for adjusting the parameters of the first network model if the training result is not converged, and continuing to train the first network model until the training result is converged;
and the marking unit is used for marking the first network model after training as a second network model if the training result is converged.
In one possible design, the computing unit is specifically configured to:
obtaining N random variables and 1 super-resolution face image set from the N frames of low-resolution face images through a first network model, wherein the number of the super-resolution face image frames in the super-resolution face image set is N;
sequentially inputting the super-resolution face images in the super-resolution face image set into an identification network, and extracting to obtain N face characteristic values;
and inputting the N random variables and the N face characteristic values into a loss function, and calculating to obtain the output loss of the first network model.
In one possible design, the computing unit is further configured to:
determining a frame of low-resolution face image from the N frames of low-resolution face images as a first reference frame;
inputting a first super-resolution face image and a first low-resolution face image into the first network model to obtain a random variable corresponding to the first low-resolution face image, wherein the first super-resolution face image is a real high-resolution face image of the first target, and the first low-resolution face image is a next frame image of the first reference frame;
inputting the random variable and the first reference frame into the first network model to obtain a second super-resolution face image;
and replacing the first super-resolution face image with the second super-resolution face image, replacing the first low-resolution face image with the next frame image of the first low-resolution face image, continuing to train the first network model, and forming 1 super-resolution face image set by the sequentially generated super-resolution face images.
In one possible design, the computing unit is further configured to:
inputting the N random variables into a negative log-likelihood loss function, and calculating to obtain a negative log-likelihood loss, wherein the negative log-likelihood loss is used for constraining the first network model so that the random variables output by the first network model obey a standard normal distribution;
inputting the N face characteristic values into a cosine loss function, and calculating to obtain cosine loss, wherein the cosine loss is used for calculating the difference degree between the super-resolution face characteristic and the real face characteristic;
inputting the cosine loss into a cosine comparison loss function, and calculating to obtain cosine comparison loss, wherein the cosine comparison loss is used for constraining a first network model so that the similarity between the super-resolution face image generated each time and the real high-resolution face image is greater than the similarity between the super-resolution face image generated last time and the real high-resolution face image;
and inputting the negative log likelihood loss, the cosine loss and the cosine comparison loss into a loss function, and calculating to obtain the output loss of the first network model, wherein the output loss is used for restricting the training process of the first network model.
In one possible design, the processing module includes:
the acquisition unit is used for randomly sampling a first random variable from the random variables which are generated in the training process and obey the standard normal distribution, and determining a second reference frame from the N frames of low-resolution face images;
the processing unit is used for inputting the first random variable and a second reference frame into a second network model to obtain a super-resolution face image corresponding to the first random variable;
the coding unit is used for inputting the super-resolution face image and a second low-resolution face image into the second network model to obtain a second random variable, wherein the second low-resolution face image is a next frame image of the second reference frame;
and the updating unit is used for replacing the first random variable with the second random variable, replacing the second low-resolution face image with the next frame image of the second low-resolution face image, and continuously performing super-resolution processing on the replaced second low-resolution face image to sequentially obtain N frames of super-resolution face images.
In a third aspect, the present application provides an electronic device, comprising:
a memory for storing a computer program;
a processor for implementing the above image generation method steps when executing the computer program stored in the memory.
In a fourth aspect, the present application provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the method steps of image generation described above.
Based on the method provided by the application, the first network model is trained in a mode of performing super-resolution processing on the N frames of low-resolution images of the first target, and because the last frame of super-resolution facial image obtained in the training process contains the detail information of a plurality of frames of low-resolution facial images, the second network model performs super-resolution processing on the N frames of low-resolution facial images of the first target, and the identity information in the obtained last frame of super-resolution facial image is consistent with the identity information of the first target.
Of course, the second network model may perform super-resolution processing on not only the N frames of low-resolution face images of the first target to obtain super-resolution face images consistent with the identity information of the first target, but also the multiple frames of low-resolution face images of the second target to obtain super-resolution face images consistent with the identity information of the second target.
For the technical effects of the second to fourth aspects and of each possible design thereof, please refer to the above description of the first aspect and of the possible solutions in the first aspect; repeated description is omitted here.
Drawings
FIG. 1 is a flow chart of an image generation method provided herein;
FIG. 2 is a flow chart of a method for training a first network model provided herein;
fig. 3 is a flowchart of a method for obtaining N random variables and 1 super-resolution face image set based on a first network model according to the present application;
FIG. 4 is a flow chart of a method for calculating an output loss of a first network model provided herein;
FIG. 5 is a flowchart of a method for obtaining N frames of super-resolution face images based on a second network model according to the present application;
FIG. 6 is a schematic diagram illustrating a method for training a first network model according to the present application;
fig. 7 is a schematic diagram of a method for performing super-resolution processing on N frames of low-resolution face images based on a second network model according to the present application;
fig. 8 is a schematic structural diagram of an image generating apparatus provided in the present application;
FIG. 9 is a schematic diagram of a training module according to the present application;
FIG. 10 is a schematic diagram of a processing module according to the present application;
fig. 11 is a schematic structural diagram of an electronic device provided in the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application clearer, the present application is further described in detail below with reference to the accompanying drawings. The particular methods of operation in the method embodiments may also be applied to apparatus embodiments or system embodiments. It should be noted that "a plurality" is understood as "at least two" in the description of the present application. "And/or" describes the association relationship of the associated objects and means that three relationships may exist; for example, "A and/or B" may mean: A exists alone, A and B exist simultaneously, or B exists alone. "A is connected with B" may mean: A and B are directly connected, or A and B are connected through C. In addition, in the description of the present application, the terms "first," "second," and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or order.
The present application is described in further detail below with reference to the attached figures.
The image generation method provided by the embodiment of the application can solve the problem that the identity information in the obtained super-resolution face image is not consistent with the identity information of the single-frame low-resolution face image when super-resolution processing is performed on the basis of the single-frame low-resolution face image. The method and the device in the embodiment of the application are based on the same technical concept, and because the principles of the problems solved by the method and the device are similar, the device and the embodiment of the method can be mutually referred, and repeated parts are not repeated.
The face super-resolution technology is essentially to add high-frequency features to a low-resolution face image to generate a high-resolution face image, and in the field of face super-resolution technology, an SRFlow network model is often used. The SRFlow network model is reversible, and the conditional distribution of the super-resolution image with respect to the low-resolution image can be learned. And inputting the high-resolution image and the low-resolution image into the SRFlow network model to obtain random variables meeting specific distribution, and inputting the low-resolution image and the random variables meeting the specific distribution into the SRFlow network model to generate a super-resolution face image.
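To make the data flow concrete, the two directions of such a reversible model can be sketched as the following minimal interface; the names FlowModel, encode, and decode are illustrative assumptions, not part of the patent or of any library.

```python
# A minimal sketch of the bidirectional use of a conditional normalizing
# flow such as SRFlow. The interface is hypothetical; it only makes the
# two directions of the reversible model explicit.
import numpy as np


class FlowModel:
    """Invertible network conditioned on a low-resolution image."""

    def encode(self, hr: np.ndarray, lr: np.ndarray) -> tuple:
        """Forward direction: map a high-resolution image, conditioned on
        the low-resolution image, to a latent random variable z and the
        summed log-determinants of the M invertible layers."""
        raise NotImplementedError

    def decode(self, z: np.ndarray, lr: np.ndarray) -> np.ndarray:
        """Inverse direction: generate a super-resolution image from a
        latent variable z, conditioned on the low-resolution image."""
        raise NotImplementedError
```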
In the prior art, a single-frame low-resolution face image is usually subjected to super-resolution processing based on an SRFlow network model to obtain a super-resolution face image, but because the detail information of the single-frame low-resolution face image is missing and the detail information is usually used as a key for distinguishing face identity information, identity information in the obtained super-resolution face image cannot be guaranteed to be consistent with identity information in the low-resolution face image.
In order to solve the problem that the identity information in a super-resolution face image obtained by super-resolution processing of a single-frame low-resolution face image cannot be guaranteed to be consistent with the identity information in that single frame, the present application sequentially inputs multiple frames of low-resolution face images of a first target into a first network model in an iterative manner and trains the first network model, constraining the training process according to the output loss of the first network. When the training result of the first network model converges, the trained first network model is recorded as a second network model. The second network model then performs super-resolution processing on multiple frames of low-resolution face images of the first target or of a second target; the last generated frame of super-resolution face image carries the detail features of the multiple frames of low-resolution face images and is therefore consistent with the identity information of the low-resolution face images.
Specifically, as shown in fig. 1, a flowchart of an image generation method provided by the present application is shown:
S11, acquiring N frames of low-resolution face images of the first target, wherein N is a positive integer greater than or equal to 2;
S12, training a first network model according to the N frames of low-resolution face images to obtain a second network model;
In this embodiment of the present application, the first network model may be an SRFlow network model. The N frames of low-resolution face images are sequentially input into the first network model in an iterative manner to train the first network model, the training process is constrained according to the output loss of the first network, and when the training result of the first network model converges, the trained first network model is recorded as the second network model.
S13, sequentially performing super-resolution processing on the N frames of low-resolution face images based on the second network model to obtain N frames of super-resolution face images;
in the embodiment of the application, based on the second network model, the super-resolution processing is sequentially performed on the N frames of low-resolution face images to obtain N frames of super-resolution face images. In the super-resolution processing process, the detail characteristics of one frame of low-resolution face image are increased in the super-resolution face image generated each time compared with the super-resolution face image generated last time.
S14, taking the last generated super-resolution face image in the N frames of super-resolution face images as a final face image;
by inputting the N frames of low-resolution face images into the second network model, the super-resolution face image generated each time has more detail features of one frame of low-resolution face image than the super-resolution face image generated last time, so that the last generated frame of super-resolution face image contains the detail features of the N frames of low-resolution face images, that is, the identity information in the last generated frame of super-resolution face image is consistent with the identity information in the N frames of low-resolution face images.
Of course, the second network model may perform super-resolution processing on not only the N frames of low-resolution face images of the first target to obtain super-resolution face images consistent with the identity information of the first target, but also the multiple frames of low-resolution face images of the second target to obtain super-resolution face images consistent with the identity information of the second target.
To further illustrate how the second network model is obtained, the method for training the first network model in step S12 is described in detail, and as shown in fig. 2, a specific process for training the first network model is as follows:
S21, obtaining N random variables and 1 super-resolution face image set from the N frames of low-resolution face images through a first network model;
In this embodiment of the present application, 1 frame of the real high-resolution face image of the first target is recorded as the first super-resolution face image and is stored, together with the super-resolution face images generated each time, in the super-resolution face image set, so that the total number of super-resolution face image frames in the set is N.
Obtaining the N random variables and the 1 super-resolution face image set may be implemented by inputting the N frames of low-resolution face images into a first network model in an iterative manner, where a specific flow is shown in fig. 3:
S31, determining a frame of low-resolution face image in the N frames of low-resolution face images as a first reference frame;
In this embodiment of the application, the first reference frame may be the 1st frame of the N frames of low-resolution face images, or may be the 2nd frame, the 3rd frame, the 4th frame, and the like; the 1st frame low-resolution face image is selected in this application.
S32, putting 1 frame of real high-resolution face image of the first target into a super-resolution face image set as a first super-resolution face image, and taking the next frame image of the first reference frame as a first low-resolution face image;
S33, inputting the first super-resolution face image and the first low-resolution face image into the first network model to obtain a random variable corresponding to the first low-resolution face image;
S34, inputting the random variable and the first reference frame into the first network model to obtain a second super-resolution face image;
S35, putting the second super-resolution face image into a super-resolution face image set, and judging whether the number of image frames in the super-resolution face image set is N;
in this embodiment of the present application, if the number of frames in the super-resolution face image set is not N, step S36 is executed; otherwise, step S37 is executed.
S36, if the image frame number is not N, replacing the first super-resolution face image with the second super-resolution face image, and replacing the first low-resolution face image with the next frame image of the first low-resolution face image;
if the number of the image frames is not N, replacing the first super-resolution face image with the second super-resolution face image, replacing the first low-resolution face image with a next frame image of the first low-resolution face image, and executing step S33;
And S37, if the number of the image frames is N, obtaining 1 super-resolution face image set with N random variables and N image frames.
Based on the above steps, the super-resolution face images in the super-resolution face image set are used for extracting face characteristic values, and the face characteristic values and the N random variables are used for calculating the output loss of the first network model.
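Under the hypothetical FlowModel interface sketched above, the iterative collection of the random variables and the super-resolution face image set (steps S31 to S37) might look as follows; all names are illustrative.

```python
# A sketch of steps S31-S37: `frames` holds the N low-resolution images
# in order and `hr` the real high-resolution image of the first target.
# One random variable is produced per processed frame, and the set
# reaches N frames once the real high-resolution image is counted in.
def collect_training_outputs(model, hr, frames):
    reference = frames[0]          # S31: 1st frame as the first reference frame
    sr_set = [hr]                  # S32: HR enters the set as the first SR image
    z_list, logdet_list = [], []
    sr_prev = hr
    for lr_next in frames[1:]:     # S33-S36: iterate over the remaining frames
        z, logdet = model.encode(sr_prev, lr_next)  # S33: random variable
        sr = model.decode(z, reference)             # S34: new SR image
        z_list.append(z)
        logdet_list.append(logdet)
        sr_set.append(sr)          # S35: grow the super-resolution image set
        sr_prev = sr               # S36: the new SR image replaces the previous one
    return z_list, logdet_list, sr_set              # S37
```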
S22, sequentially inputting the super-resolution face images in the super-resolution face image set into an identification network, and extracting to obtain N face characteristic values;
S23, inputting the N random variables and the N face characteristic values into a loss function, and calculating to obtain the output loss of the first network model;
And the training process of the first network model is constrained through the output loss, so that the random variables output by the first network model obey a standard normal distribution, and the similarity between the super-resolution face image generated each time and the real high-resolution face image is greater than the similarity between the super-resolution face image generated the previous time and the real high-resolution face image.
S24, judging whether the training result of the first network model is converged according to the output loss;
If the output loss of the first network model converges, indicating that the training result of the first network model converges, step S25 is performed; otherwise, step S26 is performed;
S25, if the training result is converged, recording the first network model after training as a second network model;
if the training result is converged, the first network model is shown to perform super-resolution processing on multiple frames of low-resolution face images, and finally the generated super-resolution face images are consistent with the identity information of the low-resolution face images. And recording the trained first network model as a second network model, wherein the second network model can perform super-resolution processing on a plurality of frames of low-resolution face images of any target, and the identity information in the generated one frame of super-resolution face image is consistent with the identity information in the low-resolution face image.
S26, if the training result is not converged, adjusting the parameters of the first network model, and continuing to train the first network model until the training result is converged;
If the training result is not converged, the parameters of the first network model are adjusted, N frames of low-resolution face images of another target are acquired, and step S11 is executed to continue training the first network model until the training result converges.
Based on the steps, the N frames of low-resolution face images are input into the first network model, the first network model is trained, and a second network model is obtained after training is completed, wherein the second network model can realize super-resolution processing on multi-frame low-resolution face images of any target, so that super-resolution face images consistent with identity information of the multi-frame low-resolution face images are obtained.
In the training process of obtaining the second network model, the first network model needs to be constrained by its output loss, so that the random variables output by the first network model obey a standard normal distribution, and the similarity between the super-resolution face image generated each time and the real high-resolution face image is greater than the similarity between the super-resolution face image generated the previous time and the real high-resolution face image.
To further illustrate the method for calculating the output loss, step S23, in which the output loss of the first network model is calculated, is described in detail below; the specific flow of calculating the output loss is shown in fig. 4:
S41, inputting the N random variables into a negative log-likelihood loss function, and calculating to obtain a negative log-likelihood loss;
In the embodiment of the present application, the negative log-likelihood loss is used to constrain the first network model such that the random variables output by the first network model obey a standard normal distribution, wherein the negative log-likelihood loss can be calculated by equation (1):

$$Loss_{nll} = -\frac{1}{N}\sum_{i=1}^{N}\left[\log p_Z(z_{1i}) + \sum_{m=1}^{M}\log\left|\det\frac{\partial h_{1i}^{m}}{\partial h_{1i}^{m-1}}\right|\right] \quad (1)$$

wherein equation (1) is the negative log-likelihood loss function, $LR$ is a low-resolution face image, $SR$ is a super-resolution face image, $\theta$ is the distribution parameter, $N$ is the number of frames of the low-resolution images, $LR_{1i}$ represents the i-th frame low-resolution face image input into the first network model, $p_Z(z_{1i})$ represents the distribution of the random variable, $Z_{1i}$ represents the random variable obtained by inputting the i-th frame low-resolution face image into the first network model, $f_\theta$ is the first network model, the first network model $f_\theta$ is decomposed into a sequence of M reversible layers $f_\theta = f_\theta^{M}\circ\cdots\circ f_\theta^{2}\circ f_\theta^{1}$, and $h_{1i}^{m} = f_\theta^{m}(h_{1i}^{m-1}; LR_{1i})$ with $h_{1i}^{0} = SR$ denotes the output of the m-th reversible layer.
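As a rough illustration, the negative log-likelihood of equation (1) could be computed as below, assuming a standard normal prior over z and that the forward pass of the flow supplies the per-sample log-determinant term (both standard properties of normalizing-flow models).

```python
# A sketch of the negative log-likelihood loss of equation (1).
import numpy as np


def nll_loss(z_list, logdet_list):
    total = 0.0
    for z, logdet in zip(z_list, logdet_list):
        # log-density of z under the standard normal distribution
        log_pz = -0.5 * float(np.sum(z ** 2)) - 0.5 * z.size * np.log(2.0 * np.pi)
        total += -(log_pz + logdet)
    return total / len(z_list)
```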
S42, inputting the N face characteristic values into a cosine loss function, and calculating to obtain cosine loss;
in the embodiment of the present application, the cosine loss indicates a difference degree between the super-resolution face features and the real face features, wherein the cosine loss can be calculated by formula (2):
$$Loss_{cos} = \frac{1}{N}\sum_{i}\left(1 - Similarity_i\right) \quad (2)$$

wherein formula (2) is the cosine loss function, and $Similarity_i$ is the cosine similarity between the i-th super-resolution face image generated by the first network model and the real high-resolution face image. The cosine similarity takes values in the range (-1, 1); the greater the cosine similarity, the higher the similarity between the super-resolution face image and the real high-resolution face image. The cosine similarity can be calculated by formula (3):

$$Similarity_i = \frac{F_i \cdot F_0}{\lVert F_i \rVert\,\lVert F_0 \rVert} \quad (3)$$

wherein $Similarity_i$ represents the cosine similarity generated at the i-th time, formula (3) is the cosine similarity function, $F_i$ is the face characteristic value extracted after the super-resolution face image generated by the first network model at the i-th time is input into the recognition network, and $F_0$ is the face characteristic value extracted after the real high-resolution face image is input into the recognition network;
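The following sketch illustrates formulas (2) and (3); the cosine similarity follows directly from the text, while the averaging in cosine_loss is an assumption, since the original formula image for (2) is not reproduced.

```python
# A sketch of formulas (2) and (3).
import numpy as np


def cosine_similarity(f_i: np.ndarray, f_0: np.ndarray) -> float:
    # formula (3): normalized dot product of the two feature vectors
    return float(np.dot(f_i, f_0) / (np.linalg.norm(f_i) * np.linalg.norm(f_0)))


def cosine_loss(features, f_0) -> float:
    # formula (2), assumed form: mean divergence from the real HR feature
    sims = [cosine_similarity(f, f_0) for f in features]
    return sum(1.0 - s for s in sims) / len(sims)
```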
S43, inputting the cosine loss into a cosine comparison loss function, and calculating to obtain cosine comparison loss;
In the embodiment of the present application, the cosine comparison loss is used to constrain the first network model so that the similarity between the super-resolution face image generated each time and the real high-resolution face image is greater than the similarity between the super-resolution face image generated the previous time and the real high-resolution face image; that is, the cosine comparison loss constrains the first network model so that $Similarity_{i+1}$ is greater than $Similarity_i$. The cosine comparison loss can be calculated by formula (4):

$$Loss_{comp} = \sum_{i} e^{\alpha\,(Similarity_i - Similarity_{i+1})} \quad (4)$$

wherein formula (4) is the cosine comparison loss function, $e$ is the base of the natural logarithm, and $\alpha$ is the comparison coefficient.
And S44, inputting the negative log likelihood loss, the cosine loss and the cosine comparison loss into a loss function, and calculating to obtain the output loss of the first network model.
In the embodiment of the present application, the output loss is used to constrain the training process of the first network model, so that the random variables encoded by the first network model obey a standard normal distribution, and the similarity between the super-resolution face image generated by the first network model each time and the real high-resolution face image is greater than the similarity between the super-resolution face image generated the previous time and the real high-resolution face image. The output loss can be calculated by equation (5):

$$Loss = Loss_{nll} + Loss_{cos} + Loss_{comp} \quad (5)$$

wherein equation (5) is the loss function.
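The sketch below is one form consistent with the descriptions of formulas (4) and (5); since the original formula images are not reproduced, the exponential penalty (built from the base e and the comparison coefficient α that the text mentions) and the unweighted combination are both assumptions.

```python
# A sketch of formulas (4) and (5), under the stated assumptions.
import math


def cosine_comparison_loss(sims, alpha: float = 1.0) -> float:
    # formula (4), assumed form: penalize every step at which
    # Similarity_{i+1} fails to exceed Similarity_i
    return sum(math.exp(alpha * (sims[i] - sims[i + 1]))
               for i in range(len(sims) - 1))


def output_loss(nll: float, cos: float, comp: float) -> float:
    # formula (5), assumed form: combine the three constraint terms
    return nll + cos + comp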
Based on the above steps, the output loss of the first network model is calculated; if the output loss is not converged, the parameters of the first network model are adjusted, and the training of the first network model continues until the output loss converges.
When the output loss converges, the second network model obtained after training can perform super-resolution processing on the N frames of low-resolution face images to obtain N frames of super-resolution face images; in the super-resolution process, the similarity between the super-resolution face image generated each time and the real high-resolution face image is greater than the similarity between the super-resolution face image generated the previous time and the real high-resolution face image.
To further illustrate how the second network model performs super-resolution processing on the N frames of low-resolution face images, step S13 is described in detail, specifically as shown in fig. 5, which is a specific flow of the super-resolution processing:
S51, randomly sampling a first random variable from the random variables which are generated in the training process and obey the standard normal distribution, and determining a second reference frame from the N frames of low-resolution face images;
S52, taking the next frame image of the second reference frame as a second low-resolution face image;
S53, inputting the first random variable and the second reference frame into the second network model to obtain a super-resolution face image corresponding to the first random variable;
S54, counting the super-resolution face images generated each time, and judging whether the total frame number of the super-resolution face images is N;
the reason for counting the super-resolution face images generated each time is to determine whether all the N frames of low-resolution face images have been super-resolution processed. If the total frame number of the super-resolution face image is N, the super-resolution is finished, and step S55 is executed; otherwise, step S56 is executed.
S55, if the total frame number of the super-resolution face image is N, the super-resolution is finished, and the N-frame super-resolution face image is obtained;
S56, if the total frame number of the super-resolution face images is not N, inputting the super-resolution face image and a second low-resolution face image into the second network model to obtain a second random variable;
and S57, replacing the first random variable with the second random variable, replacing the second low-resolution face image with the next frame image of the second low-resolution face image, and continuing to perform super-resolution processing on the replaced second low-resolution face image.
The first random variable is replaced with the second random variable, and after the second low-resolution face image is replaced with its next frame image, step S53 is executed to continue the super-resolution processing of the replaced second low-resolution face image.
Based on the mode, the second network model is used for performing super-resolution processing on the N frames of low-resolution face images, and the super-resolution face image generated each time has more detail features of one frame of low-resolution face image than the super-resolution face image generated last time, so that the super-resolution face image generated last contains the detail features of the N frames of low-resolution face images, namely the identity information in the super-resolution face image generated last is consistent with the identity information in the N frames of low-resolution face images.
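Under the same hypothetical FlowModel interface, the inference recurrence of steps S51 to S57 might be sketched as follows; train_z_list stands for the standard-normally distributed random variables stored from training, and all names are illustrative.

```python
# A sketch of the inference recurrence of steps S51-S57: one
# detail-carrying latent is threaded through the N frames, and the
# last generated image is the final result.
import random


def super_resolve(model, frames, train_z_list):
    z = random.choice(train_z_list)        # S51: sample a first random variable
    reference = frames[0]                  # S51: the second reference frame
    for lr_next in frames[1:]:             # S52/S57: walk the remaining frames
        sr = model.decode(z, reference)    # S53: SR image for the current latent
        z, _ = model.encode(sr, lr_next)   # S56: the second random variable
    return model.decode(z, reference)      # S55: the last, final SR image
```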
Of course, based on the above steps, the second network model is used, so that not only the super-resolution processing can be performed on the N frames of low-resolution face images of the first target, but also the super-resolution processing can be performed on the N frames of low-resolution face images of the second target, and the identity information in the generated last frame of super-resolution face image of the second target is consistent with the identity information in the N frames of low-resolution face images of the second target.
Further, in order to explain an image generation method provided by the present application in more detail, the method provided by the present application is described in detail below through a specific application scenario.
Before generating an image, the first network model needs to be trained. Referring to fig. 6, the N frames of low-resolution face images of the first target are sorted according to the acquisition sequence of the image acquisition device and are respectively recorded as the 1st frame low-resolution face image, the 2nd frame low-resolution face image, ⋯, and the N-th frame low-resolution face image. The 1st frame low-resolution face image is taken as the reference frame LR_11, and the real high-resolution face image HR of the first target is input into the recognition network to obtain the 1st face characteristic value F_0, wherein HR is denoted as SR_10.
In the 1st training, HR and the 2nd frame low-resolution face image LR_12 are input into the first network model to obtain the 1st random variable Z_11; Z_11 and LR_11 are input into the first network model to generate the 1st frame super-resolution face image SR_11; SR_11 is input into the recognition network to obtain the 2nd face characteristic value F_1.
In the 2nd training, SR_11 and the 3rd frame low-resolution face image LR_13 are input into the first network model to obtain the 2nd random variable Z_12; Z_12 and LR_11 are input into the first network model to generate the 2nd frame super-resolution face image SR_12 of the first model; SR_12 is input into the recognition network to obtain the 3rd face characteristic value F_2.
In the i-th training (i > 1), the super-resolution face image SR_1(i-1) generated by the first network model at the (i-1)-th time and the (i+1)-th frame low-resolution face image LR_1(i+1) are input into the first network model to obtain the i-th random variable Z_1i; Z_1i and LR_11 are input into the first network model to generate the i-th frame super-resolution face image SR_1i of the first model; SR_1i is input into the recognition network to obtain the (i+1)-th face characteristic value F_i.
The generated face characteristic values {F_i, i = 0, 1, ⋯, N-1} are input into the loss function, the output loss of the first network model is obtained through calculation, and whether the output loss converges is judged; if yes, the training result of the first network model converges, and the trained first network model is recorded as the second network model; otherwise, the parameters of the first network model are adjusted, and the training of the first network model continues until the training result converges.
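Tying the sketches above together, one training step over the N frames of fig. 6 might look like this; recognition_net is a hypothetical feature extractor, and the convergence check and parameter adjustment are left to the surrounding training loop.

```python
# A sketch of one training step (cf. fig. 6), reusing the helper
# functions defined in the earlier sketches.
def training_step(model, recognition_net, hr, frames, alpha=1.0):
    z_list, logdets, sr_set = collect_training_outputs(model, hr, frames)
    feats = [recognition_net(sr) for sr in sr_set]        # F_0, F_1, ..., F_{N-1}
    sims = [cosine_similarity(f, feats[0]) for f in feats[1:]]
    return output_loss(nll_loss(z_list, logdets),
                       cosine_loss(feats[1:], feats[0]),
                       cosine_comparison_loss(sims, alpha))
```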
Based on the second network model obtained by the training method, super-resolution processing can be performed on the N frames of low-resolution face images of the first target to obtain super-resolution face images consistent with the identity information of the first target, and also can be performed on the N frames of low-resolution face images of the second target to obtain super-resolution face images consistent with the identity information of the second target.
After the training of the first network model is completed and the second network model is obtained, the multiple frames of low-resolution face images of any target can be subjected to super-resolution processing through the second network model to obtain a super-resolution face image whose identity is consistent with that of the low-resolution face images. Here, taking the first target as an example, the specific flow is described with reference to fig. 7:
At the 1st super-resolution, a random variable Z_21 is randomly sampled from the random-variable distribution space which is generated in the training process and satisfies the standard normal distribution, and Z_21 and the reference frame LR_21 are simultaneously input into the second network model to generate the 1st frame super-resolution face image SR_21 of the second model; here, the 1st frame low-resolution face image of the N frames of low-resolution face images is determined as the reference frame LR_21.
At the 2nd super-resolution, SR_21 and the 2nd frame low-resolution face image LR_22 are simultaneously input into the second network model to obtain the 2nd random variable Z_22; Z_22 and LR_21 are simultaneously input into the second network model to generate the 2nd frame super-resolution face image SR_22 of the second model.
At the i-th super-resolution, the (i-1)-th frame super-resolution face image SR_2(i-1) generated by the second network model and the i-th frame low-resolution face image LR_2i are simultaneously input into the second network model to generate the i-th frame super-resolution face image SR_2i of the second network model.
And when the last frame of low-resolution face image is input into the second network model, generating the last frame of super-resolution face image as a final super-resolution result.
Based on the process, sequentially inputting N frames of low-resolution face images of a first target into a first network model, training the first network model, constraining the training process of the first network model by using the output loss of the first network model to make the training result of the first network model converge, and marking the trained first network model as a second network model. Because the last frame of super-resolution face image obtained in the training process contains the detail information of a plurality of frames of low-resolution face images, the second network model performs super-resolution processing on the N frames of low-resolution face images of the first target, and the identity information of the last frame of super-resolution face image is consistent with the identity information of the first target.
Of course, the second network model may perform super-resolution processing on not only the N frames of low-resolution face images of the first target to obtain super-resolution face images consistent with the identity information of the first target, but also the multiple frames of low-resolution face images of the second target to obtain super-resolution face images consistent with the identity information of the second target.
Based on the same inventive concept, an embodiment of the present application further provides an image generating apparatus, as shown in fig. 8, which is a schematic structural diagram of the image generating apparatus in the present application, and the apparatus includes:
an obtaining module 81, configured to obtain N frames of low-resolution face images of a first target, where N is a positive integer greater than or equal to 2;
the training module 82 is configured to train a first network model according to the N frames of low-resolution face images to obtain a second network model, where the first network model can perform super-resolution processing on the low-resolution face images;
the processing module 83 is configured to perform super-resolution processing on the N frames of low-resolution face images in sequence based on the second network model to obtain N frames of super-resolution face images;
and the selecting module 84 is configured to use the last generated super-resolution face image of the N frames of super-resolution face images as the final face image.
In one possible design, as shown in fig. 9, the training module includes:
a calculating unit 91, configured to calculate an output loss of a first network model according to the N frames of low-resolution face images, where the output loss is used to constrain a training process of the first network model;
a determining unit 92, configured to determine whether a training result of the first network model converges according to the output loss;
an adjusting unit 93, configured to adjust parameters of the first network model if the training result is not converged, and continue training the first network model until the training result is converged;
and a marking unit 94, configured to mark the trained first network model as the second network model if the training result is converged.
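As a minimal sketch of this training loop, assuming a standard gradient-based optimizer and a simple loss-difference convergence test (both assumptions, since the patent does not fix an optimization method or convergence criterion), and using the `unrolled_training_pass` and `output_loss` helpers sketched in the two subsections that follow:

```python
def train_first_network_model(model, hr_image, lr_frames, recognizer,
                              optimizer, eps=1e-4, max_steps=10000):
    """Train the first network model until the output loss converges; the
    converged model is recorded as the second network model."""
    prev_loss = float("inf")
    hr_feat = recognizer(hr_image)                    # real face feature value
    for _ in range(max_steps):
        zs, sr_set = unrolled_training_pass(model, hr_image, lr_frames)
        sr_feats = [recognizer(sr) for sr in sr_set]  # N face feature values
        loss = output_loss(zs, sr_feats, hr_feat)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()                              # adjust model parameters
        if abs(prev_loss - loss.item()) < eps:        # training result converged
            break
        prev_loss = loss.item()
    return model                                      # the second network model
```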
In one possible design, the computing unit is specifically configured to:
obtaining N random variables and 1 super-resolution face image set from the N frames of low-resolution face images through a first network model, wherein the number of the super-resolution face image frames in the super-resolution face image set is N;
sequentially inputting the super-resolution face images in the super-resolution face image set into an identification network, and extracting to obtain N face characteristic values;
and inputting the N random variables and the N face characteristic values into a loss function, and calculating to obtain the output loss of the first network model.
In one possible design, the computing unit is further configured to:
determining a frame of low-resolution face image from the N frames of low-resolution face images as a first reference frame;
inputting a first super-resolution face image and a first low-resolution face image into the first network model to obtain a random variable corresponding to the first low-resolution face image, wherein the first super-resolution face image is a real high-resolution face image of the first target, and the first low-resolution face image is a next frame image of the first reference frame;
inputting the random variable and the first reference frame into the first network model to obtain a second super-resolution face image;
and replacing the first super-resolution face image with the second super-resolution face image, replacing the first low-resolution face image with the next frame image of the first low-resolution face image, continuing to train the first network model, and forming 1 super-resolution face image set by the sequentially generated super-resolution face images.
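A rough sketch of this unrolled pass, under the same hypothetical `encode`/`decode` interface as above, is shown below. The real high-resolution face image seeds the first encoding step and each generated super-resolution frame replaces it for the next step; the patent counts N random variables and N super-resolution frames but does not spell out how the reference frame itself enters that count, so this sketch simply iterates over all N frames.

```python
def unrolled_training_pass(model, hr_image, lr_frames):
    """One unrolled pass producing the random variables and the SR image set."""
    ref = lr_frames[0]             # first reference frame
    sr = hr_image                  # start from the real high-resolution face image
    zs, sr_set = [], []
    for lr in lr_frames:           # assumption: iterate over all N LR frames
        z = model.encode(sr, lr)   # random variable corresponding to this LR frame
        sr = model.decode(z, ref)  # next super-resolution face image
        zs.append(z)
        sr_set.append(sr)
    return zs, sr_set              # N random variables and an SR set of N frames
```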
In one possible design, the computing unit is further configured to:
inputting the N random variables into a negative log-likelihood loss function, and calculating to obtain a negative log-likelihood loss, wherein the negative log-likelihood loss is used for constraining the first network model so that the random variables output by the first network model follow the standard normal distribution;
inputting the N face characteristic values into a cosine loss function, and calculating to obtain a cosine loss, wherein the cosine loss is used for calculating the degree of difference between the super-resolution face features and the real face features;
inputting the cosine loss into a cosine comparison loss function, and calculating to obtain a cosine comparison loss, wherein the cosine comparison loss is used for constraining the first network model so that the similarity between each newly generated super-resolution face image and the real high-resolution face image is greater than the similarity between the previously generated super-resolution face image and the real high-resolution face image;
and inputting the negative log likelihood loss, the cosine loss and the cosine comparison loss into a loss function, and calculating to obtain the output loss of the first network model, wherein the output loss is used for restricting the training process of the first network model.
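Taken together, the output loss might look like the sketch below. The loss weights, the dropped log-constant in the negative log-likelihood term, and the hinge form of the cosine comparison term are assumptions; the patent names the three components but does not give their formulas. Here `sr_feats` are the N face feature values produced by the recognition network and `hr_feat` is the feature of the real high-resolution face image.

```python
import torch
import torch.nn.functional as F

def output_loss(zs, sr_feats, hr_feat, w_nll=1.0, w_cos=1.0, w_cmp=1.0):
    # Negative log-likelihood loss: constrains each random variable toward
    # the standard normal distribution (-log N(z; 0, I) up to a constant).
    nll = 0.5 * torch.stack([(z ** 2).mean() for z in zs]).mean()

    # Cosine loss: degree of difference between each super-resolution face
    # feature and the real high-resolution face feature.
    cos_dist = torch.stack([1.0 - F.cosine_similarity(f, hr_feat, dim=-1).mean()
                            for f in sr_feats])
    cos_loss = cos_dist.mean()

    # Cosine comparison loss: each newly generated SR frame should be more
    # similar to the real HR image than the previously generated one
    # (hinge penalty when the cosine distance fails to decrease).
    cmp_loss = F.relu(cos_dist[1:] - cos_dist[:-1]).mean()

    return w_nll * nll + w_cos * cos_loss + w_cmp * cmp_loss
```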
In one possible design, as shown in fig. 10, the processing module includes:
the acquiring unit 101 is configured to randomly sample a first random variable from the random variables generated in the training process that follow the standard normal distribution, and determine a second reference frame from the N frames of low-resolution face images;
the processing unit 102 is configured to input the first random variable and a second reference frame into a second network model, so as to obtain a super-resolution face image corresponding to the first random variable;
the encoding unit 103 is configured to input the super-resolution face image and a second low-resolution face image into the second network model to obtain a second random variable, where the second low-resolution face image is a next frame image of the second reference frame;
and the updating unit 104 is configured to replace the first random variable with the second random variable, replace the second low-resolution face image with a next frame image of the second low-resolution face image, and continue to perform super-resolution processing on the replaced second low-resolution face image to sequentially obtain N frames of super-resolution face images.
Based on the image generation apparatus, the N frames of low-resolution face images of the first target are sequentially input into the first network model, the first network model is trained, the training process is constrained by the output loss of the first network model so that the training result converges, and the trained first network model is recorded as the second network model. Because the last frame of super-resolution face image obtained in the training process contains the detail information of multiple frames of low-resolution face images, when super-resolution processing is performed on the N frames of low-resolution face images of the first target through the second network model, the identity information of the obtained last frame of super-resolution face image is consistent with the identity information of the first target.
Of course, the second network model may perform super-resolution processing on not only the N frames of low-resolution face images of the first target to obtain super-resolution face images consistent with the identity information of the first target, but also the multiple frames of low-resolution face images of the second target to obtain super-resolution face images consistent with the identity information of the second target.
Based on the same inventive concept, an embodiment of the present application further provides an electronic device, where the electronic device can implement the function of the foregoing image generation apparatus, and with reference to fig. 11, the electronic device includes:
at least one processor 111, and a memory 112 connected to the at least one processor 111. The specific connection medium between the processor 111 and the memory 112 is not limited in the embodiments of the present application; in fig. 11, the processor 111 and the memory 112 are connected through a bus 110 as an example. The bus 110 is shown as a thick line in fig. 11, and the connections between other components are merely illustrative and not limiting. The bus 110 may be divided into an address bus, a data bus, a control bus, and the like; for ease of illustration it is represented by only one thick line in fig. 11, but this does not mean that there is only one bus or one type of bus. Alternatively, the processor 111 may also be referred to as a controller; the name is not limited.
In the embodiment of the present application, the memory 112 stores instructions executable by the at least one processor 111, and the at least one processor 111 can execute the image generation method discussed above by executing the instructions stored in the memory 112. The processor 111 may implement the functions of the various modules in the apparatus shown in fig. 8.
The processor 111 is the control center of the apparatus. It may connect the various parts of the entire control device by using various interfaces and lines, and it performs the various functions of the apparatus and processes data by running or executing the instructions stored in the memory 112 and invoking the data stored in the memory 112, thereby monitoring the apparatus as a whole.
In one possible design, the processor 111 may include one or more processing units. The processor 111 may integrate an application processor and a modem processor, where the application processor mainly handles the operating system, user interface, application programs, and the like, and the modem processor mainly handles wireless communication. It will be appreciated that the modem processor may alternatively not be integrated into the processor 111. In some embodiments, the processor 111 and the memory 112 may be implemented on the same chip, or they may be implemented separately on independent chips.
The processor 111 may be a general-purpose processor, such as a Central Processing Unit (CPU), digital signal processor, application specific integrated circuit, field programmable gate array or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or the like, that may implement or perform the methods, steps, and logic blocks disclosed in embodiments of the present application. A general purpose processor may be a microprocessor or any conventional processor or the like. The steps of the image generation method disclosed in connection with the embodiments of the present application may be directly implemented by a hardware processor, or implemented by a combination of hardware and software modules in the processor.
The memory 112, as a non-volatile computer-readable storage medium, may be used to store non-volatile software programs, non-volatile computer-executable programs, and modules. The memory 112 may include at least one type of storage medium, for example, a flash memory, a hard disk, a multimedia card, a card-type memory, a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a Programmable Read-Only Memory (PROM), a Read-Only Memory (ROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), a magnetic memory, a magnetic disk, an optical disk, and the like. The memory 112 may also be any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto. The memory 112 in the embodiments of the present application may also be a circuit or any other device capable of implementing a storage function, for storing program instructions and/or data.
By programming the processor 111, the code corresponding to the image generation method described in the foregoing embodiment may be solidified into the chip, so that the chip can execute the steps of the image generation method of the embodiment shown in fig. 1 when running. How to program the processor 111 is well known to those skilled in the art and will not be described in detail herein.
Based on the same inventive concept, the present application also provides a storage medium storing computer instructions, which when executed on a computer, cause the computer to execute the image generation method discussed above.
In some possible embodiments, the aspects of the image generation method provided by the present application may also be implemented in the form of a program product comprising program code for causing a control apparatus to perform the steps of the image generation method according to various exemplary embodiments of the present application described above in this specification when the program product is run on a device.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.

Claims (10)

1. An image generation method, characterized in that the method comprises:
acquiring N frames of low-resolution face images of a first target, wherein N is a positive integer greater than or equal to 2;
training a first network model according to the N frames of low-resolution face images to obtain a second network model, wherein the first network model can perform super-resolution processing on the low-resolution face images;
sequentially performing super-resolution processing on the N frames of low-resolution face images based on the second network model to obtain N frames of super-resolution face images;
and taking the last generated super-resolution face image in the N frames of super-resolution face images as a final face image.
2. The method of claim 1, wherein the training a first network model according to the N frames of low resolution face images to obtain a second network model comprises:
calculating the output loss of a first network model according to the N frames of low-resolution face images, wherein the output loss is used for restricting the training process of the first network model;
judging whether the training result of the first network model is converged or not according to the output loss;
if the training result is not converged, adjusting parameters of the first network model, and continuing to train the first network model until the training result is converged;
and if the training result is converged, recording the first network model after the training as a second network model.
3. The method of claim 2, wherein calculating the output loss of the first network model from the N frames of low resolution face images comprises:
obtaining N random variables and 1 super-resolution face image set from the N frames of low-resolution face images through a first network model, wherein the number of the super-resolution face image frames in the super-resolution face image set is N;
sequentially inputting the super-resolution face images in the super-resolution face image set into an identification network, and extracting to obtain N face characteristic values;
and inputting the N random variables and the N face characteristic values into a loss function, and calculating to obtain the output loss of the first network model.
4. The method of claim 3, wherein said passing said N frames of low resolution facial images through a first network model to obtain N random variables and 1 super resolution facial image set comprises:
determining a frame of low-resolution face image from the N frames of low-resolution face images as a first reference frame;
inputting a first super-resolution face image and a first low-resolution face image into the first network model to obtain a random variable corresponding to the first low-resolution face image, wherein the first super-resolution face image is a real high-resolution face image of the first target, and the first low-resolution face image is a next frame image of the first reference frame;
inputting the random variable and the first reference frame into the first network model to obtain a second super-resolution face image;
and replacing the first super-resolution face image with the second super-resolution face image, replacing the first low-resolution face image with the next frame image of the first low-resolution face image, continuing to train the first network model, and forming 1 super-resolution face image set by the sequentially generated super-resolution face images.
5. The method of claim 3, wherein said inputting said N random variables and said N face feature values into a loss function to calculate an output loss for a first network model comprises:
inputting the N random variables into a negative log-likelihood loss function, and calculating to obtain a negative log-likelihood loss, wherein the negative log-likelihood loss is used for constraining the first network model so that the random variables output by the first network model follow the standard normal distribution;
inputting the N face characteristic values into a cosine loss function, and calculating to obtain a cosine loss, wherein the cosine loss is used for calculating the degree of difference between the super-resolution face features and the real face features;
inputting the cosine loss into a cosine comparison loss function, and calculating to obtain a cosine comparison loss, wherein the cosine comparison loss is used for constraining the first network model so that the similarity between each newly generated super-resolution face image and the real high-resolution face image is greater than the similarity between the previously generated super-resolution face image and the real high-resolution face image;
and inputting the negative log likelihood loss, the cosine loss and the cosine comparison loss into a loss function, and calculating to obtain the output loss of the first network model, wherein the output loss is used for restricting the training process of the first network model.
6. The method of claim 1, wherein performing super-resolution processing on the N frames of low-resolution face images in sequence based on the second network model to obtain N frames of super-resolution face images comprises:
randomly sampling a first random variable from the random variables generated in the training process that follow the standard normal distribution, and determining a second reference frame from the N frames of low-resolution face images;
inputting the first random variable and a second reference frame into a second network model to obtain a super-resolution face image corresponding to the first random variable;
inputting the super-resolution face image and a second low-resolution face image into the second network model to obtain a second random variable, wherein the second low-resolution face image is a next frame image of the second reference frame;
and replacing the first random variable with the second random variable, replacing the second low-resolution face image with the next frame image of the second low-resolution face image, and continuously performing super-resolution processing on the replaced second low-resolution face image to sequentially obtain N frames of super-resolution face images.
7. An image generation apparatus, characterized in that the apparatus comprises:
the acquisition module is used for acquiring N frames of low-resolution face images of the first target, wherein N is a positive integer greater than or equal to 2;
the training module is used for training a first network model according to the N frames of low-resolution face images to obtain a second network model, wherein the first network model can perform super-resolution processing on the low-resolution face images;
the processing module is used for sequentially carrying out super-resolution processing on the N frames of low-resolution face images based on the second network model to obtain N frames of super-resolution face images;
and the selection module is used for taking the last generated frame of super-resolution face image in the N frames of super-resolution face images as a final face image.
8. The apparatus of claim 7, wherein the processing module comprises:
the acquisition unit is used for randomly sampling a first random variable from the random variables generated in the training process that follow the standard normal distribution, and determining a second reference frame from the N frames of low-resolution face images;
the processing unit is used for inputting the first random variable and a second reference frame into a second network model to obtain a super-resolution face image corresponding to the first random variable;
the coding unit is used for inputting the super-resolution face image and a second low-resolution face image into the second network model to obtain a second random variable, wherein the second low-resolution face image is a next frame image of the second reference frame;
and the updating unit is used for replacing the first random variable with the second random variable, replacing the second low-resolution face image with the next frame image of the second low-resolution face image, and continuously performing super-resolution processing on the replaced second low-resolution face image to sequentially obtain N frames of super-resolution face images.
9. An electronic device, comprising:
a memory for storing a computer program;
a processor for implementing the method steps of any one of claims 1-6 when executing the computer program stored on the memory.
10. A computer-readable storage medium, characterized in that a computer program is stored in the computer-readable storage medium, which computer program, when being executed by a processor, carries out the method steps of any one of claims 1-6.
CN202110879082.1A 2021-08-02 2021-08-02 Image generation method and device and electronic equipment Active CN113344792B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202110879082.1A CN113344792B (en) 2021-08-02 2021-08-02 Image generation method and device and electronic equipment
PCT/CN2021/128518 WO2023010701A1 (en) 2021-08-02 2021-11-03 Image generation method, apparatus, and electronic device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110879082.1A CN113344792B (en) 2021-08-02 2021-08-02 Image generation method and device and electronic equipment

Publications (2)

Publication Number Publication Date
CN113344792A true CN113344792A (en) 2021-09-03
CN113344792B CN113344792B (en) 2022-07-05

Family

ID=77480653

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110879082.1A Active CN113344792B (en) 2021-08-02 2021-08-02 Image generation method and device and electronic equipment

Country Status (2)

Country Link
CN (1) CN113344792B (en)
WO (1) WO2023010701A1 (en)


Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113344792B (en) * 2021-08-02 2022-07-05 浙江大华技术股份有限公司 Image generation method and device and electronic equipment

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110222724A1 (en) * 2010-03-15 2011-09-15 Nec Laboratories America, Inc. Systems and methods for determining personal characteristics
CN107423701A (en) * 2017-07-17 2017-12-01 北京智慧眼科技股份有限公司 The non-supervisory feature learning method and device of face based on production confrontation network
CN110889895A (en) * 2019-11-11 2020-03-17 南昌大学 Face video super-resolution reconstruction method fusing single-frame reconstruction network
CN111062867A (en) * 2019-11-21 2020-04-24 浙江大华技术股份有限公司 Video super-resolution reconstruction method
CN112508782A (en) * 2020-09-10 2021-03-16 浙江大华技术股份有限公司 Network model training method, face image super-resolution reconstruction method and equipment
CN112507617A (en) * 2020-12-03 2021-03-16 青岛海纳云科技控股有限公司 Training method of SRFlow super-resolution model and face recognition method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
ANDREAS LUGMAYR et al.: "SRFlow: Learning the Super-Resolution Space with Normalizing Flow", European Conference on Computer Vision *
SUN Jingyang et al.: "A survey of image super-resolution reconstruction algorithms", Computer Engineering and Applications *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023010701A1 (en) * 2021-08-02 2023-02-09 Zhejiang Dahua Technology Co., Ltd. Image generation method, apparatus, and electronic device

Also Published As

Publication number Publication date
WO2023010701A1 (en) 2023-02-09
CN113344792B (en) 2022-07-05

Similar Documents

Publication Publication Date Title
WO2022078041A1 (en) Occlusion detection model training method and facial image beautification method
CN109934300B (en) Model compression method, device, computer equipment and storage medium
CN109413510B (en) Video abstract generation method and device, electronic equipment and computer storage medium
CN111401374A (en) Model training method based on multiple tasks, character recognition method and device
CN112804558B (en) Video splitting method, device and equipment
Zhang et al. Robust facial landmark detection via heatmap-offset regression
CN113344792B (en) Image generation method and device and electronic equipment
CN116189265A (en) Sketch face recognition method, device and equipment based on lightweight semantic transducer model
Chen et al. Learning to generate steganographic cover for audio steganography using gan
CN116205820A (en) Image enhancement method, target identification method, device and medium
CN109492610A (en) A kind of pedestrian recognition methods, device and readable storage medium storing program for executing again
CN113362804B (en) Method, device, terminal and storage medium for synthesizing voice
CN112950505B (en) Image processing method, system and medium based on generation countermeasure network
Zhang et al. A new JPEG image steganalysis technique combining rich model features and convolutional neural networks
CN112750071B (en) User-defined expression making method and system
US11361189B2 (en) Image generation method and computing device
CN112786003A (en) Speech synthesis model training method and device, terminal equipment and storage medium
CN113689527A (en) Training method of face conversion model and face image conversion method
CN111539263B (en) Video face recognition method based on aggregation countermeasure network
Zhong et al. Target aware network adaptation for efficient representation learning
CN107403145A (en) Image characteristic points positioning method and device
CN117291252B (en) Stable video generation model training method, generation method, equipment and storage medium
CN111291602A (en) Video detection method and device, electronic equipment and computer readable storage medium
CN116977794B (en) Digital human video identification model training method and system based on reinforcement learning
Xiong et al. Study on energy theft detection based on customers’ consumption pattern

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant