CN113344792A - Image generation method and device and electronic equipment - Google Patents

Image generation method and device and electronic equipment

Info

Publication number
CN113344792A
CN113344792A
Authority
CN
China
Prior art keywords
resolution face
super
network model
resolution
face image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110879082.1A
Other languages
Chinese (zh)
Other versions
CN113344792B (en)
Inventor
李亚鹏
王宁波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Dahua Technology Co Ltd
Original Assignee
Zhejiang Dahua Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Dahua Technology Co Ltd filed Critical Zhejiang Dahua Technology Co Ltd
Priority to CN202110879082.1A priority Critical patent/CN113344792B/en
Publication of CN113344792A publication Critical patent/CN113344792A/en
Priority to PCT/CN2021/128518 priority patent/WO2023010701A1/en
Application granted granted Critical
Publication of CN113344792B publication Critical patent/CN113344792B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 Geometric image transformation in the plane of the image
    • G06T3/40 Scaling the whole image or part thereof
    • G06T3/4053 Super resolution, i.e. output image resolution higher than sensor resolution
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 Geometric image transformation in the plane of the image
    • G06T3/40 Scaling the whole image or part thereof
    • G06T3/4046 Scaling the whole image or part thereof using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/778 Active pattern-learning, e.g. online learning of image or video features
    • G06V10/7796 Active pattern-learning, e.g. online learning of image or video features based on specific statistical tests
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172 Classification, e.g. identification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Human Computer Interaction (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The application discloses an image generation method, an image generation device, and electronic equipment. The method includes: sequentially inputting N frames of low-resolution face images into a first network model in an iterative manner to train the first network model; constraining the training process with the output loss of the first network model until the training result of the first network model converges; and recording the trained first network model as a second network model. The second network model can perform super-resolution processing on multiple frames of low-resolution face images of any target to obtain a super-resolution face image whose identity is consistent with that of the multiple frames of low-resolution face images. Based on the method, the problem that, when super-resolution processing is performed on a single frame of low-resolution face image, the identity information in the obtained super-resolution face image cannot be guaranteed to be consistent with the identity information of the single-frame low-resolution face image can be solved.

Description

Image generation method and device and electronic equipment
Technical Field
The application relates to the technical field of face recognition, in particular to an image generation method, an image generation device and electronic equipment.
Background
With the rapid development of scientific technology and the arrival of the big data era, information security becomes more and more important. As a safe, non-contact, convenient and efficient identity information authentication mode, face recognition has been widely applied to various aspects of social life. However, in a relatively large monitoring scene, the size of a face appearing in a video is generally small, the image definition is low, and the requirement of face recognition is difficult to meet, so the face super-resolution technology becomes more and more important. The face super-resolution technology essentially adds high-frequency features to a low-resolution face image to generate a high-resolution face image.
In the prior art, a super-resolution face image is usually obtained by super-resolution processing based on a single-frame low-resolution face image, and the super-resolution face image obtained in this way has face information loss, so that identity information in the super-resolution face image cannot be guaranteed to be consistent with identity information in the single-frame low-resolution face image.
Disclosure of Invention
The application provides an image generation method, an image generation device and electronic equipment, which are used for obtaining a super-resolution face image consistent with identity information of a low-resolution face image after carrying out super-resolution processing on a plurality of frames of low-resolution face images.
In a first aspect, the present application provides an image generation method, including:
acquiring N frames of low-resolution face images of a first target, wherein N is a positive integer greater than or equal to 2;
training a first network model according to the N frames of low-resolution face images to obtain a second network model, wherein the first network model can perform super-resolution processing on the low-resolution face images;
sequentially performing super-resolution processing on the N frames of low-resolution face images based on the second network model to obtain N frames of super-resolution face images;
and taking the last generated super-resolution face image in the N frames of super-resolution face images as a final face image.
By the image generation method, the super-resolution face image consistent with the identity information of the low-resolution face image can be obtained after the super-resolution processing is carried out on the multi-frame low-resolution face image.
In a possible design, the training a first network model according to the N frames of low-resolution face images to obtain a second network model includes:
calculating the output loss of a first network model according to the N frames of low-resolution face images, wherein the output loss is used for restricting the training process of the first network model;
judging whether the training result of the first network model is converged or not according to the output loss;
if the training result is not converged, adjusting parameters of the first network model, and continuing to train the first network model until the training result is converged;
and if the training result is converged, recording the first network model after the training as a second network model.
According to the method, the first network model is constrained to carry out a training process through the output loss of the first network model, and a second network model is obtained after the training is finished, wherein the second network model can carry out super-resolution processing on multi-frame low-resolution face images of any target to obtain super-resolution face images with identities consistent with the multi-frame low-resolution face images.
In one possible design, the calculating the output loss of the first network model according to the N frames of low-resolution face images includes:
obtaining N random variables and 1 super-resolution face image set from the N frames of low-resolution face images through a first network model, wherein the number of the super-resolution face image frames in the super-resolution face image set is N;
sequentially inputting the super-resolution face images in the super-resolution face image set into an identification network, and extracting to obtain N face characteristic values;
and inputting the N random variables and the N face characteristic values into a loss function, and calculating to obtain the output loss of the first network model.
By the method, the output loss of the first network model is calculated and used for constraining the training process of the first network model so that the training result is converged.
In one possible design, the obtaining N random variables and 1 super-resolution face image set from the N frames of low-resolution face images through a first network model includes:
determining a frame of low-resolution face image from the N frames of low-resolution face images as a first reference frame;
inputting a first super-resolution face image and a first low-resolution face image into the first network model to obtain a random variable corresponding to the first low-resolution face image, wherein the first super-resolution face image is a real high-resolution face image of the first target, and the first low-resolution face image is a next frame image of the first reference frame;
inputting the random variable and the first reference frame into the first network model to obtain a second super-resolution face image;
and replacing the first super-resolution face image with the second super-resolution face image, replacing the first low-resolution face image with the next frame image of the first low-resolution face image, continuing to train the first network model, and forming 1 super-resolution face image set by the sequentially generated super-resolution face images.
By the method, the super-resolution face images in the super-resolution face image set are used for extracting face characteristic values, and the face characteristic values and the N random variables are used for calculating the output loss of the first network model.
In one possible design, the inputting the N random variables and the N face feature values into a loss function to calculate an output loss of the first network model includes:
inputting the N random variables into a negative log-likelihood loss function, and calculating to obtain a negative log-likelihood loss, wherein the negative log-likelihood loss is used for constraining the first network model so that the random variables output by the first network model obey a standard normal distribution;
inputting the N face characteristic values into a cosine loss function, and calculating to obtain cosine loss, wherein the cosine loss is used for calculating the difference degree between the super-resolution face characteristic and the real face characteristic;
inputting the cosine loss into a cosine comparison loss function, and calculating to obtain cosine comparison loss, wherein the cosine comparison loss is used for constraining a first network model so that the similarity between the super-resolution face image generated each time and the real high-resolution face image is greater than the similarity between the super-resolution face image generated last time and the real high-resolution face image;
and inputting the negative log likelihood loss, the cosine loss and the cosine comparison loss into a loss function, and calculating to obtain the output loss of the first network model, wherein the output loss is used for restricting the training process of the first network model.
By the method, the output loss of the first network model is obtained through calculation, and the training process of the first network model is constrained through the output loss, so that the random variables encoded by the first network model obey a standard normal distribution, and the similarity between the super-resolution face image generated by the first network model each time and the real high-resolution face image is greater than the similarity between the super-resolution face image generated the previous time and the real high-resolution face image.
In one possible design, sequentially performing super-resolution processing on the N frames of low-resolution face images based on the second network model to obtain N frames of super-resolution face images, includes:
randomly sampling a first random variable from the random variables which are generated in the training process and obey the standard normal distribution, and determining a second reference frame from the N frames of low-resolution face images;
inputting the first random variable and a second reference frame into a second network model to obtain a super-resolution face image corresponding to the first random variable;
inputting the super-resolution face image and a second low-resolution face image into the second network model to obtain a second random variable, wherein the second low-resolution face image is a next frame image of the second reference frame;
and replacing the first random variable with the second random variable, replacing the second low-resolution face image with the next frame image of the second low-resolution face image, and continuously performing super-resolution processing on the replaced second low-resolution face image to sequentially obtain N frames of super-resolution face images.
By the method, the super-resolution processing is carried out on the N frames of low-resolution face images based on the second network model, and the detail characteristics of one frame of low-resolution face image are added in the super-resolution face image generated each time compared with the super-resolution face image generated last time, so that the detail characteristics of the N frames of low-resolution face images are contained in the super-resolution face image generated last time and are consistent with the identity of the N frames of low-resolution face images.
In a second aspect, the present application provides an image generation apparatus, the apparatus comprising:
the acquisition module is used for acquiring N frames of low-resolution face images of the first target, wherein N is a positive integer greater than or equal to 2;
the training module is used for training a first network model according to the N frames of low-resolution face images to obtain a second network model, wherein the first network model can perform super-resolution processing on the low-resolution face images;
the processing module is used for sequentially carrying out super-resolution processing on the N frames of low-resolution face images based on the second network model to obtain N frames of super-resolution face images;
and the selection module is used for taking the last generated frame of super-resolution face image in the N frames of super-resolution face images as a final face image.
In one possible design, the training module includes:
the computing unit is used for computing the output loss of a first network model according to the N frames of low-resolution face images, wherein the output loss is used for restricting the training process of the first network model;
a judging unit, configured to judge whether a training result of the first network model converges according to the output loss;
the adjusting unit is used for adjusting the parameters of the first network model if the training result is not converged, and continuing to train the first network model until the training result is converged;
and the marking unit is used for marking the first network model after training as a second network model if the training result is converged.
In one possible design, the computing unit is specifically configured to:
obtaining N random variables and 1 super-resolution face image set from the N frames of low-resolution face images through a first network model, wherein the number of the super-resolution face image frames in the super-resolution face image set is N;
sequentially inputting the super-resolution face images in the super-resolution face image set into an identification network, and extracting to obtain N face characteristic values;
and inputting the N random variables and the N face characteristic values into a loss function, and calculating to obtain the output loss of the first network model.
In one possible design, the computing unit is further configured to:
determining a frame of low-resolution face image from the N frames of low-resolution face images as a first reference frame;
inputting a first super-resolution face image and a first low-resolution face image into the first network model to obtain a random variable corresponding to the first low-resolution face image, wherein the first super-resolution face image is a real high-resolution face image of the first target, and the first low-resolution face image is a next frame image of the first reference frame;
inputting the random variable and the first reference frame into the first network model to obtain a second super-resolution face image;
and replacing the first super-resolution face image with the second super-resolution face image, replacing the first low-resolution face image with the next frame image of the first low-resolution face image, continuing to train the first network model, and forming 1 super-resolution face image set by the sequentially generated super-resolution face images.
In one possible design, the computing unit is further configured to:
inputting the N random variables into a negative log-likelihood loss function, and calculating to obtain a negative log-likelihood loss, wherein the negative log-likelihood loss is used for constraining the first network model so that the random variables output by the first network model obey a standard normal distribution;
inputting the N face characteristic values into a cosine loss function, and calculating to obtain cosine loss, wherein the cosine loss is used for calculating the difference degree between the super-resolution face characteristic and the real face characteristic;
inputting the cosine loss into a cosine comparison loss function, and calculating to obtain cosine comparison loss, wherein the cosine comparison loss is used for constraining a first network model so that the similarity between the super-resolution face image generated each time and the real high-resolution face image is greater than the similarity between the super-resolution face image generated last time and the real high-resolution face image;
and inputting the negative log likelihood loss, the cosine loss and the cosine comparison loss into a loss function, and calculating to obtain the output loss of the first network model, wherein the output loss is used for restricting the training process of the first network model.
In one possible design, the processing module includes:
the acquisition unit is used for randomly sampling a first random variable from the random variables which are generated in the training process and obey the standard normal distribution, and determining a second reference frame from the N frames of low-resolution face images;
the processing unit is used for inputting the first random variable and a second reference frame into a second network model to obtain a super-resolution face image corresponding to the first random variable;
the coding unit is used for inputting the super-resolution face image and a second low-resolution face image into the second network model to obtain a second random variable, wherein the second low-resolution face image is a next frame image of the second reference frame;
and the updating unit is used for replacing the first random variable with the second random variable, replacing the second low-resolution face image with the next frame image of the second low-resolution face image, and continuously performing super-resolution processing on the replaced second low-resolution face image to sequentially obtain N frames of super-resolution face images.
In a third aspect, the present application provides an electronic device, comprising:
a memory for storing a computer program;
a processor for implementing the above image generation method steps when executing the computer program stored in the memory.
In a fourth aspect, the present application provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the method steps of image generation described above.
Based on the method provided by the application, the first network model is trained in a mode of performing super-resolution processing on the N frames of low-resolution images of the first target, and because the last frame of super-resolution facial image obtained in the training process contains the detail information of a plurality of frames of low-resolution facial images, the second network model performs super-resolution processing on the N frames of low-resolution facial images of the first target, and the identity information in the obtained last frame of super-resolution facial image is consistent with the identity information of the first target.
Of course, the second network model may perform super-resolution processing on not only the N frames of low-resolution face images of the first target to obtain super-resolution face images consistent with the identity information of the first target, but also the multiple frames of low-resolution face images of the second target to obtain super-resolution face images consistent with the identity information of the second target.
For the technical effects of the second to fourth aspects and of each possible design thereof, please refer to the above description of the first aspect and of the possible solutions in the first aspect; repeated description is omitted here.
Drawings
FIG. 1 is a flow chart of an image generation method provided herein;
FIG. 2 is a flow chart of a method for training a first network model provided herein;
fig. 3 is a flowchart of a method for obtaining N random variables and 1 super-resolution face image set based on a first network model according to the present application;
FIG. 4 is a flow chart of a method for calculating an output loss of a first network model provided herein;
FIG. 5 is a flowchart of a method for obtaining N frames of super-resolution face images based on a second network model according to the present application;
FIG. 6 is a schematic diagram illustrating a method for training a first network model according to the present application;
fig. 7 is a schematic diagram of a method for performing super-resolution processing on N frames of low-resolution face images based on a second network model according to the present application;
fig. 8 is a schematic structural diagram of an image generating apparatus provided in the present application;
FIG. 9 is a schematic diagram of a training module according to the present application;
FIG. 10 is a schematic diagram of a processing module according to the present application;
fig. 11 is a schematic structural diagram of an electronic device provided in the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application clearer, the present application is further described in detail below with reference to the accompanying drawings. The particular methods of operation in the method embodiments may also be applied to apparatus embodiments or system embodiments. It should be noted that "a plurality" is understood as "at least two" in the description of the present application. "And/or" describes the association relationship of the associated objects and means that three relationships may exist; for example, "A and/or B" may mean: A exists alone, A and B exist simultaneously, or B exists alone. "A is connected with B" may mean: A and B are directly connected, or A and B are connected through C. In addition, in the description of the present application, the terms "first," "second," and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or order.
The present application is described in further detail below with reference to the attached figures.
The image generation method provided by the embodiment of the application can solve the problem that the identity information in the obtained super-resolution face image is not consistent with the identity information of the single-frame low-resolution face image when super-resolution processing is performed on the basis of the single-frame low-resolution face image. The method and the device in the embodiment of the application are based on the same technical concept, and because the principles of the problems solved by the method and the device are similar, the device and the embodiment of the method can be mutually referred, and repeated parts are not repeated.
The face super-resolution technology is essentially to add high-frequency features to a low-resolution face image to generate a high-resolution face image, and in the field of face super-resolution technology, an SRFlow network model is often used. The SRFlow network model is reversible, and the conditional distribution of the super-resolution image with respect to the low-resolution image can be learned. And inputting the high-resolution image and the low-resolution image into the SRFlow network model to obtain random variables meeting specific distribution, and inputting the low-resolution image and the random variables meeting the specific distribution into the SRFlow network model to generate a super-resolution face image.
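To make the data flow concrete, the two directions of such a reversible model can be sketched as the following minimal interface; the names FlowModel, encode, and decode are illustrative assumptions, not part of the patent or of any library.

```python
# A minimal sketch of the bidirectional use of a conditional normalizing
# flow such as SRFlow. The interface is hypothetical; it only makes the
# two directions of the reversible model explicit.
import numpy as np


class FlowModel:
    """Invertible network conditioned on a low-resolution image."""

    def encode(self, hr: np.ndarray, lr: np.ndarray) -> tuple:
        """Forward direction: map a high-resolution image, conditioned on
        the low-resolution image, to a latent random variable z and the
        summed log-determinants of the M invertible layers."""
        raise NotImplementedError

    def decode(self, z: np.ndarray, lr: np.ndarray) -> np.ndarray:
        """Inverse direction: generate a super-resolution image from a
        latent variable z, conditioned on the low-resolution image."""
        raise NotImplementedError
```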
In the prior art, a single-frame low-resolution face image is usually subjected to super-resolution processing based on an SRFlow network model to obtain a super-resolution face image, but because the detail information of the single-frame low-resolution face image is missing and the detail information is usually used as a key for distinguishing face identity information, identity information in the obtained super-resolution face image cannot be guaranteed to be consistent with identity information in the low-resolution face image.
In order to solve the problem that the identity information in a super-resolution face image obtained by super-resolution processing of a single-frame low-resolution face image cannot be guaranteed to be consistent with the identity information in that single frame, the present application sequentially inputs multiple frames of low-resolution face images of a first target into a first network model in an iterative manner and trains the first network model, constraining the training process according to the output loss of the first network. When the training result of the first network model converges, the trained first network model is recorded as a second network model. The second network model then performs super-resolution processing on multiple frames of low-resolution face images of the first target or of a second target; the last generated frame of super-resolution face image carries the detail features of the multiple frames of low-resolution face images and is therefore consistent with the identity information of the low-resolution face images.
Specifically, as shown in fig. 1, a flowchart of an image generation method provided by the present application is shown:
S11, acquiring N frames of low-resolution face images of the first target, wherein N is a positive integer greater than or equal to 2;
S12, training a first network model according to the N frames of low-resolution face images to obtain a second network model;
In this embodiment of the present application, the first network model may be an SRFlow network model. The N frames of low-resolution face images are sequentially input into the first network model in an iterative manner to train the first network model, the training process is constrained according to the output loss of the first network, and when the training result of the first network model converges, the trained first network model is recorded as the second network model.
S13, sequentially performing super-resolution processing on the N frames of low-resolution face images based on the second network model to obtain N frames of super-resolution face images;
in the embodiment of the application, based on the second network model, the super-resolution processing is sequentially performed on the N frames of low-resolution face images to obtain N frames of super-resolution face images. In the super-resolution processing process, the detail characteristics of one frame of low-resolution face image are increased in the super-resolution face image generated each time compared with the super-resolution face image generated last time.
S14, taking the last generated super-resolution face image in the N frames of super-resolution face images as a final face image;
by inputting the N frames of low-resolution face images into the second network model, the super-resolution face image generated each time has more detail features of one frame of low-resolution face image than the super-resolution face image generated last time, so that the last generated frame of super-resolution face image contains the detail features of the N frames of low-resolution face images, that is, the identity information in the last generated frame of super-resolution face image is consistent with the identity information in the N frames of low-resolution face images.
Of course, the second network model may perform super-resolution processing on not only the N frames of low-resolution face images of the first target to obtain super-resolution face images consistent with the identity information of the first target, but also the multiple frames of low-resolution face images of the second target to obtain super-resolution face images consistent with the identity information of the second target.
To further illustrate how the second network model is obtained, the method for training the first network model in step S12 is described in detail, and as shown in fig. 2, a specific process for training the first network model is as follows:
S21, obtaining N random variables and 1 super-resolution face image set from the N frames of low-resolution face images through a first network model;
In this embodiment of the present application, 1 frame of the real high-resolution face image of the first target is recorded as the first super-resolution face image and is stored, together with the super-resolution face images generated each time, in the super-resolution face image set, so that the total number of super-resolution face image frames in the set is N.
Obtaining the N random variables and the 1 super-resolution face image set may be implemented by inputting the N frames of low-resolution face images into a first network model in an iterative manner, where a specific flow is shown in fig. 3:
S31, determining a frame of low-resolution face image in the N frames of low-resolution face images as a first reference frame;
In this embodiment of the application, the first reference frame may be the 1st frame of the N frames of low-resolution face images, or may be the 2nd frame, the 3rd frame, the 4th frame, and the like; the 1st frame low-resolution face image is selected in this application.
S32, putting 1 frame of real high-resolution face image of the first target into a super-resolution face image set as a first super-resolution face image, and taking the next frame image of the first reference frame as a first low-resolution face image;
S33, inputting the first super-resolution face image and the first low-resolution face image into the first network model to obtain a random variable corresponding to the first low-resolution face image;
S34, inputting the random variable and the first reference frame into the first network model to obtain a second super-resolution face image;
S35, putting the second super-resolution face image into a super-resolution face image set, and judging whether the number of image frames in the super-resolution face image set is N;
in this embodiment of the present application, if the number of frames in the super-resolution face image set is not N, step S36 is executed; otherwise, step S37 is executed.
S36, if the image frame number is not N, replacing the first super-resolution face image with the second super-resolution face image, and replacing the first low-resolution face image with the next frame image of the first low-resolution face image;
if the number of the image frames is not N, replacing the first super-resolution face image with the second super-resolution face image, replacing the first low-resolution face image with a next frame image of the first low-resolution face image, and executing step S33;
And S37, if the number of the image frames is N, obtaining 1 super-resolution face image set with N random variables and N image frames.
Based on the above steps, the super-resolution face images in the super-resolution face image set are used for extracting face characteristic values, and the face characteristic values and the N random variables are used for calculating the output loss of the first network model.
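Under the hypothetical FlowModel interface sketched above, the iterative collection of the random variables and the super-resolution face image set (steps S31 to S37) might look as follows; all names are illustrative.

```python
# A sketch of steps S31-S37: `frames` holds the N low-resolution images
# in order and `hr` the real high-resolution image of the first target.
# One random variable is produced per processed frame, and the set
# reaches N frames once the real high-resolution image is counted in.
def collect_training_outputs(model, hr, frames):
    reference = frames[0]          # S31: 1st frame as the first reference frame
    sr_set = [hr]                  # S32: HR enters the set as the first SR image
    z_list, logdet_list = [], []
    sr_prev = hr
    for lr_next in frames[1:]:     # S33-S36: iterate over the remaining frames
        z, logdet = model.encode(sr_prev, lr_next)  # S33: random variable
        sr = model.decode(z, reference)             # S34: new SR image
        z_list.append(z)
        logdet_list.append(logdet)
        sr_set.append(sr)          # S35: grow the super-resolution image set
        sr_prev = sr               # S36: the new SR image replaces the previous one
    return z_list, logdet_list, sr_set              # S37
```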
S22, sequentially inputting the super-resolution face images in the super-resolution face image set into an identification network, and extracting to obtain N face characteristic values;
S23, inputting the N random variables and the N face characteristic values into a loss function, and calculating to obtain the output loss of the first network model;
And the training process of the first network model is constrained through the output loss, so that the random variables output by the first network model obey a standard normal distribution, and the similarity between the super-resolution face image generated each time and the real high-resolution face image is greater than the similarity between the super-resolution face image generated the previous time and the real high-resolution face image.
S24, judging whether the training result of the first network model is converged according to the output loss;
If the output loss of the first network model converges, indicating that the training result of the first network model converges, step S25 is performed; otherwise, step S26 is performed;
S25, if the training result is converged, recording the first network model after training as a second network model;
if the training result is converged, the first network model is shown to perform super-resolution processing on multiple frames of low-resolution face images, and finally the generated super-resolution face images are consistent with the identity information of the low-resolution face images. And recording the trained first network model as a second network model, wherein the second network model can perform super-resolution processing on a plurality of frames of low-resolution face images of any target, and the identity information in the generated one frame of super-resolution face image is consistent with the identity information in the low-resolution face image.
S26, if the training result is not converged, adjusting the parameters of the first network model, and continuing to train the first network model until the training result is converged;
If the training result is not converged, the parameters of the first network model are adjusted, N frames of low-resolution face images of another target are acquired, and step S11 is executed to continue training the first network model until the training result converges.
Based on the steps, the N frames of low-resolution face images are input into the first network model, the first network model is trained, and a second network model is obtained after training is completed, wherein the second network model can realize super-resolution processing on multi-frame low-resolution face images of any target, so that super-resolution face images consistent with identity information of the multi-frame low-resolution face images are obtained.
In the training process of obtaining the second network model, the first network model needs to be constrained by its output loss, so that the random variables output by the first network model obey a standard normal distribution, and the similarity between the super-resolution face image generated each time and the real high-resolution face image is greater than the similarity between the super-resolution face image generated the previous time and the real high-resolution face image.
To further illustrate the method for calculating the output loss, step S23, in which the output loss of the first network model is calculated, is described in detail below; the specific flow of calculating the output loss is shown in fig. 4:
S41, inputting the N random variables into a negative log-likelihood loss function, and calculating to obtain a negative log-likelihood loss;
In the embodiment of the present application, the negative log-likelihood loss is used to constrain the first network model such that the random variables output by the first network model obey a standard normal distribution, wherein the negative log-likelihood loss can be calculated by equation (1):

$$Loss_{nll} = -\frac{1}{N}\sum_{i=1}^{N}\left[\log p_Z(z_{1i}) + \sum_{m=1}^{M}\log\left|\det\frac{\partial h_{1i}^{m}}{\partial h_{1i}^{m-1}}\right|\right] \quad (1)$$

wherein equation (1) is the negative log-likelihood loss function, $LR$ is a low-resolution face image, $SR$ is a super-resolution face image, $\theta$ is the distribution parameter, $N$ is the number of frames of the low-resolution images, $LR_{1i}$ represents the i-th frame low-resolution face image input into the first network model, $p_Z(z_{1i})$ represents the distribution of the random variable, $Z_{1i}$ represents the random variable obtained by inputting the i-th frame low-resolution face image into the first network model, $f_\theta$ is the first network model, the first network model $f_\theta$ is decomposed into a sequence of M reversible layers $f_\theta = f_\theta^{M}\circ\cdots\circ f_\theta^{2}\circ f_\theta^{1}$, and $h_{1i}^{m} = f_\theta^{m}(h_{1i}^{m-1}; LR_{1i})$ with $h_{1i}^{0} = SR$ denotes the output of the m-th reversible layer.
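As a rough illustration, the negative log-likelihood of equation (1) could be computed as below, assuming a standard normal prior over z and that the forward pass of the flow supplies the per-sample log-determinant term (both standard properties of normalizing-flow models).

```python
# A sketch of the negative log-likelihood loss of equation (1).
import numpy as np


def nll_loss(z_list, logdet_list):
    total = 0.0
    for z, logdet in zip(z_list, logdet_list):
        # log-density of z under the standard normal distribution
        log_pz = -0.5 * float(np.sum(z ** 2)) - 0.5 * z.size * np.log(2.0 * np.pi)
        total += -(log_pz + logdet)
    return total / len(z_list)
```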
S42, inputting the N face characteristic values into a cosine loss function, and calculating to obtain cosine loss;
in the embodiment of the present application, the cosine loss indicates a difference degree between the super-resolution face features and the real face features, wherein the cosine loss can be calculated by formula (2):
$$Loss_{cos} = \frac{1}{N}\sum_{i}\left(1 - Similarity_i\right) \quad (2)$$

wherein formula (2) is the cosine loss function, and $Similarity_i$ is the cosine similarity between the i-th super-resolution face image generated by the first network model and the real high-resolution face image. The cosine similarity takes values in the range (-1, 1); the greater the cosine similarity, the higher the similarity between the super-resolution face image and the real high-resolution face image. The cosine similarity can be calculated by formula (3):

$$Similarity_i = \frac{F_i \cdot F_0}{\lVert F_i \rVert\,\lVert F_0 \rVert} \quad (3)$$

wherein $Similarity_i$ represents the cosine similarity generated at the i-th time, formula (3) is the cosine similarity function, $F_i$ is the face characteristic value extracted after the super-resolution face image generated by the first network model at the i-th time is input into the recognition network, and $F_0$ is the face characteristic value extracted after the real high-resolution face image is input into the recognition network;
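The following sketch illustrates formulas (2) and (3); the cosine similarity follows directly from the text, while the averaging in cosine_loss is an assumption, since the original formula image for (2) is not reproduced.

```python
# A sketch of formulas (2) and (3).
import numpy as np


def cosine_similarity(f_i: np.ndarray, f_0: np.ndarray) -> float:
    # formula (3): normalized dot product of the two feature vectors
    return float(np.dot(f_i, f_0) / (np.linalg.norm(f_i) * np.linalg.norm(f_0)))


def cosine_loss(features, f_0) -> float:
    # formula (2), assumed form: mean divergence from the real HR feature
    sims = [cosine_similarity(f, f_0) for f in features]
    return sum(1.0 - s for s in sims) / len(sims)
```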
S43, inputting the cosine loss into a cosine comparison loss function, and calculating to obtain cosine comparison loss;
In the embodiment of the present application, the cosine comparison loss is used to constrain the first network model so that the similarity between the super-resolution face image generated each time and the real high-resolution face image is greater than the similarity between the super-resolution face image generated the previous time and the real high-resolution face image; that is, the cosine comparison loss constrains the first network model so that $Similarity_{i+1}$ is greater than $Similarity_i$. The cosine comparison loss can be calculated by formula (4):

$$Loss_{comp} = \sum_{i} e^{\alpha\,(Similarity_i - Similarity_{i+1})} \quad (4)$$

wherein formula (4) is the cosine comparison loss function, $e$ is the base of the natural logarithm, and $\alpha$ is the comparison coefficient.
And S44, inputting the negative log likelihood loss, the cosine loss and the cosine comparison loss into a loss function, and calculating to obtain the output loss of the first network model.
In the embodiment of the present application, the output loss is used to constrain the training process of the first network model, so that the random variables encoded by the first network model obey a standard normal distribution, and the similarity between the super-resolution face image generated by the first network model each time and the real high-resolution face image is greater than the similarity between the super-resolution face image generated the previous time and the real high-resolution face image. The output loss can be calculated by equation (5):

$$Loss = Loss_{nll} + Loss_{cos} + Loss_{comp} \quad (5)$$

wherein equation (5) is the loss function.
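The sketch below is one form consistent with the descriptions of formulas (4) and (5); since the original formula images are not reproduced, the exponential penalty (built from the base e and the comparison coefficient α that the text mentions) and the unweighted combination are both assumptions.

```python
# A sketch of formulas (4) and (5), under the stated assumptions.
import math


def cosine_comparison_loss(sims, alpha: float = 1.0) -> float:
    # formula (4), assumed form: penalize every step at which
    # Similarity_{i+1} fails to exceed Similarity_i
    return sum(math.exp(alpha * (sims[i] - sims[i + 1]))
               for i in range(len(sims) - 1))


def output_loss(nll: float, cos: float, comp: float) -> float:
    # formula (5), assumed form: combine the three constraint terms
    return nll + cos + comp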
Based on the above steps, the output loss of the first network model is calculated; if the output loss is not converged, the parameters of the first network model are adjusted, and the training of the first network model continues until the output loss converges.
When the output loss converges, the second network model obtained after training can perform super-resolution processing on the N frames of low-resolution face images to obtain N frames of super-resolution face images; in the super-resolution process, the similarity between the super-resolution face image generated each time and the real high-resolution face image is greater than the similarity between the super-resolution face image generated the previous time and the real high-resolution face image.
To further illustrate how the second network model performs super-resolution processing on the N frames of low-resolution face images, step S13 is described in detail, specifically as shown in fig. 5, which is a specific flow of the super-resolution processing:
S51, randomly sampling a first random variable from the random variables which are generated in the training process and obey the standard normal distribution, and determining a second reference frame from the N frames of low-resolution face images;
S52, taking the next frame image of the second reference frame as a second low-resolution face image;
S53, inputting the first random variable and the second reference frame into the second network model to obtain a super-resolution face image corresponding to the first random variable;
S54, counting the super-resolution face images generated each time, and judging whether the total frame number of the super-resolution face images is N;
the reason for counting the super-resolution face images generated each time is to determine whether all the N frames of low-resolution face images have been super-resolution processed. If the total frame number of the super-resolution face image is N, the super-resolution is finished, and step S55 is executed; otherwise, step S56 is executed.
S55, if the total frame number of the super-resolution face image is N, the super-resolution is finished, and the N-frame super-resolution face image is obtained;
S56, if the total frame number of the super-resolution face images is not N, inputting the super-resolution face image and a second low-resolution face image into the second network model to obtain a second random variable;
and S57, replacing the first random variable with the second random variable, replacing the second low-resolution face image with the next frame image of the second low-resolution face image, and continuing to perform super-resolution processing on the replaced second low-resolution face image.
The first random variable is replaced with the second random variable, and after the second low-resolution face image is replaced with its next frame image, step S53 is executed to continue the super-resolution processing of the replaced second low-resolution face image.
Based on the mode, the second network model is used for performing super-resolution processing on the N frames of low-resolution face images, and the super-resolution face image generated each time has more detail features of one frame of low-resolution face image than the super-resolution face image generated last time, so that the super-resolution face image generated last contains the detail features of the N frames of low-resolution face images, namely the identity information in the super-resolution face image generated last is consistent with the identity information in the N frames of low-resolution face images.
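Under the same hypothetical FlowModel interface, the inference recurrence of steps S51 to S57 might be sketched as follows; train_z_list stands for the standard-normally distributed random variables stored from training, and all names are illustrative.

```python
# A sketch of the inference recurrence of steps S51-S57: one
# detail-carrying latent is threaded through the N frames, and the
# last generated image is the final result.
import random


def super_resolve(model, frames, train_z_list):
    z = random.choice(train_z_list)        # S51: sample a first random variable
    reference = frames[0]                  # S51: the second reference frame
    for lr_next in frames[1:]:             # S52/S57: walk the remaining frames
        sr = model.decode(z, reference)    # S53: SR image for the current latent
        z, _ = model.encode(sr, lr_next)   # S56: the second random variable
    return model.decode(z, reference)      # S55: the last, final SR image
```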
Of course, based on the above steps, the second network model is used, so that not only the super-resolution processing can be performed on the N frames of low-resolution face images of the first target, but also the super-resolution processing can be performed on the N frames of low-resolution face images of the second target, and the identity information in the generated last frame of super-resolution face image of the second target is consistent with the identity information in the N frames of low-resolution face images of the second target.
Further, in order to explain an image generation method provided by the present application in more detail, the method provided by the present application is described in detail below through a specific application scenario.
Before generating an image, the first network model needs to be trained. Referring to fig. 6, the N frames of low-resolution face images of the first target are sorted according to the acquisition sequence of the image acquisition device and are respectively recorded as the 1st frame low-resolution face image, the 2nd frame low-resolution face image, ⋯, and the N-th frame low-resolution face image. The 1st frame low-resolution face image is taken as the reference frame LR_11, and the real high-resolution face image HR of the first target is input into the recognition network to obtain the 1st face characteristic value F_0, wherein HR is denoted as SR_10.
In the 1st training, HR and the 2nd frame low-resolution face image LR_12 are input into the first network model to obtain the 1st random variable Z_11; Z_11 and LR_11 are input into the first network model to generate the 1st frame super-resolution face image SR_11; SR_11 is input into the recognition network to obtain the 2nd face characteristic value F_1.
In the 2nd training, SR_11 and the 3rd frame low-resolution face image LR_13 are input into the first network model to obtain the 2nd random variable Z_12; Z_12 and LR_11 are input into the first network model to generate the 2nd frame super-resolution face image SR_12 of the first model; SR_12 is input into the recognition network to obtain the 3rd face characteristic value F_2.
In the i-th training (i > 1), the super-resolution face image SR_1(i-1) generated by the first network model at the (i-1)-th time and the (i+1)-th frame low-resolution face image LR_1(i+1) are input into the first network model to obtain the i-th random variable Z_1i; Z_1i and LR_11 are input into the first network model to generate the i-th frame super-resolution face image SR_1i of the first model; SR_1i is input into the recognition network to obtain the (i+1)-th face characteristic value F_i.
The generated face characteristic values {F_i, i = 0, 1, ⋯, N-1} are input into the loss function, the output loss of the first network model is obtained through calculation, and whether the output loss converges is judged; if yes, the training result of the first network model converges, and the trained first network model is recorded as the second network model; otherwise, the parameters of the first network model are adjusted, and the training of the first network model continues until the training result converges.
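Tying the sketches above together, one training step over the N frames of fig. 6 might look like this; recognition_net is a hypothetical feature extractor, and the convergence check and parameter adjustment are left to the surrounding training loop.

```python
# A sketch of one training step (cf. fig. 6), reusing the helper
# functions defined in the earlier sketches.
def training_step(model, recognition_net, hr, frames, alpha=1.0):
    z_list, logdets, sr_set = collect_training_outputs(model, hr, frames)
    feats = [recognition_net(sr) for sr in sr_set]        # F_0, F_1, ..., F_{N-1}
    sims = [cosine_similarity(f, feats[0]) for f in feats[1:]]
    return output_loss(nll_loss(z_list, logdets),
                       cosine_loss(feats[1:], feats[0]),
                       cosine_comparison_loss(sims, alpha))
```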
Based on the second network model obtained by the training method, super-resolution processing can be performed on the N frames of low-resolution face images of the first target to obtain super-resolution face images consistent with the identity information of the first target, and also can be performed on the N frames of low-resolution face images of the second target to obtain super-resolution face images consistent with the identity information of the second target.
After the training of the first network model is completed and the second network model is obtained, the multiple frames of low-resolution face images of any target can be subjected to super-resolution processing through the second network model to obtain a super-resolution face image whose identity is consistent with that of the low-resolution face images. Here, taking the first target as an example, the specific flow is described with reference to fig. 7:
At the 1st super-resolution, a random variable Z_21 is randomly sampled from the random-variable distribution space which is generated in the training process and satisfies the standard normal distribution, and Z_21 and the reference frame LR_21 are simultaneously input into the second network model to generate the 1st frame super-resolution face image SR_21 of the second model; here, the 1st frame low-resolution face image of the N frames of low-resolution face images is determined as the reference frame LR_21.
At the 2nd super-resolution, SR_21 and the 2nd frame low-resolution face image LR_22 are simultaneously input into the second network model to obtain the 2nd random variable Z_22; Z_22 and LR_21 are simultaneously input into the second network model to generate the 2nd frame super-resolution face image SR_22 of the second model.
At the i-th super-resolution, the (i-1)-th frame super-resolution face image SR_2(i-1) generated by the second network model and the i-th frame low-resolution face image LR_2i are simultaneously input into the second network model to generate the i-th frame super-resolution face image SR_2i of the second network model.
And when the last frame of low-resolution face image is input into the second network model, generating the last frame of super-resolution face image as a final super-resolution result.
Based on the process, sequentially inputting N frames of low-resolution face images of a first target into a first network model, training the first network model, constraining the training process of the first network model by using the output loss of the first network model to make the training result of the first network model converge, and marking the trained first network model as a second network model. Because the last frame of super-resolution face image obtained in the training process contains the detail information of a plurality of frames of low-resolution face images, the second network model performs super-resolution processing on the N frames of low-resolution face images of the first target, and the identity information of the last frame of super-resolution face image is consistent with the identity information of the first target.
Of course, the second network model may perform super-resolution processing on not only the N frames of low-resolution face images of the first target to obtain super-resolution face images consistent with the identity information of the first target, but also the multiple frames of low-resolution face images of the second target to obtain super-resolution face images consistent with the identity information of the second target.
Based on the same inventive concept, an embodiment of the present application further provides an image generating apparatus, as shown in fig. 8, which is a schematic structural diagram of the image generating apparatus in the present application, and the apparatus includes:
an obtaining module 81, configured to obtain N frames of low-resolution face images of a first target, where N is a positive integer greater than or equal to 2;
the training module 82 is configured to train a first network model according to the N frames of low-resolution face images to obtain a second network model, where the first network model can perform super-resolution processing on the low-resolution face images;
the processing module 83 is configured to perform super-resolution processing on the N frames of low-resolution face images in sequence based on the second network model to obtain N frames of super-resolution face images;
and the selecting module 84 is configured to use the last generated super-resolution face image of the N frames of super-resolution face images as the final face image.
In one possible design, as shown in fig. 9, the training module includes:
a calculating unit 91, configured to calculate an output loss of a first network model according to the N frames of low-resolution face images, where the output loss is used to constrain a training process of the first network model;
a determining unit 92, configured to determine whether a training result of the first network model converges according to the output loss;
an adjusting unit 93, configured to adjust parameters of the first network model if the training result is not converged, and continue training the first network model until the training result is converged;
and a marking unit 94, configured to mark the trained first network model as the second network model if the training result is converged.
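As a minimal sketch of this training loop, assuming a standard gradient-based optimizer and a simple loss-difference convergence test (both assumptions, since the patent does not fix an optimization method or convergence criterion), and using the `unrolled_training_pass` and `output_loss` helpers sketched in the two subsections that follow:

```python
def train_first_network_model(model, hr_image, lr_frames, recognizer,
                              optimizer, eps=1e-4, max_steps=10000):
    """Train the first network model until the output loss converges; the
    converged model is recorded as the second network model."""
    prev_loss = float("inf")
    hr_feat = recognizer(hr_image)                    # real face feature value
    for _ in range(max_steps):
        zs, sr_set = unrolled_training_pass(model, hr_image, lr_frames)
        sr_feats = [recognizer(sr) for sr in sr_set]  # N face feature values
        loss = output_loss(zs, sr_feats, hr_feat)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()                              # adjust model parameters
        if abs(prev_loss - loss.item()) < eps:        # training result converged
            break
        prev_loss = loss.item()
    return model                                      # the second network model
```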
In one possible design, the computing unit is specifically configured to:
obtaining N random variables and 1 super-resolution face image set from the N frames of low-resolution face images through a first network model, wherein the number of the super-resolution face image frames in the super-resolution face image set is N;
sequentially inputting the super-resolution face images in the super-resolution face image set into an identification network, and extracting to obtain N face characteristic values;
and inputting the N random variables and the N face characteristic values into a loss function, and calculating to obtain the output loss of the first network model.
In one possible design, the computing unit is further configured to:
determining a frame of low-resolution face image from the N frames of low-resolution face images as a first reference frame;
inputting a first super-resolution face image and a first low-resolution face image into the first network model to obtain a random variable corresponding to the first low-resolution face image, wherein the first super-resolution face image is a real high-resolution face image of the first target, and the first low-resolution face image is a next frame image of the first reference frame;
inputting the random variable and the first reference frame into the first network model to obtain a second super-resolution face image;
and replacing the first super-resolution face image with the second super-resolution face image, replacing the first low-resolution face image with the next frame image of the first low-resolution face image, continuing to train the first network model, and forming 1 super-resolution face image set by the sequentially generated super-resolution face images.
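A rough sketch of this unrolled pass, under the same hypothetical `encode`/`decode` interface as above, is shown below. The real high-resolution face image seeds the first encoding step and each generated super-resolution frame replaces it for the next step; the patent counts N random variables and N super-resolution frames but does not spell out how the reference frame itself enters that count, so this sketch simply iterates over all N frames.

```python
def unrolled_training_pass(model, hr_image, lr_frames):
    """One unrolled pass producing the random variables and the SR image set."""
    ref = lr_frames[0]             # first reference frame
    sr = hr_image                  # start from the real high-resolution face image
    zs, sr_set = [], []
    for lr in lr_frames:           # assumption: iterate over all N LR frames
        z = model.encode(sr, lr)   # random variable corresponding to this LR frame
        sr = model.decode(z, ref)  # next super-resolution face image
        zs.append(z)
        sr_set.append(sr)
    return zs, sr_set              # N random variables and an SR set of N frames
```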
In one possible design, the computing unit is further configured to:
inputting the N random variables into a negative log-likelihood loss function, and calculating to obtain a negative log-likelihood loss, wherein the negative log-likelihood loss is used for constraining the first network model so that the random variables output by the first network model follow the standard normal distribution;
inputting the N face characteristic values into a cosine loss function, and calculating to obtain a cosine loss, wherein the cosine loss is used for calculating the degree of difference between the super-resolution face features and the real face features;
inputting the cosine loss into a cosine comparison loss function, and calculating to obtain a cosine comparison loss, wherein the cosine comparison loss is used for constraining the first network model so that the similarity between each newly generated super-resolution face image and the real high-resolution face image is greater than the similarity between the previously generated super-resolution face image and the real high-resolution face image;
and inputting the negative log likelihood loss, the cosine loss and the cosine comparison loss into a loss function, and calculating to obtain the output loss of the first network model, wherein the output loss is used for restricting the training process of the first network model.
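Taken together, the output loss might look like the sketch below. The loss weights, the dropped log-constant in the negative log-likelihood term, and the hinge form of the cosine comparison term are assumptions; the patent names the three components but does not give their formulas. Here `sr_feats` are the N face feature values produced by the recognition network and `hr_feat` is the feature of the real high-resolution face image.

```python
import torch
import torch.nn.functional as F

def output_loss(zs, sr_feats, hr_feat, w_nll=1.0, w_cos=1.0, w_cmp=1.0):
    # Negative log-likelihood loss: constrains each random variable toward
    # the standard normal distribution (-log N(z; 0, I) up to a constant).
    nll = 0.5 * torch.stack([(z ** 2).mean() for z in zs]).mean()

    # Cosine loss: degree of difference between each super-resolution face
    # feature and the real high-resolution face feature.
    cos_dist = torch.stack([1.0 - F.cosine_similarity(f, hr_feat, dim=-1).mean()
                            for f in sr_feats])
    cos_loss = cos_dist.mean()

    # Cosine comparison loss: each newly generated SR frame should be more
    # similar to the real HR image than the previously generated one
    # (hinge penalty when the cosine distance fails to decrease).
    cmp_loss = F.relu(cos_dist[1:] - cos_dist[:-1]).mean()

    return w_nll * nll + w_cos * cos_loss + w_cmp * cmp_loss
```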
In one possible design, as shown in fig. 10, the processing module includes:
the acquiring unit 101 is configured to randomly sample a first random variable from the random variables generated in the training process that follow the standard normal distribution, and determine a second reference frame from the N frames of low-resolution face images;
the processing unit 102 is configured to input the first random variable and a second reference frame into a second network model, so as to obtain a super-resolution face image corresponding to the first random variable;
the encoding unit 103 is configured to input the super-resolution face image and a second low-resolution face image into the second network model to obtain a second random variable, where the second low-resolution face image is a next frame image of the second reference frame;
and the updating unit 104 is configured to replace the first random variable with the second random variable, replace the second low-resolution face image with a next frame image of the second low-resolution face image, and continue to perform super-resolution processing on the replaced second low-resolution face image to sequentially obtain N frames of super-resolution face images.
Based on the image generation apparatus, the N frames of low-resolution face images of the first target are sequentially input into the first network model, the first network model is trained, the training process is constrained by the output loss of the first network model so that the training result converges, and the trained first network model is recorded as the second network model. Because the last frame of super-resolution face image obtained in the training process contains the detail information of multiple frames of low-resolution face images, when super-resolution processing is performed on the N frames of low-resolution face images of the first target through the second network model, the identity information of the obtained last frame of super-resolution face image is consistent with the identity information of the first target.
Of course, the second network model may perform super-resolution processing on not only the N frames of low-resolution face images of the first target to obtain super-resolution face images consistent with the identity information of the first target, but also the multiple frames of low-resolution face images of the second target to obtain super-resolution face images consistent with the identity information of the second target.
Based on the same inventive concept, an embodiment of the present application further provides an electronic device, where the electronic device can implement the function of the foregoing image generation apparatus, and with reference to fig. 11, the electronic device includes:
at least one processor 111, and a memory 112 connected to the at least one processor 111. The specific connection medium between the processor 111 and the memory 112 is not limited in the embodiments of the present application; in fig. 11, the processor 111 and the memory 112 are connected through a bus 110 as an example. The bus 110 is shown as a thick line in fig. 11, and the connections between other components are merely illustrative and not limiting. The bus 110 may be divided into an address bus, a data bus, a control bus, and the like; for ease of illustration it is represented by only one thick line in fig. 11, but this does not mean that there is only one bus or one type of bus. Alternatively, the processor 111 may also be referred to as a controller; the name is not limited.
In the embodiment of the present application, the memory 112 stores instructions executable by the at least one processor 111, and the at least one processor 111 can execute the image generation method discussed above by executing the instructions stored in the memory 112. The processor 111 may implement the functions of the various modules in the apparatus shown in fig. 8.
The processor 111 is the control center of the apparatus. It may connect the various parts of the entire control device by using various interfaces and lines, and it performs the various functions of the apparatus and processes data by running or executing the instructions stored in the memory 112 and invoking the data stored in the memory 112, thereby monitoring the apparatus as a whole.
In one possible design, the processor 111 may include one or more processing units. The processor 111 may integrate an application processor and a modem processor, where the application processor mainly handles the operating system, user interface, application programs, and the like, and the modem processor mainly handles wireless communication. It will be appreciated that the modem processor may alternatively not be integrated into the processor 111. In some embodiments, the processor 111 and the memory 112 may be implemented on the same chip, or they may be implemented separately on independent chips.
The processor 111 may be a general-purpose processor, such as a Central Processing Unit (CPU), digital signal processor, application specific integrated circuit, field programmable gate array or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or the like, that may implement or perform the methods, steps, and logic blocks disclosed in embodiments of the present application. A general purpose processor may be a microprocessor or any conventional processor or the like. The steps of the image generation method disclosed in connection with the embodiments of the present application may be directly implemented by a hardware processor, or implemented by a combination of hardware and software modules in the processor.
The memory 112, as a non-volatile computer-readable storage medium, may be used to store non-volatile software programs, non-volatile computer-executable programs, and modules. The memory 112 may include at least one type of storage medium, for example, a flash memory, a hard disk, a multimedia card, a card-type memory, a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a Programmable Read-Only Memory (PROM), a Read-Only Memory (ROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), a magnetic memory, a magnetic disk, an optical disk, and the like. The memory 112 may also be any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto. The memory 112 in the embodiments of the present application may also be a circuit or any other device capable of implementing a storage function, for storing program instructions and/or data.
By programming the processor 111, the code corresponding to the image generation method described in the foregoing embodiment may be solidified into the chip, so that the chip can execute the steps of the image generation method of the embodiment shown in fig. 1 when running. How to program the processor 111 is well known to those skilled in the art and will not be described in detail herein.
Based on the same inventive concept, the present application also provides a storage medium storing computer instructions, which when executed on a computer, cause the computer to execute the image generation method discussed above.
In some possible embodiments, the aspects of the image generation method provided by the present application may also be implemented in the form of a program product comprising program code for causing a control apparatus to perform the steps of the image generation method according to various exemplary embodiments of the present application described above in this specification when the program product is run on a device.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.

Claims (10)

1. An image generation method, characterized in that the method comprises:
acquiring N frames of low-resolution face images of a first target, wherein N is a positive integer greater than or equal to 2;
training a first network model according to the N frames of low-resolution face images to obtain a second network model, wherein the first network model can perform super-resolution processing on the low-resolution face images;
sequentially performing super-resolution processing on the N frames of low-resolution face images based on the second network model to obtain N frames of super-resolution face images;
and taking the last generated super-resolution face image in the N frames of super-resolution face images as a final face image.
2. The method of claim 1, wherein the training a first network model according to the N frames of low resolution face images to obtain a second network model comprises:
calculating the output loss of a first network model according to the N frames of low-resolution face images, wherein the output loss is used for restricting the training process of the first network model;
judging whether the training result of the first network model is converged or not according to the output loss;
if the training result is not converged, adjusting parameters of the first network model, and continuing to train the first network model until the training result is converged;
and if the training result is converged, recording the first network model after the training as a second network model.
3. The method of claim 2, wherein calculating the output loss of the first network model from the N frames of low resolution face images comprises:
obtaining N random variables and 1 super-resolution face image set from the N frames of low-resolution face images through a first network model, wherein the number of the super-resolution face image frames in the super-resolution face image set is N;
sequentially inputting the super-resolution face images in the super-resolution face image set into an identification network, and extracting to obtain N face characteristic values;
and inputting the N random variables and the N face characteristic values into a loss function, and calculating to obtain the output loss of the first network model.
4. The method of claim 3, wherein said passing said N frames of low resolution facial images through a first network model to obtain N random variables and 1 super resolution facial image set comprises:
determining a frame of low-resolution face image from the N frames of low-resolution face images as a first reference frame;
inputting a first super-resolution face image and a first low-resolution face image into the first network model to obtain a random variable corresponding to the first low-resolution face image, wherein the first super-resolution face image is a real high-resolution face image of the first target, and the first low-resolution face image is a next frame image of the first reference frame;
inputting the random variable and the first reference frame into the first network model to obtain a second super-resolution face image;
and replacing the first super-resolution face image with the second super-resolution face image, replacing the first low-resolution face image with the next frame image of the first low-resolution face image, continuing to train the first network model, and forming 1 super-resolution face image set by the sequentially generated super-resolution face images.
5. The method of claim 3, wherein said inputting said N random variables and said N face feature values into a loss function to calculate an output loss for a first network model comprises:
inputting the N random variables into a negative log-likelihood loss function, and calculating to obtain a negative log-likelihood loss, wherein the negative log-likelihood loss is used for constraining the first network model so that the random variables output by the first network model follow the standard normal distribution;
inputting the N face characteristic values into a cosine loss function, and calculating to obtain a cosine loss, wherein the cosine loss is used for calculating the degree of difference between the super-resolution face features and the real face features;
inputting the cosine loss into a cosine comparison loss function, and calculating to obtain a cosine comparison loss, wherein the cosine comparison loss is used for constraining the first network model so that the similarity between each newly generated super-resolution face image and the real high-resolution face image is greater than the similarity between the previously generated super-resolution face image and the real high-resolution face image;
and inputting the negative log likelihood loss, the cosine loss and the cosine comparison loss into a loss function, and calculating to obtain the output loss of the first network model, wherein the output loss is used for restricting the training process of the first network model.
6. The method of claim 1, wherein performing super-resolution processing on the N frames of low-resolution face images in sequence based on the second network model to obtain N frames of super-resolution face images comprises:
randomly sampling a first random variable from the random variables generated in the training process that follow the standard normal distribution, and determining a second reference frame from the N frames of low-resolution face images;
inputting the first random variable and a second reference frame into a second network model to obtain a super-resolution face image corresponding to the first random variable;
inputting the super-resolution face image and a second low-resolution face image into the second network model to obtain a second random variable, wherein the second low-resolution face image is a next frame image of the second reference frame;
and replacing the first random variable with the second random variable, replacing the second low-resolution face image with the next frame image of the second low-resolution face image, and continuously performing super-resolution processing on the replaced second low-resolution face image to sequentially obtain N frames of super-resolution face images.
7. An image generation apparatus, characterized in that the apparatus comprises:
the acquisition module is used for acquiring N frames of low-resolution face images of the first target, wherein N is a positive integer greater than or equal to 2;
the training module is used for training a first network model according to the N frames of low-resolution face images to obtain a second network model, wherein the first network model can perform super-resolution processing on the low-resolution face images;
the processing module is used for sequentially carrying out super-resolution processing on the N frames of low-resolution face images based on the second network model to obtain N frames of super-resolution face images;
and the selection module is used for taking the last generated frame of super-resolution face image in the N frames of super-resolution face images as a final face image.
8. The apparatus of claim 7, wherein the processing module comprises:
the acquisition unit is used for randomly sampling a first random variable from the random variables generated in the training process that follow the standard normal distribution, and determining a second reference frame from the N frames of low-resolution face images;
the processing unit is used for inputting the first random variable and a second reference frame into a second network model to obtain a super-resolution face image corresponding to the first random variable;
the coding unit is used for inputting the super-resolution face image and a second low-resolution face image into the second network model to obtain a second random variable, wherein the second low-resolution face image is a next frame image of the second reference frame;
and the updating unit is used for replacing the first random variable with the second random variable, replacing the second low-resolution face image with the next frame image of the second low-resolution face image, and continuously performing super-resolution processing on the replaced second low-resolution face image to sequentially obtain N frames of super-resolution face images.
9. An electronic device, comprising:
a memory for storing a computer program;
a processor for implementing the method steps of any one of claims 1-6 when executing the computer program stored on the memory.
10. A computer-readable storage medium, characterized in that a computer program is stored in the computer-readable storage medium, which computer program, when being executed by a processor, carries out the method steps of any one of claims 1-6.
CN202110879082.1A 2021-08-02 2021-08-02 Image generation method and device and electronic equipment Active CN113344792B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202110879082.1A CN113344792B (en) 2021-08-02 2021-08-02 Image generation method and device and electronic equipment
PCT/CN2021/128518 WO2023010701A1 (en) 2021-08-02 2021-11-03 Image generation method, apparatus, and electronic device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110879082.1A CN113344792B (en) 2021-08-02 2021-08-02 Image generation method and device and electronic equipment

Publications (2)

Publication Number Publication Date
CN113344792A true CN113344792A (en) 2021-09-03
CN113344792B CN113344792B (en) 2022-07-05

Family

ID=77480653

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110879082.1A Active CN113344792B (en) 2021-08-02 2021-08-02 Image generation method and device and electronic equipment

Country Status (2)

Country Link
CN (1) CN113344792B (en)
WO (1) WO2023010701A1 (en)


Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113344792B (en) * 2021-08-02 2022-07-05 浙江大华技术股份有限公司 Image generation method and device and electronic equipment

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110222724A1 (en) * 2010-03-15 2011-09-15 Nec Laboratories America, Inc. Systems and methods for determining personal characteristics
CN107423701A (en) * 2017-07-17 2017-12-01 北京智慧眼科技股份有限公司 The non-supervisory feature learning method and device of face based on production confrontation network
CN110889895A (en) * 2019-11-11 2020-03-17 南昌大学 Face video super-resolution reconstruction method fusing single-frame reconstruction network
CN111062867A (en) * 2019-11-21 2020-04-24 浙江大华技术股份有限公司 Video super-resolution reconstruction method
CN112508782A (en) * 2020-09-10 2021-03-16 浙江大华技术股份有限公司 Network model training method, face image super-resolution reconstruction method and equipment
CN112507617A (en) * 2020-12-03 2021-03-16 青岛海纳云科技控股有限公司 Training method of SRFlow super-resolution model and face recognition method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
ANDREAS LUGMAYR et al.: "SRFlow: Learning the Super-Resolution Space with Normalizing Flow", European Conference on Computer Vision *
SUN Jingyang et al.: "A survey of image super-resolution reconstruction algorithms", Computer Engineering and Applications *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023010701A1 (en) * 2021-08-02 2023-02-09 Zhejiang Dahua Technology Co., Ltd. Image generation method, apparatus, and electronic device

Also Published As

Publication number Publication date
WO2023010701A1 (en) 2023-02-09
CN113344792B (en) 2022-07-05

Similar Documents

Publication Publication Date Title
WO2022078041A1 (en) Occlusion detection model training method and facial image beautification method
CN109934300B (en) Model compression method, device, computer equipment and storage medium
CN109413510B (en) Video abstract generation method and device, electronic equipment and computer storage medium
CN111401374A (en) Model training method based on multiple tasks, character recognition method and device
CN112804558B (en) Video splitting method, device and equipment
Zhang et al. Robust facial landmark detection via heatmap-offset regression
CN113344792B (en) Image generation method and device and electronic equipment
CN116189265A (en) Sketch face recognition method, device and equipment based on lightweight semantic transducer model
Chen et al. Learning to generate steganographic cover for audio steganography using gan
CN116205820A (en) Image enhancement method, target identification method, device and medium
CN109492610A (en) A kind of pedestrian recognition methods, device and readable storage medium storing program for executing again
CN113362804B (en) Method, device, terminal and storage medium for synthesizing voice
CN112950505B (en) Image processing method, system and medium based on generation countermeasure network
Zhang et al. A new JPEG image steganalysis technique combining rich model features and convolutional neural networks
CN112750071B (en) User-defined expression making method and system
US11361189B2 (en) Image generation method and computing device
CN112786003A (en) Speech synthesis model training method and device, terminal equipment and storage medium
CN113689527A (en) Training method of face conversion model and face image conversion method
CN111539263B (en) Video face recognition method based on aggregation countermeasure network
Zhong et al. Target aware network adaptation for efficient representation learning
CN107403145A (en) Image characteristic points positioning method and device
CN117291252B (en) Stable video generation model training method, generation method, equipment and storage medium
CN111291602A (en) Video detection method and device, electronic equipment and computer readable storage medium
CN116977794B (en) Digital human video identification model training method and system based on reinforcement learning
Xiong et al. Study on energy theft detection based on customers’ consumption pattern

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant