CN113011271A - Method, apparatus, device, medium, and program product for generating and processing image - Google Patents

Method, apparatus, device, medium, and program product for generating and processing image

Info

Publication number
CN113011271A
Authority
CN
China
Prior art keywords
image
face
user
images
feature representation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110204811.3A
Other languages
Chinese (zh)
Inventor
冯懋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Didi Infinity Technology and Development Co Ltd
Original Assignee
Beijing Didi Infinity Technology and Development Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Didi Infinity Technology and Development Co Ltd filed Critical Beijing Didi Infinity Technology and Development Co Ltd
Priority to CN202110204811.3A
Publication of CN113011271A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 - Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161 - Detection; Localisation; Normalisation
    • G06V40/166 - Detection; Localisation; Normalisation using acquisition arrangements
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 - Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 - Feature extraction; Face representation
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 - Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172 - Classification, e.g. identification

Abstract

According to an embodiment of the present disclosure, a method, an apparatus, a device, and a medium for generating and processing an image are provided. A method of generating an image includes determining a plurality of images about a face of a user from a video about the user. The method also includes determining a feature representation of a state of the user's face in the plurality of images by analyzing the plurality of images. The method also includes generating a second image of the face of the user based on the feature representation and a first image of the face of the user. The second image has a higher definition than the first image. In this way, clear face images can be obtained efficiently, which helps improve the effect of subsequent face recognition.

Description

Method, apparatus, device, medium, and program product for generating and processing image
Technical Field
Embodiments of the present disclosure relate generally to the field of image processing, and more particularly, to methods, apparatuses, devices, computer-readable storage media, and program products for generating and processing images.
Background
In the field of image processing, the recognition of faces in images has many uses, such as authentication and face transformation. However, in some scenes, the captured image may not be sharp enough, or may even be blurred, which degrades the effect of subsequent face recognition. Therefore, in these scenes, it is desirable to enhance images that are not sharp enough, or even blurred, in order to obtain sharp images.
Disclosure of Invention
According to an example embodiment of the present disclosure, a scheme of generating an image and processing the image is provided.
In a first aspect of the disclosure, a method of generating an image is provided. The method includes determining, from a video about a user, a plurality of images about the user's face; determining a feature representation of a state of the user's face in the plurality of images by analyzing the plurality of images; and generating a second image about the face of the user based on the feature representation and the first image about the face of the user, the second image having a higher definition than the first image.
In a second aspect of the present disclosure, a method of processing an image is provided. The method includes determining a plurality of reference images for a face of a reference user from a reference video for the reference user; determining a reference feature representation of a state of a reference user's face in the plurality of reference images by analyzing the plurality of reference images; generating a fourth image of the face of the reference user according to the image generation model based on the reference feature representation and the third image of the face of the reference user; and training the image generation model using the fourth image and a fifth image about the face of the reference user, the fifth image having a higher definition than the third image.
In a third aspect of the present disclosure, an apparatus for generating an image is provided. The apparatus includes an image determination module configured to determine a plurality of images about a face of a user from a video about the user; a feature representation determination module configured to determine a feature representation of a state of a face of a user in a plurality of images by analyzing the plurality of images; and an image generation module configured to generate a second image about the face of the user based on the feature representation and the first image about the face of the user, the second image having a higher definition than the first image.
In a fourth aspect of the present disclosure, an apparatus for processing an image is provided. The apparatus includes a reference image determination module configured to determine a plurality of reference images for a face of a reference user from a reference video for the reference user; a reference feature representation determination module configured to determine a reference feature representation of a state of a reference user's face in a plurality of reference images by analyzing the plurality of reference images; a training image generation module configured to generate a fourth image of the face of the reference user according to the image generation model based on the reference feature representation and the third image of the face of the reference user; and a model training module configured to train the image generation model using the fourth image and a fifth image about the face of the reference user, the fifth image having a higher definition than the third image.
In a fifth aspect of the present disclosure, there is provided an electronic device comprising one or more processors; and storage means for storing the one or more programs which, when executed by the one or more processors, cause the one or more processors to carry out the method according to the first aspect of the disclosure.
In a sixth aspect of the present disclosure, there is provided an electronic device comprising one or more processors; and storage means for storing the one or more programs which, when executed by the one or more processors, cause the one or more processors to carry out the method according to the second aspect of the disclosure.
In a seventh aspect of the present disclosure, a computer readable storage medium is provided, having stored thereon a computer program, which when executed by a processor, implements a method according to the first aspect of the present disclosure.
In an eighth aspect of the present disclosure, a computer-readable storage medium is provided, on which a computer program is stored which, when executed by a processor, implements a method according to the second aspect of the present disclosure.
In a ninth aspect of the present disclosure, a computer program product is provided comprising computer executable instructions, wherein the computer executable instructions, when executed by a processor, implement the method according to the first aspect of the present disclosure.
In a tenth aspect of the disclosure, a computer program product is provided comprising computer executable instructions, wherein the computer executable instructions, when executed by a processor, implement the method according to the second aspect of the disclosure.
It should be understood that the statements herein reciting aspects are not intended to limit the critical or essential features of the embodiments of the present disclosure, nor are they intended to limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The above and other features, advantages and aspects of various embodiments of the present disclosure will become more apparent by referring to the following detailed description when taken in conjunction with the accompanying drawings. In the drawings, like or similar reference characters designate like or similar elements, and wherein:
FIG. 1 illustrates a schematic diagram of an example environment in which some embodiments of the present disclosure can be implemented;
FIG. 2 illustrates a flow diagram of an example method of generating an image, according to some embodiments of the present disclosure;
FIG. 3 illustrates a schematic diagram of determining a feature representation according to some embodiments of the present disclosure;
FIG. 4 illustrates a schematic diagram of generating a sharp image according to some embodiments of the present disclosure;
FIG. 5 illustrates a schematic diagram of an example environment for training an image generation model, in accordance with some embodiments of the present disclosure;
FIG. 6 illustrates a flow diagram of an example method of processing an image according to some embodiments of the present disclosure;
FIG. 7 illustrates a schematic diagram of a training image generation model, according to some embodiments of the present disclosure;
FIG. 8 shows a schematic block diagram of an apparatus for generating an image according to some embodiments of the present disclosure;
FIG. 9 shows a schematic block diagram of an apparatus for processing an image according to some embodiments of the present disclosure; and
FIG. 10 illustrates a block diagram of a computing device capable of implementing various embodiments of the present disclosure.
Detailed Description
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the drawings, it is to be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein, but rather are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the disclosure are for illustration purposes only and are not intended to limit the scope of the disclosure.
In describing embodiments of the present disclosure, the term "include" and its derivatives should be interpreted as being inclusive, i.e., "including but not limited to". The term "based on" should be understood as "based at least in part on". The term "one embodiment" or "the embodiment" should be understood as "at least one embodiment". The terms "first," "second," and the like may refer to different or the same objects. Other explicit and implicit definitions are also possible below.
As used herein, the term "model" may learn from training data the associations between respective inputs and outputs, such that after training is complete, a given input is processed based on a trained set of parameters to generate a corresponding output. The "model" may also sometimes be referred to as a "neural network", "learning model", "learning network", or "network". These terms are used interchangeably herein.
As used herein, a "neural network" is capable of processing an input and providing a corresponding output, which generally includes an input layer and an output layer and one or more hidden layers between the input layer and the output layer. Neural networks used in deep learning applications typically include many hidden layers, extending the depth of the network. The layers of the neural network are connected in sequence such that the output of a previous layer is provided as the input of a subsequent layer, wherein the input layer receives the input of the neural network and the output of the output layer is the final output of the neural network. Each layer of the neural network includes one or more nodes (also referred to as processing nodes or neurons), each node processing an input from a previous layer. Convolutional Neural Networks (CNN) are a type of neural network that includes one or more convolutional layers for performing convolutional operations on respective inputs. CNNs can be used in a variety of scenarios, particularly suitable for processing image or video data.
As mentioned briefly above, the images used for face recognition may be insufficiently sharp or even blurred, adversely affecting the effectiveness of face recognition. Images for face recognition are commonly captured under visible light, but in some scenes (e.g., at night or in places with insufficient light), it is necessary to acquire images for face recognition using an infrared imaging device. For example, a ride-hailing management platform may need to obtain an image of the driver currently driving a ride-hailing vehicle to verify whether he or she is a registered driver of that vehicle. During nighttime driving, it may be necessary to capture the image of the driver using an infrared imaging device.
Due to the difference in imaging modes, an image acquired by an infrared imaging device differs significantly from an image obtained under visible light. In addition, owing to limitations of the infrared imaging apparatus itself, the quality of the acquired image is generally significantly lower than that of an image captured under visible light. Furthermore, the user may move during the acquisition of the image, which also makes the acquired image less sharp. These factors cause images acquired in an infrared scene to be insufficiently sharp or even blurred, which greatly interferes with face recognition in the infrared scene. Therefore, it is desirable to enhance images that are not sharp enough, or even blurred, to improve the face recognition effect in infrared scenes.
Conventional image enhancement schemes include super-resolution schemes, face image generation schemes, face property editing schemes, and the like. These conventional solutions are mainly directed to visible light scenes and do not take into account the features of infrared imaging.
However, images in an infrared scene differ significantly from images in a visible light scene. Because the principle of infrared imaging differs from that of visible light imaging, faces that appear similar in color under visible light can be imaged very differently in an infrared scene depending on their distance from the imaging device. In view of this, if a data enhancement scheme designed for visible light scenes is applied directly to images in an infrared scene, a good enhancement effect cannot be achieved; for example, a blurred image cannot be converted into a sharp image.
According to an embodiment of the present disclosure, a solution for generating an image is proposed, which aims to solve one or more of the above-mentioned problems and other potential problems. In this scheme, a plurality of images about the face of the user are determined from the video about the user. By analyzing the images, a characteristic representation of the state of the user's face in the images is determined. Based on the feature representation and the blurred image about the user's face, a sharp image about the user's face is generated. The generated sharp image can be subsequently used for face recognition.
Utilizing the state of the user's face in the plurality of images may assist in generating a sharp image based on the blurred image. In this way, clear face images can be obtained efficiently, and the effect of subsequent face recognition is improved. The embodiment of the disclosure is suitable for image enhancement under various illumination scenes, and is particularly suitable for image enhancement under infrared scenes.
In order to more clearly understand the scheme of generating an image of the embodiments of the present disclosure, the embodiments of the present disclosure will be further described with reference to the accompanying drawings. Fig. 1 illustrates a schematic diagram of an example environment 100 in which some embodiments of the present disclosure can be implemented. In general, the example environment 100 includes a computing device 102, a video 105 about a user, an input image 110 (also referred to as a "first image") about the user's face, and an output image 120 (also referred to as a "second image") about the user's face.
The video 105 about the user may be captured by any suitable imaging device. In some embodiments, the imaging device may be disposed on a vehicle (e.g., a ride-hailing vehicle). In this case, the user is the driver of the vehicle. For example, a ride-hailing vehicle may be equipped with an in-vehicle system having a face recognition function, so that a driver can log in without using a mobile phone. The face recognition function may capture an image of the driver currently driving the vehicle to verify whether he or she is a registered driver. The in-vehicle system may include an imaging device for capturing the image of the driver.
Alternatively, in some embodiments, the imaging device may be disposed on or near a machine. In this case, the user is an operator of the machine. For example, the machine may have a face recognition function to verify whether the person currently operating the machine is a registered operator or has the right to use the machine.
The imaging device used to capture the video 105 may be any type of device. In some embodiments, the imaging device may be an infrared imaging device, such as an infrared imager disposed on a ride-hailing vehicle. In some embodiments, the imaging device may be a visible light imaging device, such as a camera.
The input image 110 may not be sharp enough or even blurred. The input image 110 and the video 105 may be captured by the same imaging device. In some embodiments, the input image 110 may be determined from the video 105, as will be described below.
Computing device 102 may be any device with computing capabilities. By way of non-limiting example, the computing device 102 may be any type of stationary, mobile, or portable computing device, including but not limited to a desktop computer, laptop computer, notebook computer, netbook computer, tablet computer, multimedia computer, mobile phone, and the like.
The computing device 102 can communicate with the imaging device used to capture the video 105 in order to receive the video 105 from the imaging device. In embodiments where the imaging device is disposed on a vehicle, the computing device 102 may communicate with the in-vehicle system on the vehicle to receive the video 105. In this case, the computing device 102 may be, for example, a back-end server of a ride-hailing platform. Alternatively, the computing device 102 may be part of the in-vehicle system on the vehicle.
The computing device 102 generates an output image 120 of the face of the user based on the video 105 of the user and the input image 110 of the face of the user. As shown in fig. 1, the output image 120 has a higher definition than the input image 110.
The generation of the output image 120 is described below with reference to fig. 2. Fig. 2 illustrates a flow diagram of an example method 200 of generating an image, according to some embodiments of the present disclosure. The method 200 may be implemented by the computing device 102 of fig. 1. For ease of discussion, the method 200 will be described in conjunction with FIG. 1.
At block 210, the computing device 102 determines a plurality of images 130-1, 130-2, 130-3, 130-4, and 130-5, which may also be collectively referred to as the plurality of images 130 or individually as the images 130, about the user's face from the video 105 about the user. It should be understood that the number of the plurality of images 130 shown in fig. 1 is merely exemplary, and is not intended to limit the scope of the present disclosure. In embodiments of the present disclosure, a greater or lesser number of images 130 may be determined from the video 105. The image 130 may be a complete frame of the video 105 or a portion of a frame of the video 105, for example, by cropping the complete frame.
The plurality of images 130 may be a plurality of frames in the video 105 arranged in a chronological order. In some embodiments, the plurality of images 130 may be a plurality of contiguous adjacent frames in the video 105. For example, the plurality of images 130 may be the ith, i +1 th, i +2 th, i +3 th, and i +4 th frames, respectively, of the video 105, where i is a positive integer. Alternatively, in some embodiments, the plurality of images 130 may be a plurality of frames spaced apart in the video 105. For example, the plurality of images 130 may be a jth frame, a jth +2 frame, a jth +4 frame, a jth +6 frame, and a jth +8 frame of the video 105, respectively, where j is a positive integer. Alternatively, in other embodiments, the plurality of images 130 may be a plurality of frames randomly selected from the video 105.
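As a non-limiting illustration, the three frame-selection strategies described above could be sketched as follows using OpenCV. The function and parameter names (select_frames, num_images, spacing) are illustrative only and are not prescribed by this disclosure.

```python
# Minimal sketch of selecting the plurality of images 130 from the video 105.
import random
import cv2

def select_frames(video_path, num_images=5, strategy="consecutive", start=0, spacing=2):
    """Return `num_images` frames from the video as face-image candidates."""
    cap = cv2.VideoCapture(video_path)
    frames = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        frames.append(frame)
    cap.release()

    if strategy == "consecutive":      # frames i, i+1, ..., i+4
        indices = range(start, start + num_images)
    elif strategy == "spaced":         # frames j, j+2, j+4, ...
        indices = range(start, start + num_images * spacing, spacing)
    else:                              # frames selected at random
        indices = sorted(random.sample(range(len(frames)), num_images))
    return [frames[i] for i in indices]
```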
At block 220, the computing device 102 determines a feature representation of a state of the user's face in the plurality of images 130 by analyzing the plurality of images 130. In particular, the feature representation of the state may be implemented as a face state code. A single value or a combination of values in the face state code may represent the state of a dimension, such as the size of the face, the position of the face, the orientation of the face, occlusion conditions, etc.
In some embodiments, the computing device 102 may determine the location of multiple components of the user's face in each image 130. The positions of the components determined for the plurality of images 130, respectively, may be encoded into a characteristic representation of the state. For example, the computing device 102 may perform keypoint detection on each image 130 to determine the location of the mouth, eyes, nose, eyebrows, etc., in each image 130. Further, the respective locations of these sites in the plurality of images 130 may be combined into a characteristic representation of the state.
In some embodiments, the computing device 102 may determine the feature representation of the state based on both static and dynamic properties of the user's face. Static attributes relate to the state of the face itself that is not affected by external factors. The static attributes may include the size, location of the user's face in the image 130. The static attributes may also include the size and location of components of the face in the image 130, such as the size and location of the mouth, eyes, nose, eyebrows, etc., in the image 130.
The dynamic attributes relate to states of the face that are susceptible to external factors (e.g., changes in the surrounding environment, movement of the user, etc.). The dynamic attributes may include the orientation of the user's face in the image 130, such as whether the face is tilted to the left or right, or rotated backward, in the image 130. The orientation may be represented by the offset angles of the central axis of the face with respect to the x, y, and z coordinate axes. Alternatively or additionally, the dynamic attributes may include the condition in which the user's face is occluded in the image 130. The occlusion condition may be classified as no occlusion, upper occlusion (e.g., the user may be wearing sunglasses), lower occlusion (e.g., the user may be wearing a mask), and the like. Alternatively or additionally, the dynamic attributes may include the brightness, i.e., the degree of darkness, of the user's face in the image 130. The determined brightness may be represented by a predetermined level. Alternatively or additionally, the dynamic attributes may include the sharpness of the user's face in the image 130. For example, the sharpness of the user's face may differ in different images 130 because the user may be moving or the user's head may be rotating.
An embodiment of determining a feature representation based on both static and dynamic attributes is described below with reference to FIG. 3. Fig. 3 illustrates a schematic diagram 300 of determining a feature representation according to some embodiments of the present disclosure. The computing device 102 performs face detection on each of the plurality of images 130 to obtain a face detection result 310. The computing device 102 may utilize any suitable face detection model or algorithm for face detection; embodiments of the present disclosure are not limited in this respect.
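As a non-limiting illustration, the face detection step that produces the detection result 310 could be sketched as follows. OpenCV's Haar cascade detector is used here only because it is readily available; the disclosure allows any face detection model or algorithm, and the assumption that the largest detected box belongs to the driver is illustrative.

```python
# Minimal sketch of running face detection on each selected image.
import cv2

_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def detect_faces(images):
    """Return, for each image, the (x, y, w, h) box of the largest detected face."""
    results = []
    for img in images:
        gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
        boxes = _cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
        # Keep the largest box, assuming the user's face dominates the frame.
        results.append(max(boxes, key=lambda b: b[2] * b[3]) if len(boxes) else None)
    return results
```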
After obtaining the face detection result 310, the subsequent processing may be divided into two branches. In the left branch as shown in fig. 3, the computing device 102 determines a dynamic attribute 320 of the user's face for each image 130. For example, the computing device 102 may determine one or more of the orientation, occluded condition, brightness, clarity described above. Dynamic attributes 320 may represent an estimate of the dynamic characteristics of the face. In some embodiments, the dynamic attributes 320 may be further processed to estimate the motion profile of the face. For example, it may be determined in which direction the face is skewed or rotated based on differences in the orientation of the user's face between the plurality of images 130. As another example, it may be determined whether the user's face is near or far from an imaging device (e.g., an infrared imaging device) based on differences in brightness of the user's face among the plurality of images 130.
In the right branch as shown in fig. 3, the computing device 102 determines a static attribute 330 of the user's face for each image 130. For example, the computing device 102 may determine the size and location of the user's face in each image 130. Alternatively or additionally, the computing device 102 may determine a size and location of a component of the face in the image 130, such as a size and location of a region of the mouth, eyes, nose, eyebrows, etc., in the image 130.
In some embodiments, the input image 110 may be selected from a plurality of images 130. For example, where the plurality of images 130 are consecutive adjacent frames in the video 105, the input image 110 may be an intermediate frame in the plurality of images 130, such as image 130-3. As another example, the input image 110 may be selected from the plurality of images 130 based on the sharpness of the face in the dynamic property 320. Thus, the input image 110 may be the image with the highest definition of the face among the plurality of images 130.
In such embodiments, the static properties of the face in the input image 110 may be determined based on the static properties of the face in the images of the plurality of images 130 other than the input image 110. The face detection results 310 may be used to initially determine static attributes of the face in the input image 110, such as the size and location of the face. The static attributes of the faces in the other images 130 may then be utilized to modify the static attributes initially determined for the input image 110. For example, where the input image 110 is the image 130-3, interpolation may be utilized to correct the static properties of the face in the image 130-3 based on the static properties of the face in the images 130-1, 130-2, 130-4, and 130-5. As another example, where the input image 110 is the image 130-5, the static properties of the face in the image 130-5 may be modified using extrapolation based on the static properties of the face in the images 130-1, 130-2, 130-3, and 130-4. In this way, the corrected static properties are more accurate, thereby helping to ensure that the final sharp image is accurate.
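As a non-limiting illustration, correcting one static attribute of the input image from the other frames could be sketched as follows, assuming each static attribute is a scalar measured per frame. numpy.interp is used as one simple interpolation choice; note that it clamps to the nearest known value at the edges of the sequence, which stands in for the extrapolation case, and the example values are hypothetical.

```python
# Minimal sketch of interpolation-based correction of a static attribute.
import numpy as np

def correct_static_attribute(values, target_index):
    """values: per-frame measurements of one attribute (e.g. face width);
    target_index: position of the input image within the frame sequence."""
    indices = np.arange(len(values), dtype=float)
    known = np.delete(indices, target_index)
    known_vals = np.delete(np.asarray(values, dtype=float), target_index)
    return float(np.interp(indices[target_index], known, known_vals))

# Example: smooth the face width detected in the middle frame (index 2).
widths = [112.0, 114.0, 90.0, 118.0, 120.0]   # the middle measurement looks off
print(correct_static_attribute(widths, 2))     # 116.0
```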
After obtaining the dynamic attributes 320 (and optionally the motion profile) and the static attributes 330, the computing device 102 combines the dynamic attributes 320 (and optionally the motion profile) and the static attributes 330 determined separately for the plurality of images 130 into a feature representation 340. For example, the computing device 102 may generate a feature vector having a plurality of dimensions. A single dimension or a combination of dimensions of a feature vector may represent a dynamic or static attribute.
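As a non-limiting illustration, combining the per-image static and dynamic attributes into a single feature vector (the face state code) could be sketched as follows. The attribute names, ordering, and encodings are assumptions made for illustration; any fixed encoding would serve the same purpose.

```python
# Minimal sketch of encoding the face state code from per-image attributes.
import numpy as np

def encode_face_state(per_frame_attrs):
    """per_frame_attrs: one dict per image, e.g.
    {"size": (w, h), "position": (x, y), "yaw": deg, "pitch": deg, "roll": deg,
     "occlusion": 0/1/2, "brightness": level, "sharpness": s}"""
    vector = []
    for attrs in per_frame_attrs:
        vector.extend(attrs["size"])                                  # static: face size
        vector.extend(attrs["position"])                              # static: face position
        vector.extend((attrs["yaw"], attrs["pitch"], attrs["roll"]))  # dynamic: orientation
        vector.append(attrs["occlusion"])                             # dynamic: 0=none, 1=upper, 2=lower
        vector.append(attrs["brightness"])                            # dynamic: darkness level
        vector.append(attrs["sharpness"])                             # dynamic: definition of the face
    return np.asarray(vector, dtype=np.float32)   # e.g. 5 frames x 12 values = 60 dimensions
```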
In the embodiment described with reference to FIG. 3, the characteristic representation of the state is determined based on both static and dynamic attributes. The feature representation thus determined can comprehensively reflect the state of the face of the user. This is beneficial for obtaining a final sharp image.
Reference is made back to fig. 2. At block 230, the computing device 102 generates a second image, i.e., the output image 120, about the user's face based on the feature representation (e.g., the feature representation 340 in fig. 3) and the first image (i.e., the input image 110) about the user's face. The output image 120 has a higher definition than the input image 110.
In some embodiments, the computing device 102 may utilize the determined feature representation to modify the input image 110 to obtain a sharp output image 120. For example, the determined feature representation may be used to correct blurred pixels in the input image 110.
In some embodiments, the computing device 102 may utilize an image generation model to generate the output image 120. Refer to fig. 4. Fig. 4 illustrates a schematic diagram 400 showing the generation of a sharp image according to some embodiments of the present disclosure. As shown in FIG. 4, an output image 120 may be generated from an image generation model 410 with the feature representation 340 and the input image 110 as inputs. The image generation model 410 is trained to convert an input image having a first definition to an output image having a second definition, and the second definition is higher than the first definition.
The image generation model 410 may be based on any suitable network, such as a variational auto-encoder (VAE). In some embodiments, the image generation model 410 may be based on a generative adversarial network (GAN). Compared with other types of networks, a GAN can facilitate the generation of clearer and more realistic images.
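As a non-limiting illustration, a conditional generator that takes the blurred input image together with the face state code and outputs a higher-definition image could be sketched in PyTorch as follows. The layer sizes and the way the state code is broadcast onto the image are assumptions; the disclosure only requires that the model map the feature representation and the first image to a second, clearer image.

```python
# Minimal sketch of an image generation model conditioned on the face state code.
import torch
import torch.nn as nn

class FaceEnhanceGenerator(nn.Module):
    def __init__(self, state_dim, channels=1):
        super().__init__()
        # Project the face state code and tile it as extra input channels.
        self.state_proj = nn.Linear(state_dim, 16)
        self.net = nn.Sequential(
            nn.Conv2d(channels + 16, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, channels, 3, padding=1), nn.Tanh(),
        )

    def forward(self, blurred, state_code):
        b, _, h, w = blurred.shape
        cond = self.state_proj(state_code)                # (b, 16)
        cond = cond.view(b, 16, 1, 1).expand(b, 16, h, w)
        return self.net(torch.cat([blurred, cond], dim=1))

# Usage (hypothetical): generator = FaceEnhanceGenerator(state_dim=60)
# sharp = generator(blurred_batch, state_batch)   # blurred_batch: (b, 1, H, W)
```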
In some embodiments, the computing device 102 may also utilize the generated output image 120 for face recognition. The result of the face recognition can be used for identity verification and the like. For example, in embodiments where the video 105 is captured by an imaging device on a ride-hailing vehicle, the result of face recognition may be used to verify whether the person currently driving the vehicle is a registered driver of that vehicle.
As can be seen from the above description, using the state of the user's face in multiple images can assist in generating a sharp image based on a blurred image. In this way, clear face images can be obtained efficiently, and the effect of subsequent face recognition is improved. The embodiment of the disclosure is suitable for image enhancement under various illumination scenes, and is particularly suitable for image enhancement under infrared scenes.
Training of the image generation model 410 is described below with reference to fig. 5 to 7. FIG. 5 illustrates a schematic diagram of an example environment 500 for training the image generation model 410, according to some embodiments of the present disclosure. In general, the example environment 500 includes training data 501, a computing device 502, and an image generation model 410.
The training data 501 includes a reference video 505 for a reference user. The imaging device used to capture the reference video 505 may be of the same type as the imaging device used to capture the video 105. For example, where the image generation model 410 is to be used for image enhancement in an infrared scene, the reference video 505 may be captured by an infrared imaging device. During the capture of the reference video 505, blurring may be introduced in a number of ways to facilitate training of the image generation model 410. For example, during the capture of the reference video 505, the reference user, and in particular the head of the reference user, may be in motion.
The training data 501 also includes a training input image 510, also referred to as a third image, for the reference user's face. The training input image 510 is also blurred. In some embodiments, the training input images 510 may be determined from the reference video 505, similar to the determination of the input images 110 described above.
The training data 501 also includes a clear image 550, also referred to as a fifth image, about the face of the reference user. The clear image 550 has a higher definition than the reference video 505 and the training input image 510. The clear image 550 may be a still image of the reference user captured by the imaging device. Alternatively, the clear image 550 may be taken from a video captured by the imaging device in which the reference user, and in particular the head of the reference user, is not moving.
Computing device 502 may be any device with computing capabilities. By way of non-limiting example, the computing device 502 may be any type of stationary, mobile, or portable computing device, including but not limited to a desktop computer, laptop computer, notebook computer, netbook computer, tablet computer, multimedia computer, mobile phone, and the like. Computing device 502 may be the same as or different from computing device 102. The computing device 502 trains the image generation model 410 using the training data 501.
Fig. 6 illustrates a flowchart of an example method 600 of processing an image according to some embodiments of the present disclosure. The method 600 may be implemented by the computing device 502 of FIG. 5. For ease of discussion, the method 600 will be described in conjunction with FIG. 5.
At block 610, the computing device 502 determines a plurality of reference images 530-1, 530-2, 530-3, 530-4, and 530-5, which may also be collectively referred to as the plurality of reference images 530 or individually as the reference image 530, for the reference user's face from the reference video 505 for the reference user. It should be understood that the number of the plurality of reference images 530 shown in fig. 5 is merely exemplary, and is not intended to limit the scope of the present disclosure.
In addition, the number of the plurality of reference images 530 is the same as the number of the plurality of images 130. The manner in which the reference images 530 are determined from the reference video 505 may be the same as the manner in which the images 130 are determined from the video 105. For example, where the plurality of images 130 are consecutive adjacent frames in the video 105, the plurality of reference images 530 are consecutive adjacent frames in the reference video 505.
At block 620, the computing device 502 determines a reference feature representation of a state of the reference user's face in the plurality of reference images 530 by analyzing the plurality of reference images 530. The determination of the reference signature is the same as that described above with respect to block 220 and therefore is not described in detail.
At block 630, the computing device 502 generates a fourth image, also referred to as a training output image, for the face of the reference user according to the image generation model 410 based on the reference feature representation and the third image (i.e., the training input image 510) for the face of the reference user.
At block 640, the computing device 502 trains the image generation model 410 using the fourth image and the fifth image (i.e., the sharp image 550) for the reference user's face. The image generation model 410 may be trained in different ways depending on the particular network employed by the image generation model 410. For example, in some embodiments, the image generation model 410 may be trained in a supervised learning manner with the sharp image 550 as a true value.
In embodiments where the image generation model 410 is based on GAN, an image discrimination model, i.e., a discriminator, also needs to be established in the training. The training of the GAN-based image generation model 410 is described below with reference to fig. 7. FIG. 7 illustrates a schematic diagram 700 of training the image generation model 410, according to some embodiments of the present disclosure.
The computing device 502 determines a reference feature representation 710 of a state of a reference user's face in the plurality of reference images 530 based on analyzing the plurality of reference images 530. Subsequently, the reference feature representation 710 and the training input image 510 are input into the image generation model 410 together to generate a training output image 720 (i.e., a fourth image).
The training output image 720 is input to the image discrimination model 730 together with the sharp image 550. The image discrimination model 730 is configured to determine the difference between the training output image 720 and the sharp image 550 as a discrimination result 740. For example, the image discrimination model 730 may be implemented as a classifier, which may classify the training output image 720 as a sharp image or a blurred image.
As indicated by arrows 750 and 760, the discrimination result 740 may be propagated back to the image generation model 410 and the image discrimination model 730 to train the image generation model 410 and the image discrimination model 730. The image generation model 410 and the image discrimination model 730 may be trained iteratively until the trained image discrimination model 730 is unable to distinguish between an image generated by the trained image generation model 410 and the sharp image 550. In other words, the difference between the training output image 720 generated by the trained image generation model 410 and the sharp image 550 is less than a threshold.
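As a non-limiting illustration, one training step of the adversarial scheme described above could be sketched in PyTorch as follows: the discriminator learns to tell the training output image 720 from the clear image 550, and its feedback is back-propagated to both models. The discriminator architecture, loss, and optimizer settings are illustrative assumptions and are not taken from the disclosure.

```python
# Minimal sketch of jointly training the image generation and discrimination models.
import torch
import torch.nn as nn

def train_step(generator, discriminator, opt_g, opt_d,
               blurred, state_code, clear, bce=nn.BCEWithLogitsLoss()):
    # 1. Discriminator step: clear image -> real (1), generated image -> fake (0).
    fake = generator(blurred, state_code).detach()
    real_logits = discriminator(clear)
    fake_logits = discriminator(fake)
    d_loss = (bce(real_logits, torch.ones_like(real_logits)) +
              bce(fake_logits, torch.zeros_like(fake_logits)))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # 2. Generator step: make the discriminator label the generated output as real.
    fake = generator(blurred, state_code)
    fake_logits = discriminator(fake)
    g_loss = bce(fake_logits, torch.ones_like(fake_logits))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    return d_loss.item(), g_loss.item()
```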
The embodiment of the disclosure also provides a corresponding device for realizing the method. Fig. 8 illustrates a schematic block diagram of an apparatus 800 for generating an image according to some embodiments of the present disclosure. The apparatus 800 may be included in the computing device 102 of fig. 1.
As shown in fig. 8, the apparatus 800 includes an image determination module 810 configured to determine a plurality of images for a face of a user from a video for the user. The apparatus 800 further comprises a feature representation determination module 820 configured to determine a feature representation of a state of the user's face in the plurality of images by analyzing the plurality of images. The apparatus 800 further comprises an image generation module 830 configured to generate a second image of the face of the user based on the feature representation and the first image of the face of the user. The second image has a higher definition than the first image.
In some embodiments, the image generation module 830 comprises: a model application module configured to generate a second image from an image generation model based on the feature representation and the first image, the image generation model being trained to convert an input image having a first definition into an output image having a second definition and the second definition being higher than the first definition.
In some embodiments, the image generation model is based on a generative adversarial network (GAN).
In some embodiments, the feature representation determination module 820 includes: an attribute determination module configured to determine a static attribute and a dynamic attribute of a face of a user in each of a plurality of images; and an attribute combining module configured to combine the static attributes and the dynamic attributes determined separately for the plurality of images into a feature representation.
In some embodiments, the first image is selected from a plurality of images. The attribute determination module includes: an attribute modification module configured to determine a static attribute of the face of the user in the first image based on the static attribute of the face of the user in other images than the first image among the plurality of images.
In some embodiments, the static attributes include at least one of: a size of the user's face, or a position of the user's face. The dynamic attributes include at least one of: an orientation of the user's face, a condition that the user's face is occluded, or a brightness of the user's face.
In some embodiments, the video is captured by an infrared imaging device.
Fig. 9 shows a schematic block diagram of an apparatus 900 for processing an image according to some embodiments of the present disclosure. Apparatus 900 may be included in computing device 502 of fig. 5.
As shown in fig. 9, the apparatus 900 includes a reference image determination module 910 configured to determine a plurality of reference images for a face of a reference user from a reference video for the reference user. The apparatus 900 further comprises a reference feature representation determining module 920 configured to determine a reference feature representation of a state of a reference user's face in the plurality of reference images by analyzing the plurality of reference images. The apparatus 900 further comprises a training image generation module 930 configured to generate a fourth image on the face of the reference user according to the image generation model based on the reference feature representation and the third image on the face of the reference user. The apparatus 900 further includes a model training module 940 configured to train the image generation model using the fourth image and the fifth image with respect to the face of the reference user. The fifth image has a higher definition than the third image.
In some embodiments, the model training module 940 includes: an adversarial network training module configured to train the image generation model and the image discrimination model using the fourth image and the fifth image such that a difference between the fourth image and the fifth image, determined according to the trained image discrimination model, is less than a threshold.
In some embodiments, the reference feature representation determination module 920 includes: a reference attribute determination module configured to determine a static attribute and a dynamic attribute of the reference user's face in each of the plurality of reference images; and a reference attribute combining module configured to combine the static attributes and the dynamic attributes determined separately for the plurality of reference images into the reference feature representation.
In some embodiments, the third image is selected from a plurality of reference images, and wherein the reference property determination module comprises: a reference attribute modification module configured to determine a static attribute of the face of the reference user in the third image based on the static attribute of the face of the reference user in other reference images than the third image among the plurality of reference images.
In some embodiments, the reference video is captured by an infrared imaging device and the fifth image is a still image captured by the infrared imaging device.
Fig. 10 illustrates a schematic block diagram of an example device 1000 that can be used to implement embodiments of the present disclosure. Device 1000 can be used to implement computing device 102 of fig. 1 or computing device 502 of fig. 5. As shown, device 1000 includes a Central Processing Unit (CPU) 1001 that can perform various appropriate actions and processes according to computer program instructions stored in a Read Only Memory (ROM) 1002 or computer program instructions loaded from a storage unit 1008 into a Random Access Memory (RAM) 1003. In the RAM 1003, various programs and data necessary for the operation of the device 1000 can also be stored. The CPU 1001, ROM 1002, and RAM 1003 are connected to each other via a bus 1004. An input/output (I/O) interface 1005 is also connected to bus 1004.
A number of components in device 1000 are connected to I/O interface 1005, including: an input unit 1006 such as a keyboard, a mouse, and the like; an output unit 1007 such as various types of displays, speakers, and the like; a storage unit 1008 such as a magnetic disk, an optical disk, or the like; and a communication unit 1009 such as a network card, a modem, a wireless communication transceiver, or the like. The communication unit 1009 allows the device 1000 to exchange information/data with other devices through a computer network such as the internet and/or various telecommunication networks.
The processing unit 1001 performs the various methods and processes described above, such as any of the methods 200 and 600. For example, in some embodiments, either of methods 200 and 600 may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as storage unit 1008. In some embodiments, part or all of the computer program may be loaded and/or installed onto device 1000 via ROM 1002 and/or communications unit 1009. When the computer program is loaded into RAM 1003 and executed by CPU 1001, one or more steps of any of methods 200 and 600 described above may be performed. Alternatively, in other embodiments, the CPU 1001 may be configured to perform any of the methods 200 and 600 by any other suitable means (e.g., by way of firmware).
The present disclosure may be methods, apparatus, systems, and/or computer program products. The computer program product may include a computer-readable storage medium having computer-readable program instructions embodied thereon for carrying out various aspects of the present disclosure.
The computer readable storage medium may be a tangible device that can hold and store the instructions for use by the instruction execution device. The computer readable storage medium may be, for example, but not limited to, an electronic memory device, a magnetic memory device, an optical memory device, an electromagnetic memory device, a semiconductor memory device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a Static Random Access Memory (SRAM), a portable compact disc read-only memory (CD-ROM), a Digital Versatile Disc (DVD), a memory stick, a floppy disk, a mechanical coding device, such as punch cards or in-groove projection structures having instructions stored thereon, and any suitable combination of the foregoing. Computer-readable storage media as used herein is not to be construed as transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission medium (e.g., optical pulses through a fiber optic cable), or electrical signals transmitted through electrical wires.
The computer-readable program instructions described herein may be downloaded from a computer-readable storage medium to a respective computing/processing device, or to an external computer or external storage device via a network, such as the internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, fiber optic transmission, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers. The network adapter card or network interface in each computing/processing device receives computer-readable program instructions from the network and forwards the computer-readable program instructions for storage in a computer-readable storage medium in the respective computing/processing device.
The computer program instructions for carrying out operations of the present disclosure may be assembler instructions, Instruction Set Architecture (ISA) instructions, machine-related instructions, microcode, firmware instructions, state-setting data, or source or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk and C++, as well as conventional procedural programming languages such as the "C" programming language or similar programming languages. The computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, electronic circuitry, such as a programmable logic circuit, a Field Programmable Gate Array (FPGA), or a Programmable Logic Array (PLA), can be personalized by utilizing state information of the computer-readable program instructions, and the electronic circuitry can execute the computer-readable program instructions, thereby implementing aspects of the present disclosure.
Various aspects of the present disclosure are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the disclosure. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable program instructions.
These computer-readable program instructions may be provided to a processing unit of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processing unit of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer-readable program instructions may also be stored in a computer-readable storage medium that can direct a computer, programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer-readable medium storing the instructions comprises an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer, other programmable apparatus or other devices implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The foregoing description of the embodiments of the present disclosure has been presented for purposes of illustration and description, and is not intended to be exhaustive or limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein is chosen in order to best explain the principles of the embodiments, the practical application, or improvements made to the technology in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.
The embodiment of the application discloses:
ts1. a method of generating an image, comprising:
determining, from a video about a user, a plurality of images about a face of the user;
determining a feature representation of a state of the user's face in the plurality of images by analyzing the plurality of images; and
generating a second image about the face of the user based on the feature representation and the first image about the face of the user, the second image having a higher definition than the first image.
Ts2. the method of TS1, wherein generating the second image comprises:
based on the feature representation and the first image, generating the second image according to an image generation model trained to convert an input image having a first definition into an output image having a second definition and the second definition being higher than the first definition.
Ts3. the method according to TS2, wherein the image generation model is based on a generative adversarial network (GAN).
Ts4. the method of TS1, wherein determining the feature representation comprises:
determining a static attribute and a dynamic attribute of the user's face in each of the plurality of images; and
combining the static properties and the dynamic properties determined separately for the plurality of images into the feature representation.
Ts5. the method of TS4, wherein the first image is selected from the plurality of images, and wherein determining static properties of the user's face in each image of the plurality of images comprises:
determining a static attribute of the user's face in the first image based on the static attribute of the user's face in other images of the plurality of images than the first image.
Ts6. the method according to TS4, wherein the static properties comprise at least one of:
a size of the face of the user, or
a position of the face of the user; and
wherein the dynamic properties include at least one of:
an orientation of the face of the user,
a condition that the face of the user is occluded, or
a brightness of the face of the user.
Ts7. the method of TS1, wherein the video is captured by an infrared imaging device.
Ts8. a method of processing an image, comprising:
determining a plurality of reference images for a face of a reference user from a reference video for the reference user;
determining a reference feature representation of a state of the reference user's face in the plurality of reference images by analyzing the plurality of reference images;
generating a fourth image of the face of the reference user according to an image generation model based on the reference feature representation and a third image of the face of the reference user; and
training the image generation model using the fourth image and a fifth image of the reference user's face, the fifth image having a higher definition than the third image.
Ts9. the method of TS8, wherein training the image generation model comprises:
training the image generation model and an image discrimination model using the fourth image and the fifth image such that a difference between the fourth image and the fifth image determined from the trained image discrimination model is less than a threshold.
Ts10. the method according to TS9, wherein determining the reference feature representation comprises:
determining a static attribute and a dynamic attribute of the reference user's face in each of the plurality of reference images; and
combining the static and dynamic properties determined separately for the plurality of reference images into the reference feature representation.
Ts11. the method of TS10, wherein the third image is selected from the plurality of reference images, and wherein determining a static property of the reference user's face in each of the plurality of reference images comprises:
determining the static attribute of the reference user's face in the third image based on the static attribute of the reference user's face in other of the plurality of reference images than the third image.
Ts12. the method of TS8, wherein the reference video is captured by an infrared imaging device and the fifth image is a still image captured by the infrared imaging device.
Ts13. an apparatus for generating an image, comprising:
an image determination module configured to determine a plurality of images about a face of a user from a video about the user;
a feature representation determination module configured to determine a feature representation of a state of the user's face in the plurality of images by analyzing the plurality of images; and
an image generation module configured to generate a second image about the face of the user based on the feature representation and a first image about the face of the user, the second image having a higher definition than the first image.
Ts14. the apparatus of TS13, wherein the image generation module comprises:
a model application module configured to generate the second image from an image generation model based on the feature representation and the first image, the image generation model being trained to convert an input image having a first definition into an output image having a second definition, the second definition being higher than the first definition.
Ts15. the apparatus according to TS14, wherein the image generation model is based on a generative adversarial network (GAN).
Ts16. the apparatus according to TS13, wherein the feature representation determination module comprises:
an attribute determination module configured to determine a static attribute and a dynamic attribute of the user's face in each of the plurality of images; and
a property combination module configured to combine the static properties and the dynamic properties determined separately for the plurality of images into the feature representation.
Ts17. the apparatus according to TS16, wherein the first image is selected from the plurality of images, and wherein the attribute determination module comprises:
an attribute modification module configured to determine a static attribute of the face of the user in the first image based on the static attribute of the face of the user in the other images of the plurality of images except the first image.
Ts18. the apparatus of TS16, wherein the static attributes include at least one of:
a size of the face of the user, or
a position of the face of the user; and
wherein the dynamic attributes include at least one of:
an orientation of the face of the user,
a condition that the face of the user is occluded, or
a brightness of the face of the user.
Ts19. the apparatus of TS13, wherein the video is captured by an infrared imaging device.
Ts20. an apparatus for processing an image, comprising:
a reference image determination module configured to determine a plurality of reference images for a face of a reference user from a reference video for the reference user;
a reference feature representation determination module configured to determine a reference feature representation of a state of the reference user's face in the plurality of reference images by analyzing the plurality of reference images;
a training image generation module configured to generate a fourth image for the face of the reference user according to an image generation model based on the reference feature representation and a third image for the face of the reference user; and
a model training module configured to train the image generation model using the fourth image and a fifth image about the face of the reference user, the fifth image having a higher definition than the third image.
Ts21. the device according to TS20, wherein the model training module comprises:
a confrontation network training module configured to train the image generation model and an image discrimination model using the fourth image and the fifth image such that a difference between the fourth image and the fifth image determined from the trained image discrimination model is less than a threshold.
Ts22. the apparatus according to TS21, wherein the reference feature representation determination module comprises:
a reference attribute determination module configured to determine a static attribute and a dynamic attribute of the reference user's face in each of the plurality of reference images; and
a reference attribute combining module configured to combine the static attributes and the dynamic attributes determined separately for the plurality of reference images into the reference feature representation.
Ts23. the device according to TS22, wherein the third image is selected from the plurality of reference images, and wherein the reference attribute determination module comprises:
a reference attribute modification module configured to determine the static attribute of the reference user's face in the third image based on the static attribute of the reference user's face in other of the plurality of reference images than the third image.
Ts24. the apparatus of TS20, wherein the reference video is captured by an infrared imaging device and the fifth image is a still image captured by the infrared imaging device.
Ts25. an electronic device, the device comprising:
one or more processors; and
storage means for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to carry out the method of any one of TS 1-7.
Ts26. an electronic device, the device comprising:
one or more processors; and
storage means for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method as recited in any of TS 8-12.
Ts27. a computer-readable storage medium, on which a computer program is stored which, when executed by a processor, carries out the method according to any one of TS 1-7.
Ts28. a computer program product comprising computer executable instructions, wherein the computer executable instructions, when executed by a processor, implement the method as set forth in any one of TS 8-12.
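By way of illustration only, the following is a minimal sketch, in Python with PyTorch, of how the image generation flow of TS1-TS7 could be put into practice. The frame selection, the attribute estimator, the network layout, and all tensor shapes are assumptions made for this example and are not taken from the disclosure; any concrete embodiment may differ.

import torch
import torch.nn as nn

def select_face_images(video_frames, face_detector, num_images=5):
    # Determine a plurality of images about the user's face from the video (TS1).
    # `face_detector` is a hypothetical callable returning a cropped face tensor or None.
    faces = []
    for frame in video_frames:
        face = face_detector(frame)
        if face is not None:
            faces.append(face)
        if len(faces) == num_images:
            break
    return faces

def feature_representation(faces, attribute_estimator):
    # Combine the static attributes (e.g. face size, position) and dynamic attributes
    # (e.g. orientation, occlusion, brightness) estimated for each image into a single
    # feature vector (TS4-TS6). `attribute_estimator` is an assumed helper that returns
    # a 1-D tensor per image.
    per_image = [attribute_estimator(face) for face in faces]
    return torch.cat(per_image, dim=0)

class FaceSRGenerator(nn.Module):
    # Illustrative generator of a GAN-based image generation model (TS2-TS3):
    # maps a low-definition first image plus the feature representation to a
    # higher-definition second image.
    def __init__(self, feat_dim):
        super().__init__()
        self.feat_proj = nn.Linear(feat_dim, 64 * 64)
        self.net = nn.Sequential(
            nn.Conv2d(4, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
            nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
            nn.Conv2d(64, 3, 3, padding=1), nn.Tanh(),
        )

    def forward(self, first_image, feat):
        # first_image: (B, 3, 64, 64) low-definition face; feat: (B, feat_dim)
        cond = self.feat_proj(feat).view(-1, 1, 64, 64)
        x = torch.cat([first_image, cond], dim=1)
        return self.net(x)  # (B, 3, 128, 128) higher-definition second image

In use, the feature representation produced for the selected images would be given a batch dimension (e.g. feat.unsqueeze(0)) and passed to the generator together with the low-definition first image, which may itself be one of the selected images.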

Claims (10)

1. A method of generating an image, comprising:
determining, from a video about a user, a plurality of images about a face of the user;
determining a feature representation of a state of the user's face in the plurality of images by analyzing the plurality of images; and
generating a second image about the face of the user based on the feature representation and a first image about the face of the user, the second image having a higher definition than the first image.
2. The method of claim 1, wherein generating the second image comprises:
based on the feature representation and the first image, generating the second image according to an image generation model trained to convert an input image having a first definition into an output image having a second definition, the second definition being higher than the first definition.
3. The method of claim 1, wherein determining the feature representation comprises:
determining a static attribute and a dynamic attribute of the user's face in each of the plurality of images; and
combining the static properties and the dynamic properties determined separately for the plurality of images into the feature representation.
4. The method of claim 3, wherein the first image is selected from the plurality of images, and wherein determining a static attribute of the user's face in each of the plurality of images comprises:
determining a static attribute of the user's face in the first image based on the static attribute of the user's face in other images of the plurality of images than the first image.
5. The method of claim 1, wherein the video is captured by an infrared imaging device.
6. A method of processing an image, comprising:
determining a plurality of reference images for a face of a reference user from a reference video for the reference user;
determining a reference feature representation of a state of the reference user's face in the plurality of reference images by analyzing the plurality of reference images;
generating a fourth image of the face of the reference user according to an image generation model based on the reference feature representation and a third image of the face of the reference user; and
training the image generation model using the fourth image and a fifth image of the reference user's face, the fifth image having a higher definition than the third image.
7. An apparatus for generating an image, comprising:
an image determination module configured to determine a plurality of images about a face of a user from a video about the user;
a feature representation determination module configured to determine a feature representation of a state of the user's face in the plurality of images by analyzing the plurality of images; and
an image generation module configured to generate a second image about the face of the user based on the feature representation and a first image about the face of the user, the second image having a higher definition than the first image.
8. An apparatus for processing an image, comprising:
a reference image determination module configured to determine a plurality of reference images for a face of a reference user from a reference video for the reference user;
a reference feature representation determination module configured to determine a reference feature representation of a state of the reference user's face in the plurality of reference images by analyzing the plurality of reference images;
a training image generation module configured to generate a fourth image for the face of the reference user according to an image generation model based on the reference feature representation and a third image for the face of the reference user; and
a model training module configured to train the image generation model using the fourth image and a fifth image about the face of the reference user, the fifth image having a higher definition than the third image.
9. An electronic device, the device comprising:
one or more processors; and
storage means for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to carry out the method according to any one of claims 1-5.
10. An electronic device, the device comprising:
one or more processors; and
storage means for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to carry out the method of claim 6.
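For claim 6 (corresponding to TS8-TS12), a similarly minimal and non-authoritative training sketch is given below. It reuses the FaceSRGenerator from the sketch following TS28; the discriminator architecture, the loss terms, and the stopping criterion are illustrative assumptions rather than the disclosed implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class FaceSRDiscriminator(nn.Module):
    # Illustrative image discrimination model (TS9): scores how "real" a
    # high-definition face image looks.
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(128, 1),
        )

    def forward(self, image):
        return self.net(image)  # one raw realism score per image

def train_step(generator, discriminator, g_opt, d_opt,
               third_image, reference_feat, fifth_image, threshold=0.1):
    # One adversarial update: the generator produces the fourth image from the
    # third image and the reference feature representation; the discriminator is
    # trained to separate it from the real fifth image (TS8-TS9).
    fourth_image = generator(third_image, reference_feat)

    # Discriminator update.
    real_score = discriminator(fifth_image)
    fake_score = discriminator(fourth_image.detach())
    d_loss = (F.binary_cross_entropy_with_logits(real_score, torch.ones_like(real_score))
              + F.binary_cross_entropy_with_logits(fake_score, torch.zeros_like(fake_score)))
    d_opt.zero_grad()
    d_loss.backward()
    d_opt.step()

    # Generator update: fool the discriminator while staying close to the fifth image.
    g_score = discriminator(fourth_image)
    g_loss = (F.binary_cross_entropy_with_logits(g_score, torch.ones_like(g_score))
              + F.l1_loss(fourth_image, fifth_image))
    g_opt.zero_grad()
    g_loss.backward()
    g_opt.step()

    # "Difference" between the fourth and fifth images as judged by the trained
    # discriminator; training may stop once it falls below the threshold.
    with torch.no_grad():
        diff = (torch.sigmoid(discriminator(fifth_image))
                - torch.sigmoid(discriminator(fourth_image))).abs().mean()
    return diff.item() < threshold

Adding an L1 term alongside the adversarial loss is a common design choice for super-resolution GANs; the disclosure itself only requires that the discriminator-determined difference between the fourth and fifth images become less than a threshold.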
CN202110204811.3A 2021-02-23 2021-02-23 Method, apparatus, device, medium, and program product for generating and processing image Pending CN113011271A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110204811.3A CN113011271A (en) 2021-02-23 2021-02-23 Method, apparatus, device, medium, and program product for generating and processing image

Publications (1)

Publication Number Publication Date
CN113011271A true CN113011271A (en) 2021-06-22

Family

ID=76408997

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110204811.3A Pending CN113011271A (en) 2021-02-23 2021-02-23 Method, apparatus, device, medium, and program product for generating and processing image

Country Status (1)

Country Link
CN (1) CN113011271A (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130148898A1 (en) * 2011-12-09 2013-06-13 Viewdle Inc. Clustering objects detected in video
CN103377367A (en) * 2012-04-28 2013-10-30 中兴通讯股份有限公司 Facial image acquiring method and device
CN106875422A (en) * 2017-02-06 2017-06-20 腾讯科技(上海)有限公司 Face tracking method and device
CN107123091A (en) * 2017-04-26 2017-09-01 福建帝视信息科技有限公司 A kind of near-infrared face image super-resolution reconstruction method based on deep learning
CN108805809A (en) * 2018-05-28 2018-11-13 天津科技大学 A kind of infrared face image super-resolution rebuilding method based on generation confrontation network
CN109615582A (en) * 2018-11-30 2019-04-12 北京工业大学 A kind of face image super-resolution reconstruction method generating confrontation network based on attribute description
CN110956114A (en) * 2019-11-25 2020-04-03 展讯通信(上海)有限公司 Face living body detection method, device, detection system and storage medium
CN111028170A (en) * 2019-12-09 2020-04-17 Oppo广东移动通信有限公司 Image processing method, image processing apparatus, electronic device, and readable storage medium
CN111241925A (en) * 2019-12-30 2020-06-05 新大陆数字技术股份有限公司 Face quality evaluation method, system, electronic equipment and readable storage medium

Similar Documents

Publication Publication Date Title
US11290682B1 (en) Background modification in video conferencing
JP6789402B2 (en) Method of determining the appearance of an object in an image, equipment, equipment and storage medium
CN110197229B (en) Training method and device of image processing model and storage medium
US9232189B2 (en) Background modification in video conferencing
CN108921782B (en) Image processing method, device and storage medium
KR102230473B1 (en) Emotion recognition in video conferencing
WO2022001509A1 (en) Image optimisation method and apparatus, computer storage medium, and electronic device
US11410364B2 (en) Systems and methods for realistic head turns and face animation synthesis on mobile device
KR20220006657A (en) Remove video background using depth
WO2019187298A1 (en) Image processing system and image processing method
EP3791356B1 (en) Perspective distortion correction on faces
US20210035307A1 (en) Real Time Perspective Correction on Faces
US11915355B2 (en) Realistic head turns and face animation synthesis on mobile device
KR20210010517A (en) Posture correction
CN113015978A (en) Processing images to locate novel objects
WO2023202400A1 (en) Training method and apparatus for segmentation model, and image recognition method and apparatus
US20230394875A1 (en) Method and device for multi-dnn-based face recognition using parallel-processing pipelines
CN113011271A (en) Method, apparatus, device, medium, and program product for generating and processing image
CN112070022A (en) Face image recognition method and device, electronic equipment and computer readable medium
CN111260756A (en) Method and apparatus for transmitting information
KR102614662B1 (en) Method and apparatus for removing noise of image using pixel wise projection discriminator
US20220358722A1 (en) Automatic mesh tracking for 3d face modeling
KR20220145791A (en) Method and apparatus for multi-dnn-based face identification using pipeline for parallel processing
CN115083000A (en) Face model training method, face changing device and electronic equipment
CN112132010A (en) Low-precision three-dimensional face recognition method based on depth map quality enhancement

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination