CN109978804B - Human eye sight correction method and system based on deep learning

Info

Publication number: CN109978804B
Application number: CN201910175164.0A
Authority: CN
Inventors: 鲁继文; 周杰; 任亮亮
Original assignee: Tsinghua University
Current assignee: Tsinghua University
Priority date: 2019-03-08
Filing date: 2019-03-08
Publication date: 2021-02-26
Anticipated expiration: 2039-03-08
Also published as: CN109978804A

Abstract

The invention discloses a deep-learning-based method and system for correcting the line of sight of a human eye. The method comprises: acquiring a human eye picture; processing the human eye picture through a coarse warping network to obtain a coarse-stage generated human eye image; and detecting a defect area in the generated human eye image through a fine correction network and correcting the defect area. For an input human eye picture, the method first obtains a coarse-stage generated image using a warping-based method, and then detects defect areas in the coarse-stage output using a recurrent policy network based on deep reinforcement learning. This effectively reduces the error between the generated image and the real image, eliminates visual defects and unrealistic appearance in the image, and recovers image details such as reflective highlights.

Description

Human eye sight line correction method and system based on deep learning

Technical Field

The invention relates to the technical field of digital image processing, in particular to a human eye sight line correction method and system based on deep learning.

Background

Gaze correction is the processing of a picture of a person's eyes to change the direction in which the eyes appear to look. It has practical value and broad prospects in communication scenarios such as video calls. However, because images or videos of human eyes may vary greatly in size, resolution, viewing angle, illumination, texture, and occlusion during acquisition, gaze correction in the real world remains a challenging problem.

Existing gaze correction methods fall mainly into two categories: graphics-based gaze correction and pixel-warping-based gaze correction. In the first category, graphics-based methods mainly use 3D eye models with artificial textures to simulate continuous motion of the eyes and head, rendering eye images with a dynamic and controllable eye-region model. However, human eye images synthesized in this way differ considerably from real human eye images. In addition, a 3D model of the human eye is required in application, and the cost of constructing such a model is high, so the method is severely limited in practice. In the second category, warping-based methods learn a warping function that predicts a warping flow field, and thereby generate the gaze-corrected image directly from the original human eye image. For example, Ganin et al. propose a deep feed-forward system that combines coarse-to-fine processing, image warping, and intensity correction. Kononenko et al. propose an eye warping field method that is predicted by random forests and can run on a CPU (Central Processing Unit) in real time. Because the warping function is pose-specific, more realistic images can be synthesized from human eye images with different gaze directions and head poses, which addresses head pose and gaze angle variation in practical applications. However, human eye images usually have complex texture, lighting, and occlusion, and the influence of these specific factors is difficult to handle with a single global correction operation. As shown in FIG. 1, images generated using only warping methods can have significant defects and look unrealistic.

In recent years, deep learning has achieved significant success in various vision applications, such as object detection, object tracking, object search, and action recognition. Current deep reinforcement learning methods can be divided into two categories: deep Q-learning and policy gradients. In the first category, Q values are fitted to capture the expected return of taking a particular action in a particular state. For example, the collaborative deep reinforcement learning method proposed by Kong et al. jointly localizes objects over several iterations. In the second category, the policy distribution is represented explicitly, and the policy is optimized by updating its parameters along the gradient direction. Liu et al. apply policy gradient methods to optimize captioning metrics and generative adversarial networks, respectively. Recently, deep reinforcement learning has also played an important role in face recognition and synthesis.

Disclosure of Invention

The present invention is directed to solving, at least to some extent, one of the technical problems in the related art.

Therefore, an object of the present invention is to provide a deep-learning-based method for correcting the line of sight of a human eye, which effectively reduces the error between the generated image and the real image, eliminates visual defects and unrealistic appearance in the image, and can recover image details such as reflective highlights.

Another object of the present invention is to provide a deep-learning-based system for correcting the line of sight of a human eye.

In order to achieve the above object, an embodiment of one aspect of the present invention provides a deep-learning-based method for correcting the line of sight of a human eye, including: acquiring a human eye picture; processing the human eye picture through a coarse warping network to obtain a coarse-stage generated human eye image; and detecting a defect area in the generated human eye image through a fine correction network and correcting the defect area.

According to the deep-learning-based human eye sight line correction method of the embodiment of the invention, for an input human eye picture, a coarse-stage generated image is first obtained using a warping-based method, and defect areas in the coarse-stage output are then detected using a recurrent policy network based on deep reinforcement learning. The detected defect areas are refined by a local correction network that takes global visual characteristics into account, which largely eliminates the visual defects and unrealistic appearance caused by specific factors such as illumination, texture, and occlusion.

In addition, the method for correcting the sight line of the human eye based on the deep learning according to the above embodiment of the present invention may further have the following additional technical features:

Further, in an embodiment of the present invention, before processing the human eye picture through the coarse warping network to obtain the coarse-stage generated human eye image, the method further includes: training a convolutional neural network with a coarse-to-fine structure to generate a two-dimensional compensation map; performing a pixel-level substitution operation on the original image using the compensation map to generate a training image; and training the coarse warping network using the mean square error between the generated training image and the original image as the loss function.

Further, in an embodiment of the present invention, the fine correction network includes: a recurrent policy network and a local correction network;

detecting the defect area in the generated human eye image through the fine correction network and correcting the defect area includes: the recurrent policy network detects a defect area in the generated human eye image; and the local correction network corrects the defect area through convolutional layers.

Further, in one embodiment of the present invention, the recurrent policy network detecting a defect area in the generated human eye image includes:

given the image I_{t-1} at step t, the recurrent policy network selects the coordinate position l_t of a local block in order to select a block:

l_t = f_r(s_{t-1})

where s_{t-1} is the encoded state feature of the recurrent policy network, built jointly from the input image I_{t-1}, the encoded historical hidden state h_{t-1}, and the position l_{t-1} of the block selected in the previous step; g denotes a crop operation whose result is the block of image I_{t-1} cropped at the given position l_t.

Further, in an embodiment of the present invention, the local correction network correcting the defect area through convolutional layers includes: the block to be corrected that is selected in each step is corrected by the local correction network f_e to obtain a corrected block, and the corrected block directly replaces the block before correction to give the corrected image I_t of this step; after T steps, the final image I_T is obtained.

In order to achieve the above object, an embodiment of another aspect of the present invention provides a deep-learning-based system for correcting the line of sight of a human eye, including: an acquisition module, configured to acquire a human eye picture; a processing module, configured to process the human eye picture through a coarse warping network to obtain a coarse-stage generated human eye image; and a correction module, configured to detect a defect area in the generated human eye image through a fine correction network and correct the defect area.

According to the deep-learning-based human eye sight line correction system of the embodiment of the invention, for an input human eye picture, a coarse-stage generated image is first obtained using a warping-based method, and defect areas in the coarse-stage output are then detected using a recurrent policy network based on deep reinforcement learning. The detected defect areas are refined by a local correction network that takes global visual characteristics into account, which largely eliminates the visual defects and unrealistic appearance caused by specific factors such as illumination, texture, and occlusion.

In addition, the human eye sight line correction system based on deep learning according to the above embodiment of the invention may also have the following additional technical features:

Further, in an embodiment of the present invention, the system further includes a generation module, configured to train a convolutional neural network with a coarse-to-fine structure to generate a two-dimensional compensation map, perform a pixel-level substitution operation on an original image using the compensation map to generate a training image, and train the coarse warping network using the mean square error between the generated training image and the original image as the loss function.

The correction module includes a detection unit and a correction unit.

The detection unit is configured to detect a defect area in the generated human eye image by means of the recurrent policy network; the correction unit is configured to correct the defect area through convolutional layers by means of the local correction network.

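Further, in an embodiment of the present invention, the detection unit is specifically configured to select the coordinate position l_t of a local block according to:

l_t = f_r(s_{t-1})

Further, in an embodiment of the present invention, the correction unit is specifically configured to:

correct the block selected in each step with the local correction network f_e to obtain a corrected block, and directly replace the block before correction with the corrected block to give the corrected image I_t of this step; after T steps, the final image I_T is obtained.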

Additional aspects and advantages of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.

Drawings

The foregoing and/or additional aspects and advantages of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:

FIG. 1 is a schematic diagram of an image generated using a warping method according to an embodiment of the present invention;

FIG. 2 is a flowchart of a method for correcting a line of sight of a human eye based on deep learning according to an embodiment of the present invention;

FIG. 3 is a flowchart of a method for correcting a line of sight of a human eye based on deep learning according to an embodiment of the invention;

FIG. 4 is a schematic diagram of a coarse warping network according to an embodiment of the present invention;

FIG. 5 is a block diagram of a recurrent policy network according to one embodiment of the invention;

FIG. 6 is a flow diagram of a local correction network in accordance with one embodiment of the present invention;

FIG. 7 is a schematic structural diagram of a human eye sight line correction system based on deep learning according to an embodiment of the invention.

Detailed Description

Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein the same or similar reference numerals refer throughout to the same or similar elements or to elements having the same or similar function. The embodiments described below with reference to the drawings are exemplary, are intended to explain the invention, and are not to be construed as limiting the invention.

The following describes a human eye sight line correction method and system based on deep learning according to an embodiment of the present invention with reference to the accompanying drawings.

First, a method for correcting a line of sight of a human eye based on deep learning according to an embodiment of the present invention will be described with reference to the accompanying drawings.

Fig. 2 is a flowchart of a method for correcting a line of sight of a human eye based on deep learning according to an embodiment of the invention.

As shown in fig. 2, the method for correcting the sight line of the human eye based on the deep learning comprises the following steps:

In step S101, a human eye picture is acquired.

In step S102, the human eye picture is processed through the coarse warping network to obtain a coarse-stage generated human eye image.

Further, in an embodiment of the present invention, before step S102, the method further includes: training a convolutional neural network with a coarse-to-fine structure to generate a two-dimensional compensation map; performing a pixel-level substitution operation on the original image using the compensation map to generate a training image; and training the coarse warping network using the mean square error between the generated training image and the original image as the loss function.

Specifically, the coarse warping network produced by the above steps performs preliminary processing on the human eye picture obtained in step S101, yielding the coarse-stage generated human eye image.

In step S103, a defect area in the generated human eye image is detected by the fine correction network, and the defect area is corrected.

Further, in one embodiment of the present invention, the fine correction network includes a recurrent policy network and a local correction network.

Detecting a defect area in the generated human eye image through the fine correction network and correcting the defect area includes: the recurrent policy network detects a defect area in the generated human eye image; and the local correction network corrects the defect area through convolutional layers.

The recurrent policy network detects a defect area in the generated human eye image as follows:

given the image I_{t-1} at step t, the recurrent policy network selects the coordinate position l_t of a local block in order to select a block:

l_t = f_r(s_{t-1})

where s_{t-1} is the encoded state feature of the recurrent policy network, built jointly from the input image I_{t-1}, the encoded historical hidden state h_{t-1}, and the position l_{t-1} of the block selected in the previous step; g denotes a crop operation whose result is the block of image I_{t-1} cropped at the given position l_t.

Further, in an embodiment of the present invention, the local correction network corrects the defect area through convolutional layers as follows: the block to be corrected that is selected in each step is corrected by the local correction network f_e to obtain a corrected block, and the corrected block directly replaces the block before correction to give the corrected image I_t of this step; after T steps, the final image I_T is obtained.

The method of the embodiment of the invention differs from purely warping-based methods and from methods that correct the generated image as a whole, and it noticeably improves the overall result. By dynamically assigning a new region of interest at each step with a recurrent policy network based on deep reinforcement learning, visually defective blocks (patches) of the coarse-stage generated image can be detected. The detected blocks are corrected by a local correction network that takes global visual characteristics into account, so that the error between the generated image and the real image is effectively reduced, visual defects and unrealistic appearance in the image are eliminated, and image details such as reflective highlights are recovered.

The embodiment of the invention performs gaze correction with a two-stage, coarse-to-fine method. As shown in FIG. 3, for a given human eye picture I and a sight line change angle α, the method is divided into two parts: a Coarse Warping Network (CWN) and a Fine Correction Network (FCN). The CWN is used in the first step to modify the image as a whole through pixel substitution operations. The FCN is used in the second step to refine the image output by the CWN and increase the realism of the generated image.

The following describes a method for correcting a line of sight of a human eye based on deep learning according to an embodiment of the present invention.

1. Coarse warping network (CWN)

The task of the coarse warping network is to generate a warping flow field used to warp the original image. To achieve this, a convolutional neural network with a coarse-to-fine structure is trained to generate a two-dimensional compensation map. The map holds a compensation vector (u(x, y), v(x, y)) for each pixel (x, y), and is used to perform a pixel-level substitution operation on the original image. The warped image is calculated as follows:

O(x, y) = I(x + u(x, y), y + v(x, y))

Each pixel of the generated image is therefore replaced by a pixel of the original image, and the position of the replacing pixel is determined by the compensation vector.
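The substitution rule O(x, y) = I(x + u(x, y), y + v(x, y)) can be sketched as follows. This is a minimal illustration only: the function name `warp_by_offsets`, the use of integer compensation vectors, and the clamping of out-of-range coordinates to the image border are assumptions that the text does not specify.

```python
def warp_by_offsets(image, u, v):
    """Return O with O(x, y) = I(x + u(x, y), y + v(x, y)).

    image, u and v are row-major lists of rows, so image[y][x] is pixel (x, y).
    Offsets are assumed to be integers; coordinates that fall outside the
    image are clamped to the border (an assumption, not part of the text).
    """
    height, width = len(image), len(image[0])
    output = []
    for y in range(height):
        row = []
        for x in range(width):
            src_x = min(max(x + u[y][x], 0), width - 1)
            src_y = min(max(y + v[y][x], 0), height - 1)
            row.append(image[src_y][src_x])
        output.append(row)
    return output


image = [[1, 2, 3],
         [4, 5, 6]]
u = [[1, 1, 1], [1, 1, 1]]   # every pixel reads from one column to the right
v = [[0, 0, 0], [0, 0, 0]]   # no vertical displacement
shifted = warp_by_offsets(image, u, v)
```

In this toy case every output pixel copies its right-hand neighbour, and the last column repeats itself because of the assumed border clamping.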

The original image, the sight line change angle, and the detected positions of the human eye feature points are used as the input of the coarse warping network, which generates a two-channel map D_C. The coarse warped image O_C is generated by warping the original image I as follows:

O_C(x, y) = I{(x, y) + D_C(x, y)}

where the braces denote bilinear interpolation.
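Because D_C generally holds non-integer displacements, the sampled position falls between pixels and the value is interpolated from the four nearest neighbours. The sketch below shows one way this could look for a single-channel image; the helper names `bilinear_sample` and `coarse_warp`, the `(dx, dy)` layout of the flow, and the border clamping are illustrative assumptions rather than details given in the text.

```python
import math


def bilinear_sample(image, x, y):
    """Sample image at real-valued (x, y) by bilinear interpolation."""
    height, width = len(image), len(image[0])
    # Clamp so the four neighbours always exist (border handling is assumed).
    x = min(max(x, 0.0), width - 1.0)
    y = min(max(y, 0.0), height - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, width - 1), min(y0 + 1, height - 1)
    ax, ay = x - x0, y - y0
    top = (1 - ax) * image[y0][x0] + ax * image[y0][x1]
    bottom = (1 - ax) * image[y1][x0] + ax * image[y1][x1]
    return (1 - ay) * top + ay * bottom


def coarse_warp(image, flow):
    """O_C(x, y) = I{(x, y) + D_C(x, y)} with flow[y][x] = (dx, dy)."""
    return [[bilinear_sample(image, x + flow[y][x][0], y + flow[y][x][1])
             for x in range(len(image[0]))]
            for y in range(len(image))]


image = [[0.0, 10.0],
         [20.0, 30.0]]
flow = [[(0.5, 0.5), (0.0, 0.0)],
        [(0.0, 0.0), (0.0, 0.0)]]
warped = coarse_warp(image, flow)
```

Here only the top-left pixel is displaced; it samples the centre of the four pixels and receives their average, while the other pixels keep their values.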

The CWN is trained using Mean Squared Error (MSE) between the generated image and the actual image as a loss function. The concrete network structure of the CWN is shown in fig. 4.
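As a small worked example of the loss, the mean squared error averages the squared per-pixel differences between the generated image and the real image. The function name `mean_squared_error` and the nested-list image representation are assumptions made for illustration.

```python
def mean_squared_error(generated, target):
    """Mean of squared per-pixel differences between two equal-sized images."""
    total, count = 0.0, 0
    for gen_row, tgt_row in zip(generated, target):
        for g, t in zip(gen_row, tgt_row):
            total += (g - t) ** 2
            count += 1
    return total / count


generated = [[1.0, 2.0], [3.0, 4.0]]
target = [[1.0, 2.0], [3.0, 6.0]]
loss = mean_squared_error(generated, target)
```

Only one of the four pixels differs, by 2, so the loss is 4 / 4 = 1.0.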

2. Fine Correction Network (FCN)

The result generated by the coarse warping network usually contains local defects that seriously affect the realism of the picture. To address this problem, the fine correction network is used to finely correct the image generated by the coarse warping network.

The fine correction network is divided into two main parts: (1) a recurrent policy network that selects a block to be corrected at each step, and (2) a local correction network that corrects the defective block through convolutional layers. Each iteration of the FCN proceeds as follows:

l_t = f_r(s_{t-1})

where s_{t-1} is the encoded state feature of the recurrent policy network, built jointly from the input image I_{t-1}, the encoded historical hidden state h_{t-1}, and the position l_{t-1} of the block selected in the previous step; g denotes a crop operation whose result is the block of image I_{t-1} cropped at the given position l_t.

The block to be corrected that is selected in each step is then corrected by the local correction network f_e to obtain a corrected block, and the corrected block directly replaces the block before correction to give the corrected image I_t of this step. After T steps, the final image I_T is obtained.
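The select, crop, correct, and replace cycle described above can be sketched as a plain loop. Everything below is a simplified illustration under stated assumptions: the two networks are replaced by hypothetical toy callables (`brightest_pixel_location` and `flatten_block`), the hidden state h is omitted, blocks are square, and a location is taken to be the top-left corner of the block.

```python
def crop(image, location, size):
    """g: cut the size x size block whose top-left corner is location = (x, y)."""
    x, y = location
    return [row[x:x + size] for row in image[y:y + size]]


def paste(image, block, location):
    """Return a copy of image with block written back at location."""
    x, y = location
    result = [row[:] for row in image]
    for dy, block_row in enumerate(block):
        result[y + dy][x:x + len(block_row)] = block_row
    return result


def refine(image, select_location, correct_block, steps, size):
    """Run T = steps rounds of select, crop, correct, replace."""
    previous_location = None
    for _ in range(steps):
        location = select_location(image, previous_location)  # l_t = f_r(s_{t-1})
        block = crop(image, location, size)                    # g(I_{t-1}, l_t)
        corrected = correct_block(block, location)             # f_e
        image = paste(image, corrected, location)              # I_t
        previous_location = location
    return image


# Hypothetical stand-ins for the two networks, for illustration only.
def brightest_pixel_location(image, previous_location):
    """Toy policy: point at the 2 x 2 grid cell containing the largest value."""
    best = max((value, x, y) for y, row in enumerate(image)
               for x, value in enumerate(row))
    return (best[1] - best[1] % 2, best[2] - best[2] % 2)


def flatten_block(block, location):
    """Toy corrector: replace every pixel of the block with 0."""
    return [[0 for _ in row] for row in block]


coarse = [[0, 0, 0, 0],
          [0, 9, 0, 0],
          [0, 0, 0, 7],
          [0, 0, 0, 0]]
final = refine(coarse, brightest_pixel_location, flatten_block, steps=2, size=2)
```

With two steps, the toy policy first visits the block containing 9 and then the block containing 7, so both outliers are removed while the input image itself is left unmodified.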

The recurrent policy network is implemented as follows. The process is treated as a Markov decision process over discrete time steps. At each step, the policy network encodes the current state features and decides which part of the human eye image needs to be corrected. The blocks of the human eye image are corrected step by step and the state features are updated until the maximum number of steps is reached.

At the end of the correction sequence, a delayed global reward is taken to guide the training of the policy network. The policy network iteratively explores an optimal search path so that each individual eye image can achieve the maximum global reward, and the structural details of the network are shown in fig. 5.

The specific settings of the state, action, and reward of the policy network are as follows:

State: the state s_t is extracted from the input image of the current step and the history of past actions, and comprises three parts: (1) the feature map extracted from the human eye image I_t of the current step, using the same convolutional network structure as in the local correction network, whose specific structure is described later; (2) the location l_{t-1} of the block selected in the previous step; and (3) the hidden unit h_t of the LSTM layer, where the LSTM adopts a GRU network structure.

Action: in the policy network, the action is to select, from all possible locations, the location of the block to be corrected at this step. The network first encodes the feature map of the current image and the block position selected in the previous step through a fully connected layer, then combines the encoded vector with the historical hidden vector h_{t-1} to generate a new hidden unit h_t. Finally, the policy network π_θ obtains the position l_t for this step from h_t.

Reward: rewards guide the network in learning how to select a series of actions that achieves the best final output. The mean square error loss between the final output image and the real image is used as the reward for the network. In addition, this final delayed reward is generated only at the last step, and the errors of the intermediate steps are not counted in training. The reward r_t at step t is therefore as follows:

where I_gt denotes the real image. In the method of this embodiment, the discount factor γ is set to 1, i.e. the correction at each step is equally important for the evaluation of the final result.
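The delayed reward and the effect of γ = 1 can be illustrated with a short sketch. It assumes that intermediate rewards are zero and that the final reward is the negative mean squared error between I_T and I_gt, so that a smaller error yields a larger reward; the sign convention and the function names are assumptions, not details confirmed by the text.

```python
def delayed_rewards(final_error, steps):
    """Rewards r_1..r_T: zero everywhere except the last step.

    The last reward is taken as the negative final mean squared error, so
    that a smaller error gives a larger reward (the sign is an assumption).
    """
    return [0.0] * (steps - 1) + [-final_error]


def discounted_returns(rewards, gamma=1.0):
    """Return R_t, the sum of gamma^(k - t) * r_k over all k from t to T."""
    returns, running = [], 0.0
    for reward in reversed(rewards):
        running = reward + gamma * running
        returns.append(running)
    return returns[::-1]


rewards = delayed_rewards(final_error=0.25, steps=4)
returns = discounted_returns(rewards, gamma=1.0)
```

With γ = 1 every step receives the same return, which matches the statement that each correction step is equally important; a smaller γ would instead shrink the credit given to earlier steps.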

The local correction network is specifically as follows:

Using the location l_t obtained from the recurrent policy network, the image I_{t-1} is cropped to obtain the block to be corrected. The position l_t is encoded and merged with the cropped block as the input of the network. A residual map Δ is then obtained through a deep convolutional network containing a series of convolutional layers. The values of the residual map Δ are added directly to the block before correction, and the result replaces the original block as the corrected block. The specific flow is shown in FIG. 6.
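The residual update at the end of this flow amounts to an element-wise addition. The sketch below shows only that final step; the residual values are made up for illustration and stand in for the output of the convolutional layers, and the function name `apply_residual` is an assumption.

```python
def apply_residual(block, residual):
    """Corrected block = block before correction + residual map (element-wise)."""
    return [[pixel + delta for pixel, delta in zip(block_row, residual_row)]
            for block_row, residual_row in zip(block, residual)]


block = [[10, 20], [30, 40]]
residual = [[1, -2], [0, 5]]     # would come from the convolutional layers
corrected = apply_residual(block, residual)
```

Predicting a residual rather than the full block means that a zero output leaves the block unchanged, so the network only has to learn the correction itself.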

The optimization method is as follows:

The recurrent policy network and the local correction network are jointly trained using a reinforcement learning framework. The overall formulation of the optimization problem is as follows:

First, the recurrent policy network is optimized using the following formula, where {μ, Σ} are given by π_θ(s_t):

A normal distribution is used as the probability distribution for action selection.
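To make the role of the normal distribution concrete, the sketch below samples a block location from a Gaussian whose parameters would be produced by the policy network, and computes the log-probability gradient that a score-function (REINFORCE-style) policy gradient update scales by the return. This is an assumption-laden illustration: it uses a diagonal covariance, the function names are hypothetical, and the use of the REINFORCE estimator is a standard choice named here for illustration, not a formula taken from the text.

```python
import math
import random


def sample_location(mean, std, rng):
    """Draw a 2-D location from independent normal distributions per axis."""
    return tuple(rng.gauss(m, s) for m, s in zip(mean, std))


def log_probability(location, mean, std):
    """Log-density of location under the diagonal normal N(mean, std^2)."""
    return sum(-0.5 * ((x - m) / s) ** 2 - math.log(s) - 0.5 * math.log(2 * math.pi)
               for x, m, s in zip(location, mean, std))


def grad_log_prob_wrt_mean(location, mean, std):
    """d log p / d mean, the term scaled by the return in REINFORCE."""
    return tuple((x - m) / s ** 2 for x, m, s in zip(location, mean, std))


rng = random.Random(0)
mean, std = (8.0, 4.0), (2.0, 2.0)
location = sample_location(mean, std, rng)
score = grad_log_prob_wrt_mean(location, mean, std)
```

Sampling keeps the policy exploratory during training, and the gradient pulls the mean toward sampled locations that led to a high return and away from those that led to a low one.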

Second, the local correction network is optimized:

The local correction network updates and optimizes its parameters at every step. The mean square error is again used as the loss function for the back-propagated error. Optimizing the local correction network does not affect the parameters of the recurrent policy network.

According to the deep-learning-based human eye sight line correction method provided by the embodiment of the invention, for an input human eye picture, a coarse-stage generated image is first obtained using a warping-based method, and defect areas in the coarse-stage output are then detected using a recurrent policy network based on deep reinforcement learning. The detected defect areas are refined by a local correction network that takes global visual characteristics into account, which largely eliminates the visual defects and unrealistic appearance caused by specific factors such as illumination, texture, and occlusion.

Next, a system for correcting a line of sight of a human eye based on deep learning according to an embodiment of the present invention will be described with reference to the accompanying drawings.

As shown in FIG. 7, the human eye sight line correction system includes: an acquisition module 100, a processing module 200, and a correction module 300.

The acquisition module 100 is configured to acquire a human eye picture. The processing module 200 is configured to process the human eye picture through the coarse warping network to obtain a coarse-stage generated human eye image. The correction module 300 is configured to detect a defect area in the generated human eye image through the fine correction network and correct the defect area. The system effectively reduces the error between the generated image and the real image, eliminates visual defects and unrealistic appearance in the image, and can recover image details such as reflective highlights.
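The way the three modules chain together can be sketched as a small class. The class name `GazeCorrectionSystem` and the placeholder callables are hypothetical; they only show the order in which the modules are applied and do not implement either network.

```python
class GazeCorrectionSystem:
    """Chains the acquisition, processing and correction modules."""

    def __init__(self, acquire, coarse_warp, fine_correct):
        self.acquire = acquire            # acquisition module 100
        self.coarse_warp = coarse_warp    # processing module 200
        self.fine_correct = fine_correct  # correction module 300

    def run(self, source):
        eye_picture = self.acquire(source)
        coarse_image = self.coarse_warp(eye_picture)
        return self.fine_correct(coarse_image)


# Placeholder callables standing in for the real modules.
system = GazeCorrectionSystem(
    acquire=lambda source: list(source),
    coarse_warp=lambda picture: [value + 1 for value in picture],
    fine_correct=lambda image: [value * 2 for value in image],
)
result = system.run((1, 2, 3))
```

Passing the modules in as callables keeps the coarse and fine stages independently replaceable, which mirrors the two-stage structure of the method.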

Further, in an embodiment of the present invention, the system further includes a generation module, configured to train a convolutional neural network with a coarse-to-fine structure to generate a two-dimensional compensation map, perform a pixel-level substitution operation on the original image using the compensation map to generate a training image, and train the coarse warping network using the mean square error between the generated training image and the original image as the loss function.

Further, in one embodiment of the present invention, the fine correction network includes: a recurrent policy network and a local correction network;

the correction module includes: a detection unit and a correction unit;

the detection unit is configured to detect a defect area in the generated human eye image by means of the recurrent policy network;

and the correction unit is configured to correct the defect area through convolutional layers by means of the local correction network.

Further, in an embodiment of the present invention, the detection unit is specifically configured to:

l_t = f_r(s_{t-1})

Further, in an embodiment of the present invention, the correction unit is specifically configured to:

correct the block selected in each step with the local correction network f_e to obtain a corrected block, and directly replace the block before correction with the corrected block to give the corrected image I_t of this step; after T steps, the final image I_T is obtained.

It should be noted that the foregoing explanation of the embodiment of the method for correcting the line of sight of the human eye based on deep learning is also applicable to the system of the embodiment, and is not repeated here.

According to the deep-learning-based human eye sight line correction system provided by the embodiment of the invention, for an input human eye picture, a coarse-stage generated image is first obtained using a warping-based method, and defect areas in the coarse-stage output are then detected using a recurrent policy network based on deep reinforcement learning. The detected defect areas are refined by a local correction network that takes global visual characteristics into account, which largely eliminates the visual defects and unrealistic appearance caused by specific factors such as illumination, texture, and occlusion.

Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "a plurality" means at least two, e.g., two, three, etc., unless specifically limited otherwise.

In the description herein, references to the terms "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, schematic uses of the above terms do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. In addition, those skilled in the art may combine the various embodiments or examples described in this specification, and the features of different embodiments or examples, provided they do not contradict one another.

Although embodiments of the present invention have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present invention, and that variations, modifications, substitutions and alterations can be made to the above embodiments by those of ordinary skill in the art within the scope of the present invention.

Claims

1. A human eye sight line correction method based on deep learning, characterized by comprising the following steps:

acquiring a human eye picture;

processing the human eye picture through a coarse warping network to obtain a coarse-stage generated human eye image;

detecting a defect area in the generated human eye image through a fine correction network, and correcting the defect area;

wherein the fine correction network comprises: a recurrent policy network and a local correction network;

the detecting a defect area in the generated human eye image through the fine correction network and correcting the defect area comprises:

the recurrent policy network detects a defect area in the generated human eye image;

the local correction network corrects the defect area through a convolutional layer.

2. The human eye sight line correction method based on deep learning according to claim 1, wherein before the processing the human eye picture through the coarse warping network to obtain the coarse-stage generated human eye image, the method further comprises:

training a convolutional neural network with a coarse-to-fine structure to generate a two-dimensional compensation map;

performing a pixel-level substitution operation on the original image using the compensation map to generate a training image;

training the coarse warping network using the mean square error between the generated training image and the original image as a loss function.

3. The human eye sight line correction method based on deep learning according to claim 1, wherein the recurrent policy network detecting a defect area in the generated human eye image comprises:

l_t = f_r(s_{t-1})

where s_{t-1} is the encoded state feature of the recurrent policy network, built jointly from the input image I_{t-1}, the encoded historical hidden state h_{t-1}, and the position l_{t-1} of the block selected in the previous step; g denotes a crop operation whose result is the block of image I_{t-1} cropped at the given position l_t.

4. The deep learning based human eye vision correction method according to claim 3,

the local correction network corrects the defect area through a convolution layer, and comprises:

the block to be corrected selected in each step

5. A system for correcting a line of sight of a human eye based on deep learning, comprising:

an acquisition module, configured to acquire a human eye picture;

a processing module, configured to process the human eye picture through a coarse warping network to obtain a coarse-stage generated human eye image;

a correction module, configured to detect a defect area in the generated human eye image through a fine correction network and correct the defect area; wherein the fine correction network comprises: a recurrent policy network and a local correction network;

the correction module comprises: a detection unit and a correction unit;

the detection unit is configured to detect a defect area in the generated human eye image by means of the recurrent policy network;

the correction unit is configured to correct the defect area through a convolutional layer by means of the local correction network.

6. The deep learning based human eye gaze correction system of claim 5, further comprising: a generation module,

the generation module is configured to train a convolutional neural network with a coarse-to-fine structure to generate a two-dimensional compensation map, perform a pixel-level substitution operation on an original image using the compensation map to generate a training image, and train the coarse warping network using the mean square error between the generated training image and the original image as a loss function.

7. The deep learning based human eye vision correction system of claim 6,

the detection unit is specifically configured to:

l_t = f_r(s_{t-1})

8. The deep learning based human eye gaze correction system of claim 7,

the correction unit is specifically configured to:

the block to be corrected selected in each step