CN111339928B - Gaze adjustment method and device, and storage medium - Google Patents

Gaze adjustment method and device, and storage medium

Info

Publication number
CN111339928B
CN111339928B (application CN202010114683.9A)
Authority
CN
China
Prior art keywords
eye
image
network model
network
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010114683.9A
Other languages
Chinese (zh)
Other versions
CN111339928A (en)
Inventor
范蓉蓉
毛晓蛟
章勇
曹李军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Keda Technology Co Ltd
Original Assignee
Suzhou Keda Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou Keda Technology Co Ltd
Priority to CN202010114683.9A
Publication of CN111339928A
Priority to PCT/CN2020/121519
Application granted
Publication of CN111339928B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/18 Eye characteristics, e.g. of the iris
    • G06V40/193 Preprocessing; Feature extraction
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/56 Extraction of image or video features relating to colour
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/18 Eye characteristics, e.g. of the iris
    • G06V40/197 Matching; Classification
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00 Television systems
    • H04N7/14 Systems for two-way working
    • H04N7/15 Conference systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Biophysics (AREA)
  • Human Computer Interaction (AREA)
  • Ophthalmology & Optometry (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computing Systems (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Signal Processing (AREA)
  • Image Processing (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The present application relates to a gaze adjustment method and device and a storage medium, belonging to the technical field of image processing. The method comprises: acquiring a target image including a target eye image; acquiring a gaze adjustment network; and obtaining an adjusted eye image based on the target eye image and the gaze adjustment network, wherein the difference between the gaze angle of the target eye image and the gaze angle of the adjusted eye image is a desired adjustment angle. This solves the problem of low gaze-correction efficiency caused by modifying hardware so that the user views the screen and the camera at the same time. Because the gaze adjustment network can generate a coded image based on the desired adjustment angle, generate a corrected eye image based on the target eye image and the coded image, and adjust the pixel positions and colors of the corrected eye image, gaze adjustment of eye images can be achieved by the network alone, without changing hardware, which improves gaze-correction efficiency.

Description

Gaze adjustment method and device, and storage medium
Technical Field
The present application relates to a gaze adjustment method, a gaze adjustment device, and a storage medium, and belongs to the technical field of image processing.
Background
A video conference system (also called a videoconferencing system) is a system in which two or more individuals or groups in different places transmit audio, video, and file data to one another over transmission lines and multimedia devices, enabling real-time, interactive communication and thus a remote conference.
In existing video conference systems, the camera is generally placed above or below the display screen, so that the camera can capture the user without being blocked by the screen while the user watches it. However, because the camera sits above or below the screen while the user's line of sight is directed at the screen, the eye images captured by the camera show the eyes looking up or down, and the image seen by the participants at the other end is one without eye contact.
To improve the effect of the video conference, the gaze of the conference participants needs to be corrected. In a typical correction method, dedicated hardware is used so that the user views the screen and the camera at the same time.
However, such dedicated hardware requires modifying the existing video conference system, so the gaze-correction efficiency is low.
Disclosure of Invention
The present application provides a gaze adjustment method and device and a storage medium, which can solve the problem of low gaze-correction efficiency caused by modifying hardware so that the user views the screen and the camera at the same time. The present application provides the following technical solutions:
In a first aspect, a gaze adjustment method is provided, the method comprising:
acquiring a target image including a target eye image;
acquiring a gaze adjustment network, wherein the gaze adjustment network comprises a first network model, a second network model connected to the first network model, and a third network model connected to the second network model; the first network model is used to generate, based on an input desired adjustment angle, a coded image carrying that angle; the second network model is used to generate, based on the input image and the coded image, a corrected eye image obtained by correcting the input image; and the third network model is used to perform pixel-position adjustment and color adjustment on the corrected eye image;
obtaining an adjusted eye image based on the target eye image and the gaze adjustment network, wherein the difference between the gaze angle of the target eye image and the gaze angle of the adjusted eye image is the desired adjustment angle.
Optionally, before the acquiring of the gaze adjustment network, the method comprises:
acquiring a plurality of sample images, the plurality of sample images including images having various gaze angles;
obtaining a loss function;
and training a preset network model by using the plurality of sample images and the loss function to obtain the gaze adjustment network.
Optionally, the training of a preset network model by using the plurality of sample images and the loss function to obtain the gaze adjustment network includes:
performing eye key point detection on each sample image to obtain n eye key points, wherein n is a positive integer;
for each sample image, determining a sample anchor frame based on the position of each eye key point in the corresponding n eye key points to obtain a sample anchor frame corresponding to each key point;
combining the multiple sample images in pairs, and determining the gaze-angle difference in each image combination to obtain a training set; the training set comprises multiple groups of training data, wherein each group of training data comprises a reference sample image, a sample image to be adjusted, the sample anchor frames corresponding to the sample image to be adjusted, and the gaze-angle difference of the sample image to be adjusted relative to the reference sample image;
inputting the gaze-angle difference in the training data, the sample image to be adjusted, and the sample anchor frames corresponding to the sample image to be adjusted into the preset network model, and training the preset network model by using the loss function and the reference sample image in the training data to obtain the gaze adjustment network.
Optionally, the inputting, into the preset network model, the difference between the gaze angles in the training data, the sample image to be adjusted, and the sample anchor frame corresponding to the sample image to be adjusted includes:
inputting the difference of the eye gaze angles in the training data into a first network model in the preset network models;
inputting the sample image to be adjusted, the sample anchor frame corresponding to the sample image to be adjusted and the output result of the first network model into a second network model in the preset network models;
and inputting the sample image to be adjusted and the output result of the second network model into a third network model in the preset network models.
Optionally, the loss function comprises a first loss function, a second loss function, and a third loss function;
the first loss function is used for minimizing the sum of differences of the adjusted eye image and a real image at a pixel level;
The second loss function is used for minimizing the difference between the eye structure of the model output result of the preset network model and the eye structure of the real image;
the third loss function is used for minimizing a difference between an eye color of a model output result of the preset network model and an eye color of a real image.
Optionally, the first network model is an encoder, the second network model is a correction network, and the third network model includes a pixel relocation branch model and a color adjustment branch model;
the pixel relocation branch model comprises a preset activation function and a pixel relocation model connected to the preset activation function, and is used to converge the output result of the second network model so that local pixels exceeding an expected range in the output result converge into the expected range; the color adjustment branch model comprises a color adjustment network model and a color adjustment model connected to the color adjustment network model and the pixel relocation model, and is used to perform color adjustment on the output result of the pixel relocation model in a color adjustment mode indicated by the output result of the color adjustment network model.
Optionally, the obtaining an adjusted eye image based on the target eye image and the gaze adjustment network includes:
determining a target key point of the target eye image;
generating a target anchor frame based on the target key points;
acquiring a desired adjustment angle of the target eye image;
and inputting the target eye image, the target anchor frame, and the desired adjustment angle into the gaze adjustment network to obtain an adjusted eye image.
Optionally, the third network model comprises a pixel relocation branch model and a color adjustment branch model; inputting the target eye image, the target anchor frame, and the desired adjustment angle into the gaze adjustment network to obtain an adjusted eye image includes:
inputting the desired adjustment angle into the first network model to obtain a coded image carrying the desired adjustment angle;
inputting the coded image, the target eye image, and the target anchor frame into the second network model to obtain the corrected eye image;
inputting the target eye image and the corrected eye image into the pixel relocation branch model to obtain a converged eye image;
and inputting the converged eye image and the corrected eye image into the color adjustment branch model to obtain the adjusted eye image.
Optionally, the method further comprises:
and carrying out image fusion on the adjusted eye image and the target image to obtain a fused image.
In a second aspect, a gaze adjustment device is provided, the device comprising:
an image acquisition module for acquiring a target image including a target eye image;
the network acquisition module is used to acquire a gaze adjustment network, wherein the gaze adjustment network comprises a first network model, a second network model connected to the first network model, and a third network model connected to the second network model; the first network model is used to generate, based on an input desired adjustment angle, a coded image carrying that angle; the second network model is used to generate, based on the input image and the coded image, a corrected eye image obtained by correcting the input image; and the third network model is used to perform pixel-position adjustment and color adjustment on the corrected eye image;
and the gaze adjustment module is used to obtain an adjusted eye image based on the target eye image and the gaze adjustment network, wherein the difference between the gaze angle of the target eye image and the gaze angle of the adjusted eye image is the desired adjustment angle.
In a third aspect, a gaze adjustment device is provided, the device comprising a processor and a memory; the memory stores a program that is loaded and executed by the processor to implement the gaze adjustment method of the first aspect.
In a fourth aspect, a computer-readable storage medium is provided, in which a program is stored, the program being loaded and executed by a processor to implement the gaze adjustment method of the first aspect.
The beneficial effects of this application lie in the following: a target image including a target eye image is acquired; a gaze adjustment network is acquired; and an adjusted eye image is obtained based on the target eye image and the gaze adjustment network, the difference between the gaze angle of the target eye image and the gaze angle of the adjusted eye image being the desired adjustment angle. This solves the problem of low gaze-correction efficiency caused by modifying hardware so that the user views the screen and the camera at the same time. The gaze adjustment network comprises a first network model, a second network model connected to the first network model, and a third network model connected to the second network model; the first network model can generate a coded image from the desired adjustment angle, the second network model can correct the input image based on the coded image, and the third network model can adjust the pixel positions and colors of the corrected eye image. Gaze adjustment of the eye images in the target image is thus achieved by the gaze adjustment network alone, without changing hardware, which improves gaze-correction efficiency.
In addition, because the corrected eye image may contain eye pixels that fall outside the eye contour, the third network model adjusts pixel positions so that all eye pixels in the output adjusted eye image lie within the eye contour, improving the realism of the adjusted eye image.
In addition, because the color of the corrected eye image may not match the actual eye color, the color adjustment performed by the third network model ensures that the color of the output adjusted eye image better matches the actual eye color, improving the realism of the adjusted eye image.
The foregoing is only an overview of the technical solutions of the present application. To make the technical solutions of the present application clearer and implementable according to the description, preferred embodiments of the present application are described in detail below with reference to the accompanying drawings.
Drawings
FIG. 1 is a flow chart of a gaze adjustment method provided by an embodiment of the present application;
FIG. 2 is a schematic diagram of a gaze adjustment network according to an embodiment of the present application;
FIG. 3 is a flow chart of a training method for a gaze adjustment network according to an embodiment of the present application;
FIG. 4 is a schematic diagram of a gaze adjustment process provided by one embodiment of the present application;
FIG. 5 is a block diagram of a gaze adjustment device provided in accordance with an embodiment of the present application;
FIG. 6 is a block diagram of a gaze adjustment device according to an embodiment of the present application.
Detailed Description
The following detailed description of the present application will be made with reference to the accompanying drawings and examples. The following examples are intended to illustrate the present application but are not intended to limit the scope of the present application.
First, a number of terms referred to in this application are introduced:
Encoder: part of an autoencoder. An encoder is a neural network used to extract features from input data and to map those features onto an image. The network model forming the encoder may be a fully connected neural network model, a convolutional neural network model, or the like; the present application does not limit the type of the encoder's network model.
Correction network: used to perform a homography transformation on the input image so as to correct the target. The correction network may be a deep neural network such as a Cascaded Shape Regression (CSR) network; the present application does not limit the network model type of the correction network.
Anchor frame: in image processing, refers to generating multiple bounding boxes with different sizes and aspect ratios centered on a pixel point of an image.
Optionally, the embodiments below take an electronic device as the execution subject. The electronic device may be a device with data-processing capability, such as a terminal or a server; the terminal may be a video conference terminal, a mobile phone, a computer, or the like, and this embodiment does not limit the type of the terminal.
Fig. 1 is a flowchart of a gaze adjustment method according to an embodiment of the present application. The method at least comprises the following steps:
step 101, a target image including a target eye image is acquired.
The target image may be a frame of a video or a single image. The target eye image it includes may be, for example, a human eye image or an animal eye image; this embodiment does not limit the type of living body to which the eye image belongs.
Step 102, a gaze adjustment network is acquired, wherein the gaze adjustment network comprises a first network model, a second network model connected to the first network model, and a third network model connected to the second network model; the first network model is used to generate, based on an input desired adjustment angle, a coded image carrying that angle; the second network model is used to generate, based on the input image and the coded image, a corrected eye image obtained by correcting the input image; and the third network model is used to perform pixel-position adjustment and color adjustment on the corrected eye image.
Optionally, the first network model is an encoder, the second network model is a correction network, and the third network model includes a pixel relocation branch model and a color adjustment branch model. The pixel relocation branch model comprises a preset activation function and a pixel relocation model connected to the preset activation function, and is used to converge the output result of the second network model so that local pixels exceeding an expected range in the output result converge into the expected range; the color adjustment branch model comprises a color adjustment network model and a color adjustment model connected to the color adjustment network model and the pixel relocation model, and is used to perform color adjustment on the output result of the pixel relocation model in a color adjustment mode indicated by the output result of the color adjustment network model.
The expected range may be the range formed by the eye contour, that is, the region enclosed by the eye contour.
It should be noted that the above network models are only illustrative. In actual implementation, any model that can realize the function of the first network model can serve as the first network model, any model that can realize the function of the second network model can serve as the second network model, and any model that can realize the function of the third network model can serve as the third network model; this embodiment does not limit the model structures of the first, second, and third network models.
Optionally, the preset activation function is used to converge the pixel adjustment range that the correction network applies to the eye image into a specified range. The preset activation function may be a Tanh function or another type of activation function; this embodiment does not limit the type of the preset activation function.
Referring to the gaze adjustment network 20 shown in FIG. 2, the first network model 201 is an encoder whose input is the desired adjustment angle and whose output is a coded image carrying that angle. The second network model 202 is a correction network whose inputs are the eye image, an anchor frame obtained from the eye image, and the coded image, and whose output is the corrected eye image. The third network model 203 includes a pixel relocation branch model 2031 and a color adjustment branch model 2032. The pixel relocation branch model 2031 comprises a preset activation function and a pixel relocation model connected to it. The input of the preset activation function is the corrected eye image output by the correction network, and its output is a pixel convergence value; the pixel relocation model takes the eye image and the pixel convergence value as input and outputs an eye image converged according to the pixel convergence value. The converged eye image and the corrected eye image are then input into the color adjustment branch model 2032 to obtain the adjusted eye image. Specifically, the color adjustment branch model 2032 comprises a color adjustment network model and a color adjustment model connected to the color adjustment network model and the pixel relocation model. The input of the color adjustment network model is the corrected eye image, and its output is the pixel positions to be color-adjusted and the color adjustment mode. The color adjustment model takes the converged eye image, the pixel positions to be color-adjusted, and the color adjustment mode as input and outputs the adjusted eye image.
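By way of illustration only, the following Python (PyTorch) sketch wires the three models together in the order just described. The layer sizes, channel counts, the (pitch, yaw) encoding of the adjustment angle, and the rasterization of the anchor frames into a single-channel map are all assumptions made for readability; the patent does not prescribe concrete architectures, and the correction network's output is read here as a per-pixel displacement field.

import torch
import torch.nn as nn
import torch.nn.functional as F

def base_grid(img):
    # Identity sampling grid in [-1, 1], shaped (B, H, W, 2) for grid_sample.
    b, _, h, w = img.shape
    ys, xs = torch.meshgrid(torch.linspace(-1, 1, h),
                            torch.linspace(-1, 1, w), indexing="ij")
    return torch.stack([xs, ys], dim=-1).expand(b, h, w, 2)

class AngleEncoder(nn.Module):
    # First network model: encodes the desired adjustment angle into a
    # single-channel coded image matching the eye-image size (64 assumed).
    def __init__(self, img_size=64):
        super().__init__()
        self.img_size = img_size
        self.fc = nn.Sequential(nn.Linear(2, 128), nn.ReLU(),
                                nn.Linear(128, img_size * img_size))
    def forward(self, angle):            # angle: (B, 2) = assumed (pitch, yaw)
        return self.fc(angle).view(-1, 1, self.img_size, self.img_size)

class CorrectionNet(nn.Module):
    # Second network model: from the eye image, an anchor-frame map and the
    # coded image, predicts a 2-channel correction field.
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(3 + 1 + 1, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 2, 3, padding=1))
    def forward(self, eye, anchors, coded):
        return self.body(torch.cat([eye, anchors, coded], dim=1))

class GazeAdjustNet(nn.Module):
    # Third network model as two branches: pixel relocation (Tanh-bounded
    # warp of the input eye image) and color adjustment (per-pixel gain).
    def __init__(self):
        super().__init__()
        self.encoder = AngleEncoder()
        self.correct = CorrectionNet()
        self.color_net = nn.Sequential(
            nn.Conv2d(2, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 3, 3, padding=1), nn.Sigmoid())
    def forward(self, eye, anchors, angle):
        coded = self.encoder(angle)
        corrected = self.correct(eye, anchors, coded)
        flow = torch.tanh(corrected)      # preset activation bounds the offsets
        grid = base_grid(eye) + flow.permute(0, 2, 3, 1)
        converged = F.grid_sample(eye, grid, align_corners=False)
        gain = self.color_net(corrected)  # color adjustment mode per pixel
        return converged * gain           # adjusted eye image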
Optionally, acquiring the gaze adjustment network comprises invoking a pre-trained gaze adjustment network. In that case, before the gaze adjustment network is acquired, the method further includes: acquiring a plurality of sample images, wherein the plurality of sample images include images with various gaze angles; acquiring a loss function; and training a preset network model by using the plurality of sample images and the loss function to obtain the gaze adjustment network.
Referring to FIG. 3, the method for training a preset network model by using a plurality of sample images and a loss function to obtain the gaze adjustment network includes at least the following steps 31 to 34:
Step 31, eye key-point detection is performed on each sample image to obtain n eye key points, where n is a positive integer.
The eye key points in a sample image are acquired by using a key-point detection algorithm. Optionally, key-point detection algorithms include, but are not limited to, the Active Shape Model (ASM), the Active Appearance Model (AAM), and Cascaded Pose Regression (CPR); this embodiment does not limit the algorithm for detecting the eye key points.
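As one concrete possibility (an assumption for illustration; the patent does not name a detector), the eye key points can be taken from dlib's publicly available 68-point face landmark model, in which indices 36 to 41 cover one eye and 42 to 47 the other, giving n = 6 key points per eye:

import dlib

detector = dlib.get_frontal_face_detector()
# Publicly available 68-point landmark predictor (iBUG scheme).
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

def eye_keypoints(gray_image):
    # Detect the first face, then read out the 6 key points of each eye.
    faces = detector(gray_image)
    if len(faces) == 0:
        return None
    shape = predictor(gray_image, faces[0])
    left = [(shape.part(i).x, shape.part(i).y) for i in range(36, 42)]
    right = [(shape.part(i).x, shape.part(i).y) for i in range(42, 48)]
    return left, right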
The number of eye key points (the value of n) may be 6, 8, or another value; this embodiment does not limit the number of eye key points.
Step 32, for each sample image, a sample anchor frame is determined based on the position of each of the corresponding n eye key points, yielding a sample anchor frame for each key point.
The position of each eye key point is represented by pixel coordinates, for example, eye key point 1 at (x1, y1). The electronic device stores anchor-frame offsets, and for each eye key point it determines the position of the sample anchor frame from the difference between the key point's pixel coordinates and the offsets. Illustratively, the anchor-frame offset includes a first offset along the x-axis and a second offset along the y-axis; for each eye key point, the first offset is subtracted from the key point's x-axis pixel coordinate and the second offset from its y-axis pixel coordinate, yielding the sample anchor frame corresponding to that key point.
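A minimal sketch of this construction follows; the stored offsets and the box size are illustrative values, since the patent only specifies that the anchor-frame position is obtained by subtracting stored offsets from the key-point coordinates:

def anchor_frame(keypoint, dx=8, dy=8, width=16, height=16):
    # Subtract the stored offsets from the key-point coordinates to place
    # the anchor frame; dx, dy, width and height are assumed values.
    x, y = keypoint
    return (x - dx, y - dy, width, height)   # (left, top, w, h)

def sample_anchor_frames(eye_keypoints):
    # One sample anchor frame per eye key point, as in step 32.
    return [anchor_frame(kp) for kp in eye_keypoints]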
Step 33, the multiple sample images are combined in pairs, and the gaze-angle difference in each image combination is determined to obtain a training set.
The training set comprises a plurality of groups of training data, and each group of training data comprises a reference sample image, a sample image to be adjusted, the sample anchor frames corresponding to the sample image to be adjusted, and the gaze-angle difference of the sample image to be adjusted relative to the reference sample image.
In this application, the gaze angle may be calculated from the pupil position. For example, the electronic device stores in advance a mapping between pupil positions and gaze angles and then determines the gaze angle from the position of the pupil in the eye image; alternatively, a deep-learning network model is trained with a large number of eye images and their corresponding gaze angles to obtain a gaze-angle calculation model, which then determines the gaze angle of an eye image. This embodiment does not limit the manner of calculating the gaze angle.
Optionally, the gaze-angle difference of the sample image to be adjusted relative to the reference sample image is determined as follows: the gaze angle of the reference sample image is set to 0 degrees, so that the gaze angle of the sample image to be adjusted relative to the reference sample image is the difference; or, in a common coordinate system, the gaze angle of the reference sample image is subtracted from the gaze angle of the sample image to be adjusted. Of course, the gaze-angle difference may also be determined in other ways, which are not enumerated here.
Taking three sample images as an example, the obtained training data includes six groups (a code sketch of this pairwise construction follows the list):
First group: sample image 1 is the sample image to be adjusted and sample image 2 is the reference sample image; the group contains the sample anchor frames corresponding to sample image 1, and the gaze-angle difference is gaze-angle difference 1 of sample image 1 relative to sample image 2.
Second group: sample image 1 is the sample image to be adjusted and sample image 3 is the reference sample image; the group contains the sample anchor frames corresponding to sample image 1, and the gaze-angle difference is gaze-angle difference 2 of sample image 1 relative to sample image 3.
Third group: sample image 2 is the sample image to be adjusted and sample image 1 is the reference sample image; the group contains the sample anchor frames corresponding to sample image 2, and the gaze-angle difference is gaze-angle difference 3 of sample image 2 relative to sample image 1.
Fourth group: sample image 2 is the sample image to be adjusted and sample image 3 is the reference sample image; the group contains the sample anchor frames corresponding to sample image 2, and the gaze-angle difference is gaze-angle difference 4 of sample image 2 relative to sample image 3.
Fifth group: sample image 3 is the sample image to be adjusted and sample image 1 is the reference sample image; the group contains the sample anchor frames corresponding to sample image 3, and the gaze-angle difference is gaze-angle difference 5 of sample image 3 relative to sample image 1.
Sixth group: sample image 3 is the sample image to be adjusted and sample image 2 is the reference sample image; the group contains the sample anchor frames corresponding to sample image 3, and the gaze-angle difference is gaze-angle difference 6 of sample image 3 relative to sample image 2.
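The following sketch reproduces this pairwise construction; with three samples, permutations yields exactly the six ordered groups above. Representing a gaze angle as a (pitch, yaw) pair is an assumption for illustration:

from itertools import permutations

def build_training_set(samples):
    # samples: list of (image, anchor_frames, gaze_angle) tuples, where
    # gaze_angle is assumed to be a (pitch, yaw) pair.
    training_set = []
    for (img_a, anchors_a, ang_a), (img_b, _, ang_b) in permutations(samples, 2):
        angle_diff = (ang_b[0] - ang_a[0], ang_b[1] - ang_a[1])
        # Group layout: reference image, image to adjust, its anchor
        # frames, and the gaze-angle difference relative to the reference.
        training_set.append((img_b, img_a, anchors_a, angle_diff))
    return training_set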
Step 34, the gaze-angle difference in the training data, the sample image to be adjusted, and the sample anchor frames corresponding to the sample image to be adjusted are input into the preset network model, and the preset network model is trained by using the loss function and the reference sample image in the training data to obtain the gaze adjustment network.
The preset network model has the same network structure as the gaze adjustment network; that is, the preset network model also comprises the first network model, the second network model, and the third network model.
By analogy with the gaze adjustment network shown in FIG. 2, inputting the gaze-angle difference in the training data, the sample image to be adjusted, and its sample anchor frames into the preset network model includes: inputting the gaze-angle difference in the training data into the first network model of the preset network model; inputting the sample image to be adjusted, its sample anchor frames, and the output result of the first network model into the second network model of the preset network model; and inputting the sample image to be adjusted and the output result of the second network model into the third network model of the preset network model.
Specifically, the sample image to be adjusted and the output result of the second network model are input into the pixel relocation branch model of the third network model; the output result of the pixel relocation branch model and the output result of the second network model are then input into the color adjustment model of the color adjustment branch model, yielding the training result output by the color adjustment model. A sketch of one training step follows.
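The sketch below reuses the GazeAdjustNet example given earlier; total_loss stands for the combined loss defined in the remainder of this section, and the optimizer choice and tensor layout are assumptions:

import torch

def train_step(model, optimizer, group):
    # One group of training data: reference image, image to adjust,
    # its anchor-frame map, and the gaze-angle difference.
    reference, to_adjust, anchors, angle_diff = group
    optimizer.zero_grad()
    output = model(to_adjust, anchors, angle_diff)   # preset network forward pass
    loss = total_loss(output, reference)             # supervise with the reference image
    loss.backward()
    optimizer.step()
    return loss.item()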
Optionally, the loss function includes a first loss function, a second loss function, and a third loss function.
The first loss function is used to minimize the sum of differences at the pixel level between the adjusted eye image and the real image.
Taking the L2 loss as an example of the first loss function, it can be written as
loss_L2 = Σ_{p'∈I'} ||p' − p_t||²
where p' is a pixel in the training result I' and p_t is the corresponding pixel in the real image I_t (the reference sample image).
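In code, this pixel-level term is simply (a sketch consistent with the formula above):

def loss_l2(output, reference):
    # Sum of squared per-pixel differences between the network output
    # and the real (reference) image.
    return ((output - reference) ** 2).sum()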
The second loss function is used to minimize a difference between an eye structure of a model output result of the preset network model and an eye structure of the real image.
When the eye image is adjusted, in order to maintain the structure and shape of the eye, the pixels of the eyeball and of the eyelid are expected to move in the same direction. In addition, since the sclera is almost white while the pupil and the iris are generally darker, the shape of each region can be preserved according to pixel brightness: dark pixels represent the iris and pupil, light pixels represent the sclera, and the sclera is allowed more freedom of movement.
Based on the above characteristics, the second loss function includes an eyeball loss function loss_eb, represented by a formula given as an image in the original publication (not reproduced here), where the subscript eb denotes the eyeball, l(p) denotes the brightness of pixel p, and F(·) is the trained pixel optical-flow field. The optical-flow field is a two-dimensional instantaneous velocity field formed by all pixel points in an image, in which each two-dimensional velocity vector is the projection onto the imaging plane of the three-dimensional velocity vector of a visible point on the object.
The second loss function also comprises an eyelid loss function loss_el, represented by a formula given as an image in the original publication (not reproduced here), where the subscript el denotes the eyelid and F(·) is the pixel optical-flow field indicated by the training result.
The third loss function is used for minimizing the difference between the eye color of the model output result of the preset network model and the eye color of the real image.
In the present embodiment, the color adjustment network model is used to reduce the artificial visual effect caused by the eyelid occluding the iris. During training of the preset network model, however, the color adjustment network model may drastically change pixel colors in order to minimize the L2 distance. To address this, a third loss function is added to penalize color distortion.
The third loss function includes a first term loss_p and a second term loss_s.
The first term loss_p is represented by a formula given as an image in the original publication (not reproduced here), where C(·) is a predefined penalty map whose value increases from the eye-center position to the boundary of the eye region, and B(p) denotes the luminance field of each pixel. β and γ are constants: β controls the curve of the penalty map and γ controls its coefficient. β may be 3 and γ may be 5; of course, β and γ may also be set to other values, which this embodiment does not limit.
The second term loss_s is likewise represented by a formula given as an image in the original publication (not reproduced here).
total loss function losstotRepresented by the formula:
losstot=lossL2+losseb+lossel+lossp+losss
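A direct sketch of this combination, with the individual terms standing for implementations of the equations above (their simplified signatures are an assumption of this sketch):

def total_loss(output, reference):
    # Unit-weight sum of the five terms, as in the formula above.
    # loss_eb, loss_el, loss_p and loss_s stand for implementations of
    # the eyeball, eyelid and color terms; signatures are simplified.
    return (loss_l2(output, reference)
            + loss_eb(output, reference)
            + loss_el(output, reference)
            + loss_p(output, reference)
            + loss_s(output, reference))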
Optionally, step 102 may be performed after step 101, before step 101, or simultaneously with step 101; this embodiment does not limit the execution order of steps 101 and 102.
Step 103, the adjusted eye image is obtained based on the target eye image and the gaze adjustment network, wherein the difference between the gaze angle of the target eye image and the gaze angle of the adjusted eye image is the desired adjustment angle.
Obtaining the adjusted eye image based on the target eye image and the gaze adjustment network comprises the following steps: determining target key points of the target eye image; generating target anchor frames based on the target key points; acquiring the desired adjustment angle for the target eye image; and inputting the target eye image, the target anchor frames, and the desired adjustment angle into the gaze adjustment network to obtain the adjusted eye image.
The process of determining the target key points in the target eye image is the same as step 31, and the process of generating the target anchor frames based on the target key points is the same as step 32; the details are not repeated here.
Optionally, the desired adjustment angle may be input by the user, or the electronic device may calculate it as the difference between the gaze angle of the target image and a reference angle; this embodiment does not limit the manner of acquiring the desired adjustment angle.
In the gaze adjustment network shown in FIG. 2, the target eye image and the target anchor frames are input into the second network model, the desired adjustment angle is input into the first network model, and the target eye image is also input into the third network model, yielding the adjusted eye image.
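Inference then reduces to a single forward pass, sketched below with the GazeAdjustNet example from earlier (the batching and tensor layout are assumptions):

import torch

def adjust_gaze(model, eye_image, anchor_map, desired_angle):
    # eye_image: (1, 3, H, W); anchor_map: (1, 1, H, W);
    # desired_angle: (1, 2) gaze-angle difference to apply.
    model.eval()
    with torch.no_grad():
        return model(eye_image, anchor_map, desired_angle)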
Optionally, after this step, the electronic device further performs image fusion on the adjusted eye image and the target image to obtain a fused image. Illustratively, the image fusion algorithms used include, but are not limited to, pixel-level, feature-level, and decision-level image fusion algorithms; this embodiment does not limit the type of image fusion algorithm. A sketch of a simple pixel-level fusion follows.
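In this sketch of the simplest, pixel-level option, the alpha mask and the patch placement are illustrative assumptions:

import numpy as np

def fuse_eye(target, adjusted_eye, top, left, mask=None):
    # Alpha-blend the adjusted eye patch back into the target image at
    # (top, left); with mask=None this degenerates to a hard paste.
    h, w = adjusted_eye.shape[:2]
    if mask is None:
        mask = np.ones((h, w, 1), dtype=np.float32)
    region = target[top:top + h, left:left + w].astype(np.float32)
    blended = mask * adjusted_eye.astype(np.float32) + (1.0 - mask) * region
    target[top:top + h, left:left + w] = blended.astype(target.dtype)
    return target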
To make the gaze adjustment method provided by the present application clearer, an example is described below with reference to FIG. 4. After the target image is acquired, face detection is performed on the target image to obtain the target eye image; key-point detection is performed on the target eye image to obtain the target key points; target anchor frames are generated based on the target key points; the desired adjustment angle is calculated; the target eye image, the target anchor frames, and the desired adjustment angle are input into the gaze adjustment network to obtain the adjusted eye image; and the adjusted eye image is fused with the target image to obtain the fused image.
In summary, the gaze adjustment method provided by this embodiment acquires a target image including a target eye image; acquires a gaze adjustment network; and obtains an adjusted eye image based on the target eye image and the gaze adjustment network, the difference between the gaze angle of the target eye image and the gaze angle of the adjusted eye image being the desired adjustment angle. This solves the problem of low gaze-correction efficiency caused by modifying hardware so that the user views the screen and the camera at the same time. The gaze adjustment network comprises a first network model, a second network model connected to the first network model, and a third network model connected to the second network model; the first network model can generate a coded image from the desired adjustment angle, the second network model can correct the input image based on the coded image, and the third network model adjusts the pixel positions and colors of the corrected eye image. Gaze adjustment of the eye images in the target image is thus achieved by the gaze adjustment network alone, without changing hardware, which improves gaze-correction efficiency.
In addition, because the corrected eye image may contain eye pixels that fall outside the eye contour, the third network model adjusts pixel positions so that all eye pixels in the output adjusted eye image lie within the eye contour, improving the realism of the adjusted eye image.
In addition, because the color of the corrected eye image may not match the actual eye color, the color adjustment performed by the third network model ensures that the color of the output adjusted eye image better matches the actual eye color, improving the realism of the adjusted eye image.
FIG. 5 is a block diagram of a gaze adjustment device according to an embodiment of the present application. The device comprises at least the following modules: an image acquisition module 510, a network acquisition module 520, and a gaze adjustment module 530.
An image acquisition module 510 for acquiring a target image including a target eye image;
a network acquisition module 520, configured to acquire a gaze adjustment network, where the gaze adjustment network includes a first network model, a second network model connected to the first network model, and a third network model connected to the second network model; the first network model is used to generate, based on an input desired adjustment angle, a coded image carrying that angle; the second network model is used to generate, based on the input image and the coded image, a corrected eye image; and the third network model is used to perform pixel-position adjustment and color adjustment on the corrected eye image;
a gaze adjustment module 530, configured to obtain an adjusted eye image based on the target eye image and the gaze adjustment network, where the difference between the gaze angle of the target eye image and the gaze angle of the adjusted eye image is the desired adjustment angle.
Reference is made to the above-described method embodiments for relevant details.
It should be noted that the gaze adjustment device provided in the above embodiment is described only by way of the division of functional modules given above. In practical applications, the functions may be assigned to different functional modules as needed; that is, the internal structure of the gaze adjustment device may be divided into different functional modules to complete all or part of the functions described above. In addition, the gaze adjustment device provided in the above embodiment and the gaze adjustment method embodiment belong to the same concept; the specific implementation process is detailed in the method embodiment and is not repeated here.
FIG. 6 is a block diagram of a gaze adjustment device according to an embodiment of the present application. The device comprises at least a processor 601 and a memory 602.
Processor 601 may include one or more processing cores such as: 4 core processors, 8 core processors, etc. The processor 601 may be implemented in at least one hardware form of a DSP (Digital Signal Processing), an FPGA (Field-Programmable Gate Array), and a PLA (Programmable Logic Array). The processor 601 may also include a main processor and a coprocessor, where the main processor is a processor for Processing data in an awake state, and is also called a Central Processing Unit (CPU); a coprocessor is a low power processor for processing data in a standby state. In some embodiments, the processor 601 may be integrated with a GPU (Graphics Processing Unit), which is responsible for rendering and drawing the content required to be displayed on the display screen. In some embodiments, processor 601 may also include an AI (Artificial Intelligence) processor for processing computational operations related to machine learning.
Memory 602 may include one or more computer-readable storage media, which may be non-transitory. The memory 602 may also include high-speed random access memory, as well as non-volatile memory, such as one or more magnetic disk storage devices, flash memory storage devices. In some embodiments, a non-transitory computer readable storage medium in memory 602 is used to store at least one instruction for execution by processor 601 to implement the gaze adjustment method provided by method embodiments herein.
In some embodiments, the gaze adjustment device may further comprise a peripheral interface and at least one peripheral. The processor 601, the memory 602, and the peripheral interface may be connected by buses or signal lines, and each peripheral may be connected to the peripheral interface via a bus, signal line, or circuit board. Illustratively, peripheral devices include, but are not limited to, a radio frequency circuit, a touch display screen, an audio circuit, and a power supply.
Of course, the gaze adjustment device may also include fewer or more components, which this embodiment does not limit.
Optionally, the present application further provides a computer-readable storage medium, in which a program is stored, and the program is loaded and executed by a processor to implement the gaze adjustment method of the above method embodiment.
Optionally, the present application further provides a computer program product, which includes a computer-readable storage medium in which a program is stored, the program being loaded and executed by a processor to implement the gaze adjustment method of the above method embodiment.
For brevity, not all possible combinations of the technical features of the above embodiments are described; however, as long as such combinations are not contradictory, they should be considered within the scope of this specification.
The above embodiments express only several implementations of the present application, and their description is specific and detailed, but they should not therefore be construed as limiting the scope of the patent. It should be noted that those of ordinary skill in the art can make several variations and improvements without departing from the concept of the present application, all of which fall within the protection scope of the present application. Therefore, the protection scope of this patent shall be subject to the appended claims.

Claims (10)

1. A gaze adjustment method, the method comprising:
acquiring a target image including a target eye image;
acquiring a gaze adjustment network, wherein the gaze adjustment network comprises a first network model, a second network model connected to the first network model, and a third network model connected to the second network model; the first network model is used to generate, based on an input desired adjustment angle, a coded image carrying that angle; the second network model is used to generate, based on the input image and the coded image, a corrected eye image obtained by correcting the input image; and the third network model is used to perform pixel-position adjustment and color adjustment on the corrected eye image;
obtaining an adjusted eye image based on the target eye image and the gaze adjustment network, wherein the difference between the gaze angle of the target eye image and the gaze angle of the adjusted eye image is the desired adjustment angle;
wherein, before the acquiring of the gaze adjustment network, the method comprises:
acquiring a plurality of sample images, the plurality of sample images including images having various gaze angles;
obtaining a loss function;
and training a preset network model by using the plurality of sample images and the loss function to obtain the gaze adjustment network.
2. The method of claim 1, wherein the training of a preset network model by using the plurality of sample images and the loss function to obtain the gaze adjustment network comprises:
performing eye key-point detection on each sample image to obtain n eye key points, wherein n is a positive integer;
for each sample image, determining a sample anchor frame based on the position of each of the corresponding n eye key points, to obtain a sample anchor frame corresponding to each key point;
combining the plurality of sample images pairwise, and determining the gaze-angle difference in each image combination to obtain a training set; the training set comprises a plurality of groups of training data, wherein each group of training data comprises a reference sample image, a sample image to be adjusted, the sample anchor frames corresponding to the sample image to be adjusted, and the gaze-angle difference of the sample image to be adjusted relative to the reference sample image;
and inputting the gaze-angle difference in the training data, the sample image to be adjusted, and the sample anchor frames corresponding to the sample image to be adjusted into the preset network model, and training the preset network model by using the loss function and the reference sample image in the training data to obtain the gaze adjustment network.
3. The method according to claim 2, wherein the inputting of the gaze-angle difference in the training data, the sample image to be adjusted, and the sample anchor frames corresponding to the sample image to be adjusted into the preset network model comprises:
inputting the gaze-angle difference in the training data into the first network model of the preset network model;
inputting the sample image to be adjusted, the sample anchor frames corresponding to the sample image to be adjusted, and the output result of the first network model into the second network model of the preset network model;
and inputting the sample image to be adjusted and the output result of the second network model into the third network model of the preset network model.
4. The method of claim 1, wherein the loss function comprises a first loss function, a second loss function, and a third loss function;
the first loss function is used for minimizing the sum of differences of the adjusted eye image and a real image at a pixel level;
the second loss function is used for minimizing the difference between the eye structure of the model output result of the preset network model and the eye structure of the real image;
The third loss function is used for minimizing a difference between an eye color of a model output result of the preset network model and an eye color of a real image.
5. The method according to any one of claims 1 to 4, wherein the first network model is an encoder, the second network model is a correction network, and the third network model comprises a pixel relocation branch model and a color adjustment branch model;
the pixel relocation branch model comprises a preset activation function and a pixel relocation model connected to the preset activation function, and is used to converge the output result of the second network model so that local pixels exceeding an expected range in the output result converge into the expected range; the color adjustment branch model comprises a color adjustment network model and a color adjustment model connected to the color adjustment network model and the pixel relocation model, and is used to perform color adjustment on the output result of the pixel relocation model in a color adjustment mode indicated by the output result of the color adjustment network model.
6. The method according to any one of claims 1 to 4, wherein obtaining the adjusted eye image based on the target eye image and the gaze adjustment network comprises:
determining a target key point of the target eye image;
generating a target anchor frame based on the target key points;
acquiring a desired adjustment angle of the target eye image;
and inputting the target eye image, the target anchor frame, and the desired adjustment angle into the gaze adjustment network to obtain an adjusted eye image.
7. The method of claim 6, wherein the third network model comprises a pixel relocation branch model and a color adjustment branch model, and wherein inputting the target eye image, the target anchor frame, and the desired adjustment angle into the gaze adjustment network to obtain an adjusted eye image comprises:
inputting the desired adjustment angle into the first network model to obtain a coded image carrying the desired adjustment angle;
inputting the coded image, the target eye image and the target anchor frame into the second network model to obtain the corrected eye image;
inputting the target eye image and the corrected eye image into the pixel relocation branch model to obtain a converged eye image;
and inputting the converged eye image and the corrected eye image into the color adjustment branch model to obtain the adjusted eye image.
8. A gaze adjustment device, the device comprising:
an image acquisition module for acquiring a target image including a target eye image;
a network acquisition module, configured to acquire a gaze adjustment network, wherein the gaze adjustment network comprises a first network model, a second network model connected to the first network model, and a third network model connected to the second network model; the first network model is used to generate, based on an input desired adjustment angle, a coded image carrying that angle; the second network model is used to generate, based on the input image and the coded image, a corrected eye image obtained by correcting the input image; and the third network model is used to perform pixel-position adjustment and color adjustment on the corrected eye image;
a gaze adjustment module, configured to obtain an adjusted eye image based on the target eye image and the gaze adjustment network, wherein the difference between the gaze angle of the target eye image and the gaze angle of the adjusted eye image is the desired adjustment angle;
wherein, before the gaze adjustment network is acquired, the device comprises:
a module for acquiring a plurality of sample images, the plurality of sample images including images having various gaze angles;
a module for obtaining a loss function;
and a module for training a preset network model by using the plurality of sample images and the loss function to obtain the gaze adjustment network.
9. A gaze adjustment device, the device comprising a processor and a memory; the memory stores a program that is loaded and executed by the processor to implement the gaze adjustment method of any one of claims 1 to 7.
10. A computer-readable storage medium, characterized in that the storage medium has stored therein a program which, when being executed by a processor, is adapted to carry out the gaze adjustment method according to any one of claims 1 to 7.
CN202010114683.9A 2020-02-25 2020-02-25 Gaze adjustment method and device, and storage medium Active CN111339928B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202010114683.9A CN111339928B (en) 2020-02-25 2020-02-25 Gaze adjustment method and device, and storage medium
PCT/CN2020/121519 WO2021169325A1 (en) 2020-02-25 2020-10-16 Gaze adjustment method and apparatus, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010114683.9A CN111339928B (en) 2020-02-25 2020-02-25 Gaze adjustment method and device, and storage medium

Publications (2)

Publication Number Publication Date
CN111339928A CN111339928A (en) 2020-06-26
CN111339928B (en) 2022-06-28

Family

ID=71185564

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010114683.9A Active CN111339928B (en) 2020-02-25 2020-02-25 Gaze adjustment method and device, and storage medium

Country Status (2)

Country Link
CN (1) CN111339928B (en)
WO (1) WO2021169325A1 (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111339928B (en) * 2020-02-25 2022-06-28 苏州科达科技股份有限公司 Gaze adjustment method and device, and storage medium
TWI792137B (en) * 2020-12-31 2023-02-11 瑞昱半導體股份有限公司 Gaze correction method
CN112733795B (en) * 2021-01-22 2022-10-11 腾讯科技(深圳)有限公司 Method, device and equipment for correcting sight of face image and storage medium
CN112733794B (en) * 2021-01-22 2021-10-15 腾讯科技(深圳)有限公司 Method, device and equipment for correcting sight of face image and storage medium
CN112733797B (en) * 2021-01-22 2021-10-08 腾讯科技(深圳)有限公司 Method, device and equipment for correcting sight of face image and storage medium
CN113362243B (en) * 2021-06-03 2024-06-11 Oppo广东移动通信有限公司 Model training method, image processing method and device, medium and electronic equipment
CN114049442B (en) * 2021-11-19 2024-07-23 北京航空航天大学 Three-dimensional face sight line calculation method
CN117094966B (en) * 2023-08-21 2024-04-05 青岛美迪康数字工程有限公司 Tongue image identification method and device based on image amplification and computer equipment

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6806898B1 (en) * 2000-03-20 2004-10-19 Microsoft Corp. System and method for automatically adjusting gaze and head orientation for video conferencing
CN103838255A (en) * 2012-11-27 2014-06-04 英业达科技有限公司 Sight angle adjusting system of display device and method thereof
CN204168406U (en) * 2014-08-20 2015-02-18 深圳市融创天下科技有限公司 A kind of sight line means for correcting for video calling
RU2596062C1 (en) * 2015-03-20 2016-08-27 Автономная Некоммерческая Образовательная Организация Высшего Профессионального Образования "Сколковский Институт Науки И Технологий" Method for correction of eye image using machine learning and method of machine learning
US9538130B1 (en) * 2015-12-10 2017-01-03 Dell Software, Inc. Dynamic gaze correction for video conferencing
US10423830B2 (en) * 2016-04-22 2019-09-24 Intel Corporation Eye contact correction in real time using neural network based machine learning
CN106569611A (en) * 2016-11-11 2017-04-19 努比亚技术有限公司 Apparatus and method for adjusting display interface, and terminal
TWI637288B (en) * 2017-10-11 2018-10-01 緯創資通股份有限公司 Image processing method and system for eye-gaze correction
CN109978804B (en) * 2019-03-08 2021-02-26 清华大学 Human eye sight line correction method and system based on deep learning
CN111339928B (en) * 2020-02-25 2022-06-28 苏州科达科技股份有限公司 Gaze adjustment method and device, and storage medium

Also Published As

Publication number Publication date
CN111339928A (en) 2020-06-26
WO2021169325A1 (en) 2021-09-02

Similar Documents

Publication Publication Date Title
CN111339928B (en) Gaze adjustment method and device, and storage medium
US11632537B2 (en) Method and apparatus for obtaining binocular panoramic image, and storage medium
US9639914B2 (en) Portrait deformation method and apparatus
CN108846793B (en) Image processing method and terminal equipment based on image style conversion model
US11900557B2 (en) Three-dimensional face model generation method and apparatus, device, and medium
US11238569B2 (en) Image processing method and apparatus, image device, and storage medium
US9635311B2 (en) Image display apparatus and image processing device
CN109272566A (en) Movement expression edit methods, device, equipment, system and the medium of virtual role
US10082867B2 (en) Display control method and display control apparatus
CN111882627B (en) Image processing method, video processing method, apparatus, device and storage medium
WO2018137455A1 (en) Image interaction method and interaction apparatus
CN106920274A (en) Mobile terminal 2D key points rapid translating is the human face model building of 3D fusion deformations
CN111183405A (en) Adjusting digital representation of head region
US20180374258A1 (en) Image generating method, device and computer executable non-volatile storage medium
CN110838084A (en) Image style transfer method and device, electronic equipment and storage medium
CN111476151B (en) Eyeball detection method, device, equipment and storage medium
CN111311733A (en) Three-dimensional model processing method and device, processor, electronic device and storage medium
CN111028318A (en) Virtual face synthesis method, system, device and storage medium
WO2023103813A1 (en) Image processing method and apparatus, device, storage medium, and program product
JP2017212720A (en) Image processing apparatus, image processing method, and program
US20150215602A1 (en) Method for ajdusting stereo image and image processing device using the same
US20220207667A1 (en) Gaze direction correction method
CN111462294B (en) Image processing method, electronic equipment and computer readable storage medium
CN111275648B (en) Face image processing method, device, equipment and computer readable storage medium
CN113642364B (en) Face image processing method, device, equipment and computer readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant