WO2022156622A1 - Method, apparatus, and device for sight correction of a face image, computer-readable storage medium, and computer program product - Google Patents

Method, apparatus, and device for sight correction of a face image, computer-readable storage medium, and computer program product

Info

Publication number
WO2022156622A1
WO2022156622A1 (PCT/CN2022/072302)
Authority
WO
WIPO (PCT)
Prior art keywords
eye
corrected
image
correction model
sight
Prior art date
Application number
PCT/CN2022/072302
Other languages
English (en)
French (fr)
Inventor
蒋正锴
彭瑾龙
贺珂珂
余晓铭
易阳
涂娟辉
周易
刘程浩
王亚彪
邰颖
汪铖杰
李季檩
黄飞跃
Original Assignee
腾讯科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 腾讯科技(深圳)有限公司
Publication of WO2022156622A1
Priority to US17/977,576 (published as US20230072627A1)

Classifications

    • G06T5/60
    • G06T5/77
    • G06T3/20 Linear translation of a whole image or part thereof, e.g. panning
    • G06T7/20 Analysis of motion
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06V10/82 Arrangements for image or video recognition or understanding using neural networks
    • G06V40/165 Detection; Localisation; Normalisation of human faces using facial parts and geometric relationships
    • G06V40/171 Local features and components; Facial parts; Occluding parts, e.g. glasses; Geometrical relationships
    • G06V40/18 Eye characteristics, e.g. of the iris
    • G06V40/193 Preprocessing; Feature extraction (eye characteristics)
    • H04N7/147 Communication arrangements, e.g. identifying the communication as a video-communication, intermediate storage of the signals (two-way working between two video terminals)
    • G06T2207/20081 Training; Learning
    • G06T2207/20084 Artificial neural networks [ANN]
    • G06T2207/30201 Face (subject of image: human being/person)

Definitions

  • The embodiments of the present application are based on, and claim priority to, the Chinese patent application with application number 202110088340.4, filed on January 22, 2021, the entire contents of which are incorporated into the embodiments of the present application by reference.
  • The present application relates to the technical field of artificial intelligence, and in particular to a method, apparatus, and device for sight correction of a face image, a computer-readable storage medium, and a computer program product.
  • Sight correction for objects in images is a typical application of artificial intelligence in graphics and image processing.
  • In the related art, a sight correction solution based on a fixed head posture is provided; it corrects sight well in images with a fixed head posture. However, in scenarios such as video conferencing, the user's head posture changes in real time, making that solution unsuitable for such scenarios.
  • Embodiments of the present application provide a method, apparatus, and device for sight correction of a face image, a computer-readable storage medium, and a computer program product. The technical solutions are as follows:
  • An embodiment of the present application provides a method for correcting the line of sight of a face image, the method comprising: acquiring an eye image to be corrected from the face image; determining an eye motion flow field based on the eye image to be corrected and a target line-of-sight direction, wherein the target line-of-sight direction refers to the direction to which the line of sight in the eye image to be corrected is to be corrected, and the eye motion flow field is used to adjust pixel positions in the eye image to be corrected; performing sight correction on the eye image to be corrected based on the eye motion flow field to obtain a corrected eye image; and generating a line-of-sight-corrected face image based on the corrected eye image.
  • An embodiment of the present application provides a training method for a sight correction model, the method comprising: training a first teacher sight correction model based on the motion flow field using eye image samples to be corrected, to obtain a trained first teacher sight correction model, where the trained first teacher sight correction model is used to output the eye motion flow field of the eye image sample to be corrected, and the eye motion flow field is used to adjust pixel positions in the eye image sample to be corrected; training an image-based second teacher sight correction model using the eye image samples to be corrected, to obtain a trained second teacher sight correction model, where the trained second teacher sight correction model is used to output the corrected eye image sample of the eye image sample to be corrected; and performing knowledge distillation training on a student sight correction model using the trained first and second teacher sight correction models, to obtain a trained student sight correction model.
  • An embodiment of the present application provides a device for correcting sight of a face image, the device comprising:
  • an eye image acquisition module configured to acquire an eye image to be corrected from the face image
  • the motion flow field generation module is configured to determine the eye motion flow field based on the eye image to be corrected and the target line-of-sight direction; wherein the target line-of-sight direction refers to the direction to which the eye's line of sight in the eye image to be corrected is to be corrected, and the eye motion flow field is used to adjust pixel positions in the eye image to be corrected;
  • a sight correction processing module configured to perform sight correction processing on the to-be-corrected eye image based on the eye motion flow field to obtain a corrected eye image
  • the eye image integration module is configured to generate a sight-corrected face image based on the corrected eye image.
  • the embodiment of the present application provides a training device for a vision correction model, and the device includes:
  • the first teacher model training module is configured to train the first teacher sight correction model based on the motion flow field using the eye image samples to be corrected, to obtain the trained first teacher sight correction model; the trained first teacher sight correction model is used to output the eye motion flow field of the eye image sample to be corrected, and the eye motion flow field is used to adjust pixel positions in the eye image sample to be corrected;
  • the second teacher model training module is configured to train the image-based second teacher sight correction model using the eye image samples to be corrected, to obtain the trained second teacher sight correction model; the trained second teacher sight correction model is used to output the corrected eye image sample of the eye image sample to be corrected;
  • the student model training module is configured to perform knowledge distillation training on the student sight correction model through the trained first teacher sight correction model and the trained second teacher sight correction model, to obtain the trained student sight correction model.
  • An embodiment of the present application provides a computer device, the computer device comprising a processor and a memory, the memory storing at least one instruction, at least one program, a code set, or an instruction set, which is loaded and executed by the processor to implement the above method for sight correction of a face image or the above training method for a sight correction model.
  • An embodiment of the present application provides a computer-readable storage medium, where at least one instruction, at least one program, a code set, or an instruction set is stored in the storage medium, and is loaded and executed by a processor to implement the above method for sight correction of a face image or the above training method for a sight correction model.
  • Embodiments of the present application provide a computer program product or computer program, where the computer program product or computer program includes computer instructions, and the computer instructions are stored in a computer-readable storage medium.
  • the processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions, so that the computer device executes the above-mentioned method for correcting the line of sight of the facial image or the above-mentioned method for training the line-of-sight correction model.
  • In the technical solutions provided by the embodiments of the present application, the eye motion flow field is generated by combining the eye image to be corrected and the target line-of-sight direction, and is then used to perform sight correction on the eye image to be corrected to obtain the corrected eye image. Because sight correction does not require a fixed head posture, the solution has better sight correction capability in scenarios where the head posture changes in real time.
  • FIG. 1 is a schematic diagram of a solution implementation environment provided by an embodiment of the present application.
  • FIG. 2 is a schematic diagram of the angle formed between the camera, the human eye, and the position at which the human eye gazes in a video conference scenario;
  • FIG. 3 is a flowchart of a method for correcting sight lines of a face image provided by an embodiment of the present application
  • FIG. 4A is a schematic diagram before sight correction provided by an embodiment of the present application.
  • FIG. 4B is a schematic diagram after sight correction provided by an embodiment of the present application.
  • FIG. 5 is a flowchart of a method for correcting sight lines of a face image provided by an embodiment of the present application
  • FIG. 6 is a schematic diagram of a vision correction model provided by an embodiment of the present application.
  • FIG. 7 is a flowchart of a training method of a sight correction model provided by an embodiment of the present application.
  • FIG. 8 is a schematic diagram of a training process of a first teacher's sight correction model provided by an embodiment of the present application.
  • FIG. 9 is a schematic diagram of a training process of a second teacher's sight correction model provided by an embodiment of the present application.
  • FIG. 10 is a schematic diagram of a training process of a student sight correction model provided by an embodiment of the present application.
  • FIG. 11 is a block diagram of a device for correcting sight lines of a face image provided by an embodiment of the present application.
  • FIG. 12 is a block diagram of a training device for a vision correction model provided by an embodiment of the present application.
  • FIG. 13 is a schematic structural diagram of a computer device provided by an embodiment of the present application.
  • Artificial intelligence is a theory, method, technology and application system that uses digital computers or machines controlled by digital computers to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge and use knowledge to obtain the best results.
  • artificial intelligence is a comprehensive technique of computer science that attempts to understand the essence of intelligence and produce a new kind of intelligent machine that can respond in a similar way to human intelligence.
  • Artificial intelligence is to study the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning and decision-making.
  • Artificial intelligence technology is a comprehensive discipline, involving a wide range of fields, including both hardware-level technology and software-level technology.
  • the basic technologies of artificial intelligence generally include technologies such as sensors, special artificial intelligence chips, cloud computing, distributed storage, big data processing technology, operation/interaction systems, and mechatronics.
  • Artificial intelligence software technology mainly includes computer vision technology, speech processing technology, natural language processing technology, and machine learning/deep learning.
  • Computer vision is a science that studies how to make machines "see"; it refers to using cameras and computers in place of human eyes to identify, track, and measure targets, and to further perform graphics processing so that the processed images are more suitable for human observation or for transmission to instruments for detection. As a scientific discipline, computer vision studies related theories and technologies, attempting to build artificial intelligence systems that can obtain information from images or multidimensional data.
  • Computer vision technology usually includes image processing, image recognition, image semantic understanding, image retrieval, Optical Character Recognition (OCR, Optical Character Recognition), video processing, video semantic understanding, video content/behavior recognition, three-dimensional object reconstruction, three-dimensional (3D, 3 Dimension) technology, virtual reality, augmented reality, simultaneous positioning and map construction and other technologies, as well as common biometric identification technologies such as face recognition and fingerprint recognition.
  • Machine Learning is a multi-field interdisciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory and other disciplines. It specializes in how computers simulate or realize human learning behaviors to acquire new knowledge or skills, and to reorganize existing knowledge structures to continuously improve their performance.
  • Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent, and its applications are in all fields of artificial intelligence.
  • Machine learning and deep learning usually include artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, teaching learning and other techniques.
  • FIG. 1 shows a schematic diagram of a solution implementation environment provided by an embodiment of the present application.
  • the solution implementation environment can be implemented as a video conference system.
  • the solution implementation environment may include a server 10 and a plurality of terminals 20 .
  • the terminal 20 may be an electronic device such as a mobile phone, a tablet computer, a personal computer (PC, Personal Computer), a smart TV, a multimedia playback device, and the like.
  • a client terminal running a video conference application program may be installed in the terminal 20, so as to provide the user with a video conference function.
  • the server 10 may be a server, a server cluster composed of multiple servers, or a cloud computing service center.
  • the server 10 may be a background server of the video conference application, configured to provide background services for the above-mentioned client.
  • the terminal 20 can run a client of a video conference application, and the client can collect face images during a video session, generate a line-of-sight correction request based on the face images, and submit the request to the server, so that when the executable instructions in the storage medium of the server 10 are executed by the processor, the method for sight correction of a face image provided by the embodiments of the present application is implemented, and the eye image to be corrected is obtained from the face image.
  • the eye motion flow field in the embodiments of the present application can be obtained by calling the student sight correction model. Of course, before the student sight correction model is used, it must first be trained, which includes:
  • training the first teacher sight correction model based on the motion flow field using the eye image samples to be corrected, to obtain the trained first teacher sight correction model, which is used to output the eye motion flow field of the eye image sample to be corrected; training the image-based second teacher sight correction model using the eye image samples to be corrected, to obtain the trained second teacher sight correction model, which is used to output the corrected eye image sample of the eye image sample to be corrected; and performing knowledge distillation training on the student sight correction model through the trained first teacher sight correction model and the trained second teacher sight correction model, to obtain the trained student sight correction model, which is used to correct the eye image to be corrected.
  • in a video conference scenario, the user's line of sight is generally directed at the other party on the screen 21, while the camera 22 is not inside the screen 21 but at another position (above the screen 21, as shown in FIG. 2). As a result, there is often an included angle between the camera 22, the human eye, and the position at which the human eye gazes (the angle θ shown by the dotted lines in FIG. 2). From the camera's perspective, the user's line of sight is not directed at the other user but downward, which affects the user's communication experience.
  • the method for sight correction of a face image provided by the embodiments of the present application offers an editing function for changing the line of sight, supporting correction of an object's line of sight in images and videos. For example, if the direction of the user's line of sight in the original image is a, the line of sight can be corrected to direction b by the method provided by the embodiments of the present application, thereby realizing the line-of-sight editing function so that the image conveys line-of-sight information different from the original image.
  • in scenarios such as video conferencing, the user's head posture changes in real time, so the fixed-head-posture sight correction solution provided in the related art cannot be applied.
  • the technical solutions provided by the embodiments of the present application generate an eye motion flow field by combining the eye image to be corrected and the target line-of-sight direction, and then use the eye motion flow field to perform sight correction on the eye image to be corrected, generating the corrected eye image. Since the embodiments of the present application do not require a fixed head posture for sight correction, they provide good sight correction capability in scenarios such as video conferences, video calls, and live video broadcasts, where the user's head posture may change in real time.
  • the terminal 20 may run a client of a video conference application, and the client can collect face images during a video session and generate a line-of-sight correction request based on the face images, so that when the executable instructions in the storage medium of the terminal 20 are executed by the processor, the method for sight correction of a face image provided by the embodiments of the present application is implemented: the eye image to be corrected is obtained from the face image; the eye motion flow field is determined based on the eye image to be corrected and the target line-of-sight direction; sight correction is performed on the eye image to be corrected based on the eye motion flow field to obtain the corrected eye image; and the sight-corrected face image is generated based on the corrected eye image. The eye motion flow field in the embodiments of the present application can be obtained by calling the student sight correction model.
  • the student's sight correction model needs to be trained, specifically including:
  • training the first teacher sight correction model based on the motion flow field using the eye image samples to be corrected, to obtain the trained first teacher sight correction model, which is used to output the eye motion flow field of the eye image sample to be corrected; training the image-based second teacher sight correction model using the eye image samples to be corrected, to obtain the trained second teacher sight correction model, which is used to output the corrected eye image sample of the eye image sample to be corrected; and performing knowledge distillation training on the student sight correction model through the trained first teacher sight correction model and the trained second teacher sight correction model, to obtain the trained student sight correction model, which is used to correct the eye image to be corrected.
  • FIG. 3 shows a flowchart of a method for correcting sight lines of a face image provided by an embodiment of the present application.
  • the execution body of each step of the method may be a terminal device such as a mobile phone, a tablet computer, a PC, etc., or a server.
  • the method may include the following steps (310-340):
  • Step 310 Obtain the eye image to be corrected from the face image.
  • a face image refers to an image including a face
  • the face image may be a photo or a picture, or an image frame in a video, which is not limited in this embodiment of the present application.
  • the eye image to be corrected is cropped from the face image and contains the image of the eye region whose line of sight needs to be corrected.
  • generally, a face image contains two eyes, left and right (such as human eyes), so two eye images to be corrected can be obtained from one face image: one corresponding to the left eye and the other corresponding to the right eye.
  • alternatively, a single eye image to be corrected may contain both the left and right eyes.
  • Step 320 Determine the eye motion flow field based on the eye image to be corrected and the target line of sight direction.
  • the target line-of-sight direction refers to the direction to which the eye's line of sight in the eye image to be corrected is to be corrected (that is, the specified direction toward which the line of sight is corrected).
  • for example, the target line-of-sight direction may be the direction directly facing the camera, so that the eye's line of sight in the eye image to be corrected is corrected to face the camera.
  • the target line-of-sight direction includes a pitch angle (Pitch) and a yaw angle (Yaw). For example, when the target line-of-sight direction is directly facing the camera, both the pitch angle and the yaw angle are equal to 0°.
  • the eye motion flow field is used to adjust the pixel position in the eye image to be corrected.
  • the value of each pixel in the eye motion flow field includes a horizontal displacement and a vertical displacement. The horizontal displacement of a pixel in the eye motion flow field represents the displacement, in the horizontal direction, of the pixel at the same position in the eye image to be corrected, such as the number of pixels displaced horizontally; the vertical displacement of a pixel in the eye motion flow field represents the displacement, in the vertical direction, of the pixel at the same position in the eye image to be corrected, such as the number of pixels displaced vertically.
  • the eye motion flow field can include two single-channel images: a first-dimension image used to store the horizontal displacement of each pixel, and a second-dimension image used to store the vertical displacement of each pixel. Moreover, the size (including height and width) of the first-dimension and second-dimension images is the same as the size of the eye image to be corrected.
  • Step 330 Perform sight correction processing on the eye image to be corrected based on the eye motion flow field to obtain a corrected eye image.
  • for any pixel in the eye image to be corrected, the horizontal and vertical displacements of the pixel are obtained from the eye motion flow field, and the pixel is then displaced based on those displacements to obtain the corrected eye image.
  • Step 340 based on the corrected eye image, generate a line-of-sight corrected face image.
  • the corrected eye image is integrated into the corresponding position in the original face image, and the corrected eye image is used to cover or replace the above-mentioned to-be-corrected eye image to obtain the face image after sight correction.
  • FIG. 4A and FIG. 4B show a comparison before and after applying the sight correction method provided by the embodiments of the present application: FIG. 4A is a face image without sight correction, in which the line of sight 401 of the human eye is offset; FIG. 4B is the face image after sight correction, in which the line of sight 402 of the human eye is focused straight ahead.
  • the technical solutions provided by the embodiments of the present application generate an eye motion flow field by combining the eye image to be corrected and the target line-of-sight direction, and then use the eye motion flow field to perform sight correction on the eye image to be corrected, obtaining a corrected eye image. Because the embodiments of the present application do not require a fixed head posture to perform sight correction, they provide better sight correction capability in scenarios where the user's head posture changes in real time, such as video conferences, video calls, and live video broadcasts.
  • FIG. 5 shows a flowchart of a method for correcting sight lines of a face image provided by an embodiment of the present application.
  • the execution body of each step of the method may be a terminal device such as a mobile phone, a tablet computer, a PC, etc., or a server.
  • the method may include the following steps (510-550):
  • Step 510 Obtain the eye image to be corrected from the face image.
  • face detection is first performed on a face image, it is determined whether a face is included in the face image, and if a face is included, the position of the face is determined. For example, if a face is included in the face image, face keypoint detection is performed. Since the embodiment of the present application focuses on the eye region, only the key points of the eye can be detected, and the key points of other parts such as the mouth and nose do not need to be detected.
  • the minimum circumscribed rectangle of a single eye is determined based on the contour key points of that eye; the minimum circumscribed rectangle is enlarged by a specified multiple to obtain the image capture frame of the single eye; and based on the image capture frame of the single eye, the eye image to be corrected of that eye is cropped from the face image.
  • the smallest enclosing rectangle of a single eye refers to the smallest enclosing rectangle box containing the single eye.
  • the smallest enclosing rectangle of the left eye refers to the smallest enclosing rectangle containing the left eye.
  • the above-mentioned specified multiple may be a preset value, such as 1.5 times, 2 times, or 3 times, etc., which is not limited in this embodiment of the present application.
  • the minimum circumscribed rectangle is proportionally enlarged about its center point to obtain the image capture frame, whose center point coincides with that of the minimum circumscribed rectangle.
  • an image interception technology is used to intercept the image content in the image interception frame of the single eye from the face image, so as to obtain the eye image of the single eye to be corrected.
  • alternatively, the minimum circumscribed rectangle covering both eyes is determined based on the contour key points of all eyes in the face image; it is enlarged by a specified multiple to obtain an image capture frame covering both eyes; and based on that image capture frame, an eye image to be corrected containing both eyes is cropped from the face image.
  • cropping the eye image to be corrected from the face image and performing sight correction only on that image helps reduce the amount of computation in subsequent steps and improves efficiency.
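As an illustration of the cropping procedure above, here is a minimal NumPy sketch; the function and variable names are illustrative assumptions, not from the patent:

```python
import numpy as np

def crop_eye(face_img: np.ndarray, eye_keypoints: np.ndarray, scale: float = 2.0):
    """face_img: H x W x C image; eye_keypoints: N x 2 array of (x, y) contour key points."""
    x_min, y_min = eye_keypoints.min(axis=0)
    x_max, y_max = eye_keypoints.max(axis=0)
    # Center of the minimum circumscribed rectangle of the eye.
    cx, cy = (x_min + x_max) / 2.0, (y_min + y_max) / 2.0
    # Enlarge the rectangle by the specified multiple (e.g. 1.5x, 2x, 3x) about its center.
    half_w = (x_max - x_min) * scale / 2.0
    half_h = (y_max - y_min) * scale / 2.0
    h, w = face_img.shape[:2]
    x0, x1 = max(int(cx - half_w), 0), min(int(cx + half_w), w)
    y0, y1 = max(int(cy - half_h), 0), min(int(cy + half_h), h)
    # The capture frame (x0, y0, x1, y1) is kept so the corrected eye
    # can be pasted back into the face image later.
    return face_img[y0:y1, x0:x1], (x0, y0, x1, y1)
```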
  • Step 520 Determine the eye motion flow field and the eye contour mask based on the eye image to be corrected and the target line of sight direction.
  • the target line of sight direction refers to the line of sight direction in the eye image to be corrected that needs to be corrected
  • the eye motion flow field is used to adjust the pixel position in the eye image to be corrected.
  • the eye contour mask is used to indicate the probability that the pixel positions in the eye image to be corrected belong to the eye region.
  • the eye contour mask can be represented as a one-dimensional image whose size (including height and width) is the same as the size of the eye image to be corrected.
  • the pixel value of a certain pixel in the eye contour mask may be a probability value, indicating the probability that the pixel at the same position in the eye image to be corrected belongs to the eye region.
  • the pixel value whose coordinate is (i, j) in the eye contour mask can be a probability value belonging to the value range of [0, 1], indicating that the coordinate of the eye image to be corrected is (i, j) The probability that the pixel at the location belongs to the eye region.
  • the eye image to be corrected and the target line of sight direction are input into the line of sight correction model, and feature extraction processing is performed on the above input data through the line of sight correction model, and the eye motion flow field and the eye contour mask are output.
  • the sight correction model may be a machine learning model obtained by training a neural network in advance.
  • step 520 can be implemented in the following ways: performing combination processing based on the channel dimension on the eye image to be corrected and the target line of sight direction to obtain combined data; performing feature extraction processing on the combined data through the line of sight correction model to obtain the output data of the line of sight correction model ; Extract the eye motion flow field and eye contour mask from the output data.
  • the eye image to be corrected may include images of three channels, R, G, and B, and the target line-of-sight direction may include images of two channels, pitch angle and yaw angle. The two are combined in the channel dimension to obtain combined data, which may include images of the above five channels (the three channels R, G, and B, plus the pitch-angle and yaw-angle channels).
  • when the target line-of-sight direction has a pitch angle equal to 0° and a yaw angle equal to 0°, the pixel value of each pixel in the image corresponding to the pitch-angle channel is 0, and the pixel value of each pixel in the image corresponding to the yaw-angle channel is also 0.
  • the sight correction model may be a neural network model, eg, it may include an encoding network and a decoding network.
  • the encoding network is used for down-sampling the combined data to obtain feature information of the combined data;
  • the decoding network is used for up-sampling the above-mentioned feature information to obtain output data.
  • the output data can include a 3-channel image (or data): the data of the first and second channels is extracted from the output data to obtain the eye motion flow field, and the data of the third channel is extracted from the output data to obtain the eye contour mask. For example, the data of the first channel serves as the first-dimension image storing the horizontal displacement of each pixel, and the data of the second channel serves as the second-dimension image storing the vertical displacement of each pixel.
  • the height of the eye image to be corrected is H and the width is W.
  • H and W may represent the number of pixels in the height direction and the number of pixels in the width direction, respectively.
  • the eye image to be corrected is a three-channel image of H × W × 3, and the target line-of-sight direction is a two-channel image of H × W × 2; the two are combined in the channel dimension to obtain combined data of H × W × 5.
  • the output data of the sight correction model includes a three-channel image of H × W × 3, from which the data of two channels, H × W × 2, is extracted as the eye motion flow field, and the data of the remaining channel, H × W × 1, as the eye contour mask.
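To make this input/output layout concrete, the following PyTorch sketch shows one possible encoder-decoder with these shapes; the layer sizes and names are illustrative assumptions, not the patent's actual architecture:

```python
import torch
import torch.nn as nn

class SightCorrectionNet(nn.Module):
    """Takes the H x W x 5 combined data (RGB eye image plus two constant
    channels encoding the target pitch/yaw) and outputs H x W x 3
    (2-channel eye motion flow field + 1-channel eye contour mask).
    Assumes H and W are divisible by 4."""
    def __init__(self):
        super().__init__()
        # Encoder: down-samples the combined data to extract features.
        self.encoder = nn.Sequential(
            nn.Conv2d(5, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        # Decoder: up-samples the features back to the input resolution.
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1),
        )

    def forward(self, eye_img, pitch, yaw):
        b, _, h, w = eye_img.shape
        # Encode the target direction as two constant-valued channels.
        angles = torch.stack([
            torch.full((b, h, w), float(pitch), device=eye_img.device),
            torch.full((b, h, w), float(yaw), device=eye_img.device),
        ], dim=1)
        combined = torch.cat([eye_img, angles], dim=1)  # B x 5 x H x W
        out = self.decoder(self.encoder(combined))      # B x 3 x H x W
        flow = out[:, :2]                    # eye motion flow field (dx, dy)
        mask = torch.sigmoid(out[:, 2:3])    # eye contour mask in [0, 1]
        return flow, mask
```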
  • the sight correction model is a student sight correction model obtained by performing knowledge distillation training on multiple teacher sight correction models.
  • the model structure and/or model parameters of the student's vision correction model will be simplified. In this way, it is possible to train a line of sight correction model with excellent sight correction effect and a small size of the model, which is suitable for application on mobile devices such as mobile phones.
  • Step 530 Perform sight correction processing on the eye image to be corrected based on the eye motion flow field to obtain a corrected eye image.
  • for any pixel in the eye image to be corrected, the horizontal and vertical displacements of the pixel are obtained from the eye motion flow field, and the pixel is then displaced based on those displacements to obtain the corrected eye image.
  • y(i, j) = x(i + f(i, j)[0], j + f(i, j)[1]); since i + f(i, j)[0] and j + f(i, j)[1] are floating-point numbers, bilinear interpolation is required to compute the pixel value at the coordinates (i + f(i, j)[0], j + f(i, j)[1]).
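The following NumPy sketch is a direct reading of the formula above, applying the per-pixel displacement described in steps 330 and 530 with bilinear interpolation; it is illustrative (per-pixel loops kept for clarity) and assumes a single-channel image x of shape (H, W) and a flow field f of shape (H, W, 2):

```python
import numpy as np

def warp(x: np.ndarray, f: np.ndarray) -> np.ndarray:
    H, W = x.shape
    y = np.zeros_like(x, dtype=np.float64)
    for i in range(H):
        for j in range(W):
            # Floating-point source coordinates: y(i,j) = x(i + f[i,j,0], j + f[i,j,1])
            si = np.clip(i + f[i, j, 0], 0, H - 1)
            sj = np.clip(j + f[i, j, 1], 0, W - 1)
            i0, j0 = int(np.floor(si)), int(np.floor(sj))
            i1, j1 = min(i0 + 1, H - 1), min(j0 + 1, W - 1)
            di, dj = si - i0, sj - j0
            # Bilinear interpolation over the four neighboring pixels.
            y[i, j] = (x[i0, j0] * (1 - di) * (1 - dj)
                       + x[i1, j0] * di * (1 - dj)
                       + x[i0, j1] * (1 - di) * dj
                       + x[i1, j1] * di * dj)
    return y
```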
  • Step 540 Perform adjustment processing on the corrected eye image based on the eye contour mask to obtain an adjusted eye image.
  • step 540 can be implemented in the following manner: multiply the eye contour mask and the pixel value corresponding to the same position in the corrected eye image to obtain the first intermediate image; and combine the mapped image corresponding to the eye contour mask with The pixel values corresponding to the same position in the eye image to be corrected are multiplied to obtain a second intermediate image; the pixel values corresponding to the same position in the first intermediate image and the second intermediate image are added and processed to obtain the adjusted eye image .
  • the pixel value at each position in the eye contour mask is a probability value in the range [0, 1], and the value at any position in the mapped image corresponding to the eye contour mask is 1 minus the probability value at the same position in the mask.
  • adjusted eye image = eye contour mask × corrected eye image + (1 − eye contour mask) × to-be-corrected eye image.
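In code, the adjustment formula above is a per-pixel convex combination; a minimal NumPy sketch, assuming an H × W mask of probabilities in [0, 1] and H × W × 3 images:

```python
import numpy as np

def adjust(corrected: np.ndarray, to_be_corrected: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """corrected / to_be_corrected: H x W x 3 images; mask: H x W probabilities in [0, 1]."""
    m = mask[..., None]  # broadcast the single-channel mask over the color channels
    # First intermediate image: mask * corrected; second: (1 - mask) * original.
    return m * corrected + (1.0 - m) * to_be_corrected
```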
  • Step 550 based on the adjusted eye image, generate a line-of-sight corrected face image.
  • the eye image to be corrected in the face image is replaced with the adjusted eye image to obtain the face image after sight correction.
  • the adjusted eye image is integrated back into the face image at the position of the image capture frame to obtain an integrated image; image harmonization is then performed at the position of the image capture frame in the integrated image to obtain the sight-corrected face image.
  • the purpose of the image harmony processing is to eliminate the boundary traces at the position of the image clipping frame.
  • the method used for image harmonization is not limited, and may be, for example, Gaussian blur, erosion and dilation, or a deep-learning-based image harmonization method.
  • the following method is used to perform image harmony processing on the position of the image interception frame in the integrated image, so as to obtain a face image after sight correction:
  • the pixel value of the initialization mask image at the position of the image interception frame is 1, and the pixel value of the rest of the position is 0;
  • the size of the original face image is C × H × W, where C is the number of channels (e.g., the three channels R, G, and B), H is the height (e.g., the number of pixels in the height direction), and W is the width (e.g., the number of pixels in the width direction).
  • the above image capture frame is a rectangular frame of size h × w at a target position in the face image, where h is its height (e.g., the number of pixels in the height direction) and w is its width (e.g., the number of pixels in the width direction).
  • the sizes of the eye image to be corrected and the corrected eye image are both c × h × w, where c is the number of channels (e.g., the three channels R, G, and B).
  • for an initialization mask image of size C × H × W: in each of the C single-channel H × W images, the pixel values inside the h × w image capture frame at the above target position are set to 1, and the pixel values of the other regions outside the capture frame are set to 0; this yields the initialization mask image.
  • the purpose of the erosion operation is to eliminate boundary points between objects. For example, an ellipse template can be used to perform erosion on the initialization mask image to obtain an eroded mask image.
  • Gaussian blur also known as Gaussian smoothing, is used to reduce image noise and level of detail.
  • Gaussian blurring may be performed on the eroded mask image to obtain the processed mask image.
  • after Gaussian blurring, the value of each pixel is in the range [0, 1]; in particular, pixels at the boundary between the 0 and 1 regions take values between 0 and 1, achieving a smooth transition.
  • the pixel value of each position in the processed mask image is a value that belongs to the value range of [0,1].
  • face image after sight correction = processed mask image × integrated image + (1 − processed mask image) × face image.
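A sketch of this harmonization pipeline using OpenCV; the elliptical-kernel and Gaussian-blur sizes are illustrative assumptions:

```python
import cv2
import numpy as np

def harmonize(face_img: np.ndarray, integrated_img: np.ndarray, box) -> np.ndarray:
    x0, y0, x1, y1 = box
    H, W = face_img.shape[:2]
    mask = np.zeros((H, W), dtype=np.float32)
    mask[y0:y1, x0:x1] = 1.0  # 1 inside the image capture frame, 0 elsewhere
    # Erosion with an ellipse template eliminates boundary points.
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (15, 15))
    mask = cv2.erode(mask, kernel)
    # Gaussian blur yields values between 0 and 1 near the boundary (smooth transition).
    mask = cv2.GaussianBlur(mask, (21, 21), 0)
    m = mask[..., None]
    # Blend: processed mask x integrated image + (1 - processed mask) x face image.
    out = m * integrated_img.astype(np.float32) + (1.0 - m) * face_img.astype(np.float32)
    return out.astype(face_img.dtype)
```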
  • the boundary traces located at the position of the image interception frame in the integrated image can be eliminated, so that the finally obtained facial image after sight correction has no obvious splicing traces, and the effect is better.
  • the eye image to be corrected and the target line-of-sight direction are input into the sight correction model, which outputs the eye motion flow field and the eye contour mask; the eye image to be corrected is transformed based on the eye motion flow field to generate a transformed eye image (equivalent to the corrected eye image in step 530), and the transformed eye image is then adjusted based on the eye contour mask to obtain the final corrected eye image (equivalent to the adjusted eye image in step 540).
  • the eye image to be corrected, cropped from the face image, includes not only the eye region inside the eye contour but also the non-eye region outside it. The adjusted eye image is obtained by using the eye contour mask to adjust the corrected eye image, and this adjusted image serves as the final corrected eye image. In this way, the eye region inside the eye contour retains the result of pixel displacement by the eye motion flow field, while the non-eye region outside the contour retains more of the original image information. Through this attention-like mechanism, the original eye image to be corrected and the eye image corrected by the motion flow field are fused, ensuring that only the image content inside the eye contour undergoes sight correction while the content outside the contour is left unchanged, which improves the sight correction effect of the final corrected eye image.
  • the content involved in using the sight correction model corresponds to the content involved in its training process; where the two overlap, the description on one side applies equally to the other.
  • FIG. 7 shows a flowchart of a training method for a sight correction model provided by an embodiment of the present application.
  • the execution body of each step of the method may be a computer device such as a computer or a server.
  • the method may include the following steps (710-730):
  • Step 710 The first teacher sight correction model based on the motion flow field is trained using the eye image samples to be corrected, to obtain the trained first teacher sight correction model, which is used to output the eye motion flow field of the eye image sample to be corrected; the eye motion flow field is used to adjust pixel positions in the eye image sample to be corrected.
  • the first teacher's sight correction model may be a neural network model.
  • the input data of the model includes the eye image samples to be corrected and the target gaze direction
  • the output data includes the eye motion flow field and the eye contour mask.
  • step 710 may include the following sub-steps:
  • each training sample includes two images of the same person captured at the same head posture angle but with different lines of sight: one image may have any line-of-sight direction and serves as the eye image sample to be corrected, and the other image has the target line-of-sight direction and serves as the target corrected eye image.
  • different training samples can be different characters or have different head poses. That is, the training sample set of the model may include multiple training samples, and the multiple training samples may include training samples with different characters, including training samples with different head poses, so that the trained model can adapt to different characters and poses. Different head poses improve the robustness of the model.
  • the image sample to be corrected and the target line of sight direction are combined based on the channel dimension to obtain combined data; the feature extraction process is performed on the combined data through the first teacher's line of sight correction model to obtain output data; from the output The eye motion flow field and the eye contour mask are extracted from the data.
  • the height of the eye image sample to be corrected is H and the width is W.
  • H and W may represent the number of pixels in the height direction and the number of pixels in the width direction, respectively.
  • the eye image sample to be corrected is a three-channel image of H × W × 3, and the target line-of-sight direction is a two-channel image of H × W × 2.
  • the two are combined in the channel dimension to obtain the H × W × 5 combined data, which is input to the first teacher sight correction model.
  • the output data of the first teacher sight correction model includes a three-channel image of H × W × 3, from which the data of two channels, H × W × 2, is extracted as the eye motion flow field, and the data of the remaining channel, H × W × 1, as the eye contour mask.
  • the target line-of-sight direction can be the (0°, 0°) direction directly facing the camera, or any other direction, so that the model has the ability to correct the line of sight to an arbitrary direction.
  • the corrected eye image samples are then generated from the eye motion flow field and the eye contour mask; this process is the same as or similar to steps 530 to 540 introduced in the embodiment of FIG. 5.
  • a loss function of the first teacher's sight correction model is constructed, and parameters of the first teacher's sight correction model are adjusted based on the loss function of the first teacher's sight correction model.
  • the loss function of the first teacher sight correction model can be constructed based on the difference between the corrected eye image samples and the target corrected eye image, for example, by constructing a reconstruction loss between the corrected eye image samples and the target corrected eye image as the loss of the first teacher sight correction model. Then, based on this loss function, a gradient descent algorithm is used to adjust the parameters of the first teacher sight correction model to optimize the model parameters.
  • Step 720 The image-based second teacher sight correction model is trained using the eye image samples to be corrected, to obtain the trained second teacher sight correction model, which is used to output the corrected eye image sample of the eye image sample to be corrected.
  • the second teacher's sight correction model may be a neural network model.
  • the input data of the model includes the eye image sample to be corrected and the target gaze direction
  • the output data includes the corrected eye image sample and the eye contour mask.
  • the difference between the second teacher's sight correction model and the first teacher's sight correction model is that the second teacher's sight correction model directly outputs a corrected eye image sample obtained after sight correction.
  • step 720 may be implemented in the following manner:
  • the training samples used by the second teacher's sight correction model may be the same as or different from the training samples used by the first teacher's sight correction model. However, no matter whether they are the same or different, each training sample includes the eye image sample to be corrected and the target corrected eye image.
  • the image sample to be corrected and the target line of sight direction are combined based on the channel dimension to obtain combined data; the combined data is processed by the second teacher's line of sight correction model to obtain output data; from the output data Extract the corrected eye image samples and the eye contour mask.
  • H and W may represent the number of pixels in the height direction and the number of pixels in the width direction, respectively.
  • the eye image sample to be corrected is a three-channel image of H × W × 3, and the target line-of-sight direction is a two-channel image of H × W × 2; the two are combined in the channel dimension to obtain the H × W × 5 combined data, which is input to the second teacher sight correction model.
  • the output data of the second teacher sight correction model includes a four-channel image of H × W × 4, from which the three-channel data H × W × 3 is extracted as the corrected eye image sample, and the remaining one-channel data H × W × 1 as the eye contour mask.
  • the loss function of the second teacher sight correction model can be constructed based on the difference between the adjusted eye image samples and the target corrected eye image, for example, by constructing a reconstruction loss between the adjusted eye image samples and the target corrected eye image as the loss of the second teacher sight correction model. Then, based on this loss, a gradient descent algorithm is used to adjust the parameters of the second teacher sight correction model to optimize the model parameters.
  • Step 730 Perform knowledge distillation training on the student's sight correction model by using the first teacher's sight correction model and the second teacher's sight correction model to obtain a trained student's sight correction model.
  • the purpose of the knowledge distillation training is to enable the student sight correction model to learn the knowledge learned by the first teacher sight correction model and the second teacher sight correction model, thereby producing a student sight correction model that both corrects sight well and is small, making it suitable for deployment on mobile devices such as mobile phones.
  • the model parameters of the first teacher's sight correction model and the second teacher's sight correction model are fixed, and the performance of the student's sight correction model is optimized by adjusting the parameters of the student's sight correction model.
  • step 730 may be implemented in the following manner:
  • the training samples used by the student's vision correction model may be the same as or different from the training samples used by the first/second teacher's vision correction model. However, no matter whether they are the same or different, each training sample includes the eye image sample to be corrected and the target corrected eye image.
  • This process is the same as or similar to the process of generating the corrected eye image sample introduced in step 710, and will not be repeated here.
  • This process is the same as or similar to the process of generating the adjusted eye image sample introduced in step 720, and will not be repeated here.
  • the input data of the student's gaze correction model includes the eye image sample to be corrected and the target gaze direction
  • the output data includes the student's eye motion flow field and the student's eye contour mask.
  • the student's eye motion flow field is used to transform the to-be-corrected eye image sample to generate a transformed image; the student's eye contour mask is used to adjust the transformed image to generate a third output image.
  • the first sub-loss is determined based on the difference between the first output image and the third output image; the second sub-loss is determined based on the difference between the second output image and the third output image; and the third sub-loss is determined based on the difference between the third output image and the target corrected eye image. The first sub-loss, the second sub-loss, and the third sub-loss are weighted and summed to obtain the loss function of the student sight correction model.
  • the loss function L of the student sight correction model can be calculated by the following formula, where w1, w2, and w3 are the weights of the three sub-losses: L = w1 · LPIPS Loss(teacher1_img, student_img) + w2 · LPIPS Loss(teacher2_img, student_img) + w3 · L1 Loss(student_img, img_tar), in which:
  • LPIPS Loss(teacher1_img, student_img) represents the above-mentioned first sub-loss
  • LPIPS Loss(teacher2_img, student_img) represents the above-mentioned second sub-loss
  • L1Loss(student_img, img_tar) represents the above-mentioned third sub-loss.
  • teacher1_img represents the first output image
  • teacher2_img represents the second output image
  • student_img represents the third output image
  • img_tar represents the target corrected eye image.
  • the first sub-loss and the second sub-loss use the LPIPS (Learned Perceptual Image Patch Similarity) loss
  • the third sub-loss uses L1 loss.
  • the gradient descent algorithm is used to adjust the parameters of the student's sight correction model to optimize the model parameters.
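For illustration, here is a sketch of this weighted loss and the gradient step in PyTorch, assuming the third-party `lpips` package for the LPIPS loss; the weights w1, w2, w3 and the backbone choice are assumptions, since the text above only states that the sub-losses are weighted and summed:

```python
import torch
import torch.nn.functional as F
import lpips  # pip install lpips

lpips_fn = lpips.LPIPS(net='alex')  # learned perceptual image patch similarity

def student_loss(teacher1_img, teacher2_img, student_img, img_tar,
                 w1=1.0, w2=1.0, w3=1.0):
    loss1 = lpips_fn(teacher1_img, student_img).mean()  # first sub-loss (LPIPS)
    loss2 = lpips_fn(teacher2_img, student_img).mean()  # second sub-loss (LPIPS)
    loss3 = F.l1_loss(student_img, img_tar)             # third sub-loss (L1)
    return w1 * loss1 + w2 * loss2 + w3 * loss3

# One gradient-descent step on the student model only (teacher parameters stay fixed):
# loss = student_loss(t1_img, t2_img, s_img, img_tar)
# optimizer.zero_grad(); loss.backward(); optimizer.step()
```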
  • the technical solutions provided by the embodiments of the present application use multi-teacher distillation to train and generate the final student sight correction model for online use, so that the student sight correction model can learn the knowledge of both the first teacher sight correction model and the second teacher sight correction model; as a result, a student sight correction model with excellent sight correction effect and a small model size is generated, which is suitable for deployment on mobile devices such as mobile phones.
  • the first teacher's sight correction model is a model based on the motion flow field, and its output data includes the eye motion flow field.
  • performing sight correction based on the eye motion flow field realizes the correction by displacing pixels of the original eye image to be corrected, so the first teacher sight correction model can well preserve the original image features. However, when the deviation of the eye's line of sight is large, for example when only a small number of pixels inside the eye contour correspond to the eyeball, achieving sight correction by pixel displacement introduces distortion; for this reason, an image-based second teacher sight correction model is also trained, which can better overcome the distortion problem. Finally, using multi-teacher distillation learning, the above two teacher models are used to train the student sight correction model, so that the student model combines the respective advantages of the two teachers and generates corrected eye images that are more realistic and less distorted.
  • FIG. 11 shows a block diagram of an apparatus for correcting sight lines of a face image provided by an embodiment of the present application.
  • the device has the function of realizing the above-mentioned method for correcting the sight of the face image, and the function can be realized by hardware, and can also be realized by the hardware executing corresponding software.
  • the apparatus may be computer equipment, or may be provided in computer equipment.
  • the apparatus 1100 may include: an eye image acquisition module 1110 , a motion flow field generation module 1120 , a line of sight correction processing module 1130 and an eye image integration module 1140 .
  • the eye image acquisition module 1110 is configured to acquire the eye image to be corrected from the face image.
  • the motion flow field generation module 1120 is configured to determine the eye motion flow field based on the eye image to be corrected and the target line of sight direction; wherein the target line of sight direction refers to
  • the line of sight direction to which the eye gaze in the eye image to be corrected needs to be corrected, and the eye motion flow field is used to adjust the pixel positions in the eye image to be corrected.
  • the sight correction processing module 1130 is configured to perform sight correction processing on the eye image to be corrected based on the eye motion flow field to obtain a corrected eye image.
  • the eye image integration module 1140 is configured to generate a sight-corrected face image based on the corrected eye image.
  • the motion flow field generation module 1120 is further configured to determine an eye contour mask based on the to-be-corrected eye image and the target gaze direction, where the eye contour mask is used to indicate the probability that a pixel position in the to-be-corrected eye image belongs to the eye region.
  • the eye image integration module 1140 is further configured to perform adjustment processing on the corrected eye image based on the eye contour mask to obtain an adjusted eye image, and to replace the eye image to be corrected in the face image with the adjusted eye image to obtain the line-of-sight corrected face image.
  • the motion flow field generation module 1120 is configured to: combine the eye image to be corrected and the target line of sight direction along the channel dimension to obtain combined data; perform feature extraction processing on the combined data through the sight correction model to obtain the output data of the sight correction model; and extract the eye motion flow field and the eye contour mask from the output data.
  • the sight correction model is a student sight correction model obtained after training with a plurality of teacher sight correction models
  • the training process of the student sight correction model is as follows:
  • the first teacher's sight correction model based on the motion flow field is trained with the eye image samples to be corrected to obtain a trained first teacher's sight correction model, which is used to output the eye motion flow field of the eye image sample to be corrected;
  • the image-based second teacher's sight correction model is trained with the eye image samples to be corrected to obtain a trained second teacher's sight correction model, which is used to output the corrected eye image sample of the eye image sample to be corrected; the student's sight correction model is then trained by knowledge distillation through the two trained teacher models.
  • the eye image integration module 1140 is configured to: multiply the eye contour mask with the pixel values at the same positions in the corrected eye image to obtain a first intermediate image; multiply the mapping image corresponding to the eye contour mask with the pixel values at the same positions in the to-be-corrected eye image to obtain a second intermediate image; and add the pixel values at the same positions in the first intermediate image and the second intermediate image to obtain the adjusted eye image.
  • FIG. 12 shows a block diagram of a training apparatus for a sight correction model provided by an embodiment of the present application.
  • the device has the function of realizing the training method of the above-mentioned sight correction model, and the function can be realized by hardware, and can also be realized by the hardware executing corresponding software.
  • the apparatus may be computer equipment, or may be provided in computer equipment.
  • the apparatus 1200 may include: a first teacher model training module 1210 , a second teacher model training module 1220 and a student model training module 1230 .
  • the first teacher model training module 1210 is configured to train the first teacher's sight correction model based on the motion flow field through the eye image samples to be corrected to obtain the trained first teacher's sight correction model.
  • the trained first teacher's sight correction model is used to output the eye motion flow field of the eye image sample to be corrected, and the eye motion flow field is used to adjust the pixel positions in the eye image sample to be corrected.
  • the second teacher model training module 1220 is configured to train an image-based second teacher's sight correction model by using the eye image samples to be corrected to obtain a trained second teacher's sight correction model.
  • the trained second teacher's sight correction model is used to output the corrected eye image sample of the eye image sample to be corrected.
  • the student model training module 1230 is configured to perform knowledge distillation training on the student's sight correction model through the trained first teacher's sight correction model and the trained second teacher's sight correction model to obtain the trained student's sight correction model Model.
  • the first teacher model training module 1210 is configured to: obtain a training sample of the first teacher's sight correction model, where the training sample includes an eye image sample to be corrected and a target corrected eye image; Perform eye feature extraction processing on the eye image sample to be corrected by using the first teacher's sight correction model to obtain the eye motion flow field and eye contour mask of the eye image sample to be corrected.
  • the eye contour mask is used to indicate the probability that a pixel position in the eye image sample to be corrected belongs to the eye region; a corrected eye image sample is determined based on the eye image sample to be corrected and the corresponding eye motion flow field and eye contour mask; the loss function of the first teacher's sight correction model is constructed based on the corrected eye image sample and the target corrected eye image, and the parameters of the first teacher's sight correction model are adjusted based on that loss function.
  • the second teacher model training module 1220 is configured to: obtain training samples of the second teacher's sight correction model, where the training samples include eye image samples to be corrected and target corrected eye images; perform sight correction processing on the eye image sample to be corrected through the second teacher's sight correction model to obtain a corrected eye image sample and an eye contour mask, where the eye contour mask is used to indicate the probability that a pixel position in the eye image sample to be corrected belongs to the eye region; adjust the corrected eye image sample based on the eye contour mask to obtain an adjusted eye image sample; and construct the loss function of the second teacher's sight correction model based on the adjusted eye image sample and the target corrected eye image, and adjust the parameters of the second teacher's sight correction model based on that loss function.
  • the student model training module 1230 is configured to: obtain training samples of the student's sight correction model, the training samples including eye image samples to be corrected and target corrected eye images;
  • the trained first teacher's sight correction model outputs the teacher eye motion flow field and the first teacher eye contour mask of the eye image sample to be corrected, and a first output image is generated based on the eye image sample to be corrected and the corresponding teacher eye motion flow field and first teacher eye contour mask;
  • the trained second teacher's sight correction model outputs the corrected image and the second teacher eye contour mask of the eye image sample to be corrected, and a second output image is generated based on the corrected image and the second teacher eye contour mask;
  • the student's sight correction model outputs the student eye motion flow field and the student eye contour mask of the eye image sample to be corrected, and a third output image is generated based on the eye image sample to be corrected and the corresponding student eye motion flow field and student eye contour mask;
  • the loss function of the student's sight correction model is constructed based on the difference between the first output image and the third output image, the difference between the second output image and the third output image, and the difference between the third output image and the target corrected eye image; the parameters of the student's sight correction model are adjusted based on this loss function to obtain the trained student's sight correction model.
  • the student model training module 1230 is configured to: transform the eye image sample to be corrected based on the student eye motion flow field to obtain a transformed image; and adjust the transformed image with the student eye contour mask to obtain the third output image.
  • the student model training module 1230 is configured to: determine a first sub-loss based on the difference between the first output image and the third output image; determine a second sub-loss based on the difference between the second output image and the third output image; determine a third sub-loss based on the difference between the third output image and the target corrected eye image; and perform weighted summation on the first sub-loss, the second sub-loss and the third sub-loss to obtain the loss function of the student's sight correction model.
  • FIG. 13 shows a schematic structural diagram of a computer device provided by an embodiment of the present application.
  • the computer device can be any electronic device with data computing, processing, and storage functions, such as a mobile phone, a tablet computer, a personal computer (PC, Personal Computer), or a server.
  • the computer device is used to implement the method for correcting the line of sight of the face image or the method for training the line of sight correction model provided in the above embodiments.
  • the computer device 1300 includes a processing unit 1301 (such as a central processing unit (CPU), a graphics processing unit (GPU) or a field programmable gate array (FPGA)), a system memory 1304 including a random-access memory (RAM) 1302 and a read-only memory (ROM) 1303, and a system bus 1305 connecting the system memory 1304 and the processing unit 1301.
  • the computer device 1300 also includes a basic input/output system (I/O system) 1306 that facilitates the transfer of information between the devices within the computer device, and a mass storage device 1307 for storing an operating system 1313, application programs 1314 and other program modules 1315.
  • the basic input/output system 1306 includes a display 1308 for displaying information and input devices 1309 such as a mouse, keyboard, etc., for user input of information.
  • the display 1308 and the input device 1309 are both connected to the central processing unit 1301 through the input and output controller 1310 connected to the system bus 1305 .
  • the basic input/output system 1306 may also include an input output controller 1310 for receiving and processing input from a number of other devices such as a keyboard, mouse, or electronic stylus.
  • input output controller 1310 also provides output to a display screen, printer, or other type of output device.
  • the mass storage device 1307 is connected to the central processing unit 1301 through a mass storage controller (not shown) connected to the system bus 1305 .
  • the mass storage device 1307 and its associated computer-readable media provide non-volatile storage for the computer device 1300 . That is, the mass storage device 1307 may include a computer-readable medium (not shown) such as a hard disk or a Compact Disc Read-Only Memory (CD-ROM) drive.
  • Computer-readable media can include computer storage media and communication media.
  • Computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data.
  • Computer storage media include RAM, ROM, Erasable Programmable Read-Only Memory (EPROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), flash memory or other solid-state storage technologies, CD-ROM, Digital Video Disc (DVD) or other optical storage, cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices.
  • the system memory 1304 and the mass storage device 1307 described above may be collectively referred to as memory.
  • the computer device 1300 may also run on a remote computer connected to a network such as the Internet. That is, the computer device 1300 can connect to the network 1312 through the network interface unit 1311 connected to the system bus 1305, or the network interface unit 1311 may be used to connect to other types of networks or remote computer systems (not shown).
  • the memory also includes at least one instruction, at least one piece of program, a code set or an instruction set, which is stored in the memory and configured to be executed by one or more processors, so as to implement the above-mentioned sight correction method for a face image or the above-mentioned training method for the sight correction model.
  • a computer-readable storage medium stores at least one instruction, at least one piece of program, a code set or an instruction set which, when executed by the processor of a computer device, implements the sight correction method for a face image or the training method for the sight correction model provided by the above embodiments.
  • the computer-readable storage medium may include: Read-Only Memory (ROM, Read-Only Memory), Random-Access Memory (RAM, Random-Access Memory), Solid State Drives (SSD, Solid State Drives), or an optical disc.
  • the random access memory may include a resistive random access memory (ReRAM, Resistance Random Access Memory) and a dynamic random access memory (DRAM, Dynamic Random Access Memory).
  • a computer program product or computer program comprising computer instructions stored in a computer readable storage medium.
  • the processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions, so that the computer device executes the above-mentioned sight correction method for a face image or the above-mentioned training method for the sight correction model.
  • references herein to "a plurality” means two or more.
  • "And/or" which describes the association relationship of the associated objects, means that there can be three kinds of relationships, for example, A and/or B, which can mean that A exists alone, A and B exist at the same time, and B exists alone.
  • the character "/" generally indicates that the associated objects are an "or” relationship.
  • the numbering of the steps described in this document only exemplarily shows one possible execution order of the steps. In some other embodiments, the steps may also be executed in a different order; for example, two differently numbered steps may be performed at the same time, or in an order reverse to that shown in the figure, which is not limited in the embodiments of the present application.

Abstract

A gaze correction method, apparatus and device for a face image, a computer-readable storage medium, and a computer program product, relating to the field of artificial intelligence. The method includes: obtaining an eye image to be corrected from a face image; determining an eye motion flow field based on the eye image to be corrected and a target gaze direction, where the target gaze direction refers to the gaze direction to which the eye gaze in the eye image to be corrected needs to be corrected, and the eye motion flow field is used to adjust pixel positions in the eye image to be corrected; performing gaze correction processing on the eye image to be corrected based on the eye motion flow field to obtain a corrected eye image; and generating a gaze-corrected face image based on the corrected eye image.

Description

Gaze correction method, apparatus and device for a face image, computer-readable storage medium, and computer program product
CROSS-REFERENCE TO RELATED APPLICATIONS
The embodiments of this application are based on, and claim priority to, Chinese patent application No. 202110088340.4 filed on January 22, 2021, the entire contents of which are incorporated herein by reference.
TECHNICAL FIELD
This application relates to the field of artificial intelligence, and in particular to a gaze correction method, apparatus and device for a face image, a computer-readable storage medium, and a computer program product.
BACKGROUND
Correcting the gaze of an object (for example, a person or an animal) in an image is a typical application of artificial intelligence to graphics and image processing and has many application scenarios, where gaze correction is used to correct the gaze of the object in the image to any specified direction.
The related art provides a gaze correction solution based on a fixed head pose, which has good gaze correction capability for images in which the head pose is fixed. However, in scenarios such as video conferencing and video calls, the user's head pose changes in real time, so that solution is not applicable to such scenarios.
SUMMARY
The embodiments of this application provide a gaze correction method, apparatus and device for a face image, a computer-readable storage medium, and a computer program product. The technical solutions are as follows:
An embodiment of this application provides a gaze correction method for a face image, the method including:
obtaining an eye image to be corrected from a face image;
determining an eye motion flow field based on the eye image to be corrected and a target gaze direction, where the target gaze direction refers to the gaze direction to which the eye gaze in the eye image to be corrected needs to be corrected, and the eye motion flow field is used to adjust pixel positions in the eye image to be corrected;
performing gaze correction processing on the eye image to be corrected based on the eye motion flow field to obtain a corrected eye image; and
generating a gaze-corrected face image based on the corrected eye image.
An embodiment of this application provides a training method for a gaze correction model, the method including:
training a first teacher gaze correction model based on a motion flow field with eye image samples to be corrected to obtain a trained first teacher gaze correction model, where the trained first teacher gaze correction model is used to output the eye motion flow field of the eye image sample to be corrected, and the eye motion flow field is used to adjust pixel positions in the eye image sample to be corrected;
training an image-based second teacher gaze correction model with the eye image samples to be corrected to obtain a trained second teacher gaze correction model, where the trained second teacher gaze correction model is used to output the corrected eye image sample of the eye image sample to be corrected; and
performing knowledge distillation training on a student gaze correction model through the trained first teacher gaze correction model and the trained second teacher gaze correction model to obtain a trained student gaze correction model.
An embodiment of this application provides a gaze correction apparatus for a face image, the apparatus including:
an eye image acquisition module configured to obtain an eye image to be corrected from a face image;
a motion flow field generation module configured to determine an eye motion flow field based on the eye image to be corrected and a target gaze direction, where the target gaze direction refers to the gaze direction to which the eye gaze in the eye image to be corrected needs to be corrected, and the eye motion flow field is used to adjust pixel positions in the eye image to be corrected;
a gaze correction processing module configured to perform gaze correction processing on the eye image to be corrected based on the eye motion flow field to obtain a corrected eye image; and
an eye image integration module configured to generate a gaze-corrected face image based on the corrected eye image.
An embodiment of this application provides a training apparatus for a gaze correction model, the apparatus including:
a first teacher model training module configured to train a first teacher gaze correction model based on a motion flow field with eye image samples to be corrected to obtain a trained first teacher gaze correction model, where the trained first teacher gaze correction model is used to output the eye motion flow field of the eye image sample to be corrected, and the eye motion flow field is used to adjust pixel positions in the eye image sample to be corrected;
a second teacher model training module configured to train an image-based second teacher gaze correction model with the eye image samples to be corrected to obtain a trained second teacher gaze correction model, where the trained second teacher gaze correction model is used to output the corrected eye image sample of the eye image sample to be corrected; and
a student model training module configured to perform knowledge distillation training on a student gaze correction model through the trained first teacher gaze correction model and the trained second teacher gaze correction model to obtain a trained student gaze correction model.
An embodiment of this application provides a computer device including a processor and a memory, where the memory stores at least one instruction, at least one piece of program, a code set or an instruction set, which is loaded and executed by the processor to implement the above gaze correction method for a face image or the above training method for a gaze correction model.
An embodiment of this application provides a computer-readable storage medium storing at least one instruction, at least one piece of program, a code set or an instruction set, which is loaded and executed by a processor to implement the above gaze correction method for a face image or the above training method for a gaze correction model.
An embodiment of this application provides a computer program product or computer program including computer instructions stored in a computer-readable storage medium. A processor of a computer device reads the computer instructions from the computer-readable storage medium and executes them, causing the computer device to perform the above gaze correction method for a face image or the above training method for a gaze correction model.
The technical solutions provided by the embodiments of this application include at least the following beneficial effects:
An eye motion flow field is generated by combining the eye image to be corrected with the target gaze direction to which the gaze needs to be corrected, and gaze correction processing is then performed on the eye image to be corrected with this flow field to obtain a corrected eye image. Since gaze correction does not require a fixed head pose, the solution has good gaze correction capability in scenarios where the head pose changes in real time.
BRIEF DESCRIPTION OF THE DRAWINGS
To describe the technical solutions in the embodiments of this application more clearly, the following briefly introduces the accompanying drawings required for describing the embodiments. Apparently, the drawings in the following description show only some embodiments of this application, and a person of ordinary skill in the art may derive other drawings from these drawings without creative effort.
FIG. 1 is a schematic diagram of an implementation environment of a solution provided by an embodiment of this application;
FIG. 2 is a schematic diagram of the angle formed among the camera, the human eye, and the position the eye is looking at in a video conferencing scenario;
FIG. 3 is a flowchart of a gaze correction method for a face image provided by an embodiment of this application;
FIG. 4A is a schematic diagram before gaze correction provided by an embodiment of this application;
FIG. 4B is a schematic diagram after gaze correction provided by an embodiment of this application;
FIG. 5 is a flowchart of a gaze correction method for a face image provided by an embodiment of this application;
FIG. 6 is a schematic diagram of a gaze correction model provided by an embodiment of this application;
FIG. 7 is a flowchart of a training method for a gaze correction model provided by an embodiment of this application;
FIG. 8 is a schematic diagram of the training process of the first teacher gaze correction model provided by an embodiment of this application;
FIG. 9 is a schematic diagram of the training process of the second teacher gaze correction model provided by an embodiment of this application;
FIG. 10 is a schematic diagram of the training process of the student gaze correction model provided by an embodiment of this application;
FIG. 11 is a block diagram of a gaze correction apparatus for a face image provided by an embodiment of this application;
FIG. 12 is a block diagram of a training apparatus for a gaze correction model provided by an embodiment of this application;
FIG. 13 is a schematic structural diagram of a computer device provided by an embodiment of this application.
DETAILED DESCRIPTION
To make the objectives, technical solutions and advantages of this application clearer, the implementations of this application are further described in detail below with reference to the accompanying drawings.
Artificial intelligence is a theory, method, technology and application system that uses digital computers or machines controlled by digital computers to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge and use the knowledge to obtain optimal results. In other words, artificial intelligence is a comprehensive technology of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, giving machines the functions of perception, reasoning and decision-making.
Artificial intelligence technology is a comprehensive discipline covering a wide range of fields, including both hardware-level and software-level technologies. Basic artificial intelligence technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems and mechatronics. Artificial intelligence software technologies mainly include several major directions such as computer vision, speech processing, natural language processing and machine learning/deep learning.
Computer vision (CV) is a science that studies how to make machines "see"; more specifically, it refers to machine vision in which cameras and computers replace human eyes to identify, track and measure targets, with further graphics processing so that the computer produces images more suitable for human observation or for transmission to instruments for inspection. As a scientific discipline, computer vision studies related theories and technologies, attempting to build artificial intelligence systems capable of obtaining information from images or multi-dimensional data. Computer vision technologies usually include image processing, image recognition, image semantic understanding, image retrieval, optical character recognition (OCR), video processing, video semantic understanding, video content/behavior recognition, three-dimensional object reconstruction, 3D technologies, virtual reality, augmented reality, and simultaneous localization and mapping, as well as common biometric technologies such as face recognition and fingerprint recognition.
Machine learning (ML) is a multi-field interdisciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory and other disciplines. It specializes in studying how computers simulate or implement human learning behaviors to acquire new knowledge or skills and to reorganize existing knowledge structures so as to continuously improve their own performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent, and its applications pervade all fields of artificial intelligence. Machine learning and deep learning usually include technologies such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning and learning from demonstration.
The technical solutions provided by the embodiments of this application involve machine learning and computer vision technologies of artificial intelligence, and are described through the following embodiments.
Referring to FIG. 1, it shows a schematic diagram of an implementation environment of a solution provided by an embodiment of this application. The implementation environment may be implemented as a video conferencing system and may include a server 10 and multiple terminals 20.
A terminal 20 may be an electronic device such as a mobile phone, a tablet computer, a personal computer (PC), a smart TV or a multimedia playback device. A client running a video conferencing application may be installed on the terminal 20 to provide video conferencing functionality to the user.
The server 10 may be a single server, a server cluster composed of multiple servers, or a cloud computing service center. The server 10 may be the backend server of the video conferencing application, used to provide backend services for the above client.
The terminals 20 and the server 10 may communicate with each other over a network.
In an embodiment of this application, a client of a video conferencing application may run on the terminal 20. The client can collect face images during the video session, generate a gaze correction request based on a face image, and submit the gaze correction request to the server, so that when executable instructions in the storage medium of the server are executed by its processor, the gaze correction method for a face image provided by the embodiments of this application is implemented: an eye image to be corrected is obtained from the face image; an eye motion flow field is determined based on the eye image to be corrected and the target gaze direction; gaze correction processing is performed on the eye image to be corrected based on the eye motion flow field to obtain a corrected eye image; and a gaze-corrected face image is generated based on the corrected eye image and sent to the terminal 20. The eye motion flow field in the embodiments of this application can be obtained by invoking a student gaze correction model. Before the student gaze correction model is used, it needs to be trained, which specifically includes:
training a first teacher gaze correction model based on a motion flow field with eye image samples to be corrected to obtain a trained first teacher gaze correction model, which is used to output the eye motion flow field of the eye image sample to be corrected; training an image-based second teacher gaze correction model with the eye image samples to be corrected to obtain a trained second teacher gaze correction model, which is used to output the corrected eye image sample of the eye image sample to be corrected; and performing knowledge distillation training on the student gaze correction model through the trained first teacher gaze correction model and the trained second teacher gaze correction model to obtain the trained student gaze correction model.
As shown in FIG. 2, in a video conferencing scenario, the user's gaze is generally directed at the other party on the screen 21, while the camera 22 is not in the screen 21 but at another position (above the screen 21 as shown in FIG. 2). There is therefore often an angle among the camera 22, the human eye, and the position the eye is looking at (the angle α shown by the dashed lines in FIG. 2). From the other user's point of view, the user does not appear to be looking at them but slightly downward, which degrades the communication experience.
Besides video conferencing, similar problems exist in scenarios such as video calls, live video streaming and social sharing. For example, in a social sharing scenario, the gaze correction method for a face image provided by the embodiments of this application can provide a gaze-changing editing function that supports users in correcting the gaze of objects in images and videos; for example, if the gaze direction of the user in the original image is a, the gaze is corrected to direction b by the gaze correction method for a face image provided by the embodiments of this application, so that the image conveys gaze information different from the original image.
Moreover, in the above scenarios the user's head pose changes in real time, so the fixed-head-pose gaze correction solution provided by the related art is not applicable. In the technical solutions provided by the embodiments of this application, an eye motion flow field is generated by combining the eye image to be corrected with the target gaze direction to which the gaze needs to be corrected, and gaze correction processing is then performed on the eye image to be corrected with this flow field to generate a corrected eye image. Since the embodiments of this application do not require a fixed head pose for gaze correction, they provide good gaze correction capability for scenarios in which the user's head pose changes in real time, such as video conferencing, video calls and live video streaming.
In an embodiment of this application, a client of a video conferencing application may run on the terminal 20. The client can collect face images during the video session and generate a gaze correction request based on a face image, so that when executable instructions in the storage medium of the terminal 20 are executed by its processor, the gaze correction method for a face image provided by the embodiments of this application is implemented: an eye image to be corrected is obtained from the face image; an eye motion flow field is determined based on the eye image to be corrected and the target gaze direction; gaze correction processing is performed on the eye image to be corrected based on the eye motion flow field to obtain a corrected eye image; and a gaze-corrected face image is generated based on the corrected eye image. The eye motion flow field in the embodiments of this application can be obtained by invoking a student gaze correction model. Before the student gaze correction model is used, it needs to be trained, which specifically includes:
training a first teacher gaze correction model based on a motion flow field with eye image samples to be corrected to obtain a trained first teacher gaze correction model, which is used to output the eye motion flow field of the eye image sample to be corrected; training an image-based second teacher gaze correction model with the eye image samples to be corrected to obtain a trained second teacher gaze correction model, which is used to output the corrected eye image sample of the eye image sample to be corrected; and performing knowledge distillation training on the student gaze correction model through the trained first teacher gaze correction model and the trained second teacher gaze correction model to obtain the trained student gaze correction model.
Referring to FIG. 3, it shows a flowchart of a gaze correction method for a face image provided by an embodiment of this application. The steps of the method may be executed by a terminal device such as a mobile phone, a tablet computer or a PC, or by a server. The method may include the following steps (310 to 340):
Step 310: Obtain an eye image to be corrected from a face image.
For example, the face image is an image containing a face; it may be a photo or picture, or an image frame in a video, which is not limited in the embodiments of this application. The eye image to be corrected is cropped from the face image and contains the eye region that needs gaze correction.
It should be noted that a face image contains two eyes (for example, human eyes, left and right), so two eye images to be corrected can be obtained from one face image: one corresponding to the left eye and the other corresponding to the right eye. Of course, a single eye image to be corrected may also contain both eyes.
Step 320: Determine an eye motion flow field based on the eye image to be corrected and a target gaze direction.
For example, the target gaze direction refers to the gaze direction to which the eye gaze in the eye image to be corrected needs to be corrected (that is, the gaze of the eye is corrected to a specified direction). For example, the target gaze direction is the direction looking straight at the camera, so that the eye gaze in the eye image to be corrected is corrected to the direction facing the camera. For example, the target gaze direction includes a pitch angle and a yaw angle; when looking straight at the camera, both the pitch angle and the yaw angle are defined to be equal to 0°.
The eye motion flow field is used to adjust pixel positions in the eye image to be corrected. For example, the pixel value of each pixel in the eye motion flow field includes a horizontal displacement and a vertical displacement; the horizontal displacement of a pixel in the eye motion flow field represents the displacement in the horizontal direction, such as the number of pixels displaced horizontally, of the pixel at the same position in the eye image to be corrected, and the vertical displacement represents the displacement in the vertical direction, such as the number of pixels displaced vertically, of the pixel at the same position in the eye image to be corrected. The eye motion flow field may comprise a two-dimensional image, for example a first-dimension image storing the horizontal displacement of each pixel and a second-dimension image storing the vertical displacement of each pixel, where the size (including height and width) of the first-dimension image and the second-dimension image is the same as that of the eye image to be corrected.
Step 330: Perform gaze correction processing on the eye image to be corrected based on the eye motion flow field to obtain a corrected eye image.
For example, for any pixel in the eye image to be corrected, its horizontal displacement and vertical displacement are obtained from the eye motion flow field, and the pixel is then displaced based on the horizontal displacement and vertical displacement to obtain the corrected eye image.
Step 340: Generate a gaze-corrected face image based on the corrected eye image.
The corrected eye image is integrated back into the corresponding position in the original face image, covering or replacing the above eye image to be corrected, to obtain the gaze-corrected face image.
As shown in FIG. 4A and FIG. 4B, which show a comparison before and after correction with the gaze correction method provided by an embodiment of this application, FIG. 4A is a face image without gaze correction, in which the gaze 401 of the human eye is offset, and FIG. 4B is the gaze-corrected face image, in which the gaze 402 of the human eye is focused straight ahead.
In summary, in the technical solutions provided by the embodiments of this application, an eye motion flow field is generated by combining the eye image to be corrected with the target gaze direction to which the gaze needs to be corrected, and gaze correction processing is then performed on the eye image to be corrected with this flow field to obtain a corrected eye image. Since the embodiments of this application do not require a fixed head pose for gaze correction, they provide good gaze correction capability for scenarios in which the user's head pose changes in real time, such as video conferencing, video calls and live video streaming.
Referring to FIG. 5, it shows a flowchart of a gaze correction method for a face image provided by an embodiment of this application. The steps of the method may be executed by a terminal device such as a mobile phone, a tablet computer or a PC, or by a server. The method may include the following steps (510 to 550):
Step 510: Obtain an eye image to be corrected from a face image.
For example, face detection is first performed on the face image to determine whether it contains a face and, if so, the face position. For example, if the face image contains a face, face keypoint detection is performed. Since the embodiments of this application focus on the eye region, only eye keypoints may be detected, and keypoints of other parts such as the mouth and nose do not need to be detected.
In some embodiments, the minimum bounding rectangle of a single eye is determined based on the contour keypoints of that eye; the minimum bounding rectangle of the eye is enlarged by a specified factor to obtain an image crop box for that eye; and the eye image to be corrected for that eye is cropped from the face image based on the image crop box.
The minimum bounding rectangle of a single eye is the smallest bounding rectangular box containing that eye; for example, the minimum bounding rectangle of the left eye is the smallest bounding rectangular box containing the left eye. The specified factor may be a preset value, such as 1.5, 2 or 3, which is not limited in the embodiments of this application. When the minimum bounding rectangle of a single eye is enlarged to obtain the image crop box, it is enlarged proportionally about its center point, so that the center point of the resulting image crop box coincides with that of the minimum bounding rectangle. Finally, the image content inside the image crop box for the eye is cropped from the face image with image cropping techniques to obtain the eye image to be corrected for that eye.
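As an illustration of the cropping procedure just described, the following is a minimal sketch that computes a crop box from eye contour keypoints; the landmark array layout, the clamping to the image bounds and the default 2× enlargement factor are illustrative assumptions rather than details fixed by this description.

    import numpy as np

    def crop_eye_region(face_img, eye_landmarks, scale=2.0):
        # face_img: H x W x 3 array; eye_landmarks: N x 2 array of (x, y) contour keypoints.
        x_min, y_min = eye_landmarks.min(axis=0)              # minimum bounding rectangle
        x_max, y_max = eye_landmarks.max(axis=0)
        cx, cy = (x_min + x_max) / 2.0, (y_min + y_max) / 2.0
        w, h = (x_max - x_min) * scale, (y_max - y_min) * scale   # proportional enlargement about the center
        x0 = int(max(cx - w / 2.0, 0)); y0 = int(max(cy - h / 2.0, 0))
        x1 = int(min(cx + w / 2.0, face_img.shape[1])); y1 = int(min(cy + h / 2.0, face_img.shape[0]))
        crop_box = (x0, y0, x1, y1)       # kept so the adjusted eye image can be pasted back in step 550
        return face_img[y0:y1, x0:x1], crop_box

The returned crop box is exactly the "image crop box" reused later, when the adjusted eye image is integrated back into the face image.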
In some embodiments, the minimum bounding rectangle of all eyes is determined based on the contour keypoints of all eyes in the face image; that minimum bounding rectangle is enlarged by a specified factor to obtain an image crop box for all eyes; and an eye image to be corrected containing all eyes is cropped from the face image based on this crop box.
Compared with performing gaze correction processing directly on the face image, obtaining the eye image to be corrected from the face image and performing gaze correction processing on that eye image helps reduce the amount of computation in subsequent steps and improves efficiency.
Step 520: Determine an eye motion flow field and an eye contour mask based on the eye image to be corrected and a target gaze direction.
For example, the target gaze direction refers to the gaze direction to which the eye gaze in the eye image to be corrected needs to be corrected, and the eye motion flow field is used to adjust pixel positions in the eye image to be corrected. For the description of the target gaze direction and the eye motion flow field, refer to the above embodiments; details are not repeated here.
The eye contour mask is used to indicate the probability that a pixel position in the eye image to be corrected belongs to the eye region. For example, the eye contour mask may be represented as a one-dimensional image whose size (including height and width) is the same as that of the eye image to be corrected. The pixel value at a position in the eye contour mask may be a probability value indicating the probability that the pixel at the same position in the eye image to be corrected belongs to the eye region; for example, the pixel value at coordinates (i, j) in the eye contour mask may be a probability value in the range [0, 1], indicating the probability that the pixel at coordinates (i, j) in the eye image to be corrected belongs to the eye region.
In some embodiments, the eye image to be corrected and the target gaze direction are input to a gaze correction model, which performs feature extraction processing on the input data and outputs the eye motion flow field and the eye contour mask. The gaze correction model may be a machine learning model obtained by training a neural network in advance.
For example, step 520 may be implemented as follows: the eye image to be corrected and the target gaze direction are combined along the channel dimension to obtain combined data; feature extraction processing is performed on the combined data through the gaze correction model to obtain the output data of the gaze correction model; and the eye motion flow field and the eye contour mask are extracted from the output data.
For example, the eye image to be corrected may include images of the three channels R, G and B, and the target gaze direction may include images of the two channels pitch angle and yaw angle. Combining the eye image to be corrected and the target gaze direction along the channel dimension yields combined data that may include images of the above five channels (the three channels R, G and B plus the two channels pitch angle and yaw angle). In addition, when the target gaze direction has a pitch angle of 0° and a yaw angle of 0°, the pixel values of the image corresponding to the pitch-angle channel are all 0, and the pixel values of the image corresponding to the yaw-angle channel are also all 0.
The gaze correction model may be a neural network model; for example, it may include an encoding network and a decoding network. The encoding network is used to downsample the combined data to obtain its feature information, and the decoding network is used to upsample the feature information to obtain the output data.
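The encoding/decoding structure described above might be sketched as follows; the layer count, channel widths and activation functions are illustrative assumptions, since this description only states that an encoder downsamples the 5-channel combined data and a decoder upsamples the features into the output data.

    import torch
    import torch.nn as nn

    class GazeCorrectionNet(nn.Module):
        def __init__(self, in_ch=5, out_ch=3):
            super().__init__()
            self.encoder = nn.Sequential(                        # downsampling path
                nn.Conv2d(in_ch, 32, 4, stride=2, padding=1), nn.ReLU(inplace=True),
                nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            )
            self.decoder = nn.Sequential(                        # upsampling path
                nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(inplace=True),
                nn.ConvTranspose2d(32, out_ch, 4, stride=2, padding=1),
            )

        def forward(self, x):                                    # x: N x 5 x H x W combined data
            return self.decoder(self.encoder(x))                 # output: N x 3 x H x W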
The output data may include a 3-channel image (also referred to as data). The data of the first and second channels is extracted from the output data to obtain the eye motion flow field, and the data of the third channel is extracted to obtain the eye contour mask. For example, the data of the first channel serves as the first-dimension image, storing the horizontal displacement of each pixel, and the data of the second channel serves as the second-dimension image, storing the vertical displacement of each pixel.
In one example, suppose the height of the eye image to be corrected is H and its width is W, where H and W may respectively represent the number of pixels in the height and width directions. Then the eye image to be corrected is an H×W×3 three-channel image and the target gaze direction is an H×W×2 two-channel image; combining the two along the channel dimension yields H×W×5 combined data. The output data of the gaze correction model includes an H×W×3 three-channel image, from which two channels of data H×W×2 are extracted as the eye motion flow field, with the remaining channel H×W×1 as the eye contour mask.
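The channel combination and the output split can be sketched as follows in the N x C x H x W tensor layout; applying a sigmoid so that the mask channel lies in [0, 1] is an assumption, since the description only states that the mask holds probabilities.

    import torch

    def combine_and_split(model, eye_img, pitch, yaw):
        n, _, h, w = eye_img.shape                       # eye_img: N x 3 x H x W
        gaze = eye_img.new_zeros(n, 2, h, w)
        gaze[:, 0] = pitch                               # one constant-valued channel per angle;
        gaze[:, 1] = yaw                                 # both channels are all zeros for (0°, 0°)
        combined = torch.cat([eye_img, gaze], dim=1)     # N x 5 x H x W combined data
        out = model(combined)                            # N x 3 x H x W output data
        flow = out[:, :2]                                # channels 0-1: horizontal/vertical displacements
        mask = torch.sigmoid(out[:, 2:3])                # channel 2: eye contour mask probabilities
        return flow, mask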
For example, the gaze correction model is a student gaze correction model obtained through knowledge distillation training with multiple teacher gaze correction models. Compared with a teacher gaze correction model, the model structure and/or model parameters of the student gaze correction model are simplified. In this way, a gaze correction model with an excellent correction effect and a small model size can be trained, suitable for application on mobile devices such as mobile phones.
Step 530: Perform gaze correction processing on the eye image to be corrected based on the eye motion flow field to obtain a corrected eye image.
For example, for any pixel in the eye image to be corrected, its horizontal displacement and vertical displacement are obtained from the eye motion flow field, and the pixel is then displaced based on the horizontal displacement and vertical displacement to obtain the corrected eye image.
Suppose the eye image to be corrected is x, the corrected eye image is y, and the eye motion flow field is f. For any position with coordinates (i, j) in the image, y(i, j) is computed as y(i, j) = x(i + f(i, j)[0], j + f(i, j)[1]). Since i + f(i, j)[0] and j + f(i, j)[1] are floating-point numbers, bilinear interpolation is needed to evaluate x at i + f(i, j)[0] and j + f(i, j)[1].
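A minimal sketch of this warping step, where grid_sample performs the bilinear interpolation; normalizing the displaced coordinates to the [-1, 1] grid convention is an implementation choice, and flow channel 0 is taken as the horizontal displacement and channel 1 as the vertical one, matching the two-channel layout described earlier.

    import torch
    import torch.nn.functional as F

    def warp_with_flow(x, flow):
        # x: N x C x H x W image; flow: N x 2 x H x W displacements in pixels.
        n, _, h, w = x.shape
        ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
        base = torch.stack((xs, ys), dim=0).float().to(x.device)   # pixel coordinate grid, 2 x H x W
        coords = base.unsqueeze(0) + flow                          # source sampling positions
        gx = coords[:, 0] / max(w - 1, 1) * 2 - 1                  # normalize x to [-1, 1]
        gy = coords[:, 1] / max(h - 1, 1) * 2 - 1                  # normalize y to [-1, 1]
        grid = torch.stack((gx, gy), dim=-1)                       # N x H x W x 2, (x, y) order
        return F.grid_sample(x, grid, mode="bilinear", align_corners=True)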
Step 540: Adjust the corrected eye image based on the eye contour mask to obtain an adjusted eye image.
For example, step 540 may be implemented as follows: the eye contour mask is multiplied with the pixel values at the same positions in the corrected eye image to obtain a first intermediate image; the mapping image corresponding to the eye contour mask is multiplied with the pixel values at the same positions in the eye image to be corrected to obtain a second intermediate image; and the pixel values at the same positions in the first intermediate image and the second intermediate image are added to obtain the adjusted eye image.
Following the description in the above embodiments, the pixel value at each position in the eye contour mask is a probability value in the range [0, 1], and the pixel value at any position in the mapping image corresponding to the eye contour mask is the value obtained by subtracting the pixel value (that is, the probability value) at the same position in the eye contour mask from 1. For example, if the pixel value (probability value) at coordinates (i, j) in the eye contour mask is 0.2, then the pixel value at coordinates (i, j) in the mapping image corresponding to the eye contour mask is 1 − 0.2 = 0.8.
The generation of the adjusted eye image can be expressed by the following formula: adjusted eye image = eye contour mask × corrected eye image + (1 − eye contour mask) × eye image to be corrected.
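The adjustment formula transcribes directly into code; this is a sketch with a mask of shape N x 1 x H x W broadcasting over the three image channels.

    def blend_with_mask(mask, corrected, original):
        # adjusted = mask * corrected + (1 - mask) * to-be-corrected, elementwise
        return mask * corrected + (1.0 - mask) * original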
Step 550: Generate a gaze-corrected face image based on the adjusted eye image.
For example, the eye image to be corrected in the face image is replaced with the adjusted eye image to obtain the gaze-corrected face image.
For example, the adjusted eye image is integrated into the face image at the position of the image crop box of the eye image to be corrected, yielding an integrated image; image harmonization is then performed at the position of the image crop box in the integrated image to obtain the gaze-corrected face image. The purpose of image harmonization is to eliminate boundary traces at the position of the image crop box. The embodiments of this application do not limit the method used for image harmonization, such as Gaussian blur, erosion and dilation, or deep-learning-based image harmonization.
In a possible implementation, image harmonization is performed at the position of the image crop box in the integrated image in the following way to obtain the gaze-corrected face image:
1. Generate an initialized mask image of the same size as the face image, in which the pixel values at the position of the image crop box are 1 and the pixel values elsewhere are 0.
For example, suppose the size of the original face image is C×H×W, where C is the number of channels (for example, the three channels R, G and B), H is the height (for example, the number of pixels in the height direction), and W is the width (for example, the number of pixels in the width direction). Also suppose the image crop box is a rectangular box of size h×w at a target position in the face image, with h the height (for example, the number of pixels in the height direction) and w the width (for example, the number of pixels in the width direction); then the eye image to be corrected and the corrected eye image are both of size c×h×w, with c the number of channels (for example, the three channels R, G and B).
An initialized mask image of size C×H×W is then generated: for each single-channel H×W image of the C channels, the pixel values inside the h×w image crop box at the target position are set to 1 and the pixel values in the other regions outside the crop box are set to 0, giving the initialized mask image.
2. Perform an erosion operation and Gaussian blur on the initialized mask image to obtain a processed mask image.
The purpose of the erosion operation is to eliminate boundary points between objects; for example, the initialized mask image may be eroded with an elliptical template to obtain an eroded mask image. Gaussian blur, also called Gaussian smoothing, is used to reduce image noise and the level of detail. After the eroded mask image is obtained, Gaussian blur may be applied to it to obtain the processed mask image. The processed mask image is still of size C×H×W, with each pixel value in the range [0, 1]; in particular, for pixels at the original boundary between 0 and 1, the values after erosion and Gaussian blur lie between 0 and 1, achieving a smooth transition.
3. Multiply the processed mask image with the pixel values at the same positions in the integrated image to obtain a first generated image.
4. Multiply the mapping image corresponding to the processed mask image with the pixel values at the same positions in the face image to obtain a second generated image.
5. Add the pixel values at the same positions in the first generated image and the second generated image to obtain the gaze-corrected face image.
The pixel value at each position in the processed mask image is a value in the range [0, 1], and the pixel value at any position in the mapping image corresponding to the processed mask image is the value obtained by subtracting the pixel value at the same position in the processed mask image from 1. For example, if the pixel value at coordinates (i, j) in the processed mask image is 0.3, then the pixel value at coordinates (i, j) in the corresponding mapping image is 1 − 0.3 = 0.7.
The generation of the gaze-corrected face image can be expressed by the following formula: gaze-corrected face image = processed mask image × integrated image + (1 − processed mask image) × face image.
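A sketch of harmonization steps 1 to 5 using OpenCV; the elliptical kernel size and the blur radius are illustrative assumptions (the description fixes only the erode-then-Gaussian-blur order), and integrated is assumed to be the face image with the adjusted eye image already pasted into the crop box.

    import cv2
    import numpy as np

    def harmonize(face, integrated, crop_box):
        x0, y0, x1, y1 = crop_box
        mask = np.zeros(face.shape[:2], dtype=np.float32)
        mask[y0:y1, x0:x1] = 1.0                                     # step 1: initialized mask image
        kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (15, 15))
        mask = cv2.erode(mask, kernel)                               # step 2: erosion with an elliptical template...
        mask = cv2.GaussianBlur(mask, (21, 21), 0)                   # ...followed by Gaussian blur for a smooth edge
        mask = mask[..., None]                                       # broadcast over the channel axis
        # steps 3-5: processed mask * integrated + (1 - processed mask) * face
        return (mask * integrated + (1.0 - mask) * face).astype(face.dtype)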
In this way, the boundary traces at the position of the image crop box in the integrated image are eliminated, so that the final gaze-corrected face image has no obvious stitching traces and the effect is better.
As shown in FIG. 6, the eye image to be corrected and the target gaze direction are input to the gaze correction model, which outputs the eye motion flow field and the eye contour mask. The eye image to be corrected is transformed based on the eye motion flow field to generate a transformed eye image (equivalent to the corrected eye image in step 530), and the transformed eye image is then adjusted based on the eye contour mask to obtain the final corrected eye image (equivalent to the adjusted eye image in step 540).
In summary, in the technical solutions provided by the embodiments of this application, the eye image to be corrected cropped from the face image includes not only the eye region inside the eye contour but also the non-eye region outside it. By adjusting the corrected eye image with the eye contour mask and taking the adjusted eye image as the final corrected eye image, the embodiments of this application retain, for the eye region inside the eye contour, the result of pixel displacement by the eye motion flow field, while retaining more of the original image information for the non-eye region outside the eye contour. An attention mechanism thus fuses the original eye image to be corrected with the eye image corrected by the eye motion flow field, ensuring that gaze correction is applied only to the image content inside the eye contour while the content outside it needs no correction, which improves the gaze correction effect of the final corrected eye image.
The training process of the gaze correction model is introduced below. The content involved in using the gaze correction model and the content involved in training it correspond to each other and are interlinked; where one side is not described in detail, refer to the description on the other side.
Referring to FIG. 7, it shows a flowchart of a training method for a gaze correction model provided by an embodiment of this application. The steps of the method may be executed by a computer device such as a computer or a server. The method may include the following steps (710 to 730):
Step 710: Train a first teacher gaze correction model based on a motion flow field with eye image samples to be corrected to obtain a trained first teacher gaze correction model, where the trained first teacher gaze correction model is used to output the eye motion flow field of the eye image sample to be corrected, and the eye motion flow field is used to adjust pixel positions in the eye image sample to be corrected.
For example, the first teacher gaze correction model may be a neural network model. For example, the input data of this model includes the eye image sample to be corrected and the target gaze direction, and the output data includes the eye motion flow field and the eye contour mask.
In some embodiments, step 710 may include the following sub-steps:
1. Obtain training samples of the first teacher gaze correction model, each training sample including an eye image sample to be corrected and a target corrected eye image.
For example, each training sample includes two images of the same person taken at the same head pose angle with two different gaze directions: one image may have an arbitrary gaze direction and is used as the eye image sample to be corrected, and the other has the target gaze direction and is used as the target corrected eye image. It should be noted that different training samples may involve different persons and different head poses. That is, the training sample set of the model may include multiple training samples covering different persons and different head poses, so that the trained model can adapt to different persons and head poses, improving the robustness of the model.
2. Perform eye feature extraction processing on the eye image sample to be corrected through the first teacher gaze correction model to obtain the eye motion flow field and the eye contour mask of the eye image sample to be corrected, where the eye contour mask is used to indicate the probability that a pixel position in the eye image sample to be corrected belongs to the eye region.
As shown in FIG. 8, the eye image sample to be corrected and the target gaze direction are combined along the channel dimension to obtain combined data; feature extraction processing is performed on the combined data through the first teacher gaze correction model to obtain output data; and the eye motion flow field and the eye contour mask are extracted from the output data.
In one example, suppose the height of the eye image sample to be corrected is H and its width is W, where H and W may respectively represent the number of pixels in the height and width directions. Then the eye image sample to be corrected is an H×W×3 three-channel image and the target gaze direction is an H×W×2 two-channel image; combining the two along the channel dimension yields H×W×5 combined data, which is input to the first teacher gaze correction model. The output data of the first teacher gaze correction model includes an H×W×3 three-channel image, from which two channels of data H×W×2 are extracted as the eye motion flow field, with the remaining channel H×W×1 as the eye contour mask.
It should be noted that during training the target gaze direction may be the (0°, 0°) direction looking straight at the camera, or any other direction, so that the model acquires the ability to correct the gaze to any direction.
3. Determine a corrected eye image sample based on the eye image sample to be corrected and the corresponding eye motion flow field and eye contour mask.
For example, the eye image sample to be corrected is transformed based on the eye motion flow field to obtain a transformed eye image sample, which is then adjusted based on the eye contour mask to obtain the corrected eye image sample. This process is the same as or similar to steps 530 to 540 introduced in the embodiment of FIG. 5; for details, refer to the description of that embodiment, which is not repeated here.
4. Construct the loss function of the first teacher gaze correction model based on the corrected eye image sample and the target corrected eye image, and adjust the parameters of the first teacher gaze correction model based on that loss function.
For example, the loss function of the first teacher gaze correction model may be constructed based on the difference between the corrected eye image sample and the target corrected eye image, for example taking the reconstruction loss between the two as the loss of the first teacher gaze correction model. Then, based on this loss function, a gradient descent algorithm is used to adjust the parameters of the first teacher gaze correction model to optimize them.
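One training step for the first teacher model might look as follows, reusing the warp and blend sketches above; the use of L1 as the reconstruction loss and the optimizer interface are assumptions, since the text only specifies a reconstruction loss minimized by gradient descent.

    import torch.nn.functional as F

    def teacher1_step(model, optimizer, sample, pitch, yaw, target):
        flow, mask = combine_and_split(model, sample, pitch, yaw)  # forward pass: flow field + contour mask
        warped = warp_with_flow(sample, flow)                      # pixel displacement by the flow field
        corrected = blend_with_mask(mask, warped, sample)          # contour-mask adjustment
        loss = F.l1_loss(corrected, target)                        # reconstruction loss vs. target corrected image
        optimizer.zero_grad()
        loss.backward()                                            # gradient descent on the teacher parameters
        optimizer.step()
        return loss.item()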
Step 720: Train an image-based second teacher gaze correction model with the eye image samples to be corrected to obtain a trained second teacher gaze correction model, where the trained second teacher gaze correction model is used to output the corrected eye image sample of the eye image sample to be corrected.
For example, the second teacher gaze correction model may be a neural network model. For example, the input data of this model includes the eye image sample to be corrected and the target gaze direction, and the output data includes the corrected eye image sample and the eye contour mask. The second teacher gaze correction model differs from the first in that it directly outputs the corrected eye image sample obtained after gaze correction.
In some embodiments, step 720 may be implemented as follows:
1. Obtain training samples of the second teacher gaze correction model, each training sample including an eye image sample to be corrected and a target corrected eye image.
For example, the training samples used by the second teacher gaze correction model may be the same as or different from those used by the first teacher gaze correction model. Either way, each training sample includes an eye image sample to be corrected and a target corrected eye image.
2. Perform gaze correction processing on the eye image sample to be corrected through the second teacher gaze correction model to obtain a corrected eye image sample and an eye contour mask, where the eye contour mask is used to indicate the probability that a pixel position in the eye image sample to be corrected belongs to the eye region.
As shown in FIG. 9, the eye image sample to be corrected and the target gaze direction are combined along the channel dimension to obtain combined data; the combined data is processed through the second teacher gaze correction model to obtain output data; and the corrected eye image sample and the eye contour mask are extracted from the output data.
In one example, suppose the height of the eye image sample to be corrected is H and its width is W, where H and W may respectively represent the number of pixels in the height and width directions. Then the eye image sample to be corrected is an H×W×3 three-channel image and the target gaze direction is an H×W×2 two-channel image; combining the two along the channel dimension yields H×W×5 combined data, which is input to the second teacher gaze correction model. The output data of the second teacher gaze correction model includes an H×W×4 four-channel image, from which three channels of data H×W×3 are extracted as the corrected eye image sample, with the remaining channel H×W×1 as the eye contour mask.
3. Adjust the corrected eye image sample based on the eye contour mask to obtain an adjusted eye image sample.
For example, the eye contour mask is multiplied with the pixel values at the same positions in the corrected eye image sample to obtain a third intermediate image; the mapping image corresponding to the eye contour mask is multiplied with the pixel values at the same positions in the eye image sample to be corrected to obtain a fourth intermediate image; and the pixel values at the same positions in the third and fourth intermediate images are added to obtain the adjusted eye image sample. This process is the same as or similar to the way the adjusted eye image is generated in step 540 of the embodiment of FIG. 5; for details, refer to the description above, which is not repeated here.
4. Construct the loss function of the second teacher gaze correction model based on the adjusted eye image sample and the target corrected eye image, and adjust the parameters of the second teacher gaze correction model based on that loss function.
For example, the loss function of the second teacher gaze correction model may be constructed based on the difference between the adjusted eye image sample and the target corrected eye image, for example taking the reconstruction loss between the two as the loss of the second teacher gaze correction model. Then, based on this loss, a gradient descent algorithm is used to adjust the parameters of the second teacher gaze correction model to optimize them.
Step 730: Perform knowledge distillation training on a student gaze correction model through the first teacher gaze correction model and the second teacher gaze correction model to obtain a trained student gaze correction model.
In the embodiments of this application, the purpose of the knowledge distillation training process is to let the student gaze correction model learn the knowledge learned by the first teacher gaze correction model and the second teacher gaze correction model, thereby generating a student gaze correction model with an excellent correction effect and a small model size, suitable for application on mobile devices such as mobile phones.
During the training of the student gaze correction model, the model parameters of the first and second teacher gaze correction models are fixed, and the performance of the student model is optimized by adjusting its parameters.
In some embodiments, step 730 may be implemented as follows:
1. Obtain training samples of the student gaze correction model, each training sample including an eye image sample to be corrected and a target corrected eye image.
The training samples used by the student gaze correction model may be the same as or different from those used by the first/second teacher gaze correction models. Either way, each training sample includes an eye image sample to be corrected and a target corrected eye image.
2. Output the teacher eye motion flow field and the first teacher eye contour mask of the eye image sample to be corrected through the trained first teacher gaze correction model, and generate a first output image based on the eye image sample to be corrected and the corresponding teacher eye motion flow field and first teacher eye contour mask.
This process is the same as or similar to the process of generating the corrected eye image sample introduced in step 710 and is not repeated here.
3. Output the corrected image and the second teacher eye contour mask of the eye image sample to be corrected through the trained second teacher gaze correction model, and generate a second output image based on the corrected image and the second teacher eye contour mask.
This process is the same as or similar to the process of generating the adjusted eye image sample introduced in step 720 and is not repeated here.
4. Output the student eye motion flow field and the student eye contour mask of the eye image sample to be corrected through the student gaze correction model, and generate a third output image based on the eye image sample to be corrected and the corresponding student eye motion flow field and student eye contour mask.
As shown in FIG. 10, the input data of the student gaze correction model includes the eye image sample to be corrected and the target gaze direction, and the output data includes the student eye motion flow field and the student eye contour mask.
For example, the eye image sample to be corrected is transformed with the student eye motion flow field to generate a transformed image, and the transformed image is adjusted with the student eye contour mask to generate the third output image.
5. Construct the loss function of the student gaze correction model based on the difference between the first output image and the third output image, the difference between the second output image and the third output image, and the difference between the third output image and the target corrected eye image.
In some embodiments, a first sub-loss is determined based on the difference between the first output image and the third output image; a second sub-loss is determined based on the difference between the second output image and the third output image; a third sub-loss is determined based on the difference between the third output image and the target corrected eye image; and the first, second and third sub-losses are weighted and summed to obtain the loss function of the student gaze correction model.
For example, the loss function L of the student gaze correction model can be calculated by the following formula:
L = Kd_loss + Rec_loss;
where Kd_loss = w1 × LPIPS Loss(teacher1_img, student_img) + w2 × LPIPS Loss(teacher2_img, student_img), and Rec_loss = w3 × L1Loss(student_img, img_tar).
Here w1, w2 and w3 are three weight values, which may be adjustable, for example with w1 + w2 + w3 = 1. LPIPS Loss(teacher1_img, student_img) denotes the above first sub-loss, LPIPS Loss(teacher2_img, student_img) denotes the above second sub-loss, and L1Loss(student_img, img_tar) denotes the above third sub-loss. teacher1_img denotes the first output image, teacher2_img the second output image, student_img the third output image, and img_tar the target corrected eye image. In the above formula, the first and second sub-losses use the LPIPS (Learned Perceptual Image Patch Similarity) loss, and the third sub-loss uses the L1 loss.
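A sketch of this loss using the community lpips package for the LPIPS terms; the package choice and the example weights are assumptions (lpips expects inputs scaled to [-1, 1]).

    import torch
    import torch.nn.functional as F
    import lpips

    lpips_fn = lpips.LPIPS(net="alex")        # learned perceptual image patch similarity

    def student_loss(teacher1_img, teacher2_img, student_img, img_tar,
                     w1=0.4, w2=0.4, w3=0.2):                       # example weights with w1 + w2 + w3 = 1
        kd_loss = (w1 * lpips_fn(teacher1_img, student_img).mean()
                   + w2 * lpips_fn(teacher2_img, student_img).mean())
        rec_loss = w3 * F.l1_loss(student_img, img_tar)
        return kd_loss + rec_loss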
6. Adjust the parameters of the student gaze correction model based on its loss function to obtain the trained student gaze correction model.
Based on the loss of the student gaze correction model, a gradient descent algorithm is used to adjust the parameters of the student gaze correction model to optimize them.
In summary, the technical solutions provided by the embodiments of this application use multi-teacher distillation to train and generate the student gaze correction model finally used online, so that the student model can learn the knowledge learned by the first and second teacher gaze correction models, thereby generating a student gaze correction model with an excellent correction effect and a small model size, suitable for application on mobile devices such as mobile phones.
In addition, the first teacher gaze correction model is a model based on the motion flow field, whose output data includes the eye motion flow field; gaze correction based on the eye motion flow field displaces pixels of the original eye image to be corrected, so the first teacher gaze correction model can better retain the original image features. However, when the gaze deviation is large and only a few pixels inside the eye contour correspond to the eyeball, correction by pixel displacement introduces distortion. Therefore, another, image-based second teacher gaze correction model is trained; because its output data includes the corrected eye image, the second teacher gaze correction model can better overcome the distortion problem. Finally, multi-teacher distillation learning uses the two teacher models to train the student gaze correction model, so that the student model combines the respective advantages of both teachers and generates corrected eye images that are more realistic and less prone to distortion.
The following are apparatus embodiments of this application, which can be used to execute the method embodiments of this application. For details not disclosed in the apparatus embodiments, refer to the method embodiments of this application.
Referring to FIG. 11, it shows a block diagram of a gaze correction apparatus for a face image provided by an embodiment of this application. The apparatus has the function of implementing the above gaze correction method for a face image, and the function may be implemented by hardware or by hardware executing corresponding software. The apparatus may be a computer device or may be provided in a computer device. The apparatus 1100 may include an eye image acquisition module 1110, a motion flow field generation module 1120, a gaze correction processing module 1130 and an eye image integration module 1140.
The eye image acquisition module 1110 is configured to obtain an eye image to be corrected from a face image. The motion flow field generation module 1120 is configured to determine an eye motion flow field based on the eye image to be corrected and a target gaze direction, where the target gaze direction refers to the gaze direction to which the eye gaze in the eye image to be corrected needs to be corrected, and the eye motion flow field is used to adjust pixel positions in the eye image to be corrected. The gaze correction processing module 1130 is configured to perform gaze correction processing on the eye image to be corrected based on the eye motion flow field to obtain a corrected eye image. The eye image integration module 1140 is configured to generate a gaze-corrected face image based on the corrected eye image.
In some embodiments, the motion flow field generation module 1120 is further configured to determine an eye contour mask based on the eye image to be corrected and the target gaze direction, where the eye contour mask is used to indicate the probability that a pixel position in the eye image to be corrected belongs to the eye region. The eye image integration module 1140 is further configured to adjust the corrected eye image based on the eye contour mask to obtain an adjusted eye image, and to replace the eye image to be corrected in the face image with the adjusted eye image to obtain the gaze-corrected face image.
In some embodiments, the motion flow field generation module 1120 is configured to: combine the eye image to be corrected and the target gaze direction along the channel dimension to obtain combined data; perform feature extraction processing on the combined data through a gaze correction model to obtain the output data of the gaze correction model; and extract the eye motion flow field and the eye contour mask from the output data.
In some embodiments, the gaze correction model is a student gaze correction model obtained after training with multiple teacher gaze correction models, and the training process of the student gaze correction model is as follows: a first teacher gaze correction model based on a motion flow field is trained with eye image samples to be corrected to obtain a trained first teacher gaze correction model, which is used to output the eye motion flow field of the eye image sample to be corrected; an image-based second teacher gaze correction model is trained with the eye image samples to be corrected to obtain a trained second teacher gaze correction model, which is used to output the corrected eye image sample of the eye image sample to be corrected; and knowledge distillation training is performed on the student gaze correction model through the trained first and second teacher gaze correction models to obtain the trained student gaze correction model.
In some embodiments, the eye image integration module 1140 is configured to: multiply the eye contour mask with the pixel values at the same positions in the corrected eye image to obtain a first intermediate image; multiply the mapping image corresponding to the eye contour mask with the pixel values at the same positions in the eye image to be corrected to obtain a second intermediate image; and add the pixel values at the same positions in the first and second intermediate images to obtain the adjusted eye image.
Referring to FIG. 12, it shows a block diagram of a training apparatus for a gaze correction model provided by an embodiment of this application. The apparatus has the function of implementing the above training method for a gaze correction model, and the function may be implemented by hardware or by hardware executing corresponding software. The apparatus may be a computer device or may be provided in a computer device. The apparatus 1200 may include a first teacher model training module 1210, a second teacher model training module 1220 and a student model training module 1230.
The first teacher model training module 1210 is configured to train a first teacher gaze correction model based on a motion flow field with eye image samples to be corrected to obtain a trained first teacher gaze correction model, which is used to output the eye motion flow field of the eye image sample to be corrected, where the eye motion flow field is used to adjust pixel positions in the eye image sample to be corrected. The second teacher model training module 1220 is configured to train an image-based second teacher gaze correction model with the eye image samples to be corrected to obtain a trained second teacher gaze correction model, which is used to output the corrected eye image sample of the eye image sample to be corrected. The student model training module 1230 is configured to perform knowledge distillation training on a student gaze correction model through the trained first and second teacher gaze correction models to obtain a trained student gaze correction model.
In some embodiments, the first teacher model training module 1210 is configured to: obtain training samples of the first teacher gaze correction model, the training samples including eye image samples to be corrected and target corrected eye images; perform eye feature extraction processing on the eye image sample to be corrected through the first teacher gaze correction model to obtain the eye motion flow field and the eye contour mask of the eye image sample to be corrected, where the eye contour mask is used to indicate the probability that a pixel position in the eye image sample to be corrected belongs to the eye region; determine a corrected eye image sample based on the eye image sample to be corrected and the corresponding eye motion flow field and eye contour mask; and construct the loss function of the first teacher gaze correction model based on the corrected eye image sample and the target corrected eye image, and adjust the parameters of the first teacher gaze correction model based on that loss function.
In some embodiments, the second teacher model training module 1220 is configured to: obtain training samples of the second teacher gaze correction model, the training samples including eye image samples to be corrected and target corrected eye images; perform gaze correction processing on the eye image sample to be corrected through the second teacher gaze correction model to obtain a corrected eye image sample and an eye contour mask, where the eye contour mask is used to indicate the probability that a pixel position in the eye image sample to be corrected belongs to the eye region; adjust the corrected eye image sample based on the eye contour mask to obtain an adjusted eye image sample; and construct the loss function of the second teacher gaze correction model based on the adjusted eye image sample and the target corrected eye image, and adjust the parameters of the second teacher gaze correction model based on that loss function.
In some embodiments, the student model training module 1230 is configured to: obtain training samples of the student gaze correction model, the training samples including eye image samples to be corrected and target corrected eye images; output the teacher eye motion flow field and the first teacher eye contour mask of the eye image sample to be corrected through the trained first teacher gaze correction model, and generate a first output image based on the eye image sample to be corrected and the corresponding teacher eye motion flow field and first teacher eye contour mask; output the corrected image and the second teacher eye contour mask of the eye image sample to be corrected through the trained second teacher gaze correction model, and generate a second output image based on the corrected image and the second teacher eye contour mask; output the student eye motion flow field and the student eye contour mask of the eye image sample to be corrected through the student gaze correction model, and generate a third output image based on the eye image sample to be corrected and the corresponding student eye motion flow field and student eye contour mask; construct the loss function of the student gaze correction model based on the difference between the first output image and the third output image, the difference between the second output image and the third output image, and the difference between the third output image and the target corrected eye image; and adjust the parameters of the student gaze correction model based on its loss function to obtain the trained student gaze correction model.
In some embodiments, the student model training module 1230 is configured to: transform the eye image sample to be corrected based on the student eye motion flow field to obtain a transformed image; and adjust the transformed image with the student eye contour mask to obtain the third output image.
In some embodiments, the student model training module 1230 is configured to: determine a first sub-loss based on the difference between the first output image and the third output image; determine a second sub-loss based on the difference between the second output image and the third output image; determine a third sub-loss based on the difference between the third output image and the target corrected eye image; and perform weighted summation on the first, second and third sub-losses to obtain the loss function of the student gaze correction model.
It should be noted that when the apparatus provided by the above embodiments implements its functions, the division into the above functional modules is only used as an example; in practical applications, the above functions may be assigned to different functional modules as needed, that is, the internal structure of the device may be divided into different functional modules to complete all or part of the functions described above. In addition, the apparatus provided by the above embodiments and the method embodiments belong to the same concept; for the specific implementation process, refer to the method embodiments, which are not repeated here.
Referring to FIG. 13, it shows a schematic structural diagram of a computer device provided by an embodiment of this application. The computer device may be any electronic device with data computing, processing and storage functions, such as a mobile phone, a tablet computer, a personal computer (PC) or a server. The computer device is used to implement the gaze correction method for a face image or the training method for a gaze correction model provided in the above embodiments.
The computer device 1300 includes a processing unit 1301 (such as a central processing unit (CPU), a graphics processing unit (GPU) or a field programmable gate array (FPGA)), a system memory 1304 including a random-access memory (RAM) 1302 and a read-only memory (ROM) 1303, and a system bus 1305 connecting the system memory 1304 and the central processing unit 1301. The computer device 1300 also includes a basic input/output system (I/O system) 1306 that facilitates the transfer of information between devices within the computer device, and a mass storage device 1307 for storing an operating system 1313, application programs 1314 and other program modules 1315.
The basic input/output system 1306 includes a display 1308 for displaying information and input devices 1309, such as a mouse and a keyboard, for users to input information. The display 1308 and the input devices 1309 are both connected to the central processing unit 1301 through the input/output controller 1310 connected to the system bus 1305. The basic input/output system 1306 may also include the input/output controller 1310 for receiving and processing input from multiple other devices such as a keyboard, a mouse or an electronic stylus. Similarly, the input/output controller 1310 also provides output to a display screen, a printer or other types of output devices.
The mass storage device 1307 is connected to the central processing unit 1301 through a mass storage controller (not shown) connected to the system bus 1305. The mass storage device 1307 and its associated computer-readable media provide non-volatile storage for the computer device 1300. That is, the mass storage device 1307 may include a computer-readable medium (not shown) such as a hard disk or a compact disc read-only memory (CD-ROM) drive.
Without loss of generality, the computer-readable media may include computer storage media and communication media. Computer storage media include volatile and non-volatile, removable and non-removable media implemented in any method or technology for storing information such as computer-readable instructions, data structures, program modules or other data. Computer storage media include RAM, ROM, erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other solid-state storage technologies, CD-ROM, digital video disc (DVD) or other optical storage, cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices. Certainly, a person skilled in the art knows that the computer storage media are not limited to the above. The system memory 1304 and the mass storage device 1307 may be collectively referred to as memory.
According to the embodiments of this application, the computer device 1300 may also run on a remote computer connected to a network such as the Internet. That is, the computer device 1300 may connect to the network 1312 through the network interface unit 1311 connected to the system bus 1305, or the network interface unit 1311 may be used to connect to other types of networks or remote computer systems (not shown).
The memory also includes at least one instruction, at least one piece of program, a code set or an instruction set, which is stored in the memory and configured to be executed by one or more processors to implement the above gaze correction method for a face image or the above training method for a gaze correction model.
In some embodiments, a computer-readable storage medium is also provided. The storage medium stores at least one instruction, at least one piece of program, a code set or an instruction set which, when executed by the processor of a computer device, implements the gaze correction method for a face image or the training method for a gaze correction model provided by the above embodiments.
For example, the computer-readable storage medium may include a read-only memory (ROM), a random-access memory (RAM), a solid-state drive (SSD) or an optical disc, where the random-access memory may include a resistive random-access memory (ReRAM) and a dynamic random-access memory (DRAM).
In some embodiments, a computer program product or computer program is also provided, including computer instructions stored in a computer-readable storage medium. A processor of a computer device reads the computer instructions from the computer-readable storage medium and executes them, causing the computer device to perform the above gaze correction method for a face image or the above training method for a gaze correction model.
It should be understood that "multiple" mentioned herein means two or more. "And/or" describes the association relationship of associated objects, indicating that three relationships may exist; for example, A and/or B may mean that A exists alone, A and B exist at the same time, or B exists alone. The character "/" generally indicates that the associated objects are in an "or" relationship. In addition, the step numbers described herein only exemplarily show one possible execution order of the steps; in some other embodiments, the steps may also be executed out of numerical order, for example two differently numbered steps may be executed at the same time or in an order reverse to that shown in the figures, which is not limited by the embodiments of this application.
The above descriptions are merely exemplary embodiments of this application and are not intended to limit this application. Any modification, equivalent replacement, improvement and the like made within the spirit and principles of this application shall fall within the protection scope of this application.

Claims (16)

  1. A gaze correction method for a face image, applied to a computer device, the method comprising:
    obtaining an eye image to be corrected from a face image;
    determining an eye motion flow field based on the eye image to be corrected and a target gaze direction, wherein the target gaze direction refers to the gaze direction to which the eye gaze in the eye image to be corrected needs to be corrected, and the eye motion flow field is used to adjust pixel positions in the eye image to be corrected;
    performing gaze correction processing on the eye image to be corrected based on the eye motion flow field to obtain a corrected eye image; and
    generating a gaze-corrected face image based on the corrected eye image.
  2. The method according to claim 1, wherein the method further comprises:
    determining an eye contour mask based on the eye image to be corrected and the target gaze direction, the eye contour mask being used to indicate the probability that a pixel position in the eye image to be corrected belongs to the eye region;
    the generating a gaze-corrected face image based on the corrected eye image comprises:
    adjusting the corrected eye image based on the eye contour mask to obtain an adjusted eye image; and
    replacing the eye image to be corrected in the face image with the adjusted eye image to obtain the gaze-corrected face image.
  3. The method according to claim 2, wherein the determining an eye motion flow field based on the eye image to be corrected and the target gaze direction comprises:
    combining the eye image to be corrected and the target gaze direction along the channel dimension to obtain combined data;
    performing feature extraction processing on the combined data through a gaze correction model to obtain output data of the gaze correction model; and
    extracting the eye motion flow field from the output data;
    the determining an eye contour mask based on the eye image to be corrected and the target gaze direction comprises:
    combining the eye image to be corrected and the target gaze direction along the channel dimension to obtain combined data;
    performing feature extraction processing on the combined data through the gaze correction model to obtain output data of the gaze correction model; and
    extracting the eye contour mask from the output data.
  4. The method according to claim 3, wherein the gaze correction model is a student gaze correction model obtained by training with a plurality of teacher gaze correction models, and the training process of the student gaze correction model is as follows:
    training a first teacher gaze correction model based on a motion flow field with eye image samples to be corrected to obtain a trained first teacher gaze correction model, the trained first teacher gaze correction model being used to output the eye motion flow field of the eye image sample to be corrected;
    training an image-based second teacher gaze correction model with the eye image samples to be corrected to obtain a trained second teacher gaze correction model, the second teacher gaze correction model being used to output the corrected eye image sample of the eye image sample to be corrected; and
    performing knowledge distillation training on the student gaze correction model through the trained first teacher gaze correction model and the trained second teacher gaze correction model to obtain the trained student gaze correction model.
  5. The method according to claim 2, wherein the adjusting the corrected eye image based on the eye contour mask to obtain an adjusted eye image comprises:
    multiplying the eye contour mask with the pixel values at the same positions in the corrected eye image to obtain a first intermediate image;
    multiplying the mapping image corresponding to the eye contour mask with the pixel values at the same positions in the eye image to be corrected to obtain a second intermediate image; and
    adding the pixel values at the same positions in the first intermediate image and the second intermediate image to obtain the adjusted eye image.
  6. A training method for a gaze correction model, applied to a computer device, the method comprising:
    training a first teacher gaze correction model based on a motion flow field with eye image samples to be corrected to obtain a trained first teacher gaze correction model, the trained first teacher gaze correction model being used to output the eye motion flow field of the eye image sample to be corrected, the eye motion flow field being used to adjust pixel positions in the eye image sample to be corrected;
    training an image-based second teacher gaze correction model with the eye image samples to be corrected to obtain a trained second teacher gaze correction model, the trained second teacher gaze correction model being used to output the corrected eye image sample of the eye image sample to be corrected; and
    performing knowledge distillation training on a student gaze correction model through the trained first teacher gaze correction model and the trained second teacher gaze correction model to obtain a trained student gaze correction model.
  7. The method according to claim 6, wherein the training a first teacher gaze correction model based on a motion flow field with eye image samples to be corrected comprises:
    obtaining training samples of the first teacher gaze correction model, the training samples including eye image samples to be corrected and target corrected eye images;
    performing eye feature extraction processing on the eye image sample to be corrected through the first teacher gaze correction model to obtain the eye motion flow field and the eye contour mask of the eye image sample to be corrected, the eye contour mask being used to indicate the probability that a pixel position in the eye image sample to be corrected belongs to the eye region;
    determining a corrected eye image sample based on the eye image sample to be corrected and the corresponding eye motion flow field and eye contour mask; and
    constructing a loss function of the first teacher gaze correction model based on the corrected eye image sample and the target corrected eye image, and adjusting parameters of the first teacher gaze correction model based on the loss function of the first teacher gaze correction model.
  8. The method according to claim 6, wherein the training an image-based second teacher gaze correction model with the eye image samples to be corrected comprises:
    obtaining training samples of the second teacher gaze correction model, the training samples including eye image samples to be corrected and target corrected eye images;
    performing gaze correction processing on the eye image sample to be corrected through the second teacher gaze correction model to obtain a corrected eye image sample and an eye contour mask, the eye contour mask being used to indicate the probability that a pixel position in the eye image sample to be corrected belongs to the eye region;
    adjusting the corrected eye image sample based on the eye contour mask to obtain an adjusted eye image sample; and
    constructing a loss function of the second teacher gaze correction model based on the adjusted eye image sample and the target corrected eye image, and adjusting parameters of the second teacher gaze correction model based on the loss function of the second teacher gaze correction model.
  9. The method according to claim 6, wherein the performing knowledge distillation training on a student gaze correction model through the trained first teacher gaze correction model and the trained second teacher gaze correction model to obtain a trained student gaze correction model comprises:
    obtaining training samples of the student gaze correction model, the training samples including eye image samples to be corrected and target corrected eye images;
    outputting the teacher eye motion flow field and the first teacher eye contour mask of the eye image sample to be corrected through the trained first teacher gaze correction model, and generating a first output image based on the eye image sample to be corrected and the corresponding teacher eye motion flow field and first teacher eye contour mask;
    outputting the corrected image and the second teacher eye contour mask of the eye image sample to be corrected through the trained second teacher gaze correction model, and generating a second output image based on the corrected image and the second teacher eye contour mask;
    outputting the student eye motion flow field and the student eye contour mask of the eye image sample to be corrected through the student gaze correction model, and generating a third output image based on the eye image sample to be corrected and the corresponding student eye motion flow field and student eye contour mask;
    constructing a loss function of the student gaze correction model based on the difference between the first output image and the third output image, the difference between the second output image and the third output image, and the difference between the third output image and the target corrected eye image; and
    adjusting parameters of the student gaze correction model based on the loss function of the student gaze correction model to obtain the trained student gaze correction model.
  10. The method according to claim 9, wherein the generating a third output image based on the eye image sample to be corrected and the corresponding student eye motion flow field and student eye contour mask comprises:
    transforming the eye image sample to be corrected based on the student eye motion flow field to obtain a transformed image; and
    adjusting the transformed image with the student eye contour mask to obtain the third output image.
  11. The method according to claim 9, wherein the constructing a loss function of the student gaze correction model based on the difference between the first output image and the third output image, the difference between the second output image and the third output image, and the difference between the third output image and the target corrected eye image comprises:
    determining a first sub-loss based on the difference between the first output image and the third output image;
    determining a second sub-loss based on the difference between the second output image and the third output image;
    determining a third sub-loss based on the difference between the third output image and the target corrected eye image; and
    performing weighted summation on the first sub-loss, the second sub-loss and the third sub-loss to obtain the loss function of the student gaze correction model.
  12. A gaze correction apparatus for a face image, the apparatus comprising:
    an eye image acquisition module configured to obtain an eye image to be corrected from a face image;
    a motion flow field generation module configured to determine an eye motion flow field based on the eye image to be corrected and a target gaze direction, wherein the target gaze direction refers to the gaze direction to which the eye gaze in the eye image to be corrected needs to be corrected, and the eye motion flow field is used to adjust pixel positions in the eye image to be corrected;
    a gaze correction processing module configured to perform gaze correction processing on the eye image to be corrected based on the eye motion flow field to obtain a corrected eye image; and
    an eye image integration module configured to generate a gaze-corrected face image based on the corrected eye image.
  13. A training apparatus for a gaze correction model, the apparatus comprising:
    a first teacher model training module configured to train a first teacher gaze correction model based on a motion flow field with eye image samples to be corrected to obtain a trained first teacher gaze correction model, the trained first teacher gaze correction model being used to output the eye motion flow field of the eye image sample to be corrected, the eye motion flow field being used to adjust pixel positions in the eye image sample to be corrected;
    a second teacher model training module configured to train an image-based second teacher gaze correction model with the eye image samples to be corrected to obtain a trained second teacher gaze correction model, the trained second teacher gaze correction model being used to output the corrected eye image sample of the eye image sample to be corrected; and
    a student model training module configured to perform knowledge distillation training on a student gaze correction model through the trained first teacher gaze correction model and the trained second teacher gaze correction model to obtain a trained student gaze correction model.
  14. A computer device, comprising a processor and a memory, the memory storing at least one instruction, at least one piece of program, a code set or an instruction set, which is loaded and executed by the processor to implement the gaze correction method for a face image according to any one of claims 1 to 5, or the training method for a gaze correction model according to any one of claims 6 to 11.
  15. A computer-readable storage medium storing at least one instruction, at least one piece of program, a code set or an instruction set, which is loaded and executed by a processor to implement the gaze correction method for a face image according to any one of claims 1 to 5, or the training method for a gaze correction model according to any one of claims 6 to 11.
  16. A computer program product, comprising a computer program that causes a computer device to perform the gaze correction method for a face image according to any one of claims 1 to 5, or the training method for a gaze correction model according to any one of claims 6 to 11.
PCT/CN2022/072302 2021-01-22 2022-01-17 Gaze correction method, apparatus and device for a face image, computer-readable storage medium, and computer program product WO2022156622A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US17/977,576 US20230072627A1 (zh) 2021-01-22 2022-10-31 Gaze correction method and apparatus for face image, device, computer-readable storage medium, and computer program product

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110088340.4 2021-01-22
CN202110088340.4A CN112733795B (zh) 2021-01-22 2021-01-22 Gaze correction method, apparatus, device and storage medium for a face image

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/977,576 Continuation US20230072627A1 (zh) 2021-01-22 2022-10-31 Gaze correction method and apparatus for face image, device, computer-readable storage medium, and computer program product

Publications (1)

Publication Number Publication Date
WO2022156622A1 true WO2022156622A1 (zh) 2022-07-28

Family

ID=75593796

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/072302 WO2022156622A1 (zh) 2021-01-22 2022-01-17 Gaze correction method, apparatus and device for a face image, computer-readable storage medium, and computer program product

Country Status (3)

Country Link
US (1) US20230072627A1 (zh)
CN (1) CN112733795B (zh)
WO (1) WO2022156622A1 (zh)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112733797B (zh) * 2021-01-22 2021-10-08 腾讯科技(深圳)有限公司 Gaze correction method, apparatus, device and storage medium for a face image
CN112733795B (zh) * 2021-01-22 2022-10-11 腾讯科技(深圳)有限公司 Gaze correction method, apparatus, device and storage medium for a face image
CN113222857A (zh) * 2021-05-27 2021-08-06 Oppo广东移动通信有限公司 Image processing method, model training method and apparatus, medium and electronic device
CN113641247A (zh) * 2021-08-31 2021-11-12 北京字跳网络技术有限公司 Gaze angle adjustment method and apparatus, electronic device and storage medium
CN114519666B (zh) * 2022-02-18 2023-09-19 广州方硅信息技术有限公司 Live-streaming image correction method, apparatus, device and storage medium
CN116382475A (zh) * 2023-03-24 2023-07-04 北京百度网讯科技有限公司 Gaze direction control and gaze communication method, apparatus, device and medium
CN116958945B (zh) * 2023-08-07 2024-01-30 北京中科睿途科技有限公司 Driver gaze estimation method for smart cockpits and related device
CN117152688A (zh) * 2023-10-31 2023-12-01 江西拓世智能科技股份有限公司 Artificial-intelligence-based smart classroom behavior analysis method and system

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2008264341A (ja) * 2007-04-24 2008-11-06 Chube Univ Eye movement measurement method and eye movement measurement device
CN110740246A (zh) * 2018-07-18 2020-01-31 阿里健康信息技术有限公司 Image correction method, mobile device and terminal device
CN111031234A (zh) * 2019-11-20 2020-04-17 维沃移动通信有限公司 Image processing method and electronic device
CN111339928A (zh) * 2020-02-25 2020-06-26 苏州科达科技股份有限公司 Eye gaze adjustment method, apparatus and storage medium
CN112733794A (zh) * 2021-01-22 2021-04-30 腾讯科技(深圳)有限公司 Gaze correction method, apparatus, device and storage medium for a face image
CN112733797A (zh) * 2021-01-22 2021-04-30 腾讯科技(深圳)有限公司 Gaze correction method, apparatus, device and storage medium for a face image
CN112733795A (zh) * 2021-01-22 2021-04-30 腾讯科技(深圳)有限公司 Gaze correction method, apparatus, device and storage medium for a face image

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9635311B2 (en) * 2013-09-24 2017-04-25 Sharp Kabushiki Kaisha Image display apparatus and image processing device
CN109040521B (zh) * 2017-06-08 2020-11-13 株式会社理光 Image processing method and apparatus, electronic device and computer-readable storage medium
JP7282810B2 (ja) * 2018-02-22 2023-05-29 イノデム ニューロサイエンシズ Gaze tracking method and system
CN108985159A (zh) * 2018-06-08 2018-12-11 平安科技(深圳)有限公司 Human eye model training method, human eye recognition method, apparatus, device and medium
CN110598638A (zh) * 2019-09-12 2019-12-20 Oppo广东移动通信有限公司 Model training method, face gender prediction method, device and storage medium
CN111008929B (zh) * 2019-12-19 2023-09-26 维沃移动通信(杭州)有限公司 Image correction method and electronic device


Also Published As

Publication number Publication date
CN112733795A (zh) 2021-04-30
US20230072627A1 (en) 2023-03-09
CN112733795B (zh) 2022-10-11

Similar Documents

Publication Publication Date Title
WO2022156622A1 (zh) Gaze correction method, apparatus and device for a face image, computer-readable storage medium, and computer program product
WO2022156626A1 (zh) Gaze correction method and apparatus for an image, electronic device, computer-readable storage medium, and computer program product
WO2022156640A1 (zh) Gaze correction method and apparatus for an image, electronic device, computer-readable storage medium, and computer program product
US11107232B2 (en) Method and apparatus for determining object posture in image, device, and storage medium
CN111709409B (zh) 人脸活体检测方法、装置、设备及介质
CN111488865B (zh) 图像优化方法、装置、计算机存储介质以及电子设备
US20210334942A1 (en) Image processing method and apparatus, device, and storage medium
US20220028031A1 (en) Image processing method and apparatus, device, and storage medium
WO2022188697A1 (zh) 提取生物特征的方法、装置、设备、介质及程序产品
WO2023035531A1 (zh) 文本图像超分辨率重建方法及其相关设备
US20230100427A1 (en) Face image processing method, face image processing model training method, apparatus, device, storage medium, and program product
WO2022148248A1 (zh) 图像处理模型的训练方法、图像处理方法、装置、电子设备及计算机程序产品
Chen et al. Sound to visual: Hierarchical cross-modal talking face video generation
CN111754622B (zh) Method for generating a three-dimensional face image and related device
CN115131218A (zh) 图像处理方法、装置、计算机可读介质及电子设备
CN116434253A (zh) 图像处理方法、装置、设备、存储介质及产品
CN113538254A (zh) 图像恢复方法、装置、电子设备及计算机可读存储介质
US20230098276A1 (en) Method and apparatus for generating panoramic image based on deep learning network
CN109151444B (zh) 3d智能像素增强引擎
Rao et al. A Dual-Path Approach for Gaze Following in Fisheye Meeting Scenes
CN116012513A (zh) Face model generation method, apparatus, device and storage medium
CN116958394A (zh) Image generation method, apparatus, device and storage medium
Ermaimaiti et al. Face photo-line drawings synthesis based on local extraction preserving generative adversarial networks
CN116612213A (zh) Digital business card generation method and system based on a face reenactment algorithm
CN114118203A (zh) Image feature extraction and matching method, apparatus and electronic device

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22742101

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 28.11.2023)