WO2022193809A1 - Gaze capture method and device, storage medium, and terminal - Google Patents

Gaze capture method and device, storage medium, and terminal

Info

Publication number
WO2022193809A1
Authority
WO
WIPO (PCT)
Prior art keywords
eye
dimensional
center position
actor
eyeball
Prior art date
Application number
PCT/CN2022/071905
Other languages
English (en)
French (fr)
Inventor
王志勇
王从艺
柴金祥
张建杰
金师豪
Original Assignee
魔珐(上海)信息科技有限公司
上海墨舞科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 魔珐(上海)信息科技有限公司, 上海墨舞科技有限公司
Publication of WO2022193809A1

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 - Image analysis
    • G06T 7/70 - Determining position or orientation of objects or cameras
    • G06T 7/73 - Determining position or orientation of objects or cameras using feature-based methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 13/00 - Animation
    • G06T 13/20 - 3D [Three Dimensional] animation
    • G06T 13/40 - 3D [Three Dimensional] animation of characters, e.g. humans, animals or virtual beings
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 - Image analysis
    • G06T 7/60 - Analysis of geometric attributes
    • G06T 7/66 - Analysis of geometric attributes of image moments or centre of gravity
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/18 - Eye characteristics, e.g. of the iris
    • G06V 40/19 - Sensors therefor

Definitions

  • Embodiments of the present invention relate to the field of gaze capture, and in particular to a gaze capture method and device, a storage medium, and a terminal.
  • Facial animation is an important part of many popular applications, such as movies, games, and virtual reality.
  • As the organ that conveys the most emotional information in the face, the eyes play a crucial role in facial capture technology.
  • The ability to capture fine eye movements (i.e., gaze) is key to accurately conveying the actor's intentions and feelings.
  • Gaze capture also plays an extremely important role in intelligent interaction: by capturing gaze, the direction the user is looking at can be determined accurately, revealing what the user is interested in.
  • The technical problems addressed by the embodiments of the present invention are the high cost of gaze capture and poor user experience.
  • An embodiment of the present invention provides a gaze capture method, including: acquiring an eye image of an actor; acquiring three-dimensional eye information of the actor and determining the actor's three-dimensional eyeball according to the three-dimensional eye information, where the three-dimensional eye information at least includes the eyeball center position, the eyeball radius, and the iris size; and, according to the eye image, using the eye network model and the three-dimensional eyeball to determine the three-dimensional pupil center position, and capturing the actor's gaze direction according to the three-dimensional pupil center position.
  • Determining the three-dimensional pupil center position using the eye network model and the three-dimensional eyeball according to the eye image includes: obtaining two-dimensional eye information from the eye image using the eye network model, where the two-dimensional eye information at least includes an iris mask, a two-dimensional pupil center position, and an eye open/closed state; and determining the three-dimensional pupil center position according to the two-dimensional eye information and the three-dimensional eyeball.
  • Obtaining the two-dimensional eye information using the eye network model according to the eye image includes: acquiring multiple two-dimensional eyelid feature points corresponding to the eye image; calculating a similarity transformation matrix for aligning the multiple two-dimensional eyelid feature points with multiple preset two-dimensional eyelid feature points; performing a similarity transformation on the eye image using the similarity transformation matrix to obtain a transformed image; inputting the transformed image into the eye network model and predicting the two-dimensional eye information corresponding to the transformed image; and transforming the two-dimensional eye information corresponding to the transformed image using the inverse of the similarity transformation matrix to obtain the two-dimensional eye information corresponding to the eye image.
  • Determining the three-dimensional pupil center position according to the two-dimensional eye information and the three-dimensional eyeball includes: obtaining an estimated iris of the three-dimensional eyeball according to the three-dimensional eyeball and an estimated three-dimensional pupil center position; projecting the estimated iris onto the two-dimensional plane corresponding to the eye image to obtain an estimated iris mask; calculating a first difference between the estimated iris mask and the iris mask predicted by the eye network model; calculating a total difference according to the first difference; and, if the total difference is not greater than a preset first threshold, using the estimated three-dimensional pupil center position as the three-dimensional pupil center position.
  • The gaze capture method further includes: if the total difference is greater than the preset first threshold, adjusting the estimated three-dimensional pupil center position according to the total difference and iterating the optimization until the total difference is not greater than the preset first threshold or the number of iterations reaches a set number, and using the estimated three-dimensional pupil center position at that point as the three-dimensional pupil center position.
  • Calculating the total difference according to the first difference includes: projecting the estimated iris onto the two-dimensional plane corresponding to the eye image to obtain an estimated two-dimensional pupil center position; calculating a second difference between the estimated two-dimensional pupil center position and the two-dimensional pupil center position predicted by the eye network model; and calculating the total difference according to the first difference and the second difference.
  • Calculating the total difference according to the first difference and the second difference includes: calculating a third difference between the three-dimensional pupil center position optimized in the current iteration and the three-dimensional pupil center position at the start of the optimization, and calculating the total difference according to the first, second, and third differences.
  • Calculating the first difference between the estimated iris mask and the iris mask predicted by the eye network model includes: computing the intersection of the estimated iris mask and the predicted iris mask, and their union, and taking the difference between the ratio of intersection to union and an ideal ratio as the first difference; or, generating a distance transform map from the iris mask predicted by the eye network model, computing the values of the edge pixels of the estimated iris mask in the distance transform map, and obtaining the first difference from the computed values.
  • Obtaining the three-dimensional eye information of the actor includes: obtaining the eyeball center position, the eyeball radius, and the iris size through eyeball calibration.
  • Obtaining the eyeball center position through eyeball calibration includes: obtaining a three-dimensional face of the actor under a neutral expression, and obtaining multiple three-dimensional eyelid feature points from the three-dimensional face under the neutral expression; calculating the average of the three-dimensional positions of the multiple three-dimensional eyelid feature points of each eye, and adding a preset three-dimensional offset to the average to obtain the eyeball center position of each eye, where the offset direction of the three-dimensional offset points toward the inside of the eye.
  • Obtaining the three-dimensional eye information of the actor includes: obtaining a facial image corresponding to the actor's eye image; obtaining a transformation matrix of the actor's facial pose according to the facial image, where the facial pose is the pose of the actor's face relative to the camera; and transforming the eyeball center position according to the transformation matrix of the facial pose to obtain the eyeball center position relative to the camera.
  • The facial image corresponding to the actor's eye image may be obtained as follows: the actor wears a facial expression capture helmet that remains stationary relative to the actor's head, and a facial expression capture camera mounted on the helmet captures the actor's facial expressions.
  • In this case, the transformation matrix of the facial pose is a fixed value for every frame of the facial image.
  • Alternatively, the facial image corresponding to the actor's eye image may be obtained as follows: a camera separate from the actor's head photographs the actor's facial expressions.
  • In this case, the transformation matrix of the facial pose varies for each frame of the facial image.
  • Obtaining the iris size through eyeball calibration includes: obtaining a preset number of calibration images that meet the calibration requirements; inputting each calibration image into the eye network model and predicting multiple iris masks; performing circle fitting on the multiple iris masks to obtain multiple fitted circles; projecting the multiple circles onto the actor's three-dimensional face under the neutral expression and, according to the projection results, calculating the iris sizes corresponding to the multiple iris masks in the three-dimensional face; and obtaining the iris size according to the iris sizes corresponding to the multiple iris masks in the three-dimensional face.
  • Obtaining the iris size according to the iris sizes corresponding to multiple iris masks in the three-dimensional face includes: taking the average of the iris sizes corresponding to the multiple iris masks in the three-dimensional face as the iris size.
  • The eye network model is trained for one eye of a pair; when the eye image input to the eye network model is of the other eye of the pair, the input eye image is symmetrically flipped, and the flipped eye image is used as the input of the eye network model.
  • The gaze capture method further includes: before determining the three-dimensional pupil center position according to the two-dimensional eye information and the three-dimensional eyeball, judging whether the actor's eyes are closed according to the eye open/closed state; when the eye open/closed state indicates closed eyes, using the gaze direction captured from the previous frame's eye image as the gaze direction corresponding to the current eye image.
  • The gaze capture method further includes: after capturing the three-dimensional pupil center position of each eye in a pair of eyes, calculating the joint prior distribution of the zenith angle θ and the azimuth angle φ of the three-dimensional pupil center positions of the pair of eyes, where the three-dimensional pupil center position includes the eyeball radius r, the zenith angle θ, and the azimuth angle φ.
  • capturing the gaze direction of the actor according to the three-dimensional pupil center position includes: determining a direction from the eyeball center position to the three-dimensional pupil center position, and using the direction as the actor's gaze direction.
  • An embodiment of the present invention further provides a gaze capture device, comprising: an acquisition unit for acquiring an eye image of an actor; a three-dimensional eyeball determination unit for acquiring the actor's three-dimensional eye information and determining the actor's three-dimensional eyeball according to it, where the three-dimensional eye information at least includes the eyeball center position, the eyeball radius, and the iris size; and a gaze capture unit for determining the three-dimensional pupil center position from the eye image using the eye network model and the three-dimensional eyeball, and capturing the actor's gaze direction according to the three-dimensional pupil center position.
  • An embodiment of the present invention further provides a storage medium, which is a non-volatile or non-transitory storage medium storing a computer program; when run by a processor, the computer program executes the steps of any of the above gaze capture methods.
  • An embodiment of the present invention further provides a terminal, including a memory and a processor; the memory stores a computer program that can run on the processor, and the processor executes the steps of any of the above gaze capture methods when running the computer program.
  • the three-dimensional eyeball of the actor is determined according to the three-dimensional information of the actor's eye. According to the acquired eye image of the actor, the eye network model and the three-dimensional eyeball are used to determine the center position of the three-dimensional pupil, and the direction of the actor's gaze is captured according to the three-dimensional pupil center position.
  • The embodiment of the present invention captures the actor's gaze direction from the actor's eye image according to the actor's three-dimensional eye information and the eye network model, aiming to provide a user-friendly and inexpensive solution that does not require the user to wear expensive equipment. The capture technology based on a single camera not only improves the user's comfort when using the equipment, but is also inexpensive and does not require a specific studio, effectively reducing the cost of gaze capture.
  • Fig. 1 is a flowchart of a gaze capture method in an embodiment of the present invention;
  • Fig. 2 is a flowchart of a specific implementation of step S11 in Fig. 1;
  • Fig. 3 is a flowchart of another specific implementation of step S11 in Fig. 1;
  • Fig. 4 is a flowchart of a specific implementation of step S13 in Fig. 1;
  • Fig. 5 is a flowchart of a specific implementation of step S131 in Fig. 4;
  • Fig. 6 is a flowchart of a specific implementation of step S132 in Fig. 4;
  • Fig. 7 is a calibration flowchart for the eyeball center position in eyeball calibration according to an embodiment of the present invention;
  • Fig. 8 is a flowchart of obtaining three-dimensional information of an actor's eyes according to an embodiment of the present invention;
  • Fig. 9 is a calibration flowchart for the iris size in eyeball calibration in an embodiment of the present invention;
  • Fig. 10 is a schematic diagram of an application scenario in an embodiment of the present invention;
  • Fig. 11 is a flowchart of yet another gaze capture method in an embodiment of the present invention;
  • Fig. 12 is a schematic diagram of another application scenario in an embodiment of the present invention;
  • Fig. 13 is a schematic diagram of another application scenario in an embodiment of the present invention;
  • Fig. 15 is a schematic diagram of another application scenario in an embodiment of the present invention;
  • Fig. 16 is a schematic diagram of another application scenario in an embodiment of the present invention;
  • Fig. 17 is a schematic diagram of another application scenario in an embodiment of the present invention;
  • Fig. 18 is a schematic structural diagram of a gaze capture device in an embodiment of the present invention.
  • Existing gaze capture technologies are usually based on infrared devices, requiring users to wear special glasses or to arrange specific infrared equipment. Such gaze capture brings great discomfort to the user and is costly.
  • the three-dimensional eyeball of the actor is determined according to the three-dimensional information of the actor's eyes.
  • the eye network model and the three-dimensional eyeball are used to determine the center position of the three-dimensional pupil, and the direction of the actor's gaze is captured according to the three-dimensional pupil center position.
  • The embodiment of the present invention captures the actor's gaze direction by using the actor's eye image, the actor's three-dimensional eye information, and the eye network model, without requiring the user to wear expensive equipment. The capture technology based on a single camera not only improves the user's comfort when using the device, but is also inexpensive and does not need to be performed in a specific studio, effectively reducing the cost of gaze capture.
  • An embodiment of the present invention provides a gaze capture method.
  • Referring to Fig. 1, a flowchart of a gaze capture method in an embodiment of the present invention is provided, which may specifically include the following steps:
  • Step S11 acquiring an eye image of the actor.
  • Step S12 Acquire the three-dimensional eye information of the actor, and determine the three-dimensional eyeball of the actor according to the three-dimensional eye information.
  • the three-dimensional information of the eye may at least include: the center position of the eyeball, the radius of the eyeball, and the size of the iris.
  • Step S13 using the eye network model and the three-dimensional eyeball to determine the center position of the three-dimensional pupil according to the eye image, and capture the gaze direction of the actor according to the center position of the three-dimensional pupil.
  • Specifically, the direction in which the eyeball center position points toward the three-dimensional pupil center position may be determined, and that direction may be used as the actor's gaze direction.
  • Research has found that the iris center coincides with the three-dimensional pupil center, and the specific position of the iris on the eyeball is determined by the three-dimensional pupil center position. The iris therefore moves as the three-dimensional pupil center position changes, and the resulting change in gaze direction manifests as a change in the position of the iris on the eyeball. Determining the direction from the eyeball center to the three-dimensional pupil center amounts to computing the ray from the eyeball center through the three-dimensional pupil center.
  • The three-dimensional pupil center position may be expressed in spherical coordinates (r, θ, φ), where r is the radius of the three-dimensional eyeball, θ is the zenith angle, and φ is the azimuth angle.
  • The zenith angle θ and the azimuth angle φ of the spherical coordinates of the three-dimensional pupil center position can be used to indicate the gaze direction.
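  • As an illustrative sketch (not part of the patent text), the following Python snippet converts the spherical coordinates (r, θ, φ) of the three-dimensional pupil center into a Cartesian position and a gaze direction; the physics convention for (θ, φ) and the function names are assumptions:

```python
import numpy as np

def pupil_center_from_spherical(eye_center, r, theta, phi):
    """Pupil center from spherical coordinates (r, theta, phi) about the
    eyeball center, using the physics convention (theta from the +z axis)."""
    offset = np.array([
        r * np.sin(theta) * np.cos(phi),
        r * np.sin(theta) * np.sin(phi),
        r * np.cos(theta),
    ])
    return np.asarray(eye_center, dtype=float) + offset

def gaze_direction(eye_center, pupil_center_3d):
    """Gaze direction: the unit ray from the eyeball center through the
    three-dimensional pupil center."""
    d = np.asarray(pupil_center_3d, dtype=float) - np.asarray(eye_center, dtype=float)
    return d / np.linalg.norm(d)
```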
  • the three-dimensional eye information can be used as personalized data for describing each person's eyeball, and the three-dimensional eye information can at least include: eyeball center position, eyeball radius and iris size.
  • the specific values of the center position of the eyeball, the radius of the eyeball, and the size of the iris in the three-dimensional information of each actor's eyes correspond to the actor respectively, and the specific values corresponding to different actors are different.
  • the three-dimensional eyeball corresponding to each actor can be determined according to the three-dimensional information of the eyes of each actor.
  • The three-dimensional pupil center position is related to the actor's gaze direction; even for the same actor, different gaze directions correspond to different three-dimensional pupil center positions.
  • the iris size is used to characterize the size of the iris.
  • Step S11 can be implemented in various ways; that is, the actor's eye image can be acquired in various ways. For example, only the actor's eyes are photographed to obtain the eye image; as another example, a facial image of the actor is taken and the eye image is cropped from it.
  • the image acquisition device for acquiring the eye image may use a single camera, or may use a computer with an image acquisition function, a mobile phone, a helmet, or other terminals, which are not limited here.
  • Step S11 may specifically include the following steps S111 to S114, which crop the eye image out of the actor's facial image.
  • Step S111 acquiring the facial image of the actor.
  • Step S112 detecting and obtaining a plurality of two-dimensional eyelid feature points from the facial image.
  • For example, a deep learning method (e.g., a CNN network) may be used to detect two-dimensional facial feature points from the facial image.
  • the two-dimensional facial feature points include two-dimensional eyelid feature points.
  • The number of two-dimensional eyelid feature points for each eye can be 6, 8, or more.
  • The specific number can be configured as needed; it is sufficient that the chosen two-dimensional eyelid feature points outline the eye.
  • Step S113 Determine the position of the eyes on the face image according to the positions of the plurality of two-dimensional eyelid feature points on the face image.
  • Step S114 according to the position of the eyes on the face image, cut out the eye image from the face image.
  • In some embodiments, the two-dimensional eyelid feature points of a pair of eyes are acquired, including those of the left eye and those of the right eye, so that the eye images of the left eye and the right eye can be cropped separately: from the positions of the left-eye eyelid feature points, the left-eye image is cropped from the facial image, and from the positions of the right-eye eyelid feature points, the right-eye image is cropped from the facial image.
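  • A minimal sketch of such a crop, assuming the eye region is a padded bounding box around the 2D eyelid feature points (the margin value is illustrative, not from the patent):

```python
import numpy as np

def crop_eye(face_image, eyelid_pts_2d, margin=0.5):
    """Crop an eye region from a facial image given its 2D eyelid
    feature points; `margin` expands the tight bounding box."""
    pts = np.asarray(eyelid_pts_2d)
    x0, y0 = pts.min(axis=0)
    x1, y1 = pts.max(axis=0)
    w, h = x1 - x0, y1 - y0
    x0 = max(int(x0 - margin * w), 0)
    y0 = max(int(y0 - margin * h), 0)
    x1 = min(int(x1 + margin * w), face_image.shape[1])
    y1 = min(int(y1 + margin * h), face_image.shape[0])
    return face_image[y0:y1, x0:x1]
```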
  • Alternatively, step S11 may specifically include the following steps S101 to S105, which crop an eye image out of the actor's facial image.
  • Step S101 acquiring the facial image of the actor.
  • Step S102 acquiring a three-dimensional human face corresponding to the facial image, and extracting a plurality of three-dimensional eyelid feature points from the three-dimensional human face.
  • The image used for reconstructing the 3D face and the image used for gaze capture can be acquired by the same image acquisition device, or by different image acquisition devices.
  • When different image acquisition devices are used, the data they collect can be approximately matched through the image acquisition time.
  • Approximate matching means that the interval between the acquisition time of the image used for reconstructing the 3D face and that of the image used for gaze capture is within a set duration, so that the actor's expression does not change much between the two images, which ensures the accuracy of the gaze direction obtained by gaze capture.
  • the specific value of the set duration can be configured according to actual needs. The higher the accuracy requirement of the eye direction obtained by eye capture, the smaller the set duration.
  • reconstructing the 3D face is not limited to reconstruction with images, and other methods can also be used, such as sticking markers on the actor's face, collecting the positions of the markers on the actor's face through the motion capture system, and reconstructing the actor's 3D face.
  • the facial markers and the images used for eye capture can be approximately matched according to their respective acquisition times.
  • Step S103 projecting the plurality of three-dimensional eyelid feature points onto a two-dimensional plane corresponding to the facial image to obtain a plurality of two-dimensional projection points.
  • Step S104 Determine the position of the eyes on the face image according to the positions of the plurality of two-dimensional projection points on the face image.
  • Step S105 according to the position of the eyes on the face image, cut out the eye image from the face image.
  • the size of the captured eye image can be set according to requirements, which is not limited here.
  • When the eye image corresponds to a single eye, the three-dimensional eyelid feature points of the left eye can be extracted from the three-dimensional face and projected onto the facial image to obtain the two-dimensional projection points corresponding to the left eye, from which the left-eye image is cropped from the facial image; the cropping process for the right-eye image is similar to that of the left eye and is not repeated here.
  • Alternatively, the 3D eyelid feature points corresponding to a pair of eyes can be extracted from the 3D face and projected onto the facial image to obtain the projection points of the pair of eyes, and an eye image containing the pair of eyes is cropped from the facial image.
  • step S13 may include the following steps S131 to S132.
  • Step S131 according to the eye image, using the eye network model to obtain two-dimensional eye information.
  • the two-dimensional information of the eye may at least include: an iris mask, a two-dimensional pupil center position, and an eye opening and closing state.
  • the iris mask is used to characterize the information of the two-dimensional pupil.
  • the eye-opening and closing state is used to indicate the state of the eyes being opened or closed.
  • the closed eyes can be detected effectively and accurately through the state of open and closed eyes, which is helpful to determine whether the two-dimensional information of the eyes predicted by the current network can be used to capture the direction of the eyes.
  • When the eyes are closed, the gaze direction corresponding to the previous frame's eye image is used as the gaze direction corresponding to the current eye image.
  • the state of open and closed eyes may be identified by using binary 0 and 1. For example, 0 is used to identify the closed-eye state, and 1 is used to identify the open-eye state. It can be understood that other identifiers may also be used to identify the state of opening and closing the eyes, which will not be exemplified here.
  • The two-dimensional eye information may further include two-dimensional eyelid feature points. Because the eye image has high resolution, the two-dimensional eyelid feature points predicted by the eye network model are relatively accurate, and subsequently correcting the eye shape of the reconstructed three-dimensional face based on the predicted two-dimensional eyelid feature points helps improve the accuracy of the correction result.
  • Step S132 Determine the center position of the three-dimensional pupil according to the two-dimensional information of the eye and the three-dimensional eyeball.
  • In some embodiments, step S131 may include the following steps S1311 to S1315.
  • Step S1311 acquiring a plurality of two-dimensional eyelid feature points corresponding to the eye image.
  • a facial image of an actor can be obtained, and two-dimensional facial feature points in the facial image are detected, wherein the two-dimensional facial feature points include a plurality of two-dimensional eyelid feature points.
  • Alternatively, a 3D face corresponding to the facial image is acquired, a plurality of 3D eyelid feature points are extracted from the 3D face, and these points are projected onto the two-dimensional plane corresponding to the facial image to obtain multiple two-dimensional projection points; the resulting projection points are the multiple two-dimensional eyelid feature points corresponding to the eye image.
  • Step S1312 Calculate the similarity transformation matrix when aligning the plurality of two-dimensional eyelid feature points with a plurality of preset two-dimensional eyelid feature points.
  • For example, the eye image can be cropped according to the positions of the multiple two-dimensional eyelid feature points in the facial image, and the similarity transformation matrix is computed for aligning the multiple two-dimensional eyelid feature points in the eye image with the multiple preset two-dimensional eyelid feature points.
  • the plurality of preset two-dimensional eyelid feature points may be corresponding eyelid feature points under a default expression.
  • the default expression can also be called neutral expression, which refers to the natural state of no expression.
  • the preset two-dimensional eyelid feature points are defined on the preset eye image.
  • Step S1313 using the similarity transformation matrix to perform similarity transformation on the eye image to obtain a transformed image.
  • Through the similarity transformation, the eye image is rotated, scaled, and translated so that the size and position of the transformed image meet the set requirements, and so that the two-dimensional eyelid feature points of the transformed image have positions, rotations, and scales similar to those of the two-dimensional eyelid feature points corresponding to the preset eye image.
  • Step S1314 the transformed image is input into the eye network model, and the two-dimensional information of the eye corresponding to the transformed image is predicted.
  • Step S1315 Transform the two-dimensional eye information corresponding to the transformed image by using the inverse matrix of the similarity transformation matrix to obtain the two-dimensional eye information corresponding to the eye image.
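  • The following sketch illustrates steps S1311 to S1315 with OpenCV, assuming a hypothetical `model` callable stands in for the trained eye network; only the 2D pupil center is mapped back for brevity:

```python
import cv2
import numpy as np

def predict_eye_info(eye_image, eyelid_pts, preset_pts, preset_size, model):
    """Sketch of steps S1311 to S1315: align the eye image to the preset
    eyelid feature points with a similarity transform, run the eye network
    model on the aligned image, then map the prediction back."""
    src = np.asarray(eyelid_pts, dtype=np.float32)
    dst = np.asarray(preset_pts, dtype=np.float32)
    # Similarity transform: rotation + uniform scale + translation.
    M, _ = cv2.estimateAffinePartial2D(src, dst)
    aligned = cv2.warpAffine(eye_image, M, preset_size)
    # Predict 2D eye info on the aligned image (hypothetical interface).
    info = model(aligned)  # e.g. {"iris_mask": ..., "pupil_center_2d": ...}
    # Map the predicted 2D pupil center back with the inverse transform;
    # the iris mask could likewise be warped back with warpAffine.
    M_inv = cv2.invertAffineTransform(M)
    c = np.asarray(info["pupil_center_2d"], np.float32).reshape(1, 1, 2)
    info["pupil_center_2d"] = cv2.transform(c, M_inv).reshape(2)
    return info
```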
  • the eye network model can be obtained by training based on a deep learning algorithm, taking an eye image as an input and taking two-dimensional eye information as an output.
  • the eye network model can be trained based on Convolutional Neural Networks (CNN) or other types of deep neural learning algorithms.
  • training can be performed on a single eye (left eye or right eye) as a basis, and an eye network model corresponding to a single eye can be obtained.
  • Training the network model on a single eye makes the eye network model lightweight, reduces its size, improves its running speed, and reduces its running time, thereby reducing the impact on the frame rate of the overall system; in addition, it reduces operating costs.
  • When the eye network model is trained with the left (or right) eye as the benchmark, the sample images of the right (or left) eye used as training samples can be flipped symmetrically and thereby converted into sample images of the left (or right) eye, so only one model needs to be trained. For example, to train the eye network model on the basis of the left eye, each right-eye image is flipped left-right symmetrically and combined with the left-eye images to form the training data.
  • the eye network model is trained according to the eye image of one of the eyes, when the eye image input to the eye network model is the other eye of the pair of eyes, the input The eye image is symmetrically inverted (eg, left-right symmetrically inverted), and the symmetrically inverted eye image is used as the input of the eye network model.
  • The two-dimensional eye information of the other eye's original (un-flipped) eye image can then be obtained by applying the symmetric transformation once more to the two-dimensional eye information output by the eye network model.
  • For example, if the eye network model is trained on left-eye images, a left-eye image can be input to the model directly, and the model outputs the two-dimensional eye information of the left eye. A right-eye image must first be flipped left-right to convert it into a left-eye image; after prediction, the inverse of the flip transformation converts the two-dimensional eye information of the left eye back into the two-dimensional eye information of the right eye.
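  • A sketch of this left/right handling, assuming a model trained on left-eye images and a hypothetical dictionary interface for its outputs:

```python
import cv2
import numpy as np

def predict_right_eye(right_eye_image, left_eye_model):
    """If the model was trained on left-eye images only, flip the
    right-eye image horizontally, predict, then flip the outputs back."""
    flipped = cv2.flip(right_eye_image, 1)        # left-right flip
    info = left_eye_model(flipped)                # hypothetical interface
    w = right_eye_image.shape[1]
    info["iris_mask"] = cv2.flip(info["iris_mask"], 1)
    cx, cy = info["pupil_center_2d"]
    info["pupil_center_2d"] = (w - 1 - cx, cy)    # mirror the x coordinate
    return info
```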
  • step S132 may include the following steps S1321 to S1327.
  • Step S1321 obtaining the estimated iris of the three-dimensional eyeball according to the three-dimensional eyeball and the estimated center position of the three-dimensional pupil.
  • The three-dimensional pupil center position obtained in the previous iteration may be used as the estimated three-dimensional pupil center position. For the first iteration, the three-dimensional pupil center position determined from the previous frame's image can be used as the estimate, or the three-dimensional pupil center position determined from an image in the default gaze state (looking straight ahead) can be used.
  • Step S1322 project the estimated iris onto a two-dimensional plane corresponding to the eye image to obtain an estimated iris mask.
  • Step S1323 Calculate the first difference between the estimated iris mask and the iris mask predicted by the eye network model.
  • For example, the intersection of the estimated iris mask and the iris mask predicted by the eye network model is divided by their union (Intersection over Union, IoU), and the first difference is obtained from the IoU result; that is, the degree of overlap between the estimated iris mask and the predicted iris mask is computed, and the first difference is derived from it. Further, the first difference can be taken as the difference between the degree of overlap and the ideal degree of overlap; when the ideal degree of overlap is complete overlap, the IoU is 1.
  • The ideal ratio is the ratio at the ideal degree of overlap; when the ideal degree of overlap is complete overlap, the ideal ratio is 1.
  • As another example, a distance transform map is generated from the iris mask predicted by the eye network model, the values of the edge pixels of the estimated iris mask in the distance transform map are computed, and the first difference is obtained from those values; for instance, the sum of the values of all edge pixels of the estimated iris mask in the distance transform map is taken as the first difference.
  • In the distance transform map, the value of each pixel represents its distance to the nearest foreground pixel, where the foreground is the iris mask predicted by the eye network model.
  • When an edge pixel of the estimated iris mask lies inside the iris mask predicted by the eye network model, its value in the distance transform map is 0; when it lies outside, its value is greater than 0.
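  • Both variants of the first difference can be sketched as follows, assuming binary uint8 masks and taking the ideal ratio as 1:

```python
import cv2
import numpy as np

def iou_difference(est_mask, pred_mask):
    """First difference as the deviation of IoU from the ideal ratio
    (1.0 for complete overlap)."""
    inter = np.logical_and(est_mask > 0, pred_mask > 0).sum()
    union = np.logical_or(est_mask > 0, pred_mask > 0).sum()
    iou = inter / union if union > 0 else 0.0
    return 1.0 - iou

def distance_map_difference(est_mask, pred_mask):
    """First difference via a distance transform of the predicted mask:
    sum the distance values at the edge pixels of the estimated mask."""
    # Distance of every pixel to the nearest predicted-foreground pixel.
    background = (pred_mask == 0).astype(np.uint8)
    dist = cv2.distanceTransform(background, cv2.DIST_L2, 3)
    # Edge pixels of the estimated mask via a morphological gradient.
    kernel = np.ones((3, 3), np.uint8)
    edges = cv2.morphologyEx(est_mask, cv2.MORPH_GRADIENT, kernel)
    return float(dist[edges > 0].sum())
```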
  • Step S1324 Calculate the total difference according to the first difference.
  • For example, the estimated iris is projected onto the two-dimensional plane corresponding to the eye image to obtain an estimated two-dimensional pupil center position; a second difference between the estimated two-dimensional pupil center position and the two-dimensional pupil center position predicted by the eye network model is calculated; and the total difference is calculated according to the first difference and the second difference.
  • For example, the sum of the first difference and the second difference is calculated and used as the total difference.
  • Alternatively, corresponding weights are configured for the first difference and the second difference, the two differences are weighted accordingly, and the total difference is obtained from the weighted result: the weight of the first difference is multiplied by the first difference to obtain a first term, the weight of the second difference is multiplied by the second difference to obtain a second term, and the sum of the two terms is taken as the total difference.
  • Alternatively, a third difference between the three-dimensional pupil center position optimized in the current iteration and the three-dimensional pupil center position at the start of the optimization is calculated, and the total difference is calculated according to the first difference, the second difference, and the third difference.
  • the sum of the first difference, the second difference and the third difference may be used as the total difference.
  • Corresponding weights can also be configured for the first, second, and third differences; the three differences are weighted by their respective weights, and the total difference is obtained from the weighted result: the weight of the first difference is multiplied by the first difference, the weight of the second difference by the second difference, and the weight of the third difference by the third difference, and the sum of the three products is taken as the total difference.
  • The third difference can be used to characterize the movement of the pupil. Since the interval between the acquisition times of two consecutive frames is short, the actor's pupil usually moves little within that interval, which is reflected in adjacent eye images as a small change in pupil position between frames; a larger third difference therefore indicates that the pupil is moving implausibly fast.
  • The third difference constrains the optimization function used in the iterative optimization to search for the solution (θ and φ of the three-dimensional pupil center position) in the neighborhood of the initial value.
  • If the solution sought (θ and φ of the three-dimensional pupil center position) is not in that neighborhood, the third difference becomes large, which prompts the optimization function back toward the neighborhood of the initial value and improves gaze capture efficiency.
  • The initial value comes from the gaze direction captured from the previous frame's eye image, or from the gaze direction corresponding to the default forward-looking eye image.
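  • A minimal sketch of the weighted total difference; the patent does not specify the weight values, so the defaults below are placeholders:

```python
def total_difference(d1, d2, d3, w1=1.0, w2=1.0, w3=1.0):
    """Weighted total difference combining the mask difference (d1),
    the 2D pupil center difference (d2), and the temporal difference
    from the initial pupil position (d3)."""
    return w1 * d1 + w2 * d2 + w3 * d3
```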
  • Step S1325 determining whether the total difference is greater than a preset first threshold.
  • If the total difference is not greater than the preset first threshold, step S1326 is executed.
  • Step S1326 taking the estimated three-dimensional pupil center position as the three-dimensional pupil center position.
  • If the total difference is greater than the preset first threshold, step S1327 is executed.
  • Step S1327 adjusting the estimated three-dimensional pupil center position according to the total difference.
  • Spherical coordinates (r, θ, φ) can be used, where r is the eyeball radius, θ is the zenith angle, and φ is the azimuth angle.
  • the three-dimensional pupil center position of each eye may be optimized in a synthesis-analysis manner, that is, the manner corresponding to steps S1321 to S1327.
  • The eyeball radius r may be a preset value; therefore, when optimizing the three-dimensional pupil center position by synthesis-analysis, only θ and φ are optimized.
  • If the total difference is greater than the preset first threshold, step S1321 is continued with the adjusted three-dimensional pupil center position; that is, the three-dimensional pupil center position is iteratively optimized until the total difference is not greater than the preset first threshold or the number of iterations reaches a set number, and the estimated three-dimensional pupil center position at that point is used as the three-dimensional pupil center position.
  • the above steps S1321-S1327 are performed for each frame of image.
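  • The synthesis-analysis iteration can be sketched as follows; the patent does not fix the optimizer, so a finite-difference gradient step with illustrative threshold, step size, and iteration limit is assumed, and `render_and_compare` stands in for synthesizing the iris, projecting it, and returning the total difference:

```python
def optimize_pupil(theta0, phi0, render_and_compare,
                   threshold=1e-3, max_iters=50, step=1e-2):
    """Iteratively optimize (theta, phi) of the 3D pupil center until the
    total difference drops below `threshold` or `max_iters` is reached."""
    theta, phi = theta0, phi0
    for _ in range(max_iters):
        err = render_and_compare(theta, phi)
        if err <= threshold:
            break
        # Finite-difference gradient of the total difference.
        g_t = (render_and_compare(theta + 1e-4, phi) - err) / 1e-4
        g_p = (render_and_compare(theta, phi + 1e-4) - err) / 1e-4
        theta -= step * g_t
        phi -= step * g_p
    return theta, phi
```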
  • whether a capture error occurs may be determined according to the interaction relationship between the two eyes. If it is determined that a capture error occurs according to the interaction relationship between the two eyes, the gaze direction captured according to the eye image of the previous frame is used as the gaze direction corresponding to the current eye image.
  • The interaction relationship generally refers to whether the gaze direction of the left eye and that of the right eye could be produced by the same person at the same time. For example, if the captured gaze has the left eye looking up and the right eye looking down, which an ordinary person can hardly do, the capture can be judged erroneous.
  • Specifically, the interaction between the two eyes can be determined from the optimized zenith angle θ and azimuth angle φ of each eye's three-dimensional pupil center position, by calculating the joint prior distribution of θ and φ of the three-dimensional pupil center positions of the two eyes. When the probability value indicated by the joint prior distribution is lower than a set probability threshold, the capture is judged erroneous, and the gaze direction captured from the previous frame's eye image is used as the gaze direction corresponding to the current eye image; when it is not lower than the threshold, the gaze direction captured from the current eye image is used.
  • The probability value represents the probability of the joint occurrence of θ and φ of the three-dimensional pupil center positions of the two eyes.
  • The joint prior distribution of θ and φ of the two eyes' three-dimensional pupil center positions is the joint prior distribution of four variables: θ and φ of the left eye, and θ and φ of the right eye.
  • These four variables (θ and φ of the left eye, θ and φ of the right eye) are obtained after the iterative optimization is completed.
  • The range of captured gaze directions is limited by the joint prior distribution to ensure that the captured gaze conforms to states an ordinary person can produce, avoiding abnormal gaze directions such as the left eye looking up while the right eye looks down.
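  • A sketch of such a plausibility check, assuming the joint prior is modeled as a multivariate Gaussian fitted offline to plausible gaze data (the patent does not fix the distribution family):

```python
import numpy as np

def joint_prior_ok(angles, mean, cov, prob_threshold):
    """Evaluate a joint prior over (theta_L, phi_L, theta_R, phi_R) as a
    multivariate Gaussian density; return True if the captured gaze is
    at least as probable as the threshold."""
    x = np.asarray(angles, dtype=float)
    mu = np.asarray(mean, dtype=float)
    cov = np.asarray(cov, dtype=float)
    d = x - mu
    k = len(x)
    norm = np.sqrt((2 * np.pi) ** k * np.linalg.det(cov))
    density = np.exp(-0.5 * d @ np.linalg.solve(cov, d)) / norm
    return density >= prob_threshold
```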
  • In some embodiments, before step S132 is executed, it is determined whether the actor's eyes are closed according to the eye open/closed state output in step S131.
  • When the eyes are closed, the gaze direction captured from the previous frame's eye image is used as the gaze direction corresponding to the current eye image; that is, the gaze direction is no longer captured from this frame's eye image, which improves the stability of the gaze capture process and ensures the consistency and coherence of the captured gaze directions.
  • When the actor's gaze is captured continuously over multiple frames of eye images, the captured gaze changes better match the actor's actual gaze changes.
  • the three-dimensional eye information is the personalized data of the actor.
  • Before step S12, it is determined whether eyeball calibration has been performed; if not, eyeball calibration is performed.
  • Through calibration, the three-dimensional eye information can be closer to the actor's real eyes.
  • the center position of the eyeball, the radius of the eyeball, and the size of the iris in the three-dimensional information of the eye can be obtained through eyeball calibration.
  • The three-dimensional pupil center position expresses the motion state of the pupil center and can be obtained through the optimization of steps S1321 to S1327, which is not described again here.
  • eyeball calibration can be performed in the following manner.
  • the eyeball radius may be the average eyeball radius of an adult. In some embodiments, the eyeball radius may take 12.5mm. It is understandable that, according to the requirements of the actual application scenario, the value of the eyeball radius can also be adjusted adaptively. For example, when the actor is a child, the eyeball radius can be adjusted slightly smaller to try to fit the actual size of the child's eyeball. For another example, actors of different races have different eyeball sizes, and the eyeball radius can also be configured according to the specific race of the actor. Among them, the race can be yellow, white, black, etc., and there can be many types of races. You can select different classification methods according to your needs and configure the corresponding eyeball radius, which is not limited here.
  • Referring to Fig. 7, a flowchart of the calibration of the eyeball center position in eyeball calibration in an embodiment of the present invention is given, which may specifically include the following steps:
  • Step S71 acquiring a three-dimensional face of the actor with a neutral expression, and acquiring a plurality of three-dimensional eyelid feature points from the three-dimensional face under the neutral expression.
  • Step S72 calculate the average value of the three-dimensional positions of the multiple three-dimensional eyelid feature points of each eye, and add a preset three-dimensional offset on the basis of the average value of the three-dimensional positions to obtain the eyeball center of each eye. position, the offset direction of the three-dimensional offset is toward the inside of the eye.
  • the number of 3D eyelid feature points of each eye may be 6, 8, or may be other more numbers, and the specific number of 3D eyelid feature points can be set according to actual requirements.
  • the average value of the three-dimensional positions of the selected three-dimensional eyelid feature points of each eyeball is calculated.
  • A preset three-dimensional offset is then added to the average three-dimensional position of the eyelid feature points, with the offset direction pointing toward the inside of the eye, to simulate the real offset between the eyelids and the eyeball center; the offset position is taken as the eyeball center of each eye.
  • the center position of the eyeball relative to the three-dimensional face is obtained.
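  • A minimal sketch of this calibration step; the offset vector below is an illustrative assumption, since the patent only requires that the offset point toward the inside of the eye:

```python
import numpy as np

def calibrate_eye_center(eyelid_pts_3d, offset=(0.0, 0.0, -9.0)):
    """Eyeball center from the neutral-expression 3D eyelid feature
    points: mean position plus a preset inward offset (offset value
    and axis convention are assumptions for illustration)."""
    mean_pos = np.asarray(eyelid_pts_3d, dtype=float).mean(axis=0)
    return mean_pos + np.asarray(offset, dtype=float)
```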
  • Referring to Fig. 8, a flowchart of obtaining the actor's three-dimensional eye information in an embodiment of the present invention is given, which may specifically include:
  • Step S81 acquiring a face image corresponding to the actor's eye image.
  • Step S82 obtaining a transformation matrix of the actor's facial posture according to the facial image, where the facial posture is the posture of the actor's face relative to the camera.
  • the facial pose can be obtained through 3D face reconstruction.
  • In some embodiments, the 3D face model is jointly determined by the facial pose and expression parameters: the expression parameters define the 3D face model in the face coordinate system, and the facial pose converts the 3D face model from the face coordinate system into the camera coordinate system or another specified coordinate system; therefore, by reconstructing the 3D model of the face in the facial image, the facial pose can be calculated.
  • the facial pose can also be predicted by a deep learning algorithm, that is, the facial image is input into a deep learning network to predict the facial pose.
  • the facial posture is the posture of the actor's face relative to the camera, that is, the position and orientation of the actor's face relative to the camera.
  • The facial image corresponding to the actor's eye image is obtained as follows: the actor wears a facial expression capture helmet that remains stationary relative to the actor's head, and a facial expression capture camera mounted on the helmet captures the actor's facial expressions.
  • the pose transformation matrix is a fixed value for the face image of any frame.
  • The helmet and the actor's head are relatively stationary throughout each acquisition session, that is, their relative position and orientation are fixed, where one acquisition session is defined as the period from when the actor puts on the helmet until the helmet is taken off.
  • the transformation matrix of the facial pose is a fixed value, that is, after the pose transformation matrix is obtained, the transformation matrix of the facial pose can be used in subsequent pictures without further calculation.
  • the facial image corresponding to the eye image of the actor is obtained according to the following method: the facial expression of the actor is photographed with a camera; the camera is separated from the actor's head.
  • In this case, the transformation matrix of the facial pose varies from frame to frame and must be recalculated for each frame.
  • Step S83 transforming the eyeball center position according to the transformation matrix of the facial pose to obtain the eyeball center position relative to the camera.
  • the center position of the eyeball relative to the coordinate system where the camera is located is obtained by transforming the center position of the eyeball by using the transformation matrix of the facial posture.
  • the position of the center of the eyeball may be relative to the position of the three-dimensional human face. It should be noted that when the selected reference coordinate system is different, the specific value of the center position of the eyeball is different. Specifically, the reference coordinate system can be selected according to the requirements and converted when necessary.
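  • A sketch of this transformation, assuming the facial pose is given as a 4x4 homogeneous rigid-transform matrix:

```python
import numpy as np

def eye_center_in_camera(eye_center_face, pose):
    """Transform the calibrated eyeball center from the face coordinate
    system to the camera coordinate system using the 4x4 facial-pose
    matrix `pose`."""
    p = np.append(np.asarray(eye_center_face, dtype=float), 1.0)
    return (pose @ p)[:3]
```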
  • Referring to Fig. 9, a flowchart of the calibration of the iris size in eyeball calibration in an embodiment of the present invention is given, which may specifically include the following steps:
  • Step S91 acquiring a preset number of calibration images that meet the calibration requirements.
  • For example, during calibration the actor keeps the eyes open for a preset duration (e.g., 1 to 2 seconds), after which the actor starts to make expressions.
  • Keeping the eyes open provides the calibration images required for subsequent calibration. For example, for each image input from the single camera, it can first be determined whether eyeball calibration has been completed; when it has, calibration status information indicating completion is output. If calibration has not been completed, it is judged whether the captured image meets the calibration requirements (for example, the eyes are normally open and looking forward), and qualifying images are used to perform the eyeball calibration.
  • the set number of calibration images can be configured in advance according to requirements.
  • Step S92 inputting each calibration image into the eye network model, and predicting multiple iris masks.
  • Step S93 performing circle fitting on the plurality of iris masks, respectively, to obtain a plurality of circles after the circle fitting.
  • That is, circle fitting is performed on each iris mask separately: the edge pixels of the iris mask are fitted to a circle.
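  • A sketch of the circle fitting using the algebraic (Kasa) least-squares method; the patent does not specify the fitting algorithm, so this choice is an assumption:

```python
import numpy as np

def fit_circle(points):
    """Algebraic (Kasa) least-squares circle fit to the edge pixels of
    an iris mask. `points` is an (N, 2) array of (x, y) coordinates;
    returns (center_x, center_y, radius)."""
    pts = np.asarray(points, dtype=float)
    x, y = pts[:, 0], pts[:, 1]
    # Model: x^2 + y^2 = 2*a*x + 2*b*y + c, center (a, b), r^2 = c + a^2 + b^2.
    A = np.column_stack([2 * x, 2 * y, np.ones(len(pts))])
    b = x ** 2 + y ** 2
    cx, cy, c = np.linalg.lstsq(A, b, rcond=None)[0]
    r = np.sqrt(c + cx ** 2 + cy ** 2)
    return cx, cy, r
```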
  • Step S94 project the multiple circles to the three-dimensional face of the actor under neutral expression, and calculate the iris size corresponding to the multiple iris masks in the three-dimensional face according to the projection result.
  • Back-projection is the inverse process of camera projection, that is, connecting the camera and a pixel in the picture to generate a ray, and calculating the intersection of the ray and the eyeball as the back-projection point of the picture pixel. If there are 2 intersections, take the intersection that is closer to the camera; if there is no intersection, there is no backprojection point for that pixel.
  • When the camera is located at the coordinate origin (0, 0, 0), the coordinates of an image pixel can be represented as (x, y, f), where (x, y) are the pixel's two-dimensional coordinates in the image and f is the camera's focal length in pixels.
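  • A sketch of this back-projection, assuming the camera at the origin, pixel coordinates already centered on the principal point, and the focal length in pixels:

```python
import numpy as np

def back_project(pixel_xy, focal_px, sphere_center, sphere_radius):
    """Back-project an image pixel onto the eyeball sphere: cast a ray
    from the camera at the origin through (x, y, f) and return the
    intersection nearer to the camera, or None if the ray misses."""
    d = np.array([pixel_xy[0], pixel_xy[1], focal_px], dtype=float)
    d /= np.linalg.norm(d)
    c = np.asarray(sphere_center, dtype=float)
    # Solve |t*d - c|^2 = r^2 for the ray parameter t >= 0.
    b = d @ c
    disc = b ** 2 - (c @ c - sphere_radius ** 2)
    if disc < 0:
        return None                # no intersection: no back-projection point
    t = b - np.sqrt(disc)          # nearer intersection first
    if t < 0:
        t = b + np.sqrt(disc)
    return t * d if t >= 0 else None
```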
  • Step S95 obtaining the iris size according to the iris size corresponding to the multiple iris masks in the three-dimensional face.
  • an average value of iris sizes corresponding to multiple iris masks in a three-dimensional human face may be used as the calibrated iris size.
  • Alternatively, a corresponding weight may be configured for each iris size, each iris size weighted accordingly, and the weighted result used as the calibrated iris size.
  • Alternatively, the maximum and minimum values among the iris sizes corresponding to the multiple iris masks in the three-dimensional face may be removed, the remaining iris sizes averaged, and the calculated average used as the calibrated iris size.
  • The iris can be approximated as a spherical cap on the eyeball, and the iris size can be represented by the radius of the cap's base circle, or by the angle between that base radius and the eyeball radius.
  • the calibration of the center position of the eyeball and the calibration of the size of the iris can be performed synchronously or asynchronously, which is not limited here.
  • the positions of the three-dimensional pupil centers of the two eyes may be optimized respectively.
  • For the optimization process of the three-dimensional pupil center position, reference may be made to the descriptions of steps S1321 to S1327 in the above embodiment, which are not repeated here.
  • As described above, the actor's three-dimensional eyeball is determined from the actor's three-dimensional eye information; from the acquired eye image, the eye network model and the three-dimensional eyeball are used to determine the three-dimensional pupil center position, and the actor's gaze direction is captured from that position.
  • Compared with approaches that require special glasses or dedicated infrared equipment, the embodiment of the present invention captures the actor's gaze direction using the actor's eye image, the actor's three-dimensional eye information and the eye network model, without requiring the user to wear additional equipment. This single-camera capture technology not only improves the user's comfort when using the device, but is also inexpensive and does not need to be performed in a dedicated studio, which can effectively reduce the cost of eye capture.
  • The three-dimensional eye information used to determine the three-dimensional eyeball may be the eyeball center position, eyeball radius and iris size obtained after eyeball calibration.
  • The calibrated eyeball center position can additionally be transformed according to the transformation matrix of the facial pose.
  • Training can be performed for a single eye (left or right), yielding an eye network model corresponding to that eye.
  • Assuming the left (or right) eye is taken as the reference, the sample images of the right (or left) eye used as training samples can be flipped left-right and thereby converted into left- (or right-) eye samples, so only one model needs to be trained. For example, when training the eye network model on the basis of the left eye, each right-eye image is flipped symmetrically, and the flipped right-eye images together with the original left-eye images form the training data used to train the model (a flip sketch follows below).
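A minimal sketch of this flip-based pooling of training data, assuming the eye crops are numpy arrays of shape (H, W, C):

import numpy as np

def build_left_eye_training_set(left_eye_imgs, right_eye_imgs):
    # Flip right-eye crops horizontally so they look like left-eye crops,
    # then pool everything into a single left-eye training set.
    flipped = [np.fliplr(img) for img in right_eye_imgs]
    return list(left_eye_imgs) + flipped

Note that any annotated 2D points on a flipped image must be mirrored as well (x becomes W - 1 - x), and predictions for a flipped input must be mirrored back at inference time.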
  • Each sample image's manually annotated eyelid feature points are aligned with the preset eyelid feature points, and the similarity transformation matrix required for this alignment is obtained; the preset eyelid feature points are the feature points under the default expression (also referred to as the neutral expression).
  • The sample images are then transformed with the similarity transformation matrix, so that the transformed images share highly similar attributes and characteristics with the preset eye image: all sample images end up with similar rotation, scale and position, all adjusted on the basis of the preset eyelid feature points. Concretely, a similarity transformation is applied to the manually annotated eyelid feature points so that they coincide as closely as possible with the positions of the corresponding preset eyelid feature points; the transformation is found by minimizing the difference between each annotated feature point and its preset counterpart (a sketch follows below).
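One way to estimate such a similarity transform (rotation, uniform scale, translation), assuming OpenCV is available; cv2.estimateAffinePartial2D restricts the affine fit to exactly these degrees of freedom:

import cv2
import numpy as np

def align_to_preset(image, annotated_pts, preset_pts, out_size):
    # Least-squares similarity transform mapping annotated -> preset points.
    M, _ = cv2.estimateAffinePartial2D(
        np.asarray(annotated_pts, dtype=np.float32),
        np.asarray(preset_pts, dtype=np.float32))
    aligned = cv2.warpAffine(image, M, out_size)  # resample the sample image
    return aligned, M

At prediction time the same idea applies in reverse: outputs on the aligned image are mapped back with the inverse matrix (e.g., via cv2.invertAffineTransform(M)).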
  • Inputting the similarity-transformed sample images into the deep learning network to train the eye network model makes the network converge quickly and keeps the required network small, reducing the difficulty of training and improving training efficiency. Because the task is easier, a smaller deep learning network can be used, which in turn reduces the running time of the whole algorithm.
  • The aim is a simple eye network model that converges easily during training, predicts quickly, and is cheap to run.
  • Referring to FIG. 10, a schematic diagram of an application scenario in an embodiment of the present invention is given.
  • In this scenario A, the eye capture method is used in an action and expression capture system: the actor wears a facial expression capture helmet 30, the helmet 30 is fixed relative to the actor's head, and a camera 40 is provided on the helmet 30.
  • With reference to the flowchart of a further eye capture method given in FIG. 11, the specific process is as follows:
  • a1 The helmet of the face capture system captures the actor's facial image.
  • a2 Calculate the facial pose (headpose) transformation matrix of the first frame image according to the facial image and the camera position.
  • In this embodiment, the facial pose transformation matrix is a fixed value; subsequent frames reuse the transformation matrix corresponding to the first frame image.
  • a3 Crop the eye image from the actor's facial image (for the specific implementation, see steps S111-S114 or S101-S105).
  • a4 Input the eye image into the eye network model to obtain the iris mask, the two-dimensional pupil center position, and the open/closed-eye state (see steps S1311-S1315).
  • a5 Determine whether the actor's eyes are closed. If yes, execute a6: reuse the gaze direction corresponding to the previous frame image; if no, execute step a7.
  • a7 Determine whether the eyeball has been calibrated. If yes, go to step a9; if no, go to step a8.
  • a8 Obtain the eyeball radius, iris size and eyeball center position relative to face coordinates through eyeball calibration, and convert the latter into the eyeball center position relative to the camera through the facial pose transformation matrix (see steps S71-S72, S91-S95 and S81-S83).
  • a9 From the predicted iris mask, the two-dimensional pupil center position, the eyeball radius, the iris size, the eyeball center position relative to camera coordinates, and the estimated three-dimensional pupil center position, capture the actor's gaze direction by the analysis-by-synthesis method, i.e. obtain the zenith angle θ and azimuth angle φ of the spherical coordinates of the three-dimensional pupil center position (see steps S1321-S1327).
  • a10 Calculate the joint prior distribution of the zenith angles θ and azimuth angles φ of the three-dimensional pupil center positions of the pair of eyes.
  • a11 Determine whether this joint prior distribution satisfies the set probability threshold requirement. If not, reuse the previous frame's gaze direction capture result, i.e. execute a13; if yes, execute step a12.
  • a12 According to the zenith angle θ and azimuth angle φ of the three-dimensional pupil center position, render the iris at its position on the eyeball.
  • For example, the reconstructed three-dimensional face and gaze are presented on the display interface of the display terminal in FIG. 10.
  • The presentation effect shown in FIG. 10 is only a schematic illustration; other variants may also exist (a sketch of the joint prior check in a10-a11 follows below).
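The patent does not specify the form of the joint prior; one common choice would be a multivariate Gaussian over the four angles (θ_left, φ_left, θ_right, φ_right) fitted offline to plausible two-eye gaze data. A sketch under that assumption, using scipy:

import numpy as np
from scipy.stats import multivariate_normal

# mean and cov assumed to be fitted offline on plausible two-eye gaze angles
mean = np.zeros(4)                      # (theta_L, phi_L, theta_R, phi_R), radians
cov = np.diag([0.1, 0.1, 0.1, 0.1])     # illustrative values
prior = multivariate_normal(mean, cov)

def capture_is_plausible(angles, threshold=1e-3):
    # Reject the frame (and reuse the previous gaze direction) when the joint
    # density of the two eyes' angles falls below the set probability threshold.
    return prior.pdf(angles) >= threshold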
  • Referring to FIG. 12 and FIG. 13, schematic diagrams of another application scenario in an embodiment of the present invention are given.
  • In this scenario B, the eye capture method is based on a single-camera system; the single camera may be the camera 70 of the PC terminal 60, as shown in FIG. 12, or the camera of the mobile terminal 50 (e.g., a mobile phone). The relative position of the single camera and the actor is not fixed.
  • With reference to the flowchart of another eye capture method given in FIG. 14, the specific process is as follows:
  • b1 Capture the actor's facial image.
  • b2 Calculate the facial pose (headpose) transformation matrix of each frame image according to the facial image and the camera position.
  • Here the facial pose transformation matrix varies rather than being a fixed value, and must be recalculated for every frame image.
  • b3 Crop the eye image from the actor's facial image (for the specific implementation, see steps S111-S114 or S101-S105).
  • b4 Input the eye image into the eye network model to obtain the iris mask, the two-dimensional pupil center position, and the open/closed-eye state. For the specific implementation of step b4, reference may be made to steps S1311-S1315, which will not be repeated here.
  • b5 Determine whether the actor's eyes are closed. If yes, execute b6: reuse the gaze direction corresponding to the previous frame image; if no, execute step b7.
  • b7 Determine whether the eyeball has been calibrated. If yes, go to step b9; if no, go to step b8.
  • b8 Obtain the eyeball radius, iris size and eyeball center position relative to face coordinates through eyeball calibration, and convert the latter into the eyeball center position relative to the camera through the facial pose transformation matrix (see steps S71-S72, S91-S95 or S81-S83).
  • b9 From the predicted iris mask, the two-dimensional pupil center position, the eyeball radius, the iris size, the eyeball center position relative to camera coordinates, and the estimated three-dimensional pupil center position, capture the actor's gaze direction by the analysis-by-synthesis method, i.e. obtain the zenith angle θ and azimuth angle φ of the spherical coordinates of the three-dimensional pupil center position (see steps S1321-S1327).
  • b10 Calculate the joint prior distribution of the zenith angles θ and azimuth angles φ of the three-dimensional pupil center positions of the pair of eyes.
  • b11 Determine whether this joint prior distribution satisfies the set probability threshold requirement. If not, reuse the previous frame's gaze direction capture result, i.e. execute step b13; if yes, execute b12.
  • b12 According to the zenith angle θ and azimuth angle φ of the three-dimensional pupil center position, render the iris at its position on the eyeball.
  • For example, the reconstructed three-dimensional face and gaze are displayed on the display interface of the PC terminal 60 in FIG. 12, or presented on the display interface of the mobile terminal 50 as shown in FIG. 13. It can be understood that the presentation effects shown in FIGS. 12 and 13 are only schematic illustrations, and other variants may also exist.
  • Both application scenarios can be used for generating virtual character performance animation and for live broadcasts of virtual characters. Based on the captured gaze direction, eyeballs and irises can be added to the virtual character's face so that the character has a gaze direction similar to the actor's, which in turn conveys the actor's expression and intent more accurately.
  • In addition, eye capture also plays an extremely important role in intelligent interaction; FIGS. 15 to 17 give schematic diagrams of further application scenarios.
  • In the scenario of FIG. 15, a mobile terminal 50 (such as a mobile phone) collects the user's eye image, and the eye attention area is then determined from the captured gaze direction.
  • In the scenario of FIG. 16, the camera 70 of the PC terminal 60 collects the user's eye image, and the eye attention area is then determined from the captured gaze direction.
  • In the scenario of FIG. 17, the user wears a helmet 30 fixed relative to the user's head, and a camera 40 is arranged on the helmet; the camera 40 collects the user's eye image, and the eye attention area is then determined from the captured gaze direction.
  • The direction the eyes fixate on is often the direction in which the object of greatest interest is located. Through eye capture, the direction the user is staring at can be captured accurately, along with the things the user is interested in, revealing the user's preferences, interests and intentions so that personalized products can be delivered in a targeted manner (a gaze-to-screen sketch follows below).
  • For example, by detecting the direction of the user's gaze while the user stares at an advertisement on the screen, it may be found that the user is very interested in sports; this refines the user's profile (the user is a sports fan), and sports events, sports products and the like can be offered in future interactions.
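Mapping a captured gaze ray to an on-screen attention point amounts to a ray-plane intersection; a minimal sketch, assuming the screen plane is given by a point and a normal in camera coordinates (all names illustrative):

import numpy as np

def gaze_point_on_screen(eye_center, gaze_dir, plane_point, plane_normal):
    # Intersect the ray eye_center + t*gaze_dir with the screen plane.
    denom = float(np.dot(gaze_dir, plane_normal))
    if abs(denom) < 1e-9:
        return None                    # gaze is parallel to the screen
    t = float(np.dot(plane_point - eye_center, plane_normal)) / denom
    if t < 0:
        return None                    # the screen is behind the eye
    return eye_center + t * gaze_dir   # 3D attention point on the screen plane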
  • The eye capture apparatus provided by the embodiment of the present invention can perform eye capture on offline videos or images, and can also capture an actor's gaze online in real time.
  • In addition, the eye capture method provided by the embodiment of the present invention greatly improves the accuracy of single-camera facial expression capture and can vividly and effectively convey the real emotion and intent of a human face.
  • It also provides an algorithmic basis for core AI technologies such as single-camera online virtual live streaming, single-camera intelligent interaction and face recognition, and can be used in fields such as movies, games, criminal investigation and surveillance.
  • An embodiment of the present invention also provides an eye capture apparatus.
  • Referring to FIG. 18, an eye capture apparatus 140 in an embodiment of the present invention is given, which may specifically include:
  • an acquisition unit 141, used to acquire the actor's eye image;
  • a three-dimensional eyeball determination unit 142, configured to obtain the actor's three-dimensional eye information and determine the actor's three-dimensional eyeball from it, the three-dimensional eye information including at least: the eyeball center position, the eyeball radius and the iris size;
  • an eye capture unit 143, configured to determine the three-dimensional pupil center position from the eye image using the eye network model and the three-dimensional eyeball, and to capture the actor's gaze direction from the three-dimensional pupil center position.
  • In a specific implementation, the eye capture unit 143 is configured to obtain two-dimensional eye information from the eye image using the eye network model, the two-dimensional eye information including at least: the iris mask, the two-dimensional pupil center position and the open/closed-eye state; and to determine the three-dimensional pupil center position from the two-dimensional eye information and the three-dimensional eyeball.
  • The eye capture unit 143 is further configured to acquire multiple two-dimensional eyelid feature points corresponding to the eye image; calculate the similarity transformation matrix that aligns these feature points with multiple preset two-dimensional eyelid feature points; apply the similarity transformation to the eye image to obtain a transformed image; input the transformed image into the eye network model to predict the two-dimensional eye information corresponding to the transformed image; and transform that information with the inverse of the similarity transformation matrix to obtain the two-dimensional eye information corresponding to the original eye image.
  • The eye capture unit 143 is further configured to obtain the estimated iris of the three-dimensional eyeball from the three-dimensional eyeball and the estimated three-dimensional pupil center position; project the estimated iris onto the two-dimensional plane corresponding to the eye image to obtain an estimated iris mask; calculate the first difference between the estimated iris mask and the iris mask predicted by the eye network model; calculate the total difference from the first difference; and, if the total difference is not greater than a preset first threshold, take the estimated three-dimensional pupil center position as the three-dimensional pupil center position.
  • In a specific implementation, the eye capture apparatus 140 further includes an optimization unit configured to, if the total difference is greater than the preset first threshold, adjust the estimated three-dimensional pupil center position according to the total difference and optimize it iteratively, until the total difference is not greater than the preset first threshold or the number of iterations reaches the set number; the estimated three-dimensional pupil center position at that point is taken as the three-dimensional pupil center position.
  • The optimization unit is further configured to project the estimated iris onto the two-dimensional plane corresponding to the eye image to obtain an estimated two-dimensional pupil center position; calculate the second difference between the estimated two-dimensional pupil center position and the two-dimensional pupil center position predicted by the eye network model; and calculate the total difference from the first difference and the second difference.
  • The optimization unit is further configured to calculate the third difference between the three-dimensional pupil center position of the current iteration and the three-dimensional pupil center position at the start of the optimization, and to calculate the total difference from the first, second and third differences.
  • The optimization unit is further configured to compute the intersection and the union of the estimated iris mask and the iris mask predicted by the eye network model, and take the difference between the intersection-to-union ratio and the ideal ratio as the first difference; or to generate a distance transform map from the iris mask predicted by the eye network model, evaluate the estimated iris mask's edge pixels in the distance transform map, and obtain the first difference from the computed values (both variants are sketched below).
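Both variants of the first difference, sketched for binary masks with numpy and OpenCV; cv2.distanceTransform measures each pixel's distance to the nearest zero pixel, so the predicted mask is inverted first, and an ideal ratio of 1 (complete overlap) is assumed:

import cv2
import numpy as np

def first_difference_iou(est_mask, pred_mask, ideal_ratio=1.0):
    inter = np.logical_and(est_mask, pred_mask).sum()
    union = np.logical_or(est_mask, pred_mask).sum()
    return ideal_ratio - inter / max(union, 1)      # 0 when the masks coincide

def first_difference_dt(est_edge_pixels, pred_mask):
    # Distance map: value of each pixel = distance to the nearest predicted-mask pixel.
    dt = cv2.distanceTransform((pred_mask == 0).astype(np.uint8), cv2.DIST_L2, 3)
    xs, ys = est_edge_pixels[:, 0], est_edge_pixels[:, 1]
    return dt[ys, xs].sum()                          # 0 when all edges fall inside the mask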
  • In a specific implementation, the eye capture apparatus 140 may further include an eyeball calibration unit, and the actor's three-dimensional eye information includes the eyeball center position, eyeball radius and iris size obtained by the eyeball calibration unit performing eyeball calibration.
  • The eyeball calibration unit is used to obtain the actor's three-dimensional face under a neutral expression and extract multiple three-dimensional eyelid feature points from it; compute, for each eye, the average of the three-dimensional positions of these eyelid feature points; and add a preset three-dimensional offset to that average to obtain each eye's eyeball center position, the offset direction of the three-dimensional offset pointing toward the inside of the eye.
  • The acquisition unit 141 is configured to obtain the facial image corresponding to the actor's eye image; obtain from the facial image the transformation matrix of the actor's facial pose, the facial pose being the pose of the actor's face relative to the camera; and transform the eyeball center position with this matrix to obtain the eyeball center position relative to the camera (a combined sketch of the center estimate and this transform follows below).
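A minimal numpy sketch of both steps: the eyelid-average-plus-offset center estimate and the head-pose transform into camera coordinates. The offset value and the 4x4 homogeneous layout of the pose matrix are assumptions, not values from the patent:

import numpy as np

def eyeball_center_in_face(eyelid_pts_3d, inward_offset):
    # Mean of the eye's 3D eyelid feature points, pushed toward the eye interior.
    return np.asarray(eyelid_pts_3d, dtype=float).mean(axis=0) + inward_offset

def to_camera(center_face, head_pose_4x4):
    # head_pose_4x4: assumed face-to-camera homogeneous transform.
    p = np.append(center_face, 1.0)
    return (head_pose_4x4 @ p)[:3]

# Illustrative use with six eyelid points and an identity pose:
pts = np.random.rand(6, 3)
offset = np.array([0.0, 0.0, -0.012])   # ~12 mm toward the eye interior (assumed)
center_cam = to_camera(eyeball_center_in_face(pts, offset), np.eye(4))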
  • In one implementation, the facial image corresponding to the actor's eye image is obtained as follows: the actor wears a facial expression capture helmet that remains stationary relative to the actor's head, and a facial expression capture camera mounted on the helmet captures the actor's facial expressions.
  • In this case, the facial pose transformation matrix is a fixed value for the facial image of any frame.
  • In another implementation, the facial image corresponding to the actor's eye image is obtained as follows: a camera is used to photograph the actor's facial expressions, the camera being separate from the actor's head.
  • In that case, the facial pose transformation matrix varies for the facial image of each frame.
  • The eyeball calibration unit is further configured to obtain the preset number of calibration images that meet the calibration requirements; input each calibration image into the eye network model to predict multiple iris masks; fit a circle to each iris mask to obtain multiple fitted circles; project the circles onto the actor's three-dimensional face under a neutral expression and, from the projection results, compute the iris size that each iris mask corresponds to on the three-dimensional face; and obtain the calibrated iris size from those per-mask iris sizes.
  • For example, the eyeball calibration unit may use the average of the iris sizes corresponding to the multiple iris masks on the three-dimensional face as the iris size.
  • In a specific implementation, the eye network model is trained for one eye of a pair; when the eye image input to the model belongs to the other eye of the pair, the input eye image is flipped symmetrically and the flipped image is used as the model's input.
  • The eye capture apparatus 140 may further include a first judgment unit configured to judge, from the open/closed-eye state and before the three-dimensional pupil center position is determined from the two-dimensional eye information and the three-dimensional eyeball, whether the actor's eyes are closed; when the open/closed-eye state indicates closed eyes, the gaze direction captured from the previous frame's eye image is used as the gaze direction corresponding to the current eye image.
  • The eye capture apparatus 140 may further include a computing unit and a second judgment unit. After the three-dimensional pupil center position of each eye of a pair has been captured, the computing unit calculates the joint prior distribution of the zenith angles θ and azimuth angles φ of the pair's three-dimensional pupil center positions, a three-dimensional pupil center position comprising the eyeball radius, the zenith angle θ and the azimuth angle φ.
  • The second judgment unit is used to judge whether the joint prior distribution result is below the set probability threshold; when the probability value indicated by the result is below the threshold, the capture is judged erroneous, and the gaze direction captured from the previous frame's eye image is used as the gaze direction corresponding to the current eye image.
  • The eye capture unit 143 is configured to determine the direction from the eyeball center position toward the three-dimensional pupil center position and take that direction as the actor's gaze direction.
  • The eye capture apparatus 140 may be integrated into computing devices such as terminals and servers. For example, it may be integrated centrally within the same server, or distributed across multiple mutually coupled terminals or servers.
  • For example, the three-dimensional gaze model can be deployed separately on a terminal or server to ensure better data processing speed.
  • Based on the eye capture apparatus 140 and the corresponding eye capture method, the user supplies the eye image to be processed at the acquisition unit 141 side, and the actor's gaze direction is captured at the output of the eye capture unit 143, thereby achieving eye capture for the actor.
  • An embodiment of the present invention further provides a storage medium, which is a non-volatile or non-transitory storage medium storing a computer program; when the computer program is run by a processor, it executes the steps of the eye capture method provided by any of the foregoing embodiments.
  • An embodiment of the present invention further provides a terminal, including a memory and a processor, the memory storing a computer program capable of running on the processor; when running the computer program, the processor executes the steps of any of the above eye capture methods.

Abstract

An eye capture method and apparatus, a storage medium, and a terminal. The eye capture method includes: acquiring an actor's eye image; acquiring the actor's three-dimensional eye information and determining the actor's three-dimensional eyeball according to it, the three-dimensional eye information including at least the eyeball center position, the eyeball radius and the iris size; and, according to the eye image, determining the three-dimensional pupil center position using an eye network model and the three-dimensional eyeball, and capturing the actor's gaze direction according to the three-dimensional pupil center position. The solution can reduce the cost of eye capture and improve user experience.

Description

眼神捕捉方法及装置、存储介质、终端
本申请要求2021年3月18日提交中国专利局、申请号为202110290851.4、发明名称为“眼神捕捉方法及装置、存储介质、终端”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
技术领域
本发明实施例涉及眼神捕捉领域,尤其涉及一种眼神捕捉方法及装置、存储介质、终端。
背景技术
人脸动画是当前很多热门应用的重要组成部分,比如电影、游戏、虚拟现实等。目前通常直接捕捉真实演员的面部并生成虚拟的三维人脸模型。眼睛作为人脸中最能传达情感信息的器官,在人脸捕捉技术中发挥着至关重要的作用。能否捕捉到精细的眼球运动(即眼神),是能否准确传达演员的意图和感受的关键。除此之外,眼神捕捉在智能交互中也发挥着极其重要的作用,通过眼神捕捉,可以准确的捕捉到用户紧盯的方向,并捕捉到用户所感兴趣的事物。
当前的眼神捕捉技术通常是基于红外设备的,用户需要佩戴特制的眼镜或者布置特定的红外设备。然而,这种眼神捕捉方式给用户带来了极大的不舒适感并且成本很高,且通常要到指定的工作室进行采集。基于红外设备的技术,极大的阻碍了眼神捕捉技术的发展和推广。
发明内容
本发明实施例解决的技术问题是眼神捕捉的成本较高以及用户体验差。
为解决上述技术问题,本发明实施例提供一种眼神捕捉方法,包括:获取演员的眼部图像;获取所述演员的眼部三维信息,根据所述 眼部三维信息确定所述演员的三维眼球,所述眼部三维信息至少包括:眼球中心位置、眼球半径及虹膜尺寸;根据所述眼部图像,采用眼睛网络模型和所述三维眼球,确定三维瞳孔中心位置,并根据所述三维瞳孔中心位置捕捉所述演员的眼神方向。
可选的,所述根据所述眼部图像,采用眼睛网络模型和所述三维眼球,确定三维瞳孔中心位置,包括:根据所述眼部图像,采用所述眼睛网络模型得到眼部二维信息,所述眼部二维信息至少包括:虹膜掩膜、二维瞳孔中心位置及睁闭眼状态;根据所述眼部二维信息和所述三维眼球,确定所述三维瞳孔中心位置。
可选的,所述根据所述眼部图像,采用所述眼睛网络模型得到眼部二维信息,包括:获取所述眼部图像对应的多个二维眼皮特征点;计算将所述多个二维眼皮特征点与多个预设二维眼皮特征点对齐时的相似变换矩阵;采用所述相似变换矩阵对所述眼部图像进行相似变换,得到变换后的图像;将所述变换后的图像输入至所述眼睛网络模型,预测变换后的图像对应的眼部二维信息;采用所述相似变换矩阵的逆矩阵,对所述变换后的图像对应的眼部二维信息进行变换,得到所述眼部图像对应的眼部二维信息。
可选的,所述根据所述眼部二维信息和所述三维眼球,确定所述三维瞳孔中心位置,包括:根据所述三维眼球和预估三维瞳孔中心位置,得到所述三维眼球的预估虹膜;将所述预估虹膜投影至所述眼部图像对应的二维平面,得到预估虹膜掩膜;计算所述预估虹膜掩膜与所述眼睛网络模型预测的虹膜掩膜之间的第一差异;根据所述第一差异计算得到总差异;若所述总差异不大于预设第一阈值,则将所述预估三维瞳孔中心位置作为所述三维瞳孔中心位置。
可选的,所述眼神捕捉方法,还包括:若所述总差异大于预设第一阈值,根据所述总差异对所述预估三维瞳孔中心位置进行调整并迭代优化,直至所述总差异不大于预设第一阈值或者迭代次数达到设定次数,将所述总差异不大于预设第一阈值或者迭代次数达到设定次数 时的预估三维瞳孔中心位置作为所述三维瞳孔中心位置。
可选的,所述根据所述第一差异计算得到总差异,包括:将所述预估虹膜投影至所述眼部图像对应的二维平面,得到预估二维瞳孔中心位置;计算所述预估二维瞳孔中心位置与所述眼睛网络模型预测的二维瞳孔中心位置之间的第二差异;根据所述第一差异及所述第二差异计算得到所述总差异。
可选的,所述根据所述第一差异和第二差异计算得到总差异,包括:计算当前迭代优化的三维瞳孔中心位置与优化初始时的三维瞳孔中心位置之间的第三差异;根据所述第一差异、所述第二差异及所述第三差异,计算得到所述总差异。
可选的,所述计算预估虹膜掩膜与所述眼睛网络模型预测的虹膜掩膜之间的第一差异,包括:计算所述预估虹膜掩膜与所述眼睛网络模型预测的虹膜掩膜的交集部分,以及所述预估虹膜掩膜与所述眼睛网络模型预测的虹膜掩膜的并集部分,将所述交集部分与所述并集部分的比值与理想比值的差异作为所述第一差异;或者,根据所述眼睛网络模型预测的虹膜掩膜的生成距离变换图,计算所述预估虹膜掩膜的边缘像素在所述距离变换图中的值,根据计算得到的值得到所述第一差异。
可选的,所述获取所述演员的眼部三维信息,包括:通过眼球校准获得所述眼球中心位置、眼球半径及虹膜尺寸。
可选的,所述通过眼球校准获得所述眼球中心位置,包括:获取所述演员在中性表情下的三维人脸,从所述中性表情下的三维人脸中获取多个三维眼皮特征点;计算每只眼睛的所述多个三维眼皮特征点的三维位置的平均值,在所述三维位置的平均值的基础上加上预设的三维偏移量得到每只眼睛的眼球中心位置,所述三维偏移量的偏移方向朝向眼睛内部。
可选的,所述获取所述演员的眼部三维信息,包括:获取与所述 演员的眼部图像对应的面部图像;根据所述面部图像,获得所述演员的面部姿态的变换矩阵,所述面部姿态为所述演员面部相对于相机的姿态;根据所述面部姿态的变换矩阵对所述眼球中心位置进行变换,得到相对于相机的眼球中心位置。
可选的,所述与所述演员的眼部图像对应的面部图像根据以下方法获取:所述演员的头部佩戴面部表情捕捉头盔,所述头盔与所述演员的头部相对静止;所述头盔上安装有面部表情捕捉相机,所述相机捕捉演员面部表情。
可选的,对于任一帧的面部图像所述面部姿态的变换矩阵是固定值。
可选的,所述与所述演员的眼部图像对应的面部图像根据以下方法获取:利用摄像机拍摄所述演员的面部表情;所述摄像机与所述演员的头部是分离的。
可选的,对于任一帧的面部图像所述面部姿态的变换矩阵是变化的。
可选的,所述通过眼球校准获得所述虹膜尺寸,包括:获取预设数量且满足校准要求的校准图像;将各校准图像输入至所述眼睛网络模型,预测得到多个虹膜掩膜;对所述多个虹膜掩膜分别进行圆拟合,得到圆拟合后的多个圆形;将所述多个圆形分别投影至所述演员在中性表情下的三维人脸,根据投影结果计算多个虹膜掩膜在所述三维人脸中对应的虹膜尺寸;根据多个虹膜掩膜在所述三维人脸中对应的虹膜尺寸,得到所述虹膜尺寸。
可选的,所述根据多个虹膜掩膜在三维人脸中对应的虹膜尺寸,得到所述虹膜尺寸,包括:将多个虹膜掩膜在所述三维人脸中对应的虹膜尺寸的平均值作为所述虹膜尺寸。
可选的,所述眼睛网络模型针对一双眼睛中的其中一只眼睛,当输入至所述眼睛网络模型的眼部图像为一双眼睛中的另一只眼睛时, 对输入的眼部图像进行对称翻转,并将对称翻转后的眼部图像作为所述眼睛网络模型的输入。
可选的,所述眼神捕捉方法,还包括:在根据所述眼部二维信息和所述三维眼球,确定所述三维瞳孔中心位置之前,根据所述睁闭眼状态判断所述演员是否闭眼;当所述睁闭眼状态指示闭眼时,将根据前一帧眼部图像捕捉的眼神方向作为所述眼部图像对应的眼神方向。
可选的，所述眼神捕捉方法，还包括：在捕捉得到一双眼睛中的每只眼睛分别对应的三维瞳孔中心位置之后，计算一双眼睛的三维瞳孔中心位置中的天顶角θ和方位角φ的联合先验分布，所述三维瞳孔中心位置包括：眼球半径、天顶角θ和方位角φ。当联合先验分布结果指示的概率值低于设定概率阈值时，判定捕捉错误，将根据前一帧眼部图像捕捉的眼神方向作为所述眼部图像对应的眼神方向。
可选的,所述根据所述三维瞳孔中心位置捕捉所述演员的眼神方向,包括:确定所述眼球中心位置指向所述三维瞳孔中心位置的方向,将该方向作为所述演员的眼神方向。
本发明实施例还提供一种眼神捕捉装置,包括:获取单元,用于获取演员的眼部图像;三维眼球确定单元,用于获取所述演员的眼部三维信息,根据所述眼部三维信息确定所述演员的三维眼球,所述眼部三维信息至少包括:眼球中心位置、眼球半径及虹膜尺寸;眼神捕捉单元,用于根据所述眼部图像,采用眼睛网络模型和所述三维眼球,确定三维瞳孔中心位置,并根据所述三维瞳孔中心位置捕捉所述演员的眼神方向。
本发明实施例还提供一种存储介质,所述存储介质为非易失性存储介质或非瞬态存储介质,其上存储有计算机程序,所述计算机程序被处理器运行时执行上述任一种眼神捕捉方法的步骤。
本发明实施例还提供一种终端,包括存储器和处理器,所述存储器上存储有能够在所述处理器上运行的计算机程序,所述处理器运行 所述计算机程序时执行上述任一种眼神捕捉方法的步骤。
与现有技术相比,本发明实施例的技术方案具有以下有益效果:
根据演员的眼部三维信息确定演员的三维眼球。根据获取的演员的眼部图像,采用眼睛网络模型和三维眼球,确定三维瞳孔中心位置,并根据所述三维瞳孔中心位置捕捉演员的眼神方向。相比需要佩戴特制的眼镜或者布置特定的红外设备进行眼神捕捉而言,本发明实施例通过采用演员的眼部图像,根据演员的眼部三维信息以及眼睛网络模型来捕捉演员的眼神方向,旨在提出一种用户友好的廉价解决方案,无需用户佩戴昂贵的设备,基于单个相机的捕捉技术,不仅可以提高用户使用设备时候的舒适感,而且造价便宜且不需要在特定的工作室进行,可以有效地降低眼神捕捉的成本。
附图说明
图1是本发明实施例中的一种眼神捕捉方法的流程图;
图2是图1中步骤S11的一种具体实施方式的流程图;
图3是图1中步骤S11的另一种具体实施方式的流程图;
图4是图1中步骤S13的一种具体实施方式的流程图;
图5是图4中步骤S131的一种具体实施方式的流程图;
图6是图4中步骤S132的一种具体实施方式的流程图;
图7是本发明实施例中的一种眼球校准中的眼球中心位置的校准流程图;
图8是本发明实施例中的一种演员的眼部三维信息的获取流程图;
图9是本发明实施例中的一种眼球校准中的虹膜尺寸的校准流程图;
图10是本发明实施例中的一种应用场景示意图;
图11是本发明实施例中的又一种眼神捕捉方法的流程图;
图12是本发明实施例中的另一种应用场景示意图;
图13是本发明实施例中的另一种应用场景示意图;
图14是本发明实施例中的再一种眼神捕捉方法的流程图;
图15是本发明实施例中的又一种应用场景示意图;
图16是本发明实施例中的又一种应用场景示意图;
图17是本发明实施例中的又一种应用场景示意图;
图18是本发明实施例中的一种眼神捕捉装置的结构示意图。
具体实施方式
如背景技术所言,现有的眼神捕捉技术通常是基于红外设备的,用户需要佩戴特制的眼镜或者布置特定的红外设备。此种眼神捕捉方式给用户带来了极大的不舒适感并且成本很高。
为解决上述问题,在本发明实施例中,根据演员的眼部三维信息确定演员的三维眼球。根据获取的演员的眼部图像,采用眼睛网络模型和三维眼球,确定三维瞳孔中心位置,并根据所述三维瞳孔中心位置捕捉演员的眼神方向。相比需要佩戴特制的眼镜或者布置特定的红外设备进行眼神捕捉而言,本发明实施例通过采用演员的眼部图像、演员的眼部三维信息以及眼睛网络模型来捕捉演员的眼神方向,无需用户佩戴昂贵的设备,基于单个相机的捕捉技术,不仅可以提高用户使用设备时候的舒适感,而且造价便宜且不需要在特定的工作室进行,可以有效地降低眼神捕捉的成本。
为使本发明实施例的上述目的、特征和有益效果能够更为明显易懂,下面结合附图对本发明的具体实施例做详细的说明。
本发明实施例提供一种眼神捕捉方法,参照图1,给出了本发明实施例中的一种眼神捕捉方法的流程图,具体可以包括如下步骤:
步骤S11,获取演员的眼部图像。
步骤S12,获取所述演员的眼部三维信息,根据所述眼部三维信息确定所述演员的三维眼球。
在具体实施中,所述眼部三维信息可以至少包括:眼球中心位置、眼球半径及虹膜尺寸。
步骤S13,根据所述眼部图像,采用眼睛网络模型和所述三维眼球,确定三维瞳孔中心位置,并根据所述三维瞳孔中心位置捕捉所述演员的眼神方向。
在具体实施中,可以确定所述眼球中心位置指向所述三维瞳孔中心位置的方向,将该方向作为演员的眼神方向。
经研究发现,虹膜的中心位置与三维瞳孔中心位置重合,虹膜在眼球上的具体位置根据三维瞳孔中心位置来确定,因此虹膜的位置会跟着三维瞳孔中心位置的变化而移动,最终呈现出来的眼神方向变化是虹膜在眼球上的位置变化。确定所述眼球中心位置指向所述三维瞳孔中心位置的方向,即为计算眼球中心与三维瞳孔中心所连接产生的射线方向。
在一些实施例中，三维瞳孔中心位置可以采用球坐标(r, θ, φ)的方式进行表示，其中，r为三维眼球的半径，θ为天顶角，φ为方位角。此时，在实际捕捉眼神方向时，三维瞳孔中心位置球坐标中的天顶角θ和方位角φ可以表征眼球中心位置与三维瞳孔中心位置所连接产生的射线方向，故可以采用三维瞳孔中心位置球坐标中的天顶角θ和方位角φ来表示眼神方向。
在具体实施中,由于每个人的眼球均不相同,眼部三维信息可以作为用于描述每个人的眼球的个性化数据,眼部三维信息至少可以包括:眼球中心位置、眼球半径及虹膜尺寸。每个演员的眼部三维信息中的眼球中心位置、眼球半径及虹膜尺寸等的具体取值分别与该演员相对应,不同演员对应的具体取值不同。从而可以实现根据每个演员 的眼部三维信息确定该演员对应的三维眼球。相应地,三维瞳孔中心位置为演员的眼神方向相关,即使同一个演员,不同眼神方向对应的三维瞳孔中心位置不同。
在具体实施中,虹膜尺寸用于表征虹膜的大小。
在具体实施中,步骤S11可以通过多种方式实现,也即可以通过多种方式获取演员的眼部图像。例如,仅拍摄演员的眼部,以得到演员的眼部图像。又如,拍摄演员的面部图像,从面部图像中截取出眼部图像。用于采集眼部图像的图像采集装置可以采用单相机,也可以采用具有图像采集功能的电脑,手机、头盔或者其他终端,此处不做限定。
当从演员的面部图像中截取出眼部图像时,也可以通过多种方式实现,包括并不限于以下几种:
在本发明一实施例中,参照图2,给出了步骤S11的一种具体实施方式的流程图,步骤S11具体可以包括如下步骤S111至步骤S114,通过步骤S111至步骤S114可以从演员的面部图像中截取出眼部图像。
步骤S111,获取所述演员的面部图像。
步骤S112,从所述面部图像中检测得到多个二维眼皮特征点。
在具体实施例中,可以采用深度学习方法(例如CNN网络)对演员的面部图像进行检测,获得二维脸部特征点。其中,二维脸部特征点包括二维眼皮特征点。每只眼睛的二维眼皮特征点可以为6个,也可以为8个,还可以为其他更多数目,具体数目可以根据需求进行配置,只需满足通过设置的多个二维眼皮特征点限定出眼睛的轮廓即可。
步骤S113,根据所述多个二维眼皮特征点在所述面部图像上的位置,确定眼睛在所述面部图像上的位置。
步骤S114,根据所述眼睛在所述面部图像上的位置,从所述面部图像中截取出所述眼部图像。
具体地,例如,获取的为一双眼睛的若干个二维眼皮特征点,若干个二维眼皮特征点包括左眼的二维眼皮特征点以及右眼的二维眼皮特征点。根据一双眼睛的若干个二维眼皮特征点在面部图像上的位置,可以分别截取出左眼的眼部图像以及右眼的眼部图像。具体而言,根据左眼的二维眼皮特征点在面部图像上的位置,从面部图像中截取左眼的眼部图像;根据右眼的二维眼皮特征点在面部图像上的位置,从面部图像中截取右眼的眼部图像。
在本发明另一实施例中,参照图3给出的步骤S11中的另一种具体实施方式的流程图,步骤S11具体可以包括如下步骤S101至步骤S105,可按照以下步骤S101至步骤S105从演员的面部图像中截取出眼部图像。
步骤S101,获取所述演员的面部图像。
步骤S102,获取与所述面部图像对应的三维人脸,从所述三维人脸中提取多个三维眼皮特征点。
在具体实施中,用于重建三维人脸所采用的图像和用于眼神捕捉所采用的图像可用相同的图像采集设备采集得到;用于重建三维人脸所采用的图像和用于眼神捕捉所采用的图像也可用不同的图像采集设备采集得到。
其中,当采用不同的图像采集设备时,可通过图像的采集时间将不同图像采集设备采集的数据做近似匹配。近似匹配指用于重建三维人脸的图像的采集时间与用于眼神捕捉的图像的采集时间之间的间隔时长满足设定时长,以确保用于重建三维人脸的图像中的演员的表情与用于眼神捕捉的图像中的演员的表情之间的表情变化不大,保证眼神捕捉得到的眼神方向的准确度。可根据实际需求配置设定时长的具体取值,对眼神捕捉得到的眼神方向的准确度要求越高,设定时长 越小。
进一步,重建三维人脸不局限于用图像进行重建,还可利用其它方式,例如在演员的脸部贴标记点,通过动捕系统采集演员脸部标记点的位置,重建演员的三维人脸,此时可以将脸部标记点和眼神捕捉所采用的图像根据各自的采集时间做近似匹配。
步骤S103,将所述多个三维眼皮特征点投影至所述面部图像对应的二维平面,得到多个二维投影点。
步骤S104,根据所述多个二维投影点在所述面部图像上的位置,确定眼睛在所述面部图像上的位置。
步骤S105,根据所述眼睛在所述面部图像上的位置,从所述面部图像中截取出所述眼部图像。
在具体实施中,截取的眼部图像的尺寸可以根据需求进行设置,此处不做限定。
进一步,当眼部图像中为单眼对应的图像时,可以从三维人脸中提取左眼的三维眼皮特征点,将左眼的三维眼皮特征点投影在面部图像上得到左眼对应的二维投影点,根据左眼对应的二维投影点在面部图像上的位置,从面部图像中截取左眼的眼部图像;关于右眼的眼部图像的截取过程与左眼类型,可以参照左眼的眼部图像的截取中的描述,此处不再赘述。
可以理解的是,当眼部图像为一双眼睛对应的图像时,可以从三维人脸中提取一双眼睛对应的三维眼皮特征点,将一双眼睛对应的三维眼皮特征点投影至面部图像上得到一双眼睛对应的二维投影点,根据一双眼睛对应的二维投影点在面部图像上的位置,从面部图像中截取一双眼睛的眼部图像。
可以理解的是,还可以采用其他方式获取演员的眼部图像,此处不再一一举例。
进一步地,参照图4,给出了步骤S13的一个具体实施方式的流程图,步骤S13可以包括如下步骤S131至步骤S132。
步骤S131,根据所述眼部图像,采用所述眼睛网络模型得到眼部二维信息。
在具体实施中,所述眼部二维信息可以至少包括:虹膜掩膜、二维瞳孔中心位置及睁闭眼状态。
其中,虹膜掩膜用于表征二维瞳孔的信息。
睁闭眼状态用于指示眼睛的睁眼或者闭眼的状态。通过睁闭眼状态,可以有效准确地检测到闭眼,有助于判断当前网络预测的眼部二维信息是否可以用于捕捉眼神方向,如果为闭眼,那么不需要进行眼神捕捉,可以将前一帧的眼部图像对应的眼神方向作为当前的眼部图像对应的眼神方向。
在一些实施例中,睁闭眼状态可以采用采用二进制的0及1进行标识。例如,采用0标识闭眼状态,采用1标识睁眼状态。可以理解的是,也可以采用其他标识来标识睁闭眼状态,此处不再一一举例。
在一些实施例中,眼部二维信息还可以包括二维眼皮特征点。由于眼部图像的分辨率高,采用眼睛网络模型预测的二维眼皮特征点的准确度会比较高,后续基于预测的二维眼皮特征点修正重建的三维人脸中眼睛的形状时,有利于提高修正结果的准确度。
步骤S132,根据所述眼部二维信息和所述三维眼球,确定所述三维瞳孔中心位置。
进一步地，参照图5，给出步骤S131的一个具体实施方式的流程图，步骤S131可以包括如下步骤S1311至步骤S1315。
步骤S1311,获取所述眼部图像对应的多个二维眼皮特征点。
在本发明一实施例中,可以获得演员的面部图像,检测所述面部图像中的二维面部特征点,其中二维面部特征点包括多个二维眼皮特 征点。
在本发明另一实施例中，获取与所述面部图像对应的三维人脸，从所述三维人脸中提取多个三维眼皮特征点，将所述多个三维眼皮特征点投影至所述面部图像对应的二维平面，得到多个二维投影点。所得到的多个二维投影点即为眼部图像对应的多个二维眼皮特征点。
步骤S1312,计算将所述多个二维眼皮特征点与多个预设二维眼皮特征点对齐时的相似变换矩阵。
在一个具体实施例中,可以根据多个二维眼皮特征点在面部图像的位置将眼部图像截取出来,将眼部图像中的多个二维眼皮特征点与多个预设二维眼皮特征点对齐时的相似变换矩阵。
在具体实施中,多个预设的二维眼皮特征点可以为默认表情下对应的眼皮特征点。其中,默认表情也可以称为中性表情,指无表情的自然状态。其中预设二维眼皮特征点定义在预设眼部图像上。
步骤S1313,采用所述相似变换矩阵对所述眼部图像进行相似变换,得到变换后的图像。
采用所述相似变换矩阵对所述眼部图像进行相似变换,得到变换后的图像。通过变换可以对眼部图像进行旋转、缩放以及位置等的调整,使得变换后的图像的尺寸、位置等满足设定要求,使得变换后的图像对应的二维眼皮特征点与预设眼部图像对应的二维眼皮特征点具有相似的位置、旋转和尺寸等。
步骤S1314,将所述变换后的图像输入至所述眼睛网络模型,预测变换后的图像对应的眼部二维信息。
步骤S1315,采用所述相似变换矩阵的逆矩阵,对所述变换后的图像对应的眼部二维信息进行变换,得到所述眼部图像对应的眼部二维信息。
在具体实施中,眼睛网络模型可以基于深度学习算法训练得到, 以眼部图像为输入,以眼部二维信息为输出。
可以基于卷积神经网络(Convolutional Neural Networks,CNN)训练眼睛网络模型,也可以采用其他类型的深度神经学习算法训练得到。
在具体实施中,考虑到左眼和右眼基本对称,在训练眼睛网络模型时,可以针对单只眼睛(左眼或右眼)为基础进行训练,并得到单只眼睛对应的眼睛网络模型。以单只眼睛为基础训练网络网络模型可以使得眼睛网络模型轻量化,并降低眼睛网络模型的大小,提高眼睛网络模型的运行速度,降低眼睛网络模型的运行时间。以降低对原有系统的帧率的影响。此外,还可以降低运行的成本。
需要说明的是,在眼睛网络模型训练的时候,假定以左(或者右)眼作为基准,可以将作为训练样本的右(或者左)眼的样本图像进行左右对称翻转,进而转换为左(或者右)眼的样本图像,因此只需要训练一个模型即可。例如,以左眼为基础进行训练眼睛网络模型,会把右眼的对应的眼部图像进行左右对称翻转,将左右对称翻转后的右眼眼部图像与左眼对应的眼部图像组成训练数据,对眼睛网络模型进行训练。
进一步,若所述眼睛网络模型根据一双眼睛中的其中一只眼睛的眼部图像训练得到,当输入至所述眼睛网络模型的眼部图像为一双眼睛中的另一只眼睛时,对输入的眼部图像进行对称翻转(如左右对称翻转),并将对称翻转后的眼部图像作为所述眼睛网络模型的输入。只需对眼睛网络模型输出的眼部二维信息再次进行对称变换,即可得到对称翻转前的另一只眼睛的眼部图像的眼部二维信息。
例如,若是基于左眼的眼部图像训练得到眼睛网络模型,针对左眼的眼部图像可以直接输入至眼睛网络模型,输出左眼的眼部二维信息。针对右眼的眼部图像,则需要对右眼的眼部图像进行左右对称翻转,转换成左眼的眼部图像,得到左眼对应的眼部二维信息之后,则采用相似变换矩阵的逆矩阵对左眼对应的眼部二维信息进行转换,得 到右眼对应的眼部二维信息。
进一步地,参照图6,给出步骤S132的一种具体实施方式的流程图,步骤S132可以包括如下步骤S1321至步骤S1327。
步骤S1321,根据所述三维眼球和预估三维瞳孔中心位置得到所述三维眼球的预估虹膜。
其中,可以将前一次迭代得到的三维瞳孔中心位置作为预估的三维瞳孔中心位置。如果当前是第一次迭代,则可以将前一帧图像确定的三维瞳孔中心位置作为预估的三维瞳孔中心位置,也可以将根据默认(向前看)的眼神方向状态的图像确定的三维瞳孔中心位置作为预估三维瞳孔中心位置。
步骤S1322,将所述预估虹膜投影至所述眼部图像对应的二维平面,得到预估虹膜掩膜。
步骤S1323,计算所述预估虹膜掩膜与所述眼睛网络模型预测的虹膜掩膜之间的第一差异。
其中,计算第一差异的计算方式可以有多种。
例如,计算预估虹膜掩膜与所述眼睛网络模型预测的虹膜掩膜的交集除以并集(Intersection over Union,IOU),根据IOU计算结果得到第一差异。也即计算预估虹膜掩膜与所述眼睛网络模型预测的虹膜掩膜的重叠度,根据重叠度得到第一差异。进一步,可以根据重叠度与理想重叠度的差异得到第一差异,当理想重叠度取完全重叠时,IOU为1。具体而言,计算所述预估虹膜掩膜与所述眼睛网络模型预测的虹膜掩膜的交集部分,以及所述预估虹膜掩膜与所述眼睛网络模型预测的虹膜掩膜的并集部分,将所述交集部分与所述并集部分的比值与理想比值的差异作为所述第一差异。其中,理想比值即为理想重叠度下的比值,当理想重叠度取完全重叠时,理想比值取1。
又如,根据所述眼睛网络模型预测的虹膜掩膜的生成距离变换图,计算所述预估虹膜掩膜的边缘像素在所述距离变换图中的值,根 据计算得到的预估虹膜掩膜的边缘像素在所述距离变换图中的值得到所述第一差异。例如,将预估虹膜掩膜的所有边缘像素在所述距离变换图中的值之和,作为第一差异。
具体地,在距离变换图中,每个像素的值表示该像素与最近的前景像素的距离。本实施例中,前景即为眼睛网络模型预测的虹膜掩膜。当预估虹膜掩膜的边缘像素落入眼睛网络模型预测的虹膜掩膜中时,那么该边缘像素在距离变换图中的取值即为0;当预估虹膜掩膜的边缘像素不在眼睛网络模型预测的虹膜掩膜中时,那么该边缘像素在距离变换图中的取值则大于0。
步骤S1324,根据第一差异计算总差异。
为提高三维瞳孔中心位置的确定的准确度,在本发明一些实施例中,将所述预估虹膜投影至所述眼部图像对应的二维平面,得到预估二维瞳孔中心位置;计算所述预估二维瞳孔中心位置与所述眼睛网络模型预测的二维瞳孔中心位置之间的第二差异;根据所述第一差异及所述第二差异计算得到所述总差异。
例如,计算第一差异与第二差异的和,将第一差异与第二差异之和作为总差异。
在具体实施例中,为第一差异及第二差异分别配置对应的权重,根据第一差异对应的权重及第二差异对应的权重对第一差异及第二差异进行加权,并根据加权结果得到总差异。具体而言,将第一差异对应的权重与第一差异做乘法运算,得到第一运算结果,将第二差异对应的权重与第二权重做乘法运算,得到第二运算结果,将第一运算结果与第二运算结果之和作为总差异。
在本发明又一些实施例中,计算当前迭代优化的三维瞳孔中心位置与优化初始时的三维瞳孔中心位置之间的第三差异,根据所述第一差异、所述第二差异及所述第三差异,计算得到所述总差异。
其中,可以将所述第一差异、所述第二差异及所述第三差异之和 作为所述总差异。也可以为第一差异、第二差异及第三差异分别配置对应的权重,根据第一差异对应的权重、第二差异对应的权重及第三差异对应的权重,分别对第一差异、第二差异及第三差异进行加权,并根据加权结果得到总差异。具体而言,将第一差异对应的权重与第一差异做乘法运算,得到第四运算结果,将第二差异对应的权重与第二差异做乘法运算得到第五运算结果,将第三差异对应的权重与第三权重做乘法运算,得到第六运算结果,将第四运算结果、第五运算结果与第六运算结果之和作为总差异。
第三差异可以用于表征瞳孔的移动情况，由于两帧图像的采集时间的间隔时长较短，在该间隔时长内，通常演员的瞳孔的移动幅度较小，反应至相邻帧中的眼部图像，相邻帧之间的瞳孔的位置变化幅度较小。若第三差异较大，则表征瞳孔移动过快。当总差异大于预设第一阈值时，对三维瞳孔中心位置进行迭代优化时，第三差异可以约束用于迭代优化的优化函数在寻找解（三维瞳孔中心位置中的θ及φ）的时候，在初始值的邻域内搜索。若所寻找的解（三维瞳孔中心位置中的θ及φ）不在邻域内，则第三差异较大，可以促使优化函数回到初始值的邻域，提高眼神捕捉效率。其中初始值来自于根据前一帧眼部图像捕捉的眼神方向或者根据眼睛默认向前看的眼部图像所捕捉的眼神方向。
步骤S1325,判断总差异是否大于预设第一阈值。
当判断结果为否时,执行步骤S1326。
步骤S1326,将所述预估三维瞳孔中心位置作为所述三维瞳孔中心位置。
进一步地,在一些实施例中,若步骤S1325的判断结果为是时,执行步骤S1327。
步骤S1327,根据所述总差异对所述三维瞳孔中心位置进行调整。
关于三维瞳孔中心位置，可用球坐标(r, θ, φ)的方式进行表示，其中，r为眼球半径、θ为天顶角，φ为方位角。可以采用合成-分析的方式对每只眼睛的三维瞳孔中心位置进行优化，也即步骤S1321至步骤S1327对应的方式。
具体地，眼球半径r可以为预设值。因此，在采用合成-分析的方式对三维瞳孔中心位置进行优化时，可以仅对θ、φ进行优化。
在具体实施中,根据所述总差异对所述预估三维瞳孔中心位置进行调整之后,根据调整后的三维瞳孔中心位置继续执行步骤S1321,也即对三维瞳孔中心位置进行迭代优化,直至所述总差异不大于预设第一阈值或者迭代次数达到设定次数,将所述总差异不大于预设第一阈值或者迭代次数达到设定次数时的预估三维瞳孔中心位置作为所述三维瞳孔中心位置。对于每一帧图像都执行上述步骤S1321-S1327。
在具体实施中，步骤S13捕捉得到所述演员的眼神方向的过程中，可能出现眼神捕捉结果错误的情况，一旦眼神捕捉结果出现错误，影响用户体验。为解决上述问题，在本发明一些非限制性实施例中，可以根据两只眼睛的互动关系，判断是否出现捕捉错误。若根据两只眼睛的互动关系判定出现捕捉错误时，则将根据前一帧眼部图像捕捉的眼神方向作为本次的眼部图像对应的眼神方向。互动关系一般是指左眼的眼神方向和右眼的眼神方向是否能够被同一个人同时做出来。例如，捕捉出来的双眼眼神方向为左眼向上看，右眼向下看，但是此种情况对于一般普通人来说是不容易做到的，因此可以判定捕捉错误。
在具体实施中，在捕捉得到一双眼睛中的每只眼睛分别对应的三维瞳孔中心位置之后，可以根据优化后的每只眼睛的三维瞳孔中心位置的天顶角θ、方位角φ确定两只眼睛的互动关系。具体而言，计算一双眼睛中的两只眼睛的三维瞳孔中心位置中的θ、φ的联合先验分布。当联合先验分布结果指示的概率值低于设定概率阈值时，判定捕捉错误，将根据前一帧眼部图像捕捉的眼神方向作为所述眼部图像对应的眼神方向。当联合先验分布结果指示的概率值不低于设定概率阈值时，则采用根据所述眼部图像捕捉的眼神方向。其中，概率值用于表示两只眼睛的三维瞳孔中心位置中的θ、φ联合出现的概率。其中，两只眼睛的三维瞳孔中心位置中的θ、φ的联合先验分布包括左眼的θ、左眼的φ、右眼的θ和右眼的φ这4个变量的联合先验分布。此时，左眼的θ、左眼的φ、右眼的θ和右眼的φ这4个变量为迭代优化完成后所得到的，通过联合先验分布限定捕捉眼神方向的范围，保证捕捉出来的眼神方向符合一般普通人能够做出来的表情状态以得到符合常规人群能够做出的眼神方向，避免捕捉出异常眼神方向，例如左眼的眼神方向向上，右眼的眼神方向向下等。
为了提高用户体验,在本发明一些非限制性实施例中,在步骤S132执行之前,根据步骤S131输出的睁闭眼状态判断所述演员是否闭眼。当睁闭眼状态指示闭眼时,将前一帧眼部图像捕捉的眼神方向作为所述眼部图像对应的眼神方向。也即不再基于本帧眼部图像捕捉眼神方向,而是采用前一帧眼部图像对应的眼神方向,以提高眼神捕捉过程中的稳定性,以及确保眼神捕捉过程中所得到眼神方向状态的一致性和连贯性。当基于多帧眼部图像连续捕捉演员的眼神方向时,使得所得到的捕捉得到的眼神变化与演员的实际眼神变化更加吻合。
在具体实施中,眼部三维信息为演员的个性化数据,为了提高生成的三维眼球的效果,在步骤S12执行之前,判断是否已经进行眼球校准;若未进行眼球校准,则进行眼球校准。
通过眼球校准可以使得眼部三维信息与演员的真实情况更加贴近。在本发明实施例中,眼部三维信息中的眼球中心位置、眼球半径以及虹膜尺寸可以通过眼球校准得到。三维瞳孔中心位置表达了瞳孔中心的运动状态,可通过步骤S1321及步骤S1327进行优化得到,此处不再赘述。
考虑到实际中,眼球的大部分被眼睑覆盖。当闭眼时,眼球基本完全被眼睑所覆盖。当睁眼时,可以在眼睑间的裂缝(睑裂)暴露出部分眼球。从而获取到的眼部图像中,眼球的绝大部分是不可见的, 如果同时校准眼部三维信息中的眼球中心位置、眼球半径以及虹膜尺寸,所得到的眼部三维信息的准确度以及稳定性不太好。
为了提高眼部三维信息的校准结果的稳定性,可以通过如下方式进行眼球校准。
关于眼球半径的校准,经研究发现,通常各成年人的眼球的大小之间的差异较小。故在本发明实施例中,眼球半径可以取成年人的平均眼球半径。在一些实施例中,眼球半径可以取12.5mm。可以理解的是,根据实际应用场景的需求,眼球半径的取值也可以做适应性的调整。例如,对于演员为儿童时,眼球半径可以稍微调小些,以尽量与儿童的眼球实际大小相贴合。再如,不同人种的演员的眼球大小不同,也可以根据演员的具体人种配置眼球半径,其中,人种可以为黄色人种、白色人种、黑色人种等,人种分类可以有多种不同的分类方式,具体根据需求进行选择,并配置对应的眼球半径即可,此处不做限定。
关于眼球中心位置的校准,参照图7,给出了本发明实施例中的一种眼球校准中的眼球中心位置的校准流程图,具体可以包括如下步骤:
步骤S71,获取所述演员在中性表情下的三维人脸,从所述中性表情下的三维人脸中获取多个三维眼皮特征点。
步骤S72,计算每只眼睛的所述多个三维眼皮特征点的三维位置的平均值,在所述三维位置的平均值的基础上加上预设的三维偏移量得到每只眼睛的眼球中心位置,所述三维偏移量的偏移方向朝向眼睛内部。
在具体实施中,每只眼睛的三维眼皮特征点的数目可以为6个,也可以为8个,或者可以为其他更多数目,三维眼皮特征点的具体数目可以根据实际需求进行设定。计算所选取的每只眼球的三维眼皮特征点的三维位置的平均值。考虑到三维眼球嵌入在眼睑内,在三维眼 皮特征点的三维位置的平均值的基础上加上预设的三维偏移量,其中三维偏移量的偏移方向朝向眼睛内部,用来模拟真实眼皮与眼球中心的偏移,从而采用三维偏移量可以对平均值进行朝向眼睛内部的三维偏移,将偏移后的三维位置作为每只眼睛的眼球中心位置。此时获得的是相对于三维人脸的眼球中心位置。
在一些实施例中,参照图8,给出了本发明实施例中的一种演员的眼部三维信息的获取流程图,所述获取所述演员的眼部三维信息具体可以包括:
步骤S81,获取与所述演员的眼部图像对应的面部图像。
步骤S82,根据所述面部图像,获得所述演员的面部姿态的变换矩阵,所述面部姿态为所述演员面部相对于相机的姿态。
在一些实施例中,面部姿态可以通过三维人脸重建获得,通常情况下三维人脸模型由面部姿态和表情参数来联合确定,其中表情参数定义了在人脸坐标系下的三维人脸模型,面部姿态将人脸坐标系下的三维人脸模型转换到相机坐标系或者其他指定的坐标系下,所以通过重建面部图片中人脸的三维模型,可以计算出其面部姿态。
在另一些实施例中,面部姿态也可以通过深度学习算法预测得到,即将面部图像输入到深度学习网络中,预测得到面部姿态。所述面部姿态为所述演员面部相对于相机的姿态即为演员面部相对于相机的位置和朝向。
在一实施例中,所述与所述演员的眼部图像对应的面部图像根据以下方法获取:所述演员的头部佩戴面部表情捕捉头盔,所述头盔与所述演员的头部相对静止;所述头盔上安装有面部表情捕捉相机,所述相机捕捉演员面部表情。此时,对于任一帧的面部图像所述姿态变换矩阵是固定值。所述头盔与所述演员的头部在每次采集中相对静止,即相对位置和相对朝向固定,一次采集定义为从所述演员佩戴头盔之后到取下头盔之前的采集。对于每次采集中任何一帧的面部图像 所述面部姿态的变换矩阵是固定值,即在获得姿态变换矩阵之后,可在后续图片中沿用这个面部姿态的变换矩阵,而无需再进行计算。
在另一实施例中,所述与所述演员的眼部图像对应的面部图像根据以下方法获取:利用摄像机拍摄演员的面部表情;所述摄像机与演员的头部是分离的。
进一步,对于任一帧的面部图像所述面部姿态的变换矩阵是变化的,每一帧图片均需重新计算姿态变换矩阵。
步骤S83，根据所述面部姿态的变换矩阵对所述眼球中心位置进行变换，得到相对于相机的眼球中心位置。
具体地,采用面部姿态的变换矩阵对眼球中心位置进行变换获得的是相对于相机所在坐标系的眼球中心位置。
在一些非限制性实施例中,眼球中心位置可以为相对于三维人脸的位置。需要说明的是,所选择参考坐标系不同时,眼球中心位置的具体取值不同,具体可以根据需求选择的参考坐标系,并在需要时进行转换即可。
关于虹膜尺寸的校准,参照图9,给出了本发明实施例中的一种眼球校准中的虹膜尺寸的校准流程图,具体可以包括如下步骤:
步骤S91,获取预设数量且满足校准要求的校准图像。
演员保持眼睛睁开,达到设定预设时长之后(如1秒至2秒),之后演员开始做表情。眼睛睁开是为了后续校准提供所需的校准图像。例如对于单相机输入的每张图像,可以判断眼球是否已经校准,当完成眼球校准,则输出校准状态信息,校准状态信息用于指示已经完成眼球校准。如果没有完成眼球校准,则判断采集的图像是否符合校准要求(如眼睛正常睁开且目视前方),若满足校准要求则存贮该图像,继续获取图像直到获得校准图像达到设定数量后开始进行眼球校准。其中校准图像的设定数量可以预先根据需求进行配置。
步骤S92,将各校准图像输入至所述眼睛网络模型,预测得到多个虹膜掩膜。
步骤S93,对所述多个虹膜掩膜分别进行圆拟合,得到圆拟合后的多个圆形。
具体而言,对虹膜掩膜分别进行圆拟合,即虹膜掩膜所在的边缘像素点拟合成一个圆形。
步骤S94,将所述多个圆形分别投影至所述演员在中性表情下的三维人脸,根据投影结果计算多个虹膜掩膜在三维人脸中对应的虹膜尺寸。
反投影为相机投影的反过程,即连接相机和图片中一个像素产生一条射线,计算该射线与眼球的交点作为该图片像素的反投影点。如果有2个交点,则取距离相机较近的那个交点;如果没有交点,则该像素没有反投影点。通常,以相机为参考坐标系,相机位置在坐标原点(0,0,0),而图片像素的坐标可以用(x,y,f)表示,其中(x,y)为该像素的在图片中的二维坐标,f是相机的焦距,单位:像素。
步骤S95,根据多个虹膜掩膜在所述三维人脸中对应的虹膜尺寸,得到所述虹膜尺寸。
在一些实施例中,可以将多个虹膜掩膜在三维人脸中对应的虹膜尺寸的平均值,作为校准后的虹膜尺寸。
在另一些实施例中,可以根据多个虹膜掩膜在三维人脸中对应的虹膜尺寸的取值,分别为各虹膜尺寸配置对应的权重,将各虹膜尺寸与对应的权重进行加权,将加权结果作为校准后的虹膜尺寸。
在又一些实施例中,还可以根据多个虹膜掩膜在三维人脸中对应的虹膜尺寸,去除最大值及最小值,然后将余下的虹膜尺寸进行平均,将计算得到的平均值作为校准后的虹膜尺寸。
在本发明实施例中,虹膜可以近似为眼球上的一个弧面,虹膜尺 寸可以用该弧面的底面半径来表示,或者采用底面半径与眼球半径的夹角表示。
需要说明的是,上述眼球中心位置的校准和虹膜尺寸的校准可以同步执行,也可以异步执行,此处不做限定。
本发明实施例中,可以分别对两只眼睛的三维瞳孔中心的位置进行优化。三维瞳孔中心位置的优化过程可以参见上述实施例中的步骤S1321至步骤S1327中的描述,此处不再赘述。
由上可知,根据演员的眼部三维信息确定演员的三维眼球。根据获取的演员的眼部图像,采用眼睛网络模型和三维眼球,确定三维瞳孔中心位置,并根据所述三维瞳孔中心位置捕捉演员的眼神方向。相比需要佩戴特制的眼镜或者布置特定的红外设备进行眼神捕捉而言,本发明实施例通过采用演员的眼部图像、演员的眼部三维信息以及眼睛网络模型来捕捉演员的眼神方向,无须用户佩戴额外的设备,基于单个相机的捕捉技术,不仅可以提高用户使用设备时候的舒适感,而且造价便宜且不需要在特定的工作室进行,可以有效地降低眼神捕捉的成本。
进一步地,在步骤S11中,确定三维眼球所采用的眼部三维信息可以为经过眼球校准后的眼球中心位置、眼球半径和虹膜尺寸。此外,经过眼球校准后的眼球中心位置还可以根据面部姿态的变换矩阵进行变换。
下面对眼睛网络模型的训练过程进行说明。
在具体实施中,考虑到人的左右眼是相互对称的,在训练眼睛网络模型时,可以针对单只眼睛(左眼或右眼)进行训练,得到单只眼睛对应的眼睛网络模型。在训练眼睛网络模型时,假定以左(或者右)眼作为基准,可以将作为训练样本的右(或者左)眼的样本图像进行对左右称翻转,进而转换为左(或者右)眼的样本图像,因此只需要训练一个模型即可。例如,以左眼为基础进行训练眼睛网络模型,会 把右眼的对应的眼部图像进行左右对称翻转,将左右对称翻转后的右眼眼部图像与左眼对应的眼部图像组成训练数据,对眼睛网络模型进行训练。
根据每张样本图像中人工标注的眼皮特征点的位置与预设眼皮特征点的位置进行对齐,获得对齐所需的相似变换矩阵,其中,预设眼皮特征点指默认表情下(也可称为中性表情)的特征点。根据该相似变换矩阵对样本图像进行对应的变换,完成相似变换后的图像与所述预设眼部图像具有极度相似的属性和特点,即通过对齐使得所有的样本图像具有相似的旋转、尺度和位置,所有的样本图像都是以预设眼皮特征点为基础进行调整,以预设眼皮特征点为基准,对样本图像中人工标注的眼皮特征点进行相似变换,使得样本图像中人工标注的眼皮特征点尽可能的与每个对应的预设眼皮特征点的位置一致。计算每个人工标注的眼皮特征点与对应的预设眼皮特征点的差异,使得差异最小。
将相似变换后的样本图像输入至深度学习网络中,进行眼睛网络模型训练,可以使得网络收敛速度快以及所需网络小,降低眼睛网络模型的训练的难度,提高眼睛网络模型的训练效率。同时因为难度降低,可以使用更小的深度学习网络,从而降低了整个算法的运行时间。目的是使得眼睛网络模型的网络简洁,训练的时候眼睛网络模型容易收敛,预测时间短,成本较低。
为便于本领域技术人员更好的理解和实现本发明实施例中,下面结合具体场景对本发明实施例提供的眼神捕捉方法进行说明:
参照图10,给出了本发明实施例中的一种应用场景示意图。在该场景A中,眼神捕捉方法用于动作和表情捕捉系统,演员头戴面部表情捕捉头盔30,头盔30与演员头部相对固定,头盔30上设置有相机40。结合图11给出的发明实施例中的又一种眼神捕捉方法的流程图,具体流程如下:
a1:面部捕捉系统的头盔拍摄演员的面部图像;
a2:根据面部图像和相机的位置计算第一帧图像的面部姿态(headpose)变换矩阵。
在本实施例中,面部姿态的变换矩阵是固定值,后续帧沿用第一帧图像对应的面部姿态(headpose)变换矩阵;
a3:根据演员的面部图像截取眼部图像。
a3的具体实现方法可参见步骤S111-S114或者S101-S105中的描述,此处不做赘述。
a4:将眼部图像输入至眼睛网络模型中,得到虹膜掩膜、二维瞳孔中心位置及睁闭眼状态。
a4的具体方法可参见步骤S1311-S1315中的描述,此处不做赘述。
a5:判断演员是否闭眼。
如判断结果为是,执行a6,沿用上一帧图像对应的眼神方向;若判断结果为否,执行步骤a7。
a7:判断眼球是否已进行校准。
如判断结果为是,则进行步骤a9;若判断结果为否,则进行步骤a8。
a8:通过眼球校准获得眼球半径、虹膜尺寸以及相对于人脸坐标的眼球中心位置,并通过面部姿态的变换矩阵转换成相对于相机的眼球中心位置。
a8的具体实现方法参照步骤S71-S72、S91-S95及S81-S83中的描述,此处不做赘述。
a9：根据预测得到的虹膜掩膜、二维瞳孔中心位置，眼球半径、虹膜大小和相对于相机坐标的眼球中心位置，以及预估三维瞳孔中心位置采用合成-分析的方法捕捉得到所述演员的眼神方向，即得到三维瞳孔中心位置球坐标的天顶角θ和方位角φ。
a9的具体实现方法参照步骤S1321-S1327中的描述，此处不做赘述。
a10：计算一双眼睛的三维瞳孔中心位置中的天顶角θ和方位角φ的联合先验分布。
a11：判断一双眼睛的三维瞳孔中心位置中的天顶角θ和方位角φ的联合先验分布是否符合设定概率阈值要求。
若判断结果为否,不符合设定概率阈值要求,则采用前一帧的眼神方向的捕捉结果,也即执行a13。若判断结果为是,执行步骤a12。
a12：根据三维瞳孔中心位置中的天顶角θ和方位角φ将虹膜在眼球的位置呈现出来。
例如,图10中显示终端的显示界面上呈现的即为重建的三维人脸和眼神。图10中示意的呈现效果即为示意性说明,还可以存在其他变形方式。
参照图12及图13,给出了本发明实施例中的另一种应用场景示意图。在该场景B中,眼神捕捉方法基于单相机系统,单相机可以为PC端60的相机70,如图12所示。或者单相机是移动终端50(如手机)的相机。单相机与演员的位置不固定。结合图14给出了发明实施例中的又一种眼神捕捉方法的流程图,具体流程如下:
b1:面部捕捉系统的头盔拍摄演员的面部图像;
b2:根据面部图像和相机的位置计算每一帧图像的面部姿态(headpose)变换矩阵。
其中,面部姿态的变换矩阵是变化的,可以不是固定值,每一帧图像均需重新计算面部姿态(headpose)变换矩阵;
b3:根据演员的面部图像截取眼部图像。
b3的具体实现方法可参见步骤S111-S114或者S101-S105中的描述,此处不做赘述。
b4:将眼部图像放到眼睛网络模型中,得到虹膜掩膜、二维瞳孔中心位置及睁闭眼状态。
步骤b4的具体实现方法可参见步骤S1311-S1315中的描述,此处不做赘述。
b5:判断演员是否闭眼。
若判断结果为是,执行b6,即沿用上一帧图像对应的眼神方向;若判断结果为否,执行步骤b7。
b7:判断眼球是否已进行校准。
若判断结果为是,则进行步骤b9;若判断结果为否,则进行步骤b8。
b8:通过眼球校准获得眼球半径、虹膜尺寸以及相对于人脸坐标的眼球中心位置,并通过面部姿态的变换矩阵转换成相对于相机的眼球中心位置。
b8的具体实现方法参照步骤S71-S72、S91-S95或者S81-S83中的描述,此处不做赘述。
b9：根据预测得到的虹膜掩膜、二维瞳孔中心位置，眼球半径、虹膜大小和相对于相机坐标的眼球中心位置，以及预估三维瞳孔中心位置采用合成-分析的方法捕捉得到所述演员的眼神方向，即得到三维瞳孔中心位置球坐标的天顶角θ和方位角φ。
b9的具体实现方法参照步骤S1321-S1327中的描述，此处不做赘述。
b10：计算一双眼睛的三维瞳孔中心位置中的天顶角θ和方位角φ的联合先验分布。
b11：判断一双眼睛的三维瞳孔中心位置中的天顶角θ和方位角φ的联合先验分布是否符合设定概率阈值要求。
若判断结果为否,不符合设定概率阈值要求,则采用前一帧的眼神方向的捕捉结果,也即执行步骤b13。若判断结果为是,执行b12。
b12：根据三维瞳孔中心位置中的天顶角θ和方位角φ将虹膜在眼球的位置呈现出来。
例如,图12中PC端60的显示界面上呈现的即为重建的三维人脸和眼神。如图13的移动终端50的显示界面上呈现的即为重建的三维人脸和眼神。可以理解的是,图12及13中示意的呈现效果即为示意性说明,还可以存在其他变形方式。
上述两种应用场景均可用于虚拟角色表演动画的生成和虚拟角色的直播。根据眼神方向的捕捉结果,可以将眼球和虹膜添加到虚拟角色的脸上,使得虚拟角色具备和所述演员相似的眼神方向,进而能够更加准确的传达所述演员的表情和意图。
除此之外,眼神捕捉在智能交互中也发挥着极其重要的作用,本发明实施例提供的眼神捕捉方法还可用于智能交互,如图15至图17所示的本发明实施例中的不同的应用场景示意图。图15示意的场景为采用移动终端50(如手机)采集用户的眼部图像,进而根据确定的眼神方向确定眼神关注区域。图16示意的场景为PC端60的相机70采集用户的眼部图像,进而根据确定的眼神方向确定眼神关注区域。图17示意的场景为用户佩戴头盔,头盔30,头盔30与用户头部相对固定,头盔上设置有相机40,通过相机40采集用户的眼部图像,进而根据确定的眼神方向确定眼神关注区域。通常眼睛紧盯的方向,常常是最感兴趣的物体所在的方向。通过眼神捕捉,可以准确的捕捉到用户紧盯的方向,并捕捉到用户所感兴趣的事物,从而获知用户的喜好、兴趣和意图,以有针对性的投放用户个性化的产品。
例如,用户在盯着屏幕中的广告时,通过检测到用户目光的方向, 从而发现用户对体育运动非常感兴趣,完善了该用户的人物设定,即用户是个体育迷,进行在以后的交互中提供该用户感兴趣的体育比赛、体育产品等等。
需要说明的是,以上应用场景仅为示意性说明,还可以存在其他的应用场景,上述举例的应用场景并不限制本发明实施例提供的眼神捕捉方法的应用场景。
本发明实施例提供的眼神捕捉装置可以对离线的视频或者图像进行眼神捕捉,也可以实时在线对演员进行眼神捕捉。
此外,采用本发明实施例提供的眼神捕捉方法,极大的提高基于单相机人脸表情捕捉的准确性,能够生动有效的传达人脸的真实情感和意图。同时为单相机在线虚拟直播技术、单相机智能交互技术、人脸识别技术等核心AI技术提供了算法基础,可以用于电影、游戏、刑侦,监控等领域。
本发明实施例还提供一种眼神捕捉装置,参照图18,给出本发明实施例中的一种眼神捕捉装置140,具体可以包括:
获取单元141,用于获取演员的眼部图像;
三维眼球确定单元142,用于获取所述演员的眼部三维信息,根据所述眼部三维信息确定所述演员的三维眼球,所述眼部三维信息至少包括:眼球中心位置、眼球半径及虹膜尺寸;
眼神捕捉单元143,用于根据所述眼部图像,采用眼睛网络模型和所述三维眼球,确定三维瞳孔中心位置,并根据所述三维瞳孔中心位置捕捉所述演员的眼神方向。
在具体实施中,所述眼神捕捉单元143,用于根据所述眼部图像,采用所述眼睛网络模型得到眼部二维信息,所述眼部二维信息至少包括:虹膜掩膜、二维瞳孔中心位置及睁闭眼状态;根据所述眼部二维信息和所述三维眼球,确定所述三维瞳孔中心位置。
在具体实施中,所述眼神捕捉单元143,用于获取所述眼部图像对应的多个二维眼皮特征点;计算将所述多个二维眼皮特征点与多个预设二维眼皮特征点对齐时的相似变换矩阵;采用所述相似变换矩阵对所述眼部图像进行相似变换,得到变换后的图像;将所述变换后的图像输入至所述眼睛网络模型,预测变换后的图像对应的眼部二维信息;采用所述相似变换矩阵的逆矩阵,对所述变换后的图像对应的眼部二维信息进行变换,得到所述眼部图像对应的眼部二维信息。
在具体实施中,所述眼神捕捉单元143,用于根据所述三维眼球和预估三维瞳孔中心位置,得到所述三维眼球的预估虹膜;将所述预估虹膜投影至所述眼部图像对应的二维平面,得到预估虹膜掩膜;计算所述预估虹膜掩膜与所述眼睛网络模型预测的虹膜掩膜之间的第一差异;根据所述第一差异计算得到总差异;若所述总差异不大于预设第一阈值,则将所述预估三维瞳孔中心位置作为所述三维瞳孔中心位置。
在具体实施中,所述眼神捕捉装置140还包括:优化单元,用于若所述总差异大于预设第一阈值,根据所述总差异对所述预估三维瞳孔中心位置进行调整并迭代优化,直至所述总差异不大于预设第一阈值或者迭代次数达到设定次数,将所述总差异不大于预设第一阈值或者迭代次数达到设定次数时的预估三维瞳孔中心位置作为所述三维瞳孔中心位置。
在具体实施中,优化单元,用于将所述预估虹膜投影至所述眼部图像对应的二维平面,得到预估二维瞳孔中心位置;计算所述预估二维瞳孔中心位置与所述眼睛网络模型预测的二维瞳孔中心位置之间的第二差异;根据所述第一差异及所述第二差异计算得到所述总差异。
在具体实施中,优化单元,用于计算当前迭代优化的三维瞳孔中心位置与优化初始时的三维瞳孔中心位置之间的第三差异;根据所述第一差异、所述第二差异及所述第三差异,计算得到所述总差异。
在具体实施中,优化单元,用于计算所述预估虹膜掩膜与所述眼睛网络模型预测的虹膜掩膜的交集部分,以及所述预估虹膜掩膜与所述眼睛网络模型预测的虹膜掩膜的并集部分,将所述交集部分与所述并集部分的比值与理想比值的差异作为所述第一差异;或者,根据所述眼睛网络模型预测的虹膜掩膜的生成距离变换图,计算所述预估虹膜掩膜的边缘像素在所述距离变换图中的值,根据计算得到的值得到所述第一差异。
在具体实施中,所述眼神捕捉装置140还可以包括眼球校准单元,所述演员的眼部三维信息包括:通过所述眼球校准单元进行眼球校准获得所述眼球中心位置、眼球半径及虹膜尺寸。
在具体实施中,所述眼球校准单元,用于获取所述演员在中性表情下的三维人脸,从所述中性表情下的三维人脸中获取多个三维眼皮特征点;计算每只眼睛的所述多个三维眼皮特征点的三维位置的平均值,在所述三维位置的平均值的基础上加上预设的三维偏移量得到每只眼睛的眼球中心位置,所述三维偏移量的偏移方向朝向眼睛内部。
在具体实施中,所述获取单元141用于获取与所述演员的眼部图像对应的面部图像;根据所述面部图像,获得所述演员的面部姿态的变换矩阵,所述面部姿态为所述演员面部相对于相机的姿态;根据所述面部姿态的变换矩阵对所述眼球中心位置进行变换,得到相对于相机的眼球中心位置。
在具体实施中,所述与所述演员的眼部图像对应的面部图像根据以下方法获取:所述演员的头部佩戴面部表情捕捉头盔,所述头盔与所述演员的头部相对静止;所述头盔上安装有面部表情捕捉相机,所述相机捕捉演员面部表情。
在具体实施中,对于任一帧的面部图像所述面部姿态的变换矩阵是固定值。
在具体实施中,所述与所述演员的眼部图像对应的面部图像根据 以下方法获取:利用摄像机拍摄所述演员的面部表情;所述摄像机与所述演员的头部是分离的。
在具体实施中,对于任一帧的面部图像所述面部姿态的变换矩阵是变化的。
在具体实施中,所述眼球校准单元,用于获取预设数量且满足校准要求的校准图像;将各校准图像输入至所述眼睛网络模型,预测得到多个虹膜掩膜;对所述多个虹膜掩膜分别进行圆拟合,得到圆拟合后的多个圆形;将所述多个圆形分别投影至所述演员在中性表情下的三维人脸,根据投影结果计算多个虹膜掩膜在所述三维人脸中对应的虹膜尺寸;根据多个虹膜掩膜在所述三维人脸中对应的虹膜尺寸,得到所述虹膜尺寸。
在具体实施中,所述眼球校准单元,用于将多个虹膜掩膜在所述三维人脸中对应的虹膜尺寸的平均值作为所述虹膜尺寸。
在具体实施中,所述眼睛网络模型针对一双眼睛中的其中一只眼睛,当输入至所述眼睛网络模型的眼部图像为一双眼睛中的另一只眼睛时,对输入的眼部图像进行对称翻转,并将对称翻转后的眼部图像作为所述眼睛网络模型的输入。
在具体实施中,所述眼神捕捉装置140还可以包括第一判断单元,第一判断单元用于在根据所述眼部二维信息和所述三维眼球,确定所述三维瞳孔中心位置之前,根据所述睁闭眼状态判断所述演员是否闭眼;当所述睁闭眼状态指示闭眼时,将根据前一帧眼部图像捕捉的眼神方向作为所述眼部图像对应的眼神方向。
在具体实施中，所述眼神捕捉装置140还可以包括计算单元及第二判断单元，所述计算单元用于在捕捉得到一双眼睛中的每只眼睛分别对应的三维瞳孔中心位置之后，计算一双眼睛的三维瞳孔中心位置中的天顶角θ和方位角φ的联合先验分布，所述三维瞳孔中心位置包括：眼球半径、天顶角θ和方位角φ；第二判断单元用于判断联合先验分布结果是否低于设定概率阈值，当联合先验分布结果指示的概率值低于设定概率阈值时，判定捕捉错误，将根据前一帧眼部图像捕捉的眼神方向作为所述眼部图像对应的眼神方向。
在具体实施中,所述眼神捕捉单元143,用于确定所述眼球中心位置指向所述三维瞳孔中心位置的方向,将该方向作为所述演员的眼神方向。
在具体实施中,眼神捕捉装置140的具体工作原理及工作流程可以参见本发明上述任一实施例中的描述,此处不再赘述。
进一步,所述眼神捕捉装置140可以集成于终端、服务器等计算设备。例如,眼神捕捉装置140可以集中地集成于同一服务器内。或者,眼神捕捉装置140可以分散的集成于多个终端或服务器内并相互耦接。例如,所述三维眼神模型可以单独设置于终端或服务器上,以确保较优的数据处理速度。
基于本实施例眼神捕捉装置140及对应的眼神捕捉方法,用户在获取单元141一侧获取待处理的眼部图像,即可在眼神捕捉单元143的输出端捕捉得到所述演员的眼神方向,从而实现演员的眼神捕捉。
本发明实施例还提供一种存储介质,所述存储介质为非易失性存储介质或非瞬态存储介质,其上存储有计算机程序,所述计算机程序被处理器运行时执行上述任一实施例提供的眼神捕捉方法的步骤。
本发明实施例还提供一种终端,包括存储器和处理器,所述存储器上存储有能够在所述处理器上运行的计算机程序,所述处理器运行所述计算机程序时执行上述任一种眼神捕捉方法的步骤。
本领域普通技术人员可以理解上述实施例的各种方法中的全部或部分步骤是可以通过程序来指令相关的硬件来完成,该程序可以存储于任一计算机可读存储介质中,存储介质可以包括:ROM、RAM、磁盘或光盘等。
虽然本发明披露如上,但本发明并非限定于此。任何本领域技术 人员,在不脱离本发明的精神和范围内,均可作各种更动与修改,因此本发明的保护范围应当以权利要求所限定的范围为准。

Claims (24)

  1. 一种眼神捕捉方法,其特征在于,包括:
    获取演员的眼部图像;
    获取所述演员的眼部三维信息,根据所述眼部三维信息确定所述演员的三维眼球,所述眼部三维信息至少包括:眼球中心位置、眼球半径及虹膜尺寸;
    根据所述眼部图像,采用眼睛网络模型和所述三维眼球,确定三维瞳孔中心位置,并根据所述三维瞳孔中心位置捕捉所述演员的眼神方向。
  2. 如权利要求1所述的眼神捕捉方法,其特征在于,所述根据所述眼部图像,采用眼睛网络模型和所述三维眼球,确定三维瞳孔中心位置,包括:
    根据所述眼部图像,采用所述眼睛网络模型得到眼部二维信息,所述眼部二维信息至少包括:虹膜掩膜、二维瞳孔中心位置及睁闭眼状态;
    根据所述眼部二维信息和所述三维眼球,确定所述三维瞳孔中心位置。
  3. 如权利要求2所述的眼神捕捉方法,其特征在于,所述根据所述眼部图像,采用所述眼睛网络模型得到眼部二维信息,包括:
    获取所述眼部图像对应的多个二维眼皮特征点;
    计算将所述多个二维眼皮特征点与多个预设二维眼皮特征点对齐时的相似变换矩阵;
    采用所述相似变换矩阵对所述眼部图像进行相似变换,得到变换后的图像;
    将所述变换后的图像输入至所述眼睛网络模型,预测变换后的图 像对应的眼部二维信息;
    采用所述相似变换矩阵的逆矩阵,对所述变换后的图像对应的眼部二维信息进行变换,得到所述眼部图像对应的眼部二维信息。
  4. 如权利要求2所述的眼神捕捉方法,其特征在于,所述根据所述眼部二维信息和所述三维眼球,确定所述三维瞳孔中心位置,包括:
    根据所述三维眼球和预估三维瞳孔中心位置,得到所述三维眼球的预估虹膜;
    将所述预估虹膜投影至所述眼部图像对应的二维平面,得到预估虹膜掩膜;
    计算所述预估虹膜掩膜与所述眼睛网络模型预测的虹膜掩膜之间的第一差异;
    根据所述第一差异计算得到总差异;
    若所述总差异不大于预设第一阈值,则将所述预估三维瞳孔中心位置作为所述三维瞳孔中心位置。
  5. 如权利要求4所述的眼神捕捉方法,其特征在于,还包括:
    若所述总差异大于预设第一阈值,根据所述总差异对所述预估三维瞳孔中心位置进行调整并迭代优化,直至所述总差异不大于预设第一阈值或者迭代次数达到设定次数,将所述总差异不大于预设第一阈值或者迭代次数达到设定次数时的预估三维瞳孔中心位置作为所述三维瞳孔中心位置。
  6. 如权利要求4或5所述的眼神捕捉方法,其特征在于,所述根据所述第一差异计算得到总差异,包括:
    将所述预估虹膜投影至所述眼部图像对应的二维平面,得到预估二维瞳孔中心位置;
    计算所述预估二维瞳孔中心位置与所述眼睛网络模型预测的二 维瞳孔中心位置之间的第二差异;
    根据所述第一差异及所述第二差异计算得到所述总差异。
  7. 如权利要求6所述的眼神捕捉方法,其特征在于,所述根据所述第一差异和第二差异计算得到总差异,包括:
    计算当前迭代优化的三维瞳孔中心位置与优化初始时的三维瞳孔中心位置之间的第三差异;
    根据所述第一差异、所述第二差异及所述第三差异,计算得到所述总差异。
  8. 如权利要求4所述的眼神捕捉方法,其特征在于,所述计算预估虹膜掩膜与所述眼睛网络模型预测的虹膜掩膜之间的第一差异,包括:
    计算所述预估虹膜掩膜与所述眼睛网络模型预测的虹膜掩膜的交集部分,以及所述预估虹膜掩膜与所述眼睛网络模型预测的虹膜掩膜的并集部分,将所述交集部分与所述并集部分的比值与理想比值的差异作为所述第一差异;
    或者,根据所述眼睛网络模型预测的虹膜掩膜的生成距离变换图,计算所述预估虹膜掩膜的边缘像素在所述距离变换图中的值,根据计算得到的值得到所述第一差异。
  9. 如权利要求1所述的眼神捕捉方法,其特征在于,所述获取所述演员的眼部三维信息,包括:通过眼球校准获得所述眼球中心位置、眼球半径及虹膜尺寸。
  10. 如权利要求9所述的眼神捕捉方法,其特征在于,所述通过眼球校准获得所述眼球中心位置,包括:
    获取所述演员在中性表情下的三维人脸,从所述中性表情下的三维人脸中获取多个三维眼皮特征点;
    计算每只眼睛的所述多个三维眼皮特征点的三维位置的平均值,在所述三维位置的平均值的基础上加上预设的三维偏移量得到每只眼睛的眼球中心位置,所述三维偏移量的偏移方向朝向眼睛内部。
  11. 如权利要求9所述的眼神捕捉方法,其特征在于,所述获取所述演员的眼部三维信息,包括:
    获取与所述演员的眼部图像对应的面部图像;
    根据所述面部图像,获得所述演员的面部姿态的变换矩阵,所述面部姿态为所述演员面部相对于相机的姿态;
    根据所述面部姿态的变换矩阵对所述眼球中心位置进行变换,得到相对于相机的眼球中心位置。
  12. 如权利要求11所述的眼神捕捉方法,其特征在于,所述与所述演员的眼部图像对应的面部图像根据以下方法获取:
    所述演员的头部佩戴面部表情捕捉头盔,所述头盔与所述演员的头部相对静止;
    所述头盔上安装有面部表情捕捉相机,所述相机捕捉演员面部表情。
  13. 如权利要求12所述的眼神捕捉方法,其特征在于,对于任一帧的面部图像所述面部姿态的变换矩阵是固定值。
  14. 如权利要求11所述的眼神捕捉方法,其特征在于,所述与所述演员的眼部图像对应的面部图像根据以下方法获取:
    利用摄像机拍摄所述演员的面部表情;
    所述摄像机与所述演员的头部是分离的。
  15. 如权利要求14所述的眼神捕捉方法,其特征在于,对于任一帧的面部图像所述面部姿态的变换矩阵是变化的。
  16. 如权利要求9所述的眼神捕捉方法,其特征在于,所述通过 眼球校准获得所述虹膜尺寸,包括:
    获取预设数量且满足校准要求的校准图像;
    将各校准图像输入至所述眼睛网络模型,预测得到多个虹膜掩膜;
    对所述多个虹膜掩膜分别进行圆拟合,得到圆拟合后的多个圆形;
    将所述多个圆形分别投影至所述演员在中性表情下的三维人脸,根据投影结果计算多个虹膜掩膜在所述三维人脸中对应的虹膜尺寸;
    根据多个虹膜掩膜在所述三维人脸中对应的虹膜尺寸,得到所述虹膜尺寸。
  17. 如权利要求16所述的眼神捕捉方法,其特征在于,所述根据多个虹膜掩膜在三维人脸中对应的虹膜尺寸,得到所述虹膜尺寸,包括:
    将多个虹膜掩膜在所述三维人脸中对应的虹膜尺寸的平均值作为所述虹膜尺寸。
  18. 如权利要求2所述的眼神捕捉方法,其特征在于,所述眼睛网络模型针对一双眼睛中的其中一只眼睛,当输入至所述眼睛网络模型的眼部图像为一双眼睛中的另一只眼睛时,对输入的眼部图像进行对称翻转,并将对称翻转后的眼部图像作为所述眼睛网络模型的输入。
  19. 如权利要求2所述的眼神捕捉方法,其特征在于,还包括:
    在根据所述眼部二维信息和所述三维眼球,确定所述三维瞳孔中心位置之前,根据所述睁闭眼状态判断所述演员是否闭眼;
    当所述睁闭眼状态指示闭眼时,将根据前一帧眼部图像捕捉的眼神方向作为所述眼部图像对应的眼神方向。
  20. 如权利要求1所述的眼神捕捉方法,其特征在于,还包括:
    在捕捉得到一双眼睛中的每只眼睛分别对应的三维瞳孔中心位置之后，计算一双眼睛的三维瞳孔中心位置中的天顶角θ和方位角φ的联合先验分布，所述三维瞳孔中心位置包括：眼球半径、天顶角θ和方位角φ；
    当联合先验分布结果指示的概率值低于设定概率阈值时，判定捕捉错误，将根据前一帧眼部图像捕捉的眼神方向作为所述眼部图像对应的眼神方向。
  21. 如权利要求1所述的眼神捕捉方法,其特征在于,所述根据所述三维瞳孔中心位置捕捉所述演员的眼神方向,包括:
    确定所述眼球中心位置指向所述三维瞳孔中心位置的方向,将该方向作为所述演员的眼神方向。
  22. 一种眼神捕捉装置,其特征在于,包括:
    获取单元,用于获取演员的眼部图像;
    三维眼球确定单元,用于获取所述演员的眼部三维信息,根据所述眼部三维信息确定所述演员的三维眼球,所述眼部三维信息至少包括:眼球中心位置、眼球半径及虹膜尺寸;
    眼神捕捉单元,用于根据所述眼部图像,采用眼睛网络模型和所述三维眼球,确定三维瞳孔中心位置,并根据所述三维瞳孔中心位置捕捉所述演员的眼神方向。
  23. 一种存储介质,所述存储介质为非易失性存储介质或非瞬态存储介质,其上存储有计算机程序,其特征在于,所述计算机程序被处理器运行时执行权利要求1至21任一项所述的眼神捕捉方法的步骤。
  24. 一种终端,包括存储器和处理器,所述存储器上存储有能够在所述处理器上运行的计算机程序,其特征在于,所述处理器运行所 述计算机程序时执行权利要求1至21任一项所述的眼神捕捉方法的步骤。
PCT/CN2022/071905 2021-03-18 2022-01-14 眼神捕捉方法及装置、存储介质、终端 WO2022193809A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110290851.4A CN113192132B (zh) 2021-03-18 2021-03-18 眼神捕捉方法及装置、存储介质、终端
CN202110290851.4 2021-03-18

Publications (1)

Publication Number Publication Date
WO2022193809A1 true WO2022193809A1 (zh) 2022-09-22

Family

ID=76973435

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/071905 WO2022193809A1 (zh) 2021-03-18 2022-01-14 眼神捕捉方法及装置、存储介质、终端

Country Status (2)

Country Link
CN (1) CN113192132B (zh)
WO (1) WO2022193809A1 (zh)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116382475A (zh) * 2023-03-24 2023-07-04 北京百度网讯科技有限公司 视线方向的控制、视线交流方法、装置、设备及介质
CN116382475B (zh) * 2023-03-24 2024-05-14 北京百度网讯科技有限公司 视线方向的控制、视线交流方法、装置、设备及介质

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113192132B (zh) * 2021-03-18 2022-07-29 魔珐(上海)信息科技有限公司 眼神捕捉方法及装置、存储介质、终端
CN114219878B (zh) * 2021-12-14 2023-05-23 魔珐(上海)信息科技有限公司 虚拟角色的动画生成方法及装置、存储介质、终端
CN116664394B (zh) * 2023-08-01 2023-10-03 博奥生物集团有限公司 一种三维人眼图像生成方法及装置、电子设备、存储介质

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100220897A1 (en) * 2009-02-27 2010-09-02 Kabushiki Kaisha Toshiba Information processing apparatus and network conference system
CN102830793A (zh) * 2011-06-16 2012-12-19 北京三星通信技术研究有限公司 视线跟踪方法和设备
CN108573192A (zh) * 2017-03-09 2018-09-25 北京京东尚科信息技术有限公司 匹配人脸的眼镜试戴方法和装置
CN109471523A (zh) * 2017-09-08 2019-03-15 托比股份公司 使用眼球中心位置的眼睛追踪
CN110807364A (zh) * 2019-09-27 2020-02-18 中国科学院计算技术研究所 三维人脸与眼球运动的建模与捕获方法及系统
CN113192132A (zh) * 2021-03-18 2021-07-30 魔珐(上海)信息科技有限公司 眼神捕捉方法及装置、存储介质、终端

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4692526B2 (ja) * 2006-07-18 2011-06-01 株式会社国際電気通信基礎技術研究所 視線方向の推定装置、視線方向の推定方法およびコンピュータに当該視線方向の推定方法を実行させるためのプログラム
CN104809424B (zh) * 2014-01-23 2020-11-10 北京七鑫易维信息技术有限公司 一种基于虹膜特征实现视线追踪的方法
JP2019519859A (ja) * 2016-06-29 2019-07-11 シーイング マシーンズ リミテッド 視線追跡を実行するシステム及び方法
CN108229264A (zh) * 2016-12-15 2018-06-29 广东技术师范学院 基于虹膜识别的驾驶员眼部动作捕捉方法
CN110516548B (zh) * 2019-07-24 2021-08-03 浙江工业大学 一种基于三维眼球模型和Snakuscule的虹膜中心定位方法

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100220897A1 (en) * 2009-02-27 2010-09-02 Kabushiki Kaisha Toshiba Information processing apparatus and network conference system
CN102830793A (zh) * 2011-06-16 2012-12-19 北京三星通信技术研究有限公司 视线跟踪方法和设备
CN108573192A (zh) * 2017-03-09 2018-09-25 北京京东尚科信息技术有限公司 匹配人脸的眼镜试戴方法和装置
CN109471523A (zh) * 2017-09-08 2019-03-15 托比股份公司 使用眼球中心位置的眼睛追踪
CN110807364A (zh) * 2019-09-27 2020-02-18 中国科学院计算技术研究所 三维人脸与眼球运动的建模与捕获方法及系统
CN113192132A (zh) * 2021-03-18 2021-07-30 魔珐(上海)信息科技有限公司 眼神捕捉方法及装置、存储介质、终端

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116382475A (zh) * 2023-03-24 2023-07-04 北京百度网讯科技有限公司 视线方向的控制、视线交流方法、装置、设备及介质
CN116382475B (zh) * 2023-03-24 2024-05-14 北京百度网讯科技有限公司 视线方向的控制、视线交流方法、装置、设备及介质

Also Published As

Publication number Publication date
CN113192132A (zh) 2021-07-30
CN113192132B (zh) 2022-07-29

Similar Documents

Publication Publication Date Title
WO2022193809A1 (zh) 眼神捕捉方法及装置、存储介质、终端
CN109086726B (zh) 一种基于ar智能眼镜的局部图像识别方法及系统
JP6917444B2 (ja) 角膜曲率を用いた虹彩境界推定
WO2022156640A1 (zh) 一种图像的视线矫正方法、装置、电子设备、计算机可读存储介质及计算机程序产品
US20190387168A1 (en) Augmented reality display with frame modulation functionality
US10831268B1 (en) Systems and methods for using eye tracking to improve user interactions with objects in artificial reality
US10628948B2 (en) Image registration device, image registration method, and image registration program
CN110807364B (zh) 三维人脸与眼球运动的建模与捕获方法及系统
WO2019177870A1 (en) Animating virtual avatar facial movements
Tonsen et al. A high-level description and performance evaluation of pupil invisible
WO2023109753A1 (zh) 虚拟角色的动画生成方法及装置、存储介质、终端
WO2022095721A1 (zh) 参数估算模型的训练方法、装置、设备和存储介质
WO2020125499A1 (zh) 一种操作提示方法及眼镜
WO2020063000A1 (zh) 神经网络训练、视线检测方法和装置及电子设备
Wang et al. Realtime and accurate 3D eye gaze capture with DCNN-based iris and pupil segmentation
WO2020211347A1 (zh) 基于人脸识别的修改图片的方法、装置和计算机设备
US11694419B2 (en) Image analysis and gaze redirection using characteristics of the eye
JP2014194617A (ja) 視線方向推定装置、視線方向推定装置および視線方向推定プログラム
CN111914811A (zh) 图像数据处理方法、装置、计算机设备以及存储介质
Malleson et al. Rapid one-shot acquisition of dynamic VR avatars
Chen et al. 3D face reconstruction and gaze tracking in the HMD for virtual interaction
WO2020200082A1 (zh) 直播互动方法、装置、直播系统及电子设备
US9786030B1 (en) Providing focal length adjustments
Brito et al. Repurposing labeled photographs for facial tracking with alternative camera intrinsics
Li et al. A low-cost head and eye tracking system for realistic eye movements in virtual avatars

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22770171

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 22770171

Country of ref document: EP

Kind code of ref document: A1