WO2021135827A1 - Method, device, electronic device, and storage medium for determining a line of sight direction - Google Patents

Method, device, electronic device, and storage medium for determining a line of sight direction

Info

Publication number
WO2021135827A1
Authority
WO
WIPO (PCT)
Prior art keywords
feature
eye
line
facial
features
Prior art date
Application number
PCT/CN2020/134049
Other languages
English (en)
French (fr)
Inventor
王飞
钱晨
Original Assignee
上海商汤临港智能科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 上海商汤临港智能科技有限公司
Priority to JP2022524710A (JP7309116B2)
Priority to KR1020217034841A (KR20210140763A)
Publication of WO2021135827A1

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 - Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 - Feature extraction; Face representation
    • G06V40/171 - Local features and components; Facial parts; Occluding parts, e.g. glasses; Geometrical relationships
    • G06V40/172 - Classification, e.g. identification
    • G06V40/18 - Eye characteristics, e.g. of the iris
    • G06V40/193 - Preprocessing; Feature extraction
    • G06V40/197 - Matching; Classification

Definitions

  • the present disclosure relates to the field of image processing technology, and in particular, to a method, device, electronic device, and storage medium for determining the direction of the line of sight.
  • gaze tracking is an important field in computer vision.
  • the main purpose of gaze tracking is to predict the user's gaze direction. Because the user's gaze direction is often related to the user's personal intention, gaze tracking plays an important role in understanding user intent, which makes it particularly important to determine the direction of the user's line of sight accurately.
  • the embodiments of the present disclosure provide at least one solution for determining the line of sight direction.
  • an embodiment of the present disclosure provides a method for determining a line of sight direction, including: acquiring a facial image and an eye image of a target object; extracting facial features of the target object from the facial image; determining eye features of the target object according to the facial features and the eye image; predicting an initial line of sight direction of the target object based on the facial features, and predicting line-of-sight residual information based on a fusion feature obtained by fusing the facial features and the eye features; and correcting the initial line of sight direction based on the line-of-sight residual information to obtain the line of sight direction of the target object.
  • the method for determining the line of sight direction extracts facial features of the target object from the facial image, which can be used to predict the initial line of sight direction, and determines eye features of the target object based on the facial features and the eye image. The information representing the difference between the actual line of sight direction of the target object and the initial line of sight direction, that is, the line-of-sight residual information, is then predicted from the fusion feature obtained by fusing the facial features and the eye features. The initial line of sight direction predicted from facial features alone is adjusted by this difference information, so that a line of sight direction closer to the actual one is obtained. It can be seen that the method proposed by the embodiments of the present disclosure can predict a more accurate line of sight direction.
  • the eye image includes a left eye image and a right eye image
  • the determining the eye features of the target object according to the facial features of the target object and the eye image includes: extracting left-eye features from the left-eye image; extracting right-eye features from the right-eye image; determining, according to the facial features, the left-eye features, and the right-eye features, a first weight corresponding to the left-eye features and a second weight corresponding to the right-eye features; and performing a weighted summation of the left-eye features and the right-eye features based on the first weight and the second weight to obtain the eye features.
  • in this way, the different contributions of the left-eye image and the right-eye image in determining the direction of the line of sight are determined separately, so that more accurate eye features are obtained, which in turn helps improve the accuracy of predicting the line-of-sight residual information.
  • determining the first weight corresponding to the left-eye features and the second weight corresponding to the right-eye features according to the facial features, the left-eye features, and the right-eye features includes: determining a first score of the left-eye features according to the facial features and the left-eye features, and determining a second score of the right-eye features according to the facial features and the right-eye features; and determining the first weight and the second weight based on the first score and the second score.
  • predicting the initial line of sight direction of the target object based on the facial features includes: determining a weight for each feature point in the facial features, adjusting the facial features based on the weights of the feature points, and determining the initial line of sight direction of the target object according to the adjusted facial features.
  • the fusion feature is determined based on the facial features and the eye features in the following manner: determining an intermediate feature according to the adjusted facial features, the eye features, and the weight of each feature point in the adjusted facial features; and performing a weighted summation of the intermediate feature and the adjusted facial features, based on the weights corresponding to the intermediate feature and the adjusted facial features, to obtain the fusion feature.
  • the weight of each feature point in the adjusted facial features is determined in the following manner: the weight of each feature point in the adjusted facial features is determined according to the eye features and the adjusted facial features.
  • the weights corresponding to the intermediate feature and the adjusted facial features are determined in the following manner: the weights corresponding to the intermediate feature and the adjusted facial features are determined according to the eye features and the adjusted facial features.
  • the fusion feature, determined from the eye features and the adjusted facial features, comprehensively considers the facial image and the eye image. This makes it easier to determine, through the fusion feature, the difference between the actual line of sight direction of the target object and the initial line of sight direction, and the initial line of sight direction can then be corrected based on that difference to obtain a more accurate line of sight direction.
  • the method for determining the line of sight direction is implemented by a neural network, and the neural network is obtained by training using a sample image containing the marked line of sight direction of the target sample object.
  • the neural network is obtained by training in the following manner: acquiring a facial sample image and an eye sample image of the target sample object in the sample image; extracting facial features of the target sample object from the facial sample image; determining eye features of the target sample object based on the facial features of the target sample object and the eye sample image; predicting an initial line of sight direction of the target sample object based on the facial features of the target sample object, and predicting line-of-sight residual information of the target sample object based on the fusion feature obtained by fusing the facial features and the eye features of the target sample object; correcting the initial line of sight direction of the target sample object based on the line-of-sight residual information of the target sample object to obtain the line of sight direction of the target sample object; and adjusting the network parameter values of the neural network based on the obtained line of sight direction of the target sample object and the labeled line of sight direction of the target sample object.
  • the face sample image and the eye sample image of the target sample object in the sample image can be obtained.
  • the facial features of the target sample object are extracted based on the facial sample image, and the facial features of the target sample object can predict the initial line of sight direction of the target sample object.
  • the eye features of the target sample object are determined based on the facial features and eye images of the target sample object.
  • the information that characterizes the difference between the actual line of sight direction of the target sample object and the initial line of sight direction, that is, the line of sight residual information, can be predicted by the fusion feature after the facial feature and the eye feature of the target sample object are fused.
  • the initial gaze direction predicted only based on the facial features of the target sample object is adjusted by the information that characterizes the difference, that is, the gaze direction that is closer to the marked gaze direction of the target sample object can be obtained.
  • the network parameter values of the neural network are adjusted based on the obtained line-of-sight direction of the target sample object and the marked line-of-sight direction, that is, a neural network with higher accuracy can be obtained. Based on the neural network with higher accuracy, the sight direction of the target object can be accurately predicted.
  • the embodiments of the present disclosure provide a device for determining a line of sight direction, including: an image acquisition module for acquiring a facial image and an eye image of a target object; a feature extraction module for extracting facial features of the target object from the facial image and for determining eye features of the target object based on the facial features and the eye image; a line of sight prediction module for predicting an initial line of sight direction of the target object based on the facial features and for predicting line-of-sight residual information based on the fusion feature obtained by fusing the facial features and the eye features; and a line of sight correction module for correcting the initial line of sight direction based on the line-of-sight residual information to obtain the line of sight direction of the target object.
  • an embodiment of the present disclosure provides an electronic device, including: a processor, a storage medium, and a bus.
  • the storage medium stores machine-readable instructions executable by the processor.
  • the processor and the storage medium communicate through the bus, and the machine-readable instructions cause the processor to execute the method according to the first aspect.
  • embodiments of the present disclosure provide a computer-readable storage medium having a computer program stored on the computer-readable storage medium, and the computer program causes a processor to execute the method described in the first aspect.
  • Fig. 1 shows a flowchart of a method for determining a line of sight direction provided by an embodiment of the present disclosure.
  • FIG. 2 shows a schematic diagram of a principle of determining the direction of the line of sight provided by an embodiment of the present disclosure.
  • Fig. 3 shows a flow chart of a method for determining eye characteristics provided by an embodiment of the present disclosure.
  • FIG. 4 shows a schematic diagram of a process of determining the weights corresponding to the left-eye feature and the right-eye feature provided by an embodiment of the present disclosure.
  • Fig. 5 shows a flowchart of a method for determining an initial line of sight direction provided by an embodiment of the present disclosure.
  • Fig. 6 shows a flow chart of a method for determining fusion features provided by an embodiment of the present disclosure.
  • FIG. 7 shows a schematic diagram of a process of determining the initial line of sight direction and determining the line of sight residual information provided by an embodiment of the present disclosure.
  • FIG. 8 shows a schematic diagram of a process of determining the line of sight direction provided by an embodiment of the present disclosure.
  • Fig. 9 shows a flowchart of a neural network training method provided by an embodiment of the present disclosure.
  • FIG. 10 shows a schematic structural diagram of an apparatus for determining a line of sight direction provided by an embodiment of the present disclosure.
  • FIG. 11 shows a schematic structural diagram of an electronic device provided by an embodiment of the present disclosure.
  • Sight tracking is an important field in computer vision.
  • the main purpose of gaze tracking is to predict the user’s gaze direction.
  • appearance-based gaze prediction models are often implemented using deep learning models, which use facial features in a facial image or eye features in an eye image to predict the direction of the line of sight.
  • in these approaches, the facial image and the eye image are regarded as independent feature sources, and the intrinsic relationship between the facial image and the eye image is not substantially considered.
  • eye images provide fine-grained features that focus on gaze, while facial images provide coarse-grained features with broader information. The combination of the two can more accurately predict the direction of the line of sight.
  • the present disclosure provides a method for determining the direction of the line of sight.
  • the facial features of the target object can be extracted based on the facial image, and the facial features can be used to predict the initial line of sight direction of the target object.
  • the feature obtained by fusing the facial features and the eye features (also called the "fusion feature") can be used to predict the information characterizing the difference between the actual line of sight direction of the target object and the initial line of sight direction, that is, the line-of-sight residual information.
  • the initial line of sight direction predicted only based on facial features is adjusted by the information that characterizes the difference, that is, the line of sight direction that is closer to the actual line of sight direction can be obtained. It can be seen that the line of sight determination method proposed by the embodiment of the present disclosure can predict and obtain a more accurate line of sight direction.
  • the execution subject of the method for determining the line of sight direction provided by the embodiments of the present disclosure is generally a computer device with a certain computing capability.
  • the computer equipment includes, for example, a terminal device or a server or other processing equipment.
  • the terminal device may be a user equipment (UE), a mobile device, a user terminal, a terminal, and the like.
  • the method for determining the line-of-sight direction may be implemented by a processor invoking a computer-readable instruction stored in a memory.
  • FIG. 1 it is a flowchart of a method for determining a line of sight direction provided by an embodiment of the present disclosure.
  • the method includes steps S101 to S105.
  • S101 Acquire a facial image and an eye image of a target object. The target object may be the user whose line of sight is to be predicted. The face of the target object can be photographed by a device capable of collecting images, such as a video camera or a camera, to obtain a facial image of the target object, and the eye image of the target object can then be cropped from the facial image.
  • S102 Extract the facial features of the target object from the facial image.
  • S103 Determine the eye feature of the target object according to the facial feature and eye image of the target object.
  • the facial features of the target object are coarse-grained features with broader information; through these facial features, the initial line of sight direction of the target object can be predicted. The eye features of the target object are fine-grained features that characterize gaze. Combining the eye features with the facial features allows the direction of the line of sight to be predicted more accurately.
  • the facial features and eye features can be extracted by the sub-neural network used for feature extraction in the pre-trained neural network for line-of-sight prediction, which will be described in detail in the following embodiments and will not be repeated here.
  • S104 Predict the initial line of sight direction of the target object based on the facial features, and predict and obtain line-of-sight residual information based on the fusion feature after the facial feature and the eye feature are fused.
  • the line-of-sight residual information is used to characterize the difference between the actual line-of-sight direction of the target object and the initial line-of-sight direction.
  • the initial line of sight direction here can be determined based on the facial features; specifically, it can be predicted by the sub-neural network used to determine the initial line of sight direction in the pre-trained neural network for line-of-sight prediction. The specific prediction method will be described in detail later in conjunction with the embodiments.
  • the line-of-sight residual information here can be predicted by a sub-neural network used to determine line-of-sight residual information in a pre-trained neural network for predicting the direction of the line of sight.
  • the specific prediction method will be described in detail later.
  • the information characterizing the difference between the actual line of sight direction of the target object and the initial line of sight direction is predicted from the feature obtained by fusing the facial features and the eye features; this difference information is then used to adjust the initial line of sight direction predicted from facial features alone, so that a line of sight direction closer to the actual one can be obtained. That is, the present disclosure proposes to combine the facial image and the eye image of the target object, predicting the line of sight by combining the gaze-focused fine-grained features provided by the eye image with the coarse-grained features carrying broader information provided by the facial image.
  • the facial features and the eye features can be input into the sub-neural network used to determine the line-of-sight residual information within the pre-trained neural network for gaze direction prediction, so as to obtain the feature after fusing the facial features and the eye features. This will be described later in conjunction with specific embodiments.
  • S105 Correct the initial line-of-sight direction based on the line-of-sight residual information to obtain the line-of-sight direction of the target object.
  • the line-of-sight residual information here may include information that characterizes the difference between the actual line of sight direction and the initial line of sight direction, determined from the feature obtained after fusing the facial features and the eye features. The initial line of sight direction can then be adjusted with this residual information; for example, the line-of-sight residual information can be summed with the initial line of sight direction predicted from the facial features to obtain a line of sight direction closer to the actual line of sight direction of the target object.
  • FIG. 2 shows a schematic diagram of the principle of determining the direction of the line of sight, where g_b represents the initial line of sight direction of the target object predicted based on facial features, and g_r represents the line-of-sight residual information; the final line of sight direction g of the target object is expressed by the following formula (1):
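  • consistent with the summation of the residual information and the initial direction described above, formula (1) corresponds to:

$$ g = g_b + g_r $$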
  • the line-of-sight residual information indicates the difference between the actual line-of-sight direction and the initial line-of-sight direction
  • it can be represented by a vector.
  • a world coordinate system can be introduced to represent the initial line of sight direction and line of sight residual information.
  • for example, suppose the actual gaze direction of the target object is 30 degrees south east, the initial gaze direction obtained by prediction from the facial features of the target object is 25 degrees south east, and the line-of-sight residual information obtained by prediction from the fusion feature is a deviation of 4 degrees. After the initial line of sight direction is corrected by the line-of-sight residual information, the predicted line of sight direction of the target object is 29 degrees south east, which is obviously closer to the actual line of sight of the target object than 25 degrees south east.
  • the gaze direction determination method proposed in steps S101 to S105 extracts facial features from the facial image of the target object, from which the initial gaze direction can be predicted; after determining the eye features of the target object based on the facial features and the eye image, the information characterizing the difference between the actual line of sight of the target object and the initial line of sight, that is, the line-of-sight residual information, is predicted from the feature obtained by fusing the facial features and the eye features; the initial line of sight direction predicted from the facial features alone is then adjusted using this difference information, so that a line of sight direction closer to the actual one is obtained. It can be seen that the method proposed by the embodiments of the present disclosure can predict a more accurate line of sight direction.
  • the facial image can be analyzed to extract the position point coordinates that can characterize the facial features as the facial features of the target object. For example, extract the coordinates of the cheeks and the corners of the eyes.
  • the facial features of the target object can be extracted based on a neural network.
  • the facial features of the target object can be extracted based on the sub-neural network for feature extraction in the pre-trained neural network for line-of-sight prediction, which specifically includes:
  • the facial image is input to the first feature extraction network, and the facial features are obtained through the first feature extraction network processing.
  • the first feature extraction network is a sub-neural network used for facial feature extraction in a pre-trained neural network for line-of-sight prediction.
  • the first feature extraction network here is the part of the pre-trained neural network for line-of-sight prediction that extracts facial features from facial images. That is, after the facial image is input into the first feature extraction network, facial features that can be used to predict the initial line of sight direction are extracted.
  • the facial features in the facial image are extracted through the first feature extraction network in the pre-trained neural network for line-of-sight prediction.
  • the first feature extraction network is dedicated to extracting facial features of a facial image, so that more accurate facial features can be extracted, thereby facilitating the improvement of the accuracy of the initial line of sight direction.
  • the above-mentioned eye image includes a left-eye image and a right-eye image.
  • the appearance of the left eye shown in the left-eye image and the appearance of the right eye shown in the right-eye image will change with changes in the environment or changes in the posture of the head.
  • the left-eye feature extracted based on the left-eye image and the right-eye feature extracted based on the right-eye image may have different contributions when determining the direction of the line of sight.
  • determining the eye features of the target object according to the facial features and eye images of the target object, as shown in FIG. 3, may include the following steps S301 to S304.
  • extracting the left-eye features from the left-eye image may mean extracting, as the left-eye features of the target object, the coordinates of position points in the left-eye image that characterize the eye, such as the positions of the pupil and the corner of the eye; alternatively, the left-eye features can be extracted with a pre-trained neural network.
  • similarly, extracting the right-eye features from the right-eye image may mean extracting, as the right-eye features of the target object, the coordinates of position points in the right-eye image that characterize the eye, such as the positions of the pupil and the corner of the eye; alternatively, the right-eye features can be extracted with a pre-trained neural network.
  • the present disclosure uses a pre-trained neural network to extract left-eye features and right-eye features as an example for description:
  • the left-eye image is input into the second feature extraction network, the left-eye feature is obtained through the second feature extraction network, and the right-eye image is input into the third feature extraction network, and the right-eye feature is obtained through the third feature extraction network.
  • the second feature extraction network is a sub-neural network used for left-eye feature extraction in a pre-trained neural network for line-of-sight prediction.
  • the third feature extraction network is a sub-neural network used for right-eye feature extraction in a pre-trained neural network for line-of-sight prediction.
  • S303 Determine a first weight corresponding to the left-eye feature and a second weight corresponding to the right-eye feature according to the facial feature, the left-eye feature, and the right-eye feature.
  • the first weight corresponding to the left-eye feature represents the contribution of the left-eye image in determining the line of sight direction
  • the second weight corresponding to the right-eye feature represents the contribution of the right-eye image in determining the line of sight direction.
  • the weights can be determined by a pre-trained neural network. For example, the facial features, left-eye features, and right-eye features can be input into the attention network, and the attention network outputs the first weight corresponding to the left-eye features and the second weight corresponding to the right-eye features.
  • the attention network is a sub-neural network used to determine the respective evaluation values of the left-eye feature and the right-eye feature in the pre-trained neural network for predicting the direction of the line of sight.
  • the evaluation value represents the importance of the left eye feature/right eye feature in the eye feature.
  • when the facial features, left-eye features, and right-eye features are input into the attention network and the first weight and the second weight are obtained through attention network processing, the processing includes: determining the first score of the left-eye features based on the facial features and the left-eye features, and determining the second score of the right-eye features based on the facial features and the right-eye features, which can be computed by the attention network; the first weight and the second weight are then determined based on the first score and the second score, which can also be obtained through attention network processing.
  • the first score can represent the contribution of the left-eye image in determining the direction of the line of sight, and it is known through advance testing that the first score is related to both facial features and left-eye features.
  • that the first score is related to the facial features means that the facial features used to predict the initial line of sight direction can affect the score of the left-eye features.
  • the first score is related to the left eye feature, that is, the shape and appearance of the left eye will also affect the score of the left eye feature.
  • the attention network can determine the first score according to the following formula (2):
  • in formula (2), m_l represents the first score corresponding to the left-eye features; W_1, W_2 and W_3 are network parameters of the attention network, that is, parameters obtained after the attention network is trained; f_f represents the facial features; and f_l represents the left-eye features.
  • the second score can represent the contribution of the right-eye image in determining the direction of the line of sight, and it is known through advance testing that the second score is related to both facial features and right-eye features.
  • that the second score is related to the facial features means that the facial features used to predict the initial line of sight direction can affect the score of the right-eye features.
  • the second score is related to the right eye feature, that is, the shape and appearance of the right eye will also affect the score of the right eye feature.
  • the attention network can determine the second score according to the following formula (3):
  • in formula (3), m_r represents the second score corresponding to the right-eye features; W_1, W_2 and W_3 are network parameters of the attention network, that is, parameters obtained after the attention network is trained; f_f represents the facial features; and f_r represents the right-eye features.
  • after the first score and the second score are obtained, the first weight corresponding to the left-eye features and the second weight corresponding to the right-eye features can be further determined from them, specifically according to the following formula (4), giving the first weight w_l corresponding to the left-eye features and the second weight w_r corresponding to the right-eye features.
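  • the body of formula (4) is not reproduced in this text; a standard softmax normalization of the two scores is one plausible form consistent with the surrounding description (an assumption, not the patent's verbatim formula):

$$ w_l = \frac{e^{m_l}}{e^{m_l} + e^{m_r}}, \qquad w_r = \frac{e^{m_r}}{e^{m_l} + e^{m_r}} $$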
  • the above process of determining the weights corresponding to the left-eye features and the right-eye features is shown in FIG. 4: the left-eye features f_l and the right-eye features f_r are obtained through convolutional neural networks (CNN), and then the facial features f_f, the left-eye features f_l, and the right-eye features f_r are input into the attention network, which outputs the first weight w_l corresponding to the left-eye features and the second weight w_r corresponding to the right-eye features.
  • it may be a step of performing a weighted summation of the left eye feature and the right eye feature based on the first weight and the second weight through the attention network to obtain the eye feature.
  • the left-eye feature and the right-eye feature can be weighted and summed.
  • the eye features f_e can be obtained according to the following formula (5):
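  • consistent with the weighted summation just described, formula (5) corresponds to:

$$ f_e = w_l \, f_l + w_r \, f_r $$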
  • in this way, the different contributions of the left-eye image and the right-eye image in determining the direction of the line of sight are determined separately, so that more accurate eye features are obtained, which in turn helps improve the accuracy of the line-of-sight residual information.
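  • as an illustration of the attention-based eye-feature fusion described above, the following PyTorch sketch is a minimal, hypothetical implementation: the additive scoring form with parameters W_1, W_2, W_3, the hidden size, and the softmax normalization are assumptions, while the inputs, outputs, and weighted summation follow the text.

```python
import torch
import torch.nn as nn


class EyeAttention(nn.Module):
    """Hypothetical sketch of the attention module fusing left/right eye features.

    The scoring form (additive attention with tanh) and the softmax
    normalization are assumptions; only the inputs, outputs, and the
    weighted summation follow the description in the text.
    """

    def __init__(self, face_dim: int, eye_dim: int, hidden_dim: int = 128):
        super().__init__()
        self.w1 = nn.Linear(face_dim, hidden_dim, bias=False)  # W_1: acts on facial features
        self.w2 = nn.Linear(eye_dim, hidden_dim, bias=False)   # W_2: acts on one eye's features
        self.w3 = nn.Linear(hidden_dim, 1, bias=False)         # W_3: maps to a scalar score

    def score(self, f_face: torch.Tensor, f_eye: torch.Tensor) -> torch.Tensor:
        # One plausible realisation of formulas (2)/(3): a score that depends
        # on both the facial features and the given eye features.
        return self.w3(torch.tanh(self.w1(f_face) + self.w2(f_eye)))

    def forward(self, f_face, f_left, f_right):
        m_l = self.score(f_face, f_left)    # first score
        m_r = self.score(f_face, f_right)   # second score
        # Assumed softmax normalization of the two scores (cf. formula (4)).
        weights = torch.softmax(torch.cat([m_l, m_r], dim=-1), dim=-1)
        w_l, w_r = weights[..., :1], weights[..., 1:]
        f_eye = w_l * f_left + w_r * f_right  # formula (5): weighted summation
        return f_eye, w_l, w_r
```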
  • the sight direction of the target object can then be determined based on the facial features and the eye features. Determining the sight direction of the target object includes two parts: the first part is predicting the initial sight direction of the target object based on the facial features, and the second part is predicting the line-of-sight residual information of the target object based on the feature obtained by fusing the facial features and the eye features.
  • S501 Determine the weight of each feature point in the facial feature, and adjust the facial feature based on the weight of each feature point in the facial feature;
  • S502 Determine an initial line of sight direction of the target object according to the adjusted facial features.
  • Facial features may include multiple feature points.
  • Feature points can be understood as different coarse-grained features extracted from facial images. These coarse-grained features can include, for example, regional features and location point features in facial images. Each feature point in the facial features has a different degree of importance when predicting the initial line of sight direction.
  • the facial features can be adjusted based on the weight of each feature point first, and then the initial line of sight direction of the target object can be determined based on the adjusted facial features.
  • the adjustment can be made through a pre-trained neural network, which will be described in detail later.
  • the merged features can be determined based on the facial features and the eye features in the manner shown in FIG. 6, which specifically includes the following steps S601 to S602.
  • S601 Determine an intermediate feature according to the adjusted facial feature, eye feature, and weight of each feature point in the adjusted facial feature.
  • S602 Perform a weighted summation on the intermediate feature and the adjusted facial feature based on the intermediate feature, the adjusted facial feature, and the weights corresponding to the intermediate feature and the adjusted facial feature, to obtain the fused feature.
  • the intermediate feature here can be determined by a pre-trained neural network. Through the intermediate feature and the adjusted facial feature, the feature after the fusion of the facial feature and the eye feature can be determined.
  • the above process of adjusting the facial features to obtain the adjusted facial features, and the process of obtaining the features after the fusion of the facial features and the eye features, can be processed by a pre-trained neural network, such as a gate network.
  • the determination of the initial line of sight direction of the target object based on the adjusted facial features can also be determined based on a pre-trained neural network, which will be described in detail later.
  • the weight of each feature point in the adjusted facial features can be determined according to the following steps:
  • the weight of each feature point in the adjusted facial features is determined according to the eye features and the adjusted facial features.
  • the method of determining the weight here can be determined according to a preset weight distribution method, or it can be determined through a pre-trained neural network, which will be described in detail later.
  • the weights corresponding to the intermediate features and the adjusted facial features are determined according to the following steps:
  • the weights corresponding to the intermediate features and the adjusted facial features are determined according to the eye features and the adjusted facial features.
  • the method of determining the weight may be determined according to a preset weight distribution method, or may be determined through a pre-trained neural network, which will be described in detail later.
  • the gate network functions to filter the received features, that is, to increase the weight of important features and reduce the weight of non-important features.
  • the feature transformation performed by the gate network is described below in conjunction with formulas (7) to (10):
  • where W_z, W_r, and W_h are network parameters of the gate network; σ represents the sigmoid operation; ReLU represents the activation function; f represents the received feature (when facial features are processed, f represents the facial features; when eye features are processed, f represents the eye features); z_t and r_t represent the weights obtained after the sigmoid operations; h̃_t represents the intermediate feature obtained after the features input to the gate network are fused; h_t represents the weighted sum of the intermediate feature and the feature output by the adjacent gate network; and h_0 is set equal to 0.
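  • given the symbols defined above, formulas (7) to (10) are consistent with a GRU-style gate; one plausible reconstruction (the exact concatenation and ordering of the inputs is an assumption, not the patent's verbatim formulas) is:

$$
\begin{aligned}
z_t &= \sigma\big(W_z\,[\,f,\; h_{t-1}\,]\big) \\
r_t &= \sigma\big(W_r\,[\,f,\; h_{t-1}\,]\big) \\
\tilde{h}_t &= \mathrm{ReLU}\big(W_h\,[\,r_t \odot h_{t-1},\; f\,]\big) \\
h_t &= (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t, \qquad h_0 = 0
\end{aligned}
$$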
  • the embodiments of the present disclosure need to determine the initial line-of-sight direction of the target object based on facial features, and predict the line-of-sight residual information of the target object based on the features after fusion of facial features and eye features.
  • the embodiment of the present disclosure can introduce two gate networks to complete the filtering of features, respectively, which can be recorded as the first gate network and the second gate network.
  • the output feature of the first gate network can be denoted as h_1, and the output feature of the second gate network can be denoted as h_2, as described below in conjunction with specific embodiments.
  • the process of predicting the initial line of sight direction of the target object based on facial features is introduced.
  • the weights of the facial features can be adjusted through the first gate network to obtain the adjusted facial features h_1, and the initial line of sight direction is then predicted based on the adjusted facial features h_1; this specifically includes the following steps.
  • the facial features here can include multiple feature points.
  • the feature points here can be understood as different coarse-grained features in the facial image, and these coarse-grained features may include regional features, location point features, etc. in the facial image.
  • Each feature point in the facial features has a different degree of importance when predicting the initial line of sight direction.
  • the weight of each feature point in facial features is determined through the first gate network.
  • the first gate network here is a sub-neural network used to adjust facial features in a pre-trained neural network for line-of-sight prediction.
  • the facial features can then be adjusted, through the first gate network, based on the weight of each feature point in the facial features, where h_0 is equal to 0.
  • the first multilayer perceptron is a sub-neural network used to predict the initial line of sight direction in the pre-trained neural network for predicting the direction of the line of sight.
  • the adjusted facial features are denoted as h_1 and are input into the first multilayer perceptron (MLP) to obtain the initial line of sight direction of the target object.
  • the first gate network adjusts the weight of each feature point in the facial features so that feature points with a greater influence on the initial line of sight direction receive larger weights than those with a smaller influence; inputting the adjusted facial features into the first multilayer perceptron that predicts the initial line of sight direction therefore yields a more accurate initial line of sight direction.
  • the eye features and the adjusted facial features are input into the second gate network and processed by the second gate network to obtain the fused feature;
  • the second gate network is a sub-neural network, within the pre-trained neural network for line-of-sight prediction, used to predict the fused feature.
  • the adjusted facial features are the h_1 output by the first gate network; h_1 and the eye features f_e are then input into the second gate network, and the fused feature h_2 output by the second gate network is obtained.
  • the weighted summation of the intermediate feature and the adjusted facial feature is performed through the second gate network to obtain the fused feature.
  • the weight of each feature point in the adjusted facial features can be determined in the following way:
  • specifically, when performing the first processing, the second gate network uses the first network parameter information of the trained weight distribution function; the weight distribution function is the sigmoid operation represented by σ, and the first network parameter information is W_r. Formula (9) can then be applied to the adjusted facial features, the eye features, and the weight of each feature point in the adjusted facial features to obtain the intermediate feature.
  • the weights corresponding to the intermediate features and the adjusted facial features can be determined according to the following methods:
  • when performing the second processing, the second gate network uses the second network parameter information of the trained weight distribution function; the second processing is performed on the adjusted facial features h_1 and the eye features f_e to obtain the weights corresponding to the intermediate feature and to the adjusted facial features h_1, respectively.
  • this corresponds to the second processing performed by the second gate network on the eye features and the adjusted facial features; the weight distribution function is the sigmoid operation represented by σ, and the second network parameter information is W_z, so that the weight corresponding to the intermediate feature is z_2 and the weight corresponding to the adjusted facial features h_1 is 1 - z_2.
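  • putting the two processing steps together, the fused feature is the weighted sum described above; assuming the same element-wise, GRU-style form reconstructed for formulas (7) to (10) above (the form of the intermediate feature is an assumption, while the weighted sum with weights z_2 and 1 - z_2 follows the text), this can be written as:

$$ \tilde{h}_2 = \mathrm{ReLU}\big(W_h\,[\,r_2 \odot h_1,\; f_e\,]\big), \qquad h_2 = z_2 \odot \tilde{h}_2 + (1 - z_2) \odot h_1 $$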
  • the gaze residual information can be predicted based on the features fused from facial features and eye features in the following manner:
  • the fused features are input into the second multilayer perceptron MLP, and processed by the second multilayer perceptron to obtain line-of-sight residual information.
  • the second multilayer perceptron is a sub-neural network used to predict the residual information of the line of sight in a pre-trained neural network for predicting the direction of the line of sight.
  • the fused feature is denoted as h_2 and is input into the second multilayer perceptron (MLP) to obtain the line-of-sight residual information of the target object.
  • the above process of determining the initial line of sight direction and determining the line-of-sight residual information can be carried out by the two sub-neural networks shown in FIG. 7.
  • the first sub-neural network includes a first gate function and a first multilayer perceptron MLP
  • the second sub-neural network includes a second gate function and a second multilayer perceptron MLP.
  • on the one hand, the adjusted facial features h_1 are input into the first multilayer perceptron to obtain the initial line of sight direction g_b; on the other hand, they are input into the second gate network together with the eye features and processed by the second gate network to obtain the feature h_2 after fusing the facial features and the eye features. The fused feature h_2 is then input into the second multilayer perceptron to obtain the line-of-sight residual information g_r.
  • the eye feature and the facial feature adjusted by the first gate network are input into the second gate network for processing, and the feature after the fusion of the facial feature and the eye feature is determined.
  • the fused feature is a feature obtained after comprehensive consideration of the facial image and the eye image, so that the difference between the actual line of sight direction of the target object and the initial line of sight direction can be easily determined through the fused feature. After correcting the initial line of sight direction based on the difference, a more accurate line of sight direction can be obtained.
  • an eye image is intercepted from the facial image, and the eye image includes a left-eye image and a right-eye image.
  • the facial image is input into the first feature extraction network (CNN) to obtain the facial features f_f; the facial features are then input into the aforementioned first sub-neural network (which includes the first gate network and the first multilayer perceptron) for processing, and the initial line of sight direction g_b is obtained.
  • the left-eye image in the intercepted eye image is input into the second feature extraction network to obtain the left-eye features f_l, and the right-eye image is input into the third feature extraction network to obtain the right-eye features f_r.
  • the left-eye features, right-eye features, and facial features are input into the attention network to obtain the eye features f_e.
  • the eye features and the adjusted facial features h_1 obtained by the sub-neural network that predicts the initial line of sight direction are input into the second sub-neural network (which includes the second gate network and the second multilayer perceptron), and the line-of-sight residual information g_r is obtained.
  • the initial line of sight direction can be corrected based on the line-of-sight residual information g_r to obtain the line of sight direction of the target object.
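  • the end-to-end flow just described can be sketched as follows. This is a hypothetical PyTorch illustration, not the patent's implementation: the backbone networks are replaced by stand-in linear layers, the GRU-style gate follows the reconstruction given above, the output dimension of 3 (a gaze vector) and the MLP widths are assumptions, and EyeAttention refers to the sketch shown earlier.

```python
import torch
import torch.nn as nn

# Assumes the EyeAttention sketch defined earlier in this document.

class GateNetwork(nn.Module):
    """GRU-style gate, following the reconstructed formulas (7)-(10) (an assumed form)."""
    def __init__(self, feat_dim: int):
        super().__init__()
        self.wz = nn.Linear(2 * feat_dim, feat_dim)
        self.wr = nn.Linear(2 * feat_dim, feat_dim)
        self.wh = nn.Linear(2 * feat_dim, feat_dim)

    def forward(self, f: torch.Tensor, h_prev: torch.Tensor) -> torch.Tensor:
        z = torch.sigmoid(self.wz(torch.cat([f, h_prev], dim=-1)))
        r = torch.sigmoid(self.wr(torch.cat([f, h_prev], dim=-1)))
        h_tilde = torch.relu(self.wh(torch.cat([r * h_prev, f], dim=-1)))
        # Weighted sum of the previous output and the intermediate feature.
        return (1 - z) * h_prev + z * h_tilde


class GazeNet(nn.Module):
    def __init__(self, feat_dim: int = 256):
        super().__init__()
        # Stand-ins for the three feature extraction networks (real versions would be CNN backbones).
        self.face_net = nn.Sequential(nn.Flatten(), nn.LazyLinear(feat_dim))
        self.left_net = nn.Sequential(nn.Flatten(), nn.LazyLinear(feat_dim))
        self.right_net = nn.Sequential(nn.Flatten(), nn.LazyLinear(feat_dim))
        self.attention = EyeAttention(feat_dim, feat_dim)
        self.gate1 = GateNetwork(feat_dim)  # first gate network: adjusts facial features -> h_1
        self.gate2 = GateNetwork(feat_dim)  # second gate network: fuses h_1 with eye features -> h_2
        self.mlp1 = nn.Sequential(nn.Linear(feat_dim, 128), nn.ReLU(), nn.Linear(128, 3))  # initial gaze g_b
        self.mlp2 = nn.Sequential(nn.Linear(feat_dim, 128), nn.ReLU(), nn.Linear(128, 3))  # residual g_r

    def forward(self, face_img, left_img, right_img):
        f_f = self.face_net(face_img)
        f_l = self.left_net(left_img)
        f_r = self.right_net(right_img)
        f_e, _, _ = self.attention(f_f, f_l, f_r)     # eye features via attention
        h1 = self.gate1(f_f, torch.zeros_like(f_f))   # h_0 = 0
        g_b = self.mlp1(h1)                           # initial line of sight direction
        h2 = self.gate2(f_e, h1)                      # fused feature
        g_r = self.mlp2(h2)                           # line-of-sight residual information
        return g_b + g_r                              # formula (1): corrected line of sight direction
```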
  • the method for determining the line of sight direction proposed in the embodiments of the present application can be implemented by a neural network, which is obtained by training using sample images containing a target sample object with a labeled line of sight direction.
  • the labeled sight direction is the actual sight direction of the target sample object.
  • the neural network for determining the direction of the line of sight proposed in the embodiment of the present application can be obtained by training using the following steps, including steps S901 to S906.
  • S901 Acquire a face sample image and an eye sample image of a target sample object in a sample image.
  • the target sample objects may include multiple target objects located at different spatial positions; these target objects are all made to look in the same observation direction, their facial images are acquired as facial sample images, and the eye sample images are then cropped from the facial sample images.
  • alternatively, the target sample object here may include one target object; the target sample object is made to look in different observation directions, the facial image corresponding to each observation direction is acquired as a facial sample image, and the eye sample image is then cropped from the facial sample image.
  • S902 Extract the facial features of the target sample object from the facial sample image.
  • extracting the facial features of the target sample object from the face sample image is similar to the method of extracting the facial features of the target object introduced above, and will not be repeated here.
  • S903 Determine the eye feature of the target sample object according to the facial feature and the eye sample image of the target sample object.
  • Determining the eye characteristics of the target sample object is similar to the method of determining the eye characteristics of the target object introduced above, and will not be repeated here.
  • S904 Predict the initial line-of-sight direction of the target sample object based on the facial features of the target sample object, and predict and obtain the line-of-sight residual of the target sample object based on the features fused from the facial features of the target sample object and the eye features of the target sample object information.
  • determining the initial line-of-sight direction and line-of-sight residual information of the target sample object is similar to the method for determining the initial line-of-sight direction and line-of-sight residual information of the target object above, and will not be repeated here.
  • S905 Correct the initial line-of-sight direction of the target sample object based on the line-of-sight residual information of the target sample object to obtain the line-of-sight direction of the target sample object.
  • the method of correcting the initial line of sight direction of the target sample object is similar to the method of correcting the initial line of sight direction of the target object based on the residual information of the target object described above, and will not be repeated here.
  • S906 Adjust network parameter values of the neural network based on the obtained line-of-sight direction of the target sample object and the marked line-of-sight direction of the target sample object.
  • a loss function can be introduced to determine the loss value corresponding to the direction of the predicted line of sight.
  • the network parameter values of the neural network are adjusted through the loss value. For example, when the loss value is less than the set threshold, the training can be stopped to obtain the network parameter value of the neural network.
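  • a minimal training-step sketch consistent with this description is given below. It reuses the GazeNet sketch above; the choice of loss (mean-squared error between the predicted and labeled gaze directions), the optimizer, and the learning rate are assumptions, since the text only states that a loss between the predicted and labeled line of sight directions drives the parameter update.

```python
import torch

# Assumes the GazeNet sketch defined earlier in this document.
model = GazeNet()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = torch.nn.MSELoss()  # assumed loss; the patent does not name a specific one

def train_step(face_img, left_img, right_img, labeled_gaze):
    optimizer.zero_grad()
    predicted_gaze = model(face_img, left_img, right_img)  # corrected line of sight direction
    loss = loss_fn(predicted_gaze, labeled_gaze)           # compare with the labeled line of sight direction
    loss.backward()                                        # adjust the network parameter values
    optimizer.step()
    return loss.item()
```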
  • how the eye features are obtained from the facial features, the left-eye features, the right-eye features, and the attention network is similar to the process of determining eye features in the gaze direction determination method introduced above, and will not be repeated here; likewise, how the initial gaze direction of the target sample object is predicted from the facial features, how the fused feature is determined from the facial features and the eye features, and how the line-of-sight residual information of the target sample object is determined from the fused feature are similar to the corresponding processes described above and will not be repeated here.
  • the face sample image and the eye sample image of the target sample object in the sample image can be obtained.
  • the facial features of the target sample object are extracted based on the facial sample image, and the facial features of the target sample object can predict the initial line of sight direction of the target sample object.
  • the information characterizing the difference between the actual gaze direction of the target sample object and the initial line of sight direction, that is, the line-of-sight residual information, can be predicted from the feature obtained by fusing the facial features and the eye features of the target sample object.
  • the initial line-of-sight direction predicted only based on the facial features of the target sample object is adjusted by the information that characterizes the difference, that is, the line-of-sight direction that is closer to the marked line-of-sight direction of the target sample object can be obtained.
  • by adjusting the network parameter values of the neural network based on the obtained line of sight direction and the labeled line of sight direction, a neural network with higher accuracy can be obtained, and with this more accurate neural network the line of sight direction of the target object can be accurately predicted.
  • the writing order of the steps does not imply a strict execution order or constitute any limitation on the implementation process; the specific execution order of each step should be determined by its function and internal logic.
  • the embodiments of the present disclosure also provide a line-of-sight direction determination device corresponding to the above line-of-sight direction determination method. Since the principle by which the device solves the problem is similar to that of the method described above, the implementation of the device can refer to the implementation of the method, and repeated descriptions are omitted.
  • the line-of-sight direction determining device 1000 includes: an image acquisition module 1001, a feature extraction module 1002, a line-of-sight prediction module 1003, and a line-of-sight correction module 1004 .
  • the image acquisition module 1001 is used to acquire facial images and eye images of the target object.
  • the feature extraction module 1002 is used for extracting the facial features of the target object from the facial image, and for determining the eye features of the target object according to the facial features of the target object and the eye image.
  • the line-of-sight prediction module 1003 is used to predict the initial line-of-sight direction of the target object based on facial features, and to predict the residual line-of-sight information based on the fusion feature after the facial feature and the eye feature are fused.
  • the line-of-sight correction module 1004 is used to correct the initial line-of-sight direction based on the line-of-sight residual information to obtain the line-of-sight direction of the target object.
  • the eye image includes a left eye image and a right eye image
  • the feature extraction module 1002, when used to determine the eye features of the target object according to the facial features of the target object and the eye image, performs the following operations: extracting left-eye features from the left-eye image; extracting right-eye features from the right-eye image; determining, according to the facial features, the left-eye features, and the right-eye features, a first weight corresponding to the left-eye features and a second weight corresponding to the right-eye features; and performing a weighted summation of the left-eye features and the right-eye features based on the first weight and the second weight to obtain the eye features.
  • the feature extraction module 1002, when used to determine the first weight corresponding to the left-eye features and the second weight corresponding to the right-eye features according to the facial features, the left-eye features, and the right-eye features, performs the following operations: determining the first score of the left-eye features based on the facial features and the left-eye features, and determining the second score of the right-eye features based on the facial features and the right-eye features; and determining the first weight and the second weight based on the first score and the second score.
  • When the line-of-sight prediction module 1003 is used to predict the initial line-of-sight direction of the target object based on the facial features, it performs the following operations: determining the weight of each feature point in the facial features, adjusting the facial features based on the weights of the feature points, and determining the initial line-of-sight direction of the target object according to the adjusted facial features.
  • The line-of-sight prediction module 1003 is configured to determine the fusion feature based on the facial features and the eye features in the following manner: determining an intermediate feature according to the adjusted facial features, the eye features, and the weight of each feature point in the adjusted facial features; and performing a weighted summation of the intermediate feature and the adjusted facial features, based on the intermediate feature, the adjusted facial features, and the weights respectively corresponding to the intermediate feature and the adjusted facial features, to obtain the fusion feature.
  • The line-of-sight prediction module 1003 determines the weight of each feature point in the adjusted facial features in the following manner: determining the weight of each feature point in the adjusted facial features according to the eye features and the adjusted facial features.
  • The line-of-sight prediction module 1003 determines the weights respectively corresponding to the intermediate feature and the adjusted facial features in the following manner: determining the weights respectively corresponding to the intermediate feature and the adjusted facial features according to the eye features and the adjusted facial features.
  • the device 1000 for determining the line of sight direction further includes a neural network training module 1005.
  • The neural network training module 1005 is used to train the neural network for determining the line-of-sight direction of the target object; the neural network is trained using sample images that contain the annotated line-of-sight directions of target sample objects.
  • The neural network training module 1005 trains the neural network in the following manner: obtaining a facial sample image and an eye sample image of a target sample object in a sample image; extracting facial features of the target sample object from the facial sample image; determining eye features of the target sample object based on the facial features of the target sample object and the eye sample image; predicting an initial line-of-sight direction of the target sample object based on the facial features of the target sample object, and predicting line-of-sight residual information of the target sample object based on the fusion feature obtained by fusing the facial features and the eye features of the target sample object; correcting the initial line-of-sight direction of the target sample object based on the line-of-sight residual information of the target sample object to obtain the line-of-sight direction of the target sample object; and adjusting the network parameter values of the neural network based on the obtained line-of-sight direction of the target sample object and the annotated line-of-sight direction of the target sample object.
  • an embodiment of the present disclosure also provides an electronic device.
  • FIG. 11 is a schematic structural diagram of an electronic device 1100 provided by an embodiment of the present disclosure, including a processor 1101, a storage medium 1102 and a bus 1103. The storage medium 1102 is used to store execution instructions and includes a memory 11021 and an external memory 11022; the memory 11021, also called internal memory, temporarily stores the computation data of the processor 1101 and the data exchanged with the external memory 11022 such as a hard disk, and the processor 1101 exchanges data with the external memory 11022 through the memory 11021. When the electronic device 1100 runs, the processor 1101 and the storage medium 1102 communicate through the bus 1103, and the machine-readable instructions, when executed by the processor 1101, cause the following processing to be performed:
  • acquiring a facial image and an eye image of the target object; extracting facial features of the target object from the facial image; determining eye features of the target object according to the facial features and the eye image; predicting an initial line-of-sight direction of the target object based on the facial features, and predicting line-of-sight residual information based on the fusion feature obtained by fusing the facial features and the eye features; and correcting the initial line-of-sight direction based on the line-of-sight residual information to obtain the line-of-sight direction of the target object.
  • Embodiments of the present disclosure also provide a computer-readable storage medium having a computer program stored thereon; when the computer program is run by a processor, the steps of the method for determining the line-of-sight direction described in the above method embodiments are executed.
  • the storage medium may be a volatile or non-volatile computer readable storage medium.
  • The computer program product of the method for determining the line-of-sight direction provided by the embodiments of the present disclosure includes a computer-readable storage medium storing program code, and the instructions included in the program code can be used to execute the steps of the method for determining the line-of-sight direction described in the above method embodiments; for details, refer to the above method embodiments, which will not be repeated here.
  • the embodiments of the present disclosure also provide a computer program, which, when executed by a processor, implements any one of the methods in the foregoing embodiments.
  • The computer program product can be implemented by hardware, software, or a combination thereof. In an optional embodiment, the computer program product is embodied as a computer storage medium; in another optional embodiment, the computer program product is embodied as a software product, such as a software development kit (SDK).
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or they may be distributed on multiple network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution in this embodiment.
  • the functional units in the various embodiments of the present disclosure may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
  • If the function is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a non-volatile computer-readable storage medium executable by a processor.
  • The technical solution of the present disclosure, in essence, or the part that contributes to the prior art, or a part of the technical solution, can be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of the present disclosure.
  • The aforementioned storage media include: a USB flash drive, a mobile hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or other media that can store program code.


Abstract

A method, apparatus, electronic device and storage medium for determining a line-of-sight direction. The method comprises: acquiring a facial image and an eye image of a target object (S101); extracting facial features of the target object from the facial image (S102); determining eye features of the target object according to the facial features and the eye image (S103); predicting an initial line-of-sight direction of the target object based on the facial features, and predicting line-of-sight residual information based on a fusion feature obtained by fusing the facial features and the eye features (S104); and correcting the initial line-of-sight direction based on the line-of-sight residual information to obtain the line-of-sight direction of the target object (S105).

Description

视线方向确定方法、装置、电子设备及存储介质
相关申请的交叉引用
本公开要求于2019年12月30日提交的、申请号为201911403648.2的中国专利申请的优先权,该申请的全文以引用的方式并入本文中。
技术领域
本公开涉及图像处理技术领域,具体而言,涉及一种视线方向确定方法、装置、电子设备及存储介质。
背景技术
目前,视线追踪是计算机视觉中的一个重要领域,视线追踪的主要目的在于预测用户的视线方向,由于用户的视线方向往往和用户的个人意图相关,这使得视线追踪技术在用户的意图理解中有着重要的作用,因此如何准确地确定用户的视线方向就变得尤为重要。
发明内容
本公开实施例至少提供一种视线方向确定方案。
第一方面,本公开实施例提供了一种视线方向确定方法,包括:获取目标对象的面部图像和眼部图像;在所述面部图像中提取所述目标对象的面部特征;根据所述目标对象的面部特征和所述眼部图像确定所述目标对象的眼部特征;基于所述面部特征预测所述目标对象的初始视线方向,以及,基于由所述面部特征和所述眼部特征融合后的融合特征,预测得到视线残差信息;基于所述视线残差信息对所述初始视线方向进行修正,得到所述目标对象的视线方向。
本公开实施例提供的视线方向确定方法,可以基于面部图像提取到目标对象的面部特征,该面部特征能够预测目标对象的初始视线方向,以及基于面部特征和眼部图像确定目标对象的眼部特征。然后,可以通过由面部特征和眼部特征融合后的融合特征来预测表征目标对象的实际视线方向与初始视线方向之间的差异的信息,即视线残差信息。然后再通过表征该差异的信息调整仅仅根据面部特征预测的初始视线方向,即能够得到更接近实际视线方向的视线方向。可见本公开实施例提出的视线确定方法能够预测得到更加准确的视线方向。
在一种可能的实施方式中,所述眼部图像包括左眼图像和右眼图像,所述根据所述目标对象的面部特征和所述眼部图像确定所述目标对象的眼部特征,包括:在所述左眼图像中提取左眼特征;在所述右眼图像中提取右眼特征;根据所述面部特征、所述左眼特征和所述右眼特征,确定所述左眼特征对应的第一权重和所述右眼特征对应的第二权重;基于所述第一权重以及所述第二权重,对所述左眼特征和所述右眼特征进行加权求和,得到所述眼部特征。
本公开实施例通过将面部特征与左眼特征进行结合,以及将面部特征与右眼图像进 行结合,分别确定出左眼图像和右眼图像在确定视线方向时的不同贡献,从而确定出准确度较高的眼部特征,进而便于提高预测视线残差信息的准确度。
在一种可能的实施方式中,所述根据所述面部特征、所述左眼特征和所述右眼特征,确定所述左眼特征对应的第一权重和所述右眼特征对应的第二权重,包括:根据所述面部特征和所述左眼特征确定所述左眼特征的第一分值,以及,根据所述面部特征和所述右眼特征确定所述右眼特征的第二分值;基于所述第一分值和第二分值,确定所述第一权重和第二权重。
在一种可能的实施方式中,所述基于所述面部特征预测所述目标对象的初始视线方向,包括:确定所述面部特征中各个特征点的权重,并基于所述面部特征中各个特征点的权重,对所述面部特征进行调整;根据调整后的面部特征确定所述目标对象的初始视线方向。
这里提出对面部特征中各个特征点的权重进行调整,可以使得对初始视线方向影响较大的特征点的权重大于对初始视线方向影响较小的特征点的权重,这样就可以基于调整后的面部特征得到较为准确的初始视线方向。
在一种可能的实施方式中,按照以下方式基于所述面部特征和所述眼部特征,确定所述融合特征,包括:根据所述调整后的面部特征、所述眼部特征、以及调整后的面部特征中各个特征点的权重确定中间特征;基于所述中间特征、所述调整后的面部特征,以及所述中间特征和所述调整后的面部特征分别对应的权重,对所述中间特征和所述调整后的面部特征进行加权求和,得到所述融合特征。
在一种可能的实施方式中,按照以下方式确定调整后的面部特征中各个特征点的权重:根据所述眼部特征和所述调整后的面部特征确定调整后的面部特征中各个特征点的权重。
在一种可能的实施方式中,按照以下方式确定所述中间特征和所述调整后的面部特征分别对应的权重:根据所述眼部特征和所述调整后的面部特征确定所述中间特征和所述调整后的面部特征分别对应的权重。
以上通过基于眼部特征和调整后的面部特征,确定由面部特征和眼部特征融合后的融合特征,该融合特征综合考虑了面部图像和眼部图像,从而便于通过该融合特征确定目标对象的实际视线方向与初始视线方向之间的差异,进而可以根据该差异对初始视线方向进行修正,得到较为准确的视线方向。
在一种可能的实施方式中,所述视线方向确定方法由神经网络实现,所述神经网络利用包含了目标样本对象的标注视线方向的样本图像训练得到。
在一种可能的实施方式中,所述神经网络采用以下方式训练得到:获取样本图像中的目标样本对象的面部样本图像和眼部样本图像;在所述面部样本图像中提取所述目标样本对象的面部特征;根据所述目标样本对象的面部特征和所述眼部样本图像确定所述目标样本对象的眼部特征;基于所述目标样本对象的面部特征预测所述目标样本对象的初始视线方向,以及,基于由所述目标样本对象的面部特征和所述目标样本对象的眼部特征融合后的融合特征,预测得到所述目标样本对象的视线残差信息;基于所述目标样本对象的视线残差信息对所述目标样本对象的初始视线方向进行修正,得到所述目标样本对象的视线方向;基于得到的所述目标样本对象的视线方向和所述目标样本对象的标 注视线方向,对所述神经网络的网络参数值进行调整。
根据本公开实施例提供的神经网络的训练方法,可以获取样本图像中的目标样本对象的面部样本图像和眼部样本图像。然后,基于面部样本图像提取到目标样本对象的面部特征,该目标样本对象的面部特征能够预测目标样本对象的初始视线方向。基于目标样本对象的面部特征和眼部图像确定目标样本对象的眼部特征。可以通过由目标样本对象的面部特征和眼部特征融合后的融合特征来预测表征目标样本对象的实际视线方向与初始视线方向之间的差异的信息,即视线残差信息。然后,再通过表征该差异的信息调整仅仅根据目标样本对象的面部特征预测的初始视线方向,即能够得到更接近目标样本对象的标注视线方向的视线方向。基于得到的目标样本对象的视线方向以及标注视线方向对神经网络的网络参数值进行调整,即可以得到准确度较高的神经网络。基于该准确度较高的神经网络即可以对目标对象的视线方向进行准确预测。
第二方面,本公开实施例提供了一种视线方向确定装置,包括:图像获取模块,用于获取目标对象的面部图像和眼部图像;特征提取模块,用于在所述面部图像中提取所述目标对象的面部特征;以及用于根据所述目标对象的面部特征和所述眼部特征确定所述目标对象的眼部特征;视线预测模块,用于基于所述面部特征预测所述目标对象的初始视线方向,以及,基于由所述面部特征和所述眼部特征融合后的融合特征,预测得到视线残差信息;视线修正模块,用于基于所述视线残差信息对所述初始视线方向进行修正,得到所述目标对象的视线方向。
第三方面,本公开实施例提供了一种电子设备,包括:处理器、存储介质和总线,所述存储介质存储有所述处理器可执行的机器可读指令,所述处理器与所述存储介质之间通过总线通信,所述机器可读指令促使所述处理器执行如第一方面所述的方法。
第四方面,本公开实施例提供了一种计算机可读存储介质,该计算机可读存储介质上存储有计算机程序,该计算机程序促使处理器执行如第一方面所述的方法。
为使本公开的上述目的、特征和优点能更明显易懂,根据下文实施例,并配合所附附图,作详细说明如下。
附图说明
为了更清楚地说明本公开实施例的技术方案,下面将对实施例中所需要使用的附图作简单地介绍,此处的附图被并入说明书中并构成本说明书中的一部分,这些附图示出了符合本公开的实施例,并与说明书一起用于说明本公开的技术方案。应当理解,以下附图仅示出了本公开的某些实施例,因此不应被看作是对范围的限定,对于本领域普通技术人员来讲,在不付出创造性劳动的前提下,还可以根据这些附图获得其他相关的附图。
图1示出了本公开实施例所提供的一种视线方向确定方法的流程图。
图2示出了本公开实施例所提供的一种视线方向确定原理示意图。
图3示出了本公开实施例所提供的一种眼部特征的确定方法流程图。
图4示出了本公开实施例所提供的一种左眼特征以及右眼特征各自对应的权重的确定过程示意图。
图5示出了本公开实施例所提供的一种初始视线方向的确定方法流程图。
图6示出了本公开实施例所提供的一种融合特征的确定方法流程图。
图7示出了本公开实施例所提供的一种确定初始视线方向以及确定视线残差信息的过程示意图。
图8示出了本公开实施例所提供的一种确定视线方向的过程示意图。
图9示出了本公开实施例所提供的一种神经网络训练方法的流程图。
图10示出了本公开实施例所提供的一种视线方向确定装置的结构示意图。
图11示出了本公开实施例所提供的一种电子设备的结构示意图。
具体实施方式
为使本公开实施例的目的、技术方案和优点更加清楚,下面将结合本公开实施例中附图,对本公开实施例中的技术方案进行清楚、完整地描述,显然,所描述的实施例仅仅是本公开一部分实施例,而不是全部的实施例。通常在此处附图中描述和示出的本公开实施例的组件可以以各种不同的配置来布置和设计。因此,以下对在附图中提供的本公开的实施例的详细描述并非旨在限制要求保护的本公开的范围,而是仅仅表示本公开的选定实施例。基于本公开的实施例,本领域技术人员在没有做出创造性劳动的前提下所获得的所有其他实施例,都属于本公开保护的范围。
视线追踪是计算机视觉中的一个重要领域,视线追踪的主要目的在于预测用户的视线方向,经研究发现,基于外观的视线预测模型往往使用深度学习模型实现,比如可以基于面部图像中的脸部特征或者眼部图像中的眼部特征来预测视线方向。
相关技术中,只是将面部图像和眼部图像当作不同的独立特征源,并未实质考虑面部图像和眼部图像之间的内在关系。实际上,眼部图像提供了专注于凝视的细粒度特征,而面部图像则提供了具有更广泛信息的粗粒度特征,二者的结合,能够更加准确地预测视线方向。
基于上述研究,本公开提供了一种视线方向确定方法。可以基于面部图像提取到目标对象的面部特征,该面部特征能够用于预测目标对象的初始视线方向。在基于面部特征和眼部图像确定目标对象的眼部特征后,可以通过由面部特征和眼部特征融合后的特征(也称为“融合特征”)来预测表征目标对象的实际视线方向与初始视线方向之间的差异的信息,即视线残差信息。然后再通过表征该差异的信息调整仅仅根据面部特征预测的初始视线方向,即能够得到更接近实际视线方向的视线方向。可见本公开实施例提出的视线确定方法能够预测得到更加准确的视线方向。
下面将结合本公开中附图,对本公开中的技术方案进行清楚、完整地描述,显然,所描述的实施例仅仅是本公开一部分实施例,而不是全部的实施例。通常在此处附图中描述和示出的本公开的组件可以以各种不同的配置来布置和设计。因此,以下对在附图中提供的本公开的实施例的详细描述并非旨在限制要求保护的本公开的范围,而是仅仅表示本公开的选定实施例。基于本公开的实施例,本领域技术人员在没有做出创造性劳动的前提下所获得的所有其他实施例,都属于本公开保护的范围。
应注意到:相似的标号和字母在下面的附图中表示类似项,因此,一旦某一项在一 个附图中被定义,则在随后的附图中不需要对其进行进一步定义和解释。
为便于对本实施例进行理解,首先对本公开实施例所公开的一种视线方向确定方法进行详细介绍。本公开实施例所提供的视线方向确定方法的执行主体一般为具有一定计算能力的计算机设备。该计算机设备例如包括:终端设备或服务器或其它处理设备,终端设备可以为用户设备(User Equipment,UE)、移动设备、用户终端、终端等。在一些可能的实现方式中,该视线方向确定方法可以通过处理器调用存储器中存储的计算机可读指令的方式来实现。
下面以执行主体为终端设备为例对本公开实施例提供的视线方向确定方法加以说明。
参见图1所示，为本公开实施例提供的视线方向确定方法的流程图，方法包括步骤S101~S105。
S101,获取目标对象的面部图像和眼部图像。
这里目标对象可以为待预测视线方向的用户,可以通过摄像机或者照相机等能够采集图像的设备对目标对象的脸部进行拍照,得到目标对象的面部图像,然后在该面部图像中截取目标对象的眼部图像。
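As a concrete illustration of this step, the sketch below crops an eye patch out of a face image once eye-corner pixel coordinates are available (for example from a face-landmark detector); the function name, the padding factor, and the NumPy array layout are illustrative assumptions rather than the patent's actual implementation.

```python
import numpy as np

def crop_eye(face_img: np.ndarray, corner_a, corner_b, pad: float = 0.4) -> np.ndarray:
    """Cut a square eye patch out of a face image (H x W x C array), given the
    pixel coordinates of the two eye corners; `pad` enlarges the box so that
    the whole eye region is kept."""
    (x1, y1), (x2, y2) = corner_a, corner_b
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0          # eye centre
    half = (0.5 + pad) * abs(x2 - x1)                   # half side length of the crop
    top, left = int(max(cy - half, 0)), int(max(cx - half, 0))
    bottom, right = int(cy + half), int(cx + half)
    return face_img[top:bottom, left:right]

# left_eye = crop_eye(face_img, (210, 260), (260, 258))  # hypothetical landmark coordinates
```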
S102,在面部图像中提取目标对象的面部特征。
S103,根据目标对象的面部特征和眼部图像确定目标对象的眼部特征。
这里,目标对象的面部特征,是指具有更广泛信息的粗粒度特征,通过这些面部特征,能够预测目标对象的初始视线方向;目标对象的眼部特征,是指能够表征专注于凝视的细粒度特征。眼部特征和面部特征的结合,能够较为准确地预测视线方向。
具体地,这里面部特征和眼部特征可以通过预先训练的进行视线方向预测的神经网络中用于进行特征提取的子神经网络来进行提取,将在后文实施例中进行详细介绍,在此不进行赘述。
S104,基于面部特征预测目标对象的初始视线方向,以及,基于由面部特征和眼部特征融合后的融合特征,预测得到视线残差信息。
其中,视线残差信息用于表征目标对象的实际视线方向与初始视线方向之间的差异。
这里的初始视线方向即可以基于面部特征来确定,具体地,可以基于预先训练的进行视线方向预测的神经网络中用于确定初始视线方向的子神经网络进行预测,具体预测方式将在后文结合实施例进行详细阐述。
这里的视线残差信息可以通过预先训练的进行视线方向预测的神经网络中用于确定视线残差信息的子神经网络进行预测,具体预测方式将在后文进行详细阐述。
这里通过由面部特征和眼部特征融合后的特征来预测表征目标对象的实际视线方向与初始视线方向之间的差异的信息,然后再通过该表征差异的信息调整仅仅根据面部特征预测的初始视线方向,即能够得到更接近实际视线方向的视线方向。即本公开提出将目标对象的面部图像和眼部图像进行结合,通过将眼部图像提供的专注于凝视的细粒度特征,以及面部图像提供的对应更广泛信息的粗粒度特征进行结合,来预测得到表征目标对象的实际视线方向与初始视线方向之间的差异的视线残差信息,从而利用该视线残差信息调整基于面部特征预测的目标对象的初始视线方向,进而得到更加准确的目标对 象的视线方向。
具体可以将面部特征和眼部特征输入预先训练的进行视线方向预测的神经网络中用于确定视线残差信息的子神经网络中,得到由面部特征和眼部特征融合后的特征,该方式将在后文结合具体实施例进行阐述。
S105,基于视线残差信息对初始视线方向进行修正,得到目标对象的视线方向。
具体地,这里的视线残差信息可以包括基于由面部特征和眼部特征融合后的特征确定的表征实际视线方向与初始视线方向之间的差异的信息,然后即可以基于该视线残差信息对初始视线方向进行调整,比如可以将该视线残差信息与基于面部特征预测的初始视线方向求和,得到更接近目标对象的实际视线方向的视线方向。
比如，如图2所示，表示一种用于确定视线方向的原理示意图，其中g_b表示基于面部特征预测的目标对象的初始视线方向，g_r表示视线残差信息，则最终得到的目标对象的视线方向g通过以下公式(1)表示：
g = g_b + g_r    (1)
视线残差信息在表示实际视线方向与初始视线方向的差异时,可以通过矢量进行表示。这里可以引入世界坐标系来表示初始视线方向和视线残差信息。在将视线残差信息和初始视线方向进行求和时,可以将初始视线方向和视线残差信息在世界坐标系中相同方向轴中的值对应相加,即得到目标对象的视线方向。
比如,若目标对象的实际视线方向为东偏南30度,而经过目标对象的面部特征预测得到的目标对象的初始视线方向为东偏南25度,经过由面部特征和眼部特征融合后的特征预测得到的视线残差信息为偏差4度,则通过视线残差信息对初始视线方向进行修正,则可以得到预测的目标对象的视线方向为东偏南29度,东偏南29度相比东偏南25度显然更接近目标对象的实际视线方向。
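A minimal sketch of this correction step, treating the initial gaze and the residual as vectors that are added axis by axis per formula (1); the (yaw, pitch) angle representation and the use of NumPy are assumptions used only for illustration.

```python
import numpy as np

def correct_gaze(initial_gaze: np.ndarray, residual: np.ndarray) -> np.ndarray:
    """Formula (1): g = g_b + g_r, summed component-wise in a shared coordinate system."""
    return initial_gaze + residual

# Worked example matching the text: 25 degrees plus a 4-degree residual gives 29 degrees.
g_b = np.array([25.0, 0.0])    # initial gaze (yaw, pitch) in degrees
g_r = np.array([4.0, 0.0])     # predicted gaze residual
print(correct_gaze(g_b, g_r))  # -> [29.  0.]
```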
以上步骤S101~S105提出的视线方向确定方法,可以基于面部图像中提取到目标对象的面部特征,该面部特征能够预测目标对象的初始视线方向;在基于面部特征和眼部图像确定目标对象的眼部特征后,可以通过由面部特征和眼部特征融合后的特征来预测表征目标对象的实际视线方向与初始视线方向之间的差异的信息,即视线残差信息;然后再通过表征该差异的信息调整仅仅根据面部特征预测的初始视线方向,即能够得到更接近实际视线方向的视线方向。可见本公开实施例提出的视线确定方法能够预测得到更加准确的视线方向。
下面将结合具体的实施例来对上述S101~S105的过程进行分析。
针对上述在面部图像中提取目标对象的面部特征的步骤(S102),可以通过对面部图像进行图像分析,在面部图像中提取能够表征面部特征的位置点坐标,作为目标对象的面部特征。比如提取面颊、眼角等位置点坐标。或者,可以基于神经网络来提取目标对象的面部特征。
比如,目标对象的面部特征可以基于预先训练的进行视线方向预测的神经网络中进行特征提取的子神经网络来进行提取,具体包括:
将面部图像输入第一特征提取网络,经第一特征提取网络处理得到面部特征,第一特征提取网络为预先训练的进行视线方向预测的神经网络中,用于进行面部特征提取的 子神经网络。
这里的第一特征提取网络在预先训练的进行视线方向预测的神经网络中用于提取面部图像中的面部特征,即将面部图像输入该第一特征提取网络后,即可以提取到用于预测初始视线方向的面部特征。
这里通过预先训练的进行视线方向预测的神经网络中的第一特征提取网络来提取面部图像中的面部特征。由于在进行视线方向预测的神经网络中,该第一特征提取网络专用于提取面部图像的面部特征,从而能够提取更加准确的面部特征,进而便于提高初始视线方向的准确度。
上述眼部图像包括左眼图像和右眼图像。通常,左眼图像示出的左眼的外观和右眼图像示出的右眼的外观,会随着环境的变化或者头部姿态的变化发生变化。这样,基于左眼图像提取的左眼特征和基于右眼图像提取的右眼特征在确定视线方向时,可能会存在不同的贡献。考虑到此,根据目标对象的面部特征和眼部图像确定目标对象的眼部特征,如图3所示,可以包括以下步骤S301~S304。
S301,在左眼图像中提取左眼特征。
这里在左眼图像中提取左眼特征,可以是在左眼图像中提取能够表征眼部特征的位置点坐标,作为目标对象的左眼特征,比如瞳孔、眼角等位置点坐标,或者,可以基于预先训练的神经网络来提取左眼特征。
S302,在右眼图像中提取右眼特征。
同样,这里在右眼图像中提取右眼特征,可以是在右眼图像中提取能够表征眼部特征的位置点坐标,作为目标对象的右眼特征,比如瞳孔、眼角等位置点坐标,或者,可以基于预先训练的神经网络来提取右眼特征。
本公开以通过预先训练的神经网络来提取左眼特征和右眼特征为例进行说明:
将左眼图像输入第二特征提取网络,经第二特征提取网络处理得到左眼特征,以及将右眼图像输入第三特征提取网络,经第三特征提取网络处理得到右眼特征。
其中,第二特征提取网络为预先训练的进行视线方向预测的神经网络中,用于进行左眼特征提取的子神经网络。第三特征提取网络为预先训练的进行视线方向预测的神经网络中,用于进行右眼特征提取的子神经网络。
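The three feature-extraction sub-networks are only required to map an image to a feature vector; the sketch below is a stand-in backbone written in PyTorch, with layer sizes chosen arbitrarily for illustration (the patent does not specify the architecture).

```python
import torch.nn as nn

def make_feature_cnn(out_dim: int) -> nn.Sequential:
    """Illustrative feature-extraction CNN: image in, `out_dim`-dimensional feature out."""
    return nn.Sequential(
        nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
        nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        nn.Linear(64, out_dim),
    )
```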
S303,根据面部特征、左眼特征和右眼特征,确定左眼特征对应的第一权重和右眼特征对应的第二权重。
这里左眼特征对应的第一权重表示左眼图像在确定视线方向时的贡献,右眼特征对应的第二权重表示右眼图像在确定视线方向时的贡献。在确定该第一权重和第二权重时,可以通过预先训练的神经网络来确定。比如可以将面部特征、左眼特征和右眼特征输入注意力网络,经注意力网络处理得到左眼特征对应的第一权重和右眼特征对应的第二权重。
其中,注意力网络为预先训练的进行视线方向预测的神经网络中,用于确定左眼特征和右眼特征各自的评价值的子神经网络。该评价值表征了左眼特征/右眼特征在眼部特征中的重要度。
将面部特征、左眼特征和右眼特征输入该注意力网络后,能够得到左眼特征和右眼特征各自的评价值。
具体地,在将面部特征、左眼特征和右眼特征输入注意力网络,经注意力网络处理得到第一权重和第二权重时,包括:
(1)根据面部特征和左眼特征确定左眼特征的第一分值,以及,根据面部特征和右眼特征确定右眼特征的第二分值;
(2)基于第一分值和第二分值,确定第一权重和第二权重。
同样,这里根据面部特征和左眼特征确定左眼特征的第一分值以及根据面部特征和右眼特征确定右眼特征的第二分值时,可以通过预先训练的神经网络来确定,比如通过注意力网络来确定,即:
将面部特征和左眼特征输入注意力网络,经注意力网络处理得到左眼特征的第一分值,以及,将面部特征和右眼特征输入注意力网络,经注意力网络处理得到右眼特征的第二分值。
这里基于第一分值和第二分值确定第一权重和第二权重也可以是通过注意力网络处理得到的。第一分值能够表示左眼图像在确定视线方向时的贡献,经过提前测试得知,该第一分值与脸部特征和左眼特征均相关。第一分值与面部特征相关,是指预测初始视线方向的面部特征能够影响左眼特征的分值。另外第一分值与左眼特征相关,即左眼形状、外观等也会影响左眼特征的分值。具体地,注意力网络在接收到面部特征和左眼特征后,能够按照以下公式(2)确定第一分值:
m_l = W_1^T tanh(W_2^T f_f + W_3^T f_l)    (2)
这里的m_l即表示左眼特征对应的第一分值；W_1、W_2和W_3为注意力网络中的网络参数，即注意力网络在训练完毕后得到的网络参数；f_f表示面部特征；f_l表示左眼特征。
对应地,第二分值能够表示右眼图像在确定视线方向时的贡献,经过提前测试得知,该第二分值与脸部特征和右眼特征均相关。第二分值与面部特征相关,是指预测初始视线方向的面部特征能够影响右眼特征的分值。另外第二分值与右眼特征相关,即右眼形状、外观等也会影响右眼特征的分值。具体地,注意力网络在接收到面部特征和右眼特征后,能够按照以下公式(3)确定第二分值:
m_r = W_1^T tanh(W_2^T f_f + W_3^T f_r)    (3)
这里的m_r即表示右眼特征对应的第二分值；W_1、W_2和W_3为注意力网络中的网络参数，即注意力网络在训练完毕后得到的网络参数；f_f表示面部特征；f_r表示右眼特征。
在得到左眼特征对应的第一分值,以及右眼特征对应的第二分值后,即可以进一步根据该第一分值和第二分值确定出左眼特征对应的第一权重和右眼特征对应的第二权重,具体可以根据以下公式(4)确定出第一权重和第二权重:
[w_l, w_r] = softmax([m_l, m_r])    (4)
其中，通过引入归一化指数函数softmax，即可以得到左眼特征对应的第一权重w_l，以及右眼特征对应的第二权重w_r。
以上确定左眼特征以及右眼特征各自对应的权重的过程示意图可以如图4所示，图4中可以分别通过深度神经网络CNN得到左眼特征f_l和右眼特征f_r，然后进一步将脸部特征f_f、左眼特征f_l和右眼特征f_r输入注意力网络，得到左眼特征对应的第一权重w_l，以及右眼特征对应的第二权重w_r。
S304,基于第一权重以及第二权重,对左眼特征和右眼特征进行加权求和,得到眼部特征。
这里可以是通过注意力网络执行基于第一权重和第二权重，对左眼特征和右眼特征进行加权求和，得到眼部特征的步骤。在得到左眼特征对应的第一权重以及右眼特征对应的第二权重后，即可以对左眼特征和右眼特征进行加权求和，具体可以根据以下公式(5)得到眼部特征f_e：
f_e = w_l*f_l + w_r*f_r    (5)
本公开实施例通过将面部特征与左眼特征进行结合,以及将面部特征与右眼图像进行结合,分别确定出左眼图像和右眼图像在确定视线方向时的不同贡献,从而确定出准确度较高的眼部特征,进而便于提高视线残差信息的准确度。
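A PyTorch sketch of the attention step described by formulas (2)-(5): the face feature scores each eye feature, the scores are normalised with softmax, and the eye feature is the weighted sum of the two eye features; the class name, hidden size, and feature dimensions are assumptions for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EyeAttention(nn.Module):
    """Formulas (2)-(5): score each eye feature against the face feature, then fuse."""
    def __init__(self, face_dim: int, eye_dim: int, hidden_dim: int = 128):
        super().__init__()
        self.W1 = nn.Linear(hidden_dim, 1, bias=False)
        self.W2 = nn.Linear(face_dim, hidden_dim, bias=False)
        self.W3 = nn.Linear(eye_dim, hidden_dim, bias=False)

    def score(self, f_face, f_eye):
        # m = W1^T tanh(W2^T f_f + W3^T f_eye), formulas (2) and (3)
        return self.W1(torch.tanh(self.W2(f_face) + self.W3(f_eye)))

    def forward(self, f_face, f_left, f_right):
        m = torch.cat([self.score(f_face, f_left), self.score(f_face, f_right)], dim=-1)
        w = F.softmax(m, dim=-1)                       # formula (4): [w_l, w_r]
        w_l, w_r = w[..., :1], w[..., 1:]
        f_eye = w_l * f_left + w_r * f_right           # formula (5): weighted sum
        return f_eye, (w_l, w_r)
```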
在按照上述方式得到面部特征和眼部特征后,即可以进一步基于面部特征和眼部特征来确定目标对象的视线方向。确定目标对象的视线方向可以包括两个部分,第一部分是基于面部特征预测目标对象的初始视线方向的过程,第二部分是基于由面部特征和眼部特征融合后的特征预测目标对象的视线残差信息的过程。
其中,在基于面部特征预测目标对象的初始视线方向时,如图5所示,可以包括以下步骤S501~S502:
S501,确定面部特征中各个特征点的权重,并基于面部特征中各个特征点的权重,对面部特征进行调整;
S502,根据调整后的面部特征确定目标对象的初始视线方向。
面部特征可以包括多个特征点,特征点可以理解为由面部图像提取的不同的粗粒度特征,这些粗粒度特征可以包括例如面部图像中的区域特征、位置点特征等。面部特征中的每个特征点在预测初始视线方向时,所起的重要程度不同。这里可以先基于各个特征点的权重对面部特征进行调整,然后再基于调整后的面部特征确定目标对象的初始视线方向。
这里在对面部特征进行调整时,可以通过预先训练的神经网络进行调整,将在后文进行详细介绍。
在得到调整后的面部特征后,可以如图6所示的方式基于面部特征和眼部特征,确定融合后的特征,具体包括以下步骤S601~S602。
S601,根据调整后的面部特征、眼部特征、以及调整后的面部特征中各个特征点的权重确定中间特征。
S602,基于中间特征、调整后的面部特征,以及中间特征和调整后的面部特征分别对应的权重,对中间特征和调整后的面部特征进行加权求和,得到融合后的特征。
这里的中间特征可以通过预先训练的神经网络进行确定,通过该中间特征和调整后的面部特征,即可以确定由面部特征和眼部特征融合后的特征。
以上对面部特征进行调整,得到调整后的面部特征的过程,以及得到由面部特征和眼部特征融合后的特征的过程,均可以通过预先训练的神经网络进行处理,比如通过门网络进行处理。而根据调整后的面部特征确定目标对象的初始视线方向同样也可以基于预先训练的神经网络来确定,将在后文进行详细介绍。
本公开实施例中,可以根据以下步骤确定调整后的面部特征中各个特征点的权重:
根据眼部特征和调整后的面部特征确定调整后的面部特征中各个特征点的权重。
这里确定权重的方式,可以是按照预先设置好的权重分配方式进行确定,也可以通过预先训练的神经网络进行确定,将在后文进行详细介绍。
本公开实施例中,根据以下步骤确定中间特征和调整后的面部特征分别对应的权重:
根据眼部特征和调整后的面部特征确定中间特征和调整后的面部特征分别对应的权重。
同样,这里确定权重的方式,可以是按照预先设置好的权重分配方式进行确定,也可以通过预先训练的神经网络进行确定,将在后文进行详细介绍。
在介绍初始视线方向的确定过程,以及由面部特征和眼部特征融合后的特征的确定过程之前,先介绍门网络。首先,这里先引入门网络的概念。门网络在本公开实施例提出的预先训练的进行视线方向预测的神经网络中,起到对接收的特征进行过滤筛选的作用,即将重要特征的权重调大,将非重要特征的权重调小,具体将在下文结合实施例进行具体阐释,这里先结合公式(7)~公式(10)介绍门网络的特征变化方式:
z_t = σ(W_z·[h_{t-1}, f])    (7)
r_t = σ(W_r·[h_{t-1}, f])    (8)
h̃_t = ReLU(W_h·[r_t*h_{t-1}, f])    (9)
h_t = (1-z_t)*h_{t-1} + z_t*h̃_t    (10)
其中，W_z、W_r、W_h为门网络中的网络参数；σ表示sigmoid运算；ReLU表示激活函数；f表示接收的相应特征（在对面部特征进行处理时，这里的f即表示面部特征；在对眼部特征进行处理时，这里的f即表示眼部特征）；z_t和r_t表示经过sigmoid运算后得到的权重；h̃_t表示输入门网络中的特征进行融合后得到的中间特征；h_t表示中间特征与相邻门网络输出的特征的加权和，设置h_0等于0。
本公开实施例需要确定基于面部特征预测目标对象的初始视线方向，以及基于由面部特征和眼部特征融合后的特征预测目标对象的视线残差信息。本公开实施例可以引入两个门网络来分别完成特征的过滤筛选，分别可以记为第一门网络和第二门网络，第一门网络输出的特征即可以记为h_1，第二门网络输出的特征即可以记为h_2，下面将结合具体实施例进行阐述。
首先介绍基于面部特征预测目标对象的初始视线方向的过程，这里可以先通过第一门网络对面部特征进行权重调整，得到调整后的面部特征h_1，然后再基于调整后的面部特征h_1预测初始视线方向，具体包括以下步骤。
(1)将面部特征输入第一门网络,经第一门网络进行处理得到面部特征中各个特征点的权重。
这里的面部特征可以包括多个特征点。这里的特征点可以理解为面部图像中不同的粗粒度特征,这些粗粒度特征可以包括面部图像中的区域特征、位置点特征等。面部特征中的每个特征点在预测初始视线方向时,所起的重要程度不同。这里通过第一门网络来确定面部特征中各个特征点的权重。这里的第一门网络为预先训练的进行视线方向预测的神经网络中用于调整面部特征的子神经网络。
这里第一门网络得到面部特征中各个特征点的权重可以通过上述公式(7)和公式(8)得到。因为第一门网络最终输出的为h_1，则在引用公式(7)和公式(8)时，令t=1，f=f_f，此时得到z_1=σ(W_z·[h_0, f_f])，以及r_1=σ(W_r·[h_0, f_f])，然后可以基于得到的z_1和r_1来进一步对面部特征进行调整，这里的h_0等于0。
(2)基于面部特征中各个特征点的权重,对面部特征进行调整。
这里也可以是通过第一门网络基于面部特征中各个特征点的权重，对面部特征进行调整的。将上述得到的面部特征中各个特征点的权重r_1代入上述公式(9)，并令t=1，f=f_f，则得到面部特征的中间特征：
h̃_1 = ReLU(W_h·[r_1*h_0, f_f])
以及将上述得到的面部特征的中间特征的权重z_1，以及相邻门网络输出的特征h_0对应的权重1-z_1代入上述公式(10)，并令t=1，f=f_f，得到调整后的面部特征：
h_1 = (1-z_1)*h_0 + z_1*h̃_1
这里h_0等于0。
(3)将调整后的面部特征输入第一多层感知机(multilayer perceptron，MLP)，经第一多层感知机进行处理得到目标对象的初始视线方向。
这里第一多层感知机为预先训练的进行视线方向预测的神经网络中,用于预测初始视线方向的子神经网络。
调整后的面部特征即记为h_1，然后将调整后的面部特征输入第一多层感知机MLP，即可以得到目标对象的初始视线方向。
这里提出第一门网络对面部特征中各个特征点的权重进行调整,使得对初始视线方向影响较大的特征点的权重大于对初始视线方向影响较小的特征点的权重,这样再将调整后的面部特征输入预测初始视线方向的第一多层感知机,得到较为准确的初始视线方向。
下面,再介绍基于面部特征和眼部特征,确定融合后的特征的过程,具体包括:
将眼部特征和调整后的面部特征输入第二门网络,经第二门网络进行处理得到融合后的特征;第二门网络为预先训练的进行视线方向预测的神经网络中,用于预测融合后的特征的子神经网络。
这里调整后的面部特征即为上述第一门网络输出的h_1，然后再将该h_1和眼部特征f_e输入第二门网络，即可以得到第二门网络输出的融合后的特征h_2。
具体地,在将眼部特征和调整后的面部特征输入第二门网络,经第二门网络进行处理得到融合后的特征时,包括以下两个步骤:
(1)通过第二门网络,对调整后的面部特征、眼部特征、以及调整后的面部特征中各个特征点的权重进行处理,得到中间特征;
(2)基于中间特征、调整后的面部特征,以及中间特征和调整后的面部特征分别对应的权重,通过第二门网络对中间特征和调整后的面部特征进行加权求和,得到融合后的特征。
针对上述第(1)步,这里调整后的面部特征中各个特征点的权重可以根据以下方式确定:
通过第二门网络对眼部特征和调整后的面部特征进行第一处理得到调整后的面部特征中各个特征点的权重,这里第二门网络进行第一处理时使用训练好的权重分配函数中的第一网络参数信息。
这里通过第二门网络对调整后的面部特征h_1和眼部特征f_e进行第一处理得到调整后的面部特征中各个特征点的权重时，可以引用上述公式(8)，这里令t=2，f=f_e，即可以得到面部特征中各个特征点的权重r_2=σ(W_r·[h_1, f_e])，该公式即对应上述提到的第二门网络对眼部特征和调整后的面部特征进行的第一处理，其中权重分配函数为σ表示的sigmoid运算；第一网络参数信息即为W_r。
在得到面部特征中各个特征点的权重后，即可以引入公式(9)对调整后的面部特征、眼部特征、以及调整后的面部特征中各个特征点的权重进行处理，得到中间特征，即得到中间特征为：
h̃_2 = ReLU(W_h·[r_2*h_1, f_e])
针对上述第(2)步,中间特征和调整后的面部特征分别对应的权重可以根据以下方式确定:
对眼部特征和调整后的面部特征进行第二处理得到中间特征和调整后的面部特征分别对应的权重,这里第二门网络进行第二处理时使用训练好的权重分配函数中的第二网络参数信息。
对调整后的面部特征h_1和眼部特征f_e进行第二处理得到中间特征和调整后的面部特征h_1分别对应的权重，可以引用上述公式(7)，且令t=2，f=f_e，即可以得到中间特征对应的权重z_2=σ(W_z·[h_1, f_e])，该公式即对应上述提到的第二门网络对眼部特征和调整后的面部特征进行的第二处理，其中权重分配函数为σ表示的sigmoid运算；第二网络参数信息即为W_z，这样得到中间特征对应的权重为z_2，调整后的面部特征h_1对应的权重为1-z_2。
然后在得到中间特征和调整后的面部特征分别对应的权重后，进一步地，通过引入上述公式(10)，同样令t=2，f=f_e，则基于中间特征、调整后的面部特征，以及中间特征和调整后的面部特征分别对应的权重，通过第二门网络对中间特征和调整后的面部特征进行加权求和，得到由面部特征和眼部特征融合后的特征：
h_2 = (1-z_2)*h_1 + z_2*h̃_2
在得到由面部特征和眼部特征融合后的特征后,可以按照以下方式基于由面部特征和眼部特征融合后的特征,预测得到视线残差信息:
将融合后的特征输入第二多层感知机MLP,经第二多层感知机进行处理得到视线残差信息。其中,第二多层感知机为预先训练的进行视线方向预测的神经网络中,用于预测视线残差信息的子神经网络。
这里融合后的特征即记为h_2，然后将融合后的特征输入第二多层感知机MLP，即可以得到目标对象的视线残差信息。
以上确定初始视线方向以及确定视线残差信息的过程示意图可以通过图7所示的两个子神经网络确定。其中，第一子神经网络包括第一门网络(Gate function)和第一多层感知机MLP，第二子神经网络包括第二门网络(Gate function)和第二多层感知机MLP。面部特征(Face feature)输入第一门网络后，可以经过第一门网络调整，得到调整后的面部特征h_1。该调整后的面部特征h_1可以一方面输入第一多层感知机得到初始视线方向g_b，另一方面与眼部特征(Eye feature)一起输入第二门网络后，经过第二门网络处理，得到由面部特征和眼部特征融合后的特征h_2。然后将融合后的特征h_2输入第二多层感知机得到视线残差信息g_r。
以上通过将眼部特征和经第一门网络调整后的面部特征输入第二门网络进行处理,确定由面部特征和眼部特征融合后的特征。该融合后的特征是综合考虑了面部图像和眼部图像后得到的特征,从而便于通过该融合后的特征确定目标对象的实际视线方向与初始视线方向之间的差异。在根据该差异对初始视线方向进行修正后,即可以得到较为准确的视线方向。
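Building on the GateFunction sketch above, the following illustrates the two sub-networks of Fig. 7: the first gate plus MLP predicts the initial gaze g_b from the face feature, the second gate plus MLP predicts the residual g_r from the fused feature, and formula (1) combines them; the class name, hidden sizes, and two-dimensional (yaw, pitch) output are assumptions for illustration.

```python
import torch.nn as nn

class GazeHead(nn.Module):
    """Fig. 7 sketch: gate 1 + MLP 1 give g_b, gate 2 + MLP 2 give g_r."""
    def __init__(self, face_dim: int, eye_dim: int, state_dim: int = 256):
        super().__init__()
        self.state_dim = state_dim
        self.gate1 = GateFunction(face_dim, state_dim)
        self.gate2 = GateFunction(eye_dim, state_dim)
        self.mlp1 = nn.Sequential(nn.Linear(state_dim, 128), nn.ReLU(), nn.Linear(128, 2))
        self.mlp2 = nn.Sequential(nn.Linear(state_dim, 128), nn.ReLU(), nn.Linear(128, 2))

    def forward(self, f_face, f_eye):
        h0 = f_face.new_zeros(f_face.size(0), self.state_dim)  # h_0 = 0
        h1 = self.gate1(h0, f_face)        # adjusted face feature
        g_b = self.mlp1(h1)                # initial gaze direction
        h2 = self.gate2(h1, f_eye)         # feature fused from face and eye features
        g_r = self.mlp2(h2)                # gaze residual information
        return g_b + g_r, g_b, g_r         # formula (1): corrected gaze
```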
综上所有实施例,可以结合如图8所示的示意图对本公开实施例提供的视线方向确定方法进行说明。
在得到面部图像后，在该面部图像中截取眼部图像，该眼部图像包括左眼图像和右眼图像。将面部图像输入第一特征提取网络(CNN)，得到面部特征f_f。然后，将该面部特征输入上述提到的第一子神经网络(第一子神经网络包括第一门网络和第一多层感知机)进行处理，即可以得到初始视线方向g_b。另外，将截取的眼部图像中的左眼图像输入第二特征提取网络得到左眼特征f_l，将右眼图像输入第三特征提取网络得到右眼特征f_r。然后，将左眼特征、右眼特征和面部特征输入注意力网络，即可以得到眼部特征f_e。然后，将眼部特征和经过预测初始视线方向的子神经网络得到的调整后的面部特征h_1输入第二子神经网络(第二子神经网络包括第二门网络和第二多层感知机)进行处理，即可以得到视线残差信息g_r。
进一步地，在得到初始视线方向g_b和视线残差信息g_r后，即可以基于视线残差信息g_r对初始视线方向进行修正，得到目标对象的视线方向。
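Reusing make_feature_cnn, EyeAttention and GazeHead from the sketches above, one possible end-to-end assembly of the Fig. 8 pipeline looks as follows; all dimensions are illustrative assumptions rather than the patent's architecture.

```python
import torch.nn as nn

class GazeNet(nn.Module):
    """Fig. 8 pipeline sketch: three CNN extractors, eye attention, gaze head."""
    def __init__(self):
        super().__init__()
        self.face_cnn = make_feature_cnn(512)    # first feature-extraction network
        self.left_cnn = make_feature_cnn(256)    # second feature-extraction network
        self.right_cnn = make_feature_cnn(256)   # third feature-extraction network
        self.attn = EyeAttention(face_dim=512, eye_dim=256)
        self.head = GazeHead(face_dim=512, eye_dim=256)

    def forward(self, face_img, left_img, right_img):
        f_f = self.face_cnn(face_img)
        f_e, _ = self.attn(f_f, self.left_cnn(left_img), self.right_cnn(right_img))
        g, g_b, g_r = self.head(f_f, f_e)
        return g
```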
综上,本申请实施例提出的视线方向确定方法可以由神经网络实现,神经网络利用包含了目标样本对象的标注视线方向的样本图像训练得到。该标注视线方向即为目标样本对象的实际视线方向。
具体地,如图9所示,本申请实施例提出的用于确定视线方向的神经网络可以采用以下步骤训练得到,包括步骤S901~S906。
S901,获取样本图像中的目标样本对象的面部样本图像和眼部样本图像。
这里目标样本对象可以包括分别位于不同空间位置点的多个目标对象。使得这多个目标对象均看向同一观测方向,并获取这些目标样本对象的面部图像作为面部样本图像。然后在面部样本图像中截取眼部样本图像。或者,这里的目标样本对象可以包括一个目标对象。使得该目标样本图像分别看向不同观测方向,并获取该目标样本对象对应各观测方向的面部图像作为面部样本图像,然后在面部样本图像中截取眼部样本图像。
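One possible way to organise such samples for training is a small dataset wrapper; the record layout (image paths plus a yaw/pitch label) and the use of torchvision's read_image are assumptions made only for illustration.

```python
import torch
from torch.utils.data import Dataset
from torchvision.io import read_image

class GazeSampleDataset(Dataset):
    """Pairs face/left-eye/right-eye sample images with the annotated gaze direction."""
    def __init__(self, records):
        # each record is assumed to be {"face": path, "left": path, "right": path, "gaze": [yaw, pitch]}
        self.records = records

    def __len__(self):
        return len(self.records)

    def __getitem__(self, i):
        r = self.records[i]
        face, left, right = (read_image(r[k]).float() / 255.0 for k in ("face", "left", "right"))
        return face, left, right, torch.tensor(r["gaze"], dtype=torch.float32)
```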
S902,在面部样本图像中提取目标样本对象的面部特征。
这里在面部样本图像提取目标样本对象的面部特征,与上文介绍的提取目标对象的面部特征的方式相似,在此不再赘述。
S903,根据目标样本对象的面部特征和眼部样本图像确定目标样本对象的眼部特征。
这里确定目标样本对象的眼部特征,与上文介绍的确定目标对象的眼部特征的方式相似,在此不再赘述。
S904,基于目标样本对象的面部特征预测目标样本对象的初始视线方向,以及, 基于由目标样本对象的面部特征和目标样本对象的眼部特征融合后的特征,预测得到目标样本对象的视线残差信息。
同样,这里确定目标样本对象的初始视线方向以及视线残差信息与上文确定目标对象的初始视线方向以及视线残差信息的方式相似,在此不再赘述。
S905,基于目标样本对象的视线残差信息对目标样本对象的初始视线方向进行修正,得到目标样本对象的视线方向。
这里对目标样本对象的初始视线方向进行修正的方式与上文介绍的基于目标对象的视线残差信息对目标对象的初始视线方向进行修正的方式相似,在此不再赘述。
S906,基于得到的目标样本对象的视线方向和目标样本对象的标注视线方向,对神经网络的网络参数值进行调整。
这里可以引入损失函数确定预测视线方向对应的损失值。经过多次训练后,通过损失值来对神经网络的网络参数值进行调整。比如使得损失值小于设定阈值时,即可以停止训练,从而得到神经网络的网络参数值。
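A minimal training step consistent with this description, using a mean-squared-error loss between the predicted and annotated gaze directions (the text only says a loss function is introduced, so MSE and the optimiser choice are assumptions).

```python
import torch.nn.functional as F

def train_step(model, optimizer, face, left, right, gaze_label):
    """One optimisation step: predict the gaze, compare with the annotation, update weights."""
    optimizer.zero_grad()
    pred = model(face, left, right)
    loss = F.mse_loss(pred, gaze_label)
    loss.backward()
    optimizer.step()
    return loss.item()

# model = GazeNet(); optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
# Training stops once the loss falls below a chosen threshold, as described above.
```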
另外,针对如何基于面部特征、左眼特征、右眼特征和注意力网络得到眼部特征,与上文介绍的视线方向确定方法中确定眼部特征的详细过程相似,在此不再赘述;针对如何基于面部特征预测目标样本对象的初始视线方向,以及如何基于面部特征和眼部特征确定融合后的特征,以及如何基于融合后的特征确定目标样本对象的视线残差信息同样与上文介绍的视线方向确定方法中确定融合后的特征和确定视线残差信息的过程相似,在此不再赘述。
根据本公开实施例提供的神经网络的训练方法,可以获取样本图像中的目标样本对象的面部样本图像和眼部样本图像。然后,基于面部样本图像提取到目标样本对象的面部特征,该目标样本对象的面部特征能够预测目标样本对象的初始视线方向。在基于目标样本对象的面部特征和眼部样本图像确定目标样本对象的眼部特征后,可以通过由目标样本对象的面部特征和眼部特征融合后的特征来预测表征目标样本对象的实际视线方向与初始视线方向之间的差异的信息,即视线残差信息。然后,再通过该表征差异的信息调整仅仅根据目标样本对象的面部特征预测的初始视线方向,即能够得到更接近目标样本对象的标注视线方向的视线方向。基于得到的视线方向和标注视线方向对神经网络的网络参数值进行调整,即可以得到准确度较高神经网络。基于该准确度较高的神经网络即可以对目标对象的视线方向进行准确预测。
本领域技术人员可以理解,在具体实施方式的上述方法中,各步骤的撰写顺序并不意味着严格的执行顺序而对实施过程构成任何限定,各步骤的具体执行顺序应当以其功能和可能的内在逻辑确定。
基于同一技术构思,本公开实施例中还提供了与上述视线方向确定方法对应的视线方向确定装置,由于本公开实施例中的视线方向确定装置解决问题的原理与本公开实施例上述视线方向确定方法相似,因此装置的实施可以参见方法的实施,重复之处不再赘述。
参照图10所示,为本公开实施例提供的一种视线方向确定装置1000的示意图,该视线方向确定装置1000包括:图像获取模块1001、特征提取模块1002、视线预测模块1003、视线修正模块1004。
其中,图像获取模块1001,用于获取目标对象的面部图像和眼部图像。
特征提取模块1002,用于在面部图像中提取目标对象的面部特征,以及用于根据目标对象的面部特征和眼部特征确定目标对象的眼部特征。
视线预测模块1003,用于基于面部特征预测目标对象的初始视线方向,以及,基于由面部特征和眼部特征融合后的融合特征,预测得到视线残差信息。
视线修正模块1004,用于基于视线残差信息对初始视线方向进行修正,得到目标对象的视线方向。
在一种可能的实施方式中,眼部图像包括左眼图像和右眼图像,特征提取模块1002在用于根据目标对象的面部特征和眼部特征确定目标对象的眼部特征时,执行以下操作:在左眼图像中提取左眼特征;在右眼图像中提取右眼特征;根据面部特征、左眼特征和右眼特征,确定左眼特征对应的第一权重和右眼特征对应的第二权重;基于第一权重以及第二权重,对左眼特征和右眼特征进行加权求和,得到眼部特征。
在一种可能的实施方式中,特征提取模块1002在用于根据面部特征、左眼特征和右眼特征,确定左眼特征对应的第一权重和右眼特征对应的第二权重时,执行以下操作:根据面部特征和左眼特征确定左眼特征的第一分值,以及,根据面部特征和右眼特征确定右眼特征的第二分值;基于第一分值和第二分值,确定第一权重和第二权重。
在一种可能的实施方式中,视线预测模块1003在用于基于面部特征预测目标对象的初始视线方向时,执行以下操作:确定面部特征中各个特征点的权重,并基于面部特征中各个特征点的权重,对面部特征进行调整;根据调整后的面部特征确定目标对象的初始视线方向。
在一种可能的实施方式中,视线预测模块1003用于按照以下方式,基于面部特征和眼部特征,确定融合后的特征:根据调整后的面部特征、眼部特征、以及调整后的面部特征中各个特征点的权重确定中间特征;基于中间特征、调整后的面部特征,以及中间特征和调整后的面部特征分别对应的权重,对中间特征和调整后的面部特征进行加权求和,得到融合特征。
在一种可能的实施方式中,视线预测模块1003根据以下方式确定调整后的面部特征中各个特征点的权重:根据眼部特征和调整后的面部特征确定调整后的面部特征中各个特征点的权重。
在一种可能的实施方式中,视线预测模块1003按照以下方式确定中间特征和调整后的面部特征分别对应的权重:根据眼部特征和调整后的面部特征确定中间特征和调整后的面部特征分别对应的权重。
在一种可能的实施方式中,视线方向确定装置1000还包括神经网络训练模块1005,神经网络训练模块1005用于:训练用于确定目标对象的视线方向的神经网络,神经网络利用了包含目标样本对象的标注视线方向的样本图像训练得到。
在一种可能的实施方式中,神经网络训练模块1005按照以下方式训练神经网络:获取样本图像中的目标样本对象的面部样本图像和眼部样本图像;在面部样本图像中提取目标样本对象的面部特征;根据目标样本对象的面部特征和眼部样本图像确定目标样本对象的眼部特征;基于目标样本对象的面部特征预测目标样本对象的初始视线方向, 以及,基于由目标样本对象的面部特征和目标样本对象的眼部特征融合后的融合特征,预测得到目标样本对象的视线残差信息;基于目标样本对象的视线残差信息对目标样本对象的初始视线方向进行修正,得到目标样本对象的视线方向;基于得到的目标样本对象的视线方向和目标样本对象的标注视线方向,对神经网络的网络参数值进行调整。
关于装置中的各模块的处理流程、以及各模块之间的交互流程的描述可以参照上述方法实施例中的相关说明,这里不再详述。
对应于图1中的视线方向确定方法,本公开实施例还提供了一种电子设备,如图11所示,为本公开实施例提供的电子设备1100的结构示意图,包括:处理器1101、存储介质1102和总线1103;存储介质1102用于存储执行指令,包括内存11021和外部存储器11022;这里的内存11021也称内存储器,用于暂时存放处理器1101的运算数据,以及与硬盘等外部存储器11022交换的数据,处理器1101通过内存11021与外部存储器11022进行数据交换,当电子设备1100运行时,处理器1101与存储器1102之间通过总线1103通信,机器可读指令被处理器1101执行时执行如下处理:
获取目标对象的面部图像和眼部图像;在面部图像中提取目标对象的面部特征;根据目标对象的面部特征和眼部图像确定目标对象的眼部特征;基于面部特征预测目标对象的初始视线方向,以及,基于由面部特征和眼部特征融合后的融合特征,预测得到视线残差信息;基于视线残差信息对初始视线方向进行修正,得到目标对象的视线方向。
本公开实施例还提供一种计算机可读存储介质,该计算机可读存储介质上存储有计算机程序,该计算机程序被处理器运行时执行上述视线方向确定方法实施例中所述的视线方向确定方法的步骤。其中,该存储介质可以是易失性或非易失的计算机可读取存储介质。
本公开实施例所提供的视线方向确定方法的计算机程序产品,包括存储了程序代码的计算机可读存储介质,所述程序代码包括的指令可用于执行上述方法实施例中所述的视线方向确定方法的步骤,具体可参见上述方法实施例,在此不再赘述。
本公开实施例还提供一种计算机程序,该计算机程序被处理器执行时实现前述实施例的任意一种方法。该计算机程序产品可以具体通过硬件、软件或其结合的方式实现。在一个可选实施例中,所述计算机程序产品具体体现为计算机存储介质,在另一个可选实施例中,计算机程序产品具体体现为软件产品,例如软件开发包(Software Development Kit,SDK)等等。
所属领域的技术人员可以清楚地了解到,为描述的方便和简洁,上述描述的系统和装置的具体工作过程,可以参考前述方法实施例中的对应过程,在此不再赘述。在本公开所提供的几个实施例中,应该理解到,所揭露的系统、装置和方法,可以通过其它的方式实现。以上所描述的装置实施例仅仅是示意性的,例如,所述单元的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,又例如,多个单元或组件可以结合或者可以集成到另一个系统,或一些特征可以忽略,或不执行。另一点,所显示或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些通信接口,装置或单元的间接耦合或通信连接,可以是电性,机械或其它的形式。
所述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部单元来实现本实施例方 案的目的。
另外,在本公开各个实施例中的各功能单元可以集成在一个处理单元中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个单元中。
所述功能如果以软件功能单元的形式实现并作为独立的产品销售或使用时,可以存储在一个处理器可执行的非易失的计算机可读取存储介质中。基于这样的理解,本公开的技术方案本质上或者说对现有技术做出贡献的部分或者该技术方案的部分可以以软件产品的形式体现出来,该计算机软件产品存储在一个存储介质中,包括若干指令用以使得一台计算机设备(可以是个人计算机,服务器,或者网络设备等)执行本公开各个实施例所述方法的全部或部分步骤。而前述的存储介质包括:U盘、移动硬盘、只读存储器(Read-Only Memory,ROM)、随机存取存储器(Random Access Memory,RAM)、磁碟或者光盘等各种可以存储程序代码的介质。
最后应说明的是:以上所述实施例,仅为本公开的具体实施方式,用以说明本公开的技术方案,而非对其限制,本公开的保护范围并不局限于此,尽管参照前述实施例对本公开进行了详细的说明,本领域的普通技术人员应当理解:任何熟悉本技术领域的技术人员在本公开揭露的技术范围内,其依然可以对前述实施例所记载的技术方案进行修改或可轻易想到变化,或者对其中部分技术特征进行等同替换;而这些修改、变化或者替换,并不使相应技术方案的本质脱离本公开实施例技术方案的精神和范围,都应涵盖在本公开的保护范围之内。因此,本公开的保护范围应所述以权利要求的保护范围为准。

Claims (20)

  1. 一种视线方向确定方法,其特征在于,包括:
    获取目标对象的面部图像和眼部图像;
    在所述面部图像中提取所述目标对象的面部特征;
    根据所述目标对象的面部特征和所述眼部图像确定所述目标对象的眼部特征;
    基于所述面部特征预测所述目标对象的初始视线方向,以及,基于由所述面部特征和所述眼部特征融合后的融合特征,预测得到视线残差信息;
    基于所述视线残差信息对所述初始视线方向进行修正,得到所述目标对象的视线方向。
  2. 根据权利要求1所述的视线方向确定方法,其特征在于,所述眼部图像包括左眼图像和右眼图像,所述根据所述目标对象的面部特征和所述眼部图像确定所述目标对象的眼部特征,包括:
    在所述左眼图像中提取左眼特征;
    在所述右眼图像中提取右眼特征;
    根据所述面部特征、所述左眼特征和所述右眼特征,确定所述左眼特征对应的第一权重和所述右眼特征对应的第二权重;
    基于所述第一权重以及所述第二权重,对所述左眼特征和所述右眼特征进行加权求和,得到所述眼部特征。
  3. 根据权利要求2所述的视线方向确定方法,其特征在于,所述根据所述面部特征、所述左眼特征和所述右眼特征,确定所述左眼特征对应的第一权重和所述右眼特征对应的第二权重,包括:
    根据所述面部特征和所述左眼特征确定所述左眼特征的第一分值,以及,根据所述面部特征和所述右眼特征确定所述右眼特征的第二分值;
    基于所述第一分值和第二分值,确定所述第一权重和第二权重。
  4. 根据权利要求1至3任一所述的视线方向确定方法,其特征在于,所述基于所述面部特征预测所述目标对象的初始视线方向,包括:
    确定所述面部特征中各个特征点的权重,并基于所述面部特征中各个特征点的权重,对所述面部特征进行调整;
    根据调整后的面部特征确定所述目标对象的初始视线方向。
  5. 根据权利要求4所述的视线方向确定方法,其特征在于,按照以下方式基于所述面部特征和所述眼部特征,确定所述融合特征:
    根据所述调整后的面部特征、所述眼部特征、以及所述调整后的面部特征中各个特征点的权重确定中间特征;
    基于所述中间特征、所述调整后的面部特征,以及所述中间特征和所述调整后的面部特征分别对应的权重,对所述中间特征和所述调整后的面部特征进行加权求和,得到所述融合特征。
  6. 根据权利要求5所述的视线方向确定方法,其特征在于,按照以下方式确定所述调整后的面部特征中各个特征点的权重:
    根据所述眼部特征和所述调整后的面部特征确定所述调整后的面部特征中各个特征点的权重。
  7. 根据权利要求5所述的视线方向确定方法,其特征在于,按照以下方式确定所述中间特征和所述调整后的面部特征分别对应的权重:
    根据所述眼部特征和所述调整后的面部特征确定所述中间特征和所述调整后的面部特征分别对应的权重。
  8. 根据权利要求1至7任一所述的视线方向确定方法,其特征在于,
    所述视线方向确定方法由神经网络实现,所述神经网络利用包含了目标样本对象的标注视线方向的样本图像训练得到。
  9. 根据权利要求8所述的方法,其特征在于,所述神经网络采用以下方式训练得到:
    获取样本图像中的目标样本对象的面部样本图像和眼部样本图像;
    在所述面部样本图像中提取所述目标样本对象的面部特征;
    根据所述目标样本对象的面部特征和所述眼部样本图像确定所述目标样本对象的眼部特征;
    基于所述目标样本对象的面部特征预测所述目标样本对象的初始视线方向,以及,基于由所述目标样本对象的面部特征和所述目标样本对象的眼部特征融合后的融合特征,预测得到所述目标样本对象的视线残差信息;
    基于所述目标样本对象的视线残差信息对所述目标样本对象的初始视线方向进行修正,得到所述目标样本对象的视线方向;
    基于得到的所述目标样本对象的视线方向和所述目标样本对象的标注视线方向,对所述神经网络的网络参数值进行调整。
  10. 一种视线方向确定装置,其特征在于,包括:
    图像获取模块,用于获取目标对象的面部图像和眼部图像;
    特征提取模块,用于在所述面部图像中提取所述目标对象的面部特征;以及用于根据所述目标对象的面部特征和所述眼部特征确定所述目标对象的眼部特征;
    视线预测模块,用于基于所述面部特征预测所述目标对象的初始视线方向,以及,基于由所述面部特征和所述眼部特征融合后的融合特征,预测得到视线残差信息;
    视线修正模块,用于基于所述视线残差信息对所述初始视线方向进行修正,得到所述目标对象的视线方向。
  11. 根据权利要求10所述的视线方向确定装置,其特征在于,所述眼部图像包括左眼图像和右眼图像,所述特征提取模块在用于根据所述目标对象的面部特征和所述眼部特征确定所述目标对象的眼部特征时,执行以下操作:
    在所述左眼图像中提取左眼特征;
    在所述右眼图像中提取右眼特征;
    根据所述面部特征、所述左眼特征和所述右眼特征,确定所述左眼特征对应的第一权重和所述右眼特征对应的第二权重;
    基于所述第一权重以及所述第二权重,对所述左眼特征和所述右眼特征进行加权求和,得到所述眼部特征。
  12. 根据权利要求11所述的视线方向确定装置,其特征在于,所述特征提取模块在用于根据所述面部特征、所述左眼特征和所述右眼特征,确定所述左眼特征对应的第一权重和所述右眼特征对应的第二权重时,执行以下操作:
    根据所述面部特征和所述左眼特征确定所述左眼特征的第一分值,以及,根据所述面部特征和所述右眼特征确定所述右眼特征的第二分值;
    基于所述第一分值和第二分值,确定所述第一权重和第二权重。
  13. 根据权利要求10至12任一所述的视线方向确定装置,其特征在于,所述视线 预测模块在用于基于所述面部特征预测所述目标对象的初始视线方向时,执行以下操作:
    确定所述面部特征中各个特征点的权重,并基于所述面部特征中各个特征点的权重,对所述面部特征进行调整;
    根据调整后的面部特征确定所述目标对象的初始视线方向。
  14. 根据权利要求13所述的视线方向确定装置,其特征在于,所述视线预测模块按照以下方式基于所述面部特征和所述眼部特征,确定所述融合特征:
    根据所述调整后的面部特征、所述眼部特征、以及所述调整后的面部特征中各个特征点的权重确定中间特征;
    基于所述中间特征、所述调整后的面部特征,以及所述中间特征和所述调整后的面部特征分别对应的权重,对所述中间特征和所述调整后的面部特征进行加权求和,得到所述融合特征。
  15. 根据权利要求14所述的视线方向确定装置,其特征在于,所述视线预测模块根据以下方式确定所述调整后的面部特征中各个特征点的权重:
    根据所述眼部特征和所述调整后的面部特征确定所述调整后的面部特征中各个特征点的权重。
  16. 根据权利要求14所述的视线方向确定装置,其特征在于,所述视线预测模块按照以下方式确定所述中间特征和所述调整后的面部特征分别对应的权重:
    根据所述眼部特征和所述调整后的面部特征确定所述中间特征和所述调整后的面部特征分别对应的权重。
  17. 根据权利要求10至16任一所述的视线方向确定装置,其特征在于,所述视线方向确定装置还包括神经网络训练模块,所述神经网络训练模块用于:
    训练用于确定所述目标对象的视线方向的神经网络,所述神经网络利用了包含目标样本对象的标注视线方向的样本图像训练得到。
  18. 根据权利要求17所述的视线方向确定装置,其特征在于,所述神经网络训练模块按照以下方式训练所述神经网络:
    获取样本图像中的目标样本对象的面部样本图像和眼部样本图像;
    在所述面部样本图像中提取所述目标样本对象的面部特征;
    根据所述目标样本对象的面部特征和所述眼部样本图像确定所述目标样本对象的眼部特征;
    基于所述目标样本对象的面部特征预测所述目标样本对象的初始视线方向,以及,基于由所述目标样本对象的面部特征和所述目标样本对象的眼部特征融合后的融合特征,预测得到所述目标样本对象的视线残差信息;
    基于所述目标样本对象的视线残差信息对所述目标样本对象的初始视线方向进行修正,得到所述目标样本对象的视线方向;
    基于得到的所述目标样本对象的视线方向和所述目标样本对象的标注视线方向,对所述神经网络的网络参数值进行调整。
  19. 一种电子设备,其特征在于,包括:处理器、非暂时性存储介质和总线,所述存储介质存储有所述处理器可执行的机器可读指令,所述处理器与所述存储介质之间通过总线通信,所述机器可读指令促使所述处理器执行如权利要求1至9任一所述的视线方向确定方法。
  20. 一种计算机可读存储介质,其特征在于,该计算机可读存储介质上存储有计算机程序,该计算机程序促使处理器执行如权利要求1至9任一所述的视线方向确定方法。
PCT/CN2020/134049 2019-12-30 2020-12-04 视线方向确定方法、装置、电子设备及存储介质 WO2021135827A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
JP2022524710A JP7309116B2 (ja) 2019-12-30 2020-12-04 視線方向特定方法、装置、電子機器及び記憶媒体
KR1020217034841A KR20210140763A (ko) 2019-12-30 2020-12-04 시선 방향 결정 방법, 장치, 전자 장치 및 저장 매체

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201911403648.2 2019-12-30
CN201911403648.2A CN111178278B (zh) 2019-12-30 2019-12-30 视线方向确定方法、装置、电子设备及存储介质

Publications (1)

Publication Number Publication Date
WO2021135827A1 true WO2021135827A1 (zh) 2021-07-08

Family

ID=70646509

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/134049 WO2021135827A1 (zh) 2019-12-30 2020-12-04 视线方向确定方法、装置、电子设备及存储介质

Country Status (4)

Country Link
JP (1) JP7309116B2 (zh)
KR (1) KR20210140763A (zh)
CN (1) CN111178278B (zh)
WO (1) WO2021135827A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20220222969A1 (en) * 2021-01-13 2022-07-14 Beihang University Method for determining the direction of gaze based on adversarial optimization

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2996269A1 (en) * 2014-09-09 2016-03-16 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Audio splicing concept
CN111178278B (zh) * 2019-12-30 2022-04-08 上海商汤临港智能科技有限公司 视线方向确定方法、装置、电子设备及存储介质
CN113807119B (zh) * 2020-05-29 2024-04-02 魔门塔(苏州)科技有限公司 一种人员注视位置检测方法及装置
CN113743172B (zh) * 2020-05-29 2024-04-16 魔门塔(苏州)科技有限公司 一种人员注视位置检测方法及装置
CN112183200B (zh) * 2020-08-25 2023-10-17 中电海康集团有限公司 一种基于视频图像的眼动追踪方法和系统
CN112749655B (zh) * 2021-01-05 2024-08-02 风变科技(深圳)有限公司 视线追踪方法、装置、计算机设备和存储介质
CN113361441B (zh) * 2021-06-18 2022-09-06 山东大学 基于头部姿态和空间注意力的视线区域估计方法及系统
CN113378777A (zh) * 2021-06-30 2021-09-10 沈阳康慧类脑智能协同创新中心有限公司 基于单目摄像机的视线检测的方法和装置
CN113705550B (zh) * 2021-10-29 2022-02-18 北京世纪好未来教育科技有限公司 一种训练方法、视线检测方法、装置和电子设备
CN114360042A (zh) * 2022-01-07 2022-04-15 桂林电子科技大学 一种人眼注视方向预测方法及系统
CN116052264B (zh) * 2023-03-31 2023-07-04 广州视景医疗软件有限公司 一种基于非线性偏差校准的视线估计方法及装置

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101419664A (zh) * 2007-10-25 2009-04-29 株式会社日立制作所 视线方向计测方法和视线方向计测装置
CN101489467A (zh) * 2006-07-14 2009-07-22 松下电器产业株式会社 视线方向检测装置和视线方向检测方法
CN102547123A (zh) * 2012-01-05 2012-07-04 天津师范大学 基于人脸识别技术的自适应视线跟踪系统及其跟踪方法
CN103246044A (zh) * 2012-02-09 2013-08-14 联想(北京)有限公司 一种自动对焦方法、系统及具有该系统的照相机和摄像机
CN107193383A (zh) * 2017-06-13 2017-09-22 华南师范大学 一种基于人脸朝向约束的二级视线追踪方法
CN109508679A (zh) * 2018-11-19 2019-03-22 广东工业大学 实现眼球三维视线跟踪的方法、装置、设备及存储介质
CN111178278A (zh) * 2019-12-30 2020-05-19 上海商汤临港智能科技有限公司 视线方向确定方法、装置、电子设备及存储介质

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9563805B2 (en) * 2014-09-02 2017-02-07 Hong Kong Baptist University Method and apparatus for eye gaze tracking
JP6946831B2 (ja) 2017-08-01 2021-10-13 オムロン株式会社 人物の視線方向を推定するための情報処理装置及び推定方法、並びに学習装置及び学習方法
US20190212815A1 (en) * 2018-01-10 2019-07-11 Samsung Electronics Co., Ltd. Method and apparatus to determine trigger intent of user
CN108615014B (zh) * 2018-04-27 2022-06-21 京东方科技集团股份有限公司 一种眼睛状态的检测方法、装置、设备和介质
CN110503068A (zh) 2019-08-28 2019-11-26 Oppo广东移动通信有限公司 视线估计方法、终端及存储介质


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20220222969A1 (en) * 2021-01-13 2022-07-14 Beihang University Method for determining the direction of gaze based on adversarial optimization
US12106606B2 (en) * 2021-01-13 2024-10-01 Beihang University Method for determining the direction of gaze based on adversarial optimization

Also Published As

Publication number Publication date
CN111178278B (zh) 2022-04-08
JP7309116B2 (ja) 2023-07-18
CN111178278A (zh) 2020-05-19
KR20210140763A (ko) 2021-11-23
JP2022553776A (ja) 2022-12-26

Similar Documents

Publication Publication Date Title
WO2021135827A1 (zh) 视线方向确定方法、装置、电子设备及存储介质
Chen et al. Fsrnet: End-to-end learning face super-resolution with facial priors
JP4829141B2 (ja) 視線検出装置及びその方法
EP4307233A1 (en) Data processing method and apparatus, and electronic device and computer-readable storage medium
JP2019028843A (ja) 人物の視線方向を推定するための情報処理装置及び推定方法、並びに学習装置及び学習方法
US9508004B2 (en) Eye gaze detection apparatus, computer-readable recording medium storing eye gaze detection program and eye gaze detection method
US10037624B2 (en) Calibrating object shape
US10254831B2 (en) System and method for detecting a gaze of a viewer
CN111723707B (zh) 一种基于视觉显著性的注视点估计方法及装置
CN106133649A (zh) 使用双目注视约束的眼睛凝视跟踪
WO2022257487A1 (zh) 深度估计模型的训练方法, 装置, 电子设备及存储介质
CN114503162A (zh) 具有不确定性的特征点位置估计的图像处理系统和方法
US9747695B2 (en) System and method of tracking an object
EP3506149A1 (en) Method, system and computer program product for eye gaze direction estimation
WO2021217937A1 (zh) 姿态识别模型的训练方法及设备、姿态识别方法及其设备
CN114333046A (zh) 舞蹈动作评分方法、装置、设备和存储介质
CN115841602A (zh) 基于多视角的三维姿态估计数据集的构建方法及装置
WO2015176502A1 (zh) 一种图像特征的估计方法和设备
JP2022140386A (ja) 顔の姿勢を検出する装置及び方法、画像処理システム、並びに記憶媒体
CN113903210A (zh) 虚拟现实模拟驾驶方法、装置、设备和存储介质
CN116386087B (zh) 目标对象处理方法以及装置
WO2020044630A1 (ja) 検出器生成装置、モニタリング装置、検出器生成方法及び検出器生成プログラム
CN115953813B (zh) 一种表情驱动方法、装置、设备及存储介质
JP6952298B2 (ja) 視線変換装置及び視線変換方法
JP2011232845A (ja) 特徴点抽出装置および方法

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20909241

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 20217034841

Country of ref document: KR

Kind code of ref document: A

ENP Entry into the national phase

Ref document number: 2022524710

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20909241

Country of ref document: EP

Kind code of ref document: A1


32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205 DATED 30/01/2023)