WO2021135827A1 - Line-of-sight direction determination method and apparatus, electronic device, and storage medium - Google Patents
Line-of-sight direction determination method and apparatus, electronic device, and storage medium
- Publication number
- WO2021135827A1 (PCT/CN2020/134049)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- feature
- eye
- line
- facial
- features
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/168—Feature extraction; Face representation
- G06V40/171—Local features and components; Facial parts ; Occluding parts, e.g. glasses; Geometrical relationships
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/172—Classification, e.g. identification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/18—Eye characteristics, e.g. of the iris
- G06V40/193—Preprocessing; Feature extraction
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/18—Eye characteristics, e.g. of the iris
- G06V40/197—Matching; Classification
Definitions
- the present disclosure relates to the field of image processing technology, and in particular, to a method, device, electronic device, and storage medium for determining the direction of the line of sight.
- Gaze tracking is an important field in computer vision.
- The main purpose of gaze tracking is to predict the user's gaze direction. Because the gaze direction is often related to the user's personal intention, gaze tracking plays an important role in understanding that intention; accurately determining the direction of the user's line of sight therefore becomes particularly important.
- the embodiments of the present disclosure provide at least one solution for determining the line of sight direction.
- An embodiment of the present disclosure provides a method for determining a line-of-sight direction, including: acquiring a facial image and an eye image of a target object; extracting facial features of the target object from the facial image; determining eye features of the target object according to the facial features and the eye image; predicting an initial line-of-sight direction of the target object based on the facial features, and predicting line-of-sight residual information based on a fusion feature obtained by fusing the facial features and the eye features; and correcting the initial line-of-sight direction based on the line-of-sight residual information to obtain the line-of-sight direction of the target object.
- With this method, facial features extracted from the facial image can be used to predict the initial line-of-sight direction of the target object, and the eye features of the target object can be determined from the facial features and the eye image. The information representing the difference between the actual line-of-sight direction of the target object and the initial line-of-sight direction, that is, the line-of-sight residual information, can then be predicted from the fusion feature obtained by fusing the facial features and the eye features. Adjusting the initial line-of-sight direction, which is predicted from facial features alone, with this difference information yields a line-of-sight direction closer to the actual one. It can be seen that the line-of-sight determination method proposed by the embodiments of the present disclosure can predict a more accurate line-of-sight direction.
- the eye image includes a left eye image and a right eye image
- The determining the eye features of the target object according to the facial features of the target object and the eye image includes: extracting left-eye features from the left-eye image; extracting right-eye features from the right-eye image; determining, according to the facial features, the left-eye features, and the right-eye features, a first weight corresponding to the left-eye features and a second weight corresponding to the right-eye features; and performing a weighted summation of the left-eye features and the right-eye features based on the first weight and the second weight to obtain the eye features.
- In this way, the different contributions of the left-eye image and the right-eye image in determining the line-of-sight direction are determined separately, so that more accurate eye features are obtained, which in turn helps improve the accuracy of the predicted line-of-sight residual information.
- The determining, according to the facial features, the left-eye features, and the right-eye features, the first weight corresponding to the left-eye features and the second weight corresponding to the right-eye features includes: determining a first score of the left-eye features according to the facial features and the left-eye features, and determining a second score of the right-eye features according to the facial features and the right-eye features; and determining the first weight and the second weight based on the first score and the second score.
- The predicting the initial line-of-sight direction of the target object based on the facial features includes: determining a weight of each feature point in the facial features, adjusting the facial features based on the weight of each feature point, and determining the initial line-of-sight direction of the target object according to the adjusted facial features.
- The fusion feature is determined based on the facial features and the eye features in the following manner: determining an intermediate feature according to the adjusted facial features, the eye features, and the weight of each feature point in the adjusted facial features; and performing a weighted summation of the intermediate feature and the adjusted facial features, based on the intermediate feature, the adjusted facial features, and the weights respectively corresponding to the intermediate feature and the adjusted facial features, to obtain the fusion feature.
- The weight of each feature point in the adjusted facial features is determined in the following manner: the weight of each feature point in the adjusted facial features is determined according to the eye features and the adjusted facial features.
- The weights respectively corresponding to the intermediate feature and the adjusted facial features are determined in the following manner: these weights are determined according to the eye features and the adjusted facial features.
- In the above manner, the fusion feature obtained by fusing the facial features and the eye features is determined based on the eye features and the adjusted facial features.
- The fusion feature comprehensively considers the facial image and the eye image, which facilitates determining, through the fusion feature, the difference between the actual line-of-sight direction of the target object and the initial line-of-sight direction; the initial line-of-sight direction can then be corrected based on the difference to obtain a more accurate line-of-sight direction.
- the method for determining the line of sight direction is implemented by a neural network, and the neural network is obtained by training using a sample image containing the marked line of sight direction of the target sample object.
- The neural network is obtained by training in the following manner: acquiring a face sample image and an eye sample image of a target sample object in a sample image; extracting facial features of the target sample object from the face sample image; determining eye features of the target sample object based on the facial features of the target sample object and the eye sample image; predicting an initial line-of-sight direction of the target sample object based on the facial features of the target sample object, and predicting line-of-sight residual information of the target sample object based on a fusion feature obtained by fusing the facial features and the eye features of the target sample object; correcting the initial line-of-sight direction of the target sample object based on the line-of-sight residual information of the target sample object to obtain the line-of-sight direction of the target sample object; and adjusting network parameter values of the neural network based on the obtained line-of-sight direction of the target sample object and the labeled line-of-sight direction of the target sample object.
- the face sample image and the eye sample image of the target sample object in the sample image can be obtained.
- the facial features of the target sample object are extracted based on the facial sample image, and the facial features of the target sample object can predict the initial line of sight direction of the target sample object.
- the eye features of the target sample object are determined based on the facial features and eye images of the target sample object.
- the information that characterizes the difference between the actual line of sight direction of the target sample object and the initial line of sight direction, that is, the line of sight residual information, can be predicted by the fusion feature after the facial feature and the eye feature of the target sample object are fused.
- the initial gaze direction predicted only based on the facial features of the target sample object is adjusted by the information that characterizes the difference, that is, the gaze direction that is closer to the marked gaze direction of the target sample object can be obtained.
- the network parameter values of the neural network are adjusted based on the obtained line-of-sight direction of the target sample object and the marked line-of-sight direction, that is, a neural network with higher accuracy can be obtained. Based on the neural network with higher accuracy, the sight direction of the target object can be accurately predicted.
- The embodiments of the present disclosure provide an apparatus for determining a line-of-sight direction, including: an image acquisition module for acquiring a facial image and an eye image of a target object; a feature extraction module for extracting facial features of the target object from the facial image and for determining eye features of the target object based on the facial features of the target object and the eye image; a line-of-sight prediction module for predicting an initial line-of-sight direction of the target object based on the facial features, and for predicting line-of-sight residual information based on a fusion feature obtained by fusing the facial features and the eye features; and a line-of-sight correction module for correcting the initial line-of-sight direction based on the line-of-sight residual information to obtain the line-of-sight direction of the target object.
- an embodiment of the present disclosure provides an electronic device, including: a processor, a storage medium, and a bus.
- the storage medium stores machine-readable instructions executable by the processor.
- The processor and the storage medium communicate through the bus, and the machine-readable instructions, when executed, cause the processor to execute the method according to the first aspect.
- embodiments of the present disclosure provide a computer-readable storage medium having a computer program stored on the computer-readable storage medium, and the computer program causes a processor to execute the method described in the first aspect.
- Fig. 1 shows a flowchart of a method for determining a line of sight direction provided by an embodiment of the present disclosure.
- FIG. 2 shows a schematic diagram of a principle of determining the direction of the line of sight provided by an embodiment of the present disclosure.
- Fig. 3 shows a flow chart of a method for determining eye characteristics provided by an embodiment of the present disclosure.
- FIG. 4 shows a schematic diagram of a process of determining the weights corresponding to the left-eye feature and the right-eye feature provided by an embodiment of the present disclosure.
- Fig. 5 shows a flowchart of a method for determining an initial line of sight direction provided by an embodiment of the present disclosure.
- Fig. 6 shows a flow chart of a method for determining fusion features provided by an embodiment of the present disclosure.
- FIG. 7 shows a schematic diagram of a process of determining the initial line of sight direction and determining the line of sight residual information provided by an embodiment of the present disclosure.
- FIG. 8 shows a schematic diagram of a process of determining the line of sight direction provided by an embodiment of the present disclosure.
- Fig. 9 shows a flowchart of a neural network training method provided by an embodiment of the present disclosure.
- FIG. 10 shows a schematic structural diagram of an apparatus for determining a line of sight direction provided by an embodiment of the present disclosure.
- FIG. 11 shows a schematic structural diagram of an electronic device provided by an embodiment of the present disclosure.
- Sight tracking is an important field in computer vision.
- the main purpose of gaze tracking is to predict the user’s gaze direction.
- Appearance-based gaze prediction models are often implemented using deep learning models, which use facial features in a facial image or eye features in an eye image to predict the direction of the line of sight.
- In such models, the facial image and the eye image are regarded as independent feature sources, and the intrinsic relationship between them is not substantially considered.
- eye images provide fine-grained features that focus on gaze, while facial images provide coarse-grained features with broader information. The combination of the two can more accurately predict the direction of the line of sight.
- the present disclosure provides a method for determining the direction of the line of sight.
- the facial features of the target object can be extracted based on the facial image, and the facial features can be used to predict the initial line of sight direction of the target object.
- The features obtained by fusing the facial features and the eye features (also called "fusion features") can be used to predict the information about the difference between the actual line-of-sight direction of the target object and the initial line-of-sight direction, that is, the line-of-sight residual information.
- Adjusting the initial line-of-sight direction, which is predicted only from facial features, with the information characterizing this difference yields a line-of-sight direction closer to the actual one. It can be seen that the line-of-sight determination method proposed by the embodiments of the present disclosure can predict a more accurate line-of-sight direction.
- the execution subject of the method for determining the line of sight direction provided by the embodiments of the present disclosure is generally a computer device with a certain computing capability.
- the computer equipment includes, for example, a terminal device or a server or other processing equipment.
- the terminal device may be a user equipment (UE), a mobile device, a user terminal, a terminal, and the like.
- the method for determining the line-of-sight direction may be implemented by a processor invoking a computer-readable instruction stored in a memory.
- FIG. 1 it is a flowchart of a method for determining a line of sight direction provided by an embodiment of the present disclosure.
- the method includes steps S101 to S105.
- S101: Acquire a facial image and an eye image of the target object.
- The target object can be the user whose line of sight is to be predicted. The face of the target object can be photographed by a device capable of capturing images, such as a video camera or a camera, to obtain a facial image of the target object, and the eye image of the target object can then be cropped from the facial image.
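- As a minimal illustrative sketch (not part of the disclosure), the facial image could be captured with an ordinary camera and the eye regions cropped from it; the OpenCV Haar cascade detectors and crop logic below are assumptions chosen for illustration, not a method prescribed by the embodiments.

```python
# Illustrative sketch only: capture a facial image and crop eye regions from it.
# Haar cascades are just one possible way to locate the face and eyes.
import cv2

cap = cv2.VideoCapture(0)                      # any camera capable of capturing images
ok, frame = cap.read()
cap.release()
assert ok, "failed to capture a frame"

face_det = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
eye_det = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_eye.xml")

gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
faces = face_det.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
if len(faces) > 0:
    x, y, w, h = faces[0]
    face_img = frame[y:y + h, x:x + w]                      # facial image of the target object
    eyes = eye_det.detectMultiScale(gray[y:y + h, x:x + w], 1.1, 5)
    # Each detected eye region is cropped from the facial image (left/right eye images).
    eye_imgs = [face_img[ey:ey + eh, ex:ex + ew] for (ex, ey, ew, eh) in eyes]
```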
- S102 Extract the facial features of the target object from the facial image.
- S103 Determine the eye feature of the target object according to the facial feature and eye image of the target object.
- The facial features of the target object are coarse-grained features carrying broader information, from which the initial line-of-sight direction of the target object can be predicted; the eye features of the target object are fine-grained features that characterize gaze-focused information. Combining the eye features with the facial features allows the line-of-sight direction to be predicted more accurately.
- The facial features and eye features can be extracted by the feature-extraction sub-networks in a pre-trained neural network for line-of-sight prediction, which will be described in detail in the following embodiments and is not elaborated here.
- S104 Predict the initial line of sight direction of the target object based on the facial features, and predict and obtain line-of-sight residual information based on the fusion feature after the facial feature and the eye feature are fused.
- the line-of-sight residual information is used to characterize the difference between the actual line-of-sight direction of the target object and the initial line-of-sight direction.
- The initial line-of-sight direction here can be determined based on the facial features. Specifically, it can be predicted by the sub-network used to determine the initial line-of-sight direction in the pre-trained neural network for line-of-sight prediction; the specific prediction method will be described in detail later in conjunction with embodiments.
- the line-of-sight residual information here can be predicted by a sub-neural network used to determine line-of-sight residual information in a pre-trained neural network for predicting the direction of the line of sight.
- the specific prediction method will be described in detail later.
- The information characterizing the difference between the actual line-of-sight direction of the target object and the initial line-of-sight direction is predicted from the features obtained by fusing the facial features and the eye features, and this difference information is then used to adjust the initial line-of-sight direction predicted only from the facial features, so that a line-of-sight direction closer to the actual one can be obtained. That is, the present disclosure proposes to combine the facial image and the eye image of the target object, and to predict the line-of-sight direction by combining the gaze-focused fine-grained features provided by the eye image with the coarse-grained, broader-information features provided by the facial image.
- The facial features and the eye features can be input into the sub-network used to determine the line-of-sight residual information in the pre-trained neural network for line-of-sight prediction, to obtain the features after the facial features and the eye features are fused; this will be described later in conjunction with specific embodiments.
- S105 Correct the initial line-of-sight direction based on the line-of-sight residual information to obtain the line-of-sight direction of the target object.
- The line-of-sight residual information here may include information characterizing the difference between the actual line-of-sight direction and the initial line-of-sight direction, determined based on the features after the fusion of the facial features and the eye features. The initial line-of-sight direction can then be adjusted with the line-of-sight residual information; for example, the line-of-sight residual information can be summed with the initial line-of-sight direction predicted based on the facial features, to obtain a line-of-sight direction closer to the actual line-of-sight direction of the target object.
- As shown in FIG. 2, which is a schematic diagram of the principle of determining the line-of-sight direction, g_b represents the initial line-of-sight direction of the target object predicted based on the facial features, and g_r represents the line-of-sight residual information. The final line-of-sight direction g of the target object is expressed by formula (1): g = g_b + g_r.
- The line-of-sight residual information indicates the difference between the actual line-of-sight direction and the initial line-of-sight direction, and it can be represented by a vector.
- a world coordinate system can be introduced to represent the initial line of sight direction and line of sight residual information.
- For example, if the actual gaze direction of the target object is 30 degrees south east, the initial gaze direction obtained from the facial features of the target object is 25 degrees south east, and the line-of-sight residual information obtained from the fusion feature is a deviation of 4 degrees, then after the initial line-of-sight direction is corrected by the line-of-sight residual information, the predicted line-of-sight direction of the target object is 29 degrees south east, which is obviously closer to the actual line-of-sight direction than 25 degrees south east.
- The gaze direction determination method proposed in steps S101 to S105 above extracts facial features from the facial image of the target object, from which the initial gaze direction of the target object can be predicted; after the eye features of the target object are determined based on the facial features and the eye image and fused with the facial features, the information characterizing the difference between the actual line-of-sight direction of the target object and the initial line-of-sight direction, that is, the line-of-sight residual information, can be predicted; the initial line-of-sight direction predicted only from the facial features is then adjusted with this difference information, so that a line-of-sight direction closer to the actual one can be obtained. It can be seen that the line-of-sight determination method proposed by the embodiments of the present disclosure can predict a more accurate line-of-sight direction.
- the facial image can be analyzed to extract the position point coordinates that can characterize the facial features as the facial features of the target object. For example, extract the coordinates of the cheeks and the corners of the eyes.
- the facial features of the target object can be extracted based on a neural network.
- the facial features of the target object can be extracted based on the sub-neural network for feature extraction in the pre-trained neural network for line-of-sight prediction, which specifically includes:
- the facial image is input to the first feature extraction network, and the facial features are obtained through the first feature extraction network processing.
- the first feature extraction network is a sub-neural network used for facial feature extraction in a pre-trained neural network for line-of-sight prediction.
- The first feature extraction network here is the part of the pre-trained neural network for line-of-sight prediction that extracts facial features from facial images. That is, after the facial image is input into the first feature extraction network, facial features that can be used to predict the initial line-of-sight direction are extracted.
- the facial features in the facial image are extracted through the first feature extraction network in the pre-trained neural network for line-of-sight prediction.
- the first feature extraction network is dedicated to extracting facial features of a facial image, so that more accurate facial features can be extracted, thereby facilitating the improvement of the accuracy of the initial line of sight direction.
- the above-mentioned eye image includes a left-eye image and a right-eye image.
- the appearance of the left eye shown in the left-eye image and the appearance of the right eye shown in the right-eye image will change with changes in the environment or changes in the posture of the head.
- the left-eye feature extracted based on the left-eye image and the right-eye feature extracted based on the right-eye image may have different contributions when determining the direction of the line of sight.
- determining the eye features of the target object according to the facial features and eye images of the target object, as shown in FIG. 3, may include the following steps S301 to S304.
- Extracting the left-eye features from the left-eye image can be done by extracting the coordinates of position points that characterize eye features in the left-eye image, such as the positions of the pupil and the corner of the eye, as the left-eye features of the target object; alternatively, the left-eye features can be extracted based on a pre-trained neural network.
- Similarly, extracting the right-eye features from the right-eye image can be done by extracting the coordinates of position points that characterize eye features in the right-eye image, such as the positions of the pupil and the corner of the eye, as the right-eye features of the target object; alternatively, the right-eye features can be extracted based on a pre-trained neural network.
- the present disclosure uses a pre-trained neural network to extract left-eye features and right-eye features as an example for description:
- the left-eye image is input into the second feature extraction network, the left-eye feature is obtained through the second feature extraction network, and the right-eye image is input into the third feature extraction network, and the right-eye feature is obtained through the third feature extraction network.
- the second feature extraction network is a sub-neural network used for left-eye feature extraction in a pre-trained neural network for line-of-sight prediction.
- the third feature extraction network is a sub-neural network used for right-eye feature extraction in a pre-trained neural network for line-of-sight prediction.
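- The disclosure does not specify the architecture of the first, second, and third feature extraction networks; the PyTorch sketch below only illustrates the idea of three convolutional backbones producing the facial features f_f, left-eye features f_l, and right-eye features f_r. All layer sizes, input resolutions, and the 128-dimensional feature size are assumptions.

```python
import torch
import torch.nn as nn

class FeatureExtractor(nn.Module):
    """A small convolutional backbone standing in for the first/second/third
    feature extraction networks; the architecture is illustrative only."""
    def __init__(self, feat_dim: int = 128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(64, feat_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.fc(self.conv(x).flatten(1))

# One backbone per input, as described above (face / left eye / right eye).
face_net, left_net, right_net = FeatureExtractor(), FeatureExtractor(), FeatureExtractor()
f_f = face_net(torch.randn(1, 3, 112, 112))    # facial features f_f
f_l = left_net(torch.randn(1, 3, 36, 60))      # left-eye features f_l
f_r = right_net(torch.randn(1, 3, 36, 60))     # right-eye features f_r
```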
- S303 Determine a first weight corresponding to the left-eye feature and a second weight corresponding to the right-eye feature according to the facial feature, the left-eye feature, and the right-eye feature.
- the first weight corresponding to the left-eye feature represents the contribution of the left-eye image in determining the line of sight direction
- the second weight corresponding to the right-eye feature represents the contribution of the right-eye image in determining the line of sight direction.
- The weights can be determined by a pre-trained neural network. For example, the facial features, left-eye features, and right-eye features can be input into an attention network, which outputs the first weight corresponding to the left-eye features and the second weight corresponding to the right-eye features.
- the attention network is a sub-neural network used to determine the respective evaluation values of the left-eye feature and the right-eye feature in the pre-trained neural network for predicting the direction of the line of sight.
- the evaluation value represents the importance of the left eye feature/right eye feature in the eye feature.
- When the facial features, left-eye features, and right-eye features are input into the attention network and the first weight and the second weight are obtained through the attention network, the processing includes:
- determining the first score of the left-eye features based on the facial features and the left-eye features, and determining the second score of the right-eye features based on the facial features and the right-eye features; both scores can be determined by a pre-trained neural network, for example, the attention network;
- determining the first weight and the second weight based on the first score and the second score, which can also be obtained through the attention network.
- the first score can represent the contribution of the left-eye image in determining the direction of the line of sight, and it is known through advance testing that the first score is related to both facial features and left-eye features.
- That the first score is related to the facial features means that the facial features used to predict the initial line-of-sight direction can affect the score of the left-eye features.
- the first score is related to the left eye feature, that is, the shape and appearance of the left eye will also affect the score of the left eye feature.
- the attention network can determine the first score according to the following formula (2):
- where m_l represents the first score corresponding to the left-eye features, W_1, W_2, and W_3 are network parameters of the attention network obtained after training, f_f represents the facial features, and f_l represents the left-eye features.
- the second score can represent the contribution of the right-eye image in determining the direction of the line of sight, and it is known through advance testing that the second score is related to both facial features and right-eye features.
- That the second score is related to the facial features means that the facial features used to predict the initial line-of-sight direction can affect the score of the right-eye features.
- the second score is related to the right eye feature, that is, the shape and appearance of the right eye will also affect the score of the right eye feature.
- the attention network can determine the second score according to the following formula (3):
- where m_r represents the second score corresponding to the right-eye features, W_1, W_2, and W_3 are network parameters of the attention network obtained after training, f_f represents the facial features, and f_r represents the right-eye features.
- After the first score and the second score are obtained, the first weight corresponding to the left-eye features and the second weight corresponding to the right-eye features can be further determined, specifically according to the following formula (4), from which the first weight w_l corresponding to the left-eye features and the second weight w_r corresponding to the right-eye features are obtained.
- The above process of determining the weights corresponding to the left-eye features and the right-eye features is shown in FIG. 4: the left-eye features f_l and the right-eye features f_r are obtained through convolutional neural networks (CNN), and then the facial features f_f, the left-eye features f_l, and the right-eye features f_r are input into the attention network to obtain the first weight w_l corresponding to the left-eye features and the second weight w_r corresponding to the right-eye features.
- Then, a weighted summation of the left-eye features and the right-eye features may be performed through the attention network based on the first weight and the second weight to obtain the eye features.
- Specifically, the left-eye features and the right-eye features can be weighted and summed, and the eye features f_e can be obtained according to the following formula (5): f_e = w_l · f_l + w_r · f_r.
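- A minimal sketch of the attention-based eye-feature fusion described above: the additive tanh scoring used in place of formulas (2) and (3) and the softmax normalization used in place of formula (4) are assumptions, since those formula bodies are not reproduced here; only the weighted sum f_e = w_l·f_l + w_r·f_r follows formula (5) directly.

```python
import torch
import torch.nn as nn

class EyeAttention(nn.Module):
    """Scores each eye feature against the facial features and fuses the two
    eye features; the scoring form and softmax normalization are assumptions."""
    def __init__(self, feat_dim: int = 128, hidden: int = 64):
        super().__init__()
        self.w1 = nn.Linear(feat_dim, hidden, bias=False)   # acts on the facial features
        self.w2 = nn.Linear(feat_dim, hidden, bias=False)   # acts on an eye feature
        self.w3 = nn.Linear(hidden, 1, bias=False)          # maps to a scalar score

    def score(self, f_face: torch.Tensor, f_eye: torch.Tensor) -> torch.Tensor:
        return self.w3(torch.tanh(self.w1(f_face) + self.w2(f_eye)))   # m_l or m_r

    def forward(self, f_face, f_left, f_right):
        m_l, m_r = self.score(f_face, f_left), self.score(f_face, f_right)
        w = torch.softmax(torch.cat([m_l, m_r], dim=-1), dim=-1)       # normalize to (w_l, w_r)
        w_l, w_r = w[..., :1], w[..., 1:]
        f_e = w_l * f_left + w_r * f_right                             # formula (5)
        return f_e, w_l, w_r
```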
- In this way, the different contributions of the left-eye image and the right-eye image in determining the line-of-sight direction are determined separately, so that more accurate eye features are obtained, which in turn helps improve the accuracy of the line-of-sight residual information.
- After the facial features and eye features are obtained, the line-of-sight direction of the target object can be further determined. Determining the line-of-sight direction of the target object can include two parts: the first part is predicting the initial line-of-sight direction of the target object based on the facial features, and the second part is predicting the line-of-sight residual information of the target object based on the feature obtained by fusing the facial features and the eye features.
- S501 Determine the weight of each feature point in the facial feature, and adjust the facial feature based on the weight of each feature point in the facial feature;
- S502 Determine an initial line of sight direction of the target object according to the adjusted facial features.
- Facial features may include multiple feature points.
- Feature points can be understood as different coarse-grained features extracted from facial images. These coarse-grained features can include, for example, regional features and location point features in facial images. Each feature point in the facial features has a different degree of importance when predicting the initial line of sight direction.
- the facial features can be adjusted based on the weight of each feature point first, and then the initial line of sight direction of the target object can be determined based on the adjusted facial features.
- the adjustment can be made through a pre-trained neural network, which will be described in detail later.
- The fusion feature can be determined based on the facial features and the eye features in the manner shown in FIG. 6, which specifically includes the following steps S601 to S602.
- S601 Determine an intermediate feature according to the adjusted facial feature, eye feature, and weight of each feature point in the adjusted facial feature.
- S602 Perform a weighted summation on the intermediate feature and the adjusted facial feature based on the intermediate feature, the adjusted facial feature, and the weights corresponding to the intermediate feature and the adjusted facial feature, to obtain the fused feature.
- the intermediate feature here can be determined by a pre-trained neural network. Through the intermediate feature and the adjusted facial feature, the feature after the fusion of the facial feature and the eye feature can be determined.
- the above process of adjusting the facial features to obtain the adjusted facial features, and the process of obtaining the features after the fusion of the facial features and the eye features, can be processed by a pre-trained neural network, such as a gate network.
- the determination of the initial line of sight direction of the target object based on the adjusted facial features can also be determined based on a pre-trained neural network, which will be described in detail later.
- the weight of each feature point in the adjusted facial features can be determined according to the following steps:
- the weight of each feature point in the adjusted facial features is determined according to the eye features and the adjusted facial features.
- the method of determining the weight here can be determined according to a preset weight distribution method, or it can be determined through a pre-trained neural network, which will be described in detail later.
- the weights corresponding to the intermediate features and the adjusted facial features are determined according to the following steps:
- the weights corresponding to the intermediate features and the adjusted facial features are determined according to the eye features and the adjusted facial features.
- the method of determining the weight may be determined according to a preset weight distribution method, or may be determined through a pre-trained neural network, which will be described in detail later.
- the gate network functions to filter the received features, that is, to increase the weight of important features and reduce the weight of non-important features.
- The feature transformation performed by the gate network will be introduced in conjunction with formulas (7) to (10):
- where W_z, W_r, and W_h are network parameters of the gate network; σ represents the sigmoid operation; ReLU represents the activation function; f represents the received feature (when facial features are processed, f represents the facial features; when eye features are processed, f represents the eye features); z_t and r_t represent the weights obtained after the sigmoid operations; the intermediate feature is obtained by fusing the features input into the gate network; h_t represents the weighted sum of the intermediate feature and the feature output by the adjacent gate network; and h_0 is set equal to 0.
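- Since the bodies of formulas (7) to (10) are not reproduced above, the sketch below is a GRU-style reconstruction based only on the stated notation (σ, ReLU, W_z, W_r, W_h, the intermediate feature, and the weighted sum h_t); it should be read as an assumption rather than the exact formulation of the disclosure.

```python
import torch
import torch.nn as nn

class GateNetwork(nn.Module):
    """GRU-style gating sketched from the notation above: z_t and r_t are sigmoid
    gates, an intermediate feature is built with ReLU, and the output h_t is a
    weighted sum of the intermediate feature and the previous output h_{t-1}."""
    def __init__(self, feat_dim: int = 128):
        super().__init__()
        self.W_z = nn.Linear(2 * feat_dim, feat_dim)   # update-gate parameters
        self.W_r = nn.Linear(2 * feat_dim, feat_dim)   # reset-gate parameters
        self.W_h = nn.Linear(2 * feat_dim, feat_dim)   # intermediate-feature parameters

    def forward(self, f: torch.Tensor, h_prev: torch.Tensor) -> torch.Tensor:
        z = torch.sigmoid(self.W_z(torch.cat([f, h_prev], dim=-1)))         # weight z_t
        r = torch.sigmoid(self.W_r(torch.cat([f, h_prev], dim=-1)))         # weight r_t
        h_tilde = torch.relu(self.W_h(torch.cat([f, r * h_prev], dim=-1)))  # intermediate feature
        return z * h_tilde + (1.0 - z) * h_prev                             # weighted sum h_t
```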
- the embodiments of the present disclosure need to determine the initial line-of-sight direction of the target object based on facial features, and predict the line-of-sight residual information of the target object based on the features after fusion of facial features and eye features.
- the embodiment of the present disclosure can introduce two gate networks to complete the filtering of features, respectively, which can be recorded as the first gate network and the second gate network.
- The output feature of the first gate network can be denoted as h_1, and the output feature of the second gate network can be denoted as h_2, as described below in conjunction with specific embodiments.
- First, the process of predicting the initial line-of-sight direction of the target object based on the facial features is introduced: the weights of the facial features can be adjusted through the first gate network to obtain the adjusted facial features h_1, and the initial line-of-sight direction is then predicted based on the adjusted facial features h_1. This specifically includes the following steps.
- the facial features here can include multiple feature points.
- the feature points here can be understood as different coarse-grained features in the facial image, and these coarse-grained features may include regional features, location point features, etc. in the facial image.
- Each feature point in the facial features has a different degree of importance when predicting the initial line of sight direction.
- the weight of each feature point in facial features is determined through the first gate network.
- the first gate network here is a sub-neural network used to adjust facial features in a pre-trained neural network for line-of-sight prediction.
- The facial features can also be adjusted, based on the weight of each feature point in the facial features, through the first gate network.
- h 0 is equal to 0.
- the first multilayer perceptron is a sub-neural network used to predict the initial line of sight direction in the pre-trained neural network for predicting the direction of the line of sight.
- the adjusted facial feature is denoted as h 1 , and then the adjusted facial feature is input into the first multilayer perceptron MLP to obtain the initial line of sight direction of the target object.
- The first gate network adjusts the weight of each feature point in the facial features so that feature points with a greater impact on the initial line-of-sight direction receive larger weights than feature points with little impact; inputting the adjusted facial features into the first multilayer perceptron that predicts the initial line-of-sight direction thus yields a more accurate initial line-of-sight direction.
- the eye features and adjusted facial features are input into the second gate network, and the second gate network is processed to obtain the fused features;
- The second gate network is the sub-network in the pre-trained neural network for line-of-sight prediction that is used to predict the fused feature.
- The adjusted facial features are the h_1 output by the first gate network; h_1 and the eye features f_e are then input into the second gate network, and the fused feature h_2 output by the second gate network can be obtained.
- the weighted summation of the intermediate feature and the adjusted facial feature is performed through the second gate network to obtain the fused feature.
- the weight of each feature point in the adjusted facial features can be determined in the following way:
- The first processing is performed on the adjusted facial features h_1 and the eye features f_e to obtain the weight of each feature point in the adjusted facial features; the second gate network uses the first network parameter information of the trained weight distribution function when performing the first processing.
- The weight distribution function is the sigmoid operation represented by σ; the first network parameter information is W_r.
- Formula (9) can then be used to process the adjusted facial features, the eye features, and the weight of each feature point in the adjusted facial features, to obtain the intermediate feature.
- the weights corresponding to the intermediate features and the adjusted facial features can be determined according to the following methods:
- The second processing is performed on the adjusted facial features h_1 and the eye features f_e to obtain the weights respectively corresponding to the intermediate feature and the adjusted facial features h_1; the second gate network uses the second network parameter information of the trained weight distribution function when performing the second processing.
- This formula corresponds to the second processing performed by the above-mentioned second gate network on the eye features and the adjusted facial features. The weight distribution function is the sigmoid operation represented by σ, and the second network parameter information is W_z, so that the weight corresponding to the intermediate feature is z_2 and the weight corresponding to the adjusted facial features h_1 is 1 - z_2.
- the gaze residual information can be predicted based on the features fused from facial features and eye features in the following manner:
- the fused features are input into the second multilayer perceptron MLP, and processed by the second multilayer perceptron to obtain line-of-sight residual information.
- the second multilayer perceptron is a sub-neural network used to predict the residual information of the line of sight in a pre-trained neural network for predicting the direction of the line of sight.
- the fused feature is denoted as h 2 , and then the fused feature is input to the second multilayer perceptron MLP to obtain the line-of-sight residual information of the target object.
- The above process of determining the initial line-of-sight direction and determining the line-of-sight residual information can be carried out by the two sub-neural networks shown in FIG. 7: the first sub-neural network includes the first gate network and the first multilayer perceptron MLP, and the second sub-neural network includes the second gate network and the second multilayer perceptron MLP.
- The adjusted facial features h_1 can, on the one hand, be input into the first multilayer perceptron to obtain the initial line-of-sight direction g_b and, on the other hand, be input into the second gate network together with the eye features; after processing by the second gate network, the feature h_2 obtained by fusing the facial features and the eye features is produced. The fused feature h_2 is then input into the second multilayer perceptron to obtain the line-of-sight residual information g_r.
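- Continuing the sketch, the two sub-neural networks of FIG. 7 could be wired as below (reusing the illustrative GateNetwork defined earlier): the first gate network plus a multilayer perceptron produces the initial direction g_b from the facial features with h_0 = 0, and the second gate network plus a multilayer perceptron produces the residual g_r from the eye features and h_1. The MLP sizes and the 2-dimensional (e.g. yaw/pitch) output are assumptions.

```python
import torch
import torch.nn as nn

def make_mlp(in_dim: int = 128, out_dim: int = 2) -> nn.Sequential:
    # Illustrative multilayer perceptron head; the real layer sizes are unspecified.
    return nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(), nn.Linear(64, out_dim))

gate1, gate2 = GateNetwork(), GateNetwork()    # first and second gate networks (earlier sketch)
mlp1, mlp2 = make_mlp(), make_mlp()            # first and second multilayer perceptrons

def predict_gaze(f_f: torch.Tensor, f_e: torch.Tensor) -> torch.Tensor:
    h0 = torch.zeros_like(f_f)                 # h_0 is set equal to 0
    h1 = gate1(f_f, h0)                        # adjusted facial features h_1
    g_b = mlp1(h1)                             # initial line-of-sight direction g_b
    h2 = gate2(f_e, h1)                        # fused feature h_2
    g_r = mlp2(h2)                             # line-of-sight residual information g_r
    return g_b + g_r                           # corrected direction, as in formula (1)
```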
- the eye feature and the facial feature adjusted by the first gate network are input into the second gate network for processing, and the feature after the fusion of the facial feature and the eye feature is determined.
- the fused feature is a feature obtained after comprehensive consideration of the facial image and the eye image, so that the difference between the actual line of sight direction of the target object and the initial line of sight direction can be easily determined through the fused feature. After correcting the initial line of sight direction based on the difference, a more accurate line of sight direction can be obtained.
- an eye image is intercepted from the facial image, and the eye image includes a left-eye image and a right-eye image.
- the facial image is input to the first feature extraction network (CNN) to obtain the facial feature f f .
- the facial features are input into the aforementioned first sub-neural network (the first sub-neural network includes the first gate network and the first multilayer perceptron) for processing, that is, the initial line of sight direction g b can be obtained.
- the left-eye image in the intercepted eye image is input into the second feature extraction network to obtain the left-eye feature f l
- the right-eye image is input into the third feature extraction network to obtain the right-eye feature f r .
- the left-eye feature, right-eye feature, and facial feature are input into the attention network to obtain the eye feature f e .
- The eye features and the adjusted facial features h_1 obtained by the sub-neural network that predicts the initial line-of-sight direction are input into the second sub-neural network (which includes the second gate network and the second multilayer perceptron), so that the line-of-sight residual information g_r can be obtained.
- Finally, the initial line-of-sight direction can be corrected based on the line-of-sight residual information g_r to obtain the line-of-sight direction of the target object.
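- Putting the illustrative modules together, a forward pass matching the FIG. 8 description might look as follows; the module names refer to the sketches above and the cropped input tensors are assumed, so this is not the disclosure's implementation.

```python
attention = EyeAttention()                     # attention network from the earlier sketch

# face_img, left_img, right_img: assumed cropped image tensors of shape (N, 3, H, W).
f_f = face_net(face_img)                       # first feature extraction network -> f_f
f_l = left_net(left_img)                       # second feature extraction network -> f_l
f_r = right_net(right_img)                     # third feature extraction network -> f_r
f_e, w_l, w_r = attention(f_f, f_l, f_r)       # attention network -> eye features f_e
gaze = predict_gaze(f_f, f_e)                  # g_b corrected by g_r, i.e. g = g_b + g_r
```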
- The method for determining the line-of-sight direction proposed in the embodiments of the present application can be implemented by a neural network, which is obtained by training using sample images containing a target sample object with a labeled line-of-sight direction.
- the labeled sight direction is the actual sight direction of the target sample object.
- the neural network for determining the direction of the line of sight proposed in the embodiment of the present application can be obtained by training using the following steps, including steps S901 to S906.
- S901 Acquire a face sample image and an eye sample image of a target sample object in a sample image.
- The target sample object may include multiple target objects located at different spatial positions. These target objects are all made to look in the same observation direction, their facial images are acquired as face sample images, and the eye sample images are then cropped from the face sample images.
- Alternatively, the target sample object here may include a single target object. The target sample object is made to look in different observation directions, the facial image corresponding to each observation direction is acquired as a face sample image, and the eye sample image is then cropped from the face sample image.
- S902 Extract the facial features of the target sample object from the facial sample image.
- extracting the facial features of the target sample object from the face sample image is similar to the method of extracting the facial features of the target object introduced above, and will not be repeated here.
- S903 Determine the eye feature of the target sample object according to the facial feature and the eye sample image of the target sample object.
- Determining the eye characteristics of the target sample object is similar to the method of determining the eye characteristics of the target object introduced above, and will not be repeated here.
- S904 Predict the initial line-of-sight direction of the target sample object based on the facial features of the target sample object, and predict the line-of-sight residual information of the target sample object based on the features fused from the facial features and the eye features of the target sample object.
- determining the initial line-of-sight direction and line-of-sight residual information of the target sample object is similar to the method for determining the initial line-of-sight direction and line-of-sight residual information of the target object above, and will not be repeated here.
- S905 Correct the initial line-of-sight direction of the target sample object based on the line-of-sight residual information of the target sample object to obtain the line-of-sight direction of the target sample object.
- the method of correcting the initial line of sight direction of the target sample object is similar to the method of correcting the initial line of sight direction of the target object based on the residual information of the target object described above, and will not be repeated here.
- S906 Adjust network parameter values of the neural network based on the obtained line-of-sight direction of the target sample object and the marked line-of-sight direction of the target sample object.
- a loss function can be introduced to determine the loss value corresponding to the direction of the predicted line of sight.
- the network parameter values of the neural network are adjusted through the loss value. For example, when the loss value is less than the set threshold, the training can be stopped to obtain the network parameter value of the neural network.
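- The disclosure only states that a loss function between the predicted and labeled line-of-sight directions is used to adjust the network parameters; in the sketch below, the L2 loss, the Adam optimizer, the learning rate, the stopping threshold, and the data loader are all assumptions made for illustration, reusing the modules defined in the earlier sketches.

```python
import torch

params = (list(face_net.parameters()) + list(left_net.parameters()) +
          list(right_net.parameters()) + list(attention.parameters()) +
          list(gate1.parameters()) + list(gate2.parameters()) +
          list(mlp1.parameters()) + list(mlp2.parameters()))
optimizer = torch.optim.Adam(params, lr=1e-4)
loss_fn = torch.nn.MSELoss()                   # assumed loss between predicted and labeled directions
threshold = 1e-3                               # assumed "set threshold" for stopping training

for face_img, left_img, right_img, gaze_label in loader:   # loader: assumed DataLoader over sample images
    f_f, f_l, f_r = face_net(face_img), left_net(left_img), right_net(right_img)
    f_e, _, _ = attention(f_f, f_l, f_r)
    gaze_pred = predict_gaze(f_f, f_e)                      # predicted line-of-sight direction
    loss = loss_fn(gaze_pred, gaze_label)                   # compare with the labeled direction
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    if loss.item() < threshold:                             # stop when loss is below the set threshold
        break
```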
- How to obtain the eye features based on the facial features, the left-eye features, the right-eye features, and the attention network is similar to the process of determining the eye features in the line-of-sight direction determination method introduced above, and will not be repeated here. Likewise, how to predict the initial line-of-sight direction of the target sample object based on the facial features, how to determine the fused feature based on the facial features and the eye features, and how to determine the line-of-sight residual information of the target sample object based on the fused feature are similar to the corresponding processes described above and will not be repeated here.
- the face sample image and the eye sample image of the target sample object in the sample image can be obtained.
- the facial features of the target sample object are extracted based on the facial sample image, and the facial features of the target sample object can predict the initial line of sight direction of the target sample object.
- The information about the difference between the actual gaze direction of the target sample object and the initial line-of-sight direction, that is, the line-of-sight residual information, can be predicted from the features obtained by fusing the facial features and the eye features of the target sample object.
- the initial line-of-sight direction predicted only based on the facial features of the target sample object is adjusted by the information that characterizes the difference, that is, the line-of-sight direction that is closer to the marked line-of-sight direction of the target sample object can be obtained.
- By adjusting the network parameter values of the neural network based on the obtained line-of-sight direction and the labeled line-of-sight direction, a neural network with higher accuracy can be obtained, and based on this more accurate neural network, the line-of-sight direction of the target object can be accurately predicted.
- The writing order of the steps does not imply a strict execution order and does not constitute any limitation on the implementation process; the specific execution order of each step should be determined by its function and possible internal logic.
- The embodiments of the present disclosure also provide a line-of-sight direction determination apparatus corresponding to the above-mentioned line-of-sight direction determination method. Since the principle by which the apparatus solves the problem is similar to that of the above-mentioned line-of-sight direction determination method, the implementation of the apparatus can refer to the implementation of the method, and repeated descriptions are omitted.
- the line-of-sight direction determining device 1000 includes: an image acquisition module 1001, a feature extraction module 1002, a line-of-sight prediction module 1003, and a line-of-sight correction module 1004 .
- the image acquisition module 1001 is used to acquire facial images and eye images of the target object.
- The feature extraction module 1002 is used for extracting the facial features of the target object from the facial image, and for determining the eye features of the target object according to the facial features of the target object and the eye image.
- the line-of-sight prediction module 1003 is used to predict the initial line-of-sight direction of the target object based on facial features, and to predict the residual line-of-sight information based on the fusion feature after the facial feature and the eye feature are fused.
- the line-of-sight correction module 1004 is used to correct the initial line-of-sight direction based on the line-of-sight residual information to obtain the line-of-sight direction of the target object.
- the eye image includes a left eye image and a right eye image
- When determining the eye features of the target object according to the facial features of the target object and the eye image, the feature extraction module 1002 performs the following operations: extracting left-eye features from the left-eye image; extracting right-eye features from the right-eye image; determining, according to the facial features, left-eye features, and right-eye features, a first weight corresponding to the left-eye features and a second weight corresponding to the right-eye features; and performing a weighted summation of the left-eye features and the right-eye features based on the first weight and the second weight to obtain the eye features.
- When determining the first weight corresponding to the left-eye features and the second weight corresponding to the right-eye features according to the facial features, left-eye features, and right-eye features, the feature extraction module 1002 performs the following operations: determining the first score of the left-eye features based on the facial features and the left-eye features, and determining the second score of the right-eye features based on the facial features and the right-eye features; and determining the first weight and the second weight based on the first score and the second score.
- When predicting the initial line-of-sight direction of the target object based on the facial features, the line-of-sight prediction module 1003 performs the following operations: determining the weight of each feature point in the facial features, adjusting the facial features based on the weight of each feature point, and determining the initial line-of-sight direction of the target object according to the adjusted facial features.
- The line-of-sight prediction module 1003 determines the fused feature based on the facial features and the eye features in the following manner: determining an intermediate feature according to the adjusted facial features, the eye features, and the weight of each feature point in the adjusted facial features; and performing a weighted summation of the intermediate feature and the adjusted facial features, based on the intermediate feature, the adjusted facial features, and the weights respectively corresponding to them, to obtain the fusion feature.
- the line-of-sight prediction module 1003 determines the weight of each feature point in the adjusted facial features in the following manner: determining the weight of each feature point in the adjusted facial features according to the eye features and the adjusted facial features.
- the line-of-sight prediction module 1003 determines the weights respectively corresponding to the intermediate feature and the adjusted facial features in the following manner: determining the weights respectively corresponding to the intermediate feature and the adjusted facial features according to the eye features and the adjusted facial features.
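- Taken together, the weighting and fusion steps above can be sketched as a small prediction head. Layer sizes, the sigmoid/softmax gating, and the (pitch, yaw) output below are assumptions for illustration only; the embodiment fixes which inputs each weight, the intermediate feature, and the fusion feature are computed from, not the concrete operators.

```python
import torch
import torch.nn as nn

class GazePredictionHead(nn.Module):
    """Illustrative sketch of the weighting and fusion performed by module 1003."""

    def __init__(self, face_dim: int = 128, eye_dim: int = 64):
        super().__init__()
        # Weight of each feature point in the facial feature (form assumed).
        self.face_point_weight = nn.Sequential(nn.Linear(face_dim, face_dim), nn.Sigmoid())
        # Weight of each feature point in the adjusted facial feature,
        # computed from the eye feature and the adjusted facial feature.
        self.adj_point_weight = nn.Sequential(nn.Linear(face_dim + eye_dim, face_dim), nn.Sigmoid())
        # Intermediate feature from the adjusted facial feature, the eye feature,
        # and the per-point weights above.
        self.intermediate = nn.Linear(face_dim + eye_dim + face_dim, face_dim)
        # Weights of the intermediate and adjusted facial features,
        # computed from the eye feature and the adjusted facial feature.
        self.fusion_gate = nn.Sequential(nn.Linear(face_dim + eye_dim, 2), nn.Softmax(dim=-1))
        self.initial_gaze = nn.Linear(face_dim, 2)   # initial (pitch, yaw)
        self.residual_gaze = nn.Linear(face_dim, 2)  # line-of-sight residual

    def forward(self, face_feat, eye_feat):
        # Adjust the facial feature with per-feature-point weights and
        # predict the initial line-of-sight direction from it.
        adjusted = self.face_point_weight(face_feat) * face_feat
        initial = self.initial_gaze(adjusted)

        # Intermediate feature and fusion feature (weighted sum).
        w_adj = self.adj_point_weight(torch.cat([adjusted, eye_feat], dim=-1))
        inter = self.intermediate(torch.cat([adjusted, eye_feat, w_adj], dim=-1))
        gate = self.fusion_gate(torch.cat([adjusted, eye_feat], dim=-1))
        fusion = gate[..., :1] * inter + gate[..., 1:] * adjusted

        # Line-of-sight residual information predicted from the fusion feature.
        residual = self.residual_gaze(fusion)
        return initial, residual
```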
- the device 1000 for determining the line of sight direction further includes a neural network training module 1005.
- the neural network training module 1005 is used to train a neural network for determining the line-of-sight direction of the target object; the neural network is obtained by training with sample images containing the labeled line-of-sight directions of target sample objects.
- the neural network training module 1005 trains the neural network in the following manner: acquiring a facial sample image and an eye sample image of a target sample object in a sample image; extracting facial features of the target sample object from the facial sample image; determining eye features of the target sample object based on the facial features of the target sample object and the eye sample image; predicting an initial line-of-sight direction of the target sample object based on the facial features of the target sample object, and predicting line-of-sight residual information of the target sample object based on the fusion feature obtained by fusing the facial features and the eye features of the target sample object; correcting the initial line-of-sight direction of the target sample object based on the line-of-sight residual information of the target sample object to obtain the line-of-sight direction of the target sample object; and adjusting the network parameter values of the neural network based on the obtained line-of-sight direction of the target sample object and the labeled line-of-sight direction of the target sample object.
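- A single training step over this procedure could look like the sketch below. The L1 loss and the plain gradient update are assumptions; the embodiment only requires that the network parameter values be adjusted based on the predicted and labeled line-of-sight directions.

```python
import torch
import torch.nn as nn

def train_step(network: nn.Module, optimizer: torch.optim.Optimizer,
               face_img, left_eye_img, right_eye_img, labeled_gaze):
    """One hypothetical training step for the gaze network.

    Assumes `network` returns the corrected gaze direction (initial direction
    plus residual) as a (batch, 2) tensor of pitch/yaw angles, and that an L1
    loss against the labeled direction drives the parameter update; the
    embodiment does not name a specific loss function.
    """
    optimizer.zero_grad()
    predicted_gaze = network(face_img, left_eye_img, right_eye_img)
    loss = nn.functional.l1_loss(predicted_gaze, labeled_gaze)
    loss.backward()
    optimizer.step()  # adjust the network parameter values
    return loss.item()
```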
- an embodiment of the present disclosure also provides an electronic device.
- FIG. 11 is a schematic structural diagram of an electronic device 1100 provided by an embodiment of the present disclosure, including a processor 1101, a storage medium 1102, and a bus 1103. The storage medium 1102 is used to store execution instructions and includes a memory 11021 and an external memory 11022; the memory 11021, also called internal memory, is used to temporarily store computation data of the processor 1101 and data exchanged with the external memory 11022 such as a hard disk.
- the processor 1101 exchanges data with the external memory 11022 through the memory 11021.
- the processor 1101 and the storage medium 1102 communicate through the bus 1103, and when the machine-readable instructions are executed by the processor 1101, the following processing is performed:
- acquiring the facial image and the eye image of the target object; extracting the facial features of the target object from the facial image; determining the eye features of the target object according to the facial features of the target object and the eye image; predicting the initial line-of-sight direction of the target object based on the facial features, and predicting the line-of-sight residual information based on the fusion feature obtained by fusing the facial features and the eye features; and correcting the initial line-of-sight direction based on the line-of-sight residual information to obtain the line-of-sight direction of the target object.
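- For illustration, the end-to-end flow just listed can be wired together as below, reusing the hypothetical fusion and prediction modules sketched earlier. The CNN backbones producing the facial and per-eye features are assumptions, since the embodiment does not name specific architectures.

```python
import torch

def estimate_gaze(models: dict, face_img, left_eye_img, right_eye_img):
    """End-to-end sketch of the processing performed on the electronic device.

    `models` is a hypothetical bundle: backbones plus the EyeFeatureFusion and
    GazePredictionHead modules sketched above.
    """
    face_feat = models["face_backbone"](face_img)        # facial features
    left_feat = models["eye_backbone"](left_eye_img)     # left-eye features
    right_feat = models["eye_backbone"](right_eye_img)   # right-eye features
    eye_feat = models["eye_fusion"](face_feat, left_feat, right_feat)
    initial, residual = models["gaze_head"](face_feat, eye_feat)
    return initial + residual  # corrected line-of-sight direction
```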
- embodiments of the present disclosure also provide a computer-readable storage medium having a computer program stored thereon; when the computer program is run by a processor, the steps of the line-of-sight direction determination method described in the foregoing method embodiments are executed.
- the storage medium may be a volatile or non-volatile computer readable storage medium.
- the computer program product of the method for determining the line-of-sight direction provided by the embodiments of the present disclosure includes a computer-readable storage medium storing program code, and the instructions included in the program code can be used to execute the method for determining the line-of-sight direction described in the foregoing method embodiments.
- the embodiments of the present disclosure also provide a computer program, which, when executed by a processor, implements any one of the methods in the foregoing embodiments.
- the computer program product can be specifically implemented by hardware, software, or a combination thereof.
- the computer program product is specifically embodied as a computer storage medium.
- the computer program product is specifically embodied as a software product, such as a software development kit (SDK) and the like.
- the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or they may be distributed on multiple network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution in this embodiment.
- the functional units in the various embodiments of the present disclosure may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
- if the function is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a non-volatile computer-readable storage medium executable by a processor.
- the technical solution of the present disclosure, in essence, or the part contributing to the prior art, or a part of the technical solution, can be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of the present disclosure.
- the aforementioned storage media include: a USB flash drive, a removable hard disk, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), a magnetic disk, an optical disc, or other media that can store program code.
Landscapes
- Engineering & Computer Science (AREA)
- Health & Medical Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Physics & Mathematics (AREA)
- Human Computer Interaction (AREA)
- Oral & Maxillofacial Surgery (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Theoretical Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Ophthalmology & Optometry (AREA)
- Image Analysis (AREA)
- Image Processing (AREA)
Claims (20)
- A line-of-sight direction determination method, characterized by comprising: acquiring a facial image and an eye image of a target object; extracting facial features of the target object from the facial image; determining eye features of the target object according to the facial features of the target object and the eye image; predicting an initial line-of-sight direction of the target object based on the facial features, and predicting line-of-sight residual information based on a fusion feature obtained by fusing the facial features and the eye features; and correcting the initial line-of-sight direction based on the line-of-sight residual information to obtain the line-of-sight direction of the target object.
- The line-of-sight direction determination method according to claim 1, wherein the eye image comprises a left-eye image and a right-eye image, and determining the eye features of the target object according to the facial features of the target object and the eye image comprises: extracting left-eye features from the left-eye image; extracting right-eye features from the right-eye image; determining, according to the facial features, the left-eye features, and the right-eye features, a first weight corresponding to the left-eye features and a second weight corresponding to the right-eye features; and performing a weighted summation of the left-eye features and the right-eye features based on the first weight and the second weight to obtain the eye features.
- The line-of-sight direction determination method according to claim 2, wherein determining, according to the facial features, the left-eye features, and the right-eye features, the first weight corresponding to the left-eye features and the second weight corresponding to the right-eye features comprises: determining a first score of the left-eye features according to the facial features and the left-eye features, and determining a second score of the right-eye features according to the facial features and the right-eye features; and determining the first weight and the second weight based on the first score and the second score.
- The line-of-sight direction determination method according to any one of claims 1 to 3, wherein predicting the initial line-of-sight direction of the target object based on the facial features comprises: determining a weight of each feature point in the facial features, and adjusting the facial features based on the weight of each feature point in the facial features; and determining the initial line-of-sight direction of the target object according to the adjusted facial features.
- The line-of-sight direction determination method according to claim 4, wherein the fusion feature is determined based on the facial features and the eye features in the following manner: determining an intermediate feature according to the adjusted facial features, the eye features, and the weight of each feature point in the adjusted facial features; and performing a weighted summation of the intermediate feature and the adjusted facial features based on the intermediate feature, the adjusted facial features, and the weights respectively corresponding to the intermediate feature and the adjusted facial features, to obtain the fusion feature.
- The line-of-sight direction determination method according to claim 5, wherein the weight of each feature point in the adjusted facial features is determined in the following manner: determining the weight of each feature point in the adjusted facial features according to the eye features and the adjusted facial features.
- The line-of-sight direction determination method according to claim 5, wherein the weights respectively corresponding to the intermediate feature and the adjusted facial features are determined in the following manner: determining the weights respectively corresponding to the intermediate feature and the adjusted facial features according to the eye features and the adjusted facial features.
- The line-of-sight direction determination method according to any one of claims 1 to 7, wherein the line-of-sight direction determination method is implemented by a neural network, and the neural network is obtained by training with sample images containing labeled line-of-sight directions of target sample objects.
- The method according to claim 8, wherein the neural network is obtained by training in the following manner: acquiring a facial sample image and an eye sample image of a target sample object in a sample image; extracting facial features of the target sample object from the facial sample image; determining eye features of the target sample object according to the facial features of the target sample object and the eye sample image; predicting an initial line-of-sight direction of the target sample object based on the facial features of the target sample object, and predicting line-of-sight residual information of the target sample object based on a fusion feature obtained by fusing the facial features of the target sample object and the eye features of the target sample object; correcting the initial line-of-sight direction of the target sample object based on the line-of-sight residual information of the target sample object to obtain the line-of-sight direction of the target sample object; and adjusting network parameter values of the neural network based on the obtained line-of-sight direction of the target sample object and the labeled line-of-sight direction of the target sample object.
- A line-of-sight direction determination device, characterized by comprising: an image acquisition module, configured to acquire a facial image and an eye image of a target object; a feature extraction module, configured to extract facial features of the target object from the facial image, and to determine eye features of the target object according to the facial features of the target object and the eye features; a line-of-sight prediction module, configured to predict an initial line-of-sight direction of the target object based on the facial features, and to predict line-of-sight residual information based on a fusion feature obtained by fusing the facial features and the eye features; and a line-of-sight correction module, configured to correct the initial line-of-sight direction based on the line-of-sight residual information to obtain the line-of-sight direction of the target object.
- The line-of-sight direction determination device according to claim 10, wherein the eye image comprises a left-eye image and a right-eye image, and when the feature extraction module is configured to determine the eye features of the target object according to the facial features of the target object and the eye features, it performs the following operations: extracting left-eye features from the left-eye image; extracting right-eye features from the right-eye image; determining, according to the facial features, the left-eye features, and the right-eye features, a first weight corresponding to the left-eye features and a second weight corresponding to the right-eye features; and performing a weighted summation of the left-eye features and the right-eye features based on the first weight and the second weight to obtain the eye features.
- The line-of-sight direction determination device according to claim 11, wherein when the feature extraction module is configured to determine, according to the facial features, the left-eye features, and the right-eye features, the first weight corresponding to the left-eye features and the second weight corresponding to the right-eye features, it performs the following operations: determining a first score of the left-eye features according to the facial features and the left-eye features, and determining a second score of the right-eye features according to the facial features and the right-eye features; and determining the first weight and the second weight based on the first score and the second score.
- The line-of-sight direction determination device according to any one of claims 10 to 12, wherein when the line-of-sight prediction module is configured to predict the initial line-of-sight direction of the target object based on the facial features, it performs the following operations: determining a weight of each feature point in the facial features, and adjusting the facial features based on the weight of each feature point in the facial features; and determining the initial line-of-sight direction of the target object according to the adjusted facial features.
- The line-of-sight direction determination device according to claim 13, wherein the line-of-sight prediction module determines the fusion feature based on the facial features and the eye features in the following manner: determining an intermediate feature according to the adjusted facial features, the eye features, and the weight of each feature point in the adjusted facial features; and performing a weighted summation of the intermediate feature and the adjusted facial features based on the intermediate feature, the adjusted facial features, and the weights respectively corresponding to the intermediate feature and the adjusted facial features, to obtain the fusion feature.
- The line-of-sight direction determination device according to claim 14, wherein the line-of-sight prediction module determines the weight of each feature point in the adjusted facial features in the following manner: determining the weight of each feature point in the adjusted facial features according to the eye features and the adjusted facial features.
- The line-of-sight direction determination device according to claim 14, wherein the line-of-sight prediction module determines the weights respectively corresponding to the intermediate feature and the adjusted facial features in the following manner: determining the weights respectively corresponding to the intermediate feature and the adjusted facial features according to the eye features and the adjusted facial features.
- The line-of-sight direction determination device according to any one of claims 10 to 16, wherein the line-of-sight direction determination device further comprises a neural network training module, and the neural network training module is configured to: train a neural network for determining the line-of-sight direction of the target object, the neural network being obtained by training with sample images containing labeled line-of-sight directions of target sample objects.
- The line-of-sight direction determination device according to claim 17, wherein the neural network training module trains the neural network in the following manner: acquiring a facial sample image and an eye sample image of a target sample object in a sample image; extracting facial features of the target sample object from the facial sample image; determining eye features of the target sample object according to the facial features of the target sample object and the eye sample image; predicting an initial line-of-sight direction of the target sample object based on the facial features of the target sample object, and predicting line-of-sight residual information of the target sample object based on a fusion feature obtained by fusing the facial features of the target sample object and the eye features of the target sample object; correcting the initial line-of-sight direction of the target sample object based on the line-of-sight residual information of the target sample object to obtain the line-of-sight direction of the target sample object; and adjusting network parameter values of the neural network based on the obtained line-of-sight direction of the target sample object and the labeled line-of-sight direction of the target sample object.
- An electronic device, characterized by comprising: a processor, a non-transitory storage medium, and a bus, wherein the storage medium stores machine-readable instructions executable by the processor, the processor and the storage medium communicate with each other through the bus, and the machine-readable instructions cause the processor to execute the line-of-sight direction determination method according to any one of claims 1 to 9.
- A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, and the computer program causes a processor to execute the line-of-sight direction determination method according to any one of claims 1 to 9.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2022524710A JP7309116B2 (ja) | 2019-12-30 | 2020-12-04 | 視線方向特定方法、装置、電子機器及び記憶媒体 |
KR1020217034841A KR20210140763A (ko) | 2019-12-30 | 2020-12-04 | 시선 방향 결정 방법, 장치, 전자 장치 및 저장 매체 |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911403648.2 | 2019-12-30 | ||
CN201911403648.2A CN111178278B (zh) | 2019-12-30 | 2019-12-30 | 视线方向确定方法、装置、电子设备及存储介质 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2021135827A1 (zh) | 2021-07-08 |
Family
ID=70646509
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2020/134049 WO2021135827A1 (zh) | 2019-12-30 | 2020-12-04 | 视线方向确定方法、装置、电子设备及存储介质 |
Country Status (4)
Country | Link |
---|---|
JP (1) | JP7309116B2 (zh) |
KR (1) | KR20210140763A (zh) |
CN (1) | CN111178278B (zh) |
WO (1) | WO2021135827A1 (zh) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20220222969A1 (en) * | 2021-01-13 | 2022-07-14 | Beihang University | Method for determining the direction of gaze based on adversarial optimization |
Families Citing this family (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP2996269A1 (en) * | 2014-09-09 | 2016-03-16 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Audio splicing concept |
CN111178278B (zh) * | 2019-12-30 | 2022-04-08 | 上海商汤临港智能科技有限公司 | 视线方向确定方法、装置、电子设备及存储介质 |
CN113807119B (zh) * | 2020-05-29 | 2024-04-02 | 魔门塔(苏州)科技有限公司 | 一种人员注视位置检测方法及装置 |
CN113743172B (zh) * | 2020-05-29 | 2024-04-16 | 魔门塔(苏州)科技有限公司 | 一种人员注视位置检测方法及装置 |
CN112183200B (zh) * | 2020-08-25 | 2023-10-17 | 中电海康集团有限公司 | 一种基于视频图像的眼动追踪方法和系统 |
CN112749655B (zh) * | 2021-01-05 | 2024-08-02 | 风变科技(深圳)有限公司 | 视线追踪方法、装置、计算机设备和存储介质 |
CN113361441B (zh) * | 2021-06-18 | 2022-09-06 | 山东大学 | 基于头部姿态和空间注意力的视线区域估计方法及系统 |
CN113378777A (zh) * | 2021-06-30 | 2021-09-10 | 沈阳康慧类脑智能协同创新中心有限公司 | 基于单目摄像机的视线检测的方法和装置 |
CN113705550B (zh) * | 2021-10-29 | 2022-02-18 | 北京世纪好未来教育科技有限公司 | 一种训练方法、视线检测方法、装置和电子设备 |
CN114360042A (zh) * | 2022-01-07 | 2022-04-15 | 桂林电子科技大学 | 一种人眼注视方向预测方法及系统 |
CN116052264B (zh) * | 2023-03-31 | 2023-07-04 | 广州视景医疗软件有限公司 | 一种基于非线性偏差校准的视线估计方法及装置 |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101419664A (zh) * | 2007-10-25 | 2009-04-29 | 株式会社日立制作所 | 视线方向计测方法和视线方向计测装置 |
CN101489467A (zh) * | 2006-07-14 | 2009-07-22 | 松下电器产业株式会社 | 视线方向检测装置和视线方向检测方法 |
CN102547123A (zh) * | 2012-01-05 | 2012-07-04 | 天津师范大学 | 基于人脸识别技术的自适应视线跟踪系统及其跟踪方法 |
CN103246044A (zh) * | 2012-02-09 | 2013-08-14 | 联想(北京)有限公司 | 一种自动对焦方法、系统及具有该系统的照相机和摄像机 |
CN107193383A (zh) * | 2017-06-13 | 2017-09-22 | 华南师范大学 | 一种基于人脸朝向约束的二级视线追踪方法 |
CN109508679A (zh) * | 2018-11-19 | 2019-03-22 | 广东工业大学 | 实现眼球三维视线跟踪的方法、装置、设备及存储介质 |
CN111178278A (zh) * | 2019-12-30 | 2020-05-19 | 上海商汤临港智能科技有限公司 | 视线方向确定方法、装置、电子设备及存储介质 |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9563805B2 (en) * | 2014-09-02 | 2017-02-07 | Hong Kong Baptist University | Method and apparatus for eye gaze tracking |
JP6946831B2 (ja) | 2017-08-01 | 2021-10-13 | オムロン株式会社 | 人物の視線方向を推定するための情報処理装置及び推定方法、並びに学習装置及び学習方法 |
US20190212815A1 (en) * | 2018-01-10 | 2019-07-11 | Samsung Electronics Co., Ltd. | Method and apparatus to determine trigger intent of user |
CN108615014B (zh) * | 2018-04-27 | 2022-06-21 | 京东方科技集团股份有限公司 | 一种眼睛状态的检测方法、装置、设备和介质 |
CN110503068A (zh) | 2019-08-28 | 2019-11-26 | Oppo广东移动通信有限公司 | 视线估计方法、终端及存储介质 |
- 2019
  - 2019-12-30 CN CN201911403648.2A patent/CN111178278B/zh active Active
- 2020
  - 2020-12-04 WO PCT/CN2020/134049 patent/WO2021135827A1/zh active Application Filing
  - 2020-12-04 JP JP2022524710A patent/JP7309116B2/ja active Active
  - 2020-12-04 KR KR1020217034841A patent/KR20210140763A/ko not_active Application Discontinuation
Patent Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101489467A (zh) * | 2006-07-14 | 2009-07-22 | 松下电器产业株式会社 | 视线方向检测装置和视线方向检测方法 |
CN101419664A (zh) * | 2007-10-25 | 2009-04-29 | 株式会社日立制作所 | 视线方向计测方法和视线方向计测装置 |
CN102547123A (zh) * | 2012-01-05 | 2012-07-04 | 天津师范大学 | 基于人脸识别技术的自适应视线跟踪系统及其跟踪方法 |
CN103246044A (zh) * | 2012-02-09 | 2013-08-14 | 联想(北京)有限公司 | 一种自动对焦方法、系统及具有该系统的照相机和摄像机 |
CN107193383A (zh) * | 2017-06-13 | 2017-09-22 | 华南师范大学 | 一种基于人脸朝向约束的二级视线追踪方法 |
CN109508679A (zh) * | 2018-11-19 | 2019-03-22 | 广东工业大学 | 实现眼球三维视线跟踪的方法、装置、设备及存储介质 |
CN111178278A (zh) * | 2019-12-30 | 2020-05-19 | 上海商汤临港智能科技有限公司 | 视线方向确定方法、装置、电子设备及存储介质 |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20220222969A1 (en) * | 2021-01-13 | 2022-07-14 | Beihang University | Method for determining the direction of gaze based on adversarial optimization |
US12106606B2 (en) * | 2021-01-13 | 2024-10-01 | Beihang University | Method for determining the direction of gaze based on adversarial optimization |
Also Published As
Publication number | Publication date |
---|---|
CN111178278B (zh) | 2022-04-08 |
JP7309116B2 (ja) | 2023-07-18 |
CN111178278A (zh) | 2020-05-19 |
KR20210140763A (ko) | 2021-11-23 |
JP2022553776A (ja) | 2022-12-26 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2021135827A1 (zh) | 视线方向确定方法、装置、电子设备及存储介质 | |
Chen et al. | Fsrnet: End-to-end learning face super-resolution with facial priors | |
JP4829141B2 (ja) | 視線検出装置及びその方法 | |
EP4307233A1 (en) | Data processing method and apparatus, and electronic device and computer-readable storage medium | |
JP2019028843A (ja) | 人物の視線方向を推定するための情報処理装置及び推定方法、並びに学習装置及び学習方法 | |
US9508004B2 (en) | Eye gaze detection apparatus, computer-readable recording medium storing eye gaze detection program and eye gaze detection method | |
US10037624B2 (en) | Calibrating object shape | |
US10254831B2 (en) | System and method for detecting a gaze of a viewer | |
CN111723707B (zh) | 一种基于视觉显著性的注视点估计方法及装置 | |
CN106133649A (zh) | 使用双目注视约束的眼睛凝视跟踪 | |
WO2022257487A1 (zh) | 深度估计模型的训练方法, 装置, 电子设备及存储介质 | |
CN114503162A (zh) | 具有不确定性的特征点位置估计的图像处理系统和方法 | |
US9747695B2 (en) | System and method of tracking an object | |
EP3506149A1 (en) | Method, system and computer program product for eye gaze direction estimation | |
WO2021217937A1 (zh) | 姿态识别模型的训练方法及设备、姿态识别方法及其设备 | |
CN114333046A (zh) | 舞蹈动作评分方法、装置、设备和存储介质 | |
CN115841602A (zh) | 基于多视角的三维姿态估计数据集的构建方法及装置 | |
WO2015176502A1 (zh) | 一种图像特征的估计方法和设备 | |
JP2022140386A (ja) | 顔の姿勢を検出する装置及び方法、画像処理システム、並びに記憶媒体 | |
CN113903210A (zh) | 虚拟现实模拟驾驶方法、装置、设备和存储介质 | |
CN116386087B (zh) | 目标对象处理方法以及装置 | |
WO2020044630A1 (ja) | 検出器生成装置、モニタリング装置、検出器生成方法及び検出器生成プログラム | |
CN115953813B (zh) | 一种表情驱动方法、装置、设备及存储介质 | |
JP6952298B2 (ja) | 視線変換装置及び視線変換方法 | |
JP2011232845A (ja) | 特徴点抽出装置および方法 |
Legal Events

Date | Code | Title | Description |
---|---|---|---|
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 20909241; Country of ref document: EP; Kind code of ref document: A1 |
| ENP | Entry into the national phase | Ref document number: 20217034841; Country of ref document: KR; Kind code of ref document: A |
| ENP | Entry into the national phase | Ref document number: 2022524710; Country of ref document: JP; Kind code of ref document: A |
| NENP | Non-entry into the national phase | Ref country code: DE |
| 122 | Ep: pct application non-entry in european phase | Ref document number: 20909241; Country of ref document: EP; Kind code of ref document: A1 |
| 32PN | Ep: public notification in the ep bulletin as address of the addressee cannot be established | Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205 DATED 30/01/2023) |