CN114241179A - Sight estimation method based on self-learning - Google Patents

Sight estimation method based on self-learning

Info

Publication number
CN114241179A
Authority
CN
China
Prior art keywords
tree
network
probability
leaf
node
Prior art date
Legal status
Pending
Application number
CN202111480164.5A
Other languages
Chinese (zh)
Inventor
孟明明
潘力立
Current Assignee
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China
Priority to CN202111480164.5A
Publication of CN114241179A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/048 Activation functions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a sight line estimation method based on self-learning, belonging to the field of computer vision. The method first selects a deep regression forest as the basic framework and introduces two independent sub-networks for feature extraction, fusing the extracted features through a feature fusion network to improve the network's feature extraction capability. A regression forest structure is then introduced as the regression model to estimate the probability distribution of the sight line direction of the input image, from which the prediction result and the entropy of each sample are computed. Finally, the whole network model is trained with a self-learning method, in which the sample entropy is used to correct the ordering of the samples, completing the training of the whole model. The method fully exploits the advantages of the deep regression forest and the self-learning training strategy, improving the accuracy and robustness of the model on the sight line estimation task.

Description

Sight estimation method based on self-learning
Technical Field
The invention belongs to the field of computer vision and mainly relates to image-based sight line estimation; it is chiefly applied in film and television entertainment, human-computer interaction, machine vision understanding and similar areas.
Background
Sight line estimation refers to taking an image containing an eye region as input, analyzing and processing it with computer technology, and estimating the sight line direction of the eyes in the input image. Demand for sight line estimation is growing in film and television entertainment, human-computer interaction, machine vision understanding and other fields. For example, the sight line direction can be computed in real time from a camera to improve the efficiency of human-computer interaction; in behavior analysis in public places, sight line estimation can help analyze the visual behavior of a monitored subject. Existing sight line estimation methods fall mainly into model-based and appearance-based methods.
The model-based sight line estimation method is an early approach whose basic principle can be divided into three steps. First, the eye position is roughly extracted from the image using a classifier and the center of the eye is located using a shape-based method; second, the eye area is detected and a two-dimensional elliptical contour covering the eye area is modeled on the basis of the corneal limbus; third, the two-dimensional elliptical contour is back-projected into three-dimensional space to locate the optical axis of the eye, and the gaze direction is then estimated from the intersection of the optical axis with the screen. This approach relies on accurate modeling of the eye image, places high demands on the quality of the input image, has poor robustness to interference, and often fails to meet the required estimation accuracy. Reference: Wood E, Bulling A. EyeTab: Model-based gaze estimation on unmodified tablet computers. Proceedings of the Symposium on Eye Tracking Research and Applications, 2014: 207-.
The appearance-based sight line estimation method computes the sight line direction directly from eye images: a model is trained on a large number of labeled eye images so that it learns a mapping function that estimates the sight line direction directly from an eye image. Its advantages are that the complicated geometric modeling of the eye can be avoided, the quality requirements on the input eye image are reduced, and the estimation accuracy is improved. Its disadvantages are that training relies on a large number of accurately labeled images, the robustness of the model is not high, the estimation accuracy may drop significantly in cross-person scenarios, and effective cross-person transfer prediction cannot be performed. Reference: Fischer T, Chang H J, Demiris Y. RT-GENE: Real-time eye gaze estimation in natural environments. Proceedings of the European Conference on Computer Vision (ECCV), 2018: 334-.
In recent years, appearance-based sight line estimation has matured, and higher requirements have been placed on its accuracy and robustness. Existing methods still have problems in model training and cannot achieve sufficient accuracy and robustness. Addressing these shortcomings, the invention provides a sight line estimation method based on self-learning that markedly improves accuracy and robustness.
Disclosure of Invention
The invention discloses a sight line estimation method based on self-learning, which solves the problems of low sight line estimation precision and poor robustness in the prior art.
The method first selects a deep regression forest as the basic framework. Each training sample consists of a pair of left and right eye images, and each monocular image is normalized to a size of 36 × 60 × 3. A feature extraction network is constructed for each of the left and right eyes; the features extracted from the two eyes are taken as input to a feature fusion network, yielding a fusion feature vector, which in turn serves as the input feature of a regression forest that estimates the sight line direction of the input image. A self-learning strategy is introduced into the training of the model: the ordering of the samples is corrected based on sample uncertainty, and training samples are gradually added to the training process until the training of the model is complete. Once trained, the sight line direction can be estimated simply by feeding the left and right eye images into the trained network model. By exploiting the advantages of the deep regression forest and self-learning, the estimation accuracy and robustness of the model are improved. The overall structure of the algorithm is shown in fig. 1.
For the convenience of describing the present disclosure, certain terms are first defined.
Definition 1: normal distribution. Also known as the Gaussian distribution, it is a probability distribution of great importance in mathematics, physics, engineering and other fields, with significant influence on many aspects of statistics. A random variable $x$ follows a normal distribution if its probability density function satisfies

$$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$

where $\mu$ is the mathematical expectation and $\sigma^2$ is the variance of the distribution; this is commonly written $x \sim \mathcal{N}(\mu, \sigma^2)$.
Definition 2: the ReLU function. The rectified linear unit is an activation function commonly used in artificial neural networks, generally referring to the ramp function and its variants; its expression is f(x) = max(0, x).
Definition 3: the sigmoid function. Defined by the expression

$$\sigma(x) = \frac{1}{1 + e^{-x}}.$$
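For concreteness, the two activation functions defined above can be written in a few lines of Python:

```python
import math

def relu(x):
    # Rectified linear unit: f(x) = max(0, x)
    return max(0.0, x)

def sigmoid(x):
    # Logistic sigmoid: sigma(x) = 1 / (1 + e^(-x))
    return 1.0 / (1.0 + math.exp(-x))

print(relu(-2.0), relu(3.0))   # 0.0 3.0
print(sigmoid(0.0))            # 0.5
```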
Therefore, the technical scheme of the invention is a sight line estimation method based on self-learning, which comprises the following steps:
step 1: preprocessing the data set;
acquiring a data set, wherein the data set consists of images and their corresponding annotation information; extracting the left and right eye regions of each image according to the annotation information, and randomly shuffling the order of the left-right eye image pairs; finally, normalizing the pixel values of the images to the range [-1, 1];
step 2: constructing a convolutional neural network, wherein the convolutional neural network comprises a feature extraction network and a feature fusion network;
1) constructing a feature extraction network; the feature extraction network consists of two sub-networks with the same structure, and each sub-network takes a monocular image as input and outputs a feature vector; a sub-network is composed of 5 convolution blocks and 1 standard fully-connected layer, the convolution blocks consisting of 2, 3 and 3 standard convolutional layers respectively; a max-pooling layer with stride 2 is added between convolution blocks, another max-pooling layer with stride 2 follows the 5th convolution block, and finally a standard fully-connected layer outputs the corresponding feature vector; the standard convolutional layer, the standard fully-connected layer, the sub-networks and the feature extraction network are shown in fig. 3.
2) Constructing a feature fusion network; the feature fusion network takes the feature vectors corresponding to the left and right eyes as input and outputs the fusion feature vector; the feature fusion network is composed of 2 standard fully-connected layers and 1 fully-connected layer without activation; the two input feature vectors are first concatenated and then passed through the feature fusion network to output the fusion feature vector; the feature fusion network is shown in fig. 4.
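As a sanity check on the feature extraction network of step 2, the sketch below traces how the spatial size of a 36 × 60 monocular input shrinks through the five stride-2 max-pooling layers. It assumes the convolutional layers themselves preserve spatial size (e.g. 'same' padding); the patent does not state the padding, so that is an assumption:

```python
def pool(h, w):
    # 2x2 max pooling with stride 2: floor division halves each dimension.
    return h // 2, w // 2

h, w = 36, 60  # normalized monocular eye image is 36 x 60 x 3
shapes = [(h, w)]
for _ in range(5):  # one stride-2 pooling per convolution block
    h, w = pool(h, w)
    shapes.append((h, w))
print(shapes)  # [(36, 60), (18, 30), (9, 15), (4, 7), (2, 3), (1, 1)]
```

Under this assumption the spatial extent collapses to 1 × 1 after the fifth pooling, leaving the fully-connected layer to produce the feature vector.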
step 3: constructing a regression forest; the regression forest is composed of 5 complete binary trees, each of depth 6; each tree is composed of 31 internal nodes and 32 leaf nodes, each internal node has a splitting function, and each leaf node has a Gaussian distribution; the probability $s_n$ of moving to the left at the current internal node is calculated from the splitting function of the n-th internal node; after the left-move probabilities of all internal nodes are calculated, the arrival probability $w_\ell$ of each leaf node can be calculated starting from the root node; then, from the probability of reaching each leaf and the leaf distributions, the prediction result of the current tree is calculated; finally, the average of the 5 tree predictions is taken as the final sight line estimation result;
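The node counts in step 3 follow from the structure of a complete binary tree: a tree of depth 6 holds its 2^5 − 1 = 31 internal nodes on the first five levels and its 2^5 = 32 leaves on the last. A quick check:

```python
depth = 6  # levels of the complete binary tree
internal_nodes = 2 ** (depth - 1) - 1  # levels 1..5 hold the split nodes
leaf_nodes = 2 ** (depth - 1)          # level 6 holds the leaves
print(internal_nodes, leaf_nodes)      # 31 32
assert internal_nodes == 31 and leaf_nodes == 32
```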
and 4, step 4: an overall neural network; respectively extracting the feature vectors f of the left eye image and the right eye image by using the feature extraction network in the step 2l,fr(ii) a Then extracting the feature vector fl,frInputting as a feature fusion network to further obtain a fusion feature vector f; finally, calculating the left shift probability of each tree internal node in the regression forest based on the fusion feature vector and the splitting function, and further calculating a final prediction result; the general neural network structure is schematically shown in fig. 1.
step 5: designing a loss function; denote the i-th pair of left and right eye images obtained in step 1 by $x_i$, the label of the i-th pair by $y_i$, and the weight of the i-th sample pair by $v_i$; let $\Theta$ denote the parameters of the feature extraction network and the feature fusion network, and $\pi$ the parameters of the regression-forest leaf Gaussian distributions; the loss function can be represented as

$$\mathcal{L}(\Theta,\pi,v) = \sum_{i=1}^{N} v_i \bigl( \log p(y_i \mid x_i;\Theta,\pi) + \gamma H_i \bigr) + \lambda \sum_{i=1}^{N} v_i$$

where $p(y_i \mid x_i;\Theta,\pi)$ indicates the probability of $y_i$ under the current model parameters, $H_i$ denotes the entropy of the i-th sample pair, $\gamma$ denotes the weight coefficient of the entropy, and $\lambda$ is the control parameter of the learning process; both are hyper-parameters of the model; the goal of the entire model is to maximize the above loss function;
step 6: training the overall neural network based on self-learning; the training of the network model is completed according to the self-learning strategy;
and 7: and estimating the sight in the actual image by adopting the trained total neural network.
Further, the specific method of step 3 is as follows:
step 3.1: calculating the left-move probability of each internal node: the splitting function $s_n(x_i;\Theta): x_i \rightarrow [0,1]$ is determined by the network parameters $\Theta$ and maps an input sample $x_i$ to a scalar between 0 and 1, representing the probability that the sample is routed into the left subtree after reaching the current node; the concrete form of the splitting function is

$$s_n(x_i;\Theta) = \sigma\bigl(f_{\varphi(n)}(x_i;\Theta)\bigr)$$

where $\sigma(\cdot)$ is the sigmoid function, $\varphi(n)$ is an index function indicating which element of the fusion feature $f$ is selected at the n-th split node, and $f_{\varphi(n)}(x_i;\Theta)$ denotes the value of that element for sample $x_i$;
step 3.2: calculating the probability of reaching a leaf: for each sample pair, the probability of reaching each leaf node from the root node is calculated from the left-move probabilities of the split nodes; the arrival probability is given by

$$w_\ell(x_i;\Theta) = \prod_{n} s_n(x_i;\Theta)^{[\ell \in \mathcal{L}_{n_l}]} \bigl(1 - s_n(x_i;\Theta)\bigr)^{[\ell \in \mathcal{L}_{n_r}]}$$

where $[\cdot]$ is the indicator function, returning 1 if the condition is true and 0 otherwise, and $\mathcal{L}_{n_l}$ and $\mathcal{L}_{n_r}$ respectively denote the node sets of the subtrees rooted at the left and right children of split node $n$;
step 3.3: calculating the prediction result of a single tree: the distribution of leaf node $\ell$ is represented by a Gaussian $\mathcal{N}(y_i;\mu_\ell,\sigma_\ell^2)$, where $y_i$ denotes the sight line angle, $\mu_\ell$ the mean and $\sigma_\ell^2$ the variance of the Gaussian; considering that a tree is composed of multiple leaf nodes, the final prediction is the weighted average over all leaves according to the arrival probability:

$$p_{\mathcal{T}}(y_i \mid x_i;\Theta,\pi) = \sum_{\ell \in \mathcal{L}} w_\ell(x_i;\Theta)\, \mathcal{N}(y_i;\mu_\ell,\sigma_\ell^2)$$

where $w_\ell(x_i;\Theta)$ denotes the probability of reaching leaf $\ell$, $\mathcal{N}(y_i;\mu_\ell,\sigma_\ell^2)$ denotes the probability of $y_i$ under leaf $\ell$, and $\mathcal{L}$ denotes the set of leaves of tree $\mathcal{T}$;
step 3.4: calculating the prediction result of the regression forest: the final prediction for the sample is the average of the predictions of the individual trees, given by

$$p(y_i \mid x_i;\Theta,\pi) = \frac{1}{K} \sum_{k=1}^{K} p_{\mathcal{T}_k}(y_i \mid x_i;\Theta,\pi_k)$$

where $K$ denotes the number of trees in the regression forest, $p_{\mathcal{T}_k}$ is the prediction of the k-th tree, and $\pi_k$ is the leaf distribution parameter of the k-th tree;
further, the method for calculating the sample entropy in step 5 is as follows:
since a single tree is obtained by weighted summation of multiple leaf distributions, the integral of such a mixture gaussian distribution is non-trivial, where the lower bound of the single tree entropy is calculated to approximate the true value of the single tree entropy, which is calculated by:
$$H_{\mathcal{T}_k}(x_i) = \sum_{\ell \in \mathcal{L}} w_\ell(x_i;\Theta)\, \tfrac{1}{2} \log\bigl( 2\pi e\, \sigma_\ell^2 \bigr)$$

where $p_{\mathcal{T}_k}$ is the prediction of the k-th tree and $\pi_k$ is the leaf distribution parameter of the k-th tree; the entropy of the sample is then obtained from the average of the entropies of the trees, calculated by

$$H_i = \frac{1}{K} \sum_{k=1}^{K} H_{\mathcal{T}_k}(x_i)$$
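Because a Gaussian mixture's entropy has no closed form, the patent works with a lower bound on each tree's entropy. One standard such bound is the conditional entropy Σ_ℓ w_ℓ · ½ log(2πe σ_ℓ²); the sketch below assumes that form (an assumption on my part) and then averages over trees as described:

```python
import math

def tree_entropy_lower_bound(weights, variances):
    # H(mixture) >= sum_l w_l * H(N(mu_l, sigma_l^2))
    #             = sum_l w_l * 0.5 * log(2 * pi * e * sigma_l^2)
    return sum(w * 0.5 * math.log(2 * math.pi * math.e * v)
               for w, v in zip(weights, variances))

# Hypothetical leaf arrival probabilities and variances for K = 2 trees.
trees = [
    ([0.1, 0.4, 0.3, 0.2], [1.0, 0.5, 2.0, 1.5]),
    ([0.25, 0.25, 0.25, 0.25], [1.0, 1.0, 1.0, 1.0]),
]
# The sample entropy averages the per-tree entropies.
H = sum(tree_entropy_lower_bound(w, v) for w, v in trees) / len(trees)
print(round(H, 4))
```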
the innovation of the invention is that:
1) Two independent sub-networks are used to extract the features of the left and right eye images respectively, and the extracted features are fused through a feature fusion network, as shown in fig. 6.
2) A regression forest structure is introduced as the regression model to estimate the probability distribution of the sight line direction of the input image; the prediction result and the entropy of each sample are calculated from this distribution.
3) The self-learning paradigm is introduced to train the deep regression forest model; the ordering of the samples in self-learning is corrected using the sample uncertainty, improving the prediction accuracy and robustness of the model.
Drawings
FIG. 1 is a diagram of the main network structure of the method of the present invention.
FIG. 2 is a schematic diagram of a standard convolution block and a standard fully-connected block of the present invention.
Fig. 3 is a schematic diagram of a feature extraction network according to the present invention.
FIG. 4 is a schematic diagram of a feature fusion network according to the present invention.
FIG. 5 is a schematic view of a regression forest structure according to the present invention.
FIG. 6 is a flow chart of the model training algorithm for the self-learning of the present invention.
Detailed Description
Step 1: preprocessing the data set;
acquiring the MPIIGaze data set, which consists of images of 15 people and the corresponding annotation information, with 1500 images per person; extracting the left and right eye regions of each image according to the annotation information so that each monocular image has a size of 36 × 60 × 3, and randomly shuffling the order of the left-right eye image pairs; finally, normalizing the pixel values of the images to the range [-1, 1];
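The pixel normalization in step 1 maps 8-bit values from [0, 255] to [-1, 1]; a minimal sketch (the patent does not spell out the exact scaling, so the linear map below is an assumption):

```python
def normalize_pixels(img):
    # Linearly map 8-bit pixel values [0, 255] to [-1, 1].
    return [[p / 127.5 - 1.0 for p in row] for row in img]

out = normalize_pixels([[0, 64, 128, 255]])[0]
print([round(v, 3) for v in out])  # [-1.0, -0.498, 0.004, 1.0]
```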
step 2: constructing a convolutional neural network and a regression forest;
1) Constructing a feature extraction network. The feature extraction network consists of two sub-networks with the same structure, and each sub-network takes a monocular image as input and outputs a feature vector. A sub-network is composed of 5 convolution blocks and 1 standard fully-connected layer, the convolution blocks consisting of 2, 3 and 3 standard convolutional layers respectively; a max-pooling layer with stride 2 is added between convolution blocks, another max-pooling layer with stride 2 follows the 5th convolution block, and finally a standard fully-connected layer outputs the corresponding feature vector. The standard convolutional layer, the standard fully-connected layer, the sub-networks and the feature extraction network are shown in fig. 2.
2) Constructing a feature fusion network. The feature fusion network takes the feature vectors corresponding to the left and right eyes as input and outputs the fusion feature vector. It is composed of 2 standard fully-connected layers and 1 fully-connected layer without activation; the two input feature vectors are first concatenated and then passed through the feature fusion network to output the fusion feature vector. The feature fusion network is shown in fig. 4.
step 3: constructing a regression forest. The regression forest is composed of 5 complete binary trees, each of depth 6. Each tree is composed of 31 internal nodes and 32 leaf nodes; each internal node has a splitting function and each leaf node has a Gaussian distribution. The probability $s_n$ of moving to the left at the current internal node is calculated from the splitting function of the n-th internal node. After the left-move probabilities of all internal nodes have been calculated, the arrival probability $w_\ell$ of each leaf node can be calculated starting from the root node; then, from the probability of reaching each leaf and the leaf distributions, the prediction result of the current tree is calculated. Finally, the average of the 5 tree predictions is taken as the final sight line estimation result.
step 4: the overall neural network. The feature vectors $f_l, f_r$ of the left and right eye images are respectively extracted with the feature extraction network of step 2; the extracted feature vectors $f_l, f_r$ are then used as input to the feature fusion network to obtain the fusion feature vector $f$; finally, the left-move probability of each tree's internal nodes in the regression forest is calculated from the fusion feature vector and the splitting functions, and the final prediction result is calculated. The overall neural network structure is shown schematically in fig. 1.
step 5: designing a loss function. Denote the i-th pair of left and right eye images obtained in step 1 by $x_i$, the label of the i-th pair by $y_i$, and the weight of the i-th sample pair by $v_i$; let $\Theta$ denote the parameters of the feature extraction network and the feature fusion network, and $\pi$ the parameters of the regression-forest leaf Gaussian distributions. The loss function can be represented as

$$\mathcal{L}(\Theta,\pi,v) = \sum_{i=1}^{N} v_i \bigl( \log p(y_i \mid x_i;\Theta,\pi) + \gamma H_i \bigr) + \lambda \sum_{i=1}^{N} v_i$$

where $p(y_i \mid x_i;\Theta,\pi)$ indicates the probability of $y_i$ under the current model parameters, $H_i$ denotes the entropy of the i-th sample pair, $\gamma$ denotes the weight coefficient of the entropy, and $\lambda$ is the control parameter of the learning process; both are hyper-parameters of the model. The goal of the overall model is to maximize the above loss function.
Step 6: training a network model based on self-learning; and finishing the training of the network model according to a self-walking learning strategy, setting the total step number of the self-walking learning to be 6, and setting the number of the samples used from the step 1 to the step 6 to be 50%, 60%, 70%, 80%, 90% and 100% of the total sample number. Initializing lambda0,γ0Ensure that 50% of the data is added to the 1 st training. During each training step, the loss function in the step 5 is maximized, the network parameters and the regression forest parameters are updated, and after the training is finished, the lambda and the gamma are adjusted to ensure that samples with corresponding proportions are added to the next stepAnd (5) a one-step training process. A flow chart of a model training algorithm based on self-learning is shown in fig. 3.
step 7: the testing stage. An image to be tested is taken and preprocessed according to the method of step 1; the preprocessed image pair is then used as input to the model trained in step 6, yielding the sight line estimation result for the test image. In experiments the mean error on the MPIIGaze data set was 4.45°, an improvement of 0.17° over previous methods.
Further, the specific method of step 3 is as follows:
step 3.1: calculating the left-move probability of each internal node: the splitting function $s_n(x_i;\Theta): x_i \rightarrow [0,1]$ is determined by the network parameters $\Theta$ and maps an input sample $x_i$ to a scalar between 0 and 1, characterizing how likely the sample is to be routed into the left subtree after reaching the current node. The concrete form of the splitting function is

$$s_n(x_i;\Theta) = \sigma\bigl(f_{\varphi(n)}(x_i;\Theta)\bigr)$$

where $\sigma(\cdot)$ is the sigmoid function, $\varphi(n)$ is an index function indicating which element of the fusion feature $f$ is selected at the n-th split node, and $f_{\varphi(n)}(x_i;\Theta)$ denotes the value of that element for sample $x_i$.
Step 3.2: calculate the probability of reaching a leaf: for each sample pair, calculating the probability of arriving at each leaf node from the root node according to the left-shift probability of the split node, wherein the calculation of the arrival probability is given by the following formula:
Figure BDA0003393988260000077
wherein [. ]]Is an indication function, if true returns 1, otherwise returns 0;
Figure BDA0003393988260000081
respectively representing node sets of subtrees taking left and right children of the split node n as root nodes.
Step 3.3: calculating the prediction result of a single tree: by Gaussian distribution
Figure BDA0003393988260000082
The distribution state of the leaf nodes is represented, and considering that a tree is composed of a plurality of leaf nodes, the final prediction result is represented by the weighted average of all the leaves according to the arrival probability, and the form of the final prediction result is as follows:
Figure BDA0003393988260000083
step 3.4: calculating the prediction result of the regression forest: the final prediction for the sample is the average of the predictions of the individual trees, given by

$$p(y_i \mid x_i;\Theta,\pi) = \frac{1}{K} \sum_{k=1}^{K} p_{\mathcal{T}_k}(y_i \mid x_i;\Theta,\pi_k)$$
further, the specific method of step 5 is as follows:
step 5.1: calculating the prediction result of the sample: according to the method of step 3, the prediction result of the regression forest, $p(y_i \mid x_i;\Theta,\pi)$, is calculated.
step 5.2: calculating the entropy of the sample: since a single tree's prediction is a weighted sum of several leaf distributions, the entropy integral of such a Gaussian mixture is non-trivial; a lower bound on the single-tree entropy is therefore calculated to approximate its true value:

$$H_{\mathcal{T}_k}(x_i) = \sum_{\ell \in \mathcal{L}} w_\ell(x_i;\Theta)\, \tfrac{1}{2} \log\bigl( 2\pi e\, \sigma_\ell^2 \bigr)$$

where $p_{\mathcal{T}_k}$ is the prediction of the k-th tree and $\pi_k$ is the leaf distribution parameter of the k-th tree. The entropy of the sample is then obtained from the average of the entropies of the trees, calculated by

$$H_i = \frac{1}{K} \sum_{k=1}^{K} H_{\mathcal{T}_k}(x_i)$$

Claims (3)

1. A sight line estimation method based on self-learning, comprising the following steps:
step 1: preprocessing the data set;
acquiring a data set, wherein the data set consists of images and their corresponding annotation information; extracting the left and right eye regions of each image according to the annotation information, and randomly shuffling the order of the left-right eye image pairs; finally, normalizing the pixel values of the images to the range [-1, 1];
step 2: constructing a convolutional neural network, wherein the convolutional neural network comprises a feature extraction network and a feature fusion network;
1) constructing a feature extraction network; the feature extraction network consists of two sub-networks with the same structure, and each sub-network takes a monocular image as input and outputs a feature vector; a sub-network is composed of 5 convolution blocks and 1 standard fully-connected layer, the convolution blocks consisting of 2, 3 and 3 standard convolutional layers respectively; a max-pooling layer with stride 2 is added between convolution blocks, another max-pooling layer with stride 2 follows the 5th convolution block, and finally a standard fully-connected layer outputs the corresponding feature vector;
2) constructing a feature fusion network; the feature fusion network takes the feature vectors corresponding to the left and right eyes as input and outputs the fusion feature vector; the feature fusion network is composed of 2 standard fully-connected layers and 1 fully-connected layer without activation; the two input feature vectors are first concatenated and then passed through the feature fusion network to output the fusion feature vector;
step 3: constructing a regression forest; the regression forest is composed of 5 complete binary trees, each of depth 6; each tree is composed of 31 internal nodes and 32 leaf nodes, each internal node has a splitting function, and each leaf node has a Gaussian distribution; the probability $s_n$ of moving to the left at the current internal node is calculated from the splitting function of the n-th internal node; after the left-move probabilities of all internal nodes are calculated, the arrival probability $w_\ell$ of each leaf node can be calculated starting from the root node; then, from the probability of reaching each leaf and the leaf distributions, the prediction result of the current tree is calculated; finally, the average of the 5 tree predictions is taken as the final sight line estimation result;
step 4: the overall neural network; the feature vectors $f_l$ and $f_r$ of the left-eye and right-eye images are extracted with the feature extraction network of step 2; the extracted feature vectors $f_l$ and $f_r$ are then fed into the feature fusion network to obtain the fusion feature vector $f$; finally, the left-move probability of every internal tree node in the regression forest is computed from the fusion feature vector and the splitting functions, and from these the final prediction result is computed;
step 5: designing the loss function; denote the ith pair of left and right eye images obtained in step 1 by $x_i$, the label of the ith pair by $y_i$, the weight of the ith sample pair by $v_i$, the parameters of the feature extraction network and the feature fusion network by $\theta$, and the parameters of the leaf Gaussian distributions of the regression forest by $\pi$; the loss function can then be expressed as:
$$\mathcal{L}(\theta,\pi,v)=\sum_{i=1}^{N}v_i\left[\log p(y_i\mid x_i;\theta,\pi)+\gamma H_i\right]+\lambda\sum_{i=1}^{N}v_i$$
where $p(y_i\mid x_i;\theta,\pi)$ denotes the probability of $y_i$ under the current model parameters, $H_i$ denotes the entropy of the ith sample pair, $\gamma$ is the weight coefficient of the entropy term, and $\lambda$ is the pace parameter controlling the learning process; both $\gamma$ and $\lambda$ are hyper-parameters of the model; the goal of the entire model is to maximize the above objective;
step 6: training the overall neural network based on self-learning; the training of the network model is completed according to the self-learning strategy;
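The self-learning (self-paced) strategy of step 6 alternates between updating the sample weights and updating the model while the pace parameter $\lambda$ grows. Below is a toy sketch under strong assumptions that are ours, not the patent's: the regression forest is replaced by a single fixed-variance Gaussian with a closed-form weighted update, the entropy term $\gamma H_i$ is dropped, and the binary weight rule $v_i = 1$ iff $\log p_i > -\lambda$ is the standard closed-form solution for a $\lambda\sum_i v_i$ regularizer:

```python
import numpy as np

rng = np.random.default_rng(1)
y = rng.normal(loc=2.0, scale=1.0, size=200)   # toy gaze-angle labels

mu = 0.0          # single-Gaussian stand-in for the regression forest
lam = 1.5         # pace parameter lambda

def log_lik(y, mu, var=1.0):
    """Gaussian log-likelihood; variance kept fixed to keep the toy stable."""
    return -0.5 * np.log(2 * np.pi * var) - (y - mu) ** 2 / (2 * var)

for _ in range(10):
    # Weight step: easy samples (high likelihood) are admitted first.
    ll = log_lik(y, mu)
    v = (ll > -lam).astype(float)
    if v.sum() == 0:
        v[np.argmax(ll)] = 1.0                 # always keep the easiest sample
    # Model step: weighted maximum likelihood on the selected samples.
    mu = (v * y).sum() / v.sum()
    # Pace step: grow lambda so harder samples enter in later rounds.
    lam *= 1.5

print(round(mu, 2), v.mean())
```

After a few rounds all samples are selected and the model converges to the ordinary weighted estimate; the real method replaces the closed-form update with gradient training of $\theta$ and $\pi$.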
step 7: estimating the sight line in actual images with the trained overall neural network.
2. The sight line estimation method based on self-learning according to claim 1, wherein the specific method of the step 3 is as follows:
step 3.1: computing the left-move probability of each internal node: the splitting function $s_n(x_i;\theta): x_i \rightarrow [0,1]$ is determined by the network parameters $\theta$; it maps the input sample $x_i$ to a scalar between 0 and 1 that represents the probability of routing the sample into the left subtree after it reaches the current node; the splitting function has the concrete form:
$$s_n(x_i;\theta)=\sigma\big(f_{\varphi(n)}(x_i;\theta)\big)$$

where $\sigma(\cdot)$ is the sigmoid function, $\varphi(\cdot)$ is the index function that selects, at the nth split node, the $\varphi(n)$th element of the fusion feature $f$, and $f_{\varphi(n)}(x_i;\theta)$ denotes the value of that element for sample $x_i$ at the nth split node;
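A minimal sketch of step 3.1 (illustrative only; the feature size and the random assignment of feature elements to split nodes are our assumptions, since the claim does not fix how $\varphi$ is chosen):

```python
import numpy as np

rng = np.random.default_rng(2)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

F_DIM = 128          # assumed size of the fusion feature vector f
N_INTERNAL = 31      # internal nodes of one depth-6 complete binary tree

# Index function phi: a fixed assignment of one feature element per split node.
phi = rng.integers(0, F_DIM, size=N_INTERNAL)

f = rng.normal(size=F_DIM)      # fusion feature of one sample x_i
s = sigmoid(f[phi])             # s_n = sigma(f_phi(n)(x_i)) for every node n
print(s.shape)                  # → (31,)
```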
step 3.2: computing the probability of reaching a leaf: for each sample pair, the probability of travelling from the root node to each leaf node is computed from the left-move probabilities of the split nodes; the arrival probability is given by:
$$\omega_l(x_i\mid\theta)=\prod_{n\in\mathcal{N}} s_n(x_i;\theta)^{[\,l\in\mathcal{L}_{n_{\mathrm{left}}}\,]}\big(1-s_n(x_i;\theta)\big)^{[\,l\in\mathcal{L}_{n_{\mathrm{right}}}\,]}$$

where $\mathcal{N}$ is the set of split nodes, $[\cdot]$ is the indicator function, returning 1 if its argument is true and 0 otherwise, and $\mathcal{L}_{n_{\mathrm{left}}}$ and $\mathcal{L}_{n_{\mathrm{right}}}$ denote the node sets of the subtrees rooted at the left and right children of split node $n$;
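Step 3.2 can be checked with a small sketch: for a heap-ordered complete binary tree, multiplying $s_n$ (go left) or $1-s_n$ (go right) along each root-to-leaf path yields arrival probabilities that sum to 1. The split probabilities here are random stand-ins for the network outputs:

```python
import numpy as np

rng = np.random.default_rng(3)

LEVELS = 5                       # 5 internal levels -> 31 split nodes, 32 leaves
N_INTERNAL = 2 ** LEVELS - 1
N_LEAVES = 2 ** LEVELS

# Stand-in left-move probabilities s_n in heap order (children of n: 2n+1, 2n+2).
s = rng.uniform(0.05, 0.95, size=N_INTERNAL)

def leaf_arrival(s):
    """omega_l for each leaf: product of s_n (left) or 1-s_n (right) on its path."""
    w = np.ones(N_LEAVES)
    for l in range(N_LEAVES):
        node = 0
        for bit in reversed(range(LEVELS)):   # leaf index bits encode the turns
            go_right = (l >> bit) & 1
            w[l] *= (1.0 - s[node]) if go_right else s[node]
            node = 2 * node + 1 + go_right
    return w

w = leaf_arrival(s)
print(round(float(w.sum()), 6))   # → 1.0  (the omega_l form a distribution)
```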
step 3.3: computing the prediction of a single tree: a Gaussian distribution $\pi_l=\mathcal{N}(y_i;\mu,\sigma^2)$ represents the distribution of a leaf node, where $y_i$ denotes the sight angle, $\mu$ the mean of the Gaussian and $\sigma^2$ its variance; since a tree consists of multiple leaf nodes, the final prediction is expressed as the weighted average of all leaves according to their arrival probabilities, in the form:
$$p_{\mathcal{T}}(y_i\mid x_i;\theta,\pi)=\sum_{l\in\mathcal{L}}\omega_l(x_i\mid\theta)\,p_l(y_i)$$

where $\omega_l(x_i\mid\theta)$ denotes the probability of reaching leaf $l$, $p_l(y_i)$ denotes the probability of $y_i$ under the distribution of leaf $l$, and $\mathcal{L}$ denotes the set of leaves of tree $\mathcal{T}$;
step 3.4: computing the prediction of the regression forest: the final prediction for a sample is the average of the predictions of the individual trees, given by:

$$p(y_i\mid x_i;\theta,\Pi)=\frac{1}{K}\sum_{k=1}^{K}p_{\mathcal{T}_k}(y_i\mid x_i;\theta,\pi_k)$$

where $K$ denotes the number of trees in the regression forest, $p_{\mathcal{T}_k}(y_i\mid x_i;\theta,\pi_k)$ is the prediction of the kth tree, and $\pi_k$ is the leaf distribution parameter of the kth tree;
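Steps 3.3 and 3.4 together define the forest's predictive density. The sketch below is illustrative only: the split probabilities and leaf Gaussians are random stand-ins, not learned parameters:

```python
import numpy as np

rng = np.random.default_rng(4)

LEVELS, K = 5, 5                     # depth-6 trees (31 splits, 32 leaves), 5 trees
N_INTERNAL, N_LEAVES = 2**LEVELS - 1, 2**LEVELS

def gauss_pdf(y, mu, var):
    return np.exp(-(y - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

def leaf_arrival(s):
    """omega_l for every leaf of one tree (heap-ordered split probabilities)."""
    w = np.ones(N_LEAVES)
    for l in range(N_LEAVES):
        node = 0
        for bit in reversed(range(LEVELS)):
            go_right = (l >> bit) & 1
            w[l] *= (1.0 - s[node]) if go_right else s[node]
            node = 2 * node + 1 + go_right
    return w

# Random stand-ins for what the network and leaf parameters would supply.
s_all   = rng.uniform(0.05, 0.95, size=(K, N_INTERNAL))   # split probabilities
mu_all  = rng.normal(scale=10.0, size=(K, N_LEAVES))      # leaf means (degrees)
var_all = rng.uniform(0.5, 2.0, size=(K, N_LEAVES))       # leaf variances

def tree_density(y, s, mu, var):
    """Step 3.3: p_T(y|x) as the omega-weighted mixture of leaf Gaussians."""
    return float((leaf_arrival(s) * gauss_pdf(y, mu, var)).sum())

def forest_density(y):
    """Step 3.4: average of the K single-tree predictions."""
    return float(np.mean([tree_density(y, s_all[k], mu_all[k], var_all[k])
                          for k in range(K)]))

print(forest_density(0.0) > 0.0)     # → True
```

Because each tree is a proper mixture of Gaussians, the forest density integrates to 1 over the gaze angle.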
3. The sight line estimation method based on self-learning according to claim 1, wherein the sample entropy in step 5 is calculated as follows:
since the prediction of a single tree is a weighted sum of several leaf distributions, the entropy of such a Gaussian mixture has no closed-form integral; a lower bound of the single-tree entropy is therefore used to approximate its true value, calculated by:
$$H_i^{\mathcal{T}_k}\ge-\sum_{l\in\mathcal{L}_k}\omega_l(x_i\mid\theta)\log\sum_{m\in\mathcal{L}_k}\omega_m(x_i\mid\theta)\,\mathcal{N}\big(\mu_l;\mu_m,\sigma_l^{2}+\sigma_m^{2}\big)$$

where $H_i^{\mathcal{T}_k}$ is the entropy of the prediction of the kth tree and $\pi_k$ is the leaf distribution parameter of the kth tree; the entropy of the sample is then obtained as the average of the entropies of the trees, calculated by:
$$H_i=\frac{1}{K}\sum_{k=1}^{K}H_i^{\mathcal{T}_k}$$
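Assuming the single-tree bound referred to above is the standard entropy lower bound for Gaussian mixtures due to Huber et al. (an assumption on our part), $H \ge -\sum_l \omega_l \log \sum_m \omega_m \mathcal{N}(\mu_l;\mu_m,\sigma_l^2+\sigma_m^2)$, it can be evaluated in closed form and sanity-checked against a Monte-Carlo estimate of the true entropy:

```python
import numpy as np

rng = np.random.default_rng(5)

# Stand-in mixture for one tree: arrival probabilities and leaf Gaussians.
L = 8
w = rng.dirichlet(np.ones(L))            # omega_l, summing to 1
mu = rng.normal(scale=5.0, size=L)       # leaf means
var = rng.uniform(0.5, 2.0, size=L)      # leaf variances

def gauss_pdf(y, m, v):
    return np.exp(-(y - m) ** 2 / (2 * v)) / np.sqrt(2 * np.pi * v)

def entropy_lower_bound(w, mu, var):
    """Huber-style bound: -sum_l w_l log sum_m w_m N(mu_l; mu_m, var_l+var_m)."""
    inner = np.array([(w * gauss_pdf(mu[l], mu, var[l] + var)).sum()
                      for l in range(len(w))])
    return float(-(w * np.log(inner)).sum())

def entropy_monte_carlo(w, mu, var, n=100_000):
    """Unbiased Monte-Carlo estimate of the true mixture entropy."""
    comp = rng.choice(len(w), size=n, p=w)
    y = rng.normal(mu[comp], np.sqrt(var[comp]))
    dens = (w[None, :] * gauss_pdf(y[:, None], mu[None, :], var[None, :])).sum(axis=1)
    return float(-np.log(dens).mean())

lb = entropy_lower_bound(w, mu, var)
mc = entropy_monte_carlo(w, mu, var)
print(lb <= mc)   # → True: the bound never exceeds the true entropy
```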
CN202111480164.5A 2021-12-06 2021-12-06 Sight estimation method based on self-learning Pending CN114241179A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111480164.5A CN114241179A (en) 2021-12-06 2021-12-06 Sight estimation method based on self-learning


Publications (1)

Publication Number Publication Date
CN114241179A true CN114241179A (en) 2022-03-25

Family

ID=80753446


Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106599994A * 2016-11-23 2017-04-26 University of Electronic Science and Technology of China Sight line estimation method based on depth regression network
CN108765409A * 2018-06-01 2018-11-06 University of Electronic Science and Technology of China Screening method for candidate nodules based on CT images
CN110516537A * 2019-07-15 2019-11-29 University of Electronic Science and Technology of China Face age estimation method based on self-paced learning
CN111414875A * 2020-03-26 2020-07-14 University of Electronic Science and Technology of China Three-dimensional point cloud head attitude estimation system based on depth regression forest
WO2021022970A1 * 2019-08-05 2021-02-11 Qingdao University of Technology Multi-layer random forest-based part recognition method and system


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
LILI PAN et al.: "Self-paced deep regression forests with consideration on underrepresented examples" *
TOBIAS FISCHER et al.: "RT-GENE: Real-time eye gaze estimation in natural environments" *
SHAN XINGHUA et al.: "Driver gaze estimation method based on improved random forest" *


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination