CN114241179A - Sight estimation method based on self-learning - Google Patents
- Publication number
- CN114241179A (application CN202111480164.5A)
- Authority
- CN
- China
- Prior art keywords
- tree
- network
- probability
- leaf
- node
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition; G06F18/20—Analysing; G06F18/25—Fusion techniques; G06F18/253—Fusion techniques of extracted features
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models; G06N3/02—Neural networks; G06N3/04—Architecture, e.g. interconnection topology; G06N3/045—Combinations of networks; G06N3/048—Activation functions; G06N3/08—Learning methods
Abstract
The invention discloses a sight line estimation method based on self-learning, belonging to the field of computer vision. The method first selects a deep regression forest as the basic framework and introduces two independent sub-networks for feature extraction; the extracted features are combined by a feature fusion network, improving the feature extraction capability of the network. A regression forest structure is then introduced as the regression model to estimate the probability distribution of the gaze direction of the input image, and the prediction result and the entropy of each sample are computed from this distribution. Finally, the whole network model is trained with a self-learning method, in which the entropy of the samples is used to correct the order of the samples in the self-learning ranking, completing the training of the whole model. By this method the advantages of the deep regression forest and of the self-learning training method are fully exploited, and the accuracy and robustness of the model on the sight estimation task are improved.
Description
Technical Field
The invention belongs to the field of computer vision and mainly relates to the problem of image-based sight line estimation; it is chiefly applied in the film and television entertainment industry, human-computer interaction, machine vision understanding and similar areas.
Background
The sight line estimation refers to inputting an image including an eye region, analyzing and processing the image by using a computer technology, and estimating the sight line direction of eyes in the input image. At present, the demand for line of sight estimation is increasing in the fields of movie and television entertainment, human-computer interaction, machine vision understanding and the like. For example, the direction of the sight line can be calculated in real time through the camera, and the efficiency of man-machine interaction is improved; in behavior analysis in public places, visual behavior and the like of a monitored object can be better analyzed in an auxiliary manner through sight line estimation. The existing sight line estimation methods are mainly divided into methods based on model estimation and methods based on appearance estimation.
The model-based gaze estimation method is an early approach whose basic principle can be divided into three steps. The first step roughly extracts the eye position from the image with a classifier and locates the eye center with a shape-based method; the second step detects the eye region and models a two-dimensional elliptical contour covering it on the basis of the corneal limbus; the third step back-projects the two-dimensional elliptical contour into three-dimensional space to locate the optical axis of the eye, and then estimates the gaze direction from the intersection of the optical axis with the screen. The method relies on accurate modeling of the eye image, places high demands on the quality of the input image, has poor robustness to interference, and often fails to meet the required estimation accuracy. Reference: Wood E, Bulling A. EyeTab: Model-based gaze estimation on unmodified tablet computers. Proceedings of the Symposium on Eye Tracking Research and Applications. 2014: 207-210.
The appearance-based sight line estimation method computes the gaze direction directly from eye images: a model is trained on a large number of labeled eye images so that it learns a mapping function estimating the gaze direction directly from the images. Its advantages are that the complicated modeling of eye geometry is avoided, the quality requirements on the input eye images are relaxed, and the estimation accuracy is improved. Its disadvantages are that training relies on a large number of accurately labeled images, the robustness of the model is limited, the estimation accuracy may drop significantly in cross-person scenarios, and effective cross-person transfer prediction cannot be guaranteed. Reference: Fischer T, Chang H J, Demiris Y. RT-GENE: Real-time eye gaze estimation in natural environments. Proceedings of the European Conference on Computer Vision (ECCV). 2018.
In recent years appearance-based sight line estimation has matured, and higher demands are placed on its accuracy and robustness. Existing methods still have problems in model training and cannot achieve sufficient accuracy and robustness. In view of these shortcomings, the invention provides a sight line estimation method based on self-learning, which markedly improves both.
Disclosure of Invention
The invention discloses a sight line estimation method based on self-learning, which solves the problems of low sight line estimation precision and poor robustness in the prior art.
The method first selects a deep regression forest as the basic framework. Each training sample consists of a pair of left and right eye images, and each monocular image is normalized to a size of 36 × 60 × 3. A feature extraction network is constructed for each of the left and right eyes; the features they extract are fed to a feature fusion network, which produces a fused feature vector. This fused feature vector serves as the input feature of a regression forest, which estimates the gaze direction of the input image. A self-learning strategy is introduced during training: the order of the samples is corrected according to their uncertainty, and training samples are added to the training process gradually until training is complete. Once the model is trained, the gaze direction is estimated simply by feeding the left and right eye images into the trained network. By combining the advantages of the deep regression forest and of self-learning, the proposed method improves the estimation accuracy and robustness of the model. The general structure of the algorithm is shown in fig. 1.
For the convenience of describing the present disclosure, certain terms are first defined.
Definition 1: the normal distribution, also known as the Gaussian distribution, is a probability distribution of great importance in mathematics, physics, engineering and related fields, and influential in many aspects of statistics. A random variable $x$ satisfies the normal distribution if its probability density function is $f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$, where $\mu$ is the mathematical expectation of the distribution and $\sigma^2$ its variance; this is commonly written $x \sim N(\mu, \sigma^2)$.
Definition 2: the ReLU function. The rectified linear unit is an activation function commonly used in artificial neural networks; it generally refers to the ramp function and its variants, with expression $f(x) = \max(0, x)$.
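The two definitions above translate directly into code; the sketch below (plain Python, illustrative only) implements the normal density of Definition 1 and the ReLU of Definition 2.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Probability density of N(mu, sigma^2) at x (Definition 1)."""
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return coeff * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))

def relu(x):
    """Rectified linear unit, f(x) = max(0, x) (Definition 2)."""
    return max(0.0, x)

# The standard normal density peaks at x = mu with value 1/sqrt(2*pi).
peak = normal_pdf(0.0)
```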
Therefore, the technical scheme of the invention is a sight line estimation method based on self-learning, which comprises the following steps:
step 1: preprocessing the data set;
acquiring a data set consisting of images and their corresponding annotation information; extracting the left and right eye regions of each image according to the annotations, and randomly shuffling the order of the left-right eye image pairs; finally, normalizing the pixel values of each picture to the range [-1, 1];
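Step 1 amounts to scaling 8-bit pixel values into [-1, 1] and shuffling the sample pairs; a minimal sketch (plain Python; the `shuffle_pairs` helper name and the fixed seed are illustrative choices, not prescribed by the text):

```python
import random

def normalize_pixels(img_rows):
    """Map 8-bit pixel values in [0, 255] linearly onto [-1, 1]."""
    return [[p / 255.0 * 2.0 - 1.0 for p in row] for row in img_rows]

def shuffle_pairs(pairs, seed=0):
    """Randomly shuffle the list of (left_eye, right_eye, label) samples."""
    rng = random.Random(seed)
    shuffled = list(pairs)
    rng.shuffle(shuffled)
    return shuffled
```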
step 2: constructing a convolutional neural network, wherein the convolutional neural network comprises a feature extraction network and a feature fusion network;
1) constructing a feature extraction network; the feature extraction network consists of two sub-networks with the same structure, each of which receives a monocular image as input and outputs a feature vector; a sub-network consists of 5 convolution blocks and 1 standard fully-connected layer, the 5 convolution blocks being composed of 2, 3 and 3 standard convolutional layers respectively; a max-pooling layer with stride 2 is inserted between consecutive blocks, another stride-2 max-pooling layer follows the 5th block, and a final standard fully-connected layer outputs the corresponding feature vector; the standard convolutional layer, the standard fully-connected layer, the sub-networks and the feature extraction network are shown in fig. 3.
2) Constructing a feature fusion network; the feature fusion network takes the feature vectors of the left and right eyes as input and outputs a fused feature vector; it consists of 2 standard fully-connected layers and 1 fully-connected layer without an activation function; the two input feature vectors are first concatenated and then passed through the network to produce the fused feature vector; the feature fusion network is shown in fig. 4.
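The spatial dimensions through such a sub-network are easy to check. Assuming the convolutions are padded so that only the stride-2 max-pooling layers change the spatial size (a common convention; the text does not state the padding), a 36 × 60 input shrinks as follows:

```python
def pooled_size(h, w, n_pools, stride=2):
    """Spatial size after n stride-2 max-pool layers (floor semantics)."""
    for _ in range(n_pools):
        h, w = h // stride, w // stride
    return h, w

# A 36 x 60 monocular eye image after the 5 pooling stages:
# 36x60 -> 18x30 -> 9x15 -> 4x7 -> 2x3 -> 1x1
final = pooled_size(36, 60, 5)
```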
Step 3: constructing a regression forest; the regression forest consists of 5 complete binary trees, each of depth 6; each tree is composed of 31 internal nodes and 32 leaf nodes, every internal node has a splitting function, and every leaf node has a Gaussian distribution; the probability $s_n$ that a sample moves to the left child is computed from the splitting function of the n-th internal node; after the left-move probabilities of all internal nodes have been computed, the arrival probability $w_\ell$ of each leaf node is computed starting from the root, and the prediction of the current tree is then computed from the leaf arrival probabilities and the leaf distributions; finally, the average of the 5 trees' predictions is taken as the sight estimation result;
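The node counts quoted in step 3 follow from the tree being complete: with 6 levels, the first 5 levels hold the internal nodes and the last level the leaves. A quick check:

```python
def tree_node_counts(depth):
    """Node counts of a complete binary tree with `depth` levels:
    levels 1..depth-1 are internal nodes, level `depth` holds the leaves."""
    internal = 2 ** (depth - 1) - 1
    leaves = 2 ** (depth - 1)
    return internal, leaves

# Depth-6 trees, as used in the regression forest: 31 internal nodes, 32 leaves.
counts = tree_node_counts(6)
```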
and 4, step 4: an overall neural network; respectively extracting the feature vectors f of the left eye image and the right eye image by using the feature extraction network in the step 2l,fr(ii) a Then extracting the feature vector fl,frInputting as a feature fusion network to further obtain a fusion feature vector f; finally, calculating the left shift probability of each tree internal node in the regression forest based on the fusion feature vector and the splitting function, and further calculating a final prediction result; the general neural network structure is schematically shown in fig. 1.
Step 5: designing the loss function; denote the i-th pair of left and right eye images obtained in step 1 by $x_i$, its label by $y_i$, and the weight of the i-th sample pair by $v_i$; let $\Theta$ be the parameters of the feature extraction and feature fusion networks and $\pi$ the parameters of the leaf Gaussian distributions of the regression forest; the loss function can then be written as
$$F(\Theta, \pi, v) = \sum_i v_i \left( \log p(y_i \mid x_i; \Theta, \pi) - \gamma H_i \right) + \lambda \sum_i v_i,$$
where $p(y_i \mid x_i; \Theta, \pi)$ is the probability of the label $y_i$ under the current model parameters, $H_i$ is the entropy of the i-th sample pair, $\gamma$ is the weight coefficient of the entropy, and $\lambda$ is the control parameter of the learning pace; $\gamma$ and $\lambda$ are both hyper-parameters of the model; the goal of the entire model is to maximize this loss function;
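The objective in step 5 combines a per-sample data term with the usual self-paced regularizer. The sketch below is an illustrative plain-Python rendering; the closed-form rule for the binary weights $v_i$ is the standard self-paced learning solution and is an assumption here, since the text only names $\gamma$ and $\lambda$ as hyper-parameters.

```python
def self_paced_objective(log_probs, entropies, weights, gamma, lam):
    """F(Theta, pi, v) = sum_i v_i * (log p_i - gamma * H_i) + lam * sum_i v_i,
    to be maximized over the model parameters and the sample weights v."""
    data_term = sum(v * (lp - gamma * h)
                    for v, lp, h in zip(weights, log_probs, entropies))
    pace_term = lam * sum(weights)
    return data_term + pace_term

def optimal_weights(log_probs, entropies, gamma, lam):
    """With binary v_i, the maximizing choice is v_i = 1 exactly when the
    corrected likelihood log p_i - gamma * H_i exceeds -lam (easy samples first)."""
    return [1.0 if (lp - gamma * h) > -lam else 0.0
            for lp, h in zip(log_probs, entropies)]
```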
Step 6: training the overall neural network based on self-learning; the training of the network model is completed according to the self-learning strategy;
Step 7: estimating the sight direction in actual images using the trained overall neural network.
Further, the specific method of step 3 is as follows:
Step 3.1: computing the left-move probability of each internal node: the splitting function $s_n(x_i; \Theta): x_i \to [0, 1]$ is determined by the network parameters $\Theta$ and maps the input sample $x_i$ to a scalar between 0 and 1 representing the probability that the sample is routed into the left subtree after reaching the current node; the concrete form of the splitting function is
$$s_n(x_i; \Theta) = \sigma\!\left(f_{\varphi(n)}(x_i; \Theta)\right),$$
where $\sigma(\cdot)$ is the sigmoid function, $\varphi(n)$ is an index function that selects one element of the fused feature $f$ at the n-th split node, and $f_{\varphi(n)}(x_i; \Theta)$ is the value of that element for sample $x_i$;
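In code, the splitting function of step 3.1 is just a sigmoid applied to one coordinate of the fused feature vector; `phi_n` below stands for the value of the index function at node n (names are illustrative):

```python
import math

def sigmoid(z):
    """Logistic sigmoid, mapping any real value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def split_probability(fused_feature, phi_n):
    """s_n(x_i; Theta): probability of routing the sample into the left subtree,
    obtained by applying the sigmoid to the feature element selected by phi(n)."""
    return sigmoid(fused_feature[phi_n])
```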
Step 3.2: computing the probability of reaching a leaf: for each sample pair, the probability of travelling from the root to each leaf node is computed from the left-move probabilities of the split nodes:
$$w_\ell(x_i \mid \Theta) = \prod_{n} s_n(x_i; \Theta)^{[\ell \in L_n]} \left(1 - s_n(x_i; \Theta)\right)^{[\ell \in R_n]},$$
where $[\cdot]$ is the indicator function, returning 1 if its argument is true and 0 otherwise, and $L_n$, $R_n$ denote the node sets of the subtrees rooted at the left and right children of split node $n$, respectively;
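Step 3.2 can be computed without the indicator-product notation by walking the tree once from the root, multiplying $s_n$ along left edges and $(1 - s_n)$ along right edges. The sketch assumes the complete tree is stored in heap order (children of node n at 2n+1 and 2n+2), which is an implementation choice, not something the text prescribes:

```python
def leaf_arrival_probabilities(split_probs):
    """Given left-move probabilities s_n for the internal nodes of a complete
    binary tree in heap order, return the arrival probability w_l of every leaf:
    the product of s_n along left edges and (1 - s_n) along right edges."""
    n_internal = len(split_probs)
    n_nodes = 2 * n_internal + 1
    reach = [0.0] * n_nodes
    reach[0] = 1.0  # the sample always reaches the root
    for n in range(n_internal):
        reach[2 * n + 1] = reach[n] * split_probs[n]          # left child
        reach[2 * n + 2] = reach[n] * (1.0 - split_probs[n])  # right child
    return reach[n_internal:]  # the trailing slots are the leaves
```

Because each internal node splits its mass between its two children, the leaf probabilities always sum to 1.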
Step 3.3: computing the prediction of a single tree: a Gaussian distribution $N(y_i; \mu_\ell, \sigma_\ell^2)$ represents the distribution of leaf node $\ell$, where $y_i$ is the sight angle value, $\mu_\ell$ the mean and $\sigma_\ell^2$ the variance of the Gaussian; since a tree is composed of several leaf nodes, its final prediction is the weighted average over all leaves according to the arrival probabilities:
$$p_{T}(y_i \mid x_i; \Theta, \pi) = \sum_{\ell \in L_T} w_\ell(x_i \mid \Theta)\, p_\ell(y_i),$$
where $w_\ell(x_i \mid \Theta)$ is the probability of reaching leaf $\ell$, $p_\ell(y_i)$ the probability of $y_i$ under leaf $\ell$, and $L_T$ the set of leaves of tree $T$;
Step 3.4: computing the prediction of the regression forest: the final prediction for a sample is the average of the individual tree predictions:
$$p(y_i \mid x_i; \Theta, \pi) = \frac{1}{K} \sum_{k=1}^{K} p_{T_k}(y_i \mid x_i; \Theta, \pi_k),$$
where $K$ is the number of trees in the regression forest, $p_{T_k}$ the prediction of the k-th tree and $\pi_k$ its leaf distribution parameters;
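As a point estimate, the mixture in step 3.3 collapses to the weighted mean of the leaf means, and step 3.4 averages those per-tree estimates; a minimal sketch:

```python
def tree_estimate(leaf_weights, leaf_means):
    """Point estimate of one tree: sum_l w_l * mu_l, the arrival-probability
    weighted mean of its leaf Gaussians."""
    return sum(w * mu for w, mu in zip(leaf_weights, leaf_means))

def forest_estimate(per_tree_estimates):
    """Final gaze estimate: the average of the K trees' point estimates."""
    return sum(per_tree_estimates) / len(per_tree_estimates)
```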
further, the method for calculating the sample entropy in step 5 is as follows:
since a single tree is obtained by weighted summation of multiple leaf distributions, the integral of such a mixture gaussian distribution is non-trivial, where the lower bound of the single tree entropy is calculated to approximate the true value of the single tree entropy, which is calculated by:
whereinIs the predicted result of the kth tree, pikIs the leaf distribution parameter of the kth tree, then the entropy of the sample is obtained from the average of the entropy of the trees, calculated by:
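One standard lower bound on the entropy of a Gaussian mixture is the conditional entropy given the mixture component, $\sum_\ell w_\ell \cdot \tfrac{1}{2}\log(2\pi e\,\sigma_\ell^2)$, since conditioning on the latent leaf index can only reduce entropy. The sketch below uses that bound as the single-tree entropy approximation; whether this is the exact bound used by the method is an assumption, as the text only states that a lower bound replaces the intractable mixture entropy.

```python
import math

def tree_entropy_lower_bound(leaf_weights, leaf_vars):
    """Lower bound on the entropy of one tree's Gaussian-mixture output:
    H(mixture) >= H(y | leaf) = sum_l w_l * 0.5 * log(2*pi*e*sigma_l^2)."""
    return sum(w * 0.5 * math.log(2.0 * math.pi * math.e * var)
               for w, var in zip(leaf_weights, leaf_vars))

def sample_entropy(per_tree_bounds):
    """H_i: the average of the single-tree entropy approximations over K trees."""
    return sum(per_tree_bounds) / len(per_tree_bounds)
```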
the innovation of the invention is that:
1) the features of the left eye image and the right eye image are respectively extracted by using two independent sub-networks, and the extracted features are subjected to feature fusion. As shown in fig. 6.
2) And introducing a regression forest structure as a regression model, performing regression to estimate the probability distribution of the sight line direction of the input image, and calculating the entropy of a prediction result and a sample based on the probability distribution.
3) And (3) a learning paradigm of self-learning is introduced to train a deep regression forest model, the sequence of the samples in the self-learning is corrected by combining the uncertainty of the samples, and the prediction precision and the robustness of the model are improved.
Drawings
FIG. 1 is a diagram of the main network structure of the method of the present invention
FIG. 2 is a schematic diagram of the standard convolution block and the standard fully-connected block of the present invention.
Fig. 3 is a schematic diagram of a feature extraction network according to the present invention.
FIG. 4 is a schematic diagram of a feature fusion network according to the present invention.
FIG. 5 is a schematic view of a regression forest structure according to the present invention.
FIG. 6 is a flow chart of the model training algorithm for the self-learning of the present invention.
Detailed Description
Step 1: preprocessing the data set;
acquiring the MPIIGaze data set, which consists of images of 15 subjects and the corresponding annotation information, with 1500 images per subject; extracting the left and right eye regions of each image according to the annotations, so that each monocular image has size 36 × 60 × 3, and randomly shuffling the order of the left-right eye image pairs; finally, normalizing the pixel values of each picture to the range [-1, 1];
step 2: constructing a convolutional neural network and a regression forest;
1) constructing a feature extraction network; the feature extraction network consists of two sub-networks with the same structure, each of which receives a monocular image as input and outputs a feature vector; a sub-network consists of 5 convolution blocks and 1 standard fully-connected layer, the 5 convolution blocks being composed of 2, 3 and 3 standard convolutional layers respectively; a max-pooling layer with stride 2 is inserted between consecutive blocks, another stride-2 max-pooling layer follows the 5th block, and a final standard fully-connected layer outputs the corresponding feature vector. The standard convolution block and the standard fully-connected block are shown in fig. 2.
2) Constructing a feature fusion network; the feature fusion network takes the feature vectors of the left and right eyes as input and outputs a fused feature vector; it consists of 2 standard fully-connected layers and 1 fully-connected layer without an activation function; the two input feature vectors are first concatenated and then passed through the network to produce the fused feature vector. The feature fusion network is shown in fig. 4.
Step 3: constructing a regression forest; the regression forest consists of 5 complete binary trees, each of depth 6. Each tree is composed of 31 internal nodes and 32 leaf nodes; every internal node has a splitting function and every leaf node has a Gaussian distribution. The probability $s_n$ that a sample moves to the left child is computed from the splitting function of the n-th internal node. After the left-move probabilities of all internal nodes have been computed, the arrival probability $w_\ell$ of each leaf node is computed starting from the root, and the prediction of the current tree is then computed from the leaf arrival probabilities and the leaf distributions. Finally, the average of the 5 trees' predictions is taken as the result of the sight estimation.
Step 4: the overall neural network; the feature extraction network of step 2 extracts the feature vectors $f_l$ and $f_r$ of the left and right eye images respectively. $f_l$ and $f_r$ are then fed to the feature fusion network to obtain the fused feature vector $f$. Finally, the left-move probability of every internal node of every tree in the regression forest is computed from the fused feature vector and the splitting functions, from which the final prediction is obtained. The overall network structure is shown schematically in fig. 1.
Step 5: designing the loss function; denote the i-th pair of left and right eye images obtained in step 1 by $x_i$, its label by $y_i$, and the weight of the i-th sample pair by $v_i$; let $\Theta$ be the parameters of the feature extraction and feature fusion networks and $\pi$ the parameters of the leaf Gaussian distributions of the regression forest; the loss function can then be written as
$$F(\Theta, \pi, v) = \sum_i v_i \left( \log p(y_i \mid x_i; \Theta, \pi) - \gamma H_i \right) + \lambda \sum_i v_i,$$
where $p(y_i \mid x_i; \Theta, \pi)$ is the probability of the label $y_i$ under the current model parameters, $H_i$ is the entropy of the i-th sample pair, $\gamma$ is the weight coefficient of the entropy, and $\lambda$ is the control parameter of the learning pace; both are hyper-parameters of the model. The goal of the overall model is to maximize this loss function.
Step 6: training the network model based on self-learning; the training is completed according to the self-learning strategy. The total number of self-learning steps is set to 6, and the numbers of samples used in steps 1 to 6 are set to 50%, 60%, 70%, 80%, 90% and 100% of the total sample count. $\lambda_0$ and $\gamma_0$ are initialized so that 50% of the data is included in the 1st training step. In each training step the loss function of step 5 is maximized and the network parameters and regression forest parameters are updated; after the step finishes, $\lambda$ and $\gamma$ are adjusted to ensure that the corresponding proportion of samples is included in the next training step. A flow chart of the self-learning model training algorithm is shown in fig. 6.
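The schedule in step 6 admits a growing fraction of the easiest samples at each step; the sketch below (illustrative names, ranking by the entropy-corrected log-likelihood from step 5) shows the bookkeeping:

```python
def samples_per_step(total, fractions=(0.5, 0.6, 0.7, 0.8, 0.9, 1.0)):
    """Number of training samples admitted at each of the 6 self-learning steps."""
    return [int(round(total * f)) for f in fractions]

def select_easiest(corrected_likelihoods, k):
    """Indices of the k samples with the largest corrected log-likelihood
    log p_i - gamma * H_i, i.e. the easiest samples at the current step."""
    order = sorted(range(len(corrected_likelihoods)),
                   key=lambda i: corrected_likelihoods[i], reverse=True)
    return sorted(order[:k])
```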
Step 7: the testing stage; take an image to be tested, preprocess it according to the method of step 1, and feed the preprocessed image pair to the model trained in step 6 to obtain the sight estimation result for the tested image. In experiments the mean error on the MPIIGaze data set was 4.45°, an improvement of 0.17° over previous methods.
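The 4.45° figure is a mean angular error between predicted and ground-truth gaze vectors. The sketch below computes that metric; the pitch/yaw-to-vector convention shown is the one commonly used with MPIIGaze and is an assumption, since the patent does not spell it out.

```python
import math

def pitch_yaw_to_vector(pitch, yaw):
    """3D gaze direction (unit vector) from pitch/yaw angles in radians
    (MPIIGaze-style convention; assumed here, not stated in the text)."""
    return (-math.cos(pitch) * math.sin(yaw),
            -math.sin(pitch),
            -math.cos(pitch) * math.cos(yaw))

def angular_error_deg(v1, v2):
    """Angle in degrees between two gaze vectors."""
    dot = sum(a * b for a, b in zip(v1, v2))
    n1 = math.sqrt(sum(a * a for a in v1))
    n2 = math.sqrt(sum(a * a for a in v2))
    cos_t = max(-1.0, min(1.0, dot / (n1 * n2)))
    return math.degrees(math.acos(cos_t))
```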
Further, the specific method of step 3 is as follows:
Step 3.1: computing the left-move probability of each internal node: the splitting function $s_n(x_i; \Theta): x_i \to [0, 1]$ is determined by the network parameters $\Theta$ and maps the input sample $x_i$ to a scalar between 0 and 1 characterizing the probability that the sample should be routed into the left subtree after reaching the current node. The concrete form of the splitting function is
$$s_n(x_i; \Theta) = \sigma\!\left(f_{\varphi(n)}(x_i; \Theta)\right),$$
where $\sigma(\cdot)$ is the sigmoid function, $\varphi(n)$ is an index function that selects one element of the fused feature $f$ at the n-th split node, and $f_{\varphi(n)}(x_i; \Theta)$ is the value of that element for sample $x_i$.
Step 3.2: calculate the probability of reaching a leaf: for each sample pair, calculating the probability of arriving at each leaf node from the root node according to the left-shift probability of the split node, wherein the calculation of the arrival probability is given by the following formula:
wherein [. ]]Is an indication function, if true returns 1, otherwise returns 0;respectively representing node sets of subtrees taking left and right children of the split node n as root nodes.
Step 3.3: calculating the prediction result of a single tree: by Gaussian distributionThe distribution state of the leaf nodes is represented, and considering that a tree is composed of a plurality of leaf nodes, the final prediction result is represented by the weighted average of all the leaves according to the arrival probability, and the form of the final prediction result is as follows:
Step 3.4: computing the prediction of the regression forest: the final prediction for a sample is the average of the individual tree predictions:
$$p(y_i \mid x_i; \Theta, \pi) = \frac{1}{K} \sum_{k=1}^{K} p_{T_k}(y_i \mid x_i; \Theta, \pi_k).$$
further, the specific method of step 5 is as follows:
Step 5.1: computing the prediction result of the sample: according to the method of step 3, compute the prediction of the regression forest $p(y_i \mid x_i; \Theta, \pi)$.
Step 5.2: calculating the entropy of the sample: since a single tree is obtained by weighted summation of multiple leaf distributions, the integral of such a mixture gaussian distribution is non-trivial, where the lower bound of the single tree entropy is calculated to approximate the true value of the single tree entropy, which is calculated by:
whereinIs the predicted result of the kth tree, ΠkIs the leaf distribution parameter for the kth tree. The entropy of the sample is then derived from the average of the entropies of the trees, and is calculated by:
Claims (3)
1. A sight line estimation method based on self-learning, comprising the following steps:
step 1: preprocessing the data set;
acquiring a data set consisting of images and their corresponding annotation information; extracting the left and right eye regions of each image according to the annotations, and randomly shuffling the order of the left-right eye image pairs; finally, normalizing the pixel values of each picture to the range [-1, 1];
step 2: constructing a convolutional neural network, wherein the convolutional neural network comprises a feature extraction network and a feature fusion network;
1) constructing a feature extraction network; the feature extraction network consists of two sub-networks with the same structure, each of which receives a monocular image as input and outputs a feature vector; a sub-network consists of 5 convolution blocks and 1 standard fully-connected layer, the 5 convolution blocks being composed of 2, 3 and 3 standard convolutional layers respectively; a max-pooling layer with stride 2 is inserted between consecutive blocks, another stride-2 max-pooling layer follows the 5th block, and a final standard fully-connected layer outputs the corresponding feature vector;
2) constructing a feature fusion network; the feature fusion network takes the feature vectors of the left and right eyes as input and outputs a fused feature vector; it consists of 2 standard fully-connected layers and 1 fully-connected layer without an activation function; the two input feature vectors are first concatenated and then passed through the network to produce the fused feature vector;
step 3: constructing a regression forest; the regression forest consists of 5 complete binary trees, each of depth 6; each tree is composed of 31 internal nodes and 32 leaf nodes, every internal node has a splitting function, and every leaf node has a Gaussian distribution; the probability $s_n$ that a sample moves to the left child is computed from the splitting function of the n-th internal node; after the left-move probabilities of all internal nodes have been computed, the arrival probability $w_\ell$ of each leaf node is computed starting from the root, and the prediction of the current tree is then computed from the leaf arrival probabilities and the leaf distributions; finally, the average of the 5 trees' predictions is taken as the sight estimation result;
step 4: the overall neural network; the feature extraction network of step 2 extracts the feature vectors $f_l$ and $f_r$ of the left and right eye images respectively; $f_l$ and $f_r$ are then fed to the feature fusion network to obtain the fused feature vector $f$; finally, the left-move probability of every internal node of every tree in the regression forest is computed from the fused feature vector and the splitting functions, from which the final prediction is obtained;
step 5: designing the loss function; denote the i-th pair of left and right eye images obtained in step 1 by $x_i$, its label by $y_i$, and the weight of the i-th sample pair by $v_i$; let $\Theta$ be the parameters of the feature extraction and feature fusion networks and $\pi$ the parameters of the leaf Gaussian distributions of the regression forest; the loss function can then be written as
$$F(\Theta, \pi, v) = \sum_i v_i \left( \log p(y_i \mid x_i; \Theta, \pi) - \gamma H_i \right) + \lambda \sum_i v_i,$$
where $p(y_i \mid x_i; \Theta, \pi)$ is the probability of the label $y_i$ under the current model parameters, $H_i$ is the entropy of the i-th sample pair, $\gamma$ is the weight coefficient of the entropy, and $\lambda$ is the control parameter of the learning pace; $\gamma$ and $\lambda$ are both hyper-parameters of the model; the goal of the entire model is to maximize this loss function;
step 6: training the overall neural network based on self-learning; completing the training of the network model according to the self-learning strategy;
step 7: estimating the sight direction in actual images using the trained overall neural network.
2. The sight line estimation method based on self-learning according to claim 1, wherein the specific method of the step 3 is as follows:
Step 3.1: compute the left-shift probability of each internal node. The splitting function s_n(x_i; θ): x_i → [0, 1], determined by the network parameters θ, maps the input sample x_i to a scalar between 0 and 1 representing the probability that the sample is routed to the left sub-tree after reaching the current node. The concrete form of the splitting function is:

$$ s_n(x_i; \theta) = \sigma\!\left( f_{\varphi(n)}(x_i; \theta) \right) $$

where σ(·) is the sigmoid function, φ(n) is an index function selecting one element of the fused feature f at the n-th splitting node, and f_{φ(n)}(x_i; θ) is the value of that element for sample x_i at the n-th splitting node;
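As a minimal sketch of step 3.1 (not the patented implementation): `phi` below is a hypothetical assignment of fused-feature elements to splitting nodes, and the left-shift probability is just that element passed through a sigmoid.

```python
import math

def sigmoid(z):
    # sigma(z) = 1 / (1 + e^{-z})
    return 1.0 / (1.0 + math.exp(-z))

def split_probability(fused_feature, phi, n):
    # s_n(x_i; theta) = sigma(f_{phi(n)}(x_i)):
    # the fused-feature element selected for node n, squashed to
    # (0, 1), is the probability of routing the sample left.
    return sigmoid(fused_feature[phi[n]])

# toy fused feature vector and node-to-element assignment (assumptions)
f = [0.0, 2.0, -2.0]
phi = {0: 0, 1: 1, 2: 2}
p_left = split_probability(f, phi, 0)  # element 0.0 -> probability 0.5
```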
Step 3.2: compute the probability of reaching each leaf. For each sample pair, the probability of travelling from the root node to each leaf node is computed from the left-shift probabilities of the splitting nodes, given by:

$$ \omega_l(x_i \mid \theta) = \prod_{n} s_n(x_i; \theta)^{[\,l \in L_n\,]} \left( 1 - s_n(x_i; \theta) \right)^{[\,l \in R_n\,]} $$

where [·] is the indicator function, returning 1 if its argument is true and 0 otherwise, and L_n and R_n denote the node sets of the sub-trees rooted at the left and right children of splitting node n, respectively;
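The product over the root-to-leaf path can be computed by a single recursive walk: each left branch multiplies in s_n, each right branch multiplies in (1 - s_n). A sketch with a hypothetical depth-2 tree in heap order (children of node n are 2n+1 and 2n+2) and made-up split probabilities:

```python
def leaf_arrival_probs(split_probs, node=0, reach=1.0):
    # split_probs: dict mapping internal node index -> s_n,
    # for a binary tree stored in heap order.
    # Returns {leaf_index: omega_leaf}, where omega_leaf is the
    # product of s_n (left turns) and 1 - s_n (right turns)
    # along the path from the root to that leaf.
    if node not in split_probs:                 # reached a leaf
        return {node: reach}
    s = split_probs[node]
    out = {}
    out.update(leaf_arrival_probs(split_probs, 2 * node + 1, reach * s))
    out.update(leaf_arrival_probs(split_probs, 2 * node + 2, reach * (1 - s)))
    return out

s = {0: 0.7, 1: 0.5, 2: 0.2}   # three internal nodes (toy values)
omega = leaf_arrival_probs(s)   # leaves 3..6; probabilities sum to 1
```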
Step 3.3: compute the prediction result of a single tree. The distribution at each leaf node is represented by a Gaussian p_l(y_i) = N(y_i; μ_l, σ_l²), where y_i is the sight-angle value, μ_l the mean of the Gaussian and σ_l² its variance. Since a tree is composed of multiple leaf nodes, the final prediction of a tree 𝒯 is the average over all leaves weighted by the arrival probabilities, in the form:

$$ p_{\mathcal{T}}(y_i \mid x_i; \theta, \pi) = \sum_{l \in \mathcal{L}} \omega_l(x_i \mid \theta)\, p_l(y_i) $$

where ω_l(x_i | θ) is the probability of reaching leaf l, p_l(y_i) is the probability of y_i under leaf l, and 𝓛 is the set of leaves of tree 𝒯;
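Step 3.3 makes the single-tree prediction a Gaussian mixture over leaves. A sketch with a hypothetical two-leaf tree and made-up arrival probabilities and leaf parameters:

```python
import math

def gaussian_pdf(y, mu, var):
    # N(y; mu, sigma^2)
    return math.exp(-(y - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def tree_density(y, omegas, mus, variances):
    # p_T(y | x) = sum_l omega_l * N(y; mu_l, sigma_l^2):
    # the arrival-probability-weighted mixture of the leaf Gaussians.
    return sum(w * gaussian_pdf(y, m, v)
               for w, m, v in zip(omegas, mus, variances))

# two-leaf toy tree (all numbers are assumptions for illustration)
omegas, mus, variances = [0.6, 0.4], [-5.0, 10.0], [4.0, 4.0]
density = tree_density(-5.0, omegas, mus, variances)
# near the first leaf's mean, the density is dominated by that leaf
```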
Step 3.4: compute the prediction result of the regression forest. The final prediction for the sample is the average of the predictions of the K trees, given by:

$$ p(y_i \mid x_i; \theta, \pi) = \frac{1}{K} \sum_{k=1}^{K} p_{\mathcal{T}_k}(y_i \mid x_i; \theta, \pi_k) $$
3. The self-learning-based sight line estimation method according to claim 1, wherein the sample entropy in step 5 is calculated as follows:
Since the prediction of a single tree is a weighted sum of multiple leaf distributions, the entropy integral of such a mixture of Gaussians has no closed form; a lower bound on the single-tree entropy is therefore computed to approximate its true value, given by:

$$ H(\mathcal{T}_k) \geq -\sum_{l \in \mathcal{L}} \omega_l \log \sum_{l' \in \mathcal{L}} \omega_{l'}\, \mathcal{N}\!\left( \mu_l;\, \mu_{l'},\, \sigma_l^2 + \sigma_{l'}^2 \right) $$
where p_{𝒯_k}(y_i | x_i; θ, π_k) is the prediction result of the k-th tree and π_k is the leaf-distribution parameter of the k-th tree; the entropy of the sample is then obtained as the average of the entropies of the trees, calculated by:

$$ H_i = \frac{1}{K} \sum_{k=1}^{K} H(\mathcal{T}_k) $$
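A sketch of the entropy computation of claim 3, using one standard pairwise lower bound for the entropy of a Gaussian mixture. This bound is an illustrative choice on my part, since the claim text does not reproduce the exact bound used; the leaf parameters are made-up toy values.

```python
import math

def gaussian_pdf(y, mu, var):
    # N(y; mu, sigma^2)
    return math.exp(-(y - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def tree_entropy_lower_bound(omegas, mus, variances):
    # Pairwise lower bound for the entropy of a Gaussian mixture:
    # H(T) >= -sum_l w_l * log( sum_m w_m * N(mu_l; mu_m, var_l + var_m) )
    # (the inner sum is the closed-form integral of the product of
    # two Gaussian components).
    h = 0.0
    for wl, ml, vl in zip(omegas, mus, variances):
        inner = sum(wm * gaussian_pdf(ml, mm, vl + vm)
                    for wm, mm, vm in zip(omegas, mus, variances))
        h -= wl * math.log(inner)
    return h

def sample_entropy(trees):
    # H_i = (1/K) * sum_k H(T_k): average the per-tree entropies
    return sum(tree_entropy_lower_bound(*t) for t in trees) / len(trees)

trees = [([0.6, 0.4], [-5.0, 10.0], [4.0, 4.0]),   # two-leaf toy tree
         ([1.0], [0.0], [0.5])]                     # single-leaf toy tree
H_i = sample_entropy(trees)
```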
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111480164.5A CN114241179A (en) | 2021-12-06 | 2021-12-06 | Sight estimation method based on self-learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114241179A true CN114241179A (en) | 2022-03-25 |
Family
ID=80753446
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111480164.5A Pending CN114241179A (en) | 2021-12-06 | 2021-12-06 | Sight estimation method based on self-learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114241179A (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106599994A (en) * | 2016-11-23 | 2017-04-26 | 电子科技大学 | Sight line estimation method based on depth regression network |
CN108765409A (en) * | 2018-06-01 | 2018-11-06 | 电子科技大学 | A kind of screening technique of the candidate nodule based on CT images |
CN110516537A (en) * | 2019-07-15 | 2019-11-29 | 电子科技大学 | A kind of face age estimation method based on from step study |
CN111414875A (en) * | 2020-03-26 | 2020-07-14 | 电子科技大学 | Three-dimensional point cloud head attitude estimation system based on depth regression forest |
WO2021022970A1 (en) * | 2019-08-05 | 2021-02-11 | 青岛理工大学 | Multi-layer random forest-based part recognition method and system |
Non-Patent Citations (3)
Title |
---|
LILI PAN et al.: "Self-Paced Deep Regression Forests with Consideration on Underrepresented Examples" * |
TOBIAS FISCHER et al.: "RT-GENE: Real-Time Eye Gaze Estimation in Natural Environments" * |
SHAN Xinghua et al.: "Driver sight line estimation method based on improved random forest" * |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||