CN110633667A - Action prediction method based on multitask random forest - Google Patents

Action prediction method based on multitask random forest

Info

Publication number
CN110633667A
CN110633667A
Authority
CN
China
Prior art keywords
node
action
random forest
multitask
video
Prior art date
Legal status
Granted
Application number
CN201910856984.6A
Other languages
Chinese (zh)
Other versions
CN110633667B (en)
Inventor
刘翠微
于天宇
杜冲
石祥滨
李照奎
Current Assignee
SHENYANG TIANMU TECHNOLOGY CO LTD
Original Assignee
Shenyang Aerospace University
Priority date
Filing date
Publication date
Application filed by Shenyang Aerospace University filed Critical Shenyang Aerospace University
Priority to CN201910856984.6A
Publication of CN110633667A
Application granted
Publication of CN110633667B
Legal status: Active


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/24323 Tree-organised classifiers
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/46 Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames

Abstract

The invention discloses an action prediction method based on a multitask random forest, which comprises the following steps: constructing an action prediction model based on a multitask random forest from training videos labeled with action category labels and observation rate labels; and, for a newly input video containing an incomplete action, predicting the action category of the video with the multitask random forest. Aimed at the difficulties in action prediction that the input video is incomplete and its observation rate is unknown, the method jointly learns a classifier for two related tasks, action classification and video observation rate recognition, which greatly improves the performance of the action prediction model.

Description

Action prediction method based on multitask random forest
Technical Field
The invention relates to the field of computer vision, and particularly provides an action prediction method based on a multitask random forest.
Background
Computer vision is a research field that models and processes visual data with statistical methods drawn from geometric, physical and learning theories. Action recognition discriminates an action on the basis of a complete video; in practice, however, it is often necessary to react to an action before it finishes, that is, before the complete video has been observed. This motivates research on predicting the category of a video that contains an incomplete action. Action prediction for a person mainly comprises two steps: action representation and prediction of the action category. Action representation extracts appearance, motion, structure and similar information from an input video and generates a feature vector describing the video. Action prediction then establishes an association between the video content and the action categories by analyzing the feature vector of the video at the current time.
At present, research on action prediction methods is still developing, and the existing methods can be divided into three categories: methods based on sequence matching; methods based on deep learning; and methods based on video segment analysis. The first category extracts appearance, motion and other low-level features from incomplete videos and complete training videos, matches an incomplete video of unknown class against complete training videos of known classes by sequence matching, and takes the class of the best-matching training video as the prediction result. The second category uses deep neural networks to map the features of incomplete videos and complete videos into a common space, and transfers semantic knowledge from the complete training videos to the action prediction model through feature transformation. The third category divides a long video into a group of video segments and infers the action category by modeling the temporal structure of the action from the context of the segments. The main difficulty of action prediction, however, is that the observation rate of an incomplete video is unknown, so the progress of the action cannot be known. Videos with different observation rates differ greatly in the semantic information they contain, which makes modeling the temporal structure of an action very difficult.
Disclosure of Invention
In view of the above, the present invention provides an action prediction method based on a multitask random forest, so as to address at least the long prediction time and low accuracy of existing action prediction methods on videos containing incomplete actions.
The technical scheme provided by the invention is as follows: an action prediction method based on a multitask random forest comprises the following steps:
S1: constructing an action prediction model based on a multitask random forest by using training data, wherein the multitask random forest is an ensemble learning model comprising N multitask decision trees;
constructing the multitask random forest comprises the following steps:
S11: collecting a training set containing M incomplete videos D = {(x_m, y_m, r_m)}_{m=1}^{M}; each sample in the training set D contains a feature vector x_m ∈ R^{F×1}, an action category label y_m ∈ {1, 2, ..., K} and an observation rate label r_m ∈ {0.1, 0.2, ..., 0.9}, where F represents the number of features in the feature vector and K represents the number of action categories;
S12: constructing a multitask decision tree, which comprises the following specific steps:
S121: sampling the training set D to obtain a subset of the training set D: a sample set D';
S122: creating a node sequence S, wherein the elements of the sequence are 8-tuples (node, l, f, c_1, c_2, α, β, v), where node denotes the node number, l the depth of the node, f the parent node of the node, c_1 the left child node of the node, c_2 the right child node of the node, α the splitting feature of the node, β the split value of the node, and v the category vector stored in the node; in particular, if the node is an intermediate node, v is empty, and if the node is a leaf node, c_1, c_2, α and β are empty; creating a pending node sequence Q, wherein the elements of the sequence are 4-tuples (node, l, f, D*), where node denotes the node number, l the depth of the node, f the parent node of the node, and D* the set of training samples arriving at the node; initializing S and Q to empty; creating a root node (1, 1, 0, D') and adding it to Q, which inputs the sample set D' to the root node; initializing the node counter to 1;
S123: fetching the first node in Q, saving the 4-tuple representing the node as (node, l, f, D*), and removing it from Q; judging whether the depth l of the node equals the preset maximum depth L_max of the multitask decision tree; if l equals L_max, the node is a leaf node, and S125 is executed; if l is less than L_max, continuing to judge whether the samples of the set D* arriving at the node all belong to the same category on both labels; if so, the node is a leaf node, and S125 is executed; if they belong to the same category on neither label, or belong to the same category on one label but not on the other, the node is an intermediate node, and S124 is executed;
S124: splitting the intermediate node (node, l, f, D*) into two child nodes and distributing the data set D* to the two child nodes; the specific steps are as follows:
(1) randomly selecting G candidate features {α_g}_{g=1:G} from the F features, and, according to the value range of all samples in D* on each feature α_g, randomly generating G corresponding split values {β_g}_{g=1:G}, thereby obtaining G 2-tuples {(α_g, β_g)}_{g=1:G}, each 2-tuple representing a candidate splitting scheme;
(2) judging whether the samples in D* belong to the same category on both labels: if they belong to the same category on the action category label, selecting the observation rate label to calculate the information gain and executing step (4); if they belong to the same category on the observation rate label, calculating the information gain with the action category label and executing step (3); if they belong to the same category on neither label, randomly selecting one of the action category label and the observation rate label; if the action category label is selected to calculate the information gain, executing step (3), and if the observation rate label is selected to calculate the information gain, executing step (4);
(3) according to the distribution of the samples in D* over the action categories, calculating the information entropy EntAction(D*); taking each scheme (α_g, β_g) of the candidate set {(α_g, β_g)}_{g=1:G} in turn and splitting the sample set D* into two subsets D*_L and D*_R corresponding to the left and right child nodes: if the value of a sample on the α_g-th feature is smaller than the split value β_g, the sample belongs to the subset D*_L; otherwise, the sample belongs to the subset D*_R; calculating the information entropies EntAction(D*_L) and EntAction(D*_R) from the distribution of the samples over the action categories; calculating the information gain Gain(D*, α_g, β_g) of the 2-tuple (α_g, β_g) partitioning D*, and choosing the splitting scheme (α*, β*) with the largest information gain; then executing step (5);
(4) according to the distribution of the samples in D* over the observation rate labels, calculating the information entropy EntRatio(D*); taking each scheme (α_g, β_g) of the candidate set {(α_g, β_g)}_{g=1:G} in turn and splitting the sample set D* into two subsets D*_L and D*_R corresponding to the left and right child nodes: if the value of a sample on the α_g-th feature is smaller than the split value β_g, the sample belongs to the subset D*_L; otherwise, the sample belongs to the subset D*_R; calculating the information entropies EntRatio(D*_L) and EntRatio(D*_R) from the distribution of the samples over the observation rate labels; calculating the information gain Gain(D*, α_g, β_g) of the 2-tuple (α_g, β_g) partitioning D*, and choosing the splitting scheme (α*, β*) with the largest information gain;
(5) according to the splitting scheme (α*, β*), i.e. with splitting feature α* and split value β*, splitting the sample set D* into two subsets D*_L and D*_R corresponding to the left and right child nodes; the number of the left child node is counter+1, and the number of the right child node is counter+2; the current intermediate node (node, l, f, D*) can be further expressed as the 8-tuple (node, l, f, counter+1, counter+2, α*, β*, 0), which is added to the node sequence S; the left child node (counter+1, l+1, node, D*_L) and the right child node (counter+2, l+1, node, D*_R) are added to the pending node sequence Q;
(6) updating the node counter variable: counter = counter + 2; executing S126;
S125: the node (node, l, f, D*) is regarded as a leaf node; counting the proportion of videos of each action category in D* forms the action category vector v of the leaf node, which expresses the likelihood that a sample arriving at the leaf node belongs to each action category; the current leaf node can be further expressed as the 8-tuple (node, l, f, 0, 0, 0, 0, v), which is added to the node sequence S;
S126: judging whether the pending node sequence Q is empty; if so, the construction of one multitask decision tree is completed; if not, returning to S123 to continue execution;
S13: judging whether N multitask decision trees have been constructed; if so, the construction of the multitask random forest is completed; if not, returning to S12 to construct the next multitask decision tree;
S2: for a newly input video containing an incomplete action, predicting the action category of the video by using the multitask random forest, which specifically comprises the following steps:
S21: extracting the feature vector x of the video;
S22: initializing the sequence number n of the current tree to 1; initializing the final action category vector v* as a zero vector, v* ∈ R^{1×K};
S23: inputting the video feature vector x into the nth tree of the multitask random forest, letting the sample descend from the root node and selecting the left or right branch in turn according to the splitting rule until a leaf node is reached, and obtaining the action category vector v stored in that leaf node;
S24: judging whether n equals the total number N of trees in the multitask random forest; if so, executing step S25; otherwise setting v* = v* + v and n = n + 1, and continuing to execute step S23;
S25: the action category corresponding to the largest element of the final action category vector v* is taken as the prediction result of the multitask random forest for the video; supposing the value of the k-th element of v* is the largest, the prediction result for the video is action category k.
Preferably, in S121, the training set D is sampled by the Bootstrap method to obtain a subset of the training set D: the sample set D'; the Bootstrap method helps ensure that the sample sets input to different trees differ.
Further preferably, in step (2) of S124, a constant γ ∈ [0,1] is predefined to control the probability of selecting the two labels; for a given node, a random number μ ∈ [0,1] is generated; if μ < γ, the action category label is selected to calculate the information gain, and if μ ≥ γ, the observation rate label is selected to calculate the information gain.
More preferably, in step (3) of S124, the information entropy of D* is calculated by formula (1):

EntAction(D*) = - Σ_{k=1}^{K} p_k log2 p_k        (1)

where K represents the number of action categories and p_k represents the proportion of samples of the k-th action category in the sample set D*.
Further preferably, in step (3) of S124, the information gain of the 2-tuple (α_g, β_g) partitioning D* is calculated by formula (2):

Gain(D*, α_g, β_g) = EntAction(D*) - (|D*_L|/|D*|)·EntAction(D*_L) - (|D*_R|/|D*|)·EntAction(D*_R)        (2)

where |D*|, |D*_L| and |D*_R| represent the numbers of elements in the sets D*, D*_L and D*_R, respectively.
Further preferably, in step (3) of S124, the splitting scheme (α*, β*) with the largest information gain is selected by formula (3):

(α*, β*) = argmax_{(α_g, β_g)} Gain(D*, α_g, β_g)        (3)
more preferably, in step (4) in S124, D*The entropy of the information is calculated by formula (4):
here, 9 is the number of types of discretized video observation rates, qjAnd (j ═ 1, 2.. times.9) denotes a sample set D*The sample fraction of the jth video observation rate.
Further preferably, in step (4) of S124, the information gain of the 2-tuple (α_g, β_g) partitioning D* is calculated by formula (5):

Gain(D*, α_g, β_g) = EntRatio(D*) - (|D*_L|/|D*|)·EntRatio(D*_L) - (|D*_R|/|D*|)·EntRatio(D*_R)        (5)

where |D*|, |D*_L| and |D*_R| represent the numbers of elements in the sets D*, D*_L and D*_R, respectively.
Further preferably, in step (4) of S124, the splitting scheme (α*, β*) with the largest information gain is selected by formula (6):

(α*, β*) = argmax_{(α_g, β_g)} Gain(D*, α_g, β_g)        (6)
further preferably, in S23, the action category vector v is obtained by using the node sequence S of the multitask decision tree, and the specific steps are as follows:
S231: the current node variable e is the first element in the sequence S, i.e. the root node;
S232: taking the value of the current node variable e, denoted as the 8-tuple (node, l, f, c_1, c_2, α, β, v); if c_1 = 0, c_2 = 0, α = 0 and β = 0, the node is a leaf node, and step S234 is executed; if v is 0, the node is an intermediate node, and step S233 is executed;
S233: judging whether the value of the video feature vector x on the α-th feature is smaller than the split value β; if so, selecting the left branch and traversing S to find the 8-tuple whose node number is c_1, assigning it to the variable e; otherwise, selecting the right branch and traversing S to find the 8-tuple whose node number is c_2, assigning it to the variable e; then continuing to execute step S232;
S234: returning the action category vector v recorded in the current node variable e.
According to the action prediction method based on the multitask random forest, the video observation rate is abstracted into a classification task; by jointly learning two related tasks, action category classification and video observation rate classification, the multitask random forest captures the latent relation between the action category and the observation rate of an incomplete video, and the accuracy of predicting the action category of an incomplete video is improved.
Detailed Description
The invention will be further explained with reference to specific embodiments, without limiting the invention.
The invention provides an action prediction method based on a multitask random forest, which comprises the following steps:
S1: constructing an action prediction model based on a multitask random forest by using training data, wherein the multitask random forest is an ensemble learning model comprising N multitask decision trees;
constructing the multitask random forest comprises the following steps:
S11: collecting a training set containing M incomplete videos D = {(x_m, y_m, r_m)}_{m=1}^{M}; each sample in the training set D contains a feature vector x_m ∈ R^{F×1}, an action category label y_m ∈ {1, 2, ..., K} and an observation rate label r_m ∈ {0.1, 0.2, ..., 0.9}, where F represents the number of features in the feature vector and K represents the number of action categories. Specifically, a set of complete videos with action category labels is first collected to form an original video set D_0; the videos in D_0 are sampled at the fixed observation rates 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8 and 0.9 to obtain a group of incomplete videos. For example, extracting the first 40% of a complete video yields an incomplete video with observation rate label 0.4, and the action category label of the incomplete video is the same as that of the original complete video. The improved dense trajectories (IDT) video feature representation is adopted: dense trajectories are first extracted from the m-th incomplete video, each trajectory is described by the four descriptors trajectory shape, HOG, HOF and MBH, and the four descriptors of all trajectories in the video are then converted with a bag-of-words model into four feature vectors V_1, V_2, V_3, V_4 describing the whole video; the four video feature vectors are concatenated to obtain the feature vector of the incomplete video x_m = [V_1; V_2; V_3; V_4] ∈ R^{F×1}, whose dimension F means that it contains F features (the construction of the incomplete training videos is sketched below);
S12: constructing a multitask decision tree, which comprises the following specific steps:
S121: sampling the training set D by the Bootstrap method to obtain a subset of the training set D: a sample set D'; Bootstrap sampling (an existing algorithm, also known as bootstrap resampling) draws samples with replacement and helps ensure that the sample sets D' input to different trees differ (see the sketch below);
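A minimal sketch of the Bootstrap step, assuming NumPy and index-based sample sets (the helper name is illustrative):

```python
import numpy as np

def bootstrap_indices(n_samples, rng):
    # draw n_samples indices with replacement, so each tree sees a
    # different multiset of training samples (roughly 63% of the
    # distinct samples appear in a given tree's set D')
    return rng.integers(0, n_samples, size=n_samples)

# usage: idx = bootstrap_indices(len(X), np.random.default_rng(0)); X_prime = X[idx]
```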
S122: creating a node sequence S, wherein the elements of the sequence are 8-tuples (node, l, f, c_1, c_2, α, β, v), where node denotes the node number, l the depth of the node, f the parent node of the node, c_1 the left child node of the node, c_2 the right child node of the node, α the splitting feature of the node, β the split value of the node, and v the category vector stored in the node; in particular, if the node is an intermediate node, v is empty, and if the node is a leaf node, c_1, c_2, α and β are empty; creating a pending node sequence Q, wherein the elements of the sequence are 4-tuples (node, l, f, D*), where node denotes the node number, l the depth of the node, f the parent node of the node, and D* the set of training samples arriving at the node; initializing S and Q to empty; creating a root node (1, 1, 0, D') and adding it to Q, which inputs the sample set D' to the root node; initializing the node counter to 1;
S123: fetching the first node in Q, saving the 4-tuple representing the node as (node, l, f, D*), and removing it from Q; judging whether the depth l of the node equals the preset maximum depth L_max of the multitask decision tree; if l equals L_max, the node is a leaf node, and S125 is executed; if l is less than L_max, continuing to judge whether the samples of the set D* arriving at the node all belong to the same category on both labels; if so, the node is a leaf node, and S125 is executed; if they belong to the same category on neither label, or belong to the same category on one label but not on the other, the node is an intermediate node, and S124 is executed;
S124: splitting the intermediate node (node, l, f, D*) into two child nodes and distributing the data set D* to the two child nodes; the specific steps are as follows:
(1) randomly selecting G candidate features {α_g}_{g=1:G} from the F features, and, according to the value range of all samples in D* on each feature α_g, randomly generating G corresponding split values {β_g}_{g=1:G}, thereby obtaining G 2-tuples {(α_g, β_g)}_{g=1:G}, each 2-tuple representing a candidate splitting scheme;
(2) judging whether the samples in D* belong to the same category on both labels. If they belong to the same category on the action category label, the observation rate label is selected to calculate the information gain, and step (4) is executed; if they belong to the same category on the observation rate label, the information gain is calculated with the action category label, and step (3) is executed; if they belong to the same category on neither label, one of the action category label and the observation rate label is randomly selected as the basis for calculating the information gain. Specifically, a constant γ ∈ [0,1] is predefined to control the probability of selecting each label; for a given node, a random number μ ∈ [0,1] is generated; if μ < γ, the action category label is selected to calculate the information gain and step (3) is executed; if μ ≥ γ, the observation rate label is selected to calculate the information gain and step (4) is executed;
(3) counting the proportion p_k of samples of the k-th action category in the sample set D*; the information entropy of D* can then be calculated by formula (1):

EntAction(D*) = - Σ_{k=1}^{K} p_k log2 p_k        (1)

where K represents the number of action categories. Take each scheme (α_g, β_g) of the candidate set {(α_g, β_g)}_{g=1:G} in turn and split the sample set D* into two subsets D*_L and D*_R corresponding to the left and right child nodes: if the value of a sample on the α_g-th feature is smaller than the split value β_g, the sample belongs to the subset D*_L; otherwise, the sample belongs to the subset D*_R. Likewise, the information entropies EntAction(D*_L) and EntAction(D*_R) can be calculated according to formula (1). Formula (2) gives the information gain of the 2-tuple (α_g, β_g) partitioning D*:

Gain(D*, α_g, β_g) = EntAction(D*) - (|D*_L|/|D*|)·EntAction(D*_L) - (|D*_R|/|D*|)·EntAction(D*_R)        (2)

where |D*|, |D*_L| and |D*_R| represent the numbers of elements in the sets D*, D*_L and D*_R, respectively. Then the splitting scheme (α*, β*) with the largest information gain is selected by formula (3):

(α*, β*) = argmax_{(α_g, β_g)} Gain(D*, α_g, β_g)        (3)

then, step (5) is executed;
(4) counting the proportion q_j (j = 1, 2, ..., 9) of samples of the j-th video observation rate in the sample set D*; the information entropy of D* can then be calculated by formula (4):

EntRatio(D*) = - Σ_{j=1}^{9} q_j log2 q_j        (4)

where 9 is the number of discretized video observation rates. Take each scheme (α_g, β_g) of the candidate set {(α_g, β_g)}_{g=1:G} in turn and split the sample set D* into two subsets D*_L and D*_R corresponding to the left and right child nodes: if the value of a sample on the α_g-th feature is smaller than the split value β_g, the sample belongs to the subset D*_L; otherwise, the sample belongs to the subset D*_R. Likewise, the information entropies EntRatio(D*_L) and EntRatio(D*_R) can be calculated according to formula (4). Formula (5) gives the information gain of the 2-tuple (α_g, β_g) partitioning D*:

Gain(D*, α_g, β_g) = EntRatio(D*) - (|D*_L|/|D*|)·EntRatio(D*_L) - (|D*_R|/|D*|)·EntRatio(D*_R)        (5)

where |D*|, |D*_L| and |D*_R| represent the numbers of elements in the sets D*, D*_L and D*_R, respectively. Then the splitting scheme (α*, β*) with the largest information gain is selected by formula (6):

(α*, β*) = argmax_{(α_g, β_g)} Gain(D*, α_g, β_g)        (6)
(5) according to the splitting scheme (α*, β*), i.e. with splitting feature α* and split value β*, splitting the sample set D* into two subsets D*_L and D*_R corresponding to the left and right child nodes; the number of the left child node is counter+1, and the number of the right child node is counter+2; the current intermediate node (node, l, f, D*) can be further expressed as the 8-tuple (node, l, f, counter+1, counter+2, α*, β*, 0), which is added to the node sequence S; the left child node (counter+1, l+1, node, D*_L) and the right child node (counter+2, l+1, node, D*_R) are added to the pending node sequence Q;
(6) updating the node counter variable: counter = counter + 2; executing S126; steps (1)-(6) are illustrated in the code sketch below;
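A minimal Python sketch of the node-splitting search in steps (1)-(4), assuming NumPy arrays, base-2 logarithms and 0-based feature indices (the function names are illustrative, not from the patent):

```python
import numpy as np

def entropy(labels):
    # information entropy of a label array, as in formulas (1) and (4)
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def best_split(X, y_action, y_ratio, G, gamma, rng):
    # X: (n, F) feature matrix of the samples D* at this node;
    # y_action, y_ratio: their action and observation-rate label arrays
    n, F = X.shape
    # step (2): choose which task's label drives the information gain
    if len(set(y_action.tolist())) == 1:    # pure on action -> use observation rate
        y = y_ratio
    elif len(set(y_ratio.tolist())) == 1:   # pure on observation rate -> use action
        y = y_action
    else:                                   # mixed: pick a task at random
        y = y_action if rng.random() < gamma else y_ratio
    base = entropy(y)
    best_gain, best = -np.inf, None
    for _ in range(G):                      # step (1): G random candidates
        a = rng.integers(F)                 # candidate feature alpha_g (0-based)
        lo, hi = X[:, a].min(), X[:, a].max()
        b = rng.uniform(lo, hi)             # candidate split value beta_g
        left = X[:, a] < b
        if left.all() or not left.any():
            continue                        # degenerate split, skip
        # formulas (2)/(5): parent entropy minus weighted child entropies
        gain = base - left.mean() * entropy(y[left]) \
                    - (~left).mean() * entropy(y[~left])
        if gain > best_gain:                # formulas (3)/(6): keep the argmax
            best_gain, best = gain, (a, b)
    return best                             # (alpha*, beta*), or None
```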
S125: the node (node, l, f, D*) is regarded as a leaf node; the leaf node holds an action category vector v that expresses the likelihood that a sample arriving at the leaf node belongs to each action category. Specifically, counting the proportion p_k of samples of the k-th action category in the sample set D* yields the action category vector of the leaf node v = [p_1, p_2, ..., p_K], where K is the number of action categories. The current leaf node can be further expressed as the 8-tuple (node, l, f, 0, 0, 0, 0, v), which is added to the node sequence S;
S126: judging whether the pending node sequence Q is empty; if so, the construction of one multitask decision tree is completed; if not, returning to S123 to continue execution;
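Putting S121-S126 together, the tree-building loop might look like the following sketch (a simplification under the same assumptions; it reuses best_split from the sketch above and stores nodes as the 8-tuples of S122):

```python
import numpy as np
from collections import deque

def leaf_vector(y_action, K):
    # S125: proportion of each action category (labels assumed to be 1..K)
    return [float(np.mean(y_action == k)) for k in range(1, K + 1)]

def build_tree(X, y_action, y_ratio, K, L_max, G, gamma, rng):
    S = []                                     # finished 8-tuple node records
    Q = deque([(1, 1, 0, np.arange(len(X)))])  # (node, depth, parent, sample idx)
    counter = 1
    while Q:                                   # S126: loop until Q is empty
        node, l, f, idx = Q.popleft()          # S123
        ya, yr = y_action[idx], y_ratio[idx]
        pure = len(set(ya.tolist())) == 1 and len(set(yr.tolist())) == 1
        split = None
        if l < L_max and not pure:             # otherwise this becomes a leaf
            split = best_split(X[idx], ya, yr, G, gamma, rng)   # S124
        if split is None:                      # S125: store the class vector
            S.append((node, l, f, 0, 0, 0, 0, leaf_vector(ya, K)))
            continue
        a, b = split                           # step (5): record and enqueue
        left = X[idx, a] < b
        S.append((node, l, f, counter + 1, counter + 2, a, b, 0))
        Q.append((counter + 1, l + 1, node, idx[left]))
        Q.append((counter + 2, l + 1, node, idx[~left]))
        counter += 2                           # step (6)
    return S
```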
S13: judging whether N multitask decision trees have been constructed; if so, the construction of the multitask random forest is completed; if not, returning to S12 to construct the next multitask decision tree;
S2: for a newly input video containing an incomplete action, predicting the action category of the video by using the multitask random forest, which specifically comprises the following steps:
S21: extracting the feature vector x of the video. Specifically, the improved dense trajectories (IDT) video feature representation is adopted: dense trajectories are first extracted from the video, each trajectory is described by the four descriptors trajectory shape, HOG, HOF and MBH, and the four descriptors of all trajectories in the video are then converted with a bag-of-words model into four feature vectors V_1, V_2, V_3, V_4 describing the whole video; the four video feature vectors are concatenated to obtain the feature vector of the video x = [V_1; V_2; V_3; V_4] (the bag-of-words step is sketched below);
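For the bag-of-words stage, a minimal sketch might look as follows (descriptor extraction and codebook training are outside its scope; the dict keys and helper names are assumptions):

```python
import numpy as np

def bow_histogram(descriptors, codebook):
    # assign each descriptor to its nearest codeword, then count and normalize
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    words = d2.argmin(axis=1)
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / max(hist.sum(), 1.0)

def video_feature(desc_by_type, codebooks):
    # desc_by_type / codebooks: dicts keyed by 'traj', 'hog', 'hof', 'mbh'
    parts = [bow_histogram(desc_by_type[t], codebooks[t])
             for t in ('traj', 'hog', 'hof', 'mbh')]
    return np.concatenate(parts)               # x = [V1; V2; V3; V4]
```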
S22: order of the current treeNumber n is initialized to 1; initializing a final action class vector v*Is a zero vector, v*∈R1×K
S23: inputting the video feature vector x into the nth tree of the multitask random forest, letting the sample descend from the root node and selecting the left branch (left node) or the right branch (right node) in turn according to the splitting rule until a leaf node is reached, and obtaining the action category vector v stored in that leaf node. Specifically, the action category vector v is obtained by using the node sequence S of the multitask decision tree. Each element in the node sequence S is an 8-tuple (node, l, f, c_1, c_2, α, β, v), where node denotes the node number, l the depth of the node, f the parent node of the node, c_1 the left child node of the node, c_2 the right child node of the node, α the splitting feature of the node, β the split value of the node, and v the action category vector stored in the node. The specific steps are as follows (a code sketch of the traversal follows S234):
S231: the current node variable e is the first element in the sequence S, i.e. the root node;
S232: taking the value of the current node variable e, denoted as the 8-tuple (node, l, f, c_1, c_2, α, β, v); if c_1 = 0, c_2 = 0, α = 0 and β = 0, the node is a leaf node, and step S234 is executed; if v is 0, the node is an intermediate node, and step S233 is executed;
S233: judging whether the value of the video feature vector x on the α-th feature is smaller than the split value β; if so, selecting the left branch and traversing S to find the 8-tuple whose node number is c_1, assigning it to the variable e; otherwise, selecting the right branch and traversing S to find the 8-tuple whose node number is c_2, assigning it to the variable e; then continuing to execute step S232;
S234: returning the action category vector v recorded in the current node variable e;
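A minimal sketch of this traversal over the stored 8-tuples (same conventions as the build_tree sketch above, including 0-based feature indices):

```python
def predict_tree(S, x):
    # S231-S234: route the feature vector x from the root to a leaf
    by_id = {rec[0]: rec for rec in S}          # node number -> 8-tuple
    e = by_id[1]                                # S231: the root node
    while True:
        node, l, f, c1, c2, alpha, beta, v = e  # S232
        if c1 == 0 and c2 == 0:                 # leaf: child/split fields are 0
            return v                            # S234: the class vector
        # S233: left branch if x[alpha] < beta, else right branch
        e = by_id[c1] if x[alpha] < beta else by_id[c2]
```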
S24: judging whether n equals the total number N of trees in the multitask random forest; if so, executing step S25; otherwise setting v* = v* + v and n = n + 1, and continuing to execute step S23;
S25: the action category corresponding to the largest element of the final action category vector v* is taken as the prediction result of the multitask random forest for the video; supposing the value of the k-th element of v* is the largest, the prediction result for the video is action category k.
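The per-tree votes are then accumulated as in S22-S25; a minimal sketch (assuming predict_tree from the previous sketch is in scope):

```python
import numpy as np

def predict_forest(forest, x, K):
    v_star = np.zeros(K)                        # S22: zero vector in R^{1 x K}
    for S in forest:                            # S23-S24: visit all N trees
        v_star += np.asarray(predict_tree(S, x))
    return int(np.argmax(v_star)) + 1           # S25: 1-based action category k
```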
According to the action prediction method based on the multitask random forest, the video observation rate is abstracted into a classification task; by jointly learning two related tasks, action category classification and video observation rate classification, the multitask random forest captures the latent relation between the action category and the observation rate of an incomplete video, and the accuracy of predicting the action category of an incomplete video is improved.
While the embodiments of the present invention have been described in detail, the present invention is not limited to the above embodiments, and various changes can be made without departing from the spirit of the present invention within the knowledge of those skilled in the art.

Claims (10)

1. An action prediction method based on a multitask random forest, characterized by comprising the following steps:
S1: constructing an action prediction model based on a multitask random forest by using training data, wherein the multitask random forest is an ensemble learning model comprising N multitask decision trees;
constructing the multitask random forest comprises the following steps:
S11: collecting a training set containing M incomplete videos D = {(x_m, y_m, r_m)}_{m=1}^{M}; each sample in the training set D contains a feature vector x_m ∈ R^{F×1}, an action category label y_m ∈ {1, 2, ..., K} and an observation rate label r_m ∈ {0.1, 0.2, ..., 0.9}, where F represents the number of features in the feature vector and K represents the number of action categories;
S12: constructing a multitask decision tree, which comprises the following specific steps:
S121: sampling the training set D to obtain a subset of the training set D: a sample set D';
S122: creating a node sequence S, wherein the elements of the sequence are 8-tuples (node, l, f, c_1, c_2, α, β, v), where node denotes the node number, l the depth of the node, f the parent node of the node, c_1 the left child node of the node, c_2 the right child node of the node, α the splitting feature of the node, β the split value of the node, and v the category vector stored in the node; in particular, if the node is an intermediate node, v is empty, and if the node is a leaf node, c_1, c_2, α and β are empty; creating a pending node sequence Q, wherein the elements of the sequence are 4-tuples (node, l, f, D*), where node denotes the node number, l the depth of the node, f the parent node of the node, and D* the set of training samples arriving at the node; initializing S and Q to empty; creating a root node (1, 1, 0, D') and adding it to Q, which inputs the sample set D' to the root node; initializing the node counter to 1;
S123: fetching the first node in Q, saving the 4-tuple representing the node as (node, l, f, D*), and removing it from Q; judging whether the depth l of the node equals the preset maximum depth L_max of the multitask decision tree; if l equals L_max, the node is a leaf node, and S125 is executed; if l is less than L_max, continuing to judge whether the samples of the set D* arriving at the node all belong to the same category on both labels; if so, the node is a leaf node, and S125 is executed; if they belong to the same category on neither label, or belong to the same category on one label but not on the other, the node is an intermediate node, and S124 is executed;
S124: splitting the intermediate node (node, l, f, D*) into two child nodes and distributing the data set D* to the two child nodes; the specific steps are as follows:
(1) randomly selecting G candidate features {α_g}_{g=1:G} from the F features, and, according to the value range of all samples in D* on each feature α_g, randomly generating G corresponding split values {β_g}_{g=1:G}, thereby obtaining G 2-tuples {(α_g, β_g)}_{g=1:G}, each 2-tuple representing a candidate splitting scheme;
(2) judging whether the samples in D* belong to the same category on both labels: if they belong to the same category on the action category label, selecting the observation rate label to calculate the information gain and executing step (4); if they belong to the same category on the observation rate label, calculating the information gain with the action category label and executing step (3); if they belong to the same category on neither label, randomly selecting one of the action category label and the observation rate label; if the action category label is selected to calculate the information gain, executing step (3), and if the observation rate label is selected to calculate the information gain, executing step (4);
(3) according to the distribution of the samples in D* over the action categories, calculating the information entropy EntAction(D*); taking each scheme (α_g, β_g) of the candidate set {(α_g, β_g)}_{g=1:G} in turn and splitting the sample set D* into two subsets D*_L and D*_R corresponding to the left and right child nodes: if the value of a sample on the α_g-th feature is smaller than the split value β_g, the sample belongs to the subset D*_L; otherwise, the sample belongs to the subset D*_R; calculating the information entropies EntAction(D*_L) and EntAction(D*_R) from the distribution of the samples over the action categories; calculating the information gain Gain(D*, α_g, β_g) of the 2-tuple (α_g, β_g) partitioning D*, and choosing the splitting scheme (α*, β*) with the largest information gain; then executing step (5);
(4) according to the distribution of the samples in D* over the observation rate labels, calculating the information entropy EntRatio(D*); taking each scheme (α_g, β_g) of the candidate set {(α_g, β_g)}_{g=1:G} in turn and splitting the sample set D* into two subsets D*_L and D*_R corresponding to the left and right child nodes: if the value of a sample on the α_g-th feature is smaller than the split value β_g, the sample belongs to the subset D*_L; otherwise, the sample belongs to the subset D*_R; calculating the information entropies EntRatio(D*_L) and EntRatio(D*_R) from the distribution of the samples over the observation rate labels; calculating the information gain Gain(D*, α_g, β_g) of the 2-tuple (α_g, β_g) partitioning D*, and choosing the splitting scheme (α*, β*) with the largest information gain;
(5) according to the splitting scheme (α*, β*), i.e. with splitting feature α* and split value β*, splitting the sample set D* into two subsets D*_L and D*_R corresponding to the left and right child nodes; the number of the left child node is counter+1, and the number of the right child node is counter+2; the current intermediate node (node, l, f, D*) can be further expressed as the 8-tuple (node, l, f, counter+1, counter+2, α*, β*, 0), which is added to the node sequence S; the left child node (counter+1, l+1, node, D*_L) and the right child node (counter+2, l+1, node, D*_R) are added to the pending node sequence Q;
(6) updating the node counter variable: counter = counter + 2; executing S126;
S125: the node (node, l, f, D*) is regarded as a leaf node; counting the proportion of videos of each action category in D* forms the action category vector v of the leaf node, which expresses the likelihood that a sample arriving at the leaf node belongs to each action category; the current leaf node can be further expressed as the 8-tuple (node, l, f, 0, 0, 0, 0, v), which is added to the node sequence S;
S126: judging whether the pending node sequence Q is empty; if so, the construction of one multitask decision tree is completed; if not, returning to S123 to continue execution;
S13: judging whether N multitask decision trees have been constructed; if so, the construction of the multitask random forest is completed; if not, returning to S12 to construct the next multitask decision tree;
S2: for a newly input video containing an incomplete action, predicting the action category of the video by using the multitask random forest, which specifically comprises the following steps:
S21: extracting the feature vector x of the video;
S22: initializing the sequence number n of the current tree to 1; initializing the final action category vector v* as a zero vector, v* ∈ R^{1×K};
S23: inputting the video feature vector x into the nth tree of the multitask random forest, enabling the sample to go deep downwards from a root node, sequentially selecting a left branch or a right branch according to a splitting principle until reaching a leaf node, and acquiring an action category vector v stored by the leaf node;
S24: judging whether n equals the total number N of trees in the multitask random forest; if so, executing step S25; otherwise setting v* = v* + v and n = n + 1, and continuing to execute step S23;
S25: the action category corresponding to the largest element of the final action category vector v* is taken as the prediction result of the multitask random forest for the video; supposing the value of the k-th element of v* is the largest, the prediction result for the video is action category k.
2. The action prediction method based on a multitask random forest according to claim 1, characterized in that: in S121, the training set D is sampled by the Bootstrap method to obtain a subset of the training set D: the sample set D'; the Bootstrap method helps ensure that the sample sets input to different trees differ.
3. The action prediction method based on a multitask random forest according to claim 1, characterized in that: in step (2) of S124, a constant γ ∈ [0,1] is predefined to control the probability of selecting the two labels; for a given node, a random number μ ∈ [0,1] is generated; if μ < γ, the action category label is selected to calculate the information gain, and if μ ≥ γ, the observation rate label is selected to calculate the information gain.
4. The action prediction method based on a multitask random forest according to claim 1, characterized in that: in step (3) of S124, the information entropy of D* is calculated by formula (1):

EntAction(D*) = - Σ_{k=1}^{K} p_k log2 p_k        (1)

where K represents the number of action categories and p_k represents the proportion of samples of the k-th action category in the sample set D*.
5. The action prediction method based on a multitask random forest according to claim 1, characterized in that: in step (3) of S124, the information gain of the 2-tuple (α_g, β_g) partitioning D* is calculated by formula (2):

Gain(D*, α_g, β_g) = EntAction(D*) - (|D*_L|/|D*|)·EntAction(D*_L) - (|D*_R|/|D*|)·EntAction(D*_R)        (2)

where |D*|, |D*_L| and |D*_R| represent the numbers of elements in the sets D*, D*_L and D*_R, respectively.
6. The action prediction method based on a multitask random forest according to claim 1, characterized in that: in step (3) of S124, the splitting scheme (α*, β*) with the largest information gain is selected by formula (3):

(α*, β*) = argmax_{(α_g, β_g)} Gain(D*, α_g, β_g)        (3)
7. The action prediction method based on a multitask random forest according to claim 1, characterized in that: in step (4) of S124, the information entropy of D* is calculated by formula (4):

EntRatio(D*) = - Σ_{j=1}^{9} q_j log2 q_j        (4)

where 9 is the number of discretized video observation rates and q_j (j = 1, 2, ..., 9) represents the proportion of samples of the j-th video observation rate in the sample set D*.
8. The action prediction method based on a multitask random forest according to claim 1, characterized in that: in step (4) of S124, the information gain of the 2-tuple (α_g, β_g) partitioning D* is calculated by formula (5):

Gain(D*, α_g, β_g) = EntRatio(D*) - (|D*_L|/|D*|)·EntRatio(D*_L) - (|D*_R|/|D*|)·EntRatio(D*_R)        (5)

where |D*|, |D*_L| and |D*_R| represent the numbers of elements in the sets D*, D*_L and D*_R, respectively.
9. The action prediction method based on a multitask random forest according to claim 1, characterized in that: in step (4) of S124, the splitting scheme (α*, β*) with the largest information gain is selected by formula (6):

(α*, β*) = argmax_{(α_g, β_g)} Gain(D*, α_g, β_g)        (6)
10. The action prediction method based on a multitask random forest according to claim 1, characterized in that: in S23, the action category vector v is obtained by using the node sequence S of the multitask decision tree; the specific steps are as follows:
S231: the current node variable e is the first element in the sequence S, i.e. the root node;
S232: taking the value of the current node variable e, denoted as the 8-tuple (node, l, f, c_1, c_2, α, β, v); if c_1 = 0, c_2 = 0, α = 0 and β = 0, the node is a leaf node, and step S234 is executed; if v is 0, the node is an intermediate node, and step S233 is executed;
S233: judging whether the value of the video feature vector x on the α-th feature is smaller than the split value β; if so, selecting the left branch and traversing S to find the 8-tuple whose node number is c_1, assigning it to the variable e; otherwise, selecting the right branch and traversing S to find the 8-tuple whose node number is c_2, assigning it to the variable e; then continuing to execute step S232;
S234: returning the action category vector v recorded in the current node variable e.
CN201910856984.6A 2019-09-11 2019-09-11 Action prediction method based on multitask random forest Active CN110633667B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910856984.6A CN110633667B (en) 2019-09-11 2019-09-11 Action prediction method based on multitask random forest


Publications (2)

Publication Number Publication Date
CN110633667A (en) 2019-12-31
CN110633667B (en) 2021-11-26

Family

Family ID: 68970937

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910856984.6A Active CN110633667B (en) 2019-09-11 2019-09-11 Action prediction method based on multitask random forest

Country Status (1)

Country Link
CN (1) CN110633667B (en)



Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107292186A (en) * 2016-03-31 2017-10-24 阿里巴巴集团控股有限公司 A kind of model training method and device based on random forest
CN106845513A (en) * 2016-12-05 2017-06-13 华中师范大学 Staff detector and method based on condition random forest
CN108334951A (en) * 2017-01-20 2018-07-27 微软技术许可有限责任公司 For the pre- statistics of the data of the node of decision tree
CN109947079A (en) * 2019-03-20 2019-06-28 阿里巴巴集团控股有限公司 Region method for detecting abnormality and edge calculations equipment based on edge calculations

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
LIU CUIWEI et al.: "Learning a discrimination mid-level feature for action recognition", Science China *
YANHUA YANG et al.: "Discrimination Multi-instance Multitask Learning for 3D Action Recognition", IEEE *
王宪兵 et al.: "Teaching gesture recognition based on the Kinect sensor" (基于Kinect传感器的教学手势识别), 新型工业化 (New Industrialization) *
石祥滨 et al.: "An action recognition method based on multi-feature fusion" (基于多特征融合的动作识别方法), Journal of Shenyang Aerospace University (沈阳航空航天大学学报) *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022027371A1 (en) * 2020-08-05 2022-02-10 Alibaba Group Holding Limited Noise control in processor-in-memory architectures
CN111738534A (en) * 2020-08-21 2020-10-02 支付宝(杭州)信息技术有限公司 Training of multi-task prediction model, and prediction method and device of event type
CN115499285A (en) * 2021-06-18 2022-12-20 中国科学院声学研究所 Method for constructing name resolution system provided by distributed hierarchical time delay
CN115499285B (en) * 2021-06-18 2023-11-24 中国科学院声学研究所 Method for constructing name resolution system provided by distributed hierarchical time delay
CN115481681A (en) * 2022-09-09 2022-12-16 武汉中数医疗科技有限公司 Artificial intelligence-based breast sampling data processing method
CN115481681B (en) * 2022-09-09 2024-02-06 武汉中数医疗科技有限公司 Mammary gland sampling data processing method based on artificial intelligence

Also Published As

Publication number Publication date
CN110633667B (en) 2021-11-26


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20231228

Address after: No. 19 Youhao Street, Shenhe District, Shenyang City, Liaoning Province, 110013

Patentee after: Shenyang Tianmu Technology Co.,Ltd.

Address before: No. 37 Daoyi South Avenue, Shenbei New Area, Shenyang, Liaoning, 110136

Patentee before: Shenyang Aerospace University