CN106909890A - Human behavior recognition method based on body-part cluster features - Google Patents
- Publication number
- CN106909890A (application CN201710057722.4A)
- Authority
- CN
- China
- Prior art keywords
- human body
- histogram
- point
- video
- characteristic point
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/20—Movements or behaviour, e.g. gesture recognition
- G06V40/23—Recognition of whole body movements, e.g. for sport training
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/232—Non-hierarchical techniques
- G06F18/2321—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
- G06F18/23213—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2415—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
- G06F18/24155—Bayesian classification
Abstract
The present invention discloses a human behavior recognition method based on body-part cluster features, comprising: Step 1, in the training stage, pose estimation is first used to extract the body-part cluster feature points of each frame of a training video, and the local position offset and global position offset of every feature point in each frame are then computed; the feature-point offsets of all training videos are collected and clustered with the K-means algorithm, the resulting cluster centres form a codebook, and the current training video is then represented by a set of joint feature-point histograms built from the codebook. Step 2, in the test stage, histograms are first built for a test video using the codebook formed in the training stage, and the naive Bayes nearest neighbor classification method then compares the test-stage histograms with the training-stage histograms to recognize the behavior. The technical solution of the invention achieves a high recognition rate.
Description
Technical field
The invention belongs to the fields of computer vision and pattern recognition, and in particular relates to a human behavior recognition method based on body-part cluster features.
Background
In recent years, human behavior recognition has attracted increasing attention. Understanding what a person is doing, and even inferring their intent, by analysing the interaction between people and objects is particularly important, so the automatic understanding and recognition of human actions is critical for many artificial intelligence systems. It has many practical applications, including intelligent video surveillance, motion retrieval, human-computer interaction, and health care. For example, a human-computer interaction system that serves people intelligently must not only perceive human motion but also understand the semantics of human actions and infer their intent.
Traditional action recognition and classification methods mainly perform behavior recognition on video sequences captured by RGB cameras; the video obtained in this case is a sequence of 2D RGB images in temporal order. Human action recognition based on RGB information has made great progress over the past decades, and many methods have been proposed, including human key poses, motion templates, silhouettes, and space-time shapes. Methods based on spatio-temporal detection allow accurate similarity measurement, and methods based on dense motion trajectories have also attracted attention because of their outstanding performance.
Although these methods achieve good recognition results on standard benchmark datasets, human action recognition remains extremely challenging. Human actions are highly flexible; body pose, motion, and clothing vary significantly between individuals; and camera viewpoint, camera motion, changes in illumination, self-occlusion, occlusion combined with human-object interaction, and complex spatio-temporal structure all act together. RGB information is also highly susceptible to environmental factors: changes in illumination, background, and so on introduce varying degrees of interference. Furthermore, the RGB images of two different behaviors may look very similar, which makes action classification very difficult.
Advances in sensor technology have made inexpensive, high-resolution depth sensors available, such as Microsoft's Kinect and ASUS's Xtion PRO LIVE. Each pixel of a depth image captured by a depth camera records the depth of the scene, which is entirely different from the light intensity represented by a pixel in an ordinary RGB image. Depth sensors greatly extend the ability of computer systems to perceive the three-dimensional world and to extract low-level visual information. Compared with traditional RGB cameras, depth sensors have clear advantages for human action recognition: they are unaffected by illumination conditions and are invariant to color and texture. An RGBD camera captures an RGB sequence and a depth sequence at the same time, and depth information greatly simplifies object detection and segmentation. Seen from a single viewpoint, different behaviors may have similar 2D projections; in that case the depth map provides additional body-shape information for telling them apart. For these reasons, much recent research has focused on behavior recognition using 3D information, and the 3D information obtained by RGBD cameras has markedly improved human pose estimation.
Among these works, Lu et al. proposed an effective scheme that recognizes human actions by computing the local position offsets of the 3D positions of human joints. However, this method does not take the characteristics of the time series into account, so the histograms that record the joint information lose the continuity information of the sequence; nor does it consider, at the codebook formation stage, the independence of the motion of each joint in action recognition.
In addition, when capturing a human body, Microsoft's Kinect camera provides not only the depth map of the body but also the positions of 16 human joints, and most researchers base human action recognition on the joint information provided by Kinect. However, Kinect uses roughly the first 20 frames to determine where the body is in the image and cannot provide joint positions during those frames. Moreover, when the amplitude of the action is relatively large, for example when the body moves from standing upright to kicking, the joint positions given by Kinect are considerably offset and not accurate enough, as shown in Fig. 1.
Summary of the invention
The technical problem to be solved by the present invention is to provide a human behavior recognition method based on body-part cluster features that achieves a high recognition rate.
To achieve the above object, the present invention adopts the following technical solution:
A human behavior recognition method based on body-part cluster features comprises:
Step 1: in the training stage, pose estimation is first used to extract the body-part cluster feature points of each frame of the training video, and the position offset of each feature point in each frame relative to the corresponding feature point in an earlier frame is then computed. The feature-point offsets of all training videos are collected and clustered with the K-means algorithm; the cluster centres form the codebook, and the current training video is then represented by a set of joint feature-point histograms according to the codebook.
Step 2: in the test stage, histograms are first built for a test video using the codebook formed in the training stage, and the naive Bayes nearest neighbor (NBNN) classification method then compares the test-stage histograms with the training-stage histograms to recognize the behavior.
Preferably, Step 1 comprises the following steps:
Step 1.1, extraction of human pose feature points, comprising the following steps:
Step 1.1.1: the positions of the body's extremity points are first located accurately, and the body is then divided into regions centred on the extremity points. Using geodesic distance as the classification criterion and the nearest-neighbor algorithm as the classifier, the depth pixels of the body are divided into six major parts: head, left arm, right arm, left leg, right leg, and trunk. The body parts are classified according to the following formula:
$$\Omega_{i'} = \{\, v \in V : \|v - e_{i'}\|_{geod} \le \min_{j'=0,\ldots,5} \|v - e_{j'}\|_{geod} \,\}, \quad i' = 0, \ldots, 5$$
where $\Omega_{i'}$, $i' = 0, \ldots, 5$, are the six classified body regions, corresponding to the head, left arm, right arm, left leg, right leg, and trunk; $v$ is a pixel of the body; $e_{i'}$ is the $i'$-th extremity point (for example a hand or a foot), and for $i' = 0$, $e_{i'}$ is the centre point of the body; and $\|v - e_{i'}\|_{geod}$ is the geodesic distance from pixel $v$ to extremity point $e_{i'}$.
Step 1.1.2: a region clustering algorithm based on K-means is used to extract region-wise feature points of the body. Clustering is performed within the body-part regions obtained above, and cluster feature points are extracted, following the joint-based representation of the human body, to characterize different body poses.
Step 1.2, computation of the feature vector of a human action sequence, divided into the following steps:
Step 1.2.1, computing the position offsets: for a video sequence F of n frames, the 3D coordinates f(t) of the m feature points of each frame are obtained by human pose estimation:
$$f(t) = \phi(t) = \{\theta_1(t), \theta_2(t), \ldots, \theta_m(t)\}, \quad t \in \{1, 2, \ldots, n\}$$
where $\theta_i(t) = (x_i(t), y_i(t), z_i(t))$, $i \in \{1, 2, \ldots, m\}$, is the 3D coordinate of the $i$-th body feature point of $f(t)$, and $m$ is the number of feature points.
The global offset of the action sequence is obtained from the position offset of each feature point between the current frame t and the first frame:
$$f_{i1}(t) = \theta_i(t) - \theta_i(1)$$
The local offset of the action sequence is obtained from the position offset of each feature point between the current frame t and frame $(t - \Delta t)$:
$$f_{i2}(t) = \theta_i(t) - \theta_i(t - \Delta t)$$
where $\Delta t$ is a time interval.
Once the offsets of all body feature points in frame t have been obtained, the feature information of all feature points in frame t is represented by two parts, the global offset $f_1(t)$ and the local offset $f_2(t)$:
$$f_1(t) = [f_{11}(t), f_{21}(t), \ldots, f_{m1}(t)]$$
$$f_2(t) = [f_{12}(t), f_{22}(t), \ldots, f_{m2}(t)]$$
Step 1.2.2, obtaining the action-sequence feature vector of a video:
All body feature points of each training video are represented by a set of offsets. The global offset vectors of every feature point of all collected videos are denoted $R_1$, i.e. $R_1 = \{ f^{j}_{i1}(t) \}$, where $f^{j}_{i1}(t)$ corresponds to frame t of the $i$-th feature point of the $j$-th training video. The local offset vectors of every feature point of all collected videos are denoted $R_2$, i.e. $R_2 = \{ f^{j}_{i2}(t) \}$, where $f^{j}_{i2}(t)$ corresponds to frame t of the $i$-th feature point of the $j$-th training video. Let $R = R_1 \cup R_2$. K-means is then applied to $R$ to form the codebook $\{b_k\}$, $k = 1, 2, \ldots, K$; each codeword is the centre of one cluster, and Euclidean distance is used as the clustering metric.
Let each training video be $F = \{f(t)\}$, $t = 1, 2, \ldots, n$, where n is the number of frames. For each body feature point i in each frame f(t), the global offset vector $f_{i1}(t)$ and the local offset vector $f_{i2}(t)$ are each assigned the codeword in $\{b_k\}$ with the smallest Euclidean distance, i.e.
$$k^{*}_{is}(t) = \arg\min_{k} \| f_{is}(t) - b_k \|_2, \quad s \in \{1, 2\}$$
The motion of each feature point i in F is therefore described by all of its position offsets $f_{i1}(t)$ and $f_{i2}(t)$ in the video. These offsets can be further represented by a histogram $h_i$ of codeword frequencies, composed of $h_i^{1}$ and $h_i^{2}$, where $h_i^{1}$ is the global offset histogram of the $i$-th feature point and $h_i^{2}$ is its local offset histogram, i.e.
$$h_i^{s}(k) = \#\{\, t : k^{*}_{is}(t) = k \,\}, \quad s \in \{1, 2\}, \quad k = 1, 2, \ldots, K$$
where $\#\{\cdot\}$ is a counting function. Finally, F is represented by the set of histograms of all feature points, i.e. $F = \{h_i\}$, $i = 1, 2, \ldots, m$, where $h_i$ is the histogram of the $i$-th feature point.
Preferably, Step 2 uses the naive Bayes nearest neighbor (NBNN) classifier to classify actions. A video sequence is given as a set of feature-point histograms, $F = \{h_i\}$, $i = 1, 2, \ldots, m$, where m is the number of feature points.
The original NBNN image classification concept is extended to NBNN video classification, i.e. behavior recognition. What is computed is the joint-histogram-to-class distance rather than the video-to-class or video-to-video distance, as follows:
$$c^{*} = \arg\min_{c} \sum_{i=1}^{m} \| h_i - NN_c(h_i) \|^2$$
where $NN_c(h_i)$ is the histogram of the $i$-th feature point in behavior class c that is nearest to $h_i$, i.e. $NN_c(h_i) = \arg\min_{h'_i(c)} \| h_i - h'_i(c) \|$, and $h'_i(c)$ denotes a histogram of the $i$-th feature point in behavior class c.
Brief description of the drawings
Fig. 1 is a schematic diagram of the erroneous joints given by Microsoft Kinect;
Fig. 2 is a flow chart of the human behavior recognition method of the present invention;
Fig. 3 is a schematic diagram of extremity detection based on geodesic distance;
Fig. 4 is a schematic diagram of body region labelling based on geodesic distance;
Fig. 5 is a schematic diagram of cluster-based pose feature extraction;
Fig. 6a is a schematic diagram of the global offset of the current frame;
Fig. 6b is a schematic diagram of the local offset of the current frame;
Fig. 7 shows the process of forming cluster centres and histograms from the global and local position offsets of the feature points;
Fig. 8 compares the action recognition rates under different conditions;
Fig. 9 shows the action classification results of the method of the present invention;
Fig. 10 shows the action classification results of the method of Lu et al. based on joint features;
Fig. 11 shows the action classification results of the method of the present invention based on joint features.
Detailed description of the embodiments
An embodiment of the present invention provides a human behavior recognition method based on body-part cluster features. To avoid the inaccuracy of human joint position information, the cluster centres of the segmented body parts are used as the feature points that characterize body pose. To exploit the global properties of the action sequence, the invention adds the global position offset to the sequence feature vector, which compensates for the shortcomings of recognition based on local position offsets alone. Accordingly, the key problems to be solved are: extraction of human pose features; computation of the feature vector of a human action sequence; and action recognition and classification.
The invention takes the depth image sequence of a moving human body as input and outputs the computed action class. The core of the computation is to build feature vectors from the offsets of the spatial positions of the body pose feature points (both global and local offsets) to describe a behavior sequence, and to classify actions on that basis.
A human behavior recognition method based on body-part cluster features comprises:
Step 1: in the training stage, pose estimation is first used to extract the body-part cluster feature points of each frame of the training video, and the position offset of each feature point in each frame relative to the corresponding feature point in an earlier frame is then computed. The feature-point offsets of all training videos are collected and clustered with the K-means algorithm; the cluster centres form the codebook, and the current training video is then represented by a set of joint feature-point histograms according to the codebook.
Step 2: in the test stage, histograms are first built for a test video using the codebook formed in the training stage, and the naive Bayes nearest neighbor (NBNN) classification method then compares the test-stage histograms with the training-stage histograms to recognize the behavior, as shown in Fig. 2.
Step 1 comprises the following steps:
Step 1.1, extraction of human pose feature points
In this stage, depth data of a real human body are captured with Kinect and then converted into a point cloud.
As shown in Fig. 3, the positions of the body's extremity points (left and right hands, left and right feet, and head) are first located accurately, using Dijkstra's algorithm based on geodesic distance with the geometric centre of the body as the source point. The body is then divided into regions centred on the extremity points.
As shown in Fig. 4, using geodesic distance as the classification criterion and the nearest-neighbor algorithm as the classifier, the depth pixels of the body are divided into six major parts: head, left arm, right arm, left leg, right leg, and trunk. The body parts are classified according to Formula (1):
$$\Omega_{i'} = \{\, v \in V : \|v - e_{i'}\|_{geod} \le \min_{j'=0,\ldots,5} \|v - e_{j'}\|_{geod} \,\}, \quad i' = 0, \ldots, 5 \tag{1}$$
where $\Omega_{i'}$, $i' = 0, \ldots, 5$, are the six classified body regions, corresponding to the head, left arm, right arm, left leg, right leg, and trunk; $v$ is a pixel of the body; $e_{i'}$ is the $i'$-th extremity point (for example a hand or a foot), and for $i' = 0$, $e_{i'}$ is the centre point of the body; and $\|v - e_{i'}\|_{geod}$ is the geodesic distance from pixel $v$ to extremity point $e_{i'}$. Formula (1) states that every pixel in the $i'$-th part has a geodesic distance to the $i'$-th extremity point $e_{i'}$ that is no greater than its geodesic distance to any other extremity point.
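The labelling rule of Formula (1) can be sketched as follows. This is an illustrative sketch only, not the patent's implementation: it assumes the body is given as a boolean pixel mask with a depth map, approximates geodesic distance by Dijkstra shortest paths on the 4-connected pixel grid, and uses an assumed edge weight of sqrt(1 + depth difference squared). The function names `geodesic_distances` and `label_regions` are hypothetical.

```python
import heapq
import numpy as np

def geodesic_distances(mask, depth, source):
    """Dijkstra shortest-path distances from `source` over body pixels.

    mask   : (H, W) bool array, True for body pixels
    depth  : (H, W) float array of depth values
    source : (row, col) of the seed pixel
    The edge weight between 4-neighbours is assumed to be
    sqrt(1 + depth_difference**2); this is an illustrative choice.
    """
    h, w = mask.shape
    dist = np.full((h, w), np.inf)
    dist[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        d, (r, c) = heapq.heappop(heap)
        if d > dist[r, c]:
            continue  # stale queue entry, a shorter path was already found
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < h and 0 <= nc < w and mask[nr, nc]:
                nd = d + np.sqrt(1.0 + (depth[nr, nc] - depth[r, c]) ** 2)
                if nd < dist[nr, nc]:
                    dist[nr, nc] = nd
                    heapq.heappush(heap, (nd, (nr, nc)))
    return dist

def label_regions(mask, depth, seeds):
    """Assign each body pixel to the seed with the smallest geodesic distance.

    seeds : list of (row, col); seeds[0] is the body centre and the
            remaining entries are extremity points.
    Returns an (H, W) int array of region indices, -1 outside the body.
    """
    all_dist = np.stack([geodesic_distances(mask, depth, s) for s in seeds])
    labels = np.argmin(all_dist, axis=0)  # ties go to the lower seed index
    labels[~mask] = -1
    return labels
```

Running one Dijkstra pass per seed keeps the sketch short; a single multi-source pass would give the same labels with less work.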
To characterize body pose effectively, the method uses a region clustering algorithm based on K-means to extract region-wise feature points of the body, i.e. clustering is performed within the body-part regions obtained above, as shown in Fig. 5. In practice, when the number of cluster points (i.e. the number of feature points) m is too small, the features lack expressiveness; when it is too large, the features become less regular. Following the common 16-joint representation of the human body, the invention extracts m = 15 cluster feature points to characterize different body poses.
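A minimal sketch of region-wise clustering is given below, assuming the body points have already been labelled by region. The patent fixes m = 15 but does not state how the clusters are split between regions, so the `clusters_per_region` argument is a hypothetical parameter, and `kmeans` is a plain Lloyd's iteration written out for self-containment rather than any particular library routine.

```python
import numpy as np

def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's K-means with Euclidean distance.

    points : (N, D) array. Returns the (k, D) array of cluster centres.
    """
    rng = np.random.default_rng(seed)
    start = rng.choice(len(points), size=k, replace=False)
    centres = points[start].astype(float)
    for _ in range(iters):
        # distance of every point to every centre, shape (N, k)
        d = np.linalg.norm(points[:, None, :] - centres[None, :, :], axis=2)
        labels = np.argmin(d, axis=1)
        new_centres = centres.copy()
        for j in range(k):
            members = points[labels == j]
            if len(members):  # keep the old centre if a cluster empties
                new_centres[j] = members.mean(axis=0)
        if np.allclose(new_centres, centres):
            break
        centres = new_centres
    return centres

def region_feature_points(points, region_labels, clusters_per_region):
    """Cluster the 3D points of each body region separately.

    points              : (N, 3) array of body points
    region_labels       : (N,) int array, region index of each point
    clusters_per_region : dict {region index: number of clusters}
                          (hypothetical split; the counts should sum to m)
    Returns an (m, 3) array of cluster centres used as pose feature points,
    ordered by region index.
    """
    feats = []
    for region, k in sorted(clusters_per_region.items()):
        feats.append(kmeans(points[region_labels == region], k))
    return np.vstack(feats)
```

Clustering each region on its own guarantees that every feature point stays inside one body part, which is what lets the centres stand in for joints.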
Step 1.2, computation of the feature vector of a human action sequence, divided into the following steps:
Step 1.2.1, computing the position offsets: for a video sequence F of n frames, the 3D coordinates f(t) of the m feature points of each frame are obtained by human pose estimation:
$$f(t) = \phi(t) = \{\theta_1(t), \theta_2(t), \ldots, \theta_m(t)\}, \quad t \in \{1, 2, \ldots, n\} \tag{2}$$
where $\theta_i(t) = (x_i(t), y_i(t), z_i(t))$, $i \in \{1, 2, \ldots, m\}$, is the 3D coordinate of the $i$-th body feature point of $f(t)$, and $m$ is the number of feature points.
The invention obtains the global offset of the action sequence from the position offset of each feature point between the current frame t and the first frame:
$$f_{i1}(t) = \theta_i(t) - \theta_i(1)$$
The local offset of the action sequence is obtained from the position offset of each feature point between the current frame t and frame $(t - \Delta t)$:
$$f_{i2}(t) = \theta_i(t) - \theta_i(t - \Delta t)$$
As shown in Fig. 6, $\Delta t$ is a time interval that balances the precision of the offset against robustness to noise. The larger $\Delta t$ is, the more robust the offset is to noise, but the lower its precision; conversely, a smaller $\Delta t$ gives higher precision but poorer robustness. Its value is chosen according to the action-sequence database at hand.
Once the offsets of all body feature points in frame t have been obtained, the feature information of all feature points in frame t is represented by two parts, the global offset $f_1(t)$ and the local offset $f_2(t)$:
$$f_1(t) = [f_{11}(t), f_{21}(t), \ldots, f_{m1}(t)]$$
$$f_2(t) = [f_{12}(t), f_{22}(t), \ldots, f_{m2}(t)]$$
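The two offsets above reduce to array subtractions. The sketch below assumes the feature points of a video are stored as an (n, m, 3) array with 0-based frame indices, and it assumes that local offsets are only defined for frames that have a predecessor Δt frames earlier; the text does not say how the first Δt frames are handled. The name `position_offsets` is hypothetical.

```python
import numpy as np

def position_offsets(theta, dt):
    """Global and local position offsets of pose feature points.

    theta : (n, m, 3) array; theta[t, i] holds the 3D coordinates of
            feature point i in frame t (0-based frame index)
    dt    : frame interval (positive int), the Δt of the text
    Returns
      f_global : (n, m, 3) array, theta[t] - theta[0]
      f_local  : (n - dt, m, 3) array, theta[t] - theta[t - dt] for t >= dt
                 (assumption: earlier frames have no local offset)
    """
    theta = np.asarray(theta, dtype=float)
    f_global = theta - theta[0]          # offset from the first frame
    f_local = theta[dt:] - theta[:-dt]   # offset from the frame dt earlier
    return f_global, f_local
```

For one feature point moving along x through positions 0, 1, 3, 6 with `dt=2`, the global x-offsets are 0, 1, 3, 6 and the local x-offsets are 3 and 5.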
Step 1.2.2, obtaining the action-sequence feature vector of a video: all body feature points of each training video are represented by a set of offsets. The global offset vectors of every feature point of all collected videos are denoted $R_1$, i.e. $R_1 = \{ f^{j}_{i1}(t) \}$, where $f^{j}_{i1}(t)$ corresponds to frame t of the $i$-th feature point of the $j$-th training video. The local offset vectors of every feature point of all collected videos are denoted $R_2$, i.e. $R_2 = \{ f^{j}_{i2}(t) \}$, where $f^{j}_{i2}(t)$ corresponds to frame t of the $i$-th feature point of the $j$-th training video. Let $R = R_1 \cup R_2$. K-means is then applied to $R$ to form the codebook $\{b_k\}$, $k = 1, 2, \ldots, K$; each codeword is the centre of one cluster, and Euclidean distance is used as the clustering metric.
Let each training video be $F = \{f(t)\}$, $t = 1, 2, \ldots, n$, where n is the number of frames. For each body feature point i in each frame f(t), the global offset vector $f_{i1}(t)$ and the local offset vector $f_{i2}(t)$ are each assigned the codeword in $\{b_k\}$ with the smallest Euclidean distance, i.e.
$$k^{*}_{is}(t) = \arg\min_{k} \| f_{is}(t) - b_k \|_2, \quad s \in \{1, 2\}$$
The motion of each feature point i in F is therefore described by all of its position offsets $f_{i1}(t)$ and $f_{i2}(t)$ in the video. These offsets can be further represented by a histogram $h_i$ of codeword frequencies, composed of $h_i^{1}$ and $h_i^{2}$, where $h_i^{1}$ is the global offset histogram of the $i$-th feature point and $h_i^{2}$ is its local offset histogram, i.e.
$$h_i^{s}(k) = \#\{\, t : k^{*}_{is}(t) = k \,\}, \quad s \in \{1, 2\}, \quad k = 1, 2, \ldots, K$$
where $\#\{\cdot\}$ is a counting function. Finally, F is represented by the set of histograms of all feature points, i.e. $F = \{h_i\}$, $i = 1, 2, \ldots, m$, where $h_i$ is the histogram of the $i$-th feature point, as shown in Fig. 7.
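The quantization and histogram step can be sketched as below, assuming a codebook has already been obtained by K-means on the pooled offsets. Two details are assumptions rather than statements from the text: the global and local histograms of a feature point are joined by concatenation, and the histograms hold raw counts without normalization. All function names are hypothetical.

```python
import numpy as np

def nearest_codeword(offsets, codebook):
    """Index of the Euclidean-nearest codeword for each offset vector.

    offsets : (T, 3) array; codebook : (K, 3) array. Returns (T,) ints.
    """
    d = np.linalg.norm(offsets[:, None, :] - codebook[None, :, :], axis=2)
    return np.argmin(d, axis=1)

def feature_point_histogram(f_global_i, f_local_i, codebook):
    """Histogram h_i of one feature point: codeword counts of its global
    offsets followed by codeword counts of its local offsets (length 2K).
    Concatenation and raw counts are assumptions of this sketch."""
    k = len(codebook)
    h_global = np.bincount(nearest_codeword(f_global_i, codebook), minlength=k)
    h_local = np.bincount(nearest_codeword(f_local_i, codebook), minlength=k)
    return np.concatenate([h_global, h_local])

def video_histograms(f_global, f_local, codebook):
    """F = {h_i}: one histogram per feature point, shape (m, 2K).

    f_global : (n, m, 3) global offsets; f_local : (n', m, 3) local offsets.
    """
    m = f_global.shape[1]
    return np.stack([
        feature_point_histogram(f_global[:, i], f_local[:, i], codebook)
        for i in range(m)
    ])
```

Keeping one row per feature point, instead of summing the rows, preserves the per-point independence that the classification step relies on.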
Step 2 uses the naive Bayes nearest neighbor (NBNN) classifier to classify actions. Given a video sequence represented by a set of feature-point histograms, $F = \{h_i\}$, $i = 1, 2, \ldots, m$, where m is the number of feature points, it is tempting simply to combine the whole set into a single histogram for classification. Doing so loses the spatial independence of the body feature points. The spatial information of the body feature points provides additional cues for distinguishing different behaviors, so the independence of the feature points should be fully taken into account.
The invention extends the original NBNN image classification concept to NBNN video classification, i.e. behavior recognition. What is computed is the joint-histogram-to-class distance rather than the video-to-class or video-to-video distance, as follows:
$$c^{*} = \arg\min_{c} \sum_{i=1}^{m} \| h_i - NN_c(h_i) \|^2 \tag{7}$$
where $NN_c(h_i)$ is the histogram of the $i$-th feature point in behavior class c that is nearest to $h_i$, i.e. $NN_c(h_i) = \arg\min_{h'_i(c)} \| h_i - h'_i(c) \|$, and $h'_i(c)$ denotes a histogram of the $i$-th feature point in behavior class c.
Formula (7) means that, for an input test video sequence, the histogram of each feature point is obtained, and the differences between the m feature-point histograms and those of each behavior class of the training videos are then computed; the behavior class $c^{*}$ with the smallest total difference is taken as the behavior class of the current video F.
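A minimal sketch of this classification rule follows. It assumes squared Euclidean distance between histograms (the usual NBNN choice; the exact distance is not recoverable from the text) and assumes the training histograms are grouped per class in a dictionary. The name `nbnn_classify` and the data layout are hypothetical.

```python
import numpy as np

def nbnn_classify(test_hists, train_hists_by_class):
    """Naive Bayes nearest neighbor classification over feature-point histograms.

    test_hists           : (m, B) array, histogram h_i of each feature point
    train_hists_by_class : dict {class label: (V_c, m, B) array} holding the
                           histograms of the V_c training videos of that class
    Returns the label c* that minimises the sum over feature points of the
    squared distance to the nearest same-point histogram in the class.
    """
    best_label, best_cost = None, np.inf
    for label, train in train_hists_by_class.items():
        # squared distance from each test histogram to the same feature
        # point's histogram in every training video of the class: (V_c, m)
        d2 = np.sum((train - test_hists[None, :, :]) ** 2, axis=2)
        # nearest neighbour per feature point, then sum over feature points
        cost = d2.min(axis=0).sum()
        if cost < best_cost:
            best_label, best_cost = label, cost
    return best_label
```

The nearest neighbour is taken per feature point, so different feature points of one test video may match different training videos of the same class; this is the histogram-to-class distance, as opposed to a video-to-video distance.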
The method has been applied to depth image sequences captured with Kinect2 and achieved good experimental results. The experiments used 640 × 480 RGBD images captured indoors under fluorescent lighting. Six subjects were recorded, each performing 7 actions twice, for a total of 84 video sequences and 6343 frames. The actions were raising a hand, waving, squatting, kicking, bending over, swinging the body from side to side, and swaying the body.
In the experiments, the ratio of training set to test set for each action was 2:1, with the split chosen at random. Fifty random trials were run in total, giving an average recognition accuracy of 98.07%. On the same video sequences under the same experimental conditions, i.e. the same number of trials and the same training/test ratio, the method of Lu et al. gave an average recognition rate of 95.00%. Applying the method of the present invention to the joints provided by Microsoft Kinect gave an average recognition rate of 96.43%, which shows the effectiveness of body-part cluster feature points as the basis for action recognition and classification.
Figs. 8, 9, 10, and 11 and Table 1 compare the action classification results of the method of the present invention, the method of Lu et al., and the present method applied to the joints provided by Kinect. The proposed method achieves a high recognition rate for most actions.
In summary, the proposed human action recognition and classification method based on body-part cluster features has been verified to give very good classification results.
Table 1: Recognition accuracy and recognition results under different conditions
Claims (3)
1. A human behavior recognition method based on body-part cluster features, characterized by comprising:
Step 1: in the training stage, pose estimation is first used to extract the body-part cluster feature points of each frame of the training video, and the position offset of each feature point in each frame relative to the corresponding feature point in an earlier frame is then computed; the feature-point offsets of all training videos are collected and clustered with the K-means algorithm, the cluster centres form the codebook, and the current training video is then represented by a set of joint feature-point histograms according to the codebook;
Step 2: in the test stage, histograms are first built for a test video using the codebook formed in the training stage, and the naive Bayes nearest neighbor classification method then compares the test-stage histograms with the training-stage histograms to recognize the behavior.
2. The human behavior recognition method based on body-part cluster features according to claim 1, characterized in that Step 1 comprises the following steps:
Step 1.1, extraction of human pose feature points, comprising the following steps:
Step 1.1.1: the positions of the body's extremity points are first located accurately, and the body is then divided into regions centred on the extremity points; using geodesic distance as the classification criterion and the nearest-neighbor algorithm as the classifier, the depth pixels of the body are divided into six major parts: head, left arm, right arm, left leg, right leg, and trunk; the body parts are classified according to the following formula:
$$\Omega_{i'} = \{\, v \in V : \|v - e_{i'}\|_{geod} \le \min_{j'=0,\ldots,5} \|v - e_{j'}\|_{geod} \,\}, \quad i' = 0, \ldots, 5$$
where $\Omega_{i'}$, $i' = 0, \ldots, 5$, are the six classified body regions, corresponding to the head, left arm, right arm, left leg, right leg, and trunk; $v$ is a pixel of the body; $e_{i'}$ is the $i'$-th extremity point (for example a hand or a foot), and for $i' = 0$, $e_{i'}$ is the centre point of the body; and $\|v - e_{i'}\|_{geod}$ is the geodesic distance from pixel $v$ to extremity point $e_{i'}$.
Step 1.1.2: a region clustering algorithm based on K-means is used to extract region-wise feature points of the body, i.e. clustering is performed within the body-part regions obtained above, and cluster feature points are extracted, following the joint-based representation of the human body, to characterize different body poses.
The calculating of step 1.2, human action sequence signature vector
It is divided into following steps:
Step 1.2.1, calculating position offset:For a video sequence F for n frames, the 3D coordinates f of m characteristic point of each frame
T () can be estimated to obtain by human body attitude:
F (t)=φ (t)={ θ1(t),θ2(t),K,θm(t) }, t ∈ { 1,2, K, n }
Wherein θi(t)=(xi(t),yi(t),zi(t)), i ∈ { 1,2, K, m }, θiT () represents i-th human body feature point of f (t)
3D coordinate informations, m represents the quantity of characteristic point.
The global offset information of the action sequence is obtained by computing the position offset of each feature point between the current frame t and the first frame:
$$f_{i1}(t) = \theta_i(t) - \theta_i(1)$$
The local offset information of the action sequence is obtained by computing the position offset of each feature point between the current frame t and frame (t − Δt):
$$f_{i2}(t) = \theta_i(t) - \theta_i(t - \Delta t)$$
where Δt is a time interval.
Once the offset information of all human feature points of frame t has been obtained, the feature information of all feature points of frame t can be represented by two parts, the global offset information $f_1(t)$ and the local offset information $f_2(t)$, as follows:
$$f_1(t) = [f_{11}(t), f_{21}(t), \dots, f_{m1}(t)]$$
$$f_2(t) = [f_{12}(t), f_{22}(t), \dots, f_{m2}(t)]$$
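The two offsets can be computed directly from the per-frame coordinates. The sketch below is illustrative only: the function name `position_offsets` is hypothetical, frames are indexed from 0 rather than 1, and the handling of the first Δt frames (where no frame t − Δt exists) is an assumption, since the claim does not state it.

```python
def position_offsets(frames, delta_t=1):
    """Compute global and local offsets of every feature point.

    frames : list over time of lists of (x, y, z) tuples, frames[t][i] = theta_i(t)
             (t is 0-based here, whereas the text counts frames from 1)
    Returns (f1, f2) with f1[t][i] = theta_i(t) - theta_i(first frame) and
    f2[t][i] = theta_i(t) - theta_i(t - delta_t).
    """
    def sub(a, b):
        return tuple(x - y for x, y in zip(a, b))

    first = frames[0]
    f1 = [[sub(p, p0) for p, p0 in zip(frame, first)] for frame in frames]
    # Assumption: for the first delta_t frames no earlier frame exists, so the
    # local offset is taken relative to the first frame.
    f2 = [
        [sub(p, q) for p, q in zip(frame, frames[max(t - delta_t, 0)])]
        for t, frame in enumerate(frames)
    ]
    return f1, f2
```

The global offset captures how far a point has moved since the action began, while the local offset captures its motion over the most recent interval.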
Step 1.2.2: obtaining the action sequence feature vector corresponding to a video.
Assume that all human feature points of each training video are represented by a set of offset information. The global offsets of every feature point of all collected videos are represented by the global offset vector $R_1$, each element of which corresponds to frame t of the i-th feature point of the j-th training video; likewise, the local offsets of every feature point of all collected videos are represented by the local offset vector $R_2$, each element of which corresponds to frame t of the i-th feature point of the j-th training video. Let $R = R_1 \cup R_2$; R is then clustered with the K-means algorithm, using Euclidean distance as the clustering metric, to form a codebook $\{b_k\}$, $k = 1, 2, \dots, K$, in which each codeword is one of the K cluster centers.
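For reference, K-means with a Euclidean metric is commonly stated as minimizing the within-cluster sum of squared distances. Written for the pooled offset set R, and using r as a symbol introduced here for an element of R (it does not appear in the claim), the codebook would satisfy:

```latex
\{b_k\}_{k=1}^{K} \;=\; \arg\min_{b_1,\dots,b_K} \sum_{r \in R} \; \min_{k \in \{1,\dots,K\}} \lVert r - b_k \rVert_2^{2},
\qquad R = R_1 \cup R_2
```

In practice the algorithm only reaches a local minimum of this objective, so the resulting codebook depends on the initialization.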
For each training video $F = \{f(t)\}$, $t = 1, 2, \dots, n$, where n is the number of frames, the global offset vector $f_{i1}(t)$ or local offset vector $f_{i2}(t)$ of each human feature point i in each frame f(t) can be matched to the codeword in the codebook $\{b_k\}$ that has the shortest Euclidean distance to it.
Therefore, the motion of each feature point i in F, i.e. all position offsets $f_{i1}(t)$ and $f_{i2}(t)$ of feature point i in the video, can be further represented by a histogram $h_i$ of codeword frequencies. This histogram consists of two parts: the global offset histogram of the i-th feature point and the local offset histogram of the i-th feature point, each bin being obtained with the counting function #{·} as the number of offsets assigned to the corresponding codeword. Finally, F can be represented by the set of histograms of all feature points, i.e. $F = \{h_i\}$, $i = 1, 2, \dots, m$, where $h_i$ is the histogram of the i-th feature point.
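The quantization and histogram step can be sketched as follows. The codebook is assumed to have been learned already; the function names `codeword_histogram` and `video_descriptor` are hypothetical, and joining the global and local histograms by concatenation is an assumption, since the source text only says that $h_i$ is composed of the two.

```python
import math

def codeword_histogram(offsets, codebook):
    """Histogram of nearest-codeword frequencies for one feature point.

    offsets  : list of offset vectors (tuples) of one feature point over all frames
    codebook : list of K codewords b_1..b_K (tuples), assumed already learned
    """
    hist = [0] * len(codebook)
    for f in offsets:
        # Index of the codeword with the shortest Euclidean distance to f.
        k = min(range(len(codebook)), key=lambda j: math.dist(f, codebook[j]))
        hist[k] += 1
    return hist

def video_descriptor(f1, f2, codebook):
    """Build F = {h_i}: one histogram per feature point.

    f1, f2 : f1[t][i] and f2[t][i] are the global and local offsets of
             feature point i at frame t
    """
    m = len(f1[0])
    descriptor = []
    for i in range(m):
        h_global = codeword_histogram([frame[i] for frame in f1], codebook)
        h_local = codeword_histogram([frame[i] for frame in f2], codebook)
        # Assumption: the two parts are joined by concatenation.
        descriptor.append(h_global + h_local)
    return descriptor
```

Each video is thereby reduced to m fixed-length histograms regardless of its frame count, which is what allows videos of different lengths to be compared.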
3. The human behavior recognition method based on part clustering characteristics as claimed in claim 1, characterized in that in step 2 the Naive Bayes Nearest Neighbor (NBNN) classification method is used for action classification: a video sequence represented by a set of feature point histograms is given as $F = \{h_i\}$, $i = 1, 2, \dots, m$, where m is the number of feature points;
the original concept of NBNN-based image classification is applied to NBNN-based video classification, i.e. behavior recognition, and what is computed is the joint-histogram-to-class distance rather than the video-to-class distance or the video-to-video distance,
where the nearest neighbor of $h_i$ is taken among the histograms of the i-th feature point of behavior class c, and $h'_i(c)$ denotes a histogram of the i-th feature point in behavior class c.
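A histogram-to-class decision of this kind can be sketched as below. The exact formula is not reproduced in the text above, so this follows the commonly used NBNN rule as an assumption: for each class, sum over feature points the squared Euclidean distance between $h_i$ and its nearest neighbor in that class, then pick the class with the smallest sum. The function name `nbnn_classify` and the dictionary layout of the training data are hypothetical.

```python
def nbnn_classify(query, training):
    """Naive Bayes Nearest Neighbor over per-feature-point histograms.

    query    : list of m histograms h_1..h_m describing the test video
    training : dict mapping class label -> list of training videos, each a
               list of m histograms h'_1(c)..h'_m(c)
    Returns the label with the smallest summed histogram-to-class distance.
    """
    def sq_dist(a, b):
        # Assumption: squared Euclidean distance between histograms.
        return sum((x - y) ** 2 for x, y in zip(a, b))

    def class_distance(videos):
        # For each feature point, use the nearest i-th histogram found in
        # any training video of this class, then sum over feature points.
        return sum(
            min(sq_dist(h, video[i]) for video in videos)
            for i, h in enumerate(query)
        )

    return min(training, key=lambda c: class_distance(training[c]))
```

Note that the nearest neighbors of different feature points may come from different training videos of the same class, which is what distinguishes a histogram-to-class distance from a video-to-video distance.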
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710057722.4A CN106909890B (en) | 2017-01-23 | 2017-01-23 | Human behavior recognition method based on part clustering characteristics |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106909890A (en) | 2017-06-30 |
CN106909890B (en) | 2020-02-11 |
Family
ID=59207591
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710057722.4A Active CN106909890B (en) | 2017-01-23 | 2017-01-23 | Human behavior recognition method based on part clustering characteristics |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106909890B (en) |
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20150023590A1 (en) * | 2013-07-16 | 2015-01-22 | National Taiwan University Of Science And Technology | Method and system for human action recognition |
CN104715493A (en) * | 2015-03-23 | 2015-06-17 | 北京工业大学 | Moving body posture estimating method |
Non-Patent Citations (1)
Title |
---|
Guoliang Lu et al.: "Efficient action recognition via local position offset of 3D skeletal body joints", Springer Science+Business Media New York * |
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108520250A (en) * | 2018-04-19 | 2018-09-11 | 北京工业大学 | Human motion sequence key frame extraction method |
CN108564047A (en) * | 2018-04-19 | 2018-09-21 | 北京工业大学 | Human behavior identification method based on 3D joint point sequence |
CN108564047B (en) * | 2018-04-19 | 2021-09-10 | 北京工业大学 | Human behavior identification method based on 3D joint point sequence |
CN108520250B (en) * | 2018-04-19 | 2021-09-14 | 北京工业大学 | Human motion sequence key frame extraction method |
CN109272523A (en) * | 2018-08-13 | 2019-01-25 | 西安交通大学 | Pose estimation method for randomly stacked pistons based on improved CVFH and CRH features |
CN111249691A (en) * | 2018-11-30 | 2020-06-09 | 百度在线网络技术(北京)有限公司 | Athlete training method and system based on body shape recognition |
CN111249691B (en) * | 2018-11-30 | 2021-11-23 | 百度在线网络技术(北京)有限公司 | Athlete training method and system based on body shape recognition |
CN112784662A (en) * | 2018-12-30 | 2021-05-11 | 奥瞳系统科技有限公司 | Video-based fall risk evaluation system |
CN110163103A (en) * | 2019-04-18 | 2019-08-23 | 中国农业大学 | Live pig behavior identification method and device based on video image |
CN110163103B (en) * | 2019-04-18 | 2021-07-30 | 中国农业大学 | Live pig behavior identification method and device based on video image |
CN110121103A (en) * | 2019-05-06 | 2019-08-13 | 郭凌含 | Method and device for automatic video editing and synthesis |
Also Published As
Publication number | Publication date |
---|---|
CN106909890B (en) | 2020-02-11 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||