CN106909890A - Human behavior recognition method based on body-part clustering features - Google Patents

Human behavior recognition method based on body-part clustering features

Info

Publication number
CN106909890A
Authority
CN
China
Prior art keywords
human body
histogram
point
video
characteristic point
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201710057722.4A
Other languages
Chinese (zh)
Other versions
CN106909890B (en)
Inventor
孔德慧
贾文浩
孙彬
王少帆
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Technology
Original Assignee
Beijing University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Technology
Priority to CN201710057722.4A
Publication of CN106909890A
Application granted
Publication of CN106909890B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 Movements or behaviour, e.g. gesture recognition
    • G06V40/23 Recognition of whole body movements, e.g. for sport training
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06F18/24155 Bayesian classification

Abstract

The invention discloses a human behavior recognition method based on body-part clustering features, comprising the following steps. Step 1: in the training stage, the body-part cluster feature points of each frame of a training video are first extracted by pose estimation, and the local and global position offsets of each feature point in each frame are then computed. The feature-point offsets of all training videos are collected and clustered with the K-means algorithm; the resulting cluster centers form a codebook, and the current training video is then represented, according to the codebook, by a set of joint feature-point histograms. Step 2: in the test stage, histograms are first built for a test video using the codebook formed in the training stage, and behavior recognition is then performed by comparing the test-stage histograms with the training-stage histograms using the Naive-Bayes nearest-neighbor classification method. The technical solution of the invention achieves a high recognition rate.

Description

Human behavior recognition method based on body-part clustering features
Technical field
The invention belongs to the fields of computer vision and pattern recognition, and in particular relates to a human behavior recognition method based on body-part clustering features.
Background art
In recent years, human behavior recognition has attracted increasing attention. Understanding what a person is doing by analyzing the interaction between people and objects, and even inferring the person's intent, is particularly important. Automatic understanding and recognition of human actions is therefore critical to many artificial-intelligence systems and has wide practical application in fields such as intelligent video surveillance, motion retrieval, human-computer interaction, and health care. For example, a human-computer interaction system that serves people intelligently must not only perceive human motion but also understand the semantics of an action and infer its intent.
Traditional action recognition methods mainly acquire video sequences with an RGB camera; the resulting video is a chronologically ordered sequence of 2D RGB images. Action recognition based on RGB information has made great progress over the past decades, and many methods have been proposed in succession, including methods based on human key poses, motion templates, silhouettes, and space-time shapes. Methods based on spatio-temporal detection can measure similarity accurately, and methods based on dense motion trajectories have attracted attention for their outstanding performance.
Although these methods achieve good recognition results on standard test data sets, human action recognition remains extremely challenging. Human actions are highly flexible; body pose, motion, and clothing differ significantly between individuals; and camera viewpoint, camera motion, changes in illumination, occlusion, self-occlusion, and the complex spatio-temporal structure of human-object interaction all have a combined influence. RGB information is also highly susceptible to environmental factors: changes in illumination, background, and so on all introduce some degree of interference. Moreover, the RGB images of two different behaviors may be very similar, which makes action classification very difficult.
With the progress of sensor technology, inexpensive high-resolution depth sensors have become available, such as Microsoft's Kinect and ASUS's Xtion PRO LIVE. Each pixel of a depth image captured by a depth camera records the depth of the scene, which is entirely different from the light intensity recorded by a pixel of an ordinary RGB image. Depth sensors greatly extend a computer system's ability to perceive the three-dimensional world and to extract low-level visual information. Compared with a traditional RGB camera, a depth sensor has clear advantages for human action recognition: it is unaffected by illumination conditions and is invariant to color and texture; an RGBD camera captures an RGB sequence and a depth sequence simultaneously; and depth information greatly simplifies object detection and segmentation. Viewed from a single angle, different behaviors may have similar 2D projections, in which case the depth map provides additional body-shape information to distinguish them. In recent years, therefore, much research has focused on behavior recognition using 3D information, and the 3D information obtained by RGBD cameras has significantly improved human pose estimation.
Among this work, Lu et al. proposed an effective scheme for recognizing human actions: the action is recognized by computing the local position offsets of the 3D positions of the human joints. However, this method does not take the characteristics of the time series into account, so the histograms that record the joint information lose the continuity of the sequence; nor does it take into account, in the codebook formation stage, the independence of the motion of each joint.
In addition, when capturing a human body, Microsoft's Kinect camera provides not only the depth map of the body but also the positions of 16 body joints, and most research performs action recognition on the joint information provided by Kinect. However, during roughly the first 20 frames of a capture, while Kinect is still locating the body in the image, it cannot provide joint positions. Furthermore, when the amplitude of an action is large, for example when the body moves from standing upright to kicking, the joint positions given by Kinect are considerably offset and not accurate enough, as shown in Fig. 1.
Summary of the invention
The technical problem to be solved by the invention is to provide a human behavior recognition method based on body-part clustering features that achieves a high recognition rate.
To achieve this object, the invention adopts the following technical solution.
A human behavior recognition method based on body-part clustering features comprises:
Step 1: in the training stage, the body-part cluster feature points of each frame of a training video are first extracted by pose estimation, and the position offset of each feature point in each frame relative to the corresponding feature point in an earlier frame is then computed. The feature-point offsets of all training videos are then collected and clustered with the K-means algorithm; the resulting cluster centers form the codebook. According to the codebook, the current training video is then represented by a set of joint feature-point histograms.
Step 2: in the test stage, histograms are first built for a test video using the codebook formed in the training stage; behavior recognition is then performed by comparing the test-stage histograms with the training-stage histograms using the Naive-Bayes nearest-neighbor (NBNN) classification method.
Preferably, Step 1 comprises the following steps:
Step 1.1: extraction of human pose feature points, comprising the following steps.
Step 1.1.1: the positions of the body's extremity points are first located accurately; the body region is then partitioned with the extremity points as centers. Using geodesic distance as the classification criterion and the nearest-neighbor algorithm as the classification tool, the human depth pixels are divided into six major parts: head, left arm, right arm, left leg, right leg, and trunk. The body parts are classified according to the following formula:
$$\Omega_{i'} = \{\, v \in V : \lVert v - e_{i'} \rVert_{geod} \le \min_{j'=0,\dots,5} \lVert v - e_{j'} \rVert_{geod} \,\}, \quad i' = 0,\dots,5$$
where $\Omega_{i'}$, $i' = 0,\dots,5$, denote the six classified body parts, corresponding to the head, left arm, right arm, left leg, right leg, and trunk; $v$ denotes a pixel of the human body; $e_{i'}$ denotes the $i'$-th extremity point, i.e. the left or right hand or the left or right foot; when $i' = 0$, $e_{i'}$ denotes the center point of the body; and $\lVert v - e_{i'} \rVert_{geod}$ denotes the geodesic distance from pixel $v$ to extremity point $e_{i'}$.
Step 1.1.2: a K-means-based region clustering algorithm is used to extract the per-region feature points of the body, i.e. clustering is performed within the body-part blocks obtained above around the extremity points. Following the joint representation of the human body, cluster feature points are extracted to characterize different human poses.
Step 1.2: computation of the feature vector of the human action sequence, divided into the following steps.
Step 1.2.1: computing the position offsets. For a video sequence $F$ of $n$ frames, the 3D coordinates $f(t)$ of the $m$ feature points of each frame can be obtained by human pose estimation:
$$f(t) = \varphi(t) = \{\theta_1(t), \theta_2(t), \dots, \theta_m(t)\}, \quad t \in \{1, 2, \dots, n\}$$
where $\theta_i(t) = (x_i(t), y_i(t), z_i(t))$, $i \in \{1, 2, \dots, m\}$, denotes the 3D coordinates of the $i$-th human feature point of $f(t)$, and $m$ is the number of feature points.
The global offset information of the action sequence is obtained by computing the position offset of each feature point between the current frame $t$ and the first frame:
$$f_i^{1}(t) = \theta_i(t) - \theta_i(1)$$
The local offset information of the action sequence is obtained by computing the position offset of each feature point between the current frame $t$ and frame $t - \Delta t$:
$$f_i^{2}(t) = \theta_i(t) - \theta_i(t - \Delta t)$$
where $\Delta t$ is a time interval.
Once the offsets of all human feature points of frame $t$ have been obtained, the feature information of all feature points of frame $t$ can be represented by two parts, the global offset information $f^{1}(t)$ and the local offset information $f^{2}(t)$:
$$f^{1}(t) = [f_1^{1}(t), f_2^{1}(t), \dots, f_m^{1}(t)]$$
$$f^{2}(t) = [f_1^{2}(t), f_2^{2}(t), \dots, f_m^{2}(t)]$$
Step 1.2.2: obtaining the action sequence feature vector corresponding to a video.
Suppose that all human feature points of each training video are represented by a set of offsets. The global offset vectors of every feature point of all collected videos are denoted $R_1$, i.e. $R_1 = \{ f_{i,j}^{1}(t) \}$, where $f_{i,j}^{1}(t)$ corresponds to frame $t$ of the $i$-th feature point of the $j$-th training video. The local offset vectors of every feature point of all collected videos are denoted $R_2$, i.e. $R_2 = \{ f_{i,j}^{2}(t) \}$, where $f_{i,j}^{2}(t)$ corresponds to frame $t$ of the $i$-th feature point of the $j$-th training video. Let $R = R_1 \cup R_2$. $R$ is then clustered with the K-means algorithm to form the codebook $\{b_k\}$, $k = 1, 2, \dots, K$; each codeword is the center of one cluster, and Euclidean distance is used as the clustering metric.
Let each training video be $F = \{f(t)\}$, $t = 1, 2, \dots, n$, where $n$ is the number of frames. For each human feature point $i$ in each frame $f(t)$, the global offset vector $f_i^{1}(t)$ or local offset vector $f_i^{2}(t)$ is matched to the codeword in the codebook $\{b_k\}$ with the shortest Euclidean distance, i.e.
$$b^{*} = \arg\min_{b_k} \lVert f_i^{s}(t) - b_k \rVert_2, \quad s \in \{1, 2\}$$
The motion of each feature point $i$ in $F$ is thus described by all position offsets $f_i^{1}(t)$ and $f_i^{2}(t)$ of feature point $i$ in the video. The position offsets of each feature point can further be represented by a histogram $h_i$ of codeword frequencies, composed of $h_i^{1}$ and $h_i^{2}$, where $h_i^{1}$ is the global offset histogram of the $i$-th feature point and $h_i^{2}$ is the local offset histogram of the $i$-th feature point, i.e.
$$h_i^{s}(k) = \#\{\, t : \arg\min_{k'} \lVert f_i^{s}(t) - b_{k'} \rVert_2 = k \,\}, \quad s \in \{1, 2\},\ k = 1, \dots, K$$
where $\#\{\cdot\}$ is a counting function. Finally, $F$ can be represented by the set of histograms of all feature points, i.e. $F = \{h_i\}$, $i = 1, 2, \dots, m$, where $h_i$ is the histogram of the $i$-th feature point.
Preferably, in Step 2 the Naive-Bayes Nearest-Neighbor classifier (NBNN) is used to classify actions. A video sequence is given as a set of feature-point histograms, $F = \{h_i\}$, $i = 1, 2, \dots, m$, where $m$ is the number of feature points.
The original concept of NBNN-based image classification is applied to NBNN-based video classification, i.e. behavior recognition. What is computed is the joint-histogram-to-class distance rather than the video-to-class or video-to-video distance:
$$c^{*} = \arg\min_{c} \sum_{i=1}^{m} \lVert h_i - NN_c(h_i) \rVert^{2}$$
where $NN_c(h_i)$ denotes the histogram nearest to $h_i$ among those of the $i$-th feature point in behavior class $c$, i.e. $NN_c(h_i) = \arg\min_{h'_i(c)} \lVert h_i - h'_i(c) \rVert$, where $h'_i(c)$ denotes a histogram of the $i$-th feature point in behavior class $c$.
Brief description of the drawings
Fig. 1 is a schematic diagram of erroneous joint points given by Microsoft Kinect;
Fig. 2 is a flow diagram of the human behavior recognition method of the invention;
Fig. 3 is a schematic diagram of extremity feature detection based on geodesic distance;
Fig. 4 is a schematic diagram of human region labeling based on geodesic distance;
Fig. 5 is a schematic diagram of cluster-based pose feature extraction;
Fig. 6a is a schematic diagram of the global offset of the current frame;
Fig. 6b is a schematic diagram of the local offset of the current frame;
Fig. 7 shows the process of forming cluster centers and histograms from the global and local position offsets of the feature points;
Fig. 8 compares action recognition rates under different conditions;
Fig. 9 shows the results of action classification by the method of the invention;
Fig. 10 shows the results of action classification by the method of Lu et al. based on joint features;
Fig. 11 shows the results of action classification by the method of the invention based on joint features.
Detailed description of the embodiments
An embodiment of the invention provides a human behavior recognition method based on body-part clustering features. Because human joint position information is not accurate enough, the cluster centers of the partitioned body parts are used as the feature points that characterize human pose. To exploit the global properties of the action sequence, the invention adds global position offsets to the sequence feature vector, compensating for the deficiency of recognition based on local position offsets alone. The key problems to be solved are therefore: extraction of human pose features; computation of the feature vector of the human action sequence; and action classification.
The invention takes the depth image sequence captured during human motion as input and outputs the computed action class. The core of the computation is to construct feature vectors from the offsets of the spatial positions of the human pose feature points (both global and local offset information) to describe a behavior sequence, and to classify actions on this basis.
A human behavior recognition method based on body-part clustering features comprises:
Step 1: in the training stage, the body-part cluster feature points of each frame of a training video are first extracted by pose estimation, and the position offset of each feature point in each frame relative to the corresponding feature point in an earlier frame is then computed. The feature-point offsets of all training videos are then collected and clustered with the K-means algorithm; the resulting cluster centers form the codebook. According to the codebook, the current training video is then represented by a set of joint feature-point histograms.
Step 2: in the test stage, histograms are first built for a test video using the codebook formed in the training stage; behavior recognition is then performed by comparing the test-stage histograms with the training-stage histograms using the Naive-Bayes nearest-neighbor (NBNN) classification method, as shown in Fig. 2.
Step 1 comprises the following steps:
Step 1.1: extraction of human pose feature points.
In this stage, a Kinect is used to capture depth data of a real human body, and the depth data are then converted into a point cloud.
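The depth-to-point-cloud conversion is not specified further in the text. A minimal sketch, assuming a standard pinhole camera model, is given below; the function name and the intrinsic parameters `fx`, `fy`, `cx`, `cy` are illustrative assumptions, not values from the patent.

```python
def depth_to_points(depth, fx, fy, cx, cy):
    """Back-project a depth image into 3D points with a pinhole model.

    depth: 2D list, depth[v][u] is the depth at pixel column u, row v
           (0 means no measurement and is skipped).
    fx, fy, cx, cy: hypothetical camera intrinsics (focal lengths and
           principal point); they are assumptions for illustration.
    """
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z > 0:
                x = (u - cx) * z / fx
                y = (v - cy) * z / fy
                points.append((x, y, z))
    return points


# Tiny 2x2 depth image with two valid pixels.
cloud = depth_to_points([[0, 2.0], [1.0, 0]], fx=1.0, fy=1.0, cx=0.0, cy=0.0)
```

In practice the intrinsics would come from the sensor's calibration data.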
As shown in Fig. 3, the positions of the body's extremity points (left and right hands, left and right feet, and head) are first located accurately, using a Dijkstra algorithm based on geodesic distance with the geometric center of the body as the source point. The body region is then partitioned with the extremity points as centers.
As shown in Fig. 4, using geodesic distance as the classification criterion and the nearest-neighbor algorithm as the classification tool, the human depth pixels are divided into six major parts: head, left arm, right arm, left leg, right leg, and trunk. The body parts are classified according to formula (1):
$$\Omega_{i'} = \{\, v \in V : \lVert v - e_{i'} \rVert_{geod} \le \min_{j'=0,\dots,5} \lVert v - e_{j'} \rVert_{geod} \,\}, \quad i' = 0,\dots,5 \tag{1}$$
where $\Omega_{i'}$, $i' = 0,\dots,5$, denote the six classified body parts, corresponding to the head, left arm, right arm, left leg, right leg, and trunk; $v$ denotes a pixel of the human body; $e_{i'}$ denotes the $i'$-th extremity point, i.e. the left or right hand or the left or right foot; when $i' = 0$, $e_{i'}$ denotes the center point of the body; and $\lVert v - e_{i'} \rVert_{geod}$ denotes the geodesic distance from pixel $v$ to extremity point $e_{i'}$. Formula (1) states that every pixel in the $i'$-th part has a geodesic distance to the $i'$-th extremity point $e_{i'}$ that is no greater than its geodesic distance to any other extremity point.
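The partition rule in formula (1) can be sketched as follows. This is an illustrative sketch only: it assumes the body pixels have already been connected into a weighted neighborhood graph (the adjacency-list format and the function names are assumptions), and it approximates geodesic distance by Dijkstra shortest-path length on that graph.

```python
import heapq


def geodesic_distances(graph, source):
    """Dijkstra shortest-path lengths from `source`.

    graph: {node: [(neighbor, edge_length), ...]} (assumed format).
    """
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in graph[u]:
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def label_regions(graph, anchors):
    """Assign every node to the anchor with the smallest geodesic distance.

    anchors: list of anchor nodes e_0..e_5 (body center and extremities).
    Ties go to the anchor with the lower index.
    """
    per_anchor = [geodesic_distances(graph, a) for a in anchors]
    inf = float("inf")
    return {
        v: min(range(len(anchors)), key=lambda i: per_anchor[i].get(v, inf))
        for v in graph
    }


# Five nodes in a chain, one anchor at each end.
chain = {
    0: [(1, 1.0)],
    1: [(0, 1.0), (2, 1.0)],
    2: [(1, 1.0), (3, 1.0)],
    3: [(2, 1.0), (4, 1.0)],
    4: [(3, 1.0)],
}
labels = label_regions(chain, anchors=[0, 4])
```

Running one Dijkstra pass per anchor keeps the sketch close to the formula; a single multi-source pass would give the same labels with less work.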
To characterize human pose effectively, the method uses a K-means-based region clustering algorithm to extract the per-region feature points of the body, i.e. clustering is performed within the body-part blocks obtained above around the extremity points, as shown in Fig. 5. In practice, when the number of cluster points (i.e. the number of feature points) $m$ is too small, the features lack expressiveness; when it is too large, the features become less regular. Following the commonly used 16-joint representation of the human body, the invention extracts $m = 15$ cluster feature points to characterize different human poses.
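As an illustration of the clustering step, the sketch below runs plain Lloyd-style K-means on the 3D points of one body region and returns the cluster centers as that region's feature points. The deterministic, evenly spaced initialization and the function name are assumptions; the patent does not say how the clusters are initialized or how the 15 points are distributed over the six regions.

```python
def kmeans(points, k, iters=20):
    """Lloyd's algorithm with Euclidean distance.

    points: list of equal-length coordinate tuples.
    Returns k cluster centers. Initial centers are evenly spaced samples
    of `points` (an assumption made to keep the sketch deterministic).
    """
    step = len(points) / k
    centers = [points[int(i * step)] for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])),
            )
            groups[nearest].append(p)
        new_centers = [
            tuple(sum(axis) / len(g) for axis in zip(*g)) if g else centers[j]
            for j, g in enumerate(groups)
        ]
        if new_centers == centers:
            break  # converged
        centers = new_centers
    return centers


# Two well-separated clumps in one region yield two feature points.
region = [(0, 0, 0), (0, 1, 0), (10, 0, 0), (10, 1, 0)]
feature_points = kmeans(region, k=2)
```

The same routine could also serve for the later codebook step, where the inputs are offset vectors rather than body points.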
Step 1.2: computation of the feature vector of the human action sequence, divided into the following steps.
Step 1.2.1: computing the position offsets. For a video sequence $F$ of $n$ frames, the 3D coordinates $f(t)$ of the $m$ feature points of each frame can be obtained by human pose estimation:
$$f(t) = \varphi(t) = \{\theta_1(t), \theta_2(t), \dots, \theta_m(t)\}, \quad t \in \{1, 2, \dots, n\} \tag{2}$$
where $\theta_i(t) = (x_i(t), y_i(t), z_i(t))$, $i \in \{1, 2, \dots, m\}$, denotes the 3D coordinates of the $i$-th human feature point of $f(t)$, and $m$ is the number of feature points.
The invention obtains the global offset information of the action sequence by computing the position offset of each feature point between the current frame $t$ and the first frame:
$$f_i^{1}(t) = \theta_i(t) - \theta_i(1)$$
The local offset information of the action sequence is obtained by computing the position offset of each feature point between the current frame $t$ and frame $t - \Delta t$:
$$f_i^{2}(t) = \theta_i(t) - \theta_i(t - \Delta t)$$
As shown in Fig. 6, $\Delta t$ is a time interval that balances the precision of the offset against robustness to noise. The larger $\Delta t$ is, the better the robustness to noise but the lower the precision; conversely, a smaller $\Delta t$ gives higher precision but poorer robustness. Its value is chosen according to the action sequence database in use.
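The two offset definitions can be sketched as below. Frames are indexed from 0 here, so frame 0 plays the role of the first frame; skipping the frames for which $t - \Delta t$ does not exist is an assumption, since the text does not say how they are handled.

```python
def position_offsets(frames, dt):
    """Global and local offsets of every feature point.

    frames: frames[t][i] is the (x, y, z) position of feature point i
            in frame t (0-based).
    dt:     frame interval for the local offset.
    Returns (global_offsets, local_offsets). local_offsets starts at
    frame dt, because earlier frames have no frame t - dt (assumption).
    """
    def sub(a, b):
        return tuple(p - q for p, q in zip(a, b))

    first = frames[0]
    global_offsets = [
        [sub(theta, first[i]) for i, theta in enumerate(frame)]
        for frame in frames
    ]
    local_offsets = [
        [sub(theta, frames[t - dt][i]) for i, theta in enumerate(frames[t])]
        for t in range(dt, len(frames))
    ]
    return global_offsets, local_offsets


# One feature point moving along x over three frames.
g, l = position_offsets([[(0, 0, 0)], [(1, 0, 0)], [(3, 0, 0)]], dt=1)
```

With `dt=1` the local offsets are frame-to-frame displacements, while the global offsets accumulate displacement from the starting pose.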
Once the offsets of all human feature points of frame $t$ have been obtained, the feature information of all feature points of frame $t$ can be represented by two parts, the global offset information $f^{1}(t)$ and the local offset information $f^{2}(t)$:
$$f^{1}(t) = [f_1^{1}(t), f_2^{1}(t), \dots, f_m^{1}(t)]$$
$$f^{2}(t) = [f_1^{2}(t), f_2^{2}(t), \dots, f_m^{2}(t)]$$
Step 1.2.2: obtaining the action sequence feature vector corresponding to a video. Suppose that all human feature points of each training video are represented by a set of offsets. The global offset vectors of every feature point of all collected videos are denoted $R_1$, i.e. $R_1 = \{ f_{i,j}^{1}(t) \}$, where $f_{i,j}^{1}(t)$ corresponds to frame $t$ of the $i$-th feature point of the $j$-th training video. The local offset vectors of every feature point of all collected videos are denoted $R_2$, i.e. $R_2 = \{ f_{i,j}^{2}(t) \}$, where $f_{i,j}^{2}(t)$ corresponds to frame $t$ of the $i$-th feature point of the $j$-th training video. Let $R = R_1 \cup R_2$. $R$ is then clustered with the K-means algorithm to form the codebook $\{b_k\}$, $k = 1, 2, \dots, K$; each codeword is the center of one cluster, and Euclidean distance is used as the clustering metric.
Let each training video be $F = \{f(t)\}$, $t = 1, 2, \dots, n$, where $n$ is the number of frames. For each human feature point $i$ in each frame $f(t)$, the global offset vector $f_i^{1}(t)$ or local offset vector $f_i^{2}(t)$ is matched to the codeword in the codebook $\{b_k\}$ with the shortest Euclidean distance, i.e.
$$b^{*} = \arg\min_{b_k} \lVert f_i^{s}(t) - b_k \rVert_2, \quad s \in \{1, 2\}$$
The motion of each feature point $i$ in $F$ is thus described by all position offsets $f_i^{1}(t)$ and $f_i^{2}(t)$ of feature point $i$ in the video. The position offsets of each feature point can further be represented by a histogram $h_i$ of codeword frequencies, composed of $h_i^{1}$ and $h_i^{2}$, where $h_i^{1}$ is the global offset histogram of the $i$-th feature point and $h_i^{2}$ is the local offset histogram of the $i$-th feature point, i.e.
$$h_i^{s}(k) = \#\{\, t : \arg\min_{k'} \lVert f_i^{s}(t) - b_{k'} \rVert_2 = k \,\}, \quad s \in \{1, 2\},\ k = 1, \dots, K$$
where $\#\{\cdot\}$ is a counting function. Finally, $F$ can be represented by the set of histograms of all feature points, i.e. $F = \{h_i\}$, $i = 1, 2, \dots, m$, where $h_i$ is the histogram of the $i$-th feature point, as shown in Fig. 7.
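A minimal sketch of the quantization step follows, assuming the codebook has already been obtained by K-means. Raw counts are used, and the global and local histograms are simply concatenated to form $h_i$; both choices, like the function names, are assumptions, since the text only says that $h_i$ is composed of the two histograms.

```python
def nearest_codeword(vec, codebook):
    """Index of the codeword with the smallest Euclidean distance to vec."""
    return min(
        range(len(codebook)),
        key=lambda k: sum((a - b) ** 2 for a, b in zip(vec, codebook[k])),
    )


def codeword_histogram(offsets, codebook):
    """Count how often each codeword is the nearest one.

    offsets: all offset vectors of ONE feature point over the video.
    """
    hist = [0] * len(codebook)
    for vec in offsets:
        hist[nearest_codeword(vec, codebook)] += 1
    return hist


def feature_point_histogram(global_offsets, local_offsets, codebook):
    """h_i as the global histogram followed by the local histogram."""
    return (codeword_histogram(global_offsets, codebook)
            + codeword_histogram(local_offsets, codebook))


# Toy codebook with K = 2 codewords.
codebook = [(0, 0, 0), (1, 0, 0)]
h = feature_point_histogram(
    [(0.1, 0, 0), (0.9, 0, 0), (1.2, 0, 0)],  # global offsets of point i
    [(0.2, 0, 0), (0.1, 0, 0)],               # local offsets of point i
    codebook,
)
```

A video is then the list of such histograms, one per feature point, which preserves the per-joint independence that a single pooled histogram would lose.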
In Step 2 the Naive-Bayes Nearest-Neighbor classifier (NBNN) is used to classify actions. A video sequence is given as a set of feature-point histograms, $F = \{h_i\}$, $i = 1, 2, \dots, m$, where $m$ is the number of feature points. It would be easy to combine this set of histograms into a single histogram for classification, but doing so would lose the spatial independence of the human feature points. The spatial information of the feature points provides additional cues for distinguishing different behaviors, so the independence of the feature points should be fully taken into account.
The invention applies the original concept of NBNN-based image classification to NBNN-based video classification, i.e. behavior recognition. What is computed is the joint-histogram-to-class distance rather than the video-to-class or video-to-video distance:
$$c^{*} = \arg\min_{c} \sum_{i=1}^{m} \lVert h_i - NN_c(h_i) \rVert^{2} \tag{7}$$
where $NN_c(h_i)$ denotes the histogram nearest to $h_i$ among those of the $i$-th feature point in behavior class $c$, i.e. $NN_c(h_i) = \arg\min_{h'_i(c)} \lVert h_i - h'_i(c) \rVert$, where $h'_i(c)$ denotes a histogram of the $i$-th feature point in behavior class $c$.
Formula (7) states that, for an input test video sequence, the histogram of each feature point is obtained, and the difference between the $m$ feature-point histograms and the histograms of each behavior class of the training videos is then accumulated; the behavior class $c^{*}$ with the smallest difference is taken as the behavior class of the current video $F$.
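The classification rule can be sketched as follows. The sketch assumes squared Euclidean distance between histograms, as is usual for NBNN, and a simple dictionary layout for the training data; both, along with the function names, are assumptions rather than details given in the text.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nbnn_classify(test_hists, train_by_class):
    """Return the class with the smallest summed histogram-to-class distance.

    test_hists:     list of m histograms, one per feature point.
    train_by_class: {label: [video, ...]}, where each video is a list of
                    m histograms (assumed layout).
    For every feature point i, the nearest histogram of point i within
    the class is found independently, so different feature points may
    match different training videos of the same class.
    """
    best_label, best_cost = None, float("inf")
    for label, videos in train_by_class.items():
        cost = sum(
            min(squared_distance(h, video[i]) for video in videos)
            for i, h in enumerate(test_hists)
        )
        if cost < best_cost:
            best_label, best_cost = label, cost
    return best_label


# Two classes, one training video each, m = 2 feature points.
train = {
    "wave": [[[3, 0], [0, 3]]],
    "kick": [[[0, 3], [3, 0]]],
}
predicted = nbnn_classify([[2, 1], [1, 2]], train)
```

Keeping one nearest-neighbor search per feature point is what distinguishes this from comparing whole-video histograms.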
The method has been applied to depth image sequences captured with Kinect2 and achieved good experimental results. The experiments used 640 × 480 RGBD images captured indoors under fluorescent lighting. Six people were recorded, each performing 7 actions twice, for a total of 84 video sequences and 6343 frames. The actions were: raising a hand, waving, squatting, kicking, bending over, swaying the body from side to side, and swinging the body.
In the experiments, the training and test sets for each action were selected at random in a 2:1 ratio. A total of 50 random trials were run, and the average recognition accuracy obtained was 98.07%. On the same video sequences and under the same experimental conditions (the same number of trials and the same ratio of training set to test set), the method of Lu et al. obtained an average recognition rate of 95.00%. Applying the method of the invention to the joints provided by Microsoft Kinect gave an average recognition rate of 96.43%, which demonstrates the effectiveness of body-part cluster feature points as the basis for action classification.
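The evaluation protocol (random 2:1 split per action, repeated and averaged) can be sketched as below. The data layout, the `evaluate` callback, and the fixed seed are hypothetical; the sketch only shows the splitting and averaging, not the recognition method itself.

```python
import random


def stratified_split(videos_by_action, rng, train_parts=2, test_parts=1):
    """Randomly split the videos of every action in a train:test ratio."""
    train, test = [], []
    for action, videos in videos_by_action.items():
        shuffled = list(videos)
        rng.shuffle(shuffled)
        cut = len(shuffled) * train_parts // (train_parts + test_parts)
        train += [(action, v) for v in shuffled[:cut]]
        test += [(action, v) for v in shuffled[cut:]]
    return train, test


def mean_over_trials(videos_by_action, evaluate, trials=50, seed=0):
    """Average evaluate(train, test) over repeated random splits.

    evaluate: hypothetical callback that trains on `train`, classifies
              `test`, and returns an accuracy in [0, 1].
    """
    rng = random.Random(seed)
    scores = [
        evaluate(*stratified_split(videos_by_action, rng))
        for _ in range(trials)
    ]
    return sum(scores) / len(scores)


# 7 actions x 12 videos each, matching the 84 sequences described above.
dataset = {action: list(range(12)) for action in range(7)}
train_set, test_set = stratified_split(dataset, random.Random(0))
```

With 12 videos per action, each split places 8 videos of every action in the training set and 4 in the test set.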
Figs. 8, 9, 10, and 11 and Table 1 compare the results of action classification obtained by the method of the invention, by the method of Lu et al., and by this method applied to the joints provided by Kinect. The proposed method achieves a high recognition rate for most actions.
In summary, the proposed human action classification method based on body-part clustering features has been verified to give very good classification results.
Table 1. Recognition accuracy and recognition results under different conditions

Claims (3)

1. A human behavior recognition method based on body-part clustering features, characterized by comprising:
Step 1: in the training stage, the body-part cluster feature points of each frame of a training video are first extracted by pose estimation, and the position offset of each feature point in each frame relative to the corresponding feature point in an earlier frame is then computed; the feature-point offsets of all training videos are then collected and clustered with the K-means algorithm, and the resulting cluster centers form the codebook; according to the codebook, the current training video is then represented by a set of joint feature-point histograms;
Step 2: in the test stage, histograms are first built for a test video using the codebook formed in the training stage; behavior recognition is then performed by comparing the test-stage histograms with the training-stage histograms using the Naive-Bayes nearest-neighbor classification method.
2. the Human bodys' response method of position cluster feature is based on as claimed in claim 1, it is characterised in that the step 1 comprises the following steps:
Step 1.1, human body attitude feature point extraction, comprise the following steps
Step 1.1.1, firstly the need of exactly position human body limb endpoint location, then centered on acra point, realize human body area Domain divides, using geodesic distance as the foundation of classification, using arest neighbors sorting algorithm as the instrument classified, by human depth's picture Element is divided into six major parts, i.e. head, left arm, right arm, left leg, right leg, trunk, under human body classification foundation Formula is stated to be classified,
Ωi′={ v ∈ V:||v-ei′||gead≤minJ '=0 ... 5||v-ej′||gead, i '=0 ..., 5
Wherein Ωi′, six human body blocks of i '=0 ..., 5 presentation classes, their correspondence head, left arm, right arm, left legs Portion, right leg, trunk;V represents some pixel, e in human bodyi′Represent the i-th ' individual acra point, i.e. the left hand right hand or a left side Pin right crus of diaphragm, when i '=0, ei′Represent the central point of human body.||v-ei′||geodRepresent pixel v to acra point ei′Geodesic distance
Step 1.1.2: the per-part feature points of the human body are extracted with a K-means-based region clustering algorithm; that is, clustering is performed within each body block obtained above from the extremity point positions, and, following the joint-point representation of the human body, the cluster feature points are extracted to characterize different human poses.
Step 1.2: computation of the feature vector of the human action sequence, comprising the following steps:
Step 1.2.1: computing the position offsets. For a video sequence F of n frames, the 3D coordinates f(t) of the m feature points of each frame can be obtained by human pose estimation:
$$f(t) = \varphi(t) = \{\theta_1(t), \theta_2(t), \ldots, \theta_m(t)\}, \quad t \in \{1, 2, \ldots, n\}$$
where $\theta_i(t) = (x_i(t), y_i(t), z_i(t))$, $i \in \{1, 2, \ldots, m\}$, is the 3D coordinate of the $i$-th human feature point in $f(t)$, and $m$ is the number of feature points.
The global offset information of the action sequence is obtained from the position offset of each feature point between the current frame t and the first frame:
$$f_{i1}(t) = \theta_i(t) - \theta_i(1)$$
The local offset information of the action sequence is obtained from the position offset of each feature point between the current frame t and frame $t - \Delta t$:
$$f_{i2}(t) = \theta_i(t) - \theta_i(t - \Delta t)$$
where $\Delta t$ is a time interval.
Once the offsets of all human feature points in frame t have been obtained, the feature information of all feature points of frame t can be represented by two parts, the global offset information $f_1(t)$ and the local offset information $f_2(t)$, as follows:
$$f_1(t) = [f_{11}(t), f_{21}(t), \ldots, f_{m1}(t)]$$
$$f_2(t) = [f_{12}(t), f_{22}(t), \ldots, f_{m2}(t)]$$
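As a rough illustration of step 1.2.1, the sketch below computes the global and local offsets of every feature point in every frame. The function name `position_offsets` and the 0-based frame indexing are assumptions of this sketch. The text does not say how frames with $t \le \Delta t$ are handled; here the reference frame is simply clamped to the first frame, which is also an assumption.

```python
def position_offsets(frames, dt=1):
    """Compute per-frame global and local position offsets.

    frames[t][i] is the (x, y, z) tuple of feature point i at frame t
    (0-based, so frames[0] plays the role of frame 1 in the text).
    Returns (global_offsets, local_offsets), each indexed as [t][i].
    """
    def sub(a, b):
        # Component-wise difference of two 3D points.
        return tuple(p - q for p, q in zip(a, b))

    first = frames[0]
    # Global offset: theta_i(t) - theta_i(1).
    global_offsets = [[sub(p, first[i]) for i, p in enumerate(frame)]
                      for frame in frames]
    # Local offset: theta_i(t) - theta_i(t - dt).
    # Assumption: for early frames the reference is clamped to frame 0.
    local_offsets = [[sub(p, frames[max(t - dt, 0)][i])
                      for i, p in enumerate(frame)]
                     for t, frame in enumerate(frames)]
    return global_offsets, local_offsets
```

For a point that moves steadily, the global offset grows with time while the local offset stays roughly constant, so the two parts carry complementary information about displacement and speed.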
Step 1.2.2: obtaining the action sequence feature vector corresponding to a video.
Suppose all human feature points of each training video are represented by a set of offset information. The global offsets of every feature point collected from all videos are denoted by $R_1$, i.e. $R_1 = \{ f^{j}_{i1}(t) \}$, where $f^{j}_{i1}(t)$ corresponds to frame t of the $i$-th feature point of the $j$-th training video; the local offsets of every feature point collected from all videos are denoted by $R_2$, i.e. $R_2 = \{ f^{j}_{i2}(t) \}$, where $f^{j}_{i2}(t)$ corresponds to frame t of the $i$-th feature point of the $j$-th training video. Let $R = R_1 \cup R_2$. R is then clustered with the K-means algorithm, using Euclidean distance as the clustering metric, to form the codebook $\{b_k\}$; each codeword is one of the K cluster centers, i.e. $\{b_k\}$, $k = 1, 2, \ldots, K$.
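The codebook construction can be sketched with a plain Lloyd-style K-means over the pooled offsets $R = R_1 \cup R_2$. This is only an illustrative sketch: the function name `kmeans_codebook`, the seeded random initialization, the iteration cap, and the rule for empty clusters are all assumptions, since the text specifies only K-means with a Euclidean metric.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))

def kmeans_codebook(offsets, k, iters=50, seed=0):
    """Cluster offset vectors (a list of tuples) into k codewords.

    Assumptions of this sketch: centers are initialized by sampling k
    input vectors with a fixed seed, and an empty cluster keeps its
    previous center.
    """
    rng = random.Random(seed)
    centers = rng.sample(offsets, k)
    for _ in range(iters):
        # Assignment step: attach every offset to its nearest center.
        clusters = [[] for _ in range(k)]
        for v in offsets:
            nearest = min(range(k), key=lambda j: sq_dist(v, centers[j]))
            clusters[nearest].append(v)
        # Update step: move each center to the mean of its members.
        new_centers = [
            tuple(sum(col) / len(members) for col in zip(*members))
            if members else centers[j]
            for j, members in enumerate(clusters)
        ]
        if new_centers == centers:  # converged
            break
        centers = new_centers
    return centers
```

In practice the pooled set R would contain the global and local offsets of all feature points of all training videos, for example `codebook = kmeans_codebook(all_offsets, k=K)`, where `all_offsets` is a hypothetical flat list of 3D tuples.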
For each training video F = {f(t)}, t = 1, 2, …, n, where n is the number of frames, the global offset vector $f_{i1}(t)$ or local offset vector $f_{i2}(t)$ of each human feature point i in each frame f(t) is assigned the codeword in the codebook $\{b_k\}$ with the shortest Euclidean distance, i.e.
$$[\, f_{i1}(t) \ \text{or} \ f_{i2}(t) \,] \leftarrow \arg\min_{b_k} \big\| [\, f_{i1}(t) \ \text{or} \ f_{i2}(t) \,] - b_k \big\|, \quad k \in \{1, 2, \ldots, K\}$$
Therefore, the motion of each feature point i in F, i.e. all of its position offsets $f_{i1}(t)$ and $f_{i2}(t)$ in the video, can be further represented by a histogram $h_i$. This is a histogram of codeword frequencies composed of $h_i^1$ and $h_i^2$, where $h_i^1$ is the global-offset histogram of the $i$-th feature point and $h_i^2$ is its local-offset histogram, i.e.
$$h_i^1(k) = \frac{\#\{ b_k : f_{i1}(t) = b_k \}}{n}, \quad i = 1, 2, \ldots, m, \quad k = 1, 2, \ldots, K$$
$$h_i^2(k) = \frac{\#\{ b_k : f_{i2}(t) = b_k \}}{n}, \quad i = 1, 2, \ldots, m, \quad k = 1, 2, \ldots, K$$
$$h_i = [\, h_i^1, h_i^2 \,]$$
where $\#\{\cdot\}$ is a counting function. Finally, F can be represented by the set of histograms of all feature points, i.e. $F = \{h_i\}$, $i = 1, 2, \ldots, m$, where $h_i$ is the histogram of the $i$-th feature point.
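The quantization and histogram steps can be sketched as follows. The names `nearest_codeword`, `codeword_histogram`, and `feature_histogram` are hypothetical, and the codebook is assumed to be a list of K vectors such as the cluster centers from the previous step. Each offset is mapped to the index of its nearest codeword, the counts are divided by the number of frames n, and the global and local histograms are concatenated into $h_i$.

```python
def nearest_codeword(offset, codebook):
    """Index k of the codeword b_k closest to `offset` in Euclidean distance."""
    return min(range(len(codebook)),
               key=lambda k: sum((p - q) ** 2
                                 for p, q in zip(offset, codebook[k])))

def codeword_histogram(offsets_over_time, codebook):
    """Normalized codeword-frequency histogram (length K) for the offsets of
    one feature point across the n frames of a video."""
    n = len(offsets_over_time)
    counts = [0] * len(codebook)
    for offset in offsets_over_time:
        counts[nearest_codeword(offset, codebook)] += 1
    return [c / n for c in counts]

def feature_histogram(global_seq, local_seq, codebook):
    """h_i = [h_i^1, h_i^2]: global-offset histogram followed by the
    local-offset histogram, giving a vector of length 2K."""
    return (codeword_histogram(global_seq, codebook)
            + codeword_histogram(local_seq, codebook))
```

A video with m feature points is then described by m such vectors, one per feature point, which is the representation $F = \{h_i\}$ used for classification.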
3. The human behavior recognition method based on part clustering features according to claim 1, characterized in that in step 2 the action is classified using the naive Bayes nearest neighbor classifier (Naive Bayes Nearest Neighbor Classifier, NBNN), given a video sequence F = {h_i}, i = 1, 2, …, m, represented by a set of feature-point histograms, where m is the number of feature points;
The original concept of NBNN-based image classification is extended to NBNN-based video classification, i.e. behavior recognition; the quantity computed is the joint-histogram-to-class distance, rather than the video-to-class distance or the video-to-video distance, as follows:
$$F \rightarrow c^{*}, \quad \text{where} \quad c^{*} = \arg\min_{c} \sum_{i=1}^{m} \big\| h_i - NN_i^{c}(h_i) \big\|^2$$
where $NN_i^{c}(h_i)$ denotes the histogram nearest to $h_i$ among the histograms of the $i$-th feature point in behavior class c, i.e. $NN_i^{c}(h_i) = \arg\min_{h'_i(c)} \| h_i - h'_i(c) \|$, where $h'_i(c)$ denotes a histogram of the $i$-th feature point in behavior class c.
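The decision rule above can be sketched in a few lines. The function name `nbnn_classify` and the layout of the training data (a dict mapping each class label to a list of videos, each video being a list of m histograms) are assumptions made for illustration. For every feature point the nearest training histogram of that same feature point within the class is found independently, so the neighbors of different feature points may come from different training videos.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two histograms."""
    return sum((p - q) ** 2 for p, q in zip(a, b))

def nbnn_classify(test_hists, training):
    """Return the class c* minimizing sum_i ||h_i - NN_i^c(h_i)||^2.

    test_hists: list of m histograms, one per feature point.
    training:   dict {label: [video, ...]}, each video a list of m histograms
                (hypothetical layout chosen for this sketch).
    """
    best_label, best_cost = None, float("inf")
    for label, videos in training.items():
        cost = 0.0
        for i, h in enumerate(test_hists):
            # Nearest histogram of feature point i among this class's videos.
            cost += min(sq_dist(h, video[i]) for video in videos)
        if cost < best_cost:
            best_label, best_cost = label, cost
    return best_label
```

The class labels used in any call, such as `"wave"` or `"kick"`, are placeholders; the histograms would come from the codebook quantization of the training and test videos.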

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710057722.4A CN106909890B (en) 2017-01-23 2017-01-23 Human behavior recognition method based on part clustering characteristics

Publications (2)

Publication Number Publication Date
CN106909890A true CN106909890A (en) 2017-06-30
CN106909890B CN106909890B (en) 2020-02-11

Family

ID=59207591

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710057722.4A Active CN106909890B (en) 2017-01-23 2017-01-23 Human behavior recognition method based on part clustering characteristics

Country Status (1)

Country Link
CN (1) CN106909890B (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108520250A (en) * 2018-04-19 2018-09-11 北京工业大学 A kind of human motion sequence extraction method of key frame
CN108564047A (en) * 2018-04-19 2018-09-21 北京工业大学 A kind of Human bodys' response method based on the joints 3D point sequence
CN109272523A (en) * 2018-08-13 2019-01-25 西安交通大学 Based on the random-stow piston position and orientation estimation method for improving CVFH and CRH feature
CN110121103A (en) * 2019-05-06 2019-08-13 郭凌含 The automatic editing synthetic method of video and device
CN110163103A (en) * 2019-04-18 2019-08-23 中国农业大学 A kind of live pig Activity recognition method and apparatus based on video image
CN111249691A (en) * 2018-11-30 2020-06-09 百度在线网络技术(北京)有限公司 Athlete training method and system based on body shape recognition
CN112784662A (en) * 2018-12-30 2021-05-11 奥瞳系统科技有限公司 Video-based fall risk evaluation system

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150023590A1 (en) * 2013-07-16 2015-01-22 National Taiwan University Of Science And Technology Method and system for human action recognition
CN104715493A (en) * 2015-03-23 2015-06-17 北京工业大学 Moving body posture estimating method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
GUOLIANG LU ETC.: ""Efficient action recognition via local position offset of 3D skeletal body joints"", 《SPRINGER SCIENCE+BUSINESS MEDIA NEW YORK》 *

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108520250A (en) * 2018-04-19 2018-09-11 北京工业大学 A kind of human motion sequence extraction method of key frame
CN108564047A (en) * 2018-04-19 2018-09-21 北京工业大学 A kind of Human bodys' response method based on the joints 3D point sequence
CN108564047B (en) * 2018-04-19 2021-09-10 北京工业大学 Human behavior identification method based on3D joint point sequence
CN108520250B (en) * 2018-04-19 2021-09-14 北京工业大学 Human motion sequence key frame extraction method
CN109272523A (en) * 2018-08-13 2019-01-25 西安交通大学 Based on the random-stow piston position and orientation estimation method for improving CVFH and CRH feature
CN111249691A (en) * 2018-11-30 2020-06-09 百度在线网络技术(北京)有限公司 Athlete training method and system based on body shape recognition
CN111249691B (en) * 2018-11-30 2021-11-23 百度在线网络技术(北京)有限公司 Athlete training method and system based on body shape recognition
CN112784662A (en) * 2018-12-30 2021-05-11 奥瞳系统科技有限公司 Video-based fall risk evaluation system
CN110163103A (en) * 2019-04-18 2019-08-23 中国农业大学 A kind of live pig Activity recognition method and apparatus based on video image
CN110163103B (en) * 2019-04-18 2021-07-30 中国农业大学 Live pig behavior identification method and device based on video image
CN110121103A (en) * 2019-05-06 2019-08-13 郭凌含 The automatic editing synthetic method of video and device

Also Published As

Publication number Publication date
CN106909890B (en) 2020-02-11

Similar Documents

Publication Publication Date Title
Wang et al. Action recognition based on joint trajectory maps with convolutional neural networks
Pala et al. Multimodal person reidentification using RGB-D cameras
CN107832672B (en) Pedestrian re-identification method for designing multi-loss function by utilizing attitude information
CN106909890A (en) A kind of Human bodys' response method based on position cluster feature
Kamal et al. A hybrid feature extraction approach for human detection, tracking and activity recognition using depth sensors
WO2017101434A1 (en) Human body target re-identification method and system among multiple cameras
Medioni et al. Identifying noncooperative subjects at a distance using face images and inferred three-dimensional face models
US20060018516A1 (en) Monitoring activity using video information
Yao et al. Robust CNN-based gait verification and identification using skeleton gait energy image
Hu et al. Exploring structural information and fusing multiple features for person re-identification
Thành et al. An evaluation of pose estimation in video of traditional martial arts presentation
WO2009123354A1 (en) Method, apparatus, and program for detecting object
CN104794451B (en) Pedestrian's comparison method based on divided-fit surface structure
CN110008913A (en) The pedestrian's recognition methods again merged based on Attitude estimation with viewpoint mechanism
Pandey et al. Hand gesture recognition for sign language recognition: A review
CN110263605A (en) Pedestrian's dress ornament color identification method and device based on two-dimension human body guise estimation
JP5940862B2 (en) Image processing device
CN103902992B (en) Human face recognition method
CN109271932A (en) Pedestrian based on color-match recognition methods again
JP7422456B2 (en) Image processing device, image processing method and program
CN114187665A (en) Multi-person gait recognition method based on human body skeleton heat map
CN108280421A (en) Human bodys' response method based on multiple features Depth Motion figure
Wang et al. Hand motion and posture recognition in a network of calibrated cameras
Munaro et al. An evaluation of 3d motion flow and 3d pose estimation for human action recognition
Hayashi et al. Upper body pose estimation for team sports videos using a poselet-regressor of spine pose and body orientation classifiers conditioned by the spine angle prior

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant