CN106228111A - Method for extracting key frames based on skeleton sequences - Google Patents

Method for extracting key frames based on skeleton sequences

Info

Publication number
CN106228111A
CN106228111A (application CN201610539455.XA)
Authority
CN
China
Prior art keywords
frame
skeleton
motion vector
entropy
information entropy
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610539455.XA
Other languages
Chinese (zh)
Inventor
侯永宏
李照洋
董嘉蓉
马乐乐
王爽
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianjin University
Original Assignee
Tianjin University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianjin University filed Critical Tianjin University
Priority to CN201610539455.XA priority Critical patent/CN106228111A/en
Publication of CN106228111A publication Critical patent/CN106228111A/en
Pending legal-status Critical Current

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 Movements or behaviour, e.g. gesture recognition
    • G06V40/23 Recognition of whole body movements, e.g. for sport training
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)

Abstract

The present invention relates to a method for extracting key frames based on skeleton sequences, comprising: capturing human motion with a Kinect camera to obtain a three-dimensional skeleton sequence containing multiple skeleton joints; subtracting the skeleton coordinates of adjacent frames to obtain the three-dimensional motion vectors of all joints; projecting the three-dimensional motion vectors of all joints onto the three planes of the Cartesian coordinate system, and on each projection plane computing statistics of the motion vectors by direction and magnitude to obtain a histogram; computing the information entropy of the motion-vector histogram of each pair of adjacent frames according to the entropy formula, and defining frames whose entropy is a local maximum as primitive frames; for each primitive frame of the whole three-dimensional skeleton sequence, computing an interleave coefficient and weighting the primitive frame's entropy by it; and thereby obtaining the key frames of the human action in the skeleton sequence. The present invention can extract human-action key frames accurately, reliably, and efficiently.

Description

Method for extracting key frames based on skeleton sequences
Technical field
The invention belongs to the field of multimedia signal processing and relates to a method for extracting key frames.
Background technology
With the arrival of the Internet era and the rapid development of the computer industry, the market for intelligent computing is booming. Fields such as machine learning, pattern recognition, and data mining have broad room for development in today's society. Human action detection and recognition, a branch of pattern recognition, has many applications in today's world, such as motion-sensing games for human-computer interaction, intelligent surveillance, and video retrieval. From the perspective of computer video processing, however, the amount of information in video is enormous. To raise video processing speed and make video-based machine learning algorithms more widely applicable, it has become very popular in recent years to filter out the key frames that carry the richest action information and process only those. The present invention proposes a method for extracting action key frames from video sequences of human actions, based on three-dimensional skeleton sequences.
In recent years the camera industry has developed rapidly, and cameras that capture depth information have found increasingly wide application. Since Microsoft released the Kinect camera in 2010, depth cameras have entered countless households, and many scholars in video and image research have increasingly turned to information processing based on RGB-D. With continual improvements to algorithms for tracking the human skeleton in depth-video sequences, skeleton information, as a more abstract and high-level human-body feature, has been widely used, because it is insensitive to lighting and has a more complete three-dimensional character. However, there has so far been no key-frame extraction technique based on skeleton sequences.
Summary of the invention
In order to process video sequences more conveniently and allow computers to recognize human actions quickly and effectively, the present invention proposes a method for extracting human-action key frames based on skeleton sequences. The method has spatial characteristics that are more robust than two-dimensional information. At the same time, thanks to the compactness of skeleton information, it has very high computational efficiency. The invention is summarized as follows:
A method for extracting key frames based on skeleton sequences, comprising the following steps:
1) Capture human motion with a Kinect camera and perform skeleton tracking on the captured data stream to obtain a three-dimensional skeleton sequence containing multiple skeleton joints;
2) For each skeleton joint, subtract the skeleton coordinates of adjacent frames to obtain that joint's motion vector between adjacent frames, and thereby compute the three-dimensional motion vectors of all joints;
3) Project the three-dimensional motion vectors of all joints onto the three planes of the Cartesian coordinate system; on each projection plane, compute statistics of the motion vectors by direction and magnitude to obtain a histogram, defined as the skeleton motion-vector histogram;
4) Compute the information entropy of each adjacent-frame motion-vector histogram according to the entropy formula; arrange the entropy values of all motion-vector histograms in the video in temporal order and plot them as a curve, defined as the entropy curve; find the local maxima of the entropy curve, and define frames whose entropy is a local maximum as primitive frames;
5) For each primitive frame i of the whole three-dimensional skeleton sequence, compute the interleave coefficient HI from its own entropy and the entropies of its neighbouring frames by the following formula:
HI = \frac{\sum_{x=1}^{3} \min\big(H(i), H(i \pm x)\big)}{\sum_{x=1}^{3} H(i \pm x)}
where H(i) is the entropy of the primitive frame, and H(i ± x) denotes the entropy of the frame x positions away from primitive frame i, the + sign denoting following frames and the - sign preceding frames; the coefficient is multiplied with the primitive frame's entropy, thereby weighting the primitive frame's entropy;
6) Multiply the entropy H(i) of each primitive frame by its interleave coefficient HI to weight it, obtaining the weighted primitive-frame entropy;
7) Plot a new entropy curve from the weighted primitive-frame entropies, and take the frames corresponding to its local maxima as the key frames of the human action in the skeleton sequence.
The present invention can extract human-action key frames accurately, reliably, and efficiently.
Brief description of the drawings
Fig. 1 shows the overall key-frame extraction framework.
Fig. 2 shows the key frames extracted by the present invention for a hand-waving action on the MSRAction-3D dataset, visualized as gray-scale images.
Detailed description of the embodiments
1) The present invention uses the 32-bit Windows 8 operating system; the development IDE is VS2010, configured with Kinect for Windows SDK v1.6 and OpenCV 2.3.0 or later. NUI skeleton tracking is applied to the data stream captured by the Kinect, and the human skeleton action sequence is output.
2) Each frame of the skeleton sequence contains the three-dimensional coordinates of 20 human skeleton joints. For each joint, the difference between its skeleton 3D coordinates in two adjacent frames is that joint's motion vector between those two frames, and in this way the three-dimensional motion vectors of all 20 skeleton joints are obtained.
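The frame-differencing step above can be sketched as follows. This is a minimal Python sketch, not the patent's implementation; it assumes each frame is a list of (x, y, z) joint tuples (20 per frame for the Kinect v1 skeleton), and the function name is illustrative. Signed differences are kept, since the later direction binning needs each vector's orientation.

```python
def motion_vectors(skeleton_seq):
    """skeleton_seq: list of frames, each a list of (x, y, z) joint tuples.
    Returns one list of per-joint 3-D motion vectors for every pair of
    consecutive frames (so len(result) == len(skeleton_seq) - 1)."""
    vectors = []
    for prev, curr in zip(skeleton_seq, skeleton_seq[1:]):
        frame_vecs = [
            (cx - px, cy - py, cz - pz)
            for (px, py, pz), (cx, cy, cz) in zip(prev, curr)
        ]
        vectors.append(frame_vecs)
    return vectors
```

A sequence of N frames thus yields N - 1 motion-vector frames, one per adjacent pair.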
3) The motion vectors of all skeleton joints between two frames are projected in the three directions of the Cartesian coordinate system; the projection of each joint's motion vector onto a 2-D plane has a particular direction and magnitude. On each 2-D plane, taking the positive x-axis as reference and moving counter-clockwise, every 45° rotation defines one direction, so the plane is divided into 8 directions. Based on experimental results, the present invention takes the maximum magnitude of all skeleton motion vectors in each video sequence as the standard and divides the motion vectors on all 2-D planes into 5 magnitude ranges. Thus, according to the magnitude and direction of a motion vector, 40 categories are defined in turn (category order and numbering do not affect the result), and each motion vector on each 2-D projection plane can be assigned to one category by its direction and magnitude.
On each projection plane, counting the number of joints falling into each category yields a vector of dimension 40 (i.e. a histogram); concatenating the vectors obtained on the three projection planes yields a vector of dimension 120 (i.e. a histogram), defined as the skeleton motion-vector histogram.
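The binning described above (3 planes × 8 directions × 5 magnitude ranges = 120 bins) can be sketched as follows, as a hedged Python illustration. The bin ordering is an assumption (the description notes that category order does not affect the result), and the choice of equal-width magnitude ranges up to the per-sequence maximum is one plausible reading of "5 magnitude ranges".

```python
import math

def motion_histogram(frame_vecs, max_mag):
    """frame_vecs: list of (dx, dy, dz) per joint for one frame pair.
    max_mag: largest 2-D projected magnitude over the whole sequence,
    used to split magnitudes into 5 ranges as the description states.
    Returns a 120-bin histogram: 3 planes x 8 directions x 5 magnitudes."""
    planes = [(0, 1), (0, 2), (1, 2)]  # xy, xz, yz projections
    hist = [0] * 120
    for p_idx, (a, b) in enumerate(planes):
        for v in frame_vecs:
            u, w = v[a], v[b]
            mag = math.hypot(u, w)
            # direction bin: 45-degree sectors counter-clockwise from +x
            ang = math.atan2(w, u) % (2 * math.pi)
            d = int(ang // (math.pi / 4)) % 8
            # magnitude bin: 5 equal ranges up to the sequence maximum
            m = min(int(mag / max_mag * 5), 4) if max_mag > 0 else 0
            hist[p_idx * 40 + d * 5 + m] += 1
    return hist
```

Each joint contributes one count per projection plane, so the histogram total is 3 times the joint count.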
4) For each skeleton motion-vector histogram, the corresponding entropy is obtained according to the entropy formula H = -\sum_{i=1}^{n} p_i \log p_i, where H is the information entropy, p_i is the proportion of the i-th category of the 120-dimensional vector within the whole histogram, and n is the length of the histogram; the present invention takes n = 120.
For the whole skeleton sequence, all entropy values are connected to form a curve of entropy values, defined as the entropy curve. The local maxima of the entropy curve are extracted, i.e. the points whose entropy simultaneously exceeds the entropies of the frames on both sides, and the frames with local maxima in the skeleton sequence's entropy curve are defined as primitive frames.
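The entropy computation and the primitive-frame selection above can be sketched as follows; a minimal Python illustration with illustrative function names, using base-2 logarithms (the description does not fix the logarithm base) and treating empty bins as contributing zero.

```python
import math

def histogram_entropy(hist):
    """Shannon entropy H = -sum(p_i * log2 p_i) of a motion-vector
    histogram; zero-count bins contribute nothing (0 * log 0 -> 0)."""
    total = sum(hist)
    if total == 0:
        return 0.0
    h = 0.0
    for c in hist:
        if c:
            p = c / total
            h -= p * math.log2(p)
    return h

def local_maxima(entropies):
    """Indices whose entropy strictly exceeds both neighbours:
    the primitive frames of the entropy curve."""
    return [i for i in range(1, len(entropies) - 1)
            if entropies[i] > entropies[i - 1]
            and entropies[i] > entropies[i + 1]]
```

For example, a histogram split evenly over two bins has entropy exactly 1 bit.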
5) In the entropy curve, for each primitive frame, suppose the frame is the i-th frame of the whole video; its entropy is then H(i), and let H(i ± x) denote the entropy of the frame x positions before or after it. The interleave coefficient HI of this frame is computed according to the interleave formula below. This coefficient reflects how much the skeleton motion of the primitive frame differs from that of its neighbouring frames. Multiplying the interleave coefficient by the primitive frame's entropy weights the primitive frame's entropy. The interleave-coefficient formula is:
HI = \frac{\sum_{x=1}^{3} \min\big(H(i), H(i \pm x)\big)}{\sum_{x=1}^{3} H(i \pm x)}
6) The weighted primitive-frame entropies are connected in temporal order to obtain a new entropy curve. The frames corresponding to the local maxima of the new entropy curve are taken out as the key frames of the human-action sequence in the video.
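The weighting step can be sketched as follows. Note this is a hedged reading of the formula, not the patent's code: the ± in H(i ± x) is taken to mean summing over the three neighbours on each side of frame i, and neighbours falling outside the sequence are simply skipped, since the description does not specify boundary handling. Function names are illustrative.

```python
def interleave_coefficient(entropies, i):
    """Interleave coefficient HI for primitive frame i:
    sum of min(H(i), H(i +/- x)) over x = 1..3 on both sides,
    divided by the sum of those neighbour entropies."""
    num = den = 0.0
    for x in (1, 2, 3):
        for j in (i - x, i + x):
            if 0 <= j < len(entropies):
                num += min(entropies[i], entropies[j])
                den += entropies[j]
    return num / den if den else 0.0

def weighted_entropies(entropies, primitives):
    """Weight each primitive frame's entropy by its interleave
    coefficient, as in steps 5) and 6)."""
    return {i: entropies[i] * interleave_coefficient(entropies, i)
            for i in primitives}
```

Under this reading, a primitive frame whose entropy dominates all six neighbours gets HI = 1 (its entropy is kept unchanged), while a frame barely above its neighbours is attenuated.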
The results of testing the present invention on the MSRAction-3D dataset are described below:
MSRAction-3D is an influential human-action detection and recognition dataset containing 20 action classes and providing both depth data and skeleton data. Following the key-frame extraction method described above, the present invention performed key-frame extraction on a hand-waving action; this action contains 58 frames in total, from which the method extracted 8 key frames. Fig. 2 shows the depth images corresponding to the key frames, visualized and arranged in order. The results show that from an action sequence of many frames the method extracts the few frames that characterize the whole action. With the present invention, processing of an entire video sequence can be converted into processing of a key-frame sequence, greatly reducing data redundancy, lowering the algorithm's running time and space cost, and improving the practicality of complex algorithms for video processing.

Claims (1)

1. A method for extracting key frames based on skeleton sequences, comprising the following steps:
1) capturing human motion with a Kinect camera and performing skeleton tracking on the captured data stream to obtain a three-dimensional skeleton sequence containing multiple skeleton joints;
2) for each skeleton joint, subtracting the skeleton coordinates of adjacent frames to obtain that joint's motion vector between adjacent frames, and thereby computing the three-dimensional motion vectors of all joints;
3) projecting the three-dimensional motion vectors of all joints onto the three planes of the Cartesian coordinate system; on each projection plane, computing statistics of the motion vectors by direction and magnitude to obtain a histogram, defined as the skeleton motion-vector histogram;
4) computing the information entropy of each adjacent-frame motion-vector histogram according to the entropy formula; arranging the entropy values of all motion-vector histograms in the video in temporal order and plotting them as a curve, defined as the entropy curve; finding the local maxima of the entropy curve, and defining frames whose entropy is a local maximum as primitive frames;
5) for each primitive frame i of the whole three-dimensional skeleton sequence, computing the interleave coefficient HI from its own entropy and the entropies of its neighbouring frames by the following formula:
HI = \frac{\sum_{x=1}^{3} \min\big(H(i), H(i \pm x)\big)}{\sum_{x=1}^{3} H(i \pm x)}
wherein H(i) is the entropy of the primitive frame, and H(i ± x) denotes the entropy of the frame x positions away from primitive frame i, the + sign denoting following frames and the - sign preceding frames; the coefficient is multiplied with the primitive frame's entropy, thereby weighting the primitive frame's entropy;
6) multiplying the entropy H(i) of each primitive frame by its interleave coefficient HI to weight it, obtaining the weighted primitive-frame entropy;
7) plotting a new entropy curve from the weighted primitive-frame entropies, and taking the frames corresponding to its local maxima as the key frames of the human action in the skeleton sequence.
CN201610539455.XA 2016-07-08 2016-07-08 Method for extracting key frames based on skeleton sequences Pending CN106228111A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610539455.XA CN106228111A (en) 2016-07-08 2016-07-08 Method for extracting key frames based on skeleton sequences

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610539455.XA CN106228111A (en) 2016-07-08 2016-07-08 Method for extracting key frames based on skeleton sequences

Publications (1)

Publication Number Publication Date
CN106228111A true CN106228111A (en) 2016-12-14

Family

ID=57520339

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610539455.XA Pending CN106228111A (en) 2016-07-08 2016-07-08 A kind of method based on skeleton sequential extraction procedures key frame

Country Status (1)

Country Link
CN (1) CN106228111A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109190474A (en) * 2018-08-01 2019-01-11 南昌大学 Human body animation extraction method of key frame based on posture conspicuousness
CN109934183A (en) * 2019-03-18 2019-06-25 北京市商汤科技开发有限公司 Image processing method and device, detection device and storage medium
CN111402290A (en) * 2020-02-29 2020-07-10 华为技术有限公司 Action restoration method and device based on skeleton key points

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102395984A (en) * 2009-04-14 2012-03-28 皇家飞利浦电子股份有限公司 Key frames extraction for video content analysis
WO2012078702A1 (en) * 2010-12-10 2012-06-14 Eastman Kodak Company Video key frame extraction using sparse representation
CN102749993A (en) * 2012-05-30 2012-10-24 无锡掌游天下科技有限公司 Motion recognition method based on skeleton node data
CN103020648A (en) * 2013-01-09 2013-04-03 北京东方艾迪普科技发展有限公司 Method and device for identifying action types, and method and device for broadcasting programs
US20150125045A1 (en) * 2013-11-04 2015-05-07 Steffen Gauglitz Environment Mapping with Automatic Motion Model Selection

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102395984A (en) * 2009-04-14 2012-03-28 皇家飞利浦电子股份有限公司 Key frames extraction for video content analysis
WO2012078702A1 (en) * 2010-12-10 2012-06-14 Eastman Kodak Company Video key frame extraction using sparse representation
CN102749993A (en) * 2012-05-30 2012-10-24 无锡掌游天下科技有限公司 Motion recognition method based on skeleton node data
CN103020648A (en) * 2013-01-09 2013-04-03 北京东方艾迪普科技发展有限公司 Method and device for identifying action types, and method and device for broadcasting programs
US20150125045A1 (en) * 2013-11-04 2015-05-07 Steffen Gauglitz Environment Mapping with Automatic Motion Model Selection

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
LING SHAO等: "Motion Histogram Analysis Based Key Frame Extraction for Human Action/Activity Representation", 《2009 CANADIAN CONFERENCE ON COMPUTER AND ROBOT VISION》 *
邹维嘉: "Research on action recognition and saliency detection based on multiple-instance learning", 《China Master's Theses Full-text Database, Information Science and Technology》 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109190474A (en) * 2018-08-01 2019-01-11 南昌大学 Human body animation extraction method of key frame based on posture conspicuousness
CN109190474B (en) * 2018-08-01 2021-07-20 南昌大学 Human body animation key frame extraction method based on gesture significance
CN109934183A (en) * 2019-03-18 2019-06-25 北京市商汤科技开发有限公司 Image processing method and device, detection device and storage medium
CN111402290A (en) * 2020-02-29 2020-07-10 华为技术有限公司 Action restoration method and device based on skeleton key points
CN111402290B (en) * 2020-02-29 2023-09-12 华为技术有限公司 Action restoration method and device based on skeleton key points

Similar Documents

Publication Publication Date Title
Song et al. Richly activated graph convolutional network for action recognition with incomplete skeletons
Liu et al. Skepxels: Spatio-temporal image representation of human skeleton joints for action recognition.
Gao et al. Infar dataset: Infrared action recognition at different times
Hassner A critical review of action recognition benchmarks
Malgireddy et al. A temporal Bayesian model for classifying, detecting and localizing activities in video sequences
CN105528794A (en) Moving object detection method based on Gaussian mixture model and superpixel segmentation
CN103020647A (en) Image classification method based on hierarchical SIFT (scale-invariant feature transform) features and sparse coding
CN103902989B (en) Human action video frequency identifying method based on Non-negative Matrix Factorization
Gao et al. Human action recognition via multi-modality information
Liu et al. 3D action recognition using multiscale energy-based global ternary image
Kihl et al. A unified framework for local visual descriptors evaluation
CN106228111A (en) Method for extracting key frames based on skeleton sequences
CN105469050A (en) Video behavior identification method based on local space-time characteristic description and pyramid vocabulary tree
Zhou et al. Human action recognition toward massive-scale sport sceneries based on deep multi-model feature fusion
CN103577804A (en) Abnormal human behavior identification method based on SIFT flow and hidden conditional random fields
Hou et al. Enhancing and dissecting crowd counting by synthetic data
Roy et al. Sparsity-inducing dictionaries for effective action classification
CN103218829A (en) Foreground extracting method suitable for dynamic background
Sun et al. Learning spatio-temporal co-occurrence correlograms for efficient human action classification
CN103778439A (en) Body contour reconstruction method based on dynamic time-space information digging
Yin et al. Small human group detection and event representation based on cognitive semantics
Li et al. Trajectory-pooled spatial-temporal architecture of deep convolutional neural networks for video event detection
CN116403286A (en) Social grouping method for large-scene video
Ma et al. Video event classification and image segmentation based on noncausal multidimensional hidden markov models
Choi et al. A view-based real-time human action recognition system as an interface for human computer interaction

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20161214