CN106228111A - Method for extracting key frames based on skeleton sequences - Google Patents

Method for extracting key frames based on skeleton sequences

Info

Publication number
CN106228111A
CN106228111A (application CN201610539455.XA)
Authority
CN
China
Prior art keywords
frame
skeleton
motion vector
entropy
information entropy
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610539455.XA
Other languages
Chinese (zh)
Inventor
侯永宏
李照洋
董嘉蓉
马乐乐
王爽
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianjin University
Original Assignee
Tianjin University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianjin University filed Critical Tianjin University
Priority to CN201610539455.XA priority Critical patent/CN106228111A/en
Publication of CN106228111A publication Critical patent/CN106228111A/en
Pending legal-status Critical Current

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 Movements or behaviour, e.g. gesture recognition
    • G06V40/23 Recognition of whole body movements, e.g. for sport training
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)

Abstract

The present invention relates to a method for extracting key frames based on skeleton sequences, comprising: capturing human motion with a Kinect camera to obtain a three-dimensional skeleton sequence containing multiple skeleton joints; subtracting the skeleton coordinates of adjacent frames to obtain the three-dimensional motion vectors of all joints; projecting the three-dimensional motion vectors of all joints onto the three planes of the Cartesian coordinate system, and on each projection plane computing statistics of the motion vectors by direction and magnitude to obtain a histogram; computing the information entropy of the motion-vector histogram of each pair of adjacent frames according to the entropy formula, and defining frames whose entropy is a local maximum as primitive frames; for each primitive frame of the whole three-dimensional skeleton sequence, computing an interleave coefficient and weighting the primitive frame's entropy by it; and thereby obtaining the key frames of the human action in the skeleton sequence. The present invention can extract human-action key frames accurately, reliably, and efficiently.

Description

Method for extracting key frames based on skeleton sequences
Technical field
The invention belongs to the field of multimedia signal processing and relates to a method for extracting key frames.
Background technology
With the arrival of the Internet era and the rapid development of the computer industry, the market for intelligent computing is booming. Fields such as machine learning, pattern recognition, and data mining have broad room for development in today's society. Human action detection and recognition, a branch of pattern recognition, has many applications in today's world, such as motion-sensing games for human-computer interaction, intelligent surveillance, and video retrieval. From the perspective of computer video processing, however, the amount of information in video is enormous. To raise video processing speed and make video-based machine learning algorithms more widely applicable, it has become very popular in recent years to filter out the key frames that carry the richest action information and process only those. The present invention proposes a method for extracting action key frames from video sequences of human actions, based on three-dimensional skeleton sequences.
In recent years the camera industry has developed rapidly, and cameras that capture depth information have found increasingly wide application. Since Microsoft released the Kinect camera in 2010, depth cameras have entered countless households, and many scholars in video and image research have increasingly turned to information processing based on RGB-D. With continual improvements to algorithms for tracking the human skeleton in depth-video sequences, skeleton information, as a more abstract and high-level human-body feature, has been widely used, because it is insensitive to lighting and has a more complete three-dimensional character. However, there has so far been no key-frame extraction technique based on skeleton sequences.
Summary of the invention
In order to process video sequences more conveniently and allow computers to recognize human actions quickly and effectively, the present invention proposes a method for extracting human-action key frames based on skeleton sequences. The method has spatial characteristics that are more robust than two-dimensional information. At the same time, thanks to the compactness of skeleton information, it has very high computational efficiency. The invention is summarized as follows:
A method for extracting key frames based on skeleton sequences, comprising the following steps:
1) Capture human motion with a Kinect camera and perform skeleton tracking on the captured data stream to obtain a three-dimensional skeleton sequence containing multiple skeleton joints;
2) For each skeleton joint, subtract the skeleton coordinates of adjacent frames to obtain that joint's motion vector between adjacent frames, and thereby compute the three-dimensional motion vectors of all joints;
3) Project the three-dimensional motion vectors of all joints onto the three planes of the Cartesian coordinate system; on each projection plane, compute statistics of the motion vectors by direction and magnitude to obtain a histogram, defined as the skeleton motion-vector histogram;
4) Compute the information entropy of each adjacent-frame motion-vector histogram according to the entropy formula; arrange the entropy values of all motion-vector histograms in the video in temporal order and plot them as a curve, defined as the entropy curve; find the local maxima of the entropy curve, and define frames whose entropy is a local maximum as primitive frames;
5) For each primitive frame i of the whole three-dimensional skeleton sequence, compute the interleave coefficient HI from its own entropy and the entropies of its neighbouring frames by the following formula:
HI = \frac{\sum_{x=1}^{3} \min\big(H(i), H(i \pm x)\big)}{\sum_{x=1}^{3} H(i \pm x)}
where H(i) is the entropy of the primitive frame, and H(i ± x) denotes the entropy of the frame x positions away from primitive frame i, the + sign denoting following frames and the - sign preceding frames; the coefficient is multiplied with the primitive frame's entropy, thereby weighting the primitive frame's entropy;
6) Multiply the entropy H(i) of each primitive frame by its interleave coefficient HI to weight it, obtaining the weighted primitive-frame entropy;
7) Plot a new entropy curve from the weighted primitive-frame entropies, and take the frames corresponding to its local maxima as the key frames of the human action in the skeleton sequence.
The present invention can extract human-action key frames accurately, reliably, and efficiently.
Brief description of the drawings
Fig. 1 shows the overall key-frame extraction framework.
Fig. 2 shows the key frames extracted by the present invention for a hand-waving action on the MSRAction-3D dataset, visualized as gray-scale images.
Detailed description of the embodiments
1) The present invention uses the 32-bit Windows 8 operating system; the development IDE is VS2010, configured with Kinect for Windows SDK v1.6 and OpenCV 2.3.0 or later. NUI skeleton tracking is applied to the data stream captured by the Kinect, and the human skeleton action sequence is output.
2) Each frame of the skeleton sequence contains the three-dimensional coordinates of 20 human skeleton joints. For each joint, the difference between its skeleton 3D coordinates in two adjacent frames is that joint's motion vector between those two frames, and in this way the three-dimensional motion vectors of all 20 skeleton joints are obtained.
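The frame-differencing step above can be sketched as follows. This is a minimal Python sketch, not the patent's implementation; it assumes each frame is a list of (x, y, z) joint tuples (20 per frame for the Kinect v1 skeleton), and the function name is illustrative. Signed differences are kept, since the later direction binning needs each vector's orientation.

```python
def motion_vectors(skeleton_seq):
    """skeleton_seq: list of frames, each a list of (x, y, z) joint tuples.
    Returns one list of per-joint 3-D motion vectors for every pair of
    consecutive frames (so len(result) == len(skeleton_seq) - 1)."""
    vectors = []
    for prev, curr in zip(skeleton_seq, skeleton_seq[1:]):
        frame_vecs = [
            (cx - px, cy - py, cz - pz)
            for (px, py, pz), (cx, cy, cz) in zip(prev, curr)
        ]
        vectors.append(frame_vecs)
    return vectors
```

A sequence of N frames thus yields N - 1 motion-vector frames, one per adjacent pair.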
3) The motion vectors of all skeleton joints between two frames are projected in the three directions of the Cartesian coordinate system; the projection of each joint's motion vector onto a 2-D plane has a particular direction and magnitude. On each 2-D plane, taking the positive x-axis as reference and moving counter-clockwise, every 45° rotation defines one direction, so the plane is divided into 8 directions. Based on experimental results, the present invention takes the maximum magnitude of all skeleton motion vectors in each video sequence as the standard and divides the motion vectors on all 2-D planes into 5 magnitude ranges. Thus, according to the magnitude and direction of a motion vector, 40 categories are defined in turn (category order and numbering do not affect the result), and each motion vector on each 2-D projection plane can be assigned to one category by its direction and magnitude.
On each projection plane, counting the number of joints falling into each category yields a vector of dimension 40 (i.e. a histogram); concatenating the vectors obtained on the three projection planes yields a vector of dimension 120 (i.e. a histogram), defined as the skeleton motion-vector histogram.
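The binning described above (3 planes × 8 directions × 5 magnitude ranges = 120 bins) can be sketched as follows, as a hedged Python illustration. The bin ordering is an assumption (the description notes that category order does not affect the result), and the choice of equal-width magnitude ranges up to the per-sequence maximum is one plausible reading of "5 magnitude ranges".

```python
import math

def motion_histogram(frame_vecs, max_mag):
    """frame_vecs: list of (dx, dy, dz) per joint for one frame pair.
    max_mag: largest 2-D projected magnitude over the whole sequence,
    used to split magnitudes into 5 ranges as the description states.
    Returns a 120-bin histogram: 3 planes x 8 directions x 5 magnitudes."""
    planes = [(0, 1), (0, 2), (1, 2)]  # xy, xz, yz projections
    hist = [0] * 120
    for p_idx, (a, b) in enumerate(planes):
        for v in frame_vecs:
            u, w = v[a], v[b]
            mag = math.hypot(u, w)
            # direction bin: 45-degree sectors counter-clockwise from +x
            ang = math.atan2(w, u) % (2 * math.pi)
            d = int(ang // (math.pi / 4)) % 8
            # magnitude bin: 5 equal ranges up to the sequence maximum
            m = min(int(mag / max_mag * 5), 4) if max_mag > 0 else 0
            hist[p_idx * 40 + d * 5 + m] += 1
    return hist
```

Each joint contributes one count per projection plane, so the histogram total is 3 times the joint count.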
4) For each skeleton motion-vector histogram, the corresponding entropy is obtained according to the entropy formula H = -\sum_{i=1}^{n} p_i \log p_i, where H is the information entropy, p_i is the proportion of the i-th category of the 120-dimensional vector within the whole histogram, and n is the length of the histogram; the present invention takes n = 120.
For the whole skeleton sequence, all entropy values are connected to form a curve of entropy values, defined as the entropy curve. The local maxima of the entropy curve are extracted, i.e. the points whose entropy simultaneously exceeds the entropies of the frames on both sides, and the frames with local maxima in the skeleton sequence's entropy curve are defined as primitive frames.
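The entropy computation and the primitive-frame selection above can be sketched as follows; a minimal Python illustration with illustrative function names, using base-2 logarithms (the description does not fix the logarithm base) and treating empty bins as contributing zero.

```python
import math

def histogram_entropy(hist):
    """Shannon entropy H = -sum(p_i * log2 p_i) of a motion-vector
    histogram; zero-count bins contribute nothing (0 * log 0 -> 0)."""
    total = sum(hist)
    if total == 0:
        return 0.0
    h = 0.0
    for c in hist:
        if c:
            p = c / total
            h -= p * math.log2(p)
    return h

def local_maxima(entropies):
    """Indices whose entropy strictly exceeds both neighbours:
    the primitive frames of the entropy curve."""
    return [i for i in range(1, len(entropies) - 1)
            if entropies[i] > entropies[i - 1]
            and entropies[i] > entropies[i + 1]]
```

For example, a histogram split evenly over two bins has entropy exactly 1 bit.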
5) In the entropy curve, for each primitive frame, suppose the frame is the i-th frame of the whole video; its entropy is then H(i), and let H(i ± x) denote the entropy of the frame x positions before or after it. The interleave coefficient HI of this frame is computed according to the interleave formula below. This coefficient reflects how much the skeleton motion of the primitive frame differs from that of its neighbouring frames. Multiplying the interleave coefficient by the primitive frame's entropy weights the primitive frame's entropy. The interleave-coefficient formula is:
HI = \frac{\sum_{x=1}^{3} \min\big(H(i), H(i \pm x)\big)}{\sum_{x=1}^{3} H(i \pm x)}
6) The weighted primitive-frame entropies are connected in temporal order to obtain a new entropy curve. The frames corresponding to the local maxima of the new entropy curve are taken out as the key frames of the human-action sequence in the video.
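The weighting step can be sketched as follows. Note this is a hedged reading of the formula, not the patent's code: the ± in H(i ± x) is taken to mean summing over the three neighbours on each side of frame i, and neighbours falling outside the sequence are simply skipped, since the description does not specify boundary handling. Function names are illustrative.

```python
def interleave_coefficient(entropies, i):
    """Interleave coefficient HI for primitive frame i:
    sum of min(H(i), H(i +/- x)) over x = 1..3 on both sides,
    divided by the sum of those neighbour entropies."""
    num = den = 0.0
    for x in (1, 2, 3):
        for j in (i - x, i + x):
            if 0 <= j < len(entropies):
                num += min(entropies[i], entropies[j])
                den += entropies[j]
    return num / den if den else 0.0

def weighted_entropies(entropies, primitives):
    """Weight each primitive frame's entropy by its interleave
    coefficient, as in steps 5) and 6)."""
    return {i: entropies[i] * interleave_coefficient(entropies, i)
            for i in primitives}
```

Under this reading, a primitive frame whose entropy dominates all six neighbours gets HI = 1 (its entropy is kept unchanged), while a frame barely above its neighbours is attenuated.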
The results of testing the present invention on the MSRAction-3D dataset are described below:
MSRAction-3D is an influential human-action detection and recognition dataset containing 20 action classes and providing both depth data and skeleton data. Following the key-frame extraction method described above, the present invention performed key-frame extraction on a hand-waving action; this action contains 58 frames in total, from which the method extracted 8 key frames. Fig. 2 shows the depth images corresponding to the key frames, visualized and arranged in order. The results show that from an action sequence of many frames the method extracts the few frames that characterize the whole action. With the present invention, processing of an entire video sequence can be converted into processing of a key-frame sequence, greatly reducing data redundancy, lowering the algorithm's running time and space cost, and improving the practicality of complex algorithms for video processing.

Claims (1)

1. A method for extracting key frames based on skeleton sequences, comprising the following steps:
1) capturing human motion with a Kinect camera and performing skeleton tracking on the captured data stream to obtain a three-dimensional skeleton sequence containing multiple skeleton joints;
2) for each skeleton joint, subtracting the skeleton coordinates of adjacent frames to obtain that joint's motion vector between adjacent frames, and thereby computing the three-dimensional motion vectors of all joints;
3) projecting the three-dimensional motion vectors of all joints onto the three planes of the Cartesian coordinate system; on each projection plane, computing statistics of the motion vectors by direction and magnitude to obtain a histogram, defined as the skeleton motion-vector histogram;
4) computing the information entropy of each adjacent-frame motion-vector histogram according to the entropy formula; arranging the entropy values of all motion-vector histograms in the video in temporal order and plotting them as a curve, defined as the entropy curve; finding the local maxima of the entropy curve, and defining frames whose entropy is a local maximum as primitive frames;
5) for each primitive frame i of the whole three-dimensional skeleton sequence, computing the interleave coefficient HI from its own entropy and the entropies of its neighbouring frames by the following formula:
HI = \frac{\sum_{x=1}^{3} \min\big(H(i), H(i \pm x)\big)}{\sum_{x=1}^{3} H(i \pm x)}
wherein H(i) is the entropy of the primitive frame, and H(i ± x) denotes the entropy of the frame x positions away from primitive frame i, the + sign denoting following frames and the - sign preceding frames; the coefficient is multiplied with the primitive frame's entropy, thereby weighting the primitive frame's entropy;
6) multiplying the entropy H(i) of each primitive frame by its interleave coefficient HI to weight it, obtaining the weighted primitive-frame entropy;
7) plotting a new entropy curve from the weighted primitive-frame entropies, and taking the frames corresponding to its local maxima as the key frames of the human action in the skeleton sequence.
CN201610539455.XA 2016-07-08 2016-07-08 Method for extracting key frames based on skeleton sequences Pending CN106228111A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610539455.XA CN106228111A (en) 2016-07-08 2016-07-08 Method for extracting key frames based on skeleton sequences

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610539455.XA CN106228111A (en) 2016-07-08 2016-07-08 Method for extracting key frames based on skeleton sequences

Publications (1)

Publication Number Publication Date
CN106228111A true CN106228111A (en) 2016-12-14

Family

ID=57520339

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610539455.XA Pending CN106228111A (en) 2016-07-08 2016-07-08 A kind of method based on skeleton sequential extraction procedures key frame

Country Status (1)

Country Link
CN (1) CN106228111A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109190474A (en) * 2018-08-01 2019-01-11 南昌大学 Human body animation extraction method of key frame based on posture conspicuousness
CN109934183A (en) * 2019-03-18 2019-06-25 北京市商汤科技开发有限公司 Image processing method and device, detection device and storage medium
CN111402290A (en) * 2020-02-29 2020-07-10 华为技术有限公司 Action restoration method and device based on skeleton key points

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102395984A (en) * 2009-04-14 2012-03-28 皇家飞利浦电子股份有限公司 Key frames extraction for video content analysis
WO2012078702A1 (en) * 2010-12-10 2012-06-14 Eastman Kodak Company Video key frame extraction using sparse representation
CN102749993A (en) * 2012-05-30 2012-10-24 无锡掌游天下科技有限公司 Motion recognition method based on skeleton node data
CN103020648A (en) * 2013-01-09 2013-04-03 北京东方艾迪普科技发展有限公司 Method and device for identifying action types, and method and device for broadcasting programs
US20150125045A1 (en) * 2013-11-04 2015-05-07 Steffen Gauglitz Environment Mapping with Automatic Motion Model Selection

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102395984A (en) * 2009-04-14 2012-03-28 皇家飞利浦电子股份有限公司 Key frames extraction for video content analysis
WO2012078702A1 (en) * 2010-12-10 2012-06-14 Eastman Kodak Company Video key frame extraction using sparse representation
CN102749993A (en) * 2012-05-30 2012-10-24 无锡掌游天下科技有限公司 Motion recognition method based on skeleton node data
CN103020648A (en) * 2013-01-09 2013-04-03 北京东方艾迪普科技发展有限公司 Method and device for identifying action types, and method and device for broadcasting programs
US20150125045A1 (en) * 2013-11-04 2015-05-07 Steffen Gauglitz Environment Mapping with Automatic Motion Model Selection

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
LING SHAO等: "Motion Histogram Analysis Based Key Frame Extraction for Human Action/Activity Representation", 《2009 CANADIAN CONFERENCE ON COMPUTER AND ROBOT VISION》 *
邹维嘉: "Research on action recognition and saliency detection based on multiple-instance learning", 《China Master's Theses Full-text Database, Information Science and Technology》 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109190474A (en) * 2018-08-01 2019-01-11 南昌大学 Human body animation extraction method of key frame based on posture conspicuousness
CN109190474B (en) * 2018-08-01 2021-07-20 南昌大学 Human body animation key frame extraction method based on gesture significance
CN109934183A (en) * 2019-03-18 2019-06-25 北京市商汤科技开发有限公司 Image processing method and device, detection device and storage medium
CN111402290A (en) * 2020-02-29 2020-07-10 华为技术有限公司 Action restoration method and device based on skeleton key points
CN111402290B (en) * 2020-02-29 2023-09-12 华为技术有限公司 Action restoration method and device based on skeleton key points

Similar Documents

Publication Publication Date Title
Song et al. Richly activated graph convolutional network for action recognition with incomplete skeletons
Liu et al. Skepxels: Spatio-temporal image representation of human skeleton joints for action recognition.
Gao et al. Infar dataset: Infrared action recognition at different times
Hassner A critical review of action recognition benchmarks
Malgireddy et al. A temporal Bayesian model for classifying, detecting and localizing activities in video sequences
CN105528794A (en) Moving object detection method based on Gaussian mixture model and superpixel segmentation
CN103020647A (en) Image classification method based on hierarchical SIFT (scale-invariant feature transform) features and sparse coding
CN103902989B (en) Human action video frequency identifying method based on Non-negative Matrix Factorization
Gao et al. Human action recognition via multi-modality information
Liu et al. 3D action recognition using multiscale energy-based global ternary image
Kihl et al. A unified framework for local visual descriptors evaluation
CN106228111A (en) Method for extracting key frames based on skeleton sequences
CN105469050A (en) Video behavior identification method based on local space-time characteristic description and pyramid vocabulary tree
Zhou et al. Human action recognition toward massive-scale sport sceneries based on deep multi-model feature fusion
CN103577804A (en) Abnormal human behavior identification method based on SIFT flow and hidden conditional random fields
Hou et al. Enhancing and dissecting crowd counting by synthetic data
Roy et al. Sparsity-inducing dictionaries for effective action classification
CN103218829A (en) Foreground extracting method suitable for dynamic background
Sun et al. Learning spatio-temporal co-occurrence correlograms for efficient human action classification
CN103778439A (en) Body contour reconstruction method based on dynamic time-space information digging
Yin et al. Small human group detection and event representation based on cognitive semantics
Li et al. Trajectory-pooled spatial-temporal architecture of deep convolutional neural networks for video event detection
CN116403286A (en) Social grouping method for large-scene video
Ma et al. Video event classification and image segmentation based on noncausal multidimensional hidden markov models
Choi et al. A view-based real-time human action recognition system as an interface for human computer interaction

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20161214