CN106683111A

CN106683111A - Human motion video segmentation method based on temporal clustering

Info

Publication number: CN106683111A
Application number: CN201611040136.0A
Authority: CN
Inventors: 钱诚; 徐则中; 陈丽燕; 徐强
Original assignee: Changzhou Institute of Technology
Current assignee: Changzhou Institute of Technology
Priority date: 2016-11-24
Filing date: 2016-11-24
Publication date: 2017-05-17
Anticipated expiration: 2036-11-24
Also published as: CN106683111B

Abstract

The invention discloses a human motion video segmentation method based on temporal clustering. The method specifically comprises the following steps: extracting the features of video frames, calculating a distance transform diagram, and carrying out treatment such as k-means clustering to get class tag vectors output as the features of the video frames; modeling the relationship between the features of the video frames: building a correlation matrix M for the feature matrix of the video frames to describe the relationship between the features of the video frames; and after getting the correlation matrix M, performing a graph cut algorithm on the correlation matrix M to get a clustering result of the features of the video frames, wherein the clustering result is used as the segmentation result of the video frames, and each class in the clustering result represents a video segment containing an independent action. The fusion problem concerning similarity and temporality of the features of human motion in video frames in human motion video segmentation is solved. The segmentation accuracy is improved. As no iterative calculation is needed in calculation of the relationship between video frames, the calculation efficiency is improved.

Description

The human motion methods of video segmentation of timing cluster

Technical field

The invention belongs to image/video process field, more particularly to a kind of human motion Video segmentation side of timing cluster Method.

Background technology

《Temporal Subspace Clustering for Human Motion Segmentation》This text In offering, author proposes the two-value template that human body image is extracted in each frame video, and by range conversion range conversion figure is formed, Preliminary cluster result is formed by K mean cluster, using the cluster result of bi-level fashion as frame of video feature.Subsequently, author A Laplce canonical related to encoder matrix is with the addition of on the basis of least square returns Subspace clustering method about Beam, for forming the model of frame of video characteristic time relation, by alternating direction multiplier method dictionary and encoder matrix is solved.Most Afterwards continuous videos are cut into the video-frequency band comprising self contained function by the figure blanking method on encoder matrix.Experiment shows that the method exists There is preferable performance in the accuracy rate and normalized mutual information index of the cutting of human motion video-frequency band.

But the algorithm due to the process employs this kind of iterative of alternating direction multiplier method, compares on time consumption Greatly, thus video-frequency band cutting speed it is slow.In addition the method needs the time Laplce's item by encoder matrix to retouch Frame of video relation in time is stated, for the dependency on video frame time describes complex.

The content of the invention

For above-mentioned technical problem, the present invention proposes a kind of human motion methods of video segmentation of timing cluster, more The relation between frame of video is comprehensively described, and improves computational efficiency.

To reach above-mentioned purpose, the technical solution used in the present invention is：

A kind of human motion methods of video segmentation of timing cluster, specifically includes following components：The spy of frame of video Extraction is levied, the modeling of relation between frame of video feature, the solution of correlation matrix, the figure on correlation matrix is cut.

Frame of video feature extraction：Input t frame videos, to each frame of video background reducing is carried out, and extracts human body image With background image, binary map is formed, wherein human body image region represents that background image region is represented with black image with white； To frame of video computed range Transformation Graphs, range conversion figure is expanded into into a column vector by row：To on t frame videos it is acquired away from The column vector launched from Transformation Graphs carries out K mean cluster, the class label vector of the bi-level fashion of each frame video is obtained, by category The feature that vector is signed as these frame of video is exported.The class quantity value of K mean cluster is as follows：Work as t<When=50, class quantity is just Value t, works as t>50, the general value of quantity of class is 50.

The modeling of relation between frame of video feature：Characteristic set { the x of input video frame₁,x₂,…,x_t, wherein x_iRepresent The class label vector of resulting bi-level fashion in i-th frame frame of video, namely the feature of the frame video, these features constitute one Eigenmatrix X=x₁,x₂,…,x_t].In order to describe the relation between frame of video feature, build a correlation matrix M and merge phase Like property tolerance and feature sequential propinquity.Matrix M can carry out minimum and obtain by following function pair M：

Meet M >=0 (1) simultaneously

Tr (AM) is the mark of calculating matrix AM, and λ is the regular parameter that is positive number.By carrying out derivation to formula (1), can To obtain the constraint equation with regard to M, and solve with this to obtain correlation matrix M

A then serves as a weight matrix related to sequential.The line number of its matrix, columns and matrix X^TX is consistent, and in addition 0 Represent line number, columns and matrix X^TNull matrix of the X consistent element all for 0；

Max computings represent that the value of each element in M takesMaximum between corresponding element value and 0 value.

Figure on correlation matrix is cut：After correlation matrix M is obtained, figure is performed to it and cuts algorithm, it is possible to obtain video The cluster result of frame feature, and in this, as the segmentation result of frame of video, namely each class in cluster result includes one The all of frame of video of individual self contained function.

Wherein, on the one hand correlation matrix M contains the similarity measurement between frame of video feature, on the other hand measures Propinquity of the frame of video feature in sequential, to describe relation of the frame of video feature on similarity and sequential propinquity.

A serves as a weight matrix related to sequential.The value of A each element is as represented by following formula：

In above formula, line index, column index in i, k difference representing matrix.ε values are 10^-6.τ is time window length, and τ takes It is worth for 5~17.In above formula, the setting of weights A allows adjacent element in sequential to give more when similarity is calculated Big weights.

The invention has the advantages that：The present invention solves human motion in frame of video in human motion Video segmentation Characteristic similarity and the fusion problem of timing, improve segmentation precision, while when frame of video relation is calculated due to not needing Calculating is iterated, therefore improves computational efficiency.

Description of the drawings

Fig. 1 is the human motion methods of video segmentation flow chart of the embodiment of the present invention.

Specific embodiment

For the ease of the understanding of those skilled in the art, the present invention is made further with reference to embodiment and accompanying drawing It is bright.

Human motion video-frequency band cutting method depends on the dependency description of frame of video, and great majority are cut based on cluster Segmentation method only considers frame of video similarity characteristically when structure video frame correlation is described, often, and seldom considers to regard Frequency frame dependency in time.The present embodiment is also added into frame of video while similarity measurement between frame of video is retained The tolerance of propinquity in sequential, therefore, it is possible to the relation being described more fully hereinafter between frame of video.In addition, lifting video cutting Speed be also crucial.

Human motion methods of video segmentation flow process as shown in Figure 1：One section of video comprising human motion of input, will be each Frame color video frequency image does subtraction with corresponding static background image, completes background subtraction and processes operation, obtains human body image area Domain, and be white by the human body image region labeling for being extracted, background area is demarcated as black, obtains a bianry image, right Bianry image carries out range conversion and obtains range conversion figure.By the range conversion figure of all frame of video by row launch, formed row to Duration set, obtains on its basis the class label vector of bi-level fashion using K mean cluster algorithm, and it is represented with 1 in the vector Affiliated class, 0 represents that it is not belonging to corresponding class.

Class label vector [0,0,1,0]^TRepresent, corresponding feature can be classified as the 3rd class, and be not belonging to other classes.

Using class label vector as the feature of frame of video, correlation matrix M is carried out in the feature base of these frame of video Solution.According to formula 2：

The solution with regard to correlation matrix M is obtained, in above formula, the value of A each element is as represented by following formula：

In above formula, line index, column index in i, k difference representing matrix.ε values are 10^-6.In the setting of weight matrix τ, In the present embodiment its value is τ=9.

After correlation matrix M is obtained, figure is performed to it and cuts algorithm, so as to obtain the cluster result of individual features, enter one Corresponding video slicing is the video-frequency band comprising human body self contained function according to cluster result by step.

Embodiment above technological thought only to illustrate the invention, it is impossible to which protection scope of the present invention is limited with this, it is all It is any change done on the basis of technical scheme according to technological thought proposed by the present invention, each falls within present invention protection model Within enclosing.

Claims

1. the human motion methods of video segmentation that timing is clustered, it is characterised in that comprise the steps：The feature of frame of video is carried Take, relationship modeling between frame of video feature, the solution of correlation matrix, the figure on correlation matrix is cut.

2. human motion methods of video segmentation according to claim 1, it is characterised in that：

Frame of video feature extraction：Input t frame videos, to each frame of video background reducing is carried out, and extracts human body image with the back of the body Scape image, forms binary map；To frame of video computed range Transformation Graphs, range conversion figure is expanded into into a column vector by row；To t The column vector that range conversion figure acquired on frame video launches carries out K mean cluster, obtains the bi-level fashion of each frame video Class label vector, using class label vector as these frame of video feature export；

Relationship modeling between frame of video feature：Characteristic set { the x of input video frame₁,x₂,…,x_t, wherein x_iRepresent that the i-th frame is regarded The feature of frequency frame, these features constitute an eigenmatrix X=[x₁,x₂,…,x_t]；

The solution of correlation matrix：Build a correlation matrix M fusion similarity measurement and feature sequential propinquity；

Correlation matrix M

A is a weight matrix related to sequential, the line number of its matrix, columns and matrix X^TX is consistent, in addition 0 represent line number, Columns and matrix X^TNull matrix of the X consistent element all for 0；

Max computings represent that the value of each element in M takesMaximum between corresponding element value and 0 value；

Figure on correlation matrix is cut：After correlation matrix M is obtained, figure is performed to it and cuts algorithm, obtain frame of video feature Cluster result, and in this, as the segmentation result of frame of video, namely each class in cluster result is moved comprising an independence Make all of frame of video.

3. human motion methods of video segmentation according to claim 2, it is characterised in that：Represented with 1 in class label vector It belongs to such, and 0 represents that it is not belonging to corresponding class.

4. human motion methods of video segmentation according to claim 2, it is characterised in that：Correlation matrix M includes video Similarity measurement between frame feature, and measure propinquity of the frame of video feature in sequential.

5. the human motion methods of video segmentation according to one of claim 2 to 4, it is characterised in that：The value of element A is such as Represented by following formula：

In above formula, line index, column index in i, k difference representing matrix, ε values are 10^-6, τ is time window length；Weights A's Setting allows adjacent element in sequential to give bigger weights when similarity is calculated.

6. human motion methods of video segmentation according to claim 5, it is characterised in that：τ values are 5~17.