CN103945227A - Video semantic block partition method based on optical flow clustering - Google Patents

Video semantic block partition method based on optical flow clustering

Info

Publication number
CN103945227A
CN103945227A (application CN201410153245.8A)
Authority
CN
China
Prior art keywords
coherent
motion region
optical flow
cluster
pixel
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201410153245.8A
Other languages
Chinese (zh)
Other versions
CN103945227B (en)
Inventor
林巍峣
王薇月
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Jiaotong University
Original Assignee
Shanghai Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Jiaotong University filed Critical Shanghai Jiaotong University
Priority to CN201410153245.8A priority Critical patent/CN103945227B/en
Publication of CN103945227A publication Critical patent/CN103945227A/en
Application granted granted Critical
Publication of CN103945227B publication Critical patent/CN103945227B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Image Analysis (AREA)

Abstract

The invention provides a video semantic block partition method based on optical flow clustering. Detection results for coherent motion regions that share a motion pattern are combined with the optical flow information to build a correlation matrix, spectral clustering is applied to that matrix, and the resulting cluster labels are used as feature vectors for a second clustering across the different motion patterns. The method makes full use of the coherent-motion-region detection results and effectively captures both the correlation of scene motion within a single motion pattern and the global structure of scene motion across different motion patterns. Capturing scene motion information accurately in this way plays an important role in improving the accuracy of related applications such as pattern recognition.

Description

Video semantic block partition method based on optical flow clustering
Technical field
The present invention relates to video semantic block partition techniques, and in particular to a technique built on optical flow clustering: the scene is first segmented by clustering the optical flow of the video, and the segmentation result is then used to derive blocks whose motion patterns are easier to express.
Background technology
A semantic block in a video is a region with a consistent motion pattern. Video semantic block partition is important in the image processing field and is widely used in areas such as traffic monitoring and dense-crowd surveillance.
Coherent motion region detection is an important technique in video and image processing, and many detection methods exist. A literature search of the prior art finds, for example, that the paper "A Lagrangian particle dynamics approach for crowd flow segmentation and stability analysis" by S. Ali et al. (Conference on Computer Vision and Pattern Recognition, 2007) segments coherent motion regions in high-density crowds. Such methods, however, mainly segment a single stable motion pattern, and fall far short when the different motion patterns of the same scene must be captured. A traffic video, for instance, contains left-turn, right-turn and straight-ahead patterns; each pattern has coherent motion regions of different vehicles, many of these regions overlap one another, and a region partition derived for the left-turn pattern does not suit the right-turn or straight-ahead pattern. Partitioning semantic blocks yields a more reasonable scene subdivision that fully describes the different motion patterns of the same scene, which is important for tasks such as motion-pattern description and recognition.
In addition, the paper "Multi-camera activity correlation analysis" by C. C. Loy et al. (Conference on Computer Vision and Pattern Recognition, 2007) proposed a method for automatically segmenting semantic blocks. That method finds semantic blocks mainly through the correlation of foreground motion; its main problem is that it considers only the motion-pattern correlation of consecutive frames and neighboring pixels, and has no way to capture the global characteristics of the scene.
For these reasons, a method is needed that can capture the motion patterns present at different times while also grasping the global motion relations among pixels, and use both to partition semantic blocks.
Summary of the invention
In view of the defects of the prior art described above, the object of the present invention is to provide a video semantic block partition method based on optical flow clustering.
The video semantic block partition method based on optical flow clustering provided by the invention comprises the following steps:
Step 1: input the optical flow fields of a video containing different motion patterns, wherein the video has T frames, each frame contains M × N pixels, M and N being the length and width of each frame image; there are T-1 optical flow fields in total, each consisting of two two-dimensional matrices of size M × N, one for the x direction and one for the y direction;
Step 2: cluster each of the T-1 optical flow fields to obtain U coherent motion regions;
Step 3: group the U coherent motion regions into S coherent-motion-region groups (S < U), and mark each coherent motion region with the label of its group as its region category label (Fig. 2 illustrates the region category labels in different optical flow fields);
Step 4: use the region category labels obtained in step 3 to construct a feature vector for each of the M × N pixels, obtaining M × N feature vectors C_1, C_2, …, C_{M×N} in total; the feature vector of the m-th pixel is obtained by concatenating the labels of the coherent-motion-region groups that the m-th pixel belongs to in each optical flow field;
Step 5: perform K-means clustering on the M × N points according to the feature vectors C_1, C_2, …, C_{M×N} to obtain the partitioned semantic blocks.
Preferably, step 3 comprises the following steps:
Step a): given the U coherent motion regions in the T-1 optical flow fields, assign each coherent motion region a sequence number from 1 to U, and create a U × 2 matrix F;
Step b): for each coherent motion region, compute its average optical flow and store it at the corresponding position in F. For example, for the coherent motion region with label h (1 ≤ h ≤ U), average the x-direction flow and the y-direction flow over all points of region h in its optical flow field, and assign the two mean values to F(h, 1) and F(h, 2) respectively, so that F(h, 1) and F(h, 2) are the x and y components of the average optical flow of the h-th coherent motion region;
Step c): create a two-dimensional matrix W of size U × U, where W(i, j) = F(i, 1)·F(j, 1) + F(i, 2)·F(j, 2) is the motion correlation between the i-th and j-th coherent motion regions; F(i, 1) and F(i, 2) are the x and y components of the average optical flow of the i-th coherent motion region, F(j, 1) and F(j, 2) are those of the j-th coherent motion region, and · denotes multiplication;
Step d): apply spectral clustering to the matrix W to form S clusters, and mark each coherent motion region with the label of its corresponding cluster, so that the U coherent motion regions are marked with S labels.
Preferably, the feature vector of the m-th pixel in step 4 is obtained as follows:
Let h_{m,t} (1 ≤ h_{m,t} ≤ S, 1 ≤ t ≤ T-1) be the category label of the coherent motion region that the m-th pixel belongs to in the t-th optical flow field; assign h_{m,t} to C_m(t), i.e. C_m(t) = h_{m,t}, where C_m is the feature vector of the m-th pixel and C_m(t) is its t-th element.
Compared with the prior art, the present invention has the following beneficial effects:
The invention discloses a video semantic block partition method based on coherent motion region detection: the detection results of coherent motion regions under the same motion pattern are combined with the optical flow information to build a correlation matrix, spectral clustering is applied to that matrix, the resulting cluster labels are used as feature vectors, and a second clustering is performed over the different motion patterns.
The invention makes good use of the coherent-motion-region detection results and effectively captures both the correlation of scene motion within one motion pattern and the global structure of scene motion across different motion patterns. By capturing scene motion information accurately, it plays an important role in improving the accuracy of related applications such as pattern recognition.
Brief description of the drawings
Other features, objects and advantages of the present invention will become more apparent from the following detailed description of non-limiting embodiments, read with reference to the accompanying drawings:
Fig. 1 shows the coherent motion regions of one optical flow field of the input video;
Fig. 2 is a schematic diagram of marking coherent regions with coherent-region groups, and of representing a pixel's feature vector with the labels of those groups;
Fig. 3 is a schematic diagram of the finally partitioned scene semantic blocks.
Embodiment
The present invention is described in detail below in conjunction with a specific embodiment. The following embodiment will help those skilled in the art to further understand the present invention, but does not limit it in any form. It should be noted that those skilled in the art can make variations and improvements without departing from the concept of the invention; these all belong to the scope of protection of the present invention.
The present embodiment is implemented on the premise of the technical solution of the present invention, and a detailed implementation and concrete operating procedure are given below, but the scope of protection of the present invention is not limited to the following embodiment.
The present embodiment comprises the following steps:
Step 1: obtain the optical flow fields of the input video (T frames in total, each frame with M × N pixels, M and N being the length and width of each frame image): T-1 fields in total, each consisting of two two-dimensional matrices of size M × N, one for the x direction and one for the y direction; and obtain the coherent motion regions (T-1 segmentation results in total, each containing several coherent regions).
The optical flow fields can be obtained with the method proposed in the paper "High accuracy optical flow estimation based on a theory for warping" by T. Brox et al. (European Conference on Computer Vision, 2004).
Step 2: cluster the optical flow of each of the T-1 fields, obtaining U coherent motion regions in total (U > T-1).
The clustering can be performed with the method proposed in "A K-Means Clustering Algorithm" by J. Hartigan et al. (Journal of the Royal Statistical Society, 1979).
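The per-field clustering of step 2 can be sketched in a few lines: each pixel of one flow field contributes a 2-D vector (its x- and y-flow components), and clustering those vectors yields the field's coherent motion regions. The sketch below uses a plain Lloyd's-iteration K-means with farthest-point seeding as a stand-in for the cited Hartigan algorithm; the function names and the choice of k are illustrative assumptions, not taken from the patent.

```python
import numpy as np

def kmeans(X, k, iters=50, seed=0):
    """Minimal Lloyd's K-means sketch; farthest-point seeding keeps
    the initial centers distinct (stand-in for the cited method)."""
    rng = np.random.default_rng(seed)
    centers = np.empty((k, X.shape[1]))
    centers[0] = X[rng.integers(len(X))]
    for j in range(1, k):  # seed each new center far from chosen ones
        d = ((X[:, None] - centers[None, :j]) ** 2).sum(-1).min(1)
        centers[j] = X[d.argmax()]
    for _ in range(iters):  # Lloyd iterations: assign, then re-center
        labels = ((X[:, None] - centers[None]) ** 2).sum(-1).argmin(1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(0)
    return labels, centers

def coherent_regions(flow, k):
    """Cluster the per-pixel flow vectors of one M x N x 2 field into
    k coherent motion regions; returns an M x N label map."""
    M, N, _ = flow.shape
    labels, _ = kmeans(flow.reshape(-1, 2).astype(float), k)
    return labels.reshape(M, N)
```

On a synthetic field whose left half moves right and whose right half moves up, the two halves come back as two distinct region labels.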
Step 3: group the U coherent motion regions obtained in step 2 into S groups of coherent motion regions (S < U), and mark each coherent motion region with the label of its group. The concrete steps are as follows:
a) Given the U coherent motion regions in the T-1 optical flow fields, assign each coherent motion region a sequence number from 1 to U, and create a U × 2 matrix F.
b) For each coherent region, compute its average optical flow and store it at the corresponding position in F. For example, for the coherent region with label h (1 ≤ h ≤ U), average the x-direction flow and the y-direction flow over all points of region h in its optical flow field, and assign the two mean values to F(h, 1) and F(h, 2) respectively.
c) Create a two-dimensional matrix W of size U × U, where W(i, j) = F(i, 1)·F(j, 1) + F(i, 2)·F(j, 2).
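Written out, the entry defined in step c) is the inner product of the two regions' mean-flow vectors, so it acts as an unnormalized directional-similarity score:

```latex
W(i,j) = F(i,1)\,F(j,1) + F(i,2)\,F(j,2)
       = \left\langle \mathbf{F}_i,\ \mathbf{F}_j \right\rangle
       = \lVert \mathbf{F}_i \rVert \, \lVert \mathbf{F}_j \rVert \cos\theta_{ij},
```

where F_i = (F(i,1), F(i,2)) and θ_ij is the angle between the two mean flows: regions moving in similar directions at comparable speeds receive large affinities, orthogonal motions score zero, and opposite motions score negative.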
d) Apply spectral clustering to the matrix W to form S clusters, and mark each coherent motion region with the label of its corresponding cluster, so that the U coherent motion regions are marked with S labels.
The spectral clustering of step 3 d) can be realized with "Self-tuning spectral clustering" by L. Zelnik-Manor et al. (Neural Information Processing Systems, 2004).
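Steps a)-d) can be sketched end to end: F holds the mean flow of each region, W = F·Fᵀ is the inner-product affinity of step c), and a basic normalized spectral clustering (eigenvectors of D^{-1/2} W D^{-1/2}, then K-means on the normalized rows) groups the U regions into S motion patterns. This is a plain normalized-spectral-clustering sketch rather than the cited self-tuning variant; the clipping of negative affinities and the small K-means helper are illustrative assumptions.

```python
import numpy as np

def kmeans(X, k, iters=50, seed=0):
    """Minimal Lloyd's K-means with farthest-point seeding."""
    rng = np.random.default_rng(seed)
    centers = np.empty((k, X.shape[1]))
    centers[0] = X[rng.integers(len(X))]
    for j in range(1, k):
        d = ((X[:, None] - centers[None, :j]) ** 2).sum(-1).min(1)
        centers[j] = X[d.argmax()]
    for _ in range(iters):
        labels = ((X[:, None] - centers[None]) ** 2).sum(-1).argmin(1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(0)
    return labels, centers

def group_regions(F, S):
    """Steps c)-d): build W(i,j) = <F_i, F_j> from the U x 2 mean-flow
    matrix F (step b) and assign each region one of S group labels via
    normalized spectral clustering of W."""
    F = np.asarray(F, float)
    W = np.maximum(F @ F.T, 0.0)              # affinity; clip negatives
    d = W.sum(1)
    d[d == 0] = 1.0
    Dn = np.diag(1.0 / np.sqrt(d))
    vals, vecs = np.linalg.eigh(Dn @ W @ Dn)  # eigenvalues ascending
    V = vecs[:, -S:]                          # top-S eigenvector embedding
    V = V / (np.linalg.norm(V, axis=1, keepdims=True) + 1e-12)
    labels, _ = kmeans(V, S)                  # cluster the embedded rows
    return labels
```

For four regions whose mean flows are two rightward vectors and two upward vectors, the two motion directions come out as two groups.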
Step 4: use the region category labels obtained in step 3 to construct a feature vector for each of the M × N pixels, obtaining M × N feature vectors C_1, C_2, …, C_{M×N} in total, which represent the motion characteristics of each pixel over the whole video.
The feature vector of the m-th pixel is obtained by concatenating the labels of the coherent-region groups that the pixel belongs to in each optical flow field. Let h_{m,t} (1 ≤ h_{m,t} ≤ S) be the label of the coherent region that the m-th pixel belongs to in the t-th optical flow field (1 ≤ t ≤ T-1); assign the region category label L(h_{m,t}) to C_m(t), i.e. C_m = [C_m(t)] = [L(h_{m,t})].
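Step 4 is then a pure relabel-and-stack operation; the sketch below assumes the per-field region label maps and the region-to-group mapping from step 3 are given as arrays (the names are illustrative):

```python
import numpy as np

def pixel_features(region_maps, group_of_region):
    """Step 4: one feature vector C_m per pixel.

    region_maps     : list of T-1 (M x N) integer maps; each entry is the
                      index of the coherent region the pixel lies in for
                      that flow field.
    group_of_region : length-U array mapping region index -> group label
                      h from step 3.
    Returns an (M*N) x (T-1) matrix whose row m is C_m, with
    C_m(t) = h_{m,t}, the group label of pixel m's region in field t.
    """
    maps = np.stack(region_maps)                # (T-1, M, N)
    groups = np.asarray(group_of_region)[maps]  # relabel regions -> groups
    T1, M, N = groups.shape
    return groups.reshape(T1, M * N).T          # row m = feature vector C_m
```

Pixel m's row simply lists, field by field, which motion-pattern group that pixel moved with.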
Step 5: cluster the M × N points according to the feature vectors C_1, C_2, …, C_{M×N}, using a method such as K-means, to obtain the partitioned semantic blocks.
The clustering can be performed with the method proposed in "A K-Means Clustering Algorithm" by J. Hartigan et al. (Journal of the Royal Statistical Society, 1979).
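Given the per-pixel feature matrix of step 4, step 5 reduces to one more clustering pass and a reshape back to image coordinates. The sketch below reuses a plain Lloyd's K-means stand-in for the cited Hartigan algorithm; the number of blocks K is an illustrative parameter, not fixed by the patent.

```python
import numpy as np

def kmeans(X, k, iters=50, seed=0):
    """Minimal Lloyd's K-means with farthest-point seeding (stand-in
    for the cited Hartigan algorithm)."""
    rng = np.random.default_rng(seed)
    centers = np.empty((k, X.shape[1]))
    centers[0] = X[rng.integers(len(X))]
    for j in range(1, k):
        d = ((X[:, None] - centers[None, :j]) ** 2).sum(-1).min(1)
        centers[j] = X[d.argmax()]
    for _ in range(iters):
        labels = ((X[:, None] - centers[None]) ** 2).sum(-1).argmin(1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(0)
    return labels, centers

def semantic_blocks(C, M, N, K):
    """Step 5: K-means over the (M*N) x (T-1) feature matrix C, then
    reshape the pixel labels into an M x N semantic-block map."""
    labels, _ = kmeans(np.asarray(C, float), K)
    return labels.reshape(M, N)
```

Pixels whose motion-label histories agree across the flow fields land in the same semantic block.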
The specific embodiment of the present invention has been described above. It should be understood that the present invention is not limited to the above embodiment; those skilled in the art can make various variations or modifications within the scope of the claims, and this does not affect the substance of the present invention.

Claims (3)

1. A video semantic block partition method based on optical flow clustering, characterized by comprising the following steps:
Step 1: input the optical flow fields of a video containing different motion patterns, wherein the video has T frames, each frame contains M × N pixels, M and N being the length and width of each frame image; there are T-1 optical flow fields in total, each consisting of two two-dimensional matrices of size M × N, one for the x direction and one for the y direction;
Step 2: cluster each of the T-1 optical flow fields to obtain U coherent motion regions, U > T-1;
Step 3: group the U coherent motion regions into S coherent-motion-region groups, S < U, and mark each coherent motion region with the label of its group as its region category label;
Step 4: use the region category labels obtained in step 3 to construct a feature vector for each of the M × N pixels, obtaining M × N feature vectors C_1, C_2, …, C_{M×N} in total; the feature vector of the m-th pixel is obtained by concatenating the labels of the coherent-motion-region groups that the m-th pixel belongs to in each optical flow field;
Step 5: perform K-means clustering on the M × N points according to the feature vectors C_1, C_2, …, C_{M×N} to obtain the partitioned semantic blocks.
2. The video semantic block partition method based on optical flow clustering according to claim 1, characterized in that step 3 comprises the following steps:
Step a): given the U coherent motion regions in the T-1 optical flow fields, assign each coherent motion region a sequence number from 1 to U, and create a U × 2 matrix F;
Step b): for each coherent motion region, compute its average optical flow and store it at the corresponding position in F;
Step c): create a two-dimensional matrix W of size U × U, where W(i, j) = F(i, 1)·F(j, 1) + F(i, 2)·F(j, 2) is the motion correlation between the i-th and j-th coherent motion regions; F(i, 1) and F(i, 2) are the x and y components of the average optical flow of the i-th coherent motion region, F(j, 1) and F(j, 2) are those of the j-th coherent motion region, and · denotes multiplication;
Step d): apply spectral clustering to the matrix W to form S clusters, and mark each coherent motion region with the label of its corresponding cluster, so that the U coherent motion regions are marked with S labels.
3. The video semantic block partition method based on optical flow clustering according to claim 1, characterized in that the feature vector of the m-th pixel in step 4 is obtained as follows:
Let h_{m,t} (1 ≤ h_{m,t} ≤ S, 1 ≤ t ≤ T-1) be the category label of the coherent motion region that the m-th pixel belongs to in the t-th optical flow field; assign the region category label h_{m,t} to C_m(t), i.e. C_m(t) = h_{m,t}, where C_m is the feature vector of the m-th pixel and C_m(t) is its t-th element.
CN201410153245.8A 2014-04-16 2014-04-16 Video semantic block partition method based on optical flow clustering Active CN103945227B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410153245.8A CN103945227B (en) 2014-04-16 2014-04-16 Video semantic block partition method based on optical flow clustering

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410153245.8A CN103945227B (en) 2014-04-16 2014-04-16 Video semantic block partition method based on optical flow clustering

Publications (2)

Publication Number Publication Date
CN103945227A true CN103945227A (en) 2014-07-23
CN103945227B CN103945227B (en) 2017-02-01

Family

ID=51192671

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410153245.8A Active CN103945227B (en) 2014-04-16 2014-04-16 Video semantic block partition method based on light stream clustering

Country Status (1)

Country Link
CN (1) CN103945227B (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105574876A (en) * 2015-12-18 2016-05-11 电子科技大学 Natural image segmentation method based on label meaning inference
CN109753913A (en) * 2018-12-28 2019-05-14 东南大学 Calculate efficient multi-mode video semantic segmentation method
CN111426284A (en) * 2020-04-10 2020-07-17 山东师范大学 Brox optical flow estimation-based surface shape measurement error correction method and system
CN112804578A (en) * 2021-01-28 2021-05-14 广州虎牙科技有限公司 Atmosphere special effect generation method and device, electronic equipment and storage medium
CN113014831A (en) * 2021-03-05 2021-06-22 上海明略人工智能(集团)有限公司 Method, device and equipment for acquiring scenes of sports video

Citations (2)

Publication number Priority date Publication date Assignee Title
CN1738426A (en) * 2005-09-09 2006-02-22 南京大学 Video motion goal division and track method
JP2013016930A (en) * 2011-06-30 2013-01-24 Jvc Kenwood Corp Image encoding apparatus, image encoding method and image encoding program

Patent Citations (2)

Publication number Priority date Publication date Assignee Title
CN1738426A (en) * 2005-09-09 2006-02-22 南京大学 Video motion goal division and track method
JP2013016930A (en) * 2011-06-30 2013-01-24 Jvc Kenwood Corp Image encoding apparatus, image encoding method and image encoding program

Non-Patent Citations (2)

Title
张颖 et al., "Fast spatio-temporal segmentation based on quadratic spatial transformation", Journal of Image and Graphics (《中国图象图形学报》) *
方宇强 et al., "Research on moving-object segmentation methods based on optical flow and level sets", Proceedings of the 2009 Chinese Conference on Intelligent Automation, Volume 3 *

Cited By (9)

Publication number Priority date Publication date Assignee Title
CN105574876A (en) * 2015-12-18 2016-05-11 电子科技大学 Natural image segmentation method based on label meaning inference
CN105574876B (en) * 2015-12-18 2018-02-09 电子科技大学 Natural image segmentation method based on label semantic reasoning
CN109753913A (en) * 2018-12-28 2019-05-14 东南大学 Calculate efficient multi-mode video semantic segmentation method
CN109753913B (en) * 2018-12-28 2023-05-23 东南大学 Multi-mode video semantic segmentation method with high calculation efficiency
CN111426284A (en) * 2020-04-10 2020-07-17 山东师范大学 Brox optical flow estimation-based surface shape measurement error correction method and system
CN111426284B (en) * 2020-04-10 2021-10-19 山东师范大学 Brox optical flow estimation-based surface shape measurement error correction method and system
CN112804578A (en) * 2021-01-28 2021-05-14 广州虎牙科技有限公司 Atmosphere special effect generation method and device, electronic equipment and storage medium
CN113014831A (en) * 2021-03-05 2021-06-22 上海明略人工智能(集团)有限公司 Method, device and equipment for acquiring scenes of sports video
CN113014831B (en) * 2021-03-05 2024-03-12 上海明略人工智能(集团)有限公司 Method, device and equipment for scene acquisition of sports video

Also Published As

Publication number Publication date
CN103945227B (en) 2017-02-01

Similar Documents

Publication Publication Date Title
Wei et al. Multi-vehicle detection algorithm through combining Harr and HOG features
CN111104903B (en) Depth perception traffic scene multi-target detection method and system
CN111967429B (en) Pedestrian re-recognition model training method and device based on active learning
KR102190527B1 (en) Apparatus and method for automatic synthesizing images
CN103945227A Video semantic block partition method based on optical flow clustering
CN103413444A (en) Traffic flow surveying and handling method based on unmanned aerial vehicle high-definition video
Lee et al. Real-time illegal parking detection in outdoor environments using 1-D transformation
CN102915544A (en) Video image motion target extracting method based on pattern detection and color segmentation
Gao et al. Synergizing appearance and motion with low rank representation for vehicle counting and traffic flow analysis
CN112464933B (en) Intelligent identification method for weak and small target through foundation staring infrared imaging
Rabiu Vehicle detection and classification for cluttered urban intersection
Cao et al. KLT feature based vehicle detection and tracking in airborne videos
Naik et al. Implementation of YOLOv4 algorithm for multiple object detection in image and video dataset using deep learning and artificial intelligence for urban traffic video surveillance application
Roy et al. A comprehensive survey on computer vision based approaches for moving object detection
Schörkhuber et al. Feature selection and multi-task learning for pedestrian crossing prediction
Kankaria et al. Alert system for drivers based on traffic signs, lights and pedestrian detection
CN109389048B (en) Pedestrian detection and tracking method in surveillance video
Zin et al. Robust road sign recognition using standard deviation
CN109509205B (en) Foreground detection method and device
Chanawangsa et al. A new color-based lane detection via Gaussian radial basis function networks
Ray et al. Towards Real-time Detection, Tracking and Classification of natural video
Tsai et al. Four Categories Vehicle Detection in Hsuehshan Tunnel via Single Shot Multibox Detector
Fadili et al. Fastest Moroccan License Plate Recognition Using a Lightweight Modified YOLOv5 Model
Nagaraju et al. Foreground Moving Object Detection using Support Vector Machine (SVM)
Zhong et al. QuickFind: Fast and contact-free object detection using a depth sensor

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant