CN105654054B - Intelligent video analysis method based on semi-supervised affinity propagation learning and multiple visual dictionary models - Google Patents
- Publication number: CN105654054B
- Application number: CN201511022492.5A
- Authority
- CN
- China
- Prior art keywords
- video
- key
- frame
- feature vector
- visual
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/46—Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
- G06V20/41—Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
- G06V20/50—Context or environment of the image
- G06V20/52—Surveillance or monitoring of activities, e.g. for recognising suspicious objects
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Theoretical Computer Science (AREA)
- Computational Linguistics (AREA)
- Software Systems (AREA)
- Image Analysis (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The present invention relates to an intelligent video analysis method based on semi-supervised affinity propagation learning and multiple visual dictionary models, comprising: extracting key video frames with a sampling-preserving strategy; converting each key frame to a grayscale image, evenly partitioning it into several image blocks, and generating an ordinal measure (OM) feature vector from the rank order of the blocks' average brightness; clustering the video samples with affinity propagation (AP) to form mutually exclusive video sub-clusters; determining the class label of each sub-cluster and constructing multiple visual dictionaries; judging the class label of a query video online by a minimum-distance decision rule; and, via adaptive reconstruction learning with closed-loop feedback, rebuilding the dictionaries to fit a changed environment and re-judging the query video's class label. The invention resolves the trade-off between efficiency and accuracy in key-frame extraction, makes the video features strongly robust to approximately global linear changes, and achieves accurate intelligent detection that adapts to changes in video patterns.
Description
Technical field
The present invention relates to the field of computer graphics and image processing, and in particular to an intelligent video analysis method based on semi-supervised affinity propagation learning and multiple visual dictionary models.
Background technique
With the spread of terminal devices such as smart phones and digital cameras and of the Internet, producing, storing and transmitting video has become ever more convenient. More and more people watch video over the network and upload videos of their own lives to share. Video services are no longer limited to traditional industries such as broadcast television and entertainment film, but are widely used in politics, the military, science and education, medicine, transportation, security and many other fields. Facing the explosive growth in the number of videos, retrieving, matching and classifying video information quickly and accurately is a huge challenge, and it is also a hot topic of academic research.

Existing key technologies for intelligent analysis mainly comprise three kinds. Key-frame extraction: key frames are a group of images reflecting the main content of a video segment; extraction based on shot segmentation is accurate but computationally heavy and time-consuming, while extraction at a fixed rate is highly efficient but the extracted key-frame sequence may fail to describe the segment's main content accurately and lacks representativeness. Feature detection: one class of techniques uses simple global features, such as color histograms, to detect video quickly; such techniques are efficient but can only detect videos that undergo no global linear change. Another class uses high-dimensional local features to strengthen robustness, for example the scale-invariant feature transform (SIFT) and speeded-up robust features (SURF); these still detect well under some nonlinear changes such as viewpoint change, geometric change and picture cropping, and even under some background changes, but hundreds or thousands of local features are extracted from every frame, making the computation of feature matching huge, so detection efficiency cannot be guaranteed. Machine learning: video analysis generally uses machine-learning methods, but their accuracy still needs improvement, and classification algorithms are often affected by the following factors: most algorithms classify well on training sets with approximately ellipsoidal distributions but perform poorly on concave, complex distribution structures; practical data inevitably contain noise such as isolated points, unknown data or erroneous data, all of which degrade the classifier; and they cannot adapt to changes in the video environment or recognize newly emerging video types in real time.
Summary of the invention
Aiming at the shortcomings of the prior art, the present invention provides an intelligent video analysis method based on semi-supervised affinity propagation learning and multiple visual dictionary models. It resolves the trade-off between efficiency and accuracy in key-frame extraction, strengthens the robustness of the video features to approximately global linear changes, and achieves accurate intelligent detection that adapts to changes in video patterns.
According to the design scheme provided by the present invention, an intelligent video analysis method based on semi-supervised affinity propagation learning and multiple visual dictionary models comprises the following steps:

Step 1. For the video samples, extract key video frames using a sampling-preserving strategy.

Step 2. For each key video frame, compute the OM feature vector based on ordinal measures.

Step 3. Cluster all OM feature vectors with semi-supervised affinity propagation learning to form the video sub-clusters.

Step 4. Determine the class label of each video sub-cluster and construct the multiple visual dictionaries; the class labels include an unknown-type video label.

Step 5. For a query video, successively perform the sampling-preserving key-frame extraction of step 1 and the OM feature computation of step 2, and judge its class label by the minimum-distance rule against the multiple visual dictionaries of step 4.

Step 6. If the number of videos labelled unknown-type exceeds a set threshold, apply the adaptive reconstruction learning method with closed-loop feedback: return to step 3, rebuild the multiple visual dictionaries to fit the new environment, and further judge the class label of the query video; otherwise, terminate.
In the above, step 1 specifically comprises: for any arriving video frame, extract its summary information and match it against the key-feature library; if the match succeeds, the frame is judged a key video frame; otherwise, sample it at random with probability p, keeping it as a key video frame if drawn and discarding it if not.
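The match-then-sample rule of step 1 can be sketched as follows. This is a minimal illustration in Python; the frame summary (a plain hash), the key-feature library, and the function names are hypothetical stand-ins, since the patent does not specify how the summary information is computed.

```python
import random

def extract_key_frames(frames, key_feature_db, p=0.1, seed=0):
    """Sampling-preserving key-frame extraction (sketch).

    A frame whose summary matches the known key-feature library is kept
    directly; any other frame is kept with fixed probability p, so
    processing stays cheap while known key content is never missed."""
    rng = random.Random(seed)
    key_frames = []
    for frame in frames:
        summary = hash(frame)  # stand-in for a real summary/fingerprint
        if summary in key_feature_db:
            key_frames.append(frame)       # coarse match: keep
        elif rng.random() < p:
            key_frames.append(frame)       # random draw: keep
        # otherwise the frame is discarded
    return key_frames
```

With p = 0 the function degenerates to pure coarse matching; with an empty library it degenerates to fixed-rate random sampling, which are the two extremes the strategy balances.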
In the above, the OM feature vector based on ordinal measures of step 2 is computed as follows:

Step 2.1. Convert the key video frame to a grayscale image.

Step 2.2. Evenly partition the grayscale image into N image blocks, where N = Nx * Ny, Nx being the number of blocks along the X axis and Ny the number along the Y axis.

Step 2.3. Compute the average brightness Ik of each image block, i.e. Ik = (1/(m*n)) ΣxΣy f(x, y), where f(x, y) is the brightness of the pixel at coordinates (x, y), k ∈ [1, N], and m, n are the numbers of rows and columns of the block.

Step 2.4. Sort the blocks' average brightness values to generate the OM feature vector I = [I1, I2, ..., IN].
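Steps 2.1 to 2.4 amount to ranking the blocks' mean brightness. A short sketch (function and parameter names are illustrative) also shows why the rank vector is invariant under a global linear brightness change a*f + b with a > 0:

```python
import numpy as np

def om_feature(gray, nx=4, ny=4):
    """Ordinal measure (OM) feature: rank order of per-block mean brightness.

    gray: 2-D array of luminance values. The image is evenly split into
    nx * ny blocks; the returned vector gives each block's rank
    (0 = darkest), which is unchanged by global linear brightness shifts."""
    h, w = gray.shape
    bh, bw = h // ny, w // nx
    means = []
    for by in range(ny):
        for bx in range(nx):
            block = gray[by * bh:(by + 1) * bh, bx * bw:(bx + 1) * bw]
            means.append(block.mean())     # average brightness I_k
    # rank of each block's average brightness
    return np.argsort(np.argsort(means))
```

Because only the ordering of the block means is kept, adding noise, adjusting encoder parameters, or re-encoding, insofar as these act approximately as a monotone global transform of brightness, leaves the feature unchanged.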
In the above, step 3 specifically comprises:

Step 3.1. For the labelled video sample space Vl and the unlabelled video sample space Vnl, extract the OM feature vectors of all key video frames.

Step 3.2. For any two OM feature vectors Ii, Ij, judge whether both belong to Vl: if both belong to Vl and the labelled videos are of the same type, set the value Dij between them to the maximum 0; if both belong to Vl but the labelled videos are of different types, set Dij to the minimum -∞ (in affinity propagation these extreme values act as must-link and cannot-link constraints); if at least one of Ii, Ij belongs to Vnl, compute the Euclidean distance between Ii and Ij as Dij.

Step 3.3. Store the pairwise values Dij for all n sample points of the video sample space in the matrix E.

Step 3.4. Perform cluster analysis by the affinity propagation principle, forming K clusters C = {C1, ..., CK}.
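The semi-supervised pairwise matrix of steps 3.2 and 3.3 can be sketched as below. Treating the entries as affinity propagation similarities, and therefore negating the Euclidean distance for pairs involving an unlabelled sample, is an assumption on our part; the patent names only the Euclidean distance, but 0 as a maximum and -∞ as a minimum are only consistent with a similarity matrix.

```python
import numpy as np

def semi_supervised_similarity(X, labels):
    """Pairwise matrix E seeding affinity propagation (sketch).

    labels[i] is a class id for labelled samples and None for unlabelled
    ones. Same-label pairs get the maximum similarity 0 (must-link),
    differently labelled pairs -inf (cannot-link), and all other pairs
    the negated Euclidean distance between their OM vectors."""
    n = len(X)
    E = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            if labels[i] is not None and labels[j] is not None:
                E[i, j] = 0.0 if labels[i] == labels[j] else -np.inf
            else:
                E[i, j] = -np.linalg.norm(np.asarray(X[i], float)
                                          - np.asarray(X[j], float))
    return E
```

The resulting matrix could then be passed to an affinity propagation implementation (e.g. `sklearn.cluster.AffinityPropagation` with `affinity="precomputed"`) to obtain the sub-clusters of step 3.4.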
In the above, step 4 specifically comprises:

Step 4.1. For any video sub-cluster Ci, count the number Ni of labelled video frames in Ci that belong to each class label lj.

Step 4.2. By majority vote, assign to the sub-cluster Ci the class label l* containing the most samples.

Step 4.3. Compute the centroid wi of the video sub-cluster Ci as the visual codebook Wi of class label l*.

Step 4.4. All visual codebooks together constitute the multiple visual dictionaries.
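Steps 4.1 to 4.4 (majority vote plus centroid) can be sketched as follows. The cluster representation as (vector, label-or-None) pairs and the "unknown" fallback for clusters with no labelled member are illustrative assumptions.

```python
from collections import Counter
import numpy as np

def build_dictionaries(clusters):
    """Multiple-visual-dictionary construction (sketch).

    Each cluster is a list of (feature_vector, label_or_None) pairs.
    The cluster's label is the majority label among its labelled members
    (step 4.2) and its codebook is the centroid of all member feature
    vectors (step 4.3)."""
    dictionaries = []
    for members in clusters:
        labelled = [lbl for _, lbl in members if lbl is not None]
        label = Counter(labelled).most_common(1)[0][0] if labelled else "unknown"
        centroid = np.array([v for v, _ in members], dtype=float).mean(axis=0)
        dictionaries.append((label, centroid))
    return dictionaries
```

Mapping a whole sub-cluster to a single labelled centroid is what keeps the online stage cheap: a query is compared against one codebook per class rather than against every training sample.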
In the above, step 5 specifically comprises:

Step 5.1. Extract the key video frames of the query video using the sampling-preserving strategy.

Step 5.2. Compute the OM feature vector O based on ordinal measures.

Step 5.3. Compute the distance between O and each visual codebook Wi in the multiple visual dictionaries.

Step 5.4. Find the visual codebook Wj at minimum distance.

Step 5.5. Assign the query video to the sub-cluster to which Wj belongs.
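The minimum-distance decision rule of steps 5.3 to 5.5 reduces to a nearest-codebook lookup, sketched here with hypothetical class labels:

```python
def classify(om_vec, dictionaries):
    """Minimum-distance decision rule (sketch): assign the query OM
    vector to the class of the nearest visual codebook, returning the
    label and the winning distance (used later by the closed-loop
    feedback stage)."""
    best_label, best_d = None, float("inf")
    for label, codebook in dictionaries:
        d = sum((a - b) ** 2 for a, b in zip(om_vec, codebook)) ** 0.5
        if d < best_d:
            best_label, best_d = label, d
    return best_label, best_d
```

Returning the winning distance alongside the label lets step 6 compare it against the maximum discrimination distance Dmax without recomputing anything.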
In the above, the adaptive reconstruction learning method with closed-loop feedback of step 6 rebuilds the multiple visual dictionaries to fit the new environment; it specifically comprises:

Step 6.1. Initialize the maximum discrimination distance Dmax and the threshold δ.

Step 6.2. For the key-frame OM feature vector Oi, compare whether the distances from Oi to all visual codebooks exceed Dmax.

Step 6.3. If the distances from Oi to all visual codebooks exceed Dmax, judge the video to be of unknown type; otherwise, jump to step 5.3.

Step 6.4. If the number of unknown-type videos exceeds the threshold δ, go to step 3 and construct the adapted multiple visual dictionaries, then further judge the type of the query video; otherwise, terminate.
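The unknown-type trigger of steps 6.1 to 6.4 can be sketched as a simple counter; representing each query by its list of per-codebook distances is an assumption for illustration.

```python
def closed_loop_monitor(query_distances, d_max, delta):
    """Closed-loop feedback trigger (sketch).

    query_distances: one list of codebook distances per query video.
    A query whose distance to EVERY codebook exceeds d_max is judged
    unknown-type (step 6.3); once more than delta unknown queries have
    accumulated, the caller should return to step 3 and rebuild the
    dictionaries (step 6.4). Returns (unknown_count, rebuild_flag)."""
    unknown = sum(1 for ds in query_distances if min(ds) > d_max)
    return unknown, unknown > delta
```

This is the piece that makes the system adaptive: the dictionaries are only reconstructed once enough evidence of a genuinely new video pattern has accumulated, rather than on every stray outlier.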
Beneficial effects of the present invention:

1. By combining sampling-preserving key-frame extraction, OM feature computation, semi-supervised AP cluster analysis, multiple-visual-dictionary construction based on label mapping, and closed-loop-feedback reconstruction learning, the invention effectively reduces the computational complexity of feature matching for massive video and greatly improves the accuracy of video classification, achieving intelligent analysis of massive video.

2. The sampling-preserving key-frame extraction keeps sampling convenient, so processing efficiency is unaffected, while the coarse matching of key frames against known key features avoids differences caused by key-frame extraction. The OM feature does not emphasize the average brightness of each image block, only the rank order of the blocks' average brightness, so it is robust to added noise, parameter adjustment, re-encoding and other approximately global linear changes, and the feature is compact and easy to extract. The semi-supervised learning mechanism, which uses mostly unlabelled samples and only a few labelled ones, greatly reduces the cost of labelling video samples; class labels are recognized automatically from the few labelled samples, improving classification precision and accuracy. Affinity propagation learning improves the clustering of large-scale high-dimensional features and relieves the synonymy and ambiguity problems of visual codebooks. If an unknown video type appears in the environment, closed-loop-feedback reconstruction intelligently recognizes the new video pattern, so accurate intelligent detection adapts to changes in video patterns.
Detailed description of the drawings:

Fig. 1 is the flow diagram of the invention;

Fig. 2 is the flow diagram of the sampling-preserving key-frame extraction of the invention;

Fig. 3 is the flow diagram of computing the OM feature vector of the invention;

Fig. 4 is the flow diagram of the intelligent clustering based on semi-supervised AP of the invention;

Fig. 5 is the flow diagram of constructing the multiple visual dictionaries of the invention;

Fig. 6 is the flow diagram of judging the query video's class label by the minimum-distance rule of the invention;

Fig. 7 is the flow diagram of the adaptive reconstruction learning method based on closed-loop feedback of the invention.
Specific embodiments:

The present invention is described in further detail below with reference to the drawings and the technical solutions, and embodiments of the invention are described in detail through preferred examples, but the embodiments of the invention are not limited thereto.
Embodiment one, referring to Fig. 1: an intelligent video analysis method based on semi-supervised affinity propagation learning and multiple visual dictionary models comprises the following steps:

Step 1. For the video samples, extract key video frames using a sampling-preserving strategy.

Step 2. For each key video frame, compute the OM feature vector based on ordinal measures.

Step 3. Cluster all OM feature vectors with semi-supervised affinity propagation learning to form the video sub-clusters.

Step 4. Determine the class label of each video sub-cluster and construct the multiple visual dictionaries; the class labels include an unknown-type video label.

Step 5. For a query video, successively perform the sampling-preserving key-frame extraction of step 1 and the OM feature computation of step 2, and judge its class label by the minimum-distance rule against the multiple visual dictionaries of step 4.

Step 6. When the number of videos labelled unknown-type exceeds the set threshold, apply the adaptive reconstruction learning method with closed-loop feedback: return to step 3 and rebuild the multiple visual dictionaries to fit the new environment; otherwise, terminate.
Embodiment two, referring to Figs. 1 to 7: an intelligent video analysis method based on semi-supervised affinity propagation learning and multiple visual dictionary models comprises the following steps:

Step 1. For the video samples, extract key video frames using a sampling-preserving strategy: for any arriving video frame, extract its summary information and match it against the key-feature library; if the match succeeds, the frame is judged a key video frame; otherwise, sample it at random with probability p, keeping it as a key video frame if drawn and discarding it if not. Key-frame extraction is thus divided into coarse feature matching and fixed-period sampling of video frames: an arriving frame that matches a known key feature is taken as a key video frame; otherwise it is sampled with fixed probability p.
Step 2. For each key video frame, compute the OM feature vector based on ordinal measures; this specifically comprises:

Step 2.1. Convert the key video frame to a grayscale image.

Step 2.2. Evenly partition the grayscale image into N image blocks, where N = Nx * Ny, Nx being the number of blocks along the X axis and Ny the number along the Y axis.

Step 2.3. Compute the average brightness Ik of each image block, i.e. Ik = (1/(m*n)) ΣxΣy f(x, y), where f(x, y) is the brightness of the pixel at coordinates (x, y), k ∈ [1, N], and m, n are the numbers of rows and columns of the block.

Step 2.4. Sort the blocks' average brightness values to generate the OM feature vector I = [I1, I2, ..., IN].
Step 3. Cluster all OM feature vectors with semi-supervised affinity propagation learning to form the video sub-clusters; this specifically comprises:

Step 3.1. For the labelled video sample space Vl and the unlabelled video sample space Vnl, extract the OM feature vectors of all key video frames.

Step 3.2. For any two OM feature vectors Ii, Ij, judge whether both belong to Vl: if both belong to Vl and the labelled videos are of the same type, set Dij to the maximum value 0; if both belong to Vl but the labelled videos are of different types, set Dij to the minimum value -∞; if at least one of Ii, Ij belongs to Vnl, compute the Euclidean distance Dij between Ii and Ij.

Step 3.3. Store the pairwise values for all n sample points of the video sample space in the matrix E.

Step 3.4. Perform cluster analysis by the affinity propagation principle, forming K clusters C = {C1, ..., CK}.

Step 4. Determine the class label of each video sub-cluster and construct the multiple visual dictionaries, the class labels including an unknown-type video label; this specifically comprises:

Step 4.1. For any video sub-cluster Ci, count the number Ni of labelled video frames in Ci that belong to each class label lj.

Step 4.2. By majority vote, assign to the sub-cluster Ci the class label l* containing the most samples.

Step 4.3. Compute the centroid wi of the video sub-cluster Ci as the visual codebook Wi of class label l*.

Step 4.4. All visual codebooks together constitute the multiple visual dictionaries.
Step 5. For the query video, successively perform the sampling-preserving key-frame extraction of step 1 and the OM feature computation of step 2, and judge the query video's class label by the minimum-distance rule against the multiple visual dictionaries of step 4; this specifically comprises:

Step 5.1. Extract the key video frames of the query video using the sampling-preserving strategy.

Step 5.2. Compute the OM feature vector O based on ordinal measures.

Step 5.3. Compute the distance between O and each visual codebook Wi in the multiple visual dictionaries.

Step 5.4. Find the visual codebook Wj at minimum distance.

Step 5.5. Assign the query video to the sub-cluster to which Wj belongs.
Step 6. When the number of videos labelled unknown-type exceeds the set threshold, apply the adaptive reconstruction learning method with closed-loop feedback to rebuild the multiple visual dictionaries to fit the new environment; this specifically comprises:

Step 6.1. Initialize the maximum discrimination distance Dmax and the threshold δ.

Step 6.2. For the key-frame OM feature vector Oi, compare whether the distances from Oi to all visual codebooks exceed Dmax.

Step 6.3. If the distances from Oi to all visual codebooks exceed Dmax, judge the video to be of unknown type; otherwise, jump to step 5.3.

Step 6.4. If the number of unknown-type videos exceeds the threshold δ, go to step 3 and construct the adapted multiple visual dictionaries, then further judge the type of the query video; otherwise, terminate.
Differences caused by key-frame extraction are avoided and video detection efficiency is improved; the features are robust to added noise, parameter adjustment, re-encoding and other approximately global linear changes; the accuracy of the video analysis model is improved, the clustering of large-scale high-dimensional features is improved, and the synonymy and ambiguity problems of visual codebooks are relieved.

The invention is not limited to the specific embodiments above; those skilled in the art may make various corresponding changes, but any change equivalent or similar to the present invention shall be covered within the scope of the claims.
Claims (6)
1. An intelligent video analysis method based on semi-supervised affinity propagation learning and multiple visual dictionary models, characterized in that it comprises the following steps:

Step 1. For the video samples, extract key video frames using a sampling-preserving strategy: for any arriving video frame, extract its summary information and match it against the key-feature library; if the match succeeds, the frame is judged a key video frame; otherwise, sample it at random with probability p, keeping it as a key video frame if drawn and discarding it if not;

Step 2. For each key video frame, compute the OM feature vector based on ordinal measures;

Step 3. Cluster all OM feature vectors with semi-supervised affinity propagation learning, forming the video sub-clusters;

Step 4. Determine the class label of each video sub-cluster and construct multiple visual dictionaries, the class labels including an unknown-type video label;

Step 5. For a query video, successively perform the sampling-preserving key-frame extraction and the OM feature computation, and judge the query video's class label by the minimum-distance rule against the multiple visual dictionaries of step 4;

Step 6. If the number of videos labelled unknown-type exceeds a set threshold, apply the adaptive reconstruction learning method with closed-loop feedback: return to step 3, rebuild the multiple visual dictionaries to fit the new environment, and further judge the class label of the query video; otherwise, terminate.
2. The intelligent video analysis method based on semi-supervised affinity propagation learning and multiple visual dictionary models according to claim 1, characterized in that the OM feature vector based on ordinal measures of step 2 is computed as follows:

Step 2.1. Convert the key video frame to a grayscale image;

Step 2.2. Evenly partition the grayscale image into N image blocks, where N = Nx * Ny, Nx being the number of blocks along the X axis and Ny the number along the Y axis;

Step 2.3. Compute the average brightness Ik of each image block, i.e. Ik = (1/(m*n)) ΣxΣy f(x, y), where f(x, y) is the brightness of the pixel at coordinates (x, y), k ∈ [1, N], and m, n are the numbers of rows and columns of the block;

Step 2.4. Sort the blocks' average brightness values to generate the OM feature vector I = [I1, I2, ..., IN].
3. The intelligent video analysis method based on semi-supervised affinity propagation learning and multiple visual dictionary models according to claim 2, characterized in that step 3 specifically comprises:

Step 3.1. For the labelled video sample space Vl and the unlabelled video sample space Vnl, extract the OM feature vectors of all key video frames;

Step 3.2. For any two OM feature vectors Ii, Ij, judge whether both belong to Vl: if both belong to Vl and the labelled videos are of the same type, set Dij to the maximum value 0; if both belong to Vl but the labelled videos are of different types, set Dij to the minimum value -∞; if at least one of Ii, Ij belongs to Vnl, compute the Euclidean distance Dij between Ii and Ij;

Step 3.3. Store the pairwise values for all n sample points of the video sample space in the matrix E;

Step 3.4. Perform cluster analysis by the affinity propagation principle, forming K clusters C = {C1, ..., CK}.
4. The intelligent video analysis method based on semi-supervised affinity propagation learning and multiple visual dictionary models according to claim 3, characterized in that step 4 specifically comprises:

Step 4.1. For any video sub-cluster Ci, count the number Ni of labelled video frames in Ci that belong to each class label lj;

Step 4.2. By majority vote, assign to the sub-cluster Ci the class label l* containing the most samples;

Step 4.3. Compute the centroid wi of the video sub-cluster Ci as the visual codebook Wi of class label l*;

Step 4.4. All visual codebooks together constitute the multiple visual dictionaries.
5. The intelligent video analysis method based on semi-supervised affinity propagation learning and multiple visual dictionary models according to claim 4, characterized in that step 5 specifically comprises:

Step 5.1. Extract the key video frames of the query video using the sampling-preserving strategy;

Step 5.2. Compute the OM feature vector O based on ordinal measures;

Step 5.3. Compute the distance between O and each visual codebook Wi in the multiple visual dictionaries;

Step 5.4. Find the visual codebook Wj at minimum distance;

Step 5.5. Assign the query video to the sub-cluster to which Wj belongs.
6. The intelligent video analysis method based on semi-supervised affinity propagation learning and multiple visual dictionary models according to claim 5, characterized in that the adaptive reconstruction learning method with closed-loop feedback of step 6 rebuilds the multiple visual dictionaries to fit the new environment and specifically comprises:

Step 6.1. Initialize the maximum discrimination distance Dmax and the threshold δ;

Step 6.2. For the key-frame OM feature vector Oi, compare whether the distances from Oi to all visual codebooks exceed Dmax;

Step 6.3. If the distances from Oi to all visual codebooks exceed Dmax, judge the video to be of unknown type; otherwise, jump to step 5.3;

Step 6.4. If the number of unknown-type videos exceeds the threshold δ, go to step 3 and construct the adapted multiple visual dictionaries, then further judge the type of the query video; otherwise, terminate.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201511022492.5A CN105654054B (en) | 2015-12-30 | 2015-12-30 | Intelligent video analysis method based on semi-supervised affinity propagation learning and multiple visual dictionary models |
Publications (2)
Publication Number | Publication Date |
---|---|
CN105654054A CN105654054A (en) | 2016-06-08 |
CN105654054B true CN105654054B (en) | 2018-12-04 |
Family
ID=56490723
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201511022492.5A Active CN105654054B (en) | 2015-12-30 | 2015-12-30 | Intelligent video analysis method based on semi-supervised affinity propagation learning and multiple visual dictionary models |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105654054B (en) |
Families Citing this family (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110475129B (en) | 2018-03-05 | 2021-05-28 | 腾讯科技(深圳)有限公司 | Video processing method, medium, and server |
CN109684506B (en) * | 2018-11-22 | 2023-10-20 | 三六零科技集团有限公司 | Video tagging processing method and device and computing equipment |
CN110390356B (en) * | 2019-07-03 | 2022-03-08 | Oppo广东移动通信有限公司 | Visual dictionary generation method and device and storage medium |
CN113344932B (en) * | 2021-06-01 | 2022-05-03 | 电子科技大学 | Semi-supervised single-target video segmentation method |
CN113515383B (en) * | 2021-07-28 | 2024-02-20 | 中国工商银行股份有限公司 | System resource data distribution method and device |
CN117253196B (en) * | 2023-11-17 | 2024-02-02 | 本溪钢铁(集团)信息自动化有限责任公司 | Video-based security risk monitoring method and device in steel industry |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102855473A (en) * | 2012-08-21 | 2013-01-02 | 中国科学院信息工程研究所 | Image multi-target detecting method based on similarity measurement |
CN103390040A (en) * | 2013-07-17 | 2013-11-13 | 南京邮电大学 | Video copy detection method |
US9098749B2 (en) * | 2013-03-14 | 2015-08-04 | Xerox Corporation | Dictionary design for computationally efficient video anomaly detection via sparse reconstruction techniques |
CN105138991A (en) * | 2015-08-27 | 2015-12-09 | 山东工商学院 | Video emotion identification method based on emotion significant feature integration |
Also Published As
Publication number | Publication date |
---|---|
CN105654054A (en) | 2016-06-08 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN105654054B (en) | Intelligent video analysis method based on semi-supervised affinity propagation learning and multiple visual dictionary models | |
Shen et al. | Generative adversarial learning towards fast weakly supervised detection | |
CN112446342B (en) | Key frame recognition model training method, recognition method and device | |
CN109743642B (en) | Video abstract generation method based on hierarchical recurrent neural network | |
CN110633708A (en) | Deep network significance detection method based on global model and local optimization | |
Meng et al. | Feature adaptive co-segmentation by complexity awareness | |
CN109583379A (en) | A kind of pedestrian's recognition methods again being aligned network based on selective erasing pedestrian | |
CN112733965B (en) | Label-free image classification method based on small sample learning | |
Wang et al. | Cattle face recognition method based on parameter transfer and deep learning | |
CN108960142B (en) | Pedestrian re-identification method based on global feature loss function | |
CN113761259A (en) | Image processing method and device and computer equipment | |
CN107862680B (en) | Target tracking optimization method based on correlation filter | |
WO2022218396A1 (en) | Image processing method and apparatus, and computer readable storage medium | |
CN112364791B (en) | Pedestrian re-identification method and system based on generation of confrontation network | |
CN109190472A (en) | Combine pedestrian's attribute recognition approach of guidance with attribute based on image | |
Zhang et al. | 3D object retrieval with multi-feature collaboration and bipartite graph matching | |
CN106778714B (en) | LDA face identification method based on nonlinear characteristic and model combination | |
Wang et al. | Spatial weighting for bag-of-features based image retrieval | |
Pratama et al. | Face recognition for presence system by using residual networks-50 architecture | |
Vainstein et al. | Modeling video activity with dynamic phrases and its application to action recognition in tennis videos | |
CN107633527B (en) | Target tracking method and device based on full convolution neural network | |
CN112949658B (en) | Deep learning method with stable performance | |
Qin et al. | Structure-aware feature disentanglement with knowledge transfer for appearance-changing place recognition | |
CN109241315A (en) | A kind of fast face search method based on deep learning | |
CN109902690A (en) | Image recognition technology |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant | ||
TR01 | Transfer of patent right |
Effective date of registration: 20210701 Address after: Room 690, 6th floor, 999 Changning Road, Changning District, Shanghai 200336 Patentee after: SHANGHAI GUIHE SOFTWARE TECHNOLOGY Co.,Ltd. Address before: Room 801, building 67, 421 Hongcao Road, Xuhui District, Shanghai 200233 Patentee before: SHANGHAI YIBEN INFORMATION TECHNOLOGY Co.,Ltd. |
|
TR01 | Transfer of patent right |