CN105512631A - Violence and horror video detection method based on MoSIFT and CSD features

Info

Publication number: CN105512631A (application CN201510894937.2A); granted as CN105512631B
Authority: CN (China)
Legal status: Granted; Active
Inventors: 蒋兴浩, 孙锬锋, 倪俊, 郑辉, 王丹阳
Assignees: DIGITAL CHINA (SHANGHAI) HOLDINGS Ltd; Shanghai Jiaotong University
Filing date: 2015-12-07
Publication date: 2016-04-20 (CN105512631A); grant date: 2019-01-25 (CN105512631B)
Original language: Chinese (zh)


Classifications

    • G06V20/40: Scenes; scene-specific elements in video content
    • G06V20/44: Event detection
    • G06V20/46: Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G06V10/462: Salient features, e.g. scale invariant feature transforms [SIFT]
    • G06F18/2411: Classification techniques based on the proximity to a decision surface, e.g. support vector machines

Abstract

The invention provides a violence and horror video detection method based on MoSIFT (Motion Scale-Invariant Feature Transform) and CSD (Color Structure Descriptor) features. The method comprises: step 1, computing the CSD features of a video; step 2, computing the maximum diversity-density point of the CSD features; step 3, computing a CSD score; step 4, computing the MoSIFT features of the video; step 5, reducing the MoSIFT features vertically (in quantity); step 6, reducing them horizontally (in dimension); step 7, clustering the MoSIFT features; step 8, training SVM-1; step 9, computing a MoSIFT score; step 10, training SVM-2; and step 11, obtaining the classification result. By combining the MoSIFT and CSD features, the method reduces algorithmic complexity while achieving good detection performance.

Description

Violence and horror video detection method based on MoSIFT and CSD features
Technical field
The present invention relates to violence and horror video detection, and in particular to a violence and horror video detection method based on MoSIFT and CSD features.
Background
With the development of the internet, the volume and variety of video content transmitted online has become vast, and violent and horror videos are common among it. Such videos can cause psychological harm to minors, so videos on the internet need to be classified and regulated. The traditional approach is to review this massive volume of video manually, which is labour-intensive and cannot contain the spread of violent and horror videos promptly and comprehensively. An automatic violence and horror video detection method is therefore highly valuable.
Violent and horror videos can be detected in many ways, generally starting from two kinds of signals: audio and visual. Audio analysis identifies abnormal sounds such as screams and explosions, while visual analysis identifies scenes such as blood, darkness and fighting. The two approaches have their respective advantages in different applications; for data such as films and surveillance video, visual analysis is the more advantageous.
A search of the prior art on violence and horror video detection finds Chinese patent publication CN104036301A, published on September 10, 2014 and entitled "Violent event recognition method and system based on optical-flow-block features", which discloses a recognition method. Specifically, motion points between video frames are adaptively divided into optical-flow blocks; blocks containing no fewer motion points than a first threshold are filtered out to form a valid optical-flow block group; a violent event is judged to occur when the number of blocks in the valid group is not less than a second threshold, the principal optical-flow magnitude of every block in the group is not less than a third threshold, and the pairwise angles between the principal optical-flow directions of the blocks are all not less than a fourth threshold. That patent uses optical-flow-block features, but it extracts only the local dynamic features of a video and ignores its global static features; some static characteristics of the video are lost, so it performs poorly on bloody and horror content.
A violence and horror video detection method that captures both the dynamic and the static features of a video is therefore needed, to improve detection efficiency and accuracy. Charif et al. proposed a novel spatio-temporal feature algorithm, the Motion Scale-Invariant Feature Transform, abbreviated MoSIFT. The algorithm first extracts SIFT (Scale-Invariant Feature Transform) keypoints from the video frames, then computes the optical flow at the scale of each SIFT keypoint. A MoSIFT feature is the concatenation of a 128-dimensional SIFT vector and a 128-dimensional optical-flow orientation histogram, 256 dimensions in total. The flow histogram is assembled in the same way as the SIFT descriptor: the magnitude and angle of the optical flow take the place of the pixel-gradient magnitude and angle in SIFT, and are weighted over a local neighbourhood. Unlike the SIFT descriptor, the flow of each point is not rotated into a principal direction: whereas rotation invariance in the spatial domain helps recognise the same object from different angles, the flow angle itself is important information for describing motion.
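For illustration, a minimal MATLAB sketch of assembling the 256-dimensional descriptor, assuming the 128-dimensional SIFT vector sift128 and the dense optical-flow fields u, v over a 16 × 16 patch around the keypoint are already available; the 4 × 4 cell grid with 8 direction bins mirrors the SIFT layout described above, and the function name mosiftDescriptor is illustrative only:

    function d = mosiftDescriptor(sift128, u, v)
    % Flow-orientation histogram aggregated like SIFT: 4x4 cells x 8 bins = 128 dims.
    mag = hypot(u, v);                         % optical-flow magnitude (weights)
    ang = mod(atan2(v, u), 2*pi);              % flow direction in [0, 2*pi)
    hist128 = zeros(1, 128);
    for cy = 1:4
        for cx = 1:4
            rows = (cy-1)*4+1 : cy*4;          % 4x4 pixel cell of the 16x16 patch
            cols = (cx-1)*4+1 : cx*4;
            a = ang(rows, cols);  m = mag(rows, cols);
            bin = min(floor(a(:) / (pi/4)), 7) + 1;   % quantise into 8 direction bins
            base = ((cy-1)*4 + (cx-1)) * 8;
            for t = 1:numel(bin)
                hist128(base + bin(t)) = hist128(base + bin(t)) + m(t);
            end
        end
    end
    % No rotation to a principal direction: the flow angle itself carries the motion.
    d = [sift128(:); hist128(:)]';             % 256-dimensional MoSIFT descriptor
    end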
The Color Structure Descriptor, CSD, captures the local colour distribution of an image. A window, for example 8 × 8 pixels in size, slides over the whole image, and the colour categories appearing inside the window are counted. The CSD feature is extracted in the HMMD (Hue, Min, Max, Difference) colour space; its main advantage is that it can distinguish image pairs whose colour histograms are similar but whose spatial colour distributions differ.
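For illustration, a minimal sketch of the colour-structure counting performed by CSD, assuming qimg is an image already quantised to nColors HMMD colour indices; the non-linear re-quantisation of bin amplitudes defined by MPEG-7 is omitted, and the function name csdHistogram is illustrative:

    function h = csdHistogram(qimg, nColors)
    % Count, for each colour, how many 8x8 windows contain it at least once.
    [H, W] = size(qimg);
    h = zeros(1, nColors);
    for y = 1:H-7
        for x = 1:W-7
            win = qimg(y:y+7, x:x+7);          % 8x8 structuring window
            present = unique(win(:));          % colours occurring in the window
            h(present) = h(present) + 1;       % count windows, not pixels
        end
    end
    h = h / ((H-7) * (W-7));                   % normalise by the window count
    end

This window-counting is what lets CSD separate images with similar global histograms but different spatial colour layouts: a colour spread thinly across the image enters many more windows than the same amount of colour concentrated in one compact region.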
Summary of the invention
In view of the defects of the prior art, the object of the present invention is to provide a violence and horror video detection method based on MoSIFT and CSD features.
The violence and horror video detection method based on MoSIFT and CSD features provided by the invention comprises the following steps:
Step 1: extract the MoSIFT features of the test videos and the training videos respectively;
Step 2: reduce the quantity and the dimensionality of the extracted MoSIFT features of the test and training videos;
Step 3: cluster the reduced training-video MoSIFT features to obtain a cluster-centre matrix;
Step 4: obtain the cluster centres from the cluster-centre matrix, and use them to build bag-of-words statistics for the training and test videos, yielding training samples and test samples;
Step 5: train an SVM classifier on the training samples; the resulting classifier is denoted SVM-1;
Step 6: run SVM-1 on the training samples and the test samples respectively and compute a score for each, denoted the MoSIFT score;
Step 7: extract the CSD features of the training videos and the test videos respectively;
Step 8: use the Mean Diversity Density method, abbreviated MDD, to compute the maximum diversity-density point MP of the training samples;
Step 9: from the maximum diversity-density point MP of the training samples, compute the CSD scores of the training samples and of the test samples respectively;
Step 10: train an SVM classifier on the MoSIFT scores and CSD scores of the training samples; the trained classifier is denoted SVM-2;
Step 11: apply SVM-2 to the MoSIFT and CSD scores of the test samples, thereby classifying the test videos.
Preferably, said step 1 comprises:
Step 1.1: split the test videos and training videos into frames;
Step 1.2: process each pair of adjacent frames and compute MoSIFT features;
specifically, an OpenCV-based implementation of the MoSIFT algorithm is used: the constructor of its MoSIFT class is called to obtain the MoSIFT features shared by two consecutive frames, and the features are saved as OpenCV Mat data;
Step 1.3: vertically concatenate the MoSIFT features of every frame of a training video to obtain the corresponding first feature matrix, denoted M1; vertically concatenate the MoSIFT features of every frame of a test video to obtain the corresponding fourth feature matrix, denoted M4; process all videos in this way, obtaining the first feature matrix M1 of each training video and the fourth feature matrix M4 of each test video.
Preferably, said step 2 comprises:
Step 2.1: process the rows of the first feature matrix M1 in order, from the first row to the last:
for the current row, find the row nearest to it (in Euclidean distance) among all the rows below it, and swap that nearest row with the row immediately below the current one;
Step 2.2: divide the reordered M1 into k4 parts of equal row count; average each part vertically, i.e. compute its column means; vertically concatenate the results into a k4 × w1 matrix, the second feature matrix, denoted M2, where w1 is the number of columns of M1;
Step 2.3: count the non-zero elements in each column of M2; keep the k1 columns with the highest non-zero counts to obtain the third feature matrix M3; record the indices of the retained columns, and extract the same columns from the fourth feature matrix M4 to obtain the sixth feature matrix M6.
Preferably, said step 3 comprises: vertically concatenate the third feature matrices M3 of all training videos into a total training matrix M, and apply K-means clustering to M to obtain the cluster-centre matrix C1, each row of which is one cluster centre;
specifically, MATLAB's kmeans function is applied to the total training matrix M, with the maximum number of iterations limited to 200 and the start option set to 'cluster'. Its return values are taken as [idx, center]: idx gives the cluster assigned to each row of M, and center is the cluster-centre matrix C1.
Preferably, said step 4 comprises:
Step 4.1: from idx, obtain the bag of words of each training sample. For a test video, compute the distance between each row of its sixth feature matrix M6 and every row of the cluster-centre matrix C1, and take the row of C1 nearest to that row of M6 as its visual word;
Step 4.2: count how many times each word occurs in a test video, draw the counts as a histogram, and take the histogram values as the video's vector, denoted v; count the occurrences of each word in a training video's bag of words, draw the counts as a histogram, and take the histogram values as the video's vector, denoted q;
Step 4.3: each test video thus corresponds to a video vector v and each training video to a video vector q, yielding the training samples S, represented by the vectors q, and the test samples T, represented by the vectors v.
Preferably, said step 5 comprises: train a support vector machine (SVM) classifier on the training samples S; the trained classifier is denoted SVM-1. Specifically, MATLAB's fitcsvm function is applied to the training samples S together with their labels marking each sample as positive or negative, yielding SVM-1 as a MATLAB SVM model object.
Preferably, said step 6 comprises: use SVM-1 to compute the MoSIFT scores of the training samples and the test samples, calculated as

$$s(x) = \sum_{j} a_j \, y_j \, G(x_j, x) + b$$

where a_j is the estimated parameter of the j-th training sample in the SVM, y_j is the label of the j-th training sample, G(x_j, x) is the inner product, x_j is the support vector of the j-th training sample, x is the sample, and b is the estimated SVM bias parameter.
Preferably, said step 7 comprises:
Step 7.1: divide each video into k2 parts, take the first frame of each part, and denote the set of all frames so taken M7;
Step 7.2: extract the CSD features of the frames in M7. Specifically, the CSD features are extracted with the MPEG7FEX tool; the CSD feature of one frame is a vector, with the dimension set to 128; the features of one video are saved as one file, with the CSD feature of each frame saved as one line of that file. The features of all frames of one video are called a bag, and the feature of one frame is called an instance.
Preferably, said step 8 comprises:
Step 8.1: number the positive bags and the negative bags of the training videos upward from 1 with serial numbers; inside each bag, likewise number the CSD instance of each frame upward from 1;
Step 8.2: starting from instance 1 of positive bag 1, perform the following for every instance in turn:
find the instance nearest to the current instance in the next-numbered positive bag, link it to the current instance, and repeat the process from that instance. Formally, for a chain started from the r-th instance of bag 1, initialise i_1 = r and k = 1, then iterate

$$pm_k = \frac{1}{k} \sum_{t=1}^{k} pi_{t, i_t}$$

$$i_{k+1} = \underset{i_{k+1} \in 1:m}{\arg\min} \; ED\left(pm_k, \, pi_{k+1, i_{k+1}}\right)$$

where i_1 is the index of the linked instance of the 1st bag, k indicates that the k-th bag is being processed, i_k is the index of the linked instance of the k-th bag, m is the number of instances in a positive bag, ED(·,·) is the Euclidean distance between instances, pm_k is the mean of the positive instances linked over the first k bags, pi_{k, i_k} is the i_k-th instance of the k-th positive bag, arg(·) picks the argument satisfying the bracketed condition, and pi_{t, i_t} is the i_t-th instance of the t-th positive bag;
Step 8.3: denote the mean of one chain of linked instances u; the set of the means of all chains is denoted P;
Step 8.4: splice all negative samples in the training videos into a matrix M8: mix the instances of all negative bags together, each instance being one row vector, to obtain M8;
Step 8.5: process the rows of M8 in order, from the first row to the last:
for the current row, find the row nearest to it among all the rows below it, and swap that nearest row with the row immediately below the current one;
Step 8.6: divide M8 by rows into k3 parts; average each part vertically to obtain a vector w; the set of all such w is denoted N;
Step 8.7: for each element of P, compute the sum of its distances to all elements of N; take the element of P whose summed distance to all elements of N is largest and denote it MP: it is the maximum diversity-density point. The computation is

$$w = \underset{w \in 1:m}{\arg\max} \sum_{i=1}^{k_3} ED\left(pm_w, \, nm_i\right)$$

$$MP = pm_w$$

where pm_w is the w-th value of pm, pm is the overall set of the chain means (i.e. P), and nm_i is the mean of the i-th negative part.
Preferably, said step 9 comprises: compute the CSD score of each test sample from the maximum diversity-density point MP of the training samples; the score of a bag is computed from the Euclidean distances between MP and the instances bi_t of the bag, where bi_t denotes the t-th instance of the test bag;
said step 10 comprises: combine the MoSIFT score and the CSD score of every training sample and test sample into a 2-dimensional vector w; the set of the vectors w of the training samples is denoted R, and the SVM classifier is trained on the set R.
Compared with the prior art, the present invention has the following beneficial effects:
1. The violence and horror video detection method based on MoSIFT and CSD features provided by the invention can effectively detect violent and horror elements in, for example, TV programmes and internet videos, with high precision.
2. Compared with methods that detect using MoSIFT features alone, the method provided by the invention has lower algorithmic complexity and is therefore faster.
3. Compared with methods that detect using CSD features alone, the method of the invention is more accurate.
Brief description of the drawings
Other features, objects and advantages of the present invention will become more apparent by reading the detailed description of the non-limiting embodiments made with reference to the following drawings:
Fig. 1 is a flow diagram of the violence and horror video detection method based on MoSIFT and CSD features provided by the invention.
Detailed description of the embodiments
The present invention is described in detail below in conjunction with specific embodiments. The following embodiments will help those skilled in the art to further understand the present invention, but do not limit the invention in any form. It should be pointed out that those skilled in the art can make several variations and improvements without departing from the concept of the invention; these all belong to the protection scope of the present invention.
Specifically, as shown in Fig. 1, the violence and horror video detection method based on MoSIFT and CSD features provided by the invention comprises the following steps:
Step 1: extract the MoSIFT features of the test videos and the training videos respectively;
Step 2: reduce the quantity and the dimensionality of the extracted MoSIFT features;
Step 3: cluster the reduced training-video MoSIFT features to obtain a cluster-centre matrix;
Step 4: obtain the cluster centres from the cluster-centre matrix, and use them to build bag-of-words statistics for the training and test videos, yielding training samples and test samples;
Step 5: train an SVM classifier on the training samples; the resulting classifier is denoted SVM-1;
Step 6: run SVM-1 on the training samples and the test samples and compute a score for each, denoted the MoSIFT score;
Step 7: extract the CSD features of the training videos and the test videos respectively;
Step 8: use the Mean Diversity Density method, abbreviated MDD, to compute the maximum diversity-density point MP of the training samples;
Step 9: from the maximum diversity-density point MP of the training samples, compute the CSD scores of the training samples and of the test samples;
Step 10: train an SVM classifier on the MoSIFT scores and CSD scores of the training samples; the trained classifier is denoted SVM-2;
Step 11: apply SVM-2 to the MoSIFT and CSD scores of the test samples, thereby classifying the test videos.
Said step 1 comprises:
Step 1.1: split the test videos and training videos into frames;
Step 1.2: process each pair of adjacent frames and compute MoSIFT features;
specifically, an OpenCV-based implementation of the MoSIFT algorithm is used: the constructor of its MoSIFT class is called to obtain the MoSIFT features shared by two consecutive frames, and the features are saved as OpenCV Mat data;
Step 1.3: vertically concatenate the MoSIFT features of every frame of a training video to obtain the corresponding first feature matrix, denoted M1; vertically concatenate the MoSIFT features of every frame of a test video to obtain the corresponding fourth feature matrix, denoted M4; process all videos in this way, obtaining the first feature matrix M1 of each training video and the fourth feature matrix M4 of each test video.
Said step 2 comprises:
Step 2.1: process the rows of the first feature matrix M1 in order, from the first row to the last:
for the current row, find the row nearest to it among all the rows below it, and swap that nearest row with the row immediately below the current one;
concretely, when processing the j-th row, the nearest row is sought among rows j+1 through the last; all vector distances here are Euclidean distances, as illustrated by the sketch below.
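For illustration, a minimal MATLAB sketch of this greedy reordering; pdist2 (Statistics and Machine Learning Toolbox) computes the Euclidean distances, and the function name reorderRows is illustrative:

    function M = reorderRows(M)
    % After row j, move the nearest of the remaining rows up to position j+1.
    for j = 1:size(M,1)-1
        d = pdist2(M(j,:), M(j+1:end,:));      % Euclidean distance to the rows below
        [~, t] = min(d);                       % offset of the nearest row
        M([j+1, j+t], :) = M([j+t, j+1], :);   % swap it with the row just below
    end
    end

After this pass, similar feature rows sit next to each other, so the block averaging of step 2.2 merges near-duplicate descriptors rather than arbitrary ones.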
Step 2.2: divide the reordered M1 into k4 parts of equal row count; average each part vertically, i.e. compute its column means; vertically concatenate the results into a k4 × w1 matrix, the second feature matrix, denoted M2, where w1 is the number of columns of M1;
concretely, denote the number of rows of M1 by m1, and let y = m1 mod k4 and h = (m1 - y)/k4; every h rows are averaged vertically to give a vector u, and all the vectors u are concatenated vertically into M2.
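A minimal sketch of this block averaging; the patent does not state what happens to the y leftover rows, so dropping them here is an assumption:

    m1 = size(M1, 1);
    y  = mod(m1, k4);                          % leftover rows (dropped: an assumption)
    h  = (m1 - y) / k4;                        % rows per part
    M2 = zeros(k4, size(M1, 2));
    for b = 1:k4
        M2(b, :) = mean(M1((b-1)*h+1 : b*h, :), 1);   % column means of part b
    end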
Step 2.3: count the non-zero elements in each column of the second feature matrix M2, and keep the k1 columns of M2 with the highest non-zero counts to obtain the third feature matrix M3. Record the positions of the retained columns, and extract the columns at the same positions from the fourth feature matrix M4 to obtain the sixth feature matrix M6.
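In MATLAB, this column selection can be sketched as follows (k1 as defined above):

    nz = sum(M2 ~= 0, 1);                      % non-zero count of every column
    [~, order] = sort(nz, 'descend');
    cols = order(1:k1);                        % the k1 densest dimensions
    M3 = M2(:, cols);                          % third feature matrix
    M6 = M4(:, cols);                          % same columns taken from M4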
Said step 3 comprises: vertically concatenate the third feature matrices M3 of all training videos into a total training matrix M, and apply K-means clustering to M to obtain the cluster-centre matrix C1, each row of which is one cluster centre;
specifically, MATLAB's kmeans function is applied to the total training matrix M, with the maximum number of iterations limited to 200 and the start option set to 'cluster'. Its return values are taken as [idx, center]: idx gives the cluster assigned to each row of M, and center is the cluster-centre matrix C1.
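A sketch of the corresponding call; the number of clusters c is a free parameter that the patent does not fix:

    opts = statset('MaxIter', 200);            % cap the iterations at 200
    [idx, center] = kmeans(M, c, 'Options', opts, 'Start', 'cluster');
    C1 = center;                               % one cluster centre per row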
Said step 4 comprises:
Step 4.1: from idx, obtain the bag of words of each training sample. For a test video, compute the distance between each row of its sixth feature matrix M6 and every row of the cluster-centre matrix C1, and take the row of C1 nearest to that row of M6 as its visual word;
Step 4.2: count how many times each word occurs in a test video, draw the counts as a histogram, and take the histogram values as the video's vector, denoted v; count the occurrences of each word in a training video's bag of words, draw the counts as a histogram, and take the histogram values as the video's vector, denoted q;
Step 4.3: each test video thus corresponds to a video vector v and each training video to a video vector q, yielding the training samples S, represented by the vectors q, and the test samples T, represented by the vectors v.
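A minimal sketch of the word assignment and histogram for one video; the helper name bagOfWords is illustrative:

    function v = bagOfWords(M6, C1)
    % Assign each MoSIFT row to its nearest cluster centre and count the words.
    v = zeros(1, size(C1, 1));
    for i = 1:size(M6, 1)
        d = pdist2(M6(i,:), C1);               % distance to every cluster centre
        [~, word] = min(d);                    % nearest centre = visual word
        v(word) = v(word) + 1;                 % occurrence histogram
    end
    end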
Said step 5 comprises: train a support vector machine (SVM) classifier on the training samples S; the trained classifier is denoted SVM-1. Specifically, MATLAB's fitcsvm function is applied to the training samples S together with their labels marking each sample as positive or negative, yielding SVM-1 as a MATLAB SVM model object.
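For instance, with labels the vector marking each training video as positive or negative, and S, T holding one video vector per row:

    SVM1 = fitcsvm(S, labels);                        % train SVM-1
    [~, sc] = predict(SVM1, S);  scoreS = sc(:, 2);   % MoSIFT scores, training samples
    [~, sc] = predict(SVM1, T);  scoreT = sc(:, 2);   % MoSIFT scores, test samples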
Said step 6 comprises: use SVM-1 to compute the MoSIFT scores of the training samples and the test samples; the computation is

$$s(x) = \sum_{j} a_j \, y_j \, G(x_j, x) + b$$

where a_j is the estimated SVM parameter of the j-th training sample, y_j is the label of the j-th training sample, G(x_j, x) is the inner product, x_j is the j-th support vector, x is the sample, and b is the estimated SVM bias parameter.
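This is the standard SVM decision function. With the model trained above it can also be evaluated explicitly, assuming fitcsvm's defaults (linear kernel G(x_j, x) = x_j · x, no standardisation):

    G = T * SVM1.SupportVectors';              % inner products G(x_j, x) for each sample
    s = G * (SVM1.Alpha .* SVM1.SupportVectorLabels) + SVM1.Bias;   % one score per row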
Said step 7 comprises:
Step 7.1: divide each video into k2 parts, take the first frame of each part, and denote the set of all frames so taken M7;
Step 7.2: extract the CSD features of the frames in M7. Specifically, the CSD features are extracted with the MPEG7FEX tool, with the dimension set to 128; the features of one video are saved as one file, one frame per line. The features of all frames of one video are called a bag, and the feature of one frame is called an instance.
Said step 8 comprises:
Step 8.1: number the positive bags and the negative bags of the training videos upward from 1 with serial numbers; inside each bag, likewise number the CSD instance of each frame upward from 1;
Step 8.2: starting from instance 1 of positive bag 1, perform the following for every instance in turn:
find the instance nearest to the current instance in the next-numbered positive bag, link it to the current instance, and repeat the process from that instance. Formally, for a chain started from the r-th instance of bag 1, initialise i_1 = r and k = 1, then iterate

$$pm_k = \frac{1}{k} \sum_{t=1}^{k} pi_{t, i_t}$$

$$i_{k+1} = \underset{i_{k+1} \in 1:m}{\arg\min} \; ED\left(pm_k, \, pi_{k+1, i_{k+1}}\right)$$

where i_1 is the index of the linked instance of the 1st bag, pi_1 denotes the instances of the 1st positive bag, k indicates that the k-th bag is being processed, i_k is the index of the linked instance of the k-th bag, m is the number of instances in a positive bag, ED(·,·) is the Euclidean distance between instances, pm_k is the mean of the positive instances linked over the first k bags, pi_{k, i_k} is the i_k-th instance of the k-th positive bag, and arg(·) picks the argument satisfying the bracketed condition;
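A minimal sketch of building one such chain, assuming pos is a cell array in which pos{t} holds the instances of the t-th positive bag (one per row) and r is the starting instance index; the argmin over bag k+1 follows the formula above:

    nBags = numel(pos);
    chain = zeros(nBags, size(pos{1}, 2));
    chain(1, :) = pos{1}(r, :);                % i_1 = r
    for k = 1:nBags-1
        pmk = mean(chain(1:k, :), 1);          % pm_k, mean of the linked instances
        d = pdist2(pmk, pos{k+1});             % ED(pm_k, pi_{k+1,i}) for every i
        [~, inext] = min(d);                   % i_{k+1}
        chain(k+1, :) = pos{k+1}(inext, :);
    end
    u = mean(chain, 1);                        % chain mean, one element of the set P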
Step 8.3: denote the mean of one chain of linked instances u; the set of the means of all chains is denoted P;
Step 8.4: splice all negative samples in the training videos into a matrix M8: mix the instances of all negative bags together, each instance being one row vector, to obtain M8;
Step 8.5: process the rows of M8 in order, from the first row to the last:
for the current row, find the row nearest to it among all the rows below it, and swap that nearest row with the row immediately below the current one;
Step 8.6: divide M8 by rows into k3 parts; average each part vertically to obtain a vector w; the set of all such w is denoted N;
Step 8.7: for each element of P, compute the sum of its distances to all elements of N; take the element of P whose summed distance to all elements of N is largest and denote it MP: it is the maximum diversity-density point. The computation is

$$w = \underset{w \in 1:m}{\arg\max} \sum_{i=1}^{k_3} ED\left(pm_w, \, nm_i\right)$$

$$MP = pm_w$$

where pm_w is the w-th element of pm (the set of the chain means, i.e. P) and nm_i is the mean of the i-th negative part.
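Treating P and N as matrices with one mean per row, the selection of MP can be sketched as:

    D = pdist2(P, N);                          % pairwise Euclidean distances
    [~, w] = max(sum(D, 2));                   % chain mean farthest, in total, from N
    MP = P(w, :);                              % maximum diversity-density point

Intuitively, MP is the positive prototype that best keeps its distance from everything the negative (non-violent) bags look like.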
Said step 9 comprises: compute the CSD score of each test sample from the maximum diversity-density point MP of the training samples; the score of a bag is computed from the Euclidean distances between MP and the instances bi_t of the bag, where bi_t denotes the t-th instance of the test bag;
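As a sketch under an assumed aggregation rule, a bag can be scored by the distance from MP to its nearest instance; B holds the bag's instances, one per row, and the sign is flipped so that larger scores mean closer to the violent prototype:

    d = pdist2(MP, B);                         % ED(MP, bi_t) for every instance
    csdScore = -min(d);                        % assumed aggregation: nearest instance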
Said step 10 comprises: combine the MoSIFT score and the CSD score of every training sample and test sample into a 2-dimensional vector w; the set of the vectors w of the training samples is denoted R, and the SVM classifier SVM-2 is trained on the set R.
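A sketch of the score fusion and of the final classification of step 11; scoreS and scoreT are the MoSIFT scores computed in step 6, and csdS, csdT the corresponding per-video CSD score columns (names illustrative):

    R = [scoreS, csdS];                        % one 2-dimensional vector w per training video
    SVM2 = fitcsvm(R, labels);                 % train SVM-2 on the score pairs
    pred = predict(SVM2, [scoreT, csdT]);      % final labels for the test videos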
Specific embodiments of the present invention have been described above. It should be understood that the invention is not limited to the particular embodiments described; those skilled in the art can make various variations or modifications within the scope of the claims, and this does not affect the substance of the invention.

Claims (10)

1. A violence and horror video detection method based on MoSIFT and CSD features, characterised in that it comprises the following steps:
Step 1: extract the MoSIFT features of the test videos and the training videos respectively;
Step 2: reduce the quantity and the dimensionality of the extracted MoSIFT features of the test and training videos;
Step 3: cluster the reduced training-video MoSIFT features to obtain a cluster-centre matrix;
Step 4: obtain the cluster centres from the cluster-centre matrix, and use them to build bag-of-words statistics for the training and test videos, yielding training samples and test samples;
Step 5: train an SVM classifier on the training samples; the resulting classifier is denoted SVM-1;
Step 6: run SVM-1 on the training samples and the test samples respectively and compute a score for each, denoted the MoSIFT score;
Step 7: extract the CSD features of the training videos and the test videos respectively;
Step 8: use the Mean Diversity Density method, abbreviated MDD, to compute the maximum diversity-density point MP of the training samples;
Step 9: from the maximum diversity-density point MP of the training samples, compute the CSD scores of the training samples and of the test samples respectively;
Step 10: train an SVM classifier on the MoSIFT scores and CSD scores of the training samples; the trained classifier is denoted SVM-2;
Step 11: apply SVM-2 to the MoSIFT and CSD scores of the test samples, thereby classifying the test videos.
2. The violence and horror video detection method based on MoSIFT and CSD features according to claim 1, characterised in that said step 1 comprises:
Step 1.1: split the test videos and training videos into frames;
Step 1.2: process each pair of adjacent frames and compute MoSIFT features;
specifically, an OpenCV-based implementation of the MoSIFT algorithm is used: the constructor of its MoSIFT class is called to obtain the MoSIFT features shared by two consecutive frames, and the features are saved as OpenCV Mat data;
Step 1.3: vertically concatenate the MoSIFT features of every frame of a training video to obtain the corresponding first feature matrix, denoted M1; vertically concatenate the MoSIFT features of every frame of a test video to obtain the corresponding fourth feature matrix, denoted M4; process all videos in this way, obtaining the first feature matrix M1 of each training video and the fourth feature matrix M4 of each test video.
3. The violence and horror video detection method based on MoSIFT and CSD features according to claim 2, characterised in that said step 2 comprises:
Step 2.1: process the rows of the first feature matrix M1 in order, from the first row to the last:
for the current row, find the row nearest to it among all the rows below it, and swap that nearest row with the row immediately below the current one;
Step 2.2: divide the reordered M1 into k4 parts of equal row count; average each part vertically, i.e. compute its column means; vertically concatenate the results into a k4 × w1 matrix, the second feature matrix, denoted M2, where w1 is the number of columns of M1;
Step 2.3: count the non-zero elements in each column of M2; keep the k1 columns with the highest non-zero counts to obtain the third feature matrix M3; record the indices of the retained columns, and extract the same columns from the fourth feature matrix M4 to obtain the sixth feature matrix M6.
4. The violence and horror video detection method based on MoSIFT and CSD features according to claim 3, characterised in that said step 3 comprises: vertically concatenate the third feature matrices M3 of all training videos into a total training matrix M, and apply K-means clustering to M to obtain the cluster-centre matrix C1, each row of which is one cluster centre;
specifically, MATLAB's kmeans function is applied to the total training matrix M, with the maximum number of iterations limited to 200 and the start option set to 'cluster'. Its return values are taken as [idx, center]: idx gives the cluster assigned to each row of M, and center is the cluster-centre matrix C1.
5. The violence and horror video detection method based on MoSIFT and CSD features according to claim 4, characterised in that said step 4 comprises:
Step 4.1: from idx, obtain the bag of words of each training sample. For a test video, compute the distance between each row of its sixth feature matrix M6 and every row of the cluster-centre matrix C1, and take the row of C1 nearest to that row of M6 as its visual word;
Step 4.2: count how many times each word occurs in a test video, draw the counts as a histogram, and take the histogram values as the video's vector, denoted v; count the occurrences of each word in a training video's bag of words, draw the counts as a histogram, and take the histogram values as the video's vector, denoted q;
Step 4.3: each test video thus corresponds to a video vector v and each training video to a video vector q, yielding the training samples S, represented by the vectors q, and the test samples T, represented by the vectors v.
6. The violence and horror video detection method based on MoSIFT and CSD features according to claim 5, characterised in that said step 5 comprises: train a support vector machine (SVM) classifier on the training samples S; the trained classifier is denoted SVM-1. Specifically, MATLAB's fitcsvm function is applied to the training samples S together with their labels marking each sample as positive or negative, yielding SVM-1 as a MATLAB SVM model object.
7. The violence and horror video detection method based on MoSIFT and CSD features according to claim 6, characterised in that said step 6 comprises: use SVM-1 to compute the MoSIFT scores of the training samples and the test samples, calculated as

$$s(x) = \sum_{j} a_j \, y_j \, G(x_j, x) + b$$

where a_j is the estimated parameter of the j-th training sample in the SVM, y_j is the label of the j-th training sample, G(x_j, x) is the inner product, x_j is the support vector of the j-th training sample, x is the sample, and b is the estimated SVM bias parameter.
8. The violence and horror video detection method based on MoSIFT and CSD features according to claim 7, characterised in that said step 7 comprises:
Step 7.1: divide each video into k2 parts, take the first frame of each part, and denote the set of all frames so taken M7;
Step 7.2: extract the CSD features of the frames in M7. Specifically, the CSD features are extracted with the MPEG7FEX tool; the CSD feature of one frame is a vector, with the dimension set to 128; the features of one video are saved as one file, with the CSD feature of each frame saved as one line of that file. The features of all frames of one video are called a bag, and the feature of one frame is called an instance.
9. The violence and horror video detection method based on MoSIFT and CSD features according to claim 8, characterised in that said step 8 comprises:
Step 8.1: number the positive bags and the negative bags of the training videos upward from 1 with serial numbers; inside each bag, likewise number the CSD instance of each frame upward from 1;
Step 8.2: starting from instance 1 of positive bag 1, perform the following for every instance in turn:
find the instance nearest to the current instance in the next-numbered positive bag, link it to the current instance, and repeat the process from that instance; formally, for a chain started from the r-th instance of bag 1, initialise i_1 = r and k = 1, then iterate

$$pm_k = \frac{1}{k} \sum_{t=1}^{k} pi_{t, i_t}$$

$$i_{k+1} = \underset{i_{k+1} \in 1:m}{\arg\min} \; ED\left(pm_k, \, pi_{k+1, i_{k+1}}\right)$$

where i_1 is the index of the linked instance of the 1st bag, k indicates that the k-th bag is being processed, i_k is the index of the linked instance of the k-th bag, m is the number of instances in a positive bag, ED(·,·) is the Euclidean distance between instances, pm_k is the mean of the positive instances linked over the first k bags, pi_{k, i_k} is the i_k-th instance of the k-th positive bag, arg(·) picks the argument satisfying the bracketed condition, and pi_{t, i_t} is the i_t-th instance of the t-th positive bag;
Step 8.3: denote the mean of one chain of linked instances u; the set of the means of all chains is denoted P;
Step 8.4: splice all negative samples in the training videos into a matrix M8: mix the instances of all negative bags together, each instance being one row vector, to obtain M8;
Step 8.5: process the rows of M8 in order, from the first row to the last:
for the current row, find the row nearest to it among all the rows below it, and swap that nearest row with the row immediately below the current one;
Step 8.6: divide M8 by rows into k3 parts; average each part vertically to obtain a vector w; the set of all such w is denoted N;
Step 8.7: for each element of P, compute the sum of its distances to all elements of N; take the element of P whose summed distance to all elements of N is largest and denote it MP: it is the maximum diversity-density point; the computation is

$$w = \underset{w \in 1:m}{\arg\max} \sum_{i=1}^{k_3} ED\left(pm_w, \, nm_i\right)$$

$$MP = pm_w$$

where pm_w is the w-th element of pm (the set of the chain means, i.e. P) and nm_i is the mean of the i-th negative part.
10. The violence and horror video detection method based on MoSIFT and CSD features according to claim 9, characterised in that said step 9 comprises: compute the CSD score of each test sample from the maximum diversity-density point MP of the training samples, the score of a bag being computed from the Euclidean distances between MP and the instances bi_t of the bag, where bi_t denotes the t-th instance of the test bag;
and said step 10 comprises: combine the MoSIFT score and the CSD score of every training sample and test sample into a 2-dimensional vector w; the set of the vectors w of the training samples is denoted R, and the SVM classifier is trained on the set R.

Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
GR01: Patent grant