CN105512631A - Violence and horror video detection method based on MoSIFT and CSD features

Info

Publication number: CN105512631A (application CN201510894937.2A); granted as CN105512631B
Authority: CN (China)
Legal status: Granted; Active
Inventors: 蒋兴浩, 孙锬锋, 倪俊, 郑辉, 王丹阳
Assignees: DIGITAL CHINA (SHANGHAI) HOLDINGS Ltd; Shanghai Jiaotong University
Filing date: 2015-12-07
Publication date: 2016-04-20 (CN105512631A); grant date: 2019-01-25 (CN105512631B)
Original language: Chinese (zh)


Classifications

    • G06V20/40: Scenes; scene-specific elements in video content
    • G06V20/44: Event detection
    • G06V20/46: Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G06V10/462: Salient features, e.g. scale invariant feature transforms [SIFT]
    • G06F18/2411: Classification techniques based on the proximity to a decision surface, e.g. support vector machines

Abstract

The invention provides a violence and horror video detection method based on MoSIFT (Motion Scale-Invariant Feature Transform) and CSD (Color Structure Descriptor) features. The method comprises: step 1, computing the CSD features of a video; step 2, computing the maximum diversity-density point of the CSD features; step 3, computing a CSD score; step 4, computing the MoSIFT features of the video; step 5, reducing the MoSIFT features vertically (in quantity); step 6, reducing them horizontally (in dimension); step 7, clustering the MoSIFT features; step 8, training SVM-1; step 9, computing a MoSIFT score; step 10, training SVM-2; and step 11, obtaining the classification result. By combining the MoSIFT and CSD features, the method reduces algorithmic complexity while achieving good detection performance.

Description

Violence and horror video detection method based on MoSIFT and CSD features
Technical field
The present invention relates to violence and horror video detection, and in particular to a violence and horror video detection method based on MoSIFT and CSD features.
Background
With the development of the internet, the volume and variety of video content transmitted online has become vast, and violent and horror videos are common among it. Such videos can cause psychological harm to minors, so videos on the internet need to be classified and regulated. The traditional approach is to review this massive volume of video manually, which is labour-intensive and cannot contain the spread of violent and horror videos promptly and comprehensively. An automatic violence and horror video detection method is therefore highly valuable.
Violent and horror videos can be detected in many ways, generally starting from two kinds of signals: audio and visual. Audio analysis identifies abnormal sounds such as screams and explosions, while visual analysis identifies scenes such as blood, darkness and fighting. The two approaches have their respective advantages in different applications; for data such as films and surveillance video, visual analysis is the more advantageous.
A search of the prior art on violence and horror video detection finds Chinese patent publication CN104036301A, published on September 10, 2014 and entitled "Violent event recognition method and system based on optical-flow-block features", which discloses a recognition method. Specifically, motion points between video frames are adaptively divided into optical-flow blocks; blocks containing no fewer motion points than a first threshold are filtered out to form a valid optical-flow block group; a violent event is judged to occur when the number of blocks in the valid group is not less than a second threshold, the principal optical-flow magnitude of every block in the group is not less than a third threshold, and the pairwise angles between the principal optical-flow directions of the blocks are all not less than a fourth threshold. That patent uses optical-flow-block features, but it extracts only the local dynamic features of a video and ignores its global static features; some static characteristics of the video are lost, so it performs poorly on bloody and horror content.
A violence and horror video detection method that captures both the dynamic and the static features of a video is therefore needed, to improve detection efficiency and accuracy. Charif et al. proposed a novel spatio-temporal feature algorithm, the Motion Scale-Invariant Feature Transform, abbreviated MoSIFT. The algorithm first extracts SIFT (Scale-Invariant Feature Transform) keypoints from the video frames, then computes the optical flow at the scale of each SIFT keypoint. A MoSIFT feature is the concatenation of a 128-dimensional SIFT vector and a 128-dimensional optical-flow orientation histogram, 256 dimensions in total. The flow histogram is assembled in the same way as the SIFT descriptor: the magnitude and angle of the optical flow take the place of the pixel-gradient magnitude and angle in SIFT, and are weighted over a local neighbourhood. Unlike the SIFT descriptor, the flow of each point is not rotated into a principal direction: whereas rotation invariance in the spatial domain helps recognise the same object from different angles, the flow angle itself is important information for describing motion.
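For illustration, a minimal MATLAB sketch of assembling the 256-dimensional descriptor, assuming the 128-dimensional SIFT vector sift128 and the dense optical-flow fields u, v over a 16 × 16 patch around the keypoint are already available; the 4 × 4 cell grid with 8 direction bins mirrors the SIFT layout described above, and the function name mosiftDescriptor is illustrative only:

    function d = mosiftDescriptor(sift128, u, v)
    % Flow-orientation histogram aggregated like SIFT: 4x4 cells x 8 bins = 128 dims.
    mag = hypot(u, v);                         % optical-flow magnitude (weights)
    ang = mod(atan2(v, u), 2*pi);              % flow direction in [0, 2*pi)
    hist128 = zeros(1, 128);
    for cy = 1:4
        for cx = 1:4
            rows = (cy-1)*4+1 : cy*4;          % 4x4 pixel cell of the 16x16 patch
            cols = (cx-1)*4+1 : cx*4;
            a = ang(rows, cols);  m = mag(rows, cols);
            bin = min(floor(a(:) / (pi/4)), 7) + 1;   % quantise into 8 direction bins
            base = ((cy-1)*4 + (cx-1)) * 8;
            for t = 1:numel(bin)
                hist128(base + bin(t)) = hist128(base + bin(t)) + m(t);
            end
        end
    end
    % No rotation to a principal direction: the flow angle itself carries the motion.
    d = [sift128(:); hist128(:)]';             % 256-dimensional MoSIFT descriptor
    end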
The Color Structure Descriptor, CSD, captures the local colour distribution of an image. A window, for example 8 × 8 pixels in size, slides over the whole image, and the colour categories appearing inside the window are counted. The CSD feature is extracted in the HMMD (Hue, Min, Max, Difference) colour space; its main advantage is that it can distinguish image pairs whose colour histograms are similar but whose spatial colour distributions differ.
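For illustration, a minimal sketch of the colour-structure counting performed by CSD, assuming qimg is an image already quantised to nColors HMMD colour indices; the non-linear re-quantisation of bin amplitudes defined by MPEG-7 is omitted, and the function name csdHistogram is illustrative:

    function h = csdHistogram(qimg, nColors)
    % Count, for each colour, how many 8x8 windows contain it at least once.
    [H, W] = size(qimg);
    h = zeros(1, nColors);
    for y = 1:H-7
        for x = 1:W-7
            win = qimg(y:y+7, x:x+7);          % 8x8 structuring window
            present = unique(win(:));          % colours occurring in the window
            h(present) = h(present) + 1;       % count windows, not pixels
        end
    end
    h = h / ((H-7) * (W-7));                   % normalise by the window count
    end

This window-counting is what lets CSD separate images with similar global histograms but different spatial colour layouts: a colour spread thinly across the image enters many more windows than the same amount of colour concentrated in one compact region.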
Summary of the invention
In view of the defects of the prior art, the object of the present invention is to provide a violence and horror video detection method based on MoSIFT and CSD features.
The violence and horror video detection method based on MoSIFT and CSD features provided by the invention comprises the following steps:
Step 1: extract the MoSIFT features of the test videos and the training videos respectively;
Step 2: reduce the quantity and the dimensionality of the extracted MoSIFT features of the test and training videos;
Step 3: cluster the reduced training-video MoSIFT features to obtain a cluster-centre matrix;
Step 4: obtain the cluster centres from the cluster-centre matrix, and use them to build bag-of-words statistics for the training and test videos, yielding training samples and test samples;
Step 5: train an SVM classifier on the training samples; the resulting classifier is denoted SVM-1;
Step 6: run SVM-1 on the training samples and the test samples respectively and compute a score for each, denoted the MoSIFT score;
Step 7: extract the CSD features of the training videos and the test videos respectively;
Step 8: use the Mean Diversity Density method, abbreviated MDD, to compute the maximum diversity-density point MP of the training samples;
Step 9: from the maximum diversity-density point MP of the training samples, compute the CSD scores of the training samples and of the test samples respectively;
Step 10: train an SVM classifier on the MoSIFT scores and CSD scores of the training samples; the trained classifier is denoted SVM-2;
Step 11: apply SVM-2 to the MoSIFT and CSD scores of the test samples, thereby classifying the test videos.
Preferably, said step 1 comprises:
Step 1.1: split the test videos and training videos into frames;
Step 1.2: process each pair of adjacent frames and compute MoSIFT features;
specifically, an OpenCV-based implementation of the MoSIFT algorithm is used: the constructor of its MoSIFT class is called to obtain the MoSIFT features shared by two consecutive frames, and the features are saved as OpenCV Mat data;
Step 1.3: vertically concatenate the MoSIFT features of every frame of a training video to obtain the corresponding first feature matrix, denoted M1; vertically concatenate the MoSIFT features of every frame of a test video to obtain the corresponding fourth feature matrix, denoted M4; process all videos in this way, obtaining the first feature matrix M1 of each training video and the fourth feature matrix M4 of each test video.
Preferably, said step 2 comprises:
Step 2.1: process the rows of the first feature matrix M1 in order, from the first row to the last:
for the current row, find the row nearest to it (in Euclidean distance) among all the rows below it, and swap that nearest row with the row immediately below the current one;
Step 2.2: divide the reordered M1 into k4 parts of equal row count; average each part vertically, i.e. compute its column means; vertically concatenate the results into a k4 × w1 matrix, the second feature matrix, denoted M2, where w1 is the number of columns of M1;
Step 2.3: count the non-zero elements in each column of M2; keep the k1 columns with the highest non-zero counts to obtain the third feature matrix M3; record the indices of the retained columns, and extract the same columns from the fourth feature matrix M4 to obtain the sixth feature matrix M6.
Preferably, said step 3 comprises: vertically concatenate the third feature matrices M3 of all training videos into a total training matrix M, and apply K-means clustering to M to obtain the cluster-centre matrix C1, each row of which is one cluster centre;
specifically, MATLAB's kmeans function is applied to the total training matrix M, with the maximum number of iterations limited to 200 and the start option set to 'cluster'. Its return values are taken as [idx, center]: idx gives the cluster assigned to each row of M, and center is the cluster-centre matrix C1.
Preferably, said step 4 comprises:
Step 4.1: from idx, obtain the bag of words of each training sample. For a test video, compute the distance between each row of its sixth feature matrix M6 and every row of the cluster-centre matrix C1, and take the row of C1 nearest to that row of M6 as its visual word;
Step 4.2: count how many times each word occurs in a test video, draw the counts as a histogram, and take the histogram values as the video's vector, denoted v; count the occurrences of each word in a training video's bag of words, draw the counts as a histogram, and take the histogram values as the video's vector, denoted q;
Step 4.3: each test video thus corresponds to a video vector v and each training video to a video vector q, yielding the training samples S, represented by the vectors q, and the test samples T, represented by the vectors v.
Preferably, said step 5 comprises: train a support vector machine (SVM) classifier on the training samples S; the trained classifier is denoted SVM-1. Specifically, MATLAB's fitcsvm function is applied to the training samples S together with their labels marking each sample as positive or negative, yielding SVM-1 as a MATLAB SVM model object.
Preferably, said step 6 comprises: use SVM-1 to compute the MoSIFT scores of the training samples and the test samples, calculated as

$$s(x) = \sum_{j} a_j \, y_j \, G(x_j, x) + b$$

where a_j is the estimated parameter of the j-th training sample in the SVM, y_j is the label of the j-th training sample, G(x_j, x) is the inner product, x_j is the support vector of the j-th training sample, x is the sample, and b is the estimated SVM bias parameter.
Preferably, said step 7 comprises:
Step 7.1: divide each video into k2 parts, take the first frame of each part, and denote the set of all frames so taken M7;
Step 7.2: extract the CSD features of the frames in M7. Specifically, the CSD features are extracted with the MPEG7FEX tool; the CSD feature of one frame is a vector, with the dimension set to 128; the features of one video are saved as one file, with the CSD feature of each frame saved as one line of that file. The features of all frames of one video are called a bag, and the feature of one frame is called an instance.
Preferably, said step 8 comprises:
Step 8.1: number the positive bags and the negative bags of the training videos upward from 1 with serial numbers; inside each bag, likewise number the CSD instance of each frame upward from 1;
Step 8.2: starting from instance 1 of positive bag 1, perform the following for every instance in turn:
find the instance nearest to the current instance in the next-numbered positive bag, link it to the current instance, and repeat the process from that instance. Formally, for a chain started from the r-th instance of bag 1, initialise i_1 = r and k = 1, then iterate

$$pm_k = \frac{1}{k} \sum_{t=1}^{k} pi_{t, i_t}$$

$$i_{k+1} = \underset{i_{k+1} \in 1:m}{\arg\min} \; ED\left(pm_k, \, pi_{k+1, i_{k+1}}\right)$$

where i_1 is the index of the linked instance of the 1st bag, k indicates that the k-th bag is being processed, i_k is the index of the linked instance of the k-th bag, m is the number of instances in a positive bag, ED(·,·) is the Euclidean distance between instances, pm_k is the mean of the positive instances linked over the first k bags, pi_{k, i_k} is the i_k-th instance of the k-th positive bag, arg(·) picks the argument satisfying the bracketed condition, and pi_{t, i_t} is the i_t-th instance of the t-th positive bag;
Step 8.3: denote the mean of one chain of linked instances u; the set of the means of all chains is denoted P;
Step 8.4: splice all negative samples in the training videos into a matrix M8: mix the instances of all negative bags together, each instance being one row vector, to obtain M8;
Step 8.5: process the rows of M8 in order, from the first row to the last:
for the current row, find the row nearest to it among all the rows below it, and swap that nearest row with the row immediately below the current one;
Step 8.6: divide M8 by rows into k3 parts; average each part vertically to obtain a vector w; the set of all such w is denoted N;
Step 8.7: for each element of P, compute the sum of its distances to all elements of N; take the element of P whose summed distance to all elements of N is largest and denote it MP: it is the maximum diversity-density point. The computation is

$$w = \underset{w \in 1:m}{\arg\max} \sum_{i=1}^{k_3} ED\left(pm_w, \, nm_i\right)$$

$$MP = pm_w$$

where pm_w is the w-th value of pm, pm is the overall set of the chain means (i.e. P), and nm_i is the mean of the i-th negative part.
Preferably, said step 9 comprises: compute the CSD score of each test sample from the maximum diversity-density point MP of the training samples; the score of a bag is computed from the Euclidean distances between MP and the instances bi_t of the bag, where bi_t denotes the t-th instance of the test bag;
said step 10 comprises: combine the MoSIFT score and the CSD score of every training sample and test sample into a 2-dimensional vector w; the set of the vectors w of the training samples is denoted R, and the SVM classifier is trained on the set R.
Compared with the prior art, the present invention has the following beneficial effects:
1. The violence and horror video detection method based on MoSIFT and CSD features provided by the invention can effectively detect violent and horror elements in, for example, TV programmes and internet videos, with high precision.
2. Compared with methods that detect using MoSIFT features alone, the method provided by the invention has lower algorithmic complexity and is therefore faster.
3. Compared with methods that detect using CSD features alone, the method of the invention is more accurate.
Brief description of the drawings
Other features, objects and advantages of the present invention will become more apparent by reading the detailed description of the non-limiting embodiments made with reference to the following drawings:
Fig. 1 is a flow diagram of the violence and horror video detection method based on MoSIFT and CSD features provided by the invention.
Detailed description of the embodiments
The present invention is described in detail below in conjunction with specific embodiments. The following embodiments will help those skilled in the art to further understand the present invention, but do not limit the invention in any form. It should be pointed out that those skilled in the art can make several variations and improvements without departing from the concept of the invention; these all belong to the protection scope of the present invention.
Specifically, as shown in Fig. 1, the violence and horror video detection method based on MoSIFT and CSD features provided by the invention comprises the following steps:
Step 1: extract the MoSIFT features of the test videos and the training videos respectively;
Step 2: reduce the quantity and the dimensionality of the extracted MoSIFT features;
Step 3: cluster the reduced training-video MoSIFT features to obtain a cluster-centre matrix;
Step 4: obtain the cluster centres from the cluster-centre matrix, and use them to build bag-of-words statistics for the training and test videos, yielding training samples and test samples;
Step 5: train an SVM classifier on the training samples; the resulting classifier is denoted SVM-1;
Step 6: run SVM-1 on the training samples and the test samples and compute a score for each, denoted the MoSIFT score;
Step 7: extract the CSD features of the training videos and the test videos respectively;
Step 8: use the Mean Diversity Density method, abbreviated MDD, to compute the maximum diversity-density point MP of the training samples;
Step 9: from the maximum diversity-density point MP of the training samples, compute the CSD scores of the training samples and of the test samples;
Step 10: train an SVM classifier on the MoSIFT scores and CSD scores of the training samples; the trained classifier is denoted SVM-2;
Step 11: apply SVM-2 to the MoSIFT and CSD scores of the test samples, thereby classifying the test videos.
Said step 1 comprises:
Step 1.1: split the test videos and training videos into frames;
Step 1.2: process each pair of adjacent frames and compute MoSIFT features;
specifically, an OpenCV-based implementation of the MoSIFT algorithm is used: the constructor of its MoSIFT class is called to obtain the MoSIFT features shared by two consecutive frames, and the features are saved as OpenCV Mat data;
Step 1.3: vertically concatenate the MoSIFT features of every frame of a training video to obtain the corresponding first feature matrix, denoted M1; vertically concatenate the MoSIFT features of every frame of a test video to obtain the corresponding fourth feature matrix, denoted M4; process all videos in this way, obtaining the first feature matrix M1 of each training video and the fourth feature matrix M4 of each test video.
Said step 2 comprises:
Step 2.1: process the rows of the first feature matrix M1 in order, from the first row to the last:
for the current row, find the row nearest to it among all the rows below it, and swap that nearest row with the row immediately below the current one;
concretely, when processing the j-th row, the nearest row is sought among rows j+1 through the last; all vector distances here are Euclidean distances, as illustrated by the sketch below.
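For illustration, a minimal MATLAB sketch of this greedy reordering; pdist2 (Statistics and Machine Learning Toolbox) computes the Euclidean distances, and the function name reorderRows is illustrative:

    function M = reorderRows(M)
    % After row j, move the nearest of the remaining rows up to position j+1.
    for j = 1:size(M,1)-1
        d = pdist2(M(j,:), M(j+1:end,:));      % Euclidean distance to the rows below
        [~, t] = min(d);                       % offset of the nearest row
        M([j+1, j+t], :) = M([j+t, j+1], :);   % swap it with the row just below
    end
    end

After this pass, similar feature rows sit next to each other, so the block averaging of step 2.2 merges near-duplicate descriptors rather than arbitrary ones.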
Step 2.2: divide the reordered M1 into k4 parts of equal row count; average each part vertically, i.e. compute its column means; vertically concatenate the results into a k4 × w1 matrix, the second feature matrix, denoted M2, where w1 is the number of columns of M1;
concretely, denote the number of rows of M1 by m1, and let y = m1 mod k4 and h = (m1 - y)/k4; every h rows are averaged vertically to give a vector u, and all the vectors u are concatenated vertically into M2.
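A minimal sketch of this block averaging; the patent does not state what happens to the y leftover rows, so dropping them here is an assumption:

    m1 = size(M1, 1);
    y  = mod(m1, k4);                          % leftover rows (dropped: an assumption)
    h  = (m1 - y) / k4;                        % rows per part
    M2 = zeros(k4, size(M1, 2));
    for b = 1:k4
        M2(b, :) = mean(M1((b-1)*h+1 : b*h, :), 1);   % column means of part b
    end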
Step 2.3: count the non-zero elements in each column of the second feature matrix M2, and keep the k1 columns of M2 with the highest non-zero counts to obtain the third feature matrix M3. Record the positions of the retained columns, and extract the columns at the same positions from the fourth feature matrix M4 to obtain the sixth feature matrix M6.
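In MATLAB, this column selection can be sketched as follows (k1 as defined above):

    nz = sum(M2 ~= 0, 1);                      % non-zero count of every column
    [~, order] = sort(nz, 'descend');
    cols = order(1:k1);                        % the k1 densest dimensions
    M3 = M2(:, cols);                          % third feature matrix
    M6 = M4(:, cols);                          % same columns taken from M4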
Said step 3 comprises: vertically concatenate the third feature matrices M3 of all training videos into a total training matrix M, and apply K-means clustering to M to obtain the cluster-centre matrix C1, each row of which is one cluster centre;
specifically, MATLAB's kmeans function is applied to the total training matrix M, with the maximum number of iterations limited to 200 and the start option set to 'cluster'. Its return values are taken as [idx, center]: idx gives the cluster assigned to each row of M, and center is the cluster-centre matrix C1.
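A sketch of the corresponding call; the number of clusters c is a free parameter that the patent does not fix:

    opts = statset('MaxIter', 200);            % cap the iterations at 200
    [idx, center] = kmeans(M, c, 'Options', opts, 'Start', 'cluster');
    C1 = center;                               % one cluster centre per row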
Said step 4 comprises:
Step 4.1: from idx, obtain the bag of words of each training sample. For a test video, compute the distance between each row of its sixth feature matrix M6 and every row of the cluster-centre matrix C1, and take the row of C1 nearest to that row of M6 as its visual word;
Step 4.2: count how many times each word occurs in a test video, draw the counts as a histogram, and take the histogram values as the video's vector, denoted v; count the occurrences of each word in a training video's bag of words, draw the counts as a histogram, and take the histogram values as the video's vector, denoted q;
Step 4.3: each test video thus corresponds to a video vector v and each training video to a video vector q, yielding the training samples S, represented by the vectors q, and the test samples T, represented by the vectors v.
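A minimal sketch of the word assignment and histogram for one video; the helper name bagOfWords is illustrative:

    function v = bagOfWords(M6, C1)
    % Assign each MoSIFT row to its nearest cluster centre and count the words.
    v = zeros(1, size(C1, 1));
    for i = 1:size(M6, 1)
        d = pdist2(M6(i,:), C1);               % distance to every cluster centre
        [~, word] = min(d);                    % nearest centre = visual word
        v(word) = v(word) + 1;                 % occurrence histogram
    end
    end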
Said step 5 comprises: train a support vector machine (SVM) classifier on the training samples S; the trained classifier is denoted SVM-1. Specifically, MATLAB's fitcsvm function is applied to the training samples S together with their labels marking each sample as positive or negative, yielding SVM-1 as a MATLAB SVM model object.
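For instance, with labels the vector marking each training video as positive or negative, and S, T holding one video vector per row:

    SVM1 = fitcsvm(S, labels);                        % train SVM-1
    [~, sc] = predict(SVM1, S);  scoreS = sc(:, 2);   % MoSIFT scores, training samples
    [~, sc] = predict(SVM1, T);  scoreT = sc(:, 2);   % MoSIFT scores, test samples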
Said step 6 comprises: use SVM-1 to compute the MoSIFT scores of the training samples and the test samples; the computation is

$$s(x) = \sum_{j} a_j \, y_j \, G(x_j, x) + b$$

where a_j is the estimated SVM parameter of the j-th training sample, y_j is the label of the j-th training sample, G(x_j, x) is the inner product, x_j is the j-th support vector, x is the sample, and b is the estimated SVM bias parameter.
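This is the standard SVM decision function. With the model trained above it can also be evaluated explicitly, assuming fitcsvm's defaults (linear kernel G(x_j, x) = x_j · x, no standardisation):

    G = T * SVM1.SupportVectors';              % inner products G(x_j, x) for each sample
    s = G * (SVM1.Alpha .* SVM1.SupportVectorLabels) + SVM1.Bias;   % one score per row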
Said step 7 comprises:
Step 7.1: divide each video into k2 parts, take the first frame of each part, and denote the set of all frames so taken M7;
Step 7.2: extract the CSD features of the frames in M7. Specifically, the CSD features are extracted with the MPEG7FEX tool, with the dimension set to 128; the features of one video are saved as one file, one frame per line. The features of all frames of one video are called a bag, and the feature of one frame is called an instance.
Said step 8 comprises:
Step 8.1: number the positive bags and the negative bags of the training videos upward from 1 with serial numbers; inside each bag, likewise number the CSD instance of each frame upward from 1;
Step 8.2: starting from instance 1 of positive bag 1, perform the following for every instance in turn:
find the instance nearest to the current instance in the next-numbered positive bag, link it to the current instance, and repeat the process from that instance. Formally, for a chain started from the r-th instance of bag 1, initialise i_1 = r and k = 1, then iterate

$$pm_k = \frac{1}{k} \sum_{t=1}^{k} pi_{t, i_t}$$

$$i_{k+1} = \underset{i_{k+1} \in 1:m}{\arg\min} \; ED\left(pm_k, \, pi_{k+1, i_{k+1}}\right)$$

where i_1 is the index of the linked instance of the 1st bag, pi_1 denotes the instances of the 1st positive bag, k indicates that the k-th bag is being processed, i_k is the index of the linked instance of the k-th bag, m is the number of instances in a positive bag, ED(·,·) is the Euclidean distance between instances, pm_k is the mean of the positive instances linked over the first k bags, pi_{k, i_k} is the i_k-th instance of the k-th positive bag, and arg(·) picks the argument satisfying the bracketed condition;
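A minimal sketch of building one such chain, assuming pos is a cell array in which pos{t} holds the instances of the t-th positive bag (one per row) and r is the starting instance index; the argmin over bag k+1 follows the formula above:

    nBags = numel(pos);
    chain = zeros(nBags, size(pos{1}, 2));
    chain(1, :) = pos{1}(r, :);                % i_1 = r
    for k = 1:nBags-1
        pmk = mean(chain(1:k, :), 1);          % pm_k, mean of the linked instances
        d = pdist2(pmk, pos{k+1});             % ED(pm_k, pi_{k+1,i}) for every i
        [~, inext] = min(d);                   % i_{k+1}
        chain(k+1, :) = pos{k+1}(inext, :);
    end
    u = mean(chain, 1);                        % chain mean, one element of the set P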
Step 8.3: denote the mean of one chain of linked instances u; the set of the means of all chains is denoted P;
Step 8.4: splice all negative samples in the training videos into a matrix M8: mix the instances of all negative bags together, each instance being one row vector, to obtain M8;
Step 8.5: process the rows of M8 in order, from the first row to the last:
for the current row, find the row nearest to it among all the rows below it, and swap that nearest row with the row immediately below the current one;
Step 8.6: divide M8 by rows into k3 parts; average each part vertically to obtain a vector w; the set of all such w is denoted N;
Step 8.7: for each element of P, compute the sum of its distances to all elements of N; take the element of P whose summed distance to all elements of N is largest and denote it MP: it is the maximum diversity-density point. The computation is

$$w = \underset{w \in 1:m}{\arg\max} \sum_{i=1}^{k_3} ED\left(pm_w, \, nm_i\right)$$

$$MP = pm_w$$

where pm_w is the w-th element of pm (the set of the chain means, i.e. P) and nm_i is the mean of the i-th negative part.
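Treating P and N as matrices with one mean per row, the selection of MP can be sketched as:

    D = pdist2(P, N);                          % pairwise Euclidean distances
    [~, w] = max(sum(D, 2));                   % chain mean farthest, in total, from N
    MP = P(w, :);                              % maximum diversity-density point

Intuitively, MP is the positive prototype that best keeps its distance from everything the negative (non-violent) bags look like.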
Said step 9 comprises: compute the CSD score of each test sample from the maximum diversity-density point MP of the training samples; the score of a bag is computed from the Euclidean distances between MP and the instances bi_t of the bag, where bi_t denotes the t-th instance of the test bag;
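As a sketch under an assumed aggregation rule, a bag can be scored by the distance from MP to its nearest instance; B holds the bag's instances, one per row, and the sign is flipped so that larger scores mean closer to the violent prototype:

    d = pdist2(MP, B);                         % ED(MP, bi_t) for every instance
    csdScore = -min(d);                        % assumed aggregation: nearest instance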
Said step 10 comprises: combine the MoSIFT score and the CSD score of every training sample and test sample into a 2-dimensional vector w; the set of the vectors w of the training samples is denoted R, and the SVM classifier SVM-2 is trained on the set R.
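A sketch of the score fusion and of the final classification of step 11; scoreS and scoreT are the MoSIFT scores computed in step 6, and csdS, csdT the corresponding per-video CSD score columns (names illustrative):

    R = [scoreS, csdS];                        % one 2-dimensional vector w per training video
    SVM2 = fitcsvm(R, labels);                 % train SVM-2 on the score pairs
    pred = predict(SVM2, [scoreT, csdT]);      % final labels for the test videos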
Specific embodiments of the present invention have been described above. It should be understood that the invention is not limited to the particular embodiments described; those skilled in the art can make various variations or modifications within the scope of the claims, and this does not affect the substance of the invention.

Claims (10)

1. A violence and horror video detection method based on MoSIFT and CSD features, characterised in that it comprises the following steps:
Step 1: extract the MoSIFT features of the test videos and the training videos respectively;
Step 2: reduce the quantity and the dimensionality of the extracted MoSIFT features of the test and training videos;
Step 3: cluster the reduced training-video MoSIFT features to obtain a cluster-centre matrix;
Step 4: obtain the cluster centres from the cluster-centre matrix, and use them to build bag-of-words statistics for the training and test videos, yielding training samples and test samples;
Step 5: train an SVM classifier on the training samples; the resulting classifier is denoted SVM-1;
Step 6: run SVM-1 on the training samples and the test samples respectively and compute a score for each, denoted the MoSIFT score;
Step 7: extract the CSD features of the training videos and the test videos respectively;
Step 8: use the Mean Diversity Density method, abbreviated MDD, to compute the maximum diversity-density point MP of the training samples;
Step 9: from the maximum diversity-density point MP of the training samples, compute the CSD scores of the training samples and of the test samples respectively;
Step 10: train an SVM classifier on the MoSIFT scores and CSD scores of the training samples; the trained classifier is denoted SVM-2;
Step 11: apply SVM-2 to the MoSIFT and CSD scores of the test samples, thereby classifying the test videos.
2. The violence and horror video detection method based on MoSIFT and CSD features according to claim 1, characterised in that said step 1 comprises:
Step 1.1: split the test videos and training videos into frames;
Step 1.2: process each pair of adjacent frames and compute MoSIFT features;
specifically, an OpenCV-based implementation of the MoSIFT algorithm is used: the constructor of its MoSIFT class is called to obtain the MoSIFT features shared by two consecutive frames, and the features are saved as OpenCV Mat data;
Step 1.3: vertically concatenate the MoSIFT features of every frame of a training video to obtain the corresponding first feature matrix, denoted M1; vertically concatenate the MoSIFT features of every frame of a test video to obtain the corresponding fourth feature matrix, denoted M4; process all videos in this way, obtaining the first feature matrix M1 of each training video and the fourth feature matrix M4 of each test video.
3. The violence and horror video detection method based on MoSIFT and CSD features according to claim 2, characterised in that said step 2 comprises:
Step 2.1: process the rows of the first feature matrix M1 in order, from the first row to the last:
for the current row, find the row nearest to it among all the rows below it, and swap that nearest row with the row immediately below the current one;
Step 2.2: divide the reordered M1 into k4 parts of equal row count; average each part vertically, i.e. compute its column means; vertically concatenate the results into a k4 × w1 matrix, the second feature matrix, denoted M2, where w1 is the number of columns of M1;
Step 2.3: count the non-zero elements in each column of M2; keep the k1 columns with the highest non-zero counts to obtain the third feature matrix M3; record the indices of the retained columns, and extract the same columns from the fourth feature matrix M4 to obtain the sixth feature matrix M6.
4. The violence and horror video detection method based on MoSIFT and CSD features according to claim 3, characterised in that said step 3 comprises: vertically concatenate the third feature matrices M3 of all training videos into a total training matrix M, and apply K-means clustering to M to obtain the cluster-centre matrix C1, each row of which is one cluster centre;
specifically, MATLAB's kmeans function is applied to the total training matrix M, with the maximum number of iterations limited to 200 and the start option set to 'cluster'. Its return values are taken as [idx, center]: idx gives the cluster assigned to each row of M, and center is the cluster-centre matrix C1.
5. The violence and horror video detection method based on MoSIFT and CSD features according to claim 4, characterised in that said step 4 comprises:
Step 4.1: from idx, obtain the bag of words of each training sample. For a test video, compute the distance between each row of its sixth feature matrix M6 and every row of the cluster-centre matrix C1, and take the row of C1 nearest to that row of M6 as its visual word;
Step 4.2: count how many times each word occurs in a test video, draw the counts as a histogram, and take the histogram values as the video's vector, denoted v; count the occurrences of each word in a training video's bag of words, draw the counts as a histogram, and take the histogram values as the video's vector, denoted q;
Step 4.3: each test video thus corresponds to a video vector v and each training video to a video vector q, yielding the training samples S, represented by the vectors q, and the test samples T, represented by the vectors v.
6. The violence and horror video detection method based on MoSIFT and CSD features according to claim 5, characterised in that said step 5 comprises: train a support vector machine (SVM) classifier on the training samples S; the trained classifier is denoted SVM-1. Specifically, MATLAB's fitcsvm function is applied to the training samples S together with their labels marking each sample as positive or negative, yielding SVM-1 as a MATLAB SVM model object.
7. The violence and horror video detection method based on MoSIFT and CSD features according to claim 6, characterised in that said step 6 comprises: use SVM-1 to compute the MoSIFT scores of the training samples and the test samples, calculated as

$$s(x) = \sum_{j} a_j \, y_j \, G(x_j, x) + b$$

where a_j is the estimated parameter of the j-th training sample in the SVM, y_j is the label of the j-th training sample, G(x_j, x) is the inner product, x_j is the support vector of the j-th training sample, x is the sample, and b is the estimated SVM bias parameter.
8. The violence and horror video detection method based on MoSIFT and CSD features according to claim 7, characterised in that said step 7 comprises:
Step 7.1: divide each video into k2 parts, take the first frame of each part, and denote the set of all frames so taken M7;
Step 7.2: extract the CSD features of the frames in M7. Specifically, the CSD features are extracted with the MPEG7FEX tool; the CSD feature of one frame is a vector, with the dimension set to 128; the features of one video are saved as one file, with the CSD feature of each frame saved as one line of that file. The features of all frames of one video are called a bag, and the feature of one frame is called an instance.
9. The violence and horror video detection method based on MoSIFT and CSD features according to claim 8, characterised in that said step 8 comprises:
Step 8.1: number the positive bags and the negative bags of the training videos upward from 1 with serial numbers; inside each bag, likewise number the CSD instance of each frame upward from 1;
Step 8.2: starting from instance 1 of positive bag 1, perform the following for every instance in turn:
find the instance nearest to the current instance in the next-numbered positive bag, link it to the current instance, and repeat the process from that instance; formally, for a chain started from the r-th instance of bag 1, initialise i_1 = r and k = 1, then iterate

$$pm_k = \frac{1}{k} \sum_{t=1}^{k} pi_{t, i_t}$$

$$i_{k+1} = \underset{i_{k+1} \in 1:m}{\arg\min} \; ED\left(pm_k, \, pi_{k+1, i_{k+1}}\right)$$

where i_1 is the index of the linked instance of the 1st bag, k indicates that the k-th bag is being processed, i_k is the index of the linked instance of the k-th bag, m is the number of instances in a positive bag, ED(·,·) is the Euclidean distance between instances, pm_k is the mean of the positive instances linked over the first k bags, pi_{k, i_k} is the i_k-th instance of the k-th positive bag, arg(·) picks the argument satisfying the bracketed condition, and pi_{t, i_t} is the i_t-th instance of the t-th positive bag;
Step 8.3: denote the mean of one chain of linked instances u; the set of the means of all chains is denoted P;
Step 8.4: splice all negative samples in the training videos into a matrix M8: mix the instances of all negative bags together, each instance being one row vector, to obtain M8;
Step 8.5: process the rows of M8 in order, from the first row to the last:
for the current row, find the row nearest to it among all the rows below it, and swap that nearest row with the row immediately below the current one;
Step 8.6: divide M8 by rows into k3 parts; average each part vertically to obtain a vector w; the set of all such w is denoted N;
Step 8.7: for each element of P, compute the sum of its distances to all elements of N; take the element of P whose summed distance to all elements of N is largest and denote it MP: it is the maximum diversity-density point; the computation is

$$w = \underset{w \in 1:m}{\arg\max} \sum_{i=1}^{k_3} ED\left(pm_w, \, nm_i\right)$$

$$MP = pm_w$$

where pm_w is the w-th element of pm (the set of the chain means, i.e. P) and nm_i is the mean of the i-th negative part.
10. The violence and horror video detection method based on MoSIFT and CSD features according to claim 9, characterised in that said step 9 comprises: compute the CSD score of each test sample from the maximum diversity-density point MP of the training samples, the score of a bag being computed from the Euclidean distances between MP and the instances bi_t of the bag, where bi_t denotes the t-th instance of the test bag;
and said step 10 comprises: combine the MoSIFT score and the CSD score of every training sample and test sample into a 2-dimensional vector w; the set of the vectors w of the training samples is denoted R, and the SVM classifier is trained on the set R.

Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
GR01: Patent grant