CN105005772A - Video scene detection method - Google Patents
- Publication number: CN105005772A (application CN201510427821.8A)
- Authority
- CN
- China
- Prior art keywords
- video
- formula
- vector
- represent
- feature
- Prior art date
- Legal status: Granted
Classifications
- G—PHYSICS › G06—COMPUTING; CALCULATING OR COUNTING › G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING › G06V20/00—Scenes; Scene-specific elements › G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/41—Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
- G06V20/46—Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
Abstract
The invention discloses a video scene detection method in which a computer, rather than a human operator, inspects video data and recognizes the scenes it contains. The method comprises an offline model-training process and a video scene detection process. In the offline training process, features — both semantic and spatio-temporal — are extracted from each video in a training set; the feature vectors are labelled by category to obtain a sample set; and the sample set is iteratively trained within a multiple-kernel learning framework to produce an offline model. In the detection process, a surveillance video source is accessed; the video is sampled to obtain a short clip; features are extracted from the clip; and the offline model is loaded to classify those features and produce a detection result. By letting a computer replace manual inspection, the method improves detection efficiency, lowers cost, and facilitates data storage and retrieval.
Description
Technical field
The present invention relates to video information analysis technology, and in particular to a video scene detection method.
Background technology
Video surveillance systems are increasingly widespread and play an irreplaceable role in maintaining public order, solving criminal cases, and similar tasks. In the field of video surveillance, recognizing abnormal scenes is very important: accurately detecting behavior that endangers public safety, such as group fights, or detecting illegal street-vendor activity, is significant for social administration and city management.
A video surveillance system comprises front-end cameras, transmission equipment, and a video monitoring platform. The cameras capture the video signal, which is compressed and sent through the transmission equipment to the platform; the platform then handles tasks such as data storage and anomaly detection. Surveillance video is typically large in volume and highly redundant; monitoring and processing it manually is not only time-consuming and laborious, but its accuracy also cannot be guaranteed.
With the development of computer vision, computers can recognize objects such as people, animals, and cars in images, and are gradually taking over some simple tasks in place of humans. However, existing scene-recognition techniques mainly target static images. Compared with a static image, a video has a time dimension and contains background-change information and the motion information of target objects, so it is more complex to process. At present, video data is mostly monitored and processed manually to find the abnormal scenes it contains, which is time-consuming, costly, and inefficient; accuracy cannot be guaranteed, and it is difficult to store, retrieve, and later reuse the results of video analysis efficiently.
Summary of the invention
To overcome the above deficiencies of the prior art, the invention provides a video scene detection method in which a computer, instead of a human operator, inspects video data and finds the abnormal scenes it contains. This greatly improves detection efficiency, reduces cost, and facilitates future storage and retrieval of the data.
The technical solution provided by the invention is:
A video scene detection method in which a computer, instead of a human operator, inspects video data and recognizes the scenes in it. The method comprises an offline model-training process and a video scene detection process:
1) Offline model-training process, performing the following operations:
11) prepare a set of training video samples;
12) extract features, in vector form, from each video in the training set, comprising semantic feature extraction and spatio-temporal feature extraction;
13) label the feature vectors by category to obtain a sample set, each sample containing a semantic feature vector, a spatio-temporal feature vector, and the corresponding category label;
14) iteratively train on the sample set of step 13) within a multiple-kernel learning framework to obtain an offline model.
2) Video scene detection process, performing the following operations:
21) access the surveillance video source to be inspected;
22) set a sampling mode and sample the video to obtain a short clip, which is the detection target;
23) extract features from the short clip of step 22) — a semantic feature vector and a spatio-temporal feature vector — with the same extraction method as in step 12);
24) load the offline model into the multiple-kernel learning framework, classify the features, determine whether the clip contains the given scene, and obtain the detection result.
In the above method, the training video samples of step 11) further comprise two classes: one class is a set of videos that contain a street-vendor scene, the other a set of videos that do not.
In step 12), extracting features from each video in the training set comprises a semantic feature extraction process and a spatio-temporal feature extraction process.
The semantic feature extraction process specifically comprises the following steps:
121a) For each video, compute the score of every frame with a key-frame extraction method and choose the m highest-scoring frames as key frames. The score is computed as follows:
Score(f_k) = α·(Sdiff(f_k) − Min_Sdiff)/(Max_Sdiff − Min_Sdiff) + β·(MoValue(f_k) − Min_MoValue)/(Max_MoValue − Min_MoValue) (formula 1)
Sdiff(f_k) = ∑_{i,j} |I_k(i,j) − I_{k−1}(i,j)| (formula 2)
MoValue(f_k) = (1/N_k)·∑_i (|Δx_i^k| + |Δy_i^k|) (formula 3)
In formulas 1–3, f_k denotes the k-th frame of the video sequence; Score(f_k) is the score of the k-th frame; Sdiff(f_k) is the difference between this frame and the previous one; α and β are weights; Max_Sdiff and Min_Sdiff are the maximum and minimum difference between adjacent frames; Δx_i^k and Δy_i^k are the horizontal and vertical components of the optical flow of pixel i in frame k; N_k is the number of pixels in frame k; MoValue(f_k) is the optical-flow intensity of frame k; Max_MoValue and Min_MoValue are the maximum and minimum optical-flow intensity over all frames;
121b) for each of the m chosen frames, extract picture-level semantic features with the Dartmouth Classeme feature extractor, obtaining the semantic feature vector of that frame;
121c) concatenate the m real-valued feature vectors obtained from the m frames into a single m×2659-dimensional vector, which is the semantic feature vector of the video.
In one embodiment of the invention, the m frames of step 121a) are three frames. The spatio-temporal feature extraction process specifically comprises the following steps:
122a) for each training video, extract MoSIFT features with the MoSIFT feature extractor;
122b) based on all MoSIFT features in the video set, generate a visual dictionary;
122c) using this visual dictionary, perform Fisher vector encoding on each video to obtain a 2·D·K-dimensional Fisher vector;
122d) apply principal component analysis (PCA) to the Fisher vector to obtain a low-dimensional vector, which is the spatio-temporal feature vector of the video.
In step 122b), the visual dictionary is specifically generated with a Gaussian mixture model.
In the above method, the multiple-kernel learning framework of step 14) is the one in the Shogun toolkit; it combines kernel functions by linear weighting, expressed as formula 9:
K(x_i, x_j) = ∑_{k=1..S} β_k·K_k(x_i, x_j) (formula 9)
In formula 9, K_k(x_i, x_j) is the k-th kernel function; β_k is the weight of the k-th kernel; x_i and x_j are the features of video samples i and j that correspond to this kernel.
Two polynomial kernels are chosen, corresponding to the semantic feature and the spatio-temporal feature respectively. The polynomial kernel is given by formula 10:
K(x, x_i) = ((x·x_i) + 1)^d (formula 10)
In formula 10, x and x_i are input-space vectors and d is the degree.
The constrained optimization problem of multiple-kernel learning is expressed as:
min (1/2)·(∑_{k=1..S} ‖w_k‖)² + C·∑_{i=1..N} ξ_i
s.t. y_i·(∑_{k=1..S} w_k·Φ_k(x_i) + b) ≥ 1 − ξ_i, ξ_i ≥ 0, i = 1, …, N (formula 11)
In formula 11, N is the number of input-space vectors; ξ_i is the slack variable of vector i; S is the number of kernels; w_k is, for the k-th kernel, the hyperplane normal that sets the margin to the support vectors; C is the penalty factor; in the constraints, y_i is the class of the vector (1 or −1), Φ_k is the high-dimensional mapping of the k-th kernel, and b is the offset.
The multiple-kernel learning model is solved through the Lagrangian dual transformation, giving the objective function:
max ∑_{i=1..N} α_i − (1/2)·∑_{i,j} α_i·α_j·y_i·y_j·∑_{k=1..S} β_k·K_k(x_i, x_j)
s.t. 0 ≤ α_i ≤ C, ∑_{i=1..N} α_i·y_i = 0, β_k ≥ 0, ‖β‖_p ≤ 1 (formula 12)
In formula 12, N is the number of input-space vectors; x_i and x_j are input-space vectors; α_i and α_j are the corresponding weights, obtained by learning; y_i and y_j are the corresponding classes; S is the number of kernels; β_k is the weight of the k-th kernel, also obtained by learning; in the constraints, C is the penalty factor and p is the normalization norm.
In one embodiment of the invention, the degree d of the polynomial kernel in formula 10 is 2.
The video sampling of step 22) may be time-based or frame-based: time-based sampling takes one 10-second sample every t seconds, forming a short clip; frame-based sampling takes one frame every k frames until 240 frames have been collected, forming a short clip. The short clip is the detection target.
Compared with the prior art, the beneficial effects of the invention are:
The invention provides a video scene detection method in which a computer replaces manual inspection of video data. The method extracts video semantic features from an external knowledge base, uses a key-frame extraction algorithm that considers both background and motion information, and solves the video scene detection problem with multiple-kernel learning. The method comprises an offline model-training process and a video scene detection process; by recognizing the scenes in a video, the abnormal scenes it contains can be found. The technical solution greatly improves detection efficiency, reduces cost, and facilitates future storage and retrieval of the data.
Brief description of the drawings
Fig. 1 is a flow chart of obtaining the offline discrimination model through the training process of the invention.
Fig. 2 is a flow chart of the video scene detection process provided by the invention.
Embodiments
The invention is further described below through embodiments with reference to the drawings, without limiting its scope in any way.
The invention provides a video scene detection method in which a computer, instead of a human operator, inspects video data and recognizes the scenes in it. The method comprises an offline model-training process and a video scene detection process:
1) Offline model-training process, performing the following operations:
11) prepare a set of training video samples;
12) extract features, in vector form, from each video in the training set, comprising semantic feature extraction and spatio-temporal feature extraction;
13) label the feature vectors by category to obtain a sample set, each sample containing a semantic feature vector, a spatio-temporal feature vector, and the corresponding category label;
14) iteratively train on the sample set of step 13) within a multiple-kernel learning framework to obtain an offline model.
2) Video scene detection process, performing the following operations:
21) access the surveillance video source to be inspected;
22) set a sampling mode and sample the video to obtain a short clip, which is the detection target;
23) extract features from the short clip of step 22) — a semantic feature vector and a spatio-temporal feature vector — with the same extraction method as in step 12);
24) load the offline model into the multiple-kernel learning framework, classify the features, determine whether the clip contains the given scene, and obtain the detection result.
This embodiment uses surveillance video to detect whether the video contains a street-vendor scene. The detection method comprises an offline model-training process and a video scene detection process.
1) Offline model-training process: use the training video samples to train a discrimination model offline.
11) Prepare training video samples;
In this embodiment, the training samples comprise two classes: a set of videos that contain a street-vendor scene, and a set of videos that do not;
12) extract features — semantic and spatio-temporal — from each video in the training set;
The features characterizing a video comprise semantic features and spatio-temporal features, both in vector form. Feature extraction yields two vectors per video: a semantic feature vector, characterizing semantics, and a spatio-temporal feature vector, characterizing the space-time dimension.
121) The semantic feature extraction process specifically comprises:
121a) For each video, compute the score of every frame with a key-frame extraction method and choose the m highest-scoring frames as key frames. The score is computed as follows:
Score(f_k) = α·(Sdiff(f_k) − Min_Sdiff)/(Max_Sdiff − Min_Sdiff) + β·(MoValue(f_k) − Min_MoValue)/(Max_MoValue − Min_MoValue) (formula 1)
Sdiff(f_k) = ∑_{i,j} |I_k(i,j) − I_{k−1}(i,j)| (formula 2)
MoValue(f_k) = (1/N_k)·∑_i (|Δx_i^k| + |Δy_i^k|) (formula 3)
In formulas 1–3, f_k denotes the k-th frame of the video sequence; Score(f_k) is the score of the k-th frame; Sdiff(f_k) is the difference between this frame and the previous one (the pixel-value difference between the two frames; for an RGB color image, it is the average over the R, G and B channels); α and β are weights; Max_Sdiff and Min_Sdiff are the maximum and minimum difference between adjacent frames; Δx_i^k and Δy_i^k are the horizontal and vertical components of the optical flow of pixel i in frame k; N_k is the number of pixels in frame k; MoValue(f_k) is the optical-flow intensity of frame k; Max_MoValue and Min_MoValue are the maximum and minimum optical-flow intensity over all frames.
The above key-frame extraction method selects key frames by jointly considering scene-change information and motion information. This embodiment sets m = 3: the score of every frame is computed, and the three highest-scoring frames are chosen as key frames.
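The key-frame scoring of formulas 1–3 can be sketched as follows — a minimal NumPy sketch, not the patented implementation; the per-pixel flow fields are assumed to be supplied by some external optical-flow estimator, and the function and parameter names are illustrative:

```python
import numpy as np

def sdiff(curr, prev):
    # Formula 2: sum of absolute pixel differences between adjacent frames.
    return np.abs(curr.astype(float) - prev.astype(float)).sum()

def movalue(flow_dx, flow_dy):
    # Formula 3: mean optical-flow magnitude over the frame's pixels.
    return (np.abs(flow_dx) + np.abs(flow_dy)).mean()

def score_frames(frames, flows, alpha=0.5, beta=0.5):
    """Formula 1: weighted sum of the normalized frame difference and
    the normalized optical-flow intensity, for frames 1..len(frames)-1."""
    sd = np.array([sdiff(frames[k], frames[k - 1]) for k in range(1, len(frames))])
    mv = np.array([movalue(dx, dy) for dx, dy in flows])
    def norm(v):
        rng = v.max() - v.min()
        return (v - v.min()) / rng if rng > 0 else np.zeros_like(v)
    return alpha * norm(sd) + beta * norm(mv)

def pick_keyframes(scores, m=3):
    # Indices (into frames[1:]) of the m highest-scoring frames.
    return sorted(np.argsort(scores)[-m:].tolist())
```

With m = 3, `pick_keyframes(score_frames(frames, flows))` returns the three key-frame indices used in step 121b).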
121b) For each of the m chosen frames, extract picture-level semantic features with the Dartmouth Classeme feature extractor to obtain the frame's semantic feature vector;
The Classeme feature extractor is a semantic extraction tool based on an external knowledge base; it is a descriptor expressing image attributes. The Classeme attribute descriptor covers 2659 image attributes (i.e. 2659 dimensions), corresponding to 2659 concepts, including objects (e.g. basketball, bicycle), people (e.g. football player, boy), and places (e.g. swimming pool, outdoors). Each frame yields one 2659-dimensional real-valued vector.
121c) concatenate the m real-valued vectors obtained from the m frames into an m×2659-dimensional vector, the semantic feature vector of the video;
122) The spatio-temporal feature extraction process specifically comprises:
122a) for each training video, extract MoSIFT features;
The training videos include both those containing a street-vendor scene and those that do not. This embodiment uses the MoSIFT feature extractor; the reference (M.-Y. Chen and A. Hauptmann, "MoSIFT: Recognizing human actions in surveillance videos," CMU-CS-09-161, Carnegie Mellon University, 2009) describes the process of extracting MoSIFT features. MoSIFT is a spatio-temporal feature that considers both the spatial dimension and the time dimension; each generated feature has 256 dimensions, denoted D.
Extracting MoSIFT features from a training video comprises two steps: first detecting interest points, then building a description of each interest point.
Interest-point detection comprises finding local extrema as candidate interest points and deciding whether each candidate qualifies as an interest point:
Build a multi-scale difference-of-Gaussians pyramid and find local extrema as candidate interest points. The difference of Gaussians is computed as:
D(x, y, kδ) = L(x, y, kδ) − L(x, y, (k−1)δ) (formula 4)
In formula 4, x and y are pixel coordinates in the image; kδ is the standard deviation of the Gaussian at pyramid layer k; L(x, y, kδ) is the convolution of the layer-k Gaussian with the image; L(x, y, (k−1)δ) is the convolution of the layer-(k−1) Gaussian with the image; and D(x, y, kδ) is the difference result at layer k.
Optical-flow analysis then decides whether each candidate carries enough motion information — i.e. whether its motion intensity is large enough — to qualify as an interest point.
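The two-stage detection above can be sketched as follows — a minimal sketch, assuming SciPy's `gaussian_filter` as the Gaussian convolution L of formula 4; the candidate test keeps only the strongest responses (a simplification of the full neighborhood-extremum test), and the supplied flow fields stand in for a real optical-flow estimate:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def dog_layer(image, k, delta):
    # Formula 4: D(x, y, k*delta) = L(x, y, k*delta) - L(x, y, (k-1)*delta)
    return gaussian_filter(image, k * delta) - gaussian_filter(image, (k - 1) * delta)

def candidate_points(dog, top_fraction=0.01):
    # Keep the strongest DoG responses as candidate interest points
    # (a simplification of the full local-extremum test across scales).
    thresh = np.quantile(np.abs(dog), 1 - top_fraction)
    ys, xs = np.nonzero(np.abs(dog) >= thresh)
    return list(zip(ys.tolist(), xs.tolist()))

def filter_by_flow(points, flow_dx, flow_dy, min_motion=1.0):
    # A candidate becomes an interest point only if its motion is strong enough.
    return [(y, x) for (y, x) in points
            if np.hypot(flow_dx[y, x], flow_dy[y, x]) >= min_motion]
```

Static corners thus get discarded: a candidate with near-zero flow never survives `filter_by_flow`, which is exactly the motion test that distinguishes MoSIFT from plain SIFT.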
Once an interest point is obtained, the MoSIFT extractor combines a SIFT (scale-invariant feature transform) descriptor with an optical-flow descriptor into a single 256-dimensional description of the point. SIFT is a classic feature for characterizing images; it is scale-invariant and describes an interest point with a 128-dimensional real-valued vector. The optical flow is described in a similar way to SIFT, and concatenating the two yields the 256-dimensional real-valued vector.
122b) based on all MoSIFT features in the video set, generate a visual dictionary;
The method generates the visual dictionary with a Gaussian mixture model, where K denotes the size of the dictionary. The main idea of the mixture model is to assume that the distribution of MoSIFT feature points is a linear superposition of K Gaussian distributions; the method takes K = 64. The mathematical form of the mixture is:
P(y|θ) = ∑_{k=1..K} α_k·φ(y|θ_k) (formula 5)
In formula 5, P(y|θ) is the probability distribution of the MoSIFT features; α_k is the weight of each Gaussian; K is the size of the visual dictionary; y is a MoSIFT feature vector; θ is the parameter set of the distribution; θ_k are the parameters of the k-th Gaussian, and φ is the Gaussian density.
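Fitting the mixture of formula 5 can be sketched with scikit-learn's `GaussianMixture` — an assumed implementation, since the patent names no library; the toy descriptors and the reduced K and dimensionality are illustrative stand-ins for the pooled 256-D MoSIFT features with K = 64:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def build_visual_dictionary(descriptors, k=64, seed=0):
    """Fit a K-component GMM to the pooled descriptors (formula 5).

    The fitted weights (alpha_k), means and covariances (theta_k) together
    form the visual dictionary later used for Fisher vector encoding."""
    return GaussianMixture(n_components=k, covariance_type='diag',
                           random_state=seed).fit(descriptors)

rng = np.random.default_rng(0)
# Toy stand-in for pooled MoSIFT descriptors: two tight clusters in 8-D.
descs = np.vstack([rng.normal(0.0, 0.1, (200, 8)),
                   rng.normal(3.0, 0.1, (200, 8))])
gmm = build_visual_dictionary(descs, k=2)
```

Diagonal covariances are the usual choice before Fisher vector encoding, which keeps the encoded dimension at 2·D·K as stated in step 122c).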
122c) using the visual dictionary, perform Fisher vector encoding on each video to obtain a 2·D·K-dimensional Fisher vector;
122d) apply principal component analysis (PCA) to the Fisher vector to obtain a low-dimensional vector, the spatio-temporal feature vector of the video;
With D = 256 and K = 64, the 2·D·K-dimensional Fisher vector has 32768 dimensions. PCA uses the idea of dimensionality reduction to transform many variables into a few composite variables — the principal components — which retain most of the information in the original variables. The PCA applied to the Fisher vectors in this method proceeds as follows:
Denote the Fisher vector dimension by p and let x_i = (x_i1, x_i2, …, x_ip)^T, i = 1, 2, …, N, form the feature matrix, where x_ij is the j-th feature value of the i-th sample. The feature matrix is transformed as follows:
Z_ij = (x_ij − x̄_j)/s_j, i = 1, …, n, j = 1, …, p (formula 6)
where Z_ij is the value in row i, column j of the standardized matrix Z; x̄_j and s_j are the mean and standard deviation of the j-th feature; and n is the number of samples.
The correlation matrix R is then computed from Z:
R = (Z^T·Z)/(n − 1) (formula 7)
Then the characteristic equation of the correlation matrix R is solved:
|R − λI_p| = 0 (formula 8)
In formula 8, R is the correlation matrix, I_p is the identity matrix, and λ is an eigenvalue.
Solving formula 8 yields p characteristic roots; the method keeps M = 1168 principal components. Finally, the original feature matrix is projected onto the M principal directions to obtain the final spatio-temporal feature.
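The standardization, correlation matrix and eigen-decomposition of formulas 6–8 can be sketched in NumPy — a minimal sketch, with m far smaller than the M = 1168 of the method, on toy data:

```python
import numpy as np

def pca_project(X, m):
    """Formulas 6-8: standardize X (formula 6), build the correlation
    matrix R (formula 7), solve |R - lambda*I| = 0 (formula 8), and
    project onto the m eigenvectors with the largest eigenvalues."""
    n = X.shape[0]
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # formula 6
    R = Z.T @ Z / (n - 1)                              # formula 7
    eigvals, eigvecs = np.linalg.eigh(R)               # roots of formula 8
    order = np.argsort(eigvals)[::-1]                  # largest first
    return Z @ eigvecs[:, order[:m]]
```

Since R is the correlation matrix of standardized data, `eigh` (for symmetric matrices) is the appropriate solver, and the variance captured by each projected column equals its eigenvalue.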
13) Label the feature vectors by category to obtain a sample set, each sample containing the two feature vectors and the corresponding category label;
In this embodiment, the feature vectors are labelled as follows: videos containing a street-vendor scene are labelled 1, representing positive examples, and videos without one are labelled −1, representing negative examples. This yields a sample set in which each sample contains two feature vectors and its category label;
14) Iteratively train on the above training sample set within a multiple-kernel learning framework;
The invention adopts the multiple-kernel learning framework of the Shogun toolkit, which combines kernel functions by linear weighting:
K(x_i, x_j) = ∑_{k=1..S} β_k·K_k(x_i, x_j) (formula 9)
In formula 9, K_k(x_i, x_j) is the k-th kernel function; β_k is the weight of the k-th kernel; x_i and x_j are the features of video samples i and j that correspond to this kernel. The method uses two polynomial kernels: one corresponding to the semantic feature and one to the spatio-temporal feature. The polynomial kernel is:
K(x, x_i) = ((x·x_i) + 1)^d (formula 10)
In formula 10, x and x_i are input-space vectors and d is the degree; in this method the degree of the polynomial kernel is 2.
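Formulas 9 and 10 can be sketched directly in NumPy — a minimal sketch in which the semantic and spatio-temporal feature matrices are small stand-ins for the real m×2659 and M-dimensional vectors:

```python
import numpy as np

def poly_kernel(X, Y, d=2):
    # Formula 10: K(x, x_i) = ((x . x_i) + 1)^d, computed for all row pairs.
    return (X @ Y.T + 1.0) ** d

def combined_kernel(feats_a, feats_b, betas, d=2):
    # Formula 9: linear combination of one polynomial kernel per feature type
    # (here two: semantic and spatio-temporal).
    return sum(b * poly_kernel(Xa, Xb, d)
               for b, (Xa, Xb) in zip(betas, zip(feats_a, feats_b)))
```

The β_k here are fixed by hand; in the actual framework they are learned jointly with the SVM weights, as the dual problem below makes explicit.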
The constrained optimization problem of multiple-kernel learning can be expressed as:
min (1/2)·(∑_{k=1..S} ‖w_k‖)² + C·∑_{i=1..N} ξ_i
s.t. y_i·(∑_{k=1..S} w_k·Φ_k(x_i) + b) ≥ 1 − ξ_i, ξ_i ≥ 0, i = 1, …, N (formula 11)
In formula 11, N is the number of input-space vectors; ξ_i is the slack variable of vector i; S is the number of kernels; w_k is, for the k-th kernel, the hyperplane normal that sets the margin to the support vectors; C is the penalty factor; in the constraints, y_i is the class of the vector (1 or −1), Φ_k is the high-dimensional mapping of the k-th kernel, and b is the offset.
As with an SVM, the multiple-kernel learning model adopted in this method is solved by transforming it, via the Lagrangian, into its dual problem. The objective function of the dual optimization problem is:
max ∑_{i=1..N} α_i − (1/2)·∑_{i,j} α_i·α_j·y_i·y_j·∑_{k=1..S} β_k·K_k(x_i, x_j)
s.t. 0 ≤ α_i ≤ C, ∑_{i=1..N} α_i·y_i = 0, β_k ≥ 0, ‖β‖_p ≤ 1 (formula 12)
In formula 12, N is the number of input-space vectors; x_i and x_j are input-space vectors; α_i and α_j are the corresponding weights, obtained by learning; y_i and y_j are the corresponding classes; S is the number of kernels; β_k is the weight of the k-th kernel, also obtained by learning; in the constraints, C is the penalty factor and p is the normalization norm. The method sets p = 2 and C = 8.
15) Multiple-kernel training yields the offline model;
The offline model consists of the unknown parameters obtained by training, chiefly the support-vector samples and their weights, and the kernel functions and their corresponding weights;
2) Video scene detection process
21) Access the surveillance video source to be inspected;
22) set a sampling mode and sample the video to obtain a short clip, the detection target;
Sampling may be time-based or frame-based: time-based sampling takes one 10-second sample every t seconds, forming a short clip; frame-based sampling takes one frame every k frames until 240 frames have been collected, forming a short clip. The short clip is the detection target.
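The two sampling modes can be sketched as index arithmetic over a frame sequence — a minimal sketch; `fps` and the defaults t = 60 and k = 5 are illustrative assumptions, since the patent leaves t and k unspecified:

```python
def sample_by_time(n_frames, fps, t=60, clip_seconds=10):
    """Every t seconds, take one clip of clip_seconds seconds.
    Returns a list of (start_frame, end_frame) index ranges."""
    clips = []
    start = 0
    while start + clip_seconds * fps <= n_frames:
        clips.append((start, start + clip_seconds * fps))
        start += t * fps
    return clips

def sample_by_frame(n_frames, k=5, clip_frames=240):
    """Take every k-th frame until clip_frames (240) frames are collected;
    returns the frame indices of one short clip, or [] if too few frames."""
    idx = list(range(0, n_frames, k))[:clip_frames]
    return idx if len(idx) == clip_frames else []
```

Each returned range or index list is one short clip, i.e. one detection target for steps 23) and 24).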
23) Extract the semantic feature and the spatio-temporal feature from the short clip, with the same extraction flow as in training;
24) load the offline model into the multiple-kernel learning framework, classify the features, determine whether the clip contains the given scene, and obtain the detection result;
The discriminant function is:
f(x) = sgn(∑_{i=1..N} α_i·y_i·∑_{k=1..S} β_k·K_k(x_i, x) + b) (formula 13)
In formula 13, x denotes the semantic and spatio-temporal features extracted from the short clip; the remaining parameters have the same meanings as above. A computed value f(x) = 1 means the clip contains the given scene; f(x) = −1 means it does not.
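The decision of formula 13 can be sketched as follows — a minimal NumPy sketch with made-up support vectors and weights; K_k is the degree-2 polynomial kernel of formula 10, one per feature type:

```python
import numpy as np

def poly_k(a, b, d=2):
    # Formula 10, for a single pair of vectors.
    return (np.dot(a, b) + 1.0) ** d

def discriminant(x_feats, sv_feats, alphas, ys, betas, b):
    """Formula 13: f(x) = sgn(sum_i alpha_i y_i sum_k beta_k K_k(x_i, x) + b).

    x_feats  -- tuple of the clip's feature vectors (semantic, spatio-temporal)
    sv_feats -- list of support-vector feature tuples, one per support vector
    """
    s = sum(a * y * sum(bk * poly_k(sv[k], x_feats[k])
                        for k, bk in enumerate(betas))
            for a, y, sv in zip(alphas, ys, sv_feats))
    return 1 if s + b >= 0 else -1
```

A clip whose features resemble a positive support vector scores high under both kernels and is classified 1; a dissimilar clip falls below the offset and is classified −1.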
It should be noted that the embodiments are disclosed to aid understanding of the invention; those skilled in the art will appreciate that various substitutions and modifications are possible without departing from the spirit and scope of the invention and the appended claims. The invention should therefore not be limited to the content disclosed in the embodiments; its scope of protection is defined by the claims.
Claims (10)
1. A video scene detection method in which a computer, instead of a human operator, inspects video data and recognizes the scenes in it; the method comprising an offline model-training process and a video scene detection process:
1) offline model-training process, performing the following operations:
11) preparing a set of training video samples;
12) extracting features in vector form from each video in the training set, comprising a semantic feature vector and a spatio-temporal feature vector;
13) labelling the feature vectors by category to obtain a sample set, each sample containing a semantic feature vector, a spatio-temporal feature vector, and the corresponding category label;
14) iteratively training on the sample set of step 13) within a multiple-kernel learning framework to obtain an offline model;
2) video scene detection process, performing the following operations:
21) accessing the surveillance video source to be inspected;
22) setting a sampling mode and sampling the video to obtain a short clip, which is the detection target;
23) extracting features from the short clip of step 22) — a semantic feature vector and a spatio-temporal feature vector — with the same extraction method as in step 12);
24) loading the offline model into the multiple-kernel learning framework, classifying the features, determining whether the clip contains the given scene, and obtaining the detection result.
2. The video scene detection method of claim 1, wherein the training video samples of step 11) comprise two classes: a set of videos that contain a street-vendor scene, and a set of videos that do not.
3. The video scene detection method of claim 1, wherein extracting features from each video in the training set in step 12) comprises a semantic feature extraction process and a spatio-temporal feature extraction process.
4. The video scene detection method of claim 3, wherein the semantic feature extraction process comprises the steps of:
121a) for each video, computing the score of every frame with a key-frame extraction method and choosing the m highest-scoring frames as key frames, the score being computed as follows:
Score(f_k) = α·(Sdiff(f_k) − Min_Sdiff)/(Max_Sdiff − Min_Sdiff) + β·(MoValue(f_k) − Min_MoValue)/(Max_MoValue − Min_MoValue) (formula 1)
Sdiff(f_k) = ∑_{i,j} |I_k(i,j) − I_{k−1}(i,j)| (formula 2)
MoValue(f_k) = (1/N_k)·∑_i (|Δx_i^k| + |Δy_i^k|) (formula 3)
in formulas 1–3, f_k denotes the k-th frame of the video sequence; Score(f_k) is the score of the k-th frame; Sdiff(f_k) is the difference between this frame and the previous one; α and β are weights; Max_Sdiff and Min_Sdiff are the maximum and minimum difference between adjacent frames; Δx_i^k and Δy_i^k are the horizontal and vertical components of the optical flow of pixel i in frame k; N_k is the number of pixels in frame k; MoValue(f_k) is the optical-flow intensity of frame k; Max_MoValue and Min_MoValue are the maximum and minimum optical-flow intensity over all frames;
121b) for each of the m chosen frames, extracting picture-level semantic features with the Dartmouth Classeme feature extractor to obtain the frame's semantic feature vector;
121c) concatenating the m real-valued feature vectors obtained from the m frames into an m×2659-dimensional vector as the semantic feature vector of the video.
5. The video scene detection method as claimed in claim 4, characterized in that the m frames in step 121a) are three frames.
6. The video scene detection method as claimed in claim 3, characterized in that the spatio-temporal feature extraction process specifically comprises the following steps:
122a) For each training video, extract MoSIFT features with the MoSIFT feature extraction method;
122b) Generate a visual dictionary from all MoSIFT features in the video set;
122c) Using this visual dictionary, apply Fisher vector encoding to each video to obtain a 2*D*K-dimensional Fisher vector;
122d) Apply principal component analysis (PCA) to the Fisher vector to obtain a low-dimensional vector; this low-dimensional vector is the spatio-temporal feature vector of the video.
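As a rough sketch of the dimensionality bookkeeping in steps 122c)–122d): a Fisher vector over K Gaussian components of D-dimensional local descriptors stacks the mean and variance gradients, giving 2*D*K entries, which PCA then projects down. The function names, the SVD-based PCA, and the 128/64 example sizes are assumptions for illustration, not values from the patent.

```python
import numpy as np

def fisher_vector_dim(D, K):
    """Step 122c): a Fisher vector over a K-component GMM of D-dim
    descriptors has 2*D*K entries (mean + variance gradients)."""
    return 2 * D * K

def pca_reduce(X, n_components):
    """Step 122d): minimal PCA via SVD, projecting the rows of X onto
    the top principal axes. A production pipeline would likely use an
    off-the-shelf PCA implementation instead."""
    Xc = X - X.mean(axis=0)          # center each column
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T  # project onto leading components

# e.g. 128-dim MoSIFT descriptors and a 64-component GMM (assumed sizes)
# would give a 2*128*64 = 16384-dim Fisher vector before PCA.
```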
7. The video scene detection method as claimed in claim 6, characterized in that step 122b) uses a Gaussian mixture model to generate the visual dictionary.
8. The video scene detection method as claimed in claim 1, characterized in that the multiple kernel learning framework in step 14) is the multiple kernel learning framework of the Shogun toolkit, which combines kernel functions by linear weighting, expressed as formula 9:

K(x_i, x_j) = Σ_{k=1}^{S} β_k · K_k(x_i, x_j) (formula 9)

In formula 9, K_k(x_i, x_j) denotes the k-th kernel function; β_k denotes the weight of the k-th kernel function; x_i and x_j denote the features of video samples i and j corresponding to that kernel function.
Two polynomial kernels are chosen as kernel functions, corresponding to the semantic feature and the spatio-temporal feature respectively. The polynomial kernel is given by formula 10:

K(x, x_i) = ((x · x_i) + 1)^d (formula 10)

In formula 10, x and x_i denote vectors of the input space; d denotes the order of the kernel.
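Formula 10 is straightforward to compute directly; a minimal sketch (the function name is illustrative):

```python
import numpy as np

def poly_kernel(x, xi, d=2):
    """Formula 10: K(x, x_i) = ((x . x_i) + 1)^d."""
    return (np.dot(x, xi) + 1.0) ** d
```

For example, with x = (1, 2), x_i = (3, 4), and d = 2, the dot product is 11, so the kernel value is 12² = 144.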
The constrained optimization problem of multiple kernel learning is expressed as:

min_{w, b, ξ} (1/2) · (Σ_{k=1}^{S} ||w_k||)² + C · Σ_{i=1}^{N} ξ_i
s.t. y_i · (Σ_{k=1}^{S} w_k · Φ_k(x_i) + b) ≥ 1 − ξ_i, ξ_i ≥ 0, i = 1, …, N (formula 11)

In formula 11, N denotes the number of input-space vectors; ξ_i denotes the slack variable of vector i; S denotes the number of kernel functions; w_k denotes the normal of the separating hyperplane corresponding to the k-th kernel function; C denotes the penalty factor; in the constraints, y_i is the class label of vector i (either 1 or −1), Φ_k is the high-dimensional mapping function corresponding to the k-th kernel function, and b is the offset.
The multiple kernel learning model is solved by the Lagrangian transformation method, yielding the objective function:

max_α Σ_{i=1}^{N} α_i − (1/2) · Σ_{i=1}^{N} Σ_{j=1}^{N} α_i α_j y_i y_j Σ_{k=1}^{S} β_k · K_k(x_i, x_j)
s.t. 0 ≤ α_i ≤ C, Σ_{i=1}^{N} α_i y_i = 0, β_k ≥ 0, ||β||_p ≤ 1 (formula 12)

In formula 12, N denotes the number of input-space vectors; x_i and x_j denote input-space vectors; α_i and α_j are the corresponding weights, obtained by learning; y_i and y_j are the corresponding class labels; S denotes the number of kernel functions; β_k denotes the weight of the k-th kernel function, also obtained by learning; in the constraints, C denotes the penalty factor and p is the normalization norm.
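Once α, β, and b have been obtained by solving formulas 11–12, classification needs only the weighted kernel combination of formula 9 and the standard SVM decision value. A sketch under the assumption that the per-channel kernel matrices (here, one for the semantic feature and one for the spatio-temporal feature) are precomputed; the function names are illustrative:

```python
import numpy as np

def combined_kernel(kernels, beta):
    """Formula 9: K = sum_k beta_k * K_k over precomputed kernel
    matrices, one per feature channel."""
    K = np.zeros_like(kernels[0], dtype=float)
    for b_k, K_k in zip(beta, kernels):
        K += b_k * K_k
    return K

def decision_values(alpha, y, K_test, b):
    """SVM decision value f(x) = sum_i alpha_i y_i K(x_i, x) + b,
    where alpha, beta, b come from solving formulas 11-12 and K_test
    holds combined-kernel values between training and test samples."""
    return (alpha * y) @ K_test + b
```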
9. The video scene detection method as claimed in claim 7, characterized in that the order d of the polynomial kernel in formula 10 is 2.
10. The video scene detection method as claimed in claim 1, characterized in that the video sampling in step 22) comprises sampling at time intervals and sampling at frame intervals. Sampling at time intervals takes one sample every t seconds, with each sample lasting 10 seconds and forming a short video; sampling at frame intervals takes one sample every k frames, collecting 240 frames to form a short video. The short videos serve as the detection targets.
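The two sampling modes in claim 10 reduce to arithmetic over start offsets. A sketch with illustrative helper names; the 10-second and 240-frame clip lengths follow the claim, while t and k remain free parameters:

```python
def time_sample_starts(duration_s, t, clip_len=10):
    """Time-interval sampling: one 10-second short video every t seconds,
    keeping only clips that fit entirely within the source video."""
    return list(range(0, max(0, int(duration_s) - clip_len + 1), t))

def frame_sample_starts(total_frames, k, clip_frames=240):
    """Frame-interval sampling: one 240-frame short video every k frames,
    likewise dropping any clip that would run past the last frame."""
    return list(range(0, max(0, total_frames - clip_frames + 1), k))
```

For instance, a 35-second source sampled every 10 seconds yields short videos starting at 0 s, 10 s, and 20 s; a clip starting at 30 s would overrun the source and is dropped.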
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510427821.8A CN105005772B (en) | 2015-07-20 | 2015-07-20 | A kind of video scene detection method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN105005772A true CN105005772A (en) | 2015-10-28 |
CN105005772B CN105005772B (en) | 2018-06-12 |
Family
ID=54378437
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510427821.8A Active CN105005772B (en) | 2015-07-20 | 2015-07-20 | A kind of video scene detection method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105005772B (en) |
Cited By (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105844239A (en) * | 2016-03-23 | 2016-08-10 | 北京邮电大学 | Method for detecting riot and terror videos based on CNN and LSTM |
CN107038221A (en) * | 2017-03-22 | 2017-08-11 | 杭州电子科技大学 | A kind of video content description method guided based on semantic information |
CN107239801A (en) * | 2017-06-28 | 2017-10-10 | 安徽大学 | Video attribute represents that learning method and video text describe automatic generation method |
CN107273863A (en) * | 2017-06-21 | 2017-10-20 | 天津师范大学 | A kind of scene character recognition method based on semantic stroke pond |
CN107766838A (en) * | 2017-11-08 | 2018-03-06 | 央视国际网络无锡有限公司 | A kind of switching detection method of video scene |
CN108197566A (en) * | 2017-12-29 | 2018-06-22 | 成都三零凯天通信实业有限公司 | Monitoring video behavior detection method based on multi-path neural network |
CN108229336A (en) * | 2017-12-13 | 2018-06-29 | 北京市商汤科技开发有限公司 | Video identification and training method and device, electronic equipment, program and medium |
CN108647264A (en) * | 2018-04-28 | 2018-10-12 | 北京邮电大学 | A kind of image automatic annotation method and device based on support vector machines |
CN108881950A (en) * | 2018-05-30 | 2018-11-23 | 北京奇艺世纪科技有限公司 | A kind of method and apparatus of video processing |
CN109218721A (en) * | 2018-11-26 | 2019-01-15 | 南京烽火星空通信发展有限公司 | A kind of mutation video detecting method compared based on frame |
CN109241811A (en) * | 2017-07-10 | 2019-01-18 | 南京原觉信息科技有限公司 | Scene analysis method based on image spiral line and the scene objects monitoring system using this method |
CN110126846A (en) * | 2019-05-24 | 2019-08-16 | 北京百度网讯科技有限公司 | Representation method, device, system and the storage medium of Driving Scene |
CN110532990A (en) * | 2019-09-04 | 2019-12-03 | 上海眼控科技股份有限公司 | The recognition methods of turn signal use state, device, computer equipment and storage medium |
CN110969066A (en) * | 2018-09-30 | 2020-04-07 | 北京金山云网络技术有限公司 | Live video identification method and device and electronic equipment |
WO2022012002A1 (en) * | 2020-07-15 | 2022-01-20 | Zhejiang Dahua Technology Co., Ltd. | Systems and methods for video analysis |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110166744A (en) * | 2019-04-28 | 2019-08-23 | 南京师范大学 | A kind of monitoring method violating the regulations of setting up a stall based on video geography fence |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102073864A (en) * | 2010-12-01 | 2011-05-25 | 北京邮电大学 | Football item detecting system with four-layer structure in sports video and realization method thereof |
CN102473291A (en) * | 2009-07-20 | 2012-05-23 | 汤姆森特许公司 | Method for detecting and adapting video processing for far-view scenes in sports video |
CN102509084A (en) * | 2011-11-18 | 2012-06-20 | 中国科学院自动化研究所 | Multi-examples-learning-based method for identifying horror video scene |
US8489627B1 (en) * | 2008-08-28 | 2013-07-16 | Adobe Systems Incorporated | Combined semantic description and visual attribute search |
CN103679192A (en) * | 2013-09-30 | 2014-03-26 | 中国人民解放军理工大学 | Image scene type discrimination method based on covariance features |
CN104184925A (en) * | 2014-09-11 | 2014-12-03 | 刘鹏 | Video scene change detection method |
Also Published As
Publication number | Publication date |
---|---|
CN105005772B (en) | 2018-06-12 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN105005772A (en) | Video scene detection method | |
CN111709409B (en) | Face living body detection method, device, equipment and medium | |
CN111310731B (en) | Video recommendation method, device, equipment and storage medium based on artificial intelligence | |
CN111325115B (en) | Cross-modal countervailing pedestrian re-identification method and system with triple constraint loss | |
Wang et al. | Joint learning of visual attributes, object classes and visual saliency | |
CN109255289B (en) | Cross-aging face recognition method based on unified generation model | |
CN112183468A (en) | Pedestrian re-identification method based on multi-attention combined multi-level features | |
CN110516707B (en) | Image labeling method and device and storage medium thereof | |
CN109165612B (en) | Pedestrian re-identification method based on depth feature and bidirectional KNN sequencing optimization | |
CN110874576B (en) | Pedestrian re-identification method based on typical correlation analysis fusion characteristics | |
Pei et al. | Consistency guided network for degraded image classification | |
Zheng et al. | When saliency meets sentiment: Understanding how image content invokes emotion and sentiment | |
Li et al. | Codemaps-segment, classify and search objects locally | |
CN107894996A (en) | The image intelligent analysis method for clapping device is supervised based on intelligence | |
CN115147641A (en) | Video classification method based on knowledge distillation and multi-mode fusion | |
US20240013368A1 (en) | Pavement nondestructive detection and identification method based on small samples | |
CN113762151A (en) | Fault data processing method and system and fault prediction method | |
CN110135363B (en) | Method, system, equipment and medium for searching pedestrian image based on recognition dictionary embedding | |
CN116189063B (en) | Key frame optimization method and device for intelligent video monitoring | |
Mortezaie et al. | A color-based re-ranking process for people re-identification: Paper ID 21 | |
Ling et al. | Magnetic tile surface defect detection methodology based on self-attention and self-supervised learning | |
CN112380970B (en) | Video target detection method based on local area search | |
CN111459050B (en) | Intelligent simulation type nursing teaching system and teaching method based on dual-network interconnection | |
CN106971151B (en) | Open visual angle action identification method based on linear discriminant analysis | |
CN115063591B (en) | RGB image semantic segmentation method and device based on edge measurement relation |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||