CN103853724A - Multimedia data sorting method and device

Multimedia data sorting method and device

Info

Publication number
CN103853724A
CN103853724A (application CN201210498829.XA)
Authority
CN
China
Prior art keywords
average
difference
calculate
training
frame image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201210498829.XA
Other languages
Chinese (zh)
Other versions
CN103853724B (en)
Inventor
常江龙
徐法明
朱春波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Samsung Electronics China R&D Center
Samsung Electronics Co Ltd
Original Assignee
Samsung Electronics China R&D Center
Samsung Electronics Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Samsung Electronics China R&D Center, Samsung Electronics Co Ltd filed Critical Samsung Electronics China R&D Center
Priority to CN201210498829.XA
Publication of CN103853724A
Application granted
Publication of CN103853724B
Legal status: Active
Anticipated expiration

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 — Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/40 — Information retrieval of multimedia data, e.g. slideshows comprising image and additional audio data
    • G06F 18/00 — Pattern recognition
    • G06F 18/20 — Analysing
    • G06F 18/24 — Classification techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a multimedia data classification method and device. The method comprises: extracting features from each item of original multimedia data, the features of all data forming an original feature set, where dynamic features, editing features and static features are extracted when the original multimedia data is video data, and structural features together with any one or any combination of color features, texture features and shape features are extracted when the original multimedia data is image data; selecting training samples of each category from the original feature set, all training samples forming a training set; performing learning and training on the training set with a preset classification algorithm to generate a classification decision rule model; and, for any test sample, computing on it with the classification decision rule model to obtain the category to which it belongs. The method improves the accuracy of multimedia data classification.

Description

Multimedia data classification method and device
Technical field
The present invention relates to the field of multimedia technology, and in particular to a multimedia data classification method and device.
Background art
With the surge in the amount of multimedia data, managing these data effectively has become a problem demanding a prompt solution. The vast majority of existing multimedia classification techniques are text-based, while content-based multimedia classification techniques, including content-based video classification and content-based image classification, are still at the research and development stage.
Existing content-based video features mainly include video editing features, video dynamic features and video static features. Existing content-based image features mainly include image color features, image texture features, image shape features and image spatial-relationship features. The machine-learning and pattern-classification methods used in existing content-based multimedia classification mainly include Bayesian decision methods, artificial neural networks, decision trees, linear discriminant functions and non-parametric methods. In terms of categories, existing multimedia classification techniques cover both two-class and multi-class methods.
Existing content-based multimedia classification techniques have the following shortcomings:
First, most of them extract only one feature, or a small combination of features, from a video or image, such as basic color or texture features, so the classification accuracy for videos and images leaves much room for improvement.
Second, in machine learning and pattern classification, most of them train and classify with a single classifier, and make insufficient use of ensemble learning and ensemble classification methods.
Third, they mainly solve rather narrow problems, such as classifying a single sport or cartoon videos in video classification, or indoor/outdoor and advertising/non-advertising classification in image classification; their handling of general-purpose, multi-class video and image classification is inadequate.
Summary of the invention
The invention provides a multimedia data classification method and device to improve the accuracy of multimedia data classification.
The technical scheme of the present invention is achieved as follows:
A multimedia data classification method, the method comprising:
extracting features from each item of original multimedia data, the features of all data forming an original feature set, wherein, when the original multimedia data is video data, dynamic features, editing features and static features are extracted, and, when the original multimedia data is image data, structural features are extracted together with any one or any combination of color features, texture features and shape features;
for each category, selecting training samples of that category from the original feature set, all training samples forming a training set;
performing learning and training on the training set with a preset classification algorithm to generate a classification decision rule model;
for any test sample, computing on the test sample with the classification decision rule model to obtain the category to which it belongs.
When said features are dynamic features,
said feature extraction comprises:
for every two consecutive YUV frames in the video data, computing the luminance mean of each frame, computing the absolute value of the difference of the two luminance means to obtain the luminance difference of the two frames, and computing the average of the luminance differences over the whole video data to obtain the mean luminance change of the whole video data;
for every two consecutive RGB frames in the video data, computing the means of r, g and b of each frame, computing the absolute values of the differences of the r means, the g means and the b means of the two frames to obtain the r, g and b differences of the two frames, and computing the averages of the r, g and b differences over the whole video data to obtain the mean r change, mean g change and mean b change of the whole video data;
for every two adjacent frames in the video data, computing the motion-vector mean between the two frames, and computing the average motion-vector mean of the whole video sequence;
the mean luminance change, mean r change, mean g change, mean b change and average motion-vector mean constitute the dynamic features of the video data.
Said editing features comprise: video shot cut rate, video shot fade rate and static frame rate.
When said feature is the video shot cut rate,
said feature extraction comprises:
for every two consecutive YUV frames in the video data, computing the means of y, u and v of each frame, and computing the absolute values of the differences of the y means, the u means and the v means of the two frames to obtain the y, u and v differences of the two frames;
judging whether the following is satisfied: the y difference is greater than a preset first threshold, the u difference is greater than a preset second threshold, and the v difference is greater than a preset third threshold; if so, determining that the two frames are candidate shot cut frames, computing on the two frames with the Sobel operator to obtain two edge maps, and computing the absolute value of the difference of the mean pixel values of the two edge maps; if this absolute value is greater than a preset fourth threshold, computing the motion-vector mean between the two frames; if the motion-vector mean is greater than a preset fifth threshold, determining that the two frames are shot cut frames and adding 1 to the shot cut count;
when the whole video data has been examined, computing the ratio of the shot cut count to the total frame count of the video data to obtain the shot cut rate.
When said feature is the video shot fade rate,
said feature extraction comprises:
for any two YUV frames q frames apart in the video data, computing the absolute value of the difference of the y means of the two frames; if this absolute value is greater than a preset sixth threshold, judging whether the following is satisfied M consecutive times: the absolute value of the difference of the y means of images q frames apart is greater than the preset sixth threshold; if so, judging that a gradual shot transition has been detected and adding 1 to the gradual shot count; otherwise, moving to the next frame and continuing the detection, where q and M are preset integers and q>1;
when the whole video data has been examined, computing the ratio of the gradual shot count to the total frame count of the video data to obtain the gradual shot rate.
Said static features comprise: mean luminance average, mean luminance variance, mean saturation average and wavelet-transform texture features; or comprise: mean high-luminance component ratio, mean high-saturation component ratio and wavelet-transform texture features; or comprise: mean luminance average, mean luminance variance, mean saturation average, mean high-luminance component ratio, mean high-saturation component ratio and wavelet-transform texture features.
Said color features comprise: the first-, second- and third-order HSV color moment features of the image and its first-, second- and third-order color histogram moment features.
Said texture features comprise: texture features based on the gray-level co-occurrence matrix and wavelet-transform texture features.
When said feature is a shape feature,
said feature extraction comprises:
computing the grayscale map of the image;
performing Gaussian filtering in the x and y directions of the grayscale map to obtain the filtered image I_s;
computing the gradients of I_s in the x and y directions to obtain the gradient maps I_gradx and I_grady, and computing the gradient magnitude map I_gradmag from I_gradx and I_grady;
applying non-maximum suppression to the gradient magnitude map I_gradmag to obtain the candidate boundary map I_edge;
performing threshold estimation on I_edge to obtain a high threshold HighThrd, and judging whether the gradient magnitude of each candidate boundary pixel in I_edge is greater than HighThrd; if so, taking that pixel as the starting point of a boundary and recursively tracking the other boundary points until all pixels of the boundary have been found, obtaining the final boundary map I_edge_final;
computing the 7 geometric invariant moments of the boundary map I_edge_final to obtain the shape features of the image.
When said feature is a structural feature,
said feature extraction comprises:
computing the grayscale map I_gray of the image;
applying the Census transform to I_gray to obtain the Census transform map I_census of the image;
computing the histogram of I_census, whose dimension is 256;
reducing the dimension of the histogram with principal component analysis (PCA) so that the final Census transform histogram has 40 dimensions, and taking this 40-dimensional Census transform histogram as the structural feature of the image.
Said performing learning and training on the training set comprises:
presetting a training count n;
drawing m random subsets from the training set, the number of samples drawn each time being less than the total number of samples in the training set, to obtain m new training sets;
performing learning and training with each of the m new training sets to obtain m classification decision rule models, classifying each test sample with each of the m models to obtain m classification results per test sample, and voting over the m classification results, the category with the most votes being the classification result of the test sample;
judging whether the training count has reached n; if so, determining that training is finished; otherwise, starting the next round of training.
A multimedia data classification device, comprising:
a feature extraction module, which extracts features from each item of original multimedia data, the features of all data forming an original feature set, wherein dynamic features, editing features and static features are extracted when the original multimedia data is video data, and structural features together with any one or any combination of color features, texture features and shape features are extracted when the original multimedia data is image data, and which sends out the original feature set;
a sample selection module, which receives the original feature set, selects, for each category, training samples of that category from the original feature set, all training samples forming a training set, and sends out the training set;
a training module, which receives the training set, performs learning and training on it with a preset classification algorithm, generates a classification decision rule model and sends the model out;
a test module, which receives the classification decision rule model and, for any test sample, computes on the test sample with the model to obtain the category to which it belongs.
The feature extraction module is further used for, when said features are dynamic features: for every two consecutive YUV frames in the video data, computing the luminance mean of each frame, computing the absolute value of the difference of the two luminance means to obtain the luminance difference of the two frames, and averaging the luminance differences over the whole video data to obtain the mean luminance change; for every two consecutive RGB frames, computing the means of r, g and b of each frame, computing the absolute values of the differences of the r, g and b means of the two frames to obtain the r, g and b differences, and averaging these over the whole video data to obtain the mean r change, mean g change and mean b change; for every two adjacent frames, computing the motion-vector mean between the two frames and computing the average motion-vector mean of the whole video sequence; the mean luminance change, mean r change, mean g change, mean b change and average motion-vector mean constitute the dynamic features of the video data.
The editing features extracted by the feature extraction module comprise: video shot cut rate, video shot fade rate and static frame rate.
The feature extraction module is further used for, when said feature is the video shot cut rate: for every two consecutive YUV frames in the video data, computing the means of y, u and v of each frame and the absolute values of the differences of the y, u and v means of the two frames to obtain the y, u and v differences; judging whether the y difference is greater than a preset first threshold, the u difference greater than a preset second threshold and the v difference greater than a preset third threshold; if so, determining that the two frames are candidate shot cut frames, computing on them with the Sobel operator to obtain two edge maps and computing the absolute value of the difference of their mean pixel values; if this absolute value is greater than a preset fourth threshold, computing the motion-vector mean between the two frames; if the motion-vector mean is greater than a preset fifth threshold, determining that the two frames are shot cut frames and adding 1 to the shot cut count; and, when the whole video data has been examined, computing the ratio of the shot cut count to the total frame count to obtain the shot cut rate.
The feature extraction module is further used for, when said feature is the video shot fade rate: for any two YUV frames q frames apart in the video data, computing the absolute value of the difference of their y means; if this absolute value is greater than a preset sixth threshold, judging whether it is satisfied M consecutive times that the absolute value of the difference of the y means of images q frames apart is greater than the preset sixth threshold; if so, judging that a gradual shot transition has been detected and adding 1 to the gradual shot count; otherwise, moving to the next frame and continuing the detection, where q and M are preset integers and q>1; and, when the whole video data has been examined, computing the ratio of the gradual shot count to the total frame count to obtain the gradual shot rate.
The static features extracted by the feature extraction module comprise: mean luminance average, mean luminance variance, mean saturation average and wavelet-transform texture features; or comprise: mean high-luminance component ratio, mean high-saturation component ratio and wavelet-transform texture features; or comprise: mean luminance average, mean luminance variance, mean saturation average, mean high-luminance component ratio, mean high-saturation component ratio and wavelet-transform texture features.
The color features extracted by the feature extraction module comprise: the first-, second- and third-order HSV color moment features of the image and its first-, second- and third-order color histogram moment features.
The texture features extracted by the feature extraction module comprise: texture features based on the gray-level co-occurrence matrix and wavelet-transform texture features.
The feature extraction module is further used for, when said feature is a shape feature: computing the grayscale map of the image; performing Gaussian filtering in the x and y directions of the grayscale map to obtain the filtered image I_s; computing the gradients of I_s in the x and y directions to obtain the gradient maps I_gradx and I_grady, and computing the gradient magnitude map I_gradmag from them; applying non-maximum suppression to I_gradmag to obtain the candidate boundary map I_edge; performing threshold estimation on I_edge to obtain a high threshold HighThrd and judging whether the gradient magnitude of each candidate boundary pixel in I_edge is greater than HighThrd, if so taking that pixel as the starting point of a boundary and recursively tracking the other boundary points until all pixels of the boundary have been found, obtaining the final boundary map I_edge_final; and computing the 7 geometric invariant moments of I_edge_final to obtain the shape features of the image.
The feature extraction module is further used for, when said feature is a structural feature: computing the grayscale map I_gray of the image; applying the Census transform to I_gray to obtain the Census transform map I_census; computing the 256-dimensional histogram of I_census; and reducing the dimension of the histogram with principal component analysis (PCA) so that the final Census transform histogram has 40 dimensions, taking this 40-dimensional Census transform histogram as the structural feature of the image.
The training module is further used for: presetting a training count n; drawing m random subsets from the training set, the number of samples drawn each time being less than the total number of samples in the training set, to obtain m new training sets; performing learning and training with each of the m new training sets to obtain m classification decision rule models and sending the m models to the test module; and judging whether the training count has reached n, if so sending a training-finished indication to the test module, otherwise starting the next round of training;
the test module is further used for: upon receiving the m classification decision rule models sent by the training module, classifying each test sample with each of the m models to obtain m classification results per test sample, and voting over the m classification results, the category with the most votes being the classification result of the test sample; and, upon receiving the training-finished indication sent by the training module, determining that training is finished.
Compared with the prior art, the present invention improves the accuracy of multimedia data classification.
Brief description of the drawings
Fig. 1 is a flowchart of the multimedia data classification method provided by an embodiment of the present invention;
Fig. 2 is a flowchart of the method for extracting the dynamic features of video data provided by an embodiment of the present invention;
Fig. 3 is a flowchart of the method for extracting the shot cut rate feature of video data provided by an embodiment of the present invention;
Fig. 4 is a flowchart of the method for extracting the gradual shot rate feature of video data provided by an embodiment of the present invention;
Fig. 5 is a flowchart of the method for extracting the static frame rate feature of video data provided by an embodiment of the present invention;
Fig. 6 is a flowchart of the method for extracting the static features of video data provided by an embodiment of the present invention;
Fig. 7 is a flowchart of the method for extracting the color features of image data provided by an embodiment of the present invention;
Fig. 8 is a flowchart of the method for extracting the texture features of image data provided by an embodiment of the present invention;
Fig. 9 is a flowchart of the method for extracting the shape features of image data provided by an embodiment of the present invention;
Fig. 10 is a flowchart of the method for extracting the structural features of image data provided by an embodiment of the present invention;
Fig. 11 is a flowchart of the method for allocating the training set and test set provided by an embodiment of the present invention;
Fig. 12 is a flowchart of the learning and training method provided by embodiment one of the present invention;
Fig. 13 is a flowchart of the learning and training method provided by embodiment two of the present invention;
Fig. 14 is a schematic diagram of the composition of the multimedia data classification device provided by an embodiment of the present invention.
Detailed description of the embodiments
The present invention is described below in further detail with reference to the drawings and specific embodiments.
Fig. 1 is a flowchart of the multimedia data classification method provided by an embodiment of the present invention. As shown in Fig. 1, the specific steps are as follows:
Step 101: perform feature extraction on each item of original multimedia data to obtain the feature vector of each item; the feature vectors of all data form the original feature vector set.
The multimedia data may be video data or image data.
For video data, dynamic features, editing features and static features can be extracted.
Here, the dynamic features comprise: mean luminance change, mean r change, mean g change, mean b change and average motion-vector mean; the editing features can comprise: video shot cut rate, video shot fade rate and static frame rate; the static features can comprise: mean luminance average, mean luminance variance, mean saturation average and wavelet-transform texture features, or: mean high-luminance component ratio, mean high-saturation component ratio and wavelet-transform texture features, or: mean luminance average, mean luminance variance, mean saturation average, mean high-luminance component ratio, mean high-saturation component ratio and wavelet-transform texture features. r, g and b are the color components of the image in the RGB color space.
For image data, structural features can be extracted, together with any one or any combination of color features, shape features and texture features.
Step 102: for each category, select training samples and test samples of that category from the original feature set; all training samples form the training set and all test samples form the test set.
Step 103: perform learning and training on the training set with a preset classification algorithm to generate a classification decision rule model.
Step 104: for any test sample in the test set, compute on it with the classification decision rule model obtained in step 103 to obtain the category to which it belongs.
The specific implementations of the video feature extraction methods adopted in the embodiment of the present invention are given below.
In the embodiment of the present invention, before feature extraction, in order to reduce algorithmic complexity and memory usage, a preset thumbnail generation method can first be used to convert every frame to a thumbnail. The thumbnail size can be determined according to the actual situation, e.g. 160×120. Two arrays I_yuv_cur and I_yuv_pre store the YUV data of the current-frame and previous-frame thumbnails, two arrays I_rgb_cur and I_rgb_pre store their RGB data, and two arrays I_hsv_cur and I_hsv_pre store their HSV data. YUV, RGB and HSV denote color spaces of the image.
Fig. 2 is a flowchart of the method for extracting the dynamic features of video data provided by an embodiment of the present invention. As shown in Fig. 2, the specific steps are as follows:
Step 201: for every two consecutive thumbnails in the video sequence, compute the mean of all y in I_yuv_cur to obtain y_avr_cur, compute the mean of all y in I_yuv_pre to obtain y_avr_pre, and compute the absolute value of the difference of y_avr_cur and y_avr_pre to obtain the luminance difference of the current-frame and previous-frame thumbnails.
Step 202: compute the means of all r, all g and all b in I_rgb_cur to obtain r_avr_cur, g_avr_cur and b_avr_cur; compute the means of all r, all g and all b in I_rgb_pre to obtain r_avr_pre, g_avr_pre and b_avr_pre; and compute the absolute values of the differences of r_avr_cur and r_avr_pre, of g_avr_cur and g_avr_pre, and of b_avr_cur and b_avr_pre, obtaining the r, g and b differences of the current-frame and previous-frame thumbnails.
Step 203: compute the motion-vector mean between I_yuv_cur and I_yuv_pre with a block matching algorithm.
This step can be implemented with an existing block matching algorithm.
Step 204: for the whole video sequence, average the luminance differences, r differences, g differences, b differences and motion-vector means obtained between all pairs of consecutive thumbnails, obtaining the mean luminance change, mean r change, mean g change, mean b change and average motion-vector mean.
The mean luminance change, mean r change, mean g change, mean b change and average motion-vector mean constitute the dynamic features of the video data.
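To make the flow concrete, the following is a minimal Python sketch of the Fig. 2 pipeline, assuming OpenCV and NumPy; the function name is hypothetical, and Farneback optical flow is used only as a stand-in for the patent's block-matching motion vectors.

```python
import cv2
import numpy as np

def dynamic_features(video_path, thumb_size=(160, 120)):
    """Sketch of Fig. 2: mean luminance/r/g/b change and mean motion
    magnitude over a whole video. thumb_size and the Farneback optical
    flow (replacing block matching) are assumptions, not the patent's
    exact implementation."""
    cap = cv2.VideoCapture(video_path)
    prev_gray = prev_bgr = None
    lum_diffs, r_diffs, g_diffs, b_diffs, motions = [], [], [], [], []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        bgr = cv2.resize(frame, thumb_size)           # thumbnail to cut cost
        gray = cv2.cvtColor(bgr, cv2.COLOR_BGR2GRAY)  # luminance plane
        if prev_bgr is not None:
            lum_diffs.append(abs(gray.mean() - prev_gray.mean()))
            b_diffs.append(abs(bgr[..., 0].mean() - prev_bgr[..., 0].mean()))
            g_diffs.append(abs(bgr[..., 1].mean() - prev_bgr[..., 1].mean()))
            r_diffs.append(abs(bgr[..., 2].mean() - prev_bgr[..., 2].mean()))
            flow = cv2.calcOpticalFlowFarneback(prev_gray, gray, None,
                                                0.5, 3, 15, 3, 5, 1.2, 0)
            motions.append(np.linalg.norm(flow, axis=2).mean())
        prev_gray, prev_bgr = gray, bgr
    cap.release()
    # step 204: average each per-pair quantity over the whole sequence
    return [float(np.mean(x)) if x else 0.0
            for x in (lum_diffs, r_diffs, g_diffs, b_diffs, motions)]
```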
The editing features of video data comprise: video shot cut rate, video shot fade rate and static frame rate.
Fig. 3 is a flowchart of the method for extracting the shot cut rate feature of video data provided by an embodiment of the present invention. As shown in Fig. 3, the specific steps are as follows:
Step 301: for every two consecutive thumbnails I_yuv_cur and I_yuv_pre in the video sequence, compute the means of all y, all u and all v in I_yuv_cur to obtain y_avr_cur, u_avr_cur and v_avr_cur; compute the means of all y, all u and all v in I_yuv_pre to obtain y_avr_pre, u_avr_pre and v_avr_pre; and compute the absolute values of the differences of y_avr_cur and y_avr_pre, of u_avr_cur and u_avr_pre, and of v_avr_cur and v_avr_pre, obtaining the y, u and v differences of the current-frame and previous-frame thumbnails.
Step 302: judge whether the following is satisfied: the y difference is greater than a preset first threshold, the u difference is greater than a preset second threshold, and the v difference is greater than a preset third threshold. If so, determine that the current frame and the previous frame are candidate shot cut frames and go to step 303; otherwise, determine that they are not shot cut frames and go to step 308.
Step 303: compute on I_yuv_cur and I_yuv_pre with the Sobel operator to obtain the edge maps I_edge1 and I_edge2, compute the mean pixel value Gr_avr1 of I_edge1 and the mean pixel value Gr_avr2 of I_edge2, and compute the absolute value of the difference of Gr_avr1 and Gr_avr2.
Step 304: judge whether the absolute value of the difference of Gr_avr1 and Gr_avr2 is greater than a preset fourth threshold. If so, go to step 305; otherwise, determine that the current frame and the previous frame are not shot cut frames and go to step 308.
Steps 303-304 reduce false detections of shot cuts caused by global illumination changes.
Step 305: compute the motion-vector mean between I_yuv_cur and I_yuv_pre with a block matching algorithm.
Step 306: judge whether the motion-vector mean between I_yuv_cur and I_yuv_pre is greater than a preset fifth threshold. If so, go to step 307; otherwise, determine that the current frame and the previous frame are not shot cut frames and go to step 308.
Steps 305-306 remove false detections of shot cuts caused by object motion.
Step 307: determine that the current frame and the previous frame are shot cut frames and add 1 to the shot cut count.
The initial value of the shot cut count is 0.
Step 308: judge whether the whole video sequence has been examined. If so, go to step 309; otherwise, move to the next frame and return to step 301.
Step 309: compute the ratio of the shot cut count in the video sequence to the total frame count of the video sequence to obtain the shot cut rate.
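A compact sketch of the per-frame-pair decision in steps 301-307 might look as follows (Python with OpenCV/NumPy; all five thresholds are illustrative placeholders, and Farneback optical flow again stands in for block matching):

```python
import cv2
import numpy as np

def is_shot_cut(yuv_prev, yuv_cur, thr=(10.0, 5.0, 5.0, 8.0, 3.0)):
    """Sketch of steps 301-307 for one frame pair. yuv_* are HxWx3 YUV
    thumbnails; the five thresholds are placeholders to be tuned."""
    t1, t2, t3, t4, t5 = thr
    d = [abs(yuv_cur[..., c].astype(np.float64).mean()
             - yuv_prev[..., c].astype(np.float64).mean()) for c in range(3)]
    if not (d[0] > t1 and d[1] > t2 and d[2] > t3):
        return False                        # not even a candidate cut
    def edge_mean(img):                     # Sobel gradient-magnitude mean
        y = img[..., 0].astype(np.float64)
        gx = cv2.Sobel(y, cv2.CV_64F, 1, 0)
        gy = cv2.Sobel(y, cv2.CV_64F, 0, 1)
        return np.hypot(gx, gy).mean()
    if abs(edge_mean(yuv_cur) - edge_mean(yuv_prev)) <= t4:
        return False                        # global illumination change, not a cut
    flow = cv2.calcOpticalFlowFarneback(yuv_prev[..., 0], yuv_cur[..., 0],
                                        None, 0.5, 3, 15, 3, 5, 1.2, 0)
    return np.linalg.norm(flow, axis=2).mean() > t5   # rules out object motion
```

Counting the hits of such a test over the sequence and dividing by the frame count then gives the shot cut rate of step 309.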
Fig. 4 is a flowchart of the method for extracting the gradual shot rate feature of video data provided by an embodiment of the present invention. As shown in Fig. 4, the specific steps are as follows:
Step 401: for any two thumbnails I_yuv_cur and I_yuv_cur-q that are q frames apart in the video sequence, compute the mean of all y in I_yuv_cur to obtain y_avr_cur, compute the mean of all y in I_yuv_cur-q to obtain y_avr_cur-q, and compute the absolute value of their difference, |y_avr_cur − y_avr_cur-q|.
Here q is the number of frames between the two images; its value can be determined empirically, preferably q = 10.
Step 402: judge whether |y_avr_cur − y_avr_cur-q| is greater than a preset sixth threshold. If so, go to step 403; otherwise, go to step 405.
Step 403: judge whether M consecutive frames all satisfy |y_avr_cur − y_avr_cur-q| greater than the preset sixth threshold. If so, go to step 404; otherwise, go to step 405.
The value of M can range from q to 2q.
Step 404: judge that a gradual shot transition has been detected and add 1 to the gradual shot count.
The initial value of the gradual shot count is 0.
Step 405: judge whether the whole video sequence has been examined. If so, go to step 406; otherwise, move to the next frame and return to step 401.
Step 406: compute the ratio of the gradual shot count to the total frame count of the video sequence to obtain the gradual shot rate.
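Under the assumption that the per-frame luminance means have already been computed, the counting logic of Fig. 4 can be sketched as follows (the function name and default values are illustrative; the patent suggests q = 10 and q ≤ M ≤ 2q):

```python
def gradual_shot_rate(y_means, q=10, m=15, thr6=6.0):
    """Sketch of Fig. 4. y_means is the per-frame luminance-mean sequence;
    q, m and thr6 are illustrative placeholders."""
    count, run = 0, 0
    n = len(y_means)
    for i in range(q, n):
        if abs(y_means[i] - y_means[i - q]) > thr6:
            run += 1
            if run == m:       # M consecutive hits => one gradual transition
                count += 1
                run = 0        # reset so one transition is counted once
        else:
            run = 0
    return count / n if n else 0.0
```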
Fig. 5 is a flowchart of the method for extracting the static frame rate feature of video data provided by an embodiment of the present invention. As shown in Fig. 5, the specific steps are as follows:
Step 501: for every two consecutive thumbnails I_yuv_cur and I_yuv_pre in the video sequence, compute the means of all y, all u and all v in I_yuv_cur to obtain y_avr_cur, u_avr_cur and v_avr_cur; compute the means of all y, all u and all v in I_yuv_pre to obtain y_avr_pre, u_avr_pre and v_avr_pre; and compute the absolute values of the differences of y_avr_cur and y_avr_pre, of u_avr_cur and u_avr_pre, and of v_avr_cur and v_avr_pre, obtaining the y, u and v differences of the current-frame and previous-frame thumbnails.
Step 502: judge whether the following is satisfied: the y difference is less than a preset seventh threshold, the u difference is less than a preset eighth threshold, and the v difference is less than a preset ninth threshold. If so, go to step 503; otherwise, go to step 504.
Step 503: determine that the current frame is a static frame and add 1 to the static frame count.
The initial value of the static frame count is 0.
Step 504: judge whether the whole video sequence has been examined. If so, go to step 505; otherwise, move to the next frame and return to step 501.
Step 505: compute the ratio of the static frame count over the whole video sequence to the total frame count of the video sequence to obtain the static frame rate.
The shot cut rate, gradual shot rate and static frame rate constitute the editing features of the video data.
Fig. 6 is a flowchart of the method for extracting the static features of video data provided by an embodiment of the present invention. As shown in Fig. 6, the specific steps are as follows:
Step 601: for any thumbnail in the video sequence, compute the mean and variance of y in I_yuv_cur, and compute the mean of s in I_hsv_cur.
Step 602: for the y values in I_yuv_cur, count how many exceed a preset tenth threshold and compute the ratio of this count to the total number of pixels in the thumbnail, obtaining the high-luminance component ratio; for the s values in I_hsv_cur, count how many exceed a preset eleventh threshold and compute the ratio of this count to the total number of pixels in the thumbnail, obtaining the high-saturation component ratio.
Step 603: apply a three-level wavelet transform to I_yuv_cur, obtaining 10 sub-images, and compute the root mean square of y for each sub-image.
Step 604: for the whole video sequence, sum, over all frames, the mean of y, the variance of y, the mean of s, the high-luminance component ratio, the high-saturation component ratio and the root mean square of y of each sub-image, and divide each sum by the frame count of the video sequence, obtaining the mean luminance average, mean luminance variance, mean saturation average, mean high-luminance component ratio, mean high-saturation component ratio and the average y root mean square of each sub-image.
Here, for the root mean squares of y of the 10 sub-images: for each sub-image, the root mean squares of its y over all frames are summed, and the sum is divided by the frame count, giving the average y root mean square of that sub-image. In this way the average y root mean squares of the 10 sub-images are obtained, and together they constitute the wavelet-transform texture features of the video data.
The mean luminance average, mean luminance variance, mean saturation average, mean high-luminance component ratio, mean high-saturation component ratio and wavelet-transform texture features constitute the static features of the video data.
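A sketch of steps 601-603 for one frame, assuming PyWavelets and NumPy (the Haar wavelet, the helper names and the two thresholds are assumptions):

```python
import numpy as np
import pywt  # PyWavelets

def wavelet_texture(y_plane):
    """Sketch of step 603: a 3-level 2-D wavelet transform yields 10
    sub-images (1 approximation + 3 detail bands per level); the feature
    is the RMS of each. The Haar wavelet is an assumption."""
    coeffs = pywt.wavedec2(y_plane.astype(np.float64), 'haar', level=3)
    subbands = [coeffs[0]] + [band for level in coeffs[1:] for band in level]
    return [float(np.sqrt((b ** 2).mean())) for b in subbands]  # 10 values

def frame_static_features(y_plane, s_plane, thr10=200, thr11=200):
    """Per-frame quantities of steps 601-602, assuming 8-bit y and s
    planes; the tenth/eleventh thresholds are placeholders."""
    return {
        'y_mean': float(y_plane.mean()),
        'y_var': float(y_plane.var()),
        's_mean': float(s_plane.mean()),
        'y_high_ratio': float((y_plane > thr10).mean()),  # high-luminance ratio
        's_high_ratio': float((s_plane > thr11).mean()),  # high-saturation ratio
        'wavelet_rms': wavelet_texture(y_plane),
    }
```

Per step 604, each of these per-frame quantities is then summed over all frames and divided by the frame count.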
The specific implementations of the image feature extraction methods adopted in the embodiment of the present invention are given below.
The image features comprise structural features, together with any one or any combination of color features, texture features and shape features.
Fig. 7 is a flowchart of the method for extracting the color features of image data provided by an embodiment of the present invention. As shown in Fig. 7, the specific steps are as follows:
Step 701: for each image, in the HSV color space of the image, compute the mean, variance and skewness of each of the three components hue (H), saturation (S) and value (V), i.e. the first-, second- and third-order HSV color moment features of the image.
The three moment features are computed as follows:
Mean: $\mu_i = \frac{1}{n} \sum_{j=1}^{n} p_{ij}$
Variance: $\sigma_i = \left( \frac{1}{n} \sum_{j=1}^{n} (p_{ij} - \mu_i)^2 \right)^{1/2}$
Skewness: $s_i = \left( \frac{1}{n} \sum_{j=1}^{n} (p_{ij} - \mu_i)^3 \right)^{1/3}$
where i is the index of the color component; there are 3 color components, h, s and v, indexed i = 0, 1, 2; p_ij is the value of the j-th pixel in the i-th color component; and n is the total number of pixels.
Step 702: apply non-uniform quantization to the hue (H), saturation (S) and value (V) components of the HSV color space of the image, where H is quantized to 8 values and S and V to 2 values each.
The dimension of the quantized color histogram is:
$Hist = H \cdot Q_S \cdot Q_V + S \cdot Q_V + V$
where Q_S and Q_V are the quantization levels of the S and V components, both equal to 2, and H, S and V are the quantized values of the three components.
Step 703: from the quantized color histogram, compute the first-, second- and third-order color histogram moment features.
The first-, second- and third-order HSV color moment features and the first-, second- and third-order color histogram moment features of the image constitute the color features of the image data.
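As an illustration, a Python sketch of Fig. 7 under OpenCV conventions (hue in 0..179, s and v in 0..255); the quantization boundaries chosen here are assumptions:

```python
import cv2
import numpy as np

def color_features(bgr):
    """Sketch of Fig. 7: HSV color moments (mean, std, skew per channel)
    plus moments of a 32-bin quantized histogram indexed as
    Hist = H*Qs*Qv + S*Qv + V with H in 0..7 and S, V in 0..1."""
    hsv = cv2.cvtColor(bgr, cv2.COLOR_BGR2HSV).reshape(-1, 3).astype(np.float64)
    moments = []
    for ch in range(3):                       # h, s, v
        p = hsv[:, ch]
        mu = p.mean()
        moments += [mu,
                    np.sqrt(((p - mu) ** 2).mean()),   # second-order moment
                    np.cbrt(((p - mu) ** 3).mean())]   # third-order moment
    # non-uniform quantization (bin edges are illustrative assumptions)
    h = np.minimum((hsv[:, 0] / 180.0 * 8).astype(int), 7)
    s = (hsv[:, 1] >= 128).astype(int)
    v = (hsv[:, 2] >= 128).astype(int)
    hist = np.bincount(h * 4 + s * 2 + v, minlength=32) / len(hsv)
    mu = hist.mean()
    hist_moments = [mu,
                    np.sqrt(((hist - mu) ** 2).mean()),
                    np.cbrt(((hist - mu) ** 3).mean())]
    return moments + hist_moments
```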
Fig. 8 is a flowchart of the method for extracting the texture features of image data provided by an embodiment of the present invention. As shown in Fig. 8, the specific steps are as follows:
Step 801: for each image, compute its grayscale map and quantize the gray values to 8 levels.
Step 802: divide the image into several non-overlapping windows of size 32×32.
The number of windows is determined by the ratio of the width and height of the image to the window size.
Step 803: for each window, compute the gray-level co-occurrence matrices of the window in the four directions (0°, 45°, 90°, 135°).
The computing formula is as follows:
$m_{(d,\theta)}(i,j) = \mathrm{card}\{[(x_1,y_1),(x_2,y_2)] \in S \mid f(x_1,y_1) = i \;\&\; f(x_2,y_2) = j\}$
where f(x, y) denotes the window, S is the set of pixel pairs whose values are i and j respectively, d is the distance between the two pixels with values i and j (its value is determined empirically in advance), θ is the orientation angle, i.e. 0°, 45°, 90° or 135°, and card{·} denotes the number of elements of the set contributing to m_(d,θ)(i,j).
Step 804: for the gray-level co-occurrence matrix of each direction of each window, compute 7 statistics of the matrix: correlation, contrast, entropy, inverse difference moment, energy, sum average and sum entropy.
Step 805: for each statistic of the gray-level co-occurrence matrix of each direction, sum the statistic over all windows, divide the sum by the total number of windows to obtain the mean value of the statistic, and normalize it. This finally yields the mean values of 7 × 4 = 28 statistics over the four directions, which constitute the gray-level co-occurrence matrix texture features of the image.
Step 806: apply a three-level wavelet transform to the image, obtaining 10 sub-images, and compute the root mean square of the y of each sub-image; the 10 resulting root mean squares constitute the wavelet-transform texture features of the image.
The texture features based on the gray-level co-occurrence matrix and the wavelet-transform texture features constitute the texture features of the image data.
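A sketch of the per-window statistics in steps 803-804, assuming scikit-image supplies the co-occurrence matrices; skimage's homogeneity is used in place of the inverse difference moment, and entropy, sum average and sum entropy are computed by hand:

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops

def glcm_window_features(gray_window, levels=8, d=1):
    """Sketch of steps 803-804 for one 32x32 window: GLCMs in the four
    directions, then 7 statistics per matrix (28 values per window)."""
    q = (gray_window.astype(np.float64) * levels / 256).astype(np.uint8)
    angles = [0, np.pi / 4, np.pi / 2, 3 * np.pi / 4]   # 0/45/90/135 degrees
    glcm = graycomatrix(q, [d], angles, levels=levels, normed=True)
    feats = []
    for a in range(4):
        p = glcm[:, :, 0, a]
        i, j = np.indices(p.shape)
        # p_{x+y}(k): distribution of i+j, used for sum average / sum entropy
        pxy = np.bincount((i + j).ravel(), weights=p.ravel())
        nz, nzs = p[p > 0], pxy[pxy > 0]
        feats += [graycoprops(glcm, 'correlation')[0, a],
                  graycoprops(glcm, 'contrast')[0, a],
                  -(nz * np.log2(nz)).sum(),               # entropy
                  graycoprops(glcm, 'homogeneity')[0, a],  # ~inverse diff. moment
                  graycoprops(glcm, 'energy')[0, a],
                  (np.arange(len(pxy)) * pxy).sum(),       # sum average
                  -(nzs * np.log2(nzs)).sum()]             # sum entropy
    return feats
```

Per step 805, each of the 28 statistics is then averaged over all windows of the image and normalized.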
Fig. 9 is a flowchart of the method for extracting the shape features of image data provided by an embodiment of the present invention. As shown in Fig. 9, the specific steps are as follows:
Step 901: for each image, compute its grayscale map and quantize the gray values to 8 levels.
Step 902: perform Gaussian filtering in the x and y directions of the grayscale map to obtain the filtered image I_s.
Step 903: compute the gradients of I_s in the x and y directions to obtain the gradient maps I_gradx and I_grady, and compute the gradient magnitude map I_gradmag from I_gradx and I_grady.
Step 904: apply non-maximum suppression to the gradient magnitude map I_gradmag, suppressing the pixel values of non-local-extremum points, to obtain the candidate boundary map I_edge.
This step can be implemented with existing techniques.
Step 905: perform threshold estimation on I_edge to obtain a high threshold HighThrd, and judge whether the gradient magnitude of each candidate boundary pixel in I_edge is greater than HighThrd; if so, take that pixel as the starting point of a boundary and recursively track the other boundary points until all pixels of the boundary have been found, obtaining the final boundary map I_edge_final.
This step can be implemented with existing techniques.
Step 906: compute the 7 geometric invariant moments of the boundary map I_edge_final.
Denoting I_edge_final by f(x, y), its (p+q)-order moment is defined as:
$m_{pq} = \sum_x \sum_y x^p y^q f(x,y)$
Its (p+q)-order central moment is defined as:
$\mu_{pq} = \sum_x \sum_y (x - \bar{x})^p (y - \bar{y})^q f(x,y)$
where $\bar{x} = m_{10}/m_{00}$ and $\bar{y} = m_{01}/m_{00}$ denote the centroid of the image region. The normalized central moment is expressed as $\eta_{pq} = \mu_{pq} / \mu_{00}^{\gamma}$, where $\gamma = \frac{p+q}{2} + 1$, p+q = 2, 3, ...
Only the second- and third-order central moments are computed here, i.e. the cases p+q = 2 or 3. From these, the 7 geometric invariant moments are derived as follows:
$\phi_1 = \eta_{20} + \eta_{02}$
$\phi_2 = (\eta_{20} - \eta_{02})^2 + 4\eta_{11}^2$
$\phi_3 = (\eta_{30} - 3\eta_{12})^2 + (3\eta_{21} - \eta_{03})^2$
$\phi_4 = (\eta_{30} + \eta_{12})^2 + (\eta_{21} + \eta_{03})^2$
$\phi_5 = (\eta_{30} - 3\eta_{12})(\eta_{30} + \eta_{12})[(\eta_{30} + \eta_{12})^2 - 3(\eta_{21} + \eta_{03})^2] + (3\eta_{21} - \eta_{03})(\eta_{21} + \eta_{03})[3(\eta_{30} + \eta_{12})^2 - (\eta_{21} + \eta_{03})^2]$
$\phi_6 = (\eta_{20} - \eta_{02})[(\eta_{30} + \eta_{12})^2 - (\eta_{21} + \eta_{03})^2] + 4\eta_{11}(\eta_{30} + \eta_{12})(\eta_{21} + \eta_{03})$
$\phi_7 = (3\eta_{21} - \eta_{03})(\eta_{30} + \eta_{12})[(\eta_{30} + \eta_{12})^2 - 3(\eta_{21} + \eta_{03})^2] - (\eta_{30} - 3\eta_{12})(\eta_{21} + \eta_{03})[3(\eta_{30} + \eta_{12})^2 - (\eta_{21} + \eta_{03})^2]$
The 7 geometric invariant moments of the boundary map I_edge_final constitute the shape features of the image data.
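Since the pipeline of steps 901-905 (Gaussian filtering, Sobel-style gradients, non-maximum suppression and hysteresis tracking) is essentially Canny edge detection, a compact sketch can lean on OpenCV; the blur kernel and the two Canny thresholds below are placeholders, and cv2.HuMoments supplies the 7 invariant moments:

```python
import cv2

def shape_features(gray):
    """Sketch of Fig. 9 using OpenCV stand-ins for the patent's own
    edge pipeline; thresholds 50/150 are illustrative."""
    blurred = cv2.GaussianBlur(gray, (5, 5), 1.4)   # x/y Gaussian filtering
    edges = cv2.Canny(blurred, 50, 150)             # final boundary map
    m = cv2.moments(edges, binaryImage=True)        # raw/central/normalized moments
    return cv2.HuMoments(m).flatten()               # the 7 invariant moments
```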
Fig. 10 is a flowchart of the method for extracting the structural features of image data provided by an embodiment of the present invention. As shown in Fig. 10, the specific steps are as follows:
Step 1001: for each image, compute its grayscale map I_gray and quantize the gray values to 8 levels.
Step 1002: apply the Census transform to I_gray to obtain the Census transform map I_census of the image.
The Census transform can be described as follows: compare the value of each pixel with each of its 8 neighbors; if the pixel value is less than the neighbor value, set that neighbor bit to 0, otherwise set it to 1; then, in left-to-right, top-to-bottom order, assemble the neighbor bits into an eight-bit binary number and convert it to decimal. This is the Census transform value CT of the pixel, and the CT values form the Census transform map I_census. For example:
32 64 96        1 1 0
32 64 96   ⇒    1   0    ⇒  (11010110)₂  ⇒  CT = 214
32 32 96        1 1 0
Step 1003: compute the histogram of I_census, whose dimension is 256.
Step 1004: reduce the dimension of the histogram with principal component analysis (PCA) so that the final Census transform histogram has 40 dimensions, and take this 40-dimensional Census transform histogram as the structural feature of the image.
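A vectorized NumPy sketch of steps 1002-1004 follows; the neighbor ordering matches the worked example above, and the PCA usage assumes histograms collected over a whole image set:

```python
import numpy as np
from sklearn.decomposition import PCA

def census_histogram(gray):
    """Sketch of steps 1002-1003: 8-neighbor Census transform of a
    grayscale image, then its normalized 256-bin histogram."""
    g = gray.astype(np.int32)
    c = g[1:-1, 1:-1]                                # center pixels
    # neighbors in reading order: TL, T, TR, L, R, BL, B, BR
    shifts = [g[:-2, :-2], g[:-2, 1:-1], g[:-2, 2:], g[1:-1, :-2],
              g[1:-1, 2:], g[2:, :-2], g[2:, 1:-1], g[2:, 2:]]
    ct = np.zeros_like(c)
    for n in shifts:
        ct = (ct << 1) | (c >= n)        # bit is 1 unless center < neighbor
    hist = np.bincount(ct.ravel(), minlength=256).astype(np.float64)
    return hist / hist.sum()

# Step 1004: PCA over the histograms of a whole image set, 256 -> 40 dims.
# `histograms` is assumed to be an (n_images, 256) array:
#   pca = PCA(n_components=40)
#   structural_features = pca.fit_transform(histograms)
```

Applied to the 3x3 example above (center 64), the loop produces the bit string 11010110 and hence CT = 214.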
In the embodiment of the present invention, a support vector machine classifier based on a radial basis kernel function can be used to perform training and classification on the feature vectors of videos or images.
Fig. 11 is a flowchart of the method for allocating the training set and test set provided by an embodiment of the present invention. As shown in Fig. 11, the specific steps are as follows:
Step 1101: from the preset proportion coefficient ratio of the training set to the test set and the total sample count of each category, determine the sizes of the training set and the test set.
The coefficient ratio can be set as required.
Step 1102: for any category, choose training samples at random from the samples of that category.
The criterion for selecting samples is: large between-class difference and small within-class difference.
Step 1103: for any category, when all training samples of that category have been selected, take the remaining samples of the category as its test samples.
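A minimal sketch of this per-category split (pure Python; the function name and signature are assumptions):

```python
import random

def split_per_class(features_by_class, ratio=0.7, seed=None):
    """Sketch of Fig. 11: for each category, draw `ratio` of its samples
    at random as training data and keep the remainder for testing.
    features_by_class maps a label to a list of feature vectors."""
    rng = random.Random(seed)
    train, test = [], []
    for label, samples in features_by_class.items():
        idx = list(range(len(samples)))
        rng.shuffle(idx)
        n_train = int(len(idx) * ratio)
        train += [(samples[i], label) for i in idx[:n_train]]
        test += [(samples[i], label) for i in idx[n_train:]]
    return train, test
```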
Fig. 12 is a flowchart of the learning and training method provided by embodiment one of the present invention. As shown in Fig. 12, the specific steps are as follows:
Step 1201: preset the training count n.
n can be set empirically.
Step 1202: at the start of each training round, generate the training and test samples of each category with the flow shown in Fig. 11, obtaining the training set and the test set.
Step 1203: train the support vector machine classifier with the training set to obtain the classification decision rule model.
Step 1204: classify each test sample with the classification decision rule model; when all test samples have been classified, compute the classification accuracy of this round.
Step 1205: judge whether the training count has reached n. If so, go to step 1206; otherwise, return to step 1202 and start the next training round.
Step 1206: average the n classification accuracies to obtain the final classification accuracy.
In this embodiment, a different ratio can be adopted each time the training and test sets are generated.
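Building on split_per_class above, embodiment one might be sketched with scikit-learn's RBF-kernel SVC as follows (hyper-parameters are library defaults, not values from the patent):

```python
import numpy as np
from sklearn.svm import SVC

def run_training(features_by_class, n=5, ratio=0.7):
    """Sketch of Fig. 12: n rounds of split / RBF-SVM train / test,
    averaging the per-round accuracies."""
    accs = []
    for round_no in range(n):
        train, test = split_per_class(features_by_class, ratio, seed=round_no)
        Xtr, ytr = map(np.array, zip(*train))
        Xte, yte = map(np.array, zip(*test))
        model = SVC(kernel='rbf')            # RBF-kernel support vector machine
        model.fit(Xtr, ytr)
        accs.append((model.predict(Xte) == yte).mean())
    return float(np.mean(accs))              # final classification accuracy
```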
Fig. 13 is a flowchart of the learning and training method provided by embodiment two of the present invention. As shown in Fig. 13, the specific steps are as follows:
Step 1301: preset the training count n.
Step 1302: at the start of each training round, generate the training and test samples of each category with the flow shown in Fig. 11, obtaining the training set and the test set.
Step 1303: draw m random subsets from the training set, each containing p samples, obtaining m new training sets.
m can be set empirically, and p is less than the number of samples in the training set of step 1302.
Step 1304: train the support vector machine classifier with each of the m new training sets to obtain m classification decision rule models, then classify each test sample in the test set with each of the m models, obtaining m classification results for each sample of the test set.
Step 1305: for the m classification results of each test sample, vote over them; the category with the most votes is the final classification result of the test sample.
Step 1306: from the final classification results of all test samples, compute the classification accuracy of this round.
Step 1307: judge whether the training count has reached n. If so, go to step 1308; otherwise, return to step 1302 and start the next training round.
Step 1308: average the n classification accuracies to obtain the final classification accuracy.
This embodiment can also adopt a different proportion coefficient ratio in each training round.
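The bagging-and-voting core of steps 1303-1305 can be sketched as follows (integer class labels are assumed so the vote can use np.bincount; m, p and the SVC settings are illustrative, and scikit-learn's BaggingClassifier packages the same idea):

```python
import numpy as np
from sklearn.svm import SVC

def bagged_predict(Xtr, ytr, Xte, m=10, p=None, seed=0):
    """Sketch of steps 1303-1305: draw m random subsets of size p from
    the training set, train one RBF-SVM per subset, and majority-vote
    the m predictions for each test sample."""
    rng = np.random.default_rng(seed)
    p = p or int(0.8 * len(Xtr))             # p must stay below the set size
    votes = []
    for _ in range(m):
        idx = rng.choice(len(Xtr), size=p, replace=False)
        model = SVC(kernel='rbf').fit(Xtr[idx], ytr[idx])
        votes.append(model.predict(Xte))
    votes = np.stack(votes)                  # m x n_test label matrix
    # majority vote per test sample (assumes non-negative integer labels)
    return np.array([np.bincount(col).argmax() for col in votes.T])
```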
Fig. 14 is a schematic diagram of the composition of the multimedia data classification device provided by an embodiment of the present invention. As shown in Fig. 14, the device mainly comprises: a feature extraction module 141, a sample selection module 142, a training module 143 and a test module 144, wherein:
Feature extraction module 141: extracts features from each item of original multimedia data, the features of all data forming an original feature set; when the original multimedia data is video data, it extracts dynamic features, editing features and static features, and when the original multimedia data is image data, it extracts structural features together with any one or any combination of color features, texture features and shape features; it sends the original feature set to the sample selection module 142.
Sample selection module 142: receives the original feature set sent by the feature extraction module 141, selects, for each category, training samples of that category from the original feature set, all training samples forming a training set, and sends the training set to the training module 143.
Training module 143: receives the training set sent by the sample selection module 142, performs learning and training on it with a preset classification algorithm, generates a classification decision rule model, and sends the model to the test module 144.
Test module 144: receives the classification decision rule model sent by the training module 143 and, for any test sample, computes on the test sample with the model to obtain the category to which it belongs.
Characteristic extracting module 141 is further used for, when extract be characterized as behavioral characteristics time, for the every two consecutive frame YUV images in video data, calculate respectively the brightness average of every two field picture, calculate the absolute value of the difference of the brightness average of this two two field picture, obtain the luminance difference of this two two field picture, calculate the average of the luminance difference of whole video data, obtain the mean flow rate difference in change of whole video data; For the every two consecutive frame RGB images in video data, calculate respectively the average of the r of every two field picture, the average of g, the average of b, calculate the absolute value of the difference of the average of absolute value, the b of the difference of the average of absolute value, the g of the difference of the average of the r of this two two field picture, the r that obtains this two two field picture is poor, g is poor, b is poor, calculate the average of the r difference of whole video data, the average that g is poor, the average that b is poor, obtain the average r difference in change of whole video data, average g difference in change, average b difference in change; For the every two adjacent two field pictures in video data, calculate the motion vector average between this two two field picture, calculate the average motion vector average of whole video sequence; Mean flow rate difference in change, average r difference in change, average g difference in change, average b difference in change and average motion vector average have formed the behavioral characteristics of video data.
Editor's feature that characteristic extracting module 141 is extracted comprises: video lens shear rate, video lens fade rate and static frame per second.
The feature extraction module 141 is further used, when the extracted feature is the video shot cut rate, to: for every two consecutive YUV frames in the video data, calculate the means of the Y, U and V channels of each frame and take the absolute values of the differences of the corresponding means to obtain the Y, U and V differences of the frame pair; judge whether the Y difference exceeds a preset first threshold, the U difference a preset second threshold and the V difference a preset third threshold; if so, treat the frame pair as a candidate shot cut, apply the Sobel operator to each of the two frames to obtain two edge maps, and calculate the absolute value of the difference of the mean pixel values of the two edge maps; if that value exceeds a preset fourth threshold, calculate the mean motion vector between the two frames; if the mean motion vector exceeds a preset fifth threshold, determine that the frame pair is a shot cut and increment the shot-cut count by 1; when the whole video data has been examined, calculate the ratio of the shot-cut count to the total frame count of the video data to obtain the shot cut rate.
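A minimal sketch of the candidate shot-cut test, assuming OpenCV and illustrative threshold values T1..T4 (the patent leaves the preset thresholds unspecified); the mean Sobel response per frame is used here as a simple proxy for the edge-map mean, and the final motion-vector check against the fifth threshold is only noted in a comment because the patent does not specify the motion-estimation method.

import cv2
import numpy as np

# Illustrative preset thresholds; the patent does not fix their values.
T1, T2, T3, T4 = 10.0, 5.0, 5.0, 8.0

def is_candidate_cut(frame_a, frame_b):
    """Candidate shot-cut test for two consecutive frames (BGR arrays)."""
    yuv_a = cv2.cvtColor(frame_a, cv2.COLOR_BGR2YUV).astype(np.float32)
    yuv_b = cv2.cvtColor(frame_b, cv2.COLOR_BGR2YUV).astype(np.float32)
    # Per-channel absolute differences of the Y, U and V means
    dy, du, dv = (abs(yuv_a[:, :, c].mean() - yuv_b[:, :, c].mean()) for c in range(3))
    if not (dy > T1 and du > T2 and dv > T3):
        return False
    # Confirmation step: compare the mean Sobel edge responses of both frames
    means = []
    for f in (frame_a, frame_b):
        g = cv2.cvtColor(f, cv2.COLOR_BGR2GRAY)
        sx = cv2.Sobel(g, cv2.CV_32F, 1, 0)
        sy = cv2.Sobel(g, cv2.CV_32F, 0, 1)
        means.append(float(np.abs(sx).mean() + np.abs(sy).mean()))
    # A full implementation would additionally require the mean motion
    # vector between the frames to exceed a fifth preset threshold.
    return abs(means[0] - means[1]) > T4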
The feature extraction module 141 is further used, when the extracted feature is the video shot fade rate, to: for any two YUV frames q frames apart in the video data, calculate the absolute value of the difference of their mean Y values; if it exceeds a preset sixth threshold, judge whether the condition holds M consecutive times, that is, whether the absolute difference of the mean Y values of frames q apart exceeds the sixth threshold M times in a row; if so, determine that a gradual shot transition has been detected and increment the gradual-shot count by 1; otherwise, move to the next frame and continue the detection, wherein q and M are preset integers and q>1; when the whole video data has been examined, calculate the ratio of the gradual-shot count to the total frame count of the video data to obtain the shot fade rate.
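A sketch of the gradual-transition counting under stated assumptions: y_means is a precomputed list of per-frame mean Y values, and the parameters q, m and t6 stand in for the preset q, M and sixth threshold, whose values the patent does not fix.

def gradual_shot_rate(y_means, q=5, m=3, t6=4.0):
    """Count gradual transitions: the absolute difference of mean luminance
    between frames q apart must exceed t6 for m consecutive positions."""
    n = len(y_means)
    count = run = 0
    for i in range(max(n - q, 0)):
        if abs(y_means[i + q] - y_means[i]) > t6:
            run += 1
            if run == m:      # M consecutive exceedances: one gradual shot
                count += 1
                run = 0
        else:
            run = 0
    return count / n if n else 0.0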
The static features extracted by the feature extraction module 141 comprise: the mean luminance average, mean luminance variance, mean saturation average and wavelet transform texture features; or comprise: the high-component ratio of the mean luminance, the high-component ratio of the mean saturation and wavelet transform texture features; or comprise: the mean luminance average, mean luminance variance, mean saturation average, the high-component ratio of the mean luminance, the high-component ratio of the mean saturation and wavelet transform texture features.
The color features extracted by the feature extraction module 141 comprise: first-, second- and third-order HSV color moment features and first-, second- and third-order color histogram moment features of the image.
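As a hedged illustration of the first-, second- and third-order color moments (one common reading is the mean, standard deviation and cube root of the third central moment of each channel; the patent does not define the moments explicitly):

import cv2
import numpy as np

def hsv_color_moments(image_bgr):
    """First-, second- and third-order color moments of each HSV channel,
    giving a 9-dimensional feature vector."""
    hsv = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2HSV).astype(np.float32)
    feats = []
    for c in range(3):
        ch = hsv[:, :, c].ravel()
        mu = ch.mean()
        # mean, standard deviation, cube root of third central moment
        feats += [mu, ch.std(), np.cbrt(np.mean((ch - mu) ** 3))]
    return np.array(feats)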
The texture features extracted by the feature extraction module 141 comprise: texture features based on the gray-level co-occurrence matrix, and wavelet transform texture features.
The feature extraction module 141 is further used, when the extracted feature is a shape feature, to: calculate the gray-scale map of the image; apply Gaussian filtering to the gray-scale map in the x and y directions to obtain a filtered image I_s; calculate the gradients of I_s in the x and y directions to obtain gradient maps I_gradx and I_grady, and calculate the gradient magnitude map I_gradmag from I_gradx and I_grady; apply non-maximum suppression to I_gradmag to obtain a candidate boundary map I_edge; perform threshold estimation on I_edge to obtain a high threshold HighThrd; judge whether the gradient magnitude at each candidate boundary pixel of I_edge exceeds HighThrd and, if so, take that pixel as the starting point of a boundary and recursively trace the remaining boundary points until all pixels of the boundary have been found, yielding the final boundary map I_edge_final; and calculate the 7 geometric invariant moments of I_edge_final to obtain the shape feature of the image.
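The pipeline above closely mirrors the Canny edge detector, so a compact approximation can lean on OpenCV; the following is a sketch, not the patent's exact boundary tracer, and the blur kernel and hysteresis thresholds are illustrative values.

import cv2

def shape_feature(image_bgr):
    """Approximate the boundary pipeline with Gaussian smoothing and the
    Canny detector, then take the 7 Hu invariant moments of the edge map."""
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    smoothed = cv2.GaussianBlur(gray, (5, 5), 0)   # Gaussian filtering in x and y
    edges = cv2.Canny(smoothed, 50, 150)           # gradients + NMS + hysteresis thresholds
    # 7 geometric invariant (Hu) moments of the final boundary map
    return cv2.HuMoments(cv2.moments(edges)).flatten()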
The feature extraction module 141 is further used, when the extracted feature is a structural feature, to: calculate the gray-scale map I_gray of the image; apply the Census transform to I_gray to obtain the Census transform map I_census of the image; calculate the histogram of I_census, whose dimension is 256; and apply principal component analysis (PCA) to reduce the histogram to a final 40-dimensional Census transform histogram, which serves as the structural feature of the image.
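A sketch of a Census transform and its 256-bin histogram in NumPy; the 3x3 neighbourhood is an assumption (the patent does not state the window size), and the PCA step to 40 dimensions is omitted because it must be fitted over a population of training histograms (for example with sklearn.decomposition.PCA(n_components=40)).

import numpy as np

def census_histogram(gray):
    """3x3 Census transform: encode each pixel as 8 bits of comparisons
    with its neighbours (values 0..255), then histogram into 256 bins."""
    g = gray.astype(np.int32)
    h, w = g.shape
    center = g[1:-1, 1:-1]
    census = np.zeros((h - 2, w - 2), dtype=np.int32)
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
               (0, 1), (1, -1), (1, 0), (1, 1)]
    for bit, (dy, dx) in enumerate(offsets):
        neighbour = g[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        census |= (neighbour > center).astype(np.int32) << bit
    hist, _ = np.histogram(census, bins=256, range=(0, 256))
    return hist  # PCA to 40 dimensions would be fitted over many such histograms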
The training module 143 is further used to: preset a training count n; randomly subsample the training set m times, the number of samples selected each time being smaller than the total number of samples in the training set, to obtain m new training sets; train on each of the m new training sets to obtain m classification decision rule models and send them to the test module; judge whether the number of training rounds has reached n and, if so, send a training-complete indication to the test module; otherwise, start the next round of training.
Correspondingly, the test module 144 is further used to: upon receiving the m classification decision rule models sent by the training module, classify each test sample with the m models to obtain m classification results per test sample; vote over the m results, taking the category with the most votes as the classification result for that test sample; and, upon receiving the training-complete indication from the training module, determine that training is finished.
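A sketch of the m-subset training plus majority voting; the base classifier (a decision tree here), m and the subsample fraction are assumptions of the example, since the patent only requires each random subset to be smaller than the full training set and does not name the classification algorithm.

import numpy as np
from collections import Counter
from sklearn.tree import DecisionTreeClassifier

def train_and_vote(X_train, y_train, X_test, m=5, subsample=0.7, seed=0):
    """Train m models on random subsets smaller than the full training
    set, then majority-vote their predictions per test sample."""
    rng = np.random.default_rng(seed)
    n = len(X_train)
    models = []
    for _ in range(m):
        # Each random subset is smaller than the full training set
        idx = rng.choice(n, size=int(subsample * n), replace=False)
        models.append(DecisionTreeClassifier().fit(X_train[idx], y_train[idx]))
    preds = np.array([mdl.predict(X_test) for mdl in models])  # shape (m, n_test)
    # The category with the most votes becomes the result for each sample
    return [Counter(col).most_common(1)[0][0] for col in preds.T]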
In practical applications, video and image classification experiments were carried out with the embodiment of the present invention, as follows:
For video, four categories were used: cartoon, news, sports and others. Video clips of these four common categories were collected from the internet as the original video data: 50 samples each for the cartoon, news and sports categories and 39 samples for the others category. With training-to-test set ratios of 5:5, 6:4, 7:3, 8:2 and 9:1, the video feature extraction and classification method of the embodiment of the present invention achieved classification accuracies of 84% to 88%.
For images, the Wang Group image database was selected: 10 classes (indigenous people, beaches, buildings, buses, dinosaurs, elephants, flowers, horses, mountains and food), with 100 images per class, 1000 images in total. With the same training-to-test set ratios of 5:5, 6:4, 7:3, 8:2 and 9:1, the image feature extraction and classification method of the embodiment of the present invention achieved classification accuracies of 83% to 86%.
The foregoing are merely preferred embodiments of the present invention and are not intended to limit the present invention; any modification, equivalent replacement, improvement and the like made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims (22)

1. A multimedia data classification method, characterized in that the method comprises:
extracting features from each item of original multimedia data, the features of all data forming an original feature set, wherein, when the original multimedia data is video data, dynamic features, editing features and static features are extracted, and, when the original multimedia data is image data, structural features are extracted together with one or any combination of color features, texture features and shape features;
for each category, selecting the training samples of that category from the original feature set, all training samples forming a training set;
training on the training set with a preset classification algorithm to generate a classification decision rule model;
for any test sample, applying the classification decision rule model to the test sample to obtain the category to which it belongs.
2. The method according to claim 1, characterized in that, when the extracted feature is a dynamic feature,
said extracting features comprises:
for every two consecutive YUV frames in the video data, calculating the mean luminance of each frame and the absolute value of the difference of the two means to obtain the luminance difference of the frame pair, then averaging the luminance differences over the whole video data to obtain its mean luminance change;
for every two consecutive RGB frames in the video data, calculating the means of the R, G and B channels of each frame, taking the absolute values of the differences of the corresponding means to obtain the R, G and B differences of the frame pair, then averaging these over the whole video data to obtain its mean R change, mean G change and mean B change;
for every two adjacent frames in the video data, calculating the mean motion vector between them, then averaging over the whole video sequence to obtain the overall mean motion vector;
the mean luminance change, mean R change, mean G change, mean B change and mean motion vector forming the dynamic features of the video data.
3. The method according to claim 1, characterized in that the editing features comprise: the video shot cut rate, the video shot fade rate and the still frame rate.
4. The method according to claim 3, characterized in that, when the extracted feature is the video shot cut rate,
said extracting features comprises:
for every two consecutive YUV frames in the video data, calculating the means of the Y, U and V channels of each frame, and taking the absolute values of the differences of the corresponding means to obtain the Y, U and V differences of the frame pair;
judging whether the Y difference exceeds a preset first threshold, the U difference a preset second threshold and the V difference a preset third threshold; if so, treating the frame pair as a candidate shot cut, applying the Sobel operator to each of the two frames to obtain two edge maps, and calculating the absolute value of the difference of the mean pixel values of the two edge maps; if that value exceeds a preset fourth threshold, calculating the mean motion vector between the two frames; if the mean motion vector exceeds a preset fifth threshold, determining that the frame pair is a shot cut and incrementing the shot-cut count by 1;
when the whole video data has been examined, calculating the ratio of the shot-cut count to the total frame count of the video data to obtain the shot cut rate.
5. The method according to claim 3, characterized in that, when the extracted feature is the video shot fade rate,
said extracting features comprises:
for any two YUV frames q frames apart in the video data, calculating the absolute value of the difference of their mean Y values; if it exceeds a preset sixth threshold, judging whether the condition holds M consecutive times, that is, whether the absolute difference of the mean Y values of frames q apart exceeds the sixth threshold M times in a row; if so, determining that a gradual shot transition has been detected and incrementing the gradual-shot count by 1; otherwise, moving to the next frame and continuing the detection, wherein q and M are preset integers and q>1;
when the whole video data has been examined, calculating the ratio of the gradual-shot count to the total frame count of the video data to obtain the shot fade rate.
6. The method according to claim 3, characterized in that the static features comprise: the mean luminance average, mean luminance variance, mean saturation average and wavelet transform texture features; or comprise: the high-component ratio of the mean luminance, the high-component ratio of the mean saturation and wavelet transform texture features; or comprise:
the mean luminance average, mean luminance variance, mean saturation average, the high-component ratio of the mean luminance, the high-component ratio of the mean saturation and wavelet transform texture features.
7. The method according to claim 1, characterized in that the color features comprise: first-, second- and third-order HSV color moment features and first-, second- and third-order color histogram moment features of the image.
8. The method according to claim 1, characterized in that the texture features comprise: texture features based on the gray-level co-occurrence matrix, and wavelet transform texture features.
9. The method according to claim 1, characterized in that, when the extracted feature is a shape feature,
said extracting features comprises:
calculating the gray-scale map of the image;
applying Gaussian filtering to the gray-scale map in the x and y directions to obtain a filtered image I_s;
calculating the gradients of I_s in the x and y directions to obtain gradient maps I_gradx and I_grady, and calculating the gradient magnitude map I_gradmag from I_gradx and I_grady;
applying non-maximum suppression to I_gradmag to obtain a candidate boundary map I_edge;
performing threshold estimation on I_edge to obtain a high threshold HighThrd; judging whether the gradient magnitude at each candidate boundary pixel of I_edge exceeds HighThrd and, if so, taking that pixel as the starting point of a boundary and recursively tracing the remaining boundary points until all pixels of the boundary have been found, yielding the final boundary map I_edge_final;
calculating the 7 geometric invariant moments of I_edge_final to obtain the shape feature of the image.
10. The method according to claim 1, characterized in that, when the extracted feature is a structural feature,
said extracting features comprises:
calculating the gray-scale map I_gray of the image;
applying the Census transform to I_gray to obtain the Census transform map I_census of the image;
calculating the histogram of I_census, whose dimension is 256;
applying principal component analysis (PCA) to reduce the histogram to a final 40-dimensional Census transform histogram, the 40-dimensional Census transform histogram serving as the structural feature of the image.
11. The method according to claim 1, characterized in that said training on the training set comprises:
presetting a training count n;
randomly subsampling the training set m times, the number of samples selected each time being smaller than the total number of samples in the training set, to obtain m new training sets;
training on each of the m new training sets to obtain m classification decision rule models, classifying each test sample with the m models to obtain m classification results per test sample, and voting over the m results, the category with the most votes being taken as the classification result for that test sample;
judging whether the number of training rounds has reached n; if so, determining that training is finished; otherwise, starting the next round of training.
12. A multimedia data classification device, characterized in that it comprises:
a feature extraction module, which extracts features from each item of original multimedia data, the features of all data forming an original feature set, wherein, when the original multimedia data is video data, dynamic features, editing features and static features are extracted, and, when the original multimedia data is image data, structural features are extracted together with one or any combination of color features, texture features and shape features, and which sends out the original feature set;
a sample selection module, which receives the original feature set, selects for each category the training samples of that category from the original feature set, all training samples forming a training set, and sends out the training set;
a training module, which receives the training set, trains on it with a preset classification algorithm to generate a classification decision rule model, and sends out the model;
a test module, which receives the classification decision rule model and, for any test sample, applies the model to the test sample to obtain the category to which it belongs.
13. The device according to claim 12, characterized in that the feature extraction module is further used, when the extracted feature is a dynamic feature, to: for every two consecutive YUV frames in the video data, calculate the mean luminance of each frame and the absolute value of the difference of the two means to obtain the luminance difference of the frame pair, then average the luminance differences over the whole video data to obtain its mean luminance change; for every two consecutive RGB frames, calculate the means of the R, G and B channels of each frame, take the absolute values of the differences of the corresponding means to obtain the R, G and B differences of the frame pair, then average these over the whole video data to obtain its mean R change, mean G change and mean B change; for every two adjacent frames, calculate the mean motion vector between them, then average over the whole video sequence to obtain the overall mean motion vector; the mean luminance change, mean R change, mean G change, mean B change and mean motion vector forming the dynamic features of the video data.
14. The device according to claim 12, characterized in that the editing features extracted by the feature extraction module comprise: the video shot cut rate, the video shot fade rate and the still frame rate.
15. The device according to claim 14, characterized in that the feature extraction module is further used, when the extracted feature is the video shot cut rate, to: for every two consecutive YUV frames in the video data, calculate the means of the Y, U and V channels of each frame and take the absolute values of the differences of the corresponding means to obtain the Y, U and V differences of the frame pair; judge whether the Y difference exceeds a preset first threshold, the U difference a preset second threshold and the V difference a preset third threshold; if so, treat the frame pair as a candidate shot cut, apply the Sobel operator to each of the two frames to obtain two edge maps, and calculate the absolute value of the difference of the mean pixel values of the two edge maps; if that value exceeds a preset fourth threshold, calculate the mean motion vector between the two frames; if the mean motion vector exceeds a preset fifth threshold, determine that the frame pair is a shot cut and increment the shot-cut count by 1; when the whole video data has been examined, calculate the ratio of the shot-cut count to the total frame count of the video data to obtain the shot cut rate.
16. The device according to claim 14, characterized in that the feature extraction module is further used, when the extracted feature is the video shot fade rate, to: for any two YUV frames q frames apart in the video data, calculate the absolute value of the difference of their mean Y values; if it exceeds a preset sixth threshold, judge whether the condition holds M consecutive times, that is, whether the absolute difference of the mean Y values of frames q apart exceeds the sixth threshold M times in a row; if so, determine that a gradual shot transition has been detected and increment the gradual-shot count by 1; otherwise, move to the next frame and continue the detection, wherein q and M are preset integers and q>1; when the whole video data has been examined, calculate the ratio of the gradual-shot count to the total frame count of the video data to obtain the shot fade rate.
17. The device according to claim 12, characterized in that the static features extracted by the feature extraction module comprise: the mean luminance average, mean luminance variance, mean saturation average and wavelet transform texture features; or comprise: the high-component ratio of the mean luminance, the high-component ratio of the mean saturation and wavelet transform texture features; or comprise: the mean luminance average, mean luminance variance, mean saturation average, the high-component ratio of the mean luminance, the high-component ratio of the mean saturation and wavelet transform texture features.
18. The device according to claim 12, characterized in that the color features extracted by the feature extraction module comprise: first-, second- and third-order HSV color moment features and first-, second- and third-order color histogram moment features of the image.
19. The device according to claim 12, characterized in that the texture features extracted by the feature extraction module comprise: texture features based on the gray-level co-occurrence matrix, and wavelet transform texture features.
20. The device according to claim 12, characterized in that the feature extraction module is further used, when the extracted feature is a shape feature, to: calculate the gray-scale map of the image; apply Gaussian filtering to the gray-scale map in the x and y directions to obtain a filtered image I_s; calculate the gradients of I_s in the x and y directions to obtain gradient maps I_gradx and I_grady, and calculate the gradient magnitude map I_gradmag from I_gradx and I_grady; apply non-maximum suppression to I_gradmag to obtain a candidate boundary map I_edge; perform threshold estimation on I_edge to obtain a high threshold HighThrd; judge whether the gradient magnitude at each candidate boundary pixel of I_edge exceeds HighThrd and, if so, take that pixel as the starting point of a boundary and recursively trace the remaining boundary points until all pixels of the boundary have been found, yielding the final boundary map I_edge_final; and calculate the 7 geometric invariant moments of I_edge_final to obtain the shape feature of the image.
21. The device according to claim 12, characterized in that the feature extraction module is further used, when the extracted feature is a structural feature, to: calculate the gray-scale map I_gray of the image; apply the Census transform to I_gray to obtain the Census transform map I_census of the image; calculate the histogram of I_census, whose dimension is 256; and apply principal component analysis (PCA) to reduce the histogram to a final 40-dimensional Census transform histogram, which serves as the structural feature of the image.
22. The device according to claim 12, characterized in that the training module is further used to: preset a training count n; randomly subsample the training set m times, the number of samples selected each time being smaller than the total number of samples in the training set, to obtain m new training sets; train on each of the m new training sets to obtain m classification decision rule models and send them to the test module; judge whether the number of training rounds has reached n and, if so, send a training-complete indication to the test module; otherwise, start the next round of training;
the test module being further used to: upon receiving the m classification decision rule models sent by the training module, classify each test sample with the m models to obtain m classification results per test sample; vote over the m results, taking the category with the most votes as the classification result for that test sample; and, upon receiving the training-complete indication from the training module, determine that training is finished.
CN201210498829.XA 2012-11-29 2012-11-29 multimedia data classification method and device Active CN103853724B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210498829.XA CN103853724B (en) 2012-11-29 2012-11-29 multimedia data classification method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201210498829.XA CN103853724B (en) 2012-11-29 2012-11-29 multimedia data classification method and device

Publications (2)

Publication Number Publication Date
CN103853724A true CN103853724A (en) 2014-06-11
CN103853724B CN103853724B (en) 2017-10-17

Family

ID=50861392

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210498829.XA Active CN103853724B (en) 2012-11-29 2012-11-29 multimedia data classification method and device

Country Status (1)

Country Link
CN (1) CN103853724B (en)



Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101276417A (en) * 2008-04-17 2008-10-01 上海交通大学 Method for filtering internet cartoon medium rubbish information based on content
US20110026840A1 (en) * 2009-07-28 2011-02-03 Samsung Electronics Co., Ltd. System and method for indoor-outdoor scene classification
CN102508923A (en) * 2011-11-22 2012-06-20 北京大学 Automatic video annotation method based on automatic classification and keyword marking

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Liu Bochao, "Research on content-based objectionable image recognition", China Master's Theses Full-text Database, Information Science and Technology, 15 October 2007, I138-762 *
Qin Dan, "Research on automatic video content classification algorithms based on multi-feature combination and SVM", China Master's Theses Full-text Database, Information Science and Technology, 15 December 2011, I138-1309 *
Chen Hailin, "Research on image object classification based on discriminative learning", China Doctoral Dissertations Full-text Database, Information Science and Technology *

Cited By (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104463196A (en) * 2014-11-11 2015-03-25 中国人民解放军理工大学 Video-based weather phenomenon recognition method
CN104463196B (en) * 2014-11-11 2017-07-25 中国人民解放军理工大学 A kind of weather phenomenon recognition methods based on video
CN105046366B (en) * 2015-07-29 2018-06-08 腾讯科技(深圳)有限公司 model training method and device
CN105046366A (en) * 2015-07-29 2015-11-11 腾讯科技(深圳)有限公司 Model training method and device
CN108780462A (en) * 2016-03-13 2018-11-09 科尔蒂卡有限公司 System and method for being clustered to multimedia content element
CN108780462B (en) * 2016-03-13 2022-11-22 科尔蒂卡有限公司 System and method for clustering multimedia content elements
CN106570910A (en) * 2016-11-02 2017-04-19 南阳理工学院 Auto-encoding characteristic and neighbor model based automatic image marking method
CN106570910B (en) * 2016-11-02 2019-08-20 南阳理工学院 Based on the image automatic annotation method from coding characteristic and Neighborhood Model
CN107291825A (en) * 2017-05-26 2017-10-24 北京奇艺世纪科技有限公司 With the search method and system of money commodity in a kind of video
CN108229300A (en) * 2017-11-02 2018-06-29 深圳市商汤科技有限公司 Video classification methods, device, computer readable storage medium and electronic equipment
CN108229300B (en) * 2017-11-02 2020-08-11 深圳市商汤科技有限公司 Video classification method and device, computer-readable storage medium and electronic equipment
CN108205324A (en) * 2018-01-03 2018-06-26 李文清 A kind of Intelligent road cleaning plant
CN108399241A (en) * 2018-02-28 2018-08-14 福州大学 A kind of emerging much-talked-about topic detecting system based on multiclass feature fusion
CN108399241B (en) * 2018-02-28 2021-08-31 福州大学 Emerging hot topic detection system based on multi-class feature fusion
CN109347834A (en) * 2018-10-24 2019-02-15 广东工业大学 Detection method, device and the equipment of abnormal data in Internet of Things edge calculations environment
CN109347834B (en) * 2018-10-24 2021-03-16 广东工业大学 Method, device and equipment for detecting abnormal data in Internet of things edge computing environment
CN109580656B (en) * 2018-12-24 2021-01-15 广东华中科技大学工业技术研究院 Mobile phone light guide plate defect detection method and system based on dynamic weight combination classifier
CN109580656A (en) * 2018-12-24 2019-04-05 广东华中科技大学工业技术研究院 Mobile phone light guide panel defect inspection method and system based on changeable weight assembled classifier
CN113326857A (en) * 2020-02-28 2021-08-31 合肥美亚光电技术股份有限公司 Model training method and device
CN116894794A (en) * 2023-09-11 2023-10-17 长沙超创电子科技有限公司 Quick denoising method for video
CN116894794B (en) * 2023-09-11 2023-11-21 长沙超创电子科技有限公司 Quick denoising method for video

Also Published As

Publication number Publication date
CN103853724B (en) 2017-10-17

Similar Documents

Publication Publication Date Title
CN103853724A (en) Multimedia data sorting method and device
CN109165623B (en) Rice disease spot detection method and system based on deep learning
Lee et al. Adaboost for text detection in natural scene
US9443314B1 (en) Hierarchical conditional random field model for labeling and segmenting images
CN109840560B (en) Image classification method based on clustering in capsule network
CN104866616B (en) Monitor video Target Searching Method
CN106126585B (en) The unmanned plane image search method combined based on quality grading with perceived hash characteristics
CN108647602B (en) A kind of aerial remote sensing images scene classification method determined based on image complexity
CN103530638B (en) Method for pedestrian matching under multi-cam
CN105574063A (en) Image retrieval method based on visual saliency
CN102024152B (en) Method for recognizing traffic sings based on sparse expression and dictionary study
CN106156777B (en) Text picture detection method and device
CN103218832B (en) Based on the vision significance algorithm of global color contrast and spatial distribution in image
US8503768B2 (en) Shape description and modeling for image subscene recognition
CN102695056A (en) Method for extracting compressed video key frames
CN104778476A (en) Image classification method
CN104281849A (en) Fabric image color feature extraction method
Farinella et al. Scene classification in compressed and constrained domain
CN102867183A (en) Method and device for detecting littered objects of vehicle and intelligent traffic monitoring system
CN110827312A (en) Learning method based on cooperative visual attention neural network
CN103218604A (en) Method for detecting pedestrians in traffic scene based on road surface extraction
CN109858570A (en) Image classification method and system, computer equipment and medium
CN103985130A (en) Image significance analysis method for complex texture images
CN111382766A (en) Equipment fault detection method based on fast R-CNN
CN108647703B (en) Saliency-based classification image library type judgment method

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
CP02 Change in the address of a patent holder

Address after: 5-12 / F, building 6, 57 Andemen street, Yuhuatai District, Nanjing City, Jiangsu Province

Patentee after: Samsung Electronics (China) R&D Center

Patentee after: SAMSUNG ELECTRONICS Co.,Ltd.

Address before: 17 / F, Xindi center, 188 Lushan Road, Jianye District, Nanjing, Jiangsu 210019

Patentee before: Samsung Electronics (China) R&D Center

Patentee before: SAMSUNG ELECTRONICS Co.,Ltd.

CP02 Change in the address of a patent holder