CN103853724A - Multimedia data sorting method and device

Multimedia data sorting method and device

Info

Publication number
CN103853724A
CN103853724A (application CN201210498829.XA)
Authority
CN
China
Prior art keywords
average
difference
calculate
training
frame image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201210498829.XA
Other languages
Chinese (zh)
Other versions
CN103853724B (en)
Inventor
常江龙
徐法明
朱春波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Samsung Electronics China R&D Center
Samsung Electronics Co Ltd
Original Assignee
Samsung Electronics China R&D Center
Samsung Electronics Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Samsung Electronics China R&D Center, Samsung Electronics Co Ltd filed Critical Samsung Electronics China R&D Center
Priority to CN201210498829.XA
Publication of CN103853724A
Application granted
Publication of CN103853724B
Legal status: Active
Anticipated expiration

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 — Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/40 — Information retrieval of multimedia data, e.g. slideshows comprising image and additional audio data
    • G06F 18/00 — Pattern recognition
    • G06F 18/20 — Analysing
    • G06F 18/24 — Classification techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a multimedia data classification method and device. The method comprises: extracting features from each item of original multimedia data, the features of all data forming an original feature set, where dynamic features, editing features and static features are extracted when the original multimedia data is video data, and structural features together with any one or any combination of color features, texture features and shape features are extracted when the original multimedia data is image data; selecting training samples of each category from the original feature set, all training samples forming a training set; performing learning and training on the training set with a preset classification algorithm to generate a classification decision rule model; and, for any test sample, computing on it with the classification decision rule model to obtain the category to which it belongs. The method improves the accuracy of multimedia data classification.

Description

Multimedia data classification method and device
Technical field
The present invention relates to the field of multimedia technology, and in particular to a multimedia data classification method and device.
Background art
With the surge in the amount of multimedia data, managing these data effectively has become a problem demanding a prompt solution. The vast majority of existing multimedia classification techniques are text-based, while content-based multimedia classification techniques, including content-based video classification and content-based image classification, are still at the research and development stage.
Existing content-based video features mainly include video editing features, video dynamic features and video static features. Existing content-based image features mainly include image color features, image texture features, image shape features and image spatial-relationship features. The machine-learning and pattern-classification methods used in existing content-based multimedia classification mainly include Bayesian decision methods, artificial neural networks, decision trees, linear discriminant functions and non-parametric methods. In terms of categories, existing multimedia classification techniques cover both two-class and multi-class methods.
Existing content-based multimedia classification techniques have the following shortcomings:
First, most of them extract only one feature, or a small combination of features, from a video or image, such as basic color or texture features, so the classification accuracy for videos and images leaves much room for improvement.
Second, in machine learning and pattern classification, most of them train and classify with a single classifier, and make insufficient use of ensemble learning and ensemble classification methods.
Third, they mainly solve rather narrow problems, such as classifying a single sport or cartoon videos in video classification, or indoor/outdoor and advertising/non-advertising classification in image classification; their handling of general-purpose, multi-class video and image classification is inadequate.
Summary of the invention
The invention provides a multimedia data classification method and device to improve the accuracy of multimedia data classification.
The technical scheme of the present invention is achieved as follows:
A multimedia data classification method, the method comprising:
extracting features from each item of original multimedia data, the features of all data forming an original feature set, wherein, when the original multimedia data is video data, dynamic features, editing features and static features are extracted, and, when the original multimedia data is image data, structural features are extracted together with any one or any combination of color features, texture features and shape features;
for each category, selecting training samples of that category from the original feature set, all training samples forming a training set;
performing learning and training on the training set with a preset classification algorithm to generate a classification decision rule model;
for any test sample, computing on the test sample with the classification decision rule model to obtain the category to which it belongs.
When said features are dynamic features,
said feature extraction comprises:
for every two consecutive YUV frames in the video data, computing the luminance mean of each frame, computing the absolute value of the difference of the two luminance means to obtain the luminance difference of the two frames, and computing the average of the luminance differences over the whole video data to obtain the mean luminance change of the whole video data;
for every two consecutive RGB frames in the video data, computing the means of r, g and b of each frame, computing the absolute values of the differences of the r means, the g means and the b means of the two frames to obtain the r, g and b differences of the two frames, and computing the averages of the r, g and b differences over the whole video data to obtain the mean r change, mean g change and mean b change of the whole video data;
for every two adjacent frames in the video data, computing the motion-vector mean between the two frames, and computing the average motion-vector mean of the whole video sequence;
the mean luminance change, mean r change, mean g change, mean b change and average motion-vector mean constitute the dynamic features of the video data.
Said editing features comprise: video shot cut rate, video shot fade rate and static frame rate.
When said feature is the video shot cut rate,
said feature extraction comprises:
for every two consecutive YUV frames in the video data, computing the means of y, u and v of each frame, and computing the absolute values of the differences of the y means, the u means and the v means of the two frames to obtain the y, u and v differences of the two frames;
judging whether the following is satisfied: the y difference is greater than a preset first threshold, the u difference is greater than a preset second threshold, and the v difference is greater than a preset third threshold; if so, determining that the two frames are candidate shot cut frames, computing on the two frames with the Sobel operator to obtain two edge maps, and computing the absolute value of the difference of the mean pixel values of the two edge maps; if this absolute value is greater than a preset fourth threshold, computing the motion-vector mean between the two frames; if the motion-vector mean is greater than a preset fifth threshold, determining that the two frames are shot cut frames and adding 1 to the shot cut count;
when the whole video data has been examined, computing the ratio of the shot cut count to the total frame count of the video data to obtain the shot cut rate.
When said feature is the video shot fade rate,
said feature extraction comprises:
for any two YUV frames q frames apart in the video data, computing the absolute value of the difference of the y means of the two frames; if this absolute value is greater than a preset sixth threshold, judging whether the following is satisfied M consecutive times: the absolute value of the difference of the y means of images q frames apart is greater than the preset sixth threshold; if so, judging that a gradual shot transition has been detected and adding 1 to the gradual shot count; otherwise, moving to the next frame and continuing the detection, where q and M are preset integers and q>1;
when the whole video data has been examined, computing the ratio of the gradual shot count to the total frame count of the video data to obtain the gradual shot rate.
Said static features comprise: mean luminance average, mean luminance variance, mean saturation average and wavelet-transform texture features; or comprise: mean high-luminance component ratio, mean high-saturation component ratio and wavelet-transform texture features; or comprise: mean luminance average, mean luminance variance, mean saturation average, mean high-luminance component ratio, mean high-saturation component ratio and wavelet-transform texture features.
Said color features comprise: the first-, second- and third-order HSV color moment features of the image and its first-, second- and third-order color histogram moment features.
Said texture features comprise: texture features based on the gray-level co-occurrence matrix and wavelet-transform texture features.
When said feature is a shape feature,
said feature extraction comprises:
computing the grayscale map of the image;
performing Gaussian filtering in the x and y directions of the grayscale map to obtain the filtered image I_s;
computing the gradients of I_s in the x and y directions to obtain the gradient maps I_gradx and I_grady, and computing the gradient magnitude map I_gradmag from I_gradx and I_grady;
applying non-maximum suppression to the gradient magnitude map I_gradmag to obtain the candidate boundary map I_edge;
performing threshold estimation on I_edge to obtain a high threshold HighThrd, and judging whether the gradient magnitude of each candidate boundary pixel in I_edge is greater than HighThrd; if so, taking that pixel as the starting point of a boundary and recursively tracking the other boundary points until all pixels of the boundary have been found, obtaining the final boundary map I_edge_final;
computing the 7 geometric invariant moments of the boundary map I_edge_final to obtain the shape features of the image.
When said feature is a structural feature,
said feature extraction comprises:
computing the grayscale map I_gray of the image;
applying the Census transform to I_gray to obtain the Census transform map I_census of the image;
computing the histogram of I_census, whose dimension is 256;
reducing the dimension of the histogram with principal component analysis (PCA) so that the final Census transform histogram has 40 dimensions, and taking this 40-dimensional Census transform histogram as the structural feature of the image.
Said performing learning and training on the training set comprises:
presetting a training count n;
drawing m random subsets from the training set, the number of samples drawn each time being less than the total number of samples in the training set, to obtain m new training sets;
performing learning and training with each of the m new training sets to obtain m classification decision rule models, classifying each test sample with each of the m models to obtain m classification results per test sample, and voting over the m classification results, the category with the most votes being the classification result of the test sample;
judging whether the training count has reached n; if so, determining that training is finished; otherwise, starting the next round of training.
A multimedia data classification device, comprising:
a feature extraction module, which extracts features from each item of original multimedia data, the features of all data forming an original feature set, wherein dynamic features, editing features and static features are extracted when the original multimedia data is video data, and structural features together with any one or any combination of color features, texture features and shape features are extracted when the original multimedia data is image data, and which sends out the original feature set;
a sample selection module, which receives the original feature set, selects, for each category, training samples of that category from the original feature set, all training samples forming a training set, and sends out the training set;
a training module, which receives the training set, performs learning and training on it with a preset classification algorithm, generates a classification decision rule model and sends the model out;
a test module, which receives the classification decision rule model and, for any test sample, computes on the test sample with the model to obtain the category to which it belongs.
The feature extraction module is further used for, when said features are dynamic features: for every two consecutive YUV frames in the video data, computing the luminance mean of each frame, computing the absolute value of the difference of the two luminance means to obtain the luminance difference of the two frames, and averaging the luminance differences over the whole video data to obtain the mean luminance change; for every two consecutive RGB frames, computing the means of r, g and b of each frame, computing the absolute values of the differences of the r, g and b means of the two frames to obtain the r, g and b differences, and averaging these over the whole video data to obtain the mean r change, mean g change and mean b change; for every two adjacent frames, computing the motion-vector mean between the two frames and computing the average motion-vector mean of the whole video sequence; the mean luminance change, mean r change, mean g change, mean b change and average motion-vector mean constitute the dynamic features of the video data.
The editing features extracted by the feature extraction module comprise: video shot cut rate, video shot fade rate and static frame rate.
The feature extraction module is further used for, when said feature is the video shot cut rate: for every two consecutive YUV frames in the video data, computing the means of y, u and v of each frame and the absolute values of the differences of the y, u and v means of the two frames to obtain the y, u and v differences; judging whether the y difference is greater than a preset first threshold, the u difference greater than a preset second threshold and the v difference greater than a preset third threshold; if so, determining that the two frames are candidate shot cut frames, computing on them with the Sobel operator to obtain two edge maps and computing the absolute value of the difference of their mean pixel values; if this absolute value is greater than a preset fourth threshold, computing the motion-vector mean between the two frames; if the motion-vector mean is greater than a preset fifth threshold, determining that the two frames are shot cut frames and adding 1 to the shot cut count; and, when the whole video data has been examined, computing the ratio of the shot cut count to the total frame count to obtain the shot cut rate.
The feature extraction module is further used for, when said feature is the video shot fade rate: for any two YUV frames q frames apart in the video data, computing the absolute value of the difference of their y means; if this absolute value is greater than a preset sixth threshold, judging whether it is satisfied M consecutive times that the absolute value of the difference of the y means of images q frames apart is greater than the preset sixth threshold; if so, judging that a gradual shot transition has been detected and adding 1 to the gradual shot count; otherwise, moving to the next frame and continuing the detection, where q and M are preset integers and q>1; and, when the whole video data has been examined, computing the ratio of the gradual shot count to the total frame count to obtain the gradual shot rate.
The static features extracted by the feature extraction module comprise: mean luminance average, mean luminance variance, mean saturation average and wavelet-transform texture features; or comprise: mean high-luminance component ratio, mean high-saturation component ratio and wavelet-transform texture features; or comprise: mean luminance average, mean luminance variance, mean saturation average, mean high-luminance component ratio, mean high-saturation component ratio and wavelet-transform texture features.
The color features extracted by the feature extraction module comprise: the first-, second- and third-order HSV color moment features of the image and its first-, second- and third-order color histogram moment features.
The texture features extracted by the feature extraction module comprise: texture features based on the gray-level co-occurrence matrix and wavelet-transform texture features.
The feature extraction module is further used for, when said feature is a shape feature: computing the grayscale map of the image; performing Gaussian filtering in the x and y directions of the grayscale map to obtain the filtered image I_s; computing the gradients of I_s in the x and y directions to obtain the gradient maps I_gradx and I_grady, and computing the gradient magnitude map I_gradmag from them; applying non-maximum suppression to I_gradmag to obtain the candidate boundary map I_edge; performing threshold estimation on I_edge to obtain a high threshold HighThrd and judging whether the gradient magnitude of each candidate boundary pixel in I_edge is greater than HighThrd, if so taking that pixel as the starting point of a boundary and recursively tracking the other boundary points until all pixels of the boundary have been found, obtaining the final boundary map I_edge_final; and computing the 7 geometric invariant moments of I_edge_final to obtain the shape features of the image.
The feature extraction module is further used for, when said feature is a structural feature: computing the grayscale map I_gray of the image; applying the Census transform to I_gray to obtain the Census transform map I_census; computing the 256-dimensional histogram of I_census; and reducing the dimension of the histogram with principal component analysis (PCA) so that the final Census transform histogram has 40 dimensions, taking this 40-dimensional Census transform histogram as the structural feature of the image.
The training module is further used for: presetting a training count n; drawing m random subsets from the training set, the number of samples drawn each time being less than the total number of samples in the training set, to obtain m new training sets; performing learning and training with each of the m new training sets to obtain m classification decision rule models and sending the m models to the test module; and judging whether the training count has reached n, if so sending a training-finished indication to the test module, otherwise starting the next round of training;
the test module is further used for: upon receiving the m classification decision rule models sent by the training module, classifying each test sample with each of the m models to obtain m classification results per test sample, and voting over the m classification results, the category with the most votes being the classification result of the test sample; and, upon receiving the training-finished indication sent by the training module, determining that training is finished.
Compared with the prior art, the present invention improves the accuracy of multimedia data classification.
Brief description of the drawings
Fig. 1 is a flowchart of the multimedia data classification method provided by an embodiment of the present invention;
Fig. 2 is a flowchart of the method for extracting the dynamic features of video data provided by an embodiment of the present invention;
Fig. 3 is a flowchart of the method for extracting the shot cut rate feature of video data provided by an embodiment of the present invention;
Fig. 4 is a flowchart of the method for extracting the gradual shot rate feature of video data provided by an embodiment of the present invention;
Fig. 5 is a flowchart of the method for extracting the static frame rate feature of video data provided by an embodiment of the present invention;
Fig. 6 is a flowchart of the method for extracting the static features of video data provided by an embodiment of the present invention;
Fig. 7 is a flowchart of the method for extracting the color features of image data provided by an embodiment of the present invention;
Fig. 8 is a flowchart of the method for extracting the texture features of image data provided by an embodiment of the present invention;
Fig. 9 is a flowchart of the method for extracting the shape features of image data provided by an embodiment of the present invention;
Fig. 10 is a flowchart of the method for extracting the structural features of image data provided by an embodiment of the present invention;
Fig. 11 is a flowchart of the method for allocating the training set and test set provided by an embodiment of the present invention;
Fig. 12 is a flowchart of the learning and training method provided by embodiment one of the present invention;
Fig. 13 is a flowchart of the learning and training method provided by embodiment two of the present invention;
Fig. 14 is a schematic diagram of the composition of the multimedia data classification device provided by an embodiment of the present invention.
Detailed description of the embodiments
The present invention is described below in further detail with reference to the drawings and specific embodiments.
Fig. 1 is a flowchart of the multimedia data classification method provided by an embodiment of the present invention. As shown in Fig. 1, the specific steps are as follows:
Step 101: perform feature extraction on each item of original multimedia data to obtain the feature vector of each item; the feature vectors of all data form the original feature vector set.
The multimedia data may be video data or image data.
For video data, dynamic features, editing features and static features can be extracted.
Here, the dynamic features comprise: mean luminance change, mean r change, mean g change, mean b change and average motion-vector mean; the editing features can comprise: video shot cut rate, video shot fade rate and static frame rate; the static features can comprise: mean luminance average, mean luminance variance, mean saturation average and wavelet-transform texture features, or: mean high-luminance component ratio, mean high-saturation component ratio and wavelet-transform texture features, or: mean luminance average, mean luminance variance, mean saturation average, mean high-luminance component ratio, mean high-saturation component ratio and wavelet-transform texture features. r, g and b are the color components of the image in the RGB color space.
For image data, structural features can be extracted, together with any one or any combination of color features, shape features and texture features.
Step 102: for each category, select training samples and test samples of that category from the original feature set; all training samples form the training set and all test samples form the test set.
Step 103: perform learning and training on the training set with a preset classification algorithm to generate a classification decision rule model.
Step 104: for any test sample in the test set, compute on it with the classification decision rule model obtained in step 103 to obtain the category to which it belongs.
The specific implementations of the video feature extraction methods adopted in the embodiment of the present invention are given below.
In the embodiment of the present invention, before feature extraction, in order to reduce algorithmic complexity and memory usage, a preset thumbnail generation method can first be used to convert every frame to a thumbnail. The thumbnail size can be determined according to the actual situation, e.g. 160×120. Two arrays I_yuv_cur and I_yuv_pre store the YUV data of the current-frame and previous-frame thumbnails, two arrays I_rgb_cur and I_rgb_pre store their RGB data, and two arrays I_hsv_cur and I_hsv_pre store their HSV data. YUV, RGB and HSV denote color spaces of the image.
Fig. 2 is a flowchart of the method for extracting the dynamic features of video data provided by an embodiment of the present invention. As shown in Fig. 2, the specific steps are as follows:
Step 201: for every two consecutive thumbnails in the video sequence, compute the mean of all y in I_yuv_cur to obtain y_avr_cur, compute the mean of all y in I_yuv_pre to obtain y_avr_pre, and compute the absolute value of the difference of y_avr_cur and y_avr_pre to obtain the luminance difference of the current-frame and previous-frame thumbnails.
Step 202: compute the means of all r, all g and all b in I_rgb_cur to obtain r_avr_cur, g_avr_cur and b_avr_cur; compute the means of all r, all g and all b in I_rgb_pre to obtain r_avr_pre, g_avr_pre and b_avr_pre; and compute the absolute values of the differences of r_avr_cur and r_avr_pre, of g_avr_cur and g_avr_pre, and of b_avr_cur and b_avr_pre, obtaining the r, g and b differences of the current-frame and previous-frame thumbnails.
Step 203: compute the motion-vector mean between I_yuv_cur and I_yuv_pre with a block matching algorithm.
This step can be implemented with an existing block matching algorithm.
Step 204: for the whole video sequence, average the luminance differences, r differences, g differences, b differences and motion-vector means obtained between all pairs of consecutive thumbnails, obtaining the mean luminance change, mean r change, mean g change, mean b change and average motion-vector mean.
The mean luminance change, mean r change, mean g change, mean b change and average motion-vector mean constitute the dynamic features of the video data.
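To make the flow concrete, the following is a minimal Python sketch of the Fig. 2 pipeline, assuming OpenCV and NumPy; the function name is hypothetical, and Farneback optical flow is used only as a stand-in for the patent's block-matching motion vectors.

```python
import cv2
import numpy as np

def dynamic_features(video_path, thumb_size=(160, 120)):
    """Sketch of Fig. 2: mean luminance/r/g/b change and mean motion
    magnitude over a whole video. thumb_size and the Farneback optical
    flow (replacing block matching) are assumptions, not the patent's
    exact implementation."""
    cap = cv2.VideoCapture(video_path)
    prev_gray = prev_bgr = None
    lum_diffs, r_diffs, g_diffs, b_diffs, motions = [], [], [], [], []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        bgr = cv2.resize(frame, thumb_size)           # thumbnail to cut cost
        gray = cv2.cvtColor(bgr, cv2.COLOR_BGR2GRAY)  # luminance plane
        if prev_bgr is not None:
            lum_diffs.append(abs(gray.mean() - prev_gray.mean()))
            b_diffs.append(abs(bgr[..., 0].mean() - prev_bgr[..., 0].mean()))
            g_diffs.append(abs(bgr[..., 1].mean() - prev_bgr[..., 1].mean()))
            r_diffs.append(abs(bgr[..., 2].mean() - prev_bgr[..., 2].mean()))
            flow = cv2.calcOpticalFlowFarneback(prev_gray, gray, None,
                                                0.5, 3, 15, 3, 5, 1.2, 0)
            motions.append(np.linalg.norm(flow, axis=2).mean())
        prev_gray, prev_bgr = gray, bgr
    cap.release()
    # step 204: average each per-pair quantity over the whole sequence
    return [float(np.mean(x)) if x else 0.0
            for x in (lum_diffs, r_diffs, g_diffs, b_diffs, motions)]
```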
The editing features of video data comprise: video shot cut rate, video shot fade rate and static frame rate.
Fig. 3 is a flowchart of the method for extracting the shot cut rate feature of video data provided by an embodiment of the present invention. As shown in Fig. 3, the specific steps are as follows:
Step 301: for every two consecutive thumbnails I_yuv_cur and I_yuv_pre in the video sequence, compute the means of all y, all u and all v in I_yuv_cur to obtain y_avr_cur, u_avr_cur and v_avr_cur; compute the means of all y, all u and all v in I_yuv_pre to obtain y_avr_pre, u_avr_pre and v_avr_pre; and compute the absolute values of the differences of y_avr_cur and y_avr_pre, of u_avr_cur and u_avr_pre, and of v_avr_cur and v_avr_pre, obtaining the y, u and v differences of the current-frame and previous-frame thumbnails.
Step 302: judge whether the following is satisfied: the y difference is greater than a preset first threshold, the u difference is greater than a preset second threshold, and the v difference is greater than a preset third threshold. If so, determine that the current frame and the previous frame are candidate shot cut frames and go to step 303; otherwise, determine that they are not shot cut frames and go to step 308.
Step 303: compute on I_yuv_cur and I_yuv_pre with the Sobel operator to obtain the edge maps I_edge1 and I_edge2, compute the mean pixel value Gr_avr1 of I_edge1 and the mean pixel value Gr_avr2 of I_edge2, and compute the absolute value of the difference of Gr_avr1 and Gr_avr2.
Step 304: judge whether the absolute value of the difference of Gr_avr1 and Gr_avr2 is greater than a preset fourth threshold. If so, go to step 305; otherwise, determine that the current frame and the previous frame are not shot cut frames and go to step 308.
Steps 303-304 reduce false detections of shot cuts caused by global illumination changes.
Step 305: compute the motion-vector mean between I_yuv_cur and I_yuv_pre with a block matching algorithm.
Step 306: judge whether the motion-vector mean between I_yuv_cur and I_yuv_pre is greater than a preset fifth threshold. If so, go to step 307; otherwise, determine that the current frame and the previous frame are not shot cut frames and go to step 308.
Steps 305-306 remove false detections of shot cuts caused by object motion.
Step 307: determine that the current frame and the previous frame are shot cut frames and add 1 to the shot cut count.
The initial value of the shot cut count is 0.
Step 308: judge whether the whole video sequence has been examined. If so, go to step 309; otherwise, move to the next frame and return to step 301.
Step 309: compute the ratio of the shot cut count in the video sequence to the total frame count of the video sequence to obtain the shot cut rate.
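A compact sketch of the per-frame-pair decision in steps 301-307 might look as follows (Python with OpenCV/NumPy; all five thresholds are illustrative placeholders, and Farneback optical flow again stands in for block matching):

```python
import cv2
import numpy as np

def is_shot_cut(yuv_prev, yuv_cur, thr=(10.0, 5.0, 5.0, 8.0, 3.0)):
    """Sketch of steps 301-307 for one frame pair. yuv_* are HxWx3 YUV
    thumbnails; the five thresholds are placeholders to be tuned."""
    t1, t2, t3, t4, t5 = thr
    d = [abs(yuv_cur[..., c].astype(np.float64).mean()
             - yuv_prev[..., c].astype(np.float64).mean()) for c in range(3)]
    if not (d[0] > t1 and d[1] > t2 and d[2] > t3):
        return False                        # not even a candidate cut
    def edge_mean(img):                     # Sobel gradient-magnitude mean
        y = img[..., 0].astype(np.float64)
        gx = cv2.Sobel(y, cv2.CV_64F, 1, 0)
        gy = cv2.Sobel(y, cv2.CV_64F, 0, 1)
        return np.hypot(gx, gy).mean()
    if abs(edge_mean(yuv_cur) - edge_mean(yuv_prev)) <= t4:
        return False                        # global illumination change, not a cut
    flow = cv2.calcOpticalFlowFarneback(yuv_prev[..., 0], yuv_cur[..., 0],
                                        None, 0.5, 3, 15, 3, 5, 1.2, 0)
    return np.linalg.norm(flow, axis=2).mean() > t5   # rules out object motion
```

Counting the hits of such a test over the sequence and dividing by the frame count then gives the shot cut rate of step 309.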
Fig. 4 is a flowchart of the method for extracting the gradual shot rate feature of video data provided by an embodiment of the present invention. As shown in Fig. 4, the specific steps are as follows:
Step 401: for any two thumbnails I_yuv_cur and I_yuv_cur-q that are q frames apart in the video sequence, compute the mean of all y in I_yuv_cur to obtain y_avr_cur, compute the mean of all y in I_yuv_cur-q to obtain y_avr_cur-q, and compute the absolute value of their difference, |y_avr_cur − y_avr_cur-q|.
Here q is the number of frames between the two images; its value can be determined empirically, preferably q = 10.
Step 402: judge whether |y_avr_cur − y_avr_cur-q| is greater than a preset sixth threshold. If so, go to step 403; otherwise, go to step 405.
Step 403: judge whether M consecutive frames all satisfy |y_avr_cur − y_avr_cur-q| greater than the preset sixth threshold. If so, go to step 404; otherwise, go to step 405.
The value of M can range from q to 2q.
Step 404: judge that a gradual shot transition has been detected and add 1 to the gradual shot count.
The initial value of the gradual shot count is 0.
Step 405: judge whether the whole video sequence has been examined. If so, go to step 406; otherwise, move to the next frame and return to step 401.
Step 406: compute the ratio of the gradual shot count to the total frame count of the video sequence to obtain the gradual shot rate.
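Under the assumption that the per-frame luminance means have already been computed, the counting logic of Fig. 4 can be sketched as follows (the function name and default values are illustrative; the patent suggests q = 10 and q ≤ M ≤ 2q):

```python
def gradual_shot_rate(y_means, q=10, m=15, thr6=6.0):
    """Sketch of Fig. 4. y_means is the per-frame luminance-mean sequence;
    q, m and thr6 are illustrative placeholders."""
    count, run = 0, 0
    n = len(y_means)
    for i in range(q, n):
        if abs(y_means[i] - y_means[i - q]) > thr6:
            run += 1
            if run == m:       # M consecutive hits => one gradual transition
                count += 1
                run = 0        # reset so one transition is counted once
        else:
            run = 0
    return count / n if n else 0.0
```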
Fig. 5 is a flowchart of the method for extracting the static frame rate feature of video data provided by an embodiment of the present invention. As shown in Fig. 5, the specific steps are as follows:
Step 501: for every two consecutive thumbnails I_yuv_cur and I_yuv_pre in the video sequence, compute the means of all y, all u and all v in I_yuv_cur to obtain y_avr_cur, u_avr_cur and v_avr_cur; compute the means of all y, all u and all v in I_yuv_pre to obtain y_avr_pre, u_avr_pre and v_avr_pre; and compute the absolute values of the differences of y_avr_cur and y_avr_pre, of u_avr_cur and u_avr_pre, and of v_avr_cur and v_avr_pre, obtaining the y, u and v differences of the current-frame and previous-frame thumbnails.
Step 502: judge whether the following is satisfied: the y difference is less than a preset seventh threshold, the u difference is less than a preset eighth threshold, and the v difference is less than a preset ninth threshold. If so, go to step 503; otherwise, go to step 504.
Step 503: determine that the current frame is a static frame and add 1 to the static frame count.
The initial value of the static frame count is 0.
Step 504: judge whether the whole video sequence has been examined. If so, go to step 505; otherwise, move to the next frame and return to step 501.
Step 505: compute the ratio of the static frame count over the whole video sequence to the total frame count of the video sequence to obtain the static frame rate.
The shot cut rate, gradual shot rate and static frame rate constitute the editing features of the video data.
Fig. 6 is a flowchart of the method for extracting the static features of video data provided by an embodiment of the present invention. As shown in Fig. 6, the specific steps are as follows:
Step 601: for any thumbnail in the video sequence, compute the mean and variance of y in I_yuv_cur, and compute the mean of s in I_hsv_cur.
Step 602: for the y values in I_yuv_cur, count how many exceed a preset tenth threshold and compute the ratio of this count to the total number of pixels in the thumbnail, obtaining the high-luminance component ratio; for the s values in I_hsv_cur, count how many exceed a preset eleventh threshold and compute the ratio of this count to the total number of pixels in the thumbnail, obtaining the high-saturation component ratio.
Step 603: apply a three-level wavelet transform to I_yuv_cur, obtaining 10 sub-images, and compute the root mean square of y for each sub-image.
Step 604: for the whole video sequence, sum, over all frames, the mean of y, the variance of y, the mean of s, the high-luminance component ratio, the high-saturation component ratio and the root mean square of y of each sub-image, and divide each sum by the frame count of the video sequence, obtaining the mean luminance average, mean luminance variance, mean saturation average, mean high-luminance component ratio, mean high-saturation component ratio and the average y root mean square of each sub-image.
Here, for the root mean squares of y of the 10 sub-images: for each sub-image, the root mean squares of its y over all frames are summed, and the sum is divided by the frame count, giving the average y root mean square of that sub-image. In this way the average y root mean squares of the 10 sub-images are obtained, and together they constitute the wavelet-transform texture features of the video data.
The mean luminance average, mean luminance variance, mean saturation average, mean high-luminance component ratio, mean high-saturation component ratio and wavelet-transform texture features constitute the static features of the video data.
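A sketch of steps 601-603 for one frame, assuming PyWavelets and NumPy (the Haar wavelet, the helper names and the two thresholds are assumptions):

```python
import numpy as np
import pywt  # PyWavelets

def wavelet_texture(y_plane):
    """Sketch of step 603: a 3-level 2-D wavelet transform yields 10
    sub-images (1 approximation + 3 detail bands per level); the feature
    is the RMS of each. The Haar wavelet is an assumption."""
    coeffs = pywt.wavedec2(y_plane.astype(np.float64), 'haar', level=3)
    subbands = [coeffs[0]] + [band for level in coeffs[1:] for band in level]
    return [float(np.sqrt((b ** 2).mean())) for b in subbands]  # 10 values

def frame_static_features(y_plane, s_plane, thr10=200, thr11=200):
    """Per-frame quantities of steps 601-602, assuming 8-bit y and s
    planes; the tenth/eleventh thresholds are placeholders."""
    return {
        'y_mean': float(y_plane.mean()),
        'y_var': float(y_plane.var()),
        's_mean': float(s_plane.mean()),
        'y_high_ratio': float((y_plane > thr10).mean()),  # high-luminance ratio
        's_high_ratio': float((s_plane > thr11).mean()),  # high-saturation ratio
        'wavelet_rms': wavelet_texture(y_plane),
    }
```

Per step 604, each of these per-frame quantities is then summed over all frames and divided by the frame count.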
The specific implementations of the image feature extraction methods adopted in the embodiment of the present invention are given below.
The image features comprise structural features, together with any one or any combination of color features, texture features and shape features.
Fig. 7 is a flowchart of the method for extracting the color features of image data provided by an embodiment of the present invention. As shown in Fig. 7, the specific steps are as follows:
Step 701: for each image, in the HSV color space of the image, compute the mean, variance and skewness of each of the three components hue (H), saturation (S) and value (V), i.e. the first-, second- and third-order HSV color moment features of the image.
The three moment features are computed as follows:
Mean: $\mu_i = \frac{1}{n} \sum_{j=1}^{n} p_{ij}$
Variance: $\sigma_i = \left( \frac{1}{n} \sum_{j=1}^{n} (p_{ij} - \mu_i)^2 \right)^{1/2}$
Skewness: $s_i = \left( \frac{1}{n} \sum_{j=1}^{n} (p_{ij} - \mu_i)^3 \right)^{1/3}$
where i is the index of the color component; there are 3 color components, h, s and v, indexed i = 0, 1, 2; p_ij is the value of the j-th pixel in the i-th color component; and n is the total number of pixels.
Step 702: apply non-uniform quantization to the hue (H), saturation (S) and value (V) components of the HSV color space of the image, where H is quantized to 8 values and S and V to 2 values each.
The dimension of the quantized color histogram is:
$Hist = H \cdot Q_S \cdot Q_V + S \cdot Q_V + V$
where Q_S and Q_V are the quantization levels of the S and V components, both equal to 2, and H, S and V are the quantized values of the three components.
Step 703: from the quantized color histogram, compute the first-, second- and third-order color histogram moment features.
The first-, second- and third-order HSV color moment features and the first-, second- and third-order color histogram moment features of the image constitute the color features of the image data.
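As an illustration, a Python sketch of Fig. 7 under OpenCV conventions (hue in 0..179, s and v in 0..255); the quantization boundaries chosen here are assumptions:

```python
import cv2
import numpy as np

def color_features(bgr):
    """Sketch of Fig. 7: HSV color moments (mean, std, skew per channel)
    plus moments of a 32-bin quantized histogram indexed as
    Hist = H*Qs*Qv + S*Qv + V with H in 0..7 and S, V in 0..1."""
    hsv = cv2.cvtColor(bgr, cv2.COLOR_BGR2HSV).reshape(-1, 3).astype(np.float64)
    moments = []
    for ch in range(3):                       # h, s, v
        p = hsv[:, ch]
        mu = p.mean()
        moments += [mu,
                    np.sqrt(((p - mu) ** 2).mean()),   # second-order moment
                    np.cbrt(((p - mu) ** 3).mean())]   # third-order moment
    # non-uniform quantization (bin edges are illustrative assumptions)
    h = np.minimum((hsv[:, 0] / 180.0 * 8).astype(int), 7)
    s = (hsv[:, 1] >= 128).astype(int)
    v = (hsv[:, 2] >= 128).astype(int)
    hist = np.bincount(h * 4 + s * 2 + v, minlength=32) / len(hsv)
    mu = hist.mean()
    hist_moments = [mu,
                    np.sqrt(((hist - mu) ** 2).mean()),
                    np.cbrt(((hist - mu) ** 3).mean())]
    return moments + hist_moments
```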
Fig. 8 is a flowchart of the method for extracting the texture features of image data provided by an embodiment of the present invention. As shown in Fig. 8, the specific steps are as follows:
Step 801: for each image, compute its grayscale map and quantize the gray values to 8 levels.
Step 802: divide the image into several non-overlapping windows of size 32×32.
The number of windows is determined by the ratio of the width and height of the image to the window size.
Step 803: for each window, compute the gray-level co-occurrence matrices of the window in the four directions (0°, 45°, 90°, 135°).
The computing formula is as follows:
$m_{(d,\theta)}(i,j) = \mathrm{card}\{[(x_1,y_1),(x_2,y_2)] \in S \mid f(x_1,y_1) = i \;\&\; f(x_2,y_2) = j\}$
where f(x, y) denotes the window, S is the set of pixel pairs whose values are i and j respectively, d is the distance between the two pixels with values i and j (its value is determined empirically in advance), θ is the orientation angle, i.e. 0°, 45°, 90° or 135°, and card{·} denotes the number of elements of the set contributing to m_(d,θ)(i,j).
Step 804: for the gray-level co-occurrence matrix of each direction of each window, compute 7 statistics of the matrix: correlation, contrast, entropy, inverse difference moment, energy, sum average and sum entropy.
Step 805: for each statistic of the gray-level co-occurrence matrix of each direction, sum the statistic over all windows, divide the sum by the total number of windows to obtain the mean value of the statistic, and normalize it. This finally yields the mean values of 7 × 4 = 28 statistics over the four directions, which constitute the gray-level co-occurrence matrix texture features of the image.
Step 806: apply a three-level wavelet transform to the image, obtaining 10 sub-images, and compute the root mean square of the y of each sub-image; the 10 resulting root mean squares constitute the wavelet-transform texture features of the image.
The texture features based on the gray-level co-occurrence matrix and the wavelet-transform texture features constitute the texture features of the image data.
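A sketch of the per-window statistics in steps 803-804, assuming scikit-image supplies the co-occurrence matrices; skimage's homogeneity is used in place of the inverse difference moment, and entropy, sum average and sum entropy are computed by hand:

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops

def glcm_window_features(gray_window, levels=8, d=1):
    """Sketch of steps 803-804 for one 32x32 window: GLCMs in the four
    directions, then 7 statistics per matrix (28 values per window)."""
    q = (gray_window.astype(np.float64) * levels / 256).astype(np.uint8)
    angles = [0, np.pi / 4, np.pi / 2, 3 * np.pi / 4]   # 0/45/90/135 degrees
    glcm = graycomatrix(q, [d], angles, levels=levels, normed=True)
    feats = []
    for a in range(4):
        p = glcm[:, :, 0, a]
        i, j = np.indices(p.shape)
        # p_{x+y}(k): distribution of i+j, used for sum average / sum entropy
        pxy = np.bincount((i + j).ravel(), weights=p.ravel())
        nz, nzs = p[p > 0], pxy[pxy > 0]
        feats += [graycoprops(glcm, 'correlation')[0, a],
                  graycoprops(glcm, 'contrast')[0, a],
                  -(nz * np.log2(nz)).sum(),               # entropy
                  graycoprops(glcm, 'homogeneity')[0, a],  # ~inverse diff. moment
                  graycoprops(glcm, 'energy')[0, a],
                  (np.arange(len(pxy)) * pxy).sum(),       # sum average
                  -(nzs * np.log2(nzs)).sum()]             # sum entropy
    return feats
```

Per step 805, each of the 28 statistics is then averaged over all windows of the image and normalized.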
Fig. 9 is a flowchart of the method for extracting the shape features of image data provided by an embodiment of the present invention. As shown in Fig. 9, the specific steps are as follows:
Step 901: for each image, compute its grayscale map and quantize the gray values to 8 levels.
Step 902: perform Gaussian filtering in the x and y directions of the grayscale map to obtain the filtered image I_s.
Step 903: compute the gradients of I_s in the x and y directions to obtain the gradient maps I_gradx and I_grady, and compute the gradient magnitude map I_gradmag from I_gradx and I_grady.
Step 904: apply non-maximum suppression to the gradient magnitude map I_gradmag, suppressing the pixel values of non-local-extremum points, to obtain the candidate boundary map I_edge.
This step can be implemented with existing techniques.
Step 905: perform threshold estimation on I_edge to obtain a high threshold HighThrd, and judge whether the gradient magnitude of each candidate boundary pixel in I_edge is greater than HighThrd; if so, take that pixel as the starting point of a boundary and recursively track the other boundary points until all pixels of the boundary have been found, obtaining the final boundary map I_edge_final.
This step can be implemented with existing techniques.
Step 906: compute the 7 geometric invariant moments of the boundary map I_edge_final.
Denoting I_edge_final by f(x, y), its (p+q)-order moment is defined as:
$m_{pq} = \sum_x \sum_y x^p y^q f(x,y)$
Its (p+q)-order central moment is defined as:
$\mu_{pq} = \sum_x \sum_y (x - \bar{x})^p (y - \bar{y})^q f(x,y)$
where $\bar{x} = m_{10}/m_{00}$ and $\bar{y} = m_{01}/m_{00}$ denote the centroid of the image region. The normalized central moment is expressed as $\eta_{pq} = \mu_{pq} / \mu_{00}^{\gamma}$, where $\gamma = \frac{p+q}{2} + 1$, p+q = 2, 3, ...
Only the second- and third-order central moments are computed here, i.e. the cases p+q = 2 or 3. From these, the 7 geometric invariant moments are derived as follows:
$\phi_1 = \eta_{20} + \eta_{02}$
$\phi_2 = (\eta_{20} - \eta_{02})^2 + 4\eta_{11}^2$
$\phi_3 = (\eta_{30} - 3\eta_{12})^2 + (3\eta_{21} - \eta_{03})^2$
$\phi_4 = (\eta_{30} + \eta_{12})^2 + (\eta_{21} + \eta_{03})^2$
$\phi_5 = (\eta_{30} - 3\eta_{12})(\eta_{30} + \eta_{12})[(\eta_{30} + \eta_{12})^2 - 3(\eta_{21} + \eta_{03})^2] + (3\eta_{21} - \eta_{03})(\eta_{21} + \eta_{03})[3(\eta_{30} + \eta_{12})^2 - (\eta_{21} + \eta_{03})^2]$
$\phi_6 = (\eta_{20} - \eta_{02})[(\eta_{30} + \eta_{12})^2 - (\eta_{21} + \eta_{03})^2] + 4\eta_{11}(\eta_{30} + \eta_{12})(\eta_{21} + \eta_{03})$
$\phi_7 = (3\eta_{21} - \eta_{03})(\eta_{30} + \eta_{12})[(\eta_{30} + \eta_{12})^2 - 3(\eta_{21} + \eta_{03})^2] - (\eta_{30} - 3\eta_{12})(\eta_{21} + \eta_{03})[3(\eta_{30} + \eta_{12})^2 - (\eta_{21} + \eta_{03})^2]$
The 7 geometric invariant moments of the boundary map I_edge_final constitute the shape features of the image data.
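Since the pipeline of steps 901-905 (Gaussian filtering, Sobel-style gradients, non-maximum suppression and hysteresis tracking) is essentially Canny edge detection, a compact sketch can lean on OpenCV; the blur kernel and the two Canny thresholds below are placeholders, and cv2.HuMoments supplies the 7 invariant moments:

```python
import cv2

def shape_features(gray):
    """Sketch of Fig. 9 using OpenCV stand-ins for the patent's own
    edge pipeline; thresholds 50/150 are illustrative."""
    blurred = cv2.GaussianBlur(gray, (5, 5), 1.4)   # x/y Gaussian filtering
    edges = cv2.Canny(blurred, 50, 150)             # final boundary map
    m = cv2.moments(edges, binaryImage=True)        # raw/central/normalized moments
    return cv2.HuMoments(m).flatten()               # the 7 invariant moments
```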
Fig. 10 is a flowchart of the method for extracting the structural features of image data provided by an embodiment of the present invention. As shown in Fig. 10, the specific steps are as follows:
Step 1001: for each image, compute its grayscale map I_gray and quantize the gray values to 8 levels.
Step 1002: apply the Census transform to I_gray to obtain the Census transform map I_census of the image.
The Census transform can be described as follows: compare the value of each pixel with each of its 8 neighbors; if the pixel value is less than the neighbor value, set that neighbor bit to 0, otherwise set it to 1; then, in left-to-right, top-to-bottom order, assemble the neighbor bits into an eight-bit binary number and convert it to decimal. This is the Census transform value CT of the pixel, and the CT values form the Census transform map I_census. For example:
32 64 96        1 1 0
32 64 96   ⇒    1   0    ⇒  (11010110)₂  ⇒  CT = 214
32 32 96        1 1 0
Step 1003: compute the histogram of I_census, whose dimension is 256.
Step 1004: reduce the dimension of the histogram with principal component analysis (PCA) so that the final Census transform histogram has 40 dimensions, and take this 40-dimensional Census transform histogram as the structural feature of the image.
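A vectorized NumPy sketch of steps 1002-1004 follows; the neighbor ordering matches the worked example above, and the PCA usage assumes histograms collected over a whole image set:

```python
import numpy as np
from sklearn.decomposition import PCA

def census_histogram(gray):
    """Sketch of steps 1002-1003: 8-neighbor Census transform of a
    grayscale image, then its normalized 256-bin histogram."""
    g = gray.astype(np.int32)
    c = g[1:-1, 1:-1]                                # center pixels
    # neighbors in reading order: TL, T, TR, L, R, BL, B, BR
    shifts = [g[:-2, :-2], g[:-2, 1:-1], g[:-2, 2:], g[1:-1, :-2],
              g[1:-1, 2:], g[2:, :-2], g[2:, 1:-1], g[2:, 2:]]
    ct = np.zeros_like(c)
    for n in shifts:
        ct = (ct << 1) | (c >= n)        # bit is 1 unless center < neighbor
    hist = np.bincount(ct.ravel(), minlength=256).astype(np.float64)
    return hist / hist.sum()

# Step 1004: PCA over the histograms of a whole image set, 256 -> 40 dims.
# `histograms` is assumed to be an (n_images, 256) array:
#   pca = PCA(n_components=40)
#   structural_features = pca.fit_transform(histograms)
```

Applied to the 3x3 example above (center 64), the loop produces the bit string 11010110 and hence CT = 214.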
In the embodiment of the present invention, a support vector machine classifier based on a radial basis kernel function can be used to perform training and classification on the feature vectors of videos or images.
Fig. 11 is a flowchart of the method for allocating the training set and test set provided by an embodiment of the present invention. As shown in Fig. 11, the specific steps are as follows:
Step 1101: from the preset proportion coefficient ratio of the training set to the test set and the total sample count of each category, determine the sizes of the training set and the test set.
The coefficient ratio can be set as required.
Step 1102: for any category, choose training samples at random from the samples of that category.
The criterion for selecting samples is: large between-class difference and small within-class difference.
Step 1103: for any category, when all training samples of that category have been selected, take the remaining samples of the category as its test samples.
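A minimal sketch of this per-category split (pure Python; the function name and signature are assumptions):

```python
import random

def split_per_class(features_by_class, ratio=0.7, seed=None):
    """Sketch of Fig. 11: for each category, draw `ratio` of its samples
    at random as training data and keep the remainder for testing.
    features_by_class maps a label to a list of feature vectors."""
    rng = random.Random(seed)
    train, test = [], []
    for label, samples in features_by_class.items():
        idx = list(range(len(samples)))
        rng.shuffle(idx)
        n_train = int(len(idx) * ratio)
        train += [(samples[i], label) for i in idx[:n_train]]
        test += [(samples[i], label) for i in idx[n_train:]]
    return train, test
```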
Fig. 12 is a flowchart of the learning and training method provided by embodiment one of the present invention. As shown in Fig. 12, the specific steps are as follows:
Step 1201: preset the training count n.
n can be set empirically.
Step 1202: at the start of each training round, generate the training and test samples of each category with the flow shown in Fig. 11, obtaining the training set and the test set.
Step 1203: train the support vector machine classifier with the training set to obtain the classification decision rule model.
Step 1204: classify each test sample with the classification decision rule model; when all test samples have been classified, compute the classification accuracy of this round.
Step 1205: judge whether the training count has reached n. If so, go to step 1206; otherwise, return to step 1202 and start the next training round.
Step 1206: average the n classification accuracies to obtain the final classification accuracy.
In this embodiment, a different ratio can be adopted each time the training and test sets are generated.
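Building on split_per_class above, embodiment one might be sketched with scikit-learn's RBF-kernel SVC as follows (hyper-parameters are library defaults, not values from the patent):

```python
import numpy as np
from sklearn.svm import SVC

def run_training(features_by_class, n=5, ratio=0.7):
    """Sketch of Fig. 12: n rounds of split / RBF-SVM train / test,
    averaging the per-round accuracies."""
    accs = []
    for round_no in range(n):
        train, test = split_per_class(features_by_class, ratio, seed=round_no)
        Xtr, ytr = map(np.array, zip(*train))
        Xte, yte = map(np.array, zip(*test))
        model = SVC(kernel='rbf')            # RBF-kernel support vector machine
        model.fit(Xtr, ytr)
        accs.append((model.predict(Xte) == yte).mean())
    return float(np.mean(accs))              # final classification accuracy
```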
Fig. 13 is a flowchart of the learning and training method provided by embodiment two of the present invention. As shown in Fig. 13, the specific steps are as follows:
Step 1301: preset the training count n.
Step 1302: at the start of each training round, generate the training and test samples of each category with the flow shown in Fig. 11, obtaining the training set and the test set.
Step 1303: draw m random subsets from the training set, each containing p samples, obtaining m new training sets.
m can be set empirically, and p is less than the number of samples in the training set of step 1302.
Step 1304: train the support vector machine classifier with each of the m new training sets to obtain m classification decision rule models, then classify each test sample in the test set with each of the m models, obtaining m classification results for each sample of the test set.
Step 1305: for the m classification results of each test sample, vote over them; the category with the most votes is the final classification result of the test sample.
Step 1306: from the final classification results of all test samples, compute the classification accuracy of this round.
Step 1307: judge whether the training count has reached n. If so, go to step 1308; otherwise, return to step 1302 and start the next training round.
Step 1308: average the n classification accuracies to obtain the final classification accuracy.
This embodiment can also adopt a different proportion coefficient ratio in each training round.
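The bagging-and-voting core of steps 1303-1305 can be sketched as follows (integer class labels are assumed so the vote can use np.bincount; m, p and the SVC settings are illustrative, and scikit-learn's BaggingClassifier packages the same idea):

```python
import numpy as np
from sklearn.svm import SVC

def bagged_predict(Xtr, ytr, Xte, m=10, p=None, seed=0):
    """Sketch of steps 1303-1305: draw m random subsets of size p from
    the training set, train one RBF-SVM per subset, and majority-vote
    the m predictions for each test sample."""
    rng = np.random.default_rng(seed)
    p = p or int(0.8 * len(Xtr))             # p must stay below the set size
    votes = []
    for _ in range(m):
        idx = rng.choice(len(Xtr), size=p, replace=False)
        model = SVC(kernel='rbf').fit(Xtr[idx], ytr[idx])
        votes.append(model.predict(Xte))
    votes = np.stack(votes)                  # m x n_test label matrix
    # majority vote per test sample (assumes non-negative integer labels)
    return np.array([np.bincount(col).argmax() for col in votes.T])
```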
Fig. 14 is a schematic diagram of the composition of the multimedia data classification device provided by an embodiment of the present invention. As shown in Fig. 14, the device mainly comprises: a feature extraction module 141, a sample selection module 142, a training module 143 and a test module 144, wherein:
Feature extraction module 141: extracts features from each item of original multimedia data, the features of all data forming an original feature set; when the original multimedia data is video data, it extracts dynamic features, editing features and static features, and when the original multimedia data is image data, it extracts structural features together with any one or any combination of color features, texture features and shape features; it sends the original feature set to the sample selection module 142.
Sample selection module 142: receives the original feature set sent by the feature extraction module 141, selects, for each category, training samples of that category from the original feature set, all training samples forming a training set, and sends the training set to the training module 143.
Training module 143: receives the training set sent by the sample selection module 142, performs learning and training on it with a preset classification algorithm, generates a classification decision rule model, and sends the model to the test module 144.
Test module 144: receives the classification decision rule model sent by the training module 143 and, for any test sample, computes on the test sample with the model to obtain the category to which it belongs.
Characteristic extracting module 141 is further used for, when extract be characterized as behavioral characteristics time, for the every two consecutive frame YUV images in video data, calculate respectively the brightness average of every two field picture, calculate the absolute value of the difference of the brightness average of this two two field picture, obtain the luminance difference of this two two field picture, calculate the average of the luminance difference of whole video data, obtain the mean flow rate difference in change of whole video data; For the every two consecutive frame RGB images in video data, calculate respectively the average of the r of every two field picture, the average of g, the average of b, calculate the absolute value of the difference of the average of absolute value, the b of the difference of the average of absolute value, the g of the difference of the average of the r of this two two field picture, the r that obtains this two two field picture is poor, g is poor, b is poor, calculate the average of the r difference of whole video data, the average that g is poor, the average that b is poor, obtain the average r difference in change of whole video data, average g difference in change, average b difference in change; For the every two adjacent two field pictures in video data, calculate the motion vector average between this two two field picture, calculate the average motion vector average of whole video sequence; Mean flow rate difference in change, average r difference in change, average g difference in change, average b difference in change and average motion vector average have formed the behavioral characteristics of video data.
Editor's feature that characteristic extracting module 141 is extracted comprises: video lens shear rate, video lens fade rate and static frame per second.
The feature extraction module 141 is further used, when the extracted feature is the video shot cut rate, to: for every two consecutive YUV frames in the video data, calculate the means of the Y, U and V channels of each frame and take the absolute values of the differences of the corresponding means to obtain the Y, U and V differences of the frame pair; judge whether the Y difference exceeds a preset first threshold, the U difference a preset second threshold and the V difference a preset third threshold; if so, treat the frame pair as a candidate shot cut, apply the Sobel operator to each of the two frames to obtain two edge maps, and calculate the absolute value of the difference of the mean pixel values of the two edge maps; if that value exceeds a preset fourth threshold, calculate the mean motion vector between the two frames; if the mean motion vector exceeds a preset fifth threshold, determine that the frame pair is a shot cut and increment the shot-cut count by 1; when the whole video data has been examined, calculate the ratio of the shot-cut count to the total frame count of the video data to obtain the shot cut rate.
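A minimal sketch of the candidate shot-cut test, assuming OpenCV and illustrative threshold values T1..T4 (the patent leaves the preset thresholds unspecified); the mean Sobel response per frame is used here as a simple proxy for the edge-map mean, and the final motion-vector check against the fifth threshold is only noted in a comment because the patent does not specify the motion-estimation method.

import cv2
import numpy as np

# Illustrative preset thresholds; the patent does not fix their values.
T1, T2, T3, T4 = 10.0, 5.0, 5.0, 8.0

def is_candidate_cut(frame_a, frame_b):
    """Candidate shot-cut test for two consecutive frames (BGR arrays)."""
    yuv_a = cv2.cvtColor(frame_a, cv2.COLOR_BGR2YUV).astype(np.float32)
    yuv_b = cv2.cvtColor(frame_b, cv2.COLOR_BGR2YUV).astype(np.float32)
    # Per-channel absolute differences of the Y, U and V means
    dy, du, dv = (abs(yuv_a[:, :, c].mean() - yuv_b[:, :, c].mean()) for c in range(3))
    if not (dy > T1 and du > T2 and dv > T3):
        return False
    # Confirmation step: compare the mean Sobel edge responses of both frames
    means = []
    for f in (frame_a, frame_b):
        g = cv2.cvtColor(f, cv2.COLOR_BGR2GRAY)
        sx = cv2.Sobel(g, cv2.CV_32F, 1, 0)
        sy = cv2.Sobel(g, cv2.CV_32F, 0, 1)
        means.append(float(np.abs(sx).mean() + np.abs(sy).mean()))
    # A full implementation would additionally require the mean motion
    # vector between the frames to exceed a fifth preset threshold.
    return abs(means[0] - means[1]) > T4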
The feature extraction module 141 is further used, when the extracted feature is the video shot fade rate, to: for any two YUV frames q frames apart in the video data, calculate the absolute value of the difference of their mean Y values; if it exceeds a preset sixth threshold, judge whether the condition holds M consecutive times, that is, whether the absolute difference of the mean Y values of frames q apart exceeds the sixth threshold M times in a row; if so, determine that a gradual shot transition has been detected and increment the gradual-shot count by 1; otherwise, move to the next frame and continue the detection, wherein q and M are preset integers and q>1; when the whole video data has been examined, calculate the ratio of the gradual-shot count to the total frame count of the video data to obtain the shot fade rate.
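A sketch of the gradual-transition counting under stated assumptions: y_means is a precomputed list of per-frame mean Y values, and the parameters q, m and t6 stand in for the preset q, M and sixth threshold, whose values the patent does not fix.

def gradual_shot_rate(y_means, q=5, m=3, t6=4.0):
    """Count gradual transitions: the absolute difference of mean luminance
    between frames q apart must exceed t6 for m consecutive positions."""
    n = len(y_means)
    count = run = 0
    for i in range(max(n - q, 0)):
        if abs(y_means[i + q] - y_means[i]) > t6:
            run += 1
            if run == m:      # M consecutive exceedances: one gradual shot
                count += 1
                run = 0
        else:
            run = 0
    return count / n if n else 0.0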
The static features extracted by the feature extraction module 141 comprise: the mean luminance average, mean luminance variance, mean saturation average and wavelet transform texture features; or comprise: the high-component ratio of the mean luminance, the high-component ratio of the mean saturation and wavelet transform texture features; or comprise: the mean luminance average, mean luminance variance, mean saturation average, the high-component ratio of the mean luminance, the high-component ratio of the mean saturation and wavelet transform texture features.
The color features extracted by the feature extraction module 141 comprise: first-, second- and third-order HSV color moment features and first-, second- and third-order color histogram moment features of the image.
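As a hedged illustration of the first-, second- and third-order color moments (one common reading is the mean, standard deviation and cube root of the third central moment of each channel; the patent does not define the moments explicitly):

import cv2
import numpy as np

def hsv_color_moments(image_bgr):
    """First-, second- and third-order color moments of each HSV channel,
    giving a 9-dimensional feature vector."""
    hsv = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2HSV).astype(np.float32)
    feats = []
    for c in range(3):
        ch = hsv[:, :, c].ravel()
        mu = ch.mean()
        # mean, standard deviation, cube root of third central moment
        feats += [mu, ch.std(), np.cbrt(np.mean((ch - mu) ** 3))]
    return np.array(feats)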
The texture features extracted by the feature extraction module 141 comprise: texture features based on the gray-level co-occurrence matrix, and wavelet transform texture features.
The feature extraction module 141 is further used, when the extracted feature is a shape feature, to: calculate the gray-scale map of the image; apply Gaussian filtering to the gray-scale map in the x and y directions to obtain a filtered image I_s; calculate the gradients of I_s in the x and y directions to obtain gradient maps I_gradx and I_grady, and calculate the gradient magnitude map I_gradmag from I_gradx and I_grady; apply non-maximum suppression to I_gradmag to obtain a candidate boundary map I_edge; perform threshold estimation on I_edge to obtain a high threshold HighThrd; judge whether the gradient magnitude at each candidate boundary pixel of I_edge exceeds HighThrd and, if so, take that pixel as the starting point of a boundary and recursively trace the remaining boundary points until all pixels of the boundary have been found, yielding the final boundary map I_edge_final; and calculate the 7 geometric invariant moments of I_edge_final to obtain the shape feature of the image.
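The pipeline above closely mirrors the Canny edge detector, so a compact approximation can lean on OpenCV; the following is a sketch, not the patent's exact boundary tracer, and the blur kernel and hysteresis thresholds are illustrative values.

import cv2

def shape_feature(image_bgr):
    """Approximate the boundary pipeline with Gaussian smoothing and the
    Canny detector, then take the 7 Hu invariant moments of the edge map."""
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    smoothed = cv2.GaussianBlur(gray, (5, 5), 0)   # Gaussian filtering in x and y
    edges = cv2.Canny(smoothed, 50, 150)           # gradients + NMS + hysteresis thresholds
    # 7 geometric invariant (Hu) moments of the final boundary map
    return cv2.HuMoments(cv2.moments(edges)).flatten()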
The feature extraction module 141 is further used, when the extracted feature is a structural feature, to: calculate the gray-scale map I_gray of the image; apply the Census transform to I_gray to obtain the Census transform map I_census of the image; calculate the histogram of I_census, whose dimension is 256; and apply principal component analysis (PCA) to reduce the histogram to a final 40-dimensional Census transform histogram, which serves as the structural feature of the image.
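A sketch of a Census transform and its 256-bin histogram in NumPy; the 3x3 neighbourhood is an assumption (the patent does not state the window size), and the PCA step to 40 dimensions is omitted because it must be fitted over a population of training histograms (for example with sklearn.decomposition.PCA(n_components=40)).

import numpy as np

def census_histogram(gray):
    """3x3 Census transform: encode each pixel as 8 bits of comparisons
    with its neighbours (values 0..255), then histogram into 256 bins."""
    g = gray.astype(np.int32)
    h, w = g.shape
    center = g[1:-1, 1:-1]
    census = np.zeros((h - 2, w - 2), dtype=np.int32)
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
               (0, 1), (1, -1), (1, 0), (1, 1)]
    for bit, (dy, dx) in enumerate(offsets):
        neighbour = g[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        census |= (neighbour > center).astype(np.int32) << bit
    hist, _ = np.histogram(census, bins=256, range=(0, 256))
    return hist  # PCA to 40 dimensions would be fitted over many such histograms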
The training module 143 is further used to: preset a training count n; randomly subsample the training set m times, the number of samples selected each time being smaller than the total number of samples in the training set, to obtain m new training sets; train on each of the m new training sets to obtain m classification decision rule models and send them to the test module; judge whether the number of training rounds has reached n and, if so, send a training-complete indication to the test module; otherwise, start the next round of training.
Correspondingly, the test module 144 is further used to: upon receiving the m classification decision rule models sent by the training module, classify each test sample with the m models to obtain m classification results per test sample; vote over the m results, taking the category with the most votes as the classification result for that test sample; and, upon receiving the training-complete indication from the training module, determine that training is finished.
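A sketch of the m-subset training plus majority voting; the base classifier (a decision tree here), m and the subsample fraction are assumptions of the example, since the patent only requires each random subset to be smaller than the full training set and does not name the classification algorithm.

import numpy as np
from collections import Counter
from sklearn.tree import DecisionTreeClassifier

def train_and_vote(X_train, y_train, X_test, m=5, subsample=0.7, seed=0):
    """Train m models on random subsets smaller than the full training
    set, then majority-vote their predictions per test sample."""
    rng = np.random.default_rng(seed)
    n = len(X_train)
    models = []
    for _ in range(m):
        # Each random subset is smaller than the full training set
        idx = rng.choice(n, size=int(subsample * n), replace=False)
        models.append(DecisionTreeClassifier().fit(X_train[idx], y_train[idx]))
    preds = np.array([mdl.predict(X_test) for mdl in models])  # shape (m, n_test)
    # The category with the most votes becomes the result for each sample
    return [Counter(col).most_common(1)[0][0] for col in preds.T]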
In practical applications, video and image classification experiments were carried out with the embodiment of the present invention, as follows:
For video, four categories were used: cartoon, news, sports and others. Video clips of these four common categories were collected from the internet as the original video data: 50 samples each for the cartoon, news and sports categories and 39 samples for the others category. With training-to-test set ratios of 5:5, 6:4, 7:3, 8:2 and 9:1, the video feature extraction and classification method of the embodiment of the present invention achieved classification accuracies of 84% to 88%.
For images, the Wang Group image database was selected: 10 classes (indigenous people, beaches, buildings, buses, dinosaurs, elephants, flowers, horses, mountains and food), with 100 images per class, 1000 images in total. With the same training-to-test set ratios of 5:5, 6:4, 7:3, 8:2 and 9:1, the image feature extraction and classification method of the embodiment of the present invention achieved classification accuracies of 83% to 86%.
The foregoing are merely preferred embodiments of the present invention and are not intended to limit the present invention; any modification, equivalent replacement, improvement and the like made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims (22)

1. A multimedia data classification method, characterized in that the method comprises:
extracting features from each item of original multimedia data, the features of all data forming an original feature set, wherein, when the original multimedia data is video data, dynamic features, editing features and static features are extracted, and, when the original multimedia data is image data, structural features are extracted together with one or any combination of color features, texture features and shape features;
for each category, selecting the training samples of that category from the original feature set, all training samples forming a training set;
training on the training set with a preset classification algorithm to generate a classification decision rule model;
for any test sample, applying the classification decision rule model to the test sample to obtain the category to which it belongs.
2. The method according to claim 1, characterized in that, when the extracted feature is a dynamic feature,
said extracting features comprises:
for every two consecutive YUV frames in the video data, calculating the mean luminance of each frame and the absolute value of the difference of the two means to obtain the luminance difference of the frame pair, then averaging the luminance differences over the whole video data to obtain its mean luminance change;
for every two consecutive RGB frames in the video data, calculating the means of the R, G and B channels of each frame, taking the absolute values of the differences of the corresponding means to obtain the R, G and B differences of the frame pair, then averaging these over the whole video data to obtain its mean R change, mean G change and mean B change;
for every two adjacent frames in the video data, calculating the mean motion vector between them, then averaging over the whole video sequence to obtain the overall mean motion vector;
the mean luminance change, mean R change, mean G change, mean B change and mean motion vector forming the dynamic features of the video data.
3. The method according to claim 1, characterized in that the editing features comprise: the video shot cut rate, the video shot fade rate and the still frame rate.
4. The method according to claim 3, characterized in that, when the extracted feature is the video shot cut rate,
said extracting features comprises:
for every two consecutive YUV frames in the video data, calculating the means of the Y, U and V channels of each frame, and taking the absolute values of the differences of the corresponding means to obtain the Y, U and V differences of the frame pair;
judging whether the Y difference exceeds a preset first threshold, the U difference a preset second threshold and the V difference a preset third threshold; if so, treating the frame pair as a candidate shot cut, applying the Sobel operator to each of the two frames to obtain two edge maps, and calculating the absolute value of the difference of the mean pixel values of the two edge maps; if that value exceeds a preset fourth threshold, calculating the mean motion vector between the two frames; if the mean motion vector exceeds a preset fifth threshold, determining that the frame pair is a shot cut and incrementing the shot-cut count by 1;
when the whole video data has been examined, calculating the ratio of the shot-cut count to the total frame count of the video data to obtain the shot cut rate.
5. The method according to claim 3, characterized in that, when the extracted feature is the video shot fade rate,
said extracting features comprises:
for any two YUV frames q frames apart in the video data, calculating the absolute value of the difference of their mean Y values; if it exceeds a preset sixth threshold, judging whether the condition holds M consecutive times, that is, whether the absolute difference of the mean Y values of frames q apart exceeds the sixth threshold M times in a row; if so, determining that a gradual shot transition has been detected and incrementing the gradual-shot count by 1; otherwise, moving to the next frame and continuing the detection, wherein q and M are preset integers and q>1;
when the whole video data has been examined, calculating the ratio of the gradual-shot count to the total frame count of the video data to obtain the shot fade rate.
6. The method according to claim 3, characterized in that the static features comprise: the mean luminance average, mean luminance variance, mean saturation average and wavelet transform texture features; or comprise: the high-component ratio of the mean luminance, the high-component ratio of the mean saturation and wavelet transform texture features; or comprise:
the mean luminance average, mean luminance variance, mean saturation average, the high-component ratio of the mean luminance, the high-component ratio of the mean saturation and wavelet transform texture features.
7. The method according to claim 1, characterized in that the color features comprise: first-, second- and third-order HSV color moment features and first-, second- and third-order color histogram moment features of the image.
8. The method according to claim 1, characterized in that the texture features comprise: texture features based on the gray-level co-occurrence matrix, and wavelet transform texture features.
9. The method according to claim 1, characterized in that, when the extracted feature is a shape feature,
said extracting features comprises:
calculating the gray-scale map of the image;
applying Gaussian filtering to the gray-scale map in the x and y directions to obtain a filtered image I_s;
calculating the gradients of I_s in the x and y directions to obtain gradient maps I_gradx and I_grady, and calculating the gradient magnitude map I_gradmag from I_gradx and I_grady;
applying non-maximum suppression to I_gradmag to obtain a candidate boundary map I_edge;
performing threshold estimation on I_edge to obtain a high threshold HighThrd; judging whether the gradient magnitude at each candidate boundary pixel of I_edge exceeds HighThrd and, if so, taking that pixel as the starting point of a boundary and recursively tracing the remaining boundary points until all pixels of the boundary have been found, yielding the final boundary map I_edge_final;
calculating the 7 geometric invariant moments of I_edge_final to obtain the shape feature of the image.
10. The method according to claim 1, characterized in that, when the extracted feature is a structural feature,
said extracting features comprises:
calculating the gray-scale map I_gray of the image;
applying the Census transform to I_gray to obtain the Census transform map I_census of the image;
calculating the histogram of I_census, whose dimension is 256;
applying principal component analysis (PCA) to reduce the histogram to a final 40-dimensional Census transform histogram, the 40-dimensional Census transform histogram serving as the structural feature of the image.
11. The method according to claim 1, characterized in that said training on the training set comprises:
presetting a training count n;
randomly subsampling the training set m times, the number of samples selected each time being smaller than the total number of samples in the training set, to obtain m new training sets;
training on each of the m new training sets to obtain m classification decision rule models, classifying each test sample with the m models to obtain m classification results per test sample, and voting over the m results, the category with the most votes being taken as the classification result for that test sample;
judging whether the number of training rounds has reached n; if so, determining that training is finished; otherwise, starting the next round of training.
12. A multimedia data classification device, characterized in that it comprises:
a feature extraction module, which extracts features from each item of original multimedia data, the features of all data forming an original feature set, wherein, when the original multimedia data is video data, dynamic features, editing features and static features are extracted, and, when the original multimedia data is image data, structural features are extracted together with one or any combination of color features, texture features and shape features, and which sends out the original feature set;
a sample selection module, which receives the original feature set, selects for each category the training samples of that category from the original feature set, all training samples forming a training set, and sends out the training set;
a training module, which receives the training set, trains on it with a preset classification algorithm to generate a classification decision rule model, and sends out the model;
a test module, which receives the classification decision rule model and, for any test sample, applies the model to the test sample to obtain the category to which it belongs.
13. The device according to claim 12, characterized in that the feature extraction module is further used, when the extracted feature is a dynamic feature, to: for every two consecutive YUV frames in the video data, calculate the mean luminance of each frame and the absolute value of the difference of the two means to obtain the luminance difference of the frame pair, then average the luminance differences over the whole video data to obtain its mean luminance change; for every two consecutive RGB frames, calculate the means of the R, G and B channels of each frame, take the absolute values of the differences of the corresponding means to obtain the R, G and B differences of the frame pair, then average these over the whole video data to obtain its mean R change, mean G change and mean B change; for every two adjacent frames, calculate the mean motion vector between them, then average over the whole video sequence to obtain the overall mean motion vector; the mean luminance change, mean R change, mean G change, mean B change and mean motion vector forming the dynamic features of the video data.
14. The device according to claim 12, characterized in that the editing features extracted by the feature extraction module comprise: the video shot cut rate, the video shot fade rate and the still frame rate.
15. The device according to claim 14, characterized in that the feature extraction module is further used, when the extracted feature is the video shot cut rate, to: for every two consecutive YUV frames in the video data, calculate the means of the Y, U and V channels of each frame and take the absolute values of the differences of the corresponding means to obtain the Y, U and V differences of the frame pair; judge whether the Y difference exceeds a preset first threshold, the U difference a preset second threshold and the V difference a preset third threshold; if so, treat the frame pair as a candidate shot cut, apply the Sobel operator to each of the two frames to obtain two edge maps, and calculate the absolute value of the difference of the mean pixel values of the two edge maps; if that value exceeds a preset fourth threshold, calculate the mean motion vector between the two frames; if the mean motion vector exceeds a preset fifth threshold, determine that the frame pair is a shot cut and increment the shot-cut count by 1; when the whole video data has been examined, calculate the ratio of the shot-cut count to the total frame count of the video data to obtain the shot cut rate.
16. The device according to claim 14, characterized in that the feature extraction module is further used, when the extracted feature is the video shot fade rate, to: for any two YUV frames q frames apart in the video data, calculate the absolute value of the difference of their mean Y values; if it exceeds a preset sixth threshold, judge whether the condition holds M consecutive times, that is, whether the absolute difference of the mean Y values of frames q apart exceeds the sixth threshold M times in a row; if so, determine that a gradual shot transition has been detected and increment the gradual-shot count by 1; otherwise, move to the next frame and continue the detection, wherein q and M are preset integers and q>1; when the whole video data has been examined, calculate the ratio of the gradual-shot count to the total frame count of the video data to obtain the shot fade rate.
17. The device according to claim 12, characterized in that the static features extracted by the feature extraction module comprise: the mean luminance average, mean luminance variance, mean saturation average and wavelet transform texture features; or comprise: the high-component ratio of the mean luminance, the high-component ratio of the mean saturation and wavelet transform texture features; or comprise: the mean luminance average, mean luminance variance, mean saturation average, the high-component ratio of the mean luminance, the high-component ratio of the mean saturation and wavelet transform texture features.
18. The device according to claim 12, characterized in that the color features extracted by the feature extraction module comprise: first-, second- and third-order HSV color moment features and first-, second- and third-order color histogram moment features of the image.
19. The device according to claim 12, characterized in that the texture features extracted by the feature extraction module comprise: texture features based on the gray-level co-occurrence matrix, and wavelet transform texture features.
20. The device according to claim 12, characterized in that the feature extraction module is further used, when the extracted feature is a shape feature, to: calculate the gray-scale map of the image; apply Gaussian filtering to the gray-scale map in the x and y directions to obtain a filtered image I_s; calculate the gradients of I_s in the x and y directions to obtain gradient maps I_gradx and I_grady, and calculate the gradient magnitude map I_gradmag from I_gradx and I_grady; apply non-maximum suppression to I_gradmag to obtain a candidate boundary map I_edge; perform threshold estimation on I_edge to obtain a high threshold HighThrd; judge whether the gradient magnitude at each candidate boundary pixel of I_edge exceeds HighThrd and, if so, take that pixel as the starting point of a boundary and recursively trace the remaining boundary points until all pixels of the boundary have been found, yielding the final boundary map I_edge_final; and calculate the 7 geometric invariant moments of I_edge_final to obtain the shape feature of the image.
21. The device according to claim 12, characterized in that the feature extraction module is further used, when the extracted feature is a structural feature, to: calculate the gray-scale map I_gray of the image; apply the Census transform to I_gray to obtain the Census transform map I_census of the image; calculate the histogram of I_census, whose dimension is 256; and apply principal component analysis (PCA) to reduce the histogram to a final 40-dimensional Census transform histogram, which serves as the structural feature of the image.
22. The device according to claim 12, characterized in that the training module is further used to: preset a training count n; randomly subsample the training set m times, the number of samples selected each time being smaller than the total number of samples in the training set, to obtain m new training sets; train on each of the m new training sets to obtain m classification decision rule models and send them to the test module; judge whether the number of training rounds has reached n and, if so, send a training-complete indication to the test module; otherwise, start the next round of training;
the test module being further used to: upon receiving the m classification decision rule models sent by the training module, classify each test sample with the m models to obtain m classification results per test sample; vote over the m results, taking the category with the most votes as the classification result for that test sample; and, upon receiving the training-complete indication from the training module, determine that training is finished.
CN201210498829.XA 2012-11-29 2012-11-29 multimedia data classification method and device Active CN103853724B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210498829.XA CN103853724B (en) 2012-11-29 2012-11-29 multimedia data classification method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201210498829.XA CN103853724B (en) 2012-11-29 2012-11-29 multimedia data classification method and device

Publications (2)

Publication Number Publication Date
CN103853724A true CN103853724A (en) 2014-06-11
CN103853724B CN103853724B (en) 2017-10-17

Family

ID=50861392

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210498829.XA Active CN103853724B (en) 2012-11-29 2012-11-29 multimedia data classification method and device

Country Status (1)

Country Link
CN (1) CN103853724B (en)



Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101276417A (en) * 2008-04-17 2008-10-01 上海交通大学 Method for filtering internet cartoon medium rubbish information based on content
US20110026840A1 (en) * 2009-07-28 2011-02-03 Samsung Electronics Co., Ltd. System and method for indoor-outdoor scene classification
CN102508923A (en) * 2011-11-22 2012-06-20 北京大学 Automatic video annotation method based on automatic classification and keyword marking

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Liu Bochao, "Research on content-based objectionable image recognition", China Master's Theses Full-text Database, Information Science and Technology, 15 October 2007, I138-762 *
Qin Dan, "Research on automatic video content classification algorithms based on multi-feature combination and SVM", China Master's Theses Full-text Database, Information Science and Technology, 15 December 2011, I138-1309 *
Chen Hailin, "Research on image object classification based on discriminative learning", China Doctoral Dissertations Full-text Database, Information Science and Technology *

Cited By (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104463196A (en) * 2014-11-11 2015-03-25 中国人民解放军理工大学 Video-based weather phenomenon recognition method
CN104463196B (en) * 2014-11-11 2017-07-25 中国人民解放军理工大学 A kind of weather phenomenon recognition methods based on video
CN105046366B (en) * 2015-07-29 2018-06-08 腾讯科技(深圳)有限公司 model training method and device
CN105046366A (en) * 2015-07-29 2015-11-11 腾讯科技(深圳)有限公司 Model training method and device
CN108780462A (en) * 2016-03-13 2018-11-09 科尔蒂卡有限公司 System and method for being clustered to multimedia content element
CN108780462B (en) * 2016-03-13 2022-11-22 科尔蒂卡有限公司 System and method for clustering multimedia content elements
CN106570910A (en) * 2016-11-02 2017-04-19 南阳理工学院 Auto-encoding characteristic and neighbor model based automatic image marking method
CN106570910B (en) * 2016-11-02 2019-08-20 南阳理工学院 Based on the image automatic annotation method from coding characteristic and Neighborhood Model
CN107291825A (en) * 2017-05-26 2017-10-24 北京奇艺世纪科技有限公司 With the search method and system of money commodity in a kind of video
CN108229300A (en) * 2017-11-02 2018-06-29 深圳市商汤科技有限公司 Video classification methods, device, computer readable storage medium and electronic equipment
CN108229300B (en) * 2017-11-02 2020-08-11 深圳市商汤科技有限公司 Video classification method and device, computer-readable storage medium and electronic equipment
CN108205324A (en) * 2018-01-03 2018-06-26 李文清 A kind of Intelligent road cleaning plant
CN108399241A (en) * 2018-02-28 2018-08-14 福州大学 A kind of emerging much-talked-about topic detecting system based on multiclass feature fusion
CN108399241B (en) * 2018-02-28 2021-08-31 福州大学 Emerging hot topic detection system based on multi-class feature fusion
CN109347834A (en) * 2018-10-24 2019-02-15 广东工业大学 Detection method, device and the equipment of abnormal data in Internet of Things edge calculations environment
CN109347834B (en) * 2018-10-24 2021-03-16 广东工业大学 Method, device and equipment for detecting abnormal data in Internet of things edge computing environment
CN109580656B (en) * 2018-12-24 2021-01-15 广东华中科技大学工业技术研究院 Mobile phone light guide plate defect detection method and system based on dynamic weight combination classifier
CN109580656A (en) * 2018-12-24 2019-04-05 广东华中科技大学工业技术研究院 Mobile phone light guide panel defect inspection method and system based on changeable weight assembled classifier
CN113326857A (en) * 2020-02-28 2021-08-31 合肥美亚光电技术股份有限公司 Model training method and device
CN116894794A (en) * 2023-09-11 2023-10-17 长沙超创电子科技有限公司 Quick denoising method for video
CN116894794B (en) * 2023-09-11 2023-11-21 长沙超创电子科技有限公司 Quick denoising method for video

Also Published As

Publication number Publication date
CN103853724B (en) 2017-10-17

Similar Documents

Publication Publication Date Title
CN103853724A (en) Multimedia data sorting method and device
CN109165623B (en) Rice disease spot detection method and system based on deep learning
Lee et al. Adaboost for text detection in natural scene
US9443314B1 (en) Hierarchical conditional random field model for labeling and segmenting images
CN109840560B (en) Image classification method based on clustering in capsule network
CN104866616B (en) Monitor video Target Searching Method
CN106126585B (en) The unmanned plane image search method combined based on quality grading with perceived hash characteristics
CN108647602B (en) A kind of aerial remote sensing images scene classification method determined based on image complexity
CN103530638B (en) Method for pedestrian matching under multi-cam
CN105574063A (en) Image retrieval method based on visual saliency
CN102024152B (en) Method for recognizing traffic sings based on sparse expression and dictionary study
CN106156777B (en) Text picture detection method and device
CN103218832B (en) Based on the vision significance algorithm of global color contrast and spatial distribution in image
US8503768B2 (en) Shape description and modeling for image subscene recognition
CN102695056A (en) Method for extracting compressed video key frames
CN104778476A (en) Image classification method
CN104281849A (en) Fabric image color feature extraction method
Farinella et al. Scene classification in compressed and constrained domain
CN102867183A (en) Method and device for detecting littered objects of vehicle and intelligent traffic monitoring system
CN110827312A (en) Learning method based on cooperative visual attention neural network
CN103218604A (en) Method for detecting pedestrians in traffic scene based on road surface extraction
CN109858570A (en) Image classification method and system, computer equipment and medium
CN103985130A (en) Image significance analysis method for complex texture images
CN111382766A (en) Equipment fault detection method based on fast R-CNN
CN108647703B (en) Saliency-based classification image library type judgment method

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
CP02 Change in the address of a patent holder

Address after: 5-12 / F, building 6, 57 Andemen street, Yuhuatai District, Nanjing City, Jiangsu Province

Patentee after: Samsung Electronics (China) R&D Center

Patentee after: SAMSUNG ELECTRONICS Co.,Ltd.

Address before: 17 / F, Xindi center, 188 Lushan Road, Jianye District, Nanjing, Jiangsu 210019

Patentee before: Samsung Electronics (China) R&D Center

Patentee before: SAMSUNG ELECTRONICS Co.,Ltd.

CP02 Change in the address of a patent holder