CN103065300A - Method for video labeling and device for video labeling - Google Patents

Method for video labeling and device for video labeling

Info

Publication number
CN103065300A
CN103065300A
Authority
CN
China
Prior art keywords
sample
mark
video
classification
key frame
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN2012105669855A
Other languages
Chinese (zh)
Other versions
CN103065300B (en)
Inventor
秦兴德
吴金勇
王一科
王军
钟翔宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Security and Surveillance Technology PRC Inc
Original Assignee
China Security and Surveillance Technology PRC Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Security and Surveillance Technology PRC Inc filed Critical China Security and Surveillance Technology PRC Inc
Priority to CN201210566985.5A priority Critical patent/CN103065300B/en
Publication of CN103065300A publication Critical patent/CN103065300A/en
Application granted granted Critical
Publication of CN103065300B publication Critical patent/CN103065300B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Image Analysis (AREA)

Abstract

The invention discloses a method and a device for video labeling, belonging to the field of video processing. The method comprises: segmenting a video into shots; extracting a key frame set from each segmented shot; extracting the relevant low-level feature vectors of each key frame set; using a semi-supervised kernel density estimation algorithm to label each unlabeled sample with a category; and labeling the key frames corresponding to the unlabeled samples with their categories. Because feature vectors that combine multiple low-level image features are used to represent the key frames, the loss of image information is reduced. The semi-supervised kernel density estimation algorithm labels each unlabeled sample with a category and introduces unlabeled data into the kernel density estimation, so the feature information of both labeled and unlabeled samples is exploited, improving the efficiency of video labeling and the accuracy of the kernel density estimation.

Description

Video labeling method and device
Technical field
The present invention relates to the fields of video processing and machine learning, and in particular to a video labeling method and device.
Background technology
With the development of computer and network technologies, ordinary users come into contact with ever more video data. Video data carries a large amount of useful information, and its content is richer, more intuitive and more vivid than other forms of data. On the one hand, the mass of information contained in rich video data is unmatched by other media; on the other hand, its ever-growing volume, unstructured form and ambiguous content put obstacles in the way of convenient user interaction and limit its usefulness.
To mine the potential value in large-scale video collections, users need to retrieve the desired video segments effectively. Video labeling is a technique that associates text with the semantic content of video. It is a good way to narrow the semantic gap and an intermediate step usable for video retrieval, allowing users to retrieve video by key frames or by entering semantic information.
In practice, labeling a large amount of video is rather difficult. First, labeling a video set manually costs a great deal of time and effort, and users often lack the patience to finish labeling an entire sample set. Second, it is very hard to extract low-level feature vectors that express the semantics of video content.
How to obtain good labeling performance from as few manually labeled samples as possible, combined with low-level features of various forms, has become the key issue in video labeling. Machine learning theory is relatively mature and can provide a theoretical basis and various candidate solutions for video labeling, so it is generally considered the more suitable approach to the problem. Current research on video labeling focuses mainly on how to use learning methods, combined with the characteristics of video, to improve labeling accuracy. However, many machine learning methods, such as Support Vector Machines (SVM), Bayesian classifiers and Random Forests, only consider the information in the labeled sample set, while the large amount of information contained in unlabeled samples is wasted.
Summary of the invention
To overcome two shortcomings of the prior art, namely that only a small amount of labeled data is available against a large amount of unlabeled data, and that a single feature carries little expressive information, the invention provides a video labeling method and device. Unlabeled data is introduced into the kernel density estimation, the feature information of both labeled and unlabeled samples is exploited, and the efficiency of video labeling and the accuracy of the kernel density estimation are improved.
The technical solution the present invention adopts to solve the above technical problem is as follows:
According to one aspect of the present invention, a video labeling method is provided, comprising the following steps:
segmenting the video into shots;
extracting the key frame set in each segmented shot;
extracting the relevant low-level feature vectors of each key frame set;
labeling each unlabeled sample with a category using a semi-supervised kernel density estimation algorithm;
labeling the key frames corresponding to the unlabeled samples with their categories.
Preferably, segmenting the video into shots comprises the following steps:
if the video is compressed, decoding it to obtain the original frames;
if the color space of the images is not the HSV color space, converting the RGB color space of the images to the HSV color space;
performing shot segmentation with a pixel-domain shot detection method.
Preferably, extracting the key frames in each segmented shot comprises the following step:
computing the frame distance between all adjacent frames within the same shot, and selecting as key frames all frames whose distance difference from the previous frame exceeds an adaptive threshold.
Preferably, the relevant low-level feature vectors include: color histogram, color moments, edge distribution histogram and/or Tamura texture features.
Preferably, labeling each unlabeled sample with a category using the semi-supervised kernel density estimation algorithm comprises the following steps:
initializing the posterior probabilities of the labeled samples;
computing the kernel density of the samples;
computing the posterior probabilities of the unlabeled samples;
determining the category of each unlabeled sample.
Preferably, the posterior probabilities of the labeled samples are initialized with the following formula:

$$P(C_k \mid x_j) = \frac{l_k}{\sum_{k=1}^{K} l_k}, \quad j \in L$$

where j and k are natural numbers, l_k is the number of samples labeled with category k, K is the number of labeled categories over all samples, C_k is the sample set of the k-th category, and P(C_k | x_j) is the initialized posterior probability that the given sample x_j belongs to category C_k.
Preferably, the posterior probability of an unlabeled sample is computed with the following formula:

$$\hat{P}(C_k \mid x_j) = \frac{\sum_{i=1}^{n} P(C_k \mid x_i)\,\kappa(x_j - x_i)}{\sum_{i=1}^{n} \kappa(x_j - x_i)}$$

where K is the number of labeled sample categories, n is the total number of samples, x_i is a labeled sample, x_j is an unlabeled sample, the left-hand side is the estimated posterior probability that the unlabeled sample x_j belongs to category C_k, P(C_k | x_i) is the initialized posterior probability that the given sample x_i belongs to category C_k, and κ(x_j − x_i) is the kernel density of the unlabeled sample x_j;
correspondingly,
determining the category of an unlabeled sample is: selecting the category corresponding to the maximum posterior probability of the unlabeled sample as the category of that sample.
According to another aspect of the present invention, a video labeling device is provided, comprising a shot segmentation module, a key frame set extraction module, a feature extraction module, a semi-supervised kernel density estimation module and a sample labeling module, wherein:
the shot segmentation module is used for segmenting the video into shots;
the key frame set extraction module is used for extracting the key frame set in each segmented shot;
the feature extraction module is used for extracting the relevant low-level feature vectors of each key frame set;
the semi-supervised kernel density estimation module is used for labeling each unlabeled sample with a category using the semi-supervised kernel density estimation algorithm;
the sample labeling module is used for labeling the key frames corresponding to the unlabeled samples with their categories.
Preferably, the feature extraction module is specifically used for extracting the color histogram, color moments, edge distribution histogram and/or texture features of each key frame set.
Preferably, the semi-supervised kernel density estimation module comprises a first computing unit, a second computing unit, a third computing unit and a determining unit, wherein:
the first computing unit is used for initializing the posterior probabilities of the labeled samples;
the second computing unit is used for computing the kernel density of the samples;
the third computing unit is used for computing the posterior probabilities of the unlabeled samples;
the determining unit is used for determining the category of each unlabeled sample.
According to the embodiments of the invention, key frames are represented by feature vectors that combine multiple low-level image features, which reduces the loss of image information; the semi-supervised kernel density estimation algorithm labels each unlabeled sample with a category, introducing unlabeled data into the kernel density estimation and exploiting the feature information of both labeled and unlabeled samples. This improves the efficiency of video labeling and the accuracy of the kernel density estimation, and is especially suitable for large-scale video labeling.
Description of drawings
Fig. 1 is a flowchart of a video labeling method provided by an embodiment of the invention;
Fig. 2 is a flowchart of a shot segmentation method provided by a preferred embodiment of the invention;
Fig. 3 is a flowchart of a key frame set extraction method provided by a preferred embodiment of the invention;
Fig. 4 is a flowchart of a method for extracting the feature vectors of key frames, provided by a preferred embodiment of the invention;
Fig. 5 is a schematic diagram of the image region division provided by a preferred embodiment of the invention;
Fig. 6 is a flowchart of a method, provided by a preferred embodiment of the invention, for labeling each unlabeled sample with a category using a semi-supervised kernel density estimation algorithm;
Fig. 7 is a module structure diagram of a video labeling device provided by an embodiment of the invention.
Embodiment
To make the technical problem to be solved by the invention, the technical solution and the beneficial effects clearer, the invention is further elaborated below in conjunction with the drawings and embodiments. It should be understood that the specific embodiments described here only serve to explain the invention and are not intended to limit it.
Fig. 1 shows the flowchart of a video labeling method provided by an embodiment of the invention; the method comprises the following steps:
S101, segmenting the video into shots;
S102, extracting the key frame set in each segmented shot;
Specifically, the frames within one shot are usually quite redundant, so frames reflecting the main information content of a shot can be chosen as key frames in order to represent the shot concisely. Once the shots are determined, key frames are extracted from the frames of each shot. The basic algorithm includes, but is not limited to: computing the frame distance between all adjacent frames within the same shot and taking each frame whose distance difference from the previous key frame exceeds a threshold as a key frame; if the distance difference is below the distance threshold, the search continues, until the adjacent-frame distance differences of the selected key frame set all exceed the threshold. A minimal sketch of this selection rule follows.
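As an illustration only, the selection rule can be sketched as follows; frame_distance stands for any frame distance measure (for instance the χ² histogram difference described below), and all names are illustrative rather than the patent's implementation:

```python
def select_keyframes(frames, frame_distance, tau):
    """frames: list of per-frame feature arrays (e.g. color histograms).
    Returns the indices of the selected key frames."""
    keys = [0]                      # the first frame opens the key frame set
    for j in range(1, len(frames)):
        # a frame far enough from the last key frame starts a new key frame
        if frame_distance(frames[keys[-1]], frames[j]) > tau:
            keys.append(j)
    return keys
```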
S103, extracting the relevant low-level feature vectors of each key frame set;
The relevant low-level feature vectors extracted from the key frame sets include, but are not limited to, the color histogram, color moments, edge distribution histogram and texture.
S104, labeling each unlabeled sample with a category using a semi-supervised kernel density estimation algorithm;
S105, labeling the key frames corresponding to the unlabeled samples with their categories.
The specific implementation of each of the above steps is described in detail below:
Referring to Fig. 2, a shot segmentation method provided by the preferred embodiment of the invention comprises the following steps:
S1011, if the video is compressed, first decoding it to obtain the original frames;
S1012, if the color space of the images is not the HSV color space, converting the RGB color space of the images to the HSV color space. The conversion formulas can be:

$$H = \begin{cases} \arccos\dfrac{(R-G)+(R-B)}{2\sqrt{(R-G)^2+(R-B)(G-B)}} & (B \le G) \\[2ex] 2\pi - \arccos\dfrac{(R-G)+(R-B)}{2\sqrt{(R-G)^2+(R-B)(G-B)}} & (B > G) \end{cases} \qquad (1)$$

$$S = \frac{\max(R,G,B)-\min(R,G,B)}{\max(R,G,B)} \qquad (2)$$

$$V = \frac{\max(R,G,B)}{255} \qquad (3)$$

where R, G and B denote red, green and blue, and H, S and V denote hue, saturation and brightness (value).
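As an illustration only, the conversion of formulas (1) to (3) can be sketched for a single pixel as follows; the guard for the achromatic (zero-denominator) case is an added assumption:

```python
import numpy as np

def rgb_to_hsv_pixel(r, g, b):
    """Convert one 8-bit RGB pixel to (H in radians, S and V in [0, 1])."""
    r, g, b = float(r), float(g), float(b)
    mx, mn = max(r, g, b), min(r, g, b)
    denom = 2.0 * np.sqrt((r - g) ** 2 + (r - b) * (g - b))
    if denom == 0.0:                 # gray pixel: hue is undefined, use 0
        h = 0.0
    else:                            # formula (1)
        theta = np.arccos(np.clip(((r - g) + (r - b)) / denom, -1.0, 1.0))
        h = theta if b <= g else 2.0 * np.pi - theta
    s = 0.0 if mx == 0.0 else (mx - mn) / mx      # formula (2)
    v = mx / 255.0                                 # formula (3)
    return h, s, v
```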
S1013, performing shot segmentation with a pixel-domain shot detection method.
In this step, the χ² histogram method can be used for video shot segmentation: χ² is compared with a given threshold τ, and if χ² > τ, a shot boundary exists. The χ² histogram is computed as:

$$\chi^2 = \sum_{i=1}^{k} \frac{\big(H_1(i)-H_2(i)\big)^2}{H_2(i)} \qquad (4)$$

where k is the total number of color levels, H_1(i) and H_2(i) are the i-th level of the color histograms of the two frame images, and the threshold τ is determined from the mean of the χ² values over all adjacent frames of the video.
Of course, other pixel-domain shot detection methods can also be adopted in this embodiment, such as template matching, edge-rate-based methods and model-based methods. The χ² variant is sketched below.
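A minimal sketch of the χ² variant, assuming OpenCV (cv2) is available and using a hue-only histogram as a stand-in for the 72-color quantization described later; the ε guard against empty bins is an added assumption:

```python
import cv2
import numpy as np

def chi_square(h1, h2, eps=1e-10):
    """Chi-square difference of two normalized histograms, formula (4)."""
    return float(np.sum((h1 - h2) ** 2 / (h2 + eps)))

def detect_shot_boundaries(path, bins=72):
    """Returns the frame indices at which a shot boundary is declared."""
    cap = cv2.VideoCapture(path)
    hists = []
    ok, frame = cap.read()
    while ok:
        hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
        h = cv2.calcHist([hsv], [0], None, [bins], [0, 180]).ravel()
        hists.append(h / (h.sum() + 1e-10))
        ok, frame = cap.read()
    cap.release()
    diffs = [chi_square(hists[i], hists[i + 1]) for i in range(len(hists) - 1)]
    tau = float(np.mean(diffs))   # threshold: mean chi-square over adjacent frames
    return [i + 1 for i, d in enumerate(diffs) if d > tau]
```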
Referring to Fig. 3, a key frame set extraction method provided by the preferred embodiment of the invention comprises the following steps:
S1021, taking the initial frame as the initial key frame;
Specifically, the first frame of the video shot is read and taken as the initial key frame f_1 of the shot.
S1022, computing the frame difference between the subsequent frame and the key frame;
Specifically, the difference between a subsequent frame f_j and the current key frame can be computed according to a similarity measurement method;
S1023, judging whether the frame difference exceeds an adaptive threshold τ; if so, executing step S1024, otherwise returning to step S1022;
S1024, taking f_j as the new key frame;
S1025, outputting f_j to the key frame set;
S1026, judging whether the shot has ended; if it has not, returning to step S1022, otherwise executing step S1027;
S1027, ending the flow.
The adaptive threshold τ can be determined as follows:

$$\delta^2 = \frac{1}{T}\left(\frac{1}{T}\sum_{i=1}^{T}\Big[s_i - \frac{1}{T}\sum_{i=1}^{T}s_i\Big]^2\right) + \frac{1}{M-T-1}\left(\frac{1}{M-T-1}\sum_{i=T+1}^{M}\Big[s_i - \frac{1}{M-T-1}\sum_{i=T+1}^{M}s_i\Big]^2\right) \qquad (5)$$

where s_i is any element of the one-dimensional array storing all frame difference results and M is the size of the array. Every split position T of the array elements is tried, δ² is computed for every split point, and the minimum δ² is found; the array element s_T corresponding to that split is the adaptive threshold τ. A sketch of this search follows.
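A sketch of this threshold search under stated assumptions: the frame-difference array is sorted before splitting, and each group's term follows formula (5) literally; names are illustrative:

```python
import numpy as np

def adaptive_threshold(s):
    """s: 1-D array of frame-difference values; returns the threshold tau."""
    s = np.sort(np.asarray(s, dtype=float))   # assumed: split points in order
    M = len(s)
    best, tau = np.inf, s[-1]
    for T in range(1, M - 1):                 # keep both groups non-empty
        left, right = s[:T], s[T:]
        # formula (5): doubly weighted within-group scatter of each partition
        var_l = np.sum((left - left.mean()) ** 2) / T ** 2
        var_r = np.sum((right - right.mean()) ** 2) / (M - T - 1) ** 2
        if var_l + var_r < best:
            best, tau = var_l + var_r, s[T]
    return tau
```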
Referring to Fig. 4, extracting the relevant low-level feature vectors of each key frame set, as provided by the preferred embodiment of the invention, comprises the following steps:
S1031, layering the HSV color space of the image.
The color space can be layered in various ways. In this step, as an example, the hue H of the HSV color space is divided into 8 levels and the saturation S and the brightness V into 3 levels each, giving 72 colors in total. The layering formulas are:

$$H = \begin{cases} 0 & h \in [316, 20] \\ 1 & h \in [21, 40] \\ 2 & h \in [41, 75] \\ 3 & h \in [76, 155] \\ 4 & h \in [156, 190] \\ 5 & h \in [191, 270] \\ 6 & h \in [271, 295] \\ 7 & h \in [296, 315] \end{cases} \qquad (6)$$

$$S = \begin{cases} 0 & s \in [0, 0.2] \\ 1 & s \in [0.2, 0.7] \\ 2 & s \in [0.7, 1] \end{cases} \qquad (7)$$

$$V = \begin{cases} 0 & v \in [0, 0.2] \\ 1 & v \in [0.2, 0.7] \\ 2 & v \in [0.7, 1] \end{cases} \qquad (8)$$

(The range [316, 20] for H = 0 wraps around 360 degrees.) According to the above method the color space is divided into 72 colors.
S1032, extracting the color histogram of the image.
In this step, an N-dimensional color histogram is extracted over the N colors obtained by layering the HSV color space of the image, where N is a natural number (here N = 72). A sketch follows.
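A minimal sketch of steps S1031 and S1032, assuming h is given in degrees in [0, 360) and s, v in [0, 1]; the packing 9H + 3S + V of the 72 bin codes is an illustrative choice, since the text does not fix the bin ordering:

```python
import numpy as np

def quantize_h(h):                       # formula (6), h in degrees
    if h >= 316 or h <= 20: return 0
    if h <= 40:  return 1
    if h <= 75:  return 2
    if h <= 155: return 3
    if h <= 190: return 4
    if h <= 270: return 5
    if h <= 295: return 6
    return 7                             # 296..315

def quantize_sv(x):                      # formulas (7) and (8)
    if x <= 0.2: return 0
    if x <= 0.7: return 1
    return 2

def color_histogram_72(h_arr, s_arr, v_arr):
    """72-bin normalized color histogram of an HSV image (float arrays)."""
    hist = np.zeros(72)
    for h, s, v in zip(h_arr.ravel(), s_arr.ravel(), v_arr.ravel()):
        hist[9 * quantize_h(h) + 3 * quantize_sv(s) + quantize_sv(v)] += 1
    return hist / hist.sum()
```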
S1033, dividing the image into regions.
Referring to Fig. 5, in this embodiment the image can be divided into 3×3 regions; of course, other division schemes can also be adopted.
S1034, extracting the color moments of the image.
In the 3×3 regions of the divided image (Fig. 5), the first moment (mean u), second moment (standard deviation σ) and third moment (skewness s) of the color are extracted for each region, 81 color moment dimensions in total. The extraction formulas are:

$$u_i = \frac{1}{N}\sum_{j=1}^{N} p_{ij} \qquad (9)$$

$$\sigma_i = \left(\frac{1}{N}\sum_{j=1}^{N}(p_{ij}-u_i)^2\right)^{1/2} \qquad (10)$$

$$s_i = \left(\frac{1}{N}\sum_{j=1}^{N}(p_{ij}-u_i)^3\right)^{1/3} \qquad (11)$$

where N is the total number of pixels of the i-th region and p_ij is the j-th pixel value. A sketch follows.
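A minimal sketch of the 81-dimensional color moment extraction, assuming the three moments are taken per HSV channel in each of the nine cells (9 cells x 3 channels x 3 moments = 81) and that the cube root of formula (11) preserves the sign of the skew:

```python
import numpy as np

def color_moments_3x3(hsv):
    """hsv: float array of shape (H, W, 3); returns a length-81 vector."""
    H, W, _ = hsv.shape
    feats = []
    for r in range(3):
        for c in range(3):
            cell = hsv[r * H // 3:(r + 1) * H // 3,
                       c * W // 3:(c + 1) * W // 3]
            for ch in range(3):
                p = cell[..., ch].ravel()
                u = p.mean()                              # formula (9)
                sigma = np.sqrt(np.mean((p - u) ** 2))    # formula (10)
                skew = np.cbrt(np.mean((p - u) ** 3))     # formula (11)
                feats.extend([u, sigma, skew])
    return np.array(feats)
```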
S1035, performing edge detection on the image and extracting the edge distribution histogram;
The edge distribution histogram mainly captures the statistical distribution of the edges of an image or of a local part of it. It is generally obtained by extracting edge information with a detection algorithm and then accumulating the directionality of the edge distribution over fixed angle intervals.
The Canny operator is generally acknowledged as one of the best image edge detection operators at present. Its advantage is that it uses two different thresholds to detect strong and weak edges: a weak edge is output only when it is connected to a strong edge. In this way the interference of noise with edge detection is reduced without losing weak edge information.
In this embodiment the Canny operator can be used for edge detection. For the image after Canny edge extraction, the edge directions are divided into angle ranges (again using the 3×3 division of Fig. 5), forming an edge distribution histogram of several levels; the resulting 27-dimensional edge distribution histogram is finally normalized:

$$H[i] = H[i] / S \qquad (12)$$

where H[i] is the edge orientation histogram and S is the area of the image. A sketch follows.
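A minimal sketch of step S1035, assuming OpenCV's Canny and Sobel operators and three orientation bins per cell (3 bins x 9 cells = 27); the text fixes only the 27-dimensional total, so the per-cell bin count and the Canny thresholds are assumptions:

```python
import cv2
import numpy as np

def edge_histogram_27(gray):
    """gray: uint8 grayscale image; returns the normalized 27-dim histogram."""
    edges = cv2.Canny(gray, 50, 150)              # strong and weak thresholds
    gx = cv2.Sobel(gray, cv2.CV_32F, 1, 0)
    gy = cv2.Sobel(gray, cv2.CV_32F, 0, 1)
    angle = np.mod(np.arctan2(gy, gx), np.pi)     # edge orientation in [0, pi)
    H, W = gray.shape
    hist = np.zeros(27)
    for r in range(3):
        for c in range(3):
            ys = slice(r * H // 3, (r + 1) * H // 3)
            xs = slice(c * W // 3, (c + 1) * W // 3)
            cell_ang = angle[ys, xs][edges[ys, xs] > 0]   # edge pixels only
            counts, _ = np.histogram(cell_ang, bins=3, range=(0, np.pi))
            hist[(r * 3 + c) * 3:(r * 3 + c) * 3 + 3] = counts
    return hist / (H * W)                         # formula (12): divide by area
```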
S1036, extracting the Tamura texture of the image;
The Tamura texture has six visual properties: coarseness, contrast, directionality, line-likeness, regularity and roughness. Only the first three are used here, since the last three are strongly correlated with the first three. Following the region division of Fig. 5, 27 Tamura texture dimensions are extracted. The computation formulas are:

$$\mathrm{Coarseness} = \frac{1}{mn}\sum_{i=1}^{m}\sum_{j=1}^{n} S_{best}(i,j) \qquad (13)$$

where i, j are the coordinates of a pixel in an image of width m and height n. Let E (computed in both the horizontal and vertical directions) be the mean intensity difference of a pixel, and let (x, y) denote the selected image region; the optimal size S_best that maximizes E is jointly determined by the following formulas:

$$S_{best}(x,y) = 2^{k}, \qquad E_k = E_{max} = \max(E_1, E_2, \ldots, E_h)$$

$$\mathrm{Contrast} = \frac{\sigma}{\alpha_4^{1/4}}, \qquad \alpha_4 = \frac{u_4}{\sigma^4} \qquad (14)$$

where σ is the standard deviation of the image gray levels, α_4 is the kurtosis of the gray values and u_4 is the fourth central moment.

$$\mathrm{Directionality} = \sum_{p} n_p \sum_{\phi \in w_p} (\phi - \phi_p)^2 H_D(\phi) \qquad (15)$$

where φ is the quantized gradient angle, n_p is the number of pixels in each region whose gradient magnitude exceeds a given threshold, H_D(φ) is the histogram constructed from the gradient vectors of all pixels, φ_p is the p-th peak of this histogram, and w_p is the range of quantized values covered by peak p. A sketch of the contrast term follows.
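Coarseness and directionality need several auxiliary passes over the image, so only the contrast of formula (14) is sketched here as a minimal illustration:

```python
import numpy as np

def tamura_contrast(gray):
    """Tamura contrast of formula (14): sigma / alpha_4^(1/4)."""
    g = np.asarray(gray, dtype=float).ravel()
    sigma = g.std()
    if sigma == 0.0:                    # flat image: no contrast
        return 0.0
    u4 = np.mean((g - g.mean()) ** 4)   # fourth central moment
    alpha4 = u4 / sigma ** 4            # kurtosis
    return sigma / alpha4 ** 0.25
```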
It should be noted that there is no fixed execution order among the above steps S1032, S1033 and S1034.
S1037, outputting the feature vector of the image.
The above steps finally yield a 207-dimensional low-level feature vector composed of the color histogram, the color moments, the edge distribution histogram and the Tamura texture (72 + 81 + 27 + 27 dimensions).
Fig. 6 shows the flowchart of the method, provided by the preferred embodiment of the invention, for labeling each unlabeled sample with a category using the semi-supervised kernel density estimation algorithm.
In the key frame set obtained in step S102 above, each key frame is represented by a feature vector, and each feature vector represents a key frame sample x_i. Suppose the labeled samples fall into K classes, with l labeled samples L = {x_1, x_2, ..., x_l} and u unlabeled samples U = {x_{l+1}, ..., x_{l+u}}, and n = l + u. This embodiment uses an extended kernel density probability estimate, formula (16):

$$\hat{p}(x \mid C_k) = \frac{\sum_{i=1}^{n} P(C_k \mid x_i)\,\kappa(x - x_i)}{\sum_{i=1}^{n} P(C_k \mid x_i)} \qquad (16)$$

where the left-hand side denotes the probability of sample x under category C_k, P(C_k | x_i) is the initialized posterior probability that sample x_i belongs to category C_k, K is the number of labeled sample categories, n is the total number of samples, x_i is a sample, x is the query sample, and κ(x − x_i) is the kernel density of the query sample x.
Substituting a concrete unlabeled sample x_j for the sample variable x in the above formula, the posterior probability of the unlabeled sample x_j is given by formula (17):

$$\hat{P}(C_k \mid x_j) = \frac{\sum_{i=1}^{n} P(C_k \mid x_i)\,\kappa(x_j - x_i)}{\sum_{i=1}^{n} \kappa(x_j - x_i)} \qquad (17)$$

where K is the number of labeled sample categories, n is the total number of samples, x_i is a labeled sample, x_j is an unlabeled sample, the left-hand side is the estimated posterior probability that the unlabeled sample x_j belongs to category C_k, P(C_k | x_i) is the initialized posterior probability that the given sample x_i belongs to category C_k, and κ(x_j − x_i) is the kernel density of the unlabeled sample x_j.
This estimation algorithm incorporates the information of both the labeled and the unlabeled samples, which greatly improves the accuracy of the kernel density estimation. Referring to Fig. 6, labeling each unlabeled sample with a category using the semi-supervised kernel density estimation algorithm comprises the following steps:
S1041, initializing the posterior probabilities P(C_k | x_j) of the labeled samples;
In this step the following formula can be adopted:

$$P(C_k \mid x_j) = \frac{l_k}{\sum_{k=1}^{K} l_k}, \quad j \in L \qquad (18)$$

where l_k is the number of samples labeled with category k, K is the number of labeled categories over all samples, C_k is the sample set of the k-th category, and j, k are natural numbers.
S1042, computing the kernel density κ(x_j − x_i) of the samples;
The kernel density can be computed in various ways in this step. For example, with a Gaussian kernel the following formula can be used:

$$\kappa(x_j - x_i) = \frac{1}{(2\pi)^{d/2}\,\sigma^{d}} \exp\!\big(-\|x_j - x_i\|^2 / 2\sigma^2\big) \qquad (19)$$

where d = 1 is taken, x_i is a labeled sample, x_j is an unlabeled sample, exp denotes the exponential function with base e, and σ is the standard deviation of all samples.
With an exponential kernel, the following formula is used:

$$\kappa(x_j - x_i) = \frac{1}{(2\sigma)^{d}} \exp\!\big(-\|x_j - x_i\| / \sigma\big) \qquad (20)$$

where d = 1 is taken, x_i is a labeled sample, x_j is an unlabeled sample, exp denotes the exponential function with base e, and σ is the standard deviation of all samples. Both kernels are sketched below.
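Both kernels follow directly from formulas (19) and (20) with d = 1; a minimal sketch:

```python
import numpy as np

def gaussian_kernel(xj, xi, sigma):
    """Formula (19) with d = 1."""
    dist = np.linalg.norm(xj - xi)
    return np.exp(-dist ** 2 / (2.0 * sigma ** 2)) / (np.sqrt(2.0 * np.pi) * sigma)

def exponential_kernel(xj, xi, sigma):
    """Formula (20) with d = 1."""
    dist = np.linalg.norm(xj - xi)
    return np.exp(-dist / sigma) / (2.0 * sigma)
```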
S1043, computing the posterior probability of each unlabeled sample x_j;
This step uses the above formula (17); see the explanation above, which is not repeated here.
S1044, determining the category of each unlabeled sample;
Specifically, this step compares the posterior probability values of the unlabeled sample x_j, takes the category corresponding to the maximum value as the category of that sample, and labels the key frame corresponding to sample x_j with that category. An end-to-end sketch of steps S1041 to S1044 follows.
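An end-to-end sketch of steps S1041 to S1044 under stated assumptions: labeled samples are initialized one-hot on their own class (formula (18) as printed assigns every labeled sample the class-frequency prior; the one-hot variant is a common simplification and is assumed here), the kernel is the Gaussian of formula (19), whose normalizing constant cancels in the ratio of formula (17), and a single pass produces the posteriors; all names are illustrative:

```python
import numpy as np

def semi_supervised_kde_label(X, y, K, sigma=None):
    """X: (n, d) key-frame features with the labeled samples first;
    y: length-l class indices in [0, K); returns labels for the n - l rest."""
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    n, l = len(X), len(y)
    if sigma is None:
        sigma = float(X.std()) or 1.0          # bandwidth: std of all samples
    P = np.zeros((n, K))
    P[np.arange(l), y] = 1.0                   # S1041: init labeled posteriors
    # S1042: Gaussian kernel between each unlabeled sample and all samples
    d2 = ((X[l:, None, :] - X[None, :, :]) ** 2).sum(-1)
    kappa = np.exp(-d2 / (2.0 * sigma ** 2))
    # S1043: posterior of each unlabeled sample, formula (17)
    post = (kappa @ P) / kappa.sum(axis=1, keepdims=True)
    # S1044: the class with the maximum posterior labels the key frame
    return post.argmax(axis=1)
```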
Fig. 7 shows the module structure diagram of a video labeling device provided by an embodiment of the invention. In the figure, the device comprises a shot segmentation module 10, a key frame set extraction module 20, a feature extraction module 30, a semi-supervised kernel density estimation module 40 and a sample labeling module 50, wherein:
the shot segmentation module 10 is used for segmenting the video into shots;
the key frame set extraction module 20 is used for extracting the key frame set in each shot;
the feature extraction module 30 is used for extracting the relevant low-level feature vectors of each key frame set;
the semi-supervised kernel density estimation module 40 is used for labeling each unlabeled sample with a category using the semi-supervised kernel density estimation algorithm;
the sample labeling module 50 is used for labeling the key frames corresponding to the samples with their categories.
Preferably, the feature extraction module 30 is specifically used for extracting any combination of the color histogram, color moments, edge distribution histogram and texture features of each key frame set.
Specifically, the semi-supervised kernel density estimation module 40 comprises a first computing unit 401, a second computing unit 402, a third computing unit 403 and a determining unit 404, wherein:
the first computing unit 401 is used for initializing the posterior probabilities of the labeled samples;
the second computing unit 402 is used for computing the kernel density of the samples;
the third computing unit 403 is used for computing the posterior probabilities of the unlabeled samples;
the determining unit 404 is used for determining the category of each unlabeled sample.
Preferably, the first computing unit 401 is specifically used for initializing the posterior probabilities of the labeled samples with the above formula (18); the second computing unit 402 is specifically used for computing the kernel density of the samples with the above formula (19) or (20); the third computing unit 403 specifically uses formula (17) to compute the posterior probabilities of the unlabeled samples; and the determining unit 404 is specifically used for comparing the posterior probability values of each unlabeled sample and taking the category corresponding to the maximum value as the category of that sample.
It should be noted that the technical features of the above method embodiments apply equally to this embodiment and are not repeated here.
According to the embodiments of the invention, key frames are represented by feature vectors that combine multiple low-level image features, which reduces the loss of image information; the semi-supervised kernel density estimation algorithm labels each unlabeled sample with a category, introducing unlabeled data into the kernel density estimation and exploiting the feature information of both labeled and unlabeled samples. This improves the efficiency of video labeling and the accuracy of the kernel density estimation, and is especially suitable for large-scale video labeling.
The preferred embodiments of the invention have been described above with reference to the drawings, without thereby limiting the scope of the invention. Those skilled in the art can implement the invention through various variants without departing from its scope and spirit; for example, a feature of one embodiment can be used in another embodiment to obtain yet another embodiment. Any modification, equivalent replacement or improvement made within the technical conception of the invention shall fall within its scope.

Claims (10)

1. A video labeling method, characterized in that the method comprises the following steps:
segmenting a video into shots;
extracting the key frame set in each of said segmented shots;
extracting the relevant low-level feature vectors of each of said key frame sets;
labeling each unlabeled sample with a category using a semi-supervised kernel density estimation algorithm;
labeling the key frames corresponding to the unlabeled samples with their categories.
2. The video labeling method according to claim 1, characterized in that segmenting the video into shots comprises the following steps:
if said video is compressed, decoding it to obtain the original frames;
if the color space of the images is not the HSV color space, converting the RGB color space of the images to the HSV color space;
performing shot segmentation with a pixel-domain shot detection method.
3. The video labeling method according to claim 1, characterized in that extracting the key frames in each of said segmented shots comprises the following step:
computing the frame distance between all adjacent frames within the same shot, and selecting as key frames all frames whose distance difference from the previous frame exceeds an adaptive threshold.
4. The video labeling method according to claim 1, characterized in that said relevant low-level feature vectors comprise: color histogram, color moments, edge distribution histogram and/or Tamura texture features.
5. The video labeling method according to claim 1, characterized in that labeling each unlabeled sample with a category using the semi-supervised kernel density estimation algorithm comprises the following steps:
initializing the posterior probabilities of the labeled samples;
computing the kernel density of the samples;
computing the posterior probabilities of the unlabeled samples;
determining the category of each said unlabeled sample.
6. The video labeling method according to claim 5, characterized in that said posterior probabilities of the labeled samples are initialized with the following formula:

$$P(C_k \mid x_j) = \frac{l_k}{\sum_{k=1}^{K} l_k}, \quad j \in L$$

where j and k are natural numbers, l_k is the number of samples labeled with category k, K is the number of labeled categories over all samples, C_k is the sample set of the k-th category, and P(C_k | x_j) is the initialized posterior probability that the given sample x_j belongs to category C_k.
7. The video labeling method according to claim 5, characterized in that said posterior probability of an unlabeled sample is computed with the following formula:

$$\hat{P}(C_k \mid x_j) = \frac{\sum_{i=1}^{n} P(C_k \mid x_i)\,\kappa(x_j - x_i)}{\sum_{i=1}^{n} \kappa(x_j - x_i)}$$

where K is the number of labeled sample categories, n is the total number of samples, x_i is a labeled sample, x_j is an unlabeled sample, the left-hand side is the estimated posterior probability that the unlabeled sample x_j belongs to category C_k, P(C_k | x_i) is the initialized posterior probability that the given sample x_i belongs to category C_k, and κ(x_j − x_i) is the kernel density of the unlabeled sample x_j;
correspondingly,
said determining the category of the unlabeled sample is: selecting the category corresponding to the maximum posterior probability of the unlabeled sample as the category of that sample.
8. A video labeling device, characterized in that the device comprises a shot segmentation module, a key frame set extraction module, a feature extraction module, a semi-supervised kernel density estimation module and a sample labeling module, wherein:
the shot segmentation module is used for segmenting the video into shots;
the key frame set extraction module is used for extracting the key frame set in each of said segmented shots;
the feature extraction module is used for extracting the relevant low-level feature vectors of each of said key frame sets;
the semi-supervised kernel density estimation module is used for labeling each unlabeled sample with a category using the semi-supervised kernel density estimation algorithm;
the sample labeling module is used for labeling the key frames corresponding to the unlabeled samples with their categories.
9. The video labeling device according to claim 8, characterized in that said feature extraction module is specifically used for extracting the color histogram, color moments, edge distribution histogram and/or texture features of each of said key frame sets.
10. The video labeling device according to claim 8, characterized in that said semi-supervised kernel density estimation module comprises a first computing unit, a second computing unit, a third computing unit and a determining unit, wherein:
the first computing unit is used for initializing the posterior probabilities of the labeled samples;
the second computing unit is used for computing the kernel density of the samples;
the third computing unit is used for computing the posterior probabilities of the unlabeled samples;
the determining unit is used for determining the category of each said unlabeled sample.
CN201210566985.5A 2012-12-24 2012-12-24 Method for video labeling and device for video labeling Expired - Fee Related CN103065300B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210566985.5A CN103065300B (en) 2012-12-24 2012-12-24 Method for video labeling and device for video labeling

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201210566985.5A CN103065300B (en) 2012-12-24 2012-12-24 Method for video labeling and device for video labeling

Publications (2)

Publication Number Publication Date
CN103065300A (en) 2013-04-24
CN103065300B CN103065300B (en) 2015-03-25

Family

ID=48107917

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210566985.5A Expired - Fee Related CN103065300B (en) 2012-12-24 2012-12-24 Method for video labeling and device for video labeling

Country Status (1)

Country Link
CN (1) CN103065300B (en)


Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1997114A (en) * 2006-09-14 2007-07-11 浙江大学 A video object mask method based on the profile space and time feature
CN101141633A (en) * 2007-08-28 2008-03-12 湖南大学 Moving object detecting and tracing method in complex scene

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
游前慧: "Application of a kernel-density-based semi-supervised learning algorithm in video semantic annotation", China Excellent Master's Theses (《中国优秀硕士论文》), 30 June 2008 (2008-06-30) *

Cited By (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103475935A (en) * 2013-09-06 2013-12-25 北京锐安科技有限公司 Method and device for retrieving video segments
CN106339655A (en) * 2015-07-06 2017-01-18 无锡天脉聚源传媒科技有限公司 Video shot marking method and device
CN106603916A (en) * 2016-12-14 2017-04-26 天脉聚源(北京)科技有限公司 Key frame detection method and device
CN106649855A (en) * 2016-12-30 2017-05-10 中广热点云科技有限公司 Video label adding method and adding system
CN106649855B (en) * 2016-12-30 2019-06-21 中广热点云科技有限公司 A kind of adding method and add-on system of video tab
CN106919652A (en) * 2017-01-20 2017-07-04 东北石油大学 Short-sighted frequency automatic marking method and system based on multi-source various visual angles transductive learning
CN107133569A (en) * 2017-04-06 2017-09-05 同济大学 The many granularity mask methods of monitor video based on extensive Multi-label learning
WO2018187917A1 (en) * 2017-04-10 2018-10-18 深圳市柔宇科技有限公司 Method and device for assessing picture quality
CN109829467A (en) * 2017-11-23 2019-05-31 财团法人资讯工业策进会 Image labeling method, electronic device and non-transient computer-readable storage medium
CN108235116A (en) * 2017-12-27 2018-06-29 北京市商汤科技开发有限公司 Feature propagation method and device, electronic equipment, program and medium
CN108235116B (en) * 2017-12-27 2020-06-16 北京市商汤科技开发有限公司 Feature propagation method and apparatus, electronic device, and medium
WO2020052270A1 (en) * 2018-09-14 2020-03-19 华为技术有限公司 Video review method and apparatus, and device
CN110913243A (en) * 2018-09-14 2020-03-24 华为技术有限公司 Video auditing method, device and equipment
CN110263645A (en) * 2019-05-21 2019-09-20 新华智云科技有限公司 A kind of method and system judged for team's attacking and defending in section of football match video
CN110865756A (en) * 2019-11-12 2020-03-06 苏州智加科技有限公司 Image labeling method, device, equipment and storage medium
CN113344932A (en) * 2021-06-01 2021-09-03 电子科技大学 Semi-supervised single-target video segmentation method
CN113344932B (en) * 2021-06-01 2022-05-03 电子科技大学 Semi-supervised single-target video segmentation method
CN113506610A (en) * 2021-07-08 2021-10-15 联仁健康医疗大数据科技股份有限公司 Method and device for generating annotation specification, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN103065300B (en) 2015-03-25

Similar Documents

Publication Publication Date Title
CN103065300B (en) Method for video labeling and device for video labeling
Jia et al. Category-independent object-level saliency detection
Scharfenberger et al. Statistical textural distinctiveness for salient region detection in natural images
Li et al. Amodal instance segmentation
CN104508682B (en) Key frame is identified using the openness analysis of group
Cheng et al. HFS: Hierarchical feature selection for efficient image segmentation
Varnousfaderani et al. Weighted color and texture sample selection for image matting
US9626585B2 (en) Composition modeling for photo retrieval through geometric image segmentation
CN102968635B (en) Image visual characteristic extraction method based on sparse coding
Aksac et al. Complex networks driven salient region detection based on superpixel segmentation
Hu et al. Robust subspace analysis for detecting visual attention regions in images
CN103400386A (en) Interactive image processing method used for video
Zeeshan et al. A newly developed ground truth dataset for visual saliency in videos
Zhong et al. Background subtraction driven seeds selection for moving objects segmentation and matting
Shao et al. A Wavelet Based Local Descriptor for Human Action Recognition.
Manzanera Local jet feature space framework for image processing and representation
Kim et al. Non-parametric human segmentation using support vector machine
Zhou et al. Modeling perspective effects in photographic composition
Zhou et al. Depth-guided saliency detection via boundary information
Siva et al. Grid seams: A fast superpixel algorithm for real-time applications
Scharfenberger et al. Image saliency detection via multi-scale statistical non-redundancy modeling
Mancas et al. Human attention modelization and data reduction
KR102444172B1 (en) Method and System for Intelligent Mining of Digital Image Big-Data
Affara et al. Large scale asset extraction for urban images
Kushwaha et al. Automatic moving object segmentation methods under varying illumination conditions for video data: comparative study, and an improved method

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20150325

Termination date: 20171224
