CN103065300A - Method for video labeling and device for video labeling - Google Patents

Method for video labeling and device for video labeling

Info

Publication number
CN103065300A
CN103065300A
Authority
CN
China
Prior art keywords
sample
mark
video
classification
key frame
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN2012105669855A
Other languages
Chinese (zh)
Other versions
CN103065300B (en)
Inventor
秦兴德
吴金勇
王一科
王军
钟翔宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Security and Surveillance Technology PRC Inc
Original Assignee
China Security and Surveillance Technology PRC Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Security and Surveillance Technology PRC Inc filed Critical China Security and Surveillance Technology PRC Inc
Priority to CN201210566985.5A priority Critical patent/CN103065300B/en
Publication of CN103065300A publication Critical patent/CN103065300A/en
Application granted granted Critical
Publication of CN103065300B publication Critical patent/CN103065300B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Image Analysis (AREA)

Abstract

The invention discloses a method and a device for video labeling, belonging to the field of video processing. The method comprises: segmenting a video into shots; extracting a key frame set from each segmented shot; extracting the relevant low-level feature vectors of each key frame set; using a semi-supervised kernel density estimation algorithm to label each unlabeled sample with a category; and labeling the key frames corresponding to the unlabeled samples with their categories. Because feature vectors that combine multiple low-level image features are used to represent the key frames, the loss of image information is reduced. The semi-supervised kernel density estimation algorithm labels each unlabeled sample with a category and introduces unlabeled data into the kernel density estimation, so the feature information of both labeled and unlabeled samples is exploited, improving the efficiency of video labeling and the accuracy of the kernel density estimation.

Description

Video labeling method and device
Technical field
The present invention relates to the fields of video processing and machine learning, and in particular to a video labeling method and device.
Background technology
With the development of computer and network technologies, ordinary users come into contact with ever more video data. Video data carries a large amount of useful information, and its content is richer, more intuitive and more vivid than other forms of data. On the one hand, the mass of information contained in rich video data is unmatched by other media; on the other hand, its ever-growing volume, unstructured form and ambiguous content put obstacles in the way of convenient user interaction and limit its usefulness.
To mine the potential value in large-scale video collections, users need to retrieve the desired video segments effectively. Video labeling is a technique that associates text with the semantic content of video. It is a good way to narrow the semantic gap and an intermediate step usable for video retrieval, allowing users to retrieve video by key frames or by entering semantic information.
In practice, labeling a large amount of video is rather difficult. First, labeling a video set manually costs a great deal of time and effort, and users often lack the patience to finish labeling an entire sample set. Second, it is very hard to extract low-level feature vectors that express the semantics of video content.
How to obtain good labeling performance from as few manually labeled samples as possible, combined with low-level features of various forms, has become the key issue in video labeling. Machine learning theory is relatively mature and can provide a theoretical basis and various candidate solutions for video labeling, so it is generally considered the more suitable approach to the problem. Current research on video labeling focuses mainly on how to use learning methods, combined with the characteristics of video, to improve labeling accuracy. However, many machine learning methods, such as Support Vector Machines (SVM), Bayesian classifiers and Random Forests, only consider the information in the labeled sample set, while the large amount of information contained in unlabeled samples is wasted.
Summary of the invention
To overcome two shortcomings of the prior art, namely that only a small amount of labeled data is available against a large amount of unlabeled data, and that a single feature carries little expressive information, the invention provides a video labeling method and device. Unlabeled data is introduced into the kernel density estimation, the feature information of both labeled and unlabeled samples is exploited, and the efficiency of video labeling and the accuracy of the kernel density estimation are improved.
The technical solution the present invention adopts to solve the above technical problem is as follows:
According to one aspect of the present invention, a video labeling method is provided, comprising the following steps:
segmenting the video into shots;
extracting the key frame set in each segmented shot;
extracting the relevant low-level feature vectors of each key frame set;
labeling each unlabeled sample with a category using a semi-supervised kernel density estimation algorithm;
labeling the key frames corresponding to the unlabeled samples with their categories.
Preferably, segmenting the video into shots comprises the following steps:
if the video is compressed, decoding it to obtain the original frames;
if the color space of the images is not the HSV color space, converting the RGB color space of the images to the HSV color space;
performing shot segmentation with a pixel-domain shot detection method.
Preferably, extracting the key frames in each segmented shot comprises the following step:
computing the frame distance between all adjacent frames within the same shot, and selecting as key frames all frames whose distance difference from the previous frame exceeds an adaptive threshold.
Preferably, the relevant low-level feature vectors include: color histogram, color moments, edge distribution histogram and/or Tamura texture features.
Preferably, labeling each unlabeled sample with a category using the semi-supervised kernel density estimation algorithm comprises the following steps:
initializing the posterior probabilities of the labeled samples;
computing the kernel density of the samples;
computing the posterior probabilities of the unlabeled samples;
determining the category of each unlabeled sample.
Preferably, the posterior probabilities of the labeled samples are initialized with the following formula:

$$P(C_k \mid x_j) = \frac{l_k}{\sum_{k=1}^{K} l_k}, \quad j \in L$$

where j and k are natural numbers, l_k is the number of samples labeled with category k, K is the number of labeled categories over all samples, C_k is the sample set of the k-th category, and P(C_k | x_j) is the initialized posterior probability that the given sample x_j belongs to category C_k.
Preferably, the posterior probability of an unlabeled sample is computed with the following formula:

$$\hat{P}(C_k \mid x_j) = \frac{\sum_{i=1}^{n} P(C_k \mid x_i)\,\kappa(x_j - x_i)}{\sum_{i=1}^{n} \kappa(x_j - x_i)}$$

where K is the number of labeled sample categories, n is the total number of samples, x_i is a labeled sample, x_j is an unlabeled sample, the left-hand side is the estimated posterior probability that the unlabeled sample x_j belongs to category C_k, P(C_k | x_i) is the initialized posterior probability that the given sample x_i belongs to category C_k, and κ(x_j − x_i) is the kernel density of the unlabeled sample x_j;
correspondingly,
determining the category of an unlabeled sample is: selecting the category corresponding to the maximum posterior probability of the unlabeled sample as the category of that sample.
According to another aspect of the present invention, a video labeling device is provided, comprising a shot segmentation module, a key frame set extraction module, a feature extraction module, a semi-supervised kernel density estimation module and a sample labeling module, wherein:
the shot segmentation module is used for segmenting the video into shots;
the key frame set extraction module is used for extracting the key frame set in each segmented shot;
the feature extraction module is used for extracting the relevant low-level feature vectors of each key frame set;
the semi-supervised kernel density estimation module is used for labeling each unlabeled sample with a category using the semi-supervised kernel density estimation algorithm;
the sample labeling module is used for labeling the key frames corresponding to the unlabeled samples with their categories.
Preferably, the feature extraction module is specifically used for extracting the color histogram, color moments, edge distribution histogram and/or texture features of each key frame set.
Preferably, the semi-supervised kernel density estimation module comprises a first computing unit, a second computing unit, a third computing unit and a determining unit, wherein:
the first computing unit is used for initializing the posterior probabilities of the labeled samples;
the second computing unit is used for computing the kernel density of the samples;
the third computing unit is used for computing the posterior probabilities of the unlabeled samples;
the determining unit is used for determining the category of each unlabeled sample.
According to the embodiments of the invention, key frames are represented by feature vectors that combine multiple low-level image features, which reduces the loss of image information; the semi-supervised kernel density estimation algorithm labels each unlabeled sample with a category, introducing unlabeled data into the kernel density estimation and exploiting the feature information of both labeled and unlabeled samples. This improves the efficiency of video labeling and the accuracy of the kernel density estimation, and is especially suitable for large-scale video labeling.
Description of drawings
Fig. 1 is a flowchart of a video labeling method provided by an embodiment of the invention;
Fig. 2 is a flowchart of a shot segmentation method provided by a preferred embodiment of the invention;
Fig. 3 is a flowchart of a key frame set extraction method provided by a preferred embodiment of the invention;
Fig. 4 is a flowchart of a method for extracting the feature vectors of key frames, provided by a preferred embodiment of the invention;
Fig. 5 is a schematic diagram of the image region division provided by a preferred embodiment of the invention;
Fig. 6 is a flowchart of a method, provided by a preferred embodiment of the invention, for labeling each unlabeled sample with a category using a semi-supervised kernel density estimation algorithm;
Fig. 7 is a module structure diagram of a video labeling device provided by an embodiment of the invention.
Embodiment
To make the technical problem to be solved by the invention, the technical solution and the beneficial effects clearer, the invention is further elaborated below in conjunction with the drawings and embodiments. It should be understood that the specific embodiments described here only serve to explain the invention and are not intended to limit it.
Fig. 1 shows the flowchart of a video labeling method provided by an embodiment of the invention; the method comprises the following steps:
S101, segmenting the video into shots;
S102, extracting the key frame set in each segmented shot;
Specifically, the frames within one shot are usually quite redundant, so frames reflecting the main information content of a shot can be chosen as key frames in order to represent the shot concisely. Once the shots are determined, key frames are extracted from the frames of each shot. The basic algorithm includes, but is not limited to: computing the frame distance between all adjacent frames within the same shot and taking each frame whose distance difference from the previous key frame exceeds a threshold as a key frame; if the distance difference is below the distance threshold, the search continues, until the adjacent-frame distance differences of the selected key frame set all exceed the threshold. A minimal sketch of this selection rule follows.
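As an illustration only, the selection rule can be sketched as follows; frame_distance stands for any frame distance measure (for instance the χ² histogram difference described below), and all names are illustrative rather than the patent's implementation:

```python
def select_keyframes(frames, frame_distance, tau):
    """frames: list of per-frame feature arrays (e.g. color histograms).
    Returns the indices of the selected key frames."""
    keys = [0]                      # the first frame opens the key frame set
    for j in range(1, len(frames)):
        # a frame far enough from the last key frame starts a new key frame
        if frame_distance(frames[keys[-1]], frames[j]) > tau:
            keys.append(j)
    return keys
```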
S103, extracting the relevant low-level feature vectors of each key frame set;
The relevant low-level feature vectors extracted from the key frame sets include, but are not limited to, the color histogram, color moments, edge distribution histogram and texture.
S104, labeling each unlabeled sample with a category using a semi-supervised kernel density estimation algorithm;
S105, labeling the key frames corresponding to the unlabeled samples with their categories.
The specific implementation of each of the above steps is described in detail below:
Referring to Fig. 2, a shot segmentation method provided by the preferred embodiment of the invention comprises the following steps:
S1011, if the video is compressed, first decoding it to obtain the original frames;
S1012, if the color space of the images is not the HSV color space, converting the RGB color space of the images to the HSV color space. The conversion formulas can be:

$$H = \begin{cases} \arccos\dfrac{(R-G)+(R-B)}{2\sqrt{(R-G)^2+(R-B)(G-B)}} & (B \le G) \\[2ex] 2\pi - \arccos\dfrac{(R-G)+(R-B)}{2\sqrt{(R-G)^2+(R-B)(G-B)}} & (B > G) \end{cases} \qquad (1)$$

$$S = \frac{\max(R,G,B)-\min(R,G,B)}{\max(R,G,B)} \qquad (2)$$

$$V = \frac{\max(R,G,B)}{255} \qquad (3)$$

where R, G and B denote red, green and blue, and H, S and V denote hue, saturation and brightness (value).
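As an illustration only, the conversion of formulas (1) to (3) can be sketched for a single pixel as follows; the guard for the achromatic (zero-denominator) case is an added assumption:

```python
import numpy as np

def rgb_to_hsv_pixel(r, g, b):
    """Convert one 8-bit RGB pixel to (H in radians, S and V in [0, 1])."""
    r, g, b = float(r), float(g), float(b)
    mx, mn = max(r, g, b), min(r, g, b)
    denom = 2.0 * np.sqrt((r - g) ** 2 + (r - b) * (g - b))
    if denom == 0.0:                 # gray pixel: hue is undefined, use 0
        h = 0.0
    else:                            # formula (1)
        theta = np.arccos(np.clip(((r - g) + (r - b)) / denom, -1.0, 1.0))
        h = theta if b <= g else 2.0 * np.pi - theta
    s = 0.0 if mx == 0.0 else (mx - mn) / mx      # formula (2)
    v = mx / 255.0                                 # formula (3)
    return h, s, v
```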
S1013, performing shot segmentation with a pixel-domain shot detection method.
In this step, the χ² histogram method can be used for video shot segmentation: χ² is compared with a given threshold τ, and if χ² > τ, a shot boundary exists. The χ² histogram is computed as:

$$\chi^2 = \sum_{i=1}^{k} \frac{\big(H_1(i)-H_2(i)\big)^2}{H_2(i)} \qquad (4)$$

where k is the total number of color levels, H_1(i) and H_2(i) are the i-th level of the color histograms of the two frame images, and the threshold τ is determined from the mean of the χ² values over all adjacent frames of the video.
Of course, other pixel-domain shot detection methods can also be adopted in this embodiment, such as template matching, edge-rate-based methods and model-based methods. The χ² variant is sketched below.
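A minimal sketch of the χ² variant, assuming OpenCV (cv2) is available and using a hue-only histogram as a stand-in for the 72-color quantization described later; the ε guard against empty bins is an added assumption:

```python
import cv2
import numpy as np

def chi_square(h1, h2, eps=1e-10):
    """Chi-square difference of two normalized histograms, formula (4)."""
    return float(np.sum((h1 - h2) ** 2 / (h2 + eps)))

def detect_shot_boundaries(path, bins=72):
    """Returns the frame indices at which a shot boundary is declared."""
    cap = cv2.VideoCapture(path)
    hists = []
    ok, frame = cap.read()
    while ok:
        hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
        h = cv2.calcHist([hsv], [0], None, [bins], [0, 180]).ravel()
        hists.append(h / (h.sum() + 1e-10))
        ok, frame = cap.read()
    cap.release()
    diffs = [chi_square(hists[i], hists[i + 1]) for i in range(len(hists) - 1)]
    tau = float(np.mean(diffs))   # threshold: mean chi-square over adjacent frames
    return [i + 1 for i, d in enumerate(diffs) if d > tau]
```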
Referring to Fig. 3, a key frame set extraction method provided by the preferred embodiment of the invention comprises the following steps:
S1021, taking the initial frame as the initial key frame;
Specifically, the first frame of the video shot is read and taken as the initial key frame f_1 of the shot.
S1022, computing the frame difference between the subsequent frame and the key frame;
Specifically, the difference between a subsequent frame f_j and the current key frame can be computed according to a similarity measurement method;
S1023, judging whether the frame difference exceeds an adaptive threshold τ; if so, executing step S1024, otherwise returning to step S1022;
S1024, taking f_j as the new key frame;
S1025, outputting f_j to the key frame set;
S1026, judging whether the shot has ended; if it has not, returning to step S1022, otherwise executing step S1027;
S1027, ending the flow.
The adaptive threshold τ can be determined as follows:

$$\delta^2 = \frac{1}{T}\left(\frac{1}{T}\sum_{i=1}^{T}\Big[s_i - \frac{1}{T}\sum_{i=1}^{T}s_i\Big]^2\right) + \frac{1}{M-T-1}\left(\frac{1}{M-T-1}\sum_{i=T+1}^{M}\Big[s_i - \frac{1}{M-T-1}\sum_{i=T+1}^{M}s_i\Big]^2\right) \qquad (5)$$

where s_i is any element of the one-dimensional array storing all frame difference results and M is the size of the array. Every split position T of the array elements is tried, δ² is computed for every split point, and the minimum δ² is found; the array element s_T corresponding to that split is the adaptive threshold τ. A sketch of this search follows.
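A sketch of this threshold search under stated assumptions: the frame-difference array is sorted before splitting, and each group's term follows formula (5) literally; names are illustrative:

```python
import numpy as np

def adaptive_threshold(s):
    """s: 1-D array of frame-difference values; returns the threshold tau."""
    s = np.sort(np.asarray(s, dtype=float))   # assumed: split points in order
    M = len(s)
    best, tau = np.inf, s[-1]
    for T in range(1, M - 1):                 # keep both groups non-empty
        left, right = s[:T], s[T:]
        # formula (5): doubly weighted within-group scatter of each partition
        var_l = np.sum((left - left.mean()) ** 2) / T ** 2
        var_r = np.sum((right - right.mean()) ** 2) / (M - T - 1) ** 2
        if var_l + var_r < best:
            best, tau = var_l + var_r, s[T]
    return tau
```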
Referring to Fig. 4, extracting the relevant low-level feature vectors of each key frame set, as provided by the preferred embodiment of the invention, comprises the following steps:
S1031, layering the HSV color space of the image.
The color space can be layered in various ways. In this step, as an example, the hue H of the HSV color space is divided into 8 levels and the saturation S and the brightness V into 3 levels each, giving 72 colors in total. The layering formulas are:

$$H = \begin{cases} 0 & h \in [316, 20] \\ 1 & h \in [21, 40] \\ 2 & h \in [41, 75] \\ 3 & h \in [76, 155] \\ 4 & h \in [156, 190] \\ 5 & h \in [191, 270] \\ 6 & h \in [271, 295] \\ 7 & h \in [296, 315] \end{cases} \qquad (6)$$

$$S = \begin{cases} 0 & s \in [0, 0.2] \\ 1 & s \in [0.2, 0.7] \\ 2 & s \in [0.7, 1] \end{cases} \qquad (7)$$

$$V = \begin{cases} 0 & v \in [0, 0.2] \\ 1 & v \in [0.2, 0.7] \\ 2 & v \in [0.7, 1] \end{cases} \qquad (8)$$

(The range [316, 20] for H = 0 wraps around 360 degrees.) According to the above method the color space is divided into 72 colors.
S1032, extracting the color histogram of the image.
In this step, an N-dimensional color histogram is extracted over the N colors obtained by layering the HSV color space of the image, where N is a natural number (here N = 72). A sketch follows.
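A minimal sketch of steps S1031 and S1032, assuming h is given in degrees in [0, 360) and s, v in [0, 1]; the packing 9H + 3S + V of the 72 bin codes is an illustrative choice, since the text does not fix the bin ordering:

```python
import numpy as np

def quantize_h(h):                       # formula (6), h in degrees
    if h >= 316 or h <= 20: return 0
    if h <= 40:  return 1
    if h <= 75:  return 2
    if h <= 155: return 3
    if h <= 190: return 4
    if h <= 270: return 5
    if h <= 295: return 6
    return 7                             # 296..315

def quantize_sv(x):                      # formulas (7) and (8)
    if x <= 0.2: return 0
    if x <= 0.7: return 1
    return 2

def color_histogram_72(h_arr, s_arr, v_arr):
    """72-bin normalized color histogram of an HSV image (float arrays)."""
    hist = np.zeros(72)
    for h, s, v in zip(h_arr.ravel(), s_arr.ravel(), v_arr.ravel()):
        hist[9 * quantize_h(h) + 3 * quantize_sv(s) + quantize_sv(v)] += 1
    return hist / hist.sum()
```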
S1033, dividing the image into regions.
Referring to Fig. 5, in this embodiment the image can be divided into 3×3 regions; of course, other division schemes can also be adopted.
S1034, extracting the color moments of the image.
In the 3×3 regions of the divided image (Fig. 5), the first moment (mean u), second moment (standard deviation σ) and third moment (skewness s) of the color are extracted for each region, 81 color moment dimensions in total. The extraction formulas are:

$$u_i = \frac{1}{N}\sum_{j=1}^{N} p_{ij} \qquad (9)$$

$$\sigma_i = \left(\frac{1}{N}\sum_{j=1}^{N}(p_{ij}-u_i)^2\right)^{1/2} \qquad (10)$$

$$s_i = \left(\frac{1}{N}\sum_{j=1}^{N}(p_{ij}-u_i)^3\right)^{1/3} \qquad (11)$$

where N is the total number of pixels of the i-th region and p_ij is the j-th pixel value. A sketch follows.
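A minimal sketch of the 81-dimensional color moment extraction, assuming the three moments are taken per HSV channel in each of the nine cells (9 cells x 3 channels x 3 moments = 81) and that the cube root of formula (11) preserves the sign of the skew:

```python
import numpy as np

def color_moments_3x3(hsv):
    """hsv: float array of shape (H, W, 3); returns a length-81 vector."""
    H, W, _ = hsv.shape
    feats = []
    for r in range(3):
        for c in range(3):
            cell = hsv[r * H // 3:(r + 1) * H // 3,
                       c * W // 3:(c + 1) * W // 3]
            for ch in range(3):
                p = cell[..., ch].ravel()
                u = p.mean()                              # formula (9)
                sigma = np.sqrt(np.mean((p - u) ** 2))    # formula (10)
                skew = np.cbrt(np.mean((p - u) ** 3))     # formula (11)
                feats.extend([u, sigma, skew])
    return np.array(feats)
```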
S1035, performing edge detection on the image and extracting the edge distribution histogram;
The edge distribution histogram mainly captures the statistical distribution of the edges of an image or of a local part of it. It is generally obtained by extracting edge information with a detection algorithm and then accumulating the directionality of the edge distribution over fixed angle intervals.
The Canny operator is generally acknowledged as one of the best image edge detection operators at present. Its advantage is that it uses two different thresholds to detect strong and weak edges: a weak edge is output only when it is connected to a strong edge. In this way the interference of noise with edge detection is reduced without losing weak edge information.
In this embodiment the Canny operator can be used for edge detection. For the image after Canny edge extraction, the edge directions are divided into angle ranges (again using the 3×3 division of Fig. 5), forming an edge distribution histogram of several levels; the resulting 27-dimensional edge distribution histogram is finally normalized:

$$H[i] = H[i] / S \qquad (12)$$

where H[i] is the edge orientation histogram and S is the area of the image. A sketch follows.
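A minimal sketch of step S1035, assuming OpenCV's Canny and Sobel operators and three orientation bins per cell (3 bins x 9 cells = 27); the text fixes only the 27-dimensional total, so the per-cell bin count and the Canny thresholds are assumptions:

```python
import cv2
import numpy as np

def edge_histogram_27(gray):
    """gray: uint8 grayscale image; returns the normalized 27-dim histogram."""
    edges = cv2.Canny(gray, 50, 150)              # strong and weak thresholds
    gx = cv2.Sobel(gray, cv2.CV_32F, 1, 0)
    gy = cv2.Sobel(gray, cv2.CV_32F, 0, 1)
    angle = np.mod(np.arctan2(gy, gx), np.pi)     # edge orientation in [0, pi)
    H, W = gray.shape
    hist = np.zeros(27)
    for r in range(3):
        for c in range(3):
            ys = slice(r * H // 3, (r + 1) * H // 3)
            xs = slice(c * W // 3, (c + 1) * W // 3)
            cell_ang = angle[ys, xs][edges[ys, xs] > 0]   # edge pixels only
            counts, _ = np.histogram(cell_ang, bins=3, range=(0, np.pi))
            hist[(r * 3 + c) * 3:(r * 3 + c) * 3 + 3] = counts
    return hist / (H * W)                         # formula (12): divide by area
```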
S1036, extracting the Tamura texture of the image;
The Tamura texture has six visual properties: coarseness, contrast, directionality, line-likeness, regularity and roughness. Only the first three are used here, since the last three are strongly correlated with the first three. Following the region division of Fig. 5, 27 Tamura texture dimensions are extracted. The computation formulas are:

$$\mathrm{Coarseness} = \frac{1}{mn}\sum_{i=1}^{m}\sum_{j=1}^{n} S_{best}(i,j) \qquad (13)$$

where i, j are the coordinates of a pixel in an image of width m and height n. Let E (computed in both the horizontal and vertical directions) be the mean intensity difference of a pixel, and let (x, y) denote the selected image region; the optimal size S_best that maximizes E is jointly determined by the following formulas:

$$S_{best}(x,y) = 2^{k}, \qquad E_k = E_{max} = \max(E_1, E_2, \ldots, E_h)$$

$$\mathrm{Contrast} = \frac{\sigma}{\alpha_4^{1/4}}, \qquad \alpha_4 = \frac{u_4}{\sigma^4} \qquad (14)$$

where σ is the standard deviation of the image gray levels, α_4 is the kurtosis of the gray values and u_4 is the fourth central moment.

$$\mathrm{Directionality} = \sum_{p} n_p \sum_{\phi \in w_p} (\phi - \phi_p)^2 H_D(\phi) \qquad (15)$$

where φ is the quantized gradient angle, n_p is the number of pixels in each region whose gradient magnitude exceeds a given threshold, H_D(φ) is the histogram constructed from the gradient vectors of all pixels, φ_p is the p-th peak of this histogram, and w_p is the range of quantized values covered by peak p. A sketch of the contrast term follows.
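Coarseness and directionality need several auxiliary passes over the image, so only the contrast of formula (14) is sketched here as a minimal illustration:

```python
import numpy as np

def tamura_contrast(gray):
    """Tamura contrast of formula (14): sigma / alpha_4^(1/4)."""
    g = np.asarray(gray, dtype=float).ravel()
    sigma = g.std()
    if sigma == 0.0:                    # flat image: no contrast
        return 0.0
    u4 = np.mean((g - g.mean()) ** 4)   # fourth central moment
    alpha4 = u4 / sigma ** 4            # kurtosis
    return sigma / alpha4 ** 0.25
```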
It should be noted that there is no fixed execution order among the above steps S1032, S1033 and S1034.
S1037, outputting the feature vector of the image.
The above steps finally yield a 207-dimensional low-level feature vector composed of the color histogram, the color moments, the edge distribution histogram and the Tamura texture (72 + 81 + 27 + 27 dimensions).
Fig. 6 shows the flowchart of the method, provided by the preferred embodiment of the invention, for labeling each unlabeled sample with a category using the semi-supervised kernel density estimation algorithm.
In the key frame set obtained in step S102 above, each key frame is represented by a feature vector, and each feature vector represents a key frame sample x_i. Suppose the labeled samples fall into K classes, with l labeled samples L = {x_1, x_2, ..., x_l} and u unlabeled samples U = {x_{l+1}, ..., x_{l+u}}, and n = l + u. This embodiment uses an extended kernel density probability estimate, formula (16):

$$\hat{p}(x \mid C_k) = \frac{\sum_{i=1}^{n} P(C_k \mid x_i)\,\kappa(x - x_i)}{\sum_{i=1}^{n} P(C_k \mid x_i)} \qquad (16)$$

where the left-hand side denotes the probability of sample x under category C_k, P(C_k | x_i) is the initialized posterior probability that sample x_i belongs to category C_k, K is the number of labeled sample categories, n is the total number of samples, x_i is a sample, x is the query sample, and κ(x − x_i) is the kernel density of the query sample x.
Substituting a concrete unlabeled sample x_j for the sample variable x in the above formula, the posterior probability of the unlabeled sample x_j is given by formula (17):

$$\hat{P}(C_k \mid x_j) = \frac{\sum_{i=1}^{n} P(C_k \mid x_i)\,\kappa(x_j - x_i)}{\sum_{i=1}^{n} \kappa(x_j - x_i)} \qquad (17)$$

where K is the number of labeled sample categories, n is the total number of samples, x_i is a labeled sample, x_j is an unlabeled sample, the left-hand side is the estimated posterior probability that the unlabeled sample x_j belongs to category C_k, P(C_k | x_i) is the initialized posterior probability that the given sample x_i belongs to category C_k, and κ(x_j − x_i) is the kernel density of the unlabeled sample x_j.
This estimation algorithm incorporates the information of both the labeled and the unlabeled samples, which greatly improves the accuracy of the kernel density estimation. Referring to Fig. 6, labeling each unlabeled sample with a category using the semi-supervised kernel density estimation algorithm comprises the following steps:
S1041, initializing the posterior probabilities P(C_k | x_j) of the labeled samples;
In this step the following formula can be adopted:

$$P(C_k \mid x_j) = \frac{l_k}{\sum_{k=1}^{K} l_k}, \quad j \in L \qquad (18)$$

where l_k is the number of samples labeled with category k, K is the number of labeled categories over all samples, C_k is the sample set of the k-th category, and j, k are natural numbers.
S1042, computing the kernel density κ(x_j − x_i) of the samples;
The kernel density can be computed in various ways in this step. For example, with a Gaussian kernel the following formula can be used:

$$\kappa(x_j - x_i) = \frac{1}{(2\pi)^{d/2}\,\sigma^{d}} \exp\!\big(-\|x_j - x_i\|^2 / 2\sigma^2\big) \qquad (19)$$

where d = 1 is taken, x_i is a labeled sample, x_j is an unlabeled sample, exp denotes the exponential function with base e, and σ is the standard deviation of all samples.
With an exponential kernel, the following formula is used:

$$\kappa(x_j - x_i) = \frac{1}{(2\sigma)^{d}} \exp\!\big(-\|x_j - x_i\| / \sigma\big) \qquad (20)$$

where d = 1 is taken, x_i is a labeled sample, x_j is an unlabeled sample, exp denotes the exponential function with base e, and σ is the standard deviation of all samples. Both kernels are sketched below.
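Both kernels follow directly from formulas (19) and (20) with d = 1; a minimal sketch:

```python
import numpy as np

def gaussian_kernel(xj, xi, sigma):
    """Formula (19) with d = 1."""
    dist = np.linalg.norm(xj - xi)
    return np.exp(-dist ** 2 / (2.0 * sigma ** 2)) / (np.sqrt(2.0 * np.pi) * sigma)

def exponential_kernel(xj, xi, sigma):
    """Formula (20) with d = 1."""
    dist = np.linalg.norm(xj - xi)
    return np.exp(-dist / sigma) / (2.0 * sigma)
```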
S1043, computing the posterior probability of each unlabeled sample x_j;
This step uses the above formula (17); see the explanation above, which is not repeated here.
S1044, determining the category of each unlabeled sample;
Specifically, this step compares the posterior probability values of the unlabeled sample x_j, takes the category corresponding to the maximum value as the category of that sample, and labels the key frame corresponding to sample x_j with that category. An end-to-end sketch of steps S1041 to S1044 follows.
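An end-to-end sketch of steps S1041 to S1044 under stated assumptions: labeled samples are initialized one-hot on their own class (formula (18) as printed assigns every labeled sample the class-frequency prior; the one-hot variant is a common simplification and is assumed here), the kernel is the Gaussian of formula (19), whose normalizing constant cancels in the ratio of formula (17), and a single pass produces the posteriors; all names are illustrative:

```python
import numpy as np

def semi_supervised_kde_label(X, y, K, sigma=None):
    """X: (n, d) key-frame features with the labeled samples first;
    y: length-l class indices in [0, K); returns labels for the n - l rest."""
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    n, l = len(X), len(y)
    if sigma is None:
        sigma = float(X.std()) or 1.0          # bandwidth: std of all samples
    P = np.zeros((n, K))
    P[np.arange(l), y] = 1.0                   # S1041: init labeled posteriors
    # S1042: Gaussian kernel between each unlabeled sample and all samples
    d2 = ((X[l:, None, :] - X[None, :, :]) ** 2).sum(-1)
    kappa = np.exp(-d2 / (2.0 * sigma ** 2))
    # S1043: posterior of each unlabeled sample, formula (17)
    post = (kappa @ P) / kappa.sum(axis=1, keepdims=True)
    # S1044: the class with the maximum posterior labels the key frame
    return post.argmax(axis=1)
```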
Fig. 7 shows the module structure diagram of a video labeling device provided by an embodiment of the invention. In the figure, the device comprises a shot segmentation module 10, a key frame set extraction module 20, a feature extraction module 30, a semi-supervised kernel density estimation module 40 and a sample labeling module 50, wherein:
the shot segmentation module 10 is used for segmenting the video into shots;
the key frame set extraction module 20 is used for extracting the key frame set in each shot;
the feature extraction module 30 is used for extracting the relevant low-level feature vectors of each key frame set;
the semi-supervised kernel density estimation module 40 is used for labeling each unlabeled sample with a category using the semi-supervised kernel density estimation algorithm;
the sample labeling module 50 is used for labeling the key frames corresponding to the samples with their categories.
Preferably, the feature extraction module 30 is specifically used for extracting any combination of the color histogram, color moments, edge distribution histogram and texture features of each key frame set.
Specifically, the semi-supervised kernel density estimation module 40 comprises a first computing unit 401, a second computing unit 402, a third computing unit 403 and a determining unit 404, wherein:
the first computing unit 401 is used for initializing the posterior probabilities of the labeled samples;
the second computing unit 402 is used for computing the kernel density of the samples;
the third computing unit 403 is used for computing the posterior probabilities of the unlabeled samples;
the determining unit 404 is used for determining the category of each unlabeled sample.
Preferably, the first computing unit 401 is specifically used for initializing the posterior probabilities of the labeled samples with the above formula (18); the second computing unit 402 is specifically used for computing the kernel density of the samples with the above formula (19) or (20); the third computing unit 403 specifically uses formula (17) to compute the posterior probabilities of the unlabeled samples; and the determining unit 404 is specifically used for comparing the posterior probability values of each unlabeled sample and taking the category corresponding to the maximum value as the category of that sample.
It should be noted that the technical features of the above method embodiments apply equally to this embodiment and are not repeated here.
According to the embodiments of the invention, key frames are represented by feature vectors that combine multiple low-level image features, which reduces the loss of image information; the semi-supervised kernel density estimation algorithm labels each unlabeled sample with a category, introducing unlabeled data into the kernel density estimation and exploiting the feature information of both labeled and unlabeled samples. This improves the efficiency of video labeling and the accuracy of the kernel density estimation, and is especially suitable for large-scale video labeling.
The preferred embodiments of the invention have been described above with reference to the drawings, without thereby limiting the scope of the invention. Those skilled in the art can implement the invention through various variants without departing from its scope and spirit; for example, a feature of one embodiment can be used in another embodiment to obtain yet another embodiment. Any modification, equivalent replacement or improvement made within the technical conception of the invention shall fall within its scope.

Claims (10)

1. A video labeling method, characterized in that the method comprises the following steps:
segmenting a video into shots;
extracting the key frame set in each of said segmented shots;
extracting the relevant low-level feature vectors of each of said key frame sets;
labeling each unlabeled sample with a category using a semi-supervised kernel density estimation algorithm;
labeling the key frames corresponding to the unlabeled samples with their categories.
2. The video labeling method according to claim 1, characterized in that segmenting the video into shots comprises the following steps:
if said video is compressed, decoding it to obtain the original frames;
if the color space of the images is not the HSV color space, converting the RGB color space of the images to the HSV color space;
performing shot segmentation with a pixel-domain shot detection method.
3. The video labeling method according to claim 1, characterized in that extracting the key frames in each of said segmented shots comprises the following step:
computing the frame distance between all adjacent frames within the same shot, and selecting as key frames all frames whose distance difference from the previous frame exceeds an adaptive threshold.
4. The video labeling method according to claim 1, characterized in that said relevant low-level feature vectors comprise: color histogram, color moments, edge distribution histogram and/or Tamura texture features.
5. The video labeling method according to claim 1, characterized in that labeling each unlabeled sample with a category using the semi-supervised kernel density estimation algorithm comprises the following steps:
initializing the posterior probabilities of the labeled samples;
computing the kernel density of the samples;
computing the posterior probabilities of the unlabeled samples;
determining the category of each said unlabeled sample.
6. The video labeling method according to claim 5, characterized in that said posterior probabilities of the labeled samples are initialized with the following formula:

$$P(C_k \mid x_j) = \frac{l_k}{\sum_{k=1}^{K} l_k}, \quad j \in L$$

where j and k are natural numbers, l_k is the number of samples labeled with category k, K is the number of labeled categories over all samples, C_k is the sample set of the k-th category, and P(C_k | x_j) is the initialized posterior probability that the given sample x_j belongs to category C_k.
7. The video labeling method according to claim 5, characterized in that said posterior probability of an unlabeled sample is computed with the following formula:

$$\hat{P}(C_k \mid x_j) = \frac{\sum_{i=1}^{n} P(C_k \mid x_i)\,\kappa(x_j - x_i)}{\sum_{i=1}^{n} \kappa(x_j - x_i)}$$

where K is the number of labeled sample categories, n is the total number of samples, x_i is a labeled sample, x_j is an unlabeled sample, the left-hand side is the estimated posterior probability that the unlabeled sample x_j belongs to category C_k, P(C_k | x_i) is the initialized posterior probability that the given sample x_i belongs to category C_k, and κ(x_j − x_i) is the kernel density of the unlabeled sample x_j;
correspondingly,
said determining the category of the unlabeled sample is: selecting the category corresponding to the maximum posterior probability of the unlabeled sample as the category of that sample.
8. A video labeling device, characterized in that the device comprises a shot segmentation module, a key frame set extraction module, a feature extraction module, a semi-supervised kernel density estimation module and a sample labeling module, wherein:
the shot segmentation module is used for segmenting the video into shots;
the key frame set extraction module is used for extracting the key frame set in each of said segmented shots;
the feature extraction module is used for extracting the relevant low-level feature vectors of each of said key frame sets;
the semi-supervised kernel density estimation module is used for labeling each unlabeled sample with a category using the semi-supervised kernel density estimation algorithm;
the sample labeling module is used for labeling the key frames corresponding to the unlabeled samples with their categories.
9. The video labeling device according to claim 8, characterized in that said feature extraction module is specifically used for extracting the color histogram, color moments, edge distribution histogram and/or texture features of each of said key frame sets.
10. The video labeling device according to claim 8, characterized in that said semi-supervised kernel density estimation module comprises a first computing unit, a second computing unit, a third computing unit and a determining unit, wherein:
the first computing unit is used for initializing the posterior probabilities of the labeled samples;
the second computing unit is used for computing the kernel density of the samples;
the third computing unit is used for computing the posterior probabilities of the unlabeled samples;
the determining unit is used for determining the category of each said unlabeled sample.
CN201210566985.5A 2012-12-24 2012-12-24 Method for video labeling and device for video labeling Expired - Fee Related CN103065300B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210566985.5A CN103065300B (en) 2012-12-24 2012-12-24 Method for video labeling and device for video labeling

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201210566985.5A CN103065300B (en) 2012-12-24 2012-12-24 Method for video labeling and device for video labeling

Publications (2)

Publication Number Publication Date
CN103065300A (en) 2013-04-24
CN103065300B CN103065300B (en) 2015-03-25

Family

ID=48107917

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210566985.5A Expired - Fee Related CN103065300B (en) 2012-12-24 2012-12-24 Method for video labeling and device for video labeling

Country Status (1)

Country Link
CN (1) CN103065300B (en)


Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1997114A (en) * 2006-09-14 2007-07-11 浙江大学 A video object mask method based on the profile space and time feature
CN101141633A (en) * 2007-08-28 2008-03-12 湖南大学 Moving object detecting and tracing method in complex scene

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
游前慧: "Application of a kernel-density-based semi-supervised learning algorithm in video semantic annotation", China Excellent Master's Theses (《中国优秀硕士论文》), 30 June 2008 (2008-06-30) *

Cited By (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103475935A (en) * 2013-09-06 2013-12-25 北京锐安科技有限公司 Method and device for retrieving video segments
CN106339655A (en) * 2015-07-06 2017-01-18 无锡天脉聚源传媒科技有限公司 Video shot marking method and device
CN106603916A (en) * 2016-12-14 2017-04-26 天脉聚源(北京)科技有限公司 Key frame detection method and device
CN106649855A (en) * 2016-12-30 2017-05-10 中广热点云科技有限公司 Video label adding method and adding system
CN106649855B (en) * 2016-12-30 2019-06-21 中广热点云科技有限公司 A kind of adding method and add-on system of video tab
CN106919652A (en) * 2017-01-20 2017-07-04 东北石油大学 Short-sighted frequency automatic marking method and system based on multi-source various visual angles transductive learning
CN107133569A (en) * 2017-04-06 2017-09-05 同济大学 The many granularity mask methods of monitor video based on extensive Multi-label learning
WO2018187917A1 (en) * 2017-04-10 2018-10-18 深圳市柔宇科技有限公司 Method and device for assessing picture quality
CN109829467A (en) * 2017-11-23 2019-05-31 财团法人资讯工业策进会 Image labeling method, electronic device and non-transient computer-readable storage medium
CN108235116A (en) * 2017-12-27 2018-06-29 北京市商汤科技开发有限公司 Feature propagation method and device, electronic equipment, program and medium
CN108235116B (en) * 2017-12-27 2020-06-16 北京市商汤科技开发有限公司 Feature propagation method and apparatus, electronic device, and medium
WO2020052270A1 (en) * 2018-09-14 2020-03-19 华为技术有限公司 Video review method and apparatus, and device
CN110913243A (en) * 2018-09-14 2020-03-24 华为技术有限公司 Video auditing method, device and equipment
CN110263645A (en) * 2019-05-21 2019-09-20 新华智云科技有限公司 A kind of method and system judged for team's attacking and defending in section of football match video
CN110865756A (en) * 2019-11-12 2020-03-06 苏州智加科技有限公司 Image labeling method, device, equipment and storage medium
CN113344932A (en) * 2021-06-01 2021-09-03 电子科技大学 Semi-supervised single-target video segmentation method
CN113344932B (en) * 2021-06-01 2022-05-03 电子科技大学 Semi-supervised single-target video segmentation method
CN113506610A (en) * 2021-07-08 2021-10-15 联仁健康医疗大数据科技股份有限公司 Method and device for generating annotation specification, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN103065300B (en) 2015-03-25

Similar Documents

Publication Publication Date Title
CN103065300B (en) Method for video labeling and device for video labeling
Jia et al. Category-independent object-level saliency detection
Scharfenberger et al. Statistical textural distinctiveness for salient region detection in natural images
Li et al. Amodal instance segmentation
CN104508682B (en) Key frame is identified using the openness analysis of group
Cheng et al. HFS: Hierarchical feature selection for efficient image segmentation
Varnousfaderani et al. Weighted color and texture sample selection for image matting
US9626585B2 (en) Composition modeling for photo retrieval through geometric image segmentation
CN102968635B (en) Image visual characteristic extraction method based on sparse coding
Aksac et al. Complex networks driven salient region detection based on superpixel segmentation
Hu et al. Robust subspace analysis for detecting visual attention regions in images
CN103400386A (en) Interactive image processing method used for video
Zeeshan et al. A newly developed ground truth dataset for visual saliency in videos
Zhong et al. Background subtraction driven seeds selection for moving objects segmentation and matting
Shao et al. A Wavelet Based Local Descriptor for Human Action Recognition.
Manzanera Local jet feature space framework for image processing and representation
Kim et al. Non-parametric human segmentation using support vector machine
Zhou et al. Modeling perspective effects in photographic composition
Zhou et al. Depth-guided saliency detection via boundary information
Siva et al. Grid seams: A fast superpixel algorithm for real-time applications
Scharfenberger et al. Image saliency detection via multi-scale statistical non-redundancy modeling
Mancas et al. Human attention modelization and data reduction
KR102444172B1 (en) Method and System for Intelligent Mining of Digital Image Big-Data
Affara et al. Large scale asset extraction for urban images
Kushwaha et al. Automatic moving object segmentation methods under varying illumination conditions for video data: comparative study, and an improved method

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20150325

Termination date: 20171224
