CN102436483A

CN102436483A - Video advertisement detection method based on explicit sharing subspace

Info

Publication number: CN102436483A
Application number: CN2011103356334A
Authority: CN
Inventors: 朱振峰; 赵耀; 杨厚德; 刘楠
Original assignee: Beijing Jiaotong University
Current assignee: Beijing Jiaotong University
Priority date: 2011-10-31
Filing date: 2011-10-31
Publication date: 2012-05-02

Abstract

The present invention discloses a video advertisement detection method based on explicit shared subspace in the field of multimedia retrieval technology. First, the lens is divided into a semantic lens sequence, and the key frames therein are extracted to obtain visual features and audio features, and then the eigenvalues of the explicit shared subspace and the mapping matrix composed of the two are obtained; then, by selecting the vector corresponding to the eigenvalue of the specified mapping matrix in the explicit shared subspace, the visual feature mapping matrix and the audio feature mapping matrix are obtained; the dimension reduction and feature fusion operations are implemented; the matrix obtained by feature fusion is input into a support vector machine for training, and after obtaining the optimal classification model, it is used to make a preliminary judgment on the lens to be detected, and finally, through the post-processing step, it is finally determined whether the lens to be detected is an advertisement lens. The present invention does not need to establish an advertisement database, uses the vector that plays a main role to detect the lens, and finally judges the lens through the post-processing step, which improves the effectiveness and accuracy of the detection.

Description

A kind of video ads detection method based on explicit shared Sub space

Technical field

The invention belongs to the multimedia retrieval technical field, relate in particular to a kind of video ads detection method based on explicit shared Sub space.

Background technology

Along with the continuous development of science and technology and progress, particularly computer technology, network technology and the high capacity memory technology of infotech, people have used a large amount of collection of various means and have produced various types of multimedia information datas.Yet, along with the accumulation all the year round of image and video, and the decline of multimedia messages processing power, people need find own interested content urgently quickly and accurately in vast as the ocean multi-medium data.

At first the text retrieval technology is adopted in the retrieval of multimedia messages, promptly generate the textual description of multimedia messages,, adopt the text retrieval technology to realize retrieval then multimedia messages like file explanation, mark etc. through manual type.The retrieval that this retrieval mode is applied to multimedia messages has its intrinsic defective.The first, whole semantemes that multimedia messages reflected are difficult to use the text accurate description.The second, the artificial method of describing is vulnerable to mark the influence of staff's carelessness and produces mistake, and workload is huge.The 3rd, the understanding of user on multimedia information varies with each individual and causes artificial describing mode objective inadequately.The 4th, textual description also receives the restriction of language, and is but different because of the angle of translation as for an American film, thereby the Chinese name of definition is also different.Therefore, the content-based multimedia retrieval method of research has important practical sense.

For addressing these problems, people propose content-based multimedia retrieval technology.So-called content-based multimedia retrieval is meant that physics and content semanteme that multi-medium data (like video, audio stream etc.) is contained carries out Computer Analysis and understand; To make things convenient for user inquiring; Its essence is to unordered multimedia data stream structuring; Extract semantic information, guarantee that content of multimedia can be by retrieval fast.

Purposes of commercial detection is causing the concern that People more and more is many as an important branch of Content-based Video Retrieval.Exactly because this advertisement important effect of play more and more in people's life.As the important carrier of business information, advertisement plays a part can not be substituted on the transmission business information.Many businessmans spend huge sums and make excellent advertisement for the product of oneself and propagate oneself product, and the expansion brand influence is pushed up sales.As the important means of supervision enterprise, relevant government advertising mechanism for monitoring is being served as tissue always, is being guided and supervised managing advertisement like advertising supervision and control department of State Administration for Industry and Commerce of the People's Republic of China, the responsibility of illegal activities such as investigation sham publicity.As the important method of obtaining merchandise news, people are also in the various advertising messages of the reception that does not stop.Along with the development of advertising, the quantity of advertisement grows with each passing day, and the type of advertisement also varies.How to discern and to detect advertisement automatically, become the focus of research.People have proposed the video ads detection system for this reason, hope to utilize this system can detect advertisement automatically, and the position of positioning advertising.

Different crowds is different to the real needs of video advertisement system, for common televiewer, often hopes that advertisement is few more good more, is unlikely to influence them like this and watches normal TV programme.They hope that the video ads detection system not only can accomplish the detection of advertisement, can also advertisement be cut off, and they just can interference-freely watch the normal tv program like this.For businessman, they but hope to watch comprehensive advertisement, and on the one hand, they can recognize rival's latest tendency according to rival's up-to-date advertisement, thereby make rational competitive strategy.On the other hand, whether whether they also play in the advertisement of inspection oneself as requested, get a desired effect or the like.For government organs, like advertising supervision and control department of State Administration for Industry and Commerce, General Bureau of Radio, Film and Television or the like, they will supervise advertisement content as supervision department, see whether whether advertisement content is illegal, cheat spectators.

The video ads detection algorithm varies, but all be the difference that utilizes advertising programme and ordinary tv programme, the video ads detection system that on the basis of existing content-based multimedia retrieval system, proposes.With the purposes of commercial detection algorithm based on principle different, we are divided into following three types with the algorithm of commercial detection system:

1. based on the station symbol method

This method is to have utilized the sign characteristics of TV station.We know, TV station is playing the normal tv program, like news, can the sign of TV station be placed on more showy position in the time of TV play etc., let the televiewer can remember television channel.And in the time of TV station's playing advertisements, but can the sign of oneself be stashed.Based on such difference, whether whether in progress program is advertisement in the existence that we can identify through detection TV station.Generally speaking, the station symbol of TV station is divided into three kinds: static station symbol, translucent station symbol and dynamic station symbol.To three kinds of different station symbols, people propose corresponding searching algorithm one after another, thereby realize the detection of advertisement.But the defective of this method mainly contains following 2 points: the first, and this ad playing rule is not the TV programme that are adapted to all periods of TV station; The second, translucent station symbol is with dynamically station symbol is because itself, and like manufacture technique, reasons such as manifestation mode deal with complicated especially, so also ripe without comparison purposes of commercial detection and recognizer.

2. based on the method for discerning

The prerequisite of the method is to set up huge advertising database, adopts the corresponding matched algorithm to confirm the similarity of advertisement in video and the database of to be detected or identification then, thereby determines whether it is the advertisement in the database.But imagine that easily the shortcoming of the maximum of this method is exactly to set up huge advertising database, constantly the artificial regeneration advertising database adds up-to-date advertisement in order to detection at any time with assurance simultaneously.How in huge memory contents, accomplishing inquiry and coupling rapidly also is a research difficult problem.

3. based on the method for learning

For solving the defective that preceding two class methods exist, people propose a kind of method based on study.The characteristic that the method mainly utilizes advertisement to be different from normal program realizes purposes of commercial detection.With respect to ordinary tv programme, advertising programme exists difference clearly in some characteristic aspect.This is because the characteristics of advertisement itself: advertisement because will attract spectators' eyeball, has been added various manufacture techniques when making, play up skill or the like.Such as realizing detecting through average edge rate A-ECR (Average of Edge Change Ratio) and the edge variation variance V-ECR (Variance of Change Ratio) that extracts one section frame of video.This mainly is to consider that advertisement is different from ordinary tv programme in visual aspects, and the edge variation situation of advertisement is than complicated many of normal program.Aspect audio frequency; Also there are some obvious characteristics in the audio content of advertisement video part and common program audio-frequency information partly, such as utilizing audio frequency Mei Er cepstral coefficients (Mel-frequency Cepstral Coefficient) and audio-frequency information entropy to realize the detection to video ads.Up-to-date commercial detection system merges both often, thereby realizes advertisement section is detected more accurately.In recent research; Much in the detection method based on study the method for having introduced machine learning is arranged, through training to sample, the reasonable sorter of obtained performance; Then advertisement camera lens and general programs camera lens are classified, thereby obtain more accurate testing result.But existing these class methods fail deep layer to excavate the total characteristic that following of different modalities contains semanteme, have influenced the performance of purposes of commercial detection.

Avoid the problem of preceding two class methods simultaneously in order to remedy this defective; The present invention is based on the principle of the 3rd class methods; Propose a kind of video ads detection system, utilize explicit shared Sub space visual signature and audio frequency characteristics to be merged dimensionality reduction based on explicit shared Sub space; Fully excavate the total semanteme that vision and audio modality contained; And utilize SVMs that the advertisement camera lens is classified, at last carry out aftertreatment and correct, thereby develop the system that a cover can the fast detecting advertisement by the time continuity characteristic of advertisement.

Summary of the invention

Can not deep layer excavate the deficiencies such as total characteristic that following of different modalities contains semanteme to mentioning existing method in the above-mentioned background technology, the present invention proposes a kind of video ads detection method based on explicit shared Sub space.

Technical scheme of the present invention is that a kind of video ads detection method based on explicit shared Sub space is characterized in that this method may further comprise the steps:

Step 1: training set data is divided into semantic shot sequence through assignment algorithm;

Step 2: each camera lens in the semantic shot sequence extracts the vision key frame, and then obtains visual signature and audio frequency characteristics, tries to achieve the eigenwert of the mapping matrix that is made up of visual signature and audio frequency characteristics;

Step 3: try to achieve explicit shared Sub space according to visual signature and audio frequency characteristics;

Step 4: according to from big to small rank order, and choose eigenwert corresponding vector in explicit shared Sub space of specifying mapping matrix, try to achieve visual signature mapping matrix and audio frequency characteristics mapping matrix with this vector to the eigenwert of mapping matrix;

Step 5: on the basis of step 4, video features and audio frequency characteristics are mapped to explicit shared Sub space, accomplish the dimensionality reduction of video features and audio frequency characteristics, and then accomplish Feature Fusion;

Step 6: will be input to by the matrix that Feature Fusion obtains and carry out classification based training in the SVMs, and utilize ad hoc approach to obtain the optimal classification model, and use it tentatively to judge whether camera lens to be detected is the advertisement camera lens;

Step 7: on the basis of step 6, confirm finally through post-processing step whether camera lens to be detected is the advertisement camera lens.

Said assignment algorithm is semantic camera lens partitioning algorithm.

Said ad hoc approach is the right-angled intersection proof method.

Said visual signature mapping matrix is:

A = {(X^{T} X + \frac{λ}{a} I)}^{- 1} X^{T} u

Wherein:

A is the visual signature mapping matrix;

X is a visual signature;

λ is the regularization coefficient;

A is the weight coefficient of visual space and audio space;

I is a unit matrix;

U is training shared Sub space.

Said audio frequency characteristics mapping matrix is:

B = {(Y^{T} Y + \frac{λ}{1 - a} I)}^{- 1} Y^{T} u

Wherein:

B is the audio frequency characteristics mapping matrix;

Y is an audio frequency characteristics.

Said post-processing step is:

Step 1: the length of window that semantic camera lens is set;

Step 2: advertisement number of shots and non-number of ads in the statistical window beyond the current semantic camera lens;

Step 3: the camera lens property value of finding the solution current camera lens in the window;

Step 4: the camera lens property value with current camera lens in the window compares with advertisement attributes threshold value and normal program attribute threshold value respectively, judges whether final definite camera lens to be detected is the advertisement camera lens.

The computing formula of the camera lens property value of current camera lens is in the said window:

l_{i} = \frac{1 \times N_{C} + (- 1) \times N_{P}}{N_{C} + N_{P}}

Wherein:

l _iCamera lens property value for i camera lens in the window;

N _CBe the advertisement number of shots beyond the current semantic camera lens;

N _PBe the non-advertisement number of shots beyond the current semantic camera lens.

The advantage of native system is: the characteristic under the different modalities has been extracted in advertisement, made full use of the characteristic under the different modalities and construct unified semantic description.When constructing unified semantic description, we have adopted novel structure to unify the method for semantic description, promptly explicit shared Sub space.When asking explicit shared Sub space, we add adjustable parameter to visual signature and audio frequency characteristics.Parameter obtains visual signature and audio frequency characteristics is best merges through regulating.In addition, the present invention is based on the commercial detection system of rule learning, so need not set up huge advertising database, in dimensionality reduction, can effectively obtain the most effectively dimension, thereby improve detection efficiency simultaneously.Last post-processing step has also made full use of the time continuity characteristic of advertisement, has improved the accuracy of the semantic Shot Detection of advertisement to a great extent.

Description of drawings

Fig. 1 is an overall system diagram of the present invention;

Fig. 2 is a principle schematic of finding the solution explicit shared Sub space;

Fig. 3 is the aftertreatment synoptic diagram;

Fig. 4 is an influence curve of selecting dimension correct rate behind the dimensionality reduction;

Fig. 5 is the influence curve of visual signature weight coefficient correct rate.

Embodiment

Below in conjunction with accompanying drawing, preferred embodiment is elaborated.Should be emphasized that following explanation only is exemplary, rather than in order to limit scope of the present invention and application thereof.

The present invention is shown in accompanying drawing 1, and its basic process is:

The video database that will comprise advertisement and non-advertisement through semantic camera lens segmentation procedure, is divided into semantic shot sequence with training set data as training set data;

Each semantic camera lens in the training set data uniformly-spaced extracts the vision key frame, then to key-frame extraction N in the semantic camera lens ₁The visual signature X of dimension, this characteristic comprises color, edge, grey level histogram;

The equally spaced audio fragment that is divided into of each semantic camera lens in the training set data extracts N then ₂The audio frequency characteristics Y of dimension, this characteristic comprises the tone color textural characteristics, tonequality characteristic and other low-level image feature;

Find the solution explicit shared Sub space U according to the visual signature X of training set and the audio frequency characteristics Y of training set, the eigenwert

of the visual signature X of training set and the audio frequency characteristics mapping matrix that Y constitutes of training set

sorts from big to small with eigenwert; With eigenwert characteristic of correspondence vector also ordering thereupon; The dimension that selection plays a major role from explicit shared Sub space U; Form training shared Sub space u, utilize training shared Sub space u to find the solution visual signature mapping matrix A and audio frequency characteristics mapping matrix B; The visual signature X of training set and the audio frequency characteristics Y of training set multiply by visual signature mapping matrix A and audio frequency characteristics mapping matrix B respectively; Be X * A and Y * B; Thereby the visual signature X of training set and the audio frequency characteristics Y of training set are mapped among the explicit shared Sub space U, and have accomplished characteristic dimensionality reduction the audio frequency characteristics Y of the visual signature X of training set and training set.X * A and Y * B are directly combined, i.e. [X * A Y * B], thus accomplish Feature Fusion.

[X * A Y * B] be input to carry out classification based training in the SVMs, utilize the right-angled intersection proof method, obtain the optimal classification model M;

Camera lens to be detected is accomplished semantic camera lens segmentation procedure according to the identical method of training set data, extract the visual signature X ' of corresponding test set and the audio frequency characteristics Y ' of test set;

Audio frequency characteristics Y ' to the visual signature X ' of test set and test set carries out Feature Fusion and characteristic dimensionality reduction, promptly at first the visual signature X ' of test set multiply by visual signature mapping matrix A, i.e. X ' * A; The audio frequency characteristics Y ' of test set multiply by audio frequency characteristics mapping matrix B, i.e. Y ' * B.Then X ' * A and Y ' * B are directly coupled together, promptly [X ' * A Y ' * B] is as the data to be predicted of SVMs;

Whether the optimal classification model and the parameter of training are treated predicted data and are carried out SVMs classification prediction, be the advertisement camera lens thereby judge;

Setting is corrected and is differentiated wrong advertisement camera lens based on successional post-processing step of time spot, thereby further improves the accuracy of purposes of commercial detection.

Below in conjunction with accompanying drawing and embodiment the present invention is done further description.

Based on the technical scheme of above introduction, we apply the present invention in the purposes of commercial detection, for the user provides accurate purposes of commercial detection service.In conjunction with accompanying drawing, our specific embodiments of the invention is done to set forth in detail.

1. semantic shot sequence is cut apart

In the present invention; The purpose that semantic shot sequence is cut apart is to hope video is divided into the junior unit with semanteme; Each semantic camera lens can be expressed identical meaning, is the detection that unit carries out advertisement with semantic camera lens, thereby reduces memory consumption and computation complexity.Native system adopts semantic shot sequence partitioning algorithm, not only can detect the sudden change camera lens, can also detect the gradual change camera lens.Its operating process is following:

(1) initialization sudden change camera lens threshold value T _cWith gradual change camera lens threshold value T _s

(2) read the video file of input by the form of frame after, clip image.Remove image up and down each 1/6, in the middle of only keeping 2/3, thereby remove black surround, the influence of station symbol and captions.

(3) reserve part is extracted 24 dimension HSV histograms.The histogram difference of two frames before and after calculating is followed T with difference _cRelatively, if greater than T _c, then be judged as semantic shot-cut point, write down this position.

(4) since the first stature cut-point, if between next semantic camera lens cut-point and the last semantic camera lens cut-point greater than 10 frames, calculate the accumulative histogram difference between then every continuous 10 frames, with accumulative histogram difference and T _sRelatively, if greater than T _s, then be judged as the semantic camera lens cut-point of gradual change, write down this position.

(5) change (2), histogrammic difference between the remaining successive frame is relatively accomplished between all frames in the continuation comparison video.

2. key-frame extraction

Native system has adopted different extraction method of key frame to phonetic feature and visual signature.The concrete operations step is following:

(1) for visual signature, for each semantic camera lens, per 30 frames extract a key frame, thereby obtain visual signature key frame sequence.

(2) for audio frequency characteristics, for each semantic camera lens, per 20 milliseconds of places are as the audio frequency characteristics key frame.

3. feature extraction

Native system has extracted visual signature and audio frequency characteristics simultaneously, from vision and two aspects of audio frequency video content is described.

(1) for visual signature, we adopted multiple dimensioned moving window extracted HSV (hue, saturation, value) color histogram, edge rate, gray scale frame difference characteristic, these characteristics couple together forms N ₁The characteristic of dimension is described the visual information of video.

(2) for audio frequency characteristics, we have adopted textural characteristics, tonequality characteristic and some other low-level image feature, thus form N ₂The characteristic of dimension is described Video and Audio information.

4. find the solution explicit shared Sub space U

Through visual signature and first characteristic frequently are provided with parameter, find the solution explicit shared Sub space among the present invention.Shown in accompanying drawing 2,, make the characteristic that obtains have better discrimination through finding the solution explicit shared Sub space.Its cardinal principle is following:

Ask and make objective function 1.a||XA-U|| _F+ (1-a) || YB-U|| _FMinimum U.

Specifically find the solution as follows:

Former formula=Tr [a (XA-U) ^T(XA-U)]+Tr [(1-a) (YB-U) ^T(YB-U)]

＝aTr[(A ^TX ^T-U ^T)(XA-U)]+(1-a)Tr[(B ^TY ^T-U ^T)(YB-U)]

＝aTr[A ^TX ^TXA-A ^TX ^TU-U ^TXA+U ^TU+(1-a)Tr[B ^TY ^TYB-B ^TY ^TU-U ^TYB+U ^TU]

＝aTr[A ^TX ^TXA-2A ^TX ^TU+U ^TU]+(1-a)Tr[B ^TY ^TYB-2B ^TY ^TU+U ^TU]

＝aTr[A ^TX ^TXA-2A ^TX ^TU+I]+(1-a)Tr[B ^TY ^TYB-2B ^TY ^TU+I]

＝aTr[A ^TX ^TXA-2A ^TX ^TU]+(1-a)Tr[B ^TY ^TYB-2B ^TY ^TU]+Tr(I)

＝aTr[A ^TX ^TXA]-2aTr[A ^TX ^TU]+(1-a)Tr[B ^TY ^TYB]-2(1-a)Tr[B ^TY ^TU]+Tr(I)

Wherein:

A is the weight coefficient of visual signature;

X is the visual signature of training set;

A is the visual signature mapping matrix;

Y is the audio frequency characteristics of training set;

B is the audio frequency characteristics mapping matrix;

Tr is for to ask mark (trace) to matrix;

X ^TTransposed matrix for matrix X;

U is explicit shared Sub space, U ^TU=I.

Following formula is asked partial derivative to A, B respectively, that is:

\frac{&PartialD; L}{&PartialD; A} = 2 a X^{T} XA - 2 a X^{T} U

\frac{&PartialD; L}{&PartialD; B} = 2 (1 - a) Y^{T} YB - 2 (1 - a) Y^{T} U

Order

\frac{&PartialD; L}{&PartialD; A} = 0,

\frac{&PartialD; L}{&PartialD; B} = 0,

Have:

\{\begin{matrix} 2 a X^{T} XA - 2 a X^{T} U = 0 & (1) \\ 2 (1 - a) Y^{T} YB - 2 (1 - a) Y^{T} U = 0 & (2) \end{matrix}

Get by (1):

A＝(X ^TX) ^-1X ^TU (3)

Get by (2):

B＝(Y ^TY) ^-1Y ^TU (4)

In (3) and (4) substitution objective function 1, obtain:

a||X(X ^TX) ^-1X ^TU-U|| _F+(1-a)||Y(Y ^TY) ^-1Y ^TU-U|| _F (5)

Can get by (5):

Tr[U ^T[aX(X ^TX) ^-1X ^T+(1-a)Y(Y ^TY) ^-1Y ^T]U] (6)

So objective function 1 is finding the solution (6).

Make G=[aX (X ^TX) ^-1X ^T+ (1-a) Y (Y ^TY) ^-1Y ^T], then objective function is converted into generalized eigenvalue and the proper vector of finding the solution G.But, because (X ^TX) ^-1(Y ^TY) ^-1Irreversible situation may occur, we introduce the regularization coefficient lambda, prevent the situation that matrix is unusual, and λ can also prevent simultaneously || A| _FWith || B|| _FExcessive or too small situation, the concrete operations principle is following:

Ask and make objective function 2.a||XA-U|| _F+ (1-a) || YB-U|| _F+ λ [|| A|| _F+ || B|| _F] minimum U, wherein U ^TU=I.Find the solution as follows:

Former formula=Tr [a (XA-U) ^T(XA-U)]+Tr is [(1-α) (YB-U) ^T(YB-U)]+λ Tr [A ^TA+B ^TB]

＝aTr[(A ^TX ^T-U ^T)(XA-U)]+(1-a)Tr[B ^TY ^T-U ^T)(YB-U)]+λTr[A ^TA+B ^TB]

＝aTR[A ^TX ^TXA-A ^TX ^TU-U ^TXA+U ^TU]+(1-a)Tr[B ^TY ^TYB-B ^TY ^TU-U ^TYB+U ^TU]+λTr[A ^TA+B ^TB]

＝aTr[A ^TX ^TXA-2A ^TX ^TU+U ^TU]+(1-a)Tr[B ^TY ^TYB-2B ^TY ^TU+U ^TU]+λTr[A ^TA+B ^TB]

＝aTr[A ^TX ^TXA-2A ^TX ^TU+I]+(1-a)Tr[B ^TY ^TYB-2B ^TY ^TU+I]+λTr[A ^TA+B ^TB]

＝aTr[A ^TX ^TXA-2A ^TX ^TU]+(1-a)Tr[B ^TY ^TYB-2B ^TY ^TU]+Tr(I)+λTr[A ^TA+B ^TB]

＝aTr[A ^TX ^TXA]-2aTr[A ^TX ^TU]+(1-a)Tr[B ^TY ^TYB]-2(1-a)Tr[B ^TY ^TU]+Tr(I)+λTr[A ^TA+B ^TB]

Following formula is asked partial derivative to A, B respectively, that is:

\frac{&PartialD; L}{&PartialD; A} = 2 a X^{T} XA - 2 a X^{T} U + 2 λA

\frac{&PartialD; L}{&PartialD; B} = 2 (1 - a) Y^{T} YB - 2 (1 - a) Y^{T} U + 2 λB

Order

\frac{&PartialD; L}{&PartialD; A} = 0,

\frac{&PartialD; L}{&PartialD; B} = 0,

Have:

\{\begin{matrix} 2 a X^{T} XA - 2 a X^{T} U + 2 λA = 0 & (7) \\ 2 (1 - a) Y^{T} YB - 2 (1 - a) Y^{T} U + 2 λB = 0 & (8) \end{matrix}

Get by (7):

A = {(X^{T} X + \frac{λ}{a} I)}^{- 1} X^{T} U - - - (9)

Get by (8):

B = {(Y^{T} Y + \frac{λ}{1 - a} I)}^{- 1} Y^{T} U - - - (10)

In (9) and (10) substitution objective function 2, obtain:

a {| | X {(X^{T} X + \frac{λ}{a} I)}^{- 1} X^{T} U - U | |}_{F} + (1 - a) {| | Y {(Y^{T} Y + \frac{λ}{1 - a} I)}^{- 1} Y^{T} U - U | |}_{F} + λ ({| | {(X^{T} X + \frac{λ}{a} I)}^{- 1} X^{T} U | |}_{F} + {| | {(Y^{T} Y + \frac{λ}{1 - a} I)}^{- 1} Y^{T} U | |}_{F}) - - - (11)

Can get by (11):

Tr [U^{T} [aX {(X^{T} X + \frac{λ}{a} I)}^{- 1} X^{T} + (1 - a) Y {(Y^{T} Y + \frac{λ}{1 - a} I)}^{- 1} Y^{T}] U] - - - (12)

So objective function 2 is finding the solution (12).

Order

G = [AX {(X^{T} X + \frac{λ}{a} I)}^{- 1} X^{T} + (1 - a) Y {(Y^{T} Y + \frac{λ}{1 - a} I)}^{- 1} Y^{T}],

Then objective function is converted into generalized eigenvalue and the proper vector of finding the solution G.Wherein proper vector is promptly formed explicit shared Sub space U.

The principle of finding the solution explicit shared Sub space by after improving can know that the step of finding the solution explicit shared Sub space U is following:

(1) parameter a and λ are set and seek optimum explicit shared Sub space.Experiment shows, a=0.2, and optimum explicit shared Sub space U can be tried to achieve in λ=0.05 o'clock.

(2) ask

[AX {(X^{T} X + \frac{λ}{a} I)}^{- 1} X^{T} + (1 - a) Y {(Y^{T} Y + \frac{λ}{1 - a} I)}^{- 1} Y^{T}]

Eigenwert

And proper vector, and with this proper vector as explicit shared Sub space U.

Wherein:

X ^TAnd Y ^TRepresent transposition respectively to visual signature X and audio frequency characteristics Y;

I representation unit matrix;

λ is the regularization coefficient.

5. characteristic dimensionality reduction and Feature Fusion:

The characteristic dimensionality reduction operation that the present invention adopts can improve operation efficiency than the highland, and very little to purposes of commercial detection result influence.And the Feature Fusion technology can further improve the accuracy of monitoring of the advertisement.Concrete characteristic dimensionality reduction and Feature Fusion operation steps are following:

(1) eigenwert

is sorted from big to small, with eigenwert characteristic of correspondence vector also ordering thereupon.

(2) get preceding n (n＜N ₁, n＜N ₂) individual eigenwert characteristic of correspondence vector forms the training shared Sub space behind the dimensionality reduction.

(3) find the solution visual signature mapping matrix A by

;

(4) find the solution audio frequency characteristics mapping matrix B by

;

(5) the visual signature X with training set multiply by visual signature mapping matrix A, i.e. X * A, thus the visual signature X of training set is mapped among the explicit shared Sub space U, and accomplished characteristic dimensionality reduction to the visual signature X of training set;

(6) the audio frequency characteristics Y with training set multiply by audio frequency characteristics mapping matrix B, i.e. Y * B, thus the audio frequency characteristics Y of training set is mapped among the explicit shared Sub space U, and accomplished characteristic dimensionality reduction to the audio frequency characteristics Y of training set.

(7) X * A and Y * B are directly coupled together, i.e. [X * A, Y * B], thus accomplish Feature Fusion.

6. the SVMs training obtains the optimal classification model;

[X * A, Y * B] is input in the SVMs with the new feature after characteristic dimensionality reduction and the Feature Fusion, utilizes the right-angled intersection verification method to obtain optimum disaggregated model M and is used for the classification to the new feature of video to be detected.

7. video features to be detected extracts:

With 1,2,3 described methods are accomplished semantic camera lens and are cut apart to video to be detected, key-frame extraction, the audio frequency characteristics Y ' extraction of the visual signature X ' of test set and test set.

8. Feature Fusion:

The visual signature X ' of the test set that extracts and the audio frequency characteristics Y ' of test set are carried out Feature Fusion, and fusion steps is following:

(1) visual signature X ' is mapped to explicit shared Sub space U, method of operating is that X ' multiply by mapping matrix A.

(2) audio frequency characteristics Y ' is mapped to explicit shared Sub space U, method of operating is that Y ' multiply by mapping matrix B.

(3) X ' A and Y ' B are coupled together, i.e. [X ' A, Y ' B], thus accomplish Feature Fusion.

9. SVMs classification prediction

Among the disaggregated model M with the optimum that trains in [X ' A, Y ' B] input 7, the output category result judges tentatively whether each semantic camera lens of video to be detected is the advertisement camera lens.

10. post-processing step

Shown in accompanying drawing 3, this post-processing step has mainly utilized the time continuity characteristic of advertisement.Its step is following:

(1) to each semantic lens detection result, about 75 seconds window is set respectively, thereby form 150 seconds window.

(2) except that current semantic camera lens, advertisement number of shots N in the difference statistical window _CWith non-advertisement camera lens N _PQuantity.

(3) find the solution the camera lens property value of current camera lens in the window

l_{i} = \frac{1 \times N_{C} + (- 1) \times N_{P}}{N_{C} + N_{P}} .

(4) l _iWith the advertisement attributes threshold value T that pre-sets _CWith normal program attribute threshold value T _PCompare: if l _i＞T _C, then no matter whether original testing result is advertisement, all corrects to be advertisement.If l _i＜T _P, then no matter whether original testing result is advertisement, all corrects to be non-advertisement.

The purposes of commercial detection result of (5) output process last handling process, thus post-processing step accomplished.

In order to verify effect of the present invention, this programme the training set that comprises 8723 camera lenses with comprise under the setting of 4731 camera lens test sets, test.Fig. 4 is illustrated in the characteristic dimensionality reduction process, selects different dimensions this programme changes of properties curve.It is thus clear that along with the increase of selected intrinsic dimensionality, this programme has reached satisfactory performance.Fig. 5 is when selecting different visual signature weight coefficient, uses or do not use the curve of accuracy of the commercial detection system of post-processing technology.Can know the vital role of aftertreatment by figure.

The above; Be merely the preferable embodiment of the present invention, but protection scope of the present invention is not limited thereto, any technician who is familiar with the present technique field is in the technical scope that the present invention discloses; The variation that can expect easily or replacement all should be encompassed within protection scope of the present invention.Therefore, protection scope of the present invention should be as the criterion with the protection domain of claim.

Claims

1. A video advertisement detection method based on an explicit shared subspace, characterized in that the method comprises the following steps:

Step 1: Segment the training set data into semantic shot sequences by specifying an algorithm;

Step 2: Extract visual key frames for each shot in the semantic shot sequence, and then obtain visual features and audio features, and obtain the eigenvalues of the mapping matrix composed of visual features and audio features;

Step 3: Obtain an explicit shared subspace based on visual features and audio features;

Step 4: Sort the eigenvalues of the mapping matrix in descending order, and select the vector corresponding to the eigenvalues of the specified mapping matrix in the explicit shared subspace, and use this vector to obtain the visual feature mapping matrix and audio feature mapping matrix;

Step 5: On the basis of step 4, map video features and audio features to an explicit shared subspace, complete dimensionality reduction of video features and audio features, and then complete feature fusion;

Step 6: Input the matrix obtained by feature fusion into the support vector machine for classification training, use a specific method to obtain the optimal classification model, and use it to preliminarily judge whether the shot to be detected is an advertising shot;

Step 7: On the basis of step 6, finally determine whether the shot to be detected is an advertising shot through a post-processing step.

2. A video advertisement detection method based on an explicit shared subspace according to claim 1, characterized in that the specified algorithm is a semantic shot segmentation algorithm.

3. A video advertisement detection method based on an explicit shared subspace according to claim 1, characterized in that said specific method is a cross-validation method.

4. A kind of video advertisement detection method based on explicit shared subspace according to claim 1, it is characterized in that described visual feature mapping matrix is:

A A = = {(({X x}^{T T} X x + + \frac{λ λ}{a a} I I))}^{- - 11} {X x}^{T T} u u

in:

A is the visual feature mapping matrix;

X is the visual feature;

λ is the regularization coefficient;

a is the weight coefficient of visual space and audio space;

I is the identity matrix;

u is the training shared subspace.

5. A kind of video advertisement detection method based on explicit shared subspace according to claim 4, it is characterized in that described audio feature mapping matrix is:

B B = = {(({Y Y}^{T T} Y Y + + \frac{λ λ}{11 - - a a} I I))}^{- - 11} {Y Y}^{T T} u u

in:

B is the audio feature mapping matrix;

Y is an audio feature.

6. A kind of video advertisement detection method based on explicit shared subspace according to claim 1, it is characterized in that described post-processing step is:

Step 1: Set the window length of the semantic lens;

Step 2: Count the number of advertising shots and non-advertising shots other than the current semantic shot in the window;

Step 3: Solve the lens attribute value of the current lens in the window;

Step 4: Compare the shot attribute value of the current shot in the window with the advertising attribute threshold and the normal program attribute threshold respectively, and finally determine whether the shot to be detected is an advertising shot.

7. A kind of video advertisement detection method based on explicit shared subspace according to claim 6, it is characterized in that the calculation formula of the lens attribute value of current lens in the described window is:

{l l}_{i i} = = \frac{11 \times \times {N N}_{C C} + + ((- - 11)) \times \times {N N}_{P P}}{{N N}_{C C} + + {N N}_{P P}}

in:

l _i is the lens attribute value of the i-th lens in the window;

N _C is the number of advertising shots other than the current semantic shot;

N _P is the number of non-advertising shots other than the current semantic shot.