CN102436483A - Video advertisement detecting method based on explicit type sharing subspace - Google Patents


Info

Publication number
CN102436483A
Authority
CN
China
Prior art keywords
shot
advertisement
mapping matrix
audio
audio features
Prior art date
Legal status
Pending
Application number
CN2011103356334A
Other languages
Chinese (zh)
Inventor
朱振峰
赵耀
杨厚德
刘楠
Current Assignee
Beijing Jiaotong University
Original Assignee
Beijing Jiaotong University
Priority date
Filing date
Publication date
Application filed by Beijing Jiaotong University filed Critical Beijing Jiaotong University
Priority to CN2011103356334A priority Critical patent/CN102436483A/en
Publication of CN102436483A publication Critical patent/CN102436483A/en
Pending legal-status Critical Current


Landscapes

  • Image Analysis (AREA)

Abstract

The invention discloses a video advertisement detection method based on an explicit shared subspace, belonging to the technical field of multimedia retrieval. The method comprises the steps of: first, segmenting the video into a sequence of semantic shots; extracting key frames from each shot and obtaining visual and audio features, then computing the explicit shared subspace and the eigenvalues of the mapping matrix formed from both feature sets; selecting the eigenvectors of the explicit shared subspace corresponding to the chosen eigenvalues to obtain the visual-feature mapping matrix and the audio-feature mapping matrix, thereby performing dimensionality reduction and feature fusion; feeding the matrix obtained by feature fusion into a support vector machine for training; after obtaining the optimal classification model, using it to make a preliminary judgment on each shot to be detected; and finally confirming through a post-processing step whether the shot to be detected is an advertisement shot. The invention does not require an advertisement database; shots are detected using the eigenvectors that play the dominant role, and a final post-processing step refines the judgment, improving both the efficiency and the accuracy of detection.

Description

A video advertisement detection method based on an explicit shared subspace
Technical field
The invention belongs to the technical field of multimedia retrieval, and in particular relates to a video advertisement detection method based on an explicit shared subspace.
Background technology
With the continuous development of science and technology, and of information technology in particular (computer technology, network technology and high-capacity storage technology), people collect and produce multimedia data of all kinds by a great variety of means. Yet as images and videos accumulate year after year while the capacity to process multimedia information lags behind, users urgently need to find the content they are interested in, quickly and accurately, within an ocean of multimedia data.
Multimedia retrieval first adopted text-retrieval techniques: a textual description of the multimedia item (a file caption, annotations, and so on) is produced manually, and text retrieval is then applied to it. Applying this mode of retrieval to multimedia has inherent defects. First, the full semantics of a multimedia item are difficult to describe accurately in text. Second, manual description is vulnerable to annotators' carelessness and thus to errors, and the workload is huge. Third, users' understanding of multimedia content varies from person to person, making manual descriptions insufficiently objective. Fourth, textual descriptions are also constrained by language; an American film, for example, may be given different Chinese titles depending on the translator's point of view. Research on content-based multimedia retrieval methods therefore has important practical significance.
To address these problems, content-based multimedia retrieval was proposed. Content-based multimedia retrieval means that the physical content and the semantic content carried by multimedia data (such as video and audio streams) are analysed and understood by computer in order to facilitate user queries. Its essence is to give structure to unordered multimedia data streams and to extract semantic information, so that multimedia content can be retrieved quickly.
As an important branch of content-based video retrieval, advertisement detection is attracting more and more attention, precisely because advertising plays an ever larger role in daily life. As an important carrier of business information, advertising plays an irreplaceable part in its transmission. Many merchants spend huge sums producing excellent advertisements to promote their products, extend their brand's influence and push up sales. As an important means of supervising enterprises, government advertisement-monitoring bodies, such as the advertising supervision department of the State Administration for Industry and Commerce of the People's Republic of China, have always organised, guided and supervised advertising management, with responsibility for investigating illegal activities such as sham publicity. And as an important way of obtaining product information, people receive a constant stream of advertisements. With the development of the advertising industry, the number of advertisements grows daily and their types diversify. How to recognise and detect advertisements automatically has therefore become a research focus, and video advertisement detection systems have been proposed in the hope of detecting advertisements automatically and locating their positions.
Different groups have different needs of a video advertisement system. Ordinary television viewers usually hope for as few advertisements as possible, so that their normal viewing is not disturbed; they want the system not only to detect advertisements but also to cut them out, letting them watch normal programmes without interruption. Merchants, on the contrary, want to see advertisements comprehensively: on the one hand, they can learn their competitors' latest moves from the competitors' newest advertisements and devise a rational competitive strategy; on the other hand, they can check whether their own advertisements were broadcast as agreed and achieved the desired effect. Government bodies, such as the advertising supervision department of the State Administration for Industry and Commerce or the State Administration of Radio, Film and Television, supervise advertisement content, checking whether it is illegal or deceives the audience.
Video advertisement detection algorithms vary widely, but all exploit differences between advertisement programmes and ordinary television programmes, building on existing content-based multimedia retrieval systems. According to their underlying principles, we divide advertisement detection algorithms into the following three classes:
1. Station-logo-based methods
These methods exploit the logo of the television station. When a station broadcasts normal programmes, such as news or television drama, it places its logo in a prominent position so that viewers remember the channel; when it broadcasts advertisements, however, it hides the logo. Based on this difference, the presence or absence of the station logo indicates whether the programme currently playing is an advertisement. Station logos generally fall into three kinds: static, translucent and dynamic, and corresponding detection algorithms have been proposed for each, realising advertisement detection. But this method has two main defects. First, this broadcasting convention does not hold for all stations and all time slots. Second, because of the way translucent and dynamic logos are produced and displayed, they are especially complicated to handle, so advertisement detection and recognition algorithms for them are not yet mature.
2. Recognition-based methods
The precondition of this approach is a huge advertisement database; a matching algorithm then measures the similarity between the video to be detected and the advertisements in the database, deciding whether the video is one of the stored advertisements. As is easily imagined, the biggest shortcoming of this method is precisely the need to build that huge database and to keep updating it manually with the latest advertisements so that they can be detected at any time. How to query and match rapidly within such a large store is itself a difficult research problem.
3. Learning-based methods
To overcome the defects of the first two classes, learning-based methods have been proposed. These methods exploit features that distinguish advertisements from normal programmes: compared with ordinary television programmes, advertisement programmes differ clearly in certain respects. This comes from the nature of advertising itself: to catch the audience's eye, advertisements employ a variety of production and rendering techniques. For example, detection can be realised by extracting the average edge change ratio A-ECR (Average of Edge Change Ratio) and the variance of the edge change ratio V-ECR (Variance of Edge Change Ratio) over a sequence of video frames; visually, the edge variation of advertisements is much more complex than that of normal programmes. On the audio side, the audio content of advertisement segments also differs markedly from that of ordinary programmes, and detection can be realised using Mel-frequency cepstral coefficients (MFCC) and audio information entropy. The latest detection systems often fuse both modalities to detect advertisement segments more accurately. Much recent research introduces machine-learning methods into learning-based detection: a classifier of good performance is obtained by training on samples and then used to classify advertisement shots and ordinary programme shots, yielding more accurate detection results. However, existing methods of this class fail to mine in depth the common semantics carried by the different modalities, which limits detection performance.
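The edge-change-ratio features mentioned above can be sketched in a few lines. This is a minimal numpy illustration under my own assumptions: the gradient-threshold edge detector, the dilation radius and the threshold value are illustrative choices, not details given by the patent.

```python
import numpy as np

def edge_map(frame, thresh=0.3):
    # Crude edge detector: gradient magnitude above a threshold (assumption).
    gy, gx = np.gradient(frame.astype(float))
    return np.hypot(gx, gy) > thresh

def dilate(mask, r=1):
    # Binary dilation with a (2r+1)x(2r+1) square, via shifted copies.
    out = mask.copy()
    padded = np.pad(mask, r)
    h, w = mask.shape
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            out |= padded[r + dy:r + dy + h, r + dx:r + dx + w]
    return out

def edge_change_ratio(prev, curr):
    # ECR: larger of the entering-edge and exiting-edge fractions.
    e_prev, e_curr = edge_map(prev), edge_map(curr)
    n_prev, n_curr = max(e_prev.sum(), 1), max(e_curr.sum(), 1)
    entering = (e_curr & ~dilate(e_prev)).sum() / n_curr
    exiting = (e_prev & ~dilate(e_curr)).sum() / n_prev
    return max(entering, exiting)

def aecr_vecr(frames):
    # A-ECR and V-ECR over a frame sequence, as used for detection above.
    ecrs = [edge_change_ratio(a, b) for a, b in zip(frames, frames[1:])]
    return float(np.mean(ecrs)), float(np.var(ecrs))
```

Identical consecutive frames give an ECR of zero, while a hard cut pushes it towards one; advertisement segments tend to show a higher mean and variance of this ratio.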
To remedy this defect while avoiding the problems of the first two classes, the present invention, following the principle of the third class of methods, proposes a video advertisement detection system based on an explicit shared subspace. The explicit shared subspace is used to fuse the visual and audio features and reduce their dimensionality, fully mining the common semantics carried by the visual and audio modalities; a support vector machine classifies the advertisement shots; and finally a post-processing step corrects the results using the temporal continuity of advertisements, yielding a system capable of fast advertisement detection.
Summary of the invention
To address the deficiency of existing methods noted in the background above, namely their failure to mine in depth the common semantics shared by different modalities, the present invention proposes a video advertisement detection method based on an explicit shared subspace.
The technical scheme of the present invention is a video advertisement detection method based on an explicit shared subspace, characterised in that the method comprises the following steps:
Step 1: segmenting the training-set data into a sequence of semantic shots using a segmentation algorithm;
Step 2: extracting visual key frames from each shot in the semantic shot sequence, obtaining visual features and audio features, and computing the eigenvalues of the mapping matrix formed from the visual and audio features;
Step 3: computing the explicit shared subspace from the visual and audio features;
Step 4: sorting the eigenvalues of the mapping matrix in descending order, selecting the eigenvectors of the explicit shared subspace corresponding to the chosen eigenvalues, and using these vectors to compute the visual-feature mapping matrix and the audio-feature mapping matrix;
Step 5: on the basis of step 4, mapping the visual features and audio features into the explicit shared subspace, completing their dimensionality reduction and then their fusion;
Step 6: feeding the matrix obtained by feature fusion into a support vector machine for classification training, obtaining the optimal classification model by a specific method, and using it to make a preliminary judgment on whether a shot to be detected is an advertisement shot;
Step 7: on the basis of step 6, finally confirming through a post-processing step whether the shot to be detected is an advertisement shot.
The segmentation algorithm is a semantic shot segmentation algorithm.
The specific method for obtaining the optimal classification model is cross-validation.
The visual-feature mapping matrix is:

$$A = \left(X^T X + \frac{\lambda}{a} I\right)^{-1} X^T u$$

where:
A is the visual-feature mapping matrix;
X is the visual features;
λ is the regularization coefficient;
a is the weight coefficient balancing the visual space and the audio space;
I is the identity matrix;
u is the training shared subspace.
The audio-feature mapping matrix is:

$$B = \left(Y^T Y + \frac{\lambda}{1-a} I\right)^{-1} Y^T u$$

where:
B is the audio-feature mapping matrix;
Y is the audio features.
The post-processing step is:
Step 1: setting the window length of semantic shots;
Step 2: counting the advertisement shots and non-advertisement shots in the window, excluding the current semantic shot;
Step 3: computing the shot attribute value of the current shot in the window;
Step 4: comparing the shot attribute value of the current shot in the window with an advertisement-attribute threshold and a normal-programme-attribute threshold respectively, to finally determine whether the shot to be detected is an advertisement shot.
The shot attribute value of the current shot in the window is computed as:

$$l_i = \frac{1 \times N_C + (-1) \times N_P}{N_C + N_P}$$

where:
l_i is the shot attribute value of the i-th shot in the window;
N_C is the number of advertisement shots in the window excluding the current semantic shot;
N_P is the number of non-advertisement shots in the window excluding the current semantic shot.
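The four post-processing steps and the formula for l_i can be sketched as follows. The representation of the window as a plain list of 0/1 labels and the two threshold values are my assumptions for illustration; the patent does not fix them.

```python
import numpy as np

def shot_attribute(labels, i):
    # l_i = (1*N_C + (-1)*N_P) / (N_C + N_P), counting every shot in the
    # window except the current shot i (labels: 1 = ad, 0 = normal).
    others = np.delete(np.asarray(labels), i)
    n_c = int((others == 1).sum())   # advertisement shots in the window
    n_p = int((others == 0).sum())   # non-advertisement shots in the window
    return (n_c - n_p) / (n_c + n_p)

def postprocess(labels, ad_thresh=0.6, prog_thresh=-0.6):
    # Flip a shot's label when its neighbours overwhelmingly disagree.
    # ad_thresh / prog_thresh are illustrative values, not from the patent.
    corrected = list(labels)
    for i in range(len(labels)):
        l_i = shot_attribute(labels, i)
        if l_i >= ad_thresh:
            corrected[i] = 1      # surrounded by ads -> advertisement shot
        elif l_i <= prog_thresh:
            corrected[i] = 0      # surrounded by programme shots -> normal
    return corrected
```

For example, a lone "normal" label in a run of advertisement shots is corrected to an advertisement label, which is exactly the temporal-continuity effect the step exploits.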
The advantages of the system are as follows. Features are extracted from advertisements under different modalities, and these features are fully exploited to construct a unified semantic description. To construct this description we adopt a novel method, the explicit shared subspace. When solving for the explicit shared subspace, an adjustable parameter weights the visual and audio features; by tuning this parameter, the best fusion of the two is obtained. Moreover, because the present invention is a detection system based on rule learning, no huge advertisement database needs to be built, and during dimensionality reduction the most effective dimensions can be selected, thereby improving detection efficiency. Finally, the post-processing step makes full use of the temporal continuity of advertisements and greatly improves the accuracy of semantic advertisement shot detection.
Description of drawings
Fig. 1 is an overall diagram of the system of the present invention;
Fig. 2 is a schematic diagram of the principle of solving the explicit shared subspace;
Fig. 3 is a schematic diagram of the post-processing;
Fig. 4 is the curve of the influence of the number of dimensions retained after dimensionality reduction on accuracy;
Fig. 5 is the curve of the influence of the visual-feature weight coefficient on accuracy.
Embodiment
Preferred embodiments are described in detail below with reference to the accompanying drawings. It should be emphasised that the following description is merely exemplary and is not intended to limit the scope of the present invention or its applications.
As shown in Fig. 1, the basic process of the present invention is:
a video database containing advertisements and non-advertisements serves as the training-set data, which is divided into a sequence of semantic shots by the semantic shot segmentation step;
from each semantic shot in the training set, visual key frames are extracted at equal intervals, and from these key frames an N_1-dimensional visual feature X is extracted, comprising colour, edge and grey-level histogram features;
each semantic shot in the training set is divided into equally spaced audio segments, from which an N_2-dimensional audio feature Y is extracted, comprising timbre texture features, tonal quality features and other low-level features;
the explicit shared subspace U is solved from the visual features X and audio features Y of the training set, together with the eigenvalues of the mapping matrix formed from X and Y; the eigenvalues are sorted in descending order, and their corresponding eigenvectors are reordered accordingly; the dimensions that play the major role are selected from the explicit shared subspace U to form the training shared subspace u, which is used to solve the visual-feature mapping matrix A and the audio-feature mapping matrix B; the visual features X and audio features Y of the training set are multiplied by A and B respectively, i.e. X×A and Y×B, thereby mapping them into the explicit shared subspace U and completing their dimensionality reduction; X×A and Y×B are concatenated directly, i.e. [X×A, Y×B], completing the feature fusion;
[X×A, Y×B] is fed into a support vector machine for classification training, and cross-validation is used to obtain the optimal classification model M;
the shots to be detected undergo the same semantic shot segmentation step as the training data, and the visual features X' and audio features Y' of the test set are extracted;
feature fusion and dimensionality reduction are applied to X' and Y': first, X' is multiplied by the visual-feature mapping matrix A, giving X'×A, and Y' by the audio-feature mapping matrix B, giving Y'×B; then X'×A and Y'×B are concatenated directly, i.e. [X'×A, Y'×B], as the data to be predicted by the support vector machine;
the trained optimal classification model and parameters are applied to the data to be predicted for support-vector-machine classification, judging whether each shot is an advertisement shot;
a post-processing step based on the temporal continuity of advertisement shots is applied to correct wrongly judged advertisement shots, further improving the accuracy of advertisement detection.
The present invention is further described below with reference to the accompanying drawings and embodiments.
Based on the technical scheme introduced above, we apply the present invention to advertisement detection, providing users with an accurate advertisement detection service. A specific embodiment of the invention is set out in detail with reference to the drawings.
1. Semantic shot sequence segmentation
In the present invention, the purpose of semantic shot segmentation is to divide the video into small units each carrying a single meaning; each semantic shot expresses one idea, and advertisement detection is carried out shot by shot, reducing memory consumption and computational complexity. The system adopts a semantic shot segmentation algorithm that detects not only abrupt cuts but also gradual transitions. Its procedure is as follows:
(1) Initialise the abrupt-cut threshold T_c and the gradual-transition threshold T_s.
(2) Read the input video file frame by frame and crop each image: remove the top and bottom sixths, keeping only the middle two thirds, thereby removing the influence of black bars, station logos and captions.
(3) Extract a 24-dimensional HSV histogram from the retained region. Compute the histogram difference between consecutive frames and compare it with T_c; if it exceeds T_c, the position is judged to be an abrupt semantic shot boundary and is recorded.
(4) Starting from the first boundary, if the next semantic shot boundary lies more than 10 frames after the previous one, compute the accumulated histogram difference over every 10 consecutive frames and compare it with T_s; if it exceeds T_s, the position is judged to be a gradual semantic shot boundary and is recorded.
(5) Return to (2) and continue comparing the histogram differences between the remaining consecutive frames until all frames of the video have been compared.
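Steps (1) to (3) of the abrupt-cut detection above can be sketched as follows. The gradual-transition test of step (4), which accumulates differences over 10-frame spans, is omitted for brevity; the hue-only 24-bin quantisation and the threshold value are my assumptions, since the patent does not specify the exact 24-dimensional HSV binning.

```python
import numpy as np

def center_crop(frame):
    # Step (2): drop the top and bottom sixths, keeping the middle 2/3,
    # to remove black bars, station logos and captions.
    h = frame.shape[0]
    return frame[h // 6: h - h // 6]

def hsv_histogram(frame_hsv, bins=24):
    # 24-bin normalised histogram over the hue channel (assumed binning).
    hist, _ = np.histogram(frame_hsv[..., 0], bins=bins, range=(0.0, 1.0))
    return hist / max(hist.sum(), 1)

def detect_cuts(frames_hsv, t_c=0.5):
    # Step (3): flag an abrupt cut wherever the L1 histogram difference of
    # two consecutive frames exceeds the threshold T_c.
    hists = [hsv_histogram(center_crop(f)) for f in frames_hsv]
    cuts = []
    for i in range(1, len(hists)):
        if np.abs(hists[i] - hists[i - 1]).sum() > t_c:
            cuts.append(i)
    return cuts
```

Each returned index marks the first frame of a new semantic shot; the spans between consecutive cut positions form the semantic shot sequence used by the later steps.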
2. Key-frame extraction
The system adopts different key-frame extraction methods for the audio features and the visual features. The concrete steps are as follows:
(1) For the visual features, one key frame is extracted every 30 frames of each semantic shot, yielding the visual key-frame sequence.
(2) For the audio features, a sample is taken every 20 milliseconds of each semantic shot as an audio key frame.
3. Feature extraction
The system extracts visual and audio features simultaneously, describing the video content from both the visual and the audio aspect.
(1) For the visual features, we use a multi-scale sliding window to extract HSV (hue, saturation, value) colour histograms, edge rates and grey-level frame-difference features; these are concatenated into an N_1-dimensional feature describing the visual information of the video.
(2) For the audio features, we use timbre texture features, tonal quality features and some other low-level features, forming an N_2-dimensional feature describing the audio information of the video.
4. Solving the explicit shared subspace U
In the present invention, the explicit shared subspace is solved by setting weighting parameters on the visual and audio features. As shown in Fig. 2, solving the explicit shared subspace yields features with better discriminability. The principle is as follows:
Find the U that minimises objective function 1:

$$a\|XA-U\|_F + (1-a)\|YB-U\|_F$$

The solution proceeds as follows:

$$\begin{aligned}
&\ a\,\mathrm{Tr}[(XA-U)^T(XA-U)] + (1-a)\,\mathrm{Tr}[(YB-U)^T(YB-U)] \\
=&\ a\,\mathrm{Tr}[(A^TX^T-U^T)(XA-U)] + (1-a)\,\mathrm{Tr}[(B^TY^T-U^T)(YB-U)] \\
=&\ a\,\mathrm{Tr}[A^TX^TXA - A^TX^TU - U^TXA + U^TU] + (1-a)\,\mathrm{Tr}[B^TY^TYB - B^TY^TU - U^TYB + U^TU] \\
=&\ a\,\mathrm{Tr}[A^TX^TXA - 2A^TX^TU + U^TU] + (1-a)\,\mathrm{Tr}[B^TY^TYB - 2B^TY^TU + U^TU] \\
=&\ a\,\mathrm{Tr}[A^TX^TXA - 2A^TX^TU + I] + (1-a)\,\mathrm{Tr}[B^TY^TYB - 2B^TY^TU + I] \\
=&\ a\,\mathrm{Tr}[A^TX^TXA] - 2a\,\mathrm{Tr}[A^TX^TU] + (1-a)\,\mathrm{Tr}[B^TY^TYB] - 2(1-a)\,\mathrm{Tr}[B^TY^TU] + \mathrm{Tr}(I)
\end{aligned}$$
where:
a is the weight coefficient of the visual features;
X is the visual features of the training set;
A is the visual-feature mapping matrix;
Y is the audio features of the training set;
B is the audio-feature mapping matrix;
Tr denotes the trace of a matrix;
X^T is the transpose of the matrix X;
U is the explicit shared subspace, with U^T U = I.
Taking partial derivatives of the above with respect to A and B:

$$\frac{\partial L}{\partial A} = 2aX^TXA - 2aX^TU$$

$$\frac{\partial L}{\partial B} = 2(1-a)Y^TYB - 2(1-a)Y^TU$$

Setting $\frac{\partial L}{\partial A} = 0$ and $\frac{\partial L}{\partial B} = 0$ gives:

$$2aX^TXA - 2aX^TU = 0 \quad (1)$$
$$2(1-a)Y^TYB - 2(1-a)Y^TU = 0 \quad (2)$$

From (1):

$$A = (X^TX)^{-1}X^TU \quad (3)$$

From (2):

$$B = (Y^TY)^{-1}Y^TU \quad (4)$$

Substituting (3) and (4) into objective function 1 yields:

$$a\|X(X^TX)^{-1}X^TU - U\|_F + (1-a)\|Y(Y^TY)^{-1}Y^TU - U\|_F \quad (5)$$

From (5) we obtain:

$$\mathrm{Tr}\big[U^T[aX(X^TX)^{-1}X^T + (1-a)Y(Y^TY)^{-1}Y^T]U\big] \quad (6)$$

So solving objective function 1 amounts to solving (6).
Let $G = aX(X^TX)^{-1}X^T + (1-a)Y(Y^TY)^{-1}Y^T$; the objective is then converted into solving the generalised eigenvalues and eigenvectors of G. However, because $(X^TX)^{-1}$ and $(Y^TY)^{-1}$ may not exist, we introduce a regularization coefficient λ to prevent the matrices from being singular; λ also prevents $\|A\|_F$ and $\|B\|_F$ from becoming too large or too small. The concrete principle is as follows:

Find the U, with $U^TU = I$, that minimises objective function 2:

$$a\|XA-U\|_F + (1-a)\|YB-U\|_F + \lambda(\|A\|_F + \|B\|_F)$$

The solution proceeds as follows:
$$\begin{aligned}
&\ a\,\mathrm{Tr}[(XA-U)^T(XA-U)] + (1-a)\,\mathrm{Tr}[(YB-U)^T(YB-U)] + \lambda\,\mathrm{Tr}[A^TA + B^TB] \\
=&\ a\,\mathrm{Tr}[(A^TX^T-U^T)(XA-U)] + (1-a)\,\mathrm{Tr}[(B^TY^T-U^T)(YB-U)] + \lambda\,\mathrm{Tr}[A^TA + B^TB] \\
=&\ a\,\mathrm{Tr}[A^TX^TXA - 2A^TX^TU + U^TU] + (1-a)\,\mathrm{Tr}[B^TY^TYB - 2B^TY^TU + U^TU] + \lambda\,\mathrm{Tr}[A^TA + B^TB] \\
=&\ a\,\mathrm{Tr}[A^TX^TXA - 2A^TX^TU + I] + (1-a)\,\mathrm{Tr}[B^TY^TYB - 2B^TY^TU + I] + \lambda\,\mathrm{Tr}[A^TA + B^TB] \\
=&\ a\,\mathrm{Tr}[A^TX^TXA] - 2a\,\mathrm{Tr}[A^TX^TU] + (1-a)\,\mathrm{Tr}[B^TY^TYB] - 2(1-a)\,\mathrm{Tr}[B^TY^TU] + \mathrm{Tr}(I) + \lambda\,\mathrm{Tr}[A^TA + B^TB]
\end{aligned}$$
Taking partial derivatives of the above with respect to A and B:

$$\frac{\partial L}{\partial A} = 2aX^TXA - 2aX^TU + 2\lambda A$$

$$\frac{\partial L}{\partial B} = 2(1-a)Y^TYB - 2(1-a)Y^TU + 2\lambda B$$

Setting $\frac{\partial L}{\partial A} = 0$ and $\frac{\partial L}{\partial B} = 0$ gives:

$$2aX^TXA - 2aX^TU + 2\lambda A = 0 \quad (7)$$
$$2(1-a)Y^TYB - 2(1-a)Y^TU + 2\lambda B = 0 \quad (8)$$

From (7):

$$A = \left(X^TX + \frac{\lambda}{a}I\right)^{-1}X^TU \quad (9)$$

From (8):

$$B = \left(Y^TY + \frac{\lambda}{1-a}I\right)^{-1}Y^TU \quad (10)$$

Substituting (9) and (10) into objective function 2 yields:

$$a\left\|X\left(X^TX + \tfrac{\lambda}{a}I\right)^{-1}X^TU - U\right\|_F + (1-a)\left\|Y\left(Y^TY + \tfrac{\lambda}{1-a}I\right)^{-1}Y^TU - U\right\|_F + \lambda\left(\left\|\left(X^TX + \tfrac{\lambda}{a}I\right)^{-1}X^TU\right\|_F + \left\|\left(Y^TY + \tfrac{\lambda}{1-a}I\right)^{-1}Y^TU\right\|_F\right) \quad (11)$$

From (11) we obtain:

$$\mathrm{Tr}\left[U^T\left[aX\left(X^TX + \tfrac{\lambda}{a}I\right)^{-1}X^T + (1-a)Y\left(Y^TY + \tfrac{\lambda}{1-a}I\right)^{-1}Y^T\right]U\right] \quad (12)$$

So solving objective function 2 amounts to solving (12).

Let

$$G = aX\left(X^TX + \frac{\lambda}{a}I\right)^{-1}X^T + (1-a)Y\left(Y^TY + \frac{\lambda}{1-a}I\right)^{-1}Y^T$$

The objective is then converted into solving the generalised eigenvalues and eigenvectors of G, whose eigenvectors form the explicit shared subspace U.
From the improved principle above, the steps for solving the explicit shared subspace U are as follows:

(1) Set the parameters a and λ to find the optimal explicit shared subspace. Experiments show that a = 0.2 and λ = 0.05 yield the optimal explicit shared subspace U.

(2) Compute the eigenvalues and eigenvectors of

$$G = aX\left(X^TX + \frac{\lambda}{a}I\right)^{-1}X^T + (1-a)Y\left(Y^TY + \frac{\lambda}{1-a}I\right)^{-1}Y^T$$

and take the eigenvectors as the explicit shared subspace U.

where:
X^T and Y^T denote the transposes of the visual features X and the audio features Y respectively;
I denotes the identity matrix;
λ is the regularization coefficient.
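The two steps above can be sketched with a straightforward eigendecomposition. G is symmetric, so a plain symmetric eigensolver suffices here; the use of `numpy.linalg.eigh` rather than a generalised eigensolver, and the matrix shapes in the sketch, are my assumptions.

```python
import numpy as np

def explicit_shared_subspace(X, Y, a=0.2, lam=0.05, n_dims=None):
    # Build G from equation (12):
    #   G = a X (X^T X + (lam/a) I)^-1 X^T
    #     + (1-a) Y (Y^T Y + (lam/(1-a)) I)^-1 Y^T
    # and take the eigenvectors with the largest eigenvalues as U.
    def projector(Z, w):
        d = Z.shape[1]
        return w * Z @ np.linalg.inv(Z.T @ Z + (lam / w) * np.eye(d)) @ Z.T

    G = projector(X, a) + projector(Y, 1.0 - a)
    vals, vecs = np.linalg.eigh(G)      # symmetric G: real spectrum
    order = np.argsort(vals)[::-1]      # sort eigenvalues descending
    if n_dims is not None:
        order = order[:n_dims]          # keep the dominant dimensions
    return vecs[:, order], vals[order]
```

With a = 0.2 and λ = 0.05 as stated above, the returned columns are orthonormal and ordered by decreasing eigenvalue, ready for the dimension selection of the next section.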
5. characteristic dimensionality reduction and Feature Fusion:
The characteristic dimensionality reduction operation that the present invention adopts can improve operation efficiency than the highland, and very little to purposes of commercial detection result influence.And the Feature Fusion technology can further improve the accuracy of monitoring of the advertisement.Concrete characteristic dimensionality reduction and Feature Fusion operation steps are following:
(1) Sort the eigenvalues in descending order, reordering the corresponding eigenvectors accordingly.
(2) Take the eigenvectors corresponding to the first n eigenvalues (n < N_1, n < N_2) to form the reduced-dimension training shared subspace.
(3) Solve for the visual feature mapping matrix A = (X^T X + (λ/a) I)^(-1) X^T U;
(4) Solve for the audio feature mapping matrix B = (Y^T Y + (λ/(1 - a)) I)^(-1) Y^T U;
(5) Multiply the visual feature X of the training set by the visual feature mapping matrix A, i.e. X * A, thereby mapping X into the explicit shared subspace U and completing the dimensionality reduction of the visual feature of the training set;
(6) Multiply the audio feature Y of the training set by the audio feature mapping matrix B, i.e. Y * B, thereby mapping Y into the explicit shared subspace U and completing the dimensionality reduction of the audio feature of the training set;
(7) Directly concatenate X * A and Y * B, i.e. [X * A, Y * B], thereby completing the feature fusion.
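Steps (1)-(7) above can be sketched in NumPy as follows. This is an illustrative sketch, not the patented implementation: `U` and `vals` are assumed to be the eigenvectors and eigenvalues from step (2) of the previous section, and the parameter values are placeholders:

```python
import numpy as np

def fuse_features(X, Y, U, vals, n, a=0.5, lam=0.1):
    """Keep the eigenvectors of the n largest eigenvalues, derive the
    mapping matrices A and B, and concatenate the projected features
    [X*A, Y*B]."""
    order = np.argsort(vals)[::-1][:n]   # indices of the top-n eigenvalues
    Un = U[:, order]                     # reduced training shared subspace
    A = np.linalg.inv(X.T @ X + (lam / a) * np.eye(X.shape[1])) @ X.T @ Un
    B = np.linalg.inv(Y.T @ Y + (lam / (1 - a)) * np.eye(Y.shape[1])) @ Y.T @ Un
    fused = np.hstack([X @ A, Y @ B])    # [X*A, Y*B]
    return A, B, fused
```

A and B are retained so that test-set features X' and Y' can later be projected into the same subspace.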
6. Support vector machine training to obtain the optimal classification model:
The new feature [X * A, Y * B] obtained after feature dimensionality reduction and feature fusion is input into a support vector machine, and the cross-validation method is used to obtain the optimal classification model M for classifying the new features of the video to be detected.
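The SVM training with cross-validated model selection can be sketched with scikit-learn (an assumed tool choice; the patent names no library, and the parameter grid below is illustrative):

```python
# Train an SVM on the fused features and select the best model by
# cross-validation, standing in for the text's "optimal classification
# model M".
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

def train_optimal_svm(fused, labels):
    grid = GridSearchCV(SVC(),
                        {"C": [0.1, 1, 10], "gamma": ["scale", 0.1]},
                        cv=3)
    grid.fit(fused, labels)
    return grid.best_estimator_
```

The returned estimator is then used to predict whether each semantic shot of an unseen video is an advertisement shot.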
7. Feature extraction for the video to be detected:
Using the methods described in sections 1, 2 and 3, complete the semantic shot segmentation of the video to be detected, the key-frame extraction, and the extraction of the visual feature X' and the audio feature Y' of the test set.
8. Feature fusion:
Perform feature fusion on the extracted visual feature X' and audio feature Y' of the test set. The fusion steps are as follows:
(1) Map the visual feature X' into the explicit shared subspace U by multiplying X' by the mapping matrix A.
(2) Map the audio feature Y' into the explicit shared subspace U by multiplying Y' by the mapping matrix B.
(3) Concatenate X' * A and Y' * B, i.e. [X' * A, Y' * B], thereby completing the feature fusion.
9. Support vector machine classification prediction:
Input [X' * A, Y' * B] into the optimal classification model M trained above; the classification output provides a preliminary judgment of whether each semantic shot of the video to be detected is an advertisement shot.
10. Post-processing step:
As shown in Fig. 3, this post-processing step mainly exploits the temporal continuity of advertisements. Its steps are as follows:
(1) For each semantic shot detection result, set a window of about 75 seconds on each side, thereby forming a 150-second window.
(2) Excluding the current semantic shot, count the number of advertisement shots N_C and the number of non-advertisement shots N_P in the window.
(3) Solve for the shot attribute value of the current shot in the window: l_i = (1 × N_C + (-1) × N_P) / (N_C + N_P).
(4) Compare l_i with the preset advertisement attribute threshold T_C and normal-program attribute threshold T_P: if l_i > T_C, the result is corrected to advertisement regardless of the original detection result; if l_i < T_P, the result is corrected to non-advertisement regardless of the original detection result.
(5) Output the advertisement detection result after this post-processing, thereby completing the post-processing step.
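A minimal sketch of this post-processing follows. It is an illustrative reading of the steps, not the patented implementation: a shot-index window stands in for the 75-second time window, shots are coded 1 for advertisement and 0 for non-advertisement, and the threshold values T_C and T_P are placeholders:

```python
def temporal_smooth(results, half_window=5, Tc=0.5, Tp=-0.5):
    """For each shot, count advertisement (N_C) and non-advertisement (N_P)
    shots in the surrounding window (excluding the shot itself), compute
    l_i = (N_C - N_P) / (N_C + N_P), and override the raw decision when
    l_i exceeds Tc or falls below Tp."""
    out = list(results)
    for i in range(len(results)):
        lo = max(0, i - half_window)
        hi = min(len(results), i + half_window + 1)
        neigh = [r for j, r in enumerate(results[lo:hi], start=lo) if j != i]
        Nc = sum(1 for r in neigh if r == 1)
        Np = len(neigh) - Nc
        if Nc + Np == 0:
            continue
        l = (Nc - Np) / (Nc + Np)
        if l > Tc:
            out[i] = 1      # corrected to advertisement
        elif l < Tp:
            out[i] = 0      # corrected to non-advertisement
    return out
```

For example, a lone non-advertisement decision surrounded by advertisement shots is flipped to advertisement, matching the temporal-continuity rationale of the step.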
To verify the effect of the present invention, the present scheme was tested with a training set containing 8723 shots and a test set containing 4731 shots. Fig. 4 shows the performance curve of the scheme as different feature dimensions are selected during the dimensionality reduction process; it can be seen that the scheme reaches satisfactory performance as the number of selected feature dimensions increases. Fig. 5 shows the accuracy curves of the advertisement detection system with and without the post-processing technique for different visual feature weight coefficients; the figure demonstrates the vital role of post-processing.
The above is merely a preferred embodiment of the present invention, but the protection scope of the present invention is not limited thereto. Any variation or replacement readily conceivable by a person skilled in the art within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be determined by the protection scope of the claims.

Claims (7)

1. A video advertisement detection method based on an explicit shared subspace, characterized in that the method comprises the following steps:
Step 1: dividing the training set data into semantic shot sequences by a specified algorithm;
Step 2: extracting a visual key frame from each shot in the semantic shot sequence, then obtaining the visual feature and the audio feature, and solving for the eigenvalues of the mapping matrix constituted by the visual feature and the audio feature;
Step 3: obtaining the explicit shared subspace from the visual feature and the audio feature;
Step 4: sorting the eigenvalues of the mapping matrix in descending order, selecting the vectors in the explicit shared subspace corresponding to the specified eigenvalues of the mapping matrix, and using these vectors to obtain the visual feature mapping matrix and the audio feature mapping matrix;
Step 5: on the basis of step 4, mapping the visual feature and the audio feature into the explicit shared subspace, completing the dimensionality reduction of the visual feature and the audio feature, and then completing the feature fusion;
Step 6: inputting the matrix obtained by feature fusion into a support vector machine for classification training, obtaining the optimal classification model by a specified method, and using it to preliminarily judge whether a shot to be detected is an advertisement shot;
Step 7: on the basis of step 6, finally confirming through a post-processing step whether the shot to be detected is an advertisement shot.
2. The video advertisement detection method based on an explicit shared subspace according to claim 1, characterized in that the specified algorithm is a semantic shot segmentation algorithm.
3. The video advertisement detection method based on an explicit shared subspace according to claim 1, characterized in that the specified method is the cross-validation method.
4. The video advertisement detection method based on an explicit shared subspace according to claim 1, characterized in that the visual feature mapping matrix is:
A = (X^T X + (λ/a) I)^(-1) X^T U
Wherein:
A is the visual feature mapping matrix;
X is the visual feature;
λ is the regularization coefficient;
a is the weight coefficient of the visual space and the audio space;
I is the identity matrix;
U is the training shared subspace.
5. The video advertisement detection method based on an explicit shared subspace according to claim 4, characterized in that the audio feature mapping matrix is:
B = (Y^T Y + (λ/(1 - a)) I)^(-1) Y^T U
Wherein:
B is the audio feature mapping matrix;
Y is the audio feature.
6. The video advertisement detection method based on an explicit shared subspace according to claim 1, characterized in that the post-processing step is:
Step 1: setting the window length for the semantic shot;
Step 2: counting the number of advertisement shots and non-advertisement shots in the window other than the current semantic shot;
Step 3: solving for the shot attribute value of the current shot in the window;
Step 4: comparing the shot attribute value of the current shot in the window with the advertisement attribute threshold and the normal-program attribute threshold respectively, to finally determine whether the shot to be detected is an advertisement shot.
7. The video advertisement detection method based on an explicit shared subspace according to claim 6, characterized in that the computing formula for the shot attribute value of the current shot in the window is:
l_i = (1 × N_C + (-1) × N_P) / (N_C + N_P)
Wherein:
l_i is the shot attribute value of the i-th shot in the window;
N_C is the number of advertisement shots other than the current semantic shot;
N_P is the number of non-advertisement shots other than the current semantic shot.
CN2011103356334A 2011-10-31 2011-10-31 Video advertisement detecting method based on explicit type sharing subspace Pending CN102436483A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2011103356334A CN102436483A (en) 2011-10-31 2011-10-31 Video advertisement detecting method based on explicit type sharing subspace


Publications (1)

Publication Number Publication Date
CN102436483A true CN102436483A (en) 2012-05-02

Family

ID=45984546

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2011103356334A Pending CN102436483A (en) 2011-10-31 2011-10-31 Video advertisement detecting method based on explicit type sharing subspace

Country Status (1)

Country Link
CN (1) CN102436483A (en)


Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101162470A (en) * 2007-11-16 2008-04-16 北京交通大学 Video frequency advertisement recognition method based on layered matching


Non-Patent Citations (1)

Title
YANG Houde: "Automatic Recognition and Detection of Video Advertisements", China Master's Theses Full-text Database *

Cited By (31)

Publication number Priority date Publication date Assignee Title
CN102799633B (en) * 2012-06-26 2015-07-15 天脉聚源(北京)传媒科技有限公司 Advertisement video detection method
WO2014000515A1 (en) * 2012-06-26 2014-01-03 天脉聚源(北京)传媒科技有限公司 Advertisement video detection method
CN102799633A (en) * 2012-06-26 2012-11-28 天脉聚源(北京)传媒科技有限公司 Advertisement video detection method
CN103237233A (en) * 2013-03-28 2013-08-07 深圳Tcl新技术有限公司 Rapid detection method and system for television commercials
CN103458300A (en) * 2013-08-28 2013-12-18 天津三星电子有限公司 Television false advertisement prompting method and system
CN103838835A (en) * 2014-02-25 2014-06-04 中国科学院自动化研究所 Network sensitive video detection method
CN103838835B (en) * 2014-02-25 2017-11-21 中国科学院自动化研究所 A kind of network sensitive video detection method
CN104581396A (en) * 2014-12-12 2015-04-29 北京百度网讯科技有限公司 Processing method and device for promotion information
CN104504055A (en) * 2014-12-19 2015-04-08 常州飞寻视讯信息科技有限公司 Commodity similarity calculation method and commodity recommending system based on image similarity
CN104504055B (en) * 2014-12-19 2017-12-26 常州飞寻视讯信息科技有限公司 The similar computational methods of commodity and commercial product recommending system based on image similarity
CN104469545A (en) * 2014-12-22 2015-03-25 无锡天脉聚源传媒科技有限公司 Method and device for verifying splitting effect of video clip
CN104469545B (en) * 2014-12-22 2017-09-15 无锡天脉聚源传媒科技有限公司 A kind of method and apparatus for examining video segment cutting effect
CN107133266A (en) * 2017-03-31 2017-09-05 北京奇艺世纪科技有限公司 The detection method and device and database update method and device of video lens classification
CN107133266B (en) * 2017-03-31 2020-02-18 北京奇艺世纪科技有限公司 Method and device for detecting video shot type and method and device for updating database
CN107682733A (en) * 2017-10-13 2018-02-09 深圳依偎控股有限公司 A kind of control method and system for improving user and watching video tastes degree
CN107682733B (en) * 2017-10-13 2020-04-28 深圳依偎控股有限公司 Control method and system for improving user experience of watching video
CN109446990A (en) * 2018-10-30 2019-03-08 北京字节跳动网络技术有限公司 Method and apparatus for generating information
CN109522450A (en) * 2018-11-29 2019-03-26 腾讯科技(深圳)有限公司 A kind of method and server of visual classification
WO2020108396A1 (en) * 2018-11-29 2020-06-04 腾讯科技(深圳)有限公司 Video classification method, and server
US12106563B2 (en) 2018-11-29 2024-10-01 Tencent Technology (Shenzhen) Company Limited Video classification method and server
US11741711B2 (en) 2018-11-29 2023-08-29 Tencent Technology (Shenzhen) Company Limited Video classification method and server
CN110232357A (en) * 2019-06-17 2019-09-13 深圳航天科技创新研究院 A kind of video lens dividing method and system
CN111723239A (en) * 2020-05-11 2020-09-29 华中科技大学 Multi-mode-based video annotation method
CN111723239B (en) * 2020-05-11 2023-06-16 华中科技大学 Video annotation method based on multiple modes
GB2613507A (en) * 2020-08-10 2023-06-07 Ibm Dual-modality relation networks for audio-visual event localization
US11663823B2 (en) 2020-08-10 2023-05-30 International Business Machines Corporation Dual-modality relation networks for audio-visual event localization
WO2022033231A1 (en) * 2020-08-10 2022-02-17 International Business Machines Corporation Dual-modality relation networks for audio-visual event localization
CN112214643B (en) * 2020-10-15 2024-01-12 百度(中国)有限公司 Video patch generation method and device, electronic equipment and storage medium
CN112214643A (en) * 2020-10-15 2021-01-12 百度(中国)有限公司 Video patch generation method and device, electronic equipment and storage medium
CN112233667B (en) * 2020-12-17 2021-03-23 成都索贝数码科技股份有限公司 Synchronous voice recognition method based on deep learning
CN112233667A (en) * 2020-12-17 2021-01-15 成都索贝数码科技股份有限公司 Synchronous voice recognition method based on deep learning

Similar Documents

Publication Publication Date Title
CN102436483A (en) Video advertisement detecting method based on explicit type sharing subspace
US11023523B2 (en) Video content retrieval system
CN101292238B (en) Method and system for automated rich presentation of a semantic topic
CN108269125B (en) Comment information quality evaluation method and system and comment information processing method and system
CN111754302B (en) Video live broadcast interface commodity display intelligent management system based on big data
WO2016179938A1 (en) Method and device for question recommendation
US10248865B2 (en) Identifying presentation styles of educational videos
CN111460221B (en) Comment information processing method and device and electronic equipment
CN105930411A (en) Classifier training method, classifier and sentiment classification system
CN110851718B (en) Movie recommendation method based on long and short term memory network and user comments
CN103605658A (en) Search engine system based on text emotion analysis
CN105069041A (en) Video user gender classification based advertisement putting method
CN110287314B (en) Long text reliability assessment method and system based on unsupervised clustering
Balasubramanian et al. A multimodal approach for extracting content descriptive metadata from lecture videos
CN118250516B (en) Hierarchical processing method for users
US20240086452A1 (en) Tracking concepts within content in content management systems and adaptive learning systems
Baidya et al. LectureKhoj: automatic tagging and semantic segmentation of online lecture videos
CN107844531B (en) Answer output method and device and computer equipment
CN101213539B (en) Cross descriptor learning system using non-label sample and method
CN116882414B (en) Automatic comment generation method and related device based on large-scale language model
Vinciarelli et al. Application of information retrieval technologies to presentation slides
Kechaou et al. A novel system for video news' sentiment analysis
KR101838089B1 (en) Sentimetal opinion extracting/evaluating system based on big data context for finding welfare service and method thereof
Zhu et al. Identifying and modeling the dynamic evolution of niche preferences
CN118503363B (en) Emotion analysis method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20120502