CN102436483A - Video advertisement detecting method based on explicit type sharing subspace - Google Patents
- Publication number: CN102436483A
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis)
Abstract
The invention discloses a video advertisement detection method based on an explicit shared subspace, in the technical field of multimedia retrieval. The method first segments the video into a sequence of semantic shots; extracts key frames from each shot and obtains visual features and audio features; computes the explicit shared subspace and the eigenvalues of the mapping matrix formed from the two feature sets; selects the eigenvectors corresponding to the designated eigenvalues of the mapping matrix within the explicit shared subspace to obtain the visual feature mapping matrix and the audio feature mapping matrix, which perform dimensionality reduction and feature fusion; inputs the fused feature matrix into a support vector machine for training; uses the resulting optimal classification model to make a preliminary judgment on each shot to be detected; and finally confirms through a post-processing step whether the shot to be detected is an advertisement shot. The invention does not require an advertisement database; only the dominant eigenvectors are used for detection, and shots are verified in a final post-processing step, improving both the efficiency and the accuracy of detection.
Description
Technical field
The invention belongs to the technical field of multimedia retrieval, and in particular relates to a video advertisement detection method based on an explicit shared subspace.
Background technology
With the continuous development and progress of science and technology, particularly information technologies such as computer technology, network technology, and mass-storage technology, people use a wide variety of means to collect and produce enormous amounts of multimedia data. Yet as images and video accumulate year after year while the capacity to process multimedia information lags behind, users urgently need to find the content they are interested in, quickly and accurately, within this vast ocean of multimedia data.
Multimedia retrieval initially relied on text retrieval: a textual description of the multimedia item (a caption, an annotation, and so on) was produced manually, and text retrieval techniques were then applied. Applying this approach to multimedia retrieval has inherent drawbacks. First, the full semantics of a multimedia item are difficult to describe accurately in text. Second, manual description is vulnerable to annotator carelessness and thus to mistakes, and the workload is enormous. Third, different users understand the same multimedia content differently, so manual descriptions are insufficiently objective. Fourth, textual descriptions are also constrained by language; an American film, for example, may receive different Chinese titles depending on the translator's perspective. Research on content-based multimedia retrieval therefore has significant practical value.
To address these problems, content-based multimedia retrieval was proposed. It refers to computer analysis and understanding of the physical content and semantics carried by multimedia data (video, audio streams, and so on) in order to facilitate user queries. Its essence is to impose structure on unordered multimedia data streams and extract semantic information, ensuring that multimedia content can be retrieved quickly.
As an important branch of content-based video retrieval, advertisement detection is attracting more and more attention, precisely because advertising plays an ever more important role in people's lives. As an important carrier of business information, advertising plays an irreplaceable role in conveying commercial information. Many merchants spend enormous sums producing excellent advertisements to promote their products, extend their brand influence, and boost sales. As an important means of supervising enterprises, government advertising regulators, such as the advertising supervision department of the State Administration for Industry and Commerce of the People's Republic of China, have long served to organize, guide, and supervise advertising management and to investigate illegal activities such as false publicity. As an important way of obtaining product information, people also constantly receive all kinds of advertising messages. With the development of the advertising industry, the number of advertisements grows daily and their types vary widely. How to recognize and detect advertisements automatically has become a research focus. Video advertisement detection systems have therefore been proposed, in the hope of detecting advertisements automatically and locating their positions.
Different groups have different needs for a video advertisement system. Ordinary television viewers usually hope for as few advertisements as possible, so that their viewing of normal programs is not disturbed; they want the system not only to detect advertisements but also to cut them out, so they can watch normal programs without interruption. Merchants, by contrast, want to see comprehensive advertising: on the one hand, a competitor's latest advertisements reveal the competitor's latest moves and inform a rational competitive strategy; on the other hand, merchants also want to check whether their own advertisements are broadcast as agreed and achieve the desired effect. Government bodies, such as the advertising supervision department of the State Administration for Industry and Commerce or the State Administration of Radio, Film and Television, must, as supervisory departments, monitor advertisement content to see whether it is illegal or deceives viewers.
Video advertisement detection algorithms vary widely, but all exploit differences between advertising programs and ordinary television programs, building a video advertisement detection system on the basis of existing content-based multimedia retrieval systems. According to the principle on which the detection algorithm is based, we divide advertisement detection systems into the following three categories:
1. Station-logo-based methods
These methods exploit the station logo of the TV channel. When broadcasting normal programs such as news or TV dramas, a station places its logo in a prominent position so that viewers remember the channel; when broadcasting advertisements, however, it hides the logo. Based on this difference, detecting the presence or absence of the station logo indicates whether the current program is an advertisement. In general, station logos fall into three kinds: static, translucent, and dynamic. For each kind, corresponding detection algorithms have been proposed one after another to realize advertisement detection. This method has two main drawbacks: first, this broadcasting convention does not apply to the programs of all time slots of all stations; second, because of their production techniques and manner of presentation, translucent and dynamic logos are especially complicated to handle, so advertisement detection and recognition algorithms for them are not yet mature.
2. Recognition-based methods
The premise of these methods is to build a huge advertisement database; a matching algorithm then measures the similarity between the video to be detected and the advertisements in the database, determining whether the video is one of the stored advertisements. As is easy to imagine, the biggest shortcoming of this approach is precisely the need to build that huge advertisement database and to keep updating it manually with the latest advertisements so that they can be detected at any time. How to query and match rapidly within such massive stored content is also a difficult research problem.
3. Learning-based methods
To overcome the defects of the previous two categories, learning-based methods were proposed. They mainly exploit features that distinguish advertisements from normal programs. Relative to ordinary television programs, advertising programs differ clearly in certain features, a consequence of the nature of advertisements: to attract viewers' attention, advertisements are made with a variety of production and rendering techniques. For example, detection can be realized by extracting the average edge change ratio A-ECR (Average of Edge Change Ratio) and the edge change variance V-ECR (Variance of Edge Change Ratio) of a segment of video frames; visually, the edge variation of advertisements is far more complex than that of normal programs. On the audio side, the audio content of the advertisement portion of a video also shows clear differences from that of ordinary programs; for example, audio Mel-frequency cepstral coefficients (MFCC) and audio information entropy have been used to detect video advertisements. The latest detection systems often fuse both modalities to detect advertisement segments more accurately. Much recent research introduces machine learning into such detection methods: a well-performing classifier is obtained by training on samples and then classifies advertisement shots versus ordinary program shots, yielding more accurate detection results. However, existing methods of this kind fail to mine in depth the shared features underlying the semantics of the different modalities, which limits detection performance.
To remedy this defect while avoiding the problems of the first two categories, the present invention follows the principle of the third category and proposes a video advertisement detection system based on an explicit shared subspace. The explicit shared subspace is used to fuse and reduce the dimensionality of the visual features and audio features, fully mining the semantics shared by the visual and audio modalities; a support vector machine then classifies the advertisement shots, and finally the temporal continuity of advertisements is exploited in a post-processing correction, yielding a system that can detect advertisements quickly.
Summary of the invention
Addressing the deficiency, noted in the background above, that existing methods cannot mine in depth the shared features underlying the semantics of different modalities, the present invention proposes a video advertisement detection method based on an explicit shared subspace.
The technical scheme of the present invention is a video advertisement detection method based on an explicit shared subspace, characterized in that the method comprises the following steps:
Step 1: dividing the training set data into a semantic shot sequence by a designated algorithm;
Step 2: extracting visual key frames from each shot in the semantic shot sequence, obtaining visual features and audio features, and solving the eigenvalues of the mapping matrix formed from the visual features and audio features;
Step 3: solving the explicit shared subspace from the visual features and audio features;
Step 4: sorting the eigenvalues of the mapping matrix in descending order, selecting the eigenvectors in the explicit shared subspace that correspond to the designated eigenvalues, and solving the visual feature mapping matrix and the audio feature mapping matrix from these eigenvectors;
Step 5: on the basis of Step 4, mapping the visual features and audio features into the explicit shared subspace, completing their dimensionality reduction and then their feature fusion;
Step 6: inputting the matrix obtained by feature fusion into a support vector machine for classification training, obtaining the optimal classification model by a specific method, and using it to make a preliminary judgment on whether a shot to be detected is an advertisement shot;
Step 7: on the basis of Step 6, finally confirming through a post-processing step whether the shot to be detected is an advertisement shot.
Said designated algorithm is the semantic shot segmentation algorithm.
Said specific method is cross-validation.
Said visual feature mapping matrix is:
A = a(aX^T X + λI)^-1 X^T U
Wherein:
A is the visual feature mapping matrix;
X is the visual feature;
λ is the regularization coefficient;
a is the weight coefficient of the visual space relative to the audio space;
I is the identity matrix;
U is the training shared subspace.
Said audio feature mapping matrix is:
B = (1 - a)((1 - a)Y^T Y + λI)^-1 Y^T U
Wherein:
B is the audio feature mapping matrix;
Y is the audio feature.
Said post-processing step is:
Step 1: setting the window length over semantic shots;
Step 2: counting the numbers of advertisement shots and non-advertisement shots in the window, excluding the current semantic shot;
Step 3: solving the shot attribute value of the current shot in the window;
Step 4: comparing the shot attribute value of the current shot in the window with the advertisement attribute threshold and the normal-program attribute threshold respectively, to finally determine whether the shot to be detected is an advertisement shot.
The shot attribute value of the current shot in the window is computed as:
l_i = N_C / (N_C + N_P)
Wherein:
l_i is the shot attribute value of the i-th shot in the window;
N_C is the number of advertisement shots in the window other than the current semantic shot;
N_P is the number of non-advertisement shots in the window other than the current semantic shot.
The advantages of this system are: features are extracted from the advertisement under different modalities, and these features are fully used to construct a unified semantic description. In constructing the unified semantic description, we adopt a novel method, the explicit shared subspace. When solving the explicit shared subspace, we attach adjustable parameters to the visual features and audio features; by tuning the parameters, the best fusion of the visual and audio features is obtained. Furthermore, because the present invention is an advertisement detection system based on learned rules, no huge advertisement database needs to be built, and the dimensionality reduction efficiently selects the most effective dimensions, thereby improving detection efficiency. Finally, the post-processing step makes full use of the temporal continuity of advertisements, greatly improving the accuracy of semantic advertisement shot detection.
Description of drawings
Fig. 1 is the overall system diagram of the present invention;
Fig. 2 is a schematic diagram of the principle of solving the explicit shared subspace;
Fig. 3 is a schematic diagram of the post-processing;
Fig. 4 is the curve of the influence of the number of dimensions retained after dimensionality reduction on accuracy;
Fig. 5 is the curve of the influence of the visual feature weight coefficient on accuracy.
Embodiment
The preferred embodiments are described in detail below with reference to the accompanying drawings. It should be emphasized that the following description is merely exemplary and is not intended to limit the scope of the present invention or its applications.
The present invention is shown in Fig. 1; its basic procedure is:
The video database containing advertisement and non-advertisement material serves as the training set data; the semantic shot segmentation step divides the training set data into a sequence of semantic shots.
From each semantic shot in the training set data, visual key frames are extracted at uniform intervals, and an N1-dimensional visual feature X is extracted from the key frames; this feature comprises color, edge, and gray-level histogram features.
Each semantic shot in the training set data is divided into equally spaced audio segments, from which an N2-dimensional audio feature Y is extracted; this feature comprises timbre texture features, sound-quality features, and other low-level features.
The explicit shared subspace U is solved from the training-set visual feature X and audio feature Y, along with the eigenvalues of the mapping matrix formed from X and Y. The eigenvalues are sorted in descending order, the corresponding eigenvectors are reordered accordingly, and the dominant dimensions are selected from the explicit shared subspace U to form the training shared subspace u, which is used to solve the visual feature mapping matrix A and the audio feature mapping matrix B.
The training-set visual feature X and audio feature Y are multiplied by A and B respectively, i.e. X × A and Y × B, mapping them into the explicit shared subspace U and completing their dimensionality reduction. X × A and Y × B are then concatenated directly, i.e. [X × A, Y × B], completing feature fusion.
[X × A, Y × B] is input into a support vector machine for classification training; cross-validation yields the optimal classification model M.
Shots to be detected undergo the same semantic shot segmentation as the training set data, and the test-set visual feature X′ and audio feature Y′ are extracted.
X′ and Y′ undergo the same feature fusion and dimensionality reduction: X′ is multiplied by the visual feature mapping matrix A (X′ × A) and Y′ by the audio feature mapping matrix B (Y′ × B); X′ × A and Y′ × B are then concatenated directly, i.e. [X′ × A, Y′ × B], as the data to be predicted by the support vector machine.
The trained optimal classification model and parameters perform support vector machine classification on the data to be predicted, judging whether each shot is an advertisement shot.
A post-processing step based on the temporal continuity of advertisements corrects misclassified advertisement shots, further improving the accuracy of advertisement detection.
The present invention is further described below with reference to the accompanying drawings and embodiments.
Based on the technical scheme introduced above, we apply the present invention to advertisement detection, providing users with an accurate advertisement detection service. With reference to the accompanying drawings, our specific embodiments are elaborated in detail.
1. Semantic shot sequence segmentation
In the present invention, the purpose of semantic shot sequence segmentation is to divide the video into small units with semantics; each semantic shot expresses one meaning, and advertisement detection is carried out with the semantic shot as the unit, reducing memory consumption and computational complexity. The system adopts a semantic shot segmentation algorithm that detects not only abrupt cuts but also gradual transitions. Its operating procedure is as follows:
(1) Initialize the abrupt-cut threshold T_c and the gradual-transition threshold T_s.
(2) Read the input video file frame by frame and crop each image: remove the top and bottom 1/6 of the image, keeping only the middle 2/3, to eliminate the influence of black borders, station logos, and captions.
(3) Extract a 24-dimensional HSV histogram from the retained region. Compute the histogram difference between consecutive frames and compare it with T_c; if the difference is greater than T_c, judge the position to be a semantic shot cut point and record it.
(4) Starting from the first cut point, whenever the next semantic shot cut point is more than 10 frames from the previous one, compute the cumulative histogram difference over every 10 consecutive frames and compare it with T_s; if it is greater than T_s, judge the position to be a gradual-transition semantic shot cut point and record it.
(5) Return to (2) and continue comparing the histogram differences between the remaining consecutive frames until all frames of the video have been compared.
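The abrupt-cut part of the procedure above can be sketched as follows. This is an illustrative sketch under our own assumptions (frames as H×W×3 NumPy arrays with HSV channel values in [0, 1), an L1 histogram difference), not the claimed implementation; the gradual-transition pass of step (4) is omitted for brevity.

```python
import numpy as np

def crop_middle(frame):
    """Keep the middle 2/3 of the rows, dropping 1/6 at the top and
    bottom to suppress black borders, station logos, and captions."""
    rows = frame.shape[0]
    return frame[rows // 6: rows - rows // 6]

def hsv_histogram_24d(hsv_frame):
    """24-dim HSV histogram: 8 bins per channel, normalized to sum 1.
    Channel values are assumed to lie in [0, 1)."""
    parts = [np.histogram(hsv_frame[..., c], bins=8, range=(0.0, 1.0))[0]
             for c in range(3)]
    hist = np.concatenate(parts).astype(float)
    return hist / hist.sum()

def detect_cuts(frames, t_c):
    """Mark index i as an abrupt shot cut when the L1 difference between
    the histograms of frames i-1 and i exceeds the threshold t_c."""
    hists = [hsv_histogram_24d(crop_middle(f)) for f in frames]
    return [i for i in range(1, len(hists))
            if np.abs(hists[i] - hists[i - 1]).sum() > t_c]
```

The recorded cut indices then delimit the semantic shots that the rest of the pipeline operates on.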
2. Key-frame extraction
The system uses different key-frame extraction methods for the audio and visual features. The concrete steps are as follows:
(1) For visual features, one key frame is extracted every 30 frames of each semantic shot, yielding the visual key-frame sequence.
(2) For audio features, a sample is taken every 20 milliseconds of each semantic shot as an audio feature frame.
3. Feature extraction
The system extracts visual features and audio features simultaneously, describing the video content from both the visual and audio perspectives.
(1) For visual features, we use a multi-scale sliding window to extract HSV (hue, saturation, value) color histograms, edge ratios, and gray-level frame-difference features; concatenated, these form the N1-dimensional feature describing the visual information of the video.
(2) For audio features, we use timbre texture features, sound-quality features, and some other low-level features, forming the N2-dimensional feature describing the audio information of the video.
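A minimal sketch of assembling the per-key-frame visual descriptor follows. The helper names, the choice of mean absolute difference for the gray-level frame difference, and the fraction-of-edge-pixels definition of the edge ratio are our assumptions; the sketch simply shows the concatenation into one N1-dimensional vector.

```python
import numpy as np

def gray_frame_diff(prev_gray, cur_gray):
    """Mean absolute gray-level difference between consecutive key frames."""
    return float(np.abs(cur_gray.astype(float) - prev_gray.astype(float)).mean())

def edge_ratio(edge_map):
    """Fraction of pixels marked as edges in a binary edge map."""
    return float(edge_map.mean())

def visual_feature(hsv_hist, edge_map, prev_gray, cur_gray):
    """Concatenate the per-key-frame descriptors into one N1-dim vector:
    24-dim HSV histogram, then edge ratio, then gray frame difference."""
    return np.concatenate([hsv_hist,
                           [edge_ratio(edge_map)],
                           [gray_frame_diff(prev_gray, cur_gray)]])
```

The audio feature Y would be built analogously by concatenating the per-segment audio descriptors.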
4. Solving the explicit shared subspace U
In the present invention, the explicit shared subspace is solved by setting weighting parameters on the visual features and audio features. As shown in Fig. 2, solving the explicit shared subspace makes the resulting features more discriminative. The main principle is as follows:
Find the U that minimizes objective function 1:
a||XA - U||_F^2 + (1 - a)||YB - U||_F^2
The solution is as follows:
original expression
= a Tr[(XA - U)^T (XA - U)] + (1 - a) Tr[(YB - U)^T (YB - U)]
= a Tr[(A^T X^T - U^T)(XA - U)] + (1 - a) Tr[(B^T Y^T - U^T)(YB - U)]
= a Tr[A^T X^T XA - A^T X^T U - U^T XA + U^T U] + (1 - a) Tr[B^T Y^T YB - B^T Y^T U - U^T YB + U^T U]
= a Tr[A^T X^T XA - 2A^T X^T U + U^T U] + (1 - a) Tr[B^T Y^T YB - 2B^T Y^T U + U^T U]
= a Tr[A^T X^T XA - 2A^T X^T U + I] + (1 - a) Tr[B^T Y^T YB - 2B^T Y^T U + I]
= a Tr[A^T X^T XA - 2A^T X^T U] + (1 - a) Tr[B^T Y^T YB - 2B^T Y^T U] + Tr(I)
= a Tr[A^T X^T XA] - 2a Tr[A^T X^T U] + (1 - a) Tr[B^T Y^T YB] - 2(1 - a) Tr[B^T Y^T U] + Tr(I)
Wherein:
a is the weight coefficient of the visual feature;
X is the visual feature of the training set;
A is the visual feature mapping matrix;
Y is the audio feature of the training set;
B is the audio feature mapping matrix;
Tr denotes the trace of a matrix;
X^T is the transpose of the matrix X;
U is the explicit shared subspace, with U^T U = I.
Taking partial derivatives of the above expression with respect to A and B and setting them to zero:
∂/∂A = 2aX^T XA - 2aX^T U = 0 (1)
∂/∂B = 2(1 - a)Y^T YB - 2(1 - a)Y^T U = 0 (2)
From (1):
A = (X^T X)^-1 X^T U (3)
From (2):
B = (Y^T Y)^-1 Y^T U (4)
Substituting (3) and (4) into objective function 1 gives:
a||X(X^T X)^-1 X^T U - U||_F^2 + (1 - a)||Y(Y^T Y)^-1 Y^T U - U||_F^2 (5)
From (5), minimizing objective function 1 is equivalent to maximizing:
Tr[U^T [aX(X^T X)^-1 X^T + (1 - a)Y(Y^T Y)^-1 Y^T] U] (6)
So solving objective function 1 reduces to solving (6).
Let G = [aX(X^T X)^-1 X^T + (1 - a)Y(Y^T Y)^-1 Y^T]; the objective is then converted into solving for the generalized eigenvalues and eigenvectors of G. However, because (X^T X)^-1 and (Y^T Y)^-1 may not exist, we introduce the regularization coefficient λ to prevent the matrices from being singular; λ also prevents ||A||_F and ||B||_F from becoming too large or too small. The concrete operating principle is as follows:
Find the U, with U^T U = I, that minimizes objective function 2:
a||XA - U||_F^2 + (1 - a)||YB - U||_F^2 + λ[||A||_F^2 + ||B||_F^2]
The solution is as follows:
original expression
= a Tr[(XA - U)^T (XA - U)] + (1 - a) Tr[(YB - U)^T (YB - U)] + λ Tr[A^T A + B^T B]
= a Tr[(A^T X^T - U^T)(XA - U)] + (1 - a) Tr[(B^T Y^T - U^T)(YB - U)] + λ Tr[A^T A + B^T B]
= a Tr[A^T X^T XA - A^T X^T U - U^T XA + U^T U] + (1 - a) Tr[B^T Y^T YB - B^T Y^T U - U^T YB + U^T U] + λ Tr[A^T A + B^T B]
= a Tr[A^T X^T XA - 2A^T X^T U + U^T U] + (1 - a) Tr[B^T Y^T YB - 2B^T Y^T U + U^T U] + λ Tr[A^T A + B^T B]
= a Tr[A^T X^T XA - 2A^T X^T U + I] + (1 - a) Tr[B^T Y^T YB - 2B^T Y^T U + I] + λ Tr[A^T A + B^T B]
= a Tr[A^T X^T XA - 2A^T X^T U] + (1 - a) Tr[B^T Y^T YB - 2B^T Y^T U] + Tr(I) + λ Tr[A^T A + B^T B]
= a Tr[A^T X^T XA] - 2a Tr[A^T X^T U] + (1 - a) Tr[B^T Y^T YB] - 2(1 - a) Tr[B^T Y^T U] + Tr(I) + λ Tr[A^T A + B^T B]
Taking partial derivatives of the above expression with respect to A and B and setting them to zero:
∂/∂A = 2aX^T XA - 2aX^T U + 2λA = 0 (7)
∂/∂B = 2(1 - a)Y^T YB - 2(1 - a)Y^T U + 2λB = 0 (8)
From (7):
A = a(aX^T X + λI)^-1 X^T U (9)
From (8):
B = (1 - a)((1 - a)Y^T Y + λI)^-1 Y^T U (10)
Substituting (9) and (10) into objective function 2 gives an expression (11) which, up to the constant Tr(I), shows that minimizing objective function 2 is equivalent to maximizing:
Tr[U^T [a^2 X(aX^T X + λI)^-1 X^T + (1 - a)^2 Y((1 - a)Y^T Y + λI)^-1 Y^T] U] (12)
So solving objective function 2 reduces to solving (12).
Let G = a^2 X(aX^T X + λI)^-1 X^T + (1 - a)^2 Y((1 - a)Y^T Y + λI)^-1 Y^T; the objective is then converted into solving for the eigenvalues and eigenvectors of G, and the selected eigenvectors form the explicit shared subspace U.
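To make the regularized solution concrete, the following NumPy sketch (our illustration, with names of our own choosing, not part of the claimed method) forms G from the regularized objective, takes the eigenvectors of G with the largest eigenvalues as U, and recovers the mapping matrices A and B from the stationarity conditions:

```python
import numpy as np

def shared_subspace(X, Y, a=0.2, lam=0.05, dim=2):
    """Solve the explicit shared subspace from the regularized objective.

    G = a^2 X (a X^T X + lam I)^-1 X^T
      + (1-a)^2 Y ((1-a) Y^T Y + lam I)^-1 Y^T
    U = eigenvectors of G for the `dim` largest eigenvalues; A and B are
    the visual and audio mapping matrices recovered from U."""
    Mx = np.linalg.inv(a * X.T @ X + lam * np.eye(X.shape[1]))
    My = np.linalg.inv((1 - a) * Y.T @ Y + lam * np.eye(Y.shape[1]))
    G = a**2 * (X @ Mx @ X.T) + (1 - a)**2 * (Y @ My @ Y.T)
    w, V = np.linalg.eigh(G)              # eigenvalues in ascending order
    U = V[:, np.argsort(w)[::-1][:dim]]   # top-`dim` eigenvectors
    A = a * Mx @ X.T @ U                  # visual feature mapping matrix
    B = (1 - a) * My @ Y.T @ U            # audio feature mapping matrix
    return U, A, B
```

Because G is symmetric, the eigenvectors returned by `eigh` are orthonormal, so the constraint U^T U = I holds for the selected columns.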
From the improved principle for solving the explicit shared subspace above, the steps for solving the explicit shared subspace U are as follows:
(1) Set the parameters a and λ and search for the optimal explicit shared subspace. Experiments show that a = 0.2 and λ = 0.05 yield the optimal explicit shared subspace U.
Wherein:
X^T and Y^T denote the transposes of the visual feature X and the audio feature Y respectively;
I denotes the identity matrix;
λ is the regularization coefficient.
5. Feature dimensionality reduction and feature fusion:
The feature dimensionality reduction adopted by the present invention considerably improves efficiency while having very little influence on the advertisement detection result, and the feature fusion further improves detection accuracy. The concrete steps of feature dimensionality reduction and fusion are as follows:
(1) Sort the eigenvalues in descending order, reordering the corresponding eigenvectors accordingly.
(2) Take the eigenvectors corresponding to the first n eigenvalues (n < N1, n < N2) to form the reduced training shared subspace.
(3) Multiply the training-set visual feature X by the visual feature mapping matrix A, i.e. X × A, mapping X into the explicit shared subspace U and completing the dimensionality reduction of the training-set visual feature X.
(4) Multiply the training-set audio feature Y by the audio feature mapping matrix B, i.e. Y × B, mapping Y into the explicit shared subspace U and completing the dimensionality reduction of the training-set audio feature Y.
(5) Concatenate X × A and Y × B directly, i.e. [X × A, Y × B], completing feature fusion.
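The dimension-selection and fusion steps above can be sketched as follows. This is an assumption-laden illustration: here A and B are taken to be full mapping matrices whose columns correspond one-to-one to the eigenvalues, so reduction amounts to selecting the columns for the n largest eigenvalues before projecting and concatenating.

```python
import numpy as np

def reduce_and_fuse(X, Y, A, B, eigvals, n):
    """Keep the eigenvector columns for the n largest eigenvalues,
    project both modalities into the shared subspace, and concatenate
    the results: [X * A_n, Y * B_n]."""
    order = np.argsort(eigvals)[::-1][:n]    # descending order, top n
    return np.hstack([X @ A[:, order], Y @ B[:, order]])
```

The same function serves both training ([X × A, Y × B]) and testing ([X′ × A, Y′ × B]), since the mapping matrices are fixed after training.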
6. Support vector machine training to obtain the optimal classification model
The new features [X × A, Y × B] obtained after feature dimensionality reduction and feature fusion are input into a support vector machine, and cross-validation is used to obtain the optimal classification model M, which is used to classify the new features of the video to be detected.
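The cross-validated model selection described above can be sketched as follows. Since a full SVM implementation is beyond the scope of this illustration, a nearest-centroid learner stands in for the support vector machine; the function names, the stand-in classifier, and the fold count are our assumptions.

```python
import numpy as np

def kfold_indices(n, k, seed=0):
    """Shuffle the sample indices and split them into k folds."""
    idx = np.random.RandomState(seed).permutation(n)
    return np.array_split(idx, k)

def cv_accuracy(Z, labels, train_fn, k=5):
    """Mean k-fold cross-validation accuracy of a classifier factory."""
    folds = kfold_indices(len(Z), k)
    accs = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        predict = train_fn(Z[train], labels[train])
        accs.append(np.mean(predict(Z[test]) == labels[test]))
    return float(np.mean(accs))

def centroid_classifier(Z, labels):
    """Stand-in learner: classify by the nearer class centroid."""
    c0 = Z[labels == 0].mean(axis=0)
    c1 = Z[labels == 1].mean(axis=0)
    def predict(Q):
        d0 = np.linalg.norm(Q - c0, axis=1)
        d1 = np.linalg.norm(Q - c1, axis=1)
        return (d1 < d0).astype(int)
    return predict
```

In the actual scheme, `train_fn` would train an SVM for each candidate hyperparameter setting, and the setting with the highest cross-validation accuracy would define the optimal model M.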
7. Feature extraction for the video to be detected:
Using the methods described in sections 1, 2, and 3, the video to be detected undergoes semantic shot segmentation and key-frame extraction, and the test-set visual feature X′ and the test-set audio feature Y′ are extracted.
8. Feature fusion:
The extracted test-set visual feature X′ and test-set audio feature Y′ undergo feature fusion as follows:
(1) Map X′ into the explicit shared subspace U by multiplying it by the mapping matrix A.
(2) Map Y′ into the explicit shared subspace U by multiplying it by the mapping matrix B.
(3) Concatenate X′A and Y′B, i.e. [X′A, Y′B], completing feature fusion.
9. Support vector machine classification prediction
[X′A, Y′B] is input into the optimal classification model M trained in section 6, and the classification output provides a preliminary judgment of whether each semantic shot of the video to be detected is an advertisement shot.
10. Post-processing step
As shown in Fig. 3, this post-processing step mainly exploits the temporal continuity of advertisements. Its steps are as follows:
(1) For the detection result of each semantic shot, set a window of about 75 seconds on each side, forming a 150-second window.
(2) Excluding the current semantic shot, count the number of advertisement shots N_C and the number of non-advertisement shots N_P in the window.
(3) Compute the shot attribute value l_i of the current shot within the window.
(4) Compare l_i with the preset advertisement attribute threshold T_C and the normal-program attribute threshold T_P: if l_i > T_C, the shot is corrected to an advertisement regardless of the original detection result; if l_i < T_P, it is corrected to a non-advertisement regardless of the original detection result.
(5) Output the advertisement detection result after post-processing, completing the post-processing step.
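The post-processing correction can be sketched as below, under two stated assumptions: the patent's formula for l_i is not reproduced in this text, so the definition used here (difference of advertisement and non-advertisement counts, normalized by window size) is a hypothetical stand-in; and the 75-second half-window is approximated as a fixed number of shots rather than seconds.

```python
# Sketch of step 10 under stated assumptions: l_i is a hypothetical
# stand-in for the patent's (unreproduced) attribute-value formula, and
# the window is measured in shots rather than seconds.
def post_process(preds, half_window, t_c, t_p):
    """preds: list of 0/1 per semantic shot (1 = advertisement shot).
    t_c / t_p: advertisement / normal-program attribute thresholds.
    Returns the corrected prediction list."""
    corrected = list(preds)
    for i in range(len(preds)):
        lo, hi = max(0, i - half_window), min(len(preds), i + half_window + 1)
        neighbors = [preds[j] for j in range(lo, hi) if j != i]  # exclude current shot
        if not neighbors:
            continue
        n_c = sum(neighbors)                # advertisement shots N_C in window
        n_p = len(neighbors) - n_c          # non-advertisement shots N_P
        l_i = (n_c - n_p) / len(neighbors)  # assumed attribute-value definition
        if l_i > t_c:
            corrected[i] = 1                # force advertisement
        elif l_i < t_p:
            corrected[i] = 0                # force non-advertisement
    return corrected
```

With this stand-in definition, an isolated misclassification surrounded by advertisement shots (l_i near 1) is flipped to advertisement, which matches the temporal-continuity rationale of the step.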
To verify the effectiveness of the present invention, this scheme was tested with a training set of 8723 shots and a test set of 4731 shots. Fig. 4 shows the performance curve of the scheme as different numbers of dimensions are selected during feature dimensionality reduction; it can be seen that as the number of selected feature dimensions increases, the scheme reaches satisfactory performance. Fig. 5 shows the accuracy curves of the advertisement detection system with and without post-processing under different visual-feature weight coefficients; the figure demonstrates the vital role of post-processing.
The above is merely a preferred embodiment of the present invention, but the protection scope of the present invention is not limited thereto. Any variation or replacement that a person skilled in the art can readily conceive within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be defined by the protection scope of the claims.
Claims (7)
1. A video advertisement detection method based on an explicit shared subspace, characterized in that the method comprises the following steps:
Step 1: segmenting the training-set data into a semantic shot sequence by means of a designated algorithm;
Step 2: extracting a visual key frame for each shot in the semantic shot sequence, then obtaining visual features and audio features, and computing the eigenvalues of the mapping matrix formed from the visual features and audio features;
Step 3: computing the explicit shared subspace from the visual features and audio features;
Step 4: sorting the eigenvalues of the mapping matrix in descending order, selecting the vectors in the explicit shared subspace that correspond to the designated eigenvalues, and obtaining the visual-feature mapping matrix and the audio-feature mapping matrix from these vectors;
Step 5: on the basis of step 4, mapping the visual features and audio features into the explicit shared subspace, completing the dimensionality reduction of the visual features and audio features, and then completing the feature fusion;
Step 6: inputting the matrix obtained by feature fusion into a support vector machine for classification training, obtaining the optimal classification model by a designated method, and using it to make a preliminary judgment on whether a shot to be detected is an advertisement shot;
Step 7: on the basis of step 6, finally determining through a post-processing step whether the shot to be detected is an advertisement shot.
2. The video advertisement detection method based on an explicit shared subspace according to claim 1, characterized in that said designated algorithm is a semantic shot segmentation algorithm.
3. The video advertisement detection method based on an explicit shared subspace according to claim 1, characterized in that said designated method is the cross-validation method.
4. The video advertisement detection method based on an explicit shared subspace according to claim 1, characterized in that said visual-feature mapping matrix is:
Wherein:
A is the visual-feature mapping matrix;
X is the visual feature;
λ is the regularization coefficient;
a is the weight coefficient of the visual space and the audio space;
I is the identity matrix;
U is the trained shared subspace.
5. The video advertisement detection method based on an explicit shared subspace according to claim 4, characterized in that said audio-feature mapping matrix is:
Wherein:
B is the audio-feature mapping matrix;
Y is the audio feature.
6. The video advertisement detection method based on an explicit shared subspace according to claim 1, characterized in that said post-processing step comprises:
Step 1: setting the window length for the semantic shots;
Step 2: counting the numbers of advertisement shots and non-advertisement shots in the window, excluding the current semantic shot;
Step 3: computing the shot attribute value of the current shot within the window;
Step 4: comparing the shot attribute value of the current shot within the window with the advertisement attribute threshold and the normal-program attribute threshold respectively, to finally determine whether the shot to be detected is an advertisement shot.
7. The video advertisement detection method based on an explicit shared subspace according to claim 6, characterized in that the formula for computing the shot attribute value of the current shot within the window is:
Wherein:
l_i is the shot attribute value of the i-th shot in the window;
N_C is the number of advertisement shots in the window excluding the current semantic shot;
N_P is the number of non-advertisement shots in the window excluding the current semantic shot.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2011103356334A CN102436483A (en) | 2011-10-31 | 2011-10-31 | Video advertisement detecting method based on explicit type sharing subspace |
Publications (1)
Publication Number | Publication Date |
---|---|
CN102436483A true CN102436483A (en) | 2012-05-02 |
Family
ID=45984546
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN2011103356334A Pending CN102436483A (en) | 2011-10-31 | 2011-10-31 | Video advertisement detecting method based on explicit type sharing subspace |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN102436483A (en) |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101162470A (en) * | 2007-11-16 | 2008-04-16 | 北京交通大学 | Video frequency advertisement recognition method based on layered matching |
Patent Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101162470A (en) * | 2007-11-16 | 2008-04-16 | 北京交通大学 | Video frequency advertisement recognition method based on layered matching |
Non-Patent Citations (1)
Title |
---|
Yang Houde: "Automatic Recognition and Detection of Video Advertisements", China Master's Theses Full-text Database *
Cited By (31)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102799633B (en) * | 2012-06-26 | 2015-07-15 | 天脉聚源(北京)传媒科技有限公司 | Advertisement video detection method |
WO2014000515A1 (en) * | 2012-06-26 | 2014-01-03 | 天脉聚源(北京)传媒科技有限公司 | Advertisement video detection method |
CN102799633A (en) * | 2012-06-26 | 2012-11-28 | 天脉聚源(北京)传媒科技有限公司 | Advertisement video detection method |
CN103237233A (en) * | 2013-03-28 | 2013-08-07 | 深圳Tcl新技术有限公司 | Rapid detection method and system for television commercials |
CN103458300A (en) * | 2013-08-28 | 2013-12-18 | 天津三星电子有限公司 | Television false advertisement prompting method and system |
CN103838835A (en) * | 2014-02-25 | 2014-06-04 | 中国科学院自动化研究所 | Network sensitive video detection method |
CN103838835B (en) * | 2014-02-25 | 2017-11-21 | 中国科学院自动化研究所 | A kind of network sensitive video detection method |
CN104581396A (en) * | 2014-12-12 | 2015-04-29 | 北京百度网讯科技有限公司 | Processing method and device for promotion information |
CN104504055A (en) * | 2014-12-19 | 2015-04-08 | 常州飞寻视讯信息科技有限公司 | Commodity similarity calculation method and commodity recommending system based on image similarity |
CN104504055B (en) * | 2014-12-19 | 2017-12-26 | 常州飞寻视讯信息科技有限公司 | The similar computational methods of commodity and commercial product recommending system based on image similarity |
CN104469545A (en) * | 2014-12-22 | 2015-03-25 | 无锡天脉聚源传媒科技有限公司 | Method and device for verifying splitting effect of video clip |
CN104469545B (en) * | 2014-12-22 | 2017-09-15 | 无锡天脉聚源传媒科技有限公司 | A kind of method and apparatus for examining video segment cutting effect |
CN107133266A (en) * | 2017-03-31 | 2017-09-05 | 北京奇艺世纪科技有限公司 | The detection method and device and database update method and device of video lens classification |
CN107133266B (en) * | 2017-03-31 | 2020-02-18 | 北京奇艺世纪科技有限公司 | Method and device for detecting video shot type and method and device for updating database |
CN107682733A (en) * | 2017-10-13 | 2018-02-09 | 深圳依偎控股有限公司 | A kind of control method and system for improving user and watching video tastes degree |
CN107682733B (en) * | 2017-10-13 | 2020-04-28 | 深圳依偎控股有限公司 | Control method and system for improving user experience of watching video |
CN109446990A (en) * | 2018-10-30 | 2019-03-08 | 北京字节跳动网络技术有限公司 | Method and apparatus for generating information |
CN109522450A (en) * | 2018-11-29 | 2019-03-26 | 腾讯科技(深圳)有限公司 | A kind of method and server of visual classification |
WO2020108396A1 (en) * | 2018-11-29 | 2020-06-04 | 腾讯科技(深圳)有限公司 | Video classification method, and server |
US12106563B2 (en) | 2018-11-29 | 2024-10-01 | Tencent Technology (Shenzhen) Company Limited | Video classification method and server |
US11741711B2 (en) | 2018-11-29 | 2023-08-29 | Tencent Technology (Shenzhen) Company Limited | Video classification method and server |
CN110232357A (en) * | 2019-06-17 | 2019-09-13 | 深圳航天科技创新研究院 | A kind of video lens dividing method and system |
CN111723239A (en) * | 2020-05-11 | 2020-09-29 | 华中科技大学 | Multi-mode-based video annotation method |
CN111723239B (en) * | 2020-05-11 | 2023-06-16 | 华中科技大学 | Video annotation method based on multiple modes |
GB2613507A (en) * | 2020-08-10 | 2023-06-07 | Ibm | Dual-modality relation networks for audio-visual event localization |
US11663823B2 (en) | 2020-08-10 | 2023-05-30 | International Business Machines Corporation | Dual-modality relation networks for audio-visual event localization |
WO2022033231A1 (en) * | 2020-08-10 | 2022-02-17 | International Business Machines Corporation | Dual-modality relation networks for audio-visual event localization |
CN112214643B (en) * | 2020-10-15 | 2024-01-12 | 百度(中国)有限公司 | Video patch generation method and device, electronic equipment and storage medium |
CN112214643A (en) * | 2020-10-15 | 2021-01-12 | 百度(中国)有限公司 | Video patch generation method and device, electronic equipment and storage medium |
CN112233667B (en) * | 2020-12-17 | 2021-03-23 | 成都索贝数码科技股份有限公司 | Synchronous voice recognition method based on deep learning |
CN112233667A (en) * | 2020-12-17 | 2021-01-15 | 成都索贝数码科技股份有限公司 | Synchronous voice recognition method based on deep learning |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN102436483A (en) | Video advertisement detecting method based on explicit type sharing subspace | |
US11023523B2 (en) | Video content retrieval system | |
CN101292238B (en) | Method and system for automated rich presentation of a semantic topic | |
CN108269125B (en) | Comment information quality evaluation method and system and comment information processing method and system | |
CN111754302B (en) | Video live broadcast interface commodity display intelligent management system based on big data | |
WO2016179938A1 (en) | Method and device for question recommendation | |
US10248865B2 (en) | Identifying presentation styles of educational videos | |
CN111460221B (en) | Comment information processing method and device and electronic equipment | |
CN105930411A (en) | Classifier training method, classifier and sentiment classification system | |
CN110851718B (en) | Movie recommendation method based on long and short term memory network and user comments | |
CN103605658A (en) | Search engine system based on text emotion analysis | |
CN105069041A (en) | Video user gender classification based advertisement putting method | |
CN110287314B (en) | Long text reliability assessment method and system based on unsupervised clustering | |
Balasubramanian et al. | A multimodal approach for extracting content descriptive metadata from lecture videos | |
CN118250516B (en) | Hierarchical processing method for users | |
US20240086452A1 (en) | Tracking concepts within content in content management systems and adaptive learning systems | |
Baidya et al. | LectureKhoj: automatic tagging and semantic segmentation of online lecture videos | |
CN107844531B (en) | Answer output method and device and computer equipment | |
CN101213539B (en) | Cross descriptor learning system using non-label sample and method | |
CN116882414B (en) | Automatic comment generation method and related device based on large-scale language model | |
Vinciarelli et al. | Application of information retrieval technologies to presentation slides | |
Kechaou et al. | A novel system for video news' sentiment analysis | |
KR101838089B1 (en) | Sentimetal opinion extracting/evaluating system based on big data context for finding welfare service and method thereof | |
Zhu et al. | Identifying and modeling the dynamic evolution of niche preferences | |
CN118503363B (en) | Emotion analysis method and device, electronic equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C02 | Deemed withdrawal of patent application after publication (patent law 2001) | ||
WD01 | Invention patent application deemed withdrawn after publication |
Application publication date: 20120502 |