CN102024033B - Method for automatically detecting audio templates and segmenting video into chapters - Google Patents

Method for automatically detecting audio templates and segmenting video into chapters

Info

Publication number
CN102024033B
CN102024033B CN201010567970.1A CN201010567970A
Authority
CN
China
Prior art keywords
fragment
audio
template
frame
program
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201010567970.1A
Other languages
Chinese (zh)
Other versions
CN102024033A (en)
Inventor
董远
王乐滋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Posts and Telecommunications
Original Assignee
Beijing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Posts and Telecommunications filed Critical Beijing University of Posts and Telecommunications
Priority to CN201010567970.1A priority Critical patent/CN102024033B/en
Publication of CN102024033A publication Critical patent/CN102024033A/en
Application granted granted Critical
Publication of CN102024033B publication Critical patent/CN102024033B/en

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A method for automatically detecting audio templates and segmenting video into chapters. Using one week of program audio data, it quickly learns content-repeating fragments from voiceprint features, merges and clusters the fragments into candidate templates, determines template type and screens template files from statistics on fragment length, occurrence count, and time distribution, and then uses the templates to segment new programs into chapters automatically. Because the method retrieves on audio and builds the template library dynamically, it overcomes the heavy computation and slow detection of video-based methods, handles program fragments that have identical audio content but different picture content, and also solves the problem of "static" templates in the database.

Description

Method for automatically detecting audio templates and segmenting video into chapters
Technical field
The invention belongs to the field of copy detection on the audio content of video programs and automatic chapter segmentation of programs, and specifically relates to a method for automatically detecting audio templates and segmenting video into chapters.
Background technology
Chapter segmentation of a video program means marking specific fragments (such as advertisements or program special effects) in a large, long-duration video program so that users can browse it conveniently.
Current traditional methods extract features from video frames and process them; they are built on images. Common examples are station-logo detection and video recognition.
Video recognition can use the template information in a database to locate and mark fragments quickly and accurately, but in current methods the database templates are added manually, the information in the database is fairly fixed, and data not in the database cannot be detected. In addition, some program fragments have identical audio content but different picture content over a long duration, such as the review section of a news program; for such fragments the common image-based detection methods do not apply. As for station-logo detection, more and more video uses the same logo across parts that should be assigned to different chapters (for example, advertisements and the program itself), so logo detection fails.
The video-based methods above also suffer from heavy computation and slow detection. Moreover, current audio-based chapter-segmentation methods all rely on predefined templates: templates are added to a database manually and the audio under test is compared against them. Their defect is likewise that the database templates are "static", so data not in the database cannot be detected.
Summary of the invention
To overcome the deficiencies of these two classes of methods, video-based detection and template-based audio detection, the present invention proposes a method for automatically detecting audio templates and segmenting video into chapters. It can learn audio templates quickly and robustly from very large volumes of audio, and uses the templates to segment new video into chapters accurately.
The invention provides a method for automatically detecting audio templates and segmenting video into chapters, comprising a template-learning stage and a video chapter-segmentation stage.
The template-learning stage comprises the following steps:
1) Take the audio data of the past week as training samples and preprocess the 7 days (7×24 hours) of 5513 Hz audio data. Divide the full 7×24 hours of audio into files of one hour each. Using the Kullback-Leibler distance of the audio, segment each one-hour file at cut points to obtain short audio fragments. To prevent over-segmentation, cluster these fragments, check the duration of each, and splice fragments shorter than 3 seconds onto adjacent short fragments. Then, for the 5513 Hz audio, take frames with a window length of 0.37 s and a step of 40 ms and judge whether each frame is silent. The energy of each frame is eFr and the energy threshold is TE, according to the formulas:

eFr = Σ_w x_i² − mean_W
TE = (Σe)/(α·n) + β·e_min

where w is the number of sample points in the window, n is the number of frames in the whole file, x_i is the value of each sample point, mean_W is the window mean, Σe is the summed frame energy, e_min is the minimum frame energy, and α, β are preset parameters. If eFr ≤ TE, the frame is judged silent; if silent frames make up more than half of an audio fragment, the fragment is defined as a silent fragment.
2) With a window length of 0.37 seconds and a step of 40 ms, apply a discrete Fourier transform (DFT) to the 5513 Hz audio file and, according to the Mel frequency formula

Mel(f) = 2595·lg(1 + f/700)

convert the 20 Hz to 3000 Hz part of the actual band to the Mel scale and divide it into 17 sub-bands. Compute the energy difference between each pair of adjacent sub-bands; if the difference is at least the set threshold, output 1, otherwise 0. This extracts a 16-bit binary string as the feature value of each frame.
3) Build a hash table from the data of all frames of the week's audio. The key of the hash table is the 16-bit feature value; the value stores the frame numbers that have this feature value and the fragments they belong to. For every non-silent audio fragment A, look up in the hash table, for each of its frames, the neighbor frames with the same key. From the lookup results of each frame and the numbering of the fragments in which the neighbor frames lie, take as candidate matching fragments of A those fragments in which at least half of A's frames find neighbor frames. Then compute the similarity between A and each candidate fragment one by one. For two fragments A and B, arrange the frames in which matching features can be found in chronological order; let a_1, a_2, ..., a_m be the numbers of the frames in A that find matching partners in B, and b_1, b_2, ..., b_n the numbers of the frames in B matched by features in A. Compute two coefficients s1, s2:

s1 = (m + n) / (2·min(N_A, N_B))

s2 = (Σ_{i=1..m} χ(a_i) + Σ_{i=1..n} χ(b_i)) / (2·min(N_A, N_B))

where N_A and N_B are the total frame counts of A and B, χ(·) is an indicator that is 1 when the gap between consecutive matched frame numbers is below the set threshold t, and 0 otherwise. From s1 and s2 compute the similarity of the two fragments, S = w1·s1 + w2·s2, where w1 and w2 are set constant coefficients, usually w1 < w2. Candidates whose similarity S exceeds the threshold T1 are kept as matching fragments of A.
4) Keep each audio fragment A whose number of matching fragments exceeds a threshold T2, and check whether, within a certain time interval, there are other fragments whose matching-fragment count also exceeds the set threshold; if so, keep the fragment, otherwise delete it. The result is a series of audio fragments that repeat within the week.
5) Using the temporal information of the fragments, splice and merge the retained fragments that belong to the same day. The pairwise merging rule is: for two same-day fragments A and B, let A start at Tas and end at Tae, and B start at Tbs and end at Tbe, with Tae < Tbs; if |Tae − Tbs| < TDur, then A, B, and the gap between them are merged into one fragment that starts at Tas and ends at Tbe.
6) Classify the merged fragments. The rule is: if two merged fragments partly match each other, they are put in the same class. Classes also satisfy transitivity: if A and B are in the same class and B and C are in the same class, then A and C are in the same class.
7) For the fragments of each content-repeating class, compute 3 indices and judge the program type. The rules are as follows:

Index 1: Dur = N_k² / max_k(N_k²)

Index 2: Distrb = σ_k² / max_k(σ_k²)

Index 3: T_k

where N_k = (1/n)·Σ t_i is the average length of the fragments in class K, n is the number of fragments in class K, and t_i is the duration of the i-th fragment; σ_k² = (1/n)·Σ (c_i − c)² describes the time distribution of class K fragments, where c is the central instant of the week and c_i is the central instant of fragment i in class K; T_k is the number of times class K occurred during the week. The 3 indices are fused:

Type = c1·Dur + c2·Distrb + c3·T

where C1, C2, C3 are 3 set weights. If Type < T1, the class is judged to be a program special effect; if T1 ≤ Type < T2, a station promo; if Type ≥ T2, an advertisement.
8) After type judgment is complete, screen the audio fragments to build the template library: within each audio class, among the repeating fragments that match each other, keep the longest one, and store its features together with the judged program type in the template library to generate a template file.
The video chapter-segmentation stage:
For a new program, the system uses the template library files to do copy detection on the new program, finds the fragments in the program that have the same content as a template file, and marks their time and type. It comprises the following steps:
1) Using the methods of steps 2) and 3) of the template-learning stage, extract features from the new program's audio, build a hash table, and match the template library files against the new video program one by one.
2) For template A, look up in the hash table, for each of its 16-bit frame features, the audio features that match it.
3) feature in A is alignd in time with the feature of its coupling, and calculation template file and and its time upper equitant program audio part between Hamming distance hi frame by frame, again by distance divided by overlap part frame number in the hope of similarity distance mark Dsore wherein overlap is the frame number that program and template think lap.
4) Take program audio parts whose score is below the set threshold as candidate matching fragments of the template; the one with the lowest score is the best matching fragment. Any other candidate whose time interval from the best one exceeds the time-interval threshold, and whose score Dscore differs from the best score by less than the set score-shift threshold, is still regarded as a matching fragment. Mark the start time and duration of each overlapping part and use the template type to label that part of the program.
The beneficial effects of the invention are: it uses the fact that specific fragments repeat in content within a week as the point of attack, and uses voiceprint features to find these repeating fragments quickly in a large amount of data; the similarity decision method and the stability of the repetition guarantee search accuracy; the program type of an audio template is decided from the duration, repetition count, and time-distribution variance of the determined audio fragments; in addition, the learned audio templates are used to segment new programs into chapters automatically, guaranteeing segmentation speed and accurate temporal localization. Because the method retrieves on audio and builds the template library dynamically, it overcomes the heavy computation and slow detection of video-based methods, handles program fragments that have identical audio content but different picture content, and also solves the problem of "static" templates in the database.
Description of the drawings
Fig. 1 is a flowchart of the template-learning part of the method for automatically detecting audio templates and segmenting video into chapters;
Fig. 2 is a flowchart of the video chapter-segmentation part of the method;
Fig. 3 is the overall architecture of the method and system;
Fig. 4 is a schematic diagram of the window length and step used in audio feature extraction;
Fig. 5 is a schematic diagram of the distance-score computation between template audio fragments and program audio in the video chapter-segmentation stage.
Embodiment
The invention is further elaborated below with reference to the drawings and specific embodiments, so that those skilled in the art can understand and implement the proposed technical scheme without expending creative labor.
The technical problems to be solved by the present invention include:
1. Using past program audio data, let the machine learn template files from a large amount of data and build the template library dynamically;
2. Splitting the audio files and extracting voiceprint features that are robust and suited to fast search and matching;
3. Similarity matching between two audio fragments based on the extracted features;
4. Clustering audio fragments, judging the program type of each audio class, and picking the template file out of each class;
5. Using the template library files to match incoming programs and then segment them into chapters.
Addressing the above technical problems, the present invention proposes a method for automatically detecting audio templates and segmenting video into chapters, comprising two stages: template learning and video chapter segmentation.
With reference to Fig. 1, the template-learning stage of the method comprises the following steps:
Step 101: Preferably, the invention takes the program data of the past week as training data and learns template files from it; every week, new templates are learned from the previous week's program data and added to the template library. Preprocess the 7 days (7×24 hours) of 5513 Hz audio data and divide the full 7×24 hours of audio into files of one hour each. Using the Kullback-Leibler distance of the audio, segment each one-hour file at cut points to obtain short audio fragments. To prevent over-segmentation, cluster these fragments, check the duration of each, and splice fragments shorter than 3 seconds onto adjacent short fragments. Then, for the 5513 Hz audio, take frames with a window length of 0.37 s and a step of 40 ms and judge whether each frame is silent. The energy of each frame is eFr and the energy threshold is TE, according to the formulas:

eFr = Σ_w x_i² − mean_W
TE = (Σe)/(α·n) + β·e_min

where w is the number of sample points in the window, n is the number of frames in the whole file, x_i is the value of each sample point, mean_W is the window mean, Σe is the summed frame energy, e_min is the minimum frame energy, and α, β are preset parameters. If eFr ≤ TE, the frame is judged silent; if silent frames make up more than half of an audio fragment, the fragment is defined as a silent fragment.
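For illustration only, the following Python sketch implements this silence test. The exact form of the adaptive threshold TE is an assumed reading of the formula above, and the values of alpha and beta are placeholders, not taken from the patent.

```python
import numpy as np

def is_silent_fragment(samples, sr=5513, win_s=0.37, step_s=0.04,
                       alpha=1.0, beta=0.5):
    """Judge a fragment silent when more than half of its frames fall below TE.

    Sketch of step 101; TE = sum(eFr) / (alpha * n) + beta * min(eFr)
    is an assumed reconstruction of the patent's threshold formula.
    """
    samples = np.asarray(samples, dtype=float)
    win, step = int(sr * win_s), int(sr * step_s)
    frames = [samples[i:i + win]
              for i in range(0, len(samples) - win + 1, step)]
    # Per-frame energy eFr, with the window mean subtracted off.
    e = np.array([np.sum(f ** 2) - f.mean() for f in frames])
    n = len(e)
    te = e.sum() / (alpha * n) + beta * e.min()   # adaptive threshold TE
    return np.count_nonzero(e <= te) > n / 2
```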
Step 102: With a window length of 0.37 seconds and a step of 40 ms, apply a discrete Fourier transform (DFT) to the 5513 Hz audio file and, according to the Mel frequency formula

Mel(f) = 2595·lg(1 + f/700)

convert the 20 Hz to 3000 Hz part of the actual band to the Mel scale and divide it into 17 sub-bands. Compute the energy difference between each pair of adjacent sub-bands; if the difference is at least the set threshold, output 1, otherwise 0. This extracts a 16-bit binary string as the feature value of each frame.
As shown in Fig. 4, frame 1 uses the sample data from 0 to 0.37 seconds: this part is given a discrete Fourier transform, the 20 Hz to 3000 Hz part of its actual band is converted to the Mel scale and divided into 17 sub-bands, and the energy difference between each pair of adjacent sub-bands is computed; if the difference is at least the set threshold the output is 1, otherwise 0, extracting a 16-bit binary string as the feature value of frame 1. The window then slides 40 milliseconds, i.e., the sample points from 40 milliseconds to 0.41 seconds are used to repeat the above steps and extract the 16-bit feature value of frame 2, and so on until all audio frames have features.
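A minimal Python sketch of this per-frame fingerprint follows. Spacing the 17 sub-bands uniformly on the Mel scale and comparing the band-energy differences against a zero threshold are assumptions; the patent only specifies 17 sub-bands and "a set threshold".

```python
import numpy as np

def frame_fingerprint(frame, sr=5513, n_bands=17, thresh=0.0):
    """16-bit fingerprint from adjacent Mel sub-band energy differences."""
    frame = np.asarray(frame, dtype=float)
    spectrum = np.abs(np.fft.rfft(frame)) ** 2
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)
    mel = lambda f: 2595 * np.log10(1 + f / 700)   # the patent's Mel formula
    # 17 sub-bands spaced uniformly on the Mel scale between 20 Hz and 3000 Hz.
    edges_hz = 700 * (10 ** (np.linspace(mel(20), mel(3000), n_bands + 1) / 2595) - 1)
    band_e = [spectrum[(freqs >= lo) & (freqs < hi)].sum()
              for lo, hi in zip(edges_hz[:-1], edges_hz[1:])]
    bits = (np.diff(band_e) >= thresh).astype(int)   # 16 ones and zeros
    return int("".join(map(str, bits)), 2)           # 16-bit feature value
```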
Step 103: Build a hash table from the data of all frames of the week's audio. The key of the hash table is the 16-bit feature value; the value stores the frame numbers that have this feature value and the fragments they belong to. For every non-silent audio fragment A, look up in the hash table, for each of its frames, the neighbor frames with the same key. From the lookup results of each frame and the numbering of the fragments in which the neighbor frames lie, take as candidate matching fragments of A those fragments in which at least half of A's frames find neighbor frames; then compute the similarity between A and each candidate fragment one by one.

For fragment A and one of its candidate matching fragments B, arrange the frames in which matching features can be found in chronological order; let a_1, a_2, ..., a_m be the numbers of the frames in A that find matching partners in B, and b_1, b_2, ..., b_n the numbers of the frames in B matched by features in A. Compute two coefficients s1, s2:

s1 = (m + n) / (2·min(N_A, N_B))

s2 = (Σ_{i=1..m} χ(a_i) + Σ_{i=1..n} χ(b_i)) / (2·min(N_A, N_B))

where N_A and N_B are the total frame counts of A and B, and χ(·) is an indicator that is 1 when the gap between consecutive matched frame numbers is below the set threshold t; preferably t is 3. From s1 and s2 compute the similarity of the two fragments, S = w1·s1 + w2·s2, where w1 and w2 are set constant coefficients, usually w1 < w2; preferably w1 = 1/3 and w2 = 2/3. Candidates whose similarity S exceeds the threshold T1 are kept as matching fragments of A; preferably the invention sets T1 = 0.5.
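The inverted-index search and similarity score of step 103 can be sketched in Python as follows, assuming each fragment is a list of 16-bit frame fingerprints. The reading of χ as a continuity indicator with gap threshold t is an assumption, and the helper names are illustrative, not from the patent.

```python
from collections import defaultdict

def build_index(fragments):
    """Hash table: 16-bit fingerprint -> list of (fragment_id, frame_no)."""
    index = defaultdict(list)
    for frag_id, fps in enumerate(fragments):
        for frame_no, fp in enumerate(fps):
            index[fp].append((frag_id, frame_no))
    return index

def candidates(frag_id, fragments, index):
    """Fragments in which at least half of fragment frag_id's frames hit a neighbor."""
    hits = defaultdict(set)   # candidate id -> frames of A that found a neighbor there
    for frame_no, fp in enumerate(fragments[frag_id]):
        for other_id, _ in index[fp]:
            if other_id != frag_id:
                hits[other_id].add(frame_no)
    half = len(fragments[frag_id]) / 2
    return [c for c, frames in hits.items() if len(frames) >= half]

def similarity(a_idx, b_idx, n_a, n_b, t=3, w1=1/3, w2=2/3):
    """S = w1*s1 + w2*s2; a_idx/b_idx are sorted matched frame numbers in A and B."""
    def chi_sum(idx):   # frames whose gap to the next matched frame is below t
        return sum(1 for i, j in zip(idx, idx[1:]) if j - i < t)
    m, n = len(a_idx), len(b_idx)
    denom = 2 * min(n_a, n_b)
    s1 = (m + n) / denom
    s2 = (chi_sum(a_idx) + chi_sum(b_idx)) / denom
    return w1 * s1 + w2 * s2   # keep the candidate when S > T1 = 0.5
```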
Step 104: Keep each audio fragment A whose number of matching fragments exceeds a threshold T2; when one week of data is used as training samples, the invention preferably sets T2 = 7. Also check whether, within a certain time interval of A, there are other fragments whose matching-fragment count also exceeds the set threshold T2; if so, keep the fragment, otherwise delete it. The result is a series of audio fragments that repeat within the week.
Step 105: Using the temporal information of the fragments, splice and merge the retained fragments that belong to the same day. The pairwise merging rule is: for two same-day fragments A and B, let A start at Tas and end at Tae, and B start at Tbs and end at Tbe, with Tae < Tbs; if |Tae − Tbs| < TDur (preferably TDur is set to 10 seconds), then A, B, and the gap between them are merged into one fragment that starts at Tas and ends at Tbe.
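A short sketch of this same-day merging rule, assuming fragments are (start, end) pairs in seconds:

```python
def merge_same_day(frags, t_dur=10.0):
    """Merge time-sorted (start, end) fragments whose gap is below t_dur seconds."""
    merged = []
    for start, end in sorted(frags):
        if merged and start - merged[-1][1] < t_dur:
            merged[-1] = (merged[-1][0], end)   # fuse with the previous fragment
        else:
            merged.append((start, end))
    return merged

# merge_same_day([(0, 30), (35, 60), (200, 230)]) -> [(0, 60), (200, 230)]
```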
Step 106: Classify the merged fragments. The rule is: if two merged fragments partly match each other, they are classed together; that is, if some data in fragment A was judged in step 104 to match part of fragment B, then A and B belong to the same class. Classes also satisfy transitivity: if A and B are in the same class and B and C are in the same class, then A and C are in the same class.
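The transitivity rule makes this clustering a connected-components problem. The patent does not prescribe an algorithm, but a standard union-find pass, sketched below, realizes it:

```python
def cluster(n_frags, match_pairs):
    """Group fragments into classes: matched pairs share a class, transitively."""
    parent = list(range(n_frags))

    def find(x):   # root of x's class, with path compression
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in match_pairs:   # union the classes of each matched pair
        parent[find(a)] = find(b)

    classes = {}
    for i in range(n_frags):
        classes.setdefault(find(i), []).append(i)
    return list(classes.values())
```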
Step 107: For the fragments of each content-repeating class, compute 3 indices and judge the program type. The rules are as follows:

Index 1: Dur = N_k² / max_k(N_k²)

Index 2: Distrb = σ_k² / max_k(σ_k²)

Index 3: T_k

where N_k = (1/n)·Σ t_i is the average length of the fragments in class K, n is the number of fragments in class K, and t_i is the duration of the i-th fragment; σ_k² = (1/n)·Σ (c_i − c)² describes the time distribution of class K fragments, where c is the central instant of the week and c_i is the central instant of fragment i in class K; T_k is the number of times class K occurred during the week. The 3 indices are fused:

Type = c1·Dur + c2·Distrb + c3·T

where C1, C2, C3 are 3 set weights. If Type < T1, the class is judged to be a program special effect; if T1 ≤ Type < T2, a station promo; if Type ≥ T2, an advertisement.
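A sketch of the three indices and their fusion follows. The weights c1, c2, c3, the thresholds t1, t2, and the normalization of the occurrence count T_k are placeholders, since the patent leaves these values open.

```python
import statistics

def classify(classes, week_center, c1=0.4, c2=0.3, c3=0.3, t1=0.3, t2=0.6):
    """classes: list of fragment lists; each fragment is a (start, end) pair in seconds."""
    avg_len = [statistics.mean(e - s for s, e in cls) for cls in classes]
    spread = [statistics.mean(((s + e) / 2 - week_center) ** 2 for s, e in cls)
              for cls in classes]
    count = [len(cls) for cls in classes]
    max_len2 = max(a ** 2 for a in avg_len)
    max_spread = max(spread) or 1.0
    labels = []
    for k in range(len(classes)):
        dur = avg_len[k] ** 2 / max_len2      # index 1: normalized squared avg length
        distrb = spread[k] / max_spread       # index 2: normalized time-distribution variance
        t = count[k] / max(count)             # index 3: normalized occurrence count
        score = c1 * dur + c2 * distrb + c3 * t
        labels.append("program effect" if score < t1
                      else "station promo" if score < t2 else "advertisement")
    return labels
```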
Step 108: After type judgment is complete, screen the audio fragments to build the template library: within each audio class, among the repeating fragments that match each other, keep the longest one, and store its features together with the judged program type in the template library to generate a template file.
With reference to Fig. 2, in the video chapter-segmentation stage of the method, for a new program, the system uses the template library files to do copy detection on the new program, finds the fragments in the program that have the same content as a template file, and marks their time and type, comprising the following steps:
Step 201: In the same way as the methods described in steps 102 and 103 of the template-learning stage, extract features from the new program's audio and build a hash table, then match the template library files against the new video program one by one; the matching proceeds as described in steps 202, 203, and 204 below.
Step 202: For a template audio fragment A, look up in the hash table, for each of its 16-bit frame features, the audio features that match it.
Step 203: Align the features of A in time with their matching features, compute the frame-by-frame Hamming distances between the template file and the program audio part that overlaps it in time, and divide the summed distance by the number of overlapping frames to get the similarity score.
As shown in Fig. 5, suppose that in step 202 frame 3 in the middle of template audio fragment A and frame 6 in the new program's audio file are detected as a matching pair; then frame 3 of A is aligned in time with frame 6 of the program, and the frame-by-frame Hamming distance h_i between A and the overlapping program part is computed, i.e., frames 1 to m of A against frames 4 to m+3 of the program. The per-frame Hamming distances are then used to compute the distance score Dscore = (Σ h_i)/overlap, where overlap is the number of frames in the overlapping part of the program and the template; in this example overlap equals the frame count m of A.
Step 204: Take program audio parts whose score is below the set threshold as candidate matching fragments of the template; the one with the lowest score is the best matching fragment. Any other candidate whose time interval from the best one exceeds the time-interval threshold (preferably set to 1.2 times the template fragment duration), and whose score differs from the best score by less than the set score-shift threshold (preferably set to 2), is still regarded as a matching fragment. Mark the start time and duration of each overlapping part and use the template type to label that part of the program.
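A sketch of the Dscore matching in steps 202 to 204, assuming per-frame 16-bit fingerprints for both template and program. Scoring every alignment offset, as done here, is a brute-force stand-in for the hash-anchored alignment of step 202, and score_th is a placeholder value.

```python
def dscore(template, program, offset):
    """Mean per-frame Hamming distance between the template and the program slice."""
    pairs = zip(template, program[offset:offset + len(template)])
    dists = [bin(a ^ b).count("1") for a, b in pairs]   # Hamming distance h_i
    return sum(dists) / len(dists)                       # divided by overlap frames

def match_template(template, program, score_th=4.0, shift_th=2.0, frame_s=0.04):
    """Best-scoring alignment plus distant candidates whose score stays close to it."""
    offsets = range(len(program) - len(template) + 1)
    cands = [(dscore(template, program, off), off) for off in offsets]
    cands = [(s, off) for s, off in cands if s < score_th]
    if not cands:
        return []
    best_s, best_off = min(cands)
    gap_th = 1.2 * len(template)   # time-interval threshold: 1.2x template length
    kept = [(best_s, best_off)] + [
        (s, off) for s, off in cands
        if abs(off - best_off) > gap_th and s - best_s < shift_th]
    return [(off * frame_s, len(template) * frame_s)   # (start time, duration) in s
            for _, off in kept]
```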

Claims (1)

1. A method for automatically detecting audio templates and segmenting a video program into chapters, characterized in that it takes the fact that specific fragments repeat in content within a week as its point of attack to learn audio templates quickly and robustly from one week of audio data, and uses the templates to segment new programs into chapters accurately, comprising a template-learning stage and a video chapter-segmentation stage, wherein the template-learning stage comprises the following steps:
Step 1: preprocess the program audio files of one week and judge silent fragments;
Step 2: for each audio fragment, extract robust voiceprint features;
Step 3: using the week's audio data features, build a hash table and search for matching fragments;
Step 4: among the fragments obtained in step 3, keep each audio fragment A whose number of matching fragments exceeds a threshold, and check whether, within a certain time interval, there are other fragments whose matching-fragment count also exceeds the set threshold; if so, keep the fragment, otherwise delete it; the result is a series of audio fragments whose content repeats within the week;
Step 5: among the fragments screened out in step 4, for two same-day fragments A and B, let A start at Tas and end at Tae, and B start at Tbs and end at Tbe, with Tae < Tbs; if |Tae − Tbs| < TDur, then A, B, and the gap between them are merged into one fragment that starts at Tas and ends at Tbe;
Step 6: cluster the fragments merged in step 5 into several audio classes, the classification rule being: if two merged fragments partly match each other, they are put in the same class; classes also satisfy transitivity: if A and B are in the same class and B and C are in the same class, then A and C are in the same class;
Step 7: for each class collated in step 6, judge its program type;
Step 8: within each audio class, among the repeating fragments that match each other, keep the longest one, and store its features together with the judged program type in the template library to generate a template file;
Wherein said step 1 specifically comprises: taking the audio data of the past week as training samples and dividing this 5513 Hz audio data into files of one hour each; using the Kullback-Leibler distance of the audio, segmenting each one-hour file at cut points to obtain short audio fragments; to prevent over-segmentation, clustering these fragments, checking the duration of each, and splicing fragments shorter than 3 seconds onto adjacent short fragments; then, for the 5513 Hz audio, taking frames with a window length of 0.37 s and a step of 40 ms and judging whether each frame is silent, the energy of each frame being eFr and the energy threshold TE, according to the formulas eFr = Σ_w x_i² − mean_W and TE = (Σe)/(α·n) + β·e_min, where w is the number of sample points in the window, n is the number of frames in the whole file, x_i is the value of each sample point, and α, β are preset parameters; if eFr ≤ TE, the frame is judged silent; if silent frames make up more than half of an audio fragment, the fragment is defined as a silent fragment;
Wherein said step 2 specifically comprises: with a window length of 0.37 seconds and a step of 40 ms, applying a discrete Fourier transform (DFT) to the 5513 Hz audio file and, according to the Mel frequency formula Mel(f) = 2595·lg(1 + f/700), converting the 20 Hz to 3000 Hz part of the actual band to the Mel scale and dividing it into 17 sub-bands; computing the energy difference between each pair of adjacent sub-bands; if the difference is at least the set threshold, the output is 1, otherwise 0; extracting a 16-bit binary string as the feature value of each frame of audio data;
Wherein said step 3 specifically comprises: building a hash table from the data of all frames of the week's audio, the key of the hash table being the 16-bit feature value and the value storing the frame numbers that have this feature value and the fragments they belong to; for every non-silent audio fragment A, looking up in the hash table, for each of its frames, the neighbor frames with the same key; from the lookup results of each frame and the numbering of the fragments in which the neighbor frames lie, taking as candidate matching fragments of A those fragments in which at least half of A's frames find neighbor frames; then computing the similarity between A and each candidate fragment one by one, and keeping candidates whose similarity exceeds the threshold as matching fragments of A;
In the video chapter-segmentation stage, for a new program, the system uses the template library files to do copy detection on the new program, finds the fragments in the program that have the same content as a template file, and marks their time and type, comprising the following steps:
Step 1: in the same way as the methods of steps 2 and 3 of the template-learning stage, extract features from the new program's audio and build a hash table;
Step 2: match the template library files against the new video program one by one; for each template, look up in the hash table, for each of its 16-bit frame features, the audio features that match it;
Step 3: compute the similarity distance score Dscore between the template file and the new program's fragment data;
Step 4: select and label the fragments of the new program that match the template file;
Step 3 of the video chapter-segmentation stage specifically comprises: aligning the features of the template file in time with the features of the matching program file, computing the frame-by-frame Hamming distances h_i between the template file and the program audio part that overlaps it in time, and dividing the summed distance by the number of overlapping frames to get the similarity distance score Dscore = (Σ h_i)/overlap, where overlap is the number of frames in the overlapping part of the program and the template;
Step 4 of the video chapter-segmentation stage specifically comprises: taking program audio parts whose score is below the set threshold as candidate matching fragments of the template, the one with the lowest score being the best matching fragment; any other candidate whose time interval from the best matching fragment exceeds the time-interval threshold, and whose similarity distance score Dscore computed in step 3 differs from that of the best matching fragment by less than the set score-shift threshold, is still regarded as a matching fragment, wherein the time-interval threshold equals 1.2 times the template duration and the score-shift threshold equals 2; marking the start time and duration of each overlapping part and using the template type to label that part of the program;
In the above method for automatically detecting audio templates and segmenting a video program into chapters, the template-learning stage is further characterized in that the similarity between two fragments A and B in step 3 is decided as follows: arrange the frames in which matching features can be found in chronological order; let a_1, a_2, ..., a_m be the numbers of the frames in A that find matching partners in B, and b_1, b_2, ..., b_n the numbers of the frames in B matched by features in A; compute two coefficients s1, s2:

s1 = (m + n) / (2·min(N_A, N_B))

s2 = (Σ_{i=1..m} χ(a_i) + Σ_{i=1..n} χ(b_i)) / (2·min(N_A, N_B))

where t is the set threshold; from s1 and s2 compute the similarity of the two fragments,
S = w1·s1 + w2·s2, where w1 and w2 are set constant coefficients; candidates whose similarity S exceeds the threshold T1 are kept as matching fragments of A;
In the above method for automatically detecting audio templates and segmenting a video program into chapters, the template-learning stage is further characterized in that step 7 comprises the computation of 3 indices and the program-type judgment of each audio class:

Index 1: Dur = N_k² / max_k(N_k²)

Index 2: Distrb = σ_k² / max_k(σ_k²)

Index 3: T_k

where N_k = (1/n)·Σ t_i is the average length of the fragments in class K, n is the number of fragments in class K, and t_i is the duration of the i-th fragment; σ_k² = (1/n)·Σ (c_i − c)² describes the time distribution of class K fragments, where c is the central instant of the week and c_i is the central instant of fragment i in class K; T_k is the number of times class K occurred during the week; the 3 indices are then fused to judge the program type;

The fusion of the above 3 indices and the program-type judgment of the template file specifically comprise computing the fusion coefficient Type and comparing it with the set thresholds:

Type = c1·Dur + c2·Distrb + c3·T

where C1, C2, C3 are 3 set weights; if Type < T1, the class is judged to be a program special effect; if T1 ≤ Type < T2, a station promo; if Type ≥ T2, an advertisement.
CN201010567970.1A 2010-12-01 2010-12-01 Method for automatically detecting audio templates and segmenting video into chapters Expired - Fee Related CN102024033B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201010567970.1A CN102024033B (en) 2010-12-01 2010-12-01 Method for automatically detecting audio templates and segmenting video into chapters

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201010567970.1A CN102024033B (en) 2010-12-01 2010-12-01 Method for automatically detecting audio templates and segmenting video into chapters

Publications (2)

Publication Number Publication Date
CN102024033A CN102024033A (en) 2011-04-20
CN102024033B true CN102024033B (en) 2016-01-20

Family

ID=43865330

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201010567970.1A Expired - Fee Related CN102024033B (en) 2010-12-01 2010-12-01 Method for automatically detecting audio templates and segmenting video into chapters

Country Status (1)

Country Link
CN (1) CN102024033B (en)

Families Citing this family (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103379364B (en) * 2012-04-26 2018-08-03 腾讯科技(深圳)有限公司 Processing method, device, video server and the system of video file
CN103021440B (en) 2012-11-22 2015-04-22 腾讯科技(深圳)有限公司 Method and system for tracking audio streaming media
CN103237233B (en) * 2013-03-28 2017-01-25 深圳Tcl新技术有限公司 Rapid detection method and system for television commercials
CN104091598A (en) * 2013-04-18 2014-10-08 腾讯科技(深圳)有限公司 Audio file similarity calculation method and device
CN105185401B (en) * 2015-08-28 2019-01-01 广州酷狗计算机科技有限公司 The method and device of synchronized multimedia listed files
CN106548793A (en) * 2015-09-16 2017-03-29 中兴通讯股份有限公司 Storage and the method and apparatus for playing audio file
CN106331844A (en) * 2016-08-17 2017-01-11 北京金山安全软件有限公司 Method and device for generating subtitles of media file and electronic equipment
CN108253977B (en) * 2016-12-28 2020-11-24 沈阳美行科技有限公司 Generation method and generation device of incremental data for updating navigation data
CN107609149B (en) * 2017-09-21 2020-06-19 北京奇艺世纪科技有限公司 Video positioning method and device
CN108513140B (en) * 2018-03-05 2020-10-16 北京明略昭辉科技有限公司 Method for screening repeated advertisement segments in audio and generating wool audio
CN108447501B (en) * 2018-03-27 2020-08-18 中南大学 Pirated video detection method and system based on audio words in cloud storage environment
CN108763492A (en) * 2018-05-29 2018-11-06 四川远鉴科技有限公司 A kind of audio template extracting method and device
CN112863547B (en) * 2018-10-23 2022-11-29 腾讯科技(深圳)有限公司 Virtual resource transfer processing method, device, storage medium and computer equipment
CN109547850B (en) * 2018-11-22 2021-04-06 杭州秋茶网络科技有限公司 Video shooting error correction method and related product
CN110400559B (en) * 2019-06-28 2020-09-29 北京达佳互联信息技术有限公司 Audio synthesis method, device and equipment
CN110717063B (en) * 2019-10-18 2022-02-11 上海华讯网络系统有限公司 Method and system for verifying and selectively archiving IP telephone recording file
CN111883139A (en) * 2020-07-24 2020-11-03 北京字节跳动网络技术有限公司 Method, apparatus, device and medium for screening target voices
CN111863023B (en) * 2020-09-22 2021-01-08 深圳市声扬科技有限公司 Voice detection method and device, computer equipment and storage medium
CN115205635B (en) * 2022-09-13 2022-12-02 有米科技股份有限公司 Weak supervision self-training method and device of image-text semantic alignment model

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101420618A (en) * 2008-12-02 2009-04-29 西安交通大学 Adaptive telescopic video encoding and decoding construction design method based on interest zone
CN101594527A (en) * 2009-06-30 2009-12-02 成都艾索语音技术有限公司 The dual stage process of high Precision Detection template from audio and video streams

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101420618A (en) * 2008-12-02 2009-04-29 西安交通大学 Adaptive telescopic video encoding and decoding construction design method based on interest zone
CN101594527A (en) * 2009-06-30 2009-12-02 成都艾索语音技术有限公司 The dual stage process of high Precision Detection template from audio and video streams

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
基于时空融合的视频分割算法研究 (Research on video segmentation algorithms based on spatio-temporal fusion); 李宏 et al.; 《信号处理》 (Signal Processing); 2009-01-31; Vol. 25, No. 1; pp. 72-76 *

Also Published As

Publication number Publication date
CN102024033A (en) 2011-04-20

Similar Documents

Publication Publication Date Title
CN102024033B (en) Method for automatically detecting audio templates and segmenting video into chapters
CN110322738B (en) Course optimization method, device and system
CN101616264B (en) Method and system for cataloging news video
CN101710490B (en) Method and device for compensating noise for voice assessment
CN102890778A (en) Content-based video detection method and device
CN101221760B (en) Audio matching method and system
CN104731954A (en) Music recommendation method and system based on group perspective
CN110213670A (en) Method for processing video frequency, device, electronic equipment and storage medium
CN101159834A (en) Method and system for detecting repeatable video and audio program fragment
TW201432674A (en) Audio identifying method and audio identification device using the same
CN104778230B (en) A kind of training of video data segmentation model, video data cutting method and device
CN110210294A (en) Evaluation method, device, storage medium and the computer equipment of Optimized model
CN107609149B (en) Video positioning method and device
CN103871424A (en) Online speaking people cluster analysis method based on bayesian information criterion
CN106098079A (en) Method and device for extracting audio signal
CN101727441B (en) Evaluating method and evaluating system targeting Chinese name identifying system
CN104505101A (en) Real-time audio comparison method
CN102623007B (en) Audio characteristic classification method based on variable duration
CN109712642A (en) It is a kind of that precisely quickly monitoring method is broadcasted in advertisement
Harb et al. Robust speech music discrimination using spectrum's first order statistics and neural networks
CN109857842A (en) A kind of method and device of report barrier text identification
Martens et al. The COST278 broadcast news segmentation and speaker clustering evaluation-overview, methodology, systems, results
CN108717851B (en) Voice recognition method and device
CN105843957A (en) Depth sorting method and system for microblogs
CN111382302A (en) Audio sample retrieval method based on variable speed template

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20160120

Termination date: 20211201