CN104156478B - Subtitle matching and retrieval method for internet videos - Google Patents

Subtitle matching and retrieval method for internet videos

Info

Publication number
CN104156478B
CN104156478B CN201410423582.4A
Authority
CN
China
Prior art keywords
sequence
video
captions
sequences
matching
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201410423582.4A
Other languages
Chinese (zh)
Other versions
CN104156478A (en)
Inventor
程国艮
袁翔宇
王宇晨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Global Tone Communication Technology Co., Ltd. (中译语通科技)
Original Assignee
Mandarin Technology (Beijing) Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Mandarin Technology (Beijing) Co Ltd filed Critical Mandarin Technology (Beijing) Co Ltd
Priority to CN201410423582.4A priority Critical patent/CN104156478B/en
Publication of CN104156478A publication Critical patent/CN104156478A/en
Application granted granted Critical
Publication of CN104156478B publication Critical patent/CN104156478B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/71Indexing; Data structures therefor; Storage structures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/78Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/783Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F16/7844Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using original textual content or text extracted from visual content or transcript of audio data

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Library & Information Science (AREA)
  • Software Systems (AREA)
  • Television Systems (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention is a subtitle matching and retrieval method for internet videos, comprising the following steps: for video files with existing text subtitles, build an index; obtain the video's elementary audio feature sequence (Z sequence), divided at S ms intervals: Z1 Z2 Z3 Z4 Z5 ... Zn, together with its integration-sum sequence (T sequence): T1 T2 T3 T4 ... Tn-9; each subtitle corresponds to one Z-sequence fragment. To retrieve a video's subtitles, for each video in the video index take its T sequence as sequence A: A1 A2 A3 ... An, and the T sequence of a fragment Seg as sequence B: B1 B2 B3 ... Bm; compute the best match between sequence A and sequence B and its Euclidean distance; the video V with the minimum distance is the matched video. For each subtitle in the matched video, a counter-match then finds the best alignment, completing the subtitle matching. The present invention builds its index from audio data, providing an efficient and accurate subtitle indexing mechanism and subtitle retrieval method for differently formatted videos.

Description

Subtitle matching and retrieval method for internet videos
Technical field
The present invention relates to the field of computer software, and in particular to a subtitle matching and retrieval method for internet videos.
Background technology
Videos on the internet are varied: videos with identical content may differ in codec, bitrate, and resolution, and one video may be a fragment of another. Under these conditions it is difficult to index video subtitles efficiently and accurately, and so to match subtitles across copies. This patent provides a subtitle indexing mechanism and subtitle retrieval method for differently formatted videos.
Summary of the invention
To solve the above problems, the present invention provides a subtitle matching and retrieval method for internet videos that builds its index from audio data.
The subtitle matching and retrieval method for internet videos of the present invention comprises the following steps:
Step 1: for video files with existing text subtitles, build an index;
(1) Analyze the video's audio data; if the video has multiple channels, merge the multi-channel data into one channel;
(2) Normalize the audio sample rate;
(3) Split the audio data into frames;
(4) For each frame, compute the zero-crossing rate, obtaining the video's elementary audio feature sequence (Z sequence) divided at S ms intervals: Z1 Z2 Z3 Z4 Z5 ... Zn;
(5) For each video, in addition to the elementary audio feature sequence, also store an integration-sum sequence (T sequence) with a time window of 1000 ms: T1 T2 T3 T4 ... Tn-9, where Tn = Zn + Zn+1 + ... + Zn+9; each subtitle corresponds to one Z-sequence fragment;
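Step 1's feature extraction can be sketched in Python. This is an illustrative reconstruction, not the patent's implementation: the 30 ms frame length, the sign convention for counting zero crossings, and the use of NumPy are assumptions; only the S = 10 ms frame shift and the 10-term integration sum Tn = Zn + ... + Zn+9 come from the text.

```python
import numpy as np

def zero_crossing_rate(frame: np.ndarray) -> int:
    """Number of sign changes within one audio frame."""
    signs = np.sign(frame)
    signs[signs == 0] = 1  # treat exact zeros as positive (a convention, assumed)
    return int(np.sum(signs[:-1] != signs[1:]))

def z_sequence(samples: np.ndarray, sr: int = 8000,
               frame_ms: int = 30, shift_ms: int = 10) -> np.ndarray:
    """Z sequence: per-frame zero-crossing rates, one value per 10 ms shift."""
    frame_len = sr * frame_ms // 1000
    shift = sr * shift_ms // 1000
    n_frames = max(0, (len(samples) - frame_len) // shift + 1)
    return np.array([zero_crossing_rate(samples[i * shift: i * shift + frame_len])
                     for i in range(n_frames)])

def t_sequence(z: np.ndarray, window: int = 10) -> np.ndarray:
    """Integration-sum T sequence: T_n = Z_n + Z_{n+1} + ... + Z_{n+9}."""
    return np.convolve(z, np.ones(window, dtype=int), mode="valid")
```

With a 10 ms shift, the T sequence of an n-frame Z sequence has n-9 entries, matching T1 ... Tn-9 in the text.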
Step 2: retrieval of video subtitles
(1) For a video I on the internet, normalize its audio data as in steps (1) and (2) of Step 1;
(2) Separate speech from non-speech with an endpoint-detection algorithm;
(3) From video I, extract a speech-dense fragment Seg, 10-30 seconds in length;
(4) Following the method of Step 1, compute fragment Seg's elementary audio feature sequence (Z sequence) and integration-sum sequence (T sequence);
(5) For each video in the video index, take its T sequence as sequence A: A1 A2 A3 ... An, and the T sequence of fragment Seg as sequence B: B1 B2 B3 ... Bm; compute the best match between sequence A and sequence B and its Euclidean distance, as follows:
i. Take from the head of sequence A a subsequence of the same length as sequence B: A1 A2 A3 ... Am, and compute its Euclidean distance to sequence B: B1 B2 B3 ... Bm;
ii. Offset the subsequence by k, i.e. A1+k A2+k A3+k ... Am+k, and compute its Euclidean distance to sequence B;
iii. Offset the subsequence by 2k, i.e. A1+2k A2+2k A3+2k ... Am+2k, and compute its Euclidean distance to sequence B;
iv. Continue in this way until the full sequence has been scanned;
v. Find the subsequence A1+jk A2+jk A3+jk ... Am+jk with the smallest Euclidean distance, then scan more finely: among the subsequences A1+jk+d A2+jk+d A3+jk+d ... Am+jk+d, where -m/2 <= d <= m/2, find the one with the smallest Euclidean distance to sequence B: B1 B2 B3 ... Bm; this is the best-match sequence, and its distance to sequence B is the distance between fragment Seg and the video;
(6) The video V with the minimum distance is returned as the matched video;
(7) For each subtitle in the matched video, perform a counter-match: compute the integration-sum sequence of the input video as sequence A and, following the flow of Step 2(5), take the Z-sequence fragment corresponding to each subtitle as sequence B; finding the best match completes the subtitle matching.
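The coarse-to-fine scan of Step 2(5) can be sketched as follows. This is a hedged reconstruction under stated assumptions: the refinement window of ±m/2 around the best coarse offset follows sub-step v, while the stride k = 10 default and the clamping of the fine window to the sequence bounds are choices not specified in the text.

```python
import numpy as np

def euclidean(a: np.ndarray, b: np.ndarray) -> float:
    """Euclidean distance between two equal-length sequences."""
    return float(np.linalg.norm(a - b))

def best_match(seq_a: np.ndarray, seq_b: np.ndarray, k: int = 10):
    """Coarse scan of seq_a with stride k, then a fine scan of offsets within
    m/2 of the best coarse offset. Returns (best_offset, distance)."""
    m, n = len(seq_b), len(seq_a)
    # Coarse pass: offsets 0, k, 2k, ... while a full m-length window fits.
    coarse = [(off, euclidean(seq_a[off:off + m], seq_b))
              for off in range(0, n - m + 1, k)]
    best_off, _ = min(coarse, key=lambda t: t[1])
    # Fine pass: every offset within m/2 of the coarse winner (clamped to bounds).
    lo = max(0, best_off - m // 2)
    hi = min(n - m, best_off + m // 2)
    fine = [(off, euclidean(seq_a[off:off + m], seq_b))
            for off in range(lo, hi + 1)]
    return min(fine, key=lambda t: t[1])
```

Running this once per indexed video and keeping the minimum distance yields the matched video V of step (6).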
In Step 1(2), the sample rate is normalized to 16-bit, 8,000 Hz.
In Step 1(3), the audio data is split into frames of length L ms with a frame shift of S ms.
In Step 1(4), S is taken as 10 ms.
The advantageous technical effects of the invention are as follows: the invention builds its index from audio data, combining the construction of an audio feature sequence (the integration-sum sequence), a method for finding the best-matching video based on the integration-sum sequence, and a method for matching subtitles based on the integration-sum sequence, providing an efficient and accurate subtitle indexing mechanism and subtitle retrieval method for differently formatted videos.
Specific embodiment
Specific embodiments of the invention are described below in further detail with reference to examples.
The subtitle matching and retrieval method for internet videos of the present invention comprises the following steps:
Step 1: for video files with existing text subtitles, build an index;
(1) Analyze the video's audio data; if the video has multiple channels, merge the multi-channel data into one channel.
(2) Normalize the audio sample rate; the sample rate is normalized to 16-bit, 8,000 Hz, or another sample rate.
(3) Split the audio data into frames of length L ms with a frame shift of S ms.
(4) For each frame, compute the zero-crossing rate, obtaining the video's elementary audio feature sequence (Z sequence) divided at S ms intervals: Z1 Z2 Z3 Z4 Z5 ... Zn; S is taken as 10 ms.
(5) For each video, in addition to the elementary audio feature sequence, also store an integration-sum sequence (T sequence) with a time window of 1000 ms: T1 T2 T3 T4 ... Tn-9, where Tn = Zn + Zn+1 + ... + Zn+9; each subtitle corresponds to one Z-sequence fragment.
Step 2: retrieval of video subtitles
(1) For a video I on the internet, normalize its audio data as in steps (1) and (2) of Step 1.
(2) Separate speech from non-speech with an endpoint-detection algorithm.
(3) From video I, extract a speech-dense fragment Seg, 10-30 seconds in length.
(4) Following the method of Step 1, compute fragment Seg's elementary audio feature sequence (Z sequence) and integration-sum sequence (T sequence).
(5) For each video in the video index, take its T sequence as sequence A: A1 A2 A3 ... An, and the T sequence of fragment Seg as sequence B: B1 B2 B3 ... Bm; compute the best match between sequence A and sequence B and its Euclidean distance, as follows:
i. Take from the head of sequence A a subsequence of the same length as sequence B: A1 A2 A3 ... Am, and compute its Euclidean distance to sequence B: B1 B2 B3 ... Bm;
ii. Offset the subsequence by k, i.e. A1+k A2+k A3+k ... Am+k, and compute its Euclidean distance to sequence B;
iii. Offset the subsequence by 2k, i.e. A1+2k A2+2k A3+2k ... Am+2k, and compute its Euclidean distance to sequence B;
iv. Continue in this way until the full sequence has been scanned;
v. Find the subsequence A1+jk A2+jk A3+jk ... Am+jk with the smallest Euclidean distance, then scan more finely: among the subsequences A1+jk+d A2+jk+d A3+jk+d ... Am+jk+d, where -m/2 <= d <= m/2, find the one with the smallest Euclidean distance to sequence B: B1 B2 B3 ... Bm; this is the best-match sequence, and its distance to sequence B is the distance between fragment Seg and the video;
(6) The video V with the minimum distance is returned as the matched video.
(7) For each subtitle in the matched video, perform a counter-match: compute the integration-sum sequence of the input video as sequence A and, following the flow of Step 2(5), take the Z-sequence fragment corresponding to each subtitle as sequence B; finding the best match completes the subtitle matching.
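The counter-match of step (7) — locating each subtitle's Z-sequence fragment inside the input video — can be sketched as below. The exhaustive slide and the conversion of the best offset to a start time via the 10 ms frame shift are illustrative assumptions; the patent itself reuses the coarse-to-fine flow of Step 2(5) rather than an exhaustive scan.

```python
import numpy as np

def align_subtitle(video_t: np.ndarray, caption_t: np.ndarray,
                   shift_ms: int = 10) -> float:
    """Slide the caption's T sequence over the input video's T sequence and
    return the start time (in seconds) of the minimum-distance offset."""
    m = len(caption_t)
    dists = [np.linalg.norm(video_t[off:off + m] - caption_t)
             for off in range(len(video_t) - m + 1)]
    best_off = int(np.argmin(dists))
    return best_off * shift_ms / 1000.0  # one T value per 10 ms frame shift
```

Applying this to every subtitle of the matched video yields the per-subtitle alignment that completes the matching.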

Claims (4)

1. A subtitle matching and retrieval method for an internet video, characterized in that it comprises the following steps:
Step 1: for video files with existing text subtitles, build an index;
(1) Analyze the video's audio data; if the video has multiple channels, merge the multi-channel data into one channel;
(2) Normalize the audio sample rate;
(3) Split the audio data into frames;
(4) For each frame, compute the zero-crossing rate, obtaining the video's elementary audio feature sequence (Z sequence) divided at S ms intervals: Z1 Z2 Z3 Z4 Z5 ... Zn;
(5) For each video, in addition to the elementary audio feature sequence, also store an integration-sum sequence (T sequence) with a time window of 1000 ms: T1 T2 T3 T4 ... Tn-9, where Tn = Zn + Zn+1 + ... + Zn+9; each subtitle corresponds to one Z-sequence fragment;
Step 2: retrieval of video subtitles
(1) For a video I on the internet, normalize its audio data as in steps (1) and (2) of Step 1;
(2) Separate speech from non-speech with an endpoint-detection algorithm;
(3) From video I, extract a speech-dense fragment Seg, 10-30 seconds in length;
(4) Following the method of Step 1, compute fragment Seg's elementary audio feature sequence (Z sequence) and integration-sum sequence (T sequence);
(5) For each video in the video index, take its T sequence as sequence A: A1 A2 A3 ... An, and the T sequence of fragment Seg as sequence B: B1 B2 B3 ... Bm; compute the best match between sequence A and sequence B and its Euclidean distance, as follows:
i. Take from the head of sequence A a subsequence of the same length as sequence B: A1 A2 A3 ... Am, and compute its Euclidean distance to sequence B: B1 B2 B3 ... Bm;
ii. Offset the subsequence by k, i.e. A1+k A2+k A3+k ... Am+k, and compute its Euclidean distance to sequence B;
iii. Offset the subsequence by 2k, i.e. A1+2k A2+2k A3+2k ... Am+2k, and compute its Euclidean distance to sequence B;
iv. Continue in this way until the full sequence has been scanned;
v. Find the subsequence A1+jk A2+jk A3+jk ... Am+jk with the smallest Euclidean distance, then scan more finely: among the subsequences A1+jk+d A2+jk+d A3+jk+d ... Am+jk+d, where -m/2 <= d <= m/2, find the one with the smallest Euclidean distance to sequence B: B1 B2 B3 ... Bm; this is the best-match sequence, and its distance to sequence B is the distance between fragment Seg and the video;
(6) The video V with the minimum distance is the matched video;
(7) For each subtitle in the matched video, perform a counter-match: compute the integration-sum sequence of the input video as sequence A and, following the flow of Step 2(5), take the Z-sequence fragment corresponding to each subtitle as sequence B; finding the best match completes the subtitle matching.
2. The subtitle matching and retrieval method for an internet video according to claim 1, characterized in that in Step 1(2) the sample rate is normalized to 16-bit, 8,000 Hz.
3. The subtitle matching and retrieval method for an internet video according to claim 1, characterized in that in Step 1(3) the audio data is split into frames of length L ms with a frame shift of S ms.
4. The subtitle matching and retrieval method for an internet video according to claim 1, characterized in that in Step 1(4), S is taken as 10 ms.
CN201410423582.4A 2014-08-26 2014-08-26 Subtitle matching and retrieval method for internet videos Active CN104156478B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410423582.4A CN104156478B (en) 2014-08-26 2014-08-26 Subtitle matching and retrieval method for internet videos

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410423582.4A CN104156478B (en) 2014-08-26 2014-08-26 Subtitle matching and retrieval method for internet videos

Publications (2)

Publication Number Publication Date
CN104156478A CN104156478A (en) 2014-11-19
CN104156478B true CN104156478B (en) 2017-07-07

Family

ID=51881976

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410423582.4A Active CN104156478B (en) 2014-08-26 2014-08-26 Subtitle matching and retrieval method for internet videos

Country Status (1)

Country Link
CN (1) CN104156478B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106101573A (en) * 2016-06-24 2016-11-09 中译语通科技(北京)有限公司 Anchoring and matching method for video annotation
CN114579806B (en) * 2022-04-27 2022-08-09 阿里巴巴(中国)有限公司 Video detection method, storage medium and processor

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1367906A (en) * 1999-07-31 2002-09-04 朴奎珍 Study method and apparatus using digital audio and caption data
US7378588B1 (en) * 2006-09-12 2008-05-27 Chieh Changfan Melody-based music search
CN102724598A (en) * 2011-12-05 2012-10-10 新奥特(北京)视频技术有限公司 Method for splitting news items
CN102937972A (en) * 2012-10-15 2013-02-20 上海外教社信息技术有限公司 Audiovisual subtitle making system and method

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9158754B2 (en) * 2012-03-29 2015-10-13 The Echo Nest Corporation Named entity extraction from a block of text

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1367906A (en) * 1999-07-31 2002-09-04 朴奎珍 Study method and apparatus using digital audio and caption data
US7378588B1 (en) * 2006-09-12 2008-05-27 Chieh Changfan Melody-based music search
CN102724598A (en) * 2011-12-05 2012-10-10 新奥特(北京)视频技术有限公司 Method for splitting news items
CN102937972A (en) * 2012-10-15 2013-02-20 上海外教社信息技术有限公司 Audiovisual subtitle making system and method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Video-based subtitle retrieval and extraction (基于视频的字幕检索与提取); Yang Youqing et al.; Computer Applications (计算机应用); 2000-10-28; Vol. 20, No. 10, pp. 33-35 *

Also Published As

Publication number Publication date
CN104156478A (en) 2014-11-19

Similar Documents

Publication Publication Date Title
CN107305541B (en) Method and device for segmenting speech recognition text
CN109257547B (en) Chinese online audio/video subtitle generating method
RU2011104001A (en) METHOD AND DISCRIMINATOR FOR CLASSIFICATION OF VARIOUS SIGNAL SEGMENTS
CN106297776B (en) Voice keyword retrieval method based on audio templates
EP2819414A3 (en) Image processing device and image processing method
EP2884423A3 (en) Video synopsis method and apparatus
CN105408956B (en) Method for obtaining spectral coefficients of a replacement frame of an audio signal and related product
US11769515B2 (en) Audio coder window sizes and time-frequency transformations
EP4404560A3 (en) Audio decoding method for processing stereo audio signals using a variable prediction direction
MY181026A (en) Apparatus and method realizing improved concepts for tcx ltp
NO20092125L (en) Device and method for processing spectral values, as well as audio signal decoders and decoders
MY188538A (en) Decoding device, method, and program
WO2012128382A1 (en) Device and method for lip motion detection
US20100121648A1 (en) Audio frequency encoding and decoding method and device
EP4354432A3 (en) Apparatus for determining for the compression of an hoa data frame representation a lowest integer number of bits required for representing non-differential gain values
WO2008122975A3 (en) Means and methods for detecting bacteria in a sample
EP4243017A3 (en) Apparatus and method decoding an audio signal using an aligned look-ahead portion
CN104156478B (en) Subtitle matching and retrieval method for internet videos
MX2019001193A (en) Method and device for processing audio signal.
WO2015024428A1 (en) Method, terminal, system for audio encoding/decoding/codec
CN108198558A (en) Speech recognition method based on CSI data
EP4336495A3 (en) Inter-channel phase difference parameter extraction method and apparatus
CN103456307B (en) In audio decoder, the spectrum of frame error concealment replaces method and system
CN107424628A (en) Method for locating the endpoints of a specific target sound in a noisy environment
MX2016014237A (en) Improved frame loss correction with voice information.

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP01 Change in the name or title of a patent holder

Address after: Room 7473, room No. 3, No. 3, Xijing Road, Badachu high tech park, Shijingshan District, Beijing

Patentee after: Global Tone Communication Technology Co., Ltd. (中译语通科技)

Address before: Room 7473, room No. 3, No. 3, Xijing Road, Badachu high tech park, Shijingshan District, Beijing

Patentee before: Mandarin Technology (Beijing) Co., Ltd.