CN104156478B - Subtitle matching and retrieval method for internet videos - Google Patents
Subtitle matching and retrieval method for internet videos
- Publication number
- CN104156478B CN104156478B CN201410423582.4A CN201410423582A CN104156478B CN 104156478 B CN104156478 B CN 104156478B CN 201410423582 A CN201410423582 A CN 201410423582A CN 104156478 B CN104156478 B CN 104156478B
- Authority
- CN
- China
- Prior art keywords
- sequence
- video
- captions
- sequences
- matching
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Links
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/70—Information retrieval; Database structures therefor; File system structures therefor of video data
- G06F16/71—Indexing; Data structures therefor; Storage structures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/70—Information retrieval; Database structures therefor; File system structures therefor of video data
- G06F16/78—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F16/783—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
- G06F16/7844—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using original textual content or text extracted from visual content or transcript of audio data
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Multimedia (AREA)
- General Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Physics & Mathematics (AREA)
- Library & Information Science (AREA)
- Software Systems (AREA)
- Television Systems (AREA)
- Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The present invention is a subtitle matching and retrieval method for internet videos, comprising the following steps. For video files with existing text subtitles, an index is built: the elementary audio feature sequence of the video (Z sequence), divided at S ms intervals, Z1 Z2 Z3 Z4 Z5 ... Zn, is obtained together with an integration sequence (T sequence) T1 T2 T3 T4 ... Tn-9; each subtitle corresponds to one Z-sequence fragment. For subtitle retrieval, for each video in the video index base its T sequence is taken as sequence A: A1 A2 A3 ... An, and the T sequence of a fragment Seg is taken as sequence B: B1 B2 B3 ... Bm; the best match between sequence A and sequence B and its Euclidean distance are computed, and the video V with the minimum distance is returned as the matched video. For each subtitle of the matched video, a reverse match finds the best alignment, completing subtitle matching. The invention builds its index from audio data, providing an efficient and accurate subtitle indexing mechanism and subtitle retrieval method for videos in differing formats.
Description
Technical field
The present invention relates to the field of computer software, and in particular to a subtitle matching and retrieval method for internet videos.
Background art
Videos on the internet are varied: videos with identical content may differ in coding format, bit rate, and resolution, and one video may be a fragment of another. Under such conditions it is difficult to index video subtitles efficiently and accurately and to achieve subtitle matching. This patent provides a subtitle indexing mechanism and subtitle retrieval method for videos in differing formats.
Summary of the invention
To solve the above problems, the present invention provides a subtitle matching and retrieval method for internet videos that builds an index from audio data.
The present invention is a subtitle matching and retrieval method for internet videos, comprising the following steps:
Step 1: for video files with existing text subtitles, build an index;
(1) Analyze the audio data of the video; if the video has multiple channels, merge the multi-channel data into one channel;
(2) Normalize the audio sample rate;
(3) Split the audio data into frames;
(4) For each frame, compute the zero-crossing rate, obtaining the elementary audio feature sequence of the video (Z sequence) divided at S ms intervals: Z1 Z2 Z3 Z4 Z5 ... Zn;
(5) For each video, besides the elementary audio feature sequence, also store an integration sequence (T sequence) with a time window of 1000 ms: T1 T2 T3 T4 ... Tn-9, where Tn = Zn + Zn+1 + ... + Zn+9; each subtitle corresponds to one Z-sequence fragment;
Step 2: retrieval of video subtitles
(1) For a video I from the internet, normalize its audio data according to steps (1) and (2) of Step 1;
(2) Distinguish speech from non-speech with an endpoint-detection algorithm;
(3) From video I, extract a speech-dense fragment Seg; the length of fragment Seg is 10-30 seconds;
(4) Following the method of Step 1, compute the elementary audio feature sequence (Z sequence) and the integration sequence (T sequence) of fragment Seg;
(5) For each video in the video index base, take its T sequence as sequence A: A1 A2 A3 ... An, and the T sequence of fragment Seg as sequence B: B1 B2 B3 ... Bm; compute the best match between sequence A and sequence B and its Euclidean distance, as follows:
i. Take from the head of sequence A a span of the same length as sequence B: A1 A2 A3 ... Am, and compute the Euclidean distance between this span and sequence B: B1 B2 B3 ... Bm;
ii. Offset the subsequence by k, i.e. A1+k A2+k A3+k ... Am+k, and compute its Euclidean distance to sequence B;
iii. Offset the subsequence by 2k, i.e. A1+2k A2+2k A3+2k ... Am+2k, and compute its Euclidean distance to sequence B;
iv. Continue in this way until the whole sequence has been scanned;
v. Find the subsequence A1+jk A2+jk A3+jk ... Am+jk with the smallest Euclidean distance, then perform a finer scan: among the subsequences A1+jk+d A2+jk+d A3+jk+d ... Am+jk+d, where -m/2 <= d <= m/2, the one with the smallest Euclidean distance to sequence B: B1 B2 B3 ... Bm is the best-match sequence; the distance between the best-match sequence and sequence B is the distance between fragment Seg and the video;
(6) The video V with the minimum distance is returned as the matched video;
(7) For each subtitle of the matched video, perform a reverse match: compute the integration sequence of the input video as the A sequence and, following the procedure of Step 2(5), use the Z-sequence fragment corresponding to each subtitle as the B sequence to find the best match, completing subtitle matching.
In Step 1(2), the sample rate is normalized to 16-bit, 8000 Hz.
In Step 1(3), the audio data is split into frames of length L ms with a frame shift of S ms.
In Step 1(4), S is taken as 10 ms.
The advantageous effects of the invention are: the invention builds an index from audio data and, through the construction of the audio feature sequence (the integration sequence), a method for finding the best-matching video based on the integration sequence, and a method for subtitle matching based on the integration sequence, provides an efficient and accurate subtitle indexing mechanism and subtitle retrieval method for videos in differing formats.
Specific embodiments
Specific embodiments of the invention are described in further detail below with reference to examples.
The present invention is a subtitle matching and retrieval method for internet videos, comprising the following steps:
Step 1: for video files with existing text subtitles, build an index.
(1) Analyze the audio data of the video; if the video has multiple channels, merge the multi-channel data into one channel.
(2) Normalize the audio sample rate; it is normalized to 16-bit, 8000 Hz, or another sample rate.
(3) Split the audio data into frames of length L ms with a frame shift of S ms.
(4) For each frame, compute the zero-crossing rate, obtaining the elementary audio feature sequence of the video (Z sequence) divided at S ms intervals: Z1 Z2 Z3 Z4 Z5 ... Zn; S is taken as 10 ms.
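As a concrete reading of steps (3)-(4), the framing and zero-crossing-rate computation can be sketched as follows. This is a minimal illustration, not the patent's implementation; the function names and the sign convention at zero are assumptions. With audio normalized to 8000 Hz and S = 10 ms, each hop is 80 samples.

```python
# Sketch of Step 1(3)-(4): framing plus per-frame zero-crossing rate.
# All names are illustrative; the patent gives no reference implementation.

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ (>= 0 vs < 0)."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / max(len(frame) - 1, 1)

def z_sequence(samples, frame_len, hop):
    """Elementary feature sequence Z1..Zn: one ZCR value per S ms hop.
    At 8000 Hz with S = 10 ms, hop would be 80 samples."""
    return [zero_crossing_rate(samples[i:i + frame_len])
            for i in range(0, len(samples) - frame_len + 1, hop)]
```

Zero-crossing rate is cheap and codec-independent, which matches the patent's goal of indexing videos whose coding format and bit rate differ.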
(5) For each video, besides the elementary audio feature sequence, also store an integration sequence (T sequence) with a time window of 1000 ms: T1 T2 T3 T4 ... Tn-9, where Tn = Zn + Zn+1 + ... + Zn+9; each subtitle corresponds to one Z-sequence fragment.
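The integration sequence of step (5) is, per the formula Tn = Zn + Zn+1 + ... + Zn+9, a sliding sum over 10 consecutive Z values (both the stated 1000 ms window and the 10-term formula are reproduced from the patent as written). A sketch using prefix sums, with assumed names:

```python
def t_sequence(z, window=10):
    """Integration sequence T1..Tn-9: sliding sum of `window` consecutive
    Z values, following Tn = Zn + Zn+1 + ... + Zn+9 from the patent."""
    if len(z) < window:
        return []
    prefix = [0]                      # prefix sums make each window sum O(1)
    for v in z:
        prefix.append(prefix[-1] + v)
    return [prefix[i + window] - prefix[i] for i in range(len(z) - window + 1)]
```

Summing over a window smooths frame-level noise in the Z values, which is presumably why matching is done on T sequences rather than on the raw Z sequences.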
Step 2: retrieval of video subtitles
(1) For a video I from the internet, normalize its audio data according to steps (1) and (2) of Step 1.
(2) Distinguish speech from non-speech with an endpoint-detection algorithm.
(3) From video I, extract a speech-dense fragment Seg; the length of fragment Seg is 10-30 seconds.
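The patent does not specify the endpoint-detection algorithm used in steps (2)-(3). A crude energy-threshold stand-in that locates the window containing the most speech-like frames might look like this (the function name, the thresholding scheme, and the use of frame energies are all assumptions, not the patent's method):

```python
def densest_voice_window(frame_energies, win, threshold):
    """Return the start index of the `win`-frame window containing the most
    frames whose energy exceeds `threshold` - a crude stand-in for the
    unspecified endpoint-detection step that selects fragment Seg."""
    flags = [1 if e > threshold else 0 for e in frame_energies]
    score = sum(flags[:win])          # speech-like frames in the first window
    best_score, best_start = score, 0
    for i in range(win, len(flags)):  # slide the window one frame at a time
        score += flags[i] - flags[i - win]
        if score > best_score:
            best_score, best_start = score, i - win + 1
    return best_start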
(4) Following the method of Step 1, compute the elementary audio feature sequence (Z sequence) and the integration sequence (T sequence) of fragment Seg.
(5) For each video in the video index base, take its T sequence as sequence A: A1 A2 A3 ... An, and the T sequence of fragment Seg as sequence B: B1 B2 B3 ... Bm; compute the best match between sequence A and sequence B and its Euclidean distance, as follows:
i. Take from the head of sequence A a span of the same length as sequence B: A1 A2 A3 ... Am, and compute the Euclidean distance between this span and sequence B: B1 B2 B3 ... Bm;
ii. Offset the subsequence by k, i.e. A1+k A2+k A3+k ... Am+k, and compute its Euclidean distance to sequence B;
iii. Offset the subsequence by 2k, i.e. A1+2k A2+2k A3+2k ... Am+2k, and compute its Euclidean distance to sequence B;
iv. Continue in this way until the whole sequence has been scanned;
v. Find the subsequence A1+jk A2+jk A3+jk ... Am+jk with the smallest Euclidean distance, then perform a finer scan: among the subsequences A1+jk+d A2+jk+d A3+jk+d ... Am+jk+d, where -m/2 <= d <= m/2, the one with the smallest Euclidean distance to sequence B: B1 B2 B3 ... Bm is the best-match sequence; the distance between the best-match sequence and sequence B is the distance between fragment Seg and the video.
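Steps i-v describe a coarse-to-fine scan: stride a length-m window through A in steps of k, then refine within offsets of +/- m/2 around the coarse winner. A sketch under the assumption that k is given and both sequences fit in memory (names are illustrative):

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length numeric sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def best_match(A, B, k):
    """Coarse-to-fine scan of steps i-v: slide a len(B) window over A in
    strides of k, then refine around the coarse winner within +/- m/2.
    Returns (offset, distance) of the best-match subsequence."""
    m = len(B)
    if len(A) < m:
        return None, float("inf")
    # coarse pass: offsets 0, k, 2k, ... across the whole of A (steps i-iv)
    coarse = min(range(0, len(A) - m + 1, k),
                 key=lambda off: euclidean(A[off:off + m], B))
    # fine pass: offsets coarse - m/2 .. coarse + m/2, clamped (step v)
    lo = max(coarse - m // 2, 0)
    hi = min(coarse + m // 2, len(A) - m)
    fine = min(range(lo, hi + 1),
               key=lambda off: euclidean(A[off:off + m], B))
    return fine, euclidean(A[fine:fine + m], B)
```

The coarse stride k trades accuracy for speed; the fine pass recovers the exact alignment as long as the true offset lies within m/2 of the best coarse offset.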
(6) The video V with the minimum distance is returned as the matched video.
(7) For each subtitle of the matched video, perform a reverse match: compute the integration sequence of the input video as the A sequence and, following the procedure of Step 2(5), use the Z-sequence fragment corresponding to each subtitle as the B sequence to find the best match, completing subtitle matching.
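The reverse match of step (7) can be read as sliding each subtitle fragment's feature sequence (B) over the input video's sequence (A) to recover the subtitle's position. An exhaustive variant, a simplification of the strided scan in step (5) with assumed names, is:

```python
import math

def best_offset(A, B):
    """Exhaustive form of the reverse match in Step 2(7): slide the
    subtitle fragment's feature sequence B over the input video's
    sequence A, returning (offset, distance) of the closest alignment."""
    best_off, best_d = 0, float("inf")
    for off in range(len(A) - len(B) + 1):
        d = math.sqrt(sum((A[off + i] - B[i]) ** 2 for i in range(len(B))))
        if d < best_d:
            best_off, best_d = off, d
    return best_off, best_d
```

With 10 ms frames, multiplying the returned offset by 10 ms gives the subtitle's start time in the input video.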
Claims (4)
1. A subtitle matching and retrieval method for internet videos, characterised by comprising the following steps:
Step 1: for video files with existing text subtitles, build an index;
(1) Analyze the audio data of the video; if the video has multiple channels, merge the multi-channel data into one channel;
(2) Normalize the audio sample rate;
(3) Split the audio data into frames;
(4) For each frame, compute the zero-crossing rate, obtaining the elementary audio feature sequence of the video (Z sequence) divided at S ms intervals: Z1 Z2 Z3 Z4 Z5 ... Zn;
(5) For each video, besides the elementary audio feature sequence, also store an integration sequence (T sequence) with a time window of 1000 ms: T1 T2 T3 T4 ... Tn-9, where Tn = Zn + Zn+1 + ... + Zn+9; each subtitle corresponds to one Z-sequence fragment;
Step 2: retrieval of video subtitles
(1) For a video I from the internet, normalize its audio data according to steps (1) and (2) of Step 1;
(2) Distinguish speech from non-speech with an endpoint-detection algorithm;
(3) From video I, extract a speech-dense fragment Seg; the length of fragment Seg is 10-30 seconds;
(4) Following the method of Step 1, compute the elementary audio feature sequence (Z sequence) and the integration sequence (T sequence) of fragment Seg;
(5) For each video in the video index base, take its T sequence as sequence A: A1 A2 A3 ... An, and the T sequence of fragment Seg as sequence B: B1 B2 B3 ... Bm; compute the best match between sequence A and sequence B and its Euclidean distance, as follows:
i. Take from the head of sequence A a span of the same length as sequence B: A1 A2 A3 ... Am, and compute the Euclidean distance between this span and sequence B: B1 B2 B3 ... Bm;
ii. Offset the subsequence by k, i.e. A1+k A2+k A3+k ... Am+k, and compute its Euclidean distance to sequence B;
iii. Offset the subsequence by 2k, i.e. A1+2k A2+2k A3+2k ... Am+2k, and compute its Euclidean distance to sequence B;
iv. Continue in this way until the whole sequence has been scanned;
v. Find the subsequence A1+jk A2+jk A3+jk ... Am+jk with the smallest Euclidean distance, then perform a finer scan: among the subsequences A1+jk+d A2+jk+d A3+jk+d ... Am+jk+d, where -m/2 <= d <= m/2, the one with the smallest Euclidean distance to sequence B: B1 B2 B3 ... Bm is the best-match sequence; the distance between the best-match sequence and sequence B is the distance between fragment Seg and the video;
(6) The video V with the minimum distance is returned as the matched video;
(7) For each subtitle of the matched video, perform a reverse match: compute the integration sequence of the input video as the A sequence and, following the procedure of Step 2(5), use the Z-sequence fragment corresponding to each subtitle as the B sequence to find the best match, completing subtitle matching.
2. The subtitle matching and retrieval method for internet videos according to claim 1, characterised in that in Step 1(2) the sample rate is normalized to 16-bit, 8000 Hz.
3. The subtitle matching and retrieval method for internet videos according to claim 1, characterised in that in Step 1(3) the audio data is split into frames of length L ms with a frame shift of S ms.
4. The subtitle matching and retrieval method for internet videos according to claim 1, characterised in that in Step 1(4) S is taken as 10 ms.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410423582.4A CN104156478B (en) | 2014-08-26 | 2014-08-26 | Subtitle matching and retrieval method for internet videos |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410423582.4A CN104156478B (en) | 2014-08-26 | 2014-08-26 | Subtitle matching and retrieval method for internet videos |
Publications (2)
Publication Number | Publication Date |
---|---|
CN104156478A CN104156478A (en) | 2014-11-19 |
CN104156478B true CN104156478B (en) | 2017-07-07 |
Family
ID=51881976
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201410423582.4A Active CN104156478B (en) | 2014-08-26 | 2014-08-26 | Subtitle matching and retrieval method for internet videos |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104156478B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106101573A (en) * | 2016-06-24 | 2016-11-09 | 中译语通科技(北京)有限公司 | An anchoring and matching method for video annotation |
CN114579806B (en) * | 2022-04-27 | 2022-08-09 | 阿里巴巴(中国)有限公司 | Video detection method, storage medium and processor |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1367906A (en) * | 1999-07-31 | 2002-09-04 | 朴奎珍 | Study method and apparatus using digital audio and caption data |
US7378588B1 (en) * | 2006-09-12 | 2008-05-27 | Chieh Changfan | Melody-based music search |
CN102724598A (en) * | 2011-12-05 | 2012-10-10 | 新奥特(北京)视频技术有限公司 | Method for splitting news items |
CN102937972A (en) * | 2012-10-15 | 2013-02-20 | 上海外教社信息技术有限公司 | Audiovisual subtitle making system and method |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9158754B2 (en) * | 2012-03-29 | 2015-10-13 | The Echo Nest Corporation | Named entity extraction from a block of text |
-
2014
- 2014-08-26 CN CN201410423582.4A patent/CN104156478B/en active Active
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1367906A (en) * | 1999-07-31 | 2002-09-04 | 朴奎珍 | Study method and apparatus using digital audio and caption data |
US7378588B1 (en) * | 2006-09-12 | 2008-05-27 | Chieh Changfan | Melody-based music search |
CN102724598A (en) * | 2011-12-05 | 2012-10-10 | 新奥特(北京)视频技术有限公司 | Method for splitting news items |
CN102937972A (en) * | 2012-10-15 | 2013-02-20 | 上海外教社信息技术有限公司 | Audiovisual subtitle making system and method |
Non-Patent Citations (1)
Title |
---|
Video-based subtitle retrieval and extraction; Yang Youqing et al.; Computer Applications (《计算机应用》); 2000-10-28; Vol. 20, No. 10, pp. 33-35 *
Also Published As
Publication number | Publication date |
---|---|
CN104156478A (en) | 2014-11-19 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107305541B (en) | Method and device for segmenting speech recognition text | |
CN109257547B (en) | Chinese online audio/video subtitle generating method | |
RU2011104001A (en) | METHOD AND DISCRIMINATOR FOR CLASSIFICATION OF VARIOUS SIGNAL SEGMENTS | |
CN106297776B (en) | A kind of voice keyword retrieval method based on audio template | |
EP2819414A3 (en) | Image processing device and image processing method | |
EP2884423A3 (en) | Video synopsis method and apparatus | |
CN105408956B (en) | Method for obtaining spectral coefficients of a replacement frame of an audio signal and related product | |
US11769515B2 (en) | Audio coder window sizes and time-frequency transformations | |
EP4404560A3 (en) | Audio decoding method for processing stereo audio signals using a variable prediction direction | |
MY181026A (en) | Apparatus and method realizing improved concepts for tcx ltp | |
NO20092125L (en) | Device and method for processing spectral values, as well as audio signal decoders and decoders | |
MY188538A (en) | Decoding device, method, and program | |
WO2012128382A1 (en) | Device and method for lip motion detection | |
US20100121648A1 (en) | Audio frequency encoding and decoding method and device | |
EP4354432A3 (en) | Apparatus for determining for the compression of an hoa data frame representation a lowest integer number of bits required for representing non-differential gain values | |
WO2008122975A3 (en) | Means and methods for detecting bacteria in a sample | |
EP4243017A3 (en) | Apparatus and method decoding an audio signal using an aligned look-ahead portion | |
CN104156478B (en) | Subtitle matching and retrieval method for internet videos | |
MX2019001193A (en) | Method and device for processing audio signal. | |
WO2015024428A1 (en) | Method, terminal, system for audio encoding/decoding/codec | |
CN108198558A (en) | A kind of audio recognition method based on CSI data | |
EP4336495A3 (en) | Inter-channel phase difference parameter extraction method and apparatus | |
CN103456307B (en) | In audio decoder, the spectrum of frame error concealment replaces method and system | |
CN107424628A (en) | A kind of method that specific objective sound end is searched under noisy environment | |
MX2016014237A (en) | Improved frame loss correction with voice information. |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant | ||
CP01 | Change in the name or title of a patent holder | ||
CP01 | Change in the name or title of a patent holder |
Address after: Room 7473, room No. 3, No. 3, Xijing Road, Badachu high tech park, Shijingshan District, Beijing Patentee after: Chinese translation language through Polytron Technologies Inc Address before: Room 7473, room No. 3, No. 3, Xijing Road, Badachu high tech park, Shijingshan District, Beijing Patentee before: Mandarin Technology (Beijing) Co., Ltd. |