CN104156478B - Subtitle matching and retrieval method for internet video - Google Patents

Subtitle matching and retrieval method for internet video Download PDF

Info

Publication number
CN104156478B
CN104156478B CN201410423582.4A CN201410423582A CN104156478B
Authority
CN
China
Prior art keywords
sequence
video
captions
sequences
matching
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201410423582.4A
Other languages
Chinese (zh)
Other versions
CN104156478A (en)
Inventor
程国艮
袁翔宇
王宇晨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chinese translation language through Polytron Technologies Inc
Original Assignee
Mandarin Technology (beijing) Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Mandarin Technology (beijing) Co Ltd filed Critical Mandarin Technology (beijing) Co Ltd
Priority to CN201410423582.4A priority Critical patent/CN104156478B/en
Publication of CN104156478A publication Critical patent/CN104156478A/en
Application granted granted Critical
Publication of CN104156478B publication Critical patent/CN104156478B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/71Indexing; Data structures therefor; Storage structures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/78Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/783Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F16/7844Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using original textual content or text extracted from visual content or transcript of audio data

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Library & Information Science (AREA)
  • Software Systems (AREA)
  • Television Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

The present invention is a subtitle matching and retrieval method for internet video, comprising the following steps: for video files that already have text subtitles, build an index; obtain the video's basic audio feature sequence (the Z sequence), divided at intervals of S ms (Z1 Z2 Z3 Z4 Z5 ... Zn), together with its integral sequence (the T sequence): T1 T2 T3 T4 ... Tn-9; each subtitle corresponds to one Z-sequence fragment. For subtitle retrieval, for each video in the video index library, take out its T sequence as sequence A: A1 A2 A3 ... An, and take the T sequence of a fragment Seg as sequence B: B1 B2 B3 ... Bm; compute the best match between sequence A and sequence B and its Euclidean distance; the video V with the minimum distance is the matched video. For each subtitle of the matched video, perform a reverse match to find the best match, realizing subtitle matching. The invention builds the index from audio data and provides an efficient and accurate subtitle indexing mechanism and subtitle retrieval method for videos in different formats.

Description

Subtitle matching and retrieval method for internet video
Technical field
The present invention relates to the field of computer software technology, and in particular to a subtitle matching and retrieval method for internet video.
Background technology
Videos on the internet are varied: videos with identical content may use different codecs, different bit rates, and different resolutions, and one video may be a fragment of another. Under these conditions it is difficult to index video subtitles efficiently and accurately, and thus to match subtitles. This patent provides a subtitle indexing mechanism and a subtitle retrieval method for videos in different formats.
Summary of the invention
To solve the above problems, the present invention provides a subtitle matching and retrieval method for internet video that builds its index from audio data.
The subtitle matching and retrieval method for internet video of the present invention comprises the following steps:
Step one: for video files that already have text subtitles, build an index;
(1) Analyze the video's audio data; if the video has multiple channels, merge the multi-channel data into one channel;
(2) Normalize the audio sample rate;
(3) Divide the audio data into frames;
(4) For each frame, compute the zero-crossing rate, obtaining the video's basic audio feature sequence (the Z sequence), divided at intervals of S ms: Z1 Z2 Z3 Z4 Z5 ... Zn;
(5) For each video, in addition to the basic audio feature sequence, also store an integral sequence (the T sequence) over a time window of 1000 ms: T1 T2 T3 T4 ... Tn-9, where Tn = Zn + Zn+1 + ... + Zn+9; each subtitle corresponds to one Z-sequence fragment;
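The indexing steps (3)-(5) above can be sketched as follows. This is a minimal sketch under stated assumptions: the audio has already been normalized to 8000 Hz mono, the frame length equals the 10 ms frame shift (the patent leaves L open), and, following the formula above literally, each T value sums 10 consecutive Z values.

```python
import numpy as np

def z_sequence(samples: np.ndarray, frame_len: int = 80, hop: int = 80) -> np.ndarray:
    """Per-frame zero-crossing rate: the basic audio feature (Z) sequence.

    At the normalized 8000 Hz sample rate, a 10 ms frame shift (S = 10 ms)
    is 80 samples; frame_len = hop is an assumption, since the patent
    leaves the frame length L open.
    """
    n_frames = 1 + (len(samples) - frame_len) // hop
    z = np.empty(n_frames)
    for i in range(n_frames):
        frame = samples[i * hop : i * hop + frame_len]
        # Zero-crossing rate: fraction of adjacent sample pairs whose sign changes.
        z[i] = np.mean(np.abs(np.diff(np.sign(frame))) > 0)
    return z

def t_sequence(z: np.ndarray, window: int = 10) -> np.ndarray:
    """Integral (T) sequence, with T_n = Z_n + Z_{n+1} + ... + Z_{n+window-1}."""
    # Sliding-window sum via cumulative sums; yields len(z) - window + 1 values.
    c = np.concatenate(([0.0], np.cumsum(z)))
    return c[window:] - c[:-window]
```

With the default window of 10 this reproduces the T1 ... Tn-9 indexing of step (5).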
Step two: retrieval of video subtitles
(1) For a video I on the internet, normalize its audio data according to steps (1) and (2) of step one;
(2) Distinguish speech from non-speech using an endpoint detection algorithm;
(3) From video I, extract a speech-dense fragment Seg with a length of 10-30 seconds;
(4) Following the method of step one, compute the basic audio feature sequence (Z sequence) and the integral sequence (T sequence) of fragment Seg;
(5) For each video in the video index library, take out its T sequence as sequence A: A1 A2 A3 ... An, and take the T sequence of fragment Seg as sequence B: B1 B2 B3 ... Bm; compute the best match between sequence A and sequence B and its Euclidean distance, as follows:
(i) From the head of sequence A, take data of the same length as sequence B, i.e. A1 A2 A3 ... Am, and compute its Euclidean distance to sequence B: B1 B2 B3 ... Bm;
(ii) Offset the subsequence by k, i.e. A1+k A2+k A3+k ... Am+k, and compute its Euclidean distance to sequence B;
(iii) Offset the subsequence by 2k, i.e. A1+2k A2+2k A3+2k ... Am+2k, and compute its Euclidean distance to sequence B;
(iv) Continue in this way until the full sequence has been scanned;
(v) Find the subsequence A1+jk A2+jk A3+jk ... Am+jk with the shortest Euclidean distance, then perform a finer scan: among the sequences A1+jk+d A2+jk+d A3+jk+d ... Am+jk+d, where -m/2 <= d <= m/2, find the one with the shortest Euclidean distance to sequence B: B1 B2 B3 ... Bm; this is the best-match sequence, and its distance to sequence B is the distance between fragment Seg and the video;
(6) The video V with the minimum distance is taken as the matched video;
(7) For each subtitle of the matched video, perform a reverse match: compute the integral sequence of the input video as the A sequence and, following the flow of step two (5), use the Z-sequence fragment corresponding to each subtitle as the B sequence; find the best match, realizing subtitle matching.
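The coarse-to-fine search of step two (5) can be sketched as below. The coarse step k, the handling of sequence boundaries, and the use of NumPy are assumptions not fixed by the patent.

```python
import numpy as np

def best_match(A: np.ndarray, B: np.ndarray, k: int = 10):
    """Coarse-to-fine search for the subsequence of A closest to B (Euclidean).

    Coarse pass: slide a window of len(B) over A in steps of k, as in
    steps (i)-(iv). Fine pass: rescan every offset within +/- len(B)//2
    of the best coarse hit, as in step (v). Returns (offset, distance).
    """
    m = len(B)

    def dist(off: int) -> float:
        return float(np.linalg.norm(A[off:off + m] - B))

    # Coarse scan in steps of k over the whole sequence.
    j = min(range(0, len(A) - m + 1, k), key=dist)
    # Fine scan around the coarse optimum, clipped to valid offsets.
    lo = max(0, j - m // 2)
    hi = min(len(A) - m, j + m // 2)
    best = min(range(lo, hi + 1), key=dist)
    return best, dist(best)
```

In the retrieval flow, this would be called once per indexed video, and the video whose returned distance is smallest is taken as the matched video V of step (6).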
In step one (2), the sample rate is normalized to 16-bit, 8000 Hz.
In step one (3), the audio data is divided into frames with frame length L ms and frame shift S ms.
In step one (4), S is taken as 10 ms.
The advantageous technical effects of the invention are as follows: the invention builds its index from audio data and, through the construction of the audio feature sequence (the integral sequence), a method of finding the best-matching video based on the integral sequence, and a method of subtitle matching based on the integral sequence, provides an efficient and accurate subtitle indexing mechanism and subtitle retrieval method for videos in different formats.
Specific embodiment
The specific embodiment of the invention is described in further detail below with reference to an example.
The subtitle matching and retrieval method for internet video of the present invention comprises the following steps:
Step one: for video files that already have text subtitles, build an index.
(1) Analyze the video's audio data; if the video has multiple channels, merge the multi-channel data into one channel.
(2) Normalize the audio sample rate; the sample rate is normalized to 16-bit, 8000 Hz, or to another sample rate.
(3) Divide the audio data into frames, with frame length L ms and frame shift S ms.
(4) For each frame, compute the zero-crossing rate, obtaining the video's basic audio feature sequence (the Z sequence), divided at intervals of S ms: Z1 Z2 Z3 Z4 Z5 ... Zn; S is taken as 10 ms.
(5) For each video, in addition to the basic audio feature sequence, also store an integral sequence (the T sequence) over a time window of 1000 ms: T1 T2 T3 T4 ... Tn-9, where Tn = Zn + Zn+1 + ... + Zn+9; each subtitle corresponds to one Z-sequence fragment.
Step two: retrieval of video subtitles
(1) For a video I on the internet, normalize its audio data according to steps (1) and (2) of step one.
(2) Distinguish speech from non-speech using an endpoint detection algorithm.
(3) From video I, extract a speech-dense fragment Seg with a length of 10-30 seconds.
(4) Following the method of step one, compute the basic audio feature sequence (Z sequence) and the integral sequence (T sequence) of fragment Seg.
(5) For each video in the video index library, take out its T sequence as sequence A: A1 A2 A3 ... An, and take the T sequence of fragment Seg as sequence B: B1 B2 B3 ... Bm; compute the best match between sequence A and sequence B and its Euclidean distance, as follows:
(i) From the head of sequence A, take data of the same length as sequence B, i.e. A1 A2 A3 ... Am, and compute its Euclidean distance to sequence B: B1 B2 B3 ... Bm;
(ii) Offset the subsequence by k, i.e. A1+k A2+k A3+k ... Am+k, and compute its Euclidean distance to sequence B;
(iii) Offset the subsequence by 2k, i.e. A1+2k A2+2k A3+2k ... Am+2k, and compute its Euclidean distance to sequence B;
(iv) Continue in this way until the full sequence has been scanned;
(v) Find the subsequence A1+jk A2+jk A3+jk ... Am+jk with the shortest Euclidean distance, then perform a finer scan: among the sequences A1+jk+d A2+jk+d A3+jk+d ... Am+jk+d, where -m/2 <= d <= m/2, find the one with the shortest Euclidean distance to sequence B: B1 B2 B3 ... Bm; this is the best-match sequence, and its distance to sequence B is the distance between fragment Seg and the video.
(6) The video V with the minimum distance is taken as the matched video.
(7) For each subtitle of the matched video, perform a reverse match: compute the integral sequence of the input video as the A sequence and, following the flow of step two (5), use the Z-sequence fragment corresponding to each subtitle as the B sequence; find the best match, realizing subtitle matching.
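The reverse match of step two (7), aligning each subtitle against the input video, can be sketched as follows. The per-subtitle stored fragments and the brute-force scan over every offset are illustrative assumptions; the patent's coarse-to-fine scan of step (5) would apply here equally.

```python
import numpy as np

def align_subtitles(input_T: np.ndarray, subtitle_fragments: list) -> list:
    """For each subtitle's stored feature-sequence fragment (the B sequence),
    find the offset in the input video's integral sequence (the A sequence)
    with minimum Euclidean distance. Brute force for clarity."""
    offsets = []
    for frag in subtitle_fragments:
        m = len(frag)
        # Distance of the fragment to every candidate offset in the input.
        d = [float(np.linalg.norm(input_T[i:i + m] - frag))
             for i in range(len(input_T) - m + 1)]
        offsets.append(int(np.argmin(d)))
    return offsets
```

The returned offsets, multiplied by the frame shift S, would give each subtitle's start time in the input video.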

Claims (4)

1. A subtitle matching and retrieval method for internet video, characterized by comprising the following steps:
Step one: for video files that already have text subtitles, build an index;
(1) Analyze the video's audio data; if the video has multiple channels, merge the multi-channel data into one channel;
(2) Normalize the audio sample rate;
(3) Divide the audio data into frames;
(4) For each frame, compute the zero-crossing rate, obtaining the video's basic audio feature sequence (the Z sequence), divided at intervals of S ms: Z1 Z2 Z3 Z4 Z5 ... Zn;
(5) For each video, in addition to the basic audio feature sequence, also store an integral sequence (the T sequence) over a time window of 1000 ms: T1 T2 T3 T4 ... Tn-9, where Tn = Zn + Zn+1 + ... + Zn+9; each subtitle corresponds to one Z-sequence fragment;
Step two: retrieval of video subtitles
(1) For a video I on the internet, normalize its audio data according to steps (1) and (2) of step one;
(2) Distinguish speech from non-speech using an endpoint detection algorithm;
(3) From video I, extract a speech-dense fragment Seg with a length of 10-30 seconds;
(4) Following the method of step one, compute the basic audio feature sequence (Z sequence) and the integral sequence (T sequence) of fragment Seg;
(5) For each video in the video index library, take out its T sequence as sequence A: A1 A2 A3 ... An, and take the T sequence of fragment Seg as sequence B: B1 B2 B3 ... Bm; compute the best match between sequence A and sequence B and its Euclidean distance, as follows:
(i) From the head of sequence A, take data of the same length as sequence B, i.e. A1 A2 A3 ... Am, and compute its Euclidean distance to sequence B: B1 B2 B3 ... Bm;
(ii) Offset the subsequence by k, i.e. A1+k A2+k A3+k ... Am+k, and compute its Euclidean distance to sequence B;
(iii) Offset the subsequence by 2k, i.e. A1+2k A2+2k A3+2k ... Am+2k, and compute its Euclidean distance to sequence B;
(iv) Continue in this way until the full sequence has been scanned;
(v) Find the subsequence A1+jk A2+jk A3+jk ... Am+jk with the shortest Euclidean distance, then perform a finer scan: among the sequences A1+jk+d A2+jk+d A3+jk+d ... Am+jk+d, where -m/2 <= d <= m/2, find the one with the shortest Euclidean distance to sequence B: B1 B2 B3 ... Bm; this is the best-match sequence, and its distance to sequence B is the distance between fragment Seg and the video;
(6) The video V with the minimum distance is taken as the matched video;
(7) For each subtitle of the matched video, perform a reverse match: compute the integral sequence of the input video as the A sequence and, following the flow of step two (5), use the Z-sequence fragment corresponding to each subtitle as the B sequence; find the best match, realizing subtitle matching.
2. The subtitle matching and retrieval method for internet video according to claim 1, characterized in that in step one (2) the sample rate is normalized to 16-bit, 8000 Hz.
3. The subtitle matching and retrieval method for internet video according to claim 1, characterized in that in step one (3) the audio data is divided into frames with frame length L ms and frame shift S ms.
4. The subtitle matching and retrieval method for internet video according to claim 1, characterized in that in step one (4) S is taken as 10 ms.
CN201410423582.4A 2014-08-26 2014-08-26 Subtitle matching and retrieval method for internet video Active CN104156478B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410423582.4A CN104156478B (en) 2014-08-26 2014-08-26 Subtitle matching and retrieval method for internet video

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410423582.4A CN104156478B (en) 2014-08-26 2014-08-26 Subtitle matching and retrieval method for internet video

Publications (2)

Publication Number Publication Date
CN104156478A CN104156478A (en) 2014-11-19
CN104156478B true CN104156478B (en) 2017-07-07

Family

ID=51881976

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410423582.4A Active CN104156478B (en) 2014-08-26 2014-08-26 Subtitle matching and retrieval method for internet video

Country Status (1)

Country Link
CN (1) CN104156478B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106101573A (en) * 2016-06-24 2016-11-09 中译语通科技(北京)有限公司 Anchoring and matching method for video annotation
CN114579806B (en) * 2022-04-27 2022-08-09 阿里巴巴(中国)有限公司 Video detection method, storage medium and processor

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1367906A (en) * 1999-07-31 2002-09-04 朴奎珍 Study method and apparatus using digital audio and caption data
US7378588B1 (en) * 2006-09-12 2008-05-27 Chieh Changfan Melody-based music search
CN102724598A (en) * 2011-12-05 2012-10-10 新奥特(北京)视频技术有限公司 Method for splitting news items
CN102937972A (en) * 2012-10-15 2013-02-20 上海外教社信息技术有限公司 Audiovisual subtitle making system and method

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9158754B2 (en) * 2012-03-29 2015-10-13 The Echo Nest Corporation Named entity extraction from a block of text

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1367906A (en) * 1999-07-31 2002-09-04 朴奎珍 Study method and apparatus using digital audio and caption data
US7378588B1 (en) * 2006-09-12 2008-05-27 Chieh Changfan Melody-based music search
CN102724598A (en) * 2011-12-05 2012-10-10 新奥特(北京)视频技术有限公司 Method for splitting news items
CN102937972A (en) * 2012-10-15 2013-02-20 上海外教社信息技术有限公司 Audiovisual subtitle making system and method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
基于视频的字幕检索与提取 [Video-based subtitle retrieval and extraction]; 杨友庆 et al.; 《计算机应用》 (Computer Applications); 2000-10-28; Vol. 20, No. 10, pp. 33-35 *

Also Published As

Publication number Publication date
CN104156478A (en) 2014-11-19

Similar Documents

Publication Publication Date Title
CN109257547B (en) Subtitle generation method for Chinese online audio and video
US11769515B2 (en) Audio coder window sizes and time-frequency transformations
RU2011104001A (en) METHOD AND DISCRIMINATOR FOR CLASSIFICATION OF VARIOUS SIGNAL SEGMENTS
EP4254951A3 (en) Audio decoding method for processing stereo audio signals using a variable prediction direction
EP2629293A3 (en) Method and apparatus for audio decoding
TR201903942T4 (en) Post processing device and method for spectral values and encoder and decoder for audio signals.
MY181026A (en) Apparatus and method realizing improved concepts for tcx ltp
MY188538A (en) Decoding device, method, and program
MY191125A (en) Audio data processing method and terminal
CN105469807B (en) A kind of more fundamental frequency extracting methods and device
WO2008122975A3 (en) Means and methods for detecting bacteria in a sample
US8463614B2 (en) Audio encoding/decoding for reducing pre-echo of a transient as a function of bit rate
CN104156478B (en) A kind of captions matching of internet video and search method
MY176427A (en) Method and device for arithmetic encoding or arithmetic decoding
EP4325727A3 (en) Data processing method and device
EP4235662A3 (en) Audio decoder, audio encoder, method for providing a decoded audio signal, method for providing an encoded audio signal, audio stream, audio stream provider and computer program using a stream identifier
CN104064191B (en) Sound mixing method and device
WO2010092915A1 (en) Method for processing multichannel acoustic signal, system thereof, and program
EP2757801A3 (en) Server and client processing multiple sets of channel information and controlling method of the same
MX362612B (en) Method and device for processing audio signal.
EP3095117B1 (en) Multi-channel audio signal classifier
TW200811833A (en) Detection method for voice activity endpoint
EP2077633A3 (en) Method and apparatus for processing service guide information
CN106486133A (en) One kind is uttered long and high-pitched sounds scene recognition method and equipment
CN109361923A (en) A kind of time slip-window scene change detection method and system based on motion analysis

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
CP01 Change in the name or title of a patent holder
CP01 Change in the name or title of a patent holder

Address after: Room 7473, room No. 3, No. 3, Xijing Road, Badachu high tech park, Shijingshan District, Beijing

Patentee after: Chinese translation language through Polytron Technologies Inc

Address before: Room 7473, room No. 3, No. 3, Xijing Road, Badachu high tech park, Shijingshan District, Beijing

Patentee before: Mandarin Technology (Beijing) Co., Ltd.