CN104156478B - Subtitle matching and retrieval method for internet videos - Google Patents

Subtitle matching and retrieval method for internet videos

Info

Publication number
CN104156478B
CN104156478B CN201410423582.4A
Authority
CN
China
Prior art keywords
sequence
video
captions
sequences
matching
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201410423582.4A
Other languages
Chinese (zh)
Other versions
CN104156478A (en)
Inventor
程国艮
袁翔宇
王宇晨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Global Tone Communication Technology Co., Ltd. (中译语通科技)
Original Assignee
Mandarin Technology (Beijing) Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Mandarin Technology (Beijing) Co Ltd filed Critical Mandarin Technology (Beijing) Co Ltd
Priority to CN201410423582.4A priority Critical patent/CN104156478B/en
Publication of CN104156478A publication Critical patent/CN104156478A/en
Application granted granted Critical
Publication of CN104156478B publication Critical patent/CN104156478B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/71Indexing; Data structures therefor; Storage structures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/78Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/783Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F16/7844Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using original textual content or text extracted from visual content or transcript of audio data

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Library & Information Science (AREA)
  • Software Systems (AREA)
  • Television Systems (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention is a subtitle matching and retrieval method for internet videos, comprising the following steps: for video files with existing text subtitles, build an index; obtain the video's elementary audio feature sequence (Z sequence), divided at S ms intervals: Z1 Z2 Z3 Z4 Z5 ... Zn, together with its integration-sum sequence (T sequence): T1 T2 T3 T4 ... Tn-9; each subtitle corresponds to one Z-sequence fragment. To retrieve a video's subtitles, for each video in the video index take its T sequence as sequence A: A1 A2 A3 ... An, and the T sequence of a fragment Seg as sequence B: B1 B2 B3 ... Bm; compute the best match between sequence A and sequence B and its Euclidean distance; the video V with the minimum distance is the matched video. For each subtitle in the matched video, a counter-match then finds the best alignment, completing the subtitle matching. The present invention builds its index from audio data, providing an efficient and accurate subtitle indexing mechanism and subtitle retrieval method for differently formatted videos.

Description

Subtitle matching and retrieval method for internet videos
Technical field
The present invention relates to the field of computer software, and in particular to a subtitle matching and retrieval method for internet videos.
Background technology
Videos on the internet are varied: videos with identical content may differ in codec, bitrate, and resolution, and one video may be a fragment of another. Under these conditions it is difficult to index video subtitles efficiently and accurately, and so to match subtitles across copies. This patent provides a subtitle indexing mechanism and subtitle retrieval method for differently formatted videos.
Summary of the invention
To solve the above problems, the present invention provides a subtitle matching and retrieval method for internet videos that builds its index from audio data.
The subtitle matching and retrieval method for internet videos of the present invention comprises the following steps:
Step 1: for video files with existing text subtitles, build an index;
(1) Analyze the video's audio data; if the video has multiple channels, merge the multi-channel data into one channel;
(2) Normalize the audio sample rate;
(3) Split the audio data into frames;
(4) For each frame, compute the zero-crossing rate, obtaining the video's elementary audio feature sequence (Z sequence) divided at S ms intervals: Z1 Z2 Z3 Z4 Z5 ... Zn;
(5) For each video, in addition to the elementary audio feature sequence, also store an integration-sum sequence (T sequence) with a time window of 1000 ms: T1 T2 T3 T4 ... Tn-9, where Tn = Zn + Zn+1 + ... + Zn+9; each subtitle corresponds to one Z-sequence fragment;
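Step 1's feature extraction can be sketched in Python. This is an illustrative reconstruction, not the patent's implementation: the 30 ms frame length, the sign convention for counting zero crossings, and the use of NumPy are assumptions; only the S = 10 ms frame shift and the 10-term integration sum Tn = Zn + ... + Zn+9 come from the text.

```python
import numpy as np

def zero_crossing_rate(frame: np.ndarray) -> int:
    """Number of sign changes within one audio frame."""
    signs = np.sign(frame)
    signs[signs == 0] = 1  # treat exact zeros as positive (a convention, assumed)
    return int(np.sum(signs[:-1] != signs[1:]))

def z_sequence(samples: np.ndarray, sr: int = 8000,
               frame_ms: int = 30, shift_ms: int = 10) -> np.ndarray:
    """Z sequence: per-frame zero-crossing rates, one value per 10 ms shift."""
    frame_len = sr * frame_ms // 1000
    shift = sr * shift_ms // 1000
    n_frames = max(0, (len(samples) - frame_len) // shift + 1)
    return np.array([zero_crossing_rate(samples[i * shift: i * shift + frame_len])
                     for i in range(n_frames)])

def t_sequence(z: np.ndarray, window: int = 10) -> np.ndarray:
    """Integration-sum T sequence: T_n = Z_n + Z_{n+1} + ... + Z_{n+9}."""
    return np.convolve(z, np.ones(window, dtype=int), mode="valid")
```

With a 10 ms shift, the T sequence of an n-frame Z sequence has n-9 entries, matching T1 ... Tn-9 in the text.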
Step 2: retrieval of video subtitles
(1) For a video I on the internet, normalize its audio data as in steps (1) and (2) of Step 1;
(2) Separate speech from non-speech with an endpoint-detection algorithm;
(3) From video I, extract a speech-dense fragment Seg, 10-30 seconds in length;
(4) Following the method of Step 1, compute fragment Seg's elementary audio feature sequence (Z sequence) and integration-sum sequence (T sequence);
(5) For each video in the video index, take its T sequence as sequence A: A1 A2 A3 ... An, and the T sequence of fragment Seg as sequence B: B1 B2 B3 ... Bm; compute the best match between sequence A and sequence B and its Euclidean distance, as follows:
i. Take from the head of sequence A a subsequence of the same length as sequence B: A1 A2 A3 ... Am, and compute its Euclidean distance to sequence B: B1 B2 B3 ... Bm;
ii. Offset the subsequence by k, i.e. A1+k A2+k A3+k ... Am+k, and compute its Euclidean distance to sequence B;
iii. Offset the subsequence by 2k, i.e. A1+2k A2+2k A3+2k ... Am+2k, and compute its Euclidean distance to sequence B;
iv. Continue in this way until the full sequence has been scanned;
v. Find the subsequence A1+jk A2+jk A3+jk ... Am+jk with the smallest Euclidean distance, then scan more finely: among the subsequences A1+jk+d A2+jk+d A3+jk+d ... Am+jk+d, where -m/2 <= d <= m/2, find the one with the smallest Euclidean distance to sequence B: B1 B2 B3 ... Bm; this is the best-match sequence, and its distance to sequence B is the distance between fragment Seg and the video;
(6) The video V with the minimum distance is returned as the matched video;
(7) For each subtitle in the matched video, perform a counter-match: compute the integration-sum sequence of the input video as sequence A and, following the flow of Step 2(5), take the Z-sequence fragment corresponding to each subtitle as sequence B; finding the best match completes the subtitle matching.
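The coarse-to-fine scan of Step 2(5) can be sketched as follows. This is a hedged reconstruction under stated assumptions: the refinement window of ±m/2 around the best coarse offset follows sub-step v, while the stride k = 10 default and the clamping of the fine window to the sequence bounds are choices not specified in the text.

```python
import numpy as np

def euclidean(a: np.ndarray, b: np.ndarray) -> float:
    """Euclidean distance between two equal-length sequences."""
    return float(np.linalg.norm(a - b))

def best_match(seq_a: np.ndarray, seq_b: np.ndarray, k: int = 10):
    """Coarse scan of seq_a with stride k, then a fine scan of offsets within
    m/2 of the best coarse offset. Returns (best_offset, distance)."""
    m, n = len(seq_b), len(seq_a)
    # Coarse pass: offsets 0, k, 2k, ... while a full m-length window fits.
    coarse = [(off, euclidean(seq_a[off:off + m], seq_b))
              for off in range(0, n - m + 1, k)]
    best_off, _ = min(coarse, key=lambda t: t[1])
    # Fine pass: every offset within m/2 of the coarse winner (clamped to bounds).
    lo = max(0, best_off - m // 2)
    hi = min(n - m, best_off + m // 2)
    fine = [(off, euclidean(seq_a[off:off + m], seq_b))
            for off in range(lo, hi + 1)]
    return min(fine, key=lambda t: t[1])
```

Running this once per indexed video and keeping the minimum distance yields the matched video V of step (6).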
In Step 1(2), the sample rate is normalized to 16-bit, 8,000 Hz.
In Step 1(3), the audio data is split into frames of length L ms with a frame shift of S ms.
In Step 1(4), S is taken as 10 ms.
The advantageous technical effects of the invention are as follows: the invention builds its index from audio data, combining the construction of an audio feature sequence (the integration-sum sequence), a method for finding the best-matching video based on the integration-sum sequence, and a method for matching subtitles based on the integration-sum sequence, providing an efficient and accurate subtitle indexing mechanism and subtitle retrieval method for differently formatted videos.
Specific embodiment
Specific embodiments of the invention are described below in further detail with reference to examples.
The subtitle matching and retrieval method for internet videos of the present invention comprises the following steps:
Step 1: for video files with existing text subtitles, build an index;
(1) Analyze the video's audio data; if the video has multiple channels, merge the multi-channel data into one channel.
(2) Normalize the audio sample rate; the sample rate is normalized to 16-bit, 8,000 Hz, or another sample rate.
(3) Split the audio data into frames of length L ms with a frame shift of S ms.
(4) For each frame, compute the zero-crossing rate, obtaining the video's elementary audio feature sequence (Z sequence) divided at S ms intervals: Z1 Z2 Z3 Z4 Z5 ... Zn; S is taken as 10 ms.
(5) For each video, in addition to the elementary audio feature sequence, also store an integration-sum sequence (T sequence) with a time window of 1000 ms: T1 T2 T3 T4 ... Tn-9, where Tn = Zn + Zn+1 + ... + Zn+9; each subtitle corresponds to one Z-sequence fragment.
Step 2: retrieval of video subtitles
(1) For a video I on the internet, normalize its audio data as in steps (1) and (2) of Step 1.
(2) Separate speech from non-speech with an endpoint-detection algorithm.
(3) From video I, extract a speech-dense fragment Seg, 10-30 seconds in length.
(4) Following the method of Step 1, compute fragment Seg's elementary audio feature sequence (Z sequence) and integration-sum sequence (T sequence).
(5) For each video in the video index, take its T sequence as sequence A: A1 A2 A3 ... An, and the T sequence of fragment Seg as sequence B: B1 B2 B3 ... Bm; compute the best match between sequence A and sequence B and its Euclidean distance, as follows:
i. Take from the head of sequence A a subsequence of the same length as sequence B: A1 A2 A3 ... Am, and compute its Euclidean distance to sequence B: B1 B2 B3 ... Bm;
ii. Offset the subsequence by k, i.e. A1+k A2+k A3+k ... Am+k, and compute its Euclidean distance to sequence B;
iii. Offset the subsequence by 2k, i.e. A1+2k A2+2k A3+2k ... Am+2k, and compute its Euclidean distance to sequence B;
iv. Continue in this way until the full sequence has been scanned;
v. Find the subsequence A1+jk A2+jk A3+jk ... Am+jk with the smallest Euclidean distance, then scan more finely: among the subsequences A1+jk+d A2+jk+d A3+jk+d ... Am+jk+d, where -m/2 <= d <= m/2, find the one with the smallest Euclidean distance to sequence B: B1 B2 B3 ... Bm; this is the best-match sequence, and its distance to sequence B is the distance between fragment Seg and the video;
(6) The video V with the minimum distance is returned as the matched video.
(7) For each subtitle in the matched video, perform a counter-match: compute the integration-sum sequence of the input video as sequence A and, following the flow of Step 2(5), take the Z-sequence fragment corresponding to each subtitle as sequence B; finding the best match completes the subtitle matching.
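The counter-match of step (7) — locating each subtitle's Z-sequence fragment inside the input video — can be sketched as below. The exhaustive slide and the conversion of the best offset to a start time via the 10 ms frame shift are illustrative assumptions; the patent itself reuses the coarse-to-fine flow of Step 2(5) rather than an exhaustive scan.

```python
import numpy as np

def align_subtitle(video_t: np.ndarray, caption_t: np.ndarray,
                   shift_ms: int = 10) -> float:
    """Slide the caption's T sequence over the input video's T sequence and
    return the start time (in seconds) of the minimum-distance offset."""
    m = len(caption_t)
    dists = [np.linalg.norm(video_t[off:off + m] - caption_t)
             for off in range(len(video_t) - m + 1)]
    best_off = int(np.argmin(dists))
    return best_off * shift_ms / 1000.0  # one T value per 10 ms frame shift
```

Applying this to every subtitle of the matched video yields the per-subtitle alignment that completes the matching.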

Claims (4)

1. A subtitle matching and retrieval method for an internet video, characterized in that it comprises the following steps:
Step 1: for video files with existing text subtitles, build an index;
(1) Analyze the video's audio data; if the video has multiple channels, merge the multi-channel data into one channel;
(2) Normalize the audio sample rate;
(3) Split the audio data into frames;
(4) For each frame, compute the zero-crossing rate, obtaining the video's elementary audio feature sequence (Z sequence) divided at S ms intervals: Z1 Z2 Z3 Z4 Z5 ... Zn;
(5) For each video, in addition to the elementary audio feature sequence, also store an integration-sum sequence (T sequence) with a time window of 1000 ms: T1 T2 T3 T4 ... Tn-9, where Tn = Zn + Zn+1 + ... + Zn+9; each subtitle corresponds to one Z-sequence fragment;
Step 2: retrieval of video subtitles
(1) For a video I on the internet, normalize its audio data as in steps (1) and (2) of Step 1;
(2) Separate speech from non-speech with an endpoint-detection algorithm;
(3) From video I, extract a speech-dense fragment Seg, 10-30 seconds in length;
(4) Following the method of Step 1, compute fragment Seg's elementary audio feature sequence (Z sequence) and integration-sum sequence (T sequence);
(5) For each video in the video index, take its T sequence as sequence A: A1 A2 A3 ... An, and the T sequence of fragment Seg as sequence B: B1 B2 B3 ... Bm; compute the best match between sequence A and sequence B and its Euclidean distance, as follows:
i. Take from the head of sequence A a subsequence of the same length as sequence B: A1 A2 A3 ... Am, and compute its Euclidean distance to sequence B: B1 B2 B3 ... Bm;
ii. Offset the subsequence by k, i.e. A1+k A2+k A3+k ... Am+k, and compute its Euclidean distance to sequence B;
iii. Offset the subsequence by 2k, i.e. A1+2k A2+2k A3+2k ... Am+2k, and compute its Euclidean distance to sequence B;
iv. Continue in this way until the full sequence has been scanned;
v. Find the subsequence A1+jk A2+jk A3+jk ... Am+jk with the smallest Euclidean distance, then scan more finely: among the subsequences A1+jk+d A2+jk+d A3+jk+d ... Am+jk+d, where -m/2 <= d <= m/2, find the one with the smallest Euclidean distance to sequence B: B1 B2 B3 ... Bm; this is the best-match sequence, and its distance to sequence B is the distance between fragment Seg and the video;
(6) The video V with the minimum distance is the matched video;
(7) For each subtitle in the matched video, perform a counter-match: compute the integration-sum sequence of the input video as sequence A and, following the flow of Step 2(5), take the Z-sequence fragment corresponding to each subtitle as sequence B; finding the best match completes the subtitle matching.
2. The subtitle matching and retrieval method for an internet video according to claim 1, characterized in that in Step 1(2) the sample rate is normalized to 16-bit, 8,000 Hz.
3. The subtitle matching and retrieval method for an internet video according to claim 1, characterized in that in Step 1(3) the audio data is split into frames of length L ms with a frame shift of S ms.
4. The subtitle matching and retrieval method for an internet video according to claim 1, characterized in that in Step 1(4), S is taken as 10 ms.
CN201410423582.4A 2014-08-26 2014-08-26 Subtitle matching and retrieval method for internet videos Active CN104156478B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410423582.4A CN104156478B (en) 2014-08-26 2014-08-26 Subtitle matching and retrieval method for internet videos

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410423582.4A CN104156478B (en) 2014-08-26 2014-08-26 Subtitle matching and retrieval method for internet videos

Publications (2)

Publication Number Publication Date
CN104156478A CN104156478A (en) 2014-11-19
CN104156478B true CN104156478B (en) 2017-07-07

Family

ID=51881976

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410423582.4A Active CN104156478B (en) 2014-08-26 2014-08-26 Subtitle matching and retrieval method for internet videos

Country Status (1)

Country Link
CN (1) CN104156478B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106101573A (en) * 2016-06-24 2016-11-09 中译语通科技(北京)有限公司 Anchoring and matching method for video annotation
CN114579806B (en) * 2022-04-27 2022-08-09 阿里巴巴(中国)有限公司 Video detection method, storage medium and processor

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1367906A (en) * 1999-07-31 2002-09-04 朴奎珍 Study method and apparatus using digital audio and caption data
US7378588B1 (en) * 2006-09-12 2008-05-27 Chieh Changfan Melody-based music search
CN102724598A (en) * 2011-12-05 2012-10-10 新奥特(北京)视频技术有限公司 Method for splitting news items
CN102937972A (en) * 2012-10-15 2013-02-20 上海外教社信息技术有限公司 Audiovisual subtitle making system and method

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9158754B2 (en) * 2012-03-29 2015-10-13 The Echo Nest Corporation Named entity extraction from a block of text

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1367906A (en) * 1999-07-31 2002-09-04 朴奎珍 Study method and apparatus using digital audio and caption data
US7378588B1 (en) * 2006-09-12 2008-05-27 Chieh Changfan Melody-based music search
CN102724598A (en) * 2011-12-05 2012-10-10 新奥特(北京)视频技术有限公司 Method for splitting news items
CN102937972A (en) * 2012-10-15 2013-02-20 上海外教社信息技术有限公司 Audiovisual subtitle making system and method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Video-based subtitle retrieval and extraction (基于视频的字幕检索与提取); Yang Youqing et al.; Computer Applications (计算机应用); 2000-10-28; Vol. 20, No. 10, pp. 33-35 *

Also Published As

Publication number Publication date
CN104156478A (en) 2014-11-19

Similar Documents

Publication Publication Date Title
CN107305541B (en) Method and device for segmenting speech recognition text
CN109257547B (en) Chinese online audio/video subtitle generating method
RU2011104001A (en) METHOD AND DISCRIMINATOR FOR CLASSIFICATION OF VARIOUS SIGNAL SEGMENTS
CN106297776B (en) Voice keyword retrieval method based on audio templates
EP2819414A3 (en) Image processing device and image processing method
EP2884423A3 (en) Video synopsis method and apparatus
CN105408956B (en) Method for obtaining spectral coefficients of a replacement frame of an audio signal and related product
US11769515B2 (en) Audio coder window sizes and time-frequency transformations
EP4404560A3 (en) Audio decoding method for processing stereo audio signals using a variable prediction direction
MY181026A (en) Apparatus and method realizing improved concepts for tcx ltp
NO20092125L (en) Device and method for processing spectral values, as well as audio signal decoders and decoders
MY188538A (en) Decoding device, method, and program
WO2012128382A1 (en) Device and method for lip motion detection
US20100121648A1 (en) Audio frequency encoding and decoding method and device
EP4354432A3 (en) Apparatus for determining for the compression of an hoa data frame representation a lowest integer number of bits required for representing non-differential gain values
WO2008122975A3 (en) Means and methods for detecting bacteria in a sample
EP4243017A3 (en) Apparatus and method decoding an audio signal using an aligned look-ahead portion
CN104156478B (en) Subtitle matching and retrieval method for internet videos
MX2019001193A (en) Method and device for processing audio signal.
WO2015024428A1 (en) Method, terminal, system for audio encoding/decoding/codec
CN108198558A (en) Speech recognition method based on CSI data
EP4336495A3 (en) Inter-channel phase difference parameter extraction method and apparatus
CN103456307B (en) In audio decoder, the spectrum of frame error concealment replaces method and system
CN107424628A (en) Method for locating the endpoints of a specific target sound in a noisy environment
MX2016014237A (en) Improved frame loss correction with voice information.

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP01 Change in the name or title of a patent holder

Address after: Room 7473, room No. 3, No. 3, Xijing Road, Badachu high tech park, Shijingshan District, Beijing

Patentee after: Global Tone Communication Technology Co., Ltd. (中译语通科技)

Address before: Room 7473, room No. 3, No. 3, Xijing Road, Badachu high tech park, Shijingshan District, Beijing

Patentee before: Mandarin Technology (Beijing) Co., Ltd.