CN104038804B - Captioning synchronization apparatus and method based on speech recognition - Google Patents


Info

Publication number
CN104038804B
Authority
CN
China
Prior art keywords
text information
voice
module
word
captions
Prior art date
Legal status
Active
Application number
CN201310069142.9A
Other languages
Chinese (zh)
Other versions
CN104038804A (en)
Inventor
徐�明
范炜
谭皓
Current Assignee
Samsung Electronics China R&D Center
Samsung Electronics Co Ltd
Original Assignee
Samsung Electronics China R&D Center
Samsung Electronics Co Ltd
Priority date
Filing date
Publication date
Application filed by Samsung Electronics China R&D Center, Samsung Electronics Co Ltd filed Critical Samsung Electronics China R&D Center
Priority to CN201310069142.9A
Publication of CN104038804A
Application granted
Publication of CN104038804B
Status: Active
Anticipated expiration


Abstract

A captioning synchronization apparatus and method based on speech recognition are provided. The captioning synchronization apparatus includes: a speech recognition module, which extracts the speech in the foreground sound from an audio stream, samples and recognizes the extracted speech, and thereby generates corresponding text information; a dynamic sampling adjustment module, which evaluates the semantic recognition degree of the generated text information and, according to the evaluation result, controls the speech recognition module to adjust its sampling rate so as to obtain text information with a high semantic recognition degree; a caption semantic contrast module, which semantically matches the text information having a high semantic recognition degree against the words of the additional multilingual subtitles of the playing video; a captioning synchronization module, which, if the caption semantic contrast module finds a sentence in the subtitle file corresponding to the text information of the recognized speech, adjusts the time information of the subtitle file according to the time information of the speech; and a subtitle display module, which displays the captions according to the adjusted time information of the subtitle file.

Description

Captioning synchronization apparatus and method based on speech recognition
Technical field
The present invention relates to the technical field of speech recognition and captioning synchronization. More particularly, it relates to an apparatus and method for automatically synchronizing the captions corresponding to a video, by means of speech recognition, while a TV program is being played.
Background technology
At present, digital television signal streams support only a limited number of subtitle languages and cannot satisfy the demands of different audiences at the same time. In places such as hotels, where guests from countries with many different languages stay, these audiences have particular needs when watching digital television captions. There is therefore a demand for displaying additional multilingual subtitles when playing digital TV video. Moreover, since TV programs may be interrupted by commercials, emergency notices, and similar information, the display of the additional multilingual subtitles needs the ability to skip over such advertisement-type insertions and remain synchronized with the audio and video at all times.
Summary of the invention
The present invention proposes a scheme for keeping additional captions displayed in synchronization, by using speech recognition technology, even when a TV program contains commercial breaks. For the additional language subtitles, dynamic speech sampling is used to reasonably obtain effective audio information, the multilingual caption text is matched, and its display timestamps are adjusted, so that the multilingual captions can be effectively adjusted when phenomena such as interruptions occur in a digital television program, keeping the multilingual captions displayed in synchronization.
According to an aspect of the present invention, there is provided a captioning synchronization device based on speech recognition, including: a speech recognition module, which extracts the speech in the foreground sound from an audio stream corresponding to a playing video, and samples and recognizes the extracted speech, thereby generating text information corresponding to the recognized speech; a dynamic sampling adjustment module, which evaluates the semantic recognition degree of the text information generated by the speech recognition module, and controls the speech recognition module to adjust its sampling rate according to the evaluation result so as to obtain text information with a high semantic recognition degree; a caption semantic contrast module, which semantically matches the text information having a high semantic recognition degree against the words of the additional multilingual subtitles of the playing video; a captioning synchronization module, which, if the caption semantic contrast module finds a sentence in the subtitle file corresponding to the text information of the recognized speech, adjusts the time information of the subtitle file according to the time information of the speech; and a subtitle display module, which displays the captions according to the time information of the subtitle file adjusted by the captioning synchronization module.
According to an aspect of the present invention, the captioning synchronization device further includes: a language selection module, which determines the language of the captions to be displayed according to the user's selection.
According to an aspect of the present invention, when the dynamic sampling adjustment module determines that the number of speech words in the text information generated by the speech recognition module is within a preset range [m, n], the dynamic sampling adjustment module determines that the text information has a high semantic recognition degree, where m and n are natural numbers.
According to an aspect of the present invention, if the dynamic sampling adjustment module determines that the number of speech words in the text information generated by the speech recognition module is less than the minimum number m, the dynamic sampling adjustment module controls the speech recognition module to raise the sampling frequency for sampling the speech; if the dynamic sampling adjustment module determines that the number of speech words in the text information generated by the speech recognition module is greater than the maximum number n, the dynamic sampling adjustment module controls the speech recognition module to lower the sampling frequency for sampling the speech.
According to an aspect of the present invention, the dynamic sampling adjustment module also considers the semantic meaning of the speech words in the text information generated by the speech recognition module when evaluating the semantic recognition degree of the text information.
According to an aspect of the present invention, the caption semantic contrast module uses a fuzzy algorithm to perform character scoring on the words of the additional multilingual subtitles of the playing video, so as to find the highest-scoring sentence in the subtitle file as the sentence matching the text information.
According to an aspect of the present invention, if the caption semantic contrast module does not find a sentence in the subtitle file corresponding to the text information of the recognized speech, it notifies the dynamic sampling adjustment module to raise the sampling frequency of the speech recognition module.
According to another aspect of the present invention, there is provided a captioning synchronization method based on speech recognition, including: (a) extracting the speech in the foreground sound from an audio stream corresponding to a playing video, and sampling and recognizing the extracted speech, thereby generating text information corresponding to the recognized speech; (b) evaluating the semantic recognition degree of the generated text information, and controlling the speech recognition module to adjust its sampling rate according to the evaluation result so as to obtain text information with a high semantic recognition degree; (c) semantically matching the text information having a high semantic recognition degree against the words of the additional multilingual subtitles of the playing video, so as to find a sentence in the subtitle file corresponding to the text information of the recognized speech; (d) adjusting the time information of the subtitle file according to the time information of the speech; and (e) displaying the captions according to the adjusted time information of the subtitle file.
According to another aspect of the present invention, the captioning synchronization method further includes: determining the language of the captions to be displayed according to the user's selection.
According to another aspect of the present invention, in step (b), when the number of speech words in the text information generated in step (a) is determined to be within a preset range [m, n], the text information is determined to have a high semantic recognition degree, where m and n are natural numbers.
According to another aspect of the present invention, in step (b), if the number of speech words in the text information generated in step (a) is determined to be less than the minimum number m, the method returns to step (a) and raises the sampling frequency for sampling the speech; if the number of speech words in the text information generated in step (a) is determined to be greater than the maximum number n, the method returns to step (a) and lowers the sampling frequency for sampling the speech.
According to another aspect of the present invention, in step (b), the semantic meaning of the speech words in the text information generated in step (a) is also considered when evaluating the semantic recognition degree of the text information.
According to another aspect of the present invention, in step (c), a fuzzy algorithm is used to perform character scoring on the words of the additional multilingual subtitles of the playing video, so as to find the highest-scoring sentence in the subtitle file as the sentence matching the text information.
According to another aspect of the present invention, if no sentence corresponding to the text information of the recognized speech is found in the subtitle file in step (c), the method returns to step (a) and raises the sampling frequency of the speech recognition.
Brief description of the drawings
The above and other objects and features of the present invention will become clearer from the following description taken in conjunction with the accompanying drawings, in which:
Fig. 1 is a block diagram showing a captioning synchronization device based on speech recognition according to an embodiment of the present invention;
Fig. 2 is a flowchart showing a captioning synchronization method based on speech recognition according to an embodiment of the present invention.
Embodiment
The following description, made with reference to the accompanying drawings, is provided to assist in a comprehensive understanding of exemplary embodiments of the invention as defined by the claims and their equivalents. The description includes various specific details to assist in understanding, but these are to be regarded as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the invention. In addition, descriptions of well-known functions and constructions may be omitted for clarity and conciseness.
The terms and words used in the following description and claims are not limited to their bibliographical meanings, but are merely used by the inventors to enable a clear and consistent understanding of the invention. Accordingly, it should be apparent to those skilled in the art that the following description of exemplary embodiments of the present invention is provided for illustration purposes only, and not for the purpose of limiting the invention as defined by the claims and their equivalents.
Fig. 1 is a block diagram showing a captioning synchronization device 100 based on speech recognition according to an embodiment of the present invention.
As shown in Fig. 1, the captioning synchronization device 100 based on speech recognition according to an embodiment of the present invention includes a language selection module 110, a speech recognition module 120, a dynamic sampling adjustment module 130, a caption semantic contrast module 140, a captioning synchronization module 150, and a subtitle display module 160. The captioning synchronization device 100 according to an embodiment of the present invention may be integrated into a digital broadcast receiving device or a video playing device.
The language selection module 110 may determine the subtitle language to be displayed according to the user's selection. For example, the user may send a signal to the captioning synchronization device 100 via a controller such as a remote control, thereby selecting the subtitle language he or she wishes to use.
The speech recognition module 120 extracts the speech in the foreground sound from the audio stream corresponding to the video stream of the playing TV program or other playing content, and samples and recognizes the extracted speech, thereby generating text information corresponding to the recognized speech. By extracting the main foreground speech, background sounds in the playing video (for example, cars and background music in a movie or TV program) can be removed, which improves the accuracy of the speech recognition. The speech recognition module 120 can be realized using any foreground speech extraction method and speech recognition engine known in the prior art.
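The description above leaves the foreground speech extraction method open to any prior-art technique. Purely as an illustration, and not as the method of this patent, a greatly simplified energy-based voice activity detector (a hypothetical stand-in for a real foreground/background separator; all names are our own) could be sketched in Python as:

```python
# Minimal energy-based voice activity detection (VAD): a hypothetical,
# greatly simplified stand-in for the foreground-speech extraction that
# the patent delegates to existing prior-art methods.

def frame_energies(samples, frame_size=160):
    """Split PCM samples into frames and return mean squared energy per frame."""
    frames = [samples[i:i + frame_size] for i in range(0, len(samples), frame_size)]
    return [sum(s * s for s in f) / len(f) for f in frames if f]

def voiced_frames(samples, frame_size=160, threshold_ratio=0.5):
    """Mark frames whose energy exceeds a fraction of the peak energy as foreground."""
    energies = frame_energies(samples, frame_size)
    threshold = threshold_ratio * max(energies)
    return [e > threshold for e in energies]
```

A production implementation would operate on real decoded audio and use spectral features rather than raw frame energy; this sketch only illustrates the idea of discarding background-dominated frames before recognition.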
The dynamic sampling adjustment module 130 evaluates the semantic recognition degree of the text information generated by the speech recognition module 120, and determines, according to the evaluation result, whether the sampling frequency of the speech recognition module 120 needs to be adjusted. According to an embodiment of the present invention, the dynamic sampling adjustment module 130 may determine whether the number of speech words in the text information generated by the speech recognition module 120 is within a preset range [m, n]. If the number of speech words in the text information is determined to be less than the minimum number m or greater than the maximum number n, the dynamic sampling adjustment module 130 determines that the semantic recognition degree is low and that the sampling rate needs to be adjusted. When the dynamic sampling adjustment module 130 determines that the number of speech words in the text information generated by the speech recognition module 120 is less than the minimum number m, it determines that the sampling frequency needs to be raised, and controls the speech recognition module 120 to sample the speech at the raised sampling frequency. When the dynamic sampling adjustment module 130 determines that the number of speech words in the text information generated by the speech recognition module 120 is greater than the maximum number n, it determines that the sampling frequency can be lowered, and controls the speech recognition module 120 to sample the speech at the lowered sampling frequency. That is, when a person in the audio speaks very quickly, the number of sentence characters obtained per unit time increases, which increases the error rate of caption matching; in this case, the semantic recognition degree of the current audio can be determined to be low. Conversely, when a person in the audio speaks very slowly, the number of sentence characters obtained per unit time decreases, which likewise increases the error rate of caption matching; in this case, too, the semantic recognition degree of the current audio can be determined to be low. Therefore, a high semantic recognition degree can be determined only by controlling the sampling frequency so as to obtain an appropriate number of characters.
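The [m, n] word-count rule described above can be sketched directly in code. This is a minimal illustration, not the patent's implementation; the default values of m, n, and the adjustment step are assumptions made for the example:

```python
def adjust_sampling(word_count, sample_rate, m=5, n=30, step=4000):
    """Apply the [m, n] word-count rule: return (new_sample_rate, recognition_ok).
    Too few recognized words -> raise the rate; too many -> lower it;
    within range -> accept the text information as having a high
    semantic recognition degree."""
    if word_count < m:
        return sample_rate + step, False   # slow speech / too few words: sample faster
    if word_count > n:
        return sample_rate - step, False   # fast speech / too many words: sample slower
    return sample_rate, True               # semantic recognition degree is acceptable
```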
In addition, according to an embodiment of the present invention, when evaluating the semantic recognition degree, the dynamic sampling adjustment module 130 may also consider the semantic meaning of the speech words in the text information generated by the speech recognition module 120 in determining whether the sampling frequency needs to be adjusted. For example, when the text information generated by the speech recognition module 120 contains many low-semantic words (for example, several consecutive onomatopoeic words), the dynamic sampling adjustment module 130 may determine that the semantic recognition degree of the text information generated by the speech recognition module 120 is low, and control the speech recognition module 120 to raise the sampling frequency.
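The semantic-meaning check can likewise be sketched as a simple filter that flags text dominated by low-semantic tokens. The token set and ratio below are made-up examples, not part of the patent:

```python
# Illustrative set of filler/onomatopoeic words; a real system would use a
# language-specific lexicon. This is an assumption for the sketch only.
LOW_SEMANTIC = {"ah", "uh", "um", "oh"}

def has_low_semantics(words, max_filler_ratio=0.5):
    """Flag text whose proportion of filler/onomatopoeic words is too high,
    signaling that the sampling frequency should be raised."""
    if not words:
        return True
    fillers = sum(1 for w in words if w.lower() in LOW_SEMANTIC)
    return fillers / len(words) > max_filler_ratio
```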
Next, after text information with a higher semantic recognition degree has been obtained through the assessment by the dynamic sampling adjustment module 130, the caption semantic contrast module 140 semantically matches the text information against the words of the additional multilingual subtitles of the playing video. Here, the caption semantic contrast module 140 may use a fuzzy algorithm to perform character scoring on the words of the additional multilingual subtitles, so as to find the highest-scoring sentence in the subtitle file. That is, among the sentences in the subtitle file whose scores exceed a predetermined value, the caption semantic contrast module 140 determines the highest-scoring sentence as the sentence corresponding to the recognized text information.
An example of scoring sentences by means of a fuzzy algorithm is given below. Of course, those skilled in the art may use other ways to search the subtitle file for a sentence that semantically matches a given sentence.
Given two character strings ACAATCC and AGCATGC, operations such as modification, deletion, and addition are required before the two can be matched completely. To make computing the degree of approximation more convenient, the edit distance is converted into an approximation score: a complete character match scores 2 points, while a modification, deletion, or addition scores -1 point. To obtain the approximation score, a score matrix can be built from the following recurrence formulas; the approximation score is the bottom-right value of the matrix, whose order is the length of the matched string plus 1. Here V denotes Value (the score value), D denotes Difference (the difference value), S denotes the String to be matched, T denotes the Template, and i and j denote the row and column of the matrix respectively, with values starting from 0.
The initial values can be obtained directly:
V(0, 0) = 0;
V(0, j) = V(0, j-1) + D(_, T[j]); (j insertions)
V(i, 0) = V(i-1, 0) + D(S[i], _); (i deletions)
The other values can be obtained by the following recurrence:
V(i, j) = max( V(i-1, j-1) + D(S[i], T[j]), V(i-1, j) + D(S[i], _), V(i, j-1) + D(_, T[j]) ).
Taking the calculation of V(1, 2) with the above formulas as an example:
Given i = 1 and j = 2, it is known that:
V(0, 1) = -1, V(0, 2) = -2, V(1, 1) = 2;
D(S[1], T[2]) = -1 (i.e., A compared with G),
D(S[1], _) = -1 (i.e., A compared with _),
D(_, T[2]) = -1 (i.e., _ compared with G);
The three candidates are:
V(0, 1) + D(S[1], T[2]) = -2,
V(0, 2) + D(S[1], _) = -3,
V(1, 1) + D(_, T[2]) = 1;
Finally:
V(1, 2) = max(-2, -3, 1) = 1.
Finally, the optimal similarity score of the two character strings, i.e., the score corresponding to the shortest edit distance, is obtained: 7 points.
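The recurrence above can be implemented directly. The sketch below reproduces the worked example (+2 for a match, -1 for a modification, deletion, or insertion) and yields the score of 7 for ACAATCC versus AGCATGC; the function name is our own:

```python
def similarity_score(s, t, match=2, penalty=-1):
    """Global-alignment similarity score per the recurrence in the text:
    V(i,j) = max(V(i-1,j-1)+D(S[i],T[j]), V(i-1,j)+D(S[i],_), V(i,j-1)+D(_,T[j]))."""
    rows, cols = len(s) + 1, len(t) + 1
    v = [[0] * cols for _ in range(rows)]
    for j in range(1, cols):
        v[0][j] = v[0][j - 1] + penalty        # j insertions
    for i in range(1, rows):
        v[i][0] = v[i - 1][0] + penalty        # i deletions
    for i in range(1, rows):
        for j in range(1, cols):
            d = match if s[i - 1] == t[j - 1] else penalty
            v[i][j] = max(v[i - 1][j - 1] + d,      # match / modification
                          v[i - 1][j] + penalty,    # deletion
                          v[i][j - 1] + penalty)    # insertion
    return v[len(s)][len(t)]

# similarity_score("ACAATCC", "AGCATGC") -> 7, the score in the example above
```

This is the same dynamic program as Needleman-Wunsch global alignment; the intermediate value V(1, 2) = 1 from the worked example corresponds to similarity_score("A", "AG").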
The above is merely one example of a method for scoring character strings; any known method may also be used to evaluate the similarity between the recognized text information and the sentences in the subtitle file.
If the scores of all sentences are below the predetermined value, the caption semantic contrast module 140 determines that no sentence corresponding to the recognized text information exists in the subtitle file. According to an embodiment of the present invention, when the caption semantic contrast module 140 does not find a sentence corresponding to the recognized text information in the subtitle file, it sends a command to the dynamic sampling adjustment module 130 to raise the sampling frequency, so that the dynamic sampling adjustment module 130 can control the speech recognition module 120 to continue recognizing the speech at the raised sampling frequency. The above operations of the speech recognition module 120, the dynamic sampling adjustment module 130, and the caption semantic contrast module 140 are then repeated until speech with a sufficiently high semantic similarity to a sentence in the subtitle file is found.
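The contrast module's decision, picking the best-scoring subtitle sentence if it clears the predetermined value and otherwise signaling that a higher sampling frequency is needed, might be sketched as follows; the threshold value and the scoring function are assumptions, not fixed by the patent:

```python
def match_subtitle(recognized, subtitle_sentences, score_fn, min_score=5):
    """Return (best_index, best_score) if the best sentence clears min_score,
    else (None, best_score) to signal that resampling at a higher rate is needed.
    score_fn is any sentence-similarity scorer, e.g. the fuzzy score above."""
    scores = [score_fn(recognized, s) for s in subtitle_sentences]
    best = max(range(len(scores)), key=scores.__getitem__)
    if scores[best] >= min_score:
        return best, scores[best]
    return None, scores[best]
```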
If the caption semantic contrast module 140 finds the sentence in the subtitle file corresponding to the sampled speech, the captioning synchronization module 150 adjusts the time information of the subtitle file according to the time information of the speech. That is, the captioning synchronization module 150 adjusts the caption display time information according to the offset between the time information of the sampled speech and the time information of the sentence found by the caption semantic contrast module 140.
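Adjusting the subtitle file's time information then reduces to shifting every timestamp by the offset between the time the speech was heard and the time the matched sentence was scheduled to appear. A minimal sketch, assuming a simple in-memory cue representation (the patent does not specify a subtitle format):

```python
def synchronize(cues, speech_time, matched_start):
    """Shift every subtitle cue by the offset between the time the speech was
    heard and the time the matched sentence was scheduled to appear.
    `cues` is a list of (start_seconds, end_seconds, text) tuples."""
    offset = speech_time - matched_start
    return [(start + offset, end + offset, text) for start, end, text in cues]
```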
Finally, the subtitle display module 160 displays the captions according to the time information of the captions adjusted by the captioning synchronization module 150.
It should be understood that the modules described above may be further combined into fewer modules, or divided into more modules according to the operations they perform.
A captioning synchronization method based on speech recognition according to an embodiment of the present invention is described below with reference to the flowchart of Fig. 2.
First, in step S210, the speech in the foreground sound is extracted from the audio stream corresponding to the video stream, and the extracted speech is sampled and recognized, thereby generating text information corresponding to the recognized speech. Here, the language type of the text information may be selected by the user.
Next, in step S220, the semantic recognition degree of the generated text information is evaluated. Then, in step S230, whether the sampling frequency of the speech recognition needs to be adjusted is determined according to the evaluation result. According to an embodiment of the present invention, whether to adjust the sampling frequency of the speech recognition can be decided by determining whether the number of speech words in the text information generated by the speech recognition module 120 is within a preset range [m, n]. In addition, the semantic meaning of the speech words in the text information may also be considered in determining whether the sampling rate needs to be adjusted. If it is determined that the sampling rate needs to be adjusted, the sampling rate is adjusted in step S235 according to the evaluation result of the semantic recognition degree, and the method then returns to step S210 so that the semantic recognition degree can be evaluated again. If it is determined that no sampling rate adjustment is needed, the method proceeds to step S240.
After text information with a higher semantic recognition degree has been obtained through the assessment of step S230, the text information is semantically matched against the words of the additional multilingual subtitles of the playing video in step S240.
Next, in step S250, it is determined whether a sentence matching the recognized text information has been found among the words of the additional multilingual subtitles.
If it is determined in step S250 that a sentence matching the text information has been found, the display time of the captions is adjusted in step S260 according to the time information of the speech corresponding to the text information. Otherwise, if no sentence matching the text information is found, the sampling frequency is raised in step S255, and the method returns to step S210 to extract, sample, and recognize the speech.
The above operations S210-S255 are repeated until a sentence corresponding to the text information of the extracted speech is found in the subtitle file.
Finally, in step S270, the captions are displayed according to the adjusted caption display time.
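Putting steps S210 through S270 together, the overall control loop might be sketched as follows. All callables here are stand-ins for the modules described above, and the rate step and round limit are assumptions for the example:

```python
def caption_sync_loop(recognize, evaluate, match, max_rounds=5, rate=16000):
    """Sketch of the S210-S260 loop: recognize at the current rate, evaluate the
    semantic recognition degree, adjust the rate until a subtitle match is found.
    `recognize(rate)` -> (text, speech_time); `evaluate(text)` -> a new rate, or
    None when the text is acceptable; `match(text)` -> matched subtitle start
    time, or None when no sentence clears the threshold."""
    for _ in range(max_rounds):
        text, speech_time = recognize(rate)     # S210
        new_rate = evaluate(text)               # S220/S230
        if new_rate is not None:                # S235: adjust sampling and retry
            rate = new_rate
            continue
        matched_start = match(text)             # S240/S250
        if matched_start is not None:
            return speech_time - matched_start  # S260: offset for subtitle times
        rate += 4000                            # S255: raise frequency and retry
    return None
```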
The present invention proposes a solution for the synchronized display of captions using speech recognition technology. By using dynamic speech sampling, effective audio information is reasonably obtained, and the multilingual caption text is matched and its display time information adjusted, so that the words of the multilingual captions can be effectively adjusted when phenomena such as interruptions occur in a digital television program, keeping the multilingual captions displayed in synchronization.
The method according to the invention may be recorded in computer-readable media including program instructions for performing various operations implemented by a computer. The media may contain program instructions alone, or program instructions in combination with data files, data structures, and the like. Examples of computer-readable media include magnetic media (such as hard disks, floppy disks, and magnetic tape); optical media (such as CD-ROM and DVD); magneto-optical media (such as optical discs); and hardware devices specially configured to store and execute program instructions (for example, read-only memory (ROM), random access memory (RAM), and flash memory). The media may also include transmission media (such as optical or metallic lines and waveguides) carrying carrier waves that convey signals specifying program instructions, data structures, and the like. Examples of program instructions include machine code, such as that produced by a compiler, and files containing higher-level code that may be executed by the computer using an interpreter.
While the present invention has been particularly shown and described with reference to exemplary embodiments thereof, those skilled in the art will understand that various changes in form and details may be made therein without departing from the spirit and scope of the present invention as defined by the claims.

Claims (14)

1. A captioning synchronization device based on speech recognition, comprising:
a speech recognition module, which extracts the speech in the foreground sound from an audio stream corresponding to a playing video, and samples and recognizes the extracted speech, thereby generating text information corresponding to the recognized speech;
a dynamic sampling adjustment module, which evaluates the semantic recognition degree of the text information generated by the speech recognition module by determining whether the number of speech words in the text information is within a predetermined range, and controls the speech recognition module to adjust its sampling rate according to the evaluation result so as to obtain text information with a high semantic recognition degree;
a caption semantic contrast module, which semantically matches the text information having a high semantic recognition degree against the words of the additional multilingual subtitles of the playing video;
a captioning synchronization module, which, if the caption semantic contrast module finds a sentence in the subtitle file corresponding to the text information of the recognized speech, adjusts the time information of the subtitle file according to the time information of the speech; and
a subtitle display module, which displays the captions according to the time information of the subtitle file adjusted by the captioning synchronization module.
2. The captioning synchronization device of claim 1, further comprising:
a language selection module, which determines the language of the captions to be displayed according to the user's selection.
3. The captioning synchronization device of claim 1, wherein, when the dynamic sampling adjustment module determines that the number of speech words in the text information generated by the speech recognition module is within a preset range [m, n], the dynamic sampling adjustment module determines that the text information has a high semantic recognition degree, where m and n are natural numbers.
4. The captioning synchronization device of claim 3, wherein:
if the dynamic sampling adjustment module determines that the number of speech words in the text information generated by the speech recognition module is less than the minimum number m, the dynamic sampling adjustment module controls the speech recognition module to raise the sampling frequency for sampling the speech; and
if the dynamic sampling adjustment module determines that the number of speech words in the text information generated by the speech recognition module is greater than the maximum number n, the dynamic sampling adjustment module controls the speech recognition module to lower the sampling frequency for sampling the speech.
5. The captioning synchronization device of claim 3 or 4, wherein the dynamic sampling adjustment module also considers the semantic meaning of the speech words in the text information generated by the speech recognition module when evaluating the semantic recognition degree of the text information.
6. The captioning synchronization device of claim 1, wherein the caption semantic contrast module uses a fuzzy algorithm to perform character scoring on the words of the additional multilingual subtitles of the playing video, so as to find the highest-scoring sentence in the subtitle file as the sentence matching the text information.
7. The captioning synchronization device of claim 1, wherein, if the caption semantic contrast module does not find a sentence in the subtitle file corresponding to the text information of the recognized speech, it notifies the dynamic sampling adjustment module to raise the sampling frequency of the speech recognition module.
8. A caption synchronization method based on speech recognition, comprising:
(a) extracting the voice of foreground sounds from the audio stream corresponding to a played video, and sampling and recognizing the extracted voice, thereby generating text information corresponding to the recognized voice;
(b) evaluating the semantic recognition degree of the generated text information by determining whether the number of phonetic words in the generated text information is within a predetermined range, and adjusting the sampling frequency of the speech recognition according to the result of the evaluation so as to obtain text information with a high semantic recognition degree;
(c) semantically matching the text information with the high semantic recognition degree against the words of the multilingual captions attached to the played video, so as to find a sentence corresponding to the text information of the recognized voice in the caption file;
(d) adjusting the time information of the caption file according to the time information of the voice;
(e) displaying the captions according to the adjusted time information of the caption file.
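Steps (c) through (e) amount to finding the caption sentence that best matches the recognized text and then shifting the caption timeline by the observed offset. A minimal Python sketch of that alignment, assuming a pluggable `match` similarity function; the `CaptionLine` type and all names here are illustrative, not part of the claims:

```python
from dataclasses import dataclass

@dataclass
class CaptionLine:
    start_ms: int   # display time recorded in the caption file
    text: str

def synchronize(recognized_text: str, voice_time_ms: int,
                captions: list[CaptionLine], match) -> list[CaptionLine]:
    """Steps (c)-(d): pick the caption line most similar to the recognized
    text, then shift every caption timestamp by the resulting offset so that
    display (step (e)) uses the corrected times."""
    best = max(captions, key=lambda line: match(recognized_text, line.text))
    offset = voice_time_ms - best.start_ms          # step (d): time correction
    return [CaptionLine(line.start_ms + offset, line.text) for line in captions]
```

The offset is applied uniformly here; a real player could instead re-anchor repeatedly as new matches arrive.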
9. The caption synchronization method of claim 8, further comprising:
determining the language of the captions to be displayed according to the user's selection.
10. The caption synchronization method of claim 8, wherein, in step (b), when it is determined that the number of phonetic words in the text information generated in step (a) is within a predetermined range [m, n], the text information is determined to have a high semantic recognition degree, where m and n are natural numbers.
11. The caption synchronization method of claim 8, wherein, in step (b):
if it is determined that the number of phonetic words in the text information generated in step (a) is less than the minimum number m, returning to step (a) and increasing the sampling frequency to sample the voice;
if it is determined that the number of phonetic words in the text information generated in step (a) is greater than the maximum number n, returning to step (a) and reducing the sampling frequency to sample the voice.
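The range check of claims 10 and 11 can be sketched as a simple feedback controller. The concrete values of m, n, the frequency step, and the rate bounds below are illustrative assumptions, since the claims leave them unspecified:

```python
def adjust_sampling_rate(word_count: int, rate_hz: int,
                         m: int = 3, n: int = 12,
                         step_hz: int = 8000,
                         min_hz: int = 8000, max_hz: int = 48000) -> int:
    """Return the sampling rate for the next recognition pass.

    Fewer than m words: recognition likely missed speech, so raise the rate.
    More than n words: noise was likely recognized as words, so lower it.
    Within [m, n]: keep the current rate (high semantic recognition degree).
    """
    if word_count < m:
        return min(rate_hz + step_hz, max_hz)
    if word_count > n:
        return max(rate_hz - step_hz, min_hz)
    return rate_hz
```

In the loop of claim 11, step (a) would be re-run with the returned rate until the word count falls inside [m, n].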
12. The caption synchronization method of claim 10 or 11, wherein, in step (b), the semantic recognition degree of the text information is evaluated by considering the semantic meaning of the phonetic words in the text information generated in step (a).
13. The caption synchronization method of claim 8, wherein, in step (c), a fuzzy algorithm is used to perform character scoring on the words of the multilingual captions attached to the played video, thereby finding the highest-scoring sentence in the caption file as the sentence matching the text information.
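The claims do not name a specific fuzzy algorithm. One plausible reading, scoring each caption line against the recognized text with a character-level similarity ratio, can be sketched with Python's standard-library difflib:

```python
import difflib

def best_caption_match(recognized: str, caption_lines: list[str]) -> tuple[int, float]:
    """Character-score every caption line against the recognized text and
    return (index, score) of the highest-scoring line, where the score is
    difflib's ratio in [0.0, 1.0]."""
    scores = [difflib.SequenceMatcher(None, recognized.lower(), line.lower()).ratio()
              for line in caption_lines]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]
```

A production matcher would also threshold the best score, so that a weak match triggers the retry of claim 14 instead of a wrong alignment.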
14. The caption synchronization method of claim 8, wherein, if no sentence corresponding to the text information of the recognized voice is found in the caption file in step (c), returning to step (a) and increasing the sampling frequency of the speech recognition.
CN201310069142.9A 2013-03-05 2013-03-05 Captioning synchronization apparatus and method based on speech recognition Active CN104038804B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310069142.9A CN104038804B (en) 2013-03-05 2013-03-05 Captioning synchronization apparatus and method based on speech recognition

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310069142.9A CN104038804B (en) 2013-03-05 2013-03-05 Captioning synchronization apparatus and method based on speech recognition

Publications (2)

Publication Number Publication Date
CN104038804A CN104038804A (en) 2014-09-10
CN104038804B true CN104038804B (en) 2017-09-29

Family

ID=51469372

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310069142.9A Active CN104038804B (en) 2013-03-05 2013-03-05 Captioning synchronization apparatus and method based on speech recognition

Country Status (1)

Country Link
CN (1) CN104038804B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019164535A1 (en) * 2018-02-26 2019-08-29 Google Llc Automated voice translation dubbing for prerecorded videos

Families Citing this family (24)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104202425A (en) * 2014-09-19 2014-12-10 武汉易象禅网络科技有限公司 Real-time online data transmission system and remote course data transmission method
CN105741841B (en) * 2014-12-12 2019-12-03 深圳Tcl新技术有限公司 Sound control method and electronic equipment
KR102413692B1 (en) * 2015-07-24 2022-06-27 삼성전자주식회사 Apparatus and method for caculating acoustic score for speech recognition, speech recognition apparatus and method, and electronic device
CN105374366A (en) * 2015-10-09 2016-03-02 广东小天才科技有限公司 Method and system for wearable device to identify meaning
CN105848005A (en) * 2016-03-28 2016-08-10 乐视控股(北京)有限公司 Video subtitle display method and video subtitle display device
CN106486125A (en) * 2016-09-29 2017-03-08 安徽声讯信息技术有限公司 A kind of simultaneous interpretation system based on speech recognition technology
CN106792097A (en) * 2016-12-27 2017-05-31 深圳Tcl数字技术有限公司 Audio signal captions matching process and device
CN106604125B (en) * 2016-12-29 2019-06-14 北京奇艺世纪科技有限公司 A kind of determination method and device of video caption
CN107241616B (en) * 2017-06-09 2018-10-26 腾讯科技(深圳)有限公司 video lines extracting method, device and storage medium
CN108289244B (en) * 2017-12-28 2021-05-25 努比亚技术有限公司 Video subtitle processing method, mobile terminal and computer readable storage medium
CN108366305A (en) * 2018-02-07 2018-08-03 深圳佳力拓科技有限公司 A kind of code stream without subtitle shows the method and system of subtitle by speech recognition
CN108366182B (en) * 2018-02-13 2020-07-07 京东方科技集团股份有限公司 Calibration method and device for synchronous broadcast of text voice and computer storage medium
CN108259963A (en) * 2018-03-19 2018-07-06 成都星环科技有限公司 A kind of TV ends player
CN108449629B (en) * 2018-03-31 2020-06-05 湖南广播电视台广播传媒中心 Audio voice and character synchronization method, editing method and editing system
CN109195007B (en) * 2018-10-19 2021-09-07 深圳市轱辘车联数据技术有限公司 Video generation method, device, server and computer readable storage medium
CN109949793A (en) * 2019-03-06 2019-06-28 百度在线网络技术(北京)有限公司 Method and apparatus for output information
CN110689220B (en) * 2019-08-20 2023-04-28 国网山东省电力公司莱芜供电公司 Automatic point-setting machine for realizing dispatching automation
CN110619868B (en) * 2019-08-29 2021-12-17 深圳市优必选科技股份有限公司 Voice assistant optimization method, voice assistant optimization device and intelligent equipment
CN110557668B (en) * 2019-09-06 2022-05-03 常熟理工学院 Sound and subtitle accurate alignment system based on wavelet ant colony
CN110798733A (en) * 2019-10-30 2020-02-14 中央电视台 Subtitle generating method and device, computer storage medium and electronic equipment
CN114333918A (en) * 2020-09-27 2022-04-12 广州市久邦数码科技有限公司 Method and device for matching audio book subtitles
CN115474066A (en) * 2021-06-11 2022-12-13 北京有竹居网络技术有限公司 Subtitle processing method and device, electronic equipment and storage medium
CN113689865A (en) * 2021-08-24 2021-11-23 广东优碧胜科技有限公司 Sampling rate switching method and device, electronic equipment and voice system
CN116471436A (en) * 2023-04-12 2023-07-21 央视国际网络有限公司 Information processing method and device, storage medium and electronic equipment

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6199041B1 (en) * 1998-11-20 2001-03-06 International Business Machines Corporation System and method for sampling rate transformation in speech recognition
CN101320560A (en) * 2008-07-01 2008-12-10 上海大学 Method for speech recognition system improving discrimination by using sampling velocity conversion
CN101505397A (en) * 2009-02-20 2009-08-12 深圳华为通信技术有限公司 Method and system for audio and video subtitle synchronous presenting
CN101808202A (en) * 2009-02-18 2010-08-18 联想(北京)有限公司 Method, system and computer for realizing sound-and-caption synchronization in video file
CN102708861A (en) * 2012-06-15 2012-10-03 天格科技(杭州)有限公司 Poor speech recognition method based on support vector machine

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR100640893B1 (en) * 2004-09-07 2006-11-02 엘지전자 주식회사 Baseband modem and mobile terminal for voice recognition

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019164535A1 (en) * 2018-02-26 2019-08-29 Google Llc Automated voice translation dubbing for prerecorded videos
KR102481871B1 (en) * 2018-02-26 2022-12-28 구글 엘엘씨 Automated voice translation dubbing of pre-recorded videos
KR20230005430A (en) * 2018-02-26 2023-01-09 구글 엘엘씨 Automated voice translation dubbing for prerecorded videos
KR102598824B1 (en) 2018-02-26 2023-11-06 구글 엘엘씨 Automated voice translation dubbing for prerecorded videos

Also Published As

Publication number Publication date
CN104038804A (en) 2014-09-10

Similar Documents

Publication Publication Date Title
CN104038804B (en) Captioning synchronization apparatus and method based on speech recognition
CN111968649B (en) Subtitle correction method, subtitle display method, device, equipment and medium
CN109635270B (en) Bidirectional probabilistic natural language rewrite and selection
US7949530B2 (en) Conversation controller
CN105244022B (en) Audio-video method for generating captions and device
JP5610197B2 (en) SEARCH DEVICE, SEARCH METHOD, AND PROGRAM
US7949532B2 (en) Conversation controller
US20210158795A1 (en) Generating audio for a plain text document
CN106878805A (en) A kind of mixed languages subtitle file generation method and device
WO2011068170A1 (en) Search device, search method, and program
US8688725B2 (en) Search apparatus, search method, and program
JPH11191000A (en) Method for aligning text and voice signal
US20180047387A1 (en) System and method for generating accurate speech transcription from natural speech audio signals
US9818450B2 (en) System and method of subtitling by dividing script text into two languages
CN109754783A (en) Method and apparatus for determining the boundary of audio sentence
JP2015212732A (en) Sound metaphor recognition device and program
CN111739556A (en) System and method for voice analysis
KR101410601B1 (en) Spoken dialogue system using humor utterance and method thereof
CN110324709A (en) A kind of processing method, device, terminal device and storage medium that video generates
JP2012181358A (en) Text display time determination device, text display system, method, and program
CN111079423A (en) Method for generating dictation, reading and reporting audio, electronic equipment and storage medium
Levin et al. Automated closed captioning for Russian live broadcasting
CN105931641A (en) Subtitle data generation method and device
US11967248B2 (en) Conversation-based foreign language learning method using reciprocal speech transmission through speech recognition function and TTS function of terminal
CN102970618A (en) Video on demand method based on syllable identification

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant