CN104038804B - Captioning synchronization apparatus and method based on speech recognition - Google Patents
Captioning synchronization apparatus and method based on speech recognition Download PDFInfo
- Publication number
- CN104038804B CN104038804B CN201310069142.9A CN201310069142A CN104038804B CN 104038804 B CN104038804 B CN 104038804B CN 201310069142 A CN201310069142 A CN 201310069142A CN 104038804 B CN104038804 B CN 104038804B
- Authority
- CN
- China
- Prior art keywords
- text information
- voice
- module
- word
- captions
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Links
Abstract
There is provided a kind of captioning synchronization apparatus and method based on speech recognition, the captioning synchronization device includes:Sound identification module, the voice from audio stream extraction foreground sounds, and the voice of extraction is sampled and recognized, so as to generate and corresponding text information;Dynamic sampling adjusting module, carries out the evaluation of semantics recognition degree to the text information of generation, and according to the result of evaluation come control voice identification module sampling rate adjusting to obtain the text information with high semantics recognition degree;Captions semanteme contrast module, semantic matches are carried out by the word of the text information with high semantics recognition degree and the additional multilingual subtitle for playing video;Captioning synchronization module, if captions semanteme contrast module finds sentence corresponding with the text information of the voice of identification in subtitle file, the temporal information of subtitle file is adjusted according to the temporal information of voice;Subtitle Demonstration module, captions are shown according to the temporal information of the subtitle file after adjustment.
Description
Technical field
The present invention relates to speech recognition and captioning synchronization technical field.More particularly, it is related to one kind and utilizes speech recognition
The apparatus and method of automatic synchronization captions corresponding with video when TV programme are played.
Background technology
At present, the support in digital television signal stream for subtitle language number is limited, it is impossible to while meeting different crowd
Demand.Especially in place as hotel hotel, the people for having many different language countries moves in, and these crowds are in viewing
The need for just having its special when DTV captions.Therefore, when playing digital TV video frequency in the presence of additional many of display
The demand of state's language subtitle.Simultaneously as the information that may be broken for commercialsy with emergency notice etc in TV programme, additional
Multinational Subtitle Demonstration needs advertisement category information turn function, synchronous with audio frequency and video holding all the time.
The content of the invention
The present invention proposes in TV programme simultaneous display when there is commercial breaks by using speech recognition technology and added
The scheme of captions.By additional language subtitle, sampled using dynamic voice, effective audio-frequency information is rationally obtained, to subtitling
The text Presentation Time Stamp that is matched and adjusted so that subtitling text can be to entering in the presence of the phenomenon such as intercutting in digital television program
The effective adjustment of row, keeps the simultaneous display of subtitling.
According to an aspect of the present invention there is provided a kind of captioning synchronization device based on speech recognition, including:Speech recognition
Module, the voice from audio stream extraction foreground sounds corresponding with playing video, and the voice of extraction is sampled and known
Not, so as to generate text information corresponding with the voice recognized;Dynamic sampling adjusting module, the text generated to sound identification module
Word information carries out the evaluation of semantics recognition degree, and according to the result of evaluation come control voice identification module sampling rate adjusting with
Obtain the text information with high semantics recognition degree;Captions semanteme contrast module, by the text information with high semantics recognition degree
The word of additional multilingual subtitle with playing video carries out semantic matches;Captioning synchronization module, if captions semanteme contrast
Module finds sentence corresponding with the text information of the voice of identification in subtitle file, then is adjusted according to the temporal information of voice
The temporal information of whole subtitle file;Subtitle Demonstration module, the temporal information of the subtitle file after being adjusted according to captioning synchronization module
To show captions.
According to an aspect of the present invention, the captioning synchronization device also includes:Speech selection module, according to the selection of user
To determine the language for the captions that will be shown.
According to an aspect of the present invention, when in the text information that dynamic sampling adjusting module determines sound identification module generation
Phonetic word number when preset range [m, n] is interior, dynamic sampling adjusting module, which determines that text information has, high semantic to be known
Do not spend, wherein m, n are natural numbers.
According to an aspect of the present invention, if dynamic sampling adjusting module determines the text information of sound identification module generation
In phonetic word quantity be less than minimum number m, then dynamic sampling adjusting module control voice identification module improve sampling frequency
Rate is sampled to voice;If dynamic sampling adjusting module determines the voice in the text information of sound identification module generation
The quantity of word is higher than maximum quantity n, then dynamic sampling adjusting module control voice identification module reduction sample frequency is come to language
Sound is sampled.
According to an aspect of the present invention, dynamic sampling adjusting module is considered in the text information of sound identification module generation
The semantic meaning of phonetic word evaluates the semantics recognition degree of text information.
According to an aspect of the present invention, the semantic contrast module of captions uses fuzzy algorithmic approach to playing video using fuzzy algorithmic approach
The word of additional multilingual subtitle enter line character scoring so that find out the sentence of highest scoring in subtitle file as with text
The sentence of word information matches.
According to an aspect of the present invention, if captions semantic matches module does not find the language with identification in subtitle file
The corresponding sentence of text information of sound, then notify dynamic sampling adjusting module to improve the sample frequency of sound identification module.
According to another aspect of the present invention there is provided a kind of captioning synchronization method based on speech recognition, including:(a) from
Audio stream corresponding with playing video extracts the voice in foreground sounds, and the voice of extraction is sampled and recognized, so that
Generate text information corresponding with the voice recognized;(b) evaluation of semantics recognition degree, and root are carried out to the text information of generation
Carry out control voice identification module sampling rate adjusting according to the result of evaluation to obtain the text information with high semantics recognition degree;
(c) word of the text information with high semantics recognition degree and the additional multilingual subtitle for playing video is carried out semantic
Match somebody with somebody, to find sentence corresponding with the text information of the voice of identification in subtitle file;(d) according to the temporal information of voice come
Adjust the temporal information of subtitle file;(e) captions are shown according to the temporal information of the subtitle file after adjustment.
According to another aspect of the present invention, the captioning synchronization method also includes:Being determined according to the selection of user will
The language of the captions of display.
According to another aspect of the present invention, in step (b), when it is determined that step (a) generation text information in voice list
The number of word determines that text information has high semantics recognition degree, wherein m, n is natural number when preset range [m, n] is interior.
According to another aspect of the present invention, in step (b), if it is determined that the voice in the text information of step (a) generation
The quantity of word is less than minimum number m, then return to step (a) and improves sample frequency to sample to voice;If it is determined that
The quantity of phonetic word in the text information of step (a) generation is higher than maximum quantity n, then return to step (a) reduction sampling frequency
Rate is sampled to voice.
According to another aspect of the present invention, in step (b), it is considered to the phonetic word in the text information of step (a) generation
Semantic meaning evaluate the semantics recognition degree of text information.
According to another aspect of the present invention, in step (c), using fuzzy algorithmic approach using fuzzy algorithmic approach to playing the attached of video
Plus the word of multilingual subtitle enters line character scoring, so that the sentence for finding out highest scoring in subtitle file is believed as with word
Cease the sentence of matching.
According to another aspect of the present invention, if do not found in step (c) in subtitle file and the voice of identification
The sample frequency of the corresponding sentence of text information, then return to step (a) raising speech recognition.
Brief description of the drawings
By the description carried out below in conjunction with the accompanying drawings, above and other purpose of the invention and feature will become more clear
Chu, wherein:
Fig. 1 is the block diagram for showing the captioning synchronization device according to embodiments of the present invention based on speech recognition;
Fig. 2 is the flow chart for showing the captioning synchronization method according to embodiments of the present invention based on speech recognition.
Embodiment
The description of progress referring to the drawings is provided below to contribute to comprehensive understanding such as claim and its equivalent to be limited
Exemplary embodiment of the invention.The description includes various detailed details to help to understand, and these describe will be by
Think exemplary only.Therefore, one of ordinary skill in the art will recognize do not departing from scope and spirit of the present invention
In the case of can make various changes described here and modification.In addition, in order to clear and succinct, can omit to known function and
The description of construction.
The term and word used in following described and claimed is not limited to bibliographical meaning, but only by inventor
Use to understand and as one man understand the present invention.Therefore, it should be appreciated by the person skilled in the art that being provided below
To the description of the exemplary embodiment of the present invention merely for the purpose shown, rather than for limitation such as by claim and its
The purpose of the present invention that jljl limits.
Fig. 1 is the block diagram for showing the captioning synchronization device 100 according to embodiments of the present invention based on speech recognition.
As shown in figure 1, the captioning synchronization device 100 based on speech recognition according to embodiments of the present invention includes speech selection
The semantic contrast module 140 of module 110, sound identification module 120, dynamic sampling adjusting module 130, captions, captioning synchronization module
150 and Subtitle Demonstration module 160.Captioning synchronization device 100 according to embodiments of the present invention can be integrated into Digital Broadcasting Receiver dress
Put or video play device among.
Speech selection module 110 can determine the subtitle language that will be shown according to the selection of user.For example, when user is logical
Cross remote control equal controller and send signal to captioning synchronization device 100, so as to select the subtitle language for wanting to use.
Sound identification module 120 is from the corresponding audio of the video flowings of the TV programme with playing or other broadcasting contents
Stream extracts the voice in foreground sounds, and the voice of extraction is sampled and recognized, so as to generate corresponding with the voice recognized
Text information.By extracting prospect master voice, the background sound in the video of broadcasting can be removed, for example, movie or television
The sound such as automobile, background music in program, can so improve the degree of accuracy of speech recognition.Can using it is any in the prior art
Prospect master voice extracting method and speech recognition engine realize sound identification module 120.
The text information that dynamic sampling adjusting module 130 is generated to sound identification module 120 carries out semantic identification degree and commented
Valency, and according to the result of evaluation determine the need for adjust sound identification module 120 sample frequency.It is real according to the one of the present invention
Example is applied, dynamic sampling adjusting module 130 can determine that the number of the phonetic word in the text information that sound identification module 120 is generated
Whether in preset range [m, n].If it is determined that the quantity of the phonetic word in text information is less than minimum number m or is more than
Maximum quantity n, then the determination of dynamic sampling adjusting module 130 semantics recognition degree is relatively low, it is necessary to sampling rate adjusting.Work as dynamic sampling
Adjusting module 130 determines that the quantity of the phonetic word in the text information that sound identification module 120 is generated is less than minimum number m
When, dynamic sampling adjusting module 130 determines to need to improve sample frequency, so that the adopting with raising of control voice identification module 120
Sample frequency is sampled to voice.When dynamic sampling adjusting module 130 determines the text information that sound identification module 120 is generated
In phonetic word quantity be higher than maximum quantity n when, dynamic sampling adjusting module 130 determine can reduce sample frequency, from
And control voice identification module 120 is sampled according to the sample frequency after reduction to voice.That is, as the people in audio
When thing speaks word speed quickly, the sentence number of characters obtained within the unit interval will increase, and this causes the error rate that captions are matched
Increase, now, it may be determined that the semantics recognition degree of present video is low.Conversely, when in audio personage speak word speed it is very slow when, in unit
The sentence number of characters obtained in time will be reduced, and equally can also increase the error rate of captions matching, now, equally be can determine that and worked as
Preceding audio semantics recognition degree is low.Therefore, only control sample frequency and obtain the number of characters of fair amount and just can determine that semanteme
Resolution is high.
In addition, embodiments in accordance with the present invention, when evaluation semantics recognition is spent, dynamic sampling adjusting module 130 can also be examined
Consider the semantic meaning of the phonetic word in the text information that sound identification module 120 is generated, determine whether to need adjustment to adopt
Sample frequency.For example, when the phonetic word in the text information that sound identification module 120 is generated includes multiple low semantic words
When (for example, the onomatopoeia of such as continuous multiple " "), dynamic sampling adjusting module 130 can determine that sound identification module 120 is given birth to
Into text information semantics recognition degree it is relatively low, and control voice identification module 120 improve sample frequency.
Next, obtaining the text information of higher semantics recognition degree in the assessment by dynamic sampling adjusting module 130
Afterwards, the semantic contrast module 140 of captions carries out the word of text information and the additional multilingual subtitle for playing video semantic
Matching.Here, the semantic contrast module 140 of captions can use fuzzy algorithmic approach, and word is carried out to the word of additional multilingual subtitle
Symbol scoring, so as to find out the sentence of highest scoring in subtitle file.That is, captions semanteme contrast module 140 is literary by captions
Scoring is defined as sentence corresponding with the text information recognized higher than the scoring highest sentence in the sentence of predetermined value in part.
It will be exemplified below by the way of fuzzy algorithmic approach scores sentence.Certainly, those skilled in the art can adopt
Otherwise search the sentence with the semantic matches of the sentence in subtitle file.
Two character strings ACAATCC and AGCATGC are provided, then modifies, delete and adds when being matched to both
It can just be matched completely Deng operation.In order to be more convenient the calculating of the degree of approximation, editing distance is adjusted to degree of approximation score, even
With then obtaining 2 points, modification, deletion, addition then obtain -1 point., can be by following in order to obtain degree of approximation score during matching completely
Recurrence formula obtains a score matrix, and its degree of approximation score is S (n, n) value in n rank matrixes S, and n is matching string
Length adds 1.V represents Value (i.e. score value), and D represents Difference Value (i.e. difference), and S represents String and (treated
With character string), T represents Template i.e. template, and i, j represent the row and column of matrix respectively, and value is since 0).
Initial value can be directly obtained:
V (0,0)=0;
V (0, j)=V (0, j-1)+D (_, T [j]);Insertion j times
V (i, 0)=V (i-1,0)+D (S [i], _);Delete i times
Other values can be obtained by following stepping type:
Formula more than, with exemplified by calculating V (1,2),
Known i=1, j=2
Then:
V (0,1)=- 1, V (0,2)=- 2, V (1,1)=2;
D (S [1], T [2])=- 1 (i.e. A is compared with AG),
D (S [1], _)=- 1 (i.e. A with _ is compared),
D (_, T [2])=- 1 (i.e. _ compared with G);
V (1,2)=V (0,1)+D (S [1], T [2])=- 2,
V (1,2)=V (0,2)+D (S [1], _)=- 3,
V (1,2)=V (0,1)+D (_, T [2])=1;
It can finally obtain:
(max) V (1,2)=1
The corresponding optimal similarity score for being scored at the character string of 7 points, i.e., two of most short editing distance may finally be drawn
For 7.
It the above is only an example of the method for scoring character string of having enumerated, any of side can also be used
Method is evaluated come the similitude between the sentence in the text information and subtitle file to identification.
If the scoring of all sentences is below predetermined value, captions semanteme contrast module 140 is determined in subtitle file not
In the presence of sentence corresponding with the text information of identification.Embodiments in accordance with the present invention, when the semantic contrast module 140 of captions does not exist
When sentence corresponding with the text information recognized is found in subtitle file, captions semanteme contrast module 140 is adjusted to dynamic sampling
Module 130 sends the order for improving sample frequency, so that dynamic sampling adjusting module 130 can control sound identification module 120
Continue that voice is identified according to the sample frequency of raising.Then, sound identification module 120, dynamic sampling adjustment mould are repeated
The aforesaid operations of the semantic contrast module 140 of block 130, captions, until find with the semantic similarity of the sentence in subtitle file compared with
Untill high voice.
If captions semanteme contrast module 140 finds the sentence in subtitle file corresponding with the voice sampled, captions
Synchronization module 150 adjusts the temporal information of subtitle file according to the temporal information of voice.That is, captioning synchronization module
It is inclined between the temporal information of 150 sentences found according to the semantic contrast module 140 of temporal information and captions of the voice of sampling
Shifting value adjusts the temporal information of Subtitle Demonstration.
Finally, Subtitle Demonstration module 160 shows word according to the temporal information of the captions after the adjustment of captioning synchronization module 150
Curtain.
It should be understood that modules described above can be further combined into less module, or according to its execution
Operate and be divided into more modules.
Captioning synchronization side according to embodiments of the present invention based on speech recognition is described below with reference to Fig. 2 flow chart
Method.
First, in step S210, the voice from audio stream corresponding with video flowing extraction foreground sounds, and to extraction
Voice is sampled and recognized, so as to generate text information corresponding with the voice recognized.Here, word can be selected by user
The language form of information.
Next, in step S220, semantic identification degree evaluation is carried out to the text information of generation.Next, in step
S230 determines the need for adjusting the sample frequency of speech recognition according to the result of evaluation.Embodiments in accordance with the present invention, can lead to
Whether the number for the phonetic word crossed in the text information for determining the generation of sound identification module 120 is come in preset range [m, n]
Decide whether the sample frequency of adjustment speech recognition.In addition, it is also possible to consider the semanteme meaning of the phonetic word in text information
Justice determines the need for sampling rate adjusting.If it is determined that need sampling rate adjusting, then can according to semanteme in step S235
The evaluation result of resolution carrys out sampling rate adjusting, then returnes to step S210 to carry out semantic identification degree evaluation again.
If it is determined that not needing sampling rate adjusting, then proceed to step S240.
After obtaining the text information of higher semantics recognition degree by step S230 assessment, in step S240 by word
The word of additional multilingual subtitle of the information with playing video carries out semantic matches.
Next, determining whether to find the word letter with identification in the word of additional multilingual subtitle in step S250
Cease the sentence of matching.
If determining have found the sentence matched with text information in step S250, believe in step S260 according to word
The temporal information of corresponding voice is ceased to adjust the display time of captions.Otherwise, if not finding what is matched with text information
Sentence, then improve sample frequency in step S255, is then back to step S210 and extracts voice and sampled and recognized.
The operation S210-S255 of the above is repeated, until the word that the voice with extracting is found in subtitle file is believed
Untill ceasing corresponding sentence.
Finally, in S270, captions are shown according to the display time of the captions after adjustment.
The present invention proposes the solution of the simultaneous display of captions using speech recognition technology.By using dynamic voice
Sampling, rationally obtains effective audio-frequency information, and subtitling text is matched and display temporal information is adjusted, can be in DTV
Exist in program and the phenomenon such as intercut effective adjustment is carried out to the word of subtitling, keep the simultaneous display of subtitling.
The method according to the invention may be recorded in including performing by the programmed instruction of computer implemented various operations
In computer-readable medium.Medium can also only include programmed instruction or including be combined with programmed instruction data file,
Data structure etc..The example of computer-readable medium includes magnetizing mediums (such as hard disk, floppy disk and tape);Optical medium is (for example
CD-ROM and DVD);Magnet-optical medium (for example, CD);And especially it is formulated for the hardware unit of storage and execute program instructions
(for example, read-only storage (ROM), random access memory (RAM), flash memory etc.).Medium can also include transmitting regulation journey
The transmission medium of the carrier wave of the signal of sequence instruction, data structure etc. (such as optical line or metal wire, waveguide).Programmed instruction
Example includes the text of the machine code and high-level code performed comprising usable interpreter by computer for example produced by compiler
Part.
Although being particularly shown and describing the present invention, the skill of this area with reference to the exemplary embodiment of the present invention
Art personnel to it should be understood that in the case where not departing from the spirit and scope of the present invention being defined by the claims, can enter
Various changes in row form and details.
Claims (14)
1. a kind of captioning synchronization device based on speech recognition, including:
Sound identification module, the voice from audio stream extraction foreground sounds corresponding with playing video, and to the voice of extraction
Sampled and recognized, so as to generate text information corresponding with the voice recognized;
Dynamic sampling adjusting module, whether the number of the phonetic word in text information by determining sound identification module generation
Carry out to carry out the text information that sound identification module is generated the evaluation of semantics recognition degree within a predetermined range, and according to evaluation
As a result carry out control voice identification module sampling rate adjusting to obtain the text information with high semantics recognition degree;
Captions semanteme contrast module, by the text information with high semantics recognition degree and the additional multilingual subtitle for playing video
Word carry out semantic matches;
Captioning synchronization module, if captions semanteme contrast module finds the text information pair with the voice of identification in subtitle file
The sentence answered, then adjust the temporal information of subtitle file according to the temporal information of voice;
Subtitle Demonstration module, captions are shown according to the temporal information of the subtitle file after the adjustment of captioning synchronization module.
2. captioning synchronization device as claimed in claim 1, in addition to:
Speech selection module, the language of captions that will be shown is determined according to the selection of user.
3. captioning synchronization device as claimed in claim 1, wherein, when dynamic sampling adjusting module determines that sound identification module is given birth to
Into text information in phonetic word number when preset range [m, n] is interior, dynamic sampling adjusting module determine word believe
Breath has high semantics recognition degree, and wherein m, n is natural number.
4. captioning synchronization device as claimed in claim 3, wherein:
If dynamic sampling adjusting module determines that the quantity of the phonetic word in the text information of sound identification module generation is less than
Minimum number m, then dynamic sampling adjusting module control voice identification module improve sample frequency and voice sampled;
If dynamic sampling adjusting module determines that the quantity of the phonetic word in the text information of sound identification module generation is higher than
Maximum quantity n, then dynamic sampling adjusting module control voice identification module reduction sample frequency voice is sampled.
5. the captioning synchronization device as described in claim 3 or 4, wherein, dynamic sampling adjusting module considers sound identification module
The semantic meaning of phonetic word in the text information of generation evaluates the semantics recognition degree of text information.
6. captioning synchronization device as claimed in claim 1, wherein, captions semanteme contrast module is regarded using fuzzy algorithmic approach to broadcasting
The word of the additional multilingual subtitle of frequency enters line character scoring, thus find out the sentence of highest scoring in subtitle file as with
The sentence of text information matching.
7. captioning synchronization device as claimed in claim 1, wherein, if captions semanteme contrast module is not in subtitle file
Sentence corresponding with the text information of the voice of identification is found, then notifies dynamic sampling adjusting module to improve sound identification module
Sample frequency.
8. a kind of captioning synchronization method based on speech recognition, including:
(a) from playing the voice during the corresponding audio stream of video extracts foreground sounds, and the voice of extraction is carried out sampling and
Identification, so as to generate text information corresponding with the voice recognized;
(b) whether the number of the phonetic word in the text information by determining generation carrys out the word to generation within a predetermined range
Information carries out the evaluation of semantics recognition degree, and according to the result of evaluation come control voice identification module sampling rate adjusting to obtain
There must be the text information of high semantics recognition degree;
(c) word of the text information with high semantics recognition degree and the additional multilingual subtitle for playing video is carried out semantic
Matching, to find sentence corresponding with the text information of the voice of identification in subtitle file;
(d) temporal information of subtitle file is adjusted according to the temporal information of voice;
(e) captions are shown according to the temporal information of the subtitle file after adjustment.
9. captioning synchronization method as claimed in claim 8, in addition to:
The language of captions that will be shown is determined according to the selection of user.
10. captioning synchronization method as claimed in claim 8, wherein, in step (b), when it is determined that the word letter of step (a) generation
The number of phonetic word in breath determines that text information has high semantics recognition degree, wherein m, n when preset range [m, n] is interior
It is natural number.
11. captioning synchronization method as claimed in claim 8, wherein, in step (b),
If it is determined that the quantity of the phonetic word in the text information of step (a) generation is less than minimum number m, then return to step
(a) and improve sample frequency to sample to voice;
If it is determined that the quantity of the phonetic word in the text information of step (a) generation is higher than maximum quantity n, then return to step
(a) sample frequency is reduced to sample to voice.
12. the captioning synchronization method as described in claim 10 or 11, wherein, in step (b), it is considered to the text of step (a) generation
The semantic meaning of phonetic word in word information evaluates the semantics recognition degree of text information.
13. captioning synchronization method as claimed in claim 8, wherein, in step (c), using fuzzy algorithmic approach to playing video
The word of additional multilingual subtitle enters line character scoring, thus find out the sentence of highest scoring in subtitle file as with word
The sentence of information matches.
14. captioning synchronization method as claimed in claim 8, wherein, if step (c) do not found in subtitle file with
The sample frequency of the corresponding sentence of text information of the voice of identification, then return to step (a) raising speech recognition.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310069142.9A CN104038804B (en) | 2013-03-05 | 2013-03-05 | Captioning synchronization apparatus and method based on speech recognition |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310069142.9A CN104038804B (en) | 2013-03-05 | 2013-03-05 | Captioning synchronization apparatus and method based on speech recognition |
Publications (2)
Publication Number | Publication Date |
---|---|
CN104038804A CN104038804A (en) | 2014-09-10 |
CN104038804B true CN104038804B (en) | 2017-09-29 |
Family
ID=51469372
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201310069142.9A Active CN104038804B (en) | 2013-03-05 | 2013-03-05 | Captioning synchronization apparatus and method based on speech recognition |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104038804B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2019164535A1 (en) * | 2018-02-26 | 2019-08-29 | Google Llc | Automated voice translation dubbing for prerecorded videos |
Families Citing this family (24)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104202425A (en) * | 2014-09-19 | 2014-12-10 | 武汉易象禅网络科技有限公司 | Real-time online data transmission system and remote course data transmission method |
CN105741841B (en) * | 2014-12-12 | 2019-12-03 | 深圳Tcl新技术有限公司 | Sound control method and electronic equipment |
KR102413692B1 (en) * | 2015-07-24 | 2022-06-27 | 삼성전자주식회사 | Apparatus and method for caculating acoustic score for speech recognition, speech recognition apparatus and method, and electronic device |
CN105374366A (en) * | 2015-10-09 | 2016-03-02 | 广东小天才科技有限公司 | Method and system for wearable device to identify meaning |
CN105848005A (en) * | 2016-03-28 | 2016-08-10 | 乐视控股(北京)有限公司 | Video subtitle display method and video subtitle display device |
CN106486125A (en) * | 2016-09-29 | 2017-03-08 | 安徽声讯信息技术有限公司 | A kind of simultaneous interpretation system based on speech recognition technology |
CN106792097A (en) * | 2016-12-27 | 2017-05-31 | 深圳Tcl数字技术有限公司 | Audio signal captions matching process and device |
CN106604125B (en) * | 2016-12-29 | 2019-06-14 | 北京奇艺世纪科技有限公司 | A kind of determination method and device of video caption |
CN107241616B (en) * | 2017-06-09 | 2018-10-26 | 腾讯科技(深圳)有限公司 | video lines extracting method, device and storage medium |
CN108289244B (en) * | 2017-12-28 | 2021-05-25 | 努比亚技术有限公司 | Video subtitle processing method, mobile terminal and computer readable storage medium |
CN108366305A (en) * | 2018-02-07 | 2018-08-03 | 深圳佳力拓科技有限公司 | A kind of code stream without subtitle shows the method and system of subtitle by speech recognition |
CN108366182B (en) * | 2018-02-13 | 2020-07-07 | 京东方科技集团股份有限公司 | Calibration method and device for synchronous broadcast of text voice and computer storage medium |
CN108259963A (en) * | 2018-03-19 | 2018-07-06 | 成都星环科技有限公司 | A kind of TV ends player |
CN108449629B (en) * | 2018-03-31 | 2020-06-05 | 湖南广播电视台广播传媒中心 | Audio voice and character synchronization method, editing method and editing system |
CN109195007B (en) * | 2018-10-19 | 2021-09-07 | 深圳市轱辘车联数据技术有限公司 | Video generation method, device, server and computer readable storage medium |
CN109949793A (en) * | 2019-03-06 | 2019-06-28 | 百度在线网络技术(北京)有限公司 | Method and apparatus for output information |
CN110689220B (en) * | 2019-08-20 | 2023-04-28 | 国网山东省电力公司莱芜供电公司 | Automatic point-setting machine for realizing dispatching automation |
CN110619868B (en) * | 2019-08-29 | 2021-12-17 | 深圳市优必选科技股份有限公司 | Voice assistant optimization method, voice assistant optimization device and intelligent equipment |
CN110557668B (en) * | 2019-09-06 | 2022-05-03 | 常熟理工学院 | Sound and subtitle accurate alignment system based on wavelet ant colony |
CN110798733A (en) * | 2019-10-30 | 2020-02-14 | 中央电视台 | Subtitle generating method and device, computer storage medium and electronic equipment |
CN114333918A (en) * | 2020-09-27 | 2022-04-12 | 广州市久邦数码科技有限公司 | Method and device for matching audio book subtitles |
CN115474066A (en) * | 2021-06-11 | 2022-12-13 | 北京有竹居网络技术有限公司 | Subtitle processing method and device, electronic equipment and storage medium |
CN113689865A (en) * | 2021-08-24 | 2021-11-23 | 广东优碧胜科技有限公司 | Sampling rate switching method and device, electronic equipment and voice system |
CN116471436A (en) * | 2023-04-12 | 2023-07-21 | 央视国际网络有限公司 | Information processing method and device, storage medium and electronic equipment |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6199041B1 (en) * | 1998-11-20 | 2001-03-06 | International Business Machines Corporation | System and method for sampling rate transformation in speech recognition |
CN101320560A (en) * | 2008-07-01 | 2008-12-10 | 上海大学 | Method for speech recognition system improving discrimination by using sampling velocity conversion |
CN101505397A (en) * | 2009-02-20 | 2009-08-12 | 深圳华为通信技术有限公司 | Method and system for audio and video subtitle synchronous presenting |
CN101808202A (en) * | 2009-02-18 | 2010-08-18 | 联想(北京)有限公司 | Method, system and computer for realizing sound-and-caption synchronization in video file |
CN102708861A (en) * | 2012-06-15 | 2012-10-03 | 天格科技(杭州)有限公司 | Poor speech recognition method based on support vector machine |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR100640893B1 (en) * | 2004-09-07 | 2006-11-02 | 엘지전자 주식회사 | Baseband modem and mobile terminal for voice recognition |
-
2013
- 2013-03-05 CN CN201310069142.9A patent/CN104038804B/en active Active
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6199041B1 (en) * | 1998-11-20 | 2001-03-06 | International Business Machines Corporation | System and method for sampling rate transformation in speech recognition |
CN101320560A (en) * | 2008-07-01 | 2008-12-10 | 上海大学 | Method for speech recognition system improving discrimination by using sampling velocity conversion |
CN101808202A (en) * | 2009-02-18 | 2010-08-18 | 联想(北京)有限公司 | Method, system and computer for realizing sound-and-caption synchronization in video file |
CN101505397A (en) * | 2009-02-20 | 2009-08-12 | 深圳华为通信技术有限公司 | Method and system for audio and video subtitle synchronous presenting |
CN102708861A (en) * | 2012-06-15 | 2012-10-03 | 天格科技(杭州)有限公司 | Poor speech recognition method based on support vector machine |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2019164535A1 (en) * | 2018-02-26 | 2019-08-29 | Google Llc | Automated voice translation dubbing for prerecorded videos |
KR102481871B1 (en) * | 2018-02-26 | 2022-12-28 | 구글 엘엘씨 | Automated voice translation dubbing of pre-recorded videos |
KR20230005430A (en) * | 2018-02-26 | 2023-01-09 | 구글 엘엘씨 | Automated voice translation dubbing for prerecorded videos |
KR102598824B1 (en) | 2018-02-26 | 2023-11-06 | 구글 엘엘씨 | Automated voice translation dubbing for prerecorded videos |
Also Published As
Publication number | Publication date |
---|---|
CN104038804A (en) | 2014-09-10 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN104038804B (en) | Captioning synchronization apparatus and method based on speech recognition | |
CN111968649B (en) | Subtitle correction method, subtitle display method, device, equipment and medium | |
CN109635270B (en) | Bidirectional probabilistic natural language rewrite and selection | |
US7949530B2 (en) | Conversation controller | |
CN105244022B (en) | Audio-video method for generating captions and device | |
JP5610197B2 (en) | SEARCH DEVICE, SEARCH METHOD, AND PROGRAM | |
US7949532B2 (en) | Conversation controller | |
US20210158795A1 (en) | Generating audio for a plain text document | |
CN106878805A (en) | A kind of mixed languages subtitle file generation method and device | |
WO2011068170A1 (en) | Search device, search method, and program | |
US8688725B2 (en) | Search apparatus, search method, and program | |
JPH11191000A (en) | Method for aligning text and voice signal | |
US20180047387A1 (en) | System and method for generating accurate speech transcription from natural speech audio signals | |
US9818450B2 (en) | System and method of subtitling by dividing script text into two languages | |
CN109754783A (en) | Method and apparatus for determining the boundary of audio sentence | |
JP2015212732A (en) | Sound metaphor recognition device and program | |
CN111739556A (en) | System and method for voice analysis | |
KR101410601B1 (en) | Spoken dialogue system using humor utterance and method thereof | |
CN110324709A (en) | A kind of processing method, device, terminal device and storage medium that video generates | |
JP2012181358A (en) | Text display time determination device, text display system, method, and program | |
CN111079423A (en) | Method for generating dictation, reading and reporting audio, electronic equipment and storage medium | |
Levin et al. | Automated closed captioning for Russian live broadcasting | |
CN105931641A (en) | Subtitle data generation method and device | |
US11967248B2 (en) | Conversation-based foreign language learning method using reciprocal speech transmission through speech recognition function and TTS function of terminal | |
CN102970618A (en) | Video on demand method based on syllable identification |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |