CN103885924A - Field-adaptive automatic open class subtitle generating system and field-adaptive automatic open class subtitle generating method - Google Patents

Field-adaptive automatic open class subtitle generating system and field-adaptive automatic open class subtitle generating method Download PDF

Info

Publication number
CN103885924A
CN103885924A CN201310596791.4A CN 103885924 A
Authority
CN
China
Prior art keywords
module
model
recognition result
keyword
text
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201310596791.4A
Other languages
Chinese (zh)
Inventor
巢文涵
马国庆
苏一鸣
李水华
孙承根
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beihang University
Original Assignee
Beihang University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beihang University filed Critical Beihang University
Priority to CN201310596791.4A priority Critical patent/CN103885924A/en
Publication of CN103885924A publication Critical patent/CN103885924A/en
Pending legal-status Critical Current

Landscapes

  • Machine Translation (AREA)

Abstract

Disclosed are a field-adaptive automatic open class subtitle generating system and a field-adaptive automatic open class subtitle generating method. The system comprises a preprocessing module, a model modification module, an audio recognition module, a recognition result feedback module and a subtitle generation module. The preprocessing module preprocesses videos, texts and keywords; the model modification module uses the text data obtained during preprocessing to modify the language model used in recognition; the audio recognition module recognises the uploaded videos; the recognition result feedback module extracts keywords from the recognition results, and these keywords are fed back into preprocessing and handled by the same method used for user-input keywords; the subtitle generation module standardises the final recognition result during subtitle generation and adds information such as the time axis to obtain the finished subtitles. The system and method solve the difficult problem of producing subtitles for open classes, relieve the manual labour of subtitle production through automated processing, and greatly facilitate the study of English open classes.

Description

Domain-adaptive automatic open-class subtitle generation system and method
Technical field
The present invention relates to a domain-adaptive automatic open-class subtitle generation system and method, and belongs to the field of multimedia and speech recognition technology.
Background technology
With the spread of informatisation, open resources of all kinds are becoming more and more abundant. Taking university course resources as an example, some world-class schools have successively released video open classes in every field, providing a convenient opportunity to study the relevant knowledge. The one flaw is that these videos often have no subtitles, and the language barrier therefore makes study very difficult. The usual existing solution is to employ a dedicated subtitling team to add subtitles to the videos, a process that is time-consuming and laborious; moreover, because open-class videos often belong to specialised fields, such as computer science, law or literature, domain experts are needed during subtitle production, which further increases the difficulty of generating subtitles.
Speech recognition technology started in the 1950s. After more than half a century of development it has achieved remarkable results; in the last decade in particular it has gradually left the laboratory and been commercialised, and a large number of mature systems have appeared on the market at home and abroad. These mature systems can certainly be used to obtain caption information from open-class videos, but because open classes cover a wide range of knowledge and contain a great deal of specialised vocabulary, the final recognition accuracy is not ideal.
Summary of the invention
The technical problem solved by the present invention: overcoming the deficiencies of the prior art by providing a domain-adaptive automatic open-class subtitle generation system and method. Starting from current open-class resources, the invention applies speech recognition techniques combined with natural language processing methods to solve the difficult problem of producing open-class subtitles; automated processing relieves the manual labour of subtitle production and greatly facilitates the study of English open classes.
The technical solution of the present invention: a domain-adaptive automatic open-class subtitle generation system, comprising: a preprocessing module, a model modification module, an audio recognition module, a recognition result feedback module and a subtitle generation module; wherein:
The preprocessing module covers the preprocessing of video, the preprocessing of text and the processing of keywords. Video preprocessing extracts and converts the audio of the open-class video supplied by the user: the video format supplied may vary, but the resulting audio format is unified. Text preprocessing likewise requires extraction and conversion: the textual material of various formats supplied by the user is retrieved and filtered, converted into one unified format and merged into a single text file for subsequent use; this merged text is then further retrieved and filtered to obtain useful plain-text data, i.e. pure English text free of charts and other symbols. Text preprocessing also includes keyword preprocessing: the keywords input by the user are used for web search and information extraction to obtain corresponding plain-text data, which is used together with the useful plain-text data extracted from the user-supplied text to modify the recognition model.
The model modification module uses the plain-text data obtained by the preprocessing module to modify the language model used during recognition, yielding a modified language model.
After the textual material supplied by the user has been used to generate a new language model, this newly generated model is merged with the original model by weighted linear interpolation, producing a new recognition model in which the probabilities of domain-related words are raised, thereby optimising the model.
The audio recognition module performs speech recognition on the audio file produced by the preprocessing module. During recognition the audio file is first converted into a feature-value sequence; each possible recognition result is then parsed and scored against the speech model and the language model. This mainly exploits the third of the three classic HMM problems: computing, from the feature-value sequence, the optimal state sequence corresponding to a recognition result, and outputting the recognition result with the best score.
The recognition result feedback module extracts keywords from the final recognition result. Because the extraction is performed on a single text, the TF-IDF method based on word-frequency statistics that is widely used in web retrieval cannot be applied directly, so the system adopts a keyword extraction algorithm based on the relations between word groups.
The subtitle generation module arranges the final recognition result; the output file format is a standard srt (SubRip Text) subtitle file, so time-axis information is added to the final result.
The modification of the language model in the model modification module comprises model generation and model interpolation. Model generation builds a language model from the plain-text data obtained by the preprocessing module; model interpolation interpolates the newly built language model with the original general language model. Concretely, building the language model means counting the phrases in the plain-text data and computing the probability and expectation of each phrase; interpolation combines these computed probabilities with those of the original general language model to obtain new phrase probabilities, as shown in the formula below:
P(ω|h) = λ·P_c(ω|h) + (1−λ)·P_α(ω|h)   (0 ≤ λ ≤ 1)
where P_c(ω|h) is the probability in the general model, P_α(ω|h) is the probability newly computed from the plain-text data, P(ω|h) is the final probability, and λ is the interpolation coefficient.
The concrete recognition process of the audio recognition module is: the audio file is first converted into a feature-value sequence; the speech model is then used to parse this sequence into possible recognition results; each possible result is scored against the language model. This mainly involves the third classic HMM problem: computing the optimal state sequence corresponding to a recognition result from the feature-value sequence, and outputting the result with the best score.
The detailed process of the recognition result feedback module is: keywords relevant to the theme and content of the video are first extracted from the recognition result; these keywords are then used for web search and information extraction to obtain relevant plain-text data (the same procedure applied to keywords during preprocessing); this plain-text data is then used to revise the language model again.
A domain-adaptive automatic open-class subtitle generation method, implemented in the following steps:
(1) The user adds a video, texts, or input keywords. The added video undergoes audio extraction and format conversion to obtain an audio file of unified format. The added texts are retrieved and filtered: texts of all formats are converted into one unified format and merged into a single text file, which is then retrieved and filtered to obtain useful plain-text data for subsequent use. The keywords input by the user are used for web search and information extraction to obtain relevant plain-text data, which is used together with the plain-text data obtained from the user-supplied texts to modify the recognition model.
(2) The modified model is used to recognise the audio, producing a recognition result. The result is fed back: keywords are extracted from it and used for another round of web retrieval, so the recognition process iterates continuously in this way until the final subtitles are obtained.
Compared with the prior art, the advantages of the present invention are:
(1) Starting from current open-class resources, the invention applies speech recognition techniques combined with natural language processing methods to solve the difficult problem of producing open-class subtitles; automated processing relieves the manual labour of subtitle production and greatly facilitates the study of English open classes.
(2) The invention uses speech recognition technology to generate subtitles automatically for open-class videos. Considering the strongly domain-specific character of open classes, the system focuses on achieving domain adaptation within the speech recognition process so as to improve the quality of the generated subtitles. The significance of the whole system is to use speech recognition to automate and systematise open-class subtitle generation, saving a great deal of manpower and material resources while providing better resources for learners.
Brief description of the drawings
Fig. 1 is a block diagram of the system of the present invention;
Fig. 2 is a flowchart of the method of the present invention;
Fig. 3 is a flowchart of the preprocessing module of the present invention;
Fig. 4 is a flowchart of standard speech recognition;
Fig. 5 is a flowchart of language model construction.
Embodiment
As shown in Figs. 1 and 2, the present invention comprises: a preprocessing module, a model modification module, an audio recognition module, a recognition result feedback module and a subtitle generation module. Preprocessing module: covers the preprocessing of video, text and keywords. Model modification module: uses the textual material obtained during preprocessing to modify the language model used in recognition. Audio recognition module: recognises the uploaded video. Recognition result feedback module: keyword extraction is applied to the recognised subtitles; the extracted keywords are fed back into preprocessing and handled by the same method used for user-input keywords, forming an iteration. Subtitle generation module: standardises the final recognition result during subtitle generation, adds information such as the time axis, and produces the finished subtitles.
Each module is described in detail below.
1. pretreatment module
The whole preprocessing module comprises three parts: video preprocessing, text preprocessing and keyword preprocessing, each introduced in turn below.
As shown in Fig. 3, the flow of the whole preprocessing module is as follows:
(1) Video preprocessing
Video preprocessing mainly extracts and converts the audio of the open-class video supplied by the user. The supplied video format may vary, but the resulting audio format must be unified. In this step the FFmpeg interface is called to convert the audio uniformly to wav format and to unify the channels, sampling rate and so on of the audio file. For the recognition system, all extracted audio is unified to mono, with a sampling rate of 44100 Hz and a bit rate of 705 kbps.
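The extraction step above can be sketched with the standard FFmpeg command-line options (`-vn`, `-ac`, `-ar`); the helper function and file names below are illustrative, not from the patent.

```python
# Build (without running) an ffmpeg command that extracts unified mono audio,
# as described above: wav output, 1 channel, 44100 Hz sample rate.

def build_extract_cmd(video_path: str, wav_path: str,
                      channels: int = 1, sample_rate: int = 44100) -> list:
    """Return an ffmpeg argv that converts any input video to a unified WAV."""
    return [
        "ffmpeg", "-y",           # overwrite the output file without asking
        "-i", video_path,         # input video, any container format
        "-vn",                    # drop the video stream, keep audio only
        "-ac", str(channels),     # down-mix to mono
        "-ar", str(sample_rate),  # resample to 44100 Hz
        wav_path,
    ]

cmd = build_extract_cmd("lecture.mp4", "lecture.wav")
print(" ".join(cmd))
```

In a real pipeline this argv would be passed to `subprocess.run`; building it separately keeps the format decisions testable.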
(2) Text preprocessing
Like video preprocessing, text preprocessing also requires extraction and conversion: the textual material of various formats supplied by the user is retrieved and filtered. Since the supplied material may come in many formats, such as word, ppt or pdf, the first step is to convert the material of all formats uniformly and merge it into a single text file (txt format) for subsequent use.
The resulting text then undergoes information retrieval and filtering. Because the whole system is an English speech recognition system, the first task is to remove the non-English parts of the text; secondly, to facilitate the subsequent model building, the sentences are also tidied up, including removing punctuation and merging and splitting sentences. The whole procedure is carried out with regular expressions and finally yields plain-text data.
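A minimal sketch of such regex-based cleanup, assuming very simple rules (the patent does not list its actual patterns): split on sentence-ending punctuation, strip non-English characters, and normalise whitespace.

```python
import re

# Illustrative regex cleanup: keep only English letters, split into sentences,
# collapse whitespace. Real rules in the system may differ.

def clean_text(raw: str) -> list:
    # Split on sentence-ending punctuation before removing symbols.
    sentences = re.split(r"[.!?]+", raw)
    cleaned = []
    for s in sentences:
        s = re.sub(r"[^A-Za-z\s]", " ", s)   # drop digits, symbols, non-Latin text
        s = re.sub(r"\s+", " ", s).strip()   # collapse runs of whitespace
        if s:
            cleaned.append(s.lower())
    return cleaned

print(clean_text("Machine learning, in 2013! 语音识别 works?"))
```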
(3) Keyword processing
Keyword processing mainly uses keywords to perform web searches and obtain the corresponding textual material. For keywords input by the user, the system searches by keyword through entry points including wiki, Google Scholar, freebase, etc. Unlike a general web crawler, the FASAS system here screens pages or files according to their relation to the keyword; the textual material obtained (per file) is then processed with the text preprocessing method to yield plain-text data.
2. model modification module
The model modification part comprises two components: model generation and model interpolation. Model generation builds a model from the plain-text data obtained in preprocessing; model interpolation interpolates the newly built model with the original model.
(1) model generation
Before introducing model generation, the basic principle of speech recognition is briefly reviewed. As shown in Fig. 4, the basic process of speech recognition can be roughly divided into three steps: feature extraction, model-bank construction and final pattern matching.
Feature extraction processes the input speech signal to obtain the corresponding feature values, converting the speech signal into a feature-value sequence for recognition. Model-bank construction in turn comprises building the speech model and building the language model. Pattern matching scores candidates against the two previous results, selects the highest-scoring candidate and outputs the best-matching result.
A language model (Language Model, LM) is a table, trained from a large amount of text, recording the probability with which each word and phrase appears in speech. The mainstream language models today include the unigram, bigram and trigram models. LMs are widely used in natural language processing. During speech recognition, the speech model and language model are used to compute the occurrence probabilities of the various possible results, and the result with the highest probability is selected for output.
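As a toy illustration of the statistics such a model records, the sketch below computes maximum-likelihood bigram probabilities over a tiny corpus; real systems smooth these counts rather than use raw MLE, and all names here are illustrative.

```python
from collections import Counter

# MLE bigram model: P(w2 | w1) = count(w1 w2) / count(w1).

def bigram_probs(tokens: list) -> dict:
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    return {(w1, w2): c / unigrams[w1] for (w1, w2), c in bigrams.items()}

probs = bigram_probs("the model scores the result".split())
print(probs[("the", "model")])  # "the" occurs twice, once followed by "model"
```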
Since an LM describes the probability distribution of given word sequences in the language, changing the model to some extent changes the distribution over word sequences. As mentioned above, every open-class video has its specific field and knowledge background, and textual material related to the video can be obtained; these texts can therefore be used to improve the recognition process by modifying the language model used in recognition.
(2) Model construction
Based on a comparison of language model construction methods, the system chose a construction method based on the Good-Turing algorithm.
To restate the content and main function of the language model: under the Good-Turing algorithm, to reduce model complexity only unigram, bigram and trigram statistics are collected. First the occurrence probabilities of the various n-grams in the training set are counted, then the Good-Turing algorithm is applied for probability calculation and discounting. In addition, given the limitations of the training text, the backoff probabilities of unigrams and bigrams must also be computed.
For the Good-Turing algorithm, whose flow is shown in Fig. 5, the discount rate d_r is computed with formula 1:
d_r = (r* − (k+1)·n_{k+1}/n_1) / (1 − (k+1)·n_{k+1}/n_1),  where r* = (r+1)·n_{r+1} / (r·n_r)   (formula 1)
where n_r is the number of N-grams that occur exactly r times in the training set, r* is the smoothed value of r, k is a set threshold (a constant), and d_r is the discount rate for n-grams occurring r times. If the d_r computed in this way satisfies d_r ≥ 1 or d_r ≤ 0, it is instead computed with formula 2:
d_r = min{ a, 1 − n_r / (r·(2n_r + n_{r+1})) }   (a = 0.98)   (formula 2)
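A minimal sketch of the discount computation in formulas 1 and 2 above. Here `n[r]` is the count-of-counts table (how many n-grams occurred exactly r times) and `k` the cutoff threshold; the table and threshold values below are illustrative, not from the patent.

```python
# Good-Turing discount rate d_r, falling back to formula 2 when formula 1
# leaves the (0, 1) range, as the description specifies.

def good_turing_discount(n: dict, r: int, k: int) -> float:
    r_star = (r + 1) * n.get(r + 1, 0) / (r * n[r])   # r* = (r+1)·n_{r+1} / (r·n_r)
    cut = (k + 1) * n.get(k + 1, 0) / n[1]            # (k+1)·n_{k+1} / n_1
    d = (r_star - cut) / (1 - cut)                    # formula 1
    if d >= 1 or d <= 0:
        # formula 2, with constant a = 0.98
        d = min(0.98, 1 - n[r] / (r * (2 * n[r] + n.get(r + 1, 0))))
    return d

n = {1: 10, 2: 3, 3: 1}   # 10 n-grams seen once, 3 seen twice, 1 seen three times
print(round(good_turing_discount(n, r=1, k=2), 3))  # (0.6 - 0.3) / 0.7 ≈ 0.429
```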
The backoff probabilities are computed with the Katz smoothing algorithm. Taking the bigram case as an example, for a bigram w_{i-1}w_i with occurrence count r, the count is revised as follows:
C(w_{i-1}w_i) = d_r·r,  if r > 0
C(w_{i-1}w_i) = α(w_{i-1})·P_ML(w_i),  if r = 0
where r is the actual number of times the bigram w_{i-1}w_i occurs in the text, d_r is the discount rate, α(w_{i-1}) is the backoff weight of w_{i-1}, and C(w_{i-1}w_i) is the revised count of the bigram w_{i-1}w_i.
The backoff weight of w_{i-1} can be computed by the following formula:
α(w_{i-1}) = (1 − Σ_{C(w_{i-1}w_i)>0} p(w_i | w_{i-1})) / (1 − Σ_{C(w_{i-1}w_i)>0} p(w_i))
where the sums run over the words w_i whose bigram w_{i-1}w_i occurs in the text, p(w_i) is the probability of w_i occurring, p(w_i | w_{i-1}) is the probability of w_i occurring given that w_{i-1} has occurred, and α(w_{i-1}) is the backoff weight of w_{i-1}.
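The backoff weight α(w_{i-1}) defined above is the probability mass left after discounting the seen bigrams, renormalised over the unigram mass of the same continuations. A minimal sketch with hand-made toy distributions (all values illustrative):

```python
# α(history) = (1 − Σ p(w | history)) / (1 − Σ p(w)), sums over words w
# actually seen after `history`, using already-discounted bigram probabilities.

def backoff_weight(seen_bigram_probs: dict, unigram_probs: dict, history: str) -> float:
    # Discounted conditional mass of continuations seen after `history`.
    seen_mass = sum(p for (h, _w), p in seen_bigram_probs.items() if h == history)
    # Unigram mass of those same continuations.
    seen_uni = sum(unigram_probs[w] for (h, w) in seen_bigram_probs if h == history)
    return (1 - seen_mass) / (1 - seen_uni)

bi = {("the", "model"): 0.5, ("the", "result"): 0.3}   # discounted P(w | "the")
uni = {"model": 0.2, "result": 0.1, "scores": 0.7}
print(round(backoff_weight(bi, uni, "the"), 3))  # (1-0.8)/(1-0.3) ≈ 0.286
```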
(3) model interpolation
After the textual material supplied by the user has been used to generate a new language model, this newly generated model is merged with the original model, mainly by the method of weighted linear interpolation, producing a new recognition model in which the probabilities of domain-related words are raised, thereby optimising the model.
The computation of the simple interpolation follows the formula:
P(ω|h) = λ·P_c(ω|h) + (1−λ)·P_α(ω|h)   (0 ≤ λ ≤ 1)
where P_c(ω|h) is the probability in the general model, P_α(ω|h) is the probability newly computed from the plain-text data, P(ω|h) is the final probability, and λ is the interpolation coefficient with range 0 to 1.
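The interpolation above can be sketched directly; the toy probability tables and the value of λ are illustrative.

```python
# Weighted linear interpolation of two unigram tables:
# P(w) = λ·P_c(w) + (1-λ)·P_α(w), over the union vocabulary.

def interpolate(p_general: dict, p_domain: dict, lam: float) -> dict:
    vocab = set(p_general) | set(p_domain)
    return {w: lam * p_general.get(w, 0.0) + (1 - lam) * p_domain.get(w, 0.0)
            for w in vocab}

p_c = {"algorithm": 0.01, "the": 0.30}   # general model probabilities
p_a = {"algorithm": 0.12, "the": 0.25}   # model built from domain text
merged = interpolate(p_c, p_a, lam=0.6)
print(round(merged["algorithm"], 3))     # 0.6*0.01 + 0.4*0.12 = 0.054
```

Note how the domain model raises the probability of the domain term "algorithm" relative to the general model, which is exactly the stated purpose of the modification.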
3. audio identification module
Audio recognition performs speech recognition on the audio file obtained during preprocessing, adopting the Sphinx-4 framework of the CMU Sphinx project.
During recognition, the audio file is first converted into a feature-value sequence; each possible recognition result is then scored against the speech model and the language model. This mainly exploits the third of the three classic HMM problems: computing, from the feature-value sequence, the optimal state sequence (corresponding to a recognition result), and outputting the recognition result with the best score.
The implementation of the audio recognition module amounts to finding the state sequence Q maximising the conditional probability P(Q|O, μ) for a given model μ and observation sequence O:
Q^ = argmax_Q P(Q | O, μ)
Under the HMM recognition framework the system uses the Viterbi algorithm: the Viterbi variable δ_t(i) is the maximum probability that, at time t, the HMM reaches state s_i along some path while emitting the observation sequence o_1 o_2 … o_t:
δ_t(i) = max_{q_1, q_2, …, q_{t-1}} P(q_1, q_2, …, q_t = s_i, o_1 o_2 … o_t | μ)
The whole optimal sequence can be obtained by the following recurrence, where a and b are the state-transition matrix and emission-probability matrix respectively, a_{ji} is the probability of a transition from state j to state i, and b_i(o_{t+1}) is the probability of observing o_{t+1} in state i:
δ_{t+1}(i) = [max_j δ_t(j)·a_{ji}] · b_i(o_{t+1})
During the whole subtitle generation process, audio recognition and dynamic model building can be performed repeatedly: based on the previous recognition result the whole system adapts itself once more, readjusting the system's language model, including secondary improvements to the model.
4. recognition result feedback module
The recognition result feedback module mainly uses the recognition result of the previous step: keywords relevant to the video are extracted from the recognition result and then handled by the keyword processing method of the preprocessing module, forming the iteration of the whole system.
Keyword extraction is the most important part of the whole process. Unlike general keyword extraction, the keyword extraction in the FASAS system targets the latest recognition result, i.e. a single text, so the TF-IDF method based on word-frequency statistics that is widely used in web retrieval cannot be applied directly.
Considering the characteristics of the language model, the system adopts a keyword extraction method based on phrase relations. Concretely, the phrases (terms, covering unigrams, bigrams and trigrams) in the whole text are first counted, and a high-frequency term set G is derived from the word-frequency statistics. Let g be an element of G and p_g the frequency with which g occurs independently; let ω be any term in the document, n_ω the number of times ω occurs in the same sentence together with any term of G, and freq(ω, g) the number of times ω occurs together with g. Define:
χ²(ω) = Σ_{g∈G} (freq(ω, g) − n_ω·p_g)² / (n_ω·p_g)
Clearly, if ω and g are unrelated, this result tends to 0; the larger the result, the more information ω carries and the more likely it is a keyword. To avoid special cases, the formula is improved as follows:
χ'²(ω) = χ²(ω) − max_{g∈G} { (freq(ω, g) − n_ω·p_g)² / (n_ω·p_g) }
This computation yields a set of phrases that may be keywords. Actual testing showed that the set may contain adverbs, pronouns and the like, so in practice a non-keyword table is built, recording common adverbs, pronouns, prepositions, etc.; the extracted keyword set is checked against this table before the final keywords are output.
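The two scores above can be sketched directly from precomputed counts; the count tables below are illustrative, not derived from any real recognition output.

```python
# χ²(ω) sums (freq(ω,g) − n_ω·p_g)² / (n_ω·p_g) over frequent terms g;
# the improved χ'² drops the single largest term, as in the formulas above.

def chi2(freq: dict, n_w: int, p: dict) -> float:
    return sum((freq[g] - n_w * p[g]) ** 2 / (n_w * p[g]) for g in p)

def chi2_prime(freq: dict, n_w: int, p: dict) -> float:
    terms = [(freq[g] - n_w * p[g]) ** 2 / (n_w * p[g]) for g in p]
    return sum(terms) - max(terms)

freq = {"model": 8, "speech": 1}    # co-occurrence counts freq(ω, g)
p = {"model": 0.5, "speech": 0.3}   # independent frequencies p_g
n_w = 10                            # total co-occurrences n_ω of ω with terms in G
print(round(chi2(freq, n_w, p), 3), round(chi2_prime(freq, n_w, p), 3))
```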
5. captions generation module
Subtitle generation mainly arranges the final recognition result. The file format the system must finally output is a standard srt (SubRip Text) subtitle file, so time-axis information must be added to the final result.
In the audio recognition module, the preprocessing part divides the audio stream into speech and non-speech portions; the speech portions are recognised into text, and each speech segment has a corresponding start time and end time. Combining the recognised text with the timing of the speech and writing it into a text file in the prescribed subtitle format yields a standard subtitle file; once the subtitle file is obtained, the system can read it and display the subtitles in the video.
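The assembly step above can be sketched as pairing each segment's text with its start and end times and emitting standard SubRip blocks (index, `HH:MM:SS,mmm --> HH:MM:SS,mmm` line, text); the segment data is illustrative.

```python
# Write recognised speech segments as standard .srt subtitle blocks.

def fmt_time(seconds: float) -> str:
    """Format seconds as the SubRip timestamp HH:MM:SS,mmm."""
    ms = int(round(seconds * 1000))
    h, rem = divmod(ms, 3600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def to_srt(segments: list) -> str:
    """segments: list of (start_seconds, end_seconds, text) tuples."""
    blocks = []
    for i, (start, end, text) in enumerate(segments, 1):
        blocks.append(f"{i}\n{fmt_time(start)} --> {fmt_time(end)}\n{text}\n")
    return "\n".join(blocks)

print(to_srt([(0.0, 2.5, "Welcome to the course."),
              (2.5, 5.0, "Today we discuss HMMs.")]))
```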
Parts of the present invention not elaborated here belong to techniques well known to those skilled in the art.
The above is only a partial embodiment of the present invention, but the protection scope of the present invention is not limited thereto. Any variation or replacement readily conceivable by any person skilled in the art within the technical scope disclosed by the present invention shall be covered by the protection scope of the present invention.

Claims (5)

1. A domain-adaptive automatic open-class subtitle generation system, characterised by comprising: a preprocessing module, a model modification module, an audio recognition module, a recognition result feedback module and a subtitle generation module; wherein:
the preprocessing module comprises video preprocessing, text preprocessing and keyword preprocessing; video preprocessing performs audio extraction and format conversion on the video added by the user, obtaining an audio file of unified format; text preprocessing retrieves and filters the texts added by the user, converting texts of all formats into one unified format merged into a single text file for subsequent use, and retrieves and filters the merged text to obtain useful plain-text data, i.e. pure English text free of charts and other symbols; text preprocessing further comprises keyword preprocessing, in which the keywords input by the user are used for web search and information extraction to obtain corresponding plain-text data, which is used together with the useful plain-text data extracted from the user-supplied text to modify the recognition model;
the model modification module uses the plain-text data obtained by the preprocessing module to modify the language model used during recognition, yielding a modified language model;
the audio recognition module uses the modified language model to recognise the audio file extracted during preprocessing, obtaining a recognition result;
the recognition result feedback module performs keyword extraction on the recognition result; the extracted keywords are fed back into the preprocessing module and handled by the method used for user-input keywords, forming an iteration that yields the final recognition result;
the subtitle generation module standardises the final recognition result during subtitle generation and adds additional information such as the time axis to obtain the finished subtitle file.
2. The domain-adaptive automatic open-class subtitle generation system according to claim 1, characterised in that the modification of the language model in the model modification module comprises model generation and model interpolation: model generation builds a language model from the plain-text data obtained by the preprocessing module, and model interpolation interpolates the newly built language model with the original general language model; concretely, building the language model means counting the phrases in the plain-text data and computing the probability and expectation of each phrase, and interpolation combines the computed probabilities with those of the original general language model to obtain new phrase probabilities, as shown in the formula below:
P(ω|h) = λ·P_c(ω|h) + (1−λ)·P_α(ω|h)   (0 ≤ λ ≤ 1)
where P_c(ω|h) is the probability in the general model, P_α(ω|h) is the probability newly computed from the plain-text data, P(ω|h) is the final probability, and λ is the interpolation coefficient.
3. The domain-adaptive automatic open-class subtitle generation system according to claim 1, characterised in that the concrete recognition process of the audio recognition module is: the audio file is first converted into a feature-value sequence; the speech model is then used to parse this sequence into possible recognition results; each possible result is scored against the language model, which mainly involves the third classic HMM problem: computing the optimal state sequence corresponding to a recognition result from the feature-value sequence, and outputting the result with the best score.
4. The field-adaptive automatic open class subtitle generating system according to claim 1, wherein the recognition result feedback module operates as follows: keywords relevant to the topic and content of the video are first extracted from the recognition result; the extracted keywords are then used for web search and information extraction to obtain relevant plain-text data, in the same way user-input keywords are handled during preprocessing; these plain-text data are then used to modify the language model again.
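One common way to extract topic-relevant keywords from a transcript is TF-IDF scoring against a set of background documents. The patent does not specify its extraction method, so the following is only an illustrative sketch under that assumption:

```python
import math
from collections import Counter

def extract_keywords(transcript, background_docs, k=5):
    """Rank transcript words by TF-IDF against a set of background documents."""
    docs = [set(d.split()) for d in background_docs]
    tf = Counter(transcript.split())
    n = len(docs) + 1  # +1 counts the transcript itself

    def idf(w):
        df = 1 + sum(w in d for d in docs)  # document frequency, incl. the transcript
        return math.log(n / df)

    return sorted(tf, key=lambda w: tf[w] * idf(w), reverse=True)[:k]
```

Words frequent in the transcript but absent from the background documents score highest, which matches the goal of finding terms specific to the video's topic.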
5. A field-adaptive automatic open class subtitle generating method, characterized by the following implementation steps:
(1) The user uploads videos, texts, or keywords. For an uploaded video, audio extraction and format conversion are performed to obtain an audio file in a unified format. Uploaded texts are retrieved and filtered: texts in different formats are converted into a unified format, i.e. merged into a single text file, and the resulting text is then searched and filtered to extract useful plain-text data for subsequent use. For user-input keywords, web search and information extraction are performed to obtain relevant plain-text data, which is used together with the plain-text data derived from the user-provided texts to modify the recognition model.
(2) The modified model is used to recognize the audio, producing a recognition result. The recognition result is then fed back: keywords are extracted from it and used for another round of web retrieval, so that recognition iterates continuously in this way until the final subtitles are obtained.
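The iterative recognize-and-feedback loop of step (2) can be sketched as an orchestration function. The four callables here are hypothetical stand-ins for the modules described in the claims, not the patent's actual interfaces:

```python
def generate_subtitles(audio, model, recognize, extract_keywords,
                       web_search, adapt_model, n_iters=3):
    """Iterate: recognize audio, extract keywords from the result,
    retrieve related text from the web, and re-adapt the language model."""
    transcript = None
    for _ in range(n_iters):
        transcript = recognize(audio, model)      # audio recognition module
        keywords = extract_keywords(transcript)   # recognition result feedback module
        corpus = web_search(keywords)             # web search and information extraction
        model = adapt_model(model, corpus)        # language model modification
    return transcript
```

Passing the modules in as parameters keeps the loop itself independent of any particular recognizer or retrieval backend; a fixed iteration count stands in for whatever convergence criterion an implementation might use.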
CN201310596791.4A 2013-11-21 2013-11-21 Field-adaptive automatic open class subtitle generating system and field-adaptive automatic open class subtitle generating method Pending CN103885924A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310596791.4A CN103885924A (en) 2013-11-21 2013-11-21 Field-adaptive automatic open class subtitle generating system and field-adaptive automatic open class subtitle generating method


Publications (1)

Publication Number Publication Date
CN103885924A true CN103885924A (en) 2014-06-25

Family

ID=50954820

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310596791.4A Pending CN103885924A (en) 2013-11-21 2013-11-21 Field-adaptive automatic open class subtitle generating system and field-adaptive automatic open class subtitle generating method

Country Status (1)

Country Link
CN (1) CN103885924A (en)


Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101651788A (en) * 2008-12-26 2010-02-17 中国科学院声学研究所 Alignment system of on-line speech text and method thereof
CN102122506A (en) * 2011-03-08 2011-07-13 天脉聚源(北京)传媒科技有限公司 Method for recognizing voice
CN102236639A (en) * 2010-04-28 2011-11-09 三星电子株式会社 System and method for updating language model
CN102623010A (en) * 2012-02-29 2012-08-01 北京百度网讯科技有限公司 Method and device for establishing language model and method and device for recognizing voice
CN102801925A (en) * 2012-08-08 2012-11-28 无锡天脉聚源传媒科技有限公司 Method and device for adding and matching captions


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
WENHAN CHAO et al.: "Improved Graph-based Bilingual Corpus Selection with Sentence Pair Ranking for Statistical Machine Translation", 2011 23rd IEEE International Conference on Tools with Artificial Intelligence, 31 December 2011, pages 446-451 *
ZHENG Lilei et al.: "Design and Implementation of a Fully Automatic Chinese News Subtitle Generation System", Acta Electronica Sinica, vol. 39, no. 3, 31 March 2011, pages 69-74 *

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106847271A (en) * 2016-12-12 2017-06-13 北京光年无限科技有限公司 A kind of data processing method and device for talking with interactive system
CN110168531A (en) * 2016-12-30 2019-08-23 三菱电机株式会社 Method and system for multi-modal fusion model
CN110168531B (en) * 2016-12-30 2023-06-20 三菱电机株式会社 Method and system for multi-modal fusion model
CN107292396A (en) * 2017-08-14 2017-10-24 南宁学院 A kind of hydroelectric facility reports message treatment method for repairment
CN107292396B (en) * 2017-08-14 2020-05-05 南宁学院 Hydroelectric equipment repair message processing method
CN108597502A (en) * 2018-04-27 2018-09-28 上海适享文化传播有限公司 Field speech recognition training method based on dual training
CN110517689A (en) * 2019-08-28 2019-11-29 腾讯科技(深圳)有限公司 A kind of voice data processing method, device and storage medium
CN110517689B (en) * 2019-08-28 2023-11-24 腾讯科技(深圳)有限公司 Voice data processing method, device and storage medium
CN112233661A (en) * 2020-10-14 2021-01-15 广州欢网科技有限责任公司 Method, system and equipment for generating movie content subtitle based on voice recognition
CN112233661B (en) * 2020-10-14 2024-04-05 广州欢网科技有限责任公司 Video content subtitle generation method, system and equipment based on voice recognition
CN112668306A (en) * 2020-12-22 2021-04-16 延边大学 Language processing method and system based on statement discrimination recognition and reinforcement learning action design
CN112668306B (en) * 2020-12-22 2021-07-27 延边大学 Language processing method and system based on statement discrimination recognition and reinforcement learning action design

Similar Documents

Publication Publication Date Title
CN103885924A (en) Field-adaptive automatic open class subtitle generating system and field-adaptive automatic open class subtitle generating method
KR102445519B1 (en) System and method for manufacturing conversational intelligence service providing chatbot
CN114116994A (en) Welcome robot dialogue method
CN104573099A (en) Topic searching method and device
CN111914555B (en) Automatic relation extraction system based on Transformer structure
CN111341293A (en) Text voice front-end conversion method, device, equipment and storage medium
CN110740275A (en) nonlinear editing systems
CN110517668A (en) A kind of Chinese and English mixing voice identifying system and method
CN111508466A (en) Text processing method, device and equipment and computer readable storage medium
Mussakhojayeva et al. KazakhTTS: An open-source Kazakh text-to-speech synthesis dataset
CN116092472A (en) Speech synthesis method and synthesis system
CN112216267A (en) Rhythm prediction method, device, equipment and storage medium
CN114444481B (en) Sentiment analysis and generation method of news comment
Kuhn et al. The Indigenous Languages Technology project at NRC Canada: An empowerment-oriented approach to developing language software
CN109933773A (en) A kind of multiple semantic sentence analysis system and method
CN106502988A (en) The method and apparatus that a kind of objective attribute target attribute is extracted
López-Ludeña et al. LSESpeak: A spoken language generator for Deaf people
Hu et al. MnTTS: an open-source mongolian text-to-speech synthesis dataset and accompanied baseline
CN116129868A (en) Method and system for generating structured photo
Tan Design of intelligent speech translation system based on deep learning
Singh et al. An Integrated Model for Text to Text, Image to Text and Audio to Text Linguistic Conversion using Machine Learning Approach
Šoić et al. Spoken notifications in smart environments using Croatian language
CN112506405A (en) Artificial intelligent voice large screen command method based on Internet supervision field
CN110569510A (en) method for identifying named entity of user request data
CN117035064B (en) Combined training method for retrieving enhanced language model and storage medium

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20140625