CN107066456A - Receiving module of a multilingual intelligent preprocessing real-time statistical machine translation system
- Publication number
- CN107066456A (application number CN201710203849.2A)
- Authority
- CN
- China
- Prior art keywords
- receiving module
- text
- language
- machine translation
- sentence
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/40—Processing or translation of natural language
- G06F40/42—Data-driven translation
- G06F40/47—Machine-assisted translation, e.g. using translation memory
Abstract
The invention discloses a receiving module for a multilingual intelligent preprocessing real-time statistical machine translation system. The receiving module performs a conformance check on the system input and comprises a text language receiving module and a speech recognition result receiving module. The text language receiving module performs sentence segmentation and format conversion on text input; the speech recognition result receiving module performs sentence segmentation and noise elimination on recognized speech. The receiving module can therefore accept both text and speech input and apply basic processing to it: for example, the content between a pair of HTML tags forms a sentence on its own; when a pause in speech exceeds 0.5 s, a new sentence is considered to start after the pause; and adjacent repeated fragments in the spoken-language text stream are removed. This makes it easier for the downstream machine translation module to translate the input, which improves the efficiency and quality of translation.
Description
Technical field
The present invention relates to the technical field of artificial-intelligence machine translation, and in particular to a receiving module for a multilingual intelligent preprocessing real-time statistical machine translation system.
Background
Machine translation is the technology of using a computer to translate human natural language automatically: the computer converts text in one natural language into another natural language, and the two texts should be equivalent in meaning.
At present, a relatively mature and mainstream approach to machine translation is the statistics-based method. Its advantage is that almost no translation rules need to be written by hand, because all translation knowledge is learned automatically from corpora. The method therefore makes full use of the computer's high-speed computation and significantly reduces labor cost.
Machine translation based on a statistical model learns phrase translations from one language A to another language B from a parallel corpus. To translate a new sentence, the input sentence in language A is decomposed into phrases and, according to the learned co-occurrence probabilities of phrase (language A) and phrase (language B) pairs, translated into a sentence in language B. The whole learning and translation process relies entirely on the statistical model.
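The learning step described above can be illustrated with a deliberately simplified Python sketch. It assumes that aligned phrase pairs have already been extracted from the parallel corpus, and it estimates the translation probability by relative frequency. The function names and the toy data are hypothetical and do not come from the patent; a real statistical system would also handle phrase segmentation, reordering and language-model scoring, none of which is shown here.

```python
from collections import Counter, defaultdict


def estimate_phrase_table(phrase_pairs):
    """Estimate p(target | source) as the relative frequency of co-occurrence."""
    pair_counts = Counter(phrase_pairs)
    source_counts = Counter(src for src, _ in phrase_pairs)
    table = defaultdict(dict)
    for (src, tgt), n in pair_counts.items():
        table[src][tgt] = n / source_counts[src]
    return table


def translate(phrases, table):
    """Pick the most probable target phrase for each source phrase."""
    output = []
    for src in phrases:
        candidates = table.get(src)
        if candidates:
            output.append(max(candidates, key=candidates.get))
        else:
            output.append(src)  # unknown phrases are passed through unchanged
    return output


# Toy aligned phrase pairs (hypothetical data, not taken from the patent).
PAIRS = [
    ("你好", "hello"), ("你好", "hello"), ("你好", "hi"),
    ("世界", "world"),
]
TABLE = estimate_phrase_table(PAIRS)
```

Calling `translate(["你好", "世界"], TABLE)` selects the most frequent translation of each phrase independently, which is why clean, sentence-sized input from the receiving module matters for the later stages.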
The receiving module in this technology mainly performs basic conformance checking and processing on the text or speech to be translated (for example, when translating the content of a lecture as it is delivered), so that the subsequent machine translation runs more smoothly. Existing receiving modules, however, have poor generality: they can receive only text or only speech. In addition, receiving modules that accept speech exhibit various problems when processing it.
Summary of the invention
In view of the above technical problems in the related art, the present invention proposes a receiving module for a multilingual intelligent preprocessing real-time statistical machine translation system that overcomes the above deficiencies of the prior art.
To achieve this technical purpose, the technical solution of the invention is realized as follows:
A receiving module for a multilingual intelligent preprocessing real-time statistical machine translation system performs a conformance check on the system input and comprises a text language receiving module and a speech recognition result receiving module. The text language receiving module performs sentence segmentation and format conversion on text input, and the speech recognition result receiving module performs sentence segmentation and noise elimination on the recognized speech.
Further, the text language receiving module includes a sentence segmentation submodule, which breaks the input text at punctuation marks so that the basic unit translated by the downstream machine translation module is a single sentence.
Preferably, when the input text contains HTML tags, the content between a pair of HTML tags forms a sentence on its own and is translated as a complete sentence.
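A minimal Python sketch of this segmentation rule is given below. It assumes simple, non-nested tag pairs and a small set of sentence-ending punctuation marks (full stop, question mark, exclamation mark, in ASCII and full-width forms); the regular expressions and function names are illustrative assumptions, not the patent's implementation.

```python
import re

# Sentence-ending punctuation in ASCII and full-width forms.
# The exact set is an assumption made for this sketch.
SENTENCE = re.compile(r"[^.?!。？！]*[.?!。？！]+|[^.?!。？！]+$")
# A simple, non-nested tag pair such as <p>...</p>.
TAG_PAIR = re.compile(r"<(\w+)\b[^>]*>(.*?)</\1>", re.DOTALL)


def split_plain(text):
    """Break untagged text at sentence-ending punctuation."""
    return [s.strip() for s in SENTENCE.findall(text) if s.strip()]


def split_sentences(text):
    """Treat the content of each tag pair as one sentence; split the rest."""
    sentences = []
    pos = 0
    for match in TAG_PAIR.finditer(text):
        sentences.extend(split_plain(text[pos:match.start()]))
        inner = match.group(2).strip()
        if inner:
            sentences.append(inner)  # tagged content is never split further
        pos = match.end()
    sentences.extend(split_plain(text[pos:]))
    return sentences
```

This sketch would also break at the dot in a decimal number or an abbreviation, and it does not handle nested tags; a production segmenter would need extra rules for both.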
Further, the text language receiving module also includes a format conversion submodule, which converts the input text into the plain-text or XML format supported by the machine translation module.
Preferably, the input text includes PDF text and/or image text.
Further, the speech recognition result receiving module includes a sentence segmentation submodule, which splits the input speech text stream into sentences according to the pauses between words.
Preferably, the sentence segmentation submodule considers a new sentence to start after a pause whenever the pause between sentences exceeds 0.5 s.
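Pause-based segmentation can be sketched in Python as follows, assuming the speech recognizer supplies start and end times for each word. The `Word` structure and the function name are hypothetical; the threshold is a parameter, set here to the 0.5 s value given in the abstract and in the embodiment.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    start: float  # start time in seconds
    end: float    # end time in seconds


PAUSE_THRESHOLD_S = 0.5  # value stated in the abstract and the embodiment


def segment_by_pause(words, threshold=PAUSE_THRESHOLD_S):
    """Start a new sentence whenever the gap between two words exceeds the threshold."""
    sentences = []
    current = []
    previous = None
    for word in words:
        if previous is not None and word.start - previous.end > threshold:
            sentences.append(current)
            current = []
        current.append(word.text)
        previous = word
    if current:
        sentences.append(current)
    return sentences
```

The comparison is strict, so a gap of exactly the threshold does not start a new sentence; whether the boundary case should split is a design choice the patent does not specify.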
Further, the speech recognition result receiving module also includes a noise elimination submodule, which removes adjacent repeated fragments from the spoken-language text stream in the input.
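One possible way to remove adjacent repeated fragments is sketched below in Python. It works on a list of tokens and collapses any fragment of up to `max_len` tokens that is immediately followed by an identical copy. The function name and the `max_len` limit are assumptions made for illustration; the patent does not say how repeats are detected.

```python
def remove_adjacent_repeats(tokens, max_len=4):
    """Collapse immediately repeated fragments of up to max_len tokens."""
    result = list(tokens)
    changed = True
    while changed:
        changed = False
        # Try longer fragments first so that a repeated phrase is removed whole.
        for n in range(max_len, 0, -1):
            i = 0
            while i + 2 * n <= len(result):
                if result[i:i + n] == result[i + n:i + 2 * n]:
                    del result[i + n:i + 2 * n]  # drop the second copy
                    changed = True
                else:
                    i += 1
    return result
```

With this sketch, `["uh", "uh"]` becomes `["uh"]`. It also removes intentional repetitions such as "very very", so a practical filter would probably need a list of exceptions.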
Preferably, the formats in which downstream modules of the machine translation system can accept speech recognition results are plain text and confusion networks.
Beneficial effects of the present invention: the receiving module can be used to receive both text and speech input, and it can apply basic processing to the received input. For example, the content between a pair of HTML tags forms a sentence on its own; when a pause in speech exceeds 0.5 s, a new sentence is considered to start after the pause; and adjacent repeated fragments in the spoken-language text stream are removed. This makes it easier for the downstream machine translation module to translate the input, which improves the efficiency and quality of translation.
Brief description of the drawings
To illustrate the embodiments of the present invention or the technical solutions of the prior art more clearly, the drawings used in the embodiments are briefly described below. The drawings described below show only some embodiments of the present invention; those of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic diagram of the text language receiving module according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of the speech recognition result receiving module according to an embodiment of the present invention.
Detailed description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. The described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by those of ordinary skill in the art on the basis of these embodiments fall within the scope of protection of the present invention.
As shown in Figs. 1 and 2, a receiving module for a multilingual intelligent preprocessing real-time statistical machine translation system according to an embodiment of the present invention performs a conformance check on the system input and comprises a text language receiving module and a speech recognition result receiving module. The text language receiving module performs sentence segmentation and format conversion on text input, and the speech recognition result receiving module performs sentence segmentation and noise elimination on the recognized speech. The text language receiving module includes a sentence segmentation submodule, which breaks the input text at punctuation marks so that the basic unit translated by the downstream machine translation module is a single sentence, and a format conversion submodule, which converts the input text into the plain-text or XML format supported by the machine translation module. The speech recognition result receiving module includes a sentence segmentation submodule, which splits the input speech text stream into sentences according to the pauses between words, and a noise elimination submodule, which removes adjacent repeated fragments from the spoken-language text stream in the input. The formats in which downstream modules of the machine translation system can accept the output of the speech recognition result receiving module are plain text and confusion networks.
In one embodiment, when the input text contains HTML tags, the content between a pair of HTML tags forms a sentence on its own and is translated as a complete sentence.
In one embodiment, the input text includes PDF text and/or image text.
In one embodiment, the sentence segmentation submodule considers a new sentence to start after a pause whenever the pause between sentences exceeds 0.5 s.
To facilitate understanding of the above technical solution, it is described in detail below by way of a specific mode of use.
In specific use, the text language receiving module of the present invention consists mainly of two parts, as shown in Fig. 1: the A.1 sentence segmentation submodule and the A.2 format conversion submodule. The A.1 sentence segmentation submodule breaks the input text at punctuation marks such as full stops, question marks and exclamation marks, so that the basic unit translated by the downstream machine translation module is a single sentence. When the input text contains HTML tags, the content between a pair of HTML tags forms a sentence on its own, which ensures that it is translated as a complete sentence and not as part of the text outside the tags. The downstream modules of the machine translation system support the translation of plain text and XML text; therefore, when the input is in another format, such as a PDF or an image, it must first be converted. The A.2 format conversion submodule converts such other formats into plain text or XML.
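The output side of the format conversion step can be sketched in Python as follows. The sketch assumes the text has already been extracted from the source document and segmented into sentences; extracting text from a PDF or an image would need a PDF parser or OCR engine, which is not shown. The element names `document` and `s` and the `source` attribute are hypothetical, since the patent does not specify an XML schema.

```python
import xml.etree.ElementTree as ET


def to_plain_text(sentences):
    """Emit one sentence per line."""
    return "\n".join(sentences)


def to_xml(sentences, source_format="txt"):
    """Wrap each sentence in its own element so that boundaries survive."""
    root = ET.Element("document", {"source": source_format})
    for index, sentence in enumerate(sentences, start=1):
        segment = ET.SubElement(root, "s", {"id": str(index)})
        segment.text = sentence  # special characters are escaped on output
    return ET.tostring(root, encoding="unicode")
```

Keeping one element per sentence preserves the segmentation performed by the A.1 submodule, so the translation module would not need to split the text again.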
The speech recognition result receiving module likewise consists mainly of two parts, as shown in Fig. 2: the A.3 sentence segmentation submodule and the A.4 noise elimination submodule. The A.3 sentence segmentation submodule splits the input text stream into sentences according to the pauses between words; for example, when a pause exceeds 0.5 s, a new sentence is considered to start after the pause. The A.4 noise elimination submodule removes adjacent repeated fragments from the spoken-language text stream in the input: for example, "uh uh" is reduced to "uh", and "that is, that is, we must ..." is reduced to "that is, we must ...". The formats in which downstream modules of the machine translation system can accept speech recognition results are plain text and confusion networks.
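The patent does not define how a confusion network is represented. A common representation, assumed here, is a sequence of slots, each holding alternative words with posterior probabilities, where an empty alternative means that no word was spoken in that slot. The Python sketch below uses hypothetical names (`Slot`, `best_path`) to show such a structure and how the plain-text form can be derived from it.

```python
from dataclasses import dataclass
from typing import List, Tuple

EPSILON = ""  # empty alternative: no word in this slot


@dataclass
class Slot:
    # Alternative (word, posterior probability) pairs for one position.
    alternatives: List[Tuple[str, float]]


def best_path(network):
    """Return the plain-text hypothesis made of the most probable word per slot."""
    words = []
    for slot in network:
        word, _ = max(slot.alternatives, key=lambda alt: alt[1])
        if word != EPSILON:
            words.append(word)
    return " ".join(words)
```

Passing the full network downstream, rather than only the best path, would let the translation module weigh recognition alternatives itself.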
In summary, by means of the above technical solution, the receiving module of the present invention can be used to receive both text and speech input, and it can apply basic processing to the received input. For example, the content between a pair of HTML tags forms a sentence on its own; when a pause in speech exceeds 0.5 s, a new sentence is considered to start after the pause; and adjacent repeated fragments in the spoken-language text stream are removed. This makes it easier for the downstream machine translation module to translate the input, which improves the efficiency and quality of translation.
The foregoing describes only preferred embodiments of the present invention and is not intended to limit it. Any modification, equivalent substitution or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.
Claims (9)
1. A receiving module for a multilingual intelligent preprocessing real-time statistical machine translation system, characterized in that the receiving module performs a conformance check on the system input and comprises a text language receiving module and a speech recognition result receiving module; wherein the text language receiving module performs sentence segmentation and format conversion on text input, and the speech recognition result receiving module performs sentence segmentation and noise elimination on the recognized speech.
2. The receiving module for a multilingual intelligent preprocessing real-time statistical machine translation system according to claim 1, characterized in that the text language receiving module includes a sentence segmentation submodule, which breaks the input text at punctuation marks so that the basic unit translated by the downstream machine translation module is a single sentence.
3. The receiving module for a multilingual intelligent preprocessing real-time statistical machine translation system according to claim 2, characterized in that, when the input text contains HTML tags, the content between a pair of HTML tags forms a sentence on its own and is translated as a complete sentence.
4. The receiving module for a multilingual intelligent preprocessing real-time statistical machine translation system according to claim 2, characterized in that the text language receiving module also includes a format conversion submodule, which converts the input text into the plain-text format or XML format supported by the machine translation module.
5. The receiving module for a multilingual intelligent preprocessing real-time statistical machine translation system according to claim 4, characterized in that the input text includes PDF text and/or image text.
6. The receiving module for a multilingual intelligent preprocessing real-time statistical machine translation system according to claim 1, characterized in that the speech recognition result receiving module includes a sentence segmentation submodule, which splits the input speech text stream into sentences according to the pauses between words.
7. The receiving module for a multilingual intelligent preprocessing real-time statistical machine translation system according to claim 6, characterized in that the sentence segmentation submodule considers a new sentence to start after a pause whenever the pause between sentences exceeds 0.5 s.
8. The receiving module for a multilingual intelligent preprocessing real-time statistical machine translation system according to claim 6, characterized in that the speech recognition result receiving module also includes a noise elimination submodule, which removes adjacent repeated fragments from the spoken-language text stream in the input.
9. The receiving module for a multilingual intelligent preprocessing real-time statistical machine translation system according to claim 8, characterized in that the formats in which downstream modules of the machine translation system can accept speech recognition results are plain text and confusion networks.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710203849.2A CN107066456A (en) | 2017-03-30 | 2017-03-30 | Receiving module of a multilingual intelligent preprocessing real-time statistical machine translation system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107066456A (en) | 2017-08-18 |
Family ID: 59602734
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710203849.2A (Pending, published as CN107066456A (en)) | Receiving module of a multilingual intelligent preprocessing real-time statistical machine translation system | 2017-03-30 | 2017-03-30 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107066456A (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102231278A (en) * | 2011-06-10 | 2011-11-02 | 安徽科大讯飞信息科技股份有限公司 | Method and system for realizing automatic addition of punctuation marks in speech recognition |
US20120010873A1 (en) * | 2010-07-06 | 2012-01-12 | Electronics And Telecommunications Research Institute | Sentence translation apparatus and method |
CN102650987A (en) * | 2011-02-25 | 2012-08-29 | 北京百度网讯科技有限公司 | Machine translation method and device both based on source language repeat resource |
CN103956162A (en) * | 2014-04-04 | 2014-07-30 | 上海元趣信息技术有限公司 | Voice recognition method and device oriented towards child |
CN105117389A (en) * | 2015-07-28 | 2015-12-02 | 百度在线网络技术(北京)有限公司 | Translation method and device |
Non-Patent Citations (1)
Title |
---|
JOHENNES: "Google可翻译Word或PDF文档", 《HTTPS://BLOG.CSDN.NET/JOHENNES/ARTICLE/DETAILS/12968209》 * |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20170818 |