CN106874262A - A statistical machine translation method for realizing domain adaptation - Google Patents
A statistical machine translation method for realizing domain adaptation
- Publication number
- CN106874262A (application CN201710013628.9A)
- Authority
- CN
- China
- Prior art keywords
- noun
- domain
- field
- knowledge
- chinese
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/40—Processing or translation of natural language
- G06F40/42—Data-driven translation
- G06F40/44—Statistical methods, e.g. probability models
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/40—Processing or translation of natural language
- G06F40/58—Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Probability & Statistics with Applications (AREA)
- Machine Translation (AREA)
Abstract
The invention discloses a statistical machine translation method for realizing domain adaptation. The method organizes all Chinese-English noun and noun-phrase pairs according to an existing knowledge system into a computer-readable knowledge system tree, so that every Chinese-English noun and noun phrase has a corresponding knowledge tree level. It then calculates the sum of the field weights at each domain location point, compares the sums to find the domain location point with the highest field weight sum, and, within that knowledge field, determines the corresponding translation vocabulary according to a noun dictionary. By simulating the knowledge framework of the human brain, the method lets a computer learn how humans identify the relevant field when reading text, so that the computer can identify the field of a passage, realize domain adaptation in machine translation, and thereby improve translation accuracy.
Description
Technical field
The invention belongs to the technical field of statistical machine translation and, specifically, relates to a statistical machine translation method for realizing domain adaptation.
Background art
Statistical machine translation is currently the most widely used form of machine translation. It trains a translation engine on very large parallel and monolingual corpora, from which the system learns statistical correlations between source text and translations. Given a source-language sentence, it then searches for the translation with the highest probability. The translation engine itself has no notion of rules or grammar.
The main drawback of statistical machine translation is that it translates poorly when the training corpus contains no text similar to the material being translated. For example, an engine trained on technical text performs very badly on colloquial text. The engine therefore has to be trained continually on text similar to the material to be translated. Even with a large and suitable training corpus, statistical machine translation usually cannot produce publication-quality text. It also tends to translate the source text without regard to context, so it lacks sensitivity to the surrounding context and to the professional domain.
The difficulty for statistical machine translation lies in domain transfer and adaptation. The original training data for a machine translation system may come from a broad range of fields. When the system meets uncommon words or sentence patterns from a specific field, transferring quickly and still producing a high-quality translation is quite difficult, because little corpus data is available for such fields and there is too little knowledge to support the transfer. Several well-known online translation systems are currently competent at news translation (news corpora being the most plentiful), but their adaptive ability is much weaker in fields such as banking and law, where corpora are scarce.
Summary of the invention
In view of the above deficiencies in the prior art, the present invention provides a statistical machine translation method for realizing domain adaptation. The method uses a context knowledge tree to calculate the relevant field and then selects, for each noun, the translation that corresponds to that field, which strengthens the adaptive ability of the translation method and improves translation accuracy.
To achieve the above object, the solution adopted by the present invention is a statistical machine translation method for realizing domain adaptation, comprising the following steps:
A. Organize all Chinese-English noun and noun-phrase pairs according to an existing knowledge system and build a computer-readable knowledge system tree. The tree comprises a number of levels arranged in order and subdivided level by level, numbered from 1 to n. The Chinese-English nouns and noun phrases are divided into general nouns and industry nouns: general nouns belong to level 1, and industry nouns are subdivided by field, level by level, starting from level 2.
The knowledge system tree contains the domain names, ordered from broad to narrow, together with all nouns and noun phrases of each field. The nouns and noun phrases of a field are placed at the level directly below that field's domain name, and each domain name forms a domain location point. In this way every Chinese-English noun and noun phrase is assigned its knowledge tree level.
B. Build a computer-readable database of the Chinese-English nouns and noun phrases, structured as follows:
Chinese | English | Knowledge tree level (level) | Field weight (weight) |
---|---|---|---|
 | | n | n+k |
In this way every Chinese-English noun and noun phrase is assigned its field weight.
C. Calculate the sum of the field weights at each domain location point.
D. Compare the field weight sums of all domain location points and identify the domain location point with the highest sum; this is the knowledge field to which the passage of text relates.
E. Within that knowledge field, determine the corresponding translation vocabulary according to the noun dictionary.
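Steps C and D can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions, not the patent's implementation: the rows in `TERM_DB`, the domain names, and the function names are hypothetical, and the sketch assumes each noun contributes its weight only to the domain location point it is listed under (the text does not say whether weights propagate to parent domains).

```python
from collections import defaultdict

K = -0.5  # offset k; the text gives k = -0.5 as the preferred value

# Hypothetical database rows: (Chinese, English, domain location point, tree level n)
TERM_DB = [
    ("利息", "interest", "finance", 2),
    ("本金", "principal", "finance", 2),
    ("贷款", "loan", "finance/banking", 3),
    ("校长", "principal", "education", 2),
    ("时间", "time", "general", 1),
]


def build_index(term_db):
    """Map each Chinese noun to the (domain, level) entries it appears under."""
    index = defaultdict(list)
    for chinese, _english, domain, level in term_db:
        index[chinese].append((domain, level))
    return index


def domain_scores(nouns, term_db=TERM_DB, k=K):
    """Step C: sum the field weights (n + k) at each domain location point."""
    index = build_index(term_db)
    scores = defaultdict(float)
    for noun in nouns:
        for domain, level in index.get(noun, []):
            scores[domain] += level + k
    return dict(scores)


def best_domain(nouns, term_db=TERM_DB, k=K):
    """Step D: return the domain location point with the highest weight sum."""
    scores = domain_scores(nouns, term_db, k)
    if not scores:
        return None  # no known noun in the passage
    return max(scores, key=scores.get)
```

With these sample rows, a passage containing 利息, 贷款 and 时间 scores highest at the hypothetical `finance/banking` point, because the level-3 noun carries the largest weight.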
Preferably, in step B, a noun with multiple meanings is assigned to the domain location point corresponding to each of its meanings, and its field weight at each of those domain location points is (n+k)/x, where x is the number of related fields the noun involves.
Preferably, k = -0.5.
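The weight rule can be written as a small Python function. This is a minimal sketch: the function and parameter names are hypothetical, and it reads x as the number of fields among which a polysemous noun is split, which is an interpretation of the text rather than something it states in these words.

```python
def field_weight(level, num_fields=1, k=-0.5):
    """Return the field weight (n + k) / x.

    level      -- knowledge tree level n of the noun (1 = general nouns)
    num_fields -- x, assumed here to be the number of fields the noun belongs to
    k          -- offset; the text gives -0.5 as the preferred value
    """
    if level < 1:
        raise ValueError("knowledge tree levels start at 1")
    if num_fields < 1:
        raise ValueError("a noun must belong to at least one field")
    return (level + k) / num_fields
```

Under this reading, a general noun at level 1 gets weight 0.5, a single-sense industry noun at level 3 gets 2.5, and a level-3 noun shared by two fields contributes 1.25 to each of them, so deeper and less ambiguous nouns pull harder on the field decision.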
The beneficial effect of the invention is that, by simulating the knowledge framework of the human brain, the method lets a computer learn how humans identify the relevant field when reading text. The computer can thus identify the field of a passage, realize domain adaptation in machine translation, and improve translation accuracy.
Detailed description
The invention is further described below.
The present invention provides a statistical machine translation method for realizing domain adaptation, comprising the following steps:
A. Organize all Chinese-English noun and noun-phrase pairs according to an existing knowledge system and build a computer-readable knowledge system tree. The tree comprises a number of levels arranged in order and subdivided level by level, numbered from 1 to n. The Chinese-English nouns and noun phrases are divided into general nouns and industry nouns: general nouns belong to level 1, and industry nouns are subdivided by field, level by level, starting from level 2. General nouns usually do not affect the field of the context, whereas subdivided industry nouns have a stronger influence on the field: the more finely subdivided the industry field a word belongs to, the greater its influence on the context field.
The knowledge system tree contains the domain names, ordered from broad to narrow, together with all nouns and noun phrases of each field. The nouns and noun phrases of a field are placed at the level directly below that field's domain name, and each domain name forms a domain location point. In this way every Chinese-English noun and noun phrase is assigned its knowledge tree level.
B. Build a computer-readable database of the Chinese-English nouns and noun phrases, structured as follows:
Chinese | English | Knowledge tree level (level) | Field weight (weight) |
---|---|---|---|
 | | n | n+k |
In this way every Chinese-English noun and noun phrase is assigned its field weight.
A noun with multiple meanings is assigned to the domain location point corresponding to each of its meanings, and its field weight at each of those domain location points is (n+k)/x, where x is the number of related fields the noun involves and k = -0.5.
C. Calculate the sum of the field weights at each domain location point.
D. Compare the field weight sums of all domain location points and identify the domain location point with the highest sum; this is the knowledge field to which the passage of text relates.
E. Within that knowledge field, determine the corresponding translation vocabulary according to the noun dictionary.
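Step E, choosing a translation once the knowledge field is known, might look like the following Python sketch. The dictionary contents, the domain names, and the fallback to a `general` entry are all assumptions made for illustration; the text only says that the translation is determined from the noun dictionary within the selected field.

```python
# Hypothetical noun dictionary: Chinese noun -> {domain location point: English}
NOUN_DICT = {
    "质量": {"physics": "mass", "general": "quality"},
    "程序": {"computing": "program", "law": "procedure"},
}


def translate_noun(noun, domain, noun_dict=NOUN_DICT, fallback_domain="general"):
    """Pick the translation of `noun` for the selected knowledge field.

    If the noun has no entry for that field, fall back to the entry for
    `fallback_domain` (an assumption of this sketch), else return None.
    """
    senses = noun_dict.get(noun)
    if not senses:
        return None  # noun is not in the dictionary
    if domain in senses:
        return senses[domain]
    return senses.get(fallback_domain)
```

For example, `translate_noun("质量", "physics")` yields `"mass"`, while the same noun in a field with no dedicated entry falls back to `"quality"`.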
By simulating the knowledge framework of the human brain, this statistical machine translation method lets a computer learn how humans identify the relevant field when reading text. The computer can thus identify the field of a passage, realize domain adaptation in machine translation, and improve translation accuracy.
Claims (3)
1. A statistical machine translation method for realizing domain adaptation, characterized by comprising the following steps:
A. Organize all Chinese-English noun and noun-phrase pairs according to an existing knowledge system and build a computer-readable knowledge system tree. The tree comprises a number of levels arranged in order and subdivided level by level, numbered from 1 to n. The Chinese-English nouns and noun phrases are divided into general nouns and industry nouns: general nouns belong to level 1, and industry nouns are subdivided by field, level by level, starting from level 2.
The knowledge system tree contains the domain names, ordered from broad to narrow, together with all nouns and noun phrases of each field. The nouns and noun phrases of a field are placed at the level directly below that field's domain name, and each domain name forms a domain location point. In this way every Chinese-English noun and noun phrase is assigned its knowledge tree level.
B. Build a computer-readable database of the Chinese-English nouns and noun phrases, structured as follows:
Chinese | English | Knowledge tree level (level) | Field weight (weight) |
---|---|---|---|
 | | n | n+k |
In this way every Chinese-English noun and noun phrase is assigned its field weight.
C. Calculate the sum of the field weights at each domain location point.
D. Compare the field weight sums of all domain location points and identify the domain location point with the highest sum; this is the knowledge field to which the passage of text relates.
E. Within that knowledge field, determine the corresponding translation vocabulary according to the noun dictionary.
2. The statistical machine translation method for realizing domain adaptation according to claim 1, characterized in that, in step B, a noun with multiple meanings is assigned to the domain location point corresponding to each of its meanings, and its field weight at each of those domain location points is (n+k)/x, where x is the number of related fields the noun involves.
3. The statistical machine translation method for realizing domain adaptation according to claim 1, characterized in that k = -0.5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710013628.9A CN106874262A (en) | 2017-01-09 | 2017-01-09 | A statistical machine translation method for realizing domain adaptation |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710013628.9A CN106874262A (en) | 2017-01-09 | 2017-01-09 | A statistical machine translation method for realizing domain adaptation |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106874262A (en) | 2017-06-20 |
Family
ID=59164837
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710013628.9A Pending CN106874262A (en) | A statistical machine translation method for realizing domain adaptation | 2017-01-09 | 2017-01-09 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106874262A (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107632982A (en) * | 2017-09-12 | 2018-01-26 | 郑州科技学院 | The method and apparatus of voice controlled foreign language translation device |
CN107861953A (en) * | 2017-10-19 | 2018-03-30 | 聊城大学 | A kind of title automatic translation system and method |
CN108563643A (en) * | 2018-03-27 | 2018-09-21 | 常熟鑫沐奇宝软件开发有限公司 | A kind of polysemy interpretation method based on artificial intelligence knowledge mapping |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101131691A (en) * | 2006-08-25 | 2008-02-27 | 韩国电子通信研究院 | Domain-adaptive portable machine translation device for translating closed captions using dynamic translation resources and method thereof |
CN103631773A (en) * | 2013-12-16 | 2014-03-12 | 哈尔滨工业大学 | Statistical machine translation method based on field similarity measurement method |
CN104090870A (en) * | 2014-06-26 | 2014-10-08 | 武汉传神信息技术有限公司 | Pushing method of online translation engines |
JP2016045751A (en) * | 2014-08-25 | 2016-04-04 | 日本電気株式会社 | Machine translation device, machine translation method, machine translation program, and recording medium |
CN105550174A (en) * | 2015-12-30 | 2016-05-04 | 哈尔滨工业大学 | Adaptive method of automatic machine translation field on the basis of sample importance |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107632982A (en) * | 2017-09-12 | 2018-01-26 | 郑州科技学院 | The method and apparatus of voice controlled foreign language translation device |
CN107861953A (en) * | 2017-10-19 | 2018-03-30 | 聊城大学 | A kind of title automatic translation system and method |
CN107861953B (en) * | 2017-10-19 | 2020-12-11 | 聊城大学 | Automatic name translation system and method |
CN108563643A (en) * | 2018-03-27 | 2018-09-21 | 常熟鑫沐奇宝软件开发有限公司 | A kind of polysemy interpretation method based on artificial intelligence knowledge mapping |
CN108563643B (en) * | 2018-03-27 | 2021-10-01 | 常熟鑫沐奇宝软件开发有限公司 | Artificial intelligence knowledge graph-based word polysemous translation method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | PB01 | Publication | |
 | SE01 | Entry into force of request for substantive examination | |
 | RJ01 | Rejection of invention patent application after publication | Application publication date: 20170620 |