CN106874262A - A statistical machine translation method for realizing domain adaptation - Google Patents

A statistical machine translation method for realizing domain adaptation

Info

Publication number
CN106874262A
Authority
CN
China
Prior art keywords
noun
domain
field
knowledge
chinese
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710013628.9A
Other languages
Chinese (zh)
Inventor
梁如昕
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chengdu Jiayin Multilingual Information Technology Co Ltd
Original Assignee
Chengdu Jiayin Multilingual Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chengdu Jiayin Multilingual Information Technology Co Ltd
Priority to CN201710013628.9A
Publication of CN106874262A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/40 Processing or translation of natural language
    • G06F40/42 Data-driven translation
    • G06F40/44 Statistical methods, e.g. probability models
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/40 Processing or translation of natural language
    • G06F40/58 Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a statistical machine translation method that realizes domain adaptation. All Chinese-English bilingual nouns and noun phrases are organized, according to an existing knowledge system, into a computer-readable knowledge system tree, so that every bilingual noun and noun phrase has a corresponding knowledge tree level. The sum of the field weights at each domain location point is then calculated, the sums are compared to find the domain location point with the highest sum, and, within that knowledge domain, the corresponding translation is determined from a noun dictionary. By imitating the knowledge framework of the human brain, the method lets a computer learn how a human reader works out the field a text relates to, so that the computer can identify the field of a passage. This gives machine translation a domain-adaptive capability and thereby improves translation accuracy.

Description

A statistical machine translation method for realizing domain adaptation
Technical field
The invention belongs to the technical field of statistical machine translation and, specifically, relates to a statistical machine translation method that realizes domain adaptation.
Background
Statistical machine translation is currently the most widely used form of machine translation. It trains a translation engine on very large parallel and monolingual corpora. The system finds statistical correlations between source texts and their translations and then, for a source-language sentence, searches for the translation with the highest probability. The translation engine itself has no notion of rules or grammar.
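The search for the most probable translation is usually written, in the standard textbook formulation (which this patent does not spell out), as the noisy-channel decomposition below. Here f is the source sentence, e a candidate translation, P(f | e) a translation model estimated from parallel text, and P(e) a language model estimated from monolingual text; the symbols are the conventional ones and are not taken from the patent.

```latex
\hat{e} \;=\; \arg\max_{e} P(e \mid f) \;=\; \arg\max_{e} P(f \mid e)\, P(e)
```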
The main drawback of statistical machine translation is that translation quality is poor when the training corpus contains no text similar to the text being translated. For example, a translation engine trained on technical text performs very poorly on colloquial text. The engine therefore has to be trained continually on text similar to the material to be translated. Even with a very large and suitable training corpus, statistical machine translation generally cannot produce text of publishable quality. It also tends to translate the source text without regard to context, so it fails to capture the surrounding context and the relevance of the professional field.
The difficulty for statistical machine translation lies in domain transfer and adaptation. The data used to train a machine translation system may come from a broad range of fields. When the system meets uncommon words or sentence patterns from a specific field, it is hard to transfer quickly and still produce a high-quality translation, because little corpus data is available for such fields and the knowledge available for transfer is insufficient. Several well-known online translation systems currently handle news translation adequately (news corpora being the most plentiful), but their adaptive ability is much weaker in fields such as banking and law, where corpora are scarce.
Summary of the invention
To address the above deficiencies of the prior art, the invention provides a statistical machine translation method that realizes domain adaptation. The method computes the relevant field from a contextual knowledge tree and selects, for each noun, the translation that belongs to that field, which strengthens the adaptive ability of the translation method and improves translation accuracy.
To achieve the above object, the invention adopts the following solution: a statistical machine translation method that realizes domain adaptation, comprising the following steps:
A. Organize all Chinese-English bilingual nouns and noun phrases, according to an existing knowledge system, into a computer-readable knowledge system tree. The tree comprises several levels that are arranged in order and progressively subdivided, numbered from 1 up to n. The bilingual nouns and noun phrases are divided into general nouns and industry nouns: general nouns belong to level 1, and industry nouns are subdivided by field, level by level, starting from level 2.
The knowledge system tree comprises domain names, ordered from broad to narrow, together with all the nouns and noun phrases of each field. The nouns and noun phrases of a field are placed in the level directly below its domain name, and each domain name forms a domain location point. In this way every Chinese-English bilingual noun and noun phrase obtains a corresponding knowledge tree level.
B. Build a computer-readable database of the Chinese-English bilingual nouns and noun phrases, laid out as follows:
Chinese | English | Knowledge tree level (level) | Field weight (weight)
        |         | n                            | n+k
In this way every Chinese-English bilingual noun and noun phrase obtains a corresponding field weight.
C. Calculate the sum of the field weights at each domain location point.
D. Compare the field weight sums of the domain location points to find the domain location point with the highest sum; this is the knowledge domain to which the passage relates.
E. Within that knowledge domain, determine the corresponding translations according to the noun dictionary.
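Steps C to E can be sketched in Python as follows. This is only an illustration under stated assumptions: the row layout (`TermEntry`), the function names (`pick_domain`, `translate_nouns`), the domain names, and the sample nouns are all invented for the sketch and do not come from the patent. The stored weights use n + k with the preferred value k = -0.5, and the polysemous noun is split over two domain location points according to the (n+k)/x rule.

```python
from collections import defaultdict
from typing import NamedTuple


class TermEntry(NamedTuple):
    """One row of a hypothetical bilingual noun database (step B)."""
    chinese: str
    english: str
    domain: str    # domain location point the noun is filed under
    level: int     # knowledge tree level n
    weight: float  # field weight stored for this row


# Toy rows invented for illustration; weights use n + k with k = -0.5.
# The polysemous noun 程序 is filed under two domain points, so its
# weight (2 - 0.5) is split in two, following the (n + k) / x rule.
DATABASE = [
    TermEntry("编译器", "compiler", "computing", 3, 2.5),
    TermEntry("程序", "program", "computing", 2, 0.75),
    TermEntry("程序", "procedure", "law", 2, 0.75),
    TermEntry("合同", "contract", "law", 3, 2.5),
]


def pick_domain(found_nouns, database):
    """Steps C and D: sum field weights per domain point, take the highest."""
    totals = defaultdict(float)
    for row in database:
        if row.chinese in found_nouns:
            totals[row.domain] += row.weight
    if not totals:
        return None, {}
    return max(totals, key=totals.get), dict(totals)


def translate_nouns(found_nouns, database):
    """Step E: look up each noun's translation inside the chosen domain."""
    domain, _ = pick_domain(found_nouns, database)
    return {
        row.chinese: row.english
        for row in database
        if row.domain == domain and row.chinese in found_nouns
    }


nouns_in_passage = {"程序", "编译器"}
print(pick_domain(nouns_in_passage, DATABASE))
print(translate_nouns(nouns_in_passage, DATABASE))
```

In this toy passage the "computing" point collects 2.5 + 0.75 while "law" collects only 0.75, so 程序 is rendered as "program" rather than "procedure". The patent does not say what to do with a noun that has no entry under the winning domain; the sketch simply leaves such nouns out.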
Preferably, in step B, a polysemous noun is assigned to the domain location point of each of its meanings, and its field weight at each of those domain location points is (n+k)/x, where x is the number of fields to which the noun relates.
Preferably, k = -0.5.
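The weight rule can be checked with a few lines of Python. The function name `field_weight` and its parameter names are assumptions made for this sketch, and x is read here as the number of domain location points over which the noun's meanings are spread.

```python
def field_weight(level: int, num_domains: int = 1, k: float = -0.5) -> float:
    """Field weight of a noun on one domain location point: (n + k) / x."""
    if level < 1:
        raise ValueError("knowledge tree levels start at 1")
    if num_domains < 1:
        raise ValueError("a noun must belong to at least one domain")
    return (level + k) / num_domains


# A level-1 general noun: (1 - 0.5) / 1
general = field_weight(1)
# A level-4 industry noun with a single meaning: (4 - 0.5) / 1
specific = field_weight(4)
# A level-4 noun whose meanings fall in two domains: (4 - 0.5) / 2 on each
ambiguous = field_weight(4, num_domains=2)

print(general, specific, ambiguous)
```

With k = -0.5 a general noun contributes only 0.5, a deeply nested industry noun contributes much more, and an ambiguous noun contributes less to any one field than an unambiguous noun at the same level.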
The beneficial effect of the invention is that, by imitating the knowledge framework of the human brain, the method lets a computer learn how a human reader works out the field a text relates to. The computer can thus identify the field of a passage, which gives machine translation a domain-adaptive capability and improves translation accuracy.
Detailed description of the embodiments
The invention is further described below.
The invention provides a statistical machine translation method that realizes domain adaptation, comprising the following steps:
A. Organize all Chinese-English bilingual nouns and noun phrases, according to an existing knowledge system, into a computer-readable knowledge system tree. The tree comprises several levels that are arranged in order and progressively subdivided, numbered from 1 up to n. The bilingual nouns and noun phrases are divided into general nouns and industry nouns: general nouns belong to level 1, and industry nouns are subdivided by field, level by level, starting from level 2. General nouns usually do not affect the field of the context, whereas subdivided industry nouns influence it strongly: the more finely subdivided the industry field a word belongs to, the more strongly it influences the field of the context.
The knowledge system tree comprises domain names, ordered from broad to narrow, together with all the nouns and noun phrases of each field. The nouns and noun phrases of a field are placed in the level directly below its domain name, and each domain name forms a domain location point. In this way every Chinese-English bilingual noun and noun phrase obtains a corresponding knowledge tree level.
B. Build a computer-readable database of the Chinese-English bilingual nouns and noun phrases, laid out as follows:
Chinese | English | Knowledge tree level (level) | Field weight (weight)
        |         | n                            | n+k
In this way every Chinese-English bilingual noun and noun phrase obtains a corresponding field weight.
A polysemous noun is assigned to the domain location point of each of its meanings, and its field weight at each of those domain location points is (n+k)/x, where x is the number of fields to which the noun relates and k = -0.5.
C. Calculate the sum of the field weights at each domain location point.
D. Compare the field weight sums of the domain location points to find the domain location point with the highest sum; this is the knowledge domain to which the passage relates.
E. Within that knowledge domain, determine the corresponding translations according to the noun dictionary.
By imitating the knowledge framework of the human brain, this statistical machine translation method lets a computer learn how a human reader works out the field a text relates to. The computer can thus identify the field of a passage, which gives machine translation a domain-adaptive capability and improves translation accuracy.
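Steps A and B of the embodiment could be prototyped along the following lines. This is a sketch under assumptions rather than the patented implementation: the nested-dictionary tree layout, the names `TREE`, `flatten`, and `build_database`, and all sample domains and nouns are invented here. The sketch assumes that the root holds the level-1 general nouns and that the nouns filed under a domain name at depth d sit at level d + 1, which is one reading of "the level directly below its domain name".

```python
from collections import Counter

K = -0.5  # value of k given in the embodiment

# Hypothetical knowledge tree: domain names nest, and "terms" lists the
# (Chinese, English) noun pairs filed directly under that domain name.
TREE = {
    "name": "general",
    "terms": [("时间", "time")],
    "children": [
        {
            "name": "computing",
            "terms": [("程序", "program")],
            "children": [
                {"name": "compilers", "terms": [("编译器", "compiler")], "children": []},
            ],
        },
        {"name": "law", "terms": [("程序", "procedure")], "children": []},
    ],
}


def flatten(node, depth=0, rows=None):
    """Step A: walk the tree and emit (chinese, english, domain, level) rows."""
    if rows is None:
        rows = []
    for zh, en in node["terms"]:
        rows.append((zh, en, node["name"], depth + 1))
    for child in node["children"]:
        flatten(child, depth + 1, rows)
    return rows


def build_database(tree, k=K):
    """Step B: attach the field weight (n + k) / x to every row."""
    rows = flatten(tree)
    # x: number of domain location points on which each noun appears
    senses = Counter(zh for zh, _, _, _ in rows)
    return [
        {"chinese": zh, "english": en, "domain": dom, "level": n,
         "weight": (n + k) / senses[zh]}
        for zh, en, dom, n in rows
    ]


for row in build_database(TREE):
    print(row)
```

Deriving the level from the depth of the tree walk keeps the level column consistent with the tree by construction, so the database never has to be edited by hand when a domain is moved or subdivided.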

Claims (3)

1. A statistical machine translation method that realizes domain adaptation, characterized by comprising the following steps:
A. Organize all Chinese-English bilingual nouns and noun phrases, according to an existing knowledge system, into a computer-readable knowledge system tree, wherein the tree comprises several levels that are arranged in order and progressively subdivided, numbered from 1 up to n; the bilingual nouns and noun phrases are divided into general nouns and industry nouns; general nouns belong to level 1, and industry nouns are subdivided by field, level by level, starting from level 2;
the knowledge system tree comprises domain names, ordered from broad to narrow, together with all the nouns and noun phrases of each field; the nouns and noun phrases of a field are placed in the level directly below its domain name, and each domain name forms a domain location point; every Chinese-English bilingual noun and noun phrase thereby obtains a corresponding knowledge tree level;
B. Build a computer-readable database of the Chinese-English bilingual nouns and noun phrases, laid out as follows:
Chinese | English | Knowledge tree level (level) | Field weight (weight)
        |         | n                            | n+k
every Chinese-English bilingual noun and noun phrase thereby obtains a corresponding field weight;
C. Calculate the sum of the field weights at each domain location point;
D. Compare the field weight sums of the domain location points to find the domain location point with the highest sum, which is the knowledge domain to which the passage relates;
E. Within that knowledge domain, determine the corresponding translations according to the noun dictionary.
2. The statistical machine translation method that realizes domain adaptation according to claim 1, characterized in that: in step B, a polysemous noun is assigned to the domain location point of each of its meanings, and its field weight at each of those domain location points is (n+k)/x, where x is the number of fields to which the noun relates.
3. The statistical machine translation method that realizes domain adaptation according to claim 1, characterized in that k = -0.5.
CN201710013628.9A 2017-01-09 2017-01-09 A statistical machine translation method for realizing domain adaptation Pending CN106874262A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710013628.9A CN106874262A (en) 2017-01-09 2017-01-09 A statistical machine translation method for realizing domain adaptation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710013628.9A CN106874262A (en) 2017-01-09 2017-01-09 A statistical machine translation method for realizing domain adaptation

Publications (1)

Publication Number Publication Date
CN106874262A (en) 2017-06-20

Family

ID=59164837

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710013628.9A Pending CN106874262A (en) 2017-01-09 2017-01-09 A statistical machine translation method for realizing domain adaptation

Country Status (1)

Country Link
CN (1) CN106874262A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101131691A (en) * 2006-08-25 2008-02-27 韩国电子通信研究院 Domain-adaptive portable machine translation device for translating closed captions using dynamic translation resources and method thereof
CN103631773A (en) * 2013-12-16 2014-03-12 哈尔滨工业大学 Statistical machine translation method based on field similarity measurement method
CN104090870A (en) * 2014-06-26 2014-10-08 武汉传神信息技术有限公司 Pushing method of online translation engines
JP2016045751A (en) * 2014-08-25 2016-04-04 日本電気株式会社 Machine translation device, machine translation method, machine translation program, and recording medium
CN105550174A (en) * 2015-12-30 2016-05-04 哈尔滨工业大学 Adaptive method of automatic machine translation field on the basis of sample importance

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107632982A (en) * 2017-09-12 2018-01-26 郑州科技学院 The method and apparatus of voice controlled foreign language translation device
CN107861953A (en) * 2017-10-19 2018-03-30 聊城大学 A kind of title automatic translation system and method
CN107861953B (en) * 2017-10-19 2020-12-11 聊城大学 Automatic name translation system and method
CN108563643A (en) * 2018-03-27 2018-09-21 常熟鑫沐奇宝软件开发有限公司 A kind of polysemy interpretation method based on artificial intelligence knowledge mapping
CN108563643B (en) * 2018-03-27 2021-10-01 常熟鑫沐奇宝软件开发有限公司 Artificial intelligence knowledge graph-based word polysemous translation method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20170620)