CN106874262A

CN106874262A - A statistical machine translation method for realizing domain adaptation

Info

Publication number: CN106874262A
Application number: CN201710013628.9A
Authority: CN
Inventors: 梁如昕
Original assignee: Chengdu Jiayin Multilingual Information Technology Co Ltd
Current assignee: Chengdu Jiayin Multilingual Information Technology Co Ltd
Priority date: 2017-01-09
Filing date: 2017-01-09
Publication date: 2017-06-20

Abstract

The invention discloses a statistical machine translation method for realizing domain adaptation. The method arranges all Chinese-English bilingual nouns and noun phrases according to an existing knowledge system and builds a computer-recognizable knowledge-system tree diagram, so that every bilingual noun and noun phrase has a corresponding knowledge tree level. The sum of the domain weights at each domain location point is then calculated, and the sums are compared to find the domain location point with the highest domain weight sum. Within that knowledge domain, the corresponding translation is determined from the noun dictionary. By simulating the knowledge framework of the human brain, the method lets a computer learn how humans analyze the relevant domain of a text while reading, so that the computer can identify the domain of the text. This provides the domain-adaptive function for machine translation and improves translation accuracy.

Description

A statistical machine translation method for realizing domain adaptation

Technical field

The invention belongs to the technical field of statistical machine translation and, specifically, relates to a statistical machine translation method for realizing domain adaptation.

Background art

Statistical machine translation is currently the most widely used form of machine translation. It works by training a translation engine on very large parallel and monolingual corpora. The system finds the statistical correlations between source texts and their translations and then, for a source-language sentence, searches for the translation with the highest probability. The translation engine itself has no concept of rules or grammar.

The main drawback of statistical machine translation is that when the text to be translated has no similar data in the training corpus, the resulting translation is poor. For example, a translation engine trained on technical text performs very badly when translating colloquial text. The engine therefore has to be trained continually on text similar to the material to be translated. Even with a very large, suitable training corpus, statistical machine translation generally cannot produce text of publishable quality. It also often translates the source text without regard to context, so the translation lacks relevance to the surrounding context and to the professional domain.

The difficulty in statistical machine translation lies in domain transfer and adaptation. The original data used to train a machine translation system may come from a wide range of domains. When the system meets the uncommon words and sentence patterns of a specific domain, transferring quickly enough to obtain a high-quality translation is quite difficult, because little corpus data is available for such domains and the knowledge available during transfer is insufficient. Several well-known online translation systems currently cope with news translation (because news corpora are the most plentiful), but their adaptive ability is much weaker in domains where corpora are scarce, such as banking and law.

Summary of the invention

In view of the above deficiencies in the prior art, the present invention provides a statistical machine translation method for realizing domain adaptation. The method uses the knowledge tree of the context to calculate the relevant domain and so selects, for each noun, the translation that corresponds to that domain, which strengthens the adaptive ability of the translation method and improves translation accuracy.

To achieve the above object, the present invention adopts the following solution: a statistical machine translation method for realizing domain adaptation, comprising the following steps:

A. Arrange all Chinese-English bilingual nouns and noun phrases according to an existing knowledge system and build a computer-recognizable knowledge-system tree diagram. The tree diagram comprises several levels that are arranged in order and subdivided level by level, numbered from 1 up to n. The bilingual nouns and noun phrases are divided into general nouns and industry nouns: general nouns belong to level 1, and industry nouns are subdivided by domain, level by level, starting from level 2.

The knowledge-system tree diagram contains the domain names, ordered from broad to narrow, together with all the nouns and noun phrases in each domain. The nouns and noun phrases of a domain are placed in the level directly below the domain name, and each domain name forms a domain location point. Every Chinese-English bilingual noun and noun phrase thus has a corresponding knowledge tree level.
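One way to picture the tree of step A is the following minimal Python sketch. It is an illustration under stated assumptions rather than the structure used by the invention: the class name `Domain`, the example domains ("general", "computing", "hardware", "biology") and the example terms are hypothetical, and the sketch assumes that the `level` stored on a domain is the level assigned to the terms it holds.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Domain:
    """A domain name (domain location point) and the terms filed under it.

    Assumption: `level` is the knowledge tree level of the terms held here,
    with level 1 reserved for general nouns.
    """
    name: str
    level: int
    terms: list[str] = field(default_factory=list)
    subdomains: list[Domain] = field(default_factory=list)

    def add_subdomain(self, name: str, terms=()) -> Domain:
        """Create a finer domain one level below this one and return it."""
        child = Domain(name, self.level + 1, list(terms))
        self.subdomains.append(child)
        return child

    def walk(self):
        """Yield this domain and every domain below it, depth first."""
        yield self
        for sub in self.subdomains:
            yield from sub.walk()


# Hypothetical example data: level 1 holds general nouns, deeper levels hold
# industry nouns of successively finer domains.
root = Domain("general", 1, ["time", "person"])
computing = root.add_subdomain("computing", ["software", "mouse"])
hardware = computing.add_subdomain("hardware", ["motherboard"])
biology = root.add_subdomain("biology", ["cell", "mouse"])

# Map each term to the (domain, level) pairs where it appears.
term_levels: dict[str, list[tuple[str, int]]] = {}
for domain in root.walk():
    for term in domain.terms:
        term_levels.setdefault(term, []).append((domain.name, domain.level))
```

In this sketch a term that appears under more than one domain, such as the hypothetical "mouse", simply receives more than one (domain, level) pair.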

B. Build a computer-recognizable database of Chinese-English bilingual nouns and noun phrases, structured as follows:

Chinese	English	Knowledge tree level (level)	Domain weight (weight)
(Chinese term)	(English term)	n	n + k

Every Chinese-English bilingual noun and noun phrase thus has a corresponding domain weight.
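A database row of this kind could be represented as in the following Python sketch. The record name `TermEntry`, the extra `domain` field and the sample rows are hypothetical and only serve the illustration; the weight is computed as n + k from the knowledge tree level n, with k left as a parameter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TermEntry:
    """One row of the bilingual noun database (hypothetical layout)."""
    chinese: str
    english: str
    level: int   # knowledge tree level n
    domain: str  # assumed extra column: the domain location point of the term

    def weight(self, k: float) -> float:
        """Weight of the term, n + k."""
        return self.level + k


# Hypothetical sample rows.
entries = [
    TermEntry("时间", "time", 1, "general"),
    TermEntry("软件", "software", 2, "computing"),
    TermEntry("主板", "motherboard", 3, "hardware"),
]

# With k = -0.5 the weights grow with the level: 0.5, 1.5, 2.5.
weights = [entry.weight(k=-0.5) for entry in entries]
```

Because the weight increases with n, terms filed under more finely subdivided domains contribute more to the sum of their domain location point.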

C. Calculate the sum of the domain weights at each domain location point.

D. Compare the domain weight sums of the domain location points to find the domain location point with the highest sum, which is the knowledge domain relevant to the passage.

E. Within that knowledge domain, determine the corresponding translation according to the noun dictionary.
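Steps C to E can be sketched in Python as follows. This is only an illustration under stated assumptions, not the patented implementation: the lexicon rows, the domain names, the weight values, the English-to-Chinese direction and the function names `domain_sums`, `pick_domain` and `translate` are all hypothetical, and the text does not say how a tie between two domains should be resolved.

```python
from collections import defaultdict

# Hypothetical lexicon rows: (source noun, domain, weight, translation).
LEXICON = [
    ("mouse", "computing", 0.75, "鼠标"),
    ("mouse", "biology", 0.75, "老鼠"),
    ("keyboard", "computing", 1.5, "键盘"),
    ("software", "computing", 1.5, "软件"),
    ("cell", "biology", 1.5, "细胞"),
]


def domain_sums(nouns, lexicon):
    """Step C: add each noun's weight to every domain it is filed under."""
    sums = defaultdict(float)
    for noun in nouns:
        for source, domain, weight, _ in lexicon:
            if source == noun:
                sums[domain] += weight
    return dict(sums)


def pick_domain(sums):
    """Step D: return the domain with the highest weight sum.

    On a tie, max() keeps the first domain encountered; this tie rule is an
    assumption of the sketch.
    """
    return max(sums, key=sums.get)


def translate(noun, domain, lexicon):
    """Step E: look up the noun's translation within the chosen domain."""
    for source, entry_domain, _, target in lexicon:
        if source == noun and entry_domain == domain:
            return target
    return None


passage_nouns = ["mouse", "keyboard", "software"]
sums = domain_sums(passage_nouns, LEXICON)
best = pick_domain(sums)
translation = translate("mouse", best, LEXICON)
```

Here the ambiguous noun "mouse" contributes to both hypothetical domains, but the surrounding nouns push the "computing" sum higher, so the computing translation is selected.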

Preferably, in step B, a polysemous noun is distributed over the domain location points to which its different meanings belong, and its domain weight at each of those domain location points is (n+k)/x, where x is the number of relevant domains the noun involves.

Preferably, k = -0.5.
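As a worked example with the preferred value k = -0.5, consider a hypothetical noun at level n = 3. The numbers are chosen only for illustration. If the noun has a single meaning it keeps the full weight, and if its meanings fall in x = 2 domains the weight is split between them:

```latex
w_{\text{single}} = n + k = 3 + (-0.5) = 2.5
\qquad
w_{\text{polysemous}} = \frac{n + k}{x} = \frac{3 + (-0.5)}{2} = 1.25
```

The weights of the polysemous noun over its x domains still add up to n + k, so an ambiguous noun does not count for more in total than an unambiguous noun of the same level.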

The beneficial effects of the invention are as follows: by simulating the knowledge framework of the human brain, the method lets a computer learn how humans analyze the relevant domain of a text while reading, so that the computer can identify the domain of the text, which provides the domain-adaptive function for machine translation and improves translation accuracy.

Specific embodiment

The invention is further described below.

The present invention provides a statistical machine translation method for realizing domain adaptation, comprising the following steps:

A. Arrange all Chinese-English bilingual nouns and noun phrases according to an existing knowledge system and build a computer-recognizable knowledge-system tree diagram. The tree diagram comprises several levels that are arranged in order and subdivided level by level, numbered from 1 up to n. The bilingual nouns and noun phrases are divided into general nouns and industry nouns: general nouns belong to level 1, and industry nouns are subdivided by domain, level by level, starting from level 2. General nouns usually do not affect the domain of the context, whereas subdivided industry nouns have a stronger influence on it; the more finely subdivided the industry domain a word belongs to, the greater its influence on the domain of the context.

Chinese	English	Knowledge tree level (level)	Domain weight (weight)
(Chinese term)	(English term)	n	n + k

A polysemous noun is distributed over the domain location points to which its different meanings belong, and its domain weight at each of those domain location points is (n+k)/x, where x is the number of relevant domains the noun involves and k = -0.5.
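The weight split for a polysemous noun can be sketched in Python as below. The function name `polysemous_weights` and the domain names are hypothetical, and the sketch assumes, as the single n in the formula suggests, that the noun sits at the same level n in each of its domains.

```python
def polysemous_weights(level, domains, k=-0.5):
    """Return the weight (n + k) / x for each of the x domains of a noun.

    `level` is the knowledge tree level n and `domains` lists the domain
    location points the noun's meanings belong to (illustrative names only).
    """
    x = len(domains)
    if x == 0:
        raise ValueError("a noun must belong to at least one domain")
    return {domain: (level + k) / x for domain in domains}


# A level-2 noun with two meanings gets (2 - 0.5) / 2 in each domain.
split = polysemous_weights(2, ["computing", "biology"])

# A noun with a single meaning keeps the full weight n + k.
single = polysemous_weights(3, ["hardware"])
```

With x = 1 the formula reduces to n + k, so the same function covers unambiguous nouns as well.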

C. Calculate the sum of the domain weights at each domain location point.

By simulating the knowledge framework of the human brain, this statistical machine translation method lets a computer learn how humans analyze the relevant domain of a text while reading, so that the computer can identify the domain of the text, which provides the domain-adaptive function for machine translation and improves translation accuracy.

Claims

1. A statistical machine translation method for realizing domain adaptation, characterized by comprising the following steps:

A. Arrange all Chinese-English bilingual nouns and noun phrases according to an existing knowledge system and build a computer-recognizable knowledge-system tree diagram, the tree diagram comprising several levels that are arranged in order and subdivided level by level, numbered from 1 up to n; the bilingual nouns and noun phrases are divided into general nouns and industry nouns, general nouns belonging to level 1 and industry nouns being subdivided by domain, level by level, starting from level 2;

the knowledge-system tree diagram contains the domain names, ordered from broad to narrow, together with all the nouns and noun phrases in each domain; the nouns and noun phrases of a domain are placed in the level directly below the domain name, and each domain name forms a domain location point; every Chinese-English bilingual noun and noun phrase thus has a corresponding knowledge tree level;

Chinese	English	Knowledge tree level (level)	Domain weight (weight)
(Chinese term)	(English term)	n	n + k

C. Calculate the sum of the domain weights at each domain location point;

D. Compare the domain weight sums of the domain location points to find the domain location point with the highest sum, which is the knowledge domain relevant to the passage.

2. The statistical machine translation method for realizing domain adaptation according to claim 1, characterized in that: in step B, a polysemous noun is distributed over the domain location points to which its different meanings belong, and its domain weight at each of those domain location points is (n+k)/x, where x is the number of relevant domains the noun involves.

3. The statistical machine translation method for realizing domain adaptation according to claim 1, characterized in that: k = -0.5.