JP3388393B2

JP3388393B2 - Translation device for tense, aspect or modality using database

Info

Publication number: JP3388393B2
Application number: JP23857999A
Authority: JP
Inventors: 真樹村田; 清貴内元; 青馬; 均井佐原
Original assignee: 独立行政法人通信総合研究所
Priority date: 1999-08-25
Filing date: 1999-08-25
Publication date: 2003-03-17
Anticipated expiration: 2019-08-25
Also published as: JP2001067357A

Description

Detailed Description of the Invention

[0001]

[Technical Field of the Invention] The present invention relates to a translation device that uses a database. In particular, it is used to translate the tense, aspect, or modality of sentence-final expressions, which poses a problem when translating from a language in which tense and the like are expressed at the end of the sentence, such as Japanese, into another language, such as English.

[0002]

[Prior Art] Three examples of prior art are presented below. The first prior art is a conventional translation method in a translation device using a database. The second prior art is a conventional translation method in a translation device using an example-based database. The third prior art is a conventional method that uses the number of matching characters counted from the end of the sentence as a way of measuring the similarity between examples.

[0003] First, the first prior art, a translation method of a translation device using a database, is shown in the flowchart of FIG. 2. The flowchart of FIG. 2 shows the following four-stage procedure.

[0004] Conventionally, Japanese-to-English translation of sentence-final expressions has been performed by manually created rules. For this reason, the following work first had to be carried out.

[0005] 1) Before analysis, a rule set is created manually in advance. For example, one rule states that the continuative form followed by the verb 「いる」 (iru) gives the aspect "progressive". Such rules are written for the other combinations as well, and together they form the rule set. Rule sets are likewise created for tense, aspect, and modality. When a rule set is created manually in this way, deficiencies in the rules remain, and the rules must be continually maintained and refined.

[0006] Next, as the analysis work: 2) Procedure 1 of the analysis: morphological analysis and syntactic analysis of the input sentence are performed in order to translate it. For example, when the input sentence is 「希望をいだいている。」 ("[Someone] is holding hope."), the following result is obtained.
希望 <noun> を <particle> いだいて <verb> <continuative form> いる <verb>

[0007] If the morphological analysis unit or the syntactic analysis unit is changed, the above rule set is also affected, and the rule set must be revised as well in order to keep the translation appropriate.

[0008] 3) Procedure 2 of the analysis: the result of the morphological and syntactic analysis is matched against the rules to determine the tense, aspect, or modality. In the above case, the sentence-final expression has the form <continuative form> + verb 「いる」, so the rule created in advance determines it to be "progressive".
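The rule matching described for the prior art can be illustrated with a minimal sketch. The `Morpheme` type, the tag names, and the single rule below are hypothetical stand-ins for the output of a morphological analyzer and for one entry of a hand-written rule set; they are not taken from any particular system.

```python
from typing import List, NamedTuple, Optional


class Morpheme(NamedTuple):
    surface: str                 # surface string of the morpheme
    pos: str                     # part of speech (hypothetical tag names)
    form: Optional[str] = None   # conjugation form, if any


def progressive_rule(morphemes: List[Morpheme]) -> Optional[str]:
    """Return "progressive" when the sentence ends in <continuative verb> + verb いる."""
    if len(morphemes) < 2:
        return None
    prev, last = morphemes[-2], morphemes[-1]
    if (prev.pos == "verb" and prev.form == "continuative"
            and last.pos == "verb" and last.surface == "いる"):
        return "progressive"
    return None  # no rule fired; a real rule set would try further rules here


# Analysis result for 「希望をいだいている」, following the tagging shown in the text.
analysis = [
    Morpheme("希望", "noun"),
    Morpheme("を", "particle"),
    Morpheme("いだいて", "verb", "continuative"),
    Morpheme("いる", "verb"),
]
aspect = progressive_rule(analysis)
```

Because the rule depends directly on the analyzer's tags, any change to the analyzer's output format would require the rule to be revised, which is the maintenance burden the text describes.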

[0009] Next, synthesis is performed to compose the whole sentence, as follows. 4) Procedure 3 of the analysis: the portions other than the tense, aspect, or modality may be translated by any of the well-known conventional translation methods. The tense, aspect, or modality is translated by the method above, and the two results are combined to complete the translation of the whole sentence.

[0010] The method shown here, which translates by manually created rules, has the drawback that a large amount of human resources must be invested in maintaining the rules.

[0011] Next, the second prior art is described: an earlier example of a translation method in a translation device using an example-based database.

[0012] The method of the present invention also uses a database of collected examples and is therefore classified as an example-based method. A prior example of applying an example-based method to Japanese-to-English translation is the report by Eiichiro Sumita, Hitoshi Iida, and Hideo Kohyama, "Translating examples: A new approach to machine translation", The Third International Conference on Theoretical and Methodological Issues in Machine Translation of Natural Language, No. 3 (TMI, 1990), pp. 203-212. However, that work addressed the noun phrase 「ＡのＢ」 ("A no B"), and its Japanese-to-English translation of 「ＡのＢ」 used the semantic information of noun A and noun B in complex combinations. That study differs from the present invention in the following respects.

[0013] 1) The structure of the database used, which corresponds to the first database of claim 1; 2) the method of evaluating the similarity between examples; 3) the position of interest within a single example.

[0014] Finally, the third prior art is described: a conventional method that uses the number of matching characters counted from the end of the sentence as a way of measuring the similarity between examples.

[0015] The present invention also uses the number of matching characters counted from the end of the sentence as the value that evaluates similarity (the similarity score), but this method itself has a precedent. A study on completing elided sentence-final expressions (Masaki Murata and Makoto Nagao, "Completion of verb ellipsis in Japanese sentences using surface expressions and examples", Journal of the Association for Natural Language Processing, Vol. 5, No. 1 (1998)) uses an example-based approach in which the number of characters in the matching sentence-final string is the similarity.

[0016] That prior example differs from the present invention in the following respects. 1) In the prior example, the target problem is the completion of elided expressions, not translation between different languages as in the present invention. 2) The structure of the database used, which corresponds to the first database of claim 1, differs in its languages and in its fields.

[0017] The above prior art coincides only partially with the method of the present invention, and it is clear that the translation method of the present invention could not easily be arrived at merely by combining these prior arts.

[0018]

[Problem to Be Solved by the Invention] In the translation method of a conventional translation device using a database, sentence-final expressions have been translated by manually created rules. However, translation by manually created rules has the drawback that a large amount of human resources must be invested in maintaining the rules in order to improve translation accuracy.

[0019] The present invention has been proposed in view of the above. Its object is to provide a translation device using a database that requires no manual creation of a rule set and that allows translation accuracy to be improved even without knowledge of database-based translation.

[0020]

[Means for Solving the Problem] The means used in the present invention to achieve the above object are shown as a flowchart in FIG. 1. The method according to the present invention is described briefly below.

[0021] 1) Before analysis, a database is created in advance that collects examples in the first language together with the corresponding examples in the second language. At this time, each second-language example is annotated with its tense, aspect, or modality category. This annotation may be done by hand, or a well-known morphological and syntactic analysis system may be used as an aid.

[0022] 2) Procedure 1 of the analysis, retrieval for translating the input sentence: the example whose matching string counted from the end of the sentence is longest is retrieved from the above database. The well-known binary search method can be used for the retrieval.
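One way the longest sentence-final match could be retrieved by binary search is sketched below. The text names binary search but does not describe the index, so the reversed-string index, the function names, and the three sample sentences are assumptions made for illustration. Reversing each sentence turns a shared suffix into a shared prefix, so the best match in the sorted list sits next to the insertion point of the reversed query.

```python
from bisect import bisect_left


def common_suffix_len(a: str, b: str) -> int:
    """Number of characters that match when a and b are compared from the end."""
    n = 0
    for x, y in zip(reversed(a), reversed(b)):
        if x != y:
            break
        n += 1
    return n


def build_index(sentences):
    """Sort reversed sentences so that a shared suffix becomes a shared prefix."""
    pairs = sorted((s[::-1], s) for s in sentences)
    return [k for k, _ in pairs], [s for _, s in pairs]


def longest_suffix_match(index, query):
    """Return (sentence, similarity) for the entry with the longest common suffix."""
    keys, originals = index
    pos = bisect_left(keys, query[::-1])  # binary search over the sorted keys
    best, best_len = None, -1
    # The longest prefix match in a sorted list is adjacent to the insertion point.
    for i in (pos - 1, pos):
        if 0 <= i < len(originals):
            n = common_suffix_len(originals[i], query)
            if n > best_len:
                best, best_len = originals[i], n
    return best, best_len


# Hypothetical source-side examples; not taken from the patent's database.
index = build_index(["彼は学生だ", "私は走っている", "これは私の知り合いだ"])
match, similarity = longest_suffix_match(index, "彼は私の知り合いだ")
```

The index is built once, so each query costs one binary search plus two suffix comparisons rather than a scan of the whole database.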

[0023] 3) Procedure 2 of the analysis, determination of the translation: the tense, aspect, or modality category of the verb part on the English side of the example retrieved in Procedure 1 is fixed as the tense, aspect, or modality of the input sentence.

[0024] 4) Procedure 3 of the analysis, composition of the translated sentence: the portions other than the tense, aspect, or modality may be translated by any of the conventional translation methods. The tense, aspect, or modality is translated by the method above, and the two results are combined to complete the translation of the whole sentence.

[0025] Accordingly, to achieve the above object, the invention of claim 1 is a translation device concerning tense, aspect, or modality that uses a database from a first language to a second language. It comprises a first database that consists of a plurality of examples belonging to the first language and a plurality of examples belonging to the second language, in which each example belonging to the first language has at least one correspondence with an example belonging to the second language, and to which tense, aspect, or modality information is added. It further comprises means for deriving a value that evaluates the similarity between a first example belonging to the first language and a second example belonging to the first language in the first database, using the number of consecutive common characters as seen from the end of the sentence. Said means (1) selects from the first database, by a first method, a predetermined number of second examples belonging to the first language for the first example belonging to the first language, in descending order of similarity, a higher evaluated value being taken to mean higher similarity; (2) selects from the first database a first group of examples belonging to the second language, through the correspondences from the examples belonging to the first language to the examples belonging to the second language; (3) determines, by a second method, the tense, aspect, or modality that represents the selected first group of examples belonging to the second language, namely by majority vote over the tense, aspect, or modality of the individual examples; and (4) uses the determined tense, aspect, or modality as the tense, aspect, or modality of the translation of the second example belonging to the first language. The invention thus proposes an example-based way of translating tense, aspect, or modality appropriately.

[0026] The invention of claim 2 is likewise a translation device concerning tense, aspect, or modality that uses a database. Its similarity evaluation is characterized as follows. Morphological analysis is performed to recognize the morphemes, a thesaurus classification number is assigned to each morpheme, and the first example belonging to the first language is thereby converted into a sequence of thesaurus classification numbers. The device comprises means for deriving the value that evaluates the similarity with a second example belonging to the first language by using the number of consecutive common characters, as seen from the end of the sentence, between this converted first example and the second example belonging to the first language in the first database, converted in the same way.

[0027]

[Embodiments of the Invention] Embodiments of the present invention are described in detail below. The first embodiment is described first, using Table 1.

[0028] Consider translating the tense of 「彼は私の知り合いだ」 ("He is my acquaintance"). Suppose that, from a database containing a large number of Japanese-English translation pairs, the ten examples that share the longest string with 「彼は私の知り合いだ」 counted from the end of the sentence are those listed in Table 1.

[0029]

[Table 1]

[0030] The similarity in the table is the number of matching characters counted from the end of the sentence. The k-nearest-neighbor method is used here. Instead of using the single most similar example, the k-nearest-neighbor method takes the k examples with the highest similarity and decides by majority vote among them.

[0031] When several examples have equal similarity, all examples with that similarity must be included in the majority vote regardless of the value of k. In addition, to keep the processing simple, at most ten examples are examined here.
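The selection and voting rules above can be sketched as follows. This is an illustrative reading rather than the device's actual implementation: the function names are invented, ties in the vote are broken in favor of the category that appears first (the rule the k = 1 walk-through adopts), and the similarity values in `table` are hypothetical numbers chosen only to reproduce the tie pattern the text describes for Table 1, whose actual contents are not reproduced here.

```python
from collections import Counter


def select_neighbors(ranked, k, cap=10):
    """Pick the k most similar examples, extended to include every tie.

    ranked: list of (similarity, category), sorted by descending similarity.
    cap:    at most this many examples are examined at all.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    pool = ranked[:cap]
    if k >= len(pool):
        return pool
    threshold = pool[k - 1][0]  # similarity of the k-th example
    return [ex for ex in pool if ex[0] >= threshold]


def majority_category(neighbors):
    """Majority vote; on a tie, the category listed first wins."""
    counts = Counter(cat for _, cat in neighbors)
    top = max(counts.values())
    for _, cat in neighbors:
        if counts[cat] == top:
            return cat


# Hypothetical similarities: examples 1-2 tie, examples 3-10 tie at a lower value.
table = [(5, "present perfect"), (5, "present"), (2, "present perfect")] \
        + [(2, "present")] * 7

answers = {k: majority_category(select_neighbors(table, k)) for k in (1, 3, 5)}
```

With this data, k = 1 pulls in both tied top examples and the 1-to-1 split resolves to the first-listed category, while k = 3 or more pulls in all ten examples and the larger group wins.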

[0032] The category column of Table 1 is derived from the corresponding verb phrase of the English sentence. This may be done automatically with a well-known processing program, or the categories may be entered by hand in advance when the database is created.

[0033] First, consider k = 1. The single most similar example would be used for the analysis, but because examples 1 and 2 have the same similarity, both are used. The majority vote then splits, with one vote for "present perfect" and one for "present". If the rule is that, on a split, the category that appears first is taken as the answer, the answer becomes "present perfect", which appears first, and this is incorrect.

[0034] Next, consider k = 3. The three most similar examples would be selected, but because example 3 and all later examples have equal similarity, all ten examples are used. The vote splits into two for "present perfect" and eight for "present", so the answer is the majority, "present", which matches the correct answer "present".

[0035] For k = 5, 7, and 9, all ten examples are likewise used and the answer is "present", which is also correct.

[0036] For this problem, the system outputs an incorrect answer when k = 1 and the correct answer when k = 3, 5, 7, or 9. An appropriate value of k should be chosen when the device is actually built. For the sake of the majority vote, k is normally preferably odd, and 3 or 5 is often sufficient. As the number of examples in the database grows, a smaller value of k can be used.

[0037] Next, the second embodiment is described using Table 2. Again consider translating the tense of 「彼は私の知り合いだ」 ("He is my acquaintance"). Suppose that, from a database containing a large number of Japanese-English translation pairs, the ten examples that share the longest string with 「彼は私の知り合いだ」 counted from the end of the sentence are those listed in Table 2 below.

[0038]

[Table 2]

[0039] The similarity in Table 2 is obtained as follows. Morphological analysis is applied to the input sentence to recognize its morphemes, a thesaurus classification number is assigned to each morpheme, and the input sentence is thereby converted into a sequence of thesaurus classification numbers. The database containing a large number of Japanese-English translation pairs is converted in the same way. The similarity is the number of consecutive common characters, as seen from the end of the sentence, between the converted sentences.
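A minimal sketch of this conversion follows. It assumes the sentences have already been segmented into morphemes by some analyzer, and it uses an invented thesaurus: the codes `C10`, `C11`, and `C21` are placeholders, not classification numbers from any real thesaurus. It also counts matches per classification code, which is one possible reading; the text could equally mean counting the characters of the converted string.

```python
def to_class_codes(morphemes, thesaurus):
    """Replace each morpheme with its thesaurus class code; keep it as-is if unlisted."""
    return [thesaurus.get(m, m) for m in morphemes]


def suffix_similarity(a, b):
    """Number of items that match when sequences a and b are compared from the end."""
    n = 0
    for x, y in zip(reversed(a), reversed(b)):
        if x != y:
            break
        n += 1
    return n


# Hypothetical thesaurus: words with related meanings share a class code.
thesaurus = {"彼": "C10", "彼女": "C10", "私": "C11", "知り合い": "C21", "友人": "C21"}

query = ["彼", "は", "私", "の", "知り合い", "だ"]    # pre-segmented input sentence
example = ["彼女", "は", "私", "の", "友人", "だ"]   # pre-segmented database example

surface_score = suffix_similarity(query, example)
coded_score = suffix_similarity(to_class_codes(query, thesaurus),
                                to_class_codes(example, thesaurus))
```

On the surface forms the two sentences agree only on the final 「だ」, whereas after conversion every position matches, which illustrates how the second embodiment lets semantically close examples score as similar.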

[0040] As in the first embodiment, the k-nearest-neighbor method is used for the analysis.

[0041] First, consider k = 1. Only example 1, which has the highest similarity, is used for the analysis. Its category is "present perfect", which differs from the correct category "present", so the answer is incorrect.

[0042] Next, consider k = 3. The three most similar examples would be selected, but because examples 3 and 4 have equal similarity, the four examples up to example 4 are used. The vote splits into two for "present perfect" and two for "present", so the answer becomes "present perfect", which appears first, and this is also incorrect.

[0043] Next, consider k = 5. The five most similar examples would be selected, but because example 5 and all later examples have equal similarity, all ten examples are used. The vote splits into two for "present perfect" and eight for "present", so the answer is the majority, "present", which matches the correct answer "present".

[0044] For k = 7 and 9, all ten examples are likewise used and the answer is "present", which is also correct.

[0045] For this problem, the system outputs an incorrect answer when k = 1 or 3 and the correct answer when k = 5, 7, or 9. An appropriate value of k should be chosen when the device is actually built. For the sake of the majority vote, k is normally preferably odd, and 7 or 9 is often sufficient. Here too, a smaller value of k can be used as the number of examples in the database grows.

[0046] As the above embodiments show, the method of the present invention improves translation accuracy by building up the database of collected examples. There is therefore no need to create a rule set manually, maintenance is easy, and translation accuracy can be improved even without knowledge of database-based translation.

[0047]

[Effects of the Invention] Because the present invention has the configuration described above, it provides the effects described below.

[0048] In the invention of claim 1, example-based translation becomes possible, and because no rule set has to be created manually, maintenance is easy. In addition, because similarity is defined as the number of consecutive common characters as seen from the end of the sentence, similarity can be evaluated simply, and translation accuracy can be improved even without knowledge of database-based translation.

[0049] Furthermore, in the invention of claim 2, similarity can be evaluated using semantic similarity, so that translation that is also semantically appropriate becomes possible.

[Brief description of drawings]

FIG. 1 is a flowchart showing the means for solving the problem in the present invention.

FIG. 2 is a flowchart showing the means used in the prior art.

Continuation of the front page
(72) Inventor: Hitoshi Isahara, c/o Kansai Branch, Communications Research Laboratory, Ministry of Posts and Telecommunications, 558-2 Iwaoka, Iwaoka-cho, Nishi-ku, Kobe, Hyogo
(56) References: JP-A-6-309352 (JP, A)
(58) Fields surveyed (Int. Cl.7, DB name): G06F 17/21 - 17/28; JICST file (JOIS)

Claims

(57) [Claims]

1. A translation device concerning tense, aspect, or modality that uses a database from a first language to a second language, comprising: a first database that consists of a plurality of examples belonging to the first language and a plurality of examples belonging to the second language, in which each example belonging to the first language has at least one correspondence with an example belonging to the second language, and to which tense, aspect, or modality information is added; and means for deriving a value that evaluates the similarity between a first example belonging to the first language and a second example belonging to the first language in the first database, using the number of consecutive common characters as seen from the end of the sentence; wherein said means (1) selects from the first database, by a first method, a predetermined number of second examples belonging to the first language for the first example belonging to the first language, in descending order of similarity, a higher evaluated value being taken to mean higher similarity; (2) selects from the first database a first group of examples belonging to the second language, through the correspondences from the examples belonging to the first language to the examples belonging to the second language; (3) determines, by a second method, the tense, aspect, or modality that represents the selected first group of examples belonging to the second language, namely by majority vote over the tense, aspect, or modality of the individual examples; and (4) uses the determined tense, aspect, or modality as the tense, aspect, or modality of the translation of the second example belonging to the first language.

2. A translation device concerning tense, aspect, or modality that uses a database from a first language to a second language, comprising: a first database that consists of a plurality of examples belonging to the first language and a plurality of examples belonging to the second language, in which each example belonging to the first language has at least one correspondence with an example belonging to the second language, and to which tense, aspect, or modality information is added; and means for deriving a value that evaluates the similarity with a second example belonging to the first language, by performing morphological analysis to recognize morphemes, assigning a thesaurus classification number to each morpheme, and using the number of consecutive common characters, as seen from the end of the sentence, between a first example belonging to the first language that has been converted into a sequence of thesaurus classification numbers and the second example belonging to the first language in the first database that has been converted in the same way; wherein said means (1) selects from the first database, by a first method, a predetermined number of second examples belonging to the first language for the first example belonging to the first language, in descending order of similarity, a higher evaluated value being taken to mean higher similarity; (2) selects from the first database a first group of examples belonging to the second language, through the correspondences from the examples belonging to the first language to the examples belonging to the second language; (3) determines, by a second method, the tense, aspect, or modality that represents the selected first group of examples belonging to the second language, namely by majority vote over the tense, aspect, or modality of the individual examples; and (4) uses the determined tense, aspect, or modality as the tense, aspect, or modality of the translation of the second example belonging to the first language.