JP3305953B2

JP3305953B2 - Translation pattern creation method and apparatus

Info

Publication number: JP3305953B2
Application number: JP17859596A
Authority: JP
Inventors: Mihoko Kitamura (北村美穂子)
Original assignee: Oki Electric Industry Co Ltd
Current assignee: Oki Electric Industry Co Ltd
Priority date: 1996-06-19
Filing date: 1996-06-19
Publication date: 2002-07-24
Anticipated expiration: 2016-06-19
Also published as: JPH1011445A

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

Field of the Invention: The present invention relates to a translation pattern creation apparatus and method that create translation patterns by applying statistical processing to a bilingual document, for use in machine translation systems and the like.

[0002]

Description of the Related Art: Deciding which translations to enter in the bilingual dictionary of a machine translation system is generally difficult. The translation of a word is not uniquely determined, because it depends on the words that co-occur with it and on the surrounding context. Translation also does not always proceed word by word. Idioms and fixed expressions such as "you are welcome" and its Japanese equivalent 「どういたしまして」 receive a single translation only when several words occur together. Translations that cover such multi-word correspondences are therefore also needed.

[0003] Some systems make such bilingual dictionaries easier to build by reusing existing bilingual sentences, for example the one disclosed in Japanese Patent Laid-Open No. 5-151260. In that system, the two languages of each existing bilingual sentence are aligned, either with parsing results and a bilingual dictionary or by human judgment. Translation patterns such as "you are welcome" ↔ 「どういたしまして」 are then created from the alignment.

[0004] That system has a mechanism that automatically recognizes word and phrase correspondences between the two languages of a bilingual sentence. As a result, the user only has to prepare the bilingual sentences to obtain translation patterns for the words, idioms, and fixed expressions they contain. The user does not need to identify translation units (the source-language unit required to assign a given translation) or align words between the source and target sentences.

[0005]

Problems to Be Solved by the Invention: The conventional system described above has the following problems. First, it identifies word correspondences using only the linguistic information inside the single bilingual sentence being processed, so it requires parsing and a bilingual dictionary. Real documents, however, contain many sentences that are difficult to parse. Specialized documents such as technical documents also contain many technical terms that are not registered in general bilingual dictionaries, and the conventional system could not identify translation correspondences in such documents.

[0006] Second, because parsing results and a bilingual dictionary serve as the evidence for identifying correspondences, some correspondences could not be identified. This happened when the syntactic structures of the two languages differed greatly, or when a bilingual sentence contained word pairs that were not registered in the dictionary.

[0007] Third, when the bilingual text is a multi-sentence document such as a paper or a manual, information gathered from the document as a whole is useful. For example, if certain words always appear together in the document of one language, they can be treated as a single translation unit. Concretely, suppose that whenever 「シンボリック」 ("symbolic") and 「リンク」 ("link") appear together on the Japanese side of a bilingual document, "symbolic" and "link" always appear together on the corresponding English side. Then 「シンボリックリンク」 can be identified as corresponding to "symbolic link". The conventional system, however, ignored document-level linguistic information entirely.

[0008] Fourth, the conventional system uses a bilingual dictionary only as evidence for identifying correspondences. It has no means of improving the dictionary with the translation patterns it extracts itself. It was therefore hard to build field-specific (document-specific) bilingual dictionaries, that is, to use a dictionary to identify correspondences while adapting that same dictionary to the given document.

[0009] In view of these problems, there has been a demand for a translation pattern creation method and apparatus with three properties: it can use linguistic information gathered from the whole bilingual document, it can create translation patterns without presupposing parsing or a bilingual dictionary, and it has a means of adapting an existing bilingual dictionary to a given bilingual document.

[0010]

Means for Solving the Problems: To solve the above problems, the present invention adopts the following configurations.
<Configuration of Claim 1> A translation pattern creation method according to the present invention operates on a bilingual document. The bilingual document is stored in a bilingual sentence table and is formed by aligning, sentence by sentence, a source-language document with a target-language document that is its translation. Using a translation pattern creation apparatus, the method computes a correspondence degree for each word pair from three counts:
- the number of occurrences of the translated word pair in the bilingual document;
- the number of occurrences of the pair's source-language word in the source-language document;
- the number of occurrences of the pair's target-language word in the target-language document.
The word pair with the highest correspondence degree is extracted as a bilingual word pair. That pair is then removed from the bilingual sentence table to form the bilingual document again. The correspondence degrees of the remaining word pairs are computed again, and the word pair with the highest degree is extracted. These steps are repeated so that bilingual word pairs are extracted one after another.

[0011] <Explanation of Claim 1> The invention of claim 1 extracts bilingual word pairs by statistical processing alone, without parsing or similar processing of the bilingual sentences. It first counts, over the bilingual sentences, the occurrences of each source-language word and of each target-language word. It then counts how often each source word and target word occur together as a pair. From these counts it computes a word correspondence degree that rises as the three counts approach one another. The word pair with the highest degree is identified as a bilingual word pair. The method relies on the observation that a word and its translation have similar occurrence counts.

[0012] The source and target languages are, for example, Japanese and English, but any languages may be used. In a bilingual document aligned by sentence, each source-language sentence only needs to correspond to its target-language translation at the sentence level. The alignment need not be one sentence to one sentence; it may be one-to-many or many-to-many.
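The patent does not say how one-to-many or many-to-many alignments are represented internally. The minimal sketch below is only an illustration. It assumes that each alignment is a hypothetical "bead" listing source and target sentence indices, and that text is already split on whitespace (real Japanese text would first need morphological segmentation). Each bead becomes one counting unit.

```python
def build_units(src_sentences, tgt_sentences, beads):
    """Merge aligned sentence groups ("beads") into counting units.

    Hypothetical representation: each bead is (source_indices, target_indices),
    e.g. ([0], [0]) is 1:1, ([1], [1, 2]) is 1:2, ([2, 3], [3, 4]) is 2:2.
    """
    units = []
    for src_ids, tgt_ids in beads:
        # Concatenate the tokens of every sentence in the bead on each side.
        s_words = [w for i in src_ids for w in src_sentences[i].split()]
        t_words = [w for j in tgt_ids for w in tgt_sentences[j].split()]
        units.append((s_words, t_words))
    return units
```

For example, two English sentences translated as one romanized Japanese sentence form a single 2:1 unit.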

[0013] One example of the word correspondence degree is (2 × number of occurrences of the Japanese-English word pair) / (number of occurrences of the Japanese word + number of occurrences of the English word). The degree is not limited to this formula. Any calculation may be used, provided the degree rises as three counts approach one another: the occurrences of the source word, the occurrences of the target word, and the number of times the two words appear in the same translation.
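The example formula can be written out in symbols. Here f(s) and f(t) are the occurrence counts of a source word s and a target word t, and f(s,t) is the number of times they appear in the same aligned translation. The names f and sim are introduced here for illustration and are not the patent's notation. Written this way, the formula is the Dice coefficient. If counts are taken per aligned unit, then f(s,t) ≤ min(f(s), f(t)), so the degree lies between 0 and 1. It equals 1 exactly when the three counts coincide. For instance, f(s) = f(s,t) = 3 and f(t) = 5 give 2·3/(3+5) = 0.75.

```latex
\mathrm{sim}(s,t) = \frac{2\,f(s,t)}{f(s) + f(t)}, \qquad
\mathrm{sim}(s,t) = 1 \iff f(s) = f(t) = f(s,t)
```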

[0014] Some word pairs cannot be extracted in a given round. To extract them, the invention of claim 1 excludes the word pairs extracted in the first round from the counts and computes the correspondence degrees again. This can raise the correspondence degree of a remaining word pair above its first-round value, so that the pair can be extracted in the second or a later round. Repeating the processing in this way extracts all word pairs present in the bilingual sentences.
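The repeat-and-remove loop of claim 1 and [0014] can be sketched in Python as below. This is an illustrative reconstruction, not the patent's implementation. It makes the following assumptions:
- occurrences are counted as the number of aligned units that contain a word;
- an extracted pair is removed only from units where both of its words occur;
- ties are broken by co-occurrence count and then alphabetically;
- extraction stops below a hypothetical `min_score` threshold (the patent only says the process is repeated).

```python
from collections import Counter
from itertools import product


def count_units(units):
    """Count how many aligned units contain each word and each word pair."""
    src, tgt, pair = Counter(), Counter(), Counter()
    for s_words, t_words in units:
        s_set, t_set = set(s_words), set(t_words)
        src.update(s_set)
        tgt.update(t_set)
        pair.update(product(s_set, t_set))
    return src, tgt, pair


def extract_word_pairs(units, min_score=0.5):
    """Greedy iterative extraction: take the best pair, remove it, recount."""
    work = [(set(s), set(t)) for s, t in units]  # copy so the input is untouched
    extracted = []
    while True:
        src, tgt, pair = count_units(work)
        # Dice-style correspondence degree for every co-occurring pair.
        candidates = [(2 * c / (src[s] + tgt[t]), c, s, t) for (s, t), c in pair.items()]
        if not candidates:
            break
        # Highest degree first; ties broken by co-occurrence count, then alphabetically.
        score, c, s, t = min(candidates, key=lambda x: (-x[0], -x[1], x[2], x[3]))
        if score < min_score:
            break
        extracted.append((s, t, round(score, 3)))
        # Exclude the extracted pair from later counts.
        for s_set, t_set in work:
            if s in s_set and t in t_set:
                s_set.discard(s)
                t_set.discard(t)
    return extracted
```

The test below uses three English units with romanized Japanese tokens. On it, the loop picks (file, fairu) first and then (delete, sakujo) and (link, rinku) in later rounds. The frequent word "the" is left unpaired.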

[0015] <Configuration of Claim 2> In the translation pattern creation method according to claim 1, the translation pattern creation apparatus treats an extracted word pair as the original word pair. Two conditions are then checked:
- Looking from the one-side word of the original pair, there is an other-side word that appears in the same sentences as the original pair's other-side word and occurs paired with the one-side word the same number of times.
- Looking from the original pair's other-side word, there is likewise such a one-side word.
When both hold, these words are extracted, together with the original word pair with which they co-occur, as a multi-word-to-multi-word translation pattern.

[0016] <Explanation of Claim 2> The invention of claim 2 extracts multi-word-to-multi-word translation patterns. The one-side and other-side words are, for example, English and Japanese, but any languages can be handled. The words also need not form a contiguous string in the document. The method still applies when other words intervene between them.
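The sketch below is one possible reading of the claim 2 test. It assumes that a word's "sentences" are the set of aligned units in which it occurs. A word on either side whose unit set matches that of the original pair's word on the same side, and which pairs with the opposite word equally often, joins the pattern. The function name is illustrative. The example corpus romanizes the 「シンボリック」/「リンク」 case of [0007].

```python
from collections import defaultdict


def cooccurring_pattern(units, s0, t0):
    """Grow an extracted pair (s0, t0) into a multi-word pattern (one reading of claim 2)."""
    src_units, tgt_units = defaultdict(set), defaultdict(set)
    for i, (s_words, t_words) in enumerate(units):
        for w in s_words:
            src_units[w].add(i)
        for w in t_words:
            tgt_units[w].add(i)
    s0_units, t0_units = src_units.get(s0, set()), tgt_units.get(t0, set())

    def cooc(s_ids, t_ids):
        # Number of aligned units in which both words occur.
        return len(s_ids & t_ids)

    base = cooc(s0_units, t0_units)
    # Seen from t0: source words in exactly the same units as s0, pairing with t0 as often.
    # (With unit-level counting the second check is implied by the first; kept to mirror the claim.)
    extra_src = sorted(
        w for w, ids in src_units.items()
        if w != s0 and ids == s0_units and cooc(ids, t0_units) == base
    )
    # Seen from s0: target words in exactly the same units as t0, pairing with s0 as often.
    extra_tgt = sorted(
        w for w, ids in tgt_units.items()
        if w != t0 and ids == t0_units and cooc(s0_units, ids) == base
    )
    if extra_src and extra_tgt:
        return [s0] + extra_src, [t0] + extra_tgt
    return None
```

A pattern is returned only when both directions find a partner. For (create, sakusei) in the test corpus, "a" matches on the English side, but nothing matches on the Japanese side, so no pattern is formed.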

[0017]

[0018] <Configuration of Claim 3> The invention of claim 3 is a translation pattern creation method that operates on a bilingual document. The bilingual document is stored in a bilingual sentence table and is formed by aligning, sentence by sentence, a source-language document with a target-language document that is its translation. The method considers translated word-string pairs, each consisting of contiguous words in the bilingual document. For each such pair, the translation pattern apparatus computes a correspondence degree from three counts:
- the number of occurrences of the word-string pair;
- the number of occurrences of the pair's source-language word string in the source-language document;
- the number of occurrences of the pair's target-language word string in the target-language document.
The word-string pair with the highest correspondence degree is extracted as a translation pattern. That pair is then removed from the bilingual sentence table to form the bilingual document again. The correspondence degrees of the remaining word-string pairs are computed again, and the pair with the highest degree is extracted as a translation pattern. These steps are repeated so that translation patterns are extracted one after another.

[0019] <Explanation of Claim 3> The invention of claim 3 computes a word-string correspondence degree and uses it to extract word-string pairs. One example of the degree is (2 × number of occurrences of the Japanese-English word-string pair) / (number of occurrences of the Japanese word string + number of occurrences of the English word string). The degree is not limited to this formula. Any calculation may be used, provided the degree rises as three counts approach one another: the occurrences of the source string, the occurrences of the target string, and the number of times the two strings appear in the same translation.

[0020] The number of contiguous words in a word string can be set arbitrarily, for example two or three consecutive words, and single words are also included. With this method, the translation units of the resulting patterns include not only bilingual words in the bilingual document but also fixed expressions that appear in it.
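Claim 3 replaces single words with contiguous word strings. The sketch below covers one scoring round and rests on stated assumptions. Strings of 1 to `max_len` words are enumerated, where `max_len` is a hypothetical parameter reflecting the arbitrary length setting of [0020]. The same Dice-style degree is used. Preferring longer strings on ties is an added assumption that the patent does not specify. The full method would remove the chosen pair and repeat, as claim 1 does for single words.

```python
from collections import Counter
from itertools import product


def word_strings(words, max_len=3):
    """All contiguous word strings of length 1..max_len (single words included)."""
    return {
        " ".join(words[i:i + n])
        for n in range(1, max_len + 1)
        for i in range(len(words) - n + 1)
    }


def best_string_pair(units, max_len=3):
    """One round of scoring over word-string pairs with a Dice-style degree."""
    src, tgt, pair = Counter(), Counter(), Counter()
    for s_words, t_words in units:
        s_strs = word_strings(s_words, max_len)
        t_strs = word_strings(t_words, max_len)
        src.update(s_strs)
        tgt.update(t_strs)
        pair.update(product(s_strs, t_strs))
    if not pair:
        return None

    def rank(item):
        (s, t), c = item
        score = 2 * c / (src[s] + tgt[t])
        # Higher degree, then more co-occurrences, then longer strings (assumed tie-break).
        return (-score, -c, -(len(s.split()) + len(t.split())), s, t)

    (s, t), c = min(pair.items(), key=rank)
    return s, t, round(2 * c / (src[s] + tgt[t]), 3)
```

In the test corpus, "symbolic", "link", and "symbolic link" all pair perfectly with their romanized Japanese counterparts. The length tie-break selects the two-word pattern.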

[0021] Two steps are then repeated: generating a bilingual document from which the extracted word-string pairs have been removed, and computing the correspondence degrees of the word-string pairs that remain in the new document. Through this repetition, all word-string pairs are extracted from the bilingual document one after another.

[0022] <Configuration of Claim 4> The invention of claim 4 is a translation pattern creation apparatus comprising the following units.
- An input unit for inputting a bilingual document in which a source-language document and a target-language document that is its translation are aligned sentence by sentence.
- A bilingual word estimation unit. For the bilingual document, it computes a word correspondence degree that rises as three counts approach one another: the occurrences of a particular word in the source-language document, the occurrences of a particular word in the target-language document, and the number of times these words appear in the same translation. It extracts the word pair with the highest degree as a bilingual word pair. Whenever a word pair is extracted, it removes that pair from the bilingual sentences under consideration to generate new bilingual sentences, computes word correspondence degrees for them, and extracts the word pair with the highest degree. Repeating this extracts bilingual word pairs one after another.
- An output unit that outputs the bilingual words extracted by the bilingual word estimation unit.

[0023] <Explanation of Claim 4> The user inputs a bilingual document through the input unit. The bilingual word estimation unit then performs the same processing as the method of claim 2 and outputs bilingual words one after another. The output unit outputs the extracted bilingual words, for example by displaying them on screen or printing them. The user can therefore obtain bilingual words simply by inputting the bilingual document, even when its sentences do not follow the grammar, and no bilingual dictionary is required.

[0024] <Configuration of Claim 5> The invention of claim 5 is the translation pattern creation apparatus according to claim 4, further comprising:
- a bilingual dictionary, prepared in advance, containing translations between the source and target languages;
- a bilingual word estimation unit. When a word pair appearing in the bilingual document is registered in the bilingual dictionary, this unit generates bilingual sentences with that pair removed. It then computes word correspondence degrees for those sentences and extracts the word pair with the highest degree as a bilingual word pair.

[0025] <Explanation of Claim 5> The invention of claim 5 adds a previously prepared bilingual dictionary to the invention of claim 4. As with claim 4, bilingual words can be extracted even from sentences that do not follow the grammar. In addition, the dictionary can be consulted during extraction. Combining the input bilingual document with the translations already in the dictionary yields highly accurate bilingual words.
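The dictionary step of claim 5 amounts to filtering the bilingual sentences before counting. The minimal sketch below is illustrative. It assumes that the dictionary maps each source word to a set of target words and that each known pair is removed once per aligned unit. The function name is hypothetical.

```python
def remove_known_pairs(units, dictionary):
    """Drop word pairs already in the bilingual dictionary before statistics are computed."""
    filtered = []
    for s_words, t_words in units:
        s_set, t_set = set(s_words), set(t_words)
        for s in sorted(s_set):  # iterate over a sorted copy, so discarding is safe
            for t in sorted(dictionary.get(s, ())):
                if t in t_set:
                    # Known translation present on both sides: exclude it from counting.
                    s_set.discard(s)
                    t_set.discard(t)
                    break
        filtered.append((s_set, t_set))
    return filtered
```

In the test, "file"/"fairu" is already known and is removed. "open" stays because its dictionary translation does not occur in the unit.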

[0026] <Configuration of Claim 6> The invention of claim 6 is the translation pattern creation apparatus according to claim 5, further comprising:
- a pattern table that stores the word pairs extracted as translations;
- a bilingual word estimation unit. It checks whether a source-language word in the bilingual dictionary, together with its target-language translation, matches a word pair in the pattern table. If it does not, the unit deletes that word's translation from the dictionary and registers the pattern table's word pair as the translation.

[0027] <Explanation of Claim 6> The invention of claim 6 adds tuning of the bilingual dictionary to the invention of claim 5. The tuning registers a word pair from the pattern table in the bilingual dictionary when the dictionary does not already contain it. The user can supply the apparatus with a bilingual dictionary together with a bilingual document from the same field as, or with the same characteristics as, the documents to be translated. The dictionary is then tuned so that it yields translations equivalent to those in that bilingual document. For example, repeating this for documents from different fields produces field-specific bilingual dictionaries. These can serve as field-specific terminology dictionaries in machine translation systems and the like.
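The tuning described in claim 6 and [0027] can be sketched as follows. The sketch assumes one translation per dictionary entry, which is a simplification, and a pattern table given as a list of (source, target) pairs. Returning a change list is an addition for inspection and is not part of the claim.

```python
def tune_dictionary(dictionary, pattern_table):
    """Make a bilingual dictionary agree with the extracted pattern table."""
    tuned = dict(dictionary)  # leave the caller's dictionary untouched
    changes = []
    for source, target in pattern_table:
        old = tuned.get(source)
        if old != target:
            # Mismatch or missing entry: replace it with the corpus-derived translation.
            tuned[source] = target
            changes.append((source, old, target))
    return tuned, changes
```

The change list shows which general-purpose translations were overridden for the field, for example "link" → "rinku" in a computing manual.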

[0028] <Configuration of Claim 7> The invention of claim 7 is the translation pattern creation apparatus according to claim 4, further comprising a co-occurring word estimation unit. This unit treats a word pair extracted by the bilingual word estimation unit as the original word pair and checks two conditions:
- Looking from the one-side word of the original pair, there is an other-side word that appears in the same sentences as the original pair's other-side word and occurs paired with the one-side word the same number of times.
- Looking from the original pair's other-side word, there is likewise such a one-side word.
When both hold, the unit extracts these words as a multi-word-to-multi-word translation pattern that co-occurs with the original word pair.

[0029] <Explanation of Claim 7> When a bilingual document is input through the input unit, the co-occurring word estimation unit performs the same processing as the method of claim 5 and outputs multi-word-to-multi-word translation patterns. The output unit displays them on screen, prints them, or otherwise outputs them. Simply by inputting the bilingual document, even one whose sentences do not follow the grammar, the user therefore obtains bilingual words and also translation patterns whose translation units are fixed expressions appearing in the document.

[0030] <Configuration of Claim 8> The invention of claim 8 is the translation pattern creation apparatus according to claim 7, further comprising:
- a bilingual dictionary, prepared in advance, containing translations between the source and target languages;
- a bilingual word estimation unit. When a word pair appearing in the bilingual document is registered in the bilingual dictionary, this unit generates bilingual sentences with that pair removed. It then computes word correspondence degrees for those sentences and extracts the word pair with the highest degree as a bilingual word pair.

[0031] <Explanation of Claim 8> The invention of claim 8 adds the use of a bilingual dictionary to the invention of claim 7. It thus combines the function of the bilingual word estimation unit of claim 5 with the function of the co-occurring word estimation unit of claim 7. Because the dictionary can be consulted during extraction, combining the input bilingual document with the dictionary yields highly accurate bilingual words. Simply by inputting a bilingual document, the user also obtains translation patterns whose translation units are fixed expressions appearing in the document.

[0032] <Configuration of Claim 9> The invention of claim 9 is the translation pattern creation apparatus according to claim 8, further comprising:
- a pattern table that stores the word pairs extracted as translations;
- a bilingual word estimation unit. It checks whether a source-language word in the bilingual dictionary, together with its target-language translation, matches a word pair in the pattern table. If it does not, the unit deletes that word's translation from the dictionary and registers the pattern table's word pair as the translation.

[0033] <Explanation of Claim 9> The invention of claim 9 adds tuning of the bilingual dictionary to the invention of claim 8. This provides a further effect beyond those of claim 8. The user can supply the apparatus with a bilingual dictionary together with a bilingual document from the same field as, or with the same characteristics as, the documents to be translated. The dictionary is then tuned so that it yields translations equivalent to those in that bilingual document. For example, repeating this for documents from different fields produces field-specific bilingual dictionaries. These can serve as field-specific terminology dictionaries in machine translation systems and the like.

[0034]

[0035] <Configuration of Claim 10> The invention of claim 10 is a translation pattern creation apparatus comprising the following units.
- An input unit for inputting a bilingual document formed by aligning, sentence by sentence, a source-language document with a target-language document that is its translation.
- A translation pattern estimation unit. For each translated word-string pair consisting of contiguous words in the bilingual document, it computes a correspondence degree from three counts: the occurrences of the word-string pair, the occurrences of the pair's source-language word string in the source-language document, and the occurrences of the pair's target-language word string in the target-language document. It extracts the word-string pair with the highest degree as a translation pattern, removes that pair to form the bilingual document again, and computes the correspondence degrees again. It then extracts the pair with the highest degree as a translation pattern. Repeating these steps extracts translation patterns one after another.

[0036] <Explanation of Claim 10> When a bilingual document is input through the input unit, the translation pattern estimation unit performs the same processing as the method of claim 3 and outputs word-string translation patterns. The output unit displays them on screen, prints them, or otherwise outputs them. Simply by inputting the bilingual document, even one whose sentences do not follow the grammar, the user therefore obtains bilingual words and also translation patterns whose translation units are fixed expressions appearing in the document. Single words can also be extracted as translation patterns.

[0037] The translation pattern estimation unit first extracts the word string pair with the highest degree of correspondence from the bilingual document stored in the bilingual sentence table. It then deletes the extracted word string pair to generate a new bilingual document and again extracts a word string pair from it based on the degree of correspondence. It repeats these steps until all word string pairs in the bilingual document have been extracted, so the method described in claim 3 can be realized.

[0038]

DESCRIPTION OF THE PREFERRED EMBODIMENTS Embodiments of the present invention are described below using specific examples. << Specific Example 1 >> <Configuration> FIG. 1 is a configuration diagram of Specific Example 1 of the translation pattern creation apparatus of the present invention. Specific Example 1 is characterized in that bilingual words (word-level translation patterns) can be extracted automatically without using a bilingual dictionary.

[0039] The apparatus shown in the figure consists of three units: an input/output unit 110, through which the user inputs a bilingual document and which displays the created bilingual words; a bilingual word extraction unit 120, which extracts bilingual words; and a resource management unit 130, which stores the resources used to create translation patterns.

[0040] The input/output unit 110 consists of an input unit 111, through which the user inputs the bilingual document from which bilingual words are created; an output unit 112 for displaying and correcting bilingual words; and a parameter setting unit 113 for setting the output format and the like.

[0041] The bilingual word extraction unit 120 consists of three parts. A morphological analysis unit 121 morphologically analyzes each language's document in the bilingual document. A bilingual word estimation unit 122 identifies the correspondences between words of the two languages and estimates bilingual words. A bilingual word generation unit 123 generates the bilingual words in a format set by the user.

[0042] The resource management unit 130 consists of three parts. A morphological dictionary 131 stores the dictionaries used to morphologically analyze the documents of both languages in the bilingual document. A bilingual document storage unit 132 stores the morphological analysis results of the input bilingual document. A table storage unit 133 stores the various tables created temporarily during bilingual word extraction.

[0043] <Operation> FIG. 2 is a flowchart of the translation pattern creation processing of Specific Example 1. In the following description, the source language of the processing is Japanese and the target language is English. The flowchart shows how the translation pattern creation apparatus extracts bilingual words from an English-Japanese bilingual document and displays them. The operation is described using, as an example, the case where the user inputs the following bilingual document.

[0044] FIG. 27 shows an example of a bilingual document from which the apparatus creates a bilingual dictionary. The bilingual document consists of a Japanese document and an English document. The two documents are assumed to be aligned sentence by sentence, as indicated by the line numbers at the left end of each line. Sentences that are not aligned can also be input by using the sentence alignment algorithm described in the earlier-filed Japanese Patent Application No. 7-192134. The translation also need not be one sentence to one sentence. It may be one sentence to several sentences (for example, the English translation of one Japanese sentence consists of several sentences), or several sentences to several sentences.

[0045] First, the user inputs, via the input unit 111, a bilingual document from which a bilingual dictionary is to be created (step S101). The morphological analysis unit 121 then divides the bilingual document into a Japanese document and an English document and morphologically analyzes each one using the morphological dictionary 131 for that language (step S102).

[0046] FIGS. 28, 29 and 30 are explanatory diagrams showing the results of morphological analysis of the bilingual document of FIG. 27. As shown, each of the Japanese and English documents has two kinds of results: the appearance-form results (1) and (2) shown in FIG. 28, and the standard-form results (1) and (2) shown in FIGS. 29 and 30.
- The appearance-form result stores the morphological analysis result of the input sentence as appearance-form headwords (the words as they appear), with blanks as delimiters.
- The standard-form result stores the same analysis, with blanks as delimiters, in a form where each standard-form headword is followed by "the symbol shown in the figure + candidate part of speech". When there are several candidate parts of speech, all of them are listed with "/" as the delimiter.

Each of these morphological analysis results is stored in the bilingual document storage unit 132.

[0047] Next, the bilingual word estimation unit 122 performs bilingual word estimation processing (step S103), which extracts Japanese-English word pairs that are translations of each other. This processing (steps S103 → S104) is repeated until no more word pairs are extracted. When no word pair is found in step S104, processing moves to the bilingual word generation processing of the bilingual word generation unit 123 (step S105). This step formats the bilingual words in the format the user set with the parameter setting unit 113. The output unit 112 then displays the result (step S106), and processing ends.

[0048] Next, the algorithm of the bilingual word estimation unit 122 is described in detail, together with an example of specific processing. FIGS. 3 and 4 are flowcharts showing the algorithm of the bilingual word estimation processing of the bilingual word estimation unit 122 (steps S103 and S104 in FIG. 2).

[0049] First, a pattern creation counter n is set to 1 (step S201). The pattern creation counter records how many times the repeated bilingual word estimation processing has been performed. It is used to manage the bilingual sentence tables (SentJn, SentEn) described later.

[0050] The headwords of content words are taken from the standard-form results of the morphological analysis (see (1) and (2) of FIGS. 29 and 30). Content words are Japanese words whose candidate parts of speech include 名 (noun), 動 (verb), 形 (adjective or adjectival verb), or 副 (adverb), and English words whose candidate parts of speech include n (noun), v (verb), adj (adjective), or adv (adverb). These headwords are read into the bilingual sentence table of each language (step S202):
- SentJn[line number] = {list of words in the Japanese sentence}
- SentEn[line number] = {list of words in the English sentence}

Here n is the pattern creation counter, and n = 1.
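Step S202 can be illustrated with a minimal sketch. It is not the patent's implementation. It assumes each line's standard-form result has already been parsed into (headword, [candidate POS]) tuples. That tuple format, the names `build_sentence_tables` and `content_words`, and the particle tag `助` used in the test are hypothetical. Only the content-word POS labels come from the text above. The optional `patterns` argument applies the rule for rounds with n of 2 or more described in [0057].

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical parsed format: line number -> [(standard-form headword, [candidate POS]), ...]
Morph = Dict[int, List[Tuple[str, List[str]]]]

# Content-word POS labels listed in [0050]
JA_CONTENT_POS = {"名", "動", "形", "副"}
EN_CONTENT_POS = {"n", "v", "adj", "adv"}


def content_words(tokens: List[Tuple[str, List[str]]], content_pos: set) -> List[str]:
    """Keep a headword if any of its candidate POS labels is a content-word label."""
    return [head for head, pos in tokens if content_pos.intersection(pos)]


def build_sentence_tables(ja: Morph, en: Morph,
                          patterns: Iterable[Tuple[str, str]] = ()):
    """Build SentJn / SentEn for one round (step S202).

    For n >= 2, pass the pairs already in the pattern table: when both words
    of a pair occur in the same sentence pair, both are left out of that line.
    """
    patterns = list(patterns)
    sent_j: Dict[int, List[str]] = {}
    sent_e: Dict[int, List[str]] = {}
    for line, tokens in ja.items():
        jws = content_words(tokens, JA_CONTENT_POS)
        ews = content_words(en.get(line, []), EN_CONTENT_POS)
        for jw, ew in patterns:
            if jw in jws and ew in ews:
                jws = [w for w in jws if w != jw]
                ews = [w for w in ews if w != ew]
        sent_j[line], sent_e[line] = jws, ews
    return sent_j, sent_e
```

Using a set intersection means a word with several candidate POS labels (for example 名/動) counts as a content word if any one of them qualifies. This matches the wording "candidate parts of speech include".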

[0051] FIG. 31 is an example of the bilingual sentence tables for the first to third lines of the bilingual document, built from the morphological analysis results described above (FIGS. 28 to 30). The bilingual sentence table of each language uses line numbers as indices. The value for each index is the list of content-word headwords in that sentence. Hereinafter, each table is written as "variable name [index] = value".

[0052] Next, the number of appearances of each word in the bilingual sentence tables is counted. The counts are stored in word tables indexed by headword (step S203): FreqJ[Japanese word] = number of appearances of the Japanese word, and FreqE[English word] = number of appearances of the English word. FIG. 32 shows an example of the word table for each language.
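Step S203 is a plain frequency count over the sentence tables. A minimal Python sketch follows. It assumes the tables are dicts from line number to word list, and the function name `word_tables` is hypothetical.

```python
from collections import Counter
from typing import Dict, List


def word_tables(sent_j: Dict[int, List[str]], sent_e: Dict[int, List[str]]):
    """Step S203: FreqJ[word] and FreqE[word] = occurrences in the sentence tables."""
    freq_j = Counter(w for words in sent_j.values() for w in words)
    freq_e = Counter(w for words in sent_e.values() for w in words)
    return freq_j, freq_e
```

A `Counter` returns 0 for a word it has never seen. This is convenient later, when the counts appear in a denominator alongside a word from the other language.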

[0053] Next, the program counts how often each combination of a Japanese word and an English word appears in the same bilingual sentence pair (a word pair). Both the count and the line numbers of the sentences in which the pair appears go into a word pair table indexed by word pair (step S204):
- FreqJE[Japanese word English word] = number of times the Japanese word and the English word appear in the same bilingual sentence pair
- SentJE[Japanese word English word] = {list of line numbers of appearance}

FIGS. 33 and 34 are explanatory diagrams of the word pair table. FIG. 33 (1) shows the state after the appearances have been counted.
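Step S204 can be sketched as follows. The text does not say how a word repeated within one sentence is counted. This sketch assumes, as a simplification, that a word pair is counted at most once per sentence pair. The name `word_pair_table` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def word_pair_table(sent_j: Dict[int, List[str]], sent_e: Dict[int, List[str]]):
    """Step S204: FreqJE[(JW, EW)] and SentJE[(JW, EW)] over aligned sentence pairs.

    Assumption: a pair is counted at most once per sentence pair.
    """
    freq_je: Dict[Tuple[str, str], int] = defaultdict(int)
    sent_je: Dict[Tuple[str, str], List[int]] = defaultdict(list)
    for line in sorted(sent_j):
        # dict.fromkeys drops repeated words while keeping their first-seen order
        for jw in dict.fromkeys(sent_j[line]):
            for ew in dict.fromkeys(sent_e.get(line, [])):
                freq_je[(jw, ew)] += 1
                sent_je[(jw, ew)].append(line)
    return dict(freq_je), dict(sent_je)
```

Every Japanese content word is paired with every English content word of the same line. Most of these pairs are wrong, and step S205 is what separates the likely translations from the noise.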

[0054] Next, the degree of word correspondence of each word pair in the word pair table is calculated (step S205) by the equation shown in FIG. 4:

SimJE[JW EW] = 2 × FreqJE[JW EW] / (FreqJ[JW] + FreqE[EW])

where JW is a Japanese word and EW is an English word. Hereinafter, this value SimJE[JW EW] is called the Japanese-English correspondence, and it is added to the word pair table. The degree of word correspondence takes a value from 0 to 1. The more closely two words correspond as translations, the closer the value is to 1. The results of this calculation are shown in (2) of FIG. 34.

[0055] Next, consider each Japanese word in the indices of the word pair table (the state shown in FIG. 34 (2)). Among the English words paired with that Japanese word, the one with the highest Japanese-English correspondence (SimJE[Japanese word English word]) is extracted. Then, among the Japanese words paired with that English word, the one with the highest Japanese-English correspondence is found. If it is the original Japanese word, the word pair is identified as a bilingual word (step S206). The pair is stored in the pattern table, Pattern[Japanese word English word] = {list of line numbers of appearance} (step S207). FIG. 35 is an example of the pattern table.

[0056] If several word pairs share the highest Japanese-English correspondence, all of them are stored. For example, in FIG. 34 (2), the Japanese word 「情報」 ("information") forms indices with the English words "information", "display", and "resource". Of these, "information" has the highest Japanese-English correspondence (SimJE). Conversely, for "information", the Japanese word with the highest Japanese-English correspondence is 「情報」. The two are therefore identified as a bilingual word pair and stored in the pattern table as Pattern[情報 information] = {314…}.
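The mutual-best-match test of steps S206 and S207, including the tie rule above, can be sketched in Python. This is a minimal sketch, not the patent's code. Only the 0.43 for インフォメーション/information is taken from the text ([0058]). The other SimJE values in the test are invented so that the example runs.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, str]


def mutual_best_pairs(sim: Dict[Pair, float]) -> List[Pair]:
    """Steps S206-S207: keep (JW, EW) when EW is JW's best match and JW is EW's.

    Ties are kept: every pair that reaches the maximum SimJE for both its
    Japanese word and its English word is returned.
    """
    best_j: Dict[str, float] = {}
    best_e: Dict[str, float] = {}
    for (jw, ew), s in sim.items():
        best_j[jw] = max(best_j.get(jw, 0.0), s)
        best_e[ew] = max(best_e.get(ew, 0.0), s)
    return [(jw, ew) for (jw, ew), s in sim.items()
            if s == best_j[jw] and s == best_e[ew]]
```

Checking that a pair is the maximum on both its row and its column is equivalent to the two-step lookup described in [0055], and it keeps all tied pairs without special handling.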

[0057] If a bilingual word has been extracted (step S208), 1 is added to the pattern creation counter n (step S209). Processing then returns to the creation of the bilingual sentence tables (step S202), now SentJn[line number] and SentEn[line number] with n = 2. When n is 2 or more, one rule applies as the morphological analysis results are read into the bilingual sentence tables. If a word pair registered in the pattern table (Pattern[Japanese word English word]) occurs in a bilingual sentence pair, those words are not stored in the bilingual sentence tables. For example, suppose the pattern creation counter n is 2 and the pattern table of FIG. 35 exists. Then "最初 initial" and "情報 information", which are in the pattern table, are not stored in the bilingual sentence tables. FIG. 36 is an explanatory diagram of the bilingual sentence tables for n = 2.

[0058] As shown, "最初 initial" and "情報 information", which were in the first bilingual sentence tables (see FIG. 31), are not stored in the new bilingual sentence tables. The above processing (steps S202 to S208) is then repeated on the newly created tables (FIG. 36). When no more word pairs are extracted, processing proceeds to the bilingual word generation processing (step S210). The repetition makes it possible to extract word pairs that the previous round of bilingual word estimation could not extract. For example, the word pair table (FIG. 34 (2)) contains the pair "インフォメーション information" with a word correspondence of 0.43. It cannot be extracted in the first round, because "情報 information" has a higher correspondence. In the second round, however, "情報 information" is excluded from counting, so the word correspondence of "インフォメーション information" rises. The pair can then be extracted in the second or a later round.
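Putting steps S202 to S209 together, the repeated estimation can be sketched as one self-contained loop. This is an illustrative reading, not the patent's code. It assumes the sentence tables are dicts from line number to word list, and it counts a word pair at most once per sentence pair. The toy data in the test are hypothetical: 情報 appears three times, インフォメーション twice, and 表示 once; "information" is in all six English sentences; and "display" is added to the last one. As [0058] describes, インフォメーション/information loses to 情報/information in the first round (0.5 versus about 0.67). It is extracted in the second round (0.8) once the 情報/information sentence pairs drop out.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

Pair = Tuple[str, str]


def extract_bilingual_words(sent_j: Dict[int, List[str]],
                            sent_e: Dict[int, List[str]]) -> Dict[Pair, List[int]]:
    """Repeat steps S202-S209 until no new pair is found; return the pattern table."""
    patterns: Dict[Pair, List[int]] = {}
    while True:
        # S202: rebuild the tables, leaving out pairs already in the pattern table
        cur_j, cur_e = {}, {}
        for line in sent_j:
            jws, ews = list(sent_j[line]), list(sent_e.get(line, []))
            for jw, ew in patterns:
                if jw in jws and ew in ews:
                    jws = [w for w in jws if w != jw]
                    ews = [w for w in ews if w != ew]
            cur_j[line], cur_e[line] = jws, ews
        # S203-S204: word counts and word-pair counts (once per sentence pair)
        freq_j = Counter(w for ws in cur_j.values() for w in ws)
        freq_e = Counter(w for ws in cur_e.values() for w in ws)
        freq_je: Counter = Counter()
        sent_je: Dict[Pair, List[int]] = defaultdict(list)
        for line in cur_j:
            for jw in dict.fromkeys(cur_j[line]):
                for ew in dict.fromkeys(cur_e[line]):
                    freq_je[(jw, ew)] += 1
                    sent_je[(jw, ew)].append(line)
        # S205: SimJE for every co-occurring pair
        sim = {(jw, ew): 2 * n / (freq_j[jw] + freq_e[ew])
               for (jw, ew), n in freq_je.items()}
        # S206: mutual best matches, ties kept
        best_j: Dict[str, float] = defaultdict(float)
        best_e: Dict[str, float] = defaultdict(float)
        for (jw, ew), s in sim.items():
            best_j[jw] = max(best_j[jw], s)
            best_e[ew] = max(best_e[ew], s)
        new = [p for p, s in sim.items() if s == best_j[p[0]] and s == best_e[p[1]]]
        if not new:  # S208: nothing extracted, so hand over to generation (S210)
            return patterns
        for p in new:  # S207; the next pass of the loop plays the role of S209
            patterns[p] = sent_je[p]
```

The loop always terminates. An extracted pair is removed from every sentence pair where it co-occurs, so it cannot be extracted again, and each round must find at least one new pair to continue.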

[0059] FIGS. 37 and 38 are explanatory diagrams of the above bilingual word estimation processing with the elements simplified. Suppose, as shown in the n = 1 bilingual sentence tables of FIG. 37, that the Japanese document contains 「情報」 three times, 「インフォメーション」 twice, and 「表示」 once, and that the English document contains "information" six times. These values are stored in the word tables, and a word pair table is then generated. For example, the SimJE of "情報 information" is 2 × 3 / (3 + 6) ≈ 0.67, the highest value among the word pairs. For "information", the Japanese word with the highest value is 「情報」, so this word pair is extracted as a bilingual word and stored in the pattern table.

[0060] In the second round of bilingual word estimation (n = 2), "情報 information" is removed from the bilingual sentences of the first round, so the bilingual sentence tables become as shown in FIG. 38. The SimJE of "インフォメーション information" is now the highest (0.8). For "information", the Japanese word with the highest value is now 「インフォメーション」, so this word pair is extracted as a bilingual word and added to the pattern table. In this way, bilingual words are extracted one after another.
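The figures quoted in [0059] and [0060] can be checked directly. The second-round counts are inferred, not stated in the text. The sketch assumes that each 情報 and each インフォメーション shares its sentence pair with one "information", so three occurrences of "information" remain after the first round. Under that assumption the result matches the 0.8 given above.

```python
def sim_je(freq_je: int, freq_j: int, freq_e: int) -> float:
    """SimJE = 2 * FreqJE / (FreqJ + FreqE)."""
    return 2 * freq_je / (freq_j + freq_e)


# Round n = 1 (FIG. 37): 情報 x3, "information" x6, FreqJE assumed to be 3
round1_joho = sim_je(3, 3, 6)

# Round n = 2 (FIG. 38): the three 情報/information sentence pairs are excluded,
# leaving "information" x3 (assumed); インフォメーション x2 with FreqJE = 2
round2_infomeshon = sim_je(2, 2, 3)

print(f"{round1_joho:.2f} {round2_infomeshon:.1f}")  # 0.67 0.8
```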

[0061] Returning to FIG. 2, the bilingual word generation processing (step S105 in FIG. 2) formats the pattern table (Pattern[Japanese word English word]) created by the bilingual word estimation processing. It follows the request the user has made with the parameter setting unit 113.
- If the user requests "output bilingual words in word-to-word format", the bilingual word generation unit 123 receives the instruction from the parameter setting unit 113, formats the pattern table, and outputs it.
- If the instruction "output with part-of-speech information" is given, the bilingual word generation unit 123 takes one line number from the line number list stored in the pattern table and looks up the bilingual sentence table entry for that line. It determines the word's position among the content words of that sentence, retrieves the word at that position from the morphological analysis result, and outputs it together with the part of speech stored there.
- The bilingual sentences of the line numbers stored in the pattern table (the appearance-form results of the morphological analysis, FIGS. 29 and 30 (1), (2)) can also be displayed.
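The "output with part-of-speech information" lookup could be sketched as below. The parsed morphological format, the function name `word_with_pos`, and the output string "headword (POS/POS)" are all assumptions. The sketch also assumes the first-round (n = 1) sentence table is used, because only that table lines up one-to-one with the content words of the morphological result.

```python
from typing import Dict, List, Tuple

# Hypothetical parsed standard-form result: line -> [(headword, [candidate POS]), ...]
Morph = Dict[int, List[Tuple[str, List[str]]]]
JA_CONTENT_POS = {"名", "動", "形", "副"}


def word_with_pos(word: str, line: int, sent_table: Dict[int, List[str]],
                  morph: Morph, content_pos: set) -> str:
    """Find `word`'s POS via its position among the content words of `line`."""
    k = sent_table[line].index(word)  # which content word of the sentence it is
    content = [(head, pos) for head, pos in morph[line]
               if content_pos.intersection(pos)]
    head, pos = content[k]
    return f"{head} ({'/'.join(pos)})"
```

In use, `line` would be one entry from the pattern table's line number list, for example `patterns[("表示", "display")][0]`.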

[0062] FIG. 39 shows a display example for Specific Example 1. It is the output of bilingual words from the output unit 112 when the user gives the instruction "output bilingual words in word-to-word format". The user can check whether this result is correct and correct it using the input unit 111.

[0063] <Effects> As described above, Specific Example 1 provides the following effects. Conventional methods required syntactic analysis, so bilingual words could not be extracted automatically from sentences that do not follow the grammar. The present apparatus and method do not require syntactic analysis, so they can extract bilingual words even from such sentences.

[0064] Conventional methods also required a bilingual dictionary. Here, the degree of word correspondence is computed from the numbers of appearances of words and word pairs across the entire bilingual document and is used for extraction. Bilingual words in the bilingual sentences can therefore be obtained even without a bilingual dictionary.

[0065] << Specific Example 2 >> <Configuration> FIG. 5 is a configuration diagram of Specific Example 2 of the translation pattern creation apparatus. In this example, an existing bilingual dictionary is used when bilingual words are extracted. The system is given the existing bilingual dictionary together with a bilingual document in the same field as, or with the same characteristics as, the documents the user wants to translate. The feature of this example is that the existing dictionary can then be tuned into one that yields translations equivalent to those in that bilingual document.

[0066] Specific Example 2 consists of an input/output unit 210, a bilingual word extraction unit 220, and a resource management unit 230. These are configured as in Specific Example 1, except that the resource management unit 230 includes a bilingual dictionary 234. The input/output unit 210 also differs. Its input unit 211 accepts a bilingual dictionary as well as a bilingual document, and its output unit 212 can display the existing bilingual dictionary after it has been tuned with the input bilingual document.

[0067] The remaining components in the figure have the same functions as their counterparts in Specific Example 1:

| Component in Specific Example 2 | Counterpart in Specific Example 1 |
| --- | --- |
| Parameter setting unit 213 (input/output unit 210) | Parameter setting unit 113 |
| Morphological analysis unit 221 (bilingual word extraction unit 220) | Morphological analysis unit 121 |
| Bilingual word estimation unit 222 (bilingual word extraction unit 220) | Bilingual word estimation unit 122 |
| Bilingual word generation unit 223 (bilingual word extraction unit 220) | Bilingual word generation unit 123 |
| Morphological dictionary 231 (resource management unit 230) | Morphological dictionary 131 |
| Bilingual document storage unit 232 (resource management unit 230) | Bilingual document storage unit 132 |
| Table storage unit 233 (resource management unit 230) | Table storage unit 133 |

[0068] <Operation> FIG. 6 is a flowchart showing the bilingual dictionary creation algorithm of Specific Example 2. Its operation basically follows Specific Example 1, with three differences:
1. At input, the bilingual dictionary to be tuned is read together with the bilingual document (step S301).
2. The bilingual dictionary is used in the bilingual word estimation processing (step S303).
3. The bilingual dictionary is tuned in the bilingual word generation processing (step S305).

The description below focuses on these differences.

[0069] The following describes how the bilingual document used in Specific Example 1 (see FIG. 27) is used to tune the following bilingual dictionary (Dic[Japanese word English word], values omitted). FIG. 40 shows an example of this bilingual dictionary.

[0070] The input of the bilingual document in step S301 and the Japanese/English morphological analysis in step S302 are the same as in Specific Example 1, so their description is omitted. In Specific Example 2, however, the user also inputs in step S301, through the input unit 211, the bilingual dictionary to be tuned, together with the bilingual document. This dictionary is stored in the bilingual dictionary 234 of the resource management unit 230.

[0071] The morphological analysis processing (step S302) stores each morphological analysis result in the bilingual document storage unit 232. The bilingual word estimation unit 222 then performs the bilingual word estimation processing (step S303), which uses the bilingual dictionary 234 to extract Japanese-English word pairs that are translation candidates. If the check in the next step S304 finds such a word pair, processing returns to step S303. This is repeated until no more word pairs are extracted. When no word pair remains in step S304, processing moves to the bilingual word generation unit 223. It tunes the bilingual dictionary in the format the user specified with the parameter setting unit 213 (step S305). The output unit 212 displays the result (step S306), and processing ends.

[0072] The algorithm from the bilingual word estimation processing onward is described in detail below, together with an example of specific processing. FIGS. 7, 8, and 9 are flowcharts showing the algorithm of the bilingual word estimation processing of the bilingual word estimation unit 222 (steps S303 and S304 above).

[0073] First, the pattern creation counter n is set to 1 (step S401). The pattern creation counter records how many times the repeated bilingual word estimation processing has been performed. It is used to manage the bilingual sentence tables (SentJn, SentEn) described later. Next, the program checks the standard-form results of the morphological analysis (FIGS. 29 and 30 (1), (2)). If a bilingual word pair stored in the bilingual dictionary (Dic[Japanese word English word]) is contained in a bilingual sentence pair, that pair is stored in the pattern table, Pattern[Japanese word English word] = {list of line numbers of appearance} (step S402).
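Step S402 can be sketched as seeding the pattern table with the dictionary entries attested in the document. The per-line word lists and the name `seed_patterns` are assumptions. The test pairs echo FIG. 41's "始める start", "指示 direction", and "フィールド field", but the sentences themselves are invented.

```python
from typing import Dict, Iterable, List, Tuple

Pair = Tuple[str, str]


def seed_patterns(dic: Iterable[Pair], ja_words: Dict[int, List[str]],
                  en_words: Dict[int, List[str]]) -> Dict[Pair, List[int]]:
    """Step S402: copy Dic entries that occur in an aligned sentence pair into Pattern."""
    patterns: Dict[Pair, List[int]] = {}
    for jw, ew in dic:
        lines = [line for line in sorted(ja_words)
                 if jw in ja_words[line] and ew in en_words.get(line, [])]
        if lines:  # entries never seen together in the document are left out
            patterns[(jw, ew)] = lines
    return patterns
```

An entry counts only if both words occur in the same aligned sentence pair. A dictionary translation that the document never uses, such as 開始/start in the test, therefore stays out of the pattern table.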

[0074] FIG. 41 shows the pattern table after the bilingual dictionary lookup. It was created from the bilingual dictionary shown in FIG. 40 and the bilingual sentence tables shown in FIG. 31.

[0075] Bilingual words that are stored in the bilingual dictionary 234 and also appear in the bilingual sentences are regarded as more reliable correspondences. Excluding them from counting raises the word correspondence of the other word pairs that are translations of each other. As a result, more word pairs can be extracted than when no bilingual dictionary 234 is available. This processing also detects exactly those bilingual words that both appear in the bilingual document and are stored in the bilingual dictionary 234. Giving priority to them helps adapt the dictionary to the bilingual document. That is, when the bilingual dictionary 234 contains several different translations for a word, the translation used in the bilingual document can be identified.

[0076] Next, the word headings of content words are read from the standard-form results of the morphological analysis. Content words are Japanese words whose candidate parts of speech include 名 (noun), 動 (verb), 形 (adjective or adjectival verb) or 副 (adverb), and English words whose candidate parts of speech include n (noun), v (verb) or adv (adverb). The headings are read into the bilingual sentence table of each language: SentJn[line number] = {Japanese sentence word list} and SentEn[line number] = {English sentence word list}, where n is the pattern creation counter, initially n = 1 (step S403). While the morphological analysis results are being read into the bilingual sentence table, however, the pattern table (Pattern[Japanese word, English word]) is consulted. If a translation pair stored in the pattern table occurs in a bilingual sentence, the words of that pair are not stored in the bilingual sentence table. For example, given the pattern table of FIG. 41, the pairs stored there ("始める start", "指示 direction" and "フィールド field") are not stored in the bilingual sentence table. The remaining steps of the bilingual word estimation processing (steps S404 to S411) follow Specific Example 1, so their description is omitted.
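The following minimal Python sketch illustrates how the sentence tables of step S403 could be built, including the rule that skips words of known translation pairs. It rests on several hypothetical assumptions:

- each morpheme arrives as a (headword, set of candidate POS tags) tuple;
- line numbers start at 1;
- the pattern table is a set of (Japanese word, English word) pairs.

The names `content_words` and `build_sentence_tables` are illustrative only.

```python
# Hypothetical input format: each sentence is a list of (headword, candidate_POS_set)
JA_CONTENT_POS = {"名", "動", "形", "副"}  # noun, verb, adjective/adjectival verb, adverb
EN_CONTENT_POS = {"n", "v", "adv"}        # noun, verb, adverb


def content_words(morphemes, content_pos):
    """Headwords of morphemes whose candidate POS set shares a tag with content_pos."""
    return [head for head, pos in morphemes if pos & content_pos]


def build_sentence_tables(ja_sents, en_sents, patterns=frozenset()):
    """Return (SentJ, SentE): dicts mapping line number -> content-word list.

    patterns: set of (japanese_word, english_word) pairs. When both words of a
    pair occur in an aligned sentence pair, neither word is stored.
    """
    sent_j, sent_e = {}, {}
    for line, (ja, en) in enumerate(zip(ja_sents, en_sents), start=1):
        jws = content_words(ja, JA_CONTENT_POS)
        ews = content_words(en, EN_CONTENT_POS)
        # Words of a known translation pair present on both sides are dropped
        drop_j = {pj for pj, pe in patterns if pj in jws and pe in ews}
        drop_e = {pe for pj, pe in patterns if pj in jws and pe in ews}
        sent_j[line] = [w for w in jws if w not in drop_j]
        sent_e[line] = [w for w in ews if w not in drop_e]
    return sent_j, sent_e
```

With a pattern table containing ("フィールド", "field"), an aligned pair of sentences in which both words occur keeps only its other content words.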

[0077] The bilingual word generation processing in the next step, S411, integrates the translation pairs stored in the bilingual dictionary 234 with those stored in the pattern table.

[0078] FIG. 10 is a flowchart of the algorithm of this bilingual word generation processing, which tunes the bilingual dictionary.

First, the bilingual dictionary 234 is read (step S501), and one of its elements (Dic[Japanese word, English word]) is taken out (step S502).

In step S503, it is checked whether either the Japanese word or the English word of that element is registered in the pattern table. If neither is registered, nothing is done. If either is registered, it is checked whether the pair formed by that Japanese word and that English word is itself stored in the pattern table (step S504).

- If the pair is stored, the other translations (those in the pattern table containing either word) are stored in the bilingual dictionary, and the translations so stored are deleted from the pattern table (step S505).
- If the pair is not stored, the entry for the Japanese word and English word (Dic[Japanese word, English word]) is deleted from the bilingual dictionary 234. The other translations are then stored in the bilingual dictionary 234 and deleted from the pattern table (step S506).

This processing is performed for every translation pair in the bilingual dictionary 234. Finally, all translation pairs remaining in the pattern table are stored in the bilingual dictionary 234 (step S507).
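Below is a minimal Python sketch of the tuning loop of FIG. 10. It assumes that both the dictionary and the pattern table are plain sets of (Japanese word, English word) pairs. Rather than deleting from the pattern table while iterating, it tests every dictionary entry against the pattern table as it stood before tuning. This is our own simplification: it keeps the result independent of the order in which entries are visited, and it yields the net effect described in [0079]. The function name `tune_dictionary` is hypothetical.

```python
def tune_dictionary(dic, patterns):
    """Sketch of FIG. 10 (S501-S507) on sets of (japanese_word, english_word) pairs.

    Returns a new set; neither input is modified.
    """
    patterns = frozenset(patterns)
    tuned = set()
    for jw, ew in dic:                               # S502: take each dictionary entry
        related = {p for p in patterns if p[0] == jw or p[1] == ew}
        if not related:                              # S503: neither word is in the pattern table
            tuned.add((jw, ew))
        elif (jw, ew) in related:                    # S504 yes: the pair itself was extracted
            tuned.add((jw, ew))
        # S504 no (S506): the old dictionary entry is dropped
    tuned |= patterns                                # S505-S507: every pattern pair ends up stored
    return tuned
```

An entry such as ("猫", "cat"), which shares no word with the pattern table, is kept unchanged. An entry whose Japanese or English word was extracted with a different partner is replaced by the extracted pair.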

[0079] As a result, for each word the bilingual dictionary 234 holds the translation extracted from the bilingual document when one exists, and otherwise keeps the translation from the original bilingual dictionary 234. The translations stored in the bilingual dictionary 234 are therefore better adapted to the bilingual document. Finally, the user displays the tuned bilingual dictionary 234 on the output unit 212. After checking whether the entries are correct, the user can also correct the result.

[0080] <Effects> As described above, Specific Example 2 has the following effects. Conventional methods required syntactic parsing and therefore could not automatically extract translations from sentences that do not follow the grammar. The present apparatus and method do not require parsing, so bilingual words can be extracted even from such sentences.

[0081] The bilingual dictionary can be used when the word correspondence degree is computed from the occurrence counts of words and word pairs over the whole bilingual document. Bilingual words can therefore be extracted with high accuracy, making effective use of both sources of information: the bilingual document and the dictionary's translation pairs.

[0082] The system can be given both an existing bilingual dictionary and a bilingual document that is in the same field as, or has the same characteristics as, the documents the user wants to translate. The existing dictionary can then be tuned into one that yields translations equivalent to those in that bilingual document. Repeating this for documents in different fields produces field-specific bilingual dictionaries. These can serve as technical-term dictionaries for each field in a machine translation apparatus or the like.

[0083] <<Specific Example 3>> <Configuration> FIG. 11 is a block diagram of Specific Example 3 of the translation pattern creating apparatus of the present invention. This example can create translation patterns even without a bilingual dictionary; their translation unit is a fixed expression that occurs several times in a document. Suppose, for example, that 「シンボリック」 (symbolic) and 「リンク」 (link) occur together several times, and "symbolic" and "link" always occur together in the corresponding English sentences. A translation pattern can then be created that pairs 「シンボリックリンク」 with "symbolic link" as a single translation unit.

[0084] Specific Example 3 consists of an input/output unit 310, a resource management unit 330 and a translation pattern extraction unit 340. The input/output unit 310 and the resource management unit 330 follow the configuration of Specific Example 1. The translation pattern extraction unit 340 has the same functions as the bilingual word extraction units 120 and 220 of Specific Examples 1 and 2, with one difference. In place of the bilingual word estimation units 122 and 222, it has a translation pattern estimation unit 342, which consists of two parts:

- a bilingual word estimation unit 342a, which estimates correspondences between words of the two languages;
- a co-occurrence word estimation unit 342b, which estimates co-occurrence relations between words within one language.

The translation pattern generation unit 343 has functions corresponding to the bilingual word generation units 123 and 223 of Specific Examples 1 and 2.

[0085] The remaining components in FIG. 11 have the same functions as their counterparts in Specific Example 1:

- the components of the input/output unit 310 correspond to those of the input/output unit 110;
- the morphological analysis unit 341 of the translation pattern extraction unit 340 corresponds to the morphological analysis unit 121;
- the components of the resource management unit 330 correspond to those of the resource management unit 130.

[0086] <Operation> FIG. 12 is a flowchart of the process by which the apparatus creates translation patterns from an English-Japanese bilingual document and displays them. In this example too, the operation is described using the case where the user inputs the bilingual document of FIG. 27.

[0087] The input of the bilingual document (step S601) and the Japanese and English morphological analysis (step S602) are the same as in Specific Example 1, so their description is omitted. The morphological analysis of step S602 stores each analysis result in the bilingual document storage unit 332. The bilingual word estimation unit 342a of the translation pattern estimation unit 342 then performs the bilingual word estimation processing (step S603). As in Specific Example 1, this extracts pairs of Japanese and English words that are translation candidates.

[0088] Step S604 checks whether candidate word pairs exist. If they do, the co-occurrence word estimation unit 342b performs the co-occurrence word estimation processing (step S605), and the translation pattern word order determination processing is performed (step S606). The flow then returns to the bilingual word estimation processing of step S603. The bilingual word estimation and the co-occurrence word estimation are repeated until no more word pairs are extracted. When step S604 finds no word pairs, the flow moves to the translation pattern generation processing by the translation pattern generation unit 343 (step S607). The translation patterns are then shaped into the output format that the user specified with the parameter setting unit 313 (step S607). Finally, the result is displayed on the output unit 312 (step S608), and the processing ends.

[0089] The algorithm from the bilingual word estimation processing onward is described below in detail, with examples. FIGS. 13, 14 and 15 are flowcharts of the algorithm of the bilingual word estimation processing (steps S603 and S604) performed by the bilingual word estimation unit 342a.

[0090] First, the pattern creation counter n is set to 1 (step S701). The pattern creation counter records how many times the bilingual word estimation processing has been repeated. It is used to manage the bilingual sentence tables (SentJn, SentEn) described below.

[0091] The word headings of content words are taken from the standard-form results of the morphological analysis (FIG. 29 and FIGS. 30(1) and 30(2)). Content words are Japanese words whose candidate parts of speech include 名 (noun), 動 (verb), 形 (adjective or adjectival verb) or 副 (adverb), and English words whose candidate parts of speech include n (noun), v (verb) or adv (adverb). The headings are stored in the bilingual sentence table of each language: SentJn[line number] = {Japanese sentence word list} and SentEn[line number] = {English sentence word list}, where n is the pattern creation counter, here n = 1 (step S702). Assume the result is the bilingual sentence table shown in FIG. 31.

[0092] Next, the occurrences of each word in the bilingual sentence tables are counted. The counts are stored in word tables indexed by word heading: FreqJ[Japanese word] = number of occurrences of the Japanese word, and FreqE[English word] = number of occurrences of the English word (step S703). Assume the resulting word table is that of FIG. 32.

[0093] Further, the occurrences of each combination of a Japanese word and an English word (a word pair) in the bilingual sentences are counted. The count and the line numbers of the sentences in which the pair occurs are stored in a word pair table indexed by word pair (step S704):

- FreqJE[Japanese word, English word] = number of times the Japanese word and the English word occur together in a pair of bilingual sentences;
- SentJE[Japanese word, English word] = {list of line numbers}.
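Steps S703 and S704 can be sketched in Python with `collections.Counter`. The sketch assumes the sentence tables are dicts from line number to word list. It also assumes that a word or word pair is counted at most once per line. That reading is our assumption, not stated in the text; it keeps FreqJE no larger than FreqJ or FreqE, so the degrees of [0094] stay between 0 and 1. `count_tables` is a hypothetical name.

```python
from collections import Counter, defaultdict


def count_tables(sent_j, sent_e):
    """Build FreqJ, FreqE, FreqJE and SentJE (steps S703-S704).

    sent_j, sent_e: dicts mapping line number -> word list for aligned sentences.
    Assumption: each word or word pair is counted at most once per line, so every
    count is a number of sentences.
    """
    freq_j, freq_e, freq_je = Counter(), Counter(), Counter()
    sent_je = defaultdict(list)
    for line in sorted(sent_j):
        jws = set(sent_j[line])
        ews = set(sent_e.get(line, ()))
        freq_j.update(jws)
        freq_e.update(ews)
        for jw in jws:
            for ew in ews:
                freq_je[(jw, ew)] += 1
                sent_je[(jw, ew)].append(line)  # lines are visited in ascending order
    return freq_j, freq_e, freq_je, dict(sent_je)
```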

[0094] Next, the word correspondence degrees of each word pair in the word pair table are calculated (step S705). The three expressions shown in step S705 of FIG. 14 are used, where JW is a Japanese word and EW an English word:

$$\mathrm{SimJE}[JW, EW] = \frac{2\,\mathrm{FreqJE}[JW, EW]}{\mathrm{FreqJ}[JW] + \mathrm{FreqE}[EW]} \quad \text{(Japanese-English correspondence degree)}$$

$$\mathrm{SimJ}[JW, EW] = \frac{\mathrm{FreqJE}[JW, EW]}{\mathrm{FreqJ}[JW]} \quad \text{(Japanese correspondence degree)}$$

$$\mathrm{SimE}[JW, EW] = \frac{\mathrm{FreqJE}[JW, EW]}{\mathrm{FreqE}[EW]} \quad \text{(English correspondence degree)}$$

The values are added to the word pair table. The Japanese-English correspondence degree SimJE[JW, EW] must, however, be at least the threshold x (step S706). If it is not, SimJE, SimJ and SimE are all set to 0. Each correspondence degree takes a value from 0 to 1, and word pairs that are translations of each other take values closer to 1. The user can change the threshold x as desired with the parameter setting unit 313. Raising it lowers the error rate of the word candidates, but also reduces the number of translation patterns obtained. In this example, x = 0.1. FIG. 42 shows an example of the word pair table in Specific Example 3.
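The three expressions and the threshold rule of step S706 translate directly into the short Python sketch below. It assumes the count tables are dict-like mappings as produced in steps S703 and S704. The function name and the tuple layout (SimJE, SimJ, SimE) are our own choices.

```python
def correspondence(freq_j, freq_e, freq_je, x=0.1):
    """Return {(jw, ew): (SimJE, SimJ, SimE)}; all three are 0 when SimJE < x."""
    sims = {}
    for (jw, ew), f in freq_je.items():
        sim_je = 2 * f / (freq_j[jw] + freq_e[ew])  # Dice-style Japanese-English degree
        if sim_je < x:
            sims[(jw, ew)] = (0.0, 0.0, 0.0)        # below threshold: every degree zeroed
        else:
            sims[(jw, ew)] = (sim_je, f / freq_j[jw], f / freq_e[ew])
    return sims
```

Take some hypothetical counts: a pair seen together in 3 sentences, with the Japanese word in 3 sentences and the English word in 5. It gets SimJE = 6/8 = 0.75, SimJ = 1.0 and SimE = 0.6.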

[0095] Next, each Japanese word in the index of the word pair table is examined in turn. Among the English words paired with it, the one with the highest Japanese-English correspondence degree (SimJE[Japanese word, English word]) is extracted. Then, among the Japanese words paired with that English word, the one with the highest Japanese-English correspondence degree is found. If it is the original Japanese word, the word pair is estimated to be a translation candidate (step S707). It is then stored in the word pair candidate list (CandJE[Japanese word, English word]) (step S708). If several word pairs share the highest correspondence degree, all of them are stored.
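The selection rule of steps S707 and S708 keeps a pair only when each word is the other's best match by SimJE. The sketch below takes a dict of SimJE values and keeps ties, as the text specifies. Ignoring pairs whose SimJE was zeroed by the threshold is our assumption. The function name is hypothetical.

```python
def mutual_best_pairs(sim_je):
    """Return the set of (jw, ew) pairs that are each other's best match by SimJE.

    sim_je: {(jw, ew): SimJE}. Ties are kept; pairs with SimJE == 0 are ignored.
    """
    best_for_j, best_for_e = {}, {}
    for (jw, ew), s in sim_je.items():
        if s <= 0:
            continue
        best_for_j[jw] = max(best_for_j.get(jw, 0.0), s)  # best score seen for this Japanese word
        best_for_e[ew] = max(best_for_e.get(ew, 0.0), s)  # best score seen for this English word
    return {(jw, ew) for (jw, ew), s in sim_je.items()
            if s > 0 and s == best_for_j[jw] and s == best_for_e[ew]}
```

In the hypothetical test data, "owner" is best matched by 「リンク」, but 「リンク」 is best matched by "link". As a result, neither "owner" nor 「所有者」 produces a candidate pair.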

[0096] In the case of FIG. 42, for example, the English words that form an index with the Japanese word 「リンク」 (link) are "link", "symbolic", "owner" and "change". Of these, the one with the highest Japanese-English correspondence degree (SimJE) is "link" (SimJE = 0.75). Conversely, the Japanese word with the highest correspondence degree for the English word "link" is 「リンク」. The two are therefore estimated to be a translation candidate and stored in the word pair candidate list as CandJE[リンク, link] (step S708). If translation candidates have been extracted (step S709), two further processes follow:

- the co-occurrence word estimation processing by the co-occurrence word estimation unit 342b (step S605 in FIG. 12);
- the translation pattern word order determination processing (step S606 in FIG. 12).

These two processes are described below.

[0097] FIG. 16 is a flowchart of the algorithm of the co-occurrence word estimation processing and the translation pattern word order determination processing. In outline, both processes are applied to each word pair extracted by the bilingual word estimation processing of FIGS. 13 to 15. When no word pairs remain, the pattern creation counter is advanced by one (step S808), and the flow returns to the bilingual word estimation processing (step S702 in FIG. 13). The details follow.

[0098] First, the word pair candidate list (CandJE[Japanese word, English word]) created by the bilingual word estimation processing is read (step S801). One pair of a Japanese word and an English word is then taken out of it (step S802).

[0099] FIG. 43 shows the word pair candidate list. Here, steps S803 to S807 are explained using the element "リンク link" of that list as an example. First, focusing on the English word "link", all Japanese words that are translation candidates for "link" in the word pair table are extracted (step S803). These are every JW registered in the word pair table as SimJE[JW, "link"] with SimJE[JW, "link"] != 0. For "link", the Japanese words JW extracted from FIG. 42 are {シンボリック, テンポラリ, リンク} (symbolic, temporary, link).

[0100] Step S803 then narrows these down to the JW whose word pair matches the original word pair ("リンク link") in two respects: the English correspondence degree (SimE[JW, English word]) and the line number list (SentJE[JW, English word]). That is, it extracts every JW satisfying both SimE[JW, "link"] = SimE["リンク", "link"] and SentJE[JW, "link"] = SentJE["リンク", "link"]. From FIG. 42, {シンボリック, リンク} is extracted. "リンク link" has SimE = 0.63 and SentJE = {4, 153, 154, …}. The word pairs with the same values are "シンボリック link" and the original pair "リンク link". The Japanese words extracted are therefore {シンボリック, リンク}.

[0101] Next, focusing on the Japanese word 「リンク」, all English words that are translation candidates for 「リンク」 in the word pair table are extracted (step S804). These are every EW registered in the word pair table as SimJE[Japanese word, EW] with SimJE[Japanese word, EW] != 0. For 「リンク」, the English words extracted from FIG. 42 are {link, owner, change, symbolic}.

[0102] Step S804 then narrows these down to the EW whose word pair matches the original word pair in two respects: the Japanese correspondence degree (SimJ["リンク", EW]) and the line number list (SentJE["リンク", EW]). That is, it extracts every EW satisfying both SimJ["リンク", EW] = SimJ["リンク", "link"] and SentJE["リンク", EW] = SentJE["リンク", "link"]. From FIG. 42, {link, symbolic} is selected. "リンク link" has SimJ = 0.92 and SentJE = {4, 153, 154, …}. The word pairs with the same values are "リンク symbolic" and the original pair "リンク link". The English words extracted are therefore {link, symbolic}.
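Steps S803 and S804 can be sketched as two set comprehensions over the word pair table. The sketch assumes SimJ, SimE and SentJE are dicts keyed by (Japanese word, English word). In the test, the values 0.63, 0.92 and lines 4, 153, 154 come from [0100] to [0102], with the line list truncated to the three numbers given. The other entries are hypothetical.

```python
def cooccurring_words(pair, sim_j, sim_e, sent_je):
    """Sketch of steps S803-S804 for one candidate pair (jw0, ew0).

    sim_j, sim_e: {(jw, ew): SimJ or SimE}; sent_je: {(jw, ew): line-number list}.
    Returns (japanese_words, english_words) that behave exactly like the pair.
    """
    jw0, ew0 = pair
    # S803: Japanese words paired with ew0 having the same SimE and the same lines
    ja = {jw for (jw, ew), s in sim_e.items()
          if ew == ew0 and s == sim_e[pair] and sent_je[(jw, ew)] == sent_je[pair]}
    # S804: English words paired with jw0 having the same SimJ and the same lines
    en = {ew for (jw, ew), s in sim_j.items()
          if jw == jw0 and s == sim_j[pair] and sent_je[(jw, ew)] == sent_je[pair]}
    return ja, en
```

Exact equality on floats is workable here because, within S803, every SimE shares the denominator FreqE[ew0]. Within S804, every SimJ likewise shares FreqJ[jw0]. Equal values therefore come from equal co-occurrence counts. Pairs zeroed by the threshold never match, because the candidate pair's own values are nonzero.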

[0103] Next, the translation pattern word order determination processing (steps S805 and S806) is performed. From the word pair table, one line number is taken (step S805). It is one element of the list SentJE[Japanese word, English word], i.e., a bilingual sentence in which the Japanese and English words of the candidate pair occur. In step S806, the order of the Japanese candidate words extracted by the co-occurrence word estimation processing is then determined from the Japanese bilingual sentence table (SentJ1[line number]). The order of the English candidate words is likewise determined from the English bilingual sentence table (SentE1[line number]). If another word string intervenes between the candidate words, it is represented by the variable "*". For "リンク link", line number 4 is obtained from the word pair table (see FIG. 42). The Japanese bilingual sentence table (SentJ1[4]) then gives 「シンボリックリンク」, and the English bilingual sentence table (SentE1[4]) gives "symbolic link".
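The word order step S806 can be illustrated with a small Python function. Its input is one line of a sentence table and the set of co-occurring words. It returns the words in sentence order, and any run of intervening words becomes a single "*". Joining tokens with "_" mirrors the notation of [0104]. The function name and the `sep` parameter are our own.

```python
def order_pattern(sentence, words, sep="_"):
    """Sketch of step S806: order `words` as they appear in one sentence-table line.

    Non-pattern words lying between pattern words are collapsed into one "*".
    Returns "" if none of the words occur in the sentence.
    """
    positions = [i for i, w in enumerate(sentence) if w in words]
    if not positions:
        return ""
    out = []
    for w in sentence[positions[0]:positions[-1] + 1]:
        if w in words:
            out.append(w)
        elif out[-1] != "*":  # start a new gap; consecutive gap words share one "*"
            out.append("*")
    return sep.join(out)
```

The slice always starts at a pattern word, so `out` is non-empty before the first gap word is examined.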

[0104] The result is stored in the pattern table (Pattern[Japanese pattern, English pattern] = {list of line numbers}). In this example, Pattern[シンボリック＿リンク, Symbolic link] = {4, 153, 154, …}, where "＿" marks the space between words in a multi-word pattern. FIG. 44 shows an example of the resulting pattern table.

[0105] After the above processing has been performed for every element of the word pair candidate list, 1 is added to the pattern creation counter n (step S807). The flow then returns to the bilingual word estimation processing of FIG. 13. When n is 2 or more, however, the pattern table (Pattern[Japanese pattern, English pattern]) is consulted while the morphological analysis results are read into the bilingual sentence table. If a translation pattern stored in the pattern table occurs in a bilingual sentence, the words making up that pattern are not stored in the bilingual sentence table.

[0106] Suppose, for example, that the pattern creation counter is 2 and the pattern table of FIG. 44 exists. The entries stored in FIG. 44, "情報 information" and "指示＿与える answer", are then not stored in the bilingual sentence table. The bilingual word estimation and co-occurrence word estimation are repeated on the newly created bilingual sentence table. When the bilingual word estimation yields no more candidate word pairs, the flow moves to the translation pattern generation processing (step S711 in FIG. 15). As in Specific Example 1, the repetition makes it possible to extract word pairs that the previous round of bilingual word estimation could not.

[0107] The translation pattern generation processing of step S711 shapes the pattern table (Pattern[Japanese pattern, English pattern]) created by the bilingual word estimation processing. It does so according to the request the user set with the parameter setting unit 312.

For example, the user may request "output translation patterns in the form Japanese pattern vs. English pattern". The translation pattern generation unit 343 then receives that instruction from the parameter setting unit 312 and outputs the shaped contents of the pattern table (Pattern[Japanese pattern, English pattern]).

If the user instead gives the instruction "output with part-of-speech information", the translation pattern generation unit 343 receives it from the parameter setting unit 312 and proceeds as follows:

1. It takes one line number from the line number list stored in the pattern table.
2. It refers to the bilingual sentence table for that line to find the word's position among the content words.
3. It retrieves the word at that position from the morphological analysis results.
4. It outputs the word together with the part of speech stored there.

The user can check whether the result is correct and correct it if necessary.

[0108] <Effects> As described above, Specific Example 3 provides the following effect in addition to those of Specific Example 1. It extracts co-occurrence relations within one language. It can therefore acquire not only the bilingual words in a bilingual document, but also translation patterns whose translation unit is a fixed expression occurring in that document.

[0109] <<Specific Example 4>> <Configuration> FIG. 17 is a block diagram of Specific Example 4 of the translation pattern creating apparatus of the present invention. Like Specific Example 3, this example can create translation patterns whose translation unit is a fixed expression occurring several times in a document. In Specific Example 4, however, the fixed expressions are limited to consecutive words.

[0110] The configuration of Specific Example 4 follows that of Specific Example 3, except for the function of the translation pattern estimation unit 344. From the bilingual document, this unit obtains the word string correspondence degree of arbitrary word strings, each consisting of several consecutive words. On the basis of that degree, it extracts the word pairs that become translation patterns. The other components are the same as in Specific Example 3. Corresponding parts are therefore given the same reference numerals, and their description is omitted.

[0111] <Operation> FIG. 18 is a flowchart of the process by which the apparatus creates translation patterns from an English-Japanese bilingual document and displays them.

The input of the bilingual document (step S901) and the Japanese and English morphological analysis (step S902) are the same as in Specific Example 1, so their description is omitted. The morphological analysis of step S902 stores each analysis result in the bilingual document storage unit 332. The translation pattern estimation unit 344 then performs the translation pattern estimation processing (step S903). This extracts pairs of Japanese and English word strings that are translation pattern candidates.

If step S904 finds that such pairs exist, the flow returns to the translation pattern estimation processing of step S903. This is repeated until no more word string pairs are extracted. When step S904 finds none, the flow moves to the translation pattern generation processing by the translation pattern generation unit 343 (step S905).

The translation patterns are then shaped into the output format that the user specified with the parameter setting unit 313 (step S905). Finally, the result is displayed on the output unit 312 (step S906), and the processing ends.

[0112] The algorithm from the translation pattern estimation processing onward is described in detail below. FIGS. 19 and 20 are flowcharts of the algorithm of the translation pattern estimation processing (steps S903 and S904) performed by the translation pattern estimation unit 344.

[0113] First, a threshold f_min on the number of occurrences of Japanese and English words is set (step S1001). Either the user sets it with the parameter setting unit 312, or the apparatus sets a suitable value. The suitable value depends on the type of input text; according to experiments, f_min = 10 is appropriate on average.

【０１１４】Next, the pattern creation counter n is set to 1 (step S1002). The counter records how many times the iterative word string estimation has been performed and is used to manage the bilingual sentence tables (SentJn, SentEn) described later.

【０１１５】Next, a bilingual sentence table storing the word strings that occur more than once is created from the standard-form results of the morphological analysis (FIG. 29 (1), FIG. 30 (2)) (step S1003), as follows.

【０１１６】The occurrences of each Japanese and English content word (independent word) in the standard-form results are counted, and only words that occur two or more times are kept. Next, two-word sequences beginning with each kept word are formed, their occurrences are counted, and those occurring two or more times are kept. Three-word sequences beginning with a kept two-word sequence are then processed in the same way, and so on: the sequence length is extended step by step, every sequence occurring two or more times is kept, and the kept sequences are stored in the bilingual sentence table. (Hereafter, a kept sequence of n consecutive words is simply called a word string.)
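The prefix-growing extraction of step S1003 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the patent's implementation: each sentence is assumed to be already reduced to a list of content-word headwords in standard form, each language is processed separately, and the function name `repeated_word_strings` is hypothetical. It returns occurrence counts only and does not build the per-line bilingual sentence table.

```python
from collections import Counter


def repeated_word_strings(sentences, min_count=2):
    """Hypothetical sketch of step S1003 for one language.

    sentences: list of token lists (content-word headwords, standard form).
    Returns {word string (tuple): occurrence count} for every string of
    consecutive words that occurs at least min_count times.
    """
    # Length 1: single words occurring at least min_count times.
    counts = Counter((w,) for s in sentences for w in s)
    kept = {k: c for k, c in counts.items() if c >= min_count}
    result = dict(kept)
    n = 1
    while kept:
        n += 1
        # Only extend positions whose length-(n-1) prefix was kept.
        counts = Counter(
            tuple(s[i:i + n])
            for s in sentences
            for i in range(len(s) - n + 1)
            if tuple(s[i:i + n - 1]) in kept
        )
        kept = {k: c for k, c in counts.items() if c >= min_count}
        result.update(kept)
    return result
```

Growing only from strings that already passed the threshold keeps the search small: a three-word string cannot occur twice unless its two-word prefix also did.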

【０１１７】Next, the occurrences of each word string in the bilingual sentence table are counted and stored in word string tables indexed by the word string (step S1004): FreqJ[Japanese word string] = number of occurrences of the Japanese word string, and FreqE[English word string] = number of occurrences of the English word string.

【０１１８】Further, the occurrences of each combination of a Japanese word string and an English word string in the same bilingual sentence pair (a word string pair) are counted and stored, together with the line numbers of the sentences in which the pair occurs, in word string pair tables indexed by the pair (step S1005): FreqJE[Japanese word string, English word string] = number of bilingual sentence pairs in which the two word strings co-occur, and SentJE[Japanese word string, English word string] = list of the line numbers where they co-occur.
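Steps S1004 and S1005 amount to frequency counting over aligned sentence pairs. The sketch below is illustrative only: the function name `build_tables` is hypothetical, the input is assumed to be the word strings found on each line of the Japanese and English sentence tables, and it assumes each word string is counted at most once per sentence, which the text does not state.

```python
from collections import Counter, defaultdict


def build_tables(sent_j, sent_e):
    """Hypothetical sketch of steps S1004-S1005.

    sent_j, sent_e: dicts mapping line number -> list of word strings
    (tuples) found in that Japanese / English sentence.
    Assumption: a word string is counted at most once per sentence.
    """
    freq_j, freq_e, freq_je = Counter(), Counter(), Counter()
    sent_je = defaultdict(list)  # (J string, E string) -> line numbers
    for line in sorted(sent_j.keys() & sent_e.keys()):
        js, es = set(sent_j[line]), set(sent_e[line])
        freq_j.update(js)          # FreqJ
        freq_e.update(es)          # FreqE
        for j in js:
            for e in es:
                freq_je[(j, e)] += 1          # FreqJE
                sent_je[(j, e)].append(line)  # SentJE
    return freq_j, freq_e, freq_je, dict(sent_je)
```

Every Japanese string on a line is paired with every English string on the same line, so FreqJE counts co-occurrence in a sentence pair rather than any word-level alignment.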

【０１１９】Next, the degree of correspondence SimJE[Japanese word string, English word string] is computed for each word string pair stored in the word string pair table (step S1006), using equation (1006a) in FIG. 20. A degree of correspondence is computed only for pairs that satisfy equation (1006b), which uses the occurrence threshold f_min set in step S1001.
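Equations (1006a) and (1006b) appear only in FIG. 20. The sketch below assumes that (1006a) is the Dice-style measure quoted in paragraph 【０１４９】, 2*FreqJE / (FreqJ + FreqE), and that (1006b) requires both FreqJ and FreqE to reach f_min; both are assumptions, and the function name `correspondence` is hypothetical.

```python
def correspondence(freq_j, freq_e, freq_je, f_min):
    """Hypothetical sketch of step S1006.

    Assumed (1006a): SimJE = 2 * FreqJE / (FreqJ + FreqE).
    Assumed (1006b): FreqJ >= f_min and FreqE >= f_min.
    Returns {(J string, E string): SimJE} for the pairs that qualify.
    """
    sim = {}
    for (j, e), both in freq_je.items():
        fj, fe = freq_j.get(j, 0), freq_e.get(e, 0)
        if fj >= f_min and fe >= f_min:      # assumed form of (1006b)
            sim[(j, e)] = 2 * both / (fj + fe)
    return sim
```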

【０１２０】Next, for each Japanese word string in the index of the word string pair table, the English word string with the highest degree of correspondence SimJE among all English word strings paired with it is selected. If, conversely, the Japanese word string with the highest degree of correspondence to that English word string is the original Japanese word string, the pair is accepted as a bilingual pattern (step S1007) and stored in the bilingual pattern list (step S1008).
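The acceptance rule of steps S1007 and S1008 keeps a pair only when each side is the other's best match. A minimal sketch, with the hypothetical name `mutual_best_pairs`; ties are broken in favour of the first pair seen, which is an assumption the text does not settle.

```python
def mutual_best_pairs(sim):
    """Hypothetical sketch of steps S1007-S1008.

    sim: {(J string, E string): SimJE}. Keeps (J, E) only if E is J's
    best-scoring partner and J is E's best-scoring partner.
    Assumption: on ties, the first pair encountered wins.
    """
    best_e, best_j = {}, {}
    for (j, e), s in sim.items():
        if j not in best_e or s > sim[(j, best_e[j])]:
            best_e[j] = e
        if e not in best_j or s > sim[(best_j[e], e)]:
            best_j[e] = j
    return [(j, e) for j, e in best_e.items() if best_j.get(e) == j]
```

With romanized stand-ins for the example in the next paragraph, `mutual_best_pairs({("intanetto adoresu", "internet address"): 0.9, ("intanetto", "internet address"): 0.6, ("intanetto", "internet"): 0.5})` keeps only the "intanetto adoresu"/"internet address" pair: "intanetto" prefers "internet address", but "internet address" prefers "intanetto adoresu".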

【０１２１】For example, suppose the English word string with the highest degree of correspondence to the Japanese word string 「インタネットアドレス」 ("Internet address") is "internet address"; the pair 「インタネットアドレス」/"internet address" is then extracted. Suppose also that the English word string with the highest degree of correspondence to the Japanese word string 「インタネット」 ("Internet") is likewise "internet address"; the pair 「インタネット」/"internet address" is then extracted as well. Next, the Japanese word string with the highest degree of correspondence to the English word string "internet address" is determined. If it is 「インタネットアドレス」, then, because it has already been confirmed that the best English match for 「インタネットアドレス」 is "internet address", the pair 「インタネットアドレス」/"internet address" is accepted as a bilingual pattern and stored in the bilingual pattern list. The pair 「インタネット」/"internet address", by contrast, is rejected at this point.

【０１２２】If any word string pair was stored in the bilingual pattern list (step S1009), the pattern creation counter n is incremented by 1 (step S1010) and the process returns to step S1003 in FIG. 19. When n is 2 or greater, however, step S1003 consults the pattern table ([Japanese pattern, English pattern]) before storing word strings in the bilingual sentence table: if a word string pair stored in the pattern table occurs in a bilingual sentence, word strings that contain it are not stored in the bilingual sentence table. Processing then continues as for n = 1.

【０１２３】If, on the other hand, no word string pair was stored in the bilingual pattern list (step S1009), the occurrence threshold f_min is reset to its current value minus 1 (step S1011).

【０１２４】Lowering the threshold gradually increases the number of Japanese and English word strings that are candidates for bilingual patterns. Because occurrence frequency is reflected in accuracy (the more often two strings occur together as a pair, the more reliable their correspondence), extracting bilingual patterns in descending order of frequency amounts to extracting them in descending order of quality. The user can therefore trade off the quality of the bilingual patterns against the number produced by changing the lower bound z (described below) in the parameter setting unit 313.

【０１２５】Next, the threshold f_min is checked (step S1012). If f_min has fallen to or below a value z (z = 2 in FIG. 20), the process moves on to translation pattern generation. In other words, z is the minimum occurrence requirement: a larger z yields more accurate bilingual patterns, but fewer of them.

【０１２６】If f_min is still greater than z, the process returns to step S1002 in FIG. 19, the pattern creation counter is reset to 1, and the subsequent steps are repeated.
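The two nested loops of FIGS. 19 and 20 can be sketched as the control flow below. The inner pass (steps S1003 to S1008) is abstracted as a hypothetical callback `extract_round`; the function name `estimate_patterns` and the callback signature are illustrative assumptions, while the default values f_min = 10 and z = 2 come from the text.

```python
def estimate_patterns(extract_round, f_min=10, z=2):
    """Hypothetical sketch of the control flow of FIGS. 19-20.

    extract_round(f_min, n, patterns) stands in for one pass of steps
    S1003-S1008 and must return the list of newly accepted
    (Japanese, English) pairs, or an empty list when none are found.
    """
    patterns = []
    while True:
        n = 1                                      # S1002: reset counter
        while True:
            new = extract_round(f_min, n, patterns)  # S1003-S1008
            if not new:                            # S1009: nothing stored
                break
            patterns.extend(new)
            n += 1                                 # S1010: next round
        f_min -= 1                                 # S1011: relax threshold
        if f_min <= z:                             # S1012: floor reached
            return patterns
```

Because the check in step S1012 follows the decrement, the inner loop runs at thresholds 10 down to 3 when z = 2; the threshold value z itself is never used for extraction.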

【０１２７】The subsequent translation pattern generation is the same as in specific example 3: the translation patterns obtained by translation pattern estimation are shaped and output according to the requirements the user set in the parameter setting unit 313. The user can then check whether the results are correct and correct them if necessary.

【０１２８】FIG. 45 illustrates the extracted translation patterns. As the four translation patterns containing 「インタネット」 ("Internet") show, several translation patterns related to a single word can be extracted. Moreover, even within a single bilingual document the translation of a word can differ depending on the word that follows it, and translation patterns such as 「営業秘密：trade secret」 and 「営業時間：business hour」 (where 営業 is rendered as "trade" in one and "business" in the other) are also extracted correctly.

【０１２９】<Effects> As described above, this specific example, like specific example 3, can obtain not only bilingual word pairs from the bilingual document but also translation patterns whose translation unit is a fixed phrase appearing in that document.

【０１３０】<<Specific Example 5>> <Configuration> FIG. 21 is a block diagram of specific example 5 of the translation pattern creation apparatus. Here, an existing bilingual dictionary is used when creating translation templates in bilingual sentence form in which some words or phrases have been turned into replaceable variables. The feature of this example is that, by giving the system an existing bilingual dictionary together with a bilingual document in the same field as, or with the same characteristics as, the documents the user wants to translate, the existing dictionary can be tuned into one that yields translations equivalent to those in the bilingual document. Furthermore, because translation patterns consisting of several words can also be acquired, not only word-to-word bilingual dictionaries but also existing idiom dictionaries and the like can be tuned. In other words, this specific example combines the bilingual word estimation using a bilingual dictionary of specific example 2 with the co-occurrence word estimation of specific example 3.

【０１３１】Specific example 5 consists of an input/output unit 410, a resource management unit 430, and a translation pattern extraction unit 440. These follow specific example 3, except that the resource management unit 430 contains a bilingual dictionary 434, which the translation pattern estimation unit 442 of the translation pattern extraction unit 440 uses. In the input/output unit 410, moreover, the input unit 411 accepts a bilingual dictionary as well as a bilingual document, and the output unit 412 displays the existing bilingual dictionary after it has been tuned with the input bilingual document. This bilingual dictionary may contain not only word-to-word correspondences but also correspondences between multi-word translation patterns such as those described in specific example 3. In all other respects, the input/output unit 410, resource management unit 430, and translation pattern extraction unit 440 are configured in the same way as the input/output unit 310, resource management unit 330, and translation pattern extraction unit 340 of specific example 3.

【０１３２】<Operation> FIG. 22 is a flowchart of the translation pattern creation algorithm of this specific example. Operation basically follows specific example 3, with three differences: (1) at input time, the bilingual dictionary to be tuned is read together with the bilingual document; (2) the bilingual dictionary is used in bilingual word estimation; and (3) bilingual dictionary tuning is performed during translation pattern generation. The description below focuses on these differences.

【０１３３】In this specific example, too, the case is described in which a bilingual dictionary (Dic[Japanese pattern, English pattern], values omitted) is tuned with the bilingual document used in specific example 1 (see FIG. 27). FIG. 46 illustrates that bilingual dictionary.

【０１３４】In FIG. 22, input of the bilingual document (step S1101) and Japanese/English morphological analysis (step S1102) are the same as in the preceding specific examples and are not described again. In step S1101, however, the user also enters, through the input unit 411, the bilingual dictionary to be tuned along with the bilingual document. This dictionary is stored in the bilingual dictionary 434 of the resource management unit 430.

【０１３５】After the morphological analysis (step S1102) has stored each analysis result in the bilingual document storage unit 432, the bilingual word estimation unit 442a in the translation pattern estimation unit 442 performs bilingual word estimation (step S1103), using the bilingual dictionary 434 to extract Japanese and English word pairs that are translation candidates. If a candidate word pair exists (step S1104), the process proceeds to co-occurrence word estimation by the co-occurrence word estimation unit 442b (step S1105) and then to translation pattern word order determination (step S1106), after which it returns to bilingual word estimation in step S1103. Bilingual word estimation and co-occurrence word estimation are repeated until no more word pairs can be extracted. When none remain, control passes to the translation pattern generation unit 443, which tunes the bilingual dictionary in the format the user specified in the parameter setting unit 413 (step S1107). The result is displayed on the output unit 412 (step S1108), and the process ends.

【０１３６】The algorithm from bilingual word estimation onward is described in detail below, with an example. FIGS. 23, 24, and 25 are flowcharts of the bilingual word estimation algorithm (steps S1103 and S1104 in FIG. 22) executed by the bilingual word estimation unit 442a.

【０１３７】First, the pattern creation counter n is set to 1 (step S1201). The counter records how many times the iterative bilingual word estimation has been performed and is used to manage the bilingual sentence tables (SentJn, SentEn) described later.

【０１３８】Next, if a translation pattern stored in the bilingual dictionary (Dic[Japanese pattern, English pattern]) occurs in a bilingual sentence of the standard-form morphological analysis results (FIGS. 29 and 30 (1), (2)), that pattern is stored in the pattern table (Pattern[Japanese pattern, English pattern] = {list of line numbers where it occurs}) (step S1202). From the bilingual dictionary of FIG. 46 and the bilingual sentence table of FIG. 31, the pattern table illustrated in FIG. 47 is created.

【０１３９】A translation pattern that is stored in the bilingual dictionary 434 and also occurs in the bilingual sentences is regarded as a more reliable correspondence. Excluding such patterns from counting raises the word correspondence values of the remaining word pairs that are genuine translations of each other, so more word pairs can be extracted than without the bilingual dictionary 434. In addition, this processing detects the translation patterns that both appear in the bilingual document and are stored in the bilingual dictionary 434; giving priority to these detected patterns helps adapt the dictionary to the bilingual document.

【０１４０】Next, from the standard-form morphological analysis results, the headwords of content words (Japanese words whose candidate parts of speech include 名 (noun), 動 (verb), 形 (adjective or adjectival verb), or 副 (adverb), and English words whose candidate parts of speech include n (noun), v (verb), adj (adjective), or adv (adverb)) are read into the bilingual sentence table of each language (SentJn[line number] = {list of words in the Japanese sentence}, SentEn[line number] = {list of words in the English sentence}, where n is the pattern creation counter, here n = 1) (step S1203). When the results are read into the bilingual sentence tables, however, the pattern table (Pattern[Japanese pattern, English pattern]) is consulted: if a translation pattern stored in the pattern table occurs in a bilingual sentence, the words that make up that pattern are not stored in the bilingual sentence table.
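Steps S1202 and S1203 can be sketched as two small functions: one records where each dictionary pattern occurs in both halves of a sentence pair, the other leaves the words of matched patterns out of those lines. This is an illustrative sketch only: the names `contains`, `build_pattern_table`, and `strip_pattern_words` are hypothetical, the dictionary is assumed to be a list of (Japanese word tuple, English word tuple) pairs, the words are romanized placeholders, and removing every occurrence of a matched word on the line is a simplification.

```python
def contains(seq, sub):
    """True if the list sub occurs as a contiguous run inside the list seq."""
    n = len(sub)
    return any(seq[i:i + n] == sub for i in range(len(seq) - n + 1))


def build_pattern_table(dic, sent_j, sent_e):
    """Hypothetical S1202 sketch: for each dictionary pattern, list the lines
    where its Japanese side occurs in the Japanese sentence and its English
    side occurs in the English sentence of the same pair."""
    table = {}
    for jp, ep in dic:
        lines = [ln for ln in sorted(sent_j)
                 if ln in sent_e
                 and contains(sent_j[ln], list(jp))
                 and contains(sent_e[ln], list(ep))]
        if lines:
            table[(jp, ep)] = lines
    return table


def strip_pattern_words(table, sent_j, sent_e):
    """Hypothetical S1203 sketch: drop the words of matched patterns on the
    lines where they matched (simplification: every occurrence is dropped)."""
    out_j = {ln: list(ws) for ln, ws in sent_j.items()}
    out_e = {ln: list(ws) for ln, ws in sent_e.items()}
    for (jp, ep), lines in table.items():
        for ln in lines:
            out_j[ln] = [w for w in out_j[ln] if w not in jp]
            out_e[ln] = [w for w in out_e[ln] if w not in ep]
    return out_j, out_e
```

A pattern enters the table only if both its sides occur in the same sentence pair, so a dictionary entry whose English side is absent from that line (for example a direction entry on a line that only mentions a file) is left out of the count exclusion.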

【０１４１】For example, when the pattern table of FIG. 47 exists, the words of the patterns stored in it, 「始める」/"start", 「指示」/"direction", and 「フィールド」/"field", are not stored in the bilingual sentence table. The remaining steps of bilingual word estimation (steps S1204 to S1210) and co-occurrence word estimation (step S1211) follow specific example 3, so their description is omitted.

【０１４２】Translation pattern generation (step S1212) integrates the translation patterns stored in the bilingual dictionary 434 with those stored in the pattern table.

【０１４３】FIG. 26 is a flowchart of the translation pattern generation algorithm. First, the bilingual dictionary 434 is read (step S1301) and one of its entries (Dic[Japanese pattern, English pattern]) is taken out (step S1302). It is then checked whether either the Japanese pattern or the English pattern of that entry is registered in the pattern table (step S1303); if neither is, nothing is done. If either is registered, and step S1304 finds that the Japanese/English pair itself is also registered in the pattern table, the other translations of these patterns found in the pattern table are stored in the bilingual dictionary 434 and then deleted from the pattern table (step S1305).

【０１４４】If step S1304 finds that the Japanese/English pair is not registered in the pattern table, the entry (Dic[Japanese pattern, English pattern]) is deleted from the bilingual dictionary 434, the other translations are stored in the bilingual dictionary 434, and they are deleted from the pattern table (step S1306).

【０１４５】This processing is applied to every translation pattern in the bilingual dictionary 434, and finally all translation patterns still remaining in the pattern table are stored in the bilingual dictionary 434 (step S1307). As a result, for each translation pattern in the bilingual dictionary 434, a pattern extracted from the bilingual document is stored if one exists; otherwise the original entry of the bilingual dictionary 434 is kept. The translations stored in the bilingual dictionary 434 are therefore better adapted to the bilingual document. Finally, the bilingual dictionary 434 tuned in this way is displayed to the user on the output unit 412; after checking whether it is correct, the user can also correct the result.
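The merge of FIG. 26 can be sketched with the dictionary and the pattern table modelled as sets of (Japanese pattern, English pattern) pairs, ignoring the line-number lists. This is a minimal sketch with a hypothetical function name, using romanized placeholder words. It assumes that steps S1303 and S1304 always look up the original pattern table, so the outcome does not depend on the order in which dictionary entries are visited; the flowchart does not settle this point.

```python
def tune_dictionary(dic, table):
    """Hypothetical sketch of FIG. 26 (steps S1301-S1307).

    dic, table: sets of (japanese_pattern, english_pattern) pairs.
    Assumption: S1303/S1304 consult the original pattern table.
    """
    tuned = set(dic)
    remaining = set(table)
    for j, e in sorted(dic):                        # S1301-S1302
        related = {p for p in table if p[0] == j or p[1] == e}
        if not related:                             # S1303: keep as is
            continue
        others = related - {(j, e)}
        if (j, e) not in table:                     # S1304 -> S1306
            tuned.discard((j, e))                   # drop the old entry
        tuned |= others                             # S1305/S1306: store others
        remaining -= others                         # ... and delete them
    tuned |= remaining                              # S1307: store what is left
    return tuned
```

For example, with `dic = {("eigyou", "sales"), ("jikan", "time"), ("kaisha", "company")}` and `table = {("eigyou", "business"), ("kaisha", "company")}`, the document-derived "eigyou"/"business" replaces "eigyou"/"sales", "jikan"/"time" is untouched because the document says nothing about it, and "kaisha"/"company" is confirmed.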

【０１４６】In specific example 5, translation pattern extraction uses the same processing as specific example 3, but the processing of specific example 4 may be used instead.

【０１４７】<Effects> As described above, specific example 5 provides the following effects in addition to those of specific example 2. By extracting co-occurrence relations within each language, it can obtain not only bilingual word pairs from the bilingual document but also translation patterns whose translation unit is a fixed phrase appearing in that document. Because an existing bilingual dictionary can be used in the process, a highly accurate translation pattern dictionary that makes effective use of both sources of information (the bilingual document and the bilingual dictionary) can be generated.

【０１４８】Not only word-to-word bilingual dictionaries but also bilingual data on idioms and fixed translation patterns can be used for translation pattern extraction, and these data can be tuned so that they reproduce the translations found in the existing bilingual document.

【０１４９】In each of the specific examples above, the word correspondence was computed as

$$\mathrm{SimJE}[JW, EW] = \frac{2\,\mathrm{FreqJE}[JW, EW]}{\mathrm{FreqJ}[JW] + \mathrm{FreqE}[EW]}$$

(JW: Japanese word, EW: English word), but the calculation is not limited to this formula. Any other measure of word correspondence can be used, provided its value becomes higher the closer the number of occurrences of the Japanese word, the number of occurrences of the English word, and the number of times the two words occur in the same bilingual sentence pair are to one another.
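As a worked illustration of this measure (the counts are invented for illustration and are not taken from the patent's data), compare a pair whose frequencies are close with one whose English word is far more frequent:

```latex
\mathrm{FreqJ}=4,\ \mathrm{FreqE}=5,\ \mathrm{FreqJE}=4:\quad
\mathrm{SimJE} = \frac{2 \cdot 4}{4 + 5} = \frac{8}{9} \approx 0.89
\\
\mathrm{FreqJ}=4,\ \mathrm{FreqE}=20,\ \mathrm{FreqJE}=4:\quad
\mathrm{SimJE} = \frac{2 \cdot 4}{4 + 20} = \frac{1}{3} \approx 0.33
```

In both cases the English word co-occurs with the Japanese word four times, but in the second case it also appears in many other sentences, so its correspondence is much lower. Since FreqJE can never exceed FreqJ or FreqE, the value reaches its maximum of 1 only when all three counts are equal.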

【０１５０】In the specific examples above, the source language is Japanese and the target language is English, but the invention is not limited to these languages and can be applied to others. Moreover, the dictionary obtained by the present apparatus and method has no translation direction, so it can be used by both English-to-Japanese and Japanese-to-English machine translation systems.

[Brief description of the drawings]

【図１】本発明の翻訳パターン作成装置の具体例１の構
成図である。FIG. 1 is a configuration diagram of a specific example 1 of a translation pattern creation device of the present invention.

【図２】本発明の翻訳パターン作成装置の具体例１にお
ける翻訳パターン作成処理のフローチャートである。FIG. 2 is a flowchart of a translation pattern creation process in a specific example 1 of the translation pattern creation device of the present invention.

【図３】本発明の翻訳パターン作成装置の具体例１にお
ける対訳単語推定処理のフローチャート（その１）であ
る。FIG. 3 is a flowchart (part 1) of a bilingual word estimation process in a specific example 1 of the translation pattern creation device of the present invention.

【図４】本発明の翻訳パターン作成装置の具体例１にお
ける対訳単語推定処理のフローチャート（その２）であ
る。FIG. 4 is a flowchart (part 2) of a bilingual word estimation process in the specific example 1 of the translation pattern creation device of the present invention.

【図５】本発明の翻訳パターン作成装置の具体例２の構
成図である。FIG. 5 is a configuration diagram of a specific example 2 of the translation pattern creation device of the present invention.

【図６】本発明の翻訳パターン作成装置の具体例２にお
ける翻訳パターン作成処理のフローチャートである。FIG. 6 is a flowchart of a translation pattern creation process in a specific example 2 of the translation pattern creation device of the present invention.

【図７】本発明の翻訳パターン作成装置の具体例２にお
ける対訳単語推定処理のフローチャート（その１）であ
る。FIG. 7 is a flowchart (part 1) of a bilingual word estimation process in a specific example 2 of the translation pattern creation device of the present invention.

【図８】本発明の翻訳パターン作成装置の具体例２にお
ける対訳単語推定処理のフローチャート（その２）であ
る。FIG. 8 is a flowchart (part 2) of a bilingual word estimation process in the specific example 2 of the translation pattern creation device of the present invention.

【図９】本発明の翻訳パターン作成装置の具体例２にお
ける対訳単語推定処理のフローチャート（その３）であ
る。FIG. 9 is a flowchart (part 3) of a bilingual word estimation process in the specific example 2 of the translation pattern creation device of the present invention.

FIG. 10 is a flowchart of the bilingual dictionary tuning process in specific example 2 of the translation pattern creation device of the present invention.

FIG. 11 is a configuration diagram of specific example 3 of the translation pattern creation device of the present invention.

FIG. 12 is a flowchart of the translation pattern creation process in specific example 3 of the translation pattern creation device of the present invention.

FIG. 13 is a flowchart (part 1) of the bilingual word estimation process in specific example 3 of the translation pattern creation device of the present invention.

FIG. 14 is a flowchart (part 2) of the bilingual word estimation process in specific example 3 of the translation pattern creation device of the present invention.

FIG. 15 is a flowchart (part 3) of the bilingual word estimation process in specific example 3 of the translation pattern creation device of the present invention.

FIG. 16 is a flowchart of the co-occurrence word estimation process in specific example 3 of the translation pattern creation device of the present invention.

FIG. 17 is a configuration diagram of specific example 4 of the translation pattern creation device of the present invention.

FIG. 18 is a flowchart of the translation pattern creation process in specific example 4 of the translation pattern creation device of the present invention.

FIG. 19 is a flowchart (part 1) of the translation pattern estimation process in specific example 4 of the translation pattern creation device of the present invention.

FIG. 20 is a flowchart (part 2) of the translation pattern estimation process in specific example 4 of the translation pattern creation device of the present invention.

FIG. 21 is a configuration diagram of specific example 5 of the translation pattern creation device of the present invention.

FIG. 22 is a flowchart of the translation pattern creation process in specific example 5 of the translation pattern creation device of the present invention.

FIG. 23 is a flowchart (part 1) of the bilingual word estimation process in specific example 5 of the translation pattern creation device of the present invention.

FIG. 24 is a flowchart (part 2) of the bilingual word estimation process in specific example 5 of the translation pattern creation device of the present invention.

FIG. 25 is a flowchart (part 3) of the bilingual word estimation process in specific example 5 of the translation pattern creation device of the present invention.

FIG. 26 is a flowchart of the bilingual dictionary tuning process in specific example 5 of the translation pattern creation device of the present invention.

FIG. 27 is a diagram showing an example of a bilingual document used to explain each specific example of the translation pattern creation device of the present invention.

FIG. 28 is an explanatory diagram of the surface-form results of morphological analysis of the bilingual document of FIG. 27.

FIG. 29 is an explanatory diagram (part 1) of the standard-form results of morphological analysis of the bilingual document of FIG. 27.

FIG. 30 is an explanatory diagram (part 2) of the standard-form results of morphological analysis of the bilingual document of FIG. 27.

FIG. 31 is a diagram showing an example of a bilingual sentence table in the translation pattern creation device of the present invention.

FIG. 32 is a diagram showing an example of a word table in the translation pattern creation device of the present invention.

FIG. 33 is a diagram (part 1) showing an example of a word pair table in the translation pattern creation device of the present invention.

FIG. 34 is a diagram (part 2) showing an example of a word pair table in the translation pattern creation device of the present invention.

FIG. 35 is a diagram showing an example of a pattern table in the translation pattern creation device of the present invention.

FIG. 36 is a diagram showing an example of the bilingual sentence table for n = 2 in the translation pattern creation device of the present invention.

FIG. 37 is an explanatory diagram (part 1) of the case where elements are simplified in the bilingual word estimation process of the translation pattern creation device of the present invention.

FIG. 38 is an explanatory diagram (part 2) of the case where elements are simplified in the bilingual word estimation process of the translation pattern creation device of the present invention.

FIG. 39 is an explanatory diagram of a display example in specific example 1 of the translation pattern creation device of the present invention.

FIG. 40 is an explanatory diagram of the bilingual dictionary in specific example 2 of the translation pattern creation device of the present invention.

FIG. 41 is an explanatory diagram of the pattern table after bilingual dictionary lookup in specific example 2 of the translation pattern creation device of the present invention.

FIG. 42 is a diagram showing an example of the word pair table in specific example 3 of the translation pattern creation device of the present invention.

FIG. 43 is an explanatory diagram of the word pair candidate list in specific example 3 of the translation pattern creation device of the present invention.

FIG. 44 is an explanatory diagram showing an example of the pattern table in specific example 3 of the translation pattern creation device of the present invention.

FIG. 45 is an explanatory diagram showing the translation pattern extraction results in specific example 4 of the translation pattern creation device of the present invention.

FIG. 46 is an explanatory diagram showing the bilingual dictionary in specific example 5 of the translation pattern creation device of the present invention.

FIG. 47 is an explanatory diagram of the pattern table after bilingual dictionary lookup in specific example 5 of the translation pattern creation device of the present invention.

[Description of Reference Numerals]
111, 211, 311, 411: input unit
112, 212, 312, 412: output unit
122, 222, 342a, 442a: bilingual word estimation unit
234, 434: bilingual dictionary
342b, 442b: co-occurrence word estimation unit

Continuation of front page
(56) References cited:
JP-A-5-282361 (JP, A)
F. Smadja, K. R. McKeown, and V. Hatzivassiloglou, "Translating Collocations for Bilingual Lexicons: A Statistical Approach," Computational Linguistics, USA, April 24, 1996, Vol. 22, No. 1, pp. 1-38.
Mihoko Kitamura (Oki Electric Industry Co., Ltd.) and Yuji Matsumoto (Nara Institute of Science and Technology), "Automatic Acquisition of Translation Rules Using a Bilingual Corpus," Transactions of Information Processing Society of Japan, Japan, June 15, 1996, Vol. 37, No. 6, pp. 1030-1040.
Mihoko Kitamura (Oki Electric Industry Co., Ltd.) and Yuji Matsumoto (Nara Institute of Science and Technology), "Automatic Acquisition of Translation Knowledge from a Bilingual Corpus," IEICE Technical Report, Japan, May 13, 1994, Vol. 94, No. 32, pp. 9-16.
Mihoko Kitamura (Oki Electric Industry Co., Ltd.) and Yuji Matsumoto (Nara Institute of Science and Technology), "Automatic Extraction of Dialogue Expressions Using a Bilingual Corpus," Transactions of Information Processing Society of Japan, Japan, April 15, 1997, Vol. 38, No. 4, pp. 727-736.
Mihoko Kitamura (Oki Electric Industry Co., Ltd.) and Yuji Matsumoto (Nara Institute of Science and Technology), "Automatic Extraction of Dialogue Expressions Based on Co-occurrence Frequency in a Bilingual Corpus," IEICE Technical Report, Japan, July 18, 1996, Vol. 96, No. 157, pp. 69-76.
Mihoko Kitamura (Oki Electric Industry Co., Ltd.) and Yuji Matsumoto (Nara Institute of Science and Technology), "Automatic Extraction of Dialogue Expressions Based on Co-occurrence Frequency in a Bilingual Corpus," IPSJ SIG Technical Report, Japan, July 19, 1996, Vol. 96, No. 65, pp. 69-76.
(58) Fields searched (Int.Cl.⁷, DB name): G06F 17/21-17/28

Claims

(57) [Claims]

1. A translation pattern creation method, wherein a translation pattern creation device: for translation word pairs in a bilingual document stored in a bilingual sentence table, in which a source language document and a target language document that is its translation are associated with each other sentence by sentence, obtains a degree of correspondence for each word pair based on the number of appearances of the word pair in the bilingual document, the number of appearances of the source language word of the pair in the source language document, and the number of appearances of the target language word of the pair in the target language document; extracts the word pair having the highest degree of correspondence as bilingual words; forms the bilingual document again with the extracted word pair removed from the bilingual sentence table; obtains the degree of correspondence again for the remaining word pairs and extracts the word pair having the highest degree of correspondence as bilingual words; and repeats these steps so as to extract bilingual words sequentially.

2. The translation pattern creation method according to claim 1, wherein the translation pattern creation device takes an extracted word pair as an original word pair and, when, viewed from one word of the original word pair, there is another word that appears in the same sentences as the other word of the original word pair and appears paired with the one word the same number of times, and, viewed from the other word of the original word pair, there is likewise such a word, extracts these words as a multiword-to-multiword translation pattern that co-occurs with the original word pair.

3. A translation pattern creation method, wherein a translation pattern creation device: for translation word string pairs, each consisting of consecutive words, in a bilingual document stored in a bilingual sentence table, in which a source language document and a target language document that is its translation are associated with each other sentence by sentence, obtains a degree of correspondence for each word string pair based on the number of appearances of the word string pair, the number of appearances of the source language word string of the pair in the source language document, and the number of appearances of the target language word string of the pair in the target language document; extracts the word string pair having the highest degree of correspondence as a translation pattern; forms the bilingual document again with the extracted word string pair removed from the bilingual sentence table; obtains the degree of correspondence again for the remaining word string pairs and extracts the word string pair having the highest degree of correspondence as a translation pattern; and repeats these steps so as to extract translation patterns sequentially.

4. A translation pattern creation device comprising: an input unit that inputs a bilingual document in which a source language document and a target language document that is its translation are associated with each other sentence by sentence; a bilingual word estimation unit that obtains, for word pairs, a word correspondence degree that becomes higher as the following three counts come closer to one another: the number of occurrences of a specific source language word in the source language document, the number of occurrences of a specific target language word in the target language document, and the number of times the two words appear in the same bilingual sentence pair; that extracts the word pair having the highest word correspondence degree as bilingual words; and that, each time a word pair is extracted as bilingual words, generates new bilingual sentences by excluding that word pair from the bilingual sentences concerned, calculates the word correspondence degree for the new bilingual sentences, extracts the word pair having the highest word correspondence degree as bilingual words, and repeats this process so as to extract bilingual words sequentially; and an output unit that outputs the bilingual words extracted by the bilingual word estimation unit.

5. The translation pattern creation device according to claim 4, further comprising a bilingual dictionary holding translations between the source language and the target language, wherein, when a word pair appearing in the bilingual document is registered in the bilingual dictionary in advance, the bilingual word estimation unit generates bilingual sentences excluding that word pair, obtains the word correspondence degree for those bilingual sentences, and extracts the word pair having the highest word correspondence degree as bilingual words.

6. The translation pattern creation device according to claim 5, further comprising a pattern table that stores the word pairs extracted as bilingual words, wherein the bilingual word estimation unit determines whether the pair formed by an arbitrary source language word in the bilingual dictionary and its target language translation matches a word pair in the pattern table and, if it does not match, deletes the translation of the arbitrary word from the bilingual dictionary and registers the word pair in the pattern table as its translation.

7. The translation pattern creation device according to claim 4, further comprising a co-occurrence word estimation unit that takes a word pair extracted by the bilingual word estimation unit as an original word pair and, when, viewed from one word of the original word pair, there is another word that appears in the same sentences as the other word of the original word pair and appears paired with the one word the same number of times, and, viewed from the other word of the original word pair, there is likewise such a word, extracts these words as a multiword translation pattern that co-occurs with the original word pair.

8. The translation pattern creation device according to claim 7, further comprising a bilingual dictionary holding translations between the source language and the target language, wherein, when a word pair appearing in the bilingual document is registered in the bilingual dictionary in advance, the bilingual word estimation unit generates bilingual sentences excluding that word pair, obtains the word correspondence degree for those bilingual sentences, and extracts the word pair having the highest word correspondence degree as bilingual words.

9. The translation pattern creation device according to claim 8, further comprising a pattern table that stores the word pairs extracted as bilingual words, wherein the bilingual word estimation unit determines whether the pair formed by an arbitrary source language word in the bilingual dictionary and its target language translation matches a word pair in the pattern table and, if it does not match, deletes the translation of the arbitrary word from the bilingual dictionary and registers the word pair in the pattern table as its translation.

10. A translation pattern creation device comprising: an input unit that inputs a bilingual document in which a source language document and a target language document that is its translation are associated with each other sentence by sentence; and a translation pattern estimation unit that, for translation word string pairs each consisting of consecutive words in the bilingual document, obtains a degree of correspondence for each word string pair based on the number of appearances of the word string pair, the number of appearances of the source language word string of the pair in the source language document, and the number of appearances of the target language word string of the pair in the target language document; extracts the word string pair having the highest degree of correspondence as a translation pattern; forms the bilingual document again with the extracted word string pair removed; obtains the degree of correspondence again for the remaining word string pairs and extracts the word string pair having the highest degree of correspondence as a translation pattern; and repeats these steps so as to extract translation patterns sequentially.
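The iterative extraction recited in claim 1 can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the patented implementation: the claim gives no formula for the degree of correspondence, so the sketch assumes the Dice coefficient 2 f(e,j) / (f(e) + f(j)), with every count taken as the number of aligned sentence pairs containing the word or word pair; each side of a sentence pair is modelled as a set of words; and "removing the extracted word pair" is read as deleting both words from every sentence pair in which they co-occur. The function names, the `min_freq` threshold, and the romanized toy data are hypothetical.

```python
from collections import Counter
from typing import List, Set, Tuple

# One aligned sentence pair: (source-language words, target-language words).
SentencePair = Tuple[Set[str], Set[str]]


def dice(f_pair: int, f_src: int, f_tgt: int) -> float:
    """Assumed degree of correspondence: 2*f(e,j) / (f(e) + f(j))."""
    return 2.0 * f_pair / (f_src + f_tgt)


def extract_bilingual_words(table: List[SentencePair],
                            min_freq: int = 1) -> List[Tuple[str, str, float]]:
    """Repeatedly extract the best-scoring word pair, removing it before rescoring."""
    # Copy the sets so the caller's bilingual sentence table is not modified.
    table = [(set(src), set(tgt)) for src, tgt in table]
    extracted = []
    while True:
        # Count each source word, target word and word pair once per sentence pair.
        src_freq, tgt_freq, pair_freq = Counter(), Counter(), Counter()
        for src, tgt in table:
            src_freq.update(src)
            tgt_freq.update(tgt)
            pair_freq.update((e, j) for e in src for j in tgt)
        candidates = [(dice(f, src_freq[e], tgt_freq[j]), e, j)
                      for (e, j), f in pair_freq.items() if f >= min_freq]
        if not candidates:
            break
        # Highest score wins; ties fall back to string order only for determinism.
        score, e, j = max(candidates)
        extracted.append((e, j, score))
        # Re-form the bilingual document without the extracted pair.
        for src, tgt in table:
            if e in src and j in tgt:
                src.discard(e)
                tgt.discard(j)
    return extracted


table = [
    ({"red", "car"}, {"akai", "kuruma"}),
    ({"red", "apple"}, {"akai", "ringo"}),
    ({"blue", "car"}, {"aoi", "kuruma"}),
]
for e, j, score in extract_bilingual_words(table):
    print(e, j, score)
# prints, one pair per line: red akai 1.0, car kuruma 1.0, blue aoi 1.0, apple ringo 1.0
```

Because the extracted words are deleted before rescoring, a word that only looked plausible through co-occurrence with an already-extracted word (for example "red" with "kuruma") loses that support in later rounds.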
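One reading of the co-occurrence condition in claims 2 and 7 is sketched below, as an illustration only. It assumes that "appears in the same sentences as the other word and appears paired with the one word the same number of times" means: the set of sentence pairs in which the candidate co-occurs with the one word is identical to the set in which the original pair (e, j) co-occurs. Both directions must yield at least one such word before a pattern is returned. Sentences are modelled as word sets, so the returned word lists are alphabetical rather than in sentence order. All names and the toy data are hypothetical.

```python
from typing import List, Optional, Set, Tuple

SentencePair = Tuple[Set[str], Set[str]]


def cooccur_ids(table: List[SentencePair], src_word: str, tgt_word: str) -> Set[int]:
    """Indices of the sentence pairs in which src_word and tgt_word appear together."""
    return {i for i, (src, tgt) in enumerate(table) if src_word in src and tgt_word in tgt}


def extend_pair(table: List[SentencePair], e: str, j: str
                ) -> Optional[Tuple[List[str], List[str]]]:
    """Grow the original pair (e, j) into a multiword pattern, or return None."""
    base = cooccur_ids(table, e, j)
    if not base:
        return None
    # Viewed from e: other target words paired with e in exactly the sentences where j is.
    tgt_partners = {w for i in base for w in table[i][1]
                    if w != j and cooccur_ids(table, e, w) == base}
    # Viewed from j: other source words paired with j in exactly the sentences where e is.
    src_partners = {w for i in base for w in table[i][0]
                    if w != e and cooccur_ids(table, w, j) == base}
    if not (tgt_partners and src_partners):
        return None
    return sorted({e} | src_partners), sorted({j} | tgt_partners)


table = [
    ({"you", "are", "welcome"}, {"dou", "itashimashite"}),
    ({"you", "are", "welcome", "again"}, {"mata", "dou", "itashimashite"}),
    ({"you", "are", "kind"}, {"shinsetsu", "desu"}),
]
print(extend_pair(table, "welcome", "itashimashite"))
# (['are', 'welcome', 'you'], ['dou', 'itashimashite'])
```

Here "again" and "mata" are rejected because they occur in only one of the two sentence pairs that contain the original pair, which is what keeps the pattern at "you are welcome" and "dou itashimashite".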
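Claims 3 and 10 apply the same repeat-and-remove loop to word string pairs (runs of consecutive words). The sketch below is a minimal illustration under several assumptions not stated in the claims: the degree of correspondence is again taken to be the Dice coefficient over per-sentence-pair counts; only word strings of up to `max_n` words are considered; pairs seen in fewer than `min_freq` sentence pairs are ignored; among equal scores, longer word strings are preferred; and removal cuts the first occurrence of each extracted string out of every sentence pair that contains both. Whitespace tokenization, the function names, and the romanized toy corpus are hypothetical.

```python
from collections import Counter
from typing import List, Set, Tuple

Gram = Tuple[str, ...]  # a word string of one or more consecutive words


def ngrams(words: List[str], max_n: int) -> Set[Gram]:
    """All contiguous word strings of length 1..max_n in a tokenized sentence."""
    return {tuple(words[i:i + n]) for n in range(1, max_n + 1)
            for i in range(len(words) - n + 1)}


def remove_span(words: List[str], span: Gram) -> List[str]:
    """Delete the first contiguous occurrence of span from words."""
    n = len(span)
    for i in range(len(words) - n + 1):
        if tuple(words[i:i + n]) == span:
            return words[:i] + words[i + n:]
    return words


def extract_translation_patterns(sentences: List[Tuple[str, str]], max_n: int = 2,
                                 min_freq: int = 2) -> List[Tuple[str, str, float]]:
    """Repeatedly extract the best word string pair, cutting it out before rescoring."""
    table = [(src.split(), tgt.split()) for src, tgt in sentences]
    extracted = []
    while True:
        grams = [(ngrams(src, max_n), ngrams(tgt, max_n)) for src, tgt in table]
        src_freq, tgt_freq, pair_freq = Counter(), Counter(), Counter()
        for src_g, tgt_g in grams:
            src_freq.update(src_g)
            tgt_freq.update(tgt_g)
            pair_freq.update((e, j) for e in src_g for j in tgt_g)
        # Assumed score: Dice coefficient; ties prefer longer word strings.
        candidates = [(2.0 * f / (src_freq[e] + tgt_freq[j]), len(e) + len(j), e, j)
                      for (e, j), f in pair_freq.items() if f >= min_freq]
        if not candidates:
            break
        score, _, e, j = max(candidates)
        extracted.append((" ".join(e), " ".join(j), score))
        # Re-form the bilingual document without the extracted word strings.
        table = [(remove_span(src, e), remove_span(tgt, j)) if e in src_g and j in tgt_g
                 else (src, tgt)
                 for (src, tgt), (src_g, tgt_g) in zip(table, grams)]
    return extracted


sentences = [
    ("thank you", "arigatou"),
    ("thank you again", "mata arigatou"),
    ("see you again", "mata ne"),
    ("come again", "mata kite"),
]
print(extract_translation_patterns(sentences))
# [('thank you', 'arigatou', 1.0), ('again', 'mata', 1.0)]
```

Cutting "thank you" out of the first two sentence pairs is what leaves "again" and "mata" as the next perfect match, instead of the longer but weaker candidate "you again" and "mata".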
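Claim 4 characterizes the word correspondence degree only qualitatively: it must rise as the three counts approach one another. One measure with exactly this behaviour, offered as an assumption rather than as the formula of the specification, is the Dice coefficient, where f(e) and f(j) are the numbers of sentence pairs containing the source word e and the target word j, and f(e,j) is the number of sentence pairs containing both:

```latex
\mathrm{Dice}(e,j) = \frac{2\, f(e,j)}{f(e) + f(j)}, \qquad
f(e,j) \le f(e),\; f(e,j) \le f(j)
\;\Longrightarrow\; 2\, f(e,j) \le f(e) + f(j)
```

Hence the value never exceeds 1, reaches 1 exactly when f(e) = f(j) = f(e,j), and falls as either single-word count moves away from the joint count. For example, f(e) = f(j) = f(e,j) = 3 gives 1, while f(e) = 3, f(j) = 5, f(e,j) = 3 gives 6/8 = 0.75.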
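The dictionary pre-filtering of claims 5 and 8 can be illustrated as follows. This is a hedged sketch, assuming that the bilingual dictionary maps each source word to a set of target-word translations and that "excluding a registered word pair" means deleting both of its words from every sentence pair in which they co-occur; the reduced table would then go through the correspondence-degree loop of claim 4. The function name and data are hypothetical.

```python
from typing import Dict, List, Set, Tuple

SentencePair = Tuple[Set[str], Set[str]]


def exclude_dictionary_pairs(table: List[SentencePair],
                             dictionary: Dict[str, Set[str]]) -> List[SentencePair]:
    """Remove word pairs already registered in the bilingual dictionary."""
    result = []
    for src, tgt in table:
        src, tgt = set(src), set(tgt)  # copy; leave the input table unchanged
        # Collect the registered pairs first, then delete both words of each pair.
        known = [(e, j) for e in src for j in tgt if j in dictionary.get(e, set())]
        for e, j in known:
            src.discard(e)
            tgt.discard(j)
        result.append((src, tgt))
    return result


table = [({"red", "car"}, {"akai", "kuruma"}), ({"blue", "car"}, {"aoi", "kuruma"})]
dictionary = {"car": {"kuruma", "jidousha"}}
print(exclude_dictionary_pairs(table, dictionary))
# [({'red'}, {'akai'}), ({'blue'}, {'aoi'})]
```

Removing known pairs up front means the statistical estimation only has to resolve the words the dictionary does not already cover.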
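Claims 6 and 9 describe tuning the bilingual dictionary against the pattern table but leave open exactly which word pair is registered and where. The sketch below is a hypothetical interpretation only: for each source word that also appears in the pattern table, dictionary translations without a matching (source, target) pair in the pattern table are deleted and the pattern-table translations are registered in their place; dictionary entries the pattern table says nothing about are left unchanged. The function name and data are assumptions.

```python
from typing import Dict, Set, Tuple


def tune_dictionary(dictionary: Dict[str, Set[str]],
                    pattern_table: Set[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Hypothetical dictionary tuning against the pattern table."""
    tuned = {}
    for e, translations in dictionary.items():
        learned = {j for src, j in pattern_table if src == e}
        if not learned:
            # Assumption: entries the pattern table says nothing about are kept.
            tuned[e] = set(translations)
        else:
            # Translations with no matching (e, j) in the pattern table are deleted,
            # and the pattern-table translations are registered in their place.
            tuned[e] = {j for j in translations if (e, j) in pattern_table} | learned
    return tuned


dictionary = {"car": {"jidousha"}, "red": {"akai"}, "book": {"hon"}}
pattern_table = {("car", "kuruma"), ("red", "akai")}
print(tune_dictionary(dictionary, pattern_table))
# {'car': {'kuruma'}, 'red': {'akai'}, 'book': {'hon'}}
```

The function returns a new dictionary rather than editing in place, so the original entries remain available for comparison.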