JPH09160918A - Translated sentence corresponding method and device therefor - Google Patents

Translated sentence corresponding method and device therefor

Info

Publication number
JPH09160918A
Authority
JP
Japan
Prior art keywords
sentence
correspondence
bilingual
pair
similarity
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
JP7324562A
Other languages
Japanese (ja)
Inventor
Masahiko Haruno
雅彦 春野
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nippon Telegraph and Telephone Corp
Original Assignee
Nippon Telegraph and Telephone Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nippon Telegraph and Telephone Corp
Priority to JP7324562A
Publication of JPH09160918A

Landscapes

  • Machine Translation (AREA)

Abstract

PROBLEM TO BE SOLVED: To align sentences across a wide range of bilingual texts with high accuracy. SOLUTION: An input unit 110 reads corresponding texts in two languages, such as Japanese and English, from a storage device 10 or the like. A morphological analysis unit 120 performs morphological analysis on each input text. A similarity calculation unit 130 uses the morphological analysis results to calculate the similarity between words of the two languages as their mutual information in the texts, and then selects the statistically reliable word pairs by a statistical test. A sentence correspondence estimation unit 140 narrows down the candidate sentence correspondences using this similarity and an existing bilingual dictionary. A post-processing unit 150 selects, from the narrowed-down candidates, the sentence pairs that have a prescribed support count. An output unit 160 outputs the selected sentence pairs to a storage device 20 or the like.

Description

Detailed Description of the Invention

[0001]

[Technical Field of the Invention] The present invention relates to a bilingual sentence alignment method and apparatus, and more particularly to a bilingual sentence alignment method and apparatus that are used in natural language systems such as machine translation and knowledge base systems and that automatically learn knowledge from bilingual texts.

[0002]

[Prior Art] Conventional bilingual sentence alignment has been performed mainly between languages whose structure and vocabulary are very close, such as English and French, and it has typically aligned translations using information such as the number of words or characters in each sentence. For language pairs such as Japanese and English, there are two kinds of methods: one uses only a bilingual dictionary, and the other uses dynamic programming with a bilingual dictionary and then applies statistics as post-processing.

[0003]

[Problems to Be Solved by the Invention] As described above, conventional bilingual sentence alignment methods have dealt with texts that are structurally similar and relatively easy to align. However, between languages such as Japanese and English, which differ completely in structure and in ways of thinking, even a faithfully translated bilingual text usually differs in organization or has content deleted. In such cases, it is important to appropriately combine statistical information from the data with dictionaries, which are an existing knowledge source. The strengths and weaknesses of statistical information and dictionary information can be summarized as follows.

[0004] Strengths of statistical information: Because data-dependent information can be acquired, translation relations appropriate to the context of the text can be found. In addition, in languages such as Japanese that require word segmentation (morphological analysis), information can be extracted even when the segmentation is wrong.
Weaknesses of statistical information: To obtain reliable statistics, the target word must appear several times in the data. Since many words appear only once or twice, statistics can be obtained for only a limited set of words.
Strengths of dictionary information: Information can be obtained even for words that appear only once.
Weaknesses of dictionary information: A word can have many possible translations, and the one used in the data is not necessarily listed in the bilingual dictionary. In addition, if morphological analysis makes an error, correct dictionary lookup is impossible.
As these points show, the strengths and weaknesses of statistical information and dictionary information are complementary.

[0005] An object of the present invention is to solve the above problems of the prior art and to provide a highly accurate bilingual sentence alignment method and apparatus that appropriately combine statistical information and dictionary information.

[0006]

[Means for Solving the Problems] In the present invention, when corresponding texts in two languages are given, similarity calculation means calculates the similarity between words of the two languages as their mutual information in the data, and selects only the highly reliable word pairs by a statistical test such as the t-test. Next, sentence correspondence estimation means uses this similarity together with information from an existing bilingual dictionary to narrow down the range of possible sentence correspondences. Using the narrowed-down information, the similarity calculation means and the sentence correspondence estimation means then repeat the above operations. Through this repetition, the set of candidate sentence pairs is gradually narrowed down, and the desired sentence alignment is finally obtained.

[0007]

[Embodiments of the Invention] An embodiment of the present invention is described below for the case where corresponding Japanese and English texts are given.

[0008] FIG. 1 shows the system configuration of a bilingual sentence alignment apparatus according to an embodiment of the present invention. The bilingual sentence alignment apparatus 100 comprises an input unit 110, a morphological analysis unit 120, a similarity calculation unit 130, a sentence correspondence estimation unit 140, a post-processing unit 150, an output unit 160, a storage unit 170 used as a work area by these units, and an existing bilingual dictionary 180. Reference numeral 10 denotes a storage device that stores the corresponding Japanese and English text data, and 20 denotes a storage device that stores the aligned bilingual sentence pairs. The source of the corresponding text data need not be a storage device.

[0009] The input unit 110 reads the corresponding Japanese and English texts from the storage device 10 or the like and stores them in a predetermined work area of the storage unit 170. The morphological analysis unit 120 retrieves the Japanese and English texts from the work area, performs morphological analysis on each, and stores the results in a predetermined work area of the storage unit 170. The similarity calculation unit 130 derives candidate word correspondences between the two languages from the morphological analysis results in the work area, computes their mutual information, selects the highly reliable word pairs by a statistical test (t-test), and stores them in a predetermined work area of the storage unit 170. For the word pairs in the work area, and using the bilingual dictionary 180 prepared in advance, the sentence correspondence estimation unit 140 counts the number of times the correspondence between Japanese sentence i and English sentence j is supported, narrows down the candidate sentence correspondences with a predetermined threshold, and stores them in a predetermined work area of the storage unit 170. The post-processing unit 150 selects, from the candidate sentence correspondences in the work area, the sentence pairs that have a predetermined support count and stores them in a predetermined work area of the storage unit 170. The output unit 160 outputs the sentence pairs selected by the post-processing unit 150 from the work area of the storage unit 170 to the storage device 20.

[0010] FIG. 2 shows the connections among the similarity calculation unit 130, the sentence correspondence estimation unit 140, and the post-processing unit 150 of FIG. 1. The similarity calculation unit 130 and the sentence correspondence estimation unit 140 form a loop through the work area of the storage unit 170, and repeating the processing of these two units narrows down the range of sentence correspondences.

[0011] FIG. 3 shows the sequence of processing steps in this embodiment. First, the morphological analysis unit 120 performs morphological analysis on both the Japanese text and the corresponding English text and selects only the words with the required parts of speech (step 300). Only the words extracted here are used in the subsequent alignment. In addition, an initial set of candidate sentence correspondences is created from the numbers of sentences in the input Japanese and English texts. In this initial relation, the beginnings of the two texts correspond to each other, as do their ends, and every other correspondence is given a width. The width is small near both ends of the text and larger toward the middle.

[0012] FIG. 4 shows an example of these candidate correspondences. The width of the correspondence is small at both ends of the text and grows toward the middle. There are as many candidate correspondences as there are Japanese sentences.
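The initial candidate correspondences can be pictured with a short Python sketch. The description only says that the width is small at the ends and larger toward the middle, so the triangular width profile, the function name `initial_candidates`, and the parameter `max_width` are all assumptions made for illustration.

```python
def initial_candidates(n_ja, n_en, max_width=4):
    """Return, for each Japanese sentence index, a range of candidate English indices.

    Hypothetical helper: the triangular width profile is an assumption,
    not something specified in the description.
    """
    candidates = []
    for i in range(n_ja):
        # Relative position of sentence i in the Japanese text (0.0 = start, 1.0 = end).
        pos = i / (n_ja - 1) if n_ja > 1 else 0.0
        # English sentence at the same relative position.
        center = round(pos * (n_en - 1))
        # Half-width is 0 at both ends and max_width at the middle of the text.
        half = round(max_width * (1.0 - abs(2.0 * pos - 1.0)))
        lo = max(0, center - half)
        hi = min(n_en - 1, center + half)
        candidates.append(range(lo, hi + 1))
    return candidates
```

With this profile the first and last sentences of the two texts are tied to each other, and a sentence in the middle of the Japanese text may still correspond to any of several English sentences.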

[0013] Next, the similarity calculation unit 130 estimates word correspondences from the candidate correspondences (step 310). The operation of the similarity calculation unit 130 is described below.

[0014] Let J_ij be the j-th Japanese word in candidate pair i, and let E_ik be the k-th English word in candidate pair i. Let N(J_ij) be the number of candidate pairs in which the word J_ij appears, where the counts are managed so that a single occurrence of a word is not counted twice across multiple pairs. The similarity between the Japanese word J_ij and the English word E_ik in candidate pair i is then given by the following mutual information I(J_ij, E_ik), where Pr denotes probability and n is the total number of pairs (that is, the number of sentences in the Japanese text).

[0015]

[Equation 1]
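The definitions in paragraph [0014] are consistent with the standard pointwise mutual information. A sketch of that form follows, assuming that the probabilities are estimated from pair counts and writing N(J_ij, E_ik), a symbol not used in the text, for the number of candidate pairs that contain both words. The base of the logarithm is likewise left open.

```latex
I(J_{ij}, E_{ik})
  = \log \frac{\Pr(J_{ij}, E_{ik})}{\Pr(J_{ij})\,\Pr(E_{ik})}
  \approx \log \frac{n \cdot N(J_{ij}, E_{ik})}{N(J_{ij})\,N(E_{ik})}
```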

[0016] This mutual information expresses the rate at which the words J_ij and E_ik occur together, and it can be used to measure how close a Japanese word and an English word are. However, mutual information can also be large for low-frequency word pairs, and such values are statistically unreliable. The similarity calculation unit 130 therefore also performs a statistical test (t-test) and extracts only the highly reliable word pairs. The mutual information is computed for every combination of words (with the required parts of speech) contained in the candidate correspondences.
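A minimal Python sketch of this filtering step is given below. The description names only a "t-test", so the particular t-score approximation used here (a form common in collocation statistics), the threshold 1.65, the base-2 logarithm, and the function name `reliable_word_pairs` are assumptions for illustration, not the patented formula.

```python
import math
from collections import Counter
from itertools import product


def reliable_word_pairs(pairs, t_threshold=1.65):
    """pairs: list of (ja_words, en_words), one entry per candidate pair.

    Returns {(ja_word, en_word): mutual_information} for the word pairs
    whose (assumed) t-score reaches t_threshold.
    """
    n = len(pairs)
    n_ja, n_en, n_joint = Counter(), Counter(), Counter()
    for ja_words, en_words in pairs:
        # Sets ensure a word is counted once per candidate pair.
        ja_set, en_set = set(ja_words), set(en_words)
        n_ja.update(ja_set)
        n_en.update(en_set)
        n_joint.update(product(ja_set, en_set))

    result = {}
    for (ja, en), joint in n_joint.items():
        p_joint = joint / n
        p_ja, p_en = n_ja[ja] / n, n_en[en] / n
        mi = math.log2(p_joint / (p_ja * p_en))
        # Assumed t-score approximation: difference between observed and
        # independent co-occurrence probability over its estimated std. error.
        t = (p_joint - p_ja * p_en) / math.sqrt(p_joint / n)
        if t >= t_threshold:
            result[(ja, en)] = mi
    return result
```

Under this approximation a word pair that co-occurs only once can never pass the filter, which matches the concern in the text that high mutual information for rare pairs is unreliable.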

[0017] Next, the sentence correspondence estimation unit 140 narrows down the candidate sentence correspondences using the word pairs obtained by the similarity calculation unit 130 and the existing bilingual dictionary (step 320).

[0018] The sentence correspondence estimation unit 140 counts the number of times the correspondence between Japanese sentence i and English sentence j is supported, using the steps below. Here st and dic are externally supplied parameters: the points added when a correspondence is supported by the statistics and by the bilingual dictionary, respectively. Normally st is set larger than dic.
Step 1: Apply the following operation to the word pairs in decreasing order of mutual information. If Japanese sentence i and English sentence j contain the word pair, and no other English sentence among the candidates for Japanese sentence i contains that English word, add st to the combination of Japanese sentence i and English sentence j. A non-crossing condition can also be added to this step.
Step 2: If Japanese sentence i and English sentence j contain a word pair from the bilingual dictionary, and no other English sentence among the candidates for Japanese sentence i contains that English word, add dic to the combination of Japanese sentence i and English sentence j.
Step 3: Correspondences whose score from steps 1 and 2 exceeds a threshold are fixed as certain correspondences (such a correspondence is called an anchor). New candidate sentence correspondences are then constructed from the sequence of anchors as input to the next iteration. The candidates between two anchors are given a width, which is small near each anchor and larger toward the midpoint between the anchors.
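Steps 1 to 3 can be sketched in Python as follows. The data layout (sentences as word sets, candidates as index lists), the default values of `st`, `dic`, and `threshold`, and the function name `score_candidates` are assumptions; the optional extra condition mentioned in step 1 is not modeled.

```python
from collections import defaultdict


def score_candidates(ja_sents, en_sents, candidates, stat_pairs, dict_pairs,
                     st=2.0, dic=1.0, threshold=2.5):
    """Accumulate support for (Japanese i, English j) and pick anchors.

    ja_sents, en_sents: lists of word sets, one per sentence.
    candidates[i]: iterable of English indices still possible for Japanese i.
    stat_pairs: (ja_word, en_word) pairs in decreasing mutual information.
    dict_pairs: (ja_word, en_word) pairs from the bilingual dictionary.
    """
    support = defaultdict(float)
    # Step 1 uses the statistical pairs (st points), step 2 the dictionary (dic points).
    for points, word_pairs in ((st, stat_pairs), (dic, dict_pairs)):
        for ja_word, en_word in word_pairs:
            for i, ja in enumerate(ja_sents):
                if ja_word not in ja:
                    continue
                # Candidate English sentences of i that contain the English word.
                hits = [j for j in candidates[i] if en_word in en_sents[j]]
                # Count the support only when exactly one candidate contains it.
                if len(hits) == 1:
                    support[(i, hits[0])] += points
    # Step 3: correspondences whose score exceeds the threshold become anchors.
    anchors = sorted(p for p, score in support.items() if score > threshold)
    return dict(support), anchors
```

In this sketch a correspondence supported by both the statistics and the dictionary scores st + dic, so with the default values it becomes an anchor while a correspondence supported by only one source does not.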

[0019] The above processing by the similarity calculation unit 130 and the sentence correspondence estimation unit 140 is repeated until the candidate sentence correspondences converge (step 330). This yields the sentence correspondences between the input Japanese and English texts.
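Each iteration starts from candidates rebuilt around the anchors found so far. The Python sketch below illustrates one way to do that; it assumes the anchors are sorted and do not cross, treats the two text boundaries as fixed correspondences, and reuses a triangular width profile. The function name `candidates_from_anchors` and the parameter `max_width` are hypothetical.

```python
def candidates_from_anchors(n_ja, n_en, anchors, max_width=3):
    """Rebuild candidate English ranges for each Japanese sentence.

    anchors: sorted list of (ja_index, en_index) pairs, assumed non-crossing.
    The width is 0 at each anchor and largest midway between two anchors
    (an assumed profile; the description only gives the qualitative shape).
    """
    # The first and last sentences of the two texts are taken to correspond.
    inner = [a for a in anchors if 0 < a[0] < n_ja - 1]
    points = [(0, 0)] + inner + [(n_ja - 1, n_en - 1)]
    candidates = [None] * n_ja
    for (i0, j0), (i1, j1) in zip(points, points[1:]):
        for i in range(i0, i1 + 1):
            # Relative position of sentence i between the two anchors.
            pos = (i - i0) / (i1 - i0) if i1 > i0 else 0.0
            center = j0 + round(pos * (j1 - j0))
            half = round(max_width * (1.0 - abs(2.0 * pos - 1.0)))
            # Candidates never extend past the surrounding anchors.
            lo, hi = max(j0, center - half), min(j1, center + half)
            candidates[i] = range(lo, hi + 1)
    return candidates
```

Calling this function after every scoring pass, and stopping when the returned ranges no longer change, gives one possible reading of "repeated until the candidate sentence correspondences converge".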

[0020] Compared with the existing method that uses an existing bilingual dictionary within dynamic programming and then applies statistics as post-processing, this method has the following advantages.
(1) The existing method aligns with the dictionary first, so its accuracy drops sharply on texts whose vocabulary is not listed in the dictionary, such as texts in specialized fields. Because its statistical processing is based on the result of the first-stage dictionary alignment, it cannot produce correct results when the first-stage accuracy is low. The existing method is also strongly affected by the accuracy of morphological analysis. The method of the present invention solves these problems.
(2) Japanese and English texts often contain parts that have no counterpart in the other text. The correspondences between Japanese and English sentences also often cross (when Japanese sentence i corresponds to English sentence j, a Japanese sentence with a number smaller than i corresponds to an English sentence with a number larger than j, or vice versa). The existing method cannot handle these cases because it aligns locally by dynamic programming. The method of the present invention, which sets anchors while looking at the entire text, can handle them.

[0021] Next, the post-processing unit 150 derives the final result from the converged candidate sentence correspondences (step 340). In the candidate correspondences, a Japanese sentence with low support has many candidate English sentences. The post-processing unit 150 therefore proceeds as follows: if the support counts of those candidates differ significantly, it selects only the sentences with high support as the correspondence; if the support count of every candidate is small, it judges that the Japanese sentence has no corresponding English sentence.
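The post-processing rule can be sketched in Python as below. The description does not say how a "significant difference" or a "small" support count is measured, so the parameters `min_support` and `keep_ratio` and the function name `select_final_pairs` are assumptions chosen only to make the rule concrete.

```python
def select_final_pairs(support, min_support=2.0, keep_ratio=0.5):
    """support: {(ja_index, en_index): score}.

    Returns {ja_index: [en_index, ...]}; an empty list means the Japanese
    sentence is judged to have no corresponding English sentence.
    """
    by_ja = {}
    for (i, j), score in support.items():
        by_ja.setdefault(i, []).append((j, score))

    result = {}
    for i, scored in by_ja.items():
        best = max(score for _, score in scored)
        if best < min_support:
            # Every candidate has low support: no counterpart.
            result[i] = []
        else:
            # Keep only candidates whose support is close enough to the best.
            cutoff = max(min_support, keep_ratio * best)
            result[i] = sorted(j for j, score in scored if score >= cutoff)
    return result
```

Japanese sentences that never received any support do not appear in `support` at all, so a caller would treat them as having no counterpart as well.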

[0022] Although the alignment of Japanese and English sentences has been described above as an embodiment of the present invention, the present invention is of course not limited to this language pair.

[0023]

[Effects of the Invention] As described above, the present invention makes it possible to align, automatically and with high accuracy, the sentences contained in a wide range of bilingual texts. The aligned corpus obtained in this way can be used in systems such as machine translation and example sentence retrieval, and also as input to systems that learn knowledge automatically.

[Brief Description of the Drawings]

FIG. 1 is a system configuration diagram of an embodiment of the present invention.

FIG. 2 is a diagram showing the connections among the main units in FIG. 1.

FIG. 3 is a flowchart explaining the operation of the embodiment of the present invention.

FIG. 4 is a diagram explaining candidate sentence pairs.

[Explanation of Reference Numerals]

100 bilingual sentence alignment apparatus
110 input unit
120 morphological analysis unit
130 similarity calculation unit
140 sentence correspondence estimation unit
150 post-processing unit
160 output unit
170 storage unit
180 bilingual dictionary

Claims (2)

[Claims]

1. A bilingual sentence alignment method for automatically aligning the sentences contained in corresponding texts in two languages, characterized in that, when corresponding texts in two languages are given, a process of calculating the similarity of words in the texts of both languages on the basis of statistics and a process of estimating sentence correspondences using that similarity and an existing bilingual dictionary are repeated, so that the set of candidate sentence pairs is gradually narrowed down and the desired sentence alignment is finally obtained.

2. A bilingual sentence alignment apparatus characterized by comprising: input means for inputting corresponding texts in two languages; morphological analysis means for performing morphological analysis on each input text; similarity calculation means for deriving candidate correspondences from the morphological analysis results, computing mutual information, and selecting word pairs by a statistical test; sentence correspondence estimation means for counting, for the selected word pairs and using a bilingual dictionary prepared in advance, the number of times a correspondence between a sentence in one language and a sentence in the other language is supported, and narrowing down the candidate sentence correspondences with a predetermined threshold; post-processing means for selecting, from the narrowed-down candidate sentence correspondences, the sentence pairs having a predetermined support count; and output means for outputting the selected sentence pairs as the final result.
JP7324562A 1995-12-13 1995-12-13 Translated sentence corresponding method and device therefor Pending JPH09160918A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP7324562A JPH09160918A (en) 1995-12-13 1995-12-13 Translated sentence corresponding method and device therefor

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP7324562A JPH09160918A (en) 1995-12-13 1995-12-13 Translated sentence corresponding method and device therefor

Publications (1)

Publication Number Publication Date
JPH09160918A true JPH09160918A (en) 1997-06-20

Family

ID=18167203

Family Applications (1)

Application Number Title Priority Date Filing Date
JP7324562A Pending JPH09160918A (en) 1995-12-13 1995-12-13 Translated sentence corresponding method and device therefor

Country Status (1)

Country Link
JP (1) JPH09160918A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2004107203A1 (en) * 2003-05-30 2004-12-09 Fujitsu Limited Translated sentence correlation device
US7308398B2 (en) 2003-05-30 2007-12-11 Fujitsu Limited Translation correlation device
US7663593B2 (en) 2005-03-02 2010-02-16 Sony Corporation Level shift circuit and shift register and display device

Similar Documents

Publication Publication Date Title
CN110489760B (en) Text automatic correction method and device based on deep neural network
CN110705302B (en) Named entity identification method, electronic equipment and computer storage medium
US5029085A (en) Conversational-type natural language analysis apparatus
US9594742B2 (en) Method and apparatus for matching misspellings caused by phonetic variations
CN106844351B (en) Medical institution organization entity identification method and device oriented to multiple data sources
US11531693B2 (en) Information processing apparatus, method and non-transitory computer readable medium
US20230075614A1 (en) Automatically identifying multi-word expressions
JP6778655B2 (en) Word concatenation discriminative model learning device, word concatenation detection device, method, and program
CN108932233A (en) Literary generation method is translated, literary generating means are translated and translates text and generates program
US20040243394A1 (en) Natural language processing apparatus, natural language processing method, and natural language processing program
JP6626917B2 (en) Readability evaluation method and system based on English syllable calculation method
CN112183117A (en) Translation evaluation method and device, storage medium and electronic equipment
US11907656B2 (en) Machine based expansion of contractions in text in digital media
JPH10326275A (en) Method and device for morpheme analysis and method and device for japanese morpheme analysis
Uchimoto et al. Morphological analysis of the Corpus of Spontaneous Japanese
CN112632956A (en) Text matching method, device, terminal and storage medium
CN116306594A (en) Medical OCR recognition error correction method
JP2009157888A (en) Transliteration model generation device, transliteration apparatus, and computer program therefor
JPH09160918A (en) Translated sentence corresponding method and device therefor
Uchimoto et al. Morphological analysis of a large spontaneous speech corpus in Japanese
CN110245331A (en) A kind of sentence conversion method, device, server and computer storage medium
Seresangtakul et al. Thai-Isarn dialect parallel corpus construction for machine translation
Aşliyan et al. Detecting misspelled words in Turkish text using syllable n-gram frequencies
JPH09179868A (en) Translation correspondence support system
Afli et al. From Arabic user-generated content to machine translation: integrating automatic error correction