JP5106431B2

JP5106431B2 - Machine translation apparatus, program and method

Info

Publication number: JP5106431B2
Application number: JP2009011763A
Authority: JP
Inventors: 正樹新藤; 裕美子吉村
Original assignee: Toshiba Corp; Toshiba Solutions Corp
Current assignee: Toshiba Corp; Toshiba Digital Solutions Corp
Priority date: 2009-01-22
Filing date: 2009-01-22
Publication date: 2012-12-26
Anticipated expiration: 2029-01-22
Also published as: JP2010170303A

Description

The present invention relates to a machine translation apparatus, program, and method for translating a first language into a second language.

Some machine translation apparatuses that translate a first language into a second language have a function that translates using translation examples, that is, pairs of a first-language sentence and a second-language sentence prepared in advance in a bilingual database. The input first-language sentence is compared with the first-language sentences in the bilingual database, and when one matches exactly, the paired second-language example sentence in the database is output.

There is also a function that handles inputs with no exact match. So that the user can quickly obtain example sentences that are reusable when producing a translation, it skips word analysis, calculates the degree of similarity on a character or word basis, and outputs example sentences whose similarity is at or above a value specified by the user. In this case, the user sets the similarity high so that only the more reusable example sentences are output. Separately, there is a technique that uses a thesaurus to weight the similarity when calculating how close two sentences are (see, for example, Patent Document 1).

Patent Document 1: JP-A-3-276367

However, because the technique of Patent Document 1 performs word analysis and then consults a thesaurus, it cannot output approximate example sentences at high speed. In addition, when the total number of characters or words is small, even a few mismatched characters or words produce a high mismatch ratio, so the similarity becomes low. Consequently, there are cases in which an example is easy for the user to reuse even though its similarity is low, and if the similarity is set high, such example sentences can no longer be output.
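The effect of sentence length on a ratio-based similarity can be checked with a little arithmetic. The sketch below is only an illustration: the function name `similarity_percent` and the 80% threshold are assumptions introduced here, not part of the patent text.

```python
def similarity_percent(total_words: int, mismatched_words: int) -> float:
    """Share of matching words, expressed as a percentage."""
    return 100 * (total_words - mismatched_words) / total_words


# One mismatched word in a 4-word sentence versus a 10-word sentence.
short_sentence = similarity_percent(4, 1)
long_sentence = similarity_percent(10, 1)

threshold = 80  # hypothetical user-specified similarity (%)
print(short_sentence, short_sentence >= threshold)
print(long_sentence, long_sentence >= threshold)
```

The same single-word mismatch leaves the long sentence above the threshold and pushes the short one below it, which is the situation the passage describes.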

An object of the present invention is to provide a machine translation apparatus, program, and method that automatically correct the lower limit of the specified similarity internally, even when the total number of characters or words is small, and can therefore output translation examples that are easy for the user to use even when the specified similarity is high.

The present invention is characterized by comprising: a bilingual database in which first-language example sentences and second-language example sentences are stored in advance in a storage device as paired translation examples; an input processing unit that reads a first-language original sentence and a similarity condition for translation examples, both input from an input device; similarity calculation means for calculating, as the similarity between the first-language original sentence and a first-language example sentence, the ratio of the number of matching words between the original sentence read by the input processing unit and the first-language example sentence in the bilingual database; a similarity condition correction table, stored in the storage device, in which correction values for the similarity are defined in advance based on the number of words in the first-language original sentence input from the input device; similarity condition correction means for correcting the similarity condition based on the number of words in the original sentence read by the input processing unit and the contents of the similarity condition correction table; bilingual database search means for searching the bilingual database for a first-language example sentence that satisfies the similarity condition corrected by the similarity condition correction means, together with its second-language counterpart; and an output processing unit that outputs the first-language example sentence found by the bilingual database search means and its second-language counterpart to an output device.

According to the present invention, even when the total number of characters or words is small, the lower limit of the specified similarity is automatically corrected internally, so translation examples that are easy for the user to use can be output even when the specified similarity is high.

FIG. 1 is a functional block diagram of the machine translation apparatus according to an embodiment of the present invention. FIG. 2 is a block diagram showing the hardware configuration of the machine translation apparatus according to the embodiment. FIG. 3 is a flowchart showing the operation of the machine translation apparatus according to the embodiment.

FIG. 1 is a functional block diagram of a machine translation apparatus according to an embodiment of the present invention, and FIG. 2 is a block diagram showing its hardware configuration. In FIG. 2, the machine translation apparatus 11 is realized, for example, by installing a software program such as a machine translation program on a general-purpose computer and executing that program on the processor 13 of the arithmetic control device 12.

The arithmetic control device 12 performs the various operations involved in machine translation and has a processor 13 and a memory 14. The memory 14 stores a machine translation program 15, and a work area 16 is used while the processor 13 executes processing. Calculation results and other output of the arithmetic control device 12 are displayed on the display device 18, which serves as the output device 17, and are also output to a communication network via the communication control device 19. The display device 18 may be, for example, a CRT display, a plasma display, or a liquid crystal display.

The input device 20 inputs information to the arithmetic control device 12 and consists of, for example, a mouse 21, a keyboard 22, a disk drive 23, and the communication control device 19. The mouse 21 and keyboard 22, for example, are used to input various commands to the arithmetic control device 12 via the display device 18, while the keyboard 22, disk drive 23, and communication control device 19 are used to input the document to be translated and the information needed for translation.

That is, the disk drive 23 reads and writes files of documents to be translated from and to a storage medium, and the communication control device 19 connects the machine translation apparatus 11 to a communication network such as the Internet or a LAN. The communication control device 19 is a device such as a LAN card or a modem; data exchanged with the communication network through it is passed to and from the arithmetic control device 12 as input or output signals. In addition, a hard disk drive (HDD) 24 is provided to store the calculation results of the arithmetic control device 12 as well as a translation dictionary and the like that hold the knowledge and rules needed for translation.

FIG. 1 is a functional block diagram of the machine translation apparatus 11 according to the embodiment of the present invention. Each functional block in the arithmetic control device 12 shown in FIG. 1 corresponds to one of the programs that make up the machine translation program 15 described above. That is, when the processor 13 executes those programs, the arithmetic control device 12 functions as the respective functional blocks. Each block of the storage device 25 corresponds to a storage area in the memory 14 of the arithmetic control device 12 or in the hard disk drive 24.

The storage device 25 of the machine translation apparatus 11 has: an input data storage unit 37 that stores input data input from the input device 20 and read by the input processing unit 31; a variable data storage unit 38 that stores variables used in arithmetic processing; a translation dictionary unit 26 that holds the various kinds of translation knowledge used during translation; a bilingual database 27 in which first-language example sentences and second-language example sentences are stored in advance as paired translation examples; a similarity condition correction table 28 in which correction values for the similarity between the first-language original sentence and a first-language example sentence are defined in advance based on the number of words in the first-language example sentences stored in the bilingual database 27; and a similar example list 29 that stores the translation examples (a first-language example sentence and its second-language counterpart) retrieved as similar to the first-language original sentence, together with that original sentence.

The arithmetic control device 12 of the machine translation apparatus 11 has: a control unit 30 that controls the entire apparatus; an input processing unit 31 that serves as the interface for external input; an output processing unit 32 that serves as the interface for external output; a translation unit 33 that translates the input document read via the input processing unit 31; similarity calculation means 34 that calculates, as the similarity, the ratio of matching words between the first-language original sentence input from the input device 20 and read by the input processing unit 31 and a first-language example sentence in the bilingual database; similarity condition correction means 35 that corrects the similarity condition based on the number of words in that original sentence and the contents of the similarity condition correction table 28; and bilingual database search means 36 that searches the bilingual database 27 for a first-language example sentence satisfying the similarity condition corrected by the similarity condition correction means 35, together with its second-language counterpart. Here, the similarity condition is a threshold for retrieving similar example sentences from the bilingual database 27 and is specified, for example, as a threshold expressing the similarity as a percentage.

The dictionary unit 26 holds the various kinds of knowledge and information needed for translation processing in the translation unit 33, and consists of a vocabulary section, morphological analysis rules, syntactic and semantic analysis rules, conversion rules, syntax generation rules, and the like. The vocabulary section records first-language words and idioms in association with their second-language translations, along with part-of-speech information, inflection information, concept information, and other information for both.

The bilingual database 27 stores pairs of first-language and second-language example sentences in advance as translation examples. Table 1 shows an example.

As shown in Table 1, the bilingual database 27 consists of paired first-language and second-language example sentences. For example, the first first-language example sentence in the bilingual database 27 is "This is a pen.", and its paired second-language example sentence is 「これはペンだ。」 ("This is a pen."). Likewise, the second first-language example sentence is "Documents should not be removed from the head office premises.", and its paired second-language example sentence is 「書類は、本社の社屋外に持ち出してはならない。」 ("Documents must not be taken outside the head office building."). The same pairing holds up to the n-th first-language and second-language example sentences.
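The two entries described for Table 1, and the exact-match lookup mentioned in the background, can be sketched as follows. This is a minimal illustration: the names `BILINGUAL_DB` and `exact_match` are assumptions made here, and an in-memory list stands in for whatever storage the apparatus actually uses.

```python
from typing import Optional

# The two translation examples described for Table 1
# (first language: English, second language: Japanese).
BILINGUAL_DB = [
    ("This is a pen.", "これはペンだ。"),
    (
        "Documents should not be removed from the head office premises.",
        "書類は、本社の社屋外に持ち出してはならない。",
    ),
]


def exact_match(source: str) -> Optional[str]:
    """Return the paired second-language sentence on an exact match, else None."""
    for first_language, second_language in BILINGUAL_DB:
        if first_language == source:
            return second_language
    return None


print(exact_match("This is a pen."))
print(exact_match("This is a book."))  # no exact match in this sample database
```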

The similarity condition correction table 28 stores data for correcting the similarity condition value specified by the user from the input device 20, based on attributes such as the number of characters or words in the first-language original sentence or the translation example, or the difference in the number of characters or words between the original sentence and the translation example. Table 2 shows an example of the similarity condition correction table 28 in which the correction value is defined by the number of words in the first-language original sentence.

As shown in Table 2, the correction value is -20% when the first-language original sentence has 4 words, -15% when it has 5 words, -10% when it has 6 to 9 words, 0% when it has 10 words, and +10% when it has 11 to 15 words. The correction value is negative when the total word count is small because, in short sentences, even a few mismatched characters or words produce a high mismatch ratio and hence a low similarity; lowering the threshold lets example sentences be retrieved even when their similarity is low. This makes it possible to retrieve examples that are easy for the user to use despite a low similarity.
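The Table 2 values listed above can be written as a small lookup. The sketch below is illustrative only: the names `TABLE_2`, `additive_correction`, and `corrected_condition` are assumptions, and returning no correction for word counts outside the listed ranges is also an assumption, since the text does not say what happens there.

```python
# (min_words, max_words, correction in percentage points), per the Table 2 description.
TABLE_2 = [
    (4, 4, -20),
    (5, 5, -15),
    (6, 9, -10),
    (10, 10, 0),
    (11, 15, 10),
]


def additive_correction(word_count: int) -> int:
    """Look up the correction value for the word count of the original sentence."""
    for low, high, correction in TABLE_2:
        if low <= word_count <= high:
            return correction
    return 0  # assumption: no correction outside the ranges given in the text


def corrected_condition(condition: int, word_count: int) -> int:
    """Add the (possibly negative) correction to the user-specified condition."""
    return condition + additive_correction(word_count)


print(corrected_condition(80, 4))   # short sentence: threshold is lowered
print(corrected_condition(80, 12))  # long sentence: threshold is raised
```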

In Table 2, the correction value for each word count is defined in percentage points and is added to or subtracted from the similarity condition specified by the user to obtain the corrected similarity condition. Alternatively, as shown in Table 3, the correction value may be defined as a correction coefficient, and the corrected similarity condition obtained by multiplying the user-specified similarity condition by that coefficient. Table 3 shows another example of the similarity condition correction table 28 in which the correction value is defined by the number of words in the first-language original sentence.

As shown in Table 3, the correction value is 0.8 when the first-language original sentence has 4 words, 0.9 when it has 5 words, 0.95 when it has 6 to 9 words, 1 when it has 10 words, and 1.1 when it has 11 to 15 words. The correction values are chosen so that the similarity condition becomes smaller when the total word count is small, so that examples that are easy for the user to use can be retrieved even for short sentences.
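The multiplicative variant differs from the additive one only in how the looked-up value is applied. A minimal sketch under the same caveats: `TABLE_3`, `correction_coefficient`, and `corrected_condition_scaled` are names introduced here, and a coefficient of 1.0 for word counts outside the listed ranges is an assumption.

```python
import math

# (min_words, max_words, correction coefficient), per the Table 3 description.
TABLE_3 = [
    (4, 4, 0.8),
    (5, 5, 0.9),
    (6, 9, 0.95),
    (10, 10, 1.0),
    (11, 15, 1.1),
]


def correction_coefficient(word_count: int) -> float:
    """Look up the coefficient for the word count of the original sentence."""
    for low, high, coefficient in TABLE_3:
        if low <= word_count <= high:
            return coefficient
    return 1.0  # assumption: leave the condition unchanged outside the listed ranges


def corrected_condition_scaled(condition: float, word_count: int) -> float:
    """Multiply the user-specified condition by the coefficient."""
    return condition * correction_coefficient(word_count)


# An 80% condition with a 4-word original sentence is scaled by 0.8.
print(math.isclose(corrected_condition_scaled(80, 4), 64.0))
```

Note that the two tables are not equivalent: for an 80% condition and a 4-word sentence, the additive table gives 60% while the coefficient gives about 64%.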

Alternatively, as shown in Table 4, the correction value may be determined according to the difference between the number of words in the first-language original sentence and the number of words in the first-language example sentence in the bilingual database 27.

As shown in Table 4, the correction value is 0% when the difference in word count between the first-language original sentence and the first-language example sentence is 0, -5% when the difference is 1, -10% when it is 2, -15% when it is 3, and -20% when it is 4.

The similarity condition is lowered when the difference in word count between the first-language original sentence and the first-language example sentence is large in order to compensate for the drop in similarity caused by that difference itself. For instance, if the first-language original sentence is "This is a heavy book." and it is compared word by word with the first-language example sentence "This is a pen." in the bilingual database, the two sentences have different word counts, so a difference of at least one word is unavoidable. The correction is intended to allow the example sentence "This is a pen." to be retrieved in such a case as well.
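The difference-based correction of Table 4 can be sketched in the same way. The names `TABLE_4`, `count_words`, and `difference_correction` are assumptions, as is counting words by splitting on whitespace; the sketch raises an error for differences above 4 because the text does not define them.

```python
# Word-count difference -> correction in percentage points, per the Table 4 description.
TABLE_4 = {0: 0, 1: -5, 2: -10, 3: -15, 4: -20}


def count_words(sentence: str) -> int:
    """Count words by splitting on whitespace (one of several possible definitions)."""
    return len(sentence.split())


def difference_correction(source: str, example: str) -> int:
    """Look up the correction for the word-count difference between two sentences."""
    difference = abs(count_words(source) - count_words(example))
    if difference not in TABLE_4:
        # The text only describes differences of 0 to 4.
        raise ValueError(f"no correction described for a difference of {difference}")
    return TABLE_4[difference]


source = "This is a heavy book."  # 5 words
example = "This is a pen."        # 4 words
condition = 80                    # hypothetical user-specified condition (%)
print(condition + difference_correction(source, example))
```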

The similar example list 29 stores the translation examples (a first-language example sentence and its second-language counterpart) retrieved by the bilingual database search means 36, together with the first-language original sentence. That is, translation examples that meet the similarity condition corrected by the similarity condition correction means 35 are retrieved and stored along with the original sentence. For example, when the first translation example in Table 1 is retrieved for the first-language original sentence "This is a book.", the original sentence "This is a book." is saved in the similar example list 29 together with the first-language example sentence "This is a pen." and the second-language example sentence 「これはペンだ。」.

The input processing unit 31 receives and processes input arriving through the input device 20, such as the communication control device 19 (for example, over the Internet) or the keyboard 22: the first-language original sentence to be translated, the similarity condition used to retrieve similar translation examples from those stored in the bilingual database 27, and various other commands. The input data read by the input processing unit 31, including the first-language original sentence and the similarity condition, is stored in the input data storage unit 37.

The output processing unit 32 outputs to the output device 17 the responses of the control unit 30 to the various commands issued to it. For example, it displays on the display device 18 the translation result from the translation unit 33, or the first-language example sentence and its second-language counterpart retrieved by the bilingual database search means 36.

The control unit 30 governs the operation of the entire apparatus. For example, it retrieves from the input data storage unit 37 the text data of the first-language original sentence to be translated and the similarity condition read by the input processing unit 31 and sends them to the translation unit 33, the similarity calculation means 34, and the similarity condition correction means 35; it also sends the translation result from the translation unit 33 and the search result from the bilingual database search means 36 to the output processing unit 32.

The translation unit 33 translates the first-language text data sent from the control unit 30 while referring to the dictionaries and other data stored in the dictionary unit 26.

The similarity calculation means 34 calculates, as the similarity, the ratio of matching words between the first-language original sentence read by the input processing unit 31 and a first-language example sentence in the bilingual database 27. For an exactly matching example sentence (similarity 100%), it returns the paired translation to the control unit 30 as the translated sentence; for an example sentence that does not match exactly, it performs word analysis using the data in the dictionary unit 26 and calculates the ratio of matching words as the similarity.

The similarity condition correction means 35 corrects the similarity condition based on the number of words in the first-language original sentence read by the input processing unit 31 and the contents of the similarity condition correction table 28. The bilingual database search means 36 then searches the bilingual database 27 for a first-language example sentence that satisfies the similarity condition corrected by the similarity condition correction means 35, together with its second-language counterpart, and saves the search result along with the first-language original sentence in the similar example list 29.

Next, the operation of the machine translation apparatus according to the embodiment of the present invention is described. FIG. 3 is a flowchart showing that operation. To make the flow easier to follow, the first language is taken to be English and the second language Japanese, and the correction values in the similarity condition correction table 28 are taken to be those shown in Table 2.

First, the user inputs the first-language original sentence to be translated and the similarity condition from the input device 20 (S1). As described above, the similarity condition is a threshold for retrieving similar example sentences from the bilingual database 27 and is specified, for example, as a threshold expressing the similarity as a percentage. Here, the input first-language original sentence is assumed to be "This is a book." and the similarity condition 80%.

When the first-language original sentence "This is a book." and the similarity condition of 80% are input from the input device 20, the input processing unit 31 passes them to the control unit 30. The control unit 30 stores the original sentence and the similarity condition received from the input processing unit 31 in the input data storage unit 37, and passes the stored original sentence "This is a book." and similarity condition of 80% to the similarity calculation means 34 and the similarity condition correction means 35.

The similarity calculation means 34 receives the first-language original sentence "This is a book." and the similarity condition of 80%, and assigns the similarity condition of 80% to the variable sim (S2). The variable sim is stored in the variable data storage unit 38 by the similarity calculation means 34 via the control unit 30. The similarity calculation means 34 also initializes the index variable n of the bilingual database 27 to 1 (S3), searches the bilingual database 27, and obtains the n-th (first) first-language example sentence, "This is a pen." (S4). The index variable n is likewise stored in the variable data storage unit 38 by the similarity calculation means 34 via the control unit 30.

The similarity calculation means 34 compares the two sentences (the first-language original sentence "This is a book." and the first-language example sentence "This is a pen.") and calculates the similarity (S5). Any method may be used to calculate the similarity, such as comparing characters, comparing words, comparing phrases, or weighting the similarity by comparing the parts of speech of words. In this embodiment, the similarity calculation means 34 calculates the similarity by comparing words. The only difference between the original sentence "This is a book." and the example sentence "This is a pen." is "book" versus "pen", so 3 of the 4 words match and the similarity calculation means 34 calculates the similarity as 75%. The similarity calculation means 34 then assigns the calculated similarity to the variable ruiji (S6). The variable ruiji is stored in the variable data storage unit 38 by the similarity calculation means 34 via the control unit 30.
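A word-based similarity that reproduces the 75% figure can be sketched as follows. The patent leaves the exact method open, so several choices here are assumptions: the names `tokenize` and `similarity`, splitting on whitespace with surrounding punctuation removed, counting matches as a multiset overlap regardless of word order, and dividing by the longer sentence's word count.

```python
from collections import Counter


def tokenize(sentence: str) -> list[str]:
    """Split on whitespace and drop surrounding punctuation (an assumed definition)."""
    stripped = (word.strip(".,!?\"'").lower() for word in sentence.split())
    return [word for word in stripped if word]


def similarity(source: str, example: str) -> float:
    """Percentage of matching words between the original and an example sentence."""
    source_words, example_words = tokenize(source), tokenize(example)
    longest = max(len(source_words), len(example_words))
    if longest == 0:
        return 0.0
    # Counter intersection keeps, per word, the smaller of the two counts.
    matched = sum((Counter(source_words) & Counter(example_words)).values())
    return 100 * matched / longest


ruiji = similarity("This is a book.", "This is a pen.")
print(ruiji)  # 3 of 4 words match
```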

Next, the similarity condition correction means 35 counts the number of words in the first-language original sentence "This is a book." (S7) and assigns the word count "4" to the variable word (S8). The variable word is stored by the similarity condition correction means 35 in the variable data storage unit 38 via the control unit 30. The words may be counted simply by splitting on spaces, or they may be counted in the units registered in the dictionary, using the data in the dictionary unit 26.
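The two counting options mentioned for step S7 might look like the sketch below. The dictionary contents and the greedy longest-match strategy are hypothetical; the text says only that the units registered in the dictionary unit 26 may be used.

```python
def count_by_spaces(sentence: str) -> int:
    """Count words by splitting on whitespace."""
    return len(sentence.split())


def count_by_dictionary(sentence: str, entries: set[str]) -> int:
    """Count units, treating a run of words found in `entries` as one unit.

    `entries` is a hypothetical stand-in for multi-word dictionary entries;
    greedy longest match is an assumed strategy.
    """
    words = [w.strip(".,!?").lower() for w in sentence.split()]
    max_len = max((len(e.split()) for e in entries), default=1)
    count, i = 0, 0
    while i < len(words):
        step = 1
        # Try the longest possible multi-word entry first.
        for n in range(min(max_len, len(words) - i), 1, -1):
            if " ".join(words[i:i + n]) in entries:
                step = n
                break
        i += step
        count += 1
    return count


sentence = "Documents should not be removed from the head office premises."
print(count_by_spaces(sentence))                       # 10
print(count_by_dictionary(sentence, {"head office"}))  # 9
```

The choice matters because the correction value is looked up by word count, so the two methods can select different rows of the correction table.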

The similarity condition correction means 35 then retrieves the variable word from the variable data storage unit 38, obtains from the similarity condition correction table 28 of Table 2 the correction value "-20%" for "4", the value of the variable word (S9), and assigns that correction value to the variable hosei (S10). The variable hosei is stored by the similarity condition correction means 35 in the variable data storage unit 38 via the control unit 30. The similarity condition correction means 35 then retrieves the variable hosei from the variable data storage unit 38 and uses it to correct the variable sim by sim = sim + hosei (S11). The corrected variable sim is written back by the similarity condition correction means 35 to the variable data storage unit 38 via the control unit 30.

That is, the similarity condition correction means 35 retrieves the variables sim and hosei from the variable data storage unit 38. Because hosei holds -20% and sim holds 80%, the corrected sim is 80% - 20% = 60%, and the corrected similarity condition is therefore 60%. The corrected variable sim is written back by the similarity condition correction means 35 to the variable data storage unit 38 via the control unit 30.

If the similarity condition correction table 28 of Table 3 is used instead of the similarity condition correction table 28 of Table 2, the value ×0.8 is obtained for the variable hosei, the corrected sim is 80% × 0.8 = 64%, and the corrected similarity condition is therefore 64%.
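The two correction styles (additive as in Table 2, multiplicative as in Table 3) can be sketched as below. The dictionaries hold only the values quoted in this passage; the remaining rows of the tables are not reproduced, and treating an unlisted word count as "no correction" is an assumption.

```python
# Only the entries quoted in the text; other rows of Tables 2 and 3 are not reproduced here.
ADDITIVE_TABLE = {4: -20, 10: 0}   # word count -> percentage points (Table 2 style)
MULTIPLICATIVE_TABLE = {4: 0.8}    # word count -> factor (Table 3 style)


def correct_additive(sim: float, word: int, table=ADDITIVE_TABLE) -> float:
    """S9-S11 with an additive table: sim = sim + hosei."""
    hosei = table.get(word, 0)  # assumption: unlisted counts leave sim unchanged
    return sim + hosei


def correct_multiplicative(sim: float, word: int, table=MULTIPLICATIVE_TABLE) -> float:
    """Same steps with a multiplicative table: sim = sim * hosei."""
    hosei = table.get(word, 1.0)  # assumption: unlisted counts leave sim unchanged
    return sim * hosei


print(correct_additive(80, 4))        # 60
print(correct_multiplicative(80, 4))  # 64.0
```

For short sentences both styles lower the threshold; the multiplicative form scales with the user's setting, whereas the additive form subtracts a fixed number of points.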

The parallel translation database search means 36 compares the similarity calculated by the similarity calculation means 34 with the value of the similarity condition corrected by the similarity condition correction means 35 (the variable sim obtained in step S11) (S12). If the calculated similarity is greater than the corrected similarity condition, the n-th (first) first-language example sentence and its second-language example sentence are stored in the similar example sentence list 29 (S13). If the calculated similarity is less than or equal to the corrected similarity condition, the parallel translation database search means 36 does not store the n-th (first) first-language example sentence and its second-language example sentence in the similar example sentence list 29. In this embodiment, the calculated similarity is 75% and the corrected similarity condition is 60%, so the example is stored in the similar example sentence list 29.

The parallel translation database search means 36 then determines whether the n-th (first) first-language example sentence is the last one (S14). If it is not, the search means resets the variable sim to its initial value of 80% (S15), increments the index variable n of the parallel translation database 27 (S16), and returns to step S4.

The similarity calculation means 34 thus obtains the second (n = 2) first-language example sentence in the parallel translation database 27, "Documents should not be removed from the head office premises.", compares it with the first-language original sentence "This is a book.", and calculates the similarity (S5). None of the ten words in the example sentence "Documents should not be removed from the head office premises." matches a word in "This is a book.", so a similarity of 0% is calculated. The similarity calculation means 34 then assigns the calculated similarity of 0% to the variable ruiji (S6).

Next, the similarity condition correction means 35 counts the number of words in the first-language sentence "Documents should not be removed from the head office premises." (S7) and assigns the word count "10" to the variable word (S8). The similarity condition correction means 35 then obtains from the similarity condition correction table 28 of Table 2 the correction value "0%" for "10", the value of the variable word (S9), and assigns that correction value to the variable hosei (S10). The similarity condition correction means 35 then uses the variable hosei to correct the variable sim by sim = sim + hosei (S11). As a result, the variable sim, which is the corrected similarity condition, is 80%.

In this case, the similarity calculated by the similarity calculation means 34 is 0% and the similarity condition corrected by the similarity condition correction means 35 (the variable sim obtained in step S11) is 80%. Because the calculated similarity is lower than the corrected similarity condition (S12), the parallel translation database search means 36 does not store the n-th (second) first-language example sentence and its second-language example sentence in the similar example sentence list 29 (S13).

The processing of steps S4 to S16 is repeated until the n-th first-language example sentence is the last one. The parallel translation database search means 36 then returns the contents of the completed similar example sentence list 29 to the control unit 30, and the control unit 30 outputs those contents (S17). That is, the control unit 30 causes the output processing unit 32 to display the contents of the similar example sentence list 29 on, for example, the display device 18 of the output device 17. As a result, even when the user specifies a similarity condition of 80%, a short example sentence whose similarity drops to 75% because of a single-word difference can still be output as a translation example.
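The loop over steps S2 to S17 can be sketched end to end as follows. This is an illustrative reading rather than the patented implementation. The tokenization and similarity denominator are assumptions, the correction table holds only the two values quoted in the walk-through, and the second-language strings are placeholders. The walk-through reports word counts of 4 and then 10, which this sketch reproduces by counting the example sentence fetched in S4; counting the input sentence instead would give 4 on both passes and the same final list.

```python
from collections import Counter
import string

# Only the two correction values quoted in the walk-through; the rest of Table 2 is not reproduced.
CORRECTION_TABLE = {4: -20, 10: 0}


def tokenize(sentence):
    """Whitespace split, punctuation stripped, lowercased (assumed normalization)."""
    words = (w.strip(string.punctuation).lower() for w in sentence.split())
    return [w for w in words if w]


def similarity(original, example):
    """Matching-word percentage; dividing by the longer sentence is an assumption."""
    a, b = tokenize(original), tokenize(example)
    if not a or not b:
        return 0.0
    matched = sum((Counter(a) & Counter(b)).values())
    return 100.0 * matched / max(len(a), len(b))


def find_similar_examples(original, sim_condition, database):
    similar_list = []
    for first_lang, second_lang in database:      # S3, S4, S14, S16: walk the database
        ruiji = similarity(original, first_lang)  # S5, S6
        word = len(tokenize(first_lang))          # S7, S8 (assumed: counted on the example)
        hosei = CORRECTION_TABLE.get(word, 0)     # S9, S10 (assumed: 0 when unlisted)
        sim = sim_condition + hosei               # S11; rebuilt from the user value each pass (S15)
        if ruiji > sim:                           # S12: strictly greater, as in the text
            similar_list.append((first_lang, second_lang))  # S13
    return similar_list                           # S17


database = [
    ("This is a pen.", "<second-language example 1>"),
    ("Documents should not be removed from the head office premises.",
     "<second-language example 2>"),
]
print(find_similar_examples("This is a book.", 80, database))
```

With a user setting of 80%, only the first pair is returned: its similarity of 75% exceeds the corrected threshold of 60%, while the second pair scores 0% against an uncorrected 80%.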

Suppose that the input original sentence is "This is a heavy book." when the similarity calculation means 34 calculates the similarity. Comparing it word by word with the first-language example sentence "This is a pen." in the parallel translation database 27 unavoidably produces a difference of one word, because the two sentences contain different numbers of words. When the sentences being compared differ in word count in this way, correction values are defined according to the difference between the number of words in the first-language original sentence and the number of words in the first-language example sentence in the parallel translation database 27, as shown in the similarity condition correction table 28 of Table 4, and the similarity condition is further corrected using this table.
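A correction keyed on the word-count difference might be sketched as below. The table values are hypothetical placeholders, since the rows of Table 4 are not reproduced in this passage, and the additive form is assumed by analogy with Table 2.

```python
# Hypothetical values for illustration only; Table 4's actual rows are not shown in this passage.
DIFFERENCE_TABLE = {0: 0, 1: -10, 2: -20}  # |word-count difference| -> percentage points


def word_count(sentence: str) -> int:
    """Count words by splitting on whitespace."""
    return len(sentence.split())


def correct_for_length_difference(sim, original, example, table=DIFFERENCE_TABLE):
    """Return (corrected similarity condition, word-count difference)."""
    diff = abs(word_count(original) - word_count(example))
    hosei = table.get(diff, 0)  # assumption: differences not listed leave sim unchanged
    return sim + hosei, diff


corrected, diff = correct_for_length_difference(80, "This is a heavy book.", "This is a pen.")
print(diff)       # 1
print(corrected)  # 70 with the hypothetical table above
```

Because the text describes this as a further correction, the `sim` passed in would normally be the value already corrected by word count.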

The definitions of the similarity condition correction table 28 shown in Tables 1 to 4 of this embodiment are only examples; the number of levels and the numerical values can be set freely. Although the description uses the number of words, the similarity correction value can also be determined from other attributes, such as the number of characters or the number of phrases. Furthermore, instead of varying the correction value according to a levelled attribute value (number of words, characters, phrases, etc.), a uniform correction rate may be defined.

In the above description, a first-language example sentence is obtained from the parallel translation database 27 in step S4, and the process then moves to step S5, where the similarity is calculated and the similarity condition is corrected. It is also effective, however, to correct the similarity condition before searching the parallel translation database 27, using either the attributes of the original sentence alone (number of words, characters, phrases, etc.) or those attributes combined with, for example, the average attribute values of the first-language example sentences stored in the parallel translation database 27, and then to search the parallel translation database 27 for examples using the corrected similarity condition. In this case the similarity condition can be used to limit (prune) the search range, which also improves search speed.
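One way such pruning could work is sketched below. It assumes the similarity is a matching-word ratio divided by the longer sentence's word count, so the similarity can never exceed min/max of the two word counts; the patent text does not specify this bound, and the function names are hypothetical.

```python
def length_bound(n_original: int, n_example: int) -> float:
    """Upper bound (in %) on a matching-word ratio whose denominator is the longer sentence."""
    return 100.0 * min(n_original, n_example) / max(n_original, n_example)


def prune_candidates(original: str, corrected_sim: float, examples: list[str]) -> list[str]:
    """Keep only examples whose best possible similarity could still exceed corrected_sim."""
    n_o = len(original.split())
    kept = []
    for example in examples:
        n_e = len(example.split())
        # An example is discarded without any word-by-word comparison
        # when even a perfect overlap could not beat the threshold.
        if n_e and length_bound(n_o, n_e) > corrected_sim:
            kept.append(example)
    return kept


examples = [
    "This is a pen.",
    "Documents should not be removed from the head office premises.",
]
# 60 is the threshold after correcting 80% by -20% for a four-word original sentence.
print(prune_candidates("This is a book.", 60, examples))  # ['This is a pen.']
```

Here the ten-word example is skipped because its best possible score against a four-word input is 40%, so only the remaining candidates need a full similarity calculation.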

According to this embodiment, the apparatus is provided with the similarity condition correction table 28, which defines corrections to the similarity, and the similarity condition correction means 35, which corrects the similarity condition according to an attribute of the first-language original sentence or of the translation example, based on the contents of the similarity condition correction table 28. The specified similarity can therefore be adjusted to suit the first-language original sentence and the translation example, and whether a similar translation example is output can be controlled. Specifically, because correction values are defined for attributes such as the number of words, characters, or phrases in the first-language original sentence, examples that the user can readily reuse can be retrieved even when the total number of words, characters, or phrases is small. As a result, even a short example sentence that falls below the specified similarity because of a one-character difference can be output.

Furthermore, by using as the attribute of the first-language original sentence or translation example the difference in the number of characters, words, or phrases between the first-language original sentence and the translation example, similar example sentences can be output without being affected by that difference, even when the original sentence and the translation example differ in the number of characters, words, or phrases.

11: machine translation apparatus, 12: arithmetic control device, 13: processor, 14: memory, 15: program, 16: work area, 17: output device, 18: display device, 19: communication control device, 20: input device, 21: mouse, 22: keyboard, 23: disk drive, 24: hard disk drive, 25: storage device, 26: dictionary unit, 27: parallel translation database, 28: similarity condition correction table, 29: similar example sentence list, 30: control unit, 31: input processing unit, 32: output processing unit, 33: translation unit, 34: similarity calculation means, 35: similarity condition correction means, 36: parallel translation database search means, 37: input data storage unit, 38: variable data storage unit

Claims

A machine translation apparatus comprising: a parallel translation database in which first-language example sentences and second-language example sentences are stored in advance in a storage device as paired translation examples; an input processing unit that reads a first-language original sentence and a similarity condition for translation examples, both input from an input device; similarity calculation means for calculating, as the similarity between the first-language original sentence and a first-language example sentence, the ratio of the number of matching words between the first-language original sentence read by the input processing unit and the first-language example sentence in the parallel translation database; a similarity condition correction table, stored in the storage device, in which a correction value for the similarity is defined in advance based on the number of words in the first-language original sentence input from the input device; similarity condition correction means for correcting the similarity condition based on the number of words in the first-language original sentence read by the input processing unit and the contents of the similarity condition correction table; parallel translation database search means for searching the parallel translation database for a first-language example sentence that satisfies the similarity condition corrected by the similarity condition correction means and for the second-language example sentence that is its translation; and an output processing unit that outputs, to an output device, the first-language example sentence found by the parallel translation database search means and the second-language example sentence that is its translation.

2. The machine translation apparatus according to claim 1, wherein the number of characters or the number of phrases in the first-language example sentence or original sentence is used in place of the number of words in the first-language original sentence.

3. The machine translation apparatus according to claim 1, wherein the difference in the number of words, the difference in the number of characters, or the difference in the number of phrases between the first-language example sentence and the original sentence is used in place of the number of words in the first-language original sentence.

A machine translation program for causing a computer, the computer having a parallel translation database in which first-language example sentences and second-language example sentences are stored in advance in a storage device as paired translation examples, and a similarity condition correction table, stored in the storage device, in which a correction value for the similarity between a first-language original sentence and a first-language example sentence is defined in advance based on the number of words in the first-language original sentence input from an input device, to execute: a procedure of reading the first-language original sentence and a similarity condition for translation examples, both input from the input device; a procedure of calculating, as the similarity between the first-language original sentence and a first-language example sentence, the ratio of the number of matching words between the read first-language original sentence and the first-language example sentence in the parallel translation database; a procedure of correcting the similarity condition based on the number of words in the read first-language original sentence and the contents of the similarity condition correction table; a procedure of searching the parallel translation database for a first-language example sentence that satisfies the corrected similarity condition and for the second-language example sentence that is its translation; and a procedure of outputting, to an output device, the first-language example sentence found by the search and the second-language example sentence that is its translation.

A machine translation method executed by a computer having a parallel translation database in which first-language example sentences and second-language example sentences are stored in advance in a storage device as paired translation examples, and a similarity condition correction table, stored in the storage device, in which a correction value for the similarity between a first-language original sentence and a first-language example sentence is defined in advance based on the number of words in the first-language original sentence input from an input device, the method comprising: a step of reading the first-language original sentence and a similarity condition for translation examples, both input from the input device; a step of calculating, as the similarity between the first-language original sentence and a first-language example sentence, the ratio of the number of matching words between the read first-language original sentence and the first-language example sentence in the parallel translation database; a step of correcting the similarity condition based on the number of words in the read first-language original sentence and the contents of the similarity condition correction table; a step of searching the parallel translation database for a first-language example sentence that satisfies the corrected similarity condition and for the second-language example sentence that is its translation; and a step of outputting, to an output device, the first-language example sentence found by the search and the second-language example sentence that is its translation.