JP2006065542A

JP2006065542A - Machine translation method

Info

Publication number: JP2006065542A
Application number: JP2004246349A
Authority: JP
Inventors: Fukutsugu Nin; 福継任
Original assignee: University of Tokushima NUC
Current assignee: University of Tokushima NUC
Priority date: 2004-08-26
Filing date: 2004-08-26
Publication date: 2006-03-09

Abstract

PROBLEM TO BE SOLVED: To provide a machine translation method that is suitable for translation among many languages, speeds up translation processing, produces natural text as the translation result, and simplifies customization for each user.

SOLUTION: The machine translation divides an input source-language sentence into a noun removal sentence and nouns. It then searches a conditional correspondence expression database and a noun database to translate them into a target-language noun removal sentence and target-language nouns. Finally, it translates the source-language sentence into a target-language sentence simply by combining the target-language noun removal sentence with the target-language nouns. All words and phrases other than nouns are treated as a noun removal sentence, that is, a plain character string, and no syntactic or semantic analysis is performed. The conditional correspondence expression database and the noun database are only searched; no analysis of any kind is performed on them.

COPYRIGHT: (C)2006,JPO&NCIPI

Description

The present invention relates to a machine translation method for translating a source language into a target language, and more particularly to a machine translation method suitable for translation among multiple languages.

Modern society is a highly information-oriented society. With the development of communication technology, including the Internet, it has become possible to obtain information from around the world instantly and without geographical restrictions. However, the required information is often written in a language other than the reader's native language, so the demand for machine translation is increasing. Various machine translation services have already been put into practical use, and most of them are rule-based translation methods built on grammatical rules (see JP 2004-86919 A). With rule-based methods, however, translation processing takes time and the translations produced are often unnatural, so it is hard to say that users are satisfied.
JP 2004-86919 A

Users want a machine translation method with five properties:
- it is easy to operate;
- it translates quickly;
- it produces natural translations;
- it is easy to customize to each user's usage;
- it is suitable for translation among multiple languages.

Conventional rule-based machine translation methods, however, perform processing based on syntactic and semantic analysis. As a result, translation takes time, and there is a limit to how well the results can meet users' requirements. In response, machine translation methods that do not require syntactic or semantic analysis have been invented (see JP 2002-7392 A, JP 2002-7393 A, and JP 2002-7395 A). These methods, however, still analyze verbs, including tense, and identify imperative and interrogative sentences from sentence patterns. The problem therefore remains that translation processing cannot be fully simplified.
JP 2002-7392 A
JP 2002-7393 A
JP 2002-7395 A

In view of the situation described above, an object of the present invention is to provide a machine translation method with four properties:
- it performs translation processing at high speed;
- it produces natural sentences as translation results;
- it is easy to customize for each user;
- it is suitable for translation among multiple languages.

The inventor conducted intensive research over many years to solve the above problems. This work produced a machine translation method and system that require no syntactic or semantic analysis, and led to the following inventions.

A first invention is a machine translation method for translating a source language into a target language, comprising:
(1) a step of creating a conditional correspondence expression database that stores conditional correspondence expressions, each associating a source-language noun removal sentence with a target-language noun removal sentence;
(2) a step of creating a noun database that stores source-language nouns, the target-language nouns corresponding to them, and the attributes of the nouns associated with the source-language nouns and the target-language nouns;
(3) a step of inputting a source-language sentence to be translated;
(4) a step of extracting nouns from the source-language sentence and dividing it into a source-language noun removal sentence and source-language nouns;
(5) a step of searching the noun database for the source-language nouns to determine their attributes, and translating the source-language nouns into target-language nouns;
(6) a step of searching the conditional correspondence expression database for the source-language noun removal sentence, using the noun attributes as conditions, to determine a conditional correspondence expression, and translating the source-language noun removal sentence into a target-language noun removal sentence;
(7) a step of inserting the target-language nouns obtained in step (5) into the target-language noun removal sentence to create a target-language sentence; and
(8) a step of outputting the target-language sentence as the translation result.
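The steps above can be sketched as a toy Python pipeline. This is a minimal illustration under stated assumptions, not the patented implementation. The following are all hypothetical choices:
- the dictionaries `NOUN_DB` and `COND_DB`;
- the `{x1}`, `{x2}`, ... placeholder syntax;
- greedy longest-match noun extraction, since the text does not say how nouns are extracted in step (4).

Steps (1) and (2) are stood in for by hard-coded tables built from the taxi example used later in this description.

```python
from dataclasses import dataclass

# Steps (1)-(2), hypothetical toy data taken from the worked example:
# source noun -> (attribute, {target language: target noun})
NOUN_DB = {
    "彼": ("person", {"en": "he", "zh": "他"}),
    "駅": ("place", {"en": "the station", "zh": "車站"}),
    "タクシー": ("transportation", {"en": "a taxi", "zh": "出租車"}),
}


@dataclass
class CondExpr:
    source: str        # source noun removal sentence with placeholders {x1}, {x2}, ...
    conditions: dict   # placeholder -> required attribute (None means unconditional)
    targets: dict      # target language -> target noun removal sentence


COND_DB = [
    CondExpr("{x1}は{x2}まで{x3}に乗った。",
             {"x1": None, "x2": "place", "x3": "transportation"},
             {"en": "{x1} took {x3} to {x2}.", "zh": "{x1}乗{x3}去了{x2}。"}),
]


def split_nouns(sentence):
    """Step (4): replace known nouns with placeholders, longest match first (an assumption)."""
    keys = sorted(NOUN_DB, key=len, reverse=True)
    nouns, template, i = [], "", 0
    while i < len(sentence):
        match = next((k for k in keys if sentence.startswith(k, i)), None)
        if match:
            nouns.append(match)
            template += "{x%d}" % len(nouns)
            i += len(match)
        else:
            template += sentence[i]
            i += 1
    return template, nouns


def translate(sentence, lang):
    template, nouns = split_nouns(sentence)                                  # step (4)
    attrs = {f"x{n}": NOUN_DB[w][0] for n, w in enumerate(nouns, 1)}        # step (5)
    words = {f"x{n}": NOUN_DB[w][1][lang] for n, w in enumerate(nouns, 1)}  # step (5)
    for expr in COND_DB:                                                     # step (6)
        if expr.source == template and all(
            req is None or attrs.get(ph) == req for ph, req in expr.conditions.items()
        ):
            out = expr.targets[lang].format(**words)                         # step (7)
            return out[0].upper() + out[1:]  # capitalise the first letter (English)
    return None  # no matching conditional correspondence expression


for lang in ("en", "zh"):                                                    # steps (3), (8)
    print(translate("彼は駅までタクシーに乗った。", lang))
# He took a taxi to the station.
# 他乗出租車去了車站。
```

Note that no parsing happens anywhere: the template is compared as a plain string, and only the attributes are checked against the conditions.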

A second invention is the machine translation method according to the first invention, wherein the conditional correspondence expression database is a collection of a plurality of conditional correspondence expression databases classified by writing style, and the user selects and uses one or more of these databases according to the purpose of use.

A third invention is the machine translation method according to the first or second invention, wherein a user can add conditional correspondence expressions to the conditional correspondence expression database.

A fourth invention is the machine translation method according to any one of the first to third inventions, wherein the noun database is a collection of a plurality of noun databases classified by field, and the user selects and uses one or more of these noun databases according to the purpose of use.

A fifth invention is the machine translation method according to any one of the first to fourth inventions, wherein a user can add nouns to the noun database.

A sixth invention is the machine translation method according to any one of the first to fifth inventions, wherein the items in the conditional correspondence expression database and the noun database are registered in phonological order. When these databases are searched, the phoneme of each character is determined in turn, starting from the beginning of the character string.

A seventh invention is a machine translation system for translating a source language into a target language, comprising:
(1) an input device for inputting a source-language sentence to be translated;
(2) an output device for outputting a target-language sentence as the translation result;
(3) a conditional correspondence expression database that stores conditional correspondence expressions, each associating a source-language noun removal sentence with a target-language noun removal sentence;
(4) a noun database that stores source-language nouns, the target-language nouns corresponding to them, and the attributes of the nouns associated with the source-language nouns and the target-language nouns; and
(5) a translation engine that searches the conditional correspondence expression database and the noun database based on the input source-language sentence and performs translation into the target language.

An eighth invention is the machine translation system according to the seventh invention, wherein the conditional correspondence expression database, the noun database, and the translation engine are stored on a server, and the server is accessed and used from each communication terminal.

A ninth invention is the machine translation system according to the eighth invention, wherein the communication terminal is a mobile phone.

Before the present invention is described, the terms "noun removal sentence", "attribute", and "conditional correspondence expression" are defined.

In the present invention, a "noun removal sentence" is the part of a source-language or target-language sentence that remains after the nouns are removed. A noun removal sentence consists of the words other than nouns, including verbs and particles. Every noun removal sentence is treated as a plain character string.

In the present invention, the "attribute" of a noun is a class of the noun's concept, such as person, animal, place, tool, food, or means of transportation. When a single noun can have more than one attribute, its attributes are ranked by priority according to their frequency of appearance.
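The text does not spell out how the frequency-based priority is used during matching. One plausible reading is sketched below purely as an assumption: try a noun's attributes in priority order and take the first one that satisfies the condition of a candidate conditional correspondence expression. The following are all hypothetical:
- the table `NOUN_ATTRIBUTES`;
- the function `pick_attribute`;
- the ordering shown for 鶏 (chicken: the animal or the food).

```python
# Hypothetical noun table: each noun lists its attributes in priority order
# (highest appearance frequency first). The ordering for 鶏 is illustrative only.
NOUN_ATTRIBUTES = {
    "食堂": ["place"],
    "ナイフ": ["tool"],
    "鶏": ["animal", "food"],  # chicken: the live bird or the meat
}


def pick_attribute(noun, allowed=None):
    """Return the highest-priority attribute of `noun`.

    If `allowed` is given (the attributes a candidate conditional correspondence
    expression accepts), return the first attribute in priority order that is
    in `allowed`, or None if there is none.
    """
    attributes = NOUN_ATTRIBUTES.get(noun, [])
    if allowed is None:
        return attributes[0] if attributes else None
    for attribute in attributes:
        if attribute in allowed:
            return attribute
    return None


print(pick_attribute("鶏"))                  # animal
print(pick_attribute("鶏", {"food"}))        # food
print(pick_attribute("ナイフ", {"place"}))   # None
```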

In the present invention, a "conditional correspondence expression" is an expression that associates a source-language noun removal sentence with a target-language noun removal sentence. In it, the attribute of the noun is written in each position where a noun originally appeared.

Conditional correspondence expressions that use the same verb are treated as different expressions when their word order or words differ, as with questions, imperatives, or changes of tense. However, the variants can be stored as a single set in the conditional correspondence expression database when the content is the same and only the sentence ending differs slightly. An example is the plain form 行く and the polite form 行きます, both meaning "go".
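Such grouping can be sketched minimally as follows. The sketch assumes each database entry holds a set of interchangeable source-side forms that share one target-side expression. The following are hypothetical:
- the list `EXPRESSION_SETS`;
- the function `find_expression`;
- the `{x1}` placeholder syntax;
- the English targets.

The question form is kept as a separate entry, as the text requires.

```python
# Hypothetical entries: each groups source noun removal sentences that differ only
# in the sentence ending (plain vs polite) and share one target expression.
EXPRESSION_SETS = [
    {
        "sources": {"{x1}は行く。", "{x1}は行きます。"},  # plain and polite "go"
        "conditions": {"x1": "person"},
        "target": "{x1} goes.",
    },
    {
        "sources": {"{x1}は行きますか。"},  # question: different words, separate entry
        "conditions": {"x1": "person"},
        "target": "Does {x1} go?",
    },
]


def find_expression(source_template):
    """Return the entry whose set of source forms contains `source_template`, or None."""
    for entry in EXPRESSION_SETS:
        if source_template in entry["sources"]:
            return entry
    return None


plain = find_expression("{x1}は行く。")
polite = find_expression("{x1}は行きます。")
print(plain is polite)                                  # True
print(find_expression("{x1}は行きますか。")["target"])  # Does {x1} go?
```

Storing the variants as one set keeps a single target expression per meaning while still accepting every listed ending.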

Machine translation according to the present invention divides an input source-language sentence into a noun removal sentence and nouns. It searches the conditional correspondence expression database and the noun database to translate them into a target-language noun removal sentence and target-language nouns. It then produces the target-language sentence simply by combining the two.
All words and phrases other than nouns are treated as a noun removal sentence, that is, a plain character string, and no syntactic or semantic analysis is performed. For conditional correspondence expressions and nouns alike, the respective database is only searched; no analysis of any kind is performed.

Translation is possible without the syntactic and semantic analysis that conventional rule-based machine translation methods require, so high processing speed is obtained.

Each conditional correspondence expression in the database was originally entered by taking a natural sentence in each language and dividing it into a noun removal sentence and attributes. The translation results are therefore natural sentences.

A conditional correspondence expression can be built easily by entering a parallel (bilingual) sentence pair. Conditional correspondence expression databases and noun databases can also be added, so customization for each user is easy.

Machine translation according to the present invention uses only two resources: the conditional correspondence expression database and the noun database. The expression database is built by dividing natural sentences in each language into noun removal sentences and attributes. Because grammar does not need to be considered, the method is suitable for translation among multiple languages.

FIG. 1 shows a block diagram of the machine translation system according to the present invention.

The machine translation system shown in FIG. 1 comprises:
an input device 100 for inputting a source-language sentence to be translated;
an output device 200 for outputting a target-language sentence as the translation result;
a translation engine 300 that searches the conditional correspondence expression database and the noun database based on the input source-language sentence and performs translation into the target language;
a conditional correspondence expression database 400 that stores conditional correspondence expressions, each associating a source-language noun removal sentence with a target-language noun removal sentence; and
a noun database 500 that stores source-language nouns, the target-language nouns corresponding to them, and the attributes of the nouns associated with the source-language nouns and the target-language nouns.

The input device 100 may be any means normally used to operate a computer, such as a keyboard, an external storage medium, or download from the Internet. A speech input system may also be used in combination.

The output device 200 may be any device normally used with a computer, such as a display or a printer. A speech output system may also be used in combination.

The conditional correspondence expression database 400 may be a collection 410 of style-specific conditional correspondence expression databases classified by writing style.
For the machine translation of the present invention to produce natural translations, a well-stocked conditional correspondence expression database is desirable. Different users, however, need different registers:
- At universities and research institutes, machine translation is often applied to academic papers and lecture manuscripts, for which written-style expressions are important.
- In everyday use of the Internet or a mobile phone, colloquial expressions are expected to be needed more often.

Conditional correspondence expressions can therefore be stored separately by style, such as conversational style and academic-paper style. The style-specific database suited to the purpose of use is then selected at translation time. This reduces the size of each database and can further improve processing speed.

A user-specific conditional correspondence expression database 420 may also be provided, to which the user can add conditional correspondence expressions. It is processed in the same way as the style-specific databases, which makes customization to each user's needs easy.
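Selecting databases by purpose can be sketched minimally as follows. The sketch assumes each style-specific database is a simple mapping from source noun removal sentence to target noun removal sentence, with attribute conditions omitted for brevity. The following are hypothetical design choices, not taken from the text:
- the names `STYLE_DBS` and `build_active_db`;
- the example entries;
- the rule that user entries take precedence over style entries.

```python
# Hypothetical style-specific databases: style -> {source template: target template}.
# Attribute conditions are omitted to keep the example short.
STYLE_DBS = {
    "conversation": {"{x1}に行く。": "I'm going to {x1}."},
    "paper": {"{x1}を提案する。": "We propose {x1}."},
}


def build_active_db(styles, user_db=None):
    """Merge the selected style databases, then the user database, into one lookup table.

    Later sources override earlier ones, so a user's own expression replaces a
    style entry with the same source template (an assumed precedence rule).
    """
    active = {}
    for style in styles:
        active.update(STYLE_DBS[style])
    if user_db:
        active.update(user_db)
    return active


chat = build_active_db(["conversation"])
print(sorted(chat))  # ['{x1}に行く。']
custom = build_active_db(["conversation", "paper"], {"{x1}に行く。": "Heading to {x1}."})
print(custom["{x1}に行く。"])  # Heading to {x1}.
```

Loading only the selected styles is what keeps the searched table small, which is the speed benefit the text describes.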

The noun database 500 may be a collection 510 of field-specific noun databases classified by field.
The vocabulary size of the noun database is a major factor in translation ability. Yet the words needed to translate academic papers and those needed to translate everyday conversation can be said to be entirely different. For academic papers in particular, a rich vocabulary in each specialized field is desirable.
Nouns can therefore be stored separately by field, such as biochemistry and mechanical engineering. The field-specific noun database suited to the purpose of use is then selected at translation time. This reduces the size of each database and can further improve processing speed.

A user-specific noun database 520 may also be provided, to which the user can add nouns. It is processed in the same way as the field-specific noun databases, which makes customization to each user's needs easy.

The items in the conditional correspondence expression database and the noun database are registered in phonological order. A search determines the phoneme of each character in turn, starting from the beginning of the character string. The only factor affecting search time is therefore the number of characters in the target noun removal sentence or noun, so searching is fast regardless of how many items the database contains.
When the conditional correspondence expression database is searched using a noun removal sentence, the character strings before and after each noun are split off and searched separately.

The machine translation system embodying the present invention need not run only on a single computer. It can also be used where personal communication terminals can access a common server, such as on an intranet or the Internet. In that case, the conditional correspondence expression database, the noun database, and the translation engine may be stored on the server and accessed from each user's terminal. Keeping the databases, which require storage capacity, on the server reduces the load on each communication terminal.

The communication terminal may also be a mobile terminal such as a mobile phone or a PDA. Because its translation processing is simple, the machine translation of the present invention places little demand on memory capacity and power consumption. This makes it well suited to mobile terminals such as mobile phones and PDAs.

The process of translating among multiple languages is described below. The example translates a Japanese input sentence into English and Chinese.
The following notation is used:
- J: Japanese sentence
- E: English sentence
- C: Chinese sentence
- j: Japanese noun
- e: English noun
- c: Chinese noun
- x: noun (attribute)

J(x), E(x), and C(x) are conditional correspondence expressions. In a conditional correspondence expression, only the attribute of a noun is meaningful, so x is used to denote a noun independent of language.

FIG. 2 shows the machine translation flowchart. The machine translation steps are as described in claim 1. The conditional correspondence expression database and the noun database of steps (1) and (2) are assumed to have been created already. Table 1 shows the conditional correspondence expression database, and Table 2 shows the noun database.

Step (3): A source-language sentence is input as the translation target.
As an example, suppose the Japanese sentence “彼は駅までタクシーに乗った。” (“He took a taxi to the station.”) is input.

Step (4): Nouns are extracted from the source-language sentence, which is divided into a source-language noun removal sentence and source-language nouns.
The source-language noun removal sentence is “J: j1はj2までj3に乗った。”. The extracted nouns are “j1 = 彼 (he), j2 = 駅 (station), j3 = タクシー (taxi)”.

Step (5): The noun database is searched for the source-language nouns to determine their attributes, and the source-language nouns are translated into target-language nouns.
Searching the noun database gives the attributes “x1: person, x2: place, x3: means of transportation”. The English nouns are “e1 = he, e2 = the station, e3 = a taxi”. The Chinese nouns are “c1 = 他, c2 = 車站, c3 = 出租車”.

Step (6): The conditional correspondence expression database is searched for the source-language noun removal sentence, using the noun attributes as conditions. A conditional correspondence expression is determined, and the source-language noun removal sentence is translated into the target-language noun removal sentence.
The search uses the noun removal sentence “J: j1はj2までj3に乗った。” with the attributes “x1: person, x2: place, x3: means of transportation” as conditions. It yields the conditional correspondence expression “J(x) = x1はx2までx3に乗った。 (conditions x1: unconditional, x2: place, x3: means of transportation)”.
The Japanese noun removal sentence is then translated into English and Chinese noun removal sentences. The results are “E: e1 took e3 to e2.” and “C: c1乗c3去了c2。”.

Step (7): The target-language nouns obtained in step (5) are inserted into the target-language noun removal sentence to create the target-language sentence.
The English sentence is created by inserting the English nouns “e1 = he, e2 = the station, e3 = a taxi” into the English noun removal sentence “E: e1 took e3 to e2.”.
The Chinese sentence is created by inserting the Chinese nouns “c1 = 他, c2 = 車站, c3 = 出租車” into the Chinese noun removal sentence “C: c1乗c3去了c2。”.

Step (8): The target-language sentences are output as the translation result.
For the input “J: 彼は駅までタクシーに乗った。”, the system displays two translations:
- the English sentence “E: He took a taxi to the station.”;
- the Chinese sentence “C: 他乗出租車去了車站。”.

Japanese particles and English prepositions play similar roles, but the words do not correspond one to one. Conventional machine translation has therefore not achieved sufficient accuracy here, and a remedy has been sought. In the present invention, particles and prepositions are distinguished by the attributes set in the noun database.

The distinction between prepositions is explained by translating two Japanese sentences into English: a) “J: ~食堂で食べる~” (“eat at the cafeteria”) and b) “J: ~ナイフで食べる~” (“eat with a knife”). The translation results are shown first:
a) “E: ~eat "in" the dining room~”
b) “E: ~eat "with" a knife~”

In both cases the Japanese noun removal sentence is “J: ~j1"で"食べる~”. If the attributes of the nouns were not taken into account, the following target-language noun removal sentences could not be distinguished (see Table 1):
a) “E: ~eat "in" e1~”
b) “E: ~eat "with" e1~”

Because an attribute field is set in the noun database, each noun receives an attribute:
a) “j1 = 食堂 (dining room)” is given the attribute “place”;
b) “j1 = ナイフ (knife)” is given the attribute “tool”.

The noun removal sentence is the same in both cases. The conditional correspondence expressions can still be distinguished by the difference in condition, that is, in the noun's attribute:
a) “~x1で食べる~ (x1: place) = ~eat "in" x1~”
b) “~x1で食べる~ (x1: tool) = ~eat "with" x1~”
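The attribute-conditioned choice between "in" and "with" can be sketched as follows. This toy illustration assumes the two conditional correspondence expressions above are stored as (source, condition, target) tuples. The following are hypothetical:
- the names `ATTRIBUTE`, `TARGET_NOUN`, `COND_EXPRS`, and `translate_fragment`;
- the `{x1}` placeholder syntax.

```python
# Hypothetical noun database fragment: Japanese noun -> attribute and English noun.
ATTRIBUTE = {"食堂": "place", "ナイフ": "tool"}
TARGET_NOUN = {"食堂": "the dining room", "ナイフ": "a knife"}

# Two conditional correspondence expressions with the same source noun removal
# sentence; only the attribute condition on x1 tells them apart.
COND_EXPRS = [
    ("{x1}で食べる", {"x1": "place"}, "eat in {x1}"),
    ("{x1}で食べる", {"x1": "tool"}, "eat with {x1}"),
]


def translate_fragment(template, noun):
    """Pick the expression whose condition matches the noun's attribute, then fill in the noun."""
    attribute = ATTRIBUTE[noun]
    for source, conditions, target in COND_EXPRS:
        if source == template and conditions["x1"] == attribute:
            return target.format(x1=TARGET_NOUN[noun])
    return None  # no expression accepts this attribute


print(translate_fragment("{x1}で食べる", "食堂"))    # eat in the dining room
print(translate_fragment("{x1}で食べる", "ナイフ"))  # eat with a knife
```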

The database search procedure is described next. In the present invention, the items in the conditional correspondence expression database and the noun database are registered in phonological order. A search determines the phoneme of each character in turn, starting from the beginning of the character string.
When the conditional correspondence expression database is searched using a noun removal sentence, the character strings before and after each noun are split off and searched separately.
The only factor affecting search time is the number of characters in the target noun or noun removal sentence. Searching is therefore fast regardless of the number of items in the database.

FIG. 3 shows an example of searching for “e1 = he”. The noun database is searched with “h” as the first character, “e” as the second, and “/” (end) as the third.
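The character-by-character search can be pictured as a trie (prefix tree). The sketch below is one reading of the description, not the patent's own data structure. Each key is walked one character at a time and closed with the "/" end marker used in FIG. 3, so lookup cost grows with key length rather than with the number of stored entries. The sketch makes three further simplifications:
- The class `CharTrie` and its methods are hypothetical.
- Keys are assumed not to contain "/".
- Phonological ordering of the stored items is not modeled, because lookup does not depend on it.

```python
END = "/"  # end-of-string marker, as in the FIG. 3 example ("h", "e", "/")


class CharTrie:
    """Prefix tree keyed one character at a time; keys must not contain END."""

    def __init__(self):
        self.root = {}

    def insert(self, key, value):
        node = self.root
        for ch in key:
            node = node.setdefault(ch, {})
        node[END] = value  # END marks a complete key and holds its value

    def search(self, key):
        """Walk len(key) characters, then check for END; cost is independent of entry count."""
        node = self.root
        for ch in key:
            node = node.get(ch)
            if node is None:
                return None
        return node.get(END)


nouns = CharTrie()
nouns.insert("he", ("person", "彼"))
nouns.insert("hen", ("animal", "雌鶏"))
print(nouns.search("he"))  # ('person', '彼')
print(nouns.search("h"))   # None (a prefix only, no END marker)
```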

The conditional correspondence expression database can be searched using the noun removal sentence “E: e1 took e3 to e2.”. In that case, the noun removal sentence is regarded as “(blank)(e1) took (e3) to (e2).”. The string to be searched is split into four parts, “/”, “took/”, “to/”, and “./”, which are searched in sequence.
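The splitting step can be sketched with a regular expression. The placeholder pattern (a letter e, j, c, or x followed by digits) and the function `split_segments` are assumptions for illustration. Each fixed piece is stripped of surrounding spaces and given the "/" end marker, which reproduces the four search keys listed above.

```python
import re

# Assumed placeholder syntax: one of e/j/c/x followed by an index, e.g. "e1", "x3".
PLACEHOLDER = re.compile(r"[ejcx]\d+")
END = "/"  # end marker used when each segment is looked up


def split_segments(noun_removed):
    """Split a noun removal sentence at its placeholders into search keys ending in END."""
    pieces = PLACEHOLDER.split(noun_removed)
    return [piece.strip() + END for piece in pieces]


print(split_segments("e1 took e3 to e2."))  # ['/', 'took/', 'to/', './']
```

Because the sentence begins with a placeholder, `re.split` yields an empty first piece, which becomes the lone "/" key for the blank before e1.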

A particular noun may have several meanings that cannot be distinguished even by attribute. In that case, machine translation according to the present invention does not decide arbitrarily: it displays all of the candidate translations and leaves the choice to the user.

For example, the English noun “bank” can be translated into the Japanese noun “銀行” (financial bank) or “河岸” (riverbank), and both have the attribute “place”. So when “E: ~go to the bank~” is translated into Japanese, it is displayed as, for example, “J: ~(銀行/河岸)に行く~”, and the user selects one of the candidates.
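Leaving the choice to the user can be sketched as follows. When several candidates share the required attribute, they are all returned in a display such as (銀行/河岸). The table `NOUN_DB_EN`, the function `render_noun`, and the parenthesized display format are hypothetical.

```python
# Hypothetical English-to-Japanese noun entries: (Japanese noun, attribute).
NOUN_DB_EN = {
    "bank": [("銀行", "place"), ("河岸", "place")],  # financial bank / riverbank
    "station": [("駅", "place")],
}


def render_noun(noun, required_attribute):
    """Return the unique candidate, or all candidates as "(a/b)" for the user to choose from."""
    candidates = [ja for ja, attr in NOUN_DB_EN.get(noun, []) if attr == required_attribute]
    if not candidates:
        return None
    if len(candidates) == 1:
        return candidates[0]
    return "(" + "/".join(candidates) + ")"


print(render_noun("bank", "place") + "に行く")     # (銀行/河岸)に行く
print(render_noun("station", "place") + "に行く")  # 駅に行く
```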

FIG. 1: Machine translation system configuration diagram
FIG. 2: Machine translation flowchart
FIG. 3: Noun database search example

Explanation of symbols

100 Input device
200 Output device
300 Translation engine
400 Conditional correspondence expression database
410 Style-specific conditional correspondence expression databases
420 User-specific conditional correspondence expression database
500 Noun database
510 Field-specific noun databases
520 User-specific noun database

Claims

A machine translation method for translating a source language into a target language, comprising:
(1) creating a conditional correspondence expression database that stores conditional correspondence expressions, each associating a noun removal sentence in the source language with a noun removal sentence in the target language;
(2) creating a noun database that stores source language nouns, target language nouns corresponding to the source language nouns, and attributes of the source language nouns and the target language nouns;
(3) inputting a source language sentence to be translated;
(4) extracting nouns from the source language sentence, thereby dividing it into a source language noun removal sentence and source language nouns;
(5) searching the noun database on the basis of the source language nouns to determine their attributes, and translating the source language nouns into target language nouns;
(6) searching the conditional correspondence expression database on the basis of the source language noun removal sentence, using the noun attributes as conditions, to determine a conditional correspondence expression, and translating the source language noun removal sentence into a target language noun removal sentence;
(7) inserting the target language nouns obtained in step (5) into the target language noun removal sentence to create a target language sentence; and
(8) outputting the target language sentence as a translation result.
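Steps (4) to (7) of claim 1 can be illustrated with a toy Python sketch. It is not the patented implementation: the word-matching tokenizer (`\w+`), the rule that a word counts as a noun when it appears in the noun database, the left-to-right slot numbering, and all sample entries, attribute names and templates are assumptions made for illustration.

```python
import re

# Hypothetical toy databases; entries, attributes and templates are illustrative only.
NOUN_DB = {
    "Tom":  [("トム", "person")],
    "Mary": [("メアリー", "person")],
    "book": [("本", "thing")],
}
EXPRESSIONS = [
    # (source noun removal sentence, attribute condition per slot, target noun removal sentence)
    ("e1 took the e2 to e3.",
     {"e1": "person", "e2": "thing", "e3": "person"},
     "e1はe3にe2を持って行った。"),
]

def split_nouns(sentence):
    """Step (4): replace each word found in NOUN_DB with a slot e1, e2, ..."""
    nouns = []
    def to_slot(match):
        word = match.group(0)
        if word in NOUN_DB:
            nouns.append(word)
            return f"e{len(nouns)}"
        return word
    return re.sub(r"\w+", to_slot, sentence), nouns

def translate(sentence):
    template, nouns = split_nouns(sentence)
    # Step (5): look up each source noun; its candidates carry the attributes.
    candidates = {f"e{i}": NOUN_DB[n] for i, n in enumerate(nouns, start=1)}
    # Step (6): find an expression whose source side and attribute conditions match.
    for source, conditions, target in EXPRESSIONS:
        if source != template or set(conditions) != set(candidates):
            continue
        chosen = {}
        for slot, attr in conditions.items():
            hits = [noun for noun, a in candidates[slot] if a == attr]
            if not hits:
                break
            chosen[slot] = hits[0]
        else:
            # Step (7): insert the target nouns into the target noun removal sentence.
            return re.sub(r"e\d+", lambda m: chosen[m.group(0)], target)
    return None  # no matching conditional correspondence expression

print(translate("Tom took the book to Mary."))  # トムはメアリーに本を持って行った。
```

Only the nouns are looked up and substituted; everything else in the sentence is matched as a fixed string, which is what lets the target template reorder the slots freely.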

The machine translation method according to claim 1, wherein the conditional correspondence expression database is a collection of a plurality of conditional correspondence expression databases classified by style, and the user selects and uses one or more of them according to the purpose of use.

The machine translation method according to claim 1, wherein a user can add a conditional correspondence expression to the conditional correspondence expression database.

The machine translation method according to any one of …, wherein the noun database is a collection of a plurality of noun databases classified by field, and the user selects and uses one or more of them according to the purpose of use.

The machine translation method according to claim 1, wherein a user can add a noun to the noun database.
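Claims 2 to 5 describe selectable style-specific and field-specific databases plus user additions. One way to sketch this for the noun database is Python's `collections.ChainMap`, which searches several dictionaries in order and writes new keys to the first one. The priority order (user entries, then the selected field databases, then a general database), the field names and the entries are assumptions; the same layering could be applied to the style-specific conditional correspondence expression databases.

```python
from collections import ChainMap

# Hypothetical databases; each maps a source noun to (target noun, attribute) pairs.
GENERAL_NOUNS = {"bank": [("銀行", "location"), ("河岸", "location")]}
FIELD_NOUNS = {
    "finance": {"bank": [("銀行", "location")]},
    "geography": {"bank": [("河岸", "location")]},
}

def make_noun_db(fields, user_nouns):
    """Layer the user's own nouns over the selected field databases and the general one."""
    return ChainMap(user_nouns, *(FIELD_NOUNS[f] for f in fields), GENERAL_NOUNS)

user_nouns = {}
db = make_noun_db(["geography"], user_nouns)
db["Tokushima"] = [("徳島", "location")]  # user addition: ChainMap stores it in user_nouns
print(db["bank"])   # [('河岸', 'location')]
print(user_nouns)   # {'Tokushima': [('徳島', 'location')]}
```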

The machine translation method according to claim 1, wherein each entry in the conditional correspondence expression database and the noun database is registered in phonological order, and the conditional correspondence expression database and the noun database are searched by determining the phoneme of each character in turn, starting from the beginning of the character string.

A machine translation system that translates a source language into a target language, comprising:
(1) an input device for inputting a source language sentence to be translated;
(2) an output device that outputs a target language sentence as a translation result;
(3) a conditional correspondence expression database that stores conditional correspondence expressions, each associating a noun removal sentence in the source language with a noun removal sentence in the target language;
(4) a noun database that stores source language nouns, target language nouns corresponding to the source language nouns, and attributes of the source language nouns and the target language nouns; and
(5) a translation engine that searches the conditional correspondence expression database and the noun database on the basis of the input source language sentence and performs translation into the target language.

The machine translation system according to claim 7, wherein the conditional correspondence expression database, the noun database, and the translation engine are stored on a server, and the server is accessed and used from each communication terminal.

The machine translation system according to claim 8, wherein the communication terminal is a mobile phone.