JP2007080221A - Apparatus, method and program for machine translation

Info

Publication number: JP2007080221A
Application number: JP2005271041A
Authority: JP
Inventors: Kazunari Hashimoto; 一成橋本; Tsuguaki Ryu; 紹明劉
Original assignee: Fuji Xerox Co Ltd
Current assignee: Fujifilm Business Innovation Corp
Priority date: 2005-09-16
Filing date: 2005-09-16
Publication date: 2007-03-29

Abstract

PROBLEM TO BE SOLVED: To provide an apparatus, a method and a program for machine translation that can easily translate a document using dictionary data suited to that document.

SOLUTION: A feature vector creation unit 10 creates a translation target feature vector representing the features of the document to be translated, and a feature vector creation unit 20 creates specialized field feature vectors representing the features of each specialized field. Based on these vectors, a category classifier 50 estimates the degree to which the document to be translated belongs to each specialized field, and the specialized dictionary data best suited to translating the document is selected based on the result of that estimation.

COPYRIGHT: (C)2007,JPO&INPIT

Description

本発明は、翻訳対象文を入力し、辞書データを用いた翻訳を行う機械翻訳装置及び機械翻訳方法に関する。 The present invention relates to a machine translation apparatus and a machine translation method for inputting a translation target sentence and performing translation using dictionary data.

近年、翻訳対象文を入力し、辞書データを用いた翻訳を行う機械翻訳装置が広く普及している。このような機械翻訳装置では、より適切な翻訳を可能とすべく、政治、経済等の専門分野毎に辞書データ（専門辞書データ）が設けられるものがある。この場合、複数の専門辞書データの中から如何にして翻訳対象文に適したものを選択するのかが重要である。 In recent years, machine translation devices that input a translation target sentence and perform translation using dictionary data have become widespread. Some of these machine translation apparatuses are provided with dictionary data (specialized dictionary data) for each specialized field such as politics and economy so as to enable more appropriate translation. In this case, it is important how to select one suitable for the sentence to be translated from among a plurality of specialized dictionary data.

例えば、特許文献１では、辞書データ選択の優先順位が利用者によって設定され、その優先順位に基づいて辞書データを選択して翻訳を行う技術が提案されている。また、特許文献２では、翻訳対象文に含まれる各基本語に当該基本語がどの専門辞書データに登録されているのかを示す文脈ベクトルを付与し、翻訳対象文を基本語と専門分野とを属性とするテーブルに変換して、このテーブルと専門辞書対応テーブルとに基づいて専門辞書データを選択する技術が提案されている。 For example, Patent Document 1 proposes a technique in which the user sets a priority order for dictionary data selection, and dictionary data is selected according to that priority order to perform translation. Patent Document 2 proposes a technique in which each basic word in the translation target sentence is given a context vector indicating the specialized dictionary data in which that basic word is registered, the translation target sentence is converted into a table whose attributes are basic words and specialized fields, and specialized dictionary data is selected based on this table and a specialized dictionary correspondence table.
特開平７−１４１３７５号公報 JP-A-7-141375
特開平６−３３２９４６号公報 JP-A-6-332946

しかし、上述した特許文献１の技術では、利用者が手動によって辞書データ選択の優先順位を設定しており、煩雑である。また、特許文献２の技術では、専門辞書対応テーブルの設定には一定の専門知識が必要となる。 However, in the technique of Patent Document 1 described above, the user manually sets the priority order for selecting dictionary data, which is complicated. In the technique disclosed in Patent Document 2, a certain specialized knowledge is required for setting the specialized dictionary correspondence table.

本発明は、上記の点に鑑みてなされたもので、簡易に翻訳対象文に適した辞書データを用いた翻訳が可能な機械翻訳装置、機械翻訳方法及び機械翻訳プログラムを提供することを目的としている。 The present invention has been made in view of the above points, and its object is to provide a machine translation apparatus, a machine translation method, and a machine translation program that can easily perform translation using dictionary data suited to the sentence to be translated.

本発明に係る、機械翻訳装置は、複数の専門分野のそれぞれに帰属する文の特徴を表す専門分野特徴ベクトルを保持する第１の保持手段と、翻訳対象文に基づいて、該翻訳対象文の特徴を表す翻訳対象特徴ベクトルを作成する第１の作成手段と、前記第１の作成手段により作成された翻訳対象特徴ベクトルと、前記第１の保持手段に保持されている各専門分野特徴ベクトルとに基づいて、前記翻訳対象文が前記専門分野のそれぞれに帰属する可能性を推定する推定手段と、前記翻訳対象文が帰属する可能性が高い専門分野に対応する専門辞書データを優先的に選択する選択手段と、前記選択手段により選択された専門辞書データを用いて、前記翻訳対象文を翻訳する翻訳手段とを有する。 The machine translation apparatus according to the present invention includes: first holding means for holding specialized field feature vectors representing the features of sentences belonging to each of a plurality of specialized fields; first creation means for creating, based on a translation target sentence, a translation target feature vector representing the features of that sentence; estimation means for estimating the possibility that the translation target sentence belongs to each of the specialized fields, based on the translation target feature vector created by the first creation means and the specialized field feature vectors held in the first holding means; selection means for preferentially selecting specialized dictionary data corresponding to a specialized field to which the translation target sentence is likely to belong; and translation means for translating the translation target sentence using the specialized dictionary data selected by the selection means.

これにより、翻訳対象文が帰属する可能性の高い専門分野に対応する専門辞書データが自動的に選択され、その専門辞書データを用いた適切な翻訳が可能となる。 Thereby, specialized dictionary data corresponding to a specialized field to which the translation target sentence is likely to belong is automatically selected, and appropriate translation using the specialized dictionary data becomes possible.

好ましくは、機械翻訳装置は、前記専門辞書データのそれぞれに対応する文に基づいて、前記専門分野特徴ベクトルを作成する第２の作成手段を有する。 Preferably, the machine translation apparatus includes a second creation unit that creates the specialized field feature vector based on a sentence corresponding to each of the specialized dictionary data.

好ましくは、機械翻訳装置は、前記第１の作成手段が、更に、前記翻訳対象文の周辺の文に基づいて、前記翻訳対象特徴ベクトルを作成する。 Preferably, in the machine translation device, the first creation unit further creates the translation target feature vector based on a sentence around the translation target sentence.

好ましくは、機械翻訳装置は、前記翻訳対象文の前記専門分野のそれぞれへの帰属可能性を視認可能に提供する提供手段を有する。 Preferably, the machine translation apparatus includes providing means for presenting, in a visible form, the possibility that the translation target sentence belongs to each of the specialized fields.

これにより、利用者は、翻訳対象文がどの専門分野に帰属する可能性があるのかを把握することが可能となる。 As a result, the user can grasp which specialized field the translation target sentence may belong to.

好ましくは、機械翻訳装置は、前記提供手段が、前記翻訳対象文の前記専門分野のそれぞれへの帰属可能性を表すグラフを表示する。 Preferably, in the machine translation apparatus, the providing unit displays a graph representing the possibility of the translation object sentence belonging to each of the specialized fields.

本発明に係る、機械翻訳方法は、翻訳対象文に基づいて、該翻訳対象文の特徴を表す翻訳対象特徴ベクトルを作成する第１の作成ステップと、前記第１の作成ステップにより作成された翻訳対象特徴ベクトルと、複数の専門分野のそれぞれに帰属する文の特徴を表す各専門分野特徴ベクトルとに基づいて、前記翻訳対象文が前記専門分野のそれぞれに帰属する可能性を推定する推定ステップと、前記翻訳対象文が帰属する可能性が高い専門分野に対応する専門辞書データを優先的に選択する選択ステップと、前記選択ステップにより選択された専門辞書データを用いて、前記翻訳対象文を翻訳する翻訳ステップとを有する。 The machine translation method according to the present invention includes: a first creation step of creating, based on a translation target sentence, a translation target feature vector representing the features of that sentence; an estimation step of estimating the possibility that the translation target sentence belongs to each of a plurality of specialized fields, based on the translation target feature vector created in the first creation step and the specialized field feature vectors representing the features of sentences belonging to each of the specialized fields; a selection step of preferentially selecting specialized dictionary data corresponding to a specialized field to which the translation target sentence is likely to belong; and a translation step of translating the translation target sentence using the specialized dictionary data selected in the selection step.

好ましくは、機械翻訳方法は、前記専門辞書データのそれぞれに対応する文に基づいて、前記専門分野特徴ベクトルを作成する第２の作成ステップを有する。 Preferably, the machine translation method includes a second creation step of creating the specialized field feature vector based on a sentence corresponding to each of the specialized dictionary data.

好ましくは、機械翻訳方法は、前記第１の作成ステップが、更に、前記翻訳対象文の周辺の文に基づいて、前記翻訳対象特徴ベクトルを作成する。 Preferably, in the machine translation method, the first creation step further creates the translation target feature vector based on a sentence around the translation target sentence.

好ましくは、機械翻訳方法は、前記翻訳対象文の前記専門分野のそれぞれへの帰属可能性を視認可能に提供する提供ステップを有する。 Preferably, the machine translation method includes a providing step of presenting, in a visible form, the possibility that the translation target sentence belongs to each of the specialized fields.

好ましくは、機械翻訳方法は、前記提供ステップが、前記翻訳対象文の前記専門分野のそれぞれへの帰属可能性を表すグラフを表示する。 Preferably, in the machine translation method, the providing step displays a graph representing the possibility of the translation target sentence belonging to each of the specialized fields.

本発明に係る、機械翻訳プログラムは、翻訳対象文に基づいて、該翻訳対象文の特徴を表す翻訳対象特徴ベクトルを作成する第１の作成ステップと、前記第１の作成ステップにより作成された翻訳対象特徴ベクトルと、複数の専門分野のそれぞれに帰属する文の特徴を表す各専門分野特徴ベクトルとに基づいて、前記翻訳対象文が前記専門分野のそれぞれに帰属する可能性を推定する推定ステップと、前記翻訳対象文が帰属する可能性が高い専門分野に対応する専門辞書データを優先的に選択する選択ステップと、前記選択ステップにより選択された専門辞書データを用いて、前記翻訳対象文を翻訳する翻訳ステップとをコンピュータに実行させる。 The machine translation program according to the present invention causes a computer to execute: a first creation step of creating, based on a translation target sentence, a translation target feature vector representing the features of that sentence; an estimation step of estimating the possibility that the translation target sentence belongs to each of a plurality of specialized fields, based on the translation target feature vector created in the first creation step and the specialized field feature vectors representing the features of sentences belonging to each of the specialized fields; a selection step of preferentially selecting specialized dictionary data corresponding to a specialized field to which the translation target sentence is likely to belong; and a translation step of translating the translation target sentence using the specialized dictionary data selected in the selection step.

好ましくは、機械翻訳プログラムは、前記専門辞書データのそれぞれに対応する文に基づいて、前記専門分野特徴ベクトルを作成する第２の作成ステップをコンピュータに実行させる。 Preferably, the machine translation program causes the computer to execute a second creation step of creating the specialized field feature vector based on a sentence corresponding to each of the specialized dictionary data.

好ましくは、機械翻訳プログラムは、前記第１の作成ステップが、更に、前記翻訳対象文の周辺の文に基づいて、前記翻訳対象特徴ベクトルを作成する。 Preferably, in the machine translation program, the first creation step further creates the translation target feature vector based on a sentence around the translation target sentence.

好ましくは、機械翻訳プログラムは、前記翻訳対象文の前記専門分野のそれぞれへの帰属可能性を視認可能に提供する提供ステップをコンピュータに実行させる。 Preferably, the machine translation program causes the computer to execute a providing step of presenting, in a visible form, the possibility that the translation target sentence belongs to each of the specialized fields.

好ましくは、機械翻訳プログラムは、前記提供ステップが、前記翻訳対象文の前記専門分野のそれぞれへの帰属可能性を表すグラフを表示する。 Preferably, in the machine translation program, the providing step displays a graph representing the possibility of the translation target sentence belonging to each of the specialized fields.

本発明によれば、翻訳対象文の翻訳に適した専門辞書データが自動的に選択されるため、簡易に専門辞書データを用いた適切な翻訳が可能となる。 According to the present invention, specialized dictionary data suitable for translation of a translation target sentence is automatically selected, so that appropriate translation using specialized dictionary data can be easily performed.

以下、本発明を実施するための最良の形態を、図面を参照して説明する。 The best mode for carrying out the present invention will be described below with reference to the drawings.

図１は、本発明の実施例に係る機械翻訳装置の構成を示す図である。図１に示す機械翻訳装置１００は、特徴ベクトル作成部１０、特徴ベクトル作成部２０、専門分野ラベル付コーパスデータベース（ＤＢ）３０、専門分野代表ベクトル群データベース（ＤＢ）４０、カテゴリ分類器５０、表示部５２、専門辞書選択部６０及び翻訳モジュール７０により構成される。 FIG. 1 is a diagram showing the configuration of a machine translation apparatus according to an embodiment of the present invention. The machine translation apparatus 100 shown in FIG. 1 comprises a feature vector creation unit 10, a feature vector creation unit 20, a specialized field labeled corpus database (DB) 30, a specialized field representative vector group database (DB) 40, a category classifier 50, a display unit 52, a specialized dictionary selection unit 60, and a translation module 70.

特徴ベクトル作成部１０は、翻訳対象文を入力する。次に、特徴ベクトル作成部１０は、この入力した翻訳対象文について、周知の文書解析手法である形態素解析を行い、当該翻訳対象文を単語に分割する。更に、特徴ベクトル作成部１０は、単語毎に出現頻度（ＴＦ）をカウントする。そして、特徴ベクトル作成部１０は、これら単語毎の出現頻度の数値のそれぞれを、その単語に対応する１つの次元の要素として含み、翻訳対象文の特徴を表すベクトル（翻訳対象特徴ベクトル）を作成する。ここで、翻訳対象特徴ベクトルの次元数と各次元に対応する単語とは、予め定められており、後述する専門分野単一特徴ベクトル及び専門分野代表特徴ベクトルと一致する。 The feature vector creation unit 10 receives a translation target sentence. It then performs morphological analysis, a well-known document analysis technique, on the input sentence and divides it into words. The feature vector creation unit 10 further counts the appearance frequency (TF) of each word. It then creates a vector representing the features of the translation target sentence (the translation target feature vector), in which the appearance frequency of each word is the element of the one dimension corresponding to that word. The number of dimensions of the translation target feature vector and the word corresponding to each dimension are determined in advance and match those of the specialized field single feature vectors and specialized field representative feature vectors described later.

図２は、翻訳対象特徴ベクトル作成の一例を示す図である。特徴ベクトル作成部１０は、翻訳対象文「ブッシュ米大統領とニューヨーク市内のホテルで在日米軍の再編問題について会談した。」を入力すると、当該翻訳対象文について形態素解析を行い、単語に分割する。更に、ここでは、特徴ベクトル作成部１０は、形態素解析により得た単語のうち名詞のみを抽出する。すなわち、名詞である単語「ブッシュ」、「米」、「大統領」、「ニューヨーク」、「市内」、「ホテル」、「在日」、「米」、「軍」、「再編」、「問題」が抽出される。 FIG. 2 is a diagram showing an example of creating a translation target feature vector. When the feature vector creation unit 10 receives the translation target sentence “ブッシュ米大統領とニューヨーク市内のホテルで在日米軍の再編問題について会談した。” (“Met with US President Bush at a hotel in New York City to discuss the issue of reorganizing US forces in Japan.”), it performs morphological analysis on the sentence and divides it into words. In this example, the feature vector creation unit 10 then extracts only the nouns from the words obtained by morphological analysis. That is, the nouns “Bush” (ブッシュ), “US” (米), “President” (大統領), “New York” (ニューヨーク), “city” (市内), “hotel” (ホテル), “resident in Japan” (在日), “US” (米), “military” (軍), “reorganization” (再編), and “problem” (問題) are extracted.

次に、特徴ベクトル作成部１０は、抽出した単語について、出現頻度をカウントする。ここでは、単語「ブッシュ」、「大統領」、「ニューヨーク」、「市内」、「ホテル」、「在日」、「軍」、「再編」、「問題」が各１回ずつ出現しているため、これら単語については「１」とカウントされ、単語「米」が２回出現しているため、この単語については「２」とカウントされる。 Next, the feature vector creation unit 10 counts the appearance frequency of each extracted word. Here, the words “Bush”, “President”, “New York”, “city”, “hotel”, “resident in Japan”, “military”, “reorganization”, and “problem” each appear once, so each is counted as “1”. The word “US” (米) appears twice, so it is counted as “2”.

更に、特徴ベクトル作成部１０は、単語「ブッシュ」、「大統領」、「ニューヨーク」、「市内」、「ホテル」、「在日」、「軍」、「再編」、「問題」に対応する次元については要素を「１」とし、単語「米」に対応する次元については要素を「２」とするとともに、これら以外の単語、すなわち、翻訳対象文に出現しない単語に対応する次元については要素を「０」とした翻訳対象特徴ベクトルを作成する。 The feature vector creation unit 10 then creates a translation target feature vector in which the element is “1” for the dimensions corresponding to the words “Bush”, “President”, “New York”, “city”, “hotel”, “resident in Japan”, “military”, “reorganization”, and “problem”, the element is “2” for the dimension corresponding to the word “US” (米), and the element is “0” for the dimensions corresponding to all other words, that is, words that do not appear in the translation target sentence.
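
The counting step in the FIG. 2 example can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `build_tf_vector`, the contents of `VOCABULARY` (including the two extra words 株価 and 選挙), and the assumption that the nouns have already been extracted by a morphological analyzer are all hypothetical.

```python
from collections import Counter

# Hypothetical fixed vocabulary. The patent only states that the dimensions
# and their corresponding words are determined in advance.
VOCABULARY = ["ブッシュ", "米", "大統領", "ニューヨーク", "市内", "ホテル",
              "在日", "軍", "再編", "問題", "株価", "選挙"]


def build_tf_vector(nouns, vocabulary=VOCABULARY):
    """Return one term-frequency element per vocabulary word."""
    counts = Counter(nouns)
    # Counter returns 0 for words that never appear in the sentence.
    return [counts[word] for word in vocabulary]


# Nouns assumed to come from morphological analysis of the example sentence.
nouns = ["ブッシュ", "米", "大統領", "ニューヨーク", "市内", "ホテル",
         "在日", "米", "軍", "再編", "問題"]
vector = build_tf_vector(nouns)
print(vector)  # [1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0]
```

The word 米 appears twice and therefore gets the element 2, while the two vocabulary words absent from the sentence get 0.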

なお、特徴ベクトル作成部１０は、例えば重要であると考えられる単語等の所定の単語については、出現頻度の数値をそのまま翻訳対象特徴ベクトルの要素とするのではなく、出現頻度の数値に所定の重み付け係数を乗じた値を翻訳対象特徴ベクトルの要素とする等、重み付けを行うようにしてもよい。 Note that the feature vector creation unit 10 does not use the numerical value of the appearance frequency as an element of the feature vector to be translated as it is for a predetermined word such as a word considered important, for example. Weighting may be performed by, for example, using a value obtained by multiplying the weighting coefficient as an element of the translation target feature vector.
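
The optional weighting could look like the sketch below. The function name `apply_weights`, the weight table, and the coefficient 2.0 are illustrative assumptions; the patent does not say which words are weighted or by how much.

```python
def apply_weights(tf_vector, vocabulary, weights, default=1.0):
    """Multiply each term frequency by the weighting coefficient of its word."""
    return [tf * weights.get(word, default)
            for tf, word in zip(tf_vector, vocabulary)]


vocabulary = ["大統領", "米", "ホテル"]
tf_vector = [1, 2, 1]
# Hypothetical coefficient: treat 大統領 as twice as important as other words.
weights = {"大統領": 2.0}
weighted = apply_weights(tf_vector, vocabulary, weights)
print(weighted)  # [2.0, 2.0, 1.0]
```

Words without an entry in the weight table keep their raw frequency because the default coefficient is 1.0.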

また、翻訳対象文とともに、当該翻訳対象文の周辺の文（周辺文）が入力される場合には、特徴ベクトル作成部１０は、翻訳対象文と周辺文の双方からなる文集合について、上述と同様の手法によって翻訳対象特徴ベクトルを作成するようにしてもよい。特徴ベクトル作成部１０により作成された翻訳対象特徴ベクトルは、カテゴリ分類器５０に送られる。 When a sentence around the translation target sentence (peripheral sentence) is input together with the translation target sentence, the feature vector creation unit 10 describes the sentence set including both the translation target sentence and the peripheral sentence as described above. A translation target feature vector may be created by a similar method. The translation target feature vector created by the feature vector creation unit 10 is sent to the category classifier 50.

一方、特徴ベクトル作成部２０は、専門分野ラベル付コーパスＤＢ３０から専門分野ラベル付コーパスを読み出す。専門分野ラベル付コーパスは、政治、経済等の専門分野の文（専門分野文）に、その専門分野の識別情報であるラベルを付加したものである。なお、専門分野ラベル付コーパスＤＢ３０は、必ずしも機械翻訳装置１００の内部に構成される必要はなく、特徴ベクトル作成部２０は、業者による情報提供サービスを利用して取得した専門分野ラベル付コーパスを使用するようにしてもよく、インターネットのサイトから専門分野文を収集し、それにラベルを付加したものを専門分野ラベル付コーパスとして使用してもよい。また、後述する辞書データに単語の解説文が含まれているような場合には、特徴ベクトル作成部２０は、その解説文を専門分野文として、それにラベルを付加したものを専門分野ラベル付コーパスとして使用してもよい。 Meanwhile, the feature vector creation unit 20 reads a specialized field labeled corpus from the specialized field labeled corpus DB 30. A specialized field labeled corpus consists of sentences from a specialized field such as politics or economics (specialized field sentences), to which a label identifying that field has been added. The specialized field labeled corpus DB 30 does not necessarily have to be located inside the machine translation apparatus 100. The feature vector creation unit 20 may use a specialized field labeled corpus obtained through a vendor's information providing service, or may collect specialized field sentences from Internet sites, add labels to them, and use the result as a specialized field labeled corpus. In addition, when the dictionary data described later contains explanatory text for words, the feature vector creation unit 20 may treat that explanatory text as specialized field sentences, add labels to it, and use the result as a specialized field labeled corpus.

次に、特徴ベクトル作成部２０は、上述した特徴ベクトル作成部１０による翻訳対象特徴ベクトルの作成と同様、読み出した専門分野ラベル付コーパス内の専門分野文から当該専門分野文の特徴を表すベクトル（専門分野単一特徴ベクトル）を作成する。すなわち、特徴ベクトル作成部２０は、専門分野文について形態素解析を行って単語に分割し、各単語について、出現頻度をカウントする。更に、特徴ベクトル作成部２０は、これら単語毎の出現頻度の数値のそれぞれを、その単語に対応する１つの次元の要素として含む専門分野単一特徴ベクトルを作成する。特徴ベクトル作成部２０は、作成した専門分野単一特徴ベクトルに、その作成に用いた専門分野ラベル付コーパス内のラベルを付加して、内蔵する記憶部（図示せず）に記憶する。 Next, in the same way that the feature vector creation unit 10 creates the translation target feature vector, the feature vector creation unit 20 creates, from each specialized field sentence in the corpus it has read, a vector representing the features of that sentence (a specialized field single feature vector). That is, the feature vector creation unit 20 performs morphological analysis on the specialized field sentence to divide it into words and counts the appearance frequency of each word. The feature vector creation unit 20 then creates a specialized field single feature vector in which the appearance frequency of each word is the element of the one dimension corresponding to that word. The feature vector creation unit 20 attaches to the created specialized field single feature vector the label from the specialized field labeled corpus used to create it, and stores the result in a built-in storage unit (not shown).

更に、特徴ベクトル作成部２０は、内蔵する記憶部内の専門分野単一特徴ベクトルを用いて、専門分野毎に、その専門分野に対応する専門分野単一特徴ベクトルを代表する専門分野代表特徴ベクトルを作成し、専門分野代表特徴ベクトル群ＤＢ４０に登録する。 Further, the feature vector creating unit 20 uses a specialized field single feature vector in the built-in storage unit to obtain a specialized field representative feature vector representing a specialized field single feature vector corresponding to the specialized field for each specialized field. It is created and registered in the specialized field representative feature vector group DB 40.

具体的には、特徴ベクトル作成部２０は、内蔵する記憶部内の専門分野単一特徴ベクトルについて、その専門分野単一特徴ベクトルに付加されているラベルに基づいて、専門分野毎に分類する。更に、特徴ベクトル作成部２０は、所定の専門分野に対応する専門分野単一特徴ベクトルが１つのみ存在する場合には、その１つの専門分野単一特徴ベクトルを、そのまま専門分野代表特徴ベクトルとして、付加されているラベルとともに、専門分野代表特徴ベクトル群ＤＢ４０に登録する。 Specifically, the feature vector creation unit 20 classifies the specialized field single feature vectors in the built-in storage unit by specialized field, based on the label attached to each vector. When only one specialized field single feature vector exists for a given specialized field, the feature vector creation unit 20 registers that vector as is, together with its label, in the specialized field representative feature vector group DB 40 as the specialized field representative feature vector.

一方、特徴ベクトル作成部２０は、所定の専門分野に対応する専門分野単一特徴ベクトルが複数存在する場合には、これら複数の専門分野単一特徴ベクトルを代表する専門分野代表特徴ベクトルを作成する。例えば、特徴ベクトル作成部２０は、複数の専門分野単一特徴ベクトルの同一次元の要素について平均値を算出し、これら平均値を要素とした専門分野代表特徴ベクトルを作成する。更に、特徴ベクトル作成部２０は、作成した専門分野代表特徴ベクトルについて、その作成に用いた専門分野単一特徴ベクトルに付加されているラベルとともに、専門分野代表特徴ベクトル群ＤＢ４０に登録する。 On the other hand, when there are a plurality of specialized field single feature vectors corresponding to a predetermined specialized field, the feature vector creating unit 20 creates specialized field representative feature vectors representing the plurality of specialized field single feature vectors. . For example, the feature vector creation unit 20 calculates an average value for elements of the same dimension of a plurality of specialized field single feature vectors, and creates a specialized field representative feature vector using these average values as elements. Further, the feature vector creation unit 20 registers the created specialized field representative feature vector in the specialized field representative feature vector group DB 40 together with the label added to the specialized field single feature vector used for the creation.
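
The per-field averaging described above can be sketched as follows. The function name `representative_vectors` and the sample labels and values are hypothetical; element-wise averaging is the one method the text names as an example.

```python
from collections import defaultdict


def representative_vectors(labeled_vectors):
    """Average the single feature vectors of each field, dimension by dimension."""
    grouped = defaultdict(list)
    for label, vector in labeled_vectors:
        grouped[label].append(vector)
    representatives = {}
    for label, vectors in grouped.items():
        count = len(vectors)
        # zip(*vectors) yields the elements of one dimension at a time.
        representatives[label] = [sum(column) / count for column in zip(*vectors)]
    return representatives


# Hypothetical labeled single feature vectors (three dimensions each).
labeled = [
    ("politics", [2, 0, 1]),
    ("politics", [4, 2, 1]),
    ("economy", [0, 3, 0]),
]
reps = representative_vectors(labeled)
print(reps["politics"])  # [3.0, 1.0, 1.0]
print(reps["economy"])   # [0.0, 3.0, 0.0]
```

A field with a single vector yields that same vector, which matches the single-vector case in which the vector is registered as is.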

カテゴリ分類器５０は、特徴ベクトル作成部１０からの翻訳対象特徴ベクトルを入力するとともに、専門分野代表特徴ベクトル群ＤＢ４０に登録された、各専門分野毎の専門分野代表特徴ベクトルと、その専門分野代表特徴ベクトルに対応するラベルとを読み出す。 The category classifier 50 receives the feature vector to be translated from the feature vector creation unit 10, and stores the specialized field representative feature vector for each specialized field registered in the specialized field representative feature vector group DB 40 and the specialized field representative. Read the label corresponding to the feature vector.

次に、カテゴリ分類器５０は、翻訳対象特徴ベクトルと、各専門分野代表特徴ベクトルとに基づいて、翻訳対象文が専門分野のそれぞれに帰属する可能性を推定する。 Next, the category classifier 50 estimates the possibility that the translation target sentence belongs to each of the specialized fields based on the translation target feature vector and each specialized field representative feature vector.

図３は、カテゴリ分類器５０の処理の具体例を示す図である。カテゴリ分類器５０は、各専門分野毎に、その専門分野に対応する専門分野代表特徴ベクトルを正例とし、その専門分野以外の専門分野に対応する専門分野代表特徴ベクトルを負例とした分類関数を作成する。この分類関数は、翻訳対象特徴ベクトルと、正例として用いた専門分野代表特徴ベクトルとの近似度、換言すれば、翻訳対象文が当該分類関数に対応する専門分野に帰属する可能性を算出するための関数である。図３では、「スポーツ」、「コンピュータ」、「経済」、「政治」、「歴史」及び「科学」の各専門分野について分類関数が作成される。 FIG. 3 is a diagram showing a specific example of the processing of the category classifier 50. For each specialized field, the category classifier 50 creates a classification function that uses the specialized field representative feature vector of that field as a positive example and the specialized field representative feature vectors of all other fields as negative examples. This classification function calculates the degree of similarity between the translation target feature vector and the specialized field representative feature vector used as the positive example, in other words, the possibility that the translation target sentence belongs to the specialized field corresponding to that classification function. In FIG. 3, a classification function is created for each of the specialized fields “sports”, “computer”, “economy”, “politics”, “history”, and “science”.

次に、カテゴリ分類器５０は、各分類関数に翻訳対象特徴ベクトルの要素を代入する。カテゴリ分類器５０は、この代入の結果、分類関数の値が正であり、その値が大きいほど、翻訳対象文が当該分類関数に対応する専門分野に帰属する可能性（帰属可能性）が高いと判断し、分類関数の値が負であり、その値が小さいほど、翻訳対象文が当該分類関数に対応する専門分野に帰属する可能性が小さいと判断する。 Next, the category classifier 50 substitutes the elements of the translation target feature vector into each classification function. When the resulting value of a classification function is positive, the category classifier 50 judges that the larger the value, the higher the possibility that the translation target sentence belongs to the specialized field corresponding to that classification function (the belonging possibility). When the value is negative, it judges that the smaller the value, the lower that possibility.
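
The patent does not specify the form of the classification function. As one possible sketch, the code below builds a linear function per field whose decision boundary passes midway between the positive example and the mean of the negative examples, so that vectors closer to the positive example score positive. The linear form, the function name `make_classification_function`, and all sample values are assumptions made for illustration.

```python
def make_classification_function(positive, negatives):
    """Build f(x) = w.x + b from one positive and several negative vectors."""
    dims = len(positive)
    neg_mean = [sum(v[i] for v in negatives) / len(negatives) for i in range(dims)]
    # w points from the mean of the negative examples toward the positive example.
    w = [p - n for p, n in zip(positive, neg_mean)]
    # b places the zero crossing at the midpoint between the two.
    midpoint = [(p + n) / 2 for p, n in zip(positive, neg_mean)]
    b = -sum(wi * mi for wi, mi in zip(w, midpoint))

    def classify(x):
        return sum(wi * xi for wi, xi in zip(w, x)) + b

    return classify


# Hypothetical representative vectors for three fields.
reps = {
    "politics": [3.0, 1.0, 0.0],
    "economy": [0.0, 3.0, 1.0],
    "sports": [0.0, 0.0, 4.0],
}
functions = {
    label: make_classification_function(
        vec, [v for other, v in reps.items() if other != label])
    for label, vec in reps.items()
}
target = [2, 1, 0]  # hypothetical translation target feature vector
scores = {label: f(target) for label, f in functions.items()}
print(scores)
```

With these sample values only the "politics" function returns a positive score, since the target vector lies closest to that field's representative vector.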

次に、カテゴリ分類器５０は、分類関数の値に、その分類関数の作成の際に正例として用いた専門分野代表特徴ベクトルに対応するラベルを付加したカテゴリ分類結果を作成する。更に、カテゴリ分類器５０は、カテゴリ分類結果に基づいて、翻訳対象文の各専門分野への帰属可能性を表すグラフを表示部５２に表示させる。図３では、翻訳対象文が専門分野「経済」、「政治」、「歴史」、「科学」、「コンピュータ」、「スポーツ」の順に帰属可能性が高いことを示すグラフが表されている。また、カテゴリ分類器５０は、カテゴリ分類結果を専門辞書選択部６０へ出力する。 Next, the category classifier 50 creates a category classification result by adding a label corresponding to the specialized field representative feature vector used as a positive example when creating the classification function to the value of the classification function. Further, the category classifier 50 causes the display unit 52 to display a graph indicating the possibility of belonging to each specialized field of the translation target sentence based on the category classification result. FIG. 3 shows a graph indicating that the translation target sentence is more likely to belong in the order of specialized fields “economics”, “politics”, “history”, “science”, “computer”, and “sports”. Further, the category classifier 50 outputs the category classification result to the specialized dictionary selection unit 60.
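
As a rough illustration of the graph shown on the display unit 52, the sketch below renders the category classification result as a text bar chart ordered by score. The function name `render_bar_chart`, the text rendering, and the scaling are assumptions; the patent states only that a graph of the belonging possibilities is displayed.

```python
def render_bar_chart(scores, width=20):
    """Return one text line per field, highest classification value first."""
    ordered = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    # Fall back to 1.0 so that an all-zero result does not divide by zero.
    largest = max(abs(value) for _, value in ordered) or 1.0
    lines = []
    for label, value in ordered:
        bar = "#" * round(abs(value) / largest * width)
        sign = "+" if value >= 0 else "-"
        lines.append(f"{label:<10} {sign} {bar}")
    return "\n".join(lines)


# Hypothetical classification values with their field labels.
chart = render_bar_chart({"economy": 2.0, "politics": 1.0, "sports": -2.0})
print(chart)
```

The sign column distinguishes fields the sentence probably belongs to (positive values) from those it probably does not (negative values).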

専門辞書選択部６０は、カテゴリ分類器５０からのカテゴリ分類結果を入力すると、当該カテゴリ分類結果に含まれる分類関数の値とラベルに基づいて、翻訳対象文の各専門分野への帰属可能性を認識する。更に、専門辞書選択部６０は、内蔵する記憶部（図示せず）に記憶されている専門分野毎の辞書データ（専門辞書データ）のうち、翻訳対象文が帰属する可能性が高い専門分野に対応する専門辞書データを優先的に選択する。 When the specialized dictionary selection unit 60 receives the category classification result from the category classifier 50, it recognizes the possibility that the translation target sentence belongs to each specialized field, based on the classification function values and labels contained in that result. The specialized dictionary selection unit 60 then preferentially selects, from the dictionary data for each specialized field (specialized dictionary data) stored in a built-in storage unit (not shown), the specialized dictionary data corresponding to a specialized field to which the translation target sentence is likely to belong.

図３の例では、選択優先度は、翻訳対象文の帰属可能性の高い専門分野に対応する専門辞書データ順、すなわち、専門分野「経済」の専門辞書データ、「政治」の専門辞書データ、「歴史」の専門辞書データ、「科学」の専門辞書データ、「コンピュータ」の専門辞書データ、「スポーツ」の専門辞書データの順となる。従って、専門辞書選択部６０は、専門分野「経済」の専門辞書データを最優先に選択する。 In the example of FIG. 3, the selection priority is the order of specialized dictionary data corresponding to the specialized field to which the translation target sentence is highly likely to belong, that is, specialized dictionary data of the specialized field “economics”, specialized dictionary data of “politics”, The order is "history" specialized dictionary data, "science" specialized dictionary data, "computer" specialized dictionary data, and "sports" specialized dictionary data. Therefore, the specialized dictionary selection unit 60 selects the specialized dictionary data of the specialized field “Economy” with the highest priority.

なお、専門辞書選択部６０は、複数の専門辞書データを選択するようにしてもよい。例えば、専門辞書選択部６０は、分類関数の値の閾値を設け、カテゴリ分類結果に含まれる分類関数の値のうち、閾値以上のものを特定し、その特定した分類関数の値に付加されたラベルに対応する専門分野の専門辞書データを選択する。 The specialized dictionary selection unit 60 may also select a plurality of specialized dictionary data. For example, the specialized dictionary selection unit 60 sets a threshold for the classification function value, identifies the classification function values in the category classification result that are equal to or greater than the threshold, and selects the specialized dictionary data of the specialized fields corresponding to the labels attached to those values.
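
Both selection modes, picking the single top-ranked field or every field at or above a threshold, can be sketched as follows. The function name `select_dictionaries`, the sample scores, and the threshold of 1.0 are hypothetical.

```python
def select_dictionaries(scores, threshold=None):
    """Return field labels in priority order, highest classification value first."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    if threshold is None:
        # Default: only the dictionary for the most likely field.
        return ranked[:1]
    # Threshold mode: every field whose value is at or above the threshold.
    return [label for label in ranked if scores[label] >= threshold]


# Hypothetical category classification result (label -> classification value).
scores = {"economy": 1.8, "politics": 1.2, "history": 0.3, "sports": -2.0}
print(select_dictionaries(scores))                 # ['economy']
print(select_dictionaries(scores, threshold=1.0))  # ['economy', 'politics']
```

Returning labels in ranked order preserves the selection priority, so a translation module could consult the dictionaries in that order.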

専門辞書選択部６０によって選択された専門辞書データは、翻訳モジュール７０に送られる。翻訳モジュール７０は、入力した専門辞書データを用いて翻訳対象文の翻訳を行う。 The specialized dictionary data selected by the specialized dictionary selection unit 60 is sent to the translation module 70. The translation module 70 translates the translation target sentence using the input specialized dictionary data.

このように、機械翻訳装置１００では、特徴ベクトル作成部１０が、翻訳対象文の特徴を表す翻訳対象特徴ベクトルを作成するとともに、特徴ベクトル作成部２０が、専門分野文の特徴を表す専門分野特徴ベクトルを作成する。そして、カテゴリ分類器５０は、これらに基づいて、分類関数の値に応じた翻訳対象文の各専門分野への帰属可能性を推定し、専門辞書選択部６０は、その推定の結果に基づいて、翻訳対象文の翻訳に適した専門辞書データを選択しており、簡易に翻訳対象文に適した辞書データを用いた翻訳が可能となる。また、翻訳対象文の専門分野のそれぞれへの帰属可能性がグラフ表示されるため、利用者は、翻訳対象文がどの専門分野に帰属する可能性があるのかを把握することが可能となり、例えば、専門辞書選択部６０により選択された専門辞書データでは適切に翻訳ができなかった場合に、翻訳対象文が帰属する可能性が高い他の専門分野に対応する専門辞書データに選択しなおすといったことも可能となる。 Thus, in the machine translation apparatus 100, the feature vector creation unit 10 creates a translation target feature vector that represents the feature of the translation target sentence, and the feature vector creation unit 20 uses the specialized field feature that represents the feature of the specialized field sentence. Create a vector. Then, based on these, the category classifier 50 estimates the possibility of belonging to each specialized field of the translation target sentence according to the value of the classification function, and the specialized dictionary selection unit 60, based on the estimation result. The specialized dictionary data suitable for the translation of the translation target sentence is selected, and the translation using the dictionary data suitable for the translation target sentence becomes possible. In addition, since the possibility of belonging to each specialized field of the translation target sentence is displayed in a graph, the user can grasp which specialized field the translation target sentence may belong to. When the specialized dictionary data selected by the specialized dictionary selection unit 60 cannot be properly translated, the specialized dictionary data corresponding to another specialized field to which the translation target sentence is highly likely to belong is selected again. Is also possible.

以上、説明したように、本発明に係る機械翻訳装置、機械翻訳方法及び機械翻訳プログラムは、簡易に専門辞書データを用いた適切な翻訳が可能であり、機械翻訳装置等として有用である。 As described above, the machine translation device, the machine translation method, and the machine translation program according to the present invention can be easily translated appropriately using specialized dictionary data, and are useful as machine translation devices and the like.

機械翻訳装置の構成を示す図である。 A diagram showing the configuration of the machine translation apparatus.
翻訳対象特徴ベクトル作成の一例を示す図である。 A diagram showing an example of creating a translation target feature vector.
カテゴリ分類器の処理の具体例を示す図である。 A diagram showing a specific example of the processing of the category classifier.

Explanation of symbols

１００機械翻訳装置 100 Machine translation apparatus
１０、２０特徴ベクトル作成部 10, 20 Feature vector creation unit
３０専門分野ラベル付コーパスデータベース 30 Specialized field labeled corpus database
４０専門分野代表ベクトル群データベース 40 Specialized field representative vector group database
５０カテゴリ分類器 50 Category classifier
５２表示部 52 Display unit
６０専門辞書選択部 60 Specialized dictionary selection unit
７０翻訳モジュール 70 Translation module

Claims

1. A machine translation apparatus comprising:
first holding means for holding specialized field feature vectors, each representing features of sentences belonging to one of a plurality of specialized fields;
first creation means for creating, based on a translation target sentence, a translation target feature vector representing features of the translation target sentence;
estimation means for estimating the possibility that the translation target sentence belongs to each of the specialized fields, based on the translation target feature vector created by the first creation means and the specialized field feature vectors held in the first holding means;
selection means for preferentially selecting specialized dictionary data corresponding to a specialized field to which the translation target sentence is likely to belong; and
translation means for translating the translation target sentence using the specialized dictionary data selected by the selection means.

2. The machine translation apparatus according to claim 1, further comprising second creation means for creating the specialized field feature vectors based on sentences corresponding to each of the specialized dictionary data.

3. The machine translation apparatus according to claim 1, wherein the first creation means creates the translation target feature vector further based on sentences surrounding the translation target sentence.

4. The machine translation apparatus according to claim 1, further comprising providing means for providing, in a visible manner, the possibility that the translation target sentence belongs to each of the specialized fields.

5. The machine translation apparatus according to claim 4, wherein the providing means displays a graph representing the possibility that the translation target sentence belongs to each of the specialized fields.

6. A machine translation method comprising:
a first creation step of creating, based on a translation target sentence, a translation target feature vector representing features of the translation target sentence;
an estimation step of estimating the possibility that the translation target sentence belongs to each of a plurality of specialized fields, based on the translation target feature vector created in the first creation step and specialized field feature vectors each representing features of sentences belonging to one of the specialized fields;
a selection step of preferentially selecting specialized dictionary data corresponding to a specialized field to which the translation target sentence is likely to belong; and
a translation step of translating the translation target sentence using the specialized dictionary data selected in the selection step.

7. The machine translation method according to claim 6, further comprising a second creation step of creating the specialized field feature vectors based on sentences corresponding to each of the specialized dictionary data.

8. The machine translation method according to claim 6 or 7, wherein the first creation step creates the translation target feature vector further based on sentences surrounding the translation target sentence.

9. The machine translation method according to claim 6, further comprising a providing step of providing, in a visible manner, the possibility that the translation target sentence belongs to each of the specialized fields.

10. The machine translation method according to claim 9, wherein the providing step displays a graph representing the possibility that the translation target sentence belongs to each of the specialized fields.

11. A machine translation program that causes a computer to execute:
a first creation step of creating, based on a translation target sentence, a translation target feature vector representing features of the translation target sentence;
an estimation step of estimating the possibility that the translation target sentence belongs to each of a plurality of specialized fields, based on the translation target feature vector created in the first creation step and specialized field feature vectors each representing features of sentences belonging to one of the specialized fields;
a selection step of preferentially selecting specialized dictionary data corresponding to a specialized field to which the translation target sentence is likely to belong; and
a translation step of translating the translation target sentence using the specialized dictionary data selected in the selection step.

12. The machine translation program according to claim 11, further causing the computer to execute a second creation step of creating the specialized field feature vectors based on sentences corresponding to each of the specialized dictionary data.

13. The machine translation program according to claim 11 or 12, wherein the first creation step creates the translation target feature vector further based on sentences surrounding the translation target sentence.

14. The machine translation program according to any one of claims 11 to 13, further causing the computer to execute a providing step of providing, in a visible manner, the possibility that the translation target sentence belongs to each of the specialized fields.

15. The machine translation program according to claim 14, wherein the providing step displays a graph representing the possibility that the translation target sentence belongs to each of the specialized fields.