JP4729063B2

JP4729063B2 - Machine translation apparatus, method and program

Info

Publication number: JP4729063B2
Application number: JP2008063878A
Authority: JP
Inventors: 貴志澁谷; 悦雄伊藤; 遠航蔡; 正樹新藤
Original assignee: Toshiba Corp; Toshiba Solutions Corp
Current assignee: Toshiba Corp; Toshiba Digital Solutions Corp
Priority date: 2008-03-13
Filing date: 2008-03-13
Publication date: 2011-07-20
Anticipated expiration: 2028-03-13
Also published as: JP2009223365A

Description

The present invention relates to a machine translation apparatus, method, and program suitable for translating between a first language and a second language in e-mail clients, Web browsers, and similar tools.

In recent years, as personal computers have become widespread and Internet infrastructure has improved, exchanging information with people overseas by e-mail and through bulletin boards hosted on Web (World Wide Web) sites (hereinafter, Web bulletin boards) has become common. The accuracy of machine translation apparatuses has also improved, and translation can now be performed by calling a machine translation program from e-mail software or Web browser software.

To improve translation accuracy, translation methods that use context information are sometimes adopted. In such document translation techniques, either the entire document is analyzed in advance to extract various kinds of information that are then used for translation, or information from sentences already translated is used to translate later sentences as translation of the document proceeds.

There is also a technique that improves translation accuracy for mail documents and Web bulletin boards by extracting information usable for translation processing from related documents other than the document to be translated (see, for example, Patent Document 1).
Patent Document 1: JP 2003-108553 A

However, when the entire document or related documents are analyzed in advance to extract information for use in translation, determining an appropriate translated word costs about as much as actually translating. Translation processing consists of morphological analysis, syntactic/semantic analysis, semantic structure conversion from the first language to the second language, syntax generation, and morpheme generation, and a translated word is not determined until processing has proceeded through the semantic structure conversion step.

Using information from already translated sentences to translate later sentences avoids the cost of the former approach, but translation accuracy then depends on the order in which sentences appear. Consider an e-mail exchange: when a reply quotes a received mail, the reply text may be written below the quoted text, above it, or interleaved with it. Even if the quoted part contains information that would improve translation of the reply, that information cannot be used when the reply is written above the quoted part, so translation accuracy is lower than when the reply is written below the quotation. When the reply is interleaved with the quotation, accuracy likewise drops depending on where the useful information is located relative to the reply.

Thus, in exchanges of electronic information such as e-mail and Web bulletin boards, translation accuracy depends on the order in which sentences appear even when the content of the document is the same. In addition, analyzing the entire document in advance makes translation processing costly.

An object of the present invention is to provide a machine translation apparatus, method, and program that use the quote symbols included in exchanges such as e-mail and Web bulletin boards to improve translation accuracy for documents with the same content regardless of the order in which sentences appear.

A machine translation apparatus according to the present invention comprises: a storage device that stores a machine translation program and a translation dictionary unit holding the vocabulary and rules necessary for translation; an input device that inputs the original text of a first language to be translated as data; an output device that outputs the translated text in a second language; and a processor that executes the machine translation program. The apparatus is characterized by further comprising: a document hierarchy determination unit that determines whether a quote symbol is present at the head of each sentence of a first language document input from the input device and, when a quote symbol is present, treats the number of quote symbols from the head of the sentence as the quotation level; a document storage unit that stores the sentences of the first language document for each quotation level determined by the document hierarchy determination unit; a translation knowledge information storage unit for storing translation knowledge information used when translating the sentences of the first language document input from the input device; and a translation unit that translates the sentences of the first language document stored in the document storage unit in descending order of quotation depth, using the vocabulary and conversion rules of the translation dictionary unit and the translation knowledge information stored in the translation knowledge information storage unit, extracts the translation knowledge information used in selecting translated words from the vocabulary and conversion rules of the translation dictionary unit used during that translation, stores it in the translation knowledge information storage unit, and outputs the translation result to the output device.

According to the present invention, the quote symbols included in exchanges such as e-mail and Web bulletin boards are used to determine the hierarchical structure of quotations in a document, and sentences are translated in descending order of quotation depth. This keeps the cost of translation processing low and, when translating sentences at shallow quotation levels, improves translation accuracy for documents with the same content regardless of the order in which sentences appear.

FIG. 1 is a block diagram showing the hardware configuration of a machine translation apparatus according to an embodiment of the present invention. The machine translation apparatus 11 is realized, for example, by installing a software program such as a machine translation program on a general-purpose computer and executing that program on the processor 13 of the arithmetic control device 12.

The arithmetic control device 12 performs the various operations related to machine translation and has a processor 13 and a memory 14. The memory 14 stores a machine translation program 15 for translation, and a work area 16 is used when the processor 13 executes processing. Results computed by the arithmetic control device 12 are displayed on a display device 18, which serves as an output device 17, and are also output to a communication network via a communication control device 19.

The input device 20 inputs information to the arithmetic control device 12 and consists of, for example, a mouse 21, a keyboard 22, a disk drive 23, and the communication control device 19. The mouse 21 and the keyboard 22 are used to enter commands to the arithmetic control device 12 via the display device 18, while the keyboard 22, the disk drive 23, and the communication control device 19 are used to input documents to be translated.

Specifically, the disk drive 23 reads and writes files of documents to be translated from and to a storage medium, and the communication control device 19 connects the machine translation apparatus 11 to a communication network such as the Internet or a LAN. The communication control device 19 is a device such as a LAN card or a modem; data exchanged with the communication network through it is passed to and from the arithmetic control device 12 as input or output signals. In addition, a hard disk drive (HDD) 24 is provided to store the computation results of the arithmetic control device 12, a translation dictionary holding the knowledge and rules necessary for translation, and the like.

FIG. 2 is a functional block diagram of the machine translation apparatus 11 according to the embodiment of the present invention. Each functional block in the arithmetic control device 12 shown in FIG. 2 corresponds to one of the programs constituting the machine translation program 15 described above. That is, when the processor 13 executes those programs, the arithmetic control device 12 functions as the respective functional blocks. Each block of the storage device 25 corresponds to a storage area in the memory 14 of the arithmetic control device 12 or in the hard disk drive 24.

The machine translation apparatus 11 comprises: a control unit 26 that controls the entire apparatus; an input unit 27 that serves as the interface for external input; an output unit 28 that serves as the interface for external output; a translation unit 29 that translates the input document obtained via the input unit 27; a document hierarchy determination unit 30 that determines whether a quote symbol is present at the head of each sentence of the first language document input from the input unit 27 and, when one is present, treats the number of quote symbols from the head of the sentence as the quotation level; a translation dictionary unit 31 that holds the various translation knowledge used by the translation unit 29; a document storage unit 32 that stores the sentences of the first language document for each quotation level determined by the document hierarchy determination unit 30; a translation knowledge information storage unit 33 that stores translation knowledge information useful for selecting translated words when translating sentences related to a sentence that has a quote symbol; and a quote symbol storage unit 39 that stores the quote symbols.

The input unit 27 receives first language documents to be translated, and commands, through the input device 20, for example the communication control device 19 connected to the Internet or the keyboard 22. The control unit 26 is responsible for overall control: it sends the text data of the first language document received from the input unit 27 to the document hierarchy determination unit 30, sends documents retrieved from the document storage unit 32 to the translation unit 29, and sends the translation results from the translation unit 29 to the output unit 28.

The translation unit 29 translates the first language text data sent from the control unit 26 while referring to the dictionaries of the translation dictionary unit 31 and the knowledge information stored in the translation knowledge information storage unit 33.

The translation dictionary unit 31 stores the translation knowledge that the translation unit 29 needs for translation processing. For example, if the translation unit 29 performs bidirectional Japanese-to-English and English-to-Japanese translation, the dictionary unit 31 has a basic dictionary unit 34, a technical term dictionary unit 35, and a user dictionary unit 36 for each translation direction. The basic dictionary unit 34 consists of a vocabulary section 34a, morphological analysis rules 34b, syntactic/semantic analysis rules 34c, conversion rules 34d, syntax generation rules 34e, and morpheme generation rules 34f. The technical term dictionary unit 35 consists only of a vocabulary section 35a. The user dictionary unit 36 consists of a vocabulary section 36a and a translated word learning section 36b. These units are organized as follows.

(1) Basic dictionary unit
(a) Vocabulary section
At least the inflection information, semantic information, field information, and translated word information for each word of the first language, and the field information for each translated word
(b) Morphological analysis rules
Knowledge for morphologically analyzing an input sentence in the first language
(c) Syntactic/semantic analysis rules
Knowledge for syntactic and semantic analysis of a first language input sentence after morphological analysis
(d) Conversion rules
Knowledge for converting the first language semantic structure produced by syntactic/semantic analysis into a second language semantic structure
(e) Syntax generation rules
Knowledge for generating a second language word sequence from the second language semantic structure
(f) Morpheme generation rules
Knowledge for applying second language inflection and outputting the final translated sentence
(2) Technical term dictionary unit
The technical term dictionary unit provides multiple field-specific dictionaries, and the dictionary used for translation can be selected according to the content of the input document. It consists only of a vocabulary section.

(3) User dictionary unit
The user dictionary unit is a dictionary for user definitions. The user can create several such dictionaries to suit the documents being translated. Each has a vocabulary section and a translated word learning section.

(a) Vocabulary section
Inflection information, semantic information, translated word information, and the like for terms newly registered by the user
(b) Translated word learning section
Stores information on the translated words that the user has taught the system for particular first language words and phrases.

The translation knowledge information storage unit 33 stores information that is judged, during translation, to be applicable to other sentences as well. For example, when the translation unit 29 consults the translation dictionary unit 31 and the headword it looks up carries specific field information, that field information is stored; likewise, headword and translated word pairs determined from word co-occurrence relations are stored. In particular, the unit holds translation knowledge information useful for selecting translated words when translating other sentences related to a sentence that has a quote symbol. Details are given later using a concrete example.

The output unit 28 outputs the second language text (the translation) produced by the translation unit 29 in response to instructions from the control unit 26. The document hierarchy determination unit 30 determines whether the input data passed from the control unit 26 contains quotation level information and stores the data in the document storage unit 32 by quotation level. The document storage unit 32 thus holds the sentences of the first language document for each quotation level determined by the document hierarchy determination unit 30. When the document hierarchy determination unit 30 finishes, the control unit 26 retrieves the contents of the document storage unit 32 and passes them to the translation unit 29 for translation. The details of this processing are described later.

FIG. 3 is a flowchart showing an example of the processing performed by the machine translation apparatus according to the embodiment of the present invention, from input of a first language document to output of the translation. The following description covers the case in which the first language document input from the input device 20 via the input unit 27 is an English original and is translated into Japanese as the second language.

When the original text data of the first language document is input via the input unit 27, the control unit 26 starts the document hierarchy determination unit 30, which determines the quotation levels of the document (S101). Assume here that the following original text is input from the input unit 27.

He is a reliever.
>We got Taro Toshiba.
>Do you know him?

The ">" at the head of the second and third sentences is the e-mail quote symbol: the first sentence is a reply to the content of the second and third sentences. The quote symbol here is ">", but because it can differ between mail tools, the user of the machine translation apparatus may be allowed to set the quote symbols freely in the quote symbol storage unit 39. The setting need not be a single symbol; a combination of several symbols may also be specified.

In this embodiment, as shown in FIG. 4, two quote symbols, ">" and "|", are registered in advance in the quote symbol storage unit 39. The quote symbol at array position M = 1 is ">", and the quote symbol at array position M = 2 is "|".

FIG. 5 is a flowchart showing the processing of the document hierarchy determination unit 30 in step S101 of FIG. 3. When the original text data of the first language document is input, the document hierarchy determination unit 30 splits the input character string at line breaks and acquires one sentence S (S201). It then sets three variables to their initial value of 1: L, the quotation level; N, the position in the sentence S at which matching starts; and M, the position in the quote symbol array (S202). L = 1 denotes a sentence S with no quote symbol, N = 1 denotes the head of the sentence S, and M = 1 selects the quote symbol ">".

Next, the quote symbol ">" indicated by array position M (M = 1) is compared with the character at the head (N = 1) of the sentence S (S203). If they match, L is incremented (L = L + 1), deepening the quotation level by one (S204). N is then incremented (N = N + 1) to move the matching position to the next character (the second character), and that character is compared with the quote symbol ">" at array position M = 1 (S205). If they match, L is incremented again (S206) and processing returns to step S205. Steps S205 and S206 are repeated until the character at the Nth position of the sentence S no longer matches the quote symbol ">".

When step S205 determines that the character at the Nth position of the sentence S does not match the quote symbol ">", the sentence S is stored in the document storage unit 32 as a sentence of level L, using the current value of L (S207). It is then determined whether S is the last sentence of the original text data (S208). If not, processing returns to step S201; if so, processing ends and the document hierarchy determination unit 30 returns control to the control unit 26.

If, on the other hand, step S203 determines that the quote symbol ">" at array position M = 1 does not match the character at the head (N = 1) of the sentence S, M is incremented (M = M + 1), the other quote symbol "|" is read from array position M = 2, and it is compared with the character at the head of the sentence S (S209). If they do not match, the sentence S is stored in the document storage unit 32 as a sentence of level L, using the current value of L (S207).

If they do match, L is incremented (L = L + 1), deepening the quotation level by one (S210). N is then incremented (N = N + 1) to move the matching position to the next character (the second character), and that character is compared with the quote symbol "|" at array position M = 2 (S211). If they match, L is incremented again (S212) and processing returns to step S211. When the character at the Nth position of the sentence S is determined not to match the quote symbol "|", the sentence S is stored in the document storage unit 32 as a sentence of level L, using the current value of L (S207). In this way, the head of the sentence S is checked against each quote symbol held in the quote symbol array.
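The level determination described for FIG. 5 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the names `QUOTE_SYMBOLS` and `quote_level` are hypothetical, and returning the sentence with its leading quote symbols removed is an assumption made for this example. As in the flowchart, only a run of the first matching symbol is counted.

```python
# Quote symbol array as in FIG. 4 (position M = 1 is ">", M = 2 is "|").
QUOTE_SYMBOLS = [">", "|"]


def quote_level(sentence, symbols=QUOTE_SYMBOLS):
    """Return (level, body) for one line of the source document.

    level starts at 1 (no quote symbol) and grows by one for each
    repetition of the first quote symbol found at the head of the line.
    body is the line with those leading symbols removed (an assumption
    of this sketch).
    """
    level = 1   # L in the flowchart
    pos = 0     # N in the flowchart, zero-based here
    for symbol in symbols:  # M walks through the quote symbol array
        # Empty symbols are skipped so the loop below always advances.
        if symbol and sentence.startswith(symbol):
            # Count consecutive occurrences of this one symbol.
            while sentence.startswith(symbol, pos):
                level += 1
                pos += len(symbol)
            break
    return level, sentence[pos:]


if __name__ == "__main__":
    for line in ["He is a reliever.", ">We got Taro Toshiba.", ">Do you know him?"]:
        print(quote_level(line))
```

A line that mixes symbols, such as "|>text", is counted only up to the end of the first run under this reading of the flowchart; an implementation that wanted to count mixed runs would need a different loop.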

For example, the first sentence, "He is a reliever.", has no quote symbol, so processing goes through steps S201, S202, S203, S209, and S207. In step S207, the sentence S is stored in the document storage unit 32 together with the level variable L (L = 1). Step S208 then determines whether the last sentence has been read; if sentences remain, processing returns to step S201 to acquire the next sentence S, and the variables L, N, and M are initialized as before.

For the second sentence, "We got Taro Toshiba.", step S203 finds that the quote symbol ">" from the array matches the character ">" at position N of the sentence S, so processing proceeds to S204. In S204, L is incremented (L = L + 1), deepening the quotation level by one. N is then incremented (N = N + 1) to move the matching position to the next character (the second character), and that character is compared with the quote symbol ">" at array position M = 1 (S205). Once no further character matches the quote symbol, processing proceeds to S207, where the sentence S is stored in the document storage unit 32 together with the level variable L. The third sentence is processed in the same way.

FIG. 6 illustrates an example of a first language document stored in the document storage unit 32. As shown in FIG. 6, the sentences of the first language document to be translated are stored by quotation level. The first sentence, "He is a reliever.", has no quote symbol and is stored at level 1; the second sentence, "We got Taro Toshiba.", and the third sentence, "Do you know him?", each have one quote symbol and are stored at level 2. Here level 1 is shallower than level 2 and, conversely, level 2 is deeper than level 1. The document storage unit 32 also stores the second language translations produced later by the translation unit 29.
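Storage by quotation level, as described for FIG. 6, might be modeled with a mapping from level to a list of sentences. The sketch below is illustrative only: `build_document_store` is a hypothetical name, a single quote symbol is handled for brevity, and skipping blank lines is an assumption of this example.

```python
from collections import defaultdict


def build_document_store(text, symbol=">"):
    """Group the lines of text by quotation level.

    Level 1 means no quote symbol; each leading symbol adds one level.
    Sentences keep their order of appearance within a level.
    symbol is assumed to be a single character.
    """
    store = defaultdict(list)
    for line in text.splitlines():
        if not line.strip():
            continue  # assumption: blank lines carry no sentence
        body = line.lstrip(symbol)              # drop the leading run of symbols
        level = 1 + (len(line) - len(body))     # one level per removed symbol
        store[level].append(body)
    return dict(store)


if __name__ == "__main__":
    mail = "He is a reliever.\n>We got Taro Toshiba.\n>Do you know him?"
    print(build_document_store(mail))
```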

In the description above, the quotation levels of the document are determined from quote symbols. In forwarded mail, however, a line such as "-------- Original Message --------" or "<zzzz@bbb.co.jp> wrote:" may appear immediately before the forwarded content so that it can be distinguished from the text written at the time of forwarding. Such lines may also be used to determine the level.
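Detecting such forwarding markers could be done with simple pattern matching. The following sketch is an assumption-laden illustration: the two regular expressions and the name `is_forward_marker` are hypothetical and cover only the two example lines quoted in the text, not the formats of any particular mail tool.

```python
import re

# Hypothetical patterns for the two marker styles mentioned in the text.
FORWARD_MARKERS = [
    re.compile(r"^-+\s*Original Message\s*-+$"),
    re.compile(r"^<[^<>\s]+@[^<>\s]+>\s+wrote:$"),
]


def is_forward_marker(line):
    """Return True if line looks like a header preceding forwarded content."""
    stripped = line.strip()
    return any(pattern.match(stripped) for pattern in FORWARD_MARKERS)


if __name__ == "__main__":
    print(is_forward_marker("-------- Original Message --------"))
    print(is_forward_marker("He is a reliever."))
```

How a detected marker should change the level (for example, treating every following line as one level deeper) is not specified in the text and would be a design decision.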

図３のステップＳ１０１の処理が終了すると、次に制御部２６において、文書記憶部３２より階層の最大値Ｌmaxを取得する（Ｓ１０２）。続いて、文書記憶部３２より、階層の最大値Ｌmaxに対応する文を取得する（Ｓ１０３）。この場合、同じ階層の文が複数あるときはすべての文を取得する。取得した文データを対象に翻訳部２９が起動され翻訳が行われる（Ｓ１０４）。これにより、つまり、階層の深いものから翻訳されることになる。図６に示した文書記憶部３２の一例では、引用形式の階層Ｌの最大値は２である。そして、階層Ｌの最大値（Ｌ＝２）の文は、２文目「We got Taro Toshiba.」及び３文目「Do you know him?」の２つの文がある。階層の最大値に複数の文がある場合には、第一言語文書の文の出現順に翻訳対象の文とする。従って、この場合には、「We got Taro Toshiba.」が最初の翻訳対象の文となる。 When the processing of step S101 in FIG. 3 is completed, the control unit 26 acquires the maximum level Lmax from the document storage unit 32 (S102). It then acquires from the document storage unit 32 the sentences corresponding to the maximum level Lmax (S103). If there are several sentences at the same level, all of them are acquired. The translation unit 29 is then started and translates the acquired sentence data (S104). In other words, translation proceeds from the deepest level. In the example of the document storage unit 32 shown in FIG. 6, the maximum citation format level L is 2, and there are two sentences at that level (L = 2): the second sentence, “We got Taro Toshiba.”, and the third sentence, “Do you know him?”. When several sentences share the maximum level, they are translated in the order in which they appear in the first language document. In this case, therefore, “We got Taro Toshiba.” is the first sentence to be translated.
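The deepest-first ordering of steps S102 to S106 can be illustrated with a short Python sketch. The function name `translation_order` and the representation of the stored document as a list of `(level, sentence)` pairs in order of appearance are assumptions made for illustration only.

```python
def translation_order(stored):
    """Return sentences in the order they would be translated.

    `stored` is a list of (level, sentence) pairs in order of appearance.
    Sentences at the deepest level come first; within one level the
    original order of appearance is kept.
    """
    if not stored:
        return []
    l_max = max(level for level, _ in stored)  # S102: maximum level Lmax
    ordered = []
    for level in range(l_max, 0, -1):          # L = Lmax, Lmax - 1, ..., 1
        # S103: take every sentence stored at the current level
        ordered.extend(s for lv, s in stored if lv == level)
    return ordered
```

With the FIG. 6 example, the two level-2 sentences precede the level-1 sentence, so any knowledge gathered from the quoted text is available when the unquoted reply is translated.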

図７は図３中のステップＳ１０４における翻訳部２９の翻訳処理を示すフローチャートである。図７では、第一言語文書の原文中の１文に対する翻訳部２９の翻訳処理を示しており、入力された原文全体の翻訳には、文書末まで図７のステップＳ３０１〜Ｓ３０８が繰り返し処理される。 FIG. 7 is a flowchart showing the translation processing of the translation unit 29 in step S104 of FIG. 3. FIG. 7 shows the processing for a single sentence of the original first language document; to translate the entire input text, steps S301 to S308 of FIG. 7 are repeated until the end of the document.

翻訳部２９は、制御部２６から翻訳対象の原文が送られてくると、まず、翻訳知識情報格納部３３より翻訳知識情報を読み込む（Ｓ３０１）。図８は翻訳知識情報格納部３３に格納される翻訳知識情報の一例の説明図である。翻訳知識情報は、翻訳対象文書の分野を特定する分野情報３７や、変換規則の適用により決定した見出し語、品詞、訳語の訳語選択情報３８などである。初期状態ではいずれも空の状態である。 When the original text to be translated is sent from the control unit 26, the translation unit 29 first reads the translation knowledge information from the translation knowledge information storage unit 33 (S301). FIG. 8 is an explanatory diagram of an example of translation knowledge information stored in the translation knowledge information storage unit 33. The translation knowledge information includes field information 37 that identifies the field of the document to be translated, headwords, parts of speech, translation selection information 38 of the translations determined by application of conversion rules, and the like. In the initial state, both are empty.

翻訳部２９は、２文目の入力文「We got Taro Toshiba.」に対して辞書部３１の下記知識を用いて、辞書引き及び形態素解析処理を行う（Ｓ３０２）。 The translation unit 29 performs dictionary lookup and morphological analysis processing on the second input sentence “We got Taro Toshiba.” Using the following knowledge of the dictionary unit 31 (S302).

（１）基本辞書部３４
（ａ）語彙部３４ａ
（ｂ）形態素解析規則３４ｂ
（２）専門用語辞書部３５
（３）ユーザ辞書部３６
(1) Basic dictionary unit 34
(a) Vocabulary part 34a
(b) Morphological analysis rules 34b
(2) Technical term dictionary unit 35
(3) User dictionary unit 36

次に、翻訳部２９は、図７のステップＳ３０３において、基本辞書部３４の構文・意味解析規則３４ｃを用いて、入力原文の構文・意味解析を行う。この段階で、入力言語の解析結果の構造が構築されている。 Next, in step S303 of FIG. 7, the translation unit 29 performs syntactic and semantic analysis of the input sentence using the syntax/semantic analysis rules 34c of the basic dictionary unit 34. At this stage, the structure representing the analysis result of the input language has been built.

次に、翻訳部２９は、図７のステップＳ３０４に処理を移行して、基本辞書部３４の語彙部３４ａ、変換規則３４ｄ、専門用語辞書部３５の語彙部３５ａ、ユーザ辞書部３６の語彙部３６ａ、訳語学習部３６ｂの知識を用いて、入力言語の構造を出力言語の構造に変換する処理を行う。この段階で、出力言語の言語的な構造と共に、入力単語に対する訳語の決定も行う。 Next, the translation unit 29 proceeds to step S304 of FIG. 7 and converts the structure of the input language into the structure of the output language, using the knowledge in the vocabulary part 34a and conversion rules 34d of the basic dictionary unit 34, the vocabulary part 35a of the technical term dictionary unit 35, and the vocabulary part 36a and translation learning part 36b of the user dictionary unit 36. At this stage, the translation of each input word is determined together with the linguistic structure of the output language.

図９は、２文目の入力文「We got Taro Toshiba.」の入力文字列「Taro Toshiba.」に対する基本辞書部３４の語彙部３４ａの一例の説明図である。翻訳部２９は、入力文字列「Taro Toshiba.」に対する基本辞書部３４の語彙部３４ａの中から見出し語分野情報３７ａを参照し、見出し語分野情報３７ａの有無を判定する（Ｓ３０５）。見出し語分野情報３７ａが存在する場合は、その見出し語分野情報３７ａ（スポーツ：野球）を翻訳知識情報格納部３３に格納する（Ｓ３０６）。これにより、図８に示すように、入力文の”Taro Toshiba”に付与されている「スポーツ：野球」が翻訳知識情報格納部３３に格納される。 FIG. 9 is an explanatory diagram of an example of the vocabulary part 34a of the basic dictionary unit 34 for the input character string “Taro Toshiba.” in the second input sentence, “We got Taro Toshiba.”. The translation unit 29 looks up the headword field information 37a for the input character string “Taro Toshiba.” in the vocabulary part 34a of the basic dictionary unit 34 and determines whether headword field information 37a exists (S305). If it exists, that headword field information 37a (sports: baseball) is stored in the translation knowledge information storage unit 33 (S306). As a result, as shown in FIG. 8, the field “sports: baseball” assigned to “Taro Toshiba” in the input sentence is stored in the translation knowledge information storage unit 33.

この一例による例文では、翻訳知識情報格納部３３に分野情報３７が格納されるが、分野情報３７だけではなく、共に使用される単語との関係（訳語選択情報）によって訳語が決定された場合も翻訳知識情報として格納するようにしてもよい。以下の例文を用いて説明する。 In this example, the field information 37 is stored in the translation knowledge information storage unit 33. In addition to the field information 37, however, a translation that was determined from the relationship with co-occurring words (translation selection information) may also be stored as translation knowledge information. This is explained using the following example sentence.

例文
「He erected a tent city along the banks of the Grand Canal.」
Example sentence
"He erected a tent city along the banks of the Grand Canal."

この例文中の「bank」には、「銀行」と「土手」という全く異なる性質の訳語が含まれている。このような場合は、基本辞書部３４の変換規則３４ｄには、例えば、下記のような規則１、２、３が記述されている。 The word "bank" in this example sentence has two translations of entirely different character: 「銀行」 (a financial bank) and 「土手」 (an embankment). For such a case, the conversion rules 34d of the basic dictionary unit 34 contain, for example, rules 1, 2, and 3 below.

規則１（「of」の目的語が都市名の場合は「銀行」）
規則２（「of」の目的語が「* canal」or 「* river」の場合は「土手」）
規則３（いずれにも該当しない場合は「銀行」）
Rule 1 (if the object of "of" is a city name: 「銀行」)
Rule 2 (if the object of "of" is "* canal" or "* river": 「土手」)
Rule 3 (if neither applies: 「銀行」)

この例文には、「Grand Canal」が存在するため規則２が採用され、訳語が「土手」に決定される。このように決定された情報も翻訳知識情報格納部３３に格納する。こうして、出力言語の構造が構築されると、翻訳部２９は処理をステップＳ３０７に移行して、基本辞書部３４の構文生成規則３４ｅを用いて、２次元的な構造を１次元的な構造に変換する。最後に、翻訳部２９は、ステップＳ３０８において、基本辞書部３４の形態素生成規則３４ｆを用いて、個々の語の表層形態を生成し、最終的な訳文を文書記憶部３２に出力する。 Because "Grand Canal" appears in this example sentence, rule 2 is applied and the translation is determined to be 「土手」 (embankment). Information determined in this way is also stored in the translation knowledge information storage unit 33. Once the structure of the output language has been built, the translation unit 29 proceeds to step S307 and uses the syntax generation rules 34e of the basic dictionary unit 34 to convert the two-dimensional structure into a one-dimensional structure. Finally, in step S308, the translation unit 29 uses the morpheme generation rules 34f of the basic dictionary unit 34 to generate the surface form of each word and outputs the final translation to the document storage unit 32.
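The three conversion rules for "bank" can be sketched in Python as below. This is a simplified illustration rather than the format of the conversion rules 34d: the function name `translate_bank`, the tiny `CITY_NAMES` set standing in for a city-name lexicon, and the regular expression used for the "* canal" / "* river" pattern are all hypothetical.

```python
import re

# Hypothetical stand-in for a city-name lexicon (not from the source).
CITY_NAMES = {"london", "tokyo"}


def translate_bank(of_object):
    """Choose a translation for "bank" from the object of the following "of"."""
    obj = of_object.strip().lower()
    if obj in CITY_NAMES:                          # Rule 1: city name
        return "銀行"
    if re.search(r"\S\s+(canal|river)$", obj):     # Rule 2: "* canal" or "* river"
        return "土手"
    return "銀行"                                  # Rule 3: fallback
```

For the example sentence, the object of "of" is "the Grand Canal", so rule 2 applies and the function returns 「土手」.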

引き続き、次の文である３文目の「Do you know him?」に対してもステップＳ３０１〜Ｓ３０８の処理を行い、文書記憶部３２に訳文を出力する。そして、文書記憶部３２に訳文を出力した時点で階層２の翻訳が完了し制御部２６に処理が戻る。 Subsequently, the third sentence “Do you know him?”, Which is the next sentence, is also processed in steps S301 to S308, and the translated sentence is output to the document storage unit 32. When the translation is output to the document storage unit 32, the translation of the hierarchy 2 is completed and the process returns to the control unit 26.

次に、制御部２６は、図３のステップＳ１０５により、引用形式の階層１の翻訳が完了したかの判定を行う（Ｓ１０５）。翻訳を完了した「Do you know him?」の引用形式の階層（Ｌ）は、階層１ではない（階層２である）ため、Ｌ＝Ｌ−１を実行した後（Ｓ１０６）、ステップＳ１０３に処理を移行する。これにより、階層１の文である１文目の「He is a reliever.」を取得し、翻訳部２９は、階層１の文である「He is a reliever.」に対して、図７のステップＳ３０１〜Ｓ３０８の翻訳処理を行う。 Next, in step S105 of FIG. 3, the control unit 26 determines whether translation of citation format level 1 has been completed (S105). The citation format level (L) of “Do you know him?”, whose translation has just been completed, is not level 1 (it is level 2), so L = L − 1 is executed (S106) and the process returns to step S103. The first sentence, “He is a reliever.”, which is the level 1 sentence, is thereby acquired, and the translation unit 29 performs the translation processing of steps S301 to S308 of FIG. 7 on it.

すなわち、翻訳部２９は、翻訳知識情報格納部３３より、階層２の文書を翻訳した際に格納された図８の分野情報３７より分野が「スポーツ：野球」という翻訳知識情報を取得する（Ｓ３０１）。続いて、入力文「He is a reliever.」に対して辞書部３１の知識を用いて、辞書引き及び形態素解析処理（Ｓ３０２）、構文・意味解析を行い（Ｓ３０３）、ステップＳ３０４の変換処理に移行する。ステップＳ３０４において訳語を決定する際に、取得した翻訳知識情報である「スポーツ：野球」と、図９の訳語分野情報３７ｂを参照して同じ分野の訳語が存在するか判定する。 That is, the translation unit 29 acquires from the translation knowledge information storage unit 33 the translation knowledge information that the field is “sports: baseball”, taken from the field information 37 of FIG. 8 that was stored when the level 2 sentences were translated (S301). It then uses the knowledge in the dictionary unit 31 to perform dictionary lookup and morphological analysis (S302) and syntactic and semantic analysis (S303) on the input sentence “He is a reliever.”, and proceeds to the conversion processing of step S304. When determining a translation in step S304, it refers to the acquired translation knowledge information, “sports: baseball”, and to the translation field information 37b of FIG. 9 to determine whether a translation in the same field exists.

ここで、見出し語に多数の訳語がある場合には、見出し語分野情報３７ａを一義的に定めることができないので、そのような見出し語に対する見出し語分野情報３７ａには「０」が記述され、各々の訳語ごとに訳語分野情報３７ｂが記述されている。 When a headword has many translations, its headword field information 37a cannot be determined uniquely. For such a headword, “0” is recorded as the headword field information 37a, and translation field information 37b is recorded for each individual translation.

これにより、「reliever」の訳語を決定するときに、訳語分野情報３７ｂが「スポーツ：野球」である「リリーフ投手」が取得される。引き続き、新たに取得された翻訳知識情報があるかを判定し（Ｓ３０５）、翻訳知識情報がある場合は翻訳知識情報格納部３３に翻訳知識を格納し（Ｓ３０６）、構文生成処理を行い（Ｓ３０７）、形態素生成処理を行って（Ｓ３０８）訳文が文書記憶部３２に格納される。 As a result, when the translation of “reliever” is determined, 「リリーフ投手」 (relief pitcher), whose translation field information 37b is “sports: baseball”, is selected. The translation unit then determines whether any translation knowledge information has been newly acquired (S305); if so, it stores that knowledge in the translation knowledge information storage unit 33 (S306). It then performs syntax generation (S307) and morpheme generation (S308), and the translation is stored in the document storage unit 32.
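Field-based translation selection of this kind can be sketched in Python as follows. The data layout is an assumption and does not reproduce FIG. 9: `LEXICON`, the function name `select_translation`, and in particular the general-purpose candidate 「救済者」 are hypothetical, since the source names only 「リリーフ投手」 for the field “スポーツ：野球”.

```python
# Hypothetical dictionary entry: (translation, translation field) pairs.
# Only "リリーフ投手" with field "スポーツ：野球" comes from the description;
# the untagged candidate is an assumed general-purpose translation.
LEXICON = {
    "reliever": [
        ("救済者", None),
        ("リリーフ投手", "スポーツ：野球"),
    ],
}


def select_translation(headword, known_fields, lexicon=LEXICON):
    """Prefer a candidate whose field matches stored translation knowledge."""
    candidates = lexicon[headword]
    for word, field in candidates:
        if field is not None and field in known_fields:
            return word
    return candidates[0][0]  # assumed default: first listed candidate
```

When “スポーツ：野球” has already been stored from the quoted sentences, `select_translation("reliever", {"スポーツ：野球"})` picks the baseball sense; with no stored field it falls back to the first candidate.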

階層１の文である「He is a reliever.」の翻訳が完了すると制御部２６は、図３のステップＳ１０７に示すように、翻訳結果を文書記憶部３２より取得して（Ｓ１０７）、出力部２８を呼び出す。出力部２８は渡された翻訳結果を出力する（Ｓ１０８）。 When translation of the level 1 sentence “He is a reliever.” is completed, the control unit 26 acquires the translation result from the document storage unit 32, as shown in step S107 of FIG. 3 (S107), and calls the output unit 28. The output unit 28 outputs the translation result passed to it (S108).

図１０は本発明の実施の形態に係る機械翻訳装置の第一言語文書が入力されてから訳文を出力するまでの処理内容の他の一例を示すフローチャートである。この他の一例は、図３に示した一例に対し、文書階層判定部３０は、ステップＳ４０１の文書階層判定処理において、入力装置２０から入力部２７を介して入力された第一言語文書の文の出現順序を文書記憶部３２に記憶し、制御部２６はステップＳ４０２の訳文のソート処理において、第一言語の文書を構成する各文の出現順序と同じ順序に訳文をソートし、出力部２８は、ステップＳ４０３の翻訳結果出力処理において、第一言語の文書を構成する各文の出現順序と同じ順序で翻訳結果の文を出力するようにしたものである。図３と同一処理については同一符号を付し重複する説明は省略する。 FIG. 10 is a flowchart showing another example of the processing performed by the machine translation apparatus according to the embodiment of the present invention, from input of a first language document to output of the translation. This example differs from the example of FIG. 3 as follows: in the document hierarchy determination processing of step S401, the document hierarchy determination unit 30 stores in the document storage unit 32 the order of appearance of the sentences of the first language document input from the input device 20 via the input unit 27; in the translation sorting processing of step S402, the control unit 26 sorts the translated sentences into the same order as the order of appearance of the sentences constituting the first language document; and in the translation result output processing of step S403, the output unit 28 outputs the translated sentences in that same order. Processing identical to that of FIG. 3 is given the same reference numerals, and redundant description is omitted.

制御部２６は、入力部２７を介して原文データが入力されると、文書階層判定部３０を起動する（Ｓ４０１）。ここでは、入力部２７よりに以下に示す原文が入力されたとする。 When the original text data is input via the input unit 27, the control unit 26 activates the document hierarchy determination unit 30 (S401). Here, it is assumed that the following original text is input from the input unit 27.

No, He is reliever.
>>We got Taro Toshiba.
>>He is a pitcher.
>Is he a starter?

図１１は、図１０中のステップＳ４０１における文書階層判定部３０の処理内容を示すフローチャートである。図１１は、図５に示した一例に対して、ステップＳ５０１において、入力された原文の行番号を格納する変数Ｐを設け、ステップＳ５０２において、入力された原文の行番号の変数Ｐを更新し、ステップＳ５０７において、引用形式の階層Ｌの文として行番号Ｐとともに文書記憶部３２に格納するようにしたものである。図５と同一処理については同一符号を付し重複する説明は省略する。 FIG. 11 is a flowchart showing the processing of the document hierarchy determination unit 30 in step S401 of FIG. 10. Compared with the example shown in FIG. 5, FIG. 11 adds a variable P that holds the line number of the input original text (step S501), updates that line number variable P (step S502), and stores each sentence in the document storage unit 32 as a sentence of citation format level L together with its line number P (step S507). Processing identical to that of FIG. 5 is given the same reference numerals, and redundant description is omitted.

図１１において、まず、入力された原文の行番号を格納する変数Ｐに０を設定する（Ｓ５０１）。続いて文Ｓを取得して（Ｓ２０１）、続いて文書の階層を示す変数Ｌ、取得した文Ｓの判定開始位置を示す変数Ｎ、引用記号配列の位置を示す変数Ｍをそれぞれ１に設定するとともに、Ｐに１を加える（Ｓ５０２）。 In FIG. 11, first, a variable P that stores the line number of the input original text is set to 0 (S501). Subsequently, the sentence S is acquired (S201), then the variable L indicating the document hierarchy, the variable N indicating the determination start position of the acquired sentence S, and the variable M indicating the position of the quote symbol array are set to 1, respectively. At the same time, 1 is added to P (S502).

そして、ステップＳ２０３〜Ｓ２０６、ステップＳ２０９〜Ｓ２１２において、図５と同様な処理により、引用形式の階層Ｌを求め、ステップＳ５０７において、階層Ｌの文とともに、行番号Ｐを文書記憶部３２に格納する。 Then, in steps S203 to S206 and steps S209 to S212, the citation format level L is obtained by the same processing as in FIG. 5, and in step S507 the line number P is stored in the document storage unit 32 together with the sentence of level L.

図１２は本発明の実施の形態に係る機械翻訳装置の他の一例での文書記憶部３２に格納される第一言語文書の一例の説明図である。図１２に示すように、文書記憶部３２には、引用形式の階層、原文とともにその原文が存在した行番号が記憶される。最後の文まで処理が終わったか否かを判定し（Ｓ２０８）、最後の文まで処理が終わった場合は制御部２６に復帰する。 FIG. 12 is an explanatory diagram of an example of the first language document stored in the document storage unit 32 in another example of the machine translation device according to the embodiment of the present invention. As shown in FIG. 12, the document storage unit 32 stores the citation format hierarchy, the original text, and the line number where the original text existed. It is determined whether or not processing has been completed up to the last sentence (S208), and if processing has been completed up to the last sentence, the process returns to the control unit 26.

このようにして、ステップＳ４０１にて原文が存在した行番号が記憶された状態で、図１０のステップＳ１０２〜Ｓ１０７の処理が図３の場合と同様に行われる。そして、制御部２６は、文書記憶部３２より訳文とともに行番号を取得し、取得したデータに対して、行番号を元に昇順にソートを行い（Ｓ４０２）、出力部２８を呼び出し、出力部２８は渡された翻訳結果を第一言語の文書を構成する各文の出現順序と同じ順序で出力する（Ｓ４０３）。 In this way, with the line numbers of the original sentences stored in step S401, the processing of steps S102 to S107 of FIG. 10 is performed in the same manner as in FIG. 3. The control unit 26 then acquires the line numbers together with the translations from the document storage unit 32, sorts the acquired data in ascending order of line number (S402), and calls the output unit 28, which outputs the translation results passed to it in the same order as the sentences appear in the first language document (S403).
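The overall flow of this second example (record line numbers, translate deepest-first, then sort back into the original order) can be sketched in Python as follows. The function name `translate_and_restore_order`, the record layout, and the `translate` callback are assumptions; the callback merely stands in for the translation unit 29, and only the quote symbol `>` is recognized.

```python
def translate_and_restore_order(lines, translate):
    """Translate deepest-quoted lines first, then return results in input order."""
    records = []
    for line_no, raw in enumerate(lines, start=1):  # line number P
        level = 1
        while raw.startswith(">"):                  # one level per leading ">"
            level += 1
            raw = raw[1:]
        records.append({"level": level, "line": line_no, "source": raw.strip()})

    # Deepest level first; sorted() is stable, so lines at the same level
    # keep their order of appearance.
    for rec in sorted(records, key=lambda r: -r["level"]):
        rec["target"] = translate(rec["source"])

    # S402: sort in ascending order of line number before output.
    return [r["target"] for r in sorted(records, key=lambda r: r["line"])]
```

For the four-line example, the two `>>` lines are translated first, then the `>` line, and the unquoted first line last, while the returned list follows the original line order.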

本発明の実施の形態によれば、電子メールやＷｅｂ掲示板などのやり取りの際に含まれる引用記号などを利用して文書の階層構造を判断し、階層が深い順に翻訳することにより、獲得した翻訳知識情報を階層が浅い文の翻訳に適用できる。従って、同一内容の文書が文の出現順に左右されずに翻訳精度を向上させることができる。 According to the embodiment of the present invention, the hierarchical structure of a document is determined using the quote symbols and similar markers contained in exchanges such as e-mail and Web bulletin boards, and translation proceeds from the deepest level, so that the translation knowledge information acquired there can be applied to the translation of sentences at shallower levels. Translation accuracy can therefore be improved for documents with the same content regardless of the order in which the sentences appear.

また、階層が深い順に翻訳するため原文の記述状態によっては、翻訳した順序に出力すると、原文と訳文の並びがことなる結果となる場合がある。そこで、原文の出現順を記憶させ、対応する訳文を原文の並び順にソートすることで訳文を原文と同じ順に出力することができるので、翻訳結果が読みやすくなる。 In addition, since translation is performed in the order of deeper hierarchy, depending on the description state of the original sentence, if the output is performed in the translated order, the original sentence and the translated sentence may be arranged differently. Therefore, by storing the order of appearance of the original sentences and sorting the corresponding translated sentences in the order of the original sentences, the translated sentences can be output in the same order as the original sentences, so that the translation results are easy to read.

本発明の実施の形態に係る機械翻訳装置のハードウエア構成を示すブロック構成図。 Block diagram showing the hardware configuration of the machine translation apparatus according to the embodiment of the present invention.
本発明の実施の形態に係わる機械翻訳装置の機能ブロック図。 Functional block diagram of the machine translation apparatus according to the embodiment of the present invention.
本発明の実施の形態に係る機械翻訳装置の第一言語文書が入力されてから訳文を出力するまでの処理内容の一例を示すフローチャート。 Flowchart showing an example of the processing performed by the machine translation apparatus according to the embodiment of the present invention, from input of a first language document to output of the translation.
本発明の実施の形態における引用記号の配列位置の説明図。 Explanatory diagram of the array positions of the quote symbols in the embodiment of the present invention.
図３中のステップＳ１０１における文書階層判定部の処理内容を示すフローチャート。 Flowchart showing the processing of the document hierarchy determination unit in step S101 of FIG. 3.
本発明の実施の形態における文書記憶部に格納される第一言語文書の一例の説明図。 Explanatory diagram of an example of a first language document stored in the document storage unit in the embodiment of the present invention.
図３中のステップＳ１０４における翻訳部の翻訳処理を示すフローチャート。 Flowchart showing the translation processing of the translation unit in step S104 of FIG. 3.
本発明の実施の形態における翻訳知識情報格納部に格納される翻訳知識情報の一例の説明図。 Explanatory diagram of an example of the translation knowledge information stored in the translation knowledge information storage unit in the embodiment of the present invention.
入力文「We got Taro Toshiba.」の入力単語「Taro Toshiba.」に対する基本辞書部の語彙部の一例の説明図。 Explanatory diagram of an example of the vocabulary part of the basic dictionary unit for the input word “Taro Toshiba.” in the input sentence “We got Taro Toshiba.”.
本発明の実施の形態に係る機械翻訳装置の第一言語文書が入力されてから訳文を出力するまでの処理内容の他の一例を示すフローチャート。 Flowchart showing another example of the processing performed by the machine translation apparatus according to the embodiment of the present invention, from input of a first language document to output of the translation.
図１０中のステップＳ４０１における文書階層判定部の処理内容を示すフローチャート。 Flowchart showing the processing of the document hierarchy determination unit in step S401 of FIG. 10.
本発明の実施の形態に係る機械翻訳装置の他の一例での文書記憶部に格納される第一言語文書の一例の説明図。 Explanatory diagram of an example of a first language document stored in the document storage unit in the other example of the machine translation apparatus according to the embodiment of the present invention.

Explanation of symbols

１１…機械翻訳装置、１２…演算制御装置、１３…プロセッサ、１４…メモリ、１５…機械翻訳プログラム、１６…作業エリア、１７…出力装置、１８…表示装置、１９…通信制御装置、２０…入力装置、２１…マウス、２２…キーボード、２３…ディスクドライブ、２４…ハードディスクドライブ、２５…記憶装置、２６…制御部、２７…入力部、２８…出力部、２９…翻訳部、３０…文書階層判定部、３１…翻訳辞書部、３２…文書記憶部、３３…翻訳知識情報格納部、３４…基本辞書部、３５…専門用語辞書部、３６…ユーザ辞書部、３７…分野情報、３８…訳語選択情報、３９…引用記号記憶部 11 ... machine translation apparatus, 12 ... arithmetic control device, 13 ... processor, 14 ... memory, 15 ... machine translation program, 16 ... work area, 17 ... output device, 18 ... display device, 19 ... communication control device, 20 ... input device, 21 ... mouse, 22 ... keyboard, 23 ... disk drive, 24 ... hard disk drive, 25 ... storage device, 26 ... control unit, 27 ... input unit, 28 ... output unit, 29 ... translation unit, 30 ... document hierarchy determination unit, 31 ... translation dictionary unit, 32 ... document storage unit, 33 ... translation knowledge information storage unit, 34 ... basic dictionary unit, 35 ... technical term dictionary unit, 36 ... user dictionary unit, 37 ... field information, 38 ... translation selection information, 39 ... quote symbol storage unit

Claims

A machine translation apparatus comprising: a storage device storing a machine translation program and a translation dictionary unit that stores the vocabulary and rules necessary for translation; an input device for inputting an original text in a first language to be translated; an output device for outputting a translated text in a second language; and a processor that executes the machine translation program, the machine translation apparatus being characterized by comprising: a document hierarchy determination unit that determines whether a quote symbol is present at the beginning of a sentence of the first language document input from the input device and, when a quote symbol is present, determines the number of quote symbols from the beginning of the sentence as the citation format level; a document storage unit that stores the sentences of the first language document for each citation format level determined by the document hierarchy determination unit; a translation knowledge information storage unit that stores translation knowledge information used in translating the sentences of the first language document input from the input device; and a translation unit that translates the sentences of the first language document stored in the document storage unit in descending order of citation format depth, using the vocabulary and conversion rules of the translation dictionary unit and the translation knowledge information stored in the translation knowledge information storage unit, extracts, from the vocabulary and conversion rules of the translation dictionary unit used in the translation, translation knowledge information used when selecting a translation word, stores it in the translation knowledge information storage unit, and outputs the translation result to the output device.

The machine translation apparatus according to claim 1, wherein the document hierarchy determination unit stores, in the document storage unit, the order of appearance of the sentences of the first language document input from the input device, and the output unit outputs the sentences of the translation result in the same order as the order of appearance of the sentences constituting the first language document.

A machine translation method for performing machine translation using a machine translation apparatus comprising: a storage device storing a machine translation program and a translation dictionary unit that stores the vocabulary and rules necessary for translation; an input device for inputting an original text in a first language to be translated; an output device for outputting a translated text in a second language; and a processor that executes the machine translation program, the storage device having a document storage unit that stores the sentences of the first language document for each citation format level, the level being the number of quote symbols present at the beginning of a sentence of the first language document input from the input device, and a translation knowledge information storage unit that stores translation knowledge information used in translating the sentences of the first language document input from the input device, the method comprising: inputting the sentences of the first language document input from the input device to the processor; determining, by the processor, whether a quote symbol is present at the beginning of a sentence of the first language document input from the input device and, when a quote symbol is present, determining the number of quote symbols from the beginning of the sentence as the citation format level; storing the sentences of the first language document in the document storage unit for each determined citation format level; translating the sentences of the first language document stored in the document storage unit in descending order of citation format depth, using the vocabulary and conversion rules of the translation dictionary unit and the translation knowledge information stored in the translation knowledge information storage unit; extracting, from the vocabulary and conversion rules of the translation dictionary unit used in the translation, translation knowledge information used when selecting a translation word, and storing it in the translation knowledge information storage unit; and outputting the translation result to the output device.

A machine translation program used in a computer comprising: a storage device storing the machine translation program and a translation dictionary unit that stores the vocabulary and rules necessary for translation; an input device for inputting an original text in a first language to be translated; an output device for outputting a translated text in a second language; and a processor that executes the machine translation program, the storage device having a document storage unit that stores the sentences of the first language document for each citation format level, the level being the number of quote symbols present at the beginning of a sentence of the input first language document, and a translation knowledge information storage unit that stores translation knowledge information used in translating the sentences of the first language document input from the input device, the program causing the computer to execute: a procedure of inputting the sentences of the first language document input from the input device to the computer; a procedure of determining whether a quote symbol is present at the beginning of a sentence of the first language document input from the input device and, when a quote symbol is present, determining the number of quote symbols from the beginning of the sentence as the citation format level; a procedure of storing the sentences of the first language document in the document storage unit for each determined citation format level; a procedure of translating the sentences of the first language document stored in the document storage unit in descending order of citation format depth, using the vocabulary and conversion rules of the translation dictionary unit and the translation knowledge information stored in the translation knowledge information storage unit; a procedure of extracting, from the vocabulary and conversion rules of the translation dictionary unit used in the translation, translation knowledge information used when selecting a translation word, and storing it in the translation knowledge information storage unit; and a procedure of outputting the translation result to the output device.