JPH1115825A

JPH1115825A - Machine translation system and computer-readable recording medium recording machine translation processing program

Info

Publication number: JPH1115825A
Application number: JP9167082A
Authority: JP
Inventors: Hiroko Nozawa; 裕子野沢
Original assignee: Brother Industries Ltd
Current assignee: Brother Industries Ltd
Priority date: 1997-06-24
Filing date: 1997-06-24
Publication date: 1999-01-22

Abstract

PROBLEM TO BE SOLVED: To obtain a better translation result than when sentences delimited simply by character count are translated, by delimiting a source-language document at a position suitable as a sentence break. SOLUTION: When a source-language document being read for translation is broken because a sentence exceeds the per-sentence character limit set by the system, the portion up to the break is morphologically analyzed. A table 47 describing a plurality of positions to be treated as breaks is then referenced, and the morphological analysis result is searched for a position suitable as a break. If a new break position is found, the character string up to that position is delimited as one sentence and stored in an input buffer area 31. The character string after the new break is then prepended to the character string following the break made by a Japanese reading unit 45, and the processing of a break position changing unit 46 ends.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to a machine translation apparatus and to a computer-readable recording medium on which a machine translation processing program is recorded.

[0002]

2. Description of the Related Art

Conventionally, a machine translation apparatus translates a source language into another desired language, i.e., a target language. When the source-language document to be translated already exists as document data, such an apparatus provides a function for reading that data for translation. This reading function can read the document one sentence at a time. For example, when the source language is Japanese, the Japanese document is read by splitting it at each period (句点, "。"). Subsequent translation processing is then performed one sentence at a time, so the processing does not become complicated. This is more efficient than translating several sentences at once.

[0003] Furthermore, a very long sentence complicates translation processing and tends to degrade the translation result. For this reason, the number of characters in one source-language sentence is sometimes limited. For example, suppose a Japanese source sentence has 100 characters but the system limits one source-language sentence to 50 characters. The Japanese sentence is then split into two sentences when it is read. In general, then, a document is divided into units suitable for translation processing before translation is performed.

[0004]

PROBLEMS TO BE SOLVED BY THE INVENTION

However, dividing text by character count as described above can place a break at a position that is not actually suitable as one. For example, suppose the source language is Japanese and the document to be read is 「彼女の様態はあまり良くないと聞いていたが、私が昨日病院へ見舞いに行った時には、とても顔色が良く元気そうだったので、私は安心した。」 ("I had heard that her condition was not very good, but when I visited her in the hospital yesterday she looked very well and seemed in good spirits, so I was relieved."). This sentence consists of 64 characters. If the system limits one sentence to 50 characters, it is read as two sentences. The first ends at 「顔色が良く元気そ」, and the second is 「うだったので、私は安心した。」.
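The failure described above can be reproduced with a minimal Python sketch of a purely length-based reader. The function name `split_by_length` is hypothetical. The sketch also assumes that the system counts one Python `str` character per Japanese character.

```python
def split_by_length(text: str, limit: int = 50) -> list[str]:
    """Cut text into consecutive chunks of at most `limit` characters,
    ignoring word and morpheme boundaries."""
    return [text[i:i + limit] for i in range(0, len(text), limit)]


# The example sentence from paragraph [0004].
SENTENCE = ("彼女の様態はあまり良くないと聞いていたが、"
            "私が昨日病院へ見舞いに行った時には、"
            "とても顔色が良く元気そうだったので、私は安心した。")

if __name__ == "__main__":
    print(len(SENTENCE))  # 64
    for chunk in split_by_length(SENTENCE):
        print(len(chunk), chunk)
```

The first chunk has 50 characters and ends in the middle of 「元気そうだ」. The second holds the remaining 14 characters, 「うだったので、私は安心した。」.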

[0005] This break position is not appropriate in Japanese, and the problem carries over into the translation result. For translation, the Japanese sentence is morphologically analyzed, that is, decomposed into the parts of speech registered in the dictionary. When this is done, the end of the first sentence is split into the noun 「元気」 (genki, "in good spirits") and 「そ」 (so). 「そ」 is an unregistered word that does not appear in the dictionary. Morphological analysis of a sentence broken at an incorrect position therefore cannot yield a correct result, and translating that result cannot yield a correct translation.
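Greedy longest-match dictionary lookup can illustrate why the bad break produces an unregistered word. It is used here only as a simple stand-in for real morphological analysis; the patent does not say which analysis method unit 42 uses. The six-entry dictionary and the English part-of-speech labels are hypothetical.

```python
# Hypothetical six-entry dictionary: surface form -> part of speech.
DICTIONARY = {
    "とても": "adverb",
    "顔色": "noun",
    "が": "particle",
    "良く": "adjective-continuative",
    "元気": "adjectival-noun-stem",
    "そうだ": "auxiliary",
}


def longest_match(text: str, dictionary: dict[str, str]) -> list[tuple[str, str]]:
    """Greedy longest-match segmentation: at each position take the longest
    dictionary entry that starts there; if none matches, emit a single
    character tagged as unregistered."""
    max_len = max(len(word) for word in dictionary)
    result: list[tuple[str, str]] = []
    i = 0
    while i < len(text):
        for length in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + length]
            if piece in dictionary:
                result.append((piece, dictionary[piece]))
                i += length
                break
        else:
            # No entry starts here: treat one character as an unknown word.
            result.append((text[i], "unregistered"))
            i += 1
    return result


if __name__ == "__main__":
    print(longest_match("とても顔色が良く元気そ", DICTIONARY))  # ends with ('そ', 'unregistered')
    print(longest_match("元気そうだ", DICTIONARY))  # [('元気', 'adjectival-noun-stem'), ('そうだ', 'auxiliary')]
```

When the text is truncated after 「そ」, no entry covers that character. If the following characters are present, 「そうだ」 is found as one unit.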

[0006] The present invention has been made to solve the above problem. It applies when source-language document data is read and a long sentence is broken because of the character limit. In that case, the portion up to the break is morphologically analyzed, and the sentence is re-delimited at a position the analysis shows to be suitable as a break. The object of the invention is to provide a machine translation apparatus that works in this way, and a computer-readable recording medium on which such a machine translation processing program is recorded. Both obtain a better translation result than simply translating sentences delimited by character count.

[0007]

MEANS FOR SOLVING THE PROBLEMS

To achieve this object, claim 1 of the present invention is directed to a machine translation apparatus comprising the following:
- a translation dictionary storing the information needed to translate a given source-language sentence into a sentence in another language (the target language);
- first morphological analysis means for morphologically analyzing a given source-language sentence with reference to the information stored in the translation dictionary;
- a translation processing unit storing grammar rules for translating the sentence analyzed by the first morphological analysis means into the target language, with reference to the information stored in the translation dictionary;
- reading means for reading source-language document data prepared in advance as source text, dividing it into units suitable for translation as it reads.

At least one of the conditions under which the reading means divides the text is a character-count limit.

[0008] In particular, the apparatus further comprises the following:
- second morphological analysis means that, when the document is broken by the character-count limit while the reading means reads it, morphologically analyzes the document up to the break;
- break position storage means describing the positions to be regarded as breaks, used to determine a suitable break position from the result of the second morphological analysis means;
- break position changing means for changing the break position based on that analysis result and the information in the break position storage means.

The document read by the reading means is re-delimited at the break position changed by the break position changing means.

[0009] Accordingly, in the machine translation apparatus of the invention, the document may be broken by the character-count limit while the reading means reads it. When this happens, the second morphological analysis means analyzes the document up to the break. The break position changing means then changes the break position based on that analysis result and the information in the break position storage means. In this way, the document read by the reading means is re-delimited at the break position changed by the break position changing means.

[0010] The machine translation apparatus according to claim 2 is characterized as follows. The character string following the position newly made a break by the break position changing means is added to the head of the character string following the position broken by the reading means. In other words, the characters after the break chosen by morphological analysis are prepended to the characters after the break chosen by character count. The Japanese text after the break therefore also takes an appropriate form, and a better translation result can be obtained.

[0011] The machine translation apparatus according to claim 3 is characterized by comprising break position setting means, with which the user can set the information in the break position storage means. The user can therefore personally decide which positions in the morphological analysis result become breaks, and obtain reading results that match the user's preferences.

[0012] The machine translation apparatus according to claim 4 is characterized in that the result of the second morphological analysis means is treated as the first morphological analysis result of the first morphological analysis means, and the translation processing unit processes it as such. The analysis result obtained in the break position changing means is kept and reused as the analysis result at translation time. The same sentence therefore need not be morphologically analyzed twice, which makes the translation work efficient.

[0013] The machine translation apparatus according to claim 5 is characterized in that priorities can be set for the entries described in the break position storage means. With the break positions ranked by priority, the text can be broken at the more suitable position when several candidate positions are found.

[0014] Claim 6 is directed to a computer-readable recording medium on which a machine translation processing program is recorded. The program comprises the following:
- a reading program that reads source-language document data as source text, dividing it into units suitable for translation as it reads, with a character-count limit as at least one of the dividing conditions;
- a first morphological analysis program that morphologically analyzes a source-language sentence with reference to information stored in a translation dictionary;
- a translation program that translates the sentence analyzed by the first morphological analysis program into another language (the target language), with reference to the information stored in the translation dictionary.

[0015] In particular, the program further comprises the following:
- a second morphological analysis program that, when the source-language document data is broken by the character-count limit, morphologically analyzes the data up to the break;
- a break position storage program describing the positions to be regarded as breaks, used to determine a suitable break position from the result of the second morphological analysis program;
- a break position changing program that changes the break position based on that analysis result and the information in the break position storage program.

The source text broken by the character-count limit is re-delimited at the break position changed by the break position changing program.

[0016] Accordingly, suppose the program is executed from the recording medium and the document is broken by the character-count limit while the reading program reads it. The second morphological analysis program then analyzes the document up to the break. Next, the break position changing program changes the break position based on that result and the information in the break position storage program. In this way, the document read by the reading program is re-delimited at the break position changed by the break position changing program.

[0017] The computer-readable recording medium according to claim 7 is characterized as follows. The break position changing program adds the character string following the new break to the head of the character string following the position broken by the reading program. When the program is executed from the recording medium, the characters after the break chosen from the second morphological analysis program's result are thus prepended to the characters after the break chosen by character count. The Japanese text after the break therefore also takes an appropriate form, and a better translation result can be obtained.

[0018] The computer-readable recording medium according to claim 8 is characterized in that the program comprises a break position setting program, with which the user can set the information in the break position storage program. When the program is executed from the recording medium, the user can personally decide which positions in the second morphological analysis program's result become breaks. The user can therefore obtain reading results that match the user's preferences.

[0019] The computer-readable recording medium according to claim 9 is characterized in that, when translating, the translation program treats the result of the second morphological analysis program as the result of the first morphological analysis program. When the program is executed from the recording medium, the analysis produced by the second morphological analysis program within the break position changing program is kept and reused at translation time. The same sentence therefore need not be analyzed twice, and translation work becomes efficient.

[0020] Further, the computer-readable recording medium according to claim 10 is characterized in that a priority for breaking can be set for the information described by the break position storage program. When the program is executed from the recording medium, the break positions are ranked by priority. The text can therefore be broken at the more suitable position when several candidate positions are found.

[0021]

EMBODIMENTS OF THE INVENTION

An embodiment of the present invention is described below with reference to the drawings.

[0022] The example described here is a Japanese-English machine translation apparatus that translates Japanese sentences (the source language) into English sentences (the target language).

[0023] First, the overall configuration of the Japanese-English machine translation apparatus of this embodiment is described with reference to FIG. 1.

[0024] As shown in FIG. 1, the Japanese-English machine translation apparatus of this embodiment comprises an input device 10, an output device 50, a CPU 20, a RAM 30, a ROM 40, an external storage device 60, and the like.

[0025] The CPU 20 is a central processing unit that controls the entire apparatus. It is connected to the input device 10, the output device 50, the RAM 30, the ROM 40, the external storage device 60, and so on.

[0026] The input device 10 comprises a keyboard or the like and is used to enter the Japanese sentences to be translated and instructions.

[0027] The output device 50 comprises a CRT or the like and displays the entered Japanese sentences and the translation results.

[0028] The RAM 30 comprises an input buffer area 31 for storing entered Japanese sentences, an output buffer area 32 for storing the English sentences produced by translation, and a work area 33.

[0029] The ROM 40 stores programs, dictionaries, and the like. It comprises the following:
- a control program 41 for performing various processes;
- a morphological analysis unit 42, serving as both the first and the second morphological analysis means, that morphologically analyzes a given Japanese sentence;
- a dictionary unit 43, serving as the translation dictionary, that stores the dictionary the system provides in advance for translation;
- a grammar unit 44, serving as the translation processing unit, that stores the grammar rules for carrying out translation with reference to dictionary information;
- a table 47 serving as the break position storage means.

[0030] The control program 41 includes a Japanese reading unit 45 serving as the reading means, and a break position changing unit 46 serving as the break position changing means.

[0031] The external storage device 60 stores, for example, technical term dictionaries that the user prepares as desired. It also stores user dictionaries that the user creates so that his or her own entries are reflected in the translation results.

[0032] FIG. 2 is a flowchart outlining the processing performed when the Japanese-English translation apparatus of this embodiment reads the Japanese document data to be translated. In this embodiment, the Japanese reading unit 45 divides the document at each period (句点) as it reads. When a sentence consists of 50 or more characters, the unit cuts it at 50 characters.

[0033] The flow of processing in this embodiment when a Japanese sentence of 50 or more characters is read is described below with reference to the flowchart of FIG. 2.

[0034] When an instruction to read document data is issued (step 201; hereinafter "step" is abbreviated as S), a counter that counts characters is set to its initial value of 0 (S202). Each time the Japanese reading unit 45 then reads one character (S203), the counter is incremented by 1 (S204).

[0035] Suppose the character read is neither the 50th character (S205: NO) nor a period (S206: NO). If no characters remain after it (S207: NO), the reading process ends. If characters remain (S207: YES), the next character is read (S203). If that character is a period (S206: YES), the text up to and including the period is delimited as one sentence and stored in the input buffer area 31 (S208).

[0036] Then, if no characters remain (S209: NO), the reading process ends. If characters remain (S209: YES), the counter is reset to its initial value of 0 (S202), and reading resumes one character at a time while counting (S203, S204).

[0037] When the character read is the 50th character (S205: YES), the text up to that character is delimited as one sentence (S210). The break position changing unit 46 then checks the break position and re-delimits the text (S211).
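The reading loop of FIG. 2 (S202 to S211) might be sketched in Python as follows. This is an illustration under assumptions, not the embodiment's code. The names `read_document` and `keep_break` are hypothetical, as is the `redelimit` hook standing in for S211. The handling of a trailing fragment without a period is also an assumption. A Python list plays the role of input buffer area 31.

```python
from typing import Callable

# Signature of the S211 step: takes (sentence cut at the limit, remaining text)
# and returns the possibly adjusted pair.
Redelimit = Callable[[str, str], tuple[str, str]]


def keep_break(head: str, rest: str) -> tuple[str, str]:
    """Default S211 behavior for this sketch: leave the character-count break as is."""
    return head, rest


def read_document(text: str, limit: int = 50,
                  redelimit: Redelimit = keep_break) -> list[str]:
    """Split text at each period (。) and force a break when a sentence
    reaches `limit` characters, checking the limit before the period as in FIG. 2."""
    input_buffer: list[str] = []          # plays the role of input buffer area 31
    rest = text
    while rest:                           # S209: do characters remain?
        count = 0                         # S202: reset the counter
        for pos, ch in enumerate(rest):   # S203: read one character
            count += 1                    # S204
            if count == limit:            # S205: YES; cut here (S210), then re-check (S211)
                head, rest = redelimit(rest[:pos + 1], rest[pos + 1:])
                input_buffer.append(head)
                break
            if ch == "。":                # S206: YES
                input_buffer.append(rest[:pos + 1])  # S208
                rest = rest[pos + 1:]
                break
        else:
            # S207: NO. The flowchart just ends reading here; keeping the
            # trailing fragment as a last sentence is an assumption of this sketch.
            input_buffer.append(rest)
            rest = ""
    return input_buffer
```

With the default hook, the 64-character example sentence comes back as a 50-character piece and a 14-character piece. Passing `redelimit=lambda h, r: (h[:47], h[47:] + r)` instead moves the first break back to just after 「顔色が良く」. In the embodiment, that decision comes from the morphological analysis of S31 to S35.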

[0038] Suppose the Japanese sentence to be read is the one shown in FIG. 5(a): 「彼女の様態はあまり良くないと聞いていたが、私が昨日病院へ見舞いに行った時には、とても顔色が良く元気そうだったので、私は安心した。」 ("I had heard that her condition was not very good, but when I visited her in the hospital yesterday she looked very well and seemed in good spirits, so I was relieved."). The above processing delimits one sentence ending at 「元気そ」 in S210. FIG. 5(b) shows this state schematically.

[0039] FIG. 3 is a flowchart showing the flow of processing in the break position changing unit 46. That processing is described below with reference to FIG. 3.

[0040] The break position changing unit 46 first has the morphological analysis unit 42 analyze the delimited 50-character Japanese sentence (S31). FIG. 5(c) shows the analysis result schematically. 「元気そうだ」 should be analyzed as the stem of the adjectival verb 「元気だ」 plus the terminal form of the auxiliary verb 「そうだ」. Because the sentence ends in 「そ」, however, a correct analysis cannot be obtained, and 「そ」 is analyzed as an unregistered word not found in the dictionary.

[0041] Next, the morphological analysis result is searched for the break positions described in the table 47 (S32).

[0042] FIG. 4 schematically shows the break position information stored in the table 47.

[0043] As shown in FIG. 4, the table 47 describes one or more positions to be treated as breaks. The morphological analysis result of FIG. 5(c) is searched for the positions stored in the table 47. If there is a position that should become a new break (S33: YES), the text up to that position is delimited as one sentence and stored in the input buffer area 31 (S34).

[0044] As shown in FIG. 4, the table 47 stores information indicating that the continuative form (連用形) of an adjective is to be treated as a break. In FIG. 5(c), 「良く」 is the continuative form of the adjective 「良い」 (yoi, "good"). This position therefore becomes the new break, and the sentence is divided there.
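The search of S32 and the test of S33 can be sketched by walking the analysis result. The sketch records the character offset just after each morpheme whose part of speech is listed in table 47. Several things are assumptions: the encoding of the table as a set of tags, the tag names, and the hand-written analysis below. That analysis stands in for FIG. 5(c), which is not reproduced in this text. Only the adjective continuative form is named in the description.

```python
# Hypothetical encoding of table 47: tags after which a break may be placed.
TABLE_47 = {"adjective-continuative"}

# Hand-written stand-in for the FIG. 5(c) analysis of the first 50 characters.
ANALYSIS = [
    ("彼女", "noun"), ("の", "particle"), ("様態", "noun"), ("は", "particle"),
    ("あまり", "adverb"), ("良く", "adjective-continuative"), ("ない", "auxiliary"),
    ("と", "particle"), ("聞い", "verb-continuative"), ("て", "particle"),
    ("い", "verb"), ("た", "auxiliary"), ("が", "particle"), ("、", "comma"),
    ("私", "pronoun"), ("が", "particle"), ("昨日", "noun"), ("病院", "noun"),
    ("へ", "particle"), ("見舞い", "noun"), ("に", "particle"),
    ("行っ", "verb-continuative"), ("た", "auxiliary"), ("時", "noun"),
    ("に", "particle"), ("は", "particle"), ("、", "comma"), ("とても", "adverb"),
    ("顔色", "noun"), ("が", "particle"), ("良く", "adjective-continuative"),
    ("元気", "adjectival-noun-stem"), ("そ", "unregistered"),
]


def find_break_candidates(morphemes: list[tuple[str, str]],
                          table: set[str]) -> list[tuple[int, str]]:
    """S32: return (offset, tag) for every morpheme whose tag is in the table,
    where offset is the character index just after that morpheme."""
    total = sum(len(surface) for surface, _ in morphemes)
    candidates = []
    offset = 0
    for surface, pos in morphemes:
        offset += len(surface)
        # A break exactly at the end would not change anything, so skip it.
        if pos in table and offset < total:
            candidates.append((offset, pos))
    return candidates


if __name__ == "__main__":
    print(find_break_candidates(ANALYSIS, TABLE_47))
    # [(11, 'adjective-continuative'), (47, 'adjective-continuative')]
```

「良く」 occurs twice in the example, so two candidates are found: one after 「あまり良く」 and one after 「顔色が良く」. Paragraph [0045] describes how to choose between them. An empty list corresponds to S33: NO.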

[0045] To determine the break position, the morphological analysis result is searched for places that match each break position stored in the table 47. If several places are found, the text may be broken at the rearmost one. Alternatively, priorities may be assigned to the break positions stored in the table 47; when several places are found, the text is then broken at the one with the higher priority.
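The two selection rules above can be sketched as one function: take the rearmost candidate, or take the candidate with the highest priority. Several details are assumptions that the description does not specify. In the priority map, a smaller number means a higher priority. Ties go to the rearmost candidate. The 「、」 (comma) entries in the usage example are also assumed.

```python
from typing import Optional


def choose_break(candidates: list[tuple[int, str]],
                 priority: Optional[dict[str, int]] = None) -> Optional[int]:
    """Pick one break offset from (offset, tag) candidates.

    Without a priority map, return the rearmost candidate. With one, return the
    candidate whose tag has the highest priority (smallest number); unlisted tags
    rank last, and ties go to the rearmost candidate. Returns None if empty.
    """
    if not candidates:
        return None
    if priority is None:
        return max(offset for offset, _ in candidates)
    best = min(candidates,
               key=lambda c: (priority.get(c[1], float("inf")), -c[0]))
    return best[0]


if __name__ == "__main__":
    cands = [(11, "adjective-continuative"), (21, "comma"),
             (39, "comma"), (47, "adjective-continuative")]
    print(choose_break(cands))                                             # 47
    print(choose_break(cands, {"comma": 1, "adjective-continuative": 2}))  # 39
```

With the rearmost rule, the example sentence is broken after 「顔色が良く」, which matches the result described in paragraph [0044].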

[0046] The character string after the new break is then prepended to the character string after the break made by the Japanese reading unit 45 (S35). This ends the processing in the break position changing unit 46.

[0047] If no break position described in the table 47 is found in the morphological analysis result (S33: NO), the processing of the break position changing unit 46 ends without setting a new break.
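Combining S31 to S35, the processing of break position changing unit 46 for one length-limited fragment might look like the sketch below. The analyzer is passed in as a function because the patent does not specify one. The rearmost-match rule comes from the first alternative in paragraph [0045]. The function and parameter names are hypothetical.

```python
from typing import Callable

Morphemes = list[tuple[str, str]]   # (surface form, part-of-speech tag)


def change_break_position(head: str, rest: str,
                          analyze: Callable[[str], Morphemes],
                          table: set[str]) -> tuple[str, str]:
    """Re-delimit one fragment cut by the character limit (FIG. 3).

    head: text up to the character-count break (S210).
    rest: text after that break.
    Returns (sentence to store in the input buffer, new remaining text).
    Assumes the surfaces returned by `analyze` concatenate back to `head`.
    """
    morphemes = analyze(head)                 # S31
    offset, best = 0, None
    for surface, pos in morphemes:            # S32: search for table entries
        offset += len(surface)
        if pos in table and offset < len(head):
            best = offset                     # later matches overwrite: rearmost wins
    if best is None:                          # S33: NO, keep the original break
        return head, rest
    # S34: the text up to the new break becomes one sentence.
    # S35: the text after it is prepended to the remaining text.
    return head[:best], head[best:] + rest
```

Binding `analyze` and `table` with `functools.partial` yields a two-argument callable of the (head, rest) form. The reading loop can call that at S211. With the tail of the example, 「とても顔色が良く元気そ」 followed by 「うだったので、私は安心した。」, the function returns 「とても顔色が良く」 and 「元気そうだったので、私は安心した。」.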

[0048] When the above processing is finished (S211), the apparatus checks whether further characters remain. If they do (S209: YES), the counter is reset to 0 (S202) and characters are read one at a time while being counted (S203, S204). The processing shown in the flowchart of FIG. 2 then continues.

[0049] FIG. 5(d) schematically shows the state after reading is complete. The text has been divided at a position different from the one imposed by the character-count limit.

[0050] When reading is complete, translation processing is executed. FIG. 6 is a flowchart showing the flow of translation processing. When translation is instructed, the following steps are performed for each sentence, as shown in FIG. 6: morphological analysis (S61), Japanese analysis (S62), Japanese-to-English conversion (S63), and English generation (S64). These steps produce an English sentence as the translation result. This processing is a known technique, so a detailed description is omitted.

[0051] The morphological analysis result obtained in S31 of FIG. 3 could also be used as the analysis result of S61. In S34 of FIG. 3, the break position is determined from the analysis result and the sentence is stored in the input buffer area 31. If the analysis result is stored along with the sentence at that point, it can be used as the analysis result at translation time. The same sentence then need not be analyzed twice.
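The reuse suggested above can be sketched by storing the analysis next to each buffered sentence. One detail is left implicit in the description: the S31 analysis covers the whole 50-character fragment, so only the morphemes up to the new break belong to the stored sentence. The `BufferedSentence` class and both helper functions are hypothetical names used for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Optional

Morphemes = list[tuple[str, str]]   # (surface form, part-of-speech tag)


@dataclass
class BufferedSentence:
    """One input-buffer entry: the sentence text plus an optional analysis
    saved while re-delimiting (S34)."""
    text: str
    morphemes: Optional[Morphemes] = None


def truncate_analysis(morphemes: Morphemes, offset: int) -> Morphemes:
    """Keep the morphemes lying entirely within the first `offset` characters.
    When the break falls on a morpheme boundary, this is exactly the analysis
    of the shortened sentence."""
    kept: Morphemes = []
    length = 0
    for surface, pos in morphemes:
        if length + len(surface) > offset:
            break
        kept.append((surface, pos))
        length += len(surface)
    return kept


def analysis_for_translation(entry: BufferedSentence,
                             analyze: Callable[[str], Morphemes]) -> Morphemes:
    """S61: reuse a stored analysis when one exists; otherwise analyze now."""
    if entry.morphemes is not None:
        return entry.morphemes
    return analyze(entry.text)
```

The text after the new break (for example 「元気そうだったので、私は安心した。」) has no complete stored analysis, because its end was never analyzed. It is therefore stored without one and analyzed normally at S61.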

[0052] The present invention is not limited to the embodiment described above, and various changes can be made without departing from its gist.

[0053] For example, this embodiment breaks text at 50 characters, but another count, such as 60 or 70 characters, may be used.

[0054]

EFFECTS OF THE INVENTION

As is clear from the above description, the machine translation apparatus of claim 1 handles the case where a sentence is broken by character count while Japanese text is read. It performs morphological analysis and re-delimits the sentence at a position suitable as a sentence break. The subsequent translation processing can therefore use a correct morphological analysis result and obtain a better translation.

[0055] According to the machine translation apparatus of claim 2, the character string following the break determined from the morphological analysis result is added to the head of the character string following the break determined by the character count. The Japanese text after the break therefore also takes an appropriate form, and a better translation result can be obtained.
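Carrying the leftover text forward to the head of the remaining text can be sketched as a loop. The names `segment_document`, `analyze`, and `find_break` are hypothetical; `analyze` stands in for the morphological analysis and `find_break` for the table lookup, and terminator handling is omitted to keep the sketch short.

```python
from typing import Callable, List, Optional, Tuple

Token = Tuple[str, str]  # hypothetical (surface form, part of speech)


def segment_document(document: str, limit: int,
                     analyze: Callable[[str], List[Token]],
                     find_break: Callable[[List[Token]], Optional[int]]) -> List[str]:
    """Hypothetical sketch: cut every `limit` characters, then re-cut
    each truncated chunk at a suitable break.

    `find_break` returns a character offset inside the chunk, or None.
    Text after the new break is prepended to the remaining document.
    """
    if limit < 1:
        raise ValueError("limit must be positive")
    sentences: List[str] = []
    rest = document
    while rest:
        chunk, rest = rest[:limit], rest[limit:]
        if rest:  # the chunk was cut by the limit, not by end of input
            cut = find_break(analyze(chunk))
            if cut is not None and 0 < cut < len(chunk):
                rest = chunk[cut:] + rest  # carry the tail to the next unit
                chunk = chunk[:cut]
        sentences.append(chunk)  # each pass consumes at least one character
    return sentences
```

Because `cut` is always greater than zero, every pass shortens the remaining text, so the loop terminates even when breaks keep being found.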

[0056] According to the machine translation apparatus of claim 3, the user can personally set the break positions applied to the morphological analysis result, so a reading result that better matches the user's wishes can be obtained.
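A user-editable break table might look like the following sketch. `BreakTable` and its methods are hypothetical; it assumes the table maps a part-of-speech label to an integer priority, with 1 as the most preferred break.

```python
from typing import Dict, Optional


class BreakTable:
    """Hypothetical user-editable stand-in for table 47.

    Maps part of speech -> priority (1 = most preferred break).
    """

    def __init__(self, initial: Optional[Dict[str, int]] = None) -> None:
        self._rules: Dict[str, int] = {}
        for pos, priority in (initial or {}).items():
            self.set_rule(pos, priority)  # validate initial entries too

    def set_rule(self, pos: str, priority: int) -> None:
        """Add or update a rule; lower numbers are preferred."""
        if priority < 1:
            raise ValueError("priority must be >= 1")
        self._rules[pos] = priority

    def remove_rule(self, pos: str) -> None:
        self._rules.pop(pos, None)  # removing an absent rule is a no-op

    def rules(self) -> Dict[str, int]:
        return dict(self._rules)  # a copy, so edits go through set_rule
```

Routing every change through `set_rule` keeps invalid priorities out of the table, whichever screen or file the user edits it from.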

[0057] According to the machine translation apparatus of claim 4, the morphological analysis result produced by the break position changing means is stored and reused as the morphological analysis result during translation. This avoids analyzing the same sentence twice and makes the translation work more efficient.

[0058] According to the machine translation apparatus of claim 5, a priority order is assigned to the break positions described in the break position storage means, so that when a plurality of candidate positions are found, the text can be divided at the position better suited as a break.
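Choosing among several matching candidates by priority can be sketched as follows. `best_break` and `PRIORITY` are hypothetical names; the sketch assumes the table maps a part-of-speech label to an integer priority (1 = most preferred) and breaks ties by taking the later position. The priorities shown are illustrative, not the contents of FIG. 4.

```python
from typing import Dict, List, Optional, Tuple

Token = Tuple[str, str]  # hypothetical (surface form, part of speech)

# Hypothetical priorities (1 = most preferred); not the patent's actual table.
PRIORITY: Dict[str, int] = {"conj-particle": 1, "comma": 2}


def best_break(tokens: List[Token],
               priority: Dict[str, int] = PRIORITY) -> Optional[int]:
    """Return the offset after the highest-priority break token, or None.

    Ties in priority go to the later (rightmost) position.
    """
    best_key: Optional[Tuple[int, int]] = None
    best_offset: Optional[int] = None
    offset = 0
    for surface, pos in tokens:
        offset += len(surface)
        if pos in priority:
            key = (priority[pos], -offset)  # smaller key = better break
            if best_key is None or key < best_key:
                best_key, best_offset = key, offset
    return best_offset
```

For `雨が降ったので、傘`, both `ので` (offset 7) and `、` (offset 8) match, and the conjunctive particle wins because its priority is 1, so the result is 7.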

[0059] According to the computer-readable recording medium recording the machine translation processing program of claim 6, each program can be stored and provided on a recording medium suited to the machine translation apparatus, chosen from various media such as floppy disks and CD-ROMs. By executing the programs from this recording medium, when a sentence has been divided by the character count while a Japanese document is being read, morphological analysis is performed and the sentence is re-divided at a position suitable as a sentence break. The subsequent translation processing can therefore use a correct morphological analysis result, and a better translation result can be obtained.

[0060] According to the computer-readable recording medium recording the machine translation processing program of claim 7, each program can likewise be stored and provided on a recording medium suited to the machine translation apparatus, chosen from various media such as floppy disks and CD-ROMs. By executing the programs from this recording medium, the character string following the break determined from the morphological analysis result is added to the head of the character string following the break determined by the character count. The Japanese text after the break therefore also takes an appropriate form, and a better translation result can be obtained.

[0061] According to the computer-readable recording medium recording the machine translation processing program of claim 8, each program can likewise be stored and provided on a recording medium suited to the machine translation apparatus, chosen from various media such as floppy disks and CD-ROMs. By executing the programs from this recording medium, the user can personally set the break positions applied to the morphological analysis result, so a reading result that better matches the user's wishes can be obtained.

[0062] According to the computer-readable recording medium recording the machine translation processing program of claim 9, each program can likewise be stored and provided on a recording medium suited to the machine translation apparatus, chosen from various media such as floppy disks and CD-ROMs. By executing the programs from this recording medium, the morphological analysis result produced by the break position changing means is stored and reused as the morphological analysis result during translation. This avoids analyzing the same sentence twice and makes the translation work more efficient.

[0063] According to the computer-readable recording medium recording the machine translation processing program of claim 10, each program can likewise be stored and provided on a recording medium suited to the machine translation apparatus, chosen from various media such as floppy disks and CD-ROMs. By executing the programs from this recording medium, a priority order is assigned to the break positions described in the break position storage program, so that when a plurality of candidate positions are found, the text can be divided at a position even better suited as a break.

[Brief description of the drawings]

FIG. 1 is a block diagram showing the configuration of the machine translation apparatus according to the embodiment.

FIG. 2 is a flowchart showing the flow of processing for reading a Japanese document in the embodiment.

FIG. 3 is a flowchart showing the flow of processing in the break position changing unit of the embodiment.

FIG. 4 is a diagram schematically showing the information in the table that indicates break positions in the embodiment.

FIG. 5 is a diagram schematically showing the state of the Japanese text at each stage of the embodiment.

FIG. 6 is a flowchart showing the flow of translation processing in the embodiment.

[Explanation of symbols]

Reference Signs List
30 RAM
31 Input buffer area
32 Output buffer area
40 ROM
42 Morphological analysis unit
43 Dictionary unit
44 Grammar unit
45 Japanese reading unit
46 Break position changing unit
47 Table

Claims

[Claims]

1. A machine translation apparatus comprising: a translation dictionary storing information necessary for translating a sentence given in a source language into a sentence in another language, the target language; first morphological analysis means for morphologically analyzing a given source-language sentence while referring to the information stored in the translation dictionary; translation processing means storing grammar rules for translating the source-language sentence analyzed by the first morphological analysis means into the target language while referring to the information stored in the translation dictionary; and reading means for reading source-language document data prepared in advance as the original text while dividing the data into units suitable for translation, at least one of the dividing conditions being a limit on the number of characters; the machine translation apparatus being characterized by further comprising: second morphological analysis means for performing morphological analysis up to the division point when the reading means, while reading the document data, has divided the document at the character-count limit; break position storage means describing the locations to be regarded as breaks, used to determine, from the morphological analysis result of the second morphological analysis means, a position suitable as a break; and break position changing means for changing the break position on the basis of the morphological analysis result of the second morphological analysis means and the information in the break position storage means; wherein the document divided at the character-count limit is re-divided at the break position changed by the break position changing means.

2. The machine translation apparatus according to claim 1, wherein the character string following the location newly divided by the break position changing means is added to the head of the character string following the location divided by the reading means.

3. The machine translation apparatus according to claim 1, further comprising break position setting means that allows a user to set the information in the break position storage means.

4. The machine translation apparatus according to claim 1, wherein the result of the morphological analysis by the second morphological analysis means is treated as the morphological analysis result of the first morphological analysis means, and the translation processing means performs its processing on that result.

5. The machine translation apparatus according to claim 1, wherein the information described in the break position storage means can be set so as to give the breaks a priority order.

6. A computer-readable recording medium recording a machine translation processing program that comprises: a reading program for reading source-language document data, given as the original text, while dividing the data into units suitable for translation, at least one of the dividing conditions being a limit on the number of characters; a first morphological analysis program for morphologically analyzing a source-language sentence while referring to information stored in a translation dictionary; and a translation program for translating the source-language sentence analyzed by the first morphological analysis program into another language, the target language, while referring to the information stored in the translation dictionary; the machine translation processing program being characterized by further comprising: a second morphological analysis program for performing morphological analysis up to the division point when the source-language document data has been divided at the character-count limit; a break position storage program describing the locations to be regarded as breaks, used to determine, from the morphological analysis result of the second morphological analysis program, a position suitable as a break; and a break position changing program for changing the break position on the basis of the morphological analysis result of the second morphological analysis program and the information in the break position storage program; wherein the original text divided at the character-count limit is re-divided at the break position changed by the break position changing program.

7. The computer-readable recording medium recording the machine translation processing program according to claim 6, wherein the break position changing program adds the character string following the newly divided location to the head of the character string following the location divided by the reading program.

8. The computer-readable recording medium recording the machine translation processing program according to claim 6, further comprising a break position setting program that allows a user to set the information in the break position storage program.

9. The computer-readable recording medium recording the machine translation processing program according to claim 6, wherein the result of the morphological analysis by the second morphological analysis program is treated as the result of the morphological analysis by the first morphological analysis program, and the translation program performs translation processing on that result.

10. The computer-readable recording medium recording the machine translation processing program according to claim 6, wherein the break position storage program allows a priority order to be set for the break information it describes.