JPH0673134B2

JPH0673134B2 - Machine translation system

Info

Publication number: JPH0673134B2
Application number: JP60221090A
Authority: JP
Inventors: 敏行舟部
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 1985-10-03
Filing date: 1985-10-03
Publication date: 1994-09-14
Anticipated expiration: 2009-09-14
Also published as: JPS6279569A

Description

Technical field

The present invention relates to a machine translation system that translates a source-language sentence into a sentence of another language. In particular, it relates to a machine translation system provided with means for extracting source-language (hereinafter, for example, "Japanese") sentences that are ambiguous or that are not well-formed Japanese at all (hereinafter collectively referred to as "non-sentences").

Prior art

Conventionally, in a machine translation system, an input Japanese sentence is first subjected to morphological analysis and then to syntactic analysis and the like, and the parsed sentence is translated into another language (for example, English or French). However, not every input sentence is correct Japanese, so some systems check whether the sentence is a non-sentence during the syntactic analysis described above. Other systems perform no non-sentence check and simply parse and translate the input Japanese sentence as it is.

Checking for non-sentences during syntactic analysis has the problem that processing is wasted up to the point at which the non-sentence is found. In addition, parsing with a grammar designed to detect non-sentences makes the syntactic analysis more complicated and lowers the processing speed.

On the other hand, if syntactic analysis is performed without checking for non-sentences and the translation result is output, that result may be completely wrong.

Thus, building a translation system around a grammar that checks for non-sentences during syntactic analysis raises the cost of the system, and the system may spend a long time parsing only to produce no result in the end.

Object

The present invention has been made in view of the above circumstances. Its object is to provide a machine translation system that includes non-sentence determination and extraction means capable of efficiently extracting the "non-sentences" described above, and that can thereby increase the speed of syntactic analysis and reduce the size of the grammar dictionary used for syntactic analysis.

Constitution

The above object is achieved by a machine translation system for translating a source-language sentence into a sentence of another language, characterized in that a plurality of conditions for detecting sentences that are inappropriate as input to the machine translation system (non-sentences) are defined in advance, and in that the source-language sentence input receiving unit is provided with a plurality of statistical means that take statistics on the number of occurrences in the input source-language sentence corresponding to those conditions, and with non-sentence determination and extraction means that compares the results calculated by the statistical means with the conditions, determines whether the sentence is a non-sentence, and extracts non-sentences.

Embodiment

An embodiment of the present invention is described in detail below with reference to the drawings. FIG. 2 is a schematic configuration diagram of a machine translation system according to an embodiment of the present invention.

In FIG. 2, reference numeral 1 denotes an input device for entering natural-language sentences, for example a character input device such as a keyboard or an OCR unit. Reference numeral 2 denotes a morphological analysis device that uses a Japanese dictionary and a connection table to divide the input sentence into words and attach part-of-speech and other information. Reference numeral 3 denotes a non-sentence extraction device, the characteristic component of this embodiment. It uses morphological statistics (for example, a large number of characters or a large number of phrases (bunsetsu)), parameter list 10, and parameter list 11 to determine whether the input has the grammatical or usage characteristics of a non-sentence. If the input is a non-sentence, the device reports this to the user; otherwise it passes the input to translation device 4 for translation. Reference numeral 4 denotes a translation device that uses a translation grammar and a translation dictionary to perform syntactic analysis, transfer, and generation on the input sentence, translating it from Japanese into another language (for example, English or French).

Reference numeral 5 denotes an output device, for example a CRT display or a printer, that outputs either the non-sentence extracted by non-sentence extraction device 3 or the foreign-language text produced by translation device 4.

FIG. 1 is a configuration diagram of non-sentence extraction device 3 in the embodiment shown in FIG. 2.

In FIG. 1, reference numeral 6 denotes a non-sentence determination device that determines non-sentences using the statistical results and thresholds in determination table 12. Processing proceeds in the order statistical device 7 → statistical device 8 → statistical device 9. When the result of any one statistical device leads to a non-sentence determination, processing is interrupted and the user is notified, together with the reason for the determination. Reference numeral 7 denotes a statistical device that counts the number of characters and the number of phrases in the input sentence and enters them in determination table 12. Reference numeral 8 denotes a statistical device that uses parameter list 10 to take statistics on each parameter in the sentence, and reference numeral 9 denotes a statistical device that does the same using parameter list 11. Reference numeral 10 denotes a parameter list containing the words, attributes, and attribute values that serve as keys for identifying non-sentences. Reference numeral 11 denotes a parameter list containing the relationships between words that serve as such keys. Reference numeral 12 denotes a determination table with the columns process name, parameter name, statistical result, and threshold; the statistical result column holds the count obtained for each parameter in each process.

FIG. 3 shows a configuration example of parameter list 10 of FIG. 2. Each entry in parameter list 10 is a triple consisting of a word's attribute name, attribute value, and notation (surface form). For example, the entry for the hiragana "の" (no) consists of the attribute name "part of speech", the attribute value "case particle", and the notation "の".
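The triple structure of parameter list 10 can be illustrated with a minimal Python sketch. The class names `Token` and `Parameter`, the function `count_matches`, and the hand-made token list are assumptions introduced for illustration only; the patent describes a hardware device and does not specify any data format.

```python
from dataclasses import dataclass, field


@dataclass
class Token:
    surface: str                                    # notation as it appears in the sentence
    attributes: dict = field(default_factory=dict)  # attribute name -> attribute value


@dataclass
class Parameter:
    attribute_name: str   # e.g. "part of speech"
    attribute_value: str  # e.g. "case particle"
    notation: str         # e.g. "の"


def count_matches(tokens, param):
    """Count tokens whose notation and named attribute both match the entry."""
    return sum(
        1
        for t in tokens
        if t.surface == param.notation
        and t.attributes.get(param.attribute_name) == param.attribute_value
    )


POS = "part of speech"
NO = Parameter(POS, "case particle", "の")

# Hand-made stand-in for the output of a morphological analyzer (hypothetical).
tokens = [
    Token("彼", {POS: "noun"}), Token("の", {POS: "case particle"}),
    Token("机", {POS: "noun"}), Token("の", {POS: "case particle"}),
    Token("本棚", {POS: "noun"}), Token("の", {POS: "case particle"}),
    Token("本", {POS: "noun"}),
]

print(count_matches(tokens, NO))  # 3
```

Matching on both the attribute and the notation, rather than on the surface string alone, keeps a "の" with a different part of speech from being counted.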

FIG. 4 shows a configuration example of parameter list 11 of FIG. 2. Parameter list 11 is used to express the before/after relationship between words or phrases. Each entry is divided into a front part and a back part: the front part consists of an attribute name, an attribute value, and a notation, and the back part consists of an attribute name, an attribute value, a notation, and an evaluation value.
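As a rough illustration of a front/back entry, the sketch below models one entry of parameter list 11 and counts how often a front match is followed later in the sentence by a back match. The counting rule, the names `Part`, `RelationParameter`, and `count_relation`, and the sample token list are all assumptions; the text does not say how the relationship is evaluated or what the evaluation value means, so that field is only stored here.

```python
from dataclasses import dataclass


@dataclass
class Part:
    attribute_name: str
    attribute_value: str
    notation: str


@dataclass
class RelationParameter:
    front: Part
    back: Part
    evaluation_value: int  # part of the back half; its meaning is not given in the text


def matches(token, part):
    """token is a (surface, attributes) pair."""
    surface, attributes = token
    return (surface == part.notation
            and attributes.get(part.attribute_name) == part.attribute_value)


def count_relation(tokens, rel):
    """Count front matches followed anywhere later by a back match (assumed rule)."""
    count = 0
    for i, tok in enumerate(tokens):
        if matches(tok, rel.front) and any(matches(t, rel.back) for t in tokens[i + 1:]):
            count += 1
    return count


POS = "part of speech"
GA = Part(POS, "case particle", "が")
ga_ga = RelationParameter(front=GA, back=GA, evaluation_value=1)

# Hypothetical analyzed sentence containing two "が" particles.
tokens = [
    ("象", {POS: "noun"}), ("が", {POS: "case particle"}),
    ("鼻", {POS: "noun"}), ("が", {POS: "case particle"}),
    ("長い", {POS: "adjective"}),
]

print(count_relation(tokens, ga_ga))  # 1
```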

FIG. 5 shows a configuration example of determination table 12 according to the present invention. Determination table 12 consists of a process name, a parameter name, a statistical result, and a threshold. Statistical processing 1 is the processing performed by statistical device 7. Its parameter names are, for example, the number of characters in the sentence and the ratio of the number of phrases to the number of characters, and the thresholds are fixed in advance. Here the thresholds are tentatively set to 40 for the number of characters and 0.6 for the phrase-to-character ratio, but the invention is not limited to these values. The statistical result column receives the statistics that statistical device 7 takes on the sentence actually input, so it is normally empty beforehand. Statistical processing 2 is the processing performed by statistical device 8. Its parameter names are, for example, the number of occurrences of "の", the number of continuative-form suspensions (連用中止), and the number of parallel particles, with thresholds of 3, 3, and 2 respectively. As with statistical processing 1, the invention is not limited to these values. Statistical processing 3 is the processing performed by statistical device 9. Its parameter name is the number of occurrences of the pattern "〜が，〜が", with a threshold of 1. Non-sentence determination device 6 consults determination table 12 and regards the input sentence as a non-sentence when a statistical result is greater than or equal to its threshold.
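The determination table and its threshold rule can be sketched as follows. The thresholds are the example values given in the text; the `Row` class, the `flagged` function, and the English parameter labels are illustrative assumptions rather than anything the patent specifies.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Row:
    process: str
    parameter: str
    threshold: float
    result: Optional[float] = None  # empty until a statistical device fills it in


table = [
    Row("statistical processing 1", "number of characters", 40),
    Row("statistical processing 1", "phrases / characters", 0.6),
    Row("statistical processing 2", 'number of "の"', 3),
    Row("statistical processing 2", "continuative-form suspensions", 3),
    Row("statistical processing 2", "parallel particles", 2),
    Row("statistical processing 3", '"〜が，〜が" patterns', 1),
]


def flagged(rows):
    """Return the parameters whose result is greater than or equal to the threshold."""
    return [r.parameter for r in rows
            if r.result is not None and r.result >= r.threshold]


# A result equal to the threshold already counts as a non-sentence factor.
table[2].result = 3
print(flagged(table))
```

Rows whose result is still empty are skipped, which mirrors the statement that the result column is blank until the corresponding statistical device has run.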

FIG. 6 is a flowchart of the non-sentence extraction processing according to the present invention, showing the operation of the non-sentence extraction device of FIG. 1. The processing is described below with reference to the flowchart of FIG. 6.

First, the number of characters in the sentence analyzed by morphological analysis device 2 is counted and entered in determination table 12 (step 601), and the number of phrases is counted and entered in determination table 12 (step 602). Non-sentence determination device 6 then compares the statistical results in determination table 12 with the thresholds to determine whether the sentence is a non-sentence (step 603). If the sentence is determined to be a non-sentence (step 604), output device 5 shown in FIG. 2 reports this to the user. If the sentence is determined not to be a non-sentence (step 604), statistical device 8 counts the occurrences in the input sentence of each parameter in parameter list 10 and enters them in determination table 12 (step 605), and non-sentence determination device 6 compares the statistical results in determination table 12 with the thresholds to make the determination (step 606). If the sentence is determined to be a non-sentence (step 607), output device 5 displays the non-sentence (step 611). If it is not (step 607), statistical device 9 counts the occurrences in the input sentence of each parameter in parameter list 11 and enters them in determination table 12 (step 608), and non-sentence determination device 6 compares the statistical results in the determination table with the thresholds to make the determination (step 609). If the sentence is determined to be a non-sentence (step 610), output device 5 reports this to the user (step 611). If it is determined not to be a non-sentence (step 610), the processing ends.
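The staged flow of FIG. 6, with a check after each statistical device and an early exit on the first hit, can be sketched in Python as below. The function `extract_non_sentence`, the `stage` helper, and the dictionary-based statistics are assumptions made for illustration; the threshold values are the example values from the text, and the statistics are those of the sentence "彼の机の本棚の本".

```python
THRESHOLDS = {"characters": 40, "phrases/characters": 0.6, "の": 3, "〜が，〜が": 1}


def extract_non_sentence(stages, thresholds=THRESHOLDS):
    """Run the stages in order; stop at the first stage that reaches a threshold.

    Returns (stage name, list of reasons), or None if no stage flags the sentence.
    """
    for name, compute in stages:
        stats = compute()
        reasons = [p for p, value in stats.items() if value >= thresholds[p]]
        if reasons:
            return name, reasons
    return None


calls = []  # records which stages actually ran


def stage(name, stats):
    def compute():
        calls.append(name)
        return stats
    return name, compute


stages = [
    stage("statistical device 7", {"characters": 8, "phrases/characters": 0.5}),
    stage("statistical device 8", {"の": 3}),
    stage("statistical device 9", {"〜が，〜が": 0}),
]

result = extract_non_sentence(stages)
print(result)  # ('statistical device 8', ['の'])
print(calls)   # ['statistical device 7', 'statistical device 8']
```

Because the loop returns as soon as one stage produces a reason, statistical device 9 never runs for this input, which is the saving the sequential design is meant to provide.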

FIG. 7 shows an example of an input sentence. It is easy to see that the example, "彼の机の本棚の本" ("the book on the bookshelf of his desk"), consists of four phrases, but parsing it gives rise to ambiguity. The possible readings are: (1) 彼の机, 机の本棚, 本棚の本; (2) 彼の本棚, 机の本棚, 本棚の本; (3) 彼の本, 机の本棚, 本棚の本; and (4) 彼の机, 机の本, 本棚の本. It is therefore better to extract such an input as a non-sentence and skip syntactic analysis, which prevents a drop in processing performance. The following describes the non-sentence extraction processing up to the point at which non-sentence extraction device 3 regards "彼の机の本棚の本" as a non-sentence. First, "彼の机の本棚の本" is entered through input device 1, and morphological analysis device 2 performs morphological analysis. The analyzed sentence is then input to non-sentence extraction device 3, which checks whether it is a non-sentence. The processing is described below with reference to non-sentence extraction device 3 shown in FIG. 1.

First, statistical device 7 takes statistics on the number of characters, the number of phrases, and the phrase-to-character ratio of the morphologically analyzed sentence "彼の机の本棚の本", and the results are entered in determination table 12: number of characters = 8, number of phrases = 4, and number of phrases / number of characters = 0.5. FIG. 8 shows determination table 12 at this point. Non-sentence determination device 6 consults the table shown in FIG. 8 and checks each statistical result against its threshold, regarding the sentence as a non-sentence if a result is greater than or equal to the threshold. In this example both results are below their thresholds, so statistical processing 1 does not flag the sentence, and processing moves on to statistical device 8. Statistical device 8 searches "彼の机の本棚の本" for each parameter in parameter list 10, counts the occurrences, and enters them in determination table 12. FIG. 9 shows the result. Consider the parameter in parameter list 10 whose part of speech is case particle and whose notation is "の" (other parameters are ignored for the sake of explanation). The input sentence contains three occurrences of the case particle "の", and this count is entered in determination table 12. When non-sentence determination device 6 compares the statistical result for the case particle "の" with its threshold, the result equals the threshold, so "彼の机の本棚の本" may be a non-sentence. Processing is interrupted at this point, and output device 5 shown in FIG. 2 informs the user that "彼の机の本棚の本" is a non-sentence and displays the factor that led to this determination.
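The figures in this worked example can be reproduced with a few lines of Python. This is a simplified sketch: the phrase segmentation is written by hand instead of coming from a morphological analyzer, and "の" is counted by surface form only, without checking its part of speech. The variable names are illustrative assumptions.

```python
# Hand-segmented phrases (bunsetsu) of the example sentence.
phrases = ["彼の", "机の", "本棚の", "本"]
sentence = "".join(phrases)

n_chars = len(sentence)        # number of characters
n_phrases = len(phrases)       # number of phrases
ratio = n_phrases / n_chars    # phrases / characters
n_no = sentence.count("の")    # surface count of "の"

# Example thresholds from the text: 40 characters, ratio 0.6, three "の".
stage1_flags = n_chars >= 40 or ratio >= 0.6
stage2_flags = n_no >= 3

print(n_chars, n_phrases, ratio, n_no)  # 8 4 0.5 3
print(stage1_flags, stage2_flags)       # False True
```

Statistical processing 1 passes the sentence, and statistical processing 2 flags it because the count of "の" equals the threshold of 3.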

FIG. 10 is a configuration diagram of non-sentence extraction device 3 according to another embodiment. Except for non-sentence determination device 6, the configuration is the same as in FIG. 1, so only non-sentence determination device 6 is described here.

Once non-sentence determination device 6 has handed processing to statistical device 7, it does not consult determination table 12 until it receives an end-of-processing signal from statistical device 9. That is, only after statistical devices 7, 8, and 9 have all finished does it consult determination table 12 and determine whether the sentence is a non-sentence. If so, it retrieves from determination table 12 every entry whose statistical result is greater than or equal to its threshold and has output device 5 display all of these non-sentence factors to the user.

FIG. 11 is a flowchart of the non-sentence extraction processing according to the other embodiment, showing the processing performed by non-sentence extraction device 3 of FIG. 10. The processing is described below with reference to the flowchart of FIG. 11.

First, statistical device 7 counts the number of characters in the input sentence and enters it in determination table 12 (step 1101), then counts the number of phrases and enters it in determination table 12 (step 1102). Next, statistical device 8 counts the occurrences in the input sentence of each parameter in parameter list 10 and enters them in determination table 12 (step 1103). Statistical device 9 then counts the occurrences in the input sentence of each parameter in parameter list 11 and enters them in determination table 12 (step 1104). The statistical results entered in determination table 12 by the above processing are compared with the thresholds to determine whether the sentence is a non-sentence (steps 1105, 1106). If it is determined to be a non-sentence, the data is transferred to output device 5, which indicates that the input is a non-sentence (step 1107). If it is not a non-sentence, translation device 4 performs syntactic analysis, transfer, generation, and so on, and outputs the translation in the other language to output device 5.

This other embodiment differs from the embodiment described earlier in that it displays to the user multiple factors behind the non-sentence determination. Because every item for statistical processing 1, 2, and 3 in determination table 12 has been filled in, non-sentence determination device 6 consults the table, extracts the non-sentence factors (those whose statistical result is greater than or equal to the threshold), and sends all of their parameter names to output device 5 for display.
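The behavior of this other embodiment, where every statistical device runs before any comparison is made and all factors are reported together, can be sketched as follows. The function name `collect_all_factors` and the statistics for the sample input are hypothetical; only the threshold values come from the example in the text.

```python
def collect_all_factors(stages, thresholds):
    """Fill the whole table first, then return every parameter at or above its threshold."""
    table = {}
    for compute in stages:      # all statistical devices run; there is no early exit
        table.update(compute())
    return [p for p, value in table.items() if value >= thresholds[p]]


thresholds = {"characters": 40, "phrases/characters": 0.6, "の": 3, "〜が，〜が": 1}

# Hypothetical statistics for a long input sentence with three "の".
stages = [
    lambda: {"characters": 45, "phrases/characters": 0.2},
    lambda: {"の": 3},
    lambda: {"〜が，〜が": 0},
]

factors = collect_all_factors(stages, thresholds)
print(factors)  # ['characters', 'の']
```

Compared with the staged flow, this variant always pays for all three statistical passes, but the user sees every reason at once and can fix the sentence in a single revision.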

As described above, in this embodiment the morphologically analyzed input sentence is compared in hardware with predetermined non-sentence conditions to determine whether it is a non-sentence, and syntactic analysis is performed only afterwards. The grammar dictionary and other resources needed for syntactic analysis can therefore be made smaller. Because non-sentence extraction can be handled largely mechanically, processing speed improves. Because wasted syntactic analysis is avoided, the speed of syntactic analysis also improves. Finally, since only input that is not regarded as a non-sentence is translated, the quality of the resulting translations improves.

Effect

As described above, according to the present invention, in natural language processing such as a machine translation system, non-sentences can be extracted from the input with a simple hardware configuration before syntactic analysis is performed, the processing speed of syntactic analysis can be improved, and the size of the grammar dictionary used for syntactic analysis can be reduced.

[Brief description of drawings]

FIG. 1 is a configuration diagram of a non-sentence extraction device according to an embodiment of the present invention. FIG. 2 is a configuration diagram of a machine translation system to which the present invention is applied. FIGS. 3 and 4 show configuration examples of the parameter lists according to the present invention. FIG. 5 shows a configuration example of the determination table according to the present invention. FIG. 6 is a flowchart of the non-sentence extraction processing performed by the non-sentence extraction device of FIG. 1. FIG. 7 shows an example of an input sentence. FIGS. 8 and 9 show configuration examples of the determination table holding the statistical results for the input sentence of FIG. 7. FIG. 10 is a configuration diagram of a non-sentence extraction device according to another embodiment of the present invention. FIG. 11 is a flowchart of the non-sentence extraction processing performed by the non-sentence extraction device of FIG. 10.

1: input device, 2: morphological analysis device, 3: non-sentence extraction device, 4: translation device, 5: output device, 6: non-sentence determination device, 7, 8, 9: statistical devices, 10, 11: parameter lists, 12: determination table.

Claims

1. A machine translation system for translating a source-language sentence into a sentence of another language, wherein a plurality of conditions for detecting sentences that are inappropriate as input source-language sentences to the machine translation system (non-sentences) are defined in advance, and wherein a source-language sentence input receiving unit of the system comprises: a plurality of statistical means for taking statistics on the number of occurrences in the input source-language sentence corresponding to the plurality of conditions; and non-sentence determination and extraction means for comparing the results calculated by the plurality of statistical means with the conditions, determining whether the sentence is a non-sentence, and extracting non-sentences.

2. The machine translation system according to claim 1, wherein the statistical means count the number of characters in the source-language sentence, the number of phrases, or the number of words or phrases having a specific grammatical attribute, and upper limits on these counts are used as the determination conditions.