JPS6279569A

JPS6279569A - Non-sentence extracting device

Info

Publication number: JPS6279569A
Application number: JP60221090A
Authority: JP
Inventors: Toshiyuki Funabe; 舟部　敏行
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 1985-10-03
Filing date: 1985-10-03
Publication date: 1987-04-11
Anticipated expiration: 2009-09-14
Also published as: JPH0673134B2

Abstract

PURPOSE: To improve the processing speed of syntactic analysis and to reduce the capacity of the grammar dictionary used for syntactic analysis by providing a non-sentence deciding means that compares the statistics gathered by plural kinds of statistics means with decision conditions and decides whether an input sentence is a non-sentence. CONSTITUTION: A statistics device 7 counts the number of characters and the number of clauses in the input sentence and posts both to a decision table 12. Next, a statistics device 8 counts the occurrences in the input sentence of the parameters in a parameter list 10, and a statistics device 9 counts the occurrences of the parameters in a parameter list 11; each posts its counts to the decision table 12. The statistics results posted to the decision table 12 are then compared with threshold values to decide whether the sentence is a non-sentence. If it is, data is transferred to an output device 5, which displays that the sentence is a non-sentence; if it is not, a translation device 4 performs syntactic analysis, conversion, generation and so on, and outputs the sentence in another language to the output device 5. Non-sentences are thus extracted from the input before syntactic analysis, which speeds up syntactic analysis and saves capacity in the grammar dictionary for syntactic analysis.

Description

[Detailed Description of the Invention]

Technical Field

The present invention relates to a non-sentence extraction device, and in particular to one suited to mechanically extracting, in natural language processing such as a machine translation system, input sentences that are considerably ambiguous as Japanese or that are simply not well-formed Japanese (hereinafter, "non-sentences").

Prior Art

Conventionally, in a machine translation system, an input Japanese sentence first undergoes morphological analysis and then syntactic analysis and so on, and the syntactically analyzed sentence is translated into another language (for example, English or French).

However, not every input Japanese sentence is correct Japanese. Conventional systems therefore either check for non-sentences during the syntactic analysis described above, or skip the check altogether and translate the input Japanese sentence as it is after syntactic analysis.

Checking for non-sentences during syntactic analysis wastes processing up to the point where the non-sentence is found. In addition, performing syntactic analysis with a grammar that detects non-sentences complicates the analysis and lowers the processing speed.

On the other hand, if syntactic analysis is performed without checking for non-sentences and the translation result is output, that result may be completely incorrect.

Thus, building a translation system around a grammar that checks for non-sentences during syntactic analysis raises the cost of the system, and there is a risk that syntactic analysis takes a long time and still produces no result.

Purpose

The object of the present invention is to solve these conventional problems and to provide a non-sentence extraction device that, in natural language processing such as a machine translation system, can extract non-sentences from the input with a simple hardware configuration before syntactic analysis is performed, and can thereby improve the processing speed of syntactic analysis and reduce the capacity of the grammar dictionary used for syntactic analysis.

Configuration

To achieve the above object, the non-sentence extraction device of the present invention is characterized in that, in a machine translation system that machine-translates Japanese sentences into another language, conditions under which a sentence is judged to be a non-sentence are determined in advance, and the device is provided with plural kinds of statistical means that take statistics on the input Japanese sentence corresponding to those conditions, and a non-sentence determining means that compares the results gathered by the plural kinds of statistical means with the conditions to determine whether the sentence is a non-sentence.

Hereinafter, an embodiment of the present invention is described in detail with reference to the drawings.

FIG. 2 is a schematic configuration diagram of a machine translation system to which the present invention is applied.

In FIG. 2, 1 is an input device for entering natural language sentences, for example a character input device such as a keyboard or an OCR. 2 is a morphological analysis device that uses a Japanese dictionary and a connection table to divide the input sentence into words and to attach part-of-speech and other information. 3 is the non-sentence extraction device according to the present invention; using morphological statistics (a large number of characters, a large number of clauses, and so on), the parameter list 10, and the parameter list 11, it determines whether the sentence has the characteristics of a non-sentence in terms of grammar and usage. If the sentence is a non-sentence, it informs the user; if not, it passes the sentence to the translation device 4 for translation. 4 is a translation device that uses a translation grammar and a translation dictionary to perform syntactic analysis, conversion, and generation on the input sentence, translating it from Japanese into another language (for example, English or French).

5 is an output device, for example a CRT display or a printer, that outputs the non-sentence extracted by the non-sentence extraction device 3 or the foreign-language text produced by the translation device 4.

FIG. 1 is a configuration diagram of a non-sentence extraction device according to an embodiment of the present invention.

In FIG. 1, 6 is a non-sentence determining device that judges non-sentences using the statistical results and thresholds in the determination table 12. Processing proceeds from the statistical device 7 to the statistical device 8 and then to the statistical device 9. As soon as the sentence is judged to be a non-sentence on the results of any one of the statistical devices, processing is interrupted and the user is informed. The reason for the non-sentence judgment is displayed at the same time.

7 is a statistical device that counts the number of characters and the number of clauses in the input sentence and enters them in the determination table 12. 8 is a statistical device that uses the parameter list 10 to take statistics on each parameter in the sentence. 9 is a statistical device that uses the parameter list 11 to take statistics on each parameter in the sentence. 10 is a parameter list containing the words, attributes, and attribute values that serve as keys for judging non-sentences. 11 is a parameter list containing the relationships between words that serve as keys for judging non-sentences. 12 is a determination table with the columns process name, parameter name, statistical result, and threshold; the statistical result column holds the count obtained for each parameter in each process.

FIG. 3 shows a configuration example of the parameter list 10 of FIG. 2. Each entry in the parameter list 10 is a triple consisting of a word's attribute name, attribute value, and notation (surface form). For example, the hiragana "の" forms an entry with the attribute name "品詞" (part of speech), the attribute value "格助詞" (case particle), and the notation "の".

FIG. 4 shows a configuration example of the parameter list 11 of FIG. 2. The parameter list 11 is used to express the contextual (before/after) relationship between words and clauses. Each entry is divided into a front part and a rear part: the front part consists of an attribute name, an attribute value, and a notation, and the rear part consists of an attribute name, an attribute value, a notation, and an evaluation value.
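The two parameter lists can be pictured as simple records. The following is a minimal sketch only: the class names, the field names, the dictionary representation of a morpheme, and the encoding of a "〜が、〜が" relation are all assumptions made for illustration, and the text does not specify what the evaluation value means.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class WordParam:
    """One entry of parameter list 10: attribute name, attribute value, notation."""
    attr_name: str    # e.g. "品詞" (part of speech)
    attr_value: str   # e.g. "格助詞" (case particle)
    notation: str     # surface form, e.g. "の"


@dataclass(frozen=True)
class RelationParam:
    """One entry of parameter list 11: a front part and a rear part."""
    front: WordParam
    rear: WordParam
    evaluation: int   # stored with the rear part; its meaning is not given in the text


def matches(param: WordParam, morpheme: Dict[str, str]) -> bool:
    """True if a morpheme (hypothetical dict of attributes plus "notation") fits the entry."""
    return (morpheme.get(param.attr_name) == param.attr_value
            and morpheme.get("notation") == param.notation)


# The example from the text: hiragana "の" as a case particle.
NO = WordParam("品詞", "格助詞", "の")

# Hypothetical encoding of a "〜が、〜が" relation as two case particles "が".
GA = WordParam("品詞", "格助詞", "が")
GA_GA = RelationParam(front=GA, rear=GA, evaluation=1)
```

Keeping the rear part as a word entry plus a separate evaluation field mirrors the four-item rear part described above while letting both halves share one matching function.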

FIG. 5 shows a configuration example of the determination table 12 according to the present invention. The determination table 12 consists of a process name, a parameter name, a statistical result, and a threshold. Statistical process 1 denotes the processing performed by the statistical device 7; its parameter names are, for example, the number of characters in the sentence and the number of clauses divided by the number of characters, and the thresholds are fixed in advance. Here the thresholds are tentatively set to 40 for the number of characters and 0.6 for the number of clauses divided by the number of characters, but they are not limited to these values. The statistical result column is filled in with the statistics that the statistical device 7 takes on the sentence actually input, so it is normally blank. Statistical process 2 denotes the processing performed by the statistical device 8; its parameter names are, for example, the number of "の", the number of continuative suspension forms (連用中止), and the number of parallel particles, each with its own threshold. The threshold for the number of "の" is 3; the other two values are garbled in the source text. As with statistical process 1, these values are not limiting.

Statistical process 3 denotes the processing performed by the statistical device 9; its parameter name is the number of "〜が、〜が" patterns, and its threshold is 1.

Accordingly, the non-sentence determining device 6 consults the determination table 12 described above and regards the input sentence as a non-sentence when a statistical result is equal to or greater than its threshold.
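The determination table and its decision rule can be sketched as follows. This is an illustrative model, not the hardware described in the patent: the `Row` class and the function names are assumptions, and only the thresholds that are legible in the text (40, 0.6, 3, and 1) are included.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Row:
    process: int                     # statistical process number (1, 2 or 3)
    parameter: str                   # parameter name
    threshold: float                 # fixed in advance
    result: Optional[float] = None   # blank until a statistical device fills it in


def meets_threshold(row: Row) -> bool:
    """A blank result never triggers; otherwise the rule is result >= threshold."""
    return row.result is not None and row.result >= row.threshold


def is_non_sentence(table: List[Row]) -> bool:
    """The sentence is treated as a non-sentence if any filled-in row meets its threshold."""
    return any(meets_threshold(row) for row in table)


table = [
    Row(1, "number of characters", 40),
    Row(1, "clauses / characters", 0.6),
    Row(2, 'number of "の"', 3),
    Row(3, 'number of "〜が、〜が"', 1),
]
```

Note that the comparison is inclusive: a result exactly equal to its threshold already counts as a non-sentence factor.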

FIG. 6 is a flowchart of the non-sentence extraction processing according to the present invention. It shows the operation of the non-sentence extraction device of FIG. 1. The processing is described below following the flowchart of FIG. 6.

First, the number of characters in the sentence morphologically analyzed by the morphological analysis device 2 is counted and entered in the determination table 12 (step 601), and the number of clauses is counted and entered in the determination table 12 (step 602). The non-sentence determining device 6 then compares the statistical results in the determination table 12 with the thresholds and judges whether the sentence is a non-sentence (step 603). If the sentence is judged to be a non-sentence (step 604), this is displayed to the user by the output device 5 shown in FIG. 2. If it is judged not to be a non-sentence (step 604), the statistical device 8 counts each parameter of the parameter list 10 in the input sentence and enters the counts in the determination table 12 (step 605), and the non-sentence determining device 6 compares the statistical results in the determination table 12 with the thresholds and judges whether the sentence is a non-sentence (step 606). If it is judged to be a non-sentence (step 607), the output device 5 displays the non-sentence (step 611). If it is not (step 607), the statistical device 9 counts each parameter of the parameter list 11 in the input sentence and enters the counts in the determination table 12 (step 608), and the non-sentence determining device 6 compares the statistical results in the determination table with the thresholds and judges whether the sentence is a non-sentence (step 609). If it is judged to be a non-sentence (step 610), this is displayed to the user by the output device 5 (step 611). If it is judged not to be a non-sentence (step 610), the processing ends.

FIG. 7 shows an example of an input sentence. It is easy to see that the example, 「彼の机の本棚の本」 ("the book on the bookshelf of his desk"), consists of four clauses, but analyzing it gives rise to ambiguity. That is, the following interpretations are possible:

(1) 彼の机, 机の本棚, 本棚の本 (his desk, the desk's bookshelf, the bookshelf's book)
(2) 彼の本棚, 机の本棚, 本棚の本 (his bookshelf, the desk's bookshelf, the bookshelf's book)
(3) 彼の本, 机の本棚, 本棚の本 (his book, the desk's bookshelf, the bookshelf's book)
(4) 彼の机, 机の本, 本棚の本 (his desk, the desk's book, the bookshelf's book)

Such an input sentence is therefore better extracted as a non-sentence and excluded from syntactic analysis and the like, which prevents a drop in processing performance. The following describes the non-sentence extraction processing up to the point where 「彼の机の本棚の本」 is regarded as a non-sentence by the non-sentence extraction device 3. First, the sentence is entered through the input device 1, and morphological analysis is then performed by the morphological analysis device 2.

The morphologically analyzed sentence 「彼の机の本棚の本」 is input to the non-sentence extraction device 3, which checks whether it is a non-sentence. The non-sentence extraction processing is described below with reference to the non-sentence extraction device 3 shown in FIG. 1.

First, the statistical device 7 takes statistics on the morphologically analyzed sentence 「彼の机の本棚の本」: the number of characters, the number of clauses, and the number of clauses divided by the number of characters. The results are entered in the determination table 12.

The statistical results are: number of characters = 8, number of clauses = 4, and number of clauses / number of characters = 0.5. FIG. 8 shows an example of the resulting determination table 12.

The non-sentence determining device 6 consults the determination table 12 shown in FIG. 8 and compares each statistical result with its threshold; if a result is equal to or greater than its threshold, the sentence is regarded as a non-sentence. In this example both values are below their thresholds, so the result of statistical process 1 is that the sentence is not a non-sentence, and processing proceeds to the statistical device 8.

The statistical device 8 uses the parameter list 10 to search 「彼の机の本棚の本」 for each parameter, counts the occurrences, and enters them in the determination table 12. FIG. 9 shows the statistical results produced by the statistical device 8. Consider the parameter in the parameter list 10 whose part of speech is case particle and whose notation is "の" (the other parameters are ignored for the purpose of explanation). The input sentence contains three occurrences of the case particle "の".

This count is entered in the determination table 12. When the non-sentence determining device 6 compares the statistical result for the case particle "の" with its threshold, the result equals the threshold, so 「彼の机の本棚の本」 may be a non-sentence. Processing is interrupted at this point, and the output device 5 shown in FIG. 2 displays to the user that the sentence is a non-sentence, together with the factor that led to that judgment.
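The figures in this worked example can be checked with a few lines of code. This is only a sanity check under stated assumptions: the clause segmentation is written out by hand rather than produced by a morphological analyzer, and "の" is counted as a plain character without confirming that each occurrence is a case particle.

```python
# Hand-segmented clauses of the example sentence (assumed segmentation).
clauses = ["彼の", "机の", "本棚の", "本"]
sentence = "".join(clauses)

# (parameter name, statistical result, threshold) as in the determination table.
rows = [
    ("number of characters", len(sentence), 40),
    ("clauses / characters", len(clauses) / len(sentence), 0.6),
    ('number of "の"', sentence.count("の"), 3),
]

# A row is a non-sentence factor when its result is at or above the threshold.
flagged = [name for name, result, threshold in rows if result >= threshold]

for name, result, threshold in rows:
    verdict = "non-sentence factor" if result >= threshold else "below threshold"
    print(f"{name}: result={result}, threshold={threshold} ({verdict})")
```

Only the "の" row is flagged, and only because the comparison is inclusive: the count of 3 equals the threshold of 3.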

FIG. 10 is a configuration diagram of the non-sentence extraction device 3 according to another embodiment. Its configuration is the same as in FIG. 1 except for the non-sentence determining device 6, so the description of the other components is omitted and only the non-sentence determining device 6 is described.

Once the non-sentence determining device 6 has handed processing over to the statistical device 7, it does not consult the determination table 12 until it receives an end-of-processing signal from the statistical device 9. In other words, only after processing has run through the statistical device 7, the statistical device 8, and the statistical device 9 does it consult the determination table 12 and judge whether the sentence is a non-sentence. If so, it extracts every entry in the determination table 12 whose statistical result is equal to or greater than its threshold and displays all of these non-sentence factors to the user through the output device 5.

FIG. 11 is a flowchart of the non-sentence extraction processing according to the other embodiment. It shows the non-sentence extraction processing performed by the non-sentence extraction device 3 of FIG. 10. The processing is described below following the flowchart of FIG. 11.

First, the statistical device 7 counts the number of characters in the input sentence and enters it in the determination table 12 (step 1101), then counts the number of clauses and enters it in the determination table 12 (step 1102). Next, the statistical device 8 counts each parameter of the parameter list 10 in the input sentence and enters the counts in the determination table 12 (step 1103). The statistical device 9 then counts each parameter of the parameter list 11 in the input sentence and enters the counts in the determination table 12 (step 1104). The statistical results entered in the determination table 12 by these steps are compared with the thresholds to judge whether the sentence is a non-sentence (steps 1105 and 1106). If it is judged to be a non-sentence, data is transferred to the output device 5, which displays that the sentence is a non-sentence (step 1107).

If it is not a non-sentence, the translation device 4 performs syntactic analysis, conversion, generation and so on, and outputs the text in the other language to the output device 5.

This other embodiment differs from the first embodiment in that it displays to the user multiple factors behind the non-sentence judgment. Because every item of statistical processes 1, 2, and 3 has been filled in in the determination table 12, the non-sentence determining device 6 consults the table, extracts the non-sentence factors (the entries whose statistical result is equal to or greater than the threshold), and sends all of their parameter names to the output device 5 for display.
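The difference between the two embodiments comes down to when the table is consulted. The sketch below assumes the determination table has already been completely filled in and is represented as a list of (process, parameter name, result, threshold) tuples; that representation, the function names, and the sample values are hypothetical.

```python
from typing import List, Optional, Tuple

# (process number, parameter name, statistical result, threshold)
Row = Tuple[int, str, float, float]


def all_factors(table: List[Row]) -> List[str]:
    """Other embodiment: report every parameter at or above its threshold."""
    return [name for _, name, result, threshold in table if result >= threshold]


def first_factor(table: List[Row]) -> Optional[str]:
    """First embodiment, for contrast: stop at the first parameter that hits."""
    for _, name, result, threshold in table:
        if result >= threshold:
            return name
    return None


# Hypothetical filled-in table for some input sentence.
filled = [
    (1, "number of characters", 14, 40),
    (1, "clauses / characters", 0.43, 0.6),
    (2, 'number of "の"', 3, 3),
    (3, 'number of "〜が、〜が"', 1, 1),
]
```

With this table, `first_factor(filled)` stops at the "の" count, whereas `all_factors(filled)` also reports the "〜が、〜が" count, giving the user the complete list of reasons at the cost of always running all three statistical processes.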

As described above, in this embodiment the morphologically analyzed input sentence is compared with predetermined non-sentence conditions, so that hardware determines whether it is a non-sentence before syntactic analysis is performed. The capacity of the grammar dictionary and the like required for syntactic analysis can therefore be reduced. Moreover, because non-sentence extraction can be performed largely mechanically, the processing speed can be improved.

Furthermore, since needless processing such as wasted syntactic analysis is avoided, the processing speed of syntactic analysis can also be improved. In addition, in this embodiment only input sentences that are not regarded as non-sentences are translated, so the quality of the translated text ultimately improves.

Effect

As described above, according to the present invention, in natural language processing such as a machine translation system, non-sentences can be extracted from the input with a simple hardware configuration before syntactic analysis is performed, the processing speed of syntactic analysis can be improved, and the capacity of the grammar dictionary for syntactic analysis can be reduced.

[Brief Description of the Drawings]

FIG. 1 is a configuration diagram of a non-sentence extraction device according to an embodiment of the present invention. FIG. 2 is a configuration diagram of a machine translation system to which the present invention is applied. FIGS. 3 and 4 show configuration examples of the parameter lists according to the present invention. FIG. 5 shows a configuration example of the determination table according to the present invention. FIG. 6 is a flowchart of the non-sentence extraction processing performed by the non-sentence extraction device of FIG. 1. FIG. 7 shows an example of an input sentence. FIGS. 8 and 9 show configuration examples of the determination table holding the statistical results for the input sentence of FIG. 7. FIG. 10 is a configuration diagram of a non-sentence extraction device according to another embodiment of the present invention. FIG. 11 is a flowchart of the non-sentence extraction processing performed by the non-sentence extraction device of FIG. 10.

1: input device, 2: morphological analysis device, 3: non-sentence extraction device, 4: translation device, 5: output device, 6: non-sentence determining device, 7, 8, 9: statistical devices, 10, 11: parameter lists, 12: determination table.

Claims

[Claims]

(1) A non-sentence extraction device for a machine translation system that machine-translates Japanese sentences into another language, characterized in that conditions under which a sentence is judged to be a non-sentence are determined in advance, and in that the device comprises plural kinds of statistical means that take statistics on an input Japanese sentence corresponding to the conditions, and non-sentence determining means that compares the results gathered by the plural kinds of statistical means with the conditions to determine whether the sentence is a non-sentence.

(2) The non-sentence extraction device according to claim 1, wherein the statistical means counts the number of characters and the number of clauses in the Japanese sentence.

(3) The non-sentence extraction device according to claim 1, wherein the statistical means counts each parameter using a parameter list in which parameters consisting of words, attributes, and attribute values for determining non-sentences are entered.

(4) The non-sentence extraction device wherein the statistical means counts each parameter using a parameter list in which parameters consisting of relationships between words for determining non-sentences are entered.