JPH0713968A

JPH0713968A - Syntax checking device

Info

Publication number: JPH0713968A
Application number: JP5030983A
Authority: JP
Inventors: Tadashi Nagano (永野正); Masaki Kiyono (清野正樹); Masanori Takahashi (高橋雅則); Hideko Mori (森秀子)
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 1993-02-19
Filing date: 1993-02-19
Publication date: 1995-01-17

Abstract

PURPOSE: To support document creation by indicating, with high precision, the cause or position of a grammatical error contained in a document. CONSTITUTION: A sentence segmenting part 1 splits the document into sentences and passes them one at a time to a syntax analysis part 2. When the syntax analysis part 2 fails to analyze a whole sentence, it sends the partial trees it has built, together with information on the rule applications that failed, to a non-sentence analyzing part 5. The non-sentence analyzing part 5 applies non-sentence analysis rules 6, which look for errors, taking combinations of these partial trees and failures as arguments. The error type is thereby identified, and a message generating part 7 outputs a message to the user according to that type. A priority processing part 9, which decides the priority of the partial trees, is placed before the non-sentence analyzing part 5 so that processing can be executed at high speed.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a syntax checking device for document processing and natural language processing.

[0002]

[Prior Art] Conventional techniques for pointing out erroneous sentences in support of document creation include spell checking, which flags as unknown words any character strings whose spelling is not found among the dictionary headwords, and methods that point out grammatical errors by matching patterns of erroneous word sequences.

[0003]

[Problems to Be Solved by the Invention] Errors include spelling mistakes and grammatical errors; here we consider grammatical errors. A method that points out errors by matching patterns of word sequences can flag an error when the input matches a pattern exactly, but when the pattern is disturbed by modifiers, inflection, and the like, no match occurs and the error can no longer be pointed out. Covering such errors requires preparing additional patterns, which not only makes the number of rules enormous but also increases the likelihood of false hits. Moreover, an error can be pointed out when a pattern hits, but when nothing hits, no information about the error is obtained at all. Missing words are also hard to point out because they are difficult to describe as patterns. It is theoretically possible to point out grammatical mistakes by anticipating every possibility, such as substitution, omission, and insertion, but the number of combinations is so large that the dictionary and syntax-rule data grow and processing takes too long to be practical.

[0004] The present invention solves these conventional problems, and its object is to provide a syntax checking device capable of pointing out the various errors in a sentence far more effectively than simple pattern matching.

[0005]

[Means for Solving the Problems] To achieve the above object, the present invention comprises: a sentence segmentation unit that cuts an input document into individual sentences; a syntax analysis unit that parses each extracted input sentence using syntax analysis rules and, when parsing fails because the input sentence contains an error, outputs the subtrees built up to that point; a non-sentence analysis unit that detects errors in the input sentence from the subtrees output by the syntax analysis unit, using non-sentence analysis rules; and a message creation unit that outputs a message corresponding to the detected error.

[0006] The present invention further comprises a priority processing unit that, from the set of substructures making up the subtrees created by the syntax analysis unit, gives high priority to substructures not contained in any other subtree and sends them to the non-sentence analysis unit.

[0007]

[Operation] With the above configuration, an input document is first cut into individual sentences by the sentence segmentation unit and sent to the syntax analysis unit, which parses each extracted input sentence. If parsing fails because the input sentence contains an error, the partial syntax trees (there may be several) built during the analysis are sent, together with failure information, to the non-sentence analysis unit. The non-sentence analysis unit applies non-sentence analysis rules for detecting errors to these syntax trees one after another. If an error is found, the message creation unit creates the corresponding message and points the error out to the user. If no error is found by the non-sentence analysis rules, information about where a mistake is likely to be is presented to the user, based on the positions of the subtrees that were created. The priority processing unit assigns priorities to the subtrees output from the syntax analysis unit so that the non-sentence analysis rules are applied in descending order of priority.

[0008]

[Embodiment] FIG. 1 is a block diagram of a syntax checking device according to one embodiment of the present invention. Reference numeral 1 denotes a sentence segmentation unit that cuts the input document into individual sentences; 2, a syntax analysis unit that parses each extracted input sentence and, when parsing fails because the input sentence contains an error, outputs the subtrees built up to that point; 3, the syntax analysis rules used for parsing; 4, a word dictionary holding word information; 5, a non-sentence analysis unit that detects the cause of ungrammaticality from the subtrees output by the syntax analysis unit 2; 6, non-sentence analysis rules describing the conditions under which errors are searched for; 7, a message creation unit that creates a message corresponding to the detected error; and 8, a message dictionary used when creating messages. Reference numeral 9 denotes a priority processing unit which, when the syntax analysis unit 2 has failed to parse, takes the set of substructures (nodes) making up the subtrees destined for the non-sentence analysis unit 5, gives high priority to those substructures not contained in any other subtree, and has the non-sentence analysis unit 5 process them.

[0009] Next, the operation of the above embodiment is described. A document to be checked for erroneous sentences is first input to the sentence segmentation unit 1, which cuts the document into individual sentences on the basis of delimiters and the like and sends them to the syntax analysis unit 2. When an input sentence arrives, the syntax analysis unit 2 begins analysis at the start of the sentence and grows a parse tree while consulting the syntax analysis rules 3 and the word dictionary 4. Whenever a rule among the syntax analysis rules 3 succeeds, the resulting subtree is saved. Such a subtree is hereinafter called a "success goal". Because a success goal is a subtree, it contains information such as the part of speech and inflected form of each word in the subtree, the category and role of each branch, and the position in the sentence. If the parsing algorithm can generate the same subtree more than once, a check is made so that identical subtrees are not registered twice. This embodiment is described using an example in which the parser, that is, the parsing program, uses an algorithm that grows trees bottom-up while predicting categories. During analysis, a category may be predicted and yet the tree rooted in that category may fail to be built. In that case the name of the failed category and its position in the sentence are saved. This trace of a failure, carrying the two items of category name and position, is hereinafter called a "failure goal". Because the syntax analysis unit 2 in this embodiment grows trees from the start of the sentence, an error in the sentence may prevent any subtrees from being built for the part after it. Many of the attributes of a large tree are inherited from the subtree that plays the principal role within it. In this way, every subtree can carry the attributes of that tree (category), registered according to its structure and the words it contains.
If parsing succeeds, processing ends there. If parsing fails, the success goals and failure goals that were created are sent to the non-sentence analysis unit 5.
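The two kinds of goals described above can be pictured as small records. The following is a minimal sketch, not the data layout of the embodiment: the names `SuccessGoal`, `FailureGoal`, and `GoalStore`, the word-index positions, and the attribute strings are all assumptions made for illustration. It shows a success goal carrying a category, a range, and attributes, a failure goal carrying only a category and a start position, and a registration step that refuses identical duplicates.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SuccessGoal:
    """A subtree that a parsing rule built successfully (hypothetical layout)."""
    category: str                    # root category, e.g. "NP", "DET"
    start: int                       # index of the first word covered
    end: int                         # index one past the last word covered
    attrs: frozenset = frozenset()   # grammatical attributes of the tree
    children: tuple = ()             # child subtrees (internal structure)


@dataclass(frozen=True)
class FailureGoal:
    """A category that was predicted at a position but could not be built."""
    category: str
    start: int


@dataclass
class GoalStore:
    successes: list = field(default_factory=list)
    failures: list = field(default_factory=list)
    _seen: set = field(default_factory=set)

    def add(self, goal):
        """Register a goal unless an identical one is already stored."""
        if goal in self._seen:
            return False
        self._seen.add(goal)
        target = self.successes if isinstance(goal, SuccessGoal) else self.failures
        target.append(goal)
        return True


store = GoalStore()
det = SuccessGoal("DET", 2, 3, frozenset({"countable", "singular"}))
store.add(det)
# An identical subtree produced a second time is not registered again.
store.add(SuccessGoal("DET", 2, 3, frozenset({"countable", "singular"})))
store.add(FailureGoal("NP", 2))
```

Frozen dataclasses are used here only because they are hashable, which makes the duplicate check a set lookup; any equality test over subtrees would serve the same purpose.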

[0010] The non-sentence analysis unit 5 applies the non-sentence analysis rules 6. Each non-sentence analysis rule 6 can specify three things as its triggering conditions: (1) several category names (parts of speech included), (2) the positional relationship among them, and (3) whether each category is a success goal or a failure goal. For example, for a rule that extracts a number (singular/plural) mismatch between an English article and the noun it modifies, (1) is an article and a noun-phrase main part, (2) is that they are adjacent in the order article, noun-phrase main part, and (3) is that both are success goals. This rule applies to a mistake such as "a boys". Because a noun-phrase main part forms a subtree even when modifiers such as adjectives are attached to it, the rule also detects the error in a form such as "a good boys", which pattern matching of the kind "a + word ending in s" cannot handle. Furthermore, attributes such as whether a phrase is plural are registered as attributes of the tree when the syntax analysis unit 2 builds the parse tree. By looking only at these tree attributes, a single rule can judge errors of the same kind even when their surface patterns differ, which keeps the number of rules from growing.
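The article/noun number rule can be sketched as follows. This is an illustrative reading of the three triggering conditions, not the rule format of the embodiment: the `Goal` record, the word-index spans, and the attribute names "singular" and "plural" are assumptions. The point it shows is that the rule looks at the attributes of the NH subtree, so "a good boys" is caught even though "a" and "boys" are not adjacent words.

```python
from typing import NamedTuple


class Goal(NamedTuple):
    category: str      # e.g. "DET", "NH", "NP"
    start: int         # first word index covered
    end: int           # one past the last word index covered
    success: bool      # True = success goal, False = failure goal
    attrs: frozenset   # tree attributes inherited from the main word


def article_number_rule(goals):
    """Yield (DET, NH) pairs whose number attributes contradict each other."""
    # Conditions (1) and (3): a DET and an NH, both success goals.
    dets = [g for g in goals if g.success and g.category == "DET"]
    heads = [g for g in goals if g.success and g.category == "NH"]
    for det in dets:
        for nh in heads:
            # Condition (2): adjacent, in the order DET then NH.
            if det.end != nh.start:
                continue
            # Attribute check performed inside the rule.
            if "singular" in det.attrs and "plural" in nh.attrs:
                yield det, nh


# "a good boys": NH spans "good boys" and inherits "plural" from "boys".
goals = [
    Goal("DET", 0, 1, True, frozenset({"singular"})),
    Goal("ADJ", 1, 2, True, frozenset()),
    Goal("N", 2, 3, True, frozenset({"plural"})),
    Goal("NH", 1, 3, True, frozenset({"plural"})),
]
hits = list(article_number_rule(goals))
```

Here `hits` contains the single pair formed by the DET "a" and the NH "good boys".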

[0011] For each non-sentence analysis rule 6, the non-sentence analysis unit 5 extracts from its input every combination of success goals and failure goals that satisfies conditions (1), (2), and (3), and applies the rule to each. All such combinations can be extracted with well-known algorithms, for example by sorting the subtrees by position or by root category name. Inside each non-sentence analysis rule 6, the attributes of the goals named in condition (1) are examined where necessary and contradictions are extracted. For English, for example, tense and singular/plural agreement are checked. Some rules perform no attribute check. When a contradiction between categories is detected, a check is made, if necessary, for another interpretation that is free of the contradiction. In most cases this is done by looking for a goal with the same position in the sentence and the same category name but different attributes. If this check finds no contradiction-free interpretation, the error type (identifier) and the goals involved are output to the message creation unit 7.

[0012] When the message creation unit 7 receives an identifier from the non-sentence analysis unit 5, it retrieves the message for that type of contradiction from the message dictionary 8. It then composes the final message, for example by extracting the relevant words from the input sentence or the subtrees and embedding them in the message. If no contradiction was detected by any of the non-sentence analysis rules 6, the range of the sentence over which success goals were created is presented to the user as a message. This can be realized by having the non-sentence analysis unit 5 generate a special error identifier when it finishes without detecting an error.

[0013] Next, the above operation is described using actual example sentences. The target language is English. In the following, English parts of speech are abbreviated as follows: noun N, general verb V, be-verb BE, preposition P, pronoun PN, article DET, adjective ADJ, and so on. Parse trees play an important role, and the category names of their nodes are abbreviated as follows: sentence S, noun phrase NP, verb phrase VP, infinitive phrase INF, noun-phrase main part NH, and so on.

[0014] Take the input sentence "He drank a cold water." as an example. It is grammatically wrong in that the indefinite article "a", which marks the singular, is attached to the uncountable noun "water". (Colloquially it might be read as an elided form of "a cup of water", but here it is treated as a grammatical error.) The sentence segmentation unit 1 separates this sentence from the sentences before and after it and inputs it to the syntax analysis unit 2. Because the input sentence contains an error, the syntax analysis unit 2 fails to produce a complete analysis. The partial analysis does, however, yield subtrees such as those in FIG. 2(a). FIG. 2(b) shows these as success goals and failure goals. Each success goal is shown with its category name and the range it covers; in practice it also carries internal structure and grammatical attributes. Each failure goal is shown with its starting position in the sentence and its category name. The same notation for success goals and failure goals is used in the other figures.

[0015] Assume the syntax analysis rules shown in FIG. 3. When a rule is applied, attribute agreement and the like are checked, and the rule fails if its conditions are not met. These rules are explained by tracing how the subtrees in FIG. 2 come about. At the start of the sentence, rule 1 predicts NP and rule 2 predicts PN; since the word "He" has the attribute PN, the prediction is satisfied, success goal PN is created, and success goal NP follows. This leads to the prediction of VP, the next element of rule 1; rule 6 then predicts V (transitive verb), and since "drank" is a transitive verb, success goal V is created and NP, the next element of rule 6, is predicted. Rule 3 then predicts, and creates, success goal DET and success goal NH. At this stage, however, the attempt to build an NP under rule 3 fails because of an attribute mismatch (countable/uncountable), so no success goal NP is created. The information that the NP starting at the position of "a" failed is registered as a "failure goal". This shows how the success goals and failure goals of FIG. 2 arise. Only the rules and behavior of the syntax analysis unit 2 that matter for non-sentence analysis have been described; covering many linguistic phenomena requires far more rules, and a single word often has several parts of speech and attributes, so success goals and failure goals other than those listed above are also produced. Failure goals in particular can be very numerous, and at this point it is not known which goals are the essential ones that lead to the error.

[0016] The success goals and failure goals of FIG. 2(b), created as described above, are sent to the non-sentence analysis unit 5, which applies the non-sentence analysis rules to adjacent categories in FIG. 2(b). FIG. 4 shows examples of non-sentence analysis rules. The arguments of rule A in FIG. 4, an adjacent pair of success goal DET and success goal NH, match success goal DET (= "a") and success goal NH (= "cold water") in the input, so rule A is applied with these as arguments. Success goal DET carries the attributes of "a", such as "countable" and "singular". Success goal NH inherits, as attributes of the whole goal, those of its main noun "water", so attributes such as "uncountable" are registered on it. Rule A refers to these attributes, and because "countable" and "uncountable" are detected respectively, "error 1" is judged to hold. Next, other interpretations are checked; it turns out, for instance, that "water" has no other sense in which it is a countable noun, so there is no alternative interpretation free of the contradiction. As a result, the identifier "error 1", success goal DET, and success goal NH are sent to the message creation unit 7.
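The countability check with its follow-up search for another interpretation might look like the sketch below. FIG. 4 is not reproduced in this text, so the function `rule_a`, the `Goal` record, and the second example (an NH that also has a countable reading) are assumptions for illustration only. The alternative-interpretation step follows the description: look for a goal with the same category and position but different attributes that does not contradict the article.

```python
from typing import NamedTuple


class Goal(NamedTuple):
    category: str
    start: int
    end: int
    attrs: frozenset


def contradicts(det, nh):
    """A countable article attached to an uncountable noun-phrase main part."""
    return "countable" in det.attrs and "uncountable" in nh.attrs


def rule_a(det, nh, all_goals):
    """Return "error 1" if DET and NH clash and no other reading of NH resolves it."""
    if not contradicts(det, nh):
        return None
    for alt in all_goals:
        same_slot = (alt.category, alt.start, alt.end) == (nh.category, nh.start, nh.end)
        if same_slot and alt.attrs != nh.attrs and not contradicts(det, alt):
            return None  # another reading of the same span is consistent
    return "error 1"


# "He drank a cold water": "water" has only an uncountable reading.
det = Goal("DET", 2, 3, frozenset({"countable", "singular"}))
nh = Goal("NH", 3, 5, frozenset({"uncountable"}))
result_water = rule_a(det, nh, [det, nh])

# Hypothetical contrast: the same span also has a countable reading,
# so the contradiction is not reported.
nh_alt = Goal("NH", 3, 5, frozenset({"countable", "singular"}))
result_ambiguous = rule_a(det, nh, [det, nh, nh_alt])
```

With these inputs `result_water` is "error 1" and `result_ambiguous` is `None`.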

[0017] The message creation unit 7 retrieves the message for the error type from the message dictionary 8 and embeds information from the success goals in it to make it easier to understand. For this sentence, the message "The countable/uncountable status of * and * is inconsistent." is retrieved from the message dictionary 8 for the identifier "error 1"; the word "a" from the input success goal DET and the main word "water" from success goal NH are embedded in it; and the message "The countable/uncountable status of "a" and "water" is inconsistent." is created and output to the user.
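Message composition from a dictionary template can be sketched as below. The dictionary contents are English renderings of the Japanese templates and are assumptions, as are the names `MESSAGE_DICTIONARY` and `create_message`; only the idea of filling "*" slots with words taken from the goals comes from the description.

```python
MESSAGE_DICTIONARY = {
    # English wording is an assumed translation of the original templates.
    "error 1": "The countable/uncountable status of * and * is inconsistent.",
    "error 3": "An article is missing before *",
}


def create_message(error_id, words):
    """Fill the '*' slots of the template for error_id with the given words, in order."""
    parts = MESSAGE_DICTIONARY[error_id].split("*")
    if len(parts) - 1 != len(words):
        raise ValueError("number of words does not match the template slots")
    out = [parts[0]]
    for word, tail in zip(words, parts[1:]):
        out.append(f'"{word}"')
        out.append(tail)
    return "".join(out)


message = create_message("error 1", ["a", "water"])
```

Splitting on the placeholder, rather than replacing it repeatedly, keeps a word that itself contains "*" from being mistaken for a slot.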

[0018] Next, a different behavior is briefly described using the input sentence "This is pen." Here the article "a" is missing immediately before "pen". Such an omission cannot be detected by word-level pattern matching. FIG. 5 shows the result of analysis by the syntax analysis unit 2. (Failure goals other than those in the figure are actually generated as well, but they are omitted here.) In the syntax analysis unit 2, an NP is predicted after "This is", but "pen" alone cannot form an NP, so a failure goal NP is created at this position. In the non-sentence analysis unit 5, rule B in FIG. 4 applies successfully with success goal NH and failure goal NP at the position of "pen" as arguments; "error 3" is detected and sent to the message creation unit 7. The message creation unit 7 retrieves the message for "error 3", "An article is missing before *", from the message dictionary 8, extracts the surface form "pen" from success goal NH, composes the message "An article is missing before "pen"", and outputs it to the user.
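A rule that combines a success goal with a failure goal can be sketched as follows. Since FIG. 4 is not reproduced here, this is only one plausible reading of rule B: it fires when a success goal NH and a failure goal NP begin at the same position. The `Goal` record, the function name, and the goal list for "This is pen" are assumptions, and a real rule would likely also check attributes (for example, that the noun is a singular countable noun).

```python
from typing import NamedTuple, Optional


class Goal(NamedTuple):
    category: str
    start: int
    end: Optional[int]   # None for failure goals, which record only a start
    success: bool


def missing_article_rule(goals, words):
    """Report "error 3" where a success NH and a failure NP start at the same word."""
    hits = []
    for nh in goals:
        if not (nh.success and nh.category == "NH"):
            continue
        for failed in goals:
            if (not failed.success and failed.category == "NP"
                    and failed.start == nh.start):
                surface = " ".join(words[nh.start:nh.end])
                hits.append(("error 3", surface))
    return hits


words = ["This", "is", "pen"]
goals = [
    Goal("PN", 0, 1, True),
    Goal("NP", 0, 1, True),
    Goal("BE", 1, 2, True),
    Goal("N", 2, 3, True),
    Goal("NH", 2, 3, True),
    Goal("NP", 2, None, False),   # NP was predicted after "This is" but failed
]
hits = missing_article_rule(goals, words)
```

For this input `hits` holds one entry pairing the identifier "error 3" with the surface form "pen".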

[0019] As a final example, consider the erroneous sentence "I went have been to the island." This assumes a situation in which the writer first began to write "I went to the island.", then tried to change "went" to "have been" but forgot to delete "went". FIG. 6 shows an example analysis for this case. Because an unexpected word comes immediately after "I went", every prediction at that position fails, so no success goals are created from "have" onward. Nor does combining success goals and failure goals reveal any definite error. When success goals are missing in this way, the non-sentence analysis unit 5 creates a special identifier, "detection failure", and sends it to the message creation unit 7 together with the range over which success goals exist. On receiving this, the message creation unit 7 composes and outputs a message pointing out the range that could not be analyzed: ""have been to the island." could not be analyzed". Thus, even when no definite error feature can be extracted, the user can still be given information about where the error lies.
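The fallback for the "detection failure" case can be sketched like this. The function name, the word-index spans, and the English message wording are assumptions; the sketch only illustrates deriving the unanalyzed range from the range covered by success goals.

```python
def detection_failure_message(words, success_spans):
    """Build the fallback message from the range covered by success goals.

    success_spans holds (start, end) word-index pairs, end exclusive.
    Everything after the furthest point reached is reported as unanalyzed.
    """
    covered_end = max((end for _, end in success_spans), default=0)
    unparsed = " ".join(words[covered_end:])
    return f'"{unparsed}" could not be analyzed'


words = ["I", "went", "have", "been", "to", "the", "island."]
# Hypothetical success goals: "I", "went", and "I went".
success_spans = [(0, 1), (1, 2), (0, 2)]
message = detection_failure_message(words, success_spans)
```

Because the parser in the embodiment grows trees from the start of the sentence, taking the furthest end position of any success goal is a reasonable stand-in for "the range over which success goals exist".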

[0020] Next, the operation of the priority processing unit 9 is described. In a syntax checking device, as the syntax analysis rules grow more ambiguous, or as the parts of speech and attributes of a single word do, the number of combinations of positions at which rules must be applied increases and processing takes longer. Adding the priority processing unit 9 addresses this and speeds up processing. The speed-up relies on two properties: "the larger a subtree, the more predictions succeeded while it was being built, and the more likely its structure is what the user intended", and the parse-tree property that "a large subtree inherits the important attributes of the smaller subtrees that make it up". Using these, subtrees are prioritized so that the error position is reached as early as possible, which shortens the average processing time.

[0021] First, the success goals and failure goals output from the syntax analysis unit 2 are input to the priority processing unit 9. The priority processing unit groups the input success goals by their start and end positions in the sentence: success goals whose start positions and end positions are both equal belong to the same group. "Sequences" are then formed from groups whose positions are adjacent. Group A and group B are adjacent when the end position of group A equals the start position of group B. A sequence is a chain of mutually adjacent groups. A sequence must start at the beginning of the sentence and must be built so that no group starts at its end position; anything that violates this is not counted as a sequence. The number of groups in a sequence is hereinafter called its "priority", and a smaller number is said to be a "higher priority". The priority processing unit 9 first extracts only the success goals belonging to the groups that make up the highest-priority sequences. Only the highest-priority sequences need to be built at this point; to do so, it suffices to extract only those groups "for which no other group has a range that completely contains their own" and to form sequences from combinations of the extracted groups alone. All the highest-priority sequences are among these. If there are several highest-priority sequences, success goals are extracted from all of them without duplication. Of the failure goals, only those whose start position coincides with the start position of one of the groups in the extracted sequences are extracted. The priority processing unit 9 keeps a record of which success goals and failure goals have been extracted. The extracted success goals and failure goals are sent to the non-sentence analysis unit 5, which takes them as input and performs the same processing as described above. If an error is detected, a message is output to the user and processing of the sentence ends; the priority processing unit 9 then also ends its processing of that sentence. If processing finishes without a contradiction being detected, the priority processing unit 9 is notified. In this case the success goals and failure goals are retained in the non-sentence analysis unit 5. On receiving the notification, the priority processing unit 9 continues.
It first checks whether any sequence can be formed whose priority value is the previous priority value plus one. With n as the previous priority, such sequences are obtained by collecting every sequence of priority n + 1 among those built previously and, in addition, generating every sequence obtained by splitting in two one of the groups of a previously built sequence of priority n. If sequences of this priority can be formed, the groups in them that have not been selected before are extracted, and the success goals in those groups, together with the failure goals whose start positions coincide with those groups, are sent to the non-sentence analysis unit 5. The non-sentence analysis unit 5 applies the rules to combinations of the newly input goals and the goals it already holds. A rule is applied only when a newly arrived goal is among its arguments, which prevents a rule from being applied again with exactly the same arguments as before. Processing then continues in the same way, lowering the priority one step at a time, until a contradiction is detected. If no contradiction is detected, processing may be cut off at a specific priority relative to the initial one.
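The first round of the priority processing can be sketched as below. This is a simplified illustration under stated assumptions, not the embodiment's implementation: groups are represented only by their (start, end) word-index spans, failure goals by (category, start) pairs, the function names are invented, and the later rounds (priority n + 1 by splitting a group) are omitted. The example spans for a six-word sentence are hypothetical.

```python
def maximal_groups(groups):
    """Keep the groups whose span is not completely contained in another group's span."""
    return {
        g for g in groups
        if not any(o != g and o[0] <= g[0] and g[1] <= o[1] for o in groups)
    }


def sequences(groups, sentence_start=0):
    """Enumerate chains of adjacent groups that start at the sentence start
    and end at a position where no group in `groups` begins.
    Spans are assumed to be non-empty (start < end)."""
    by_start = {}
    for g in groups:
        by_start.setdefault(g[0], []).append(g)

    def extend(chain):
        followers = by_start.get(chain[-1][1], [])
        if not followers:
            yield tuple(chain)
            return
        for nxt in sorted(followers):
            yield from extend(chain + [nxt])

    for first in sorted(by_start.get(sentence_start, [])):
        yield from extend([first])


def first_batch(success_spans, failure_goals):
    """Select the groups of the highest-priority sequences and the matching failure goals."""
    groups = set(success_spans)            # equal (start, end) means same group
    seqs = list(sequences(maximal_groups(groups)))
    if not seqs:
        return [], []
    best = min(len(s) for s in seqs)       # fewest groups = highest priority
    chosen = sorted({g for s in seqs if len(s) == best for g in s})
    starts = {g[0] for g in chosen}
    failures = [f for f in failure_goals if f[1] in starts]
    return chosen, failures


# Hypothetical spans for a six-word sentence (word indices 0..5).
success_spans = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4),
                 (4, 5), (5, 6), (4, 6), (3, 6)]
failure_goals = [("S", 0), ("VP", 3), ("NP", 5)]
chosen, failures = first_batch(success_spans, failure_goals)
```

With these inputs the three spans (0, 2), (2, 3), and (3, 6) are selected, giving a single sequence of priority 3, and only the failure goals starting at positions 0 and 3 are passed on.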

[0022] FIG. 7(a) shows the analysis result for the erroneous sentence "The stars are saw at night." The operation of the priority processing unit 9 is explained using this example. This sentence becomes correct if "saw" is changed to the past participle form "seen". FIG. 8 shows the success goals of FIG. 7(a) divided into groups. In FIG. 8, the only groups for which "no other group completely includes their range" are groups a, d, and e. Groups a and d, and d and e, are adjacent to each other, a has the beginning of the sentence as its start position, and there is no group whose start position is the end position of e (the end of the sentence), so "ade" is one of the columns with the highest priority. No combination of a, d, and e other than "ade" satisfies the conditions for a column, so this is the only column of "priority 3". In this way, the success goals and failure goals shown in FIG. 7(b) are selected first and sent to the non-sentence analysis unit 5. In the non-sentence analysis unit 5, a rule is applied between the BE formed by "are" and the VP formed by "saw at night", and an error is determined because the attribute of the VP is not past participle. Since an error has been determined, the processing of this sentence ends here. In this way, the goals to which rules are applied in the non-sentence analysis unit can be narrowed down.
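The rule applied between the BE and the VP in this example might be sketched in Python as follows. The `Goal` class, its `form` attribute, the word positions, and the message text are all assumptions made for illustration; the actual format of the non-sentence analysis rules is the one shown in FIG. 4, which this sketch does not reproduce. The sketch checks only the single condition stated in the text, namely that the VP following the BE is not a past participle.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Goal:
    """Hypothetical success goal: a category covering words [start, end)."""
    category: str
    start: int
    end: int
    form: str = ""  # assumed attribute, e.g. "past" or "past_participle"


def check_be_vp(be: Goal, vp: Goal) -> Optional[str]:
    """Return an error message if a BE is immediately followed by a VP
    whose form is not past participle; otherwise return None."""
    if be.category != "BE" or vp.category != "VP":
        return None                      # rule does not apply to these goals
    if be.end != vp.start:
        return None                      # the goals are not adjacent
    if vp.form == "past_participle":
        return None                      # e.g. "are seen at night"
    return (f"verb phrase at words {vp.start}-{vp.end} "
            f"following 'be' is not a past participle")


# "The stars are saw at night": are = words 2-3, "saw at night" = words 3-6
be_goal = Goal("BE", 2, 3)
vp_goal = Goal("VP", 3, 6, form="past")
message = check_be_vp(be_goal, vp_goal)
```

Here `message` is a non-empty string because the VP headed by "saw" has the assumed form "past"; replacing it with a VP of form "past_participle" (as for "seen") makes the function return `None`.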

[0023] As described above, according to the above embodiment, the target sentence is first parsed in such a way that the partial trees generated as intermediate results remain even if parsing of the sentence as a whole fails. Information about these partial trees is retained, and the reason why each tree could not grow any further is determined by referring to the attributes of that tree and of the other trees around it. Specifically, various kinds of errors are detected by preparing rules that describe errors humans are prone to make in terms of the relative positions of trees and the attributes of trees. Because errors are searched for in units of subtrees, variations in the pattern are absorbed by the diversity of subtree structures, so that more errors can be detected with fewer rules than with word-level matching, and accuracy can be greatly improved.

[0024] Furthermore, even when none of the registered error rules is hit, the failure of the parse shows that an error is contained somewhere in the sentence. Information about the position where the error occurs can therefore be provided to the user by, for example, examining over what ranges the trees generated up to that point are distributed.
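The text does not specify how the distribution of the generated trees is examined. One possible reading, offered purely as an assumption, is to report the words that no tree covers and the positions where the largest trees meet without having been combined. The function name `locate_suspects` and the `(start, end)` span representation (end exclusive) are hypothetical.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # hypothetical tree range: (start, end), end exclusive


def locate_suspects(spans: List[Span], n_words: int) -> Tuple[List[int], List[int]]:
    """Return (uncovered word positions, boundaries inside the sentence at
    which a maximal tree ends). Both are only hints about where the error
    may lie, not a definitive diagnosis."""
    # Keep trees whose range is not completely included in another tree's range.
    maximal = [
        s for s in spans
        if not any(t != s and t[0] <= s[0] and s[1] <= t[1] for t in spans)
    ]
    covered = set()
    for start, end in maximal:
        covered.update(range(start, end))
    uncovered = [i for i in range(n_words) if i not in covered]
    # A maximal tree that stops before the sentence end could not grow further there.
    boundaries = sorted({end for _, end in maximal if 0 < end < n_words})
    return uncovered, boundaries
```

For a five-word sentence with illustrative spans `[(0, 2), (0, 1), (1, 2), (3, 5)]`, the sketch reports word 2 as uncovered and position 2 as the point where the first maximal tree stopped growing.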

[0025] In addition, because the rules can be created on the basis of errors that humans are prone to make, the search space is far smaller than when every possible substitution, omission, and addition is considered, which has the advantage that dictionaries and syntax rules of a practical size can be handled.

【００２６】[0026]

[Effects of the Invention] As described above, according to the present invention, errors can be pointed out without being affected by variations in patterns such as modification, and kinds of errors that could not be handled conventionally, such as missing words, can also be pointed out. Moreover, even when detection of the error fails, information about the position of the error can be output, and the syntax check can be executed at high speed against a large number of rules and a large vocabulary.

[Brief description of drawings]

FIG. 1 is a block diagram showing the configuration of a syntax check device according to an embodiment of the present invention.

FIG. 2 (a) is a parsing diagram showing the partial syntax trees of the input sentence "He drank a cold water." (b) is a parsing diagram showing the success goals and failure goals of the input sentence "He drank a cold water."

FIG. 3 is a rule structure diagram showing an example of a syntax analysis rule.

FIG. 4 is a rule structure diagram showing an example of a non-sentence analysis rule.

FIG. 5 is a parsing diagram showing the success goals and failure goals of the input sentence "This is pen."

FIG. 6 is a parsing diagram showing the success goals and failure goals of the input sentence "I went have been to the island.."

FIG. 7 (a) is a parsing diagram showing the success goals and failure goals of the input sentence "The stars are saw at night." (b) is a parsing diagram showing the selected success goals and failure goals of the input sentence "The stars are saw at night."

FIG. 8 is a diagram showing the groups of success goals of the input sentence "The stars are saw at night." and their inclusion relationships.

[Explanation of symbols]

1 Sentence segmentation unit
2 Syntax analysis unit
3 Syntax analysis rules
4 Word dictionary
5 Non-sentence analysis unit
6 Non-sentence analysis rules
7 Message creation unit
8 Message dictionary
9 Priority processing unit

Continuation of front page: (72) Inventor: Hideko Mori, c/o Matsushita Electric Industrial Co., Ltd., 1006 Oaza Kadoma, Kadoma City, Osaka Prefecture

Claims

[Claims]

1. A syntax check device comprising: a sentence segmentation unit that segments an input document one sentence at a time; a syntax analysis unit that parses the segmented input sentence using syntax analysis rules and, when the input sentence contains an error and the parsing fails, outputs the subtrees built up to that point; a non-sentence analysis unit that detects the error in the input sentence from the subtrees output by the syntax analysis unit, using non-sentence analysis rules; and a message creation unit that outputs a message for the detected error.

2. The syntax check device according to claim 1, further comprising a priority processing unit that, from the set of substructures constituting the subtrees created by the syntax analysis unit, assigns a high priority to substructures not included in other subtrees and passes them to the non-sentence analysis unit.