JP2007316761A

JP2007316761A - Data processing device

Info

Publication number: JP2007316761A
Application number: JP2006143159A
Authority: JP
Inventors: Kyoko Makino; 恭子牧野; Toshiyuki Kano; 敏行加納; Hiroshi Taira; 博司平; Kunitake So; 国威祖; Shigeru Matsumoto; 茂松本
Original assignee: Toshiba Corp; Toshiba Solutions Corp
Current assignee: Toshiba Corp; Toshiba Digital Solutions Corp
Priority date: 2006-05-23
Filing date: 2006-05-23
Publication date: 2007-12-06
Anticipated expiration: 2026-05-23
Also published as: JP5095128B2

Abstract

PROBLEM TO BE SOLVED: To determine whether a necessary description is missing from document data.
SOLUTION: A morphological analysis unit 3 performs morphological analysis on input document data stored in a storage device 2 and recognizes each phrase in the document data together with its part-of-speech information. A syntax analysis unit 4 receives the result of the morphological analysis, performs syntax analysis on the phrases, and generates, as the syntax analysis result, a syntax tree indicating the grammatical relationships among the phrases. When no phrase corresponding to the root of the syntax tree is found, a determination unit 6 determines that a sentence in the input document data is defective, that is, that a phrase needed for the sentence to be grammatically correct is not described. In this case, the determination unit 6 generates a message indicating that a description needed to form a grammatically correct sentence is missing, and outputs the message to an output device 8.
COPYRIGHT: (C)2008, JPO&INPIT

Description

The present invention relates to a data processing apparatus that analyzes document data.

Conventionally, to support the creation of document data, there are techniques that identify portions that may have been input erroneously from the sequence of parts of speech of the expressions constituting a sentence, and present those portions to the user.
Further, as disclosed in Patent Document 1, for example, there are techniques that present portions of the description in input document data that lack concreteness. In that document, when the description in the input document data contains a predetermined feature portion, it is determined whether the sentence having that feature portion contains the 5W1H elements (when, where, and so on) that must not be omitted if the sentence is to be concrete.
Patent Document 1: JP 2002-183117 A

However, as described above, the technique of detecting erroneous input from the sequence of parts of speech does not analyze the meaning of words. Therefore, when the sequence of parts of speech is correct but the meaning is insufficient, for example when 「セクタは認めません。」 ("No sector is observed.") is written where 「セクタ不良は認めません。」 ("No sector failure is observed.") should have been written, the technique cannot point this out.

Further, as described above, a sentence lacking concreteness can be checked by determining whether a sentence having a predetermined feature in the document data contains the 5W1H elements. However, this only determines whether the sentence is concrete; it does not determine whether an element required in the sentence, such as a subject or a predicate, is missing.

Accordingly, an object of the present invention is to provide a data processing apparatus capable of determining that a necessary description is missing from document data.

That is, a data processing apparatus according to the present invention determines, based on the syntax analysis result of input document data, whether the description of an expression necessary for the input document data is missing, and outputs the determination result.

The data processing apparatus according to the present invention can determine that a necessary description is missing from document data.

Embodiments of the present invention will be described below with reference to the drawings.
(First Embodiment)
First, a first embodiment of the present invention will be described.
FIG. 1 is a block diagram showing a configuration example of a document data processing apparatus according to the first embodiment of the present invention.

As shown in FIG. 1, the document data processing apparatus according to the embodiment of the present invention includes a control unit 1 that controls the entire apparatus, a storage device 2, a morphological analysis unit 3, a syntax analysis unit 4, an extraction processing unit 5, a determination unit 6, an input device 7, an output device 8, and an input/output interface 9, which are connected to one another via a bus 10.

The storage device 2 is constituted by hardware such as a hard disk drive or a nonvolatile memory device. In addition to storing the control program to be executed by the control unit 1, the storage device 2 functions as a work memory for the various processes performed by the control unit 1.

The morphological analysis unit 3 reads the document data stored in the storage device 2 and performs morphological analysis on the document data to extract each phrase.
The syntax analysis unit 4 performs syntax analysis on the phrases extracted by the morphological analysis unit 3 and analyzes the relationships among the phrases.

The extraction processing unit 5 extracts the processing result of the syntax analysis unit 4. Based on the extraction result of the extraction processing unit 5, the determination unit 6 determines whether a word necessary for the document data is described in the input document data. The extraction processing unit 5 is not limited to determining whether a necessary word is described; for example, it may determine whether a predetermined amount of text is described.

The input device 7 is, for example, a keyboard or a mouse, and accepts operations such as those for creating new document data. The output device 8 is, for example, a display device. The input/output interface 9 can be connected via a cable to an external storage device (not shown), and inputs and outputs document data to and from the external storage device.

Next, the operation of the document data processing apparatus configured as shown in FIG. 1 will be described.
FIG. 2 is a flowchart showing an example of the processing operation of the document data processing apparatus according to the first embodiment of the present invention.

First, when document data is created by operating the input device 7 of the document data processing apparatus, the control unit 1 stores the document data in the storage device 2. Likewise, when already-created document data stored in the external storage device is input through the input/output interface 9, the control unit 1 stores the document data in the storage device 2 (step S1).

Here, it is assumed that the input document data contains a first sentence, 「前回と比較して改善が認められます。」 ("Improvement is observed compared with the previous time."), and a second sentence, 「前回と比較し。」 ("Compared with the previous time."). The first and second sentences described here are portions of the input document data delimited by periods, and each may or may not be a grammatically correct sentence. First, the analysis of the first sentence will be described.

The morphological analysis unit 3 performs morphological analysis on the input document data stored in the storage device 2 and recognizes each phrase in the document data together with the part-of-speech information of that phrase (step S2).
FIG. 3 is a diagram showing the morphological analysis result of the first sentence in the document data input to the document data processing apparatus according to the first embodiment of the present invention.
Then, the syntax analysis unit 4 receives the morphological analysis result shown in FIG. 3, performs syntax analysis on the phrases, and generates, as the syntax analysis result, a syntax tree indicating the grammatical relationships among the phrases (step S3).

FIG. 4 is a diagram showing the syntax analysis result of the first sentence in the document data input to the document data processing apparatus according to the first embodiment of the present invention.
As shown in FIG. 4, in the syntax tree that is the syntax analysis result of the first sentence, the phrases 「比較して」 ("compared") and 「改善が」 ("improvement") are linked to the phrase 「認められます」 ("is observed"), and 「前回と」 ("with the previous time") is linked to 「比較して」.

The extraction processing unit 5 receives the syntax analysis result shown in FIG. 4 and, if there is a phrase corresponding to the root, that is, the single most basal element of the syntax tree indicated by the analysis result, extracts that phrase. In the first sentence, 「認められます」 is the phrase corresponding to the root.

When the extraction processing unit 5 extracts the phrase corresponding to the root from the syntax analysis result of the first sentence, the determination unit 6 determines that the first sentence in the input document data has no defect, that is, that the phrase needed for the first sentence to be grammatically correct is described (NO in step S4). In this case, no further processing is performed on the parsed first sentence.

Next, the analysis of the second sentence described above will be described. FIG. 5 is a diagram showing the morphological analysis result of the second sentence in the document data input to the document data processing apparatus according to the first embodiment of the present invention. FIG. 6 is a diagram showing the syntax analysis result of the second sentence in the same document data.

As shown in FIG. 6, in the syntax tree that is the syntax analysis result of the second sentence, 「比較して」 is linked to the phrase 「前回と」, but there is no phrase corresponding to the root. Therefore, the extraction processing unit 5 does not extract any phrase. When the extraction processing unit 5 cannot extract a phrase corresponding to the root of the second sentence, the determination unit 6 determines that the second sentence in the input document data is defective, that is, that the phrase needed for it to be grammatically correct is not described (YES in step S4).

In this case, the determination unit 6 generates a message indicating that the second sentence in the input document data lacks a description needed for it to be grammatically correct, and causes the output device 8 to output the message (step S5).
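
The root check of steps S4 and S5 can be illustrated with a minimal Python sketch. The `Phrase` class, the `can_be_root` flag, and the head indices below are assumptions made for illustration only; the patent does not specify how the syntax tree is represented or how the syntax analysis unit 4 decides that a phrase can serve as the root.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Phrase:
    text: str                  # surface form of the phrase
    head: Optional[int]        # index of the phrase this one attaches to, or None
    can_be_root: bool = False  # hypothetical parser flag: phrase can close a sentence


def find_root(phrases: List[Phrase]) -> Optional[Phrase]:
    """Return the phrase acting as the root of the syntax tree, if any."""
    for phrase in phrases:
        if phrase.head is None and phrase.can_be_root:
            return phrase
    return None


def check_sentence(sentence: str, phrases: List[Phrase]) -> Optional[str]:
    """Return a warning message when no root phrase exists (YES in S4, then S5)."""
    if find_root(phrases) is None:
        return (f"'{sentence}': a description needed to form a "
                "grammatically correct sentence is missing.")
    return None


# Assumed parse of the first sentence: the last phrase is the root.
first = [
    Phrase("前回と", head=1),
    Phrase("比較して", head=3),
    Phrase("改善が", head=3),
    Phrase("認められます", head=None, can_be_root=True),
]
# Assumed parse of the second sentence: no phrase qualifies as the root.
second = [
    Phrase("前回と", head=1),
    Phrase("比較し", head=None),
]

print(check_sentence("前回と比較して改善が認められます。", first))
print(check_sentence("前回と比較し。", second))
```

In this sketch the first sentence yields no message, while the second yields a warning corresponding to the message output in step S5.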

FIG. 7 is a diagram showing an example of an error output screen output by the document data processing apparatus according to the first embodiment of the present invention.
By checking the screen shown in FIG. 7, the user can easily recognize the portion of the input document data where a necessary description is missing.

As described above, the document data processing apparatus according to the first embodiment of the present invention analyzes the structure of each portion of the input document data delimited by periods and, when it determines from the analysis result that the delimited portion lacks a description needed to form a grammatically correct sentence, outputs the determination result. The apparatus can thus point out to the user the portions of the input document data where a necessary description is missing.

(Second Embodiment)
Next, a second embodiment of the present invention will be described. The configuration of the document data processing apparatus according to this embodiment is basically the same as that shown in FIG. 1, so description of the identical parts is omitted.
The document data processing apparatus according to the second embodiment determines whether a necessary dependency expression is missing from the document data.

FIG. 8 is a diagram showing, in table form, an example of a condition definition table stored in the storage device of the document data processing apparatus according to the second embodiment of the present invention.
The storage device 2 of the document data processing apparatus according to the second embodiment stores a condition definition table in the format shown in FIG. 8. In this table, a first expression, which is a predetermined independent word in the document data, is managed in association with a second expression, which is an independent word that should be described as a correct dependency when the first expression is described in the document data. In the condition definition table, one or more kinds of second expressions are associated with a single first expression.

When a first expression in the condition definition table is described in the input document data, the document data processing apparatus according to the second embodiment determines whether a second expression associated with that first expression is described in the same sentence as a dependency of the first expression. In this way, it determines whether an expression that should be described as a dependency of a predetermined expression in the input document data is actually described.

Next, the processing operation of the document data processing apparatus according to the second embodiment of the present invention will be described. FIG. 9 is a flowchart showing an example of this processing operation.

Here, it is assumed that the input document data contains a third sentence, 「セクタの不良は発見できませんでした。」 ("No failure of the sector could be found."), a fourth sentence, 「不良セクタは発見できませんでした。」 ("No bad sector could be found."), a fifth sentence, 「セクタの異常は発見できませんでした。」 ("No abnormality of the sector could be found."), and a sixth sentence, 「セクタは発見できませんでした。」 ("No sector could be found.").
The processing from step S11 to step S13 shown in FIG. 9 is the same as the processing from step S1 to step S3 described in the first embodiment.

FIG. 10 is a diagram showing the morphological analysis result of the third sentence in the document data input to the document data processing apparatus according to the second embodiment of the present invention. FIG. 11 is a diagram showing the syntax analysis result of the third sentence in the same document data.

As shown in FIG. 11, in the syntax tree that is the syntax analysis result of the third sentence, 「不良は」 ("failure") is linked to the phrase 「セクタの」 ("of the sector"), and 「発見できませんでした」 ("could not be found") is linked to 「不良は」.

FIG. 12 is a diagram showing the morphological analysis result of the fourth sentence in the document data input to the document data processing apparatus according to the second embodiment of the present invention. FIG. 13 is a diagram showing the syntax analysis result of the fourth sentence in the same document data.

As shown in FIG. 13, in the syntax tree that is the syntax analysis result of the fourth sentence, 「セクタは」 ("sector") is linked to the phrase 「不良」 ("bad"), and 「発見できませんでした」 is linked to 「セクタは」.

FIG. 14 is a diagram showing the morphological analysis result of the fifth sentence in the document data input to the document data processing apparatus according to the second embodiment of the present invention. FIG. 15 is a diagram showing the syntax analysis result of the fifth sentence in the same document data.

As shown in FIG. 15, in the syntax tree that is the syntax analysis result of the fifth sentence, 「異常は」 ("abnormality") is linked to the phrase 「セクタの」, and 「発見できませんでした」 is linked to 「異常は」.

FIG. 16 is a diagram showing the morphological analysis result of the sixth sentence in the document data input to the document data processing apparatus according to the second embodiment of the present invention. FIG. 17 is a diagram showing the syntax analysis result of the sixth sentence in the same document data.
As shown in FIG. 17, in the syntax tree that is the syntax analysis result of the sixth sentence, 「発見できませんでした」 is linked to the phrase 「セクタは」.

The extraction processing unit 5 receives each of the syntax analysis results shown in FIGS. 11, 13, 15, and 17, and collates the independent words in the phrases corresponding to the elements of the syntax tree with the condition definition table stored in the storage device 2, thereby searching the condition definition table for a first expression identical to a phrase in the syntax tree. Here, for each of the syntax analysis results, the independent word to be searched for in the condition definition table is 「セクタ」 ("sector").

Then, based on the links between the independent-word parts of the syntax tree in the syntax analysis result, the extraction processing unit 5 extracts from the syntax tree the independent words that are in a dependency relationship with the first expression that was found.
Specifically, among the independent words in the third sentence, the independent word in a dependency relationship with 「セクタ」, the independent word to be extracted described above, is 「不良」 ("failure"). Among the independent words in the fourth sentence, those in a dependency relationship with 「セクタ」 are 「不良」 and 「発見」 ("find").

Among the independent words in the fifth sentence, the independent word in a dependency relationship with 「セクタ」, the independent word to be extracted described above, is 「異常」 ("abnormality"). Among the independent words in the sixth sentence, the independent word in a dependency relationship with 「セクタ」, the independent word to be extracted from the condition definition table, is 「発見」.
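
The extraction of dependency partners described above can be sketched as follows. This is a minimal illustration, assuming that each syntax tree node is reduced to its independent word plus the index of the node it is linked to, and that the partners of a word are the independent words of the nodes directly linked to it. The tree encodings and the function name `dependency_partners` are hypothetical and are not taken from the patent figures.

```python
from typing import List, Optional, Tuple

# Each node: (independent word of the phrase, index of the node it links to or None).
Node = Tuple[str, Optional[int]]


def dependency_partners(tree: List[Node], target: str) -> List[str]:
    """Collect independent words of nodes directly linked to the target word."""
    partners: List[str] = []
    for index, (word, head) in enumerate(tree):
        if word != target:
            continue
        # Nodes that link to the target node.
        partners.extend(w for w, h in tree if h == index)
        # The node that the target node links to.
        if head is not None:
            partners.append(tree[head][0])
    return partners


# Assumed encodings of the syntax trees of the third to sixth sentences.
third = [("セクタ", 1), ("不良", 2), ("発見", None)]
fourth = [("不良", 1), ("セクタ", 2), ("発見", None)]
fifth = [("セクタ", 1), ("異常", 2), ("発見", None)]
sixth = [("セクタ", 1), ("発見", None)]

for name, tree in [("third", third), ("fourth", fourth),
                   ("fifth", fifth), ("sixth", sixth)]:
    print(name, dependency_partners(tree, "セクタ"))
```

Under these assumed encodings, the sketch reproduces the partners listed in the text: 「不良」 for the third sentence, 「不良」 and 「発見」 for the fourth, 「異常」 for the fifth, and 「発見」 for the sixth.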

The determination unit 6 receives the combination of independent words extracted by the extraction processing unit 5 and collates the combination with the condition definition table stored in the storage device 2. When a first expression in the condition definition table is one of the extracted independent words and a second expression associated with that first expression is the other word of the combination, the determination unit 6 determines that a correct dependency expression is described in the sentence containing these independent words (NO in step S14).

Specifically, the determination unit 6 determines that 「セクタ」, a first expression in the condition definition table shown in FIG. 8, and 「不良」, a second expression that is a correct dependency of 「セクタ」, are described in the third and fourth sentences, and that 「セクタ」 and 「異常」, a correct dependency of 「セクタ」, are described in the fifth sentence. In this case, no further processing is performed on these sentences.

On the other hand, the determination unit 6 collates the combination of independent words from the extraction processing unit 5 with the condition definition table stored in the storage device 2. When one word of the combination, that is, a phrase corresponding to a second expression associated with the first expression in the condition definition table, is not included in the syntax tree being analyzed, the determination unit 6 determines that a sentence in the input document data that contains a predetermined expression managed in the condition definition table lacks a description of the correct dependency of that expression (YES in step S14).
Specifically, the determination unit 6 determines that the correct dependency of 「セクタ」, which appears in the sixth sentence and is a first expression in the condition definition table shown in FIG. 8, is not described in the sixth sentence.
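
The determination of step S14 can be sketched in Python as a lookup against the condition definition table. The table contents below are inferred from the text (「セクタ」 associated with 「不良」 and 「異常」); the actual contents of FIG. 8, the function name `judge`, and the message wording are assumptions for illustration.

```python
from typing import Dict, Iterable, Optional, Set

# Hypothetical condition definition table: first expression -> allowed second expressions.
CONDITION_TABLE: Dict[str, Set[str]] = {
    "セクタ": {"不良", "異常"},
}


def judge(label: str, first_expression: str, partners: Iterable[str],
          table: Dict[str, Set[str]] = CONDITION_TABLE) -> Optional[str]:
    """Return a warning if no registered second expression accompanies the first."""
    required = table.get(first_expression)
    if required is None:
        return None  # the word is not managed in the table; nothing to check
    if required & set(partners):
        return None  # a correct dependency is described (NO in step S14)
    # No registered second expression found (YES in step S14, then step S15).
    return (f"{label}: the expression '{first_expression}' lacks a description "
            "of its correct dependency.")


print(judge("third sentence", "セクタ", ["不良"]))
print(judge("fourth sentence", "セクタ", ["不良", "発見"]))
print(judge("fifth sentence", "セクタ", ["異常"]))
print(judge("sixth sentence", "セクタ", ["発見"]))
```

Only the sixth sentence produces a message in this sketch, which matches the outcome described for the determination unit 6.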

In this case, the determination unit 6 generates a message indicating that the sixth sentence in the input document data lacks a description of an expression that is a correct dependency of the expression contained in that sentence and managed in the condition definition table, and causes the output device 8 to output the message (step S15).
FIG. 18 is a diagram showing an example of an error output screen output by the document data processing apparatus according to the second embodiment of the present invention.

As described above, the document data processing apparatus according to the second embodiment of the present invention analyzes the structure of each sentence of the input document data and, when it determines that an analyzed sentence lacks a description of the correct dependency of a predetermined expression, outputs the determination result. The apparatus can thus point out to the user that a dependency description is missing from the input document data.

Next, a first modification of the second embodiment of the present invention will be described. In this modification, whether a dependency description is missing from the input document data can be determined without the syntax analysis unit 4 performing syntax analysis on the input document data.

In this modification, the storage device 2 further stores an extraction definition table. FIG. 19 is a diagram showing, in table form, an example of the extraction definition table stored in the storage device of the document data processing apparatus according to the first modification of the second embodiment of the present invention.

In this extraction definition table, first independent word information and second independent word information are managed in association with each other. For some pairs of first and second independent word information, information on the attached word that appears between the first and second independent words is also associated and managed. The independent word information may be a part-of-speech name, as shown in FIG. 19, or a specific independent word.

The process of extracting a dependency expression using this table will now be described. The extraction processing unit 5 collates the part-of-speech information of the words indicated by the morphological analysis result of the input document data from the morphological analysis unit 3 with the extraction definition table stored in the storage device 2. It thereby searches the combinations of independent words and attached words defined in the extraction definition table for one that is identical to a combination of part-of-speech information indicated by the morphological analysis result.

For example, when the dependency expression is to be extracted from the third sentence described above, the morphological analysis result of this sentence is the one shown in FIG. 10. The word combination 「セクタ［名詞］-の［助詞］／不良［名詞］」 (sector [noun] - の [particle] / failure [noun]) indicated by this result matches the condition defined in the second row from the top of the extraction definition table shown in FIG. 19, so the dependency expression in the third sentence is 「セクタの不良」 ("failure of the sector").

When the extraction target is the fourth sentence described above, the morphological analysis result of this sentence is the one shown in FIG. 12. The word combination 「不良［名詞］／セクタ［名詞］」 (bad [noun] / sector [noun]) indicated by this result matches the condition defined in the top row of the extraction definition table shown in FIG. 19, so the dependency expression in the fourth sentence is 「不良セクタ」 ("bad sector").

When the extraction target is the fifth sentence described above, the morphological analysis result of this sentence is the one shown in FIG. 14. The word combination 「セクタ［名詞］-の［助詞］／異常［名詞］」 (sector [noun] - の [particle] / abnormality [noun]) indicated by this result matches the condition defined in the second row from the top of the extraction definition table shown in FIG. 19, so the dependency expression in the fifth sentence is 「セクタの異常」 ("abnormality of the sector").

When the extraction target is the sixth sentence described above, the morphological analysis result of this sentence is the one shown in FIG. 16. None of the word combinations indicated by this result matches a condition in the extraction definition table shown in FIG. 19, so the dependency expression in the sixth sentence is "none". Thereafter, the extraction processing unit 5 and the determination unit 6 determine whether a correct dependency description is present by collating this dependency expression with a condition definition table such as that shown in FIG. 8.
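
The pattern matching of the first modification can be sketched as follows. The two table rows are inferred from the text (a noun directly followed by a noun, and a noun followed by the particle 「の」 and another noun); the tokenization, the part-of-speech labels, and the function name `extract_dependency` are assumptions, since FIGS. 10 to 19 are not reproduced here.

```python
from typing import List, Optional, Tuple

Token = Tuple[str, str]  # (surface form, part of speech)

# Hypothetical extraction definition table rows:
# (part of speech of first word, particle between them or None, part of speech of second word)
EXTRACTION_TABLE: List[Tuple[str, Optional[str], str]] = [
    ("noun", None, "noun"),
    ("noun", "の", "noun"),
]


def extract_dependency(tokens: List[Token]) -> Optional[str]:
    """Return the first token span matching a table row, or None if none matches."""
    for first_pos, particle, second_pos in EXTRACTION_TABLE:
        width = 2 if particle is None else 3
        for start in range(len(tokens) - width + 1):
            window = tokens[start:start + width]
            if window[0][1] != first_pos or window[-1][1] != second_pos:
                continue
            if particle is not None and window[1] != (particle, "particle"):
                continue
            return "".join(surface for surface, _ in window)
    return None


# Assumed tokenization of the tail shared by all four sentences.
TAIL = [("は", "particle"), ("発見", "noun"), ("でき", "verb"),
        ("ません", "auxiliary"), ("でした", "auxiliary")]

third = [("セクタ", "noun"), ("の", "particle"), ("不良", "noun")] + TAIL
fourth = [("不良", "noun"), ("セクタ", "noun")] + TAIL
sixth = [("セクタ", "noun")] + TAIL

print(extract_dependency(third))
print(extract_dependency(fourth))
print(extract_dependency(sixth))
```

With these assumed tokens, the third sentence matches the second row, the fourth sentence matches the first row, and the sixth sentence matches neither, which corresponds to the "none" result in the text.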

Next, a second modification of the second embodiment of the present invention will be described. In this modification, whether a dependency description is missing from the input document data can be determined without using either the syntax analysis by the syntax analysis unit 4 or the extraction definition table described in the first modification.

Based on the part-of-speech information in the morphological analysis result that the morphological analysis unit 3 produces for the input document data, the extraction processing unit 5 extracts from that result one or more combinations that include a first expression in the condition definition table shown in FIG. 8.

When the sentence to be extracted is the third sentence described above, the combinations of independent words that include "sector", a first expression in the condition definition table shown in FIG. 8, are "sector-bad", "sector-discovery", and "sector-can".

When the sentence to be extracted is the fourth sentence described above, the combinations of independent words that include "sector", a first expression in the condition definition table shown in FIG. 8, are "bad-sector", "sector-discovery", and "sector-can".

When the sentence to be extracted is the fifth sentence described above, the combinations of independent words that include "sector", a first expression in the condition definition table shown in FIG. 8, are "sector-abnormality", "sector-discovery", and "sector-can".

When the sentence to be extracted is the sixth sentence described above, the combinations of independent words that include "sector", a first expression in the condition definition table shown in FIG. 8, are "sector-discovery" and "sector-can".

Thereafter, the extraction processing unit 5 and the determination unit 6 need only collate these combinations of independent words with a condition definition table such as the one shown in FIG. 8 to determine whether a correct dependency is described.
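
The second modification can be sketched in the same spirit. The code below is an illustration under stated assumptions, not the patented implementation: the names `independent_word_pairs` and `has_correct_dependency`, the set of parts of speech treated as dependent words, and the token sequences are hypothetical. From the description it takes only the first expression セクタ ("sector"), its second expressions 異常 ("abnormality") and 不良 ("bad"), and the pairs listed for each sentence.

```python
from typing import Dict, List, Set, Tuple

Token = Tuple[str, str]  # (surface form, part of speech)

# First expression -> second expressions that form a correct dependency with it
# (the content attributed to the condition definition table of FIG. 8).
CONDITION_DEFINITION_TABLE: Dict[str, Set[str]] = {"セクタ": {"異常", "不良"}}

# Assumption: words with these parts of speech are not independent words.
DEPENDENT_POS = {"particle", "auxiliary verb"}


def independent_word_pairs(tokens: List[Token], first_expression: str) -> List[Tuple[str, str]]:
    """Pair the first expression with every other independent word, in sentence order."""
    words = [surface for surface, pos in tokens if pos not in DEPENDENT_POS]
    if first_expression not in words:
        return []
    anchor = words.index(first_expression)
    pairs = []
    for i, word in enumerate(words):
        if i < anchor:
            pairs.append((word, first_expression))
        elif i > anchor:
            pairs.append((first_expression, word))
    return pairs


def has_correct_dependency(tokens: List[Token], first_expression: str) -> bool:
    """True if some pair joins the first expression with one of its second expressions."""
    allowed = CONDITION_DEFINITION_TABLE.get(first_expression, set())
    for left, right in independent_word_pairs(tokens, first_expression):
        partner = right if left == first_expression else left
        if partner in allowed:
            return True
    return False


# Assumed token sequences; only the resulting pairs come from the text.
third_sentence = [("セクタ", "noun"), ("に", "particle"), ("不良", "noun"),
                  ("が", "particle"), ("発見", "noun"), ("できる", "verb")]
sixth_sentence = [("セクタ", "noun"), ("が", "particle"),
                  ("発見", "noun"), ("できる", "verb")]
```

Because the pairing ignores syntactic structure, this variant trades precision for simplicity: any second expression appearing in the same sentence as the first expression is accepted as its dependency.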

FIG. 20 is a diagram showing a modification of the error output screen output by the document data processing apparatus according to the second embodiment of the present invention.
When the document data processing apparatus according to the second embodiment of the present invention determines that a correct dependency is not described in the input document data, and the document data contains an expression that is a first expression in the condition definition table, the apparatus may highlight those occurrences of the expression for which a dependency expression should be added, for example by underlining them as shown in FIG. 20. It may also display a balloon attached to the highlighted portion, search the condition definition table for candidate expressions to be added, and display those candidates in the balloon.

For example, when the sentence to be judged is the sixth sentence, the expression in this sentence that matches a first expression in the condition definition table is "sector", and the sixth sentence contains no expression that forms a dependency with it. The second expressions associated with "sector" in the condition definition table shown in FIG. 8 are "abnormality" (異常) and "bad" (不良).

Based on these second expressions, the determination unit 6 presents the following as messages indicating that descriptive information is missing: "Insert 'の異常' (abnormality)", "Insert 'の不良' (bad)", and "Leave as is without correction". When one of the insertion messages is selected by an operation on the input device 7, the control unit 1 inserts the expression corresponding to the selected message into the document data.
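
The generation of these messages and the insertion of the selected expression can be sketched as follows. This is a hedged illustration only: the function names `build_choices` and `apply_choice`, the list-based table, and the sample sentence are assumptions. The message wording and the second expressions 異常 and 不良 are taken from the description; where in the sentence the control unit 1 actually inserts the text is not specified there, so inserting directly after the first occurrence of the first expression is an assumption.

```python
from typing import Dict, List, Optional

# First expression -> candidate second expressions, in presentation order
# (the content attributed to the condition definition table of FIG. 8).
CONDITION_DEFINITION_TABLE: Dict[str, List[str]] = {"セクタ": ["異常", "不良"]}

KEEP_AS_IS = "このまま修正しない"  # "Leave as is without correction"


def build_choices(first_expression: str) -> List[str]:
    """Build one insertion message per second expression, plus the keep-as-is option."""
    seconds = CONDITION_DEFINITION_TABLE.get(first_expression, [])
    choices = [f"『の{second}』を挿入する" for second in seconds]
    choices.append(KEEP_AS_IS)
    return choices


def apply_choice(text: str, first_expression: str, second_expression: Optional[str]) -> str:
    """Insert の + second expression after the first expression (assumed position).

    Passing None corresponds to the keep-as-is option and returns the text unchanged.
    """
    if second_expression is None:
        return text
    position = text.find(first_expression)
    if position < 0:
        return text
    end = position + len(first_expression)
    return text[:end] + "の" + second_expression + text[end:]


# Hypothetical stand-in for the sixth sentence, which is not quoted in this section.
sample_sentence = "セクタが発見できる"
corrected = apply_choice(sample_sentence, "セクタ", "異常")
```

With the hypothetical sample sentence, selecting the first message would yield a sentence that contains the dependency expression セクタの異常.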

The present invention is not limited to the embodiments exactly as described above; at the implementation stage, the constituent elements can be modified and embodied without departing from the gist of the invention. Various inventions can also be formed by appropriately combining the plurality of constituent elements disclosed in the embodiments. For example, some constituent elements may be omitted from the full set shown in an embodiment. Furthermore, constituent elements from different embodiments may be combined as appropriate.

FIG. 1 is a block diagram showing a configuration example of the document data processing apparatus according to the first embodiment of the present invention.
FIG. 2 is a flowchart showing an example of the processing operation of the document data processing apparatus according to the first embodiment of the present invention.
FIG. 3 is a diagram showing the morphological analysis result of the first sentence in the document data input by the document data processing apparatus according to the first embodiment of the present invention.
FIG. 4 is a diagram showing the syntax analysis result of the first sentence in the document data input by the document data processing apparatus according to the first embodiment of the present invention.
FIG. 5 is a diagram showing the morphological analysis result of the second sentence in the document data input by the document data processing apparatus according to the first embodiment of the present invention.
FIG. 6 is a diagram showing the syntax analysis result of the second sentence in the document data input by the document data processing apparatus according to the first embodiment of the present invention.
FIG. 7 is a diagram showing an example of the error output screen output by the document data processing apparatus according to the first embodiment of the present invention.
FIG. 8 is a diagram showing, in table form, an example of the condition definition table stored in the storage device of the document data processing apparatus according to the second embodiment of the present invention.
FIG. 9 is a flowchart showing an example of the processing operation of the document data processing apparatus according to the second embodiment of the present invention.
FIG. 10 is a diagram showing the morphological analysis result of the third sentence in the document data input by the document data processing apparatus according to the second embodiment of the present invention.
FIG. 11 is a diagram showing the syntax analysis result of the third sentence in the document data input by the document data processing apparatus according to the second embodiment of the present invention.
FIG. 12 is a diagram showing the morphological analysis result of the fourth sentence in the document data input by the document data processing apparatus according to the second embodiment of the present invention.
FIG. 13 is a diagram showing the syntax analysis result of the fourth sentence in the document data input by the document data processing apparatus according to the second embodiment of the present invention.
FIG. 14 is a diagram showing the morphological analysis result of the fifth sentence in the document data input by the document data processing apparatus according to the second embodiment of the present invention.
FIG. 15 is a diagram showing the syntax analysis result of the fifth sentence in the document data input by the document data processing apparatus according to the second embodiment of the present invention.
FIG. 16 is a diagram showing the morphological analysis result of the sixth sentence in the document data input by the document data processing apparatus according to the second embodiment of the present invention.
FIG. 17 is a diagram showing the syntax analysis result of the sixth sentence in the document data input by the document data processing apparatus according to the second embodiment of the present invention.
FIG. 18 is a diagram showing an example of the error output screen output by the document data processing apparatus according to the second embodiment of the present invention.
FIG. 19 is a diagram showing, in table form, an example of the extraction definition table stored in the storage device of the document data processing apparatus according to the first modification of the second embodiment of the present invention.
FIG. 20 is a diagram showing a modification of the error output screen output by the document data processing apparatus according to the second embodiment of the present invention.

Explanation of symbols

1: control unit, 2: storage device, 3: morphological analysis unit, 4: syntax analysis unit, 5: extraction processing unit, 6: determination unit, 7: input device, 8: output device, 9: input/output interface, 10: bus.

Claims

A data processing apparatus comprising:
document input means for inputting document data;
syntax analysis means for performing syntax analysis of the document data input by the document input means;
determination means for determining, based on the syntax analysis result from the syntax analysis means, whether the input document data lacks a description of a necessary expression; and
output means for outputting the determination result from the determination means.

The data processing apparatus according to claim 1, wherein the determination means determines, based on the syntax analysis result from the syntax analysis means, whether a portion delimited by punctuation marks in the document data input by the document input means lacks a description of an expression required for that portion to hold as a grammatical sentence.

A data processing apparatus comprising:
document input means for inputting document data;
storage means for storing first expression information in association with second expression information that is a correct dependency of the first expression information;
determination means for determining, when the first expression information stored in the storage means is described in the document data input by the document input means, whether the second expression information stored in the storage means in association with that first expression information is described in the input document data as dependency expression information of the first expression information; and
output means for outputting the determination result from the determination means.

The data processing apparatus according to claim 3, wherein, when it is determined that the second expression information stored in the storage means in association with the first expression information is not described in the input document data as dependency expression information of the first expression information, the output means outputs the first expression information in the input document data with emphasis.

The data processing apparatus according to claim 3, wherein, when it is determined that the second expression information stored in the storage means in association with the first expression information is not described in the input document data as dependency expression information of the first expression information, the output means retrieves the second expression information associated with the first expression information from the storage means and outputs it as a candidate for the undescribed expression to be added.