JP2007310829A - Data processor, data processing method and data processing program

Info

Publication number: JP2007310829A
Application number: JP2006141957A
Authority: JP
Inventors: Kyoko Makino; 恭子牧野; Toshiyuki Kano; 敏行加納; Hiroshi Taira; 博司平; Kunitake So; 国威祖; Shigeru Matsumoto; 茂松本
Original assignee: Toshiba Corp; Toshiba Solutions Corp
Current assignee: Toshiba Corp; Toshiba Digital Solutions Corp
Priority date: 2006-05-22
Filing date: 2006-05-22
Publication date: 2007-11-29

Abstract

PROBLEM TO BE SOLVED: To determine lack of a word to be described in association with a word in document data.

SOLUTION: An extraction processing part 3 recognizes each word of input document data, and extracts an item name associated with an expression same as that of the recognized word on a text mining dictionary when the expression is managed as a corresponding expression on the text mining dictionary. The extraction processing part 3 extracts related information associated with the extracted item name on the related information table. A determination part 4 determines that a word to be described in association with the word in the input document data is not included in the input document data and makes an output device 6 output a message showing that the word is not included when a word corresponding to the extracted related information is not included in the input document data.

COPYRIGHT: (C)2008, JPO&INPIT

Description

The present invention relates to a data processing apparatus, a data processing method, and a data processing program for analyzing document data.

Conventionally, to support the creation and proofreading of document data, there are techniques that use a field-dependent dictionary to preferentially present words of the field to which the description in the input document data corresponds, and techniques that check grammar sentence by sentence through syntactic analysis. There is also a technique, disclosed for example in Patent Document 1, that presents the portions of the description in input document data that lack concreteness. In that document, when the description in the input document data contains a predetermined characteristic portion, it is determined whether the sentence having that portion contains the 5W1H elements (when, where, and so on) that must not be omitted if the sentence is to be concrete.
Patent Document 1: JP 2002-183117 A

However, techniques that preferentially display words of the field to which the description in the document data relates, or that check grammar, cannot determine whether the described content is semantically insufficient. Determining whether a sentence with predetermined characteristics contains the 5W1H elements does make it possible to flag sentences that lack concreteness, but the 5W1H elements are conceptual. When the document data is, for example, a product repair report, such a check cannot point out that a specific word that should be written in the repair report is missing.

Accordingly, an object of the present invention is to provide a data processing apparatus, a data processing method, and a data processing program that can determine that a word which should be described in relation to a certain word in document data is missing.

That is, the data processing apparatus according to the present invention stores predetermined first word information in association with second word information that should be described in relation to the first word information. When the stored first word information is included in input document data, the apparatus determines whether the second word information stored in association with that first word information is also included in the input document data, and outputs the determination result.

The data processing apparatus according to the present invention can determine that a word which should be described in relation to a certain word in document data is missing.

Embodiments of the present invention will be described below with reference to the drawings.
FIG. 1 is a block diagram showing a configuration example of a document data processing apparatus according to an embodiment of the present invention.

As shown in FIG. 1, the document data processing apparatus according to the embodiment of the present invention includes a control unit 1 that controls the entire apparatus, a storage device 2, an extraction processing unit 3, a determination unit 4, an input device 5, an output device 6, and an input/output interface 7, all of which are connected via a bus 8.

The storage device 2 is hardware such as a hard disk drive or a nonvolatile memory device. In addition to storing the control program executed by the control unit 1, the storage device 2 also functions as work memory for the various processes performed by the control unit 1.

The extraction processing unit 3 extracts predetermined words from the input document data. The determination unit 4 determines whether a word that should be described in relation to the word information extracted by the extraction processing unit 3 is included in the input document data. The input device 5 is, for example, a keyboard or a mouse, and accepts operations such as the creation of new document data. The output device 6 is, for example, a display device.
The input/output interface 7 can be connected by cable to an external storage device (not shown), and inputs and outputs document data to and from that external storage device.

FIG. 2 is a diagram showing, in table form, a configuration example of the text mining dictionary table stored in the storage device of the document data processing apparatus according to the embodiment of the present invention.
The text mining dictionary table is used to determine whether predetermined words are included in the input document data. Although the "item name" and "corresponding expression" columns of the text mining dictionary table shown in FIG. 2 contain words, the entries need not be limited to words; they may be phrases or sentences, for example. The same applies to the "item name" and "related information" columns shown in FIG. 3 and to the "item name", "related information", and "corresponding expression" columns shown in FIG. 8.

In the text mining dictionary table, codes, item names, and corresponding expressions are managed in association with one another. A code is a management number assigned to each item name and consists of letters and digits. A single item name in the text mining dictionary table is associated with one or more corresponding expressions.

When a word in the input document data is the same as a corresponding expression managed in the text mining dictionary table, the document data processing apparatus according to the embodiment of the present invention determines that the word corresponding to the item name associated with that expression in the table is included in the input document data. The corresponding expressions managed in the text mining dictionary table may also be written as regular expressions of the kind commonly used in text processing.
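As a rough illustration of the table described above, the following Python sketch models each row as a code, an item name, and one or more corresponding expressions written as regular expressions. Only the code E0001 and the item name "water leak" (水漏れ) come from the text; the English patterns, the names `DictionaryEntry` and `find_item_names`, and the use of plain regular-expression search in place of morphological analysis are assumptions made for the example.

```python
import re
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class DictionaryEntry:
    code: str                     # management number, e.g. "E0001"
    item_name: str                # canonical item name
    expressions: Tuple[str, ...]  # one or more corresponding expressions (regex)


# Hypothetical table contents: English stand-ins for the Japanese entries.
TEXT_MINING_DICTIONARY = [
    DictionaryEntry("E0001", "water leak",
                    (r"water leak(s|age)?", r"leaking water")),
]


def find_item_names(text: str,
                    dictionary: List[DictionaryEntry] = TEXT_MINING_DICTIONARY
                    ) -> List[str]:
    """Return the item names whose corresponding expressions occur in text."""
    found = []
    for entry in dictionary:
        if any(re.search(p, text, re.IGNORECASE) for p in entry.expressions):
            found.append(entry.item_name)
    return found
```

Calling `find_item_names("Customer reports leaking water from the indoor unit.")` maps the variant wording back to the single item name "water leak", which is the role the corresponding expressions play in the table.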

FIG. 3 is a diagram showing, in table form, a configuration example of the related information table stored in the storage device of the document data processing apparatus according to the embodiment of the present invention.
In the related information table, item names, related information, and targets are managed in association with one another. "Related information" is word information that should be included in the input document data, in relation to a word, when the input document data contains the word corresponding to an item name in the related information table. "Target" is category information, such as a product type, to which the related information associated with it in the table pertains.
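One way to picture the related information table is as a list of (item name, related information, target) rows. The sketch below is only illustrative: the rows for "water leak" and "filter" use English stand-ins for the entries the description mentions for FIG. 3, while the target of "hose" and the target of "clogging" are not stated in the text and are assumptions, as are the names `RelatedInfoRow` and `related_rows`.

```python
from typing import List, NamedTuple, Optional


class RelatedInfoRow(NamedTuple):
    item_name: str          # word that triggers the check
    related_info: str       # word that should accompany the item name
    target: Optional[str]   # product category; None stands for "(none)"


RELATED_INFO_TABLE = [
    RelatedInfoRow("water leak", "hose", "washing machine"),   # target assumed
    RelatedInfoRow("water leak", "filter", None),
    RelatedInfoRow("water leak", "damage", "air conditioner"),
    RelatedInfoRow("filter", "clogging", None),                # target assumed
]


def related_rows(item_name: str,
                 table: List[RelatedInfoRow] = RELATED_INFO_TABLE
                 ) -> List[RelatedInfoRow]:
    """Return every row of the table registered under item_name."""
    return [row for row in table if row.item_name == item_name]
```

Representing "(none)" as `None` keeps the "no restriction" case distinct from any real category name.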

When a word in the input document data is the same as a corresponding expression managed in the text mining dictionary table, and the input document data does not contain a word that is the same as the related information associated, in the related information table, with the item name of that corresponding expression, the document data processing apparatus according to the embodiment of the present invention determines that a word which should be described in relation to the word corresponding to the item name is missing from the input document data.
The information managed in the text mining dictionary table and the related information table can be changed by performing a predetermined sequence of operations on the input device 5.

Next, the operation of the document data processing apparatus configured as shown in FIG. 1 will be described.
FIG. 4 is a flowchart showing an example of the processing operation of the document data processing apparatus according to the embodiment of the present invention.

First, when document data is created by operating the input device 5 of the document data processing apparatus, the control unit 1 stores the document data in the storage device 2. Likewise, when previously created document data stored in the external storage device is input through the input/output interface 7, the control unit 1 stores that document data in the storage device 2 (step S1).

FIG. 5 is a diagram showing a configuration example of document data input to the document data processing apparatus according to the embodiment of the present invention.
This document data is a repair report for an air conditioner, in which the device under repair, the model, the symptom, and the report content are written, for example, in table form. The input document data is not limited to the table format shown in FIG. 5; it may be an itemized list or ordinary running text.

The extraction processing unit 3 performs morphological analysis on the input document data stored in the storage device 2 to recognize the words in the document data, and checks the recognized words against the text mining dictionary table stored in the storage device 2. When a word that is the same as a recognized word is managed as a corresponding expression in the text mining dictionary table, the extraction processing unit 3 extracts the item name associated with that expression in the table (step S2).

At this time, the extraction processing unit 3 may check whether the symptom "water leak" (水漏れ) shown in FIG. 5 is included in the text mining dictionary table, and then check whether the report content shown in FIG. 5, "Repair requested because of a water leak when using cooling." (冷房使用時に水漏れとのことで修理依頼あり。), contains a corresponding expression for the symptom "water leak". In other words, the text mining dictionary table may be checked against the entire input document data or against only part of it.

FIG. 6 is a diagram showing an example of the item name extraction result produced by the document data processing apparatus according to the embodiment of the present invention.
In this extraction result, a word enclosed by the character strings <*****> and </*****> is a word corresponding to an item name extracted by the extraction processing unit 3. Here "*****" is the code associated, in the text mining dictionary table, with the item name of the enclosed word.

In the example shown in FIG. 6, "water leak" in the field that follows the "symptom" field of the input document data is enclosed by the character strings <E0001> and </E0001>, which correspond to the code associated with "water leak" in the text mining dictionary table. "Water leak" is therefore the word corresponding to the item name extracted by the extraction processing unit 3.
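The tagged extraction result can be sketched roughly as follows. This is not the apparatus's actual implementation: it replaces morphological analysis with regular-expression substitution, uses an English stand-in for the report text, and the names `DICTIONARY` and `tag_items` are hypothetical. Only the code E0001 and the `<code>...</code>` wrapping come from the description.

```python
import re

# code -> (item name, corresponding expressions); contents are illustrative.
DICTIONARY = {
    "E0001": ("water leak", [r"water leak"]),
}


def tag_items(text: str, dictionary=DICTIONARY) -> str:
    """Wrap each occurrence of a corresponding expression in <code>...</code>."""
    for code, (_item_name, patterns) in dictionary.items():
        combined = "|".join(f"(?:{p})" for p in patterns)
        # Bind the current code as a default argument so each replacement
        # callback uses the code of the entry being processed.
        text = re.sub(combined,
                      lambda m, c=code: f"<{c}>{m.group(0)}</{c}>",
                      text)
    return text


print(tag_items("Symptom: water leak"))
```

With more than one dictionary entry, a later pattern could match text inside an earlier tag; a fuller version would tokenize first, as the morphological analysis in step S2 does.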

The extraction processing unit 3 then checks the item name extracted in step S2 against the related information table stored in the storage device 2, and extracts the related information associated with the extracted item name in the related information table (step S3).

Specifically, when the item name extracted in step S2 is "water leak" as shown in FIG. 6 and the related information table to be checked has the configuration shown in FIG. 3, the extraction processing unit 3 extracts the related information "hose" (ホース), "filter" (フィルタ), and "damage" (破損) associated with the item name "water leak" in the related information table.

When more than one piece of related information has been extracted, the extraction processing unit 3 narrows it down according to the description in the input document data.
Specifically, when the extracted related information is "hose", "filter", and "damage", it is narrowed down to "filter", which is associated with the unrestricted target "(none)" in the related information table, and "damage", which is associated with "air conditioner", the same target as the one described in the input document data.
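The narrowing step can be sketched as a simple filter on the "target" column. In this illustrative Python fragment, `None` stands for the unrestricted target "(none)"; the target "washing machine" for "hose" is an assumption, since the text only implies that it differs from "air conditioner", and the function name `narrow_by_target` is hypothetical.

```python
from typing import List, Optional, Tuple

# (related information, target) pairs extracted for the item name "water leak".
EXTRACTED = [
    ("hose", "washing machine"),    # target assumed for illustration
    ("filter", None),               # "(none)": applies to every target
    ("damage", "air conditioner"),
]


def narrow_by_target(rows: List[Tuple[str, Optional[str]]],
                     document_target: str) -> List[str]:
    """Keep related information that is unrestricted or matches the document's target."""
    return [info for info, target in rows
            if target is None or target == document_target]


print(narrow_by_target(EXTRACTED, "air conditioner"))  # ['filter', 'damage']
```

The document's target ("air conditioner" here) would come from the device field of the repair report shown in FIG. 5.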

The storage device 2 stores information on the determination condition used by the determination unit 4. Here the determination condition is: "the description is determined to be insufficient when no word corresponding to the extracted or narrowed-down related information is described in the input document data".

In accordance with this determination condition, when the input document data does not contain a word corresponding to any of the related information narrowed down as described above, the determination unit 4 determines that the description is insufficient, because neither "filter" nor "damage", the words that should be described in relation to "water leak" in the input document data, is described in that data (YES in step S4).

The determination unit 4 then generates a message indicating that the description in the input document data is insufficient, and causes the output device 6 to output an error output screen containing this message (step S5).
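Steps S4 and S5 can be sketched as below. The sketch assumes one reading of the determination condition, namely that the description counts as insufficient only when none of the narrowed-down related words appears in the document. The function name `find_missing_description`, the message wording, and the English report text are all hypothetical, and simple substring matching stands in for word-level comparison.

```python
from typing import List, Optional


def find_missing_description(document_text: str,
                             item_name: str,
                             related_words: List[str]) -> Optional[str]:
    """Return an error message when none of the related words is present, else None."""
    present = [word for word in related_words if word in document_text]
    if related_words and not present:
        expected = ", ".join(related_words)
        return (f'Insufficient description: no word related to "{item_name}" '
                f"was found (expected one of: {expected}).")
    return None


report = "Repair requested because of a water leak when using cooling."
message = find_missing_description(report, "water leak", ["filter", "damage"])
if message is not None:
    print(message)  # plays the role of the error output screen in step S5
```

If the report also mentioned, say, that the filter was clogged, the function would return `None` and no error screen would be produced.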

FIG. 7 is a diagram showing an example of the error output screen output by the document data processing apparatus according to the embodiment of the present invention.
This output screen indicates that the input document data shown in FIG. 5 contains no word that should be described in relation to "water leak".

As described above, the document data processing apparatus according to the embodiment of the present invention stores, in the storage device 2, predetermined first word information in association with second word information that should be described in relation to it. When the first word information appears in the description of the input document data but the second word information stored in association with it in the storage device 2 does not, the apparatus determines that the description in the input document data is insufficient. This supports the correction of omissions when document data is created and thus reduces the user's workload in creating document data.

In the embodiment described above, when a word corresponding to an item name in the text mining dictionary table exists in the input document data and the related information associated with it in the related information table does not, the document data processing apparatus determines that the description in the input document data is insufficient. The embodiment is not limited to this, however. When a word that is the same as related information in the related information table shown in FIG. 3 is also managed as an item name in that table, the apparatus may determine that the description is insufficient when the input document data does not contain any of the related information associated with that item name or any of the related information associated with the item name of the word that exists in the input document data.

For example, the related information associated in the related information table of FIG. 3 with the item name "water leak", a word that exists in the input document data, is "hose", "filter", and "damage", and of these, "filter" is also managed as an item name in the same table. In this case, the extraction processing unit 3 extracts "clogging" (目詰まり), which is associated with the item name "filter" in the table, together with "hose", "filter", and "damage". The determination unit 4 then determines whether any of these extracted words is included in the input document data. This gives the same effect as directly associating the item name "water leak" with the related information "clogging" in the related information table, without actually doing so, and the amount of information in the related information table can therefore be kept to a minimum.
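The chained lookup described here amounts to following item names through the related information table. The sketch below uses English stand-ins for the entries named in the text; the names `RELATED` and `expand_related` are hypothetical. Note one assumption: the text describes a single extra level ("filter" leading to "clogging"), whereas this sketch follows chains of any length and guards against cycles.

```python
from collections import deque
from typing import Dict, List

# item name -> related information, mirroring the entries named in the text.
RELATED: Dict[str, List[str]] = {
    "water leak": ["hose", "filter", "damage"],
    "filter": ["clogging"],
}


def expand_related(item_name: str,
                   related: Dict[str, List[str]] = RELATED) -> List[str]:
    """Collect related words, following related words that are item names themselves."""
    result: List[str] = []
    seen = {item_name}
    queue = deque([item_name])
    while queue:
        current = queue.popleft()
        for word in related.get(current, []):
            if word not in seen:   # skip words already collected (avoids loops)
                seen.add(word)
                result.append(word)
                queue.append(word)
    return result


print(expand_related("water leak"))  # ['hose', 'filter', 'damage', 'clogging']
```

Because "clogging" is reached through "filter", the table needs no direct "water leak" to "clogging" row, which is the space saving the paragraph above describes.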

Next, a modification of the document data processing apparatus according to the embodiment of the present invention will be described.
FIG. 8 is a diagram showing, in table form, a modification of the related information table stored in the storage device of the document data processing apparatus according to the embodiment of the present invention.
In this modification, the related information table stored in the storage device 2 manages, in addition to the item name, related information, and target shown in FIG. 3, a corresponding expression for each piece of related information.

As shown in FIG. 8, the corresponding expression "(フィルタ|フィルター|filter)" is associated with the related information "filter" (フィルタ) in the related information table. When the item name "water leak" is described in the input document data, the determination unit 4 determines that a word which should be described in relation to the item name "water leak" is present in the input document data if any of the spelling variants "フィルタ", "フィルター", or "filter" appears there. This improves the accuracy with which the words that should be described in the input document data are checked.
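The benefit of attaching a corresponding expression to related information can be shown with a small sketch. The pattern `(フィルタ|フィルター|filter)` is taken from the description of FIG. 8 (written here with ASCII letters for "filter"); the dictionary `RELATED_EXPRESSIONS`, the function `mentions_related`, and the sample sentences are assumptions made for the example.

```python
import re
from typing import Dict

# related information -> corresponding expression (regular expression).
RELATED_EXPRESSIONS: Dict[str, str] = {
    "フィルタ": r"(フィルタ|フィルター|filter)",
}


def mentions_related(text: str, related_info: str,
                     expressions: Dict[str, str] = RELATED_EXPRESSIONS) -> bool:
    """True if text contains the related information or one of its registered variants."""
    # Fall back to a literal match when no corresponding expression is registered.
    pattern = expressions.get(related_info, re.escape(related_info))
    return re.search(pattern, text) is not None


print(mentions_related("Cleaned the filter.", "フィルタ"))   # True
print(mentions_related("ホースを交換した。", "フィルタ"))      # False
```

Without the registered variants, a report that writes "filter" in Latin letters would be flagged as missing the related word even though the author did mention it.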

The present invention is not limited to the embodiment exactly as described above; at the implementation stage, the constituent elements can be modified and embodied without departing from the gist of the invention. Various inventions can also be formed by appropriately combining the constituent elements disclosed in the embodiment. For example, some constituent elements may be omitted from the full set shown in the embodiment, and constituent elements from different embodiments may be combined as appropriate.

FIG. 1 is a block diagram showing a configuration example of a document data processing apparatus according to an embodiment of the present invention.
FIG. 2 is a diagram showing, in table form, a configuration example of the text mining dictionary table stored in the storage device of the document data processing apparatus.
FIG. 3 is a diagram showing, in table form, a configuration example of the related information table stored in the storage device of the document data processing apparatus.
FIG. 4 is a flowchart showing an example of the processing operation of the document data processing apparatus.
FIG. 5 is a diagram showing a configuration example of document data input to the document data processing apparatus.
FIG. 6 is a diagram showing an example of the item name extraction result produced by the document data processing apparatus.
FIG. 7 is a diagram showing an example of the error output screen output by the document data processing apparatus.
FIG. 8 is a diagram showing, in table form, a modification of the related information table stored in the storage device of the document data processing apparatus.

Explanation of symbols

1: control unit, 2: storage device, 3: extraction processing unit, 4: determination unit, 5: input device, 6: output device, 7: input/output interface, 8: bus.

Claims

A data processing apparatus comprising:
document input means for inputting document data;
storage means for storing predetermined first word information in association with second word information that should be described in relation to the first word information;
determination means for determining, when the document data input by the document input means includes the first word information stored in the storage means, whether the second word information stored in the storage means in association with that first word information is included in the input document data; and
output means for outputting a determination result of the determination means.

The data processing apparatus according to claim 1, wherein the output means outputs a comment indicating that the description in the input document data is insufficient when the determination means determines that the second word information is not included in the document data.

The data processing apparatus according to claim 1, wherein
the first word information stored in the storage means includes word information that is the same as second word information, and
when the document data input by the document input means includes first word information stored in the storage means, the determination means determines whether the input document data includes any of the second word information associated with that first word information and, where that second word information is the same as other first word information, the word information associated with that other first word information.

A data processing method for use in an apparatus having a storage device that stores predetermined first word information in association with second word information that should be described in relation to the first word information, the method comprising:
a document input step of inputting document data;
a determination step of determining, when the document data input in the document input step includes the first word information stored in the storage device, whether the second word information stored in the storage device in association with that first word information is included in the input document data; and
an output step of outputting a determination result of the determination step.

A computer-readable data processing program for controlling a computer having a storage device that stores predetermined first word information in association with second word information that should be described in relation to the first word information, the program causing the computer to function as:
document input means for inputting document data;
determination means for determining, when the document data input by the document input means includes the first word information stored in the storage device, whether the second word information stored in the storage device in association with that first word information is included in the input document data; and
output means for outputting a determination result of the determination means.