JP2000194712A

JP2000194712A - Data processor and recording medium

Info

Publication number: JP2000194712A
Application number: JP10370863A
Authority: JP
Inventors: Hirofumi Nakajima; 弘文中島; Shozo Fukutani; 正三福谷
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1998-12-25
Filing date: 1998-12-25
Publication date: 2000-07-14

Abstract

PROBLEM TO BE SOLVED: To allow a multi-document system to be started up in a short time by simplifying the preliminary setup, including keyword extraction and registration. SOLUTION: The type of a file read from a document file 2 is determined from the file's extension. When the file is a drawing, a drawing keyword extracting means 5 extracts keywords from a range that includes the title block; when it is a form, a form keyword extracting means 6 extracts keywords from the item row and item column of the form. When it is a document, a document keyword extracting means 8 extracts keywords from a character string of a prescribed number of characters from the head of the document.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

BACKGROUND OF THE INVENTION 1. Field of the Invention The present invention relates to a data processing apparatus having a function of automatically extracting keywords from multiple types of files, such as drawings created with CAD, forms created with spreadsheet software, and documents created with a word processor, and to a recording medium on which a computer program for keyword extraction is recorded.

[0002]

2. Description of the Related Art To manage a large number of files centrally, keywords that serve as search clues must be registered. For example, for a drawing such as one created with CAD, the operator refers to the title block, in which data specific to the drawing is entered in the field to the right of or below each item-name field such as "drawing number", "drawing name", "designer name", and "design date", and registers keywords by keying in that specific data.

[0003] For a form such as one created with spreadsheet software, the operator registers keywords by keying in the item names of the rows and columns, such as "fiscal year", "sales", "cost", "profit and loss", and "fiscal 1997 (Heisei 9)", which indicate the attributes of the numerical values entered in each column and row of the tabular form.

[0004] For documents, a method has also been proposed that counts the number of occurrences of each word in the full text and automatically extracts words with a relatively high frequency of occurrence as keywords. However, because scanning the full text can cause words that occur frequently but are unimportant to be extracted as keywords, among other drawbacks, in most cases the operator currently keys in and registers the keywords.

[0005]

PROBLEMS TO BE SOLVED BY THE INVENTION For drawings, one conceivable approach is to specify the range of the title block that contains the keywords and to extract keywords automatically from that range. However, when drawings of various sizes (A0, A1, A2, etc.) and origin positions are mixed, the coordinate values that specify the title-block range (for example, the coordinates of the upper-left corner of the title block) are not necessarily the same for all drawings, so the operator must enter, for each drawing, the coordinate values that specify the title-block range. Furthermore, when keywords are extracted per item name, such as "drawing number", "drawing name", "designer name", and "design date", the coordinate values of each item must also be entered for each drawing.

[0006] For forms in which the number of rows is not fixed, the positions of the rows and columns containing the item names that should become keywords, such as "fiscal year", "sales", "cost", "profit and loss", and "fiscal 1997 (Heisei 9)", must be entered for each form.

[0007] Consequently, when building a system that centrally manages multiple types of files in different formats, such as drawings, forms, and documents (hereinafter called a multi-document system), the preliminary setup work, including keyword extraction (registration), is so laborious that it extends over a long period, and a long time is needed before the system can be started up. As a result, the contents of the files may already be out of date by the time the system is started up.

[0008] The present invention was made to solve these problems. Its object is to provide a data processing apparatus with a keyword extraction function, and a recording medium on which a keyword extraction computer program is recorded, that simplify the preliminary setup work, including keyword extraction and registration, and so allow a multi-document system to be started up in a short time. It does so in three ways. First, the coordinates of a drawing are converted into coordinates whose origin is a predetermined point, for example the lower-left corner, so that the range containing the title block is detected regardless of drawing size and origin position, and keywords are extracted automatically from that range. Second, the rows and columns of a form in which the item names are entered are detected from the character types of the characters entered in the items, for example from whether characters other than digits are present and whether the number of such characters is the largest, and keywords are extracted automatically from those rows and columns. Third, keywords are extracted automatically from a character string of a predetermined number of characters at the head of a document, which is likely to contain a table of contents, a summary, and the like.

[0009]

MEANS FOR SOLVING THE PROBLEMS The data processing apparatus of the first invention is a data processing apparatus having a function of extracting keywords from a file of drawing information having a title block, comprising: means for storing, for each drawing size, the coordinate values of at least one point that defines the range containing the title block within a drawing whose extent is defined in coordinates having a predetermined point as the origin; means for converting the coordinates of a drawing into coordinates having the predetermined point as the origin; and means for extracting keywords from the range, within the coordinate-converted drawing, that is defined on the basis of the coordinate values of the at least one point stored in association with the size of the drawing from which keywords are to be extracted.

[0010] In the data processing apparatus of the second invention, the title block of the first invention has a plurality of item fields for entering data specific to the drawing and fixed data indicating the attribute of that data, and the apparatus comprises means for extracting the fixed data entered in an item field, and means for extracting the data entered in an item field adjacent to the item field containing the fixed data as a keyword for that fixed data.

[0011] The data processing apparatus of the third invention is a data processing apparatus having a function of extracting keywords from a file of tabular information, comprising: means for detecting, on the basis of the character types of the characters entered in the items, an item row consisting of items in which data indicating the attributes of the data entered in the items of each column is entered, and/or an item column consisting of items in which data indicating the attributes of the data entered in the items of each row is entered; and means for extracting keywords from the items of the item row and/or the item column.

[0012] The data processing apparatus of the fourth invention, in addition to the features of the third invention, comprises means for extracting keywords from a range outside the tabular information.

[0013] The data processing apparatus of the fifth invention is a data processing apparatus having a function of extracting keywords from a document file, comprising means for decomposing a character string of a predetermined number of characters from the head of the document into words, and means for extracting keywords from that character string.

[0014] The data processing apparatus of the sixth invention is a data processing apparatus having a function of extracting keywords from multiple types of files including drawing information files, tabular information files, and document files, comprising: means for determining the type of a file from the extension of the file from which keywords are to be extracted; means for storing, for each drawing size, the coordinate values of at least one point that defines the range containing the title block within a drawing whose extent is defined in coordinates having a predetermined point as the origin; means for, when the file is a drawing information file, converting the coordinates of the drawing into coordinates having the predetermined point as the origin and extracting keywords from the range, within the coordinate-converted drawing, that is defined on the basis of the coordinate values of the at least one point stored in association with the size of the drawing from which keywords are to be extracted; means for, when the file is a tabular information file, detecting, on the basis of the character types of the characters entered in the items, an item row consisting of items in which data indicating the attributes of the data entered in the items of each column is entered, and/or an item column consisting of items in which data indicating the attributes of the data entered in the items of each row is entered, and extracting keywords from the items of the item row and/or the item column; and means for, when the file is a document file, decomposing a character string of a predetermined number of characters from the head of the document into words and extracting keywords from that character string.

[0015] The data processing apparatus of the seventh invention, in any of the first to sixth inventions, comprises means for sorting the extracted keywords to detect duplicate keywords, and means for deleting all but one instance of any duplicated keyword.

[0016] The recording medium of the eighth invention is a computer-readable recording medium on which is recorded a computer program including program code means for extracting keywords from multiple types of files including drawing information files, tabular information files, and document files, the program including: program code means for causing the computer to determine the type of a file from the extension of the file from which keywords are to be extracted; program code means for causing the computer to store, for each drawing size, the coordinate values of at least one point that defines the range containing the title block within a drawing whose extent is defined in coordinates having a predetermined point as the origin; program code means for, when the file is a drawing information file, causing the computer to convert the coordinates of the drawing into coordinates having the predetermined point as the origin and to extract keywords from the range, within the coordinate-converted drawing, that is defined on the basis of the coordinate values of the at least one point stored in association with the size of the drawing from which keywords are to be extracted; program code means for, when the file is a tabular information file, causing the computer to detect, on the basis of the character types of the characters entered in the items, an item row consisting of items in which data indicating the attributes of the data entered in the items of each column is entered, and/or an item column consisting of items in which data indicating the attributes of the data entered in the items of each row is entered, and to extract keywords from the items of the item row and/or the item column; and program code means for, when the file is a document file, causing the computer to decompose a character string of a predetermined number of characters from the head of the document into words and to extract keywords from that character string.

[0017] In the present invention, the file type is determined from the extension of the file from which keywords are to be extracted. When the file is a drawing information file, the coordinate values of, for example, the upper-left corner of the title block, within a drawing whose extent is defined in coordinates having, for example, the lower-left corner of the drawing as the origin, are stored for each drawing size (A0, A1, A2, etc.); the coordinates of the drawing are converted into coordinates having the lower-left corner of the drawing as the origin; and keywords are extracted from a range in the converted drawing that is defined on the basis of the upper-left title-block coordinates stored for that drawing size, for example the range defined by the upper-left corner of the title block and the lower-right corner of the drawing. When the file is a tabular information file, an item row, consisting of items containing data such as item names that indicate the attributes of the data entered in the items of each column, and an item column, consisting of items containing data such as item names that indicate the attributes of the data entered in the items of each row, are detected on the basis of the character types of the characters entered in the items, and keywords are extracted from the items of the item row and item column. When the file is a document file, a character string from the head of the document, of a length that can include the table of contents, summary, and the like, is decomposed into words, and keywords are extracted from that string. This simplifies the preliminary setup work, including keyword extraction and registration, and allows a multi-document system to be started up in a short time.

[0018] In the present invention, fixed data such as the item names "drawing number", "drawing name", "designer name", and "design date" is also extracted from the title block of a drawing, and the data entered in the item field to the right of, below, or otherwise adjacent to the item field containing the fixed data is extracted as a keyword for that fixed data. This makes it possible to search by fixed data, so that a target file can be found quickly.

[0019] In the present invention, keywords are also extracted from a range outside the tabular information. Thus, when the name of the form is entered above the item row, to the left of the item column, below the last row, or in a similar position, that name can be extracted as a keyword.

[0020] In the present invention, the extracted keywords are also sorted, for example per file, to detect duplicates, and each duplicated keyword is reduced to a single instance. This shortens search time.

[0021] In the present invention, a data processing apparatus such as a personal computer or workstation reads the computer program from the recording medium and determines the file type from the extension of the file from which keywords are to be extracted. When the file is a drawing information file, the coordinate values of, for example, the upper-left corner of the title block, within a drawing whose extent is defined in coordinates having, for example, the lower-left corner of the drawing as the origin, are stored for each drawing size (A0, A1, A2, etc.); the coordinates of the drawing are converted into coordinates having the lower-left corner of the drawing as the origin; and keywords are extracted from a range in the converted drawing that is defined on the basis of the upper-left title-block coordinates stored for that drawing size, for example the range defined by the upper-left corner of the title block and the lower-right corner of the drawing. When the file is a tabular information file, an item row, consisting of items containing data such as item names that indicate the attributes of the data entered in the items of each column, and an item column, consisting of items containing data such as item names that indicate the attributes of the data entered in the items of each row, are detected on the basis of the character types of the characters entered in the items, and keywords are extracted from the items of the item row and item column. When the file is a document file, a character string from the head of the document, of a length that can include the table of contents, summary, and the like, is decomposed into words, and keywords are extracted from that string. This allows a computer program that extracts keywords from multiple types of files to be supplied on a portable recording medium such as a CD-ROM or MO, or over a communication line.

[0022]

DESCRIPTION OF THE PREFERRED EMBODIMENTS FIG. 1 is an overall block diagram of the present invention. The document designating means 1 designates the file from which keywords are to be extracted by having the user select its file name, by a mouse click, by pressing the execute key, or the like, from among the names of the files (also called documents) displayed on the screen.

[0023] The document file 2 stores the multiple types of files for which the multi-document system is to be built. FIG. 9 is a conceptual diagram of the document file 2, which stores multiple types of files, such as drawings (extension ".CAD"), forms (extension ".EXL"), documents (extension ".DOC"), and images (extension ".TIF"), managed by file name.

[0024] The document type determining means 3 determines the type of the file designated by the document designating means 1 from the extension of its file name. The extraction method determination file 4 stores, for each method name, the extensions of the files from which keywords are to be extracted by executing that method. The method names cover the keyword extraction methods, namely drawing keyword extraction, form keyword extraction, and document keyword extraction, as well as methods such as character information extraction that are to be performed as a preliminary stage before the keyword extraction method is determined. FIG. 10 is a conceptual diagram of the extraction method determination file.
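The extension-based selection described above can be sketched roughly as follows. This is a minimal illustration, not the implementation of the patent: the dictionary `EXTRACTION_METHODS`, the method-name strings, and the function `select_method` are assumptions made for the example; only the four extensions come from the text.

```python
from pathlib import Path
from typing import Optional

# Hypothetical stand-in for the extraction method determination file 4:
# method name -> set of extensions handled by that method.
EXTRACTION_METHODS = {
    "drawing_keyword_extraction": {".cad"},
    "form_keyword_extraction": {".exl"},
    "document_keyword_extraction": {".doc"},
    "character_information_extraction": {".tif"},
}


def select_method(file_name: str) -> Optional[str]:
    """Return the method registered for the file's extension, or None."""
    ext = Path(file_name).suffix.lower()  # compare case-insensitively
    for method, extensions in EXTRACTION_METHODS.items():
        if ext in extensions:
            return method
    return None  # extension not registered for any method
```

Keeping the mapping in data rather than in code mirrors the idea of a separate determination file: supporting a new file type would only require a new entry.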

[0025] The drawing keyword extracting means 5, the form keyword extracting means 6, and the document keyword extracting means 8 automatically extract keywords from drawings, forms, and documents, respectively, by executing the procedures described below.

[0026] The character information extracting means 7 determines whether character information can be written out, for example by a macro, from a file that is not a drawing, form, or document, such as an image. If character information can be written out, it writes the character information to a work area such as a character information work file and starts the document keyword extracting means 8, which automatically extracts keywords from the character information in the work file. If character information cannot be written out, the file is stored in the character-information-unextracted file 9. FIG. 11 is a conceptual diagram of the character-information-unextracted file 9.

[0027] The keyword extraction file 10 stores those keywords extracted by the drawing keyword extracting means 5, the form keyword extracting means 6, and the document keyword extracting means 8 that have a specified number of characters, for example two or more, including any duplicates. FIG. 12 is a conceptual diagram of the keyword extraction file 10.

[0028] The keyword storing means 11 sorts the keywords stored in the keyword extraction file 10 per file to detect duplicates, deletes all but one instance of each duplicated keyword, and stores the single remaining instance in the keyword file 12. The keyword file 12 stores the keywords of each file in association with its file name, with the keywords separated by a specific symbol such as ",". FIG. 17 is a conceptual diagram of the keyword file 12.
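The sort-then-deduplicate step described above might look like the following sketch. The in-memory representation (a list of `(file_name, keyword)` pairs in, a dictionary of comma-separated strings out) and the name `build_keyword_file` are assumptions for illustration; the text specifies only sorting per file, keeping one instance of each duplicate, and a separator such as ",".

```python
from itertools import groupby


def build_keyword_file(extracted):
    """Collapse duplicate keywords per file.

    extracted: iterable of (file_name, keyword) pairs, duplicates allowed.
    Returns {file_name: "kw1,kw2,..."} with one instance of each keyword.
    """
    per_file = {}
    # Sorting places identical (file, keyword) pairs next to each other,
    # so groupby yields each distinct pair exactly once.
    for (file_name, keyword), _group in groupby(sorted(extracted)):
        per_file.setdefault(file_name, []).append(keyword)
    return {name: ",".join(kws) for name, kws in per_file.items()}
```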

[0029] The keyword input means 13 stores in the keyword extraction file 10 keywords keyed in by the operator, for example with reference to the occurrence count file described later, either at the point during keyword extraction processing when it is determined that character information cannot be written out from a file, or after a series of keyword extraction processes has finished.

[0030] Next, an example of a specific procedure for drawing keyword extraction is described with reference to the flowchart of FIG. 2, the explanatory diagram of relative coordinate conversion in FIG. 3, the conceptual diagram of the title-block upper-left coordinate file in FIG. 13, and the conceptual diagram of the fixed-item keyword extraction file in FIG. 14. The title-block upper-left coordinate file (FIG. 13) stores, for each drawing size (paper size), the relative coordinates of the upper-left corner of the title block, which define the range containing the title block, in a coordinate system in which, regardless of drawing size, the lower-left corner of the drawing is the origin (0,0), the upper-left corner is (0,1), the lower-right corner is (1,0), and the upper-right corner is (1,1).

[0031] The fixed-item keyword extraction file (FIG. 14) stores the keywords of each drawing file in association with the item names in the title block of the drawing, such as "drawing number", "name", "(designer) name", and "(design) date".

[0032] First, a drawing is read from the document file 2 (step S2-1). The coordinates of the lower-left and upper-right corners of the drawing are set to, for example, (0,0) and (1,1) (step S2-2), and, as shown in FIG. 3, the absolute coordinates of a drawing whose lower-left corner is at, for example, (-1,-1) and whose upper-right corner is at (2,1) are converted into relative coordinates in which the lower-left corner is (0,0) and the upper-right corner is (1,1) (step S2-3).
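The relative coordinate conversion of step S2-3 amounts to a linear rescaling of each axis. The sketch below assumes points are `(x, y)` tuples and that the drawing extent is known; the function name `to_relative` is hypothetical.

```python
def to_relative(point, lower_left, upper_right):
    """Map an absolute point so that lower_left -> (0, 0) and upper_right -> (1, 1)."""
    x, y = point
    x0, y0 = lower_left
    x1, y1 = upper_right
    # Shift by the lower-left corner, then scale by the drawing's width and height.
    return ((x - x0) / (x1 - x0), (y - y0) / (y1 - y0))
```

With the extent from the example in the text, lower left (-1,-1) and upper right (2,1), the point (0.5, 0) lies at the centre of the drawing and maps to (0.5, 0.5).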

[0033] Next, the range containing the title block (the title-block range) is determined by referring to the upper-left coordinate file (FIG. 13) (step S2-4), and the keywords within the title block are extracted (step S2-5). Keywords with the pre-specified number of characters (for example, two or more) are written to the keyword extraction file 10 (step S2-6).
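Steps S2-4 to S2-6 can be illustrated with the sketch below. It assumes that text entities have already been read from the drawing as `(string, (x, y))` pairs in relative coordinates, and that the title-block range runs from the stored upper-left corner to the lower-right corner of the drawing, which is the example range the specification gives. The coordinate values in `TITLE_BLOCK_UPPER_LEFT` are placeholders, not values from FIG. 13.

```python
# Hypothetical upper-left corners of the title block per drawing size,
# in relative coordinates (placeholder numbers).
TITLE_BLOCK_UPPER_LEFT = {"A0": (0.80, 0.10), "A1": (0.75, 0.12)}


def title_block_keywords(texts, drawing_size, min_chars=2):
    """Return strings located inside the title-block range.

    texts: list of (string, (x, y)) with positions in relative coordinates.
    """
    left, top = TITLE_BLOCK_UPPER_LEFT[drawing_size]        # S2-4
    # The range spans from the stored upper-left corner to the drawing's
    # lower-right corner, which is (1, 0) after conversion.
    return [
        s for s, (x, y) in texts
        if left <= x <= 1.0 and 0.0 <= y <= top             # S2-5
        and len(s) >= min_chars                              # S2-6
    ]
```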

[0034] It is then determined whether a keyword written to the keyword extraction file 10 is a fixed item such as an item name (step S2-7). If it is a fixed item, the fixed item is extracted, and the character string of, for example, the item to its right is written to the fixed-item keyword extraction file (FIG. 14) as a keyword (step S2-8). It is then determined whether this item is the last item (step S2-9); if it is not, processing returns to step S2-7 to evaluate the next item, and if it is, processing ends. If the keyword is not a fixed item, it is likewise determined whether this item is the last item (step S2-9); if it is not, processing returns to step S2-7 to evaluate the next item, and if it is, processing ends.
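The loop of steps S2-7 to S2-9 can be sketched as below, under the simplifying assumption that the title-block fields are available as a flat list in left-to-right order, so that "the item to the right" is simply the next element. The English item names in `FIXED_ITEMS` and the function name are illustrative only.

```python
# Illustrative fixed items (item names) expected in a title block.
FIXED_ITEMS = {"Drawing No.", "Name", "Designer", "Date"}


def fixed_item_keywords(cells):
    """Pair each fixed item with the value in the field to its right.

    cells: title-block field strings in left-to-right order.
    """
    found = {}
    for i, cell in enumerate(cells):                  # S2-7: test each item in turn
        if cell in FIXED_ITEMS and i + 1 < len(cells):
            found[cell] = cells[i + 1]                # S2-8: record the right-hand value
    return found                                      # S2-9: loop ends after the last item
```

A real title block would need the geometric position of each field to decide which field is to the right of or below another; the flat list is a stand-in for that adjacency test.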

[0035] In the example above, the absolute coordinates of the drawing are converted, regardless of drawing size (A0, A1, A2, etc.), into relative coordinates with the lower-left corner at (0,0) and the upper-right corner at (1,1). Alternatively, the upper-left coordinates of the title block may be stored for each drawing size in a coordinate system whose origin (0,0) is the lower-left corner of the drawing (the upper-right corner then differs with drawing size), and the absolute coordinates of the drawing may simply be converted into coordinates whose origin (0,0) is the lower-left corner.
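The alternative described above drops the scaling and only shifts the origin. A minimal sketch, with the hypothetical name `to_origin` and points as `(x, y)` tuples:

```python
def to_origin(point, lower_left):
    """Translate an absolute point so that lower_left becomes (0, 0); no scaling."""
    return (point[0] - lower_left[0], point[1] - lower_left[1])
```

Because units are preserved, the stored upper-left corner of the title block must then be kept per drawing size in those same units, which is why the upper-right corner differs from size to size.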

[0036] Next, the form keyword extraction procedure is described with reference to the flowchart of FIG. 4 and the explanatory diagram of form keyword extraction in FIG. 5. A form is read from the document file 2 (step S4-1). The first row that contains the largest number of kanji, hiragana, and katakana characters is taken as the item row, and the first column that contains the largest number of these characters is taken as the item column (steps S4-2, S4-3). Next, as shown in FIG. 5, keywords such as the form name and the item name of each column are extracted from the range between the first row and the item row (step S4-4), and the keywords in the item column of each row below the item row are then extracted (step S4-5). Of the extracted keywords, those with a specified number of characters, for example two or more, are stored in the keyword extraction file 10 (step S4-6).
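Steps S4-2 to S4-6 can be sketched in Python as below. The sketch assumes a rectangular table of strings and counts only the basic Unicode blocks for hiragana, katakana, and kanji; the function names are hypothetical.

```python
import re

# Hiragana, katakana, and CJK unified ideographs (basic blocks only; an assumption).
JP_CHAR = re.compile(r"[\u3040-\u309F\u30A0-\u30FF\u4E00-\u9FFF]")


def jp_count(cells):
    """Count kanji, hiragana, and katakana characters over a sequence of cells."""
    return sum(len(JP_CHAR.findall(c)) for c in cells)


def find_item_row_and_column(table):
    """Steps S4-2 and S4-3: first row and first column with the highest count."""
    row_counts = [jp_count(row) for row in table]
    col_counts = [jp_count(col) for col in zip(*table)]
    # list.index(max(...)) returns the first position that holds the maximum.
    return row_counts.index(max(row_counts)), col_counts.index(max(col_counts))


def extract_form_keywords(table, min_len=2):
    item_row, item_col = find_item_row_and_column(table)
    keywords = []
    for row in table[: item_row + 1]:      # step S4-4: first row through item row
        keywords.extend(row)
    for row in table[item_row + 1:]:       # step S4-5: item column of lower rows
        keywords.append(row[item_col])
    return [k for k in keywords if len(k) >= min_len]   # step S4-6
```

Numeric cells contribute nothing to the counts, so a header row of item names normally wins over the data rows beneath it.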

[0037] Next, the character information extraction procedure is described with reference to the flowchart of FIG. 6 and the conceptual diagram of the character information extraction work file in FIG. 15. A document is read from the document file 2 (step S6-1). An attempt is made to write out its character information using a macro or the like (step S6-2), and it is determined whether the character information can be written out (step S6-3). If it cannot, the process ends and the document is stored in the character information unextracted file 9.

[0038] If, on the other hand, the character information can be written out, it is written to a character information extraction work file such as the one shown in FIG. 15 (step S6-4). Keywords are extracted from the character information in this work file by the document keyword extraction procedure described below.
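The branch at step S6-3 can be pictured with the following Python sketch. The `dump_text` callable is a hypothetical stand-in for the macro mentioned in the text, and the dictionary and list stand in for the work file of FIG. 15 and the character information unextracted file 9.

```python
def extract_character_information(documents, dump_text):
    """Split documents into those whose text can be written out and those that cannot.

    documents: dict mapping a file name to its raw content.
    dump_text: hypothetical callable that returns the text or raises on failure.
    """
    work_file = {}     # stands in for the character information extraction work file
    unextracted = []   # stands in for the character information unextracted file 9
    for name, data in documents.items():
        try:
            text = dump_text(data)       # step S6-2: try to write out the text
        except Exception:
            text = None
        if not text:                     # step S6-3: could the text be written out?
            unextracted.append(name)
        else:
            work_file[name] = text       # step S6-4: store in the work file
    return work_file, unextracted
```

Documents that end up in `unextracted` are the ones an operator would later handle through manual keyword input.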

[0039] Next, the document keyword extraction procedure is described with reference to the flowchart of FIG. 7. A document is read from the document file 2 or from a character information extraction work file such as the one shown in FIG. 15 (step S7-1). Out of the overall specified number of characters, a character string of a predetermined length is extracted from the beginning of the document; this length is derived from the number of characters expected to contain the title and the heading "table of contents" (目次). The string is broken into words and written to the keyword extraction file (step S7-2). It is then determined whether the word "table of contents" appears in the extracted character string (step S7-3).

[0040] If the word "table of contents" is present, keywords are extracted from the contents of the table of contents and written to the keyword extraction file 10 (step S7-4). Next, the character string of the remaining specified number of characters, that is, the overall specified number minus the predetermined number described above, is extracted, broken into words, and written to the keyword extraction file 10 (step S7-5). If the word "table of contents" is not present in the character string of the predetermined length, the process proceeds directly to step S7-5, where the remaining character string is extracted, broken into words, and written to the keyword extraction file 10 in the same way.
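The split between the head of the document and the remainder (steps S7-2, S7-3, and S7-5) can be sketched in Python as follows. The word splitter is a simple stand-in for the decomposition into words, which the text does not specify, and step S7-4 (parsing the table of contents itself) is not implemented; the function only reports whether the heading was found. All names are hypothetical.

```python
import re


def split_words(text):
    """Stand-in for word decomposition: split on whitespace and common punctuation."""
    return [w for w in re.split(r"[\s、。,.]+", text) if w]


def extract_document_keywords(text, total_chars, head_chars, toc_word="目次"):
    """Return (keywords, has_toc) for the first `total_chars` characters of `text`."""
    head = text[:head_chars]                 # step S7-2: predetermined head length
    keywords = split_words(head)
    has_toc = toc_word in head               # step S7-3: is the heading present?
    rest = text[head_chars:total_chars]      # step S7-5: remaining specified characters
    keywords.extend(split_words(rest))
    return keywords, has_toc
```

For Japanese text without spaces, a morphological analyser would be needed in place of `split_words`; the regular expression here only illustrates where that step sits.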

[0041] Next, the keyword storage procedure is described with reference to the flowchart of FIG. 8, the conceptual diagram of the appearance frequency file in FIG. 16, and the conceptual diagram of the keyword file in FIG. 17. The keyword extraction file 10 is read (step S8-1), and the keywords are sorted file by file (step S8-2). Where a keyword is duplicated, only one copy is kept (step S8-3). At the same time, the number of appearances of each sorted keyword, in each file or across all files, is counted and written to an appearance frequency file such as the one shown in FIG. 16 (step S8-4). The per-file keywords with duplicates removed are then screened by a rule; for a drawing, for example, "apart from drawing numbers and document numbers, only strings of two or more characters consisting of kanji, hiragana, or katakana are treated as keywords". After screening, the keywords are written to the keyword file 12, separated by a specific symbol such as "," (step S8-5).
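Steps S8-2 to S8-5 can be sketched in Python as below. The sketch counts appearances across all files, applies the example rule for drawings, and assumes that drawing numbers and document numbers are supplied separately in `exempt`, since the text does not say how they are recognised. The function name and data layout are hypothetical.

```python
import re
from collections import Counter

# Two or more characters, all of them hiragana, katakana, or kanji (basic blocks only).
JP_ONLY = re.compile(r"[\u3040-\u309F\u30A0-\u30FF\u4E00-\u9FFF]{2,}")


def store_keywords(extracted, exempt=()):
    """extracted: dict mapping a file name to its list of extracted keywords.

    exempt: keywords such as drawing numbers that bypass the selection rule.
    Returns (keyword_file, counts).
    """
    counts = Counter()
    keyword_file = {}
    for name, words in extracted.items():
        ordered = sorted(words)                 # step S8-2: sort per file
        counts.update(ordered)                  # step S8-4: count appearances
        unique = sorted(set(ordered))           # step S8-3: keep one copy of each
        kept = [w for w in unique if w in exempt or JP_ONLY.fullmatch(w)]
        keyword_file[name] = ",".join(kept)     # step S8-5: delimiter-separated
    return keyword_file, counts
```

Counting before deduplication matters: the appearance frequency file is meant to show how often a keyword occurred, which is lost once duplicates are removed.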

[0042] The keyword input means 13 stores in the keyword extraction file 10 keywords keyed in by an operator, for example after consulting the appearance frequency file. Either at the point during keyword extraction when it is determined that character information cannot be written out of a document, or after the whole series of keyword extraction processes has finished, the operator keys in keywords for the documents stored in the character information unextracted file 9. If, for example, the documents managed by the multi-document system belong to the same field, the operator can consult the appearance frequency file when choosing these keywords.

[0043] Once keywords have been extracted automatically from documents such as drawings, forms, and text documents and the multi-document system has been brought up in a short time as described above, search accuracy can be improved by re-registering keywords by key input, for example whenever documents are re-registered.

[0044] The keyword extraction computer program described above can be provided preinstalled on a computer, or on a portable recording medium such as a CD-ROM or MO. It can also be provided over a communication line.

【００４５】[0045]

[Effects of the Invention] As described above, the data processing device and recording medium of the present invention convert the coordinates of a drawing into coordinates whose origin is a predetermined point, for example the lower left, detect the range containing the title block regardless of the drawing size and the position of the original origin, and automatically extract keywords from that range. They also detect the row and column of a form in which the item names of the columns and rows are entered, based on the character types of the characters entered in the items, for example whether characters other than digits are present and whether the number of such characters is the largest, and automatically extract keywords from that row and column. In addition, they automatically extract keywords from a character string of a predetermined length at the beginning of a document, which is likely to contain a table of contents, a summary, or the like. Together these simplify the advance setup work, including keyword extraction and registration, and have the excellent effect of allowing a multi-document system to be brought up in a short time.

[Brief description of the drawings]

FIG. 1 is an overall configuration diagram of the present invention.

FIG. 2 is a flowchart of drawing keyword extraction.

FIG. 3 is an explanatory diagram of relative coordinate conversion.

FIG. 4 is a flowchart of form keyword extraction.

FIG. 5 is an explanatory diagram of form keyword extraction.

FIG. 6 is a flowchart of character information extraction.

FIG. 7 is a flowchart of document keyword extraction.

FIG. 8 is a flowchart of keyword storage.

FIG. 9 is a conceptual diagram of a document file.

FIG. 10 is a conceptual diagram of an extraction method determination file.

FIG. 11 is a conceptual diagram of a character information unextracted file.

FIG. 12 is a conceptual diagram of a keyword extraction file.

FIG. 13 is a conceptual diagram of a title block upper-left coordinate file.

FIG. 14 is a conceptual diagram of a fixed item keyword extraction file.

FIG. 15 is a conceptual diagram of a character information extraction work file.

FIG. 16 is a conceptual diagram of an appearance frequency file.

FIG. 17 is a conceptual diagram of a keyword file.

[Explanation of symbols]

1 Document designation means
2 Document file
3 Document type determination means
4 Extraction method determination file
5 Drawing keyword extraction means
6 Form keyword extraction means
7 Character information extraction means
8 Document keyword extraction means
9 Character information unextracted file
10 Keyword extraction file
11 Keyword storage means
12 Keyword file
13 Keyword input means

Continuation of front page

F-terms (reference): 5B075 ND06 NK32 NK37 NR05 NR14 UU21; 5B082 EA08

Claims

[Claims]

1. A data processing device having a function of extracting keywords from a file of drawing information having a title block, comprising: means for storing, for each drawing size, coordinate values of at least one point defining a range that includes the title block in a drawing whose range is defined by coordinates with a predetermined point as the origin; means for converting the coordinates of a drawing into coordinates having the predetermined point as the origin; and means for extracting keywords from a range defined on the basis of the coordinate values of the at least one point stored in association with the size of the drawing from which keywords are to be extracted.

2. The data processing device according to claim 1, wherein the title block has a plurality of item fields in which data unique to the drawing and fixed data indicating an attribute of that data are entered, the device further comprising: means for extracting the fixed data entered in an item field; and means for extracting, as a keyword for that fixed data, the data entered in the item field adjacent to the item field in which the fixed data is entered.

3. A data processing device having a function of extracting keywords from a file of information in tabular format, comprising: means for detecting, based on the character types of the characters entered in the items, an item row consisting of items in which data indicating the attributes of the data entered in the items of each column is entered, and/or an item column consisting of items in which data indicating the attributes of the data entered in the items of each row is entered; and means for extracting keywords from the items in the item row and/or the item column.

4. The data processing device according to claim 3, further comprising means for extracting keywords from a range outside the information in tabular format.

5. A data processing device having a function of extracting keywords from a document file, comprising: means for breaking a character string of a predetermined number of characters from the beginning of the document into words; and means for extracting keywords from the character string.

6. A data processing device having a function of extracting keywords from a plurality of types of files including files of drawing information, files of information in tabular format, and document files, comprising: means for determining the type of a file from the extension of the file from which keywords are to be extracted; means for storing, for each drawing size, coordinate values of at least one point defining a range that includes the title block in a drawing whose range is defined by coordinates with a predetermined point as the origin; means for, when the file type is a file of drawing information, converting the coordinates of the drawing into coordinates having the predetermined point as the origin and extracting keywords, in the drawing after coordinate conversion, from a range defined on the basis of the coordinate values of the at least one point stored in association with the size of the drawing from which keywords are to be extracted; means for, when the file type is a file of information in tabular format, detecting, based on the character types of the characters entered in the items, an item row consisting of items in which data indicating the attributes of the data entered in the items of each column is entered, and/or an item column consisting of items in which data indicating the attributes of the data entered in the items of each row is entered, and extracting keywords from the items in the item row and/or the item column; and means for, when the file type is a document file, breaking a character string of a predetermined number of characters from the beginning of the document into words and extracting keywords from the character string.

7. The data processing device according to claim 1, further comprising: means for sorting the extracted keywords to detect duplicate keywords; and means for, when keywords are duplicated, deleting all but one of them.

8. A computer-readable recording medium on which is recorded a computer program including program code means for extracting keywords from a plurality of types of files including files of drawing information, files of information in tabular format, and document files, the recording medium comprising: program code means for causing the computer to determine the type of a file from the extension of the file from which keywords are to be extracted; program code means for causing the computer to store, for each drawing size, coordinate values of at least one point defining a range that includes the title block in a drawing whose range is defined by coordinates with a predetermined point as the origin; program code means for causing the computer, when the file type is a file of drawing information, to convert the coordinates of the drawing into coordinates having the predetermined point as the origin and to extract keywords, in the drawing after coordinate conversion, from a range defined on the basis of the coordinate values of the at least one point stored in association with the size of the drawing from which keywords are to be extracted; program code means for causing the computer, when the file type is a file of information in tabular format, to detect, based on the character types of the characters entered in the items, an item row consisting of items in which data indicating the attributes of the data entered in the items of each column is entered, and/or an item column consisting of items in which data indicating the attributes of the data entered in the items of each row is entered, and to extract keywords from the items in the item row and/or the item column; and program code means for causing the computer, when the file type is a document file, to break a character string of a predetermined number of characters from the beginning of the document into words and to extract keywords from the character string.