JP2014134920A

JP2014134920A - Design document retrieval system, construction method of design document retrieval system and program therefor

Info

Publication number: JP2014134920A
Application number: JP2013001845A
Authority: JP
Inventors: Yoshiyuki Kobayashi; 義行小林
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 2013-01-09
Filing date: 2013-01-09
Publication date: 2014-07-24

Abstract

PROBLEM TO BE SOLVED: To automatically extract retrieval information from design documents so that required design documents can be retrieved without omission, and to retrieve design documents by using that retrieval information. SOLUTION: A design document retrieval system comprises: a character string extraction section for extracting character strings from design document files; a specification item name collection section for collecting specification item names, using the extracted character strings, from sets of design document files classified by design document type; a specification data collection section for collecting specification data, using the extracted character strings, from sets of design document files classified by design document project; and a specification item name-specification data relation extraction section that associates the design document files with the specification item names collected by the specification item name collection section and the specification data collected by the specification data collection section, and stores these associations in a specification database.

Description

The present invention relates to a technology for managing and retrieving design-related documents created when designing a product or a plant.

Design-related documents (hereinafter referred to as "design documents") created prior to product manufacturing or plant construction often describe a single target from multiple viewpoints. For example, in the construction of a plant such as a power plant, various design documents with different viewpoints are produced for a single structure. For piping, for instance, there are a structural diagram showing the physical structure, a piping system diagram showing how the pipes are connected, a sensor arrangement drawing showing how sensors are arranged on the piping, and a data flow diagram showing how sensor data is transmitted.

Design documents are used for manufacturing and construction. They may also be referred to during maintenance and inspection, and a new design document may be created based on an existing one. Work that uses these design documents requires retrieving all of the relevant design documents and browsing them in relation to one another.

Design documents often include drawings as well as text written in natural language. Moreover, because they are created with various systems, including CAD systems, they are often stored not as the original electronic files containing the authoring data but as printed documents captured with a scanner or similar device. This is because any drawing can be viewed once it is in scanned form.

For this reason, text retrieval techniques such as full-text search cannot yield sufficient search results for design documents. Therefore, as disclosed in Patent Document 1, it is common to assign search information such as keywords to each design document in advance and to search using that information.

Patent Document 1: JP 2010-49562 A

Non-Patent Document 1: "Algorithm Introduction" (アルゴリズムイントロダクション), ISBN 978-4764903357, Chapter 15

Assigning search information such as keywords to each design document makes it possible to retrieve the necessary design documents.

However, entering the search information is left to the person who registers the design document. Consequently, if some search information was not registered, or if search information that was not initially anticipated becomes necessary, it is difficult to retrieve all the related design documents. Moreover, general-purpose document search programs are difficult to apply to design documents that consist largely of drawing data.

One countermeasure is to extract search information such as keywords automatically. However, the term extraction methods studied for general documents mostly rely on statistical properties, and the terms that should serve as keywords in a design document are not necessarily statistically distinctive.

Therefore, keyword extraction aimed at associating design documents needs a method that selects terms according to whether they can associate documents with one another. What is desired is a method that automatically extracts the descriptions passed between design documents and classifies them appropriately, thereby extracting keywords automatically.

An object of the present invention is to automatically extract search information from design documents so that the necessary design documents can be retrieved without omission, and to perform searches using that search information.

To solve the above problems, for example, the configurations described in the claims are adopted.
The present application includes a plurality of means for solving the above problems. One example is a design document search system comprising: a design document database that stores design document files; a specification database that registers the relationships among the design document files, specification item names, and specification data; a search query reception unit that receives the input of a search query; a design document search unit that searches the specification database based on a project name, a specification item name, or the like entered as the search query; and a design document file output unit that outputs design document files from the design document database based on the search results.
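The retrieval path described above (search query, then specification database, then design document files) can be sketched as follows. This is a minimal in-memory illustration under stated assumptions: the `SpecRelation` row layout, the `search` and `output_files` functions, and the sample values are hypothetical and are not the actual tables of FIG. 15 or FIG. 16.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpecRelation:
    """Hypothetical row of the specification database."""
    file_id: str    # file identifier of a design document file
    project: str    # design document project
    item_name: str  # specification item name, e.g. "Pipe name"
    data: str       # specification data, e.g. "AS-1"


def search(relations, project=None, item_name=None, data=None):
    """Return the sorted file identifiers of rows matching every field given in the query."""
    hits = set()
    for r in relations:
        if project is not None and r.project != project:
            continue
        if item_name is not None and r.item_name != item_name:
            continue
        if data is not None and r.data != data:
            continue
        hits.add(r.file_id)
    return sorted(hits)


def output_files(file_ids, design_db):
    """Look up files in a hypothetical design document database (file_id -> path)."""
    return [design_db[f] for f in file_ids if f in design_db]


spec_db = [
    SpecRelation("F001", "Piping work for power plant A", "Pipe name", "AS-1"),
    SpecRelation("F002", "Piping work for power plant A", "Pipe name", "AS-1"),
    SpecRelation("F003", "Piping work for power plant A", "Pipe name", "AS-2"),
]
design_db = {"F001": "piping_diagram.pdf", "F002": "sensor_layout.pdf", "F003": "structure.pdf"}

ids = search(spec_db, item_name="Pipe name", data="AS-1")
print(ids)                           # ['F001', 'F002']
print(output_files(ids, design_db))  # ['piping_diagram.pdf', 'sensor_layout.pdf']
```

Because every design document that shares the same specification data ("AS-1") is returned together, documents drawn from different viewpoints can be retrieved as a set even if no keyword was assigned to them by hand.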

The design document search system of the present invention may further comprise: a character string extraction unit that extracts character strings from the design document files; a specification item name collection unit that collects the specification item names, using the extracted character strings, from sets of the design document files classified by design document type; a specification data collection unit that collects the specification data, using the extracted character strings, from sets of the design document files classified by design document project; and a specification item name-specification data relationship extraction unit that associates each design document file with the specification item names collected by the specification item name collection unit and the specification data collected by the specification data collection unit, and stores these associations in the specification database.

A method of constructing a design document search system according to the present invention comprises the steps of: extracting character strings from design document files; classifying the design document files according to design document type and design document project; collecting specification item names, using the extracted character strings, from the sets of design document files classified by type; collecting specification data, using the extracted character strings, from the sets of design document files classified by project; associating the collected specification item names with the collected specification data; and registering the relationships among the design document files, the specification item names, and the specification data as a specification database.

A program of the present invention causes a computer to construct a design document search system by executing the steps of: extracting character strings from design document files; classifying the design document files according to design document type and design document project; collecting specification item names from the sets of design document files classified by type, by using the extracted character strings to evaluate the similarity between character strings; collecting specification data from the sets of design document files classified by project, by using the extracted character strings to evaluate the similarity between character strings; associating the collected specification item names with the collected specification data; and registering the relationships among the design document files, the specification item names, and the specification data as a specification database.

According to the present invention, automatically extracting specification item names and specification data from design documents makes it possible to add, as search information, information that was not assigned in advance. This reduces the burden of retrieving the necessary design documents.

FIG. 1 is a functional block diagram showing an example of the design document search system of the present invention.
FIG. 2 is a diagram showing an example of the hardware configuration of the design document search system of the present invention.
FIG. 3 is a diagram showing the data structure of a design document file.
FIG. 4 is a diagram showing the flow of the specification item name collection process.
FIG. 5 is a diagram showing the aggregation table of the specification item name collection process.
FIG. 6 is a diagram showing the output table of the specification item name collection process.
FIG. 7 is a diagram showing the flow of the specification data collection process.
FIG. 8 is a diagram showing the aggregation table of the specification data collection process.
FIG. 9 is a diagram showing the output table of the specification data collection process.
FIG. 10 shows example sentences used for character string matching.
FIG. 11 is a diagram showing an example of the table used to evaluate character string similarity.
FIG. 12 shows the result of matching the character strings of the example sentences.
FIG. 13 is a diagram showing the flow of the specification item name-specification relation data extraction process.
FIG. 14 is a diagram showing the aggregation table of specification item name-specification relation data.
FIG. 15 is a diagram showing a table of the design document database.
FIG. 16 is a diagram showing a table of the specification database.
FIG. 17 is a diagram showing the relationship between an enterprise asset management system and the design document search system.
FIG. 18 shows Equation 1 for calculating character string similarity.

An embodiment of the present invention will be described with reference to the drawings.

FIG. 1 shows a functional block diagram of the design document search system of the present invention. The system performs two processes: extracting specification item names and specification data from design documents, and retrieving design documents. The extraction process is executed by the design document file input unit 101, design document file reading unit 102, character string extraction unit 103, design document file classification unit 104, character string similarity evaluation unit 105, specification item name collection unit 106, specification data collection unit 107, specification item name-specification data relationship extraction unit 108, design document database 109, and specification database 110. The retrieval process is executed by the search query reception unit 111, design document search unit 112, design document file output unit 113, design document database 109, and specification database 110.

As shown in FIG. 2, the present invention operates on an apparatus such as a computer that comprises: a central processing device 201 that has a central processing unit and processes information by the stored-program method; a main storage device 202 consisting of random access memory; an external storage device 203 that stores the documents to be processed and the dictionaries produced as processing results; an input device 204 used to input documents and the like; and an output device 205 that outputs information processing results such as created dictionaries. The external storage device 203 stores a database 2031 and a dictionary 2032. The input device 204 includes a CD-ROM reader 2041, a DVD reader 2042, a keyboard 2043, and the like. The output device 205 includes a CD-ROM writer 2051, a DVD writer 2052, a display 2053, and the like. The central processing device 201 may also be connected to another information processing apparatus 207 via a network 206.

The design document search system is realized on a computer by the central processing device loading a predetermined program into the main storage device and executing it. The program may be input via the input device from a storage medium on which it is stored, or from a network, and either loaded directly into the main storage device or first stored in the external storage device and then loaded into the main storage device.

The program invention of the present invention is a program that is installed in a computer in this way and causes the computer to operate as a design document search system. Installing the program in a computer configures the design document search system shown in the block diagram of FIG. 1.

The design document file input unit 101 receives, through the input device 204, electronic files supplied from outside the system on storage media such as DVDs or CD-ROMs, and saves them in the external storage device 203. The design document database 109 and the specification database 110 are built on the external storage device 203. A design document file is an electronic file either created with a CAD system, a word processor, or the like, or obtained by scanning a design document printed on paper. The character strings it contains are assumed to be character-coded; the character code is not particularly restricted. For scanned files, character codes are assumed to have been recognized from the character images by an optical character recognition program.

Each design document file is assigned a unique identifying symbol, hereinafter called a file identifier. The file identifier, the design document type, and the design document project are recorded in the design document file in a readable form.

The design document file reading unit 102 reads design documents stored in the design document database 109, which is built on the external storage device 203, into the main storage device 202.

The character string extraction unit 103 extracts character strings from a design document file. Each extracted character string is held in the main storage device 202 in association with the file identifier, the design document type, the design document project, and its position in the file. The position in the file is represented by a page number and two-dimensional coordinates within the page. The character strings are assumed to have been divided as appropriate at line feed codes and at large spatial gaps between strings.
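The text above only assumes that strings are split at line feeds and at large spatial gaps. A minimal sketch of such splitting, assuming a hypothetical OCR token format of (text, x_start, x_end, line_number) in reading order and an illustrative `max_gap` threshold, neither of which is specified in the patent:

```python
def split_strings(tokens, max_gap):
    """Group OCR tokens into character strings.

    tokens: list of (text, x_start, x_end, line_no) tuples in reading order
    (a hypothetical format). A new string starts on a new line or when the
    horizontal gap to the previous token exceeds max_gap.
    """
    strings = []
    current = ""
    prev_end, prev_line = None, None
    for text, x_start, x_end, line_no in tokens:
        if prev_end is not None and (line_no != prev_line or x_start - prev_end > max_gap):
            strings.append(current)
            current = ""
        current += text
        prev_end, prev_line = x_end, line_no
    if current:
        strings.append(current)
    return strings


tokens = [("Pipe", 0, 40, 1), (" name", 40, 90, 1), ("AS-1", 300, 340, 1), ("Flow", 0, 40, 2)]
print(split_strings(tokens, max_gap=50))  # ['Pipe name', 'AS-1', 'Flow']
```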

The design document file classification unit 104 classifies the design document files according to design document type and design document project. The specification item name collection unit 106 collects specification item names using the sets of design document files classified by type, and the specification data collection unit 107 performs the specification data aggregation process using the sets classified by project. When the design document file classification unit 104 sends data to the specification item name collection unit 106 and the specification data collection unit 107, it sends the file identifier, design document type, design document project, and information on the character strings contained in the design document in the format shown in FIG. 3. The character string information includes the string itself, the page on which it appears, and its X and Y coordinates on the page. FIG. 3 represents the data of one design document file.
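The per-file record of FIG. 3 and the two groupings made by the classification unit 104 can be sketched as below. The class and field names are hypothetical; the text specifies only which items are sent (file identifier, type, project, and each string with its page and X/Y coordinates).

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class StringInfo:
    """One extracted character string and its position (cf. FIG. 3)."""
    text: str
    page: int
    x: float
    y: float


@dataclass
class DesignDocFile:
    """Per-file data passed on by the classification unit (cf. FIG. 3)."""
    file_id: str
    doc_type: str   # design document type, e.g. "Piping diagram"
    project: str    # design document project, e.g. "Power plant A"
    strings: list = field(default_factory=list)  # list of StringInfo


def classify(files):
    """Group files by type (for item-name collection) and by project (for data collection)."""
    by_type = defaultdict(list)
    by_project = defaultdict(list)
    for f in files:
        by_type[f.doc_type].append(f)
        by_project[f.project].append(f)
    return dict(by_type), dict(by_project)


files = [
    DesignDocFile("F001", "Piping diagram", "Power plant A", [StringInfo("Pipe name", 1, 10.0, 20.0)]),
    DesignDocFile("F002", "Sensor arrangement", "Power plant A"),
    DesignDocFile("F003", "Piping diagram", "Power plant B"),
]
by_type, by_project = classify(files)
print([f.file_id for f in by_type["Piping diagram"]])   # ['F001', 'F003']
print([f.file_id for f in by_project["Power plant A"]])  # ['F001', 'F002']
```

The same file appears in exactly one type group and one project group, which is what lets item names be learned across projects and data values be learned across document types.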

As an example of a design document file, the design document type may be "piping diagram", the design document project may be "piping work for power plant A", a specification item may be "pipe name", and the corresponding specification data may be "AS-1" (a symbol representing a pipe).

The specification item name collection unit 106 evaluates the similarity between character strings extracted from design documents of the same type, for example piping diagrams, and uses the result to collect specification item names. The similarity between the extracted character strings is evaluated with the character string similarity evaluation unit 105.

FIG. 4 shows the flow of the specification item name collection process. First, a predetermined number of design document files of the same type are read (S401). This number is not particularly limited and may be set as appropriate for the performance of the processing apparatus. When every pair of design document files has been compared, the process ends (S402). If an unprocessed pair remains, two design document files are taken out (S403) and compared: character strings are extracted from each, and every pair of strings is compared using the character string similarity (S404).

The comparison results are stored in the aggregation table shown in FIG. 5 (S405). Character strings evaluated as similar by the character string similarity evaluation unit 105 are aggregated into one record (one row), and a frequency is stored for each string. The sum of the frequencies of all strings in the record is stored in the total frequency field. A record may have multiple file identifiers. In FIG. 5, for example, character string 1 is "AAA" and character string k is "AAa". Character strings such as "BBB" and "CCC" are aggregated into records in the same way.

After all files have been processed, the total frequency of each record in the aggregation table is checked, and only records whose total frequency is at least n% of the number of files are selected (S406); n can be set arbitrarily. For each selected record, the string with the highest frequency, for example "AAA", is chosen as the standard string (S407). The specification item name list is output in the format of FIG. 6 (S408).
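Steps S401 to S408 can be sketched as follows. This is an illustrative approximation under stated assumptions, not the patent's implementation: `difflib.SequenceMatcher.ratio()` with a 0.6 threshold stands in for the DP-matching similarity of unit 105, each string is counted once per similar pair, and two existing records are not merged if their strings later turn out to be similar.

```python
from collections import Counter
from difflib import SequenceMatcher
from itertools import combinations


def similar(a, b, threshold=0.6):
    # Stand-in for the DP-matching similarity of unit 105 (assumption).
    return SequenceMatcher(None, a, b).ratio() >= threshold


def collect_item_names(files, n_percent=50.0):
    """files: one list of extracted strings per design document file of the same type."""
    records = []  # aggregation table: one Counter (string -> frequency) per record
    index = {}    # string -> record position
    for fa, fb in combinations(files, 2):          # every pair of files (S402-S403)
        for a in fa:
            for b in fb:
                if not similar(a, b):               # S404
                    continue
                pos = index.get(a, index.get(b))
                if pos is None:
                    pos = len(records)
                    records.append(Counter())
                index.setdefault(a, pos)
                index.setdefault(b, pos)
                records[pos][a] += 1                # S405
                records[pos][b] += 1
    limit = len(files) * n_percent / 100.0
    return [counts.most_common(1)[0][0]            # S407: standard string
            for counts in records
            if sum(counts.values()) >= limit]       # S406: n% threshold


files = [
    ["Pipe name", "AS-1", "Flow rate"],
    ["Pipe name", "QT-7", "Flow rate"],
    ["Pipe nane", "ZZ", "Flow rate"],   # OCR variant of "Pipe name"
]
print(collect_item_names(files))  # ['Pipe name', 'Flow rate']
```

Labels that recur across documents of the same type survive, while per-project values such as "AS-1" do not; the OCR variant "Pipe nane" is absorbed into the "Pipe name" record. The pairwise loop grows quadratically with the number of files, which is consistent with the text's advice to choose the number of files according to processing performance.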

The specification data collection unit 107 evaluates the similarity between character strings extracted from design documents of the same project and uses the result to collect specification data. Here too, the similarity between the extracted character strings is evaluated with the character string similarity evaluation unit 105.

FIG. 7 shows the flow of the specification data collection process. First, a predetermined number of design document files of the same project are read (S701). This number is not particularly limited and may be set as appropriate for the performance of the processing apparatus. Next, the specification item name list created by the specification item name collection process is read (S702). When every pair of design document files has been compared, the process ends (S703). If an unprocessed pair remains, two design document files are taken out (S704) and compared: character strings are extracted from each, and every pair of strings is compared using the character string similarity (S705).

The comparison results are stored in the aggregation table shown in FIG. 8 (S707). Character strings evaluated as similar by the character string similarity evaluation unit 105 are aggregated into one record (one row), and a frequency is stored for each string. A record may have multiple file identifiers. After all files have been processed, the total frequency of each record is checked, and only records whose total frequency is at least n% of the number of files are selected (S708); n can be set arbitrarily. The selected records are then checked, and any record containing a string identical to a specification item name in the specification item name list is excluded (S709). For each remaining record, the string with the highest frequency is chosen as the standard string (S710). Finally, the specification data list is output in the format of FIG. 9 (S711).
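The specification data flow differs from item-name collection mainly in grouping by project and in the exclusion step S709. A sketch under the same assumptions as before (`SequenceMatcher` as a stand-in for unit 105, hypothetical function names and sample strings):

```python
from collections import Counter
from difflib import SequenceMatcher
from itertools import combinations


def similar(a, b, threshold=0.6):
    # Stand-in for the DP-matching similarity of unit 105 (assumption).
    return SequenceMatcher(None, a, b).ratio() >= threshold


def collect_spec_data(files, item_names, n_percent=50.0):
    """files: one list of extracted strings per design document file of the same project."""
    records, index = [], {}
    for fa, fb in combinations(files, 2):          # S703-S704
        for a in fa:
            for b in fb:
                if not similar(a, b):               # S705
                    continue
                pos = index.get(a, index.get(b))
                if pos is None:
                    pos = len(records)
                    records.append(Counter())
                index.setdefault(a, pos)
                index.setdefault(b, pos)
                records[pos][a] += 1                # S707
                records[pos][b] += 1
    limit = len(files) * n_percent / 100.0
    data = []
    for counts in records:
        if sum(counts.values()) < limit:            # S708: n% threshold
            continue
        if any(s in item_names for s in counts):    # S709: drop item-name records
            continue
        data.append(counts.most_common(1)[0][0])    # S710: standard string
    return data


files = [
    ["Pipe name", "AS-1"],                 # piping diagram, project A
    ["Pipe name", "AS-1", "Sensor ID"],    # sensor arrangement drawing, project A
    ["Sensor ID", "AS-1"],                 # data flow diagram, project A
]
print(collect_spec_data(files, {"Pipe name", "Sensor ID"}))  # ['AS-1']
```

Labels such as "Pipe name" also recur within one project, so without S709 they would be mistaken for specification data; the exclusion leaves only values like "AS-1" that tie the project's documents together.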

The character string similarity evaluation unit 105 calculates the similarity between character strings using DP (dynamic programming) matching. This calculation method is described in many books, including Non-Patent Document 1, and is therefore not explained in detail here.

The calculation method is outlined here using the character string matching shown in FIG. 10 as an example. For simplicity, two strings are matched; matching three or more strings only requires computing every pairwise combination, so that case is not described here.

The cost is a number indicating how much two character strings differ. It is calculated from the operations needed to transform one string into the other, namely inserting, deleting, and substituting characters. Each operation is assigned a cost, and the costs of the required operations are summed. Here, each insertion, deletion, or substitution scores -2 points and each matching character scores +2 points, so a higher total score means more similar strings.

In DP matching, as shown in FIG. 11, the two strings being compared are assigned to the rows and the columns, respectively, and the scores are managed in a two-dimensional table whose cells are filled in order. In the example of FIG. 11, the rows correspond to 「文書部品を利用した文書作成方法」 ("document creation method using document parts") and the columns to 「文書を再利用した文書作成方法」 ("document creation method reusing documents"). A cell is identified by its row and column: the cell in row n, column m is written (n, m), with both rows and columns numbered from 1. The score S(n, m) of cell (n, m) is calculated by Equation 1 in FIG. 18, and the cell used to calculate each score is recorded.

For example, for S(12, 13), the 12th character of the row string and the 13th character of the column string are both 「作」, so d(12, 13) = 2 and the first term of Equation 1 is 14, while the second and third terms are both 10. The first term, being the largest, is chosen, so S(12, 13) = 14, and it is recorded that the score of cell (12, 13) was calculated from cell (11, 12). However, when a score becomes 0, everything is cleared, including the cells recorded so far.

The score of each cell in the score table represents the similarity of the character strings corresponding to the cells traced to compute that score. Tracing the recorded cells back in the reverse order of recording yields the correspondence between the strings. By tracing from cells in descending order of score, correspondences are obtained in descending order of similarity. Because a cell that has already been traced is never traced a second time, the same portion of a string is not extracted repeatedly. In FIG. 11, cell (15, 16) has the maximum value, 20; tracing the cells used to obtain this value gives the string correspondence shown in FIG. 12.
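Equation 1 itself appears only in FIG. 18, which is not reproduced here. The sketch below assumes a local-alignment (Smith-Waterman-style) recurrence that is consistent with the worked example: S(n, m) is the maximum of 0, S(n-1, m-1) + d(n, m), S(n-1, m) - 2 and S(n, m-1) - 2, with d = +2 for a match and -2 otherwise, and ties go to the first term. Each positive score records its source cell, a zero score records nothing, and correspondences are traced from the highest-scoring cells without re-tracing cells already used. Function names and example strings are hypothetical.

```python
MATCH, PENALTY = 2, -2  # +2 per matching character; -2 per insertion, deletion or substitution


def score_table(row, col):
    """Fill the score table with the assumed recurrence; cells are 1-based as in FIG. 11."""
    S = [[0] * (len(col) + 1) for _ in range(len(row) + 1)]
    back = {}  # (n, m) -> cell whose score was used to compute S(n, m)
    for n in range(1, len(row) + 1):
        for m in range(1, len(col) + 1):
            d = MATCH if row[n - 1] == col[m - 1] else PENALTY
            value, source = max(
                [(S[n - 1][m - 1] + d, (n - 1, m - 1)),   # term 1: diagonal
                 (S[n - 1][m] + PENALTY, (n - 1, m)),     # term 2
                 (S[n][m - 1] + PENALTY, (n, m - 1))],    # term 3
                key=lambda c: c[0])                       # first maximum wins ties
            if value > 0:
                S[n][m] = value
                back[(n, m)] = source
            # otherwise S[n][m] stays 0 and no source is recorded
    return S, back


def correspondences(row, col, limit=3):
    """Return up to `limit` (row_part, col_part, score) tuples, best first."""
    S, back = score_table(row, col)
    cells = sorted(((S[n][m], n, m)
                    for n in range(1, len(row) + 1)
                    for m in range(1, len(col) + 1) if S[n][m] > 0), reverse=True)
    used, result = set(), []
    for score, n, m in cells:
        path, cell = [], (n, m)
        while cell in back:          # follow recorded cells in reverse order
            path.append(cell)
            cell = back[cell]
        if any(c in used for c in path):
            continue                 # never trace a cell twice
        used.update(path)
        result.append((row[cell[0]:n], col[cell[1]:m], score))
        if len(result) >= limit:
            break
    return result


print(correspondences("abcdef", "xbcdey"))  # [('bcde', 'bcde', 8)]
```

Lower-scoring cells such as (6, 6) trace back through the already-used cell (5, 5), so they are skipped rather than reported as a second, overlapping correspondence.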

In this embodiment, Equation 1 is used to decide whether character strings are similar, but a dictionary could be used instead. With a dictionary, synonyms whose written forms are completely different, such as 「照合」 and 「マッチング」 (both meaning "matching"), can also be judged identical.
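The dictionary-based alternative can be sketched as follows, assuming a hypothetical synonym table that maps each written form to a canonical term and falling back to a character-level similarity (here `difflib.SequenceMatcher`, standing in for Equation 1) when the dictionary has no entry.

```python
from difflib import SequenceMatcher

# Hypothetical synonym dictionary: written form -> canonical term.
SYNONYMS = {
    "照合": "matching",
    "マッチング": "matching",
}


def canonical(term):
    return SYNONYMS.get(term, term)


def same_term(a, b, threshold=0.6):
    """Dictionary lookup first; otherwise fall back to a character-level ratio.

    SequenceMatcher is only a stand-in for Equation 1 (assumption).
    """
    if canonical(a) == canonical(b):
        return True
    return SequenceMatcher(None, a, b).ratio() >= threshold


print(same_term("照合", "マッチング"))      # True: synonyms via the dictionary
print(same_term("Pipe name", "Pipe nane"))  # True: similar spelling
print(same_term("照合", "配管"))            # False
```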

The specification item name-specification data relationship extraction unit 108 associates, within each individual design document, the specification item names collected in the specification item name collection process with the specification data collected in the specification data collection process.

The flow of processing is shown in FIG. 13. The design document file, the specification item name list, and the specification data list are read (S1301, S1302, S1303). The character strings of the design document file are checked, and any character string that contains a string from the specification item name list as a substring is selected (S1304). It is then checked whether the selected character string is at least n characters longer than the specification item name it contains (S1306). n is an arbitrary number; a value of about 5 is sufficient. If it is, dependency analysis is performed on the character string containing the specification item name, and the dependency target of the portion matching the specification item name is extracted. If this dependency target string is in the specification data list (S1307), the pair is registered in the tabulation table of FIG. 14 (S1308). If it is not, spatial proximity on the printed page is checked (S1309). Spatial proximity on the printed page can be determined by calculating the Euclidean distance from the X-Y coordinates shown in FIG. 3 and checking whether that distance is equal to or less than a threshold. If a spatially close character string is in the specification data list (S1310), the pair is registered in the tabulation table of FIG. 14 (S1308). When all specification item names have been processed (S1305), the specification item name-specification data relationship data is output (S1311), and the process ends.
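The association flow of FIG. 13 can be sketched as below. This is not the patented implementation: a real dependency parser for Japanese is out of scope here, so `naive_dependency_target` is a hypothetical stand-in that simply takes the text after the item name. The threshold of 50 coordinate units, the use of a `Counter` for the FIG. 14 tabulation table, and the choice to also fall back to the proximity check when a string is too short for dependency analysis are all assumptions.

```python
import math
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass(frozen=True)
class TextItem:
    text: str   # extracted character string
    page: int   # page on which it appears
    x: float    # X coordinate on the page (as in FIG. 3)
    y: float    # Y coordinate on the page


def naive_dependency_target(text: str, item_name: str) -> Optional[str]:
    """Hypothetical stand-in for dependency analysis: return the text that
    follows the item name, stripped of common separators."""
    idx = text.find(item_name)
    if idx < 0:
        return None
    rest = text[idx + len(item_name):].strip(" :：=＝\t")
    return rest or None


def associate(items: Iterable[TextItem], item_names, spec_data,
              n: int = 5, threshold: float = 50.0,
              dependency_target: Callable = naive_dependency_target) -> Counter:
    items = list(items)
    data_set = set(spec_data)
    table = Counter()                      # stand-in for the FIG. 14 table
    for name in item_names:                # loop until all names done (S1305)
        for it in items:
            if name not in it.text:        # S1304: contains the item name?
                continue
            if len(it.text) >= len(name) + n:          # S1306: n or more longer
                target = dependency_target(it.text, name)
                if target in data_set:                 # S1307
                    table[(name, target)] += 1         # S1308
                    continue
            # S1309/S1310: nearby strings on the same printed page
            for other in items:
                if other is it or other.page != it.page or other.text not in data_set:
                    continue
                if math.hypot(other.x - it.x, other.y - it.y) <= threshold:
                    table[(name, other.text)] += 1     # S1308
    return table                                       # S1311
```

Counting pairs rather than storing them once lets frequently co-occurring item name/data pairs be distinguished from accidental neighbors when the table is later filtered.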

The design document database 109 stores all design document files. This database can be constructed on the external storage device 203 using a program such as a relational database, an XML database, or a file server. In this embodiment, it is constructed in table form on a relational database, as the table shown in FIG. 15.

The specification database 110 registers the relationships among the specification item names and specification data collected from the design documents and the individual design document files to which they belong. This database can be constructed on the external storage device 203 using a program such as a relational database or an XML database. In this embodiment, it is constructed in table form on a relational database, as the table shown in FIG. 16.

The search query receiving unit 111 receives input of a query for searching design documents. The query is entered using the input device 204, such as a keyboard. A typical search envisioned by the present invention specifies a case name, for example "Piping work at Power Plant A", and a specification item name, for example "pipe name". It retrieves the design document files within the specified case that contain the specification item name, organizes those files into groups sharing a common pair of specification item name and specification data, and presents them to the user. It is therefore suitable for the user to enter a query by selecting the desired items from a case list and a specification item name list displayed as menus. However, entering an arbitrary character string with the keyboard also poses no particular problem.

The design document search unit 112 searches the file identifiers, specification item names, and specification data stored in the specification database 110 for the combination of case name, specification item name, and specification data entered as the search query. Such search processing is easy to implement with a general relational database or XML database, so it is not described in detail here.
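As a minimal sketch of such a search on a relational database, the following uses Python's standard `sqlite3` module. The table and column names are hypothetical, since FIGS. 15 and 16 are not reproduced here; the columns follow the contents listed in the claims (file identifier, design document type, case, specification item name, specification data). Results are grouped by specification data so that files sharing the same data can be displayed next to each other.

```python
import sqlite3
from itertools import groupby

# Hypothetical schema loosely modeled on FIG. 15 (design document database)
# and FIG. 16 (specification database).
SCHEMA = """
CREATE TABLE design_documents (
    file_id   TEXT PRIMARY KEY,
    doc_type  TEXT NOT NULL,
    case_name TEXT NOT NULL,
    path      TEXT
);
CREATE TABLE specifications (
    file_id   TEXT NOT NULL REFERENCES design_documents(file_id),
    item_name TEXT NOT NULL,
    spec_data TEXT NOT NULL
);
"""


def search(conn, case_name, item_name, spec_data=None):
    """Return {spec_data: [file_id, ...]} for one case and item name."""
    sql = """
        SELECT s.spec_data, d.file_id
        FROM specifications AS s
        JOIN design_documents AS d ON d.file_id = s.file_id
        WHERE d.case_name = ? AND s.item_name = ?
    """
    params = [case_name, item_name]
    if spec_data is not None:          # optionally narrow to one data value
        sql += " AND s.spec_data = ?"
        params.append(spec_data)
    sql += " ORDER BY s.spec_data, d.file_id"
    rows = conn.execute(sql, params).fetchall()
    # Rows are sorted by spec_data, so groupby yields one group per value.
    return {data: [fid for _, fid in grp]
            for data, grp in groupby(rows, key=lambda r: r[0])}
```

Usage: after `conn.executescript(SCHEMA)` and inserting rows, `search(conn, "A発電所の配管工事", "配管名")` returns the matching file identifiers keyed by pipe name, which the output unit could then display group by group.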

The design document file output unit 113 reads design document files from the design document database 109 using the stored file identifiers and outputs them to the output device 205, such as a display. At this time, displaying design documents that share common specification data close to one another makes related design documents easier to browse.

The retrieved design documents can also be used in cooperation with an enterprise asset management (EAM) system. Equipment data for plants and similar facilities is registered in the enterprise asset management system. Associating this data with the equipment described in the design documents helps support efficient equipment maintenance.

FIG. 17 shows an example of a configuration linking the enterprise asset management (EAM) system 301 and the design document search system 302 of the present invention. The enterprise asset management system 301 holds data 305 such as operation data, maintenance history, and failure history for equipment such as device A. The design document search system 302 holds, in the design document database 303, design document files 306 such as the layout drawing of device A. When maintenance work on device A is planned based on its operation data 305, the design document search system 302 is used to retrieve "Device A layout drawing" 306, a design document containing a description of device A, and the layout drawing 306 is displayed. This supports efficient equipment maintenance.

As described above, the design document search system shown in FIG. 1 automatically extracts search information from design documents, and this information enables efficient searching.

101 Design document file input unit
102 Design document file reading unit
103 Character string extraction unit
104 Design document file classification unit
105 Character string similarity evaluation unit
106 Specification item name collection unit
107 Specification data collection unit
108 Specification item name-specification data relationship extraction unit
109 Design document database
110 Specification database
111 Search query receiving unit
112 Design document search unit
113 Design document file output unit
201 Central processing unit
202 Main storage device
203 External storage device
204 Input device
205 Output device
206 Network
207 Information processing device
301 Enterprise asset management system
302 Design document search system
303 Design document database
305 Data of the enterprise asset management system
306 Design document file of the design document search system

Claims

A design document search system comprising:
a design document database that stores design document files;
a specification database in which the relationships among the design document files, specification item names, and specification data are registered;
a search query receiving unit that receives input of a search query;
a design document search unit that searches the specification database based on a case name or a specification item name entered as the search query; and
a design document file output unit that outputs design document files from the design document database based on the search result.

The design document search system according to claim 1,
wherein the design document file output unit outputs design document files having common specification data in close proximity to one another.

The design document search system according to claim 1,
wherein each design document file includes a file identifier for identification, a design document type, and a design document case.

The design document search system according to claim 1,
wherein the specification database includes a file identifier for identification, specification item names, and specification data.

The design document search system according to claim 1, further comprising:
a character string extraction unit that extracts character strings from the design document files;
a specification item name collection unit that collects the specification item names, using the extracted character strings, from a set of design document files classified by design document type;
a specification data collection unit that collects the specification data, using the extracted character strings, from a set of design document files classified by design document case; and
a specification item name-specification data relationship extraction unit that associates each design document file with the specification item names collected by the specification item name collection unit and the specification data collected by the specification data collection unit, and stores the associations in the specification database.

The design document search system according to claim 5, further comprising:
a character string similarity evaluation unit that evaluates the similarity between character strings,
wherein the specification item name collection unit collects the specification item names using the similarity evaluation performed by the character string similarity evaluation unit, and
the specification data collection unit collects the specification data using the similarity evaluation performed by the character string similarity evaluation unit.

The design document search system according to claim 6,
wherein the character string similarity evaluation unit evaluates the similarity of character strings using a DP matching method or a dictionary.

The design document search system according to claim 5,
wherein the character string data extracted from each design document file includes a file identifier for identification, a design document type, a design document case, and character string information, and
the character string information includes the character string itself, the page on which the character string appears, and the X and Y coordinates of the character string on that page.

The design document search system according to claim 5,
wherein the specification item name-specification data relationship extraction unit associates the specification item names with the specification data based on the dependency target of a character string that matches a specification item name.

The design document search system according to claim 5,
wherein the specification item name-specification data relationship extraction unit associates the specification item names with the specification data based on spatial proximity when printed.

A method for constructing a design document search system, comprising the steps of:
extracting character strings from design document files;
classifying the design document files by design document type and by design document case;
collecting specification item names, using the extracted character strings, from the set of design document files classified by design document type;
collecting specification data, using the extracted character strings, from the set of design document files classified by design document case;
associating the collected specification item names with the collected specification data; and
registering the relationships among the design document files, the specification item names, and the specification data in a specification database.

The method for constructing a design document search system according to claim 11,
wherein the step of collecting the specification item names evaluates the similarity between character strings and collects the specification item names, and
the step of collecting the specification data evaluates the similarity between character strings and collects the specification data.

The method for constructing a design document search system according to claim 11,
wherein the step of associating the specification item names with the specification data associates them based on the dependency target of a character string that matches a specification item name.

The method for constructing a design document search system according to claim 11,
wherein the step of associating the specification item names with the specification data associates them based on spatial proximity when printed.

A program for causing a computer to construct a design document search system by executing the steps of:
extracting character strings from design document files;
classifying the design document files by design document type and by design document case;
evaluating the similarity between the extracted character strings and collecting specification item names from the set of design document files classified by design document type;
evaluating the similarity between the extracted character strings and collecting specification data from the set of design document files classified by design document case;
associating the collected specification item names with the collected specification data; and
registering the relationships among the design document files, the specification item names, and the specification data in a specification database.