WO2014002212A1 - Document linking method, document searching method, document linking apparatus, document searching apparatus, and program therefor - Google Patents

Document linking method, document searching method, document linking apparatus, document searching apparatus, and program therefor

Info

Publication number
WO2014002212A1
WO2014002212A1 PCT/JP2012/066348 JP2012066348W WO2014002212A1 WO 2014002212 A1 WO2014002212 A1 WO 2014002212A1 JP 2012066348 W JP2012066348 W JP 2012066348W WO 2014002212 A1 WO2014002212 A1 WO 2014002212A1
Authority
WO
WIPO (PCT)
Prior art keywords
document
work
work procedure
similarity
classification
Prior art date
Application number
PCT/JP2012/066348
Other languages
French (fr)
Japanese (ja)
Inventor
義行 小林
Original Assignee
株式会社日立製作所
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 株式会社日立製作所
Priority to JP2014522292A (patent JP5894273B2)
Priority to PCT/JP2012/066348 (WO2014002212A1)
Publication of WO2014002212A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/04 Manufacturing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/10 Office automation; Time management
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30 Computing systems specially adapted for manufacturing

Definitions

  • the present invention relates to a method and an apparatus for searching a document such as a work procedure manual (manual) for business that performs work in accordance with predetermined procedures such as manufacture, maintenance, and inspection of products.
  • Patent Document 1 is an invention that supports efficient search of work procedure manuals.
  • the invention of Patent Document 1 is intended to assist the search for a work procedure manual that serves as a template when creating a work procedure manual for workers who maintain networks of computers and the like.
  • the work procedure manual is searched based on the similarity of the work procedure, and the similarity between the construction target network and the network described in the work procedure manual is evaluated, so that the work procedure manual can be efficiently searched.
  • a sequence matching technique can be used to calculate the similarity of work procedures. As such a technology, there is Non-Patent Document 1.
  • Patent Document 1 is a combination of two methods: a method of searching for documents according to similar work procedures and a method of narrowing down search results using the similarity of the network to be constructed.
  • the latter method cannot be applied to work without network construction.
  • the former method can be applied to search for various work procedure manuals, but it is assumed that the worker can input a sufficiently detailed work procedure.
  • inputting a detailed work procedure is more burdensome than inputting a keyword. Therefore, the worker is expected to input a simple work procedure and search. In such a case, merely searching for similar work procedure manuals returns results that are similar from various viewpoints, so it is difficult to select an appropriate work procedure manual.
  • Patent Document 1 assumes that a search is performed using the similarity to a work procedure written in a work procedure manual that is still being created. This can be regarded as assuming that a simple work procedure is input in order to obtain a description of a detailed work procedure.
  • An object of the present invention is to provide a document search method and apparatus capable of searching an appropriate document only by using a work procedure as a query, and not requiring a detailed work procedure to be input when the work procedure is used as a query.
  • a document association method including a step of classifying documents into a tree structure according to the similarity of work procedures.
  • the document search method of the present invention searches the classification database created using the document association method, and includes a step of inputting a search query in the form of a work procedure, and a step of searching for a document by evaluating the similarity between the search query and the work procedure of each classification in the classification database.
  • a document association apparatus comprising: a document file input unit that inputs a plurality of document files; a work procedure extraction unit that extracts a description of a work procedure from each of the document files; a work procedure similarity evaluation unit that evaluates the similarity of work procedures; and a document classification unit that classifies the documents into a tree structure according to the similarity of the work procedures.
  • the document search apparatus of the present invention includes a classification database created using the document association apparatus, a search query reception unit that receives input of a search query in the form of a work procedure, and a document search unit that searches for a document by evaluating the similarity between the search query and the work procedure of each classification in the classification database.
  • An example of the program of the present invention is a program for causing a computer to function as a document association apparatus, that is, as a document file input unit that inputs a plurality of document files, a work procedure extraction unit that extracts a description of a work procedure from each of the document files, a work procedure similarity evaluation unit that evaluates the similarity of work procedures, and a document classification unit that classifies the documents into a tree structure according to the similarity of the work procedures.
  • the retrieved documents are classified according to the similarity of their work procedures. Therefore, it is only necessary to compare documents classification by classification and select an appropriate one, which reduces the burden of selecting a document.
  • FIG. 1 is a block configuration diagram of a document association apparatus and a document search apparatus according to an embodiment of the present invention.
  • Embodiments of the present invention will be described with reference to a block diagram representing a system function and a specific system diagram.
  • FIG. 1 shows a block configuration of the embodiment of the present invention based on functions on the system.
  • the configuration for classifying and searching work procedure manuals includes a work procedure manual file input unit 101, a work procedure manual file reading unit 102, a work procedure extraction unit 103, a work procedure manual classification unit 104, a work procedure similarity evaluation unit 105, a work procedure manual database 106, a work procedure manual classification database 107, a search query receiving unit 108, a work procedure manual search unit 109, and a work procedure manual file output unit 110.
  • this system operates on a computer with the following specific configuration: a central processing unit 201 that processes information by the stored-program method, a main storage device 202 including a random access memory, an external storage device 203 for storing documents to be processed and a dictionary of processing results, an input device 204 for inputting documents and the like, and an output device 205 for outputting information processing results such as a created dictionary.
  • the central processing unit 201 may be connected to another information processing apparatus 207 via the network 206.
  • the external storage device 203 includes a database 2031 and a dictionary 2032.
  • the input device 204 includes a CD-ROM reader 2041, a DVD reader 2042, a keyboard 2043, and the like.
  • the output device 205 includes a CD-ROM writing device 2051, a DVD writing device 2052, a display 2053, and the like.
  • the system shown in FIG. 1 can be realized by reading the program into the main storage device 202 via the input device 204 or the network 206 and operating it on the central processing unit 201.
  • the work procedure manual file input unit 101 receives a document file input from the outside of the system in the form of a storage medium such as a DVD or a CD-ROM by the input device 204 and stores it in the external storage device 203.
  • a work procedure manual database 106 and a work procedure manual classification database 107 are constructed.
  • the work procedure manual file is an electronic document created by a word processor or the like, and its contents are character-coded.
  • the character code is not particularly limited.
  • a unique symbol for identification is given to the work procedure manual file.
  • a unique symbol for identification is called a file identifier.
  • the work procedure manual file reading unit 102 reads the work procedure manual stored in the work procedure manual database 106 constructed on the external storage device 203 into the main storage device 202.
  • the work procedure extraction unit 103 extracts the contents of the work procedure from the work procedure manual file. The extracted work procedure is stored in the main storage device 202 in association with the file identifier. The work procedure is assumed to be explicitly indicated by a tag or the like, as in a structured document such as an XML file. Since a work procedure manual is a document intended to explain the procedure for carrying out work, this assumption is reasonable. The following description assumes that the work procedure is extracted using such explicit information. When there is no such explicit information, a method such as the one described in Non-Patent Document 2 could be used.
  • the work procedure extracted from the work procedure manual is stored as a set of file identifier and work procedure as shown in FIG. 4 and used in the subsequent processing.
  • the work procedure manual classification unit 104 classifies the work procedure manuals registered in the work procedure manual database 106 according to the similarity of the work procedures. At this time, the process proceeds while evaluating the similarity of the work procedure extracted using the work procedure similarity evaluation unit 105. The result of classifying the work procedure manual is stored in the work procedure manual classification database 107.
  • the flow of the work procedure manual classification process is shown in FIG.
  • classification is performed by hierarchical clustering.
  • the classification algorithm is not limited to hierarchical clustering as long as classification is performed using similarity.
  • In S501, each work procedure manual is placed in its own classification. That is, as many classifications are created as there are work procedure manuals, and a different work procedure manual is put into each classification.
  • In S502, the work procedure similarity evaluation unit 105 calculates the similarity between the classifications. Since each classification initially includes only one work procedure, the similarity is calculated using that work procedure.
  • When a classification includes multiple work procedures, they are integrated into one work procedure, and the similarity is calculated using this integrated work procedure.
  • In S503, the two classifications having the maximum similarity are selected and integrated. Further, in S504, the two work procedures included in the integrated classification are integrated into one work procedure. The correspondence between the two work procedures is obtained during the calculation in the work procedure similarity evaluation unit 105, and this result is used to integrate the work procedures.
  • Fig. 6 shows an example of integration
  • Fig. 7 shows the flow of integration processing.
  • the number of classifications is checked in S505. If it is 1, the process is terminated. If it is greater than 1, the above process is repeated.
  • a tree structure classification such as the schematic example shown in FIG. 8 is obtained.
  • the classification structure is as follows. First, the classification that includes all work procedures is at the top. This classification is divided into a classification including work procedure manuals A to G and a classification including work procedure manuals H to L. The former classification is further divided into a classification including work procedure manuals A to C and a classification including work procedure manuals D to G. Furthermore, the classification including work procedure manuals A to C is divided into a classification including work procedure manuals A and B and a classification including work procedure manual C. The structure of other classifications is the same.
  • the work procedure similarity evaluation unit 105 evaluates the similarity of work procedures. At this time, the way in which the work is arranged in the work procedure is compared by a sequence matching method, and the evaluation is performed based on the ratio of work correspondence between the work procedures.
  • the work n (n is a natural number) represents the name of each work, and the work names arranged vertically represent the work procedure.
  • the left side of (a) represents a work procedure in which work 1, work 2, work 3, work 4, and work 5 are performed in sequence.
  • tasks that correspond to each other are arranged side by side horizontally, and a pair of corresponding tasks that are the same task is connected by a straight line.
  • the left and right sides of (a) are the same work procedure. In this case, every task has a counterpart, and every corresponding pair consists of the same task. Therefore, the similarity of the work procedures is 1, the maximum value.
  • the left side and the right side of (b) are series composed of the same work, but the order of the work is different.
  • the correspondence of work procedures is the same as that of character strings. Therefore, the degree of correspondence of the work procedure is calculated using a method for determining the similarity of character strings using DP matching. Since this calculation method is disclosed in many books such as Non-Patent Document 3, it will not be described in detail here.
  • the cost is a numerical value indicating how different the two character strings are.
  • the calculation is performed using the number of operations necessary to transform one character string into the other character string. As operations, insertion, deletion, and replacement of characters are considered.
  • a cost is assigned to each operation, and the costs are totaled for the necessary operations.
  • -2 points are given when a character is inserted, deleted, or replaced, and 2 points are given when they match.
  • each character string to be compared is associated with a column and a row, the score is managed in a two-dimensional table, and the score is calculated in turn in the table.
  • “document creation method using document parts” is associated with a row
  • “document creation method using document reuse” is associated with a column.
  • the square in the nth row and the mth column is represented by (n, m). Note that both rows and columns start from 1.
  • the score S (n, m) of the cell (n, m) is calculated by Equation 1. At this time, the cell used to calculate the score is stored.
  • for example, for S (12,13), the twelfth character of the row is “saku” and the thirteenth character of the column is also “saku”, so the characters match. Of the candidate terms, two evaluate to 10, and the largest term, 14, is selected as the score. At this time, it is stored that the score of the cell (12, 13) was calculated from the value of the cell (11, 12). However, when the score becomes 0, all of the stored contents are deleted.
  • the score of each cell in the score table represents the similarity of the character string corresponding to the cell traced until the score of the cell is calculated.
  • the correspondence between the character strings can be obtained by following the stored cells in reverse order.
  • by extracting the correspondences in order from the cell with the highest score, it is possible to obtain the correspondences in descending order of similarity.
  • it is possible to prevent the portion including the same character string from being extracted many times by preventing the cell once traced from being traced twice.
  • the value of (15, 16) is 20 and is the maximum.
  • the work procedure can be associated by outputting a correspondence having a large correspondence m (S (1, n), T (1, m)).
  • the character string similarity determination method using DP matching is also used to determine the similarity of work names.
  • the work procedure manual database 106 stores all work procedure manual files.
  • This database can be constructed on the external storage device 203 using a program such as a relational database, an XML database, or a file server. In this embodiment, it is assumed that the table is constructed on the relational database. As shown in FIG. 13, the file ID, file identifier, and character string in the file used for management inside the database are stored in association with each other.
  • the work procedure manual classification database 107 stores the result of classifying the work procedure manuals in association with the file identifier and the work procedure extracted from each work procedure manual.
  • This database can be constructed on the external storage device 203 using a program such as a relational database or an XML database. In this embodiment, it is assumed that the table is constructed on the relational database. As shown in FIG. 14, it is constructed with two tables. In the table (a), the work procedure, the classification ID, and the parent classification ID are stored in association with each other. The parent classification ID is a classification ID of the hierarchy one level higher than the classification hierarchy obtained by hierarchical clustering.
  • the work procedure is an integrated work procedure obtained when the two classifications are integrated at the time of calculation by the work procedure similarity evaluation unit 105. In (b), the correspondence between the classification ID and the file ID is stored.
  • the search query receiving unit 108 receives a query input for searching for a work procedure manual.
  • a query is input using an input device 204 such as a keyboard.
  • the query is input in the form of a work procedure.
  • the work procedure manual search unit 109 searches the work procedure manuals by evaluating the similarity between the work procedure input as a search query and the work procedure of each classification stored in the work procedure manual classification database 107. At this time, the similarity between work procedures is evaluated using the work procedure similarity evaluation unit 105.
  • the search result is represented by the classification ID and the file identifier of the work procedure manual file.
  • the processing procedure is shown in FIG.
  • In S1501, the search query input by the search query receiving unit 108 is read.
  • In step S1502, one classification ID is read from the work procedure manual classification database 107, in order from the lower hierarchy.
  • the work procedure associated with that classification is read together with it.
  • In step S1503, the similarity between the work procedure associated with the classification and the search query is calculated. If the similarity is greater than 0 (similar), the file identifiers of the work procedure manuals included in the classification are stored in S1505. This result is used by the work procedure manual file output unit 110.
  • In S1506, the higher-level classification (the classification including the processed classification) is marked as checked so that it is treated as processed thereafter. In S1507, if a classification remains that has no check mark and has not yet been processed, the matching process with the search query is performed again.
  • the work procedure manual file output unit 110 reads the work procedure manual file from the work procedure manual database 106 using the stored file identifier, and outputs it to the output device 205 such as a display. At this time, by sorting and outputting in descending order of similarity, it is possible to display the work procedure manual that matches the search query at the top of the ranking.
  • FIG. 3 shows a processing flow of this embodiment corresponding to the block diagram of FIG.
  • the work procedure manual file is input and stored in the work procedure manual database 106.
  • the work procedure manual file is read from the work procedure manual database 106.
  • a description of the work procedure is extracted from the work procedure manual file.
  • the similarity of the work procedure is evaluated.
  • the documents are classified according to the similarity of the work procedure, and the work procedure manual classification database 107 is constructed.
  • the process up to here corresponds to the work procedure association method.
  • the work procedure manual is searched using the constructed work procedure manual classification database.
  • an input of a search query for searching the work procedure manual is accepted.
  • In S307, the degree of similarity between the work procedure input as a search query and the work procedure of each classification stored in the work procedure manual classification database 107 is evaluated, and the work procedure manual is searched.
  • In step S308, based on the search result, the work procedure manual file is read from the work procedure manual database 106 and output to the output device 205.
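
The DP-matching score table described above (2 points for a match, -2 points for an insertion, deletion, or replacement, with the stored trace discarded when the score falls to 0) can be sketched as follows. Equation 1 is not reproduced in this text, so the recurrence below is an assumption that is merely consistent with that description; the function name `dp_match` and its parameters are hypothetical. The same function accepts either character strings or lists of work names.

```python
from typing import Sequence


def dp_match(row: Sequence, col: Sequence, match: int = 2, penalty: int = -2):
    """Return (best score, aligned 1-based cell pairs) for two sequences.

    Assumed recurrence (not the patent's Equation 1 verbatim):
    S(n, m) = max(S(n-1, m-1) + match-or-penalty, S(n-1, m) + penalty,
                  S(n, m-1) + penalty), reset to 0 when not positive.
    """
    n, m = len(row), len(col)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]  # cell each score came from
    best, best_cell = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if row[i - 1] == col[j - 1] else penalty)
            up = score[i - 1][j] + penalty
            left = score[i][j - 1] + penalty
            s = max(diag, up, left)
            if s <= 0:
                continue  # score stays 0 and no trace is kept for this cell
            score[i][j] = s
            if s == diag:
                back[i][j] = (i - 1, j - 1)
            elif s == up:
                back[i][j] = (i - 1, j)
            else:
                back[i][j] = (i, j - 1)
            if s > best:
                best, best_cell = s, (i, j)
    # Follow the stored cells in reverse order, starting from the highest score.
    pairs = []
    cell = best_cell
    while score[cell[0]][cell[1]] > 0:
        i, j = cell
        if back[i][j] == (i - 1, j - 1):
            pairs.append((i, j))  # row element i is aligned with column element j
        cell = back[i][j]
    pairs.reverse()
    return best, pairs
```

Calling `dp_match(["work 1", "work 2"], ["work 1", "work 2"])` aligns both tasks; how the raw score is normalised into a similarity between 0 and 1 is left open here because the text only says the evaluation is based on the ratio of corresponding work.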

Abstract

Provided is a work procedure document searching method which can retrieve appropriate work procedure documents by just a query about a work procedure and does not require input of a detailed work procedure. A document linking method comprises a step of inputting a plurality of document files, a step of extracting descriptions about the work procedures from the document files, a step of evaluating similarities among the work procedures, and a step of classifying the documents into a tree structure according to similarities among the work procedures. Also, a document searching method retrieves documents that were linked by this method according to the similarities among the work procedures.

Description

Document association method, document retrieval method, document association apparatus, document retrieval apparatus, and program therefor
The present invention relates to a method and an apparatus for searching for documents such as work procedure manuals for operations that are carried out according to predetermined procedures, such as the manufacture, maintenance, and inspection of products.
Many operations proceed according to a fixed procedure. For such operations, a work procedure manual explaining the procedure is prepared in advance, and workers are required to carry out the work in the proper order according to that manual. It must therefore be easy for a worker to find the work procedure manual suited to the operation at hand.
However, because the vocabulary used to describe work is limited, it is difficult to find an appropriate work procedure manual efficiently with a general keyword search.
Patent Document 1 discloses an invention that supports efficient search of work procedure manuals. That invention assists the search for a work procedure manual to serve as a template when creating a work procedure manual for workers who maintain networks of computers and the like. Work procedure manuals are searched by the similarity of their work procedures, and the similarity between the network targeted by the construction work and the network described in each manual is also evaluated, so that manuals can be searched efficiently. A sequence matching technique can be used to calculate the similarity of work procedures; Non-Patent Document 1 describes such a technique.
JP 2009-181170 A (Japanese Unexamined Patent Application Publication No. 2009-181170)
The invention of Patent Document 1 combines two methods: searching for documents by the similarity of work procedures, and narrowing down the search results using the similarity of the network targeted by the construction work. The latter method cannot be applied to work that involves no network construction. The former method can be applied to searching various work procedure manuals, but it assumes that the worker can input a sufficiently detailed work procedure. Inputting a detailed work procedure, however, is a much greater burden than inputting keywords, so workers are expected to search with a simple work procedure. In that case, merely searching for similar work procedure manuals returns results that are similar from various viewpoints, and it is difficult to choose an appropriate manual. Patent Document 1 assumes a search based on similarity to the work procedure written in a manual that is still being drafted, which can be regarded as assuming that a simple work procedure is input in order to obtain a description of a detailed work procedure.
An object of the present invention is to provide a document search method and apparatus that can retrieve appropriate documents using only a work procedure as the query, without requiring a detailed work procedure to be input.
To solve the above problem, the present invention adopts the configurations described in the claims.
An example of the document association method of the present invention comprises a step of inputting a plurality of document files, a step of extracting a description of a work procedure from each of the document files, a step of evaluating the similarity of the work procedures, and a step of classifying the documents into a tree structure according to the similarity of the work procedures.
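
The tree classification step can be illustrated with a small agglomerative clustering sketch that follows the S501 to S505 loop described in this document. It is a sketch under stated assumptions, not the patented implementation: `difflib.SequenceMatcher` is swapped in as a stand-in for the DP-matching similarity, and `Cluster`, `similarity`, `merge_procedures`, and `classify` are hypothetical names. The way two procedures are integrated here (shared tasks kept once, differing tasks kept from both sides) is likewise an assumption.

```python
from dataclasses import dataclass, field
from difflib import SequenceMatcher
from itertools import combinations


@dataclass
class Cluster:
    procedure: list[str]                      # integrated work procedure of the classification
    files: list[str]                          # file identifiers contained in it
    children: list["Cluster"] = field(default_factory=list)


def similarity(a: list[str], b: list[str]) -> float:
    # Stand-in similarity in [0, 1]; the document itself uses DP matching.
    return SequenceMatcher(None, a, b).ratio()


def merge_procedures(a: list[str], b: list[str]) -> list[str]:
    # Assumed integration rule: keep matched tasks once, keep unmatched tasks of both.
    merged: list[str] = []
    for tag, i1, i2, j1, j2 in SequenceMatcher(None, a, b).get_opcodes():
        merged.extend(a[i1:i2])
        if tag != "equal":
            merged.extend(b[j1:j2])
    return merged


def classify(procedures: dict[str, list[str]]) -> Cluster:
    if not procedures:
        raise ValueError("at least one work procedure is required")
    # S501: one classification per work procedure manual.
    clusters = [Cluster(list(p), [fid]) for fid, p in procedures.items()]
    # S505: repeat until a single classification remains.
    while len(clusters) > 1:
        # S502-S503: pick the pair of classifications with the maximum similarity.
        x, y = max(
            combinations(clusters, 2),
            key=lambda pair: similarity(pair[0].procedure, pair[1].procedure),
        )
        # S504: integrate the two work procedures into one.
        merged = Cluster(merge_procedures(x.procedure, y.procedure), x.files + y.files, [x, y])
        clusters = [c for c in clusters if c is not x and c is not y] + [merged]
    return clusters[0]
```

The returned root corresponds to the top classification that contains every manual, and each `children` pair records one merge, which yields the tree structure.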
The document search method of the present invention searches a classification database created using the above document association method, through a step of inputting a search query in the form of a work procedure and a step of evaluating the similarity between the search query and the work procedure of each classification in the classification database to retrieve documents.
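
A minimal sketch of this search over the classification tree is shown below. The dictionary layout loosely mirrors the two classification tables mentioned in this document (classification ID with parent classification ID and work procedure, and classification ID with file IDs), but the exact layout, the function names, and the use of `difflib.SequenceMatcher` as a stand-in similarity are assumptions. The sketch also assumes that higher-level classifications are marked as checked only after a lower-level classification has matched, which the text does not state unambiguously.

```python
from difflib import SequenceMatcher


def similarity(a: list[str], b: list[str]) -> float:
    # Stand-in similarity in [0, 1]; the document itself uses DP matching.
    return SequenceMatcher(None, a, b).ratio()


def search(query, classes, class_files):
    """classes: {class_id: (parent_id or None, work procedure)}
    class_files: {class_id: [file identifiers]}"""

    def depth(cid):
        d = 0
        while classes[cid][0] is not None:
            cid = classes[cid][0]
            d += 1
        return d

    checked = set()
    hits = []
    # Read classifications in order from the lowest hierarchy upward.
    for cid in sorted(classes, key=depth, reverse=True):
        if cid in checked:
            continue  # already treated as processed
        parent, procedure = classes[cid]
        score = similarity(query, procedure)
        if score > 0:
            # Store the file identifiers of the matching classification.
            hits.append((score, cid, class_files[cid]))
            # Mark every classification that includes this one as checked.
            while parent is not None:
                checked.add(parent)
                parent = classes[parent][0]
    # Output in descending order of similarity.
    hits.sort(key=lambda h: h[0], reverse=True)
    return hits
```

Because matches are collected per classification, the caller can present the results group by group instead of as one flat ranked list.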
An example of the document association apparatus of the present invention comprises a document file input unit that inputs a plurality of document files, a work procedure extraction unit that extracts a description of a work procedure from each of the document files, a work procedure similarity evaluation unit that evaluates the similarity of work procedures, and a document classification unit that classifies the documents into a tree structure according to the similarity of the work procedures.
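
For a structured document in which the work procedure is marked explicitly by tags, the work procedure extraction unit might look like the sketch below. The tag names `procedure` and `step` and the function name are hypothetical, since this document does not specify a schema; the result is the pair of file identifier and work procedure that later processing uses.

```python
import xml.etree.ElementTree as ET


def extract_work_procedure(file_id: str, xml_text: str) -> tuple[str, list[str]]:
    """Return (file identifier, ordered list of work names) for one manual."""
    root = ET.fromstring(xml_text)
    # Hypothetical schema: each task is a <step> element inside <procedure>.
    steps = [(s.text or "").strip() for s in root.findall(".//procedure/step")]
    return file_id, [s for s in steps if s]  # drop empty steps
```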
The document search apparatus of the present invention comprises a classification database created using the above document association apparatus, a search query reception unit that receives input of a search query in the form of a work procedure, and a document search unit that evaluates the similarity between the search query and the work procedure of each classification in the classification database to retrieve documents.
An example of the program of the present invention is a program for causing a computer to function as a document association apparatus, that is, as a document file input unit that inputs a plurality of document files, a work procedure extraction unit that extracts a description of a work procedure from each of the document files, a work procedure similarity evaluation unit that evaluates the similarity of work procedures, and a document classification unit that classifies the documents into a tree structure according to the similarity of the work procedures.
According to the present invention, the retrieved documents are classified according to the similarity of their work procedures, so the user need only compare the documents classification by classification to choose an appropriate one, which reduces the burden of selecting a document.
FIG. 1 is a block diagram of the document association apparatus and document search apparatus according to an embodiment of the present invention.
FIG. 2 is a configuration diagram of the system of the embodiment.
FIG. 3 is a diagram showing the processing flow of the embodiment.
FIG. 4 is a diagram showing the data structure of a work procedure and a file identifier.
FIG. 5 is a diagram showing the flow of the work procedure manual classification process.
FIG. 6 is a diagram showing an example of merging work procedures.
FIG. 7 is a diagram showing the flow of the work procedure merging process.
FIG. 8 is a diagram showing a schematic example of a work procedure manual classification result.
FIG. 9 is a diagram showing schematic examples of work procedure similarity.
FIG. 10 shows example sentences for character string alignment.
FIG. 11 is a table for character string similarity evaluation.
FIG. 12 shows an example of correspondence between character strings.
FIG. 13 shows the table of the work procedure manual database.
FIG. 14 shows the tables of the work procedure manual classification database.
FIG. 15 is a diagram showing the flow of the work procedure manual search process.
An embodiment of the present invention is described using a block diagram organized by system function and a diagram of a concrete system configuration.
FIG. 1 shows the block configuration of the embodiment, organized by system function. The configuration for classifying and searching work procedure manuals consists of a work procedure manual file input unit 101, a work procedure manual file reading unit 102, a work procedure extraction unit 103, a work procedure manual classification unit 104, a work procedure similarity evaluation unit 105, a work procedure manual database 106, a work procedure manual classification database 107, a search query reception unit 108, a work procedure manual search unit 109, and a work procedure manual file output unit 110.
As a concrete system configuration, shown in FIG. 2, the system runs on an apparatus such as a computer comprising: a central processing device 201 that has a central processing unit and processes information by the stored-program method; a main storage device 202 consisting of random access memory; an external storage device 203 that stores the documents to be processed and dictionaries of processing results; an input device 204 used to input documents and the like; and an output device 205 that outputs information processing results such as a created dictionary. The central processing device 201 may be connected to another information processing apparatus 207 via a network 206. The external storage device 203 contains a database 2031 and a dictionary 2032. The input device 204 consists of a CD-ROM reader 2041, a DVD reader 2042, a keyboard 2043, and the like. The output device 205 consists of a CD-ROM writer 2051, a DVD writer 2052, a display 2053, and the like. The system shown in FIG. 1 is realized by loading a program into the main storage device 202 via the input device 204 or the network 206 and running it on the central processing device 201.
Each component of FIG. 1 is described in detail below.
The work procedure manual file input unit 101 accepts, through the input device 204, document files supplied from outside the system on storage media such as DVDs or CD-ROMs, and saves them in the external storage device 203. The work procedure manual database 106 and the work procedure manual classification database 107 are built on the external storage device 203. A work procedure manual file is assumed to be an electronic document created with a word processor or similar tool, with its contents encoded as character codes. No particular character encoding is required. Each work procedure manual file is assigned a unique symbol for identification, referred to below as the file identifier.
The work procedure manual file reading unit 102 reads the work procedure manuals stored in the work procedure manual database 106, which is built on the external storage device 203, into the main storage device 202.
The work procedure extraction unit 103 extracts the content of the work procedure from a work procedure manual file. The extracted work procedure is held in the main storage device 202 in association with the file identifier. In a structured document such as an XML file, the work procedure is assumed to be marked explicitly, for example by tags. This assumption is reasonable because a work procedure manual is a document whose purpose is to explain the steps for carrying out a job. The following description assumes that the work procedure was extracted using such explicit information; where no explicit information exists, a method such as that of Non-Patent Document 2 could be used.
The work procedure extracted from a work procedure manual is held as a pair of file identifier and work procedure, as shown in FIG. 4, and is used in the subsequent processing.
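The extraction step can be illustrated with a minimal sketch. It assumes the manual is an XML file in which each task is wrapped in a `<step>` element; that tag name, the sample document, and the identifier `DOC-001` are hypothetical and are not taken from the patent.

```python
import xml.etree.ElementTree as ET

def extract_procedure(file_identifier, xml_text, step_tag="step"):
    """Return (file identifier, ordered list of task names).

    The tag name is an assumption; the text only says that the
    procedure is marked explicitly, for example by tags.
    """
    root = ET.fromstring(xml_text)
    tasks = [(element.text or "").strip() for element in root.iter(step_tag)]
    return file_identifier, [task for task in tasks if task]

# Hypothetical work procedure manual.
sample = """<manual>
  <title>Pump maintenance</title>
  <procedure>
    <step>Stop pump</step>
    <step>Drain line</step>
    <step>Replace seal</step>
  </procedure>
</manual>"""

record = extract_procedure("DOC-001", sample)
print(record)
```

The returned pair mirrors the file identifier and work procedure pair of FIG. 4.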
The work procedure manual classification unit 104 classifies the work procedure manuals registered in the work procedure manual database 106 according to the similarity of their work procedures. It does so while evaluating the similarity of the extracted work procedures with the work procedure similarity evaluation unit 105. The classification result is saved in the work procedure manual classification database 107.
FIG. 5 shows the flow of the work procedure manual classification process. Here the classification is performed by hierarchical clustering, although any algorithm that classifies on the basis of similarity may be used instead.
  First, in S501, each work procedure is placed in its own classification. That is, as many classifications are created as there are work procedure manuals, and a different work procedure manual is put into each one.
  Next, in S502, the work procedure similarity evaluation unit 105 calculates the similarity between classifications. Each initial classification contains exactly one work procedure, so that procedure is used to calculate the similarity. From the second pass of the loop onward, the work procedures contained in a classification have been merged into a single work procedure, and this merged procedure is used to calculate the similarity.
  After the similarities are calculated, in S503 the two classifications with the highest similarity are selected and merged.
  Then, in S504, the two work procedures contained in the merged classification are merged into a single work procedure. The individual tasks of the two procedures were already put into correspondence during the calculation in the work procedure similarity evaluation unit 105, and this result is used to merge them.
FIG. 6 shows an example of merging, and FIG. 7 shows the flow of the merging process.
First, in S701, one of the two work procedures is designated A and the other B. In S702, one task is read from the head of procedure B. In S703, the result of matching that task against the tasks of A is checked. The processing then depends on the result of this check (S704). If the corresponding tasks are identical (tasks 1 and 5 in the example), nothing is done and the next task is checked. If a blank was inserted in B to correspond to a task of A, nothing is done and the next task is processed. If B's task has no corresponding task in A, B's task is inserted into A at this position (task 4 in the example) (S705). If the corresponding tasks differ, B's task and A's task are compared and B's task is inserted at this position so that the two appear in dictionary order (tasks 3 and 6 in the example) (S706).
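The four cases of S704 to S706 can be sketched as follows. The sketch assumes the alignment is already available as a list of pairs `(task of A, task of B)`, with `None` standing for an inserted blank; this representation and the task names are illustrative assumptions, since FIG. 6 is not reproduced here.

```python
def merge_aligned(pairs):
    """Merge two aligned work procedures into one.

    pairs: list of (task_a, task_b); None marks a blank inserted
    by the alignment (an assumed encoding).
    """
    merged = []
    for task_a, task_b in pairs:
        if task_a == task_b:          # identical tasks: keep one copy
            merged.append(task_a)
        elif task_b is None:          # blank in B: A's task stays as it is
            merged.append(task_a)
        elif task_a is None:          # no counterpart in A: insert B's task
            merged.append(task_b)
        else:                         # different tasks: keep both, dictionary order
            merged.extend(sorted([task_a, task_b]))
    return merged

# Hypothetical alignment of procedures A and B.
alignment = [
    ("task1", "task1"),
    ("task2", None),
    ("task3", "task6"),
    (None, "task4"),
    ("task5", "task5"),
]
print(merge_aligned(alignment))
```

Keeping both tasks when they differ means the merged procedure grows as classifications are combined, so it represents every manual in the classification.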
Finally, in S505 of FIG. 5, the number of classifications is checked. If it is 1, the process ends; if it is greater than 1, the above steps are repeated.
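The S501 to S505 loop can be sketched as a small agglomerative clustering routine. This is an illustration under stated assumptions, not the patented implementation: `similarity` uses `difflib.SequenceMatcher` as a stand-in for the DP-based evaluation unit 105, and `merge` is a simplified stand-in for the merging of FIG. 7. All names and sample data are hypothetical.

```python
from difflib import SequenceMatcher
from itertools import combinations

def similarity(a, b):
    """Stand-in similarity: matched tasks / length of the longer procedure."""
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / max(len(a), len(b), 1)

def merge(a, b):
    """Simplified stand-in for FIG. 7: A plus the tasks of B missing from A."""
    return a + [task for task in b if task not in a]

def cluster(procedures):
    """procedures: dict mapping file identifier -> list of task names."""
    # S501: one classification per work procedure manual.
    clusters = [{"files": [fid], "procedure": list(proc), "children": []}
                for fid, proc in procedures.items()]
    while len(clusters) > 1:                      # S505: stop at one classification
        # S502 and S503: pick the most similar pair of classifications.
        i, j = max(combinations(range(len(clusters)), 2),
                   key=lambda ij: similarity(clusters[ij[0]]["procedure"],
                                             clusters[ij[1]]["procedure"]))
        a, b = clusters[i], clusters[j]
        # S504: merge the two procedures into one representative procedure.
        parent = {"files": a["files"] + b["files"],
                  "procedure": merge(a["procedure"], b["procedure"]),
                  "children": [a, b]}
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [parent]
    return clusters[0]

root = cluster({
    "A": ["open", "inspect", "close"],
    "B": ["open", "inspect", "report"],
    "C": ["order", "receive", "store"],
})
print(sorted(root["files"]), [sorted(c["files"]) for c in root["children"]])
```

The nested `children` lists form the kind of tree that the classification step is described as producing.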
The processing of the work procedure manual classification unit 104 yields a tree-structured classification like the schematic example in FIG. 8. The structure is as follows. The classification containing all work procedure manuals is at the top. It splits into a classification containing manuals A to G and a classification containing manuals H to L. The former splits further into a classification containing manuals A to C and a classification containing manuals D to G. The classification containing manuals A to C in turn splits into a classification containing manuals A and B and a classification containing manual C. The other classifications are structured in the same way.
The work procedure similarity evaluation unit 105 evaluates the similarity of work procedures. It compares the ordering of tasks in the work procedures using a sequence matching technique and scores the similarity by the proportion of tasks that can be put into correspondence between the procedures.
The evaluation method is explained using the schematic examples of FIG. 9.
Task n (n is a natural number) denotes the name of an individual task, and task names listed vertically denote a work procedure. For example, the left side of (a) represents a work procedure that carries out task 1, task 2, task 3, task 4, and task 5 in that order. Tasks that correspond to each other are placed side by side, and a correspondence that pairs identical tasks is drawn as a straight line.
In (a), the left and right sides are the same work procedure. Every task has a counterpart, and every correspondence pairs identical tasks, so the similarity takes its maximum value of 1.
In (b), the left and right sides consist of the same tasks in a different order. Only one correspondence pairs identical tasks. With a procedure length of 5, the similarity is 1/5 = 0.2.
In (c), the left and right sides consist of almost the same tasks. Four correspondences pair identical tasks. With a procedure length of 5, the similarity is 4/5 = 0.8.
In (d), the left and right sides also consist of almost the same tasks, but the sequences differ in length. A blank is inserted into the shorter sequence to establish the correspondence. Four correspondences pair identical tasks. The similarity is calculated using the longer procedure, giving 4/5 = 0.8.
The similarity of the work procedures shown in FIG. 9 is calculated according to Equation 1.
Figure JPOXMLDOC01-appb-M000001
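Because the equation image is not reproduced here, the following is only a sketch of the calculation as the FIG. 9 examples describe it: the number of correspondences that pair identical tasks, divided by the length of the longer procedure. It assumes the alignment maximizes identical pairs, so a longest-common-subsequence count is used; the task sequences are hypothetical stand-ins for the figure.

```python
def identical_pairs(a, b):
    """Longest common subsequence length: the number of order-preserving
    correspondences that pair identical tasks (an assumed alignment rule)."""
    table = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, task_a in enumerate(a, 1):
        for j, task_b in enumerate(b, 1):
            if task_a == task_b:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[len(a)][len(b)]

def procedure_similarity(a, b):
    """Identical pairs divided by the length of the longer procedure."""
    longer = max(len(a), len(b))
    return identical_pairs(a, b) / longer if longer else 0.0

base = [1, 2, 3, 4, 5]
print(procedure_similarity(base, [1, 2, 3, 4, 5]))  # case (a): same procedure
print(procedure_similarity(base, [5, 4, 3, 2, 1]))  # case (b): reversed order (assumed)
print(procedure_similarity(base, [1, 2, 6, 4, 5]))  # case (c): one task differs
print(procedure_similarity(base, [1, 2, 4, 5]))     # case (d): one task missing
```

Under these assumptions the four calls reproduce the values 1, 0.2, 0.8, and 0.8 given for cases (a) to (d).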
Aligning work procedures follows the same idea as aligning character strings. The degree of correspondence between work procedures is therefore calculated with a character string similarity method based on DP matching. This method is published in many books, including Non-Patent Document 3, and is not described in detail here.
The calculation is outlined using the character string alignment shown in FIG. 10 as an example. For simplicity, two strings are compared: 「文書部品を利用した文書作成方法」 ("document creation method using document parts") and 「文書を再利用した文書作成方法」 ("document creation method reusing documents"). To compare three or more strings, all pairwise combinations are calculated.
The cost is a number indicating how much two strings differ. It is calculated from the number of operations needed to transform one string into the other. The operations considered are insertion, deletion, and substitution of a character. Each operation is assigned a cost, and the costs of the required operations are summed. Here, -2 points are given for each insertion, deletion, or substitution, and 2 points for each match.
In DP matching, as shown in FIG. 11, the two strings being compared are assigned to the rows and the columns of a two-dimensional table, and a score is calculated for each cell in order. In the example of FIG. 11, 「文書部品を利用した文書作成方法」 is assigned to the rows and 「文書を再利用した文書作成方法」 to the columns. Cell positions are given by row and column: the cell in row n and column m is written (n, m), with rows and columns both numbered from 1. The score S(n, m) of cell (n, m) is calculated by Equation 1. The cell used to calculate each score is recorded.
For example, for S(12,13), the 12th character of the row is 「作」 and the 13th character of the column is 「作」, so d(12,13) is 2 and the first term is 14. The second term is 10 and the third term is 10, so the first term, the largest, is chosen and the score is 14. The fact that the score of cell (12,13) was calculated from the value of cell (11,12) is recorded. However, when a score becomes 0, everything recorded so far is erased.
The score of each cell in the score table represents the similarity of the character strings corresponding to the cells traversed to calculate that score. Once the scores are calculated, the correspondence between the strings is obtained by following the recorded cells in reverse order. Following the correspondences starting from the highest-scoring cell yields them in descending order of similarity. Not revisiting a cell that has already been traversed prevents the same substring from being extracted repeatedly. In FIG. 11, cell (15,16) has the largest value, 20, and the cells used to obtain this value are traced back. FIG. 12 shows the resulting string correspondence.
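The scoring and traceback can be sketched as follows. The recurrence is an assumption: a local alignment (Smith-Waterman style) with +2 for a match, -2 for an insertion, deletion, or substitution, and scores floored at 0, which is one reading of the remark that records are erased when the score becomes 0. The function name and tie-breaking rule are illustrative.

```python
def dp_match(a, b, match=2, penalty=-2):
    """Return (best score, aligned pairs) for strings a (rows) and b (columns)."""
    rows, cols = len(a), len(b)
    score = [[0] * (cols + 1) for _ in range(rows + 1)]
    back = [[None] * (cols + 1) for _ in range(rows + 1)]
    best, best_cell = 0, (0, 0)
    for n in range(1, rows + 1):
        for m in range(1, cols + 1):
            d = match if a[n - 1] == b[m - 1] else penalty
            candidates = [
                (score[n - 1][m - 1] + d, (n - 1, m - 1)),  # match or substitution
                (score[n - 1][m] + penalty, (n - 1, m)),    # character of a unpaired
                (score[n][m - 1] + penalty, (n, m - 1)),    # character of b unpaired
            ]
            value, previous = max(candidates, key=lambda c: c[0])
            if value > 0:  # a non-positive score resets the cell and its record
                score[n][m], back[n][m] = value, previous
            if score[n][m] > best:
                best, best_cell = score[n][m], (n, m)
    # Follow the recorded cells in reverse order from the best cell.
    pairs, cell = [], best_cell
    while score[cell[0]][cell[1]] > 0:
        n, m = cell
        prev_n, prev_m = back[n][m]
        pairs.append((a[n - 1] if prev_n == n - 1 else None,
                      b[m - 1] if prev_m == m - 1 else None))
        cell = (prev_n, prev_m)
    pairs.reverse()
    return best, pairs

best, pairs = dp_match("文書部品を利用した文書作成方法", "文書を再利用した文書作成方法")
print(best, "".join(x for x, _ in pairs if x))
```

With the two example strings from the text, this sketch gives a best score of 20, the same maximum the description reports for FIG. 11. The cell indices differ from those quoted for the figure, which is not reproduced here.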
The alignment of work procedures by DP matching is calculated by applying Equation 2 recursively. The degree of correspondence between a work procedure S(1, n) = (s1, s2, …, sn) of length n and a work procedure T(1, m) = (t1, t2, …, tm) of length m is calculated, and the work procedures are aligned by outputting the alignment for which the degree of correspondence m(S(1, n), T(1, m)) is large.
Figure JPOXMLDOC01-appb-M000002
The similarity of task names is also determined with the character string similarity method based on DP matching. The string similarity SM between a task name K(1, n) = (k1, k2, …, kn) and a task name L(1, m) = (l1, l2, …, lm) is calculated by Equation 3.
Figure JPOXMLDOC01-appb-M000003
In this embodiment, Equation 3 is used to decide whether two task names are the same, but the decision could also be made with a dictionary. A dictionary makes it possible to treat synonyms with entirely different spellings, such as 「照合」 (collation) and 「マッチング」 (matching), as the same.
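A dictionary-based comparison could look like the sketch below. The synonym table, its single entry, and the function names are hypothetical; the text does not specify the dictionary's format.

```python
# Hypothetical synonym dictionary: each variant maps to a canonical task name.
SYNONYMS = {
    "マッチング": "照合",
}

def canonical(task_name):
    """Map a task name to its canonical form, if the dictionary knows it."""
    name = task_name.strip()
    return SYNONYMS.get(name, name)

def same_task(name_a, name_b):
    """Treat two task names as identical when their canonical forms agree."""
    return canonical(name_a) == canonical(name_b)

print(same_task("照合", "マッチング"))
print(same_task("照合", "分類"))
```

Such a check could replace or complement the string similarity of Equation 3 when deciding whether two aligned tasks are identical.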
The work procedure manual database 106 stores all work procedure manual files. It can be built on the external storage device 203 using software such as a relational database, an XML database, or a file server. In this embodiment it is built in table form on a relational database. As shown in FIG. 13, the file ID used for management inside the database, the file identifier, and the character string contained in the file are stored in association with one another.
The work procedure manual classification database 107 stores the result of classifying the work procedure manuals in association with the file identifiers and the work procedures extracted from the manuals. It can be built on the external storage device 203 using software such as a relational database or an XML database. In this embodiment it is built in table form on a relational database, using two tables as shown in FIG. 14. Table (a) stores the work procedure, the classification ID, and the parent classification ID in association with one another. The parent classification ID is the ID of the classification one level higher in the hierarchy obtained by hierarchical clustering. The work procedure is the merged work procedure obtained when two classifications are merged during the calculation by the work procedure similarity evaluation unit 105. Table (b) stores the correspondence between classification IDs and file IDs.
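The tables of FIGS. 13 and 14 could be laid out as in the following SQLite sketch. The table names, column names, and the JSON encoding of the work procedure are assumptions made for illustration; the text only specifies which items are stored together.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- FIG. 13: work procedure manual database (hypothetical column names)
CREATE TABLE manual (
    file_id         INTEGER PRIMARY KEY,
    file_identifier TEXT UNIQUE NOT NULL,
    body            TEXT NOT NULL
);
-- FIG. 14 (a): merged work procedure, classification ID, parent classification ID
CREATE TABLE classification (
    class_id        INTEGER PRIMARY KEY,
    parent_class_id INTEGER REFERENCES classification(class_id),
    procedure       TEXT NOT NULL
);
-- FIG. 14 (b): classification ID to file ID
CREATE TABLE classification_file (
    class_id INTEGER NOT NULL REFERENCES classification(class_id),
    file_id  INTEGER NOT NULL REFERENCES manual(file_id),
    PRIMARY KEY (class_id, file_id)
);
""")

conn.executemany("INSERT INTO manual VALUES (?, ?, ?)",
                 [(1, "DOC-A", "text of A"), (2, "DOC-B", "text of B")])
conn.executemany("INSERT INTO classification VALUES (?, ?, ?)", [
    (3, None, json.dumps(["open", "inspect", "close", "report"])),  # root
    (1, 3, json.dumps(["open", "inspect", "close"])),
    (2, 3, json.dumps(["open", "inspect", "report"])),
])
conn.executemany("INSERT INTO classification_file VALUES (?, ?)",
                 [(1, 1), (2, 2), (3, 1), (3, 2)])

def files_in_class(connection, class_id):
    """File identifiers of the manuals contained in one classification."""
    rows = connection.execute(
        """SELECT m.file_identifier
           FROM classification_file AS cf
           JOIN manual AS m ON m.file_id = cf.file_id
           WHERE cf.class_id = ?
           ORDER BY m.file_identifier""", (class_id,))
    return [row[0] for row in rows]

print(files_in_class(conn, 3))
```

Storing the parent classification ID in each row is enough to reconstruct the whole tree by following parent links.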
The search query reception unit 108 accepts input of a query for searching work procedure manuals. The query is entered with the input device 204, such as a keyboard, and is assumed to be given in the form of a work procedure.
The work procedure manual search unit 109 searches for work procedure manuals by evaluating the similarity between the work procedure entered as the search query and the work procedure of each classification stored in the work procedure manual classification database 107. The similarity between work procedures is evaluated using the work procedure similarity described above. The search result is expressed by classification IDs and the file identifiers of work procedure manual files.
FIG. 15 shows the processing steps. In S1501, the search query entered through the search query reception unit 108 is read. In S1502, one classification ID is read from the work procedure manual classification database 107, starting with the classifications lowest in the hierarchy, together with its associated work procedure. In S1503, the similarity between the work procedure associated with the classification and the search query is calculated. If the similarity is greater than 0 (that is, if they are similar), the file identifiers of the work procedure manuals contained in the classification are recorded in S1505. This result is used by the work procedure manual file output unit 110. In S1506, the higher-level classifications of this classification (the classifications that contain it) are marked as checked so that they are treated as already processed from then on. In S1507, if any classification remains that is unmarked or not yet processed, the comparison with the search query is performed again.
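The search loop of FIG. 15 can be sketched over an in-memory copy of the classification tables. Several points are assumptions: `similarity` uses `difflib.SequenceMatcher` as a stand-in for the evaluation unit 105, ancestors are marked as checked only after a match (the text leaves open whether S1506 also runs when there is no match), and the data layout and identifiers are hypothetical.

```python
from difflib import SequenceMatcher

def similarity(a, b):
    """Stand-in similarity: matched tasks / length of the longer procedure."""
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / max(len(a), len(b), 1)

def depth(classes, class_id):
    """Number of parent links between a classification and the root."""
    levels = 0
    while classes[class_id]["parent"] is not None:
        class_id = classes[class_id]["parent"]
        levels += 1
    return levels

def search(query, classes):
    """Return [(file identifier, similarity)] sorted by descending similarity."""
    checked, hits = set(), {}
    # S1502: visit classifications from the lowest level of the hierarchy upward.
    for class_id in sorted(classes, key=lambda c: depth(classes, c), reverse=True):
        if class_id in checked:
            continue
        score = similarity(query, classes[class_id]["procedure"])  # S1503
        if score > 0:
            for file_identifier in classes[class_id]["files"]:     # S1505
                hits[file_identifier] = max(score, hits.get(file_identifier, 0.0))
            parent = classes[class_id]["parent"]                   # S1506
            while parent is not None:
                checked.add(parent)
                parent = classes[parent]["parent"]
    return sorted(hits.items(), key=lambda item: item[1], reverse=True)

classes = {
    1: {"parent": 3, "procedure": ["a", "b", "c"], "files": ["F1"]},
    2: {"parent": 3, "procedure": ["x", "y"], "files": ["F2"]},
    3: {"parent": None, "procedure": ["a", "b", "c", "x", "y"], "files": ["F1", "F2"]},
}
results = search(["a", "b"], classes)
print(results)
```

Skipping the ancestors of a matching classification keeps the result confined to the most specific classification that matches the query.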
The work procedure manual file output unit 110 uses the recorded file identifiers to read the work procedure manual files from the work procedure manual database 106 and outputs them to the output device 205, such as a display. Sorting the output in descending order of similarity places the work procedure manuals that best match the search query at the top of the ranking.
FIG. 3 shows the processing flow of this embodiment, corresponding to the block diagram of FIG. 1. In S301, work procedure manual files are input and stored in the work procedure manual database 106. In S302, the work procedure manual files are read from the work procedure manual database 106. In S303, the description of the work procedure is extracted from each work procedure manual file. In S304, the similarity of the work procedures is evaluated. In S305, the documents are classified according to the similarity of the work procedures, and the work procedure manual classification database 107 is built. The steps up to this point correspond to the work procedure association method.
Next, work procedure manuals are searched using the work procedure manual classification database that was built. In S306, input of a search query for searching work procedure manuals is accepted. In S307, the similarity between the work procedure entered as the search query and the work procedure of each classification stored in the work procedure manual classification database 107 is evaluated, and work procedure manuals are retrieved. In S308, based on the search result, the work procedure manual files are read from the work procedure manual database 106 and output to the output device 205.
The above has explained that the apparatus and method shown in FIGS. 1 and 3 make it possible to associate work procedures and to search work procedure manuals efficiently.
Description of Symbols
101 Work procedure manual file input unit
102 Work procedure manual file reading unit
103 Work procedure extraction unit
104 Work procedure manual classification unit
105 Work procedure similarity evaluation unit
106 Work procedure manual database
107 Work procedure manual classification database
108 Search query reception unit
109 Work procedure manual search unit
110 Work procedure manual file output unit
201 Central processing device
202 Main storage device
203 External storage device
204 Input device
205 Output device
206 Network
207 Information processing apparatus

Claims (15)

  1.  A document association method comprising:
     a step of inputting a plurality of document files;
     a step of extracting a description of a work procedure from each of the document files;
     a step of evaluating the similarity of work procedures; and
     a step of classifying the documents into a tree structure according to the similarity of the work procedures.
  2.  The document association method according to claim 1,
     wherein the step of evaluating the similarity of work procedures compares the ordering of tasks using a sequence matching technique to evaluate the similarity of the work procedures.
  3.  The document association method according to claim 1 or claim 2,
     wherein the step of classifying the documents classifies the documents into a tree structure by classifying them using hierarchical clustering.
  4.  The document association method according to claim 2,
     wherein, in the step of evaluating the similarity of work procedures, sequence matching is further used to evaluate the similarity of the names of the work procedures when the task names constituting the work procedures are compared.
  5.  The document association method according to claim 2,
     wherein, in the step of evaluating the similarity of work procedures, a dictionary is further used to evaluate the similarity of the names of the work procedures when the task names constituting the work procedures are compared.
  6.  The document association method according to any one of claims 1 to 5,
     wherein the documents are work procedure manuals.
  7.  A document search method that searches a classification database created using the document association method according to any one of claims 1 to 6 by means of:
     a step of inputting a search query in the form of a work procedure; and
     a step of evaluating the similarity between the search query and the work procedure of each classification in the classification database and retrieving documents.
  8.  A document association apparatus comprising:
     a document file input unit that inputs a plurality of document files;
     a work procedure extraction unit that extracts a description of a work procedure from each of the document files;
     a work procedure similarity evaluation unit that evaluates the similarity of the work procedures; and
     a document classification unit that classifies the documents into a tree structure according to the similarity of the work procedures.
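
The work procedure extraction unit of claim 8 could, for example, look like the sketch below. The claim does not state how the description of a work procedure is located in a document file; this example assumes, purely for illustration, that each step appears on its own line and starts with a number followed by "." or ")". The names `STEP_PATTERN` and `extract_procedure` are hypothetical.

```python
import re

# Assumed step format: optional indentation, a number, "." or ")", then the work name.
STEP_PATTERN = re.compile(r"^\s*(\d+)[.)]\s+(.+?)\s*$")


def extract_procedure(text: str) -> list:
    """Return the work names of all numbered lines, in document order."""
    steps = []
    for line in text.splitlines():
        match = STEP_PATTERN.match(line)
        if match:
            steps.append(match.group(2))
    return steps


if __name__ == "__main__":
    sample = "Oil change\n1. Open hood\n2) Drain oil\nNote: oil may be hot\n 3. Add oil \n"
    print(extract_procedure(sample))
```

The resulting list of work names is the kind of sequence that the similarity evaluation unit and the document classification unit would then operate on.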
  9.  The document association apparatus according to claim 8, wherein
     the work procedure similarity evaluation unit compares the ordering of the work items using a sequence matching technique and thereby evaluates the similarity of the work procedures.
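
The comparison of work ordering in claim 9 can be sketched with an edit distance over lists of work names. The claim does not fix a particular sequence matching technique; Levenshtein distance and the normalisation by the longer procedure are assumptions chosen here because they are sensitive to the order of the steps.

```python
def edit_distance(seq_a, seq_b) -> int:
    """Levenshtein distance between two sequences of work names
    (insertions, deletions, and substitutions each cost 1)."""
    prev = list(range(len(seq_b) + 1))
    for i, item_a in enumerate(seq_a, start=1):
        curr = [i]
        for j, item_b in enumerate(seq_b, start=1):
            cost = 0 if item_a == item_b else 1
            curr.append(min(prev[j] + 1,          # delete item_a
                            curr[j - 1] + 1,      # insert item_b
                            prev[j - 1] + cost))  # keep or substitute
        prev = curr
    return prev[-1]


def procedure_similarity(proc_a, proc_b) -> float:
    """Map the edit distance to a similarity in [0, 1]."""
    longest = max(len(proc_a), len(proc_b))
    if longest == 0:
        return 1.0  # two empty procedures are treated as identical
    return 1.0 - edit_distance(proc_a, proc_b) / longest
```

Two procedures that contain the same work items in a different order score lower than two procedures that differ by a single omitted step, which is the property that distinguishes this comparison from a bag-of-words measure.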
  10.  The document association apparatus according to claim 8 or claim 9, wherein
     the document classification unit classifies the documents into a tree structure by classifying the documents using hierarchical clustering.
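
How hierarchical clustering yields a tree structure, as in claim 10, can be sketched as below. The linkage criterion (average linkage), the similarity measure (`difflib.SequenceMatcher` over lists of work names), the nested-tuple tree representation, and the sample documents are assumptions for this example and are not taken from the claims.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(proc_a, proc_b) -> float:
    """Order-sensitive similarity between two lists of work names."""
    return SequenceMatcher(None, proc_a, proc_b).ratio()


def leaves(node):
    """Document ids under a node (a str leaf or a (left, right) tuple)."""
    if isinstance(node, str):
        return [node]
    return leaves(node[0]) + leaves(node[1])


def average_linkage(node_a, node_b, procedures) -> float:
    """Mean pairwise similarity between the documents of two clusters."""
    scores = [similarity(procedures[x], procedures[y])
              for x in leaves(node_a) for y in leaves(node_b)]
    return sum(scores) / len(scores)


def build_tree(procedures):
    """Agglomerative clustering: repeatedly merge the two most similar
    clusters. `procedures` maps a string document id to its list of
    work names; the result is a nested tuple, one pair per merge."""
    if not procedures:
        raise ValueError("at least one document is required")
    clusters = list(procedures)  # start with one cluster per document
    while len(clusters) > 1:
        best = max(combinations(clusters, 2),
                   key=lambda pair: average_linkage(pair[0], pair[1], procedures))
        clusters = [c for c in clusters if c not in best]
        clusters.append(best)
    return clusters[0]


if __name__ == "__main__":
    docs = {
        "A": ["open", "drain", "refill", "close"],
        "B": ["open", "drain", "close"],
        "C": ["weld", "grind", "paint"],
    }
    print(build_tree(docs))
```

Each merge becomes an internal node of the tree, so documents with similar work procedures end up under a common parent while dissimilar documents join only near the root.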
  11.  The document association apparatus according to claim 9, wherein
     the work procedure similarity evaluation unit uses sequence matching to evaluate the similarity of the work procedure names when comparing the work names constituting the work procedures.
  12.  The document association apparatus according to claim 9, wherein
     the work procedure similarity evaluation unit uses a dictionary to evaluate the similarity of the work procedure names when comparing the work names constituting the work procedures.
  13.  The document association apparatus according to any one of claims 8 to 12, wherein
     the document is a work procedure manual.
  14.  A document search apparatus comprising:
     a classification database created using the document association apparatus according to any one of claims 8 to 13;
     a search query accepting unit that accepts input of a search query in the form of a work procedure; and
     a document search unit that evaluates the similarity between the search query and the work procedure of each classification in the classification database, and searches for documents.
  15.  A program for causing a computer to function as a document association apparatus, the program causing the computer to function as:
     a document file input unit that inputs a plurality of document files;
     a work procedure extraction unit that extracts a description of a work procedure from each of the document files;
     a work procedure similarity evaluation unit that evaluates the similarity of the work procedures; and
     a document classification unit that classifies the documents into a tree structure according to the similarity of the work procedures.
PCT/JP2012/066348 2012-06-27 2012-06-27 Document linking method, document searching method, document linking apparatus, document linking apparatus, and program therefor WO2014002212A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
JP2014522292A JP5894273B2 (en) 2012-06-27 2012-06-27 Document association method, document retrieval method, document association apparatus, document retrieval apparatus, and program therefor
PCT/JP2012/066348 WO2014002212A1 (en) 2012-06-27 2012-06-27 Document linking method, document searching method, document linking apparatus, document linking apparatus, and program therefor

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2012/066348 WO2014002212A1 (en) 2012-06-27 2012-06-27 Document linking method, document searching method, document linking apparatus, document linking apparatus, and program therefor

Publications (1)

Publication Number Publication Date
WO2014002212A1 (en) 2014-01-03

Family

ID=49782444

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2012/066348 WO2014002212A1 (en) 2012-06-27 2012-06-27 Document linking method, document searching method, document linking apparatus, document linking apparatus, and program therefor

Country Status (2)

Country Link
JP (1) JP5894273B2 (en)
WO (1) WO2014002212A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2016031613A (en) * 2014-07-28 2016-03-07 富士通株式会社 Search program, device and method
JP2018073354A (en) * 2016-11-04 2018-05-10 Kddi株式会社 Device, method, and program for extracting similar document

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP6847812B2 (en) 2017-10-25 2021-03-24 株式会社東芝 Document comprehension support device, document comprehension support method, and program

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH08106474A (en) * 1994-10-07 1996-04-23 Hitachi Ltd Method and device for displaying similar example sentence retrieval result
JP2000222215A (en) * 1999-01-27 2000-08-11 Mitsubishi Electric Corp Procedure base example retrieving system
JP2003316796A (en) * 2002-04-26 2003-11-07 Fuji Xerox Co Ltd Hierarchical clustering device, hierarchical clustering method, hierarchical clustering program and hierarchical clustering system
JP2004145626A (en) * 2002-10-24 2004-05-20 Telecommunication Advancement Organization Of Japan Documents classification support device and computer program
JP2005266866A (en) * 2004-03-16 2005-09-29 Fuji Xerox Co Ltd Document classifying device and classification system generating device and method for document classifying device
JP2009181170A (en) * 2008-01-29 2009-08-13 Fujitsu Ltd Operation procedure manual preparation support system
JP2010176626A (en) * 2009-02-02 2010-08-12 Fujitsu Ltd Program and method for document clustering
JP2010267141A (en) * 2009-05-15 2010-11-25 Toshiba Corp Document classification device and program

Also Published As

Publication number Publication date
JP5894273B2 (en) 2016-03-23
JPWO2014002212A1 (en) 2016-05-26

Similar Documents

Publication Publication Date Title
CN109885692B (en) Knowledge data storage method, apparatus, computer device and storage medium
US7882119B2 (en) Document alignment systems for legacy document conversions
US10360294B2 (en) Methods and systems for efficient and accurate text extraction from unstructured documents
JP5605583B2 (en) Retrieval method, similarity calculation method, similarity calculation and same document collation system, and program thereof
WO2020056977A1 (en) Knowledge point pushing method and device, and computer readable storage medium
Billey et al. Fingerprint databases for theorems
CN106933824A (en) The method and apparatus that the collection of document similar to destination document is determined in multiple documents
JP5894273B2 (en) Document association method, document retrieval method, document association apparatus, document retrieval apparatus, and program therefor
Klampfl et al. Reconstructing the logical structure of a scientific publication using machine learning
Matsuoka et al. Examination of effective features for CRF-based bibliography extraction from reference strings
CN105426490A (en) Tree structure based indexing method
JP2010272006A (en) Relation extraction apparatus, relation extraction method and program
KR100659370B1 (en) Method for constructing a document database and method for searching information by matching thesaurus
JP2008197952A (en) Text segmentation method, its device, its program and computer readable recording medium
Chala et al. A Framework for Enriching Job Vacancies and Job Descriptions Through Bidirectional Matching.
Souza et al. ARCTIC: metadata extraction from scientific papers in pdf using two-layer CRF
Garrido et al. NEREA: Named entity recognition and disambiguation exploiting local document repositories
JP4148247B2 (en) Vocabulary acquisition method and apparatus, program, and computer-readable recording medium
Wu et al. Query Expansion by Word Embedding in the Suggestion Track of CLEF 2016 Social Book Search Lab.
Nghiem et al. Which one is better: presentation-based or content-based math search?
Kale et al. Author identification on imbalanced class dataset of Indian literature in Marathi
Adefowoke Ojokoh et al. Automated document metadata extraction
Chanod et al. From legacy documents to xml: A conversion framework
CN112949287B (en) Hot word mining method, system, computer equipment and storage medium
JP2007272699A (en) Document indexing device, document retrieval device, document classifying device, and method and program thereof

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 12879808

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2014522292

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 12879808

Country of ref document: EP

Kind code of ref document: A1