JP2003178055A

JP2003178055A - Document data relation extracting device and extracting program

Info

Publication number: JP2003178055A
Application number: JP2001377507A
Authority: JP
Inventors: Takayuki Tsuno; 孝行津野
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 2001-12-11
Filing date: 2001-12-11
Publication date: 2003-06-27

Abstract

PROBLEM TO BE SOLVED: To provide a document data relevance extraction device that accumulates relevance information between documents so that information on the relationships between documents can be obtained easily by various procedures.

SOLUTION: The device comprises a document data management function 103 that manages document registration, reference, and acquisition of document difference information in response to instructions from a terminal; a document database 110 that stores document information identifying each registered document in association with document data representing the document's contents; a system definition information definition function 104 that defines system definition information, which specifies the relevance extraction conditions for extracting the relationships among the stored documents as well as changes to and deletions of those conditions; a relevance information extraction function 105 that, based on the system definition information, extracts information indicating the relationships between documents from the documents stored in the document database; and a display data generation function 106 that displays the extracted information indicating the relationships between documents.

COPYRIGHT: (C)2003,JPO

Description

Detailed Description of the Invention

[0001]

Field of the Invention: The present invention relates to a document data relevance extraction device and the like, and more particularly to a document data relevance extraction device and the like that extracts relevance information from document data accumulated in a database (DB).

[0002]

Description of the Related Art: Investigating a single event or matter often cannot be completed without reading through several documents, such as research papers, specifications, instruction manuals, and official documents. This is because the documents are related to one another, and maintaining those relationships is one of the challenges of document production. When a document is revised, work is needed to keep related information in multiple documents from becoming inconsistent: for example, when one piece of information is deleted, the corresponding information in other documents must be corrected, and when another piece of information is rewritten, the related information in other documents must be deleted.

[0003] It is not uncommon for a single event or matter to be described across several documents, and the relationships between documents are not always indicated solely by reference instructions, citations, and the like. As a result, investigating the impact of a document revision requires reading through every document that might be related, or extracting candidates with a general-purpose full-text search function, which increases the workload.

[0004] By nature, a document is written in relation to other documents or has some connection with them. Such relationships are made explicit by, for example, reference instructions and citations; in tagged documents such as HTML documents, link information serves the same purpose. Moreover, given that related matters tend to use the same terms, the use of an identical term can be assumed to imply a relationship.

[0005] Techniques that focus on the relationships between documents include, for example, Japanese Patent Laid-Open No. 9-146968 and Japanese Patent Laid-Open No. 2000-99543. Japanese Patent Laid-Open No. 9-146968 discloses manually entering the titles of references when a document is registered, comparing them with the titles in the document management information, and saving the result in a related document table if a matching document exists. Japanese Patent Laid-Open No. 2000-99543 discloses grasping the relationships between documents by converting document data into a tagged document in XML format, extracting the nodes of the document structure that contain a character string indicating references, and then retrieving the reference titles from the extracted information.

[0006] All of these techniques presuppose that a document contains a character string that makes a relationship explicit, such as "references" or "related documents". In other words, they focus on document names and search for related documents by following the reference relationships between documents in the order in which they are referenced.

[0007]

Problems to Be Solved by the Invention: The techniques disclosed in the above publications handle only explicit relevance information such as reference instructions and citation information, and it is difficult for them to grasp the relationships between documents that contain no such information.

[0008] In practice, the only ways to find every relationship between documents without omission are, as noted above, for the user to search by repeatedly entering related terms by hand into a general-purpose full-text search function, or to check the documents visually.

[0009] The present invention has been made in view of these circumstances. It provides a document data relevance extraction device that accumulates relevance information between documents without distinguishing explicit from implicit relationships, so that information based on the relationships between documents can be obtained easily by various procedures.

[0010]

Means for Solving the Problems: To solve the above problems, the present invention adopts the following means.

[0011] The device comprises: a document data management function that manages document registration, reference, and acquisition of document difference information in response to instructions from a terminal; a document database that stores document information identifying each registered document in association with document data representing the document's contents; a system definition information definition function that defines system definition information, which specifies the relevance extraction conditions for extracting the relationships among the stored documents as well as changes to and deletions of those conditions; a relevance information extraction function that, based on the system definition information, extracts information indicating the relationships between documents from the documents stored in the document database; and a display data generation function that displays the extracted information indicating the relationships between documents.

[0012]

Embodiments of the Invention: Embodiments of the present invention will be described below with reference to the accompanying drawings. FIG. 1 shows a document data relevance extraction device according to an embodiment of the present invention. In the figure, a terminal 101 is connected to the document data relevance extraction device 102, and the user operates the device 102 through the terminal 101. The document data relevance extraction device 102 consists of a document data management function 103, a system definition information definition function 104, a relevance information extraction function 105, a display data generation function 106, a system definition information DB 107, a document DB 110, and a relevance DB 113.

[0013] When the terminal 101 issues an instruction such as new registration or deletion of document data to the document data management function 103, the function 103 receives from the terminal 101 the document information 111, which indicates the document name, author, and so on, together with the document data 112, and stores them in the document DB 110. When a document is deleted, the corresponding document data 112 and document information 111 are deleted from the document DB 110. Consequently, the document DB 110 holds document information 111 and document data 112 as a set. When the terminal 101 issues an instruction against the document DB 110 such as new registration or deletion of document data, the document data management function 103 instructs the relevance information extraction function 105 to start extracting relevance information between documents. It also retrieves from the document DB 110 the document information 111 and document data 112 needed to carry out the instruction from the terminal 101, acquires difference information between the old and new document data as needed, and passes these to the relevance information extraction function 105.

[0014] The information that the document data management function 103 passes to the relevance information extraction function 105 is as follows: at new registration, the document information 111 and document data 112 being registered; at revision, the difference information obtained by comparing the old and new document data 112, together with the old and new document information 111; and at deletion, the document information 111 of the document data 112 concerned. The document DB 110 need only provide functions for registering, deleting, and referencing document information 111 and document data 112 as instructed by the document data management function 103, so the invention does not prescribe the registration method or the document formats that can be registered.

[0015] When instructed by the document data management function 103 to start extracting relevance information, the relevance information extraction function 105 refers to the information received from the document data management function 103 and to the sentence pattern dictionary 108 stored in the system definition information 107, and extracts, from character strings that indicate references, citations, and the like, information identifying the target document data. It also refers to the term category designation 109 and extracts the terms that match the term categories of interest. It then stores the extracted information in the relevance DB 113 as relevance information 114. In addition, it performs revision processing at revision time and deletion processing at deletion time.

[0016] The sentence pattern dictionary 108 stored in the system definition information 107 is used to extract information identifying the target document data from character strings that indicate references, citations, and the like. The term category designation 109 specifies which categories of terms, among the character strings used in the document data, are of interest. Together, these settings determine what kinds of terms are extracted from the document data.
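The roles of the sentence pattern dictionary 108 and the term category designation 109 can be illustrated with a minimal sketch. The patent does not say how either is encoded, so this is an assumption: the dictionary is modeled as a list of regular expressions with one capture group for the referenced document name, and each term category as a named regular expression. The names `SENTENCE_PATTERNS`, `TERM_CATEGORIES`, `extract_references`, and `extract_terms`, as well as the English sample patterns, are hypothetical.

```python
import re

# Hypothetical stand-in for the sentence pattern dictionary 108:
# each pattern captures the name of the referenced document.
SENTENCE_PATTERNS = [
    re.compile(r'see "([^"]+)"'),
    re.compile(r'as described in "([^"]+)"'),
]

# Hypothetical stand-in for the term category designation 109:
# category name -> pattern matching terms of that category.
TERM_CATEGORIES = {
    "acronym": re.compile(r"\b[A-Z]{2,}\b"),
}


def extract_references(text):
    """Return referenced document names in order of appearance."""
    hits = []
    for pattern in SENTENCE_PATTERNS:
        for match in pattern.finditer(text):
            hits.append((match.start(), match.group(1)))
    return [name for _, name in sorted(hits)]


def extract_terms(text, categories=("acronym",)):
    """Return (term, character offset) pairs for the chosen categories."""
    found = []
    for category in categories:
        for match in TERM_CATEGORIES[category].finditer(text):
            found.append((match.group(0), match.start()))
    return sorted(found, key=lambda pair: pair[1])


sample = 'The API is unchanged; see "Design Spec" for the DB layout.'
print(extract_references(sample))
print(extract_terms(sample))
```

Keeping the patterns and categories as data rather than code mirrors the text's point that changing these definitions changes what is extracted.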

[0017] The system definition information 107 can be defined with the system definition information definition function 104 when the system is built. The definition can be decided by considering what kinds of documents will be registered in the document DB 110, which character strings are to be treated as indicating references or citations, and which terms are to serve as keys for extracting document relationships. If the sentence pattern dictionary 108 or the term category designation 109 is changed after documents have been registered in the document DB 110, the relevance information extraction function 105 reruns the relevance extraction on all documents registered in the document DB 110 according to the changed definition and updates the relevance information 114 accumulated in the relevance DB 113 in a single batch. This is done because the information held in the system definition information 107 constitutes the conditions for extracting the relationships between documents.

[0018] The relevance information 114 stored in the relevance DB 113 is also the data processed by the display data generation function 106. It consists of the information identifying target documents extracted by the relevance information extraction function 105, the terms used in the document data of the target documents and their occurrence position information, mapping information between the extracted items, and so on.

[0019] In response to an instruction from the terminal 101, the display data generation function 106 retrieves the relevance information from the relevance DB 113, converts it into displayable data, and displays it on the terminal 101.

[0020] FIG. 2 is a flowchart of the processing performed by the relevance information extraction function 105. When processing starts, it first branches according to the instruction from the terminal 101 (step 201).

[0021] When the instruction from the terminal 101 is a new registration, the function first receives the document information 111 and document data 112 from the document data management function 103 (step 202), and then receives the sentence pattern dictionary 108 and term category designation 109 defined in the system definition information 107 (step 203). It then extracts character strings using the received information (step 204). In this extraction, the document data is analyzed according to the sentence pattern dictionary 108, and information identifying the referenced document data is extracted from character strings that indicate references, citations, and the like. Next, the terms (character strings) used in the document data are extracted according to the term category designation 109. The extracted information is first stored in the relevance DB 113 and then matched against existing information, and information indicating the relationships between documents is added.

[0022] When the instruction from the terminal 101 is a revision, the function first receives from the document data management function 103 the difference information between the old and new document data, in addition to the document information 111 and document data 112 (step 206). The processing differs from new registration in how this difference information is handled. The difference information describes the differences between the old and new versions and is used to correct the existing relevance information 114 stored in the relevance DB 113. Next, the function receives the sentence pattern dictionary 108 and term category designation 109 defined in the system definition information 107 (step 207), and extracts character strings using the received information. In this extraction, the document data is analyzed with reference to the difference information, and the information identifying reference targets is extracted from character strings indicating references, citations, and the like, along with the terms used in the document data (step 208). The information held in the relevance information 114 is then corrected on the basis of the extracted information (step 209, update of existing information), and the changes in the relationships between documents caused by the revision are reflected by matching against existing information (step 210).

[0023] When the instruction from the terminal 101 is a deletion, the function first receives the document information 111 from the document data management function 103 (step 211), and then deletes all information about the corresponding document data 112 from the relevance DB 113 (step 212).
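The three-way branch of FIG. 2 can be sketched as a small dispatcher. This is only an outline under stated assumptions: the relevance DB 113 is modeled as a plain dictionary keyed by document name, the extraction step is replaced by a trivial placeholder (`extract`), and the matching against existing information is omitted. The function names and the record layout are hypothetical.

```python
def extract(doc_data):
    """Placeholder for the extraction step: here, just the capitalized words."""
    return sorted({word for word in doc_data.split() if word[:1].isupper()})


def handle_instruction(kind, relevance_db, doc_name, doc_data=None, diff=None):
    """Branch on the terminal's instruction (step 201 in FIG. 2)."""
    if kind == "register":
        # Steps 202-204: receive the document, extract, and store the result.
        relevance_db[doc_name] = {"terms": extract(doc_data), "previous": None}
    elif kind == "revise":
        # Steps 206-210: keep the old entry reachable, then store the new one.
        # A fuller version would use `diff` to correct the old entry in place.
        old = relevance_db.get(doc_name)
        relevance_db[doc_name] = {
            "terms": extract(doc_data),
            "previous": old,
            "diff": diff,
        }
    elif kind == "delete":
        # Steps 211-212: remove everything held for the document.
        relevance_db.pop(doc_name, None)
    else:
        raise ValueError(f"unknown instruction: {kind}")
    return relevance_db


db = {}
handle_instruction("register", db, "A", "Term Alpha is used")
handle_instruction("revise", db, "A", "Term Beta is used", diff="Alpha -> Beta")
print(db["A"]["terms"], db["A"]["previous"]["terms"])
handle_instruction("delete", db, "A")
print(db)
```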

[0024] These three processes keep the relevance information 114 stored in the relevance DB 113 accurate. The form in which information is accumulated in the relevance information 114, and how it is updated, are described below.

[0025] FIG. 3 is a schematic diagram showing how the relevance information 114 is accumulated at new registration.

[0026] In the relevance information 114, a management table 302, a reference/citation table 303, a term table 304, and related document information 305 are prepared for each item of document data 112. The same applies when document data is revised: the relevance information of the old version and that of the new version are each retained. The relationships between old-version documents are thereby preserved.

[0027] In FIG. 3, the relevance information 114 of document data A is shown as the data group 301 of document data A. The first column of the management table 302 holds the document name (document data A). The second column holds a link indicating the correspondence with the document data stored in the document DB 110, the third column holds a link pointing to the relevance information 114 of the old version, and the fourth column holds links to the reference/citation table, term table, and related document information of the document data concerned. The management table 302 thus identifies the relationship with the document DB 110, the relationship with the old version's relevance information 114, and the extent of the information that makes up the relevance information 114 for a given item of document data. Because FIG. 3 shows the information at new registration, the link to the old version's relevance information in the third column of the management table 302 is null.

[0028] The first column of the reference/citation table 303 holds the information identifying the target document data, extracted from character strings indicating references, citations, and the like on the basis of the sentence pattern dictionary 108. The second column holds the type of information that the character string in the first column denotes. Here that type is "document name", and it can be made to match the information types of the document information 111. The relationship with the document information 111 is described in detail with reference to FIG. 5. The third column holds a link to the relevance information of the other document found from the information in the first column. If the referenced document data is not found, the third column is null.

[0029] In addition, to check whether the newly registered document is referenced by others, the reference/citation tables of the other documents are checked, and if it is referenced, a link to the relevance information of the newly registered document is added to the third column of the referencing document's reference/citation table. Explicit relationships between documents, such as references and citations, are thereby preserved.
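The two-directional handling of explicit references (filling in column 3 for the new document, then back-filling other documents that were waiting for it) can be sketched as follows. The layout is an assumption made for illustration: each document's reference/citation table is reduced to a dictionary from target name to a link, with `None` standing for the null value. The function name `register_references` is hypothetical.

```python
def register_references(ref_tables, new_doc, targets):
    """Add a reference/citation table for new_doc and resolve links both ways.

    ref_tables maps a document name to {target name: linked name or None}.
    """
    # Column 3 of the new document's table: link only targets already registered.
    ref_tables[new_doc] = {
        target: (target if target in ref_tables else None) for target in targets
    }
    # Back-fill: earlier documents that referenced new_doc now get their link.
    for doc, table in ref_tables.items():
        if doc != new_doc and new_doc in table and table[new_doc] is None:
            table[new_doc] = new_doc
    return ref_tables


tables = {}
register_references(tables, "Document A", ["Document B"])
print(tables["Document A"])   # the link is still null: B is not registered yet
register_references(tables, "Document B", [])
print(tables["Document A"])   # the link is filled in once B is registered
```

Because the back-fill runs on every registration, the final links do not depend on the order in which the documents arrive.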

[0030] The first column of the term table 304 holds the terms extracted from the document data on the basis of the term category designation 109; the second column holds an attribute indicating new, added, changed, or deleted; the third column holds the original term when a term has been changed; and the fourth column holds occurrence position information indicating where in the document data the term in the first column is used. Occurrence position information can be any information from which the position can be derived, such as the number of characters up to the occurrence or the page number. This makes it possible to determine where in the document data each term in the first column is used. At new registration, every entry in the second column is the "new" attribute and every entry in the third column is null. The third column is used at revision time.
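The tables described so far (management table 302, reference/citation table 303, term table 304) map naturally onto a few record types. The sketch below is one possible in-memory layout, not the storage format of the patent, which leaves that open. All class and field names are hypothetical, links are modeled as plain object references or strings, and the document DB key `"doc-db-key-A"` is a made-up value.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ReferenceRow:
    """One row of the reference/citation table 303."""
    target: str                  # column 1: string identifying the target document
    kind: str = "document name"  # column 2: type of that string
    link: Optional[str] = None   # column 3: target's relevance information, if found


@dataclass
class TermRow:
    """One row of the term table 304."""
    term: str                    # column 1
    attribute: Optional[str]     # column 2: "new", "added", "changed", "deleted"
    original: Optional[str]      # column 3: earlier term, used at revision
    position: Optional[int]      # column 4: e.g. a character offset


@dataclass
class DocumentRelevance:
    """Management table 302 together with the tables it links to."""
    name: str                                        # column 1
    document_link: str                               # column 2: key into the document DB
    previous: Optional["DocumentRelevance"] = None   # column 3: old version, if any
    references: List[ReferenceRow] = field(default_factory=list)  # via column 4
    terms: List[TermRow] = field(default_factory=list)            # via column 4


def register_new(name, document_link, terms_with_positions):
    """Build the relevance information for a newly registered document."""
    rows = [TermRow(term, "new", None, pos) for term, pos in terms_with_positions]
    return DocumentRelevance(name, document_link, terms=rows)


doc_a = register_new("Document A", "doc-db-key-A", [("Term A", 10), ("Term B", 42)])
print(doc_a.previous, [row.attribute for row in doc_a.terms])
```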

[0031] In the related document information 305, the first column holds the terms registered in the first column of the term table 304, and the second column holds zero or more links to the relevance information of other documents that use the same term. Specifically, the first column is generated from the information in the term table. For the second column, the relevance information extraction function 105 checks the terms in the first column against the first column of every other document's related document information, in an exhaustive pairwise comparison, and when the same term is found, the second columns of both this document's and the other document's related document information 305 are updated together. The second column alone is then examined to check whether links for unrelated terms have been registered, and such links are corrected or deleted as necessary.

[0032] Concretely, comparing the first columns of related document information 305 and related document information 306 reveals the term "Term A", which is used in both document data A and document data B, and links to each document's relevance information 114 are added to row 1, column 2 of related document information 305 and related document information 306. If the second column contains a link that is inconsistent with the relationship between this document's data and the other document's data, it is corrected. Through this processing, the relationships between documents are maintained regardless of differences in registration conditions, such as new registration versus revision, and regardless of the order in which documents are registered.
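The exhaustive pairwise term check and the follow-up cleanup of stale links can be sketched as below. This assumes a simplified layout in which each document's related document information is a dictionary from term to the set of document names that share it. The function name `link_shared_terms` and the sample data are hypothetical.

```python
def link_shared_terms(related):
    """Link documents that share a term, then drop links that no longer hold.

    related maps a document name to {term: set of linked document names}
    and is updated in place.
    """
    names = list(related)
    # Pass 1: compare every pair of documents and link both sides together.
    for i, doc_a in enumerate(names):
        for doc_b in names[i + 1:]:
            for term in related[doc_a].keys() & related[doc_b].keys():
                related[doc_a][term].add(doc_b)
                related[doc_b][term].add(doc_a)
    # Pass 2: remove links whose target does not (or no longer does) use the term.
    for table in related.values():
        for term in list(table):
            table[term] = {
                other for other in table[term]
                if other in related and term in related[other]
            }
    return related


related = {
    "Document A": {"Term A": set(), "Term B": set()},
    "Document B": {"Term A": set(), "Term C": {"Document A"}},  # stale link
}
link_shared_terms(related)
print(related["Document A"]["Term A"], related["Document B"]["Term C"])
```

Updating both sides in the same step is what keeps the result independent of registration order, as the paragraph above notes.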

[0033] FIG. 4 is a schematic diagram showing how the relevance information 114 is corrected at revision. When a revised version of document data A is registered, the existing relevance information 114 of document data A is copied. In the management table 402, the first column is changed to the document name of the revised document data, the link to the document DB in the second column is changed to point to the new version of document data A, and the third column is changed to a link to the old version's relevance information 114. The fourth column is changed to links to the copied reference/citation table, term table, and related document information. This preserves both the relationship between the new version of document data A stored in the document DB 110 and its relevance information 114, and the relationship with the relevance information 114 for the old version of the document data.

[0034] The reference/citation table 403 is first initialized and then processed in the same way as at new registration.

[0035] In the term table 404, the existing information is corrected on the basis of the difference information received from the document data management function 103. If a term has not been rewritten, the second and third columns are null, and if its occurrence position information is unchanged, that is not modified either. If a term has been changed, the "changed" attribute is stored in the second column, the term before the change is stored in the third column, and the occurrence position information is corrected as necessary. If a term has been deleted, the "deleted" attribute is set in the second column and the occurrence position information is deleted. If a term has been added, the "added" attribute is set in the second column and occurrence position information is added.

[0036] A concrete example of this rewriting is described using the difference 406. The difference 406 is information obtained by comparing part of the old document data with part of the new document data. Within the difference text, underlined portions are added strings and struck-through portions are deleted strings. A string with a double strikethrough is a term that has been changed, and the underlined string that follows it is the term after the change. The depiction of the difference 406 does not prescribe the data format or the method used to take the difference; it is merely a sample that visualizes difference information.

[0037] The terms stored in the term table 404 were extracted by the relevance information extraction function 105 according to the term category designation 109 and are stored, together with their occurrence position information, in the order in which they appear in the document data. The table therefore lists the terms in the old document data in order of appearance, so the term table 404 can be corrected using the information in the difference 406 as a key.

[0038] "Term A" and "Term C" in term table 404 are not modified by difference 406, so the second and third columns of their rows are null, and unless a page shift or the like occurs, their appearance position information does not change either. Appearance positions are handled in the same way in the cases that follow.

[0039] "Term C" in the third row is changed to "Term E" in difference 406, so the first column is changed to "Term E", the "change" attribute is registered in the second column, and the original term, "Term C", is registered in the third column. Information about the term change is thereby retained.

[0040] "Term A" in the fourth row is unchanged, so it is processed in the same way as the first row. "Term D" in the fifth row has been deleted from the new document data, so the "delete" attribute is registered in the second column and the appearance position information in the fourth column is set to null.

[0041] "Term F" in the sixth row does not exist in the old document data and is shown as added in difference 406, so a new row is added, with "Term F" registered in the first column, the "addition" attribute in the second column, and the appearance position information in the fourth column.

[0042] Through the above processing, every effect of the document data revision on term table 404 has been captured.
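The row-by-row handling in paragraphs [0038] to [0041] can be sketched as follows. This is only an illustrative sketch, not the implementation disclosed in the patent: the `TermRow` class, the `apply_difference` function, the tuple encoding of the difference, and the page-style position values are all assumptions introduced here, and the old rows are laid out as we read the example (Term A, Term C, Term C, Term A, Term D).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TermRow:
    term: str                          # column 1: the term
    attribute: Optional[str] = None    # column 2: None, "change", "delete" or "addition"
    previous: Optional[str] = None     # column 3: the term before a change
    positions: Optional[list] = None   # column 4: appearance position information


def apply_difference(old_rows, ops):
    """Apply difference operations, given in order of appearance, to a term table.

    Each op is ("keep",), ("change", new_term), ("delete",) or
    ("add", term, positions). All ops except "add" consume one old row.
    """
    result = []
    old = iter(old_rows)
    for op in ops:
        kind = op[0]
        if kind == "add":
            _, term, positions = op
            result.append(TermRow(term, "addition", None, positions))
            continue
        row = next(old)
        if kind == "keep":
            result.append(TermRow(row.term, None, None, row.positions))
        elif kind == "change":
            result.append(TermRow(op[1], "change", row.term, row.positions))
        elif kind == "delete":
            result.append(TermRow(row.term, "delete", None, None))
        else:
            raise ValueError(f"unknown operation: {kind}")
    return result


# Hypothetical old term table and difference mirroring the worked example.
old_rows = [
    TermRow("Term A", positions=["p.1"]),
    TermRow("Term C", positions=["p.1"]),
    TermRow("Term C", positions=["p.2"]),
    TermRow("Term A", positions=["p.3"]),
    TermRow("Term D", positions=["p.4"]),
]
ops = [("keep",), ("keep",), ("change", "Term E"), ("keep",), ("delete",),
       ("add", "Term F", ["p.5"])]
revised = apply_difference(old_rows, ops)
for row in revised:
    print(row)
```

Rows carrying the "delete" attribute are kept in this sketch because the text removes them from the term table only at the end, after the related document information has been updated.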

[0043] Related document information 405 is modified for the terms that were added, changed, or deleted. If a term with the "addition" attribute is not yet in related document information 405, the term is added and links to the relevance information of other documents are registered for it; if the term is already present, nothing is done.

[0044] For a term with the "change" attribute, a row for the changed term is first added. The term after the change is placed in the first column, the links to the relevance information of other documents that the old version stored for the term before the change are copied into the second column, and the "change" attribute is attached to both the first and second columns. This is done because it must be assumed that the same term change will also be needed in the other document data.

[0045] The "change" attribute attached to the first and second columns of the third row is retained until every link to the old-version relevance information of other documents has either disappeared or been corrected. Such a link is corrected in one of three ways: the referenced document data is revised so that "Term C" is replaced with "Term E", which clears the change attribute; the referenced document data continues to use "Term C" as it is, in which case the link is deleted; or the referenced document data itself is deleted.

[0046] For a term with the "delete" attribute, every link to the relevance information of this document data is deleted from the related document information contained in the relevance information 114 of other documents. A term with the "delete" attribute is also finally deleted from term table 404, so no unnecessary information remains.
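The three cases in paragraphs [0043] to [0046] can be sketched as below. The data layout is an assumption made for illustration: related document information is modeled as a dictionary from term to a set of linked documents plus a "change" flag, and `apply_term_changes` and `find_links` are hypothetical names that do not appear in the source.

```python
def apply_term_changes(this_doc, related, others, changes, find_links):
    """Update related document information after a revision of this_doc.

    related: this document's related document information,
             term -> {"links": set of document names, "change": bool}
    others:  related document information of the other documents,
             document name -> same structure as `related`
    changes: list of (attribute, term, previous_term) taken from the term table
    find_links: hypothetical callable returning the documents that use a term
    """
    for attribute, term, previous in changes:
        if attribute == "addition":
            # Register the term only if it is not already present.
            if term not in related:
                related[term] = {"links": set(find_links(term)), "change": False}
        elif attribute == "change":
            # Copy the old term's links and flag them as awaiting correction.
            inherited = set(related.get(previous, {"links": set()})["links"])
            related[term] = {"links": inherited, "change": True}
        elif attribute == "delete":
            # Remove links from other documents back to this document.
            for other_related in others.values():
                if term in other_related:
                    other_related[term]["links"].discard(this_doc)
    return related


# Hypothetical data: document A is revised.
related_a = {"Term C": {"links": {"B", "D"}, "change": False}}
others = {"B": {"Term D": {"links": {"A"}, "change": False}}}
changes = [("change", "Term E", "Term C"),
           ("delete", "Term D", None),
           ("addition", "Term F", None)]


def find_links(term):
    # Stand-in for a lookup over the relevance DB.
    return {"C"} if term == "Term F" else set()


apply_term_changes("A", related_a, others, changes, find_links)
print(related_a)
print(others)
```

Copying the old links and flagging them is one straightforward way to represent "documents that may still need the same change"; clearing the flag later, when the referenced documents are revised, is outside this sketch.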

[0047] Through the series of processes described above, every effect of the revision has been reflected in the data group 401 of the revised document data A.

[0048] FIG. 5 illustrates the process of defining the contents of system definition information 107 using the system definition information definition function 104.

[0049] To define or modify the contents of system definition information 107, the system definition information definition screen 501 is first displayed on terminal 101. The definition screen 501 shows a menu of two definition items: "Specify sentence pattern dictionary" and "Specify term classification".

[0050] When "Specify sentence pattern dictionary" is selected, the sentence pattern dictionary designation screen 502 is displayed, and the contents of sentence pattern dictionary 505 in system definition information 504 are read and shown in an editable state.

[0051] In sentence pattern dictionary 505, the first column stores sentence patterns for character strings that indicate references, citations, and the like in running text, and the second column stores a character string indicating the type of the extracted information. The reason for focusing on these patterns is that text is written according to conventions of expression so that a human reader can recognize such strings as references or citations. Consequently, by using the strings in the first column of sentence pattern dictionary 505 to extract, from the strings that indicate references or citations, the information that identifies the counterpart document data, the relevance between documents can be determined.

[0052] The form of the character strings registered in the first column of sentence pattern dictionary 505 is described in detail below.

[0053] The pattern in the first row, "マニュアル「(.*)」を参照" ("see the manual '(.*)'"), matches strings in the document data such as "マニュアル「ＡＢＣ入門」を参照" ("see the manual 'Introduction to ABC'"). Here ".*" matches an arbitrary character string of variable length, and the parentheses mark the range of characters to be extracted. From the string "マニュアル「ＡＢＣ入門」を参照", the string "ＡＢＣ入門" can therefore be extracted as a document name.
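As a quick check of the behavior described above, the first-row pattern can be tried with Python's standard `re` module. The choice of Python and the variable names are ours; the patent does not name a regular expression engine.

```python
import re

# Sentence pattern from row 1 of the dictionary; the parentheses capture the document name.
pattern = re.compile(r"マニュアル「(.*)」を参照")

text = "マニュアル「ＡＢＣ入門」を参照"
match = pattern.search(text)
document_name = match.group(1) if match else None
print(document_name)  # ＡＢＣ入門
```

Because `.*` is greedy, a line containing two such references would be captured as one long string; a non-greedy `.*?` is a common alternative, although the source itself specifies only `.*`.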

[0054] When document data can be identified from its issue date, specifying the pattern "([0-9]*/[0-9]*/[0-9]*)" makes it possible to match a string such as "2001/08/07", because each "[0-9]*" matches a number with any number of digits.

[0055] Strings that come in several formats, such as patent publication numbers, can also be handled. For example, to match both "特開平9-146968" and "特開2000-99543" with a single sentence pattern, the pattern shown in the third row, "(特開平?[0-9]*-[0-9]*)", is specified. The "平?" part matches zero or one occurrence of the character "平", so this one pattern matches both "特開平9-146968" and "特開2000-99543".
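The date and publication-number patterns from paragraphs [0054] and [0055] can be exercised the same way. This is a small sketch using Python's `re` module with ASCII digits and hyphens; the surrounding sample sentences are invented for the example.

```python
import re

date_pattern = re.compile(r"([0-9]*/[0-9]*/[0-9]*)")
patent_pattern = re.compile(r"(特開平?[0-9]*-[0-9]*)")

# Invented sample sentences containing the strings used in the text.
date_hit = date_pattern.search("Issued 2001/08/07.").group(1)
patent_hits = [
    patent_pattern.search(s).group(1)
    for s in ("特開平9-146968を参照", "特開2000-99543を参照")
]

print(date_hit)     # 2001/08/07
print(patent_hits)  # ['特開平9-146968', '特開2000-99543']
```

Note that `[0-9]*` also matches zero digits, so the date pattern as written would accept a bare "//" as well; `[0-9]+` would be stricter, but that is a refinement beyond what the source specifies.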

[0056] Naturally, a string that indicates a reference or citation cannot be extracted unless a matching sentence pattern is registered in sentence pattern dictionary 505. What matters, however, is that unnecessary references and citations can be excluded and that the references and citations of interest can be controlled freely.

[0057] When "Specify term classification" is selected next, the term classification designation screen 503 is displayed, the contents of term classification designation 506 in system definition information 504 are read, and the items on the screen are updated to reflect the existing definitions.

[0058] The first column of term classification designation 506 holds the term classifications and additional condition items of interest. The second column holds "on" or "off", where "on" means the item is selected and "off" means it is not. The term classification designation screen 503 is a sample; the items selectable on the screen are registered to suit the type and purpose of the document data being handled and the functional requirements of the text analysis method, such as morphological analysis or syntactic analysis. For example, screen 503 allows the terms of interest to be specified by Japanese grammatical class, such as "proper noun", "common noun", and "verb", along with additional conditions that determine how particles are handled.

[0059] On the term classification designation screen 503, only "proper noun" is selected, so only the second column of the first row of term classification designation 506 is "on", and all other items are "off".

[0060] The terms used in the document data are extracted, and the usage of each term is tracked, in order to cope with conditions that arise when documents are produced, such as one piece of information being subdivided and published across several documents, or several pieces of information being grouped according to their intended use, and thereby to grasp the relevance between documents that is implied by the terms they use.

[0061] Accordingly, by using sentence pattern dictionary 505 and term classification designation 506, the relevance between documents can be handled comprehensively, both through character strings that explicitly express references or citations and through the terms used in the document data.

[0062] FIG. 6 illustrates the process of identifying a referenced document using the information registered in sentence pattern dictionary 108.

[0063] The first row, first column of sentence pattern dictionary 602 in system definition information 601 holds the string "マニュアル「(.*)」を参照" ("see the manual '(.*)'"). When the document data registered in document DB 603 is searched with this string, it matches the portion "マニュアル「ＡＢＣシステムガイド」を参照" of the line "詳細は、マニュアル「ＡＢＣシステムガイド」を参照のこと。" ("For details, see the manual 'ABC System Guide'.") contained in the document data "ＡＢＣ入門" (Introduction to ABC) 604, and "ＡＢＣシステムガイド" (ABC System Guide) is extracted as the information that identifies the counterpart.

[0064] Furthermore, the second column of sentence pattern dictionary 602 indicates that the extracted information is a document name. By searching the document names defined in the document information of other documents for the string "ＡＢＣシステムガイド", the document "ＡＢＣシステムガイド" (ABC System Guide) 605 can be identified as the counterpart.
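The two-step lookup in paragraphs [0063] and [0064], extracting a string with a sentence pattern and then resolving it against the document names of other documents, can be sketched as follows. The dictionary layout, the `resolve_references` function, and the use of the reference numerals 604 and 605 as document keys are conveniences assumed for this example only.

```python
import re

# Sentence pattern dictionary: (pattern, type of the extracted information).
SENTENCE_PATTERNS = [(r"マニュアル「(.*)」を参照", "document name")]

# Stand-in for the document DB: document information (name) plus document data (text).
DOCUMENTS = {
    604: {"name": "ＡＢＣ入門",
          "text": "詳細は、マニュアル「ＡＢＣシステムガイド」を参照のこと。"},
    605: {"name": "ＡＢＣシステムガイド", "text": ""},
}


def resolve_references(doc_id, documents, patterns):
    """Return (extracted string, type, target document id or None) for each match."""
    found = []
    for pattern, kind in patterns:
        for match in re.finditer(pattern, documents[doc_id]["text"]):
            value = match.group(1)
            target = None
            if kind == "document name":
                # Look the extracted string up among the other documents' names.
                target = next(
                    (other_id for other_id, info in documents.items()
                     if other_id != doc_id and info["name"] == value),
                    None,
                )
            found.append((value, kind, target))
    return found


references = resolve_references(604, DOCUMENTS, SENTENCE_PATTERNS)
print(references)  # [('ＡＢＣシステムガイド', 'document name', 605)]
```

A target of `None` would correspond to a reference whose counterpart is not registered in the document DB.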

[0065] In this way, sentence pattern dictionary 602 makes it possible to extract the information that identifies the counterpart document data and, further, to identify the referenced document data itself.

[0066] FIG. 7 illustrates a method of extracting terms according to term classification designation 109. When terms are extracted from the sample text 701 using term classification designation 702, in which only "proper noun" is on, three terms are extracted: "ＡＢＣシステム" (ABC system), "ログファイル" (log file), and "［ファイル（Ｆ）］−［ログ（Ｒ）］コマンド" (the [File (F)] - [Log (R)] command).

[0067] When terms are extracted from the sample text 701 using term classification designation 703, in which both "proper noun" and "include the particle の" are on, three terms are extracted: "ＡＢＣシステム" (ABC system), "ログファイルの生成" (generation of log file), and "［ファイル（Ｆ）］−［ログ（Ｒ）］コマンド" (the [File (F)] - [Log (R)] command).

[0068] The two term classification designations differ only in whether "include the particle の" is on. The former yields "ログファイル" (log file) and the latter yields "ログファイルの生成" (generation of log file), which shows that different terms are extracted depending on the term classification designation.
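The effect of the "include the particle の" switch can be sketched on a pre-tokenized sentence. The token list below is hypothetical (the text of sample 701 is not reproduced in this section), the part-of-speech tags are assumed to come from some morphological analyzer, and `extract_terms` is a name introduced here rather than anything from the source.

```python
def extract_terms(tokens, settings):
    """Extract terms from (surface, part_of_speech) tokens according to on/off settings."""
    terms = []
    i = 0
    while i < len(tokens):
        surface, pos = tokens[i]
        if settings.get(pos) == "on":
            term = surface
            i += 1
            if settings.get("include particle の") == "on":
                # Absorb "の" plus the noun that follows it into the same term.
                while (i + 1 < len(tokens) and tokens[i][0] == "の"
                       and tokens[i + 1][1].endswith("noun")):
                    term += "の" + tokens[i + 1][0]
                    i += 2
            terms.append(term)
        else:
            i += 1
    return terms


# Hypothetical tokenization of a sentence resembling sample 701.
TOKENS = [
    ("ＡＢＣシステム", "proper noun"), ("では", "particle"),
    ("ログファイル", "proper noun"), ("の", "particle"), ("生成", "common noun"),
    ("に", "particle"),
    ("［ファイル（Ｆ）］−［ログ（Ｒ）］コマンド", "proper noun"),
    ("を", "particle"), ("使う", "verb"),
]

settings_702 = {"proper noun": "on", "common noun": "off", "verb": "off",
                "include particle の": "off"}
settings_703 = dict(settings_702, **{"include particle の": "on"})

terms_702 = extract_terms(TOKENS, settings_702)
terms_703 = extract_terms(TOKENS, settings_703)
print(terms_702)
print(terms_703)
```

With the switch off the second term is "ログファイル"; with it on, the same position yields "ログファイルの生成", matching the difference described in paragraph [0068].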

[0069] In this way, term classification designation 109 makes it possible to extract the terms used in the document data in the form needed for extracting the relevance between documents.

[0070] FIG. 8 shows a sample screen of the user interface generated by the display data generation function 106.

[0071] As described above, relevance DB 113 accumulates various information about the document data stored in document DB 110. This information can be shown on the display screen of terminal 101 by using the display data generation function 106.

[0072] The display screen 801 shown in the figure is a sample of the user interface displayed on terminal 101 for specifying the conditions under which information is extracted from relevance information 114. The display screen 801 provides a search function and display functions. To use the search function, the user enters a term and presses the [Execute search] button, which performs a term search.

[0073] Other functions are also provided, such as "Display document relation diagram", "Display term relation diagram", and "Display reference destinations of a specific document". For example, selecting "Display document relation diagram" displays the document relation diagram, and selecting "Display uncorrected terms" displays a list of documents that contain uncorrected terms.

[0074] All of these functions can be implemented using the relevance information 114 accumulated in relevance DB 113, and they are useful for grasping the relevance between documents.

[0075] Specific ways of using the information in relevance DB 113 are described below using three representative functions: the "Search" function, the "Display document relation diagram" function, and the "Display uncorrected terms" function.

[0076] FIG. 9 illustrates the "Search" function provided by the display data generation function 106. When "Term C" is entered in the search term input area (see FIG. 8) of the relevance information display screen 901 displayed on terminal 101 and the search execution button is pressed, search result 902 is displayed. Search result 902 provides document names and links for detailed display. Selecting a document name link retrieves the document data stored in document DB 110, and selecting an appearance location link lists the appearance position information of the term from the term table of the corresponding document data.

[0077] The information searched in order to display document names in search result 902 is the related document information in relevance information 114. For example, if the related document information of document data A, document data B, and document data C is, respectively, related document information 903, related document information 904, and related document information 905, then entering "Term C" as the search term matches all three.

[0078] Likewise, entering "Term D" displays document data B as the search result, and entering "Term E" displays only document data C.
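A term search over related document information can be sketched as a simple lookup. The contents below are hypothetical: they are chosen only so that the three searches mentioned in the text give the stated results, and they are not a transcription of related document information 903 to 905.

```python
# Hypothetical related document information: document name -> terms it records.
RELATED_DOCUMENT_INFO = {
    "document data A": {"Term A", "Term C"},
    "document data B": {"Term C", "Term D"},
    "document data C": {"Term C", "Term E"},
}


def search(term, related_info):
    """Return the names of documents whose related document information contains the term."""
    return sorted(name for name, terms in related_info.items() if term in terms)


hits_c = search("Term C", RELATED_DOCUMENT_INFO)
hits_d = search("Term D", RELATED_DOCUMENT_INFO)
hits_e = search("Term E", RELATED_DOCUMENT_INFO)
print(hits_c, hits_d, hits_e)
```

In the screen described above each hit would additionally carry a link to the document data and a link to the term's appearance positions; those are omitted here.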

[0079] FIG. 10 illustrates the "Display document relation diagram" function provided by the display data generation function 106.

[0080] When "Display document relation diagram" (see FIG. 8) is selected on the relevance information display screen 1001 displayed on terminal 101 and the display execution button is pressed, document relation diagram 1002 is displayed. Document relation diagram 1002 depicts the reference and citation relationships between document data. Selecting a document in the diagram retrieves the document data stored in document DB 110, and the diagram lets the relevance between documents be grasped visually.

[0081] The information searched in order to display document relation diagram 1002 is the reference/citation table in relevance information 114. For example, if the reference/citation tables of document data A, B, C, D, E, and F are, respectively, reference/citation tables 1003, 1004, 1005, 1006, 1007, and 1008, the relevance between the documents is displayed as in document relation diagram 1002.

[0082] Specifically, the diagram shows that "document data A" contains a reference to "document data C", and that "document data B" contains references to "document data C" and to "document data G", which does not exist. Because "document data G" is nonexistent document data, it is drawn as an ellipse in document relation diagram 1002.

[0083] "Document data E" contains references to "document data C" and "document data D" and is in turn referenced by "document data D", so the diagram shows that these are mutually referencing document data. Document data that is not referenced by any document and references no other document is displayed on its own, like "document data F". Document data that contains no references to other documents but is referenced by them is displayed as a referenced document, like "document data C".
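The properties read off the document relation diagram (nonexistent targets, mutual references, stand-alone documents) can be derived from the reference/citation tables with a few set operations. The table contents below follow the relationships stated in paragraphs [0082] and [0083]; where the text is silent (for example, whether document data D references anything besides E), the values are assumptions, and `analyze` is a hypothetical helper.

```python
# Reference/citation tables: document -> documents it references.
REFERENCES = {
    "A": ["C"],
    "B": ["C", "G"],   # G is not registered in the document DB
    "C": [],
    "D": ["E"],
    "E": ["C", "D"],
    "F": [],
}


def analyze(references):
    """Return (missing targets, mutually referencing pairs, stand-alone documents)."""
    edges = {(src, dst) for src, targets in references.items() for dst in targets}
    missing = sorted({dst for _, dst in edges if dst not in references})
    mutual = sorted({tuple(sorted(edge)) for edge in edges
                     if (edge[1], edge[0]) in edges})
    linked = {node for edge in edges for node in edge}
    standalone = sorted(doc for doc in references if doc not in linked)
    return missing, mutual, standalone


missing, mutual, standalone = analyze(REFERENCES)
print(missing)      # ['G']
print(mutual)       # [('D', 'E')]
print(standalone)   # ['F']
```

A renderer could then draw the entries in `missing` with a different shape, as the diagram does with the ellipse for document data G.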

[0084] FIG. 11 illustrates the "Display uncorrected terms" function provided by the display data generation function 106.

[0085] When "Display uncorrected terms" (see FIG. 8) is selected on the relevance information display screen 1101 displayed on terminal 101 and the display execution button is pressed, the uncorrected term display screen 1102 is displayed. Screen 1102 is used to grasp the scope of impact when a particular term has been changed. For each term selected in the term list, it lists the term after the change, the candidate documents that need to be changed, and a link for displaying the appearance positions. Selecting a document shown among the candidate documents retrieves the document data stored in document DB 110, and selecting the appearance position link lists the locations at which the term appears in each document data.

[0086] The information searched in order to display the uncorrected term display screen 1102 is the term table and the related document information in relevance information 114. For example, if the relevance information of document data A and document data B is, respectively, relevance information 1103 and relevance information 1104, then selecting "Term C" in the term list displays screen 1102 as shown in the figure.

[0087] Specifically, the relevance information of every document data whose term table has the "change" attribute in the second column and "Term C" in the third column is picked up, and then the related document information corresponding to the term specified in the first row, first column is consulted.

[0088] In this case, the second column of the related document information stored in relevance information 1103 of document data A and relevance information 1104 of document data B is treated as the set of candidate documents that need to be changed, except that links in that column which do not carry the "change" attribute are excluded. As noted earlier, the "change" attribute in the second column of the related document information is removed when, for example, the referenced document is revised. The related document information in relevance information 1103 of document data A shows that "document data D" has been revised and that "Term C", which it used to contain, has been changed to "Term E". Naturally, because the revision of "document data D" changed "Term C" to "Term E", the link to "document data D" that existed in the second column of the related document information in relevance information 1104 of document data B has been deleted.
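The selection logic of paragraphs [0087] and [0088] can be sketched as a two-stage filter: first pick the documents whose term table records a change away from the selected term, then keep only the links that still carry the "change" attribute. The data and names below are invented for illustration and are not a transcription of relevance information 1103 or 1104 or of FIG. 11.

```python
# Hypothetical relevance information per document.
RELEVANCE = {
    "doc-1": {
        "term_table": [("Term E", "change", "Term C")],      # (term, attribute, previous)
        "related": {"Term E": [("doc-3", True), ("doc-4", False)]},  # (link, has "change")
    },
    "doc-2": {
        "term_table": [("Term F", "change", "Term C")],
        "related": {"Term F": [("doc-5", True)]},
    },
    "doc-6": {
        "term_table": [("Term C", None, None)],
        "related": {"Term C": [("doc-1", False)]},
    },
}


def uncorrected_candidates(relevance, old_term):
    """Return {term after change: candidate documents still flagged with 'change'}."""
    result = {}
    for info in relevance.values():
        for term, attribute, previous in info["term_table"]:
            if attribute == "change" and previous == old_term:
                flagged = [link for link, changed in info["related"].get(term, [])
                           if changed]
                result.setdefault(term, []).extend(flagged)
    return result


candidates = uncorrected_candidates(RELEVANCE, "Term C")
print(candidates)  # {'Term E': ['doc-3'], 'Term F': ['doc-5']}
```

Links whose flag has already been cleared (here `doc-4`) drop out of the list, which corresponds to referenced documents that have already been revised.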

[0089] As a result, the uncorrected term display screen 1102 shows "Term E" and "Term F" as the terms after the change, and "document data C" and "document data D", respectively, as the candidate documents that need to be changed.

[0090] As described above, according to this embodiment, information indicating the relevance between documents is extracted regardless of whether the relevance is explicit or implicit, the extracted information is stored separately from the document data and the like, and the extracted information is then used to obtain information that reflects the relevance between documents. Information that reflects the relevance between documents can therefore be obtained completely and with simple operations.

【００９１】[0091]

[Effects of the Invention] As described above, according to the present invention, relevance information between documents is accumulated regardless of whether it is explicit or implicit, which makes it possible to obtain information that reflects the relevance between documents easily and through various procedures.

[Brief description of drawings]

[FIG. 1] A diagram showing a document data relevance extraction device according to an embodiment of the present invention.

[FIG. 2] A flowchart showing the processing performed by the relevance information extraction function.

[FIG. 3] A schematic diagram showing how relevance information is accumulated at the time of new registration.

[FIG. 4] A schematic diagram showing how relevance information is modified at the time of revision.

[FIG. 5] A diagram illustrating the process of defining the contents of the system definition information using the system definition information definition function.

[FIG. 6] A diagram illustrating the process of identifying a referenced document using the information registered in the sentence pattern dictionary.

[FIG. 7] A diagram illustrating a method of extracting terms according to the term classification designation.

[FIG. 8] A diagram showing a sample screen of the user interface generated by the display data generation function.

[FIG. 9] A diagram illustrating the "Search" function provided by the display data generation function.

[FIG. 10] A diagram illustrating the "Display document relation diagram" function provided by the display data generation function.

[FIG. 11] A diagram illustrating the "Display uncorrected terms" function provided by the display data generation function.

[Explanation of symbols]

101 Terminal
102 Document data relevance extraction device
103 Document data management function
104 System definition information definition function
105 Relevance information extraction function
106 Display data generation function
107 System definition information
108 Sentence pattern dictionary
109 Term classification designation
110 Document DB
111 Document information
112 Document data
113 Relevance DB
114 Relevance information

Claims

[Claims]

1. A document data relevance extraction device comprising: a document data management function that manages registration of documents, reference to documents, and acquisition of difference information of documents based on instructions from a terminal; a document database that stores document information specifying a registered document and document data representing the contents of that document in association with each other; a system definition information definition function that defines system definition information, the system definition information defining relevance extraction conditions for extracting the relevance of the stored documents and changes to or deletions of those extraction conditions; a relevance information extraction function that extracts, based on the system definition information, information indicating the relevance between documents from the documents stored in the document database; and a display data generation function for displaying the extracted information indicating the relevance between documents.

2. The document data relevance extraction device according to claim 1, wherein the system definition information definition function modifies the system definition information, and the relevance information extraction function re-extracts the information indicating the relevance between documents based on the modification.

3. The document data relevance extraction device according to claim 1, wherein the display data generation function comprises a user interface for designating the display form of the relevance between documents displayed on the terminal, and displays the extracted information indicating the relevance between documents in accordance with the designation made through the interface.

4. A document data relevance extraction program comprising: a step of managing registration of documents, reference to documents, and difference information of documents based on instructions from a terminal; a step of storing, in a document database, document information specifying a registered document and document data representing the contents of that document in association with each other; a step of defining system definition information that defines relevance extraction conditions for extracting the relevance of the stored documents and changes to or deletions of those extraction conditions; a step of extracting, based on the system definition information, information indicating the relevance between documents from the documents stored in the document database; and a step of displaying the extracted information indicating the relevance between documents.

5. The document data relevance extraction program according to claim 4, wherein the step of defining the system definition information includes a step of modifying the system definition information, and the step of extracting the information indicating the relevance between documents includes a step of re-extracting that information based on the modification.