JP2007072723A - Document management device, document management method, program and recording medium

Info

Publication number: JP2007072723A
Application number: JP2005258519A
Authority: JP
Inventors: Satoshi Nakamura; 聡史中村
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 2005-09-06
Filing date: 2005-09-06
Publication date: 2007-03-22

Abstract

PROBLEM TO BE SOLVED: To provide a document management device systematizing and managing documents on the basis of quotation relation between the documents by use of an information source, and allowing retrieval of a related document by use of a degree of sharing of the information source.

SOLUTION: This document management device has: an information acquisition means 1 acquiring information from the information source; an information storage means 2 storing the information acquired from the information source; an information source acquisition means 3 acquiring identification information of the information source; an information source embedding means 4 adding and embedding the identification information of the information source when adding the information to the document; an information source takeout means 5 taking out the identification information embedded by the information source embedding means 4; and a systematization means 6 generating systematization information obtained by systematizing the document on the basis of the identification information of the information source. The document management device manages the document on the basis of the systematization information.

COPYRIGHT: (C)2007, JPO&INPIT

Description

The present invention relates to a document management apparatus and a document management method that manage documents based on identification information of their information sources and search for related documents using shared information sources.

In the field of document management and retrieval, a document retrieval system has been proposed that decomposes a multimedia document into its constituent elements, extracts a feature quantity for each element, stores the feature quantities in association with the document, and performs similar-document retrieval using not only text but also non-text information such as images and audio (see Patent Document 1).
In addition, a method has been proposed in which a document is decomposed into groups of text and images and a feature quantity is extracted for each group, so that a search can be performed using text, images, or both (see Patent Document 2).
Patent Document 1: JP 2000-148793 A
Patent Document 2: JP 2001-092852 A

Related document searches are generally performed with keywords. Keyword search is simple, but because of the diversity of expressions and the ambiguity of words, conventional techniques cannot produce the intended results unless an appropriate search key is found.
A document search system that combines keywords with images and the like has also been proposed, as in Patent Document 1. However, images are as ambiguous and diverse as words, so it is difficult to make a computer understand their meaning, and such a combination does not readily improve search accuracy.
In general, a single document is written on a single theme. Even when several documents quote different parts of one document, the quoted information shares a common underlying theme, so the quoting documents are likely to be written on similar themes. The relevance between documents can therefore be measured by the degree to which they share information sources.
The present invention has been made in view of this background. Its object is to provide a document management apparatus and a document management method that systematize and manage documents based on the quotation relationships between documents, using their information sources, and that enable related documents to be searched for using the degree to which information sources are shared.

To achieve the above object, the invention according to claim 1 is a document management apparatus comprising: information acquisition means for acquiring information from an information source; information storage means for storing the information acquired by the information acquisition means; information source acquisition means for acquiring identification information of the information source; information source embedding means for adding and embedding the identification information of the information source when the information is added to a document; information source extraction means for extracting the identification information embedded by the information source embedding means; and systematization means for generating systematization information in which documents are systematized based on the identification information of the information source extracted by the information source extraction means.
The invention according to claim 2 is the document management apparatus according to claim 1, wherein the identification information acquired by the information source acquisition means and the identification information embedded by the information source embedding means include at least one of the name of the information source, the type of the information source, and the location of the information source.
The invention according to claim 3 is the document management apparatus according to claim 1, wherein, when the information acquired from the information source has an original information source, the information acquisition means acquires the identification information of the original information source in addition to that of the information source, and the information source embedding means embeds the identification information of the original information source in the document in addition to that of the information source.
The invention according to claim 4 is the document management apparatus according to claim 1, wherein the systematization means creates the systematization information based on the identification information of the information source and the dependency relationship between the information source and the document.
The invention according to claim 5 is the document management apparatus according to claim 1, further comprising document system diagram creation means for creating a system diagram based on the systematization information and displaying, on the created system diagram, links to the corresponding documents.
The invention according to claim 6 is the document management apparatus according to claim 1, further comprising similar document search means for calculating the similarity between documents based on the identification information of the information source of each document and searching for documents based on the similarity.

The invention according to claim 7 is the document management apparatus according to claim 1, further comprising document display means that, when a document being displayed contains information added from an information source, displays that information color-coded according to the type of the information source.
The invention according to claim 8 is a document management method comprising: a first step of acquiring information from an information source; a second step of storing the information acquired from the information source; a third step of acquiring identification information of the information source; a fourth step of adding and embedding the identification information of the information source when the information is added to a document; a fifth step of extracting the embedded identification information; and a sixth step of generating systematization information in which documents are systematized based on the identification information of the information source.
The invention according to claim 9 is a program that causes a computer to execute the document management method according to claim 8.
The invention according to claim 10 is a computer-readable recording medium storing the program according to claim 9.

According to the invention of claims 1 and 8, documents can be managed based on the systematization information, so the user can organize and manage documents based on information about their information sources.
According to the invention of claim 2, the identification information acquired by the information source acquisition means and embedded by the information source embedding means includes at least one of the name, the type, and the location of the information source, so the user can trace the source of the information used in a document.
According to the invention of claim 3, when the information used in a document has an original information source other than its direct information source, the user can trace that original information source.
According to the invention of claim 4, the user can manage documents systematically while preserving the dependency relationships between information sources and documents.
According to the invention of claim 5, the user can manage documents with the dependency relationships between them shown in a system diagram, and can easily grasp those relationships.
According to the invention of claim 6, the user can search for similar documents based on the information sources of a document rather than its content.
According to the invention of claim 7, the user can easily grasp the type of the source of the information contained in a document and can tell what means is needed to trace that source.

Embodiments of the present invention will now be described in detail with reference to the drawings.
FIG. 1 is a configuration diagram of a document management apparatus according to an embodiment of the present invention. The document management apparatus includes: information acquisition means 1 that acquires information from an information source; information storage means 2 that stores the information acquired from the information source; information source acquisition means 3 that acquires identification information of the information source; information source embedding means 4 that adds and embeds the identification information of the information source when the information is added to a document; information source extraction means 5 that extracts the identification information embedded by the information source embedding means 4; systematization means 6 that generates systematization information in which documents are systematized based on the identification information of the information source; document system diagram creation means 7 that creates a system diagram based on the systematization information and displays, on the system diagram, links to the corresponding documents; similar document search means 8 that calculates the similarity between documents based on the identification information of the information source of each document and searches for documents based on the similarity; and document display means 9 that, when a document being displayed contains information added from an information source, displays that information color-coded according to the type of the information source.

[First Embodiment]
This embodiment describes a case in which a user who is creating an electronic document A as shown in FIG. 2, on a personal computer in which the document management apparatus shown in FIG. 1 is implemented, quotes information from an already created electronic document X as shown in FIG. 3. The only information source of electronic document A is electronic document X, and electronic document X has no information source. The device in which the document management apparatus shown in FIG. 1 is implemented is not limited to a personal computer; it may be a portable terminal, an MFP, or the like.
FIG. 4 is a flowchart showing, of the processing in the document management apparatus of this embodiment, the processing up to acquisition of the identification information of the original information source. FIG. 5 is a flowchart showing the processing after that identification information has been acquired.
First, the user selects, in electronic document X, the image 12 shown in FIG. 3 that the user wants to quote (S1) and instructs the personal computer to copy it to the clipboard (S2). The personal computer then places the designated image 12 of electronic document X on the clipboard (S3) and also acquires the identification information of electronic document X, which is the information source of the image (S4).
The identification information includes the name and ID of the information source, its type (electronic document, Web page, paper document, etc.), its location (file path, URL, storage location, etc.), and the like. If the designated information carries identification information of an original information source (Yes in S5), that identification information is also acquired (S6). The original information source is the information source in which the information first appeared when the information has been quoted repeatedly, that is, the original location of the information.
Next, the user designates a position 11 (FIG. 2) in electronic document A, the destination of the quotation (S11), and instructs the personal computer to paste the information held on the clipboard (S12). The personal computer checks the clipboard and, if information is present (Yes in S13), pastes it (S14). If identification information of the information source is present along with the information (Yes in S15), it is pasted together with the information (S16).

In the first embodiment, the image copied from electronic document X and held on the clipboard is pasted at the designated position 11 of electronic document A, and at the same time the identification information of electronic document X, the information source acquired together with the image, is attached to the pasted image. Furthermore, if identification information of the original information source of the information is present (Yes in S17), it is pasted as well (S18).
In this way, when information such as an image is obtained from an information source such as an electronic document, the identification information of the information source is acquired with it, and when the information is added to a document, the identification information of the information source is added with it. If the information also has an original information source, that identification information is added as well. In other words, the quoted information carries both the identification information of the document that is its parent and the identification information of the document that is its origin.
An image is used here as the quoted information, but the same handling applies to characters, text, figures and tables, and also to multimedia content such as video.
If the information is quoted from an electronic document, the quoted information and the identification information of its source can be copied to the clipboard. For information sources that cannot be copied directly to the clipboard, such as paper documents, it is desirable that the user be able to enter the identification information of the information source manually after the information has been captured as an image with a scanner or typed in directly.
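The copy and paste steps S1 to S18 can be sketched roughly as follows. This is a minimal illustration, not the implementation described in the patent: the class names `SourceId`, `Item`, and `Document` and the functions `copy_item` and `paste_item` are hypothetical, and the clipboard is modeled simply as the value returned by `copy_item`.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class SourceId:
    """Identification information of an information source (hypothetical layout)."""
    name: str
    kind: str      # e.g. "electronic document", "Web page", "paper document"
    location: str  # e.g. file path, URL, storage location


@dataclass
class Item:
    """A piece of information in a document, with optional provenance."""
    content: str
    source: Optional[SourceId] = None  # direct information source
    origin: Optional[SourceId] = None  # original information source


@dataclass
class Document:
    ident: SourceId
    items: List[Item] = field(default_factory=list)


def copy_item(doc: Document, index: int) -> Item:
    """S3 to S6: take the item, the ID of its source, and its origin if any."""
    item = doc.items[index]
    return Item(item.content, source=doc.ident, origin=item.origin)


def paste_item(doc: Document, position: int, clip: Optional[Item]) -> bool:
    """S13 to S18: paste the clipboard content together with its provenance."""
    if clip is None:                 # S13: nothing on the clipboard
        return False
    pasted = Item(clip.content)      # S14: paste the information itself
    if clip.source is not None:      # S15
        pasted.source = clip.source  # S16: embed the source ID
    if clip.origin is not None:      # S17
        pasted.origin = clip.origin  # S18: embed the origin ID
    doc.items.insert(position, pasted)
    return True
```

In this sketch an item that has no recorded origin is pasted with only its direct source, which mirrors the conditional branches S5 and S17.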

The identification information of the information source may be added by describing it in a structured document at a predetermined location in the quoting document, as in FIG. 6(a); by attaching it directly to the quoted information in a tag-like form, as in FIG. 6(b); or by some other method.
FIGS. 6(a) and 6(b) both indicate that “pict1.jpeg” is quoted from the electronic file “X.doc” located in “C:\Documents\”, and that the original information source of “pict1.jpeg” is “xyz.html” located at “http://www.abc.com.pq/”.
In FIG. 6(a), the name, type, and location are described as the identification information of the information source, whereas in FIG. 6(b) the location and name are described together and the type is not stated. Various combinations are possible for describing the identification information, and the description is not limited to these.
Assume that the user has finished creating electronic document A and stored it in an appropriate location in the storage device of the personal computer. Because electronic document A records that an image was quoted from electronic document X, the document management apparatus of this embodiment automatically associates electronic document A with electronic document X and creates systematization information in which electronic document A is registered as a child document of electronic document X in the lineage to which electronic document X belongs.
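As an illustration of the structured description in the style of FIG. 6(a), the sketch below builds and reads back a small XML record with Python's standard `xml.etree.ElementTree` module. The element names (`quotation`, `source`, `origin`, `name`, `type`, `location`) and the type strings are assumptions made for this example; the figure's actual markup is not reproduced in the text.

```python
import xml.etree.ElementTree as ET


def source_record(item_name, source, origin=None):
    """Build a structured record for one quoted item.

    `source` and `origin` are dicts that may hold "name", "type", "location".
    """
    root = ET.Element("quotation", {"item": item_name})
    for tag, info in (("source", source), ("origin", origin)):
        if info is None:
            continue  # e.g. no original information source is known
        node = ET.SubElement(root, tag)
        for key in ("name", "type", "location"):
            if key in info:
                ET.SubElement(node, key).text = info[key]
    return root


def read_record(root):
    """Recover the item name and the source/origin dicts from a record."""
    out = {"item": root.get("item")}
    for tag in ("source", "origin"):
        node = root.find(tag)
        out[tag] = None if node is None else {c.tag: c.text for c in node}
    return out


record = source_record(
    "pict1.jpeg",
    {"name": "X.doc", "type": "electronic document", "location": "C:\\Documents\\"},
    {"name": "xyz.html", "type": "Web page", "location": "http://www.abc.com.pq/"},
)
xml_text = ET.tostring(record, encoding="unicode")
```

Because each field is optional in `source_record`, the same helper can also express the variant in which the type is omitted, as in FIG. 6(b).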

If electronic document X is at the top of the lineage, electronic document A, being a direct child document of electronic document X, is placed at the position following electronic document X (FIG. 7(a)). If electronic document X is located in the middle of the lineage, electronic document A is likewise placed at the position following electronic document X (FIG. 7(b)).
Suppose that there is also an electronic document B, a child document of electronic document X that quotes part of the text of electronic document X. When a related document search is performed with electronic document A as the query, electronic document A quotes only from electronic document X in this example, so the documents judged to be related are electronic document X, the information source of electronic document A, and electronic document B, a child document of electronic document X.
The evaluation value of the relevance between electronic document A and its information source, electronic document X, may be made proportional to the amount of quoted information, may use only the fact of whether a quotation exists, or may be determined by some other method. The related document search of this embodiment may be used alone or in combination with keyword search, image search, and the like.
By using the document management apparatus of this embodiment, the user can thus search for related documents without having to think about search keys such as keywords or images. Combining the document management apparatus of this embodiment with conventional keyword search or image search enables a more multifaceted search, and improved search performance can be expected.
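The related document search described above can be sketched as a lookup over parent and child relationships. The sketch assumes that the embedded identification information has already been extracted into a mapping from each document to the set of its direct information sources; the function names `build_lineage` and `related_documents` are hypothetical.

```python
from collections import defaultdict


def build_lineage(sources_of):
    """Invert a {document: set of sources} mapping into {source: set of children}."""
    children = defaultdict(set)
    for doc, sources in sources_of.items():
        for src in sources:
            children[src].add(doc)
    return dict(children)


def related_documents(doc, sources_of):
    """Return the information sources of `doc` and the other children of those sources."""
    children = build_lineage(sources_of)
    related = set()
    for src in sources_of.get(doc, ()):
        related.add(src)                      # the information source itself
        related |= children.get(src, set())   # documents quoting the same source
    related.discard(doc)
    return related


# The situation of this embodiment: A and B both quote from X.
sources_of = {"A": {"X"}, "B": {"X"}, "X": set()}
result = related_documents("A", sources_of)
```

Here `result` contains electronic documents X and B, matching the documents judged to be related in the text. The sketch only reports membership; it does not compute an evaluation value.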

[Second Embodiment]
In the first embodiment, electronic document A had only electronic document X as its information source. The second embodiment describes a case in which a document has multiple information sources.
Information is acquired from an information source and added to a document as in the first embodiment. In the second embodiment, however, electronic document A quotes an image from electronic document X and part of the text of electronic document Y, and electronic document B quotes part of the text of electronic document X and part of the text of electronic document Z. None of electronic documents X, Y, and Z has an information source.
Because the identification information of the information sources is described in electronic document A and in electronic document B, the document management apparatus of this embodiment automatically associates each of them with its information sources. Electronic documents A and B each have two electronic documents as information sources, so in the systematization information electronic document A belongs to the lineages of both electronic document X and electronic document Y, and electronic document B belongs to the lineages of both electronic document X and electronic document Z. The same holds when there are three or more information sources: a document belongs to as many lineages as it has information sources.
Suppose that a related document search is performed with electronic document A as the query. Electronic document A is a child document of electronic documents X and Y, so three documents are selected as related documents: these two, and electronic document B, which shares electronic document X as an information source.
Electronic documents A and B are both child documents of electronic document X in the lineage, but the only information source they share is electronic document X. If there were another electronic document C whose information sources are electronic documents X and Y, it would be desirable for the relevance evaluation value between electronic documents A and B, which share one information source, to be smaller than that between electronic documents A and C, which share two, provided the amount of quoted information is the same.
Electronic documents A and C both quote part of the text of electronic document Y, and it is desirable that the evaluation value be higher when they quote the same passage than when they quote different passages.
That is, documents that quote the same information from the same information source are more likely to discuss the same content than documents that quote different information from that source, so the evaluation value should be correspondingly higher. It is also desirable that the evaluation value be higher the more information sources are shared.
By appropriately varying the relevance evaluation value according to the number of shared information sources and the shared information in this way, the target document can be obtained more reliably and easily.
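One way to satisfy the two stated preferences (more shared sources score higher; identical quoted passages score higher than different passages from the same source) is a simple additive score. The weighting below is an assumption chosen for illustration, not a formula given in the patent, and the fragment labels such as "para2" are hypothetical identifiers for quoted passages.

```python
def relatedness(quotes_a, quotes_b, same_fragment_bonus=1.0):
    """Score two documents from {source: set of quoted fragment ids} mappings."""
    score = 0.0
    for src in quotes_a.keys() & quotes_b.keys():
        score += 1.0                             # one shared information source
        shared = quotes_a[src] & quotes_b[src]   # identical quoted fragments
        score += same_fragment_bonus * len(shared)
    return score


doc_a = {"X": {"image1"}, "Y": {"para2"}}
doc_b = {"X": {"para5"}, "Z": {"para1"}}
doc_c_same = {"X": {"para7"}, "Y": {"para2"}}   # quotes the same passage of Y as A
doc_c_diff = {"X": {"para7"}, "Y": {"para9"}}   # quotes a different passage of Y

score_ab = relatedness(doc_a, doc_b)
score_ac_diff = relatedness(doc_a, doc_c_diff)
score_ac_same = relatedness(doc_a, doc_c_same)
```

With these inputs the scores are ordered `score_ab < score_ac_diff < score_ac_same`. A fuller version could also weight each source by the amount of quoted information, which the first embodiment mentions as one option.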

［第３の実施形態］
第３の実施形態は情報源の識別情報を元に系統図を作成する場合について説明する。
電子文書Ａと電子文書Ｂは電子文書Ｘを情報源とし、電子文書Ｘは電子文書Ｙを情報源とし、電子文書Ｚは電子文書Ｙを情報源としているものとする。
これらの電子文書を系統情報に表すと、電子文書Ｙの系統に電子文書Ｘと電子文書Ｚが属し（図８（ａ））、電子文書Ｘの系統に電子文書Ａと電子文書Ｂが属する（図８（ｂ））という形に表すことができる。また、電子文書Ｘを介して二つの系統を結合し、電子文書Ｙが系統の最上位である第一階層に位置し、その子文書として電子文書Ｘと電子文書Ｚが第二階層で電子文書Ｙにぶら下がり、さらに電子文書Ｘの子文書である電子文書Ａと電子文書Ｂが第三階層で電子文書Ｘにぶら下がるという形に表すこともできる（図８（ｃ））。
文書の評価値ではなく相関関係を元に関連文書を探すには、文書の相関が視覚的に容易に認識できる形で表現されていることが望ましい。これを実現するために先に求めた系統情報を系統図という形で利用者に表示し、更に系統図の各ノードに該当する文書へのリンクを貼っておく。
これによって利用者は文書の相関関係を利用者自身で判断して関連文書を探すことが可能となる。仮に利用者が電子文書Ａの関連文書を探していた時に、系統図が図８（ｃ）のようであったとすれば、電子文書Ｂや電子文書Ｘ、更には電子文書Ｙや電子文書Ｚが検索なしで関連性を見出すことができる。
視認性を考慮して、階層構造を利用者が任意に展開または省略できることが望ましい。階層構造が省略されている場合は、それを容易に把握できることが望ましい。また、第１の実施形態や第２の実施形態で求めた文書間の関連度の評価値を用いて、例えば評価値が高いと文書間の距離が短く、逆に評価値が低いと距離が長くなるように配置してもよい。 [Third Embodiment]
In the third embodiment, a case where a system diagram is created based on identification information of an information source will be described.
Assume that the electronic document A and the electronic document B use the electronic document X as an information source, the electronic document X uses the electronic document Y as an information source, and the electronic document Z uses the electronic document Y as an information source.
When these electronic documents are represented in the systematic information, the electronic document X and the electronic document Z belong to the system of the electronic document Y (FIG. 8A), and the electronic document A and the electronic document B belong to the system of the electronic document X ( It can be expressed in the form of FIG. In addition, the two systems are connected via the electronic document X, the electronic document Y is located in the first hierarchy, which is the highest level of the system, and the electronic document X and the electronic document Z as the child documents are the electronic document Y in the second hierarchy. Furthermore, the electronic document A and the electronic document B, which are child documents of the electronic document X, can also be represented as hanging on the electronic document X in the third hierarchy (FIG. 8C).
In order to search for a related document based on the correlation rather than the document evaluation value, it is desirable that the correlation of the document is expressed in a form that can be easily recognized visually. In order to realize this, the system information obtained previously is displayed to the user in the form of a system diagram, and a link to a document corresponding to each node of the system diagram is pasted.
As a result, the user can judge the correlations between documents for himself or herself and look for related documents. For example, if the user is looking for documents related to electronic document A and the system diagram is as shown in FIG. 8C, the user can see that electronic documents B and X, and further electronic documents Y and Z, are related without running a search.
For the sake of visibility, it is desirable that the user can expand or collapse the hierarchical structure at will. When part of the hierarchy is collapsed, it is desirable that the user can easily tell that it is collapsed. In addition, the documents may be laid out using the evaluation value of the degree of association between documents obtained in the first and second embodiments, for example so that the distance between documents is short when the evaluation value is high and long when it is low.
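One way to realize such a layout is a simple linear mapping from evaluation value to drawing distance. The sketch below is an assumption-laden illustration: the text does not specify the range of the evaluation value, so it is assumed here to be normalized to [0, 1], and `edge_length`, `min_len`, and `max_len` are hypothetical names and values.

```python
def edge_length(score, min_len=40.0, max_len=200.0):
    """Map a relatedness score to a drawing distance.

    Assumes the score is normalized to [0, 1]; out-of-range values are clamped.
    A higher score yields a shorter distance, a lower score a longer one.
    """
    clamped = max(0.0, min(1.0, score))
    return max_len - clamped * (max_len - min_len)
```

Any monotonically decreasing mapping would satisfy the description; the linear form is chosen only for simplicity.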

[Fourth Embodiment]
In the first to third embodiments, only electronic documents were used as information sources. In practice, however, as mentioned in the first embodiment, information sources are not limited to electronic documents; Web pages, paper documents, images, video data, and the like can also be used. Because the types of information sources are so varied, it is desirable that the user can easily grasp the type of an information source when deciding to trace it.
For example, if the information source is electronic information such as an electronic document or a Web page, it can be accessed from a personal computer, but if the information source is a paper document, the paper document must be obtained physically. It is also possible that the information source does not exist at all.
In the present embodiment, information can be color-coded according to the type of its information source. Specifically, the information may be displayed in red if the information source is an electronic document, in blue if it is a Web page, in yellow if it is a paper document, and in green if it is multimedia data such as an image or video. Of course, the types and classifications of information sources are not limited to those listed here.
The colors used for color coding are likewise not limited to those listed here. As for the method of color coding, each piece of information may be filled with the corresponding color, the corresponding color may be displayed as the background of each piece of information, an underline or frame in the corresponding color may be added to each piece of information, or another method may be used.
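The color coding can be illustrated with a small lookup table. The following is a minimal sketch under stated assumptions: rendering to an HTML `span` is an assumption made for illustration, the type keys, `FALLBACK_COLOR`, and `mark_cited` are hypothetical names, and only the background, underline, and frame variants mentioned above are shown.

```python
from html import escape

# Colors taken from the example in the text; the string keys are hypothetical.
SOURCE_COLORS = {
    "electronic_document": "red",
    "web_page": "blue",
    "paper_document": "yellow",
    "multimedia": "green",
}
# Assumption: the text does not say how an unknown source type is shown.
FALLBACK_COLOR = "gray"

# CSS fragments for three of the color-coding methods mentioned in the text.
STYLE_TEMPLATES = {
    "background": "background-color:{color}",
    "underline": "text-decoration:underline;text-decoration-color:{color}",
    "frame": "border:1px solid {color}",
}

def mark_cited(text, source_type, style="background"):
    """Wrap cited text in a span colored according to its information source type."""
    color = SOURCE_COLORS.get(source_type, FALLBACK_COLOR)
    css = STYLE_TEMPLATES[style].format(color=color)
    return f'<span style="{css}">{escape(text)}</span>'
```

Keeping the type-to-color table separate from the styling templates reflects the statement that neither the colors nor the method of color coding is fixed.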
When a link to the information source is attached to the cited information and the user instructs the personal computer to follow the link, for example by clicking it, it is desirable that the personal computer that received the instruction display the information source using an appropriate program.
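The choice of an appropriate program can be sketched as a dispatch on the information source type. This is a hypothetical illustration only: `SourceInfo`, `plan_access`, the type strings, and the program names are assumptions, and the function merely returns a description of the action instead of launching anything.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class SourceInfo:
    """Hypothetical record of an information source: name, type, and location."""
    name: str
    kind: str  # e.g. "electronic_document", "web_page", "paper_document", "multimedia"
    location: Optional[str] = None

# Assumed mapping from source type to the kind of program used to display it.
PROGRAMS = {
    "electronic_document": "document viewer",
    "web_page": "web browser",
    "multimedia": "media player",
}

def plan_access(source: Optional[SourceInfo]) -> str:
    """Describe how a followed link would be handled for the given source."""
    if source is None:
        return "no information source recorded"
    if source.kind == "paper_document":
        # Paper documents cannot be opened on screen; they must be fetched physically.
        where = source.location or "unknown location"
        return f"retrieve paper document '{source.name}' from {where}"
    if source.location is None:
        return f"location of '{source.name}' is unknown"
    program = PROGRAMS.get(source.kind, "default application")
    return f"open {source.location} with {program}"
```

For example, `plan_access(SourceInfo("spec", "web_page", "http://example.com/spec"))` describes opening the page in a web browser, while a paper document yields an instruction to fetch it physically.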
As described above, the user can easily determine by sight the type of the information source of the cited information, and can thereby decide how to access that information source.

Needless to say, the object of the present invention is also achieved by implementing each function of the above-described embodiments as a program and executing that program. In this case, the program read from the recording medium and executed realizes the functions of the above-described embodiments, and the program and the recording medium on which the program is recorded also constitute the present invention.
The program that realizes these functions may be provided on any form of recording medium, such as a semiconductor medium (for example, ROM or nonvolatile memory), an optical medium (for example, DVD, MO, MD, or CD), or a magnetic medium (for example, magnetic tape or flexible disk). Alternatively, the program stored in a storage device may be supplied directly from a server computer via a communication network. In this case, the storage device of the server computer is also included in the recording medium of the present invention.

A block diagram of the document management apparatus according to the present embodiment.
A diagram showing an information citation position in an electronic document.
A diagram showing an image of the electronic document cited at the citation position of FIG. 2.
A flowchart showing, among the processes in the document management apparatus of the present embodiment, the process up to acquisition of the identification information of the original information source.
A flowchart showing, among the processes in the document management apparatus of the present embodiment, the process after the identification information of the original information source has been acquired.
A diagram showing an electronic document to which identification information has been added.
A diagram (part 1) showing the relationship between a parent document and child documents.
A diagram (part 2) showing the relationship between a parent document and child documents.

Explanation of symbols

1 Information acquisition means
2 Information storage means
3 Information source acquisition means
4 Information source embedding means
5 Information source extraction means
6 Systematization means
7 Document system diagram creation means
8 Similar document search means
9 Document display means

Claims

1. A document management apparatus comprising: information acquisition means for acquiring information from an information source; information storage means for storing the information acquired by the information acquisition means; information source acquisition means for acquiring identification information of the information source; information source embedding means for adding and embedding the identification information of the information source when the information is added to a document; information source extraction means for extracting the identification information embedded by the information source embedding means; and systematization means for generating systematization information that systematizes documents based on the identification information of the information source extracted by the information source extraction means.

2. The document management apparatus according to claim 1, wherein the identification information of the information source acquired by the information source acquisition means and the identification information embedded by the information source embedding means include at least one of an information source name, an information source type, and an information source location.

3. The document management apparatus according to claim 1, wherein, when the information acquired from the information source itself has an original information source, the information acquisition means acquires identification information of the original information source in addition to the identification information of the information source, and the information source embedding means embeds the identification information of the original information source in the document in addition to the identification information of the information source.

4. The document management apparatus according to claim 1, wherein the systematization means creates the systematization information based on the identification information of the information source and a dependency relationship between the information source and the document.

5. The document management apparatus according to claim 1, further comprising document system diagram creation means for creating a system diagram based on the systematization information and displaying, on the created system diagram, a link to the corresponding document.

6. The document management apparatus according to claim 1, further comprising similar document search means for calculating a similarity between documents based on the identification information of the information source of each document and searching for a document based on the similarity.

7. The document management apparatus according to claim 1, further comprising document display means for, when a displayed document contains information added from an information source, displaying the information in a color that differs according to the type of the information source.

8. A document management method comprising: a first step of acquiring information from an information source; a second step of storing the information acquired from the information source; a third step of acquiring identification information of the information source; a fourth step of adding and embedding the identification information of the information source when the information is added to a document; a fifth step of extracting the embedded identification information; and a sixth step of generating systematization information that systematizes documents based on the identification information of the information source.

9. A program for causing a computer to execute the document management method according to claim 8.

10. A computer-readable recording medium storing the program according to claim 9.