JP2010044556A - Document tracking system

Info

Publication number: JP2010044556A
Application number: JP2008207780A
Authority: JP
Inventors: Yusuke Ota
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 2008-08-12
Filing date: 2008-08-12
Publication date: 2010-02-25

Abstract

PROBLEM TO BE SOLVED: To provide a document tracking system that can track a document from which no identification information can be obtained and a document to which unregistered identification information has been assigned.

SOLUTION: When a tracking target document is designated at a tracking scanner 20 or a tracking PC 21, a content analysis server 2 extracts from a file server 1 a document whose content is similar to that of the tracking target document, and a tracking server 22 traces the input/output history of the extracted document and presents the tracking result on the tracking scanner 20 or the tracking PC 21.

COPYRIGHT: (C)2010, JPO&INPIT

Description

The present invention relates to a document tracking system, and more particularly to a document tracking system that tracks the route along which a document is duplicated or copied.

In recent years, with the development of IT systems, leaks of important documents have come to occur over a wide range, and information leakage has become a major problem. As countermeasures, user authentication and access control have been proposed for electronic documents. For paper documents, proposals include psychological deterrence by adding a “confidential” stamp and identification of the source by adding print information.

However, security measures for paper documents in particular remain insufficient. To accurately grasp the distribution route of a paper document, such as which printer printed it, which multifunction peripheral copied it, and which scanner scanned it, one technique records, as a tracking log, an identifier that uniquely identifies a sheet of paper in association with the sheet and the operation information for that sheet. When a user performs a “tracking scan” of the target sheet on a digital multifunction peripheral, the multifunction peripheral detects the identifier from the sheet, analyzes the history and distribution destinations of the sheet based on the tracking log, and shows the result to the user (see, for example, Patent Document 1).

In another technique, image data read from a paper document by a scanner is passed through a one-way function (hash function), and the calculated hash value is used as the identifier of that image data so that the distribution route of the electronic file can be traced later. This makes electronic documents obtained by scanning paper documents trackable as well, widening the range over which a document's distribution route can be tracked (see, for example, Patent Document 2).

Cited patent documents: JP 2008-42666 A; Japanese Patent Application No. 2008-039612

However, because the conventional techniques described above track the target document based on identification information that identifies the recording medium on which the document is recorded (for example, the sheet of paper carrying the document to be scanned), they cannot track a document from which no identification information can be obtained, or a document to which unregistered identification information has been assigned.

The present invention has been made to solve this conventional problem, and its object is to provide a document tracking system capable of tracking a document from which no identification information can be obtained and a document to which unregistered identification information has been assigned.

The document tracking system of the present invention is a document tracking system that tracks input/output operations on documents, comprising: a file server that stores the documents; a feature amount table storage device that stores in advance a feature amount table in which feature amounts obtained from the content of each document stored in the file server are associated with registered document identification information for identifying that document; at least one terminal device that acquires distribution document identification information for identifying a document each time an input/output operation is performed on the document; an operation table storage device that stores an operation table in which operation information representing the input/output operation performed by the terminal device is associated with the distribution document identification information; a tracking target document designation device at which a document to be tracked is designated; a document identification information detection device that obtains feature amounts from the content of the document designated at the tracking target document designation device and compares them with the feature amounts in the feature amount table, thereby detecting the registered document identification information of documents whose content is similar to the tracking target document; and a tracking processing device that extracts from the operation table the operation information representing input/output operations on the document identified by the detected registered document identification information and, based on the extracted operation information, performs a tracking process that traces the input/output history of that document.

With this configuration, the document tracking system of the present invention extracts from the file server documents whose content is similar to the tracking target document and traces the input/output history of the extracted documents. It can therefore track a document from which no identification information can be obtained and a document to which unregistered identification information has been assigned.
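The lookup described above can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the dictionary layout, the field names (`target_id`, `operation`, `user`), the overlap-ratio similarity measure and the `threshold` parameter are all assumptions made for the example, since the text does not prescribe a similarity measure.

```python
def similarity(target_pieces, registered_pieces):
    """Fraction of the target's pieces that also occur in a registered document.

    Assumed measure for illustration only.
    """
    if not target_pieces:
        return 0.0
    return len(target_pieces & registered_pieces) / len(target_pieces)


def track_by_content(target_pieces, feature_table, operation_table, threshold=0.5):
    """Find registered documents similar to the target, then collect their operations.

    feature_table:   {registered_document_id: set_of_feature_pieces}
    operation_table: list of records, each naming the document it operated on
    """
    matched = [
        doc_id
        for doc_id, pieces in feature_table.items()
        if similarity(target_pieces, pieces) >= threshold
    ]
    return {
        doc_id: [op for op in operation_table if op["target_id"] == doc_id]
        for doc_id in matched
    }


# Hypothetical data: two registered files and one logged operation for each.
feature_table = {
    "/monitor/plan.txt": {"alpha", "beta", "gamma"},
    "/monitor/memo.txt": {"delta", "epsilon"},
}
operation_table = [
    {"target_id": "/monitor/plan.txt", "operation": "copy", "user": "user_a"},
    {"target_id": "/monitor/memo.txt", "operation": "print", "user": "user_b"},
]

# The target carries no usable identifier; only its content is compared.
result = track_by_content({"alpha", "beta"}, feature_table, operation_table)
```

The point of the sketch is that the target document never needs an identifier of its own: its content selects a registered document, and that document's identifier selects the logged operations.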

The tracking target document designation device may acquire the distribution document identification information from the tracking target document, and the tracking processing device may extract from the operation table the operation information representing input/output operations on the document identified by that distribution document identification information and, based on the extracted operation information, perform a tracking process that traces the input/output history starting from a document stored in the file server. In that case, the document identification information detection device may detect the registered document identification information of documents whose content is similar to the tracking target document only when the tracking process yields no result.

With this configuration, for a document from which trackable identification information has been obtained, the document tracking system of the present invention can perform tracking based on that identification information.
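The two-stage behaviour (identifier first, content similarity only when the identifier yields nothing) could look like the sketch below. The record fields `distribution_id` and `target_id`, the overlap-ratio measure and the `threshold` value are hypothetical choices for illustration.

```python
def track_with_fallback(distribution_id, target_pieces, operation_table,
                        feature_table, threshold=0.5):
    """Try identifier-based tracking first; fall back to content similarity.

    Returns a (method, records) pair so the caller can tell which path was used.
    """
    # Stage 1: look up operations by the identifier read from the target document.
    if distribution_id is not None:
        by_id = [op for op in operation_table
                 if op["distribution_id"] == distribution_id]
        if by_id:
            return "identifier", by_id

    # Stage 2: reached only when stage 1 produced no result.
    by_content = []
    for doc_id, pieces in feature_table.items():
        overlap = (len(target_pieces & pieces) / len(target_pieces)
                   if target_pieces else 0.0)
        if overlap >= threshold:
            by_content.extend(op for op in operation_table
                              if op["target_id"] == doc_id)
    return "content", by_content


# Hypothetical tables for the example.
operation_table = [
    {"distribution_id": "paper-001", "target_id": "/monitor/plan.txt",
     "operation": "print"},
]
feature_table = {"/monitor/plan.txt": {"alpha", "beta", "gamma"}}

known = track_with_fallback("paper-001", {"alpha"}, operation_table, feature_table)
unknown = track_with_fallback("paper-999", {"alpha", "beta"},
                              operation_table, feature_table)
```

Here `known` is resolved by its identifier alone, while `unknown` carries an unregistered identifier and is resolved through its content.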

The document tracking system of the present invention may also include a document monitoring device that monitors documents stored in a specific location on the file server, and the document monitoring device may cause the feature amount table storage device to update the feature amount table when it detects a change in the state of such a document.

With this configuration, the document tracking system of the present invention can update the feature amount table when the state of a document stored in the file server changes.

In another aspect, the document tracking system of the present invention is a document tracking system that tracks input/output operations on documents, comprising: at least one terminal device that acquires distribution document identification information for identifying a document each time an input/output operation is performed on the document; a feature amount table storage device that stores a feature amount table in which feature amounts obtained from the content of the document are associated with the distribution document identification information; an operation table storage device that stores an operation table in which operation information representing the input/output operation performed by the terminal device is associated with the distribution document identification information; a tracking target document designation device at which a document to be tracked is designated; a document identification information detection device that obtains feature amounts from the content of the document designated at the tracking target document designation device and compares them with the feature amounts in the feature amount table, thereby detecting the distribution document identification information of documents whose content is similar to the tracking target document; and a tracking processing device that extracts from the operation table the operation information representing input/output operations on the document identified by the detected distribution document identification information and, based on the extracted operation information, performs a tracking process that traces the input/output history of that document.

With this configuration, the document tracking system of the present invention extracts documents whose content is similar to the tracking target document from among documents to which other identification information has been assigned, and traces the input/output history of the extracted documents. It can therefore track a document from which no identification information can be obtained and a document to which unregistered identification information has been assigned.

The tracking target document designation device may also acquire the distribution document identification information from the tracking target document, and the tracking processing device may extract from the operation table the operation information representing input/output operations on the document identified by that distribution document identification information and, based on the extracted operation information, perform a tracking process that traces the input/output history of the document so identified. In that case, the document identification information detection device may detect the distribution document identification information of documents whose content is similar to the tracking target document only when the tracking process yields no result.

With this configuration, for a document from which trackable distribution document identification information has been obtained, the document tracking system of the present invention can perform tracking based on that identification information.

When the tracking process yields a plurality of results, the tracking processing device may cause the tracking target document designation device to output those results and have the user of that device select one of them.

With this configuration, when many documents with content similar to the tracking target document are extracted, the document tracking system of the present invention can have the user narrow down the extracted documents.

The feature amount table storage device may acquire, as the feature amounts of a document, document pieces obtained by fragmenting the document.

With this configuration, the document tracking system of the present invention can detect the tracking target document even when it is contained within another document.

The tracking target document designation device may be constituted by a scanner that reads the tracking target document.

With this configuration, the document tracking system of the present invention allows the tracking target document to be designated by means of a scanner.

The tracking target document designation device may also be configured so that the file name of the tracking target document is designated at it.

With this configuration, the document tracking system of the present invention allows the tracking target document to be designated by its file name.

The distribution document identification information may be recording medium identification information for identifying the recording medium on which the content of the document is recorded.

With this configuration, the document tracking system of the present invention can obtain the identification information of a document from the recording medium identification information that identifies the recording medium.

Here, the recording medium identification information may be information based on the unevenness pattern of the surface of a sheet of paper serving as the recording medium, information based on the pattern of metal fibers randomly embedded in the sheet during papermaking, or information based on an identifier recorded on an IC chip embedded in the sheet.

The document tracking system of the present invention may also include a tracking result output device that outputs the result of the tracking process.

With this configuration, the document tracking system of the present invention can present the tracking result for the tracking target document to the user.

The operation information may include user identification information for identifying the user who performed the input/output operation that the operation information represents.

With this configuration, the document tracking system of the present invention can identify the user who performed an input/output operation on the tracking target document.

The document tracking method of the present invention is a document tracking method that causes a computer to track input/output operations on documents stored in a file server, comprising: a feature amount table storing step of storing in advance a feature amount table in which feature amounts obtained from the content of each document stored in the file server are associated with registered document identification information for identifying that document; an identification information acquisition step of acquiring distribution document identification information for identifying a document each time an input/output operation is performed on the document; an operation table storing step of storing an operation table in which operation information representing the input/output operation is associated with the distribution document identification information; a tracking target document designation step in which a document to be tracked is designated; a document identification information detection step of obtaining feature amounts from the content of the document designated in the tracking target document designation step and comparing them with the feature amounts in the feature amount table, thereby detecting the registered document identification information of documents whose content is similar to the tracking target document; and a tracking processing step of extracting from the operation table the operation information representing input/output operations on the document identified by the registered document identification information detected in the document identification information detection step and, based on the extracted operation information, tracing the input/output history of that document.

Accordingly, the document tracking method of the present invention extracts from the file server documents whose content is similar to the tracking target document and traces the input/output history of the extracted documents. It can therefore track a document from which no identification information can be obtained and a document to which unregistered identification information has been assigned.

In another aspect, the document tracking method of the present invention is a document tracking method that causes a computer to track input/output operations on documents, comprising: an identification information acquisition step of acquiring distribution document identification information for identifying a document each time an input/output operation is performed on the document; a feature amount table storing step of storing a feature amount table in which feature amounts obtained from the content of the document are associated with the distribution document identification information; an operation table storing step of storing an operation table in which operation information representing the input/output operation is associated with the distribution document identification information; a tracking target document designation step in which a document to be tracked is designated; a document identification information detection step of obtaining feature amounts from the content of the document designated in the tracking target document designation step and comparing them with the feature amounts in the feature amount table, thereby detecting the distribution document identification information of documents whose content is similar to the tracking target document; and a tracking step of extracting from the operation table the operation information representing input/output operations on the document identified by the distribution document identification information detected in the document identification information detection step and, based on the extracted operation information, tracing the input/output history of that document.

Accordingly, the document tracking method of the present invention extracts documents whose content is similar to the tracking target document from among documents to which other identification information has been assigned, and traces the input/output history of the extracted documents. It can therefore track a document from which no identification information can be obtained and a document to which unregistered identification information has been assigned.

The present invention can provide a document tracking system capable of tracking a document from which no identification information can be obtained and a document to which unregistered identification information has been assigned.

Embodiments of the present invention will be described below with reference to the drawings.

(First embodiment)
FIG. 1 shows a document tracking system as a first embodiment of the present invention. The document tracking system of this embodiment includes a file server 1, a content analysis server 2, a client personal computer (PC) 10, a printer 11, multifunction peripherals 12 and 13, a scanner 14, a tracking scanner 20, a tracking PC 21, and a tracking server 22.

In this embodiment, the file server 1 constitutes the file server and the document monitoring device of the present invention, the content analysis server 2 constitutes the feature amount table storage device and the document identification information detection device of the present invention, and the client PC 10, the printer 11, the multifunction peripherals 12 and 13, and the scanner 14 constitute the terminal devices of the present invention.

The tracking scanner 20 and the tracking PC 21 constitute the tracking target document designation device and the tracking result output device of the present invention, and the tracking server 22 constitutes the operation table storage device and the tracking processing device of the present invention.

The file server 1 is constituted by a computer device having a CPU (Central Processing Unit), RAM (Random Access Memory), ROM (Read Only Memory), a hard disk device, an input device, a display device, a network module, and the like.

The ROM and the hard disk device of the file server 1 store a program that the CPU executes to make the computer device function as the file server 1.

The hard disk device of the file server 1 also holds a monitoring folder 40. The monitoring folder 40 stores electronic documents (hereinafter, “files”) that terminal devices such as the client PC 10 can access.

As shown in FIG. 2, the file server 1 has a document monitoring unit 50. The document monitoring unit 50 has a monitoring folder setting unit 51 that sets the monitoring folder 40, a document storage monitoring unit 52 that monitors the contents stored in the monitoring folder 40, and a document registration request unit 53 that, when the contents stored in the monitoring folder 40 change, requests the content analysis server 2 to register the changed document.

For example, when the document storage monitoring unit 52 detects that a new file has been registered in the monitoring folder 40 of the file server 1, the document registration request unit 53 acquires the registered document identification information of the newly registered file from the document storage monitoring unit 52. Here, the registered document identification information consists of the file's name, path, creation date and time, creating user name (user identification information), and the like on the file server 1.

The document registration request unit 53 further acquires the content of the file (hereinafter, “document data”) from the monitoring folder 40 and transmits the acquired document data and the registered document identification information to the content analysis server 2.
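One simple way to detect that the contents of a monitored folder have changed is to compare successive snapshots of it. The sketch below assumes polling and uses last-modified times as the change signal; the source does not say how the document storage monitoring unit 52 detects changes, and the function names here are hypothetical.

```python
from pathlib import Path


def snapshot(folder):
    """Map each file path in the folder to its last-modified time."""
    return {str(p): p.stat().st_mtime
            for p in Path(folder).iterdir() if p.is_file()}


def detect_changes(previous, current):
    """Compare two snapshots and classify the differences."""
    added = sorted(set(current) - set(previous))
    removed = sorted(set(previous) - set(current))
    modified = sorted(p for p in current
                      if p in previous and current[p] != previous[p])
    return {"added": added, "removed": removed, "modified": modified}


def registration_requests(changes):
    """Paths whose document data should be sent for (re-)registration."""
    return changes["added"] + changes["modified"]


# Hypothetical snapshots taken at two points in time.
before = {"a.txt": 1.0, "b.txt": 2.0}
after = {"a.txt": 1.5, "c.txt": 3.0}
changes = detect_changes(before, after)
```

In a running system, `snapshot` would be called periodically on the monitored folder and each path returned by `registration_requests` would be read and forwarded, together with its file name, path, creation time and creating user, to the registration side.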

Like the file server 1, the content analysis server 2 is constituted by a computer device and functions as the content analysis server 2 by means of a program executed by its CPU.

The content analysis server 2 stores in advance a feature amount table in which feature amounts obtained from the content of each file stored in the monitoring folder 40 of the file server 1 are associated with the registered document identification information of that file.

The content analysis server 2 also has a document registration unit 60, and has a document registration processing unit 61 that processes registration requests from the document registration request unit 53, a document piece division unit 62 that divides document data into a plurality of document pieces, a feature amount table registration unit 63 that registers the document pieces in the feature amount table, and a feature amount table storage unit 64 that stores the feature amount table.

Here, the document pieces may be obtained by dividing the document data into units of a fixed number of characters, by dividing each page evenly, or by dividing the data into paragraphs, sentences, or phrases.
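Three of these granularities can be sketched as below. The splitting rules are simplified assumptions (blank lines separate paragraphs; a sentence ends at `.`, `!` or `?` followed by whitespace) and the function names are hypothetical; page-based and phrase-based division are omitted because they depend on layout and language analysis.

```python
import re


def split_by_chars(text, n):
    """Divide text into pieces of at most n characters."""
    return [text[i:i + n] for i in range(0, len(text), n)]


def split_by_paragraph(text):
    """Divide text at blank lines (assumed paragraph separator)."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def split_by_sentence(text):
    """Divide text after '.', '!' or '?' followed by whitespace (simplified rule)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


chars = split_by_chars("abcdefg", 3)                           # ['abc', 'def', 'g']
paragraphs = split_by_paragraph("First.\n\nSecond.\n")         # ['First.', 'Second.']
sentences = split_by_sentence("Plan A. Plan B! Done?")         # ['Plan A.', 'Plan B!', 'Done?']
```

Content-aligned pieces such as sentences tend to survive when a passage is pasted into another document, whereas fixed-length pieces shift as soon as the surrounding text changes, which is one reason to prefer the former for partial-match detection.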

When the document data transmitted by the document registration request unit 53 represents an image or video, the document registration processing unit 61 extracts character information from the image or video by OCR (Optical Character Reader) or the like.

The document piece division unit 62 receives the document data extracted by the document registration processing unit 61 and the registered document identification information, divides the received document data into a plurality of document pieces, and outputs the document pieces, the number of each document piece (hereinafter, “piece number”), and the registered document identification information to the feature amount table registration unit 63.

The feature amount table registration unit 63 registers the document pieces and piece numbers output by the document piece division unit 62 in the feature amount table stored in the feature amount table storage unit 64, in association with the registered document identification information.
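The registration step can be pictured as adding one row per piece to a table. The row layout, the 1-based piece numbering and the replacement of earlier rows when a document is re-registered are assumptions for this sketch; the source only states that pieces and piece numbers are stored in association with the registered document identification information.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RegisteredDocId:
    """Registered document identification information (fields named in the text)."""
    file_name: str
    path: str
    created_at: str
    created_by: str


def register_pieces(feature_table, doc_id, pieces):
    """Store one row per piece, keyed by the registered document ID.

    Assumption: re-registering a document replaces its earlier rows.
    """
    feature_table[:] = [row for row in feature_table if row["doc_id"] != doc_id]
    for number, piece in enumerate(pieces, start=1):  # piece numbers start at 1 (assumed)
        feature_table.append(
            {"doc_id": doc_id, "piece_number": number, "piece": piece}
        )


# Hypothetical file and pieces.
table = []
plan = RegisteredDocId("plan.txt", "/monitor/plan.txt", "2008-08-12T09:00", "user_a")
register_pieces(table, plan, ["Plan A.", "Plan B!"])
```

Keeping the piece number alongside each piece makes it possible to report where in the registered document a match occurred.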

In FIG. 1, the tracking server 22, like the file server 1, is constituted by a computer device and functions as the tracking server 22 by means of a program executed by its CPU.

The tracking server 22 stores an operation table in which operation information representing the input/output operations performed by terminal devices such as the client PC 10, the printer 11, the multifunction peripherals 12 and 13, and the scanner 14 is associated with distribution document identification information.

Here, the distribution document identification information is information that uniquely identifies a document. For example, a document recorded on a recording medium is assigned, as its distribution document identification information, recording medium identification information for identifying that recording medium.

When the recording medium is a sheet of paper, the recording medium identification information is detected based on, for example, the unevenness pattern of the sheet's surface (hereinafter, “paper fingerprint”), the pattern of metal fibers randomly embedded in the sheet during papermaking, the sheet's background pattern, or an identifier recorded on an IC chip embedded in the sheet.

また、電子化された文書、すなわち、ファイルには、当該ファイルの文書データにハッシュ関数を用いて得られるハッシュ値が流通文書識別情報として割り当てられる。 In addition, a hash value obtained by using a hash function for document data of the file is assigned to the digitized document, that is, the file, as distribution document identification information.
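
The hash-based identifier for files can be illustrated with a minimal sketch. The description only says "a hash function", so the choice of SHA-256 and the function name `distribution_id_for_file` are assumptions made for illustration.

```python
import hashlib


def distribution_id_for_file(document_data: bytes) -> str:
    """Hex digest standing in for a file's distribution document identification information."""
    # SHA-256 is an assumption; the description does not name a specific hash function.
    return hashlib.sha256(document_data).hexdigest()


original = b"Quarterly report, confidential"
duplicate = bytes(original)        # byte-identical duplicate of the file
edited = original + b" (rev. 2)"   # any change to the bytes changes the identifier

assert distribution_id_for_file(original) == distribution_id_for_file(duplicate)
assert distribution_id_for_file(original) != distribution_id_for_file(edited)
```

Because such an identifier depends only on the bytes of the file, an edited copy receives an identifier that was never registered, which is one situation in which the content-based search described later becomes relevant.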

例えば、ファイル３０ａがファイルサーバ１からクライアントＰＣ１０に複製された場合には、クライアントＰＣ１０は、入出力操作内容（複製）、操作対象を識別するための操作対象識別情報（ファイル３０ａの登録文書識別情報）、操作日時、ユーザ名（ユーザ識別情報）および機器ＩＤ（例えば、当該端末のネットワークインタフェイスに割り当てられているＭＡＣ（Media Access Control）アドレス）を含む操作情報と、ファイル３０ａの文書データのハッシュ値である流通文書識別情報とを追跡サーバ２２に送信し、追跡サーバ２２は、この操作情報と、流通文書識別情報とを関連付けて操作テーブルに登録するようになっている。 For example, when the file 30a is duplicated from the file server 1 to the client PC 10, the client PC 10 transmits to the tracking server 22 operation information containing the input/output operation content (duplication), operation target identification information for identifying the operation target (the registered document identification information of the file 30a), the operation date and time, the user name (user identification information), and the device ID (for example, the MAC (Media Access Control) address assigned to the network interface of the terminal), together with distribution document identification information, which is the hash value of the document data of the file 30a. The tracking server 22 registers this operation information in the operation table in association with the distribution document identification information.

また、ファイル３０ａの文書データがプリンタ１１によって印刷された場合には、プリンタ１１は、入出力操作内容（印刷）、操作対象識別情報（ファイル３０ａの文書データのハッシュ値）、操作日時、ユーザ名および機器ＩＤを含む操作情報と、ファイル３０ａの文書データを印刷した用紙３０ｂの記録媒体識別情報である流通文書識別情報とを追跡サーバ２２に送信し、追跡サーバ２２は、この操作情報と、流通文書識別情報とを関連付けて操作テーブルに登録するようになっている。 When the document data of the file 30a is printed by the printer 11, the printer 11 transmits to the tracking server 22 operation information containing the input/output operation content (printing), operation target identification information (the hash value of the document data of the file 30a), the operation date and time, the user name, and the device ID, together with distribution document identification information, which is the recording medium identification information of the paper 30b on which the document data of the file 30a was printed. The tracking server 22 registers this operation information in the operation table in association with the distribution document identification information.

ここで、プリンタ１１は、文書データを印刷する用紙から記録媒体識別情報を検出する記録媒体識別情報検出部を有する。本実施の形態において、記録媒体識別情報検出部は、用紙の表面の一部にレーザを照射し、その反射光の強度分布を検出することによって、記録媒体識別情報として紙紋を検出するものとする。 Here, the printer 11 has a recording medium identification information detection unit that detects recording medium identification information from the paper on which document data is printed. In this embodiment, the recording medium identification information detection unit detects the paper pattern as the recording medium identification information by irradiating part of the surface of the paper with a laser and detecting the intensity distribution of the reflected light.

また、用紙３０ｂから複合機１２、１３によって文書データがそれぞれ読み込まれた場合には、複合機１２、１３は、入出力操作内容（スキャン）、操作対象識別情報（用紙３０ｂの記録媒体識別情報）、操作日時、ユーザ名および機器ＩＤを含む操作情報と、読み込んだ文書データのハッシュ値である流通文書識別情報とを追跡サーバ２２に送信し、追跡サーバ２２は、この操作情報と、流通文書識別情報とを関連付けて操作テーブルに登録するようになっている。 When document data is read from the paper 30b by each of the multifunction machines 12 and 13, each multifunction machine transmits to the tracking server 22 operation information containing the input/output operation content (scan), operation target identification information (the recording medium identification information of the paper 30b), the operation date and time, the user name, and the device ID, together with distribution document identification information, which is the hash value of the read document data. The tracking server 22 registers this operation information in the operation table in association with the distribution document identification information.

また、複合機１２、１３によってそれぞれ読み込まれた文書データが印刷された場合には、複合機１２、１３は、入出力操作内容（印刷）、操作対象識別情報（当該文書データのハッシュ値）、操作日時、ユーザ名および機器ＩＤを含む操作情報と、当該文書データを印刷した用紙３０ｃ、３０ｄの記録媒体識別情報である流通文書識別情報とを追跡サーバ２２にそれぞれ送信し、追跡サーバ２２は、この操作情報と、流通文書識別情報とを関連付けて操作テーブルに登録するようになっている。 When the document data read by each of the multifunction machines 12 and 13 is printed, each multifunction machine transmits to the tracking server 22 operation information containing the input/output operation content (printing), operation target identification information (the hash value of that document data), the operation date and time, the user name, and the device ID, together with distribution document identification information, which is the recording medium identification information of the paper 30c or 30d on which the document data was printed. The tracking server 22 registers this operation information in the operation table in association with the distribution document identification information.

ここで、複合機１２、１３は、文書データを読み込む用紙から記録媒体識別情報を検出する第１の記録媒体識別情報検出部と、文書データを印刷する用紙から記録媒体識別情報を検出する第２の記録媒体識別情報検出部を有する。 Here, the multifunction machines 12 and 13 each have a first recording medium identification information detection unit that detects recording medium identification information from the paper from which document data is read, and a second recording medium identification information detection unit that detects recording medium identification information from the paper on which document data is printed.

また、用紙３０ｄからスキャナ１４によって文書データが読み込まれた場合には、スキャナ１４は、入出力操作内容（スキャン）、操作対象識別情報（用紙３０ｄの記録媒体識別情報）、操作日時、ユーザ名および機器ＩＤを含む操作情報と、読み込んだ文書データのハッシュ値である流通文書識別情報とを追跡サーバ２２に送信し、追跡サーバ２２は、この操作情報と、流通文書識別情報とを関連付けて操作テーブルに登録するようになっている。 When document data is read from the paper 30d by the scanner 14, the scanner 14 transmits to the tracking server 22 operation information containing the input/output operation content (scan), operation target identification information (the recording medium identification information of the paper 30d), the operation date and time, the user name, and the device ID, together with distribution document identification information, which is the hash value of the read document data. The tracking server 22 registers this operation information in the operation table in association with the distribution document identification information.

ここで、スキャナ１４は、文書データを読み込む用紙から記録媒体識別情報を検出する記録媒体識別情報検出部を有する。 Here, the scanner 14 includes a recording medium identification information detection unit that detects recording medium identification information from a sheet from which document data is read.

追跡スキャナ２０は、追跡対象の文書が記録された用紙３０ｆから、文書データを読み込むときに、用紙３０ｆから記録媒体識別情報を検出する記録媒体識別情報検出部を有し、記録媒体識別情報検出部によって検出された流通文書識別情報を追跡サーバ２２に送信するようになっている。 The tracking scanner 20 has a recording medium identification information detection unit that detects recording medium identification information from the paper 30f, on which the document to be tracked is recorded, when document data is read from the paper 30f, and transmits the distribution document identification information detected by the recording medium identification information detection unit to the tracking server 22.

また、追跡ＰＣ２１は、ファイル３０ｇが追跡対象の文書として指定された場合には、ファイル３０ｇの文書データのハッシュ値を流通文書識別情報として算出し、算出した流通文書識別情報を追跡サーバ２２に送信するようになっている。 When the file 30g is designated as the document to be tracked, the tracking PC 21 calculates the hash value of the document data of the file 30g as distribution document identification information and transmits the calculated distribution document identification information to the tracking server 22.

ここで、追跡サーバ２２は、追跡スキャナ２０や追跡ＰＣ２１から送信された流通文書識別情報に基づいて操作テーブルから流通文書識別情報を抽出し、抽出した流通文書識別情報に関連付けられた操作情報に含まれる操作対象識別情報を検出して行くことにより、当該文書の流通経路を表す流通情報を生成し、返信するようになっている。 Here, the tracking server 22 extracts the distribution document identification information from the operation table based on the distribution document identification information transmitted from the tracking scanner 20 or the tracking PC 21, and is included in the operation information associated with the extracted distribution document identification information. By detecting the operation target identification information, the distribution information representing the distribution route of the document is generated and returned.
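
The lookup described above can be sketched as a walk over the operation table: each row links the identifier of a produced document to the identifier of the operation target it was produced from. The `Operation` record, the identifier strings, the user names, and the device IDs below are hypothetical. Keeping a single row per distribution document identifier is a simplification that the description does not require.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Operation:
    distribution_id: str  # identifier of the document produced by the operation
    kind: str             # "duplicate", "print" or "scan"
    target_id: str        # operation target identification information (the input)
    timestamp: str
    user: str
    device_id: str


def trace_back(table: list[Operation], distribution_id: str) -> list[Operation]:
    """Follow operation target identifiers from a tracked document back to its origin."""
    by_output = {op.distribution_id: op for op in table}  # simplification: one row per output
    route, seen, current = [], set(), distribution_id
    while current in by_output and current not in seen:   # 'seen' guards against cycles
        seen.add(current)
        op = by_output[current]
        route.append(op)
        current = op.target_id  # next, look up the document this one was made from
    return route


# Hypothetical rows mirroring the example: file 30a -> paper 30b -> scan -> paper 30d.
table = [
    Operation("hash:30a", "duplicate", "reg:30a", "2008-08-01T09:00", "alice", "00:00:5e:00:53:10"),
    Operation("paper:30b", "print", "hash:30a", "2008-08-01T09:05", "alice", "00:00:5e:00:53:11"),
    Operation("hash:scan-30b", "scan", "paper:30b", "2008-08-02T14:00", "bob", "00:00:5e:00:53:13"),
    Operation("paper:30d", "print", "hash:scan-30b", "2008-08-02T14:01", "bob", "00:00:5e:00:53:13"),
]

route = trace_back(table, "paper:30d")
# Operation kinds along the route, newest first: print, scan, print, duplicate
```

An identifier that does not appear in the table yields an empty route, which corresponds to the case in which no distribution information is obtained.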

一方で、追跡サーバ２２が操作テーブルから当該流通文書識別情報を含む要素を抽出することができず、流通情報が得られなかった場合、または、追跡対象の文書の流通文書識別情報が検出できなかった場合には、追跡スキャナ２０は、用紙３０ｆから読み込んだ文書データをコンテンツ解析サーバ２に送信し、追跡ＰＣ２１は、ファイル３０ｇの文書データをコンテンツ解析サーバ２に送信するようになっている。 On the other hand, when the tracking server 22 cannot extract the element including the distribution document identification information from the operation table and distribution information cannot be obtained, or the distribution document identification information of the document to be tracked cannot be detected. In this case, the tracking scanner 20 transmits the document data read from the paper 30f to the content analysis server 2, and the tracking PC 21 transmits the document data of the file 30g to the content analysis server 2.

ここで、コンテンツ解析サーバ２は、追跡スキャナ２０や追跡ＰＣ２１から送信された追跡対象の文書データを前述した文書登録処理部６１および文書ピース分割部６２を用いて複数の文書ピースに分割するようになっている。 Here, the content analysis server 2 divides the document data to be tracked, transmitted from the tracking scanner 20 or the tracking PC 21, into a plurality of document pieces using the document registration processing unit 61 and the document piece dividing unit 62 described above.

コンテンツ解析サーバ２は、分割した各文書ピースに対する相関値が予め定められた閾値ＴＨ１より高い文書ピースを特徴量テーブルに基づいて検出し、検出した文書ピースが追跡対象の文書データに占める割合を類似度として算出し、算出した類似度が予め定められた閾値ＴＨ２より高い文書データを有するファイル（以下、「類似文書」という。）の登録文書識別情報を返信するようになっている。 The content analysis server 2 detects, based on the feature quantity table, document pieces whose correlation value with each of the divided document pieces is higher than a predetermined threshold TH1, calculates the proportion of the tracked document data accounted for by the detected document pieces as the similarity, and returns the registered document identification information of each file whose document data has a calculated similarity higher than a predetermined threshold TH2 (hereinafter referred to as a “similar document”).
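
The two-threshold test can be sketched as follows. Several points are assumptions made for illustration: pieces are fixed-length character chunks, the correlation value is the ratio reported by Python's `difflib.SequenceMatcher`, the values of TH1 and TH2 are arbitrary, and the similarity is read as the fraction of the tracked document's pieces that have a sufficiently correlated registered piece.

```python
from difflib import SequenceMatcher

TH1 = 0.8  # per-piece correlation threshold (value assumed)
TH2 = 0.5  # per-document similarity threshold (value assumed)


def split_into_pieces(text: str, size: int = 20) -> list[str]:
    """Fixed-length chunks; the splitting rule itself is an assumption."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def correlation(a: str, b: str) -> float:
    """Stand-in correlation measure in [0, 1]."""
    return SequenceMatcher(None, a, b).ratio()


def similar_documents(feature_table: dict[str, list[str]], tracked_text: str) -> dict[str, float]:
    """Map each registered document ID whose similarity exceeds TH2 to that similarity."""
    pieces = split_into_pieces(tracked_text)
    result: dict[str, float] = {}
    if not pieces:
        return result
    for doc_id, registered in feature_table.items():
        # Count tracked pieces that correlate above TH1 with at least one registered piece.
        matched = sum(
            1 for p in pieces if any(correlation(p, r) > TH1 for r in registered)
        )
        similarity = matched / len(pieces)
        if similarity > TH2:
            result[doc_id] = similarity
    return result


text_a = "The quarterly sales forecast must not leave the finance department."
text_b = "0123456789" * 6
feature_table = {
    "reg:30a": split_into_pieces(text_a),
    "reg:other": split_into_pieces(text_b),
}

hits = similar_documents(feature_table, text_a)
assert hits == {"reg:30a": 1.0}
```

When more than one identifier is returned, the caller either asks the user to choose or takes the entry with the highest similarity.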

ここで、複数の類似文書が検出された場合には、コンテンツ解析サーバ２は、ユーザが各類似文書を識別できる情報（例えば、登録文書識別情報）を追跡スキャナ２０や追跡ＰＣ２１に提示させ、追跡スキャナ２０や追跡ＰＣ２１は、ユーザに１つの類似文書を選択させるようになっている。 Here, when a plurality of similar documents are detected, the content analysis server 2 causes the tracking scanner 20 or the tracking PC 21 to present information by which the user can identify each similar document (for example, the registered document identification information), and the tracking scanner 20 or the tracking PC 21 has the user select one similar document.

なお、類似文書が複数ある場合には、コンテンツ解析サーバ２は、類似度が最も高い文書データを有する類似文書の登録文書識別情報を返信するようにしてもよい。 When there are a plurality of similar documents, the content analysis server 2 may return registered document identification information of a similar document having document data with the highest similarity.

追跡スキャナ２０および追跡ＰＣ２１は、コンテンツ解析サーバ２から返信された登録文書識別情報を追跡サーバ２２に送信するようになっている。ここで、追跡サーバ２２は、追跡スキャナ２０や追跡ＰＣ２１から送信された登録文書識別情報に基づいて操作テーブルから操作情報を抽出し、抽出した操作情報に関連付けられた流通文書識別情報を検出して行くことにより、当該文書の流通経路を表す流通情報を生成し、返信するようになっている。 The tracking scanner 20 and the tracking PC 21 transmit the registered document identification information returned from the content analysis server 2 to the tracking server 22. Here, the tracking server 22 extracts operation information from the operation table based on the registered document identification information transmitted from the tracking scanner 20 or the tracking PC 21 and successively detects the distribution document identification information associated with the extracted operation information, thereby generating distribution information representing the distribution route of the document, and returns that information.
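
Tracing from a registered document identifier runs in the forward direction: rows whose operation target is the registered identifier are found first, and the identifiers they produced are then used as the next targets. The sketch below assumes the operation table is reduced to hypothetical `(target_id, kind, distribution_id)` tuples and uses a breadth-first walk, which is one possible traversal order rather than the one the description prescribes.

```python
from collections import deque


def trace_forward(table, start_id):
    """Collect (input ID, operation, output ID) edges reachable from start_id."""
    children = {}
    for target_id, kind, distribution_id in table:
        children.setdefault(target_id, []).append((kind, distribution_id))

    edges, queue, seen = [], deque([start_id]), {start_id}
    while queue:
        current = queue.popleft()
        for kind, out_id in children.get(current, []):
            edges.append((current, kind, out_id))
            if out_id not in seen:  # visit every produced document only once
                seen.add(out_id)
                queue.append(out_id)
    return edges


# Hypothetical rows: file 30a is duplicated, printed as 30b, scanned by two
# multifunction machines, reprinted as 30c and 30d, and 30d is scanned again.
table = [
    ("reg:30a", "duplicate", "hash:30a"),
    ("hash:30a", "print", "paper:30b"),
    ("paper:30b", "scan", "hash:scan12"),
    ("paper:30b", "scan", "hash:scan13"),
    ("hash:scan12", "print", "paper:30c"),
    ("hash:scan13", "print", "paper:30d"),
    ("paper:30d", "scan", "hash:scan14"),
]

edges = trace_forward(table, "reg:30a")
assert len(edges) == 7
```

The resulting edge list is the kind of data from which a tree-shaped view of the distribution route could be drawn.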

このように、追跡サーバ２２から返信された流通情報に対して、追跡スキャナ２０および追跡ＰＣ２１は、図３に示すように、系図等を以って画像表示するようになっている。 The tracking scanner 20 and the tracking PC 21 display the distribution information returned from the tracking server 22 as an image, for example as a genealogy-style tree diagram, as shown in FIG. 3.

以上のように構成された文書追跡システムの追跡動作を図４を用いて説明する。以下、追跡スキャナ２０によって用紙３０ｆから追跡対象の文書データが読み込まれた場合を例として説明する。 The tracking operation of the document tracking system configured as described above will be described with reference to FIG. Hereinafter, a case where document data to be tracked is read from the paper 30f by the tracking scanner 20 will be described as an example.

まず、追跡スキャナ２０によって用紙３０ｆから追跡対象の文書データが読み込まれるときに、用紙３０ｆから流通文書識別情報が取得される（ステップＳ１）。ここで、流通文書識別情報が取得された場合には（ステップＳ２：ＹＥＳ）、取得された流通文書識別情報が追跡サーバ２２によって操作テーブルから抽出され、抽出された流通文書識別情報に関連付けられた操作情報に含まれる操作対象識別情報が追跡サーバ２２によって検出されて行くことにより、当該文書の流通経路を表す流通情報が追跡サーバ２２によって生成される（ステップＳ３）。 First, when document data to be tracked is read from the paper 30f by the tracking scanner 20, distribution document identification information is acquired from the paper 30f (step S1). Here, when the distribution document identification information is acquired (step S2: YES), the acquired distribution document identification information is extracted from the operation table by the tracking server 22 and associated with the extracted distribution document identification information. As the operation target identification information included in the operation information is detected by the tracking server 22, distribution information indicating the distribution route of the document is generated by the tracking server 22 (step S3).

ここで、追跡サーバ２２によって流通情報が生成された場合には（ステップＳ４：ＹＥＳ）、追跡サーバ２２によって生成された流通情報が追跡スキャナ２０に設けられた表示部に表示され（ステップＳ５）、追跡動作は、終了する。 Here, when the distribution information is generated by the tracking server 22 (step S4: YES), the distribution information generated by the tracking server 22 is displayed on the display unit provided in the tracking scanner 20 (step S5). The tracking operation ends.

一方、追跡サーバ２２によって流通情報が生成されなかった場合（ステップＳ４：ＮＯ）、または、ステップＳ１で流通文書識別情報が取得されなかった場合には（ステップＳ２：ＮＯ）、追跡スキャナ２０によって用紙３０ｆから読み込まれた文書データと類似する文書データを有する類似文書がファイルサーバ１の監視フォルダ４０のなかからコンテンツ解析サーバ２によって検出される（ステップＳ６）。 On the other hand, when distribution information is not generated by the tracking server 22 (step S4: NO), or when distribution document identification information is not acquired in step S1 (step S2: NO), the content analysis server 2 searches the monitoring folder 40 of the file server 1 for similar documents whose document data is similar to the document data read from the paper 30f by the tracking scanner 20 (step S6).

ここで、類似文書が検出されなかった場合には（ステップＳ７：ＮＯ）、追跡対象の文書の追跡が行えなかった旨が追跡スキャナ２０に設けられた表示部に表示され（ステップＳ８）、追跡動作は、終了する。 If a similar document is not detected (step S7: NO), the fact that the tracking target document could not be tracked is displayed on the display unit provided in the tracking scanner 20 (step S8). The operation ends.

一方、類似文書が検出された場合において（ステップＳ７：ＹＥＳ）、複数の類似文書が検出されたときには（ステップＳ９：ＹＥＳ）、ユーザに各類似文書を識別させる情報が追跡スキャナ２０の表示部に表示され、追跡スキャナ２０のユーザによって１つの類似文書が選択される（ステップＳ１０）。 On the other hand, when similar documents are detected (step S7: YES) and a plurality of similar documents are detected (step S9: YES), information that lets the user identify each similar document is displayed on the display unit of the tracking scanner 20, and one similar document is selected by the user of the tracking scanner 20 (step S10).

次に、類似文書の登録文書識別情報に基づいて操作テーブルから操作情報が追跡サーバ２２によって抽出され、抽出された操作情報に関連付けられた流通文書識別情報が追跡サーバ２２によって検出されて行くことにより、当該文書の流通経路を表す流通情報が追跡サーバ２２によって生成される（ステップＳ１１）。 Next, the operation information is extracted from the operation table by the tracking server 22 based on the registered document identification information of the similar document, and the distribution document identification information associated with the extracted operation information is detected by the tracking server 22. Distribution information representing the distribution route of the document is generated by the tracking server 22 (step S11).

ここで、追跡サーバ２２によって流通情報が生成された場合には（ステップＳ１２：ＹＥＳ）、追跡サーバ２２によって生成された流通情報が追跡スキャナ２０に設けられた表示部に表示され（ステップＳ５）、追跡動作は、終了する。 Here, when the distribution information is generated by the tracking server 22 (step S12: YES), the distribution information generated by the tracking server 22 is displayed on the display unit provided in the tracking scanner 20 (step S5). The tracking operation ends.

一方、追跡サーバ２２によって流通情報が生成されなかった場合には（ステップＳ１２：ＮＯ）、追跡対象の文書の追跡が行えなかった旨が追跡スキャナ２０に設けられた表示部に表示され（ステップＳ８）、追跡動作は、終了する。 On the other hand, when distribution information is not generated by the tracking server 22 (step S12: NO), a message indicating that the document to be tracked could not be tracked is displayed on the display unit provided in the tracking scanner 20 (step S8), and the tracking operation ends.
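
The branching of steps S1 to S12 can be summarized in a short sketch. The function `track` and its five callable parameters are hypothetical stand-ins for the tracking scanner, the tracking server, and the content analysis server. Returning `None` represents the "could not be tracked" message of step S8.

```python
from typing import Callable, Optional, Sequence


def track(
    read_distribution_id: Callable[[], Optional[str]],
    trace_by_distribution_id: Callable[[str], Optional[list]],
    find_similar: Callable[[], Sequence[str]],
    choose_one: Callable[[Sequence[str]], str],
    trace_by_registered_id: Callable[[str], Optional[list]],
) -> Optional[list]:
    """Return distribution information to display (S5), or None if tracking failed (S8)."""
    dist_id = read_distribution_id()                  # S1
    if dist_id is not None:                           # S2: YES
        route = trace_by_distribution_id(dist_id)     # S3
        if route:                                     # S4: YES
            return route                              # S5
    candidates = find_similar()                       # S6
    if not candidates:                                # S7: NO
        return None                                   # S8
    if len(candidates) > 1:                           # S9: YES
        chosen = choose_one(candidates)               # S10
    else:
        chosen = candidates[0]
    route = trace_by_registered_id(chosen)            # S11
    return route if route else None                   # S12: YES -> S5, NO -> S8


# Example: no identifier is read from the paper, two similar documents are found,
# and the user picks the second one.
result = track(
    read_distribution_id=lambda: None,
    trace_by_distribution_id=lambda dist_id: None,
    find_similar=lambda: ["reg:30a", "reg:30x"],
    choose_one=lambda candidates: candidates[1],
    trace_by_registered_id=lambda reg_id: [reg_id],
)
assert result == ["reg:30x"]
```

Passing the collaborators in as callables keeps the control flow testable without committing to any particular scanner or server interface.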

以上に説明したように、本発明の第１の実施の形態としての文書追跡システムは、追跡対象の文書に内容が類似する文書をファイルサーバ１から抽出し、抽出した文書に対する入出力の履歴を追跡するため、識別情報が得られない文書や、登録されていない識別情報が割り当てられた文書の追跡を行うことができる。 As described above, the document tracking system according to the first embodiment of the present invention extracts from the file server 1 a document whose content is similar to that of the document to be tracked and tracks the input/output history of the extracted document. It can therefore track a document from which identification information cannot be obtained and a document to which unregistered identification information has been assigned.

なお、本実施の形態においては、ファイルサーバ１が、本発明におけるファイルサーバおよび文書監視装置を構成し、コンテンツ解析サーバ２が、本発明における特徴量テーブル格納装置および文書識別情報検出装置を構成し、追跡スキャナ２０および追跡ＰＣ２１が、本発明における追跡対象文書指定装置および追跡結果出力装置を構成し、追跡サーバ２２が、本発明における操作テーブル格納装置および追跡処理装置を構成する例について説明したが、これに限定するものではない。 In the present embodiment, the file server 1 constitutes a file server and a document monitoring apparatus in the present invention, and the content analysis server 2 constitutes a feature amount table storage apparatus and a document identification information detection apparatus in the present invention. The tracking scanner 20 and the tracking PC 21 constitute the tracking target document designating device and the tracking result output device in the present invention, and the tracking server 22 constitutes the operation table storage device and the tracking processing device in the present invention. However, the present invention is not limited to this.

例えば、追跡サーバ２２が、本発明における追跡対象文書指定装置、追跡結果出力装置、操作テーブル格納装置および追跡処理装置を構成してもよい。また、追跡サーバ２２が、特徴量テーブル格納装置および文書識別情報検出装置をさらに構成するようにしてもよい。 For example, the tracking server 22 may constitute a tracking target document specifying device, a tracking result output device, an operation table storage device, and a tracking processing device in the present invention. The tracking server 22 may further configure a feature amount table storage device and a document identification information detection device.

（第２の実施の形態） (Second Embodiment)
本発明の第２の実施の形態としての文書追跡システムを図５に示す。本実施の形態の文書追跡システムは、ファイルサーバ１０１と、コンテンツ解析サーバ１０２と、クライアントＰＣ１１０と、プリンタ１１１と、複合機１１２、１１３と、スキャナ１１４と、追跡スキャナ１２０と、追跡ＰＣ１２１と、追跡サーバ１２２とを備えている。 FIG. 5 shows a document tracking system as a second embodiment of the present invention. The document tracking system according to the present embodiment includes a file server 101, a content analysis server 102, a client PC 110, a printer 111, multifunction peripherals 112 and 113, a scanner 114, a tracking scanner 120, a tracking PC 121, and a tracking server 122.

なお、本実施の形態において、コンテンツ解析サーバ１０２は、本発明における特徴量テーブル格納装置および文書識別情報検出装置を構成し、クライアントＰＣ１１０、プリンタ１１１、複合機１１２、１１３およびスキャナ１１４は、本発明における端末装置を構成する。 In the present embodiment, the content analysis server 102 constitutes the feature amount table storage device and the document identification information detection device of the present invention, and the client PC 110, the printer 111, the multifunction peripherals 112 and 113, and the scanner 114 constitute the terminal devices of the present invention.

また、追跡スキャナ１２０および追跡ＰＣ１２１は、本発明における追跡対象文書指定装置および追跡結果出力装置を構成し、追跡サーバ１２２は、本発明における操作テーブル格納装置および追跡処理装置を構成する。 The tracking scanner 120 and the tracking PC 121 constitute a tracking target document designation device and a tracking result output device according to the present invention, and the tracking server 122 constitutes an operation table storage device and a tracking processing device according to the present invention.

ファイルサーバ１０１は、コンピュータ装置によって構成され、ＣＰＵ、ＲＡＭ、ＲＯＭ、ハードディスク装置、入力装置、表示装置およびネットワークモジュール等を有する。 The file server 101 is configured by a computer device, and includes a CPU, a RAM, a ROM, a hard disk device, an input device, a display device, a network module, and the like.

ファイルサーバ１０１のＲＯＭおよびハードディスク装置には、当該コンピュータ装置をファイルサーバ１０１として機能させるためにＣＰＵに実行させるプログラムが格納されている。また、ファイルサーバ１０１のハードディスク装置には、クライアントＰＣ１１０等の端末装置がアクセス可能なファイルが格納されている。 The ROM and hard disk device of the file server 101 store a program to be executed by the CPU so that the computer device functions as the file server 101. The hard disk device of the file server 101 stores a file that can be accessed by a terminal device such as the client PC 110.

追跡サーバ１２２は、ファイルサーバ１０１と同様に、コンピュータ装置によって構成され、ＣＰＵに実行されるプログラムによって、追跡サーバ１２２として機能するようになっている。 Similar to the file server 101, the tracking server 122 is configured by a computer device, and functions as the tracking server 122 by a program executed by the CPU.

追跡サーバ１２２は、クライアントＰＣ１１０、プリンタ１１１、複合機１１２、１１３およびスキャナ１１４等の各端末装置による入出力操作を表す操作情報を流通文書識別情報に関連付けた操作テーブルを格納すると共に、端末装置によって入出力された文書データと、流通文書識別情報とをコンテンツ解析サーバ１０２に送信するようになっている。 The tracking server 122 stores an operation table in which operation information representing input/output operations performed by terminal devices such as the client PC 110, the printer 111, the multifunction peripherals 112 and 113, and the scanner 114 is associated with distribution document identification information, and transmits the document data input or output by the terminal devices, together with the distribution document identification information, to the content analysis server 102.

例えば、ファイル３０ａがファイルサーバ１０１からクライアントＰＣ１１０に複製された場合には、クライアントＰＣ１１０は、入出力操作内容（複製）、操作対象を識別するための操作対象識別情報（ファイル３０ａの登録文書識別情報）、操作日時、ユーザ名および機器ＩＤを含む操作情報と、ファイル３０ａの文書データのハッシュ値である流通文書識別情報と、ファイル３０ａの文書データとを追跡サーバ１２２に送信し、追跡サーバ１２２は、この操作情報と、流通文書識別情報とを関連付けて操作テーブルに登録すると共に、ファイル３０ａの文書データと流通文書識別情報とをコンテンツ解析サーバ１０２に送信するようになっている。 For example, when the file 30a is duplicated from the file server 101 to the client PC 110, the client PC 110 transmits to the tracking server 122 operation information containing the input/output operation content (duplication), operation target identification information for identifying the operation target (the registered document identification information of the file 30a), the operation date and time, the user name, and the device ID; distribution document identification information, which is the hash value of the document data of the file 30a; and the document data of the file 30a. The tracking server 122 registers this operation information in the operation table in association with the distribution document identification information and transmits the document data of the file 30a and the distribution document identification information to the content analysis server 102.

また、ファイル３０ａの文書データがプリンタ１１１によって印刷された場合には、プリンタ１１１は、入出力操作内容（印刷）、操作対象識別情報（ファイル３０ａの文書データのハッシュ値）、操作日時、ユーザ名および機器ＩＤを含む操作情報と、ファイル３０ａの文書データを印刷した用紙３０ｂの記録媒体識別情報である流通文書識別情報と、ファイル３０ａの文書データとを追跡サーバ１２２に送信し、追跡サーバ１２２は、この操作情報と、流通文書識別情報とを関連付けて操作テーブルに登録すると共に、文書データと流通文書識別情報とをコンテンツ解析サーバ１０２に送信するようになっている。 When the document data of the file 30a is printed by the printer 111, the printer 111 transmits to the tracking server 122 operation information containing the input/output operation content (printing), operation target identification information (the hash value of the document data of the file 30a), the operation date and time, the user name, and the device ID; distribution document identification information, which is the recording medium identification information of the paper 30b on which the document data of the file 30a was printed; and the document data of the file 30a. The tracking server 122 registers this operation information in the operation table in association with the distribution document identification information and transmits the document data and the distribution document identification information to the content analysis server 102.

ここで、プリンタ１１１は、文書データを印刷する用紙から記録媒体識別情報を検出する記録媒体識別情報検出部を有する。本実施の形態において、記録媒体識別情報検出部は、用紙の表面の一部にレーザを照射し、その反射光の強度分布を検出することによって、記録媒体識別情報として紙紋を検出するようになっている。 Here, the printer 111 has a recording medium identification information detection unit that detects recording medium identification information from the paper on which document data is printed. In the present embodiment, the recording medium identification information detection unit detects the paper pattern as the recording medium identification information by irradiating part of the surface of the paper with a laser and detecting the intensity distribution of the reflected light.

また、用紙３０ｂから複合機１１２、１１３によって文書データがそれぞれ読み込まれた場合には、複合機１１２、１１３は、入出力操作内容（スキャン）、操作対象識別情報（用紙３０ｂの記録媒体識別情報）、操作日時、ユーザ名および機器ＩＤを含む操作情報と、読み込んだ文書データのハッシュ値である流通文書識別情報と、読み込んだ文書データとを追跡サーバ１２２に送信し、追跡サーバ１２２は、この操作情報と、流通文書識別情報とを関連付けて操作テーブルに登録すると共に、文書データと流通文書識別情報とをコンテンツ解析サーバ１０２に送信するようになっている。 When document data is read from the paper 30b by each of the multifunction peripherals 112 and 113, each multifunction peripheral transmits to the tracking server 122 operation information containing the input/output operation content (scan), operation target identification information (the recording medium identification information of the paper 30b), the operation date and time, the user name, and the device ID; distribution document identification information, which is the hash value of the read document data; and the read document data. The tracking server 122 registers this operation information in the operation table in association with the distribution document identification information and transmits the document data and the distribution document identification information to the content analysis server 102.

また、複合機１１２、１１３によってそれぞれ読み込まれた文書データが印刷された場合には、複合機１１２、１１３は、入出力操作内容（印刷）、操作対象識別情報（当該文書データのハッシュ値）、操作日時、ユーザ名および機器ＩＤを含む操作情報と、当該文書データを印刷した用紙３０ｃ、３０ｄの記録媒体識別情報である流通文書識別情報と、当該文書データとを追跡サーバ１２２にそれぞれ送信し、追跡サーバ１２２は、この操作情報と、流通文書識別情報とを関連付けて操作テーブルに登録すると共に、当該文書データと流通文書識別情報とをコンテンツ解析サーバ１０２に送信するようになっている。 When the document data read by each of the multifunction peripherals 112 and 113 is printed, each multifunction peripheral transmits to the tracking server 122 operation information containing the input/output operation content (printing), operation target identification information (the hash value of that document data), the operation date and time, the user name, and the device ID; distribution document identification information, which is the recording medium identification information of the paper 30c or 30d on which the document data was printed; and the document data. The tracking server 122 registers this operation information in the operation table in association with the distribution document identification information and transmits the document data and the distribution document identification information to the content analysis server 102.

ここで、複合機１１２、１１３は、文書データを読み込む用紙から記録媒体識別情報を検出する第１の記録媒体識別情報検出部と、文書データを印刷する用紙から記録媒体識別情報を検出する第２の記録媒体識別情報検出部を有する。 Here, the multifunction peripherals 112 and 113 each have a first recording medium identification information detection unit that detects recording medium identification information from the paper from which document data is read, and a second recording medium identification information detection unit that detects recording medium identification information from the paper on which document data is printed.

また、用紙３０ｄからスキャナ１１４によって文書データが読み込まれた場合には、スキャナ１１４は、入出力操作内容（スキャン）、操作対象識別情報（用紙３０ｄの記録媒体識別情報）、操作日時、ユーザ名および機器ＩＤを含む操作情報と、読み込んだ文書データのハッシュ値である流通文書識別情報と、読み込んだ文書データとを追跡サーバ１２２に送信し、追跡サーバ１２２は、この操作情報と、流通文書識別情報とを関連付けて操作テーブルに登録すると共に、当該文書データと流通文書識別情報とをコンテンツ解析サーバ１０２に送信するようになっている。 When document data is read from the paper 30d by the scanner 114, the scanner 114 transmits to the tracking server 122 operation information containing the input/output operation content (scan), operation target identification information (the recording medium identification information of the paper 30d), the operation date and time, the user name, and the device ID; distribution document identification information, which is the hash value of the read document data; and the read document data. The tracking server 122 registers this operation information in the operation table in association with the distribution document identification information and transmits the document data and the distribution document identification information to the content analysis server 102.

ここで、スキャナ１１４は、文書データを読み込む用紙から記録媒体識別情報を検出する記録媒体識別情報検出部を有する。 Here, the scanner 114 includes a recording medium identification information detection unit that detects recording medium identification information from a sheet from which document data is read.

コンテンツ解析サーバ１０２は、ファイルサーバ１０１と同様に、コンピュータ装置によって構成され、ＣＰＵに実行されるプログラムによって、コンテンツ解析サーバ１０２として機能するようになっている。 Similar to the file server 101, the content analysis server 102 is configured by a computer device, and functions as the content analysis server 102 by a program executed by the CPU.

コンテンツ解析サーバ１０２は、追跡サーバ１２２によって送信された文書データと流通文書識別情報とを受信し、この文書データから取得される特徴量に流通文書識別情報を関連付けた特徴量テーブルを格納するようになっている。 The content analysis server 102 receives the document data and the distribution document identification information transmitted by the tracking server 122 and stores a feature quantity table in which the distribution document identification information is associated with feature quantities obtained from the document data.

例えば、図６に示すように、複合機１１２は、スキャン部１４１と、印刷部１４２と、操作情報を生成する操作情報収集部１４３とを備えている。スキャン部１４１は、画像読取部１５１と、第１の記録媒体識別情報検出部としての紙紋検出部１５２とを有し、印刷部１４２は、画像描画部１５５と、第２の記録媒体識別情報検出部としての紙紋検出部１５６とを有している。 For example, as illustrated in FIG. 6, the multifunction machine 112 includes a scanning unit 141, a printing unit 142, and an operation information collection unit 143 that generates operation information. The scanning unit 141 includes an image reading unit 151 and a paper pattern detection unit 152 as a first recording medium identification information detection unit, and the printing unit 142 includes an image drawing unit 155 and second recording medium identification information. And a paper pattern detection unit 156 as a detection unit.

操作情報収集部１４３は、複合機１１２に行われた入出力操作を表す操作情報を生成し、生成した操作情報と、紙紋検出部１５２または紙紋検出部１５６によって検出された流通文書識別情報と、画像読取部１５１によって読み込まれた文書データまたは画像描画部１５５によって印刷された文書データとを追跡サーバ１２２に送信するようになっている。 The operation information collection unit 143 generates operation information representing the input / output operation performed on the multifunction machine 112, and the generated operation information and the distribution document identification information detected by the paper pattern detection unit 152 or the paper pattern detection unit 156. The document data read by the image reading unit 151 or the document data printed by the image drawing unit 155 is transmitted to the tracking server 122.

追跡サーバ１２２は、操作テーブルを格納する操作テーブル格納部１６１と、操作情報収集部１４３によって送信された操作情報と、流通文書識別情報とを関連付けて操作テーブルに登録すると共に、操作情報収集部１４３によって送信された文書データと流通文書識別情報とをコンテンツ解析サーバ１０２に送信する操作テーブル登録部１６２とを備えている。 The tracking server 122 includes an operation table storage unit 161 that stores the operation table, and an operation table registration unit 162 that registers the operation information transmitted by the operation information collection unit 143 in the operation table in association with the distribution document identification information and transmits the document data and the distribution document identification information transmitted by the operation information collection unit 143 to the content analysis server 102.

また、コンテンツ解析サーバ１０２は、文書登録部１７０を有し、操作テーブル登録部１６２からの登録要求を処理する文書登録処理部１７１と、文書データを複数の文書ピースに分割する文書ピース分割部１７２と、文書ピースを特徴量テーブルに登録する特徴量テーブル登録部１７３と、特徴量テーブルを格納する特徴量テーブル格納部１７４とを有する。 The content analysis server 102 also includes a document registration unit 170, a document registration processing unit 171 that processes a registration request from the operation table registration unit 162, and a document piece division unit 172 that divides document data into a plurality of document pieces. And a feature quantity table registration unit 173 that registers document pieces in the feature quantity table, and a feature quantity table storage unit 174 that stores the feature quantity table.

When the document data transmitted by the operation table registration unit 162 represents an image or video, the document registration processing unit 171 extracts character information from the image or video by OCR or the like.

The document piece division unit 172 receives the document data extracted by the document registration processing unit 171 together with the distribution document identification information, divides the received document data into a plurality of document pieces, and outputs the document pieces, their piece numbers, and the distribution document identification information to the feature amount table registration unit 173.

The feature amount table registration unit 173 registers the document pieces and piece numbers output by the document piece division unit 172 in the feature amount table stored in the feature amount table storage unit 174, in association with the distribution document identification information.
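The division and registration steps above can be sketched as follows. This is a minimal illustration, not the implementation of the embodiment: the description does not state how a document is divided into pieces, so the fixed-length word window, the `split_into_pieces` function, and the `FeatureTable` class are all assumptions introduced here.

```python
def split_into_pieces(text, piece_len=8):
    """Divide text into consecutive pieces of piece_len words.

    The fixed word-window rule is an assumption; the description only
    says that document data is divided into a plurality of pieces.
    """
    words = text.split()
    return [" ".join(words[i:i + piece_len])
            for i in range(0, len(words), piece_len)]


class FeatureTable:
    """Hypothetical in-memory stand-in for the feature amount table."""

    def __init__(self):
        # Each row: (distribution document ID, piece number, piece text)
        self.rows = []

    def register(self, doc_id, text, piece_len=8):
        """Register every piece of a document under its document ID."""
        for piece_no, piece in enumerate(split_into_pieces(text, piece_len)):
            self.rows.append((doc_id, piece_no, piece))


table = FeatureTable()
table.register("doc-A", "a b c d e", piece_len=2)
for row in table.rows:
    print(row)
```

Keeping the piece number in each row preserves the order of pieces within a document, which a later comparison step could use if it needs positional information.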

In FIG. 5, the tracking scanner 120 has a recording medium identification information detection unit that detects recording medium identification information from the paper 30f when reading document data from the paper 30f on which the document to be tracked is recorded. The tracking scanner 120 transmits the distribution document identification information detected by the recording medium identification information detection unit, together with the document data, to the tracking server 122.

When the file 30g is designated as the document to be tracked, the tracking PC 121 calculates a hash value of the document data of the file 30g as the distribution document identification information, and transmits the calculated distribution document identification information and the document data to the tracking server 122.
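As a rough illustration of deriving an identifier from file content, the sketch below hashes the raw bytes of a file. The description does not name a hash function; SHA-256 and the function name `distribution_id_for_file` are assumptions made for this example.

```python
import hashlib


def distribution_id_for_file(data: bytes) -> str:
    """Return a hex digest used as the distribution document ID.

    SHA-256 is an assumed choice; the description only says that a
    hash value of the document data is calculated.
    """
    return hashlib.sha256(data).hexdigest()


original = b"quarterly report, draft 3"
copy_of_original = bytes(original)
edited = b"quarterly report, draft 4"

# Byte-identical files share an ID; any change to the content yields a new one.
print(distribution_id_for_file(original) == distribution_id_for_file(copy_of_original))
print(distribution_id_for_file(original) == distribution_id_for_file(edited))
```

Because an edited file hashes to a different value, its identifier may not appear in the operation table at all, which is the situation the content-based search is meant to cover.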

The tracking server 122 extracts distribution document identification information from the operation table based on the distribution document identification information transmitted from the tracking scanner 120 or the tracking PC 121, and successively detects the operation target identification information included in the operation information associated with the extracted distribution document identification information. In this way it generates distribution information representing the distribution route of the document and returns it.
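One way to picture this route tracing is sketched below. The layout of the operation table is not given in this passage, so the `Operation` record (an operation kind plus the IDs of the document operated on and the document produced) and both function names are assumptions for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Operation:
    kind: str       # e.g. "print", "copy", "scan"
    source_id: str  # ID of the document the operation was applied to
    target_id: str  # ID of the document the operation produced


def find_root(ops, doc_id):
    """Follow source links upstream until no earlier document is known."""
    parent = {op.target_id: op.source_id for op in ops}
    seen = {doc_id}
    while doc_id in parent and parent[doc_id] not in seen:
        doc_id = parent[doc_id]
        seen.add(doc_id)
    return doc_id


def trace_route(ops, doc_id):
    """Return the whole tree containing doc_id as (depth, kind, id) rows,
    or None when the ID does not appear in the operation table."""
    known = {op.source_id for op in ops} | {op.target_id for op in ops}
    if doc_id not in known:
        return None
    root = find_root(ops, doc_id)
    route = [(0, "origin", root)]
    stack = [(root, 0)]
    visited = {root}
    while stack:
        current, depth = stack.pop()
        for op in ops:
            if op.source_id == current and op.target_id not in visited:
                visited.add(op.target_id)
                route.append((depth + 1, op.kind, op.target_id))
                stack.append((op.target_id, depth + 1))
    return route


ops = [
    Operation("print", "file", "p1"),
    Operation("copy", "p1", "p2"),
    Operation("scan", "p2", "s1"),
]
for row in trace_route(ops, "p2"):
    print(row)
```

Returning `None` for an unknown ID marks the case in which no distribution information can be obtained from the operation table.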

On the other hand, when no element containing the distribution document identification information can be extracted from the operation table and no distribution information is obtained, or when the distribution document identification information of the document to be tracked could not be detected, the tracking server 122 transmits the document data received from the tracking scanner 120 or the tracking PC 121 to the content analysis server 102.

The content analysis server 102 then divides the document data to be tracked, transmitted from the tracking server 122, into a plurality of document pieces using the document registration processing unit 171 and the document piece division unit 172 described above.

Based on the feature amount table, the content analysis server 102 detects document pieces whose correlation value with each of the divided document pieces is higher than a predetermined threshold TH1, and calculates, as the similarity, the proportion of the document data to be tracked that the detected document pieces account for. It then returns the distribution document identification information of each similar document whose document data has a calculated similarity higher than a predetermined threshold TH2.
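The two-threshold test can be sketched as below. The description does not define how the correlation value is computed, so `difflib.SequenceMatcher.ratio()` stands in for it here, and the threshold values, the dictionary layout of the feature amount table, and the function names are likewise assumptions.

```python
from difflib import SequenceMatcher


def correlation(a, b):
    """Assumed stand-in for the correlation value between two pieces."""
    return SequenceMatcher(None, a, b).ratio()


def find_similar(target_pieces, feature_table, th1=0.8, th2=0.5):
    """Return {doc_id: similarity} for documents exceeding TH2.

    feature_table maps each registered document ID to its list of pieces.
    The similarity of a registered document is the fraction of the
    tracked document's pieces that correlate above TH1 with one of the
    registered document's pieces.
    """
    results = {}
    if not target_pieces:
        return results
    for doc_id, pieces in feature_table.items():
        matched = sum(
            1 for t in target_pieces
            if any(correlation(t, p) > th1 for p in pieces)
        )
        similarity = matched / len(target_pieces)
        if similarity > th2:
            results[doc_id] = similarity
    return results


feature_table = {
    "A": ["aaaa", "bbbb", "cccc"],
    "B": ["aaaa", "zzzz"],
}
target = ["aaaa", "bbbb", "cccc", "dddd"]
similar = find_similar(target, feature_table)
print(similar)
if similar:
    print(max(similar, key=similar.get))
```

Taking the maximum over the returned similarities corresponds to the variant in which only the most similar document is returned instead of asking the user to choose.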

When a plurality of similar documents are detected, the content analysis server 102 causes the tracking scanner 120 or the tracking PC 121 to present information by which the user can identify each similar document (for example, the distribution document identification information), and the tracking scanner 120 or the tracking PC 121 has the user select one similar document.

Alternatively, when there are a plurality of similar documents, the content analysis server 102 may return the distribution document identification information of the similar document whose document data has the highest similarity.

The tracking server 122 extracts operation information from the operation table based on the distribution document identification information returned from the content analysis server 102, and successively detects the distribution document identification information associated with the extracted operation information. In this way it generates distribution information representing the distribution route of the document and returns it.
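The overall order of the two lookups (identifier first, content only as a fallback) might be sketched like this. The function `track_document` and its two callback parameters are hypothetical names introduced for the example; they stand in for the operation table lookup and the similar-document search, respectively.

```python
def track_document(doc_id, document_data, lookup_route, find_similar_id):
    """Return distribution information, or None if nothing can be found.

    lookup_route(doc_id) -> route or None      (operation table lookup)
    find_similar_id(data) -> doc_id or None    (content-based search)
    Both callables are assumptions standing in for the two servers.
    """
    # Step 1: identifier-based lookup, skipped when no ID was detected.
    route = lookup_route(doc_id) if doc_id is not None else None
    if route:
        return route
    # Step 2: fall back to the content-based search for a similar document.
    similar_id = find_similar_id(document_data)
    if similar_id is None:
        return None
    return lookup_route(similar_id)


routes = {"id-registered": ["print", "copy"]}
print(track_document("id-registered", b"...", routes.get, lambda data: None))
print(track_document(None, b"...", routes.get, lambda data: "id-registered"))
```

Passing the lookups in as callables keeps the sketch independent of how the operation table and the feature amount table are actually stored.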

The tracking scanner 120 and the tracking PC 121 display the distribution information returned from the tracking server 122 as an image, such as a genealogy chart, as shown in FIG. 3.

The tracking operation of the document tracking system configured as described above is the same as the tracking operation described with reference to FIG. 4 in the first embodiment of the present invention, and its description is therefore omitted.

In this embodiment, however, every time an input/output operation is performed on a terminal device such as the client PC 110, the printer 111, the multifunction peripherals 112 and 113, or the scanner 114, a document with the same content may be registered again and again in the feature amount table stored in the content analysis server 102.

Consequently, detecting similar documents during the tracking operation yields a large number of similar documents, and presenting that result to the user as is for selection is undesirable. In this embodiment, therefore, rather than having the tracking scanner 120 or the tracking PC 121 request the content analysis server 102 to detect similar documents, it is preferable for the tracking server 122 to request the content analysis server 102 to detect similar documents each time an input/output operation is performed on a terminal device.

In this case, the tracking server 122 is configured to analyze the operation information based on the detection results obtained from the content analysis server 102 and to progressively generate distribution information representing the distribution route of the corresponding document.

With this configuration, similar documents belonging to the same distribution route (the same tree) can be handled as one set, which reduces the number of options presented to the user. Specifically, the tracking server 122 need only present, as a candidate, the most upstream document among the target documents included in the same distribution route.
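The grouping described above can be sketched as follows, assuming the distribution routes are available as a child-to-parent mapping derived from the operation table. The mapping `parent_of` and both function names are hypothetical and serve only to illustrate the idea of one candidate per tree.

```python
def path_to_root(doc_id, parent_of):
    """Return the chain [doc_id, parent, ..., root] for one document."""
    path = [doc_id]
    while path[-1] in parent_of and parent_of[path[-1]] not in path:
        path.append(parent_of[path[-1]])
    return path


def upstream_candidates(similar_ids, parent_of):
    """Keep one candidate per distribution tree: the most upstream match.

    parent_of maps a document ID to the ID of the document it was made
    from (an assumed representation of the distribution routes).
    """
    best = {}  # root ID -> (depth below root, document ID)
    for doc_id in similar_ids:
        path = path_to_root(doc_id, parent_of)
        root, depth = path[-1], len(path) - 1
        if root not in best or depth < best[root][0]:
            best[root] = (depth, doc_id)
    return sorted(doc_id for _, doc_id in best.values())


parent_of = {"p1": "file1", "p2": "p1", "s1": "p2", "q1": "file2"}
print(upstream_candidates(["s1", "p2", "p1", "q1"], parent_of))
```

In this example four similar documents collapse to two choices, one for each tree, so the user selects between routes rather than between individual copies.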

As described above, the document tracking system according to the second embodiment of the present invention extracts, from documents to which other identification information is assigned, a document whose content is similar to the document to be tracked, and tracks the input/output history of the extracted document. It can therefore track documents from which no identification information can be obtained and documents to which unregistered identification information is assigned.

FIG. 1 is a block diagram showing a document tracking system according to a first embodiment of the present invention.
FIG. 2 is a functional block diagram of the file server and the content analysis server that constitute the document tracking system according to the first embodiment of the present invention.
FIG. 3 is an image showing an example of a tracking result displayed on the tracking scanner or the tracking PC that constitutes the document tracking system according to the first embodiment of the present invention.
FIG. 4 is a flowchart showing the tracking operation of the document tracking system according to the first embodiment of the present invention.
FIG. 5 is a block diagram showing a document tracking system according to a second embodiment of the present invention.
FIG. 6 is a functional block diagram of the multifunction peripheral, the tracking server, and the content analysis server that constitute the document tracking system according to the second embodiment of the present invention.

Explanation of symbols

1, 101 File server
2, 102 Content analysis server
10, 110 Client PC
11, 111 Printer
12, 13, 112, 113 Multifunction peripheral
14, 114 Scanner
20, 120 Tracking scanner
21, 121 Tracking PC
22, 122 Tracking server
40 Monitoring folder
50 Document monitoring unit
51 Monitoring folder setting unit
52 Document storage monitoring unit
53 Document registration request unit
60, 170 Document registration unit
61, 171 Document registration processing unit
62, 172 Document piece division unit
63, 173 Feature amount table registration unit
64, 174 Feature amount table storage unit
141 Scan unit
142 Printing unit
143 Operation information collection unit
151 Image reading unit
152, 156 Paper pattern detection unit
155 Image drawing unit
161 Operation table storage unit
162 Operation table registration unit

Claims

A document tracking system that tracks input/output operations on a document, comprising:
a file server that stores the document;
a feature amount table storage device that stores in advance a feature amount table in which registered document identification information for identifying the document is associated with a feature amount acquired from the content of the document stored in the file server;
at least one terminal device that acquires distribution document identification information for identifying the document each time an input/output operation is performed on the document;
an operation table storage device that stores an operation table in which operation information representing an input/output operation by the terminal device is associated with the distribution document identification information;
a tracking target document designating device that designates a tracking target document;
a document identification information detection device that acquires a feature amount from the content of the document designated by the tracking target document designating device, compares the acquired feature amount with the feature amounts included in the feature amount table, and detects the registered document identification information of a similar document; and
a tracking processing device that extracts, from the operation table, operation information representing an input/output operation on the document identified by the registered document identification information detected by the document identification information detection device, and performs a tracking process that tracks the input/output history of the document based on the extracted operation information.

The document tracking system described above, wherein
the tracking target document designating device acquires the distribution document identification information from the tracking target document,
the tracking processing device extracts, from the operation table, operation information representing an input/output operation on the document identified by the distribution document identification information acquired by the tracking target document designating device, and performs, based on the extracted operation information, a tracking process that tracks the input/output history from the document stored in the file server, and
the document identification information detection device detects the registered document identification information of a document whose content is similar to that of the tracking target document only when no result is obtained from the tracking process.

The document tracking system according to claim 1, further comprising a document monitoring device that monitors documents stored in a specific location of the file server,
wherein the document monitoring device causes the feature amount table storage device to update the feature amount table when it detects a change in the state of a document.

A document tracking system that tracks input/output operations on a document, comprising:
at least one terminal device that acquires distribution document identification information for identifying the document each time an input/output operation is performed on the document;
a feature amount table storage device that stores a feature amount table in which the distribution document identification information is associated with a feature amount acquired from the content of the document;
an operation table storage device that stores an operation table in which operation information representing an input/output operation by the terminal device is associated with the distribution document identification information;
a tracking target document designating device that designates a tracking target document;
a document identification information detection device that acquires a feature amount from the content of the document designated by the tracking target document designating device, compares the acquired feature amount with the feature amounts included in the feature amount table, and detects the distribution document identification information of a similar document; and
a tracking processing device that extracts, from the operation table, operation information representing an input/output operation on the document identified by the distribution document identification information detected by the document identification information detection device, and performs a tracking process that tracks the input/output history of the document based on the extracted operation information.

The document tracking system described above, wherein
the tracking target document designating device acquires the distribution document identification information from the tracking target document,
the tracking processing device extracts, from the operation table, operation information representing an input/output operation on the document identified by the distribution document identification information acquired by the tracking target document designating device, and performs, based on the extracted operation information, a tracking process that tracks the input/output history of the document identified by the distribution document identification information, and
the document identification information detection device detects the distribution document identification information of a document whose content is similar to that of the tracking target document only when no result is obtained from the tracking process.

The document tracking system according to any one of claims 1 to 5, wherein, when a plurality of results are obtained by the tracking process, the tracking processing device outputs the plurality of results to the tracking target document designating device and has the user of the tracking target document designating device select one result.

The document tracking system according to claim 1, wherein the feature amount table storage device acquires document pieces obtained by fragmenting the document as the feature amount of the document.

The document tracking system according to any one of claims 1 to 7, wherein the tracking target document designating device is configured by a scanner that reads the tracking target document.

The document tracking system according to claim 1, wherein the tracking target document designating device designates a file name of the tracking target document.

The document tracking system according to claim 1, wherein the distribution document identification information is recording medium identification information for identifying a recording medium on which the content of the document is recorded.

The document tracking system according to claim 10, wherein the recording medium identification information is information based on a concavo-convex pattern on a surface of a sheet as the recording medium.

The document tracking system according to claim 10, wherein the recording medium identification information is information based on a pattern of metal fibers randomly inserted in a sheet as the recording medium.

The document tracking system according to claim 10, wherein the recording medium identification information is information based on an identifier recorded on an IC chip embedded in a sheet as the recording medium.

The document tracking system according to claim 1, further comprising a tracking result output device that outputs a result of the tracking process.

The document tracking system according to claim 1, wherein the operation information includes user identification information for identifying the user who performed the input/output operation represented by the operation information.

A document tracking method that causes a computer to track input/output operations on documents stored in a file server, comprising:
a feature amount table storing step of storing in advance a feature amount table in which registered document identification information for identifying the document is associated with a feature amount acquired from the content of the document stored in the file server;
an identification information acquisition step of acquiring distribution document identification information for identifying the document each time an input/output operation is performed on the document;
an operation table storing step of storing an operation table in which operation information representing the input/output operation is associated with the distribution document identification information;
a tracking target document designating step in which a tracking target document is designated;
a document identification information detecting step of acquiring a feature amount from the content of the document designated in the tracking target document designating step, comparing the acquired feature amount with the feature amounts included in the feature amount table, and detecting the registered document identification information of a similar document; and
a tracking processing step of extracting, from the operation table, operation information representing an input/output operation on the document identified by the registered document identification information detected in the document identification information detecting step, and tracking the input/output history of the document based on the extracted operation information.

A document tracking method that causes a computer to track input/output operations on a document, comprising:
an identification information acquisition step of acquiring distribution document identification information for identifying the document each time an input/output operation is performed on the document;
a feature amount table storing step of storing a feature amount table in which the distribution document identification information is associated with a feature amount acquired from the content of the document;
an operation table storing step of storing an operation table in which operation information representing the input/output operation is associated with the distribution document identification information;
a tracking target document designating step in which a tracking target document is designated;
a document identification information detecting step of acquiring a feature amount from the content of the document designated in the tracking target document designating step, comparing the acquired feature amount with the feature amounts included in the feature amount table, and detecting the distribution document identification information of a similar document; and
a tracking processing step of extracting, from the operation table, operation information representing an input/output operation on the document identified by the distribution document identification information detected in the document identification information detecting step, and tracking the input/output history of the document based on the extracted operation information.